This repository contains the full pipeline for the paper *Classification of Kinetic-Related Injury in Hospital Triage Data Using NLP*, published in *Advanced Data Mining and Applications: 21st International Conference, ADMA 2025, Kyoto, Japan, October 22–24, 2025, Proceedings, Part III*, Lecture Notes in Computer Science (LNCS, vol. 16199), Lecture Notes in Artificial Intelligence subseries, © 2026 Springer Nature. The paper is available from Springer Nature in the ADMA 2025 proceedings.
This repository includes scripts for pretraining, fine-tuning, prediction, hyperparameter search, and statistical analysis, designed to run on:
- HPC (using either Slurm or PBS schedulers)
- Local machine (Python scripts only)
Full results are available in the WSU data repository. Note that these results were produced on the NCI and therefore differ slightly from the results in the paper due to hardware differences. The full NCI results analysis is in the `NCI_results` branch.
This folder contains the Nextflow pipeline files. The pipeline is currently designed for the NCI Gadi system, which uses the PBS scheduler, but can easily be adapted for the Slurm scheduler.
- `gadi_nfcore_report.sh` – collects resource usage from the `work/` directory (adapted from nf-core configs)
- `gadi_nf_extract_tasks.sh` – extracts job names and `.command.run` information for correlating with parameters
- `merge_resource_report.py` – merges resource usage with hyperparameter configurations and the Nextflow trace
- `nextflow.config` – defines PBS Pro settings for the NCI Gadi queueing system
- `main.nf` – Nextflow pipeline script managing the workflow stages
- `pbs_nf.sh` – PBS submission script to launch `main.nf`
- NCI Gadi uses PBS Pro.
- Nextflow handles job orchestration, but each task is launched as a PBS job behind the scenes.
- GPU and CPU usage are explicitly defined in `nextflow.config` and the `.nf` files.
See Nextflow/README.md for more details.
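`merge_resource_report.py` joins resource usage with hyperparameter configurations; the pure-Python sketch below illustrates that kind of key-based join. The column names (`name`, `peak_rss`, `lr`, `optimizer`) are illustrative assumptions, not the script's actual schema.

```python
import csv
import io

def merge_on_task(trace_csv: str, params_csv: str, key: str = "name") -> list[dict]:
    """Join a Nextflow-style trace CSV with a hyperparameter CSV on a shared task key."""
    params = {row[key]: row for row in csv.DictReader(io.StringIO(params_csv))}
    merged = []
    for row in csv.DictReader(io.StringIO(trace_csv)):
        # Rows without a matching parameter entry keep their trace columns only.
        merged.append({**row, **params.get(row[key], {})})
    return merged

# Hypothetical inputs: a resource trace and a parameter sweep keyed by job name.
trace = "name,peak_rss\njob_1,2.1GB\njob_2,3.4GB\n"
hyper = "name,lr,optimizer\njob_1,1e-5,adam\njob_2,3e-5,sgd\n"

for row in merge_on_task(trace, hyper):
    print(row["name"], row["peak_rss"], row["lr"], row["optimizer"])
```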
Contains model results, timing benchmarks, and supplementary materials:
- `fine_tune_results.csv`, `fine_tune_prediction_results.csv` – fine-tuning metrics (Ingham One → Ingham Two)
- `prediction_results.csv`, `training_results.csv` – pretraining results (MIMIC dataset)
- `Supplementary.pdf` – summary of results (Adam/SGD comparisons)
- `resultsAnalysis.ipynb` – Jupyter notebook for plots and statistical summaries
- `resComparison.R`, `ttestAllPairs.R` – paired t-tests and statistical tests
- Timing CSVs for CPU vs GPU performance
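The R scripts run paired t-tests over matched runs. As a rough illustration of the statistic involved, here is the paired t formula, t = mean(d) / (sd(d) / sqrt(n)), in plain Python; the F1 values below are made up for the example.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(a: list[float], b: list[float]) -> float:
    """Paired t-statistic: t = mean(d) / (sd(d) / sqrt(n)) over differences d = a - b."""
    d = [x - y for x, y in zip(a, b)]
    return mean(d) / (stdev(d) / sqrt(len(d)))

# Hypothetical per-seed F1 scores for two optimizer settings (not real results):
adam = [0.81, 0.79, 0.83, 0.80, 0.82]
sgd  = [0.78, 0.77, 0.80, 0.79, 0.78]
print(round(paired_t_statistic(adam, sgd), 3))
```

In practice the significance threshold would come from the t-distribution with n − 1 degrees of freedom, which the R scripts handle directly.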
See Results/README.md for more.
Contains all core Python scripts for model training, fine-tuning, and prediction:
- `Bio_ClinicalBERTClassifier.py` – modified wrapper for `emilyalsentzer/Bio_ClinicalBERT`; supports layer freezing/unfreezing, mixed precision, per-epoch metrics, and robust checkpointing
- `train.py` – pretraining script (MIMIC)
- `finetune.py` – fine-tuning script (Ingham One)
- `predict.py` – generates predictions and evaluation reports
- `configCreator.py` – generates hyperparameter search CSVs
- `Aggregate_Results.py` – aggregates output logs into final CSVs
- `MIMIC3_data_preprocessing.ipynb` – notebook for preparing MIMIC data
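For illustration, here is a minimal sketch of the kind of selective layer unfreezing the wrapper supports, assuming Hugging Face-style parameter names (`bert.encoder.layer.N.*`, `classifier.*`). A tiny `Param` stand-in replaces `torch.nn.Parameter`; with the real model you would iterate `model.named_parameters()` the same way. This is not the wrapper's actual code.

```python
class Param:
    """Stand-in for torch.nn.Parameter: only the requires_grad flag matters here."""
    def __init__(self):
        self.requires_grad = True

def unfreeze_last_n(named_params, n_layers, total_layers=12):
    """Freeze everything, then unfreeze the classifier head and the last n encoder layers."""
    # Prefixes include the trailing dot so "layer.1." does not match "layer.10.".
    keep = {f"bert.encoder.layer.{i}." for i in range(total_layers - n_layers, total_layers)}
    for name, p in named_params:
        p.requires_grad = name.startswith("classifier.") or any(name.startswith(k) for k in keep)

# Mock parameter list mimicking a 12-layer BERT encoder plus embeddings and head.
params = [(f"bert.encoder.layer.{i}.attention.self.query.weight", Param()) for i in range(12)]
params += [("bert.embeddings.word_embeddings.weight", Param()), ("classifier.weight", Param())]

unfreeze_last_n(params, n_layers=2)
trainable = [name for name, p in params if p.requires_grad]
print(trainable)  # only layers 10, 11 and the classifier head remain trainable
```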
See Scripts/README.md for usage.
This folder contains Slurm job scripts for running the model on HPC systems that use the Slurm job scheduler.
- `parameter_search.slurm` – pretraining on GPUs (MIMIC dataset); launches 810 parallel jobs with varying hyperparameters
- `finetune_models.slurm` – fine-tuning on CPUs (Ingham One dataset); launches 240 jobs reading from `finetune_parameters.csv`
- `predict.slurm` – runs predictions on fine-tuned models; outputs per-sample predictions and metrics
- `parameter_search.csv` – optimizer, learning rate, dropout, layer unfreezing, and seeds (pretraining)
- `finetune_parameters.csv` – the same parameters for the fine-tuning sweep
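A sweep CSV like these can be generated as a simple Cartesian product over the grid, which is roughly what `configCreator.py` does; the grid values below are placeholders, not the paper's actual search space.

```python
import csv
import io
import itertools

# Hypothetical hyperparameter grid; each CSV row becomes one Slurm array task.
grid = {
    "optimizer": ["adam", "sgd"],
    "learning_rate": [1e-5, 3e-5, 5e-5],
    "dropout": [0.1, 0.3],
    "seed": [1, 2, 3],
}

rows = list(itertools.product(*grid.values()))

buf = io.StringIO()  # in a real run this would be a file on disk
writer = csv.writer(buf)
writer.writerow(grid.keys())
writer.writerows(rows)

print(len(rows))  # 2 * 3 * 2 * 3 = 36 configurations
```

Each Slurm array task can then index into the CSV with `$SLURM_ARRAY_TASK_ID` to pick its row.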
See Slurm/README.md for job instructions.
1. **Data Preparation** – prepare CSVs with `TEXT`, `LABEL`, and `ID` columns.
2. **Training & Fine-tuning**
   - Use `train.py` for MIMIC dataset pretraining (typically GPU)
   - Use `finetune.py` for Ingham One fine-tuning (typically CPU)
3. **Prediction & Evaluation** – run `predict.py` to evaluate models on the Ingham Two dataset.
4. **Hyperparameter Search** – use `configCreator.py` + Slurm job arrays to sweep parameters.
5. **Results Aggregation** – run `Aggregate_Results.py` to compile final CSVs.
6. **Statistical Analysis** – run `resComparison.R` or `resultsAnalysis.ipynb` for plots and paired t-tests.
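For the data-preparation step, a small validation helper can catch malformed input before a long HPC run. The sketch below is hypothetical (not part of the repository); it only checks for the required `TEXT`, `LABEL`, and `ID` columns and non-empty text.

```python
import csv
import io

REQUIRED = {"TEXT", "LABEL", "ID"}

def validate_triage_csv(handle) -> int:
    """Verify the header contains TEXT/LABEL/ID and every row has non-empty TEXT; return row count."""
    reader = csv.DictReader(handle)
    missing = REQUIRED - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    n = 0
    for row in reader:
        if not row["TEXT"].strip():
            raise ValueError(f"empty TEXT in row with ID={row['ID']}")
        n += 1
    return n

# Toy example with made-up triage text:
sample = "ID,TEXT,LABEL\n1,fell from ladder,1\n2,chest pain,0\n"
print(validate_triage_csv(io.StringIO(sample)))  # 2
```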
- Python 3.8–3.10
- `transformers`, `torch`, `pandas`, `scikit-learn`, `numpy`
- R with `ggplot2`
See requirements.txt for Python packages.
Final outputs include:
- Model weights: `Outputs/models/bcbert_runs/` and `Outputs/models/cpu_finetune/`
- Per-run metrics: `Outputs/models/*/results.csv`
- Aggregated metrics: `Results/*.csv`
- Statistical plots and comparisons
For questions or collaboration:
- CRMDS / Western Sydney University
- South Western Emergency Research Institute (SWERI) / Ingham Institute for Applied Medical Research
MIT License