Deep Learning, Machine Learning, or Statistical Models for Weather-Related Crash Severity Prediction [supporting dataset]
-
2023-12-01
-
Details:
-
Creators:
-
Corporate Creators:
-
Corporate Contributors:
-
Subject/TRT Terms:
-
Publication/Report Number:
-
DOI:
-
Resource Type:
-
Geographical Coverage:
-
Corporate Publisher:
-
Abstract: Nearly 5,000 people are killed and more than 418,000 are injured in weather-related traffic incidents each year. Assessments of how statistical models perform in crash severity prediction relative to machine learning (ML) and deep learning (DL) techniques help researchers and practitioners know which models are most effective under specific conditions. Given the class imbalance in crash data, the synthetic minority over-sampling technique for nominal data (SMOTE-N) was employed to generate synthetic samples for the minority class. The ordered logit model (OLM) and the ordered probit model (OPM) were evaluated as statistical models, while random forest (RF) and XGBoost were evaluated as ML models. For DL, multi-layer perceptron (MLP) and TabNet were evaluated. The performance of these models varied across severity levels, with property damage only (PDO) predictions performing the best and severe injury predictions performing the worst. The TabNet model performed best in predicting severe injury and PDO crashes, while RF was the most effective in predicting moderate injury crashes. However, all models struggled with severe injury classification, indicating the potential need for model refinement and exploration of other techniques. The choice of model therefore depends on the specific application and the relative costs of false negatives and false positives. This conclusion underscores the need for further research to improve the prediction accuracy of severe and moderate injury incidents, ultimately improving the available data that can be used to increase road safety.
The total size of the zip file is 13.252 MB. The .txt file type is a common text file, which can be opened with a basic text editor. The most common programs used to open .txt files are Microsoft Windows Notepad, Sublime Text, Atom, and TextEdit (for more information on .txt files and software, please visit https://www.file-extensions.org/txt-file-extension). The .xlsx and .xls file types are Microsoft Excel files, which can be opened with Excel and with other freely available software, such as OpenRefine.
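Illustrative note (not part of the dataset or the authors' code): the abstract reports that performance varied by severity level and that model choice depends on the relative costs of false negatives and false positives. The sketch below shows one way such per-class results could be tabulated. The function name `per_class_precision_recall` and the toy labels are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def per_class_precision_recall(y_true, y_pred):
    """Return {label: (precision, recall)} for every label in either list.

    A class that is never predicted gets precision 0.0, and a class that
    never occurs in y_true gets recall 0.0 (a convention chosen here).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    true_counts = Counter(y_true)   # how often each class actually occurs
    pred_counts = Counter(y_pred)   # how often each class is predicted
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        precision = hits[label] / pred_counts[label] if pred_counts[label] else 0.0
        recall = hits[label] / true_counts[label] if true_counts[label] else 0.0
        scores[label] = (precision, recall)
    return scores


# Hypothetical toy labels: an imbalanced sample with a rare "severe" class.
y_true = ["PDO"] * 6 + ["moderate"] * 3 + ["severe"] * 1
y_pred = ["PDO"] * 6 + ["moderate", "moderate", "PDO"] + ["moderate"]

scores = per_class_precision_recall(y_true, y_pred)
for label, (precision, recall) in scores.items():
    print(f"{label}: precision={precision:.2f} recall={recall:.2f}")
```

In this toy sample, 8 of 10 predictions are correct, yet the single severe crash is missed entirely. This is why per-class recall, rather than overall accuracy, is the more informative figure when false negatives on severe crashes are costly.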
-
Content Notes: National Transportation Library (NTL) Curation Note: As this dataset is preserved in a repository outside U.S. DOT control, as allowed by the U.S. DOT’s Public Access Plan (https://doi.org/10.21949/1503647) Section 7.4.2 Data, the NTL staff has performed NO additional curation actions on this dataset. The current level of dataset documentation is the responsibility of the dataset creator. This dataset has been curated to CoreTrustSeal's curation level "C. Initial Curation." To find out more information on CoreTrustSeal's curation levels, please consult their "Curation & Preservation Levels" CoreTrustSeal Discussion Paper (https://doi.org/10.5281/zenodo.8083359). NTL staff last accessed this dataset at its repository URL on 2024-01-30. If, in the future, you have trouble accessing this dataset at the host repository, please email NTLDataCurator@dot.gov describing your problem. NTL staff will do its best to assist you at that time.
-
Format:
-
Funding:
-
Collection(s):
-
Main Document Checksum:
-
Download URL:
-
File Type: