Curators to the Rescue: New Strategies for Making Legacy Data Accessible to the Public
-
2023-12-15
-
Abstract: As the Bureau of Transportation Statistics (BTS) began planning in 2019 for the 2021 return of the Vehicle Inventory and Use Survey (VIUS), the National Transportation Library (NTL) Data Services team was tasked with locating and sharing the historical digital data tables and reports from previous iterations of the survey. The team located and provided digital files of the legacy VIUS data beginning with 1977. However, the 1963, 1967, and 1972 Truck Inventory and Use Surveys (TIUS) also needed to be addressed, and those data tables were trapped in PDF scans of the original, 50-year-old print documents. How was the NTL Data Services team able to liberate this data and make it reusable for transportation statisticians, researchers, and the public? The NTL had to create new workflows and strategies for rescuing legacy datasets and reports.
Legacy data, meaning older datasets trapped in non-machine-readable formats, have not been accessible or easily usable by researchers for decades. The NTL Data Services team is working to make the tables trapped in PDF scans accessible using ABBYY FineReader PDF software. ABBYY FineReader uses optical character recognition (OCR) to create a machine-readable text layer embedded in the PDF, making each report and table searchable, and editable wherever the OCR text needs to be corrected. Additionally, the program allows these data tables to be exported into tabular formats. Using these new techniques, the legacy Truck Inventory and Use Surveys have become available to the public and researchers in non-print form for the first time in decades, providing an opportunity for a complete longitudinal analysis of the legacy TIUS/VIUS data just as the 2021 VIUS data is being released!
The NTL Data Services team's efforts can be replicated by other research programs wishing to liberate useful data from PDF scans. This poster will highlight NTL activities around: rescuing historical TIUS data tables and reports; incorporating new technologies, such as ABBYY FineReader PDF, into data curation workflows; increasing accessibility for the entirety of the Truck Inventory and Use Survey/Vehicle Inventory and Use Survey series from 1963 to today; and describing innovative data rescue workflows that can be implemented at other institutions.
-
Content Notes: A version of this poster was first presented at the Transportation Research Board 103rd Annual Meeting on January 8, 2024. The poster, P24-20518, was presented as part of the TRB Research Innovation Implementation Management (RIIM) Committee (AJE35) poster session "Solicited Research Management and Innovation."