# Incorporating infrastructure and vehicle technology requirements, changes in demand, and Decarbonization policies’ considerations into freight planning

Dataset DOI: [10.5061/dryad.tb2rbp0d1](https://doi.org/10.5061/dryad.tb2rbp0d1)

## Description of the data and file structure

#### **Data was collected for the project:**

Incorporating infrastructure and vehicle technology requirements, changes in demand, and decarbonization policies’ considerations into freight planning

### Files and variables

#### File: Deliverables.zip

**Description:** Deliverables.zip contains five files described below.

List of deliverables included in the ZIP file:

1. synthetic_aggregated_23.gpkg: This file contains an aggregation of the 2023 synthetic population, derived from the analysis, and is organized by census tract, including shopping characteristics and census geometry. Variables:
   * GEOID: Numeric codes that uniquely identify a census tract geographic area.
   * NAME: Combination of census tract number, county, and state.
   * total_pop: Number of inhabitants per census tract.
   * shop_in_store: Number of in-store shoppers per census tract.
   * shop_online: Number of online shoppers per census tract.
   * pct_instore: Percentage of in-store shoppers per census tract.
   * pct_online: Percentage of online shoppers per census tract.
   * gi_store: Local Getis-Ord Gi∗ value used to identify statistically significant spatial clusters of high or low values for in-store purchases.
   * gi_online: Local Getis-Ord Gi∗ value used to identify statistically significant spatial clusters of high or low values for online purchases.
   * demand_capita: Delivery demand per capita.
   * geometry: Polygon geometry per census tract.
2. Store_shopping_emissions_map.gpkg: This file contains estimated CO2, NOx, and PM2.5 emissions from in-store shopping, assigned to the origin of each shopping trip. Emissions data were obtained from the EMFAC2021 software by the California Air Resources Board. Data are aggregated by census tract for the years 2017 and 2023. Variables:
   * GEOID: Numeric codes that uniquely identify a census tract geographic area.
   * estimated_CO2: CO2 estimation for the synthetic population and aggregated by census tract.
   * estimated_NOx: NOx estimation for the synthetic population and aggregated by census tract.
   * estimated_PM25: PM25 estimation for the synthetic population and aggregated by census tract.
   * NAME: Combination of census tract number, county, and state.
   * total_pop: Number of inhabitants per census tract.
   * geometry: Polygon geometry per census tract.
3. Combined_services.gpkg: This file combines all retail locations, with the category, description, and geometry of each. The data in this file was collected from OpenStreetMap (OpenStreetMap Foundation, OSMF) under the Open Data Commons Open Database License (ODbL). Data was retrieved from January to March 2025. Variables:
   * fid: Feature identifier.
   * oms_id: OpenStreetMap’s identifier.
   * category: Type of service provided by the retail location, including restaurants, accommodation, food service, technical, and health.
   * retail: Type of service provided by the retail location, at a more disaggregated level than category.
   * geometry: Polygon geometry of the building area.
4. Merge_facilities_buildings.gpkg: This file contains industrial buildings with a sample of freight facilities, including characteristics such as classification, height, and building area. The data in this file was collected from OpenStreetMap (OpenStreetMap Foundation, OSMF) under the Open Data Commons Open Database License (ODbL). Data was retrieved from January to March 2025. Data was complemented with the Overture Maps dataset, accessed from Amazon S3 and Microsoft Azure Blob Storage using an API key. Variables:
   * fid: Feature identifier.
   * Search_Type: Term used to search for the facility in OpenStreetMap. "No classified" means the facility was identified from other sources, such as Overture.
   * lon: Longitude of the facility location.
   * lat: Latitude of the facility location.
   * Classification: Classification of facilities based on the two-digit North American Industry Classification System (NAICS).
   * height: Building height in meters.
   * area: Building area in square degrees.
5. Facilities_year.csv: This file contains a sample of freight facilities with the year of construction, name, address, type of facility, and geometry. The data in this file was collected from OpenStreetMap (OpenStreetMap Foundation, OSMF) under the Open Data Commons Open Database License (ODbL). Data was retrieved from January to March 2025. Imagery from Google Earth Studio (© Google) was used as a complementary source to verify information and fill in missing data. Variables:
   * Facility_Name: Name of the freight facility.
   * Address: Address of the freight facility.
   * Search_Type: Classification based on the two-digit North American Industry Classification System (NAICS).
   * lon: Longitude of the facility location.
   * lat: Latitude of the facility location.
   * Year: Year the facility began operating.

## Code/software

To read the .csv data, users can open it directly in a spreadsheet program like Excel or Google Sheets.
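
For users who prefer a script, the following is a minimal Python sketch that reads a CSV with the columns documented for Facilities_year.csv and counts facilities per `Year`. The function name `facilities_per_year`, the sample rows, and the file path in the commented lines are illustrative assumptions, not part of the dataset.

```python
import csv
import io
from collections import Counter


def facilities_per_year(csv_file):
    """Count rows per value of the Year column in an open CSV file."""
    reader = csv.DictReader(csv_file)
    counts = Counter()
    for row in reader:
        year = (row.get("Year") or "").strip()
        if year:  # skip rows where the year is missing
            counts[year] += 1
    return counts


# Made-up placeholder rows that only mimic the documented column layout;
# they are NOT values from the dataset.
sample = io.StringIO(
    "Facility_Name,Address,Search_Type,lon,lat,Year\n"
    "Example A,1 Example St,48,-121.5,38.5,1999\n"
    "Example B,2 Example St,49,-121.6,38.6,1999\n"
    "Example C,3 Example St,48,-121.7,38.7,\n"
)
print(facilities_per_year(sample))

# With the real file (the path is an assumption about where the ZIP was extracted):
# with open("Deliverables/Facilities_year.csv", newline="", encoding="utf-8") as f:
#     print(facilities_per_year(f))
```

Years are kept as strings here so that any non-numeric entries in the file are counted rather than raising an error.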

Users can open the GeoPackage (.gpkg) files with GIS software such as ArcGIS, QGIS, or Manifold. The vector layers stored in each GeoPackage can also be read with programming languages such as Python and R.
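
Because a GeoPackage is an SQLite database, its layer names and attribute columns can be inspected with Python's standard library alone. The sketch below is one possible approach, not the method used to build the dataset: the helper names `list_layers` and `read_attributes`, the stand-in table `demo_layer`, and its values are assumptions for illustration. Geometry columns are stored as binary blobs and need a GIS library to decode.

```python
import sqlite3


def quote_ident(name):
    """Quote an SQL identifier so table or column names are used safely."""
    return '"' + name.replace('"', '""') + '"'


def list_layers(con):
    """Return (table_name, data_type) pairs registered in gpkg_contents."""
    return con.execute(
        "SELECT table_name, data_type FROM gpkg_contents ORDER BY table_name"
    ).fetchall()


def read_attributes(con, table, columns, limit=5):
    """Return up to `limit` rows of the chosen attribute columns."""
    cols = ", ".join(quote_ident(c) for c in columns)
    sql = f"SELECT {cols} FROM {quote_ident(table)} LIMIT ?"
    return con.execute(sql, (limit,)).fetchall()


# In-memory stand-in that mimics a GeoPackage just enough for this demo.
# The layer name and the values are made up, NOT taken from the dataset.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE gpkg_contents (table_name TEXT, data_type TEXT)")
con.execute("INSERT INTO gpkg_contents VALUES ('demo_layer', 'features')")
con.execute("CREATE TABLE demo_layer (GEOID TEXT, total_pop INTEGER)")
con.execute("INSERT INTO demo_layer VALUES ('00000000000', 1234)")

for table_name, data_type in list_layers(con):
    print(table_name, data_type)
    print(read_attributes(con, table_name, ["GEOID", "total_pop"]))
con.close()

# With a real file (the path is an assumption about where the ZIP was extracted):
# con = sqlite3.connect("Deliverables/synthetic_aggregated_23.gpkg")
# print(list_layers(con))
```

Listing the layers first avoids guessing the layer name inside each file. For geometry-aware work, a library such as GeoPandas (`geopandas.read_file(path, layer=...)`) reads the same files into a table with decoded geometries.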

## Access information

Data was derived from the following sources:

* American Time Use Survey (ATUS)
* American Community Survey (ACS)
* National Household Travel Survey (NHTS)
* Emissions Inventory - EMFAC2021 software
* OpenStreetMap (OSM)

Additional information, such as the ATUS, ACS, or NHTS data collected and the synthetic population created, can be provided upon request via email.

## Human subjects data

This dataset includes human subjects data obtained from public-use datasets that have been fully de-identified and made publicly available by the original data providers. Specifically:

* The American Community Survey (ACS) and American Time Use Survey (ATUS) data were accessed via the IPUMS USA platform using an API. IPUMS USA provides anonymized, public-use microdata that are de-identified in accordance with U.S. Census Bureau disclosure standards. All direct identifiers have been removed, and statistical techniques have been applied to reduce the risk of re-identification.

* The Longitudinal Employer-Household Dynamics (LEHD) data were accessed via an API and consist of aggregated or anonymized public-use datasets from the U.S. Census Bureau. These data comply with strict confidentiality protocols under Title 13 of the U.S. Code.

* The California Communities Environmental Health Screening Tool (CalEnviroScreen 4.0) includes environmental, health, and demographic indicators at the census tract level. The data are aggregated and do not contain any personally identifiable information (PII).

* The California Energy Commission, California Air Resources Board (EMFAC tool), and National Household Travel Survey (NHTS) datasets were downloaded from public sources. These datasets are anonymized and do not include individual-level identifiers.

* The County Business Patterns (CBP) and Economic Census (ECNSVY) data were obtained from the U.S. Census Bureau's public data portals. These are aggregated at the business and geographic level and do not include PII.

No personally identifiable information (PII) or protected health information (PHI) is included in the data deposited to Dryad. All data used in this submission are in the public domain or provided under terms consistent with a CC0 license, and have been de-identified by the original data providers in accordance with legal and ethical standards for public data release. No additional participant consent was required beyond the terms established by the original data custodians.