Harvesting data from advanced technologies.

File Language:
English


Details

  • Abstract:
    Data streams are emerging everywhere, for example as Web logs, Web page click streams, sensor data streams, and credit card transaction flows. Unlike traditional data sets, data streams are generated sequentially and arrive one item at a time rather than being available for random access before learning begins, and they are potentially so large, or even infinite, that storing all of the data is impractical.

    To study learning from data streams, we target online learning, which generates a best-so-far model on the fly by sequentially feeding in newly arrived data, updates the model as needed, and then applies the learned model for accurate real-time prediction or classification in real-world applications. Several challenges arise in this scenario: first, the data is not available for random access or even for multiple passes; second, data imbalance is common; third, the model should perform reasonably even when the amount of data is limited; fourth, the model should be easy to update but should not need frequent updates; and finally, the model should always be ready for prediction and classification. To meet these challenges, we investigate streaming feature selection, taking advantage of mutual information and group structures among candidate features. Streaming feature selection reduces the number of features by removing noisy, irrelevant, or redundant features and selecting relevant features on the fly, which brings tangible benefits for applications: speeding up the learning process, improving learning accuracy, enhancing generalization capability, and improving model interpretability. Compared with traditional feature selection, which can only handle pre-given data sets and does not consider potential group structures among candidate features, streaming feature selection can handle streaming data and select meaningful and valuable feature sets, with or without group structures, on the fly.

    In this research, we propose (1) a novel streaming feature selection algorithm, GFSSF (Group Feature Selection with Streaming Features), which explores mutual information and group structures among candidate features to select features at both the group and individual levels from streaming data; (2) a lazy online prediction model with data fusion, feature selection, and weighting technologies for real-time traffic prediction from heterogeneous sensor data streams; (3) a lazy online learning model, LB (Live Bayes), with dynamic resampling technology to learn from imbalanced embedded mobile sensor data streams for real-time activity recognition and user recognition; and (4) a lazy-update online learning model, CMLR (Cost-sensitive Multinomial Logistic Regression), with streaming feature selection for accurate real-time classification from imbalanced and small sensor data streams. Finally, by integrating traffic flow theory, advanced sensors, data gathering, data fusion, feature selection and weighting, online learning, and visualization technologies to estimate and visualize current and future traffic, we built a real-time transportation prediction system named VTraffic for the Vermont Agency of Transportation.

  • Main Document Checksum:
    urn:sha256:60fb7cad8f5cf91bf55ff522baa037497d80da3182f8a7832056d00f40f04f65
  • File Type:
    PDF, 3.64 MB
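
The abstract describes streaming feature selection as judging each candidate feature on the fly, using mutual information to discard irrelevant or redundant features. The following is a minimal Python sketch of that general idea only. It is not the GFSSF algorithm from the document: it omits group-level selection entirely, and the function names, the discrete toy data, and the `relevance` and `redundancy` thresholds are all assumptions introduced for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written in terms of counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def select_streaming(feature_stream, labels, relevance=0.05, redundancy=0.9):
    """Accept or reject features one at a time, in arrival order.

    Both thresholds are hypothetical values chosen for this sketch.
    """
    selected = {}
    for name, values in feature_stream:
        # Relevance test: drop features that say little about the labels.
        if mutual_information(values, labels) < relevance:
            continue
        # Redundancy test: drop a feature if an already kept feature shares
        # at least `redundancy` of its entropy, using H(X) = I(X; X).
        entropy = mutual_information(values, values)
        if any(mutual_information(values, kept) >= redundancy * entropy
               for kept in selected.values()):
            continue
        selected[name] = values
    return list(selected)


# Toy data (made up): eight samples, binary labels, four arriving features.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
stream = [
    ("signal", [0, 0, 0, 1, 1, 1, 1, 1]),  # informative about the labels
    ("copy",   [0, 0, 0, 1, 1, 1, 1, 1]),  # duplicate of "signal"
    ("noise",  [0, 1, 0, 1, 0, 1, 0, 1]),  # independent of the labels
    ("other",  [0, 0, 1, 0, 1, 1, 0, 1]),  # informative, not a duplicate
]
selected = select_streaming(stream, labels)
print(selected)  # ['signal', 'other']
```

Each feature is examined once, when it arrives, and only the kept features are stored, which loosely mirrors the single-pass constraint the abstract places on learning from streams. Features are never revisited, so the result depends on arrival order.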

ROSA P serves as an archival repository of USDOT-published products including scientific findings, journal articles, guidelines, recommendations, or other information authored or co-authored by USDOT or funded partners. As a repository, ROSA P retains documents in their original published format to ensure public access to scientific information.