Harvesting data from advanced technologies.


File type: PDF (3.64 MB)



  • Creators:
  • Corporate Creators:
  • Corporate Contributors:
  • Subject/TRT Terms:
  • Publication/ Report Number:
  • Resource Type:
  • Geographical Coverage:
  • Corporate Publisher:
  • Abstract:
    Data streams are emerging everywhere, in forms such as Web logs, Web page click streams, sensor data streams, and credit card transaction flows. Unlike traditional data sets, data streams are generated sequentially and arrive one by one rather than being available for random access before learning begins, and they are potentially so large, or even infinite, that it is impractical to store all of the data.

    To study learning from data streams, we target online learning, which generates a best-so-far model on the fly by sequentially feeding in newly arrived data, updates the model as needed, and then applies the learned model for accurate real-time prediction or classification in real-world applications. Several challenges arise in this scenario: first, data is not available for random access or even for repeated access; second, data imbalance is common; third, the model should perform reasonably even when the amount of data is limited; fourth, the model should be easy to update but should not need frequent updates; and finally, the model should always be ready for prediction and classification. To meet these challenges, we investigate streaming feature selection that takes advantage of mutual information and group structures among candidate features. Streaming feature selection reduces the number of features by removing noisy, irrelevant, or redundant features and selecting relevant features on the fly, and it brings tangible benefits for applications: speeding up the learning process, improving learning accuracy, enhancing generalization capability, and improving model interpretability. Compared with traditional feature selection, which can only handle data sets given in advance and does not consider potential group structures among candidate features, streaming feature selection can handle streaming data and select meaningful and valuable feature sets, with or without group structures, on the fly.

    In this research, we propose 1) a novel streaming feature selection algorithm (GFSSF, Group Feature Selection with Streaming Features) that explores mutual information and group structures among candidate features to perform feature selection at both the group and individual levels from streaming data; 2) a lazy online prediction model with data fusion, feature selection, and weighting technologies for real-time traffic prediction from heterogeneous sensor data streams; 3) a lazy online learning model (LB, Live Bayes) with dynamic resampling technology to learn from imbalanced embedded mobile sensor data streams for real-time activity recognition and user recognition; and 4) a lazy-update online learning model (CMLR, Cost-sensitive Multinomial Logistic Regression) with streaming feature selection for accurate real-time classification from imbalanced and small sensor data streams. Finally, by integrating traffic flow theory, advanced sensors, data gathering, data fusion, feature selection and weighting, online learning, and visualization technologies to estimate and visualize current and future traffic, we build a real-time transportation prediction system named VTraffic for the Vermont Agency of Transportation.

  • Format:
  • Funding:
  • Collection(s):
  • Main Document Checksum:
  • Download URL:
  • File Type:
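
The abstract describes streaming feature selection as keeping relevant features and discarding irrelevant or redundant ones as they arrive, using mutual information. The sketch below illustrates that general idea only; it is not the GFSSF algorithm from the report and it ignores group structures. The function names, the two thresholds, and the toy data are all assumptions made for illustration, and features are assumed to be discrete.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information, in bits, between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def select_streaming_features(feature_stream, labels, relevance=0.1, redundancy=0.9):
    """Consume (name, values) pairs one at a time and return the kept names.

    Hypothetical rule (an assumption, not taken from the report): keep a
    feature if it carries at least `relevance` bits about the labels and no
    already-kept feature explains at least `redundancy` of its entropy.
    """
    selected = {}
    for name, values in feature_stream:
        if mutual_information(values, labels) < relevance:
            continue  # irrelevant: carries too little information about the labels
        entropy = mutual_information(values, values)  # I(X; X) equals H(X)
        if any(
            mutual_information(values, kept) >= redundancy * entropy
            for kept in selected.values()
        ):
            continue  # redundant: an earlier feature already carries this signal
        selected[name] = values
    return list(selected)


# Toy data (invented for illustration): eight samples, two classes.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
stream = [
    ("f1", [0, 0, 0, 0, 1, 1, 1, 1]),  # fully informative
    ("f2", [1, 1, 1, 1, 0, 0, 0, 0]),  # inverted copy of f1, hence redundant
    ("f3", [0, 1, 0, 1, 0, 1, 0, 1]),  # independent of the labels
    ("f4", [0, 0, 0, 1, 1, 1, 1, 0]),  # weakly informative, not redundant
]
kept = select_streaming_features(stream, labels)
print(kept)  # ['f1', 'f4']
```

Each feature is examined once, in arrival order, and only the kept features are stored, which matches the constraint that stream data cannot be revisited. A real system would also need to handle continuous features and to re-examine earlier selections as new features arrive.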
