Bayesian Nonparametric Modeling of Driver Behavior

Modern vehicles are equipped with increasingly complex sensors. These sensors generate large volumes of data that provide opportunities for modeling and analysis. We are interested in exploiting these data to learn the driving behavior and the personal road network of individual drivers. Our dataset was collected from a standard vehicle used to commute to work and for personal trips. A Hidden Markov Model (HMM) trained on the GPS position and orientation data compresses the large amount of position information into a small number of road segment states. Each state has a set of associated observations, i.e., car signals, which are quantized and modeled as draws from a Hierarchical Dirichlet Process (HDP). Inference for the topic distributions is carried out using MCMC split-merge sampling and online variational inference algorithms. The topic distributions over joint quantized car signals characterize the driving situation in the respective road state. In a novel manner, we demonstrate how the sparsity of a driver's personal road network, in conjunction with a hierarchical topic model, allows data-driven predictions of destinations as well as likely road conditions.

People Involved: Julian Straub, Sue Zheng, Vadim S. Smolyakov, John W. Fisher III

Vehicles are increasingly equipped with sensors and electronics to react dynamically to changing road conditions and to increase driver safety. As such, large volumes of driver-specific data related to driving conditions and driver behavior are generated. We are interested in analyzing this data to learn models of driving behavior. Such models could be used to anticipate dangerous situations, to improve the driving schedule of a person, and to tailor various aspects of the driving experience to the individual.

Here, we use data collected from one vehicle's sensors over numerous trips to construct a Hierarchical Dirichlet Process (HDP) model of driving behavior and road conditions. HDPs are commonly used for topic modeling of text corpora to uncover the set of topics that comprise each document in the corpus. In our case, the documents are road segments and the words are the associated quantized sensor measurements. The topics in the HDP model are distributions over sensor measurements; they capture the driving conditions that the driver encounters in each road segment as well as the driver's behavior there. To our knowledge, this is a new approach for modeling driving behavior. Unlike related work, which is based on assumptions about the capabilities and behaviors of humans, our model is purely data-driven.
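The mapping from sensor readings to "words" and from road segments to "documents" can be illustrated with a minimal sketch. It assumes that each sample already carries a road segment id and that speed and time of day are quantized jointly; the bin edges, the names `joint_word` and `build_documents`, and the sample values are hypothetical and are not taken from the project.

```python
from bisect import bisect_right
from collections import Counter, defaultdict

# Hypothetical bin edges; the actual quantization is not specified in the text.
SPEED_EDGES_KMH = [10.0, 30.0, 60.0, 90.0]  # yields 5 speed bins
HOUR_EDGES = [6.0, 10.0, 16.0, 20.0]        # yields 5 time-of-day bins


def quantize(value, edges):
    """Return the index of the bin that `value` falls into."""
    return bisect_right(edges, value)


def joint_word(speed_kmh, hour):
    """Map a (speed, time-of-day) pair to a single joint word id."""
    n_hour_bins = len(HOUR_EDGES) + 1
    return quantize(speed_kmh, SPEED_EDGES_KMH) * n_hour_bins + quantize(hour, HOUR_EDGES)


def build_documents(samples):
    """Group (segment_id, speed_kmh, hour) samples into per-segment word counts."""
    docs = defaultdict(Counter)
    for segment_id, speed_kmh, hour in samples:
        docs[segment_id][joint_word(speed_kmh, hour)] += 1
    return docs


# Made-up samples for two road segments.
samples = [
    (0, 25.0, 8.5),   # segment 0, slow, morning
    (0, 28.0, 8.6),
    (1, 95.0, 8.9),   # segment 1, fast, morning
    (1, 45.0, 17.5),  # segment 1, moderate, evening
]
documents = build_documents(samples)
```

Each entry of `documents` is a bag of words for one road segment, which is the input format a topic model such as an HDP expects.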

It is important to note that the hierarchy within the HDP model allows sharing of measurements across similar road segments. This is an appealing aspect of the model since it enables us to learn an expressive model for rarely visited road segments from similar road segments that are visited more often. In order to utilize an HDP model, we first organize the sensor data into "documents" (i.e., road segments and their associated quantized measurements). We consider the case in which a road map is not available, although it is straightforward to incorporate such information. Additionally, typical drivers often traverse only a small subset of the roads in the road network. We use a Hidden Markov Model (HMM) to learn the road segments. The HMM condenses position information from recorded trips into road segment states. The set of hidden states effectively corresponds to a sparse road network that consists only of the roads the driver has traversed. We then use the trained HMM to associate sensor measurements with road segments to produce "documents" for the HDP model.
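The step of associating measurements with road segments can be sketched as decoding a position sequence with an already trained HMM and then grouping the quantized measurements by decoded state. The sketch below is an illustration under several assumptions: isotropic Gaussian emissions on 2-D positions only (orientation is omitted), fixed toy parameters instead of learned ones, and Viterbi decoding as the assignment rule. None of these details are confirmed by the text.

```python
import math
from collections import defaultdict


def safe_log(p):
    """Log that maps zero probability to -inf (sparse transitions)."""
    return math.log(p) if p > 0.0 else -math.inf


def log_gauss(obs, mean, sigma):
    """Log density, up to a constant, of an isotropic 2-D Gaussian."""
    dx, dy = obs[0] - mean[0], obs[1] - mean[1]
    return -(dx * dx + dy * dy) / (2.0 * sigma * sigma)


def viterbi(observations, means, trans, init, sigma=1.0):
    """Most likely road-segment state for every observed position."""
    n = len(means)
    score = [safe_log(init[s]) + log_gauss(observations[0], means[s], sigma)
             for s in range(n)]
    backptrs = []
    for obs in observations[1:]:
        new_score, ptr = [], []
        for s in range(n):
            prev = max(range(n), key=lambda p: score[p] + safe_log(trans[p][s]))
            new_score.append(score[prev] + safe_log(trans[prev][s])
                             + log_gauss(obs, means[s], sigma))
            ptr.append(prev)
        score = new_score
        backptrs.append(ptr)
    # Backtrack from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Toy model: three road segments along a straight road (hypothetical values).
means = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0)]
trans = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]]  # sparse, left to right
init = [1.0, 0.0, 0.0]
positions = [(0.2, 0.1), (1.0, -0.3), (9.5, 0.4), (10.8, 0.0), (19.7, 0.2)]
words = [3, 3, 7, 8, 5]  # already quantized sensor words, one per position

path = viterbi(positions, means, trans, init, sigma=2.0)

# Collect the words observed in each decoded road segment into a "document".
documents = defaultdict(list)
for state, word in zip(path, words):
    documents[state].append(word)
```

Working in log space keeps the zero entries of the sparse transition matrix well defined as negative infinity, so impossible transitions are never selected as long as a feasible path exists.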

In addition to organizing the data for the HDP model, the HMM also provides insight into driver behavior such as typical routes and probable destinations. Special hidden states are introduced in the HMM to represent starting locations (sources) and destinations. Consequently, identification of the most likely route between two states and finding the distribution over probable destinations become well-posed questions and allow us to make route and destination predictions.
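Both questions can be illustrated on a toy transition structure with absorbing destination states. The sketch below is an assumption-laden illustration rather than the project's implementation: it computes absorption probabilities by fixed-point iteration and interprets the most likely route as the path that maximizes the product of transition probabilities, found with Dijkstra's algorithm on negative log probabilities. The state names and probabilities are made up.

```python
import heapq
import math


def absorption_probabilities(trans, absorbing, n_iter=1000, tol=1e-12):
    """For every state, the probability of ending in each absorbing state."""
    probs = {s: {d: (1.0 if s == d else 0.0) for d in absorbing} for s in trans}
    for _ in range(n_iter):
        delta = 0.0
        for s in trans:
            if s in absorbing:
                continue
            for d in absorbing:
                new = sum(p * probs[nxt][d] for nxt, p in trans[s].items())
                delta = max(delta, abs(new - probs[s][d]))
                probs[s][d] = new
        if delta < tol:
            break
    return probs


def most_likely_route(trans, source, target):
    """Highest-probability state path from source to target, or None."""
    best = {source: 0.0}
    parent = {}
    queue = [(0.0, source)]
    while queue:
        cost, state = heapq.heappop(queue)
        if state == target:
            break
        if cost > best.get(state, math.inf):
            continue  # stale queue entry
        for nxt, p in trans[state].items():
            if p <= 0.0 or nxt == state:
                continue
            new_cost = cost - math.log(p)  # -log p >= 0, so Dijkstra applies
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                parent[nxt] = state
                heapq.heappush(queue, (new_cost, nxt))
    if target not in best:
        return None
    route = [target]
    while route[-1] != source:
        route.append(parent[route[-1]])
    return route[::-1]


# Hypothetical sparse road network: "home" is a source, "work" and "gym" absorb.
trans = {
    "home": {"a": 1.0},
    "a": {"b": 0.7, "c": 0.3},
    "b": {"work": 1.0},
    "c": {"gym": 0.5, "work": 0.5},
    "work": {"work": 1.0},
    "gym": {"gym": 1.0},
}
probs = absorption_probabilities(trans, ["work", "gym"])
route = most_likely_route(trans, "home", "work")
```

Reading `probs[state]` for the current state at each time step gives a destination distribution that sharpens as the trip progresses: in this toy network it changes once the driver moves from state "a" to state "c".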

The contributions of this paper are (1) to show how sparsity in the HMM transition matrix, together with starting and absorption states, leads to accurate long-term predictions of driver routes and destinations and (2) the novel application of an HDP to model the joint distribution of quantized vehicle signals.

The top plot shows that the predicted route (blue) aligns with the observed trip (green). The bottom plot shows the absorption probabilities of the model's five most likely destinations as the trip progresses in time. The absorption probability of the true destination (green) dominates those of the alternative destinations early in the trip.

The plots in the first two rows mark in red the road states in which the respective topic has the highest likelihood. The third row shows the respective topics, which are categorical distributions over the joint quantized speed and time-of-day measurements. The probabilities are color-coded from blue (low) to red (high).

MCMC split-merge sampling and Stochastic Variational Inference (SVI) algorithms were used to infer the topic distributions and mixing proportions associated with the car sensor data. The top four matched global topic distributions, along with their locations on the map, are shown below.

The first four plots show the locations of the highest-probability matched global topic distributions for the split-merge MCMC and SVI algorithms.