Analysis and Forecasting for Traffic Flow Data

The urban transportation system involves the challenging task of transferring people and materials across densely populated areas, and hence its operational efficiency directly affects the entire city. In this study, we overcome the restriction of both time and space by introducing an online version of the principal component analysis (PCA), called the projection approximation subspace tracking with deflation (PASTd) algorithm. The algorithm is implemented to derive core traffic patterns of traffic flow data of Baltimore, Maryland, US. The k-nearest-neighbor (KNN) method is applied to predict the values of these core traffic patterns in the near future. Thus, the traffic information of Baltimore County can be forecasted with linear complexity and traffic congestion can be traced with little latency. Unlike traditional traffic prediction methods, our method aims at network-level prediction, regardless of urban or freeway road segments. The results show that our forecasting method is efficient, flexible, and robust.


Introduction
Traffic information plays a very important role in our daily life.(3) In general, we regard traffic data as streaming data generated at regular time intervals. At each time step, we receive traffic information about a large number of road segments, which has to be analyzed and disseminated in real time. On the other hand, vehicle operators would like to receive immediate up-to-date traffic summaries and cannot afford any postprocessing.(4) An effective way to handle this problem is to reduce the large volumes of traffic data into a small number of meaningful trends that can be updated and broadcast in real time. Traffic flow data are correlated across most road segments. Hence, one possible research direction is to use clustering algorithms to group together road segments that follow similar traffic patterns. Instead of analyzing the traffic of n road segments, we can analyze the patterns of k groups, where k is much smaller than n. Applying self-organizing maps (SOMs) is a possible approach, which has been used in postprocessing analysis.(5) Mixture models present good opportunities for streaming data analysis(6,7) as well. Here, we pursue a different approach based on finding patterns (or hidden variables) such that the time series of average speeds for each road segment can be generated using a linear combination of these patterns. References 8-10 explicitly focus on discovering hidden variables. In CluStream,(8) patterns are found by an offline strategy based on stored data. Sakurai et al.(9) determined lag correlations among multiple streams. StatStream(10) uses the discrete Fourier transform (DFT) to summarize streams within a finite window size. We would like a method that can find a relatively small number of inherent patterns in an online fashion with linear complexity and with no need for data buffering.
Short-term traffic prediction plays a crucial role in an intelligent transportation system (ITS). With reliable forecasting data, administrators can manage traffic networks effectively and travelers can decide on departure times or travel routes more easily.(11) Many statistical models have been proposed for short-term traffic forecasting. For example, time series models,(12,13) Bayesian models,(14) Kalman filter models,(15) and support vector machine regression models(16) have been widely applied to predict motorway and freeway traffic conditions. Neural network models using artificial intelligence algorithms(17-19) and unsupervised machine learning algorithms(20) have also gained researchers' attention recently. Until now, most models have focused on motorways and freeways.(21,22) A network-level method is needed for better prediction. We require that the prediction method be efficient and scalable. Even though the number of road segments can become very large, the method should be able to make reliable predictions in real time. In this paper, we propose a method that meets all the following requirements: online use, linear complexity, no need for data buffering, scalability, network level, and reliability.

Pattern Discovery for Traffic Flow Data
Problem Formulation. Given n time series corresponding to average speeds on n road segments, updated at each time step t, we aim to determine k hidden variables, where $k \ll n$, such that the linear combinations of these k hidden variables can be used to reconstruct the time series data. Thus, the dimension of the data set is significantly reduced. As a result, we can make more effective, low-cost predictions of speeds in the near future. Figure 1 shows an example of a time series of average speeds for a road segment over a week. The x-axis represents the minutes in a week ranging from 1 to 10080, while the y-axis represents the corresponding average speed in mph.

Principal component analysis (PCA)
PCA is a popular tool for data analysis by which high-dimensional data are projected onto a low-dimensional subspace while preserving most of the variance in the data. The method is simple and nonparametric.(4) In essence, PCA can be applied to reduce the dimension of a complex data set while revealing the hidden, simplified patterns underlying the data. In the following, $x_t = [x_{1,t}, x_{2,t}, \ldots, x_{n,t}]^T \in \mathbb{R}^n$ is an n-dimensional column vector of average speeds of different road segments at time step t. $X_t = [x_1, x_2, \ldots, x_t] \in \mathbb{R}^{n \times t}$ can be viewed as an $n \times t$ matrix, where a new column is added at each time step t.
There are several ways to explain the PCA technique. One way is to model the vector $x_t$ as a linear combination of k hidden variables. That is, we express $x_t = W z_t$, where $z_t$ are k hidden variables whose values depend on the time step t, and $k \ll n$. W is an $n \times k$ matrix with orthonormal columns to be determined. Since the columns of W are orthonormal, $W^T W = I_{k \times k}$. Hence, we deduce that $z_t = W^T x_t$. Using this model, we can reconstruct each $x_t$ using $\hat{x}_t = W W^T x_t$. Assume that we want to focus on a time window of size T, and that we would like to reconstruct all the data within this window, say, $X_T = [x_1, \ldots, x_T]$. Then, our optimization (minimizing the reconstruction error) can be formulated as

$$\min_{W} \sum_{t=1}^{T} \left\| x_t - W W^T x_t \right\|^2 \quad \text{subject to } W^T W = I_{k \times k}. \qquad (1)$$

Using the singular-value decomposition (SVD) technique, the solution can be expressed as $W = [w_1, \ldots, w_k]$, where each column $w_i$ is the eigenvector corresponding to the i-th largest eigenvalue of $X_T X_T^T$. Then, the hidden variables are given by $z_t = W^T x_t \in \mathbb{R}^k$. Thus, for any given k < n, we can find a matrix W with orthonormal columns and the k hidden variables $z_t$ to reconstruct the data.

Discovering hidden variables
In this section, we show how to use PCA to find the most important patterns underlying our complex traffic data set. We first introduce a conventional PCA method to find the hidden variables so as to reconstruct the data set as accurately as possible. Then, we describe the online method that can update hidden variables at every time step with linear complexity.
Problem Formulation. Given n time series corresponding to average speeds on n road segments, updated at each time step t, we aim to determine k hidden variables $z_t$, where $k \ll n$, such that linear combinations of these k hidden variables can be used to reconstruct the data matrix within any time window of size T. Thus, the dimension of the data set is significantly reduced. As a result, we can make more effective, low-cost predictions of speeds in the near future.

Offline PCA pattern discovery
As discussed in Sect. 2.1, we can identify the hidden variables by computing the eigenvectors of the sample covariance matrix of our input data. Then, we can use the first k eigenvectors to reconstruct the data matrix. More details are described in the following algorithm, which generates the k hidden variables corresponding to the traffic data of n road segments over a time window of size T.

Algorithm 1. PCA Pattern Discovery
Given: window size T, number of hidden variables k. After receiving every set of T streaming data vectors, we
1. organize the data into an $n \times T$ matrix, i.e., $X_T \in \mathbb{R}^{n \times T}$,
2. normalize $X_T$,
3. calculate the k eigenvectors corresponding to the k largest eigenvalues of the sample covariance matrix of $X_T$, i.e., $w_1, \ldots, w_k$,
4. compute the k hidden variables $z_t = W^T x_t$, where $x_t$ is the t-th column of $X_T$, and
5. reconstruct the data matrix $\hat{X}_T = WZ$, where $Z = [z_1, \ldots, z_T]$.
$W = [w_1, \ldots, w_k]$ is called the weight matrix. For each element $w_{i,j}$, the magnitude $|w_{i,j}|$ provides some indication of how much the i-th segment depends on the j-th hidden variable.(4)
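The steps of Algorithm 1 can be illustrated with a short NumPy sketch. It is not the authors' implementation: the function name `pca_window`, the choice of mean-centering as the normalization in step 2, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def pca_window(X, k):
    """Offline PCA on one window. X has shape (n, T): n segments, T time steps."""
    # Step 2 (assumed form): remove each segment's mean over the window.
    mean = X.mean(axis=1, keepdims=True)
    Xc = X - mean
    # Step 3: eigen-decomposition of the symmetric n x n matrix Xc Xc^T.
    eigvals, eigvecs = np.linalg.eigh(Xc @ Xc.T)
    # eigh returns eigenvalues in ascending order; keep the k largest.
    order = np.argsort(eigvals)[::-1][:k]
    W = eigvecs[:, order]      # weight matrix, shape (n, k)
    Z = W.T @ Xc               # step 4: hidden variables, shape (k, T)
    X_hat = W @ Z + mean       # step 5: reconstruction in original units
    return W, Z, X_hat

# Synthetic stand-in for speed data: 48 segments driven by 2 patterns plus noise.
rng = np.random.default_rng(0)
patterns = rng.normal(size=(2, 30))
mixing = rng.normal(size=(48, 2))
X = 55.0 + mixing @ patterns + 0.01 * rng.normal(size=(48, 30))

W, Z, X_hat = pca_window(X, k=2)
mse = float(np.mean((X - X_hat) ** 2))
```

Because the synthetic data are generated from two patterns, k = 2 reconstructs them almost exactly; real speed data would leave a larger residual.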

Online projection approximation subspace tracking with deflation (PASTd) pattern discovery
The previous PCA algorithm requires the buffering of the data for every time window and a significant amount of computation, namely, computing the first few eigenvectors of the sample covariance matrix, which can be fairly large. The PASTd algorithm, which is based on adaptive filtering techniques and PCA, is an online method that updates the hidden variables and weight matrix incrementally in linear time. The PASTd algorithm has been shown to perform very well in various settings and different applications, such as signal tracking for antenna arrays and image compression.
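The PASTd updates (Algorithm 2) proceed as follows.
1. As each point $x_t$ arrives, set $\hat{x}_1 = x_t$.
2. For $1 \le i \le k$, we perform the assignments and updates of the i-th hidden variable, the eigenvalue estimate $d_i(t)$, the error $e_i(t)$, and the eigenvector estimate $w_i(t)$.
Algorithm 2 enables the explicit computation of eigencomponents. In fact, $w_i(t)$ is an estimate of the i-th eigenvector at time step t, and $d_i(t)$ is an estimate of the corresponding eigenvalue. These eigenvalues may be used to estimate the number of hidden variables k, if it is not given. The use of the forgetting factor $0 < \gamma \le 1$ is intended to ensure that the data matrix $S_t$ is more dependent on the most recent data. Since the traffic data are nonstationary, $\gamma$ can guarantee the tracking ability and will give more precise estimates of the eigencomponents. The vector $e_i(t)$ is the error between the true data and the reconstruction, and $e_i(t) \perp w_i(t)$. The step for updating the eigenvector $w_i(t)$ can be interpreted as a gradient descent method with a self-tuning step size $d_i(t)$.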
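As a rough illustration of the update loop described above, the following sketch implements one PASTd step in the form commonly given in the subspace-tracking literature. The exact equations used by the authors are not reproduced in this text, so the update order, the step size $1/d_i$, the default value of `gamma`, and the initial values of `W` and `d` should all be read as assumptions.

```python
import numpy as np

def pastd_update(x, W, d, gamma=0.96):
    """One PASTd step for a new data vector x of length n.

    W: (n, k) current eigenvector estimates, updated in place.
    d: (k,) current eigenvalue (energy) estimates, updated in place.
    Returns the k hidden variables for this time step.
    """
    k = W.shape[1]
    z = np.zeros(k)
    residual = np.array(x, dtype=float)        # x_hat_1 = x_t (copy of the input)
    for i in range(k):
        z[i] = W[:, i] @ residual              # project onto the current w_i
        d[i] = gamma * d[i] + z[i] ** 2        # exponentially weighted energy
        e = residual - z[i] * W[:, i]          # reconstruction error e_i(t)
        W[:, i] += (z[i] / d[i]) * e           # gradient-like step of size 1/d_i
        residual = residual - z[i] * W[:, i]   # deflation with the updated w_i
    return z

# Synthetic check: data generated along one fixed direction plus small noise.
rng = np.random.default_rng(1)
a = np.array([1.0, 2.0, 2.0, 4.0, 0.0]) / 5.0  # true unit-norm pattern
W = np.eye(5)[:, :1].copy()                    # initial guess for w_1 (assumed)
d = np.ones(1)                                 # initial energy estimate (assumed)
for _ in range(2000):
    x = rng.normal() * a + 0.01 * rng.normal(size=5)
    z = pastd_update(x, W, d)
cosine = abs(W[:, 0] @ a) / np.linalg.norm(W[:, 0])
```

Each step touches every entry of `W` once, so the cost per time step is O(nk), with no buffering of past data vectors.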
In the next section, instead of predicting average speeds, we predict hidden variables. Thus, significant amounts of computational time and energy could be saved. With PASTd, prediction proceeds through the hidden variables: the forecasted speed vector $\hat{x}(t+1)$ is reconstructed from the forecasted hidden variables $f(t)$ using the weight matrix, following the model $x_t = W z_t$.

Short-term Forecasting for Traffic Flow Data
In this section, we start by giving a brief introduction to the k-nearest-neighbor (KNN) method. Then, we show how to apply the KNN method to forecasting the hidden variables over the next brief time horizon. In our work, we found that, for each day of the week, the hidden variables follow similar patterns. Thus, we can forecast the hidden variables for the (l + 1)-th Wednesday by using the hidden variables of the past l Wednesdays.

KNN method
The KNN method collects historical data as the sample database. In our case, a k-dimensional vector $z_t = [z_{1,t}, \ldots, z_{k,t}]$ is stored, where $z_{i,t}$ is the i-th hidden variable for time step t. Then, the Euclidean distances between all sample points and the current data are calculated to find the knn nearest neighbors. Finally, future hidden variables are forecasted by using a weighted average of these nearest neighbors.
The KNN method is presented in Algorithm 3. To deal with missing vehicle speed data, we use the value at the previous time step or the average of the two previous time steps. Thus, we have 1440 time steps for each day. We note that $h_f$ is the forecasting horizon, while $h_p$ is the past horizon.

Algorithm 3. KNN Method
0. Initialize: w equal to the values from the last sample data.
1. At time step t during the (l + 1)-th week, collect the l historical data $z_{t-h_f}(j), \ldots, z_{t-1}(j)$, $1 \le j \le l$.
2. Compute the Euclidean distances between $z_{i,t-h_p}(l+1), \ldots, z_{i,t-1}(l+1)$ and the l historical data.
3. Find the knn nearest neighbors with the knn shortest distances $d_1, \ldots, d_{knn}$; the corresponding weeks are $l_1, \ldots, l_{knn}$.
4. Forecast the vehicle speeds from the weighted average of the hidden variables of these nearest neighbors.
5. Update w, d using the PASTd method and go to the next time step.
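The neighbor search and weighted average of Algorithm 3 can be sketched as follows for a single hidden variable (k = 1). The function name `knn_forecast`, the indexing convention, and the inverse-distance weights are assumptions; the text states only that a weighted average of the nearest neighbors is used.

```python
import math

def knn_forecast(history, current, t, h_p=1, h_f=1, knn=1, eps=1e-9):
    """Forecast the hidden variable at step t + h_f from steps t-h_p+1..t.

    history: list of l past-week series (same weekday), indexable by time step.
    current: the series observed so far in the current week (at least t + 1 values).
    """
    window = current[t - h_p + 1 : t + 1]
    scored = []
    for week in history:
        past = week[t - h_p + 1 : t + 1]
        # Euclidean distance between the current window and this week's window.
        scored.append((math.dist(window, past), week[t + h_f]))
    scored.sort(key=lambda pair: pair[0])
    nearest = scored[:knn]
    # Inverse-distance weights (an assumed choice); eps avoids division by zero.
    weights = [1.0 / (dist + eps) for dist, _ in nearest]
    return sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)

# Three hypothetical past weeks and a partially observed current week.
history = [
    [10, 12, 15, 20, 18],
    [30, 31, 29, 28, 27],
    [11, 13, 14, 19, 17],
]
current = [10.5, 12.5, 14.5]
forecast = knn_forecast(history, current, t=2, h_p=2, h_f=1, knn=2)
```

In this toy example the first and third weeks are equally close to the current window, so the forecast is the plain average of their values one step ahead.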

Forecasting algorithm
Combining the KNN method with the PASTd algorithm, our complete forecasting algorithm is shown in Algorithm 4.

Algorithm 4. Forecasting Algorithm
0. Initialize: k, knn, $h_p$, $h_f$, l.
1. For each time t, receive the speed $x_t$.
2. Compute the corresponding hidden variables z(t) by the PASTd algorithm.
3. Collect the hidden variables into a matrix for each day for l consecutive weeks as historical data.
4. Forecast the hidden variable for the (l + 1)-th week using the KNN method.
5. Forecast $x_{t+h_f}$ through the hidden variables generated at the last step.
6. Compute the error between forecasted and actual speeds.
Updating sample data. After the (l + 1)-th week, the sample data should be updated to forecast the speeds during the (l + 2)-th week. In our algorithm, we discard the sample data of the first week and add the data corresponding to the (l + 1)-th week. Thus, the space needed to store historical data is fixed.
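The fixed-size sample store described above behaves like a sliding window over weeks. A minimal sketch, assuming a hypothetical value l = 3 and placeholder dictionaries in place of real hidden-variable matrices:

```python
from collections import deque

l = 3  # number of historical weeks kept (hypothetical value)
sample_weeks = deque(maxlen=l)

for week_id in range(1, 6):
    week_data = {"week": week_id}   # placeholder for that week's hidden variables
    sample_weeks.append(week_data)  # once full, the oldest week is dropped automatically

kept = [w["week"] for w in sample_weeks]
```

After five weeks have been appended, only the three most recent remain, so storage does not grow with time.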

Reconstruction results
In this section, we show the reconstruction results obtained by using both the classical PCA and online PASTd methods. We then compare these two methods in terms of accuracy as well as time efficiency. The data we used are vehicle probe project (VPP) data provided by the RITIS system. In our tests, we chose n = 48 road segments over a whole week, which amounts to 7 × 24 × 60 = 10080 vectors, each of dimension 48. These 48 segments were randomly chosen from all the road segments of the State of Maryland. The days are ordered from Sunday through Saturday. If any data are missing, we use the data of the previous time step to fill in the gap.
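The gap-filling rule can be sketched in a few lines. Representing a missing reading as `None` is an assumption made for this example; the source does not say how missing values are encoded.

```python
def fill_missing(speeds):
    """Replace None entries with the most recent observed speed."""
    filled, last = [], None
    for value in speeds:
        if value is None:
            value = last  # stays None if no earlier observation exists
        filled.append(value)
        last = value
    return filled

filled = fill_missing([60.0, None, None, 55.0, None])
```

A leading gap has no previous value to copy, so it is left as `None` here and would need separate handling.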

PCA performance
In our test, the time window size is selected to be T = 30, meaning that we buffer the data and compute the covariance matrix every 30 min. We use k = 2 hidden variables to reconstruct the data matrix within each time window.
In Fig. 2(a), the blue line shows the original data, while the red line corresponds to the reconstructed data. As we can see, our reconstruction captures the largest statistical variance and has a small deviation from the true data. Figure 2(b) shows the first two hidden variables for one window of size T = 30. Note that these are the hidden variables for the normalized data. Thus, the y-axis value illustrates the deviation from the mean. Although this figure represents the patterns for only one time window, we find that the hidden variables for other windows follow similar patterns. Owing to such characteristics, we are able to make predictions for hidden variables.
Figures 3(a) and 3(b) show how changing the values of the parameters affects the performance. In our test, we vary the time window size T from 15 to 60 min and the number of hidden variables from two to five. Here, we use the mean square error (MSE) to measure the accuracy and the CPU runtime to measure the time efficiency. In Fig. 3(a), the smaller the window size and the larger the number of hidden variables, the better the accuracy. In Fig. 3(b), the runtime is in seconds. As we can see, the larger the window size and the smaller the number of hidden variables, the faster the algorithm, which is to be expected. Thus, there is a trade-off between the reconstruction error and the runtime. The results show that PCA is suitable for traffic flow data reconstruction.

PASTd performance
For PASTd, we use k = 2 and update the two hidden variables and the weight matrix at every time step. In Fig. 4(a), the blue line represents the original data and the red line shows the reconstructed data. We can barely see blue lines in our figures, meaning that our reconstruction is basically the same as the original data. Figure 4(b) illustrates the time series of the two hidden variables determined over a week. As we can see from the time series of the first hidden variable, the five weekdays show similar patterns, while weekend days show different patterns. This can be explained by the fact that weekdays have obvious rush hours, while weekends do not necessarily follow that pattern. Also, the first hidden variable captures the largest variance and hence there will be significant differences between weekdays and weekends.
In PASTd, we update the weight matrix as well as the hidden variables incrementally at each time step. The only parameter that matters is the number of hidden variables k. As in the previous subsection, we test the performance of PASTd in terms of reconstruction error and time efficiency as functions of k.
Figures 5(a) and 5(b) show how changing the values of the parameters affects the performance. As we can see, by using a large number of hidden variables, we can obtain a high accuracy, while a small number of hidden variables leads to a short runtime. We note that even for k = 2, the error is less than 0.3 mph, while for k = 5, the runtime is less than 1.5 s over a week. The results show the high competence and robustness of the PASTd method.
We compare the performance of PCA and PASTd in terms of accuracy and time efficiency. For accuracy, online PASTd outperforms classical PCA, probably because PCA captures only the patterns in a single time window, while PASTd incrementally updates the patterns taking into consideration the overall past, with more weight assigned to the most recent data. Another reason might be that PCA is sensitive to the window size T. For time efficiency, PASTd outperforms PCA again because PASTd has a linear complexity O(n), while PCA has at least $O(n^3)$. When n increases, the difference will be substantially larger. Thus, we choose the PASTd method for real-time traffic pattern discovery.

Forecasting results
In this section, we show the forecasting results of our proposed algorithm. We choose the MSE and the mean absolute percentage error (MAPE) as the performance measures.
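For reference, the two error measures can be computed as in the sketch below. These are the textbook definitions, which are assumed (not confirmed by the text) to match the ones used in the experiments; the sample values are made up.

```python
def mse(actual, forecast):
    """Mean square error between two equal-length sequences."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)

def mape(actual, forecast):
    """Mean absolute percentage error in percent; assumes no actual value is zero."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

actual = [50.0, 60.0, 40.0]    # hypothetical speeds in mph
forecast = [55.0, 57.0, 40.0]
err_mse = mse(actual, forecast)
err_mape = mape(actual, forecast)
```

MSE is expressed in squared speed units, whereas MAPE is relative, so a fixed error in mph weighs more heavily on slow segments.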
The data used to evaluate the performance are the vehicle speeds collected in Baltimore County, Maryland, which contains 1751 road segments. We set the speeds during the first 40 weeks in 2014 as the sample data and try to forecast the speeds for the 41st week. To obtain the highest time and space efficiency, we choose k = 1, knn = 1, and $h_p$ = 1, and test the performance by varying the forecasting horizon $h_f$.
Figure 6 shows the forecasting results with $h_f$ = 60 for the first road segment, where the blue line represents the actual speeds, while the red line corresponds to the forecasted speeds. As we can see, our forecasting algorithm captures the largest statistical variance and has a small deviation from the true data. Table 1 provides more specific performance characteristics at various forecasting horizons. When $h_f$ increases, MSE and MAPE do not increase rapidly, verifying the effectiveness and robustness of our forecasting method.
To further show the advantage of our forecasting algorithm, we compare our method with the historical mean method, which also reduces the data dimension to 1. Figure 7(a) shows the forecasting results of our method, while Fig. 7(b) shows those of the historical mean method. As we can see, the historical mean method can barely capture the statistical variance. Moreover, its MSE is 45.60, which is much larger than ours.
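The baseline is not specified in detail here. One common reading of a historical mean forecast, assumed in this sketch, is to average the speeds observed at the same time step over the past weeks; the function name and the sample values are hypothetical.

```python
def historical_mean_forecast(past_weeks):
    """Average the same time step across past weeks (one list per week)."""
    n_steps = len(past_weeks[0])
    n_weeks = len(past_weeks)
    return [sum(week[t] for week in past_weeks) / n_weeks for t in range(n_steps)]

baseline = historical_mean_forecast([[60.0, 40.0], [50.0, 30.0]])
```

Under this reading the forecast for a given time of week never reacts to current conditions, which is consistent with its failure to follow the observed variance.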

Robustness test and outlier detection
The top figure in Fig. 8(a) corresponds to the road segment with the largest MSE, while the bottom figure corresponds to the one with the smallest MSE. A road segment whose speed changes frequently may lead to a larger MSE. Note that, even in the worst case, our method can still capture the speed changes and make accurate predictions very quickly. Similarly, the top figure in Fig. 8(b) corresponds to the road segment with the largest MAPE, while the bottom figure corresponds to the one with the smallest MAPE. The results are consistent with MSE. We note that, in both worst cases of MSE and MAPE, the vehicle speeds change too frequently to be realistic. It is reasonable to suspect that the sensors covering these two road segments are malfunctioning. As a result, our forecasting algorithm helps detect outliers as well.

Conclusions
In this study, we made short-term real-time predictions of traffic flow data for Baltimore, Maryland, US. We used VPP data from the RITIS system, which are real-world data. PCA is used to derive core traffic patterns from streams of traffic data on a large number of road segments. Furthermore, a more efficient online method, called the PASTd algorithm, is implemented to reduce the data dimension. We use the KNN method to predict the hidden variables. As a result, we are able to forecast the speeds for all the road segments with linear complexity. Our method aims at network-level prediction, regardless of freeway or urban road segments. As far as we know, our method is the first traffic flow data prediction scheme that meets the following requirements: scalability, linear complexity, and no need for data buffering. It also overcomes the restriction of both time and space.

Fig. 2. (Color online) (a) PCA reconstruction using k = 2 patterns for one road segment and (b) hidden variables for one window.

Fig. 7. (Color online) Forecasting results for the first 9 road segments using (a) our method and (b) historical mean.
8(b) corresponds to the road segment with the largest MAPE, while the bottom figure with the smallest MAPE.The results are consistent with MSE.
Algorithm 2 enables the explicit computation of eigencomponents.In fact, w i (t) is an estimate of the i-th eigenvector at time step t, and d i (t) is an estimate of the corresponding

Table 1
provides more specific performance characteristics at