Incremental Clustering Method for Recognizing User Destinations and Routes Using Smartphone Global Positioning System Sensor

An effective user route identification system is a major technological capability that mobile intelligent systems can exploit. To develop such a system, data that include information about user destinations and routes are required. In this paper, we propose a method that incrementally recognizes user destinations and routes by means of logs collected from a global positioning system (GPS) sensor. Applying clustering methods to detect destinations incrementally requires defined thresholds and a strategy of cluster creation. In addition, to identify routes, a procedure for dividing GPS logs into trajectories and a function for calculating the similarity between trajectories are necessary. In this paper, we address these requirements.


Introduction
Personal route information is critical for mobile intelligent systems. Many studies have been conducted on user route prediction and tracking based on mobile sensor data. (1-4) Raw sensor data are simple numerical values and do not carry the semantic meaning that mobile intelligent systems often require. Therefore, systems that identify the personal routes of mobile users from sensor data are necessary. To develop such systems, sample data on the destinations and personal routes of mobile users must be included in the design process.
In this paper, we propose several practical approaches to identifying user destinations and travel routes based on global positioning system (GPS) data from smartphones. To enable our system to recognize destinations and routes in real time, we adopt density-based clustering and an algorithm for calculating the similarity between trajectories represented by GPS coordinates. Destinations differ among individuals. Therefore, the thresholds used for density-based clustering (minimum radius of the neighborhood and minimum number of points) are determined dynamically for each user. In addition, we define incremental creation and merge methods for updating destinations and routes. Because GPS signal reception indoors is impossible and because most GPS coordinates are inaccurate, user movements are often misidentified. To address this when identifying and updating user routes, we employ a function that calculates the distance between two streets.

Recognizing initial user destinations
First, we find the coordinates at which a mobile user remains for more than 10 min in an initial GPS log, on the assumption that a person stays at a destination for more than 10 min. The coordinates differ slightly each time a user revisits a location. Therefore, we merged the coordinates representing a revisited location using density-based spatial clustering of applications with noise (DBSCAN). DBSCAN refines the destinations using two thresholds that determine the density of clusters: the minimum radius of the neighborhood, Eps, and the minimum number of points within Eps, MinPts. When a large Eps is set, several destinations are clustered into a single destination. Conversely, when a small Eps is set, the actual destinations are clustered separately. When we analyzed the GPS logs collected from the participants in our experiments, the distances between destinations differed according to the travel patterns of the participants. Thus, the system must automatically determine an appropriate Eps for each user. Table 1 provides a means of determining Eps automatically: starting from the general MinPts, the number of clusters is checked while Eps is reduced. As this process is repeated, the number of clusters increases. However, when Eps falls below the clustering criterion, the number of clusters decreases. Therefore, Eps is determined on the basis of this process in order to reduce the number of clusters.
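The Eps search described above can be illustrated with a minimal Python sketch. It is not the procedure of Table 1, which is not reproduced here. The stopping rule (keep the largest Eps at which the cluster count last increased, and stop as soon as the count drops), the parameters `start_eps`, `step`, and `floor`, and all function names are assumptions made for illustration. A small pure-Python DBSCAN over haversine distances is included so that the example is self-contained.

```python
import math

def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000.0 * math.asin(math.sqrt(a))

def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point (-1 marks noise)."""
    labels = [None] * len(points)

    def neighbours(i):
        return [j for j, q in enumerate(points) if haversine_m(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1               # provisional noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster      # noise becomes a border point
            if labels[j] is not None:
                continue                 # already assigned, do not expand again
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:       # core point: keep growing the cluster
                queue.extend(nb)
    return labels

def count_clusters(labels):
    return len({label for label in labels if label != -1})

def choose_eps(points, min_pts, start_eps=200.0, step=10.0, floor=10.0):
    """Shrink Eps step by step (assumed rule); stop when the cluster count drops."""
    best_eps = start_eps
    best_count = count_clusters(dbscan(points, start_eps, min_pts))
    eps = start_eps - step
    while eps >= floor:
        count = count_clusters(dbscan(points, eps, min_pts))
        if count < best_count:
            break                        # Eps fell below the clustering criterion
        if count > best_count:
            best_eps, best_count = eps, count
        eps -= step
    return best_eps, best_count

# Two hypothetical stay locations roughly 115 m apart, three visits each.
stays = [(37.0, 127.0), (37.00001, 127.0), (37.0, 127.00001),
         (37.00105, 127.0), (37.00106, 127.0), (37.00105, 127.00001)]
eps, n_clusters = choose_eps(stays, min_pts=3)
```

The neighbour search is quadratic in the number of stay points, which is acceptable for a per-user list of stays but would need a spatial index for larger inputs.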
To determine the general MinPts, we first assume that most people visit common places such as their homes or offices every day. In addition, because a smartphone's GPS signal is not always detected, the probability that the system fails to obtain a GPS fix is reflected in a certainty factor. The general MinPts is given by eq. (1), where α denotes the certainty factor and n denotes the total number of days on which data were collected. A destination is represented as the center point of each cluster.

Recognizing initial user routes
The algorithmic procedure for recognizing initial routes consists of five steps. The first step is detecting candidate trip switch points (TSPs). TSPs are coordinates at which the user changes from a natural mode of transportation or travel behavior to an artificial (automatic) mode of transportation, or the reverse. The detected candidate TSPs are refined into TSPs by DBSCAN. The second step involves dividing the coordinate sequences of the initial GPS log into trips based on user destinations and TSPs. Each trip includes the sequence of streets on which the user traveled while using the same mode of transportation. The third step involves classifying trips with the same starting and finishing points. The fourth step involves generating trip clusters (i.e., sets of similar trips) after calculating the similarity of trips within each class. The final step involves identifying routes by arranging trip sequences based on the initial GPS log and trip clusters. Trip clustering is the most critical procedure used to recognize routes. Table 2 describes the steps involved in trip clustering.
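The first two steps can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: each log record is assumed to be a dictionary with a hypothetical `mode` label and a `dest` field that holds a destination identifier while the user is at a recognized destination (and `None` otherwise). The mode labels in `NATURAL` are invented for the example, and the DBSCAN refinement of candidate TSPs is omitted.

```python
# Hypothetical labels for natural (non-motorized) travel behaviors.
NATURAL = {"walk", "run", "still"}

def is_natural(mode):
    return mode in NATURAL

def candidate_tsps(log):
    """Indices at which the mode switches between natural and artificial."""
    return [i for i in range(1, len(log))
            if is_natural(log[i]["mode"]) != is_natural(log[i - 1]["mode"])]

def split_into_trips(log):
    """Cut the log into trips at destinations and at candidate TSPs."""
    cuts = set(candidate_tsps(log))
    trips, current = [], []
    for i, record in enumerate(log):
        if record["dest"] is not None:
            # The user is at a destination: close any open trip and skip the record.
            if current:
                trips.append(current)
                current = []
            continue
        if i in cuts and current:
            # Mode switch: the previous trip ends here and a new one starts.
            trips.append(current)
            current = []
        current.append(record)
    if current:
        trips.append(current)
    return trips

log = [
    {"mode": "still", "dest": "home"},
    {"mode": "walk", "dest": None},
    {"mode": "walk", "dest": None},
    {"mode": "vehicle", "dest": None},
    {"mode": "vehicle", "dest": None},
    {"mode": "walk", "dest": None},
    {"mode": "still", "dest": "office"},
]
trips = split_into_trips(log)
```

Each resulting trip contains records from a single mode category, which corresponds to the requirement that a trip covers travel with the same mode of transportation.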
Function 1 is used to calculate the similarity between a particular cluster C_n and a trip TR_j. The similarity is computed from the distances between the streets of C_n and TR_j. To calculate the distance between two streets, we adopted a method proposed by Froehlich and Krumm. (5) Given s_i (a street in a nonclustered trip TR_j) and s_j (a street in a cluster C_n), eq. (2) determines the distance between the two streets (Fig. 1).
Table 2
Algorithm and function for recognizing initial routes.
Algorithm 3 - trip clustering
  Input: trip set {TR_1, TR_2, ..., TR_j}
  Output: trip clusters {C_1, C_2, ..., C_n}
  Create a new trip cluster for TR_j
Function 1 - sim(C_n, TR_j): computing the similarity of a particular C_n with TR_j
  δ(s_n^{TR_j}, S_{C_n}): the closest distance between s_n^{TR_j} and S_{C_n}
  s_n^{TR_j}: the nth street of TR_j
  S_{C_n}: the street set in C_n
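The roles of Function 1 and Algorithm 3 can be illustrated with a short Python sketch. Several parts are assumptions rather than the paper's definitions: `street_distance` is a stand-in (planar distance between segment midpoints) and not the Froehlich and Krumm measure of eq. (2); `dissimilarity` averages the closest distance δ over the streets of the trip, which is only one plausible way to combine the δ terms; and the greedy assignment with a fixed `threshold` is hypothetical. Streets are assumed to be line segments given as pairs of (x, y) points in metres.

```python
import math

def street_distance(s, t):
    """Stand-in street distance: planar distance between segment midpoints."""
    (ax, ay), (bx, by) = s
    (cx, cy), (dx, dy) = t
    return math.hypot((ax + bx) / 2 - (cx + dx) / 2,
                      (ay + by) / 2 - (cy + dy) / 2)

def closest_distance(street, street_set):
    """delta(s, S): smallest distance from one street to any street in the set."""
    return min(street_distance(street, other) for other in street_set)

def dissimilarity(cluster_streets, trip):
    """Assumed combination: mean closest distance over the streets of the trip."""
    return sum(closest_distance(s, cluster_streets) for s in trip) / len(trip)

def cluster_trips(trips, threshold):
    """Greedy trip clustering: join the closest cluster or create a new one."""
    clusters = []  # each cluster: {"trips": [...], "streets": [...]}
    for trip in trips:
        best = min(clusters,
                   key=lambda c: dissimilarity(c["streets"], trip),
                   default=None)
        if best is not None and dissimilarity(best["streets"], trip) <= threshold:
            best["trips"].append(trip)
            for street in trip:
                if street not in best["streets"]:
                    best["streets"].append(street)
        else:
            # No sufficiently similar cluster: create a new trip cluster.
            clusters.append({"trips": [trip], "streets": list(trip)})
    return clusters

trip_a = [((0, 0), (100, 0)), ((100, 0), (200, 0))]
trip_b = [((0, 5), (100, 5)), ((100, 5), (200, 5))]   # same road, GPS offset
trip_c = [((0, 1000), (100, 1000))]                   # a different road
clusters = cluster_trips([trip_a, trip_b, trip_c], threshold=50.0)
```

Because the measure runs only from the trip to the cluster, a short trip that covers part of a longer clustered trip is treated as similar; a symmetric variant would need the reverse direction as well.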

Updating user destinations and routes
Because a user may not always travel along recognized routes to destinations, user destinations and routes must be updated in real time from sequences of detected GPS coordinates. Therefore, the current destination must either be merged with an existing cluster of destinations or form a new cluster. Figure 2 shows the procedure for updating clusters of destinations.
If no coordinates in the existing clusters lie within Eps of the current destination p, a new cluster containing p is created and its cluster level is set to 3. If a coordinate p_i lies within Eps of p, the number n of coordinates, including p, within Eps of p_i is counted. When n satisfies MinPts, p is merged into the cluster C_i that includes p_i. Otherwise, a new cluster is created from p and its cluster level is set to 3. Finally, the level of each cluster is adjusted according to n. (If n is more than General MinPts, the level is 1; if n is less than 25% of General MinPts, the level is 2; and if n is greater than 1, the level is 3.)
To update routes, we select a representative trip t_i from the trips in each trip cluster T_n and compute the similarity between t_i and the trip t_n, which is refined from the sequence of newly collected GPS data. Because a trip consists of a series of streets, t_i is the trip that shares the greatest number of similar streets with all the other trips in T_n. If the similarity value of t_i and t_n is less than a given threshold, a new trip cluster is created from t_n, and a new route is then composed from the sequence of newly collected GPS coordinates.
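The merge-or-create rule for destinations described in this section can be sketched as follows. The sketch is an illustration under assumptions, not the procedure of Fig. 2: coordinates are assumed to be already projected to planar (x, y) metres, the cluster dictionaries and the function name are hypothetical, every stored coordinate within Eps of p is tried before a new cluster is created, and the final level adjustment is left out.

```python
import math

def update_destinations(clusters, p, eps, min_pts):
    """Merge p into an existing destination cluster or create a new one.

    clusters: list of {"points": [(x, y), ...], "level": int}.
    Returns the cluster that received p.
    """
    for cluster in clusters:
        for p_i in cluster["points"]:
            if math.dist(p, p_i) > eps:
                continue
            # n counts p itself plus every stored coordinate within Eps of p_i.
            n = 1 + sum(1 for c in clusters for q in c["points"]
                        if math.dist(p_i, q) <= eps)
            if n >= min_pts:
                cluster["points"].append(p)
                return cluster
    # No neighbour satisfied MinPts: p starts a new cluster at level 3.
    new_cluster = {"points": [p], "level": 3}
    clusters.append(new_cluster)
    return new_cluster

destinations = [{"points": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)], "level": 1}]
merged = update_destinations(destinations, (0.5, 0.5), eps=5.0, min_pts=3)
created = update_destinations(destinations, (100.0, 100.0), eps=5.0, min_pts=3)
sparse = update_destinations(destinations, (101.0, 100.0), eps=5.0, min_pts=3)
```

In the usage lines, the first point joins the existing dense cluster, the second is far from everything and starts a new cluster, and the third has only one neighbour within Eps and therefore also starts its own cluster.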

Results and Discussion
The GPS logs used in our experiments were collected from the Android smartphones of five participants. The time, GPS coordinates, travel behaviors, (6) and streets were recorded in the log. The travel behaviors were identified with the accelerometer-based method of ref. 6; we use only the results of that method. First, we evaluated the efficiency of the automatically calculated Eps and the accuracy of the incrementally recognized destinations. To evaluate the accuracy of the recognized destinations, we visualized the clusters of destinations using Google Maps, and the participants verified the coordinates of their own destinations. Table 3 shows the data for the collected logs and the numbers of destinations correctly recognized using the Eps calculated by the proposed algorithm and using two fixed Eps values chosen according to general GPS errors (25 and 50 m).
Second, we evaluated the accuracy of the recognized routes by considering their robustness and completeness. The robustness was based on routes in a particular route cluster all having the same travel trajectory. The completeness was based on the same travel trajectories in the GPS logs forming a single route. Table 4 shows the robustness and completeness of the routes incrementally identified from the GPS logs of each participant.
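One possible reading of these two measures is sketched below. The exact formulas behind Table 4 are not given, so the definitions here are assumptions: robustness is taken as the percentage of route clusters whose routes all share one verified trajectory, and completeness as the percentage of verified trajectories whose routes all fall into a single cluster. The input format and the trajectory names are hypothetical.

```python
from collections import defaultdict

def robustness_and_completeness(assignments):
    """assignments: iterable of (route_cluster_id, verified_trajectory_id) pairs."""
    by_cluster = defaultdict(set)   # cluster -> verified trajectories it contains
    by_truth = defaultdict(set)     # verified trajectory -> clusters it landed in
    for cluster_id, truth_id in assignments:
        by_cluster[cluster_id].add(truth_id)
        by_truth[truth_id].add(cluster_id)
    # A cluster is counted as robust when it holds exactly one trajectory.
    robustness = 100.0 * sum(len(t) == 1 for t in by_cluster.values()) / len(by_cluster)
    # A trajectory is counted as complete when it maps to exactly one cluster.
    completeness = 100.0 * sum(len(c) == 1 for c in by_truth.values()) / len(by_truth)
    return robustness, completeness

routes = [
    ("c1", "home-office"), ("c1", "home-office"),
    ("c2", "home-gym"), ("c3", "home-gym"),      # one trajectory split in two
    ("c4", "office-cafe"), ("c4", "office-cafe"),
]
robustness, completeness = robustness_and_completeness(routes)
```

In this example every cluster is pure, while one of the three trajectories is split across two clusters, so the two measures differ.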
From experiments conducted with five smartphone users, the average accuracy of incrementally recognized user destinations was 91.4%, the robustness of identified routes was 94.8%, and the completeness was 89.8%. These results show that the suggested approach is efficient and practical.

Conclusions
We proposed a practical approach for recognizing personal destinations and routes using a smartphone GPS sensor. To improve the efficiency of the system, we focused on three major areas of concern: (1) recognition of user destinations by clustering with an algorithm that identifies a suitable Eps, (2) identification of user routes through trip clustering based on the distance between two streets, and (3) incremental updating of destinations and routes based on sequences of GPS coordinates obtained in real time.
The experimental results (an average destination accuracy of 91.4%, route robustness of 94.8%, and route completeness of 89.8% for five smartphone users) indicate that the suggested approach is practical and can be adopted for constructing a personal route model for location-based services.