Spatial Clustering of Seoul’s Elderly Captive Riders Using Smart Card Spatial Autocorrelation Analysis

Smart card transactions contain user information and travel patterns. In this study, elderly smart card transactions were analyzed to identify the hot spots of elderly captive riders, where appropriate social services are needed. The spatial autocorrelation of smart card big data has received little attention in the development of new traffic policies. Therefore, spatial autocorrelation analysis was performed using six weeks of Seoul’s smart card data. In the collected data, 76.3% of elderly trips were concentrated on subways, which the elderly can ride free of charge. For this reason, we examined elderly captive bus riders. Moran’s I was 0.277 for the elderly smart card transactions, indicating positive spatial autocorrelation at the 0.01 significance level. Local indicators of spatial association (LISA) analysis was used to locate the spatially autocorrelated areas. Fifty administrative units (dongs) in Seoul were identified as hot spots, confirming spatial clustering, and 61 dongs were identified as cold spots. The distributions of hot spots and cold spots appear to be more closely related to the subway supply level than to the elderly population. Twenty-eight hot spots have no subway service and are therefore in particular need of appropriate social services for elderly bus users. First, barrier-free bus stops should be installed at the 28 hot spots. Second, bus lines that pass through the 28 hot spots should have high priority when low-floor buses are supplied. Third, a low-floor bus shuttle service from/to the 28 hot spots is proposed on the basis of the top nine origins and destinations of the elderly. Smart card spatial autocorrelation analysis can thus be used to propose advanced public transportation policies for the elderly.


Introduction
Metropolitan cities around the world have complex public transportation networks. Most of them use public transportation smart card systems and collect big data from them. These big data can be used to customize public transportation services for users. Although smart card data contain various kinds of user behavior information (e.g., user class, ride station ID, alight station ID, ride time, alight time, and number of transfers), most smart card data analyses have been limited to simple descriptive statistics. (1)(2)(3)(4) The proper utilization of smart card data can help in developing customized public transportation policies for each user group. However, previous researchers have rarely examined the spatial autocorrelation of smart card big data. In this study, we use smart card big data to identify transportation services suited to elderly riders.
Metropolitan cities around the world are fast becoming aging societies. According to Statistics Korea, (5) 20% of Koreans will be elderly (>65 years old) in 2026, and more than 30% may be elderly by 2037. Korea is aging the fastest among the Organization for Economic Cooperation and Development (OECD) countries. Thus, public transportation services for the elderly should be prepared in advance by analyzing their actual travel patterns. Many local governments are interested in maximizing public transportation services for the elderly within a limited budget. Metropolitan cities already invest a considerable portion of their budgets in the travel convenience of the elderly. For example, Seoul, the capital city of Korea, invests 20-30 million dollars every year to subsidize low-floor buses and free subway rides for elderly citizens. As a result, 76.3% of elderly travel in Seoul is concentrated on subways, which the elderly can ride free of charge. Nevertheless, some elderly people still ride the bus because their home or destination is not located near a subway station. Proper traffic services are needed for these elderly captive bus riders.
Previous traffic policies have been based on social statistics such as resident populations. However, such statistics do not reflect the actual travel demand. Thus, in this study, we analyzed elderly travel patterns using smart card data from the Seoul metropolitan area. Through spatial autocorrelation analysis, we found the hot spots and cold spots of elderly travel in Seoul. In contrast to previous studies, the method used in this study can support on-demand services on the basis of actual travel patterns. On this basis, we suggest some public transit services for elderly captive riders.

Smart card data
Smart cards are increasingly used in public transit. Seoul's metropolitan public transportation system is among the systems in which smart cards are most widely used. Smart card data are organized into 16 columns (Table 1). To identify the target user group, the 9th column (user class code) is used. The user class code distinguishes 15 classes (Table 2), such as general, youth, and elderly.

Spatial autocorrelation
Ordinary least squares (OLS) analysis is used for ordinary data analysis. However, when spatial characteristics affect nearby areas, researchers should perform spatial autocorrelation analysis. Tobler's first law of geography (6) describes spatial autocorrelation as follows: everything is related to everything else, but near things are more related than distant things. When there is spatial autocorrelation, the independence assumption of the OLS model fails. In such a case, a spatial regression model with local characteristic variables is necessary. The general spatial regression model is shown in Eqs. (1) and (2). When the spatial weight matrix of the error term ($w_2$) is 0, the model is called the Spatial Lag Model (SLM), and when the spatial weight matrix of the dependent variable ($w_1$) is 0, it is called the Spatial Error Model (SEM). (7,8)

$$y = \rho w_1 y + X\beta + \mu \tag{1}$$

$$\mu = \lambda w_2 \mu + \varepsilon, \qquad \varepsilon \sim \mathrm{MVN}(0, \sigma^2 I_n) \tag{2}$$

MVN: multivariate normal distribution
$y$: dependent variable
$X$: variable of interest
$\beta$: coefficient
$\rho$: coefficient of the spatially lagged dependent variable
$\lambda$: spatial autoregressive coefficient
$w_1$: spatial weight matrix of dependent variable
$w_2$: spatial weight matrix of error term
$\mu$: error term

To show the spatial autocorrelation, we can calculate a spatial autocorrelation index such as Moran's I. Spatial autocorrelation analysis requires a spatial weight matrix, which can be defined by Queen contiguity, Rook contiguity, distance weight, or threshold distance.
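The difference between Queen and Rook contiguity can be illustrated with a minimal sketch. It assumes a regular grid of cells, which is only a stand-in for real dongs (irregular polygons whose neighbors a GIS package would derive from shared boundaries); the function names `grid_contiguity` and `row_standardize` are hypothetical and do not come from the study.

```python
def grid_contiguity(n_rows, n_cols, rule="queen"):
    """Return {cell: [neighbor cells]} for a regular grid (illustrative layout).

    Cells are numbered row by row. 'rook' neighbors share an edge;
    'queen' neighbors share an edge or a corner.
    """
    if rule not in ("queen", "rook"):
        raise ValueError("rule must be 'queen' or 'rook'")
    neighbors = {}
    for r in range(n_rows):
        for c in range(n_cols):
            found = []
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue  # a cell is not its own neighbor
                    if rule == "rook" and dr != 0 and dc != 0:
                        continue  # rook ignores corner-only contact
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < n_rows and 0 <= cc < n_cols:
                        found.append(rr * n_cols + cc)
            neighbors[r * n_cols + c] = found
    return neighbors


def row_standardize(neighbors):
    """Turn neighbor lists into weights that sum to 1 for each unit."""
    return {i: {j: 1.0 / len(js) for j in js} for i, js in neighbors.items() if js}


queen = grid_contiguity(3, 3, "queen")
rook = grid_contiguity(3, 3, "rook")
weights = row_standardize(queen)
print(len(queen[4]), len(rook[4]))  # center cell of a 3x3 grid: prints 8 4
```

Row standardization is one common convention, not necessarily the one used here; it makes the weighted sum over neighbors an average of neighboring values.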
The Queen contiguity, which is the best-known method, sets the weight matrix on the basis of adjacent edge or corner boundaries (Fig. 1). Through this weight matrix, we calculated Moran's I and derived the spatial autocorrelation between observations [Eq. (3)].

$$I = \frac{N}{\sum_{i}\sum_{j} w_{ij}} \cdot \frac{\sum_{i}\sum_{j} w_{ij}\,(x_i - \bar{X})(x_j - \bar{X})}{\sum_{i}(x_i - \bar{X})^2} \tag{3}$$

$I$: Moran's I
$N$: number of spatial units indexed by $i$ and $j$
$x$: variable of interest
$\bar{X}$: mean of $x$
$w_{ij}$: matrix of spatial weights with zeros on the diagonal (i.e., $w_{ii} = 0$)

Besides Moran's I analysis, local indicators of spatial association (LISA) analysis is necessary to examine the spatial autocorrelation between a specific area and the entire study area. LISA analysis serves two purposes. First, it is an indicator of local pockets of nonstationarity or hot spots. Second, it can be used to assess the effect of individual locations on the magnitude of the global statistic and to identify outliers similarly to Anselin's (9) Moran scatterplot. (10) In this study, LISA analysis is used for the second purpose. The spatial autocorrelation can be shown graphically on a GIS map. When there is spatial autocorrelation in the data, the OLS model has errors. In that case, SLM or SEM is used as an alternative model.
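As a minimal sketch, the global Moran's I can be computed from the symbols listed above (N, x, the mean of x, and w_ij) in a few lines of Python. The permutation test is an assumption about how significance might be assessed, since the source does not state how its p-value was obtained; the function names and the four-unit toy layout are illustrative only.

```python
import random


def morans_i(x, w):
    """Global Moran's I for values x and weights w[i][j] (dict of dicts, no self-weights)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    s0 = sum(wij for row in w.values() for wij in row.values())  # sum of all weights
    cross = sum(wij * dev[i] * dev[j] for i, row in w.items() for j, wij in row.items())
    return (n / s0) * cross / sum(d * d for d in dev)


def permutation_p_value(x, w, n_perm=999, seed=0):
    """Pseudo p-value: share of random relabelings with I at least as large as observed."""
    rng = random.Random(seed)
    observed = morans_i(x, w)
    shuffled = list(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, w) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy example: four units in a row, each adjacent to its immediate neighbors.
line_w = {0: {1: 1.0}, 1: {0: 1.0, 2: 1.0}, 2: {1: 1.0, 3: 1.0}, 3: {2: 1.0}}
clustered = morans_i([1, 2, 3, 4], line_w)    # about 1/3: similar values are adjacent
alternating = morans_i([1, 0, 1, 0], line_w)  # about -1: neighbors always differ
p_value = permutation_p_value([1, 2, 3, 4], line_w, n_perm=99)
```

With 999 permutations, the smallest pseudo p-value this procedure can return is 1/1000 = 0.001.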

Analysis method
We collected the smart card transactions for the Seoul metropolitan area from Oct. 1, 2015 to Nov. 12, 2015. Many elderly riders tend to prefer the subway, which they can ride free of charge (social welfare service). However, in this study, we particularly focused on captive elderly riders who are forced to ride the bus because of the absence of subway service in their areas.
To sort out the target user data from smart card transactions, we used the big data analysis engine Splunk. Elderly bus travelers can be identified by user class code 06 (Table 2). Next, the latitude and longitude of the boarding and alighting stations are converted to the Korean basic administrative unit (dong) for the spatial autocorrelation analysis. The dong is the basic neighborhood unit in Korea. The average area of a dong is 1.43 km². This conversion is necessary because the boarding and alighting stations provide point information, whereas the spatial autocorrelation analysis is carried out on polygon units. (11)(12)(13) Smart card boarding and alighting transactions recorded by station were aggregated to the dong level in a GIS program. Through these steps, we measured the local Moran's I and performed LISA analysis to identify areas where spatial clustering occurs.
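The point-to-polygon step described above can be sketched as follows. This is not the Splunk or GIS workflow used in the study; it is a plain-Python illustration that assumes hypothetical record fields (`user_class`, `mode`, `ride_station`) and toy square polygons in place of real dong boundaries.

```python
from collections import Counter

ELDERLY_CLASS_CODE = "06"  # user class code for the elderly, per Table 2


def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        if (y1 > lat) != (y2 > lat):  # edge crosses the horizontal line through the point
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def boardings_by_dong(transactions, stations, dongs):
    """Count elderly bus boardings per dong."""
    station_dong = {}
    for station_id, (lon, lat) in stations.items():
        for name, polygon in dongs.items():
            if point_in_polygon(lon, lat, polygon):
                station_dong[station_id] = name
                break
    counts = Counter()
    for t in transactions:
        if t["user_class"] == ELDERLY_CLASS_CODE and t["mode"] == "bus":
            dong = station_dong.get(t["ride_station"])
            if dong is not None:
                counts[dong] += 1
    return counts


# Toy data: two square "dongs" side by side and three stations (all hypothetical).
dongs = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
stations = {"s1": (0.5, 0.5), "s2": (1.5, 0.5), "s3": (1.7, 0.2)}
transactions = [
    {"user_class": "06", "mode": "bus", "ride_station": "s1"},
    {"user_class": "06", "mode": "bus", "ride_station": "s2"},
    {"user_class": "06", "mode": "bus", "ride_station": "s3"},
    {"user_class": "01", "mode": "bus", "ride_station": "s1"},
    {"user_class": "06", "mode": "subway", "ride_station": "s2"},
]
counts = boardings_by_dong(transactions, stations, dongs)
```

Alighting counts would be aggregated the same way from the alight station ID.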
For spatial modelling, dependent and explanatory variables are necessary. As mentioned earlier, the dependent variable is the number of elderly bus riders' travel transactions, and the explanatory variables are selected from Seoul statistics (14) so as to explain the characteristics of elderly bus travel well. Few variables in Seoul statistics are both related to elderly travel and collected at the dong level. The explanatory variables are elderly population, superaged population, aged-child ratio, solitary senior citizen population, local government tax, and area of each dong. The definitions of the explanatory variables are as follows. The elderly population is the population of individuals aged over 65 and the superaged population is the population of individuals aged over 80. The aged-child ratio is the ratio between the number of individuals aged under 13 and those aged over 65. When the aged-child ratio exceeds 30, that society is classified as an aging society. The local government tax reflects the income level of that area.

Error data processing
The smart card raw data contain erroneous data. Kim (2007) classified error data into three major categories and 12 subcategories. (15) Each type of error data should be removed before analyzing the smart card data.
When filtering the error data according to Ref. 15, we found that 7.26% of the data were erroneous (Table 3). In Table 3, error types 5 and 7 do not affect the analysis. Thus, only error types 8, 10, and 12 (1.7% in total) are removed for the analysis.
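The filtering rule can be expressed as a short sketch. The `error_type` field is a hypothetical label assumed to have been attached to each record beforehand; the definitions of the individual error types are given in Ref. 15 and are not reproduced here.

```python
# Error types removed in this study; types 5 and 7 are kept because they
# do not affect the analysis.
REMOVED_ERROR_TYPES = {8, 10, 12}


def drop_error_records(records):
    """Keep records whose (hypothetical) error_type is absent or not in the removed set."""
    return [r for r in records if r.get("error_type") not in REMOVED_ERROR_TYPES]


# Illustrative records only.
records = [
    {"id": 1, "error_type": None},
    {"id": 2, "error_type": 5},
    {"id": 3, "error_type": 8},
    {"id": 4, "error_type": 12},
]
clean = drop_error_records(records)
removed_share = 1 - len(clean) / len(records)
```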

Descriptive statistics
In total, 60.23 million elderly smart card transactions were collected during the six-week study period; 45.94 million (76.3%) trips were on the subway and the other 14.29 million (23.7%) trips were on the bus. Most elderly trips are thus made on the subway, which the elderly can ride free of charge. In particular, when elderly individuals are choice riders, 89.6% are concentrated on the subway. Elderly people prefer the subway, and we determined that mainly captive riders use buses. Among the 1.26 million (14) elderly citizens of Seoul as of 2015, each elderly person took 0.26 bus trips and 0.85 subway trips per day on average.
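The shares and per-capita rates above can be checked with simple arithmetic. The sketch below uses only the figures reported in the text; counting both the first and the last day of the collection period (43 days) is an assumption.

```python
from datetime import date

# Figures reported in the text.
subway_trips = 45.94e6
bus_trips = 14.29e6
elderly_population = 1.26e6

# Collection period, counting both the first and the last day (assumption).
days = (date(2015, 11, 12) - date(2015, 10, 1)).days + 1

total_trips = subway_trips + bus_trips
subway_share = subway_trips / total_trips
bus_share = bus_trips / total_trips

bus_per_person_day = bus_trips / elderly_population / days
subway_per_person_day = subway_trips / elderly_population / days

print(days, round(100 * subway_share, 1), round(100 * bus_share, 1))  # 43 76.3 23.7
print(round(bus_per_person_day, 2), round(subway_per_person_day, 2))  # 0.26 0.85
```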
As a subway supply factor, we counted the number of subway operations each day. Among the 420 dongs in Seoul, only 222 have a subway station. In Seoul, 170.9 subway rolling stocks on average are operating each day. The detailed numbers of subway operations by line are shown in Table 4. These are reclassified by dong for the spatial autocorrelation analysis (Fig. 2). Figure 3 shows the spatial percentile distributions of the boarding and alighting of elderly citizens. The number of elderly bus riders tends to be high in the southwestern and northern areas of Seoul. The global Moran's I of the elderly Seoul citizens' smart card transactions is 0.277 and the p-value is 0.001 (Fig. 4). There is thus a positive spatial autocorrelation at the 99.9% confidence level. LISA analysis is performed to locate the spatial autocorrelation. As shown in Fig. 5, the LISA cluster map yields 50 high-high hot spots and 61 low-low cold spots of elderly bus travelers at a significance level of 0.05.
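How a LISA cluster map assigns high-high and low-low labels can be sketched as follows. The code computes the local Moran's I for each unit, labels its quadrant, and estimates a pseudo p-value by conditional permutation, which is one common approach and an assumption here, since the source does not describe its significance procedure. The function names and the six-unit toy data are illustrative.

```python
import random


def local_moran(x, w):
    """Local Moran's I_i and quadrant label for each unit.

    w[i] maps neighbor index j to weight w_ij (row-standardized, no self-weights).
    """
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    m2 = sum(d * d for d in z) / n
    result = {}
    for i, row in w.items():
        lag = sum(wij * z[j] for j, wij in row.items())  # weighted neighbor deviation
        if z[i] > 0 and lag > 0:
            quadrant = "high-high"
        elif z[i] < 0 and lag < 0:
            quadrant = "low-low"
        elif z[i] > 0:
            quadrant = "high-low"
        else:
            quadrant = "low-high"
        result[i] = (z[i] * lag / m2, quadrant)
    return result


def conditional_p_value(x, w, i, n_perm=999, seed=0):
    """Pseudo p-value for unit i: hold x[i] fixed and reshuffle the other values."""
    rng = random.Random(seed)
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    others = [z[j] for j in range(n) if j != i]
    neighbors = list(w[i].items())
    observed = sum(wij * z[j] for j, wij in neighbors)
    extreme = 0
    for _ in range(n_perm):
        draw = rng.sample(others, len(neighbors))
        lag = sum(wij * v for (_, wij), v in zip(neighbors, draw))
        if (observed >= 0 and lag >= observed) or (observed < 0 and lag <= observed):
            extreme += 1
    return (extreme + 1) / (n_perm + 1)


# Toy example: six units in a row, high values on the left, low on the right.
x = [10, 9, 8, 1, 2, 1]
w = {0: {1: 1.0}, 1: {0: 0.5, 2: 0.5}, 2: {1: 0.5, 3: 0.5},
     3: {2: 0.5, 4: 0.5}, 4: {3: 0.5, 5: 0.5}, 5: {4: 1.0}}
lisa = local_moran(x, w)
labels = [lisa[i][1] for i in range(len(x))]
p_first = conditional_p_value(x, w, 0, n_perm=99)
```

A unit would be mapped as a hot spot only if it is labeled high-high and its pseudo p-value falls below the chosen significance level (0.05 in the text).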

Spatial autocorrelation analysis
In Fig. 5, Gangbuk-gu, Eunpyeong-gu, and Geumcheon-gu are hot spots. More specifically, the hot spots are located around Jeongnung-dong, Hongeun-dong, and Jongam-dong. These are mountainous areas. Thus, various bus lines are in operation and no subway lines exist there. In contrast, Gangdong-gu, from which one can reach the downtown area by subway line 5, is a cold spot. Various factors affect elderly bus traffic; in general, cold spots appear in areas where the subway supply level is high, and hot spots form in mountainous areas where the subway supply level is low.

Spatial regression model
According to Moran's I and LISA analysis results, elderly bus travel data have spatial autocorrelation. For the spatial regression model, socioeconomic statistics are used as explanatory variables. The basic OLS model shows that all the explanatory variables are significant, except the elderly population, and r² is 0.3264. The multicollinearity problem is not serious because the condition number of the OLS model is 14.0308. The error terms also show homoscedasticity based on the Breusch-Pagan test result (p-value is 0.000).
For the next step, we incorporate the spatial weights into the OLS model. According to the LM test result, both SLM and SEM are significant. The r² values of SLM and SEM are 0.4184 and 0.4388, respectively (Table 5). The spatial regression models show that all the explanatory variables have a positive relationship with elderly bus travel. As economic activity factors such as local government tax and the area of each dong increase, the number of elderly bus trips increases. Thus, the positive relationship between the dependent variable and local government tax or the area of each dong is reliable. The positive relationships between elderly bus travel and the superaged population, aged-child ratio, and solitary senior citizen population have more complex reasons. The general residential environment of the hot spots shown in Fig. 5 is poor because the subway supply level is low. Owing to this disadvantage in public transportation, young people hesitate to move to these areas. Thus, older native residents remain, and the superaged population, aged-child ratio, and solitary senior citizen population become high in hot spots.
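For reference, the two models compared here follow from Eqs. (1) and (2) as special cases. The block below restates them in LaTeX using the symbols already defined; the reduced forms on the right assume that the matrices being inverted are nonsingular, and they are a restatement rather than an additional result of the study.

```latex
% Spatial Lag Model (SLM): w_2 = 0, so \mu = \varepsilon
y = \rho w_1 y + X\beta + \varepsilon
\quad\Longrightarrow\quad
y = (I_n - \rho w_1)^{-1}(X\beta + \varepsilon)

% Spatial Error Model (SEM): w_1 = 0
y = X\beta + \mu, \qquad \mu = \lambda w_2 \mu + \varepsilon
\quad\Longrightarrow\quad
y = X\beta + (I_n - \lambda w_2)^{-1}\varepsilon
```

In the SLM the spatial dependence enters through neighboring values of the dependent variable, whereas in the SEM it enters only through the error term.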

Suggestion of New Services for Elderly Captive Riders
Through the LISA cluster analysis of elderly smart card transaction data, we found that 50 out of the 420 dongs in Seoul had spatial clustering as hot spots. Another 61 dongs were cold spots. Among the 50 hot spots, 28 dongs were captive areas for buses because there are no subway stations in those areas (Table 6). These 28 dongs show spatial clustering, yet travelers there have no alternative mode of public transport besides the bus. It is therefore necessary to create proper social services that make bus use more convenient for elderly captive bus riders.
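The selection of captive areas amounts to removing, from the set of hot spots, every dong that contains a subway station. A minimal sketch, with hypothetical dong labels rather than the actual units listed in Table 6:

```python
def captive_hot_spots(hot_spots, dongs_with_subway):
    """Hot-spot dongs that contain no subway station."""
    return sorted(set(hot_spots) - set(dongs_with_subway))


# Hypothetical labels for illustration only.
hot_spots = ["dong_a", "dong_b", "dong_c", "dong_d"]
dongs_with_subway = ["dong_b", "dong_x"]
captive = captive_hot_spots(hot_spots, dongs_with_subway)
print(captive)  # ['dong_a', 'dong_c', 'dong_d']
```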
First, bus stations should be barrier-free for the elderly in hot spots. These 28 hot spot areas could be priorities for installing barrier-free bus stations. Second, more low-floor buses should be supplied for these 28 hot spot areas. In these areas, a total of 64 bus lines are in operation. Table 7 shows the elderly boarding and alighting records for these bus lines in hot spot areas during the analysis period. The top lines in Table 7 have many boarding and alighting records in hot spots, and they can be good routes to supply low-floor buses preferentially. Finally, we propose the development of a low-floor bus shuttle route for elderly riders. Table 8 shows the

Discussion and Conclusion
In this study, we analyzed the smart card transactions of the elderly and found spatial autocorrelation. Results revealed that 50 out of 420 dongs in Seoul show spatial clustering.
Elderly smart card transactions reflect the actual travel pattern. The elderly bus travel spatial clustering characteristics depend on the superaged population, aged-child ratio, solitary senior citizen population, local government tax, and the area of each dong.
With the coming of the superaged society, preparing proper public transportation services for the elderly is crucial. Thus, analyzing smart card transactions using spatial autocorrelation is meaningful. The spatial regression model and LISA cluster analysis results in this study point to the necessity of appropriate bus services for the elderly. When the superaged population, aged-child ratio, and solitary senior citizen population are large, elderly bus travel is more frequent. Nevertheless, when the subway supply level is high, elderly individuals tend to prefer the subway, which they can ride free of charge.
Thus, we propose some services for elderly riders in 28 hot spots where the only mode of public transportation is the bus. First, bus stations should be made barrier-free in the 28 captive hot spots. Second, we should supply low-floor buses for the 64 bus routes that pass the 28 hot spots in the order of their total amounts of boarding and alighting. Third, on the basis of our analysis of the O/D from/to 28 hot spots, we suggest low-floor bus shuttle lines for the elderly in the nine most frequent O/D routes. Beyond those social services, local governments can design various social services and upgrade the current system. Thus, it is desirable to analyze the spatial clustering of various smart card users and find suitable customized traffic policies. In this study, we suggest only a few policies based on elderly smart card analysis results, but we would be able to add further conclusions and implications if we could later analyze the individual behaviors of users after the social services proposed in this study are provided.
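The prioritization described above (ranking bus lines by total boarding and alighting in the hot spots, and picking the most frequent O/D pairs) can be sketched as follows. The record fields `line`, `ride_dong`, and `alight_dong`, as well as the sample trips, are hypothetical and stand in for the dong-level smart card records.

```python
from collections import Counter


def rank_lines(records, hot_spots):
    """Rank bus lines by elderly boardings plus alightings inside hot-spot dongs."""
    totals = Counter()
    for r in records:
        if r["ride_dong"] in hot_spots:
            totals[r["line"]] += 1  # boarding in a hot spot
        if r["alight_dong"] in hot_spots:
            totals[r["line"]] += 1  # alighting in a hot spot
    return totals.most_common()


def top_od_pairs(records, hot_spots, n=9):
    """Most frequent origin-destination pairs that start or end in a hot spot."""
    pairs = Counter(
        (r["ride_dong"], r["alight_dong"])
        for r in records
        if r["ride_dong"] in hot_spots or r["alight_dong"] in hot_spots
    )
    return pairs.most_common(n)


# Hypothetical trips; line numbers and dong labels are placeholders.
hot_spots = {"H1", "H2"}
records = [
    {"line": "101", "ride_dong": "H1", "alight_dong": "C1"},
    {"line": "101", "ride_dong": "H1", "alight_dong": "H2"},
    {"line": "202", "ride_dong": "C1", "alight_dong": "H2"},
    {"line": "303", "ride_dong": "C1", "alight_dong": "C2"},
    {"line": "101", "ride_dong": "H1", "alight_dong": "C1"},
]
line_ranking = rank_lines(records, hot_spots)
od_ranking = top_od_pairs(records, hot_spots, n=2)
```

In this toy data, line 101 ranks first with four hot-spot boardings and alightings, and the pair (H1, C1) is the most frequent O/D route.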