Development and Demonstration of Grid-based Spatiotemporal Data Model for Monitoring Small Businesses

The objective of this study is to propose a spatiotemporal data model (STDM) to systematically analyze the dynamic changes of market areas using the small business database provided by the Market Area Information System of Small Enterprise and Market Service and demonstrate the applicability of the model according to a series of analysis scenarios. In the literature review section, we reviewed studies related to indexes that summarize the changes in the small enterprise and market area and reflect their characteristics as well as studies on STDM and discussed the direction for application. In the methodology section, we presented methodologies to establish the STDM and relative diversity index (RDI) using the nationwide small business database. In the case study section, we established a nationwide STDM to monitor the changes in the RDI of the small business database based on a 1 km grid and demonstrated the applicability of the model by analyzing the RDI in the macroscale at the metropolitan/provincial level and in the 1 km grid-based mesoscale. Moreover, on the basis of the empirical results, we discussed that applying the grid-based STDM facilitates comparative analysis in terms of macroscopic, regional, and microscopic aspects by solving the modifiable areal unit problem (MAUP) that changes according to time, bringing convenience to the visualization analysis of spatiotemporal patterns, and establishing a grid-based RDI of small businesses with consistent spatial units.


Introduction
The Small Enterprise and Market Service provides market area information through its Market Area Information System. The market area information is aggregated, analyzed, and geovisualized over census blocks using the small business database, which is composed of individual small business records. The census block is not an appropriate spatial unit for observing real-world patterns derived from information at the level of individual small businesses. It can lead to major misinterpretations caused by the ecological fallacy and the modifiable areal unit problem (MAUP). To cope with this issue and systematically analyze the dynamic changes in market area information, it is necessary to reflect the spatiotemporal features of the small business database and develop a spatiotemporal data model (STDM) that provides a customized visualization model. Moreover, it is necessary to demonstrate the applicability of the proposed data model by summarizing, comparing, analyzing, and visualizing the small business database.
As shown in Fig. 1, the objective of this study is to propose an STDM to systematically monitor the dynamic changes of market areas using the small business database provided by the Market Area Information System of Small Enterprise and Market Service and to demonstrate the applicability of the model. To this end, in the literature review section, we reviewed previous studies on indexes that summarize the changes in the small business database and reflect their characteristics, and selected the index to apply to this data model. Moreover, we selected the STDM suitable for the nationwide small business database by reviewing the types of STDM. In the methodology section, we presented methodologies to establish the selected index and the STDM using the nationwide small business database and provided analytical methods and scenarios to demonstrate the applicability of the model. In the case study section, we established a nationwide STDM to monitor the changes in the relative diversity index (RDI) based on a 1 km grid and demonstrated the applicability of the model by analyzing the RDI in the macroscale at the metropolitan/provincial level and in the 1 km grid-based mesoscale. On the basis of the empirical results, we discussed the significance of this research, namely, that the application of the grid-based STDM is one of the solutions for dealing with the MAUP and visualizing the spatiotemporal patterns of small businesses. Finally, we concluded that the procedural methodologies of this study can be used to determine a grid-based STDM for various applications and that further research is necessary to flexibly customize an STDM in various spatial units.

Diversity index for monitoring small businesses
To analyze and visualize the dynamic changes of market areas with an STDM, there is a need for an index that measures changes over time, considering business startups and closures as well as growth and decline due to competition among similar industries. The RDI is used to quantify the phenomenon in which certain local market areas are specialized in the same industry or differentiated into various industries. Duranton and Puga (2000) presented five indexes that represent diversity and specialization, as shown in Table 1. (1) The absolute specialization index (ZI) represents the employment share of each city's largest sector. The relative specialization index (RZI) calculates the location quotient (LQ) of each industry by region compared with the nationwide industrial distribution and takes the largest value. The absolute diversity index (DI) is the reciprocal of the Hirschman-Herfindahl Index (HHI) and shows whether activity is concentrated in a certain industry or spread across diverse industries. The RDI shows whether the industrial distribution of each area is relatively more diverse than the nationwide industrial distribution. The advantage of the RDI is that it can consider the regional characteristics of market areas as well as the specificity and diversity of industries. Therefore, in this study, we applied this index to measure whether the industrial distribution of a certain market area is relatively diverse compared with the industrial distribution of small businesses in the city/county/borough to which it belongs. Moreover, we visualized the changes in the market areas of small businesses according to time and space by applying the RDI to the STDM.

Table 1 Various specialization and diversity indexes. (1)
Columns: type of index; equation; description.
Specialization, absolute measure: $ZI_i = \max_j (s_{ij})$. Ratio of the industry that represents the biggest portion in the area, where $ZI_i$ is the employment share of city $i$'s largest sector and $s_{ij}$ is the share of industry $j$ in the employment of city $i$.
Specialization, relative measure: $RZI_i = \max_j (s_{ij} / s_j)$. Highest value among the LQs of each industry in the region compared with the whole nation, showing the relative specialization of the region in the industry, where $RZI_i$ is city $i$'s relative specialization and $s_j$ is the share of industry $j$ in national employment.
Diversity, absolute measure: $DI_i = 1 / \sum_j s_{ij}^2$. Inverse of an HHI that shows whether the industry is biased or diverse. If the economic activity in the city under consideration is fully concentrated in one sector, $DI_i = 1$, and the index increases as activities in the city become more diverse. $DI_i$ is city $i$'s diversity index, and $\sum_j s_{ij}^2$ is the sum over industries of the squared share of industry $j$ in the employment of city $i$.
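The indexes described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names and the sample counts are hypothetical, business counts are used in place of employment shares, and the RDI line assumes the Duranton and Puga form, the reciprocal of the summed absolute differences between local and reference shares, because that row of Table 1 is not reproduced in the text.

```python
import math


def shares(counts):
    """Convert {industry: count} into {industry: share of total}."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def specialization_diversity(local_counts, reference_counts):
    """Return ZI, RZI, DI and RDI for one area against a reference area.

    Assumption: every industry present locally also appears in the
    reference counts (the reference area contains the local area).
    """
    s_i = shares(local_counts)      # s_ij: local share of industry j
    s = shares(reference_counts)    # s_j: reference share of industry j
    zi = max(s_i.values())
    rzi = max(s_i[j] / s[j] for j in s_i if s.get(j, 0.0) > 0.0)
    di = 1.0 / sum(v * v for v in s_i.values())
    # Assumed RDI form: 1 / sum_j |s_ij - s_j| (not restated in the text).
    gap = sum(abs(s_i.get(j, 0.0) - s.get(j, 0.0)) for j in set(s_i) | set(s))
    rdi = math.inf if gap == 0 else 1.0 / gap
    return {"ZI": zi, "RZI": rzi, "DI": di, "RDI": rdi}


# Hypothetical counts of businesses by industry.
local = {"food": 6, "retail": 3, "service": 1}
reference = {"food": 40, "retail": 40, "service": 20}
print(specialization_diversity(local, reference))
```

A local distribution identical to the reference gives a zero gap, which the sketch reports as an infinite RDI rather than raising an error.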

STDM
For the development of the customized STDM to explore and analyze the variance of RDI as time passes, the spatiotemporal features of a small business database are investigated. The relevant data model and table structure are also identified. Güting and Schneider (2005) classified the features of events represented by the STDM in temporal and spatial dimensions and presented 10 different types as shown in Table 2. (2) They also provided examples of events that correspond to the point and regional features for the 10 types. In the temporal dimension of events, an event is classified either as an instant or periodical event depending on whether the event that occurs is an instant phenomenon, such as a traffic accident or a lightning strike, or whether it is a periodical event that occurs for a certain period of time such as a home address or construction sites. In the spatial dimension of events, an event is classified as either a not moving event or a moving event depending on whether it moves or not. Moreover, events that occur in an irregular cycle at a certain place such as volcanic eruptions are classified as a sequence of not moving events since instant phenomena occurred consecutively. Market areas of small businesses affect the diversity of industry depending on business startup, operation, temporary shutdown, and closure. (1) In the temporal dimension, business startup and closure are instant events, whereas temporary shutdown and operation are periodical events. In the spatial dimension, business startup, closure, temporary shutdown, and operation are all not moving events.
Table 2 Examples of five types of spatiotemporal event in terms of point and regional features. (2)
Not moving events, instant events
• Point feature: accidents, lightning strike, birth, death, archeological discoveries, plane crashes, volcanic eruptions, earthquakes
• Regional feature: a large-scale forest fire
Not moving events, periodical events
• Point feature: tree, people's home address, cities built at some time, still existing or destroyed; construction sites such as buildings and highways; stores of a company being used for some time; or "immovables", anything that is built at some place and later destroyed
• Regional feature: the area closed for a certain time after a traffic accident
Sequence of not moving events, instant events
• Point feature: volcanic eruptions of last year
• Regional feature: Olympic games viewed collectively, at a large scale
Sequence of not moving events, periodical events
• Point feature: accommodations of a traveler during a trip; the trip of an email message (assuming transfer times between nodes are zero)
• Regional feature: countries, real estate (changes in shape only through legal acts), agricultural land use, and so forth
Moving events
• Point feature: trajectories of one or more persons, planes, cars, or birds
• Regional feature: forests (growth); small-scale forest fires (i.e., we describe the development); people in history

Pebesma (2012) (3) presented four types of STDM as shown in Table 3 considering the 10 spatiotemporal features presented by Güting and Schneider (2005). (2) 'Spatiotemporal full grids' is a data model with a consistent spatiotemporal unit for event measurement and is suitable for events that can be measured continuously throughout all time and space such as temperature or wind velocity on Earth's surface. 'Spatiotemporal sparse grids' is also a data model with a consistent spatiotemporal unit for event measurement but is suitable for events that occur sparsely in a certain spatiotemporal unit such as crime occurrence frequency in each administrative district by year or the number of traffic accidents. 'Spatiotemporal irregular data' is a data model with irregular spatiotemporal units to measure events and is suitable for events that occur sparsely according to time and space such as forest fires, crimes, and diseases. 'Data for moving objects and trajectories' is a data model suitable for expressing the trajectories of moving objects according to spatiotemporal changes such as people, wild animals, and typhoons. (4,5) Events such as the startup and closure of small businesses are spatiotemporal irregular data that deal with instant and not moving events, while operation is an event that takes up a specific location for a certain period of time, and thus spatiotemporal sparse grids are suitable for this event. For reasons of simplicity, spatiotemporal data often come in the form of single tables. If this is the case, they come in one of three forms, as shown in Fig. 2.
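As an illustration of the single-table forms mentioned above, the sketch below converts a long table (one row per cell and time) into a time-wide table (one row per cell, one column per time). It assumes the three forms in Fig. 2 are the long, space-wide, and time-wide layouts commonly described for spatiotemporal tables; the function name, cell identifiers, and values are hypothetical.

```python
def long_to_time_wide(rows, times):
    """Pivot (cell_id, time, value) rows into {cell_id: [value per time]}.

    Combinations of cell and time that never occur stay as None, which is
    how a sparse grid leaves unobserved combinations empty.
    """
    column = {t: k for k, t in enumerate(times)}
    table = {}
    for cell_id, t, value in rows:
        table.setdefault(cell_id, [None] * len(times))[column[t]] = value
    return table


# Hypothetical long-format rows: (grid cell, quarter, index value).
long_rows = [
    ("A", "2018Q3", 1.8),
    ("A", "2018Q4", 2.1),
    ("B", "2018Q4", 0.9),
]
wide = long_to_time_wide(long_rows, ["2018Q3", "2018Q4"])
print(wide)
```

Keeping one row per spatial unit makes it straightforward to attach the table to a grid layer and map any single quarter or the difference between two quarters.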

Methodology
The methodology used to build an STDM varies according to the spatial and temporal feature types used to represent events in the real world. Because the small business database includes only xy-coordinates, this study is restricted to an STDM that represents events as points. Numerous methods have been developed for spatial and spatiotemporal point pattern analyses in various applications, such as the monitoring of the spatial and temporal patterns of disease infection, the detection of spatial and spatiotemporal clusters, and the representation of the birth and death of trees in forests. These studies use a dot map on a grid map to visualize the spatial and temporal patterns of point-based events. The dot map is appropriate for displaying the spatial and temporal patterns of individual events. However, the RDI shows the relative composition of small businesses in certain areas, so the dot map is not suited to representing the spatial and temporal patterns of the RDI. The MAUP affects results when point-based measures of spatial phenomena, such as the startup, closure, or operation of small businesses, are aggregated into districts. The results of data aggregation are dependent on the analyst's choice of which "modifiable area unit" to use in the analysis. The market area information is aggregated, analyzed, and geovisualized over census blocks using the small business database, which is composed of individual small business records. The census block is also subject to change over time, meaning that the MAUP must be considered when comparing past data with current data. The census block is therefore not an appropriate spatial unit for capturing real-world patterns derived from information at the level of individual small businesses. For this reason, the grid map is employed to analyze spatiotemporal data with spatial analysis units that remain unchanged over time. This is one of the feasible solutions to the MAUP.
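The point of using a grid as a time-independent unit can be sketched as follows: a cell index is derived from the coordinates alone, so the same location maps to the same cell in every period regardless of how district boundaries are redrawn. The sketch assumes projected coordinates in metres and a hypothetical origin; it does not reproduce the national grid identifier scheme used in the study.

```python
import math


def grid_cell(x, y, cell_size=1000.0, origin=(0.0, 0.0)):
    """Return the (column, row) index of the square cell containing (x, y).

    x, y and origin are assumed to be projected coordinates in metres;
    cell_size=1000.0 gives a 1 km grid.
    """
    col = math.floor((x - origin[0]) / cell_size)
    row = math.floor((y - origin[1]) / cell_size)
    return (col, row)


# Two hypothetical shops 400 m apart fall in the same 1 km cell,
# while a third one 700 m further east falls in the neighbouring cell.
print(grid_cell(200.0, 500.0), grid_cell(600.0, 500.0), grid_cell(1300.0, 500.0))
```

Using floor rather than integer truncation keeps cells consistent for coordinates on the negative side of the origin.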
We presented the overall building procedure of the grid-based STDM for small businesses as shown in Table 4. Figure 3 shows the conceptual data model to visualize the changes in the relative diversity of small businesses according to time and space in the macroscale at the metropolitan/provincial level and in the mesoscale at the city/county/borough level. This model generates the spatiotemporal data tables by processing three types of source data, namely, the small business database, the 1 km grid, and the administrative boundaries of cities/counties/boroughs, and visualizes the RDI of the small business database by connecting the source data with the spatiotemporal data tables. Table 5 shows a list of raw data applied to build the grid-based STDM. The small business database in this study is the data of small businesses nationwide provided by the Market Area Information System of Small Enterprise and Market Service based on the Act on the Protection of and Support for Micro Enterprises. Small Enterprise and Market Service classifies the types of industry in the small business database into 20 high-level categories, 238 mid-level categories, and 3,348 low-level categories.
The scope of this data includes shops that are in operation on the relevant base date in each of the four quarters of a year. Currently, it provides data on shops nationwide that are in operation for a total of 13 quarters from Q4 2015 to Q4 2018. For the 1 km grid, we used the grid system data defined by the National Geographic Information Institute according to the 'grid system setup and spatial informatization of administrative data'. (6) For administrative boundary data, we used the relevant data of digital maps provided by the Road Name Address Developers' Center.
To build a massive STDM, we used the Geospatial Big Data (GSBD) Service System that can save, manage, analyze, and provide spatial big data as shown in Fig. 4. (7,8) The GSBD service system is composed of four layers: the GSBD source layer, GSBD storage and management layer, GSBD analytics layer, and GSBD service application layer. In particular, the GSBD storage and management layer is mainly employed for the demonstration scenario. The layer performs the functions of GSBD collection, storage/management, and query processing. In other words, it stores, manages, and accesses GSBD rapidly in the Hadoop environment, supporting the query processing engine, which processes the SQL-based spatial query language, as well as the application program interface, which enables the application program to access and use GSBD. Batch-based GSBD Extract, Transform, and Load (ETL) and the GSBD Storing and Management Tool are used to save and preprocess source data in a form that can be processed as spatial big data.
The small business database with point features is connected to the 1 km grid and city/county/borough administrative district data by applying a spatial join function. The spatiotemporal data table that shows the RDI of small businesses was built in the form of spatiotemporal sparse grids and a time-wide table using the GSBD Query/Interface. This index is created by applying Eq.
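The table-building step described above can be sketched in plain Python, as a stand-in for the SQL-based spatial queries of the GSBD system, which are not shown in the text. Several assumptions are made and are hypothetical: each record already carries the district it falls in (the point-in-polygon join is omitted), cells are indexed by flooring metre coordinates, and the RDI takes the Duranton and Puga form with the district's industry mix as the reference distribution.

```python
import math
from collections import Counter, defaultdict


def rdi(local, reference):
    """Assumed RDI form: 1 / sum_j |local share_j - reference share_j|."""
    n_local = sum(local.values())
    n_ref = sum(reference.values())
    gap = sum(abs(local.get(j, 0) / n_local - reference[j] / n_ref)
              for j in reference)
    return math.inf if gap == 0 else 1.0 / gap


def rdi_time_wide(records, cell_size=1000.0):
    """Build {(district, cell): {quarter: RDI}} from point records.

    records: iterable of (quarter, x, y, district, industry) tuples.
    """
    by_cell = defaultdict(Counter)      # (quarter, district, cell) -> counts
    by_district = defaultdict(Counter)  # (quarter, district) -> counts
    for quarter, x, y, district, industry in records:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        by_cell[(quarter, district, cell)][industry] += 1
        by_district[(quarter, district)][industry] += 1
    table = defaultdict(dict)
    for (quarter, district, cell), counts in by_cell.items():
        reference = by_district[(quarter, district)]
        table[(district, cell)][quarter] = rdi(counts, reference)
    return dict(table)


# Hypothetical records for one district and one quarter.
records = [
    ("2018Q4", 100.0, 100.0, "D1", "food"),
    ("2018Q4", 200.0, 300.0, "D1", "retail"),
    ("2018Q4", 1500.0, 100.0, "D1", "food"),
    ("2018Q4", 1600.0, 200.0, "D1", "food"),
    ("2018Q4", 1700.0, 300.0, "D1", "service"),
]
print(rdi_time_wide(records))
```

Only cells that contain at least one business in a quarter receive a value, which matches the sparse-grid idea of storing nothing for empty cell and quarter combinations.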

Case Study
Choropleth maps are conventionally applied to explore and investigate statistical data aggregated over the administrative boundaries of metropolitan cities, provinces, cities, counties, or boroughs. The administrative boundaries employed as the spatial unit of choropleth maps are good for the abstraction of regional statistics. However, this spatial unit is not appropriate for observing real-world patterns derived from information at the level of individual small businesses. This case study employs a 1 km grid as the spatial unit of the STDM for monitoring the RDI of a small business database. Through the case study, we sought to demonstrate and examine the applicability of the grid-based STDM by identifying the changes in the RDI of small businesses according to time and space in the macroscale at the metropolitan/provincial level and in the mesoscale at the city/county/borough level. Figure 5 shows the mean of the RDI for 17 metropolitan cities and provinces by quarter. Seoul shows an RDI mean of approximately 2.0 throughout the 13 quarters, which indicates that, compared with the other 16 metropolitan cities and provinces, the relative diversity of small businesses is maintained consistently at a relatively high level. The RDI tends to be directly proportional to the number of shops of small businesses owing to the nature of the equation. Figure 6 presents scatter plots in which the dots represent the RDI and the number of small businesses open in Q4 2018, allowing these features to be compared among the 17 metropolitan cities and provinces. For all 17 metropolitan cities and provinces, the RDI tended to be directly proportional to the number of small businesses included in the 1 km grid. In Seoul and Gyeonggi-do especially, the numbers of businesses in operation and the diversity indexes were widely distributed at high levels compared with those in other metropolitan cities and provinces.
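The dependence of the RDI on the number of shops can be illustrated with a stylized worked example. It assumes the RDI takes the Duranton and Puga form, which the text does not restate, and uses two hypothetical simplifications: a reference distribution that is uniform over $K$ industries, and a grid cell holding $n < K$ shops that each belong to a different industry.

```latex
% Assumed definition (Duranton and Puga form), not restated in the text:
\[
  RDI_i = \frac{1}{\sum_{j} \lvert s_{ij} - s_j \rvert}
\]
% Stylized cell: s_{ij} = 1/n for n industries and 0 for the other K - n;
% uniform reference distribution s_j = 1/K, with n < K.
\[
  \sum_{j} \lvert s_{ij} - s_j \rvert
  = n\left(\frac{1}{n} - \frac{1}{K}\right) + (K - n)\,\frac{1}{K}
  = 2\left(1 - \frac{n}{K}\right)
\]
\[
  RDI_i = \frac{K}{2\,(K - n)}
\]
```

With $K = 20$, this gives about 0.53 for $n = 1$, 1.0 for $n = 10$, and 2.0 for $n = 15$: under these assumptions, a cell with few shops cannot resemble the reference distribution, so its RDI stays low until more shops in more industries are present.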
A large RDI represents a market area whose industrial composition has become diverse, that is, a market area convenient for multipurpose shopping because businesses in many different industries are concentrated there. On the other hand, a small RDI represents a market area focused on certain industries compared with the local government area as a whole, that is, a market area convenient for comparison shopping because businesses in the same industry are concentrated there. Figure 7 spatially shows the RDI of small businesses in operation in Q4 2018 based on the 1 km grid.
The diversity index of small businesses was high, at least 2.5, around the central market areas of the 17 metropolitan cities and provinces. These major market areas are composed of relatively diverse industries compared with the overall industry composition of the relevant local government. The neighboring areas of these major market areas have values of 1.5-2.5. The farther an area is from the major market areas, the lower its value becomes, falling below 1.5. Figure 8 shows the result of the spatiotemporal visualization of the RDI in the area that connects Eungam Station, the central street area formed along Yeonseoro and Tongillo, and the neighborhood market area in the surrounding residential district, which was used to analyze the RDI changes over time in the market areas of Eunpyeong-gu, Seoul at a microscopic level. Among the four subway stations, Eungam Station and its neighboring areas have high values of more than 3.5, whereas the street market area near Yeonseoro that passes Eungam Station, Gusan Station, and Yeonsinnae Station and the street market area around Tongillo that passes Yeonsinnae Station and Gupabal Station have values of 3.0-3.5. Areas farther away from these have values lower than 3.0. This indicates that the RDI of small businesses changes according to the station influence area, street market area, and neighborhood market area. In other words, the diversity of industrial compositions for small businesses increases from the street market area to the station influence area, which increases the RDI. Moreover, from the street market area to the neighborhood market area, market areas are formed by businesses in the same industries necessary for neighborhood living, which decreases the RDI.

Discussion
Studies analyzing spatiotemporal data define spatial analysis units that remain unchanged over time and then build and analyze spatiotemporal data on those units to address the MAUP that changes over time. (9-12) The MAUP affects results when point-based measures of spatial phenomena, such as the startup, closure, or operation of small businesses, are aggregated into districts. The results of data aggregation are dependent on the analyst's choice of which "modifiable area unit" to use in the analysis. In particular, when census district boundaries are used as the spatial analysis unit to represent the spatiotemporal trend of residential population or employment, the boundaries are also subject to change over time, meaning that the MAUP must be considered when comparing past data with current data.
In this study, we organized the industry data of small businesses into nationwide 1 km grid-based data models to address the MAUP that may vary over time when monitoring the changes in market areas. These data models standardize the spatiotemporal data at fixed units, facilitating the comparison, analysis, and visualization of the changes in the relative diversity of small businesses in the macroscale at the metropolitan/provincial level and in the mesoscale at the city/county/borough level. To analyze the spatiotemporal changes of more microscopic market areas, it is necessary to subdivide the existing spatiotemporal analysis units into smaller units. (12) However, since information loss and data volume may increase geometrically as the spatial analysis unit is subdivided, there is a need for research to prevent information loss by systematically subdividing the spatial analysis unit and efficiently using computing resources.
By applying the RDI, this study showed that the diversity of the industry compositions of small businesses increased toward the station influence areas, whereas it decreased toward the neighborhood market areas, where market areas of the same industry are formed. However, there are insufficient grounds to determine whether the RDI is the most suitable of the five indexes presented by Duranton and Puga (1) for measuring the changes in the relative diversity or specialization of small businesses. Therefore, the grounds for choosing a more suitable index can be established by applying the STDM presented in this study to the four other indexes and comparatively analyzing the results.
The results of this study provide the research foundation to analyze and monitor various external factors affecting the relative diversity of small businesses according to spatiotemporal changes. In other words, in association with the grid-based STDM, we can analyze the changes in the spatiotemporal distribution of the population around the market area, the improvement in accessibility for the population around the market area resulting from the establishment and expansion of urban infrastructures, such as subway stations, and the changes in the market areas of small businesses according to business startups or closures in the same or different industries. The results can be used to analyze and generalize the external factors affecting the startups and closures of small businesses in the spatiotemporal aspect and also as baseline data to establish differentiated small business support policies by identifying regional features.

Conclusions
In this study, we proposed a grid-based STDM applying the RDI to analyze the market areas of small businesses in terms of relative diversity. Because the small business database includes only xy-coordinates, this study is restricted to an STDM that represents events as points. The MAUP affects results when point-based measures of spatial phenomena, such as the startup, closure, or operation of small businesses, are aggregated into districts. The grid map is employed to analyze spatiotemporal data with spatial analysis units that remain unchanged over time. This is one of the feasible solutions to the MAUP. We presented the overall building procedure of the grid-based STDM for small businesses. The conceptual data model was presented to visualize the changes in the relative diversity of small businesses according to time and space in the macroscale at the metropolitan/provincial level and in the mesoscale at the city/county/borough level. This model generates the spatiotemporal data tables by processing three types of source data, namely, the small business database, the 1 km grid, and the administrative boundaries of cities/counties/boroughs, and visualizes the RDI of the small business database by connecting the source data with the spatiotemporal data tables. We classified changes in the market areas of small businesses as periodical events and not moving events in the temporal and spatial dimensions, respectively. The market areas of small businesses are changed by activities, such as the startup, closure, temporary shutdown, and operation of small businesses, while taking up a specific location for a certain time. Thus, we applied spatiotemporal sparse grids, which are the most suitable among the four types of STDM. As the type of spatiotemporal data table, we selected the time-wide format, which is suitable for monitoring the RDI as market areas change over time and for the small business data that are openly shared every quarter.
Through the case study, we demonstrated that applying the grid-based STDM facilitates comparative analysis at the macro- and mesoscales by addressing the MAUP that changes according to time, bringing convenience to the visualization analysis of spatiotemporal patterns, and establishing a grid-based RDI of small businesses with consistent spatial units. The procedural methodologies of designing and building a grid-based STDM and of measuring, comparing, and visualizing the diversity of market areas of small businesses through the case study can be used for spatial and spatiotemporal point pattern analyses in various applications, such as the monitoring of the spatial and temporal patterns of industry in terms of diversity or specialization and the detection of spatial and spatiotemporal industry clusters. In addition, it is necessary to test various sizes of spatial units at the macro-, meso-, and microscales. The results could be used to demonstrate a methodology developed to flexibly customize an STDM in various spatial units.