Comparison between Object-based Method and Deep Learning Method for Extracting Road Features Using Submeter-grade High-resolution Satellite Imagery

1 Department of Civil Engineering, Sangji University, 83, Sangjidae-gil, Wonju-si, Gangwon-do 26339, Republic of Korea
2 Chung-ang Aerosurvey Co., Ltd., 146-1, Tongil-ro, Jongno-gu, Seoul 03180, Republic of Korea
3 Geospatial Information Technology Co., Ltd., 15, Pangyo-ro 228-gil, Bundang-gu, Seongnam-si, Gyeonggi-do 13487, Republic of Korea
4 Inspace Co., Ltd., 96, Gajeongbuk-ro, Yuseong-gu, Daejeon 34111, Republic of Korea


Introduction
The Ministry of Land, Infrastructure, and Transport of the Republic of Korea has been developing two satellites, KAS500-1 and KAS500-2, that are capable of acquiring images with a ground sample distance (GSD) of 0.5 m. The KAS500 satellites are intended to be used solely for land observation purposes. By gaining competence in technologies for land surface monitoring, such as land use (LU) classification and spatial feature extraction, change detection and time-series monitoring, and digital surface model/digital terrain model (DSM/DTM) extraction, the Ministry has been developing software that enables the utilization of the information provided by these satellites.
LU classification and spatial feature extraction technologies are excellent tools for evaluating the environment and its changes. (1) LU and land cover (LC) changes over a certain period are essential to understand the development of human activities within a region and to define the impact of anthropogenic and natural activities. (2) Remote sensing technology, which is similar to LU classification and spatial feature extraction technologies, provides information about the environment on a regional or global scale using time-series data, as well as real-time data. (3,4) Various image classification technologies associated with LU classification and spatial feature extraction have been developed. Threshold values can be applied using a break or boundary threshold value in a single reflectance band or using a derived spectral index or transformed band (5-7) to either distinguish a single object or classify it into multiple classes. There are two techniques involved in this type of classification method: the supervised classification technique, which uses ground verification data, (8,9) and the unsupervised classification technique, which searches the endmembers first. (10) With the unsupervised classification technique, the linear unmixing method utilizes the endmembers to solve the spectrum of the image and classifies the image by detecting the proportion of a feature in each pixel's spectrum. (11,12) Another method of extracting features is the object-based classification method. Object-based classification typically uses the spatial information of a group of pixels, which is recognized as an object. It has been demonstrated that this method is effective primarily for high-resolution satellite imagery, such as ASTER, (13) KOMPSAT-2, (14) QuickBird, (15-17) and WorldView-3 imagery. (18) Object-based analysis has been gaining importance in the field of remote sensing, especially for high-spatial-resolution image processing. (19)
Recently, deep learning, an algorithm that uses artificial intelligence (AI)-based machine learning, has been utilized in diverse image analysis fields. A vast amount of input imagery is needed for machine learning. (20) Machine learning in image analysis has been applied to the development of a semantic segmentation algorithm (21) that distinguishes the desired target from an image by implementing a fully convolutional network (FCN) obtained by performing network surgery on a convolutional neural network (CNN). Recent achievements in machine learning have shown that its performance is similar to or exceeds the decision-making capability of humans. (22) Owing to the advancements in the field of high-resolution remote sensing and the success of semantic segmentation using deep learning in computer vision, extracting a road network from a high-resolution remote sensing image is becoming increasingly popular, and has become a new tool to update the geospatial database. (23) Al-Khudhairy et al. extracted roads using object-based methods. (24) Their method contains three steps: texture information extraction, road extraction, and postprocessing. Compared with unsupervised methods, supervised methods are generally more accurate. (25) These methods, which include support vector machine (SVM), random decision forests, and deep learning, extract the roads on the basis of training using labeled samples. (26) In this study, two solutions, an object-based spatial feature extraction software tool developed using open-source software and the "eCognition Developer" commercial software developed by Trimble, were used to extract road features from satellite imagery, and the feasibility of using the software developed in this study was determined by quality analysis. In addition, a deep-learning-based spatial feature extraction module was developed and evaluated.
Issues with the deep learning technique and needs for its improvement were identified by road feature extraction and quality analysis. Only the images taken by the KOMPSAT-3A satellite were used for the experiment in this study because the KAS500 satellite is still under development and cannot capture images. Figure 1 shows an overview of this study.

Materials and Methods
Currently, the Korea Land Satellite Center of the National Geographic Information Institute of the Republic of Korea is developing the KAS500 satellite, as well as object-based spatial feature extraction software in an open-source environment to maximize the utilization of images obtained by the KAS500 satellite. (27) In this study, both object-based spatial feature extraction software developed in an open-source environment and commercial software were used to extract road features, as shown in Fig. 2, and quality analysis of the results was performed. In addition, road features were extracted using deep learning training to assess the applicability of deep learning techniques to the same task.

Study Area
The area of the target region selected for this study is approximately 6.167 km². The top left corner of the region is at 531054.370 m (N), 281930.629 m (E) and the bottom right corner is at 528257.670 m (N), 284172.949 m (E), in the GRS80 TM coordinate system. This area is located in Wonju-si, Gangwon-do, South Korea (Fig. 3). The altitude above sea level is between 104.173 and 193.257 m.

Satellite images used in road feature extraction
Because the KAS500 satellite is currently under development, satellite images taken by it are not yet available. Therefore, KOMPSAT-3A satellite images, which are expected to have specifications similar to those of the images to be acquired by the KAS500 satellite, were used to conduct this research. High-resolution multispectral data acquired by the KOMPSAT-3A satellite on October 29, 2015 were purchased from the Korea Aerospace Research Institute (KARI) through the Arirang Satellite Image Search and Order System (ASIOS) portal. These level-1G data consist of red (R), green (G), blue (B), near-infrared (NIR), and panchromatic images in the GeoTIFF format. The collected images had a bit depth of 14 bits and were converted to 8 bits for this study. Table 1 shows the specifications of the KOMPSAT-3A satellite images used in this study.
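The text does not state how the 14-bit images were converted to 8 bits. As a minimal sketch, assuming a plain linear rescaling of the full 14-bit range (0 to 16383) onto 0 to 255, the conversion could look as follows; the function name and the clipping behavior are illustrative assumptions, not the procedure used in this study.

```python
def rescale_14bit_to_8bit(pixels, in_max=16383, out_max=255):
    """Linearly map 14-bit digital numbers (0..16383) onto 8 bits (0..255).

    Assumption: a full-range linear stretch. The study may have used a
    different method (e.g., a percentile stretch); this is only a sketch.
    """
    scaled = []
    for value in pixels:
        clipped = min(max(value, 0), in_max)  # guard against out-of-range input
        scaled.append(round(clipped * out_max / in_max))
    return scaled


# Hypothetical sample values from one band
band_14bit = [0, 4096, 8192, 16383]
band_8bit = rescale_14bit_to_8bit(band_14bit)
print(band_8bit)
```

A full-range stretch preserves relative brightness but can compress the contrast of dark scenes, which is one reason a different stretch might be preferred in practice.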

Development of Object-based Spatial Feature Extraction Software Based on Open-source Software

Design of object-based spatial feature extraction software
System for Automated Geoscientific Analyses (SAGA) GIS version 6.4.0 was chosen as the basic platform for developing the object-based spatial feature extraction software based on open-source software. SAGA GIS is open-source software that supports the object-based classification technique, and the level of difficulty for development is low. In addition, it was determined that functions already implemented on the platform could be utilized in this study. Libraries such as GDAL, OpenCV, CXSparse, and the SAGA GIS engine were required to develop the software. The software was designed in two phases: a data creation and visualization phase and a data postprocessing phase. The required functions were designed as shown in Fig. 4. The user interface (UI) of the object-based spatial feature extraction software was designed to show the process of extracting spatial features using satellite images and to provide various attribute properties and layers of information. As shown in Fig. 5, the UI is composed of multiple windows, such as the window that allows the user to verify the layer information of the input data, the main window, and the window that shows the attribute properties.

Development of image segmentation function
If spatial feature extraction is performed using the object-based classification technique, image segmentation must be performed. Image segmentation analyzes the similarity of the pixels that make up the image and creates pixel groups.
Because SAGA GIS was selected as the development platform for this study, the seeded region growing algorithm, which is the image segmentation algorithm used by the SAGA GIS platform, was implemented. The seeded region growing algorithm calculates the similarity among the pixels using the mean and the standard deviation of the pixels surrounding the extracted seed and forms a cluster consisting of pixels with high similarity (Fig. 6).
The image segmentation stores the observed values of the pixels in a table format based on the seed. The pixels are grouped into clusters based on similarity. During this process, pixels with low similarity are corrected. The values of the neighboring pixels are checked, and pixels with low similarity are grouped into clusters (Fig. 7).
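The idea of growing a cluster from a seed can be illustrated with a deliberately simplified sketch. It is not the SAGA GIS implementation: it grows a single region, uses only the running region mean (not the standard deviation), and the `tolerance` parameter and 4-connected neighborhood are assumptions made for illustration.

```python
from collections import deque


def grow_region(image, seed, tolerance):
    """Grow one region from `seed` over a 2-D list of pixel values.

    A 4-connected neighbour joins the region when its value differs from
    the current region mean by at most `tolerance` (hypothetical criterion).
    """
    rows, cols = len(image), len(image[0])
    region = {seed}
    total = image[seed[0]][seed[1]]  # running sum used for the region mean
    frontier = deque([seed])
    while frontier:
        r, c = frontier.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in region:
                mean = total / len(region)
                if abs(image[nr][nc] - mean) <= tolerance:
                    region.add((nr, nc))
                    total += image[nr][nc]
                    frontier.append((nr, nc))
    return region


# Toy image: a dark block (left two columns) next to a bright block
image = [
    [10, 11, 50, 52],
    [12, 10, 51, 53],
    [11, 12, 49, 50],
]
dark = grow_region(image, seed=(0, 0), tolerance=5)
print(sorted(dark))
```

In this toy case the region stops at the boundary between the dark and bright blocks, which is the behavior the segmentation step relies on to form clusters of similar pixels.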

Development of image classification function
Clusters that have high similarity are formed by performing image segmentation, and these clusters need to include parameters for performing the image classification. Polygon-shaped files are created for the clusters that have high similarity. Parameters such as the spectrum average, standard deviation, and spectrum ratio are calculated using the pixels for each band inside the polygon. The parameters are stored as attribute properties in each polygon. In addition, the normalized difference vegetation index (NDVI) and normalized difference water index (NDWI) are determined through the computation of the image in each band, and the average value of the values inside the polygon is inputted as an attribute property. NDVI can identify or monitor vegetation such as crops. (28) NDWI can detect water bodies and wetlands. (29) In addition to the computation using the bands of the satellite images, the average slope inside the polygon, which is calculated using a DTM, and the average value of the normalized digital surface model (nDSM), which is obtained by subtracting DTM from DSM, are entered as attribute properties. All these attribute properties are used as parameters for the image classification. Table 2 shows the algorithms used in this study to create parameters for image classification.
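As a rough illustration of how such attribute properties could be computed for one polygon, the sketch below averages per-pixel NDVI, NDWI, and nDSM values. The NDWI form (green minus NIR over green plus NIR) is an assumption, since the exact definitions are given only in Table 2, and the pixel dictionary layout and function names are hypothetical.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero."""
    return (a - b) / (a + b) if (a + b) != 0 else 0.0


def polygon_attributes(pixels):
    """Average per-pixel NDVI, NDWI and nDSM over the pixels of one polygon.

    Each pixel is a dict with keys red, green, nir, dsm, dtm
    (a hypothetical layout chosen for this sketch).
    """
    n = len(pixels)
    ndvi = sum(normalized_difference(p["nir"], p["red"]) for p in pixels) / n
    # Assumed NDWI form: (green - NIR) / (green + NIR)
    ndwi = sum(normalized_difference(p["green"], p["nir"]) for p in pixels) / n
    # nDSM is the DSM minus the DTM, i.e., height above the terrain
    ndsm = sum(p["dsm"] - p["dtm"] for p in pixels) / n
    return {"ndvi": ndvi, "ndwi": ndwi, "ndsm": ndsm}


# Two hypothetical pixels belonging to the same polygon
pixels = [
    {"red": 50, "green": 60, "nir": 150, "dsm": 112.0, "dtm": 110.0},
    {"red": 100, "green": 100, "nir": 100, "dsm": 111.0, "dtm": 111.0},
]
attrs = polygon_attributes(pixels)
print(attrs)
```

Computing the index per pixel and then averaging follows the order described in the text; averaging the bands first and computing the index afterwards would generally give a slightly different value.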

Development of function that merges image segmentation and classification results
A spatial feature extracted through the image segmentation and classification process consists of several clusters of pixels, so it contains a considerable number of polygons. If polygons of the same class are connected to each other as shown in Fig. 8, a function that merges these polygons into a single polygon is needed. However, the SAGA GIS platform does not support a function for merging the polygons. Hence, a function to merge polygons of the same class was implemented by linking to an open-source database service called PostgreSQL.
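The merge in this study was delegated to PostgreSQL, and the queries are not given in the text. The sketch below is therefore not that implementation; it only illustrates, with a union-find structure, the grouping decision that precedes any geometric union: which connected polygons share a class and should end up in the same merged polygon. The polygon IDs, class labels, and adjacency list are hypothetical inputs.

```python
def merge_groups(classes, adjacency):
    """Group touching polygons of the same class.

    classes:   {polygon_id: class_label}
    adjacency: iterable of (id_a, id_b) pairs of polygons that touch
    Returns a sorted list of sorted ID lists, one list per merged polygon.
    """
    parent = {pid: pid for pid in classes}

    def find(pid):
        while parent[pid] != pid:
            parent[pid] = parent[parent[pid]]  # path halving
            pid = parent[pid]
        return pid

    for a, b in adjacency:
        if classes[a] == classes[b]:  # only same-class neighbours are merged
            parent[find(a)] = find(b)

    groups = {}
    for pid in classes:
        groups.setdefault(find(pid), set()).add(pid)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical chain of five polygons; polygon 3 separates two road runs
classes = {1: "road", 2: "road", 3: "building", 4: "road", 5: "road"}
adjacency = [(1, 2), (2, 3), (3, 4), (4, 5)]
merged = merge_groups(classes, adjacency)
print(merged)
```

Each resulting group would then be dissolved into a single polygon by the geometry engine of choice.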

Accelerating satellite image processing
When a satellite image with a size of 9041 × 10771 pixels was input, it took approximately 40 s to load the image onto the SAGA GIS development platform. Additionally, performing the image segmentation using the seeded region growing algorithm took more than 3 h, which is very inconvenient from the software user's perspective. Therefore, a parallel processing technique that utilizes Intel's Advanced Vector Extensions 2 (AVX2), a single-instruction, multiple-data (SIMD) instruction set, was employed to accelerate data processing during the spatial feature extraction (Fig. 9). For processing with AVX2, the two-dimensional image data are serialized into one-dimensional data.
By using AVX2 during the satellite image loading process, the image data were serialized into one-dimensional data, and the image loading time was reduced to approximately 1 s. In addition, the image segmentation process, which is the most time-consuming part of the spatial feature extraction process, took between 20 min and 1 h, depending on the value of the scale factor.
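The serialization step mentioned above can be pictured as a row-major flattening, in which each pixel keeps a computable position in a contiguous one-dimensional buffer. The sketch below shows only this index mapping; it does not use AVX2 itself, and the row-major order and function names are assumptions, since the memory layout used by the software is not described.

```python
def flatten_row_major(image):
    """Serialize a 2-D list of pixel values into one contiguous 1-D list."""
    return [value for row in image for value in row]


def flat_index(row, col, width):
    """Position of pixel (row, col) in the row-major 1-D buffer."""
    return row * width + col


# Toy 2 x 3 image
image = [
    [1, 2, 3],
    [4, 5, 6],
]
flat = flatten_row_major(image)
print(flat, flat[flat_index(1, 2, width=3)])
```

A contiguous layout of this kind is what allows a SIMD instruction to load and process several neighboring pixel values at once.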

Road Feature Extraction Using Object-based Method
To analyze the quality of the object-based spatial feature extraction software developed in an open-source environment, both this software and the commercial software were used to extract road features in the target area of this study, and quality analysis of the results was performed. In this study, roads were defined as paved surfaces on which vehicles can be driven.

Establishing spatial feature extraction process using object-based method
Instead of the existing serial process for creating a thematic map, a parallel process for extracting individual features and utilizing the extracted features by classification item was applied to the object-based spatial feature extraction software developed in this study.
The serial process extracts features in sequential order from the largest to the smallest. It takes a long time to extract all the features, and it has the disadvantage that features not needed by the user must also be extracted before a small feature can be extracted. In contrast, the parallel process utilizes a specific feature as spatial information and extracts only that feature. The parallel process is faster than the serial process because only the desired features are selected for extraction and unneeded features are excluded, and it is expected that this process can be automated. (30) Therefore, the parallel process was used in this study. Figure 10 shows a comparison between the serial and parallel processes.

Road feature extraction using object-based spatial feature extraction software based on open-source software and commercial software
The object-based spatial feature extraction software developed based on open-source software uses the seeded region growing algorithm during the image segmentation and uses the size of the region clustered around the seed as the parameter. The commercial software, Trimble's eCognition Developer, version 9.2.1, utilizes a multiresolution segmentation algorithm, which is the most commonly used algorithm for image segmentation. The scale, shape, and compactness are used as parameters, and ideally, the same threshold values for the shape and compactness are used. In addition, it is most efficient to apply the algorithm after setting the threshold values of the shape parameter and compactness parameter to 0.1 and 0.5, respectively. (31) Image segmentation was performed for both the object-based spatial feature extraction software and the commercial software, such that the sizes of the polygons created after the image segmentation for both software tools were similar. In addition, image classification for road feature extraction was performed after configuring the parameters and the threshold values of the parameters to be identical for both software tools, as shown in Fig. 11.
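Threshold-based classification of the segmented polygons can be sketched as a set of range checks on the attribute properties. The parameters and threshold values actually used are those of Fig. 11 and are not reproduced here; every attribute name and number below is a hypothetical placeholder chosen only to make the example run.

```python
# Hypothetical rule set: (attribute, minimum, maximum).
# These are NOT the parameters or thresholds shown in Fig. 11.
ROAD_RULES = [
    ("ndvi", -1.0, 0.2),   # little vegetation
    ("ndwi", -1.0, 0.0),   # not water
    ("ndsm", 0.0, 1.0),    # close to ground level (m)
    ("slope", 0.0, 10.0),  # gentle slope (degrees)
]


def is_road(attributes, rules=ROAD_RULES):
    """Return True when every attribute lies inside its allowed range."""
    return all(low <= attributes[name] <= high for name, low, high in rules)


# Two hypothetical polygons described by their attribute properties
paved = {"ndvi": 0.05, "ndwi": -0.20, "ndsm": 0.3, "slope": 2.0}
field = {"ndvi": 0.60, "ndwi": -0.50, "ndsm": 0.2, "slope": 3.0}
print(is_road(paved), is_road(field))
```

Keeping the rules in a single table makes it straightforward to apply identical parameters and thresholds in two different tools, which is the comparison condition described above.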

Development of Deep-learning-based Spatial Feature Extraction Module and Road Feature Extraction
The deep learning module was designed to use the Deep U-net model to train and extract spatial features. Implementation and development of the module were carried out using TensorFlow as the development platform.

Deep U-net model for data learning and spatial feature extraction
Because the purpose of applying deep learning in this study was to extract the region of features corresponding to the road, semantic segmentation, which classifies each pixel from the satellite images, was used. Because deep learning training must be carried out first to perform the semantic segmentation, the Deep U-net model was implemented using TensorFlow. Compared with the existing U-net model, (32) which can perform accurate semantic segmentation using a small amount of learning data, the Deep U-net model used in this study can reduce losses. As shown in Fig. 12, the Deep U-net model consists of the DownBlock and UpBlock components, so it has a symmetrical structure like the U-net model. This model can solve the problem of loss errors increasing as the network deepens. Furthermore, the Deep U-net model demonstrates excellent performance when performing segmentation of complex images. (33)

Collection of deep learning training data
In this study, images from the WorldView-2 and WorldView-3 satellites were used for deep learning training. Considering the regional characteristics and the specifications of the satellite images, conducting deep learning training using images acquired over South Korea by the KOMPSAT-3A satellite would yield the best results. However, there was an insufficient amount of satellite image data collected over South Korea available from the KOMPSAT-3A satellite, and it was also difficult to obtain these data. Therefore, images from the satellites WorldView-2 and WorldView-3 were collected from SpaceNet on Amazon Web Services (AWS), because it was easy to collect these images and because the specifications of the images taken by these satellites are similar to those of the KOMPSAT-3A satellite images. Table 3 shows the spectral region of each band for the WorldView-2 and WorldView-3 satellite images, as well as the KOMPSAT-3A satellite images. Table 4 shows information about the WorldView-2 and WorldView-3 satellite images that were collected for use in the deep learning training.

Deep learning training through preprocessing of training data and road feature extraction
True values corresponding to roads in the satellite images should be learned to extract road features through semantic segmentation. The true values of training information for the roads were generated by digitizing the regions corresponding to the roads on the WorldView-2 and WorldView-3 images, as shown in Fig. 13.
For the deep learning training, road mask data were generated through digitizing using the satellite images. They were then cropped to 256 × 256 pixels using a sliding window to create training data sets. A total of 8000 data sets were created, including sets for red, green, blue, and near-infrared band images and road mask data. The total duration of the deep learning training was 126 h, and the training was conducted after setting the number of learning cycles to 600 epochs for the learning data sets. To extract the road features, the training information was applied to the KOMPSAT-3A satellite images of the target region for this study.
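The sliding-window cropping can be sketched as follows. The window stride is not stated in the text, so a non-overlapping stride equal to the tile size is assumed here, and a single band stands in for the four bands; the function names are illustrative.

```python
def window_origins(height, width, tile=256, stride=256):
    """Top-left corners of all tile x tile windows that fit in the raster."""
    return [
        (top, left)
        for top in range(0, height - tile + 1, stride)
        for left in range(0, width - tile + 1, stride)
    ]


def crop(raster, top, left, tile=256):
    """Cut one tile x tile window out of a 2-D list."""
    return [row[left:left + tile] for row in raster[top:top + tile]]


def make_training_pairs(image, mask, tile=256, stride=256):
    """Pair each image tile with the road-mask tile at the same position."""
    height, width = len(mask), len(mask[0])
    return [
        (crop(image, top, left, tile), crop(mask, top, left, tile))
        for top, left in window_origins(height, width, tile, stride)
    ]


# Toy 4 x 4 band and mask, cropped into 2 x 2 tiles
image = [[r * 4 + c for c in range(4)] for r in range(4)]
mask = [[1 if c == 1 else 0 for c in range(4)] for r in range(4)]
pairs = make_training_pairs(image, mask, tile=2, stride=2)
print(len(pairs), pairs[0])
```

A stride smaller than the tile size would produce overlapping tiles and therefore more training samples from the same imagery.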

Results of road feature extraction for target region of study
In this study, the road objects extracted with the software developed based on open-source software, the commercial software, and deep learning were compared. The software developed in this study allowed indices such as NDVI and NDWI to be calculated, and the parameters generated by these calculations were used for object extraction.
The road features extracted using the object-based spatial feature extraction software, which was developed based on open-source software, and the commercial software were created as binary data, as shown in Figs. 14(a) and 14(b). Data of this type have a value of either 0 or 1. A value of 0 indicates that the extracted feature is not a road, and a value of 1 indicates that the extracted feature is a road. In addition, regions that are not roads are marked in white, and the regions that correspond to the roads are marked in black. The results showed that the road features extracted by the two methods were similar.
The deep-learning-based spatial feature extraction module used the completed deep learning training results to extract the road features in the target region of this study, as shown in Fig. 14(c). The results of the road feature extraction using deep learning were generated as raster data with values between 0 and 1. The closer the value is to 1, the more probable it is that the extracted feature corresponds to a road. A value of 1 indicates that the extracted feature is definitely a road, and a value of 0 indicates that it is definitely not a road. The road features extracted by deep learning had different shapes from those extracted by the object-based classification methods.
The roads extracted by the object-based classification method and the deep learning method were compared at the same location. The results showed that the roads extracted with the object-based spatial feature extraction software developed in this study and with eCognition were similar, whereas the deep learning method extracted roads clearly only where the roads were broad, as shown in Fig. 15.
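To compare a raster of values between 0 and 1 with binary results, the probabilities have to be converted to 0 or 1 at some point. The text does not state how this was done; one plausible approach, shown here purely as an assumption, is a fixed cutoff of 0.5.

```python
def binarize(prob_raster, threshold=0.5):
    """Convert road probabilities (0..1) to binary road labels (0 or 1).

    The 0.5 cutoff is an assumed default, not a value taken from the study.
    """
    return [[1 if p >= threshold else 0 for p in row] for row in prob_raster]


# Hypothetical 2 x 3 probability raster
probabilities = [
    [0.90, 0.50, 0.49],
    [0.00, 1.00, 0.20],
]
labels = binarize(probabilities)
print(labels)
```

Lowering the cutoff would label more pixels as road, which tends to raise recall at the cost of more false positives.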
The extraction results revealed a problem with each method. With the object-based method, roads covered by shadows were not properly extracted. With the deep learning method, narrow roads were not extracted because the training used satellite images of overseas regions.

Quality analysis of road feature extraction results
In the quality analysis of the road feature extraction results, a confusion matrix was used to evaluate the performance of a binary classification technique that checks whether the extracted road feature is equal to an actual road feature. The confusion matrix assesses the quality of extraction results using true positive (TP), false negative (FN), false positive (FP), and true negative (TN) classifications. TP is the number of road pixels correctly extracted as road, and TN is the number of non-road pixels correctly left unextracted. FP is the number of non-road pixels incorrectly extracted as road, and FN is the number of road pixels incorrectly left unextracted. Recall indicates the capability of the method to extract actual road pixels, and accuracy indicates the ratio of the correctly classified pixels to the entire set of pixels. (34) On the basis of the ground truth of the target region in this study shown in Fig. 16, the quality analysis involved using the confusion matrix to calculate the recall and accuracy of the road features extracted using the developed object-based spatial feature extraction software, the commercial software, and the deep-learning-based module using Eqs. (1) and (2).
Recall = TP / (TP + FN) (1)
Accuracy = (TP + TN) / (TP + FP + FN + TN) (2)
The quality analysis was conducted with the confusion matrix using the extracted road features and the ground truth. The results showed that the roads extracted using the object-based spatial feature extraction software were represented with 83.9% accuracy and 50.2% recall. The roads extracted using the commercial software were represented with 84.1% accuracy and 54.0% recall. The roads extracted using the deep-learning-based module were represented with 88.6% accuracy and 29.7% recall (Tables 5 and 6).
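The recall and accuracy calculation can be sketched with a small pixel-wise comparison of an extracted binary raster against the ground truth. The rasters below are toy data, not values from Tables 5 and 6, and the function names are illustrative.

```python
def confusion_counts(predicted, truth):
    """Count TP, FP, FN and TN over two binary rasters of equal shape."""
    tp = fp = fn = tn = 0
    for p_row, t_row in zip(predicted, truth):
        for p, t in zip(p_row, t_row):
            if p == 1 and t == 1:
                tp += 1
            elif p == 1 and t == 0:
                fp += 1
            elif p == 0 and t == 1:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def recall(tp, fn):
    """Share of actual road pixels that were extracted."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(tp, fp, fn, tn):
    """Share of all pixels that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


# Toy rasters: 4 road pixels out of 20, only one of them extracted
truth = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
predicted = [
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
tp, fp, fn, tn = confusion_counts(predicted, truth)
print(recall(tp, fn), accuracy(tp, fp, fn, tn))
```

In this toy case the recall is 0.25 while the accuracy is 0.85, because non-road pixels dominate the scene. The same effect may help explain how a method can combine a high accuracy with a low recall.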

Conclusions
In this study, quality analysis was conducted on road features extracted using (1) an object-based spatial feature extraction software tool developed using open-source software, (2) commercial software, and (3) a module developed based on deep learning. The accuracy of the extracted road features for all three methods was over 80%.
Image classification for road feature extraction was performed after configuring the parameters and the threshold values of the parameters to be identical for both software tools. The road feature extraction recalls for the object-based spatial feature extraction software and the commercial software were similar, both being approximately 50%. The recall was as low as 50% because roads in areas covered by shadows could not be extracted. At present, the developed software performs at a level similar to that of the commercial software; because this research is ongoing, the developed software is expected to become capable of outperforming the commercial software.
AVX2 was used in the object-based spatial feature extraction software to improve the image processing speed. As a result, the image processing time, which used to be more than 3 h, was reduced to between approximately 20 min and 1 h. Hence, it is anticipated that the object-based spatial feature extraction software developed in this study will be attractive for widespread use.
The module developed based on deep learning demonstrated a very low recall of 29.7%. This can be attributed to the fact that the satellite images used in the deep learning training were acquired over overseas regions instead of South Korea, and the road conditions in the overseas regions and in South Korea are different. In addition, it was determined that the recall was low because the deep learning training and the road feature extraction used images from different satellites. Through continued study and the use of images taken by the same satellite over South Korea for both deep learning training and spatial feature extraction, the spatial feature extraction recall of the deep-learning-based module can be improved.