Land Cover Classification of Imagery from Landsat Operational Land Imager Based on Optimum Index Factor

With over four decades spent collecting spaceborne moderate-resolution imagery, Landsat is the longest-running remote sensing mission in the world and has found various applications, with land cover mapping a long-standing one in research around the world. Landsat 8 continues the legacy of previous Landsat systems with a new Operational Land Imager (OLI) sensor that has high spectral resolution and an improved signal-to-noise ratio for better characterization of land cover. With improved quality, data size also increases. Because research on adjusting data size is limited, it is necessary to explore robust land cover classification techniques that produce accurate maps with more or fewer inputs. The Optimum Index Factor (OIF) is a statistical value that can be used to select the combination of three bands in a satellite image that carries the highest amount of information. In this study, we explore the land cover classification of OLI imagery based on the OIF. Two test sites were selected around the hilly regions of Korea, and three composites were prepared for each: the OLI original composite, the first-rank OIF composite, and the OLI original composite with the sum derivatives of the top-three OIF ranked composites. These three composites were classified with the well-known Spectral Angle Mapper (SAM) and Support Vector Machine (SVM) classifiers. The results were then analyzed and compared on the basis of producer's accuracy, user's accuracy, overall accuracy, and kappa coefficient. The results show that the first-ranked OIF three-band composite gives a similar classification accuracy with SVM and a slightly lower one with SAM, while the ten-band composite of OLI original bands and the sum derivatives of the top-three OIF ranked composites gives the same result or a small improvement with SVM classification. OIF-derivative composites can be useful in classification problems depending on whether the minimum amount of data for a similar result or more data to achieve higher accuracy is preferred.


Introduction
The era of satellite remote sensing began after the successful launch of Sputnik 1, the first artificial satellite, by the former Soviet Union on October 4, 1957. (1) Since then, many remote sensors with different capabilities have provided revolutionary scientific insights into the Earth's surface for various purposes. Remotely sensed data have been widely used in applications such as agriculture, disaster monitoring, forest and vegetation monitoring, hydrology, and land use/cover change.
Since 1972, Landsat satellites have provided imagery of the Earth's surface through their sensors. (2) Owing to the consistent, cross-calibrated set of records, (3) Landsat imagery has maintained a tradition of land cover mapping in many research studies. (4) Landsat 8 is the latest addition that maintains the legacy of previous Landsat systems. Its new Operational Land Imager (OLI) sensor, with high spectral resolution and an improved signal-to-noise ratio, (5,6) provides high-quality imagery for land cover mapping, enabling better characterization of the Earth's surface. (7) The high image quality also increases the image size, which raises the computation cost of classification tasks.
Lu and Weng carried out a detailed survey on classification methods to improve performance. (8) In recent years, various techniques have been used to produce land cover maps using OLI imagery. (9)(10)(11)(12)(13)(14)(15) These studies have contributed significantly to the field but have been limited to classification results and accuracy comparisons. Very limited focus has been on improving accuracy by altering original data, i.e., reducing bands or adding derivative bands. Moreover, comparisons of classifiers for complex urban greenery, vinyl house farmlands, mixtures of coniferous and deciduous forests, and other land cover types are also limited. Hence, it is necessary to explore robust land cover classification techniques with adjusted data inputs that produce accurate maps.
The Optimum Index Factor (OIF) is a statistical value that can be used to select, out of all possible three-band combinations, the combination that has the highest amount of 'information', i.e., the highest sum of standard deviations, with the least amount of duplication (lowest correlation among band pairs). (16) The OIF can provide a minimum band composite with much information, and the sum of these composites could give extra information. To test this, in this study, we explore the land cover classification of OLI imagery in two test sites around the hilly regions of Korea based on OIF scores. The original OLI composite (Comp7), the first-ranked OIF composite (Comp3), and the OLI original composite with the sum derivatives of the top-three OIF ranked composites (Comp10) were classified using the Spectral Angle Mapper (SAM) and Support Vector Machine (SVM). To examine the robustness of the band composites, the accuracies of the results of the new composites based on ground truth were compared with those of the original OLI composite.

Study area
Two study areas in South Korea were selected for this study (Fig. 1). The first study area is located in Hwasun county, Jeollanam province. It is situated between 35°0'17.15" and 35°7'39.08" N latitude and 126°54'41.89" and 127°2'35.08" E longitude, covering an area of approximately 167.05 km². The elevation in the area ranges from 30 to 1177 m, and the area is mostly hilly with forest, agricultural lands, a small urban area with a small river, and a few small water bodies.
The second study area is located slightly northeast of the first in Gumi City, Gyeongsangbuk province. It is situated between 36°4'16.19" and 36°11'34.06" N latitude and 128°16'21.83" and 128°24'30.97" E longitude, covering an area of approximately 165.16 km². The elevation ranges from 25 to 958 m, and the area has hilly forest, agricultural land, dense urban areas with a large river, and a few small water bodies. The urban area is a dense residential and industrial built-up area with some rural villages.
The two areas represent the typical land cover in South Korea with dense urban areas, mixed forests, and farmlands with vinyl houses. The selection of these was based on the availability of Landsat imagery and similar-date high-resolution imagery in Google Earth Pro (GEP) for validation purposes.

Data used
For this study, Landsat 8 OLI images (one for each study area) were selected. Both images were downloaded from the United States Geological Survey (USGS) EarthExplorer platform and were the highest quality Level-1 Terrain corrected products, i.e., L1T. The image details are shown in Table 1. The obtained GeoTIFF images were converted from digital numbers to Top of the Atmosphere (TOA) radiance by using the Radiometric Calibration preprocessing tool in ENvironment for Visualizing Images (ENVI) v.5.2 based on the information in the MTL header files. During radiometric calibration, the whole scene was subset to the study area based on the provided shapefile. Then, the subsetted TOA radiance scenes were transformed to reflectance using the Fast Line-of-sight Atmospheric Analysis of Hypercubes (FLAASH) tool in ENVI. For the average elevation parameter in FLAASH, the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) Global Digital Elevation Model (GDEM) of 1 arc-sec resolution downloaded from the USGS EarthExplorer was used.
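The conversion from digital numbers to TOA radiance is a per-band linear rescaling using the RADIANCE_MULT_BAND_n and RADIANCE_ADD_BAND_n coefficients stored in the MTL file. The following is a minimal sketch of that rescaling only; it is not the ENVI tool, and the function name, coefficient values, and pixel values are hypothetical, chosen only for illustration.

```python
import numpy as np

def dn_to_toa_radiance(dn, mult, add):
    """Rescale digital numbers to TOA radiance: L = mult * DN + add.

    mult and add stand for the per-band RADIANCE_MULT_BAND_n and
    RADIANCE_ADD_BAND_n values read from the scene's MTL file.
    """
    dn = np.asarray(dn, dtype=np.float64)
    return mult * dn + add

# Hypothetical 2 x 2 band and coefficients (not taken from the study scenes).
dn = np.array([[5000, 10000], [15000, 20000]], dtype=np.uint16)
radiance = dn_to_toa_radiance(dn, mult=0.01, add=-50.0)
print(radiance)
```

In practice the coefficients differ per band, so the function would be applied band by band before atmospheric correction.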
Four types of land cover classes were chosen for this study. A detailed description of each class is shown in Table 2. Additionally, for the purpose of training and accuracy assessment, the high-resolution images available in GEP were used as the ground truth data. The availability of the high-resolution images in GEP corresponding to the OLI imagery was also one of the major reasons for the selection of the study area as well as the dates. The corresponding images available in GEP for Hwasun and Gumi are October 24 and 30, 2013 and January 17, 2015, respectively, which are the nearest possible dates for similar land-cover identification. These two dates also represent the typical season and variation in land cover in Korea.
On the basis of knowledge of the areas and the distribution of land cover, 3 × 3 pixel polygons that were as pure as possible and represented all classes were selected as the training input for classification (Fig. 1). For validation, however, random points were used and crosschecked with GEP and its high-resolution images. Figure 2 shows the validation points in the first-rank OIF images. The numbers of training polygons and validation points used are shown in Table 2. Figure 3 shows the overall schematic of the study. After the preprocessing and atmospheric correction of the subsetted Landsat OLI imagery, each band of the images was exported to an 8-bit gray-scale GeoTIFF file in ENVI Classic and later imported to the Integrated Land and Water Information System (ILWIS) v.3.31 Academic to form a map list, from which the OIF scores were calculated for each study area.

Methodology
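In order to calculate the OIF, a minimum of three raster maps are required within the same value domain and the same georeference. The inputs for the calculation of the OIF are the standard deviation of each band and the correlation coefficients between band pairs in the composite image. First, the possible three-band combinations are formed, and the OIF of each combination of bands i, j, and k is calculated as

$$\mathrm{OIF} = \frac{\mathrm{Std}_i + \mathrm{Std}_j + \mathrm{Std}_k}{|\mathrm{Cor}_{ij}| + |\mathrm{Cor}_{ik}| + |\mathrm{Cor}_{jk}|}, \qquad (1)$$

where $\mathrm{Std}_i$ is the standard deviation of band i and $\mathrm{Cor}_{ij}$ is the correlation coefficient between bands i and j. Finally, the OIF values are ranked. Using the rank, one can create a three-band composite of a satellite image for maximum visual information.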
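The OIF ranking can be sketched in a few lines of NumPy. This is an illustrative sketch, not the ILWIS implementation used in the study: it assumes the OIF is the sum of the three band standard deviations divided by the sum of the absolute pairwise correlation coefficients, and the function name `rank_oif` and the random demonstration data are hypothetical.

```python
from itertools import combinations

import numpy as np

def rank_oif(bands):
    """Rank all three-band combinations by OIF, highest first.

    bands: array of shape (n_bands, rows, cols) in a common value domain.
    Returns a list of (oif, (i, j, k)) tuples with zero-based band indices.
    """
    flat = np.asarray(bands, dtype=np.float64).reshape(len(bands), -1)
    std = flat.std(axis=1)    # one standard deviation per band
    cor = np.corrcoef(flat)   # band-by-band correlation matrix
    scores = []
    for i, j, k in combinations(range(flat.shape[0]), 3):
        numerator = std[i] + std[j] + std[k]
        denominator = abs(cor[i, j]) + abs(cor[i, k]) + abs(cor[j, k])
        scores.append((numerator / denominator, (i, j, k)))
    return sorted(scores, reverse=True)

# Hypothetical seven-band image filled with random values (not study data).
rng = np.random.default_rng(0)
demo = rng.normal(size=(7, 20, 20))
ranking = rank_oif(demo)
print(len(ranking), ranking[0][1])
```

Seven bands give 35 three-band combinations, so the ranking has 35 entries; the first entry corresponds to the first-rank composite.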
On the basis of the ranked composites, we create two new types of composite image out of the original OLI bands. The first (Comp3) is a three-band composite of the first-rank OIF, and the other (Comp10) is a ten-band composite of the seven OLI original bands and three derivative bands created by summing the top-three OIF ranked composites, i.e., OIF1 (sum of 3 bands), OIF2 (sum of 3 bands), and OIF3 (sum of 3 bands). The Comp3 composite represents the minimum set of bands that could be used for the classification, whereas Comp10 represents the addition of informative bands for the evaluation of classification accuracy. Both the Comp3 and Comp10 composites are compared with the original composite (Comp7) to assess whether they improve the accuracy of land cover mapping. For the classification of the band composites, two of the most widely used classifiers available in the ENVI software, namely, SAM and SVM, are used.
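A minimal sketch of how the two composites could be assembled from a seven-band stack follows. It assumes each derivative band is the per-pixel sum of the three bands in one ranked combination; the function name `build_composites`, the band indices, and the toy array are hypothetical and do not reproduce the study's ranks.

```python
import numpy as np

def build_composites(bands, top3):
    """Build the three-band and ten-band composites.

    bands: array of shape (7, rows, cols) holding the original bands.
    top3:  three (i, j, k) index tuples, ordered by OIF rank.
    """
    bands = np.asarray(bands, dtype=np.float64)
    # Comp3: the three bands of the first-rank combination.
    comp3 = bands[list(top3[0])]
    # One derivative band per ranked combination: per-pixel sum of its bands.
    sums = np.stack([bands[list(combo)].sum(axis=0) for combo in top3])
    # Comp10: the seven original bands followed by the three sum bands.
    comp10 = np.concatenate([bands, sums], axis=0)
    return comp3, comp10

# Hypothetical 7-band, 2 x 2 image and hypothetical top-three combinations.
toy = np.arange(7 * 2 * 2, dtype=np.float64).reshape(7, 2, 2)
comp3, comp10 = build_composites(toy, [(0, 1, 2), (1, 2, 3), (4, 5, 6)])
print(comp3.shape, comp10.shape)
```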
SAM is a physically based spectral classification method that allows quick mapping of the spectral similarity of image spectra to reference spectra on the basis of the angle between them, treating them as vectors in n-dimensional space. (17) The reference spectra can be measured in a laboratory, measured in the field, or obtained directly from the image. SAM is not affected by solar illumination factors, because the angle between two vectors is independent of the vector length. (17,18) In ENVI, SAM assigns the angles to output channels, and then every pixel is allocated to the class defined by the reference spectrum. The class that is assigned to each pixel is saved in the output channel. (19) It has been widely used in the classification of land cover in remote sensing. (20)(21)(22)
SVM is a nonparametric machine learning algorithm used for classification and regression in remote sensing. It is based on minimizing the structural risk and maximizing the separation margin. (23) The success of SVM depends on how well it is trained. SVM often yields good classification results from complex and noisy data and is thus often used as the reference state-of-the-art method for comparison of object identification and classification. (24)(25)(26)(27)(28)
In order to assess the accuracy of the classifications, the overall accuracy (OA), kappa coefficient (kappa), producer's accuracy (PA), and user's accuracy (UA) were used. These statistics are the most widely used in remote sensing classification. The OA is defined as the ratio of the total number of correctly classified pixels to the total number of ground truth reference pixels, whereas the PA is the complement of the omission error and the UA is the complement of the commission error. (20,25,29) The kappa coefficient uses all elements in the error matrix and ranges from −1 to +1, where 0 represents the amount of agreement that can be expected from random chance and 1 represents perfect agreement between the classification and the reference data. (30)

Results and Discussion
The map list formed in ILWIS was used to create the correlation matrix and standard deviation of each OLI band as shown in Table 3. By using values from Table 3 and Eq. (1), the three-band composite OIF scores, and thus ranks, were derived. The ranks for both study areas are shown in Table 4. In Fig. 2, the first-rank composites are shown in the Red-Green-Blue color composite. In both images, land covers (built-up, forest, land, and water) are distinctly separate from each other.
After forming the ten-band composite using the sum of the top-three OIF ranked bands, the training polygons were analyzed to see the effectiveness of the newly formed bands using the plots of the mean reflectance in each class. Figure 4 shows that in both study areas, the composites newly formed by summing have high separability and are important in the classifications.
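The per-class mean reflectance used for such plots can be computed directly from the labeled training pixels. This is a minimal sketch under the assumption that the training pixels have been extracted into a samples-by-bands array; the function name `class_mean_spectra`, the class identifiers, and the values are hypothetical.

```python
import numpy as np

def class_mean_spectra(pixels, labels):
    """Mean spectrum per class.

    pixels: array of shape (n_samples, n_bands) of training-pixel values.
    labels: array of shape (n_samples,) with one class id per pixel.
    Returns a dict mapping each class id to its mean spectrum.
    """
    pixels = np.asarray(pixels, dtype=np.float64)
    labels = np.asarray(labels)
    return {
        class_id.item(): pixels[labels == class_id].mean(axis=0)
        for class_id in np.unique(labels)
    }

# Hypothetical two-band training pixels for two classes.
means = class_mean_spectra([[1, 2], [3, 4], [10, 20]], [0, 0, 1])
print(means)
```

Classes whose mean spectra differ clearly in the derivative bands are the ones those bands help to separate.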
After applying the classifiers to each of the composite images, the resulting maps were as shown in Figs. 5 and 6 for the Hwasun and Gumi study areas, respectively. The results were neither aggregated nor filtered for better comparison. The results differed between the two study areas and between the methods. Visually, owing to the limited resolution of the satellite imagery, the results show a great deal of salt-and-pepper effects, and only with careful observation can a difference in the classes of pixels be detected. The general misclassification was between water and the shadows of tall buildings; between sparse vegetation, green roof buildings, cropland, and forest; between bare land and urban areas; between factories with blue roofs and vinyl houses; and so on. The water area in the river was not large enough to be continuously mapped, and small water bodies were highly confused with the growing vegetation surrounding them. Similarly, limitations were also apparent in the visualization of roads as linear features. Overall, both the Comp3 composite with only three bands and the Comp10 composite with ten bands produce maps that are visually similar to those from the original OLI composite.
As shown in Table 5, the percentages of pixels classified by each method were very inconsistent among the built-up, forest, and open land classes. The highest inconsistency was between the open land and built-up classes in the Hwasun area, whereas it was between the open land and forest classes in the Gumi area. This could be the result of the dominant class and limitations in the training data. In the Hwasun area, the SVM method produced more built-up pixels than SAM, assigning them to the land class, whereas in the Gumi area, more land pixels were assigned to the forest class. The reason for the inconsistency is the seasonal variation in the study area images. The Hwasun area was covered more by green forest, and the farmland with growing crops showed much similarity to the impervious built-up areas. In the case of the Gumi area, because the image was from winter, the deciduous forest was sparse and very similar to farmland with growing crops. This could also be the effect of a dominant class feature. In terms of the band composites, the results of Comp7 are very similar to those of Comp3 using SAM, whereas the SVM results from Comp7 were similar to those from Comp10. This shows that, depending on the classifier and study area, both Comp3 and Comp10 can produce results comparable to those of the original OLI composite, i.e., Comp7.
The results were validated with ground truth to see how the randomly sampled pixels were classified. Table 6 shows the classification accuracies of the classifiers for all three composite images of the Hwasun and Gumi areas. The accuracies differed among the classifiers and the study areas. In the Hwasun area, the overall accuracy was the highest (89.6%) for Comp7 with the SVM classification, and the PA and UA reached 100% for water only. In the Gumi area, the SVM classification of Comp10 achieved the highest accuracy of 92.8%, and the SVM classifier for Comp10 forest samples and the SAM classifier for Comp10 water samples achieved the highest PA and UA. From the variation in accuracies in Table 6, we can see that Comp3 has reduced OA while Comp10 has improved with reduced kappa coefficient compared with Comp7. Comp3 shows a similar but reduced change in OA for SAM while showing improvements in the Hwasun and Gumi regions. However, in Comp10, the OA has improved or remained nearly identical. Also, SVM shows better OA and kappa coefficient results than SAM for both composites. The similarity or improvement of results based on the OIF scores and composites can be very useful depending on the classifier, seasonality, and study area. For transferring data or comparing large-volume data, the Comp3 bands are useful, whereas improved accuracy for complex land cover can be achieved by the addition of derivative bands. The results also show that SAM does not work well with fewer bands and shows less improvement with the addition of bands. In contrast, SVM gives a similar result even with fewer bands and is improved by the addition of derivative bands.

Conclusions
With the improvement in data quality brought by new remote sensors, it is essential to explore improvements in land cover classification accuracy against the cost of data size. In this study, we explored the classification accuracy of Landsat OLI imagery derivatives in two test sites around the hilly regions of Korea based on OIF scores using two well-known classifiers, SAM and SVM. The Comp3 and Comp10 composites were compared with the Comp7 composite. On the basis of the validation by ground truth, the results were compared in terms of PA, UA, and OA along with the kappa coefficient. It can be concluded that Comp3, with only three bands, shows a similar classification accuracy with SVM and a slightly lower one with SAM. In the case of Comp10, the composite shows the same results or an improvement in the SVM classification. OIF derivative composites can be useful for classification problems depending on whether the minimum amount of data for a similar result or more data to achieve higher accuracy is preferred. Improving land cover mapping accuracy is beneficial to authorities for better analysis of the environment, but further work is required to validate our findings for different cases and variations in sensors, seasons, and classifiers.