Spatial Distribution Characteristics of Species Diversity Using Geographically Weighted Regression Model

The objective of this study is to evaluate the spatial distribution patterns of species diversity at different spatial scales, focusing on the Baekdudaegan Protected Area, which is a biodiversity hotspot in the Republic of Korea. The tree species diversity index (Shannon–Weaver index; H ′) was calculated using tree species data from a 1:5k forest-type map, and the spatial analysis was performed with a 1 × 1 km 2 grid. Ten factors were selected to estimate the impact of topographic (elevation, slope, northern slope, curvature, wetness, and relief) and geographic (distances from water, road, forest road, and urban area) factors on H ′ using the ordinary least squares (OLS) and geographically weighted regression (GWR) models. H ′ increased with the spatial scale. Also, the coefficient of determination ( R ²) of the GWR and OLS models increased proportionally and the R ² of the GWR model was higher than that of the OLS model. Corrected Akaike information criterion (AICc) was lower in the GWR model than in the OLS model, which indicates that the GWR model fits the calculated H′ better than the OLS model. Thus, the GWR model is considered to be more practical than the OLS model for understanding the effects of topographic and geographic factors on H ′ at different scales.


Introduction
Rapidly increasing deforestation and the resulting environmental changes have caused a significant reduction in global biodiversity. The Convention on Biological Diversity held in Rio de Janeiro in 1992 led to the establishment of the Nagoya Protocol in 2010, which aims to conserve global biodiversity. (1) To cope with the changing environmental policy in intergovernmental relations on global biodiversity issues, the Republic of Korea established a national plan for forest biodiversity (2013 to 2017) and made related efforts including hosting the 12th Convention on Biological Diversity. (2,3) The Baekdudaegan region is well known for abundant biological and ecological communities and for playing a key role in forest ecosystems in the Republic of Korea. Owing to its unique vegetative characteristics within varied topographic features, the Baekdudaegan region has a diverse range of wildlife and plant species and is a hotspot for biodiversity both ecologically and for academic research. (4,5) Determining the factors that affect species and biodiversity distributions is critical in conservation perspectives. (6) However, huge gaps in the available information on the spatial distribution of plant species pose a major challenge for regional conservation planning in many parts of the world. (7) Previous studies on domestic forest distribution patterns have examined the relationships of the distribution of Salicaceae plant species, topography, and soils with the factors that affect the distribution patterns of plants in the northern Gyeonggi region by principal component analysis (PCA). (8,9) Many studies have investigated the spatial patterns of species diversity and the relationship between species diversity and topographic and geographic factors using various methods such as correlation, regression, and geographic information system (GIS) analyses. (10)(11)(12) Various regression, habitat, and fitness models have been developed in an attempt to quantitatively evaluate biodiversity and spatial distribution features. (13) Although the spatial range of biodiversity research studies has been expanded, it is still limited to site-specific studies along with the recognition of environmental issues including climate change. (14,15) In the Republic of Korea, Han and Lee (16) and Sim et al. (17) conducted studies on the dependence of the distribution modeling of species on the spatial scale and distribution patterns of biodiversity. The topographic and geographic factors affecting forest biodiversity were often spatially inconsistent with similar distributions in adjacent areas but different distributions in distant areas. (18,19) In addition, because the vegetation exhibits various communities in a continuous pattern, causing spatial autocorrelation, the spatial location of attribute information should be considered to analyze the quantitative spatial pattern. (20,21) The geographically weighted regression (GWR) model has been used in forestry sectors, such as for forest resource evaluation (e.g., timber volume estimation), because it allows the estimation of the regression model even in cases where the requirements for spatial correlation, normality, and homoscedasticity are not met. (22,23) Although the distribution of species diversity varies regionally and with the spatial scale within a forest ecosystem, few studies have examined its correlation with the tree species diversity index. (24) Our study aims to compare the GWR model with the ordinary least squares (OLS) model in terms of the variation with the spatial scale and evaluate major factors affecting tree species diversity by examining the effects of topographic and geographic factors on the tree species diversity index (H′) of vegetation communities in the Baekdudaegan region.
We concluded that the GWR model is more practical than the OLS model for understanding the effects of topographic and geographic factors on H′ at different spatial scales. It is hoped that this study will be an important reference for future studies on the biodiversity distribution in the region.

Study area
The study site is the Baekdudaegan Protected Area of Gangwon Province, which is well known as a biodiversity hotspot in the Republic of Korea, as shown in Fig. 1. The Baekdudaegan Protected Area of Gangwon Province is divided into 10 sections according to the ecological map of Baekdudaegan Mountains; the northern region (Goseong, Inje, Sokcho, and Yangyang) containing section 10; the central region (Hongcheon, Gangneung, and Pyeongchang) contains sections 8 and 9; and the southern region (Jeongseon, Donghae, Samcheok, Taebaek, and Yeongwol) contains sections 6 and 7. (25) The total study area is 134093 ha, 70% of which is the core area and 30% the buffer zone. The total area of the forest is 126363 ha, 74% of which is deciduous forest, 14% coniferous forest, and 12% mixed coniferous and deciduous forests. (26) The major deciduous species are Quercus mongolica, Fraxinus rhynchophylla, Carpinus cordata, Betula costata, Acer pseudosieboldianum, and Lindera obtusiloba. The major coniferous species are Pinus densiflora, Pinus koraiensis, Larix leptolepis, and Abies koreana. (27)(28)(29)(30)(31) The average elevation is higher in the northern (894 m) and southern (863 m) regions than in the central region (758 m). The average slope gradient is also higher in the northern (33°) and southern (29°) regions than in the central region (28°).

Data and study methods
In this study, the tree species diversity index, which is a dependent variable to be applied to the GWR and OLS models, was calculated for each of the spatial scales 3 by 3 (s3), 5 by 5 (s5), 7 by 7 (s7), and 9 by 9 (s9) using the Baekdudaegan Protected Area map (1:25000 scale) and the forest-type map (1:5000 scale) provided by the Korea Forest Service. Topographic factors were used as independent variables to obtain a digital elevation model [e.g., DEM (30 m) from forest type map (1:5000 scale)] and geographic factors were calculated as independent variable using a river map, a road map, a forest road map, and a building map (1:25000 scale).
The calculated factor values were applied to the OLS and GWR models; then, the corresponding values obtained with both models were compared and the major factors affecting H′ were evaluated, as shown in Fig. 2. The ArcGIS 10.1 software was used to calculate H′. The dependent variable H and independent variables, and the topographic and geographic factors were used to construct the GIS database by using the ArcGIS 10.1 program. GWR and OLS were calculated by using "Geographically weighted regression" tool and the "Ordinary least squares" tool of the ArcGIS 10.1 program.

Tree species diversity index (H′)
H′ is related to species richness and species evenness. It provides more information than simply the number of species present and makes species richness and evenness easier to see through numerical structures by quantifying the local tree species diversity. (32)  H′ was estimated using tree species data in the forest-type map. The forest-type map is a spatial digital map that shows the age and diameter classes of trees on a map of 1:5000 scale in the Republic of Korea. The tree species data of the forest-type map is composed of approximately 52 items of forest information obtained from a forest of less than 0.5 ha, which is the minimum area unit of the forest. Tree species data were converted from the polygon form to the grid form except for nonforest and deforested areas. The spatial resolution of the grid was set to 1 × 1 km 2 . Graham and Hijmans analyzed correlation at a resolution using 1 × 1 km 2 , 25 × 25 km 2 , and 50 × 50 km 2 in the calculation of species richness and found that the calculation of species richness is more correlated at a low resolution than at a high resolution. (15) However, in this study, the forest-type map used in the calculation of the species diversity index had a scale of 1:5000. Therefore, when converting from the polygon form to the grid form, the highest resolution of 1 × 1 km 2 was used. The spatial scale used to determine the range of population in the species diversity index calculation was set to 3 by 3 (s3), 5 by 5 (s5), 7 by 7 (s7), and 9 by 9 (s9).
As shown in Fig. 3, 3 by 3 means that the value of each cell is calculated by moving the cell from the total of nine populations to determine the value of H′ of the central grid at the spatial scale that was calculated by applying the spatial scale technique. (17) This was carried out to determine H′ according to the change in spatial scale.
The Shannon-Weaver index, which is the most widely used index for calculating H′ in forest ecosystems, was chosen to calculate H′ according to the spatial scale. H′ is calculated by applying the tree species diversity index to Shannon's formula. H′ is the Shannon-Weaver index, whose value is the number of grids of all species at the spatial scale divided by the total number of grids at the spatial scale. (32) The formula used to calculate H′ is as follows.

Topographic and geographic factors
As the independent variables to be applied to the GWR and OLS models, six topographic factors and four geographic factors were selected as additional factors that were judged to affect biodiversity by referring to prior studies on the evaluation of species diversity in Baekdudaegan. (8,10,34) Song and Cao reported that elevation and wetness significantly correlate with tree species richness. (35) Gonzalez and Mata reported that tree species richness and diversity correlate with elevation and slope, which are important factors for predicting tree species richness and diversity. (36) On the other hand, slope is expressed as one of the eight directions, and cannot be digitalized; thus, in this study, calculations were performed using the northern slope, which can be digitalized.
The following factors are considered to affect tree species diversity. The topographic factors included elevation, slope, northern slope, curvature, relief, and wetness data extracted from the DEM, and the geographic factors included distance from water (D_water), distance from a road (D_ road), distance from a forest road (D_forest road), and distance from an urban area (D_urban), which were calculated using the Euclidean distance, as shown in Table 1. The spatial analysis unit of the grid for topographic and geographic factors was set to 1 × 1 km 2 and constructed using the GIS database.

GWR model
The GWR model is used to determine tree species diversity from topographic and geographic factors.
The GWR model provides each spatial position with a regression coefficient, which is unattainable using the global model (OLS). The regression analysis performed with the GWR model selects the center point and the standard distance in each selected area under the assumption that different regression coefficients can be acquired. This means that the regression coefficient is not calculated using a constant value but using the position functions for each geographic location i. (37,38) The formulas used for the OLS and GWR models are given in Eqs. (2) and (3), where Y i are the response variables. X k are the set of predictor variables p (k = 1, 2, ..., p) are the regression coefficients for the kth predictor variable and ith location and ε i are the random error terms. The GWR model calculation was used only when the location coefficient (u i , v i ) was assigned to the OLS model and the population parameter could be estimated for the position. (39) The regression coefficient β k was assigned a weighted value as a function of the position. A location near geographic location i had a greater effect than a distant area. Although the GWR model followed the weighted least squares (WLS) structure, the regression coefficient was weighted as The weighted matrix W i for geographic location i produced different estimated results depending on the spatial composition method. It was generally used as a bandwidth using a different kernel function depending on the density tendency of the data. When the bandwidth that determined the weighted value was fixed, the fixed kernel method was used, whereas when the bandwidth was variable, the adaptive kernel method was used. In the present study, the fixed kernel method was used when the observed sites were distributed regularly throughout the study area, whereas the adaptive kernel method was used when the sites were distributed irregularly or had uncertainty to determine. In addition, the corrected Akaike information criterion (AICc) method [Eq. (5)] was used to determine bandwidth because it considers the difference between observed and estimated values, and the complexity of a model. (23,40) We used the residuals between the regression points to evaluate the models.

Model evaluation
The coefficient of determination (R²), the root mean square error (RMSE), the AICc index, and, the global Moran's I values were used to assess the OLS and GWR models. (23) The AICc was calculated using the bandwidth between the data point and the regression point, and used to determine the degree of improvement of the GWR model in comparison with the OLS model. If the GWR model had a lower AICc value and the difference was greater than 4, it was considered that the GWR model was better than the OLS model. (40,41) Here, n is the number of features, y is the dependent variable, ŷ is the estimation, and σ is the estimated standard error of residuals.
To determine the spatial autocorrelation of residuals in spatial data, we used global Moran's I values. Also, Moran's I value was used to determine the similarities between the attribute data of the surrounding feature and the average attribute data of the total study area from the target feature. As Moran's I value approaches 0, spatial autocorrelation rarely appears and the observed value becomes more random; as it approaches +1, a positive spatial autocorrelation exists and the observed value becomes more clustered; and as it approaches −1, a negative spatial autocorrelation exists and the observed value becomes more dispersed. (42,43) For the spatial weight matrix W ij , the inverse distance method was used, in which the spatial heterogeneity increases with the distance between spatial unit features i and j. The Euclidean distance was measured as the distance between features. (21) The weighted distance was set as 1 km so that the grids facing the surrounding area could be included.
Here, n is the number of observations (grids), x is the mean of the variable, x i is the value of the variable at a particular location, x j is the value of the variable at another location, and W ij is the weight index for the location of i relative to j.
The average H′ value at the different spatial scales ranged from 0.991 (s3) to 1.371 (s9). These values were similar to the average H′ value calculated for areas from Hyangnobong to Gitdaebaggybong in the Baekdudaegan region of Gangwon Province by Jeong and Oh. (44) Gabriel et al. (45) reported that there were significant differences in the tree species diversity distribution according to the spatial scale. In this study, the average H′ increased with the spatial scale and was 1.37 times higher at s9 than at s3. Moreover, the minimum H′ increased from s7, but the maximum H′ hardly changed with the spatial scale.
The average H′ was higher in the northern and southern regions, whose elevation and slope were higher than those of the central region regardless of the spatial scale. In particular, in the southern region, H′ had the highest value of 1.879 at s9, similar to the value of H′ calculated by Hwang et al. (28) as shown in Table 2 and Fig. 4.

Selection of major factors affecting H′
The correlation coefficients between the major factors, except for that between the slope and the relief, ranged from 0.005 to −0.511. The correlation coefficient between the slope and the relief was 0.811 but the variance inflation factors (VIFs) for the slope and relief factors were lower than 10 and multicollinearity did not occur, as shown in Table 3. Therefore, we included all 10 factors when constructing the regression model.

Evaluating model fitness
In this study, the fitness of the OLS and GWR models was evaluated to analyze the spatial relationship between the dependent and independent variables. OLS is a global model that estimates regression coefficients at the overall level, and if there is geographic irregularity, it cannot distinguish spatial changes. GWR is a regional model that is used to analyze the difference according to the spatial location.
By comparing the fitness between OLS and GWR, it was determined which model was more suitable for analyzing the spatial relationship between the dependent and independent variables. The coefficient of determination of the OLS and GWR models increased with the spatial scale. In the case of s9, the coefficient of determination was 0.41 for the GWR model, which was 3.4 times higher than that in the OLS model. Also, the coefficient of determination was higher for GWR than for OLS regardless of the spatial scale. This was consistent with the results of other studies. Liu et al. used GWR to determine the changes in microregions. (23) The RMSE tended to decrease for both models as the spatial scale increased. Also, the RMSE was lower for the GWR model than for the OLS model regardless of the spatial scale,   and the difference in RMSE between the models increased with the spatial scale. In addition, the AICc of the models tended to decrease as the spatial scale range increased; it was lower for GWR than for OLS. Jo (46) and Kim and Jun (38) reported the improvement factors evaluated on the basis of the difference in AICc, and in the present study, as the spatial scale increased, the difference in AICc between models occurred, indicating that GWR performed better than OLS.
Moreover, the Moran's I of the two models increased with the spatial scale. In the case of s3, Moran's I, values of the OLS and GWR models were close to 0, and thus, there was no spatial autocorrelation. Similarly to that reported by Kim et al., (47) Moran's I was lower in GWR than in OLS and the vegetation distribution characteristics were differently composed with clusters. Therefore, the GWR model is more suitable when considering the spatial effect in the current study area, as shown in Table 4.

Distribution of local R² of tree diversity index calculated by incorporating six topographic factors and four geographic factors
The main advantage of the GWR model is that the local R² and residual values can be determined for each feature, (48) and the coefficient of determination was used to measure the fitness of the regression equations at each spatial scale. When the coefficient of determination is near 1, ten factors have a strong effect on H′. (49) Figure 5 shows the distribution of the local R² at each spatial scale.
The distributions of local R² values were calculated using each grid, which is the spatial unit in the GWR model, as shown in Fig. 5. The difference between the maximum and minimum local R² values was the highest (0.46) for the s9 spatial scale. The differences in the R² values for the s3, s5, and s7 spatial scales were 0.11, 0.24, and 0.40, respectively, indicating that as the spatial scale increased, the difference in the coefficient of determination also increased.
In terms of spatial distribution characteristics, the coefficient of determination of the southern region (Samcheok, Taebaek, and Yeongwol) was the highest in s3, but that of the central region (Gangneung and Pyeongchang) was the highest in s5 and s7. Moreover the local R² of s9 was high in the southern region (Taebaek and Yeongwol) as well as in the central region (Gangneung and Pyeongchang).

Distribution of local R² of H′ calculated by separating each factor
In this study, the tree species diversity distribution obtained using the GWR model depended on the region owing to the effects of topographic and geographic factors.
The correlation was determined by regression analysis to examine the effects of topographic and geographic factors on the tree species diversity. Table 5 gives the regression coefficient distribution of the six topographic factors and four geographic factors affecting H′, which includes the coefficients, average, minimum, and maximum values of each factor in the study area. The GWR model was used to determine H′, which is a variable dependent on the topographic and geographic factors. H′ was calculated to determine which topographic and geographic factors affected the tree species diversity.
The average coefficient of regression for elevation was low for all four spatial scales with values of less than -0.0007. The coefficients of regression for the average slope were 0.00763 and 0.00187 for s3 and s5, respectively, indicating that H′ increased with the slope. For s7 and s9, the coefficients of regression for the average slope were −0.00189 and −0.00207, respectively, demonstrating that H′ decreased with decreasing slope. The coefficients of regression for the average northern slope were −0.01051 and −0.00447 for s3 and s5, respectively, indicating that H′ increased as far as the northern slope of high value. For s7 and s9, the coefficients of regression were 0.00332 and 0.01030, respectively, indicating that H′ increased as close to the northern slope of high value. For the curvature, the average coefficient of regression was positive for all four spatial scales (s3, 0.51848; s5, 0.12887; s7, 0.15992; and s9, 0.10466), showing that H′ increased with the curvature. By comparing the curvature with the other factors, the average coefficient of regression for the curvature was high and had the greatest impact on H′. The coefficients of regression for average wetness were 0.01135 and 0.00465 for s3 and s5, respectively. H′ increased with the wetness, whereas for s7 and s9, H′ increased with decreasing wetness. The coefficient of regression for the average relief had positive values for all four spatial scales (s3, 0.00006; s5, 0.00058; s7, 0.00076; and s9, 0.00057), indicating that H′ increased with the difference in the elevation of the surrounding terrain. However, the effect of elevation on H′ was minimal. The coefficient of regression for the four geographic factors of D_water, D_road, D_forest road, and D_urban ranged from −0.00002 to 0.00000, showing that their effects on H′ were negligible.
The major topographic factors determining H′ for the GWR model were curvature, northern slope, slope, and wetness. In particular, the curvature and northern slope were positive, and when the curvature and northern slope increased or when the slope and wetness decreased, H′ increased. The relief was also positive, and when the relief increased, H′ also increased. However, the regression coefficient of the relief is low, meaning that it has a negligible effect on H′. The effect of geographic factors was lower in the H′ prediction model.  Our study found that topographic factors significantly correlate with tree species diversity, proving the practicality of topographic factors for studying the tree species diversity distribution for each grid (region).
On the basis of the above discussion, we can predict how tree species diversity will change with topographic factors. In Table 5, each number gives the impact of a factor on the dependent variable H′ for the corresponding grid unit (region). Figure 6 shows the regional coefficient distribution of the explanatory variables for tree species diversity calculated by the regression formula at spatial scale s9. The effect of each coefficient on tree species diversity varies from region to region. The coefficient for each factor has positive and negative values, and the magnitude of the coefficient is different for each grid (region).
The northern region of the study area has high elevation and slope. As the elevation and slope increased, H′ also increased. In contrast, the central and southern regions have low values, which led to low H′. In addition, the northern slope and curvature are high in most areas, and as the northern slope and curvature increased in most areas of the study area, H′ also increased. From the effects of each factor on R² for the s9 spatial scale, which had the highest R² among the four spatial scales, the elevation, wetness, and D_water had the greatest effects in the northern areas; the slope and northern slope had the greatest effects in the central area; the curvature and D_road had the greatest effects in the southern area; and the relief, D_forest road, and D_urban had the greatest effects in the northern and southern areas. These results indicate that H′ was affected differently according to differences in the regional distribution of topographic and geographic factors, as shown in Fig. 6.

Conclusions
This study was carried out to determine the spatial distribution characteristics and effects of topographic and geographic factors on H′ at four spatial scales (s3, s5, s7, and s9) in the Baekdudaegan Protected Area using the GWR model.
The method of calculating H′ at each spatial scale using a forest-type map was effective for understanding and displaying the spatial distribution characteristics of species diversity in the Baekdudaegan Protected Area. In addition, OLS and GWR were compared in terms of their effectiveness in determining the relationship between topographic and geographic factors, and species diversity. Two models were compared using R², RMSE, AICc, and Moran's I.
The GWR model was more suitable for examing the relationship of H′ with topographic and geographic factors because R² was higher in the GWR model than in the OLS model, and AICc and RMSE were lower in the GWR model than in the OLS model. Because the GWR model shows the variations of properties caused by spatial specificity, it is possible to determine the variations of species diversity depending on the relationship of variables. By calculating H′ at each spatial scale, the coefficient of determination between the 10 factors and H′ was obtained, and the effect of each factor on tree species diversity and the effects of the different regions were determined.
In a future study, if the variables affecting tree species diversity are predicted for each grid (region) through the GWR model, or if other factors are added to the factors proposed in this study, it will be important to increase the efficiency of the investigation of tree species diversity by identifying the spatial changes in the region. Localized and detailed information of factors affecting tree species diversity can be useful for the management and planning of the Baekdudaegan Protected Area by estimating the distribution of tree species diversity in each region, which is expected to be important for improving the forest protection system.