Visual-perception-driven Urban Three-dimensional Scene Data Scheduling Method

To address the low data scheduling efficiency and rendering delays that arise when constructing complex urban three-dimensional (3D) scenes, we propose a visual-perception-driven strategy based on scene graphs. According to the spatial distribution characteristics of 3D scene data, this strategy uses a scene graph to organize the local 3D scene of a city and a level-of-detail simplification algorithm to simplify the 3D model of the city into four resolution levels. On this basis, the relationship attributes between geographical entities provided by the scene graph are used to construct a visual-perception evaluation model, which constrains the adaptive scheduling of models at different detail levels. Experimental results show that this method can effectively improve the data scheduling efficiency and accelerate the construction of local 3D scenes.


Introduction
With the rapid development of the economy and society, urban modernization is accelerating. Recent decades have seen the construction of more high-rise buildings and multistory underground and above-ground spaces, and cities are becoming three-dimensional (3D) and increasingly complex. These changes have attracted the interest of researchers and cartographers. The cartography community urgently needs better technology for acquiring, storing, and processing information to ensure the smooth flow of information and the migration of map functions, so that map products meet users' needs more directly. (1) Owing to the development and popularization of computer technology, maps have evolved from paper to electronic form and from two-dimensional planes to 3D images. Maps contain more information than before, they reproduce scenes with increasing fidelity, and their data requirements are growing accordingly. Rapidly developing data acquisition and modeling technologies provide various data products for visualizing local urban scenes, including large-scale digital surface models, point cloud models, and 3D building model data. On the one hand, the large amount of data makes the 3D visualization of an urban scene more realistic; on the other hand, it imposes a heavy computational burden on the rendering of 3D scenes. It is difficult to meet the real-time and fluency requirements of urban 3D scene rendering with only the basic rendering pipeline. (2) Therefore, making full use of multiscale spatial data for transmission and expression to realize the rapid construction of 3D scenes and provide users with a good experience is of great significance.
Many attempts have been made to solve this problem, and one of the techniques employed is dynamic data scheduling. This is achievable by constructing a multiresolution hierarchical model to gradually transmit data in a scene based on a certain principle with increasing detail. (3)(4)(5) Previous studies have shown that reasonable data scheduling can solve the problems of drawing bottlenecks and long effective waiting times for users to a certain extent. Wu et al. proposed a prioritized presentation of abstract spatial location relationships based on the discretized storage of scene data and designed a feedback scheduling mechanism that focuses on user pause time based on this outline. This study improved the scene response speed and reduced the transmission of invalid data. (6) Zhu et al. proposed a visual-perception-driven strategy based on an adaptive quadtree, which can schedule a suitable level-of-detail (LOD) model by calculating screen errors in real time. The dynamic visualization rendering frame rate is maintained at approximately 40 f/s. This rate meets the visual consistency requirements of 3D visualization of complex urban scenes in a network environment. (7) Benz and Weibel proposed a loading strategy based on urban area density and points of interest. This strategy prioritizes the rendering of road elements that attract users' attention. (8) Su et al. proposed a fast display method for a large-scale 3D city model based on the rule of viewpoint movement in which the appropriate LOD model was selected on the basis of the distance from the entity to the viewpoint position and eccentricity. The operating efficiency was greatly improved, and the frame rate was increased from 18.9 to 30.1 f/s, which provided a continuous and smooth visualization effect. (9) Wang et al. proposed a high-precision model adaptive display strategy based on block storage. 
By employing the block matching algorithm from this viewpoint, a suitable number of block models were matched from a database for partial rendering. The local high-precision display results were improved significantly, and the frame rate under the front view angle increased from 5 to 88. (10) Huo et al. proposed an evaluation model that takes the local characteristic degree into account. After evaluating the selected characteristic degree at different scales, detailed data with a high local characteristic degree are loaded according to the weight. The dynamic visualization rendering frame rate reached 58 f/s, improving the user experience. (11) In a dynamic scheduling strategy based on an undirected graph, the distance between a node and the viewpoint is measured in real time, and the appropriate LOD model is swapped in. The dynamic visualization frame rate was between 38 and 62 f/s, which can meet the requirements for continuous and smooth browsing. (12) Popescu and Zhen proposed a greedy scheduling strategy that can optimize the transmission of graphics data on a bandwidth-constrained network and increase the visual quality by an average of 50%. (13) Zhang et al. proposed a progressive grid-based scene management strategy for regions of interest in underground space, in which data are loaded on demand, effectively reducing wasted CPU resources. The dynamic visualization frame rate was maintained at 30 f/s. (14) Although the above methods improved the visualization results, most of them were designed from a data computation perspective; for instance, they calculate the screen error, the distance between the viewpoint and a node, the eccentricity, the volume ratio, or the local feature degree. However, what humans perceive visually is never numerical data such as distances and volume ratios but a relative and imprecise impression. 
For example, a house may be brightly colored (compared with the surrounding buildings) or a gymnasium may be difficult to see (compared with closer buildings). The concept corresponding to "brightly colored" is prominent or not prominent, and the concept corresponding to "difficult to see" is near or far. Prominent or not prominent and near or far are two groups of relations. Therefore, the scheduling strategy for model transmission between a server and a web page in this study simulates human perception by comparing relationships rather than by calculating data. Specifically, we propose a visual-perception-driven strategy based on scene graphs. According to the spatial distribution characteristics of 3D scene data, this strategy uses a scene graph to organize a local 3D scene of a city and an LOD simplification algorithm to simplify the 3D city model into four resolution levels. Finally, the relationship attributes between geographical entities provided by the scene graph are used to construct a visual-perception evaluation model that assists in the adaptive scheduling of different LOD models. Experimental results show that this method can effectively optimize the loading and transmission of a city's 3D model.
Figure 1 shows the flow of the proposed algorithm, which comprises four parts: scene graph partition, LOD model design, a visual-perception evaluation model based on geographical relations, and a dynamic scheduling strategy. First, according to the spatial distribution of buildings, scene graphs are used to organize complex 3D city scenes. Second, the edge folding algorithm is used to simplify the model into four LOD levels as per the "Technical Specifications for Urban 3D Modeling," and the results are stored. (15) Third, the relationship attributes between the geographical entities provided by the scene graph are used to construct a visual-perception evaluation model. Finally, data are preloaded on the basis of the viewpoint information, and the weights produced by the visual-perception evaluation model based on geographical relations constrain the adaptive scheduling of different LOD models.

Simplification of city's 3D model
A 3D model of a city is mainly composed of triangles, so simplifying the model is essentially equivalent to simplifying its triangle mesh. We adopt an improved simplification algorithm based on edge folding that employs the principle of wavelet surface subdivision: correction coefficients are extracted for the new vertices generated by edge folding, and the new vertices are corrected accordingly. (16) Figure 2 is a schematic diagram of the edge folding algorithm.
Suppose that the newly generated vertex has coordinates X_0 (x_0, y_0, z_0) and that the actual coordinates of the vertex are X_1 (x_1, y_1, z_1). The correction coefficient J is then the difference between X_0 and X_1. According to the above algorithm and with reference to the "Technical Specifications for Urban 3D Modeling," the 3D model is simplified into four LOD levels and stored. Table 1 shows the LOD classification.
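To make the edge folding step concrete, the following is a minimal Python sketch of a single fold on an indexed triangle mesh. It is an illustration under stated assumptions, not the improved algorithm of Ref. 16: the new vertex is simply placed at the edge midpoint (the wavelet-based correction is not reproduced), the function names are hypothetical, and the sign convention of the correction coefficient (X_0 minus X_1) is assumed.

```python
from typing import List, Tuple

Vertex = Tuple[float, float, float]
Triangle = Tuple[int, int, int]


def collapse_edge(vertices: List[Vertex], triangles: List[Triangle],
                  u: int, v: int) -> Tuple[List[Vertex], List[Triangle]]:
    """Fold edge (u, v) into one vertex placed at the edge midpoint.

    Vertex u is moved to the midpoint and every reference to v is redirected
    to u. Vertex v stays in the list (unused) so that indices remain valid.
    Midpoint placement is an assumption made for this sketch.
    """
    ux, uy, uz = vertices[u]
    vx, vy, vz = vertices[v]
    new_vertices = list(vertices)
    new_vertices[u] = ((ux + vx) / 2, (uy + vy) / 2, (uz + vz) / 2)

    new_triangles = []
    for tri in triangles:
        remapped = tuple(u if i == v else i for i in tri)
        # Triangles that contained both u and v are now degenerate; drop them.
        if len(set(remapped)) == 3:
            new_triangles.append(remapped)
    return new_vertices, new_triangles


def correction_coefficient(x0: Vertex, x1: Vertex) -> Vertex:
    """Component-wise difference X_0 - X_1 (sign convention assumed)."""
    return tuple(a - b for a, b in zip(x0, x1))
```

Each fold removes the triangles that shared the folded edge, which is why repeated folding lowers the triangle count toward a coarser LOD level.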

Scene graph division
Scene graph division is a data management method that organizes scene data into graphs. (17) For geographical scenarios, each geographical entity corresponds to a specific node. In this method, a root node is selected, and group nodes are then added to the root node. This operation is iterated, and attributes are added to each geographical entity.
As shown in Fig. 3, the organization and management of a 3D scene adopt a hierarchical management method. This structure shows the hierarchical relationship between objects. Each connected object in the figure has its corresponding hierarchical relationship. The upper object can serve as the reference coordinate system of the object below connected to it. At the same time, the positions of the object and the parent node can also be spatially translated and rotated, and other operations can be performed. (2)
Table 1. LOD classification (rows recoverable from the source): special application, 10-20; LOD3, urban regional development trend analysis, 20-50; LOD4, city pattern evolution analysis, >50.
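The hierarchical organization described in this section can be sketched as a small tree structure in Python. This is a minimal illustration, not the data structure used in the study: the class name SceneNode, its fields, and the example node names are hypothetical, and only translation relative to the parent is modeled (rotation and other operations are omitted).

```python
from typing import Dict, List, Optional, Tuple

Vec3 = Tuple[float, float, float]


class SceneNode:
    """One node of a scene graph: the root, a group node, or an entity."""

    def __init__(self, name: str, offset: Vec3 = (0.0, 0.0, 0.0),
                 attributes: Optional[Dict[str, str]] = None) -> None:
        self.name = name
        self.offset = offset                 # translation relative to the parent
        self.attributes = attributes or {}   # e.g. function or shape attributes
        self.parent: Optional["SceneNode"] = None
        self.children: List["SceneNode"] = []

    def add_child(self, child: "SceneNode") -> "SceneNode":
        child.parent = self
        self.children.append(child)
        return child

    def depth(self) -> int:
        """Number of hierarchy levels between this node and the root."""
        d, node = 0, self.parent
        while node is not None:
            d, node = d + 1, node.parent
        return d

    def world_position(self) -> Vec3:
        """Accumulate offsets up to the root; each parent acts as the
        reference coordinate system of its children."""
        x, y, z = self.offset
        node = self.parent
        while node is not None:
            x, y, z = x + node.offset[0], y + node.offset[1], z + node.offset[2]
            node = node.parent
        return (x, y, z)


# Example: root -> group node -> geographical entity.
city = SceneNode("city", (100.0, 200.0, 0.0))
block = city.add_child(SceneNode("block_a", (10.0, 0.0, 0.0)))
hospital = block.add_child(
    SceneNode("hospital", (1.0, 2.0, 3.0), {"function": "hospital"}))
```

Because positions are stored relative to the parent, moving a group node moves every entity below it without touching the entities themselves.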

Evaluation model of visual perception based on geographical relationships
The design of a node resolution evaluation model is usually combined with specific application scenarios. For example, in a complex 3D city scene, line-of-sight factors, perspective factors, and the model size are mainly considered; (18)(19)(20) alternatively, a multiangle and multielement evaluation of a 3D scene based on the local feature degree is performed, (11) or, for large-scale terrain, terrain feature information, distance, terrain roughness, terrain slope, moving speed when roaming, viewpoint moving direction, visual-perception intensity, and the size of the data block are considered. (21)(22)(23)(24)(25)(26) Although previous research improved the visualization results, most of the factors considered approximate human visual perception through numerical data. (27) This kind of fitting causes information loss and incurs a large computational overhead because perception must be derived from data through artificially set thresholds.

Human visual system
The information processing mechanism of the human visual system is highly complex and has three main characteristics: visual attention, brightness and contrast sensitivity, and visual concealment. Visual attention: Human vision quickly locates important target areas and analyzes them in detail, whereas other areas are only roughly analyzed or even ignored. Brightness and contrast sensitivity: The human eye's perception of the brightness of an external target depends on the brightness difference between the target and the background. In other words, the ability of the human visual system to distinguish brightness is limited: it can only distinguish target objects with a certain brightness difference, and objects whose brightness differs little from the background do not attract special attention. Visual concealment: The interaction or mutual interference between pieces of visual information produces a visual concealment effect.

Visual-perception evaluation model based on geographical relationships
Most geographical entities do not exist in isolation. The spatial distribution relationships of geographical entities in geographical space include topological relationships, which correspond to the visual concealment characteristic of the human visual system. Moreover, each geographical entity also has functional and shape attributes, which should be appropriately converted; these attributes match the characteristics of visual attention and of brightness and contrast sensitivity. These attributes and relationships can be accurately expressed in group nodes in a scene graph. Therefore, a visual-perception evaluation model based on geographical relationships is designed. On the basis of the relationship attributes provided by the scene graph, this model also considers application requirements, as shown in Fig. 4.
We also designed rules for defining entity weights. If part of a building is occluded, the building is defined as partially occluded; if the difference between the gray value of the building and the average gray value of the surrounding environment is more than 50, the building is defined as prominent. The hierarchical relationship between nodes reflects, to a certain extent, the distance of buildings: the more levels in the relationship, the farther away the building is. According to the requirements of different scenarios, some buildings with specific functions should be given a larger weight. For example, in an emergency and disaster relief system, the weight of a hospital will be higher.
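The weight rules above can be sketched in Python as follows. Only the gray-value threshold of 50 comes from the text; the numeric weights, the way the factors are combined, the use of an absolute gray-value difference, and all names are assumptions chosen for illustration, not the evaluation model of Fig. 4.

```python
from dataclasses import dataclass
from typing import Dict, Optional

GRAY_PROMINENCE_THRESHOLD = 50.0  # gray-value difference stated in the text


@dataclass
class Entity:
    name: str
    gray_value: float             # gray value of the building
    surrounding_gray_mean: float  # average gray value of its surroundings
    partially_occluded: bool
    hierarchy_depth: int          # levels below the current root node
    function: str = "generic"


def is_prominent(entity: Entity) -> bool:
    """Prominent if the gray-value difference exceeds the threshold.

    Using the absolute difference is an assumption of this sketch.
    """
    diff = abs(entity.gray_value - entity.surrounding_gray_mean)
    return diff > GRAY_PROMINENCE_THRESHOLD


def entity_weight(entity: Entity,
                  function_bonus: Optional[Dict[str, float]] = None) -> float:
    """Combine the relations into one weight (all constants hypothetical)."""
    bonus = (function_bonus or {}).get(entity.function, 0.0)
    score = 1.0
    if is_prominent(entity):
        score += 1.0              # prominent buildings attract attention
    if entity.partially_occluded:
        score -= 0.5              # occluded buildings matter less
    score += bonus                # scenario-specific functions, e.g. hospitals
    # Deeper hierarchy levels are treated as farther away and down-weighted.
    return score / (1 + entity.hierarchy_depth)
```

In an emergency-relief scenario, for instance, passing `function_bonus={"hospital": 2.0}` raises the weight of every hospital without changing the rules for other buildings.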

Dynamic scheduling
The dynamic data scheduling in this study mainly includes two parts: the preloading of data based on viewpoint information, and the weight evaluation of the visual-perception evaluation model based on geographical relationships, which constrains the adaptive scheduling of different LOD models.

Data preloading
Data preloading refers to loading the visible model data of the next frame into memory in advance to reduce the user waiting time. (1) The range of data preloading is typically determined by predicting future viewpoint information based on the current viewpoint motion state and line-of-sight direction, and then predicting the data range that needs to be scheduled and the LOD required. The traditional view-based data preloading method uses the projection area to represent the display range, which is divided into the loaded area, the data-preloading area, and other areas, (28)(29)(30) as shown in Fig. 5.
The visual field of the human eye is typically 124°, and when attention is focused, it narrows to approximately one-fifth of that, i.e., 25°. The horizontal viewing angle of a single eye can reach a maximum of 156°, and the horizontal viewing angle of both eyes can reach a maximum of 188°. The overlapping visual field of the two eyes is 124°, and the comfortable visual field of a single eye is 60°. Generally, in 3D scene browsing, the currently visible area should span 60°, which corresponds to the comfortable monocular field of view. Therefore, we adopt a preloading design that fits the comfortable visual field of the human eye, as shown in Fig. 6.
The data preloading area is 32° on each side and the loaded area is 60°. Suppose that the loaded area of the traditional method is S_1, the preloaded area of the traditional method is B_1, the loaded area of the proposed method is S_2, the preloaded area of the proposed method is B_2, and the actual visible area is S. The loaded area of the traditional method is S_1 = a^2. Comparing these areas, the efficiency of the traditional method in data loading and preloading is significantly lower than that of the proposed method.
Figure 7 shows the current viewpoint scene using an example urban area. Data scheduling based on the visual-perception evaluation model generates a graph of the scene in the viewpoint according to the current viewpoint parameters, as shown in Fig. 8.
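The angular split used for preloading (a 60° loaded area with a 32° preloading band on each side, which together span 60° + 2 × 32° = 124°) can be illustrated with a short Python sketch. It assumes a 2D plan view, ignores distance limits, and uses hypothetical names; it is not the implementation used in the experiments.

```python
import math
from typing import Tuple

LOADED_FOV_DEG = 60.0    # currently visible (loaded) area, from the text
PRELOAD_BAND_DEG = 32.0  # preloading band on each side, from the text


def classify_region(viewpoint: Tuple[float, float], view_dir_deg: float,
                    target: Tuple[float, float]) -> str:
    """Return 'loaded', 'preload', or 'other' for a target position.

    Angles are measured in the horizontal plane, counterclockwise from the
    +x axis (a convention assumed for this sketch).
    """
    dx = target[0] - viewpoint[0]
    dy = target[1] - viewpoint[1]
    bearing = math.degrees(math.atan2(dy, dx))
    # Smallest absolute angle between the view direction and the target.
    offset = abs((bearing - view_dir_deg + 180.0) % 360.0 - 180.0)
    half_loaded = LOADED_FOV_DEG / 2.0
    if offset <= half_loaded:
        return "loaded"
    if offset <= half_loaded + PRELOAD_BAND_DEG:
        return "preload"
    return "other"
```

An entity 45° off the view direction, for example, falls outside the 30° half-angle of the loaded area but inside the 62° limit of the preloading band, so it would be fetched ahead of time.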

Data scheduling based on visual-perception evaluation model
A weight calculation is performed for each geographical entity in the current scene according to the set evaluation process, and then different levels of LOD models are loaded according to the weight. Figure 9 shows the evaluation process.
When the user roams the scene, the viewpoint scene is updated (Fig. 10). At this time, there is no need to regenerate the scene graph: a group node of the parent node can be used as the new parent node, resulting in a new scene graph, as shown in Fig. 11. The weights of the remaining geographical entities are then calculated on the basis of the existing weights.
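One possible reading of this re-rooting step is sketched below in Python: a group node is promoted to the new root, weights already known for nodes in its subtree are reused, and only nodes without a weight are evaluated. The dictionary-based graph, the function names, and the caching policy are assumptions for illustration; how the study derives new weights from existing ones is not reproduced here.

```python
from typing import Callable, Dict, List, Tuple


def subtree_nodes(children: Dict[str, List[str]], root: str) -> List[str]:
    """Collect the root and all of its descendants (depth-first)."""
    stack, found = [root], []
    while stack:
        node = stack.pop()
        found.append(node)
        stack.extend(children.get(node, []))
    return found


def reroot_and_update(children: Dict[str, List[str]], new_root: str,
                      weights: Dict[str, float],
                      compute_weight: Callable[[str], float]
                      ) -> Tuple[Dict[str, float], List[str]]:
    """Use a group node as the new root and reuse cached weights.

    Returns the weights of the new scene graph and the names of the nodes
    whose weights had to be computed from scratch.
    """
    updated: Dict[str, float] = {}
    computed: List[str] = []
    for name in subtree_nodes(children, new_root):
        if name in weights:
            updated[name] = weights[name]         # reuse the existing weight
        else:
            updated[name] = compute_weight(name)  # evaluate only new nodes
            computed.append(name)
    return updated, computed
```

Nodes outside the promoted subtree simply drop out of the new graph, so no weight evaluation is spent on entities that have left the view.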
In summary, the weight evaluation of the visual-perception evaluation model based on geographical relationships proposed in this study has two advantages in constraining the adaptive scheduling of different LOD models. The first is that it avoids information loss when using data to fit human visual perception. The second is that it reduces the calculation overhead of the screen error of each geographical entity in the real-time calculation of the scene.

Experiment 1: Data transmission volume experiment based on visual-perception-driven data scheduling method
Experiment 1 uses a visual-perception-driven data scheduling method to visualize large-scale complex 3D urban scenes. Data scheduling is performed in three ways: without a visual-perception-driven method, with the traditional visual-perception-driven method based on the screen error, and with the proposed method. The initial incoming data volume, response time, and invalid data volume of the three methods are compared. Tables 2 and 3 show the results.
As shown in Table 2, when the three viewpoints are loaded, the initial incoming data volumes of the proposed method and the traditional visual-perception-driven method based on the screen error are significantly smaller than that when no visual-perception-driven method is used. When the first viewpoint is loaded, the initial incoming data volume of the proposed method is slightly higher than that of the traditional visual-perception-driven method. When the second viewpoint is loaded, it is slightly lower than that of the traditional visual-perception-driven method. When the third viewpoint is loaded, it is the same as that of the traditional visual-perception-driven method. Therefore, in terms of the amount of initial incoming data, the proposed method is superior to scheduling without a visual-perception-driven method, but its performance is not significantly different from that of the traditional visual-perception-driven method based on the screen error.
In terms of response time, when the three viewpoints are initialized and loaded, the response times of the proposed method and the traditional visual-perception-driven method based on the screen error are shorter than that of the method with no visual-perception driving. Moreover, for all three viewpoints, the response time of the proposed method is shorter than that of the traditional visual-perception-driven method based on the screen error. This means that when the amount of data transmission is the same, the proposed method can construct the scene at a higher speed and has advantages in data scheduling.
As shown in Table 3, the volume of data transmission of different scheduling methods is compared in the experimental scenario. The experimental results show that the longer a user browses the scene, the more the amount of data scheduling can be reduced with the proposed method and the traditional visual-perception-driven method based on the screen error. The proposed method is not significantly different from the traditional visual-perception-driven method based on the screen error in reducing the amount of invalid data transmission. In summary, the proposed method can effectively reduce the transmission of invalid data, effectively improve the speed of data scheduling, and build scenarios at a higher speed.

Experiment 2: Visualization effect of urban 3D data scheduling method based on visual-perception evaluation model
Figure 12 depicts the real-time frame rate while roaming with the three scheduling methods, obtained by analyzing the change in the frame rate throughout the roaming process.
According to Fig. 12, at 0-2 s, when the viewpoint is initially loaded, the proposed method has the highest frame rate of 47 fps, whereas that of the traditional visual-perception-driven method based on the screen error is 46 fps and that of the method without visual-perception driving is only 44 fps. At 2.5-5 s and 25-30 s, when the viewpoint is moving rapidly, the proposed method keeps the frame rate curve more stable, whereas the traditional screen-error-based method controls the fluctuation slightly less well. Without a visual-perception-driven approach, a good real-time visual experience cannot be provided when the viewpoint moves rapidly. At 11-24 s, when the viewpoint moves at a relatively uniform rate, the proposed method maintains a higher frame rate than the other two methods to ensure a better display effect. The dynamic scheduling method based on the visual-perception evaluation model can be used to easily determine the LOD level that should be loaded for buildings in the field of view. The cost of the screen error calculation is thus reduced.
The visualization effect is shown in Fig. 13. In Fig. 13(a), the buildings judged to be in front of the viewpoint are loaded with the LOD4 model, the buildings judged to be sheltered are loaded with the LOD3 model, and the buildings judged to be included are loaded with the LOD2 model. In Fig. 13(b), the LOD4 model is loaded for the buildings judged to be in front of the viewpoint, and the LOD3 model is loaded for the buildings judged to be sheltered. This can effectively reduce the transfer of useless data, improve the speed of data scheduling, and build scenarios faster.
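The relation-to-LOD assignment described for Fig. 13 amounts to a lookup rather than a per-entity screen error calculation, as the following minimal Python sketch shows. The relation labels and function names are hypothetical, and the fallback level for entities with any other relation is an assumption; only the three mappings (in front: LOD4, sheltered: LOD3, included: LOD2) come from the text.

```python
from typing import Dict

# Mapping taken from the description of Fig. 13.
RELATION_TO_LOD: Dict[str, int] = {
    "in_front": 4,   # buildings judged to be in front of the viewpoint
    "sheltered": 3,  # buildings judged to be sheltered
    "included": 2,   # buildings judged to be included
}
FALLBACK_LOD = 1     # assumption: level used for any other relation


def select_lod(relation: str) -> int:
    """Pick the LOD level to load from a building's relation label."""
    return RELATION_TO_LOD.get(relation, FALLBACK_LOD)


def plan_loading(relations: Dict[str, str]) -> Dict[str, int]:
    """Map each building name to the LOD level that should be requested."""
    return {name: select_lod(rel) for name, rel in relations.items()}
```

Because the level follows directly from the relation stored in the scene graph, the cost per building is a single dictionary lookup.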

Conclusions and Prospects
To address the low data scheduling efficiency and rendering delays that arise when constructing complex urban 3D scenes, we proposed a visual-perception-driven strategy based on scene graphs. In experiments, the proposed method was used to construct a 3D visualization platform of a city in China. According to the spatial distribution characteristics of the 3D scene data, we used a scene graph to organize the local 3D scene of the city and the LOD simplification algorithm to simplify the 3D city model into four resolution levels. Then, the relationship attributes between geographical entities provided by the scene graph were used to construct the visual-perception evaluation model. Finally, data were preloaded on the basis of viewpoint information, and a weight evaluation of the visual-perception evaluation model based on geographical relations was performed to constrain the adaptive scheduling of different LOD models. The proposed method meets the performance requirements of the system (fluency, stability) and improves the flexibility of scene management (scene graph). The experimental results show that the proposed method can effectively improve the efficiency of data scheduling and accelerate the construction of local 3D scenes.
However, we did not study the data cache replacement mechanism or build a complete data scheduling system based on visual-perception evaluation. In addition, the 3D scene management of a macro city scene and a local city scene was not meaningfully unified. The design of the LOD model is based solely on the "Technical Specifications for Urban 3D Modeling" and does not consider the characteristics of visual perception for in-depth research. Other directions also need further research. One possible approach is to organize the macro and meso scenarios in a suitable way, and select the appropriate nodes in the meso scenarios as the root nodes of the micro scenarios to organically integrate the scenes of the three scales.
Tao Shen graduated from Beijing Normal University with a doctorate in science in 2007. At present, he is mainly engaged in the teaching and research of remote sensing and geographic information, and his research directions include remote sensing information processing, ecological environment remote sensing, spatial information analysis, and spatial information system evaluation. He has written one monograph and won three provincial and ministerial science and technology progress awards. (shentao@bucea.edu.cn)

Liang Huo is an expert in spatial information system software evaluation of the Remote Sensing Center of the Ministry of Science and Technology. At present, he is mainly engaged in the research and teaching of cartography and geographic information engineering, and his research direction is UNINFO, a geographic information system software platform with independent intellectual property rights. (huoliang@bucea.edu.cn)

Xiaoyong Zhang graduated from Beijing University of Civil Engineering and Architecture in 2021 with a master's degree. He is now working at the Chinese Academy of Fishery Sciences. His research interests include 3D geographic information systems and spatio-temporal data analysis. (460846124@qq.com)