Enhancing Validity of Green Building Information Modeling with Artificial-neural-network-supervised Learning —Taking Construction of Adaptive Building Envelope Based on Daylight Simulation as an Example

Green building information modeling (Green BIM) is focused on a project using BIM as a basic tool from the beginning of the design stage and employs building performance analysis (BPA) in the design-analysis decision-making cycle to obtain an optimized design proposal. However, there are inevitable discrepancies between the simulated performance data and the data obtained from the actual environment. Neural network learning can be used in conjunction with training to obtain a predictive ability, and the resulting predictive values are more representative of actual performance than simulation values. In this study, it is proposed that a predictive value be used instead of a simulation value in judging whether design goals have been met. To construct an adaptive building envelope based on daylight simulation, this project plans to carry out the following six steps in a two-stage process: Stage 1: Data collection and learning: (1) BIM modeling, (2) BPA performance simulation, (3) production of an actual structure and illuminance measurement, and (4) collection of sample data to perform training in supervised neural network learning. Stage 2: After obtaining a predictive ability: (5) setting targets to find an optimized adaptation plan and (6) implementation of script-oriented automatic control.


Introduction
Integrated design and analysis procedures based on green building information modeling (Green BIM) have become an important tool for architects and design teams wishing to select and improve design proposals. Nevertheless, when building performance analysis (BPA) software is used to predict building performance in actual environments, there are inevitable discrepancies σ_s between the simulation data obtained from the software and measurements in the actual environment (Fig. 1), which have caused the validity of the software's simulated performance to be questioned. The project discussed in this paper therefore seeks to use supervised learning by a neural network to reduce this gap and enhance the optimization ability of Green BIM.

Literature Review
This study addresses the subjects of Green BIM and artificial neural network (ANN)-supervised learning. Green BIM involves BIM and BPA, and these two methods have been used extensively in sustainable building design. Krygiel et al. first proposed the concept of Green BIM in 2008 and explained the integrated application of BIM and BPA to promote the development of sustainable design. (1) Bernstein et al. pointed out that Green BIM could greatly enhance the results of sustainable design through the application of BIM tools. (2) BIM involves building information modeling and building information management. The use of BIM encompasses the entire building life cycle, including building design, construction drawing, construction, operation management, and even waste recycling. The term BIM originated from Autodesk's use of the concept of building information modeling in 2002 to explain the function and design of its architecture, engineering, and construction (AEC) products. (3) Nevertheless, in 1999, Eastman had already defined building product model concepts, technologies, and standards, which set the stage for BIM. (4) In 2008, Eastman defined BIM and related technologies in a handbook and provided BIM applications and illustrative cases for various types of participants (e.g., project owners, project managers, designers, engineers, and contractors). (5) Although BPA and BIM are two different technologies, they have become increasingly integrated. BPA, which is also known as building performance simulation (BPS), involves the use of computer software to predict building performance and output visualized images, data, statistical analysis charts, and forms resulting from the simulation. BPA can help users to understand the performance of their design proposals, which facilitates design decision-making and provides a basis for the continuing optimization of design proposals. BPA is an effective, scientific, and internationally acknowledged tool. (6)
Early modeling and performance simulation tools were independent, and performance simulation tools usually consisted of two parts: a simulation engine, which included formulas and procedures, and a user interface, which facilitated the input of parameters and data and the display of results, and handled various user requirements (Fig. 2). (7) Basic building simulation work began in the 1960s and 1970s, and focused on building cooling performance, specifically thermal load calculations and energy consumption analysis. (8,9) By the 1980s, researchers were performing analytical verification and experimental testing to improve simulation tools. (10) The focus of performance analysis shifted from energy consumption to many other building performance characteristics during the early 1990s, and integrated modeling was used to assess heat and mass transfer, airflow, and visual and acoustic performance. (11) In recent years, BPA has gradually come to be seen as part of integrated design procedures and is generally integrated with a BIM platform. For instance, Autodesk's BIM software (Revit) includes a built-in menu of BPA functions (such as energy and lighting analysis). After performing modeling with BIM software, designers can also transmit geometric and nongeometric data to simulation engines in the cloud (such as Green Building Studio), and the visualized results of analysis are transmitted back to the BIM software. (12) Although the analysis by simulation engines has required the use of third-party user interface software (such as Design Builder) for display, this role is gradually being assumed by BIM software platforms (such as Revit) (Fig. 3).
BPA is based on hypothetical models of real situations and provides approximate values. As a consequence, discrepancies inevitably exist between the results of performance simulation and the real data, which has caused the validity of the software to be extensively questioned. Nevertheless, the use of BIM models to monitor actual building operating performance during the operating management stage can provide environmental and building performance data that can be used to improve actual building management and enhance building performance. (13) If this data could be used for comparative purposes and specifically to revise the predictive values obtained during the design stage, it should be possible to improve the predictive accuracy of Green BIM.
This study recommends a supervised learning backpropagation neural network (BPN) that gains predictive capabilities through training to reduce discrepancies and uses the root mean square error (RMSE) to confirm the results. Neural network learning falls roughly into three categories: supervised learning, unsupervised learning, and reinforcement learning. This study employed supervised learning in an attempt to reduce the data discrepancy; the principles and theory of this process are as follows. Supervised learning is an inferential process in which the corresponding functions are derived from the inputs and outputs for given examples. For instance, the network will generate a function h approximating f from a group of examples of f. An example consists of a pair (x, f(x)), where x is the input and f(x) is the output of the function applied to x. Function h is termed the hypothesis, and the set of all possible hypotheses is termed the version space. All hypotheses in the version space must be consistent with the examples. Supervised learning takes prior knowledge as the basis for a current best hypothesis search in the version space, and this consists of a search for the hypothesis h best approximating the target function f. The process of searching for function f or its optimal hypothesis in a version space is known as learning or training. (14) The BPN is one of the commonly used architectures for neural network training. The output of a function can be a continuous value, in which case the task is called regression, or a label, in which case it is called classification. In predictive analytics, the RMSE is usually used to evaluate the model results. (15) The BPN has been applied to environmental performance analysis with good predictive results, using historical data collected from weather stations for training. Its applications include predictions of solar radiation, (16,17) air quality, (18) and wind speed. (19)
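To make the RMSE criterion concrete, the following minimal Python sketch computes the RMSE of two hypothetical series against a measured series. The function name rmse and all illuminance values are illustrative assumptions and are not taken from the study's data.

```python
import math

def rmse(predicted, observed):
    """Root mean square error between two equal-length sequences."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical illuminance values in lux, for illustration only.
measured = [520.0, 610.0, 700.0]
simulated = [620.0, 710.0, 800.0]   # each value is 100 lux above the measurement
predicted = [530.0, 600.0, 710.0]   # each value is within 10 lux of the measurement

simulation_error = rmse(simulated, measured)
prediction_error = rmse(predicted, measured)
print(simulation_error, prediction_error)
```

In this toy case the series that lies closer to the measurements yields the smaller RMSE, which is the sense in which the RMSE is used to compare simulated and predictive values.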

Theory and Method
To summarize the above literature, Green BIM emphasizes the use of BIM, a basic design tool, from the earliest stage of the design process. In response to local climatic conditions, BPA can be used in the decision-making cycle consisting of design and analysis steps to achieve the continuing optimization of design and generate an optimized proposal consistent with environmental performance requirements. Nevertheless, when an optimized proposal derived using Green BIM is realized under real-world conditions, the simulation values obtained by the software invariably differ from the actual measured environmental performance. Taking light environment adaptation as an example, when the working surface illuminance value with a window opening ratio of X% derived by a simulation tool is Y′ lux and the actually measured illuminance value in a real environment with a similar window opening ratio is Y lux, a discrepancy σ_s exists between Y′ and Y (Fig. 4).
In a basic BPN such as that shown in Fig. 5, data undergoes four processing stages from input to output: (1) input, (2) an aggregator function (sometimes an activation function must be added to make the aggregator function more sensitive), (3) a transfer function, and (4) output. In addition, the system estimates the cost between the actual and desired output values, calculates the error, and adjusts the weights (ω_n) in accordance with the error. The process that begins when the neural network starts revision and continues until the error is less than a certain preset threshold value is termed learning, training, or adaptation. Supervised learning refers to the constant revision of the network's transmission weights to achieve consistency with the expected value.
In the training process, weights are adjusted to reduce the discrepancy between the network's actual and target output values until the difference is less than a certain threshold value, at which point the process stops. In principle, a good hypothesis must generalize well, which will allow the system to make correct predictions concerning unknown examples. (20)
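The training loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the network used in the study: the layer size, learning rate, momentum value, stopping threshold, and the synthetic normalized sample pairs are all hypothetical, and the function names forward, rmse, and train are introduced only for this example.

```python
import math
import random

def forward(x, w_hid, b_hid, w_out, b_out):
    """Aggregate the input, apply the tanh transfer function, and return
    the hidden activations together with the network output."""
    hidden = [math.tanh(w * x + b) for w, b in zip(w_hid, b_hid)]
    output = sum(w * h for w, h in zip(w_out, hidden)) + b_out[0]
    return hidden, output

def rmse(samples, params):
    """RMSE between the network outputs and the desired values."""
    total = sum((forward(x, *params)[1] - y) ** 2 for x, y in samples)
    return math.sqrt(total / len(samples))

def train(samples, n_hidden=3, rate=0.05, momentum=0.9,
          threshold=0.02, max_epochs=5000, seed=0):
    """Batch backpropagation with a momentum term (all settings assumed)."""
    rng = random.Random(seed)
    # params = [hidden weights, hidden biases, output weights, output bias]
    params = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(3)]
    params.append([0.0])
    velocity = [[0.0] * len(p) for p in params]
    initial_error = rmse(samples, params)
    for _ in range(max_epochs):
        grads = [[0.0] * len(p) for p in params]
        for x, y in samples:
            hidden, output = forward(x, *params)
            err = output - y                       # actual minus desired output
            for j, h in enumerate(hidden):
                # Error propagated back through the tanh transfer function.
                back = err * params[2][j] * (1.0 - h * h)
                grads[0][j] += back * x
                grads[1][j] += back
                grads[2][j] += err * h
            grads[3][0] += err
        for p, g, v in zip(params, grads, velocity):
            for j in range(len(p)):
                v[j] = momentum * v[j] - rate * g[j] / len(samples)
                p[j] += v[j]
        if rmse(samples, params) < threshold:      # stop below the preset threshold
            break
    return params, initial_error, rmse(samples, params)

# Hypothetical normalized (simulated, measured) pairs, for illustration only.
samples = [(i / 10.0, 0.8 * (i / 10.0) + 0.1) for i in range(11)]
params, initial_error, final_error = train(samples)
print(initial_error, final_error)
```

The loop stops either when the training error falls below the threshold or when the epoch limit is reached; checking generalization would additionally require evaluating the trained weights on examples that were not used for training.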

Experimental Verification
In accordance with the neural network learning characteristics and steps discussed in the previous section, this study verified the feasibility of this method in a six-step, two-stage experimental process (Figs. 6 and 7).
A. Stage 1: Data collection, learning algorithm, and acquisition of predictive ability (Fig. 6). The steps are as follows:

1. BIM modeling
Revit was used for modeling and, for consistency, Dynamo software was used to control the Revit model and adjust the façade window opening ratio X% (Fig. 8).

2. BPA performance simulation
The BIM model was exported to analyze the performance of the daylight environment. The Revit model was output in gbXML format to Ecotect to simulate the working surface illuminance, and the simulated illuminance (Y′ lux) was obtained from an exported text file. The latitude and longitude in this trial were (24.1638, 120.6471), and the simulation was run for three days in 2017: October 1, October 9, and October 16. The four red dots in Fig. 9 (sp_1 to sp_4) represent the simulated illuminances (Y′ lux) at different points in time. The recorded illuminances at the start of each hour and 30 min past each hour were used as the training set input values (Table 1), while the recorded data at 15 and 45 min past each hour were used as the testing set input values (Table 2).
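The division of records into training and testing sets by time of day can be sketched as follows. This is a minimal illustration: the function name split_by_minute, the record layout, and the illuminance values are hypothetical and do not reproduce the contents of Tables 1 and 2.

```python
from datetime import datetime

def split_by_minute(records):
    """Split (timestamp, illuminance) records into training and testing sets.

    Records at 0 and 30 min past the hour go to the training set;
    records at 15 and 45 min past the hour go to the testing set.
    """
    training, testing = [], []
    for stamp, lux in records:
        if stamp.minute in (0, 30):
            training.append((stamp, lux))
        elif stamp.minute in (15, 45):
            testing.append((stamp, lux))
    return training, testing

# Hypothetical simulated illuminances (lux) for one hour, for illustration only.
records = [
    (datetime(2017, 10, 1, 9, 0), 1200.0),
    (datetime(2017, 10, 1, 9, 15), 1250.0),
    (datetime(2017, 10, 1, 9, 30), 1300.0),
    (datetime(2017, 10, 1, 9, 45), 1340.0),
]
training_set, testing_set = split_by_minute(records)
print(len(training_set), len(testing_set))
```

Interleaving the two sets in this way keeps the testing records within the same daylight conditions as the training records while ensuring that no record is used for both purposes.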

3. Actual construction and illuminance measurement
The actual construction was built in accordance with the BIM model. Dynamo was linked with the Firefly and Arduino plugins, the window opening ratio (X%) was entered into Arduino to drive and control the adaptive building façade in the actual construction, and a light meter was used to measure the actual illuminance of the working surface. The latitude and longitude of the actual structure were the same as those of the simulated location, and the measurement periods were the previously mentioned three days in October 2017. The four red spots (rp_1 to rp_4) in Fig. 10 represent the actual measured illuminances at different points in time. The recorded illuminances at the start of each hour and 30 min past each hour were used as the training set desired values (white background in Table 3), while the recorded data at 15 and 45 min past each hour were used as the testing set desired values (orange background).
4. Collection of sample data, implementation of supervised learning training, and acquisition of predictive ability
The simulated light environment data obtained by BPA were used as the input values, and the measured illuminances from the actual structure served as the desired values. After supervised learning training, the BPN acquired predictive ability and was able to predict the approximate Y″ (predictive values) from Y′ (simulation values). The following steps were employed when using neural network software to learn from the sample data:
[1] This study employed NeuroSolutions software with a multilayer BPN as the learning algorithm. The training and testing sets were both selected from the sample data (Fig. 11). (21)
[2] Definition of the input and expected values in the rows and columns of the training set.
[3] Definition of the percentage of the data set used for cross-validation: 20% in this example (Fig. 12).
[4] Definition of the transfer function: in this example, the hyperbolic tangent function was chosen as the transfer function, and a momentum term was added to the learning rule to increase the rate of network weight adjustment (Fig. 13).
[5] Training set learning (Fig. 14).
[6] After predictive ability was acquired, the sample data in the testing set were used to perform validation. The left side of Table 4 shows the predictive values Y″ = ap_n, and the right side shows the actual measured values Y = rp_n.
[7] It was confirmed that the network system had learned from the sample data and possessed predictive ability. As shown in Table 5, using the simulation records and on-site measurements for the three days, the simulated values sp_n were employed as the input values in the neural network training, and the measured values rp_n were taken as the expected values; the results verified that the predictive values had greater validity and good reliability, as shown below.
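The 20% cross-validation share set in step [3] can be illustrated with a small Python sketch. How NeuroSolutions selects the cross-validation records is not stated in this paper, so the seeded random shuffle used here is an assumption, as are the function name holdout_split and the sample pairs.

```python
import random

def holdout_split(samples, fraction=0.2, seed=0):
    """Reserve a fraction of the samples for cross-validation.

    Returns (training samples, cross-validation samples). The random
    shuffle is an assumed selection rule, used only for illustration.
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_holdout = max(1, round(len(shuffled) * fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]

# Hypothetical (simulated, measured) pairs, for illustration only.
pairs = [(float(i), float(i) * 0.8) for i in range(10)]
train_pairs, cv_pairs = holdout_split(pairs)
print(len(train_pairs), len(cv_pairs))
```

The cross-validation records are not used to adjust the weights; they serve only to monitor whether the error on unseen examples keeps decreasing during training.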
Table 5 (Color online) Training set, test set data collection, and analysis (excerpt).
This study uses the RMSE as the standard deviation. If the measured values are taken as the true values, then
i. The standard deviation σ_a of the predictive values ap_n relative to the measured values rp_n was smaller at all four points than the standard deviation σ_s of the simulated values sp_n relative to the measured values rp_n. The population standard deviation at point p_1 decreased from 2814 to 2236, that at point p_2 decreased from 2850 to 2064, that at point p_3 decreased from 2115 to 1416, and that at point p_4 decreased from 2380 to 1861 (Table 6) [Eqs. (2) and (3)]. The predictive values were uniformly smaller than the simulated values at all four points and closer to the measured values, which verified that the validity of the predictive values increased after the system underwent training.
ii. The magnitudes of the decreases at points p_1 through p_4 [Eq. (4)] were 578, 786, 699, and 519, and the mean decrease was A = 645.5 [Eq. (5)]. The standard deviation σ_Δ of the decreases at the four points was 103.87 [Eq. (6)]. With regard to illuminance, a spread of 103.87 lux is not large, which indicates that the training results at the four points were reasonable and that the findings of this study had good reliability.
iii. The calculation formulas are explained as follows. When N records are made at observation point p_n, the following formula was used to obtain the population standard deviation σ_s of the simulated values sp_n and measured values rp_n at observation point p_n, where i indexes the records:
$$\sigma_s(p_n) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(sp_{n,i} - rp_{n,i}\right)^2} \tag{2}$$
When N records are made at observation point p_n, the following formula was used to obtain the population standard deviation σ_a of the training values ap_n and measured values rp_n at observation point p_n:
$$\sigma_a(p_n) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(ap_{n,i} - rp_{n,i}\right)^2} \tag{3}$$
The formula for the magnitude of the decrease Δ between the population standard deviations σ_s and σ_a at observation point p_n was
$$\Delta_n = \sigma_s(p_n) - \sigma_a(p_n) \tag{4}$$
When there are k observation points p, the mean A of the decreases Δ_n (n = 1, ..., k) from point p_1 to point p_k is obtained as
$$A = \frac{1}{k}\sum_{n=1}^{k}\Delta_n \tag{5}$$
The standard deviation σ_Δ of the decreases from point p_1 to point p_k is obtained as
$$\sigma_\Delta = \sqrt{\frac{1}{k}\sum_{n=1}^{k}\left(\Delta_n - A\right)^2} \tag{6}$$
B. Stage 2: Setting of targets in accordance with prediction, finding an optimized adaptation plan, and performing automated control (Fig. 7). The following steps were employed:
5. Finding an optimized adaptation plan
After the system acquired predictive ability, it was able to use the predictive value Y″ as its target setting condition and find an optimized adaptation plan.
In other words, in the future, it will only be necessary to input a set of simulation values into the trained neural network as the testing set, and the network will return the corresponding predictive values. Regarding the setting of targets, taking the light environment as an example, illuminance levels can be set according to the planned uses and activities of a space with reference to the Chinese National Standards (CNS) illuminance standards. For instance, the location of the actual measurements was designated as a studio used for reading and writing. As a result, the appropriate illuminance range for working surfaces within the space was set as 500-1000 lux (Table 7). The window opening ratio X% and predictive value Y″ in the adaptation plan had to satisfy this target range.
6. Implementation of script-oriented automatic control
In accordance with the parameters of the optimized proposal, Dynamo relied on linkage with the Firefly and Arduino plugins to perform script-oriented automatic control and drive the adaptive façade elements of the actual structure. This system operated cyclically and enhanced environmental quality by using adaptive mechanisms to respond to environmental changes. Figure 15 shows how Dynamo sends and receives data from the Arduino IDE software via the Firefly plugin to control the embedded microprocessor and execute the script to trigger device actions. (23)
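The target check in step 5 can be sketched in Python as a filter over candidate window opening ratios. This is an illustration under stated assumptions: the paper only requires the predictive value to fall within the 500-1000 lux range, so the tie-breaking rule used here (preferring the prediction closest to the middle of the range), the function name choose_opening_ratio, and the candidate values are all hypothetical.

```python
def choose_opening_ratio(predictions, low=500.0, high=1000.0):
    """Pick a window opening ratio whose predicted illuminance meets the target.

    predictions maps an opening ratio (%) to a predicted illuminance (lux).
    Among the ratios within [low, high], the one closest to the middle of
    the range is returned (an assumed preference); None if none qualifies.
    """
    target = (low + high) / 2.0
    feasible = {ratio: lux for ratio, lux in predictions.items()
                if low <= lux <= high}
    if not feasible:
        return None
    return min(feasible, key=lambda ratio: abs(feasible[ratio] - target))

# Hypothetical predictive values Y'' (lux) per opening ratio X%, for illustration only.
candidates = {20: 380.0, 40: 640.0, 60: 790.0, 80: 1150.0}
selected = choose_opening_ratio(candidates)
print(selected)
```

In a cyclic control setting, a selection of this kind would be repeated whenever new predictive values become available, and the chosen ratio would then be passed to the façade control script.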

Conclusions and Recommendations
BPS simulation values are approximations of real values. The greater the validity of Green BIM, the better it can discover problems early in the design stage, enabling the proposal of precise decision-making strategies and marked reductions in construction and operating costs. ANN-supervised learning can reduce the discrepancy between predictive and actual values and enhance the validity of Green BIM. In accordance with the theory and method outlined in this paper, a six-step, two-stage process was employed to verify the optimization strategy in a virtual environment, construct an adaptive mechanism based on the light environment in a physical environment, and perform script-oriented automated control. The completed building prediction and control system was compiled as shown in Fig. 16. This system enables design and analysis work during the initial stage of Green BIM to be used in conjunction with environmental data during the operating management stage, which can boost the validity of Green BIM through the use of environmental data records and feedback. In addition, recommendations for further research are as follows:
(1) The standard deviation in Eqs. (2) and (3) is also known as the mean Euclidean distance (MED) and is an indicator of the distance between the prediction and learning sample points of a neural network. (24) The smaller the MED value, the closer the predicted time sequence values are to the true values. Although this study has shown that the validity of the predictive values increased, whether an algorithm to find the minimum sampling distance exists must await future research.
(2) BPA is gradually being considered as part of integrated design procedures and is increasingly integrated with BIM platforms. The BIM platform Revit can already use a lighting analysis plugin to analyze natural lighting and visualize illuminance. However, this plugin lacks a numerical output function. The Ecotect software used in this study was withdrawn from update service in March 2015, and illuminance analysis can only be performed by exporting data from the Revit model in gbXML format to Ecotect, which cannot be considered a fully integrated part of the BIM platform. In the future, if lighting analysis plugins add a numerical output function, this will facilitate the convenient generation of predictive values and enable script-oriented automatic control. In this way, the integrated light environment adaptive capability of Green BIM will become increasingly accurate and effective.
(3) This study takes daylight simulation as an example, but its method can also be applied to climate analysis, thermal comfort, energy calculations, and other BPA dimensions.
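The aggregate statistics reported for the four observation points can be rechecked directly from the standard deviations quoted in the text. The short Python sketch below follows the verbal definitions of Eqs. (4) to (6), using a population (divide by k) standard deviation; the variable names are introduced only for this example.

```python
import math

# Population standard deviations at points p_1 to p_4, as reported in the text.
sigma_s = [2814, 2850, 2115, 2380]   # simulated vs. measured values
sigma_a = [2236, 2064, 1416, 1861]   # predictive vs. measured values

# Eq. (4): decrease at each observation point.
decreases = [s - a for s, a in zip(sigma_s, sigma_a)]

# Eq. (5): mean decrease over the k observation points.
mean_decrease = sum(decreases) / len(decreases)

# Eq. (6): population standard deviation of the decreases.
sigma_delta = math.sqrt(
    sum((d - mean_decrease) ** 2 for d in decreases) / len(decreases)
)

print(decreases, mean_decrease, round(sigma_delta, 3))
```

The recomputed decreases and their mean agree with the reported values of 578, 786, 699, 519 and A = 645.5, and the recomputed spread of about 103.876 matches the reported 103.87 up to rounding.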