Big Data Analysis for Effective Management of Power Distribution Network

To find a way to manage power distribution networks efficiently, we investigated the use of big data analysis and established a model with mathematical functions to assess the benefit, risk, and economy of the power supply in a power distribution network. The necessary data were collected from sensors in the network and analyzed with an algorithm based on particle swarm optimization (PSO). Wind and solar power were adopted as the distributed power generation (DG) sources. The results showed that the position at which the DG is connected to the network is important because it affects the benefit and risk of the power supply. We tested three different DG connection positions, whose maximum power supplies differed by 10%. Besides the DG access position, risk assessment and the degree of risk-taking also had significant effects on the efficient management of the network. The model with the power supply risk function (R-PS) required a fourfold higher power supply from the DG and yielded a higher power supply (11%) and overall benefit (44%) than the model without the risk function. The degree of risk-taking also affected network management: management with high risk-taking needed less power from the DG (14%) and less power supply (2%), and had one-third less overall benefit than management with low risk-taking. We expect the method and results of this study to provide a model for the effective management of power distribution networks with power from DG sources.


Introduction
Power grids are becoming 'intelligent' with the development of big data analysis technology. Modern power distribution networks are highly interactive and coupled with other technologies such as the Internet of Things (IoT). This makes the structure of distribution networks more complicated, so big data analysis is needed to achieve the efficiency and coverage required for the safe and economic operation of the network. In such networks, the allocation and capacity of distributed power generation (DG) are important for ensuring a stable power supply. (1) The DG must ensure the security and stability of the power supply of the network. Real-time DG access requires the appropriate selection of DG and its position in the network, together with optimization of the economy of a network with multiple loads. (2) However, previous studies (1,2) did not consider the real-time characteristics of DG or changes in the power capability of the network. To address this, Xu and An established an objective function to guide DG access from the perspective of network loss and benefits, (3) and Teng et al. evaluated reactive power optimization in a distribution network with DG. (4) Such methods enable effective operation and reduced power loss in distribution networks with DG. (5) Conventional cost estimation considers the failure rate and reliability index of a distribution line with line switches. (6,7) Owing to the differences between transmission and distribution networks, the K (N − 1 + 1) reliability criterion is usually used to estimate the cost of line switches. However, previous studies did not consider the cost of switches in a distribution network with DG.
Researchers have proposed a DG grid connection method based on network acceptance (8) and adjusted the level and distribution of loads in the network by controlling photovoltaic inverters. (9) Yang proposed a method of analyzing the DG acceptance capacity based on the flexibility of the distribution network. (10) With this method, a hierarchical, multilevel analysis was performed, and data from distribution networks were processed to plan the related network. (11) As the development of sensor technologies and wireless sensor networks creates huge databases in power distribution networks with DG, several methods of evaluating the importance of big data in power grids have been proposed, (12) and the benefit of DG to power distribution networks has been demonstrated. (13) To plan a power grid economically, DG positions must be chosen to distribute power reasonably within the capacity of the grid. It is also necessary to determine how to use big data to integrate multilevel data and optimize DG access so as to improve the economy of the power supply when structuring the network. This ensures both a sufficient power supply capacity and the reliability of the distribution network.
The above studies all used various data analysis methods to provide more reasonable suggestions for DG access to the network. In practice, however, the location of DG access to a distribution network is determined by the power user, the output power of DG is random, and different types of DG matter differently for different aspects of grid operation, such as power quality and node voltage. In general, the impact of DG on a distribution network is comprehensive. A distribution network obtains data through various sensors and analyzes them with big data analysis technology. Therefore, a comprehensive benefit measure can be used to reflect the impact of DG access and thereby guide DG access to the distribution network.
With this background, we propose in this paper a new evaluation method for distribution networks based on big data analysis. The method first guarantees the power supply capacity of the distribution network. Then, to assess the degree of risk and the practical benefits of DG for the network, we use structural and economic big data to carry out a hierarchical analysis of the distribution network and evaluate its economic benefits. To do this, we evaluate the power supply capacity of the network, estimate the risk function and the maximum benefit, and determine the optimal size of the DG connected to the distribution network. The results ensure the safety and economic viability of the distribution network and provide guidelines for operating distribution networks.

Power Supply of Distribution Network
Power supply affluence refers to a capacity sufficient to supply the load of a power distribution network with a margin. Such affluence reflects the relationship between the reliability and the economy of the distribution network, and sufficient power capacity is necessary to sustain the safety of a network. Figure 1 shows the adequacy of a distribution network within the boundaries in which the power load can be supplied. Within the safety range between boundaries one and two, the larger the power generation (load), the greater the power supply affluence of the system. Figure 1 also implies that, depending on the initial conditions of the system and the agreed boundaries, affluence has two aspects: (1) When the power supply is between boundary one and the x-axis, the power supply capacity, the network structure, and other factors can accommodate an increase in the power load. Thus, the larger the ratio of the power supply (generation) to the power load, the stronger the network. (2) When the power supply is between boundary two and the x-axis, the capacity of the network can no longer handle the demanded load, transmission, and supply of power. In this case, affluence is not satisfied, and the amount of load (transaction) that must be disconnected from the network needs to be determined within the limit acceptable to the network. (14) In Fig. 2, an optimal operating point of the network is denoted $S_n$, its degree of affluence $D_n$, and its probability of occurrence $P_n$. Under normal circumstances, the maximum and minimum operating points of the network are determined with deterministic criteria. The final operating range of the network is therefore $\{\min(D_n), \max(D_n \times P_n)\}$.
To determine the working capacity of the system more accurately, the operating range must consider not only the adequacy of each operating point but also its probability of occurrence. (14) Conventionally, the adequacy of operating points is judged with deterministic decision-making criteria based on the most serious incident, without considering the economy of network operation or the occurrence probability of the operating points. Because affluence determined in this way is generally conservative, network operation is uneconomical. However, if a probabilistic evaluation takes into account both the adequacy of the power distribution network and the likelihood of each operating point, the range of economic network operation becomes more reasonable and practical.

Power Supply Capacity
A distribution network has an economic power supply capacity when its power supply is adequate. (15) Because DG output is highly volatile and cannot be predicted accurately, connecting DG to a distribution network raises the risk of a power deficiency in the network. Therefore, the power supply risk of DG must be considered when determining the power supply capacity of the distribution network. The power supply risk is represented by the probability of the power supply at different levels. The risk probability of the power supply with connected DG is calculated from $N(s < s_A)$, the number of samples in which the power supply $s$ is below the level $s_A$; $N_S$, the total number of samples; and $f(s_A)$, the consequence associated with the degree of risk in practice. The greater the risk, the more serious the possible consequence (the loss of the DG power supply). The risk analysis requires three functions: a power supply benefit function (B-PS), a power supply risk function (R-PS), and an economic power supply function (E-PS). Together, these functions reflect the economic power supply capacity of a distribution network with DG. They are described below.
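As a minimal sketch of how the sample count $N(s < s_A)$ and the sample size $N_S$ can be turned into an empirical probability, the Python functions below count the Monte Carlo samples of the power supply that fall below a level $s_A$ and divide by $N_S$. How this ratio is combined with the consequence term $f(s_A)$ is not reproduced here, and the function names and sample values are hypothetical.

```python
from typing import Iterable, List, Sequence


def shortfall_probability(samples: Sequence[float], s_a: float) -> float:
    """Empirical N(s < s_A) / N_S over Monte Carlo samples s of the power supply."""
    n_s = len(samples)
    if n_s == 0:
        raise ValueError("at least one sample is required")
    n_below = sum(1 for s in samples if s < s_a)  # N(s < s_A)
    return n_below / n_s


def shortfall_curve(samples: Sequence[float], levels: Iterable[float]) -> List[float]:
    """Evaluate the empirical probability at several levels s_A (a sampled CDF)."""
    return [shortfall_probability(samples, s_a) for s_a in levels]


# Hypothetical maximum-power-supply samples in MW (e.g., from Monte Carlo + RPF runs).
samples = [4.9, 5.1, 5.3, 5.0, 5.6, 4.7, 5.2, 5.4]
print(shortfall_probability(samples, 5.15))  # 4 of 8 samples are below 5.15 MW
print(shortfall_curve(samples, [4.5, 5.0, 5.5, 6.0]))
```

Evaluating the ratio over a grid of levels gives a sampled curve of the kind plotted as a risk curve, and the strict inequality follows the notation $N(s < s_A)$.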

Power supply benefit function
A larger power supply gives a greater economic benefit because users obtain more available power. For cross-sectional transmitted power, (16) the B-PS is defined in terms of $T_{obv}$, the observation time for calculating the transmitted power; $C_{uit}$, the economic benefit per unit of transmitted power; and $f(x)$, the risk probability function of the power supply.

Power supply risk function
According to the principle of safe and stable operation of the network, (17) the estimation of the economic benefit of the power supply in a power grid considers various types of incidents that may occur during the transmission of the DG output.
First, the benefit function y(x) can be expressed as three different equations.
Then, the R-PS is defined in terms of $T_{res}$, the duration of occurrence of event $E_i$; $L$, the economic cost caused by the event; $f(s_A)$, the risk of occurrence of event $A$; $g_{DG}$, the average probability of failure of the power distribution; and $s_0$, the loss of DG in the back-up power supply of the network.

Economic power supply function
Based on the relationship between the B-PS and the R-PS, the E-PS, $E_{ps}(x)$, is defined in Eq. (6) as a combination of the B-PS, $B_{ps}(x)$, and the R-PS, $R_{ps}(x)$. Equation (6) represents both the actual risk and the actual benefit, together with the decision-maker's attitude toward risk. Combining theoretical analysis with practical experience in this way makes the overall results more reasonable.
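The exact form of Eq. (6) is not reproduced here. Purely as an illustration, the sketch below assumes that the E-PS subtracts a risk-attitude-weighted R-PS from the B-PS, $E_{ps}(x) = B_{ps}(x) - \lambda R_{ps}(x)$, where the weight $\lambda$ (`risk_weight`) is a hypothetical stand-in for the decision-maker's attitude (a smaller $\lambda$ meaning more risk-taking). The benefit and risk curves in the usage lines are placeholders, not the functions of this study.

```python
from typing import Callable, Iterable, Tuple


def economic_power_supply(
    benefit: Callable[[float], float],
    risk: Callable[[float], float],
    candidates: Iterable[float],
    risk_weight: float = 1.0,
) -> Tuple[float, float]:
    """Return (x*, E(x*)) maximizing the assumed E(x) = B(x) - risk_weight * R(x)."""

    def e_ps(x: float) -> float:
        return benefit(x) - risk_weight * risk(x)

    best_x = max(candidates, key=e_ps)  # exhaustive search over the candidate grid
    return best_x, e_ps(best_x)


# Placeholder curves: benefit grows linearly and risk quadratically with x (MW).
grid = [i * 0.5 for i in range(21)]  # 0.0 .. 10.0 MW
print(economic_power_supply(lambda x: 10 * x, lambda x: x ** 2, grid))
print(economic_power_supply(lambda x: 10 * x, lambda x: x ** 2, grid, risk_weight=2.0))
```

With these placeholder curves, raising the weight on risk moves the maximizing supply from 5.0 to 2.5, which shows how a risk-attitude term can shift the economic operating point. A grid search is used only for transparency; the study itself uses PSO for this step.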

Data integration
The data of the distribution network are obtained from various types of sensors.
(1) Grid structure data. The real-time data of a general power grid come mainly from the supervisory control and data acquisition (SCADA) system. SCADA provides real-time operation data on the detailed state of the power grid, which are collected and recorded every minute; these include the system frequency, total output, total load, and station operating conditions such as unit output, line power flow, bus voltage, and switch status. The data are mainly collected by remote terminal units (RTUs) installed at the stations. The interface can be an RS485 or RS232 asynchronous serial data interface, and, depending on the information demand, the data rate is generally 1200 to 9600 bit/s.
(2) Telemeter reading data. The telemeter reading system (TMRS) is a subsystem for the automatic collection, remote transmission, storage, preprocessing, and statistical analysis of electric energy data. It supports the development of future smart grids, the grid connection of new energy sources, the assessment of power market operation, the setting of electricity charges, and the calculation of economic compensation, and it forms the basis of system operation.
The statistical power data mainly include power supply and power generation data. The TMRS data come from electric energy measurements at the gateway metering points where power plants feed into and draw from the grid and at their tie lines; these data are collected, stored, and processed over different periods, providing the basis for settlement and analysis.
Load control terminals are installed on the power user side. Because the terminals are numerous while the overall data traffic is relatively small, public network communication systems are generally used for information transmission for economic reasons, and most of them are wireless. In the early stage, utilities applied for dedicated wireless frequency resources and controlled load terminals directly through a wireless channel to perform load shedding. At present, the mainstream technology uses the general packet radio service (GPRS) or code division multiple access (CDMA) to obtain users' power consumption and carry out load-limiting operations.
A distribution network can use big data analysis technology to classify and mine existing data, exploit their potential value, and integrate different types of data to support power grid planning efficiently. One of the strengths of big data is real-time collection and analysis. At present, big data processing involves three steps: analysis, integration, and application of the data. (18) Distribution networks with different structures continuously collect a variety of data, and the volume of data generated during the actual operation of a distribution network with DG is huge. At the same time, these data are messy and uncertain.
The integration of big data from various sources correlates and synthesizes the data and extracts accurate information through computational analysis. (19) It uses mathematical methods to transform the gathered information into comprehensive information. (20) Different levels of data integration are needed for different analysis and application tasks. Low-level integration uses state estimation (filtering), data association, and classification recognition, whereas high-level integration uses reasoning, intelligent algorithms, and artificial intelligence (AI) techniques such as artificial neural networks (ANNs) and genetic algorithms (GAs).
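As one illustration of low-level integration by state estimation (filtering), the sketch below applies a scalar Kalman filter to a stream of noisy readings of a single quantity, such as a bus voltage reported by an RTU. The random-walk state model, the noise variances, and the readings are assumptions chosen for illustration; they are not part of the method described here.

```python
from typing import List, Sequence


def kalman_1d(
    measurements: Sequence[float],
    meas_var: float,
    process_var: float,
    x0: float,
    p0: float,
) -> List[float]:
    """Scalar Kalman filter assuming the true value follows a random walk."""
    x, p = x0, p0  # state estimate and its variance
    estimates = []
    for z in measurements:
        p = p + process_var       # predict: uncertainty grows by the process noise
        k = p / (p + meas_var)    # Kalman gain: weight given to the new reading
        x = x + k * (z - x)       # update: move the estimate toward the reading
        p = (1.0 - k) * p         # variance of the updated estimate
        estimates.append(x)
    return estimates


# Hypothetical bus-voltage readings in p.u. from one RTU channel.
readings = [1.012, 0.995, 1.004, 0.998, 1.009, 0.991, 1.003, 1.000]
smoothed = kalman_1d(readings, meas_var=1e-4, process_var=1e-6, x0=1.0, p0=1e-2)
print([round(v, 4) for v in smoothed])
```

A small `process_var` relative to `meas_var` makes the filter trust its running estimate more than each new reading, which suppresses sensor noise at the cost of a slower response to genuine changes.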

Data integration for DG access
The main idea of data integration for optimizing DG access to the power distribution network is shown in Fig. 3. First, the original data are collected and processed for a hierarchical analysis. Then, the classified data at different levels are integrated, and finally the integrated data are applied. The data fall into two categories: power supply data and data on the economic benefits of the distribution network. The power supply data consist of the structural data of the power grid, whose size is determined by the grid structure and the network loads. The economic benefit of the power grid is determined by the difference between the income from users and the cost. We established the relationship between expenditure and income through a risk function, analyzed the correlations among the different types of data, and integrated the data for DG access.
The repeated power flow (RPF) method is used to estimate the maximum power supply with DG connected to the power distribution network. (21) The inputs to the method include the power distribution, power consumption, and cost. The maximum power supply obtained by the RPF is then used to calculate the risk function of DG access to the power distribution network. Based on the economic benefit function of the power supply, a particle swarm optimization (PSO) method is adopted to solve for the economic power supply and obtain the output power of the DG. The RPF method is often used to solve power flow under increasing load: the power generation and load of the system are increased gradually, and the power flow equations are solved repeatedly until a constraint exceeds its limit. Equation (7) is the objective function of the maximum power supply. In Eq. (8), $g(x_n, u_n, z_n)$ is the equality constraint for the load flow and $h(x_n, u_n, z_n)$ is the inequality constraint for the node voltage and node power.
$$g(x_n, u_n, z_n) = 0, \qquad h(x_n, u_n, z_n) \le 0 \tag{8}$$
The iteration is described by the following variables: $n$ is the iteration number ($n = 1, 2, \ldots, N$), $N$ is the maximum number of iterations, $d_n$ is the unit growth at the $n$th iteration (the adjustment direction), and $l_n$ is the adjustment step size at the $n$th iteration. The iterative trajectory of the power flow is shown in Fig. 4. Starting from the initial power flow state $S_0$, the step size is changed step by step, and the result is obtained from the trajectory of line M, which is an arc.
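The repeated power flow idea can be sketched on a deliberately small case, assuming a two-bus radial feeder (a source bus with fixed voltage feeding one load through a line of impedance R + jX, all in per unit) rather than the network studied here. For this case the power flow has a closed-form solution, which plays the role of the equality constraint $g$; the voltage floor `v_min` stands in for the inequality constraint $h$; and the load grows as $S_n = S_{n-1} + l\,d$ with a fixed step $l$. All function names and numbers are hypothetical.

```python
import math
from typing import Optional, Tuple


def receiving_voltage(p: float, q: float, r: float, x: float,
                      v_s: float = 1.0) -> Optional[float]:
    """Receiving-end voltage (p.u.) of a two-bus feeder supplying load p + jq.

    Solves V^4 + (2(pR + qX) - Vs^2) V^2 + (p^2 + q^2)(R^2 + X^2) = 0 and keeps
    the high-voltage root. Returns None when no real solution exists, i.e. the
    load lies beyond the maximum transferable power.
    """
    b = 2.0 * (p * r + q * x) - v_s ** 2
    c = (p ** 2 + q ** 2) * (r ** 2 + x ** 2)
    disc = b * b - 4.0 * c
    if disc < 0.0:
        return None
    return math.sqrt((-b + math.sqrt(disc)) / 2.0)


def repeated_power_flow(p0: float, q0: float, dp: float, dq: float,
                        r: float, x: float, v_min: float = 0.95,
                        step: float = 0.01, max_iter: int = 100_000) -> Tuple[float, float]:
    """Grow the load along (dp, dq) in fixed steps until a limit is violated.

    Returns (lam, v): the largest multiplier lam for which the load
    (p0 + lam*dp, q0 + lam*dq) still has a power-flow solution with v >= v_min.
    """
    v = receiving_voltage(p0, q0, r, x)
    if v is None or v < v_min:
        raise ValueError("the initial operating point already violates the limits")
    lam = 0.0
    for _ in range(max_iter):
        trial = lam + step  # S_n = S_{n-1} + l * d with a fixed step l
        v_trial = receiving_voltage(p0 + trial * dp, q0 + trial * dq, r, x)
        if v_trial is None or v_trial < v_min:
            break  # constraint exceeded: the previous point is the maximum
        lam, v = trial, v_trial
    return lam, v


# Hypothetical feeder: R = 0.02, X = 0.04 p.u.; base load 0.2 + j0.1 p.u.
lam, v = repeated_power_flow(0.2, 0.1, 0.1, 0.05, r=0.02, x=0.04)
print(f"max load = {0.2 + lam * 0.1:.3f} + j{0.1 + lam * 0.05:.3f} p.u. at V = {v:.4f} p.u.")
```

In a real network the closed-form voltage would be replaced by a full load-flow solver, but the outer loop (grow the load, re-solve, stop at the first violated limit) is the same.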
PSO is a method for finding optimal values. Consider a community (swarm) of $N$ particles in a $D$-dimensional search space, indexed by $i = 1, 2, \ldots, N$. The position $X_i$ and the velocity $V_i$ of the $i$th particle are represented as two $D$-dimensional vectors:
$$X_i = (x_{i1}, x_{i2}, \ldots, x_{iD}), \qquad V_i = (v_{i1}, v_{i2}, \ldots, v_{iD}).$$
The best position found so far by each particle is called the individual extreme value,
$$P_i = (p_{i1}, p_{i2}, \ldots, p_{iD}),$$
and the best of the individual extreme values over the whole swarm is recorded as the global extreme value,
$$G = (g_1, g_2, \ldots, g_D).$$
Once these two optimal values are found, the particles update their velocities and positions according to
$$v_{id}^{k+1} = w\,v_{id}^{k} + c_1 r_1 \left(p_{id} - x_{id}^{k}\right) + c_2 r_2 \left(g_d - x_{id}^{k}\right),$$
$$x_{id}^{k+1} = x_{id}^{k} + v_{id}^{k+1},$$
where $w$ is the inertia weight, $c_1$ and $c_2$ are the acceleration factors, $r_1$ and $r_2$ are random numbers uniformly distributed in $[0, 1]$, and $k$ is the iteration index. With these equations, PSO optimizes the maximum power supply: it treats each candidate value of the maximum power supply as a particle in the swarm and, following the principle of PSO, obtains the economic power supply at a given DG penetration rate. The result is used to evaluate the economic power supply of the distribution network. The specific model is as follows.
The objective is to maximize the E-PS, $E_{ps}(x)$, subject to $x \le S_A$ and $x_{\min} \le x \le x_{\max}$, where $x$ is the maximum power supply and $x_{\min}$ and $x_{\max}$ are the lower and upper limits of the maximum power supply range, respectively.
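The velocity and position updates of PSO can be sketched as a small, dependency-free inertia-weight optimizer that maximizes a fitness function within box bounds. Velocities are clipped to the width of the search range and positions to the bounds; these are common safeguards rather than details taken from this study. The defaults are the commonly used values w = 0.7298 and c_1 = c_2 = 1.49618; the parameters reported in the Results (w = 0.85, c_1 = c_2 = 2.13, 24 particles) can be passed instead. The quadratic fitness in the usage lines is a placeholder for the E-PS.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple


def pso_maximize(
    fitness: Callable[[List[float]], float],
    lower: Sequence[float],
    upper: Sequence[float],
    n_particles: int = 24,
    n_iter: int = 200,
    w: float = 0.7298,
    c1: float = 1.49618,
    c2: float = 1.49618,
    seed: Optional[int] = None,
) -> Tuple[List[float], float]:
    """Inertia-weight PSO that maximizes `fitness` inside the box [lower, upper]."""
    rng = random.Random(seed)
    dim = len(lower)
    v_max = [u - l for l, u in zip(lower, upper)]  # velocity clamp per dimension
    xs = [[rng.uniform(l, u) for l, u in zip(lower, upper)] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    p_best = [x[:] for x in xs]                     # individual best positions
    p_best_val = [fitness(x) for x in xs]
    g = max(range(n_particles), key=lambda i: p_best_val[i])
    g_best, g_best_val = p_best[g][:], p_best_val[g]  # global best position and value

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (w * vs[i][d]
                     + c1 * r1 * (p_best[i][d] - xs[i][d])
                     + c2 * r2 * (g_best[d] - xs[i][d]))
                vs[i][d] = max(-v_max[d], min(v_max[d], v))
                xs[i][d] = max(lower[d], min(upper[d], xs[i][d] + vs[i][d]))
            val = fitness(xs[i])
            if val > p_best_val[i]:          # step (12): update the individual best
                p_best[i], p_best_val[i] = xs[i][:], val
                if val > g_best_val:         # step (13): update the global best
                    g_best, g_best_val = xs[i][:], val
    return g_best, g_best_val


# Placeholder fitness standing in for the E-PS, peaked at x = 5.3 (MW).
best, value = pso_maximize(lambda x: -(x[0] - 5.3) ** 2, [0.0], [10.0], seed=42)
print(best, value)
```

The global best is only replaced when a particle improves on its own best, which is sufficient because the global best value is never lower than any individual best.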

Program for optimizing DG access
The probability model mainly considers wind power and photovoltaic power as the sources of DG and the load size in a power distribution network.

Solar cell
The output power of a solar cell is directly related to the light intensity. (22) The relationship between the output power of a solar cell and the light intensity is
$$P_R = r A \eta,$$
where $P_R$ is the output power of the solar cell, $r$ is the actual light intensity, $\eta$ is the conversion efficiency of the solar cell array, and $A$ is the actual light-receiving area of the solar cell array. Since the probability distribution of the light intensity is close to a beta distribution, the output power of the solar cell also follows a beta distribution:
$$f(P_R) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)} \left(\frac{P_R}{P_{R,\max}}\right)^{\alpha - 1} \left(1 - \frac{P_R}{P_{R,\max}}\right)^{\beta - 1},$$
where $\alpha$ and $\beta$ are the shape parameters of the beta distribution and $P_{R,\max}$ is the maximum output power of the solar cell. The power curve of the solar cell output is shown in Fig. 5(a).
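To generate PV samples for the Monte Carlo procedure, the beta model can be sampled by scaling a standard beta variate by $P_{R,\max}$. The sketch uses Python's `random.betavariate`. The Results report a single shape coefficient (0.75) for the 200 kW PV unit, so setting both $\alpha$ and $\beta$ to 0.75 below is an assumption, as are the irradiance, area, and efficiency values; the function names are hypothetical.

```python
import random
from typing import List, Optional


def pv_power_from_irradiance(r: float, area: float, eta: float) -> float:
    """Deterministic model P_R = r * A * eta (r in W/m^2, A in m^2, result in W)."""
    return r * area * eta


def sample_pv_power(p_max: float, alpha: float, beta: float, n: int,
                    seed: Optional[int] = None) -> List[float]:
    """Draw n PV outputs as P_R = P_R,max * B with B ~ Beta(alpha, beta)."""
    rng = random.Random(seed)
    return [p_max * rng.betavariate(alpha, beta) for _ in range(n)]


print(pv_power_from_irradiance(800.0, 10.0, 0.15))  # about 1200 W
# 200 kW rated PV; alpha = beta = 0.75 is an assumed choice of shape parameters.
outputs = sample_pv_power(200.0, 0.75, 0.75, n=1000, seed=7)
print(min(outputs), max(outputs), sum(outputs) / len(outputs))
```

Because a beta variate lies in [0, 1], every sampled output automatically stays between zero and the rated power.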

Wind turbine
Like the output power of a solar cell, the output power of a wind turbine depends mainly on a weather variable, in this case the wind speed. The wind speed mainly follows a two-parameter Weibull distribution, (23) with probability density
$$f(v) = \frac{k}{c}\left(\frac{v}{c}\right)^{k-1} \exp\left[-\left(\frac{v}{c}\right)^{k}\right],$$
where $k$ and $c$ are the shape and scale parameters of the Weibull distribution, respectively, and $v$ is the wind speed at the wind turbine. From the relationship between the wind speed and the output power of the wind turbine, the actual output of the wind turbine is
$$P_W = \begin{cases} 0, & v < v_{c1} \text{ or } v > v_{co}, \\ k_1 v + k_2, & v_{c1} \le v < v_r, \\ P_r, & v_r \le v \le v_{co}, \end{cases}$$
where $k_1 = P_r/(v_r - v_{c1})$, $k_2 = -k_1 v_{c1}$, $P_W$ is the output power of the wind turbine, $v_{c1}$, $v_r$, and $v_{co}$ are the cut-in, rated, and cut-out wind speeds, respectively, and $P_r$ is the rated power of the wind turbine. The power curve of the wind turbine output is shown in Fig. 5(b).
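The Weibull wind model and the piecewise power curve can be combined into a sampler for wind turbine output. This Python sketch uses the standard library's `random.weibullvariate`, whose first argument is the scale and second the shape. The turbine parameters are those given in the Results; the function names are hypothetical.

```python
import random
from typing import List, Optional


def wind_power(v: float, p_rated: float, v_ci: float, v_r: float, v_co: float) -> float:
    """Piecewise power curve P_W(v) with k1 = P_r/(v_r - v_c1) and k2 = -k1 * v_c1."""
    if v < v_ci or v > v_co:
        return 0.0                      # below cut-in or above cut-out
    if v < v_r:
        k1 = p_rated / (v_r - v_ci)
        k2 = -k1 * v_ci
        return k1 * v + k2              # linear ramp between cut-in and rated speed
    return p_rated                      # rated output up to cut-out


def sample_wind_power(n: int, k_shape: float, c_scale: float, p_rated: float,
                      v_ci: float, v_r: float, v_co: float,
                      seed: Optional[int] = None) -> List[float]:
    """Draw Weibull wind speeds and map them through the power curve."""
    rng = random.Random(seed)
    # random.weibullvariate(alpha, beta): alpha is the scale c, beta is the shape k.
    return [wind_power(rng.weibullvariate(c_scale, k_shape), p_rated, v_ci, v_r, v_co)
            for _ in range(n)]


# Results: k = 2.49, c = 8.96, P_r = 300 kW, v_c1 / v_r / v_co = 2.6 / 12 / 20 m/s.
powers = sample_wind_power(1000, 2.49, 8.96, 300.0, 2.6, 12.0, 20.0, seed=3)
print(sum(powers) / len(powers))  # mean sampled output in kW
```

Separating the deterministic power curve from the random wind speed makes it easy to swap in a manufacturer's tabulated curve later without changing the sampling code.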
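
Stochastic load model
It has been proposed that a normal distribution can approximate the uncertainty of the load. (24) The active load $P$ and reactive load $Q$ have the probability density functions
$$f(P) = \frac{1}{\sqrt{2\pi}\,\sigma_p} \exp\left[-\frac{(P - \mu_p)^2}{2\sigma_p^2}\right], \qquad f(Q) = \frac{1}{\sqrt{2\pi}\,\sigma_q} \exp\left[-\frac{(Q - \mu_q)^2}{2\sigma_q^2}\right],$$
where $\mu_p$ and $\sigma_p$ ($\mu_q$ and $\sigma_q$) are the mean and standard deviation of $P$ ($Q$).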
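The normal load model can be sampled directly with `random.gauss`. The sketch below draws active and reactive loads independently, which is an assumption (the text does not state whether P and Q are correlated). The means reuse the original load of the test network in the Results (3800 kW and 2469 kvar), while the 5% standard deviations and the function name are hypothetical.

```python
import random
from typing import List, Optional, Tuple


def sample_load(n: int, mu_p: float, sigma_p: float, mu_q: float, sigma_q: float,
                seed: Optional[int] = None) -> List[Tuple[float, float]]:
    """Draw n (P, Q) pairs with P ~ N(mu_p, sigma_p^2) and Q ~ N(mu_q, sigma_q^2), independently."""
    rng = random.Random(seed)
    return [(rng.gauss(mu_p, sigma_p), rng.gauss(mu_q, sigma_q)) for _ in range(n)]


loads = sample_load(5000, 3800.0, 190.0, 2469.0, 123.45, seed=11)
mean_p = sum(p for p, _ in loads) / len(loads)
mean_q = sum(q for _, q in loads) / len(loads)
print(round(mean_p), round(mean_q))  # close to 3800 kW and 2469 kvar
```

If measured data show that P and Q move together, the independent draws could be replaced by a bivariate normal with the observed correlation.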
The specific implementation steps of the model are as follows:
(1) Based on the probability density curves of the DG and the load, the number of samples N is determined, and the uncertainty factors are sampled using Monte Carlo simulation (MCS) to determine the initial network load $S_0$, the DG output $S_{DG}$, and the load growth value $S_d$.
(2) The load growth rate k is set as the search step of the program, and the convergence accuracy $\varepsilon$ is specified.
power (A, B), the probability distribution function f(s) of the maximum power is yielded.
(10) The particle swarm is initialized with swarm size N, particle dimension D, inertia weight w, acceleration factors $c_1$ and $c_2$, particle positions $x_i$, and particle velocities $v_i$. The risk function f(s) is substituted into the objective function, which is used as the fitness function.
(11) It is checked whether $x_i$ and $v_i$ are within their limits, and the fitness value $F_{it}(i)$ of each particle is calculated.
(12) For each particle, the fitness value $F_{it}(i)$ is compared with the individual extreme value $p_{best}(i)$. If $F_{it}(i) > p_{best}(i)$, $p_{best}(i)$ is replaced with $F_{it}(i)$.
(13) For each particle, the fitness value $F_{it}(i)$ is compared with the global extreme value $g_{best}$. If $F_{it}(i) > g_{best}$, $g_{best}$ is replaced with $F_{it}(i)$.
(14) The velocity $v_i$ and the position $x_i$ of each particle are updated.
(15) If the end condition is satisfied, go to step (16); otherwise, return to step (11).
(16) The maximum value of the fitness function is the final output, which gives the economic power supply and the corresponding value of $x_i$.

Results
We simulated the power distribution network with the structure shown in Fig. 6. In this network, the original load is 3800 + j2469 kVA and the voltage is 12.66 kV. (25) The rated power of the photovoltaic cell is 200 kW, and the shape coefficient of its probability density function is 0.75. The rated power of the wind turbine is 300 kW, and the shape and scale parameters of the wind speed probability density function are 2.49 and 8.96, respectively. The cut-in, rated, and cut-out wind speeds $v_{c1}$, $v_r$, and $v_{co}$ are 2.6, 12, and 20 m/s, respectively. The K (N − 1 + 1) criterion of the line is not considered. (26)

Model validation
To validate the model, we set the number of samples to 1000 and combined Monte Carlo simulation with the RPF. The maximum and minimum power supplies for DG at different positions are shown in Table 1. The power supply risk of the grid, obtained from the risk function [Eq. (4)], is shown in Fig. 7, where f(x) shows the change in power supply risk when DG is connected. The f(x) curves are flatter at low and high values than at intermediate values.
If the power supply of the distribution network lies in the intermediate range of f(x), a change in the supply therefore has a considerable impact on the power supply risk. Figure 7 also shows that DG at position 3 has the largest change in power supply. To determine the power supply risk, we used the PSO algorithm with the following parameters: number of iterations N = 1000, initial population size 24, inertia weight w = 0.85, learning factors $c_1 = c_2 = 2.13$, and maximum iteration number $N_{max}$ = 1500. We used an observation time of $T_{obv}$ = 1 h and an economic benefit per unit of power of $C_{uit}$ = 3000 yuan/kW·h. For a risk duration of $T_{res}$ = 0.25 h, the average economic cost of the risk was L = 700 yuan/kW·h, with a DG failure probability of 0.15 in the R-PS. (27) The B-PS at position 3 with and without the risk function is shown in Fig. 8. The effects of the risk function on the power supply and the economic benefit are clear, as can also be seen in Table 2.
Figure 8 and Table 2 show the following. (1) Without the risk function, the benefit of the network does not fluctuate but decreases with increasing power supply. With the risk function, the benefit rises to a peak with increasing power supply and then decreases. For power supplies larger than 5.34 MW, the benefit function has similar values and trends with and without the risk function. When the power from the DG strongly influences the power grid at position 3, the benefit becomes minimal with decreasing power supply. Therefore, the maximum benefit is more reliable with the risk function. (2) Without the risk function, the power from the DG is much lower at a higher power supply, and the benefit is lower. Thus, the risk function makes the benefit function of the network more realistic and reliable. (3) The economic power supply with the risk function is 12% higher than that without it, but the overall benefit is 31% lower.
The results for the three DG access positions are listed in Table 3. The overall benefit is calculated with a power cost of 3000 yuan/kW·h, a risk duration of 0.25 h, an average risk cost of 700 yuan/kW·h, and a failure probability of 0.15 for the power supply from the DG. Table 3 reveals that the DG access position affects the power supply range, the power from the DG, the economic power supply, and the overall benefit of the network. Positions 1 and 3 have similar power from the DG, whereas position 2 differs significantly. The economic power supply varies by 2% to 5% among the three positions, whereas the overall benefit varies considerably: position 2 has 3.13 times and position 3 has 4.78 times the benefit of position 1. This implies that the design of DG access to the power distribution network is important for its efficient management.

Discussion
Both the access position and the output of the DG affect the power supply capacity of the network. Therefore, to ensure the power supply capacity of the distribution network, a reasonable position must be found for the DG. We used big data analysis to compare the economy of the distribution network with DG connected. Figure 10 shows the effect of the decision-maker's attitude toward risk on the B-PS for DG access at position 3, and the E-PS and B-PS for different attitudes are listed in Table 4. The degree of risk-taking affects the power supply from the DG, the economic power supply, and the overall benefit. High risk-taking requires 14% less power from the DG than low risk-taking, and its economic power supply is 1.8% lower. The difference in overall benefit is significant: management with low risk-taking yields 2.5 times the benefit of management with high risk-taking. The degree of risk-taking has a greater effect on the overall benefit than the position of DG access to the network. Thus, the level of risk accepted is an important factor in managing a power distribution network.
The main novelty of this study is a new evaluation method for distribution networks based on big data analysis. The method first guarantees the power supply capacity of the distribution network itself and then evaluates the risk that DG poses to the network. We used structural and economic big data for a hierarchical analysis of the distribution network and then evaluated its economic benefits.

Conclusions
In this study, we considered the impact and role of DG on the power supply capacity and economy of a power distribution network. By integrating big data obtained from sensors in the network, we proposed a method of evaluating the economic power supply capacity, which provides guidelines for connecting DG to the power grid. The method includes three functions that evaluate the power supply benefit, the power supply risk, and the economic power supply. We used an algorithm with a stochastic load model, with a wind turbine and a solar cell as the DG sources. The results showed that the risk of the power supply with DG access to the distribution network differed among connection positions. The risk function affected the power supply from the DG and the economic power supply of the network: without the risk function, the overall benefit was higher, but the power supply from the DG and the economic power supply were lower than those with the risk function. The position at which the DG was connected to the power distribution network also affected the benefit, risk, and economy of the power supply. The decision on how much risk to take in power supply management is also important for the economic operation of the network. Depending on the position and the degree of risk-taking, the overall benefit can vary by a factor of 3 to 5. Used appropriately, DG power ensures a sufficient power supply capacity in the distribution network and maximizes its overall economic benefit. For the management of power distribution networks, big data analysis can play an important role in establishing guidelines for optimizing the use of DG.