Using Microservice Architecture as a Load Prediction Strategy for Management System of University Public Service


Introduction
With the rapid development of cloud computing and Internet of Things (IoT) technologies, more hardware and software resources are being provided by Internet service providers under software as a service (SaaS) schemes. IoT devices form the sensor layer of an IoT infrastructure, which is responsible for obtaining different types of static and dynamic information about the real world through sensors with different functions. IoT devices can be embedded with software, electronics, and sensors, and they provide connectivity under constrained resources. The IoT is a paradigm that allows many smart devices to be connected to the Internet in the cloud; these devices can be sensors that operate and transmit data from one system to others. These IoT devices require high extensibility, agility, and fault tolerance from the software systems that serve them. Thus, it is important to investigate the backend service architecture of such a system, which may be a monolithic architecture in the early stage and later become, for example, a service-oriented architecture (SoA) (1,2) or a microservice-based architecture. (3,4) A microservice-based architecture has been applied to a system of real-time environmental sensors, and it is highly scalable for applications in a cloud environment.
The microservice architecture has been connected to computational workflows for data analysis because it provides complete, ready-to-run, reproducible data analysis solutions that can be easily deployed in private and public clouds as well as on desktop computers. Moreover, using the bounded context method of microservice decomposition, design ideas and implementation schemes for resource monitoring, scheduling, and data synchronization in the cloud have been proposed to provide combat capabilities with high efficiency and fast collaboration. (5) A microservice-based architecture also has many advantages in large-scale software development, where a system can be divided into many small independent services according to business logic. (6,7) Even though the microservice architecture is very popular, it also introduces problems in service communication and resource allocation efficiency when the number of users rapidly increases or large-scale heterogeneous resources are needed. Because many microservice modules with complex mutual relationships exist on one platform, they can cause fatal performance bottlenecks in the resource allocation of the system, making load prediction an urgent necessity. (8)
There have been many studies on the load prediction of the human flow rate of public services, including algorithms based on traditional probability statistics, data mining, neural networks, and deep learning. Load prediction based on traditional probability statistics includes the exponential smoothing model, autoregressive model, and autoregressive moving average model. Monfared et al. proposed a new adaptive exponential smoothing method, which automatically adjusts its parameters according to the observed data so that the model can be updated immediately over time. (9) Calheiros et al. provided an autoregressive moving average model, which is applied in SaaS for cloud computing to predict the physical machine load. (10) However, these methods are all based on predefined formulas, which limit their ability to generalize to ongoing load data and make it difficult to accurately depict actual situations with complex conditions. Shahin proposed an adjustable algorithm based on a dynamic threshold, which was constructed with the use of long short-term memory (LSTM) [a variant of the recurrent neural network (RNN)] to predict the size of resources and automatically allocate virtual machines according to predicted values. (11) Qiu et al. also developed a deep learning method but with a different underlying model, which was a combination of a multilayer restricted Boltzmann machine and a deep belief network (DBN). (12) The load models used with the above prediction methods were concerned only with the central processing unit (CPU) and memory, which were regarded as the main factors in hardware execution efficiency. As a result, they can neglect other elements of the load, such as hard disk I/O and network I/O. Moreover, a DBN is not good at dealing with time series data, in contrast to the RNN. A gated recurrent unit (GRU) is a variant of LSTM. However, to our knowledge, no research has focused on a load prediction strategy that forecasts potential problems for the management system of a university public service. In this study, we propose a new load model that considers the efficiencies of the CPU, memory, network I/O, and hard disk I/O. We also use the GRU, a modified form of the LSTM model, for network training in our load prediction strategy. Compared with other studies, we simplified the gates with the aim of resource optimization. In addition, we improved the calculation efficiency in order to improve the accuracy, especially the processing accuracy of the predicted time series data.
We compared the prediction results of our proposed strategy with other classical algorithms, including the autoregressive integrated moving average model (ARIMA), support vector regression (SVR), and LSTM. We found that our prediction method had higher efficiency than the other methods.

University Utility Architecture
A university utility is a cloud computing platform based on the microservice architecture, which is a newer architecture than the common monolithic application. We adopted this scheme because the microservice architecture is well suited to situations where the number of users is rapidly increasing and system maintenance becomes more burdensome without sufficient IT expertise. The services of management systems often need to run for the whole day without interruption. When services are developed and deployed by different teams using different technologies, it is preferable to choose the microservice architecture for better development and operations (DevOps) practice. (13) The utility architecture of our university (Longyan University) is shown in Fig. 1. Users can visit the university utility from a cell phone or PC, and the log-in information is intercepted by a common public user interface that is responsible for unified authentication, security, and verification. This utility provides many services in the front end, including score query, school news, financial information, library access, and curriculum information. Users can tap any module that they want, and these requests are sent to the back-end destination. The Zuul gateway, which is designed for service routing and location, is a necessary part of the system because the large number of microservice modules results in frequent changes in service IP addresses. When there are disturbances in microservice modules, which are not uncommon in a large-scale system, Hystrix can act as a fuse (circuit breaker) that quickly diagnoses the locations of disturbances, cuts off the related paths, and redirects the requests to alternative service modules. Ribbon is a load balancer, which is a critical component in the architecture. Its main function is to distribute client requests to different targets for microservice modules using its internal self-learning algorithm. With the help of Ribbon, the system load can be distributed more evenly across the service modules by considering the load of each physical host. Users can upload the information of developed service modules into Config using the Git tool, and Config saves all the configuration information and delivers it to each module that requires it whenever necessary. The microservice modules discussed in this paper were developed with Spring Boot, which is a fast and convenient integrated development package, and Jenkins was responsible for microservice module testing and integration testing. (14,15)
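The idea of distributing requests according to the load of each physical host can be illustrated with a minimal Python sketch. This is not Ribbon's API or its internal algorithm; the `ServiceInstance` record, the addresses, and the least-loaded selection rule are all hypothetical and are chosen only to show the principle.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ServiceInstance:
    """Hypothetical record for one running copy of a microservice module."""
    address: str
    host_load: float  # normalized load of the physical host, assumed in [0, 1]


def pick_instance(instances: List[ServiceInstance]) -> ServiceInstance:
    """Return the instance whose physical host currently has the lowest load."""
    if not instances:
        raise ValueError("no service instances registered")
    return min(instances, key=lambda inst: inst.host_load)


# Illustrative values only; a real system would refresh host_load continuously.
instances = [
    ServiceInstance("10.0.0.11:8080", host_load=0.72),
    ServiceInstance("10.0.0.12:8080", host_load=0.31),
    ServiceInstance("10.0.0.13:8080", host_load=0.55),
]
chosen = pick_instance(instances)
print(chosen.address)
```

A predicted host load, rather than the current one, could be substituted for `host_load` so that requests are steered away from hosts that are expected to become busy.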

Load model
Load models considered so far are often functionally limited in that they are concerned only with the CPU and memory. In practice, the hardware behind a load model is more complicated, so the load model in this study considers the CPU, memory, disk I/O, and network I/O. The load computing formula is given in Eq. (1), in which the parameters with the prefix $W$ are coefficients that weight the use of each resource in the load model. $\mathrm{Load}_{Cpu}$, $\mathrm{Load}_{Mem}$, $\mathrm{Load}_{Disk}$, and $\mathrm{Load}_{Net}$ respectively refer to the load values of the CPU, memory, disk, and network in one physical host at a given moment after normalization. $\mathrm{Load}_{Host}$ is the predicted load of a physical host at a given moment and is calculated as follows.

$$\mathrm{Load}_{Host} = W_{Cpu}\,\mathrm{Load}_{Cpu} + W_{Mem}\,\mathrm{Load}_{Mem} + W_{Disk}\,\mathrm{Load}_{Disk} + W_{Net}\,\mathrm{Load}_{Net} \quad (1)$$
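A minimal sketch of such a load model in Python is given here, assuming that the host load is a weighted sum of the four normalized component loads. The weight values and the sample loads are hypothetical and serve only as an illustration; the paper does not state the coefficients used.

```python
COMPONENTS = ("cpu", "mem", "disk", "net")


def host_load(loads, weights):
    """Weighted sum of normalized component loads for one physical host.

    Both arguments are dicts keyed by the names in COMPONENTS.
    Each component load is assumed to be normalized to [0, 1].
    """
    for name in COMPONENTS:
        if not 0.0 <= loads[name] <= 1.0:
            raise ValueError("load for %s is not normalized" % name)
    return sum(weights[name] * loads[name] for name in COMPONENTS)


# Hypothetical coefficients (chosen to sum to 1) and one load sample.
weights = {"cpu": 0.4, "mem": 0.3, "disk": 0.15, "net": 0.15}
sample = {"cpu": 0.80, "mem": 0.50, "disk": 0.20, "net": 0.40}
print(round(host_load(sample, weights), 3))
```

Choosing weights that sum to 1 keeps the host load in the same [0, 1] range as its components, which makes it easy to compare against a fixed threshold.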

LSTM
Neural networks originated in the 1950s and have gradually been improved with the development of the perceptron network structure, a model consisting of three layers: an input layer, a hidden layer, and an output layer. Data enter through the input layer and leave from the output layer after transformation in the hidden layer. With the improvement of models, many kinds of neural networks with multiple hidden layers have been invented, in which different optimization functions have been used. However, these neural network models are not good at dealing with time series data, and the RNN was developed to overcome this problem. (16) The output of neurons in the RNN can be applied as input data for the next time stamp, so the connections established by the neurons of the RNN can remember input data strings of any length. The RNN can be seen as a kind of deep neural network (DNN) whose depth is equivalent to the time length, which means that the fading of early information (the vanishing gradient) is inevitable. To solve this problem, LSTM, an upgrade of the RNN, was designed, a variant of which is the GRU. (17) The critical part of the RNN is the transfer function, described by Eq. (2), where $X_t$, the input at time stamp $t$, and $\mathrm{Hidden}_{t-1}$, the output of the hidden layer at the prior time stamp $t-1$, are combined to update the output $\mathrm{Hidden}_t$ of the hidden layer at time stamp $t$.

$$\mathrm{Hidden}_t = F(X_t, \mathrm{Hidden}_{t-1}) \quad (2)$$
To characterize the properties of the RNN, $F$ is a nonlinear differentiable transfer function, and different choices of $F$ correspond to different RNN models. In this paper, we start from the vanilla RNN model, which is a simple transformation type of RNN, as given by Eq. (3). Because of the classical vanishing gradient problem, i.e., RNN models cannot capture long-term dependencies in the data, the transfer function forgets early training information when the time series is long. LSTM is chosen for this situation because it introduces gate functions into the design of the transfer function. The GRU is a minor revision of LSTM with simplified gates, and in the GRU model, two gates are introduced in each neuron. The first is reset_gate, used for adjusting the combination of the current input data and the prior output hidden memory, and the other is update_gate, used for controlling and saving the training memory of the prior time stamp. The transfer function in the hidden layer of the GRU can be expressed as Eqs. (4)-(7), with $T_1$, $T_2$, and $b$ shared among all time stamps during model training, in which the product operator denotes the dot product. The structures of the RNN and the variations of the GRU neurons are shown in Fig. 2.
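To make the roles of the two gates concrete, a single GRU step can be sketched in NumPy as follows. This sketch uses the widely cited textbook GRU formulation, which is an assumption here and may differ in notation or sign convention from the paper's own Eqs. (4)-(7); the parameter names (`Wz`, `Uz`, `bz`, and so on) and the layer sizes are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gru_step(x_t, h_prev, p):
    """One GRU time step (standard formulation, assumed for illustration)."""
    # update gate: how much of the new candidate replaces the prior memory
    z = sigmoid(p["Wz"] @ x_t + p["Uz"] @ h_prev + p["bz"])
    # reset gate: how much of the prior memory is mixed with the current input
    r = sigmoid(p["Wr"] @ x_t + p["Ur"] @ h_prev + p["br"])
    # candidate state built from the input and the reset-scaled prior memory
    h_cand = np.tanh(p["Wh"] @ x_t + p["Uh"] @ (r * h_prev) + p["bh"])
    # element-wise interpolation between prior memory and candidate
    return (1.0 - z) * h_prev + z * h_cand


def init_params(n_in, n_hidden, seed=0):
    """Small random weights; the same parameters are reused at every time step."""
    rng = np.random.default_rng(seed)
    p = {}
    for gate in ("z", "r", "h"):
        p["W" + gate] = rng.normal(scale=0.1, size=(n_hidden, n_in))
        p["U" + gate] = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
        p["b" + gate] = np.zeros(n_hidden)
    return p


# Run 30 time steps of 4 load features (CPU, memory, disk I/O, network I/O).
params = init_params(n_in=4, n_hidden=8)
sequence = np.random.default_rng(1).random((30, 4))
h = np.zeros(8)
for x_t in sequence:
    h = gru_step(x_t, h, params)
print(h.shape)
```

Because the new state is an interpolation between the prior state and a `tanh` output, every entry of `h` stays strictly between -1 and 1 when the initial state is zero.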

Neural network training
Neural network model training focuses on the hidden layer. First, in the input layer, we collect the four kinds of load time-series data mentioned above and normalize them. The normalized data are then divided into a training set and a testing set, with the constraint condition m < n.
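The preprocessing step can be sketched as follows. Min-max scaling is assumed here as the normalization method because it is a common choice for load data; the paper's own normalization formula is not reproduced in this text. The reading of m as the number of training samples out of n total samples is also an assumption, as are the sample values.

```python
import numpy as np


def min_max_normalize(series):
    """Scale each column (CPU, memory, disk I/O, network I/O) to [0, 1]."""
    series = np.asarray(series, dtype=float)
    lo = series.min(axis=0)
    hi = series.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    return (series - lo) / span


def chronological_split(series, m):
    """Use the first m samples for training and the rest for testing (m < n)."""
    n = len(series)
    if not 0 < m < n:
        raise ValueError("m must satisfy 0 < m < n")
    return series[:m], series[m:]


# Hypothetical raw samples: CPU %, memory MB, disk I/O MB/s, network I/O MB/s.
raw = np.array([
    [20.0, 4096.0, 120.0, 15.0],
    [35.0, 5120.0, 80.0, 22.0],
    [50.0, 6144.0, 200.0, 30.0],
    [65.0, 7168.0, 160.0, 12.0],
    [80.0, 8192.0, 40.0, 45.0],
])
normalized = min_max_normalize(raw)
train, test = chronological_split(normalized, m=4)
print(train.shape, test.shape)
```

The split is chronological rather than random so that the model is always tested on data that follow the training period, as is usual for time series.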
Then we apply the classical sliding window method to the input data, with the window length denoted by length. The sliding window divides the data into model inputs, and for each input an initial prediction output is produced in the output layer. The initial input data is then passed into the hidden layer for the training calculation, with the Adam optimization algorithm used to continuously update the related parameters in the model until the predefined prediction accuracy is satisfied. The output data processed by the hidden layer is expressed by Eq. (13). As the input data is fed into the input layer, the intermediate output remains "hidden" in the hidden layer calculation, and the initial prediction data in the output layer is an array whose size is defined by length. We adopt the mean square error (MSE) as the error evaluation method, i.e., as the loss function. Our load prediction strategy is summarized in pseudocode in the next section, in which we need to initialize a few parameters, including the load data load, error accuracy o, possible steps, the GRU neuron state S, and the Adam optimization parameter seed, before model training. With the help of the prediction data, we can predict the trend from the recent payload, i.e., whether it will exceed the threshold or is expected to increase in the future, providing useful information for the future scheduling strategy. (18)
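The sliding window division and the MSE loss can be sketched as follows. The sketch assumes a one-step-ahead target for each window and a one-dimensional load series, which is a simplification; the function names and the naive baseline predictor are hypothetical and are not part of the paper's method.

```python
import numpy as np


def sliding_windows(series, length):
    """Split a 1-D load series into (window, next value) training pairs."""
    series = np.asarray(series, dtype=float)
    if length <= 0 or length >= len(series):
        raise ValueError("length must satisfy 0 < length < len(series)")
    X = np.stack([series[i:i + length] for i in range(len(series) - length)])
    y = series[length:]
    return X, y


def mse(y_true, y_pred):
    """Mean square error between actual and predicted loads."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


# A toy load series 0.0, 0.1, ..., 0.9 and a window length of 3.
load = np.arange(10) / 10.0
X, y = sliding_windows(load, length=3)

# Naive baseline: predict that the next load equals the last value in the window.
naive_pred = X[:, -1]
print(X.shape, y.shape, round(mse(y, naive_pred), 4))
```

A trained model would replace `naive_pred`; comparing its MSE with that of the naive baseline is a quick check that the model has learned more than simple persistence.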

Load prediction strategy
The processes of the load prediction strategy are listed below:
Input Data: <Load, set length, o, length, S, seed, steps, wait for prediction data length>
Output Data: <prediction data>
(a) Fetch Load training and Load testing data from Load by set-length
(b) Fetch X and Y from F training by length
(c) GRU neuron creation by S
(d) GRU network connection by GRU neuron
(e) GRU network initialization by seed
(f) foreach step in steps
(g) Hidden ← GRU(X)

Experimental Results and Discussion
We carried out an experiment to demonstrate the effectiveness of the improved load prediction strategy. In the experiment, we first normalized the initial data from four different sources, the CPU, memory, disk I/O, and network I/O, using Eq.
Python 3.5, TensorFlow 2.0, and Linux CentOS 7.2 were used in our experiment. We chose the length of the load data to be 30, the number of hidden layers in the GRU to be 4, with 50 neurons in each layer, and the number of iterations to be less than 300 to avoid spending too much time on training. The required prediction error was set to $10^{-6}$. The model, trained in accordance with the above method, was used for load prediction and compared with the actual load curve, as shown in Fig. 3. Although the curves do not perfectly match, they show the same trend and have similar data values, which indicates that the proposed load prediction strategy has good accuracy. Then we compared our GRU-based strategy with some popular strategies for load prediction: DBN, ES, ARIMA, and K-modes. The results are shown in Fig. 4 (19,20) and reveal that our GRU model has the smallest prediction loss. This is because ES and ARIMA are based on traditional probability statistics, and it is difficult to fit complex and changing load data with predefined formulas. DBN performs better than ES and ARIMA owing to its strong generalization ability, but it is ineffective for time series data. K-modes requires a large amount of fundamental data for early training, but it cannot use preliminary data effectively. This is because the effectiveness of historical data decreases over time, making it difficult to handle time series data in practice.