Optimizing Back Propagation Neural Network Parameters to Judge Fault Types of Ball Bearings

As technology continues to advance, global machine tool manufacturers are gradually moving toward smart production lines. The ball bearing is an important fixed part of a rotating shaft; its key functions are to bear the load acting on the shaft and maintain the center position of the shaft. If the bearing is damaged, abnormal vibration, runout, and abnormal noise occur. Hence, the fault detection and recognition of the ball bearing are particularly important. The fault signal data of the ball bearing used in this study are obtained from Case Western Reserve University (CWRU), and we establish a ball bearing status recognition model according to different signal-capture positions. First, an infinite impulse response (IIR) filter and approximate entropy (ApEn) are used to extract the features of the signals. Afterwards, the extracted features are used for model establishment and training through a back propagation neural network (BPNN) and a support vector machine (SVM). In general, SVM classification is better than BPNN classification, but through a series of experiments, we confirmed that the optimal BPNN parameters for this sample, including the training function, data training ratio, and number of neurons, make the recognition rate of the BPNN higher than that of the general SVM, with the accuracy rate reaching 95%.


Introduction
Most of the automation equipment available in the market today requires the axis of rotation to be fixed, and the ball bearing supports the mechanical rotating body. The ball bearing is used to reduce the friction coefficient during rotation and ensure the accuracy of rotation; hence, it is an essential part of transmission equipment. If the motor breaks down during high-speed operation, the performance of the machine is affected; if the breakdown is severe, the machine must stop processing, thus affecting the yield rate. To improve this situation, in addition to enhancing the rigidity of the bearing, it is important to determine its operating status and to detect and analyze malfunctions. Thus, when the ball bearing is abnormal, it should be replaced immediately.
Various factors, such as rotation under high-load and high-speed operation, may deform the inner and outer bearing races and cause excessive ball wear. (1) Excessive ball wear decreases stability during transmission and affects the overall performance of the system. Many scholars have conducted fault diagnosis and working-life analysis on the running condition of ball bearings. (2,3) The ball bearing status is determined on the basis of standards established by the American National Standards Institute (ANSI) and the International Organization for Standardization (ISO), and most research studies analyze the bearing status relative to the changes in current at the load drive terminal. (4,5) There are many types of methods for bearing fault detection and diagnosis: most studies detect the changes in audio frequency generated during bearing operation, (6,7) the abnormal changes in bearing temperature, (8) or the vibration changes generated during bearing operation. (9,10) Since the changes in vibration generated by the rotation of the bearing are not large, and there is always external noise interference during data collection, in most studies, time domain signals are converted into frequency domain signals for spectral amplitude analysis to carry out feature extraction, for example, by the Fourier transform (11) or the wavelet transform. (12) However, the Fourier transform cannot describe the local features of the signal, and the computational load of the wavelet transform is huge; thus, real-time processing is difficult.
Computer technology has advanced significantly in recent years, enabling, for example, the introduction of artificial neural networks (ANNs) for fault diagnosis. (13) The advantages of ANNs include parallel distributed processing capability, the ability to learn, robustness, and fault tolerance; their disadvantage is that they need a large amount of sample data for training and application. Therefore, in this study, we use the ball bearing fault signals disclosed by Case Western Reserve University (CWRU), adopt an infinite impulse response (IIR) filter to carry out signal decomposition, (14) use approximate entropy (ApEn) (15) for feature extraction, and feed the ball bearing status types into the classic classifiers, (16) the back propagation neural network (BPNN) (17) and the support vector machine (SVM), (18) to compare the training, testing, and recognition rates of the models.

Experimental equipment and architecture
The simulation data used in this study are taken from the ball bearing fault signal database provided by CWRU. The platform is shown in Fig. 1; red frame a is the motor component, whose output is 2 hp, blue frame b is the torque sensor, and brown frame c is the torque loading device. The bearing at the drive end is the ball bearing model 6205-2RS JEM SKF (Svenska Kullagerfabriken, SKF). The detailed structural parameters of the machine are shown in Table 1, and the sampling frequency of the vibration signals is 12 kHz. Table 2 shows the detailed specifications of the bearing failures. The acceleration gauges are mounted at the drive- and fan-measured bearings, and the signals they capture are fault signals from the balls and the inner race. The PC hardware for analysis and recognition in this study is an Intel® Core™ i7-7700 CPU at 3.60 GHz with 16 GB of RAM and an NVIDIA GeForce GTX-1050Ti graphics card; the verification software is Matlab 2019a with the machine learning and deep learning toolboxes.
The database used in this study was established from the vibration signals described above. We cut each piece of the original signal (of indefinite length, at least 60000 points) into four sections, each containing 12000 points. The original classification has four different load statuses of the electric motor, three different fault diameters, and two fault statuses, namely, inner bearing fault and ball wear fault. Hence, in this study, we classified the above fault statuses into four types: the fan-measured acceleration gauge signal for inner bearing failure, the drive-measured acceleration gauge signal for inner bearing failure, the fan-measured acceleration gauge signal for ball wear failure, and the drive-measured acceleration gauge signal for ball wear failure. These four fault types have 192 pieces of data in total, with 48 pieces per type. The experimental flow chart designed in this study is shown in Fig. 2. The process first goes through data preprocessing, which includes signal classification, segmentation, and ball bearing database establishment. Feature extraction is then performed by the IIR filter and ApEn: the IIR filter extracts the 150-170, 120-140, and 90-110 Hz bands, so the number of analysis signals increases to four per piece when the original signal is included. ApEn yields one feature value from each signal; thus, the final feature matrix is 4 × 192. At the end of the experiment, we discuss the recognition rates of the BPNN and SVM classifiers and determine the optimized parameters of the BPNN, to adopt a simple and easy-to-use model for bearing fault analysis.
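As a sketch of the segmentation step (function and variable names are our own, not the paper's), each raw recording of at least 60000 points could be cut into four 12000-point sections; after each section is expanded into four signals (the original plus three filtered bands) and reduced to one ApEn value each, the 192 sections yield the 4 × 192 feature matrix:

```python
import numpy as np

def segment_signal(signal, section_len=12000, n_sections=4):
    """Cut one raw recording (at least 60000 points) into equal-length sections."""
    assert len(signal) >= section_len * n_sections
    return [signal[i * section_len:(i + 1) * section_len]
            for i in range(n_sections)]

# hypothetical raw vibration recording; the values are placeholders
raw = np.random.default_rng(0).standard_normal(61000)
sections = segment_signal(raw)
# each of the 192 database pieces is one such 12000-point section
```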

Feature extraction
The experimental process is carried out according to Fig. 2. Feature extraction is implemented after the data are obtained from the database, and in this section, we illustrate the processing details of feature extraction. In this study, we use the IIR filter, also referred to as the IIR digital filter, which is a type of digital filter. Since it contains a feedback loop, the IIR response to a pulse input signal continues infinitely; the task of the IIR digital filter is to change the spectrum of the input signal through computing. The IIR digital filter has the advantage that only a low order is needed to satisfy the filtering conditions, so the computing efficiency is high; however, the computation must use the entire segment of the signal continuously. Its input-output relationship is given by the difference equation

y(p) = \sum_{k=0}^{m} b_k x(p-k) - \sum_{k=1}^{n} a_k y(p-k). (1)

In Eq. (1), x(p) and y(p) are the input and output of the filter, respectively, b_k and a_k are the coefficients of the filter, and m and n are the orders of the filter.

The IIR filter above is used together with ApEn to perform feature extraction. ApEn is a complexity index that indicates whether a data segment contains similar patterns; if the data stand out abnormally, the change can be seen clearly in the statistical result. At present, this method is applied in studies related to frequency-band complexity analysis. ApEn is defined as follows. The time series of N data points is (19)

\{x(1), x(2), \ldots, x(N)\}. (2)

Any m consecutive data points are selected from the series to form the m-dimensional vector (19)

X(i) = [x(i), x(i+1), \ldots, x(i+m-1)], \quad 1 \le i \le N-m+1. (3)

The distance between two such vectors is defined as (19)

d[X(i), X(j)] = \max_{0 \le k \le m-1} |x(i+k) - x(j+k)|. (4)

Given the noise filter coefficient r, the number of vectors X(j) that satisfy the condition d[X(i), X(j)] \le r is counted and divided by the total number N-m+1, that is, the ratio of the number of similar vectors to the total number, as defined in Eq. (5): (19)

C_i^m(r) = \frac{\#\{j : d[X(i), X(j)] \le r\}}{N-m+1}. (5)

After the features are extracted via ApEn, the size of the data changes from 12000 × 192 × 4 to 4 × 192.
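A minimal NumPy sketch of ApEn, following Pincus' standard definition ApEn(m, r, N) = Φ^m(r) − Φ^(m+1)(r) built from the similarity ratios C_i^m(r); the parameter choices m = 2 and r = 0.2σ are common defaults rather than values stated in this paper:

```python
import numpy as np

def approx_entropy(x, m=2, r_factor=0.2):
    """Approximate entropy of a 1-D series: Phi^m(r) - Phi^(m+1)(r)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    r = r_factor * np.std(x)            # noise filter threshold r

    def phi(m):
        # all length-m windows X(i) = [x(i), ..., x(i+m-1)]
        emb = np.array([x[i:i + m] for i in range(N - m + 1)])
        # Chebyshev (max-component) distance between every pair of windows
        d = np.max(np.abs(emb[:, None, :] - emb[None, :, :]), axis=2)
        C = np.mean(d <= r, axis=1)     # C_i^m(r); self-match keeps C > 0
        return np.mean(np.log(C))

    return phi(m) - phi(m + 1)

rng = np.random.default_rng(0)
regular = np.sin(np.linspace(0, 40 * np.pi, 600))  # highly predictable signal
noisy = rng.standard_normal(600)                   # irregular signal
```

A regular signal yields a lower ApEn than an irregular one, which is why ApEn can separate healthy and faulty vibration segments.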

BPNN
The BPNN combines a multilayer perceptron (MLP) with the error back propagation (EBP) algorithm, (20) known as the BP algorithm. The BPNN is a supervised learning network (21) and is the most representative and widely applied among neural networks. Its operation is divided into the forward propagation of the network input and the back propagation of the output error. If the actual output does not meet the expected network output, the error message is transmitted backward from the output layer to the input layer, and the weights are modified continuously to reduce the error successively and achieve the purpose of learning. Figure 3 shows the basic architecture of the BPNN: yellow is the input layer, green is the hidden layer, and red is the output layer. The basic architecture of a neural network contains the processing unit, the layer, and the network. The processing unit is the most basic unit of computing, and several processing units with the same function are grouped into "layers": the input, hidden, and output layers. These layers form the neural network, which itself has both learning and recall functions. The BPNN training process is shown in Fig. 4.
The BPNN is a typical multilayer network structure, which consists of an input layer, a hidden layer, and an output layer. The layers are interconnected by weights and biases. The output of the previous layer is passed through a nonlinear activation function (22) to become the input of the next layer, as shown in Eq. (6):

S_j = \sum_i w_{ij} x_i + b_j. (6)

f(\cdot) is the activation function, named the sigmoid; its definition is shown in Eq. (7):

f(s) = \frac{1}{1 + e^{-s}}. (7)

In Eq. (6), S_j is the input of the jth hidden-layer neuron, w_{ij} is the weight connecting the processing units of different layers, x_i is the ith input value, and b_j is a bias.
When there are errors between the predicted and actual values, the BPNN propagates the errors backward to correct the weights and biases of each hidden layer. The training data are then fed forward again, layer by layer, with the new weights and biases, and the error is corrected repeatedly until it falls within the acceptable range. To reduce the difference between the network output and the target value, the objective function shown in Eq. (8) is used:

E = \frac{1}{2} \sum_i \left( y^{(i)} - \hat{y}^{(i)} \right)^2. (8)

In Eq. (8), y^{(i)} is the actual value and \hat{y}^{(i)} is the predicted value. In this study, we use three different training functions to train the neural network, as follows:
1. Trainlm: based on the Levenberg-Marquardt algorithm (23) to update the weights and biases. Trainlm has the highest convergence speed for medium-scale neural network models, but its disadvantage is that it requires a large amount of memory.
2. Trainbr: based on Bayesian regularization (24) to minimize a combination of the squared error and the weights. The network has a higher generalization ability, but training takes longer.
3. Trainscg: based on the scaled conjugate gradient. (25) More iterations are needed than in the other algorithms, but no line search is performed in each iteration, so the amount of computing per iteration is considerably reduced.
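Eqs. (6)-(8) can be sketched in NumPy as a two-layer forward pass with a sigmoid activation and a half-sum-of-squares error; the layer sizes here are illustrative choices, not the paper's tuned parameters:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(s):
    """Activation of Eq. (7): f(s) = 1 / (1 + e^(-s))."""
    return 1.0 / (1.0 + np.exp(-s))

def forward(x, W, b):
    """One layer of Eq. (6): S_j = sum_i w_ij x_i + b_j, then f(S_j)."""
    return sigmoid(W @ x + b)

def half_sse(y_true, y_pred):
    """Objective of Eq. (8): E = 1/2 * sum_i (y(i) - yhat(i))^2."""
    return 0.5 * np.sum((y_true - y_pred) ** 2)

# hypothetical 4-feature input (one ApEn value per analysis signal)
x = rng.standard_normal(4)
W1, b1 = rng.standard_normal((10, 4)), np.zeros(10)  # 10 hidden neurons
W2, b2 = rng.standard_normal((4, 10)), np.zeros(4)   # 4 fault classes
y_pred = forward(forward(x, W1, b1), W2, b2)
```

Back propagation would then use the gradient of E with respect to each w_ij and b_j to update the parameters, which is what Trainlm, Trainbr, and Trainscg do with different update rules.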

SVM
The SVM is a widely used classification technique in the fields of machine learning and pattern recognition. It was first used to deal with the binary classification issue. The SVM learning system uses statistical learning theory to calculate an optimal hyperplane such that the data of different categories have the maximal margin and minimal misclassification; the optimal hyperplane is constructed from the support vectors. H is the classification hyperplane; H_1 and H_2 pass through the samples of each class that are closest to H and are parallel to it, and the distance between them is the margin. The optimal classification hyperplane must not only separate the two classes of samples correctly but also maximize the margin. The hyperplane is written as w \cdot x + b = 0, where x is the vector of input features, w is the normal vector of the hyperplane, and b is a constant; the margin between H_1 and H_2 is 2/\|w\|. Hence, to obtain the maximum margin, we look for the minimum of \|w\|^2/2 that satisfies the conditional Eq. (9):

y_i (w \cdot x_i + b) \ge 1, \quad i = 1, 2, \ldots, n. (9)
To sum up the above, we have the objective function

\min_{w,b} \; \frac{1}{2}\|w\|^2 \quad \text{subject to} \quad y_i (w \cdot x_i + b) \ge 1.

The Lagrange method (26) (w = a_1 y_1 x_1 + a_2 y_2 x_2 + \cdots + a_n y_n x_n) is used to optimize the classification hyperplane; the binary decision of the SVM can then be expressed as f(x) = \mathrm{sign}(w \cdot x + b). The SVM has the following advantages: it has good applicability in high-dimensional spaces and can effectively process data with more variables than samples. Moreover, only a subset of the training points, the support vectors, enters the decision function, so the training model does not need a large amount of memory.
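As a small numeric illustration of the margin condition of Eq. (9), the sketch below checks a hand-picked (not learned) hyperplane against a toy 2-D separable set; the data and the choice of w and b are ours, not from the paper:

```python
import numpy as np

# linearly separable toy set with labels y in {+1, -1}
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y = np.array([1, 1, -1, -1])

# hand-picked separating hyperplane w.x + b = 0
w = np.array([0.25, 0.25])
b = 0.0

# Eq. (9): every sample must satisfy y_i (w.x_i + b) >= 1
margins = y * (X @ w + b)

# geometric distance between H1 and H2 is 2 / ||w||
width = 2.0 / np.linalg.norm(w)

# decision function: f(x) = sign(w.x + b)
pred = np.sign(X @ w + b)
```

Training an SVM amounts to finding the w and b that satisfy Eq. (9) while making this width as large as possible.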

Comparison of BPNN optimal training function
The internal parameter settings of the BPNN have no fixed basis or norms; therefore, in this study, we compare the different training functions in terms of the number of neurons and the amount of training data, and then select the best parameters. Under the same parameter conditions, the three different training functions, Trainlm, Trainbr, and Trainscg, are trained on the CPU, and we also use the GPU to train Trainscg to determine whether computing on the GPU rather than the CPU indirectly affects the experiment. The four different results are shown in Figs. 5-8. On the basis of the results shown in Figs. 5-8, we can tell that the accuracy rate of Trainbr is higher than those of Trainlm and Trainscg. The results shown in Figs. 9-13 indicate that, in terms of computing time, a single CPU is more suitable than a single GPU for low-dimensional computing. Moreover, the experimental results show that the recognition abilities of the three training functions for the ball bearing database, from high to low, are Trainbr, Trainlm, and Trainscg. Therefore, we adopt the CPU and the Trainbr training method in the subsequent analysis.

Best data training ratio of BPNN
After comparing the training functions in Sect. 3.1, we trained the BPNN with Trainbr using five data training/testing ratios, 90%/10%, 80%/20%, 70%/30%, 60%/40%, and 50%/50%, and compared the results. The experimental results are shown in Fig. 7.
We can tell from Fig. 7 that all five training ratios yield high recognition rates: the highest, 95%, is obtained when the training ratio is 90%, and the lowest is 85%. After the analysis, the recognition abilities of the five data training ratios for the ball bearing fault, from high to low, are 90, 80, 60, 70, and 50%. Therefore, in the follow-up experiments, we adopt a training ratio of 90%.
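The paper does not state exactly how the 192 samples are split at each ratio; one plausible scheme, assuming a stratified random split over the 48 samples of each of the four classes, is:

```python
import numpy as np

def stratified_split(n_per_class=48, n_classes=4, train_ratio=0.9, seed=0):
    """Shuffle each class's indices and split them by the given ratio."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for c in range(n_classes):
        idx = rng.permutation(n_per_class) + c * n_per_class
        cut = int(round(train_ratio * n_per_class))
        train_idx.extend(idx[:cut])
        test_idx.extend(idx[cut:])
    return np.array(train_idx), np.array(test_idx)

# the five ratios compared in the paper all partition the same 192 samples
for ratio in (0.9, 0.8, 0.7, 0.6, 0.5):
    tr, te = stratified_split(train_ratio=ratio)
    assert len(tr) + len(te) == 192
```

A stratified split keeps the four fault types equally represented in both the training and test sets, which avoids biasing the comparison between ratios.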

Best number of neurons in hidden layers of BPNN
The results described in Sects. 3.1 and 3.2 show that the highest recognition ability is obtained when the training function is Trainbr and the data training ratio is 90%. Afterwards, we compared the recognition rates for different numbers of neurons (10, 20, 30, 40, 50, 100, 150, 200, and 250); the accuracy rates corresponding to the numbers of neurons are shown in Fig. 7.
We can tell from Fig. 7 that when the number of neurons is 250, the recognition rate is higher than those for the other numbers of neurons, and the recognition rate tends to increase from 150 neurons onward. To summarize Sects. 3.1-3.3, the highest recognition ability is obtained when the training function is Trainbr, the data training ratio is 90%, and the number of neurons is 250, and these parameters are used to recognize the ball bearing fault. Table 3 verifies that, in this study, the BPNN has a very high recognition rate when the training ratio is 90% and the number of neurons is 250.

Comparison of classification results between SVM and BPNN
In this study, we adopt the SVM classifier to verify the classification accuracy rate; the 192 pieces of data captured from the fault signals of the drive-measured ball and inner bearing are used for the analysis. When the training set is 90% and the test set is 10%, and when the training set is 80% and the test set is 20%, the SVM has a nearly 85% recognition rate; the lowest recognition rate among the training sets is 81.25%. In general, the recognition performance of the SVM is better than that of the BPNN, but in this study, the recognition performance of the BPNN exceeded that of the SVM through parameter adjustment and setting. Table 4 compares the SVM and BPNN under the same training and test set parameters. We can tell from Table 4 that both the BPNN and the SVM have their best recognition rates when the training set is 90%.

Conclusions
In this work, we analyzed the fault signals of the ball bearing disclosed by CWRU. After preprocessing and feature extraction of the fault signals from the drive-measured ball and inner bearing, we performed fault classification of the signals with the BPNN and SVM and established the optimal model by parameter selection. The research results show that when the BPNN uses Trainbr as the training function, the data training amount is 90%, and the number of neurons is 250, a very high classification ability is achieved. Afterwards, we used the SVM to carry out classification verification and found that the SVM also has a high classification accuracy rate when the data training amount is 90%. Thus, both the BPNN and the SVM have a very high recognition rate for the data analyzed in this study. In the final comparison of the classifiers, we showed that through parameter adjustment, the classification accuracy rate of the BPNN is higher than that of the SVM.