Machine Failure Analysis Using Nearest Centroid Classification for Industrial Internet of Things

This paper presents a predictive model for machine failure analysis that aims to accurately identify the various causes of machine failure. The predictive model was developed in the following three steps: 1) dataset classification, 2) attribute selection, and 3) centroid calculation. In the first step, the dataset is classified into multiple subdatasets according to the cause of machine failure. Each subdataset is treated as a cluster. In the second step, the mean of each attribute measured at the same time is calculated and compared with that of the normal case. Then, the attribute that changes most after the machine failure is selected. In the last step, the mean and variance of the selected attribute are calculated to create the elements of each cluster, and then the centroid of each cluster that maximizes the cohesion of the cluster is calculated. The cause of a new machine failure is determined by comparing the distances between the data point of the new failure and the centroid of each cluster. To verify the feasibility of the predictive model, we conducted an experimental implementation. The results show that the implemented predictive model is feasible for analyzing the causes of machine failure.


Introduction
Recently, the industrial Internet of things (IIoT) has received a great deal of attention from both industry and academia, since it makes the manufacturing process more reliable, efficient, and safe. (1)(2)(3) In IIoT, a number of sensors are typically employed to monitor their surrounding environments. (4) Particularly, in the manufacturing environment, multiple sensors are attached to machines to detect whether the machines are operating successfully. The sensors periodically generate a large amount of data and forward it to a central monitoring server via a wireless link. The central monitoring server extracts the meaningful information from the collected data using big data analysis technologies. The use of big data analysis technologies enables a user to make better decisions by predicting the occurrence of problems such as malfunction, long delay, and failure. Therefore, in recent years, many manufacturing enterprises have been attempting to apply such technologies to their manufacturing process.
Machine failure analysis is one of the most challenging issues faced by manufacturing enterprises, since machine failure may lead to serious consequences such as increased maintenance costs and defective products. (5) In the manufacturing environment, machine failure can occur owing to different causes. For example, machine failure can occur owing to collision with physical objects in the vicinity (e.g., workers, machines, and other manufacturing equipment). To cope with machine failures efficiently, it is necessary to accurately analyze the various causes of machine failure using the data collected from sensors. For an accurate analysis, the relationship between the cause and type of machine failure should be determined precisely. To achieve this, most existing studies have proposed predictive models using various big data analysis technologies such as association analysis, regression analysis, and neural networks. (6)(7)(8) However, such predictive models suffer from high complexity and a high computational burden for data analysis. Therefore, their application in real manufacturing environments is difficult.
In this paper, we propose a predictive model for machine failure analysis, aiming to analyze various causes of machine failure. The predictive model is developed using a dataset consisting of the data collected from the various sensors attached to the machine. Each sensor collects data multiple times when the event (i.e., machine failure) occurs. To develop the predictive model, we conduct the following three steps: 1) dataset classification, 2) attribute selection, and 3) centroid calculation. The first step is to classify the dataset into multiple subdatasets according to the cause of machine failure. Each subdataset is represented by a cluster. In the second step, the attribute that changes most after the machine failure is selected. For this, we calculate the mean of each attribute measured at the same time and compare it with that of the normal case. The last step is to calculate the centroid of each cluster that maximizes the cohesion of the cluster. In this step, the mean and variance of the selected attribute are calculated to create the elements of each cluster, expressed as two-dimensional points. To verify the feasibility of the predictive model, we conducted an experimental implementation using RStudio version 1.0.153. In the implementation, we used the dataset containing the force and torque measurements collected after detecting the failure of robot execution, provided by the University of California Irvine (UCI) machine learning repository. (9) The results show that the implemented predictive model is feasible for analyzing the causes of machine failure.
The rest of this paper is organized as follows. In Sect. 2, the predictive model for machine failure analysis is described in detail. The implementation results are presented in Sect. 3. Finally, we conclude this paper in Sect. 4.

Predictive Model for Machine Failure Analysis
To develop the predictive model for machine failure analysis, we conduct the 1) dataset classification, 2) attribute selection, and 3) centroid calculation steps sequentially. All the steps are separately conducted for each cause of machine failure. For the development, we use a dataset containing the data collected from the various sensors attached to the machine, which consists of various attributes determined depending on the type of sensor. Each attribute is collected a certain number of times after the event (i.e., machine failure).
In the dataset classification step, the dataset is classified into multiple subdatasets according to the cause of machine failure. Each of the classified subdatasets is used as a cluster. The sensors collect data a specific number of times whenever an event occurs. Therefore, within each subdataset, the data are grouped by event. For example, if a sensor generates fifteen measurements whenever an event occurs, each subdataset stores those fifteen measurements as one set per event.
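The grouping in this step can be sketched as follows. This is a minimal illustration in Python, not the implementation used in the paper (which was carried out in R); the record layout, the function name split_by_cause, and the sample values are hypothetical, and each event is shortened to three measurements for brevity.

```python
from collections import defaultdict

# Hypothetical records: (cause label, measurements of one attribute for one event).
# Real events would hold a fixed number of samples (fifteen in the example above).
events = [
    ("normal", [15.0, 16.0, 15.0]),
    ("collision", [14.0, -20.0, -60.0]),
    ("collision", [16.0, -35.0, -70.0]),
    ("obstruction", [10.0, -400.0, -900.0]),
]


def split_by_cause(records):
    """Group events into one subdataset (cluster) per cause of failure."""
    subdatasets = defaultdict(list)
    for cause, measurements in records:
        # Each event is kept as its own set of measurements.
        subdatasets[cause].append(measurements)
    return dict(subdatasets)


subdatasets = split_by_cause(events)
for cause, grouped in subdatasets.items():
    print(cause, len(grouped), "event(s)")
```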
In the attribute selection step, the attribute that changes most after the machine failure is selected. To this end, we calculate the mean of each attribute at each measurement time and compare it with the corresponding mean in the normal case. Then, we select the attribute that has the largest difference from the normal case.
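A minimal Python sketch of this selection is given below. How the per-time differences are combined into a single score is not specified here, so the sketch assumes the mean absolute difference across measurement times; the function names and the toy values are hypothetical.

```python
def time_step_means(events):
    """Mean across events at each measurement index."""
    n_steps = len(events[0])
    return [sum(e[k] for e in events) / len(events) for k in range(n_steps)]


def select_attribute(failure, normal):
    """Pick the attribute whose per-time means differ most from the normal case.

    failure, normal: dict mapping attribute name -> list of events,
    where each event is a list of measurements.
    """
    scores = {}
    for name in failure:
        f_means = time_step_means(failure[name])
        n_means = time_step_means(normal[name])
        # Assumed aggregation: mean absolute difference over measurement times.
        diffs = [abs(f - n) for f, n in zip(f_means, n_means)]
        scores[name] = sum(diffs) / len(diffs)
    best = max(scores, key=scores.get)
    return best, scores


# Toy data: two attributes, two events each, three measurements per event.
normal = {"Fx": [[1, 1, 1], [1, 1, 1]], "Fz": [[15, 15, 15], [16, 16, 16]]}
failure = {"Fx": [[2, 0, 1], [0, 2, 1]], "Fz": [[10, -40, -90], [12, -60, -110]]}

best, scores = select_attribute(failure, normal)
print(best, scores)
```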
Upon selecting the attribute, the centroid calculation step starts. In this step, the mean and variance of the selected attribute are calculated to create the elements of each cluster. The elements of each cluster are generated for each event. The mean and variance of the selected attribute for the j-th event in the i-th cluster can be given as
$$\mu_{i,j} = \frac{1}{n} \sum_{k=1}^{n} x_{i,j,k}, \qquad \sigma_{i,j}^{2} = \frac{1}{n} \sum_{k=1}^{n} \left( x_{i,j,k} - \mu_{i,j} \right)^{2},$$
respectively, where $x_{i,j,k}$ is the attribute of the k-th measurement for the j-th event in the i-th cluster, and $n$ is the number of measurements. To create the elements of each cluster, we use the mean and variance calculated previously. Each element of each cluster is expressed as a two-dimensional point as
$$e_{i,j} = \left( \mu_{i,j}, \sigma_{i,j}^{2} \right),$$
where $e_{i,j}$ is the element of the j-th event for the i-th cluster. Afterward, the centroid of each cluster, which maximizes the cohesion of the cluster, is calculated as
$$c_{i} = \arg\min_{r} \left( \sum_{j} d(r, e_{i,j}) \right),$$
where $c_{i}$ is the centroid of the i-th cluster, $r$ is the two-dimensional random point, and $d(\cdot)$ is the distance function between two points. This step is performed separately for each subdataset to create the elements and centroid of each cluster. When a new event occurs, the distance between the centroid of each cluster and the point of the event (i.e., the mean and variance of the event) is calculated using the Euclidean distance. (10) Then, the obtained distances for all clusters are compared, and the cluster with the shortest distance is selected.
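The element and centroid computation can be illustrated with the short Python sketch below. It assumes the population variance (division by n) and takes the centroid as the coordinate-wise mean of the elements, which is the minimizer when the distance function is the squared Euclidean distance; another choice of distance function would generally give a different point. All names and values are illustrative only.

```python
from statistics import fmean, pvariance


def element(measurements):
    """Map one event's measurements to a 2-D point (mean, variance)."""
    return (fmean(measurements), pvariance(measurements))


def centroid(elements):
    """Coordinate-wise mean of a cluster's elements (assumed cohesion criterion)."""
    means = [e[0] for e in elements]
    variances = [e[1] for e in elements]
    return (fmean(means), fmean(variances))


# Hypothetical subdataset: two events, three measurements each.
cluster_events = [[0.0, 10.0, 20.0], [0.0, 12.0, 24.0]]

elements = [element(m) for m in cluster_events]
c = centroid(elements)
print(elements, c)
```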

Implementation Results
The experimental implementation was conducted using RStudio version 1.0.153 to verify the feasibility of the model. We used a dataset including the force and torque measurements collected after detecting the failure of robot execution, provided by the UCI machine learning repository. In the dataset, two causes are considered for machine failure: collision and obstruction. The dataset also contains normal-case data. Thus, we generated a subdataset for each of the normal, collision, and obstruction cases. Note that the subdataset of the normal case serves as the baseline with which those of the other cases are compared. Each subdataset contains the following six attributes: forces for the x-axis ($F_x$), y-axis ($F_y$), and z-axis ($F_z$), and torques for the x-axis ($T_x$), y-axis ($T_y$), and z-axis ($T_z$). Each attribute is measured fifteen times at 315 ms intervals after machine failure. Tables 1-3 show the mean of each attribute measured at the same time in the normal, collision, and obstruction cases, respectively. The tables show that, relative to the normal case, $F_z$ changes more than the other attributes after a failure of robot execution caused by collision or obstruction. Therefore, $F_z$ is selected as the attribute for data analysis. Figures 1-3 show the scatter plots for the elements of each cluster. In each figure, the x- and y-axes of the plot indicate the mean and variance of the attribute $F_z$, respectively. Each circle indicates an element of a cluster. In the normal case, all the points are concentrated in a small region, because $F_z$ in the normal case stays close to 15.5. On the other hand, $F_z$ varies dynamically when machine failure occurs. In particular, $F_z$ is more dispersed in the obstruction case than in the collision case, since its variation is larger in the obstruction case. Each figure shows the mean and variance in the normal, collision, or obstruction case. From the figures, we can extract the centroid of a cluster.
Specifically, the centroids of the clusters for the normal, collision, and obstruction cases are (15.6, 39.5), (-29.2, 155944.1), and (-804.2, 1086131.5), respectively. By using these centroids, the causes of machine failure can be analyzed. For example, if the mean and variance of a new event are -31.7 and 260.0, the machine failure is caused by collision.
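The comparison of a new event against the reported centroids can be sketched in Python as follows. The centroid values are those given above; the query points and the function name nearest_cause are hypothetical and were chosen only to illustrate the Euclidean-distance rule, not taken from the dataset.

```python
import math

# Centroids (mean, variance) of the selected attribute, as reported in the text.
CENTROIDS = {
    "normal": (15.6, 39.5),
    "collision": (-29.2, 155944.1),
    "obstruction": (-804.2, 1086131.5),
}


def nearest_cause(point, centroids=CENTROIDS):
    """Return the cause whose centroid is closest to the point, plus all distances."""
    distances = {cause: math.dist(point, c) for cause, c in centroids.items()}
    return min(distances, key=distances.get), distances


# Hypothetical new events, each given as (mean, variance).
queries = [(15.0, 40.0), (-30.0, 150000.0), (-800.0, 1000000.0)]
labels = [nearest_cause(q)[0] for q in queries]
print(labels)
```

Because the variance coordinates are several orders of magnitude larger than the means, the unscaled Euclidean distance in this sketch is driven almost entirely by the variance axis.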

Conclusions
In this paper, we proposed a predictive model for machine failure analysis, aiming to analyze various causes of machine failure. To develop the predictive model, we conducted the following three steps: 1) dataset classification, 2) attribute selection, and 3) centroid calculation. The dataset classification step classifies the dataset into multiple subdatasets according to the cause of machine failure, and the attribute selection step selects the attribute that changes most after the machine failure. The centroid calculation step calculates the centroid of each cluster that maximizes cluster cohesion. To verify the feasibility of the predictive model, an experimental implementation was conducted using RStudio version 1.0.153. We used the dataset containing the force and torque measurements collected after detecting the failure of robot execution, provided by the UCI machine learning repository. In the experiment, the clusters for the normal, collision, and obstruction cases were created, with the centroids (15.6, 39.5), (-29.2, 155944.1), and (-804.2, 1086131.5), respectively. The results show that the implemented predictive model is feasible for analyzing various causes of machine failure.