Partial Least Square-Support Vector Machine for Rapid Detection of Egg Storage Life by Chemometric Processing of Voltammetric Signals

The rapid detection of the grade and storage life of eggs is important for consumers and producers. In this study, an electrochemical system based on voltammetry was used for the rapid detection of egg yolk and egg white mixtures in order to classify eggs and predict their storage days. The Haugh unit (H.U.) index of Lindian eggs was measured to identify the grades of these eggs and the corresponding storage days. Discriminant function analysis was used to accurately identify the grades of eggs on the basis of storage life. The Gaussian kernel function and a suitable penalty factor were selected to establish a support vector machine for predicting the storage days of eggs. To avoid deviation in the result due to correlation in the data acquired by square-wave voltammetry, partial least squares and the support vector machine were combined, and the predictive accuracy of egg storage days was improved to 97.14%. Tenfold cross-validation was used to evaluate the classification performance of the model; the average accuracy of prediction of storage life was 88.57%.


Introduction
Eggs are known all over the world as a nutritious, high-quality source of energy. The quality of eggs being sold relates to the interests of consumers. According to the standard stipulated by the U.S. Department of Agriculture, eggs are divided into four grades: AA, A, B, and C. Eggs being sold must be of B grade or above. Because the surface biofilm of eggs is lost within a week or so, air and microorganisms can enter the egg and cause protein membranes to be oxidized, water to evaporate, and the egg yolk to degenerate, which finally reduces the consumption and commodity values of the eggs. An accurate and quick method of detecting the storage life and grade of eggs is needed for the convenience of producers. (1)(2)(3)(4)(5)
With the rapid development of detection technology, more and more detection methods have been applied to the quality detection of eggs, such as high-performance liquid chromatography, atomic absorption spectroscopy, electronic nose technology, and electronic tongue technology. The evaluation of eggs can be roughly divided into shelf-life detection and internal composition detection. (6)(7)(8)(9)
In this study, we focused on detecting the mixture of egg yolk and egg white using an electrochemical system based on a three-electrode structure; the traditional Haugh unit (H.U.) measurement is used as a reference. First, discriminant function analysis (DFA) based on the analysis of variance is used to identify the grade of eggs. Then, egg storage days are predicted so that the freshness of eggs can be understood better; this prediction was achieved with a support vector machine (SVM) using data acquired with the electrochemical system. At the same time, in order to avoid deviation in the results caused by correlation in the data acquired by square-wave voltammetry, partial least squares (PLS) and SVM were combined to improve the forecast accuracy of egg storage days.

Experimental system
In this study, an electrochemical system that consists of a three-electrode structure, an electrochemical workstation, and a computer has been built to detect the grade and storage life of eggs. (10)(11)(12) The three-electrode structure was composed of a working electrode, a reference electrode, and an auxiliary electrode. The details of its structure are shown in Fig. 1.
The working electrode, also called the sensing electrode, is the electrode at which the concentration of the component under test changes measurably during the measurement. In this study, a gold disk electrode was used as the working electrode. The reference electrode consists of a constant-composition phase, so its electrode potential did not change during the measurement; a change in the electromotive force of the cell therefore reflects the variation of the electrode potential of the working electrode. In this study, we chose a saturated calomel electrode as the reference electrode. The auxiliary electrode forms a cell with the working electrode and completes the current loop. We chose a platinum wire electrode as the auxiliary electrode. A constant-voltage system between the working electrode and the reference electrode maintained the stability of the voltage in the liquid system. Under the control of this voltage, an excitation current was produced in the solution. The loop current produced between the working electrode and the auxiliary electrode was the output measured as the signal.
During the experiment, the electrode array connected to the electrochemical workstation was inserted into the sample solution. The electrochemical workstation provided a series of electrical pulses, collected the response, and stored it in the computer. Then, feature extraction was carried out on the stored signal. Finally, pattern recognition algorithms were used to distinguish and identify the different substances and to obtain taste information on them.

H.U. detection
The traditional method of measuring the freshness of eggs, called H.U. detection, was used in this study to determine the storage life and grade of eggs. The H.U. index of the same batch of Lindian eggs was measured to identify the grades of these eggs and the corresponding storage days. The H.U. value is calculated as
$$\mathrm{H.U.} = 100 \log_{10}\left(H - 1.7\,w^{0.37} + 7.6\right)$$
In the equation, H denotes the height of the egg white (mm), and w denotes the weight of the egg (g). The height of the egg white can be measured using an albumen altimeter. First, the glass plate is leveled. After the egg is opened on the glass plate, three equidistant points are taken, evenly distributed midway between the edge of the egg yolk and the edge of the thick egg white (avoiding the chalazae); the height of the egg white is measured at each point using the albumen altimeter, and the average is calculated.
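The H.U. calculation described above can be sketched in a few lines. This is a minimal illustration that assumes the standard Haugh formula, H.U. = 100 log10(H − 1.7 w^0.37 + 7.6); the function names and the three altimeter readings are hypothetical and are not taken from the experiment.

```python
import math


def mean_albumen_height(readings_mm):
    """Average the altimeter readings taken at the three measurement points."""
    return sum(readings_mm) / len(readings_mm)


def haugh_unit(albumen_height_mm, egg_weight_g):
    """Haugh unit from albumen height H (mm) and egg weight w (g).

    Assumes the standard formula: 100 * log10(H - 1.7 * w**0.37 + 7.6).
    """
    inner = albumen_height_mm - 1.7 * egg_weight_g ** 0.37 + 7.6
    if inner <= 0:
        raise ValueError("height/weight pair is outside the domain of the formula")
    return 100.0 * math.log10(inner)


# Hypothetical readings for one egg (not measured data).
heights_mm = [8.1, 7.9, 8.0]
hu = haugh_unit(mean_albumen_height(heights_mm), egg_weight_g=60.0)
print(f"H.U. = {hu:.1f}")
```

For a fixed egg weight, a taller egg white gives a larger H.U. value, which is why the index falls as the albumen thins during storage.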
The H.U. tester was used to measure 70 Lindian eggs from the same batch as those used in the voltammetry-based electrochemical experiment. As soon as the eggs were obtained, 10 eggs were randomly selected for testing. The H.U. value was determined from the egg weight and the egg albumen height measured in the experiment. The remaining 60 eggs were divided into 20 groups and measured every two days with a portable instrument to determine the H.U. value.

Experiment with eggs based on the electrochemical system
The purpose of detecting the mixed liquor of egg yolks and egg whites with the electrochemical system based on the three-electrode structure was to determine the storage life and grade of raw eggs more accurately. Seventy Lindian eggs from the same batch were selected for the experiment. The eggs were about 5 cm long and 60 to 70 g in weight. Each experiment comprised five steps. First, the beakers used in the examination were cleaned with deionized water. Second, the three electrodes were soaked in alcohol for 6 min, cleaned with deionized water, and dried with filter paper. They were then fixed in the electrode rack and connected to the electrochemical workstation. Third, one egg was chosen for the measurement, and its yolk and white were mixed uniformly to obtain the mixed liquor. The mixed liquor was diluted fivefold with deionized water and divided into 5 portions, which were placed in numbered beakers.
The experiment on the mixed liquor was timed from the purchase date. Fourth, the electrochemical workstation was turned on and the corresponding technique parameters for the experiment were set. Then, the electrode array was put into the mixed liquor to begin the measurement. Fifth, the 5 portions of mixed liquor from the same egg were measured in order. When the experiment on one egg was finished, the electrodes were cleaned with deionized water and dried with filter paper before the next egg was measured. Ten eggs were measured on each measurement day. The experiment was repeated every 6 d by the same method until a total of 50 × 7 sets of data had been obtained in 36 d.
The experiment was based on square-wave voltammetry technology. The corresponding parameters were as follows: 1.1 V of initial voltage, −0.6 V of terminal voltage, potential increments of 0.005 V, amplitude of 0.04 V, frequency of 15 Hz, and 15 s of standing time.
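To make these parameters concrete, the following sketch builds an idealized square-wave excitation from them. It is only an illustration: the pulse convention (forward pulse in the scan direction, reverse pulse against it) and the function name are assumptions, and the 15 s standing time before the scan is not modeled.

```python
def swv_waveform(e_init=1.1, e_final=-0.6, step=0.005, amplitude=0.04, freq_hz=15.0):
    """Idealized square-wave voltammetry excitation.

    Returns (n_steps, scan_duration_s, points), where points is a list of
    (time_s, potential_V) pairs marking the start of each half-cycle.
    """
    n_steps = round(abs(e_final - e_init) / step)
    direction = 1.0 if e_final > e_init else -1.0
    period = 1.0 / freq_hz  # one staircase step per square-wave period
    points = []
    for k in range(n_steps):
        base = e_init + direction * k * step
        t0 = k * period
        points.append((t0, base + direction * amplitude))               # forward pulse
        points.append((t0 + period / 2, base - direction * amplitude))  # reverse pulse
    return n_steps, n_steps * period, points


n_steps, duration_s, points = swv_waveform()
print(n_steps, round(duration_s, 2))
```

With the listed settings, the 1.7 V window is covered in 340 staircase steps, so one scan takes roughly 22.7 s at 15 Hz under this idealization.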

DFA
DFA is a multivariate statistical method that classifies samples on the basis of observed variables. (13)(14)(15)(16) It classifies data of unknown type by modeling data of known type. Common types of discriminant function analysis include the Bayes discriminant method, the distance discriminant method, and the Fisher discriminant method. In this study, the Fisher discriminant method based on analysis of variance is used to analyze the data. Linear functions are used to project the n-dimensional observation points onto m-dimensional points, and the observation set is then classified in the m-dimensional space. The differences between groups in the reduced-dimension data should be made as large as possible in order to obtain high discrimination efficiency. The optimal linear function is selected on the principle of maximizing the ratio of the mean square between groups to the mean square within groups.
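The Fisher criterion can be illustrated for the simplest case of two classes and two features, where the discriminant direction is the inverse within-class scatter matrix applied to the difference of the class means. The sketch below is not the analysis used in this study; the data points and function names are hypothetical.

```python
def mean_vec(rows):
    """Mean of a list of 2-D points."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def scatter(rows, m):
    """2x2 scatter matrix of the points around their mean m."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - m[0], r[1] - m[1]]
        for a in range(2):
            for b in range(2):
                s[a][b] += d[a] * d[b]
    return s


def fisher_direction(class_a, class_b):
    """Direction w = Sw^-1 (mean_a - mean_b) that maximizes between/within scatter."""
    ma, mb = mean_vec(class_a), mean_vec(class_b)
    sa, sb = scatter(class_a, ma), scatter(class_b, mb)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]


# Hypothetical two-feature samples for two grades (not measured data).
grade_a = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.6), (1.2, 2.4)]
grade_b = [(4.0, 4.5), (4.5, 5.2), (5.0, 4.8), (4.2, 5.5)]
w = fisher_direction(grade_a, grade_b)
proj_a = [w[0] * x + w[1] * y for x, y in grade_a]
proj_b = [w[0] * x + w[1] * y for x, y in grade_b]
print(min(proj_a) > max(proj_b))
```

Projecting onto a single direction reduces each two-dimensional sample to one score, and the two groups occupy disjoint score ranges in this toy case.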

SVM
The basic idea of SVM is to build a separating hyperplane as the decision surface that maximizes the margin between the negative and positive examples. (17)(18)(19) The support vectors are the sample vectors closest to the hyperplane.
Each sample is an n-dimensional vector; the set of L samples can be expressed as
$$\{(x_i, y_i)\},\quad i = 1, 2, \ldots, L,\quad x_i \in \mathbb{R}^n,\quad y_i \in \{-1, +1\}$$
The constructed hyperplane can be expressed as
$$\omega \cdot \Psi(x) + b = 0$$
where $\omega$ is the weight vector of the hyperplane, $\Psi(x)$ is the nonlinear mapping function, and $b$ is the bias (threshold) term.
After normalization and the introduction of the slack variables $\xi_i \ge 0$, in which $i = 1, 2, \ldots, L$, the objective function is
$$\min_{\omega, b, \xi}\ \frac{1}{2}\|\omega\|^2 + C\sum_{i=1}^{L}\xi_i \quad \text{subject to} \quad y_i\left(\omega \cdot \Psi(x_i) + b\right) \ge 1 - \xi_i,\quad \xi_i \ge 0$$
This means that the misclassification of samples is minimized while the class margin is maximized, so that the optimal classification surface can be obtained. $C > 0$, known as the penalty factor, controls the degree of penalty for misclassified samples. The optimization problem is translated into its Lagrange dual problem for solving.
Only a small number of coefficients are nonzero in the solution of the optimization problem. The sample vectors corresponding to these coefficients are the support vectors, which determine the optimal generalized classification surface.
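For reference, the Lagrange dual of a soft-margin SVM takes the following standard textbook form. The symbols here are assumptions made for this illustration rather than notation from the original derivation: $y_i \in \{-1, +1\}$ are the class labels, $\alpha_i$ are the Lagrange multipliers, and $\Psi$ is the nonlinear mapping.

$$\max_{\alpha}\ \sum_{i=1}^{L}\alpha_i - \frac{1}{2}\sum_{i=1}^{L}\sum_{j=1}^{L}\alpha_i\alpha_j y_i y_j\,\Psi(x_i)\cdot\Psi(x_j) \quad \text{subject to} \quad \sum_{i=1}^{L}\alpha_i y_i = 0,\quad 0 \le \alpha_i \le C$$

$$f(x) = \operatorname{sgn}\left(\sum_{i=1}^{L}\alpha_i y_i\,\Psi(x_i)\cdot\Psi(x) + b\right)$$

In this form, the nonzero coefficients are the multipliers $\alpha_i > 0$, and only the corresponding samples contribute to the decision function $f(x)$.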
For nonlinear classification, computing dot products in the high-dimensional feature space can be difficult and time-consuming. Kernel functions $K(x_i, x_j) = \Psi(x_i) \cdot \Psi(x_j)$ that satisfy Mercer's theorem can be evaluated in the original input space instead of performing the dot product operations in the high-dimensional feature space. This method is called kernel mapping, or the kernel trick. There are four common types of kernel function: linear, polynomial, Gaussian, and sigmoidal.
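A small numerical check makes the kernel idea concrete. The sketch below uses a degree-2 polynomial kernel on 2-D inputs, for which the explicit feature map is known, and also defines a Gaussian kernel; the function names and the value of gamma are illustrative assumptions.

```python
import math


def dot(x, y):
    """Ordinary dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def gaussian_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def poly2_kernel(x, y):
    """Homogeneous polynomial kernel of degree 2, evaluated in the input space."""
    return dot(x, y) ** 2


def poly2_feature_map(x):
    """Explicit 3-D feature map whose dot product equals poly2_kernel (2-D input)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


x, y = (1.0, 2.0), (3.0, -1.0)
in_input_space = poly2_kernel(x, y)
in_feature_space = dot(poly2_feature_map(x), poly2_feature_map(y))
print(math.isclose(in_input_space, in_feature_space))
```

The kernel value is obtained without ever forming the mapped vectors, which is what makes the Gaussian kernel usable even though its feature space is infinite-dimensional.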

PLS
PLS is an algorithm used to fit the relationship between dependent variables and two or more independent variables. (20)(21)(22) Prediction data of PLS were divided into several regions described by the feature vector. Within the same region, vectors are orthogonal to each other; in different regions, vectors are associated with the model based on the previous channel.
For n-dimensional sample vectors, PLS extracts the components t and u from the independent and dependent variable data, respectively. The components t and u must carry as much as possible of the variation in their respective data tables while being maximally correlated with each other. The independent variables are then regressed on t and the dependent variables on u. If the regression is not satisfactory, the next round of extraction is carried out on the residuals until a satisfactory precision is obtained.
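The extraction of the first component can be sketched for the special case of a single response variable (often called PLS1), where u is simply the centered response. This is a simplified illustration, not the model fitted in this study; the function names and the deliberately collinear toy data are hypothetical.

```python
import math


def center(values):
    """Return the mean-centered values and the mean."""
    m = sum(values) / len(values)
    return [v - m for v in values], m


def pls1_first_component(X, y):
    """First PLS component for a single response.

    w is the unit weight vector proportional to X^T y, t = X w are the
    scores, and q is the regression coefficient of y on t.
    """
    n, p = len(X), len(X[0])
    cols = [center([row[j] for row in X]) for j in range(p)]
    Xc = [[cols[j][0][i] for j in range(p)] for i in range(n)]
    x_means = [cols[j][1] for j in range(p)]
    yc, y_mean = center(y)
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    return w, t, q, x_means, y_mean


def pls1_predict(x_new, w, q, x_means, y_mean):
    """Predict the response for one new sample from the first component."""
    t_new = sum((x_new[j] - x_means[j]) * w[j] for j in range(len(w)))
    return y_mean + q * t_new


# Hypothetical, perfectly collinear features (second column = 2 x first).
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
y = [0.0, 6.0, 12.0, 18.0]
w, t, q, x_means, y_mean = pls1_first_component(X, y)
prediction = pls1_predict([5.0, 10.0], w, q, x_means, y_mean)
print(round(prediction, 6))
```

Because the two features are perfectly correlated, ordinary least squares would be ill-conditioned here, whereas a single PLS component captures the relationship; this is the motivation for using PLS on correlated voltammetric data.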

PLS-SVM
The PLS-SVM is the combination of the PLS and the SVM. (23)(24)(25) The procedure consists of two steps:
(1) PLS is used for data analysis and processing.
(2) Training samples are simplified with SVM to obtain the support vector and optimal hyperplane, and then the forecast is completed.

H.U.
During the experiments on the quality of eggs, the first experiment determined whether the purchased eggs were fresh, namely, whether the grade of eggs was AA. The next experiment was for determining the eggs' grade for different storage lives and the eggs' corresponding storage days. The results showed that the grade of the same batch of Lindian eggs was AA. The H.U. value was 92.55 (± 8.5). Table 1 shows the result for different storage days by the H.U. method.
As can be seen from Table 1, the H.U. values of eggs were all above 85. Under the conditions of room temperature and 70 to 80% relative humidity, the quality of the eggs fell to A grade when they were stored for 6 d and to B grade when they were stored for more than 30 d. Eggs stored for more than 36 d had basically gone bad and become C grade. These eggs could no longer be sold.

Feature data extraction
After the experiments on mixed liquors of egg yolks and egg whites with the electrochemical system based on the three-electrode structure, the corresponding current-voltage curves were obtained. To compare these curves easily, the current-voltage curves for different storage lives were superimposed. Figure 2 shows the current-voltage curves of the mixed liquors of egg yolks and egg whites tested after storage for 0, 6, 12, 18, 24, 30, and 36 d.
Because each current-voltage curve of the mixed liquors of egg yolks and egg whites tested using the electrochemical system contained a huge amount of data, it was important to extract characteristic values. The current values at five special points on the voltammetry curves (the first peak current, the second minimum current, the second peak current, the third peak current, and the first minimum current) were selected as characteristic values for each sample.
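Picking peak and minimum currents from a sampled curve can be sketched as a search for local extrema. The code below is an illustrative assumption about how such points might be located, not the extraction procedure used in this study; the trace is synthetic, and measured curves would normally need smoothing first.

```python
def local_extrema(current):
    """Return the indices of strict local maxima (peaks) and minima (valleys)."""
    peaks, valleys = [], []
    for i in range(1, len(current) - 1):
        prev, cur, nxt = current[i - 1], current[i], current[i + 1]
        if cur > prev and cur > nxt:
            peaks.append(i)
        elif cur < prev and cur < nxt:
            valleys.append(i)
    return peaks, valleys


# Synthetic current trace with three peaks and two minima (not measured data).
trace = [0.0, 1.0, 3.0, 2.0, 0.5, 1.5, 4.0, 2.5, 1.0, 2.0, 5.0, 3.0]
peaks, valleys = local_extrema(trace)

# Feature vector in the order: first peak, second minimum, second peak, third peak.
features = [trace[peaks[0]], trace[valleys[1]], trace[peaks[1]], trace[peaks[2]]]
print(features)
```

Reducing each curve to a handful of extremum currents keeps the input dimension small for the later discriminant and regression models.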

The relationships between the H.U. values of eggs for different storage times and the characteristic values could be established. As shown in Fig. 3, the first peak current, the second minimum current, the second peak current, and the third peak current varied widely with storage time, whereas the first minimum current changed little. Thus, for the egg mixed liquors, it was feasible to select the first peak current, the second minimum current, the second peak current, and the third peak current as characteristic values.
Figure 4 shows the results of the mixed liquors of egg yolks and egg whites analyzed by discriminant function. DFA was used to determine the different grades of the mixed liquors. The sum of the contribution rates of DF1 and DF2 was 96.7%. With this contribution rate, the different grades of eggs could be distinguished, and the individual samples within the same grade were relatively concentrated.

Prediction of storage life of eggs
In this study, the grade and the storage life of eggs were found to correspond to each other. In practice, predicting the storage life of eggs is more important than detecting the egg grade because it indicates the freshness of eggs more precisely. The prediction of the storage life of eggs was therefore the main focus of the subsequent analysis.

SVM
The Gaussian radial basis function (RBF) kernel was used in the SVM model to predict the storage days of eggs. The data points were randomly divided into a training set (80%) and a testing set (20%). The training data were used to find the model parameters, and the testing data were used to evaluate the classification performance of the learned model. Once the model had been trained, it was used to classify the testing set. The results are as follows: the accuracy of the prediction of storage life is 94.29%, the penalty factor C is 100, the mean square error (MSE) is 0.1189, and the training time is 1.89 s. Figure 5 shows the result of the classification of the testing set.
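The random 80/20 division can be sketched as follows. The sample count of 350 is taken from the 50 × 7 data sets described in the experimental section; the function name and the seed are hypothetical, and the remark about 66 of 70 test samples is an inference from the reported percentage rather than a figure stated in the results.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Shuffle sample indices and split them into training and testing sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def accuracy_percent(n_correct, n_total):
    """Classification accuracy as a percentage rounded to two decimals."""
    return round(100.0 * n_correct / n_total, 2)


train_idx, test_idx = train_test_split_indices(350)
print(len(train_idx), len(test_idx))
print(accuracy_percent(66, len(test_idx)))
```

With 350 samples, the testing set holds 70 samples, so an accuracy of 94.29% would correspond to 66 correctly classified test samples under this assumption.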

PLS
We constructed a PLS model for the storage life prediction of eggs. PLS creates a linear relationship from principal components of the input data. The squared correlation coefficient obtained from the PLS model was 0.9382 ($r^2 = 0.9382$), which indicated that the model was valid. The MSE of the PLS model was 0.5146, and the training time was 1.5 s. Figure 6 shows the correlation curve of the actual and predicted storage lives of eggs based on the PLS model.
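The two figures of merit quoted for the regression model can be computed as in the sketch below. The exact definitions used in this study are not stated, so the code assumes the usual mean square error and the coefficient of determination; the function names and the example values are hypothetical.

```python
def mean_square_error(actual, predicted):
    """Average squared difference between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical actual and predicted values (not results from the study).
actual = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 4.0]
print(mean_square_error(actual, predicted), r_squared(actual, predicted))
```

A value of $r^2$ close to 1 together with a small MSE indicates that the predicted storage lives track the actual ones closely.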

PLS-SVM
The principal components obtained from the PLS analysis were used as the training data set, the SVM was then used to establish the regression model, and finally the predictive analysis was carried out. The kernel function in the SVM was the Gaussian RBF. The prediction accuracy for classification was 97.14%. The penalty factor was 100 (C = 100). The mean square error was 0.0567 and the training time was 1.3 s. Figure 7 shows the prediction for storage days based on PLS-SVM.
As shown in Fig. 7, PLS-SVM performs well in predicting egg storage days. To rule out chance effects caused by the random choice of training set and testing set, tenfold cross-validation was used to verify the performance of the PLS-SVM model. Table 2 shows the predicted results of egg storage days based on the tenfold cross-validation. The results are as follows: the accuracy of prediction of storage life is between 74.29 and 100%, the average accuracy of prediction of storage life is 88.57%, and the MSE is 0.2869.
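The partitioning behind tenfold cross-validation can be sketched as follows. This is a generic illustration, not the procedure actually run for Table 2: the function names and seed are hypothetical, the sample count of 350 comes from the 50 × 7 data sets described earlier, and the correspondence between the reported percentages and counts out of 35 is an inference.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def accuracy_percent(n_correct, n_total):
    """Classification accuracy as a percentage rounded to two decimals."""
    return round(100.0 * n_correct / n_total, 2)


splits = list(k_fold_indices(350, k=10))
fold_sizes = [len(test) for _, test in splits]
print(fold_sizes)
print(accuracy_percent(26, 35), accuracy_percent(31, 35))
```

Each sample is used for testing exactly once, so the averaged accuracy depends less on one lucky or unlucky split than a single 80/20 division does. With 35 samples per fold, 74.29% and 88.57% would correspond to 26 and 31 correct predictions, respectively.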

Conclusions
On the basis of the physicochemical properties of a mixture of egg yolk and egg white, an electrochemical system based on a three-electrode structure and square-wave voltammetry was used in this study to identify the grade of eggs and the corresponding storage days.
The traditional method of measuring the freshness of eggs, H.U. detection, was used as a reference to characterize the egg storage quality. The different grades of eggs were successfully identified by applying DFA to the square-wave voltammetry measurements of the mixture of egg yolk and egg white.

The SVM was established on the basis of the Gaussian RBF kernel function and the penalty factor C (C = 100). Its accuracy in predicting storage life was 94.29%. Partial least squares and the support vector machine were combined to improve the forecast accuracy of egg storage days to 97.14%. Tenfold cross-validation was used to evaluate the classification performance of the PLS-SVM model, and the average accuracy of the prediction of storage life was 88.57%.