Recognition of Eyelid Movement Using Electroencephalographic Signals

Eyelid movement patterns are a key factor in the detection of fatigue, and in this study, electroencephalography (EEG) was used to record the brainwave patterns associated with eyelid movement in subjects during various stages of fatigue. The three movements involved were no eyelid movement, closing the eye, and opening the eye. The collected signals were processed using the wavelet transform (WT) to break down the EEG signal and obtain the main features. The support vector machine (SVM) and back propagation neural network (BPNN) were used to determine eyelid movement conditions.


Introduction
Fatigue is a normal physiological reaction that usually follows strenuous exercise, intense mental exertion, or anxiety. Fatigue is temporary and subsides with rest, even after a very intense activity or a panic attack. Fatigue can be exacerbated by insufficient sleep, a heavy workload, or undue mental stress. Driving a vehicle while fatigued can be very dangerous and can cause traffic accidents that result in serious injury or death.
The harm that can result from driving when fatigued can be very serious. The state and degree of fatigue are closely related to eyelid movement, and in this study, physiological signals were collected to explore eyelid movement patterns. Caton (1) detected potential differences in the cerebral cortex of animals using a galvanometer in 1875, and Berger (2) recorded the first electroencephalogram (EEG) in 1931. EEG equipment has now become very sophisticated: emotional reactions (3) and concentration (4) in humans can be studied in considerable detail, (5) and EEG signals can even be used with game programs. (6) Ramachandran and Sazali (7) induced emotions by vision and hearing, and analyzed them with multichannel EEG signals. Chu et al. (8) collected EEG signals from schizophrenic patients, induced emotions by vision, and correlated the EEG signals with the level of illness. Studies of concentration or attention using EEG signals are an important recent focus. Morris et al. (9) investigated the salience of speech contrasts in noise, in relation to how listening attention affected scalp-recorded cortical responses. Kim et al. (10) studied the EEG signals generated before an exercise in an attempt to classify the subjects' intentions. They discovered that the EEG signal generated during preparation for an exercise could be used to control brain-machine interfaces (BMIs). Sestito et al. (11) used EEG signals to explore visual perception in pilots. Abiri et al. (12) explored various EEGs and brain-computer interface (BCI) systems, and presented the advantages and disadvantages of various examples. Chiang et al. (13) explored a detection model that enhances attention during learning with EEGs; their results indicated that having a nap in the afternoon enhances attention in a follow-up learning process. Das et al. (14) used EEG signals to detect the cognitive ability of concentrated and focused attention and working memory.
In their classification, the recognition rates were 84 and 81%, respectively. Chen et al. (15) decomposed an EEG into multiple bands using the wavelet packet transform (WPT) and used a combination of the synchronization likelihood (SL) and the minimum spanning tree (MST) to evaluate the degree of drowsiness in vehicle drivers. This raised the accuracy of the k-nearest neighbors (KNN) classifier to 98.6%. Khalaf et al. (16) used EEGs to explore different degrees of attention through auditory stimulation at 40 Hz.
Recent studies related to concentration or attention have used many different methods; one of the most frequently used has been the wavelet transform (WT). Stankovic and Falkowski (17) proposed the WT analysis method, their main objective being to show results unavailable through the conventional Fourier transform (FT). Mallat (18) introduced multiresolution analysis to wavelet analysis and constructed a wavelet function that could decompose and reconstruct a signal. Daubechies (19) developed a compactly supported orthogonal wavelet, now known as the Daubechies wavelet, which is primarily applied in the discrete wavelet transform (DWT) and widely used for digital signal analysis, signal compression, and noise filtering. Courmontagne et al. (20) used wavelet signals to solve the problem of noise in underwater acoustic signals. Chen et al. (21) used EEG signals to explore the attention level and used DWT to decompose the signal and obtain information from each frequency band. Belle et al. (22) used electrocardiogram (ECG) and EEG signals to analyze brain stimulation in studies of attention, and also used DWT in processing the EEG signals. Chen et al. (23) used the WPT to decompose an EEG signal into multiple frequency bands to evaluate the degree of drowsiness in vehicle drivers, and then used the phase lag index (PLI) to evaluate the information from each frequency band. Hazarika et al. (24) also used DWT to decompose an EEG signal into its frequency bands when investigating how long-term action video gaming modulates the neural processes of the inhibitory control mechanism. Wang et al. (25) converted EEG signals to individual frequency bands using the WPT in a driver drowsiness investigation. Artificial neural networks (ANNs) are frequently used in machine learning and cognitive science; such mathematical models imitate the structure and function of biological neural networks.
At present, ANN technology is widely used in pattern recognition and classification, because it is robust and has good learning capability. The ANN learning models used in studies of EEG signals often include a back propagation neural network (BPNN), (26,27) a probabilistic neural network (PNN), (28) or a general regression neural network (GRNN). (29) Such studies collect brain wave signals, select sample features, and then use a classifier to recognize eyelid motion.

Materials and experimental setup
In this study, a NeuroSky Mindwave headset, which uses a dry electrode, was used to collect EEG signals (Fig. 1). A dry electrode, as opposed to adhesive conductive electrodes, is less limited by the environment. The sampling rate of this device is 512 Hz. The measurement position is the left frontal lobe (Fp1), and the earlobe (A1) is used as the reference point (Fig. 2).
The learning procedure used was as follows: the subjects stood in front of the computer camera and opened and closed their eyes once per second, fifteen times. Each opening and closing of the eyes was identified from the captured images and correlated with the brainwaves collected simultaneously (in frames of 128 data points). The features for each frame were determined and classified. Nine subjects took part in the experiments, and the data from five, chosen at random, were used as training samples. The features from the experiments with the remaining four subjects were used as test samples for the support vector machine (SVM) and BPNN, and the recognition rate of each classifier was calculated separately. The flow from signal collection to recognition and classification is shown in Fig. 3. The hardware and software configuration of the computer used in this study is shown in Table 1.

Signal analysis and feature calculation
The original signal collected by the EEG headset and PC program is in the time domain. This signal includes several different types of noise, which makes it harder to distinguish the eyelid movement status. Therefore, processing and analysis must be carried out to separate the features from the original noisy signal. The most common conversion method is the FT, which is well suited to filtering or compressing periodic signals but is not very effective for these noisy signals. The WT, in contrast, is built primarily on the mother wavelet:

$$\psi_{a,\tau}(t) = \frac{1}{\sqrt{a}}\,\psi\!\left(\frac{t-\tau}{a}\right) \tag{1}$$

where $\psi(t)$ is the mother wavelet, $1/\sqrt{a}$ is the normalization factor that maintains the wavelet orthogonal base, $\tau$ is the translation parameter, and $a$ is the dilation of the mother wavelet. The term $(t - \tau)$ produces a horizontal movement, while $a$ stretches or compresses the wavelet; the effects of different $a$ and $\tau$ values on the mother wavelet are shown in Fig. 4. DWT is the simplified form of the continuous wavelet transform (CWT). Since CWT calculates the inner product at different times and scales of the mother wavelet, its computational load is large. DWT sends the original signal through both high-pass and low-pass filters, which correspond to the wavelet and scaling functions, respectively. The brainwave signal is divided into an approximated signal and a detailed signal after passing the low-pass and high-pass filters; in standard multiresolution notation, the related equations are

$$A_j(t) = \sum_k c_j(k)\,\varphi_{j,k}(t) \tag{2}$$

$$D_j(t) = \sum_k d_j(k)\,\psi_{j,k}(t) \tag{3}$$

Here, $\varphi(t)$ is the scaling function, $\psi(t)$ is the wavelet function, $\varphi_{j,k}$ and $\psi_{j,k}$ are their dilated and translated versions at layer $j$, and $d_j$ and $c_j$ are the wavelet and scaling coefficients of the $j$ layer, respectively. In this study, the Daubechies (19) wavelet function is used for signal decomposition. In Matlab, the Daubechies wavelet function is expressed in the form dbA, where A is the number of vanishing moments of the Daubechies wavelet. The db4 member of the Daubechies wavelet family is used in this study. The waveforms of the wavelet and scaling functions are shown in Fig. 5. The horizontal axis shows the time and the vertical axis shows the amplitude.
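The five-level db4 decomposition described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the Matlab code used in the study: it assumes periodic signal extension (the extension mode used in the study is not stated), it uses the standard 8-tap db4 low-pass coefficients rounded to ten digits, and the function names `dwt_step` and `wavedec` are hypothetical.

```python
import math

# Standard 8-tap Daubechies db4 low-pass (scaling) filter, rounded to 10 digits.
DB4_LO = [
    0.2303778133, 0.7148465706, 0.6308807679, -0.0279837694,
    -0.1870348117, 0.0308413818, 0.0328830117, -0.0105974018,
]
# High-pass (wavelet) filter obtained from the quadrature-mirror relation.
DB4_HI = [(-1) ** k * DB4_LO[len(DB4_LO) - 1 - k] for k in range(len(DB4_LO))]


def dwt_step(x):
    """One DWT level with periodic extension: returns (approximation, detail)."""
    n = len(x)
    approx, detail = [], []
    for i in range(n // 2):
        # Samples under the filter, wrapping around the end of the frame.
        window = [x[(2 * i + k) % n] for k in range(len(DB4_LO))]
        approx.append(sum(h * v for h, v in zip(DB4_LO, window)))
        detail.append(sum(g * v for g, v in zip(DB4_HI, window)))
    return approx, detail


def wavedec(x, levels=5):
    """Return [A_levels, D_levels, ..., D_1]; frame length must be divisible by 2**levels."""
    details = []
    approx = list(x)
    for _ in range(levels):
        approx, d = dwt_step(approx)
        details.append(d)
    return [approx] + details[::-1]


# Usage: a synthetic 128-point frame sampled at 512 Hz (4 Hz plus 60 Hz components).
frame = [
    math.sin(2 * math.pi * 4 * n / 512) + 0.2 * math.sin(2 * math.pi * 60 * n / 512)
    for n in range(128)
]
coeffs = wavedec(frame, levels=5)
a5 = coeffs[0]  # level-5 approximation coefficients
```

Because the filters are orthonormal, the total energy of the coefficients equals the energy of the input frame, which gives a quick sanity check on any implementation.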
The Daubechies wavelet is mainly used for the DWT and is frequently used in digital signal analysis. After the db4 WT, the frequency band of the original brainwave corresponding to each level can be calculated from Eq. (4), where $f$ is the upper limit of the frequency at level $j$, $F_s$ is the sampling frequency, and $N_p$ is the number of input data points. Assume that the brainwave signal being read is $X_N(n)$, where $N$ denotes the $N$th frame of A5 frequency band data. After each frequency band is captured via WT, the following features can be derived: maximum, minimum, summation, range, standard deviation, and median absolute deviation.
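The six features listed above can be computed per frame as in the following sketch. The function name `frame_features` and the dictionary keys are hypothetical, the standard deviation uses the sample (n − 1) form as an assumption, and the text does not state whether the features are taken from the A5 coefficients or from a reconstructed A5 signal, so the input here is simply "one frame of band-limited samples".

```python
import statistics


def frame_features(x):
    """Six summary features of one frame of band-limited samples."""
    med = statistics.median(x)
    return {
        "maximum": max(x),
        "minimum": min(x),
        "summation": sum(x),
        "range": max(x) - min(x),
        # Sample standard deviation (n - 1 in the denominator): an assumption.
        "standard_deviation": statistics.stdev(x),
        # Median of the absolute deviations from the median.
        "median_absolute_deviation": statistics.median(abs(v - med) for v in x),
    }


# Usage with a small illustrative frame.
features = frame_features([1.0, 2.0, 4.0, 7.0, 11.0])
```

For this illustrative frame the median is 4, so the absolute deviations are 3, 2, 0, 3, and 7, and their median is 3.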

SVM
SVM is a method proposed by Vapnik (30) in 1999. It solves many classification problems and is very popular in machine learning. SVM is a type of supervised learning method that establishes a decision plane (a hyperplane) from the training data and predicts the output for a given input via that hyperplane. SVM can be linear or nonlinear. A linear SVM finds a separating hyperplane from the input training data that maximizes the margin between the two types of data. The hyperplane is defined through the function

$$f(x) = w \cdot x + b$$

If $f(x)$ is greater than 0, that piece of data is classified as +1; if it is less than 0, it is classified as −1. However, under this rule alone, infinitely many combinations of $w$ and $b$ separate the data. The main objective is to choose the hyperplane with the largest margin, which maximizes the separation of the data and effectively reduces test errors. For training samples $x_i$ with labels $y_i \in \{+1, -1\}$, the conditions that must be satisfied are

$$w \cdot x_i + b \ge +1 \quad \text{for } y_i = +1 \tag{11}$$

$$w \cdot x_i + b \le -1 \quad \text{for } y_i = -1 \tag{12}$$

Equations (11) and (12) can be combined into the single inequality

$$y_i\,(w \cdot x_i + b) - 1 \ge 0 \tag{13}$$

The distance from the hyperplane $w \cdot x + b = 0$ can be calculated from Eqs. (11) and (12): the distance to the nearest data on either side is $1/\|w\|$, and the margin is $2/\|w\|$. To find the maximum margin of the hyperplane, it is necessary to find the minimum of $\|w\|^2$ (equivalently, of $\frac{1}{2}\|w\|^2$) under the condition of Eq. (13). Any $x_i$ for which the equality holds is a support vector. Lagrange optimization can be used to find the minimum of $\|w\|^2$. The Lagrange function is expressed as

$$L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_i \alpha_i \left[ y_i\,(w \cdot x_i + b) - 1 \right] \tag{14}$$

In Eq. (14), $\alpha_i$ is the Lagrange coefficient, and $\alpha_i \ge 0$. Setting the partial derivatives of Eq. (14) with respect to $w$ and $b$ equal to 0 gives the identities of Eqs. (15) and (16):

$$w = \sum_i \alpha_i y_i x_i \tag{15}$$

$$\sum_i \alpha_i y_i = 0 \tag{16}$$

Substituting Eqs. (15) and (16) into Eq. (14) gives the dual problem, which is maximized with respect to $\alpha_i$:

$$L_D(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j\,(x_i \cdot x_j) \tag{17}$$

The optimal solution must also satisfy

$$\alpha_i^{*} \left[ y_i\,(w^{*} \cdot x_i + b^{*}) - 1 \right] = 0 \tag{18}$$

Equation (18) can be satisfied with a nonzero $\alpha_i^{*}$ only if $x_i$ lies on the margin. Such an $x_i$ is the vector closest to the optimal separating hyperplane; that is, any $x_i$ with $\alpha_i^{*} > 0$ can be regarded as a support vector. Finding the support vectors is the same as finding the maximum margin. Finally, the function that classifies the data can be generalized as

$$f(x) = \sum_i \alpha_i^{*} y_i\,(x_i \cdot x) + b^{*} \tag{19}$$

When $f(x) > 0$, the data are assigned to the class labeled +1; otherwise, they are assigned to the other category. Another important point of SVM is the kernel function. SVM can map the input dataset into the feature space using the kernel function, which replaces the inner product $x_i \cdot x$. Different kernel functions give different classifications. The common kernel functions include linear, polynomial, radial basis function (RBF), and sigmoid. In this study, the polynomial kernel function [Eq. (20)] was used for classification.
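To make the role of the support vectors and the kernel concrete, the sketch below evaluates a kernelized decision function of the form f(x) = Σ α_i y_i K(x_i, x) + b. It is an illustration only: the kernel parameters (`degree`, `coef0`) are assumptions because the study's polynomial kernel settings are not given here, the support vectors and multipliers are hand-picked toy values rather than trained results, and all function names are hypothetical.

```python
def poly_kernel(u, v, degree=3, coef0=1.0):
    """Polynomial kernel K(u, v) = (u . v + coef0) ** degree."""
    return (sum(a * b for a, b in zip(u, v)) + coef0) ** degree


def svm_decision(x, support_vectors, labels, alphas, bias, kernel):
    """Decision value f(x) = sum_i alpha_i * y_i * K(x_i, x) + b."""
    return sum(
        a * y * kernel(sv, x)
        for sv, y, a in zip(support_vectors, labels, alphas)
    ) + bias


def svm_classify(x, support_vectors, labels, alphas, bias, kernel):
    """Class +1 when f(x) > 0, otherwise -1."""
    f = svm_decision(x, support_vectors, labels, alphas, bias, kernel)
    return 1 if f > 0 else -1


# Toy model: two support vectors with hand-picked multipliers (degree 1 for easy checking).
svs = [(1.0, 1.0), (-1.0, -1.0)]
ys = [1, -1]
alphas = [0.25, 0.25]  # sum(alpha_i * y_i) = 0, as the dual constraint requires
bias = 0.0


def linear_poly(u, v):
    """Degree-1 polynomial kernel, chosen so the values can be verified by hand."""
    return poly_kernel(u, v, degree=1, coef0=1.0)


f_pos = svm_decision(svs[0], svs, ys, alphas, bias, linear_poly)  # on the +1 margin
f_neg = svm_decision(svs[1], svs, ys, alphas, bias, linear_poly)  # on the -1 margin
label = svm_classify((2.0, 0.5), svs, ys, alphas, bias, linear_poly)
```

With these toy values both support vectors sit exactly on the margin, so their decision values are +1 and −1. Raising `degree` above 1 bends the decision boundary without changing the form of the decision function.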

BPNN
An ANN is a biomimetic network formed by connecting many artificial neurons. Each neuron applies a transform function to the weighted sum of its input values, which expresses the relationship between the input and output values. An artificial neuron model is shown in Fig. 6.
As shown in Fig. 6, the output value of the $j$th unit in the $n$th layer is a nonlinear function of the output values of the units in the $(n-1)$th layer, shown as

$$A_j^n = f\left(net_j^n\right), \qquad net_j^n = \sum_i W_{ij} A_i^{n-1} - \theta_j \tag{21}$$

In the above formula, $net_j^n$ is the integrated function and $f$ is the transform function; $A_i^{n-1}$ is the output of the $i$th unit in the $(n-1)$th layer, $W_{ij}$ is the weight connecting it to the $j$th unit, and $\theta_j$ is the threshold of the $j$th unit. BPNN is currently the most widely applied neural network learning model. The model receives features at the input layer, transmits the data to the hidden layer through initially set weights, and transforms the weighted sum into the outputs of the hidden layer via the transform function. During the process, the transform function is responsible for the summation and transformation of the input signals, and transmits the signals to the next layer. The most commonly used transform function is the S-type (sigmoid), which saturates in both the positive and negative directions. The S-type transform function selected for use in this study is the hyperbolic tangent function, shown as

$$f(x) = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \tag{22}$$

The hidden layer increases the capacity of the neural network to simulate complex nonlinear relationships. However, if there are too many hidden layers, the network may memorize the data structure of the training group, and the resulting overfitting is not a trivial matter. Further weighting and summation take place from the hidden layer to the output layer, so that hidden layer information is transformed into output data. After this, the error function is used to calculate the difference between the ideal output value and the output value calculated by the neural network. The error function used in this study is the mean squared error (MSE), written here with the conventional factor of 1/2 that simplifies differentiation:

$$E = \frac{1}{2} \sum_k \left(T_k - Y_k\right)^2 \tag{23}$$

where $T_k$ is the target value of the output layer and $Y_k$ is the inference value of the output layer calculated by the neural network. After the forward pass is complete, the backward pass begins: the error is propagated back from the output layer, and the weights are updated to minimize the error function.
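The learning process of the entire BPNN is the minimization of the error function, usually by the steepest descent (gradient) method. Each time a training sample is input, the weights in the network are adjusted slightly, and the size of the adjustment is proportional to the sensitivity of the error function to the weight. The equation for the weight change is shown as

$$\Delta W_{ij} = -\eta\,\frac{\partial E}{\partial W_{ij}} \tag{24}$$

In Eq. (24), $W_{ij}$ is the weight between the $i$th processing unit of the $(n-1)$th layer and the $j$th processing unit of the $n$th layer, and $\eta$ is the learning rate, which controls the step size taken at each iteration of the steepest descent method. In the following, $A_j^n = f(net_j^n)$ denotes the output of the $j$th unit in the $n$th layer, with $net_j^n = \sum_i W_{ij} A_i^{n-1} - \theta_j$ as in Eq. (21). After using the chain rule to expand $\partial E / \partial W_{ij}$, Eq. (25) is obtained:

$$\frac{\partial E}{\partial W_{ij}} = \frac{\partial E}{\partial net_j^n}\,\frac{\partial net_j^n}{\partial W_{ij}} \tag{25}$$

Substituting the integrated function into Eq. (25) gives

$$\frac{\partial net_j^n}{\partial W_{ij}} = A_i^{n-1} \tag{26}$$

Substituting Eq. (21) into Eq. (25) gives

$$\frac{\partial E}{\partial net_j^n} = \frac{\partial E}{\partial A_j^n}\,\frac{\partial A_j^n}{\partial net_j^n} = \frac{\partial E}{\partial A_j^n}\,f'\!\left(net_j^n\right) \tag{27}$$

If the $n$th layer is the final layer, so that $A_j^n = Y_j$, then the substitution of Eq. (23) gives

$$\frac{\partial E}{\partial A_j^n} = -\left(T_j - A_j^n\right) \tag{28}$$

If the $n$th layer is not the final layer, but one of the hidden layers of the network, then Eq. (29) can be derived as

$$\frac{\partial E}{\partial A_j^n} = \sum_k \frac{\partial E}{\partial net_k^{n+1}}\,\frac{\partial net_k^{n+1}}{\partial A_j^n} = -\sum_k \delta_k^{n+1} W_{jk} \tag{29}$$

where $\delta_k^{n+1} = -\partial E / \partial net_k^{n+1}$. Finally, $\partial E / \partial W_{ij}$ can be written as the general equation

$$\frac{\partial E}{\partial W_{ij}} = -\delta_j^n\,A_i^{n-1} \tag{30}$$

In the above equation,

$$\delta_j^n = \begin{cases} \left(T_j - A_j^n\right) f'\!\left(net_j^n\right) & \text{if layer } n \text{ is the output layer} \\ \left(\sum_k \delta_k^{n+1} W_{jk}\right) f'\!\left(net_j^n\right) & \text{if layer } n \text{ is a hidden layer} \end{cases} \tag{31}$$

The nonlinear transform function used in this study was the hyperbolic tangent function. Differentiating Eq. (22) with respect to $net_j^n$, as required by Eq. (27), gives Eq. (32):

$$f'\!\left(net_j^n\right) = 1 - f\!\left(net_j^n\right)^2 = 1 - \left(A_j^n\right)^2 \tag{32}$$

Using the difference $\delta$, the weight change can be derived by the steepest descent method as

$$\Delta W_{ij} = \eta\,\delta_j^n\,A_i^{n-1} \tag{33}$$

Similarly, the equation for the change in the threshold $\theta$ is

$$\Delta \theta_j = -\eta\,\delta_j^n \tag{34}$$

From the above derivations, it can be seen that every backward pass yields the changes to the weights and the thresholds. By gradually updating the weights and thresholds, the error function converges.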
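The forward and backward passes described in this section can be sketched as a small network with one hidden layer, hyperbolic tangent units, and per-sample steepest descent updates. This is a minimal illustration under assumptions, not the network used in the study: the layer sizes, learning rate, number of epochs, random seed, toy data, and the name `train_bpnn` are all hypothetical, and the error is the half sum of squared differences between target and output.

```python
import math
import random


def train_bpnn(samples, targets, n_hidden=3, eta=0.1, epochs=200, seed=0):
    """Train a 1-hidden-layer tanh network; returns (forward, error_history)."""
    rng = random.Random(seed)
    n_in = len(samples[0])
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
    th_h = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    w_o = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    th_o = [rng.uniform(-0.5, 0.5)]  # single output threshold, kept in a list

    def forward(x):
        # net = weighted sum minus threshold; output = tanh(net).
        hidden = [
            math.tanh(sum(x[i] * w_h[i][j] for i in range(n_in)) - th_h[j])
            for j in range(n_hidden)
        ]
        y = math.tanh(sum(hidden[j] * w_o[j] for j in range(n_hidden)) - th_o[0])
        return hidden, y

    def total_error():
        return sum(0.5 * (t - forward(x)[1]) ** 2 for x, t in zip(samples, targets))

    history = [total_error()]
    for _ in range(epochs):
        for x, t in zip(samples, targets):
            hidden, y = forward(x)
            # Output-layer difference: (T - Y) * f'(net), with f' = 1 - tanh^2.
            delta_o = (t - y) * (1.0 - y * y)
            # Hidden-layer differences, computed before any weight is changed.
            delta_h = [delta_o * w_o[j] * (1.0 - hidden[j] ** 2) for j in range(n_hidden)]
            for j in range(n_hidden):
                w_o[j] += eta * delta_o * hidden[j]      # weight change = eta * delta * input
                th_h[j] -= eta * delta_h[j]              # threshold change = -eta * delta
                for i in range(n_in):
                    w_h[i][j] += eta * delta_h[j] * x[i]
            th_o[0] -= eta * delta_o
        history.append(total_error())
    return forward, history


# Toy data: the class is the sign of the first input (illustrative only).
toy_x = [(-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0), (1.0, 1.0)]
toy_t = [-1.0, -1.0, 1.0, 1.0]
forward, history = train_bpnn(toy_x, toy_t)
```

The returned `history` holds the total error before training and after each epoch, so comparing its first and last entries shows whether the updates are reducing the error function.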

Results and Discussion
The brainwave sampling rate used was 512 Hz, and the action of opening or closing the eyes was found to take about 0.25 s. To obtain the waveforms associated with the actual opening and closing eyelid movements, and to avoid including excessive signal from periods in which the eyes were not opening or closing, the number of input points was reduced to 128 (0.25 s at 512 Hz), as shown in Fig. 7. The overlap was set to 78% (1 − 28/128 ≈ 0.78), so that only 28 brainwave data points were updated at a time. This prevented the horizontal distance between frames that captured a brainwave from becoming too large, which would cause recording failure. The frequency band corresponding to each wavelet level is obtained via Eq. (4), and the corresponding brainwave bands are shown in Table 2.
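The framing scheme described above (frames of 128 points, advanced by 28 points at a time) can be sketched as follows. The function name `sliding_frames` is hypothetical, and the placeholder signal is only there to show the frame count and overlap arithmetic.

```python
def sliding_frames(signal, frame_len=128, hop=28):
    """Split a sample sequence into overlapping frames; only full frames are kept."""
    return [
        signal[start:start + frame_len]
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


FS = 512                                   # sampling rate in Hz
frames = sliding_frames(list(range(FS)))   # one second of placeholder samples
overlap = 1 - 28 / 128                     # fraction of each frame shared with the previous one
frame_duration = 128 / FS                  # seconds covered by one frame
```

One second of data yields 14 full frames, each covering 0.25 s and sharing about 78% of its samples with the frame before it.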
The transformation of the original brainwave using a five-level WT allows various types of brainwave to be obtained and, in this study, the original brainwaves were decomposed to allow a search for a waveform that could be used as a basis for classification. The results are shown in Fig. 8.
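For orientation, the nominal frequency bands of a five-level dyadic decomposition at 512 Hz can be computed as below. This uses the standard relation that each level halves the band, which is assumed (not confirmed) to correspond to Eq. (4) and Table 2; the function name `dwt_band_limits` is hypothetical, and real wavelet filters overlap somewhat at the band edges.

```python
def dwt_band_limits(fs, levels):
    """Nominal (low, high) band edges in Hz for D1..D_levels and A_levels."""
    bands = {}
    for j in range(1, levels + 1):
        # Detail band at level j spans fs / 2**(j + 1) to fs / 2**j.
        bands[f"D{j}"] = (fs / 2 ** (j + 1), fs / 2 ** j)
    # The final approximation keeps everything below the last detail band.
    bands[f"A{levels}"] = (0.0, fs / 2 ** (levels + 1))
    return bands


bands = dwt_band_limits(512, 5)
```

Under this assumption the A5 band covers roughly 0 to 8 Hz, the slow part of the signal in which the eyelid movements are reported to be clearly distinguishable.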
As can be seen in Fig. 8, the opening and closing of the eyes can be clearly distinguished in the A5 wavelet-transformed frequency band, which was chosen for classification in this study.

Classification results of various feature combinations
The correct classification rate used in this study to analyze the above-captured frequency bands and select the features that could serve as the main basis for classification is defined as

$$\text{Correct classification rate} = \frac{\text{number of correctly classified samples}}{\text{total number of samples}} \times 100\%$$

Since the SVM and BPNN classifiers were used in this study, the experiments were performed separately in the feature selection section, and the features that offered good rates were used for subsequent experiments.
Three different modes were used in the following correct classification rate tests. In Test A, all the data were used for training and testing; in Test B, half the data were used for training and all the data were used for testing; and in Test C, half the data were used for training and the other half for testing. The results of the three tests were then compared.
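The three test modes can be expressed as index sets, as in the sketch below. How the data were halved is not stated, so taking the even-indexed samples as the training half is an assumption; the names `split_modes` and `correct_rate` are hypothetical.

```python
def split_modes(n_samples):
    """Return {mode: (train_indices, test_indices)} for Tests A, B, and C."""
    everything = list(range(n_samples))
    train_half = everything[0::2]   # assumption: even-indexed samples form the training half
    other_half = everything[1::2]
    return {
        "A": (everything, everything),   # train on all, test on all
        "B": (train_half, everything),   # train on half, test on all
        "C": (train_half, other_half),   # train on half, test on the other half
    }


def correct_rate(predicted, actual):
    """Percentage of predictions that match the true labels."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * hits / len(actual)


modes = split_modes(10)
rate = correct_rate([1, 0, 1, 1], [1, 0, 0, 1])  # three of four labels match
```

Only Test C evaluates the classifier on samples it has never seen, which is why it is the most demanding of the three modes.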
The signals of the A5 frequency band were used to calculate six features, namely, maximum, minimum, summation, range, standard deviation, and median absolute deviation. All the features were input into SVM and BPNN, and the classification results are shown in Table 3.
The experimental results in Table 3 show that SVM and BPNN were stable to a certain extent and had good classification accuracy. Both are suitable for the classification of brain waveforms. The selection of features is detailed in the next section and more meaningful features were selected to improve recognition.

Correct classification rate of feature groups
In this part of the study, the six features were divided into groups to compare the correct classification rates. Each group had two features, and the test method chosen was Test C. Figure 9 shows the separating hyperplane obtained from the SVM classification using the maximum and minimum values. Table 4 shows the classification results obtained using SVM and BPNN for three groups: the maximum and minimum, the summation and range, and the standard deviation and median absolute deviation. The results show that the correct classification rate is highest when the maximum and minimum features are used. This is because the closing and opening of the eyes produce the largest difference in the maximum and the minimum. When the number of input features was increased to three, the separating hyperplane of the SVM moved into three-dimensional (3D) space. In Fig. 10, the x-axis is the maximum, the y-axis is the minimum, and the z-axis is the summation. The comparative experiments with three features are summarized in Table 5. The results show that the maximum and minimum features are closely related to the correct classification rate, as are the summation and range. On the other hand, the results also show that the standard deviation and median absolute deviation reduce the correct classification rate; hence, the four-feature groups in the next experiment retain the maximum and minimum features and combine them with other features to find the four features that provide the highest correct classification rate.
The correct classification rates of the SVM and BPNN classifications that adopted four features are shown in Table 6.
The experimental data in this section show that when the maximum, minimum, summation, and range are defined as input features, the correct classification rate of SVM remains at around 95.00% and does not increase, whereas that of BPNN rises to 98.33%, which is very close to the rate expected for the offline classification in this study. Therefore, the follow-up online classification experiments used the maximum, minimum, summation, and range together with BPNN.

Online real-time detection recognition results
The features finally used in this study were the maximum, minimum, summation, and range. However, in online real-time detection and recognition, it is not possible to confirm the data collection status and segment the signal by replaying video clips. Therefore, this experiment used frames of 128 raw data points and updated 28 data points each time, which resulted in an overlap of 78%. Each subject closed and opened their eyes 20 times, and the results of online classification are shown in Table 7. The results show that online real-time recognition had an average recognition rate of 85%.

Conclusions
In this study, we used single-point EEGs and focused on the closing and opening of the eyes of the subjects to analyze the level of fatigue. The signal analysis methods used the signals in the A5 frequency band after WT. After the capture of six features from the signals, SVM and BPNN were used for training and analysis, and to determine the correlation between eye closing and opening. Experimental results showed that the calculation method using four features (maximum, minimum, summation, and range) was suitable for recognizing eye closing and opening movements, and that the maximum and minimum features were the most relevant in this respect. In the offline case, the recognition rate was up to 98.33%, and the average online recognition rate reached 85%. It is expected that these results will contribute to studies related to driving safety in the future.