Combination of Variational Mode Decomposition for Feature Extraction and Deep Belief Network for Feature Classification in Motor Imagery Electroencephalogram Recognition

Brain–computer interface (BCI) technology based on motor imagery (MI) can establish the connection between the brain and the outside world; this has gradually become an important application of human–machine hybrid intelligence enhancement, especially for medical rehabilitation treatment. Because of the nonlinear, nonstationary, and low signal-to-noise ratio (SNR) characteristics of electroencephalogram (EEG) signals, it is a great challenge to accurately classify MI-EEG signals. Toward this end, in this study, both variational mode decomposition (VMD) and a deep belief network (DBN) were applied to MI classification based on the dataset of a previous BCI competition. Firstly, the EEG signal was decomposed by VMD to obtain the narrow-band component. Then the marginal spectrum, the instantaneous energy spectrum under the characteristic frequency band, and time-frequency joint features were extracted by Hilbert transform to achieve feature fusion. Finally, the DBN was used to reduce the dimensions of high-dimensional features and recognize MI patterns. The experimental results show that the joint VMD feature extraction and DBN feature classification method avoids information omission caused by the manual optimization of the period and the frequency band of artificial determination imagery. On the other hand, the method of using VMD and DBN to automatically extract the characteristics of the optimal period and optimal frequency band can effectively improve the recognition rate of MI


Introduction
A brain-computer interface (BCI) system operates an output device by analyzing the input electrophysiological signals and decoding the user's intention into control instructions. (1) In accordance with the different forms of signals, an electroencephalogram (EEG) can be divided into electroencephalograph, magnetoencephalogram (MEG), and functional magnetic resonance imaging (FMRI) signals, where EEG signals are widely used in BCI systems due to their noninvasiveness and low cost. (2,3) The principle of motor imagery (MI) is to generate event-related desynchronization/ synchronization phenomena for EEG signals on both sides of the brain during imaginationinduced movement, which has become a research direction in the field of EEG signals. (4) EEG signals are, for example, weak, nonlinear, nonstationary, and sensitive in time; thus, timefrequency domain analysis and spatial filtering are widely used in feature extraction. (5) Timefrequency domain analysis mainly includes the short-term Fourier transform (STFT), (6) wavelet transform (WT), (7) wavelet package transform (WPT), (8) and common spatial pattern (CSP) method. (9) However, the time-frequency domain analysis methods based on STFT, WT, and WPT cannot achieve high resolution in the time and frequency domains. However, with a change in brain state, the characteristics of an EEG signal in the time-frequency domain will fluctuate. Chen et al. combined Shannon entropy, wavelet entropy, and sample entropy for feature extraction. (10) Tan et al. proposed a multifeature extraction method based on ensemble empirical mode decomposition (EMD) and approximate entropy. (11) In summary, the above methods show good adaptability and high recognition accuracy, consider the fusion of features from a single angle, and cannot obtain a more complete description of the signal from multiple angles. However, variational mode decomposition (VMD) can decompose an EEG signal into several narrow-band components, which contain the characteristics of the signal at different time and frequency scales. Dragomiretskiy et al. applied this phenomenon to solve the problem of mode mixing. (12) A deep belief network (DBN) can automatically learn valid features to reduce the dimensions of features. (13) Movahedi et al. applied some of these features to EEG signal processing, such as sleep stage and emotion recognition, and epilepsy diagnosis. (14) A DBN can reduce the dimensions of high-dimensional features to obtain the optimal features of MI and avoid reducing the recognition rate. With this background, we propose in this paper a method comprising a combination of VMD, Hilbert transform, and a DBN for the recognition of MI-EEG patterns while avoiding the reduction of the recognition rate caused by manually determining the optimal period and optimal frequency band.

Related Work
Most of the existing methods are based on the use of the spatial and time-frequency characteristics of EEG signals to classify MI. The methods based on spatial features are mainly the CSP method and its modifications. The CSP method is based on matrix diagonalization to construct the optimal spatial filter for projection and then obtain the feature vectors with high discrimination. Zhang et al. proposed a multicore extreme learning machine based on the CSP method, in which the CSP method is used to extract spatial features for MI tasks. (15) Zayyanu et al. proposed the filter bank common spatial pattern (FBCSP) to solve the problem that the effectiveness of the CSP method depends on the selection of an appropriate frequency band. (16) Lu et al. proposed a deep learning model based on a restricted Boltzmann machine (RBM) that uses the fast Fourier transform to extract time-frequency features in EEG signals. (17) In addition, Sun et al. proposed extraction methods based on bispectral features. (18) However, it is necessary to acquire considerable prior knowledge to extract EEG signal characteristics. The end-to-end deep learning framework combines multiple processing stages, such as data processing and feature extraction, into a single model, and establishes a direct projection from the input to the output. For this reason, the emergence of deep learning has provided ideas for the establishment of end-to-end models; Dai et al. attempted to construct end-to-end deep learning models for MI classification. (19) However, EEG signals have the characteristics such as nonlinearity, nonstationarity, and low signal-to-noise ratio (SNR). These characteristics give rise to several major problems in the construction of an end-to-end model based on EEG signals. Schirrmeister et al. used the end-toend learning advantages of ConvNets to build ConvNets models with three different architectures: the Deep-ConvNets model, Shallow-ConvNets model, and Hybrid-ConvNets model. (20) Zhao et al. proposed the WaSF ConvNet model as an improved ConvNets model to solve the problems of the traditional model, such as the difficulty in interpretation and the large number of parameters, to realize feature learning of space-time union. (21) However, the above models generally use single-scale convolution, and the features extracted using this structure are limited. Although there are a few multiscale models for MI classification, Tang et al. proposed a single-scale convolution model that relies on the preprocessing of EEG signals. However, the multiscale span of these models is too small to fully extract EEG features. (22) In addition, the classification results of most existing models depend, to some extent, on the richness of spatial information in EEG signals. Data acquisition with fewer channels is more convenient, so when spatial information is insufficient, how EEG signals are extracted and classified is an important problem. Therefore, we use VMD to decompose EEG signals. The marginal spectrum (MS), instantaneous energy spectrum (IES), and joint time-frequency features (JT-Fs) are extracted by Hilbert transform to cover the time-frequency domain features under the complete MI time history, and the three features are integrated. A DBN is introduced to reduce the dimensions of the high-dimensional features after fusion to obtain the optimal features of MI and recognize the MI-EEG pattern, which prevents the reduction of the recognition rate that occurs upon manual determination of the optimal period and optimal frequency band.

High-dimensional JT-F Extraction Based on VMD
The framework of feature extraction and recognition based on VMD and DBN is shown in Fig. 1. When the MI-EEG signal is collected, several electrodes fixed on the head are used to obtain multichannel signals, and all the electrodes record the signals generated by the brain at the same sampling frequency. The MI-EEG signal used by the participants of this study was typically collected between the time of the MI reminder and the end of the activity. The MI-EEG signal is defined as where n is the number of samples of the MI-EEG signal, T = t × f is the number of time points of each EEG signal (t is the duration of the MI segment and f is the sampling frequency in Hz), and C is the number of channels of the EEG signal. The MI-EEG signal classification problem can be defined as where F is the mapping function, X represents the input EEG data, and Y pre is the prediction result of the model output.
Since the WPT has the advantages of no redundancy and no omission, it can realize the local time-frequency analysis of the signal and meet the analysis requirements of an EEG signal containing many details. Therefore, after comparing various preprocessing algorithms, we decided to use the WPT to extract the characteristic frequency band. The wavelet packet function is expressed as (23) 2 0 where u n (t) is the expression of the signal at time t at decomposed node n, l is the displacement, and H 0 and H 1 are a pair of conjugate orthogonal wavelet packet filters satisfying H 1 (k) = (−1) k H 0 (1 − k). When n = 0, the scale function u 0 (t) = φ(t) and the wavelet basis function u 1 (t) = Φ(t) can be obtained. The WPT has a similar structure to a binary tree. If the nodes in the wavelet packet decomposition are denoted as ( j, p) (where j is the depth or the number of layers of wavelet packet decomposition and p is the frequency band order of the nodes), then after the wavelet packet decomposition of layer j, the frequency band range corresponding to node j is where f s is the sampling frequency. When the WPT is performed on an EEG signal, it is necessary to first select the appropriate wavelet basis function in accordance with the characteristics of the EEG signal and then to determine the characteristic frequency band of the EEG signal to determine the numbers of layers and reconstruction nodes of wavelet packet decomposition. In the VMD algorithm, the intrinsic mode function (IMF) is defined as the amplitude-frequency modulation signal, which is expressed as where A k (t) (A k (t) ≥ 0) defines the instantaneous amplitude, and φ k (t) is the phase. The frequency of IMF is defined as ω k (t) = φ k (t). A variational problem is first constructed. Assuming that the , the decomposition sequence is guaranteed to be a modal component with finite bandwidth with a central frequency, and the sum of the estimated bandwidth of each mode is minimum. The constraint condition is that the sum of all modes is equal to the original signal; then the corresponding expression for the constrained variational problem is where K is the number of modes to be decomposed, u k and ω k are the kth modal component and the center frequency after decomposition, respectively, δ(t) is the Dirac function, and ⁎ is the convolution operator. To solve Eq. (4), we introduce the Lagrange multiplication operator λ, transform the constrained variational problem into an unconstrained variational problem, and obtain the augmented Lagrange expression as where α is a quadratic penalty factor. The alternating-direction multiplier method is used to update where X is the signal to be decomposed, ω k = ω k+1 , and In this study, the MI-EEG signal was first filtered using an 8-30 Hz bandpass filter, and then K IMF components were obtained by VMD of the filtered EEG signal. To extract the timefrequency characteristics of the MI-EEG signal, the IMF components were decomposed by Hilbert transform as follows: where ℜ is the Cauchy principal value, t 0 is time in the time domain, a k (t) is the instantaneous amplitude, and θ k (t) is the instantaneous phase. Therefore, the instantaneous frequency is defined as The original EEG signal can be expressed as the sum of each IMF component.
In addition, the Hilbert spectrum is defined as The Hilbert MS, IES, and JT-F under the characteristic frequency band are calculated as follows.
For Eq. (15), to avoid the omission of time-frequency domain information caused by manual selection, T is set as the end moment of MI. Because channels C 3 and C 4 are distributed on the left and right sides of the brain, respectively, whereas channel C Z is at the center of the brain, in the left-handed and right-handed MI experiments on the subjects, only the EEG signals of channels C 3 and C 4 can reflect the event-related desynchronization/event-related synchronization (ERS/ERD) phenomenon. Therefore, in feature extraction, we do not process the channel C Z EEG signal, and the MS feature F MS of channels C 3 Thus, the time-frequency joint feature F JT-F of channels C 3 and C 4 is

High-dimensional Feature Classification Based on DBN
A DBN consists of several RBMs in the bottom layer and a classifier in the top layer. The two adjacent layers make up one RBM, and each layer is independent of the other. However, data can be transformed among layers in accordance with the learning rules of the RBM through specific activation functions. Figure 2 shows the structure of the five-layer DBN. In Fig. 2, each RBM is composed of a visible layer (v) and a hidden layer (h). The layers are interconnected with each other through weight w, but there is no connection within the layers. It is assumed that layer v includes n visible units and layer h includes m hidden units.
As shown in Fig. 2, the energy of one group of RBM systems in a certain state is where v i defines the state of the ith visible unit and h j is the jth hidden unitary state. , i j ∀ , v i ∈{0, 1}, h j ∈{0, 1}, and θ = {a i , b j , w ij } are the parameters of the RBM model, w ij is the connection weight between visible element i and hidden element j, and a i and b j are the offsets of visible element i and hidden element j, respectively. The joint probability density of the visible layer and hidden layer is defined as According to Eq. (24), σ is the activation function, and the probabilities of the hidden node and visible node being activated are To make the RBM fit the training samples well, the training objective is set as the maximum likelihood function, Contrastive divergence is used to obtain a better estimate of the visible layer with only a few state transitions. More specially, the algorithm first calculates the conditional probabilities of all the hidden layer elements, then uses Gibbs sampling to determine the states of the hidden elements. Following this, it calculates the state of the visual layer to complete the reconstruction of the visual layer and finally solves Eq. (26). Therefore, the updating process of weight and bias can be defined as where orig ⋅ denotes the expectation of the original data, recon ⋅ denotes the expectation of the reconstructed data, and ε is the learning rate. For the parameter ε, Liu et al. proposed the use of the parameter Momentum of the momentum term to make the convergence more accurate. (24) Momentum is defined as where ρ is the learning rate of Momentum. Then the update rule of the weight bias becomes (29)

Datasets and processes
To verify the universality and effectiveness of the proposed method, BCI 2003 Competition Dataset III (S1), BCI 2005 Competition Dataset III b (S2-S4), and BCI 2008 Competition Dataset II b (S5-S7) were used in the experiment. The information of each dataset is shown in Table 1.
During the signal collection process, for a reason related to the acquisition equipment, some of the time block signals were not successfully collected, and these data points were represented as nonnumeric points in the signals. To improve the SNR, the experimental data were preprocessed and nonnumeric points in the data were set to zero. To reduce the workload of data processing, a downsampling process was performed on the response data of subjects S5-S7, and the sampling frequency was reduced to 125 Hz. Since the ERD/ERS phenomenon of MI mainly occurred in the Mu rhythm (8)(9)(10)(11)(12)(13) and Beta rhythm (16)(17)(18)(19)(20)(21)(22)(23)(24)(25)(26)(27)(28), the original signal was passed through a bandpass filter of 8-30 Hz, which is a Butterworth filter of order six, and the stopband cutoff frequency of the Butterworth filter was set to 6 and 32 Hz, respectively. The learning rate, number of pretrainings, and number of fine-tunings of each subject are shown in Table 2.

Results and analysis
To compare the effect of different feature extraction methods on the recognition rate, three reference groups, i.e., F MS , F IES , and F JT-F , were designed and compared with the proposed method. Figure 3 shows the maximum recognition rate of each subject for the three reference groups and the proposed method. As shown in Fig. 3, the average maximum recognition rates obtained using MS features, IES features, time-frequency joint features, and fusion features are 60.5, 73.1, 77.5, and 78.5%, respectively. The classification effect of MS features is relatively The preprocessed EEG signal was decomposed by VMD in the experiment. The results of VMD of the channel C 3 EEG signal of the first subject and its component spectrum are shown in Fig. 4. In the experiment, the EEG signal was decomposed by EMD, which served as the benchmark algorithm of VMD. The EMD results and component spectrum of the channel C 3 EEG signal of the first subject are shown in Fig. 5. The frequencies of the IMF1 and IMF7 components of the adaptive decomposition are concentrated in the range of 8-30 Hz as a result of prefiltering of the EEG signal. In addition, the frequency band aliasing of components obtained by EMD is serious, while the results of VMD are all narrow-band components, so the same component information does not appear in different components. In the experiment, it was found that the characteristic frequency band is mainly distributed in the first three order components.   The first three order components of EMD were selected to extract the same high-dimensional features, and the DBN was used to reduce the feature dimensions and classify the features obtained by the two methods. The recognition rates obtained by the two decomposition methods were compared, and the recognition results are shown in Fig. 6. Because of the better decomposition effect, the features extracted by VMD have a better recognition rate than those extracted by EMD.

Conclusions
BCIs, as an important application of human-computer hybrid intelligence enhancement, can control external devices through brain activities and thus establish the connection between the brain and the outside world. In early studies, BCIs were mainly used for the rehabilitation of stroke patients, and later, they began to be used in a wide range of fields, such as wheelchair  control, spelling machines, and emotion recognition. BCIs can make full use of brain activities such as electrophysiological and hemodynamic activities to enable interaction between the brain and the outside world. In this study, the VMD method was applied to decompose the EEG signal using the Hilbert transform to extract the MI of MS, IES, and JT-F. In addition, the four-layer DBN network was introduced to achieve dimensionality reduction of the integrated highdimensional feature and to obtain the optimal feature and classification. The recognition rate of the MI-EEG signal was improved. The proposed method was used to conduct classification experiments with BCI competition datasets, and the average accuracy was 79%. The proposed method effectively improves the recognition rate of MI-EEG signals and lays a foundation for the application of MI BCIs.