Modular-neural-network-based Decision Fusion for Steady-state Visually Evoked Potential-based Brain–Computer Interfaces

It is very difficult for patients with severe disabilities to communicate with other people or to operate devices, which greatly reduces their quality of life. In this study, a steady-state visually evoked potential (SSVEP)-based brain–computer interface (BCI) is proposed to make communication easier for such patients. To precisely represent the characteristics of an elicited SSVEP, four features extracted by fast Fourier transform, canonical correlation analysis, magnitude-squared coherence, and power cepstrum analysis are used. To fuse the decision results obtained by using the different features, a modular neural network (MNN), which includes input models and a decision model, is adopted to improve the recognition performance. To balance the recognition performance and computational complexity, an artificial neural network based on multilayer perceptrons is selected as the basic unit of the MNN. On the basis of the different features, the input models of the MNN can quickly find the decision results. To effectively fuse the decision results, the decision model of the MNN is adopted to obtain a precise decision. The experimental results demonstrated that the MNN has a higher accuracy than other approaches. Therefore, the proposed SSVEP-based BCI with the MNN can effectively help patients interact with their surroundings.

To help patients communicate with others or devices, augmentative and alternative communication (AAC) systems have been developed. (2)(3)(4)(5) Unfortunately, these AAC systems, which rely on the voice, mouth and tongue movements, or the voluntary control of the user's limbs, are unsuitable communication mechanisms for patients with severe disabilities. Recently, brain-computer interfaces (BCIs) have been successfully applied to implement AAC systems by electroencephalography (EEG) analysis. (6)(7)(8)(9)(10)(11)(12) BCIs enable patients to directly communicate with others or send commands to an external device by measuring the brain activity. The brain activity patterns used in BCIs can be categorized into P300 potentials, (6,7) steady-state visually evoked potentials (SSVEPs), (8,9) and motor imagery. (10)(11)(12) Among these approaches, SSVEP-based BCIs, which rely on responses elicited by a visual stimulus flickering at a specific frequency, can achieve an excellent signal-to-noise ratio, making them very suitable for practical applications. Therefore, the development of SSVEP-based BCIs would greatly benefit patients in their daily lives.
Accurately representing SSVEP signals is one of the effective ways of improving the performance of SSVEP-based BCIs. (13,14) To find the frequencies corresponding to the visual stimulus frequencies, the fast Fourier transform (FFT) and Hilbert-Huang transform have been used to obtain the corresponding commands or messages. (15) Canonical correlation analysis (CCA) and magnitude-squared coherence (MSC) were proposed to find the correlation between the recorded signal and a set of predefined pure sine and cosine reference templates in the time and frequency domains, respectively. (16,17) Power cepstrum analysis (PCA) was also adopted to represent the characteristics of SSVEP signals. (18) Because these features describe SSVEP signals in different ways, fusing the results of different features should be able to effectively improve the performance of SSVEP-based BCI systems.
Over the last decade, artificial neural networks have been widely used in many applications. (18)(19)(20) In pattern recognition, neural networks have achieved superior performance to traditional methods. Researchers have started to build hybrid systems that combine the advantages of modular neural networks (MNNs). MNNs have the capability to learn different tasks simultaneously and reduce the complexity of systems. They are robust and can be made fault-tolerant. Hence, MNNs can be adopted to integrate the decisions obtained by different features, enabling the accuracy of SSVEP-based BCI systems to be greatly improved.
To help patients with severe disabilities, an SSVEP-based BCI is developed as an AAC system in this study. To precisely represent the characteristics of EEG signals, four features are extracted by FFT, CCA, MSC, and PCA. To fuse the decision results determined by using these different features, an MNN consisting of two layers including input modules and an additional decision module is proposed to find a precise decision, which is used as a command or message for an SSVEP-based BCI system. The remainder of this paper is organized as follows. Section 2 describes the SSVEP signal acquisition, feature extraction, and MNN. Section 3 presents the results of a series of experiments to evaluate the performance of our approach. Finally, conclusions and possible improvements for the future development of this system are given in Sect. 4.

Materials and Methods
In this study, an MNN is adopted to develop the SSVEP-based BCI system whose block diagram is shown in Fig. 1. The system involves visual stimulation, EEG acquisition, feature extraction, and the MNN. First, EEG responses are elicited by visual stimulation. Second, four features are adopted to precisely represent the characteristics of the frequency responses. Third, for each feature, the input modules in the first layer of the MNN are used to recognize the corresponding results. Finally, the results recognized in the first layer of the MNN are fused by an additional decision module in the second layer of the MNN. These procedures are described in detail as follows.

Visual stimulation and EEG acquisition
To elicit SSVEPs, five blinking boxes are displayed on a 20" LCD screen. These blinking boxes, which flicker at 6.00, 6.67, 7.50, 8.57, and 10.00 Hz, are used to represent five different commands. (20) To reduce the interference between the blinking boxes, each box is arranged as shown in Fig. 1. Subjects are asked to sit in front of the LCD screen at a distance of 55 cm.
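The five flicker rates match what a display produces when each on/off cycle spans a whole number of video frames. The sketch below illustrates this arithmetic under the assumption of a 60 Hz refresh rate, which is not stated in the text; `REFRESH_HZ` and `flicker_frequencies` are hypothetical names used only for illustration.

```python
REFRESH_HZ = 60  # assumed monitor refresh rate; not given in the text


def flicker_frequencies(frames_per_cycle, refresh_hz=REFRESH_HZ):
    """Flicker frequency when one on/off cycle spans a whole number of frames."""
    return [round(refresh_hz / n, 2) for n in frames_per_cycle]


# Cycles of 10, 9, 8, 7, and 6 frames reproduce the five stimulus frequencies.
print(flicker_frequencies([10, 9, 8, 7, 6]))  # [6.0, 6.67, 7.5, 8.57, 10.0]
```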
When the visual stimulation elicits EEG signals in the visual cortex of the brain, the EEG signals are acquired with an A/D converter with a 22-bit resolution and a 1 kHz sampling rate by using a NuAmps EEG amplifier supplied by Neuroscan Company. The EEG amplifier provides 37 channels for EEG acquisition, and the electrode placement follows the international 10-20 electrode placement system. In this study, the EEG signals were acquired from the Oz channel with reference and ground electrodes placed at the A1 and A2 channels, respectively.

Feature extraction
When an EEG signal, x, with N samples is acquired, four features are extracted in this study. The first one is the frequency magnitude estimated by FFT. The FFT is adopted to estimate the log magnitude of the spectrum, X(k). The second one is estimated by CCA, which is used to explore the underlying correlation between an input signal, x, and a set of reference signals, y. (17) For CCA, a pair of linear transforms, $w_x$ and $w_y$, is used such that the correlation $\rho$ between the linear combinations $w_x^T x$ and $w_y^T y$ is maximized. Then, the correlation can be defined as

$$\rho = \max_{w_x, w_y} \frac{w_x^T \Sigma_{xy} w_y}{\sqrt{w_x^T \Sigma_{xx} w_x \; w_y^T \Sigma_{yy} w_y}}, \tag{1}$$

where $\Sigma_{xx} = xx^T$ and $\Sigma_{yy} = yy^T$ are the within-set covariance matrices and $\Sigma_{xy} = xy^T$ is the between-set covariance matrix. Any rescaling of $w_x$ and $w_y$ does not affect the correlation maximization, so Eq. (1) is equivalent to

$$\max_{w_x, w_y} \; w_x^T \Sigma_{xy} w_y \tag{2}$$

such that

$$w_x^T \Sigma_{xx} w_x = 1, \qquad w_y^T \Sigma_{yy} w_y = 1.$$

Lagrange multipliers are then applied to transform Eq. (2) into the following generalized eigenvalue problems:

$$\Sigma_{xy} \Sigma_{yy}^{-1} \Sigma_{xy}^T w_x = \rho^2 \Sigma_{xx} w_x$$

and

$$\Sigma_{xy}^T \Sigma_{xx}^{-1} \Sigma_{xy} w_y = \rho^2 \Sigma_{yy} w_y.$$

The linear transforms $w_x$ and $w_y$ resulting in the largest canonical correlation between x and y are given by the eigenvectors corresponding to the largest generalized eigenvalues. The set of reference signals is formed by a series of sine-cosine waves.
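As a minimal sketch of the CCA feature, the NumPy code below builds a sine-cosine reference set for one stimulus frequency and computes the largest canonical correlation from the covariance matrices, using the fact that the eigenvalues of $\Sigma_{xx}^{-1} \Sigma_{xy} \Sigma_{yy}^{-1} \Sigma_{xy}^T$ are the squared canonical correlations. The function names, the mean removal, and the example values (100 Hz sampling, 1 s of data, four harmonics) are assumptions for illustration and do not reproduce the authors' implementation.

```python
import numpy as np


def reference_signals(freq, n_samples, fs, n_harmonics):
    """Sine-cosine reference set of shape (2 * n_harmonics, n_samples)."""
    t = np.arange(n_samples) / fs
    rows = []
    for h in range(1, n_harmonics + 1):
        rows.append(np.sin(2 * np.pi * h * freq * t))
        rows.append(np.cos(2 * np.pi * h * freq * t))
    return np.vstack(rows)


def canonical_correlation(x, y):
    """Largest canonical correlation between x and y (channels x samples)."""
    x = np.atleast_2d(x)
    y = np.atleast_2d(y)
    # Remove the mean so that x @ x.T is proportional to a covariance matrix.
    x = x - x.mean(axis=1, keepdims=True)
    y = y - y.mean(axis=1, keepdims=True)
    s_xx = x @ x.T
    s_yy = y @ y.T
    s_xy = x @ y.T
    # Eigenvalues of inv(S_xx) S_xy inv(S_yy) S_xy^T are the squared correlations.
    m = np.linalg.solve(s_xx, s_xy) @ np.linalg.solve(s_yy, s_xy.T)
    rho_sq = np.max(np.linalg.eigvals(m).real)
    return float(np.sqrt(np.clip(rho_sq, 0.0, 1.0)))


# Hypothetical usage: score a synthetic 7.5 Hz response against each stimulus.
fs, n = 100, 100
t = np.arange(n) / fs
x = np.sin(2 * np.pi * 7.5 * t + 0.3)
stimulus_freqs = [6.00, 6.67, 7.50, 8.57, 10.00]
scores = [canonical_correlation(x, reference_signals(f, n, fs, 4))
          for f in stimulus_freqs]
print(stimulus_freqs[int(np.argmax(scores))])
```

With a single EEG channel, as recorded here from Oz, $\Sigma_{xx}$ is a scalar and the canonical correlation reduces to the multiple correlation between the signal and the reference set.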
The third feature is extracted by MSC, which is used to determine the correlation between x and y at each frequency. (16) The MSC at each stimulus frequency k, $M_{xy}(k)$, can be defined as

$$M_{xy}(k) = \frac{|G_{xy}(k)|^2}{G_{xx}(k)\, G_{yy}(k)},$$

where $G_{xx}$ and $G_{yy}$ are the auto-spectral densities of x and y, respectively, and $G_{xy}$ is the cross-spectral density between x and y. The reference signal y is a pure sinusoidal wave and its frequency is equal to the stimulus frequency. The last type of feature used in this study is extracted by PCA. (18) PCA can effectively transform the EEG signals into a low-dimensional space and keep most of the information by using the discrete cosine transform. PCA is applied to linearly transform the log magnitude of the spectrum, X(k), into the cepstrum domain and obtain the power cepstrum coefficients A, which are derived by applying the discrete cosine transform to the B frequency bins of X(m).
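The following sketch shows one way to compute these two features with NumPy. The MSC is estimated by averaging windowed segments (a Welch-style estimate), because a single segment always yields a coherence of 1; the segment length, step, and Hann window are hypothetical choices. For the power cepstrum, a DCT-II of the log magnitude spectrum is assumed, since the text states only that a discrete cosine transform is used. All function names are illustrative.

```python
import numpy as np


def msc(x, y, fs, seg_len, step):
    """Welch-style magnitude-squared coherence between x and y."""
    window = np.hanning(seg_len)
    starts = np.arange(0, len(x) - seg_len + 1, step)
    seg_x = np.array([x[s:s + seg_len] for s in starts]) * window
    seg_y = np.array([y[s:s + seg_len] for s in starts]) * window
    fx = np.fft.rfft(seg_x, axis=1)
    fy = np.fft.rfft(seg_y, axis=1)
    g_xx = np.mean(np.abs(fx) ** 2, axis=0)   # auto-spectral density of x
    g_yy = np.mean(np.abs(fy) ** 2, axis=0)   # auto-spectral density of y
    g_xy = np.mean(np.conj(fx) * fy, axis=0)  # cross-spectral density
    freqs = np.fft.rfftfreq(seg_len, d=1.0 / fs)
    return freqs, np.abs(g_xy) ** 2 / (g_xx * g_yy + 1e-12)


def msc_at_stimulus(x, stim_freq, fs, seg_len, step):
    """MSC between x and a pure sinusoid, read at the bin nearest stim_freq."""
    t = np.arange(len(x)) / fs
    y = np.sin(2 * np.pi * stim_freq * t)
    freqs, coh = msc(x, y, fs, seg_len, step)
    return float(coh[np.argmin(np.abs(freqs - stim_freq))])


def power_cepstrum(x, n_fft, order):
    """First `order` DCT-II coefficients of the log magnitude spectrum."""
    log_mag = np.log(np.abs(np.fft.rfft(x, n=n_fft)) + 1e-12)
    b = log_mag.size                  # number of frequency bins B
    m = np.arange(b)
    return np.array([np.sum(log_mag * np.cos(np.pi * n * (m + 0.5) / b))
                     for n in range(1, order + 1)])
```

A hypothetical call such as `msc_at_stimulus(x, 10.0, fs=100, seg_len=100, step=50)` gives one MSC value per stimulus frequency, and `power_cepstrum(x, n_fft=1024, order=8)` gives an eight-element feature vector.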

MNN
The structure of the MNN used in this study has two layers of neural models. (21) The first layer is the aggregation of several separately trained subnetworks, which are used as input models. Each input model is trained using a particular feature subset. The outputs of these input models are treated as the inputs of a subnetwork, which is a decision model, in the second layer of the MNN. The decision model is applied to fuse the decisions obtained using the different features. The input models and the decision model are designed as multilayer perceptrons (MLPs). Figure 2 shows the architecture of the MNN for the SSVEP-based BCI system. The features estimated by FFT, CCA, MSC, and PCA are separately fed into the FFT-based, CCA-based, MSC-based, and PCA-based MLPs, respectively. These MLPs are individually trained using the different features and then each MLP outputs its individual decision. Finally, the aggregation of these decisions is implemented by a decision MLP and then a final decision can be obtained by the MNN.
The MLPs used in this study are feedforward neural networks wherein the connections between the nodes do not form a cycle. An MLP consists of at least three layers of nodes, including an input layer, a hidden layer, and an output layer. Except for the input nodes, each node is a neuron that uses a sigmoid function as the nonlinear activation function in this study. To train the MLP, backpropagation is used to calculate the gradients needed to update the weights of the network.
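The two-layer structure can be sketched as follows: one MLP per feature type is trained first, and a decision MLP is then trained on the concatenated outputs of those input MLPs. This is a minimal NumPy sketch, not the authors' implementation; the class names, the cross-entropy loss, the learning rate, the number of epochs, and the weight initialization are all assumptions, since the text specifies only sigmoid units and backpropagation.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class MLP:
    """One-hidden-layer perceptron with sigmoid hidden and output units."""

    def __init__(self, n_in, n_hidden, n_out, rng):
        self.w1 = rng.normal(0.0, 1.0, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0.0, 1.0, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def forward(self, x):
        self.h = sigmoid(x @ self.w1 + self.b1)
        self.o = sigmoid(self.h @ self.w2 + self.b2)
        return self.o

    def fit(self, x, t, epochs=2000, lr=0.5):
        """Full-batch backpropagation; the cross-entropy loss is an assumption."""
        for _ in range(epochs):
            o = self.forward(x)
            d_o = (o - t) / len(x)                        # output-layer error
            d_h = (d_o @ self.w2.T) * self.h * (1 - self.h)  # hidden-layer error
            self.w2 -= lr * self.h.T @ d_o
            self.b2 -= lr * d_o.sum(axis=0)
            self.w1 -= lr * x.T @ d_h
            self.b1 -= lr * d_h.sum(axis=0)
        return self


class MNN:
    """Input MLPs (one per feature type) followed by a decision MLP."""

    def __init__(self, feature_dims, n_classes, n_hidden, rng):
        self.inputs = [MLP(d, n_hidden, n_classes, rng) for d in feature_dims]
        self.decision = MLP(len(feature_dims) * n_classes, n_hidden,
                            n_classes, rng)

    def _stack(self, features):
        # Concatenate the individual decisions of the input modules.
        return np.hstack([m.forward(f) for m, f in zip(self.inputs, features)])

    def fit(self, features, targets):
        for m, f in zip(self.inputs, features):
            m.fit(f, targets)                 # first layer: trained separately
        self.decision.fit(self._stack(features), targets)  # second layer
        return self

    def predict(self, features):
        return np.argmax(self.decision.forward(self._stack(features)), axis=1)
```

Here `features` is a list with one array per feature type (samples in rows) and `targets` is a one-hot matrix of the commands. Because each input module sees only its own feature type, the first-layer networks stay small and can be trained independently of one another.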

Experimental Results and Discussion
In this study, 15 subjects were asked to participate in the experiments, and the leave-one-out method was applied to evaluate the proposed approach. The sampling rate and the frame size were set to 1 kHz and 1 s, respectively. Subsequently, a low-pass filter with a cutoff frequency of 50 Hz was applied and the signal was downsampled to a sampling rate of 100 Hz. Each subject had 30 epochs for each specific frequency. Using a 1024-point FFT, the log magnitude spectrum was estimated and a triangle filter with a 2 Hz bandwidth was applied to estimate the log magnitude of specific frequencies. The order of PCA and the number of harmonics for CCA were set to 8 and 4, respectively. For comparison with the proposed approach, the decision method of identifying the maximum magnitude of the corresponding frequencies was selected as the baseline system and the results are shown in Table 1. Because the maximum magnitude of the PCA feature cannot be directly mapped to a stimulus frequency, PCA was not included in this baseline experiment.
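The FFT feature and the maximum-magnitude baseline described above can be sketched as follows, starting from a signal that has already been downsampled to 100 Hz. Applying the triangular weights to the magnitude spectrum before taking the logarithm is an assumption, as are the function names; the 1024-point FFT, the 2 Hz bandwidth, and the stimulus frequencies are taken from the text.

```python
import numpy as np

STIM_FREQS = (6.00, 6.67, 7.50, 8.57, 10.00)


def fft_features(x, fs=100.0, n_fft=1024, bandwidth=2.0, freqs=STIM_FREQS):
    """Log magnitude around each stimulus frequency, using a triangle filter."""
    mag = np.abs(np.fft.rfft(x, n=n_fft))        # zero-padded to n_fft points
    bins = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    feats = []
    for f in freqs:
        # Triangle centred on f that falls to zero at f +/- bandwidth / 2.
        w = np.clip(1.0 - np.abs(bins - f) / (bandwidth / 2.0), 0.0, None)
        feats.append(np.log(np.sum(w * mag) / np.sum(w) + 1e-12))
    return np.array(feats)


def baseline_decision(x, freqs=STIM_FREQS):
    """Baseline: choose the stimulus frequency with the largest feature value."""
    return freqs[int(np.argmax(fft_features(x, freqs=freqs)))]


# Hypothetical usage on a synthetic 1 s epoch sampled at 100 Hz.
t = np.arange(100) / 100.0
print(baseline_decision(np.sin(2 * np.pi * 8.57 * t)))
```

The same five-element vector returned by `fft_features` could serve as the input of the FFT-based MLP, whereas the baseline simply takes its maximum.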

Experimental results of MLP
In this subsection, a feedforward artificial neural network was selected as the MLP and used to examine the different features. The number of hidden layers was set to 1 and the number of hidden nodes was varied from 4 to 9. The experimental results of the MLP obtained by FFT, CCA, MSC, and PCA are shown in Table 2 and denoted as NN(FFT), NN(CCA), NN(MSC), and NN(PCA), respectively. When the number of hidden nodes is 5, the recognition rates for each type of feature achieve acceptable results. It is also clear that NN(CCA) outperforms Baseline(CCA), but the performance of NN(FFT) and NN(MSC) is slightly degraded. When the number of hidden nodes is 7, the recognition rates

Experimental results of MNN
In this subsection, we present the results of the proposed approach and the effects of combining it with different features. In accordance with the previous experimental results, the number of hidden nodes in the first layer of the MNN was set to 5. The correlation between pairs of features was examined first. The number of hidden nodes in the second layer of the MNN was varied from 4 to 9 and the experimental results are shown in Fig. 3. It is clear that the recognition rates can be effectively improved when the decisions obtained using two features are fused. Since CCA achieves the best performance when using a single MLP, CCA can precisely represent the EEG signals. When fused with the CCA feature, the recognition rates of NN(FFT), NN(MSC), and NN(PCA) improve from 86.78, 78.35, and 90.40% to 92.18, 88.04, and 90.93%, respectively. The results clearly show that CCA improves the recognition rates of the other features.
The experimental results obtained using three and four features are shown in Fig. 4. The comparison of the results in Figs. 3 and 4 shows that fusing any three features outperforms fusing any two features and that fusing four features achieves the best performance. Therefore,

Comparison with other approaches
The characteristic of the decision fusion for MNNs with different features is examined in this subsection. To investigate the effects of different features, the different features were combined into a new feature vector and treated as the input of MLPs. When the numbers of hidden layers and hidden nodes in the MLPs were set to 2 and 5, respectively, the numbers of hidden nodes in the MNNs and MLPs are similar. The experimental results for the MNNs and MLPs are shown in Table 3. It is clear that the recognition rates of the MNNs with different combinations of features are higher than those of the MLPs.
In Table 3

Conclusions
In this study, an MNN is proposed to fuse the features of FFT, CCA, MSC, and PCA to improve the efficiency of SSVEP-based BCI systems. The features of FFT, CCA, MSC, and PCA can precisely represent the characteristics of EEG signals. Moreover, the MNN can effectively fuse the decisions obtained by the different features. By using features with distinct properties, the SSVEP-based BCI system can achieve a satisfactory performance for decision fusion. Using the features of CCA and PCA, the MNN can achieve an acceptable result for practical applications. The experimental results showed that the MNN outperforms an MLP with full connections. To achieve a similar result to the MNN, the number of hidden layers in the MLP would have to be considerably increased, such as by using a deep learning approach, which greatly increases the computational complexity. Hence, the proposed approach is very suitable for practical applications and can effectively help patients interact with their surroundings. In the future, the performance of the MNN can be improved by selecting a suitable neural network in the MNN or by fusing the decisions obtained by using different features with distinct properties.

He is a professor and has been with the Department of Electrical Engineering, Southern Taiwan University of Science and Technology, for 32 years. His research interests include brain-computer interfaces, biomedical signal processing, system integration, and assistive device implementation. He is a member of the Taiwanese Society of Biomedical Engineering and Taiwan Rehabilitation Engineering and Assistive Technology Society. (chung@stust.edu.tw)

Chung-Min Wu received his B.S. degree in automatic control engineering from Feng Chia University, Taichung, Taiwan, his M.S. degree in biomedical engineering from National Cheng Kung University, Tainan, Taiwan, and his Ph.D. degree in electrical engineering from National Cheng Kung University, Tainan, Taiwan, in 1994, 1998, and 2004, respectively.
He is an associate professor in the Department of Intelligent Robotics Engineering, Kun Shan University. His research interests include fuzzy control, biomedical signal processing, and assistive tool implementation. He is a member of the Taiwanese Society of Biomedical Engineering and Taiwan Rehabilitation Engineering and Assistive Technology Society. (cmwu@mail.ksu.edu.tw)