Smart Driver Drowsiness Detection Model Based on Analytic Hierarchy Process

This paper proposes a smart driver drowsiness detection (SDDD) model for vehicles. The SDDD monitors a driver’s heart rate variability (HRV) through electrocardiography (ECG) in real time to detect driver drowsiness. The SDDD processes the data of HRV and ECG to obtain a set of parameters with time-domain analysis, frequency-domain analysis, detrended fluctuation analysis, approximate entropy, and sample entropy. In the process, a machine learning algorithm analyzes the parameters to detect driver drowsiness. The SDDD optimizes critical features with the analytic hierarchy process (AHP), which uses a feature extraction method through an iterative procedure. It is found that the SDDD in this study detects the level of driver drowsiness with higher sensitivity than previous models.


Introduction
Drowsy driving is dangerous as it frequently causes traffic accidents. The Centers for Disease Control and Prevention (CDC) of the U.S.A. reported in 2013 that 1 in 25 adult drivers have fallen asleep while driving and that drowsy driving caused 72000 crashes, 44000 injuries, and 800 deaths in the U.S.A. in 2013, although these numbers are underestimates and up to 6000 fatal crashes may be caused each year. (1) There have been many studies to develop an efficient technology to detect drowsy driving based on a driver's heart rate (HR), respiration rate, posture, rapid eye movements, and head movement. Among the various indicators, HR variability (HRV) is known to be the most useful for detecting drowsiness. (2) HRV is measured by electrocardiography (ECG) and is used to monitor a driver's physiological and psychological states. ECG is a non-invasive and painless process used to detect the electrical activities of the human heart. These electrical activities originate from muscle movements of the heart, which generate pulsating electrical waves. ECG requires sophisticated sensors with low noise and high amplification. Advanced sensor technology has enabled faster, more reliable, and more accurate monitoring of HRV.
HRV devices can detect a driver's states, but not the intensity of the states or the driver's physical condition and its changes. To obtain such information, HRV data need to be analyzed in detail and related with the driver's physical condition. (3) In addition, chronological data must be stored for a certain time period. Then, the necessary information on the physical condition and its changes with time can be extracted by either linear or nonlinear HRV analysis.
There have been many studies on HRV analysis, which mainly focused on spectral modification. Thus, most of the studies performed linear analysis such as time-domain or frequency-domain analysis. (4) However, linear analysis will not be appropriate to obtain nonlinear dynamics from HRV data when the data are obtained from a complex system. Thus, nonlinear analyses including abnormal statistical analysis and analysis with the filtering of noise have been suggested to remove the drawbacks of linear analysis. (5,6) Nonlinear analysis methods that have been proposed are detrended fluctuation analysis (DFA), wave analysis, (7) approximate entropy, (8) and sample entropy, (9) all of which yield better results than linear analysis for the HRV data of a complex system.
On the basis of the previous analysis methods of HRV data, a new method of analyzing HRV data to detect drowsy driving is proposed in this study. To obtain the best results from the analysis of HRV data, all of the above linear and nonlinear analysis methods are combined to define features and indicators for a new model of smart driver drowsiness detection (SDDD). The result of the new method is compared with that of several combinations of analysis methods. Improved driver drowsiness detection by the new method will minimize traffic accidents caused by drowsy driving.

Establishment of SDDD model
The construction process of the SDDD model in this study has the following steps (Fig. 1).
(1) Preparation for signal detection
(2) Signal detection of different states of body movements: blinking, closing, opening eyes, and vertical, horizontal, and rotating movements of head
(3) Calculation of the intersection area of two normal distributions of two indicators
(4) Calculation of the weights of HRV indicators based on analytic hierarchy process (AHP)
(5) Machine learning training and model construction

Driving experiment
Twenty-one people (11 males and 10 females) participated in the driving experiment. Their average age was 24.1 (±3.41). All of them had held driver's licenses for at least six months. Participants of the experiment were randomly divided into two groups, groups A and B, who carried out simulated drives in the morning (02:00-04:30) and early evening (18:00-20:30), respectively. The times of the day for the simulated drives were chosen on the basis of research (10) in which the desire to sleep turned out to be strongest between 02:00 and 04:00 and between 14:00 and 16:30. In the simulated drives, participants rested for 10 min, drove for 90 min, then rested for 30 min again. Physiological parameters (low-frequency and high-frequency power in ECG) were measured for 5 min every 10 min during the experiments. The data were then used to obtain indicators as discussed later.

Experimental devices
The measuring devices for each participant included a chest-worn belt for measuring the HR, a smart mobile device, and an HRV analysis platform (a computer with the SDDD model) (Fig. 2).
(1) The chest-worn belt had a conductive rubber band to measure the HR. Signals of amplified potential difference generated by the heart were transmitted to a receiving device (the smart mobile device).
(2) The smart mobile device with a low-power Bluetooth module (4.0 or higher) received the signals of the HR and the R-R intervals of the HR from the belt. The R-R interval is the elapsed time between two successive R-waves. The R-wave is the first upward deflection after a P-wave (atrial depolarization) and part of the QRS complex, which is the main spike on an ECG line. (11) A wireless network transmits the signals to the platform for HRV analysis every 5 min.
(3) The HRV analysis platform analyzes the signals using the SDDD model in this study.
(4) The SDDD model identifies a driver's drowsiness.

HRV data analysis
To propose an SDDD model with an AHP to improve the accuracy of detecting drowsy driving, the following different analyses of the signals were carried out.
Experiment 1: time-domain and nonlinear analyses with feature extraction
Experiment 2: frequency-domain and nonlinear analyses without feature extraction
Experiment 3: time-domain, frequency-domain, and nonlinear analyses without feature extraction
Experiment 4: time-domain, frequency-domain, and nonlinear analyses with feature extraction (the proposed method in this study)
Detection and prediction of driver drowsiness require appropriate physiological data. For accurate prediction, feature extraction was used to select high-sensitivity indicators among 18 indicators for linear and nonlinear analyses as follows: seven indicators for time-domain analysis, seven for frequency-domain analysis, two for DFA, one for the approximate entropy method, and one for the sample entropy method. To select the most sensitive indicators among them, feature extraction was used to determine the degree of mutual dependence of two indicators with the normal distribution, and then an AHP was used to select one for the SDDD model. As a result, 14 indicators were selected to improve the accuracy in detecting driver drowsiness and reduce the probability of misjudgment (Table 1).
Time-domain analysis calculates the degree of variation of the R-R interval with time series analysis and uses the HR (in beats per minute, BPM) and R-R interval (in ms). Its seven indicators are the average HR (HR), average R-R interval (RR), standard deviation of HRs (SDHR), standard deviation of R-R intervals (SDNN), root mean square of successive differences between adjacent R-R intervals (RMSSD), number of successive R-R interval differences that exceed 50 ms (NN50), and NN50 divided by the total number of R-R intervals (pNN50). RMSSD, NN50, and pNN50 are all short-term variability indicators and are mutually correlated. Thus, they are used to evaluate the high-frequency variation of HRV.
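The seven R-R-interval indicators named above can be computed directly from a list of R-R intervals. The following is a minimal sketch, not the implementation used in this study: the function name `time_domain_indicators`, the use of the sample standard deviation, and the choice to express pNN50 as a fraction of successive differences are assumptions, and the input values are made up for illustration.

```python
import math
from statistics import mean, stdev


def time_domain_indicators(rr_ms):
    """Return seven R-R-interval indicators for a list of R-R intervals in ms."""
    hr_bpm = [60000.0 / rr for rr in rr_ms]             # beat-by-beat HR in BPM
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]   # successive differences
    nn50 = sum(1 for d in diffs if abs(d) > 50.0)       # differences above 50 ms
    return {
        "HR": mean(hr_bpm),
        "RR": mean(rr_ms),
        "SDHR": stdev(hr_bpm),
        "SDNN": stdev(rr_ms),
        "RMSSD": math.sqrt(mean([d * d for d in diffs])),
        "NN50": nn50,
        # Assumption: fraction of successive differences, not a percentage.
        "pNN50": nn50 / len(diffs),
    }


rr_example = [800, 860, 790, 810, 900]  # hypothetical values, not study data
indicators = time_domain_indicators(rr_example)
```

In practice the input would be the R-R intervals collected in one 5 min measurement window.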
Detrended fluctuation analysis fragments physiological data to a fixed length and calculates the average and standard deviation of the data in each segment to obtain the trend fluctuations of the data accordingly. The advantage is that the trend fluctuation of the entire segment data is obtained, but the result is easily affected by extreme values.
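The segment-wise procedure can be illustrated with the textbook form of DFA, which integrates the mean-centered series, removes a least-squares line from each fixed-length segment, and takes the root mean square of the residuals. This is a sketch under stated assumptions rather than the procedure of this study: the function names, the linear detrending, the window sizes, and the synthetic test series are all illustrative.

```python
import math
import random


def _linear_fit(xs, ys):
    """Least-squares slope and intercept of ys against xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def dfa_fluctuation(series, window):
    """RMS fluctuation of the integrated series after per-segment detrending."""
    m = sum(series) / len(series)
    profile, total = [], 0.0
    for v in series:                      # cumulative sum of deviations
        total += v - m
        profile.append(total)
    xs = list(range(window))
    sq_sum, count = 0.0, 0
    for start in range(0, len(profile) - window + 1, window):
        seg = profile[start:start + window]
        slope, intercept = _linear_fit(xs, seg)
        sq_sum += sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, seg))
        count += window
    return math.sqrt(sq_sum / count)


def dfa_alpha(series, windows):
    """Scaling exponent: slope of log F(n) against log n."""
    log_n = [math.log(w) for w in windows]
    log_f = [math.log(dfa_fluctuation(series, w)) for w in windows]
    slope, _ = _linear_fit(log_n, log_f)
    return slope


# Synthetic series for illustration only (not study data).
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(2000)]
walk, level = [], 0.0
for step in noise:
    level += step
    walk.append(level)

windows = [4, 8, 16, 32, 64]
alpha_noise = dfa_alpha(noise, windows)   # uncorrelated noise: exponent near 0.5
alpha_walk = dfa_alpha(walk, windows)     # random walk: clearly larger exponent
```

Fitting the exponent over a short and a long range of window sizes would give two DFA indicators, which is one common way of obtaining a pair of values from this analysis.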
Approximate entropy analysis is a method of quantifying sequence complexity (9) and is used to analyze the data of disordered sequences. A data dimension value and a threshold value are set first, and the collected physiological data are cut by the data dimension value to obtain several fragments. Then, the proportion of data in each fragment relative to the threshold is determined, and the data disorder is calculated algorithmically from these proportions, which is more convenient than measuring it directly. The amount of information obtained for different data dimensions is defined as ApEn, whose value is between 0 and 1. The smaller the ApEn value, the less obvious the disorder of the data and the smaller the degree of data variation in the sequence. This analysis is not easily affected by extreme values, but the threshold setting varies.
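As an illustration of the data dimension and threshold described above, the following sketch implements the standard definition of approximate entropy, in which fragments of length m are compared under a maximum-difference distance with tolerance r. The function name, the choice m = 2, and the tolerance of 0.2 times the standard deviation are assumptions commonly used in the HRV literature and are not stated in this study; the two test series are synthetic.

```python
import math
import random
from statistics import pstdev


def approximate_entropy(series, m, r):
    """Standard ApEn: phi(m) - phi(m + 1), self-matches included."""
    def phi(dim):
        n = len(series) - dim + 1
        templates = [series[i:i + dim] for i in range(n)]
        total = 0.0
        for a in templates:
            # Count fragments whose largest pointwise difference is within r.
            matches = sum(
                1 for b in templates
                if max(abs(x - y) for x, y in zip(a, b)) <= r
            )
            total += math.log(matches / n)
        return total / n

    return phi(m) - phi(m + 1)


# Synthetic series for illustration only (not study data).
regular = [0.0, 1.0] * 50
random.seed(1)
irregular = [random.random() for _ in range(200)]

apen_regular = approximate_entropy(regular, 2, 0.2 * pstdev(regular))
apen_irregular = approximate_entropy(irregular, 2, 0.2 * pstdev(irregular))
```

The strictly alternating series yields a value close to zero, whereas the irregular series yields a clearly larger value, in line with the interpretation that a smaller ApEn indicates less disorder.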
Sample entropy analysis is similar to approximate entropy analysis. (12) This analysis calculates the disorder of a system, that is, the degree of data disorder in the system. By using the preset data dimension and threshold value, physiological data are segmented to obtain several fragments. Then the threshold value is used to calculate the degree of data disorder in each segment. Sample entropy analysis mainly considers the proportion of data disorder in different data dimensions. Data disorder decreases when the data dimension is increased. The main advantage is that the proportion of data disorder is obtained, which is not affected by extreme values. However, this analysis is not applicable with zero data disorder. Each analysis has its own applicable data distributions. Therefore, all analyses are performed on the physiological data in this study to establish an SDDD model that considers the data characteristics obtained by all analyses.
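The difference from approximate entropy can be seen in a short sketch of the standard sample entropy definition, which excludes self-matches and takes the negative logarithm of the ratio of matching fragment pairs at dimensions m + 1 and m. The function name, m = 2, and the tolerance of 0.2 times the standard deviation are assumptions, not values from this study, and returning infinity when no pair matches is only one possible convention for the undefined case.

```python
import math
import random
from statistics import pstdev


def sample_entropy(series, m, r):
    """Standard SampEn: -ln(A / B), self-matches excluded."""
    n = len(series)

    def count_pairs(dim):
        # Use the same n - m starting points for both dimensions.
        templates = [series[i:i + dim] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                dist = max(abs(x - y) for x, y in zip(templates[i], templates[j]))
                if dist <= r:
                    count += 1
        return count

    b = count_pairs(m)        # matching pairs of length m
    a = count_pairs(m + 1)    # matching pairs of length m + 1
    if a == 0 or b == 0:
        return math.inf       # undefined when no pair matches
    return -math.log(a / b)


# Synthetic series for illustration only (not study data).
regular = [0.0, 1.0] * 50
random.seed(2)
irregular = [random.random() for _ in range(200)]

sampen_regular = sample_entropy(regular, 2, 0.2 * pstdev(regular))
sampen_irregular = sample_entropy(irregular, 2, 0.2 * pstdev(irregular))
```

For the strictly alternating series every match of length m extends to length m + 1, so the result is zero; the irregular series gives a clearly positive value.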
Indicators from the nonlinear analyses are thought to represent specific body movements. HR is related to the duration of blinking, VLF to the duration of closing eyes, LF to the ratio of the duration of closed to open eyes, HF to the head position on the x and y axes, TP to the head position on the z axis, and nHF and nLF to the rotation of the head along the x and y axes, respectively.

Integration of indicators
Calculating the intersection area of the normal distributions of two indicators first requires a test to ensure that the indicators are normally distributed. A chi-square test was applied to the HR data measured before driving. At the significance level α = 0.05 and degree of freedom df = 10, the chi-square critical value is χ² = 18.307, whereas the chi-square statistical value of the HR data was 0.982, which is less than the critical value. Therefore, the HR distribution before driving conforms to the normal distribution. The chi-square test results also show that the data distribution of each category conforms to the normal distribution; the distribution of HRV before driving likewise has a chi-square value of 0.982 at α = 0.05 and df = 10, so the HRV data before driving are normally distributed. Therefore, we can calculate the intersection area based on the normal distribution of all HRV measurement indicators in each category.
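A chi-square goodness-of-fit check of this kind can be sketched as follows. The binning is an assumption: the study does not state how the data were grouped, so 13 equal-probability bins are used here because, with two fitted parameters, they give df = 13 − 1 − 2 = 10. The constant 18.307 is the standard tabulated upper 5% point of the chi-square distribution with 10 degrees of freedom. The function name and both data sets are illustrative, not study data.

```python
from statistics import NormalDist, mean, stdev

CHI2_CRITICAL_DF10 = 18.307  # tabulated value for alpha = 0.05, df = 10


def chi_square_normality(data, bins=13):
    """Chi-square statistic of data against a normal fitted to the data.

    Assumption: equal-probability bins; df = bins - 1 - 2 fitted parameters.
    """
    fitted = NormalDist(mean(data), stdev(data))
    edges = [fitted.inv_cdf(i / bins) for i in range(1, bins)]
    observed = [0] * bins
    for x in data:
        observed[sum(1 for e in edges if x > e)] += 1   # index of the bin
    expected = len(data) / bins
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    return statistic, bins - 3


# Illustrative inputs: evenly spaced normal quantiles and a two-cluster sample.
normal_like = [NormalDist().inv_cdf((i + 0.5) / 520) for i in range(520)]
two_clusters = [0.0] * 260 + [10.0] * 260

stat_ok, df = chi_square_normality(normal_like)
stat_bad, _ = chi_square_normality(two_clusters)
normal_accepted = stat_ok < CHI2_CRITICAL_DF10
```

A statistic below the critical value, as reported for the HR data before driving, means that normality is not rejected at the 5% level.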

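Figures 3 and 4 show the overlapping areas (intersection areas) of the two normal distributions for the two groups. The intersection area of group A is smaller than that of group B. The calculation of the intersection area depends on the standard deviations σ1 and σ2 of the two distributions, with a specific case for σ1 > σ2.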
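The intersection area of two normal distributions can be obtained in closed form from their crossing points. The sketch below follows the standard derivation and is not a reproduction of the equations of this study: equating the two densities gives a quadratic in x, and when σ1 > σ2 the narrower curve lies below the wider one outside the two roots and above it between them. The function name and the numerical inputs are illustrative.

```python
import math
from statistics import NormalDist


def intersection_area(mu1, sigma1, mu2, sigma2):
    """Area under min(f1, f2) for two normal densities."""
    if sigma1 < sigma2:
        # Relabel so that distribution 1 is the wider one (sigma1 >= sigma2).
        mu1, sigma1, mu2, sigma2 = mu2, sigma2, mu1, sigma1
    if sigma1 == sigma2:
        # Equal spreads: a single crossing at the midpoint of the means.
        return 2.0 * NormalDist().cdf(-abs(mu1 - mu2) / (2.0 * sigma1))
    wide, narrow = NormalDist(mu1, sigma1), NormalDist(mu2, sigma2)
    # Coefficients of a*x^2 + b*x + c = 0 obtained from f1(x) = f2(x).
    a = 1.0 / sigma1 ** 2 - 1.0 / sigma2 ** 2
    b = -2.0 * (mu1 / sigma1 ** 2 - mu2 / sigma2 ** 2)
    c = (mu1 ** 2 / sigma1 ** 2 - mu2 ** 2 / sigma2 ** 2
         + 2.0 * math.log(sigma1 / sigma2))
    root = math.sqrt(b * b - 4.0 * a * c)
    x_lo, x_hi = sorted(((-b - root) / (2.0 * a), (-b + root) / (2.0 * a)))
    # Narrow curve is the minimum in both tails, wide curve between the roots.
    return (narrow.cdf(x_lo)
            + (wide.cdf(x_hi) - wide.cdf(x_lo))
            + (1.0 - narrow.cdf(x_hi)))


area_unequal = intersection_area(0.0, 2.0, 1.0, 1.0)
area_equal = intersection_area(0.0, 1.0, 2.0, 1.0)
```

The same quantity is available in the Python standard library as `statistics.NormalDist.overlap`, which can serve as a cross-check. A smaller area means that the indicator separates the two states more clearly.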

Calculation of weight of integrated HRV indicator with AHP
The intersection areas of groups A and B are the input values of the AHP of this study. Then, the weight of each integrated HRV indicator is obtained according to the AHP. Weights $\omega_1, \omega_2, \ldots, \omega_n$ are calculated using Eq. (11) with the elements $(a_1, a_2, \ldots, a_n)$, where $a_{i,j} > 0$ and $a_{i,j} = 1/a_{j,i}$. Calculation results are divided into $n$ categories. Therefore, $m$ features of the indicators are derived from $q$ groups of every two categories, where $q = \binom{n}{2} = n(n-1)/2$. For the $j$th feature, the intersection area of the $k$th group is defined as $A(k, j)$ [Eq. (10)]. The size of the $i$th category is defined as $G_i$, and the weight of each category (matrix $M_G$) can be derived from Eq. (12). The weight of each feature in the $k$th group can be calculated using Eq. (13). For the final decision, the final weight matrix $W_F$ is calculated from Eq. (14) based on the weight of $m$ features of each group. Then, the critical value of the $k$th feature is evaluated as $\omega_k$. Finally, the feature with the highest value is taken as the critical feature to identify the driver drowsiness.
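The general idea of deriving weights from a reciprocal comparison matrix can be illustrated with a textbook AHP calculation. The sketch below is not Eqs. (11) to (14) of this study. It assumes a hypothetical mapping in which a feature with a smaller intersection area is preferred in proportion to the ratio of the areas, and it uses the row geometric-mean approximation of the AHP priority vector. The function names and the three area values are illustrative only.

```python
import math


def pairwise_matrix(areas):
    """Hypothetical mapping: a[i][j] = area_j / area_i.

    A smaller intersection area for feature i gives it a larger preference.
    The result is reciprocal: a[i][j] > 0 and a[i][j] = 1 / a[j][i].
    """
    return [[aj / ai for aj in areas] for ai in areas]


def ahp_weights(matrix):
    """Row geometric-mean approximation of the AHP priority weights."""
    n = len(matrix)
    geometric_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geometric_means)
    return [g / total for g in geometric_means]


# Illustrative intersection areas of three features (not study data).
areas = [0.2, 0.4, 0.8]
weights = ahp_weights(pairwise_matrix(areas))
critical_feature = weights.index(max(weights))   # feature with highest weight
```

With these inputs the feature with the smallest intersection area receives the largest weight, which mirrors the final step of taking the highest-valued feature as the critical one.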

Model construction for machine learning
Algorithms for training in the machine learning in this study include the k nearest neighbors (kNN), support vector machine (SVM), naive Bayes classifier (NBC), neural network, and decision tree. We analyzed and compared the results of using these algorithms.
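Of the listed algorithms, kNN is the simplest to illustrate. The following is a minimal sketch of majority-vote kNN classification with Euclidean distance; it is not the training setup of this study. The function name, the two-dimensional feature vectors, and the labels "alert" and "drowsy" are hypothetical, and real HRV indicators would need to be scaled to comparable ranges before distances are computed.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=1):
    """Label of the majority among the k training points nearest to query."""
    ranked = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical, already normalized feature vectors (not study data).
train_x = [
    (0.10, 0.20), (0.20, 0.10), (0.15, 0.15),   # alert examples
    (0.80, 0.90), (0.90, 0.80), (0.85, 0.85),   # drowsy examples
]
train_y = ["alert", "alert", "alert", "drowsy", "drowsy", "drowsy"]

pred_k1 = knn_predict(train_x, train_y, (0.20, 0.20), k=1)
pred_k5 = knn_predict(train_x, train_y, (0.20, 0.20), k=5)
pred_far = knn_predict(train_x, train_y, (0.80, 0.80), k=5)
```

Using k = 1 and k = 5 corresponds to two kNN settings that can be compared against each other and against the other classifiers.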

Results and Discussion
In this study, parameters such as the sympathetic activity index (LF), parasympathetic activity index (HF), and balance index (LF/HF) were calculated using each algorithm. As shown in Table 2, as the driving time increases, the low-frequency component (LF) gradually increases, indicating that the participants were under stress from driving. The human body tends to maintain constant operation of its physiology, so the sympathetic process is activated. This results in a gradual increase in LF/HF (sympathetic/parasympathetic balance index). After 90 min of simulated driving, the sympathetic nervous system of the participants was deactivated gradually. The balance index began to fall as the parasympathetic nerves were activated.
Changes in the HR of the human body are mainly regulated by sympathetic nerves under demanding conditions such as stress or exercise. Groups A and B showed different states of the sympathetic nerve at the beginning of driving. Group A carried out simulated driving very early in the morning (02:00-04:30). Their physical conditions were in a fresh state, so LF in their HRV data was generally lower than that in group B before driving. After driving, the sympathetic nerves were activated to maintain the normal operation of bodily functions, which increased LF.
In group B, who carried out the experiment between 18:00 and 20:30, the sympathetic nerve function began to decline after 60 min of driving as fatigue had already accumulated due to daily activities. Therefore, the autonomic nervous system stopped the sympathetic nerves from working to rest the body after the simulated driving. After driving, both groups rested and then the sympathetic nerve function was slowly deactivated. The independent sample t test of the HRV data showed that LF was significantly different before and after driving. There was no significant difference during driving (p > 0.05).
The experiments on identifying the drowsy driving state were performed with different HRV analyses to evaluate the SDDD model proposed in this study. Among them, experiments 1 to 3 were carried out with a single HRV analysis. In the proposed method, we adopted the state identification of the HRV measurement index with the SDDD model. According to the experimental results, the SDDD model had the highest recognition accuracy, with an average accuracy of 97.97%. The average accuracies of experiments 1 to 3 were 92.23%, 88.29%, and 91.89%, respectively. Table 3 shows the average accuracy for each classification algorithm and experiment. The single HRV analysis had recognition accuracies of 66.66% for kNN-1, 66.66% for kNN-5, and 83.33% for SVM, and its overall average accuracy was 72.23%. Therefore, the SDDD model improved the average recognition accuracy by 25.74 percentage points (97.97% versus 72.23%) compared with the result of single HRV analysis.

Conclusions
Our SDDD model measures the degree of overlap between two normal distributions, i.e., the intersection area under the distributions. Then, it applies the AHP to the intersection areas to determine the importance of each HRV measurement feature. This method does not require conventional principal component analysis, multivariate analysis, or multi-objective decision-making to measure the importance of each feature.
The SDDD model provided accurate information on the state of the driver and discriminated drowsiness efficiently with the machine learning algorithm. It showed the highest identification accuracy in the experimental results. The overall average accuracy of the method in the original literature, which used a single time-domain analysis, was low. The proposed method improved the identification accuracy by 25.74 percentage points compared with that method, which makes the overall measurement of HR, RR, and VLF more important for distinguishing the sleepiness of the driver. Further research is required to process unusual distributions of features and to predict drowsiness and exhaustion with an improved SDDD model. The SDDD model could also be extended to include more information such as road types, weather conditions, and the speed of vehicles, which affect drivers significantly.