Analyzing Wrist Pulse Signals Measured with Polyvinylidene Fluoride Film for Hypertension Identification

The human wrist pulse signal, which often exhibits nonstationarity, can reflect changes in mechanisms and pathophysiology in the blood and viscera. In this paper, we present an integrated approach for identifying hypertension using the wavelet packet transform (WPT) and continuous hidden Markov models (HMM) to analyze wrist pulse signals. The approach starts with decomposing the wrist pulse signals into a number of frequency sub-bands through the WPT. Then the local discriminant bases (LDB) algorithm is used to obtain the best representation of the wrist pulse signal in the optimal frequency sub-bands. After that, energy features are extracted from those sub-bands and the optimal features associated with corresponding sub-bands are selected using the Fisher linear discriminant criterion. The optimal features are subsequently used as input to a continuous HMM classifier for hypertension identification. Experimental results indicate that the presented approach can differentiate the hypertensive wrist pulses from healthy wrist pulses effectively. In addition, when compared with other classifiers, the results demonstrate that the continuous HMM classifier yields a better classification result.


Introduction
Hypertension, also known as high blood pressure, is a chronic disease with a rise in arterial blood pressure as the main symptom. Most of the more than one million deaths due to cerebrovascular disease every year are caused by high blood pressure. Since wrist pulse signals carry a great deal of physiological information regarding the health of a human body, research on wrist pulse signal analysis has been attracting more and more attention over the past few years. However, the wrist pulse signal is very complex; it is related to the vascular structure, blood characteristics, blood circulation system, and many other factors (1) that give rise to challenges in effective pulse signal analysis.
For computerized pulse signal diagnosis, pulse signals are firstly acquired by sensors, (2,3) and then processed to extract effective features for different classification tasks, such as diagnosis of disease, pulse waveform classification, and analysis of health conditions. Since the pulse signal is weak, nonstationary, and predominantly in the low-frequency region, appropriate signal processing methods must be adopted for the signal analysis. With the rapid development of signal processing technology, time-frequency analysis methods, such as short-time Fourier transform, (4) Hilbert-Huang transform (HHT), (5) empirical mode decomposition (EMD), (6) and wavelet transform, (7) have provided effective methods of extracting features from transient, time-varying signals. For example, the pulse signal has previously been processed by HHT to determine arterial stiffness, which can be used as an indicator for differentiating subjects with and without diabetes. (8) The ensemble EMD has been used to extract statistical features for pulse feature representation. (9) Previous works (10) confirmed that wavelet transform is superior to frequency-domain analysis for the feature extraction of the pulse waveform and can decompose signals into elementary timefrequency blocks that are well localized in both time and frequency domains. (11,12) Automated pulse recognition by using the wavelet transform to compute the main time-domain characteristic parameters of pulse signals could also be realized. (13,14) Furthermore, wavelet packet transform (WPT), (15,16) as an extension of wavelet transform, has attracted increasing attention owing to its ability to provide more flexible time-frequency decomposition. (17) It was used for frequency subband energy or other statistical feature extractions for pulse feature representations. (18) Although many studies have shown that wavelet transform is effective in pulse signal analysis, research has rarely focused on hypertensive features learned through the pulse signal, and features extracted by wavelet transform are chosen mostly by subjective judgment so they may not be representative.
In this paper, an integrated approach, in which WPT is combined with continuous hidden Markov models (HMM), is designed for more effective pulse signal analysis with specific application to the health status assessment of hypertensive patients. WPT is used to decompose the pulse signals into various frequency sub-bands, with energy features being extracted from those sub-bands to characterize the pulse signal. The local discriminant bases (LDB) algorithm and Fisher linear discriminant (FLD) measure are applied when choosing the optimal features to characterize pulse signals. The HMM, a probability distribution model, is then used to diagnose hypertensive subjects. The paper is organized as follows. Section 2 gives the theoretical knowledge of WPT, LDB, FLD, and HMM. In addition, some methods, such as K-means clustering and the genetic algorithm (GA), are also studied for HMM parameter initialization. Then, human wrist pulse signals measured from healthy and hypertensive subjects are analyzed and discussed in Sect. 3. Finally, conclusions are drawn in Sect. 4.

WPT-based feature extraction
WPT, (15,16) different from discrete wavelet transform, can decompose a signal into equally spaced frequency sub-bands. Mathematically, a wavelet packet consists of a set of linearly combined wavelet functions that can be defined by the following recursive relationships: where ψ 0 (t) = ϕ(t) is the scaling function and ψ 1 (t) = ψ(t) is the wavelet function. The symbols h(n) and g(n) represent the low-pass and high-pass filter coefficients, respectively, determined by the scaling function and wavelet function. (19) Furthermore, h(n) and g(n) are related to each other by g(n) = (−1) n h(1−n). Utilizing the low-pass and high-pass filter coefficients, a time-domain signal x(t) can be decomposed recursively as where the symbols j and k denote the decomposition level and frequency sub-band, respectively. Furthermore, x k j (t) denotes the wavelet coefficients at the j-th level, k-th frequency sub-band, represented by node ( j, k), and x 0 0 (t) = x(t). If each frequency sub-band at ( j, k) can be considered as a subspace B j,k , which is spanned by a series of base vector , where m is equal to the dimension of B j,k , 2 N corresponds to the length of the signal, and then the signal x i can be expressed as where x j,k,m is equal to the m-th coefficient of B j,k . Unlike traditional signal processing methods, WPT can provide many important signal characteristics that are useful in classification. Because of the great diversity of the existing wavelet packet bases, WPT has various decomposition styles for a signal. Different wavelet packet bases possess different natures and reflect the different properties of a signal. To solve this problem, the LDB algorithm, introduced in the following subsection, has been employed to select the optimal wavelet packet bases for signal decomposition based on the data gathered.

LDB
The LDB algorithm is a pruning algorithm that identifies the subspaces that exhibit high discrimination attributes between signal classes, using a given dissimilarity measure. (20,21) The LDB algorithm has been employed to select the optimal wavelet packet bases to address realworld classification problems in audio, (22) radar, (23) and biomedical engineering, (24) etc. The optimal choice of LDB subspaces for a given dataset is driven by the nature of the dataset and the dissimilarity measures used to distinguish between classes. In this study, relative entropy is selected as the discriminant measure in searching for the optimal wavelet packet subspaces because of its prominent ability in characterizing the relationship between two signals. (25) In the case of two categories, the relative entropy can be defined as where p i (1) and p i (2) are two non-negative sequences satisfying the condition that the sum of every sequence is one. We assume that log0 = -∞ and log(p i /0) = +∞ for p i > 0, 0(±∞) = 0. The discriminant information D(p (1) , p (2) ) between the two sequences measures how different the distributions of the two classes are. For multiple class problems, the discriminant measure based on relative entropy is expressed as where L is the number of classes.
Suppose that A j,k represents the desired local discriminate basis restricted to the subspace of B j,k , which is a set of basis vectors at node ( j, k). Then, for a given training dataset consisting of L , where N l is the total number of training signals in class l, the steps for implementing the LDB algorithm are shown as follows.
Step 1: Choose a time-frequency decomposition method, such as WPT used in this study, to decompose the signals contained in the training dataset.
Step 2: Construct time-frequency energy maps C l for l = 1, 2, ..., L on the wavelet packet coefficients using following equation.
Step 4: Search for the best subspace A j,k , defined as follows. For j = J−1, J−2, ..., 0 and k = 0, 1, , that is, the dissimilarity measure of the parent node is greater than those of cumulative children nodes, then Step 5: Rank the complete basis functions found in step 3 from high to low according to their discrimination power. Through LDB, a set of optimal wavelet packet subspaces have been selected. In this study, the sum of the squares of the coefficients in each subspace (i.e., energy feature E k j = m x 2 j,k,m ) is calculated to construct the feature vector as where f t is the energy of each subspace and t is the number of subspaces chosen using LDB. However, preliminary analysis has revealed that not all the feature components in the feature vector contribute directly to distinguishing the hypertensive subjects from the healthy subjects. This is because the human wrist pulse frequency range is between 0 and 20 Hz, (26) whereas about 99% of the energy is distributed between 0 and 10 Hz. Furthermore, an excessively high vector dimension involved in the input to a classifier may not necessarily improve the performance of classification and also increase computational time. Therefore, the FLD measure is introduced in the following subsection to reduce the feature dimension.

FLD for feature selection
FLD, a distance measure, has been applied to components differentiation within a class pair. Its basic idea is to project a high-dimensional model of the initial samples onto the best vector space for identification, so as to reduce the feature space dimension and extract significant information for classification. Generally, the greater the distance between two feature components within a class pair, the easier it will be to separate them. The FLD power can be expressed as where the symbols μ i,f1 and μ j,f1 and σ 2 i,f1 and σ 2 j,f1 are the mean values and the variances of the l-th feature f l, for class i and class j, respectively. Features with a high discriminant power are more representative than those with low discriminant power, and will be selected as input for the subject's health status assessment.

HMM classifier design
HMM is a statistical model that has been successfully applied to speech recognition since the 1980s. (27) In recent years, it has been widely used, for example in text recognition and fault diagnosis. (28,29) HMM has been developed from the Markov chain. A Markov chain is a sequence of events, usually called states. They are not directly observed but are associated with a probability function. The generation of a random sequence is the result of a random walk in the chain and of an observation (also called an emission) at each visit to a state. An HMM has the following elements. (30) It can be seen that a complete HMM requires the specifications of N, M, A, B, and π. Since there is a definite relation between these parameters (e.g., N and M can be determined if A and B are known), (31) a compact notation is often used in the literature to indicate the complete parameter set of the model for convenience.
A typical HMM is shown in Fig. 1, where h 1 from one to h N symbolize the hidden states. Arrows indicate the relationship between variables, and the symbols above arrows indicate the state transfer probability.
In general, HMM can be divided into two categories, the discrete HMM and the continuous HMM. The latter is applied to analyzing the wrist pulse signals of healthy and hypertensive patients in this study. Compared with the typical discrete HMM, the observation probability distribution B in the continuous HMM is assumed to be generated by the Gaussian probability density function. In practical applications, one Gaussian probability density function cannot fit the probability distributions of the observation sequence, so the linear combination of several Gaussian probability density functions is used for generating the observation sequence. The observation probability distribution is expressed as (32) where D is the number of components in O and M is the number of Gaussian probability density functions. Furthermore, C jl is the l-th mixed coefficient belonging to the j-th state, μ jl is the l-th average of the mixed Gaussian belonging to the j-th state, and U jl is the l-th covariance of the mixed Gaussian belonging to the j-th state. Therefore, a continuous HMM could be denoted as (33) λ = {π, A, μ jl , U jl , C jl }.
When building an HMM, three basic problems must be solved: evaluation, decoding and training. Firstly, when observation sequence O and HMM λ are given, the evaluation problem is how to calculate the probability of observation sequence O under model λ. The forward-backward algorithm is used to solve this problem. (34) Secondly, the decoding problem is how to determine a best state sequence that can generate the best observation sequence when one observation sequence and a model are given. The Viterbi algorithm is used to solve this problem. (35) Thirdly, the training problem is how to determine the optimum HMM to maximize the probability of an observation sequence when the observation sequence is given, and the Baum-Welch algorithm is introduced to deal with this problem. (36) The Baum-Welch algorithm is a type of expectation maximization (EM) method based on the maximum likelihood criterion that could be defined as where O indicates the observation sequence and P(O|λ i ) indicates the probability of the output observations O under the HMM . Then, the best parameter set λ of the model is determined.

Parameter initialization
To use a continuous HMM as a classifier, the HMM parameters should be estimated first. Since it is generally recognized that the choice of the initial values of π and A has little effect on the results, the influence of the parameters μ jl , U jl , and C jl on the classification performance is investigated in this paper. In addition to random initialization, two other methods are studied to initialize the continuous HMM parameters, as briefly introduced below.

K-means initialization
K-means clustering is a method of cluster analysis aimed at distributing n observations into k clusters in which each observation belongs to the cluster with the nearest mean. (37) K-means clustering often uses Euclidean distance as the evaluation index of comparability, and is one of the popular cluster algorithms today. For HMM parameter initialization in this study, k feature vectors are randomly selected as initial cluster center points. Then, depending on the similarity criterion (Euclidean distance), each sample belongs to the cluster with the shortest distance to the clustering center, and the new central point of each cluster is calculated. If the mean square covariance of two iteration results tends to be convergent, the calculation is over; otherwise, this process is repeated to adjust the cluster center. Finally, the mean value and standard deviation of each cluster are used to estimate the Gaussian probability density function for initialization.

GA initialization
GA is an optimization algorithm that simulates Darwin's genetic choice and biological natural selection process. (38,39) It uses a coding scheme to realize parameter optimization through iterative genetic operations, including selection, crossover, and mutation, where a fitness function is usually defined to find the optimal solution. For HMM parameter initialization in this study, floating point number coding is used as the coding scheme due to the high precision requirement when the mixed Gaussian probability density function is employed for generating the observation sequence. The matrices of μ jl , U jl , and C jl are then built and connected together as individual populations. The fitness function is defined as (40) where O (k) indicates the k-th observation sequence of the training model and P(O (k) |λ) is the likelihood probability. After the result of the fitness function is obtained, individuals and the population are varied by genetic operations. This study employs a one-point crossover operator. It also takes a real value variation as a mutation operator and adopts a roulette wheel selection method. Supposed there are n genes and M sequences O k (k = 1, 2, ..., M) of length T. In accordance with the roulette wheel selection method, one gene from every sequence is selected and the probability for gene g being selected is shown as (32) After that the fitness function is recalculated. If the fitness matches the expectation error (<10 −3 ) or reaches the maximum number of iterations, the results are considered to have been obtained. Otherwise, the above process is repeated.

Wrist pulse signal diagnosis system
With the information described above, an integrated algorithm for wrist pulse signal diagnosis can be designed. Suppose there are k kinds of targets, and each target is represented by an HMM and recorded as λ 1 , λ 2 , ..., λ k . The flow chart of this wrist pulse signal diagnosis system is shown in Fig. 2. Firstly, k different categories of wrist pulse signals are collected and they are each decomposed through WPT. Secondly, the LDB algorithm is applied to the selection of the optimal bases and FLD is used to select the best energy characteristic vectors. Thirdly, each class of signals is divided into two parts. One is used to train the model, and the other is used to test the model. Fourthly, the energy characteristic vectors from the training dataset are used to obtain the k classes of continuous HMMs using the Baum-Welch algorithm. Finally, the energy characteristic vectors from the testing dataset are put into the trained models, and the likelihood ratio is calculated using the Viterbi algorithm. A wrist pulse signal series in the testing will belong to the continuous HMM that corresponds to the largest likelihood ratio.

Wrist signal measurement
Experimental study was conducted at Nanjing ZhongDa Hospital (affiliated to Southeast University) with informed concent that the authors have obtained. A pulse sensor, which converts the pulse to an electric signal through a polyvinylidene fluoride (PVDF) film, was placed on the surface of the wrist of the subjects. A total of 94 volunteers, including 38 healthy and 56 hypertensive subjects with an average age of 42 years old, were involved in the experiment. The ratio of males to females was close to 1. Typical healthy and hypertensive pulse signals are represented in Fig. 3. It can be seen that the descent stage of the hypertensive pulse wave has less fluctuation than that of a healthy pulse wave because the blood pressure of hypertensive patients increases sustainability.

Energy feature extraction
In this study, for hypertensive and healthy subjects, each pulse signal is decomposed into four levels using the WPT algorithm with the Db4 wavelet, whose shape is more similar than those of other wavelets to the shape of the pulse wave. Since energy can be used to describe the characteristics of a given signal, the corresponding energy values at all frequency sub-bands are calculated and chosen as the feature to characterize the wrist pulse signal. As an example, Fig. 4 illustrates the energy distribution of wrist pulse signals from both the healthy and hypertensive subjects at the fourth decomposition level. It can be seen that the energy of the healthy and hypertensive signals is mainly concentrated in the low frequency sub-band, which also matches the characteristics of the low frequency of the pulse signal. The energy amplitude of the healthy signal is significantly higher than that of the hypertensive signal, and thus the energy features of the package can be used to classify health and hypertension.  LDB and FLD are then used to select the appropriate optimal features from all the four decomposition levels. Figure 5 shows the selected wavelet packet nodes that contain the best discriminant information for classifying health conditions after applying the LDB algorithm. It can be seen from The corresponding FLD powers of all chosen frequency sub-bands are obtained and listed in Table 1. The first four large discriminant powers at frequency sub-bands B 3,3 , B 4,6 , B 4,13 , and B 3,2 are chosen. Therefore, the energy features E 3 3 , E 6 4 , E 13 4 , and E 2 3 are selected as the optimal features for continuous HMM classification.

HMM classification
In accordance with the human wrist pulse detection process, a continuous HMM structure (shown in Fig. 6.) that has three hidden states is chosen to simulate the changing processes of the wrist pulse. The states simulate the main, tide, and repulse waves of the wrist pulse.
The experiments were divided into three groups on the basis of different parameter initialization methods: random generation, K-means, and GA. The initial state distribution is set as π = [1, 0, 0],  Sensors and Materials, Vol. 29, No. 9 (2017) 1349 and the initial state transition probability distribution matrix A is expressed as Tables 2 and 3 are the initial parameters generated by K-means and GA, respectively. Features from a group of 20 healthy subjects and 20 hypertensive subjects are used as training data. The remaining 18 healthy subjects and 36 hypertensive subjects are used as test data and are input to the trained continuous HMMs.
The classification results of these three methods are listed in Table 4. It can be seen that continuous HMM, as a classifier, is effective in differentiating hypertensive subjects from healthy subjects. K-means and the GA algorithm, which are used to initialize the continuous HMM parameters, can improve the accuracy of the continuous HMM for classification, and GA is more effective than K-means because of its strong local search ability.

Comparison with other classifiers
From the discussion in Sect. 3.3, it can be seen that the continuous HMM classifier can effectively differentiate the hypertensive wrist pulses from healthy wrist pulses. To compare the classification accuracy of different classifiers, the artificial neural network (ANN) classifier, support vector machine (SVM), and k-nearest neighbor (KNN) classifier are to class the wrist pulse signals instead of the HMM classifier. (41)(42)(43) Classification results are shown in Table 5.
It can be seen that the classification accuracy rates of these three classifiers are less than 80%. The comparison of results shows that the continuous HMM classifier can effectively improve the classification results.

Conclusions
An integrated signal processing approach for wrist pulse signal analysis was investigated. This approach utilizes the capability of the WPT in analyzing non-stationary signals to construct the sub-band energy features of pulse signals, and then uses them as input to a continuous HMM classifier. Three methods were used to initialize the HMM parameters and the classification results were compared. The experimental results indicated that the integrated approach is effective   in differentiating hypertensive subjects from healthy subjects, and provides a viable tool for pulse signal analysis. Furthermore, the parameter initialization approach can effectively improve the classification performance.