IEEE 802.11ac-based Outdoor Device-free Human Localization

Wireless local area network (WLAN) technologies can be utilized for sensing by exploiting changes in wireless signals in a sensing environment. A number of studies have dealt with channel state information (CSI)-based sensing technologies, although they are difficult to apply in an outdoor environment owing to the limited influence of a sensing target object on radio signals. In this paper, we present an outdoor device-free human localization method using IEEE 802.11ac CSI. The key idea is to employ multiple transmitter–receiver pairs to derive sufficient information for human localization. We install multiple WLAN transmitter–receiver pairs to cover an entire localization target area and enable communication between transmitter–receiver pairs to collect CSI data from each pair. We then extract essential CSI components for localization to realize outdoor CSI sensing. A supervised learning method is used to estimate the location of a human. We conducted experiments to collect CSI data using actual WLAN devices in outdoor environments. The experimental evaluations revealed that our system estimated the location of a human in a target area with high accuracies of 93.51 and 92.65% in line-of-sight (LOS) and non-line-of-sight (NLOS) environments, respectively.


Introduction
Recent advances in wireless local area network (WLAN) communication technologies have led to the development of WLAN-based sensing technologies, which rely on changes in wireless signals received at a receiver to monitor the changes in a sensing environment. WLAN systems are prevalent nowadays. We can utilize WLAN devices already installed in the environment not only for communication but also for sensing.
Although pioneering WLAN-based sensing technologies have led to the development of sensing methods based on received signal strength (RSS), recent WLAN-based sensing technologies utilize channel state information (CSI), which is used for beamforming. The CSI includes amplitude and phase information between transmitter-receiver antenna pairs. WLAN uses orthogonal frequency division multiplexing (OFDM) communication and transmits subcarriers, which are multiple carrier waves having slightly different center frequencies. We can derive the amplitude and phase for each subcarrier. CSI-based sensing technologies analyze changes in CSI and extract changes in the multipath fading state for sensing tasks such as object detection and object movement detection.
CSI-based sensing technologies, however, are difficult to apply in an outdoor environment. In an outdoor environment, the influence of a sensing target object on radio signals is very limited because there are a limited number of radio-signal paths. Moreover, in an outdoor line-of-sight (LOS) environment, the receiver reduces its amplifier gain to avoid clipping of high-power LOS signals. We therefore observe only small CSI changes, which are mainly caused by changes in reflected/diffracted signals, and need to extract the CSI changes caused by sensing target objects.
In this paper, we present an outdoor device-free human localization method using IEEE 802.11ac CSI. Our key idea is to employ multiple transmitter-receiver pairs to derive sufficient information for human localization. We install multiple WLAN transmitter-receiver pairs to cover an entire localization target area and communicate between the transmitter-receiver pairs to collect CSI data between the pairs. We then extract essential CSI components for localization to realize outdoor CSI sensing. To efficiently collect CSI data from multiple transmitter-receiver pairs, we use the CSI collection system presented in our previous study, (10) which collects IEEE 802.11ac compressed CSI. The IEEE 802.11ac standard defines the very high throughput (VHT) sounding protocol for beamforming. (11) We overhear VHT sounding protocol messages to collect compressed CSI data. The collected CSI data are then analyzed by a supervised machine learning algorithm to realize outdoor human localization. Unlike most related work, we utilize IEEE 802.11ac, in which CSI is normally compressed, instead of IEEE 802.11n. Because of the CSI compression process, we encounter a phase jump problem: the CSI phase often jumps between 0 and 2π. To overcome this phase jump problem, we apply a simple signal processing step in which we calculate the sine and cosine of the phase to remove the influence of phase jumps. Our main target application is an outdoor exhibition event with walkways whose width is a couple of meters. We therefore install multiple transmitter-receiver pairs to cover a large area of up to a couple of thousand square meters. We aim to achieve 10-meter accuracy to distinguish people in different event booths.
Specifically, our main contributions in this paper are as follows:
• We present a design of an IEEE 802.11ac WLAN-based outdoor device-free human localization system that collects sufficient CSI data from multiple transmitter-receiver pairs. To the best of our knowledge, this is the first report of outdoor device-free human localization using IEEE 802.11ac WLAN compressed CSI.
• We present a simple yet effective signal processing step that overcomes the phase jump problem caused by the CSI compression process in IEEE 802.11ac. We calculate the sine and cosine of the compressed CSI phase to remove the influence of phase jumps on human localization.
• We experimentally evaluate the basic performance of the outdoor human localization system using actual IEEE 802.11ac devices.
The remainder of this paper is structured as follows. In Sect. 2, we present existing CSI-based sensing work. In Sect. 3, we present an IEEE 802.11ac-based outdoor human localization system, which is followed by experimental evaluations in Sect. 4. Finally, in Sect. 5, we conclude the paper.

Related Work
To the best of our knowledge, IEEE 802.11ac WLAN-based outdoor sensing is novel. In this section, we briefly look at WLAN-based sensing technologies.

RSS-based sensing
WLAN-based sensing is based on the changes of received wireless signals. One of the major approaches of WLAN-based sensing is to utilize RSS. (6,7,12-14) RASID (12) collects RSS from multiple WLAN devices and applies nonparametric statistical anomaly detection techniques to provide motion detection capability. RASID also updates its profile to follow the long-term changes in the environment. WiGest (6,7) is a gesture recognition system that detects RSS change primitives to detect predefined gesture motions. WiGest also presents a fine-grained gesture recognition method using CSI. Nuzzer (13) is a device-free localization system based on a probabilistic approach. Nuzzer statistically analyzes RSS derived from multiple transmitter-receiver pairs and calculates the probability of human location to finalize the estimation. GIFT (14) is a fingerprinting localization method using the RSS gradient as a location-specific feature, which gives more stable localization results. Although RSS is easy to obtain, its instability limits the sensing performance.

CSI-based sensing
CSI-based sensing technologies have recently attracted attention as they realize fine-grained sensing. (3-5,15-21) CSI is radio propagation environment information between a transmitter and a receiver. WLAN uses OFDM with multiple carrier waves called subcarriers and therefore provides CSI for each subcarrier. CSI gives us not only amplitude change information but also phase change information, which enables us to realize more accurate sensing than RSS.
ART (15) is an eavesdropping method that captures the vibration of a loudspeaker using CSI. ART extracts audio vibration based on the audio-radio translation model defined in Ref. 15. SignFi (16) feeds CSI into a convolutional neural network (CNN) to recognize hand sign language. CsiGAN (17) utilizes a semisupervised generative adversarial network (GAN) to realize an activity recognition system applicable to anyone whose data are not used for training. BreathTrack (18) utilizes a sparse recovery method to determine the dominant radio path from CSI to monitor respiration. Wang et al. (19) presented a joint system that simultaneously performs activity recognition and indoor localization using CSI derived by special implementation of IEEE 802.11n. Widar3.0 (20) is a cross-domain gesture recognition system that works on a domain different from the domain used in the training of a machine learning algorithm. WiAct (21) is an activity recognition system extracting the activity-related Doppler shift based on the correlation of multiple antennas derived from CSI. The previous work focused on indoor CSI sensing, while we tackle outdoor CSI-based sensing.
In our previous work, we presented an IEEE 802.11ac CSI collection system. (10) The CSI collection system efficiently collects CSI from multiple transmitter-receiver pairs installed in a target area by snooping and extracting IEEE 802.11ac sounding protocol packets. We utilize the CSI collection system in this study. We can easily cover a large localization area by installing multiple transmitter-receiver pairs.

Design overview
Figure 1 illustrates an overview of the IEEE 802.11ac-based outdoor device-free human localization system. The system consists of four blocks: data acquisition, preprocess, human detection, and localization blocks. We first collect compressed CSI data in the data acquisition block and pass the compressed CSI data to the preprocess block. The preprocess block first compensates for phase rotation because CSI phase information suffers from phase rotation mainly caused by carrier-frequency offset (CFO). We then apply a low-pass filter (LPF) to remove high-frequency environmental noise. The human detection block extracts features from the preprocessed CSI data and normalizes the features. The human detection block then estimates whether a human is in a sensing target area using a supervised machine learning classifier. Only when a human is detected by the human detection block does the localization block estimate the human location using a supervised machine learning classifier. This two-step human localization efficiently estimates human location.
In the following subsections, we present the design details of each block.

Data acquisition block
As shown in Fig. 1, the data acquisition block consists of WLAN devices. WLAN access points (APs) and WLAN devices, i.e., CSI measuring stations in Fig. 1, are installed in a target localization area. A WLAN device named a CSI monitoring station, which is installed near the CSI measuring stations, snoops CSI feedback WLAN frames sent by the CSI measuring stations to collect CSI for multiple transmitter-receiver pairs. Note that a target human does not need to carry any device because we only analyze the influence of a target human on wireless signals to estimate his or her location.
CSI feedback frames are collected via the VHT sounding protocol, which is defined in the IEEE 802.11ac standard. (11) Figure 2 shows the communication sequence of the VHT sounding protocol for an example in which a WLAN AP retrieves CSI data from n WLAN stations. A WLAN AP first broadcasts a VHT null data packet (NDP) announcement frame. Then the AP periodically broadcasts a VHT NDP frame that includes specific data to estimate CSI. Stations that receive the VHT NDP frame estimate CSI and compress the CSI into angle information ϕ_ij and ψ_lj by applying singular value decomposition (SVD) and the Givens rotation, where i, j, and l are index numbers used in the CSI compression process. Stations send the compressed CSI to the AP as a VHT compressed beamforming frame with random backoff, i.e., a random-length waiting time before transmission. After one of the stations sends CSI to the AP, each of the other stations waits for a beamforming report poll frame to be sent from the AP prior to sending a VHT compressed beamforming frame to the AP. A CSI monitoring station captures all these frames and extracts VHT compressed beamforming frames to derive CSI, i.e., ϕ_ij and ψ_lj. CSI data consist of phase and amplitude, which are represented by angles ϕ_ij and ψ_lj as the phase and amplitude differences between antennas, respectively. (22) The angles ϕ_ij and ψ_lj are quantized with specific bit lengths. The angles ϕ_ij and ψ_lj are described as follows.
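As a hedged illustration of the quantization step, the following sketch converts a quantization index k into an angle. It assumes the mapping that, to our understanding, the IEEE 802.11ac compressed beamforming report uses, namely ϕ = kπ/2^(b_ϕ−1) + π/2^(b_ϕ) and ψ = kπ/2^(b_ψ+1) + π/2^(b_ψ+2); this should be checked against Ref. 11. The function names, the symbols b_ϕ and b_ψ, and the bit lengths in the usage lines are illustrative assumptions and are not taken from the collection system of Ref. 10.

```python
import math


def dequantize_phi(k: int, b_phi: int) -> float:
    """Map index k (0 <= k < 2**b_phi) to an angle phi in [0, 2*pi).

    Assumed mapping: phi = k*pi / 2**(b_phi - 1) + pi / 2**b_phi.
    """
    if not 0 <= k < 2 ** b_phi:
        raise ValueError("k is out of range for the given bit length")
    return k * math.pi / 2 ** (b_phi - 1) + math.pi / 2 ** b_phi


def dequantize_psi(k: int, b_psi: int) -> float:
    """Map index k (0 <= k < 2**b_psi) to an angle psi in [0, pi/2).

    Assumed mapping: psi = k*pi / 2**(b_psi + 1) + pi / 2**(b_psi + 2).
    """
    if not 0 <= k < 2 ** b_psi:
        raise ValueError("k is out of range for the given bit length")
    return k * math.pi / 2 ** (b_psi + 1) + math.pi / 2 ** (b_psi + 2)


# Hypothetical bit lengths, chosen only for illustration.
phi_example = dequantize_phi(k=63, b_phi=6)
psi_example = dequantize_psi(k=15, b_psi=4)
```

Under this mapping, ϕ covers the full circle while ψ stays within a quarter circle, which is why only ϕ is affected by jumps between 0 and 2π.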
The index numbers i, j, and l and the size of the CSI feedback matrix depend on the number of antennas on the transmitter and receiver. i, j, and l take integer values such that j ≤ i ≤ N_r − 1, 1 ≤ j ≤ min(N_c, N_r − 1), and j + 1 ≤ l ≤ N_r, where N_c and N_r are the numbers of transmitter and receiver antennas such that N_c ≤ N_r. Table 1 shows the relationship between angles ϕ_ij and ψ_lj and the size of the CSI feedback matrix. (11) IEEE 802.11ac uses at least 52 subcarriers in addition to four pilot subcarriers in a 20 MHz VHT mode. We can derive angles from each beamforming frame.
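The index constraints above can be checked with a short enumeration. The sketch below is only an illustration of those inequalities; the function name angle_indices is hypothetical.

```python
def angle_indices(n_r: int, n_c: int):
    """Enumerate (i, j) for phi_ij and (l, j) for psi_lj.

    Constraints taken from the text:
      1 <= j <= min(n_c, n_r - 1),  j <= i <= n_r - 1,  j + 1 <= l <= n_r.
    """
    phis, psis = [], []
    for j in range(1, min(n_c, n_r - 1) + 1):
        phis.extend((i, j) for i in range(j, n_r))          # j <= i <= n_r - 1
        psis.extend((l, j) for l in range(j + 1, n_r + 1))  # j + 1 <= l <= n_r
    return phis, psis


# 4 x 1 CSI feedback matrix, as used in the experiments (N_r = 4, N_c = 1).
phis, psis = angle_indices(4, 1)
angles_per_subcarrier = len(phis) + len(psis)
```

For the 4 × 1 case, this yields ϕ_11, ϕ_21, ϕ_31 and ψ_21, ψ_31, ψ_41, i.e., six angles per subcarrier, which agrees with the 312 angles over 52 subcarriers reported in the experimental setup.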

Preprocess block
The preprocess block consists of a phase rotation compensation sub-block and an LPF, as shown in Fig. 1. The phase rotation compensation sub-block removes the influence of CSI phase rotation and the LPF reduces high-frequency noise.
CSI data derived by the data acquisition block are passed to the phase rotation compensation sub-block. As described in Sect. 3.2, angle ϕ_ij takes a value greater than or equal to 0 and less than 2π as ϕ_ij represents the phase difference between antennas. Angles 0 and 2π represent the same value. We therefore convert an angular value to a linear value that describes 0 and 2π as the same value. We calculate sinϕ_ij and cosϕ_ij in the phase rotation compensation sub-block and utilize sinϕ_ij and cosϕ_ij instead of directly using ϕ_ij as features for a machine-learning-based classifier.
[Table 1: Relationship between angles ϕ_ij, ψ_lj and size of CSI feedback matrix. (11) Columns: size of CSI feedback matrix; size of angle information, |{ϕ_ij}| and |{ψ_lj}|; order of angles in beamforming frames.]
In IEEE 802.11ac compressed CSI, phase jumps have a large influence because the compression process removes almost all of the linear phase rotation caused by carrier frequency offset. When we use IEEE 802.11n noncompressed CSI, which is widely used in related work, we need phase calibration such as that presented in PhaseFi. (23) IEEE 802.11ac, however, compresses CSI by removing phase rotation information while keeping relative phase and amplitude information. Instead, we encounter phase jumps: when the compressed CSI phase ϕ_ij is close to 0 or 2π, ϕ_ij jumps back and forth between 0 and 2π. By decomposing ϕ_ij into sinϕ_ij and cosϕ_ij, we can successfully remove the influence of phase jumps. Figure 3 shows an example of phase jumps of compressed CSI derived from an actual IEEE 802.11ac WLAN device installed in an outdoor environment. The example also depicts sinϕ_ij and cosϕ_ij for comparison. From Fig. 3(a), we can observe that the compressed CSI ϕ_31 jumps between 0 and 2π. We can also observe that sinϕ_31 and cosϕ_31 remove the influence of phase jumps.
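The effect of the sine and cosine conversion on phase jumps can be seen in a minimal sketch. The function names and the two example phase values are illustrative assumptions; only the conversion itself follows the text.

```python
import math


def phase_features(phi: float) -> tuple:
    """Replace a wrapped phase phi in [0, 2*pi) by the pair (sin(phi), cos(phi))."""
    return math.sin(phi), math.cos(phi)


def feature_distance(phi_a: float, phi_b: float) -> float:
    """Euclidean distance between two phases after the sin/cos conversion."""
    return math.dist(phase_features(phi_a), phase_features(phi_b))


# Two phases on opposite sides of the 0 / 2*pi boundary (hypothetical values).
phi_low = 0.01
phi_high = 2 * math.pi - 0.01

raw_gap = abs(phi_high - phi_low)                   # close to 2*pi: looks like a large jump
feature_gap = feature_distance(phi_low, phi_high)   # about 0.02: nearly identical
```

The raw angle difference is close to 2π even though the two phases are physically almost equal, whereas the distance between the (sin, cos) pairs stays small, so a classifier no longer sees the wrap-around as a large change.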
We then apply an LPF to reduce the influence of high-frequency environmental noise because CSI data include the influence of small changes in objects such as the sway of trees. CSI also suffers from radio communication circuit noise, especially oscillator phase noise. The majority of frequency components of the human walking motion are less than 4 Hz. (24) The cutoff frequency of the LPF is therefore set to 6 Hz, which includes a margin, to reduce the influence of high-frequency noise and CSI noise caused by radio circuits. In our implementation, we used a 64-tap finite impulse response (FIR) filter. Figure 4 shows an example of the noisy sinϕ_11 angle data of subcarrier −28 when there is a human in an outdoor sensing target area. Figure 4(a) shows the angle data in the time domain and Fig. 4(b) shows the result of frequency analysis. The vertical red dotted line in Fig. 4(b) represents 4 Hz. Each plot shows both raw and filtered data, i.e., LPF-applied data. We can confirm that the majority of frequency components are less than 4 Hz, and high-frequency components are successfully reduced by the LPF.
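A filter with the stated parameters (64 taps, 6 Hz cutoff) could be sketched as follows. The windowed-sinc design with a Hamming window, the uniform 100 Hz sampling rate, and the function names are assumptions made for illustration; the text does not specify the design method used in our implementation.

```python
import math


def design_lowpass_fir(num_taps: int = 64, cutoff_hz: float = 6.0,
                       fs_hz: float = 100.0) -> list:
    """Windowed-sinc low-pass FIR design (Hamming window, assumed design method)."""
    fc = cutoff_hz / fs_hz            # cutoff in cycles per sample
    mid = (num_taps - 1) / 2.0        # center of the symmetric impulse response
    taps = []
    for n in range(num_taps):
        x = n - mid
        # Ideal low-pass impulse response, with the x == 0 limit handled explicitly.
        ideal = 2.0 * fc if x == 0 else math.sin(2.0 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]  # normalize to unity gain at 0 Hz


def fir_filter(samples: list, taps: list) -> list:
    """Causal FIR filtering by direct convolution (zero initial state)."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * samples[n - k]
        out.append(acc)
    return out


taps = design_lowpass_fir()
# Synthetic test signals: a 1 Hz component (walking band) and a 30 Hz component (noise).
slow = [math.sin(2.0 * math.pi * 1.0 * n / 100.0) for n in range(400)]
fast = [math.sin(2.0 * math.pi * 30.0 * n / 100.0) for n in range(400)]
slow_out = fir_filter(slow, taps)
fast_out = fir_filter(fast, taps)
```

In this sketch, the 1 Hz component passes almost unchanged while the 30 Hz component is strongly attenuated. Because the filter assumes uniformly spaced samples, lost CSI packets would have to be handled before filtering, e.g., by interpolation, which is outside the scope of this example.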

Human detection block
The human detection block consists of a feature extraction sub-block, a normalization sub-block, and a human detector, as shown in Fig. 1. The feature extraction sub-block calculates features from a batch of CSI data for each subcarrier, and the normalization sub-block normalizes the features using a general normalization method such as min-max normalization. The human detector sub-block then classifies whether a human is in a sensing target area.
The feature extraction sub-block first divides the preprocessed CSI data into "batches", which are fixed-length data segments. The batches are required as the input of a machine learning model to efficiently train the model as well as to extract features. We make batches by dividing the CSI data along the timeline. We then calculate features from batches. Note that we do not limit features used in our system. In our implementation, we used Random Forest Classifiers (RFCs), which require feature vectors. We assume that the location of a human in a sensing target area has an influence on changes in CSI data. We choose the nine features below:
• mean: ξ
• standard deviation: σ
• median
• absolute max value
• peak-to-peak range
• interquartile range
• percentage of data within one standard deviation of the mean, ξ ± σ
• percentage of data within two standard deviations of the mean, ξ ± 2σ
• percentage of data within three standard deviations of the mean, ξ ± 3σ
Most of these features are based on previous papers on human activity recognition. (25,26) Note that only time-domain features are available because CSI measurement is performed at an unstable sampling rate due to CSI packet loss. In addition to the features taken from the previous papers, we employ features such as the percentage of data within a given number of standard deviations of the mean, which capture non-Gaussian distributions.
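The nine features listed above can be computed per batch as in the following sketch. The function names, the use of the population standard deviation, and the inclusive quantile method are assumptions; the text does not fix these details.

```python
import statistics


def make_batches(series: list, batch_size: int = 50) -> list:
    """Split a time series into consecutive fixed-length batches (remainder dropped)."""
    n_full = len(series) // batch_size
    return [series[b * batch_size:(b + 1) * batch_size] for b in range(n_full)]


def batch_features(batch: list) -> list:
    """Return the nine time-domain features of one batch, in the order listed in the text."""
    mean = statistics.fmean(batch)
    std = statistics.pstdev(batch)  # population standard deviation (assumption)
    q1, _, q3 = statistics.quantiles(batch, n=4, method="inclusive")

    def percent_within(multiple: int) -> float:
        inside = sum(1 for x in batch if abs(x - mean) <= multiple * std)
        return 100.0 * inside / len(batch)

    return [
        mean,
        std,
        statistics.median(batch),
        max(abs(x) for x in batch),   # absolute max value
        max(batch) - min(batch),      # peak-to-peak range
        q3 - q1,                      # interquartile range
        percent_within(1),
        percent_within(2),
        percent_within(3),
    ]


# Toy batch used only to illustrate the call; real batches hold CSI angle samples.
features = batch_features([1.0, 2.0, 3.0, 4.0, 5.0])
```

In the system described here, such a feature list would be computed for every angle of every subcarrier and concatenated across CSI measuring stations to form one feature vector per batch.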
The normalization sub-block normalizes the features using a general normalization method such as min-max normalization. The human detector sub-block, which works as a binary classifier, then detects a human in a target area using the normalized features. We do not limit the machine learning method. The normalized features are passed as the input of the machine learning model, which is trained prior to estimation. The pretrained machine learning model estimates whether a human is in a target area and labels the result as "human" or "no human".
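Min-max normalization of feature vectors can be sketched as follows. The function names are hypothetical, and mapping constant features to 0 is a choice made here for illustration; the scaling parameters are fitted on training data and reused for test data.

```python
def min_max_fit(rows: list) -> tuple:
    """Compute the per-feature minimum and maximum over the training feature vectors."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def min_max_transform(rows: list, lows: list, highs: list) -> list:
    """Scale each feature to [0, 1] using previously fitted minima and maxima."""
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0   # constant feature -> 0.0
         for x, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Toy feature vectors (two features each), for illustration only.
train_rows = [[0.0, 10.0], [5.0, 10.0], [10.0, 10.0]]
lows, highs = min_max_fit(train_rows)
scaled = min_max_transform(train_rows, lows, highs)
```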

Localization block
The localization block consists of a location classification sub-block, as shown in Fig. 1. The localization block receives the feature vectors when the human detection block estimates there is a "human" in a target area. The localization block estimates the location of the human in a target area.
The location is estimated from sub-areas, i.e., predefined parts of the target area. We therefore use a machine-learning-based classifier to estimate the location. The classification method is again not limited in our system.
Note that the classifier estimates the location as one of the sub-areas with a human even if feature vectors corresponding to "no human" are passed to the input of the classifier. This might happen when the human detection block mistakenly estimates "no human" data as "human" data.
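The two-step flow of detection followed by localization can be summarized in a short sketch. The detector and locator arguments are hypothetical stand-ins for the trained classifiers, and the toy rules in the usage lines are not real models.

```python
from typing import Callable, Sequence

NO_HUMAN_AREA = 0  # sub-area label used for the "no human" situation


def localize(features: Sequence[float],
             detector: Callable[[Sequence[float]], str],
             locator: Callable[[Sequence[float]], int]) -> int:
    """Run the location classifier only when the detector reports a human."""
    if detector(features) == "no human":
        return NO_HUMAN_AREA
    # The locator always answers with a sub-area that contains a human,
    # even if the detector produced a false positive.
    return locator(features)


# Toy stand-ins for trained classifiers (illustration only).
def toy_detector(features: Sequence[float]) -> str:
    return "human" if sum(features) > 1.0 else "no human"


def toy_locator(features: Sequence[float]) -> int:
    return 3


empty_result = localize([0.1, 0.2], toy_detector, toy_locator)
human_result = localize([1.0, 2.0], toy_detector, toy_locator)
```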

Experimental Evaluation
We conducted experiments in outdoor LOS and NLOS environments.

Experimental setup
Figure 5 shows the experimental setup in the LOS environment. We installed a testbed WLAN AP and three Galaxy S7 edge smartphones as CSI measuring stations, denoted by STA1, STA2, and STA3 in Fig. 5(b), at a height of 1 m from the ground. There were no obstacles between the WLAN AP and the CSI measuring stations. An Intel Compute Stick STK2m364CC serving as the CSI monitoring station was installed on the ground near the AP. We derived a 4 × 1 CSI feedback matrix because the WLAN AP has four antennas and each CSI measuring station has one antenna. Therefore, six angles for each of 52 subcarriers, i.e., 312 angles in total, were derived from each CSI measuring station. The nine numbered rectangles in Fig. 5(b) represent the location estimation target sub-areas labeled from 1 to 9, which are used in location estimation. The no-human situation is regarded as sub-area 0. We collected CSI data, i.e., angles ϕ_ij and ψ_lj, at a sampling rate of 100 Hz for 60 s while a single human randomly walked among the sub-areas. CSI data without a human in the target sub-areas were also collected and labeled as sub-area 0. We used only the CSI data for which CSI was derived from all three CSI measuring stations. ϕ_ij values were converted into sinϕ_ij and cosϕ_ij and fed into the feature extraction sub-block. For each feature, we have 9 CSI angles (three sinϕ_ij, three cosϕ_ij, and three ψ_lj) × 52 subcarriers = 468 angles in total. The dimension of a feature vector is 3 CSI measuring stations × 9 features × 468 angles = 12636. The average rate of CSI-packet reception was 93.2%.
Figure 6 shows the NLOS experimental setup. We used the same equipment as that used in the LOS experiment. Because the localization target area was larger than that of the LOS experiment, we used four CSI measuring stations denoted by STA1, STA2, STA3, and STA4 in Fig. 6(b). There was a pillar between the WLAN AP and each of the CSI measuring stations. The 11 numbered rectangles in Fig. 6(b) represent the location estimation target sub-areas labeled from 1 to 11. The no-human situation is regarded as sub-area 0. We collected CSI data in the same manner as in the LOS experiment. The dimension of a feature vector is 4 CSI measuring stations × 9 features × 468 angles = 16848. The average rate of CSI-packet reception was 97.3%. There were vehicles and people passing right beside the experiment environment during the experiment. CSI packets might include the influence of the passing vehicles and people.
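The feature vector dimensions stated above follow from simple arithmetic, which the sketch below reproduces. The function name and parameter names are hypothetical; the split of the nine angles into sine, cosine, and ψ components is inferred from the 4 × 1 feedback matrix and the sin/cos conversion.

```python
def feature_vector_dim(n_stations: int, n_phi: int = 3, n_psi: int = 3,
                       n_subcarriers: int = 52, n_features: int = 9) -> int:
    """Dimension of one feature vector for the given number of CSI measuring stations."""
    # Each phi is replaced by sin(phi) and cos(phi); each psi is used as is.
    angles_per_subcarrier = 2 * n_phi + n_psi           # 9 for a 4 x 1 feedback matrix
    n_angles = angles_per_subcarrier * n_subcarriers    # 468
    return n_stations * n_features * n_angles


los_dim = feature_vector_dim(n_stations=3)   # LOS experiment
nlos_dim = feature_vector_dim(n_stations=4)  # NLOS experiment
```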
Note that there were approximately 10 WLAN APs around the two experiment environments. WLAN signals sent from other WLAN devices have no influence on CSI. Background WLAN signals can be regarded as noise. When there are strong background WLAN signals, WLAN frames for CSI measurement might be lost. However, noise has only a small influence on CSI because CSI is calculated only when the demodulation process, which removes the influence of noise, succeeds. In our experiments, packets were sometimes lost, and the packet reception rates were therefore less than 100% in both the LOS and NLOS experiments.

Human detection performance in LOS environment
First, we evaluated the human detection performance of our outdoor device-free human localization system. Data labeled 0 mean that there is no human in the target area, while data labeled from 1 to 9 mean that a human is in the target area. For the evaluation, we performed k-fold cross-validation. We randomly divided the data into k chunks. One of the chunks was used for performance evaluation and the remaining chunks were used for training. We repeated the cross-validation n times with randomly shuffled batches. Note that the data order within each batch was not shuffled. We set k = 5 and n = 10. The batch size was 50 samples, which corresponds to 500 ms at the 100 Hz sampling rate.
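The evaluation procedure can be sketched as a repeated k-fold split over batch indices. The function name and the seed are assumptions, and each of the k chunks is assumed to serve as the test chunk once per repetition; only the order of batches is shuffled, never the samples inside a batch.

```python
import random


def repeated_kfold(n_batches: int, k: int = 5, n_repeats: int = 10, seed: int = 0):
    """Yield (train_indices, test_indices) pairs over batch indices."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_batches))
        rng.shuffle(order)                         # shuffle batches, not samples
        folds = [order[f::k] for f in range(k)]    # k nearly equal chunks
        for f in range(k):
            test = folds[f]
            train = [i for g in range(k) if g != f for i in folds[g]]
            yield train, test


# Hypothetical number of batches, for illustration only.
splits = list(repeated_kfold(n_batches=23))
```

With k = 5 and n = 10, this produces 50 train/test splits in total.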
For both training and testing, we equalized the number of data in each label by undersampling the batches. For the performance evaluation of human localization, test data were undersampled from the estimation result of the human detection block.
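Equalizing the number of batches per label by undersampling can be sketched as follows. The function name, the seed, and the toy labels are illustrative assumptions.

```python
import random
from collections import defaultdict


def undersample(labels: list, seed: int = 0) -> list:
    """Return sorted indices such that every label keeps the same number of batches."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)
    n_keep = min(len(indices) for indices in by_label.values())  # smallest class size
    kept = []
    for indices in by_label.values():
        kept.extend(rng.sample(indices, n_keep))   # random subset without replacement
    return sorted(kept)


# Toy labels: sub-area 0 is over-represented compared with sub-areas 1 and 2.
toy_labels = [0] * 10 + [1] * 3 + [2] * 5
kept_indices = undersample(toy_labels)
kept_labels = [toy_labels[i] for i in kept_indices]
```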
In our evaluation, we utilized RFCs in both human detection and localization blocks. RFCs do not require normalized input data. We therefore skipped the normalization process in our evaluation. Figure 7 is the confusion matrix of the human detection results. Our human localization system successfully detected the presence of a human with an overall accuracy of 99.05%. A very small number of detection trials incorrectly detected or failed to detect a human. Figures 8-10 are the confusion matrices of the human detection results obtained using the data from CSI measuring stations STA1, STA2, and STA3, respectively. The overall accuracies for STA1, STA2, and STA3 were 98.66, 96.52, and 92.34%, respectively, which are lower than the overall accuracy when using all three stations.
Comparing Figs. 8-10, we can confirm that results obtained with data from STA1 and from STA3 have the highest and lowest accuracies, respectively. The CSI data measured at different locations had different impacts on human detection performance.
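For reference, the overall accuracy reported with each confusion matrix is the ratio of correctly classified trials to all trials. The sketch below uses a made-up 2 × 2 matrix that is not taken from Figs. 7-10; the function names are hypothetical, and rows are assumed to be actual labels and columns estimated labels.

```python
def overall_accuracy(confusion: list) -> float:
    """Fraction of trials on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


def per_class_recall(confusion: list) -> list:
    """Recall for each actual label (row), or 0.0 for an empty row."""
    return [row[i] / sum(row) if sum(row) else 0.0
            for i, row in enumerate(confusion)]


# Made-up counts for illustration: row 0 = "no human", row 1 = "human".
toy_confusion = [[8, 2],
                 [1, 9]]
toy_accuracy = overall_accuracy(toy_confusion)
toy_recall = per_class_recall(toy_confusion)
```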

Human detection performance in NLOS environment
In daily life, WLAN devices are often installed in NLOS conditions, even in outdoor environments. We therefore evaluated the performance of our outdoor human localization system in an NLOS environment. We performed k-fold cross-validation with the same parameters as those used in the evaluation in the LOS environment. Table 2 shows the overall accuracies of human detection when we used CSI from all the stations and from each of the stations. The overall accuracy when we used CSI from all the stations was the highest. Compared with the overall human detection accuracy of 99.05% in the LOS environment, we derived a slightly higher overall accuracy in the NLOS environment. The overall accuracy is dependent on the location of the CSI measuring station, which resulted in the small difference.

Localization performance in LOS environment
Next, we evaluated the human localization performance of our outdoor device-free human localization system. We again performed k-fold cross-validation with the same parameters as those used in the evaluation of the human detection block because we tested the model in the location classification sub-block only when the human detection block estimated a "human". Figure 11 shows the confusion matrix of the human localization results. The overall accuracy of the human localization was 93.51%. There are non-zero cells in the row with an "actual area" of 0. The non-zero cells correspond to false positive detections by the human detection block, as described in Sect. 3.5.
The location estimation accuracy was lowest for sub-area 7. We assume that this was caused by its distance from the WLAN AP. A human in sub-area 7 had a smaller influence on the radio signal propagation environment than in the other sub-areas owing to the distance from the WLAN AP. In other sub-areas distant from the AP, such as sub-areas 1 and 9, however, we derived good results. There are trees near sub-areas 1 and 9, while there are no trees near sub-area 7, as shown in Fig. 5. The trees affected radio signal propagation, which made the influence of a human more significant. Figures 12-14 show the confusion matrices of the localization results when we used data from STA1, STA2, and STA3, respectively. The overall accuracies of human localization were 90.67, 76.24, and 85.31%, respectively, which are lower than the overall accuracy for all three stations. These results show different tendencies: the sub-areas with the highest and lowest accuracies depend on the location of the station. From these results, we can confirm that the localization performance is highly dependent on the location of the CSI measuring station. Our proposed human localization system combines the data from all three CSI measuring stations to minimize the performance difference and to improve overall localization accuracy.
To validate that our proposed method is also effective with other classification methods, we evaluated the overall localization accuracies when we utilized the support vector machine (SVM), XGBoost, and the method proposed in Ref. 19 instead of random forest. Table 3 summarizes the overall localization accuracies in the LOS environment with random forest, SVM, XGBoost, and Ref. 19 classifiers. We can confirm that the overall accuracy is improved when we utilize CSI data from all stations compared with the accuracy derived with a single station.

Localization performance in NLOS environment
Finally, we evaluated the human localization performance of our outdoor device-free human localization system in the NLOS environment. We performed k-fold cross-validation with the same parameters as those used in the above human detection and localization evaluations. Table 4 shows the overall localization accuracies in the NLOS environment with random forest, SVM, XGBoost, and Ref. 19 classifiers. We can confirm that the overall accuracy is improved when we utilize CSI data from all stations compared with the accuracy derived with CSI from a single station.
We again note that we do not limit the classifier algorithm used in the human localization process. By applying simple signal processing to the phase information, we were able to use IEEE 802.11ac compressed CSI with our system. CSI is efficiently collected from multiple WLAN stations. CSI data from multiple stations successfully improved localization accuracy in both the LOS and NLOS environments.

Conclusion
In this paper, we presented an outdoor device-free human localization system using IEEE 802.11ac compressed CSI. Our system extracts features from batched CSI data derived from multiple transmitter-receiver pairs and estimates the existence of a human in a target area. When the system detects a human, the system proceeds to the localization process, where the detected human location is estimated. We implemented our localization system and evaluated its performance using CSI data collected from three transmitter-receiver pairs in an outdoor LOS environment with actual WLAN devices. The evaluation results revealed that our localization system successfully estimated human location with overall accuracies of 93.51 and 92.65% in LOS and NLOS environments, respectively.
For each environment, we conducted the experiments on a single day, which implicitly excludes the influence of the passage of time. The experiments were also conducted under sunny conditions. We plan to evaluate the influence of the passage of time and of various weather conditions in future work.