Unknown On-body Device Position Detection Based on Ensemble Novelty Detection

In recent years, on-body device position recognition has attracted a lot of attention from the ubiquitous computing community with a view to providing reliable services to users. Existing work has focused on the recognition of classes included in a training dataset, and a new position that is unknown to the recognition system still cannot be handled. An unknown position should be handled appropriately to avoid incorrect behavior and to adapt to each user's way of carrying the device. In this article, we propose a new detection method based on the ensemble learning principle, in which the final result is obtained from a collection of judgments by weak novelty detectors. We devise a method of finding a threshold that maximizes overall accuracy, rather than relying on a mere majority vote. This method is evaluated with three datasets under various conditions to confirm the effectiveness of ensemble novelty detection and the threshold estimation method.


Introduction
People carry on-body devices in a variety of ways, including in pockets and bags. (1) The way a device is carried plays an important role in the usability of the device and in the quality of sensor-dependent services that facilitate human-human communication, reduce unnecessary energy consumption, and automatically select an appropriate notification method. (2)(3)(4) On-body device position recognition is therefore gaining attention in the ubiquitous computing community, where it is typically addressed with multiclass classification techniques in which the set of positions is fixed before use. (2,5-8) However, in reality, a device can be carried in a wide variety of positions, and a user may carry it in a position that the system was not designed for, i.e., an unknown position. In this case, the classifier produces an incorrect result because the current position of the device is outside the range of recognizable positions.
Therefore, the aim of this study is to detect unknown positions using novelty detection technology. When a sample that is difficult to classify into any known position type is observed in the use stage, it is not classified into a known position type but is instead judged as coming from an unknown position so that appropriate processing can be invoked. In our previous work, we proposed a framework to detect and add unknown positions through the active involvement of the user, (9) in which three types of single novelty detection methods were compared in terms of detection accuracy. In this article, we propose an ensemble novelty detection method based on the principle of ensemble learning, as used in random forest, (10) and apply it to unknown position detection. In the proposed method, novelty detection is performed with multiple weak detectors that are trained with different samples and accept different randomly selected features. The results from the weak detectors are integrated to obtain the final judgment of a known or unknown position. We propose a method of estimating an appropriate threshold for the judgment, rather than relying on a mere majority vote. This estimation method is designed to work without any user involvement, using only the dataset that is used to train the weak detectors. The feasibility of the proposed method is evaluated for unknown position detection in on-body device position recognition under various conditions, such as different datasets and combinations of known positions.
The remainder of this article is organized as follows. In Sect. 2, the work related to on-body position recognition and examples of novelty detection applications are examined. In Sect. 3, the proposed method is described with a preliminary experiment to assess the effectiveness of ensemble novelty detection, and a new way to detect unknown positions more accurately is devised. In Sect. 4, the evaluation method is presented. In Sect. 5, the experimental results and the effectiveness of the proposed method are discussed. Finally, in Sect. 6, the article is concluded.
Table 1 shows the storing positions of mobile devices found in the literature, which include trouser pockets, chest pockets, shoulder bags, handbags, and the hand. Here, the accuracy generally cannot be compared because experimental conditions and verification methods differ in each work. However, all of these works focus on classifying the position of the device into one of the predefined positions. Hence, the system misrecognizes data from an unknown position as one of the known positions. However, once the system knows that the data are obtained from an unknown position, it can take appropriate action, such as discarding the result and asking the user to label the data so that the detected position can be registered as a recognition target.

Table 1 Examples of on-body position recognition.
Literature: Positions supported in the work
Fujinami et al. (6): neck (hanging), chest pocket, jacket pocket, trouser front/back pockets, bag (backpack, handbag, and shoulder bag), hand (calling, watching the screen in the portrait direction, and swinging during walking)
Fujinami (2): neck (hanging), chest pocket, jacket pocket, trouser front/back pockets, bag (backpack, handbag, shoulder bag, and messenger bag)
Alanezi and Mishra (7): jacket pocket, trouser front/back pockets, desk, hand (calling, watching the screen in the portrait direction, and swinging during walking)
Shi et al. (11): chest pocket, trouser front/back pockets, hand
Bieshaar (12): jacket pocket, trouser front/back pockets, backpack
Sztyler and Stuckenschmidt (13): head, chest, upper arm, waist, forearm, thigh, shin
Wiese et al. (14): pocket, bag, hand, away from human
Yang and Wang (15): jacket pocket, trouser pocket, bag, hand

Novelty detection is a technique that determines if a test sample belongs to a known or trained class and consists of the following steps:
1. Novelty model construction: a set of samples belonging to known classes is provided and used to train a model of the known classes as an inverse representation of the novelty model.
2. Unknown sample detection: a test sample is projected to the model space and identified as a known or unknown sample.
Novelty detection is applied to various themes, such as images on the web, text mining, and network security. (16)(17)(18) Additionally, novelty detection using the inertial sensor data of on-body devices has been conducted in human activity recognition. Yin et al. studied human activity recognition in which unknown activities were detected using novelty detection. (19) In their work, five activities, including walking and running, were considered as known behaviors, and all other activities were classified as unknown activities. In addition, Guo et al. performed novelty detection based on human activity recognition using the inertial sensor data of an on-body device. (20) They considered six activities, including walking and running, and changed the combination of known and unknown activities. In our earlier work, (9) we attempted to detect unknown on-body device positions with a single novelty detector, which we consider the first attempt to solve the on-body device localization problem. In that work, three types of supervised novelty detection methods were applied in accordance with the categorization by Domingues et al.: (21) one-class support vector machine as a domain-based method, local outlier factor (LOF) as a density-based method, and isolation forest as an isolation-based method. We concluded that LOF is applicable to unknown position detection; however, its accuracy is insufficient and thus needs to be further improved. In this study, we attempt to improve the accuracy of unknown position detection with an ensemble of LOF-based novelty detectors.
Figure 1 illustrates the basic processing flow of the on-body device localization system. As described in Sect. 2, the novelty detector requires knowledge of the known classes and is trained in advance with the same data as the recognition component. Novelty detection works as a filter that feeds only the data (samples) from known positions to the position recognition component. This technique helps the component to avoid misclassifying data from an unknown position into one of n known positions. This is a benefit of novelty detection from the viewpoint of system reliability.
In addition to simply rejecting the data from unknown positions, the data judged as unknown can be used to expand the supported classes of the system, i.e., known positions, by the user's active involvement in labeling and retraining the recognition component. This is another benefit of novelty detection from the viewpoint of extensibility.
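The filtering role of the novelty detector described above can be illustrated with a minimal sketch. The names localize, toy_is_known, and toy_classify, as well as the toy decision rules, are hypothetical stand-ins for a trained novelty detector and position classifier; they are not the implementation used in this work.

```python
from typing import Callable, List, Optional, Sequence

Sample = Sequence[float]


def localize(sample: Sample,
             is_known: Callable[[Sample], bool],
             classify: Callable[[Sample], str]) -> Optional[str]:
    """Return a position label, or None when the sample is judged unknown."""
    if not is_known(sample):
        return None  # rejected by the novelty filter; never reaches the classifier
    return classify(sample)


# Toy stand-ins (assumptions for illustration only).
def toy_is_known(sample: Sample) -> bool:
    return abs(sample[0]) < 1.0


def toy_classify(sample: Sample) -> str:
    return "trouser pocket" if sample[0] < 0 else "chest pocket"


# Samples judged unknown are kept so that the user can label them later
# and the recognition component can be retrained (extensibility).
unknown_buffer: List[Sample] = []
labels = []
for s in ([-0.5], [0.3], [4.2]):
    label = localize(s, toy_is_known, toy_classify)
    if label is None:
        unknown_buffer.append(s)
    labels.append(label)
```

In this arrangement the classifier is only ever called on samples accepted by the filter, which corresponds to the reliability benefit, while the buffer of rejected samples corresponds to the extensibility benefit.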

Ensemble novelty detection
As one of the better-known ensemble methods for classification and regression, random forest has achieved a high accuracy in various areas. An ensemble method makes the final decision using the results of a number of weak classifiers (regressors). In novelty detection, the effectiveness of using multiple detectors has also been demonstrated. (22) In this article, we propose an ensemble novelty detection method for the accurate detection of unknown storing positions. We also demonstrate a method of adjusting an important parameter that maximizes the effect of ensemble novelty detection while eliminating the burden on end users.
In random forest, randomness is introduced in two stages of building weak classifiers: the selection of training samples and the selection of an attribute (feature) subset. In the proposed method, we follow this approach. Figure 2 illustrates the processing flow of ensemble novelty detection. S weak detectors are trained in advance; for each detector, the detection features are randomly chosen from the original feature set F, and the training data are drawn from the original training data labeled as known by sampling with replacement. In the detection phase, each detector performs novelty detection independently against a test sample (A), and the judgment of known or unknown is performed by integrating the results from each detector (B). Generally, in ensemble classification, the final results are determined by a simple majority vote that uses the results of multiple weak classifiers. However, in the proposed ensemble novelty detector, the method that makes judgments based on the results of multiple weak detectors is not limited to a simple majority vote. Instead, we introduce a constant T called the decision threshold, and a test sample is finally judged as unknown if the number of weak detectors with the judgment of known is less than T; otherwise, it is judged as known. In other words, with S weak detectors, S different final judgments can be generated depending on T. The majority vote is a special case with T = S/2. The next section presents a preliminary experiment carried out to determine the relationship between T and an evaluation parameter.
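As an illustration, the training and integration steps can be sketched with the LocalOutlierFactor class of scikit-learn used in novelty mode. This is a sketch under stated assumptions rather than the implementation used in this work: the function names, the synthetic Gaussian data, and the use of 20 detectors are hypothetical, and the feature subset size is taken as the square root of the number of features, following the common random forest convention.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor


def train_ensemble(X_known, n_detectors=100, n_features=None,
                   contamination=0.05, seed=0):
    """Train weak LOF detectors on bootstrap samples and random feature subsets."""
    rng = np.random.default_rng(seed)
    n_samples, n_total = X_known.shape
    if n_features is None:
        n_features = max(1, int(np.sqrt(n_total)))
    ensemble = []
    for _ in range(n_detectors):
        # Random feature subset (without replacement within one detector).
        feats = rng.choice(n_total, size=n_features, replace=False)
        # Training samples drawn with replacement (bootstrap).
        rows = rng.integers(0, n_samples, size=n_samples)
        det = LocalOutlierFactor(novelty=True, contamination=contamination)
        det.fit(X_known[np.ix_(rows, feats)])
        ensemble.append((feats, det))
    return ensemble


def count_known_votes(ensemble, X):
    """For each test sample, count the weak detectors that judge it as known."""
    votes = np.zeros(len(X), dtype=int)
    for feats, det in ensemble:
        votes += (det.predict(X[:, feats]) == 1)  # +1 means inlier (known)
    return votes


def judge_unknown(votes, T):
    """A sample is judged unknown if fewer than T detectors judged it known."""
    return votes < T


# Synthetic data (assumption): known samples around 0, unknown samples around 8.
rng = np.random.default_rng(1)
X_known = rng.normal(0.0, 1.0, size=(200, 16))
X_test_known = rng.normal(0.0, 1.0, size=(50, 16))
X_test_unknown = rng.normal(8.0, 1.0, size=(50, 16))

ensemble = train_ensemble(X_known, n_detectors=20)
votes_known = count_known_votes(ensemble, X_test_known)
votes_unknown = count_known_votes(ensemble, X_test_unknown)
majority = judge_unknown(votes_unknown, T=len(ensemble) // 2)
```

Keeping the vote counts separate from the thresholding step makes it possible to evaluate every value of T without retraining or re-running the weak detectors.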

Preliminary experiments
This section presents preliminary experiments to determine the characteristics of ensemble novelty detection.

Dataset and preprocessing
Three datasets were collected from smartphones carried in a wide variety of ways, such as in trouser pockets, in the hand, or in a handbag (Table 2). Datasets A and B (2,6) were collected in the authors' laboratory, whereas dataset C (23) is a publicly available dataset. All datasets are composed of three-axis accelerometer signals. The features designed for position recognition were used for each dataset; examples include the mean and standard deviation in the time domain and the entropy and energy in the frequency domain.

Evaluation parameter
As shown in Sect. 3.2, T contributes to novelty detection performance. Normalized accuracy (NA) was used as an evaluation parameter, which is represented by Eqs. (1)-(3). NA is the average of the true positive rate (TPR) and true negative rate (TNR). Here, the decision of unknown is assumed as positive and the decision of known is negative. Thus, the TPR indicates the ratio of samples correctly judged as unknown to the total number of unknown samples. By contrast, the TNR is represented by the ratio of samples correctly judged as known to the total number of actually known samples. Meanwhile, global accuracy (GA) is represented by the ratio of samples correctly judged as unknown and known to the total number of samples [Eq. (4)]. Compared with GA, NA is considered a more effective parameter when a difference is observed between the numbers of unknown and known samples.
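Eqs. (1)-(4) are not reproduced in this text. Based on the verbal definitions above, they can be read as TPR = TP/(TP + FN), TNR = TN/(TN + FP), NA = (TPR + TNR)/2, and GA = (TP + TN)/(TP + TN + FP + FN), with "unknown" as the positive class. The following minimal sketch computes these quantities; the function name and the example counts are hypothetical.

```python
def detection_metrics(true_unknown, pred_unknown):
    """Compute TPR, TNR, NA, and GA with 'unknown' as the positive class.

    Both arguments are sequences of booleans (True means unknown).
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(true_unknown, pred_unknown):
        if truth and pred:
            tp += 1
        elif truth and not pred:
            fn += 1
        elif not truth and not pred:
            tn += 1
        else:
            fp += 1
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    na = (tpr + tnr) / 2
    ga = (tp + tn) / (tp + tn + fp + fn)
    return {"TPR": tpr, "TNR": tnr, "NA": na, "GA": ga}


# Imbalanced example (assumption): 10 unknown and 90 known samples,
# and a detector that judges every sample as known.
truth = [True] * 10 + [False] * 90
pred = [False] * 100
metrics = detection_metrics(truth, pred)
# GA is 0.9 although no unknown sample is detected, whereas NA is 0.5.
```

The example shows why NA is preferred when the numbers of known and unknown samples differ: a detector that never reports an unknown position still reaches a high GA but only a chance-level NA.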

Effect of ensemble novelty detection
We first examined the relationship between T and NA by changing T in dataset A. There were two groups of 35 subjects (70 in total). The data obtained from the subjects in the first and second groups were used as the training and test data, respectively. The average of the results obtained from the 35 test subjects was taken as the result of the experiment.
The experimental system was implemented using scikit-learn 0.20.3, a Python machine learning library. In the random forest implementation of scikit-learn, the default value for the number of features in weak classifiers is the square root of the original number of features. Following this convention, the size of the feature vector randomly selected for each weak detector, F′, is five (≈ √30). Moreover, the number of weak detectors, S, was set to 100. The fixed set of known positions was composed of the following: chest pocket, jacket pocket, trouser front pocket, trouser back pocket, backpack, and swinging during walking, which are underlined in Table 2.
Figure 3 shows the change in NA when T was varied. NA NonEns is the case where a single detector is used. At T = 63, NA in the ensemble case had a maximum value of 0.864, which is an increase of 0.186 compared with NA NonEns (0.678). In addition, NA for the majority vote (T = 50) was 0.818, which is 0.046 lower than the maximum value. These results suggest that the majority vote is not the best solution for the ensemble method and that NA can be greatly increased by setting an appropriate T.
Hereafter, the T with the highest NA is represented as T max, whereas the T obtained by the majority vote is represented as T mv. Moreover, NA(T) represents NA at T. In Sect. 3.4, we describe a method of estimating T max as a hyperparameter.
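The search for T max can be sketched as a sweep over all candidate thresholds once the number of "known" votes per test sample is available. The function names and the small vote lists below are hypothetical; ties are resolved in favor of the smallest T, which is an assumption of this sketch.

```python
def na_at_threshold(votes_known, votes_unknown, T):
    """NA for decision threshold T, given per-sample counts of 'known' votes."""
    # A sample is judged unknown if fewer than T detectors judged it known.
    tnr = sum(v >= T for v in votes_known) / len(votes_known)
    tpr = sum(v < T for v in votes_unknown) / len(votes_unknown)
    return (tpr + tnr) / 2


def sweep_threshold(votes_known, votes_unknown, S):
    """Return (T_max, {T: NA(T)}) for T = 1..S."""
    na_by_T = {T: na_at_threshold(votes_known, votes_unknown, T)
               for T in range(1, S + 1)}
    T_max = max(na_by_T, key=na_by_T.get)  # first maximum, i.e., smallest T
    return T_max, na_by_T


# Toy example with S = 10 weak detectors (vote counts are made up).
S = 10
votes_known = [10, 9, 9, 8, 7]    # samples that are truly known
votes_unknown = [6, 5, 3, 7, 2]   # samples that are truly unknown
T_max, na_by_T = sweep_threshold(votes_known, votes_unknown, S)
T_mv = S // 2
# T_max is 7 with NA = 0.9, whereas the majority vote (T_mv = 5) gives NA = 0.7.
```

As in Fig. 3, the toy example has its best threshold above S/2 because the weak detectors tend to accept unknown samples more often than they reject known ones.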

Effect of the number of known classes
In addition, we examined the effect of the number of known classes on T max by changing the combination of known positions. In addition to dataset A, datasets B and C were also used. Similarly to dataset A, the size of the feature vector for each weak detector (F′) is the square root of the size of the original feature set |F|. Thus, eight and four features were randomly selected in the experiments for datasets B and C, respectively.
The evaluation was carried out using a leave-p-persons-out cross-validation (LpPO) scheme, in which data from p persons were held out for the test and the data from the other persons were used to train the ensemble novelty detector. The case with p = 1 is a special case called leave-one-person-out cross-validation (LOPO-CV), which is a popular evaluation method in machine learning. However, to reduce the time of evaluation, we took half of the persons in the dataset as p and performed the evaluation twice by swapping the roles of training and testing. Thus, data from 70, 20, and 10 persons were all used for training and testing for datasets A, B, and C, respectively. The average of the two tests was used as the evaluation result.
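The half-split evaluation with swapped roles can be sketched as follows. How the subjects were assigned to the two halves is not stated, so the split by sorted subject identifier is an assumption, and the function names and the placeholder evaluate callback are hypothetical.

```python
def half_split_folds(subject_ids):
    """Split subjects into two halves and return both (train, test) assignments."""
    ids = sorted(set(subject_ids))
    half = len(ids) // 2
    first, second = ids[:half], ids[half:]
    # The second fold swaps the roles of training and testing.
    return [(first, second), (second, first)]


def two_fold_average(subject_ids, evaluate):
    """Average an evaluation score over the two swapped folds.

    evaluate(train_ids, test_ids) is any callable that returns a score
    such as NA; it stands in for training and testing the detector.
    """
    scores = [evaluate(train, test)
              for train, test in half_split_folds(subject_ids)]
    return sum(scores) / len(scores)


# Example with 10 subjects, as in dataset C.
folds = half_split_folds(range(1, 11))
# Placeholder score (assumption): fraction of subjects used for testing.
avg = two_fold_average(range(1, 11), lambda train, test: len(test) / 10)
```

Splitting by person rather than by sample keeps each person's data entirely on one side of the split, so the score reflects performance on users who were not seen during training.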
Various combinations of known positions were examined. Each dataset is divided into a group of known positions and a group of unknown positions, in which the data from known positions were used to train the ensemble novelty detector and the data from other positions were used to test the detector. As described above, we used the LpPO approach, and the data for training consisted of the known positions obtained only from half of the subjects, that is, the persons in the training group, whereas the data for the test included the known and unknown positions obtained from the other half as a realistic condition. Ideally, all possible combinations should be used as known positions in the evaluation. However, in the case of dataset A, the total number of combinations was huge, i.e., 2035. Therefore, only a subset of positions, selected with reference to Table 1, was regarded as the candidates for known positions; these are underlined in Table 2. k positions were used as known positions, and the rest were considered unknown positions. In total, 57, 26, and 25 sets of known positions were formed for datasets A, B, and C, respectively. As the number of classes in dataset C is small, i.e., five, at most four positions were used as known.
Table 3 shows the average T max for different numbers of known positions when the number of weak detectors is 100. In datasets A and B, the differences between T max with k known positions and T max with (k − 1) known positions are within 2.0 to 3.9 and 1.8 to 4.2, respectively, which we consider small. Exceptionally, in dataset C, the difference in T max between k = 3 and k = 4 is 9.1. We consider that this is due to the small number of subjects in the training dataset, i.e., five. Therefore, although exceptions are present, T max obtained with k known positions and T max obtained with (k − 1) known positions can be regarded as close. In the next section, we present a method of estimating T max based on this principle.
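The formulas behind the numbers of combinations are not legible in this text. The sketch below shows one reading that reproduces the reported counts; it is an assumption, not a statement of how the counts were originally derived. It assumes that at least two positions are always known, that dataset A contains 11 positions of which six are candidates (the six known positions listed for the preliminary experiment), and that datasets B and C each have five candidates, with at most four known positions in dataset C.

```python
from math import comb


def n_known_sets(n_candidates, k_min, k_max):
    """Number of ways to choose k known positions for k = k_min..k_max."""
    return sum(comb(n_candidates, k) for k in range(k_min, k_max + 1))


# All combinations in dataset A, assuming 11 positions, at least two known
# positions, and at least one unknown position.
all_sets_a = n_known_sets(11, 2, 10)   # 2035

# Restricted candidate sets (the candidate counts are assumptions, see above).
sets_a = n_known_sets(6, 2, 6)         # 57
sets_b = n_known_sets(5, 2, 5)         # 26
sets_c = n_known_sets(5, 2, 4)         # 25
```

Under these assumptions, restricting dataset A to six candidate positions reduces the number of detector ensembles to be trained and evaluated from 2035 to 57.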

Algorithm for estimating T max
As shown in Sect. 3.3.3, if we know T max in advance, then the system can fully benefit from the ensemble novelty detection. However, in practice, unknown classes cannot be included in the dataset for evaluation. Therefore, we propose a method of estimating T max using only training samples of known classes. The proposed estimation method is based on the principle shown in Sect. 3.3.4: T max with k known positions is close to T max with (k − 1) known positions. Let k, n, and S be the number of known positions, the number of subjects, and the number of weak detectors, respectively. The algorithm used to estimate T max is as follows.
Step 1: In one weak detector, calculate NA by using the data from one subject as test data and the data from the remaining (n − 1) subjects as training data, also regarding one of the positions as an unknown position and the remaining (k − 1) positions as known positions. Figure 4 shows a state where Subject 1 is the test subject and the other subjects are used to train the detector. Also, Position 1 is set to the unknown position.
Step 2: Perform Step 1 by changing the weak detector; iterate until all weak detectors have been evaluated, i.e., S times.
Step 3: Obtain S final (integrated) results based on the S results of weak detectors by changing T, and calculate NA for each T.
Step 4: Perform Steps 1 to 3 with a different test subject; iterate until all (n) subjects in the dataset are tested.
Step 5: For each T, calculate the average of the n NAs, and identify T max, i.e., the T at which the calculated average NA is the highest.
Step 6: Perform Steps 1 to 5 while changing the unknown position; iterate k times. Figure 5 shows a state where Position 2 is the unknown position.
Step 7: Calculate the average of the k values of T max obtained in Step 5 and regard it as the estimate of T max.
The estimate thus corresponds to the T that gives the highest average NA, averaged over all combinations of (k − 1) known positions and calculated using the LOPO-CV scheme. The steps described above need to be performed only when the number of known classes is changed. Therefore, an end user does not need to do anything unless a new storing position is registered to the recognition targets during use. We distinguish the estimated T max from the true T max by referring to the estimated one as T est.
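Steps 1 to 7 can be sketched as follows. The sketch is not the implementation used in this work: the data layout, the function names, and in particular toy_vote_fn, a distance-based stand-in for the ensemble of LOF-based weak detectors, are hypothetical. Ties in Step 5 are resolved in favor of the smallest T, which is also an assumption.

```python
def _mean(values):
    values = list(values)
    return sum(values) / len(values)


def estimate_threshold(data, positions, n_detectors, vote_fn):
    """Estimate the decision threshold using known-class data only.

    data[subject][position] is a list of samples. vote_fn(train, test, S)
    returns, for each test sample, how many of S weak detectors judged it known.
    """
    subjects = sorted(data)
    thresholds = range(1, n_detectors + 1)
    t_max_values = []
    for pseudo_unknown in positions:                        # Step 6
        known = [p for p in positions if p != pseudo_unknown]
        na_per_t = {t: [] for t in thresholds}
        for test_subject in subjects:                       # Step 4
            train = [x for s in subjects if s != test_subject
                     for p in known for x in data[s][p]]
            test_known = [x for p in known for x in data[test_subject][p]]
            test_unknown = list(data[test_subject][pseudo_unknown])
            # Steps 1-2: judgments of all S weak detectors for every test sample.
            votes = vote_fn(train, test_known + test_unknown, n_detectors)
            v_known = votes[:len(test_known)]
            v_unknown = votes[len(test_known):]
            for t in thresholds:                            # Step 3
                tnr = _mean(v >= t for v in v_known)
                tpr = _mean(v < t for v in v_unknown)
                na_per_t[t].append((tpr + tnr) / 2)
        avg_na = {t: _mean(nas) for t, nas in na_per_t.items()}  # Step 5
        t_max_values.append(max(avg_na, key=avg_na.get))
    return _mean(t_max_values)                              # Step 7


def toy_vote_fn(train, test, n_detectors):
    """Stand-in ensemble: detector i accepts a 1-D sample as known if it lies
    within a radius of 0.5 * (i + 1) of some training sample."""
    votes = []
    for x in test:
        nearest = min(abs(x - y) for y in train)
        votes.append(sum(nearest < 0.5 * (i + 1) for i in range(n_detectors)))
    return votes


# Toy data: three subjects, three positions, one 1-D sample each.
centers = {"a": 0.0, "b": 1.25, "c": 2.5}
offsets = {"s1": 0.0, "s2": 0.1, "s3": 0.2}
data = {s: {p: [c + off] for p, c in centers.items()}
        for s, off in offsets.items()}
T_est = estimate_threshold(data, list(centers), 4, toy_vote_fn)
# T_est is 3.0 here, whereas the majority vote would use T = 4 / 2 = 2.
```

Because each known position is treated as a pseudo-unknown position in turn, the procedure needs no samples from genuinely unknown positions and can be rerun automatically whenever a new position is registered.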

Evaluation Methodology
The proposed ensemble novelty detection method was evaluated using the three datasets in more depth. This section provides a description of the method, followed by the results and discussion in Sect. 5. Unless noted otherwise, the number of weak detectors (S) and the contamination (cont), a parameter in LOF, were set to 100 and 0.05, respectively.

Basic performance of the proposed method
The basic detection performance of the proposed method was evaluated using the three datasets shown in Table 2. The evaluation was carried out in the same manner as in Sect. 3.3.4. Here, NA(T est ) was also calculated.

Effectiveness against parameter variations
The effectiveness of the ensemble novelty detection and the T max estimation method was evaluated in depth against two parameters: the number of weak novelty detectors (S) and the contamination parameter in the LOF-based novelty detector.
Generally, in ensemble classification, accuracy increases with the number of weak classifiers and converges at a certain level. (6) Thus, the number of weak detectors was changed to 20 and 50 to confirm this tendency in ensemble novelty detection. At the same time, the feasibility of the ensemble novelty detection and the proposed T max estimation method for various numbers of weak detectors was evaluated.
Furthermore, in LOF, cont is an important parameter that represents the novelty threshold. If cont increases, then the test sample is more likely to be judged as unknown. Here, we set the value to 0.01 and 0.10 to verify the effectiveness of the proposed method for various cont values.
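The role of cont can be illustrated with a single LOF-based detector from scikit-learn in novelty mode. The synthetic data and the helper name unknown_fraction are hypothetical and chosen only for illustration; the three contamination values are those used in the evaluation.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Synthetic data (assumption): test samples are more spread out than training samples.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(300, 5))
X_test = rng.normal(scale=1.5, size=(200, 5))


def unknown_fraction(cont):
    """Fraction of test samples that a single LOF detector judges as unknown."""
    lof = LocalOutlierFactor(novelty=True, contamination=cont)
    lof.fit(X_train)
    # predict returns -1 for outliers (unknown) and +1 for inliers (known).
    return float(np.mean(lof.predict(X_test) == -1))


fractions = {cont: unknown_fraction(cont) for cont in (0.01, 0.05, 0.10)}
```

In scikit-learn, contamination only shifts the threshold applied to the LOF score, so the fraction of samples judged unknown cannot decrease as cont increases for a fixed training set.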

Effectiveness of the proposed estimation method
Although the experiment in Sect. 4.1 covered all possible combinations of known classes in dataset A, here we pick the case that we used in the preliminary experiment in Sect. 3.3.3 to examine the accuracy of T est estimation in a concrete example. In this case, the combination of known positions consists of the following: chest pocket, jacket pocket, trouser front pocket, trouser back pocket, backpack, and swinging during walking. In Fig. 6, we add T est and NA(T est) to Fig. 3 to compare different T values and the corresponding NAs. The figure shows that T est is much closer to T max than T mv is, which indicates that the proposed estimation method performed better than the mere majority vote in this case.
Figure 7 shows the overall experimental results, i.e., the averages of 57, 26, and 25 combinations of known positions for datasets A, B, and C, respectively. A similar tendency can be found in datasets A and B: NA(T est) is closer to NA(T max) than NA(T mv) and NA NonEns are. However, in dataset C, NA(T mv) is larger than NA(T est). This is because the number of subjects in the training data (five) was too small to construct the novelty model well. The results discussed in Sect. 3.3.4 also show that the underlying assumption required for applying the T est estimation method did not hold for this dataset. For the other two datasets, a decision threshold that is close to the ideal one (T max) can be successfully estimated from the data from persons whose data are not used for training the novelty detectors.

Effects of features on ensemble novelty detection
Similarly to a general classification task, the number and types of features are considered to affect the performance of novelty detection. In this study, different features were used for each of the three datasets; however, a commonly effective set of features may exist for the same task, i.e., on-body device position recognition. Finding such a feature set is a subject for future work. Meanwhile, in the proposed ensemble method, each weak detector detects unknown classes using a different subset of features, which means that the novelty models are generated in different feature spaces. In general, it is difficult to identify in advance a feature set that is not suitable for detecting a given unknown class. A single detector (NonEns) may fail to detect a novel class with its particular set of features. By contrast, in a collection of weak detectors working in heterogeneous feature spaces, some of them perform correctly and others do not. If the number of correctly performing detectors is above the decision threshold, the overall detection is correct. We consider that this is a reason why the ensemble methods using T mv, T est, and T max showed higher NAs than NA NonEns, as shown in Fig. 7.

Robustness against the number of weak detectors
Figure 8 shows the results obtained when the numbers of weak detectors (S) are 20, 50, and 100. As S increases, so does NA(T max), although the rate of increase is low, which indicates a tendency similar to that of general ensemble classification. Moreover, NA(T est) in datasets A and B is closer to NA(T max) than NA(T mv) is and is also much higher than NA NonEns. NA(T est) in dataset C also increases with S; however, it is lower than NA(T mv), as discussed above. Therefore, the proposed estimation method is applicable even if the number of weak detectors is changed; however, there is a case in which NA(T est) is lower than NA(T mv).
As shown above, NA(T est) increases with the number of weak detectors; however, increasing the number of weak detectors also increases the requirements for memory space and processing speed. Thus, a trade-off exists between the number of weak detectors and these requirements. Therefore, an appropriate number of weak detectors should be determined depending on the desired NA and the available computational power.

Robustness against novelty threshold (contamination)
Figure 9 shows the results obtained when the cont values are 0.01, 0.05, and 0.10. As shown in this figure, NA NonEns markedly changes with cont, whereas the variations in NA(T max) and NA(T est) are not as large as that in NA NonEns. This finding suggests that the ensemble novelty detection is less affected by cont. Furthermore, the calculated NA(T est) is closer to NA(T max) than NA(T mv) is for all cont values for datasets A and B and in the case of cont = 0.01 for dataset C. Therefore, the proposed estimation method is effective in most cases.
The important point is that the effects of the proposed ensemble novelty detection and T est estimation are reasonably independent of the internal parameter of the LOF-based single novelty detector, which implies that the adjustment of cont is not a critical task.

Conclusion
We proposed an ensemble novelty detection method and applied it to the on-body device position recognition problem. This method integrates the results from multiple weak novelty detectors to infer whether a device is located in a known or unknown position. We devised a method of estimating the decision threshold for the known position that maximizes the overall accuracy of novelty detection (T max). This method was evaluated using three datasets while changing parameters such as the combination of positions regarded as known, the number of weak detectors, and the hyperparameter of the LOF-based novelty detector, i.e., contamination. For various conditions, we confirmed that the ensemble novelty detection with the estimated decision threshold outperformed a single novelty detector in all three datasets. The accuracy obtained with the estimated threshold (T est) was comparable to that obtained with T max and was higher than that of a mere majority vote in two datasets; however, the dataset with a relatively small number of persons in the training data (five) showed a lower accuracy than that obtained with the majority vote. We also confirmed that the detection accuracy increased with the number of weak detectors (S), while there was little change in the accuracy with respect to contamination (cont). Thus, S is the dominant parameter in the proposed method.
We consider that, in addition to on-body position recognition, the proposed estimation method is applicable to a wide variety of novelty detection problems in which the training data are obtained from multiple subjects and constitute multiple classes, such as human daily activity recognition and fall detection. In the future, the applicability will be verified in such domains.