Novel and Robust Vision- and System-on-chip-based Sensor for Fall Detection

In this paper, we propose a novel and robust vision- and system-on-chip (SoC)-based system as a sensor to effectively detect falls of the elderly. The proposed method consists of five steps: initial light stability confirmation, gradient-difference-based foreground detection, dilation- and multiframe-based foreground construction, false fall detection problem solving, and fall detection determination with a general-purpose input/output-based fall warning transmission. Comprehensive experiments on real test videos demonstrate the low power, low hardware cost, and high detection accuracy of the proposed method compared with related fall detection methods.


Introduction
According to Ref. 1, 16% of the world's population will be aged 65 or older in 2050. The main reason is that the number of elderly people is increasing owing to improvements in health care and medical advancements. In Taiwan, the percentage of people aged 65 and above was 13.9% at the end of 2017 and is expected to reach 20% in 2026. (2) Statistical data indicate that, on average, 28-35% of the elderly fall at least once a year, and these falls often lead to serious injuries, especially when they occur at home. (3) If a fall detection signal can be sent out in time to call for first aid once an elderly person falls, the person can be rescued. Previous fall detection methods can be classified into four categories: wearable-device-based, ambiance-sensor-based, RGB-D [red-green-blue (RGB) color camera plus depth sensor]-camera-based, and color-camera-based. In the following brief survey of related works, we also point out their weaknesses.

Related work
In the wearable-device-based category, the user carries a wearable device or a smartphone. Pierleoni et al. (4) proposed a fall detection method in which the hardware comprises a slim battery, a wireless receiver, a microprocessor, and a sensor that gathers information from a triaxial accelerometer, a gyroscope, and a magnetometer. The collected data are then analyzed by the proposed fall detection method. In Kau and Chen's fall detection method, (5) the smartphone used includes an e-compass and a triaxial accelerometer. From the gathered information, the discrete wavelet transform and support vector machine techniques are used to detect a fall event. Cheffena (6) proposed a smartphone-audio-feature-based fall detection method. The main constraint of these wearable-device-based fall detection methods is that it is inconvenient for the elderly to carry a wearable device all day.
In the ambiance-sensor-based category, Su et al. (7) set up a Doppler sensor under the ceiling and proposed a coarse-to-fine-based discrete wavelet transform approach to detect falls. Shiba et al. (8) proposed a hidden Markov model approach to measure the frequency distribution of vertical velocity trajectories in the foreground for determining the abrupt change occurring in a fall event. Erol et al. (9) proposed a fall detection method using two Doppler sensors and the fusion-based approach to achieve better performance. The main constraint in the above Doppler-sensor-based approach is that the feasible distance is quite small from the sensor to the foreground.
In the RGB-D-camera-based category, the hardware mainly includes a Microsoft Kinect system and a personal computer (PC). Abobakr et al. (10) proposed a skeleton-free fall detection method using the random decision forest model constructed from training depth maps; the fall event is finally determined by the support vector machine technique. Bian et al. (11) employed an enhanced randomized decision tree in fall detection such that human joint information can be extracted, and the joint trajectory information is then used to detect the fall event. Kong et al. (12) proposed a multistep fall detection method. For fall detection, one merit of the RGB-D-camera-based approach is its independence from light changes, even in a dark room. However, the RGB-D camera, as well as the bundled PC, is expensive and takes up space.
In the color-camera-based category, the hardware used mainly includes a color digital camera and a PC. Liao et al. (13) combined a Bayesian belief network model, the motion activity measure, and human silhouette shape variations to solve the fall detection problem. Lin et al. (14) first applied the Gaussian mixture model and the ellipse approach to approximate the foreground, and then detected a fall event using the angular orientation and motion acceleration information. Nguyen et al. (15) approximated the foreground using an ellipse and the smallest rectangle. According to the angle of the ellipse and the ratio of the height to the width of the rectangle, a rule-based classification scheme is proposed to detect a fall event. Abdelhedi et al. (16) first applied the Type-2 fuzzy Gaussian mixture model to extract the foreground and then analyzed the human fall behavior by fuzzy logic to detect a fall event. Although the computational time is short, the camera and bundled PC are still not cheap for consumers.
Currently, on the basis of a low-powered and low-cost system-on-chip (SoC) framework, Chung et al. (17) first applied the Gaussian mixture model to estimate the background model and then generated the foreground model by subtracting the estimated background from the current image frame. Furthermore, according to the variation of the centroid, a fall detection procedure is carried out. Experimental data indicated that there is room for improvement in its accuracy and execution time, especially in a complicated environment. Besides the method in Ref. 17, the three methods in Refs. 14-16 are also included in the comparative methods.

Motivation
The main motivation of this paper is twofold: (1) to develop an effective vision- and SoC-based fall detection method with low power, low hardware cost, and high detection accuracy to overcome the limitations existing in the above-mentioned four fall detection categories, and (2) to design a more robust foreground construction process to improve our preliminary result, (17) thereby improving the accuracy of the fall detection method.

Contributions
In this paper, we propose a new, effective vision- and SoC-based fall detection method that consists of five steps: initial light stability confirmation, gradient-difference-based foreground detection, dilation- and multiframe-based foreground construction, false fall detection problem solving, and fall detection determination associated with a general-purpose input/output-based fall warning transmission. In particular, the proposed gradient-difference-based foreground detection step, which is robust against light changes, and the dilation- and multiframe-based foreground construction step can robustly construct the foreground model. Because the extracted foreground is more complete, not only can the fall detection accuracy be increased, but the false fall detection problem caused by a squat posture can also be resolved. Comprehensive experiments on real test videos justify the low power, low hardware cost, and high detection accuracy of the proposed fall detection method compared with related methods. (14)(15)(16)(17)
The rest of this paper is organized as follows. In Sect. 2, the SoC-based fall detection system setting in a home room is presented. In Sect. 3, the proposed fall detection method is presented. In Sect. 4, the experimental results are reported. In Sect. 5, some conclusions are presented.

Vision- and SoC-based Fall Detection System Setting in a Room
The room environment simulated in the experiment is depicted in Fig. 1(a). As depicted in Fig. 1(b), the proposed vision- and SoC-based fall detection system contains two hardware components: the SoC 'Hisilicon-Hi3516CV300' with a size of 37 × 37 mm² and the bundled camera 'M12-4IR(3MP)-C'. They are combined and arranged at location A, as depicted in Fig. 1(c). With our fall detection method embedded in the SoC as a fall detection sensor, the power consumption of the proposed sensor is only 0.5 W.
Note that the three comparative methods (14)(15)(16) were implemented in a PC-based system with an Intel i7-8700 CPU at 3.2 GHz and 24 GB of RAM, whose power consumption is 70 W; this indicates the low power merit of our vision- and SoC-based fall detection system. Using a general-purpose input/output transmitter, our method sends out an alert signal to call for help when a fall event is detected by the proposed fall detection sensor.
The specifications of the proposed system are as follows. The CPU in the SoC is an ARM926 at 800 MHz with a 32 KB I-Cache, a 32 KB D-Cache, 256 MB of RAM, and 128 MB of ROM. The operating system is 32-bit Hisilicon Linux, and the HiLinux kernel is developed on the basis of the standard Linux kernel V3.18.y. Programs are developed against the C standard library glibc-2.20. The power supply is via electrical wiring.

Proposed Fall Detection Method
Because the Gaussian mixture model for constructing the foreground model (17) is time-consuming and the constructed foreground model is sometimes incomplete, we propose a new gradient-difference-based foreground detection approach and a dilation- and multiframe-based approach to construct a more complete foreground model. As shown in Fig. 2, the entire proposed fall detection method consists of the following five steps.

Step 1: Initial light stability confirmation
Let $I_t$ and $I_{t-1}$ denote the two images at time instances $t$ and $t-1$, respectively. To test whether the light condition has stabilized before starting the formal fall detection task, we compute the absolute difference $D_{avg}$ between the average gray value of $I_t$ and that of $I_{t-1}$, which can be calculated efficiently by the following histogram-based method:

$$D_{avg} = \left| \frac{1}{320 \times 240} \sum_{i=0}^{255} i\,H_t(i) - \frac{1}{320 \times 240} \sum_{i=0}^{255} i\,H_{t-1}(i) \right| \qquad (1)$$

where $H_t$ and $H_{t-1}$ denote the histograms of $I_t$ and $I_{t-1}$, respectively, $i$ indexes the gray levels, and each image has 320 × 240 pixels. The main reason why we adopt the histogram-based approach to calculate $D_{avg}$ in Eq. (1) is that the histogram of each image can be quickly calculated by calling the built-in function 'HI_MPI_IVE_Hist' in the SoC (Table 1). We then examine whether $D_{avg}$ is less than a specific threshold, which is empirically set to 100. If not, the light condition has not yet stabilized and we repeat Step 1; otherwise, we go to Step 2.
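The stability test of Step 1 can be illustrated with a minimal sketch in plain Python. It does not call 'HI_MPI_IVE_Hist'; the function names `histogram`, `mean_gray_from_hist`, and `light_is_stable` are hypothetical, images are assumed to be lists of rows of 8-bit gray values, and the pixel count is taken from the histogram itself rather than fixed at 320 × 240.

```python
def histogram(image, levels=256):
    """Count how many pixels take each gray level (image is a list of rows)."""
    hist = [0] * levels
    for row in image:
        for value in row:
            hist[value] += 1
    return hist


def mean_gray_from_hist(hist):
    """Average gray value recovered from a histogram: sum(i * H(i)) / pixel count."""
    total_pixels = sum(hist)
    return sum(level * count for level, count in enumerate(hist)) / total_pixels


def light_is_stable(prev_image, curr_image, threshold=100):
    """Return (stable, d_avg); the default threshold follows the value quoted in the text."""
    d_avg = abs(mean_gray_from_hist(histogram(curr_image))
                - mean_gray_from_hist(histogram(prev_image)))
    return d_avg < threshold, d_avg
```

Working from histograms means each frame is scanned once to build the counts, after which the mean is a sum over at most 256 bins.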

Step 2: Gradient-difference-based foreground detection
To remove noise, we perform the erosion operation function 'HI_MPI_IVE_Erode' on the current image frame. Next, we perform the Sobel edge detection function 'HI_MPI_IVE_Sobel' on the current image frame $I_t$ to obtain the gradient map $G_t$, and then we apply the built-in function 'HI_MPI_IVE_Sub' to calculate the gradient difference map between $G_t$ and $G_{t-1}$. Because it is computed from gradients rather than raw intensities, the gradient difference map retains foreground motion information while being less sensitive to light changes. Furthermore, we apply the built-in function 'HI_MPI_IVE_Thresh' to remove the pixels with weak motion information in the gradient difference map, obtaining the coarse foreground $F_t$ in $I_t$.
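A minimal plain-Python sketch of the gradient-difference idea follows. It is a stand-in for the IVE calls, not the authors' implementation: the gradient magnitude is approximated as |Gx| + |Gy|, the subtraction is taken as an absolute difference, the threshold of 50 is a hypothetical value, and the erosion step is omitted for brevity.

```python
def sobel_magnitude(image):
    """Approximate gradient magnitude |Gx| + |Gy|; border pixels are left at 0."""
    h, w = len(image), len(image[0])
    grad = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (image[y - 1][x + 1] + 2 * image[y][x + 1] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y][x - 1] - image[y + 1][x - 1])
            gy = (image[y + 1][x - 1] + 2 * image[y + 1][x] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y - 1][x] - image[y - 1][x + 1])
            grad[y][x] = abs(gx) + abs(gy)
    return grad


def coarse_foreground(prev_image, curr_image, threshold=50):
    """Binary mask of pixels whose gradient changed strongly between two frames."""
    g_prev = sobel_magnitude(prev_image)
    g_curr = sobel_magnitude(curr_image)
    return [[1 if abs(c - p) > threshold else 0 for p, c in zip(row_p, row_c)]
            for row_p, row_c in zip(g_prev, g_curr)]
```

A uniform brightness shift leaves every gradient unchanged, so it produces an empty mask, whereas a newly appearing object changes the gradients around its boundary.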

Step 3: Dilation- and multiframe-based foreground construction
To obtain a fine foreground model, the coarse foreground model obtained in Step 2 at time instance $t$ is fused with its predecessors by performing the logical OR operation 'HI_MPI_IVE_Or' on the five consecutive foreground models at time instances $t$, $t-1$, ..., and $t-4$. The resultant foreground model, still denoted by $F_t$, is given by

$$F_t = F_t \cup F_{t-1} \cup F_{t-2} \cup F_{t-3} \cup F_{t-4} \qquad (2)$$

Because $F_t$ still has several holes, the built-in function 'HI_MPI_IVE_Dilate' is performed on $F_t$ to dilate each connected component in $F_t$, obtaining a more complete foreground model $F_t'$. For example, given the image frame in Fig. 3(a), after performing Step 2 and solving Eq. (2) for Fig. 3(a), the constructed foreground model $F_t'$ is depicted by the white pixels in Fig. 3(b). Furthermore, we calculate the centroid of $F_t'$ by

$$(x_c, y_c) = \left( \frac{m_{10}}{m_{00}}, \frac{m_{01}}{m_{00}} \right) \qquad (3)$$

in which $m_{10}$ and $m_{01}$ denote the first-order moments of $F_t'$ in the x and y directions, respectively, and $m_{00}$ denotes its zero-order moment. The centroid calculated using Eq. (3) is used to identify the center of the constructed foreground model $F_t'$. (18)
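The three operations of Step 3 (OR fusion, dilation, and the moment-based centroid) can be sketched in plain Python as follows. The function names are hypothetical and the 3 × 3 structuring element is an assumption; the text does not state which element 'HI_MPI_IVE_Dilate' is configured with.

```python
def fuse_or(masks):
    """Pixel-wise logical OR over a list of equally sized binary masks."""
    h, w = len(masks[0]), len(masks[0][0])
    return [[1 if any(m[y][x] for m in masks) else 0 for x in range(w)]
            for y in range(h)]


def dilate3x3(mask):
    """Binary dilation with a 3 x 3 square structuring element (assumed size)."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w:
                            out[ny][nx] = 1
    return out


def centroid(mask):
    """Centroid (m10 / m00, m01 / m00) of a binary mask, or None if it is empty."""
    m00 = m10 = m01 = 0
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                m00 += 1
                m10 += x
                m01 += y
    if m00 == 0:
        return None
    return (m10 / m00, m01 / m00)
```

For a binary mask, the zero-order moment is simply the number of white pixels, so the centroid is the mean x and y coordinate of those pixels.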

Step 4: Solving the false fall detection problem
When one person is opening the door at time instance t to leave the current room, the light coming from the neighboring room and pouring into the floor of the current room may make the centroid of the foreground appear to return from the door area to the floor area of the room, which usually leads to a false fall detection. In what follows, we propose an effective approach to solving this false fall detection problem.
Considering the three consecutive gradient maps $G_{t-2}$, $G_{t-1}$, and $G_t$, we perform the difference operation 'HI_MPI_IVE_Sub' on each pair of consecutive gradient maps to obtain the two boundary maps $B_{t-1} = G_{t-1} - G_{t-2}$ and $B_t = G_t - G_{t-1}$. Furthermore, we perform the logical AND operation 'HI_MPI_IVE_And' on the two consecutive boundary maps $B_{t-1}$ and $B_t$ to obtain the boundary change map $B_t' = B_{t-1} \cap B_t$, which is robust to light variations.
Let $|B_t'|$ denote the number of white pixels in $B_t'$. When $|B_t'|$ is small, the white pixels in $B_t'$ may be treated as noise. To construct a stable boundary change map, we perform . Then, we calculate the centroid $(x_c', y_c')$ of $B_t'$. Suppose the door region has been marked in advance. If the centroid of $B_t'$ remains in the door region, we set the status of the door region to $S_{door} = 1$; otherwise, $S_{door} = 0$. Because the constructed boundary change map is robust to light variations, false fall detection can be avoided with the help of the door region status. Note that the status of the door region determined in this step is considered together with the centroid calculated in Step 3 to determine whether a true fall event has occurred, as described in the next step.
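The door region test can be sketched in plain Python under several assumptions: the gradient differences are binarized with a hypothetical threshold of 50 before the AND, the door region is a hypothetical axis-aligned rectangle (x0, y0, x1, y1), and the stabilization of the boundary change map mentioned in the text is not reproduced, so an empty map simply yields a status of 0.

```python
def binary_difference(g_a, g_b, threshold):
    """Binary map of pixels where two gradient maps differ by more than threshold."""
    return [[1 if abs(a - b) > threshold else 0 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(g_a, g_b)]


def boundary_change_map(g_t2, g_t1, g_t, threshold=50):
    """AND of the two consecutive boundary maps built from three gradient maps."""
    b_prev = binary_difference(g_t1, g_t2, threshold)
    b_curr = binary_difference(g_t, g_t1, threshold)
    return [[p & c for p, c in zip(row_p, row_c)]
            for row_p, row_c in zip(b_prev, b_curr)]


def centroid(mask):
    """Centroid (mean x, mean y) of the white pixels, or None for an empty mask."""
    points = [(x, y) for y, row in enumerate(mask)
              for x, value in enumerate(row) if value]
    if not points:
        return None
    return (sum(x for x, _ in points) / len(points),
            sum(y for _, y in points) / len(points))


def door_status(change_map, door_region):
    """Return 1 if the centroid of the change map lies in the door rectangle, else 0."""
    center = centroid(change_map)
    if center is None:
        return 0
    x, y = center
    x0, y0, x1, y1 = door_region
    return 1 if x0 <= x <= x1 and y0 <= y <= y1 else 0
```

Requiring a pixel to change in both consecutive differences is what suppresses one-off changes; only boundaries that keep moving survive the AND.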

Step 5: Fall detection determination
If the centroid obtained in Step 3 is in the floor area and the door region status obtained in Step 4 is $S_{door} = 0$, the constructed foreground is treated as a candidate fall event in the floor region; otherwise, it is treated as a nonfall event. For a candidate fall event, if the centroid of $B_t'$ determined by solving Eq. (3) remains in a fixed location for more than t seconds, e.g., t = 3, a true fall detection alarm signal is formally sent out via the general-purpose input/output-based transmitter to call for first aid; otherwise, we return to Step 1.
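The decision rule above can be expressed as a small sketch in plain Python. The function names, the rectangular floor region, and the 5-pixel tolerance that defines a "fixed location" are all hypothetical; only the 3-second hold time comes from the text. The alarm transmission itself is not modeled.

```python
import math


def in_region(point, region):
    """Return True if point (x, y) lies inside region (x0, y0, x1, y1), inclusive."""
    x, y = point
    x0, y0, x1, y1 = region
    return x0 <= x <= x1 and y0 <= y <= y1


def detect_fall(samples, floor_region, hold_seconds=3.0, tolerance=5.0):
    """samples: time-ordered (time_s, centroid or None, s_door) tuples.

    Returns True once a candidate fall (centroid on the floor, s_door == 0)
    has stayed within `tolerance` pixels of one spot for more than hold_seconds.
    """
    anchor_time = None
    anchor_pos = None
    for time_s, centroid, s_door in samples:
        candidate = (centroid is not None and s_door == 0
                     and in_region(centroid, floor_region))
        if not candidate:
            # Nonfall event: forget any candidate and start over.
            anchor_time = anchor_pos = None
            continue
        if anchor_pos is None or math.dist(centroid, anchor_pos) > tolerance:
            # New candidate position: restart the timer here.
            anchor_time, anchor_pos = time_s, centroid
        elif time_s - anchor_time > hold_seconds:
            return True
    return False
```

For example, a centroid that stays at one floor position from 0 s to 4 s with the door status at 0 triggers the alarm, whereas the same track with a door status of 1 does not.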
Figures 4(a) and 4(b) show two real image frames, each containing a different fall event, and Figs. 4(c) and 4(d) show the corresponding detected fall events, respectively. Owing to the robustness of the proposed foreground detection process (see Step 2), the process for constructing a more complete foreground (see Step 3), and the boundary change map for providing the door region status, fall detection by our vision- and SoC-based method is much more reliable than in our earlier work using the Gaussian mixture model approach. (17)

Experimental Results
In this section, thorough experiments are reported to demonstrate that, in terms of accuracy, precision, and recall, the proposed vision- and SoC-based fall detection method outperforms our previous work (17) and the three image-processing- and PC-based fall detection methods. (14)(15)(16)

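The three performance metrics are defined as

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (4)$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (5)$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \qquad (6)$$

where TP, TN, FP, and FN denote the numbers of true positives (falls detected as falls), true negatives (nonfalls detected as nonfalls), false positives, and false negatives, respectively. 'Accuracy' denotes the rate of correct fall and nonfall detections among all events, 'Precision' denotes the fraction of detected fall events that are true falls, and 'Recall' denotes the fraction of all fall events that are successfully detected.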

Test dataset
As shown in Table 2, there are nine types of simulated fall events and five types of simulated nonfall events, and the test videos are captured in a real room. In the simulation, three persons, two females and one male, simulate the above-mentioned fourteen fall and nonfall events. In Table 2, four types of events for the bed and chair are included to increase the practical applicability of the proposed vision- and SoC-based fall detection method. In total, we have 43 videos with 51 different segments covering the fourteen types of fall and nonfall events. The readers can access these test videos from the website.

Performance evaluation and comparison
For the above-mentioned test videos, Table 3 compares our vision- and SoC-based fall detection method with the four other related methods in terms of the three performance metrics in Eqs. (4)-(6). In Table 3, Ac, Pr, and Re are the abbreviations for accuracy, precision, and recall, respectively. From Table 3, we observe that the accuracy of our method is 98.0%, the detection precision is 100%, and the recall is 96.6%. The average number of image frames processed per second (fps) by our method is less than that of the PC-based approaches; (14)(15)(16) however, it is much higher than that in our previous work. (17) When compared with the related fall detection methods, (14)(15)(16)(17) our method has the highest fall detection accuracy.
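As a quick illustration of how the three metrics relate to raw counts, the sketch below uses the standard confusion-matrix definitions. The counts passed in are an assumption, not figures reported in the paper: they are simply one split of 51 segments (28 true positives, 1 false negative, 22 true negatives, no false positives) that reproduces the percentages quoted above for the proposed method.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall


# Hypothetical counts over 51 segments, chosen to match the quoted percentages.
acc, pre, rec = classification_metrics(tp=28, fp=0, tn=22, fn=1)
print(f"Ac={100 * acc:.1f}% Pr={100 * pre:.1f}% Re={100 * rec:.1f}%")
# prints "Ac=98.0% Pr=100.0% Re=96.6%"
```

Under this assumed split, a single missed fall accounts for both the 98.0% accuracy and the 96.6% recall, while the absence of false alarms gives the 100% precision.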

Conclusion
We have presented our vision- and SoC-based fall detection method for the elderly in a home environment. Our method consists of five steps. In particular, the proposed Steps 2 and 3 can construct a more complete and reliable foreground, and the false fall detection problem can be resolved with the help of the door region status in Step 4. The experimental data have demonstrated the low hardware cost, low power, and high detection accuracy of our fall detection method relative to the previous vision- and SoC-based method (17) and the three previous methods (14)(15)(16) in the color-camera-based category. Our future work is to apply the model compression technique (20) to prune the number of parameters used in the recent deep-learning-based fall detection method (21) such that the pruned deep-learning-based detection method can be run in an SoC-based environment.

Table 3 Accuracy and frame rate comparison among related methods.
Method                  Ac (%)   Pr (%)   Re (%)   Average fps
Nguyen et al. (15)      49.0     55.2     55.2     361.016
Abdelhedi et al. (16)   78.4     78.1     86.2     378.298
Lin et al. (14)         72.5     72.2     86.7     152.213
Chung et al. (17)       76