On the Use of Kinect Sensors to Design a Sport Instructor Robot for Rehabilitation and Exercise Training of the Elderly

In this study, we developed a sport instructor robot system for the rehabilitation and exercise training of senior persons. In the system, a popular Kinect sensor, viewed as the eyes of a humanoid robot, was employed to capture and recognize the gestures made by a person. The design of the Kinect-sensor-based sport instructor robot system involved two main phases: the establishment of an expert system for the sport instructor robot and the development of Kinect-sensor-based gesture recognition. The humanoid robot system of the sport instructor and the Kinect-sensor-based gesture recognition system were successfully bundled together using a state machine scheme to effectively perform exercise training of the elderly. Three different types of state machine, each dominating a different exercise training strategy, were developed. To further increase the accuracy of the Kinect-sensor-based gesture recognition that checks the correctness of a person's active gesture, gesture activity detection (GAD) was investigated, and several GAD methods were proposed. The efficiency and effectiveness of the system were evaluated using a gesture activity database composed of seven different elderly rehabilitation actions. Experimental results demonstrated the feasibility and superiority of the Kinect-sensor-based sport instructor robot system.


Introduction
Maintaining good exercise habits is very important for the elderly. In fact, a majority of the elderly still do not have a regular habit of outdoor activities. Because these elderly persons do not have exercise habits, they experience osteoporosis, muscle weakness, joint degeneration, or other age-related diseases, which further reduce their desire to go out to exercise. A home-exercise instructional teaching film may be an alternative for the elderly to learn to exercise. However, learning such activities by watching an instructional film is not effective, especially for training classes in rehabilitation activities, since a film cannot immediately correct an error if an elderly person performs a wrong or substandard gesture. To overcome these problems, in this study we developed a sport instructor robot system in which a popular Kinect camera sensor device is employed as the eye of the humanoid robot to see whether the person carries out a correct gesture. Figure 1 illustrates the proposed sport instructor robot system incorporating Kinect-sensor-based gesture recognition for the rehabilitation and exercise training of the elderly. As shown in Fig. 1, the humanoid robot is well trained to perform a series of gestures of certain rehabilitative sports like a real sport expert, and the active person learns the rehabilitative sport by making the same gestures as the robot. Note that in the system in Fig. 1, Kinect-sensor-based gesture recognition is performed during the learning process of the overall exercise to verify the correctness of each gesture performed by the user.
The Kinect device made by the Microsoft company is a popular body-sensing sensor, (1) and the use of the Kinect sensor in gesture recognition has been noted in a series of studies. (2)(3)(4) An eigenspace approach for Kinect gesture recognition is presented in Ref. 2. In the work of Ref. 3, Kinect-based gesture recognition using a hidden Markov model (HMM) method was proposed, and the approach was applied so that a humanoid robot could imitate all the active gestures of a human user. A feature design scheme for Kinect-based dynamic-time-warping gesture recognition was developed in Ref. 4. Kinect-sensor-based gesture recognition can be widely used for many specific applications, including rehabilitation exercise (5)(6)(7) and intelligent teaching and learning. (8)(9)(10) In Ref. 5, a scheme of goal-oriented stroke rehabilitation with the support of the Kinect sensor was developed. A home-based rehabilitation exercise combining the Kinect sensor and a fuzzified dynamic-time-warping gesture recognition algorithm was investigated in the work of Ref. 6. In the study of Ref. 7, a Kinect-based rehabilitation training assistant was presented and implemented. The utilization of the Kinect sensor as an e-teacher to help a person intelligently learn certain exercises at home can be seen in the study of Ref. 8 to teach children with hearing and visual impairments, in the work of Ref. 9 to assist deaf or mute persons with a gesture-based game, and in the research of Ref. 10 to evaluate the walking style quality of a person with a Kinect-sensor-derived 3D skeleton model. In addition, studies on combining both the Kinect sensor device and a robot device to achieve a specific application have also been reported in recent years. (11)(12)(13)(14) In the study in Ref.
11, several different body gestures were recognized using the Kinect sensor, and then an interactive interface between the body gesture module and the humanoid robot was generated. Similar to the work in Ref. 11, a Kinect-based gesture command control method for humanoid robots to imitate human actions was presented in Ref. 12. In the study in Ref. 12, the humanoid robot performed the same active gesture as the test user, and a human-computer interface using Kinect-based gesture command control was developed to construct a humanoid robot system that imitates the activity of a person. An integration task combining the Kinect sensor, the gesture recognition system, and a mobile device was carried out in Ref. 13 to construct an application that enables interactive discussion. In the work in Ref. 14, the microphone array of the Kinect device was used to develop a speech and speaker recognition method for humanoid robot exhibition control. Different from the Kinect-sensor-based gesture recognition studies in Refs. 5-10, which used only the Kinect sensor for rehabilitative exercise, and the Kinect-robot integration studies in Refs. 11-14, which were applied only to game-based applications, this work presents a new technique of a sport instructor robot expert system incorporating Kinect-sensor-based gesture recognition for the new application of rehabilitative training of the elderly. Compared with the work in Ref. 12, the roles that the humanoid robot and the person play are completely exchanged, such that the humanoid robot becomes a sport expert and performs standard gesture activities for the aged person to imitate. In this study, a state machine scheme was explored to regulate certain rehabilitative sport activities in the sport teaching system. In addition, to further enhance the Kinect-sensor-based gesture recognition system, the new technical issue of gesture activity detection (GAD) was also investigated.
GAD for improving Kinect-sensor-based gesture recognition is a completely different strategy from the feature-based or model-based performance improvement methods presented in the studies in Refs. 2-4. In summary, the system presented in this study has several competitive advantages compared with conventional systems:
• an effective integration of Kinect and humanoid robots as a sport expert instructor to interactively perform rehabilitative training of the elderly,
• an intelligent state machine mechanism for regulating the overall training exercise of all specific gestures, with each state node defined as a proper Kinect-sensor-based gesture recognition task, and
• a GAD scheme and related GAD-derived momentum information for each class of gestures to effectively enhance Kinect-sensor-based gesture recognition.

A Sport Instructor Robot System with Kinect-Sensor-Based Gesture Recognition
This section presents the sport instructor robot system with Kinect-based recognition. The system, which employs a humanoid robot device, a Kinect sensing device, and a pattern recognition method for recognizing the learner's active gestures, is divided into three parts: the expert system design of the sport instructor for the Kinect-incorporated humanoid robot, enhanced Kinect gesture recognition by GAD, and the state-machine-regulated rehabilitation sport scheme composed of a series of Kinect gesture recognitions with GAD, each of which is detailed in the following.

Expert system designs for the sport instructor in the Kinect-sensor-incorporated humanoid robot system
The design of the expert system for the sport instructor involves two technical issues: (1) the development of the humanoid robot sport expert based on certain specific rehabilitations or sports, and (2) Kinect-sensor-based gesture recognition of the sport instructor robot system with respect to the active trainee.
For the first technical issue, the development of the humanoid robot sport expert for certain specific rehabilitations or sports, a general expert system design strategy was used. The expert system for the humanoid robot sport expert contains a phase of knowledge acquisition, a phase of establishing a 3D (x, y, z) gesture database, a phase of establishing the humanoid robot motion engine, and a phase of verifying the expert knowledge. In this work, certain rehabilitation exercises including seven different gesture actions for the elderly were applied, and the corresponding expert system for the sport instructor was developed. Figure 2 shows a human sport expert performing a specific rehabilitation exercise in the Kinect sensing environment; all the 3D-coordinate data of the human skeletal information collected in the process of the actions were estimated to establish the gesture database. The established gesture database was then used to build up the humanoid robot motion engine, which defines all motion parameters for the humanoid robot device.
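The motion engine step, converting Kinect skeleton coordinates into motion parameters for the robot, is not detailed above; one plausible building block is computing the angle at a joint from the 3D positions of three neighboring joints. The function below is an illustrative sketch of that mapping, not the authors' implementation:

```python
import numpy as np

def joint_angle(p_parent, p_joint, p_child):
    """Angle (in degrees) at p_joint formed by its two neighboring joints.

    One plausible way to turn Kinect 3D skeleton coordinates into a servo
    target for the robot motion engine (an assumption; the exact mapping
    used in the paper is not specified).
    """
    u = np.asarray(p_parent, dtype=float) - np.asarray(p_joint, dtype=float)
    v = np.asarray(p_child, dtype=float) - np.asarray(p_joint, dtype=float)
    # Clip guards against floating-point values slightly outside [-1, 1].
    cos_t = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0))))
```

For example, an elbow angle would be computed from the shoulder, elbow, and wrist joint positions of the Kinect skeleton.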
For the second technical issue, carrying out gesture recognition on the active trainee using the Kinect sensor incorporated in the sport instructor robot system, an eigenspace approach for human gesture recognition using Kinect-sensor-derived 3D-coordinate data was employed (see Ref. 2). This eigenspace approach for Kinect-sensor-based gesture recognition, called Eigen3Dgesture, was proposed in a previous study reported in Ref. 2. Eigen3Dgesture employs principal component analysis (PCA) and mainly contains three calculation phases: PCA operations on 3D features of human activities, eigenspace establishment of gesture information, and recognition of the test gesture data via the trained eigenspace. In previous Kinect-sensor-based gesture recognition works, including Ref. 2, the technical issue of GAD based on the active user's actual gesture data was not taken into account. In the phase of recognition of the test gesture data via the trained Eigen3Dgesture eigenspace, a direct scheme without any GAD evaluation makes the recognition decision as

ĩ = arg min_{i = 1, 2, …, n} D_i,    (1)

where D_i is the distance between the test gesture data and the centroid of the i-th classification of active gestures in the Eigen3Dgesture eigenspace, and the recognition outcome ĩ is the label of the gesture classification with the minimum distance among all n defined gesture categorizations. With the use of GAD, the performance of the Kinect-sensor-based gesture recognition system can be further improved, which will be detailed in the following sections.
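The Eigen3Dgesture pipeline described above (PCA eigenspace, per-class centroids, minimum-distance decision) can be sketched in Python as follows. This is a hedged, simplified reconstruction; the feature layout (each gesture recording flattened into one vector) and all function names are assumptions not spelled out in Ref. 2:

```python
import numpy as np

def fit_eigenspace(train_data, labels, n_components=10):
    """Build a PCA eigenspace and per-class centroids from gesture data.

    train_data: (n_samples, n_features) array, each row one gesture
    recording flattened into a vector (an assumed feature layout).
    """
    mean = train_data.mean(axis=0)
    centered = train_data - mean
    # PCA via SVD; rows of vt are principal directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]                  # (k, n_features)
    projected = centered @ basis.T             # (n_samples, k)
    labels = np.asarray(labels)
    centroids = {c: projected[labels == c].mean(axis=0)
                 for c in sorted(set(labels.tolist()))}
    return mean, basis, centroids

def classify(test_vec, mean, basis, centroids):
    """Eq. (1): return the class whose centroid is nearest in the eigenspace."""
    p = (test_vec - mean) @ basis.T
    dists = {c: float(np.linalg.norm(p - mu)) for c, mu in centroids.items()}
    return min(dists, key=dists.get), dists
```

The `dists` dictionary corresponds to the D_i values of Eq. (1), which the GAD-derived momentum weighting described later can re-scale before the minimum is taken.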

Enhanced Kinect-sensor-based gesture recognition by GAD
An enhanced Kinect-sensor-based gesture recognition method using GAD to further increase the accuracy of recognizing the gestures of the rehabilitative sport learner in the sport instructor expert system is presented in this section. Traditional Kinect-sensor-based gesture recognition uses a fixed-length time interval to accumulate the gesture data of the user for gesture recognition. However, such invariant-sized gesture data would not be suitable in an application involving real-time gesture recognition. A large time interval for collecting gesture data, 10 s for example, would create too much redundant data that does not belong to the assigned gesture action. Conversely, a small time interval set to accumulate the active gesture data, only 1 s for example, would yield incomplete gesture data for a gesture made by the learner. Neither large nor small time intervals for collecting gesture data are appropriate for the Kinect-sensor-based gesture recognition system. A more flexible and accurate method for collecting gesture data is to detect the start position or the end position of the assigned gesture action of the learner. Performing GAD for an active person by calculating the start point, the end point, or both was therefore explored, and two GAD methods were developed in this work, which are described in the following section. Figures 3 and 4 show the two GAD algorithms proposed in this study, called one-cut GAD and two-cut GAD, respectively. One-cut GAD determines only the starting-point frame of the assigned gesture action of the active user, and the end-point frame of the action is acquired directly by adding an appropriate time duration (set to 3 s, i.e., 90 gesture frames in total, in this study) to the starting point. Different from one-cut GAD, the two-cut GAD method calculates both the starting- and end-point frames of the assigned gesture action made by the learner.
When the starting and end points of the assigned gesture action are accurately determined by GAD, effective data representing the assigned gesture action can then be finely acquired between the estimated starting- and end-point frames. Gesture data collected using the GAD approaches is much more effective for Kinect-sensor-based gesture recognition. Note that in the overall GAD calculation, the momentum information of the gesture frame can be used to efficiently check whether the current frame is the starting- or end-point frame of the assigned gesture action. The momentum parameter of the i-th gesture frame, M(i), is defined as

M(i) = Σ_{j=1}^{N} Σ_{d=1}^{D} |x(i, j, d) - x(baseframe, j, d)|,

where j is the joint index, N is the total number of joints in the human skeleton (in this study, N = 20), d is the dimension index, D is the dimension size (in this study, D = 3, denoting the x-, y-, and z-coordinates), x(i, j, d) is the d-th coordinate of the j-th joint in the i-th gesture frame, and the index baseframe denotes the base gesture frame used for comparison by evaluating the difference between the base frame and each of all possible gesture frames (in this study, the base frame is a standing-pose frame). Figure 5 illustrates the schematic of the momentum information derived by calculating the active difference between the current gesture frame and the base frame.
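The momentum definition and the one-cut GAD procedure can be sketched in Python as follows. The thresholding rule, taking the first frame whose momentum exceeds a threshold as the start frame, is an assumption consistent with the description of Fig. 3 but not guaranteed to match the authors' exact algorithm:

```python
import numpy as np

def momentum(frame, base_frame):
    """M(i): summed absolute joint displacement between the current gesture
    frame and the standing-pose base frame. frame: (N_joints, 3) array."""
    return float(np.abs(frame - base_frame).sum())

def one_cut_gad(frames, base_frame, threshold, duration=90):
    """One-cut GAD sketch (assumed thresholding rule).

    The first frame whose momentum exceeds `threshold` is labeled the start
    frame; the end frame is start + `duration` (90 frames, about 3 s at the
    Kinect's 30 fps). Returns the effective gesture segment.
    """
    for f, frame in enumerate(frames):
        if momentum(frame, base_frame) > threshold:
            start = f
            end = min(start + duration, len(frames))
            return frames[start:end]
    return []  # no gesture activity detected
```

A two-cut variant would additionally scan for the frame where the momentum falls back toward the base-frame level to determine the end point, instead of assuming a fixed duration.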
The above-mentioned momentum information estimated in the GAD calculation can also be employed to improve the pattern recognition calculations of the Kinect-sensor-based gesture recognition work. In the task of Kinect-sensor-based gesture recognition using the PCA-based Eigen3Dgesture approach, the momentum information can be utilized to enhance the calculation phase of recognition of the test gesture data via the trained eigenspace. The recognition determination of Eq. (1) can be further improved by incorporating the momentum comparison information of the current test gesture data, which is proposed as

D̃_i = w_i · D_i,  i = 1, 2, …, n,

where w_i is the designed weight parameter given to the distance between the test gesture data and the centroid of the i-th classification of active gestures in the Eigen3Dgesture eigenspace, and D̃_i is the re-estimated distance compared with the conventional distance D_i without any momentum information. The parameter w_i containing the momentum information is developed in the following.

(Fig. 3. GAD algorithm with detection of the start frame only (one-cut GAD): label the f-th frame as the start frame, estimate the end frame by adding a certain time duration to the start-frame index, and acquire the effective data between the estimated start and end frames.)

(Fig. 4. GAD algorithm with detection of the start and end frames (two-cut GAD).)
The weight w_i is obtained by comparing the momentum information of the current test gesture with the i-th trained momentum model and normalizing the comparison over all n gesture classes, where n is set to 7 in this study because seven different categorizations of gestures are defined. Figure 6 shows the Kinect-sensor-derived human skeleton containing the 3D position information of each of the 20 joints for each of these seven actions, which can be used to calculate the momentum information for each gesture. Figure 7 depicts the seven specific GAD-derived momentum models trained in advance, M_1, M_2, …, M_7, each of which denotes the momentum information of the corresponding specific classification of gestures; the starting- and end-point frames of a certain specific type of gesture can be estimated by GAD. Compared with the general Eigen3Dgesture method, the Eigen3Dgesture recognition calculation with additional momentum evaluations gives a more reliable recognition decision, and the performance of gesture recognition can be largely improved.
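A minimal sketch of the momentum-weighted decision D̃_i = w_i · D_i is given below. The particular normalization of w_i (momentum distance to model M_i divided by the sum of the distances over all n classes, so that a class whose momentum model matches the test gesture receives a small weight) is an assumption for illustration, not the paper's exact weight formula:

```python
import numpy as np

def momentum_weights(test_curve, models):
    """Assumed weighting: distance between the test gesture's momentum curve
    and each trained momentum model M_i, normalized over all n classes."""
    dists = np.array([np.linalg.norm(test_curve - m) for m in models])
    return dists / dists.sum()

def weighted_decision(eigen_dists, weights):
    """Return the 1-based gesture label minimizing D~_i = w_i * D_i."""
    scores = weights * np.asarray(eigen_dists, dtype=float)
    return int(np.argmin(scores)) + 1
```

Here `eigen_dists` are the eigenspace distances D_i of Eq. (1), so a class supported by both the eigenspace and the momentum evidence wins the decision.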

State machine-regulated rehabilitation sport scheme composed of a series of Kinect-sensor-based gesture recognitions with GAD
As mentioned previously, for the sport instructor expert system, a specific rehabilitation exercise was determined in advance. This study formulated an exercise including seven different gestures for the rehabilitation of the aged person as follows:
Gesture-1: Chest expanding
Gesture-2: Stretching both hands and crouching
Gesture-3: Holding up both hands
Gesture-4: Right pendulum movement
Gesture-5: Left pendulum movement
Gesture-6: Stretching the right foot
Gesture-7: Stretching the left foot
Three different types of state machine scheme, Type-1, Type-2, and Type-3, were developed to regulate the execution of the seven different gestures, as shown in Figs. 8-10, respectively. Note that in each of the three state machines, a state node represents the Kinect-sensor-based gesture recognition calculation mentioned in the previous sections. In addition, a state transition denotes the result of the gesture recognition estimate: whether the gesture is correct or incorrect. When the result is correct, the transition to the next state can be made. Otherwise, the state transition is forbidden, and the state machine remains in its original state. In Fig. 8, the Type-1 state machine is designed with each state node denoting the recognition task of one of the seven different gestures. In the scheme of the Type-1 state machine, the user is requested to perform the same gesture twice. Only when both repetitions are recognized as correct and standard gestures can the user continue to the next gesture. Similar to the Type-1 state machine, the Type-2 state machine in Fig. 9 is also developed to perform recognition of one of the seven different gestures in each state. The main difference between the Type-1 and Type-2 state machines is that under the Type-2 state machine the user is allowed to perform the next gesture immediately once the current gesture performed by the user is correctly recognized. In Fig. 10, three different gestures, Gesture-1, Gesture-2, and Gesture-3, are combined in the first state of the Type-3 state machine, which means that the state transition of the first state can be carried out only when all three gestures are correctly recognized. As observed from the Type-3 state machine in Fig. 10, a similar operation can also be found in the second state, including the Gesture-4 and Gesture-5 actions, and in the last state node, containing the Gesture-6 and Gesture-7 actions.
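The Type-1 regulation logic, in which each gesture must be recognized correctly twice before the transition to the next state, can be sketched as follows. Resetting the repetition count after an incorrect recognition is an assumption about the regulation details not stated explicitly above:

```python
# Gesture names as defined for the rehabilitation exercise.
GESTURES = ["Chest expanding", "Stretching both hands and crouching",
            "Holding up both hands", "Right pendulum movement",
            "Left pendulum movement", "Stretching the right foot",
            "Stretching the left foot"]

class Type1StateMachine:
    """Sketch of the Type-1 state machine: advance only after the same
    gesture is correctly recognized `required_repeats` times."""

    def __init__(self, required_repeats=2):
        self.state = 0            # index of the gesture currently trained
        self.correct_count = 0    # consecutive correct recognitions so far
        self.required = required_repeats

    def on_recognition(self, recognized_label):
        """Feed one recognition result (1-based gesture label) and return
        the gesture the user should perform next, or "done"."""
        if self.state >= len(GESTURES):
            return "done"
        if recognized_label == self.state + 1:
            self.correct_count += 1
            if self.correct_count >= self.required:
                self.state += 1       # state transition on two correct gestures
                self.correct_count = 0
        else:
            self.correct_count = 0    # forbidden transition: stay in this state
        return "done" if self.state >= len(GESTURES) else GESTURES[self.state]
```

The Type-2 machine corresponds to `required_repeats=1`, and a Type-3 variant would track a set of required gestures per state instead of a repetition count.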

Experiments and Results
The experiment with the sport instructor expert system was performed in a laboratory environment. The robot used as the sport instructor in this work was the Type-A Bioloid humanoid robot, which has 18 artificial joints. (15) A Kinect device was deployed near the robot to receive the data on the learner's gestures. For the gesture database used for the performance evaluation of the presented approaches, a total of five users recorded the gesture data. The gesture database collected from the five users (User-A to User-E) was divided into two parts, a training part and a test part. The training gesture data, composed of 210 gestures, in which each of the seven different active gestures, "Chest expanding", "Stretching both hands and crouching", "Holding up both hands", "Right pendulum movement", "Left pendulum movement", "Stretching the right foot", and "Stretching the left foot", was performed 10 times by each of three of the five users (User-A, User-B, and User-C), was used to establish the eigenspace of 3D-gesture features. The test gesture data, composed of 350 gestures, was divided into two test databases, Test-DB1 and Test-DB2, for the performance comparison of gesture recognition. The database Test-DB1 contained 210 gestures collected from each of User-A, User-B, and User-C, in which each user performed each of the seven gestures an additional 10 times, completely separate from the recordings used in the training phase. The database Test-DB2 contained 140 gestures obtained from two new users (User-D and User-E) who did not appear in the training phase, and each user was requested to carry out each of the seven different gestures 10 times. Figure 11 depicts each of the seven different rehabilitation gesture activities displayed by the humanoid robot in the sport instructor expert system. Two experiments, Kinect-sensor-based gesture recognition of the seven different gestures and state machine recognition for the rehabilitation exercise, were conducted.
Tables 1-4 summarize the recognition performance of Kinect-sensor-based gesture recognition using the Eigen3Dgesture method, the Eigen3Dgesture with one-cut GAD method, the Eigen3Dgesture with two-cut GAD method, and the Eigen3Dgesture with GAD-derived momentum method, respectively. Table 5 compares the recognition rates of the three different types of state machine for the seven gestures of rehabilitation exercise using the four Kinect-sensor-based gesture recognition methods (Eigen3Dgesture, Eigen3Dgesture with one-cut GAD, Eigen3Dgesture with two-cut GAD, and Eigen3Dgesture with GAD-derived momentum). As observed from Tables 1-4, Eigen3Dgesture with GAD-derived momentum performs best, with an outstanding recognition rate of 99.14%, followed by 98.57% for Eigen3Dgesture with two-cut GAD and 88.57% for Eigen3Dgesture with one-cut GAD. The Eigen3Dgesture method without any GAD estimates has the lowest recognition rate of only 87.71%. It is seen in Table 5 that the four recognition methods used in Kinect-sensor-based gesture recognition have different effects on the recognition performance of the seven gestures of rehabilitation exercise by state machines. In addition, the three different types of state machine for regulating the learner's rehabilitation exercise also have different performances. Kinect-sensor-based gesture recognition by Eigen3Dgesture with two-cut GAD or Eigen3Dgesture with GAD-derived momentum leads to good learning in the rehabilitation exercise. Among the three state machines, the Type-3 state machine performs best, with a satisfactory average recognition accuracy of 96.1%. Both the Type-1 and Type-2 state machines also have good recognition performances. From all these experimental results, the sport instructor robot system incorporating Kinect-sensor-based gesture recognition presented in this study was able to achieve high performance for the exercise training of a person.

Conclusions
In this paper, a sport instructor robot expert system incorporating Kinect-sensor-based gesture recognition was developed for rehabilitation and exercise training of the elderly. A GAD scheme for enhancing Kinect-sensor-based gesture recognition was proposed. In addition, three different types of state machine for formulating certain rehabilitation exercises in the sport instructor expert system were also presented.