Simulating Feasibility Assessment of the Golf Swing by Kinect Motion-Sensing Photography

Golf swing analysis is a popular research subject, and is divided into contact sensing and non-contact sensing analysis according to different research methods. Contact sensing research often uses wearable sensors, including triaxial accelerometers, gyroscopes, and gravity accelerators, which are mounted on the golfer’s limbs or relevant equipment. This study aims to construct a teaching system for correcting swing action using non-contact sensing Kinect motion-sensing photo equipment. The proposed system is expected to collect a golfer’s pose and action during the backswing for data analysis, and detect incorrect backswing actions in order to attain the goal of interactive golf teaching.


Introduction
Golf is a popular recreational sport in Europe and North America, and in recent years has prevailed throughout Asia. The majority of the golfing population is in the U.S. and Japan, with 26 million and 17 million players, respectively. (1) According to statistics of the Control Yuan in Taiwan, golf has been offered as a university sports course in Taiwan for 20 years. At present, 112 colleges in Taiwan offer golf courses, and golf will be an official event of the Olympic Games in 2016. (2) However, golf is a challenging game, as it is difficult to strike the ball accurately so that it flies high and far in the right direction. Such misplay is likely to frustrate learners.
Cooper and Mather measured the maximum angular rate of golfers' swings as a function of time and constructed a swing mechanics model for different golfers, which could serve as a standard for a correct swing pose and improve a golfer's swing based on the data. (3) However, swing data differ with individual golfers' physical characteristics; thus, a single established swing standard is inapplicable to most people. Reyes and Mittendorf constructed a new golf analysis model, a biomechanical model built according to the biological features of a specific golfer. (4) The relevant parameters, including the length, weight, and area of the club, are adjusted by this model to lower the golf learning threshold. This study aims to construct a teaching system for correcting swing action using non-contact sensing Kinect motion-sensing photo equipment. The proposed system is expected to collect a golfer's pose and action during the backswing for data analysis, and to detect incorrect backswing actions to attain the goal of interactive golf teaching. Wu et al. indicated that the golf swing is a sophisticated motion involving multiple joints, which requires high stability and accuracy. (5) Ghasemzadeh et al. discussed how, in addition to wrist rotation, the swing involves the forward movement of the left knee, the turning of the hips and spine, and the movement of the shoulder joints. (6,7) Moreover, the force borne by the feet and wrists varies with the swing phase. Pink et al. divided the golf swing motion into six phases (Fig. 1), including take away, forward swing, acceleration, early follow through, and late follow through. (8) The backswing begins with the feet leading the body: the action of the lower body causes the center of body weight to shift to the right foot before the body swings back to the top. (9)
Faldo, Simmon, and Foston reported that the center of gravity gradually shifts to the right during the backswing, and when the top of the backswing is reached, about 90% of the body weight is on the right foot, specifically on the inside of the right foot. (10,11) Golf swing analysis is a popular research subject, and it is divided into contact sensing and non-contact sensing according to different research methods. (12,13) Contact sensing often employs wearable sensor devices, including triaxial accelerometers, gyroscopes, and gravity accelerators, which are mounted on golfers' limbs or relevant equipment. These devices obtain an individual golfer's measurements during the swing, which have been analyzed by many researchers. (6,14-16) In contrast, non-contact sensing uses pictures or videos to record a golfer's actions during the swing, and the actions are then analyzed.

Motion Analysis of Golf Swing
Blake and Grundy used motion capture to analyze the golf swing. (17) This system builds a complete 3D motion model using data from sensing patches on a golfer's body, and the accuracy of motion is obtained by analyzing the 3D model. However, wearing multiple sensing elements influences a golfer's swing, meaning the golfer cannot swing freely. Somjarod et al. recorded the swing poses of 30 male professional and amateur golfers in videos. (18) That study analyzed the variance in the golfers' knees over the period of the swing using the video data, and observed the appropriate knee angle and direction of displacement. Karliga and Hwang used video captured by a single camera to reconstruct the 3D action of a golf swing and analyzed a 3D model of the body to determine the pose variations. (19) Gehrig et al. used video of the swing path to analyze the club head path and constructed a system to determine the result of the golfer's swing. (20)
Fig. 1. Golf swing motion as described by Pink (1993). (8)
Sensors and Materials, Vol. 28, No. 6 (2016) 727

Kinect Action Sensing and Motion-Sensing Photography
The Kinect sensor must be placed at a suitable distance from the golfer, generally 1.2 to 3.5 m. Figure 2 illustrates a Microsoft Kinect sensor. It has three lenses; the middle lens is an RGB color camera for recording color images, while the left and right lenses form the 3D depth sensor, consisting of an infrared (IR) transmitter and an IR complementary metal-oxide-semiconductor (CMOS) camera. The Kinect detects the user's action using this 3D depth sensor.
As shown in Table 1, the maximum resolution of the Kinect color camera is 1280 × 960, and the maximum resolution of the IR camera is 640 × 480. The Kinect incorporates follow-focus technology, in which the base motor rotates as the focused object moves. The Kinect can capture two images: a color image and a depth image. The color image is captured by the middle RGB color camera; the depth image is captured by the right camera, while the leftmost IR transmitter emits an invisible laser. Different depths are determined by an algorithm, and the image information is derived from the returned depth information. As the RGB color camera and the IR CMOS camera have horizontal parallax, the color image and depth image captured inside the program overlap but are not completely coincident. Therefore, an additional coordinate mapping function is needed to process the image differences.
The Kinect uses light coding technology to obtain the image depth. Light coding technology uses the IR transmitter to emit an invisible class 1 laser, and the diffuser in front of the lens projects the laser light uniformly within the measurement space. The laser speckle calibrates the specific space: when the laser light strikes a rough object, random speckles are formed, where the speckles at any two places within a space form different patterns that vary with distance. Therefore, these speckles act as marks on the overall space. To measure a space, reference planes are taken at known distances, and the IR camera records all speckles on each plane. The positioning information for the overall space can then be obtained from the randomly shaped laser speckles. When the raw data are captured, they are processed by the chip into an image with 3D depth. Light colors in the depth image stream are close to the range sensor, while deep colors are far from it (Fig. 3).
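The depth-stream rendering described above (light shades near the sensor, dark shades far) can be sketched in a few lines. This is a minimal illustration, not code from the paper; the function name and the 3.5 m maximum range are illustrative assumptions:

```python
import numpy as np

def depth_to_grayscale(depth_mm, max_range_mm=3500):
    """Map raw depth values (in mm) to an 8-bit grayscale image in which
    light shades (255) are near the sensor and dark shades (0) are far,
    matching the depth-stream rendering described in the text."""
    depth = np.clip(depth_mm, 0, max_range_mm).astype(np.float64)
    # Invert so that near pixels are light and far pixels are dark.
    gray = 255.0 * (1.0 - depth / max_range_mm)
    return gray.astype(np.uint8)
```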
The Kinect is the origin of the 3D world coordinate system, and the z-axis points in the positive viewing direction. A point p(x, y, z) on the surface of an object in the 3D scene is projected to a point P(X, Y) on the 2D focal plane. Their relation can be inferred from reference 18 by Eq. (1):

X = f · x/z,  Y = f · y/z. (1)

Since the focal length f of the Kinect camera is fixed, the 3D coordinates of a pixel (u, v) with depth z can be recovered from Eq. (2):

x = (u − c_x) · z/f_x,  y = (v − c_y) · z/f_y, (2)

where (c_x, c_y) is the optical center and f_x and f_y are the focal lengths in pixels. The Python code below illustrates how a 3D point can be computed from the pixel coordinates and the depth value:

```python
import numpy as np

fx = 525.0     # focal length x (pixels)
fy = 525.0     # focal length y (pixels)
cx = 319.5     # optical center x
cy = 239.5     # optical center y
factor = 5000  # for the 16-bit PNG files
# factor = 1   # for the 32-bit float images in the ROS bag files

depth_image = np.zeros((480, 640), dtype=np.uint16)  # raw depth frame (placeholder)

points = []
for v in range(depth_image.shape[0]):
    for u in range(depth_image.shape[1]):
        Z = depth_image[v, u] / factor
        X = (u - cx) * Z / fx
        Y = (v - cy) * Z / fy
        points.append((X, Y, Z))
        # A 4 x 4 camera-to-world transform Tk can then map the
        # homogeneous point (X, Y, Z, 1) to world coordinates (xw, yw, zw).
```

The depth image information obtained by the Kinect is converted by a skeleton tracking system. Just as every joint in the human skeleton has a unique name, the Software Development Kit (SDK) designates a node name for access (Fig. 4). The Kinect action sensing side obtains the human skeleton information at a refresh rate of 30 frames per second, and the information obtained is transmitted via the AUX side to the universal serial bus (USB) connection of the Kinect processing element, which is located in the central motion information processing side. As the refresh rate is 30 frames per second, there is an interval of about 33 ms between successive joint updates. When the user's body information is identified and captured, the system integrates the information into a skeleton table. The skeleton tracking system can simultaneously track six human bodies and identify the actions of two of them. For each tracked body, 20 joints can be recorded, covering the trunk and limbs. (20)
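The 20-joint skeleton table and the 30 fps refresh rate described above can be modeled minimally as follows. The joint-name list mirrors the Kinect SDK's naming convention but is reproduced here from memory and should be treated as illustrative:

```python
# Hypothetical joint names mirroring the Kinect SDK skeleton enumeration.
JOINT_NAMES = [
    "HEAD", "SHOULDER_CENTER", "SHOULDER_LEFT", "SHOULDER_RIGHT",
    "ELBOW_LEFT", "ELBOW_RIGHT", "WRIST_LEFT", "WRIST_RIGHT",
    "HAND_LEFT", "HAND_RIGHT", "SPINE", "HIP_CENTER",
    "HIP_LEFT", "HIP_RIGHT", "KNEE_LEFT", "KNEE_RIGHT",
    "ANKLE_LEFT", "ANKLE_RIGHT", "FOOT_LEFT", "FOOT_RIGHT",
]

FRAME_RATE = 30                          # skeleton refresh rate (fps)
FRAME_INTERVAL_MS = 1000.0 / FRAME_RATE  # ~33 ms between joint updates

def make_skeleton_frame(coords):
    """Pack one frame of (x, y, z) joint coordinates into a table
    keyed by joint name, as in the skeleton table described above."""
    assert len(coords) == len(JOINT_NAMES) == 20
    return dict(zip(JOINT_NAMES, coords))
```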

System Planning and Design
Before the swing motion, the golfer stands within an appropriate range of the Kinect action sensing terminal. The action sensing end is in charge of capturing the color image and depth image from the visual range and transmitting the two data streams via a USB interface to the Kinect processing element of the central motion information processing end. This unit uses the Kinect SDK to convert the depth image information into a human skeleton structure, which is sent to the logical processing unit. The motion tracking technology of the Kinect differs from previous 2D operating modes. The remote control and touch panel operating modes are 2D (they capture only the spatial position of the target, i.e., the x-axis and y-axis), whereas the motion tracking technology captures not only the spatial position (x-axis and y-axis) and color of the target, but also the depth (z-axis), range, distance, and surroundings of the target. Consequently, it is also known as the 3D operating mode.

Microsoft Kinect starts and sets skeleton detection
The software and hardware for developing a Kinect application are described as follows. The Microsoft Windows 7 operating system must be used; the software operating environment requires at least the 32-bit version, although the 64-bit version is also acceptable. Regarding the hardware, there must be a central processing unit (CPU) of at least a 2.66 GHz dual core, random-access memory (RAM) of at least 2 GB, a video adapter supporting DirectX 9.0c or higher, and a Kinect sensor. The software must include Visual Studio 2010 or Visual C# 2010 Express to compose the program code, the .NET Framework 4.0 environment, and the Kinect SDK for Windows. As skeleton detection is used, the Microsoft DirectX 9 SDK (June 2010) and the Runtime for Microsoft DirectX 9 must be installed. The simulated evaluation system was built as a C# program on the .NET Framework with the Kinect for Windows SDK (Fig. 5).

Acquisition of joints
When a golfer enters the range of the Kinect action sensing end, the skeleton joint data are transmitted to the Kinect processing element. The Kinect processing element confirms that the golfer's joints are tracked, and the skeleton joint data are sent to the logical processing unit. The Kinect motion-sensing photo equipment captures the information of the human skeleton in three steps: (1) the human body block is marked using the low 3 bits of the depth image (the player index); (2) the human body block is separated into 31 parts for calculating skeleton coordinates; and (3)
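Step (1) above relies on the per-pixel player index embedded in the depth stream. Assuming the packed 16-bit Kinect v1 depth format (13 bits of depth, 3 bits of player index), the split can be sketched as:

```python
def split_depth_pixel(raw):
    """Split a packed 16-bit Kinect v1 depth pixel into (depth_mm, player_index).
    Assumption: the low 3 bits carry the player-segmentation index
    (0 = no player, 1-6 = tracked body) and the upper 13 bits carry
    the depth in millimetres."""
    player_index = raw & 0x7
    depth_mm = raw >> 3
    return depth_mm, player_index
```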

Judgment rules
The x-axis and y-axis values of the various nodes in the Kinect human joint information are identified. Before the identification, the golfer must return to the preparatory position. Taking a right-handed golfer as an example, as the backswing moves the center of gravity toward the inside of the right foot, the system establishes the X and Y thresholds according to the left shoulder and left elbow joints in the preparatory position, where the X threshold takes the left shoulder as a reference point, and the Y threshold takes the left elbow as a reference point. The golfer's backswing rises to the top clockwise from the X reference point. The standard stance of the backswing is that the width of the feet is identical to the width of the shoulders, and the shoulders drive the backswing. Regarding the club head, both hands and the trunk turn to the upper right during the backswing. In the ideal motion, the waist turns 45°, the shoulders turn 90°, and the hips turn 45°; the difference of 45° generates the torque of the upper body. Thus, the body turns to the top of the backswing to store the main strength of the downswing.
The judgment process is described as follows:
STEP 1: Judge whether the Kinect human joint information and the triaxial sense information exist; if not, the complete data set must be received before startup. The overall judgment process starts when both data sets are available.
STEP 2: According to the swing period being evaluated, the 20 human joints imported from the Kinect are matched against the pose logic data.
STEP 3: If any sensed data mismatch the pose logic, a warning message is given; if all sensed data are validated, the judgment mechanism returns to the origin, receives data continuously, and the pose judgment is repeated.
The judgment rules are described as follows:
(1) Address
(i) Judgment 1: left shoulder higher than right shoulder. The SHOULDER_LEFT and SHOULDER_RIGHT joints provided by the Microsoft Kinect human joint set are used for detection.
When the Y value of SHOULDER_LEFT is smaller than the Y value of SHOULDER_RIGHT, the golfer's left shoulder is lower than the right shoulder. The discriminant is expressed as

If (Y_SHOULDER_RIGHT − Y_SHOULDER_LEFT) ≥ TH1, EP1 = 1; otherwise EP1 = 0,

where TH1 is the system-set threshold.
(ii) Judgment 2: spacing between feet. In the golf swing stance, the width between the feet should be identical to the shoulder width. The SHOULDER_LEFT, SHOULDER_RIGHT, FOOT_LEFT, and FOOT_RIGHT joints provided by the Microsoft Kinect human joint set are used for detection. When the distance between the X value of FOOT_RIGHT and the X value of FOOT_LEFT is greater than the distance between the X value of SHOULDER_RIGHT and the X value of SHOULDER_LEFT, the golfer's stance is too wide. The discriminant is expressed as

If |X_FOOT_RIGHT − X_FOOT_LEFT| − |X_SHOULDER_RIGHT − X_SHOULDER_LEFT| ≥ TH2, EP2 = 1; otherwise EP2 = 0,

where TH2 is the system-set threshold.
where TH31 and TH32 are the system-set thresholds.
(ii) Judgment 4: whether the right elbow is raised. The correlation among the SHOULDER_RIGHT, ELBOW_RIGHT, and HAND_RIGHT joints provided by the Kinect human joint set is used for judgment. The discriminant is expressed as

If |Y_SHOULDER_RIGHT − Y_ELBOW_RIGHT| < TH41 AND |Y_HAND_RIGHT − Y_ELBOW_RIGHT| < TH42, EP4 = 1; otherwise EP4 = 0,

where TH41 and TH42 are the system-set thresholds.

If (Y_start − Y_end) ≥ TH51, EP5 = 1; otherwise EP5 = 0,

where TH51 is the system-set threshold.
(ii) Judgment 6: backswing amplitude. The body shifts if the backswing amplitude is too large; the X values of the SHOULDER_CENTER and HIP_CENTER joints provided by the Kinect human joint set are used for judgment. The discriminant is expressed as Eq. (8).
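As a sketch, the first two address-phase discriminants can be implemented directly on a skeleton table mapping joint names to (x, y, z) coordinates. The threshold values below are illustrative placeholders, not the TH values used in the paper:

```python
def ep1_shoulder_tilt(joints, th1=0.02):
    """Judgment 1: EP1 = 1 when the left shoulder drops below the right
    shoulder by at least th1 (in metres; illustrative threshold)."""
    dy = joints["SHOULDER_RIGHT"][1] - joints["SHOULDER_LEFT"][1]
    return 1 if dy >= th1 else 0

def ep2_stance_width(joints, th2=0.05):
    """Judgment 2: EP2 = 1 when the stance is wider than the shoulder
    width by at least th2 (in metres; illustrative threshold)."""
    feet = abs(joints["FOOT_RIGHT"][0] - joints["FOOT_LEFT"][0])
    shoulders = abs(joints["SHOULDER_RIGHT"][0] - joints["SHOULDER_LEFT"][0])
    return 1 if feet - shoulders >= th2 else 0
```

In use, each function returns the EP flag for one frame of joint data, and a warning message is issued whenever a flag is raised.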

Simulation Results
This section presents the simulation of the golf swing by Kinect. Figure 6 illustrates a golfer's swing as detected by the Kinect. The golf swing motion is separated into six phases, including take away, forward swing, acceleration, early follow through, and late follow through. A Kinect sensor can capture and record this action. Figure 7 depicts an implemented golf swing captured by Kinect motion-sensing photography. Furthermore, the action data detected for a golfer are packaged in an XML format, which is easy to integrate with other services; the XML format is illustrated in Table 2.
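Since Table 2 is not reproduced here, the sketch below shows one plausible way such a per-frame XML package could be produced. The element and attribute names (SwingFrame, Joint, name, x, y, z) are hypothetical, not the paper's actual schema:

```python
import xml.etree.ElementTree as ET

def joints_to_xml(frame_id, joints):
    """Serialize one frame of joint coordinates to an XML string.
    joints maps joint names to (x, y, z); the schema is illustrative."""
    root = ET.Element("SwingFrame", id=str(frame_id))
    for name, (x, y, z) in sorted(joints.items()):
        ET.SubElement(root, "Joint", name=name, x=str(x), y=str(y), z=str(z))
    return ET.tostring(root, encoding="unicode")
```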

Conclusion
Golf is a challenging game, as it is difficult to strike the ball accurately so that it flies high and far in the right direction. Such misplay is likely to frustrate learning. Golf-specific exercise drills can be prescribed to a golfer to improve the swing efficiency. Golf swing analysis is a popular research subject, and it is divided into contact sensing and non-contact sensing, according to different research methods. This study aims to construct a teaching system for correcting a swing using non-contact sensing Kinect motion-sensing photo equipment. The proposed system is expected to collect the golfer's pose and action during the backswing for data analysis and to detect incorrect backswing actions to attain the goal of interactive golf teaching.
Before the swing, the golfer stands within an appropriate range of the Kinect action sensing terminal. The action sensing end is in charge of capturing the color image and depth image from the visual range and transmitting the two data streams via USB interface to the Kinect processing element of the central motion information processing end. When the golfer enters the Kinect action sensing end range, the skeleton joint data are transmitted to the Kinect processing element. The Kinect processing element confirms that the golfer's joints are tracked, and the skeleton joint data are sent to the logical processing unit.
To utilize the motion capture advantages of the Kinect, we can use the coordinate data from the Kinect to determine a golfer's exact body measurements and posture. The Kinect motion-sensing system creates an accurate animation of the swing that can be viewed from any angle, as well as exact data on any body segment throughout the swing. The golfer may use this feedback to refine the movement, and learning of the skill is thereby accelerated. This information provides an efficient way for golfers to swing based on what they can do physically. A Kinect motion-sensing system can determine a golfer's sequence of movement through a swing.