Fast Estimation of Pedestrian Movement

In this study, a single camera has been used to capture images of the road in front of a moving vehicle. Image processing algorithms identify the locations of pedestrians in the images, calculate the direction and rate of movement of each one, and issue safety warnings. The pedestrian motion vector detection and warning system presented here has three components: The first serves to identify, locate, and mark the positions of pedestrians in images using the Fisher classifier. The follow-up image processing is confined to the labeled images, and this significantly reduces the image postprocessing load. The second component involves the calculation of pedestrian motion vectors using the Lucas–Kanade optical flow method. Finally, the vehicle's future zone of movement is established, and judgment is made as to whether any pedestrians will be in this zone or not. This is determined from the pedestrian movement vectors, and a warning can be triggered to alert the driver in time for any necessary avoidance action to be taken. This can improve road safety and reduce the number of accidents involving pedestrians that result from driver fatigue, negligence, or careless driving.


Introduction
According to a World Health Organization (WHO) report in 2013, more than 270,000 pedestrians lose their lives on the world's roads each year, accounting for 22% of a total of 1.24 million road accident deaths. WHO has called on governments and automobile companies to take concrete action to improve pedestrian safety. In recent years, many advanced driver assistance systems (ADAS) have been developed for vehicles to improve both driving and safety. Safety features include collision avoidance technologies that alert the driver to potential problems, or take action to avoid collisions by implementing safeguards or even taking control of the vehicle. Some features of ADAS, including a pedestrian protection system, (1)(2)(3) a lane departure warning system, (4,5) a blind spot monitor, (6) automatic parking, and driver drowsiness detection, (7,8) have become the focus of accident avoidance. In these studies, information about the traffic environment is acquired by a computerized network of cameras. In some cases, sensing elements such as radar, ultrasound, and infrared have also been used. For example, Mercedes-Benz has developed the "PRE-SAFE with Pedestrian Recognition System", which uses radar and stereoscopic cameras to determine the direction in which pedestrians are moving to avoid collisions. Volvo has proposed a "Pedestrian Detection with Full Auto Brake (PDFAB)" for 2020, which will enhance driving safety and achieve a "zero accident" vision. The PDFAB system will use computer-vision algorithms to identify pedestrians 80 cm in height or more, who are within 50 m of the front of the vehicle and within a 45° fan-shaped area. Auxiliary radar will measure the distance and detect the direction of the pedestrian's movement. Mazda announced their i-ACTIVSENSE security system in November 2012.
This can automatically detect obstacles in front of the vehicle (called Forward Obstruction Warning, FOW), and uses warning lights and audio signals to alert the driver to ensure early reaction. In November 2013, Ford announced a brand new obstacle avoidance system (OAS). Should a vehicle in the front, or to the side, suddenly brake or change direction as a reaction to an emergency, the driver will be warned. If the driver does not take immediate action, the OAS will actively intervene by braking or steering the car to avoid the obstacles and prevent a collision.
Pedestrians and obstacles are the most important factors in traffic accidents. Lin et al. (9) proposed a monocular-vision-based obstacle detection algorithm for parking assistance applications in an advanced safety vehicle that uses a rear view camera. The corner features of rear obstacles are first estimated using the features from accelerated segment test (FAST) corner detection method. (10) An inverse perspective mapping (IPM) image is then used to determine whether all the detected features are obstacle candidates. This system is only usable in typical urban parking lots.
Langer and Jochem (11) described an integrated millimeter wave radar and vision sensor system for autonomous on-road navigation. It has a range of approximately 200 m and uses a linear array of receivers and wave front reconstruction to compute the range and bearings of objects within the field of view. It is integrated with a vision-based lane-keeping system to accurately detect and classify obstacles with respect to the danger they pose to the vehicle and to execute any required avoidance maneuvers. The combination of radar sensors and visual detection allows road obstacles to be more accurately identified, but also results in higher cost.
Keller and Dang (12) presented a novel active pedestrian safety system that combines sensing, situation analysis, decision making, and vehicle control. The sensing component is based on stereo vision, and it fuses two complementary approaches for added robustness: (1) motion-based object detection and (2) pedestrian recognition. The highlight of the system is its ability to decide whether it will perform automatic braking or evasive steering, and reliably execute this maneuver at relatively high vehicle speed (up to 50 km/h). In this study, a stereoscopic camera is used to obtain 6D-Vision information, which is a combination of 3D information with additional information about the movement of the object(s). The 3D position is accompanied by a series of 3D positions of the object, which allows tracking. At the same time, the Kalman Filter, which combines space and time messages, can provide an estimate of the 3D motion path. The disadvantages of this system are the high cost of the stereoscopic visual sensors and the time-consuming calculations.
Alonso et al. (13) have proposed a safety assistant system for dynamically monitoring overtaking from the rear of a vehicle. Rear road status information from side mirror cameras mounted outside the vehicle is acquired, and calculations are made of the dynamic driving status using the Lucas-Kanade optical flow method. In addition, the authors have proposed a template filtering and compensation perspective formula method, which can quickly obtain the vehicle position. Although it is not as accurate as inverse perspective transformation, it has the advantage of fast execution.
In these previous studies, the use of image processing technology to identify objects (including pedestrians) is consistent, and a combination with radar is often used to accurately calculate object distance. However, the use of multiple cameras to build stereoscopic or panoramic images incurs substantial equipment cost and also increases the amount of visual algorithm computation. In our study, a single camera has been used to capture road images, so there was no need for enormous amounts of stereoscopic image data processing. Real-time processing was made possible, and the processing load was much lower, because an oversized n × n detection window was not used and only specific image areas, the regions of interest (ROI), were processed. The pedestrian motion vector detection and warning system presented here has three components: The first serves to identify, locate, and mark the positions of pedestrians in images using the Fisher classifier. The follow-up image processing is confined to the labeled images, and this significantly reduces the image postprocessing load. In the second part, the Lucas-Kanade optical flow method was used to calculate the pedestrian movement vectors so that their forward moving rate and direction could be determined. The last step is the establishment of the area into which the vehicle will move in the future. Judgment is then made as to whether a pedestrian will move into the driving area of the vehicle or not. A signal can then be generated to alert the driver to such an eventuality.

Pedestrian detection
In our previous work, the pedestrian detection procedure is accelerated using a two-layer cascade of classifiers. (14) At the front end, the Fisher classifier using Haar-like features (15)(16)(17) can rapidly select candidate regions in the image where pedestrians may be present. At the back end, the Fisher classifier using a covariance matrix descriptor (18) can accurately determine if pedestrians are positioned in the candidate regions. If a region is determined as positive by the two-layer cascade classifiers, images that might include pedestrians are captured and delineated by a rectangular frame. Image processing is only carried out on the marked area and the computational load is kept to a minimum.
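As a rough sketch of how a Fisher (linear discriminant) classifier separates two classes of feature vectors, the following NumPy example trains on synthetic data; the feature dimensionality, the synthetic distributions, and the midpoint threshold are hypothetical stand-ins for the Haar-like feature vectors used in the paper, not the authors' implementation.

```python
import numpy as np

def fisher_direction(pos, neg):
    """Fisher linear discriminant direction: w = Sw^-1 (mu_pos - mu_neg)."""
    mu_p, mu_n = pos.mean(axis=0), neg.mean(axis=0)
    # Within-class scatter, regularized slightly for numerical stability.
    Sw = np.cov(pos, rowvar=False) + np.cov(neg, rowvar=False)
    Sw += 1e-6 * np.eye(Sw.shape[0])
    return np.linalg.solve(Sw, mu_p - mu_n)

def classify(w, threshold, features):
    """Project feature vectors onto w and threshold the scalar score."""
    return features @ w > threshold

# Hypothetical training data standing in for Haar-like feature vectors.
rng = np.random.default_rng(0)
pos = rng.normal(2.0, 1.0, size=(200, 4))   # "pedestrian" windows
neg = rng.normal(-2.0, 1.0, size=(200, 4))  # "background" windows
w = fisher_direction(pos, neg)
thr = 0.5 * (pos.mean(axis=0) + neg.mean(axis=0)) @ w  # midpoint threshold
```

In a cascade of the form described above, a fast classifier of this kind over Haar-like features would run first over candidate regions, and a second Fisher classifier over covariance descriptors would rescore only the surviving windows.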

Pedestrian corner features
After a pedestrian has been identified by a frame, the Harris corner detection algorithm (19) is applied to find the corners of the pedestrian's contour. It tests each pixel in the rectangular frame that contains a pedestrian image to see if a corner is present. Harris corner detection was developed from the Moravec algorithm. The basic principle is to take a small window of size n × n centered on a target pixel, move the window separately in steps of 45 degrees (horizontally, vertically, and along the two diagonals), and calculate the gradation change of the window in the eight directions. The degree of gradation change between two nearby windows in the image I, centered on (u, v) and (u + ∆x, v + ∆y), respectively, is then measured as

S(∆x, ∆y)_{u,v} = Σ_{u,v} w(u, v) [I(u + ∆x, v + ∆y) − I(u, v)]^2, (1)

where w(u, v) is a weighting function, for which a 2D Gaussian function is generally adopted. Next, I(u + ∆x, v + ∆y) can be approximated by a first-order Taylor expansion, I(u + ∆x, v + ∆y) ≈ I(u, v) + I_x ∆x + I_y ∆y, and Eq. (1) can be rearranged as

S(∆x, ∆y)_{u,v} ≈ (∆x, ∆y) A (∆x, ∆y)^T, with A = Σ_{u,v} w(u, v) [I_x^2, I_x I_y; I_x I_y, I_y^2], (3)

where I_x and I_y are the partial derivatives of I in the x- and y-directions, respectively. A corner is characterized by a large variation of S(∆x, ∆y)_{u,v} in all directions, which means that the matrix A will have two large eigenvalues λ_1 and λ_2.
Since the calculation of the eigenvalues of the matrix A in Eq. (3) is computationally expensive, Harris and Stephens (19) suggest the corner response of Eq. (4),

M_c = det(A) − κ [trace(A)]^2 = λ_1 λ_2 − κ (λ_1 + λ_2)^2, (4)

to verify a corner and make the computation more efficient. The algorithm does not have to actually compute the eigenvalue decomposition of the matrix A; it is sufficient to evaluate the determinant and the trace of A to find a corner. M_c is positive in corner regions, negative in edge regions, and small in flat regions.
Here, the selected value of κ will affect which points qualify as corners; empirical values are generally between 0.04 and 0.06. Corner detection is only carried out for the area within the rectangular frame, in other words, the region containing the pedestrian. Along the contour of the pedestrian, the horizontal and vertical gradient values I_x and I_y, respectively, are higher. To speed up the corner feature calculations, corner detection is therefore concentrated mainly on the image areas with higher I_x and I_y values. The results of the calculations are corner features distributed over the head, hands, body, and feet, as shown in Fig. 1(a). Since the hands and feet swing back and forth considerably as a person walks or runs, and the magnitude and direction of their movement are not consistent, they are not representative of a pedestrian moving at a steady speed. The head and body swing less, and their movement vectors are a better choice for calculation. Moreover, significant computation time can be saved when calculation is confined to the characteristic data of the head alone. Therefore, in the actual tests, we only calculated the motion vectors using corner feature data from the upper quarter of the rectangular frame, as shown in Figs. 1(b) and 1(c).
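The corner response of Eq. (4) can be sketched in a few lines of NumPy. This is an illustrative re-implementation, not the authors' code; the 5 × 5 box window, the κ = 0.04 setting, and the synthetic test image are assumptions.

```python
import numpy as np

def harris_response(img, k=0.04, win=5):
    """Compute M_c = det(A) - k*trace(A)^2 per pixel, using a box window."""
    Iy, Ix = np.gradient(img.astype(float))  # central-difference gradients

    def box_sum(a):
        # Sum each structure-tensor entry over a win x win neighborhood.
        out = np.zeros_like(a)
        r = win // 2
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                out += np.roll(np.roll(a, dy, axis=0), dx, axis=1)
        return out

    Sxx, Syy, Sxy = box_sum(Ix * Ix), box_sum(Iy * Iy), box_sum(Ix * Iy)
    det = Sxx * Syy - Sxy * Sxy
    trace = Sxx + Syy
    return det - k * trace ** 2

# Synthetic image: a bright square whose top-left corner is at (10, 10).
img = np.zeros((32, 32))
img[10:22, 10:22] = 1.0
R = harris_response(img)
```

As the text states, the response is positive at the square's corner, negative along its edges, and near zero in flat regions, which is what makes the determinant-and-trace shortcut usable without an eigenvalue decomposition.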

Optical flow algorithm
After selecting the corner characteristics of the pedestrian's head contour to represent their movement characteristics, the Lucas-Kanade optical flow algorithm (20,21) was used to calculate the movement vectors of the corner features. The Lucas-Kanade method can be used with two consecutive image frames, I(t) and I(t + 1), to calculate the movement vectors of an object based on the following three assumptions. Assumption 1: There is no marked difference in the brightness of the two consecutive images. For example, if one image was acquired in bright sunlight and the other in the shade, the variation of brightness may be too large for a vector calculation. Assumption 2: The distance moved between successive images cannot be too large. Where a fixed n × n detection window is used, the object must remain within the window to be tracked. If motion displacement takes the object outside the window area, tracking will fail. Assumption 3: The detected pixels and the adjacent pixels move in the same direction.
Thus, the Lucas-Kanade optical flow method assumes that the flow is constant for all pixels within an n × n window centered at p in image I(t) when calculating pedestrian movement. The local image flow (V_x, V_y) then satisfies

I_x(q_i) V_x + I_y(q_i) V_y = −I_t(q_i), i = 1, 2, ..., n^2, (5)

where q_1, q_2, ..., q_{n^2} are the pixels inside the n × n window, and I_x(q_i), I_y(q_i), and I_t(q_i) are the partial derivatives of the image I in the x-direction, y-direction, and time t, evaluated at the point q_i at the current time. This overdetermined system is solved by weighted least squares. In practice, the weight w_i is usually set to a Gaussian function of the distance between q_i and p. However, if the pedestrian moves too fast, they might be outside the n × n window by the time the next image is captured. A larger window is one solution, but that would increase computation by O(n^2). The pyramid Lucas-Kanade optical flow method (22) instead shrinks the image into several layers, in which the width and height of each layer are half those of the preceding one. The same n × n window can then detect the characteristic displacements in the shrunken images. Assuming that the shrunken width/height of each image is half that of the preceding layer, and that the original image is layer 0, the displacement at layer L relates to the displacement in the original image by d^0 = 2^L d^L, as shown in Eq. (6). Next, the residual of Eq. (7) is used to determine whether the tracking of the characteristic point p is valid. The residual compares the characteristic point (including its n × n neighborhood) between successive images. A low residual value indicates that the characteristic point can be tracked into the next image, which means that the calculated optical flow value d^L is correct; ideally, the residual should approach zero. On the other hand, a large residual value is an indication that the characteristic point has not been correctly tracked, in which case the calculated optical flow value is meaningless.
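The weighted least-squares solution of the windowed system above, v = (AᵀWA)⁻¹AᵀWb, can be illustrated with a small NumPy sketch. This is illustrative only: the synthetic Gaussian-blob image pair, the 5 × 5 window, and the Gaussian weight scale are assumptions.

```python
import numpy as np

def lucas_kanade_at(I1, I2, p, n=5, sigma=1.5):
    """Solve the n x n windowed system I_x*Vx + I_y*Vy = -I_t around point p."""
    y0, x0 = p
    r = n // 2
    Iy, Ix = np.gradient(I1)          # spatial derivatives of the first frame
    It = I2 - I1                      # temporal derivative
    ys, xs = np.mgrid[y0 - r:y0 + r + 1, x0 - r:x0 + r + 1]
    A = np.stack([Ix[ys, xs].ravel(), Iy[ys, xs].ravel()], axis=1)
    b = -It[ys, xs].ravel()
    # Gaussian weights w_i on the distance from each q_i to p.
    d2 = (ys - y0) ** 2 + (xs - x0) ** 2
    W = np.diag(np.exp(-d2.ravel() / (2 * sigma ** 2)))
    return np.linalg.solve(A.T @ W @ A, A.T @ W @ b)  # (Vx, Vy)

# Synthetic pair: a smooth blob translated by (dx, dy) = (0.3, 0.2).
yy, xx = np.mgrid[0:41, 0:41].astype(float)
blob = lambda cx, cy: np.exp(-((xx - cx) ** 2 + (yy - cy) ** 2) / 30.0)
I1, I2 = blob(20.0, 20.0), blob(20.3, 20.2)
v = lucas_kanade_at(I1, I2, p=(18, 18))   # recovers roughly (0.3, 0.2)
```

The small-displacement assumption (Assumption 2 above) is visible here: the linearization only holds because the shift is a fraction of a pixel relative to the blob's smoothness.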
If we cannot correctly track the characteristic points because the pedestrian has moved too fast or has moved out of the n × n window, the remedy is to shrink the images and recalculate the optical flow and residual values, using the results to decide whether the optical flow values are meaningful. The images are repeatedly shrunk until the displacements of the characteristic points can be tracked. The optical flow values d^L of the characteristic points in the I^L images have been successfully tracked when the calculations show sufficiently small residual values. We can then use Eq. (6) to get d^0 and, in turn, obtain the movement vectors in image I^0. Of course, it is possible that an object moves so fast that correct optical flow values cannot be obtained even from the I^{Lmax} images. In our study, we have chosen a setting of L_max = 2 and consider this suitable for tracking pedestrians moving at normal speeds.
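The coarse-to-fine idea and the layer scaling d^0 = 2^L d^L can be sketched minimally as follows. This is an illustrative simplification, not the paper's pipeline: a single coarse level, 2 × 2 block-average downsampling, and an unweighted whole-image least-squares flow are all assumptions.

```python
import numpy as np

def downsample(img):
    """Halve width and height by averaging 2 x 2 blocks (one pyramid level)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def global_flow(I1, I2):
    """Least-squares flow over the whole image: solve Ix*Vx + Iy*Vy = -It."""
    Iy, Ix = np.gradient(I1)
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    b = -(I2 - I1).ravel()
    return np.linalg.lstsq(A, b, rcond=None)[0]

# Blob translated by (dx, dy) = (2.0, 1.0) at full resolution -- too large
# for a tiny window at layer 0, but only (1.0, 0.5) at layer L = 1.
yy, xx = np.mgrid[0:64, 0:64].astype(float)
blob = lambda cx, cy: np.exp(-((xx - cx) ** 2 + (yy - cy) ** 2) / 60.0)
I1, I2 = blob(32.0, 32.0), blob(34.0, 33.0)
L = 1
dL = global_flow(downsample(I1), downsample(I2))  # flow estimated at layer L
d0 = (2 ** L) * dL                                # Eq. (6): d0 = 2^L * dL
```

Halving the resolution halves the apparent displacement, so the same small window that failed at layer 0 succeeds at layer L, and the result is scaled back up by 2^L.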

Focus of expansion
The cameras used in this study were mounted on the front windshield of the vehicles and commanded an unobstructed view of the road ahead. However, as a vehicle moves forward, the image expands about the focus of expansion (as shown in Fig. 2). Trees, buildings, posts, and other immovable objects appear to radiate outwards from the focal point. In other words, even pedestrians who are standing still will appear to move in successive frames. This is reflected by changes in the coordinates, which depend on the relative speed of the vehicle v_c. In particular, the optical flow v_f acquired from the images does not represent the actual pedestrian movement vectors. The real pedestrian movement vectors should be v_p = v_f − v_c. The pedestrian shown in Fig. 3 is standing still; when the vehicle travels forward at a speed v_c, the position coordinate (x_1, y_1) of the pedestrian in the I(t) image is as shown in Fig. 3(a), and the position coordinate in the I(t + ∆t) image moves to (x_2, y_2), as shown in Fig. 3(b). During this time (∆t), the coordinate changes in the subsequent images are due solely to the forward movement of the vehicle at speed v_c.
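The correction v_p = v_f − v_c can be sketched as follows. This is a simplified model under stated assumptions: we assume the camera-induced flow at a pixel is purely radial from the focus of expansion with a known gain, whereas in practice that gain must be estimated from the vehicle speed and camera geometry; the pixel coordinates and numbers are hypothetical.

```python
import numpy as np

def ego_flow(pixel, foe, gain):
    """Camera-induced flow: radial expansion away from the focus of expansion."""
    return gain * (np.asarray(pixel, float) - np.asarray(foe, float))

def pedestrian_flow(v_f, pixel, foe, gain):
    """Actual pedestrian motion: observed optical flow minus ego-motion flow."""
    return np.asarray(v_f, float) - ego_flow(pixel, foe, gain)

# Hypothetical numbers: FOE at the image center, gain from forward speed.
foe = (320.0, 240.0)
gain = 0.05
v_f = np.array([6.0, 1.0])   # observed flow at the pedestrian's head
v_p = pedestrian_flow(v_f, (420.0, 260.0), foe, gain)
# A stationary pedestrian would yield v_p close to zero.
```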

Vehicle Driving Areas
The lane model plays a very important role in our pedestrian warning system, as shown in Fig. 4. As already mentioned, a single CCD camera was installed behind the front windshield of the vehicle to capture images of the road ahead. In the image, the horizontal axis (u) represents the direction of deviation (left or right) and the vertical axis (v) represents the driving direction of the car.
We can use road configurations and settings as well as driving habits (for example, some drivers tend to favor the right or the left side of the lane) to analyze the statistical results and construct a lane model. The lane model describes possible paths that might be taken by a vehicle. This lane model is constructed using the location of the lane markings on both the left and right sides, namely, u li and u ri . A covariance matrix that describes the potential range of fluctuation is also used in the lane model.
We referred to the specifications for main roads in Taiwan (23) and other statistical data relating to road design and used five parameters to describe the locations of the lane markings in the image: (24) (1) the camera inclination angle α, which is the angle of the camera as installed in the car; (2) the vehicle steering angle ψ; (3) the lateral curve C_l; (4) the road width L; and (5) the transverse position x_0 of the vehicle on the carriageway, as shown in Fig. 5. Mean values for the five parameters, taken from Table 1, were input into Eq. (8) with γ = 1. We then input v_i (i = 1, ..., N) into the equation to obtain N values u_i (at this stage, u_li = u_i). These are scalar values, and the collection of the N scalars forms the parameter vector X = [u_1, u_2, ..., u_N]_{1×N}. The parameter vector X describes the average location of the lane markings in a typical road image, that is, the location where the lane markings most commonly appear. We then take the first-order derivatives with respect to these five parameters to obtain the Jacobian matrix, as shown in Eq. (9).
We multiply the Jacobian matrix J obtained in Eq. (9) by the matrix C_p, where C_p is the diagonal matrix obtained from the standard deviations of the five parameters listed in Table 1. The error covariance matrix C_x = J C_p J^T can then be obtained, as shown in Eq. (10).
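This step is the standard first-order uncertainty propagation C_x = J C_p Jᵀ, which a small NumPy sketch makes concrete. The Jacobian values and parameter standard deviations below are hypothetical placeholders, not the values of Table 1 or Eq. (9).

```python
import numpy as np

rng = np.random.default_rng(1)
N, P = 6, 5                        # N image rows, 5 lane parameters
J = rng.normal(size=(N, P))        # hypothetical Jacobian du_i/dp_j (Eq. (9))
sigmas = np.array([0.02, 0.01, 1e-4, 0.3, 0.2])  # hypothetical std devs
C_p = np.diag(sigmas ** 2)         # diagonal parameter covariance
C_x = J @ C_p @ J.T                # error covariance of the lane locations
# The diagonal of C_x gives the variance of each u_i; its square root is the
# sigma used below to widen the predicted lane markings into a danger band.
```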
Lastly, we construct our lane model using the parameter vector X and the covariance matrix C_x, as shown in Fig. 4. The lane-marking locations u_li and u_ri are taken from the parameter vector X on the basis of Eq. (10). The limits u_li − σ_li, u_li + σ_li, u_ri − σ_ri, and u_ri + σ_ri are then calculated from X and the diagonal elements σ of the covariance matrix C_x. We define the range enclosed by u_li − σ_li and u_ri + σ_ri as a dangerous zone for pedestrians, through which the car might drive.
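Putting the pieces together, the danger-zone test can be sketched as below. The band limits, pedestrian state, and the linear extrapolation horizon are all hypothetical illustration values, not parameters from the paper.

```python
def in_danger_zone(u, u_left, sigma_left, u_right, sigma_right):
    """True if horizontal image position u lies inside the widened lane band."""
    return (u_left - sigma_left) <= u <= (u_right + sigma_right)

def warn(u, v_u, u_left, sigma_left, u_right, sigma_right, horizon=10):
    """Warn if the pedestrian is in the danger zone now, or would enter it
    within `horizon` frames when the horizontal flow v_u is extrapolated
    linearly."""
    if in_danger_zone(u, u_left, sigma_left, u_right, sigma_right):
        return True
    return in_danger_zone(u + horizon * v_u, u_left, sigma_left,
                          u_right, sigma_right)

# Hypothetical band at one image row: lane markings at u = 250 and u = 390.
u_l, s_l, u_r, s_r = 250.0, 8.0, 390.0, 8.0
```

A pedestrian inside the band, or one whose motion vector points into it, triggers the warning; one moving away does not, matching the two cases of Fig. 6.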
Table 1 lists the lane location parameters. The pedestrian collision warning system alerts the driver when a pedestrian is within the danger zone, or when a previously obtained pedestrian movement vector shows the pedestrian to be heading for the danger zone, as shown in Fig. 6(a). When the vector indicates movement away from the danger zone, the path ahead of the moving vehicle is considered to be safe, as shown in Fig. 6(b).

Experimental Results and Discussion
In this article, the tests were performed mainly on urban roads during the daytime, using a single camera mounted at the center of the dashboard behind the windshield of an ordinary car to capture images of the road ahead. The specifications of the camera used in this study are listed in Table 2. The experimental results and analysis are presented in three parts: pedestrian detection, pedestrian movement vectors, and danger zone warnings.

Pedestrian detection
The pedestrian image database was taken from the French National Institute for Research in Computer Science and Control (INRIA). (25) The advantage of using Haar-like characteristics for identification is fast computation; areas that may contain pedestrians can be searched for very quickly. The experimental results show that the front-end classifier rarely misses pedestrians, but its false positive rate approaches 50% of the judgments; in other words, nonpedestrians could be mistaken for pedestrians. To reduce the number of misidentifications, we used cascaded front- and back-end classifiers to reidentify the image areas filtered by the front-end classifier. The use of the back-end classifier significantly reduced the false positive ratio. Incidentally, the images used for testing included both stationary and moving pedestrians. Some experimental results obtained with this pedestrian detection system are presented in Fig. 7.

Pedestrian movement flow
After the pedestrian positions have been delineated by rectangular frames, the follow-up image processing algorithms need only process the upper quarter of these frames, i.e., the ROI, so the amount of computation is relatively small. The sequence of ROI computation is as follows: To start with, Harris corner detection is used to find the corner feature points of the head outline. Next, the movement vectors v_f are calculated from the corner point data using the pyramid optical flow method. The vehicle forward speed v_c is then used to correct for the focus of expansion and to determine the actual movement vectors v_p of the pedestrian(s). In our experiments, the vehicle forward speed v_c was limited to 40 km/h out of consideration for testing safety. Some experimental results of the pedestrian movement estimation are shown in Fig. 8, in which the pedestrian movement vectors are indicated. In Fig. 9, the danger zone in the vehicle's forward path is specified on the basis of the lane model. When a pedestrian is within the danger zone, or the pedestrian motion vectors indicate possible movement into the danger zone, the system alerts the driver to this eventuality by generating a visual or audible warning.

Discussion
In this study, we have used a number of available technologies to devise a simple ADAS to improve the safety of pedestrians and the drivers of motor vehicles, particularly passenger cars. Pedestrians are often very seriously hurt in road accidents, and in this study, we focused on their protection and the prevention of such injuries. Computer vision technology is widely used nowadays for all types of object identification, but the differences between pedestrians and rigid, mostly immovable objects have been difficult to determine on the basis of a single type of characteristic data. Most previous studies used multiple characteristics to identify a pedestrian, and this approach makes it necessary to use a huge database to train the learning classifiers. The image algorithms consume a correspondingly large amount of computing power and time. The pedestrian identification algorithm presented in our study consists of two cascaded classifiers. The front end uses single Haar-like characteristic data for the rapid identification of possible pedestrians. Although the back-end classifier needs to process multicharacteristic data to make classification decisions, its computational load is quite light because only image areas that have been prefiltered by the front-end classifier are processed. For the calculation of pedestrian movement vectors, only the corner feature points of the pedestrians' heads are used, which provide the overall moving velocities and directions. This is both effective and workable. The method simplifies image processing computation on the premise that the effectiveness of identification is not adversely affected in any way. However, each image shown in Table 3 took about 0.5 s to process using a PC with an Intel Core i5 CPU and 4 GB RAM executing the algorithms coded in Matlab. Even if the requirements of real-time operation could be satisfied using hardware circuitry, the cost would still be relatively high.
We will continue our efforts to find even simpler image algorithm alternatives and also investigate the use of parallel processing.

Conclusions
In this study, we utilized a visual image processing method to estimate pedestrian movement vectors and issue anticollision warnings. Two steps are involved, pedestrian detection and pedestrian movement vector estimation, each of which could be established as an independent module. Pedestrian identification is performed by two-layered cascaded classifiers. The front-end classifier uses Haar-like characteristic data for rapid classification, and the back-end classifier uses covariance descriptor characteristics to reclassify the prefiltered image zones and reach correct identification with 96% certainty. In addition, the postprocessing algorithm for pedestrian movement vectors focuses only on the ROI, and this raises the system processing speed significantly. The pyramid movement vector algorithm is even more efficient for estimating pedestrian movement. Since the speed at which pedestrians move is not very high, a two-layer pyramid optical flow algorithm and a 5 × 5 tracking window are sufficient. The pedestrian movement vector estimation and safety warning system design easily copes with real-time processing applications. The design of this visual image processing system takes both quantitative calculations and time into consideration.