Light Field Acquisition Method Based on Depth Sampling

1 Engineering Research Center of Spatial Information Technology, MOE, Capital Normal University, 105 West Third Ring North Road, Haidian District, Beijing 100048, China
2 Key Lab of 3D Information Acquisition and Application, MOE, Capital Normal University, 105 West Third Ring North Road, Haidian District, Beijing 100048, China
3 Beijing Imaging Technology Innovation Center, Capital Normal University, 105 West Third Ring North Road, Haidian District, Beijing 100048, China


Introduction
Light field imaging is a technique that first acquires an image and then recomputes it using algorithms. (1,2) The imaging process includes two parts: the acquisition of the light field and the processing of the light field data. Light field imaging captures the 4D parameters of the light field, and hence subaperture images from multiple angles, in a single exposure, which makes it attractive for applications in depth extraction, 3D modeling, and virtual reality. The quality of light field acquisition directly determines how widely this technology can be applied, and much research has been conducted on how to acquire a light field. Adelson and Bergen proposed a 7D plenoptic function P(x,y,z,θ,φ,λ,t) to characterize the geometric distribution of light in space. (3) Considering some invariant properties of light propagation, Levoy and Hanrahan reduced the plenoptic function to four dimensions and introduced the idea of reference planes: the 4D light field is parameterized as (u,v,s,t), where (u,v) and (s,t) denote the intersections of a ray with two parallel planes. (4) This parametric characterization provides the theoretical basis for light field acquisition. Accordingly, light field acquisition is in fact the process of obtaining the intersections of light rays with two reference planes, and the location of the reference planes determines the type of light field acquisition equipment required.
There are two main types of existing light field acquisition device: one type, the camera array, records light by placing a reference plane on the object side, and the other type, the plenoptic camera, records light by placing a reference plane on the image side. The camera array consists of multiple conventional cameras, (5,6) as shown in Fig. 1, which form a virtual projection reference plane consisting of the projection centers of the lenses and a virtual imaging plane consisting of the CMOS sensors. The light field is collected by acquiring the intensity of light radiated from a point in the target scene as seen from different perspectives, and the image taken by each camera can be regarded as a sample of the light field from a different angle. (7-9) A plenoptic camera consists of a main lens, a microlens array, and an imaging sensor, with the microlens array placed in front of the sensor, (10,11) as shown in Fig. 2. Here, the two reference planes are formed by the microlens array and the CMOS sensor. The light field is collected by capturing the angular distribution of light at the main lens through the individual microlenses.
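The two-plane parameterization described above can be illustrated with a short sketch. It is not taken from the cited works: the function name `ray_to_two_plane`, the choice of planes z = 0 and z = 1, and the use of NumPy are assumptions made only for illustration. The sketch computes the (u,v,s,t) coordinates of a single ray from its origin and direction.

```python
import numpy as np


def ray_to_two_plane(origin, direction, z_uv=0.0, z_st=1.0):
    """Return (u, v, s, t) for a ray crossing the planes z = z_uv and z = z_st.

    The plane positions are illustrative assumptions; any two distinct
    parallel planes give an equivalent parameterization.
    """
    origin = np.asarray(origin, dtype=float)
    direction = np.asarray(direction, dtype=float)
    if np.isclose(direction[2], 0.0):
        raise ValueError("ray is parallel to the reference planes")
    # Ray parameter at which the ray reaches each plane.
    k_uv = (z_uv - origin[2]) / direction[2]
    k_st = (z_st - origin[2]) / direction[2]
    u, v = (origin + k_uv * direction)[:2]
    s, t = (origin + k_st * direction)[:2]
    return float(u), float(v), float(s), float(t)


# A ray starting at (0, 0, -1) and travelling along (1, 0, 1).
coords = ray_to_two_plane((0.0, 0.0, -1.0), (1.0, 0.0, 1.0))
```

Because position and direction are both encoded in the two intersection points, four numbers suffice to describe a ray in free space, which is why the placement of the two planes fixes the acquisition geometry.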
The two types of light field acquisition device mentioned above operate mainly by the angular sampling of light. All images are recorded at a given angular resolution, treating a beam of light as a single ray. The light field is obtained by simultaneously recording the light propagation path and intensity information in one imaging session. However, regardless of whether a camera array or a plenoptic camera collects the light field, special equipment is required. Camera arrays require dozens or even hundreds of conventional cameras, which increases the amount of equipment and makes it difficult to control the time synchronization accuracy and relative positions of the cameras. A plenoptic camera has a simpler structure than a camera array and collects the light field in one exposure; however, the angular and spatial resolutions must share a single sensor, so both are limited and the spatial resolution is much lower than that of a conventional camera. Therefore, developing a structurally simple light field acquisition method that does not sacrifice spatial resolution has become a pressing problem.
If a light field is viewed as a 3D field that fills space, we can slice and sample it at different depths. The light field is acquired by recording the slice information at different depths; this sampling method is referred to as depth sampling in this paper. As shown in Fig. 3, a theoretical model of light field acquisition based on depth sampling is proposed in this paper. The main contributions are as follows:
(1) A theoretical model of light field acquisition based on depth sampling is proposed, and the theoretical transformation from the sampled image set to the light field L(x,y,u,v) is given for a known set of slice sampled images {(x_1,y_1,d_1), ..., (x_M,y_M,d_M)} at different depths.
(2) On the basis of the above theoretical model and with the help of the theory of image reconstruction from projections, the algorithm and technical procedure for light field acquisition from depth sampling are given.

Theoretical Model of Depth Sampling Light Field Acquisition
In the theoretical model of depth sampling light field acquisition, two mutually parallel planes are introduced to parametrically characterize the light field. As shown in Fig. 4, the main lens plane (u,v) represents the reference plane in the direction of the light source and the image plane (x,y) represents the reference plane in the imaging direction. x_m denotes the different image planes and d_m denotes the corresponding image distances.
It is assumed that depth sampling is expressed as Eq. (1), where I(x,y,d) represents the pixel value at position (x,y) on the image plane at image distance d. It can be seen from Fig. 4 that, for the same ray, both sides of Eq. (2) are equivalent expressions. Equation (3) can be obtained from the triangle similarity theorem, and Eq. (4) can be obtained in the same way. Combining these relations, the depth sampling can be expressed in terms of the projected pixel value of the same ray at the corresponding position in each image plane.
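The similar-triangle step can be written out under the usual two-plane geometry. The following is a sketch, not a reproduction of the paper's numbered equations. It assumes that a ray passes through (u,v) on the main lens plane, meets a reference image plane at image distance d at (x,y), and meets the m-th image plane at image distance d_m at (x_m,y_m); the ratio α_m = d_m/d is the image distance ratio used in the text.

```latex
% Similar triangles between the lens plane and the two image planes
\[
\frac{x_m - u}{d_m} = \frac{x - u}{d},
\qquad
\frac{y_m - v}{d_m} = \frac{y - v}{d}
\]
% Solving for the intersection with the m-th image plane, with \alpha_m = d_m / d
\[
x_m = u + \alpha_m (x - u),
\qquad
y_m = v + \alpha_m (y - v)
\]
```

Under these assumptions, the same ray is found at a position in each depth slice that shifts linearly with (u,v), with the shift (1 − α_m)u vanishing on the reference plane itself.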
Image reconstruction involves the projection of a 2D cross section of the object in various directions in the image plane to obtain a series of 1D projection functions. Then, the 2D cross section of the object is reconstructed using these 1D projection functions. (12) The most commonly used image reconstruction method is the filtered backprojection algorithm, a spatial processing method based on the Fourier transform. This algorithm convolves the projection acquired at each projection angle with a filter before backprojection to reduce the shape artifacts caused by the point spread function, thereby improving the quality of the reconstructed image. (13) In this paper, the accumulation step of the filtered backprojection algorithm is used for light field multiview image reconstruction. According to the theory of image reconstruction from projections, the image of any point can be regarded as the integral of all the light rays passing through the point at different angles. In the accumulation formula, i is the pixel value of the point, P_θ is the projection value of the ray passing through the point at angle θ, and T is the number of projection angles. Each image in depth sampling can also be seen as a 2D projection of the 4D light field onto its image plane, so the number of depth samples M plays the role of the number of projection angles T. The 4D light field recovered from the depth sampling using Eq. (8) is given by Eq. (9), in which α_m = d_m/d is the image distance ratio, L_rec(x,y,u,v) is the collected 4D light field, M is the number of depth samples, and d is the image distance of the reference image plane, which can be any of the d_m. For a given (u,v), the transmission direction of light can be determined, which is equivalent to the determination of a virtual camera. Taking images of light in the direction of (x,y) for different (u,v) values, we can obtain images having different perspectives.
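The accumulation over depth samples can be sketched in code. This is one plausible reading of the recovery step, not the paper's exact Eq. (9): it assumes that each slice is sampled at the point where the ray through (u,v) and (x,y) meets the m-th image plane, that the samples are averaged with weight 1/M, that coordinates are in pixels measured from the image center, and that nearest-neighbor lookup is adequate. The function name `subaperture_from_focal_stack` is hypothetical.

```python
import numpy as np


def subaperture_from_focal_stack(stack, distances, d_ref, u, v):
    """Average the depth slices along the rays through lens point (u, v).

    stack:     array of shape (M, H, W), one slice per image distance
    distances: the M image distances d_m
    d_ref:     image distance d of the reference plane
    u, v:      lens-plane coordinates in pixels, relative to the image center
    """
    stack = np.asarray(stack, dtype=float)
    m_count, height, width = stack.shape
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    # Pixel grid of the reference plane, centered on the optical axis.
    ys, xs = np.mgrid[0:height, 0:width].astype(float)
    xs -= cx
    ys -= cy
    out = np.zeros((height, width))
    for img, d_m in zip(stack, distances):
        alpha = d_m / d_ref
        # Where the ray through (u, v) and (x, y) meets the m-th plane.
        xm = u + alpha * (xs - u)
        ym = v + alpha * (ys - v)
        # Nearest-neighbor lookup, clamped at the border (a simplification).
        col = np.clip(np.rint(xm + cx), 0, width - 1).astype(int)
        row = np.clip(np.rint(ym + cy), 0, height - 1).astype(int)
        out += img[row, col]
    return out / m_count


# With a single slice on the reference plane, every view equals that slice.
rng = np.random.default_rng(0)
slice_img = rng.random((5, 7))
view = subaperture_from_focal_stack(slice_img[None], [1.0], 1.0, u=3.0, v=-2.0)
```

Changing (u,v) while keeping the stack fixed shifts each slice by a different amount, which is what produces the apparent change of viewpoint; a real implementation would likely use bilinear or higher-order interpolation rather than nearest-neighbor lookup.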

Light field acquisition results based on depth sampling
Depth sampling can be understood as a set of images I(x,y,d) focused on different depths of the target scene, representing a sliced sampling of different depths of the light field. This sampling approach is different from that of common devices or methods that use camera or microlens arrays to sample light field angles. If depth samples are regarded as images with different focusing distances, then depth sampling can be achieved with simpler equipment, such as ordinary optical cameras. The focal length of the camera is fixed, and the sampled data of different depth slices are obtained by acquiring images with different depths of focus. We use a Canon 5D Mark III camera as the experimental device. In the experiment, the device position is fixed to obtain slice samples having different depths in a target scene.
The target scene in this study is composed of four playing cards, with each card representing a focus plane. Playing cards are chosen because they are easy to focus on, and the movement of the viewing angle can be clearly seen after collecting the light field. The depth sampling data obtained with the Canon camera are denoted as {(x_1,y_1,d_1), ..., (x_4,y_4,d_4)}. Four images with different focus planes completely cover the entire experimental scene, and the (x,y) resolution is 1920 × 1280. To achieve better results, the camera control software digiCamControl is used to control the camera from a computer. The depths of focus of the four images are 0.75, 0.84, 0.96, and 1.03 m. To minimize the impact of the depth of field on data collection, the focal length of the lens is set to 105 mm and the aperture is set to f/4.0. When the depth of focus is 1 m, the depth of field is approximately 10 cm, so these depths of focus yield well-focused images of each card. The images acquired with different depths of focus are shown in Fig. 5.
In this study, Eq. (9) represents the light field collected from the depth sampling. In addition, subaperture images are used to visualize light fields. According to Eq. (9), given different (u,v) values, images (x,y) with different perspectives are obtained, where u represents the viewing angle in the horizontal direction and v represents the viewing angle in the vertical direction. We set the values of (u,v) to (20,0), (0,0) and (−20,0), where (0,0) represents the central viewing angle. The value in the vertical direction v remains fixed at 0, and that in the horizontal direction u is set to different values to observe the movement of the viewing angle. The obtained images (x,y), which have different perspectives, are shown in Fig. 6. The upper half of each image shows the acquired subaperture image, and the lower half is a partially enlarged view from the left of the subaperture image. The experimental results also demonstrate that given different (u,v) values, subaperture images (x,y) of different viewing angles can be obtained. In the experimental scene above, each playing card is selected as a focus plane, and the four cards completely cover the entire experimental scene. To better characterize the experiment and discuss the effect of the number of depth samples at the same time, we choose two and three depth samples that do not completely cover the experimental scene for comparison. The experimental results are shown in Fig. 7.
Since there is no ground-truth image for reference in this study, the Tenengrad, Laplacian, and variance functions are selected to evaluate the sharpness of the three images in Fig. 7. The Tenengrad and Laplacian functions are gradient-based functions that can be used to detect whether an image has sharp edges; the clearer the image, the larger the resulting value. (14,15) The variance function measures the dispersion of the gray values about their mean. Since a clear image has larger grayscale differences between pixels than a blurred image, the variance can also be used to evaluate sharpness; the clearer the image, the larger the variance. The three sharpness evaluation functions are used to quantitatively evaluate the three subaperture images generated above, and the results are shown in Table 1. It can be seen from these results that the greater the number of depth samples, the clearer the collected light field image. When the depth samples cover the entire experimental scene, the collected light field image is clearer than when they do not. This is mainly because, when the depth samples do not cover the entire experimental scene, part of the scene is never in sharp focus in any slice, so the light field subaperture image obtained using Eq. (9) is more blurred than the one obtained from samples that completely cover the scene.
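The three no-reference sharpness measures can be sketched as follows. The exact kernels and normalization used in the study are not stated, so this sketch assumes 3 × 3 Sobel and 4-neighbor Laplacian kernels and takes the mean of the squared responses; the function names are hypothetical, and absolute values will differ from those in Table 1.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
LAPLACE = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)


def _filter3(img, kernel):
    """Correlate a 2D grayscale image with a 3x3 kernel over the valid interior."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * img[i:i + h - 2, j:j + w - 2]
    return out


def tenengrad(img):
    """Mean squared Sobel gradient magnitude."""
    gx = _filter3(img, SOBEL_X)
    gy = _filter3(img, SOBEL_X.T)
    return float(np.mean(gx ** 2 + gy ** 2))


def laplacian_energy(img):
    """Mean squared Laplacian response."""
    return float(np.mean(_filter3(img, LAPLACE) ** 2))


def gray_variance(img):
    """Variance of the gray values about their mean."""
    return float(np.var(np.asarray(img, dtype=float)))


# A step edge versus a smooth ramp spanning the same gray range.
sharp = np.zeros((8, 8))
sharp[:, 4:] = 1.0
smooth = np.tile(np.linspace(0.0, 1.0, 8), (8, 1))
```

All three measures are larger for the step edge than for the ramp, which matches the convention in the text that a larger value indicates a clearer image. Because the measures are not normalized against image content, they are meaningful only when comparing images of the same scene at the same resolution.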

Comparison of light field acquisition results obtained by depth and angular sampling methods
The depth-sampling-based light field acquisition method only requires an ordinary camera to acquire images in different focus planes to achieve light field computational imaging, which is very different from the angular-sampling-based method in terms of the acquisition model and the equipment used. However, the depth-sampling-based method requires multiple consecutive shots of the target scene, so it is best suited to static or slow-moving scenes. In this respect it differs from acquisition with a plenoptic camera, which collects the light field in one shot. To better verify the experimental results of our proposed method, the popular Lytro Illum V2 plenoptic camera is used to collect light field data from the same experimental scene, and the light field images acquired by the two cameras are compared in terms of their sharpness and acquisition effects.
The Lytro Illum V2 plenoptic camera produces approximately 40 million effective pixels. Its sensor has a resolution of 7728 × 5368, its microlens array contains 541 × 434 microlenses, its angular resolution is 15 × 15, and the number of pixels behind each microlens is 225. In this study, the two cameras, the Lytro Illum V2 and the Canon 5D Mark III, are used to perform angular and depth sampling, respectively, of the same experimental scene. To reduce the effects of other camera parameters on the experimental results, the focal lengths of both cameras are set to 105 mm. The sampling results of the two methods are shown in Fig. 8.
The light field image collected by the proposed method is a subaperture image. Therefore, it is necessary to decode the original light field image captured by the Lytro Illum V2 camera into subaperture images to compare the collection effects. The angular resolution of the camera is 15 × 15, which means that a total of 15 × 15 subaperture images can be decoded. The decoded subaperture images are used for effect comparison. The subaperture images of the light field collected by the two methods are shown in Fig. 9. The small image on the left of each image is a partially enlarged view that clearly shows the movement of the subaperture image viewing angle. For the light field subaperture images collected by the two methods, the angular resolution of the light field image collected by depth sampling can reach that of the light field image collected by angular sampling. The spatial resolution of the light field image collected by depth sampling is 1920 × 1280, which is the same as that of the original depth sample images. The spatial resolution of the light field image collected by angular sampling is 625 × 433, which is much smaller than the sensor resolution of 7728 × 5368.
To better verify the effects of the proposed method from a quantitative perspective, the three sharpness evaluation functions mentioned above are again used to evaluate the sharpness of the light field images collected by angular and depth sampling methods. The results are shown in Table 2.
It can be seen from the table that the light field subaperture image collected by depth sampling is not as clear as that collected by angular sampling. However, the difference between the two sampling methods is very small. The reason for the less clear image is that the light field depth sampling algorithm used for the proposed method directly adds the data in the depth samples, which causes the acquired subaperture image to become blurred.
From the experiment in Sect. 3.1, we find that by setting different (u,v) values in the depth sampling algorithm, we can obtain subaperture images (x,y) with different viewing angles. To explore the effects of different u values on the collected subaperture images, we employ four different u values in our experiments; the experimental results are shown in Fig. 10.
The three sharpness evaluation functions are also used to evaluate the sharpness of the light field images collected after setting different u values. The results are shown in Table 3. Figure 10 and Table 3 show that the larger the u value, the more blurred the acquired light field subaperture image. Equation (9) shows that as u increases, the amplitude of the image movement becomes larger, which makes the image more blurred when the image movement is added. A similar effect is seen in the subaperture image collected by the Lytro Illum V2 camera. The subaperture image quality is highest at the center of the microlens and deteriorates with increasing distance from the center.

Conclusions
We propose a new light field acquisition method that performs depth sampling of the target scene by acquiring images focused at different depths and then recovers the 4D light field from the depth sampling data. The clarity of the light field subaperture images collected by this method and by the traditional angular sampling method is evaluated. The experimental results show that the proposed depth sampling method does not require special hardware and that the spatial resolution of the collected light field images can reach that of the sensor, whereas the spatial resolution of light field images collected by the traditional angular sampling method is much lower than that of the sensor. The clarity of a light field image collected by the proposed method is very close to that of a light field image collected by the angular sampling method. This method provides a simpler approach to the acquisition of light field images in computational imaging.