Image Resizing in Saliency Histogram Domain

The saliency map of an image provides useful information regarding the region of interest. In this paper, the normalized saliency map, called the saliency histogram, is used as a valid probability density function, and an efficient algorithm is proposed for image resizing in the saliency histogram domain. Moreover, the saliency-histogram-based image resizing algorithm is extended to video applications. Experimental results demonstrate the effectiveness of the saliency histogram in preserving salient objects in resized images and videos. The proposed approach is suitable for content-driven surveillance with multiresolution image sensor systems.


Introduction
Handheld devices such as smartphones and digital pads are now in widespread use; however, they raise the inevitable problem of adjusting image and video sizes. (1) Conventional methods using simple cropping and linear scaling are likely to degrade the region of interest (ROI). (2) To resize images/videos more effectively, it is desirable to take the visual content into account. (3) Modern resizing techniques are generally grouped into two categories: the discrete approach and the continuous approach. The content-aware seam carving (SC) algorithm is a common representative of the former. (4) As the wavelet transform provides an efficient way to represent images at multiple scales, (5) Conger et al. proposed the seamlet transform with wavelet filters for multiseam carving. (6) In Ref. 7, we proposed a scale-recursive algorithm for fast image resizing in the wavelet domain. The continuous approach warps an image by quad-shaped mesh deformation. Gal et al. reshaped images to preserve the proportions of prominent objects while stretching or squeezing homogeneous regions. (8) Wolf et al. analyzed the importance of pixels on the basis of local saliency, object detection, and motion estimation for content-driven video retargeting. (9) Chen et al. formulated image warping as a convex quadratic problem and solved it via quadratic programming. (10) Wang et al. proposed the optimized scale and stretch (OSS) algorithm, which allows uniform scaling in prominent regions while hiding nonuniform stretching distortions in homogeneous backgrounds. (11) The saliency map (SM) of an image provides valuable information on the presence of prominent objects. (12) In Refs. 13-15, we proposed several algorithms based on the normalized SM, termed the saliency histogram (SH), for image resizing. In this work, we extend the SH to video resizing.

Materials and Methods
In contrast to the discrete approach to image resizing, the continuous approach tends to avoid annoying jagged artifacts. (2) Image warping uses a mesh $M = (V, E, Q)$ composed of vertices $V$, edges $E$, and quads $Q$; $V$ and $E$ form grid lines that partition an image into the quads $Q$. Wang et al. proposed the OSS algorithm to map the input mesh $M$ onto a deformed mesh $M'$ with minimum distortion energy. (11) For each quad $q$ in $Q$, the OSS algorithm allows a scaled mapping of each vertex $v$ of $q$,
$$ v' = s_q v + t, \tag{1} $$
where $s_q$ is the scale factor of $q$ and $t$ is a constant translation vector. The distortion energy $D(q)$ of $q$ is defined as
$$ D(q) = \sum_{(v_i, v_j) \in E(q)} \left\| (v'_i - v'_j) - s_q (v_i - v_j) \right\|^2, \tag{2} $$
where $E(q)$ is the set of edges of $q$; because only vertex differences appear, $t$ is eliminated in the computation of $D(q)$. By weighting each quad with its saliency $w(q)$ obtained from the SM of the image, the OSS algorithm minimizes the total distortion
$$ D = \sum_{q \in Q} w(q)\, D(q). \tag{3} $$
This is a quadratic problem with constraints on boundary conditions, grid line bending, and foldover prevention.
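To make the distortion energy concrete, the following Python sketch evaluates $D(q)$ and the saliency-weighted total $D$ for a given input mesh and a candidate deformed mesh. It assumes each quad is passed as its four vertices in order around the quad, and it uses the least-squares optimal scale for a fixed deformed quad, $s_q = \sum e \cdot e' / \sum e \cdot e$ (from setting $\partial D(q)/\partial s_q = 0$, with $e = v_i - v_j$ and $e' = v'_i - v'_j$). Only the weighted quad term is covered; the boundary, line-bending, and foldover constraints and the quadratic solver itself are omitted, and all function names are illustrative rather than taken from any OSS implementation.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Quad = Sequence[Point]  # four vertices listed in order around the quad

# Edges of a quad as index pairs into its vertex list.
QUAD_EDGES = [(0, 1), (1, 2), (2, 3), (3, 0)]


def _edge(quad: Quad, i: int, j: int) -> Point:
    """Edge vector v_i - v_j."""
    return (quad[i][0] - quad[j][0], quad[i][1] - quad[j][1])


def optimal_scale(src: Quad, dst: Quad) -> float:
    """Least-squares s_q minimizing sum ||(v'_i - v'_j) - s_q (v_i - v_j)||^2."""
    num = den = 0.0
    for i, j in QUAD_EDGES:
        ex, ey = _edge(src, i, j)
        fx, fy = _edge(dst, i, j)
        num += ex * fx + ey * fy
        den += ex * ex + ey * ey
    return num / den


def quad_distortion(src: Quad, dst: Quad) -> float:
    """D(q); the translation t cancels because only edge vectors are compared."""
    s = optimal_scale(src, dst)
    d = 0.0
    for i, j in QUAD_EDGES:
        ex, ey = _edge(src, i, j)
        fx, fy = _edge(dst, i, j)
        d += (fx - s * ex) ** 2 + (fy - s * ey) ** 2
    return d


def total_distortion(src_quads: List[Quad], dst_quads: List[Quad],
                     weights: List[float]) -> float:
    """Saliency-weighted sum of w(q) * D(q) over all quads."""
    return sum(w * quad_distortion(q, qd)
               for q, qd, w in zip(src_quads, dst_quads, weights))
```

For the unit square stretched to twice its width, the sketch gives $s_q = 1.5$ and $D(q) = 1.0$, whereas a uniform scale plus translation gives $D(q) = 0$. This is why, with large weights $w(q)$ on salient quads, minimizing $D$ favors uniform scaling there and pushes nonuniform stretching into low-saliency quads.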
Sensors and Materials, Vol. 29, No. 11 (2017) 1485
The normalized SM can be used to describe the probability of a target's presence in the image domain and is therefore termed the SH. For 1D shrinking, the marginal SH (MSH) is defined as
$$ p_x(x) = \sum_{y} p(x, y), \tag{4} $$
where $p_x(x)$ denotes the MSH along the x-axis and $p(x, y)$ is the SH of the image. Figure 1(a) shows a test image, and its SM is given in Fig. 1(b). If the image is evenly partitioned into $L$ vertical strips, the MSH in Fig. 1(c) shows that most of the important content lies in the left and right regions. To reduce the difference in saliency between strips, a nonuniform partition is needed. On the basis of Eq. (4), the following transformation is used for adaptive nonuniform partition:
$$ c_j = \min\Big\{ c : \sum_{x=1}^{c} p_x(x) \ge \frac{j}{L} \Big\}, \tag{5} $$
where $c_j$ is the $j$th strip represented by its upper-right vertex, $j = 1, 2, \ldots, L$, and the boundary condition is $c_L = N$, with $N$ the image width. As shown in Fig. 1(d), the MSH of the nonuniformly partitioned strips is much smoother than that in Fig. 1(c). Figure 1(e) shows the boundaries of the nonuniformly partitioned strips overlaid on the SM. For horizontal shrinking, the output image is evenly partitioned into $L$ vertical strips with an equalized MSH, and each output strip is obtained from the corresponding nonuniform input strip by simple interpolation. Figures 1(f) and 1(g) show the results of reducing the width of the image in Fig. 1(a) using the MSH-based algorithm and linear scaling, respectively. As expected, this comparison demonstrates the potential of the SH. 2D shrinking can be realized as the tensor product of two 1D shrinking operations, i.e., width reduction followed by height reduction, or vice versa. Details of the saliency histogram equalization (SHE) and hybrid SHE-SC algorithms can be found in Refs. 13-15.
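The 1D width-reduction pipeline described above can be sketched in Python as follows, under stated assumptions: the saliency map is a list of rows indexed `sm[y][x]`, each strip boundary is taken as the smallest column count whose cumulative MSH reaches $j/L$ (our reading of the equalization step), and each output column is mapped back to the input piecewise linearly, strip by strip, then sampled with linear interpolation at pixel centers. The one-column minimum strip width is our own safeguard, and all function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]  # indexed m[y][x]


def saliency_histogram(sm: Matrix) -> Matrix:
    """Normalize a nonnegative saliency map so that it sums to 1 (the SH)."""
    total = sum(sum(row) for row in sm)
    if total <= 0:
        raise ValueError("saliency map must have positive total mass")
    return [[v / total for v in row] for row in sm]


def marginal_sh_x(sh: Matrix) -> List[float]:
    """MSH along x: p_x(x) = sum over y of p(x, y)."""
    return [sum(row[x] for row in sh) for x in range(len(sh[0]))]


def equalized_boundaries(px: List[float], num_strips: int) -> List[int]:
    """Right boundaries c_1..c_L (column counts), each strip holding about 1/L of the MSH."""
    n = len(px)
    if not 1 <= num_strips <= n:
        raise ValueError("need 1 <= L <= N")
    cdf, acc = [], 0.0
    for v in px:
        acc += v
        cdf.append(acc)
    bounds, prev = [], 0
    for j in range(1, num_strips):
        target = j / num_strips
        # Smallest column count c whose cumulative saliency reaches j/L.
        c = next((x + 1 for x in range(n) if cdf[x] >= target), n)
        # Safeguard (our assumption): every strip keeps at least one column.
        c = min(max(c, prev + 1), n - (num_strips - j))
        bounds.append(c)
        prev = c
    bounds.append(n)  # boundary condition c_L = N
    return bounds


def output_to_input_x(u: float, bounds: List[int], out_width: int) -> float:
    """Map an output x-coordinate to the input, uniform output strip -> nonuniform input strip."""
    num_strips = len(bounds)
    strip_w = out_width / num_strips
    j = min(int(u // strip_w), num_strips - 1)
    left = 0 if j == 0 else bounds[j - 1]
    frac = (u - j * strip_w) / strip_w
    return left + frac * (bounds[j] - left)


def shrink_row(row: List[float], bounds: List[int], out_width: int) -> List[float]:
    """Resample one image row to out_width using linear interpolation."""
    n = len(row)
    out = []
    for k in range(out_width):
        # Sample at the output pixel center; input pixel i has its center at i + 0.5.
        x = output_to_input_x(k + 0.5, bounds, out_width) - 0.5
        x = min(max(x, 0.0), n - 1.0)
        i0 = int(x)
        i1 = min(i0 + 1, n - 1)
        t = x - i0
        out.append((1 - t) * row[i0] + t * row[i1])
    return out
```

For a map whose saliency lies in the two leftmost and two rightmost of eight columns, $L = 4$ yields boundaries `[1, 2, 7, 8]`, so the low-saliency middle strip spans five input columns but only one output column. Height reduction can reuse the same functions on the transposed image, giving the tensor-product 2D shrinking described above.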
Visual saliency provides useful information on the likelihood of an ROI, which benefits the development of content-driven image/video resizing systems. In general, the nonuniform mesh constructed by SHE is composed of quads of different sizes, with high-saliency quads smaller than low-saliency quads. Both the SHE algorithm and the hybrid SHE-SC algorithm can be applied to the first video frame to construct the initial mesh. For computational simplicity, a fast smoothing scheme is proposed to stabilize the construction of successive meshes with temporal coherence for the rest of the video. Specifically, let $M_{k-1}$ and $M_k$ be the meshes obtained from the SHs of two consecutive video frames, $F_{k-1}$ and $F_k$, at times $k - 1$ and $k$, respectively. Each vertex $v_{i,j,k}$ at grid position $(i, j)$ in the mesh $M_k$ is linearly smoothed as
$$ v_{i,j,k} \leftarrow \alpha\, v_{i,j,k-1} + (1 - \alpha)\, v_{i,j,k}, \tag{6} $$
where $v_{i,j,k-1}$ is the corresponding vertex in the previous mesh $M_{k-1}$, and $0 < \alpha < 1$ controls the degree of temporal coherence between the frames $F_{k-1}$ and $F_k$. Because a video is usually composed of different scenes, the proposed video resizing algorithm first divides the input video into scenes using a scene change detection algorithm (16) and then resizes each scene independently, since temporal coherence need not be maintained across scenes. Figure 2 shows an example of the detection result for a segment of the Big Buck Bunny sequence, in which sharp peaks indicate scene changes.
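A minimal Python sketch of the temporal smoothing step follows. It assumes that the linear smoothing weights the previous vertex by $\alpha$ and the current one by $1 - \alpha$, that the "previous mesh" at frame $k$ is the already smoothed one (an exponential moving average), and that smoothing restarts at each detected scene boundary so that the first frame of every scene keeps its own SHE mesh. Meshes are nested lists of $(x, y)$ vertex positions; the function names and the `scene_starts` argument are hypothetical.

```python
from typing import List, Tuple

Vertex = Tuple[float, float]
Mesh = List[List[Vertex]]  # mesh[i][j] is the (x, y) position of vertex v_{i,j}


def smooth_mesh(prev: Mesh, curr: Mesh, alpha: float) -> Mesh:
    """Blend each vertex: alpha * v_{i,j,k-1} + (1 - alpha) * v_{i,j,k}."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    return [
        [(alpha * p[0] + (1 - alpha) * c[0], alpha * p[1] + (1 - alpha) * c[1])
         for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


def smooth_sequence(meshes: List[Mesh], scene_starts: List[int],
                    alpha: float) -> List[Mesh]:
    """Smooth meshes frame by frame, restarting at every scene boundary."""
    starts = set(scene_starts) | {0}
    out: List[Mesh] = []
    for k, mesh in enumerate(meshes):
        if k in starts:
            out.append(mesh)  # first frame of a scene: no temporal smoothing
        else:
            out.append(smooth_mesh(out[-1], mesh, alpha))
    return out
```

With $\alpha = 0.5$, a vertex that jumps from $x = 0$ to $x = 4$ moves to 2 and then 3 over the next two frames instead of jumping at once, while a frame flagged as a scene start passes through unchanged.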

Experimental Results
The SH-based image resizing and extended video resizing algorithms were implemented in MATLAB. The discriminative regional feature integration (DRFI) SM (17) was used to produce the SH of each image. Figure 3 compares the SH-based image resizing algorithm with other state-of-the-art algorithms. (9,10) In this example, the width of the original image was reduced. The results show that the proposed algorithm is preferable in terms of preserving salient objects.
The extended SH-based video resizing algorithm was compared with other existing algorithms. (18,19) The discrete cosine transform (DCT)-based pairwise comparison method (16) was used for scene change detection. Different scenes were resized separately, as temporal coherence is not required across disjoint scenes. Figure 4 shows a visual comparison with the warp method. (18) As can be seen, curving artifacts appear at the right border of the frame resized by the warp method, (18) whereas they are absent in the frame resized by the proposed algorithm. Figure 5 presents a visual comparison with the improved SC. (19) Note that most of the ROI in the middle area is preserved by the improved SC (19) at the cost of the border backgrounds. In contrast, the SH-based video resizing algorithm preserves the border regions with acceptable deformation.
[Figure 2 (partial caption): scene change detection result (16) with sharp peaks indicating scene changes.]
[Figure 3 (partial caption): … (9); (c) OSS (11); (d) proposed algorithm.]
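The sketch below shows, in Python, a generic block-wise DCT pairwise comparison between consecutive grayscale frames. It is not necessarily the exact method of Ref. 16: the block size `bs`, the mean-absolute-coefficient-difference criterion, and both thresholds (`coeff_thresh`, `frac_thresh`) are illustrative assumptions. A block counts as changed when its DCT coefficients differ from those of the co-located block in the previous frame by more than `coeff_thresh` on average, and a scene change is declared when the fraction of changed blocks exceeds `frac_thresh`.

```python
import math
from typing import List

Frame = List[List[float]]  # grayscale frame indexed frame[y][x]


def dct_1d(v: List[float]) -> List[float]:
    """Orthonormal DCT-II of a short vector (direct O(n^2) evaluation)."""
    n = len(v)
    out = []
    for k in range(n):
        s = sum(v[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        out.append(s * (math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)))
    return out


def dct_2d(block: Frame) -> Frame:
    """Separable 2D DCT-II: transform the rows, then the columns."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d([r[x] for r in rows]) for x in range(len(rows[0]))]
    return [[cols[x][y] for x in range(len(cols))] for y in range(len(rows))]


def block_coeffs(frame: Frame, bs: int) -> List[List[float]]:
    """Flattened DCT coefficients of every full bs x bs block."""
    coeffs = []
    for by in range(0, len(frame) - bs + 1, bs):
        for bx in range(0, len(frame[0]) - bs + 1, bs):
            block = [row[bx:bx + bs] for row in frame[by:by + bs]]
            coeffs.append([c for r in dct_2d(block) for c in r])
    return coeffs


def changed_fraction(f1: Frame, f2: Frame, bs: int = 8,
                     coeff_thresh: float = 10.0) -> float:
    """Fraction of co-located blocks whose mean |coefficient difference| exceeds coeff_thresh."""
    c1, c2 = block_coeffs(f1, bs), block_coeffs(f2, bs)
    if not c1:
        raise ValueError("frame is smaller than one block")
    changed = sum(
        1 for a, b in zip(c1, c2)
        if sum(abs(x - y) for x, y in zip(a, b)) / len(a) > coeff_thresh
    )
    return changed / len(c1)


def detect_scene_changes(frames: List[Frame], bs: int = 8,
                         coeff_thresh: float = 10.0,
                         frac_thresh: float = 0.5) -> List[int]:
    """Indices k at which frame k starts a new scene relative to frame k - 1."""
    return [k for k in range(1, len(frames))
            if changed_fraction(frames[k - 1], frames[k], bs, coeff_thresh) > frac_thresh]
```

Each returned index marks the first frame of a new scene, so the frames between consecutive indices can be resized independently. A real implementation would typically use a fast DCT and tune the thresholds on the target material.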
In most cases, the SH-based algorithm is superior to the compared algorithms. (18,19) Because the warping depends mainly on the estimated SH, overwarping may occur in regions with low saliency. One example is given in Fig. 6. As observed from the SH [Fig. 6(b)] estimated by DRFI (17) for a video frame [Fig. 6(a)], the saliency values near the left border are small, and consequently, the left border of the resized frame is overwarped. One promising way to solve this problem is to supplement the SH with additional features, e.g., those proposed by Judd et al. (20) Figure 6(e) shows the improved result obtained with the enhanced SH shown in Fig. 6(d).
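How the additional features are combined with the SH is not specified here. As one hypothetical option, the Python sketch below blends the DRFI-based SH with an auxiliary feature map (for instance, one computed from features such as those of Judd et al.) through a convex combination with weight `beta`, then renormalizes so that the result remains a valid probability density function. Both `beta` and the blending rule are assumptions made for illustration only.

```python
from typing import List

Matrix = List[List[float]]  # indexed m[y][x]


def normalize(m: Matrix) -> Matrix:
    """Scale a nonnegative map so that it sums to 1."""
    total = sum(sum(r) for r in m)
    if total <= 0:
        raise ValueError("map must have positive total mass")
    return [[v / total for v in r] for r in m]


def enhance_sh(sh: Matrix, feature_map: Matrix, beta: float) -> Matrix:
    """Hypothetical enhancement: (1 - beta) * SH + beta * normalized features, renormalized."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    f = normalize(feature_map)
    mixed = [[(1 - beta) * a + beta * b for a, b in zip(ra, rb)]
             for ra, rb in zip(sh, f)]
    return normalize(mixed)  # guards against rounding drift
```

In a one-row example where the left column has zero saliency, blending with a feature map at `beta = 0.5` raises that column's share of the mass from 0 to 0.125. Giving low-saliency border regions a nonzero share is the kind of change that should counteract the overwarping described above.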

Conclusion
Saliency information plays an important role in content-aware image/video resizing. The normalized SM, called the SH, can be used as a valid probability density function for image warping with nonuniform meshes. In this study, images and video frames are partitioned into nonuniform quad meshes in the SH domain. The content of salient objects is well preserved in the resized images and video frames after mapping the input quads onto the corresponding output quads. We expect the proposed content-driven approach to be suitable for future intelligent surveillance with multiresolution image sensors. (21) Specifically, salient objects can be sampled in a high-resolution mode with more pixels, while backgrounds are sampled with fewer pixels.
[Figure 6 (partial caption): … (20); (e) improved result with the aid of (d).]