Hostname: page-component-cd9895bd7-jn8rn Total loading time: 0 Render date: 2024-12-24T02:24:10.222Z Has data issue: false hasContentIssue false

Rate-dependent seam carving and its application to content-aware image coding

Published online by Cambridge University Press:  27 February 2013

Yuichi Tanaka*
Affiliation:
Graduate School of Engineering, Utsunomiya University, Utsunomiya, Tochigi 321-8585, Japan Graduate School of BASE, Tokyo University of Agriculture and Technology, Koganei, Tokyo 184-8588, Japan
Taichi Yoshida
Affiliation:
Department of Electronics and Electrical Engineering, Keio University, Yokohama, Kanagawa 223-8522, Japan
Madoka Hasegawa
Affiliation:
Graduate School of Engineering, Utsunomiya University, Utsunomiya, Tochigi 321-8585, Japan
Shigeo Kato
Affiliation:
Graduate School of Engineering, Utsunomiya University, Utsunomiya, Tochigi 321-8585, Japan
Masaaki Ikehara
Affiliation:
Department of Electronics and Electrical Engineering, Keio University, Yokohama, Kanagawa 223-8522, Japan
*
Y. Tanaka email: [email protected]

Abstract

Content-aware image resizing (CAIR) is desired because it preserves prominent regions in a resized image. However, CAIR requires high computational complexity to perform in mobile devices, though it is desired for these displays. Moreover, transmitting the side information for CAIR from the encoder is a problem since it usually requires high bitrates compared with those for image data. In this paper, we present a rate-dependent CAIR method that produce various retargeting results based on the bitrates for side information. Furthermore, we apply the proposed technique to wavelet-based image coding. Our proposed content-aware image coding method provides good performances for both CAIR and image coding.

Type
Original Article
Creative Commons
Creative Common License - CCCreative Common License - BYCreative Common License - NCCreative Common License - SA
The online version of this article is published within an Open Access environment subject to the conditions of the Creative Commons Attribution-NonCommercial-ShareAlike license . The written permission of Cambridge University Press must be obtained for commercial re-use.
Copyright
Copyright © The Authors, 2013.

I. Introduction

Recently, content-aware image resizing (CAIR) techniques, also known as image retargeting, have been attracting the attention of computer vision researchers [Reference Avidan and Shamir1Reference Domingues, Alahi and Vandergheynst6]. These techniques are regarded as sophisticated image resizing methods. To resize an image into a different aspect ratio and/or one with complex structures, CAIR gives better results than traditional scaling and cropping. In image processing terms, CAIR is a non-uniform resampling of images for extracting region-of-interest (ROI) automatically. However, it generally requires high computational complexity to analyze local and/or global image structures such as objects, backgrounds, and saliency. Therefore, CAIR is a very challenging task for the built-in software in portable devices, such as cell phones and PDAs.

Seam carving (SC) [Reference Avidan and Shamir1, Reference Rubinstein, Shamir and Avidan2] is one of the pioneering techniques for CAIR. It determines a seam, defined as an eight-connected path of pixels from the top (left) to the bottom (right) of the original image shown in Fig. 1(b). The width (height) of the image is reduced by one pixel after pruning a seam. The final retargeted image is produced with iterative pruning of seams until the retargeted image reaches the desired width (height). An example of the retargeted image is shown in Fig. 1(d). Combinations of SC and other resizing methods (scaling and/or cropping) have been reported in [Reference Dong, Zhou, Paul and Zhang7, Reference Rubinstein, Shamir and Avidan8].

Fig. 1. Visual comparisons of the CAIR methods (Mt. Evans image). Black colored regions represent pixels to be pruned. The original size is 1024 × 768 pixels and it is resized to 1024 × 512. From left to right: (a) original image, (b) seam paths calculated by SC, (c) seam paths calculated by RD-SC, (d) CAIR result by SC and (e) CAIR result by RD-SC.

There are a few approaches of content-aware image coding (CAIC) with SC-like methods [Reference Nguyen, Yang and Cai9Reference Tanaka, Hasegawa and Kato11]. Generally, they first perform CAIR for the original image and then the retargeted imageFootnote 1 is compressed by some image coding standards. The side information, which would consist of seam path information (SPI) and pixel values of seams, will be stored and/or sent with the compressed image data. At the decoder side, the compressed core image is first decoded. If one needs the reconstructed image with arbitrary resolutions, the side information is used to reconstruct regions outside the core image (non-ROI).

For these methods, the amount of SPI significantly affects the entire bitrates. Although the conventional method [Reference Nguyen, Yang and Cai9] uses arithmetic coding to encode SPI, it usually requires high bitrates. The methods in [Reference Tanaka, Hasegawa and Kato10, Reference Tanaka, Hasegawa and Kato11] try to reduce the bitrates to employ “pillar”-style seams. Unfortunately, the structures of their seam paths are restricted.

In this paper, we present a novel rate-dependent SC (RD-SC). Intuitively, seam paths by RD-SC are the lossy version of the original seam paths and it is realized by a seam path approximation with piecewise predefined functions (Fig. 1(c)) while maintaining CAIR performance (Fig. 1(e)). Furthermore, we integrate RD-SC into image coding to realize an effective CAIC. A new parent–children relationship for a well-known wavelet-based image coding algorithm, set partitioning in hierarchical trees (SPIHT) [Reference Said and Pearlman12], is proposed. In the experimental results, its image coding performances at full resolution are comparable to those of the regular SPIHT while preserving good CAIR qualities for retargeted images.

This paper is organized as follows: Section II gives a brief reviews of related methods of RD-SC. The following two sections present the main contributions of this paper. Section III discusses RD-SC using a piecewise approximation. A wavelet-based CAIC with the modified SPIHT algorithm is presented in Section IV. The experimental results are shown in Section V, and finally, Section VI summarizes the paper.

Notation

Table 1 lists the symbols used in this paper for an image I. To simplify our description, we use the MATLAB-based notation for the sets of rows or columns in a matrix. Moreover, for simplicity, the ith element of a vector x is represented as x(i) and a subvector of x between L 0 ≤ i ≤ L 1 is defined as x(L 0:L 1).

Table 1. Notation with Respect to an Image I.

In addition, this paper considers a vertical seam for the discussion about RD-SC unless specified. A vertical seam is defined as an eight-connected path of pixels of one-pixel width from the top to the bottom of an image. The process can be easily extended to a horizontal seam.

II Review

A) Seam carving

SC [Reference Avidan and Shamir1, Reference Rubinstein, Shamir and Avidan2] prunes seams classified as insignificant in an image. In the original SC technique, a dynamic programming approach is used to decide seams which can be pruned. A cumulative cost M(i, j) for each seam is calculated by the following equation:

(1)$$M\lpar i\comma \; j\rpar = \min \left(\matrix{M\lpar i-1\comma \; j-1\rpar + C_L\lpar i\comma \; j\rpar \cr M\lpar i-1\comma \; j\rpar + C_U\lpar i\comma \; j\rpar \cr M\lpar i-1\comma \; j+1\rpar + C_R\lpar i\comma \; j\rpar } \right)\comma \; \eqno\lpar 1\rpar$$

where

$$\eqalign{C_L\lpar i\comma \; j\rpar &= \vert I\lpar i\comma \; j + 1\rpar - I\lpar i\comma \; j - 1\rpar \vert \cr &\quad + \vert I\lpar i-1\comma \; j\rpar - I\lpar i\comma \; j - 1\rpar \vert \comma \; \cr C_U\lpar i\comma \; j\rpar &= \vert I\lpar i\comma \; j + 1\rpar - I\lpar i\comma \; j - 1\rpar \vert \comma \; \cr C_R\lpar i\comma \; j\rpar &= \vert I\lpar i\comma \; j + 1\rpar - I\lpar i\comma \; j - 1\rpar \vert \cr &\quad + \vert I\lpar i-1\comma \; j\rpar - I\lpar i\comma \; j + 1\rpar \vert .}$$

The cost function is called forward energy [Reference Rubinstein, Shamir and Avidan2]. Finally, the seam with the smallest M(H I, j) is pruned. With this criterion, a retargeted image is forced to avoid the large transition of pixel values between the left and right pixels of the pruned seam. The process is iterated until the desired image width is reached. For other details, please refer to [Reference Avidan and Shamir1,Reference Rubinstein, Shamir and Avidan2].

B) CAIC

So far, several methods were reported in CAIC [Reference Nguyen, Yang and Cai9Reference Tanaka, Hasegawa and Kato11, Reference Décombas, Capman, Renan, Dufaux and Pesquet-Popescu13Reference Deng, Lin and Cai15]. In this subsection, we briefly review the related works.

Image coding with selective data pruning (SDP) [Reference Vō, Solé, Yin, Gomila and Nguyen16] is classified as one of the “decimation-then-compression” image coding approach. The SDP method has a very simple architecture for reducing an image size since it is based on line-based downsampling. That is, rows and/or columns in an image are simply pruned and then the pruned image is interpolated on the synthesis side. The pruned lines are located predominantly across low-frequency regions in the image and thus the interpolated image has few artifacts if a small number of lines are pruned. In the comparison of image coding without data pruning, SDP-based image compression improves the peak signal-to-noise ratio (PSNR) at low bitrates. However, when many lines are pruned, the pruned lines most likely cut across salient regions. As a result, the interpolated image could have artifacts. A similar method to SDP was also presented in [Reference Wang and Urahama17].

Incorporating SC into image coding was investigated by Nguyen et al. [Reference Nguyen, Yang and Cai9]. In their method, SPI and the pixel values of seams as well as the core image are encoded. The method can reconstruct the original image losslessly if the core image, the SPI and the pixel values of seams are losslessly encoded and transmitted. However, it demands a large amount of bitrates for SPI, and the overhead bitrate significantly affects the entire image coding performance.

The authors proposed an approach for CAIC called image concentration and dilution [Reference Tanaka, Hasegawa and Kato10,Reference Tanaka, Hasegawa and Kato11], denoted as CD-based image coding hereafter. In the method, first, insignificant regions in the original image are discarded by the SC-like method and then the core image is compressed by some image coding standard. At the same time, the discarded pixel positions are stored. On the decoder side, the core image and the side information are synthesized to reconstruct the image with an arbitrary resolution. The discarded pixels, i.e., non-ROI pixels, are interpolated from the decoded core image. Unfortunately, reconstructed images after CD-based image coding sometimes have the interpolation artifacts at non-ROI.

The SC-based content-aware video coding was proposed in [Reference Décombas, Capman, Renan, Dufaux and Pesquet-Popescu13]. It reduces side information bitrate, i.e., for SPI, by employing key points of seam paths and a seam merging process. In low bitrate video coding, the method succeeded to reduce total bitrates for the equivalent video quality. However, it is also non-perfect-reconstruction video coding as well as SDP- and CD-based image coding. That is, seams should be interpolated from reconstructed core video frames.

Frequency-domain SC and its image coding application were reported in [Reference Deng, Lin and Cai14, Reference Deng, Lin and Cai15]. It calculates optimal seam paths on the LL subband of DWT. The SPI is then used to determine the scanning order of SPIHT. The seams are scanned and transmitted in the energy descending order (high-energy seams are firstly encoded). In full resolution, seams in the LL subband result block-based seams and therefore the decomposition level should not be too large (practically, the decomposition level will be less than or equal to 3 [Reference Deng, Lin and Cai15]). Therefore, its image coding performance at full resolution is substantially inferior to the original SPIHT.Footnote 2

Briefly speaking, the following two main problems should be resolved for CAIC:

  1. (1) Reducing side information for CAIR while preserving CAIR performance.

  2. (2) Maintaining image coding performance at full resolution.

For both encoding and decoding, the side information, especially SPI, is usually required to obtain a retargeted image with an arbitrary resolution. However, the original SC used in [Reference Nguyen, Yang and Cai9] is not designed to encode SPI effectively. Therefore, the amount of side information is quite large if we do not consider designing a SPI encoding method.

Another problem is the image coding performance. Although the core image can be compressed effectively by any existing standards, non-ROI is difficult to compress with a good quality since the positions of those pixels are not structured and highly depend on images.

In the following sections, we introduce the RD-SC and the modified tree structure of SPIHT, which try to resolve these problems.

III. RD-SC with Piecewise Approximation

In this section, we present details of RD-SC. The seam paths of RD-SC are constructed from those of the original SC based on a piecewise approximation. The RD-SC structure introduced in this section is illustrated in Fig. 2 and it is briefly summarized as follows:

  1. (1) Calculate a seam by the original SC. If there exists already pruned seams, a modified cost map calculated from previous pruned seam paths is used (Section III-F).

  2. (2) Approximate the seam with the consideration of a rate-distortion performance (Section III-A–III-E).

  3. (3) If possible, prune multiple seams (Section III-G).

  4. (4) Store SPI and return to (1) for the next seam approximation.

Fig. 2. Structure of RD-SC.

The rest of this section describes detailed algorithms.

A) Piecewise function

Let s be a seam path. In general, it can be defined as follows:

(2)$${\bi s} = \lcub j \in {\open N}\colon j = f\lpar i\rpar \comma \; 0\leq i \leq H_I-1\rcub \comma \; \eqno\lpar 2\rpar$$

where f(i) is a function to determine a seam path. A specific coordinate of the seam path is represented as (i, s(i)). According to the purpose of SC [Reference Avidan and Shamir1, Reference Rubinstein, Shamir and Avidan2], the following conditions are usually imposed on f(i):

  • Connectivity: |f(i) − f(i + 1)| ≤ 1.

  • Rectangular restriction: 0 ≤ f(i) ≤ W I − 1.

The first condition is the requirement that a seam path should be a set of eight-connected pixels. The second one is imposed to preserve a rectangular shape of the resized image.

Usually s has a complex structure since SC calculates s pixel-by-pixel and thus, it is difficult to compress efficiently. In this paper, we propose an approximation method of s with piecewise predefined functions. We define an approximated seam path a as follows:

(3)$${\bi a}_k = \lcub j \in {\open N}\colon j = f_{k}\lpar i_{k}\rpar \comma \; 0 \leq i_{k} \leq L\lpar k\rpar -1\rcub \comma \; \eqno\lpar 3\rpar$$

where ak (k = 0, 1, …) is the kth portion of a, L(k) defines the interval for ak and f k(i k) is an approximation function. ak is calculated within the interval $\left[\sum_{p=0}^{k-1} L\lpar p\rpar \comma \; \sum_{p=0}^{k} L\lpar p\rpar -1\right]$. The relationship between s and a is illustrated in Fig. 3.

Fig. 3. Relationship between s and a. Blue solid and red dashed lines represent s and a, respectively.

B) Candidates of f k(i k)

Any functions can be candidates of f k(i k) as long as they follow the conditions mentioned in the previous subsection. We consider to reduce the bitrate for SPI by choosing one function from several templates. A template set of f k(i k), shown as f k, g(i k), is heuristically defined as follows:

(4)$$f_{k\comma 0} \lpar i_{k}\rpar = C_{k}\comma \; \eqno\lpar 4\rpar$$
(5)$$f_{k\comma \pm 1}\lpar i_{k}\rpar = \pm i_{k} + C_{k}\comma \; \eqno\lpar 5\rpar$$
(6)$$f_{k\comma \pm 2}\lpar i_{k}\rpar = \pm \left[{1\over 3} i_{k} \right]+ C_{k}\comma \; \eqno\lpar 6\rpar$$
(7)$$f_{k\comma \pm 3}\lpar i_{k}\rpar = \pm \left[{i_{k}^{2} \over 2L\lpar k\rpar } \right]+ C_{k}\comma \; \eqno\lpar 7\rpar$$
(8)$$f_{k\comma \pm 4}\lpar i_{k}\rpar = \pm \min\left(\left[{1 \over 2}\sqrt{L\lpar k\rpar \cdot i_{k}} \right]\comma \; i_{k}\right)+ C_{k}\comma \; \eqno\lpar 8\rpar$$

where C k = a k(0) and [·] is the rounding operator. f k, g(i k) are shown in Fig. 4. Clearly, (4) is the constant function. The upper and lower bounds of seam paths from the connectivity condition are represented in (5). In (8), the minimum operator is used since the square-root functions violate the connectivity condition without the operator.

Fig. 4. f k, g(i k) where L(k) = 32. The number in legend is corresponding to those in (4–8).

We append one extra condition as follows:

(9)$$C_{k} = a_k\lpar 0\rpar = \left\{\matrix{s_{o}\lpar 0\rpar \hfill & \hbox{if}\ k=0\comma \; \hfill \cr a_{k-1}\lpar L\lpar k-1\rpar - 1\rpar \hfill & \hbox{otherwise}\comma \; \hfill} \right. \eqno\lpar 9\rpar$$

where so is the original seam path to be approximated. This means the initial position of ak is the same as the terminated position of ak−1. With this restriction, we can reduce the connection information between ak−1 and ak.

As a result, the SPI for ak consists of a selected index among (5–8) and the interval L(k). Furthermore, the overall initial position s o(0) = a(0) is required for each a.

C) Recursive optimization of approximated seams

The optimal L(k) should not be determined as a constant. For some k, L(k) can be large if so goes through very smooth regions or it is well approximated by one of the templates (4–8). Whereas L(k) should be small at the textured region where so has a complex structure. Hence, we present the algorithm for recursive optimization of L(k).

We assume so is calculated before the approximation. To estimate L(k) effectively, we define the simplified forward energy criterion as follows:

(10)$$C_{SFE} \lpar {\bi s}\rpar = {1 \over s_L} \sum_i \left\vert I\lpar i\comma \; s\lpar i\rpar - 1\rpar - I\lpar i\comma \; s\lpar i\rpar + 1\rpar \right\vert\comma \; \eqno\lpar 10\rpar$$

where s L is the length of s. This criterion is similar to the forward energy of the improved SC [Reference Rubinstein, Shamir and Avidan2] but nevertheless is less complex. In the optimization process, we use (10) since many repetitive calculations are required.

Algorithm 1 presents the detailed optimization process of L(k) for the kth interval of I:

$$I_{sub} = I \left(\sum_{p = 0}^{k - 1} L\lpar p\rpar \colon \sum_{p = 0}^k L\lpar p\rpar - 1\comma \; \colon \right).$$

Algorithm 1 Optimization of L(k) with piecewise functions

In Algorithm 1, so, sub is a original sub-seam path for I sub, T > 0 is a threshold and asub is the approximated seam path for I sub. Moreover,

$${\bi a}_{k\comma g} = \lcub j\colon j = f_{k\comma g}\lpar i_{k}\rpar \comma \; 0\leq i_{k} \leq L\lpar k\rpar -1\rcub .$$

It is clear that the approximated seam path ak, g can be completely recovered from the returned parameters. The simplified structure of this algorithm is shown in Fig. 5. In applications shown in this paper, the maximum row height of I sub (denoted as L max) is limited to 32. As a result, each L(k) is selected from {32, 16, 8, 4}.

Fig. 5. Recursive seam approximation process. The red lines represent the division points of so, sub.

D) Iterative optimization

In Algorithm 1, the optimal ak associated with L(k) is obtained for a given threshold T. As a further extension, the optimal threshold T is sought to control the bitrate for the SPI and the CAIR quality. The optimization is realized by an iterative optimization of Algorithm 1. The object of RD-SC is to find the nearest seam aopt to so under a bitrate constraint for the SPI, which is represented as follows:

(11)$${\bi a}_{opt} = \hbox{arg} \min_{\bi a} \lpar D_{PA} \lpar {\bi a}\rpar + \lambda R_{PA} \lpar {\bi a}\rpar \rpar . \eqno\lpar 11\rpar$$

where λ ≥ 0 is the Lagrange multiplier. In this approach, we define D PA(a), shown below, based on the reconstruction error by the simple interpolation.

(12)$$D_{PA}\lpar {\bi a}\rpar = \hbox{MSE} \lpar I\lpar \colon \comma \; {\bi s}_o\rpar \comma \; {\bar I}\lpar \colon \comma \; {\bi s}_o\rpar \rpar - \hbox{MSE} \lpar I\lpar \colon \comma \; {\bi a}\rpar \comma \; {\bar I}\lpar \colon \comma \; {\bi a}\rpar \rpar \comma \; \eqno\lpar 12\rpar$$

where MSE is a mean squared error between the original and interpolated seams and Ī is defined by a linear interpolation from six available neighbors as

(13)$$\bar{I}\lpar i\comma \; v\lpar i\rpar \rpar = {1 \over 6}\sum_{u_0 = -1}^1 \sum_{u_1 \in \lcub -1\comma 1\rcub } I\lpar i + u_0\comma \; v\lpar i\rpar + u_1\rpar \comma \; \eqno\lpar 13\rpar$$

which is depicted in Fig. 6. The calculation method of R PA (a) is shown in the following subsection (Section III-E).

Fig. 6. Illustrative example of (13). White and black dots represent core (remaining) and seam pixels, respectively.

Finally, aopt in (11) is sought by gradually changing T shown in Algorithm 2, where ${\bi a}_{N_{cur}}$ is the approximated seam path at the current iteration and T PA contains N T threshold values and they are monotonically increasing. In this paper, we use TPA = [1, 5, 10, …, 50].

Algorithm 2 Iterative optimization of a

It is a similar strategy to the rate-distortion optimization for video coding [Reference Ortega and Ramchandran18]. Instead of the brute-force optimization, our approach is climbing up the R-D slope as shown in Fig. 7 to reduce computational complexity.

Fig. 7. Illustrative example of Algorithm 2.

E) Bitrate calculation

In Algorithm 2, R PA (a) should be properly calculated from the approximated seam a. As previously mentioned, the function index g opt in Algorithm 1 and the set of L(k) are the side information for SPI. Since Algorithm 1 is based on the division of I sub into two halves, we can represent a set of L(k) with a simple binary representation. That is, 0 is stored if I sub is not divided, whereas 1 is stored if I sub is divided. As a result, a binary vector bdiv can be obtained for the division information, which corresponds to L(k).

By our preliminary experiment, codewords for function indices g opt are determined as shown in Table 2. The required bitrate for the approximation functions is calculated based on the assigned codewords.

Table 2. Codeword Assignment for f k, g(i) of RD-SC

Finally, R PA (a) can be represented as follows:

(14)$$R_{PA}\lpar {\bi a}\rpar = \#\lpar {\bi b}_{div}\rpar + \sum b_{g_{opt}}\comma \; \eqno\lpar 14\rpar$$

where #(bdiv) is the number of elements in bdiv and $b_{g_{opt}}$ is the number of bits to encode g opt.

F) Weighted cost matrix

Each rate-dependent seam path can be calculated by RD-SC mentioned previously. However, a reconstructed image sometimes presents visible artifacts due to seam pruning from almost the same spatial positions. To alleviate this problem, we consider balancing the positions of the seams. For this purpose, we modify the cost of the original SC M(i, j) in (1). When an already pruned seam is defined as $\hat{\bi a}_{m} \lpar m = 0\comma \; 1\comma \; \ldots\rpar $, a new cost map is represented as follows:

(15)$$M_b\lpar i\comma \; a\lpar i\rpar \rpar = w \sum_{m} \delta_{\hat{a}_m\lpar i\rpar \comma a\lpar i\rpar }\comma \; \eqno\lpar 15\rpar$$
(16)$$\hat{M} \lpar i\comma \; j\rpar = M\lpar i\comma \; j\rpar + M_b\lpar i\comma \; j\rpar \comma \; \eqno\lpar 16\rpar$$

where w and δi, j are the weight and the Kronecker delta, respectively. Briefly speaking, the extra cost map M b(i, j) appends w for the conventional cumulative cost M(i, j) when a seam candidate crosses (or touches) a pruned seam path. We use ${\hat M}\lpar i, j \rpar$ instead of M(i, j) in (1) for calculating a seam path. As a result, the updated cost ${\hat M}\lpar i, j \rpar$ avoids pruning a seam adjacent to already pruned seams.

Furthermore, ${\hat M}\lpar i, j \rpar$ is utilized to determine the maximum number of seams. If $\min\lpar {\hat M}\lpar H_{I}\comma \; j\rpar \rpar \gt \tau$, where τ is an arbitrary threshold, we stop pruning seams since the remaining retargeted image is sufficiently condensed and no more seams should be pruned from the image.

G) Multiple seam pruning

In the original SC, just one seam is pruned at each seam calculation. Hence, pruning many seams increases the computational complexity. Therefore, we reuse a primal seam path for pruning the seams around it. In this paper, we use the seam-by-seam calculation for so to clarify our contribution. To further reduce the computational complexity, we could use an improved SC method, such as [Reference Chiang, Wang, Chen and Lai19], which uses one saliency map for pruning several seams. It is able to speed up the calculation of so.

We present an algorithm to prune multiple seams with a primal seam path ap in Algorithm 3. In the algorithm, ${\bi B} = \lcub {\bi b}_{-\lpar N_{m} - 1\rpar /2}\comma \; \ldots\comma \; {\bi b}_{0}\comma \; \ldots\comma \; {\bi b}_{\lpar N_{m}-1\rpar /2}\rcub $ is a set of seams to be pruned at the same time, N m is the (odd) number of pruned seams and T m is the threshold to determine whether multiple seams can be pruned.

Algorithm 3 Multiple seam pruning

IV. Wavelet-Based CAIC

In this section, the wavelet-based multiresolution representation of images for RD-SC is presented. Furthermore, the modified SPIHT algorithm is shown.

A) Lifting-based seamlet

For preserving the image coding performance at full resolution, the pixel values of seams should be encoded as well as the core image. Before constructing a full multiresolution image, seams are transformed by the lifting-based discrete wavelet transform (DWT).

The wavelet transform along seam paths, named seamlet, was proposed by Conger et al. [Reference Conger, Radha and Kumar20]. It generates coarse and detail coefficients similar to the conventional DWT, but its downsampling positions are determined by seam paths. Though it is able to use any wavelet filters, the original seamlet recommended Haar wavelet. However, we focus on the image coding application using CAIR. The detail coefficients should be as small as possible for image coding. Thus, lifting-based 5/3 DWT is employed for seamlet in this paper. The detail coefficient is calculated as follows:

(17)$$\eqalign{d\lpar i\comma \; a\lpar i\rpar \rpar &= I\lpar i\comma \; a\lpar i\rpar \rpar \cr &\quad - \left[\left(I\lpar i\comma \; a\lpar i\rpar - 1\rpar + I\lpar i\comma \; a\lpar i\rpar + 1\rpar \right)/{2}\right]\comma \; } \eqno\lpar 17\rpar$$

where d(i, a(i)) is the detail coefficient at (i, a(i)). This lifting-based transformation is shown in Fig. 8. In our approach, the coarse coefficients remain unchanged different from the original seamlet. That is, the five-tap lowpass DWT filtering is not performed, since these unfiltered original pixels in the core image are finally transformed by the conventional DWT. Finally, d(i, a(i))'s are collected and one column of detail coefficients is yielded. The process is recursively performed for the number of seams.

Fig. 8. Lifting-based seamlet. Top: transformation of I(i, a(i)). Bottom: illustrative example of seamlet for one seam.

In this paper, first vertical seams are calculated and transformed by the lifting-based seamlet and then the horizontal seams are calculated and transformed, shown in Fig. 9. Finally, the core image is transformed by the multilevel DWT to construct a fully decomposed multiresolution image. Note that this can be easily extended to the optimal order seam pruning [Reference Rubinstein, Shamir and Avidan8] for 2-D CAIR; however, we choose this simple 2-D CAIR because of the computational complexity of the optimization of the seam pruning sequence. A retargeted image with an arbitrary resolution is easily obtained by interpolating seam paths.

Fig. 9. Encoder structure of the proposed CAIC.

B) Pre-quantization of detail seam coefficients

It is clear that pixel values of the seams are not as important as the core image since prominent regions are preserved in the core image through RD-SC. Therefore, we consider to quantize detail seam coefficients before SPIHT encoding. It is represented as follows:

(18)$$d_q\lpar i\comma \; a\lpar i\rpar \rpar = \hbox{sgn}\lpar d\lpar i\comma \; a\lpar i\rpar \rpar \rpar \left[{\vert d\lpar i\comma \; a\lpar i\rpar \rpar \vert + f_q \over \Delta}\right]\comma \; \eqno\lpar 18\rpar$$

where d q(i, a(i)) is the pre-quantized detail seam coefficient, f q is the rounding control parameter, and Δ is the quantization step size. Clearly, no quantization is performed when Δ = 1. The pre-quantized signal is reconstructed by the following equation:

(19)$$d^{\prime} \lpar i\comma \; a\lpar i\rpar \rpar = \Delta d_q\lpar i\comma \; a\lpar i\rpar \rpar . \eqno\lpar 19\rpar$$

This pre-quantization is based on the scalar quantization with offset used in H.264/AVC [Reference Wedi and Wittman21] and we use f q = Δ/3 similar to the intra-coding of H.264/AVC. By using this pre-quantization, we can control the qualities between the core image and seams.

Modified tree structure of SPIHT for CAIC

For the proposed multiresolution image representation, we introduce a new parent–children relationship between the transformed coefficients in the core image and detail coefficients of seams for SPIHT encoding [Reference Said and Pearlman12].

The ordinary parent–children relationship in [Reference Said and Pearlman12] is defined to connect a pixel to four pixels in the same subband for the next lower decomposition (higher frequency) level. In this paper, the relationship is used for transformed coefficients of the core image. Let O(q, r) be a set of children indices of the parent index (q, r). The relationship is defined as

(20)$$\eqalign{O\lpar q\comma \; r\rpar &=\lcub \lpar 2q\comma \; 2r\rpar \comma \; \lpar 2q\comma \; 2r+1\rpar \comma \; \lpar 2q+1\comma \; 2r\rpar \comma \; \cr &\qquad \lpar 2q+1\comma \; 2r+1\rpar \rcub .}\eqno\lpar 20\rpar$$

Besides the ordinary relationship, we introduce the new relationship between the core image and seams. We define (q c, r c) as the index of a transformed coefficient of the core image after DWT in the HL subband for the lowest decomposition (the highest frequency) level, corresponding to I(i, a(i) − 1). Furthermore, (q s, r s) is defined as the index of a detail coefficient of a seam corresponding to d q (i, a(i)).

The new parent–children relationship is represented as follows:

(21)$$O\lpar q_c\comma \; r_c\rpar \ni \left\{\matrix{\lpar q_s\comma \; r_s\rpar \hfill & \hbox{if}\, \lpar q_s\comma \; r_s\rpar \, \hbox{exists}\comma \; \hfill \cr \phi \hfill & \hbox{otherwise}. \hfill} \right. \eqno\lpar 21\rpar$$

The new parent–children relationship is illustrated in Fig. 10. This relationship means that detail coefficients of seams are related to the nearest core pixels. The case of horizontal seams is similarly defined as well.

Fig. 10. New parent–children structure of the proposed CAIC. An arrow with a solid line represents the relationship.

V. Experimental results

In this section, we validate image coding and CAIR performances. Four eight-bit grayscale images Park Joy, Pisa Tower, Beach and Crew are used as test images. Their sizes are summarized in Table 3 and the original images are shown in Fig. 11. All core image sizes are determined automatically based on the parameter τ introduced in Section III-F. Experimentally set parameters are summarized in Table 4.

Fig. 11. Test images.

Table 3. Sizes of Test Images (H I × W I)

Table 4. Parameter Settings for RD-SC.

A) Effects of parameter setup

The theoretical optimization of parameters shown in Table 4 is quite difficult since the optimization process could be highly non-linear and the best parameter set is certainly depending on the one who views the retargeted and full resolution images. Therefore, all of them are set heuristically as previously mentioned. Here, we describe the effect of parameter settings. In the following experiments in this subsection, one parameter is changed and the remaining ones are the same as Table 4 unless specified. Since effects of a few parameters are relatively clear, i.e., those of L max and τ are analogous, we describe about λ, ω, T m, and N m. It is an interesting future work to determine optimal parameters automatically. As a simple approach, we could append a parameter tuning step by a user with some specific test images.

1) λ

It controls bitrates for SPI and the accuracy of seam paths. In the experiment, W I/4 vertical seams and H I/4 horizontal seams are calculated regardless of τ. The SPI of SC is encoded by arithmetic coding similar to [Reference Nguyen, Yang and Cai9]. Figure 12 is a plot of the bitrate of SPI as a function of λ. As expected, the bitrate decreases as λ increases and the bitrates are much lower than those of SC.

Fig. 12. Bitrates for side information. The solid lines represent required bitrates of SPI by SC.

Figure 13 shows the calculated seams of Pisa Tower for various λ. The seams represent different paths depending on λ. The seams move finely like SC for a small λ, whereas for a large λ, the seam paths are a coarse approximation of the original ones.

Fig. 13. Approximated seams by RD-SC. The enlarged portions of Pisa Tower are shown. Black lines represent approximated seam paths.

2) ω

It is used for balancing seam positions. ω = 0 means there are no restrictions for the positions. Figure 14 shows approximated seam paths of Crew for various ω. The seam paths are significantly changed within the range [0, Reference Guo, Liu, Shi, Zhou and Gleicher5]. Without the restriction, i.e., ω = 0, seams are mainly located on the smooth areas and the positions are concentrated similar to the original SC. Whereas for the large ω, such as ω = 5.0 or 10, the seams are distributed uniformly. Seams for the large ω are sometimes passing through the prominent regions of the image, e.g., faces of astronauts.

Fig. 14. Approximated seams by RD-SC for various ω. A total of 128 seams are removed regardless of τ for comparison purpose.

For the retargeting purpose, ω should be small; which yields similar seam positions to those of the original SC (Fig. 14(a)). Whereas for the image coding purpose, seam paths should be distributed uniformly. As a simple comparison, Table 5 presents PSNRs of the interpolated seams by linear interpolation. It is clear that ω = 0, i.e., the most concentrated case, shows the worst interpolation result since the small ω requires interpolating wide areas from few remaining pixels. It is also observed that ω should not be too large. Therefore, as a trade off issue, we picked up ω = 0.5.

Table 5. Linear interpolation results for Crew image.

3) N m and T m

N m determines the maximum number of seams removed at the same time. The large N m permits removing many seams by one seam path calculation step. The seam paths for N m = 3 and 11 are shown in Fig. 15. As expected, many seams share one seam paths in Fig. 15(b). PSNRs of interpolated seams by linear interpolation and SPI bitrates are compared in Fig. 16. It seems that the large N m is recommended anytime. However, as shown in Fig. 17, the large N m significantly affects the retargeted image quality. In the figure, the straight line in the original Crew image was broken in the retargeted image with N m = 11. Hence, we picked up N m = 3, which seems a good trade-off as shown in Fig. 16.

Fig. 15. Approximated seams by RD-SC for different N m (T m = 4). A total of 128 seams are removed regardless of τ for comparison purpose.

Fig. 16. PSNRs of interpolated seams and bitrates for SPI (Crew image, 128 seams).

Fig. 17. Enlarged portions of retargeted Crew image.

T m has a similar effect to N m, since it decides whether N m seams can be pruned or not (see Algorithm 3). If N m > 2552 (possible upper bound of MSE), the algorithm always choose to prune N m seams regardless of MSE. Based on the similar experiment about T m, we experimentally set T m = 4. The approximated seams for a few T m are shown in Fig. 18.

Fig. 18. Approximated seams by RD-SC for different T m (N m = 1). A total of 128 seams are removed regardless of τ for comparison purpose.

B) CAIR performance

The second comparison is for CAIR performance. As far as we know, few numerical measures exist to compare CAIR performances. Hence, we present all of the retargeted images for visual quality comparison. Table 3 summarizes the sizes of the core images. Figure 19 shows CAIR results by using SC and RD-SC. For comparison purpose, scaled images by imresize function of MATLAB are also shown. All images are resized to have the same size of the retargeted images by RD-SC. From the figures, RD-SC presents very similar retargeted images to those obtained by SC.

Fig. 19. Resizing results. The left, center, and right columns correspond to the resized images by scaling, SC, and RD-SC, respectively. From top to bottom: Park Joy, Pisa Tower, Beach, and Crew.

C Computational complexity

It is worth noting that the computational complexity in the seam approximation process of RD-SC is higher than those of SC due to the recursive estimation. It is difficult to present the complexity theoretically due to the highly image-adaptive computations of RD-SC. Thus, the execution time is compared. All algorithms are implemented on a MATLAB (R2011a) platform. The simulation is run on the desktop computer with Intel 3.4-GHz CPU (Corei7-2600), 16-GB RAM. Table 6 shows the execution time of the average of five executions for Beach image. In the table, the seam calculation time by SC (the column of “SC”) is also shown for comparison. The computation time is proportional to the number of seams to be removed. Seam path approximation time is longer than that of the original SC. However, Table 6 also presents the seam path decoding time of RD-SC. Obviously, it is significantly faster than the seam calculation of SC. It is beneficial for the traditional client–server model, where the decoder will be a mobile device which often has poor computation power.

Table 6. Execution time (s) of Beach image.

D) Application to image coding

Here, we compare image coding performance. As the conventional encoders, SPIHT, the method in [Reference Nguyen, Yang and Cai9] (denoted as SC-SPIHT) and CD-based SPIHT [Reference Tanaka, Hasegawa and Kato11] (denoted as CD-SPIHT) are used. Our CAIC method is referred to as CA-SPIHT in this section. Note that SC-SPIHT does not include pixel values of seams; that is, the compressed pruned image is interpolated by linear interpolation since the bitrate for SPI already requires a high percentage of the entire bit budget. All methods apply five-level 9/7 DWT for core imagesFootnote 3 before performing SPIHT.

The R-D curves, encoded with Δ = 1 in (18), are shown in Fig. 20. SC-SPIHT is inferior to the other methods due to its large amount of bitrates for SPI. Furthermore, CD-SPIHT is worse than SPIHT and CA-SPIHT especially for high bitrates since its non-ROI is just interpolated from the core image at the decoder. SPIHT and CA-SPIHT are comparable, but SPIHT slightly outperforms the CA-SPIHT. More precisely, in comparison to SPIHT, CA-SPIHT increases bitrates less than or equal to 10 %. However, the bitrate increase would be a permissible penalty for users demanding both of a good CAIR quality and image coding performance at full resolution. In addition, reconstructed images at full resolution are shown in Figs. 21–24. Similar to the R-D curves, the reconstructed images by SC-SPIHT and CD-SPIHT have interpolation artifacts, whereas our CA-SPIHT significantly reduces those artifacts and the reconstructed images are very similar to those encoded by SPIHT.

Fig. 20. R-D curves of reconstructed image at full resolution. From left to right, top to bottom: Park Joy, Beach, Pisa Tower, and Crew.

Fig. 21. Reconstructed Park Joy images. Each image is encoded at 1.0 bpp.

Fig. 22. Reconstructed Pisa Tower images. Each image is encoded at 1.0 bpp.

Fig. 23. Reconstructed Beach images. Each image is encoded at 1.0 bpp.

Fig. 24. Reconstructed Crew images. Each image is encoded at 1.0 bpp.

Moreover, Fig. 25 represents PSNRs of core images for a few step sizes of the pre-quantization in Section IV-B. As expected, if Δ in (18) is large, the quality of the core image is improved especially for high bitrate coding. In addition, PSNR gains from Δ = 1 depend on the test images. For example, seam paths for Park Joy pass through the very smooth regions, the PSNR gains due to the pre-quantization is marginal, whereas for Beach, the PSNR gain is significant since vertical seams pass through the highly textured sand area.

Fig. 25. R-D curves of reconstructed core images for various Δ.

VI. Conclusions

In this paper, we proposed rate-dependent structures of SC by piecewise approximation of the original seam paths. Furthermore, a new parent–children relationship of SPIHT is proposed for CAIC. In the experimental results, our approach shows comparable CAIR performances to those of SC with significant reductions of required bitrates for side information. Its image coding performances are comparable to those of the original SPIHT and also it consistently outperforms the conventional CAICs. Our RD-SC could be extended to video retargeting, which is one of our ongoing studies.

Acknowledgements

The authors thank krossbow (Mt. Evans), wsuph001 (Pisa Tower), and wenno (Beach) of flickr (www.flickr.com) for making available images in creative commons license. This work was supported in part by JSPS KAKENHI Grant Number 24760288, SCAT Research Grant, The Nakajima Foundation, and Kayamori Foundation of Informational Science Advancement.

Footnotes

1 Sometimes it is called core image in this paper.

2 In [Reference Deng, Lin and Cai15], 1.8 dB average PSNR degradation was reported.

3 For SPIHT, the core image is the original image itself.

References

REFERENCES

[1]Avidan, S.; Shamir, A.: Seam carving for content-aware image resizing. ACM Trans. Graph., 26(3) (2007), 9 pp., Article 10. DOI = 10.1145/1239451.1239461. http://dx.doi.org/10.1145/1239451.1239461.Google Scholar
[2]Rubinstein, M.; Shamir, A.; Avidan, S.: Improved seam carving for video retargeting. ACM Trans. Graph., 27(3) (2008), 9 pp., Article 16. DOI = 10.1145/1360612.1360615. http://dx.doi.org/10.1145/1360612.1360615.Google Scholar
[3]Wolf, L.; Guttmann, M.; Cohen-Or, D.: Non-homogeneous content-driven video-retargeting, in Proc. ICCV'07, 2007.Google Scholar
[4]Wang, Y.; Tai, C.-L.; Sorkine, O.; Lee, T.-Y.: Optimized scale-and-stretch for image resizing. ACM Trans. Graph. 27(5) (2008), 8 pp., Article 118. DOI = 10.1145/1409060.1409071. http://doi.acm.org/10.1145/1409060.1409071.Google Scholar
[5]Guo, Y.; Liu, F.; Shi, J.; Zhou, Z. H.; Gleicher, M.: Image retargeting using mesh parametrization. IEEE Trans. Multimedia, 11(5) (2009), 856867.Google Scholar
[6]Domingues, D.; Alahi, A.; Vandergheynst, P.: Stream carving: an adaptive seam carving algorithm, in Proc. ICIP'10, 2010.CrossRefGoogle Scholar
[7]Dong, W.; Zhou, N.; Paul, J.; Zhang, X.: Optimized image resizing using seam carving and scaling. ACM Trans. Graph., 28(5) (2009), 10 pp. Article 125. DOI = 10.1145/1618452.1618471. http://doi.acm.org/10.1145/1618452.1618471.Google Scholar
[8]Rubinstein, M.; Shamir, A.; Avidan, S.: Multi-operator media retargeting. ACM Trans. Graph., 28(3) (2009), 11 pp. DOI = 10.1145/ 1531326.1531329. http://doi.acm.org/10.1145/1531326.1531329.Google Scholar
[9]Nguyen, A.; Yang, W.; Cai, J.: Seam carving extension: a compression perspective, in Proc. 17th ACM Int. Conf. on Multimedia, 2009, 825828.Google Scholar
[10]Tanaka, Y.; Hasegawa, M.; Kato, S.: Image coding using concentration and dilution based on seam carving with hierarchical search, in Proc. ICASSP'10, 2010, 13221325.Google Scholar
[11]Tanaka, Y.; Hasegawa, M.; Kato, S.: “Improved image concentration for artifact-free image dilution and its application to image coding”, in Proc. ICIP'10, 2010, 12251228.Google Scholar
[12]Said, A.; Pearlman, W. A.: A new, fast, and efficient image codec based on set partitioning in hierarchical trees. IEEE Trans. Circuits Syst. Video Technol., 6(3) (1996), 243250.Google Scholar
[13]Décombas, M.; Capman, F.; Renan, E.; Dufaux, F.; Pesquet-Popescu, B.: Seam carving for semantic video coding, in Proc. SPIE, Applications of Digital Image Processing XXXIV, 2011.Google Scholar
[14]Deng, C.; Lin, W.; Cai, J.: Content-based image compression for arbitrary-resolution display devices, in Proc. ICC 2012, 2011, 15.Google Scholar
[15]Deng, C.; Lin, W.; Cai, J.: Content-based image compression for arbitrary-resolution display devices. IEEE Trans. Multimedia, 14(4) (2012), 11271139.Google Scholar
[16], D. T.; Solé, J.; Yin, P.; Gomila, C.; Nguyen, T. Q.: Selective data pruning-based compression using high-order edge-directed interpolation. IEEE Trans. Image Process., 19(2) (2010), 399409.Google Scholar
[17]Wang, T.; Urahama, K.: Cartesian resizing of image and video for data compression, in Proc. TENCON 2010, 2010, 16511656.Google Scholar
[18]Ortega, A.; Ramchandran, K.: Rate-distortion methods for image and video compression. IEEE Signal rocess. Mag., 15(6) (1998), 2350.Google Scholar
[19]Chiang, C. K.; Wang, S. F.; Chen, Y. L.; Lai, S. H.: Fast JND-based video carving with GPU acceleration for real-time video retargeting. IEEE Trans. Circuits Syst. Video Technol. 19(11) (2009), 15881597.Google Scholar
[20]Conger, D. D.; Radha, H.; Kumar, M.: Seamlets: Content-aware nonlinear wavelet transform, in Proc. ICASSP'10, 2010, 14501453.Google Scholar
[21]Wedi, T.; Wittman, S.: Rate-distortion constrained estimation of quantization offsets, in Joint Video Team (JVT), Doc. JVT-O066, Busan, Korea, April 2005.Google Scholar
Figure 0

Fig. 1. Visual comparisons of the CAIR methods (Mt. Evans image). Black colored regions represent pixels to be pruned. The original size is 1024 × 768 pixels and it is resized to 1024 × 512. From left to right: (a) original image, (b) seam paths calculated by SC, (c) seam paths calculated by RD-SC, (d) CAIR result by SC and (e) CAIR result by RD-SC.

Figure 1

Table 1. Notation with Respect to an Image I.

Figure 2

Fig. 2. Structure of RD-SC.

Figure 3

Fig. 3. Relationship between s and a. Blue solid and red dashed lines represent s and a, respectively.

Figure 4

Fig. 4. fk, g(ik) where L(k) = 32. The number in legend is corresponding to those in (4–8).

Figure 5

Algorithm 1 Optimization of L(k) with piecewise functions

Figure 6

Fig. 5. Recursive seam approximation process. The red lines represent the division points of so, sub.

Figure 7

Fig. 6. Illustrative example of (13). White and black dots represent core (remaining) and seam pixels, respectively.

Figure 8

Algorithm 2 Iterative optimization of a

Figure 9

Fig. 7. Illustrative example of Algorithm 2.

Figure 10

Table 2. Codeword Assignment for fk, g(i) of RD-SC

Figure 11

Algorithm 3 Multiple seam pruning

Figure 12

Fig. 8. Lifting-based seamlet. Top: transformation of I(i, a(i)). Bottom: illustrative example of seamlet for one seam.

Figure 13

Fig. 9. Encoder structure of the proposed CAIC.

Figure 14

Fig. 10. New parent–children structure of the proposed CAIC. An arrow with a solid line represents the relationship.

Figure 15

Fig. 11. Test images.

Figure 16

Table 3. Sizes of Test Images (HI × WI)

Figure 17

Table 4. Parameter Settings for RD-SC.

Figure 18

Fig. 12. Bitrates for side information. The solid lines represent required bitrates of SPI by SC.

Figure 19

Fig. 13. Approximated seams by RD-SC. The enlarged portions of Pisa Tower are shown. Black lines represent approximated seam paths.

Figure 20

Fig. 14. Approximated seams by RD-SC for various ω. A total of 128 seams are removed regardless of τ for comparison purpose.

Figure 21

Table 5. Linear interpolation results for Crew image.

Figure 22

Fig. 15. Approximated seams by RD-SC for different Nm (Tm = 4). A total of 128 seams are removed regardless of τ for comparison purpose.

Figure 23

Fig. 16. PSNRs of interpolated seams and bitrates for SPI (Crew image, 128 seams).

Figure 24

Fig. 17. Enlarged portions of retargeted Crew image.

Figure 25

Fig. 18. Approximated seams by RD-SC for different Tm (Nm = 1). A total of 128 seams are removed regardless of τ for comparison purpose.

Figure 26

Fig. 19. Resizing results. The left, center, and right columns correspond to the resized images by scaling, SC, and RD-SC, respectively. From top to bottom: Park Joy, Pisa Tower, Beach, and Crew.

Figure 27

Table 6. Execution time (s) of Beach image.

Figure 28

Fig. 20. R-D curves of reconstructed image at full resolution. From left to right, top to bottom: Park Joy, Beach, Pisa Tower, and Crew.

Figure 29

Fig. 21. Reconstructed Park Joy images. Each image is encoded at 1.0 bpp.

Figure 30

Fig. 22. Reconstructed Pisa Tower images. Each image is encoded at 1.0 bpp.

Figure 31

Fig. 23. Reconstructed Beach images. Each image is encoded at 1.0 bpp.

Figure 32

Fig. 24. Reconstructed Crew images. Each image is encoded at 1.0 bpp.

Figure 33

Fig. 25. R-D curves of reconstructed core images for various Δ.