Single-Image Depth Map Estimation Using Blur Information

Dmitry Akimov
Department of Computational
Mathematics and Cybernetics
Moscow State University, Moscow,
Russia
dakimov@graphics.cs.msu.ru
Dmitry Vatolin
Department of Computational
Mathematics and Cybernetics
Moscow State University, Moscow,
Russia
dmitry@graphics.cs.msu.ru
Abstract
This paper presents a novel approach for depth map estimation
from a single image using information about edge blur. The blur
amount at the edge is calculated from the gradient magnitude ratio
of the input and re-blurred images. The final depth map can be
obtained by propagating estimated information from the edges to
the entire image using cross-bilateral filtering. Experimental
results for real images and video sequences demonstrate the
effectiveness of this method in providing a plausible depth map
for 2D-to-3D conversion that generates comfortable stereo
viewing.
Keywords: Image Processing, Depth Map, Defocus Blur, Cross-bilateral Filtering.
Maxim Smirnov
YUVsoft Corp.
ms@yuvsoft.com
1. INTRODUCTION
3D images and videos have become more and more popular in
everyday life, but the process of making 3D content is still very
complicated and expensive. Many techniques for automatic
2D-to-3D conversion exist, and depth map estimation from 2D image
sources plays an important role in this process.
Depth can be recovered either from binocular cues, such as stereo
correspondence, or monocular cues, such as shading, perspective
distortion, motion and texture. Conventional methods of depth
map estimation have relied on multiple images, video sequences,
calibrated cameras or specific scenes, or a huge training database.
Stereo vision [3] measures disparities between a pair of images of
the same scene taken from two different viewpoints, and it uses
these disparities to recover the depth map. Shape from shading [4]
offers another method for monocular depth reconstruction, but it
is difficult to apply to scenes that do not have fairly uniform color
and texture. Saxena et al. [2] presented an algorithm for
estimating the depth map from numerous monocular image
features using MRF modeling and a training database. The
proposed algorithm in this paper focuses on a more challenging
problem: recovering depth from a single defocused image
captured by an uncalibrated camera without any prior information.
2. DEPTH ESTIMATION
The main part of the proposed algorithm is blur estimation at the
edges and proper depth propagation from the edges to the entire
image.
2.1 Defocus blur model
The defocus blur model and the blur estimation method used here
were introduced in [5].
Assume that the ideal edge is the step function f(x) (Fig. 1(a)) with
the jump at x = 0:
$$f(x) = A\,u(x) + B, \qquad u(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}$$
Here, A and B are the amplitude and offset of the edge,
respectively.
The blurred edge is the function i(x) (Fig. 1(b)), which is the
convolution of f(x) and g(x,σ), where the latter is a Gaussian
function with an unknown parameter σ:
$$i(x) = f(x) \otimes g(x,\sigma) = \int_{-\infty}^{\infty} f(x - t)\, g(t,\sigma)\, dt, \qquad g(x,\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{x^2}{2\sigma^2}\right)$$
Figure 1: (a) Ideal edge function f(x).
(b) Blurred edge function i(x).
2.2 Blur value estimation
A blurred edge with unknown parameter σ is re-blurred using a
Gaussian function with a known parameter σ1.
$$i_1(x) = i(x) \otimes g(x,\sigma_1)$$
Consider the gradient magnitude of i1(x):
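$$|\nabla i_1(x)| = \left|\nabla\big(i(x) \otimes g(x,\sigma_1)\big)\right| = \left|\nabla\big(f(x) \otimes g(x,\sigma) \otimes g(x,\sigma_1)\big)\right| = \frac{A}{\sqrt{2\pi(\sigma^2 + \sigma_1^2)}} \exp\left(-\frac{x^2}{2(\sigma^2 + \sigma_1^2)}\right)$$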
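The ratio of the gradient magnitude of the original edge to that
of the re-blurred edge is the function Bi(x):
$$B_i(x) = \frac{|\nabla i(x)|}{|\nabla i_1(x)|} = \sqrt{\frac{\sigma^2 + \sigma_1^2}{\sigma^2}}\, \exp\left(-\left(\frac{x^2}{2\sigma^2} - \frac{x^2}{2(\sigma^2 + \sigma_1^2)}\right)\right)$$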
The maximum of the function is at the edge location (x = 0):
$$B_i(0) = \frac{|\nabla i(0)|}{|\nabla i_1(0)|} = \sqrt{\frac{\sigma^2 + \sigma_1^2}{\sigma^2}}$$
This function depends only on blur parameters, and the unknown
σ can be estimated from the gradient magnitude ratio:
$$\sigma = \frac{\sigma_1}{\sqrt{B_i(0)^2 - 1}}.$$
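The estimate above can be checked numerically on a synthetic 1D edge. The following is a minimal sketch, not the authors' implementation: the function names (`gaussian_kernel`, `blur`, `estimate_sigma`) are hypothetical, the Gaussian is sampled and truncated, and the values A = 2, B = 1, σ = 3 and σ1 = 2 are chosen only for illustration.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Sampled 1D Gaussian g(x, sigma), truncated at 6 sigma and normalized."""
    radius = int(np.ceil(6 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def blur(signal, sigma):
    """Convolve a 1D signal with a Gaussian, replicating the border samples."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(np.asarray(signal, dtype=float), r, mode="edge")
    return np.convolve(padded, k, mode="valid")

def estimate_sigma(i, sigma1):
    """Estimate the unknown blur of the edge signal i from the gradient ratio."""
    i1 = blur(i, sigma1)                      # re-blurred edge i1(x)
    grad_i = np.abs(np.gradient(i))           # |grad i(x)|
    grad_i1 = np.abs(np.gradient(i1))         # |grad i1(x)|
    x0 = int(np.argmax(grad_i))               # edge location (ratio maximum)
    ratio = grad_i[x0] / grad_i1[x0]          # B_i(0)
    return sigma1 / np.sqrt(ratio**2 - 1.0)   # sigma = sigma1 / sqrt(B_i(0)^2 - 1)

if __name__ == "__main__":
    x = np.arange(-100, 101)
    ideal = 2.0 * (x >= 0) + 1.0              # f(x) with A = 2, B = 1 (illustrative)
    observed = blur(ideal, 3.0)               # i(x) with true sigma = 3
    print(estimate_sigma(observed, sigma1=2.0))
```

On sampled data the finite-difference gradient adds a small amount of extra smoothing, so the recovered value is close to, but not exactly, the true σ.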
Figure 2: (a) Blurred edge i(x) and re-blurred edge i1(x).
(b) Gradient magnitudes of i(x) and i1(x).
(c) Gradient magnitude ratio function Bi(x).
2.3 Blur value filtering
Blur values are estimated for each point of the image’s edge mask.
Assuming the entire scene in the image is beyond the camera’s
focal plane, the depth at the edges can be measured by the
estimated parameter σ, forming a sparse map D0(x,y) and the mask
of points with known depth values Mask0. Because of noise in the
image and inaccuracies in edge detection, however, the estimate
of the sparse depth map D0(x,y) may contain some errors. To
reduce such estimation errors, several prefiltering techniques are
proposed.
The first step of filtering is outlier removal. Points whose depth
values are rare according to the depth value histogram are excluded
from D0 and Mask0 (Fig. 3, rows 1 and 2).
The second step is median filtering of D0, restricted by the mask
and by color difference. Figure 3 illustrates the result of median
filtering; following noise reduction, two main depth profiles appear.
2.4 Depth Propagation
The next important step in depth estimation is correct propagation
of the blur information from the edges to the entire image. To
achieve this, iterative cross-bilateral mask-oriented filtering was
used. In the standard cross-bilateral form it reads:
$$D_{i+1}(x,y) = \frac{1}{w(x,y)} \iint_{\Omega} D_i(x',y')\, Mask_i(x',y')\, \exp\left(-\frac{(x - x')^2 + (y - y')^2}{2\sigma_s^2}\right) \exp\left(-\frac{\|I(x,y) - I(x',y')\|^2}{2\sigma_c^2}\right) dx'\, dy'$$
where Di(x,y) is the depth map estimated at iteration i of the
propagation process, Maski is the propagated region mask, I is the
source image, Ω is the neighborhood of the point (x,y), σs and
σc are the spatial and color weights, respectively, and w(x,y) is the
normalization factor.
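A discrete version of this propagation can be sketched as follows. This is a sketch under stated assumptions rather than the authors' code: the names `propagate_step` and `propagate` are hypothetical, the integral is replaced by a sum over a disc-shaped window, pixels that already have a depth value are held fixed, and the image is assumed to be an (H, W, C) array. The defaults σs = 10, σc = 7 and radius 30 are the settings reported in Section 4.

```python
import numpy as np

def propagate_step(depth, mask, image, sigma_s=10.0, sigma_c=7.0, radius=30):
    """One iteration: fill unknown pixels from known pixels inside a disc."""
    image = np.asarray(image, dtype=float)
    h, w = depth.shape
    new_depth, new_mask = depth.copy(), mask.copy()
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    dist2 = xs**2 + ys**2
    # Spatial Gaussian weight, zero outside the disc Omega.
    spatial = np.exp(-dist2 / (2 * sigma_s**2)) * (dist2 <= radius**2)
    for y in range(h):
        for x in range(w):
            if mask[y, x]:
                continue  # assumption: known depth values stay unchanged
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            sp = spatial[y0 - y + radius:y1 - y + radius,
                         x0 - x + radius:x1 - x + radius]
            diff = image[y0:y1, x0:x1] - image[y, x]
            color = np.exp(-np.sum(diff**2, axis=-1) / (2 * sigma_c**2))
            weights = sp * color * mask[y0:y1, x0:x1]
            norm = weights.sum()  # normalization factor
            if norm > 1e-12:
                new_depth[y, x] = (weights * depth[y0:y1, x0:x1]).sum() / norm
                new_mask[y, x] = True
    return new_depth, new_mask

def propagate(depth0, mask0, image, max_iters=50, **params):
    """Iterate until every pixel has a depth value or max_iters is reached."""
    depth = np.asarray(depth0, dtype=float)
    mask = np.asarray(mask0, dtype=bool)
    for _ in range(max_iters):
        if mask.all():
            break
        depth, mask = propagate_step(depth, mask, image, **params)
    return depth, mask
```

Because the color weight decays quickly across strong color edges, depth values spread within similarly colored regions first, which is what keeps object boundaries in the propagated map. The per-pixel Python loop is written for readability, not speed.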
3. FINAL DEPTH MAP
The final depth map of the image is obtained from the propagated
depth map. Following the sky detection process, the sky region is
defined as the most distant area of the scene. The refined depth
map is then filtered using cross-bilateral filtering.
3.1 Sky detection
To achieve more accurate depth maps for outdoor scenes, a sky
detection algorithm is introduced (Fig. 4). This algorithm is based
on color segmentation of potential sky areas in HSV color space.
Figure 4: Original image and detected sky region.
Figure 3: Histograms of depth values in sparse depth map.
The sky region is assumed to be in the upper area of the image, so
the method recursively propagates the sky region from the
image’s upper edge. If a pixel’s color satisfies the color
criterion, the method marks it as a sky pixel and then checks the
neighboring pixels. Propagation stops at a pixel that does not
satisfy the color criterion or whose color difference from the
previous pixel exceeds a constant threshold.
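The propagation described above can be sketched as a region-growing pass. This sketch makes several assumptions: it uses a queue instead of recursion to avoid deep call stacks, it takes the color criterion as a caller-supplied function, and `example_criterion`, its thresholds and the `max_diff` value are hypothetical placeholders, not the criterion used in this paper. The image is assumed to be an (H, W, 3) HSV array with components in [0, 1]; hue wrap-around is ignored.

```python
from collections import deque
import numpy as np

def example_criterion(pixel):
    """Hypothetical placeholder: bright, weakly saturated pixels count as sky."""
    _, s, v = pixel
    return bool(v > 0.7 and s < 0.3)

def detect_sky(hsv, is_sky_color, max_diff=0.1):
    """Grow the sky region downward from the upper edge of the image."""
    hsv = np.asarray(hsv, dtype=float)
    h, w, _ = hsv.shape
    sky = np.zeros((h, w), dtype=bool)
    queue = deque()
    # Seed with every pixel of the top row that satisfies the color criterion.
    for x in range(w):
        if is_sky_color(hsv[0, x]):
            sky[0, x] = True
            queue.append((0, x))
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if not (0 <= ny < h and 0 <= nx < w) or sky[ny, nx]:
                continue
            if not is_sky_color(hsv[ny, nx]):
                continue  # stop: the color criterion is not satisfied
            if np.abs(hsv[ny, nx] - hsv[y, x]).max() > max_diff:
                continue  # stop: color jump from the previous pixel is too large
            sky[ny, nx] = True
            queue.append((ny, nx))
    return sky
```

Only pixels connected to the top row are marked, so a sky-colored object lower in the frame is left out unless it touches the detected region.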
The color criterion used here was chosen empirically and consists
of three parts:
1. [condition not recoverable] to detect light tones in the sky
2. [condition not recoverable] to detect grey tones in the sky
3. [condition not recoverable] to detect blue, orange and yellow tones in the sky
Figure 6: Comparison of proposed method (column 3) and [5] (column 2).
gradient magnitude ratio of the re-blurred and original images. A
final depth map is then obtained using iterative cross-bilateral
mask-oriented filtering. The proposed technique generates
plausible depth maps that can be more accurate than those
produced by existing methods.
4. RESULTS
In this work the Canny edge detection algorithm [1] was used and
was tuned for uniform estimation of edges with different
magnitudes.
The parameter σ1 for re-blurring was set to σ1 = 0.5. The estimated
blur value at the edge location is then filtered using median
filtering with a radius of 4 and using outlier exclusion according
to histogram values.
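The two prefiltering steps applied to the sparse blur map can be sketched as follows. Only the median radius of 4 comes from the text; the bin count, the rarity threshold `min_fraction`, the color threshold `max_color_diff`, the square window and the function names are assumptions made for illustration. The image is assumed to be an (H, W, C) array.

```python
import numpy as np

def remove_outliers(depth, mask, bins=32, min_fraction=0.02):
    """Exclude known points whose histogram bin holds only a few samples."""
    values = depth[mask]
    if values.size == 0:
        return depth.copy(), mask.copy()
    counts, edges = np.histogram(values, bins=bins)
    # Map every pixel to its histogram bin using the interior bin edges.
    idx = np.clip(np.digitize(depth, edges[1:-1]), 0, bins - 1)
    rare = counts[idx] < min_fraction * values.size
    new_mask = mask & ~rare
    return np.where(new_mask, depth, 0.0), new_mask

def masked_median(depth, mask, image, radius=4, max_color_diff=20.0):
    """Median over known neighbors whose color is close to the center pixel."""
    image = np.asarray(image, dtype=float)
    h, w = depth.shape
    out = depth.copy()
    for y, x in zip(*np.nonzero(mask)):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        x0, x1 = max(0, x - radius), min(w, x + radius + 1)
        diff = np.abs(image[y0:y1, x0:x1] - image[y, x]).max(axis=-1)
        # The center pixel always qualifies, so the selection is never empty.
        use = mask[y0:y1, x0:x1] & (diff <= max_color_diff)
        out[y, x] = np.median(depth[y0:y1, x0:x1][use])
    return out
```

Running the outlier step first keeps isolated wrong estimates from influencing the median of their neighbors.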
For bilateral propagation, the parameters were set to σs = 10 and
σc = 7, and Ω was defined as a circle of radius 30.
6. ACKNOWLEDGMENTS
This work was partially supported by grant 10-01-00697-a from
the Russian Foundation for Basic Research.
7. REFERENCES
[1] Canny J., “A computational approach to edge detection,”
IEEE Trans. Pattern Anal. Mach. Intell., vol. 8, no. 6, pp. 679-698, 1986.
[2] Saxena A., Chung S. H., and Ng A. Y., “Learning depth from
single monocular images,” in Neural Information Processing
Systems (NIPS) 18, 2005.
[3] Scharstein D. and Szeliski R., “A taxonomy and evaluation
of dense two-frame stereo correspondence algorithms,”
International Journal of Computer Vision (IJCV), vol. 47, 2002.
[4] Shao M., Simchony T., and Chellappa R., “New algorithms
for reconstruction of a 3-d depth map from one or more images,”
in Proc. IEEE CVPR, 1988.
[5] Zhuo S. and Sim T., “On the Recovery of Depth from a
Single Defocused Image,” in Computer Analysis of Images and
Patterns, Springer, 2009.
About the author
Dmitriy Vatolin (M’06) received his M.S. degree in 1996 and his
Ph.D. in 2000, both from Moscow State University. Currently he
is head of the Video Group at the CS MSU Graphics & Media
Lab. He is author of the book Methods of Image Compression
(Moscow, VMK MGU, 1999), co-author of the book Methods of
Data Compression (Moscow, Dialog-MIFI, 2002) and co-founder
of www.compression.ru and www.compression-links.info. His
research interests include compression techniques, video
processing and 3D video techniques: optical flow, depth estimation
from motion, focus and other cues, video matting, background
restoration, and high-quality stereo generation. His contact email is
dmitry@graphics.cs.msu.ru.
Figure 5: Source images (column 1) and estimated depth maps
(column 2) using proposed method.
Figure 5 shows the results of the algorithm for real images. Rows
1 and 3 illustrate that the proposed method preserves thin
details of the source image, providing a correct depth map.
In Figure 6, the proposed algorithm is compared with that of [5],
which produced a coarse layered depth map with some layers not
corresponding well to edges in the source image. By contrast, the
results of the proposed algorithm are more accurate, continuous
and detailed, while preserving object boundaries.
5. CONCLUSION
This paper presents a novel approach for depth map estimation
from a single image. The blur amount is calculated from the
Maxim Smirnov is the chief technology officer at the YUVsoft IT
company. Maxim graduated from Saint Petersburg State
University of Aerospace Instrumentation in 2001 and received a
Ph.D. in data compression from the same university in 2005. His
research interests include forecasting of processes in technical and
economic systems, video and universal data compression, and stereo
video. His contact email is ms@yuvsoft.com.
Dmitry Akimov is a student at Moscow State University’s
Department of Computational Mathematics and Cybernetics. His
contact email is dakimov@graphics.cs.msu.ru.