Katrina Sokolova
Barak Shilo

Though lackluster at first glance, the disparity map forms the basis of many computer vision techniques. A disparity map is effectively a limited 3D model. Applications range from classic vision goals (e.g. feature detection) to graphics-motivated goals such as scene rendering. We explore potential applications of the disparity map and the means of implementation. In the process, we assess the performance of the Zitnick-Kanade stereo matching algorithm and extend the software, allowing the visual isolation of a specified depth range.

Introduction

Depth and edge detection are important parts of image analysis in the study of computer vision. Image processing involves extracting information about the shapes, distances, textures, and colors that compose an image, and using this data to classify the objects in it. We focus on depth detection, using image disparity to approximate the distances between objects in a scene. We experimentally verify the triangulation technique underlying the stereo matching algorithm using the ZKS software.

A disparity map is a depth map where the depth information is derived from offset images of the same scene. Depth maps can be generated using various other methods, such as time-of-flight (sonic, infrared, laser), which we will not explore here. Although these active methods can often produce far more accurate maps at short distances, the passive method we use has its benefits, including applicability at long distances.

One computer graphics application that makes use of a disparity map is the synthesis of intermediate frames between two offset images. To produce an intermediate frame, the Painter's Algorithm can be used on a pixel level:

  1. Choose a multiplying factor, m, by which to translate the original left image, with 0 < m < 1
  2. Starting with the deepest pixels (those with the least disparity), translate each pixel m * Dp pixels to the right, where Dp is the pixel's disparity. Newly translated (nearer) pixels overwrite previous ones.

The process is straightforward and can produce good results given an accurate disparity map. Of course, it can be improved. Using pixel information from both images would fill in the gaps for areas that are occluded in the left image.
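
To make the procedure concrete, here is a minimal sketch of the per-pixel painter's algorithm in C++ (the names, flat grayscale buffers, and rounding are our own choices, not code from any of the software discussed here):

#include <algorithm>
#include <cstdint>
#include <vector>

// One source pixel with its disparity and grayscale value.
struct Pixel { int x, y; float disparity; uint8_t value; };

// Synthesize an intermediate frame by shifting each left-image pixel
// m * disparity pixels to the right, painting the deepest pixels
// (lowest disparity) first so nearer pixels overwrite farther ones.
std::vector<uint8_t> intermediateFrame(const std::vector<uint8_t>& left,
                                       const std::vector<float>& disparity,
                                       int width, int height, float m)
{
    std::vector<Pixel> pixels;
    pixels.reserve(left.size());
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            pixels.push_back({x, y, disparity[y * width + x],
                              left[y * width + x]});

    // Deepest (least disparate) pixels first.
    std::sort(pixels.begin(), pixels.end(),
              [](const Pixel& a, const Pixel& b)
              { return a.disparity < b.disparity; });

    std::vector<uint8_t> frame(left.size(), 0); // unfilled gaps stay black
    for (const Pixel& p : pixels) {
        int nx = p.x + static_cast<int>(m * p.disparity + 0.5f);
        if (nx >= 0 && nx < width)
            frame[p.y * width + nx] = p.value;
    }
    return frame;
}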

Undergraduates Vu and Smulski compared Phase-Correlation and Multiple-baseline Stereo to calculate motion translation (i.e. disparity), with the goal of producing intermediate frames. They found Multiple-baseline Stereo (Kanade) to be more effective [2].

Triangulation Geometry

The technique for gauging depth information given two offset images is called triangulation. Triangulation makes use of a number of variables: the center points of the cameras (C1, C2), the cameras' focal lengths (F), the angles (O1, O2), the image planes (IP1, IP2), and the image points (P1, P2).

The following examples show how the triangulation technique works.

triang.JPG

For any point P of some object in the real world, P1 and P2 are the pixel representations of P in the images IP1 and IP2 as taken by cameras C1 and C2. F is the focal length of the camera (the distance between lens and film). B is the offset distance between cameras C1 and C2. V1 and V2 are the horizontal placements of the pixel points with respect to the center of each camera. The disparity of the points P1 and P2 from image to image can be calculated by taking the difference of V1 and V2; this is equivalent to the horizontal shift of point P1 to P2 in the image planes. Using this disparity, one can calculate the actual distance of the point in the real world. The following formula can be derived from the geometric relation above:

Where $D :=$ Distance of point in real world, $b :=$ base offset, $f :=$ focal length of camera, and $d :=$ disparity:

\begin{equation} D = \frac{b f}{d} \tag{1} \end{equation}
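
For completeness, here is a sketch of the standard similar-triangles derivation (assuming a coordinate convention of our choosing: $C_1$ at the origin, $C_2$ offset by $b$, and the real-world point $P$ at horizontal offset $x$ and depth $D$). Each camera projects $P$ onto its image plane:

\begin{equation} v_1 = \frac{f x}{D}, \qquad v_2 = \frac{f (x - b)}{D} \end{equation}

Subtracting gives the disparity, which rearranges into equation (1):

\begin{equation} d = v_1 - v_2 = \frac{f b}{D} \quad\Rightarrow\quad D = \frac{b f}{d} \end{equation}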

This formula allows us to calculate the real-world distance of a point. If we are interested in the relative distance of points rather than their exact distance, we need even less information. The base offset and focal length of the camera are the same for both images, so the computed distance of different points varies solely with the disparity component. Therefore we can gauge the relative distance of points in images without knowing the base offset or focal length.

Triangulation works under the assumption that points P1 and P2 represent the same point P in the real world. An algorithm must therefore match these two points. This can be done by taking small regions in one image and comparing them to regions in the other image. Each comparison is given a score, and the best match is used in calculating the disparity. The technique for scoring region matches varies, but it is usually based on the number of pixels that agree on an exact or near-exact basis. Both the triangulation technique for stereo image matching and the technique for point matching within a region are implemented in the "Cooperative Algorithm for Stereo Matching and Occlusion Detection" [1].
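
To illustrate region matching concretely, here is a minimal sketch of window scoring with sums of absolute differences (SAD), the scoring method the ZKS configuration later selects with USE_SAD 1. The flat-buffer layout and names are ours, and for brevity the window and candidate disparities are assumed to stay within image bounds:

#include <cstdint>
#include <cstdlib>
#include <limits>
#include <vector>

// Score a (2r+1) x (2r+1) window centered at (x, y) in the left image
// against the candidate position (x - d, y) in the right image.
// Lower scores mean better matches.
long sadScore(const std::vector<uint8_t>& left,
              const std::vector<uint8_t>& right,
              int width, int x, int y, int d, int r)
{
    long score = 0;
    for (int dy = -r; dy <= r; ++dy)
        for (int dx = -r; dx <= r; ++dx)
            score += std::abs(int(left[(y + dy) * width + (x + dx)]) -
                              int(right[(y + dy) * width + (x + dx - d)]));
    return score;
}

// Best disparity for one pixel: try every candidate in [minD, maxD]
// and keep the one with the lowest SAD score.
int bestDisparity(const std::vector<uint8_t>& left,
                  const std::vector<uint8_t>& right,
                  int width, int x, int y, int minD, int maxD, int r)
{
    int best = minD;
    long bestScore = std::numeric_limits<long>::max();
    for (int d = minD; d <= maxD; ++d) {
        long s = sadScore(left, right, width, x, y, d, r);
        if (s < bestScore) { bestScore = s; best = d; }
    }
    return best;
}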

Zitnick-Kanade Stereo Algorithm

ZK Stereo (the Cooperative Algorithm for Stereo Matching and Occlusion Detection [1]) is a program developed at Carnegie Mellon University. It applies the triangulation technique discussed above.

The program reads in a file that allows the user to specify the two images, the range of disparity, the size of the window for pattern matching, and some other options for refining the stereo output.

params.JPG

The algorithm iterates through all the points of an image. For each point it extracts a window, the size of which is specified by the user. It then compares the window against every candidate location in the other image within the disparity range, giving each comparison a score based on how closely the pixels match. When all points are matched, the program computes disparity values using the triangulation technique for stereo matching.

sample_zks.JPG

The images above are sample input. When run, ZK Stereo outputs a map of the disparity between the two images, represented as a grayscale image where lighter shades mean more disparity (the object is closer) and darker shades mean less disparity (the object is farther away).
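
We do not know the exact scaling ZKS uses internally, but the idea can be sketched by linearly mapping the disparity range to grayscale values (a hypothetical mapping of our own; assumes maxD > minD):

#include <cstdint>
#include <vector>

// Map each disparity in [minD, maxD] to a gray value in [0, 255],
// so more disparity (nearer objects) renders lighter.
std::vector<uint8_t> disparityToGray(const std::vector<float>& disp,
                                     float minD, float maxD)
{
    std::vector<uint8_t> gray(disp.size());
    for (std::size_t i = 0; i < disp.size(); ++i) {
        float t = (disp[i] - minD) / (maxD - minD);
        if (t < 0.0f) t = 0.0f;
        if (t > 1.0f) t = 1.0f;
        gray[i] = static_cast<uint8_t>(t * 255.0f + 0.5f);
    }
    return gray;
}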

Also displayed in the ZK Stereo program is the confidence that the disparity is correct. The confidence is determined by how good a score a point received when its window was matched to the other image. Edges of objects typically have higher confidence, because there is more variance at edges, which yields a higher score when matching against the other image.

The Experiments

First Experiment

pair_st.JPG
disp_st.JPG

Our first experiment with the algorithm produced mediocre results. The resulting disparity map did a good job of separating the layers of depth within the image; however, the grayscale value representing distance from the camera did not match the actual order of objects. For example, the clothespin is almost the same color as the eraser, even though it was much farther from the camera during the experiment. Also, the background is a flat wall parallel to the image plane. Texture and shadows on the wall seem to have deceived the matching algorithm, erroneously giving depth to the surface; moreover, the three farthest objects appear darker than the wall, implying that they are behind the wall.

The reasons the ZKS algorithm did not produce better results include:

  • the presence of shadows in the test photos
  • some surfaces reflected the camera flash non-uniformly
  • the background was a flat, single-colored surface
  • some parts of the first image were occluded in the second image, and pixels without matching pixels were colored black (the program default)

Progressively Increasing NumIterations

One of the parameters to ZKS is NumIterations, which specifies how many times the algorithm is repeated over the disparity map. This has the effect of smoothing and refining the resulting map. Below is a sample of the output over a range of NumIterations values, 0 to 15 (complete results). For this example stereo pair, the disparity map is near optimal at a NumIterations of 8, with subsequent values exhibiting minimal marginal benefit.

coal_(25,46,1,2,1,0,0.96,SAD).dis.bmp
NumIterations: 0
coal_(25,46,1,2,1,8,0.96,SAD).dis.bmp
NumIterations: 8
coal_(25,46,1,2,1,15,0.96,SAD).dis.bmp
NumIterations: 15

CMU/VASC and University of Tsukuba Stereo Pairs

The Carnegie Mellon Vision and Autonomous Systems Center's Image Database is a useful resource for rectified left/right image pairs. The University of Tsukuba also has a great, multi-layered scene. We tested the ZKS algorithm using a few different pairs.

CMU/VASC Pentagon Scene (pentagon.zks)

pentagon_left.bmp pentagon_right.bmp pentagon_(-10,10,1,2,1,8,0.96,SAD).dis.bmp

University of Tsukuba Scene (lamp.zks)

scene_l.bmp scene_r.bmp
lamp_(0,20,1,2,1,8,0.96,SAD).dis.bmp

Stereogram Trials: New Hope for the Magic Eye Challenged!

A Single Image Stereogram (SIS), also known as a Single Image Random Dot Stereogram (SIRDS), is a synthesized image that incorporates stereo data into a single image. When viewed correctly, matching areas within the image merge together, and the disparities associated with them create the illusion of depth.

A SIS is produced using a depth map and a pattern that is repeated at disparity intervals corresponding to the map. Hence, we realized it would be possible to reverse-engineer these stereograms with ZKS. In fact, a SIS is an ideal input for a pixel-based disparity detection algorithm because it is computer generated and therefore doesn't exhibit any of the consistency issues of a pair of photographs (i.e. variation in noise, lighting, color, and alignment). We used a SIS of the Batman emblem from <http://archive.museophile.org/3d/>. We approximated the disparity range manually by counting the pixels between matching areas within the stereogram. Because a SIS is essentially both a left and a right image, only one input image was used in the configuration file. The input and output are below. A reduced version of the SIS was used as input to the program; however, the full-size SIS (pdf) is better suited for human viewing.

batman.bmp batman_(-25,-39,1,2,1,8,0.96,SAD).dis.bmp
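
As an aside, the forward construction that produces such a stereogram can be sketched as follows. This is a naive random-dot generator of our own devising, not the tool used to create the image above, and the depth-to-separation scaling is a hypothetical choice:

#include <cstdint>
#include <cstdlib>
#include <vector>

// Build a random dot stereogram from an 8-bit depth map: each pixel
// copies the pixel one "separation" to its left, where nearer points
// (larger depth values) get a smaller separation, i.e. more disparity
// between matching areas.
std::vector<uint8_t> makeSIRDS(const std::vector<uint8_t>& depth,
                               int width, int height,
                               int patternWidth) // e.g. 64
{
    std::vector<uint8_t> out(width * height);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            int sep = patternWidth - depth[y * width + x] / 8;
            if (x < sep)
                out[y * width + x] = (std::rand() % 2) ? 255 : 0; // seed dots
            else
                out[y * width + x] = out[y * width + x - sep];
        }
    }
    return out;
}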

Adding Color and Motion to Disparity Maps

In "Shape-Time Photography" [3], Freeman and Zhang describe a method to produce a kind of superior multiple-exposure photograph. The goal is a better visual description of spatial relationships among moving objects over time. The method takes into account depth information in order to meaningfully layer objects within the image. Incidentally, Freeman and Zhang make use of the ZK Stereo algorithm.

We wondered what other visual representations could build upon the utility of a raw disparity map. One effective representation, described in our introduction, involves animating a scene by calculating intermediate images. A rudimentary form of this is simply displaying the left and right images in quick succession. These methods attempt to directly simulate depth perception. However, other techniques are also desirable.

Sometimes a certain depth range is of interest, and segregating image information within that depth range is useful. Security applications come to mind. So, we decided to implement a highlight/isolate feature, which is demonstrated below, using the Tsukuba scene (highlighting.zks). We are able to successfully isolate the head and the lamp.

lamp50-60_(0,20,1,2,1,8,0.96,SAD).dis.bmp lamp50-60_(0,20,1,2,1,8,0.96,SAD).iso.bmp
lamp65-80_(0,20,1,2,1,8,0.96,SAD).dis.bmp lamp65-80_(0,20,1,2,1,8,0.96,SAD).iso.bmp
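
A minimal sketch of the isolation step, assuming a float disparity map stored parallel to the image and the percentage range taken relative to the disparity span (the names are ours, not the exact code in our implementation):

#include <cstdint>
#include <vector>

// Black out every pixel whose disparity falls outside the requested
// percentage range [pctLo, pctHi] of the span [minD, maxD].
std::vector<uint8_t> isolateDepthRange(const std::vector<uint8_t>& image,
                                       const std::vector<float>& disp,
                                       float minD, float maxD,
                                       float pctLo, float pctHi)
{
    float lo = minD + (pctLo / 100.0f) * (maxD - minD);
    float hi = minD + (pctHi / 100.0f) * (maxD - minD);
    std::vector<uint8_t> out(image.size(), 0); // default: blacked out
    for (std::size_t i = 0; i < image.size(); ++i)
        if (disp[i] >= lo && disp[i] <= hi)
            out[i] = image[i];
    return out;
}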

Technical Details

A Windows-compatible binary version of the ZKS algorithm is available, but we wanted to have a more convenient version of the software with access to the internals. Among other things, this allowed us to implement the highlighting feature. And, as a consequence, ZKS can now be run on UNIX and Mac OS X platforms. Using the bare algorithm source (stereo.h and stereo.cpp), implemented in C++, we filled in the necessary image-handling routines (our source files are: run-zks.cpp, cimage.h, and cimage.cpp).

The implementation of normalized correlation was buggy, but sums-of-absolute-differences matching worked well, so we simply disabled the former (i.e. "USE_SAD 1" is allowed in the .zks configuration file, "USE_SAD 0" is not).

Software Used

ZK Stereo
C++ source for Zitnick and Kanade's stereo matching algorithm
EasyBMP
C++ library for using bitmap files

Convenience Improvements on ZKS Features

We implemented hyphenated ranges to facilitate batch processing of inputs. This feature was used to generate the 16 outputs for the NumIterations example. Also, having the bmp image files written automatically (rather than resorting to a screenshot) was a nice convenience.
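
For example, the 16 NumIterations outputs above could be produced with a single configuration along these lines (reconstructed from the output filenames; the hyphenated-range syntax is described below):

coal0.bmp coal1.bmp 1
MinDisparity 25
MaxDisparity 46
WinRadL0 1
WinRadRC 2
WinRadD 1
NumIterations 0-15
MaxScaler 0.96
USE_SAD 1
Output File coal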

Using Our Code

Download: Source code (zip)

Compile with:

g++ run-zks.cpp stereo.cpp EasyBMP.cpp cimage.cpp -o run-zks

To generate a disparity map using the sample images:

./run-zks coal.zks

The program outputs three files:

  • The disparity map image (.dis.bmp)
  • The disparity map float values in text format (.dis)
  • An image displaying the confidence values (.conf.bmp)

If the "Highlight" option is used, a .iso.bmp ("iso" is mnemonic for isolated) file is also output, with all non-highlighted regions of the image blacked out.

Our extension of the ZKS software is backward compatible with the .zks configuration file format. New features in the configuration file follow.

Hyphenated Ranges

Numeric inputs can be specified as hyphenated ranges.

For example, the following generates 20 different disparity maps, i.e. (30,50,0.96), (30,50,0.97), (31,51,0.96), …

The "MinMaxDiff" option constrains the disparity ranges to those where, in this example, the MinDisparity and MaxDisparity have a difference of 20 (i.e. (30,50) but not (30,51) or (30,52) etc.). The "MinMaxDiff" option is optional, but if it is included, it must go directly beneath "MaxDisparity." If "MinMaxDiff" wasn't set below, it would produce 10*10*2 = 200 disparity maps.

coal0.bmp coal1.bmp 1
MinDisparity 30-40
MaxDisparity 50-60
MinMaxDiff 20
WinRadL0 1
WinRadRC 2
WinRadD 1
NumIterations 0
MaxScaler 0.96-0.97
USE_SAD 1
Output File coal

Highlighting a Range of Depths

The "Highlight" line takes a "percentage of depth" argument between 1 and 100 (as a hyphenated range), and highlights that range in red in the disparity map. It also outputs a version of the input image with all regions outside of the depth range blacked out. The "Highlight" line must appear directly above the "WinRadL0" line. The following example highlights the nearest half of the depth range:

coal0.bmp coal1.bmp 1
MinDisparity 25
MaxDisparity 46
Highlight 50-100
WinRadL0 2
WinRadRC 2
WinRadD 1
NumIterations 0
MaxScaler 0.96
USE_SAD 1
Output File coal

Inherited Configuration Format

The basic .zks file format, inherited from ZKS, follows. From the README.txt file included with ZK Stereo:

s0 s1 i0
MinDisparity i1
MaxDisparity i2
WinRadL0 i3
WinRadRC i4
WinRadD i5
NumIterations i6
MaxScaler f0
USE_SAD i7
Output File s2

The above may be repeated so the program can do more than one stereo image pair at a time.

s0
Name of the first/reference (right) image. Must be a bmp file.
s1
Name of the second (left) image, must be rectified (column wise) relative to first image.
i0
Number of color bands, 1 for b/w and 3 for rgb.
i1
Minimum disparity value in pixels.
i2
Maximum disparity value in pixels. If i2 - i1 is too large the program will crash due to a shortage of memory. A difference of greater than 80 is not recommended. To make sure disparities are found correctly pad the disparity range by 3 or 5 pixels.
i3
Window radius used for computing the initial match values L0. Recommended values = 1 or 2
i4
Local Support Radius (row-column dimensions) used for averaging match values during each iteration. Recommended values = 1 to 3
i5
Local Support Radius (disparity dimension) used for averaging match values during each iteration. Recommended value = 1
i6
Number of iterations used to refine the disparity map. Recommended values = 5 to 15
f0
Used for linearly scaling the L0 values when SAD is used. The worse the image quality (more noise) the lower the value should be. Recommended values = 0.95 to 0.97
i7
1 if SAD (sums of absolute differences) is to be used for computing the initial match values. 0 if normalized correlation should be used. If image quality is good SAD (a value of 1) is recommended.
s2
Name of the output file which the disparity and confidence values will be printed to.

Conclusion

In this project we explored and enhanced the functionality of the Zitnick-Kanade stereo matching software. We experimented with settings that made the algorithm produce good results, and with settings that caused suboptimal performance. We observed the effect of progressively increasing NumIterations. We ran the algorithm on a random dot stereogram, and the results exceeded our expectations. We successfully implemented additional features such as depth-level highlighting and depth-level isolation.

More Information

http://research.microsoft.com/~larryz/publications.htm
ZKS related research papers (also see "Massive Arabesque" demo video—live depth map generation allows virtual camera panning)

Related Links

from http://www.cs.cmu.edu/~cil/v-source.html

http://www.ai.sri.com/~konolige/svs/
area correlation stereo software in C
http://cat.middlebury.edu/stereo/code.html
"StereoMatcher" a C++ implementation of different stereo algorithms
http://www.hpl.hp.com/personal/mp/research/SVDStereo/stersvd.htm
Singular Value Decomposition — feature matching, includes some source code in C and Matlab
http://www.tina-vision.net/
TINA, a large library of computer vision functions in C
http://www.nottingham.ac.uk/~etzpc/sirds.html
SIRDS — random dot stereograms
http://www.rhythm.com/~keith/autoStereoGrams/
more stereograms, and a program to make your own

Holographic Displays

http://www.3dcgi.com/cooltech/displays/displays.htm
listing of autostereoscopic displays (3D monitors that don't require special glasses)
http://www.holografika.com/
an autostereoscopic display with video demos

Bibliography
1. C. Zitnick and T. Kanade, "A Cooperative Algorithm for Stereo Matching and Occlusion Detection," tech. report CMU-RI-TR-99-35, Robotics Institute, Carnegie Mellon University, October 1999.
3. W. Freeman and H. Zhang, "Shape-Time Photography," Proceedings of CVPR 2003.
