|M. Pawan Kumar|
REGION SELECTION FOR SCENE UNDERSTANDING
M. Pawan Kumar and D. Koller
We consider the problem of simultaneously dividing an image into coherent regions and assigning labels to regions using a global energy function. We form a large dictionary of putative regions using bottom-up over-segmentation techniques and formulate the problem of selecting the regions and their labels as an integer program. We provide an efficient dual decomposition method to solve an accurate linear programming relaxation of the integer program.
Why are Regions Useful?
While pixel-based models provide very accurate solutions for low-level vision problems such as interactive binary segmentation or dense stereo correspondence, they have had limited success in high-level vision. The underlying reason is the noise present in features extracted from regularly shaped patches around the pixels. To overcome this problem, several researchers have suggested the use of regions (a set of coherent, connected pixels) and shown promising results on various applications, such as those shown below.
Region Selection using Energy Minimization
Region-based models are typically employed using a two-step strategy: (i) obtain the regions using a standard over-segmentation approach such as mean-shift; and (ii) assign labels to the region (depth values for 3D reconstruction, semantic classes for segmentation). However, regions obtained in this manner have no information about the high-level task.
We propose to simultaneously obtain the regions and their labels by minimizing a global energy function. This problem is extremely challenging since the number of pixel-to-region assignments is more than exponential in the number of pixels. To overcome this problem, we make use of the fact that the union and intersection of the segments obtained by an over-segmentation approach can provide us with a (large) dictionary of putative regions. We can then cast our problem as a combinatorial optimization task of selecting regions and their labels from the dictionary.
The regions and their labels are selected according to the following criteria: (i) the selected regions must cover the entire image; (ii) no two selected regions should overlap with each other; and (iii) the global energy function should be minimized. This can be cast as an integer program, which allows us to obtain an approximate solution using a linear programming relaxation that replaces the binary variables of the integer program with real variables in the interval [0,1].
Efficient Dual Decomposition
We solve the linear programming relaxation using dual decomposition  by diving the problem into the following subproblems.
We demonstrate the efficacy of our method using scene segmentation as an example application. We compare the results obtained using our regions with other types of regions that are commonly used in computer vision, namely (i) Intersection - the intersection of a set of over-segmentations (in our case, 3 different over-segmentation obtained by changing the parameters of mean-shift); (ii) Segmentation - the best over-segmentation in terms of the energy funciton; and (iii) Gould et al. - the region selection method of . We also show a comparison with the pixel-based model described in .
 D. Hoiem, A. Efros and M. Herbert. Geometric context from a single image. In ICCV, 2005.