M. Pawan Kumar 
HOME

REGION SELECTION FOR SCENE UNDERSTANDING M. Pawan Kumar and D. Koller
Overview We consider the problem of simultaneously dividing an image into coherent regions and assigning labels to regions using a global energy function. We form a large dictionary of putative regions using bottomup oversegmentation techniques and formulate the problem of selecting the regions and their labels as an integer program. We provide an efficient dual decomposition method to solve an accurate linear programming relaxation of the integer program. Why are Regions Useful? While pixelbased models provide very accurate solutions for lowlevel vision problems such as interactive binary segmentation or dense stereo correspondence, they have had limited success in highlevel vision. The underlying reason is the noise present in features extracted from regularly shaped patches around the pixels. To overcome this problem, several researchers have suggested the use of regions (a set of coherent, connected pixels) and shown promising results on various applications, such as those shown below.
Region Selection using Energy Minimization Regionbased models are typically employed using a twostep strategy: (i) obtain the regions using a standard oversegmentation approach such as meanshift; and (ii) assign labels to the region (depth values for 3D reconstruction, semantic classes for segmentation). However, regions obtained in this manner have no information about the highlevel task. We propose to simultaneously obtain the regions and their labels by minimizing a global energy function. This problem is extremely challenging since the number of pixeltoregion assignments is more than exponential in the number of pixels. To overcome this problem, we make use of the fact that the union and intersection of the segments obtained by an oversegmentation approach can provide us with a (large) dictionary of putative regions. We can then cast our problem as a combinatorial optimization task of selecting regions and their labels from the dictionary.
The regions and their labels are selected according to the following criteria: (i) the selected regions must cover the entire image; (ii) no two selected regions should overlap with each other; and (iii) the global energy function should be minimized. This can be cast as an integer program, which allows us to obtain an approximate solution using a linear programming relaxation that replaces the binary variables of the integer program with real variables in the interval [0,1]. Efficient Dual Decomposition We solve the linear programming relaxation using dual decomposition [4] by diving the problem into the following subproblems.
Results We demonstrate the efficacy of our method using scene segmentation as an example application. We compare the results obtained using our regions with other types of regions that are commonly used in computer vision, namely (i) Intersection  the intersection of a set of oversegmentations (in our case, 3 different oversegmentation obtained by changing the parameters of meanshift); (ii) Segmentation  the best oversegmentation in terms of the energy funciton; and (iii) Gould et al.  the region selection method of [3]. We also show a comparison with the pixelbased model described in [3].
References
[1] D. Hoiem, A. Efros and M. Herbert. Geometric context from a single image. In ICCV, 2005.
