Notation. The regions of a dictionary are indexed using r or r'. The intersection of all the regions provides a set of superpixels, which are indexed using s. The label set of the high-level vision task is denoted by L = {1,...,h}. We introduce an augmented label set L' = {0} ∪ L, where the label 0 indicates that the region is not selected. The unary potential of a region r taking a label i in L is denoted by θ_{r}(i). Two regions are neighbors if they share at least one boundary pixel and do not overlap. The pairwise potential of two neighboring regions r and r' taking labels i and j in L, respectively, is denoted by θ_{rr'}(i,j).
Integer Programming Formulation. Using the above notation, the problem of simultaneously selecting the regions and their labels can be cast as the following integer program:
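One way to write a program that matches the notation above and the description that follows is sketched here. The neighbor set $\mathcal{N}$, the shorthand $r \ni s$ (region r contains superpixel s), and the product form of the pairwise term are assumptions made for illustration; an equivalent formulation may instead introduce separate pairwise variables to keep the objective linear.

```latex
\begin{align}
\min_{y} \quad & \sum_{r} \sum_{i \in L} \theta_{r}(i)\, y_{r}(i)
  + \sum_{(r,r') \in \mathcal{N}} \sum_{i,j \in L} \theta_{rr'}(i,j)\, y_{r}(i)\, y_{r'}(j) \\
\text{s.t.} \quad & y_{r}(i) \in \{0,1\}, && \forall r,\ i \in L', \\
& \sum_{i \in L'} y_{r}(i) = 1, && \forall r, \\
& \sum_{r \ni s} \sum_{i \in L} y_{r}(i) = 1, && \forall s.
\end{align}
```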
The binary variable y_{r}(i) indicates whether or not a region r is assigned a label i from the set L'. The objective function corresponds to the energy of the solution specified by y. The constraints specify that each region takes exactly one label from L': it is either not selected (label 0) or, if selected, assigned exactly one label from L. The last constraint ensures that every superpixel is covered by exactly one selected region, so that the entire image is explained and no two overlapping regions are selected.
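To make the roles of the variables and constraints concrete, the following is a minimal brute-force sketch for a toy instance, not the solution method of the paper. The function name `solve_region_selection`, the dictionary-based encoding of regions and potentials, and all numerical values are hypothetical. Each region is given a single label from L', which enforces the one-label-per-region constraint by construction; the coverage constraint is checked explicitly.

```python
from itertools import product


def solve_region_selection(regions, num_labels, unary, pairwise, neighbors):
    """Exhaustively minimize the energy over all feasible labelings.

    regions:   dict mapping region id -> frozenset of superpixel ids
    unary:     dict mapping (r, i) -> potential, for labels i in 1..num_labels
    pairwise:  dict mapping (r, r2, i, j) -> potential (missing entries count as 0)
    neighbors: list of (r, r2) pairs of neighboring regions
    """
    names = sorted(regions)
    superpixels = set().union(*regions.values())
    best_energy, best_labels = float("inf"), None

    # One label in L' = {0, 1, ..., h} per region; label 0 means "not selected".
    for assignment in product(range(num_labels + 1), repeat=len(names)):
        labels = dict(zip(names, assignment))
        selected = [r for r in names if labels[r] != 0]

        # Coverage constraint: every superpixel lies in exactly one selected region.
        if any(sum(s in regions[r] for r in selected) != 1 for s in superpixels):
            continue

        energy = sum(unary[r, labels[r]] for r in selected)
        energy += sum(
            pairwise.get((r, r2, labels[r], labels[r2]), 0.0)
            for r, r2 in neighbors
            if labels[r] != 0 and labels[r2] != 0
        )
        if energy < best_energy:
            best_energy, best_labels = energy, labels

    return best_energy, best_labels


# Hypothetical toy instance: three superpixels, four regions, two labels.
regions = {
    "a": frozenset({0, 1}),
    "b": frozenset({2}),
    "c": frozenset({0}),
    "d": frozenset({1, 2}),
}
neighbors = [("a", "b"), ("c", "d")]
unary = {
    ("a", 1): 0.5, ("a", 2): 3.0,
    ("b", 1): 2.0, ("b", 2): 0.5,
    ("c", 1): 0.5, ("c", 2): 2.0,
    ("d", 1): 2.5, ("d", 2): 1.0,
}
# Potts-style penalty of 0.5 when neighboring selected regions disagree.
pairwise = {
    (r, r2, i, j): 0.5
    for r, r2 in neighbors
    for i in (1, 2)
    for j in (1, 2)
    if i != j
}

energy, labels = solve_region_selection(regions, 2, unary, pairwise, neighbors)
print(energy, labels)  # 1.5 {'a': 1, 'b': 2, 'c': 0, 'd': 0}
```

In this instance only the covers {a, b} and {c, d} satisfy the last constraint; the enumeration picks {a, b} with labels 1 and 2. Exhaustive search grows exponentially with the number of regions, so it serves only as an illustration of the feasible set and the objective.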
