M. Pawan Kumar 

SELF-PACED LEARNING FOR LATENT VARIABLE MODELS
M. Pawan Kumar, B. Packer and D. Koller
Overview

We develop an accurate iterative algorithm for learning the parameters of a latent variable model, such as the latent structural SVM. Our approach uses the intuition that the learner should be presented with the training samples in a meaningful order: easy samples first, hard samples later. At each iteration, our approach simultaneously chooses the easy samples and updates the parameters.

Latent Variable Models
Learning Latent Variable Models

Given a training dataset D = {(x_{i},y_{i}), i = 1,2,...,n}, we wish to learn the parameters w of the model. Typically, the parameters are learned by iteratively optimizing a function F(w) in two steps: (i) perform inference over the latent variables; and (ii) update the parameters. Examples of such methods include the expectation-maximization (EM) algorithm for maximizing the likelihood and the concave-convex procedure (CCCP) for the latent structural SVM.

Self-Paced Learning

Self-paced learning modifies the update step such that the new parameters are learned using only a subset of easy samples at each iteration. In more detail, let the original update step involve solving the following optimization problem:

min_{w} f(w), where f(w) = r(w) + Σ_{i} l_{i}(w).

The function r(w) is the regularization term and l_{i}(w) is the loss of the i^{th} sample. Self-paced learning replaces the above problem with the following:

min_{w,v} g(w,v), where g(w,v) = r(w) + Σ_{i} v_{i} l_{i}(w) − (1/K) Σ_{i} v_{i}.

The term v_{i} is a binary variable that indicates whether the i^{th} sample is easy (v_{i} = 1) or not (v_{i} = 0), and K is the self-paced weight that determines how many samples are considered easy. As an illustration, consider the binary classification problem (circles vs. triangles) shown below. Easy samples are shown in green, difficult samples in red. As K decreases, the number of samples deemed easy increases.
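The biconvex structure of g(w,v) suggests alternating minimization. For fixed w, the optimal v has a closed form: v_{i} = 1 exactly when l_{i}(w) < 1/K, which is why decreasing K admits more samples as easy. The Python sketch below illustrates this scheme on a toy regularized least-squares problem with a few corrupted samples; the annealing factor mu, the number of iterations, and the toy problem itself are illustrative assumptions, not details from the paper.

```python
import numpy as np

def update_easy_samples(losses, K):
    """For fixed w, g(w, v) decomposes over samples: each v_i contributes
    v_i * (l_i(w) - 1/K), so the optimal choice is v_i = 1 exactly when
    the sample's loss falls below the threshold 1/K."""
    return (np.asarray(losses) < 1.0 / K).astype(float)

def self_paced_learning(compute_losses, update_params, w, K, mu=1.3, n_iters=10):
    """Alternate between choosing easy samples and updating the parameters,
    decreasing K so that more samples are deemed easy over time."""
    for _ in range(n_iters):
        v = update_easy_samples(compute_losses(w), K)  # pick easy samples
        w = update_params(w, v)                        # update w on that subset
        K /= mu                                        # threshold 1/K grows
    return w

# Toy demonstration: ridge regression where a few samples are corrupted.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=100)
y[:10] += 5.0  # make the first 10 samples "difficult" outliers

def compute_losses(w):
    return (X @ w - y) ** 2  # per-sample squared loss l_i(w)

def update_params(w, v):
    # weighted ridge solution restricted to the currently-easy samples
    A = X.T @ (v[:, None] * X) + 1e-3 * np.eye(3)
    return np.linalg.solve(A, X.T @ (v * y))

w_spl = self_paced_learning(compute_losses, update_params, np.zeros(3), K=1.0)
```

Because the corrupted samples keep a high loss near the true parameters, they stay below the selection threshold and are effectively excluded from the updates.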
The overall self-paced learning algorithm is summarized below.

Results

We tested self-paced learning on four standard machine learning applications that can be formulated as a latent structural SVM, and compare it with the state-of-the-art CCCP algorithm. The details of all the experiments are provided in our NIPS 2010 paper.

Object Classification

Input x is an image. Output y is the category of the object present in the image. For simplicity, we assume that each image contains instances of only a single object category. The hidden variables h model the bounding box of the object in the image. We use the Mammals dataset, which consists of 271 images, each containing one of 6 different mammals. The dataset is split using 90%/10% train/test folds. The figure below shows the imputed bounding boxes during four different iterations of the self-paced learning algorithm. Blue bounding boxes indicate easy samples, while red bounding boxes indicate difficult samples. Note that the algorithm automatically discards those samples whose bounding boxes are incorrect.
The figure below shows the value of the objective function, the training error and the testing error for five different folds of the dataset. Note that self-paced learning outperforms CCCP on all three measures.
Motif Finding

Input x is a DNA sequence. Output y indicates whether x binds to a protein of interest. The hidden variables h model the location of the motif that provides useful cues about the binding affinity of a sequence. We use five randomly selected proteins from the UniProbe dataset. For each protein we have around 40,000 sequences, which we split into five different folds with 50% for training and 50% for testing. The figure below shows the average Hamming distance between the imputed motifs over different iterations. The low value in the initial iterations of self-paced learning indicates that the algorithm is successful in selecting easy samples first. Note that the Hamming distance for the motifs obtained using self-paced learning is lower than that of the motifs obtained using CCCP at convergence.
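The Hamming distance used here counts mismatched positions between equal-length motif strings. A minimal sketch (the pairwise-averaging scheme is our assumption, not necessarily the exact statistic plotted):

```python
def hamming_distance(a, b):
    """Count the positions at which two equal-length motif strings differ."""
    if len(a) != len(b):
        raise ValueError("motifs must have equal length")
    return sum(c1 != c2 for c1, c2 in zip(a, b))

def average_hamming(motifs):
    """Average pairwise Hamming distance over a set of imputed motifs:
    one way to quantify how much the imputations agree with each other."""
    pairs = [(i, j) for i in range(len(motifs)) for j in range(i + 1, len(motifs))]
    return sum(hamming_distance(motifs[i], motifs[j]) for i, j in pairs) / len(pairs)
```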
The figure below shows the mean (over five folds) value of the objective function, the training error and the testing error for five different proteins. Once again, self-paced learning outperforms CCCP on all three measures.
Digit Recognition

Input x is an image. Output y is the digit shown in the image. For simplicity, we consider a binary classification problem where each class is a digit. The hidden variables h model the rotation of the digits in the images. We use the standard MNIST dataset, which specifies a training and testing split for each digit. The figure below shows the relative objective function value (between CCCP and self-paced learning), the training error and the testing error for four different binary classification tasks. While the performance of CCCP and self-paced learning is comparable on three of the tasks, self-paced learning significantly outperforms CCCP on the 1 vs. 7 classification task for several values of the parameter C.
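Inference over the latent rotation amounts to scoring every candidate rotation of the image and keeping the maximizer of w·φ(x, h). The sketch below is illustrative only: it uses 90-degree rotations via np.rot90 and raw pixels as the joint feature map, both stand-ins for the paper's actual rotation set and features.

```python
import numpy as np

def infer_rotation(x, w, rotations=(0, 1, 2, 3)):
    """Infer the latent rotation h of image x by scoring each candidate
    and returning the maximizer of w . phi(x, h). Here phi(x, h) is simply
    the flattened image rotated by h * 90 degrees (a toy feature map)."""
    scores = [float(w @ np.rot90(x, k).ravel()) for k in rotations]
    return rotations[int(np.argmax(scores))]

# An image whose 90-degree rotation aligns with the template w gets h = 1.
x = np.array([[1.0, 2.0], [3.0, 4.0]])
w = np.rot90(x, 1).ravel()  # template matching the rotated image
```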
Noun Phrase Coreference

Input x is a document. Output y is the clustering of the nouns in the document such that each cluster corresponds to a single object. The hidden variables h model a forest over the nouns, with each tree corresponding to a cluster in y. We use the MUC6 dataset, which consists of 60 documents. We use the 50%/50% split into training and testing that was employed in [1]. The figure below shows the relative objective function value (between CCCP and self-paced learning), the training error and the testing error for different values of C. The bottom plot corresponds to the pairwise loss, while the top plot corresponds to the MITRE loss. Once again, self-paced learning outperforms CCCP.
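Given a latent forest h, the clustering y is recovered by treating each tree as one cluster, i.e. grouping together nouns connected by forest edges. A minimal union-find sketch (the integer noun indices and edge-list representation are our illustrative choices):

```python
def clusters_from_forest(n, edges):
    """Recover the clustering y from a latent forest h over n nouns.

    Each tree in the forest corresponds to one cluster, so nouns connected
    by forest edges are merged into the same cluster via union-find."""
    parent = list(range(n))

    def find(i):
        # find the root of i's tree, with path halving for efficiency
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in edges:
        parent[find(a)] = find(b)  # union the two trees

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```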
References
[1] C.-N. Yu and T. Joachims. Learning structural SVMs with latent variables. In ICML, 2009.
