Visual Learning with Weak Supervision


Full-day CVPR 2013 Tutorial - June 23rd



Matthew Blaschko
Center for Visual Computing
École Centrale Paris
Châtenay-Malabry, France
matthew.blaschko "at"

M. Pawan Kumar
Center for Visual Computing
École Centrale Paris
Châtenay-Malabry, France

Ben Taskar
Department of Computer Science
University of Washington
Seattle, USA
taskar "at"


Structured output prediction refers to the task of learning to predict elements of a complex, interdependent output space that correspond to a given input. In recent years, it has had a tremendous impact on computer vision by providing an elegant formulation for systems that perform object detection, semantic segmentation, pose estimation and various other important visual tasks. Training such systems typically requires full annotation of the output to be predicted, such as bounding boxes for object detection, pixel-level labels for segmentation or stick figures for pose estimation (as shown in the images below).
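To make this concrete, a standard max-margin instantiation of structured output prediction (a structured SVM, of the kind covered in the supervised-learning part of the tutorial) scores input-output pairs with a joint feature map and trains the weights so that each ground-truth output outscores every alternative by a task-specific margin. The following is a generic sketch, not necessarily the exact notation used in the slides:

```latex
% Prediction: pick the highest-scoring structured output
y^{\ast} = \operatorname*{argmax}_{y \in \mathcal{Y}} \; w^{\top} \Phi(x, y)

% Training on fully annotated pairs (x_i, y_i), with task loss \Delta:
\min_{w,\, \xi \ge 0} \;\; \tfrac{1}{2}\|w\|^2 + \frac{C}{n}\sum_{i=1}^{n} \xi_i
\quad \text{s.t.} \quad
w^{\top} \Phi(x_i, y_i) \;\ge\; w^{\top} \Phi(x_i, y) + \Delta(y_i, y) - \xi_i
\quad \forall\, y \in \mathcal{Y}
```

Note that this formulation presupposes that the full structured annotation y_i (bounding box, pixel labeling, or stick figure) is observed for every training example.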

[Example images: Object Detection, Semantic Segmentation, Pose Estimation]

Why Should I Learn With Weak Supervision?

Collecting full annotations for all the images and videos in a large dataset is an onerous and expensive task. The sizes of current datasets clearly reflect this fact: while the largest available dataset with pixelwise labels consists of only a few thousand images, the largest dataset with only image-level labels (which indicate the presence or absence of an object category in an image) consists of millions of images.

Instead of relying on small supervised datasets for complex visual tasks, weakly supervised learning allows us to use large, inexpensive datasets. For example, we can use image-level labels for object detection, or bounding-box annotations for pose estimation and semantic segmentation.
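In the latent SVM covered later in the tutorial, the missing part of the annotation (e.g. the object's bounding box when only an image-level label y_i is observed) is treated as a latent variable h that is maximized over during both training and prediction. A generic sketch of the resulting non-convex objective, in the style of the latent structural SVM of Yu and Joachims (details may differ from the tutorial's exact formulation):

```latex
% y_i: observed weak label; h: latent completion (e.g. a bounding box)
\min_{w} \;\; \tfrac{1}{2}\|w\|^2 + \frac{C}{n}\sum_{i=1}^{n}
\Big[ \max_{\hat{y},\,\hat{h}} \big( w^{\top} \Phi(x_i, \hat{y}, \hat{h}) + \Delta(y_i, \hat{y}) \big)
\;-\; \max_{h} \, w^{\top} \Phi(x_i, y_i, h) \Big]
```

The objective is a difference of two convex (max) terms, which is why practical techniques such as alternating minimization over h and w (e.g. the CCCP procedure) are needed to optimize it.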

Size of Datasets

How Will The Tutorial Help Me?

This tutorial is aimed at researchers who wish to use large amounts of weakly supervised training data to learn complex visual models. The introduction covers the basics of supervised learning, so no prior knowledge of structured output prediction is assumed. While the tutorial describes known theoretical results, no proofs are provided; instead, the focus is on the machine learning frameworks themselves and on the practical techniques used to overcome the difficulties presented by medium- to large-scale datasets. Highlights of the tutorial include:

• An overview of learning frameworks in computer vision, and the placement of learning with weak supervision within that landscape.

• An overview of popular datasets with detailed annotations, and some of the tasks and challenges that may be addressed with weak annotations.

• An introduction to state-of-the-art methods for learning from weak annotations.

• Programming demos with downloadable code for all the topics covered in the tutorial.

Tutorial Outline


  • Supervised Learning
    • Max-Margin Methods
    • Probabilistic Methods

  • Motivation for Weak Supervision
    • Overview of Current Datasets
    • Drawbacks of Supervised Learning

  • Latent Support Vector Machine
    • Mathematical Formulation
    • Practical Techniques to Improve Speed
    • Practical Techniques to Improve Accuracy

  • Latent SVM Generalizations
    • Modeling Uncertainty in Latent Variables
    • Max-Margin Min-Entropy Models
    • Dissimilarity Coefficient Learning

  • Posterior Regularization
    • Generative vs. Discriminative Modeling
    • Decomposable Constraints
    • Connections to Generalized Expectations

  • Conclusion
    • Strengths and Weaknesses of Current Methods
    • Open Problems
    • Questions/Comments

Slides


  • Part I: Machine Learning and Structured Prediction in Computer Vision. [PDF]
  • Part II: Loss-based Learning with Weak Supervision. [PDF] [PPT]
  • Part III: Weakly Supervised Learning with Rich Prior Knowledge. [PDF]

Code for Tutorial Experiments

Other Code

Relevant Papers

Links to relevant papers will be available soon.