Computer Vision Misc Reading Group
Wednesdays, 4:30 - 6:00, Intel Lab (Top Floor, Collaborative Innovation Center)

2008 Schedule

Date Presenter Description
1/9/2008 TBA TBA
1/16/2008 Cancelled Scheduling mix-up. This meeting is cancelled.
1/23/2008 Dhruv Batra I'm going to talk about this recent paper from ICCV '07 -- Andrea Frome, Yoram Singer, Fei Sha, Jitendra Malik. Learning Globally-Consistent Local Distance Functions for Shape-Based Image Retrieval and Classification. ICCV 2007.

This is an improvement over their previous work -- Andrea Frome, Yoram Singer, Jitendra Malik. Image Retrieval and Recognition Using Local Distance Functions. NIPS 2006. A good background reference (which I won't necessarily talk about) is -- M. Schultz and T. Joachims, Learning a Distance Metric from Relative Comparisons, NIPS 2003.

1/30/2008 Dave Bradley I will be talking about deep learning in general and Restricted Boltzmann Machines (RBMs) in particular. Deep learning refers to learning machines that have several intermediate representation layers. Traditionally, with the exception of convolutional neural networks, these deep machines have been hard to train. Recently, however, greedy layer-wise pretraining with unsupervised data has been found to produce good results for a variety of deep architectures. I will be covering some recent work presented at NIPS that uses RBMs in deep architectures to model natural images (Osindero and Hinton 2007) and to recognize the orientation of faces (Salakhutdinov and Hinton 2007). I will also cover an interesting empirical evaluation of RBMs that was presented at ICML (Larochelle et al. 2007) and takes a more critical view of the challenges remaining in deep learning.

Papers

Osindero, S. and Hinton, G. E. Modeling image patches with a directed hierarchy of Markov random fields. Advances in Neural Information Processing Systems 20, 2007

Salakhutdinov, R. R. and Hinton, G. E. Using Deep Belief Nets to Learn Covariance Kernels for Gaussian Processes. Advances in Neural Information Processing Systems 20, 2007

Hugo Larochelle, Dumitru Erhan, Aaron Courville, James Bergstra and Yoshua Bengio, An Empirical Evaluation of Deep Architectures on Problems with Many Factors of Variation, International Conference on Machine Learning proceedings, 2007

2/6/2008 Henry Kang I will talk about learning a new kernel from multiple kernels and its application to recognition tasks. Support Vector Machines are widely used in the computer vision community, and recent research has focused on engineering good kernels for each recognition task, such as the Pyramid Matching Kernel and the Spatial Pyramid Matching Kernel. Two papers accepted at ICCV 2007 discuss generating a new kernel by learning from multiple kernels. The key idea is that different kernels may offer different trade-offs between discriminative power and invariance, so it is advantageous to combine them linearly into a new kernel.

Application Papers:

M. Varma and D. Ray. Learning the discriminative power-invariance trade-off. International Conference on Computer Vision, October 2007.

Ankita Kumar, Cristian Sminchisescu. Support Kernel Machines for Object Recognition. ICCV 2007

Background in Machine Learning community:

O. Chapelle, V. Vapnik, O. Bousquet, and S. Mukherjee. Choosing multiple parameters for Support Vector Machines. Machine Learning, 46:131-159, 2002.

A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet. More efficiency in multiple kernel learning. In ICML, 2007.

F. R. Bach, G. R. G. Lanckriet, and M. I. Jordan. Multiple kernel learning, conic duality, and the SMO algorithm. In NIPS, 2004.
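
The linear combination of kernels described in the 2/6 talk summary can be illustrated with a minimal sketch. This is not the method of any of the papers listed above: the weights here are fixed by hand, whereas the papers learn them, and all function names, the choice of base kernels, and the sample points are assumptions made for illustration only.

```python
import math

def linear_kernel(x, y):
    # Plain dot product between two feature vectors.
    return sum(a * b for a, b in zip(x, y))

def rbf_kernel(x, y, gamma=0.5):
    # Gaussian (RBF) kernel; gamma is an arbitrary illustrative value.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def combined_kernel(x, y, kernels, weights):
    # Weighted sum of base kernels. Non-negative weights keep the
    # combination a valid (positive semi-definite) kernel.
    if any(w < 0 for w in weights):
        raise ValueError("kernel weights must be non-negative")
    return sum(w * k(x, y) for k, w in zip(kernels, weights))

def gram_matrix(points, kernels, weights):
    # Pairwise kernel values, e.g. for an SVM that accepts a precomputed kernel.
    return [[combined_kernel(p, q, kernels, weights) for q in points]
            for p in points]

# Hypothetical 2-D feature vectors and hand-picked weights.
points = [(0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
K = gram_matrix(points, [linear_kernel, rbf_kernel], [0.3, 0.7])
```

In a multiple kernel learning setting, the weights would be optimized jointly with the classifier rather than set by hand as they are here.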

2/13/2008 Sanjeev Koppal The paper I am presenting is: Multi-View Stereo for Community Photo Collections

It can be found at: http://grail.cs.washington.edu/projects/mvscpc/

2/20/2008 Ankur Datta I am presenting the following cute ICCV 2007 paper:

Synthetic Aperture Tracking: Tracking through Occlusions

Abstract: Occlusion is a significant challenge for many tracking algorithms. Most current methods can track through transient occlusion, but cannot handle significant extended occlusion when the object's trajectory may change significantly. We present a method to track a 2D object through significant occlusion using multiple nearby cameras (e.g., a camera array). When an occluder and object are at different depths, different parts of the object are visible or occluded in each view due to parallax. By aggregating across these views, the method can track even when any individual camera observes very little of the target object. Implementationwise, the methods are straightforward and build upon established single-camera algorithms. They do not require explicit modeling or reconstruction of the scene and enable tracking in complex, dynamic scenes with moving cameras. Analysis of accuracy and robustness shows that these methods are successful when upwards of 70% of the object is occluded in every camera view. To the best of our knowledge, this system is the first capable of tracking in the presence of such significant occlusion.

2/27/2008 (Slides) Jean-Francois Lalonde Continuing the ICCV '07 review, I will be talking about the following paper by Yael Pritch et al.:

Webcam synopsis: peeking around the world

Here is the abstract:

The world is covered with millions of webcams. Some are private, but many transmit everything in their field of view over the internet 24 hours a day. A web search finds public webcams in airports, intersections, classrooms, parks, shops, ski resorts, and more. These public webcams are an endless resource, and some sites are already mapping them by location or by functionality. But when a webcam is selected - most of the video broadcast will be of no interest due to lack of activity. We propose to generate a short video that will be a synopsis of an infinite video stream, such as generated by a webcam. We would like to address queries like "I would like to watch in one minute the highlights of this camera broadcast during the past day". The two major phases are: (i) An online conversion of the video stream into a searchable structure based on objects and activities (rather than frames). (ii) A response phase, generating the video synopsis as a response to the user's query. To include maximum information in a short synopsis we simultaneously show activities that may have happened at different times. The synopsis video can also be used as an index into the original video stream, restoring the chronological order.

3/5/2008 Stano Funiak In my talk, I will cover two papers. First, I will briefly review the Maximum Variance Unfolding (MVU) method for dimensionality reduction:

Learning a kernel matrix for nonlinear dimensionality reduction. K. Q. Weinberger, F. Sha, and L. K. Saul. ICML 2004.

Then I will talk about a recent paper that allows one to integrate side information (such as class labels) into the embedding and provides a theoretical justification for MVU:

Colored Maximum Variance Unfolding. Le Song, Alex Smola, Karsten Borgwardt, Arthur Gretton. NIPS 2007. http://books.nips.cc/papers/files/nips20/NIPS2007_0492.extra.zip

3/19/2008 Jonathan Huang I'll present: A novel set of rotationally and translationally invariant features for images based on the non-commutative bispectrum by Risi Kondor

Abstract: We propose a new set of rotationally and translationally invariant features for image or pattern recognition and classification. The new features are cubic polynomials in the pixel intensities and provide a richer representation of the original image than most existing systems of invariants. Our construction is based on the generalization of the concept of bispectrum to the three-dimensional rotation group SO(3), and a projection of the image onto the sphere.

3/26/2008 Yaser Sheikh I'll be presenting http://www.cs.rutgers.edu/%7Eelgammal/pub/LeeICCV07ViewPostureManifold.pdf, Lee and Elgammal, from ICCV 2007.
4/2/2008 Christopher Geyer Here's the paper and abstract:

Deformable Template As Active Basis by Ying Nian Wu, Zhangzhang Si, Chuck Fleming, and Song-Chun Zhu in ICCV 2007 (this paper won honorable mention at ICCV '07)

Research homepage: http://www.stat.ucla.edu/~ywu/ActiveBasis.html

This article proposes an active basis model and a shared pursuit algorithm for learning deformable templates from image patches of various object categories. In our generative model, a deformable template is in the form of an active basis, which consists of a small number of Gabor wavelet elements at different locations and orientations. These elements are allowed to slightly perturb their locations and orientations before they are linearly combined to generate each individual training or testing example. The active basis model can be learned from training image patches by the shared pursuit algorithm. The algorithm selects the elements of the active basis sequentially from a dictionary of Gabor wavelets. When an element is selected at each step, the element is shared by all the training examples, in the sense that a perturbed version of this element is added to improve the encoding of each example. Our model and algorithm are developed within a probabilistic framework that naturally embraces wavelet sparse coding and random field.

4/9/2008 Minh Hoai Nguyen Title: Image-based Shaving

Authors: Minh Hoai Nguyen, Jean-Francois Lalonde, Alexei Efros, Fernando De la Torre.

Abstract: Many categories of objects, such as human faces, can be naturally viewed as a composition of several different layers. For example, a bearded face with glasses can be decomposed into three layers: a layer for glasses, a layer for the beard and a layer for other permanent facial features. While modeling such a face with a linear subspace model could be very difficult, layer separation allows for easy modeling and modification of certain structures while leaving others unchanged. In this paper, we present a method for automatic layer extraction and its applications to face synthesis and editing. Layers are automatically extracted by utilizing the differences between subspaces and modeled separately. We show that our method can be used for tasks such as beard removal (virtual shaving), beard synthesis, and beard transfer, among others.

PDF: http://graphics.cs.cmu.edu/projects/imageshaving/nguyen_eurographics_08.pdf

4/16/2008 Gunhee Kim This talk serves as my speaking requirement for the MS degree.

Thesis: Link analysis techniques for object modeling and recognition

My thesis is based on the following two papers.

1. Gunhee Kim, Christos Faloutsos, and Martial Hebert, "Unsupervised Modeling of Object Categories Using Link Analysis Techniques", CVPR 2008. (Accepted for Oral) (available at http://www.cs.cmu.edu/~gunhee/publish/cvpr08_gunhee.pdf )

2. Gunhee Kim, Christos Faloutsos, and Martial Hebert, "Modeling and Recognition of Object Categories with Combination of Topic Contents and Geometric Similarity Links", ECCV 2008. (Submitted)

A copy of the thesis is available; see email for details. (The link will be available from Tuesday. Since the ECCV paper is still under review, *please do not redistribute!*)

Abstract:

This paper proposes a novel approach for unsupervised modeling and recognition of object categories in which we first build a large-scale complex network which captures the interactions of all unit visual features across the entire training set and we infer information, such as which features are in which categories, directly from the graph by using link analysis techniques. The link analysis techniques are based on well-established graph mining techniques used in diverse applications such as the WWW, bioinformatics, and social networks. The techniques operate directly on the patterns of connections between features in the graph rather than on statistical properties, e.g., from clustering in feature space. We argue that the resulting techniques are simpler, and we show that they perform similarly or better compared to state-of-the-art techniques on both common and more challenging data sets. Also, we extend this link analysis idea to combine it with the statistical framework of topic contents. By doing so, our approach not only dramatically increases performance but also provides feasible solutions to some persistent problems of statistical topic models based on bag-of-words representations, such as modeling of geometric information, computational complexity, and the inherent ambiguity of visual words. Our approach can be incorporated in any generative model, but here we consider two popular models, pLSA and LDA. Experimental results show that the topic models updated by adding link analysis terms significantly outperform the standard pLSA and LDA models. Furthermore, we present competitive performance on unsupervised modeling, classification, and localization tasks with datasets such as MSRC and PASCAL2005.

Thesis Committee Members: Martial Hebert (Chair), Christos Faloutsos (CSD), Marius Leordeanu

4/23/2008 Santosh Kumar Divvala I will be presenting a small paper that we contributed to the CVPR 2008 workshop.

"Using lots of unlabelled data to help single-view geometry estimation" Abstract: We describe a preliminary investigation of utilising large amounts of unlabelled image data to help in the estimation of rough scene layout. We take the single-view geometry estimation system of Hoiem et al. as the baseline and see if it is possible to improve its performance by considering a set of similar scenes gathered from the web. The two complementary approaches being considered are 1) improving surface classification by using average geometry estimated from the matches, and 2) improving surface segmentation by injecting segments generated from the average of the matched images. The system is evaluated using the labelled 300-image dataset of Hoiem et al. and shows promising results.

I am still in the process of updating the camera-ready copy of the paper. However, I have put up a version exclusively for tomorrow's Misc-read audience (see email for details).

4/30/2008 Pyry Matikainen TBA
5/7/2008 David Lee TBA
5/14/2008 Marius Leordeanu TBA
5/21/2008 Tom Stepleton TBA

This file is located at: /afs/cs/project/vmr/www/misc_read/