Vision and Mobile Robotics Laboratory | Publications
2006 |
Annotation: "Defects experienced during construction are costly and preventable. However, inspection programs employed today cannot adequately detect and manage defects that occur on construction sites, as they are based on measurements at specific locations and times, and are not integrated into complete electronic models. Emerging sensing technologies and project modeling capabilities motivate the development of a formalism that can be used for active quality control on construction sites. In this paper, we outline a process of acquiring and updating detailed design information, identifying inspection goals, inspection planning, as-built data acquisition and analysis, and defect detection and management. We discuss the validation of this formalism based on four case studies." . |
Annotation: "In this research we address the problem of classification and labeling of regions given a single static natural image. Natural images exhibit strong spatial dependencies, and modeling these dependencies in a principled manner is crucial to achieve good classification accuracy. In this work, we present Discriminative Random Fields (DRFs) to model spatial interactions in images in a discriminative framework based on the concept of Conditional Random Fields proposed by Lafferty et al. (2001). The DRFs classify image regions by incorporating neighborhood spatial interactions in the labels as well as the observed data. The DRF framework offers several advantages over the conventional Markov Random Field (MRF) framework. First, the DRFs allow relaxing the strong assumption of conditional independence of the observed data generally made in the MRF framework for tractability. This assumption is too restrictive for a large number of applications in computer vision. Second, the DRFs derive their classification power by exploiting probabilistic discriminative models instead of the generative models used for modeling observations in the MRF framework. Third, the interaction in labels in DRFs is based on the idea of pairwise discrimination of the observed data, making it data-adaptive instead of fixed a priori as in MRFs. Finally, all the parameters in the DRF model are estimated simultaneously from the training data, unlike the MRF framework where the likelihood parameters are usually learned separately from the field parameters. We present preliminary experiments with man-made structure detection and binary image restoration tasks, and compare the DRF results with the MRF results." . |
Annotation: "Image understanding requires not only individually estimating elements of the visual world but also capturing the interplay among them. In this paper, we provide a framework for placing local object detection in the context of the overall 3D scene by modeling the interdependence of objects, surface orientations, and camera viewpoint. Most object detection methods consider all scales and locations in the image as equally likely. We show that with probabilistic estimates of 3D geometry, both in terms of surfaces and world coordinates, we can put objects into perspective and model the scale and location variance in the image. Our approach reflects the cyclical nature of the problem by allowing probabilistic object hypotheses to refine geometry and vice versa. Our framework allows painless substitution of almost any object detector and is easily extended to include other aspects of image understanding. Our results confirm the benefits of our integrated approach." . |
Annotation: "We present an efficient method for maximizing energy functions with first and second order potentials, suitable for MAP labeling estimation problems that arise in undirected graphical models. Our approach is to relax the integer constraints on the solution in two steps. First we efficiently obtain the relaxed global optimum following a procedure similar to the iterative power method for finding the largest eigenvector of a matrix. Next, we map the relaxed optimum on a simplex and show that the new energy obtained has a certain optimal bound. Starting from this energy we follow an efficient coordinate ascent procedure that is guaranteed to increase the energy at every step and converge to a solution that obeys the initial integral constraints. We also present a sufficient condition for ascent procedures that guarantees the increase in energy at every step. " . |
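The first stage described in this abstract, obtaining the relaxed global optimum of a quadratic energy with a procedure similar to the power method, can be sketched as follows. This is a minimal illustration under our own assumptions (the function name and example matrix are hypothetical); the simplex mapping and the discrete coordinate-ascent stages of the paper are not shown.

```python
import numpy as np

def relaxed_map(M, iters=200):
    """Relaxed maximization of x^T M x over unit-norm x via power
    iteration: repeatedly multiply by M and renormalize, converging
    to the leading eigenvector (hypothetical helper, not the paper's
    full algorithm)."""
    x = np.full(M.shape[0], 1.0 / np.sqrt(M.shape[0]))
    for _ in range(iters):
        x = M @ x
        x /= np.linalg.norm(x)
    return x

# Symmetric non-negative 'potential' matrix; the relaxed optimum is
# its leading eigenvector [1, 1] / sqrt(2).
M = np.array([[2.0, 1.0], [1.0, 2.0]])
x = relaxed_map(M)
print(x)   # ≈ [0.707, 0.707]
```

After this relaxed solution is found, the paper maps it onto a simplex and runs a coordinate ascent that restores the integer label constraints.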
Annotation: "We introduce a method for object class detection and localization which combines regions generated by image segmentation with local patches. Region-based descriptors can model and match regular textures reliably, but fail on parts of the object which are textureless. They also cannot repeatably identify interest points on their boundaries. By incorporating information from patch-based descriptors near the regions into a new feature, the Region-based Context Feature (RCF), we can address these issues. We apply Region-based Context Features in a semi-supervised learning framework for object detection and localization. This framework produces object-background segmentation masks of deformable objects. Numerical results are presented for pixel-level performance. " . |
Annotation: "Occlusion boundaries are notoriously difficult for many patch-based computer vision algorithms, but they also provide potentially useful information about scene structure and shape. Using short video clips, we present a novel method for scoring the degree to which edges exhibit occlusion. We first utilize a spatio-temporal edge detector which estimates edge strength, orientation, and normal motion. By then extracting patches from either side of each detected (possibly moving) edgelet, we can estimate and compare motion to determine if occlusion is present. This completely local, bottom-up approach is intended to provide powerful low-level information for use by higher-level reasoning methods." . |
Annotation: "We describe an extension to ordinary patch-based edge detection in images using spatio-temporal volumetric patches from video. The inclusion of temporal information enables us to estimate motion normal to edges in addition to edge strength and spatial orientation. The method can handle complex edges in clutter by comparing distributions of data on either half of an extracted patch, rather than modeling the intensity profile of the edge. An efficient approach is provided for building the necessary histograms which samples candidate edge orientations and motions. Results are compared to classical spatio-temporal filtering techniques. " . |
Annotation: "For real-time stereo vision systems, the standard method for estimating sub-pixel stereo disparity given an initial integer disparity map involves fitting parabolas to a matching cost function aggregated over rectangular windows. This results in a phenomenon known as pixel-locking, which produces artificially peaked histograms of sub-pixel disparity. These peaks correspond to the introduction of erroneous ripples or waves in the 3D reconstruction of truly flat surfaces. Since stereo vision is a common input modality for autonomous vehicles, these inaccuracies can pose a problem for safe, reliable navigation. This paper proposes a new method for sub-pixel stereo disparity estimation, based on ideas from Lucas-Kanade tracking and optical flow, which substantially reduces the pixel-locking effect. In addition, it has the ability to correct much larger initial disparity errors than previous approaches and is more general as it applies not only to the ground plane. We demonstrate the method on synthetic imagery as well as real stereo data from an autonomous outdoor vehicle." . |
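The standard parabola-fitting baseline that this abstract critiques can be sketched as follows. This is a hedged illustration (the function name and cost curve are our own); the paper's Lucas-Kanade-based refinement, which avoids the pixel-locking artifact, is not shown.

```python
import numpy as np

def subpixel_parabola(cost, d):
    """Refine an integer disparity d by fitting a parabola through the
    matching costs at d-1, d, d+1 and returning the parabola's minimum
    (the standard baseline; hypothetical helper, not the paper's method)."""
    c0, c1, c2 = cost[d - 1], cost[d], cost[d + 1]
    denom = c0 - 2.0 * c1 + c2
    if denom == 0:            # flat cost curve: no refinement possible
        return float(d)
    return d + 0.5 * (c0 - c2) / denom

# Example: a quadratic cost curve whose true minimum lies at 5.3
costs = np.array([(x - 5.3) ** 2 for x in range(10)])
print(subpixel_parabola(costs, 5))   # 5.3
```

Because real cost curves are not exactly parabolic, this estimator biases sub-pixel results toward integer disparities, which is the pixel-locking effect the paper addresses.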
Annotation: "Despite the fact that color is a powerful cue in object recognition, the extraction of scale-invariant interest regions from color images frequently begins with a conversion of the image to grayscale. The isolation of interest points is then completely determined by luminance, and the use of color is deferred to the stage of descriptor formation. This seemingly innocuous conversion to grayscale is known to suppress saliency and can lead to representative regions being undetected by procedures based only on luminance. Furthermore, grayscaled images of the same scene under even slightly different illuminants can appear sufficiently different as to affect the repeatability of detections across images. We propose a method that combines information from the color channels to drive the detection of scale-invariant keypoints. By factoring out the local effect of the illuminant using an expressive linear model, we demonstrate robustness to a change in the illuminant without having to estimate its properties from the image. Results are shown on challenging images from two commonly used color constancy datasets. " . |
Annotation: "An important task in the analysis and reconstruction of curvilinear structures from unorganized 3-D point samples is the estimation of tangent information at each data point. Its main challenges are in (1) the selection of an appropriate scale of analysis to accommodate noise, density variation and sparsity in the data, and in (2) the formulation of a model and associated objective function that correctly expresses their effects. We pose this problem as one of estimating the neighborhood size for which the principal eigenvector of the data scatter matrix is best aligned with the true tangent of the curve, in a probabilistic sense. We analyze the perturbation on the direction of the eigenvector due to finite samples and noise using the expected statistics of the scatter matrix estimators, and employ a simple iterative procedure to choose the optimal neighborhood size. Experiments on synthetic and real data validate the behavior predicted by the model, and show competitive performance and improved stability over leading polynomial-fitting alternatives that require a preset scale. " . |
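The core estimator this abstract builds on, the principal eigenvector of the local scatter matrix, can be sketched as follows. This is a minimal sketch under our own naming; the paper's contribution, selecting the neighborhood size automatically from perturbation statistics, is omitted and a fixed radius is assumed instead.

```python
import numpy as np

def tangent_estimate(points, center, radius):
    """Estimate the curve tangent at `center` as the principal
    eigenvector of the scatter matrix of the points within `radius`
    (hypothetical helper with a preset scale, which is exactly what
    the paper's automatic scale selection avoids)."""
    nbrs = points[np.linalg.norm(points - center, axis=1) <= radius]
    scatter = np.cov(nbrs.T)
    vals, vecs = np.linalg.eigh(scatter)
    return vecs[:, np.argmax(vals)]   # eigenvector of largest eigenvalue

# Noisy samples along the x-axis in 3-D
rng = np.random.default_rng(0)
pts = np.column_stack([np.linspace(-1, 1, 200),
                       rng.normal(0, 0.01, 200),
                       rng.normal(0, 0.01, 200)])
t = tangent_estimate(pts, np.zeros(3), 0.5)
print(np.abs(t[0]))   # close to 1: tangent aligned with the x-axis
```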
2005 |
Annotation: "Three-dimensional ladar data are commonly used to perform scene understanding for outdoor mobile robots, specifically in natural terrain. One effective method is to classify points using features based on local point cloud distribution into surfaces, linear structures or clutter volumes. But the local features are computed using 3-D points within a support-volume. Local and global point density variations and the presence of multiple manifolds make the problem of selecting the size of this support volume, or scale, challenging. In this paper we adopt an approach inspired by recent developments in computational geometry and investigate the problem of automatic data-driven scale selection to improve point cloud classification. The approach is validated with results using data from different sensors in various environments classified into different terrain types (vegetation, solid surface and linear structure). " . |
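The "features based on local point cloud distribution" mentioned above are commonly realized from the eigenvalues of a local 3-D scatter matrix. The sketch below shows that feature stage only, under our own naming; the paper's data-driven scale selection and the actual classifier are not shown.

```python
import numpy as np

def saliency_features(neighbors):
    """Compute classic point-distribution saliencies from the eigenvalues
    of the local scatter matrix: 'linear' is large when one direction
    dominates (e.g. branches/wires), 'surface' when two do (e.g. ground),
    'scatter' when all three are comparable (e.g. foliage)."""
    l3, l2, l1 = np.sort(np.linalg.eigvalsh(np.cov(neighbors.T)))  # l1 >= l2 >= l3
    return {"linear": l1 - l2, "surface": l2 - l3, "scatter": l3}

# A flat patch in the x-y plane should score highest on 'surface'
rng = np.random.default_rng(1)
patch = np.column_stack([rng.uniform(-1, 1, 500),
                         rng.uniform(-1, 1, 500),
                         rng.normal(0, 0.005, 500)])
f = saliency_features(patch)
print(max(f, key=f.get))   # surface
```

The difficulty the paper targets is visible here: these eigenvalues depend entirely on which points fall inside the support volume, so a poorly chosen scale can make foliage look planar or a thin branch look like clutter.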
Annotation: "Autonomous navigation in natural environments requires three-dimensional (3-D) scene representation and interpretation. High-density laser-based sensing is commonly used to capture the geometry of the scene, producing large amounts of 3-D points with variable spatial density. We propose a terrain classification method using such data. The approach relies on the computation of local features in 3-D using a support volume and belongs, as such, to a larger class of computational problems where range searches are necessary. This operation on traditional data structures is very expensive and, in this paper, we present an approach to address this issue. The method relies on reusing already computed data as the terrain classification process progresses over the environment representation. We present results that show significant speed improvement using ladar data collected in various environments with a ground mobile robot." . |
Annotation: "Current feature-based object recognition methods use information derived from local image patches. For robustness, features are engineered for invariance to various transformations, such as rotation, scaling, or affine warping. When patches overlap object boundaries, however, errors in both detection and matching will almost certainly occur due to inclusion of unwanted background pixels. This is common in real images, which often contain significant background clutter, objects which are not heavily textured, or objects which occupy a relatively small portion of the image. We suggest improvements to the popular Scale Invariant Feature Transform (SIFT) which incorporate local object boundary information. The resulting feature detection and descriptor creation processes are invariant to changes in background. We call this method the Background and Scale Invariant Feature Transform (BSIFT). We demonstrate BSIFT's superior performance in feature detection and matching on synthetic and natural images." . |
Annotation: "Errors in laser-based range measurements can be divided into two categories: intrinsic sensor errors (range drift with temperature, systematic and random errors), and errors due to the interaction of the laser beam with the environment. The former have traditionally received attention and can be modeled. The latter, in contrast, have long been observed but not well characterized, and we propose to do so in this paper. In addition, we present a sensor-independent method to remove such artifacts. The objective is to improve the overall quality of 3-D scene reconstruction in order to perform terrain classification of scenes with vegetation." . |
Annotation: "Despite significant advances in image segmentation techniques, evaluation of these techniques thus far has been largely subjective. Typically, the effectiveness of a new algorithm is demonstrated only by the presentation of a few segmented images and is otherwise left to subjective evaluation by the reader. Little effort has been spent on the design of perceptually correct measures to compare an automatic segmentation of an image to a set of hand-segmented examples of the same image. This paper demonstrates how a modification of the Rand index, the Normalized Probabilistic Rand (NPR) index, meets the requirements of large-scale performance evaluation of image segmentation. We show that the measure has a clear probabilistic interpretation as the maximum likelihood estimator of an underlying Gibbs model, can be correctly normalized to account for the inherent similarity in a set of ground truth images, and can be computed efficiently for large datasets. Results are presented on images from the publicly available Berkeley Segmentation dataset." . |
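The underlying measure that the NPR index modifies, the plain Rand index, can be sketched as follows. This shows only the pairwise-agreement idea (names are our own, and the brute-force pair loop is for clarity); the paper's probabilistic normalization over a set of ground truths, and its efficient computation, are not shown.

```python
import numpy as np
from itertools import combinations

def rand_index(a, b):
    """Plain Rand index between two labelings: the fraction of pixel
    pairs on which the segmentations agree about 'same segment' vs.
    'different segment'. Invariant to how segments are named."""
    a, b = np.ravel(a), np.ravel(b)
    agree = sum((a[i] == a[j]) == (b[i] == b[j])
                for i, j in combinations(range(len(a)), 2))
    return agree / (len(a) * (len(a) - 1) / 2)

seg1 = [0, 0, 1, 1]
seg2 = [1, 1, 0, 0]   # same partition, different label names
print(rand_index(seg1, seg2))   # 1.0
```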
Annotation: "In this paper, we explore the problem of three-dimensional motion planning in highly cluttered and unstructured outdoor environments. Because accurate sensing and modeling of obstacles is notoriously difficult in such environments, we aim to build computational tools that can handle large point data sets (e.g. LADAR data). Using a priori aerial data scans of forested environments, we compute a network of free space bubbles forming safe paths within environments cluttered with tree trunks, branches and dense foliage. The network (roadmap) of paths is used for efficiently planning paths that consider obstacle clearance information. We present experimental results on large point data sets typical of those faced by Unmanned Aerial Vehicles, but also applicable to ground-based robots navigating through forested environments." . |
Annotation: "In this work, we address the detection of vehicles in a video stream obtained from a moving airborne platform. We propose a Bayesian framework for estimating dense optical flow over time that explicitly estimates a persistent model of background appearance. The approach assumes that the scene can be described by background and occlusion layers, estimated within an Expectation-Maximization framework. The mathematical formulation of the paper is an extension of our previous work where motion and appearance models for foreground and background layers are estimated simultaneously in a Bayesian framework" . |
Annotation: "AVI Video available at: http://www.cs.cmu.edu/~dhoiem/projects/popup/popup_movie_912_500_DivX.avi". |
Annotation: "Classification of various image components (pixels, regions and objects) into meaningful categories is a challenging task due to ambiguities inherent to visual data. Natural images exhibit strong contextual dependencies in the form of spatial interactions among components. For example, neighboring pixels tend to have similar class labels, and different parts of an object are related through geometric constraints. Going beyond these, different regions (e.g., sky and water) or objects (e.g., monitor and keyboard) appear in restricted spatial configurations. Modeling these interactions is crucial to achieve good classification accuracy. In this thesis, we present discriminative field models that capture spatial interactions in images in a discriminative framework based on the concept of Conditional Random Fields proposed by Lafferty et al. The discriminative fields offer several advantages over the Markov Random Fields (MRFs) popularly used in computer vision. First, they allow capturing arbitrary dependencies in the observed data by relaxing the restrictive assumption of conditional independence generally made in MRFs for tractability. Second, the interaction in labels in discriminative fields is based on the observed data, instead of being fixed a priori as in MRFs. This is critical to incorporate different types of context in images within a single framework. Finally, the discriminative fields derive their classification power by exploiting probabilistic discriminative models instead of the generative models used in MRFs. Since the graphs induced by the discriminative fields may have arbitrary topology, exact maximum likelihood parameter learning may not be feasible. We present an approach which approximates the gradients of the likelihood with simple piecewise constant functions constructed using inference techniques. To exploit different levels of contextual information in images, a two-layer hierarchical formulation is also described. It encodes both short-range interactions (e.g., pixelwise label smoothing) and long-range interactions (e.g., relative configurations of objects or regions) in a tractable manner. The models proposed in this thesis are general enough to be applied seamlessly within a single framework to several challenging computer vision tasks such as contextual object detection, semantic scene segmentation, texture recognition, and image denoising." . |
2004 |
Annotation: "We present a multi-projector stereoscopic display which incorporates a high-resolution inset image, or fovea. The system uses four projectors, and the image warping required for on-screen image alignment and foveation is applied as part of the rendering pass. We discuss the problem of ambiguous depth perception between the boundaries of the inset in each eye and the underlying scene, and present a solution where the inset boundaries are dynamically adapted as a function of the scene geometry. An efficient real-time method for boundary adaptation is introduced. It is applied as a post-rendering step, does not require direct geometric computations on the scene, and is therefore practically independent of the size and complexity of the model." . |
Annotation: "We present a stereoscopic display system which incorporates a high-resolution inset image, or fovea. We describe the specific problem of false depth cues along the boundaries of the inset image, and propose a solution in which the boundaries of the inset image are dynamically adapted as a function of the geometry of the scene. This method produces comfortable stereoscopic viewing at a low additional computational cost. The four projectors need only be approximately aligned: a single drawing pass is required, regardless of projector alignment, since the warping is applied as part of the 3-D rendering process." . |
2003 |
Annotation: "This paper proposes a method to fit a skeleton, or stick-model, to a blob to determine the pose of a person in an image. The input is a binary image representing the silhouette of a person and the output is a stick-model coherent with the pose of the person in this image. A torso model is first defined, and is then scaled and fitted to the blob using the distance transform of the original image. Then, the fitting is performed independently for each of the four limbs (two arms, two legs), again using the distance transform. The fact that each limb is fitted independently speeds up the fitting process, avoiding the combinatorial complexity problems that are frequent with this type of method." URL: http://vision.gel.ulaval.ca/fr/publications/Id_444/PublDetails.php Keywords: pose recognition |
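The distance transform that the fitting above relies on assigns each silhouette pixel its distance to the nearest background pixel, so the torso axis tends to follow the ridge of large values. A minimal brute-force sketch (names are our own; a real system would use a linear-time algorithm):

```python
import numpy as np

def distance_transform(mask):
    """Brute-force Euclidean distance transform of a binary silhouette:
    each foreground pixel gets its distance to the nearest background
    pixel. O(n^2) and for illustration only."""
    fg = np.argwhere(mask)
    bg = np.argwhere(~mask)
    dt = np.zeros(mask.shape)
    for y, x in fg:
        dt[y, x] = np.min(np.hypot(bg[:, 0] - y, bg[:, 1] - x))
    return dt

# A 5x5 silhouette: the center pixel is deepest inside the blob,
# so a medial 'stick' fitted to the maxima would pass through it.
sil = np.zeros((5, 5), dtype=bool)
sil[1:4, 1:4] = True
dt = distance_transform(sil)
print(dt)
```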
2002 |
2001 |
2000 |
1999 |
1998 |
1997 |