Visual Motion (Dynamics)
Structure from motion
Interacting with a complex, unknown, dynamic
environment requires continuously updated knowledge of its shape and
motion. We propose several algorithms aimed at inferring shape, motion,
and appearance causally and incrementally. See also the related project
on object insertion in live video.
Ambiguities and optimality in 3D motion estimation
Estimating 3D structure and motion can be
cast as a non-linear, high-dimensional optimization problem, prone to
local minima. Such local minima are intrinsic to the problem, not to the
algorithm or computational device used to solve it, and are
therefore true illusions. Can we identify, analyze, and categorize such
illusions, and devise optimal algorithms to infer the global estimate?
Real-time, vision-based navigation and interaction
Vision is a remote, distributed, passive
sensor crucial for primates to move within the environment. While
successful application of vision in the loop of a control system has
been demonstrated under partially controlled conditions (freeway
guidance, spacecraft landing), we tackle navigation and interaction
within unknown and dynamic environments by building representations
that can be used for localization, mapping and navigation.
How can we capture the "overall motion" for a
deforming object? How can we "separate" the overall motion from the
deformation? How do we characterize what is "conserved" during motion?
We propose a framework for modeling deforming motion that entails
defining a "moving average shape" and that allows for the simultaneous
registration and matching of images and for tracking deformable objects.
We segment videos into domains of homogeneous
motion by minimizing appropriate cost functionals. Our method
allows tracking moving objects in video sequences, reconstructing the
different depth layers of a 3D scene filmed by a moving camera and
segmenting motion patterns which cannot be distinguished based on their
Dynamic textures are sequences of images of
scenes that exhibit some form of temporal and possibly spatial
stationarity, such as fire, smoke, steam, or foliage. Models of
dynamic textures can be used to generate novel synthetic sequences and
manipulate real ones.
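The paragraph does not say which model is used. For illustration only, the sketch below assumes a common choice: a linear Gaussian state-space model x[t+1] = A x[t] + v[t], y[t] = mean + C x[t], where y[t] is the vectorized frame. The model is identified with a closed-form (suboptimal) procedure based on the SVD of the data matrix. The function names `fit_dynamic_texture` and `synthesize` are hypothetical, and the toy input is a noisy drifting sinusoid rather than real footage.

```python
import numpy as np

def fit_dynamic_texture(frames, n_states):
    """Closed-form (suboptimal) fit of y[t] = mean + C x[t], x[t+1] = A x[t] + v[t].

    frames: array of shape (T, H, W). Returns (mean, C, A, Q_sqrt, x0).
    """
    T = frames.shape[0]
    Y = frames.reshape(T, -1).T.astype(float)        # pixels x time
    mean = Y.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(Y - mean, full_matrices=False)
    C = U[:, :n_states]                              # spatial basis (orthonormal columns)
    X = np.diag(s[:n_states]) @ Vt[:n_states]        # state trajectory, n_states x T
    # Least-squares transition matrix: X[:, 1:] ≈ A X[:, :-1]
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])
    V = X[:, 1:] - A @ X[:, :-1]                     # driving-noise residuals
    Q = V @ V.T / (T - 1)
    # Symmetric square root of the (positive semidefinite) noise covariance
    w, E = np.linalg.eigh(Q)
    Q_sqrt = E @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ E.T
    return mean[:, 0], C, A, Q_sqrt, X[:, 0]

def synthesize(model, n_frames, frame_shape, seed=None):
    """Drive the fitted model with fresh Gaussian noise to generate new frames."""
    mean, C, A, Q_sqrt, x0 = model
    rng = np.random.default_rng(seed)
    x = x0.copy()
    frames = []
    for _ in range(n_frames):
        frames.append((mean + C @ x).reshape(frame_shape))
        x = A @ x + Q_sqrt @ rng.standard_normal(x.shape[0])
    return np.stack(frames)

# Toy input: a drifting sinusoidal pattern with a little pixel noise.
t = np.arange(60)[:, None, None]
_, xx = np.mgrid[0:16, 0:16]
noise = 0.05 * np.random.default_rng(0).standard_normal((60, 16, 16))
frames = np.sin(0.5 * xx + 0.3 * t) + noise
model = fit_dynamic_texture(frames, n_states=4)
novel = synthesize(model, 100, (16, 16), seed=1)
print(novel.shape)  # (100, 16, 16)
```

Because the model is driven by random noise, a different seed yields a different novel sequence with similar temporal statistics. The SVD-based fit is fast, but it is not a maximum-likelihood estimate.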
How do we distinguish fog from steam? Models
of dynamic textures can be used to discriminate visual processes based
on their spatial as well as temporal statistics.
How do we detect the presence of smoke, and
identify where in the image it appears?
At a fairly high level of abstraction, a
human moving about can be represented as a dynamical system, driven by
intentions (actions), and outputting actuator forces, resulting in joint
trajectories. We study how one can infer actions from remote
measurements of joint angles or trajectories. Ultimately we want to be
able to identify an action regardless of the particular individual, and
to identify the individual regardless of the action. Preliminary
results show that simple dynamical models allow for successful
classification of action classes, such as walking gaits.
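As a toy illustration of classifying motion with simple dynamical models, the sketch below summarizes each joint-angle trajectory by the coefficients of a second-order autoregressive (AR) model fit by least squares. It then assigns the label of the nearest template. The names `fit_ar` and `classify`, the synthetic sinusoidal "gaits", and the Euclidean distance between coefficient vectors are all hypothetical choices, not the project's method.

```python
import numpy as np

def fit_ar(x, order=2):
    """Least-squares coefficients a with x[t] ≈ a[0] x[t-1] + ... + a[order-1] x[t-order]."""
    n = len(x)
    # Column k holds the lag-(k+1) values x[t-k-1] for t = order, ..., n-1
    X = np.stack([x[order - k - 1:n - k - 1] for k in range(order)], axis=1)
    coeffs, *_ = np.linalg.lstsq(X, x[order:], rcond=None)
    return coeffs

def classify(x, templates, order=2):
    """Return the label whose template coefficients are closest to those of x."""
    a = fit_ar(x, order)
    return min(templates, key=lambda label: np.linalg.norm(a - templates[label]))

# Hypothetical joint-angle trajectories: a slow "walk" and a faster "run".
t = np.arange(200)
templates = {
    "walk": fit_ar(0.6 * np.sin(0.10 * t)),
    "run": fit_ar(0.9 * np.sin(0.25 * t)),
}
# A different "individual": other amplitude and phase, similar rhythm.
print(classify(0.4 * np.sin(0.11 * t + 1.0), templates))  # walk
print(classify(0.3 * np.sin(0.24 * t), templates))        # run
```

A pure sinusoid obeys x[t] = 2 cos(w) x[t-1] - x[t-2] regardless of its amplitude and phase. The AR coefficients therefore reflect only the rhythm of the motion, a crude analogue of recognizing an action regardless of who performs it. With noisy measurements, plain least squares gives biased AR coefficients, so a practical system would need a more careful identification method.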
Our goal in this project is to build
synthetic models of human faces that can be driven by a speech signal,
while retaining the distinctive features of a particular individual.
Modeling and representation
Is it possible to define a flexible
representation of shape that is linear, so that the sum of two shapes
is a shape, and operations like differentiation, averaging and
orthogonal projection make sense? We represent the shape of closed
planar contours as the zero level set of functions that satisfy certain
partial differential equations, so that they are (quasi) linear by construction.
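The sketch below illustrates why an implicit representation makes averaging meaningful. It swaps in signed distance functions (negative inside, positive outside) for the PDE-defined functions mentioned above, which it does not reproduce. `circle_sdf` and `average_shape` are hypothetical helpers. Averaging the functions of two concentric circles of radii 10 and 20 gives a function whose zero level set is the circle of radius 15.

```python
import numpy as np

def circle_sdf(shape, center, radius):
    """Signed distance to a circle on a pixel grid: negative inside, positive outside."""
    yy, xx = np.indices(shape, dtype=float)
    return np.hypot(yy - center[0], xx - center[1]) - radius

def average_shape(phis, weights=None):
    """Weighted average of level-set functions; the averaged shape is {phi < 0}."""
    phis = np.stack(phis)
    if weights is None:
        weights = np.full(len(phis), 1.0 / len(phis))
    return np.tensordot(np.asarray(weights, dtype=float), phis, axes=1)

grid = (64, 64)
phi_mean = average_shape([circle_sdf(grid, (32, 32), 10), circle_sdf(grid, (32, 32), 20)])
area = np.count_nonzero(phi_mean < 0)   # pixels inside the averaged contour
print(area, round(np.pi * 15 ** 2))     # pixel count vs. area of a radius-15 disk
```

The average of two signed distance functions is generally not itself a signed distance function, which is one limitation of this simple stand-in.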
Planar contours can be easily recognized
despite being presented under various transformations, such as scaling,
translation, projective transformations, in addition to being subjected
to measurement noise. Is it possible to define a signature that is
invariant with respect to such transformations, and at the same time
insensitive to noise?
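Fourier descriptors give one classical signature of this kind, shown here only as an illustration. Dropping the zero-frequency coefficient removes translation. Taking magnitudes removes rotation and the choice of starting point, and normalizing removes scale. This handles similarity transformations only, not the projective case mentioned above. Keeping a few low-order coefficients, which carry most of the contour's energy, reduces sensitivity to noise. The function name `fourier_signature` and the parameter `n_coeffs` are hypothetical.

```python
import numpy as np

def fourier_signature(points, n_coeffs=8):
    """Similarity-invariant signature of a closed planar contour.

    points: (N, 2) array of samples ordered along the contour (N > 2 * n_coeffs).
    """
    z = points[:, 0] + 1j * points[:, 1]
    Z = np.fft.fft(z)
    # Skip Z[0] (translation); keep low positive and negative frequencies.
    idx = np.r_[1:n_coeffs + 1, -n_coeffs:0]
    mags = np.abs(Z[idx])           # magnitudes ignore rotation and start point
    return mags / mags.max()        # normalizing removes scale

# Test contour: a rounded three-lobed shape sampled at 128 points.
t = np.linspace(0, 2 * np.pi, 128, endpoint=False)
r = 1 + 0.3 * np.cos(3 * t)
contour = np.column_stack([r * np.cos(t), r * np.sin(t)])

# Same shape: rotated, scaled, translated, and sampled from a different start point.
a = 0.7
R = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
moved = np.roll(2.5 * contour @ R.T + np.array([5.0, -3.0]), 17, axis=0)

print(np.allclose(fourier_signature(contour), fourier_signature(moved)))  # True
```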
By introducing prior knowledge on the shape
of objects of interest, one can drastically improve the robustness of
segmentation processes to noise, background clutter and partial
occlusion. We investigate methods to integrate such priors into level
set based segmentation schemes. By minimizing an appropriate cost
functional we simultaneously generate a knowledge-driven segmentation
of the input image and a decision about where to apply which prior. As
a result we can simultaneously reconstruct multiple familiar objects in
a given image.
We develop variational techniques for
matching closed planar contours without distinct landmark points.
Certain objects elicit perceptual responses:
a face can appear attractive or friendly, a car can appear aggressive
or comfortable, etc. Since such objects are characterized by their
shape (and to a lesser extent by their radiance), there must be some
form of "map" between geometry and qualitative perception. How is this
map represented? How can it be inferred? Can it be inverted, so as to
allow purposeful changes in geometry to achieve a desired perceptual response?
Multiple view geometry
Through most of the past decade we have been
engaged in the study of the geometry of multiple views, which plays a
key role in reconstructing the 3D structure of the scene and the
motion and calibration of the camera.
Given a sequence of images of a scene
containing multiple rigid objects moving independently, one can
estimate the number of objects, the motion of each object, and what
portion of the visual field corresponds to what object using algebraic
Occluding boundaries are visually salient
because they often result in discontinuities in image intensity.
T-junctions arise when a curve terminates at an occluding boundary
(forming a "T"). Unfortunately, T-junctions do not correspond to
physical points in the scene, as they move with the viewpoint.
Nevertheless, we show that the motion of T-junctions on the image plane
contains information about the scene that can be exploited for
Visual Reconstruction (Photometry)
Radiance and shape estimation
Traditional stereo relies on the "brightness
constancy" assumption to establish correspondence between points in
different images. This allows "eliminating" photometry from the
equation and reduces stereo reconstruction to a purely geometric
problem. However, when the brightness constancy assumption is not satisfied, one
cannot "separate" the reconstruction of shape from the reconstruction
of reflectance. We show under what conditions such separation yields
optimal algorithms. The cost functional can be integrated either in the
image, or on the scene surface where the image back-projects. When
integrating on the scene, the optimality conditions involve derivatives
of the (noise-ridden, measured) images. However, when integrating on
the image, the optimality conditions only involve derivatives of the
(noiseless, ideal) model. Therefore, one can devise
infinite-dimensional gradient-based reconstruction algorithms that do
not involve derivatives of the data, with obvious improvement in
Traditional stereo relies on establishing
correspondence between points in different images. Unfortunately, such
correspondence cannot be established unless the scene is made of dull
matte objects; it fails, for instance, for shiny, specular, or translucent
materials. We propose a novel approach that relies not on matching image to
image, but on matching each image to an underlying model of the
geometry (shape) and photometry (radiance tensor field) of the scene.
Discrepancy from the model is measured by the deviation from the ideal
rank of the radiance tensor field; we develop optimal algorithms to
infer shape and radiance from collections of images, based on
variational techniques and level set methods to integrate partial
differential equations.
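The sketch below makes "deviation from the ideal rank" concrete in a simplified setting. It stacks into a matrix (points by views) the intensities that the points of a hypothesized surface project to. The ideal rank is taken to be 1, which holds for a Lambertian scene under the correct shape because every view then sees the same radiance at each point. The residual energy beyond the leading singular values measures the discrepancy. The name `rank_deviation` and the rank-1 default are assumptions, not the project's radiance tensor formulation.

```python
import numpy as np

def rank_deviation(M, ideal_rank=1):
    """Energy of M outside its best rank-`ideal_rank` approximation."""
    s = np.linalg.svd(M, compute_uv=False)
    return float(np.sum(s[ideal_rank:] ** 2))

rng = np.random.default_rng(0)
radiance = rng.uniform(0.2, 1.0, size=50)       # one value per surface point

# Correct shape, Lambertian scene: every view sees the same radiance at each point.
M_good = np.tile(radiance[:, None], (1, 6))
# Wrong shape hypothesis: views sample mismatched points (simulated by shifting).
M_bad = np.stack([np.roll(radiance, k) for k in range(6)], axis=1)

print(rank_deviation(M_good), rank_deviation(M_bad))
```

By the Eckart-Young theorem, this sum equals the squared Frobenius distance from the matrix to its best approximation of the given rank.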
When a scene contains no "features" (constant
albedo) or too many features (dense self-similar texture), traditional
stereo matching algorithms fail to find proper "correspondence." We
therefore do not seek to match image to image, but instead match all data to
an underlying model of the scene geometry and its photometry, subject
to the assumption of constant albedo.
Even when an object has constant albedo, the
measured irradiance is not, because of shading and other effects. While
one could model this effect explicitly (see Stereoscopic Shading
project), if illumination is static one can assume that it is the
albedo that is smooth, and exploit this assumption to recover shape and
Many real objects (especially man-made) are
made by composing different materials, and therefore they have
piecewise constant reflectance properties. We have developed algorithms
for estimating the shape, albedo, and albedo boundaries from
collections of images. The process involves performing region-based
segmentation on evolving surfaces.
When neither motion, nor shape, nor albedo is
known, under suitable conditions one can simultaneously estimate shape
and camera pose by jointly registering various "regions" of the scene.
Illumination and reflectance
Smooth objects with constant albedo result in
smooth measured images due to non-uniform illumination. We develop
techniques to estimate shape, albedo and illumination properties of the
scene under the assumption of constant albedo and finite point light sources.
It is well-known that blur conveys spatial
information. However, to what extent does it? Can one characterize the
set of shapes that are indistinguishable given any number of defocused
images? Since the answer depends on the radiance of the scene, do there
exist radiances (e.g. structured light patterns) that allow
reconstructing any shape? We present a mathematical analysis of the
observability properties of shape from defocus. We also present novel
techniques to reconstruct shape and radiance
Under the conditions for which one can
reconstruct shape from defocused images, we develop inference
algorithms that are optimal in the sense of least-squares. By
exploiting the properties of semi-infinite orthogonal projectors in
Hilbert spaces we can transform an infinite-plus-one-dimensional
optimization problem into a much more efficient (regularized)
one-dimensional optimization, with obvious consequences to
We develop efficient algorithms for
reconstructing 3D shape and radiance from blurred images that are
optimal in the sense of relative entropy. The algorithms consist of
evolving a surface from an initial point towards a (local) minimum of
an energy functional, via the numerical integration of a suitable
partial differential equation.
Images depend on the shape of the scene, its
radiance, as well as the optical characteristics of the imaging device.
In this work we show that one can learn the optical characteristics
from data. Our approach is robust to the point where one can learn the
optical characteristics of a "virtual" camera using synthetic training
data, and apply the results to real cameras in order to reconstruct the
shape of real scenes.
Estimating shape and radiance from blurred
images is well-known to be a severely ill-posed inverse problem. In
this work we propose an efficient solution via the forward solution of
a diffusive partial differential equation with a space-varying stopping
time. This allows us to have a well-behaved, straightforward numerical
algorithm that has proven robust and efficient.
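The forward model can be sketched under some assumptions. The image evolves by the heat equation u_t = Laplacian(u), starting from the radiance, and each pixel is read out at its own stopping time t(x). For a Gaussian blur of variance sigma^2, the corresponding time is sigma^2 / 2. The sketch shows only the forward (image formation) step, not the inversion for shape. The explicit finite-difference scheme, the edge-replicating boundary, and the function name `diffuse_with_stopping_time` are illustrative assumptions.

```python
import numpy as np

def diffuse_with_stopping_time(radiance, t_stop, dt=0.2):
    """Forward model I(x) = u(x, t_stop(x)), where u_t = Laplacian(u) and u(., 0) = radiance.

    Explicit finite differences with edge replication (zero-flux boundary);
    dt <= 0.25 keeps the 2-D explicit scheme stable.
    """
    u = radiance.astype(float).copy()
    out = u.copy()                      # pixels with t_stop == 0 keep the radiance
    n_steps = int(np.ceil(t_stop.max() / dt))
    for k in range(1, n_steps + 1):
        p = np.pad(u, 1, mode="edge")
        lap = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4 * u
        u = u + dt * lap
        # Record each pixel at the first step whose time reaches its stopping time.
        done = (t_stop > (k - 1) * dt) & ((t_stop <= k * dt) | (k == n_steps))
        out[done] = u[done]
    return out

# Checkerboard radiance; the stopping time (amount of blur) grows left to right.
yy, xx = np.indices((32, 32))
radiance = (((yy // 4) + (xx // 4)) % 2).astype(float)
t_stop = np.tile(np.linspace(0.0, 4.0, 32), (32, 1))
blurred = diffuse_with_stopping_time(radiance, t_stop)
print(blurred[:, :4].std(), blurred[:, -4:].std())
```

Because the stopping time grows from left to right, the checkerboard stays sharp on the left and is washed out on the right. Edge replication gives a zero-flux boundary, so a uniform stopping time preserves the mean intensity.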
Since images are captured by integrating
photon count over an interval of time (exposure), moving objects appear
blurred in ways that depend upon their shape, motion and reflectance.
We propose a collection of algorithms to estimate shape and motion of
moving objects from a single blurred image.
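Here is a minimal forward model of motion blur, assuming a constant-velocity translation during the exposure. The blurred image is then the time average of shifted copies of the sharp image. Shifts are rounded to whole pixels and borders wrap around, both of which are simplifications. `motion_blur` is a hypothetical name. Estimating shape and motion, as described above, would require inverting a model like this one.

```python
import numpy as np

def motion_blur(image, velocity, exposure=1.0, n_samples=16):
    """Approximate (1/T) * integral over [0, T) of image(x - v t) dt for constant velocity v.

    velocity is (vy, vx) in pixels per unit time; shifts are rounded to whole
    pixels and image borders wrap around (np.roll).
    """
    acc = np.zeros(image.shape, dtype=float)
    for t in np.linspace(0.0, exposure, n_samples, endpoint=False):
        dy, dx = np.round(np.asarray(velocity, dtype=float) * t).astype(int)
        acc += np.roll(image, shift=(dy, dx), axis=(0, 1))
    return acc / n_samples

# A single bright point moving right at 8 pixels per exposure leaves a streak.
image = np.zeros((21, 21))
image[10, 5] = 1.0
blurred = motion_blur(image, velocity=(0, 8))
print(np.nonzero(blurred[10])[0])  # columns 5 through 13
```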
Visual Modeling for Recognition
Visual features for correspondence
How can we decide whether two images portray
the same scene? What is the scene? How is it related to the image? Are
there representations that are invariant with respect to nuisance
factors (viewpoint, illumination)? Are there image statistics
("features") that do not alter decision performance?
Filtering, control and identification
Given a process that exhibits complex dynamic
behavior, one can choose to model it globally with a very complex
model, or to choose a simple class of models and represent the process
locally, together with the partition of the data into neighborhoods. We
explore the problem of identifying simple local models and their domains
for dynamic processes.
Particle filters are flexible algorithms to
propagate the conditional density of a dynamical model, represented
weakly as a collection of samples drawn from it. We explore particle
algorithms for dynamical models whose state space has a non-trivial
geometric structure, such as a Lie group or a homogeneous space.
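The sketch below runs a bootstrap particle filter on the simplest nontrivial Lie group, the rotation group SO(2) (angles on the circle). It assumes a known rotation increment per step plus Gaussian process noise, a von Mises measurement likelihood, and multinomial resampling. Angles are wrapped after every step, and the estimate is the direction of the weighted mean of unit vectors, so nothing breaks when the state crosses ±pi. The names and parameter values (`particle_filter_so2`, `q`, `kappa`) are hypothetical.

```python
import numpy as np

def wrap(theta):
    """Map angles to (-pi, pi], a chart on the circle group SO(2)."""
    return np.angle(np.exp(1j * theta))

def particle_filter_so2(measurements, omega, n=500, q=0.05, kappa=20.0, seed=0):
    """Bootstrap particle filter for an angle that rotates by omega per step."""
    rng = np.random.default_rng(seed)
    particles = rng.uniform(-np.pi, np.pi, n)        # uniform prior on the circle
    estimates = []
    for z in measurements:
        # Predict: apply the known rotation increment plus process noise, then wrap.
        particles = wrap(particles + omega + q * rng.standard_normal(n))
        # Update: von Mises likelihood, which is periodic in the angle error.
        logw = kappa * np.cos(z - particles)
        w = np.exp(logw - logw.max())
        w /= w.sum()
        # Estimate: direction of the weighted mean of unit vectors (no seam at ±pi).
        estimates.append(np.angle(np.sum(w * np.exp(1j * particles))))
        # Resample: multinomial draw proportional to the weights.
        particles = particles[rng.choice(n, size=n, p=w)]
    return np.array(estimates)

# Simulated target that crosses the ±pi seam several times.
T = 60
rng = np.random.default_rng(1)
truth = wrap(2.8 + 0.3 * np.arange(1, T + 1))
meas = wrap(truth + 0.2 * rng.standard_normal(T))
est = particle_filter_so2(meas, omega=0.3)
err = np.abs(wrap(est - truth))
print(err[10:].mean())
```

For higher-dimensional groups such as SE(3), the same predict, weight, and resample loop applies. Prediction then composes group elements (for example through the exponential map), and computing a mean requires more care than averaging coordinates.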
We are interested in controlling a
non-holonomic robot so as to follow a prescribed trajectory with
guaranteed performance. We propose an algorithm inspired by model-based
predictive control that involves controlling the local approximation of
the trajectory to be tracked, computed in real-time.
We explore the use of various signal
processing algorithms to enhance the perception capabilities of
patients with retinal implants.
DARPA Grand Challenge
The UCLA Vision Lab is engaged in the DARPA Grand Challenge as part of the Golem Group/UCLA team.
Center for Computational Biology
The convergence of the biomedical revolution and the information
technology revolution is a major event in the history of science. The
emerging discipline of Computational Biology is a natural result of
this convergence. The mathematical and computational sciences lie at
the center of this new endeavor, providing the tools and framework for
model building and quantitative analysis.
The Center for Computational Biology (CCB) was established to develop,
implement and test computational biology methods that are applicable
across spatial scales and biological systems. Our objective is to help
elucidate characteristics and relationships that would otherwise be
impossible to detect and measure.
Interactions fostered by this multi-disciplinary scientific network
will spawn novel strategies and will initiate training opportunities
for the next generation of relevant and promising biological
Active Vision Control System
Active Vision Control System for Complex Adversarial 3-D Environments
The Active-Vision Control Systems MURI is
a joint effort sponsored by the Air Force Office of Scientific Research.
The MURI Project includes students, faculty and staff from Stanford
University, UC Berkeley and UCLA. The aim of the project is to develop
computational methods for the simulation of collaborative motion of
autonomous vehicles. The multi-disciplinary team consists of faculty
and researchers from applied mathematics, statistics, computer
science, electrical engineering and aeronautical engineering who
combine their expertise to derive practical control algorithms for
groups of collaborating vehicles.