Following is a selection of research
projects I worked on during my graduate studies at UCF:
Multi-view Tracking in Crowded Scenes
Activity Recognition in Groups of People
Image Based 3D Reconstruction
View Invariant Object Recognition
ICCV Challenge Contest: Where Am I?
Route Panoramas and Localization
TREC Video Retrieval Evaluation Forum (TRECVID)
Multi-view Tracking in Crowded Scenes
Occlusion and lack of visibility in crowded
scenes make it very difficult to track individual people correctly and
consistently. This problem is particularly hard to tackle in single camera
systems. We have developed a multi-view approach to tracking people in
crowded scenes where people may be partially or completely occluding each
other. Our approach uses multiple views in synergy, so that information
from all views is combined to detect objects. To achieve this, we developed a
planar homography constraint that resolves occlusions
and robustly determines the locations on the ground plane corresponding to the
feet of the people. To find tracks, we obtain feet regions over a window of
frames and stack them, creating a space-time volume. Feet regions belonging to
the same person form contiguous spatio-temporal
regions, which are clustered using a graph cuts segmentation approach. Figure 1
below shows the various steps in our algorithm; results on one of our test
sequences are shown in Figure 2.
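To make the fusion step concrete, here is a minimal sketch of the ground-plane fusion, assuming each view supplies a foreground-likelihood map and a homography to a common ground-plane grid (the function name, inputs, and grid size are illustrative, not the paper's code):

```python
import cv2
import numpy as np

def ground_plane_feet_likelihood(fg_maps, homographies, grid_size):
    """Warp each view's foreground-likelihood map onto a common
    ground-plane grid and fuse multiplicatively. Pixels supported by
    every view (the feet, which actually lie on the plane) survive,
    while off-plane body parts are suppressed by plane parallax."""
    fused = np.ones((grid_size[1], grid_size[0]), dtype=np.float32)
    for fg, H in zip(fg_maps, homographies):
        # H maps view pixels to ground-plane grid pixels;
        # grid_size is (width, height), as OpenCV expects.
        fused *= cv2.warpPerspective(fg.astype(np.float32), H, grid_size)
    return fused

# Stacking the fused maps over a temporal window yields the space-time
# volume whose contiguous regions are clustered into per-person tracks.
```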
Associated publication: Saad M. Khan, Mubarak Shah, "A Multiview Approach to Tracking People in Crowded Scenes Using a Planar Homography Constraint," 9th European Conference on Computer Vision (ECCV), Graz, Austria, 2006.
Figure 1: The various steps in our algorithm.
Figure 2: Tracking results on a sequence captured with four cameras.
Activity Recognition in Groups of People
In this project we developed a novel approach to recognizing the class of
activities characterized by rigidity in formation, for example parades,
airplane flight formations, or herds of animals. The central idea is
to model the entire group as a collective rather than focusing on each
individual separately. We model the formation as a 3D polygon, with each
corner representing a participating entity (see Figure 1). Tracks from the entities
are treated as tracks of feature points on the 3D polygon. Based on the rank
of the track matrix, we can determine whether the 3D polygon under consideration
behaves rigidly or undergoes non-rigid deformation.
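The rank test can be sketched with standard structure-from-motion machinery: under an affine camera, the centered measurement matrix of a rigid point configuration has rank at most 3 (the Tomasi-Kanade factorization result). The sketch below assumes tracks arrive as an (F, P, 2) array; the residual tolerance is an illustrative placeholder, not the paper's value:

```python
import numpy as np

def formation_is_rigid(tracks, residual_tol=0.02):
    """tracks: array of shape (F, P, 2) with P entities tracked over F
    frames. Build a Tomasi-Kanade-style measurement matrix; a rigid 3D
    configuration under an affine camera gives a centered matrix of
    rank at most 3, so singular-value energy beyond the third component
    signals non-rigid deformation of the formation."""
    F, P, _ = tracks.shape
    W = tracks.transpose(0, 2, 1).reshape(2 * F, P).astype(np.float64)
    W -= W.mean(axis=1, keepdims=True)   # remove per-frame translation
    s = np.linalg.svd(W, compute_uv=False)
    residual = s[3:].sum() / s.sum()     # energy past rank 3
    return residual < residual_tol
```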
Associated publication: Saad M. Khan, Mubarak Shah, "Detecting Group Activities Using Rigidity of Formation," ACM Multimedia, 2005.
Figure 1: (a) An example of a parade scene. The rigidity of the formation (the red polygon) characterizes the parade activity. (b) An outline of our proposed method for classifying rigidity of formation.
Figure 2: Each row contains frames from a video sequence of multiple
people walking. The frames are overlaid with tracks of the participating
people. The first two sequences are detected as structured activities
(parades) while the last two sequences are random crowds.
Image Based 3D Reconstruction
In this project we developed a purely image-based approach to fusing
foreground silhouette information from multiple arbitrary views for
applications like 3D reconstruction. Our approach does not require 3D
constructs like camera calibration to carve out 3D voxels or project visual
cones in 3D space. Using planar homographies and
foreground likelihood information from a set of arbitrary views, we show that
visual hull intersection can be performed entirely in the image plane, without
ever going into 3D space. This process delivers a 2D grid of object
occupancy likelihoods representing a cross-sectional slice of the object.
Subsequent slices of the object are obtained by extending the process to
planes parallel to a reference plane, in a direction along the body of the
object (see Figure 1). Figure 2 shows the 3D reconstruction results. A
detailed narration video describing the work is also available for download.
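A hedged sketch of the slice-sweep idea follows. It assumes the per-view homographies to each scene plane are already available (in the paper they follow from the reference-plane homographies; here they are simply passed in), and all names are illustrative:

```python
import cv2
import numpy as np

def occupancy_slices(fg_maps, plane_homographies, grid_size):
    """plane_homographies: for each scene plane, a list with one 3x3
    homography per view, mapping view pixels to that plane's grid.
    Warping every view's foreground likelihood onto a plane and fusing
    multiplicatively gives one cross-sectional slice; stacking slices
    over parallel planes yields the visual hull with no voxel carving."""
    slices = []
    for per_view_H in plane_homographies:
        occ = np.ones((grid_size[1], grid_size[0]), dtype=np.float32)
        for fg, H in zip(fg_maps, per_view_H):
            occ *= cv2.warpPerspective(fg.astype(np.float32), H, grid_size)
        slices.append(occ)
    return np.stack(slices)  # planes x height x width
```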
Associated publication: Saad M. Khan, Pingkun Yan, Mubarak Shah, "A Homographic Framework for the Fusion of Multi-view Silhouettes," International Conference on Computer Vision (ICCV), Rio de Janeiro, Brazil, 2007.
Figure 1: Visual hull intersection at multiple planes in the scene. Each plane delivers a slice of the object.
Figure 2: 3D reconstruction results from some of our experiments. The images on the right are zoomed-in views of the reconstructed objects. Notice the slices.
View Invariant Object Recognition
In this project we worked on developing a view invariant object class
recognition system. Instead of using a complicated mechanism for relating
multiple 2D training views, we establish spatial connections between
different views by mapping them directly to the surface of a 3D model
representing the shape of the object. The 3D shape of the object is
reconstructed using the homographic framework described in the previous
section. Features are computed in each 2D model view and mapped onto the 3D
shape model using the same homographic framework. To generalize the model for
object class detection, features from supplemental views are also considered.
A codebook is constructed from all of these features, and a 3D feature model
is then built. Given a 2D test image, correspondences between the 3D feature
model and the test view are identified by matching the detected features.
Based on the 3D locations of the corresponding features, several
viewing-plane hypotheses can be made; the one with the highest confidence is
then used to detect the object through feature location matching. The
performance of the proposed method has been evaluated on the PASCAL VOC
challenge dataset, with promising results.
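As a rough illustration of the test-time matching step, the sketch below matches test-image descriptors to the 3D feature model and forms a single viewing hypothesis. Note that a standard RANSAC PnP estimate stands in for the paper's viewing-plane hypothesis step, and all inputs (descriptor arrays, camera matrix K) are assumptions:

```python
import cv2
import numpy as np

def hypothesize_view(test_desc, test_pts2d, model_desc, model_pts3d, K):
    """Match 2D test features to the 3D feature model by descriptor
    distance, then form a viewing hypothesis from the 3D locations of
    the matched features via RANSAC PnP (a stand-in for the paper's
    viewing-plane hypothesis step)."""
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.match(test_desc, model_desc)
    pts2d = np.float32([test_pts2d[m.queryIdx] for m in matches])
    pts3d = np.float32([model_pts3d[m.trainIdx] for m in matches])
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(pts3d, pts2d, K, None)
    return (rvec, tvec, inliers) if ok else None
```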
Associated publication: Pingkun Yan, Saad M. Khan, Mubarak Shah, "3D Model Based Object Recognition from Arbitrary View," International Conference on Computer Vision (ICCV), Rio de Janeiro, Brazil, 2007.
Figure 1: Construction of the 3D feature model for motorbikes. The 3D shape model of a motorbike (center) is constructed using the model views (images on the inner circle) taken around the object from different viewpoints. Supplemental images (outer circle) of different motorbikes are obtained using Google's image search.
Figure 2: Detection of motorbikes using our approach. Ground truth is shown in green; the red boxes show our detection results.
ICCV Challenge Contest: Where Am I?
This contest was held at the International Conference on Computer Vision
(ICCV), 2005. Given a set of images taken at known GPS locations, the problem
was to find the GPS locations of test images taken at unknown locations but
roughly in the same area (with view overlap). Our team consisted of five
researchers, including myself. We received an honorable mention at the
conference for placing fourth among twenty participating teams from top
universities around the world.
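To make the task concrete, here is a minimal retrieval baseline for the contest setting (emphatically a sketch, not our actual entry): each reference image carries a GPS tag, and the query inherits the tag of the reference it matches best.

```python
import cv2

def localize(query_desc, references):
    """references: list of (descriptors, (lat, lon)) pairs for the
    GPS-tagged image set. Score each reference by its count of good
    ratio-test matches and return the GPS tag of the best one."""
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    best_gps, best_score = None, -1
    for ref_desc, gps in references:
        pairs = matcher.knnMatch(query_desc, ref_desc, k=2)
        good = [p for p in pairs
                if len(p) == 2 and p[0].distance < 0.7 * p[1].distance]
        if len(good) > best_score:
            best_score, best_gps = len(good), gps
    return best_gps
```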
Figure 1: The figure demonstrates the basic task. Given a set of images taken at known GPS locations (a), we had to find the GPS locations of unlabeled pictures (b).
Figure 2: A visualization of our results from the contest. The satellite picture at the bottom is overlaid with our localization results: blue marks show our localizations, red circles the ground truth, and yellow links the localization error.
Route Panoramas and Localization
With traditional SLAM/SFM based approaches, the task of localization becomes
intractable when the area under investigation reaches city or town size: the
amount of data (pictures/videos) required to visually map a city
comprehensively is prohibitive for most search algorithms. To circumvent
this problem, in this project we visually map the area using route panoramas.
Route panoramas (pushbroom images) provide a
compact yet comprehensive medium for visually capturing large areas. They are
easily generated by mounting a camera on a moving vehicle (car) and driving
around town. Given a query image taken at an arbitrary location in the area,
we show that we can accurately recover the location of the camera by finding
its epipole in the route panorama of the scene. To this
end, we have shown that a fundamental matrix exists between a route
panorama and a perspective image of the same scene. The fundamental matrix is
computed from feature matches between the query image and the route panorama.
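A sketch of the localization step follows, assuming Nx2 arrays of feature correspondences between the query image and the panorama are already available. OpenCV's standard perspective fundamental-matrix estimator is used purely as an illustrative stand-in for the pushbroom-perspective relation derived in the paper:

```python
import cv2
import numpy as np

def locate_in_panorama(pts_query, pts_pano):
    """pts_query, pts_pano: Nx2 float32 arrays of matched points in the
    query image and the route panorama. Estimate F and return the
    epipole in the panorama (the null direction of F^T), which marks
    where along the drive the query camera was located."""
    F, mask = cv2.findFundamentalMat(pts_query, pts_pano, cv2.FM_RANSAC)
    if F is None:
        return None
    _, _, Vt = np.linalg.svd(F.T)
    e = Vt[-1]                    # F^T e = 0, up to scale
    return e[:2] / e[2]           # epipole in panorama pixel coordinates
```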
Associated publication: Saad M. Khan, Fahd Rafi, Mubarak Shah, "Where Was the Picture Taken: Image Localization in Route Panoramas Using Epipolar Geometry," International Conference on Multimedia and Expo (ICME), Toronto, Canada, 2006.
Figure 1: An example route panorama. The route panorama is closely approximated by a linear pushbroom (ortho-perspective) camera. The ortho-perspective projection (orthographic horizontally and perspective vertically) means that closer objects such as cars, trees, and poles get squeezed in, while important landmarks, such as buildings further away, are adequately captured.
Figure 2: The epipolar geometry between a perspective camera and a linear pushbroom camera, which our approach uses to link perspective images with pushbroom images (route panoramas).
Figure 3: Some results. (a) The route panorama. (b) The test perspective image captured in the scene. (c) The epipole of the test image in the route panorama. (d) The epipole trajectory in the test image, i.e., the projection of the camera trajectory used to construct the route panorama.