RESEARCH PROJECTS
The following is a selection of research projects I worked on during my graduate studies
at UCF:
Multi-view Tracking in Crowded Scenes
Activity Recognition in Groups of People
Image Based 3D Reconstruction
View Invariant Object Recognition
ICCV Challenge Contest: Where Am I?
Route Panoramas for Localization
Autonomous Vision Based UAV Navigation
TREC Video Retrieval Evaluation Forum (TRECVID)
Multi-view Tracking in Crowded Scenes
Occlusion and lack of visibility in crowded scenes make it very difficult
to track individual people correctly and consistently. This problem is
particularly hard to tackle in single camera systems. We have developed a
multi-view approach to tracking people in crowded scenes where people may be
partially or completely occluding each other. Our approach is to use multiple
views in synergy so that information from all views is combined to detect
objects. To achieve this, we developed a planar homography constraint to resolve
occlusions and robustly determine the locations on the ground plane corresponding
to the feet of the people. To find tracks, we obtain feet regions over a window
of frames and stack them to create a space-time volume. Feet regions belonging
to the same person form contiguous spatio-temporal regions that are clustered
using a graph cuts segmentation approach. Figure 1 below shows the various
steps in our algorithm. Results on one of our test sequences are shown in
Figure 2 (click the figure to download video).
Associated publication: Saad M. Khan, Mubarak Shah, A Multiview
Approach to Tracking People in Crowded Scenes using a Planar Homography
Constraint, 9th European Conference on Computer Vision, Graz, Austria, 2006.
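The geometric idea behind the planar homography constraint can be illustrated with a small synthetic sketch. It is not the implementation from the paper: the two cameras, the intrinsics `K`, and the helper names (`look_at_camera`, `ground_homography`, `transfer`) are assumptions made up for this example. It shows that a point on the ground plane (the feet) transfers consistently between views through the ground-plane homography, while a point off the plane (the head) does not.

```python
import numpy as np

def look_at_camera(K, center, target):
    """3x4 projection matrix for a camera at `center` looking at `target` (world Z is up)."""
    forward = (target - center) / np.linalg.norm(target - center)
    right = np.cross(forward, np.array([0.0, 0.0, 1.0]))
    right /= np.linalg.norm(right)
    down = np.cross(forward, right)
    R = np.stack([right, down, forward])  # rows are the camera axes in world coordinates
    t = -R @ center
    return K @ np.column_stack([R, t])

def project(P, X):
    """Project a 3D world point to pixel coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def ground_homography(P):
    """Homography from ground-plane coordinates (X, Y, 1) at Z = 0 to image pixels."""
    return P[:, [0, 1, 3]]

def transfer(H, pt):
    """Map a pixel through a homography."""
    x = H @ np.append(pt, 1.0)
    return x[:2] / x[2]

# Hypothetical setup: two cameras, 3 m above the ground, looking at the same spot.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
P1 = look_at_camera(K, np.array([-5.0, -5.0, 3.0]), np.array([0.0, 0.0, 1.0]))
P2 = look_at_camera(K, np.array([5.0, -5.0, 3.0]), np.array([0.0, 0.0, 1.0]))

# Homography induced by the ground plane, mapping view 1 pixels to view 2 pixels.
H12 = ground_homography(P2) @ np.linalg.inv(ground_homography(P1))

feet = np.array([1.0, 2.0, 0.0])  # on the ground plane
head = np.array([1.0, 2.0, 1.7])  # same person, 1.7 m above the plane

feet_error = np.linalg.norm(transfer(H12, project(P1, feet)) - project(P2, feet))
head_error = np.linalg.norm(transfer(H12, project(P1, head)) - project(P2, head))
print(f"feet transfer error: {feet_error:.2e} px, head transfer error: {head_error:.1f} px")
```

Only the feet agree across views after the transfer, which is why combining the views through the ground-plane homography localizes people at their feet.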

Figure 1: The various steps in our algorithm.
 
 
Figure 2: Tracking results on a sequence captured with four
cameras.
Activity Recognition in Groups of People
In this project we developed a novel approach to recognizing the class of
activities characterized by rigidity of formation, for example
parades of people, airplane flight formations, or herds of animals. The
central idea is to model the entire group as a collective rather than
focusing on each individual separately. We model the formation as a 3D
polygon with each corner representing a participating entity (see
figure 1). Tracks from the entities are treated as tracks of feature
points on the 3D polygon. Based on the rank of the track matrix we can
determine if the 3D polygon under consideration behaves rigidly or
undergoes non-rigid deformation.
Associated publication: Saad M. Khan, Mubarak Shah, Detecting Group Activities
using Rigidity of Formation, ACM Multimedia, Singapore, 2005.
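The rank test can be sketched on synthetic data as follows. This is an illustration under stated assumptions, not the method from the paper: it assumes an orthographic (affine) camera, under which the mean-centered track matrix of a rigidly moving point set has rank at most 3, and the function name `track_matrix_rank` and the tolerance `rel_tol` are made up for this example. Real, noisy tracks would need a looser tolerance.

```python
import numpy as np

def track_matrix_rank(tracks, rel_tol=1e-6):
    """Effective rank of the centered track matrix.

    tracks: array of shape (num_frames, num_points, 2) holding image positions.
    """
    W = np.vstack([tracks[:, :, 0], tracks[:, :, 1]])  # shape (2F, P)
    W = W - W.mean(axis=1, keepdims=True)              # remove per-frame translation
    s = np.linalg.svd(W, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

rng = np.random.default_rng(0)
num_frames, num_points = 20, 8
shape3d = rng.normal(size=(num_points, 3))  # one corner per participating entity

# Rigid formation: the same 3D shape under a random rotation and translation per frame.
rigid = np.empty((num_frames, num_points, 2))
for f in range(num_frames):
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(Q) < 0:
        Q[:, 0] *= -1                       # make it a proper rotation
    moved = shape3d @ Q.T + rng.normal(size=3)
    rigid[f] = moved[:, :2]                 # orthographic projection drops depth

# Non-rigid crowd: every entity follows its own independent random walk.
nonrigid = np.cumsum(rng.normal(size=(num_frames, num_points, 2)), axis=0)

rigid_rank = track_matrix_rank(rigid)
nonrigid_rank = track_matrix_rank(nonrigid)
print("rigid rank:", rigid_rank, "non-rigid rank:", nonrigid_rank)
```

A rank of at most 3 is consistent with a rigid formation, while a higher rank indicates non-rigid deformation.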

Figure 1: (a) An example of a parade scene. The rigidity of the formation (the red
polygon) characterizes the parade activity. (b) The diagram gives an
outline of our proposed method for classification of rigidity in
formation.
Figure 2: Each row contains frames from a
video sequence of multiple people walking. The frames are overlaid with
tracks of the participating people. The first two sequences are
detected as structured activities (parades) while the last two
sequences are random crowds.
Image Based 3D Reconstruction
In this project we developed a purely image-based approach to fusing
foreground silhouette information from multiple arbitrary views for
applications like 3D reconstruction. Our approach does not require 3D
constructs like camera calibration to carve out 3D voxels or project
visual cones in 3D space. Using planar homographies and foreground
likelihood information from a set of arbitrary views, we show that
visual hull intersection can be performed in the image plane without
going into 3D space. This process delivers a 2D grid of object
occupancy likelihoods representing a cross-sectional slice of the
object. Subsequent slices of the object are obtained by extending the
process to planes parallel to a reference plane in a direction along
the body of the object (see figure 1).
Figure 2 shows the 3D reconstruction results. A detailed narration
video describing the work can be downloaded from the link below.
Associated publication: Saad M. Khan, Pingkun Yan, Mubarak Shah, A Homographic
Framework for the Fusion of Multi-view Silhouettes,
International Conference on Computer Vision, Rio de Janeiro, Brazil, 2007.
Narration Video Download (100 MB)
Results Videos (.zip file 27 MB)
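Computing a single slice in the image plane can be sketched as below. This is a simplified illustration, not the published implementation: it assumes the per-view foreground likelihood maps are fused by a pixelwise product after warping, it uses nearest-neighbor sampling, and the toy homographies are pure translations. The function names `warp_to_reference` and `fuse_slice` are made up for this example.

```python
import numpy as np

def warp_to_reference(likelihood, H_view_to_ref, out_shape):
    """Warp a foreground likelihood map into the reference view (nearest neighbor)."""
    h, w = out_shape
    ys, xs = np.mgrid[0:h, 0:w]
    ref_pts = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    src = np.linalg.inv(H_view_to_ref) @ ref_pts  # where each reference pixel comes from
    sx = np.rint(src[0] / src[2]).astype(int)
    sy = np.rint(src[1] / src[2]).astype(int)
    inside = (sx >= 0) & (sx < likelihood.shape[1]) & (sy >= 0) & (sy < likelihood.shape[0])
    out = np.zeros(h * w)
    out[inside] = likelihood[sy[inside], sx[inside]]
    return out.reshape(h, w)

def fuse_slice(likelihoods, homographies, out_shape):
    """Pixelwise product of all warped maps: a 2D grid of occupancy likelihoods."""
    fused = np.ones(out_shape)
    for likelihood, H in zip(likelihoods, homographies):
        fused *= warp_to_reference(likelihood, H, out_shape)
    return fused

# Toy data: two 32x32 views, each containing a square foreground blob.
view1 = np.zeros((32, 32)); view1[10:20, 10:20] = 1.0
view2 = np.zeros((32, 32)); view2[8:18, 14:24] = 1.0
H1 = np.eye(3)                                                        # view 1 is the reference
H2 = np.array([[1.0, 0.0, -2.0], [0.0, 1.0, 2.0], [0.0, 0.0, 1.0]])  # shift (x, y) by (-2, +2)

slice_map = fuse_slice([view1, view2], [H1, H2], (32, 32))
print("occupied pixels in the slice:", int(slice_map.sum()))
```

Only pixels that are foreground in every warped view survive the product, which is the image-plane counterpart of intersecting visual cones. Repeating this with the homographies of planes parallel to the reference plane would give further slices.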

Figure 1: Visual hull intersection at
multiple planes in the scene. Each delivers a slice of the object.

Figure 2: The 3D reconstruction results from
some of our experiments. The images on the right are zoomed-in
views of the reconstructed objects. Notice the slices.
View Invariant Object Recognition
In this project we worked on developing a view invariant object class
recognition system. Instead of using a complicated mechanism for
relating multiple 2D training views, we establish spatial connections
between different views by mapping them directly to the surface of a 3D
model representing the shape of the object. The 3D shape of an object
is reconstructed using the approach described in the Image Based 3D Reconstruction project above.
Features are computed in each 2D model view and mapped to the 3D shape
model using the same homographic framework. To generalize the model for
object class detection, features from supplemental views are also
considered. A codebook is constructed from all of these features and
then a 3D feature model is built. Given a 2D test image,
correspondences between the 3D feature model and the testing view are
identified by matching the detected features. Based on the 3D locations
of the corresponding features, several hypotheses of viewing planes can
be made. The one with the highest confidence is then used to detect the
object using feature location matching. We evaluated the proposed
method on the PASCAL VOC challenge dataset, with
promising results.
Associated publication: Pingkun Yan, Saad M. Khan, Mubarak Shah, 3D Model
based Object Recognition from Arbitrary View,
International Conference on Computer Vision, Rio de Janeiro, Brazil, 2007.
Figure 1: Construction of the 3D feature model for motorbikes. The 3D shape
model of a motorbike (at center) is constructed using the model views
(images on the inner circle) taken around the object from different
viewpoints. Supplemental images (outer circle) of different motorbikes
are obtained using Google’s image search.
Figure 2: Detection of motorbikes using our approach. The ground
truth is shown in green and red boxes display our detected results.
ICCV Challenge Contest: Where Am I?
This contest was held at the International Conference on Computer Vision, 2005.
Given a set of images taken at known GPS locations, the problem was to find the GPS
locations of test images taken at unknown locations but roughly in the
same area (view overlap). Our team consisted of five researchers
including myself. We received an honorable mention at the conference for
achieving fourth place amongst twenty participating teams from top
universities all over the world.
Figure 1: The basic task. Given a set of images
taken at known GPS locations (a), we had to find the GPS locations of
unlabeled pictures (b).

Figure 2: A visualization of our results from the contest. The satellite
picture at the bottom is overlaid with our localization results. Blue
marks our localization and red circles are ground truth. Yellow links
show error in localization.
Route Panoramas and Localization
With traditional SLAM/SFM based approaches, the task of localization becomes
intractable when the area under investigation reaches city/town size.
The amount of data (pictures/videos) required to comprehensively map a city
visually can be prohibitive for most search algorithms. To
circumvent this problem, in this project we visually map the area using
route panoramas. Route panoramas (pushbroom images) provide a compact
yet comprehensive medium to visually capture large areas. They are
easily generated by mounting a camera on a moving vehicle (car) and
driving around town. Given a query image taken at an arbitrary location
in the area, we show that we can accurately recover the location of the
camera by finding its epipole in the route panorama of the scene. To
this end we have shown that there exists a fundamental matrix between a
route panorama and a perspective image of the same scene. The
fundamental matrix is calculated using feature matches as
correspondences between the query image and the route panorama.
Associated publication: Saad M. Khan, Fahd Rafi, Mubarak Shah,
Where Was the Picture Taken: Image Localization in Route Panoramas
using Epipolar Geometry, International Conference
on Multimedia and Expo (ICME), Toronto, Canada, 2006.
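How a route panorama can be assembled from video is sketched below. The text above does not spell out the construction, so this sketch assumes the common slit-scan approach: one fixed pixel column is taken from every frame of a sideways-looking camera, and the columns are placed side by side in time order. The function name `build_route_panorama` and the synthetic frames are made up for this example.

```python
import numpy as np

def build_route_panorama(frames, column=None):
    """Stack one fixed pixel column per frame into a pushbroom-style image.

    frames: iterable of arrays shaped (H, W) or (H, W, C), in temporal order.
    column: index of the column to sample (defaults to the image center).
    """
    frames = list(frames)
    if column is None:
        column = frames[0].shape[1] // 2
    # Each frame contributes a single vertical slit; time becomes the horizontal axis.
    return np.stack([frame[:, column] for frame in frames], axis=1)

# Synthetic video: frame k is a constant image with intensity k.
frames = [np.full((4, 6), k) for k in range(5)]
panorama = build_route_panorama(frames)
print(panorama.shape)  # (4, 5): image height by number of frames
```

Because every column comes from a different camera position along the route, the result is orthographic along the direction of travel and perspective vertically, which matches the linear pushbroom model.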
Figure 1: An
example route panorama. The route panorama is closely approximated by a
linear pushbroom camera (ortho-perspective). The ortho-perspective
projection (orthographic horizontally and perspective vertically)
means closer objects like cars, trees and poles get squeezed in, while
important landmarks, like buildings that are further away, are adequately
captured.

Figure 2: Demonstration of the epipolar geometry between a perspective and
linear pushbroom camera used in our approach to link up perspective and
pushbroom (route panoramas) images.
 Figure 3: Some results. (a) is the route panorama. (b) is the test perspective
image captured in the scene. (c) shows the epipole of the test image in
the route panorama. (d) shows the epipole trajectory in the test image
i.e. the projection of the camera trajectory used to construct the route
panorama.
Autonomous Vision Based Navigation of a UAV
Autonomous control of unmanned vehicles is an important area of
research in artificial intelligence. Intelligence in an autonomous
robot must include strategies for mobility in order to achieve
higher-level functional tasks. This project aims at a system for an Unmanned
Aerial Vehicle (UAV) following moving targets on the ground. The UAV has
physical constraints on airspeed and maneuverability. The target
however can move freely and in any general pattern. We assume minimum
knowledge about the target while navigating the aircraft. The system
includes visual tracking of the target with a camera mounted on the
UAV. The camera is also controlled by a closed-loop algorithm
according to the position and orientation of the aircraft and the
position of the target. Aircraft stabilization and interpretation is
performed using the "Piccolo" autopilot system by Cloud Cap Technology.
The navigation and visual processing are performed on computers on the
ground, from which control commands are sent to the aircraft
wirelessly. The figure below shows our system architecture and pictures
of the individual components.
Associated publication: Fahd Rafi, Saad M. Khan, Khurram Shafiq, Mubarak Shah,
Autonomous Target Following by Unmanned Aerial Vehicles,
SPIE Defense and Security Symposium 2006, Orlando, FL.
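The dependence of the camera pointing on the aircraft pose and the target position can be illustrated with a simplified sketch. It is not the controller used in the project: it assumes level flight (roll and pitch are ignored), positions in a local north-east-down frame in meters, and a camera mount with pan and tilt axes. The function name `gimbal_angles` is made up for this example.

```python
import math

def gimbal_angles(aircraft_ned, yaw_rad, target_ned):
    """Pan and tilt (radians) that point a camera from the aircraft at the target.

    aircraft_ned, target_ned: (north, east, down) positions in meters.
    yaw_rad: aircraft heading, measured clockwise from north.
    Assumes level flight; roll and pitch are ignored in this sketch.
    """
    dn = target_ned[0] - aircraft_ned[0]
    de = target_ned[1] - aircraft_ned[1]
    dd = target_ned[2] - aircraft_ned[2]
    bearing = math.atan2(de, dn)                               # direction to the target from north
    pan = (bearing - yaw_rad + math.pi) % (2 * math.pi) - math.pi  # relative to the nose, in [-pi, pi)
    tilt = math.atan2(dd, math.hypot(dn, de))                  # depression angle below the horizon
    return pan, tilt

# Aircraft 100 m above the ground heading north; target 100 m ahead on the ground.
pan, tilt = gimbal_angles((0.0, 0.0, -100.0), 0.0, (100.0, 0.0, 0.0))
print(math.degrees(pan), math.degrees(tilt))
```

In a closed loop these angles would be recomputed whenever a new aircraft pose or target estimate arrives, and a full version would also compensate for roll and pitch.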
Figure: (a) The system architecture. (b) The Piccolo avionics system. (c) The UAV. (d) The ground station.