Following is a selection of research projects I worked on during my graduate studies at UCF:
   Multi-view Tracking in Crowded Scenes
   Activity Recognition in Groups of People
   Image Based 3D Reconstruction
   View Invariant Object Recognition
   ICCV Challenge Contest: Where Am I?
   Route Panoramas for Localization
   TREC Video Retrieval Evaluation Forum (TRECVID)


Multi-view Tracking in Crowded Scenes
Occlusion and lack of visibility in crowded scenes make it very difficult to track individual people correctly and consistently. This problem is particularly hard to tackle with single-camera systems. We have developed a multi-view approach to tracking people in crowded scenes where people may be partially or completely occluding each other. Our approach uses multiple views in synergy, combining information from all views to detect objects. To achieve this we develop a planar homography constraint to resolve occlusions and robustly determine locations on the ground plane corresponding to the feet of the people. To find tracks, we obtain feet regions over a window of frames and stack them, creating a space-time volume. Feet regions belonging to the same person form contiguous spatio-temporal regions that are clustered using a graph cuts segmentation approach. Figure 1 below shows the various steps in our algorithm. Results on one of our test sequences are shown in Figure 2.
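As a rough illustration of the planar homography constraint, the sketch below warps per-view foreground-likelihood maps onto a common reference (ground) plane and multiplies them, so only locations that are foreground in every view, i.e. the feet, keep a high score. The function names, nearest-neighbour warping, and toy homographies are illustrative assumptions, not our actual implementation:

```python
import numpy as np

def warp_to_reference(likelihood, H, out_shape):
    """Warp a per-view foreground-likelihood map into the reference
    (ground-plane) view. H maps reference-plane coordinates (x, y, 1)
    to source-image coordinates; sampling is nearest-neighbour, and
    points that fall outside the source image get likelihood 0."""
    h_out, w_out = out_shape
    ys, xs = np.mgrid[0:h_out, 0:w_out]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = H @ pts
    src = src / src[2]
    sx = np.rint(src[0]).astype(int)
    sy = np.rint(src[1]).astype(int)
    valid = (sx >= 0) & (sx < likelihood.shape[1]) & \
            (sy >= 0) & (sy < likelihood.shape[0])
    out = np.zeros(xs.size)
    out[valid] = likelihood[sy[valid], sx[valid]]
    return out.reshape(out_shape)

def fuse_views(likelihoods, homographies, out_shape):
    """Multiply the warped likelihoods: only ground-plane locations that
    are foreground in every view (the feet) survive the product."""
    fused = np.ones(out_shape)
    for lik, H in zip(likelihoods, homographies):
        fused *= warp_to_reference(lik, H, out_shape)
    return fused
```

In the full system, the fused maps from a window of frames are stacked into a space-time volume and segmented with graph cuts; that step is not sketched here.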
Associated publication: Saad M. Khan, Mubarak Shah,
A Multiview Approach to Tracking People in Crowded Scenes using a Planar Homography Constraint, 9th European Conference on Computer Vision, Graz, Austria, 2006.

Figure 1: The various steps in our algorithm.

Figure 2: Tracking results on a sequence captured with four cameras.

Activity Recognition in Groups of People
In this project we developed a novel approach to recognizing the class of activities characterized by rigidity of formation, for example parades, airplane flight formations, or herds of animals. The central idea is to model the entire group as a collective rather than focusing on each individual separately. We model the formation as a 3D polygon, with each corner representing a participating entity (see Figure 1). Tracks of the entities are treated as tracks of feature points on the 3D polygon. Based on the rank of the track matrix, we can determine whether the 3D polygon under consideration behaves rigidly or undergoes non-rigid deformation.
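One common way to test rigidity from tracks, under an affine camera assumption, is to stack the trajectories into a 2F x P measurement matrix and examine its rank through singular values: a rigid 3D configuration keeps the centered matrix at rank 3 or less. The sketch below is an illustrative stand-in for our rank-based test, not the exact formulation from the paper:

```python
import numpy as np

def rigidity_score(tracks):
    """tracks: (F, P, 2) array -- P entities tracked over F frames.
    Stack the trajectories into a 2F x P measurement matrix. Under an
    affine camera, points on a rigid 3D configuration keep this matrix
    at rank <= 3 once per-frame centroids are removed, so singular-value
    energy beyond the third component signals non-rigid deformation.
    Returns that residual energy fraction (near 0 => rigid formation)."""
    F, P, _ = tracks.shape
    W = np.concatenate([tracks[:, :, 0], tracks[:, :, 1]], axis=0)
    W = W - W.mean(axis=1, keepdims=True)   # remove translation
    s = np.linalg.svd(W, compute_uv=False)
    return s[3:].sum() / s.sum()
```

A threshold on this score would then separate structured activities (parades) from random crowds.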
Associated publication: Saad M. Khan, Mubarak Shah,
Detecting Group Activities using Rigidity of Formation, ACM Multimedia, Singapore 2005.

Figure 1: (a) An example of a parade scene. The rigidity of the formation (the red polygon) characterizes the parade activity. (b) An outline of our proposed method for classifying rigidity of formation.


Figure 2: Each row contains frames from a video sequence of multiple people walking. The frames are overlaid with tracks of the participating people. The first two sequences are detected as structured activities (parades) while the last two sequences are random crowds.


Image Based 3D Reconstruction
In this project we developed a purely image-based approach to fusing foreground silhouette information from multiple arbitrary views for applications like 3D reconstruction. Our approach does not require camera calibration or 3D constructs such as carving voxels or projecting visual cones in 3D space. Using planar homographies and foreground likelihood information from a set of arbitrary views, we show that visual hull intersection can be performed in the image plane without going into 3D space. This process delivers a 2D grid of object occupancy likelihoods representing a cross-sectional slice of the object. Subsequent slices of the object are obtained by extending the process to planes parallel to a reference plane, in a direction along the body of the object (see Figure 1). Figure 2 shows the 3D reconstruction results. A detailed narration video describing the work can be downloaded from the link below.
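The plane sweep can be sketched as follows. Given a 3x4 camera projection matrix P with columns p1..p4, the homography induced by the plane z = gamma is [p1, p2, gamma*p3 + p4]; warping each view's foreground likelihood through it and multiplying yields one occupancy slice. Note that the actual approach works from homographies alone, without calibrated projection matrices; the calibrated parameterization below is only a convenient way to illustrate the slice computation:

```python
import numpy as np

def plane_homography(P, gamma):
    """Homography taking plane coordinates (X, Y, 1) on the plane
    z = gamma to the image of a camera with 3x4 projection matrix P:
    x ~ P [X, Y, gamma, 1]^T = X*p1 + Y*p2 + (gamma*p3 + p4)."""
    return np.column_stack([P[:, 0], P[:, 1], gamma * P[:, 2] + P[:, 3]])

def occupancy_slice(likelihoods, cams, gamma, grid):
    """Occupancy likelihoods of plane points (grid: (N, 2) array of
    (X, Y)) on the slice z = gamma: sample each view's foreground
    likelihood at the projected location and multiply across views."""
    n = len(grid)
    pts = np.column_stack([grid, np.ones(n)]).T      # 3 x N plane coords
    fused = np.ones(n)
    for lik, P in zip(likelihoods, cams):
        img = plane_homography(P, gamma) @ pts
        x = np.rint(img[0] / img[2]).astype(int)
        y = np.rint(img[1] / img[2]).astype(int)
        inside = (x >= 0) & (x < lik.shape[1]) & (y >= 0) & (y < lik.shape[0])
        vals = np.zeros(n)
        vals[inside] = lik[y[inside], x[inside]]
        fused *= vals
    return fused
```

Sweeping gamma over parallel planes and stacking the resulting slices gives the occupancy volume from which the object is reconstructed.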

Associated publication: Saad M. Khan, Pingkun Yan, Mubarak Shah,
A Homographic Framework for the Fusion of Multi-view Silhouettes, International Conference on Computer Vision (ICCV), Rio de Janeiro, Brazil, 2007.
Narration Video Download (100 MB)
Results Videos (.zip file 27 MB)

Figure 1: Visual hull intersection at multiple planes in the scene. Each plane delivers a slice of the object.


Figure 2: The 3D reconstruction results from some of our experiments. The images on the right are zoomed-in views of the reconstructed objects. Notice the slices.


View Invariant Object Recognition
In this project we worked on developing a view invariant object class recognition system. Instead of using a complicated mechanism for relating multiple 2D training views, we establish spatial connections between different views by mapping them directly to the surface of a 3D model representing the shape of the object. The 3D shape of an object is reconstructed using our approach described
here. Features are computed in each 2D model view and mapped to the 3D shape model using the same homographic framework. To generalize the model for object class detection, features from supplemental views are also considered. A codebook is constructed from all of these features, and a 3D feature model is then built. Given a 2D test image, correspondences between the 3D feature model and the test view are identified by matching the detected features. Based on the 3D locations of the corresponding features, several hypotheses of viewing planes can be made. The one with the highest confidence is then used to detect the object via feature location matching. The performance of the proposed method has been evaluated on the PASCAL VOC challenge dataset, with promising results.
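The matching step can be illustrated with a simple nearest-neighbour codebook lookup, where each codebook descriptor carries the 3D location it was mapped to on the shape model and a Lowe-style ratio test rejects ambiguous matches. This is a hypothetical simplification; our actual codebook construction and viewing-plane hypothesis voting are more involved:

```python
import numpy as np

def match_to_codebook(test_desc, codebook_desc, codebook_xyz, ratio=0.8):
    """Match descriptors from a 2D test image against a codebook in which
    every entry carries the 3D location it maps to on the shape model.
    A Lowe-style ratio test (best distance must clearly beat the
    second-best) rejects ambiguous matches. Returns a list of
    (test feature index, 3D model point) correspondences."""
    matches = []
    for i, d in enumerate(test_desc):
        dists = np.linalg.norm(codebook_desc - d, axis=1)
        order = np.argsort(dists)
        best, second = order[0], order[1]
        if dists[best] < ratio * dists[second]:
            matches.append((i, codebook_xyz[best]))
    return matches
```

The 3D locations of the surviving correspondences are what support the viewing-plane hypotheses described above.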
Associated publication: Pingkun Yan, Saad M. Khan, Mubarak Shah,
3D Model based Object Recognition from Arbitrary View, International Conference on Computer Vision (ICCV), Rio de Janeiro, Brazil, 2007.

Figure 1: Construction of the 3D feature model for motorbikes. The 3D shape model of a motorbike (at center) is constructed using the model views (images on the inner circle) taken around the object from different viewpoints. Supplemental images (outer circle) of different motorbikes are obtained using Google's image search.

Figure 2: Detection of motorbikes using our approach. The ground truth is shown in green; red boxes show our detections.


ICCV Challenge Contest: Where Am I?
This contest was held at the
International Conference on Computer Vision, 2005. Given a set of images taken at known GPS locations, the problem was to find the GPS locations of test images taken at unknown locations but roughly in the same area (with view overlap). Our team consisted of five researchers, including myself. We received an honorable mention at the conference for achieving fourth place among twenty participating teams from top universities around the world.


Figure 1: The figure demonstrates the basic task. Given a set of images taken at known GPS locations (a), we had to find the GPS locations of un-labeled pictures (b).

Figure 2: A visualization of our results from the contest. The satellite picture at the bottom is overlaid with our localization results. Blue marks our localization and red circles are ground truth. Yellow links show error in localization.


Route Panoramas and Localization
With traditional SLAM/SfM-based approaches, the task of localization becomes intractable when the area under investigation reaches city or town size. The amount of data (pictures/videos) required to comprehensively map a city visually can be prohibitive for most search algorithms. To circumvent this problem, in this project we visually map the area using route panoramas. Route panoramas (pushbroom images) provide a compact yet comprehensive medium to visually capture large areas. They are easily generated by mounting a camera on a moving vehicle and driving around town. Given a query image taken at an arbitrary location in the area, we show that we can accurately recover the location of the camera by finding its epipole in the route panorama of the scene. To this end, we have shown that a fundamental matrix exists between a route panorama and a perspective image of the same scene. The fundamental matrix is estimated from feature matches between the query image and the route panorama.
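The geometric core, estimating a fundamental matrix from correspondences and reading off the epipole as its null vector, can be sketched with the standard linear eight-point algorithm. This sketch uses two perspective images; the perspective-to-pushbroom fundamental matrix in our work has a different special form, so treat this purely as an illustration of the pipeline:

```python
import numpy as np

def eight_point(x1, x2):
    """Linear eight-point estimate of the fundamental matrix from N >= 8
    point correspondences (x2^T F x1 = 0). x1, x2: (N, 2) coordinates.
    Hartley normalization is omitted for brevity; on real, noisy matches
    a normalized and robust (e.g. RANSAC-based) estimator should be used."""
    A = [[u2*u1, u2*v1, u2, v2*u1, v2*v1, v2, u1, v1, 1.0]
         for (u1, v1), (u2, v2) in zip(x1, x2)]
    _, _, Vt = np.linalg.svd(np.asarray(A))
    F = Vt[-1].reshape(3, 3)
    U, s, Vt = np.linalg.svd(F)                  # enforce rank 2
    return U @ np.diag([s[0], s[1], 0.0]) @ Vt

def epipole(F):
    """Right null vector of F: the epipole in the first image, i.e. the
    image of the second camera's centre -- the recovered location."""
    _, _, Vt = np.linalg.svd(F)
    e = Vt[-1]
    return e / e[2]
```

In the paper's setting, the first image plays the role of the route panorama and the second that of the query image, so the epipole localizes the query camera along the panorama.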
Associated publications:
Saad M. Khan, Fahd Rafi, Mubarak Shah, Where Was the Picture Taken: Image Localization in Route Panoramas using Epipolar Geometry, International Conference on Multimedia and Expo (ICME), Toronto, Canada, 2006.

Figure 1: An example route panorama. The route panorama is closely approximated by a linear pushbroom camera (ortho-perspective). The ortho-perspective projection (orthographic horizontally, perspective vertically) means that closer objects like cars, trees, and poles get squeezed in, while important landmarks, like buildings that are farther away, are adequately captured.

Figure 2: Demonstration of the epipolar geometry between a perspective and linear pushbroom camera used in our approach to link up perspective and pushbroom (route panoramas) images.

Figure 3: Some results. (a) The route panorama. (b) The test perspective image captured in the scene. (c) The epipole of the test image in the route panorama. (d) The epipole trajectory in the test image, i.e., the projection of the camera trajectory used to construct the route panorama.