Learning Pedestrian Dynamics from the Real World

In this paper we describe a method for learning the parameters that govern pedestrian motion from video data. Our learning framework is based on variational mode learning and allows us to efficiently optimize a continuous pedestrian cost model. We show that this model can be trained on automatic tracking results and that it produces realistic and accurate pedestrian motions.

Published in ICCV 09



Vision Surveillance

A non-technical talk that introduces undergraduate-level students to computer vision problems in surveillance. It covers many different applications and aspects of automatic surveillance.



3DSIFT Source Code

This MATLAB code is meant for research purposes only.

The code has changed in several ways since the initial publication, some subtle and some not. The most significant change is the use of a tessellation method to calculate the orientation bins. Our testing has shown improved results; however, rotational invariance has not yet been re-implemented. Rotational invariance is useful in some applications and useless in others, so we have focused our time elsewhere. Another notable change is that some points are now eliminated for lack of descriptive information (multiple gradient orientations). This behavior is controlled by a flag and can be turned on or off. I suggest leaving it on and writing your frontend so that 3DSIFT can refuse points, as this has also proven very effective in our testing.

Please see the README file for more detailed and up-to-date information.


3DSIFT Release History

v1.0 - Initial Public Release

A 3-Dimensional SIFT Descriptor and Its Application to Action Recognition

In this paper we introduce a 3-dimensional (3D) SIFT descriptor for video or 3D imagery such as MRI data. We also show how this new descriptor is able to better represent the 3D nature of video data in the application of action recognition. This paper will show how 3D SIFT is able to outperform previously used description methods in an elegant and efficient manner. We use a bag of words approach to represent videos, and present a method to discover relationships between spatio-temporal words in order to better describe the video data.

Published in ACM MM 07
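
As a rough illustration of the bag of words representation mentioned in the abstract above, the sketch below assigns each descriptor to its nearest codebook word and builds a normalized histogram for one video. It is a minimal sketch, not the paper's implementation: the function names are hypothetical, the codebook is assumed to be given (for example, from clustering training descriptors), and the step that discovers relationships between spatio-temporal words is not shown.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(descriptor, codebook):
    """Index of the codebook entry closest to the descriptor."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(descriptor, codebook[i]))


def bag_of_words(descriptors, codebook):
    """Normalized histogram of codebook word occurrences for one video.

    descriptors: list of feature vectors extracted from the video
                 (toy 2-D vectors here, standing in for real descriptors).
    codebook:    list of word centers, assumed to be learned beforehand.
    """
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_word(d, codebook)] += 1
    total = sum(counts)
    if total == 0:
        # No descriptors: return an all-zero histogram.
        return [0.0] * len(codebook)
    return [c / total for c in counts]


# Toy example with a two-word codebook.
codebook = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 1.0], [0.0, 0.0], [9.0, 9.0], [1.0, 0.0]]
histogram = bag_of_words(descriptors, codebook)  # [0.75, 0.25]
```

Each video is then represented by a single fixed-length histogram, regardless of how many descriptors were extracted from it.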



Radon Based Texture Classification

An implementation and analysis of the paper "Radon Representation-Based Feature Descriptor for Texture Classification". We were able to reproduce many of the paper's experiments, and in this study we show our texture classification results using the Radon Representation Feature Descriptor (RRFD).





COCOA is a modular system capable of performing motion compensation, moving object detection, object tracking and indexing of videos taken from a camera mounted on a moving aerial platform (e.g. a UAV). To index a video, COCOA processes it through a number of stages. In the first stage, the motion of the aerial platform is compensated using one of the several frame-to-frame alignment methods available in COCOA. The second stage performs moving object detection using a hybrid approach that involves frame differencing, background modeling and object segmentation. In the third stage, foreground regions are tracked for as long as they remain within the camera's field of view. Finally, tracks are generated and analyzed with respect to the global mosaic reference frame. Interesting events are marked in these trajectories and subsequently used to index the video. In addition to these capabilities, COCOA provides a search feature that can be used to retrieve previously indexed videos from the database. COCOA is customizable to different sensor resolutions and is capable of tracking targets as small as 200 pixels. It works seamlessly in both visible and thermal imaging modes. The system is implemented in MATLAB and performs video processing in batch mode.

Project Webpage
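
The frame differencing component of the detection stage described above can be sketched as follows. This is only an illustrative sketch in Python, not COCOA's MATLAB code: it assumes the two frames have already been aligned by the motion compensation stage, it leaves out background modeling and object segmentation, and the function names and the threshold value are hypothetical.

```python
def frame_difference_mask(prev_frame, curr_frame, threshold=25):
    """Binary mask of pixels whose intensity changed by more than threshold.

    Frames are lists of rows of grayscale intensities and are assumed to be
    aligned to each other already. The default threshold is an arbitrary
    illustrative value.
    """
    if len(prev_frame) != len(curr_frame):
        raise ValueError("frames must have the same height")
    mask = []
    for prev_row, curr_row in zip(prev_frame, curr_frame):
        if len(prev_row) != len(curr_row):
            raise ValueError("frames must have the same width")
        mask.append([1 if abs(c - p) > threshold else 0
                     for p, c in zip(prev_row, curr_row)])
    return mask


def count_foreground(mask):
    """Number of pixels flagged as changed."""
    return sum(sum(row) for row in mask)


# Toy 2x3 frames: two pixels change by more than the threshold.
prev_frame = [[10, 10, 10],
              [10, 10, 10]]
curr_frame = [[10, 200, 10],
              [10, 10, 40]]
mask = frame_difference_mask(prev_frame, curr_frame)  # [[0, 1, 0], [0, 0, 1]]
changed = count_foreground(mask)                      # 2
```

On its own, simple differencing is sensitive to noise and residual misalignment, which is one reason a detector might combine it with background modeling and segmentation, as the description above says COCOA does.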

Video Registration

This paper is a short report on a method for registering video data, which was integrated into the COCOA GUI. The method handles data that may or may not contain accurate metadata.






Computer Vision Resources

Papers on the web - Read publications before they are (officially) published

Peter's Functions for Computer Vision - Essential functions

LibSVM - SVM library

mmread - Read (virtually) any media file into MATLAB

Piotr's MATLAB Toolbox - More useful functions

Netlab - Collection of algorithms for pattern recognition


About Me

I completed my Ph.D. at the University of Central Florida in 2011 under Marshall Tappen. I am currently employed by Microsoft.