CAP 6412 - 01 - Advanced Computer Vision

 Fall 2003
T R: 4:30-5:45pm
ENG2 302

Credit hours: 3

Office hours: Mondays 2:00-3:00PM, Tuesdays and Thursdays 3:00PM to 4:00PM, and by appointment
Office Location: CSB 238 

Instructor: Mubarak Shah, email:

Course Web Page:

TA: Paul Smith, email:, Office hours: 11:00am -12:00pm, Friday, CCI 202 (TA office)

Course Goals:

To prepare students for graduate research in computer vision.

Course Description:

Review recent advances in computer vision. 

Required and Optional texts:

No textbook. 

Course Prerequisites:


Exam and Grading Policy:

Reports                      30%
Discussion and Attendance    20%
Homework                     10%
Programs/Project             40%
No exam



Class Policy:

The University Golden Rules will be observed in this class. Copying or plagiarism is a violation of the Golden Rules.

Some Tips on Reading Research Papers:

1. You have to read the paper several times to understand it. When you read the paper the first time, if you do not understand something, do not get stuck; keep reading, assuming you will figure it out later. When you read it the second time, you will understand much more, and the third time even more ...

2. Try first to get a general idea of the paper: What problem is being solved? What are the main steps? How can I implement the method, even if I do not yet understand why each step is performed the way it is?

3. Try to relate the method to other methods you know, and conceptually find similarities and differences.

4. In the first reading it may be a good idea to skip the related work: since you do not know all the other papers, they will only confuse you more.

5. Do not use a dictionary just to look up technical terms like particle filters or maximum likelihood. They are concepts, and dictionaries do not define them; a dictionary will give you the literal meanings, which may not be useful.

6. Try to understand each concept in isolation, and then integrate them to understand the whole paper. For instance, the paper "Feature Integration with adaptive weights in a sequential Monte Carlo Tracker" looks quite complex at first, because it uses Monte Carlo methods, particle filters, likelihoods, etc. But try to understand the gist of it. The paper is about tracking, and you already know a few tracking methods. It uses features: color histograms, templates in correlation, shape, etc. You know these features, and you have used them. The probabilities obtained from each feature are combined (fused) to achieve tracking. How will you combine the probabilities or confidences of the features: multiply, add, apply a threshold and then add ...?
The particle filter/condensation method is already available in the Intel OpenCV library. Use it, get some idea of how it works and what the parameters are, then go back and read the paper again ... If you keep doing this for one week, you will understand a lot about that paper! The next week you do the second paper, and so on ...
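
As a small illustration of the fusion question raised in tip 6, the sketch below compares three simple ways of combining per-feature scores for a set of candidate locations: multiplying, adding, and thresholding before adding. It is not the method of the paper mentioned above (which adapts the feature weights over time); the function names, the threshold value, and the example scores are all hypothetical and chosen only for illustration.

```python
from math import prod


def normalize(scores):
    """Scale non-negative scores so they sum to 1 (uniform if all are zero)."""
    total = sum(scores)
    if total == 0:
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]


def fuse_product(per_feature):
    """Multiply the feature scores of each candidate (treats features as independent)."""
    return normalize([prod(candidate) for candidate in zip(*per_feature)])


def fuse_sum(per_feature, weights=None):
    """Weighted sum of the feature scores of each candidate (equal weights by default)."""
    weights = weights or [1.0] * len(per_feature)
    return normalize([sum(w * s for w, s in zip(weights, candidate))
                      for candidate in zip(*per_feature)])


def fuse_threshold_sum(per_feature, threshold=0.2):
    """Discard scores below the threshold, then add the remaining ones."""
    return normalize([sum(s for s in candidate if s >= threshold)
                      for candidate in zip(*per_feature)])


# Hypothetical scores for three candidate locations from two features.
color = [0.6, 0.3, 0.1]      # e.g. color-histogram similarity
template = [0.2, 0.7, 0.1]   # e.g. template-correlation score
features = [color, template]

product_weights = fuse_product(features)
sum_weights = fuse_sum(features)
threshold_weights = fuse_threshold_sum(features)

print("product:  ", product_weights)
print("sum:      ", sum_weights)
print("threshold:", threshold_weights)
```

Multiplying lets a single low score veto a candidate, while adding is more forgiving when one feature fails; trying both on your own data is a quick way to get a feel for the trade-off before going back to the paper.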

Research Tip in MIT:

August 26: Lecture 1
Computer Vision Story and The Changing Shape of Computer Vision in the Twenty-First Century

Paper List:

August 28: Bilateral Filtering, Presented by Jiangjian Xiao
C. Tomasi, R. Manduchi, "Bilateral filtering for gray and color images", ICCV 1998.

Related papers:
D. Comaniciu, P. Meer, "Mean Shift: A Robust Approach toward Feature Space Analysis", IEEE Trans. Pattern Analysis Machine Intell., Vol. 24, No. 5, 2002, code.
F. Durand, J. Dorsey, "Fast Bilateral Filtering for the Display of High-Dynamic-Range Images", SIGGRAPH 2002.
S. Fleishman, I. Drori, D. Cohen-Or, "Bilateral Mesh Denoising", SIGGRAPH 2003.
T. Jones, F. Durand, M. Desbrun. "Non-Iterative, Feature-Preserving Mesh Smoothing", SIGGRAPH 2003, code.

September 2: Video Matching and Tracking, Presented by Omar Javed
B. Li, R. Chellappa, Q. Zheng, and S. Der, "Model-Based Temporal Object Verification Using Video", IEEE Trans. Image Processing, 2001.

Related papers:
B. Li, R. Chellappa, Q. Zheng, and S. Der, "Experimental Evaluation of FLIR ATR Algorithms", Computer Vision and Image Understanding, 2001.

September 4-9: Image Inpainting, Restoration, and Completion, Presented by Alper Yilmaz
D. Tschumperle, R. Deriche, "Vector-Valued Image Regularization with PDE's: A Common Framework for Different Applications", CVPR 2003.

Related papers:
I. Drori, D. Cohen-Or, H. Yeshurun, "Fragment-Based Image Completion", SIGGRAPH 2003.
T. Chan, J. Shen, "Variational Image Inpainting", December 2002. Related link.
M. Bertalmío, G. Sapiro, V. Caselles and C. Ballester, "Image Inpainting", SIGGRAPH 2000. Related link.

September 11: Image Editing, Presented by Yunjun Zhang
P. Pérez, M. Gangnet, A. Blake, "Poisson Image Editing", SIGGRAPH 2003.

September 16-18: Subspace Constraints, Presented by Alexei Gritai
M. Irani, "Multi-Frame Correspondence Estimation Using Subspace Constraints", IJCV 2002.

Related papers:
L. Zelnik-Manor, M. Irani, "Multi View Subspace Constraints on Homographies", ICCV 1999.
Q. Ke, T. Kanade, "A Robust Subspace Approach to Layer Extraction", IEEE Workshop on Motion and Video Computing 2002.

September 23-25: Normalized Cuts and Graph Cuts, Presented by Mehmet Baris Caglar
J. Shi, J. Malik, "Normalized Cuts and Image Segmentation", IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), 888-905, August 2000.

Related papers:
Vladimir Kolmogorov and Ramin Zabih, "Multi-camera Scene Reconstruction via Graph Cuts", ECCV 2002.
Vladimir Kolmogorov and Ramin Zabih, "What Energy Functions can be Minimized via Graph Cuts?", ECCV 2002.

September 30, 4:30 PM: Demo of Program I, in  CSB103 Computer Vision lab, No class

October 2: No class

October 3 (Friday 4:30-6:00pm Makeup class,  CSB 109): Texture Synthesis, Presented by Adeel Bhutta
V. Kwatra, I. Essa, A. Schödl, G. Turk, A. Bobick, "Graphcut Textures: Image and Video Synthesis Using Graph Cuts", SIGGRAPH 2003.

Related papers:
M. Cohen, J. Shade, S. Hiller, O. Deussen, "Wang Tiles for Image and Texture Generation", SIGGRAPH 2003.
A. Efros, W. Freeman, "Image Quilting for Texture Synthesis and Transfer", SIGGRAPH 2001.
A. Schödl, I. Essa, R. Szeliski, D. Salesin, "Video Textures", SIGGRAPH 2000.

October 7: Video Surveillance, Presented by Imran N Junejo
Dimitrios Makris and Tim Ellis, "Path Detection in Video Surveillance", Image and Vision Computing Journal, vol.20/12, pp 895-903, October 2002. 

Related papers:
William Chen and Shih-Fu Chang, "Motion Trajectory Matching of Video Objects", SPIE.

October 9:  Homework and program discussion: Paul

October 14: Tracking, Presented by Jamal Alzeban
D. Comaniciu, V. Ramesh, P. Meer, "Kernel-Based Object Tracking", IEEE Trans. Pattern Analysis Machine Intell., Vol. 25, No. 5, 2003.

October 16: No class

October 21: CFG, Presented by Asaad Hakeem
D. Moore, I. Essa, "Recognizing Multitasked Activities using Stochastic Context-Free Grammar".

October 23: Shadow Transfer, Presented by Yunjun Zhang
Y. Chuang, D. Goldman, B. Curless, D. Salesin, and R. Szeliski, "Shadow Matting and Compositing", SIGGRAPH 2003. Related link.

October 24: (Friday 4:30-6:00pm Makeup class,  CSB 109) Homework Discussion

October 28: Activity Recognition
A. Efros, A. Berg, G. Mori, J. Malik, "Recognizing Action at a Distance", ICCV 2003.

October 30: Image Based Rendering, Presented by Garry Getting
A. Fitzgibbon, Y. Wexler, A. Zisserman, "Image-Based Rendering Using Image-Based Priors", ICCV 2003. Marr Prize.

November 4: No class

November 6: No class

November 11: No class, holiday

November 13: 4:30 PM: Demo of Program II, in  CSB103 Computer Vision lab, No class

November 18: Tracking, Presented by Asaad Hakeem, (Omar will moderate)
Robert T. Collins, Yanxi Liu. "On-Line Selection of Discriminative Tracking Features", ICCV 2003. Marr Prize.

November 20: Tracking, Presented by Adeel Bhutta
H. Tao, H. Sawhney, and R. Kumar, "Object Tracking with Bayesian Estimation of Dynamic Layer Representations", PAMI (24), 2002.

Note: This will be Program III

November 25: Tracking, Presented by Imran N Junejo
Paul Viola, Michael J. Jones, Daniel Snow. "Detecting Pedestrians using Patterns of Motion and Appearance", ICCV 2003.

November 27:  Thanksgiving (No class)

December 2 & 4: Presented by Khurram
X. Chen, Z. Tu, A.L. Yuille, S.C. Zhu. "Image Parsing: Segmentation, Detection and Recognition", ICCV 2003. Marr Prize.

December 11: 4:30 PM: Demo of Program III, in  CSB103 Computer Vision lab, No class

December 12: Last Lecture

Data Set and Code:

Image I/O tutorial and test data set.
Color Image I/O

Project 1:
Project 1 Description.
View Input Images Online.
Download Input Images.
Code of Bilateral filter and Mean_shift filter on gray image for comparison.

Project 2:
Project 2 Description.
Graph cut code from Cornell University.
Due Date: Nov 11, 2003.


Homework 1 Description. Solution 1.
Homework 2 Description. Solution 2.
Homework 3 Description.

Fall 2001
T Th 7:00 - 8:15pm
Engineering II 105

Prof. Mubarak Shah
238 CSB

Office Hours: 3-4PM Mondays, 6-7PM Tuesdays, 4-5PM Thursdays

Grading Policy

Reports                      30%
Discussion and Attendance    20%
Homework                     10%
Programs/Project             40%




Tuesday, August 21: Lecture-1

Thursday, August 23: Presentation by Omar
Yair Weiss, "Deriving Intrinsic Images from Image Sequences"

Tuesday, August 28: Presentation by Cen
Kojima and Tamura, "Natural Language Description of Human Activities from Video Images Based on Concept Hierarchy of Actions"

Thursday, August 30: Presentation by Alper
Tao and Huang, "Visual Estimation and Compression of Motion Parameters"

Tuesday, September 11: Presentation by Sohaib
Shum et al., "Rendering by Manifold Hopping"

Thursday, September 20: Presentation by Lisa Spencer
Fred Pighin, Modeling and Animating Realistic Faces from Images

Thursday, September 27: Presentation by Dr. Shah
Wen Yi Zhao and R. Chellappa, "Illumination-Insensitive Face Recognition Using Symmetric Shape-from-Shading", IEEE Conference on Computer Vision and Pattern Recognition, Vol. 1, 2000, pp. 286-293.

Tuesday, October 2: Presentation by Yaser Shaikh
Caspi and Irani, Alignment of Non-overlapping Sequences

Thursday, October 4: Presentation by Jiangjian Xiao

Tuesday October 23 Presentation by Paul Smith:
Seitz, S., Space of All Stereo Images

Thursday, October 25 Presentation by Yun Zhai  :
Karasaridis and E Simoncelli. A Filter Design Technique for Steerable Pyramid Image Transforms. Int'l Conf. Acoustics Speech and Signal Processing. Atlanta GA, May 1996.

Tuesday, October 30 Presentation by Dr. Shah:
Region Tracking via level set PDEs Without motion computation

Tuesday, November 6 Presentation by Ruifeng Xu:
Real-Time Tracking of Non-rigid objects using Mean Shift

Tuesday, November 20 Presentation by Xu:
Video georegistration: algorithm and quantitative evaluation Wiles, R.P.; Hirvonen, D.J.; Hsu, S.C.; Kumar, R.; Lehman, W.B.; Matei, B.; Zhao, W.-Y.

Monday November 26 Presentation by Orkun Alatas:
Shashua, A. et al On Projection Matrices

Saturday December 1 Presentation by Natan:
Toyama, K., Blake, A. Probabilistic Tracking in a Metric Space

Saturday December 1 by Panraphee:
Video Skimming

Spring 2000
T Th 5:30 - 6:45pm
Comm 116

Prof. Mubarak Shah

Thursday, Jan 13: Presentation by Jasmina
Tekalp et al., "Region Based Parametric Motion Segmentation Using Color Information" [postscript]

Friday, Jan 14: Presentation by Omar
Adelson, "Layered Representations for Vision and Video" [postscript]
Wang, Adelson, "Representing Moving Images with Layers" [PDF] (reference discussed in Omar's presentation, needed for implementation)

Tuesday, Jan 18: Presentation by Omar
G. Brostow, Irfan Essa, "Motion Based Decompositing of Video" [PDF]

Thursday, Jan 20
Seitz, Dyer, "View Morphing", [PDF]

Tuesday, Jan 25
Continue discussion of Seitz/Dyer Paper

Thursday, Jan 27: Presentation by Sohaib
Baker, Szeliski, Anandan, "A Layered Approach to Stereo reconstruction", [PDF]
Szeliski et. al., "Layered Depth Images", [PDF]
(NOTE: The second paper will not be discussed in class. It is the paper that was originally assigned. The first paper will be discussed.)

Tuesday, Feb 1: Demo of Program 1

Thursday, Feb 3: Presentation by Khaled
Irani, Rousso, Peleg, "Computing Occluding and Transparent Motions" [PS]

Friday, Feb 4 (makeup class for next Thursday):
Continued discussion of the last paper

Tuesday, Feb 8: Presentation by Cen
Halevy, Weinshall, "Motion of disturbances: detection and tracking of multibody non-rigid motion", [PDF]

Thursday, Feb 10: No class

Tuesday, Feb 15: Presentation by Matthew
Olson, Brill, "Moving Object Detection and Event Recognition Algorithms for Smart Cameras", [to be handed out in class]

Thursday, Feb 17: Presentation by Alper
Hartley, "In Defense of the 8-Point Algorithm", [PDF]

Tuesday, Feb 22: Presentation by Alper, Omar
Biometric identification
Machine Learning of Event Segmentation for News On Demand

Thursday, Feb 24: Presentation by Jasmina, Zeeshan, Sohaib
Complementary video and audio analysis for broadcast news archives
Transcribing Broadcast News for Audio and Video Indexing
Maestro: Conductor of Multimedia Analysis Technologies

Tuesday Feb 29: Demo of program

Thursday, March 2: Discussion on Homeworks

Tuesday, March 7: Presentation by Khaled
Pighin, Szeliski, Salesin, Resynthesizing Facial Animation through 3D Model-Based Tracking [PDF]

Thursday, March 9: Presentation by Sohaib
Marco La Cascia, Stan Sclaroff, Vassilis Athitsos, Fast Reliable Head Tracking under Varying Illumination: An Approach Based on Registration of Texture Mapped 3D Models [PS]

Tuesday, March 14, Thursday March 16
Spring Break :)

Tuesday, March 21: Presentation by Omar
Kumar, Sawhney, Asmuth, Pope, Hsu, Registration of Video to Geo-referenced Imagery [PDF]

Thursday, March 23:
No class today: Makeup to be announced later

Tuesday, March 28: Presentation by Sohaib
Irani, Anandan, Robust multi-sensor image alignment [PS]

Thursday, March 30: Presentation by Cen
Shum, He, Rendering with Concentric Mosaics [PDF]

Friday March 31 (Makeup class): Presentation by Zeeshan
J. K. Bryan, D. M. Bell, A. J. Lee, N. H. Carender, F. H. Baker, A New Image Registration Paradigm (handed out by Zeeshan)

Tuesday April 4: Presentation by Khaled
Zheng, Chellappa, "A Computational Vision Approach to Image Registration"

Thursday April 6:
Review of Registration Techniques

Tuesday April 11: Presentation by Alper
Volker Blanz and Thomas Vetter, A Morphable Model for the Synthesis of 3D Faces [PDF]

Thursday April 13: Presentation by Cen
Clark Olson, "A General Method for Geometric Feature Matching and Model Extraction"

Tuesday April 18: Presentation by Jasmina
Veenman, Reinders, Backer, "Qualitative Motion Correspondence: Unraveled and Resolved"
flower results

CAP 3930
Lecture 1

Image Processing Toolbox Tutorial

Face recognition slides

Data for face recognition program

Matlab files to read and write PGM files
ReadBinPGM.m WriteBinPGM.m

Images for Face Recognition
Training Images
Full Size Small Size

Testing Images
Full Size Small Size

Fundamentals of Computer Vision


taxi1 taxi2
yosemite1 yosemite2