YouTube Action Dataset

Related Publications:

1. Jingen Liu, Jiebo Luo and Mubarak Shah, Recognizing Realistic Actions from Videos "in the Wild", IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 2009.

2. Jingen Liu, Yang Yang and Mubarak Shah, Learning Semantic Visual Vocabularies using Diffusion Distance, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 2009.

3. Jingen Liu, Yang Yang, Imran Saleemi, and Mubarak Shah, Learning semantic features for action recognition via Diffusion Map, to appear in Journal of Computer Vision and Image Understanding, 2012.


1. The dataset contains 11 action categories: basketball shooting, biking/cycling, diving, golf swinging, horseback riding, soccer juggling, swinging, tennis swinging, trampoline jumping, volleyball spiking, and walking with a dog.

2. The dataset is very challenging due to large variations in camera motion, object appearance and pose, object scale, viewpoint, background clutter, and illumination conditions.

3. For each category, the videos are divided into 25 groups, each containing more than 4 action clips. The video clips in the same group may share some common features, such as the same actor, similar background, or similar viewpoint.

4. The videos are in MS MPEG-4 format. You need to install a suitable codec (e.g., the K-Lite Codec Pack, which contains a collection of codecs) to play them.

5. If you use this dataset, please refer to the following paper:
J. Liu, J. Luo and M. Shah, Recognizing realistic actions from videos "in the wild", CVPR 2009, Miami, FL. (For the biking and walking classes, we selected all the videos; for the remaining action classes, we selected the videos numbered 01 to 04 from each group. Leave-one-out cross-validation is used: in each round, one group is used for testing and the rest for training.)
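
The selection rule and the leave-one-group-out protocol described in item 5 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' code: the `Clip` record, the label spellings "biking" and "walking", and the function names are hypothetical, and the sketch assumes that group g of every category is held out together in round g, which the description above does not state explicitly.

```python
from collections import namedtuple

# Hypothetical record for one clip: category label, group number (1-25),
# and the clip's number within its group (1, 2, 3, ...).
Clip = namedtuple("Clip", ["category", "group", "index"])

# Classes for which every clip is kept (label spellings are assumed).
KEEP_ALL = {"biking", "walking"}


def select_clips(clips):
    """Keep all biking/walking clips; for other classes keep clips 01-04."""
    return [c for c in clips if c.category in KEEP_ALL or 1 <= c.index <= 4]


def leave_one_group_out(clips, num_groups=25):
    """Yield (held_out_group, train, test), one round per group.

    Assumption: group g of every category is held out together.
    """
    for g in range(1, num_groups + 1):
        test = [c for c in clips if c.group == g]
        train = [c for c in clips if c.group != g]
        yield g, train, test


# Toy data: 2 categories, 25 groups each, 5 clips per group.
toy = [
    Clip(cat, g, i)
    for cat in ("biking", "diving")
    for g in range(1, 26)
    for i in range(1, 6)
]
selected = select_clips(toy)
folds = list(leave_one_group_out(selected))
```

Because clips in the same group may share an actor or background, splitting by group rather than by individual clip keeps near-duplicate footage from appearing on both the training and the testing side.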



  1. UCF YouTube Action Dataset [about 424 MB]
  2. [New] UCF YouTube Action Dataset with bounding box annotations.