Chaotic Invariants for Human Action Recognition
Related Publication: Saad Ali, Arslan Basharat and Mubarak Shah, Chaotic Invariants for Human
Action Recognition, IEEE International Conference on
Computer Vision (ICCV), Rio de Janeiro, Brazil, October 14-20, 2007.
- Algorithmic Steps
- Related Links
The aim of this project is to derive a representation of the
dynamical system generating the human actions directly from the experimental
data. This is achieved by proposing a computational framework that uses
concepts from the theory of chaotic systems to model and analyze nonlinear
dynamics of human actions. The trajectories of human body joints are
used as the input representation of the action.
Our contributions include: 1) an investigation of the appropriateness
of the theory of chaotic systems for human action modelling and recognition,
2) a new set of features to characterize the nonlinear dynamics of human
actions, and 3) experimental validation of the feasibility and potential
merits of carrying out action recognition using methods from the theory
of chaotic systems.
This section describes the algorithmic steps of the proposed
action recognition framework. These are: i) given a video of an exemplar
action, obtain trajectories of reference body joints, and break each
trajectory into time series by considering each data dimension separately;
ii) obtain the chaotic structure of each time series by embedding it in
a phase space of an appropriate dimension using the mutual information
and false nearest neighbor algorithms; iii) apply a determinism test
to verify the existence of deterministic structure in the reconstructed
phase space; iv) represent the dynamical and metric structure of the reconstructed
phase space in terms of the phase space invariants; and v) generate a
global feature vector of the exemplar action by pooling invariants from
all time series, and use it in a classification algorithm. We now describe
each step of the algorithm in more detail in the following subsections.
(a) The block diagram of the proposed action recognition algorithm.
Trajectories of six body joints (two hands, two feet, head,
belly) are used for representing an action. The trajectories are normalized
with respect to the belly point, resulting in five trajectories per
action. In the case of the motion capture data set, each point of the trajectory
is represented by a three-dimensional coordinate (x,y,z).
In the case of the videos, we used a semi-supervised joint detection and tracking approach for generating these trajectories.
That is, we first extracted the body skeletons and their endpoints
by using morphological operations on the foreground silhouettes
of the actor. An initial set of trajectories was then generated by
joining the extracted joint locations using spatial and motion
similarity constraints. Broken trajectories and wrong
associations were corrected manually.
Trajectories for the ballet action from the motion capture data set.
Trajectories for the walk action from the video data set.
Next, each dimension of the trajectory is treated as a separate univariate time series.
The next figure shows these univariate time series for the walk action from the motion capture data set.
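The normalization and splitting described above can be sketched as follows. This is a minimal illustration, not the project's code: the function name `trajectories_to_series`, the dictionary layout, and the joint names are assumptions made for the example.

```python
import numpy as np

def trajectories_to_series(joints, reference="belly"):
    """Normalize joint trajectories and split them into univariate series.

    joints: dict mapping a joint name to an array of shape (T, D), where
            T is the number of frames and D is 2 (video) or 3 (mocap).
    Returns a dict mapping (joint name, dimension index) to a 1-D array.
    The reference joint itself is dropped after normalization.
    """
    ref = np.asarray(joints[reference], dtype=float)
    series = {}
    for name, traj in joints.items():
        if name == reference:
            continue
        # Express the joint position relative to the reference joint.
        rel = np.asarray(traj, dtype=float) - ref
        # Each coordinate becomes its own univariate time series.
        for d in range(rel.shape[1]):
            series[(name, d)] = rel[:, d]
    return series
```

With six joints in three dimensions, this yields 5 x 3 = 15 univariate time series per action.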
Embedding is a mapping from a one-dimensional space to an m-dimensional
space. It is an important part of the study of chaotic systems, as it
allows one to study the systems for which the state space variables
and the governing differential equations are unknown. The underlying
idea of embedding is that all the variables of a dynamical system
influence one another. Thus, every subsequent point of the given one
dimensional time series results from an intricate combination of the
influences of all the true state variables of the system. This observation
allows us to introduce a series of substitute variables to obtain
the whole m-dimensional phase space, where substitute variables carry
the same information as the original variables of the system. This
is pictorially described in the following figure:
The optimal values of the time delay tau and the embedding dimension m are
computed using the mutual information and false nearest neighbor algorithms, respectively. The details
of these algorithms are available in our ICCV 2007 publication.
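As a rough sketch of the embedding step, the code below builds delay vectors from a univariate series and picks tau as the first local minimum of a histogram-based mutual information estimate. The function names, the bin count, and the maximum lag are assumptions for illustration; the false nearest neighbor search for m is not shown.

```python
import numpy as np

def delay_embed(x, m, tau):
    """Return delay vectors [x(t), x(t+tau), ..., x(t+(m-1)tau)] as rows."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (m - 1) * tau
    if n <= 0:
        raise ValueError("time series too short for this m and tau")
    return np.column_stack([x[i * tau: i * tau + n] for i in range(m)])

def average_mutual_information(x, lag, bins=16):
    """Histogram estimate of the mutual information between x(t) and x(t+lag)."""
    x = np.asarray(x, dtype=float)
    joint, _, _ = np.histogram2d(x[:-lag], x[lag:], bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x(t)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of x(t+lag)
    nz = pxy > 0                          # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def first_minimum_lag(x, max_lag=50, bins=16):
    """Pick tau as the first local minimum of the mutual information curve."""
    ami = [average_mutual_information(x, lag, bins) for lag in range(1, max_lag + 1)]
    for i in range(1, len(ami) - 1):
        if ami[i] < ami[i - 1] and ami[i] <= ami[i + 1]:
            return i + 1                  # lags are 1-based
    return int(np.argmin(ami)) + 1        # fall back to the global minimum
```

A typical call would be `delay_embed(x, m=3, tau=first_minimum_lag(x))`, with m chosen separately.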
Three dimensional visualization of reconstructed phase spaces of different trajectories.
Metric, dynamical and topological organization of orbits
(trajectories) associated with the strange attractor of the reconstructed
phase space can be used to distinguish different strange attractors
representing different human actions. This organization is quantified
in terms of phase space invariants. In this project, we limit
ourselves only to metric and dynamical invariants which include:
i) Maximal Lyapunov Exponent, ii) Correlation Integral, and
iii) Correlation Dimension.
Maximal Lyapunov Exponent: The Lyapunov exponent is a dynamical invariant of the attractor, and
measures the exponential divergence of nearby trajectories in
the phase space. If the value of the maximal Lyapunov exponent is
greater than zero, the dynamics of the underlying system
are chaotic. In order to compute the maximal Lyapunov exponent of the
reconstructed phase space, we select a number of reference points and their neighboring points, and measure the average rate at which they diverge over time.
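One way to make the idea of tracking reference points and their neighbors concrete is a widely used nearest-neighbor divergence estimate, sketched below. It is not necessarily the exact procedure used in the paper; the function name and the parameters `n_steps`, `min_sep`, and `n_ref` are assumptions for this example.

```python
import numpy as np

def max_lyapunov(points, n_steps=20, min_sep=10, n_ref=None, seed=0):
    """Estimate the maximal Lyapunov exponent (per time step).

    points: array of shape (N, m) holding the reconstructed phase space.
    For each reference point, the nearest neighbor that is at least
    min_sep samples away in time is found, and the log distance between
    the two is followed for n_steps steps. The slope of the averaged
    log distance versus time is returned.
    """
    pts = np.asarray(points, dtype=float)
    usable = len(pts) - n_steps           # points that can be followed forward
    if usable <= min_sep + 1:
        raise ValueError("not enough points for the requested settings")
    refs = np.arange(usable)
    if n_ref is not None and n_ref < usable:
        rng = np.random.default_rng(seed)
        refs = rng.choice(usable, size=n_ref, replace=False)
    log_div = np.zeros(n_steps + 1)
    counts = np.zeros(n_steps + 1)
    time_idx = np.arange(usable)
    for i in refs:
        d = np.linalg.norm(pts[:usable] - pts[i], axis=1)
        d[np.abs(time_idx - i) <= min_sep] = np.inf   # exclude temporal neighbors
        j = int(np.argmin(d))
        if not np.isfinite(d[j]):
            continue
        sep = np.linalg.norm(pts[i:i + n_steps + 1] - pts[j:j + n_steps + 1], axis=1)
        ok = sep > 0                      # log is undefined for zero distance
        log_div[ok] += np.log(sep[ok])
        counts[ok] += 1
    valid = counts > 0
    steps = np.arange(n_steps + 1)[valid]
    mean_log = log_div[valid] / counts[valid]
    return float(np.polyfit(steps, mean_log, 1)[0])
```

The fit should only cover the steps before the divergence saturates at the size of the attractor, so `n_steps` needs to be chosen with the data in mind.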
Correlation Integral: The correlation integral is a metric invariant, which
characterizes the metric structure of the attractor by quantifying
the density of points in the phase space. It achieves this through
a normalized count of pairs of points lying within a certain radius.
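A minimal sketch of this normalized pair count is given below, assuming Euclidean distance and counting each unordered pair once. The function name is an assumption for the example, and the full pairwise distance matrix makes it suitable only for short series.

```python
import numpy as np

def correlation_integral(points, radii):
    """Fraction of point pairs closer than each radius.

    points: array of shape (N, m) from the reconstructed phase space.
    radii:  scalar or 1-D array of neighborhood radii.
    Returns an array with one value in [0, 1] per radius.
    """
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Keep each unordered pair (i < j) exactly once.
    pair_d = dist[np.triu_indices(n, k=1)]
    radii = np.atleast_1d(np.asarray(radii, dtype=float))
    return np.array([(pair_d < r).mean() for r in radii])
```

For three points at (0,0), (1,0), and (0,1), the pair distances are 1, 1, and about 1.41, so a radius of 1.2 captures two of the three pairs.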
Correlation Dimension: The correlation
dimension also characterizes the metric structure of the
attractor. It measures the change in the density of the phase space
with respect to the neighborhood radius $\epsilon$. The
correlation dimension can be computed from the correlation
integral by exploiting the power law relationship between the two.
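Under the power law $C(\epsilon) \propto \epsilon^{D}$ for small $\epsilon$, the correlation dimension $D$ is the slope of $\log C(\epsilon)$ against $\log \epsilon$. The sketch below fits that slope by least squares; the function name and the choice to discard saturated values (0 or 1) are assumptions for this example.

```python
import numpy as np

def correlation_dimension(radii, c_values):
    """Slope of log C(eps) versus log eps, fitted by least squares.

    radii:    1-D array of neighborhood radii.
    c_values: correlation integral evaluated at those radii.
    Radii where C is 0 or 1 are discarded because the power law
    cannot hold there.
    """
    radii = np.asarray(radii, dtype=float)
    c = np.asarray(c_values, dtype=float)
    keep = (radii > 0) & (c > 0) & (c < 1)
    if keep.sum() < 2:
        raise ValueError("need at least two usable radii for the fit")
    slope, _ = np.polyfit(np.log(radii[keep]), np.log(c[keep]), 1)
    return float(slope)
```

In practice the fit is restricted to the range of radii over which the log-log curve is close to linear.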
Experimental analysis is carried out on two data sets.
The first set of experiments is performed on the data set containing
3-dimensional motion capture sequences provided by FutureLight
(R&D division of Santa Monica Studios). In total, this data set
contains 155 sequences of 5 action classes, namely dance,
jump, run, sit, and walk, with 30,
14, 30, 33, and 48 instances, respectively. All five classes have
significant intra-class variations. The next figure
shows some example trajectories for each of the action classes.
Example Trajectories
Confusion Table - Obtained by Leave One Out Cross Validation
The second set of experiments is performed on the action data set used in M. Blank et
al., ``Actions as Space-Time Shapes'', ICCV, 2005. This data set contains actions performed by real actors. Specifically, the data set
contains 81 videos with 9 different actions performed by 9 different actors. The performance of our algorithm is depicted by the following confusion table.
Confusion Table - Obtained by Leave One Out Cross Validation
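The leave-one-out protocol behind such confusion tables can be sketched as follows. The classifier is not specified on this page, so a 1-nearest-neighbor rule on the pooled feature vectors is used here purely as a stand-in; the function name is likewise an assumption.

```python
import numpy as np

def leave_one_out_confusion(features, labels):
    """Leave-one-out confusion table using a 1-nearest-neighbor rule.

    features: array of shape (N, F), one pooled feature vector per action.
    labels:   length-N sequence of class labels.
    Returns (classes, confusion), where confusion[a, b] counts samples of
    true class classes[a] that were predicted as classes[b].
    """
    X = np.asarray(features, dtype=float)
    y = np.asarray(labels)
    classes = np.unique(y)
    index = {c: k for k, c in enumerate(classes)}
    confusion = np.zeros((len(classes), len(classes)), dtype=int)
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                     # hold out the test sample itself
        pred = y[int(np.argmin(d))]
        confusion[index[y[i]], index[pred]] += 1
    return classes, confusion
```

Each row of the returned table sums to the number of instances of that class.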
- Motion capture dataset used in the ICCV 2007 paper [22MB]
- Matlab script for reading the
motion capture data set
- Power Point Presentation [41MB]
The following list includes links to books, websites, and Matlab toolboxes that we benefited from during the course of the project.
- The Topology of Chaos: Alice in Stretch and Squeezeland
- Analysis of Observed Chaotic Data (Institute for Nonlinear Science)
- Matlab toolkit