HON4D: Histogram of Oriented 4D Normals for Activity Recognition from Depth Sequences



Introduction

We present a new descriptor for activity recognition from videos acquired by a depth sensor. Previous descriptors mostly compute shape and motion features independently; thus, they often fail to capture the complex joint shape-motion cues at the pixel level. In contrast, we describe the depth sequence using a histogram capturing the distribution of the surface normal orientation in the 4D space of time, depth, and spatial coordinates. To build the histogram, we create 4D projectors, which quantize the 4D space and represent the possible directions of the 4D normal. We initialize the projectors using the vertices of a regular polychoron. Subsequently, we refine the projectors using a discriminative density measure, such that additional projectors are induced in the directions where the 4D normals are denser and more discriminative. Through extensive experiments, we demonstrate that our descriptor better captures the joint shape-motion cues in the depth sequence, and thus outperforms the state-of-the-art on all relevant benchmarks.
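The 4D surface normal described above can be computed directly from the depth sequence: treating depth as a surface z = f(x, y, t), the (unnormalized) normal is proportional to (-∂z/∂x, -∂z/∂y, -∂z/∂t, 1). A minimal NumPy sketch, assuming the sequence is a (T, H, W) array and using the hypothetical helper name `hon4d_normals`:

```python
import numpy as np

def hon4d_normals(depth_seq):
    """Compute unit 4D surface normals for a depth sequence.

    depth_seq: array of shape (T, H, W) with depth values z = f(x, y, t).
    Returns an array of shape (T, H, W, 4) holding the normalized
    vectors (-dz/dx, -dz/dy, -dz/dt, 1) at every pixel of every frame.
    """
    depth_seq = depth_seq.astype(np.float64)
    # np.gradient over a (T, H, W) array yields derivatives along t, y, x.
    dz_dt, dz_dy, dz_dx = np.gradient(depth_seq)
    ones = np.ones_like(depth_seq)
    normals = np.stack([-dz_dx, -dz_dy, -dz_dt, ones], axis=-1)
    # The last component is 1, so the norm is always >= 1 (safe to divide).
    normals /= np.linalg.norm(normals, axis=-1, keepdims=True)
    return normals
```

This is a sketch of the normal computation only; the projector initialization and refinement described above are separate steps.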



Proposed Method

We propose to capture the evolving shape in the depth sequence using a histogram of oriented 4D surface normals (HON4D). To construct HON4D, the 4D space is quantized using a regular polychoron (the 4D analogue of a polyhedron), namely the 600-cell. Subsequently, the quantization is refined using a novel discriminative density measure, which we compute along the quantized directions in the 4D space.
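The quantization step above can be sketched as follows: each unit 4D normal is projected onto the projector directions (the polychoron vertices), negative responses are clipped, and the responses are accumulated into a normalized histogram. A simplified illustration, assuming unit normals of shape (..., 4) and a (P, 4) array of unit projectors; the hypothetical function name and the use of one-sided inner-product voting are assumptions, and the discriminative refinement of projectors is omitted:

```python
import numpy as np

def hon4d_histogram(normals, projectors):
    """Accumulate a HON4D-style histogram over projector directions.

    normals:    array of shape (..., 4), unit 4D surface normals.
    projectors: array of shape (P, 4), unit quantization directions
                (e.g. vertices of the 600-cell polychoron).
    Returns an L1-normalized histogram of length P.
    """
    flat = normals.reshape(-1, 4)          # (N, 4) unit normals
    resp = flat @ projectors.T             # (N, P) inner products
    resp = np.maximum(resp, 0.0)           # keep only aligned components
    hist = resp.sum(axis=0)                # soft vote per direction
    total = hist.sum()
    return hist / total if total > 0 else hist
```

In practice the descriptor is computed per spatio-temporal cell and the per-cell histograms are concatenated; this sketch shows only the binning of a single set of normals.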


Experiments

We extensively evaluated the proposed method on three standard 3D activity datasets: MSR Sports Actions 3D, MSR Gesture 3D, and MSR Daily Activity 3D. Our descriptor outperforms all competing methods on these datasets. Please refer to the paper for the detailed results.


New dataset - MSR Action Pairs

We additionally collected a new type of 3D dataset, which we refer to as the 3D Action Pairs dataset. The actions in the new dataset are selected in pairs such that the two actions of each pair are similar in motion (they have similar trajectories) and in shape (they involve similar objects); however, the motion-shape relation is different. Our descriptor also outperforms other methods on this dataset. Please refer to the paper for the detailed results.