Abstract:
Automated human action analysis has important applications in various domains such as
automated driving systems, video retrieval, video surveillance (for security purposes), elderly
care, and human-robot interactions. However, many problems in this area are quite
challenging and remain unsolved. The traditional problem of human action recognition involves
classifying videos into action class labels. This requires a robust video representation
technique and a good classifier that models the feature representations and accounts for
their variations. In real-time applications, one has to deal with continuous action videos in which
multiple actions are performed. In some cases (e.g., human-object interactions), one also
needs to consider actions at a local level, involving individual body parts and objects.
In this thesis, we propose approaches to several important problems in human action
analysis and support them with experimental analysis.
First, we propose to use skeleton information with an Eigen-joint frame representation and
apply a dynamic frame warping (DFW) framework and a bag-of-words (BOW) framework
for action recognition. Our approach can deal with variations in action duration. We
demonstrate that our method handles intra-class variations better and, as a result,
performs better than some contemporary methods. Our approach also works better with
fewer training examples than hidden Markov models (HMMs) and conditional
random fields (CRFs).
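
To make the idea of warping-based recognition concrete, the following is a minimal sketch of classic dynamic time warping (DTW), the standard dynamic-programming alignment that frame-warping methods are generally understood to build on. It is not the DFW framework of this thesis: the function names, the Euclidean frame distance, and the single-template-per-class setup are all assumptions made for illustration.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two per-frame feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b):
    """Classic DTW cost between two sequences of frame feature vectors."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best cost of aligning the first i frames of seq_a
    # with the first j frames of seq_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            # A frame may be repeated in either sequence or matched one-to-one,
            # which is what absorbs differences in action duration.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def classify(test_seq, templates):
    """Return the label whose (hypothetical) template has the lowest warping cost."""
    return min(templates, key=lambda label: dtw_distance(test_seq, templates[label]))
```

Under these assumptions, a slowly performed action and a quickly performed one of the same class can align at low cost, because the warping path is free to repeat frames.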
In the second part of the thesis, we consider the more challenging problem of human action
localization, which is important for continuous action recognition. In this problem, a particular
action is to be recognized in a test sequence of multiple actions performed in an unknown order.
We do not assume any knowledge of the starting and ending frames of each action. We propose
a greedy alignment algorithm that works in real time and extends the
dynamic frame warping framework. The notion of class templates in the DFW framework
helps account for intra-class variations, and the greedy alignment algorithm lets the
framework run in real time, unlike the dynamic-programming-based dynamic frame
warping framework.
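
The contrast between dynamic-programming alignment and greedy alignment can be illustrated with a small hypothetical sketch. This is not the algorithm proposed in the thesis, whose details are not given in this abstract. It only shows the general trade-off: a greedy pass makes one irrevocable decision per incoming frame, so it can run as frames arrive, whereas dynamic programming fills a full cost table before committing to a path. The function name, the scalar frame features, and the stay-or-advance rule are all assumptions.

```python
def greedy_align(stream, template):
    """Align stream frames to template frames in a single left-to-right pass.

    Returns (total_cost, path), where path[i] is the template index matched
    to stream frame i. Earlier choices are never revised, unlike DTW.
    Frames are scalars here purely to keep the illustration short.
    """
    j = 0          # current position in the template
    total = 0.0
    path = []
    for frame in stream:
        cost = abs(frame - template[j])
        # Advance to the next template frame only if it matches strictly better.
        if j + 1 < len(template):
            advance_cost = abs(frame - template[j + 1])
            if advance_cost < cost:
                j += 1
                cost = advance_cost
        total += cost
        path.append(j)
    return total, path
```

Each incoming frame costs constant work in this sketch, at the price of possibly missing the globally optimal alignment that dynamic programming would find.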
In the third part of the thesis, we focus on the task of fine-grained manipulation action
classification, where hand-object interactions are involved. In this work, we use the grasp attributes
and motion-constraint information available in the Yale Human Grasping dataset.
We propose to use the grasp and motion-constraint information to classify the 455 object manipulation
actions present in this dataset. We compare the performance
of different classifiers on grasp information. We also compare object manipulation
action recognition accuracies using coarse-grained and fine-grained grasp information.