Independent subspace analysis for activity recognition and fine-grained classification. (MS)

Kumar, Muneder

URI: http://hdl.handle.net/123456789/383

Date: 2017-04-07

Abstract:

Visual recognition is a challenging problem that depends on the discriminative power and robustness of the features used by recognition techniques. These techniques have mainly focused on adapting hand-designed local features such as SIFT, HOG, and SURF, which do not scale to other modalities. Hence there is a paradigm shift from hand-designed local features to unsupervised learning, which extracts features directly from the raw data. Visual signals (images) can be modeled using independent subspace analysis (ISA), an extension of the general ICA model, which gives invariant features. ISA has been extended to large datasets to deliver a hierarchy of features by using convolution and stacking multiple ISA layers on top of each other. Although performance is good, training takes a significant amount of time on large datasets because of high computational complexity and a sequential implementation. Two different methods are proposed to speed up feature learning in multilayered ISA. The first method exploits the parallelism present in the data: MapReduce, a scalable programming model, is used to learn the parameters of the ISA model using multiple map-reduce functions over equal, disjoint sets of distributed data. The second method uses spatio-temporal interest point detectors to extract important blocks from video, which removes irrelevant video blocks. The latter not only increases speed but also improves classification accuracy. Different input-level modifications are also proposed, which increase classification performance. A dataset of human-water activities is also created for surveillance near water bodies, and the ISA network is applied to it for feature extraction and classification. Multilayered ISA is also used to extract features for fine-grained recognition of similar objects, i.e., categorizing various types of leaves, butterflies, and birds into subcategories such as breeds and species.
This architecture has three ISA layers to extract features from large image patches. Filters are learned by applying ISA to small image patches and are then convolved over a larger spatial region (image patch). Further, the ISA network is trained on discriminative patches that correspond to SIFT points and whose size is chosen based on classification accuracy. Adding more ISA layers significantly increases the percentage of true positives, while the computational cost is not affected because of the reduction in data size. The proposed approach is tested on leaf, butterfly, and bird datasets. Most techniques applied to leaves have focused on structural features, since leaves have fine edges. These fine edges are enhanced by applying contrast limited adaptive histogram equalization (CLAHE) to the leaf images. The hybrid technique that works best for the leaf dataset is the wavelet transform of patches taken around SIFT key points of the enhanced image. It is hypothesized that adding another ISA layer captures a larger spatial region and hence the complex structure present there. Together, these steps improve the percentage of true positives in classification by a significant amount. Features learned by ISA are also used for action recognition in RGB and depth videos, where cuboids are extracted around spatio-temporal interest points after normalizing the frame size of different videos. The resulting cuboids are concatenated to train a multilayered ISA model with two layers. Different datasets are used to test the framework, namely the MSR-Action3D, MSR Daily Activity 3D, and UTD multimodal human action datasets, which have 20, 16, and 27 activities respectively.
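
The invariance attributed to ISA features can be illustrated with a minimal sketch of the pooling step in a single ISA layer. It assumes the commonly used formulation in which linear filter responses are squared, summed within fixed non-overlapping subspaces, and passed through a square root. The function name `isa_activations`, the grouping of consecutive filters into subspaces, and the toy filter values are assumptions for illustration; the abstract does not state the exact configuration used in the thesis.

```python
import math


def isa_activations(x, W, subspace_size):
    """Return one pooled activation per subspace for input vector x.

    W is a list of filter rows (the linear first stage of the layer).
    Consecutive groups of `subspace_size` rows are treated as one
    subspace; this fixed grouping is an assumption of the sketch.
    """
    if len(W) % subspace_size != 0:
        raise ValueError("number of filters must be a multiple of subspace_size")

    # Linear stage: dot product of each filter with the input.
    responses = [sum(w_k * x_k for w_k, x_k in zip(row, x)) for row in W]

    # Pooling stage: square root of the summed squared responses per subspace.
    pooled = []
    for start in range(0, len(responses), subspace_size):
        group = responses[start:start + subspace_size]
        pooled.append(math.sqrt(sum(r * r for r in group)))
    return pooled


# Toy example: two orthogonal filters forming a single subspace.
W_toy = [[1.0, 0.0],
         [0.0, 1.0]]
a = isa_activations([3.0, 4.0], W_toy, subspace_size=2)
b = isa_activations([4.0, 3.0], W_toy, subspace_size=2)
```

In this toy case the two inputs differ only in how their energy is distributed inside the subspace, so `a` and `b` are equal. That is the sense in which the pooled output is invariant to changes within a subspace while still responding to the energy in that subspace.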