Abstract:
In this thesis, we address some issues in the classification of varying length patterns
of speech and scene images represented as sets of continuous valued feature
vectors using kernel methods. Kernels designed for varying length patterns
are called dynamic kernels. This thesis considers matching-based approaches for designing dynamic kernels.
The thesis first proposes a support vector machine (SVM) classifier that uses the
example-specific density based matching kernel (ESDMK) for varying
length patterns. The proposed kernel is computed between a pair of examples,
represented as sets of feature vectors, by matching the estimates
of the example-specific densities computed at every feature vector in those
two examples. The number of feature vectors of an example among the
k nearest neighbors of a feature vector is considered as an estimate of the
example-specific density. The minimum of the estimates of the two example-specific densities, one for each example, at a feature vector is considered as the matching score. The ESDMK is then computed as the sum of the matching scores computed at every feature vector in the pair of examples. The main
issue in building the proposed kernel is the choice of k, the number of neighbors.
This thesis proposes to combine the matchings obtained using
different values of k to compute the pyramid match ESDMK. We propose to
compute the pyramid match ESDMK as the weighted sum of the matches obtained
by computing the ESDMKs at a sequence of increasingly coarser neighborhoods.
The proposed ESDMKs do not include the spatial information in the images,
which is important for better matching of images. We propose the spatial
ESDMK (SESDMK) to include the spatial information. We consider a fixed
number of spatial regions in every scene image. For each region, an ESDMK is
constructed from the local feature vectors of the two examples in that region.
Then, the SESDMK is constructed as a combination of the ESDMKs of all the regions. The performance of the SVM-based classifiers using the proposed
family of ESDMKs for sets of local feature vectors extracted from images
and long duration speech is studied for scene classification, speech emotion
recognition and speaker identification tasks and compared with that of the
SVM-based classifiers using the state-of-the-art dynamic kernels.
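
The kernel computations described above can be illustrated with a minimal Python sketch. It is not the implementation used in the thesis. It assumes that the k nearest neighbors of a feature vector are searched in the pooled set of feature vectors of the two examples, that a vector counts as its own neighbor, and that distances are Euclidean. The function names (esdmk, pyramid_esdmk, sesdmk), the caller-supplied pyramid weights, and the use of a plain sum to combine the regions are likewise assumptions made only for illustration.

```python
import numpy as np


def esdmk(X, Y, k):
    """Illustrative ESDMK between X (n, d) and Y (m, d), two sets of feature vectors."""
    Z = np.vstack([X, Y])                    # pool the feature vectors of both examples
    from_x = np.arange(len(Z)) < len(X)      # True where a pooled vector belongs to X
    # Squared Euclidean distances between all pooled vectors (assumed metric).
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    # k nearest neighbors of every pooled vector; each vector counts as its
    # own neighbor here, which is an assumption of this sketch.
    nn = np.argsort(d2, axis=1, kind="stable")[:, :k]
    count_x = from_x[nn].sum(axis=1)         # density estimate for example X
    count_y = nn.shape[1] - count_x          # density estimate for example Y
    # The matching score at each vector is the smaller of the two estimates;
    # the kernel value is the sum of the scores over all pooled vectors.
    return int(np.minimum(count_x, count_y).sum())


def pyramid_esdmk(X, Y, ks, weights):
    """Weighted sum of ESDMKs over several values of k (weights are hypothetical)."""
    return float(sum(w * esdmk(X, Y, k) for k, w in zip(ks, weights)))


def sesdmk(X_regions, Y_regions, k):
    """Combine per-region ESDMKs; a plain sum is assumed as the combination."""
    return sum(esdmk(Xr, Yr, k) for Xr, Yr in zip(X_regions, Y_regions))


# Toy data: two examples, each a set of two one-dimensional feature vectors.
X = np.array([[0.0], [1.0]])
Y = np.array([[0.1], [5.0]])
print(esdmk(X, Y, k=2))
print(pyramid_esdmk(X, Y, ks=[1, 2, 4], weights=[1.0, 0.5, 0.25]))
```

A Gram matrix filled with such kernel values over all pairs of training examples could then be passed to an SVM that accepts a precomputed kernel. The brute-force distance computation is chosen for readability and would need a faster neighbor search for large sets of feature vectors.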