Computer Vision
A Probabilistic Representation for Efficient Large Scale Visual
Recognition Tasks
Abstract:
In this paper, we present an efficient alternative to the traditional
vocabulary based on bag-of-visual words (BoV) used for visual classification
tasks. Our representation is both conceptually and computationally superior to
the bag-of-visual words: (1) We iteratively generate a Maximum Likelihood
estimate of an image given a set of characteristic features in contrast to the
BoV methods, where an image is represented as a histogram of visual words, and (2)
We randomly sample a set of characteristic features instead of employing the
computation-intensive clustering algorithms used during the vocabulary
generation step of BoV methods. Our comparable performance to the
state-of-the-art, on experiments over a challenging scene categorization
dataset and two equally challenging human action datasets, demonstrates the
universal applicability of our method.
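The two ideas in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the "characteristic features" act as fixed centers of an isotropic Gaussian mixture and that the per-image Maximum Likelihood estimate is the vector of mixture weights, fitted iteratively with EM. The function names, the `sigma` bandwidth, and the iteration count are all hypothetical.

```python
import math
import random


def sample_features(pool, k, seed=0):
    """Randomly pick k descriptors from a pool (replaces clustering; hypothetical helper)."""
    rng = random.Random(seed)
    return rng.sample(pool, k)


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ml_mixture_weights(descriptors, features, sigma=1.0, iters=50):
    """EM estimate of mixture weights for one image.

    Assumption: each sampled feature is the fixed mean of an isotropic
    Gaussian with bandwidth sigma; only the weights are re-estimated.
    """
    k = len(features)
    # Unnormalized Gaussian likelihood of every descriptor under every feature.
    lik = [
        [math.exp(-sq_dist(d, f) / (2.0 * sigma ** 2)) for f in features]
        for d in descriptors
    ]
    weights = [1.0 / k] * k
    for _ in range(iters):
        acc = [0.0] * k
        for row in lik:
            joint = [w * l for w, l in zip(weights, row)]
            total = sum(joint)
            if total == 0.0:
                continue  # descriptor is too far from every feature
            for j in range(k):
                acc[j] += joint[j] / total  # E-step: responsibilities
        norm = sum(acc)
        if norm == 0.0:
            break
        weights = [a / norm for a in acc]  # M-step: renormalized weights
    return weights
```

Under these assumptions, the returned weight vector plays the role that the visual-word histogram plays in a BoV pipeline, and it could be passed to any standard classifier.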
The camera-ready version is here.
The code and data used in the experiments discussed in the paper will be
uploaded shortly.
Video Monitoring of Honey Bees
Abstract:
(Coming soon!)
TRECVID MED (Multimedia Event Detection) 2010
Abstract:
TRECVID Multimedia Event Detection offers an interesting but very challenging
task: detecting high-level complex events (batting baseball run, making cake,
assembling shelter) in user-generated videos. In this paper, we present an
overview and comparative analysis of our results,
which achieved top performance among all 45 submissions in TRECVID 2010.
Our aim is to answer the following questions. What kind of feature is more
effective for multimedia event detection? Are features from different feature
modalities (e.g., audio and visual) complementary for event detection? Can we
benefit from generic concept detection of background scenes, human actions,
and audio concepts? Are sequence matching and event-specific object detectors
critical?
Our findings indicate that spatial-temporal features are very effective for
event detection, and that they are also very complementary to other features
such as static SIFT and audio features. As a result, our baseline run combining these
three features already achieves very impressive results, with a mean minimal
normalized cost (MNC) of 0.586. Incorporating the generic concept detectors
using a graph diffusion algorithm provides marginal gains (mean MNC 0.579).
Sequence matching with Earth Mover's Distance (EMD) further improves the
results (mean MNC 0.565). The event-specific detector ("batter"), however,
did not prove useful in our current re-ranking tests. We conclude that it is
important to combine strong complementary features from multiple modalities
for multimedia event detection, and cross-frame matching is helpful in coping
with temporal order variation. Leveraging contextual concept detectors and
foreground activities remains a very attractive direction requiring further
research.
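The abstract does not say how the three feature modalities are combined, so the following is only a minimal sketch of one common option, score-level (late) fusion: each modality's detection scores are z-normalized and then averaged. The function names, modality keys, and equal default weights are assumptions made for illustration.

```python
from statistics import mean, pstdev


def zscore(scores):
    """Normalize one modality's scores to zero mean and unit variance."""
    mu = mean(scores)
    sd = pstdev(scores)
    if sd == 0:
        return [0.0] * len(scores)
    return [(s - mu) / sd for s in scores]


def late_fusion(per_modality_scores, weights=None):
    """Weighted average of z-normalized scores, one fused score per video.

    per_modality_scores maps a modality name to a list of scores, with the
    same video order in every list (hypothetical input layout).
    """
    names = sorted(per_modality_scores)
    if weights is None:
        weights = {name: 1.0 for name in names}  # assumed equal weighting
    total = sum(weights[name] for name in names)
    normed = {name: zscore(per_modality_scores[name]) for name in names}
    n_videos = len(normed[names[0]])
    return [
        sum(weights[name] * normed[name][i] for name in names) / total
        for i in range(n_videos)
    ]
```

Normalizing before averaging keeps a modality with a wide score range from dominating the others; the weights could instead be tuned on held-out data.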
This is a joint effort between Columbia University and UCF, which culminated
in the best performance in the Multimedia Event Detection 2010 challenge. A
notebook paper is available here.
A Framework for Photo-Quality Assessment and Enhancement
based on Visual Aesthetics
Abstract:
We present an interactive application that enables users to improve the visual
aesthetics of their digital photographs using spatial recomposition. Unlike
earlier work that focuses either on photo quality assessment or interactive
tools for photo editing, we enable the user to make informed decisions about
improving the composition of a photograph and to implement them in a coherent
framework. Specifically, the user can interactively select a foreground object
and the system will present recommendations for where it can be moved in a
manner that optimizes a learned aesthetic metric while obeying semantic
constraints. For photographic compositions that lack a distinct foreground
object, our tool provides the user with cropping or expanding recommendations
that improve its aesthetic quality. We learn a support vector regression model
for capturing image aesthetics from user data and seek to optimize this metric
during recomposition. Rather than prescribing a fully-automated solution, we
allow user-guided object segmentation and inpainting to ensure that the final
photograph matches the user's criteria. Our approach achieves 86% accuracy in
predicting the attractiveness of unrated images, when compared to their
respective human rankings. Additionally, 73% of the images recomposited using
our tool are ranked more attractive than their original counterparts by human
raters.
This work was accepted at the ACM Multimedia International Conference
(ACMMM 2010), held in Firenze, Italy, as a 10-page paper (17% acceptance
rate). Here is an accompanying talk.
A subset of the images from the dataset mentioned in the paper is available
here. We received some objections from Flickr users to making their images
publicly available for experiments, hence the full dataset was taken down.
The code provided in the archive is unsupported.
Moving Object Detection and Tracking in Forward Looking Infra-Red
Aerial Imagery
Abstract:
This chapter discusses the challenges of automating surveillance and
reconnaissance tasks for infra-red visual data obtained from aerial platforms.
These problems have gained significant importance over the years, especially
with the advent of lightweight and reliable imaging devices. Detection and
tracking of objects of interest has traditionally been an area of interest in
the computer vision literature. These tasks are rendered especially challenging
in aerial sequences of the infra-red modality. The chapter gives an overview of
these problems, and the associated limitations of some of the conventional
techniques typically employed for these applications. We begin with a study
of various image registration techniques that are required to eliminate the
apparent motion induced by the movement of the aerial sensor. Next, we present a technique for
detecting moving objects from the ego-motion compensated input sequence.
Finally, we describe a methodology for tracking already detected objects using
their motion history. We substantiate our claims with results on a wide range
of aerial video sequences.
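The abstract does not name the detection technique, so the sketch below uses plain frame differencing as a stand-in: once two frames have been registered (ego-motion compensated), pixels whose intensity changes by more than a threshold are marked as moving. Frames are represented as lists of rows of grayscale values; the function names and the threshold value are hypothetical.

```python
def motion_mask(prev_frame, curr_frame, threshold=25):
    """Mark pixels whose intensity changed by more than threshold.

    Assumes both frames are already registered to a common coordinate
    frame, so remaining differences come from moving objects or noise.
    """
    return [
        [abs(c - p) > threshold for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]


def bounding_box(mask):
    """Return (min_row, min_col, max_row, max_col) of marked pixels, or None."""
    coords = [
        (r, c)
        for r, row in enumerate(mask)
        for c, moving in enumerate(row)
        if moving
    ]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return (min(rows), min(cols), max(rows), max(cols))
```

A tracker that uses motion history, as the abstract describes, could take such boxes from successive frames as its input; that step is not shown here.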
This work is published as a chapter in the Springer book
Machine Vision Beyond Visible Spectrum.
Grid/High Performance Computing
Scalable and Distributed Mechanisms for Integrated Scheduling and
Replication in Data Grids
Abstract:
Data Grids seek to harness geographically distributed resources for
large-scale data-intensive problems. Such problems involve loosely coupled jobs
and large data sets distributed remotely. Data Grids have found applications in
scientific research fields such as high-energy physics and the life sciences, as
well as in enterprises. The issues that need to be considered in the Data Grid
research area include resource management for computation and data. Computation
management comprises scheduling of jobs, scalability, and response time; while
data management includes replication and movement of data at selected sites. As
jobs are data intensive, data management issues often become integral to the
problems of scheduling and effective resource management in the Data Grids.
Integration of data replication and scheduling strategies is important. Such
integrated solutions are either non-existent or work in a centralized manner,
which is not scalable. The paper deals with the problem of integrating the
scheduling and replication strategies in a distributed manner. As part of the
solution, we have proposed a Distributed Replication and Scheduling Strategy
(DistReSS) which aims at an iterative improvement of the performance based on
coupling between scheduling and replication, which is achieved in a distributed
and hierarchical fashion. Results suggest that, in the context of our
experiments, DistReSS performs comparably to the centralized approach when the
parameters are tuned properly.
This work was accepted as a poster at CCGrid 2005.