I did my PhD in the Imedia lab at INRIA. The subject was: Towards an efficient visual content description for automatic image annotation. The thesis manuscript and my publications are available online.
I took part in the ImagEval benchmark. The methods I used ranked first on tasks 4 (object detection) and 5 (scene categorization). Details can be found in the CIVR'07 paper.
Thesis summary (also available in French):
Recent technological advances in the acquisition of multimedia data have led to an exponential growth of digital content. For end users of general-purpose image databases (photo agencies, personal collections), manual annotation has become prohibitively expensive.
We present a generic approach to automatic annotation that generates new metadata. It is based on a statistical learning framework using SVMs with a triangular kernel. The description of the visual content and its representation are perhaps the most important steps, since the whole process relies on them. For the global representation of images, we propose LEOH, a new shape descriptor. To describe images locally, we use bags of visual words. As an original contribution, we show that dense sampling is preferable to interest point detectors for selecting visual patches. In addition, we propose to introduce flexible geometric constraints, which the bag-of-words model ignores by nature, by using pairs of visual words. In the context of active learning, we propose a new strategy to combine global visual descriptions and bags of words.
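The summary above combines three ingredients: dense sampling of patches, bag-of-visual-words histograms and an SVM triangular kernel. The Python sketch below illustrates them under stated assumptions. The function names (`dense_grid`, `bow_histogram`, `triangular_kernel`), the grid parameters, the toy vocabulary and the hard nearest-word assignment are hypothetical and not taken from the thesis. Patch descriptor extraction is omitted (toy vectors stand in for descriptors). The kernel uses a common definition from the SVM literature, K(x, y) = -||x - y||, which may differ in detail from the exact form used in this work.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dense_grid(width: int, height: int, patch_size: int, step: int) -> List[Tuple[int, int]]:
    """Dense sampling: top-left corners of patches taken on a regular grid,
    instead of positions returned by an interest point detector."""
    return [
        (x, y)
        for y in range(0, height - patch_size + 1, step)
        for x in range(0, width - patch_size + 1, step)
    ]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean (L2) distance between two vectors of equal length."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def bow_histogram(descriptors: Sequence[Vector], vocabulary: Sequence[Vector]) -> List[float]:
    """Bag of visual words: assign each patch descriptor to its nearest visual
    word (hard assignment, an assumption) and return the L1-normalised histogram."""
    counts = [0] * len(vocabulary)
    for d in descriptors:
        nearest = min(range(len(vocabulary)), key=lambda k: euclidean(d, vocabulary[k]))
        counts[nearest] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(vocabulary)


def triangular_kernel(x: Vector, y: Vector) -> float:
    """Triangular kernel, assumed here to be K(x, y) = -||x - y||."""
    return -euclidean(x, y)


if __name__ == "__main__":
    grid = dense_grid(64, 48, patch_size=16, step=8)
    print(len(grid))  # 35 patches: 7 columns x 5 rows

    vocab = [[0.0, 0.0], [1.0, 1.0]]  # hypothetical 2-word vocabulary
    descs = [[0.1, 0.0], [0.9, 1.0], [1.0, 0.8], [0.0, 0.2]]  # toy patch descriptors
    h1 = bow_histogram(descs, vocab)
    print(h1)  # [0.5, 0.5]

    h2 = bow_histogram([[0.2, 0.1]], vocab)  # single patch, nearest to word 0
    print(triangular_kernel(h1, h2))  # similarity of the two image histograms
```

A fixed grid yields a predictable set of patches for every image, which is what makes dense sampling easy to compare against detector-based selection. This form of the triangular kernel has no bandwidth parameter to tune; in a full pipeline, the kernel values between image histograms would be passed to an SVM solver that accepts precomputed kernel matrices.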
This work has been evaluated on realistic image datasets. These experiments highlight the relevance of the proposed improvements. We obtained the best results in the ImagEVAL benchmark campaign.