Cognitive Image Understanding: Image Representations and Multimodal Learning
Related projects

- IDIU: Integrated and Detailed Image Understanding (1M€, closed)
- PCIN-2013-047: VISUAL SENSE, ETIQUETADO DE INFORMACION VISUAL CON DESCRIPCI... (130K€, closed)
- PGC2018-096156-B-I00: RECUPERACION Y DESCRIPCION DE IMAGENES MEDIANTE LENGUAJE NAT... (68K€, closed)
- TIN2016-79717-R: APRENDIZAJE PROFUNDO MULTI-TAREA PARA RECONOCIMIENTO DE OBJE... (39K€, closed)
Participation deadline
No participation deadline.
Project description
One of the primary and most appealing goals of computer vision is to automatically understand the content of images on a cognitive level. Ultimately, we want computers to interpret images as we humans do, recognizing all the objects, scenes, and people, as well as their relations, as they appear in natural images or video. With this project, I want to advance the state of the art in this field in two directions, which I believe are crucial for building the next generation of image understanding tools.

First, novel, more robust yet descriptive image representations will be designed that incorporate the intrinsic structure of images. These should already go a long way towards removing irrelevant sources of variability while capturing the essence of the image content. I believe the importance of further research into image representations is currently underestimated within the research community, yet I claim this is a crucial step with lots of opportunities: good learning cannot easily make up for bad features.

Second, weakly supervised methods to learn from multimodal input (especially the combination of images and text) will be investigated, making it possible to leverage the large amount of weak annotations available via the internet. This is essential if we want to scale the methods to a larger number of object categories (several hundreds instead of a few tens). As more data can be used for training, such weakly supervised methods might in the end even match or outperform supervised schemes. Here we will call upon the latest results in semi-supervised learning, data mining, and computational linguistics.
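To make the weakly supervised image-text idea concrete, here is a minimal toy sketch, not the project's actual method: each image is reduced to a bag of visual cluster ids (stand-ins for local features) paired with the word set of a weak caption, and word-cluster associations are scored by pointwise mutual information (PMI). All data, names, and the PMI scoring choice are illustrative assumptions.

```python
from collections import Counter
from math import log

# Toy weakly supervised association between caption words and visual
# clusters. No per-region labels are given; the signal comes purely
# from co-occurrence across weakly annotated (image, caption) pairs.
# The data below is invented for illustration.
pairs = [
    ({"c1", "c2"}, {"cat", "sofa"}),
    ({"c1", "c3"}, {"cat", "grass"}),
    ({"c4", "c2"}, {"dog", "sofa"}),
    ({"c4", "c3"}, {"dog", "grass"}),
]

def pmi_table(pairs):
    """Score every (word, cluster) pair by pointwise mutual information."""
    n = len(pairs)
    word_n, clus_n, joint_n = Counter(), Counter(), Counter()
    for clusters, words in pairs:
        word_n.update(words)
        clus_n.update(clusters)
        for w in words:
            for c in clusters:
                joint_n[(w, c)] += 1
    # PMI(w, c) = log( P(w, c) / (P(w) * P(c)) ), estimated from counts.
    return {
        (w, c): log(joint_n[(w, c)] * n / (word_n[w] * clus_n[c]))
        for (w, c) in joint_n
    }

scores = pmi_table(pairs)
# "cat" co-occurs only with cluster c1, so that pair gets a positive
# score, while incidental pairings like ("cat", "c2") score zero or less.
```

With hundreds of categories, this kind of co-occurrence statistic is where the weak web annotations would enter; the project's actual machinery (semi-supervised learning, data mining, computational linguistics) would replace the naive PMI count.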