Project description
In recent years, transformer-based deep learning models such as BERT or GPT-3 have led to impressive results in many natural language processing (NLP) tasks, exhibiting transfer and few-shot learning capabilities.
However, despite faring well on benchmarks, current deep learning models for NLP often fail badly in the wild: they generalize poorly out of domain, they do not exploit contextual information, they are poorly calibrated, and their memory is not traceable. These limitations stem from their monolithic architectures, which are well suited to perception but unsuitable for tasks requiring higher-level cognition.
In this project, I attack these fundamental problems by bringing together tools and ideas from machine learning, sparse modeling, information theory, and cognitive science, in an interdisciplinary approach. First, I will use uncertainty and quality estimates for utility-guided controlled generation, combining this control mechanism with the efficient encoding of contextual information and the integration of multiple modalities. Second, I will develop sparse and structured memory models, together with attention-based descriptive representations, as a step towards conscious processing. Third, I will build mathematical models for sparse communication (reconciling discrete and continuous domains), supporting end-to-end differentiability and enabling a shared workspace where multiple modules and agents can communicate.
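To make the notion of sparse communication between discrete and continuous domains more concrete, the sketch below implements sparsemax, a published sparse alternative to softmax that can assign exactly zero probability to low-scoring items while remaining differentiable almost everywhere. This is only an illustrative example under my own assumptions (NumPy implementation, function and variable names are mine); it is not a deliverable or the specific mechanism the project will develop.

```python
import numpy as np

def sparsemax(scores):
    """Euclidean projection of a score vector onto the probability simplex.

    Unlike softmax, sparsemax can give exactly zero probability to some items,
    yielding outputs that are continuous (differentiable almost everywhere)
    yet partly discrete (sparse support).
    """
    z = np.asarray(scores, dtype=float)
    z_sorted = np.sort(z)[::-1]                    # scores in descending order
    k = np.arange(1, z.size + 1)
    cumulative = np.cumsum(z_sorted)
    in_support = 1.0 + k * z_sorted > cumulative   # entries kept in the support
    k_max = k[in_support][-1]                      # size of the support set
    tau = (cumulative[k_max - 1] - 1.0) / k_max    # threshold subtracted from scores
    return np.maximum(z - tau, 0.0)

# Softmax spreads mass over every item; sparsemax can make a hard yet differentiable choice.
print(sparsemax(np.array([3.0, 1.0, -2.0])))   # -> [1.  0.  0. ]  all mass on one item
print(sparsemax(np.array([1.2, 1.0, -2.0])))   # -> [0.6 0.4 0. ]  sparse, not fully discrete
```

The key property illustrated here is that the output lives on the boundary between discrete selections and continuous distributions, which is what allows gradient-based, end-to-end training of modules that exchange (near-)discrete messages.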
I will apply the innovations above to highly challenging language generation tasks, including machine translation, open-domain dialogue, and story generation. To reinforce interdisciplinarity and maximize technological impact, collaborations are planned with cognitive scientists and with a scale-up company in the crowd-sourced translation industry.