Project description
The last two decades have witnessed major experimental breakthroughs in different areas of the life sciences, from genomics to epidemiology. Thanks to modern high-throughput techniques, biological systems across multiple scales, from single molecules up to entire populations, can now be probed quantitatively at high spatial and temporal resolution. Besides enhancing our basic knowledge of a system’s constituents, these data potentially encode a wealth of information about the functional constraints that govern its evolution and the physical constraints that limit its performance, as well as about levels of organization, dynamical constraints or design principles that would be hard to identify from low-throughput data. Extracting this information is also crucial for applications ranging from the design of proteins with a desired functionality to the reconstruction of contacts during an epidemic. Inverse statistical mechanics attempts to do so by inferring generative models (Boltzmann distributions) from data using methods from the physics of disordered and random systems. Specific characteristics of biological data, however, such as strong undersampling and heterogeneity, limit the effectiveness of these tools. SIMBAD aims to develop a class of statistical inference techniques capable of overcoming these issues. In SIMBAD, theoretical work will supply concepts and methods to address four pressing problems (learning protein sequence landscapes, inverse modeling of metabolic networks, inferring contact networks from epidemiological data, and improving survival analysis models), which in turn will guide the theory towards integration with the existing standards of each field. This effort promises to open new pathways for basic research to impact economic, technological and societal issues; at the same time, the high-profile cross-disciplinary expertise represented in SIMBAD ensures measurable and achievable objectives, placing SIMBAD in an ideal position to achieve its goals.