IndexThePlanet project information
Project duration: 64 months
Start date: 2023-04-26
End date: 2028-08-31
Project leader
INSTITUT PASTEUR
No description or corporate purpose has been specified for this organization.
TRL
4-5
Project budget
2M€
Participation deadline
No participation deadline.
Project description
A huge amount of planetary DNA and RNA sequencing data is now available, and its volume continues to double every two years. However, this data, whose size is measured in petabases, is too large to analyze efficiently. Many high-impact biological discoveries could be made but are prevented by the lack of fast search algorithms. I recently demonstrated this potential by discovering an order of magnitude more RNA virus species than previously known within all public RNA samples. A global index, i.e. a planetary genomic search engine, would unlock instant and inexpensive search within petabase-scale data.
I hypothesize that I can create a searchable index for all public DNA and RNA sequencing data. Drawing on my expertise in algorithms and data structures for biological sequences, I plan to design efficient methods to assemble and compress all available sequencing data, and then to construct an external-memory index that supports versatile biological queries.
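To make the idea of a searchable sequence index concrete, here is a minimal in-memory sketch in Python. It is not the project's method: it assumes a simple k-mer inverted index (a common technique swapped in for illustration), and the names `kmers`, `build_index`, `search`, and `min_fraction` are hypothetical. A petabase-scale index would need assembly, compression, and external-memory storage, none of which this toy attempts.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every overlapping substring of length k (empty if seq is shorter than k)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(samples, k):
    """Map each k-mer to the set of sample IDs whose sequence contains it.

    `samples` is a dict of sample ID -> sequence string (an assumed toy format).
    """
    index = defaultdict(set)
    for sample_id, seq in samples.items():
        for kmer in kmers(seq, k):
            index[kmer].add(sample_id)
    return index


def search(index, query, k, min_fraction=0.8):
    """Return {sample_id: fraction} for samples containing at least
    `min_fraction` of the query's distinct k-mers."""
    query_kmers = set(kmers(query, k))
    if not query_kmers:
        return {}
    hits = defaultdict(int)
    for kmer in query_kmers:
        # .get avoids creating empty entries in the defaultdict
        for sample_id in index.get(kmer, ()):
            hits[sample_id] += 1
    total = len(query_kmers)
    return {s: c / total for s, c in hits.items() if c / total >= min_fraction}


if __name__ == "__main__":
    toy_samples = {"s1": "ACGTACGTGA", "s2": "TTTTGGGGCC"}
    toy_index = build_index(toy_samples, k=4)
    print(search(toy_index, "ACGTACG", k=4))
```

Scoring by the fraction of shared k-mers tolerates small differences between the query and the stored sequence, at the cost of an index that grows with the number of distinct k-mers, which is exactly why compression and external-memory layouts matter at scale.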
A planetary sequencing data index will enable a myriad of bioinformatics analyses that are currently out of reach. I will demonstrate the utility of the index by constructing a database of human transcripts with novel disease associations, discovering novel microbial species, and providing a search engine for environmental metagenomes. The resulting unprecedented collection of assembled genomes and compressed reads will remove a major obstacle to data accessibility, improve the efficiency of data access by several orders of magnitude, and transform the scale of future bioinformatics analyses.