Related projects
EuroMatrix (€2M, closed): EuroMatrix: Statistical and Hybrid Machine Translation Betw...
SEQCLAS (€3M, closed): A Sequence Classification Framework for Human Language Techn...
TIN2017-91692-EXP (€48K, closed): TRADUCCION AUTOMATICA NEURONAL NO SUPERVISADA: UN NUEVO PARA...
EUR2019-103819 (€75K, closed): REPRESENTACION UNIVERSAL DEL LENGUAJE APRENDIDA AUTOMATICAME...
DECOLLAGE (€2M, closed): DEep COgnition Learning for LAnguage GEneration
FoTran (€2M, closed): Found in Translation Natural Language Understanding with C...
HPLT project information
Project duration: 38 months
Start date: 2022-06-13
End date: 2025-08-31
Project leader: UNIVERZITA KARLOVA (no description or corporate purpose has been specified for this organization)
TRL: 4-5
Project budget: €4M
Participation deadline: none.
Project description
High Performance Language Technologies (HPLT) is a space combining petabytes of natural language data with large-scale model training. With trillions of words of text, the space will be the largest open text collection. Cleaning and privacy-protecting services improve the quality and ethical properties of the text. Going beyond static repositories that require the user to analyze each data set individually, the project will rate data sets by how much they improve end-to-end language models and machine translation systems. Continuous integration of models and data will result in freely downloadable, high-quality models for all official European Union languages and beyond. The models will be reproducible, with information and evaluation metrics shown in a publicly available dashboard. By focusing on training at scale, the project complements the inference-focused European Language Grid, which in turn will be used for model deployment. Datasets, models, and information about them will be published in recognized FAIR data repositories, aggregation catalogues, and marketplaces for easy discovery, access, replication, and exploitation.
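The description's central technical idea, rating a data set by how much it improves an end-to-end system, can be illustrated with a minimal sketch. Everything below is an illustrative assumption rather than HPLT code: the function names, the toy corpora, and the scoring are stand-ins. The sketch simply trains a system with and without the candidate corpus and reports the change in a downstream evaluation score.

```python
# Hypothetical sketch of "rating a data set by how much it improves an
# end-to-end system": train with and without the candidate corpus and
# compare a downstream evaluation score. Not HPLT code; all names are
# illustrative stand-ins.

from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class DatasetRating:
    name: str
    baseline_score: float
    score_with_dataset: float

    @property
    def delta(self) -> float:
        # Positive delta means the candidate data set helped the system.
        return self.score_with_dataset - self.baseline_score


def rate_dataset(
    name: str,
    baseline_corpus: Sequence[str],
    candidate_corpus: Sequence[str],
    train: Callable[[Sequence[str]], object],
    evaluate: Callable[[object], float],
) -> DatasetRating:
    """Rate candidate_corpus by the change it causes in the evaluation score."""
    baseline_model = train(baseline_corpus)
    augmented_model = train(list(baseline_corpus) + list(candidate_corpus))
    return DatasetRating(
        name=name,
        baseline_score=evaluate(baseline_model),
        score_with_dataset=evaluate(augmented_model),
    )


if __name__ == "__main__":
    # Toy stand-ins: "training" collects a vocabulary, "evaluation" rewards coverage.
    def toy_train(corpus: Sequence[str]) -> set:
        return {token for line in corpus for token in line.split()}

    def toy_evaluate(model: set) -> float:
        return len(model) / 100.0

    rating = rate_dataset(
        "example-web-crawl",
        baseline_corpus=["the cat sat", "a dog ran"],
        candidate_corpus=["the quick brown fox jumps"],
        train=toy_train,
        evaluate=toy_evaluate,
    )
    print(f"{rating.name}: delta={rating.delta:+.3f}")
```

In a realistic setting, the train and evaluate callables would be replaced by actual model training and a metric such as BLEU or perplexity on a held-out set; the ranking of data sets by delta is what turns a static repository into the kind of quality-rated collection the description outlines.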