NonSequeToR

Funded

Closed

Non sequence models for tokenization replacement

Project budget: €3M

Project leader

LUDWIG-MAXIMILIANS-UNIVERSITAET MUENCHEN. No description or corporate purpose has been specified for this organisation.

TRL 4-5

Participation deadline: none.

Funding granted: the H2020 programme notified the award of the project on 2023-09-30.

ERC-2016-ADG: ERC Advanced Grant Scope:Objectives

R&D

Call closed 8 years ago

Participant characteristics

This project has no open partner searches at this time.

2 Participants

LMU MUENCHEN
€2.50M | Leader

HIDRO WATER
€169.15K | Participant

Project duration: 76 months
Start date: 2017-05-10
End date: 2023-09-30

Project description

Natural language processing (NLP) is concerned with computer-based processing of natural language, with applications such as human-machine interfaces and information access. The capabilities of NLP are currently severely limited compared to humans. NLP has high error rates for languages that differ from English (e.g., languages with higher morphological complexity like Czech) and for text genres that are not well edited (or noisy) and that are of high economic importance, e.g., social media text.

NLP is based on machine learning, which requires as its basis a representation that reflects the underlying structure of the domain, in this case the structure of language. But the representations currently used are symbol-based: text is broken into surface forms by sequence models that implement tokenization heuristics and treat each surface form as a symbol or represent it as an embedding (a vector representation) of that symbol. These heuristics are arbitrary and error-prone, especially for non-English and noisy text, resulting in poor performance. Advances in deep learning now make it possible to take the embedding idea and liberate it from the limitations of symbolic tokenization.

I have the interdisciplinary expertise in computational linguistics, computer science and deep learning required for this project and am thus in the unique position to design a radically new, robust and powerful non-symbolic text representation that captures all aspects of form and meaning that NLP needs for successful processing. By creating a text representation for NLP that is not impeded by the limitations of symbol-based tokenization, the foundations are laid to take NLP applications like human-machine interaction, human-human communication supported by machine translation, and information access to the next level.