Multilingual Lexicon Extraction from Comparable Corpora
Participation deadline
No participation deadline.
Project description
Given large collections of parallel (i.e. translated) texts, it is well known how to establish correspondences between words across languages by successively applying a sentence-alignment and a word-alignment step. However, parallel texts are a scarce resource for most language pairs involving lesser-used languages. Human second language acquisition, on the other hand, does not seem to require exposure to large amounts of translated text, which indicates that there must be another way of crossing the language barrier. Apparently, this human capability is based on comparable resources, i.e. texts or speech on related topics in different languages that are not translations of each other. Comparable (written or spoken) corpora are far more common than parallel corpora and thus offer a chance to overcome the data acquisition bottleneck. Despite this cognitive motivation, the proposed project will not attempt to simulate the complexities of human second language acquisition. Instead, it will show that information on word and multiword translations can be extracted automatically from comparable corpora by purely technical means. The aim is to push the boundaries of current approaches, which typically exploit correlations between co-occurrence patterns across languages, in several ways:
1) Eliminating the need for initial lexicons by using a bootstrapping approach that requires only a few seed translations.
2) Implementing a new methodology that first establishes alignments between comparable documents across languages and then computes cross-lingual alignments between words and multiword units.
3) Improving the quality of the computed word translations by applying an interlingua approach which, by relying on several pivot languages, allows a highly effective multi-dimensional cross-check.
4) Showing that, by looking at foreign citations, translations can even be derived from a single monolingual text corpus.
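
The co-occurrence-based idea that the description attributes to current approaches can be illustrated with a small sketch. It is not the project's method. It only shows, under simplifying assumptions, how a few seed translations let co-occurrence patterns be compared across two corpora that are not translations of each other. The function names (`cooc_vector`, `cosine`, `rank_translations`), the window size, the raw-count weighting, and the toy English and German sentences are all hypothetical choices made for this example.

```python
from collections import Counter
import math


def cooc_vector(word, sentences, seeds, window=4):
    """Count how often each seed word occurs within `window` tokens of `word`."""
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok != word:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i and tokens[j] in seeds:
                    counts[tokens[j]] += 1
    return counts


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(c * v.get(k, 0) for k, c in u.items())
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_translations(src_word, src_sents, tgt_sents, seed_lexicon, candidates):
    """Rank target-language candidates by co-occurrence similarity to src_word."""
    src_vec = cooc_vector(src_word, src_sents, set(seed_lexicon))
    # Map the source vector into the target language through the seed lexicon.
    mapped = Counter()
    for seed, count in src_vec.items():
        mapped[seed_lexicon[seed]] += count
    tgt_seeds = set(seed_lexicon.values())
    scored = [
        (cosine(mapped, cooc_vector(cand, tgt_sents, tgt_seeds)), cand)
        for cand in candidates
    ]
    return sorted(scored, reverse=True)


# Toy comparable corpora (assumed data): related topics, not translations.
english = [
    "the doctor works in the hospital".split(),
    "the teacher works in the school".split(),
    "a doctor helps the patient".split(),
]
german = [
    "im krankenhaus behandelt der arzt viele patienten".split(),
    "die schule sucht einen neuen lehrer".split(),
]
# A few seed translations (assumed), as in a bootstrapping setup.
seeds = {"hospital": "krankenhaus", "school": "schule", "patient": "patienten"}

ranking = rank_translations("doctor", english, german, seeds, ["arzt", "lehrer"])
print(ranking[0][1])  # best-scoring candidate for "doctor"
```

In a bootstrapping setting, high-scoring pairs found this way could be added to the seed lexicon and the ranking repeated, although thresholds and stopping criteria are left open here.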