handle: 10138/309933
A neural language model trained on a text corpus can be used to induce distributed representations of words, such that similar words end up with similar representations. If the corpus is multilingual, the same model can be used to learn distributed representations of languages, such that similar languages end up with similar representations. We show that this holds even when the multilingual corpus has been translated into English, by picking up the faint signal left by the source languages. However, just as it is a thorny problem to separate semantic from syntactic similarity in word representations, it is not obvious what type of similarity is captured by language representations. We investigate correlations and causal relationships between language representations learned from translations on one hand, and genetic, geographical, and several levels of structural similarity between languages on the other. Of these, structural similarity is found to correlate most strongly with language representation similarity, whereas genetic relationships, a convenient benchmark used for evaluation in previous work, appear to be a confounding factor. Apart from implications about translation effects, we see this more generally as a case where NLP and linguistic typology can interact and benefit one another.
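The abstract does not include an implementation, but the setup it describes, a neural language model over a (translated) multilingual corpus in which each sentence carries a source-language tag so that the model learns one dense vector per language, can be sketched roughly as follows. This is a minimal illustration assuming a PyTorch-style model; the class name LangAwareLM, the dimensions, and the LSTM choice are assumptions for illustration, not the authors' actual architecture.

import torch
import torch.nn as nn

class LangAwareLM(nn.Module):
    """Hypothetical language model with a learned embedding per source language."""
    def __init__(self, vocab_size, n_langs, word_dim=128, lang_dim=32, hidden_dim=256):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim)
        self.lang_emb = nn.Embedding(n_langs, lang_dim)   # one dense vector per source language
        self.rnn = nn.LSTM(word_dim + lang_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, lang_ids):
        # tokens: (batch, seq_len) word indices; lang_ids: (batch,) source-language indices
        w = self.word_emb(tokens)                                         # (batch, seq, word_dim)
        l = self.lang_emb(lang_ids).unsqueeze(1).expand(-1, tokens.size(1), -1)
        h, _ = self.rnn(torch.cat([w, l], dim=-1))                        # condition LM on language
        return self.out(h)                                                # next-word logits

After training, pairwise cosine similarities between the rows of model.lang_emb.weight give a language-by-language similarity matrix, which can then be correlated (for example with scipy.stats.spearmanr) against genetic, geographical, and structural similarity matrices; this is the kind of comparison the abstract reports, here shown only as a rough sketch.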
FOS: Computer and information sciences, Linguistics and Language, Språkbehandling och datorlingvistik (Language Processing and Computational Linguistics), 530 Physics, 1702 Artificial Intelligence, 10231 Department of Astrophysics, Language and Linguistics, computational linguistics, representation learning, Artificial Intelligence, 1706 Computer Science Applications, natural language processing, 1203 Language and Linguistics, Natural Language Processing, Computer and information sciences, Computer Science - Computation and Language, linguistic typology, Computer Science Applications, 3310 Linguistics and Language, Languages, Computational linguistics. Natural language processing, language technology, P98-98.5, language representations, Computation and Language (cs.CL)
| Indicator | Description | Value |
| citations | An alternative to the "Influence" indicator; also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 19 |
| popularity | Reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Top 10% |
| influence | Reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Top 10% |
| impulse | Reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Top 10% |