Analyzing the Effect of Linguistic Similarity on Cross-Lingual Transfer: Tasks and Input Representations Matter
Authors: Verena Blaschke†‡§, Masha Fedzechkina, Maartje ter Hoeve
Cross-lingual transfer is a popular approach to increasing the amount of training data for NLP tasks in low-resource contexts. However, the best strategy for deciding which cross-lingual data to include is unclear. Prior research often focuses on a small set of languages from a few language families or on a single task, and it remains an open question how those findings extend to a wider variety of languages and tasks. In this work, we contribute to answering this question by analyzing cross-lingual transfer for 263 languages from a wide variety of language families. Moreover, we include three popular NLP tasks in our analysis: POS tagging, dependency parsing, and topic classification. Our findings indicate that the effect of linguistic similarity on transfer performance depends on a range of factors: the NLP task, the (mono- or multilingual) input representations, and the definition of linguistic similarity.
August 1, 2024 · Research area: Speech and Natural Language Processing · Conference: ACL
December 11, 2022 · Research area: Speech and Natural Language Processing · Workshop at NeurIPS