Addressing The Idiom Challenge in Machine Translation: A Review Focused on Low-Resource Languages
DOI: https://doi.org/10.5281/zenodo.18431088
Keywords: Machine Translation, Idiomatic Expressions, Low-Resource Languages, Large Language Models, Survey
Abstract
Machine translation (MT) has made significant progress for high-resource languages, yet idiomatic translation remains a persistent challenge, especially for low-resource languages such as Punjabi. Idioms are non-compositional and culturally embedded. Their meaning cannot be derived from the meanings of their individual words, so literal translations are insufficient. This paper presents a systematic review of idiom translation for low-resource-to-high-resource language pairs, using Punjabi-English as a case study. We analyze the key challenges (dataset limitations, figurative-literal ambiguity, structural complexity, and evaluation limitations) and examine existing approaches, from rule-based and statistical methods to neural and large language model (LLM)-based methods. We identify gaps in idiom-specific datasets, evaluation frameworks, and multilingual transfer techniques. Finally, we outline directions for future research, highlighting hybrid models, community-driven datasets, multimodal translation, and idiom-aware evaluation metrics. This review aims to guide the development of more accurate and culturally aware MT systems for low-resource languages.
License
Copyright (c) 2025 Manjot Kaur, Jasvir Kaur, Jasmin Kaur Gahlot, Prof. Palak Sood (Authors)

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
