Addressing The Idiom Challenge in Machine Translation: A Review Focused on Low-Resource Languages
Abstract
Machine translation (MT) has made significant progress for high-resource languages, yet idiom translation remains a persistent challenge, especially for low-resource languages such as Punjabi. Idioms are non-compositional and rooted in cultural context, so literal translation fails to convey their meaning. This paper presents a systematic review of idiom translation for low-resource-to-high-resource language pairs, using Punjabi-English as a case study. We analyze key challenges (dataset limitations, figurative-literal ambiguity, structural complexity, and evaluation limitations) and examine existing approaches, including rule-based, statistical, neural, and large language model (LLM)-based methods. We identify gaps in idiom-specific datasets, evaluation frameworks, and multilingual transfer techniques. Finally, we outline directions for future research: hybrid models, community-driven datasets, multimodal translation, and idiom-aware evaluation metrics. This review aims to guide the development of more accurate and culturally aware MT systems for low-resource languages.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.