2606.02147
2026-06-02
cs.CL
cs.AI
Multilingual Idioms in Sentences and Conversations Across High-, Medium-, and Low-Resource Languages
Saeed Almheiri, Bilal Elbouardi, Salsabila Zahirah Pranida, Irina Nikishina, Ashwath Rao B, Parameswari Krishnamurthy, Muhammad Cendekia Airlangga, Rifo Ahmad Genadi, Nguyen Phan Gia Bao, Amir Hossein Yari, Hawau Olamide Toyin, Nurdaulet Mukhituly, Mena Attia, Besher Hassan, Ahmad Fathan Hidayatullah, Tatsuki Kuribayashi, Haonan Li, Suma Bhat, Fajri Koto
Affiliations
Mohamed bin Zayed University of Artificial Intelligence; University of Hamburg; Manipal University; IIIT Hyderabad; University of Science and Technology of Hanoi; Universitas Islam Indonesia; Princeton University
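AI Summary
To study multilingual idiom understanding, the authors build the MIDI dataset, which covers 3 high-resource, 3 medium-resource, and 12 low-resource languages and includes both literal and figurative usages in sentence and conversation contexts. Experiments show that understanding is worse for low-resource languages, that literal usages are harder than figurative ones, and that conversational context helps but does not close the gap.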