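arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.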

2604.19778 2026-04-23 cs.CL

Towards High-Quality Machine Translation for Kokborok: A Low-Resource Tibeto-Burman Language of Northeast India

Badal Nyalang, Biman Debbarma

English abstract

We present KokborokMT, a high-quality neural machine translation (NMT) system for Kokborok (ISO 639-3), a Tibeto-Burman language spoken primarily in Tripura, India, with approximately 1.5 million speakers. Despite its status as an official language of Tripura, Kokborok has remained severely under-resourced in the NLP community, with prior machine translation attempts limited to systems trained on small Bible-derived corpora achieving BLEU scores below 7. We fine-tune the NLLB-200-distilled-600M model on a multi-source parallel corpus comprising 36,052 sentence pairs: 9,284 professionally translated sentences from the SMOL dataset, 1,769 Bible-domain sentences from WMT shared task data, and 24,999 synthetic back-translated pairs generated via Gemini Flash from Tatoeba English source sentences. We introduce a new language token for Kokborok in the NLLB framework. Our best system achieves BLEU scores of 17.30 and 38.56 on held-out test sets, representing substantial improvements over prior published results. Human evaluation by three annotators yields mean adequacy of 3.74/5 and fluency of 3.70/5, with substantial agreement between trained evaluators.
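The corpus composition reported in the abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration: the counts are taken from the abstract, while the dictionary labels and variable names are assumptions introduced here and do not come from the paper.

```python
# Sentence-pair counts as reported in the abstract.
# The labels are illustrative names, not identifiers used by the authors.
corpus = {
    "SMOL (professional translations)": 9_284,
    "WMT shared task (Bible domain)": 1_769,
    "Tatoeba back-translations (synthetic)": 24_999,
}

# Total size of the combined parallel corpus.
total = sum(corpus.values())

# Share of each source, as a percentage rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in corpus.items()}

for name, n in corpus.items():
    print(f"{name}: {n} pairs ({shares[name]}%)")
print(f"total: {total} pairs")
```

The three sources sum to the stated 36,052 pairs, and roughly 69% of the training data is synthetic back-translation.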

2604.19777 2026-04-23 cs.CL cs.AI cs.IR

Self-Describing Structured Data with Dual-Layer Guidance: A Lightweight Alternative to RAG for Precision Retrieval in Large-Scale LLM Knowledge Navigation

Hung Ming Liu

Comments 18 pages, 6 figures, 7 tables

2604.19776 2026-04-23 cs.CL cs.LG

Development and Preliminary Evaluation of a Domain-Specific Large Language Model for Tuberculosis Care in South Africa

Thokozile Khosa, Olawande Daramola

Comments 12 pages, 2 figures, ICICT 2026 Conference

2604.19775 2026-04-23 cs.AI cs.CL cs.ET cs.MA cs.RO

From Actions to Understanding: Conformal Interpretability of Temporal Concepts in LLM Agents

Trilok Padhi, Ramneet Kaur, Krishiv Agarwal, Adam D. Cobb, Daniel Elenius, Manoj Acharya, Colin Samplawski, Alexander M. Berenbeim, Nathaniel D. Bastian, Susmit Jha, Anirban Roy

Comments 12 pages, 3 figures

2604.19774 2026-04-23 cs.CL cs.AI

Phase 1 Implementation of LLM-generated Discharge Summaries showing high Adoption in a Dutch Academic Hospital

Nettuno Nadalini, Tarannom Mehri, Anne H Hoekman, Katerina Kagialari, Job N Doornberg, Tom P van der Laan, Jacobien H F Oosterhoff, Rosanne C Schoonbeek, Charlotte M H H T Bootsma-Robroeks

Comments The methods section is located after the discussion in this manuscript

2604.19773 2026-04-23 cs.CL cs.AI

PR-CAD: Progressive Refinement for Unified Controllable and Faithful Text-to-CAD Generation with Large Language Models

Jiyuan An, Jiachen Zhao, Fan Chen, Liner Yang, Zhenghao Liu, Hongyan Wang, Weihua An, Meishan Zhang, Erhong Yang

2604.19772 2026-04-23 cs.CL cs.AI

CoAuthorAI: A Human in the Loop System For Scientific Book Writing

Yangjie Tian, Xungang Gu, Yun Zhao, Jiale Yang, Lin Yang, Ning Li, He Zhang, Ruohua Xu, Hua Wang, Kewen Liao, Ming Liu

2604.19771 2026-04-23 cs.CL cs.AI cs.IR

Cognis: Context-Aware Memory for Conversational AI Agents

Parshva Daftari, Khush Patel, Shreyas Kapale, Jithin George, Siva Surendira

Comments 30 pages, 8 figures, 11 tables

2604.19770 2026-04-23 cs.CL cs.CV

Hybrid Multi-Phase Page Matching and Multi-Layer Diff Detection for Japanese Building Permit Document Review

Mitsumasa Wada

Comments 9 pages, 3 figures

2604.19769 2026-04-23 cs.CL cs.AI cs.LG

TTKV: Temporal-Tiered KV Cache for Long-Context LLM Inference

Gradwell Dzikanyanga, Weihao Yang, Hao Huang, Donglei Wu, Shihao Wang, Wen Xia, Sanjeeb K C

2604.19768 2026-04-23 cs.CL cs.AI

Saying More Than They Know: A Framework for Quantifying Epistemic-Rhetorical Miscalibration in Large Language Models

Asim D. Bakhshi

Comments 19 pages, 7 figures, Paper Under Review by the Elsevier Journal Assessing Writing

2604.19767 2026-04-23 cs.LG cs.AI

Accelerating PayPal's Commerce Agent with Speculative Decoding: An Empirical Study on EAGLE3 with Fine-Tuned Nemotron Models

Ally Qin, Jian Wan, Sarat Mudunuri, Srinivasan Manoharan

2604.19766 2026-04-23 cs.CL cs.AI

OThink-SRR1: Search, Refine and Reasoning with Reinforced Learning for Large Language Models

Haijian Liang, Zenghao Niu, Junjie Wu, Changwang Zhang, Wangchunshu Zhou, Jun Wang

2604.19765 2026-04-23 cs.CL cs.AI

Do Hallucination Neurons Generalize? Evidence from Cross-Domain Transfer in LLMs

Snehit Vaddi, Pujith Vaddi

Comments 18 pages, 5 models, 6 domains, ACL format. Includes causal intervention analysis

2604.19764 2026-04-23 cs.CL cs.AI

Can We Locate and Prevent Stereotypes in LLMs?

Alex D'Souza

2604.19761 2026-04-23 cs.AI cs.LG cs.NE

EvoForest: A Novel Machine-Learning Paradigm via Open-Ended Evolution of Computational Graphs

Kamer Ali Yuksel, Hassan Sawaf

2604.19760 2026-04-23 cs.AI cs.SI

Inference Headroom Ratio: A Diagnostic and Control Framework for Inference Stability Under Constraint

Robert Reinertsen

Comments Resubmission with revisions addressing moderator concerns regarding distinction from signal-to-noise metrics and structural dependence in simulation design. See updated Section 4.4 for clarification

2604.19759 2026-04-23 cs.AI cs.CL

Automated Detection of Dosing Errors in Clinical Trial Narratives: A Multi-Modal Feature Engineering Approach with LightGBM

Mohammad AL-Smadi

Comments Accepted for CL4Health 2026, LREC26 conference

2604.19758 2026-04-23 cs.AI cs.CL cs.LG

ThermoQA: A Three-Tier Benchmark for Evaluating Thermodynamic Reasoning in Large Language Models

Kemal Düzkar

Comments 17 pages, 8 figures, open-source dataset and code

2604.19757 2026-04-23 cs.LG cs.AI cs.CL

Transparent Screening for LLM Inference and Training Impacts

Arnault Pachot, Thierry Petit

2604.19756 2026-04-23 cs.LG cs.AI

WorkflowGen: an adaptive workflow generation mechanism driven by trajectory experience

Ruocan Wei, Shufeng Wang, Ziwei Shi

Comments 16 pages, 3 tables

2604.19754 2026-04-23 cs.AI cs.LG

Exploring Data Augmentation and Resampling Strategies for Transformer-Based Models to Address Class Imbalance in AI Scoring of Scientific Explanations in NGSS Classroom

Prudence Djagba, Kevin Haudek, Clare G. C. Franovic, Leonora Kaldaras

Comments Published as a conference paper at NARST 2026

2604.19753 2026-04-23 cs.AI cs.CL cs.LG

Algorithm Selection with Zero Domain Knowledge via Text Embeddings

Stefan Szeider

2604.19751 2026-04-23 cs.AI cs.CY

AI to Learn 2.0: A Deliverable-Oriented Governance Framework and Maturity Rubric for Opaque AI in Learning-Intensive Domains

Seine A. Shintani

Comments 10 pages, 2 figures

2604.19749 2026-04-23 cs.AI cs.SE

The Tool-Overuse Illusion: Why Does LLM Prefer External Tools over Internal Knowledge?

Yirong Zeng, Shen You, Yufei Liu, Qunyao Du, Xiao Ding, Yutai Hou, Yuxian Wang, Wu Ning, Haonan Song, Dandan Tu, Bibo Cai, Ting Liu

Comments 17 pages, 9 figures

2604.19679 2026-04-23 cs.CV

MMControl: Unified Multi-Modal Control for Joint Audio-Video Generation

Liyang Li, Wen Wang, Canyu Zhao, Tianjian Feng, Zhiyue Zhao, Hao Chen, Chunhua Shen

Comments Project page: https://aim-uofa.github.io/MMControl/

2604.19591 2026-04-23 cs.CV

Structure-Semantic Decoupled Modulation of Global Geospatial Embeddings for High-Resolution Remote Sensing Mapping

Jienan Lyu, Miao Yang, Jinchen Cai, Yiwen Hu, Guanyi Lu, Junhao Qiu, Runmin Dong

2604.19564 2026-04-23 cs.CV cs.AI

EgoSelf: From Memory to Personalized Egocentric Assistant

Yanshuo Wang, Yuan Xu, Xuesong Li, Jie Hong, Yizhou Wang, Chang Wen Chen, Wentao Zhu

2604.19502 2026-04-23 cs.CL

Beyond Rating: A Comprehensive Evaluation and Benchmark for AI Reviews

Bowen Li, Haochen Ma, Yuxin Wang, Jie Yang, Yining Zheng, Xinchi Chen, Xuanjing Huang, Xipeng Qiu

Comments 38 pages, 8 figures, 4 tables

2604.19386 2026-04-23 cs.CV

Air-Know: Arbiter-Calibrated Knowledge-Internalizing Robust Network for Composed Image Retrieval

Zhiheng Fu, Yupeng Hu, Qianyun Yang, Shiqi Zhang, Zhiwei Chen, Zixu Li

Comments Accepted by CVPR 2026