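arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.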

2601.12355 2026-05-08 cs.LG

Tree-Structured Synergy of Large Language Models and Bayesian Optimization for Efficient CASH

Beicheng Xu, Weitong Qian, Lingching Tung, Yupeng Lu, Bin Cui

Abstract

To lower the expertise barrier in machine learning, the AutoML community has focused on the CASH problem, which jointly automates algorithm selection and hyperparameter tuning. While traditional methods like Bayesian Optimization (BO) struggle with cold-start issues, Large Language Models (LLMs) can mitigate these through semantic priors. However, existing LLM-based optimizers generalize poorly to high-dimensional, structured CASH spaces. In this paper, we propose LB-MCTS, a trajectory-structured optimization framework that uses a Monte Carlo Tree Search tree as a shared state for algorithm selection, hyperparameter refinement, and BO-LLM proposer synergy. Within this shared state, BO provides algorithm-specific surrogate modeling for quantitative search, while the LLM exploits path-aware selective memory to generate semantic proposals and reflections. As the surrogate model improves, a reliability-aware proposer policy adaptively shifts from LLM-driven to BO-driven proposals within a unified search trajectory. Experiments on 104 AMLB datasets demonstrate that LB-MCTS consistently outperforms BO-based, LLM-based, and hybrid baselines.

2601.09298 2026-05-08 cs.CV

Multi-Modal LLM based Image Captioning in ICT: Bridging the Gap Between General and Industry Domain

Lianying Chao, Kai Zhang, Haoran Cai, Sijie Wu, Xubin Li, Xin Chen

Journal ref: 2025 CCF BigData

Abstract

In the information and communications technology (ICT) industry, training a domain-specific large language model (LLM) or constructing a retrieval-augmented generation system requires a substantial amount of high-value domain knowledge. However, this knowledge is hidden not only in the textual modality but also in the image modality. Traditional methods can parse text from domain documents but do not have image captioning ability. Multi-modal LLMs (MLLMs) can understand images, but they do not have sufficient domain knowledge. To address these issues, this paper proposes a multi-stage progressive training strategy to train a Domain-specific Image Captioning Model (DICModel) in ICT, and constructs a standard evaluation system to validate the performance of DICModel. Specifically, this work first synthesizes about 7K image-text pairs by combining the Mermaid tool and LLMs, which are used for the first-stage supervised fine-tuning (SFT) of DICModel. Then, ICT-domain experts manually annotate about 2K image-text pairs for the second-stage SFT of DICModel. Finally, experts and LLMs jointly synthesize about 1.5K visual question answering samples for the instruction-based SFT. Experimental results indicate that DICModel, with only 7B parameters, performs better than other state-of-the-art (SOTA) models with 32B parameters. Compared to the SOTA models with 7B and 32B parameters, DICModel increases the BLEU metric by approximately 56.8% and 20.8%, respectively. On the objective questions constructed by ICT-domain experts, DICModel outperforms Qwen2.5-VL 32B by 1% in accuracy. In summary, this work can efficiently and accurately extract the logical text from images, which is expected to promote the development of multimodal models in the ICT domain.

2601.08403 2026-05-08 cs.AI

Owen-Shapley Policy Optimization: A Principled RL Algorithm for Generative Search LLMs

Abhijnan Nath, Alireza Bagheri Garakani, Tianchen Zhou, Fan Yang, Yan Gao, Nikhil Krishnaswamy

Comments Added additional experiments, computational analysis and further revisions

2601.06320 2026-05-08 cs.LG physics.geo-ph

Sensoformer: Robust Sim-to-Real Inference on Variable-Geometry Sensor Sets via Physics-Structured Randomization

Zhe Jia, Xiaotian Zhang, Junpeng Li

2601.03162 2026-05-08 cs.LG

On the Convergence Behavior of Preconditioned Gradient Descent Toward the Rich Learning Regime

Shuai Jiang, Alexey Voronin, Eric Cyr, Ben Southworth

Comments 21 pages, 13 figures

2601.01400 2026-05-08 cs.CL

EternalMath: A Living Benchmark of Frontier Mathematics that Evolves with Human Discovery

Jicheng Ma, Guohua Wang, Xinhua Feng, Yiming Liu, Zhichao Hu, Yuhong Liu

2601.00655 2026-05-08 cs.LG cs.AI

Interpretability-Guided Bi-objective Optimization: Aligning Accuracy and Explainability

Kasra Fouladi, Hamta Rahmani

Comments 12 pages

2512.22991 2026-05-08 cs.LG

Fusion or Confusion? Multimodal Complexity Is Not All You Need

Tillmann Rheude, Roland Eils, Benjamin Wild

2512.20854 2026-05-08 cs.CL cs.IR

How important is Recall for Measuring Retrieval Quality?

Shelly Schwartz, Oleg Vasilyev, Randy Sawaya

Comments Dataset: https://huggingface.co/datasets/primer-ai/retrieval-response

2512.18181 2026-05-08 cs.CV

MACE-Dance: Motion-Appearance Cascaded Experts for Music-Driven Dance Video Generation

Kaixing Yang, Jiashu Zhu, Xulong Tang, Ziqiao Peng, Xiangyue Zhang, Puwei Wang, Jiahong Wu, Xiangxiang Chu, Hongyan Liu, Jun He

Comments Accepted by SIGGRAPH 2026

2512.18034 2026-05-08 cs.AI

Accelerating Discrete Facility Layout Optimization: A Hybrid CDCL and CP-SAT Architecture

Joshua Gibson, Kapil Dhakal

2512.13281 2026-05-08 cs.CV

VideoASMR-Bench: Can AI-Generated ASMR Videos Fool VLMs and Humans?

Jiaqi Wang, Weijia Wu, Yi Zhan, Rui Zhao, Ming Hu, James Cheng, Wei Liu, Philip Torr, Kevin Qinghong Lin

Comments Code is at https://github.com/video-reality-test/video-reality-test, page is at https://video-reality-test.github.io/

2512.10248 2026-05-08 cs.CV cs.AI

RobustSora: De-Watermarked Benchmark for Robust AI-Generated Video Detection

Zhuo Wang, Xiliang Liu, Ligang Sun

2512.06370 2026-05-08 cs.LG stat.ML

Greedy Alignment Principle for Optimizer Selection

Jaerin Lee, Kyoung Mu Lee

Comments 34 pages, 4 figures

2511.22812 2026-05-08 cs.CV

LC4-DViT: Land-cover Creation for Land-cover Classification with Deformable Vision Transformer

Kai Wang, Siyi Chen, Weicong Pang, Chenchen Zhang, Renjun Gao, Ziru Chen, Cheng Li, Dasa Gu, Rui Huang, Alexis Kai Hon Lau

Comments This work has been submitted to the IEEE for possible publication. The project is available at https://github.com/weicongpang/LVC2-DViT.git

2511.22038 2026-05-08 cs.CL

Early Risk Prediction with Temporally and Contextually Grounded Clinical Language Processing

Rochana Chaturvedi, Yue Zhou, Andrew D. Boyd, Brian T. Layden, Mudassir Rashid, Lu Cheng, Ali Cinar, Barbara Di Eugenio

2511.21471 2026-05-08 cs.AI

SpatialBench: Benchmarking Multimodal Large Language Models for Spatial Cognition

Peiran Xu, Sudong Wang, Yao Zhu, Jianing Li, Gege Qi, Yunjian Zhang

2511.19972 2026-05-08 cs.CV

Boosting Reasoning in Large Multimodal Models via Activation Replay

Yun Xing, Xiaobin Hu, Qingdong He, Jiangning Zhang, Shuicheng Yan, Shijian Lu, Yu-Gang Jiang

Comments CVPR 2026

2511.00751 2026-05-08 cs.AI cs.CL

Self-Consistency Is Losing Its Edge: Diminishing Returns and Rising Costs in Modern LLMs

Chiyan Loo

Comments 7 pages, 3 figures

2510.16371 2026-05-08 cs.CV cs.AI cs.LG

Cataract-LMM Large-Scale Multi-Source Multi-Task Benchmark for Deep Learning in Surgical Video Analysis

Mohammad Javad Ahmadi, Iman Gandomi, Parisa Abdi, Seyed-Farzad Mohammadi, Amirhossein Taslimi, Mehdi Khodaparast, Hassan Hashemi, Mahdi Tavakoli, Hamid D. Taghirad

Comments 28 pages, 14 figures, 15 tables. Data descriptor for the Cataract-LMM benchmark dataset. Source code and dataset are available

2510.11068 2026-05-08 cs.LG eess.AS eess.IV

Efficient Test-Time Adaptation through Latent Subspace Coefficients Search

Xinyu Luo, Jie Liu, Kecheng Chen, Junyi Yang, Bo Ding, Arindam Basu, Haoliang Li

Comments Under review

2510.10241 2026-05-08 cs.CL cs.IR

ImCoref-CeS: An Improved Lightweight Pipeline for Coreference Resolution with LLM-based Checker-Splitter Refinement

Kangyang Luo, Yuzhuo Bai, Shuzheng Si, Cheng Gao, Zhitong Wang, Yingli Shen, Wenhao Li, Zhu Liu, Yufeng Han, Jiayi Wu, Cunliang Kong, Maosong Sun

Comments Accepted by ACL2026 main

2510.09316 2026-05-08 cs.LG cs.CL

Large Language Model Prompt Datasets: An In-depth Analysis and Insights

Yuanming Zhang, Yan Lin, Arijit Khan, Huaiyu Wan

2510.08750 2026-05-08 cs.LG cs.CL

Exploring Cross-Client Memorization of Training Data in Large Language Models for Federated Learning

Tinnakit Udsa, Can Udomcharoenchaikit, Patomporn Payoungkhamdee, Sarana Nutanong, Norrathep Rattanavipanon

Comments Accepted to The 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)

2510.07516 2026-05-08 cs.AI cs.CL

CompassLLM: A Multi-Agent Approach toward Geo-Spatial Reasoning for Popular Path Query

Md. Nazmul Islam Ananto, Shamit Fatin, Mohammed Eunus Ali, Md Rizwan Parvez

2510.01719 2026-05-08 cs.CL

What MLLMs Learn about When they Learn about Multimodal Reasoning

Jiwan Chung, Neel Joshi, Pratyusha Sharma, Youngjae Yu, Vibhav Vineet

2510.01457 2026-05-08 cs.LG

A Forensic Analysis of Synthetic Data in RL: Diagnosing and Solving Algorithmic Failures in Model-Based Policy Optimization

Brett Barkley, David Fridovich-Keil

2509.23629 2026-05-08 cs.AI cond-mat.dis-nn cond-mat.stat-mech cs.LG physics.soc-ph

Emergent Slow Thinking in LLMs as Inverse Tree Freezing

Sihan Hu, Xiansheng Cai, Yuan Huang, Zhiyuan Yao, Linfeng Zhang, Pan Zhang, Youjin Deng, Kun Chen

Comments 34 pages, 17 figures, 1 table

2509.17291 2026-05-08 cs.LG

GraphWeave: Interpretable and Robust Graph Generation via Random Walk Trajectories

Rahul Nandakumar, Deepayan Chakrabarti

Comments 18 pages, 4 figures. Accepted at ECML-PKDD 2025

2509.14594 2026-05-08 cs.AI

SynBench: A Benchmark for Differentially Private Text Generation

Yidan Sun, Viktor Schlegel, Srinivasan Nandakumar, Iqra Zahid, Yuping Wu, Yulong Wu, Hao Li, Jie Zhang, Warren Del-Pinto, Goran Nenadic, Siew Kei Lam, Anil Anthony Bharath

Comments 16 pages