arXivDaily: daily arXiv digest, updated Monday through Friday
2605.05328 2026-05-08 cs.CV cs.RO

Query2Uncertainty: Robust Uncertainty Quantification and Calibration for 3D Object Detection under Distribution Shift

Till Beemelmanns, Alexey Nekrasov, Stefan Vilceanu, Jonas Steinhaus, Timo Woopen, Bastian Leibe, Lutz Eckstein

Affiliations: Institute for Automotive Engineering, RWTH Aachen; Computer Vision Institute, RWTH Aachen

AI summary: This paper proposes a density-aware calibration method that couples post-hoc calibrators with the feature density of latent object queries from DETR-style 3D object detectors, improving uncertainty estimation and calibration under distribution shift.

Comments Accepted for publication at CVPR 2026

Abstract

Reliable uncertainty estimation for 3D object detection is critical for deploying safe autonomous systems, yet modern detectors remain poorly calibrated, especially under distribution shifts. Although post-hoc calibration methods improve calibration on in-distribution tests, they fail to adapt in distribution-shifted scenarios. In this work, we introduce a density-aware calibration method that couples post-hoc calibrators with the feature density of latent object queries from DETR-style 3D object detectors. These queries form a compact, location- and class-aware feature, ideal for density estimation, allowing our approach to adjust model confidences in distribution-shift scenarios. By fitting a density estimator on these query features, our approach jointly recalibrates both classification and bounding box regression uncertainties. On both a multi-view camera detector and a LiDAR-based detector, our approach consistently outperforms standard post-hoc methods in both in-distribution and distribution-shifted scenarios. Code is available at https://tillbeemelmanns.github.io/query2uncertainty/.

2605.05285 2026-05-08 cs.LG

Attribution-Guided Continual Learning for Large Language Models

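Yazheng Liu, Yuxuan Wan, Rui Xu, Xi Zhang, Sihong Xie, Hui Xiong

Affiliations: The Hong Kong University of Science and Technology (Guangzhou); The Beijing University of Posts and Telecommunications

AI summary: This paper proposes an attribution-guided continual learning framework that estimates parameter importance in each Transformer layer and modulates gradients to preserve knowledge of earlier tasks; experiments show it outperforms baselines on continual learning benchmarks.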

Abstract

Large language models (LLMs) often suffer from catastrophic forgetting in continual learning: after learning new tasks sequentially, they perform worse on earlier tasks. Existing methods mitigate catastrophic forgetting by data replay, parameter freezing, or regularization. However, these methods lack semantic awareness of internal knowledge distribution in LLMs. As a result, they cannot distinguish parameters that should be preserved or updated. We propose an attribution-guided continual fine-tuning framework for LLMs. Our method estimates task-specific, element-wise parameter importance in each Transformer layer and uses these scores to modulate gradients. Parameters important to previous tasks receive smaller updates, while less relevant ones remain plastic for learning new tasks. Experiments on continual learning benchmarks show that our method consistently outperforms baselines, achieving better retention of old tasks while maintaining competitive performance on new tasks.
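As a rough illustration of the gradient modulation this abstract describes, the Python sketch below scales each gradient element by one minus its normalized importance, so that parameters important to earlier tasks receive smaller updates. The name `modulate_gradient`, the max-normalization, and the linear `1 - importance` rule are assumptions made for illustration; the abstract does not say how the attribution scores are computed or applied.

```python
import numpy as np

def modulate_gradient(grad, importance, eps=1e-12):
    """Shrink updates for parameters that matter to earlier tasks.

    `importance` holds hypothetical element-wise scores (>= 0), one per
    parameter; higher means more important to previous tasks.
    """
    grad = np.asarray(grad, dtype=float)
    importance = np.asarray(importance, dtype=float)
    # Normalize scores to [0, 1] so the most important element is nearly frozen.
    normalized = importance / (importance.max() + eps)
    # Illustrative rule (an assumption): scale factor = 1 - normalized importance.
    return grad * (1.0 - normalized)

# Three parameters with equal raw gradients but different importance.
scaled = modulate_gradient([1.0, 1.0, 1.0], [0.0, 0.5, 1.0])
print(scaled)
```

In this toy case the unimportant parameter keeps its full gradient, while the most important one is almost frozen.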

2605.05283 2026-05-08 cs.CV

Seeing What Shouldn't Be There: Counterfactual GANs for Medical Image Attribution

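Shakeeb Murtaza

Affiliations: COMSATS University Islamabad

AI summary: This paper proposes a class-oriented feature attribution method based on counterfactual explanations, using generative adversarial networks with a cycle-consistency loss to improve the interpretability of medical image analysis, and evaluates a method for generating counterfactual instances.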

Abstract

Image attribution gives insight into the objects that influence the classification of a whole image, or of its pixels, towards a specific category. These insights help radiologists visualize deformities in medical imaging. Most existing visualization techniques are based on discriminative models and highlight the regions of the input image that participate in a classifier's decision. However, these approaches do not take all noticeable objects into account, because their objective is to classify the input using a minimal set of discriminative features. To overcome this issue, a class-oriented feature attribution method based on counterfactual explanations (CXs) is proposed. A CX explicates a causal reasoning process of the form: "if X had not happened, then Y would not have happened". The method is built on generative adversarial networks (GANs) with a cycle-consistency loss function. We evaluate our method on three datasets: synthetic, tuberculosis, and BraTS. All experiments confirm the efficacy of the proposed method. This study also highlights the limitations of existing counterfactual explanation techniques in producing plausible counterfactual instances (CIs). Accompanying CXs with believable CIs provides self-explanatory, analogy-based explanations. To this end, a CI generation method is proposed, and a novel technique is used to evaluate CI quality. The baseline results are produced on the BraTS dataset.

2605.05280 2026-05-08 cs.LG

Forecasting Green Skill Demand in the Automotive Industry: Evidence from Online Job Postings

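Sabur Butt, Joshua N. Arrazola E., Hector G. Ceballos, Patricia Caratozzolo

Affiliations: Institute for the Future of Education, Tecnológico de Monterrey

AI summary: Analyzing online job postings from Mexico's automotive industry, this paper builds a computational framework to forecast green skill demand, identifies 274 green skills, and uses time series models to predict future trends, finding that demand grows fastest for renewable energy, recycling, and hydrogen technologies.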

Abstract

The global transition toward sustainable economies is reshaping labor markets, yet systematic methods for identifying and forecasting green skills remain limited. This study presents a computational framework to measure and predict green skill demand using online job postings from Mexico's automotive industry, which contributes about 4% of national GDP. We compile a dataset of job advertisements from Indeed Mexico, OCC Mundial, and LinkedIn (July 2024 to July 2025), yielding 204,373 skill records. A two-stage pipeline combining multilingual embeddings and ESCO validation identifies 274 unique green skills across 8,576 occurrences (4.22% of all skills). We benchmark 15 time series forecasting models using a rolling origin evaluation. Transformer-based models, especially FEDformer, Reformer, and Informer, achieve the best performance, with MAE around 2.5e-5 and relative RMSE below 15. We further propose a framework to classify skills by absolute and relative growth, identifying stable, emerging, and high-impact competencies. Results show current demand is concentrated in operational sustainability practices, while the fastest-growing skills relate to renewable energy, recycling, and hydrogen technologies. This pipeline supports data-driven workforce planning in the green transition.

2605.05278 2026-05-08 cs.LG cs.IT math.IT

Expert Routing for Communication-Efficient MoE via Finite Expert Banks

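Mohammad Reza Deylam Salehi, Ali Khalesi

Affiliations: LINCS Lab

AI summary: This paper proposes a finite expert bank framework that uses pretrained CNN experts and a data-dependent selection rule, quantifying routing information with information-theoretic measures to improve resource efficiency and expert routing in MoE inference systems.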

Abstract

Resource-efficient machine learning increasingly uses sparse Mixture-of-Experts (MoE) architectures, where the gate acts as both a learning component and a routing interface controlling computation, communication, and accuracy. Motivated by finite-rate interpretations of MoE gating, we treat the gate as a stochastic channel and use $I(X;T)$ to quantify the routing information available to the selected expert. To make the associated information quantities tractable beyond synthetic examples, we develop a finite-bank MNIST construction using pretrained CNN experts and a discrete, data-dependent selection rule. Since the selected model belongs to a finite candidate set, the algorithmic mutual information $I(S;W)$ admits a closed-form discrete-entropy estimator from the empirical posterior $q(W|S)$. Sweeping a data-dependence parameter $α$, we observe that $\widehat I(S;W)$ monotonically tracks the generalization gap, while the Xu-Raginsky bound exhibits the expected looseness. We also compare with a uniform union-bound baseline and introduce an empirical estimator of $I(X;T)$ together with a Blahut-Arimoto procedure for tracing an accuracy-rate curve over the expert bank. The proposed framework provides a practical tool for analyzing resource-aware MoE inference systems and for interpreting $I(X;T)$ and $D(R_g)$ as design proxies for efficient expert routing.
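To make the routing quantity $I(X;T)$ concrete, the sketch below computes a plug-in mutual information estimate from a table of counts. It assumes that inputs have been grouped into a finite number of classes (rows) and that each column is one expert in the bank; this discretization and the function name `mutual_information_bits` are illustrative assumptions, not the estimator used in the paper.

```python
import numpy as np

def mutual_information_bits(counts):
    """Plug-in estimate of I(X;T) in bits from a (classes x experts) count table."""
    joint = np.asarray(counts, dtype=float)
    joint = joint / joint.sum()                # empirical joint p(x, t)
    px = joint.sum(axis=1, keepdims=True)      # marginal over input classes
    pt = joint.sum(axis=0, keepdims=True)      # marginal over experts
    mask = joint > 0                           # skip empty cells (0 * log 0 = 0)
    ratio = joint[mask] / (px @ pt)[mask]
    return float(np.sum(joint[mask] * np.log2(ratio)))

# Deterministic routing: each input class always goes to its own expert.
deterministic = mutual_information_bits([[50, 0], [0, 50]])
# Uninformative routing: the expert is chosen independently of the input.
uninformative = mutual_information_bits([[25, 25], [25, 25]])
print(deterministic, uninformative)
```

A gate that routes two equally likely input classes to two distinct experts carries one bit of routing information, while a gate that ignores the input carries none.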

2605.05245 2026-05-08 cs.CL cs.IR

AdaGATE: Adaptive Gap-Aware Token-Efficient Evidence Assembly for Multi-Hop Retrieval-Augmented Generation

Yilin Guo, Yinshan Wang, Yixuan Wang

Affiliations: Center for Data Science; New York University; Tandon School of Engineering

AI summary: AdaGATE improves the robustness of multi-hop RAG under imperfect retrieval through adaptive gap-aware repair and token-efficient evidence selection, achieving the best evidence F1 score.

Comments 10 pages, 4 figures, 2 tables

Abstract

Retrieval-augmented generation (RAG) remains brittle on multi-hop questions in realistic deployment settings, where retrieved evidence may be noisy or redundant and only limited context can be passed to the generator. Existing controllers address parts of this problem, but typically either expand context additively, select from a fixed top-k set, or optimize relevance without explicitly repairing missing bridge facts. We propose AdaGATE, a training-free evidence controller for multi-hop RAG that frames evidence selection as a token-constrained repair problem. AdaGATE combines entity-centric gap tracking, targeted micro-query generation, and a utility-based selection mechanism that balances gap coverage, corroboration, novelty, redundancy, and direct question relevance. We evaluate AdaGATE on HotpotQA under clean, redundancy-injected, and noise-injected retrieval conditions. Across all three settings, AdaGATE achieves the best evidence F1 among the compared controllers, reaching 62.3% on clean data and 71.2% under redundancy injection, while using 2.6x fewer input tokens than Adaptive-k. These results suggest that explicit gap-aware repair, combined with token-efficient evidence selection, improves robustness in multi-hop RAG under imperfect retrieval. Our code and evaluation pipeline are available at https://github.com/eliguo/AdaGATE.
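The idea of selecting evidence under a token budget can be sketched with a simple greedy rule. The code below is not the AdaGATE utility function: the single precomputed `utility` score per passage, the utility-per-token ranking, and the name `select_evidence` are all assumptions chosen to show the shape of a token-constrained selection step.

```python
def select_evidence(passages, budget):
    """Greedily pick passages by utility per token until the budget is spent.

    `passages` is a list of (passage_id, utility, token_count) tuples, where
    `utility` is a hypothetical score that already combines relevance,
    novelty, and so on.
    """
    chosen, used = [], 0
    ranked = sorted(passages, key=lambda p: p[1] / p[2], reverse=True)
    for passage_id, _utility, tokens in ranked:
        if used + tokens <= budget:   # keep only passages that still fit
            chosen.append(passage_id)
            used += tokens
    return chosen, used

passages = [("a", 0.9, 100), ("b", 0.8, 40), ("c", 0.1, 50)]
print(select_evidence(passages, budget=150))
```

Ranking by utility per token rather than raw utility favors short, dense passages, which is one simple way to trade coverage against context length.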

2605.05241 2026-05-08 cs.RO cs.LG

DexSim2Real: Foundation Model-Guided Sim-to-Real Transfer for Generalizable Dexterous Manipulation

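Zijian Zeng, Fei Ding, Huiming Yang, Xianwei Li, Yuhao Liao

Affiliations: Tsinghua University; Alibaba Group; Bengbu University; UCSI University

AI summary: This paper proposes the DexSim2Real framework, which combines foundation model-guided domain randomization, a tactile-visual cross-attention policy, and a progressive skill curriculum to improve sim-to-real transfer for dexterous manipulation; experiments show an average real-world success rate of 78.2%.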

Comments 13 pages, 2 figures, 5 tables

Abstract

Sim-to-real transfer remains a critical bottleneck for deploying dexterous manipulation policies learned in simulation to real-world robots. Existing approaches rely on manually designed domain randomization or task-specific adaptation, limiting their generalizability across diverse manipulation scenarios. We present DexSim2Real, an integrated framework that leverages vision-language foundation models to bridge the sim-to-real gap for dexterous manipulation. Our system combines three components: (1) Foundation Model-Guided Domain Randomization (FM-DR), which uses a vision-language model as a visual realism critic to optimize simulation parameters via closed-loop CMA-ES, complementing text-based approaches like DrEureka with direct visual feedback; (2) a Tactile-Visual Cross-Attention Policy (TVCAP) that adapts cross-attention visuo-tactile fusion to zero-shot sim-to-real RL; and (3) a Progressive Skill Curriculum (PSC) that builds on LLM-based task decomposition with a difficulty scheduler tailored to contact-rich dexterous tasks. Extensive experiments on six challenging manipulation tasks with blinded evaluation demonstrate that DexSim2Real achieves a 78.2% average real-world success rate, outperforming DrEureka and DeXtreme while reducing the sim-to-real performance gap to only 8.3%.

2605.05236 2026-05-08 cs.RO cs.AI

Topology-Driven Anti-Entanglement Control for Soft Robots

Haoyang Le, Shengxuan Wang, Mohan Chen, Shuo Feng

Affiliations: School of Mathematics and Statistics, Zhengzhou University; School of Computer Science and Artificial Intelligence, Zhengzhou University; College of Information Engineering, North China University of Water Resources and Electric Power

AI summary: This paper proposes a topology-driven multi-agent reinforcement learning framework that coordinates multi-robot systems through centralized learning and a topological safety layer, addressing entanglement in complex environments and improving convergence and anti-entanglement performance.

Comments 17 pages, 4 figures

Abstract

Soft robots play an increasingly prominent role in precision manufacturing in complex, constrained environments, and anti-entanglement control based on multi-agent reinforcement learning has become a research hotspot. A core problem is coordinating multiple robots to complete disentangling operations in a highly constrained environment. Existing distributed training frameworks face observability challenges in unstable environments with dense obstacles, resulting in poor learning. This paper proposes a Topology-Driven Multi-Agent Reinforcement Learning (TD-MARL) framework that coordinates multi-robot systems to avoid entanglement. Specifically, the critic network adopts centralized learning, so that each agent can perceive the policies of other agents through a shared topological state, alleviating the training instability caused by complex interactions. Distributed execution eliminates the need for communication resources between robots and improves system reliability. An integrated topological safety layer uses topological invariants to accurately assess and mitigate entanglement risk, preventing the policy from becoming trapped in local difficulties. Finally, simulation experiments in a realistic simulation environment show that the method outperforms current advanced deep reinforcement learning (DRL) methods in convergence and anti-entanglement performance.

2605.05228 2026-05-08 cs.LG cs.AI cs.NE

Evolutionary fine tuning of quantized convolution-based deep learning models

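Marcin Pietroń

Affiliations: AGH University

AI summary: This paper uses an evolution strategy to improve the accuracy of quantized models by changing the quantization states of a small fraction of weights, and validates the approach on architectures such as VGG and ResNet.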

Abstract

Deep learning models are the most efficient models in many machine learning tasks. Their main disadvantage in IoT, mobile devices, independent autonomous systems, or real-time systems is their complexity and memory size. Therefore, much research has concentrated on compression techniques for deep learning architectures. One of the most popular techniques is quantization. In most works, quantization is based on the nearest-neighbour quantization technique. This work focuses on improving quantization efficiency in pretrained and quantized models, which has the potential to improve the final accuracy of quantized models. The main postulate of the work is that the final quantization states of a network obtained by nearest-neighbour rounding do not guarantee optimal accuracy. In the presented work, an evolution strategy is used as the optimization approach. In each iteration, the evolution changes the values of a small percentage of weights, shifting them to different quantization states. The work shows that the proposed evolution, with an appropriate set of operators and parameters, can quickly improve the accuracy of quantized models. Results are presented for popular architectures such as VGG and ResNet for image classification and detection. Additionally, simulations were carried out for an autoencoder architecture.
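The core loop described here, shifting a small fraction of weights to neighbouring quantization states and keeping the change only if it helps, can be sketched on a toy problem. The code below is a minimal elitist evolution strategy under stated assumptions: a single linear layer, a mean squared error fitness, a uniform quantization grid with step `step`, and the hypothetical helper names `quantize_nearest`, `loss`, and `evolve`. The operators and parameters used in the paper are not given in the abstract.

```python
import numpy as np

def quantize_nearest(w, step):
    """Nearest-neighbour rounding of weights to integer grid indices."""
    return np.round(w / step).astype(np.int64)

def loss(q, step, x, y):
    """Mean squared error of a linear layer using dequantized weights."""
    return float(np.mean((x @ (q * step) - y) ** 2))

def evolve(q, step, x, y, iters=200, frac=0.1, seed=0):
    """Shift a small fraction of weights by one grid level; keep improvements."""
    rng = np.random.default_rng(seed)
    best, best_loss = q.copy(), loss(q, step, x, y)
    n_mut = max(1, int(frac * q.size))
    for _ in range(iters):
        child = best.copy()
        idx = rng.choice(q.size, size=n_mut, replace=False)
        child[idx] += rng.choice([-1, 1], size=n_mut)  # move to a neighbouring state
        child_loss = loss(child, step, x, y)
        if child_loss < best_loss:                     # elitist selection
            best, best_loss = child, child_loss
    return best, best_loss

data_rng = np.random.default_rng(1)
w = data_rng.normal(size=8)          # toy full-precision weights
x = data_rng.normal(size=(32, 8))    # toy inputs
y = x @ w                            # targets from the full-precision layer
step = 0.5
q0 = quantize_nearest(w, step)
q1, final = evolve(q0, step, x, y)
print(loss(q0, step, x, y), final)
```

Because a child replaces the parent only when it lowers the loss, the result is never worse than the nearest-neighbour starting point on the data used for the fitness.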

2605.05227 2026-05-08 cs.LG cs.AI

Rethinking Data Curation in LLM Training: Online Reweighting Offers Better Generalization than Offline Methods

Wanru Zhao, Yihong Chen, Yuzhi Tang, Wentao Ma, Shengchao Hu, Shell Xu Hu, Alex Iacob, Abhinav Mehrotra, Nicholas D. Lane

Affiliations: University of Cambridge; OATML, University of Oxford; University of Toronto; Shanghai Jiao Tong University; Samsung AI Center

AI summary: This paper proposes the ADAPT framework, which improves model generalization through dynamic online reweighting; experiments show it outperforms conventional offline methods in instruction tuning and large-scale pretraining.

Comments ICLR 2026

Abstract

Data curation is a critical yet under-explored area in large language model (LLM) training. Existing methods, such as data selection and mixing, operate in an offline paradigm, detaching themselves from training. This separation introduces engineering overhead and makes the curation brittle: the entire pipeline must be re-run under model/task shifts. Moreover, offline methods alter data size through hard filtering or resampling, often sacrificing data diversity and harming generalization. We propose to rethink data curation as an online reweighting problem, where sample importance is dynamically adjusted during training via loss weighting rather than static pre-processing. Specifically, we introduce ADAPT (Adaptive Data reweighting for Pretraining and FineTuning), a dynamic online framework that reweights training samples with adaptive per-sample learning rates guided by similarity-based quality signals, without changing the number of training samples. Unlike offline methods that enforce a static data distribution, ADAPT acts as an implicit curriculum learner, progressively shifting focus from coarse-grained patterns to fine-grained semantic distinctions as the model evolves. Experiments on both instruction tuning and large-scale pretraining show that ADAPT consistently outperforms offline selection/mixing and prior online methods, achieving stronger cross-benchmark generalization under equal FLOPs.

2605.05224 2026-05-08 cs.LG cs.AI cs.CR

Channel-Level Semantic Perturbations: Unlearnable Examples for Diverse Training Paradigms

Bo Wang, Jia Ni, Mengnan Zhao, Zhan Qin, Kui Ren

Affiliations: School of Information and Communication Engineering, Dalian University of Technology; School of Computer Science and Technology, Anhui University; School of Computer Science and Technology, Zhejiang University

AI summary: This paper studies the effectiveness of unlearnable examples under different training paradigms, finds that pretrained weights weaken existing methods, and proposes the Shallow Semantic Camouflage strategy to strengthen data unlearnability.

Abstract

The unauthorized use of personal data in model training has emerged as a growing privacy threat. Unlearnable examples (UEs) address this issue by embedding imperceptible perturbations into benign examples to obstruct feature learning. However, existing studies mainly evaluate UEs under from-scratch training settings, leaving their behavior under the widely adopted pretraining-finetuning (PF) paradigm largely unexplored. In this work, we provide the first systematic investigation of unlearnable examples across diverse training paradigms. Our analysis reveals that loading and freezing pretrained weights significantly weakens the effectiveness of existing UEs methods. We further explain these findings through semantic filtering: while UEs tend to induce models to overfit non-semantic noise, thereby weakening their semantic extraction capabilities, under the PF paradigm, frozen shallow layers preserve data semantics, effectively filtering out distracting information like unlearnable noise. Guided by these insights, we propose a hierarchical deception strategy, Shallow Semantic Camouflage (SSC), that confines the generation process to a semantically valid subspace, aiming to bypass the semantic suppression introduced by pretrained weights. Extensive experiments demonstrate that our method consistently preserves data unlearnability even under challenging training paradigms, such as shallow-layer freezing and semantic-focused pretraining (SF-Pretrain), bridging the critical gap in pretrain-based unlearnable learning.

2605.05223 2026-05-08 cs.LG cs.AI

Structural Instability of Feature Composition

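Yunpeng Zhou

Affiliations: Whiteknights House, Whiteknights, Reading, RG6 6UR (University of Reading)

AI summary: This paper studies the theoretical foundations of simultaneously activating distinct semantic latents in feature composition, proposes a geometric framework for analyzing the instability of feature unions, and validates the systematic drift caused by ReLU rectification in the high-bias regime.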

Abstract

Sparse Autoencoders (SAEs) have emerged as a powerful paradigm for disentangling feature superposition in transformer-based architectures, enabling precise control via activation steering. However, the theoretical foundations of compositional steering -- the simultaneous activation of distinct semantic latents -- remain under-explored. The prevailing Linear Representation Hypothesis often abstracts away non-linear interference effects that arise in overcomplete dictionaries. We present a geometric framework for analyzing the instability of feature unions. Modeling the activation space as a high-dimensional sparse cone manifold, we derive an asymptotic compositional-collapse threshold under a spherical dictionary model, characterized by the Gaussian mean width (statistical dimension) of the signal cone. We further show that, in the high-bias regime, ReLU rectification converts microscopic correlation-induced variance fluctuations into a systematic drift that accumulates under composition, yielding interference growth consistent with a ratchet effect. We validate the predicted scaling trends on structured semantic features extracted from CLEVR, where hierarchical correlations accelerate the transition relative to random baselines. Together, our results highlight geometric constraints on the scalability of union-based steering and motivate composition mechanisms that explicitly manage interference beyond naive linear superposition.

2605.05222 2026-05-08 cs.LG cs.AI

Adaptive Computation Depth via Learned Token Routing in Transformers

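Ahmed Abdelmuniem Abdalla Mohammed

Affiliations: Independent Researcher

AI summary: This paper proposes Token-Selective Attention, a learned per-token gating mechanism that dynamically adjusts computation depth in Transformers, reducing computation and improving efficiency.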

Comments 11 pages, 9 figures, 4 tables, https://github.com/AhmedHamadto/TSA

Abstract

Standard transformer architectures apply the same number of layers to every token regardless of contextual difficulty. We present Token-Selective Attention (TSA), a learned per-token gate on residual updates between consecutive transformer blocks. Each gate is a lightweight two-layer multi-layer perceptron (MLP) that produces a continuous halting probability, making the mechanism end-to-end differentiable with 1.7% parameter overhead and no changes to the base architecture. Notably, TSA learns difficulty-proportional routing without any explicit depth pressure: even at $λ=0$ (no depth regularisation), the task-loss gradient alone drives the router to skip 20% of token-layer operations. On character-level language modeling, TSA saved 14-23% of token-layer operations (TLOps) across Tiny-Shakespeare and enwik8 at <0.5% quality loss. At matched efficiency, TSA achieved 0.7% lower validation loss than early exit, and the learned routing transfers directly to inference-time sparse execution for real wall-clock speedup.
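A per-token gate on a residual update can be sketched in a few lines of NumPy. The code below assumes that the two-layer MLP outputs a halting probability `p` per token and that the block's update is scaled by `1 - p`; this scaling rule, the ReLU hidden activation, and the name `gated_residual` are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_residual(h, block_fn, w1, b1, w2, b2):
    """Apply a transformer block's residual update, scaled per token.

    h:        (tokens, d) hidden states entering the block
    block_fn: callable returning the block's residual update, shape (tokens, d)
    w1, b1, w2, b2: parameters of a hypothetical two-layer gate MLP
    """
    hidden = np.maximum(0.0, h @ w1 + b1)   # first gate layer with ReLU
    p_halt = sigmoid(hidden @ w2 + b2)      # (tokens, 1) halting probability
    # Tokens with p_halt near 1 pass through almost unchanged (layer skipped).
    return h + (1.0 - p_halt) * block_fn(h), p_halt

h = np.ones((3, 4))
w1, b1, w2 = np.zeros((4, 2)), np.zeros(2), np.zeros((2, 1))
out, p = gated_residual(h, lambda t: 2.0 * t, w1, b1, w2, b2=0.0)
print(out[0], p[0])
```

Because the gate is a smooth function of the hidden state, the whole path stays differentiable during training; at inference time, tokens whose halting probability exceeds a threshold could skip the block entirely.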

2605.05221 2026-05-08 cs.LG cs.CL

Data-Driven Variational Basis Learning Beyond Neural Networks: A Non-Neural Framework for Adaptive Basis Discovery

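Andrew Kiruluta

Affiliations: UC Berkeley

AI summary: This paper proposes a non-neural variational basis learning method that learns basis functions directly from data through variational optimization, yielding data-adaptive basis expansions while retaining interpretability and mathematical transparency.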

Abstract

Classical representation systems such as Fourier series, wavelets, and fixed dictionaries provide analytically tractable basis expansions, but they are not intrinsically adapted to the empirical structure of modern high-dimensional data. Neural networks overcome this limitation by learning features from data, yet they do so through layered nonlinear parameterizations that often sacrifice interpretability, explicit control over basis structure, and mathematical transparency. In this manuscript we develop a non-neural alternative that learns basis functions directly from data through variational optimization. The proposed framework, termed Data Driven Variational Basis Learning (DVBL), treats basis atoms as primary optimization variables and learns them jointly with sample-specific coefficients and, when appropriate, a latent linear evolution operator. This yields a data-adaptive basis expansion that remains explicit, interpretable, and amenable to rigorous analysis. We formulate the model, establish existence of minimizers, prove blockwise descent properties for an alternating minimization algorithm, give conditions for coefficient recovery and basis identifiability, and show how manifold and dynamical regularization can be integrated without invoking neural architectures. We also discuss the conceptual novelty of the framework relative to classical dictionary learning, spectral methods, Koopman operator methods, and deep representation learning.

2605.05218 2026-05-08 cs.LG cs.AI math.DS nlin.CD

Horizon-Constrained Rashomon Sets for Chaotic Forecasting

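Gauri Kale, Rahul Vishwakarma, Holly Diamond, Ava Hedayatipour, Amin Rezaei

Affiliations: Dept. of Electrical Engineering, California State University Long Beach, USA; WorkOnward Inc., USA; Dept. of Computer Engineering & Computer Science, California State University Long Beach, USA

AI summary: This paper proposes horizon-constrained Rashomon sets, a theoretical framework for studying how model multiplicity changes with prediction horizon in chaotic systems, and improves decision quality through Lyapunov-weighted metrics and decision-aligned selection algorithms; experiments show significant advantages in safety-critical domains.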

Journal ref AIP Advances 16, 045208 (2026)

Abstract

Predictive multiplicity and chaotic dynamics represent two fundamental challenges in machine learning that have evolved independently despite their conceptual connections. We bridge this gap by introducing horizon-constrained Rashomon sets, a theoretical framework that characterizes how model multiplicity evolves with prediction horizon in chaotic systems. Unlike static prediction tasks where the Rashomon set remains fixed, chaos induces exponential divergence among initially similar models, fundamentally transforming the nature of predictive equivalence. We prove that the effective Rashomon set contracts exponentially with lead time at a rate determined by the maximum Lyapunov exponent and introduce Lyapunov-weighted metrics that provide tighter bounds on predictive disagreement. Leveraging these insights, we develop decision-aligned selection algorithms that choose among near-optimal models based on downstream utility rather than forecast accuracy alone. Extensive experiments on synthetic chaotic systems (Lorenz-96, Kuramoto-Sivashinsky) and real-world applications (wind power, traffic, weather) demonstrate that our framework improves decision quality by 18-34% while maintaining competitive predictive performance. This work establishes the first rigorous connection between chaos theory and predictive multiplicity, providing principled guidance for deploying machine learning in safety-critical chaotic domains.

2605.05217 2026-05-08 cs.LG cs.AI

Physics-Informed Neural Networks with Learnable Loss Balancing and Transfer Learning

具有可学习损失平衡和迁移学习的物理信息神经网络

Reza Pirayeshshirazinezhad

Affiliations Texas A&M University

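AI summary This paper proposes a self-supervised physics-informed neural network framework that dynamically balances physics constraints and data-driven supervision to improve scientific machine learning under data scarcity. A learnable blending neuron adaptively adjusts the contribution of each loss term, which improves training stability and generalization, and a transfer learning strategy improves efficiency.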

详情
AI中文摘要

我们提出了一种自监督的物理信息神经网络(PINN)框架,该框架能够自适应地平衡物理约束和数据驱动的监督,以应对科学机器学习中的数据稀缺问题。与以往依赖固定或启发式权重的PINN不同,我们的方法引入了可学习的混合神经元,能够根据各项的不确定性动态调整各术语的相对贡献。这种机制使得训练过程更加稳定,并提高了泛化能力,无需手动调参。为进一步提升效率,我们集成了迁移学习策略,通过重用相关领域的表示,并将其适应到新的物理系统中,即使数据有限。我们验证了该框架在预测液态金属微型散热器中的热传递时的表现,仅使用87个CFD数据点,适应性PINN的误差小于8%,优于浅层神经网络、核方法和仅依赖物理的基线方法。我们的框架提供了一种将物理信息适应性地嵌入神经网络的一般方法,为各种科学领域(包括流体动力学和材料建模)中的数据稀缺问题提供了一种稳健且可重复的解决方案。

英文摘要

We propose a self-supervised physics-informed neural network (PINN) framework that adaptively balances physics-based and data-driven supervision for scientific machine learning under data scarcity. Unlike prior PINNs that rely on fixed or heuristic weighting of physics residuals and data loss, our approach introduces a learnable blending neuron that dynamically adjusts the relative contribution of each term based on their uncertainties. This mechanism enables stable training and improved generalization without manual tuning. To further enhance efficiency, we integrate a transfer learning strategy that reuses representations from related domains and adapts them to new physical systems with limited data. We validate the framework for the prediction of heat transfer in liquid-metal miniature heat sinks using only 87 CFD datapoints, where the adaptive PINN achieves an error <8%, outperforming shallow neural networks, kernel methods, and physics-only baselines. Our framework provides a general recipe for embedding physics adaptively into neural networks, offering a robust and reproducible approach for data-scarce problems across various scientific domains, including fluid dynamics and material modeling.
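As a rough illustration of uncertainty-based loss balancing, the sketch below uses one common scheme in which each loss term gets a learnable log-variance `s` and contributes `exp(-s) * loss + s`. This is an assumption chosen for illustration; the abstract does not specify how the blending neuron is parameterized, and the function names are hypothetical.

```python
import math


def balanced_loss(physics_loss, data_loss, s_phys, s_data):
    """Combine two losses with learnable log-variances (illustrative scheme)."""
    return (math.exp(-s_phys) * physics_loss + s_phys
            + math.exp(-s_data) * data_loss + s_data)


def grad_log_vars(physics_loss, data_loss, s_phys, s_data):
    """Gradient of balanced_loss with respect to (s_phys, s_data)."""
    return (1.0 - math.exp(-s_phys) * physics_loss,
            1.0 - math.exp(-s_data) * data_loss)


def fit_log_vars(physics_loss, data_loss, lr=0.1, steps=500):
    """Gradient descent on the log-variances with the two losses held fixed."""
    s_phys = s_data = 0.0
    for _ in range(steps):
        g_phys, g_data = grad_log_vars(physics_loss, data_loss, s_phys, s_data)
        s_phys -= lr * g_phys
        s_data -= lr * g_data
    return s_phys, s_data


s_phys, s_data = fit_log_vars(physics_loss=4.0, data_loss=0.25)
# The effective weight exp(-s) of each term settles near 1 / loss.
print(math.exp(-s_phys), math.exp(-s_data))
```

With the losses frozen, each log-variance converges to the logarithm of its loss, so the larger (noisier) term is down-weighted automatically. In real training the losses change with the network weights and both are updated jointly.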

2605.05216 2026-05-08 cs.LG

SAT: Sequential Agent Tuning for Coordinator Free Plug and Play Multi-LLM Training with Monotonic Improvement Guarantees

SAT: 为无协调自由插拔多LLM训练的单调改进保证的序列代理调优

Yi Xie, Yangyang Xu, Yi Fan, Bo Liu

Affiliations Department of Electrical & Computer Engineering, University of Arizona, Tucson, Arizona, USA; Department of Mathematical Sciences, Rensselaer Polytechnic Institute, Troy, New York, USA; Amazon Web Services, New York, USA

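AI summary This paper proposes SAT, which uses a factorized policy and block-coordinate updates to train a team of LLM agents in a decentralized way without a coordinator, with guarantees of monotonic improvement and plug-and-play invariance. In experiments, a team of three 4B agents outperforms Qwen3-32B on the AIME24/25 benchmarks.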

Comments Published at AAMAS 2026

详情
AI中文摘要

大型语言模型(LLMs)具有大量参数但部署成本高。近期研究探索使用多个小型高效LLM团队,但联合更新导致分布偏移问题。本文引入序列代理调优(SAT),通过因子化策略和块坐标更新实现无协调的分布式训练。具体而言,开发了序列感知的在策略优势估计器和每代理KL信任区域,理论上提供单调改进和可插拔不变性保证。实验表明,三个4B代理团队(12B总参数)在AIME24/25基准上平均超Qwen3-32B 3.9%,通过替换两个8B代理进一步提升复合分数10.4%。

英文摘要

Large language models (LLMs) with a large number of parameters achieve strong performance but are often prohibitively expensive to deploy. Recent work explores using teams of smaller, more efficient LLMs that collectively match or even outperform a single large model. However, jointly updating multiple agents introduces compounding distribution shifts, making coordination and stability during training difficult. We address this by introducing Sequential Agent Tuning (SAT), a coordinator-free training paradigm. SAT represents the team as a factorized policy and employs block-coordinate updates over agents, enabling scalable, decentralized training without a central controller. Specifically, we develop a sequence-aware, on-policy advantage estimator that conditions on the evolving team policy, coupled with per-agent KL trust regions that isolate occupancy drift. Theoretically, this framework provides two critical guarantees. First, it ensures monotonic improvement, stabilizing the training process. Second, it establishes provable plug-and-play invariance: any agent can be upgraded to a stronger model without retraining the rest of the team, with a formal guarantee that the performance bound improves. Empirically, a team of three 4B agents (12B total) trained with SAT surpasses the much larger Qwen3-32B on AIME24/25 benchmarks by 3.9% on average. We validate our plug-and-play theory by swapping in two 8B agents, which boosts the composite score by 10.4%. Code and an appendix of proofs are available at https://github.com/Yydc/SAT-AAMAS

2605.05215 2026-05-08 cs.CV cs.AI cs.LG

Layout-Aware Representation Learning for Open-Set ID Fraud Discovery

面向布局的表示学习用于开放式ID欺诈发现

Jinxing Li, Nicholas Ren, Cathy Chang, Hongkai Pan, Daniel George

Affiliations WithPersona

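AI summary This paper proposes layout-aware representation learning for open-set ID fraud discovery. DINOv3 is adapted to the document domain through context-aware SimMIM fine-tuning and supervised metric learning with a composite loss. In experiments, the model reaches 99.83% layout classification accuracy on Canadian layouts and surfaces 276 adaptive physical-fraud cases.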

详情
AI中文摘要

身份-文档欺诈检测不是一个静态的二分类问题。适应性攻击者修改模板和伪造流程,使历史欺诈标签过时,并大规模成功伪造作为连贯的活动。因此,我们研究面向布局的表示学习用于开放式欺诈发现,而非仅闭合集分类。我们通过上下文感知的SimMIM微调和监督度量学习,将DINOv3适应到文档领域,并采用复合损失鼓励类间分离和类内紧凑。模型仅使用美国ID训练。通过轻量级MLP和softmax分类器,嵌入在加拿大布局上达到99.83%的布局分类准确率。此外,在包含20,448个加拿大ID的数据集上,嵌入空间分析揭示了276个适应性物理欺诈案例,包括222个未被现有检测器发现的案例。嵌入支持基于相似性的扩展,从单个确认种子扩展到与传统元数据图无关的相关案例。面向布局的文档嵌入为在分布偏移下发现新型和大规模活动欺诈提供了生产对齐的基础。

英文摘要

Identity-document fraud detection is not a stationary binary classification problem. Adaptive attackers modify templates and fabrication pipelines, making historical fraud labels stale, and successful forgeries recur at scale as coherent campaigns. We therefore study layout-aware representation learning for open-set fraud discovery rather than only closed-set classification. We adapt DINOv3 to the document domain via context-aware SimMIM fine-tuning and supervised metric learning with composite loss that encourages inter-class separability and intra-class compactness. The model is trained with U.S. IDs only. With a lightweight MLP and softmax classifier, the embedding achieves 99.83% layout classification accuracy on Canadian layouts. Moreover, on a dataset of 20,448 Canadian IDs, embedding-space analysis surfaces 276 adaptive physical-fraud cases, including 222 not surfaced by incumbent detectors. The embedding supports similarity-based expansion from a single confirmed seed to additional related cases not linked by conventional metadata graphs. The layout-aware document embeddings provide a production-aligned basis for discovering novel and campaign-scale fraud under distribution shift.

2605.05213 2026-05-08 cs.LG q-bio.QM

Nationwide EHR-Based Chronic Rhinosinusitis Prediction Using Demographic-Stratified Models

基于全国电子健康记录的慢性鼻窦炎预测:使用人口统计学分层模型

Sicong Chang, Yidan Shen, Justina Varghese, Akshay R Prabhakar, Sebastian Guadarrama-Sistos-Vazquez, Jiefu Chen, Masayoshi Takashima, Omar G. Ahmed, Renjie Hu, Xin Fu

Affiliations Electrical and Computer Engineering; University of Houston; Otolaryngology; Houston Methodist Hospital; Information Science Technology

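AI summary Using nationwide longitudinal electronic health record data, this paper predicts chronic rhinosinusitis with demographic-stratified models. Feature selection and stratified modeling improve predictive performance and could support earlier screening and referral prioritization.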

Comments Sicong Chang and Yidan Shen are co-first authors. This paper has been accepted to the IEEE Engineering in Medicine and Biology Society (EMBC) 2026 conference.

详情
AI中文摘要

慢性鼻窦炎(CRS)是一种常见的异质性炎症性疾病,导致显著的发病率和医疗成本。CRS在常规诊疗中难以早期识别,因为其症状表现与过敏性鼻炎等常见疾病重叠,且异质性表型进一步模糊了风险模式。以往的预测研究往往依赖于单一机构的队列,这限制了在人群层面的推广性。为克服这一问题,我们利用全国纵向电子健康记录数据(All of Us Research Program)来预测CRS诊断,使用两年的预诊断历史数据。为了解决编码电子健康记录数据中的极端特征稀疏性和维度问题,我们实施了一种混合特征选择流程,结合基于流行率的统计筛选与基于模型的重要性排名,将约110,000个候选代码压缩到100个可解释的特征中。为捕捉人口统计学异质性,我们跨六个成人性别和生命周期子组训练了人口统计学分层模型,并对每个子组进行了特定的超参数调整。我们的框架实现了整体AUC为0.8461,比最佳基线模型提高了0.0168的判别能力。这些结果表明,常规收集的电子健康记录数据可能支持具有代表性的CRS风险分层,并指导初级保健中的早期筛查和转诊优先级设定。

英文摘要

Chronic rhinosinusitis (CRS) is a common heterogeneous inflammatory disorder that causes substantial morbidity and healthcare costs. CRS is difficult to identify early from routine encounters, as symptom presentations overlap with common conditions such as allergic rhinitis, and heterogeneous phenotypes further obscure risk patterns. Prior predictive studies often rely on single-institutional cohorts, which reduce population-level generalizability. To overcome this, we leveraged nationwide longitudinal EHR data from the All of Us Research Program to predict CRS diagnosis using two years of pre-diagnostic history. To address extreme feature sparsity and dimensionality in coded EHR data, we implemented a hybrid feature-selection pipeline that combines prevalence-based statistical screening with model-based importance ranking, compressing approximately 110,000 candidate codes into 100 interpretable features. To capture demographic heterogeneity, we trained demographic-stratified models across six adult sex and life-stage subgroups with subgroup-specific hyperparameter tuning. Our framework achieved an overall AUC of 0.8461, improving discrimination by 0.0168 over the best baseline. These results demonstrate that routinely collected EHR data may support population-representative CRS risk stratification and inform earlier triage and referral prioritization in primary care.

2605.05209 2026-05-08 cs.LG cs.AI

Are Flat Minima an Illusion?

平坦极小值是否只是一个幻觉?

Michael Timothy Bennett

Affiliations School of Computing, The Australian National University

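AI summary This paper examines how flat minima relate to generalisation, argues that weakness rather than flatness is the key factor, and reports experiments in which weakness correlates positively with generalisation.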

详情
AI中文摘要

神经网络在损失景观的平坦区域更容易泛化。但函数保持的重参数化可以放大Hessian矩阵两到三个数量级而不改变预测。如果权重空间的几何结构可以凭空制造,它就不能成为任何原因。换句话说,平坦是简单的,而简单性取决于编码。本文证明弱性在交换需求下是最优的,并且PAC-Bayes界限有效是因为它们与弱性相关。在MNIST数据集上,大批次泛化优势随着训练数据增加而消失。一个预测能力依赖于数据量的量不是原因而是混杂因素。在100个相同架构和训练的网络上进行头对头测试,弱性预测泛化(ρ=+0.374,p=0.00012),而尖锐性反相关(ρ=-0.226)且简单性预测无(p=0.848)。在Fashion-MNIST上,尽管简单性在一定程度上具有预测性。简单性依赖于数据集,而弱性是不变的。平坦极小值从未是答案。

英文摘要

Neural networks that land in flat regions of the loss landscape tend to generalise better than those in sharp regions. Sharpness-Aware Minimisation exploits this to improve generalisation. But function-preserving reparameterisation can inflate the Hessian of any minimum by two orders of magnitude without changing a single prediction. If the geometry of weight space can be manufactured from nothing, it cannot be the cause of anything. In other words, flat is simple and simplicity depends on encoding. Here I show that the actual driver is weakness, the volume of completions compatible with the learned function in the learner's embodied language. Weakness is reparameterisation-invariant because it is defined over what the network does, not how it is parameterised. I prove weakness is minimax-optimal under exchangeable demands, and that PAC-Bayes bounds work because they correlate with it. On MNIST, the large-batch generalisation advantage vanishes as training data grows, from $+1.6\%$ at $n = 2{,}000$ to $+0.02\%$ at $n = 60{,}000$. A quantity whose predictive power depends on how much data you have is not a cause but a confounder. I run head-to-heads on 100 networks with identical architecture and training. For MNIST weakness predicts generalisation ($\rho = +0.374$, $p = 0.00012$), sharpness anticorrelates ($\rho = -0.226$) and simplicity predicts nothing ($p = 0.848$). For Fashion-MNIST weakness again predicts generalisation ($\rho = +0.384$, $p = 8.15 \times 10^{-5}$), though simplicity is at least somewhat predictive there. Simplicity is dataset dependent, whereas weakness is invariant. Flat minima were never the answer.
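The reparameterisation argument can be checked on a toy case. The sketch below assumes a two-weight linear model `f(x) = w2 * w1 * x` with squared-error loss and uses the Hessian trace as the sharpness measure. Both are choices made for illustration, not the paper's experimental setup.

```python
import math


def predict(w1, w2, x):
    """Two-weight linear model: f(x) = w2 * (w1 * x)."""
    return w2 * (w1 * x)


def hessian_trace(w1, w2, x):
    """Trace of the Hessian of L = (w1*w2*x - y)^2 with respect to (w1, w2).

    d2L/dw1^2 = 2 * (w2*x)^2 and d2L/dw2^2 = 2 * (w1*x)^2, independent of y.
    """
    return 2.0 * (w2 * x) ** 2 + 2.0 * (w1 * x) ** 2


def rescale(w1, w2, alpha):
    """Function-preserving reparameterisation: (w1, w2) -> (alpha*w1, w2/alpha)."""
    return alpha * w1, w2 / alpha


x = 2.0
w1, w2 = 1.0, 1.0                # fits the target y = 2.0 exactly
v1, v2 = rescale(w1, w2, 10.0)   # same function, different weights

print(predict(w1, w2, x), predict(v1, v2, x))
print(hessian_trace(w1, w2, x), hessian_trace(v1, v2, x))
```

Both parameterisations give the same prediction, yet the Hessian trace grows by roughly a factor of 50 for `alpha = 10`, and larger `alpha` inflates it further. Any sharpness measure built directly on this Hessian therefore depends on the parameterisation.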

2605.05208 2026-05-08 cs.RO cs.DC math.OC

A GPU-Accelerated Hybrid Method for a Class of Multi-Depot Vehicle Routing Problems

一种用于多仓库车辆路径问题类别的GPU加速混合方法

Zhenyu Lei, Jin-Kao Hao

Affiliations LERIA, Université d'Angers

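AI summary This paper proposes a hybrid algorithm that combines a learning-driven, diversity-controlled route-exchange crossover with a multi-depot-supported feasible-and-infeasible search framework guided by a multi-penalty evaluation function, improving solution efficiency and scalability for multi-depot problems.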

详情
AI中文摘要

多仓库车辆路径问题(MDVRPs)在各种实际应用中普遍存在,但因其内在复杂性而具有计算挑战性。本文提出了一种有效的混合算法,用于解决一类MDVRPs。该算法整合了学习驱动、多样性控制的路线交换交叉和多仓库支持的可行与不可行搜索框架,由多惩罚评估函数引导。两个专门的仓库相关局部搜索算子被纳入,以进一步增强多仓库设置中的搜索能力。为提高计算效率和可扩展性,开发了一种增强版本的算法,该算法使用基于张量的GPU加速结合新颖的多步更新策略。对三种MDVRP变种的基准实例进行了广泛的计算实验,结果表明所提出的算法在与最先进方法的竞争中表现优异,特别是对于大规模实例。

英文摘要

Multi-depot vehicle routing problems (MDVRPs) are prevalent in a variety of practical applications. However, they are computationally challenging to solve due to their inherent complexity. This paper proposes an effective hybrid algorithm for a class of MDVRPs. The algorithm integrates a learning-driven, diversity-controlled route-exchange crossover and a multi-depot-supported feasible-and-infeasible search framework guided by a multi-penalty evaluation function. Two dedicated depot-related local search operators are incorporated to further strengthen the search capability in multi-depot settings. To improve computational efficiency and scalability, an enhanced version of the algorithm is developed that uses a tensor-based GPU acceleration combined with a novel multi-move update strategy. Extensive computational experiments on benchmark instances of three MDVRP variants show that the proposed algorithms are highly competitive with state-of-the-art methods, especially for large-scale instances.

2605.05102 2026-05-08 cs.LG stat.ML

Unified Framework of Distributional Regret in Multi-Armed Bandits and Reinforcement Learning

多臂老虎机和强化学习中分布后悔的统一框架

Harin Lee, Min-hwan Oh

Affiliations University of Washington; Seoul National University

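AI summary This paper presents a unified framework for studying distributional regret in stochastic multi-armed bandits and episodic reinforcement learning. Using a simple UCBVI-style algorithm, it derives general distributional regret bounds that characterize the trade-off between expected performance, tail risk, and instance-dependent behavior.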

Comments Accepted at the Conference on Learning Theory (COLT) 2026

详情
AI中文摘要

我们通过统一框架研究随机多臂老虎机和回合制强化学习中的后悔分布。我们正式化分布后悔界为一个概率保证,适用于所有置信水平δ∈(0,1],从而表征δ范围内的后悔分布。我们提出一种简单的UCBVI型算法,具有探索奖励min{c₁,k/N, c₂,k/√N},其中N表示访问次数,(c₁,k, c₂,k)是用户指定的参数。对于任意参数序列,我们推导出通用的间隙无关和间隙依赖的分布后悔界,从而系统地表征参数如何控制期望性能、尾风险和实例依赖行为之间的权衡。特别是,我们的界在极小化和实例依赖情况下均实现了期望和分布后悔之间的最优权衡。作为特殊情况,对于具有A个臂和时间范围T的多臂老虎机,我们获得分布后悔界为O(√(AT)log(1/δ)),首次证实了Lattimore & Szepesvári (2020, 第17.1节)的猜想。

英文摘要

We study the distribution of regret in stochastic multi-armed bandits and episodic reinforcement learning through a unified framework. We formalize a distributional regret bound as a probabilistic guarantee that holds uniformly over all confidence levels $\delta \in (0,1]$, thereby characterizing the regret distribution across the full range of $\delta$. We present a simple UCBVI-style algorithm with exploration bonus $\min\{c_{1,k}/N, c_{2,k}/\sqrt{N}\}$, where $N$ denotes the visit count and $(c_{1,k}, c_{2,k})$ are user-specified parameters. For arbitrary parameter sequences, we derive general gap-independent and gap-dependent distributional regret bounds, yielding a principled characterization of how the parameters control the trade-off between expected performance, tail risk, and instance-dependent behavior. In particular, our bounds achieve optimal trade-offs between expected and distributional regret in both minimax and instance-dependent regimes. As a special case, for multi-armed bandits with $A$ arms and horizon $T$, we obtain a distributional regret bound of order $\mathcal{O}(\sqrt{AT}\log(1/\delta))$, confirming the conjecture of Lattimore & Szepesvári (2020, Section 17.1) for the first time.
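The exploration bonus quoted in the abstract is simple enough to write out. The sketch below computes only the bonus term for given parameters. The surrounding UCBVI-style algorithm and the choice of the parameter sequences are not reproduced, and the function name is hypothetical.

```python
import math


def exploration_bonus(n_visits, c1, c2):
    """Bonus min{c1 / N, c2 / sqrt(N)} for a state-action pair visited N >= 1 times."""
    if n_visits < 1:
        raise ValueError("n_visits must be at least 1")
    return min(c1 / n_visits, c2 / math.sqrt(n_visits))


# With c1 = 8 and c2 = 1 the two terms cross at N = (c1 / c2)^2 = 64.
for n in (4, 64, 100):
    print(n, exploration_bonus(n, c1=8.0, c2=1.0))
```

For small visit counts the `c2 / sqrt(N)` term is the smaller one and sets the bonus. Once `N` exceeds `(c1 / c2)^2`, the faster-decaying `c1 / N` term takes over, so the ratio of the two parameters decides where the bonus switches regimes.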

2605.04830 2026-05-08 cs.LG cond-mat.stat-mech

Concurrence of Symmetry Breaking and Nonlocality Phase Transitions in Diffusion Models

对称破缺与非局域性相变在扩散模型中的共存

Yifan F. Zhang, Fangjun Hu, Guangkuo Liu, Mert Okyay, Xun Gao

Affiliations QuEra Computing Inc.

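AI summary This paper studies whether the symmetry-breaking and nonlocality phase transitions are concurrent in modern diffusion transformers. By analyzing the dynamics and outcomes of generation trajectories, it finds that the two occur nearly simultaneously, unifying the two notions of phase transition in practice and providing a basis for evaluating model efficiency and guiding architecture design.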

Comments 20 pages, 10 figures. Comments are welcome.

详情
AI中文摘要

扩散模型在生成动态中经历相变,有两个互补的临界性诊断。对称破缺观点认为临界窗口是轨迹分裂为不同语义能量景观最小值的时刻,而非局域性观点认为临界窗口是局部去噪失效的时刻。我们研究现代扩散变换器中两种相变概念是否共存。通过评估生成轨迹的动态和结果,我们观察到非局域性和对称破缺临界时间几乎同时发生。我们的工作首次在实践中统一两种相变概念:它提供了具体的诊断方法,说明扩散模型何时以及为何依赖于条件和全局去噪,从而为模型效率的原理性评估提供依据,并指导架构和采样方案的设计,以避免不必要的计算。

英文摘要

Diffusion models undergo a phase transition in a critical time window during generation dynamics, with two complementary diagnoses of criticality. The symmetry breaking picture views the critical window as when trajectories bifurcate into different semantic minima of the energy landscape, whereas the nonlocality picture views the critical window as when local denoising fails. We study whether two notions of such phase transitions are concurrent in modern diffusion transformers. By evaluating the dynamics and outcomes of the generation trajectory, we observe a near-simultaneous occurrence of the non-locality and symmetry breaking critical times. Our work is the first to unify the two notions of phase transitions in practice: it provides a concrete diagnostic for when and why diffusion models rely on conditioning and global denoising, enabling principled evaluation of model efficiency and guiding the design of architectures and sampling schemes that avoid unnecessary computation.

2605.04719 2026-05-08 cs.CL

Every Step Counts: Step-Level Credit Assignment for Tool-Integrated Text-to-SQL

每一步都重要:面向工具集成文本到SQL的步骤级信用分配

Yaxun Dai, Baolin Sun, Junying Wang, Pengfei Wang, Yingqi Gao, Xuemei Dong, Mengdie Chu, Xiang Qi, Pingfu Chao

Affiliations Institute of Computer Science and Technology, Soochow University, China; Ant Digital Technologies, Ant Group; School of Management, University of Science and Technology of China

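AI summary This paper proposes FineStep, a framework that uses independent process rewards and a step-level credit assignment mechanism to address the credit assignment problem caused by coarse-grained outcome supervision in Text-to-SQL, improving model efficiency and generalization.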

详情
AI中文摘要

工具集成的文本到SQL解析已发展为有前景的范式,将SQL生成视为交织工具执行的序列决策过程。然而,现有强化学习方法主要依赖粗粒度结果监督,导致根本性的信用分配问题:模型对产生正确答案的任何轨迹给予相同奖励,即使中间步骤冗余、低效或错误。因此,模型被鼓励探索次优推理空间,限制了效率和泛化能力。为解决此问题,我们提出FineStep,一种用于工具增强文本到SQL的步骤级信用分配新框架。首先,我们引入独立过程奖励设计以缓解结果监督的信号稀疏性。其次,我们提出步骤级信用分配机制以精确量化每个推理步骤的价值。最后,我们开发基于步骤级优势的策略优化方法以实现高效更新。在BIRD基准测试中,广泛实验表明FineStep实现了最先进的性能,并减少了冗余工具交互,在4B规模下比GRPO平均提高3.25%的EX。

英文摘要

Tool-integrated Text-to-SQL parsing has emerged as a promising paradigm, framing SQL generation as a sequential decision-making process interleaved with tool execution. However, existing reinforcement learning approaches mainly rely on coarse-grained outcome supervision, resulting in a fundamental credit assignment problem: models receive the same reward for any trajectory that yields the correct answer, even when intermediate steps are redundant, inefficient, or erroneous. Consequently, models are encouraged to explore suboptimal reasoning spaces, limiting both efficiency and generalization. To address this problem, we propose FineStep, a novel framework for step-level credit assignment in tool-augmented Text-to-SQL. First, we introduce a reward design with independent process rewards to alleviate the signal sparsity of outcome supervision. Next, we present a step-level credit assignment mechanism to precisely quantify the value of each reasoning step. Finally, we develop a policy optimization method based on step-level advantages for efficient updates. Extensive experiments on BIRD benchmarks show that FineStep achieves state-of-the-art performance and reduces redundant tool interactions, with a 3.25% average EX gain over GRPO at the 4B scale.

2605.04412 2026-05-08 cs.CV

Structured 3D Latents Are Surprisingly Powerful: Unleashing Generalizable Style with 2D Diffusion

结构化3D潜在空间出人意料地强大:通过2D扩散模型释放可泛化的风格

Yiran Qiao, Yiren Lu, Yunlai Zhou, Disheng Liu, Linlin Hou, Rui Yang, Yu Yin, Jing Ma

Affiliations Case Western Reserve University

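AI summary This paper proposes DiLAST, which uses a pretrained 2D diffusion model as a source of generalizable style priors. By aligning rendered views with the target style under diffusion guidance, it optimizes structured 3D latent representations and handles out-of-distribution styles effectively.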

详情
AI中文摘要

3D资产生成在游戏和虚拟现实等领域具有关键作用,能够从单张或多张图像快速合成高保真3D对象。基于此能力,使生成过程具备风格可控性自然成为重要方向。然而,现有方法通常依赖于与3D生成模型训练分布相似的风格图像,当面对Out-of-distribution(OOD)风格时,性能显著下降甚至失效。为解决这一限制,我们引入DiLAST:基于2D扩散的潜在觉醒方法,用于3D风格迁移。具体而言,我们利用预训练的2D扩散模型作为教师,提供丰富且可泛化的风格先验。通过扩散引导对齐渲染视角与目标风格,我们的方法优化结构化3D潜在表示以实现风格化。我们发现这一限制并非源于模型容量不足,而是由于结构化3D潜在的未充分利用,其本质上具有表现力。尽管3D生成模型训练数据相对有限,但可以利用2D扩散引导将去噪推向潜在空间特定方向,从而生成多样化的OOD风格。在多种数据集和多个3D生成后端上进行的广泛实验证明了该方法的有效性和即插即用性质。

英文摘要

3D asset generation plays a pivotal role in fields such as gaming and virtual reality, enabling the rapid synthesis of high-fidelity 3D objects from a single or multiple images. Building on this capability, enabling style-controllable generation naturally emerges as an important and desirable direction. However, existing approaches typically rely on style images that lie within or are similar to the training distribution of 3D generation models. When presented with out-of-distribution (OOD) styles, their performance degrades significantly or even fails. To address this limitation, we introduce DiLAST: 2D Diffusion-based Latent Awakening for 3D Style Transfer. Specifically, we leverage a pretrained 2D diffusion model as a teacher to provide rich and generalizable style priors. By aligning rendered views with the target style under diffusion-based guidance, our method optimizes the structured 3D latent representations for stylization. We observe that this limitation stems not from insufficient model capacity, but from the underutilization of structured 3D latents, which are inherently expressive. Despite being trained on comparatively limited data, 3D generation models can leverage 2D diffusion guidance to steer denoising toward specific directions in latent space, thereby producing diverse, OOD styles. Extensive experiments across diverse data and multiple 3D generation backbones demonstrate the effectiveness and plug-and-play nature of our approach.

2605.04282 2026-05-08 cs.LG

Hardware-Aware Neural Feature Extraction for Resource-Constrained Devices

面向资源受限设备的神经特征提取:考虑硬件的神经特征提取

Francesco Tosini, Simone Pedroni, Christian Veronesi, Pietro Bartoli, Andrea Giudici, Marco Paracchini, Marco Marcon, Diana Trojaniello

Affiliations Department of Electronics, Information and Bioengineering (DEIB), Politecnico di Milano; Smart Eyewear Lab, EssilorLuxottica

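AI summary This paper proposes Gideon, a neural feature extractor for resource-constrained devices. It combines relational knowledge distillation with differentiable neural architecture search under memory and operator constraints, improves INT8 robustness and quantization resilience, and enables efficient deployment.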

Comments This paper has been accepted for publication at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026. © IEEE

详情
AI中文摘要

视觉SLAM是空间计算系统的核心组件,但将其部署到微控制器类硬件上仍具挑战性,由于内存、带宽和量化限制。尽管现代神经描述符具有强鲁棒性,但其实际应用常受系统级瓶颈限制,这些瓶颈未被基于FLOP的效率指标所捕获。本文引入Gideon,一种专门针对资源受限设备设计的硬件感知神经特征提取器。我们的方法结合了从SuperPoint教师模型中获得的关系知识蒸馏与在严格内存和运算约束下进行的可微神经架构搜索(DNAS)。与传统设计流程不同,我们将量化稳定性与动态范围紧凑性作为首要目标。我们证明,诸如将批量归一化替换为仿射层等架构选择显著提高了INT8鲁棒性,且描述符维度直接决定了量化抗性。在STM32N6上部署时,Gideon实现了9.003 ms的推理时间(111 fps),同时保持在1.5 MB以下的内存占用。值得注意的是,INT8量化导致的降级可忽略不计,偶尔甚至达到全精度性能。这些结果表明,通过整体硬件-算法协同设计,可以将鲁棒的学得特征提取与嵌入式硬件限制相结合。

英文摘要

Visual SLAM is a core component of spatial computing systems, yet deploying learned local feature extractors on microcontroller-class hardware remains challenging due to memory, bandwidth, and quantization constraints. While modern neural descriptors provide strong robustness, their practical adoption is often hindered by system-level bottlenecks that are not captured by FLOP-based efficiency metrics. In this work, we introduce Gideon, a hardware-aware neural feature extractor explicitly designed for resource-constrained devices. Our approach combines relational knowledge distillation from a SuperPoint teacher with differentiable neural architecture search (DNAS) under strict memory and operator constraints. Unlike conventional design pipelines, we treat quantization stability and dynamic-range compactness as first-class objectives. We show that architectural choices such as replacing Batch Normalization with affine layers significantly improve INT8 robustness, and that descriptor dimensionality directly governs quantization resilience. Deployed on STM32N6, Gideon achieves 9.003 ms inference time (111 fps) while remaining below a 1.5 MB memory footprint. Remarkably, INT8 quantization induces negligible degradation and occasionally matches full-precision performance. These results demonstrate that robust learned feature extraction can be reconciled with embedded hardware constraints through holistic hardware-algorithm co-design.
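Why dynamic-range compactness matters for INT8 can be seen with a generic per-tensor symmetric quantizer. The sketch below is an assumption-laden illustration, not Gideon's actual quantization pipeline: it quantizes a list of floats to the range [-127, 127] with a single scale and reports the worst-case round-trip error.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: one scale maps max |v| to 127."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map integer codes back to floats."""
    return [q * scale for q in quantized]


def max_roundtrip_error(values):
    """Largest absolute error after quantizing and dequantizing."""
    quantized, scale = quantize_int8(values)
    restored = dequantize(quantized, scale)
    return max(abs(v - r) for v, r in zip(values, restored)), scale


compact = [i / 10.0 - 1.0 for i in range(21)]   # values in [-1, 1]
with_outlier = compact + [50.0]                 # one large activation

print(max_roundtrip_error(compact))
print(max_roundtrip_error(with_outlier))
```

A single outlier stretches the scale, so every small value is represented with a much coarser step. Keeping activations in a compact range is one plausible reason an architecture can lose little accuracy under INT8.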

2605.04066 2026-05-08 cs.CL cs.ET cs.LG

Adapt to Thrive! Adaptive Power-Mean Policy Optimization for Improved LLM Reasoning

适应与繁荣!面向改进大语言模型推理的自适应幂均策略优化

Yiming Huang, Zhenbo Shi, Shuzheng Gao, Cuiyun Gao, Peiyi Han, Chuanyi Liu

Affiliations Harbin Institute of Technology, Shenzhen; Peng Cheng Laboratory; The Chinese University of Hong Kong

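AI summary This paper proposes Adaptive Power-Mean Policy Optimization, which introduces a power-mean objective and feedback-adaptive clipping to improve the reasoning performance of large language models. Experiments show that it outperforms existing methods across multiple tasks.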

Comments Accepted to ACL 2026 (Findings)

详情
AI中文摘要

可验证奖励的强化学习(RLVR)是增强大语言模型(LLMs)推理能力的重要范式。然而,现有方法通常依赖静态策略优化方案,与模型不断演变的推理能力不匹配。为了解决这一问题,我们提出了自适应幂均策略优化(APMPO),其包含两个主要创新:幂均策略优化(PMPO)和反馈自适应裁剪(FAC)。具体而言,PMPO引入了一个广义的幂均目标,使模型能够自适应地从算术均值的信号放大行为过渡到几何均值的一致性强制行为。FAC根据实时奖励统计自适应调整裁剪界限,以克服静态机制的限制。借助这些创新,APMPO提升了学习动态和推理性能。在三个推理任务的九个数据集上的广泛实验显示,APMPO在数学推理基准上的平均Pass@1分数比GRPO提高了3.0分(使用Qwen2.5-3B-Instruct)。

英文摘要

Reinforcement Learning with Verifiable Rewards (RLVR) is an essential paradigm that enhances the reasoning capabilities of Large Language Models (LLMs). However, existing methods typically rely on static policy optimization schemes that misalign with the model's evolving reasoning capabilities. To address this issue, we propose Adaptive Power-Mean Policy Optimization (APMPO), which comprises two main innovations: Power-Mean Policy Optimization (PMPO) and Feedback-Adaptive Clipping (FAC). Specifically, PMPO introduces a generalized power-mean objective. This enables the model to adaptively transition from the signal-amplifying behavior of the arithmetic mean to the consistency-enforcing behavior of the geometric mean. FAC adaptively adjusts clipping bounds based on real-time reward statistics to overcome the limitations of static mechanisms. Capitalizing on these innovations, APMPO improves learning dynamics and reasoning performance. Extensive experiments on nine datasets across three reasoning tasks showcase the superiority of APMPO over state-of-the-art RLVR-based baselines. For instance, APMPO boosts the average Pass@1 score on mathematical reasoning benchmarks by 3.0 points compared to GRPO when using Qwen2.5-3B-Instruct.
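The transition between arithmetic and geometric means mentioned above follows from the standard generalized power mean. The sketch below implements only that textbook quantity for positive inputs. How APMPO applies it inside the policy objective and how the exponent is adapted are not specified in the abstract, so neither is modeled here.

```python
import math


def power_mean(values, p):
    """Generalized power mean M_p of positive values.

    p = 1 gives the arithmetic mean, p -> 0 the geometric mean,
    and p = -1 the harmonic mean.
    """
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be non-empty and strictly positive")
    n = len(values)
    if abs(p) < 1e-12:
        # Limit p -> 0: exponential of the mean log.
        return math.exp(sum(math.log(v) for v in values) / n)
    return (sum(v ** p for v in values) / n) ** (1.0 / p)


ratios = [1.0, 4.0]
for p in (1.0, 0.5, 0.0, -1.0):
    print(p, power_mean(ratios, p))
```

The mean decreases as `p` decreases. At `p = 1` the large value dominates, which amplifies strong signals, while near `p = 0` a single small value pulls the result down, which rewards consistency across terms.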

2605.04065 2026-05-08 cs.CL cs.ET cs.LG

Free Energy-Driven Reinforcement Learning with Adaptive Advantage Shaping for Unsupervised Reasoning in LLMs

基于自发自由能的强化学习与自适应优势塑造的无监督推理在大语言模型中的应用

Yiming Huang, Zhenbo Shi, Xin-Cheng Wen, Jichuan Zeng, Cuiyun Gao, Peiyi Han, Chuanyi Liu

Affiliations Harbin Institute of Technology, Shenzhen; Peng Cheng Laboratory; The Chinese University of Hong Kong

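AI summary This paper proposes FREIA, an algorithm that uses a Free Energy-Driven Reward and Adaptive Advantage Shaping to improve unsupervised reasoning in large language models, and reports that it outperforms other unsupervised RL baselines on three reasoning tasks.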

Comments Accepted by ACL 2026

详情
AI中文摘要

无监督强化学习(RL)已成为使大语言模型(LLMs)自我改进的有前途的范式。然而,现有无监督RL方法在训练过程中难以适应模型推理能力的演变。为此,我们引入FREIA,一种基于两个关键创新的新型RL算法:(1)自发自由能奖励(FER)根据自由能原理平衡共识和探索;(2)自适应优势塑造(AAS)根据采样奖励的统计特性自适应调整学习信号。在九个数据集上的实验证明,FREIA在三个推理任务中优于其他无监督RL基线。值得注意的是,在数学推理任务中,FREIA在使用DeepSeek-R1-Distill-Qwen-1.5B模型时,Pass@1指标平均高出0.5至3.5分。

英文摘要

Unsupervised reinforcement learning (RL) has emerged as a promising paradigm for enabling self-improvement in large language models (LLMs). However, existing unsupervised RL-based methods often lack the capacity to adapt to the model's evolving reasoning capabilities during training. Therefore, these methods can misdirect policy optimization in the absence of ground-truth supervision. To address this issue, we introduce FREIA, a novel RL-based algorithm built on two key innovations: (1) Free Energy-Driven Reward (FER) adapts rewards to balance consensus and exploration based on the Free Energy Principle. (2) Adaptive Advantage Shaping (AAS) adaptively adjusts learning signals based on the statistical characteristics of sampled rewards. Empirical evaluations on nine datasets across three reasoning tasks showcase that FREIA outperforms other unsupervised RL-based baselines. Notably, in mathematical reasoning tasks, FREIA surpasses other methods by an average of 0.5 to 3.5 points in Pass@1 using the DeepSeek-R1-Distill-Qwen-1.5B model.

2605.04057 2026-05-08 cs.LG cs.AI

Structured Progressive Knowledge Activation for LLM-Driven Neural Architecture Search

结构化渐进知识激活用于LLM驱动的神经架构搜索

Zhen Liu, Yuhan Liu, Jinjun Wang, Wei Song, Jianyi Liu, Jingwen Fu

Affiliations State Key Laboratory of Human-Machine Hybrid Augmented Intelligence, Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University; MiLM Plus, Xiaomi Inc.; North China University of Technology, Beijing; Zhongguancun Academy, Beijing, China

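AI summary This paper proposes SPARK, which explicitly selects the functional factor to modify and conditions the edit on that factor. This reduces the side effects of functional entanglement and improves the sample efficiency and OOD accuracy of neural architecture search.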

详情
AI中文摘要

本文聚焦神经架构搜索(NAS)中的关键挑战:在昂贵评估下整合已知架构知识并探索新设计。大语言模型(LLMs)作为NAS的有力助手,能将丰富的架构和编码先验转化为可执行代码修改。然而,实践中看似局部的修改常导致非局部行为和性能变化,因单个修改可能意外耦合多个交互功能因素,即功能耦合。为使LLM知识在耦合下有效,我们提出结构化渐进知识激活(SPARK),通过显式选择功能因素并条件化编辑,减少耦合副作用,产生更精确可靠的架构修改。在CLRS-DFS上,SPARK实现了28.1倍的样本高效架构进化加速,并在OOD准确性上获得22.9%的相对提升。

英文摘要

This paper focuses on a key challenge in Neural Architecture Search (NAS): integrating established architectural knowledge while exploring new designs under expensive evaluations. Large language models (LLMs) are a promising assistant for NAS because they can translate rich architectural and coding priors into executable code edits. However, in practice, seemingly local revisions often propagate into non-local behavioral and performance shifts because a single edit can inadvertently couple multiple interacting functional factors, a phenomenon we refer to as functional entanglement. To make LLM knowledge usable under such entanglement, we propose Structured Progressive Knowledge Activation (SPARK), which activates relevant priors by explicitly selecting the functional factor to modify and conditioning the edit on that factor. This factor-conditioned editing reduces entangled side effects and yields more targeted, reliable architecture modifications. On CLRS-DFS, SPARK achieves a 28.1x sample-efficient architecture evolution speedup and yields a 22.9 percent relative improvement in OOD accuracy.

2605.03989 2026-05-08 cs.AI

An Agent-Oriented Pluggable Experience-RAG Skill for Experience-Driven Retrieval Strategy Orchestration

面向经验的可插拔经验型RAG技能:用于经验驱动的检索策略编排

Dutao Zhang, Tian Liao

Affiliations Macao Polytechnic University

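AI summary This paper proposes Experience-RAG Skill, which analyzes the current scene and consults an experience memory to select a suitable retrieval strategy. On several BeIR datasets it outperforms fixed single-retriever baselines.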

Comments Preprint. 6 pages, 1 figure, 3 tables

详情
AI中文摘要

检索增强生成系统通常假设固定检索流程适用于异构任务,但事实性问答、多跳推理和科学验证显示不同的检索偏好。我们提出了经验型RAG技能,作为代理与检索池之间的可插拔检索编排层。该技能分析当前场景,咨询经验记忆,选择合适检索策略,并将结构化证据返回给代理。在固定候选池下,经验型RAG技能在BeIR/nq、BeIR/hotpotqa和BeIR/scifact数据集上实现了整体nDCG@10为0.8924,优于固定单检索器基线并在与Adaptive-RAG风格路由竞争中保持竞争力。结果表明,检索策略选择可以作为可重用的代理技能封装,而非硬编码在上层工作流中。

英文摘要

Retrieval-augmented generation systems often assume that one fixed retrieval pipeline is sufficient across heterogeneous tasks, yet factoid question answering, multi-hop reasoning, and scientific verification exhibit different retrieval preferences. We present Experience-RAG Skill, an agent-oriented pluggable retrieval orchestration layer positioned between the agent and the retriever pool. The proposed skill analyzes the current scene, consults an experience memory, selects an appropriate retrieval strategy, and returns structured evidence to the agent. Under a fixed candidate pool, Experience-RAG Skill achieves an overall nDCG@10 of 0.8924 on BeIR/nq, BeIR/hotpotqa, and BeIR/scifact, outperforming fixed single-retriever baselines and remaining competitive with Adaptive-RAG-style routing. The results suggest that retrieval strategy selection can be productively encapsulated as a reusable agent skill rather than being hard-coded in the upper workflow.
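For readers unfamiliar with the reported metric, nDCG@10 can be sketched as follows. This version uses linear gain and builds the ideal ranking from the graded relevances of the returned list itself. Both are simplifying assumptions, and benchmark tooling may compute the ideal ranking from all judged documents instead.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k results (linear gain)."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """DCG normalized by the DCG of the same items in ideal order."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0:
        return 0.0
    return dcg_at_k(relevances, k) / ideal


# Relevance grades of retrieved documents, in ranked order.
print(ndcg_at_k([3, 2, 1, 0]))   # already in ideal order
print(ndcg_at_k([0, 1, 3]))      # best document ranked last
```

A retrieval strategy that places highly relevant documents earlier scores closer to 1, which is the quantity a strategy selector would try to increase for each query type.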