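arXivDaily: a daily academic digest synced with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and other fields.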

2606.04157 2026-06-04 cs.RO

Selecting haptic guidance models in teleoperation: guidelines from a comparative user study

Alexis Boulay, Margot Vulliez, David Daney

Affiliations: Farm3, Besançon, France; Auctus Team, Inria, Talence, France

AI summary: Compares three haptic guidance models (spring-damper, potential field, and guiding tube) in a user study and proposes model selection guidelines based on environmental characteristics and real-time evaluation metrics.

Comments EUROHAPTICS 2026 - EuroHaptics International Conference, Jul 2026, Siena, Italy

Abstract

Haptic guidance in teleoperation enhances operator performance through force feedback. This paper presents guidelines for selecting the most appropriate model given the task, the environment, and the operator. We define a unified formulation expressing the most common models (spring-damper, potential field, and guiding tube) as variations of a stiffness-damping system with model-specific guiding functions. We conducted a user study comparing the three classical models across six scenarios with varying environmental conditions in a vertical farming task. Results show no universally superior model: spring-damper excels in cluttered environments, potential field excels in free space but poses risks near obstacles, and guiding tube offers a balanced compromise. We propose novel objective metrics to evaluate the interaction, and show that guiding force magnitude correlates with comfort and trust scores. These findings provide practical model selection guidelines through environmental characteristics and real-time evaluation metrics.
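Illustrative sketch (not from the paper): the abstract does not give the unified formulation, so the following only shows one way a stiffness-damping force with a swappable guiding function could look in one dimension. The function names, the gains, the tube radius, and the form of each guiding function are assumptions; the potential-field variant is left out because its form is not stated.

```python
def spring_guide(error):
    """Assumed spring-damper guiding function: pull proportional to the error."""
    return error


def tube_guide(error, radius=0.05):
    """Assumed guiding tube: no pull inside the tube, spring-like pull outside it."""
    if abs(error) <= radius:
        return 0.0
    return error - radius if error > 0 else error + radius


def guidance_force(error, velocity, guide, stiffness=200.0, damping=5.0):
    """Force = stiffness * guide(error) - damping * velocity.

    error is (reference position - tool position), so a positive force
    pushes the tool toward the reference. All gains are hypothetical.
    """
    return stiffness * guide(error) - damping * velocity
```

Under these assumptions, swapping `guide` is the only change needed to move between models, which is the kind of comparison the unified formulation is meant to support.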

2606.04152 2026-06-04 cs.AI cs.CY

Thinking Through Signs: PEEL as a Semiotic Scaffolding for Epistemically Accountable AI-Enabled Research

Clarisse de Souza, Gabriel Barbosa, Simone Diniz Junqueira Barbosa, Bárbara Betts, Renato Cerqueira, Juliana Jansen Ferreira

Affiliations: PUC-Rio; PUC-Behring Institute of Artificial Intelligence

AI summary: Proposes the PEEL framework, which combines deterministic distant reading with Voyant Tools and LLM interpretation with Claude. Grounded in Peircean semiotics and abductive reasoning, it exposes systematic distortions in AI-generated summaries and derives three design implications.

Comments 10 pages, 5 figures

2606.04150 2026-06-04 cs.AI cs.HC

Stumbling Into AI Emotional Dependence: How Routine AI Interactions Reshape Human Connection

Yaoxi Shi, Cathy Mengying Fang, Pattie Maes, Amit Goldenberg

Affiliations: Imperial College Business School; Harvard Business School AI Institute; MIT Media Lab; Harvard Business School; Harvard Department of Psychology

AI summary: Drawing on empirical evidence, the paper shows that AI emotional support usually arises incidentally in everyday task-oriented interactions, and that this path dependence changes people's beliefs about AI's emotional capabilities, increasing preference for AI and decreasing preference for humans.

Abstract

Public discourse and emerging policy typically assume that AI emotional support is a deliberate act: a lonely user consciously seeking comfort from a dedicated companion chatbot. In this paper, we draw on emerging empirical evidence and argue that this picture is inaccurate on two accounts, both in how AI emotional support arises and how it shapes future behavior. First, AI emotional support commonly emerges incidentally within task-oriented interactions on general-purpose platforms, much as workplace friendships deepen through collaboration. Second, these incidental encounters are path-dependent: positive experiences of AI emotional support update people's beliefs about AI's emotional capabilities and redirect their choices for future emotional support, increasing preference for AI and decreasing preference for humans. We review recent evidence, including a large-scale longitudinal study conducted in collaboration with OpenAI, showing that daily five-minute conversations with an AI about personal issues over 28 days led to a 10.3% decrease in the preference for seeking support from humans and an 11.6% increase in the preference for AI. These findings suggest that current policy, focused on companion apps and isolated interactions, cannot adequately protect human connection. Instead, effective regulations should extend to general-purpose AI systems and address cumulative, trajectory-level changes in how people seek support. Recognizing how people stumble into AI emotional support and how those encounters redirect human connections over time is essential to safeguarding human well-being.

2606.04149 2026-06-04 cs.RO

CoPark: Learning Reactive Parking via Self-Play

Jiarong Wei, Yanxing Chen, Sinuo Song, Yin Wu, Anna Rehr, Abhinav Valada

Affiliations: Department of Computer Science, University of Freiburg; CARIAD SE; Technical University of Munich

AI summary: Proposes CoPark, a multi-agent self-play reinforcement learning method built on a residual policy. By combining a fixed prior with a residual head, it balances high precision and safe interaction in reactive parking and substantially outperforms baselines.

Abstract

Learning a single policy that reaches a goal with high geometric precision while interacting safely with nearby agents poses conflicting objectives. Precision favors commitment to a fixed geometric plan, whereas interaction requires immediate deviation when another agent intrudes, causing policies optimized for one objective to often fail at the other. We study this problem in the context of reactive autonomous parking, where multiple vehicles must reach assigned slots with sub-meter terminal accuracy while remaining responsive to neighboring vehicles throughout the maneuver. We propose CoPark, a multi-agent self-play RL approach built on a residual-policy architecture. A precomputed offline plan provides a fixed action prior, while a residual head learns the reactive corrections. The residual policy learns behaviors under self-play, where data and scripting fall short, while the fixed prior holds the slot-frame geometry that pure policies struggle to reach reliably. The key design is a partner-threat-modulated, channel-asymmetric release of the prior. A continuous threat signal shifts authority of the longitudinal channel to the residual head to enable yielding, while the lateral channel remains anchored to the precomputed reference to preserve sub-meter slot alignment. A closed-loop refinement layer corrects residual terminal error from action-grid discretization. We train our policy on six parking lots and evaluate zero-shot on our new reactive-parking benchmark spanning Dragon Lake Parking (DLP) and DeepScenario Open 3D (DSC3D). CoPark achieves ~70-85% success with only 3-6% collision rate, substantially outperforming classical, imitation-learning, and large-scale RL baselines. Importantly, the results demonstrate emergent interaction behaviors such as reverse-yielding, mid-maneuver yielding, tight-corridor passing, and queuing.
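Illustrative sketch (not from the paper): the channel-asymmetric release described above can be pictured as a blend in which only the longitudinal channel hands authority to the residual head. The convex blend, the clamping of the threat signal to [0, 1], and the name `blend_action` are assumptions; the abstract does not specify the actual rule.

```python
def blend_action(prior, residual, threat):
    """Blend a planned action with a learned correction.

    prior and residual are (longitudinal, lateral) pairs; threat is a
    continuous signal, clamped here to [0, 1] (an assumption).
    """
    t = min(1.0, max(0.0, threat))
    # Longitudinal channel: authority shifts to the residual head as threat grows.
    longitudinal = (1.0 - t) * prior[0] + t * residual[0]
    # Lateral channel: stays anchored to the precomputed reference.
    lateral = prior[1]
    return longitudinal, lateral
```

With no threat the planned action passes through unchanged; at full threat the residual head alone decides speed (for example, braking to yield) while steering still follows the plan.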

2606.04143 2026-06-04 cs.LG cs.AI

Physics-Informed Machine Learning for Short-Term Flood Prediction

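Tewodros Syum Gebre, Jagrati Talreja, Leila Hashemi-Beni

Affiliations: IEEE Service Center; National Science Foundation; Microsoft

AI summary: Proposes a physics-informed machine learning framework that embeds hydrological knowledge into the LSTM loss function as a trend-alignment constraint, improving the physical consistency and reliability of flood prediction under data scarcity and extreme weather.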

Comments This paper has been accepted for publication in IGARSS 2026. The final authenticated version will be available through IEEE Xplore

Abstract

Accurate flood forecasting is essential for mitigating disaster risks and protecting communities. However, purely data-driven machine learning models often struggle in data-scarce environments and may violate fundamental hydrological principles. Standard Long Short-Term Memory (LSTM) networks can generate physically inconsistent predictions, particularly when extrapolating to extreme weather conditions. To address these limitations, we propose a Physics-Informed Machine Learning (PIML) framework that incorporates hydrological knowledge directly into the loss function of an LSTM model. Specifically, a Trend Alignment constraint penalizes directional inconsistencies between precipitation and discharge trends, improving model robustness without requiring complex hydrodynamic equations. This regularization encourages the model to learn physically plausible hydrograph behavior, even with limited training data, while enhancing reliability during peak flood events. Experimental results show that the proposed physics-informed model outperforms a standard LSTM baseline in data-scarce settings, increasing the Nash-Sutcliffe Efficiency (NSE) from 0.20 to 0.23 when trained on only 5% of the available data. Additional stress tests under simulated extreme climate scenarios demonstrate that the baseline model exhibits unstable behavior, whereas the physics-informed model maintains directional consistency and physical plausibility. Although accurately predicting extreme peak magnitudes remains challenging with limited data, the proposed approach substantially reduces unphysical fluctuations common in purely data-driven models. These findings demonstrate that simple physical constraints can significantly improve the reliability of deep learning models for real-time flood forecasting, offering a practical solution for ungauged basins and evolving climate conditions.
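Illustrative sketch (not from the paper): the abstract does not give the exact form of the Trend Alignment constraint. The code below pairs the standard Nash-Sutcliffe Efficiency definition with one possible penalty, a hinge on the product of the step-to-step changes in precipitation and predicted discharge. The penalty form, the weight `lam`, the absence of any lag between rainfall and discharge, and all function names are assumptions.

```python
def nse(observed, simulated):
    """Nash-Sutcliffe Efficiency: 1 - (squared error) / (spread of observations about their mean)."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / spread


def trend_penalty(precip, predicted):
    """Hypothetical penalty: positive only when rainfall and predicted discharge move in opposite directions."""
    total = 0.0
    for t in range(1, len(predicted)):
        dp = precip[t] - precip[t - 1]
        dq = predicted[t] - predicted[t - 1]
        total += max(0.0, -dp * dq)
    return total / (len(predicted) - 1)


def physics_informed_loss(observed, predicted, precip, lam=0.1):
    """Mean squared error plus the weighted trend penalty (lam is a hypothetical weight)."""
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    return mse + lam * trend_penalty(precip, predicted)
```

An NSE of 1 means a perfect fit and 0 means the model is no better than predicting the observed mean, which puts the reported move from 0.20 to 0.23 in context.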

2606.04135 2026-06-04 cs.LG

Stationarity-Aware Retrieval-Augmented Time Series Forecasting

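Shiqiao Zhou, Holger Schöner, Zipeng Wu, Edouard Fouché, IAG Wilson, Shuo Wang

Affiliations: University of Birmingham; Siemens AG

AI summary: Proposes the SARAF framework, which adaptively balances retrieval relevance and diversity and uses stationarity-aware aggregation to improve the accuracy and robustness of non-stationary time series forecasting.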

Comments Accepted by KDD 2026 research track

Abstract

Time series forecasting relies on historical patterns, but real-world series often exhibit non-stationarity and regime shifts that challenge fully parametric forecasters. Inspired by Retrieval-Augmented Generation (RAG), recent work augments forecasters by retrieving relevant historical segments and using them as external evidence at inference time. However, due to the intrinsic non-stationarity of real-world time series, a highly similar past segment does not necessarily imply a similar future, rendering similarity-only retrieval brittle and prone to redundancy. We propose Stationarity-Aware Retrieval-Augmented Time Series Forecasting (SARAF), a framework that adaptively balances relevance and diversity in retrieval. SARAF first forms a candidate pool via temporal similarity with time-aligned enhancement, then applies a diversity-aware selection strategy to cover heterogeneous historical regimes, with the diversification strength automatically modulated by dataset-level stationarity. Moreover, SARAF uses stationarity-aware aggregation to fuse the retrieved futures. Extensive experiments on eight real-world datasets show that SARAF achieves competitive forecasting performance and improves average accuracy and robustness over strong baselines, with particularly clear benefits under challenging non-stationary settings. Code: https://github.com/ShiqiaoZhou/SARAF.
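Illustrative sketch (not SARAF's actual algorithm): the relevance-versus-diversity trade-off can be pictured with a greedy selection in the style of maximal marginal relevance. The cosine similarity, the rule `lam = 1 - stationarity`, the assumption that less stationary data calls for more diversity, and the name `select_diverse` are all hypothetical; the authors' repository linked above is the reference for the real method.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def select_diverse(query, candidates, k, stationarity):
    """Greedily pick k candidate indices, trading relevance against redundancy.

    stationarity is a score in [0, 1]; the diversity weight is 1 - stationarity
    (an assumption), so a fully stationary dataset uses pure similarity ranking.
    """
    lam = 1.0 - stationarity
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = cosine(query, candidates[i])
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return (1.0 - lam) * relevance - lam * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With a near-duplicate of the top match in the pool, a low stationarity score makes the second pick skip the duplicate in favor of a different segment, which is the redundancy problem the abstract attributes to similarity-only retrieval.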

2606.04133 2026-06-04 cs.CV

Pinpoint: Grounded Worldwide Image Geolocation via Cross-Source Retrieval and Reranking

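Nika Chuzhoy, Brian Hu, Amit A. Arora, Jae Ro, Sarthak S. Sahu

Affiliations: Virtualitics

AI summary: Proposes Pinpoint, a retrieve-and-rerank architecture that fuses Flickr photos and street-view imagery through contrastive learning and uses an attention-based reranker with cross-source evidence for worldwide image geolocation, achieving state-of-the-art results on multiple benchmarks.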

Abstract

Image geolocation aims to estimate where a photograph was taken from its visual content. At worldwide scale, this remains challenging because visual evidence is often ambiguous, diverse, and unevenly distributed. Prior work has typically treated geolocation of ordinary internet photos and street-view imagery as separate tasks, despite their complementary strengths: internet photos better match the appearance distribution of user-captured queries, while street-view imagery provides denser, geographically grounded coverage. We present Pinpoint, a retrieve-and-rerank architecture that combines both sources in a coarse-to-fine pipeline. A contrastive image-GPS embedder is trained on both user-uploaded Flickr photos and street-view imagery, learning a shared image-GPS embedding space that is used to retrieve candidate locations. An attention-based reranker then rescores retrieved candidates by combining candidate-level visual and GPS features with cross-source evidence from nearby locations to ground the prediction. Unlike recent prior work, Pinpoint does not rely on multimodal large-language models, making inference faster and more reproducible. Pinpoint achieves state-of-the-art results across all metrics on standard benchmarks for internet photos (IM2GPS3k and YFCC4k) and street-view imagery (OSV-5M).

2606.04130 2026-06-04 cs.RO

CLAW: Learning Continuous Latent Action World Models via Adversarial Latent Regularization

Tewodros Ayalew, Matthew Jeung, Samuel Wheeler, Xiao Zhang, Andre de la Cruz Arce, Kaylene Stocking, Michael Maire, Matthew R. Walter

Affiliations: University of Chicago; Toyota Technological Institute at Chicago; Argonne National Laboratory

AI summary: Proposes the CLAW framework, which uses adversarial latent regularization and diffusion-based video generation to learn a world model and continuous latent action representations end-to-end from action-free videos, supporting imitation learning from observation and goal-directed planning.

Comments 8 pages, 15 pages of supplementary material

Abstract

We introduce CLAW, a fully end-to-end self-supervised framework for learning a world model jointly with continuous latent action representations directly from action-free videos. Our approach leverages adversarial latent regularization and diffusion-based video generation to capture structured and semantically meaningful action representations while modeling rich, predictive environment dynamics, without relying on any action labels or annotations. By simultaneously training the Latent Action Model and world model, CLAW learns to reason about how inferred actions induce environment transitions from visual observations alone. We show that the resulting latent action world model supports both imitation learning from observation and goal-directed planning. In imitation learning, latent actions extracted from raw videos enable behavior cloning. For planning, CLAW generates sequences of latent actions and maps them to executable actions to reach desired goals. Extensive experiments across diverse tasks and embodiments demonstrate that CLAW produces semantically meaningful latent action representations, supports effective action transfer, and enables planning and imitation from observation, outperforming existing methods.

2606.04127 2026-06-04 cs.CL

When Retrieval Doesn't Help: A Large-Scale Study of Biomedical RAG

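Erfan Nourbakhsh, Rocky Slavin, Ke Yang, Anthony Rios

Affiliations: The University of Texas at San Antonio

AI summary: Large-scale experiments show that retrieval-augmented generation (RAG) yields only small and inconsistent gains (1-2 points) in biomedical question answering; the main bottleneck is the model's limited ability to use retrieved evidence effectively.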

Comments 9 Pages, accepted to BioNLP Workshop at ACL 2026

Abstract

Medical question answering is a high-stakes setting where factual errors can have serious consequences. Retrieval-augmented generation (RAG) is widely viewed as a promising solution, and prior work has reported substantial gains for large medical QA models. We revisit this assumption across a broad range of open-weight instruction-tuned models spanning 7B to 72B parameters. Across five models, ten biomedical QA datasets, four retrieval methods, and four retrieval corpora, we find that retrieval yields only small and inconsistent improvements over a no-retrieval baseline, typically within 1-2 points. In contrast, the choice of backbone model has a much larger effect than the choice of retriever or corpus, and expert and layman retrieval sources perform similarly in most settings. These results suggest that the main bottleneck is not retrieval quality alone, but the model's limited ability to use retrieved evidence effectively.

2606.04120 2026-06-04 cs.CL cs.AI

SaliMory: Orchestrating Cognitive Memory for Conversational Agents

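Kai Zhang, Xinyuan Zhang, Hongda Jiang, Shiun-Zu Kuo, Hyokun Yun, Ejaz Ahmed, Shereen Oraby, Ziyun Li, Sanat Sharma, Ann Lee, Ahmed A Aly, Anuj Kumar, Raffay Hamid, Xin Luna Dong

Affiliations: Meta Reality Labs

AI summary: Proposes the SaliMory framework, which trains a single language model end-to-end to manage cognitively structured memory using hierarchical stage-wise process rewards and reward-decomposed contrastive optimization, substantially reducing memory-related errors and improving personalization.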

2606.04115 2026-06-04 cs.LG cs.AI

dMX: Differentiable Mixed-Precision Assignment for Low-Precision Floating-Point Formats

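Giuseppe Franco, Ian Colbert, Pablo Monteagudo-Lago, Felix Marty, Nicholas Fraser

Affiliations: AMD

AI summary: Proposes dMX, a differentiable mixed-precision quantization framework that continuously optimizes per-layer floating-point format parameters and, with an annealing schedule and a regularization term, produces hardware-compatible MXFP format assignments, yielding Pareto-dominating models on LLMs.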

Abstract

Quantizing large language models (LLMs) to low-precision floating-point representations is central to efficient deployment, yet applying a single bit-width uniformly across all layers is sub-optimal in terms of both performance and accuracy. This work introduces dMX, a differentiable mixed-precision quantization framework for learnable floating-point bit-width assignment. We study its application to the microscaling floating-point (MXFP) family of data types defined by the Open Compute Project (OCP) standard. The per-layer bit-width assignment is formulated as a continuous optimization problem in which each layer's floating-point format is parameterized by a scalar parameter, folding the multi-variate design space into a single learnable offset. During training this offset takes continuous values, avoiding sudden oscillations between discrete quantization formats. A temperature-based annealing schedule progressively discretizes the learned offsets, ensuring that the final configuration maps to hardware-compatible MXFP formats without abrupt transitions between training and inference behavior. A target-aware regularization term steers the average bit-width toward a user-specified budget, serving as a coarse-grained proxy for inference cost and balancing model quality against deployment efficiency. We performed experiments on different LLM families, such as Llama, Qwen3, and SmolLM2, evaluating perplexity on WikiText-2 and accuracy on four zero-shot reasoning benchmarks. Across these settings, dMX consistently yields Pareto-dominating models and improves over Kullback-Leibler (KL) divergence-based layer-selection heuristics, efficiently navigating trade-offs between model quality and average bit-width.

2606.04111 2026-06-04 cs.RO cs.AI cs.SY eess.SY

AgenticDiffusion: Agentic Diffusion-based Path Planning for Vision-Based UAV Navigation

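Faryal Batool, Muhammad Ahsan Mustafa, Fawad Mehboob, Valerii Serpiva, Dzmitry Tsetserukou

Affiliations: University of Engineering and Technology, Lahore

AI summary: Proposes AgenticDiffusion, a multi-view UAV navigation framework that combines language-guided reasoning, open-vocabulary target grounding, vision-based diffusion planning, and NMPC. Coordinating first-person and top views improves indoor navigation efficiency, with an 80% mission success rate over 40 real-world trials.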

Abstract

Indoor UAV navigation requires efficient exploration, scene understanding, and reliable trajectory execution under limited field-of-view observations. Existing vision-based navigation frameworks typically rely on single-view observations, limiting their ability to reason about occlusions, target visibility, and global scene structure. In this work, we propose AgenticDiffusion, a multi-view UAV navigation framework that coordinates language-guided reasoning, open-vocabulary target grounding, vision-based diffusion planning, and NMPC within a unified aerial navigation pipeline. Given a natural language instruction and synchronized first-person-view (FPV) and top-view observations, the framework determines the most informative viewpoint for navigation and generates a mission plan prior to trajectory execution. The targets are localized using an open-vocabulary grounding model, after which viewpoint-specific diffusion planners generate navigation trajectories for UAV execution. Using complementary viewpoints, the proposed framework reduces repeated target exploration and improves navigation efficiency in cluttered indoor environments. The framework was validated in four real-world UAV navigation scenarios involving adaptive viewpoint selection, multi-stage mission execution, long-horizon navigation, and safe landing-site selection. The experimental results demonstrated an overall mission success rate of 80% in 40 real-world trials, while the diffusion planners achieved a trajectory generation success rate of 100%.

2606.04110 2026-06-04 cs.LG stat.ML

Variance Reduction for Heavy-Tailed Monetization Metrics in Ranking Experiments via Post-Stratification

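Neeti Pokharna, Olivier Jeunen, Yatharth Saraf, Aleksei Ustimenko

Affiliations: ShareChat; Aampe; Simulacra Research

AI summary: To address the high variance and low statistical power of heavy-tailed monetization metrics in ranking experiments, proposes a variance reduction framework combining post-stratification with CUPED. It uses pre-experiment covariates to improve sensitivity and, deployed at ShareChat, reaches equivalent statistical confidence with about 45% less traffic.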

Comments Accepted as Industry Track paper in the 2026 ACM SIGIR Conference on Research and Development in Information Retrieval

DOI: 10.1145/3805712.3808428

Abstract

Online evaluation of ranking and retrieval systems often relies on downstream monetization metrics such as app revenue or creator earnings. These metrics are typically heavy-tailed, with a small fraction of users dominating both mean and variance, leading to low statistical power and unreliable conclusions in A/B experiments, especially under limited traffic. We present a practical framework for variance reduction in online experiments by combining post-stratification with CUPED. Our approach leverages pre-experiment covariates to improve the sensitivity of monetization experiments without requiring additional traffic. Deployed at ShareChat across ranking-driven monetization experiments, the method substantially reduces variance and improves decision stability, achieving equivalent statistical confidence with ~45% less traffic than standard metrics. We further discuss practical design choices, guardrails, and limitations, providing guidance on when post-stratification is appropriate for real-world information retrieval and recommendation systems.
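Illustrative sketch (not the deployed ShareChat pipeline): the two building blocks named in the abstract have well-known textbook forms, shown here in plain Python. CUPED subtracts the part of the metric explained by a pre-experiment covariate, and post-stratification re-weights per-stratum means by known population shares. How the paper combines them is not stated, so the function names and the idea of applying them one after the other within each experiment arm are assumptions.

```python
from collections import defaultdict


def mean(xs):
    return sum(xs) / len(xs)


def cuped_adjust(y, x):
    """Return y - theta * (x - mean(x)), with theta = cov(x, y) / var(x).

    y is the in-experiment metric per user, x the pre-experiment covariate.
    The adjustment keeps the mean of y unchanged and lowers its variance
    when x and y are correlated.
    """
    mx, my = mean(x), mean(y)
    var_x = sum((xi - mx) ** 2 for xi in x)
    if var_x == 0:
        return list(y)  # constant covariate carries no information
    theta = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / var_x
    return [yi - theta * (xi - mx) for xi, yi in zip(x, y)]


def post_stratified_mean(values, strata, weights):
    """Weighted average of per-stratum sample means.

    weights maps each stratum label to its population share; the shares of
    the strata that appear in the sample are assumed to sum to 1.
    """
    groups = defaultdict(list)
    for v, s in zip(values, strata):
        groups[s].append(v)
    return sum(weights[s] * mean(vs) for s, vs in groups.items())
```

For a heavy-tailed revenue metric, strata would typically separate the few high spenders from everyone else, so that a chance imbalance of high spenders between arms no longer dominates the estimate.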

2606.04107 2026-06-04 cs.CV

Reflection Separation from a Single Image via Joint Latent Diffusion

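Zheng-Hui Huang, Zhixiang Wang, Yu-Lun Liu, Yung-Yu Chuang

Affiliations: Shanda AI Research Tokyo; National Taiwan University; National Yang Ming Chiao Tung University

AI summary: Proposes a diffusion-based method that addresses single-image reflection separation under extreme conditions such as glare or weak reflections by jointly generating the transmission and reflection layers, with a cross-layer self-attention mechanism, a disjoint sampling strategy, and latent optimization.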

Comments CVPR 2026. Project page: https://brian90709.github.io/diff-reflection-separation/

Abstract

Single-image reflection separation is highly challenging under extreme conditions like glare or weak reflections. Existing methods often struggle to recover both layers in glare or weak-reflection scenarios because of insufficient information. This paper presents a diffusion model explicitly fine-tuned for this task, leveraging generative diffusion priors for robust separation. Our method simultaneously generates transmission and reflection layers through a unified diffusion model, incorporating a novel cross-layer self-attention mechanism for better feature disentanglement. We further introduce a disjoint sampling strategy to iteratively reduce interference between the layers during diffusion and a latent optimization step with a learned composition function for improved results in complex real-world scenarios. Extensive experiments demonstrate that our approach surpasses state-of-the-art methods on multiple real-world benchmarks. Project page: https://brian90709.github.io/diff-reflection-separation/

2606.04106 2026-06-04 cs.LG cs.AI

Building The Ph(ysical)AI Layer Of Machine Intelligence

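Ulbert Jose Botero, Liam Smith, Brooks Olney, Pooya Khorrami, Steven Kusiak, Watson Jia, Sage Trudeau, Daniel Capecci

Affiliations: MIT Lincoln Laboratory

AI summary: Proposes principle-driven foundation models built on signal-processing principles. Trained only on radio-frequency data, the model transfers across modalities without fine-tuning on target domains and reaches 77.7% average accuracy on 15 tasks with 1.99M parameters.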

Comments 102 pages, 11 Figures

Abstract

Foundation models achieve generalization through massive-scale training on diverse data, but have limited ability to transfer to truly unseen domains without paired training data. We propose principle-driven foundation models that encode signal-theoretic principles (Fourier decomposition, energy conservation, symmetry) rather than learn untethered statistical correlations. We hypothesize that domains differ not in fundamental physics, but in learnable transformations in time, frequency, magnitude, or phase. Training exclusively on radio-frequency (RF) data with co-designed architecture and losses incorporating these principles, we achieve cross-modal transfer to audio, images, text, and video using only frozen representations learned from RF data, requiring no fine-tuning of the encoder on target domains. Our 1.99M-parameter frozen encoder achieves 77.7% average accuracy (91.9% top-3) across 15 diverse tasks via linear probing, with systematic variation: 84.5% on physically-grounded tasks (speaker recognition, seismology, RF fingerprinting) versus 70.0% on semantic tasks (music genre, language recognition). This reveals that principle-driven and scale-driven approaches offer complementary paths: physical principles enable efficient cross-modal transfer while naturally establishing the boundary between physical and semantic understanding.

2606.04103 2026-06-04 cs.SD cs.AI cs.LG eess.AS

The Differentiable Auditory Loop (DAL): An ML Framework for Hyper-Personalized Hearing Aids

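Alejandro Ballesta Rosen, Jason Mikiel-Hunter, Julian Maclaren, Jack Collins, Richard F. Lyon, Simon Carlile

Affiliations: Google Research Australia; Macquarie University

AI summary: Proposes the Differentiable Auditory Loop (DAL) framework, which ports the CARFAC model to JAX and optimizes a SEANet deep neural network to compensate for hearing loss using normal-hearing neural activity patterns as the reference, outperforming the tested master hearing aid (MHA) baselines on neural-representation and signal-fidelity metrics.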

Abstract

Conventional hearing aids rely on fixed, frequency-dependent amplification and compression to manage reduced sensitivity, which often fails to provide sufficient listening support in complex environments, such as situations with multiple speakers (the "cocktail party" problem). To more comprehensively address the underlying encoding dysfunctions of hearing loss, we introduce the Differentiable Auditory Loop (DAL), a new open-source framework for personalized hearing aid design and fitting. Our first implementation of DAL incorporates CARFAC, a differentiable model of human cochlear function, which we ported to JAX, to optimize a deep neural network to match impaired auditory neural activity patterns with a normal-hearing reference. To build a hearing aid with the fine-grained spectro-temporal signal processing required, we adopt SEANet, a waveform-to-waveform fully convolutional UNet generator. We fine-tune the network by comparing the outputs of a CARFAC model fitted to normal hearing with those of a CARFAC model fitted to match each subject's individual hearing impairment. The comparison is done using loss functions derived from the respective CARFAC neural activity pattern (NAP) outputs and stabilized auditory images (SAIs), the latter providing a 2D representation that captures phase-insensitive temporal structure in the auditory nerve output. Through gradient descent, the SEANet model learns to both denoise the input and compensate for the hearing loss modelled by the impaired CARFAC model. Across neural-representation and signal-fidelity metrics, the DAL-optimized SEANet model outperformed the tested master hearing aid (MHA) baselines. The DAL framework provides a practical path toward model-based, machine-learning-driven personalization of hearing aid signal processing. Next steps include hardware deployment to enable real-world clinical testing.

2606.04100 2026-06-04 cs.LG physics.comp-ph

Stein Kernelized Molecular Dynamics for Active Learning of Interatomic Potentials

Stein核化分子动力学用于原子间势的主动学习

Joanna Zou, Fraser Birks, Dallas Foster, Youssef Marzouk

发表机构 * Center for Computational Science & Engineering, Schwarzman College of Computing, MIT（计算科学与工程中心，计算机科学学院，麻省理工学院）； Warwick Centre for Predictive Modelling, School of Engineering, University of Warwick（预测建模中心，工程学院，沃里克大学）； NVIDIA

AI总结提出Stein核化分子动力学（SKMD），一种通过相互作用粒子动力学获取信息性训练配置的增强采样方法，用于主动学习和微调机器学习原子间势，保持玻尔兹曼分布作为渐近分布，并采用自适应停止准则高效在线获取非冗余数据，在Müller-Brown势和丙氨酸二肽的MACE势上展示了优于基线的模型精度。

AI中文摘要

机器学习原子间势（MLIP）能够实现高效且精确的原子模拟，但其性能关键取决于训练数据的质量和多样性。我们引入了Stein核化分子动力学（SKMD），这是一种增强采样方法，利用相互作用粒子动力学获取信息性训练配置，用于MLIP的主动学习和微调。SKMD是Stein变分梯度下降的一种随机变体，通过引入异步粒子更新和全局原子描述符的核函数，为分子动力学进行了适配，从而提供了对称性感知的构型相似性度量。与分子动力学中使用的其他增强采样器不同，SKMD保留了玻尔兹曼分布作为动力学的渐近分布。这一特性在探索多样构型与吸引到高概率区域之间取得了平衡。我们进一步提出了一种高效在线数据获取方法，使用自适应停止准则在模拟过程中选择非冗余训练数据。我们展示了SKMD在Müller-Brown势的神经网络模型主动学习以及丙氨酸二肽的MACE原子间势微调中的应用。与主动学习基线相比，我们的方法在相同数量的训练样本下，以更少的训练迭代次数实现了更高的模型精度。

英文摘要

Machine learning interatomic potentials (MLIPs) enable efficient and accurate atomistic simulations but depend critically on the quality and diversity of the training data. We introduce Stein kernelized molecular dynamics (SKMD), an enhanced sampling method that uses interacting particle dynamics to acquire informative training configurations for the active learning and fine-tuning of MLIPs. SKMD corresponds to a stochastic variant of Stein variational gradient descent that is adapted for molecular dynamics by incorporating asynchronous particle updates and a kernel of global atomic descriptors, which provides a symmetry-aware measure of configurational similarity. Unlike other enhanced samplers used in molecular dynamics, SKMD preserves the Boltzmann distribution as the asymptotic distribution of the dynamics. This property enforces a balance between the exploration of diverse configurations and attraction toward high-probability regions of the energy landscape. We further propose an approach to efficient online data acquisition using an adaptive stopping criterion that selects non-redundant training data over the course of simulation. We demonstrate SKMD for the active learning of a neural network model of the Müller-Brown potential and the fine-tuning of a MACE interatomic potential for alanine dipeptide. Compared to active learning baselines, our method achieves higher model accuracy in fewer training iterations with the same number of acquired training samples.
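The abstract describes SKMD as a stochastic variant of Stein variational gradient descent (SVGD). As a rough illustration of the underlying idea only, the sketch below implements one plain, deterministic SVGD update for scalar particles with a fixed-bandwidth RBF kernel. The asynchronous updates, the stochastic term, and the kernel over global atomic descriptors mentioned in the abstract are not reproduced; the harmonic target, step size, and bandwidth are illustrative assumptions.

```python
import math


def rbf_kernel(x, y, h):
    """RBF kernel k(x, y) = exp(-(x - y)^2 / h) for scalar particles."""
    return math.exp(-((x - y) ** 2) / h)


def svgd_step(particles, grad_log_p, step=0.1, h=1.0):
    """One deterministic SVGD update for a list of scalar particles.

    phi(x_i) = mean_j [ k(x_j, x_i) * grad_log_p(x_j) + d/dx_j k(x_j, x_i) ]
    """
    n = len(particles)
    updated = []
    for xi in particles:
        phi = 0.0
        for xj in particles:
            k = rbf_kernel(xj, xi, h)
            attract = k * grad_log_p(xj)        # drift toward high-probability regions
            repulse = -2.0 * (xj - xi) / h * k  # kernel gradient: pushes particles apart
            phi += attract + repulse
        updated.append(xi + step * phi / n)
    return updated


# Assumed toy target: Boltzmann density of U(x) = x^2 / 2 at unit temperature,
# so grad log p(x) = -x. The starting positions are arbitrary.
particles = [3.0, 3.1, 3.2, 3.3]
for _ in range(500):
    particles = svgd_step(particles, lambda x: -x)
print(particles)
```

The attraction term pulls particles toward low-energy regions while the kernel-gradient term keeps them from collapsing onto one configuration, which is the exploration/attraction balance the abstract refers to.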

2606.04098 2026-06-04 cs.CV

When Seeing Is Not Believing -- A Benchmark for Search-Grounded Video Misinformation Detection

当眼见不再为实——面向搜索辅助的视频虚假信息检测基准

Tao Yu, Yujia Yang, Shenghua Chai, Zhang Jinshuai, Haopeng Jin, Hao Wang, Minghui Zhang, Zhongtian Luo, Yuchen Long, Xinlong Chen, Jiabing Yang, Zhaolu Kang, Yuxuan Zhou, Zhengyu Man, Xinming Wang, Hongzhu Yi, Zheqi He, Xi Yang, Yan Huang, Liang Wang

发表机构 * CASIA（中国科学院自动化研究所）； UCAS（中国科学院大学）； BAAI（北京智源人工智能研究院）； Tsinghua University（清华大学）； Peking University（北京大学）

AI总结提出EVID-Bench基准，通过跨视频对比和开放网络搜索检测视频虚假信息，涵盖9种操纵类型，评估前沿多模态模型发现准确率低且面临多种挑战。

Comments 52 pages

AI中文摘要

视频虚假信息越来越多地在语义和证据层面运作：真实镜头可能被选择性编辑、时间重排、跨源拼接或通过AI生成内容增强以构建虚假叙事。这种依赖证据的操纵无法仅从输入视频中可靠验证，因为缺失、重排、替换或重新语境化的证据位于视频本身之外。我们引入了EVID-Bench，一个面向搜索辅助的视频虚假信息检测基准，系统必须搜索开放网络以查找相关视频，并通过跨视频比较识别哪些信息是虚假的。EVID-Bench包含222个视频，涵盖3类9种操纵类型：AI生成、单源编辑和多源编辑。所有样本均经过验证，前沿模型仅通过视觉检查无法检测。我们使用检索增强验证基线评估了九种前沿多模态模型。最佳系统仅达到61.43%的点级准确率和43.24%的视频级准确率，而AI生成的操纵仍然特别具有挑战性。错误分析揭示了反复出现的挑战：模型固着于无关锚点，错误地将合成内容归因于编辑拼接，并在完全解释操纵之前过早终止搜索。

英文摘要

Video misinformation increasingly operates at the semantic and evidential level: authentic footage may be selectively edited, temporally reordered, spliced across sources, or augmented with AI-generated content to construct false narratives. Such evidence-dependent manipulations cannot be reliably verified from the input video alone, because the missing, reordered, replaced, or recontextualized evidence lies outside the video itself. We introduce EVID-Bench, a benchmark for search-grounded video misinformation detection, where a system must search the open web for related videos and identify what information is false through cross-video comparison. EVID-Bench comprises 222 videos spanning 9 manipulation types across 3 categories: AI generation, single-source editing, and multi-source editing. All samples are verified to be undetectable by frontier models through visual inspection alone. We evaluate nine frontier multimodal models using a retrieval-augmented verification baseline. The best system achieves only 61.43% point-level accuracy and 43.24% video-level accuracy, while AI-generated manipulations remain especially challenging. Error analysis reveals recurring challenges: models fixate on irrelevant anchors, misattribute synthetic content to editorial splicing, and terminate search prematurely before fully explaining the manipulation.

2606.04095 2026-06-04 cs.CL cs.AI

POLARIS: Guiding Small Models to Write Long Stories

POLARIS：引导小模型撰写长篇小说

Rishanth Rajendhran, Jenna Russell, Mohit Iyyer, John Frederick Wieting

发表机构 * University of Maryland（马里兰大学）； Google（谷歌）； DeepMind

AI总结提出POLARIS训练方法，结合LLM裁判奖励和人类参考注入，使9B小模型在长故事写作中达到与27B模型相当的质量，并展现出长度泛化能力。

AI中文摘要

小型开放权重模型在长篇创意写作中表现不佳：它们生成的故事要么远低于要求的长度，要么随着长度增加质量显著下降，尤其是与前沿模型相比。我们提出了POLARIS（基于LLM裁判奖励和锚定参考注入的故事写作策略优化），这是一种计算开销较低的GRPO方法，包含两个关键要素：一个具有结构化故事质量评分标准的前沿LLM裁判作为在线奖励，以及人类参考注入（HRI），其中教师强制的人类撰写故事作为每个GRPO组内的高奖励锚点。通过将我们的训练方法应用于Qwen3.5-9B，使用从100部短篇小说集中提取的约1.4K个提示-故事对数据集和4块A100 GPU，我们得到了POLARIS-9B。在涵盖分布内和分布外提示及评分标准的五个基准测试中，POLARIS-9B可与规模大得多的开放权重模型竞争，同时更严格地遵循长度指令。盲法人工评估证实，POLARIS-9B优于基础Qwen3.5-9B，并与Qwen3.5-27B相当。尽管仅在长达4000词的故事上训练，POLARIS-9B在要求故事长度达到训练长度3倍的提示下仍能保持质量，而大多数开放权重模型在此情况下质量、长度遵循度或两者均显著下降。更广泛地说，我们的结果表明，长度泛化是创意写作模型的一个有意义的压力测试，也是区分水平相近模型的有用视角。

英文摘要

Small open-weight models struggle at long-form creative writing: their generated stories either fall far short of the requested length, or their quality significantly degrades as length increases, especially when compared to frontier models. We present POLARIS (Policy Optimization with LLM-as-a-judge rewards and Anchored-Reference Injection for Storywriting), a lower-compute GRPO recipe with two key ingredients: a frontier LLM judge with a structured Story Quality rubric as the online reward, and human-reference injection (HRI), where a teacher-forced human-written story serves as a high-reward anchor within each GRPO group. By applying our training recipe to Qwen3.5-9B, using a dataset of approximately 1.4K prompt-story pairs derived from 100 short-story anthologies and 4 A100 GPUs, we obtain POLARIS-9B. Across five benchmarks spanning in-distribution and out-of-distribution prompts and rubrics, POLARIS-9B is competitive with much larger open-weight models while following length instructions more closely. A blinded human evaluation confirms that POLARIS-9B is preferred to the base Qwen3.5-9B and on par with Qwen3.5-27B. Despite training only on stories up to 4k words, POLARIS-9B preserves quality on prompts requesting stories up to 3 times the training length, a regime where most open-weight models degrade substantially in quality, length adherence, or both. More broadly, our results suggest that length generalization is a meaningful stress test for creative-writing models and a useful lens for distinguishing otherwise close models.
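To make the human-reference injection idea concrete, here is a minimal sketch of GRPO-style group-relative advantages in which the human-written story's reward is appended to each group before normalization. The mean/standard-deviation normalization is the commonly used GRPO form; whether POLARIS applies exactly this normalization, and the reward values themselves, are assumptions for illustration.

```python
from statistics import mean, pstdev


def group_advantages(sample_rewards, human_reward, eps=1e-8):
    """Group-relative advantages for one prompt.

    The judge reward of the human-written reference is appended to the group,
    so it acts as a high-reward anchor that shifts the group baseline.
    The last entry of the result corresponds to the human reference.
    """
    rewards = list(sample_rewards) + [human_reward]
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical judge scores for three sampled stories and one human reference.
advantages = group_advantages([2.0, 3.0, 4.0], human_reward=9.0)
print(advantages)
```

With the anchor included, even the best sampled story (4.0) falls below the group mean of 4.5 and receives a negative advantage, so the policy is pushed toward the reference quality rather than toward its own best sample.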

2606.04092 2026-06-04 cs.CV cs.LG

Optimal Transport Flow Matching by Design

通过设计实现最优传输流匹配

Shimon Malnick, Matan Rusanovsky, Ohad Fried, Shai Avidan

发表机构 * Tel Aviv University（特拉维夫大学）； Reichman University（里奇曼大学）

AI总结本文通过将先验分布视为设计选择而非固定输入，利用数据与其低频投影之间的恒等耦合作为最优传输耦合，简化流匹配模型中的轨迹曲率，实现快速高质量生成。

Comments Project page: https://www.malnick.net/designing_ot_flows

AI中文摘要

流匹配模型学习将样本从简单先验分布传输到复杂数据分布。当先验-数据对通过最优传输（OT）耦合时，学习到的轨迹是直线且无交叉的，从而实现快速甚至单步生成。然而，在高维空间中计算OT耦合是困难的，现有方法试图解决OT问题，但代价是持续的偏差或显著的开销。我们不求解OT耦合，而是重新表述问题。一旦将先验视为设计选择而非固定输入，先验与数据之间的OT耦合就不再唯一。许多先验允许与数据之间存在OT最优的恒等耦合，因此我们可以自由选择一个易于采样的先验。我们将自然图像的低频投影确定为这样的选择。数据与其低频表示之间的恒等耦合在经验上是OT最优的，先验的结构足够丰富，可以在推理时由轻量级模型采样，而剩余的流匹配任务简化为合成高频细节。用高斯噪声插值先验进一步提高了生成质量，同时保留了OT耦合。该方法无需对流模型本身进行修改，并且自然地与潜在空间模型、无分类器引导和单步生成框架集成。在所有基准测试中，与现有流匹配方法相比，我们的方法将轨迹曲率降低了2倍以上，从而在少步数情况下实现了更好的生成质量。

英文摘要

Flow matching models learn to transport samples from a simple prior distribution to a complex data distribution. When prior-data pairs are coupled via optimal transport (OT), the learned trajectories are straight and non-crossing, enabling fast, even single-step, generation. However, computing the OT coupling in high dimensions is intractable, and existing methods attempt to solve the OT problem, at the cost of persistent bias or significant overhead. Rather than solving for the OT coupling, we reformulate the problem. Once the prior is treated as a design choice rather than a fixed input, the OT coupling between prior and data is no longer unique. Many priors admit an OT-optimal identity coupling to the data, leaving us free to choose one that is also tractable to sample. We identify low-frequency projection of natural images as such a choice. The identity coupling between data and its low-frequency representation is empirically OT-optimal, the prior is structured enough to be sampled by a lightweight model at inference, and the remaining flow-matching task reduces to synthesizing high-frequency detail. Interpolating the prior with Gaussian noise further improves generation quality while preserving the OT coupling. The approach requires no modifications to the flow model itself, and integrates naturally with latent-space models, classifier-free guidance, and one-step generation frameworks. Across all benchmarks, our method reduces trajectory curvature by more than 2× compared to existing flow matching methods, yielding better generation quality in the few-step regime.
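The identity coupling described above can be sketched on a 1D signal: each data sample is paired with its own low-frequency projection, and the flow-matching target along the straight path is simply the high-frequency residual. Block averaging is used here as an assumed stand-in for the low-frequency projection, since the abstract does not specify the filter; the linear interpolant is the standard flow-matching path.

```python
def low_frequency_projection(signal, block=4):
    """Replace each block of samples by its mean (an idempotent low-pass projection)."""
    out = []
    for start in range(0, len(signal), block):
        chunk = signal[start:start + block]
        avg = sum(chunk) / len(chunk)
        out.extend([avg] * len(chunk))
    return out


def flow_matching_pair(x1, t, block=4):
    """Return (x_t, target_velocity) for the straight path from prior sample to data.

    The prior sample x0 is the projection of x1 itself (identity coupling),
    so the target velocity x1 - x0 is the high-frequency detail of x1.
    """
    x0 = low_frequency_projection(x1, block)
    x_t = [(1 - t) * a + t * b for a, b in zip(x0, x1)]
    velocity = [b - a for a, b in zip(x0, x1)]  # constant in t: a straight trajectory
    return x_t, velocity


data = [1.0, 3.0, 2.0, 6.0, 0.0, 0.0, 4.0, 4.0]  # hypothetical 1D "image"
x_half, v = flow_matching_pair(data, t=0.5)
print(x_half, v)
```

Because the velocity does not depend on t, a single Euler step from the prior sample recovers the data exactly in this toy setting, which is the intuition behind the few-step claim.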

2606.04075 2026-06-04 cs.LG cs.AI cs.CL cs.CR cs.CY

Large Language Models Hack Rewards, and Society

大型语言模型攻击奖励机制与社会

Wei Liu, Xinyi Mou, Hanqi Yan, Zhongyu Wei, Yulan He

发表机构 * King’s College London（伦敦大学国王学院）； Fudan University（复旦大学）； The Alan Turing Institute（艾伦·图灵研究所）

AI总结研究强化学习训练中大型语言模型利用奖励函数漏洞的“社会攻击”现象，通过SocioHack沙盒实验发现模型能发现并利用社会规则漏洞，且现有安全措施效果有限。

Comments 14 pages, 9 figures, 7 tables

AI中文摘要

强化学习已成为一种主导的后训练范式，使大型语言模型能够从奖励中学习。我们观察到社会规则在结构上与奖励函数相似。它们定义了可衡量的结果、阈值和例外情况，同时往往仅部分指定了制度意图。我们假设强化学习训练过程可能利用这些漏洞，因此提出模型在强化学习期间攻击奖励函数的已知倾向是否可能扩展为一种更严重的失败模式，即社会攻击：发现社会运行规则中的漏洞。为了研究这一现象，我们引入了SocioHack，一个包含72个社会环境的沙盒，并发现这些环境中奖励攻击自然出现并导致监管漏洞的发现。模型学会攻击社会规则并生成技术上合规但违背监管意图的策略，而当前的大型语言模型安全措施仅提供有限的缓解。因此，收集真实世界反馈用于模型训练需要更加谨慎，我们需要下一代后训练范式来安全地在真实社会中迭代大型语言模型。

英文摘要

Reinforcement learning (RL) has become a dominant post-training paradigm, enabling large language models (LLMs) to learn from rewards. We observe that societal regulations are structurally similar to reward functions. They define measurable outcomes, thresholds, and exceptions, while often leaving institutional intent only partially specified. We hypothesise that the RL training process may exploit these gaps and therefore ask whether models' well-known tendency to hack reward functions during RL can scale into a more consequential failure mode named societal hacking: discovering loopholes in the rules society runs on. To study this phenomenon, we introduce SocioHack, a sandbox of 72 societal environments, and find that within these environments, reward hacking naturally emerges and leads to regulatory loophole discovery. Models learn to hack the social rules and generate strategies that remain technically compliant while defeating regulatory intent, and current LLM safeguards provide only limited mitigation. Therefore, collecting in-the-wild feedback for model training requires greater caution, and we need a next-generation post-training paradigm for safely iterating LLMs in real society.

2606.04074 2026-06-04 cs.LG cs.AI cs.IT math.IT

Adaptive Patching Is Harder Than It Looks For Time-Series Forecasting

自适应分块在时间序列预测中比看起来更难

Federico Zucchi, Yi Xie, Chao Zhang, Keyuan Luo, Thomas Lampert, Ziyue Li

发表机构 * ICube, University of Strasbourg, Illkirch-Graffenstaden, France（斯特拉斯堡大学ICube研究所，法国伊尔克里奇-格拉夫芬斯坦德）； Technical University of Munich（慕尼黑工业大学）； FinTech Thrust, The Hong Kong University of Science and Technology (Guangzhou)（香港科技大学（广州）金融科技研究组）； Computer Science Department, Hainan Bielefeld University of Applied Sciences（海南比尔费尔德应用科学大学计算机科学系）； Cephalgo, Strasbourg, France（法国斯特拉斯堡Cephalgo公司）； Heilbronn Data Science Center, Munich Data Science Institute（慕尼黑数据科学研究所海尔布隆数据科学中心）

AI总结本文通过理论分析和实验验证，探讨自适应分块在时间序列Transformer中是否优于调优的均匀分块，发现均匀基线在标准基准上具有竞争力，自适应分块的优势有限且依赖于特定方法和数据集。

AI中文摘要

自适应分块是时间序列Transformer最近提出的一个引人注目的方案：在序列局部信息丰富的区域分配更细的分块。本文探究在什么条件下内容自适应分块算子应优于调优的均匀算子。局部异质性本身并不足够：在逐点预测损失下，一个看似复杂的区域并不自动意味着更细的分块会减少损失。我们将分块建模为有预算的比特率分配，并推导出一个显式阈值，动态分块规则必须满足该阈值才能击败调优的均匀基线，然后从局部（二次代理）和全局（模型假设下的强凸界）两方面界定了可实现的改进。由此得出两个结构性结果：在没有耦合约束的情况下，标量局部复杂度无法在常见损失景观下产生非均匀最优；一旦骨干网络训练到其表示感知最优，对齐增益会在调优的均匀分块大小附近崩溃。为了验证这些预测，我们在三种代表性架构上进行了受控隔离研究，用均匀分块大小扫描替换每个自适应机制，同时保持骨干网络、数据和训练协议不变。在标准的长时域预测基准上，验证选择的均匀基线与动态对应物具有竞争力，每个设置的效果集中在零附近，且按数据集汇总后没有一致的方向性优势。我们观察到的较大增益是方法和数据集特定的。因此，自适应分块应针对调优的均匀基线进行评估；其价值取决于是否有一个廉价且可靠的路由信号能够识别出更细的分块实际上在何处减少预测损失。

英文摘要

Adaptive patching is a recent and compelling proposal for time-series Transformers: allocate finer patches where the sequence looks locally informative. This paper asks under what conditions a content-adaptive patching operator should outperform a tuned uniform one. Local heterogeneity alone is not enough: under pointwise forecasting losses, a complex-looking region is not automatically one where finer patching reduces the loss. We model patching as a budgeted bitrate allocation and derive an explicit threshold that a dynamic patching rule must satisfy to beat a well-tuned uniform baseline, then bound the achievable improvement both locally (a quadratic surrogate) and globally (a strong-convexity bound under the model's assumptions). Two structural results follow: without a coupling constraint, scalar local complexity cannot produce a non-uniform optimum under a common loss landscape; and once the backbone is trained to its representation-aware optimum, the alignment gain collapses around a well-tuned uniform patch size. To test these predictions, we run a controlled isolation study on three representative architectures, replacing each adaptive mechanism with a uniform patch-size sweep while keeping the backbone, data, and training protocol fixed. On standard long-horizon forecasting benchmarks, the validation-selected uniform baseline is competitive with the dynamic counterpart, with per-setting effects concentrated near zero and no consistent directional advantage once results are aggregated by dataset. The larger gains we do observe are method- and dataset-specific. Adaptive patching should therefore be evaluated against a tuned uniform baseline; its value depends on whether a cheap and reliable routing signal can identify where finer patches actually reduce forecasting loss.

2606.04073 2026-06-04 cs.LG cs.AI stat.ML

TPA-AD: A Two-Stage Pseudo Anomaly-Guided Method for Bearing Time-Series Anomaly Detection

TPA-AD: 一种用于轴承时间序列异常检测的两阶段伪异常引导方法

Xiancheng Wang, Zhibo Zhang, Ran Li, Rui Wang, Minghang Zhao, Shisheng Zhong, Lin Wang

发表机构 * CQSF.com（重庆师范大学）； Huadian University（哈尔滨理工大学）

AI总结提出一种两阶段伪异常引导方法TPA-AD，通过重构模型和特征误差控制生成边界伪异常窗口，结合对比学习与KNN实现无监督轴承时间序列异常检测，在轴承故障和退化数据集上表现稳定且具泛化性。

AI中文摘要

本文提出了一种两阶段伪异常引导的异常检测方法（TPA-AD），用于在仅正常样本可用的训练设置下进行轴箱轴承时间序列异常检测（TSAD）。该方法首先利用重构模型和每特征目标误差控制在正常边界附近生成伪异常窗口，然后通过正常窗口与伪异常窗口之间的对比学习学习异常敏感表示，最后使用k近邻（KNN）生成窗口级和点级异常分数。与依赖已知故障类别、真实异常先验或随机异常注入的现有方法相比，TPA-AD通过在边界邻域构建伪异常提高了正常边界的可分离性，并能联合处理混合变量场景中的连续和离散特征。主要实验在轴承故障检测数据集和退化过程数据集上进行，并在13个公共TSAD数据集上进行了额外的探索性扩展。结果表明，所提方法产生相对稳定的异常响应，对退化演化敏感，并在公共TSAD基准和真实高速列车相关轴承数据上表现出一定程度的更广泛适用性。

英文摘要

This paper proposes a two-stage pseudo anomaly-guided anomaly detection method (TPA-AD) for axle-box bearing time-series anomaly detection (TSAD) under the setting where only normal samples are available for training. The method first generates pseudo-anomalous windows near the normal boundary using a reconstruction model and per-feature target-error control. It then learns anomaly-sensitive representations through contrastive learning between normal and pseudo-anomalous windows, and finally produces window-level and point-level anomaly scores using k-nearest neighbors (KNN). Compared with existing methods that rely on known fault categories, real anomaly priors, or random anomaly injection, TPA-AD improves the separability of the normal boundary by constructing pseudo-anomalies in boundary neighborhoods and can jointly handle continuous and discrete features in mixed-variable scenarios. The main experiments are conducted on bearing fault detection datasets and degradation-process datasets, with an additional exploratory extension on 13 public TSAD datasets. The results show that the proposed method yields relatively stable anomaly responses, is sensitive to degradation evolution, and demonstrates a certain degree of broader applicability on public TSAD benchmarks and real high-speed-train-related bearing data.
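The final scoring stage mentioned above, KNN over learned representations, can be sketched as follows. This covers only a generic KNN distance score for one window embedding; the reconstruction model, the contrastive encoder, and the way TPA-AD converts window-level scores into point-level scores are not shown, and the 2D embeddings and choice of k are hypothetical.

```python
import math


def knn_anomaly_score(query, reference, k=3):
    """Mean Euclidean distance from `query` to its k nearest reference embeddings.

    `reference` holds embeddings of normal training windows; a larger score
    means the query lies farther from the normal data.
    """
    dists = sorted(math.dist(query, ref) for ref in reference)
    return sum(dists[:k]) / k


# Hypothetical 2D embeddings of normal windows.
normal_embeddings = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
near_score = knn_anomaly_score((0.5, 0.5), normal_embeddings)
far_score = knn_anomaly_score((5.0, 5.0), normal_embeddings)
print(near_score, far_score)
```

A window embedded inside the normal cluster receives a small score, while one far from every normal embedding receives a large score; a threshold on this value would then flag anomalies.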

2606.04072 2026-06-04 cs.RO cs.DC cs.LG cs.SY eess.SY

CADET: A Modular Platform for Evaluating Distributed Cooperative Autonomy in Connected Autonomous Vehicles

CADET：用于评估网联自动驾驶车辆中分布式协作自主性的模块化平台

Pragya Sharma, Brian Wang, Mani Srivastava

发表机构 * UCLA ； Amazon Scholar（亚马逊学者）

AI总结提出CADET模块化平台，通过解耦自动驾驶堆栈并集成网络与工作负载仿真，系统评估分布式协作自主系统在真实部署条件下的安全性与性能。

Journal ref: ICRA 2026

AI中文摘要

深度学习模型日益成为自动驾驶汽车（AV）管道的核心，然而其集成传统上遵循单一设计，即感知、规划和控制在同一车载计算机上执行。这种设计忽视了协作自主的新兴范式，即车辆通过车联网（V2X）连接与路侧单元（RSU）、边缘服务器和云托管智能进行交互。协作感知和控制提高了安全性和效率，但也引入了系统级挑战：网络延迟、计算异构性和多租户争用，所有这些都严重影响实时决策。这些挑战因对大型基础模型的日益依赖而进一步放大，这些模型的规模需要云部署。我们提出CADET（通过分布式实验工具包实现协作自主），这是一个模块化平台，用于在真实部署条件下对分布式协作自主系统进行系统化和可重复的评估。CADET将自动驾驶堆栈解耦为可组合的模块，这些模块可以灵活地部署在车辆、基础设施和边缘/云层级上。该框架集成了最先进的模型，引入了基于轨迹的网络和工作负载仿真，并提供了同步的模型级、系统级和任务级检测。通过V2V和V2I实验，我们表明分布式部署选择从根本上影响安全性，其中V2V意图数据包优于基于云的感知，而RSU辅助感知在过载并发请求之前维持安全性。尽管专为自动驾驶管道设计，CADET也支持数据集驱动的实验，使系统和机器学习研究人员能够独立于完整的车辆仿真来基准测试分布式推理工作负载。CADET是开源的，代码和演示可在https://nesl.github.io/cadet-web获取。

英文摘要

Deep learning models are increasingly central to autonomous vehicle (AV) pipelines, yet their integration has traditionally followed a monolithic design where perception, planning, and control execute on a single onboard computer. This design overlooks the emerging paradigm of cooperative autonomy, where vehicles interact with roadside units (RSUs), edge servers, and cloud-hosted intelligence through vehicle-to-everything (V2X) connectivity. Cooperative perception and control improve safety and efficiency, but also introduce systems-level challenges: network latency, compute heterogeneity, and multi-tenant contention, all of which critically affect real-time decision-making. These challenges are further amplified by the increasing reliance on large foundation models, whose scale necessitates cloud deployment. We present CADET (Cooperative Autonomy through Distributed Experimentation Toolkit), a modular platform for systematic and reproducible evaluation of distributed cooperative autonomy systems under realistic deployment conditions. CADET decouples the AV stack into composable modules that can be flexibly deployed across vehicles, infrastructure, and edge/cloud tiers. The framework integrates state-of-the-art models, incorporates trace-driven network and workload emulation, and provides synchronized model-, system-, and task-level instrumentation. Through V2V and V2I experiments, we show that distributed deployment choices fundamentally shape safety, with V2V intent packets outperforming cloud-based perception and RSU-assisted perception sustaining safety until overloaded by concurrent requests. Although designed for AV pipelines, CADET also supports dataset-driven experimentation, enabling systems and ML researchers to benchmark distributed inference workloads independently of full vehicle simulation. CADET is open source, with code and demo available at https://nesl.github.io/cadet-web.

2606.04063 2026-06-04 cs.LG cs.AI

LLM Compression with Jointly Optimizing Architectural and Quantization choices

联合优化架构与量化选择的大语言模型压缩

Hoang-Loc La, Truong-Thanh Le, Amir Taherkordi, Phuong Hoai Ha

发表机构 * UiT The Arctic University of Norway（挪威北极大学）； University of Oslo, Norway（挪威奥斯陆大学）

AI总结提出一种可微神经架构搜索框架，联合优化大语言模型的架构配置与混合精度量化，实现更优的精度-延迟权衡。

AI中文摘要

部署大型语言模型（LLM）因其巨大的内存和计算需求而具有挑战性。虽然一些方法通过从头开发小型或微型语言模型来解决这一问题，但这些方法需要大量的GPU训练。压缩预训练的LLM用于边缘设备提供了一种有吸引力的替代方案。除了剪枝和量化，神经架构搜索（NAS）能够实现有效的压缩，然而先前的NAS方法通常限制搜索空间并将架构与量化解耦。我们引入了一种可微NAS框架，该框架探索整个空间，并联合优化LLM线性层的架构配置与混合精度量化。实验表明，我们的模型在精度-延迟权衡上具有优越性：在可比精度下，我们的模型推理速度比顺序的NAS后量化基线快1.4倍，或在等效延迟下，在七个推理任务上平均精度提高高达6%。

英文摘要

Deploying large language models (LLMs) is challenging due to their significant memory and computational requirements. While some methods address this by developing small or tiny language models from scratch, these approaches demand extensive GPU training. Compressing pre-trained LLMs for edge devices offers a compelling alternative. Beyond pruning and quantization, Neural Architecture Search (NAS) enables effective compression, yet prior NAS approaches often limit the search space and decouple architecture from quantization. We introduce a differentiable NAS framework that explores the entire space and jointly optimizes architectural configurations alongside mixed-precision quantization for linear layers of LLMs. Experiments demonstrate superior accuracy-latency trade-offs: our models achieve up to 1.4x faster inference than sequential NAS-then-quantization baselines at comparable accuracy, or up to 6% higher average accuracy across seven reasoning tasks at equivalent latency.

2606.04061 2026-06-04 cs.CV

Intra-Modal Neighbors Never Lie: Rectifying Inter-Modal Noisy Correspondence via Graph-Based Intra-Modal Reasoning

模态内邻居从不说谎：基于图模态内推理纠正模态间噪声对应

Yang Liu, Wentao Feng, Shu-Dong Huang, Yalan Ye, Jiancheng Lv

发表机构 * University of Science and Technology of China（中国科学技术大学）

AI总结提出IN2R框架，利用模态内数据的几何稳定性，通过图精炼器对动态跨模态记忆中的邻居进行关系推理，合成连续软原型以纠正模态间噪声对应，显著提升跨模态检索性能。

Journal ref: International Conference of Machine Learning 2026

AI中文摘要

大规模网络采集数据集推动了跨模态检索的进展，但不可避免地遭受噪声对应问题，严重损害模型泛化能力。现有方法主要通过过滤噪声或寻找替代标签来解决，但它们主要局限于“离散选择”范式。我们认为，依赖单一离散代理会导致单点脆弱性和离散化误差。为克服这些限制，我们提出了一种新颖框架——模态内邻居感知噪声纠正（IN2R），它将范式从搜索替代标签转变为合成可靠的监督目标。利用模态内数据固有的几何稳定性，IN2R采用图精炼器对从动态跨模态记忆中检索到的邻居进行关系推理。我们的方法不是传播离散标签，而是合成一个连续的软原型，反映局部语义邻域的共识，有效纠正模态间错位。在Flickr30K、MS-COCO和CC152K上的大量实验表明，IN2R显著优于最先进的方法。我们的代码和预训练模型可在https://github.com/liuyyy111/IN2R公开获取。

英文摘要

Large-scale web-harvested datasets have fueled the progress of cross-modal retrieval but inevitably suffer from noisy correspondence, which severely degrades model generalization. Existing methods primarily address this by filtering out noise or seeking a substitute label, yet they predominantly remain bound by a "Discrete Selection" paradigm. We argue that relying on a single discrete proxy induces Single-Point Fragility and Discretization Error. To overcome these limitations, we propose a novel framework, Intra-modal Neighbor-aware Noise Rectification (IN2R), which shifts the paradigm from searching for a substitute to synthesizing a reliable supervision target. Leveraging the intrinsic geometric stability of intra-modal data, IN2R employs a Graph Refiner to perform relational reasoning over neighbors retrieved from a dynamic Cross-Model Memory. Instead of propagating discrete labels, our method synthesizes a continuous, soft prototype that reflects the consensus of the local semantic neighborhood, effectively rectifying inter-modal misalignment. Extensive experiments on Flickr30K, MS-COCO, and CC152K demonstrate that IN2R significantly outperforms state-of-the-art methods. Our code and pre-trained models are publicly available at https://github.com/liuyyy111/IN2R.

2606.04060 2026-06-04 cs.CV

Weakly Supervised Incremental Segmentation via Semantic Anchors and Spatial Arbitration

基于语义锚点和空间仲裁的弱监督增量分割

Zhonggai Wang, Kai Fang, Guangyu Gao

发表机构 * National Natural Science Foundation of China（中华人民共和国国家自然科学基金委员会）； Tsinghua University（清华大学）

AI总结针对弱监督增量语义分割中噪声监督导致的特征漂移和语义覆盖问题，提出SASA方法，通过语义锚点稳定表示学习和空间标签仲裁过滤不可靠信号，有效缓解特征漂移。

Comments Accepted by ICME2026

AI中文摘要

弱监督增量语义分割（WILSS）面临持续引入噪声监督的问题，这会逐步破坏类别级表示，导致严重的特征漂移和语义污染，从而使新学习的类别覆盖旧类别。为了解决这些问题，我们提出了一种抗漂移的WILSS方法，名为SASA，旨在通过语义锚点和空间仲裁稳定语义学习。具体地，在表示层面，我们引入可学习令牌的语义锚点作为刚性类别级参考，以保持长期语义一致性。作为补充，弹性残差适应实现了受控的、实例特定的细化，确保稳定而灵活的学习轨迹。在监督层面，我们开发了一种空间标签仲裁机制，该机制执行几何感知决策，直接过滤不可靠信号，并强制执行严格的“一个对象，一个类别”约束。通过协同稳定表示和提高监督可靠性，SASA有效缓解了弱监督下的特征漂移。在标准基准上的大量实验表明，我们的方法始终优于现有最先进方法，特别是在具有挑战性的多步增量设置中。代码可在https://github.com/ZhonggaiWang/SASA获取。

英文摘要

Weakly Incremental Learning for Semantic Segmentation (WILSS) suffers from the continuous introduction of noisy supervision, which progressively corrupts class-level representations, leading to severe feature drift and semantic corruption, thereby causing newly learned classes to overwrite old ones. To address these issues, we propose a drift-resilient WILSS approach, named SASA, designed to stabilize semantic learning via Semantic Anchors and Spatial Arbitration. Specifically, at the representation level, we introduce semantic anchors of learnable tokens as rigid class-level references to preserve long-term semantic identity. Complementary to this, an elastic residual adaptation facilitates controlled, instance-specific refinement, ensuring a stable yet flexible learning trajectory. At the supervision level, we develop a Spatial Label Arbitration mechanism that performs geometry-aware decisions to directly filter unreliable signals and enforce a strict "one object, one class" constraint. By synergistically stabilizing representations and improving supervision reliability, SASA effectively mitigates feature drift under weak supervision. Extensive experiments on standard benchmarks demonstrate that our approach consistently outperforms existing state-of-the-art methods, particularly in challenging multi-step incremental settings. The code is available at https://github.com/ZhonggaiWang/SASA.

2606.04053 2026-06-04 cs.LG cs.AI

A Goal-Set Characterization of Task Composition in the Boolean Task Algebra

布尔任务代数中任务组合的目标集刻画

Eduardo Terrés-Caballero, Herke van Hoof

发表机构 * Informatics Institute, University of Amsterdam（阿姆斯特丹大学信息学研究所）； AMLab, University of Amsterdam（阿姆斯特丹大学AMLab实验室）

AI总结本文通过目标集方法简化了布尔任务代数中的任务组合，证明了确定性MDP中最优扩展Q值函数由通用任务和空任务决定，从而减少了学习成本。

AI中文摘要

布尔任务代数（BTA）通过为达到目标的任务配备布尔运算，为强化学习中的零样本任务组合提供了一个原则性框架。我们重新审视了其结构假设，并形式化了最优扩展Q值函数空间中的坍缩：在确定性MDP中，每个这样的函数完全由通用任务和空任务决定。这使得原始BTA公式中提出的对数基任务集变得冗余。基于这一观察，我们引入了一种基于目标集的组合方法，该方法对目标集执行逻辑运算，并通过从通用值函数和空值函数中选择切片来重构组合值函数。这降低了标准BTA的学习成本，并减少了BTA和技能机器的组合时间，同时保持了策略性能。在表格、视觉、函数逼近和连续控制领域的实验表明，学习额外的基任务并不会带来更好的性能。最后，我们研究了随机设置，并提供了一个反例，表明这种坍缩不一定成立，即最优组合可能需要考虑目标数量指数级的策略。代码可在 https://github.com/EduardoTerres/bta_paper 获取。

英文摘要

The Boolean Task Algebra (BTA) provides a principled framework for zero-shot task composition in reinforcement learning by equipping goal-reaching tasks with Boolean operations. We revisit its structural assumptions and formalize a collapse in the space of optimal extended Q-value functions: in deterministic MDPs, every such function is fully determined by the universal and empty tasks. This makes the logarithmic set of base tasks proposed in the original BTA formulation redundant. Building on this observation, we introduce a goal-set-based composition method that performs logical operations on goal sets and reconstructs composed value functions by selecting slices from the universal and empty value functions. This reduces learning costs for standard BTA and reduces composition time for both BTA and Skill Machines, while preserving policy performance. Experiments across tabular, visual, function-approximation, and continuous-control domains show that learning additional base tasks does not yield better performance. Finally, we study the stochastic setting and provide a counterexample showing that this collapse need not hold, that is, optimal composition may require accounting for exponentially many policies in the number of goals. Code is available at https://github.com/EduardoTerres/bta_paper.
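The goal-set composition described above can be sketched in two small steps: Boolean operations act directly on sets of goals, and the composed extended Q-function is assembled by taking, for each goal, the slice from the universal task if the goal is in the set and from the empty task otherwise. The table layout (one slice per goal) and the toy values are assumptions for illustration, not the authors' implementation.

```python
def compose_goal_sets(op, all_goals, first, second=None):
    """Apply a Boolean task operation directly to goal sets."""
    if op == "not":
        return all_goals - first
    if op == "and":
        return first & second
    if op == "or":
        return first | second
    raise ValueError(f"unknown operation: {op}")


def composed_q(goal_set, q_universal, q_empty):
    """Assemble an extended Q-table for the task defined by `goal_set`.

    For each goal, use the universal-task slice when the goal belongs to the
    set and the empty-task slice otherwise. Slices are opaque here: they could
    be arrays indexed by (state, action).
    """
    return {
        goal: (q_universal[goal] if goal in goal_set else q_empty[goal])
        for goal in q_universal
    }


# Hypothetical goals and placeholder slices (one number stands in for a table).
ALL_GOALS = frozenset({"A", "B", "C"})
q_universal = {"A": 1.0, "B": 1.0, "C": 1.0}
q_empty = {"A": -1.0, "B": -1.0, "C": -1.0}

both = compose_goal_sets("and", ALL_GOALS, frozenset({"A", "B"}), frozenset({"B", "C"}))
print(composed_q(both, q_universal, q_empty))
```

Only two value functions are learned in this scheme; every composite task is then a cheap set operation followed by a per-goal selection, which is where the reduced learning and composition cost would come from.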

2606.04051 2026-06-04 cs.LG cs.AI cs.CR

RUBAS: Rubric-Based Reinforcement Learning for Agent Safety

RUBAS: 基于评分标准的强化学习用于智能体安全

Xian Qi Loye, Qinglin Su, Zhexin Zhang, Shiyao Cui, Qi Zhu, Fei Mi, Hongning Wang, Minlie Huang

发表机构 * The Conversational AI (CoAI) group, DCST, Tsinghua University（清华大学计算机科学与技术系（DCST）对话式人工智能（CoAI）课题组）； Huawei Noah’s Ark Lab（华为诺亚方舟实验室）

AI总结提出RUBAS框架，通过将智能体行为分解为四个维度的评分标准提供细粒度奖励，利用强化学习在保证任务完成的同时提升工具使用安全性。

AI中文摘要

LLM进化为工具型智能体带来了与真实世界执行相关的新安全挑战，而非简单的文本生成。现有的对齐方法通常依赖粗略的拒绝信号或静态监督，难以在多样化的智能体风险中平衡安全性与有用的工具执行。我们提出了RUBAS，一种基于评分标准的强化学习框架用于智能体安全。RUBAS将智能体行为分解为四个维度：工具使用安全性、参数安全性、响应安全性和有用性。这些结构化的评分标准在完整的智能体轨迹上提供细粒度且可解释的奖励，使强化学习能够在保持任务完成的同时优化安全工具使用。在多个智能体安全基准和模型上的大量实验表明，RUBAS相比标准对齐基线提高了安全性，减少了基于工具的幻觉，并保持了竞争性的实用性。我们的结果表明，多维评分标准奖励为在安全关键的工具使用环境中对齐LLM智能体提供了有效的训练信号。

英文摘要

The evolution of LLMs into tool-enabled agents creates a new class of safety challenges associated with real-world execution rather than simple text generation. Existing alignment methods often rely on coarse refusal signals or static supervision, making it difficult to balance safety with useful tool execution across diverse agentic risks. We introduce RUBAS, a rubric-based reinforcement learning framework for agent safety. RUBAS decomposes agent behavior into four dimensions: tool-use safety, argument safety, response safety, and helpfulness. These structured rubrics provide fine-grained and interpretable rewards over complete agent trajectories, enabling reinforcement learning to optimize safe tool use while preserving task completion. Extensive experiments across multiple agent safety benchmarks and models show that RUBAS improves safety over standard alignment baselines, reduces tool-grounded hallucinations, and maintains competitive utility. Our results suggest that multi-dimensional rubric rewards provide an effective training signal for aligning LLM agents in safety-critical tool-use settings.
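As a rough sketch of how four rubric dimensions could be turned into a single trajectory-level reward, the snippet below takes a weighted average of per-dimension scores. The abstract names the four dimensions but not how they are combined, so the weighted average, the [0, 1] score range, and the equal default weights are all assumptions.

```python
RUBRIC_DIMENSIONS = (
    "tool_use_safety",
    "argument_safety",
    "response_safety",
    "helpfulness",
)


def rubric_reward(scores, weights=None):
    """Weighted average of per-dimension rubric scores, each assumed to lie in [0, 1]."""
    if weights is None:
        weights = {dim: 1.0 for dim in RUBRIC_DIMENSIONS}  # assumed equal weighting
    missing = [dim for dim in RUBRIC_DIMENSIONS if dim not in scores]
    if missing:
        raise KeyError(f"missing rubric scores: {missing}")
    total_weight = sum(weights[dim] for dim in RUBRIC_DIMENSIONS)
    weighted = sum(weights[dim] * scores[dim] for dim in RUBRIC_DIMENSIONS)
    return weighted / total_weight


# Hypothetical trajectory: safe tool choice, unsafe arguments, safe and helpful reply.
trajectory_scores = {
    "tool_use_safety": 1.0,
    "argument_safety": 0.0,
    "response_safety": 1.0,
    "helpfulness": 1.0,
}
print(rubric_reward(trajectory_scores))
```

Keeping the dimensions separate until the last step means a low reward can be traced back to the specific rubric that failed, which is the interpretability benefit the abstract points to.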

2606.04050 2026-06-04 cs.LG cs.AI

LiftQuant: Continuous Bit-Width LLM via Dimensional Lifting and Projection

LiftQuant: 通过维度提升和投影实现连续位宽的LLM

Liulu He, XuanAng Liu, Juntao Liu, Taolue Feng, Ting Lu, Chunsheng Gan, Zhiyv Peng, Yuan Du, Huanrui Yang, Yijiang Liu, Li Du

发表机构 * Nanyang Technological University（南洋理工大学）

AI总结提出LiftQuant框架，通过“提升-投影”机制实现准连续位宽控制，以精确适配内存预算，在70B模型上以2.4位压缩超越现有2位模型。

Comments ICML 2026 Spotlight

AI中文摘要

现有的量化方法从根本上受限于刚性的整数位宽（例如2位、3位），导致存在“部署鸿沟”，即大型语言模型无法最优地适配特定的内存预算。为弥合这一鸿沟，我们引入了LiftQuant，一种新颖的框架，能够实现连续位宽控制，从而实现真正的帕累托最优部署。其核心创新是一种“提升-投影”机制，该机制通过从更高维度的“提升”空间中投影一个简单的1位格点来近似低维权重向量。关键在于，有效位宽仅由提升维度与原始维度的比率决定，这使得位宽可以准连续地调整，因为维度是一个灵活的结构参数。这种投影生成一个结构化但非均匀的码本，捕获了向量量化（VQ）的表达能力。虽然优于VQ，但LiftQuant的解码路径仅依赖于线性变换和1位均匀量化器，保持了硬件友好的特性。这种灵活性具有变革性：LiftQuant能够将70B的LLM压缩到2.4位，以精确适配24GB GPU，其性能显著超过在同一设备上部署的最先进的2位模型。我们的代码和检查点可在https://github.com/Heliulu/LiftQuant获取。

英文摘要

Existing quantization methods are fundamentally limited by rigid, integer-based bit-widths (e.g., 2-bit, 3-bit), resulting in a "deployment gap" where Large Language Models cannot be optimally fitted to specific memory budgets. To bridge this gap, we introduce LiftQuant, a novel framework that enables continuous bit-width control for true Pareto-optimal deployment. The core innovation is a "lift-then-project" mechanism which approximates low-dimensional weight vectors by projecting a simple 1-bit lattice from a higher-dimensional "lifted" space. Crucially, the effective bit-width is determined simply by the ratio of the lifted dimension to the original dimension, which allows the bit-width to be tuned quasi-continuously, as the dimension is a flexible structural parameter. This projection generates a structured yet non-uniform codebook, capturing the expressive power of Vector Quantization (VQ). While offering benefits over VQ, LiftQuant's decoding path relies solely on linear transformations and 1-bit uniform quantizers, retaining a hardware-friendly nature. This flexibility is transformative: LiftQuant enables a 70B LLM to be compressed to 2.4 bits to precisely fit a 24GB GPU, where its performance significantly surpasses state-of-the-art 2-bit models fitted on the same device. Our code and checkpoints are available at https://github.com/Heliulu/LiftQuant.
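The bit-width arithmetic in the abstract can be checked with a short sketch. The effective bit-width is the lifted dimension divided by the original dimension, so a 1-bit lattice in 12 dimensions projected down to 5 gives 2.4 bits per weight. The projection matrix below is a made-up example, and the memory estimate counts weights only (no scales, embeddings, activations, or KV cache), which is an assumption.

```python
from itertools import product


def effective_bit_width(lifted_dim, original_dim):
    """Bits per weight when a 1-bit lattice of size lifted_dim encodes original_dim weights."""
    return lifted_dim / original_dim


def weight_memory_gb(num_params, bits_per_weight):
    """Weight storage in decimal gigabytes, ignoring all overheads."""
    return num_params * bits_per_weight / 8 / 1e9


def projected_codebook(projection):
    """Project every sign vector in {-1, +1}^m through a d x m matrix.

    `projection` is a list of d rows with m entries each; the result is the
    list of 2^m codewords in d dimensions.
    """
    m = len(projection[0])
    return [
        tuple(sum(row[j] * signs[j] for j in range(m)) for row in projection)
        for signs in product((-1.0, 1.0), repeat=m)
    ]


print(effective_bit_width(12, 5))   # 12 one-bit codes for every 5 weights
print(weight_memory_gb(70e9, 2.4))  # about 21 GB of weights for a 70B model
# Hypothetical 2 x 3 projection: 8 codewords in 2D, i.e. 1.5 bits per weight.
print(projected_codebook([[1.0, 0.5, 0.25], [0.25, -0.5, 1.0]]))
```

Because the lifted dimension can be any integer, ratios such as 11/5 or 13/5 give 2.2 or 2.6 bits, which is what makes the bit-width tunable in fine steps rather than in whole bits.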
