arXivDaily arXiv Daily Academic Digest, updated Monday to Friday
2606.18098 2026-06-17 cs.AI New submission

IsabeLLM: Automated Theorem Proving Applied to Formally Verifying Consensus

Elliot Jones, William Knottenbelt

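Affiliations Imperial College London

AI summary This paper improves IsabeLLM, an automated theorem proving tool, by using retrieval-augmented generation, error tracing, and counterexample generation to enrich the context given to the large language model, and by adding compatibility with the latest Isabelle and Sledgehammer; the tool is applied to verifying Bitcoin's Proof of Work consensus.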

English abstract

Advances in Artificial Intelligence (AI) have led AI for Theorem Proving to become a promising means of formally verifying computer systems. Whilst formal verification is traditionally reserved for safety-critical systems due to the required amount of expertise and effort, AI can help to automate a large amount of this workload and make it far more accessible. Blockchain-based systems are becoming increasingly popular and are frequently targeted by malicious actors, often resulting in huge financial losses, highlighting the need to better verify these systems and mitigate vulnerabilities. Arguably the most important component of these systems is the consensus protocol, which allows nodes to agree on decisions in a potentially adversarial environment. In this paper, we improve upon IsabeLLM, the automated theorem proving tool in Isabelle. Namely, we implement a Retrieval-Augmented Generation framework, Error tracing and counterexample generation for improved context supplied to the Large Language Model. Compatibility with the latest version of Isabelle and Sledgehammer is also implemented for improved efficiency. We compare the performance of the two versions of IsabeLLM in their ability to complete the verification of Bitcoin's Proof of Work consensus.

2606.18097 2026-06-17 cs.RO New submission

WireCraft: A Simulation Benchmark for Industrial DLO Manipulation

Chongyu Zhu, Ramy ElMallah, Hyegang Kim, Zachary Tang, Jiachen Rao, Artem Arutyunov, Seungyeon Ha, Chi-Guhn Lee

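Affiliations Department of Mechanical and Industrial Engineering, University of Toronto; Department of Computer Science, University of Toronto; CREFLE Inc.

AI summary To address the lack of a unified benchmark for industrial deformable linear object (DLO) manipulation, the paper proposes WireCraft, a simulation benchmark with configurable difficulty and assets that spans three task families and evaluates reinforcement learning, imitation learning, and vision-language-action policies.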

English abstract

Deformable Linear Objects (DLOs), such as wires and cables, are central to industrial assembly. Unlike rigid objects, whose state is captured by a 6-DoF pose, DLOs have an infinite-dimensional configuration space and deform continuously under contact with grippers, fixtures, and the workspace, making them a demanding benchmark for general dexterous manipulation. Despite their importance, policy development and comparison remain difficult: existing benchmarks are often tied to specific hardware setups, lack modular and customizable task assets, or study generic deformable-object tasks without the fixtures relevant to real-world industrial wire manipulation. Few benchmarks align simulation, real-world data, and shared evaluation protocols. To bridge this gap, we introduce WireCraft, a simulation benchmark for industrial DLO manipulation with configurable difficulty and assets, spanning three task families: connector insertion, clip routing, and channel seating. It supports two complementary DLO physics models, articulated and deformable, and the trajectories come from both simulation and a physical UR5. We benchmark reinforcement learning (RL), imitation learning (IL), and vision-language-action (VLA) policies under shared metrics. Privileged state-based RL solves a representative setting in each task family with over 82% success, confirming the tasks are well-posed. For connector insertion, however, the transition from reaching the socket to contact-rich alignment remains a key bottleneck for vision RL, IL, and VLA policies. These results indicate that industrial DLO manipulation, though tractable under privileged state, remains an open challenge for current vision-based learning. The benchmark, data, and tools will be open-sourced upon acceptance.

2606.18096 2026-06-17 cs.LG cs.AI cs.DC New submission

S4oP: Operator-level Pruning of Structured State Space Models for Resource-Constrained Devices

Marco Deano, Filippo Ziche, Nicola Bombieri

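Affiliations University of Verona

AI summary Proposes an incremental operator-level pruning method for S4 and S4D models that interleaves structured masking with fine-tuning, significantly reducing inference cost while preserving predictive performance; presented as the first systematic study of structured operator pruning for SSMs.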

English abstract

Structured State Space Models (SSMs), including the S4 and S4D architectures, have recently emerged as powerful alternatives to attention-based models for capturing long-range dependencies in sequential data. Despite their strong empirical performance, deploying these models in time- and resource-constrained settings remains challenging due to their computational and memory demands. In this paper, we propose a novel incremental, operator-level pruning approach for S4- and S4D-based models that significantly reduces inference cost while preserving predictive performance. To the best of our knowledge, this is the first work to systematically investigate structured operator pruning for SSMs. Our method progressively prunes model operators by interleaving structured masking with fine-tuning, while jointly monitoring accuracy and inference latency. We implement this approach within a unified training and evaluation framework that enables systematic exploration of efficiency-accuracy trade-offs. Experiments across multiple benchmark datasets show that pruning up to 70% of the model operators preserves the performance of the original models in most cases, while substantially reducing inference latency. These results demonstrate that structured operator pruning is an effective and previously unexplored strategy for improving the efficiency of SSMs and facilitate their deployment in practical, resource-constrained scenarios.
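The interleaving of structured masking with fine-tuning described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say how operators are ranked or when pruning stops, so the `scores` ranking, the `fine_tune` and `evaluate` hooks, and the `min_accuracy` stopping rule are all hypothetical, and the latency monitoring the authors mention is left out.

```python
def incremental_prune(scores, fine_tune, evaluate,
                      step=0.1, max_fraction=0.7, min_accuracy=0.9):
    """Mask operators in rounds, fine-tuning after each round.

    scores: hypothetical importance score per operator (lowest pruned first).
    fine_tune(mask): caller-supplied hook that retrains with the mask applied.
    evaluate(mask): caller-supplied hook returning accuracy in [0, 1].
    Returns the last mask (True = operator kept) that met min_accuracy.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])  # least important first
    mask = [True] * n
    rounds = int(round(max_fraction / step))
    for r in range(1, rounds + 1):
        n_pruned = int(n * step * r + 1e-9)  # cumulative number masked so far
        candidate = [True] * n
        for i in order[:n_pruned]:
            candidate[i] = False
        fine_tune(candidate)  # recover accuracy before judging this round
        if evaluate(candidate) < min_accuracy:
            break  # accuracy dropped too far: keep the previous mask
        mask = candidate
    return mask
```

Raising the pruned fraction in small steps, rather than masking 70% at once, gives fine-tuning a chance to compensate after each round, which is one plausible reading of "incremental" here.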

2606.18094 2026-06-17 cs.SD New submission

Next-Turn: Duration-Aware Streaming Endpoint Detection via Time-to-Next-Speech-Onset Prediction

Tristan Tsoi, Jiajun Deng, Yingke Zhu, Huu Quyen Dang, Tianxiang Cao, Nikita Kuzmin, Tao Zhong, Simon Lui

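Affiliations Central Media Technology Institute, Huawei; The Chinese University of Hong Kong; Nanyang Technological University

AI summary Proposes Next-Turn, which uses the time to the next speech onset as the training objective, deriving targets directly from speech timestamps with no additional annotation; it improves endpoint accuracy by 25.9% absolute over the strongest baseline, and joint training with the duration-aware objective improves performance further.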

Comments Interspeech 2026

English abstract

Endpoint detection (EPD) is essential for natural turn-taking in streaming speech systems. However, reliably determining the endpoint of an utterance is challenging because speakers often pause mid-utterance due to hesitations and disfluencies. Semantic EPD has emerged as a promising direction to address this issue but is hindered by ambiguous supervision and strict streaming constraints. We propose Next-Turn that uses the time-to-next-speech-onset as the training objective, where targets are derived directly from speech timestamps and require no additional annotation. Experiments show that the proposed method outperforms conventional acoustic and recent semantic EPD baselines, achieving a 25.9% absolute improvement in endpoint accuracy within 320 ms over the strongest baseline. In addition, joint training with the duration-aware objective complements standard binary EPD, with gains that increase monotonically with increasing pauses.
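As a rough illustration of how time-to-next-speech-onset targets might be derived from timestamps alone, the sketch below labels each frame with the time until the next onset. The function name, the per-frame layout, and the optional `cap` are assumptions made for the example; the abstract does not specify how the authors frame or clip their targets.

```python
from bisect import bisect_right

def time_to_next_onset(frame_times, onsets, cap=None):
    """For each frame time, return seconds until the next speech onset.

    frame_times: frame timestamps in seconds.
    onsets: sorted speech-onset timestamps in seconds.
    cap: optional ceiling; also used when no later onset exists.
    """
    targets = []
    for t in frame_times:
        i = bisect_right(onsets, t)  # index of first onset strictly after t
        if i == len(onsets):
            gap = float("inf") if cap is None else cap
        else:
            gap = onsets[i] - t
            if cap is not None:
                gap = min(gap, cap)
        targets.append(gap)
    return targets
```

Because the targets come straight from onset timestamps, no endpoint labels need to be annotated by hand, which matches the "no additional annotation" claim in the abstract.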

2606.18092 2026-06-17 cs.RO cs.AI New submission

EAGG: Embodiment-Aligned Grasp Generation via Geometry-Aware Graph Conditioning

Wanhao Niu, Qiyan Ke, Yuan Sun, Hao Sun, Jie Xu, Muyuan Ma, Ruiqi Hu, Fuchun Sun

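Affiliations Department of Computer Science and Technology, Tsinghua University; Beijing Moce Future Technology Co., Ltd.

AI summary Proposes EAGG, a unified model for cross-end-effector grasp generation built on topology-aware end-effector graphs and geometry-aware tokens; it reaches 56.17% average success on the MultiGripperGrasp benchmark and reduces the pooled median contact distance from 0.239 cm to 0.189 cm.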

Comments 16 pages, 8 figures. Code is available at https://github.com/wanhaoniu/EAGG

English abstract

Cross-end-effector grasp generation seeks a unified model that generalizes across objects and across embodiments ranging from parallel grippers to dexterous end effectors. Existing grasp generators are typically designed for a fixed embodiment or encode embodiment identity with a static descriptor, which weakens transfer when topology, actuation coupling, and contact geometry differ substantially. We present EAGG, an embodiment-aligned grasp generator that represents each embodiment with a topology-aware end-effector graph and an embodiment-specific low-dimensional end-effector control space. A frozen end-effector-cognition backbone converts the current articulated state into geometry-aware tokens that act as a reusable morphology prior, and iterative geometry injection refreshes these tokens throughout sampling so that conditioning remains synchronized with the evolving end-effector geometry. On the MultiGripperGrasp benchmark, EAGG reaches 56.17% average success across six training end effectors, remaining within 1.10 percentage points of specialized training while preserving transfer to finetuning and zero-shot end effectors. Iterative geometry injection further reduces the pooled median contact distance from 0.239 cm to 0.189 cm. These results show that cross-end-effector grasp generation is strengthened by aligning embodiment structure inside a shared generator rather than suppressing embodiment differences. Code is available at https://github.com/wanhaoniu/EAGG.

2606.18089 2026-06-17 cs.LG New submission

From Reasoning Traces to Reusable Modules: Understanding Compositional Generalization in Language Model Reasoning

Lingjing Kong, Xin Liu, Guangyi Chen, Martin Q. Ma, Xiangchen Song, Yuekai Sun, Mikhail Yurochkin, Taylor W. Killian, Ruslan Salakhutdinov, Kun Zhang, Eric P. Xing, Zhengzhong Liu

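Affiliations Carnegie Mellon University; Mohamed bin Zayed University of Artificial Intelligence; Institute of Foundation Models; University of Michigan

AI summary Formalizes compositional generalization through a hierarchical latent selection model, shows theoretically that SFT supplies atomic modules while RL decomposes traces to enable compositional generalization, and verifies experimentally that RL can extract atomic modules from compound traces and recombine them to solve new configurations.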

Comments ICML 2026

English abstract

Post-training pipelines that combine supervised fine-tuning (SFT) with reinforcement learning (RL) have emerged as the key recipe for transforming large language models (LLMs) into robust reasoners. We argue that this combined success is driven by compositional generalization, which we formalize through a hierarchical latent selection model. In this framework, reasoning traces are generated by a cascade of discrete latent selection variables corresponding to reusable atomic modules, including both skills (local operations) and routing mechanisms (how intermediate information is selected, reused, and composed). Within this model, we theoretically show that SFT and RL play asymmetric, complementary roles: SFT supplies the raw module materials in compositional traces, and RL decomposes those traces to identify the latent atomic modules and enable compositional generalization. We design controlled experiments to validate this theory. Our results demonstrate that RL can extract atomic modules from compound traces supplied by SFT and recombine them to solve new configurations. Moreover, we find that training on compound traces yields stronger generalization than training on isolated atomic modules. Finally, we investigate the relationship between SFT and RL data and identify an effective protocol in which SFT ensures coverage of all atomic modules through compositional traces, while RL focuses on novel compositions outside the SFT support to drive exploration.

2606.18080 2026-06-17 cs.LG New submission

Edge Flow: A Tractable and Predictive Continuous-Time Model for Gradient Descent at the Edge of Stability

Pierre Marion

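Affiliations Inria, École Normale Supérieure, PSL Research University

AI summary For gradient descent dynamics at the edge of stability (EoS) in deep learning, proposes Edge Flow, which decomposes the dynamics into a center, an oscillation direction, and an oscillation magnitude through three coupled ordinary differential equations, giving a tractable and predictive model that exposes the sharpness self-stabilization mechanism.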

Comments 24 pages, 13 figures

English abstract

Gradient descent in deep learning may operate at the edge of stability (EoS), a regime in which the largest eigenvalue of the loss Hessian hovers near the stability threshold $2/\eta$, where $\eta$ is the learning rate. Classical analysis tools such as gradient flow and the descent lemma do not apply here, motivating the search for a continuous-time model valid at EoS. We propose Edge Flow, a system of three coupled ordinary differential equations that provides a tractable, faithful, and predictive model of gradient descent dynamics at EoS. Edge Flow decomposes the dynamics into a center, an oscillation direction, and an oscillation magnitude. The center follows a modified gradient flow on a symmetrized loss; the direction tracks a top eigenvector of the Hessian via Rayleigh quotient dynamics; and the magnitude grows or decays exponentially depending on whether the sharpness exceeds or falls below the threshold $2/\eta$. Crucially, sharpness stabilization emerges from the coupled dynamics via a self-stabilization feedback loop. Discretizing Edge Flow only requires two gradient evaluations and one Hessian-vector product at each iteration. We demonstrate empirically that Edge Flow tracks the dynamics of gradient descent at least as faithfully as previously proposed continuous-time EoS models, while in addition resolving the oscillation of the sharpness at the onset of EoS, and that it provides a principled framework for understanding and mitigating instabilities in this regime.
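The role of the threshold $2/\eta$ can be seen on a one-dimensional quadratic, where the sharpness is a constant. The sketch below is not Edge Flow itself, only the classical observation it builds on: on $f(x) = \lambda x^2/2$, each gradient descent step multiplies $x$ by $1 - \eta\lambda$, so the iterate flips sign and its magnitude shrinks when $\lambda < 2/\eta$ and grows when $\lambda > 2/\eta$. The function name and the chosen constants are illustrative.

```python
def gd_oscillation_magnitude(sharpness, lr, steps=50, x0=1.0):
    """Run gradient descent on f(x) = sharpness * x**2 / 2; return |x| at the end."""
    x = x0
    for _ in range(steps):
        x = x - lr * sharpness * x  # gradient of f at x is sharpness * x
    return abs(x)

lr = 0.1
threshold = 2 / lr  # stability threshold 2/eta

# Sharpness 5% below the threshold: oscillation magnitude decays geometrically.
below = gd_oscillation_magnitude(0.95 * threshold, lr)
# Sharpness 5% above the threshold: oscillation magnitude grows geometrically.
above = gd_oscillation_magnitude(1.05 * threshold, lr)
```

In a real network the sharpness is not constant, which is where the coupled dynamics and the self-stabilization loop described in the abstract come in.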

2606.18075 2026-06-17 cs.AI New submission

A Unified Framework for Context-Aware and Relation-Aware Graph Retrieval-Augmented Generation

Haoyang Zhong, Yifei Sun, Antong Zhang, Chunping Wang, Lei Chen, Yang Yang

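Affiliations Zhejiang University; Nanyang Technological University; Finvolution Group

AI summary Proposes HyGRAG, a hierarchical graph RAG framework that builds summaries fusing contextual and relational information, retrieves across abstraction levels, and supports dynamic updates, improving average accuracy on multi-hop reasoning tasks by 9.7%.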

Comments Accepted at The ACM Web Conference 2026 (WWW '26)

English abstract

Retrieval-Augmented Generation (RAG) has emerged as a paradigm for enhancing large language models (LLMs) with external knowledge, yet existing graph-based methods face a fundamental limitation: entity-centric and chunk-centric approaches operate on representations anchored to original text without true knowledge fusion. While entity-centric methods connect logically related content and chunk-centric methods preserve context, both retrieve information separately through similarity search, missing emergent understanding from their synthesis. In this paper, we propose HyGRAG, a hierarchical graph RAG framework that transcends source documents by addressing three core challenges: constructing summaries that genuinely integrate contextual and relational information, leveraging these synthesized representations to access emergent knowledge during retrieval, and efficiently updating hierarchical structures for dynamic corpora. Specifically, we design hierarchical index structures over hybrid graphs with both chunk and entity nodes, then iteratively cluster them and generate LLM-based summaries. Then, we design context and relation-aware retrieval that searches across all abstraction levels while expanding through community membership. Moreover, we enable dynamic knowledge update through attachment-based algorithms with only local re-summarization. Experimental results show that HyGRAG improves the average accuracy of multi-hop reasoning tasks by 9.7%, while maintaining reasonable efficiency.

2606.18071 2026-06-17 cs.LG cs.AI New submission

Volterra Generative Models

Yusen Jia, Bingyan Han

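Affiliations The Hong Kong University of Science and Technology (Guangzhou)

AI summary Proposes Volterra generative models, which inject path-dependent noise through fractional kernels and use Markovian lifts and residual-state learning to handle score-based diffusion generation under non-Markovian dynamics; evaluated on MNIST and CIFAR-10.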

Comments 36 pages

English abstract

Score-based diffusion models typically use Brownian perturbations, which provide tractable reverse-time dynamics but impose memoryless noising. We introduce Volterra generative models, a continuous-time score-based framework whose forward process injects path-dependent noise through fractional kernels. To handle the non-Markovian and non-semimartingale dynamics, we construct finite-dimensional Markovian lifts using Gaussian quadrature in both regimes and a hybrid finite-difference exponential approximation in the smooth regime. We prove squared error bounds, derive an augmented linear-Gaussian forward process, and show that the learning can remain data-dimensional by considering residual states and analytic auxiliary Gaussian scores. We also identify covariance and reverse-time degeneracies caused by shared Brownian factors and signed smooth-regime weights. The degeneracy motivates stabilized conditioning and, for stiff larger lifts, a Gaussian-bridge reconstruction sampler. Experiments on MNIST and CIFAR-10 show that persistent fractional perturbations with small Markovian lifts can improve score-based generation on MNIST and provide a promising extension to natural images, while the bridge sampler provides a stability mechanism for larger lifts.

2606.18068 2026-06-17 cs.AI New submission

Agentic AI-based Framework for Mitigating Premature Diagnostic Handoff and Silent Hallucination in Healthcare Applications

Divyansh Srivastava, Shreya Ghosh, Anshul Verma, Rajkumar Buyya

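Affiliations Distributed Systems (qCLOUDS) Lab, School of Computing and Information Systems, The University of Melbourne, Australia; Department of Computer Science and Engineering, School of Electrical and Computer Sciences (SECS), Indian Institute of Technology Bhubaneswar, India; Department of Computer Science, Banaras Hindu University, Varanasi, India

AI summary Proposes a multi-agent framework that uses deterministic orchestration constraints and two safety mechanisms (a neuro-symbolic state-tracking gate and a semantic-entropy uncertainty quantification gate) to address premature diagnostic handoff and silent hallucination in LLM medical dialogue, improving diagnostic precision by 11.3 percentage points.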

English abstract

Recent advances in Large Language Models (LLMs) and multi-agent systems have driven the rise of Agentic AI, showing promise for medical reasoning. However, open-ended conversational agents remain prone to two critical failure modes: premature diagnostic handoff and silent clinical hallucinations that may go undetected before reaching the patient. In this work, we propose a multi-agent framework that addresses both issues by replacing "LLM-as-a-judge" routing with deterministic orchestration constraints. The framework incorporates two safety mechanisms. First, a neuro-symbolic state-tracking gate enforces completeness of the OLDCARTS clinical protocol (Onset, Location, Duration, Character, Aggravating/Alleviating factors, Radiation, Timing, and Severity) by blocking diagnostic transitions until all required dimensions are collected. Second, an epistemic uncertainty quantification (UQ) gate computes semantic entropy (H) across K=5 independent diagnostic samples to identify and intercept divergent outputs before delivery. We evaluate the system using simulated patient agents powered by the llama-3.1-70b-instruct model on 150 test cases. The full architecture achieves 49.3% diagnostic precision, representing an absolute improvement of 11.3 percentage points over an unconstrained baseline. Additionally, we observe a statistically significant negative correlation (r = -0.181, p < 0.05) between OLDCARTS completeness (σ) and semantic entropy (H), suggesting that structured information gathering is associated with reduced diagnostic uncertainty.
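The two gates described above might be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the K sampled diagnoses have already been grouped into semantic clusters (the clustering step is not shown), and the dimension keys, function names, and the `max_entropy` threshold of 0.7 are hypothetical.

```python
import math
from collections import Counter

# The eight OLDCARTS dimensions named in the abstract (key names are illustrative).
OLDCARTS = ("onset", "location", "duration", "character",
            "aggravating_alleviating", "radiation", "timing", "severity")

def completeness(collected):
    """Fraction of OLDCARTS dimensions that have a non-empty value."""
    filled = sum(1 for d in OLDCARTS if collected.get(d))
    return filled / len(OLDCARTS)

def semantic_entropy(cluster_labels):
    """Shannon entropy (in nats) over semantic-cluster labels of sampled diagnoses."""
    counts = Counter(cluster_labels)
    n = len(cluster_labels)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def may_deliver(collected, cluster_labels, max_entropy=0.7):
    """Allow delivery only if the protocol is complete and the samples agree."""
    if completeness(collected) < 1.0:
        return False  # state-tracking gate: block until every dimension is collected
    return semantic_entropy(cluster_labels) <= max_entropy  # uncertainty gate
```

With K=5 samples, five identical diagnoses give an entropy of 0, while a 3/1/1 split gives about 0.95 nats and would be intercepted under the threshold assumed here.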

2606.18066 2026-06-17 cs.LG New submission

NoiseTilt: Noise-Tilted Reverse Kernels for Diffusion Reward Alignment

Jisung Hwang, Yunhong Min, Jaihoon Kim, I-Chao Shen, Minhyuk Sung

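Affiliations KAIST; The University of Tokyo

AI summary Proposes the Noise-Tilted Reverse Kernel (NTRK), which performs reward-guided sampling by injecting reward gradients into the noise term, leaves the pretrained reverse kernel unchanged, and needs only one sample per step; it outperforms existing methods on reward alignment tasks without losing sample quality.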

Comments 52 pages

English abstract

We introduce the Noise-Tilted Reverse Kernel (NTRK), a reward-guided diffusion sampler that injects reward gradients through the noise term, leaving the pretrained reverse kernel unchanged and requiring only a single sample per step. Reward-guided sampling at inference time has greatly expanded the versatility of pretrained diffusion models. Yet existing methods face a trade-off. Gradient-based guidance shifts the reverse mean, steering generation but pushing intermediate states outside the region that the model was trained on and degrading quality. Search-based methods preserve quality but gain no gradient signal. No prior method achieves both. NTRK resolves this by keeping the reverse mean fixed and biasing the noise term toward high reward. We introduce a whitening operator, the central mechanism behind NTRK, that makes the reward gradient safe to inject as noise without losing its guiding signal. Across various reward alignment tasks, NTRK outperforms recent state-of-the-art baselines without losing sample quality. Remarkably, on aesthetic generation, NTRK surpasses the reward of the best baseline at 500 NFEs using only 25 NFEs, a 20$\times$ reduction in compute.

2606.18063 2026-06-17 cs.CV cs.AI cs.LG New submission

When LLMs Analyze Scars: From Images to Clinically-Meaningful Features

Ruman Wang, Hangting Ye

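Affiliations Liaoning University of Traditional Chinese Medicine; School of Artificial Intelligence, Jilin University

AI summary Proposes the ScaFE framework, which uses an LLM as a knowledge-driven feature engineer to turn high-dimensional images into low-dimensional, clinically interpretable features, outperforming end-to-end deep learning on data-scarce scar classification.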

English abstract

Medical image classification faces a fundamental dilemma: while deep learning models achieve remarkable performance at scale, real-world clinical scenarios often suffer from severe data scarcity due to annotation costs, privacy constraints, and disease rarity. This challenge is particularly pronounced in pathological scar classification, where differentiating keloids from hypertrophic scars requires subtle expert knowledge and labeled images are extremely limited. We propose a novel paradigm that repositions large language models (LLMs) as knowledge-driven feature engineers rather than end-to-end classifiers. We call this framework ScaFE (Scar Feature Engineering). Our key insight is that LLMs encode rich medical knowledge that can be externalized as executable feature extraction code, enabling the transformation of high-dimensional images into low-dimensional, clinically interpretable representations. Specifically, we prompt an LLM with established scar assessment criteria to generate deterministic Python code that extracts features aligned with clinical scoring systems such as the Vancouver Scar Scale. Our approach offers three key advantages: (1) data efficiency, achieving robust performance with limited training samples by decoupling knowledge acquisition from statistical learning; (2) privacy preservation, as raw images are processed locally without exposure to external LLMs; and (3) interpretability, through explicit features grounded in clinical reasoning. Extensive experiments on scar classification demonstrate that our method consistently outperforms end-to-end deep learning baselines or using LLMs as black-box classifiers under limited data conditions, establishing a promising direction for integrating LLMs into data-efficient and clinically transparent medical AI systems.

2606.18062 2026-06-17 cs.CL cs.AI cs.CR cs.HC New submission

Security and Privacy Prompts in the Wild: What Users Ask LLMs and How LLMs Respond

Hobin Kim, Xiaoyuan Wu, Omer Akgul, Lujo Bauer, Nicolas Christin

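Affiliations Carnegie Mellon University; RSAC Labs

AI summary Using the WildChat dataset, analyzes the security and privacy questions users ask large language models, categorizes them, and evaluates the quality and consistency of model responses.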

English abstract

Large language models (LLMs) are widely used to fulfill users' information needs; users ask LLMs about the weather, pose educational questions, and consult them for legal assistance. One particularly understudied area is digital security and privacy (S&P), where users may seek LLMs' help on how to secure their online accounts or protect their computers from cyber attacks. To the best of our knowledge, no prior study has collected or analyzed the S&P questions users ask LLMs; prior research on LLM response quality relied on expert-authored S&P misconceptions or FAQs rather than user queries. Drawing from WildChat, a dataset of 3.2M user-LLM conversations collected in the wild, our study identifies 14,727 S&P prompts and categorizes them into nine categories covering a wide range of S&P topics. From the S&P prompts, we sampled 450 and performed a thematic analysis to characterize the S&P questions users ask LLMs. Separate from the thematic analysis, we curated 270 advice-seeking S&P prompts, where users ask for recommendations, guidance, or specific S&P information. We measured LLM response quality and consistency when posing the prompt to LLMs 10 times. We found that commercial LLMs outperform open-weight models (GPT 5.5 provided "good enough" responses on 98% of prompts; Llama 4 on 47%). However, among prompts that received high-quality responses on average, commercial models sometimes produce contradictory responses across runs, risking confusing or misleading users.

2606.18060 2026-06-17 cs.AI cs.CL New submission

PseudoBench: Measuring How Agentic Auto-Research Fuels Pseudoscience

Xinyang Liao, Lingyu Li, Huacan Liu, Tianle Gu, Yang Yao, Tong Zhu, Yan Teng, Yingchun Wang

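Affiliations Shanghai Artificial Intelligence Laboratory; Xi’an Jiao Tong University; Shanghai Jiao Tong University

AI summary Proposes the PseudoBench benchmark, which uses 200 pseudoscientific claim-evidence pairs to evaluate whether AI agents can identify and resist pseudoscience, finding that current systems readily generate persuasive pseudoscientific reports with near-zero refusal rates.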

Comments 26 pages, 21 figures

English abstract

As Large Language Model based agents enter autonomous scientific research, their ability to resist pseudoscience becomes increasingly important. Otherwise, such systems may rapidly generate plausible yet misleading studies that contaminate academic literature and erode trust in science. We present PseudoBench, an adversarial benchmark for evaluating whether agentic auto-research systems can identify and resist pseudoscientific narratives. PseudoBench contains 200 curated pseudoscientific claim-evidence pairs across five domains and evaluates agents through an end-to-end research pipeline from experiments to writing. Testing seven state-of-the-art agents, we find that current systems readily produce persuasive reports that align with pseudoscientific premises with near-zero refusal rates and the highest resistance of only 27.4%. Stronger agents risk packaging pseudoscience in more sophisticated scientific language, increasing its apparent credibility. These findings reveal an alarming capacity to fuel pseudoscience, calling for scientific alignment before widespread deployment.

2606.18056 2026-06-17 cs.CL 新提交

ConSA: Controllable Sparsity in Hybrid Attention via Learnable Allocation

ConSA: 通过可学习分配实现混合注意力中的可控稀疏性

Yao Chen, Yinqi Yang, Junyuan Shang, Xiangzhao Hao, Simeng Zhang, Yilong Chen, Tingwen Liu, Shuohuan Wang, Dianhai Yu

发表机构 * Institute of Information Engineering, Chinese Academy of Sciences(中国科学院信息工程研究所) School of Cyber Security, University of Chinese Academy of Sciences(中国科学院大学网络空间安全学院) Baidu Inc.(百度公司)

AI总结 提出ConSA框架,通过L0正则化和增广拉格朗日约束学习全注意与滑动窗口注意的最优分配,实现用户指定的稀疏目标,在0.6B和1.7B规模LLM上优于规则基线,并发现底层SWA与中层FA的集中分配模式。

详情
AI中文摘要

结合全注意(FA)和滑动窗口注意(SWA)的混合架构是高效LLM推理的一种有前景的范式。然而,现有方法通常依赖手工规则或简单的后验启发式进行FA/SWA分配,并且对这些设计背后的注意行为分析有限。我们提出混合注意力中的可控稀疏性(ConSA),一个在用户指定稀疏目标下学习最优FA/SWA分配的框架。ConSA采用L0正则化学习选择每个注意力单元FA或SWA的二元掩码,同时增广拉格朗日约束在层或KV头粒度上强制执行目标稀疏性。我们在0.6B和1.7B规模的两个LLM上评估ConSA。学习到的分配一致优于基于规则的基线,其中KV头级分配比层级分配带来明显增益。学习到的模式将SWA置于底层,并将FA集中在连续的中间层块中,这与基于规则方法中均匀交错模式不同。这种结构在模型规模、稀疏级别和分配粒度上持续存在,揭示了学习分配背后细粒度的内在注意行为谱。

英文摘要

Hybrid architectures combining full attention (FA) and sliding-window attention (SWA) are a promising paradigm for efficient LLM inference. However, existing methods typically rely on hand-crafted rules or simple post-hoc heuristics for FA/SWA allocation and offer limited analysis of the attention behaviors underlying these designs. We propose Controllable Sparsity in Hybrid Attention (ConSA), a framework that learns optimal FA/SWA assignment under a user-specified sparsity target. ConSA employs L0 regularization to learn binary masks selecting between FA and SWA for each attention unit, while an augmented Lagrangian constraint enforces the target sparsity at either layer or KV-head granularity. We evaluate ConSA on two LLMs at the 0.6B and 1.7B scales. Learned allocations consistently outperform rule-based baselines, with KV-head-wise allocation yielding clear gains over layer-wise allocation. The learned patterns place SWA in the bottom layers and concentrate FA into contiguous middle-layer blocks, diverging from evenly interleaved patterns in rule-based methods. This structure persists across model scales, sparsity levels, and allocation granularities, revealing a fine-grained spectrum of intrinsic attention behaviors that underlies the learned allocation.
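The abstract names two ingredients, L0-regularized binary masks and an augmented Lagrangian constraint on sparsity, without giving formulas. The sketch below illustrates both under stated assumptions: it uses the common hard-concrete relaxation for the L0 gates, treats gate = 1 as FA and gate = 0 as SWA, and defines sparsity as the expected SWA fraction. The function names and hyperparameters are illustrative and are not taken from the paper.

```python
import numpy as np

# Hard-concrete hyperparameters (common defaults; an assumption here).
BETA, GAMMA, ZETA = 2.0 / 3.0, -0.1, 1.1

def hard_concrete_gate(log_alpha, u):
    """Sample a relaxed gate in [0, 1] from uniform noise u in (0, 1)."""
    s = 1.0 / (1.0 + np.exp(-(np.log(u) - np.log(1.0 - u) + log_alpha) / BETA))
    return np.clip(s * (ZETA - GAMMA) + GAMMA, 0.0, 1.0)

def expected_gate_on(log_alpha):
    """Probability that a gate is non-zero (the differentiable L0 term)."""
    return 1.0 / (1.0 + np.exp(-(log_alpha - BETA * np.log(-GAMMA / ZETA))))

def expected_swa_fraction(log_alpha):
    """Assumed sparsity measure: expected share of units assigned to SWA."""
    return 1.0 - float(np.mean(expected_gate_on(log_alpha)))

def augmented_lagrangian(sparsity, target, lam, rho):
    """Penalty lam * gap + (rho / 2) * gap^2 with gap = sparsity - target."""
    gap = sparsity - target
    return lam * gap + 0.5 * rho * gap ** 2

# Toy example: four attention units, two leaning FA and two leaning SWA.
log_alpha = np.array([-10.0, -10.0, 10.0, 10.0])
rng = np.random.default_rng(0)
gates = hard_concrete_gate(log_alpha, rng.uniform(0.01, 0.99, size=4))
penalty = augmented_lagrangian(expected_swa_fraction(log_alpha), 0.5, lam=1.0, rho=10.0)
```

In a training loop the penalty would be added to the task loss and `lam` updated from the constraint gap; that outer loop is omitted here.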

2606.18053 2026-06-17 cs.RO 新提交

A Hybrid Optimization Framework for Grasp Synthesis under Partial Observations

一种用于部分观测下抓取合成的混合优化框架

Wenzheng Zhang, Fahira Afzal Maken, Tin Lai, Fabio Ramos

发表机构 * School of Computer Science, The University of Sydney(悉尼大学计算机科学学院) Data61, CSIRO(澳大利亚联邦科学与工业研究组织Data61) NVIDIA(英伟达)

AI总结 提出结合基于学习的能量模型与解析迭代最近点方法的混合框架,从部分观测点云生成鲁棒抓取,在67个物体5360次抓取尝试中平均成功率达60.9%,优于现有方法。

详情
AI中文摘要

我们提出一种混合抓取合成框架,该框架将基于学习的能量模型(EBM)与解析迭代最近点(ICP)方法相结合,以从部分观测的点云生成鲁棒抓取。学习到的能量函数在Stein变分梯度下降(SVGD)框架中充当先验,指导抓取配置的迭代优化。在67个物体上的5360次抓取尝试评估中,我们的方法实现了60.9%的平均成功率,优于AnyGrasp(31.1%)、抓取姿态检测(48.4%)和AS-ICP(56.6%)。这些结果突显了我们方法的强泛化能力,并展示了将数据驱动学习与几何优化相结合如何解决单独使用任一策略的局限性。

英文摘要

We propose a hybrid grasp synthesis framework that combines a learning-based Energy-Based Model (EBM) with an analytical Iterative Closest Point (ICP) method to generate robust grasps from partially observed point clouds. The learned energy function acts as a prior within a Stein Variational Gradient Descent (SVGD) framework, guiding iterative refinement of grasp configurations. Evaluated on 67 objects with 5,360 grasp attempts, our method achieves an average success rate of 60.9%, outperforming AnyGrasp (31.1%), Grasp Pose Detection (48.4%), and AS-ICP (56.6%). These results highlight the strong generalization ability of our approach and demonstrate how combining data-driven learning with geometric optimization addresses the limitations of either strategy in isolation.
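To make the SVGD refinement step concrete, here is a minimal sketch of the standard SVGD update with an RBF kernel, where the score is the negative gradient of an energy function. The quadratic toy energy, the 2-D "configurations", the fixed bandwidth, and the step size are all assumptions for illustration; the paper's learned EBM and ICP term are not reproduced.

```python
import numpy as np

def rbf_kernel(x, h):
    """Return k[j, i] = k(x_j, x_i) and grad_k[j, i] = d k(x_j, x_i) / d x_j."""
    diff = x[:, None, :] - x[None, :, :]          # diff[j, i] = x_j - x_i
    k = np.exp(-np.sum(diff ** 2, axis=-1) / h)
    grad_k = -2.0 / h * diff * k[:, :, None]
    return k, grad_k

def svgd_step(x, grad_log_p, step=0.1, h=1.0):
    """One SVGD update: kernel-weighted score (attraction) plus repulsion."""
    k, grad_k = rbf_kernel(x, h)
    phi = (k.T @ grad_log_p(x) + grad_k.sum(axis=0)) / x.shape[0]
    return x + step * phi

# Toy energy E(x) = 0.5 * ||x - mu||^2, so grad log p(x) = -(x - mu).
rng = np.random.default_rng(0)
mu = np.array([2.0, -1.0])
particles = rng.normal(size=(20, 2))
for _ in range(500):
    particles = svgd_step(particles, lambda x: -(x - mu))
```

The repulsion term keeps the particles spread out, which is why SVGD returns a diverse set of candidates rather than collapsing onto a single mode.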

2606.18051 2026-06-17 cs.CL 新提交

Compositional Skill Routing for LLM Agents: Decompose, Retrieve, and Compose

面向LLM智能体的组合技能路由:分解、检索与组合

Xueping Gao

发表机构 * Alibaba Cloud(阿里云)

AI总结 提出SkillWeaver框架,通过迭代技能感知分解(SAD)将复杂查询分解为原子子任务,检索对应技能并组合为可执行计划,在CompSkillBench上验证了分解质量是主要瓶颈,SAD将分解准确率提升32.7%。

详情
AI中文摘要

LLM智能体越来越依赖外部技能(可复用的工具规范),但现实任务通常需要组合多种技能,而不仅仅是选择一种。我们将此形式化为组合技能路由问题:给定一个复杂的用户查询和一个大型技能库,将查询分解为原子子任务,为每个子任务检索适当的技能,并组合成一个可执行的计划。我们提出了SkillWeaver,一个分解-检索-组合框架,结合了LLM任务分解器、带有FAISS索引的双编码器技能检索器以及依赖感知的DAG规划器。为了支持评估,我们引入了CompSkillBench,这是一个包含300个组合查询的基准测试,覆盖来自公共MCP生态系统的2,209个真实MCP服务器技能,跨越24个功能类别。我们的实验表明,任务分解质量是主要瓶颈:标准LLM分解在步骤级别仅达到34.2%的类别召回率。为了解决这个问题,我们提出了迭代技能感知分解(SAD),这是一种检索增强的反馈循环,能够迭代地将分解与可用技能对齐。SAD在单次迭代中将分解准确率从51.0%提高到67.7%(+32.7%,Wilcoxon p < 10^-6);DA条件分析证实,正确的粒度是有效检索的前提(当DA=1时,CatR@1从34%上升到41%)。SkillWeaver将上下文窗口消耗减少了99%以上,迁移实验证实了泛化能力(即使目标类别不在检索池中,相对DA增益仍达到+35.6%)。

英文摘要

LLM agents increasingly rely on external skills -- reusable tool specifications -- but real-world tasks often require composing multiple skills, not just selecting one. We formalize this as the Compositional Skill Routing problem: given a complex user query and a large skill library, decompose the query into atomic sub-tasks, retrieve the appropriate skill for each sub-task, and compose an executable plan. We present SkillWeaver, a decompose-retrieve-compose framework combining an LLM task decomposer, a bi-encoder skill retriever with FAISS indexing, and a dependency-aware DAG planner. To support evaluation, we introduce CompSkillBench, a benchmark of 300 compositional queries over 2,209 real MCP server skills spanning 24 functional categories, sourced from the public MCP ecosystem. Our experiments reveal that task decomposition quality is the primary bottleneck: standard LLM decomposition reaches only 34.2% category recall at the step level. To address this, we propose Iterative Skill-Aware Decomposition (SAD), a retrieval-augmented feedback loop that iteratively aligns decomposition with available skills. SAD improves decomposition accuracy from 51.0% to 67.7% (+32.7%, Wilcoxon p < 10^-6) in a single iteration; DA-conditioned analysis confirms that correct granularity is the prerequisite for effective retrieval (CatR@1 rises from 34% to 41% when DA=1). SkillWeaver reduces context window consumption by over 99%, and transfer experiments confirm generalization (+35.6% relative DA gain even when target categories are absent from the retrieval pool).
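The decompose-retrieve-compose flow can be sketched in a few lines, assuming the decomposition is already given. Brute-force cosine similarity in NumPy stands in for the FAISS-indexed bi-encoder, and the standard-library `graphlib` stands in for the dependency-aware DAG planner. The sub-task names, skill names, and 3-D embeddings are made up for illustration. The last line reproduces the relative gain quoted in the abstract.

```python
from graphlib import TopologicalSorter
import numpy as np

def retrieve(query_vec, skill_vecs, skill_names):
    """Return the skill whose embedding has the highest cosine similarity."""
    q = query_vec / np.linalg.norm(query_vec)
    s = skill_vecs / np.linalg.norm(skill_vecs, axis=1, keepdims=True)
    return skill_names[int(np.argmax(s @ q))]

def compose(assignments, depends_on):
    """Order (sub-task, skill) pairs so that dependencies run first."""
    order = TopologicalSorter(depends_on).static_order()
    return [(task, assignments[task]) for task in order]

# Hypothetical skill library and sub-task embeddings.
skill_names = ["web_search", "summarizer", "email_sender"]
skill_vecs = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
subtask_vecs = {
    "fetch": np.array([0.9, 0.1, 0.0]),
    "summarize": np.array([0.1, 0.8, 0.1]),
    "email": np.array([0.0, 0.2, 0.9]),
}
depends_on = {"fetch": set(), "summarize": {"fetch"}, "email": {"summarize"}}

assignments = {t: retrieve(v, skill_vecs, skill_names) for t, v in subtask_vecs.items()}
plan = compose(assignments, depends_on)

# Relative gain in decomposition accuracy: (67.7 - 51.0) / 51.0, about 32.7%.
relative_gain = round((67.7 - 51.0) / 51.0 * 100, 1)
```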

2606.18049 2026-06-17 cs.LG 新提交

ConTex: Reformulating Counterfactual Generation For Time Series Forecasting

ConTex:重新定义时间序列预测的反事实生成

Jan Voets, Hasan Tercan, Tobias Meisen, Sebastian Baum

发表机构 * Institute for Technologies and Management of Digital Transformation, University of Wuppertal(伍珀塔尔大学数字转型技术与管理研究所)

AI总结 针对时间序列预测中反事实解释的不一致和高计算成本问题,提出ConTex模型,通过全局一致的干预策略实现单次前向传播生成稀疏反事实,显著降低计算成本并支持实时应用。

Comments 19 pages, 5 figures, 14 tables

详情
AI中文摘要

基于深度学习的时间序列预测的决策制定不仅需要准确的预测,还需要可操作的见解。然而,当前的架构本身并不提供此类信息。具体来说,需要指导如何修改当前条件,以便从预测结果转向期望的未来情景。反事实解释为此任务提供了自然框架,因为它们表示改变模型预测的最小输入变化,指示何时以及如何进行干预。现有方法依赖于实例级优化,导致跨实例的不一致性、高计算成本以及在实时环境中的有限适用性。为了解决这些限制,我们将时间序列预测的反事实生成重新定义为学习全局一致的干预策略的问题,允许通过单个共享函数生成反事实。我们提出了反事实时间序列解释(ConTex),一种模型无关的解耦架构,包括时间上下文编码器和条件编码器,后接两个头部,分别捕获时间相关性和修改强度方面的干预。这种结构通过单次前向传播在时间和特征维度上产生有针对性的、可解释的干预,克服了基于实例的方法的不稳定性和不一致性,使其适用于实时应用。在多个预测架构和基准数据集上,ConTex在生成稀疏反事实的同时实现了最先进的有效性,最小化了必要干预的数量。此外,与实例级生成相比,我们的方法将计算成本降低了至少12-36倍,并支持约0.007秒的实时推理。

英文摘要

Decision-making with deep learning-based time series forecasting requires not only accurate predictions but also actionable insights. However, current architectures do not inherently provide such information. Specifically, guidance is needed on how current conditions must be modified to shift from a predicted outcome to a desired future scenario. Counterfactual explanations provide a natural framework for this task, as they represent minimal input changes that alter the model's prediction, indicating when and how intervention is required. Existing approaches rely on instance-wise optimization, leading to inconsistency across instances, high computational costs, and limited applicability in real-time settings. To address these limitations, we reformulate counterfactual generation for time series forecasting as the problem of learning a globally consistent intervention strategy, allowing counterfactuals to be generated through a single shared function. We propose Counterfactual Time Series Explanations (ConTex), a model-agnostic, decomposed architecture comprising a temporal context encoder and a conditional encoder, followed by two heads that capture interventions in terms of temporal relevance and modification strength. This structure overcomes the instability and inconsistency of instance-based approaches by producing targeted, interpretable interventions across time and feature dimensions in a single forward pass, making it suitable for real-time applications. Across multiple forecasting architectures and benchmark datasets, ConTex achieves state-of-the-art validity while generating sparse counterfactuals that minimize the number of necessary interventions. Additionally, our approach reduces computational cost by at least 12-36x compared to instance-wise generation and supports real-time inference at approximately 0.007 seconds.

2606.18043 2026-06-17 cs.RO cs.LG 新提交

Uncertainty Quantification for Flow-Based Vision-Language-Action Models

基于流的视觉-语言-动作模型的不确定性量化

Ralf Römer, Maximilian Seeliger, Saida Liu, Ben Sturgis, Marco Bagatella, Daniel Marta, Andreas Krause, Angela P. Schoellig

发表机构 * TU Munich(慕尼黑工业大学) ETH Zurich(苏黎世联邦理工学院) MPI IS Tübingen(马克斯·普朗克智能系统研究所)

AI总结 提出利用速度场差异(VFD)量化流匹配模型中的认知不确定性,用于故障检测和主动微调,在LIBERO基准上实现高效任务适应。

Comments Project page: http://tum-lsy.github.io/uq_vla/. 28 pages, 12 figures

详情
AI中文摘要

视觉-语言-动作模型(VLAs)将视觉-语言骨干网络与通过大规模机器人数据集上的流匹配训练的生成式动作头相结合。尽管在机器人操作中表现出强大的经验性能,但VLAs缺乏量化其预测置信度和检测动作可能不可靠的机制。这对于在非平稳环境中的实际部署构成了关键限制,因为模型不可避免地会遇到其预训练分布之外的场景,并可能在没有警告的情况下失败。为了解决这个问题,我们通过利用小集成中的速度场差异(VFD),推导出一种量化流匹配模型中认知不确定性的高效方法。我们成功地将这种不确定性估计用于部署期间的故障检测和基于流的VLA的主动微调。为此,我们提出了SAVE,一个不确定性引导的主动多任务微调框架,减少了将VLA适应新任务所需的高成本专家演示数量。通过在LIBERO基准上的广泛实验,我们证明VFD能产生更校准的不确定性估计,预测下游性能,VFD在检测故障方面表现出色,并且使用SAVE进行不确定性引导的数据采集所需的样本比基线至少少22%。总之,我们的工作表明,量化基于流的VLA中的认知不确定性既提高了故障感知能力,也提高了适应性。项目网站:此http URL。

英文摘要

Vision-language-action models (VLAs) combine vision-language backbones with expressive generative action heads trained via flow matching on large-scale robotic datasets. Despite their strong empirical performance in robotic manipulation, VLAs lack mechanisms to quantify confidence in their predictions and to detect when their actions may be unreliable. This presents a critical limitation for real-world deployment in non-stationary environments, where models inevitably encounter scenarios outside their pretraining distribution and may fail without warning. To address this, we derive an efficient method for quantifying epistemic uncertainty in flow-matching models by leveraging velocity-field disagreement (VFD) across a small ensemble. We successfully use this uncertainty estimate for failure detection during deployment and active fine-tuning of flow-based VLAs. To this end, we propose SAVE, a framework for uncertainty-guided active multitask fine-tuning that reduces the number of costly expert demonstrations required to adapt VLAs to new tasks. Through extensive experiments on the LIBERO benchmark, we demonstrate that VFD yields better-calibrated uncertainty estimates predictive of downstream performance, that VFD achieves strong performance in detecting failures, and that uncertainty-guided data acquisition with SAVE requires at least 22% fewer samples than baselines. In summary, our work shows that quantifying epistemic uncertainty in flow-based VLAs improves both failure awareness and adaptation. Project website: this http URL.
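The abstract does not give the VFD formula, so the following is only one plausible reading: evaluate each ensemble member's velocity field at the same state and flow time, then measure how far the members deviate from their mean. The linear toy velocity fields and the function name are hypothetical.

```python
import numpy as np

def velocity_field_disagreement(velocity_fns, x, t):
    """Mean squared deviation of ensemble velocities from the ensemble mean."""
    v = np.stack([f(x, t) for f in velocity_fns])        # shape (K, d)
    return float(np.mean(np.sum((v - v.mean(axis=0)) ** 2, axis=-1)))

# Toy ensemble of linear velocity fields v_k(x, t) = A_k @ x (illustrative).
rng = np.random.default_rng(0)
base = rng.normal(size=(3, 3))
agreeing = [lambda x, t, a=base: a @ x for _ in range(4)]
disagreeing = [lambda x, t, a=base + 0.5 * rng.normal(size=(3, 3)): a @ x
               for _ in range(4)]

x0 = np.array([1.0, -0.5, 0.25])
low = velocity_field_disagreement(agreeing, x0, t=0.3)
high = velocity_field_disagreement(disagreeing, x0, t=0.3)
```

A deployment-time monitor could flag a state as potentially unreliable when this score exceeds a threshold calibrated on in-distribution data; the threshold choice is outside this sketch.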

2606.18037 2026-06-17 cs.AI cs.CL cs.MA 新提交

ProvenanceGuard: Source-Aware Factuality Verification for MCP-Based LLM Agents

ProvenanceGuard: 基于MCP的LLM智能体的源感知事实性验证

Ander Alvarez, Santhiya Rajan, Samuel Mugel, Román Orús

发表机构 * Multiverse Computing Parque Cientifico y Tecnológico de Gipuzkoa(吉普斯夸科技园) Centre for Social Innovation(社会创新中心) Donostia International Physics Center(多诺斯蒂亚国际物理中心) Ikerbasque Foundation for Science(伊克尔巴斯克科学基金会)

AI总结 提出ProvenanceGuard,一种源感知验证器,通过追踪MCP工具调用、分解声明并路由到特定源,检测跨源混淆错误,在医疗领域数据集上优于源无关基线。

Comments 20 pages, 4 figures

详情
AI中文摘要

使用工具的LLM智能体越来越多地采用模型上下文协议(MCP)从异构证据源(包括搜索、API、数据库、临床记录和处方工具)中获取答案。标准的事实性指标通常测试答案是否得到汇总证据的支持,但忽略了一种源感知的失败模式:一个声明可能在某个地方得到支持,却被归因于错误的来源。我们称此为跨源混淆。我们引入ProvenanceGuard,一种用于MCP基础答案的源感知验证器。它消耗捕获的MCP轨迹,包含稳定的工具ID、源ID和原始输出;将答案分解为原子声明;将声明路由到特定源的证据;使用NLI和令牌对齐代理检查支持度;比较声明的归属与路由的源;并返回每个声明的判定以及答案级别的允许/阻止决策。被阻止的答案可以通过检索增强的答案修订和重新验证来修复。我们在281个医疗领域的MCP智能体轨迹上进行评估。一个包含266条轨迹的裁定子集产生了2,325个由LLM辅助的声明标签(按轨迹划分);361个保留标签由人工验证。在40条轨迹的保留子集上,ProvenanceGuard在260个符合源条件的声明上实现了阻止F1分数0.802和源准确率0.858,优于不输出声明到源ID的源无关基线。在一个更困难的多源基准上,它达到了阻止F1分数0.846,而源加关系准确率降至0.229,表明在语义相近的源上精确的源归属仍然困难。修复和重新验证解决了完整轨迹集中的所有被阻止答案,通常通过保守回退。在50个受控的临床混淆探测中,ProvenanceGuard检测到所有注入的归属交换,没有保留错误的归属。这些结果表明,源归属是基于MCP的智能体事实性验证的一个独立维度。

英文摘要

Tool-using LLM agents increasingly use the Model Context Protocol (MCP) to answer from heterogeneous evidence sources, including search, APIs, databases, clinical records, and formulary tools. Standard factuality metrics usually test whether an answer is supported by pooled evidence, missing a provenance-sensitive failure mode: a claim may be supported somewhere while being attributed to the wrong source. We call this cross-source conflation. We introduce ProvenanceGuard, a source-aware verifier for MCP-grounded answers. It consumes captured MCP traces with stable tool IDs, source IDs, and raw outputs; decomposes answers into atomic claims; routes claims to source-specific evidence; checks support with NLI and a token-alignment proxy; compares stated attribution with the routed source; and returns per-claim verdicts plus an answer-level allow/block decision. Blocked answers can be repaired with retrieval-augmented answer revision and re-verified. We evaluate on 281 medical-domain MCP-agent traces. A 266-trace adjudicated subset yields 2,325 LLM-assisted claim labels split by trace; 361 held-out labels are human-verified. On the 40-trace held-out split, ProvenanceGuard achieves block F1 0.802 and source accuracy 0.858 over 260 source-eligible claims, outperforming source-blind baselines that do not emit claim-to-source IDs. On a harder multi-source benchmark it reaches block F1 0.846, while source-plus-relation accuracy drops to 0.229, showing that exact source ownership remains difficult with semantically close sources. Repair-and-reverify resolves all blocked answers in the full trace set, often via conservative fallback. In 50 controlled clinical conflation probes, ProvenanceGuard detects all injected attribution swaps with no retained wrong attribution. These results show that source attribution is an independent axis for factuality verification in MCP-based agents.
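The cross-source conflation check can be illustrated with a deliberately simplified verifier: route each claim to the source that supports it best, then compare that source with the one the answer attributes it to. Token overlap alone replaces the paper's NLI plus token-alignment check, and the threshold, verdict labels, and example strings are invented for illustration.

```python
def token_overlap(claim, evidence):
    """Share of claim tokens that also appear in the evidence text."""
    c, e = set(claim.lower().split()), set(evidence.lower().split())
    return len(c & e) / len(c) if c else 0.0

def verify_claim(claim, stated_source, sources, threshold=0.6):
    """Return a per-claim verdict given source_id -> raw tool output."""
    scores = {sid: token_overlap(claim, text) for sid, text in sources.items()}
    routed = max(scores, key=scores.get)
    if scores[routed] < threshold:
        return "unsupported"
    if routed != stated_source:
        return "cross_source_conflation"   # supported, but by another source
    return "supported"

def answer_decision(verdicts):
    """Allow the answer only if every claim is supported by its stated source."""
    return "allow" if all(v == "supported" for v in verdicts) else "block"

# Made-up trace contents keyed by source ID.
sources = {
    "formulary": "metformin maximum daily dose is 2000 mg",
    "clinical_record": "patient reports mild nausea after meals",
}
verdicts = [
    verify_claim("patient reports mild nausea", "clinical_record", sources),
    verify_claim("metformin maximum daily dose is 2000 mg", "clinical_record", sources),
    verify_claim("patient has severe renal failure", "clinical_record", sources),
]
decision = answer_decision(verdicts)
```

The second claim is the case a pooled-evidence metric would miss: the text is supported somewhere in the trace, just not by the source it is attributed to.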

2606.18033 2026-06-17 cs.CL cs.AI 新提交

When English Isn't the Best Teacher: Source Language Effects in Cross-Lingual In-Context Learning

当英语不是最好的老师:跨语言上下文学习中的源语言效应

Fred Philippy, Siwen Guo, Jacques Klein, Tegawendé F. Bissyandé

发表机构 * SnT, University of Luxembourg(卢森堡大学安全、可靠性与信任跨学科中心) Luxembourg Institute of Science and Technology(卢森堡科学技术研究院)

AI总结 研究跨语言上下文学习(ICL)中源语言选择的影响,发现基于微调的预期在ICL中不成立,提出有效选择源语言的替代启发式方法。

Comments Accepted at 1st Workshop on Multilinguality in the Era of Large Language Models (MeLLM 2026), co-located with ACL 2026

详情
AI中文摘要

跨语言迁移在多语言自然语言处理中已在监督微调背景下得到广泛探索,其中数据可用性和语言相似性等因素在很大程度上决定了迁移质量。随着该领域转向少样本上下文学习(ICL),人们通常认为微调中的见解会原封不动地延续下来。然而,这一假设尚未经过严格评估,因此如何为跨语言ICL选择源语言的问题仍然悬而未决。我们对ICL中的跨语言迁移进行了广泛的实证研究,涵盖七项任务、六个模型和一组类型多样的语言。我们进一步分析了语言混淆,这是跨语言ICL中生成任务的关键障碍。我们的结果表明,基于微调的传统预期在ICL场景中并不一致适用,并指出了有效选择源语言的替代启发式方法。

英文摘要

Cross-lingual transfer in multilingual NLP has been widely explored in supervised fine-tuning contexts, where factors like data availability and linguistic similarity largely determine transfer quality. As the field shifts toward few-shot In-Context Learning (ICL), it is often presumed that insights from fine-tuning carry over unchanged. Yet this assumption has not been rigorously evaluated, leaving open the question of how to choose source languages for cross-lingual ICL. We conduct a broad empirical study of cross-lingual transfer in ICL spanning seven tasks, six models, and a typologically diverse set of languages. We further analyze language confusion, a key obstacle for generative tasks in cross-lingual ICL. Our results show that conventional fine-tuning-based expectations do not consistently apply in the ICL regime and point to alternative heuristics for selecting source languages effectively.

2606.18024 2026-06-17 cs.LG cs.AI 新提交

Catastrophic Forgetting is Low-Rank: A Function-Space Theory for Continual Adaptation

灾难性遗忘是低秩的:持续适应的函数空间理论

Ido Nitzan Hidekel, Dan Raviv

发表机构 * Tel Aviv University(特拉维夫大学)

AI总结 本文在神经正切核(NTK)框架下提出函数空间理论,推导出新任务训练导致旧任务预测漂移的闭式表达式,揭示遗忘集中在少量旧任务NTK本征模式上,并给出低秩特性与Kronecker缩放规则。

Comments Accepted to the ICML 2026 Workshop on Continual Adaptation at Scale: Towards Sustainable AI

详情
AI中文摘要

持续适应中的灾难性遗忘通常通过参数漂移、重放或蒸馏来研究,但这些观点未能识别哪些输出空间方向是脆弱的。我们在NTK机制下给出一个函数空间解释:新任务训练通过跨任务核诱导旧任务预测漂移,从而在新任务梯度步骤之前得到遗忘向量的闭式预测器。在冻结主干线性头PEFT-CL中,模型在可训练参数上是线性的,预测器精确到数值精度;对于非线性适配器/全微调,它是局部NTK近似。同一表达式揭示遗忘集中在少量旧任务NTK本征模式上,并在冻结线性头下给出脆弱秩的Kronecker缩放规则。这些结果澄清了与先前NTK重叠理论的关系,解释了为什么参数空间正则化器可能遗漏输出空间干扰,并激发了一种有针对性的谱正则化器。

英文摘要

Catastrophic forgetting in continual adaptation is usually studied through parameter drift, replay, or distillation, but these views do not identify which output-space directions are vulnerable. We give a function-space account in the NTK regime: new-task training induces old-task prediction drift through the cross-task kernel, yielding a closed-form predictor for the forgetting vector before any new-task gradient step. In frozen-backbone linear-head PEFT-CL, where the model is linear in the trainable parameters, the predictor is exact up to numerical precision; for nonlinear adapters/full fine-tuning, it is a local NTK approximation. The same expression reveals that forgetting concentrates in a small number of old-task NTK eigenmodes and under frozen linear heads gives a Kronecker scaling rule for the vulnerable rank. These results clarify the relation to prior NTK-overlap theory, explain why parameter-space regularizers can miss output-space interference, and motivate a targeted spectral regularizer.
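The claim that the predictor is exact for a frozen-backbone linear head can be checked numerically in a simplified setting. The sketch below assumes scalar outputs, squared loss, and a single gradient step with learning rate `eta`; the paper's closed form may cover more general training, so this is a one-step illustration rather than its exact expression. With frozen features, the cross-task kernel is `phi_old @ phi_new.T`, and the predicted old-task drift is `-eta * K_cross @ residual`.

```python
import numpy as np

rng = np.random.default_rng(0)
phi_old = rng.normal(size=(5, 8))   # frozen-backbone features, old-task inputs
phi_new = rng.normal(size=(4, 8))   # frozen-backbone features, new-task inputs
y_new = rng.normal(size=4)          # new-task targets
w = rng.normal(size=8)              # trainable linear head
eta = 0.01                          # learning rate

# Prediction made before the gradient step is taken.
residual = phi_new @ w - y_new
k_cross = phi_old @ phi_new.T                   # cross-task kernel, shape (5, 4)
predicted_drift = -eta * k_cross @ residual

# One actual gradient step on 0.5 * ||phi_new @ w - y_new||^2.
w_after = w - eta * phi_new.T @ residual
actual_drift = phi_old @ w_after - phi_old @ w

drift_matches = np.allclose(predicted_drift, actual_drift)
kernel_rank = np.linalg.matrix_rank(k_cross)
```

In this toy setup the drift always lies in the column space of `k_cross`, whose rank is at most the smaller of the two sample counts, which gives some intuition for why forgetting can concentrate in few directions.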

2606.18023 2026-06-17 cs.LG cs.AI 新提交

LoopCoder-v2: Only Loop Once for Efficient Test-Time Computation Scaling

LoopCoder-v2: 仅循环一次以实现高效的测试时计算扩展

Jian Yang, Shawn Guo, Wei Zhang, Tianyu Zheng, Yaxin Du, Haau-Sing Li, Jiajun Wu, Yue Song, Yan Xing, Qingsong Cai, Zelong Huang, Chuan Hao, Ran Tao, Xianglong Liu, Wayne Xin Zhao, Mingjie Tang, Weifeng Lv, Ming Zhou, Bryan Dai

发表机构 * Beihang University(北京航空航天大学) IQuest Research Langboat(澜舟科技) Renmin University of China(中国人民大学)

AI总结 本文研究并行循环Transformer(PLT)的循环次数选择,从头训练不同循环次数的7B LoopCoder-v2系列,发现两循环变体在代码生成等任务上显著提升,而三循环及以上性能下降,揭示了增益-成本权衡。

详情
AI中文摘要

循环Transformer通过重复应用共享块来扩展潜在计算,但顺序循环会随着循环次数增加延迟和KV缓存内存。并行循环Transformer(PLT)通过跨循环位置偏移(CLP)和共享KV门控滑动窗口注意力来缓解这一成本,使循环次数成为实际设计选择。因此,我们通过增益-成本视角研究PLT循环次数选择:额外的循环可能细化表示,但CLP在每个循环边界引入位置不匹配。我们通过从头训练LoopCoder-v2(一组具有不同循环次数的7B PLT编码器)在18T token上,随后进行匹配的指令调优和评估来实例化这项研究。经验上,两循环变体在代码生成、代码推理、代理软件工程和工具使用基准上比无循环基线带来广泛提升,将SWE-bench Verified从43.0提高到64.4分,Multi-SWE从14.0提高到31.0分。相比之下,三循环或更多循环的变体性能下降,揭示了强烈的非单调循环次数效应。我们的诊断表明,循环2提供了主要的生产性细化,而后续循环产生递减、振荡的更新和降低的表示多样性。由于CLP引起的不匹配在细化收益缩小时大致固定,偏移成本日益占主导。这种增益-成本权衡解释了PLT在两循环处饱和,并为循环次数选择提供了诊断。

英文摘要

Looped Transformers scale latent computation by repeatedly applying shared blocks, but sequential looping increases latency and KV-cache memory with the loop count. Parallel loop Transformers (PLT) alleviate this cost through cross-loop position offsets (CLP) and shared-KV gated sliding-window attention, making loop count a practical design choice. We therefore study PLT loop-count selection through a gain--cost view: an extra loop may refine representations, but CLP also introduces a positional mismatch at each loop boundary. We instantiate this study by training LoopCoder-v2, a family of 7B PLT coders with different loop counts, from scratch on 18T tokens, followed by matched instruction tuning and evaluation. Empirically, the two-loop variant delivers broad gains over the non-looped baseline across code generation, code reasoning, agentic software engineering, and tool-use benchmarks, improving SWE-bench Verified from 43.0 to 64.4 points and Multi-SWE from 14.0 to 31.0 points. In contrast, variants with three or more loops regress, revealing a strongly non-monotonic loop-count effect. Our diagnostics show that loop 2 provides the main productive refinement, while later loops yield diminishing, oscillatory updates and reduced representational diversity. Because the CLP-induced mismatch remains roughly fixed as refinement gains shrink, the offset cost increasingly dominates. This gain--cost trade-off explains PLT's saturation at two loops and provides diagnostics for loop-count selection.

2606.18022 2026-06-17 cs.LG 新提交

Recursive Scaling in Masked Diffusion Models

掩码扩散模型中的递归缩放

Alba Carballo-Castro, Julianna Piskorz, Paulius Rauba, Mihaela van der Schaar, Pascal Frossard

发表机构 * LTS4, EPFL, Lausanne, Switzerland(瑞士洛桑联邦理工学院LTS4实验室) University of Cambridge, Cambridge, UK(英国剑桥大学)

AI总结 提出递归掩码扩散模型(R-MDMs),通过在每个扩散步骤中重复应用同一去噪变换器增加递归深度,实现参数高效缩放,在数独和倒计时等结构化生成任务中,以更少参数匹配非递归基线性能。

详情
AI中文摘要

掩码扩散模型(MDMs)最近成为一种有前景的序列生成范式。传统上,缩放MDMs通过增加参数数量或去噪步骤数来实现。我们引入了递归掩码扩散模型(R-MDMs),它通过在每个扩散步骤中重复应用相同的去噪变换器,将递归深度作为第三个缩放轴。递归通过参数重用实现了输出的迭代细化,在不增加参数数量的情况下增加了有效模型深度。在包括数独和倒计时在内的结构化生成任务中,我们展示了R-MDMs实现了显著提升的参数效率:具有$L$次递归迭代的模型通常与具有大约$L$倍参数的非递归基线性能相当。此外,递归细化可以部分替代额外的去噪步骤,使得递归模型在推理时以更少的前向传播达到相同的生成质量。这些结果表明,递归深度是MDMs的一种实用缩放机制,提高了参数效率和测试时计算分配。

英文摘要

Masked diffusion models (MDMs) have recently emerged as a promising paradigm for sequence generation. Scaling MDMs is conventionally achieved by increasing the parameter count or the number of denoising steps. We introduce Recursive Masked Diffusion Models (R-MDMs), which add recursive depth as a third scaling axis by repeatedly applying the same denoising transformer within each diffusion step. Recursion enables iterative refinement of the output through parameter reuse, increasing effective model depth without increasing parameter count. Across structured generation tasks, including Sudoku and Countdown, we show that R-MDMs achieve substantially improved parameter efficiency: a model with $L$ recursive iterations often matches the performance of non-recursive baselines with roughly $L\times$ more parameters. Moreover, recursive refinement can partially substitute for additional denoising steps, allowing recursive models to reach the same generation quality with fewer forward passes at inference time. These results suggest that recursive depth is a practically useful scaling mechanism for MDMs, improving both parameter efficiency and the allocation of test-time compute.

2606.18021 2026-06-17 cs.AI cs.CL cs.LG cs.MA 新提交

LegalHalluLens: Typed Hallucination Auditing and Calibrated Multi-Agent Debate for Trustworthy Legal AI

LegalHalluLens: 类型化幻觉审计与校准的多智能体辩论以实现可信赖的法律AI

Lalit Yadav, Akshaj Gurugubelli

发表机构 * Independent Researcher, Sunnyvale, CA, USA(独立研究者,美国加州森尼韦尔) Independent Researcher, San Diego, CA, USA(独立研究者,美国加州圣地亚哥)

AI总结 针对法律AI中聚合指标掩盖的错误集中性和方向性问题,提出LegalHalluLens审计框架,通过类型化幻觉画像、风险方向指数(RDI)和校准辩论管道,将虚构检测减少45%,并揭示聚合指标隐藏的失败模式。

Comments 15 pages, 5 figures; Published at the Second Workshop on Agents in the Wild: Safety, Security, and Beyond (AIWILD) at ICML 2026

详情
AI中文摘要

部署在法律工作流程中的AI系统以聚合指标报告的约52%的比率产生幻觉,但这个平均值掩盖了错误集中的位置和方向,使合规官员无法获得可操作的可信部署信号。我们提出LegalHalluLens,一个包含三个组件的审计框架:基于CUAD(Hendrycks等人,2021)的四种法律动机声明类别(数字、时间、义务/权利、事实)的类型化幻觉画像;一个风险方向指数(RDI),将遗漏与发明偏差简化为一个可部署比较的标量;以及一个针对幅度和方向校准的类型化辩论管道。在510份合同和249,252个条款级实例上,我们测量了义务/数字和时间声明之间约38-40个百分点的模型内差距,而聚合报告隐藏了这一点,并表明两个具有匹配的52%比率的系统可能具有相反的RDI。辩论管道将虚构检测减少了45%,每个类别的收益跟踪诊断结果,使用显著更小的骨干网络(4B活跃参数)匹配商业API。类型化画像和RDI揭示了聚合指标隐藏的失败模式;我们进一步表明这些诊断可作为多智能体辩论管道的校准输入,其中针对测量失败模式的怀疑挑战和非对称门优于通用调整的辩论。该框架支持部署在现实世界中的法律AI的方向感知采购、问责制和智能体设计。

英文摘要

AI systems deployed in legal workflows hallucinate at rates that aggregate metrics report at ~52%, but this average conceals where errors concentrate and in which direction they run, leaving compliance officers without an actionable signal for trustworthy deployment. We present LegalHalluLens, an auditing framework with three components: typed hallucination profiles across four legally-motivated claim categories (numeric, temporal, obligation/entitlement, factual) over CUAD (Hendrycks et al., 2021); a Risk Direction Index (RDI) that reduces omission-versus-invention bias to a single deployment-comparable scalar; and a typed debate pipeline calibrated to both magnitudes and directions. Across 510 contracts and 249,252 clause-level instances we measure a within-model gap of approximately 38-40 pp between obligation/numeric and temporal claims that aggregate reporting hides, and show that two systems with matched 52% rates can carry opposite RDIs. The debate pipeline reduces fabricated detections by 45% with per-category gains tracking the diagnosis, matching commercial APIs with a substantially smaller backbone (4B active parameters). Typed profiles and RDI surface failure modes that aggregate metrics hide; we further show these diagnostics serve as calibration inputs for multi-agent debate pipelines, where Skeptic challenges and asymmetric gates targeted at measured failure modes outperform generically-tuned debate. The framework supports direction-aware procurement, accountability, and agent design for legal AI deployed in the wild.

2606.18008 2026-06-17 cs.CV 新提交

PhaseWin: An Efficient Search Algorithm for Faithful Visual Attribution

PhaseWin:一种用于忠实视觉归因的高效搜索算法

Zihan Gu, Ruoyu Chen, Junchi Zhang, Li Liu, Xiaochun Cao, Hua Zhang

发表机构 * Institute of Information Engineering, Chinese Academy of Sciences(中国科学院信息工程研究所) School of Cyber Security, University of Chinese Academy of Sciences(中国科学院大学网络空间安全学院) Shanghai Center for Mathematical Sciences, Fudan University(复旦大学上海数学中心) College of Electronic Science and Technology, National University of Defense Technology(国防科技大学电子科学学院) School of Cyber Science and Technology, Shenzhen Campus of Sun Yat-sen University(中山大学深圳校区网络空间安全学院)

AI总结 提出PhaseWin算法,通过分阶段窗口搜索将视觉归因的计算复杂度从O(n²)降至O(n),在保持接近贪心算法忠实度的同时大幅减少模型评估次数。

Comments 26 pages, 29 figures

详情
AI中文摘要

视觉归因是解释现代视觉和视觉-语言模型的基本工具,尤其在需要检查、诊断或审计模型决策时。其目标是通过对候选图像区域分配重要性排序,解释模型决策如何依赖于视觉输入的局部区域。给定一个划分为n个区域的图像,忠实归因可以转化为有序子集搜索问题,其中逐步插入所选区域应尽可能早地恢复目标模型响应。对区域子集的穷举搜索会产生指数级成本,而广泛使用的贪心搜索仍需要二次数量的模型评估,因为每个选择步骤都会重新评分所有剩余候选。我们提出PhaseWin,一种用于忠实视觉归因的高效子集搜索算法。PhaseWin将贪心区域选择重组为分阶段窗口搜索过程:不是每一步都重新评估整个候选集,而是在全局候选筛选、自适应剪枝和局部窗口细化之间交替,同时保留贪心搜索的基本区域排序行为。我们在单调证据积累条件下分析PhaseWin,并表明在特征级结构假设下,它实现了可控的线性评估复杂度以及接近贪心的忠实度保证。在图像分类、目标检测、视觉定位和图像描述上的大量实验表明,在所有比较的归因方法中,PhaseWin以最少的前向传播达到高忠实度,经验上实现了从O(n²)到O(n)的预测降低。代码可在该网址获取。

英文摘要

Visual attribution is a fundamental tool for interpreting modern vision and vision-language models, particularly when their decisions must be inspected, diagnosed, or audited. Its goal is to explain how a model's decision depends on local regions of the visual input, typically by assigning an importance ordering over candidate image regions. Given an image partitioned into $n$ regions, faithful attribution can be cast as an ordered subset-search problem, in which progressively inserting the selected regions should recover the target model response as early as possible. Exhaustive search over region subsets incurs exponential cost, while the widely used greedy search still requires a quadratic number of model evaluations, because every selection step rescores all remaining candidates. We propose PhaseWin, an efficient subset-search algorithm for faithful visual attribution. PhaseWin reorganizes greedy region selection into a phased window-search procedure: rather than re-evaluating the full candidate set at every step, it alternates between global candidate screening, adaptive pruning, and localized window refinement, while preserving the essential region-ranking behavior of greedy search. We analyze PhaseWin under monotone evidence-accumulation conditions and show that, under feature-level structural assumptions, it attains controllable linear evaluation complexity together with near-greedy faithfulness guarantees. Extensive experiments on image classification, object detection, visual grounding, and image captioning show that, among all compared attribution methods, PhaseWin reaches high faithfulness with the fewest forward passes, empirically realizing the predicted reduction from $O(n^2)$ to $O(n)$. The code is available at this https URL.
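The quadratic cost of the greedy baseline is easy to see by counting model evaluations. The sketch below implements plain greedy insertion (not PhaseWin itself) with a toy additive scoring function standing in for a model forward pass; the region weights are made up. For n regions, greedy rescoring costs n + (n - 1) + ... + 1 = n(n + 1)/2 evaluations.

```python
def greedy_insertion_order(n_regions, score):
    """Greedy ordering: at each step, rescore every remaining candidate."""
    selected, remaining, evaluations = [], list(range(n_regions)), 0
    while remaining:
        best, best_score = None, float("-inf")
        for region in remaining:
            evaluations += 1                      # one model forward pass
            s = score(selected + [region])
            if s > best_score:
                best, best_score = region, s
        selected.append(best)
        remaining.remove(best)
    return selected, evaluations

# Toy stand-in for the model response on a subset of inserted regions.
weights = [0.1, 0.7, 0.2]
order, evaluations = greedy_insertion_order(
    len(weights), lambda subset: sum(weights[r] for r in subset)
)
```

With three regions this takes 3 + 2 + 1 = 6 evaluations; at a few hundred regions the same loop needs tens of thousands of forward passes, which is the cost a linear-evaluation search aims to avoid.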

2606.18003 2026-06-17 cs.LG cs.AI 新提交

C2FL: Clustered Continual Federated Learning under Spatial and Temporal Drift

C2FL:空间和时间漂移下的聚类持续联邦学习

Davide Domini, Gianluca Aguzzi, Lorenzo Pellegrini, Mirko Viroli, Lukas Esterle

发表机构 * University of Bologna(博洛尼亚大学) Aarhus University(奥胡斯大学)

AI总结 针对空间异质性和时间漂移下节点隐私保护的集体自适应问题,提出C2FL方法,通过空间聚类自组织学习组,结合经验回放和停留时间感知自适应平均,实现鲁棒集体适应。

详情
AI中文摘要

集体自适应系统(CAS)越来越依赖机器学习,让每个节点从本地感知数据中学习,使其行为与周围环境对齐。然而,扩展这种智能带来了根本性挑战:感知数据通常涉及隐私,无法集中收集;节点是移动的,穿越不同区域,附近节点感知相似现象,而远处节点观察到截然不同的条件,形成自然空间聚类;并且由于移动性,这些分布随时间演变,引入时间漂移,使本地模型逐渐过时。这些动态出现在多个领域——车辆感知、无人机监测、智能手机众包——但隐私、空间异质性和时间漂移的相互作用严重削弱了传统学习策略。因此,我们提出C2FL,一种完全分布式的联邦学习(FL)方法,其中节点通过空间聚类自组织成学习组,反映环境的地理结构。为了抵消时间漂移,每个节点将经验回放与停留时间感知的自适应平均步骤相结合,随着在同一区域停留更长时间,逐步纳入区域共识,同时在不断变化的分布下保留先前获得的知识。我们在系统再现空间和时间变化的合成实验上评估了我们的方法,表明标准联邦策略在这些条件下显著退化,而我们的方法恢复了鲁棒的集体适应。

英文摘要

Collective Adaptive Systems (CAS) increasingly rely on machine learning to let each node learn from locally sensed data, aligning its behavior with the surrounding environment. Scaling this intelligence, however, raises fundamental challenges: sensed data is often privacy-sensitive, preventing centralized collection; nodes are mobile, traversing regions where nearby nodes perceive similar phenomena while distant ones observe radically different conditions, creating natural spatial clusters; and these distributions evolve over time due to mobility, introducing temporal drift that makes local models progressively stale. These dynamics arise across domains - vehicular sensing, drone-based monitoring, smartphone crowdsensing - yet the interplay of privacy, spatial heterogeneity, and temporal drift severely undermines conventional learning strategies. Therefore, we propose C2FL, a fully distributed Federated Learning (FL) approach where nodes self-organize into learning groups through spatial clustering, reflecting the geographic structure of the environment. To counteract temporal drift, each node combines experience replay with a dwell-time-aware adaptive averaging step, progressively incorporating the regional consensus as it remains longer within the same area, while preserving previously acquired knowledge under evolving distributions. We evaluate our approach on synthetic experiments that systematically reproduce spatial and temporal shifts, showing that standard federated strategies degrade significantly under these conditions and that our method restores robust collective adaptation.
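The dwell-time-aware averaging step can be pictured as a blend between a node's local model and its region's consensus, with the blend weight growing the longer the node stays in the same area. The exponential schedule, the time constant `tau`, and the function names below are assumptions chosen for illustration; the abstract does not specify the actual rule, and experience replay is not shown.

```python
import math
import numpy as np

def dwell_weight(dwell_time, tau=5.0):
    """Weight on the regional consensus: 0 on arrival, approaching 1 over time."""
    return 1.0 - math.exp(-dwell_time / tau)

def adaptive_average(local_params, regional_params, dwell_time, tau=5.0):
    """Blend local parameters toward the regional consensus."""
    a = dwell_weight(dwell_time, tau)
    return (1.0 - a) * local_params + a * regional_params

local = np.array([1.0, 1.0])
regional = np.array([3.0, -1.0])
on_arrival = adaptive_average(local, regional, dwell_time=0.0)
after_long_stay = adaptive_average(local, regional, dwell_time=100.0)
```

A node that has just crossed into a new region keeps its own parameters, so a brief transit does not overwrite what it learned elsewhere.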

2606.18001 2026-06-17 cs.LG 新提交

Half a Link can Be Enough to Predict a Whole Link: Understanding Generalization in Knowledge Graph Foundation Models

半条链接足以预测整条链接:理解知识图谱基础模型中的泛化

Cosimo Gregucci, Obaidah Theeb, Daniel Hernandez, Antonio Vergari, Steffen Staab

发表机构 * Institute for AI, University of Stuttgart(斯图加特大学人工智能研究所) University of Southampton(南安普顿大学) University of Edinburgh(爱丁堡大学)

AI总结 本文通过分析知识图谱基础模型在未见图上的零样本泛化,发现模型利用部分可见的“半链接”进行预测,并基于此提出四类场景的分类法,揭示现有模型的泛化机制与改进方向。

详情
AI中文摘要

知识图谱(KG)基础模型(KGFMs)是零样本泛化器:只需训练一次,它们就能在未见过的图上预测链接,无需重新训练。然而,理解它们何时以及如何能够在不同KG间稳健泛化仍是一个开放问题。在本文中,我们揭示了它们的泛化机制,强调了它们在未见KG上的性能在涉及部分可见链接(我们称之为半链接)时并非均匀。事实上,我们表明,要预测一个测试三元组$(h,r,t)$,在实践中可能只需在推理图中观察到半链接$(h,r)$或$(r,t)$。这产生了四种场景的分类法,这些半链接的组合被观察到或未被观察到。通过对这些场景进行严格的分层分析,我们揭示了SoTA KGFMs利用可见的半链接进行预测,而不可见的半链接则带来不同的挑战。因此,我们更细粒度的分类法可以作为稳健KGFM泛化的诊断协议,并突出新KGFM可以改进的地方。

英文摘要

Knowledge graph (KG) foundation models (KGFMs) are zero-shot generalizers: trained once, they can predict links on unseen graphs without retraining. However, understanding when and how they can robustly generalize across KGs is still an open question. In this paper, we shed some light on their generalization mechanisms, highlighting that their performance on unseen KGs is not uniform when it comes to partially seen links, which we call half-links. In fact, we show that to predict a test triple $(h,r,t)$ it might suffice in practice to have observed the half-link $(h,r)$ or $(r,t)$ in the inference graph. This yields a taxonomy of four scenarios, according to which of these half-links are observed. In a rigorous stratified analysis over these scenarios, we reveal that SoTA KGFMs use seen half-links for predictions, while unseen half-links pose different challenges. As such, our finer-grained taxonomy can serve as a diagnostic protocol for robust KGFM generalization and highlights where novel KGFMs can improve.
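The four-scenario taxonomy follows directly from two yes/no checks on the inference graph: whether the half-link $(h,r)$ has been observed, and whether $(r,t)$ has. A minimal sketch, with scenario labels and the toy graph chosen here for illustration rather than taken from the paper:

```python
def half_link_scenario(triple, inference_graph):
    """Classify a test triple by which of its half-links appear in the graph."""
    h, r, t = triple
    head_seen = any(hh == h and rr == r for hh, rr, _ in inference_graph)
    tail_seen = any(rr == r and tt == t for _, rr, tt in inference_graph)
    if head_seen and tail_seen:
        return "both_half_links_seen"
    if head_seen:
        return "only_head_half_link_seen"
    if tail_seen:
        return "only_tail_half_link_seen"
    return "no_half_link_seen"

# Toy inference graph of (head, relation, tail) triples.
graph = [
    ("alice", "works_at", "acme"),
    ("bob", "lives_in", "paris"),
]
scenarios = [
    half_link_scenario(("alice", "works_at", "globex"), graph),
    half_link_scenario(("carol", "lives_in", "paris"), graph),
    half_link_scenario(("carol", "born_in", "rome"), graph),
]
```

Stratifying link-prediction metrics by this label is what turns the taxonomy into a diagnostic.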

2606.17998 2026-06-17 cs.CV new submission

AIGS-Net: Compact Illumination Field Modeling via 2D Gaussian Splatting for Fast Low-Light Image Enhancement

AIGS-Net: 基于2D高斯泼溅的紧凑光照场建模用于快速低光图像增强

Yuhan Chen, Kunyang Huang, Fuchen Li, Zhuohan Qin, Guofa Li, Wenbo Chu, Keqiang Li

Affiliations: College of Mechanical and Vehicle Engineering, Chongqing University; Department of Electrical and Computer Engineering, Carnegie Mellon University; Herbert Wertheim College of Engineering, University of Florida; School of Mathematics and Statistics, Qingdao University; National Innovation Center of Intelligent and Connected Vehicles; School of Vehicle and Mobility, Tsinghua University

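AI summary: Proposes AIGS-Net, which performs low-light image enhancement with approximately 40 learnable parameters through an input-adaptive 2D Gaussian Splatting illumination field and zero-parameter multiscale contextual encoding, balancing enhancement quality and inference efficiency on the LOL and LSRW benchmarks.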

AI Chinese abstract

现有的低光图像增强方法通常在光照场建模的表征能力与计算复杂度之间存在瓶颈。为解决此问题,本文提出自适应光照高斯泼溅网络(AIGS-Net),一种用于快速低光增强的超轻量级架构。与传统的静态先验不同,AIGS-Net构建了一个输入自适应的2D高斯泼溅光照场。高斯基函数的不透明度由输入图像的相对亮度统计动态调制,并通过有序alpha合成渲染空间变化的光照补偿。为了高效指导自适应光照补偿,引入了一个零参数非线性多尺度上下文编码模块,无需额外卷积权重即可提取低频结构和局部对比度线索。为抑制噪声放大和传感器引起的颜色偏差,AIGS-Net集成了噪声掩膜估计、锁定单通道伽马映射、跨通道一致性正则化和目标颜色对齐约束。在LOL和LSRW基准上的实验表明,AIGS-Net在仅需约40个可学习参数的情况下,改善了细节恢复和颜色保真度,实现了增强质量与极端推理效率之间的有效权衡。

English abstract

Existing low-light image enhancement methods often face a bottleneck between the representation capacity of illumination-field modeling and computational complexity. To address this issue, this paper proposes an Adaptive Illumination Gaussian Splatting Network (AIGS-Net), an ultra-lightweight architecture for fast low-light enhancement. Unlike conventional static priors, AIGS-Net constructs an input-adaptive 2D Gaussian Splatting illumination field. The opacity of Gaussian basis functions is dynamically modulated by relative luminance statistics of the input image, and spatially varying illumination compensation is rendered through ordered alpha compositing. To guide adaptive illumination compensation efficiently, a zero-parameter nonlinear multiscale contextual encoding module is introduced to extract low-frequency structures and local contrast cues without additional convolutional weights. To suppress noise amplification and sensor-induced color bias, AIGS-Net integrates noise-mask estimation, locked single-channel Gamma mapping, cross-channel consistency regularization, and target color-alignment constraints. Experiments on LOL and LSRW benchmarks show that AIGS-Net improves detail recovery and color fidelity while requiring only approximately 40 learnable parameters, achieving an effective trade-off between enhancement quality and extreme inference efficiency.
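
To make "ordered alpha compositing" over 2D Gaussians concrete, the following Python sketch evaluates an illumination gain at a single pixel. It is not the AIGS-Net implementation. The dictionary fields, the function names, and in particular the rule that scales opacity by `1.0 - mean_luminance` are assumptions made for illustration; the abstract states only that opacity is modulated by relative luminance statistics.

```python
import math


def gaussian_alpha(x, y, g, opacity_scale):
    """Alpha of one axis-aligned 2D Gaussian at pixel (x, y)."""
    dx = (x - g["mx"]) / g["sx"]
    dy = (y - g["my"]) / g["sy"]
    opacity = min(1.0, g["opacity"] * opacity_scale)
    return opacity * math.exp(-0.5 * (dx * dx + dy * dy))


def illumination_gain(x, y, gaussians, mean_luminance):
    """Composite Gaussian contributions front to back, in list order.

    Hypothetical modulation rule: darker inputs (low mean_luminance in [0, 1])
    get larger opacities and therefore stronger compensation.
    """
    opacity_scale = 1.0 - mean_luminance
    gain = 0.0
    transmittance = 1.0  # fraction of the result not yet covered by earlier Gaussians
    for g in gaussians:
        a = gaussian_alpha(x, y, g, opacity_scale)
        gain += transmittance * a * g["value"]
        transmittance *= 1.0 - a
    return gain
```

Because each Gaussian is attenuated by the transmittance left over from those before it, reordering the list changes the result, which is what distinguishes ordered compositing from a plain weighted sum.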

2606.17996 2026-06-17 cs.LG cs.AI new submission

Multiple cyclicity and Wavelet Decomposition with Channel Correlation for Long-term Time Series Forecasting

多重周期性与通道相关的小波分解在长期时间序列预测中的应用

Bin Wang, Heming Yang, Jinfang Sheng

Affiliations: School of Computer Science and Engineering, Central South University

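AI summary: Proposes the McWC model, which combines multi-layer cyclicity construction, a multi-layer perceptron that extracts inter-channel correlations, and multi-level wavelet decomposition that fuses high- and low-frequency information, and decouples intra-channel autocorrelations in the frequency domain, for efficient and accurate long-term forecasting.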

AI Chinese abstract

周期性和趋势是时间序列数据的重要组成部分,许多基于周期性和趋势的研究在长期时间序列预测中取得了良好效果。然而,我们认为当前工作忽略了时间序列数据中真实世界通道间相关性的影响,导致预测次优。此外,这些模型依赖复杂设计来捕获多样信息,导致计算效率低下。为解决这一挑战,我们提出McWC,一种长期时间序列预测模型,分别对周期性、趋势和通道间相关性进行建模。具体来说,McWC首先使用多层周期性构建模块从数据中解耦周期性信息。然后,使用多层感知机提取通道间相关性。接着,使用多级小波分解模块对数据中的多层高频和低频信息进行建模和融合。最后,聚合不同组件的结果以获得输出。同时,我们通过在频域计算损失函数来解耦通道内自相关。在六个真实世界数据集上的实验表明,McWC实现了最先进的性能,展现出卓越的计算效率和历史信息提取能力。

English abstract

Cyclicity and trend are important components of time series data, and many studies based on cyclicity and trend have achieved good results in long-term time series forecasting. However, we believe that current work neglects the influence of real-world inter-channel correlations in time series data, which leads to suboptimal predictions. Furthermore, these models rely on complex designs to capture diverse information, resulting in low computational efficiency. To address these challenges, we propose McWC, a long-term time series forecasting model that separately models the cyclicity, trend, and inter-channel correlations. Specifically, McWC first decouples cyclical information from data using a multi-layer cyclicity construction module. Then, it extracts inter-channel correlations using a multi-layer perceptron. Next, it models and fuses the multi-layer high-frequency and low-frequency information from data using a multi-level wavelet decomposition module. Finally, it aggregates the results of different components to obtain the output. Simultaneously, we decouple intra-channel autocorrelations by calculating a loss function in the frequency domain. Experiments on six real-world datasets demonstrate that McWC achieves state-of-the-art performance, exhibiting excellent computational efficiency and historical information extraction capabilities.
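
As a rough illustration of multi-level wavelet decomposition, the sketch below splits a series into low-frequency (approximation) and high-frequency (detail) parts over several levels and then reconstructs it. The abstract does not say which wavelet McWC uses, so the orthonormal Haar wavelet is assumed here, and all function names are hypothetical.

```python
import math

_S = 1.0 / math.sqrt(2.0)  # orthonormal Haar scaling factor


def haar_step(signal):
    """One Haar level: pairwise sums (low frequency) and differences (high frequency)."""
    approx = [(signal[i] + signal[i + 1]) * _S for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * _S for i in range(0, len(signal), 2)]
    return approx, detail


def haar_decompose(signal, levels):
    """Return (coarsest approximation, [detail at level 1, ..., detail at level n])."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        if len(approx) < 2 or len(approx) % 2 != 0:
            raise ValueError("length must be even and at least 2 at every level")
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


def haar_reconstruct(approx, details):
    """Invert haar_decompose by undoing the levels from coarsest to finest."""
    current = list(approx)
    for detail in reversed(details):
        restored = []
        for a, d in zip(current, detail):
            restored.extend([(a + d) * _S, (a - d) * _S])
        current = restored
    return current
```

Each level halves the length of the approximation, so a series of length 8 supports at most three levels; a model could process the approximation and detail sequences separately before fusing them.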