arXivDaily: arXiv daily academic digest, updated Monday through Friday
2606.13842 2026-06-15 cs.RO New submission

Efficient Domain-Adaptive Policy Learning via Kernel Representation with Application to Quadrotor Control under Non-Stationary Disturbances

Hongyu Zhou, Mingtian Tan, Vasileios Tzoumas

Affiliation: University of Michigan, Ann Arbor

AI summary: Proposes an efficient domain-adaptive policy learning algorithm based on kernel representations. Unknown disturbances are modeled with random Fourier features; offline training takes only 50 seconds, and parameters are updated in real time during deployment through online least-squares estimation, which handles non-stationary disturbances in quadrotor trajectory tracking.

Abstract

We present an algorithm for efficient domain-adaptive policy learning via kernel representations. Learning domain-adaptive policies is challenging since it requires an environment representation that is both sufficiently expressive to model complex sim-to-real gaps during offline training, and computationally efficient enough to support rapid online adaptation during deployment. For instance, a quadrotor may encounter time-varying, non-stationary disturbances, such as sudden gusts of wind, payload shifts, or transitions between distinct flight regimes with and without ground effects. To address these challenges, we model unknown disturbances using a differentiable kernel approximation based on random Fourier features. During the offline training phase, we randomly sample kernel coefficients and bandwidth parameters to generate a rich diversity of disturbance profiles. We then optimize the control policy via differentiable simulation with analytical gradients, a process that takes only 50 seconds of training time on an RTX 4090 GPU. During hardware deployment, the policy adapts to non-stationary environments in real time by updating both the kernel coefficients and bandwidth through online least-squares estimation. We evaluate our method on quadrotor trajectory tracking tasks across high-fidelity numerical simulations and hardware experiments using Crazyflie, subjected to various disturbances, including complex aerodynamic effects, wind, ground effects, and payload fluctuations.
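
The combination of random Fourier features and online least squares described in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the names `make_rff` and `RecursiveLeastSquares`, the one-dimensional toy disturbance, and the choice to update only the kernel coefficients with a fixed bandwidth (the abstract states that the bandwidth is also adapted online).

```python
import numpy as np

def make_rff(dim_in, num_features, bandwidth, rng):
    """Sample a random Fourier feature map that approximates a Gaussian kernel."""
    # Frequencies ~ N(0, 1/bandwidth^2), phases ~ U[0, 2*pi).
    W = rng.normal(0.0, 1.0 / bandwidth, size=(num_features, dim_in))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    scale = np.sqrt(2.0 / num_features)
    return lambda x: scale * np.cos(W @ x + b)

class RecursiveLeastSquares:
    """Online least-squares estimate of theta in d(x) ~ phi(x) @ theta."""

    def __init__(self, num_features, prior_var=1e3, forget=1.0):
        self.theta = np.zeros(num_features)
        self.P = prior_var * np.eye(num_features)
        self.forget = forget  # hypothetical knob: values < 1 discount old data

    def update(self, phi, y):
        Pphi = self.P @ phi
        gain = Pphi / (self.forget + phi @ Pphi)
        self.theta = self.theta + gain * (y - phi @ self.theta)
        self.P = (self.P - np.outer(gain, Pphi)) / self.forget

def demo(seed=0):
    """Fit a toy 1-D disturbance online and report the error before and after."""
    rng = np.random.default_rng(seed)
    phi = make_rff(dim_in=1, num_features=50, bandwidth=0.5, rng=rng)
    # Toy stand-in for an unknown disturbance (assumption, not from the paper).
    disturbance = lambda x: np.sin(2.0 * x[0]) + 0.5 * np.cos(x[0])
    grid = [np.array([v]) for v in np.linspace(-2.0, 2.0, 41)]
    rls = RecursiveLeastSquares(num_features=50)
    err_before = np.mean([abs(disturbance(x) - phi(x) @ rls.theta) for x in grid])
    for _ in range(400):
        x = rng.uniform(-2.0, 2.0, size=1)
        rls.update(phi(x), disturbance(x))
    err_after = np.mean([abs(disturbance(x) - phi(x) @ rls.theta) for x in grid])
    return err_before, err_after

err_before, err_after = demo()
```

Because the model is linear in the coefficients, each update costs a few matrix-vector products, which is what makes this kind of estimator a plausible fit for real-time adaptation. Setting `forget` below 1 would weight recent samples more heavily, a common choice when the disturbance drifts.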

2606.13840 2026-06-15 cs.RO cs.CV New submission

Multi-Agent Embodied Autonomous Driving: From V2X Information Exchange to Shared World Models

Senkang Hu, Zhengru Fang, Yihang Tao, Zihan Fang, Sam Tak Wu Kwong, Yuguang Fang

Affiliation: Lingnan University, Hong Kong

AI summary: Surveys the shift in autonomous driving from single-vehicle intelligence to multi-agent embodied systems, in which shared world models enable shared perception, intent inference, and cooperative planning, and identifies research gaps such as simulation-bound evaluation and the lack of real-time safety guarantees.

Abstract

Autonomous driving is shifting from isolated vehicle intelligence toward multi-agent embodied systems that share perception, infer intent, and coordinate action under uncertainty. This survey examines this transition through the lens of Shared World Models (SWMs): predictive cross-agent representations maintained across vehicles, infrastructure, and other traffic participants. We review more than 380 publications spanning vehicle-to-everything (V2X) communication, collaborative perception, inter-agent cognition, cooperative planning, end-to-end cooperative driving, and simulation and data engines for closed-loop validation. The organizing question is how exchanged observations become aligned state, intent-aware interaction, and coordinated downstream action. Across the surveyed literature, evaluation remains concentrated in simulation, curated benchmarks, and offline protocols. Foundation-model-based coordination also lacks verified real-time safety guarantees in open traffic. These gaps motivate key research priorities for multi-agent embodied autonomous driving (MAEAD): verifiable shared-state maintenance, robust intent and plan alignment, and safe coordinated action under communication, latency, and deployment constraints.

2606.13839 2026-06-15 cs.CV cs.AI eess.IV New submission

Explaining RhythmFormer: A Systematic XAI Analysis of Periodic Sparse Attention for Remote Photoplethysmography

Louis Chen, Torbjörn E. M. Nordling

Affiliation: Department of Mechanical Engineering, National Cheng Kung University

AI summary: Addresses the lack of quantitative evaluation in explainability for rPPG transformers. The work adapts four attribution methods, introduces a skin coverage metric and a faithfulness coefficient, and quantifies a multi-hop leakage effect in sparse attention. Beyond Intuition scores best among the evaluated methods on UBFC-rPPG.

Comments: 26 pages, 8 figures

Abstract

Remote photoplethysmography (rPPG) transformers achieve low heart-rate error on benchmarks, yet their decisions remain opaque--a growing concern as rPPG moves toward clinical heart rate estimation. Existing rPPG XAI is dominated by qualitative heatmap inspection without quantitative faithfulness metrics or physiology-grounded validation, leaving a gap between visual plausibility and auditable evidence. We address this gap. First, we adapt four attribution methods (raw attention, rollout, flow, Beyond Intuition) to RhythmFormer's bi-level routing attention with top-$k$ selection. Second, we introduce a skin coverage metric quantifying how much attribution mass falls on skin regions. Third, we adapt the SaCo faithfulness coefficient from its original classification setting to rPPG regression by using the MAE between original and perturbed predicted rPPG waveforms as the perturbation impact. Applying these tools, we quantify a multi-hop leakage effect under sparse top-$k$ routing: attention rollout and flow almost completely restore the connections that individual refined-attention layers explicitly set to zero. Beyond Intuition mitigates this via its value-projection-weighted rollout and gradient-supported mask, attaining the highest median refined skin coverage ($0.83$ vs. $0.57$ for vanilla rollout) and faithfulness ($F=0.92$) among the evaluated methods on UBFC-rPPG. Validation across diverse datasets and model variants is needed. A case study on a low-SaCo outlier further shows all four methods recovering consistently once an artefactual region is replaced, suggesting consistent SaCo behavior across attribution families in this illustrative case. Together, these metrics move XAI for rPPG toward auditable numerical evidence about spatial alignment and perturbation faithfulness, i.e., trustworthy rPPG XAI.
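
The multi-hop leakage effect mentioned in this abstract can be reproduced on a toy example. The sketch below is an illustration under stated assumptions rather than the paper's code: it uses the standard attention rollout recipe (average each attention map with the identity, then multiply across layers), a hypothetical 4-token score matrix, and a skin coverage defined as the fraction of attribution mass on masked tokens. The function names `top_k_rows`, `attention_rollout`, and `skin_coverage` are made up for this example.

```python
import numpy as np

def top_k_rows(scores, k):
    """Keep the k largest entries in each row, zero the rest, renormalize rows."""
    sparse = np.zeros_like(scores)
    idx = np.argsort(scores, axis=1)[:, -k:]
    np.put_along_axis(sparse, idx, np.take_along_axis(scores, idx, axis=1), axis=1)
    return sparse / sparse.sum(axis=1, keepdims=True)

def attention_rollout(layers):
    """Rollout: multiply residual-augmented attention maps layer by layer."""
    n = layers[0].shape[0]
    joint = np.eye(n)
    for attn in layers:
        joint = (0.5 * attn + 0.5 * np.eye(n)) @ joint
    return joint

def skin_coverage(attribution, skin_mask):
    """Fraction of non-negative attribution mass that lands on skin tokens."""
    return float(attribution[skin_mask].sum() / attribution.sum())

# Hypothetical scores: with k=2, token i attends only to itself and token i+1.
scores = np.array([[4, 3, 1, 2],
                   [1, 4, 3, 2],
                   [2, 1, 4, 3],
                   [3, 2, 1, 4]], dtype=float)
layer = top_k_rows(scores, k=2)
rollout = attention_rollout([layer, layer])

# Each layer sets the (0, 2) connection to zero, but two hops (0 -> 1 -> 2)
# give it non-zero weight in the rollout.
leaked = rollout[0, 2]
coverage = skin_coverage(rollout[0], np.array([True, True, False, False]))
```

In this toy case every layer has eight zero entries, while the two-layer rollout has only four, which is the kind of restored connectivity the abstract describes.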

2606.13835 2026-06-15 cs.CL cs.AI cs.MA New submission

When Plausible Is Not Realistic: Evaluating Human Mobility in LLM-Based Urban Simulation

Gustavo H. Santos, Aline Carneiro Viana, Thiago H. Silva

Affiliations: UTFPR; Inria; U. of Toronto

AI summary: Proposes a validation framework that evaluates the human mobility patterns generated by LLM-based urban simulators using measures such as mobility laws and temporal rhythms, and finds a substantial gap between narrative plausibility and empirical mobility realism.

Comments: 14 pages, 10 figures

Abstract

LLM-based generative agents are increasingly used in urban simulators, yet it remains unclear whether they reproduce empirically realistic human mobility patterns or merely generate plausible mobility narratives. We introduce a validation framework for evaluating the mobility of generative agents of LLM-based urban simulators against real-world mobility data. For this, we use mobility laws, temporal rhythms, network motifs, semantic activity transitions, and behavioral mobility profiles. Using datasets from the Greater Paris region and Shanghai, we evaluate AgentSociety and CitySim across multiple dimensions of mobility realism. Our analysis reveals a substantial gap between narrative plausibility and empirical mobility realism. Although the simulators capture some high-level semantic activity distributions, they struggle to reproduce core spatial and temporal constraints, including realistic trip-length distributions, origin-destination flows, dwell times, and transition dynamics. We further observe that realistic mobility diversity is unstable across default prompting configurations and may require explicit profile-aware initialization. To support reproducible evaluation, we also contribute scalable and open LLM-driven infrastructure for regional-scale map generation, observability-enhanced simulation, mobility-metric computation, and traffic simulation. Our findings highlight the need for rigorous empirical validation of LLM-based urban simulators and provide practical tools for building more realistic and reproducible urban simulation systems.

2606.13823 2026-06-15 cs.LG eess.SP stat.ML New submission

A Stationarity-and-Coupling Criterion for Training-Free Time-Lagged Spectral Embeddings of Multivariate Time Series

Siddharth Pal, Viktoria Rojkova

Affiliation: University of California, Berkeley

AI summary: Proposes a fixed-length descriptor D(τ) built by truncating a time-lagged correlation matrix, and derives its applicability conditions from a stationary Gaussian VAR(1) model: the signals must be approximately stationary, and the class information must reside in cross-channel temporal coupling rather than in marginal power.

Comments: 25 pages, 2 figures, 10 tables

Abstract

We study training-free fixed-length descriptors for multivariate time series and ask not merely whether such a descriptor performs well, but when it can be expected to work at all. Our object of study is $D(\tau)$, built from a time-lagged correlation matrix truncated at the Marchenko-Pastur edge so that only signal-bearing eigenvalues survive and classified by cosine similarity to class centroids with zero learned parameters. The central contribution is not the descriptor but a falsifiable applicability criterion for it. Working from a stationary Gaussian VAR(1) model, we argue that $D(\tau)$ separates two classes when the signals are approximately stationary and the class information lives in their cross-channel temporal coupling rather than in marginal per-channel power. We derive, semi-formally, three consequences: a distinguishability condition, why the static ($\tau=0$) covariance collapses to chance, and why a stationary but power-discriminated paradigm defeats the descriptor. The criterion is operational: a two-part pre-flight test -- an augmented Dickey-Fuller stationarity check and a power-baseline saturation check -- predicts applicability before any training. We validate both halves on a mixed assortment. On four paradigms that satisfy the criterion (Sleep-EDF, BCI-IV-2a, MIT-BIH, ESC-50) the descriptor is competitive with strong baselines at a fraction of their cost, reaching $88.5\pm4.5\%$ under 20-subject leave-one-subject-out on Sleep-EDF on a single CPU thread. On three that violate it -- non-stationary ERPs, and financial-volatility and wearable-stress regimes that are power-discriminated -- it fails exactly as the pre-flight predicts, and these negatives are the more informative half. We are explicit that $D(\tau)$ is not the most accurate representation; its value is a compact, training-free embedding whose domain of validity is known in advance.
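
A rough sketch of a descriptor in the spirit of this abstract is given below. The abstract does not specify how the lagged matrix is turned into a vector, so several steps are assumptions made for illustration only: symmetrizing the lagged correlation matrix, keeping eigenvalues whose magnitude exceeds the standard Marchenko-Pastur upper edge $(1+\sqrt{N/T})^2$ for $N$ channels and $T$ samples, and flattening the upper triangle of the truncated matrix. The names `mp_upper_edge`, `lagged_descriptor`, and `classify` are hypothetical.

```python
import numpy as np

def mp_upper_edge(n_channels, n_samples):
    """Marchenko-Pastur upper edge for unit-variance, uncorrelated channels."""
    return (1.0 + np.sqrt(n_channels / n_samples)) ** 2

def lagged_descriptor(x, tau):
    """x: array of shape (channels, samples). Returns a fixed-length vector."""
    n, t = x.shape
    z = (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)
    lagged = z[:, : t - tau] @ z[:, tau:].T / (t - tau)
    sym = 0.5 * (lagged + lagged.T)  # assumption: symmetrize before eigh
    vals, vecs = np.linalg.eigh(sym)
    # Assumption: keep eigenvalues whose magnitude exceeds the MP edge.
    keep = np.abs(vals) > mp_upper_edge(n, t - tau)
    truncated = (vecs[:, keep] * vals[keep]) @ vecs[:, keep].T
    return truncated[np.triu_indices(n)]

def classify(descriptor, centroids):
    """Return the label whose centroid has the highest cosine similarity."""
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    return max(centroids, key=lambda label: cosine(descriptor, centroids[label]))

rng = np.random.default_rng(0)
example = lagged_descriptor(rng.normal(size=(4, 500)), tau=2)
label = classify(np.array([1.0, 0.1]),
                 {"a": np.array([1.0, 0.0]), "b": np.array([0.0, 1.0])})
```

Class centroids would be the mean descriptor per class on labeled examples, so the classifier has no learned parameters beyond those averages. For `N` channels the descriptor length is `N * (N + 1) / 2` regardless of recording length.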

2606.13821 2026-06-15 cs.LG New submission

Attention-Based Estimation of the Individual Treatment Benefit Probability under Dose Variation

Lev V. Utkin, Andrei V. Konstantinov, Stanislav K. Kogan, Natalya M. Verbova, Maksim I. Goriunov

Affiliation: Higher School of Artificial Intelligence Technologies, Peter the Great St. Petersburg Polytechnic University

AI summary: Proposes the Dose-AIPTB framework, which extends estimation of the individual probability of treatment benefit to discrete dose settings and aggregates pseudo-labels with attention mechanisms to support personalized dose selection.

Abstract

Estimating the probability that a treatment outperforms a control for an individual patient, called the Individual Probability of Treatment Benefit (IPTB), offers a clinically intuitive alternative to population-average metrics. However, existing methods for IPTB estimation are largely confined to binary treatment settings, despite the prevalence of dose-varying interventions in clinical practice. We propose a general framework for IPTB estimation with ordinal outcomes under discrete dose assignments, called Dose-AIPTB (Dose Attention-based IPTB). Our approach recasts the problem as binary classification over the unobserved sign of the individual treatment effect, constructing pseudo-labels from covariate-similar pairwise comparisons and aggregating them via attention mechanisms or Nadaraya-Watson kernel regression. This formulation naturally accommodates multiple discrete dose levels, extending beyond the binary treatment paradigm. Through numerical experiments on real-world and synthetic data under covariate shift, varying sample sizes, and heterogeneous outcomes, we demonstrate that attention-based aggregation consistently outperforms kernel alternatives. The framework provides a foundation for personalized dose selection grounded in individual-level benefit probabilities. Code implementing the model is publicly available at https://github.com/NTAILab/AIPTBDose.
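
The Nadaraya-Watson aggregation of pairwise pseudo-labels named in this abstract might look like the following minimal sketch. It is not taken from the linked repository: the function name `benefit_probability`, the product-of-Gaussian-kernels weighting, and the rule that ties count as "no benefit" are all assumptions made for illustration.

```python
import numpy as np

def benefit_probability(x, treated, control, bandwidth=1.0):
    """Kernel-weighted share of (treated, control) pairs where treatment wins.

    treated / control: lists of (covariate_vector, ordinal_outcome) tuples.
    """
    def kernel(a, b):
        return np.exp(-np.sum((a - b) ** 2) / (2.0 * bandwidth ** 2))

    num = den = 0.0
    for xt, yt in treated:
        for xc, yc in control:
            # Pairs whose members both resemble the query x get more weight.
            w = kernel(x, xt) * kernel(x, xc)
            label = 1.0 if yt > yc else 0.0  # assumption: ties count as no benefit
            num += w * label
            den += w
    return num / den

# Toy data: near x=0 the treated outcome is better, near x=5 it is worse.
treated = [(np.array([0.0]), 3), (np.array([5.0]), 1)]
control = [(np.array([0.0]), 2), (np.array([5.0]), 2)]
p_near_zero = benefit_probability(np.array([0.0]), treated, control)
p_near_five = benefit_probability(np.array([5.0]), treated, control)
```

With a Gaussian kernel, the normalized weights are a softmax over scaled negative squared distances, which is the usual bridge between Nadaraya-Watson regression and attention. Replacing the fixed kernel with learned query and key projections would be one way to obtain an attention-based aggregator; for several dose levels, the same comparison could be run for each pair of doses.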

2606.13818 2026-06-15 cs.LG New submission

Uncertainty Estimation and Generalization Bounds for Modern Deep Learning

Luis A. Ortega

Affiliation: Andrés Department of Computer Science, Machine Learning Group; Madrid, June 2026

AI summary: Unifies Bayesian inference, function-space modeling, and large-deviation theory under a common probabilistic perspective; proposes DVIP, VaLLA, and FMGP to improve uncertainty estimation; and uses PAC-Bayesian and large-deviation theory to explain why over-parameterized neural networks generalize.

Comments: PhD Thesis, Autonomous University of Madrid

Abstract

This thesis investigates how Bayesian principles can deepen our understanding of modern deep learning systems. While neural networks achieve remarkable predictive performance, their ability to generalize and to quantify uncertainty remains only partly understood. This thesis approaches this challenge from both methodological and theoretical angles: unifying Bayesian inference, function-space modeling, and large-deviation theory under a common probabilistic perspective. On the methodological side, the thesis introduces the Deep Variational Implicit Process (DVIP), a scalable Bayesian framework that extends implicit processes to deep architectures. Complementing this, two post-hoc methods -- the Variational Linearized Laplace Approximation (VaLLA) and the Fixed-Mean Gaussian Process (FMGP) -- are proposed to equip pretrained deterministic networks with calibrated uncertainty estimates. The theoretical contributions focus on one of the central open questions in modern machine learning: why do large, over-parameterized neural networks generalize so well? To address this, the thesis develops a unified probabilistic framework that connects three key mechanisms -- diversity, smoothness, and stochasticity -- within the language of PAC-Bayesian and large-deviation theory.

2606.13817 2026-06-15 cs.RO cs.LG New submission

FlowMo-WM: A World Model with Object Momentum and Hidden Ambient Drift

Yitao Jiang, Luyang Zhao, Muhao Chen, Devin Balkcom

Affiliations: Dartmouth College; Clemson University; University of Houston

AI summary: Proposes FlowMo-WM, an end-to-end trainable visual world model that factorizes image-action history into a short-history latent state and a long-history context, which model object motion and ambient drift respectively, to improve long-horizon prediction accuracy in settings such as aquatic surface vehicles.

Abstract

World models in robot learning predict future states from visual observations and actions, enabling agents to reason about the consequences of their controls. However, many action-conditioned models are evaluated in settings where motion is dominated by immediate control, whereas aquatic surface vehicles and other real-world objects continue moving under inertia and are displaced by hidden ambient drift, such as water currents or wind. We propose FlowMo-WM, an end-to-end trainable visual world model that infers object-centric motion state and a predictive long-history context associated with hidden drift from image-action histories without direct supervision of flow fields. FlowMo-WM factorizes image-action history into a short-history latent state, trained to summarize object-centric motion, and a longer-history context, trained to summarize slowly varying exogenous influences. A zero-context residual transition separates action-conditioned base dynamics from context-dependent drift effects during latent rollout. In simulated aquatic surface-vehicle environments with diverse hidden flows, disturbances, and randomized vehicle dynamics, FlowMo-WM improves long-horizon rollout accuracy over representative action-conditioned latent world models. Prediction-time context ablations, in which the inferred context is zeroed or shuffled during rollout, show that the ambient context is important for stable prediction under hidden drift, while frozen linear probes characterize information encoded in the learned factors.

2606.13815 2026-06-15 cs.AI cs.CL New submission

Poker Arena: Multi-Axis Profiling of Strategic Reasoning and Memory in LLMs

Pratham Singla, Shivank Garg, Vihan Singh

Affiliations: Indian Institute of Technology Roorkee; Raeth AI

AI summary: Proposes the Poker Arena platform, which decomposes strategic reasoning using a three-layer memory architecture and a nine-axis cognitive profile, and shows that scalar leaderboards systematically misrank the capability structure of models.

Comments: 33 pages, ICML Workshop

Abstract

Strategic reasoning under uncertainty underpins consequential decisions in negotiation, finance, and policy, but prevailing game-play benchmarks collapse heterogeneous reasoning dimensions into a single scalar, leaving the capability structure of frontier LLMs unexamined. We introduce Poker Arena, a no-limit Texas Hold'em tournament platform that couples a three-layer memory architecture (within-hand, session, and cross-session) with a nine-axis cognitive profile decomposing strategic reasoning into interpretable dimensions such as bet-sizing calibration and positional awareness. We evaluate seven frontier models across 50 sessions of 1,000 hands and a controlled memory ablation; tournament chips and aggregate axis score order the field differently: Claude Opus 4.6 wins +15,730 chips with 14 first-place finishes, yet ranks only fifth of seven on mean axis score, while persistent memory helps some models and hurts others. These findings show that multi-axis evaluation surfaces capability structure that scalar leaderboards systematically misrank, with cross-dimensional consistency outweighing peak performance on any single axis.

2606.13809 2026-06-15 cs.CV New submission

Compressing Image Style Training into a Single Model Forward

Zhongjie Duan, Yingda Chen

Affiliation: ModelScope Team, Alibaba Group

AI summary: Proposes the i2L framework, which predicts LoRA weights in a single forward pass for efficient style transfer without per-style optimization, and outperforms existing baselines in style fidelity, prompt alignment, and perceptual quality.

Comments: 11 pages, 9 figures

Abstract

Diffusion-based style transfer must balance inference efficiency with stylization fidelity. Adapter-based methods are efficient, but they inject style as an external condition and can either weaken reference-specific appearance or copy reference semantics into the generated image. Optimization-based personalization methods such as LoRA internalize style more effectively, but require a separate training process for every new style. We introduce i2L (image-to-LoRA), a framework that amortizes style LoRA training into a single forward pass. Given one or more reference images, i2L predicts LoRA weights for a text-to-image model, enabling immediate style instantiation without per-style optimization. The architecture combines an image encoder, learnable LoRA queries, and compressed decoding heads that generate adapted matrices. Training on semantically diverse style pairs encourages the predictor to preserve appearance cues while suppressing reference-content copying. Experiments on Z-Image, FLUX.2, and Hidream-O1 show that i2L improves style fidelity, prompt alignment, and perceptual quality over existing baselines. Because i2L produces explicit LoRA weights, it also supports asymmetric classifier-free guidance, multi-reference style fusion, and composition with controllable-generation modules.

2606.13808 2026-06-15 cs.CL New submission

The Culture Funnel: You Can't Align What isn't in the Data

Ananya Sahu, Mehrnaz Mofakhami, Daniel D'Souza, Thomas Euyang, Julia Kreutzer, Marzieh Fadaee

Affiliation: Cohere Labs

AI summary: Argues that current cultural alignment methods rely on inference-time interventions while neglecting training data, and introduces the concept of a cultural data funnel. A multidimensional tagging framework applied to pretraining, fine-tuning, alignment, and reasoning datasets shows that cultural signals decline sharply during post-training. Multilinguality increases geographic diversity but does not ensure balance, and the proposed tags improve downstream cultural benchmark performance.

Abstract

Current cultural alignment approaches focus on inference-time interventions, assuming models already contain sufficient cultural knowledge. We argue modern LLM pipelines suffer from a cultural data funnel. Using a multidimensional tagging framework across pretraining, fine-tuning, alignment, and reasoning datasets, we show explicit cultural signals decline sharply during post-training, while geographically concentrated, task-specialized data dominates. Multilinguality enhances geographic diversity of cultural knowledge but does not ensure balanced representation. Our tags improve downstream cultural benchmark performance, demonstrating that advances require shifting focus in training data pipelines. To facilitate future research, we release our culturally tagged dataset with 5.6M samples at https://huggingface.co/datasets/CohereLabs/CultureMarkers.

2606.13803 2026-06-15 cs.LG New submission

Neural Slack Variables for Shape Constraints

Ruben Wiedemann, Antoine Jacquier, Lukas Gonon

Affiliations: Imperial College London; University of St. Gallen

AI summary: Proposes neural slack variables, which convert constraint enforcement into a regression problem and achieve zero measured violations by jointly learning an auxiliary network, with applications to monotonicity and convexity constraints and to learning financial volatility surfaces.

Abstract

Enforcing functional inequality constraints such as monotonicity and convexity in neural networks is a fundamental challenge in many industrial and scientific applications. Classical one-sided penalty methods, along with primal-dual methods gated by complementary slackness, provide constraint gradients only at violated locations, resulting in fragile satisfaction. Architectures that guarantee feasibility by construction, on the other hand, remain largely limited to elementary cases and impose additional inductive biases. We introduce neural slack variables, a deep learning native primal-side approach that converts constraint enforcement into a regression problem by coupling the primary network with a jointly learned auxiliary network. The auxiliary network serves as a valid target for the primary network's constraint quantities, inducing feasibility and regularity. Neural slack variables achieve zero measured violations on dense-grid monotonicity and convexity test cases, where penalty and primal-dual baselines leave residual violations, and enable arbitrage-free learning of volatility surfaces, an open industrial challenge in quantitative finance.

2606.13801 2026-06-15 cs.LG q-bio.NC New submission

Neural Variability Enhances Artificial Network Robustness

Robin Preble, Praveen Venkatesh, Stefan Mihalas, Kameron Decker Harris

Affiliations: Department of Computer Science, Western Washington University; Allen Institute

AI summary: Studies whether structured noise that mimics cortical neural variability improves the robustness of artificial neural networks to adversarial attacks and naturalistic image modifications. Noise structure can significantly improve robustness, and noise structure derived from adversarial attacks can generalize to other attack types.

Abstract

Neural responses in cortex exhibit substantial trial-to-trial variability in response to repeated stimuli, while peripheral sensory neurons respond far more consistently, leading many to wonder whether stochasticity may carry meaning. Existing work has argued that noise and signal correlations may be optimized for discrimination in animals, whereas artificial neural network (ANN) studies have shown similar benefits of noise in machine learning tasks, although most ANN work has neglected the effects of correlations. Here we investigate whether correlated noise improves the robustness of artificial neural networks to adversarial attacks and naturalistic image modifications. Using the covariance of activations under modified versus clean inputs, we find that structured noise may significantly improve network robustness. Robustness to naturalistic image modifications benefits most from structure, but this structure transfers poorly across modification types. In contrast, noise structure from adversarial attacks can generalize to other kinds of attacks. These results suggest that structured noise in ANN activations generally improves robustness, establishing a biologically plausible strategy for creating robust artificial neural networks that only relies on local information.

2606.13795 2026-06-15 cs.LG New submission

Diffusion Policy Optimization without Drifting Apart

Haozhe Jiang, Haiwen Feng, Pieter Abbeel, Jiantao Jiao, Angjoo Kanazawa, Nika Haghtalab

Affiliations: University of California, Berkeley; Simons Institute for the Theory of Computing; Department of Electrical Engineering and Computer Sciences, University of California, Berkeley

AI summary: To address the instability of diffusion policy-gradient methods, proposes the DiPOD framework, which interleaves self-distillation with policy-improving gradient updates to maintain tight-bound behavior, yielding stable and effective policy optimization.

Comments: Project page: astro-eric.github.io/blogs/dipod/

Abstract

RL post-training has become increasingly pivotal for improving diffusion policies, but existing diffusion policy-gradient methods are often unstable and cannot achieve reliable policy improvement. We identify the cause as the double-drift phenomenon: optimizing a variational surrogate can let the ELBO separate from the true log-likelihood, which then makes the resulting proxy policy gradient misaligned with the true policy gradient of expected return. We propose DiPOD, a diffusion policy optimization framework that maintains tight-bound behavior throughout training by interleaving self-distillation with policy-improving gradient updates. This leads to a simple and practical algorithm: augmenting each diffusion policy-gradient update with an on-policy ELBO regularizer. Across diffusion language model post-training and continuous-control diffusion policies, DiPOD substantially stabilizes training and reaches higher rewards than previous methods.

2606.13768 2026-06-15 cs.CV cs.AI New submission

CineOrchestra: Unified Entity-Centric Conditioning for Cinematic Video Generation

Sharath Girish, Tsai-Shien Chen, Zhikang Dong, Mukesh Singhal, Hao Chen, Sergey Tulyakov, Aliaksandr Siarohin

Affiliations: Snap Inc.; UC Merced

AI summary: Proposes CineOrchestra, a video diffusion model that jointly controls subjects, events, cameras, and shot transitions through entity-centric conditioning primitives and parameter-free rotary position embeddings, and outperforms six specialized methods on dense caption following and shot-transition timing.

Comments: Project page: https://snap-research.github.io/CineOrchestra

Abstract

Cinematic video depicts multiple subjects acting or interacting at specific moments, captured with deliberate camera movement, and stitched together by shot transitions. Together, these elements demand a level of fine-grained control beyond current text-to-video models. Existing work addresses each axis in isolation: multi-subject personalization, temporal control, multi-shot synthesis, or camera control; no prior framework jointly integrates all four. We present CineOrchestra, a unified video diffusion model that controls subjects, events, cameras, and shot transitions simultaneously. Our key insight is that these heterogeneous cinematic elements share a fundamental structure: each is an entity acting over a specific temporal interval, so all of them can be expressed through one shared structure of entity-centric conditioning primitives, augmented with reference images for visual entities. This formulation reduces the architectural challenge to a single positional encoding problem, which we solve with two parameter-free coordinated rotary embeddings: (a) an interval-sampled temporal RoPE that yields consistent attention behavior across events of dramatically varying duration, and (b) a 2D entity-temporal cross-attention RoPE that disambiguates per-entity conditions and routes each to its corresponding spatiotemporal region. On two new benchmarks, CineOrchestra outperforms six per-axis specialists on dense caption following and shot-transition timing, with consistent gains in a pairwise user study and component ablations.

2606.13767 2026-06-15 cs.LG cs.AI cs.IT math.IT 新提交

Beyond LoRA: Is Sparsity-Induced Adaptation Better?

超越LoRA:稀疏诱导的适应更好吗?

Elijah Cadenhead, Cristian McGee, Xin Li, El Houcine Bergou, Aritra Dutta

发表机构 * School of Data, Mathematical and Statistical Sciences, University of Central Florida, United States(中佛罗里达大学数据、数学与统计科学学院) College of Computing, Mohammed VI Polytechnic University (UM6P), Morocco(穆罕默德六世理工大学计算机学院) Department of Computer Science, University of Central Florida, United States(中佛罗里达大学计算机科学系)

AI总结 本文提出Cheap LoRA (cLA)及其变体,通过在LoRA中引入稀疏性实现参数高效微调,理论推导泛化误差界,实验表明在多种任务上性能与参数匹配基线相当,同时减少训练时间和峰值GPU内存。

Comments Overview of the paper and code can be found here: https://elicaden.github.io/Beyond_LoRA/

详情
AI中文摘要

低秩适应(LoRA)及其变体为预训练模型的全微调提供了一种内存和计算高效的替代方案。然而,关于这些方法的比较泛化能力以及低秩更新的结构限制如何保持有效适应性能的问题仍然存在。我们提出了一个历史框架,涵盖过去(全微调和原始LoRA)和现在(LoRA的不同变体),并通过在现有LoRA变体中引入稀疏性,提出了更简单、更便宜、参数高效的扩展:Cheap LoRA (cLA),训练单个低秩因子而固定另一个(确定性地或在其随机变体中随机地),以及链式循环变体${c}^3$LA。我们将cLA视为非对称LoRA的结构化实例,作为全微调的受控列子空间限制。我们推导了这些变体的信息论泛化误差界,这是该领域的首批尝试之一。在实验上,我们评估了10个预训练模型和14个数据集上的11种微调方法,使用损失景观和谱分析等工具分析了微调模型的性能和泛化能力。尽管微调模型对预训练模型、数据集和其他因素敏感,但我们的研究表明,将基于LoRA的PEFT方法的适应限制在稀疏、结构化的列空间上,在各项任务上与其参数匹配的基线相比仍然具有竞争力,同时即使使用朴素、非优化的稀疏实现,也能减少高达10%的训练时间和高达15%的峰值GPU内存。我们的理论和实验泛化度量为其成本效益适应提供了比常用分析工具更一致和原则性的方法。概述和代码可在以下网址获取:https://elicaden.github.io/Beyond_LoRA/。

英文摘要

Low-rank adaptation (LoRA) and its variants provide a memory- and compute-efficient alternative to full fine-tuning of pre-trained models. However, questions remain about the comparative generalizability of these approaches and how the structural restrictions on low-rank updates preserve effective adaptation performance. We present a historical framing covering the past (full fine-tuning and original LoRA) and the present (different variants of LoRA), and propose simpler, cheaper, parameter-efficient extensions by inducing sparsity within existing LoRA variants: Cheap LoRA (cLA), which trains a single low-rank factor with the other fixed (deterministically or, in its randomized variant, stochastically), and the chained circulant variant, ${c}^3$LA. We frame cLA as a structured instance of asymmetric LoRA, serving as a controlled column-subspace restriction of full fine-tuning. We derive information-theoretic generalization error bounds for these variants, marking one of the first endeavors in this area. Empirically, we evaluate 11 fine-tuning methods across 10 pre-trained models and 14 datasets, analyzing the fine-tuned models' performance and generalization using tools such as loss landscapes and spectral analysis. Despite the sensitivity of fine-tuned models to the pre-trained model, datasets, and other factors, our study suggests that restricting LoRA-based PEFT methods' adaptation to a sparse, structured column space remains competitive across tasks with their parameter-matched baselines while reducing training time by up to 10% and peak GPU memory by up to 15%, even with a naïve, non-optimized, sparse implementation. Our theoretical and empirical generalization measures provide a more consistent and principled approach to their cost-effective adaptation than commonly used analytical tools. Overview and code are available at: https://elicaden.github.io/Beyond_LoRA/.
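
The "train one low-rank factor, freeze the other" idea can be sketched on a toy linear model. This is only an illustration under stated assumptions, not the authors' implementation: which factor is frozen (here `B`, so the update `B @ A` stays inside the column space of `B`), the random initialization, the least-squares objective, and the helper names `loss` and `train_A_only` are all hypothetical.

```python
import numpy as np

def loss(W0, B, A, X, Y):
    """Half mean squared error of the adapted linear map W0 + B @ A."""
    err = X @ (W0 + B @ A).T - Y
    return 0.5 * np.mean(np.sum(err ** 2, axis=1))

def train_A_only(W0, B, A, X, Y, lr=0.1, steps=500):
    """Plain gradient descent on A; W0 and B are never updated (assumed setup)."""
    for _ in range(steps):
        err = X @ (W0 + B @ A).T - Y        # residuals, shape (n, d_out)
        grad_A = B.T @ err.T @ X / len(X)   # gradient of the loss w.r.t. A, shape (r, d_in)
        A = A - lr * grad_A
    return A

rng = np.random.default_rng(0)
d_in, d_out, r, n = 8, 16, 2, 64
W0 = rng.normal(size=(d_out, d_in))                # "pre-trained" weights, frozen
B = rng.normal(size=(d_out, r)) / np.sqrt(d_out)   # frozen random factor
A0 = np.zeros((r, d_in))                           # trainable factor, starts at zero
X = rng.normal(size=(n, d_in))
A_true = rng.normal(size=(r, d_in))
Y = X @ (W0 + B @ A_true).T                        # target reachable inside span(B)

loss_before = loss(W0, B, A0, X, Y)
A = train_A_only(W0, B, A0, X, Y)
loss_after = loss(W0, B, A, X, Y)
```

Only `r * d_in` numbers are trained here, compared with `r * (d_in + d_out)` when both factors are updated, which is the kind of parameter saving the abstract refers to.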

2606.13756 2026-06-15 cs.CL 新提交

QIAS 2026: Overview of the Shared Task on Islamic Inheritance Reasoning

QIAS 2026:伊斯兰继承推理共享任务综述

Abdessalam Bouchekif, Somaya Eltanbouly, Samer Rashwani, Shahd Gaben, Mutaz Al-Khatib, Heba Sbahi, Emad Mohamed, Mohammed Ghaly

发表机构 * Hamad bin Khalifa University(哈马德·本·哈利法大学) Nazarbayev University(纳扎尔巴耶夫大学)

AI总结 本文介绍 QIAS 2026 共享任务,评估大语言模型在伊斯兰继承领域的复杂推理能力,基于 MAWARITH 数据集,使用 MIR-E 多步指标评估,16 支队伍参与,结果显示该任务对当前模型极具挑战性。

详情
AI中文摘要

本文全面概述了 QIAS 2026 共享任务,该任务作为 OSACT7 研讨会的一部分组织,并与 LREC 2026 联合举办。该共享任务旨在评估大语言模型在伊斯兰继承这一宗教和法律领域进行复杂推理的能力。与传统的问答基准不同,QIAS 2026 侧重于从自然语言案例进行端到端推理,要求系统完成完整的继承计算过程,从识别合格继承人到为每位受益人分配正确份额。为支持此评估,任务基于 MAWARITH 基准,这是一个包含 12,500 个阿拉伯语继承案例的数据集,并标注了中间推理步骤和最终答案。系统提交使用 MIR-E 进行评估,这是一个多步指标,衡量继承推理主要阶段的性能。共有 16 支队伍参与共享任务,研究了多种方法,包括基于提示的方法、检索增强生成和微调策略。结果表明,伊斯兰继承对当前语言模型来说仍然是一个极具挑战性的基准,尤其是在需要精确法律解释和结构化数值推理的阶段。本概述总结了任务设计、数据集、评估框架、参与系统和主要结果。

英文摘要

This paper presents a comprehensive overview of the QIAS 2026 shared task, organized as part of the OSACT7 Workshop and co-located with LREC 2026. The shared task was designed to evaluate the ability of large language models to perform complex reasoning in the religious and legal domain of Islamic inheritance. Unlike conventional question-answering benchmarks, QIAS 2026 focuses on end-to-end reasoning from natural language cases, requiring systems to perform the full inheritance calculation process, from identifying the eligible heirs to assigning the correct share to each beneficiary. To support this evaluation, the task was based on the MAWARITH benchmark, a dataset of 12,500 Arabic inheritance cases annotated with intermediate reasoning steps and final answers. System submissions were evaluated using MIR-E, a multi-step metric that measures performance across the main stages of inheritance reasoning. A total of 16 teams participated in the shared task, investigating a range of approaches, including prompting-based methods, retrieval-augmented generation, and fine-tuning strategies. The results show that Islamic inheritance remains a highly challenging benchmark for current language models, especially in stages that require precise legal interpretation and structured numerical reasoning. This overview summarizes the task design, dataset, evaluation framework, participating systems, and main results.

2606.13754 2026-06-15 cs.LG 新提交

D2H-AD: A Hybrid Model Utilizing Hyperdimensional Computing for Advanced Anomaly Detection

D2H-AD:一种利用超维度计算进行高级异常检测的混合模型

Ghazal Ghajari, Elaheh Ghajari, Ashutosh Ghimire, Saeid Ataei, Faris Alsulami, Fathi Amsaad

发表机构 * Wright State University(莱特州立大学) Azad University(阿扎德大学) Stevens Institute of Technology(史蒂文斯理工学院) University of Jeddah(吉达大学)

AI总结 提出基于超维度计算的异常检测框架D2H-AD,通过距离相似性和密度感知编码统一表示,在多个基准数据集上优于现有方法,具有轻量、可解释和高效的特点。

详情
AI中文摘要

异常检测是智能系统的基本组成部分,应用于医疗、网络安全、智能电网和物联网环境。尽管传统的机器学习和深度学习方法在识别异常方面表现出有效性,但它们通常依赖大量标记数据集、计算成本高,并在边缘和高维场景中面临可扩展性挑战。本文提出D2H-AD,一种基于超维度计算(HDC)的新型异常检测框架,HDC是一种受大脑启发的范式,使用高维分布式向量表示信息。与现有基于HDC的方法不同,D2H-AD在统一框架中集成了基于距离的相似性和密度感知编码,改进了异常表示和检测性能。消融研究表明,仅超维度编码相比直接在原始特征空间应用相同的密度-距离评分,ROC-AUC提升高达5.4%。此外,D2H-AD在所有评估数据集上始终优于五个基线方法:HDAD、ODHD、一类SVM、孤立森林和自编码器。该框架轻量、可解释且计算高效,适用于资源受限和实时应用。我们在五个基准数据集上验证了D2H-AD,展示了优越的F1分数和ROC-AUC性能,以及对类别不平衡、噪声和数据复杂性的鲁棒性。除了提高准确性,D2H-AD还提供可扩展性、小内存占用和低延迟操作,这得益于二进制计算和紧凑设计。这些特性使其特别适用于TinyML和边缘AI部署。所提出的框架突显了HDC在动态环境中进行准确、可解释和节能异常检测的潜力。

英文摘要

Anomaly detection is a fundamental component of intelligent systems with applications in healthcare, cybersecurity, smart grids, and IoT environments. Although conventional machine learning and deep learning methods have demonstrated effectiveness in identifying anomalies, they often rely on large labeled datasets, incur high computational costs, and face scalability challenges in edge and high-dimensional settings. This paper presents D2H-AD, a novel anomaly detection framework based on Hyperdimensional Computing (HDC), a brain-inspired paradigm that represents information using high-dimensional distributed vectors. Unlike existing HDC-based methods, D2H-AD integrates distance-based similarity and density-aware encoding within a unified framework, improving anomaly representation and detection performance. Ablation studies show that hyperdimensional encoding alone yields up to 5.4% higher ROC-AUC than applying the same density-distance scoring directly in the original feature space. Furthermore, D2H-AD consistently outperforms five established baselines, namely HDAD, ODHD, One-Class SVM, Isolation Forest, and Autoencoders, across all evaluated datasets. The framework is lightweight, interpretable, and computationally efficient, making it suitable for resource-constrained and real-time applications. We validate D2H-AD on five benchmark datasets and demonstrate superior F1-score and ROC-AUC performance, together with robustness to class imbalance, noise, and data complexity. In addition to improved accuracy, D2H-AD offers scalability, a small memory footprint, and low-latency operation enabled by binary computations and a compact design. These properties make it particularly attractive for TinyML and edge AI deployments. The proposed framework highlights the potential of HDC for accurate, interpretable, and energy-efficient anomaly detection in dynamic environments.
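
For readers unfamiliar with hyperdimensional computing, the sketch below shows one generic HDC pattern: encode features as bipolar hypervectors with a random projection, bundle the normal samples into a prototype, and score new points by similarity. It is not D2H-AD's distance-and-density scheme, which the abstract does not specify; the dimension `D`, the projection `P`, and all function names are assumptions made for illustration.

```python
import numpy as np

def encode(X, P):
    """Map feature rows to bipolar (+1/-1) hypervectors via a random projection."""
    return np.where(X @ P >= 0, 1, -1)

def prototype(H):
    """Bundle hypervectors by element-wise majority vote."""
    return np.where(H.sum(axis=0) >= 0, 1, -1)

def similarity(H, proto):
    """Normalized dot product in [-1, 1]; lower values look more anomalous."""
    return H @ proto / proto.size

rng = np.random.default_rng(0)
D = 2048                                  # hypervector dimension (assumed)
P = rng.normal(size=(5, D))               # fixed random projection (assumed)
mu = np.full(5, 3.0)
normal = mu + 0.3 * rng.normal(size=(100, 5))   # synthetic "normal" samples

proto = prototype(encode(normal, P))
inlier_score = similarity(encode(mu[None, :], P), proto)[0]
outlier_score = similarity(encode(-mu[None, :], P), proto)[0]
```

Because every operation after encoding works on +1/-1 vectors, the scoring step reduces to sign comparisons and counting, which is the property that makes HDC attractive for the edge deployments the abstract mentions.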

2606.13753 2026-06-15 cs.LG cs.AI 新提交

The Weight Norm Sets the Grokking Timescale: A Causal Delay Law

权重范数设定“顿悟”时间尺度:因果延迟定律

Truong Xuan Khanh, Doan Hoang Viet, Luu Duc Trung, Phan Thanh Duc

发表机构 * H&K Research Studio / Clevix LLC(H&K研究工作室 / Clevix有限责任公司) Bac A Bank(北亚银行) Banking Academy of Vietnam(越南银行学院)

AI总结 通过干预训练中权重范数,发现网络在范数达到临界值Wc时发生顿悟,且延迟时间与固定范数倍数呈指数关系,揭示了范数对顿悟的因果作用。

Comments 14 pages, 9 figs and 3 tables

详情
AI中文摘要

“顿悟”是神经网络中泛化能力的延迟出现,远在模型拟合训练数据之后才发生。权重范数是否导致这种延迟存在争议:一些研究报告了转变时的临界范数,另一些则观察到没有固定范数的顿悟。我们通过在训练过程中干预范数而非仅观察它来解决这一问题。在带权重衰减的自由训练下,当权重范数达到一个跨种子和学习率变化很小(变异系数1%至2%)且随模数基按幂律增长的值Wc时,网络发生顿悟。当我们转而将范数固定为Wc的某个倍数ρ并保持该值时,网络仍然顿悟,但延迟遵循T_grok ∝ exp(α ρ)。一个指数α≈7.5拟合了四个模数下的延迟(R²=0.996)。在扫描范围内,固定范数使延迟变化约19倍,而学习率仅变化约2倍,且将范数保持在Wc以上会减慢而非阻止顿悟。最后的LayerNorm通过解耦权重尺度与网络函数消除了这种依赖;没有它,指数定律重新出现。这种固定范数的延迟是指数对应物,对应于自由收缩范数所预测的对数延迟。

英文摘要

Grokking is the delayed onset of generalization in neural networks, arising long after they fit the training data. Whether the weight norm causes this delay is disputed: some studies report a critical norm at the transition, others observe grokking with no fixed norm at all. We settle this by intervening on the norm during training rather than only observing it. Under free training with weight decay, networks grok when the weight norm reaches a value Wc that varies little across seeds and learning rates (CV 1 to 2 percent) and grows with the modular base as a power law. When we instead clamp the norm to a fixed multiple rho of Wc and hold it there, the network still groks, but the delay follows T_grok proportional to exp(alpha rho). One exponent, alpha near 7.5, fits this delay across four moduli (R^2 = 0.996). Over the swept ranges the held norm moves the delay by about 19x and the learning rate by only about 2x, and holding the norm above Wc slows grokking rather than preventing it. A final LayerNorm removes the dependence by decoupling weight scale from the network function; without it the exponential law returns. This pinned-norm delay is the exponential counterpart to the logarithmic delay predicted for a freely contracting norm.
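
The delay law stated in the abstract can be written out as a short worked relation. This is a reading of the abstract only: the proportionality constant is unspecified, and the step relating the 19x figure to a range of the clamp multiple assumes the law holds exactly across the sweep, which the abstract does not state.

```latex
\[
T_{\mathrm{grok}}(\rho) \propto \exp(\alpha \rho)
\quad\Longrightarrow\quad
\frac{T_{\mathrm{grok}}(\rho_2)}{T_{\mathrm{grok}}(\rho_1)}
  = \exp\!\bigl(\alpha(\rho_2 - \rho_1)\bigr).
\]
With $\alpha \approx 7.5$, a $19\times$ change in delay would correspond to
\[
\rho_2 - \rho_1 = \frac{\ln 19}{\alpha} \approx \frac{2.94}{7.5} \approx 0.39 .
\]
```

Under this reading, a change of roughly 0.4 in the clamp multiple $\rho$ is enough to move the delay by more than an order of magnitude, which is consistent with the abstract's point that the held norm matters far more than the learning rate.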

2606.13748 2026-06-15 cs.LG 新提交

FedSPC: Shared Parameter Correction for Personalized Federated Learning

FedSPC:个性化联邦学习的共享参数校正

Kannanthodath Induchoodan Ajay Menon, Christian Prehofer, Yunfei Xu, Toru Hirano

发表机构 * DENSO AUTOMOTIVE Deutschland GmbH(电装汽车德国有限公司) DENSO International America, Inc.(电装国际美国公司) Technical University of Munich(慕尼黑工业大学)

AI总结 针对个性化联邦学习中共享参数因客户端局部目标不一致而更新冲突的问题,提出模块化校正方法FedSPC,仅对共享参数应用控制变量校正,在多种PFL设置下提升性能。

Comments Accepted for presentation at FL@FM-IJCAI'26, in conjunction with IJCAI 2026. 9 pages

详情
AI中文摘要

个性化联邦学习(PFL)是联邦学习中解决统计异质性的重要方法之一,同时支持客户端特定的适应。许多PFL方法将模型拆分为共享参数和个性化参数,并在每个客户端上联合训练。然而,这产生了一个优化问题:共享参数由优化不同局部目标的客户端更新,可能导致共享更新不一致并削弱共享表示。为解决此问题,我们提出联邦共享参数校正(FedSPC),一种用于PFL的模块化校正方法。FedSPC仅对给定PFL方法的共享参数应用控制变量校正,而保持个性化参数不变。它可以集成到三种常见的PFL设置中:共享特征提取器、共享分类器以及带有局部正则化的完全共享模型。在CIFAR-100和Tiny-ImageNet上使用ViT、ResNet-34和VGG-11的实验表明,FedSPC提高了代表性PFL方法(包括FedPer、FedRep、FedBABU、LG-FedAvg和Ditto)的性能。

英文摘要

Personalized federated learning (PFL) is one of the important approaches in federated learning for addressing statistical heterogeneity while enabling client-specific adaptation. Many PFL methods split the model into shared and personalized parameters, which are jointly trained on each client. However, this creates an optimization issue: shared parameters are updated by clients optimizing different local objectives, which can lead to inconsistent shared updates and weaken the shared representation. To address this problem, we propose Federated Shared Parameter Correction (FedSPC), a modular correction method for PFL. FedSPC applies control-variate correction only to the shared parameters of a given PFL method, while leaving personalized parameters unchanged. It can be integrated into three common PFL settings: shared feature extractors, shared classifiers, and fully shared models with local regularization. Experiments on CIFAR-100 and Tiny-ImageNet with ViT, ResNet-34, and VGG-11 show that FedSPC improves performance across representative PFL methods, including FedPer, FedRep, FedBABU, LG-FedAvg, and Ditto.
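
The abstract does not give FedSPC's update rule, so the sketch below assumes a SCAFFOLD-style control-variate correction (local gradient minus the client control variate plus the global one) and applies it to the shared parameters only. The function name `local_step` and all values are hypothetical.

```python
import numpy as np

def local_step(shared, personal, grad_shared, grad_personal,
               c_local, c_global, lr=0.1):
    """One client step: corrected update for shared params, plain SGD for personal ones."""
    # Assumed SCAFFOLD-style correction, applied to the shared block only.
    shared_new = shared - lr * (grad_shared - c_local + c_global)
    # Personalized parameters are left uncorrected, as the abstract describes.
    personal_new = personal - lr * grad_personal
    return shared_new, personal_new

shared = np.array([1.0, 2.0])
personal = np.array([0.5])
s, p = local_step(
    shared, personal,
    grad_shared=np.array([0.2, -0.4]),
    grad_personal=np.array([1.0]),
    c_local=np.array([0.2, -0.4]),   # client control variate equals its own gradient
    c_global=np.array([0.0, 0.0]),   # global control variate is zero
)
```

In this toy case the client's gradient is entirely explained by its own control variate while the global one is zero, so the shared parameters do not move and only the personalized parameter is updated. That is the intended effect of the correction: client-specific drift is cancelled on the shared block.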

2606.13746 2026-06-15 cs.RO 新提交

Scalable Dynamic Tactile Sensing Enabled by Passive and Flexible Acoustic Waveguides

可扩展动态触觉传感:基于被动柔性声波导

Guimin Long, Changhong Linghu, Chuanping Liu, Ke Xu, Xingjian Jing

发表机构 * Department of Mechanical Engineering, City University of Hong Kong(香港城市大学机械工程系)

AI总结 提出一种基于深亚波长声波导的被动分布式触觉传感范式,通过弹性膜帽亥姆霍兹谐振器和弹簧增强微管网络实现弯曲不变性,结合稀疏麦克风阵列与轻量神经网络,在4个麦克风64节点阵列中实现4mm空间分辨率和>99%定位精度,支持低频信号波形重建,并展示指尖阵列、触觉手套和大面积皮肤等原型。

Comments 40 pages, 6 figures

详情
AI中文摘要

人工动态触觉传感需要灵敏度、鲁棒性和柔顺性,但现有技术在大面积阵列扩展时面临权衡,加上布线复杂性和成本。本文报告了一种使用深亚波长声波导的被动分布式范式,将性能与结构柔性解耦。弹性膜帽封装的亥姆霍兹谐振器由弹簧增强微管互连,形成封闭网络,在宏观弯曲下保持声学传输不变。通过稀疏嵌入麦克风,系统实现了低频信号(<100 Hz)的实时定位(4 mm最高空间分辨率;4个麦克风64节点传感阵列中准确率>99%)和波形重建。快速连续小波变换和轻量神经网络可在5.5 ms内完成推理。我们展示了适形原型——指尖阵列、触觉手套和大面积皮肤——可检测从单根头发接触到5 mg颗粒撞击、动脉脉搏波、羽毛触摸和手指接触的刺激。这为下一代人机界面建立了一种可扩展、灵活、低成本的范式。

英文摘要

Artificial dynamic tactile sensing requires sensitivity, robustness, and compliance, yet existing technologies face trade-offs when scaling to large-area arrays, compounded by wiring complexity and cost. Here, we report a passive distributed paradigm using deep sub-wavelength acoustic waveguides that decouples performance from structural flexibility. Elastic-membrane-capped Helmholtz resonators interconnected by spring-reinforced microtubes form an enclosed network with invariant acoustic transmission under macroscopic bending. By sparsely embedding microphones, the system achieves real-time localization (best spatial resolution of 4 mm; >99% accuracy in a 64-node sensing array with 4 microphones) and waveform reconstruction of low-frequency signals (<100 Hz). Fast Continuous Wavelet Transform and a lightweight neural network enable inference within 5.5 ms. We demonstrate conformable prototypes (fingertip arrays, a tactile glove, and large-area skins) that detect stimuli ranging from single-hair contact to 5-mg particle impacts, arterial pulse waves, feather touches, and finger contact. This establishes a scalable, flexible, low-cost paradigm for next-generation human-machine interfaces.

2606.13742 2026-06-15 cs.LG cs.AI physics.comp-ph physics.flu-dyn stat.ML 新提交

A fully GPU-based workflow for building physics emulators of hypersonic flows

基于全GPU工作流构建高超声速流物理仿真器

Fabian Paischer, Dylan Rubini, Deniz A. Bezgin, Aaron B. Buhendwa, David Hauser, Florian Sestak, Johannes Brandstetter, Sebastian Kaltenbach, Nikolaus A. Adams

发表机构 * TU Munich(慕尼黑工业大学) Institute for Machine Learning, JKU Linz(林茨约翰·开普勒大学机器学习研究所) ELLIS Unit(ELLIS单元) EMMI AI

AI总结 提出全GPU工作流,集成加速数据生成与不确定性量化增强的神经仿真器训练,通过可微求解器JAX-Fluids实现残差驱动改进,提升物理一致性并支持外推。

Comments First authors contributed equally

详情
AI中文摘要

以高保真度和低计算成本解析复杂物理现象的能力是解决现代工程关键挑战的核心。一个典型例子是高超声速流,其中精确预测全流场拓扑,特别是激波位置和强度,至关重要。然而,超声速和高超声速流仍然是传统降阶模型和神经仿真器的绊脚石,这些模型难以在工业相关应用中物理一致地捕捉流态中的陡峭梯度。为此,我们引入了一个完全基于GPU的工作流,该工作流将加速数据生成与通过不确定性量化和物理感知细化增强的神经仿真器训练相结合。我们的工作流由可微高保真求解器(JAX-Fluids)实现,我们利用该求解器进行快速数据集创建和基于残差的神经仿真器改进,以增强物理一致性。在此框架基础上,我们首先提出了一系列模型架构,并分析了它们的缩放行为以揭示其优缺点。然后,我们表明基于残差的细化使得能够在仅提供网格和输入参数的情况下进行训练,显著降低残差并提高物理一致性。可微仿真和基于残差的细化共同产生了在其训练分布之外仍然可靠的物理仿真器,这是在现实工程设计循环中部署代理的关键要求。

英文摘要

The ability to resolve complex physical phenomena with high fidelity and at low computational cost is central to addressing key challenges in modern engineering. A prime example lies in hypersonic flows, where the precise prediction of the full flowfield topology, in particular with respect to shock wave location and intensity, is critical. Yet supersonic and hypersonic flows continue to be a stumbling block for traditional reduced-order models and neural emulators that struggle to capture steep gradients in flow states with physical consistency in applications of industrial relevance. To that end, we introduce a fully GPU-based workflow that integrates accelerated data generation with the training of neural emulators augmented by uncertainty quantification and physics-aware refinement. Our workflow is enabled by a differentiable high-fidelity solver (JAX-Fluids), which we employ for rapid dataset creation and residual-based improvement of the neural emulator to enhance physical consistency. Building on this framework, we first present a suite of model architectures and analyze their scaling behavior to expose their strengths and shortcomings. We then show that residual-based refinement enables training on cases where only mesh and input parameters are available, substantially reducing residuals and improving physical consistency. Together, differentiable simulation and residual-based refinement yield physics emulators that remain reliable beyond their training distribution, a key requirement for deploying surrogates in real-world engineering design loops.

2606.13741 2026-06-15 cs.LG 新提交

High-Frequency Pricing at Scale for E-Commerce

电子商务中的大规模高频定价

Stefan Birr, Tobias Huelden, Mones Raslan, Adele Gouttes, Andreas Schmitt, Mateusz Koren, Johannes Stephan, Robert Streek, Manuel Kunz, Tim Januschowski

发表机构 * Zalando SE Databricks

AI总结 提出一种预测-优化框架,结合梯度提升树与多目标优化,实现时尚电商促销活动的每日高频定价,通过23次A/B测试验证,利润提升约6%。

详情
AI中文摘要

本文介绍了针对时尚电商促销活动的一种专门的预测-优化算法定价工具的设计、开发和实施。销售活动给定价带来了独特的挑战,包括波动的需求模式、快速的定价决策以及平衡短期收入与长期盈利能力的需要。我们描述了我们的方法,该方法结合了使用梯度提升树的每日分辨率需求预测与一个多目标优化框架,该框架针对超过500万件商品同时最大化长期利润和净商品价值。我们的解决方案通过实现一个预测-优化架构,将定价决策时间从数小时缩短到数分钟,解决了现有周粒度系统的关键局限性。我们通过在2023-2024年期间在欧洲领先的在线时尚零售商Zalando的12个市场中进行的23次A/B测试验证了我们的方法。实验结果表明,与之前的手动-算法混合方法相比,新的定价系统在保持同等销售和收入表现的同时,实现了约6%的更高利润。基于这些结果,该算法已成功部署到生产环境,现在负责公司促销活动中的大部分算法定价决策。

英文摘要

This paper presents the design, development, and implementation of a specialized forecast-then-optimize algorithmic pricing tool for sales campaigns in fashion e-commerce. Sales events present unique challenges for pricing, including volatile demand patterns, rapid pricing decisions, and the need to balance short-term revenue with long-term profitability. We describe our approach, which combines daily-resolution demand forecasting using gradient-boosted trees with a multi-objective optimization framework that maximizes both long-term profit and net merchandise value for more than 5 million articles. Our solution addresses key limitations of existing weekly-granularity systems by implementing a forecast-then-optimize architecture that reduces pricing decision time from hours to minutes. We validate our approach through 23 A/B tests across 12 markets during 2023-2024 sales campaigns at Zalando, one of Europe's leading online fashion retailers. Experimental results demonstrate that the new pricing system achieves approximately 6% higher profit while maintaining equivalent performance on sales and revenue compared to the previous manual-algorithmic hybrid approach. Based on these results, the algorithm was successfully deployed to production and now handles the majority of algorithmic pricing decisions for sales campaigns at the company.

2606.13740 2026-06-15 cs.LG 新提交

Efficient On-Device Diffusion LLM Inference with Mobile NPU

基于移动NPU的高效设备端扩散大语言模型推理

Tuowei Wang, Yanfan Sun, Ju Ren

发表机构 * Tsinghua University(清华大学) Beihang University(北京航空航天大学)

AI总结 提出首个NPU感知推理框架llada.cpp,通过多块推测解码、双路径渐进修订和交换优化内存运行时,在移动设备上加速扩散大语言模型推理,相比CPU基线实现17-42倍延迟降低。

详情
AI中文摘要

扩散大语言模型(dLLM)通过并行去噪多个token来加速生成,使其适用于延迟敏感的移动端推理。然而,重复去噪在智能手机上引入了大量计算。移动神经处理单元(NPU)提供高吞吐量的密集矩阵计算,但高效利用它们仍然具有挑战性:token提交缩小了每块的有效工作负载,token修订使KV缓存重用复杂化,且NPU可见地址空间有限导致昂贵的重映射和数据传输开销。在本文中,我们提出了llada.cpp,这是首个用于在智能手机上加速dLLM的NPU感知推理框架。llada.cpp通过三种技术将块级dLLM推理与移动NPU的执行特性对齐。(1)多块推测解码用推测的未来块token填充当前块解码后期阶段缩小的负载。(2)双路径渐进修订使已提交的token在稳定前保持可修订,并通过CPU侧路径刷新不稳定token,而不会阻塞密集的NPU执行。(3)交换优化内存运行时压缩NPU可见地址布局,并将数据准备与NPU计算重叠,以减少重映射和传输开销。我们将llada.cpp实现为端到端框架,并在多种硬件平台和dLLM工作负载上进行评估。llada.cpp在保留生成质量的同时,将LLaDA-8B的生成延迟比使用前缀KV缓存重用的CPU基线降低了17倍至42倍。

英文摘要

Diffusion large language models (dLLMs) accelerate generation by denoising multiple tokens in parallel, making them attractive for latency-sensitive mobile inference. However, repeated denoising introduces substantial computation on smartphones. Mobile neural processing units (NPUs) offer high-throughput dense matrix computation, but efficiently exploiting them remains challenging: token commitment shrinks per-block effective workloads, token revision complicates KV cache reuse, and limited NPU-visible address space incurs costly remapping and data transfer overheads. In this paper, we propose llada.cpp, the first NPU-aware inference framework for accelerating dLLMs on smartphones. llada.cpp aligns block-wise dLLM inference with the execution characteristics of mobile NPUs through three techniques. (1) Multi-Block Speculative Decoding fills the shrinking workload in late-stage current-block decoding with speculative future-block tokens. (2) Dual-Path Progressive Revision keeps committed tokens revisable until stable and refreshes unstable tokens through a CPU-side path without stalling dense NPU execution. (3) Swap-Optimized Memory Runtime compacts NPU-visible address layouts and overlaps data staging with NPU computation to reduce remapping and transfer overheads. We implement llada.cpp as an end-to-end framework and evaluate it across diverse hardware platforms and dLLM workloads. llada.cpp reduces LLaDA-8B generation latency by 17x-42x over the CPU baseline with prefix KV cache reuse, while preserving generation quality.

2606.13736 2026-06-15 cs.CV 新提交

Connections Between Pairs of Filters Improve the Accuracy of Convolutional Neural Networks

滤波器对之间的连接提高卷积神经网络的准确性

Kathleen Anderson, Philipp Grüning, Erhardt Barth

发表机构 * GitHub

AI总结 本文提出在卷积神经网络中引入可学习的滤波器对连接函数,替代传统点式非线性激活,通过在不同层自适应调整连接方式提升网络性能。

Comments IJCNN 2023

详情
AI中文摘要

尽管研究人员不断为CNN寻找新的改进网络结构,但大多数新发明的架构仍然依赖于堆叠卷积块并用点式激活函数分隔的传统模式。然而,纯粹基于点式非线性的网络存在缺陷。一种替代方案是在网络的两个滤波器之间引入成对连接。典型的连接函数使用乘法或最小值操作来实现逻辑AND连接。在本文中,我们进一步证明CNN可以从更通用的连接中受益,这些连接包含可学习的参数。通过这样的参数,网络能够在不同的网络层实现不同的连接,并更好地使连接函数适应手头的任务。

英文摘要

While researchers continue to find new and improved network structures for CNNs, most of the newly invented architectures still rely on the traditional pattern of stacking convolutional blocks and separating them with pointwise activation functions. However, there are drawbacks to a network purely building on pointwise nonlinearities. One alternative is to introduce a pairwise connection between two filters of a network. Typical connection functions use multiplications or the minimum operation to realize logical AND connections. In this paper, we go one step further by demonstrating that CNNs can benefit from more general connections, which include parameters that are learned. With such parameters, the network is able to implement different connections in different network layers and better adapt the connection function to the task at hand.

2606.13734 2026-06-15 cs.AI 新提交

AI Receptivity or AI Adoption Breadth? A Tool-Specific Reanalysis of the Lower-Literacy/Higher-Usage Link

AI 接受度还是 AI 采用广度?对低素养/高使用率关联的工具特定再分析

Hristo Inouzhe

发表机构 * Universidad Autónoma de Madrid(马德里自治大学)

AI总结 本文重新分析 Tully 等人(2025)的研究,发现 AI 素养与 AI 使用之间的负相关关系因工具类型而异,低素养仅预测非文本 AI 工具的采用广度而非使用强度。

Comments 11 pages, 2 tables, 1 figure

详情
AI中文摘要

Tully、Longoni 和 Appel(2025)最近报告的证据表明,较低的人工智能(AI)素养预示着对 AI 更高的接受度。我们使用该文章研究 3 的公开数据重新审视这一主张,该数据以五点频率量表测量了过去对五类 AI 工具的使用情况。我们首先通过 OLS 对参与者水平平均值、二元 logit、有序 logit 和多项 logit 规范,再现了 AI 素养与总体 AI 使用之间的负相关关系。然后,我们表明总体关系掩盖了按工具类型划分的显著异质性。在我们调整了人口统计变量的主要规范中,AI 素养不能显著预测文本 AI 使用(有序 logit β = -0.090,p = .387),而它仍然是非文本 AI 采用的强预测因子(β = -0.377,p < .001)。非文本效应在 Tully 等人原始研究 3 的控制规范下也是稳健的(β = -0.502,p < .001)。二元、有序 logit 和多项规范表明,非文本关系主要是一种采用/非采用模式,而非密集使用的证据:调整人口统计变量后,曾经使用过非文本 AI 工具的比值比为 0.68。因此,在测量自我报告过去使用而非陈述偏好的研究中,证据不支持简单的说法,即较低的 AI 素养预示着对 AI 总体上更高的接受度。它反而指向一个更狭窄的模式,即在渗透率较低的非文本 AI 工具中更广泛的采用。

英文摘要

Recent evidence reported by Tully, Longoni, and Appel (2025) suggests that lower artificial intelligence (AI) literacy predicts greater receptivity toward AI. We revisit this claim using the public data from Study 3 of that article, which measures past usage of five AI tool categories on a five-point frequency scale. We first reproduce the negative association between AI literacy and aggregate AI usage using OLS on participant-level averages, binary logit, ordered logit, and multinomial logit specifications. We then show that the aggregate relationship masks substantial heterogeneity by tool type. In our demographic-adjusted primary specification, AI literacy does not significantly predict text AI usage (ordered-logit $\beta$ = -0.090, p = .387), whereas it remains a strong predictor of non-text AI adoption ($\beta$ = -0.377, p < .001). The non-text effect is also robust under Tully et al.'s original Study 3 control specification ($\beta$ = -0.502, p < .001). Binary, ordered-logit, and multinomial specifications suggest that the non-text relationship is primarily an adoption/non-adoption pattern rather than evidence of intensive use: the demographic-adjusted odds ratio of ever having used a non-text AI tool is 0.68. Thus, in the study that measures self-reported past usage rather than stated preferences, the evidence does not support a simple claim that lower AI literacy predicts greater receptivity to AI in general. It points instead to a narrower pattern of broader adoption across lower-penetration, non-text AI tools.

2606.13732 2026-06-15 cs.AI 新提交

When Sample Selection Bias Precipitates Model Collapse

当样本选择偏差引发模型崩溃

Xinbao Qiao, Xianglong Du, Wei Liu, Jingqi Zhang, Peihua Mai, Meng Zhang, Yan Pang

发表机构 * University of Science and Technology of China(中国科学技术大学)

AI总结 本文研究低资源验证场景下,基于局部有偏参考分布的数据选择反而加速模型崩溃,并提出多数据孤岛协同的Wasserstein代理参考缓解多样性退化。

Comments Accepted at the 43rd International Conference on Machine Learning (ICML 2026)

详情
AI中文摘要

在合成数据上递归训练的普及可以缓解数据稀缺,但存在模型崩溃的风险,即重复训练会侵蚀分布尾部并使输出同质化。数据选择被广泛视为一种补救措施,但其可靠性关键取决于验证器使用的参考分布。我们表明,在低资源验证机制中,每个验证器仅观察到目标流形的一个小、碎片化且有偏的切片,选择本身也会变得有偏。这种情况自然出现在低资源数据孤岛中,例如医疗联盟或专有金融机构,其中原始数据无法汇集,本地参考固有地不完整。结果,选择优先保留与本地流形对齐的样本,同时剪除全局相关的尾部模式,从防止崩溃的保障转变为引发崩溃的机制。我们从理论上证明,这种孤岛选择加速了崩溃并导致幂律多样性衰减。作为一种初步缓解措施,我们在不共享原始数据的情况下,从多个数据孤岛构建Wasserstein代理参考。实证结果证实,本地参考选择在偏斜分布上失败,而协作代理参考减轻了多样性退化,表明当真实数据覆盖范围碎片化或稀缺时,递归合成数据管道需要特别谨慎。

英文摘要

The proliferation of recursive training on synthetic data can alleviate data scarcity but risks model collapse, where repeated training erodes distributional tails and homogenizes outputs. Data selection is widely viewed as a remedy, yet its reliability depends critically on the reference distribution used by the verifier. We show that in low-resource verification regimes, where each verifier observes only a small, fragmented, and biased slice of the target manifold, selection itself becomes biased. This situation naturally arises in low-resource data silos such as healthcare consortia or proprietary financial institutions, where raw data cannot be pooled and local references are inherently incomplete. As a result, selection preferentially retains samples aligned with the local manifold while pruning globally relevant tail modes, turning from a safeguard against collapse into a mechanism that precipitates it. We theoretically prove that such siloed selection accelerates collapse and induces power-law diversity decay. As an initial mitigation, we construct Wasserstein proxy references from multiple silos without sharing raw data. Empirical results confirm that local-reference selection fails on skewed distributions, whereas collaborative proxy references mitigate diversity degradation, suggesting that recursive synthetic-data pipelines require particular caution when real-data coverage is fragmented or scarce.

2606.13731 2026-06-15 cs.AI cs.MA 新提交

TwinBI: An Agentic Digital Twin for Efficient Augmented Interactions with Business Intelligence Dashboards

TwinBI:一种用于与商业智能仪表盘高效增强交互的智能数字孪生

Jisoo Jang, Wen-Syan Li

发表机构 * Graduate School of Data Science, Seoul National University(首尔大学数据科学研究生院)

AI总结 提出TwinBI框架,通过LLM代理与可执行仪表盘状态耦合,统一对话、操作、语义和溯源,提升多步分析中状态一致性,将精确匹配准确率从43.3%提升至63.3%,超时率从40%降至10%。

详情
AI中文摘要

商业智能(BI)越来越多地将仪表盘交互与基于LLM的辅助相结合,但这两种模式在多步分析中常常不同步。当用户在直接仪表盘操作和自然语言查询之间切换时,很难在过滤器、层次结构、指标和图表上下文中保持一致的分析状态。我们提出TwinBI,一种智能数字孪生框架,将基于LLM的代理系统与可执行的BI仪表盘状态耦合。TwinBI通过从统一交互日志重建的共享分析状态,统一了对话交互、仪表盘操作、语义基础和溯源追踪。它还公开了诸如模式视图、SQL、日志和/insights命令等工件,用于基于状态的分析摘要。我们通过两种互补方式评估TwinBI。在相同骨干代理的受控A/B基准测试中,与仅使用仪表盘相比,TwinBI将精确匹配准确率从43.3%提高到63.3%,部分信用准确率从48.3%提高到70.8%,并显著将超时率从40.0%降低到10.0%。在可用性研究中,参与者受益于集成的仪表盘和聊天工作流,任务准确性高,工作负载适中,对状态感知交互机制评价良好。这些结果表明,TwinBI通过将可见的仪表盘状态转化为更丰富的可操作上下文,提高了代理级别的分析可靠性和面向用户的分析支持。我们的数据集和源代码可在以下网址获取:https://github.com/simonjisu/TwinBI

英文摘要

Business intelligence (BI) increasingly combines dashboard interaction with LLM-based assistance, but these two modes often fall out of sync during multi-step analysis. As users switch between direct dashboard manipulation and natural-language queries, it becomes difficult to preserve a consistent analytical state across filters, hierarchies, metrics, and chart context. We present TwinBI, an agentic digital-twin framework that couples an LLM-based agent system with an executable BI dashboard state. TwinBI unifies conversational interaction, dashboard manipulation, semantic grounding, and provenance tracking through a shared analytical state reconstructed from a unified interaction log. It also exposes artifacts such as schema views, SQL, logs, and an /insights command for state-grounded analytical summaries. We evaluate TwinBI in two complementary ways. In a controlled A/B benchmark with the same backbone agent, TwinBI improves exact-match accuracy from 43.3% to 63.3%, partial-credit accuracy from 48.3% to 70.8%, and substantially reduces timeout rate from 40.0% to 10.0% relative to Dashboard alone. In a usability study, participants benefited from the integrated dashboard-and-chat workflow, with high task accuracy, moderate workload, and favorable ratings for state-aware interaction mechanisms. These results suggest that TwinBI improves both agent-level analytical reliability and user-facing analytical support by turning visible dashboard state into richer actionable context. Our dataset and source code are available at: https://github.com/simonjisu/TwinBI

2606.13727 2026-06-15 cs.RO 新提交

Occupancy-Grounded Room Segmentation for Hierarchical 3D Scene Graphs

基于占用空间的房间分割用于分层3D场景图

Carlos Cueto Zumaya, Iacopo Catalano, Jorge Peña-Queralta, Wallace Moreira Bessa

发表机构 * University of Turku(图尔库大学) Centre for Artificial Intelligence, Zürich University of Applied Sciences(苏黎世应用科学大学人工智能中心)

AI总结 提出一种基于占用分解的房间节点锚定方法,构建分层3D场景图,在Matterport3D数据集上相比基线方法恢复了更多房间实例。

详情
AI中文摘要

室内机器人的分层3D场景图(3DSGs)在空间尺度上组织几何和语义信息,其中房间层连接对象级感知和房间级推理。现有系统从不同的空间基板(例如,地点聚类、墙壁平面或分割输出)构建该层,因此房间节点没有在共同的几何标准上进行评估。我们提出了一种基于占用空间的3DSG管道,其中房间节点锚定到从占用分解中跟踪的自由空间区域,为每个房间提供明确的多边形足迹。我们在12个Matterport3D场景上评估该管道,通过将预测的房间多边形与标注的房间实例进行匹配,并与代表性最先进的地点连接基线Hydra进行比较。结果表明,基于占用空间的锚定比地点连接构建恢复了显著更多的房间实例,但代价是精度较低,并且墙壁级精确的房间边界对两种方法而言仍然是一个开放问题。代码可在 https://github.com/crcz25/OccuSG 获取。

英文摘要

Hierarchical 3D scene graphs (3DSGs) for indoor robots organize geometric and semantic information across spatial scales, with a room layer that connects object-level perception to room-scale reasoning. Existing systems construct this layer from different spatial substrates (e.g., place clusters, wall planes, or segmentation outputs), and as a result, room nodes are not evaluated on a common geometric criterion. We present an occupancy-grounded 3DSG pipeline in which room nodes are anchored to tracked free-space regions derived from occupancy decomposition, giving each room an explicit polygonal footprint. We evaluate the pipeline on 12 Matterport3D scenes by matching predicted room polygons to annotated room instances and compare against Hydra, a representative state-of-the-art place-connectivity baseline. The results show that occupancy-grounded anchoring recovers substantially more room instances than place-connectivity construction, at the cost of lower precision, and that wall-accurate room boundaries remain an open problem for both methods. Code is available at https://github.com/crcz25/OccuSG.

2606.13723 2026-06-15 cs.CV cs.AI New submission

Morphology-Aware Sample Assignment: Overcoming IoU Insensitivity for Surface Defect Detection

形态感知样本分配:克服IoU不敏感性用于表面缺陷检测

Pengfei Liu, Yuhan Guo

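Affiliations * School of Management, Harbin Institute of Technology; College of Computing and Data Science, Nanyang Technological University

AI summary To address the insensitivity of IoU in defect detection, proposes morphological similarity metrics based on area, shape, and aspect ratio to refine positive sample assignment. Theoretical analysis shows that the method reshapes the response distribution of the matching function; with the YOLOv9 framework it achieves consistent performance gains on the NEUDET and GC10-DET datasets at zero additional inference overhead.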

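Details
AI Chinese abstract (translated to English)

Intersection-over-Union (IoU), a key metric for evaluating the spatial alignment between candidate boxes and ground-truth annotations, directly determines the quality of the positive sample set and the training effectiveness of visual detection models. Through theoretical modeling and analysis, we reveal a non-sensitive region on the IoU response curve in which samples receive nearly identical IoU scores despite differing degrees of geometric overlap. To overcome this limitation, we introduce a set of morphological similarity metrics covering area, shape, and aspect ratio to refine the positive sample assignment process, ensuring more discriminative and reliable matching. A supplementary matching score is derived through mean-based aggregation of the multidimensional similarities, compensating for the inherent deficiency of IoU in representing structural correspondence. Theoretically, incorporating morphological similarity reshapes the response distribution of the matching function, producing effective directional gradients and polygon-like iso-response contours that tightly confine high-response regions around each ground-truth instance and significantly improve the precision of positive sample selection. Experiments based on the YOLOv9 framework achieve consistent performance gains on both the NEUDET and GC10-DET datasets. Notably, the proposed method is fully plug-and-play and adds zero inference overhead, ensuring deployment efficiency for industrial visual inspection.

English abstract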

Intersection-over-Union (IoU), as a pivotal metric for evaluating the spatial alignment between candidate proposals and ground-truth annotations, directly determines the quality of positive sample sets and the training efficacy of visual detection models. Through theoretical modeling and analysis, we uncover a non-sensitive region on the IoU response curve, within which samples yield nearly identical IoU scores despite distinct geometric overlaps. To overcome this limitation, we introduce a set of morphological similarity metrics covering area, shape, and aspect ratio, to refine the positive sample assignment process, thereby ensuring more discriminative and reliable matching. A supplementary matching score is derived via mean-based aggregation of these multidimensional similarities, compensating for the intrinsic limitation of IoU in representing structural correspondence. Theoretically, incorporating morphological similarity reshapes the response distribution of the matching function, yielding both effective directional gradients and polygon-like iso-response contours, which tightly confine high-response regions around each ground-truth instance and substantially enhance the precision of positive sample selection. Experiments based on the YOLOv9 framework demonstrate consistent performance gains on both NEUDET and GC10-DET datasets. Notably, the proposed approach is fully plug-and-play and incurs zero additional inference overhead, thereby ensuring deployment efficiency for industrial visual inspection.
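
A small numeric illustration of the IoU insensitivity described above: two candidate boxes can score the same IoU against a ground-truth box while differing sharply in shape. The similarity definitions below (min/max ratios for area and aspect ratio, the mean of the width and height ratios for shape, and an unweighted mean over the three) are assumptions made for this sketch. The abstract names the three cues and the mean-based aggregation but does not give the formulas, so this should not be read as the paper's exact scoring function.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def iou(a: Box, b: Box) -> float:
    """Standard intersection-over-union for axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def _ratio(x: float, y: float) -> float:
    """Symmetric similarity in [0, 1]: smaller value over larger value."""
    return min(x, y) / max(x, y)


def morph_similarity(a: Box, b: Box) -> float:
    """Assumed morphological score: mean of area, shape, aspect-ratio cues."""
    wa, ha = a[2] - a[0], a[3] - a[1]
    wb, hb = b[2] - b[0], b[3] - b[1]
    area_sim = _ratio(wa * ha, wb * hb)
    shape_sim = 0.5 * (_ratio(wa, wb) + _ratio(ha, hb))
    aspect_sim = _ratio(wa / ha, wb / hb)
    return (area_sim + shape_sim + aspect_sim) / 3.0


gt = (0, 0, 6, 6)
shifted = (2, 0, 8, 6)    # same size and shape as gt, shifted sideways
squashed = (0, 0, 6, 3)   # half the height of gt, fully inside it

for name, box in [("shifted", shifted), ("squashed", squashed)]:
    print(name, iou(box, gt), round(morph_similarity(box, gt), 4))
```

Both candidates have an IoU of 0.5 with the ground truth, so an IoU-only assignment cannot rank them. The morphological score separates them (1.0 for the shifted box, about 0.58 for the squashed one). Because such a score is used only when assigning positive samples during training, it adds nothing to inference, consistent with the zero-overhead claim in the abstract.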