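arXivDaily: a daily academic digest synced with the full arXiv data feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.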

2606.19522 2026-06-19 cs.AI New submission

REVEAL++: Differentiable Phenotypic Grouping for Vision-Language Retinal Modeling of Alzheimer's Disease Risk

Ethan Elio Meidinger, Seowung Leem, Zeyun Zhao, Ruogu Fang

Affiliations: University of Virginia; J. Crayton Pruitt Family Department of Biomedical Engineering, Herbert Wertheim College of Engineering, University of Florida

AI summary: Proposes a differentiable, continuous phenotypic-similarity weighting function in place of discrete grouping, learning cross-modal alignment and phenotypic structure end to end within contrastive learning to improve AD risk prediction.

Comments Accepted for publication at MICCAI 2026

Abstract

The retina offers a noninvasive window into neurodegenerative disease, capturing subtle structural patterns associated with a risk of future cognitive decline. Vision-language alignment frameworks such as REVEAL have shown that pairing retinal fundus images with structured clinical risk narratives improves early prediction of Alzheimer's disease (AD). A key design choice in these approaches is the use of phenotypic grouping, where individuals with similar risk profiles are treated as multi-positive pairs during contrastive learning. However, existing methods operationalize phenotypic similarity as a discrete construct, relying on hard group assignments that impose rigid supervision and decouple group formation from representation learning. We propose a continuous formulation of phenotypic structure within contrastive learning. Rather than assigning samples to fixed clusters, we model inter-subject similarity as a differentiable weighting function derived from intra-modality embedding similarities in both retinal images and risk profiles. These weights define soft multi-positive relationships through a continuous aggregation operator, enabling graded supervision that reflects the spectrum nature of disease risk. We further introduce a soft-target contrastive objective that jointly learns cross-modal alignment and phenotypic structure in an end-to-end manner. Evaluated on UK Biobank retinal imaging data for incident AD prediction, the proposed framework consistently outperforms discrete group-based contrastive learning and standard vision-language baselines. By treating phenotypic similarity as a learnable, continuous signal rather than a fixed grouping rule, our approach provides a principled and robust foundation for population-scale neurodegenerative risk modeling from multi-modal retinal and clinical data.
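The soft multi-positive idea in the abstract above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the temperatures, and the choice of aggregation operator (a plain mean of the two intra-modality cosine similarities) are all assumptions, and pure Python has no autodiff, so only the forward computation of the loss is shown.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def softmax(row, temperature):
    """Numerically stable softmax of row / temperature."""
    scaled = [x / temperature for x in row]
    top = max(scaled)
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def soft_targets(img, txt, temperature=0.5):
    """Soft positive weights per subject.

    Assumption: the aggregation operator is the mean of the image-image and
    text-text cosine similarities; the paper may use a different operator.
    """
    n = len(img)
    return [
        softmax(
            [0.5 * (cosine(img[i], img[j]) + cosine(txt[i], txt[j])) for j in range(n)],
            temperature,
        )
        for i in range(n)
    ]

def soft_contrastive_loss(img, txt, temperature=0.1):
    """Cross-entropy between soft targets and image-to-text match probabilities."""
    n = len(img)
    targets = soft_targets(img, txt)
    loss = 0.0
    for i in range(n):
        probs = softmax([cosine(img[i], txt[j]) for j in range(n)], temperature)
        loss -= sum(t * math.log(p) for t, p in zip(targets[i], probs))
    return loss / n
```

With hard grouping each row of the target matrix would be a 0/1 indicator over group members; here every other subject receives a graded weight, which is the property the abstract describes as graded supervision.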

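2606.19512 2026-06-19 cs.RO cs.SY eess.SY New submission

DeXposure-Claw: An Agentic System for DeFi Risk Supervision

Aijie Shu, Bowei Chen, Wenbin Wu, Cathy Yi-Hsuan Chen, Fengxiang He

Affiliations: University of Edinburgh; University of Glasgow; University of Cambridge

AI summary: To address the tendency of LLM agents to raise false alarms in DeFi supervision, proposes DeXposure-Claw, which forecasts exposure networks with a graph time-series foundation model and combines deterministic monitors with confidence gating to produce auditable supervisory tickets; also builds DeXposure-Bench, a six-axis evaluation benchmark, and validates the system experimentally.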

Abstract

Decentralized finance exposes supervisors to fast-moving, networked credit risks. General-purpose LLM agents fit this setting poorly: they over-read weak evidence and recommend high-stakes interventions, while existing evaluations offer no regulator-aligned way to measure the resulting false alarms. We introduce DeXposure-Claw, a forecast-grounded agentic supervision system that routes LLM decisions through structured evidence: (1) DeXposure-FM, a graph time-series foundation model, forecasts future exposure networks; (2) deterministic monitors and stress scenarios then turn those forecasts into typed alerts, attribution signals, and scenario evidence; and (3) data-health and confidence gates constrain escalation before DeXposure-Claw emits auditable supervisory tickets with rationales. We further develop DeXposure-Bench, a six-axis evaluation harness, whose decision axis scores tickets against a regulator-aligned absolute-loss ground truth and an explicit false-intervention rate. Experiments on five years of weekly real data fully support our system. Code is at https://github.com/EVIEHub/DeXposure-Claw.

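2606.19496 2026-06-19 cs.LG New submission

LEAP: Layer-skipping Efficiency via Adaptive Progression for Vision Transformer Distillation

Jiaqi Zhang, Ashton Lee, Anthony Wong, John Zou, Sami BuGhanem, Randall Balestriero

Affiliations: Brown University; Rice University

AI summary: Proposes LEAP, a training curriculum that adaptively selects the teacher's intermediate feature maps as progressive targets to speed up knowledge distillation into student ViTs, improving ImageNet-100 accuracy by 12.24% and saving 25.1% of training FLOPs.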

Abstract

Vision Foundation Models (VFMs) with Vision Transformer (ViT) backbones, such as DINOv2, have become essential for downstream tasks like object recognition and semantic segmentation. The immense computational requirements of these backbones often necessitate distillation into smaller architectures for edge deployment. Feature-based knowledge distillation (KD) often suffers from the teacher-student gap: the student struggles to imitate the teacher's complex feature maps because of its limited capacity. To mitigate this bottleneck, we propose LEAP: Layer-skipping Efficiency via Adaptive Progression, a training curriculum for ViT feature-based knowledge distillation. By utilizing the teacher's intermediate feature maps as a sequence of progressively more difficult targets, our curriculum allows the student to build a foundational representation before tackling higher-level abstractions. Our results demonstrate that this paradigm significantly accelerates convergence through adaptive difficulty selection across various student model sizes and dataset scales. With our curriculum, the LEAP-distilled ViT-S achieves 90.1% accuracy on ImageNet-100, a +12.24% improvement over the baseline. On ImageNet-1K, LEAP achieves +3.84% and +7.75% improvements on the instance retrieval task on the Oxford and Paris datasets, respectively. Furthermore, the curriculum enables 25.1% savings in training FLOPs and 21% savings in training time on ImageNet-100 by implementing early stopping for teacher inference during the initial stages of training. Code is available at https://github.com/KevinZ0217/LEAP
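A curriculum over teacher layers, with early stopping of the teacher's forward pass, might be sketched as follows. The abstract does not state the adaptive selection rule, so the loss-threshold rule below, the function names, and the default threshold are hypothetical placeholders chosen only to make the idea concrete.

```python
def next_target_layer(current_layer, recent_losses, num_teacher_layers, threshold=0.1):
    """Advance to a deeper teacher layer once the student fits the current one.

    Hypothetical rule: move on when the mean of the recent distillation losses
    drops below `threshold`. Layers are 0-indexed.
    """
    if recent_losses and sum(recent_losses) / len(recent_losses) < threshold:
        return min(current_layer + 1, num_teacher_layers - 1)
    return current_layer

def teacher_fraction_run(target_layer, num_teacher_layers):
    """Fraction of teacher blocks that must be executed for this target.

    The teacher's forward pass can stop right after `target_layer`, which is
    where the compute saving in early training would come from.
    """
    return (target_layer + 1) / num_teacher_layers
```

Under this sketch, a 12-block teacher whose current target is block 5 would run only half of its blocks per step.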

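2606.19481 2026-06-19 cs.LG New submission

Insulin4RL: Real-Time Insulin Management in the Intensive Care Unit for Offline Reinforcement Learning

Thomas Frost, Steve Harris

Affiliations: Institute of Health Informatics; University College London

AI summary: To address the poor generalisability of models caused by discretising electronic health records, presents Insulin4RL, an offline reinforcement learning dataset built from real clinical trajectories, with more than 375,000 decisions across 12,209 patients, for evaluating models under realistic sampling assumptions.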

Comments Under submission

Abstract

Offline reinforcement learning (ORL) offers the potential to improve the quality of clinical decision-making using historical electronic health record (EHR) data. Current training and evaluative practices in this field rely heavily on EHR datasets that have been temporally discretised into fixed, regular time intervals. Discretisation creates fictional representations of complex clinical scenarios and compromises the generalisability of retrospective model evaluations. In this paper, we introduce Insulin4RL, a healthcare ORL dataset featuring naturally irregular inputs and actions from real clinical trajectories. Derived from MIMIC-IV, Insulin4RL comprises over 375,000 labelled decisions across 12,209 patients requiring insulin infusion titration in the Intensive Care Unit. The dataset can thus be used for research into ORL model performance under realistic clinical sampling assumptions. We provide a description of the dataset's structure and characteristics, baseline performance metrics using model-free offline reinforcement learning, and a standardised evaluation protocol using fitted Q-evaluation. We conclude with suggested areas for future research that could be addressed using this resource.

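2606.19476 2026-06-19 cs.LG cs.AI New submission

Can In-Context Learning Support Intrinsic Curiosity?

Eric Elmoznino, Sangnie Bhardwaj, Johannes von Oswald, Rajai Nasser, Blaise Agüera y Arcas, João Sacramento, Rif A. Saurous, Guillaume Lajoie

Affiliations: Google, Paradigms of Intelligence Team; Google DeepMind

AI summary: Studies using the in-context learning ability of sequence models as an immediate, update-free world model to remove the gradient-descent bottleneck of classic intrinsic-curiosity methods, and proves that in non-temporal settings the resulting reward converges asymptotically to the true learning progress.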

Abstract

Effective machine learning depends not only on how we model data, but also on what data we choose to collect. While large sequence models have revolutionized data modeling, the problem of automated data selection, or "intrinsic curiosity", remains a significant challenge. Classic approaches incentivize exploration by rewarding an agent based on its "learning progress", which measures how much a newly acquired observation improves a world model's predictive ability. However, evaluating these rewards traditionally requires expensive inner loops of gradient descent updates within each trajectory, rendering them computationally impractical at scale. In this work, we investigate whether the emergent in-context learning (ICL) capabilities of sequence models can eliminate this bottleneck by serving as immediate, update-free world models. Specifically, we evaluate whether an exploration policy can be trained to maximize learning progress, using solely the prediction errors and counterfactual context manipulations of an in-context learner. We first prove that in general Markov decision processes, this is in fact impossible in an unbiased way: the resulting intrinsic rewards either suffer from nuisance terms that bias their estimation of true learning progress, or they cannot be implemented using an in-context learner's prediction errors. Conversely, we prove a positive result for a broad subclass of non-temporal settings, encompassing active learning and Bayesian Experimental Design: here, ICL-derived rewards successfully bound and asymptotically converge to the true learning progress. We corroborate our theory with controlled experiments across continuous and symbolic environments, demonstrating that our ICL-driven framework successfully trains curious data-collection policies that explore optimally.
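To make "learning progress from counterfactual context manipulation" concrete, the toy sketch below scores a new observation by how much it reduces prediction error on a set of probe points when it is added to the context. Everything here is an assumption for illustration: the in-context learner is replaced by 1-nearest-neighbour regression, and the reward definition is a simplified stand-in rather than the estimator analysed in the paper.

```python
def predict(context, x):
    """Toy stand-in for an in-context learner: 1-nearest-neighbour regression.

    `context` is a list of (input, output) pairs; no parameters are updated.
    """
    if not context:
        return 0.0
    nearest = min(context, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

def mean_squared_error(context, probes):
    """Average squared prediction error on held-out (input, output) probes."""
    return sum((predict(context, x) - y) ** 2 for x, y in probes) / len(probes)

def learning_progress(context, new_observation, probes):
    """Error without the new observation minus error with it in the context."""
    with_observation = context + [new_observation]
    return mean_squared_error(context, probes) - mean_squared_error(with_observation, probes)
```

An observation that duplicates what the context already contains earns zero reward under this definition, while one that covers an unexplored region earns a positive reward, which is the behaviour a curiosity signal is meant to have.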

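2606.19475 2026-06-19 cs.AI cs.CL New submission

MortarBench: Evaluating Mortgage Loan Origination Agents

Matthew Toles, Yunan Lu, Manav Munjal, Bojun Liu, Yuanhao Deng, Stephanie Selig, Derek Rindner, Cheng Li, Zhou Yu

Affiliations: Columbia University; Tidalwave

AI summary: Presents the MortarBench benchmark, which uses a financial data synthesis and mutation pipeline to generate examples covering edge cases and evaluates large language models on loan origination tasks; finds that models have low accuracy and exhibit bias, and introduces the CRIT calibration framework, which raises accuracy to 80.5%.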

Abstract

Loan origination is the process by which a lender creates a new loan, from application and underwriting through approval and funding. This process serves a critical role in evaluating the eligibility and level of risk posed by an applicant. Recently, firms have begun using mortgage loan agents to augment human loan officers, despite a lack of any public benchmark. To fill this gap, we present MortarBench, a loan origination agent benchmark. MortarBench uses a financial data synthesis and mutation pipeline to generate examples with broad edge case coverage that match real-world distributions and questions. We find that state-of-the-art large language models (LLMs) perform poorly, with closed-source models achieving at most 77.1% exact match accuracy. We also discover systematic biases in LLM perception of foreignness related to non-English names. Noting these weaknesses, we introduce CRIT, a confidence calibration framework. Our method increases accuracy to 80.5% while improving risk management steering and reducing bias.

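2606.19413 2026-06-19 cs.LG New submission

Does Text Actually Help? Uncovering and Resolving Text Collapse in Multimodal Time Series Forecasting

Huu Hiep Nguyen, Minh Hoang Nguyen, Dung Nguyen, Hung Le

Affiliations: Applied Artificial Intelligence Initiative

AI summary: To address "text collapse" in multimodal time series forecasting, where the text branch ends up ignored, proposes REST-TS, which has the text branch exclusively predict the residual that the numerical backbone cannot explain, forcing it to extract genuine content and achieving state-of-the-art performance.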

Abstract

Multimodal time series forecasting, which pairs numerical sequences with domain-relevant textual reports, promises to inject world knowledge into forecasting pipelines. However, we uncover a critical failure mode in existing frameworks that we term text collapse: the text branch converges to a content-independent transformation, contributing negligible discriminative signal regardless of the input description. We argue that text collapse is a consequence of a fundamental asymmetry in time series forecasting: the numerical input is strongly autocorrelated with the output, making the numerical backbone inherently dominant, while the text branch, despite carrying complementary and often critical information, is insufficiently utilized, leading to its systematic underexploitation. To address this, we propose REST-TS (Residual-Exclusive Supervision for Text in Time Series), which turns the asymmetry into a design principle: the numerical backbone produces its own independent numerical forecast, and the text branch is exclusively supervised to predict the structured components of the residual, the prediction gap that numbers cannot explain. Because no numerical pathway can reduce these losses, the text branch must extract genuine content from the input description. Evaluated across diverse real-world domains and backbone architectures, REST-TS achieves state-of-the-art performance and consistently demonstrates greater text-branch utilization than existing frameworks, providing strong empirical evidence that supervising the text branch on the residual compels it to extract genuine content from the input.
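The residual-exclusive supervision described above can be illustrated with a small sketch. It simplifies the abstract in one stated way: the text branch is supervised on the raw residual rather than on its "structured components", which the abstract does not spell out. Function names are illustrative, and in a real training loop the numerical forecast would be treated as a constant (no gradient) when forming the text target.

```python
def residual_targets(y_true, y_numeric):
    """What the numerical backbone failed to explain, per forecast step."""
    return [t - n for t, n in zip(y_true, y_numeric)]

def text_branch_loss(text_pred, y_true, y_numeric):
    """Mean squared error of the text branch against the residual only.

    The numerical forecast enters solely through the target, so improving the
    numerical backbone cannot lower this loss on the text branch's behalf.
    """
    residual = residual_targets(y_true, y_numeric)
    return sum((p - r) ** 2 for p, r in zip(text_pred, residual)) / len(residual)

def combined_forecast(y_numeric, text_pred):
    """Final forecast: numerical forecast plus the text-predicted correction."""
    return [n + p for n, p in zip(y_numeric, text_pred)]
```

If the text branch outputs a constant regardless of its input (the collapse case), its loss equals the variance-like spread of the residual around that constant, so the only way to reduce it is to read the text.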

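2606.19412 2026-06-19 cs.LG New submission

Spectral Retrieval-Augmented Time-Series Forecasting

Huu Hiep Nguyen, Minh Hoang Nguyen, Dung Nguyen, Hung Le

Affiliations: Applied Artificial Intelligence Initiative; Deakin University

AI summary: Proposes SpecReTF, which converts time series into windowed frequency representations, uses a similarity metric combining amplitude and phase, and applies exponential moving average weighting, addressing the spectral blindness and temporal recency limitations of existing retrieval methods and improving forecasting accuracy on non-stationary time series.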

Abstract

Time series forecasting leverages historical patterns to predict future values, but traditional methods face challenges when dealing with complex, non-stationary patterns that are difficult to memorize during training. Retrieval-augmented approaches have emerged as promising solutions by retrieving similar historical patterns to enhance predictions. However, existing retrieval methods suffer from two fundamental limitations: spectral blindness, which overlooks critical frequency-domain characteristics that capture underlying periodic structures, and temporal recency, which treats all historical data equally without emphasizing recent, more relevant patterns. In this paper, we propose SpecReTF, a novel retrieval method that addresses these issues by converting time series into windowed frequency representations, measuring similarity with a combined metric that captures both amplitude and phase information. To balance recency and historical context, we apply an exponential moving average weighting scheme that emphasizes recent windows. Extensive experiments on benchmark datasets demonstrate that SpecReTF outperforms time-domain retrieval methods, achieving superior forecasting accuracy across diverse, non-stationary time series.
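A frequency-domain similarity of the kind described above could look like the sketch below. The exact metric and weighting used by SpecReTF are not given in the abstract, so the mixing coefficient `alpha`, the amplitude-weighted phase term, and the geometric recency weights are assumptions; a naive DFT is used to keep the example dependency-free.

```python
import cmath
import math

def dft(window):
    """Naive discrete Fourier transform; returns the non-negative frequency bins."""
    n = len(window)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(window))
        for k in range(n // 2 + 1)
    ]

def spectral_similarity(a, b, alpha=0.5):
    """Blend amplitude similarity and phase agreement of two equal-length windows.

    Assumed metric: alpha * cosine(amplitudes) + (1 - alpha) * amplitude-weighted
    mean of cos(phase difference).
    """
    fa, fb = dft(a), dft(b)
    amp_a = [abs(c) for c in fa]
    amp_b = [abs(c) for c in fb]
    norm = math.sqrt(sum(x * x for x in amp_a)) * math.sqrt(sum(x * x for x in amp_b))
    overlap = sum(x * y for x, y in zip(amp_a, amp_b))
    if norm == 0.0 or overlap == 0.0:
        return 0.0
    amplitude_sim = overlap / norm
    # Re(A * conj(B)) equals |A||B| cos(phase difference) for each bin.
    phase_sim = sum((ca * cb.conjugate()).real for ca, cb in zip(fa, fb)) / overlap
    return alpha * amplitude_sim + (1.0 - alpha) * phase_sim

def recency_weights(count, decay=0.9):
    """Exponentially decaying weights, oldest window first, newest weight 1."""
    return [decay ** (count - 1 - i) for i in range(count)]
```

A retrieval score for each historical window would then be its spectral similarity to the query multiplied by its recency weight. Note that a sign-flipped copy of a window has identical amplitudes but opposite phase, which is the case an amplitude-only metric cannot distinguish.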

2606.19411 2026-06-19 cs.LG New submission

Spectral DPPs via NEPv: A Scalable Continuous Relaxation of Determinantal MAP for Diversity-Aware Data Selection

Richard Yi Da Xu

Affiliations: Hong Kong Baptist University; TadReamk Limited

AI summary: Recasts the NP-hard DPP-MAP selection problem as continuous optimization over the Stiefel manifold, solved in near-linear time by self-consistent field iteration on a nonlinear eigenvalue problem (NEPv), suitable for large-scale data selection.

Abstract

Selecting a small, diverse, high-quality subset from a massive pool of candidates is a recurring primitive in modern machine learning: data curation and coreset selection for training and fine-tuning large models, active-learning batch acquisition, prompt and exemplar selection for in-context learning, retrieval diversification, and experimental design. Determinantal Point Processes (DPPs) give a principled, well-calibrated notion of diversity for this task, but their MAP objective (pick a size-$k$ subset $S$ maximizing $\log\det(L_S)$) is NP-hard, and the standard greedy and sampling algorithms scale superlinearly in the ground-set size $n$. This cost is prohibitive precisely in the data-centric regime where diversity matters most, where $n$ ranges over millions to billions of candidate examples, features, or embeddings. We recast DPP-MAP as a continuous optimization problem over the Stiefel manifold, and show that its first-order optimality conditions form a Nonlinear Eigenvalue Problem with eigenvector dependency (NEPv) of a previously unstudied form. This NEPv admits a self-consistent field (SCF) iteration with a spectral-gap-based local contraction guarantee, giving a principled iterative solver where the diversity objective drives an eigenvector-dependent operator. The resulting algorithm requires only matrix-vector products with the kernel and runs in time $O\!\big((ndk+nk^2)\,t\big)$ for a small number of iterations $t$, scaling near-linearly in $n$ and integrating directly with low-rank and feature-map kernels common in ML. This paper focuses on the relaxation, solver, and scaling analysis; full real-data benchmarking is left to a planned empirical study.
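For readers unfamiliar with the objective, the sketch below evaluates the log-determinant of a principal submatrix and runs the standard greedy baseline that the abstract contrasts against. It is not the paper's NEPv/SCF solver; it only shows what is being maximized and why naive greedy selection is expensive (every step re-evaluates a determinant per candidate). The kernel is assumed to be a symmetric positive semidefinite matrix given as nested lists.

```python
import math

def log_det(L, subset):
    """log det of the principal submatrix L_S via Cholesky factorisation.

    Returns -inf when the submatrix is not positive definite.
    """
    idx = list(subset)
    m = len(idx)
    chol = [[0.0] * m for _ in range(m)]
    total = 0.0
    for i in range(m):
        for j in range(i + 1):
            s = L[idx[i]][idx[j]] - sum(chol[i][p] * chol[j][p] for p in range(j))
            if i == j:
                if s <= 0.0:
                    return float("-inf")
                chol[i][j] = math.sqrt(s)
                total += 2.0 * math.log(chol[i][j])
            else:
                chol[i][j] = s / chol[j][j]
    return total

def greedy_map(L, k):
    """Greedy baseline: repeatedly add the item that most increases log det."""
    selected = []
    for _ in range(k):
        candidates = [i for i in range(len(L)) if i not in selected]
        best = max(candidates, key=lambda i: log_det(L, selected + [i]))
        selected.append(best)
    return selected
```

Two near-duplicate items produce a nearly singular submatrix and therefore a very negative log-determinant, so the objective steers selection toward items that are mutually dissimilar.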

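2606.19408 2026-06-19 cs.LG cs.RO New submission

FlexLAM: Resolving the Bottleneck Trade-off in Latent Action Learning

Takanori Yoshimoto, Yang Hu, Naruya Kondo, Tatsuya Matsushima

Affiliations: University of Tsukuba; The University of Tokyo

AI summary: To address the trade-off imposed by fixed-capacity bottlenecks in latent action models, proposes FlexLAM, which uses nested dropout to learn variable-length latent actions; without adding architectures or losses, it matches or surpasses fixed-capacity models under scarce-label and low-return settings and supports adjusting the token budget at inference time.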

Abstract

Latent actions provide a compact interface between action-free video and downstream decision-making, yet existing Latent Action Models (LAMs) force every transition through a fixed-capacity bottleneck. We identify a bottleneck trade-off: overly tight codes can discard transition cues needed for action alignment, while overly loose codes preserve additional transition variation that must be resolved when alignment labels are scarce or narrowly distributed. FlexLAM replaces this fixed capacity with variable-length latent actions trained by nested dropout, yielding prefix-valid codes that capture compact transition structure first and add detail only when needed, without new architectures or losses. A single FlexLAM matches or surpasses separately trained fixed-capacity LAMs at every evaluated token budget under standard scarce-label supervision and under a low-return single-task alignment stress test, indicating that FlexLAM is not merely adjustable at inference time but learns a better latent-action interface at the same token budgets. The same model supports inference-time token-budget adjustment without retraining, and FlexLAM improves Ego4D transition reconstruction. These results suggest that variable-length latent actions are an architecture-free, drop-in upgrade to the fixed-capacity bottleneck in latent action models, latent-action world models, and video-pretrained action interfaces.
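Nested dropout, the training device named in the abstract, can be sketched as a prefix mask over an ordered list of latent tokens. The uniform distribution over prefix lengths and the function names below are assumptions made for illustration; the abstract does not say how FlexLAM samples the prefix length.

```python
import random

def nested_dropout(tokens, rng, min_keep=1):
    """Keep a random-length prefix of the latent tokens and zero out the rest.

    Because every prefix is seen during training, earlier tokens are pushed to
    carry the most important information. Returns (masked_tokens, keep).
    Assumption: prefix length is sampled uniformly from [min_keep, len(tokens)].
    """
    keep = rng.randint(min_keep, len(tokens))
    masked = [
        list(tok) if i < keep else [0.0] * len(tok)
        for i, tok in enumerate(tokens)
    ]
    return masked, keep

def truncate_to_budget(tokens, budget):
    """Inference-time token budget: use only the first `budget` latent tokens."""
    return tokens[:budget]
```

Since any prefix is a valid code after such training, changing the budget at inference time only requires truncation, with no retraining.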

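2606.19404 2026-06-19 cs.LG cs.CL New submission

Thermodynamic Signatures of Reasoning: Free-Energy and Spectral-Form-Factor Diagnostics for Hallucination Detection in Large Language Models

Salim Khazem

Affiliations: Talan Research & Innovation Center

AI summary: Proposes Free-Energy Signatures (Fes), a spectral descriptor that treats the attention Laplacian as a Hamiltonian and extracts thermodynamic potentials and the random-matrix-theory spectral form factor to detect LLM hallucinations, achieving strong AUROC without updating the underlying LLM.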

Abstract

Hallucination detection in large language models (LLMs) is deployment-critical, and recent work shows that the spectrum of attention-derived graph Laplacians carries strong signal about reasoning quality. Prior spectral diagnostics, however, summarize the Laplacian spectrum by a handful of eigenvalues or hand-picked scalars, leaving most of its structure unused. We propose Free-Energy Signatures (Fes), a spectral descriptor that treats each layer's attention Laplacian as a Hamiltonian and extracts its thermodynamic potentials (partition function, free energy, spectral entropy, heat capacity) together with the random-matrix-theory (RMT) spectral form factor. We prove three results: (i) Lipschitz stability of Fes under attention perturbation; (ii) an expressiveness result showing that Fes enriches finite spectral summaries and approximates moment-derived spectral functionals under explicit regularity and grid-resolution assumptions; and (iii) a finite-sample PAC bound on the AUROC of a training-free detector built from Fes. Empirically, across six open-weight LLMs and six benchmarks, a lightweight probe on Fes descriptors achieves the strongest aggregate AUROC among attention-spectral baselines, improving over LapEig by +6.5 AUROC points and over GoR-4 by +2.4 points on average, while requiring no update to the underlying LLM. In the fully unsupervised setting, an RMT-deviation score achieves mean AUROC 0.71, providing a label-free but weaker detector. A complementary RMT analysis shows that correct generations exhibit more Wigner-Dyson-like spectral statistics, whereas hallucinations exhibit more Poisson-like statistics. The anonymized code and config are provided in the supplementary material.
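The thermodynamic quantities listed above have standard statistical-mechanics definitions once the Laplacian eigenvalues are read as energy levels. The sketch below uses those textbook definitions; the normalisations, the choice of inverse temperature `beta`, and the function names are assumptions and may differ from the paper's Fes descriptor. The eigenvalues are assumed to be precomputed elsewhere and passed in as a list of non-negative floats.

```python
import cmath
import math

def thermodynamic_signature(eigenvalues, beta=1.0):
    """Textbook thermodynamic potentials of a spectrum at inverse temperature beta."""
    weights = [math.exp(-beta * lam) for lam in eigenvalues]
    partition = sum(weights)                      # Z = sum_i exp(-beta * lambda_i)
    probs = [w / partition for w in weights]      # Boltzmann distribution
    energy = sum(p * lam for p, lam in zip(probs, eigenvalues))
    energy_sq = sum(p * lam * lam for p, lam in zip(probs, eigenvalues))
    return {
        "partition_function": partition,
        "free_energy": -math.log(partition) / beta,
        "energy": energy,
        "entropy": -sum(p * math.log(p) for p in probs if p > 0.0),
        "heat_capacity": beta * beta * (energy_sq - energy * energy),
    }

def spectral_form_factor(eigenvalues, tau):
    """Normalised spectral form factor |sum_j exp(-i * tau * lambda_j)|^2 / N^2."""
    total = sum(cmath.exp(-1j * tau * lam) for lam in eigenvalues)
    return abs(total) ** 2 / len(eigenvalues) ** 2
```

Sweeping `beta` and `tau` over a grid and concatenating the outputs per layer would give a fixed-length feature vector on which a lightweight probe can be trained; that grid design is likewise an assumption here.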

2606.19399 2026-06-19 cs.LG cs.AI cs.LO cs.PL New submission

VERITAS: Verifier-Guided Proof Search for Zero-Shot Formal Theorem Proving

Manish Acharya, Zhenyu Liao, Yueke Zhang, Kevin Leach, Yu Huang, Yifan Zhang

Affiliations: Department of Computer Science, Vanderbilt University; Amazon

AI summary: Proposes VERITAS, a framework that uses verifier feedback for zero-shot theorem proving through a two-phase protocol (Best-of-N sampling followed by critic-guided MCTS), reaching 40.6% on miniF2F, and releases the combinatorics benchmark VERITAS-CombiBench.

Abstract

LLM-based formal provers often collapse rich verifier signals (syntax errors, type mismatches, partial goal progress) into a binary pass/fail bit. We present VERITAS, a zero-shot framework that routes every verifier signal back into proof search through a two-phase protocol: Best-of-N sampling first, then a critic-guided MCTS pass that ingests Phase 1 failures as explicit negative examples. The protocol preserves every theorem solved by its own Phase 1 sweep, so Phase 2's additional solves are attributable to feedback-driven exploration. VERITAS reaches 40.6% on miniF2F (vs. an independently run Best-of-5 at 36.9%, Portfolio 26.2%) and 7.3% on VERITAS-CombiBench, a 55-theorem combinatorics benchmark we release on which Best-of-5 (1.8%) falls below Portfolio (3.6%), exposing that unguided sampling hurts when correct lemma names must be recovered iteratively from verifier feedback. Artifacts are available on GitHub.

2606.19398 2026-06-19 cs.SD eess.AS eess.SP New submission

S-JEPA: Soft Clustering Anchors for Self-Supervised Speech Representation Learning

Georgios Ioannides, Adrian Kieback, Judah Goldfeder, Linsey Pang, Aman Chadha, Aaron Elkins, Yann LeCun, Ravid Shwartz-Ziv

Affiliations: Carnegie Mellon University; New York University; James Silberrad Brown Center for AI; Columbia University; Northeastern University; Stanford University; Amazon GenAI

AI summary: Proposes S-JEPA, an encoder-predictor pair trained via KL divergence to match the soft posteriors of a Gaussian Mixture Model, with no offline re-clustering or teacher distillation; under the SUPERB protocol it achieves the lowest WER among evaluated methods below 90M parameters and establishes a new Pareto frontier.

Abstract

Self-supervised speech encoders are predominantly trained by predicting discrete hard cluster IDs at masked positions, a recipe that collapses acoustic ambiguity at category boundaries and requires interrupting training to re-cluster the entire corpus between iterations. We introduce S-JEPA, a JEPA-style encoder-predictor pair trained to match the soft posteriors of a Gaussian Mixture Model at masked positions via KL divergence. Training runs as one continuous optimization trajectory in two phases: a fixed GMM over MFCC features, then an online GMM over encoder features, with the input layer selected adaptively from a label-free signal, removing both the offline re-cluster step and the hand-tuned choice of which transformer layer to cluster on. Under the SUPERB protocol, S-JEPA achieves the lowest WER among evaluated SSL methods below 90M parameters and matches HuBERT-Base on emotion recognition at roughly half its parameter count, establishing a new Pareto frontier without offline re-clustering or teacher distillation. An analysis of the predictor's per-frame entropy on held-out speech reveals a bimodal distribution with a substantial minority of frames near the entropy of a perfect two-cluster tie, providing direct empirical evidence that the soft-target objective preserves the acoustic ambiguity that hard targets would collapse. Code is available at https://github.com/gioannides/s-jepa.
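The soft-target objective can be illustrated with a diagonal-covariance Gaussian mixture and a KL divergence, as in the sketch below. The mixture parameters, the direction of the KL term (target relative to prediction), and the function names are assumptions for illustration; the abstract does not specify these details.

```python
import math

def gmm_posterior(x, weights, means, variances):
    """Posterior responsibility of each diagonal-covariance component for frame x."""
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        log_density = -0.5 * sum(
            math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
            for xi, m, v in zip(x, mu, var)
        )
        log_terms.append(math.log(w) + log_density)
    top = max(log_terms)  # subtract the max for numerical stability
    exps = [math.exp(t - top) for t in log_terms]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted) over cluster probabilities."""
    return sum(
        t * math.log(t / max(p, eps))
        for t, p in zip(target, predicted)
        if t > 0.0
    )
```

A frame that lies midway between two equally weighted components gets a posterior of one half each, with entropy log 2. A hard-ID target would force that frame into one cluster, whereas the soft target keeps the tie, which is the ambiguity the abstract's entropy analysis refers to.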

2606.19383 2026-06-19 cs.RO cs.CV New submission

3D Scene Graphs: Open Challenges and Future Directions

Dennis Rotondi, Francesco Argenziano, Sebastian Koch, Nathan Hughes, Martin Buechner, Johanna Wald, Lukas Rosenberger Schmid, Daniele Nardi, Abhinav Valada, Liam Paull, Federico Tombari, Luca Carlone, Kai O. Arras

Affiliations: University of Stuttgart; IMPRS-IS; Sapienza University of Rome; Google; MIT; University of Freiburg; UTN; University of Montreal; Mila; TU Munich

AI summary: A unified survey of how 3D scene graphs (3DSGs) are constructed, applied, and evaluated, analyzing existing modeling choices and open challenges with the aim of advancing robust deployment.

Comments Invited article for the Annual Review of Control, Robotics, and Autonomous Systems Volume 10

Abstract

3D Scene Graphs (3DSGs) have emerged as a powerful representation for spatial AI by combining geometric grounding with semantic and relational abstractions of the environment. Their expressiveness has made them relevant to a broad range of problems in robotics and computer vision, including manipulation, navigation, task planning, scene understanding, and many others. However, the field remains fragmented: different communities adopt distinct formulations, construction pipelines, and evaluation protocols, making it difficult to compare methods, identify common assumptions, and assess remaining challenges for robust real-world deployment. This survey provides a unified and critical review of 3DSGs, with particular emphasis on open challenges and future directions. We first formalize 3DSGs under a common definition and analyze the principal modeling choices that characterize existing formulations, including node and edge attributes, hierarchical structure, dynamic scene representations, and affordance-aware extensions. We then review how 3DSGs are built from raw sensory observations, discussing the most common terminologies, conventions, and techniques. Finally, we examine downstream applications and evaluation strategies, from intrinsic graph quality to task-level performance. To support the community, we also provide a dedicated website that organizes and extends the surveyed content, accessible at https://3dscenegraphs.com/.

2606.19381 2026-06-19 cs.SD cs.AI New submission

Improving Code-Switching ASR with Code-Mixing Guided Synthetic Speech

Yue Heng Yeo, Haoyang Li, Yizhou Peng, Shreyas Gopal, Hexin Liu, Leibny Paola Garcia-Perera, Hardik B. Sailor, Jeremy H. M. Wong, Eng Siong Chng

Affiliations: College of Computing and Data Science, Nanyang Technological University; Google DeepMind

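AI summary: To address the scarcity of high-quality text-speech pairs for code-switching speech recognition, the paper proposes a code-mixing-guided preference learning framework that uses the Code-Mixing Index to optimize the switching fidelity of synthetic speech. Fine-tuning Whisper Large on the SEAME corpus reduces the mixed error rate from 12.1%/17.8% to 8.9%/14.2%.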

Comments Accepted to Interspeech 2026
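For readers unfamiliar with the Code-Mixing Index mentioned in the summary, here is a minimal sketch of one commonly used utterance-level definition (usually attributed to Das and Gambäck): 100 × (1 − max_i w_i / (n − u)), where n is the number of tokens, u the number of language-independent tokens, and w_i the token count of language i. Whether the paper uses exactly this variant is an assumption; the function name `code_mixing_index`, the `"other"` tag, and the example tags are hypothetical.

```python
from collections import Counter


def code_mixing_index(lang_tags, neutral="other"):
    """Utterance-level CMI; 0 means monolingual, larger means more mixing.

    lang_tags: one language label per token. Tokens tagged `neutral`
    (e.g. punctuation, named entities) count as language-independent.
    """
    counts = Counter(tag for tag in lang_tags if tag != neutral)
    tagged = sum(counts.values())  # n - u: tokens that carry a language tag
    if tagged == 0:                # nothing to measure, defined as 0
        return 0.0
    dominant = max(counts.values())  # tokens in the most frequent language
    return 100.0 * (tagged - dominant) / tagged


# Hypothetical Mandarin-English utterance: 3 zh tokens, 2 en tokens, 1 neutral.
tags = ["zh", "zh", "en", "zh", "en", "other"]
print(code_mixing_index(tags))  # prints 40.0
```

A monolingual utterance scores 0, and the score rises as the non-dominant languages take a larger share of the tagged tokens.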

REVEAL++: Differentiable Phenotypic Grouping for Vision-Language Retinal Modeling of Alzheimer's Disease Risk

Proprioceptive Invariant State Estimation for Humanoid Robots on Non-Inertial Ground

LLM Doesn't Know What It Doesn't Know: Detecting Epistemic Blind Spots via Cross-Model Attribution Divergence on Clinical Tabular Data

Simulating Robotic Locomotion in Sand: Resistive Force Theory in an Open-Source Physics Engine

DeXposure-Claw: An Agentic System for DeFi Risk Supervision

Calibrating Generative Models to Feature Distributions with MMD Finetuning

LooseControlVideo: Directorial Video Control using Spatial Blocking

Hidden Anchors in Multi-Agent LLM Deliberation

Algebraic Dead Directions in LayerNorm Transformers: A Forward-Pass-Only Diagnostic at LLM Scale

Concept Flow Models: Anchoring Concept-Based Reasoning with Hierarchical Bottlenecks

LEAP: Layer-skipping Efficiency via Adaptive Progression for Vision Transformer Distillation

Insulin4RL: Real-Time Insulin Management in the Intensive Care Unit for Offline Reinforcement Learning

Can In-Context Learning Support Intrinsic Curiosity?

Diffusion Language Models: An Experimental Analysis

Measuring Curriculum Alignment across Topical Coverage, Competency, and Cognitive Depth: A Longitudinal Framework Applied to CS2013 and CS2023

Characterizing Narrative Content in Web-scale LLM Pretraining Data

Deontic Policies for Runtime Governance of Agentic AI Systems

Scaling Generative Foundation Models for Chest Radiography with Rectified Flow Transformers

3D-DLP: Self-Supervised 3D Object-Centric Scene Representation Learning

Playful Agentic Robot Learning

MortarBench: Evaluating Mortgage Loan Origination Agents

Does Text Actually Help? Uncovering and Resolving Text Collapse in Multimodal Time Series Forecasting

Spectral Retrieval-Augmented Time-Series Forecasting

Spectral DPPs via NEPv: A Scalable Continuous Relaxation of Determinantal MAP for Diversity-Aware Data Selection

FlexLAM: Resolving the Bottleneck Trade-off in Latent Action Learning

Thermodynamic Signatures of Reasoning: Free-Energy and Spectral-Form-Factor Diagnostics for Hallucination Detection in Large Language Models

VERITAS: Verifier-Guided Proof Search for Zero-Shot Formal Theorem Proving

S-JEPA: Soft Clustering Anchors for Self-Supervised Speech Representation Learning

3D Scene Graphs: Open Challenges and Future Directions

Improving Code-Switching ASR with Code-Mixing Guided Synthetic Speech