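arXivDaily: a daily academic digest synced with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, and electrical engineering & systems.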

2606.09925 2026-06-10 cs.SD New submission

AudioProcessBench: Benchmark for Identifying Process Errors in Audio-Grounded Reasoning

Xiangyu Zhao, Junyu Yan, Yaling Shen, Zimu Wang, Yiwen Jiang, Stephanie Fong, Qingyang Xu, Jiahe Liu, Dominic Dwyer, Zongyuan Ge

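AI summary: Proposes the AudioProcessBench benchmark for evaluating how well audio-language models identify process errors in reasoning steps, covering three paradigms: step correctness, error-type detection, and chain-level aggregation.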

Abstract

Large audio-language models (LALMs) increasingly use explicit reasoning traces for complex audio understanding, yet the evaluation of reasoning quality remains underexplored. Although process-level benchmarks for process reward models (PRMs) have advanced reasoning evaluation in text and multi-modal domains, comparable evaluation for audio reasoning remains limited. In this paper, we present AudioProcessBench, a comprehensive benchmark for step-level process error identification in audio reasoning. AudioProcessBench contains diverse reasoning traces generated by 6 audio and omni language models. Each trace is segmented into discrete reasoning steps and annotated with binary step correctness and fine-grained error types. Our benchmark evaluates models under three complementary paradigms: (1) step correctness identification, (2) error-type-conditioned detection for diagnosing audio-specific verifier capacities, and (3) chain-level aggregation, where verifiers select or aggregate among multiple reasoning traces for the same question. This design enables a systematic analysis of whether current models can detect process errors, whether their weaknesses differ across audio-specific error types, and whether process verification translates into improved answer selection. AudioProcessBench provides a testbed for future research on audio reasoning verifiers, process reward models, and reliable omni-modal reasoning.

2606.09924 2026-06-10 cs.LG cs.AI New submission

Sigma-Branch: Hierarchical Single-Path Network Reconstruction for Dynamic Inference with Reduced Active Parameters

Kohga Tanaka, Hiroaki Nishi

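AI summary: Proposes the Sigma-Branch framework, which restructures a pretrained dense network into a hierarchical binary tree of a shared backbone, hierarchical routers, and specialized leaves, initialized by activation clustering and then fine-tuned. Inference executes only a single path, cutting active parameters by 58-60% on tasks such as CIFAR-100 / ResNet-50 with a performance loss within 1.72 percentage points.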

Abstract

Deploying deep neural networks on memory-constrained edge accelerators is bottlenecked by per-inference off-chip weight transfer rather than computation: the dense network cannot be retained on-chip, and every parameter must be loaded for every input. Existing model compression reduces this transfer only at the cost of permanent capacity loss. We propose Sigma-Branch (SigmaB), a framework that restructures a pretrained dense network into a hierarchical binary tree composed of a shared backbone, hierarchical routers, and specialized leaves. Pretrained weights are distributed across the tree via activation-based spherical k-means clustering, which jointly initializes router weights and per-branch channel allocations; soft-routing fine-tuning then aligns each leaf with its routed input subset. At inference, the resulting network executes only a single root-to-leaf path, reducing the active-parameter footprint while storing the complete dense parameter set in memory. Across CIFAR-100 / ResNet-50, ImageNet-1K / ResNet-50, and ModelNet40 / PointNet++, SigmaB-Net reduces per-inference active parameters by 58-60% while remaining within 1.72 percentage points (pp) of the dense baseline Top-1. At comparable ImageNet-1K Top-1, the active-parameter reduction exceeds static structured pruning (FPGM, HRank) by 14-23 pp. The cross-modal evaluation, spanning 2D vision and 3D point-cloud backbones, substantiates a framework-level claim that decouples per-inference memory traffic from the total parameter count.

2606.09923 2026-06-10 cs.LG cs.AI New submission

Conformal Prediction for Neural Operators: Distribution-Free Uncertainty Quantification in Physics Simulation

Michael Chin

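AI summary: Applies split conformal prediction to neural-operator physics simulation, yielding distribution-free prediction intervals with finite-sample coverage guarantees, and produces adaptive-width intervals through a normalized conformal prediction scheme.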

Comments 13 pages, 7 tables, 7 figures. Full-scale experiments on NVIDIA V100

Abstract

Neural operators such as the Fourier Neural Operator (FNO) have emerged as powerful surrogates for solving partial differential equations (PDEs), achieving speedups of several orders of magnitude over traditional numerical solvers. However, deploying these models in safety-critical engineering applications -- such as thermal management of electronic components and battery systems -- requires not only accurate point predictions but also rigorous uncertainty guarantees. Existing uncertainty quantification (UQ) methods for neural operators, including Monte Carlo Dropout and Deep Ensembles, provide only relative uncertainty estimates without formal coverage guarantees. In this work, we propose the first application of split conformal prediction to neural operator-based physics simulation, providing distribution-free prediction intervals with finite-sample coverage guarantees. We further introduce a normalized conformal prediction scheme that leverages MC Dropout uncertainty to produce adaptive-width intervals, yielding tighter intervals in regions of low uncertainty and wider intervals where the model is less certain. Full-scale experiments (33.7M parameters, 800 training samples, 5 ensemble members, NVIDIA V100) on steady-state heat conduction benchmarks demonstrate that our method achieves 89.1% empirical coverage at the target level of alpha=0.1, while producing spatially adaptive prediction intervals that reflect the underlying physical uncertainty structure. We also provide an uncertainty decomposition framework that separates epistemic uncertainty (68% of total) from aleatoric uncertainty (32% of total), offering actionable guidance for data collection and model improvement. Our method is implemented in an open-source platform with REST API endpoints and interactive 3D visualization.
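The split conformal step mentioned in this abstract can be illustrated with a minimal sketch. It assumes scalar outputs and an absolute-residual nonconformity score; the abstract does not state the score the paper uses, and the function names `conformal_quantile` and `split_conformal_interval` are hypothetical, not taken from the paper's platform.

```python
import math


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected quantile of the calibration scores."""
    n = len(scores)
    # Rank of the order statistic that gives >= 1 - alpha marginal coverage.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: the interval is unbounded.
        return float("inf")
    return sorted(scores)[k - 1]


def split_conformal_interval(predict, calib_x, calib_y, x_new, alpha=0.1):
    """Prediction interval around predict(x_new) from a held-out calibration set."""
    # Assumed score: absolute residual on calibration data not used for training.
    scores = [abs(y - predict(x)) for x, y in zip(calib_x, calib_y)]
    q = conformal_quantile(scores, alpha)
    center = predict(x_new)
    return center - q, center + q
```

A normalized variant, as the abstract describes, would divide each residual by a per-point uncertainty estimate (for example from MC Dropout) before taking the quantile and multiply it back when forming the interval, so the width adapts to the input.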

2606.09919 2026-06-10 cs.LG cs.AI cs.MA cs.RO New submission

Co-GLANCE: Uncertainty-Aware Active Perception for Heterogeneous Robot Teaming

Michal P. Podolinsky, Neel P. Bhatt, Pranay Samineni, Rohan Siva, Christian Ellis, Ufuk Topcu

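AI summary: Proposes the Co-GLANCE system, which distills a vision-language model for real-time occlusion segmentation and robot allocation, and combines conformal prediction with selective abstention for uncertainty quantification with statistical guarantees, which in turn drives active perception. In real-world scenarios it improves occlusion segmentation and allocation accuracy by 25% and 36%, respectively, and reduces inference latency by 350x.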

Comments Code, videos, and dataset available at https://co-glance.github.io/

Abstract

Perceptual uncertainty is a central challenge for heterogeneous robot teams operating in unstructured outdoor environments, where no single viewpoint affords reliable scene understanding. Perceptual uncertainty, arising from sources such as occlusions, manifests differently across robot viewpoints depending on scene structure. Detecting and resolving sources of perceptual uncertainty requires both scene-based contextual reasoning and capability-aware robot allocation. While vision-language models provide strong semantic priors for both, they are computationally prohibitive for onboard inference and lack calibrated uncertainty quantification. We introduce Co-GLANCE, a real-time onboard perception and decision-making system for uncertainty resolution in heterogeneous robot teams. Co-GLANCE distills the semantic reasoning capabilities of a vision-language model into an end-to-end model for occlusion segmentation and robot allocation, eliminating the need for cloud-based inference. To quantify perceptual uncertainty, Co-GLANCE combines conformal prediction with selective abstention to provide statistically valid coverage guarantees for segmentation, robot allocation, and detection outputs. These calibrated uncertainty estimates directly trigger active perception, dispatching the most appropriate robot to acquire informative viewpoints and resolve uncertainty. Across real-world scenarios, Co-GLANCE outperforms cloud-based vision-language model baselines in occlusion segmentation and robot allocation accuracy by 25% and 36%, respectively, while reducing per-frame inference latency by 350x. We also release an air-ground dataset for future research. Code, videos, and the dataset are available at https://co-glance.github.io/.

2606.09917 2026-06-10 cs.LG New submission

SPDM: Geometry-Modulated State Space Modeling with Manifold Constraints for Time Series Forecasting

Xingsheng Chen, Siu-Ming Yiu

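AI summary: Proposes SPDM, a geometry-aware architecture that introduces symmetric positive definite manifold constraints into state-space models, modulating the selective scan through a manifold trajectory and a geometric gating mechanism to improve multivariate time series forecasting accuracy while keeping linear complexity.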

Abstract

Multivariate time series forecasting requires capturing the continuously evolving correlation structure among interacting variables. Existing state-space models process time series by scanning tokenized temporal or spatial sequences, discarding the evolutionary geometric structure. We address this limitation by introducing manifold constraints into state-space modeling: treating the cross-variable correlation structure as a continuous trajectory on the symmetric positive definite manifold, whose Riemannian geometric features, tangent space linearity, and Frechet mean centrality act as a principled geometric regularizer that guides and stabilizes the selective scanning dynamics of SSMs. We propose SPDM, a geometry-aware SSM architecture that realizes this principle through two cooperating mechanisms: a manifold trajectory path that projects dynamically evolving covariance matrices from the SPD manifold to a Euclidean tangent space, and a geometric gating scheme that directly modulates SSM's internal selective parameters based on geometric signals derived from the manifold trajectory. The parameterization preserves the linear-time complexity of the Mamba parallel scan while embedding rich structural constraints, making the architecture preserve prediction accuracy and computational efficiency simultaneously. Extensive experiments on eleven real-world benchmark datasets establish state-of-the-art forecasting performance, and further studies confirm that geometrically constrained state-space dynamics are the dominant architectural factor behind its performance gains.

2606.09916 2026-06-10 cs.LG cs.AI New submission

IntentKV: Cross-Turn Intent-Aware KV Cache Pruning for Agent Inference

Junjie Li, Jiong Lou, Jie Li

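AI summary: To address the KV cache becoming the serving bottleneck in multi-turn LLM agents, proposes IntentKV, which performs cross-turn intent-aware KV pruning with a session-level QueryMemory and a residual attention head, substantially reducing peak request tokens and KV reads while preserving accuracy.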

Abstract

Multi-turn LLM agents fan short queries into long trajectories of tool calls, search results, and intermediate reasoning. Both KV memory and KV read bandwidth grow by orders of magnitude across a single trajectory, making the key-value (KV) cache, not parameter compute, the dominant serving bottleneck for long-horizon agents. We introduce IntentKV, learned KV pruning that keeps the base LLM frozen. IntentKV maintains a session-level QueryMemory of cross-turn intent, scores live history tokens with a memory-attention rule, and adds a zero-initialized residual head with cross-attention over current-query K-vectors. To stay composable with prefix caches, eviction is a slot-map redirection: dropped positions route to a sentinel dead slot while surviving K/V rows, RoPE phases, and slot identities stay in place. IntentKV matches the no-pruning full-cache baseline with almost no accuracy drop under tight KV budgets: at an 8k KV budget, mean peak request tokens drop 23.9% on Qwen3-8B and 30.7% on Qwen2.5-14B. On the 100 longest BCP queries that all methods complete on Qwen2.5-14B, IntentKV-8k further cuts worst-case peak request tokens from 92.3k to 20.5k, a 77.8% reduction, and worst-case raw KV reads from 411M to 31M, a 92.6% reduction.

2606.09912 2026-06-10 cs.LG cs.AI New submission

Mix, Don't Pick: Why Synthetic Corpus Composition Matters for Time Series Foundation Model Pretraining

Aaryan Nagpal, Debdeep Sanyal, Murari Mandal, Dhruv Kumar, Saurabh Deshpande

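AI summary: To address the difficulty of choosing synthetic data generators for time series foundation model pretraining, proposes a simple equal-weight mixture of all generators that matches or beats the best individual generator and, combined with real data, yields the strongest pretraining corpus.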

Comments Accepted at the ICML 2026 Workshop on Foundation Models for Structured Data (FMSD), Seoul, South Korea

Abstract

Choosing the wrong synthetic generator for time-series foundation model pretraining is costly: under identical training budgets, the best and worst generators produce up to a $2\times$ gap in forecasting error, yet the field has no principled way to make this choice. The problem is compounded by the fact that generator rankings are not stable across architectures: across 11 generator families evaluated on Chronos-T5-Mini and Moirai-Small trained from scratch, we find that which generators are useful depends on the model architecture. Rather than solving the generator selection problem, we sidestep it: a simple equal-weight mixture of all generators matches or beats the best individual generator for both architectures, and composing this mixture with real data yields the strongest pretraining corpora overall. Synthetic pretraining is therefore a corpus composition problem, not a generator selection problem, and composition choices should be validated per model family rather than assumed to transfer.

2606.09907 2026-06-10 cs.LG cs.AI New submission

LongMoE: Longitudinal Multimodal Learning via Trajectory-Aware Mixture-of-Experts

Maxx Richard Rahman, Prakhar Kumar, Wolfgang Maass

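AI summary: Proposes the LongMoE framework, which jointly addresses two challenges in clinical multimodal learning, modality missingness and longitudinal dynamics, through context-aware imputation, attentional tokenization, trajectory-aware encoding, and sparse MoE routing, with robustness validated on datasets such as ADNI.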

Abstract

Multimodal clinical learning is increasingly important for integrating diverse patient data, including imaging, text, and personalised health records. However, it faces two fundamental challenges: (i) modality missingness, where arbitrary subsets of modalities are unavailable at a given patient visit; and (ii) longitudinal dynamics, where the diagnostic significance of an observation depends on the patient's evolving disease trajectory over time. Existing methods address these challenges in isolation: missing-modality frameworks treat each visit as an independent static snapshot and discard temporal context, while longitudinal models often assume complete modality availability and degrade under systematic modality incompleteness. We propose LongMoE (Longitudinal Mixture-of-Experts), a unified framework that jointly addresses both challenges. LongMoE combines a context-aware imputation module with an attentional tokenization module that captures frequency-domain temporal patterns across irregular visit sequences, a trajectory-aware encoder for modeling disease progression, and context-conditioned sparse MoE routing for patient-specific expert selection. Experiments on ADNI, OASIS-3, and MIMIC-IV show that LongMoE improves robustness under missing or weak contemporaneous modalities and remains competitive in full-modality settings, establishing a strong foundation for longitudinally aware multimodal clinical learning.

2606.09875 2026-06-10 cs.LG cs.AI stat.ML New submission

Integrating Local and Global Entropy for Uncertainty Quantification in LLMs

Johanne Medina, Tianyi Zhou, Keivin Isufaj, Aristides Gionis, Sanjay Chawla

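AI summary: Proposes GLU, which quantifies LLM uncertainty by fusing hidden-state geometric entropy (global) with token-level entropy (local), capturing the confident-but-wrong failure mode without additional training.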

Comments 17 pages, 2 figures

Abstract

Large language models hallucinate confidently, making uncertainty quantification (UQ) essential for reliable deployment. Existing methods rely predominantly on token-level signals, leaving the geometric structure of intermediate hidden states underused. In this paper, we take the geometric complexity of hidden-state matrices as a measure of the global uncertainty of LLMs, while treating token-level uncertainty estimation as a local metric. We show that hidden-state geometric entropy (global uncertainty) and token-level entropy (local uncertainty) are statistically near-orthogonal, capturing distinct failure regimes for reliability prediction. In particular, global geometry recovers the confident-but-wrong failure mode that local signals systematically miss. Building on this, we propose Global-Local Uncertainty (GLU), an unsupervised, single-pass score that fuses the two signals via a multiplicative gate. Across three model families and six benchmarks, GLU matches or outperforms all unsupervised baselines while requiring only a single forward pass and remaining length-normalized and architecture-agnostic.

2606.09873 2026-06-10 cs.LG cs.AI New submission

Rotate2Think: Geometric Priming via Orthogonal Rotation to Improve Language Model Reasoning

Aditya Sharma, Christopher J. Pal, Amal Zouaq

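AI summary: Finds that the input embeddings and thinking embeddings of reasoning models have high conicity and non-collinear mean directions, and proposes Rotate2Think, a training-free method that estimates a rotation via orthogonal Procrustes analysis and injects a synthetic thinking vector, improving accuracy in 30 of 32 configurations across mathematics, science, and code tasks.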

Abstract

Reasoning models achieve strong performance on challenging tasks by generating explicit intermediate reasoning traces before producing a final answer. Yet the internal structure of representation space when reasoning remains poorly understood: how do a model's hidden representations differ during thinking versus the embeddings of the input prompt, and can this structure be exploited to elicit stronger reasoning at inference time? We show that both input embeddings and thinking embeddings (mean-pooled last-layer hidden states over the prompt and reasoning trace, respectively) exhibit extremely high conicity, with all vectors clustering tightly around a single mean direction. Crucially, these mean input and thinking directions are non-collinear, with thinking embeddings occupying a geometrically distinct region of embedding space across many different models and benchmark tasks. This observation motivates casting the input-to-thinking transition as a rotation problem admitting a closed-form solution via orthogonal Procrustes analysis. We propose Rotate2Think, a training-free method that estimates this rotation from a small set of correctly solved examples and injects the resulting synthetic thinking vector between thinking delimiters at inference time, providing a geometric primer at the onset of the reasoning trace. Evaluated across multiple benchmarks and model families, Rotate2Think improves accuracy in 30 of 32 model-benchmark configurations across mathematics, science, and code tasks, and generalizes zero-shot to multimodal reasoning on MATH-Vision.
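The closed-form rotation fit referred to in this abstract can be illustrated in the simplest possible setting. The sketch below is only the two-dimensional special case restricted to proper rotations, where the least-squares angle has an explicit formula; the general orthogonal Procrustes solution in higher dimensions uses a singular value decomposition, and nothing here reproduces the paper's pipeline. The function names `fit_rotation_2d` and `rotate_2d` are hypothetical.

```python
import math


def fit_rotation_2d(inputs, targets):
    """Angle theta minimizing sum ||R(theta) x_i - y_i||^2 over paired 2D points."""
    # Expanding the objective leaves cos(theta) * dot + sin(theta) * cross to maximize.
    dot = sum(x0 * y0 + x1 * y1 for (x0, x1), (y0, y1) in zip(inputs, targets))
    cross = sum(x0 * y1 - x1 * y0 for (x0, x1), (y0, y1) in zip(inputs, targets))
    return math.atan2(cross, dot)


def rotate_2d(point, theta):
    """Apply a counter-clockwise rotation by theta to a 2D point."""
    x0, x1 = point
    c, s = math.cos(theta), math.sin(theta)
    return (c * x0 - s * x1, s * x0 + c * x1)
```

In the setting the abstract describes, the paired points would be input embeddings and thinking embeddings from correctly solved examples, and the fitted rotation would be applied to a new input embedding to obtain a synthetic thinking vector.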

2606.09871 2026-06-10 cs.CV cs.AI cs.LG New submission

SD-GRPO: Verifiable Segment Decomposition for Long-Form Vision-Language Generation

Hyunwoong Kim, Seongeun Lee, Hannah Yun, Junhyun Park, Jonggwon Park

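AI summary: Proposes SD-GRPO, which decomposes long-form outputs into segments and computes per-segment advantages to address GRPO's coarse-grained credit assignment in vision-language tasks; experiments show it outperforms baselines across several long-form generation tasks.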

Abstract

Group Relative Policy Optimization (GRPO) and its variants, originally developed for Large Language Models (LLMs), have recently been applied to Multimodal LLMs and produced strong results. However, their coarse-grained holistic credit assignment from a single scalar advantage underfits vision-language (VL) tasks, where outputs are often long-form responses grounded in semantically rich images. To address this limitation, we exploit a structured signal that single-scalar formulations discard: the natural segmentation of long-form VL outputs. Concretely, we propose Segment-Decomposed GRPO (SD-GRPO), which z-normalizes verifiable per-segment rewards across the rollout group, yielding a vector of per-segment advantages in place of a single scalar. We evaluate SD-GRPO across three settings spanning controlled and real-world long-form VL generation, organized by increasing semantic entanglement across segments. On a controlled multi-panel dense-captioning task constructed from DOCCI, where segments are semantically independent, SD-GRPO consistently outperforms the GRPO baseline, with larger gains at higher segment counts. Extending to a controlled multi-chart long-form VQA task constructed from MultiChartQA, we show both theoretically and empirically that rollout-level rewards suffer from cross-segment credit misattribution that scales with output length. On a real-world scientific figure captioning task on the MMSci dataset, where subfigure captions share context across the figure, blending holistic and per-segment rewards further improves on both, suggesting per-segment normalization alone is insufficient when segments are semantically entangled. Finally, by integrating SD-GRPO into Dr. GRPO, we confirm that it can be applied to any GRPO framework with minimal implementation overhead to enhance long-form VL generation.
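The per-segment normalization described in this abstract can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: it assumes every rollout in the group has the same number of segments, uses the population standard deviation, and adds a small `eps` for numerical safety. The name `per_segment_advantages` is hypothetical.

```python
from statistics import mean, pstdev


def per_segment_advantages(rewards, eps=1e-8):
    """rewards[i][s] is the reward of rollout i on segment s.

    Returns advantages of the same shape, z-normalized across the rollout
    group separately for each segment.
    """
    n_segments = len(rewards[0])
    advantages = [[0.0] * n_segments for _ in rewards]
    for s in range(n_segments):
        column = [rollout[s] for rollout in rewards]
        mu, sigma = mean(column), pstdev(column)
        for i, value in enumerate(column):
            # eps (an assumption) avoids division by zero when all rollouts tie.
            advantages[i][s] = (value - mu) / (sigma + eps)
    return advantages
```

By contrast, a single-scalar GRPO advantage would first sum or average each rollout's segment rewards and normalize that one number across the group, so a rollout that is strong on one segment and weak on another receives the same credit for every token.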

2606.09869 2026-06-10 cs.LG cs.AI cs.CR New submission

QSplitFL: Capability Aware Deep Q-Learning for Optimal Split Point Selection in Split Federated Learning

Nazmus Shakib Shadin, Xinyue Zhang, Jingyi Wang, Miao Pan

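AI summary: Proposes the QSplitFL framework, which uses a deep Q-network to dynamically select the optimal split point from client hardware metrics (CPU, memory, battery, network latency), addressing split federated learning on heterogeneous devices; a decayed-loss reward function and a committee voting mechanism improve convergence speed and accuracy.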

Comments Accepted by ECML-PKDD 2026

Abstract

Federated Learning (FL) combined with Split Learning (SL) is a privacy-preserving paradigm that enables training deep neural networks (DNNs) on resource-constrained devices while reducing overall training cost. However, determining the optimal split point, meaning the layer at which the model is divided, remains a critical challenge, especially when clients have heterogeneous hardware capabilities. Fixed split points can overload weak devices and increase the communication and server load, which slows convergence and reduces stability. This paper introduces QSplitFL, a novel capability-aware Deep Q-Network (DQN) framework for optimal split point selection in Split-learning-based Federated Learning (SFL) environments. Unlike existing approaches that rely on high-dimensional model weight representations, QSplitFL employs a lightweight state representation derived directly from client hardware metrics, including CPU utilization, memory, battery level, and network latency. The proposed framework incorporates a decayed loss-drop reward function that prioritizes early convergence, and a committee-based DQN architecture with majority voting to mitigate reward hacking. Extensive experiments on the MNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100 datasets using CNN, ResNet50, MobileNetV4, and ConvNeXt architectures demonstrate that our approach achieves better convergence and higher accuracy than existing methods, while effectively adapting to heterogeneous device resources. The source code is publicly available at https://github.com/AIPO-Lab/QSplitFL.

2606.09866 2026-06-10 cs.LG cs.AI New submission

Two to Tango: Coupled Task-Reference Selection for Safe LLM Fine-tuning

Xinrui Chen, Jianhao Zhang, Ou Wu, Di Gao

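AI summary: Proposes the DualSelect framework, which couples task and safety-reference selection to preserve safety alignment during fine-tuning, improving the safety score by at least 5.10 points.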

Abstract

Fine-tuning safety-aligned large language models (LLMs) on downstream data improves adaptation but may erode learned safety behavior. Existing methods use fixed safety examples, global constraints, or one-sided task filtering. Our diagnostics show that task updates expose different safety constraints, motivating joint selection of relevant references and compatible task samples. We propose DualSelect, a coupled framework for task and reference selection that refreshes task-conditioned safety references before filtering whole task samples compatible with the induced reference direction. Under a minimax view, DualSelect selects safety references with high preservation loss and task conflict, together with compatible task samples, through entropy-regularized scoring surrogates, lazy reference refresh, and gradient correction. On 1B-8B LLMs, DualSelect preserves safety without losing task utility; using the REDORCA judge, it improves Safety Avg. over the strongest baseline by at least 5.10 points and remains highest in Safety Avg. across judges with moderate overhead. This view extends to retention-focused continual learning.

2606.09865 2026-06-10 cs.LG cs.CR cs.IR New submission

LLM-as-a-Discriminator: When Synthetic Tables Still Look Real

Manel Slokom, Malek Slokom, Thierno Kante

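AI summary: Proposes using an LLM to distinguish real from synthetic tabular data, tests different settings and models, and finds that LLM discrimination can serve as a practical privacy audit signal.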

Abstract

Privacy and data sharing are often in tension. Many organizations use synthetic data to reduce privacy risk and still share useful data. For tabular data, auditing privacy remains hard. In many cases, even humans cannot easily tell if a table is real or synthetic. In this paper, we propose a method based on LLM discrimination. We ask an LLM to classify each table sample as REAL or SYNTHETIC. We test two settings: C1 with table only, and C2 with table plus distributional metadata. We use LLaMA as an open model and Gemini as a reference model. In our experiments, we run three synthesis models, CTGAN, TVAE, and Gaussian Copula, on two public datasets, UCI Adult and ACS Census. We collect 451 valid trials. Our results show clear differences between models. On Adult, LLaMA reaches DRS=0% in reported cells, while Gemini reaches DRS=100% for CTGAN and TVAE. On Census, LLaMA predicts SYNTHETIC for most samples, while Gemini stays high in C1 but drops for CTGAN and TVAE in C2. We also compare with a classifier two-sample test (C2ST) and record linkage as distributional baselines, and with a human pilot of 2 annotators and 240 trials. Our results show that LLM discrimination is a practical privacy audit signal when model choice, per provider reporting, and data encoding are handled with care. For reproducibility, code and experiment scripts are available at https://github.com/SlokomManel/LLM-as-a-Discriminator.

2606.09862 2026-06-10 cs.LG cs.AI 新提交

Blurry Window Attention

模糊窗口注意力

Axel Laborieux, Christos Sourmpis, Juan Gabriel Kostelec, Qinghai Guo

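AI summary: Proposes Blurry Window Attention (BLA), an attention-with-bounded-memory-control (ABC) method that reconstructs a blurry KV history by interpolating with Dirichlet kernels. On the synthetic MQAR task its state efficiency is 8× better than sliding window attention, and on RegBench its performance improves as the state size grows.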

AI中文摘要

Transformer语言模型中的Softmax注意力操作在序列长度上具有二次复杂度，且状态大小以KV缓存形式增长，这成为长上下文场景中的瓶颈。为克服此限制，引入了具有线性复杂度和有限状态大小的替代架构，如状态空间模型（SSM）、线性注意力（LA）和有界记忆控制注意力（ABC）。尽管线性模型在语言困惑度上与Transformer相当，但在需要检索或回忆特定信息的任务中仍落后。本文提出模糊窗口注意力（BLA），一种受SSM启发的新型ABC方法。BLA存储一个频率窗口，通过使用Dirichlet核进行插值从中重建模糊的KV历史。根据Dirichlet核的分辨率，BLA可理解为滑动窗口注意力（SWA）的泛化，或门控槽注意力（GSA）的特例，其中衰减因子由Dirichlet核实现。我们详细描述了BLA的理论和高效实现。在多查询关联回忆（MQAR）合成任务上，我们表明BLA的状态效率比SWA高8倍，且与流行的线性注意力模型竞争；在RegBench合成任务中，在我们测试的线性模型中，只有BLA和SWA随着状态大小增长而提升性能。

英文摘要

The Softmax Attention operation in Transformer language models has a quadratic complexity in the sequence length and a growing state size in the form of KV cache, which becomes a bottleneck in long context scenarios. To overcome this limitation, alternative architectures with linear complexity and finite state size have been introduced, such as State-Space Models (SSMs), Linear Attention (LA), and Attention with Bounded-memory Control (ABC). Though linear models achieve language perplexity similar to that of Transformers, they still lag behind in tasks that require retrieval or recall of specific information. In this work, we introduce Blurry Window Attention (BLA), a novel ABC method inspired by SSMs. BLA stores a frequency window from which a blurry KV history is reconstructed via interpolation using Dirichlet kernels. Depending on the resolution of the Dirichlet kernels, BLA can be understood as a generalization of Sliding Window Attention (SWA) or as a special case of Gated Slot Attention (GSA) in which the decay factor is implemented with Dirichlet kernels. We describe in detail the theory and efficient implementation of BLA. On the Multi-Query Associative Recall (MQAR) synthetic task, we show that the state efficiency of BLA is 8× better than SWA and is competitive with popular linear attention models, and in the RegBench synthetic task, only BLA and SWA improve their performance as the state size grows among the linear models we tested.

2606.09861 2026-06-10 cs.LG cs.AI 新提交

Time Series as Language: A Universal Tokenizer for General-Purpose Time Series Foundation Models

时间序列作为语言：通用时间序列基础模型的通用分词器

Yunhao Zhang, Ruiying Qi, Jiale Zheng, Jianfeng Zhang, Lujia Pan, Junchi Yan

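AI summary: Proposes UniTok, a universal tokenizer that converts time series into discrete tokens, and UniTok-FM, a foundation model pretrained on those tokens via next-token prediction (NTP). It supports zero-shot and prompt-boosted forecasting as well as few-shot generation and classification, without TS-specific architectural modifications.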

AI中文摘要

虽然下一个令牌预测（NTP）统一了LLM的预训练，但其对无界、连续时间序列（TS）的适应仍然是一个开放问题。为了弥合这一差距，我们引入了UniTok，一个将TS转化为离散令牌的通用分词器，以及UniTok-FM，一个在这些令牌上通过NTP预训练的基础模型。UniTok-FM是一个通用基础模型，支持零样本和提示增强的预测，以及通过无训练上下文推理进行的少样本生成和分类——这是先前工作未能实现的能力。在技术上，UniTok是一个向量量化自编码器，结合了前缀归一化以实现尺度稳定、渐进分辨率因果架构用于编码和解码，以及结构保持重建损失用于训练。UniTok-FM采用现成的LLM架构，无需针对TS的特定修改。它不是在孤立的TS上预训练，而是在由多个具有相似模式的序列形成的上下文窗口上执行NTP，旨在捕捉它们的共享动态。在预测、生成和分类上的实验表明，单个统一的UniTok-FM始终优于统计和监督基线，与任务特定的基础模型性能相当，并且独特地实现了跨任务的无训练上下文推理。

英文摘要

While Next-Token Prediction (NTP) has unified LLM pretraining, its adaptation to unbounded, continuous time series (TS) remains open. To bridge the gap, we introduce UniTok, a universal tokenizer that transforms TS into discrete tokens, and UniTok-FM, a foundation model pretrained via NTP on these tokens. UniTok-FM is a general-purpose foundation model that supports zero-shot and prompt-boosted forecasting, as well as few-shot generation and classification via training-free in-context inference--a capability not achieved by prior works. Technically, UniTok is a vector-quantized autoencoder incorporating prefix normalization for scale stabilization, a progressive-resolution causal architecture for encoding and decoding, and a structure-preserving reconstruction loss for training. UniTok-FM adopts an off-the-shelf LLM architecture without TS-specific modifications. Instead of pretraining on isolated TS, it performs NTP on context windows formed by multiple series with similar patterns, aiming to capture their shared dynamics. Experiments on forecasting, generation, and classification show that a single unified UniTok-FM consistently outperforms statistical and supervised baselines, achieves competitive performance with task-specific foundation models, and uniquely enables training-free in-context inference across tasks.

2606.09860 2026-06-10 cs.LG cs.AI stat.AP stat.ML 新提交

Conformal Risk Prediction for Non-Alcoholic Fatty Liver Disease Using Gradient Boosting with Distribution-Free Coverages

基于梯度提升与无分布覆盖的非酒精性脂肪肝病共形风险预测

Xinze Zhang

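AI summary: Proposes Method, a machine-learning framework that couples gradient-boosted decision trees with conformal prediction to give calibrated, distribution-free coverage for individual NAFLD risk estimates. On a multicenter cohort from Guangzhou, China, it reaches an internal AUROC of 0.912 and outperforms deep neural networks, TabNet, support vector machines, and logistic regression.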

AI中文摘要

非酒精性脂肪肝病（NAFLD）影响全球约25%的成年人，带来显著的肝脏和心血管风险。然而，人群层面的筛查工具仍不充分。我们提出Method，一种用于NAFLD风险预测的机器学习框架，将梯度提升决策树与共形预测相结合，以在个体风险估计上产生校准的、无分布的覆盖保证。它集成了基于互信息的稳定性选择过程，通过自助重采样识别紧凑、临床可解释的特征子集，构建预测集，其边际覆盖可证明超过用户指定的置信水平。我们在中国广州的多中心队列（主要n=2,187；外部验证n=412）上评估了Method，使用了涵盖人口统计学、代谢生物标志物和生活方式因素的78个候选特征。Method内部AUROC为0.912，外部为0.891，优于深度神经网络、TabNet、支持向量机和逻辑回归。共形预测集在90%名义水平下达到91.3%的经验覆盖。从这些分数得出的三层风险分层将人群分为不同组别，高风险亚组的12个月进展率是低风险组的4.7倍。选定的特征——特别是腰围、ALT、GGT、甘油三酯、空腹血糖和BMI——与已建立的代谢风险因素一致，提供了生物学合理性。

英文摘要

Non-alcoholic fatty liver disease (NAFLD) affects roughly 25% of global adults, posing substantial hepatic and cardiovascular risks. Yet, population-level screening tools remain inadequate. We present Method, a machine-learning framework for NAFLD risk prediction coupling gradient-boosted decision trees with conformal prediction to yield calibrated, distribution-free coverage guarantees on individual risk estimates. It integrates a mutual-information-based stability selection procedure to identify a compact, clinically interpretable feature subset via bootstrap resampling, constructing prediction sets whose marginal coverage provably exceeds a user-specified confidence level. We evaluated Method on a multicenter cohort from Guangzhou, China (primary n=2,187; external validation n=412) using 78 candidate features across demographics, metabolic biomarkers, and lifestyle factors. Method achieves an AUROC of 0.912 internally and 0.891 externally, outperforming deep neural networks, TabNet, support vector machines, and logistic regression. Conformal prediction sets achieve 91.3% empirical coverage at the 90% nominal level. A three-tier risk stratification derived from these scores separates the population into distinct groups, with the high-risk subgroup showing a 12-month progression rate 4.7 times that of the low-risk tier. The selected features -- notably waist circumference, ALT, GGT, triglycerides, fasting glucose, and BMI -- align with established metabolic risk factors, providing biological plausibility.
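The coverage claim in this abstract rests on conformal prediction. As a rough illustration only, the following is a minimal sketch of generic split conformal prediction for a binary outcome, not the paper's pipeline: the function names, the nonconformity score (one minus the predicted probability of the true label), and the toy numbers are all assumptions made here. Under exchangeability of calibration and test points, sets built this way contain the true label with probability at least 1 - alpha.

```python
import math


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    # Use the ceil((n + 1) * (1 - alpha))-th smallest score.
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points: only the trivial (full) set is valid.
        return float("inf")
    return sorted(scores)[rank - 1]


def calibrate(cal_probs, cal_labels, alpha):
    """cal_probs[i] is a model's P(y = 1 | x_i); labels are 0 or 1 (assumed)."""
    # Nonconformity score: one minus the probability given to the true label.
    scores = [1.0 - (p if y == 1 else 1.0 - p)
              for p, y in zip(cal_probs, cal_labels)]
    return conformal_quantile(scores, alpha)


def prediction_set(prob_pos, qhat):
    """Return every label whose nonconformity score is at most qhat."""
    candidates = {1: 1.0 - prob_pos, 0: prob_pos}
    return {label for label, score in candidates.items() if score <= qhat}


# Toy calibration data (hypothetical, for illustration only).
qhat = calibrate([0.9, 0.2, 0.6, 0.7], [1, 0, 0, 1], alpha=0.5)
confident = prediction_set(0.9, qhat)   # a confident positive prediction
ambiguous = prediction_set(0.5, 0.6)    # a looser threshold keeps both labels
```

A set containing both labels signals that the model cannot separate the two outcomes at the requested confidence level, which is the kind of per-individual uncertainty statement the abstract refers to.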

2606.09856 2026-06-10 cs.CL cs.AI cs.LG stat.ML 新提交

Using Probabilistic Programs to Train Inductive Reasoning in Large Language Models

使用概率程序训练大型语言模型的归纳推理

Liyi Zhang, Akshay K. Jagadish, Brenden M. Lake, Thomas L. Griffiths

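AI summary: Proposes Program-based Posterior Training (PPT): an LLM generates scenarios as probabilistic programs, probabilistic inference produces distributional targets, and models are fine-tuned on them, improving inductive estimation accuracy, alignment with human judgments, and calibration.

Comments: 20 pages, 5 figures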

AI中文摘要

大型语言模型（LLM）的后训练推理通常专注于数学和编码等演绎任务，其中正确性可验证。然而，许多现实世界的推理问题是归纳性的：智能体必须从稀疏、模糊的观测中推断不确定的信念。使用标准微调方法进行归纳推理面临挑战，包括难以策划大规模、高质量标注数据集以及处理本质上是分布式的目标。在这项工作中，我们引入了一种称为基于程序的后验训练（PPT）的新方法来解决这些局限性：我们使用LLM生成多样化的开放世界场景作为概率程序，运行概率推理以产生查询的分布式目标响应，然后在这些概率软标签上进行微调。使用这种方法，我们在10,000个程序生成的场景上微调LLM，并在保留的模板、人工标注的判断和外部基准上进行评估。总体而言，PPT显著提高了保留归纳任务的估计准确性，增强了与人类判断的一致性，并迁移到估计和校准的外部基准。此外，原始校准的增益并未被事后温度缩放所涵盖，表明与输出重新缩放相比，模型更深入地内化了不确定性。这些结果表明，概率程序介导的微调是一种有前景的方法，用于后训练LLM以可靠地执行近似归纳推理。

英文摘要

Post-training Large Language Models (LLMs) for reasoning typically focuses on deductive tasks such as mathematics and coding where correctness is verifiable. Yet, many real-world reasoning problems are inductive: agents must infer uncertain beliefs from sparse, ambiguous observations. There are challenges to using standard fine-tuning methods for inductive reasoning, including difficulties in curating large-scale, high-quality labeled datasets and in handling targets that are inherently distributional. In this work, we introduce a novel approach, called Program-based Posterior Training (PPT), to address these limitations: we use an LLM to generate diverse open-world scenarios as probabilistic programs, run probabilistic inference to produce distributional target responses to queries, and then fine-tune on these probabilistic soft labels. Using this approach, we fine-tune LLMs on 10,000 programmatically generated scenarios and evaluate on held-out motifs, human-labeled judgments, and external benchmarks. Overall, PPT substantially improves estimation accuracy on held-out inductive tasks, increases alignment with human judgments, and transfers to external benchmarks for estimation and calibration. Additionally, the gains in raw calibration are not subsumed by post-hoc temperature scaling, showing that the models have more deeply internalized uncertainty compared to output rescaling. Together, these results suggest that probabilistic-program-mediated fine-tuning is a promising approach for post-training LLMs to reliably perform approximate inductive inference.

2606.09854 2026-06-10 cs.CL cs.AI cs.CY cs.LG 新提交

Can Multi-Agent LLMs Identify Their Peers? Stylometric Fingerprinting in Role-Constrained Political Analysis

多智能体大语言模型能否识别其同类？角色约束政治分析中的笔迹风格指纹识别

Juergen Dietrich

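AI summary: Studies whether LLMs can identify the model family behind anonymized political-analysis texts from stylometric cues, and introduces the SD-CV protocol. A fine-tuned T5-base model reaches Macro F1 = 0.991 on the five-class attribution task, showing that prompt-level anonymization alone does not remove model identity signals.

Comments: 24 pages, 3 figures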

AI中文摘要

用于政治声明分析的多智能体大语言模型（LLM）管道容易受到同伴保护偏见的影响：模型倾向于保护同伴模型免于停用，并表现出依赖身份的评分扭曲。提示级匿名化被提出作为缓解措施，但先前的工作同时记录了在角色约束输出中笔迹风格指纹在匿名化后仍然存在——这引发了该缓解措施是否足够的问题。本文首次系统研究LLM是否能在匿名化条件下识别政治分析文本背后的模型家族。我们评估了三种分类器方法——LLM零样本和少样本（Claude Sonnet 4.6和Llama-3.3-70B）以及微调的T5-base模型——在一个涵盖四个商业LLM家族和一个开放世界“未知”类的五类归属任务上。我们引入了一种声明不相交的交叉验证协议（SD-CV；定义见第3.5节），该协议保证训练和验证数据之间没有内容重叠，并将其与运行不相交的基线（RD-CV）进行对比。T5在SD-CV下达到Macro F1 = 0.991（±0.008），在24个完全保留的声明上F1 = 0.978——尽管与RD-CV相比，训练-测试内容距离增加了2.1倍（0.767 vs. 0.366，p<0.001），但仍表现出稳健性，证明了真正的笔迹风格泛化能力。一项分数SD-CV分析确定了训练数据40%（约440篇文本）处的性能拐点。我们的研究结果证实，仅靠提示级匿名化无法消除模型身份信号，这对欧盟AI法案合规性（第13、14、26条）以及质量关键型多智能体部署中的计算机系统验证（CSV）具有直接影响。

英文摘要

Multi-agent large language model (LLM) pipelines for political statement analysis are vulnerable to peer-preservation bias: models tend to protect peer models from deactivation and show identity-dependent scoring distortions. Prompt-level anonymization was proposed as a mitigation, but prior work simultaneously documented that stylometric fingerprints survive anonymization in role-constrained outputs, raising the question of whether this mitigation is sufficient. This paper provides the first systematic investigation of whether LLMs can identify the model family behind political analysis texts under anonymization conditions. We evaluate three classifier approaches (LLM zero-shot and few-shot, using Claude Sonnet 4.6 and Llama-3.3-70B, and a fine-tuned T5-base model) on a five-class attribution task covering four commercial LLM families and an open-world 'unknown' class. We introduce a statement-disjoint cross-validation protocol (SD-CV; defined in Section 3.5) that guarantees no content overlap between training and validation data, and contrast it with a run-disjoint baseline (RD-CV). T5 achieves Macro F1 = 0.991 (±0.008) under SD-CV and F1 = 0.978 on 24 completely held-out statements, remaining robust despite a 2.1x increase in train-test content distance versus RD-CV (0.767 vs. 0.366, p<0.001) and demonstrating genuine stylometric generalization. A fractional SD-CV analysis identifies a performance knee at 40% of training data (~440 texts). Our findings confirm that prompt-level anonymization alone cannot neutralize model identity signals, with direct implications for EU AI Act compliance (Articles 13, 14, 26) and for computer system validation (CSV) in quality-critical multi-agent deployments.
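The statement-disjoint idea can be illustrated with a short sketch. The abstract defers the actual SD-CV definition to the paper's Section 3.5, so the code below is only an assumed reading: every text generated for the same statement is kept in a single fold, so no statement appears on both sides of a split. The function names, the greedy balancing rule, and the toy statement IDs are hypothetical.

```python
from collections import defaultdict


def statement_disjoint_folds(statement_ids, n_folds):
    """Assign text indices to folds so each statement lives in exactly one fold."""
    by_statement = defaultdict(list)
    for index, sid in enumerate(statement_ids):
        by_statement[sid].append(index)
    folds = [[] for _ in range(n_folds)]
    # Greedy balancing: place the largest statements first,
    # each into the fold that is currently smallest.
    ordered = sorted(by_statement, key=lambda s: (-len(by_statement[s]), str(s)))
    for sid in ordered:
        smallest = min(range(n_folds), key=lambda k: len(folds[k]))
        folds[smallest].extend(by_statement[sid])
    return folds


def train_validation_splits(statement_ids, n_folds):
    """Yield (train_indices, validation_indices) pairs, one per fold."""
    folds = statement_disjoint_folds(statement_ids, n_folds)
    for k in range(n_folds):
        validation = sorted(folds[k])
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train, validation


# Toy example: seven texts written about four statements (hypothetical IDs).
ids = ["s1", "s1", "s2", "s3", "s3", "s3", "s4"]
splits = list(train_validation_splits(ids, n_folds=2))
```

A run-disjoint split, by contrast, would only separate generation runs, so the same statement could still appear in both training and validation data.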

2606.09850 2026-06-10 cs.LG cs.CL 新提交

Mechanistic Analysis of Alignment Algorithms in Language Models

语言模型中对齐算法的机制分析

Aarush Sinha, Ishan Garg, Veeraraju Elluru, Arth Singh, Kushal Garg

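AI summary: Uses layer-wise linear probing, sparse autoencoders, and crosscoders to analyze the internal mechanisms of six preference-optimization methods, finding that different objectives induce different geometric transformations of the representations and that behavioral alignment does not imply uniform internal restructuring.

Comments: Work in Progress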

AI中文摘要

后训练对齐算法主要作为黑箱进行评估，掩盖了它们如何重塑语言模型的内部计算。我们对三种开源模型家族的六种偏好优化方法（PPO、DPO、SimPO、ORPO、GRPO 和 KTO）进行了系统的机制分析。通过集成层间线性探针、稀疏自编码器和交叉编码器，我们定位了偏好表示并量化了对齐引起的潜在空间几何变换。我们发现偏好信号一致地集中在早期-中期或中期-后期层，但不同的目标函数导致定性的不同表示偏移。KTO 和 GRPO 通过建设性的特征共享和稀疏高显著性招募增强了线性可分离性。相反，DPO 和 ORPO 通过非建设性的几何旋转和特征衰减降低了可分离性，而 PPO 和 SimPO 基本保持了基线几何。这些变换表现出架构依赖的变异性，表明行为对齐并不意味着统一的内部重构。我们的发现将对齐确立为一种异质性干预，激励了安全性和可解释性的标准化特征级审计，并强调了需要机制感知的优化目标。

英文摘要

Post-training alignment algorithms are predominantly evaluated as black boxes, obscuring how they reshape language models' internal computations. We present a systematic mechanistic analysis of six preference-optimization methods (PPO, DPO, SimPO, ORPO, GRPO, and KTO) across three open-weight model families. By integrating layer-wise linear probing, Sparse Autoencoders, and crosscoders, we localize preference representations and quantify alignment-induced geometric transformations in latent space. We find that preference signals consistently concentrate in early-to-mid or mid-to-late layers, but different objectives induce qualitatively distinct representational shifts. KTO and GRPO enhance linear separability through constructive feature sharing and sparse, high-salience recruitment. In contrast, DPO and ORPO degrade separability via non-constructive geometric rotation and feature attenuation, while PPO and SimPO largely preserve baseline geometry. These transformations exhibit architecture-dependent variability, demonstrating that behavioral alignment does not imply uniform internal restructuring. Our findings establish alignment as a heterogeneous intervention, motivate standardized feature-level auditing for safety and interpretability, and highlight the need for mechanism-aware optimization objectives.

2606.11138 2026-06-10 cs.LG cs.NA math.NA 新提交

First-Order Trajectory Matching: Fast Ensemble Predictions of Chaotic, Turbulent, Stochastic Systems

一阶轨迹匹配：混沌、湍流、随机系统的快速集成预测

Shreya Jha, Timo Schorlepp, Nicholas Geissler, Jules Berman, Benjamin Peherstorfer

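AI summary: Proposes First-Order Trajectory Matching (FTM), which learns the first-order local transport of probability mass along trajectories of stochastic systems, enabling low-cost ensemble predictions that capture trajectory quantities such as fluxes and circulation.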

2606.10975 2026-06-10 cs.LG eess.SP math.OC 新提交

Learning Doubly Sparse Explicitly Conditioned Transforms

学习双稀疏显式条件变换

Tudor Pistol

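AI summary: Proposes learning a structured, explicitly conditioned transform expressed as the product of a fixed canonical matrix and an adaptive sparse component, retaining the benefits of fast, stable analytical transforms while adding controllable adaptivity. Experiments report state-of-the-art results on the doubly sparse transform learning problem.

Comments: 10 pages, 1 figure, 1 table. Accepted for publication in Procedia Computer Science (30th International Conference on Knowledge-Based and Intelligent Information & Engineering Systems - KES 2026; Invited Session: Global and Constrained Optimization: Algorithms and Applications)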

AI中文摘要

在最近的研究中，找到自然信号假定稀疏结构成立的便利空间已成为一个理想结果，其影响体现在数据压缩、降噪和特征提取等领域。虽然广泛使用的分析变换（如DFT或DCT）已经提供了高效的算法和鲁棒的稀疏表示，但它们假设了关于数据的固定先验，无法准确捕捉更严格信号类别的特定结构。为了解决这个问题，文献中引入了数据自适应学习变换的概念，允许减少变换域中的残差项。最近的研究表明，条件数在此背景下是一个良好的度量，期望的结果在泛化倾向和实现最小近似误差之间交替。受这些考虑启发，我们引入了一种结构化显式条件变换的学习，该变换被表述为一个固定规范矩阵与一个精炼的数据自适应稀疏分量的乘积。这种方法旨在保留快速稳定分析变换的优势，同时引入对数据的可控自适应性。目前尚未发现涉及这种特定公式的参考文献，表明其新颖性。所提出的算法在不精确近端方法的框架内被推导，利用了一个新导出的闭式投影算子。实验观察表明，在双稀疏变换学习问题上取得了最先进的结果，并且与密集变体相比，在显著降低计算成本的同时，有时收敛更快且更好地避免不良局部最小值。

英文摘要

Finding convenient spaces in which certain hypotheses regarding an assumed sparse structure of natural signals hold true has become a desirable result in recent research, its implications being reflected in areas such as data compression, noise reduction and feature extraction. While the extensively used analytical transforms, such as DFT or DCT, already provide efficient algorithms and robust sparse representations, they assume a fixed prior about the data, failing to accurately capture the specific structure of more restrictive classes of signals. To address this, the concept of a data-adaptive, learnt transform has been introduced in the literature, allowing for the reduction of a residual term in the transform domain. More recent studies have shown that the condition number serves as a good metric in this context, where the desired outcome alternates between a generalizing tendency and one that achieves minimal approximation error. Motivated by these considerations, we introduce the learning of a structured, explicitly conditioned transform formulated as the product of a fixed canonical matrix and a refining data-adaptive sparse component. This approach seeks to preserve the advantages of fast and stable analytical transforms, while introducing controllable adaptivity to the data. No references that concern this specific formulation have been identified so far, indicating its novelty. The proposed algorithm is motivated within the framework of inexact proximal methods, leveraging a newly derived closed-form projection operator. Empirical observations demonstrate state-of-the-art results on the doubly sparse transform learning problem and comparable performance with its dense variant at significantly lower computational costs and sometimes faster convergence and better avoidance of bad local minima.

2606.10944 2026-06-10 cs.LG cs.DS math.ST stat.ME stat.ML stat.TH 新提交

Express Language Modeling

Express 语言建模

Albert Gong, Annabelle Michael Carrell, Raaz Dwivedi, Lester Mackey

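AI summary: Proposes Express, a tool that converts non-causal attention approximations into causal ones. Combined with Thinformer, it yields optimal guarantees for causal attention and speeds up four resource bottlenecks in language modeling.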

2606.10841 2026-06-10 cs.RO cs.SY eess.SY math.OC 新提交

Gradient based Bilevel for Inverse Optimal Control, a Riemannian approach

基于梯度的双层逆最优控制：一种黎曼方法

Ahmed-Manaf Dahmani, Vincent Bonnet, David Daney, François Charpillet

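AI summary: Proposes a Riemannian inverse optimal control method that treats the set of optimal trajectories as a manifold and optimizes on it, avoiding the constraint-qualification violations that affect projection-based formulations and cutting computation time by about a factor of four.

Comments: 6 pages, 4 figures. To be published in a control journal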

AI中文摘要

逆最优控制旨在恢复解释观测轨迹作为最优控制问题解的成本函数。经典逆最优控制公式依赖于双层优化，反复求解嵌套的最优控制问题，对于实际系统很快变得计算上不可行。最近的基于投影的方法提供了一种有希望的替代方案，但由于违反标准约束条件，在使用基于梯度的方法求解时会出现数值不稳定性。在本文中，我们表明这些困难源于逆最优控制可行集的几何结构。我们证明满足最优性条件的轨迹集自然形成一个流形，并将逆最优控制重新表述为该流形上的优化问题。基于这一见解，我们提出了一种黎曼逆最优控制方法，该方法将观测轨迹投影到最优解流形上，同时通过构造保持可行性。在真实人类手臂轨迹上的实验表明，所提出的方法在重建精度上与经典双层逆最优控制相当或更好，同时计算时间减少约四倍。这些结果凸显了几何优化方法在提高逆最优控制在机器人和人体运动分析中的可扩展性和可靠性方面的潜力。

英文摘要

Inverse Optimal Control (IOC) aims to recover the cost function that explains observed trajectories as solutions of an optimal control problem. Classical IOC formulations rely on bilevel optimization, which repeatedly solves a nested optimal control problem and quickly becomes computationally prohibitive for realistic systems. Recent projection-based approaches offer a promising alternative but suffer from numerical instability when solved with gradient-based methods due to violations of standard constraint qualifications. In this paper, we show that these difficulties stem from the geometric structure of the IOC feasible set. We demonstrate that the set of trajectories satisfying the optimality conditions naturally forms a manifold and reformulate IOC as an optimization problem on this manifold. Based on this insight, we propose a Riemannian Inverse Optimal Control (RIOC) method that projects observed trajectories onto the manifold of optimal solutions while preserving feasibility by construction. Experiments on real human arm trajectories show that the proposed method achieves comparable or better reconstruction accuracy than classical bilevel IOC while reducing computation time by about a factor of four. These results highlight the potential of geometric optimization methods to improve the scalability and reliability of IOC for robotics and human motion analysis.

2606.10824 2026-06-10 cs.LG math.AT 新提交

Encoding the Euler Characteristic Transform

编码欧拉特征变换

Nello Blaser, Odin Hoff Gardaa, Lars M. Salbu, Elena Xinyi Wang, Bastian Rieck

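AI summary: Proposes a continuous encoding that turns each Euler characteristic curve into a sequence of per-vertex net changes, which a small transformer maps to a feature vector. It improves classification accuracy on five of six datasets.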

AI中文摘要

欧拉特征曲线（ECC）记录线性嵌入的胞复形在给定方向上的欧拉特征随过滤高度的变化，而欧拉特征变换（ECT）是通过收集多个方向上的ECC得到的单射形状描述符。如何为神经网络编码ECT本身是一种归纳偏置，传统上通过离散化每个ECC来固定。我们引入一种连续编码：对于每个方向和每个顶点，它记录归因于该顶点的净欧拉特征变化，产生一个每个方向的令牌序列，由一个小型变换器映射到特征向量。我们将得到的流程分为两个正交轴上的阶段：一个ECC编码器，在每个方向内作用，将其曲线映射到固定长度向量；以及一个ECT表示，跨方向作用，聚合每个方向的向量为一个。我们研究了六种ECT表示架构，涵盖从结构无关的前馈基线到保持平面旋转等变性的卷积和复值模型的一系列归纳偏置。在涵盖点云、图、立方复形和网格的六个分类基准上，连续编码在六个数据集中有五个提高了准确率，控制实验将增益归因于令牌化本身而非增加的变换器容量。表示架构的重要性小于编码，其归纳偏置的收益取决于编码：前馈网络在连续编码下表现最佳，但在离散化下不如卷积架构鲁棒。

英文摘要

The Euler Characteristic Curve (ECC) records the Euler characteristic of a linearly embedded cell complex as a function of filtration height in a given direction, and the Euler Characteristic Transform (ECT) is the injective shape descriptor obtained by collecting ECCs over many directions. How the ECT is encoded for a neural network is itself an inductive bias, conventionally fixed by discretizing each ECC. We introduce a continuous encoding: for each direction and each vertex it records the net Euler-characteristic change attributed to that vertex, producing a per-direction token sequence that a small transformer maps to a feature vector. We separate the resulting pipeline into two stages on orthogonal axes: an ECC encoder that acts within each direction, mapping its curve to a fixed-length vector, and an ECT representation that acts across directions, aggregating the per-direction vectors into one. We study six ECT representation architectures spanning a range of inductive biases, from a structure-agnostic feedforward baseline to convolutional and complex-valued models that preserve equivariance under planar rotations. Across six classification benchmarks covering point clouds, graphs, cubical complexes, and meshes, the continuous encoding improves accuracy on five of six datasets, and control experiments attribute the gain to the tokenization itself rather than to the added transformer capacity. The representation architecture matters less than the encoding, and the payoff from its inductive biases depends on the encoding: a feedforward network performs best under continuous encoding but is less robust under discretization than convolutional architectures.
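The per-vertex encoding described here can be made concrete with a small sketch. The abstract does not give the exact attribution or tokenization rule, so the version below is an assumption: each simplex of a simplicial complex is attributed to its highest vertex in the chosen direction (ties broken by vertex index) and contributes (-1)^dim to that vertex's net change. Summing the changes of all vertices at or below a height then recovers the ECC at that height. All function names and the toy triangle are hypothetical.

```python
def vertex_euler_changes(vertices, simplices, direction):
    """Return (heights, changes): per-vertex height and net Euler-characteristic change.

    `simplices` lists every simplex as a tuple of vertex indices,
    including the vertices themselves as 1-tuples.
    """
    heights = [sum(c * d for c, d in zip(v, direction)) for v in vertices]
    changes = [0] * len(vertices)
    for simplex in simplices:
        # A simplex enters the filtration when its highest vertex does.
        top = max(simplex, key=lambda i: (heights[i], i))
        # A simplex of dimension k contributes (-1)^k.
        changes[top] += (-1) ** (len(simplex) - 1)
    return heights, changes


def euler_characteristic_curve(heights, changes, threshold):
    """ECC value at `threshold`: total change from vertices at or below it."""
    return sum(c for h, c in zip(heights, changes) if h <= threshold)


def direction_tokens(heights, changes):
    """Per-direction token sequence, sorted by height (assumed token layout)."""
    return sorted(zip(heights, changes))


# Hollow triangle (a topological circle), filtered in the direction (0, 1).
verts = [(0, 0), (1, 0), (0, 1)]
hollow = [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2)]
h, ch = vertex_euler_changes(verts, hollow, (0, 1))
tokens = direction_tokens(h, ch)
```

In this toy case the bottom edge appears first with Euler characteristic 1, and the top vertex closes the loop and brings it down to 0, the Euler characteristic of a circle.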

2606.10806 2026-06-10 cs.AI math.FA 新提交

Moonshine: An Autonomous Mathematical Research Agent Centered on Conjecture Generation

Moonshine：一个以猜想生成为中心的自主数学研究智能体

Xiaoyang Chen, Xiang Jiang

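AI summary: Presents Moonshine, an autonomous agent that extracts structure from classical problems, distills new concepts, and generates mathematical conjectures. Taking the Jacobian conjecture as an example, it formulates the Neural Jacobian Conjecture and obtains proofs for the case $N=n+1$.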

AI中文摘要

Moonshine是一个自主智能体，其核心目标是生成数学猜想。它的核心能力是从经典问题中提取结构、提炼新概念，并制定具有数学意义的猜想。Moonshine不将解决单个命题作为终点，而是通过猜想生成、桥梁构建和障碍识别来构建可扩展的理论框架。本文以Moonshine对Jacobian猜想的探索为例，展示了局部非退化性是否强制全局单射性的核心逻辑如何转移到单隐层仿射-岭sigmoid网络上。这导致了神经Jacobian猜想（NJC）的提出：如果这样的网络在整个空间上具有严格正的Jacobian行列式，则它必须是全局单射的。通过分别调用GPT-5.5-pro和DeepSeek-V4-pro，Moonshine获得了情况$N=n+1$的独立完整证明。此外，在ChatGPT通过其网页界面与GPT-5.5-pro交互使用的辅助下，开发了一个几何拓扑证明。这些结果为猜想的合理性提供了初步证据。然而，一般的高宽度情况$N\ge n+2$仍未解决，留待进一步研究。这项工作展示了Moonshine自主生成有意义的数学问题并对其取得严格进展的能力。

英文摘要

Moonshine is an autonomous agent whose central objective is to generate mathematical conjectures. Its core capability is to extract structure from classical problems, distill new concepts, and formulate conjectures of mathematical significance. Rather than treating the solution of a single proposition as its endpoint, Moonshine builds an extensible theoretical framework through conjecture generation, bridge building, and obstacle identification. This article uses Moonshine's exploration of the Jacobian conjecture as an example. It shows how the central logic of whether local nondegeneracy can force global injectivity is transferred to one-hidden-layer affine-ridge sigmoid networks. This leads to the formulation of the Neural Jacobian Conjecture (NJC): if such a network has strictly positive Jacobian determinant on the whole space, then it must be globally injective. By invoking GPT-5.5-pro and DeepSeek-V4-pro separately, Moonshine obtained independent complete proofs for the case $N=n+1$. In addition, with the assistance of ChatGPT through interactive use of its web interface with GPT-5.5-pro, a geometric-topological proof was developed. These results provide preliminary evidence for the plausibility of the conjecture. The general higher-width case $N\ge n+2$, however, remains unresolved and is left for further investigation. This work illustrates Moonshine's ability to autonomously generate meaningful mathematical problems and make rigorous progress on them.

2606.10583 2026-06-10 cs.LG cs.AI math.OC 新提交

NOVA: Symbolic Regression Discovery of Interpretable Car-Following and Lane-Change Models with Driver Heterogeneity

NOVA: 可解释的跟驰与换道模型及驾驶员异质性的符号回归发现

Ishak Abassi, Nassim Ali Bouazzouni, Farah Ibelaiden, Nadir Farhi

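AI summary: Proposes NOVA, a symbolic regression framework that automatically discovers interpretable car-following and lane-change structures from raw trajectory data. It outperforms baselines on the NGSIM datasets and links the dominant nonlinear term to a psychophysical theory of collision avoidance.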

AI中文摘要

我们提出了NOVA，一个自主符号回归框架，能够从原始轨迹数据中识别出可解释的跟驰和换道结构，且仅需极少的先验行为假设。应用于来自NGSIM I-80和US-101数据集的4,765,788个活跃驾驶观测，NOVA的确定性Rust驱动搜索引擎评估了超过10,000个候选代数结构，并在前向平移滚动均值预测目标下识别出一个紧凑的两项加速度模型。在两种互补的预处理流程下评估，NOVA在意图预测基准上实现了RMSE = 1.376 m/s²（R² = 15.57%），在相同评估协议下，RMSE比最佳重新校准的符号回归基线（SR-LLM, PNAS 2025）低0.135 m/s²。在八个独立实验中，单个主导非线性项作为人类跟驰的稳健骨干出现；残差引导的扩展进一步将所选结构与已建立的碰撞避免心理物理理论联系起来。发现的特征算子在不同高速公路地点之间零样本迁移，R²损失低于3个百分点。扩展到多项logit框架内的换道建模，NOVA在502个未见驾驶员的严格车辆ID留出测试中实现了67.4%的平衡准确率，在三类问题上超过现有换道基线+29.8个百分点。

英文摘要

We present NOVA, an autonomous symbolic regression framework that identifies interpretable car-following and lane-change structures from raw trajectory data with minimal behavioral priors. Applied to 4,765,788 active driving observations from the NGSIM I-80 and US-101 datasets, NOVA's deterministic Rust-powered search engine evaluates over 10,000 candidate algebraic structures and identifies a compact two-term acceleration model under a forward-shifted rolling-mean prediction target. Evaluated under two complementary preprocessing pipelines, NOVA achieves RMSE = 1.376 m/s² (R² = 15.57%) on the intent-forecasting benchmark, outperforming the best recalibrated symbolic-regression baseline (SR-LLM, PNAS 2025) by 0.135 m/s² in RMSE under an identical evaluation protocol. Across eight independent experiments, a single dominant nonlinear term emerges as a robust backbone of human car-following; a residual-guided extension further links the selected structure to an established psychophysical theory of collision avoidance. The discovered feature operators transfer zero-shot between freeway sites with an R² loss of under 3 percentage points. Extended to lane-change modelling within a multinomial logit framework, NOVA achieves 67.4% balanced accuracy under strict vehicle-ID holdout on 502 unseen drivers, surpassing existing lane-changing baselines by 29.8 percentage points on a three-class problem.

2606.10321 2026-06-10 cs.LG cs.AI cs.RO math.OC 新提交

Baseline-Free Policy Optimization for Neural Combinatorial Optimization

无基线的神经组合优化策略优化

Carlos S. Sepúlveda, Gonzalo A. Ruz

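AI summary: Applies GRPO to remove the baseline dependence in neural combinatorial optimization, avoiding training collapse and coming within 2% of POMO in solution quality on TSP and CVRP benchmarks.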

AI中文摘要

神经组合优化（NCO）训练自回归策略以解决路由问题。标准训练算法REINFORCE使用滚动基线，需要维护并定期更新策略的冻结副本以降低方差。这种基线引入了一个结构脆弱性：在更难的问题实例上，较差的基线会产生噪声梯度估计，从而破坏训练稳定性。我们评估了来自大语言模型对齐的组相对策略优化（GRPO），该算法通过归一化组内采样轨迹的优势完全消除了基线。在RL4CO框架内对TSP和CVRP基准上的五种RL算法进行受控比较，我们发现：(i) GRPO避免了REINFORCE在TSP-100上观察到的训练崩溃，其中性能在预热阶段后立即从成本9.8下降到52.1，并且在延长训练下无法恢复；(ii) 在匹配的梯度更新次数下，GRPO达到了与POMO（一种基于AM的强多起点基线）在2%以内的解质量，同时无需外部基线；(iii) P3O，一种也来自对齐文献的成对偏好算法，在TSP上具有竞争力，但在CVRP上表现出更高的变异性。这些结果表明GRPO是一种有前途的无基线NCO替代方案，特别是在基线依赖训练变得脆弱的场景中。

英文摘要

Neural combinatorial optimization (NCO) trains autoregressive policies to solve routing problems. The standard training algorithm, REINFORCE with a rollout baseline, requires maintaining and periodically updating a frozen copy of the policy for variance reduction. This baseline introduces a structural vulnerability: on harder instances, a poor baseline produces noisy gradient estimates that can destabilize training. We evaluate Group Relative Policy Optimization (GRPO), an algorithm from large language model alignment that eliminates the baseline entirely by normalizing advantages within groups of sampled trajectories. In a controlled comparison of five RL algorithms on TSP and CVRP benchmarks within the RL4CO framework, we find that: (i) GRPO avoids the training collapse observed with REINFORCE on TSP-100, where performance degrades from cost 9.8 to 52.1 immediately after the warmup phase and does not recover under extended training; (ii) at matched gradient updates, GRPO achieves solution quality within 2% of POMO, a strong AM-based multi-start baseline, while requiring no external baseline; and (iii) P3O, a pairwise preference algorithm also from the alignment literature, is competitive on TSP but shows higher variability on CVRP. These results identify GRPO as a promising baseline-free alternative for NCO, particularly in settings where baseline-dependent training becomes fragile.
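The baseline-free step described above, normalizing advantages within a group of sampled trajectories, can be sketched in a few lines. This is only an illustration of the general GRPO-style normalization under assumptions made here: the reward is taken as the negative tour cost, the population standard deviation is used, and `eps` is a hypothetical stabilizer. It does not reproduce the paper's RL4CO setup.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of trajectories sampled for the same instance.

    No learned or rollout baseline is involved: the group mean plays that role.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mean) / (std + eps) for r in rewards]


# Three sampled tours for one routing instance; reward = -cost (assumed convention).
tour_costs = [10.0, 12.0, 14.0]
advantages = group_relative_advantages([-c for c in tour_costs])
```

Tours shorter than the group average receive positive advantages and longer ones negative, so the policy gradient needs no frozen policy copy. If every sampled tour has the same cost, all advantages are zero and that group contributes no gradient signal.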

2606.10289 2026-06-10 cs.RO cs.NA math.NA 新提交

Improved Representation of Matrix Lie Group Operations through Tensor Notation

通过张量符号改进矩阵李群运算的表示

Clark Taylor

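AI summary: Introduces tensor and Einstein summation notation to simplify how matrix Lie group operations are expressed when computing Lie derivatives, making gradient computations in estimation frameworks clearer.

Comments: 12 pages, 4 figures + graphical abstract, 1 algorithm, 4 tables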

2606.10085 2026-06-10 cs.LG eess.SP math.OC 新提交

Structured Adaptive Tensor Prediction for Streaming Data

流式数据的结构化自适应张量预测

Zhen Qin, Yang Chen

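AI summary: For streaming prediction of matrix-valued time series, proposes an adaptive tensor regression framework with Matrix-on-Matrix and Tensor-on-Matrix formulations and online SGD algorithms. The Tensor-on-Matrix model achieves lower steady-state error and stronger denoising, and recovery guarantees are established under low-dimensional structure.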

AI中文摘要

矩阵值时间序列出现在广泛的应用中，例如来自医学成像和地球物理学的时空数据。现有方法主要针对静态环境设计，缺乏对流式和时变环境的适应性。自适应滤波技术也大多局限于标量或向量值数据，使得矩阵值时间序列的自适应预测理解不足。为弥补这些差距，我们开发了一个自适应张量回归框架，包括矩阵-矩阵（MoM）和张量-矩阵（ToM）两种形式，用于流式矩阵值预测。这两种形式的区别在于是否直接建模矩阵值输出，或通过高阶张量表示利用时间结构。针对所提出的张量回归框架，我们开发了用于在线学习的随机梯度下降（SGD）算法。我们表明，将多个响应随时间堆叠成高阶张量可以提高性能；特别是，ToM比MoM实现了更低的稳态误差和更强的去噪能力，这促使我们关注ToM模型。我们进一步刻画了SGD在时变动态下的跟踪行为。从统计角度，我们建立了ToM在一般低维结构（包括稀疏性、低秩性及其联合稀疏低秩模型）下的固定时间恢复保证。

英文摘要

Matrix-valued time series arise in a wide range of applications, such as spatio-temporal data from medical imaging and geophysics. Existing methods are mainly designed for static settings and lack adaptability to streaming and time-varying environments. Adaptive filtering techniques have also been largely limited to data with scalar or vector values, leaving adaptive forecasting for matrix-valued time series inadequately understood. To bridge these gaps, we develop an adaptive tensor regression framework that includes Matrix-on-Matrix (MoM) and Tensor-on-Matrix (ToM) formulations for streaming matrix-valued prediction. The two formulations differ in whether to directly model matrix-valued outputs or to exploit temporal structure via higher-order tensor representations. For the proposed tensor regression framework, we develop stochastic gradient descent (SGD) algorithms for online learning. We show that stacking multiple responses across time into higher-order tensors improves performance; in particular, ToM achieves lower steady-state error and stronger denoising capability than MoM, motivating our focus on the ToM model. We further characterize the tracking behavior of SGD under time-varying dynamics. From a statistical perspective, we establish fixed-time recovery guarantees for ToM under general low-dimensional structures, including sparsity, low-rankness, and joint sparse and low-rank models.
