arXivDaily arXiv每日学术速递 周一至周五更新
全部学科分类 1971
热门方向导航
2606.18884 2026-06-18 cs.CV 新提交

Performance Gap Analysis between Latin and Arabic Scripts HTR

拉丁文与阿拉伯文手写文本识别之间的性能差距分析

Sana Al-azzawi, Elisa Barney, Marcus Liwicki

发表机构 * Luleå University of Technology Department of Computer Science, Electrical

AI总结 本研究使用统一CRNN模型在多个数据集上比较阿拉伯文和拉丁文手写文本识别性能,发现性能差距在低资源场景下显著,随数据增加而缩小但持续存在,并分析了标注质量、视觉变异性和字符分布等因素。

Comments this paper accepted at TIPS workshop ICPR 2026

详情
AI中文摘要

最近的研究表明,手写文本识别(HTR)系统在阿拉伯文数据集上的表现不如拉丁文数据。然而,由于缺乏受控比较,这种差距的原因仍不清楚。在这项工作中,我们使用统一的CRNN模型对阿拉伯文和拉丁文脚本进行行级HTR的全面研究,涵盖九个数据集(包括KHATT(阿拉伯文)、Muharaf(阿拉伯文)、NUST-UHWR(乌尔都文)、PHTD(波斯文)、IAM(英文)、READ-2016(德文)等)和不同的训练规模(K ∈ {100, 500, 1000, 2000, ..., Kfull})。我们的结果显示性能差距仍然存在:在低资源设置下差距很大,随着数据增加而缩小,但在全规模下仍然存在,一致相差5-7个CER点。我们表明标注质量很重要,因为许多数据集包含标注错误。清理降低了错误率并缩小了差距,但并未消除差距。此外,我们发现由于阿拉伯文具有更高的视觉变异性,固定数量的训练样本提供的覆盖效率较低,需要更多数据来学习相似的表示。我们根据文本行数和字符数比较了跨数据集的识别性能,显示了等价权衡。我们比较了跨脚本的字符频率分布,并表明阿拉伯文比拉丁文显著更重尾。我们的错误分析显示,阿拉伯文数据集(例如KHATT)中约30%的替换错误是由视觉相似字符之间的混淆引起的,而在拉丁文数据集(如IAM)中约为15%。

英文摘要

Recent studies have shown that handwritten text recognition (HTR) systems perform worse on Arabic-script datasets than on Latin-script data. However, the reasons for this gap are still not well understood due to the lack of controlled comparisons. In this work, we present a comprehensive study of Arabic and Latin scripts HTR using a unified CRNN model for line-level HTR across nine datasets (including KHATT (Arabic), Muharaf (Arabic), NUST-UHWR (Urdu), PHTD (Persian), IAM (English), READ-2016 (German), and others) and di ferent training sizes (K in {100, 500, 1000, 2000, ..., Kfull}). Our results show the performance gap remains: it is large in low-resource settings, decreases with more data, but remains even at full scale, with a consistent difference of 5-7 CER points. We show that annotation quality matters, as many datasets contain labeling errors. Cleaning reduces error rates and narrows the gap, but does not eliminate it. In addition, we find that a fixed number of training samples provides less effective coverage in Arabic due to higher visual variability, requiring more data to learn similar representations. We compare recognition across datasets in terms of the number of text lines and the number of characters, showing an equivalence trade-off. We compare character frequency distributions across scripts and show that Arabic is significantly more heavy-tailed than Latin. Our error analysis reveals that around 30 percent of substitution errors in Arabic datasets (e.g., KHATT) are caused by confusion between visually similar characters, compared to about 15 percent in Latin-script datasets such as IAM.

2606.18883 2026-06-18 cs.RO 新提交

ZiMPedance: Impedance-Aware ZMP Modeling and Control for Payload Carrying with Quadruped Robots

ZiMPedance:面向四足机器人负载搬运的阻抗感知ZMP建模与控制

Giovanni B. Dessy, Lorenzo Amatucci, Victor Barasuol, Claudio Semini

发表机构 * Dynamic Legged Systems Lab, Istituto Italiano di Tecnologia (IIT)(动态腿部系统实验室,意大利技术研究院(IIT))

AI总结 提出扩展零力矩点(ZMP)公式以包含被动负载接口动力学,结合模型预测控制减少稳定性违规达10倍,并提高运动效率。

详情
AI中文摘要

四足机器人的负载运输受到机器人与负载之间物理接口动力学的强烈影响。与主动机械臂相比,被动弹簧臂减轻了重量和复杂性,但其弹簧-阻尼动力学可能引入振荡力,降低运动稳定性。本文推导了一个扩展的零力矩点(ZMP)公式,该公式包含被动负载接口动力学,将刚度、阻尼和负载质量与稳定性裕度联系起来。分析表明,欠阻尼配置可能与运动谐波共振。基于这一见解,我们通过被动子系统动力学增强了单刚体动力学模型,并将其集成到模型预测控制框架中。在仿真中,所提出的控制器将稳定性违规减少高达10倍(从7.0%降至0.7%),并通过将水平地面反作用力努力降低高达15%来提高运动效率。硬件实验表明,在标称控制器失效的拉放扰动下,携带2公斤负载的机器人能够稳定运动。同一模型还使得通过被动臂动力学实现末端执行器跟踪成为可能,而无需直接驱动臂。

英文摘要

Load transportation with quadruped robots is strongly affected by the dynamics of the physical interface between the robot and the load. Passive spring-based arms reduce weight and complexity compared to active manipulators, but their spring-damper dynamics can introduce oscillatory forces that degrade locomotion stability. This paper derives an extended Zero Moment Point (ZMP) formulation that includes passive payload-interface dynamics, relating stiffness, damping, and payload mass to the stability margin. The analysis shows that underdamped configurations can resonate with locomotion harmonics. Based on this insight, we augment a Single Rigid Body Dynamics model with passive subsystem dynamics and integrate it into a Model Predictive Control framework. In simulation, the proposed controller reduces stability violations by up to $10\times$, from $7.0\%$ to $0.7\%$, and increase locomotion efficiency by lowering horizontal ground reaction force effort by up to $15\%$ compared to a nominal baseline. Hardware experiments with a $2\,\mathrm{kg}$ payload show stable locomotion under pull-release disturbances where the nominal controller fails. The same model also enables end-effector tracking through passive arm dynamics without direct arm actuation.

2606.18882 2026-06-18 cs.LG cs.AI eess.SP 新提交

Domain-Shift Aware Neural Networks for Unbalance Characterization in Rotating Systems

面向旋转系统不平衡表征的域偏移感知神经网络

Bernardo Feijó Junqueira, Claudio Kiyoshi Umezu, Bruno Bilhar Karaziack, Tomaz Junior, Daniel Alves Castello

发表机构 * Springer Nature

AI总结 提出域偏移感知神经网络,通过最大均值差异策略对齐源域与目标域特征,解决变工况下旋转轴不平衡质量估计的回归问题,实验证明该方法在域偏移未知时显著提升预测精度。

详情
AI中文摘要

本文研究了域偏移感知神经网络在回归任务中的应用,旨在估计不同运行条件下旋转轴的不平衡质量。实验数据来自一个测试台,其中主轴上安装有带不平衡质量的法兰,在不同转速下驱动,同时可选择性地激活副轴以引入域差异。不平衡质量固定在径向距离上,使用三轴加速度计记录系统的动态响应。质量估计的逆问题在域自适应框架中提出,网络采用最大均值差异策略进行训练,以对齐源域和目标域的特征表示。结果表明,显式处理域偏移能有效提高预测精度,尤其是在系统的物理行为和域偏移来源不完全已知且超出训练条件的情况下。这些发现凸显了域偏移感知模型在结构健康监测回归任务中的潜力。

英文摘要

This work investigates the application of a domain-shift aware neural network for regression tasks aimed at estimating unbalance masses in rotating shafts under varying operating conditions. Experimental data were collected from a test rig in which a primary shaft, equipped with a flange carrying unbalanced masses, was driven at different rotational speeds, while a secondary shaft could be optionally activated to introduce domain discrepancy. The unbalance masses were positioned at a fixed radial distance, and the dynamic response of the system was recorded using triaxial accelerometers. The inverse problem of mass estimation is formulated within a domain adaptation framework, where the network is trained with a maximum mean discrepancy strategy to align feature representations across source and target distributions. The results demonstrate the effectiveness of explicitly addressing domain shift in improving prediction accuracy, especially when the system's physical behavior and sources of domain discrepancy are not fully known and fall outside the training conditions. These findings highlight the potential of domain-shift aware models for regression tasks in Structural Health Monitoring.

2606.18876 2026-06-18 cs.CV cs.LG 新提交

Test-Time Adaptation in Optical Coherence Tomography Using Trajectory-Aligned Time-Independent Flow

光学相干断层扫描中基于轨迹对齐的时间无关流的测试时自适应

Veit Hucke, Thomas Pinetz, Gregor Reiter, Ursula Schmidt-Erfurth, Hrvoje Bogunović

发表机构 * Institute of Artificial Intelligence, Center for Medical Data Science, Medical University of Vienna, Austria(人工智能研究所、医学数据科学中心、维也纳医学大学,奥地利) Comprehensive Center for Artificial Intelligence in Medicine, Medical University of Vienna, Austria(医学人工智能综合中心、维也纳医学大学,奥地利) Department of Ophthalmology and Optometry, Medical University of Vienna, Austria(眼科与视光学部、维也纳医学大学,奥地利) Laboratory for Ophthalmic Image Analysis, Medical University of Vienna, Austria(眼科图像分析实验室、维也纳医学大学,奥地利)

AI总结 提出一种基于流匹配的测试时自适应方法,通过直方图匹配和去除时间条件,生成高质量替代图像,在AMD分割中达到最优性能。

Comments Accepted in MICCAI

详情
AI中文摘要

光学相干断层扫描(OCT)在眼科中至关重要,但图像质量不一致,尤其是在低成本设备中,阻碍了自动化分析。为了解决这个问题,我们引入了一种基于流匹配的测试时自适应方法,从噪声输入生成高质量替代图像。通常,测试数据和训练数据之间的域差距会导致去噪过程中像素分布不匹配。我们通过将测试图像的直方图与合成参考轨迹匹配来克服这一问题,成功地将输入与预期分布对齐。此外,我们移除了网络的时间条件,以考虑真实世界噪声分布的轻微偏差。我们的方法在分割年龄相关性黄斑变性(AMD)两个阶段的关键生物标志物方面达到了最先进的性能。代码地址:this https URL。

英文摘要

Optical coherence tomography (OCT) is essential in ophthalmology, but inconsistent image quality especially in low-cost devices hinders automated analysis. To address this, we introduce a flow-matching-based test-time adaptation method that generates high-quality surrogate images from noisy inputs. Typically, domain gaps between test and training data cause pixel distribution mismatches during the denoising process. We overcome this by matching the test image's histogram to synthetic reference trajectories, successfully aligning the input with expected distributions. Additionally, we remove the network's time conditioning to account for slight deviations in real-world noise distributions. Our approach achieves state-of-the-art performance in segmenting critical biomarkers for two stages of Age-related Macular Degeneration (AMD). Code is available: https://github.com/Veit21/tta-flow.

2606.18875 2026-06-18 cs.CL 新提交

Efficient Financial Language Understanding via Distillation with Synthetic Data

通过合成数据蒸馏实现高效金融语言理解

Wen-Fong, Huang, Edwin Simpson

发表机构 * School of Engineering Mathematics and Technology(工程数学与技术学院) University of Bristol(布里斯托大学)

AI总结 提出一种在低资源条件下通过合成数据蒸馏进行金融情感分析的框架,利用聚类种子选择生成代表性合成数据,使紧凑模型在少量标注下达到强性能,甚至在某些任务上超越教师模型。

Journal ref Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026), European Language Resources Association (ELRA), 2026, pp. 10242-10254

详情
AI中文摘要

大型指令跟随模型功能强大但部署成本高昂,尤其在金融领域,标注数据因保密性和专家标注成本而受限。我们提出一种通过合成数据蒸馏进行金融情感分析的高效框架,将知识从大型指令调优教师模型迁移到紧凑的学生模型。该框架专为低资源条件设计,其中收集并手工标注少量真实样本。框架随后对样本进行聚类,并利用聚类结果选择种子,通过结构化少样本提示生成合成样本。实验表明,基于聚类的种子选择比随机采样能生成更具代表性的合成数据,使紧凑模型在极少量监督下实现强性能。值得注意的是,在更复杂且噪声更多的文本领域,基于完整合成种子语料库训练的紧凑模型甚至优于教师模型,同时在正式文本上保持竞争力。该框架为金融NLP中资源高效的领域自适应提供了一条实用途径,且只需最少的人工标注工作。

英文摘要

Large instruction-following models are powerful but costly to deploy, particularly in finance, where labelled data are limited by confidentiality and expert annotation cost. We present an efficient framework for financial sentiment analysis through distillation with synthetic data, transferring knowledge from a large instruction-tuned teacher to compact student models. The framework is designed for low-resource conditions, where a small set of real examples are collected and labelled by hand. The framework then clusters the examples and uses the clusters to select seeds for generating synthetic examples via structured few-shot prompting. Experiments show that clustering-based seed selection yields more representative synthetic data than random sampling, enabling compact models to achieve strong performance with minimal supervision. Notably, on a more complex and noisy text domain, the compact model trained on the complete synthetic-seed corpus even outperforms the teacher model, while remaining competitive on formal text. The framework provides a practical route toward resource-efficient domain adaptation in financial NLP with minimal human labelling effort.

2606.18874 2026-06-18 cs.AI 新提交

Externalizing Research Synthesis and Validation in AI Scientists through a Research Harness

通过研究框架将AI科学家的研究综合与验证外部化

Zijian Wang, Hanqi Li, Ziyue Yang, Zijian Hu, Shenghan Zuo, Yunzhe Zhang, Da Ma, Danyu Luo, Chenrun Wang, Jing Peng, Tiancheng Huang, Sijia Guo, Huayang Wang, Zichen Zhu, Senyu Han, Yilu Cao, Kai Yu, Lu Chen

发表机构 * X-LANCE Lab, School of Computer Science, Shanghai Jiao Tong University, Shanghai, China(上海交通大学计算机学院X-LANCE实验室) Jiangsu Key Lab of Language Computing, Suzhou, China(江苏省语言计算重点实验室) Suzhou Laboratory, Suzhou, China(苏州实验室)

AI总结 提出Xcientist框架,将研究综合与实验验证外部化为可检查的合同驱动过程,解决自动研究中的声明漂移问题,并在多个领域验证其有效性。

Comments 65 pages, 14 figures, 19 tables

详情
AI中文摘要

AI系统日益能够自动化科学工作流程,但连接先前证据、生成的想法、实验和最终声明的推理通常仍然隐含在模型推理中。这里我们介绍Xcientist,一个研究框架,将研究综合和实验验证外部化为可检查的、合同驱动的过程。Xcientist将文献证据、想法状态、实施计划、消融记录和修复痕迹组织为持久的研究工件,使得生成的机制可以在不丢失其证据基础的情况下被基础化、执行、测试和修订。我们将声明漂移识别为自动化研究的一种失败模式,其中可运行的工件不再支持最初声称的机制。在无训练记忆系统、图结构交通预测和多尺度物理信息神经网络中,Xcientist保留了从问题公式化到机制设计、验证和有限修订的可追踪轨迹。这些结果表明,AI科学家不仅应根据其最终工件进行评估,还应看其综合和验证过程是否可归因、可检查且在科学上可问责。

英文摘要

AI systems can increasingly automate scientific workflows, but the reasoning that links prior evidence, generated ideas, experiments and final claims often remains implicit inside model inference. Here we introduce Xcientist, a research harness that externalizes research synthesis and experimental validation into inspectable, contract-governed processes. Xcientist organizes literature evidence, idea states, implementation plans, ablation records and repair traces as persistent research artifacts, so that generated mechanisms can be grounded, executed, tested and revised without losing their evidential basis. We identify claim drift as a failure mode of automated research, where runnable artifacts no longer support the mechanism originally claimed. Across training-free memory systems, graph-structured traffic forecasting and multi-scale physics-informed neural networks, Xcientist preserves traceable trajectories from problem formulation to mechanism design, validation and bounded revision. These results suggest that AI scientists should be evaluated not only by their final artifacts, but by whether their synthesis and validation processes remain attributable, inspectable and scientifically accountable.

2606.18872 2026-06-18 cs.CV 新提交

Bridging Single Distortion Artifacts and Mmultifactorial Clinical Quality: Few-shot Biparametric MRI Quality Assessment via Distortion-trained Prototypical Networks

桥接单一失真伪影与多因素临床质量:基于失真训练的原型网络的少样本双参数MRI质量评估

Yuheng Tang, Alexander Ng, Wen Yan, Natasha Thorley, Pawel Rajwa, Yipei Wang, Aqua Asif, Clare Allen, Louise Dickinson, Francesco Giganti, Shonit Punwani, Daniel Alexander, Veeru Kasivisvanathan, Yipeng Hu

发表机构 * UCL Hawkes Institute(UCL Hawkes研究所) Department of Medical Physics and Biomedical Engineering(医学物理与生物医学工程系) University College London(伦敦大学学院) Division of Surgery and Interventional Science(外科与介入科学分会) Centre for Medical Imaging(医学成像中心) British Urology Researchers in Surgical Training (BURST)(英国泌尿外科手术培训研究人员(BURST)) Department of Radiology(放射科) University College London Hospitals NHS Foundation Trust(伦敦大学学院医院国家健康服务信托基金) Centre of Medical Imaging, Division of Medicine(医学成像中心,医学分会) Centre for Medical Image Computing(医学图像计算中心) Department of Computer Science(计算机科学系) Department of Urology(泌尿科)

AI总结 提出一种少样本双参数原型网络,利用失真标签元训练,通过特征融合和域对齐,仅用5个样本即可预测PI-QUAL临床质量评分,解决临床数据稀缺问题。

详情
AI中文摘要

临床前列腺多参数MRI高度依赖高质量扩散加权成像(DWI),但DWI读图常因几何失真(通常由直肠气体引起)而受损。通过PI-QUAL评分系统评估质量是新兴的临床标准,但该方法主观、耗时,且存在类别不平衡问题,其中低质量病例多样且相对稀少。以PRIME临床试验为例,6%的图像PI-QUAL评分低于4,87%的DWI问题源于失真,许多其他临床质量问题代表性不足。为解决这种标注临床数据的双重稀缺性,我们提出了一种用于自动图像质量评估(IQA)的少样本双参数原型网络。我们的框架利用双分支3D ResNet融合T2加权和DWI特征,提供解剖背景以区分真实形态与失真。为处理现实异质性,我们引入特征级线性调制(FiLM)和梯度反转层(GRL),以对齐基于不同b值的特征分布,同时抑制采集相关偏差。我们证明,仅基于相对客观、易于获取的失真标签进行元训练的模型,能够仅使用五个代表性样本有效适应预测复杂的多因素临床质量评分(如PI-QUAL)。在两个数据集上的实验结果表明,我们的方法在此具有挑战性的IQA任务中显著优于少样本学习基线,为临床工作流程中标准化前列腺MRI质量控制提供了实际可行且数据高效的解决方案。

英文摘要

Clinical prostate multi-parametric MRI relies heavily on high-quality diffusion-weighted imaging (DWI), yet reading DWI is frequently compromised by geometric distortion, often caused by rectal air. Assessing quality via the PI-QUAL scoring system is an emerging clinical standard, but it is subjective, time-consuming and suffers from a class imbalance where low-quality cases are diverse and relatively scarce. Using the PRIME clinical trial as an example, there are $6\%$ images with PI-QUAL scores lower than 4, $87\%$ of DWI issues are due to distortion. Many of the other clinical quality issues are under-represented. To address this common dual-scarcity of annotated clinical data, we propose a few-shot biparametric prototypical network for automated image quality assessment (IQA). Our framework utilizes a dual-branch 3D ResNet to fuse T2-weighted and DWI features, providing anatomical context to distinguish true morphology from distortion. To handle real-world heterogeneity, we introduce feature-wise linear modulation (FiLM) and a gradient reversal layer (GRL) to align feature distributions conditioned on varying b-values while suppressing acquisition-related biases. We demonstrate that a model meta-trained solely on comparatively objective, readily obtainable distortion labels can effectively adapt to predicting complex, multi-factorial clinical quality scores such as PI-QUAL using only five representative samples. Experimental results on two datasets show that our method significantly outperforms few-shot learning baselines for this challenging IQA task, offering a practically feasible and data-efficient solution for standardizing prostate MRI quality control in clinical workflows.

2606.18869 2026-06-18 cs.CV 新提交

Learning to Distort: Weakly-Supervised Image Quality Transfer for Prostate DWI Correction

学习扭曲:用于前列腺DWI校正的弱监督图像质量迁移

YuCheng Tang, Wen Yan, Alexander Ng, Natasha Thorley, Pawel Rajwa, Yipei Wang, Aqua Asif, Clare Allen, Louise Dickinson, Francesco Giganti, David Atkinson, Shonit Punwani, Daniel Alexander, Shaheer Ullah Saeed, Veeru Kasivisvanathan, Yipeng Hu

发表机构 * UCL Hawkes Institute(UCL哈维斯研究所) Department of Medical Physics and Biomedical Engineering(医学物理与生物医学工程系) University College London(伦敦大学学院) Division of Surgery and Interventional Science(外科与介入科学分会) Centre for Medical Imaging(医学成像中心) British Urology Researchers in Surgical Training (BURST)(英国泌尿外科手术培训研究人员(BURST)) Department of Radiology(放射科) University College London Hospitals NHS Foundation Trust(伦敦大学学院医院国家健康服务信托基金) Centre for Medical Image Computing(医学图像计算中心) Department of Computer Science(计算机科学系) Department of Urology(泌尿科)

AI总结 提出弱监督图像质量迁移框架,利用图像质量评估信号从无失真图像学习生成真实失真,并训练校正模型,在PI-RADS和Gleason评分分类任务中优于现有无配对方法。

详情
AI中文摘要

单次激发平面回波前列腺弥散加权成像(DWI)常因几何失真而复杂化,影响从这些图像中获得可靠诊断的能力。开发自动化校正方法面临缺乏配对的失真和未失真临床扫描的挑战。本文首先提出一种新颖的弱监督图像质量迁移(IQT)框架,从无失真图像到失真图像,利用图像质量评估(IQA)信号监督迁移过程。与传统方法需要昂贵的体素级配对数据或采用无配对算法不同,我们的方法利用图像级质量标签(此处为失真与无失真)在预训练特征空间中建立潜在质量原型。认识到模拟真实失真比直接无配对校正更可靠,我们描述了一种弱监督原型流匹配算法,显式正则化生成轨迹朝向失真原型,产生模拟临床退化的真实磁敏感伪影。通过合成这些真实配对,我们能够训练第二个IQT模型进行正向失真校正。实验结果表明,我们生成的图像成功模拟了真实伪影的诊断干扰,从而产生更强大的失真校正IQT模型。除定性比较外,我们还通过评估临床下游任务性能(PI-RADS和Gleason评分分类),使用分布内和外部数据集,将我们的方法与现有无配对方法(如CycleGAN、UNIT-DDPM和OT-FM)作为正向或反向替代方案进行详尽的定量评估。

英文摘要

Single-shot echo-planar prostate diffusion-weighted imaging (DWI) is frequently complicated by geometric distortions, which impact the ability to derive reliable diagnoses from such images. Developing automated correction methods is challenged by the absence of paired distorted and undistorted clinical scans. In this paper, we first propose a novel weakly-supervised image quality transfer (IQT) framework from undistorted to distorted images that utilizes image quality assessment (IQA) signals to supervise the transfer process. Unlike traditional methods that require expensive, voxel-wise paired data or resort to developing unpaired algorithms, our approach utilizes image-level quality labels (here, distorted vs. undistorted) to establish latent quality prototypes within a pre-trained feature space. Recognizing that simulating realistic distortions is more reliable than direct unpaired correction, we describe a weakly-supervised prototype flow matching algorithm to explicitly regularize generative trajectories towards distorted prototypes, producing realistic susceptibility artifacts that mimic clinical degradations. By synthesizing these realistic pairs, we enable a second IQT model to be trained in the forward direction for distortion correction. Experimental results demonstrate that our generated images successfully mimic the diagnostic interference of real-world artifacts, which leads to more capable distortion correction IQT models. In addition to qualitative comparisons, we also conduct exhaustive quantitative evaluations that compare our approach with existing unpaired approaches (e.g., CycleGAN, UNIT-DDPM, and OT-FM) - as either forward or reverse alternatives - by assessing clinical downstream task performance in PI-RADS and Gleason score classification, using both in-distribution and external data sets.

2606.18867 2026-06-18 cs.LG cs.CY stat.ML 新提交

Strategic Feature Selection

战略特征选择

Jivat Neet Kaur, Pratik Patil, Divya Shanmugam, Emma Pierson, Michael I. Jordan, Nika Haghtalab, Meena Jagadeesan, Ahmed Alaa, Serena Wang

发表机构 * University of California, Berkeley(加州大学伯克利分校) University of Texas, Austin(德克萨斯大学奥斯汀分校) Cornell Tech(康奈尔科技) Stanford University(斯坦福大学) University of Pennsylvania(宾夕法尼亚大学) Harvard University(哈佛大学) Inria, Paris(巴黎Inria)

AI总结 研究通过特征选择和岭正则化应对战略操纵的分类问题,发现仅基于可操纵性排除特征通常次优,提出联合优化特征集与正则化水平的算法,并在医疗支付基准上验证。

详情
AI中文摘要

当算法预测器在高风险领域(如医疗)中指导资源分配时,这些预测器必须考虑输入特征的战略操纵。典型的解决方案是重新设计预测器本身以明确考虑战略互动。然而在实践中,决策者通常受限于调整现有预测管道中的粗粒度杠杆。例如,医疗组织通常根据感知的可操纵性选择排除哪些特征,同时使用标准正则化程序来收缩保留特征的系数。在这项工作中,我们通过特征选择及其与岭正则化的相互作用,启动了对战略分类的形式化研究。我们的主要发现是,仅基于可操纵性排除单个特征通常是次优的。我们提供了在最优正则化下特征子集性能的细粒度刻画,为政策设计提供了新的见解。受此刻画启发,我们开发了一种实用算法,用于联合选择特征集和岭正则化水平。通过一个关于医疗支付基准的真实世界案例研究,我们说明了我们的算法如何指导实践中粗粒度政策杠杆的设计。我们的结果为减轻算法决策系统中战略行为的影响提供了一个有原则的、实用的框架。

英文摘要

When algorithmic predictors inform resource allocation in high-stakes domains such as healthcare, these predictors must account for strategic manipulation of input features. The typical solution is to redesign the predictor itself to explicitly account for strategic interactions. In practice, however, decision makers are often constrained to adjusting coarser levers within existing prediction pipelines. For example, healthcare organizations often select which features to exclude based on perceived manipulability, while using standard regularization procedures to shrink the coefficients of retained features. In this work, we initiate a formal study of strategic classification through feature selection and its interaction with ridge regularization. Our main finding is that excluding individual features based on their manipulability alone is generally suboptimal. We provide a fine-grained characterization of the performance of a feature subset under optimal regularization, yielding new insights for policy design. Motivated by this characterization, we develop a practical algorithm for jointly choosing the feature set and the level of ridge regularization. Through a real-world case study on a healthcare payments benchmark, we illustrate how our algorithm can guide the design of coarse policy levers in practice. Our results provide a principled, practical framework for mitigating the effects of strategic behavior in algorithmic decision-making systems.

2606.18864 2026-06-18 cs.LG cs.AI 新提交

Scaling Learning-based AEB with Massive Unlabeled Data

基于大规模无标签数据的可扩展学习型自动紧急制动

Xiangyu Wang, Yang Zhan, Mengxiang Hao, Chuanchuan Zhong, Yansong Jia, Junjie Zhang, Yu Han, Xin Jiang, Zhen Cao, Ying Wang, Yulun Song, Zhitao Xu

发表机构 * Li Auto

AI总结 提出稳定元反馈半监督学习框架,通过噪声感知解耦和运动学门控伪标签,利用大规模无标签数据提升自动紧急制动性能,实现超100:1正误触发比和35%无事故里程提升。

Comments Accepted for presentation at the 2026 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

详情
AI中文摘要

本文研究如何在生产约束下,利用大规模无标签车队数据扩展基于学习的自动紧急制动(AEB)。我们的方法基于元反馈半监督学习(MF-SSL),其中教师模型为无标签驾驶数据生成伪标签,并使用小型有标签锚定集作为安全关键反馈进行更新。在生产中,锚定歧义和有标签-无标签不匹配会放大系统性的伪标签错误,导致误触发。我们提出了一种稳定的MF-SSL框架,包括:(i) 噪声感知解耦,从教师监督更新路径中移除易产生歧义的锚定;(ii) 运动学门控伪标签,结合教师冲突惩罚,抑制无标签数据上由不匹配引起的风险幻觉,同时保持广泛覆盖。大量实验表明,随着无标签数据从1M扩展到1B窗口,模型性能持续提升,在保持舒适性的同时提高了安全性。经过1B数据训练的学生模型已部署到数十万辆车辆上,并在超过10^9公里的行驶中得到验证,实现了超过100:1的正误触发比,且相比仅基于规则的基线,无事故行驶里程提升了35%。

英文摘要

This paper studies how to scale learning-based automatic emergency braking (AEB) with massive unlabeled fleet data under production constraints. Our approach is based on meta-feedback semi-supervised learning (MF-SSL), where a teacher generates pseudo labels for unlabeled driving data and is updated using a small labeled anchor set as safety-critical feedback. In production, anchor ambiguity and labeled-unlabeled mismatch can amplify systematic pseudo-label errors, leading to spurious triggers. We propose a stabilized MF-SSL framework with (i) Noise-Aware Decoupling, which removes ambiguity-prone anchors from the teacher's supervised update path, and (ii) kinematics-gated pseudo-labeling with a teacher conflict penalty to suppress mismatch-induced risk hallucinations on unlabeled data while maintaining broad coverage. Extensive experiments show consistent gains as unlabeled data scale from 1M to 1B windows, improving safety while keeping comfort stable. The 1B-trained student model is deployed to hundreds of thousands of vehicles and validated over \$10^9$ km of driving, achieving a positive-to-false activation ratio exceeding 100:1 and a 35% improvement in accident-free driving mileage over a production rule-only baseline.

2606.18861 2026-06-18 cs.CV cs.AI 新提交

URDF Synthesis from RGB-D Sequences via Differentiable Joint Inference and Energy-Consistent Verification

基于可微联合推理与能量一致性验证的RGB-D序列URDF合成

Xinze Zhang

发表机构 * University of Southern California(南加州大学)

AI总结 提出KinemaForge管道,通过可微关节推理和能量一致性验证,从RGB-D序列联合估计部件形状、关节拓扑和参数,显著降低关节轴误差和仿真漂移。

详情
AI中文摘要

从传感器观测重建可仿真的铰接物体数字孪生仍受两个持续存在的差距制约:(i) 部件级几何重建与运动学参数估计分离,(ii) 恢复的模型常违反能量守恒等基本动态不变量,导致URDF在物理仿真器中重放时出现漂移。我们提出KinemaForge,一种约束驱动管道,从短RGB-D序列联合推断部件级形状、关节拓扑和关节参数,并通过基于可微刚体动力学构建的能量一致性验证器验证结果。该管道引入三个组件:将关节-部件关联编码为软边的运动学约束图;通过Featherstone铰接体算法从渲染观测反向传播到关节参数的可微螺旋轴求解器;以及惩罚重建模型非物理自由响应的能量残差损失。在五个PartNet-Mobility类别和一个内部RGB-D基准上,KinemaForge将平均关节轴误差从最强几何基线(PARIS)的4.52度降至2.83度(-37.4%),从基于交互的Ditto基线的5.30度降至2.83度(-46.6%),在50秒滚动中长时仿真漂移比PARIS降低64%,初步评估中闭环操作成功率比Ditto提高14.6个百分点。代码和重建数据将在接收后发布。

英文摘要

Reconstructing simulation-ready digital twins of articulated objects from sensor observations remains constrained by two persistent gaps: (i) part-level geometric reconstruction is decoupled from kinematic-parameter estimation, and (ii) the recovered models often violate basic dynamic invariants such as energy conservation, leading to drift when the URDF is replayed in physics simulators. We present KinemaForge, a constraint-driven pipeline that jointly infers part-level shape, joint topology, and joint parameters from short RGB-D sequences and validates the result against an energy-consistent verifier built on differentiable rigid-body dynamics. The pipeline introduces three components: a kinematic constraint graph that encodes joint-part incidences as soft edges; a differentiable screw-axis solver that backpropagates from rendered observations through Featherstone's articulated-body algorithm to joint parameters; and an energy residual loss that penalises non-physical free responses of the reconstructed model. Across five PartNet-Mobility categories and an internal RGB-D benchmark, KinemaForge reduces the average joint-axis error from 4.52 degrees to 2.83 degrees (-37.4%) over the strongest geometric baseline (PARIS) and from 5.30 degrees to 2.83 degrees (-46.6%) over the interaction-based Ditto baseline, lowers long-horizon simulation drift by 64% (vs. PARIS) over 50 s rollouts, and yields URDFs whose closed-loop manipulation success rate improves by 14.6 percentage points over Ditto in our preliminary evaluation. Code and reconstruction data will be released upon acceptance.

2606.18860 2026-06-18 cs.CV cs.LG 新提交

Quantification of Uncertainty with Adversarial Models in Medical Image Segmentation

医学图像分割中对抗模型的不确定性量化

Hana Jebril, Thomas Pinetz, Günter Klambauer, Hrvoje Bogunović

发表机构 * Institute of Artificial Intelligence, Center for Medical Data Science, Medical University of Vienna, Austria(人工智能研究所、医学数据科学中心、维也纳医学大学,奥地利) Comprehensive Center for AI in Medicine, Medical University of Vienna, Austria(医学人工智能综合中心、维也纳医学大学,奥地利) ELLIS Unit Linz, LIT AI Lab and Institute for Machine Learning, Johannes Kepler University Linz, Austria(林茨ELLIS单位、LIT人工智能实验室和机器学习研究所、林茨约瑟夫·冯·克拉夫特大学,奥地利) Institute for Machine Learning, Johannes Kepler University Linz, Austria(机器学习研究所、林茨约瑟夫·冯·克拉夫特大学,奥地利) Clinical Research Center for Medical AI, Johannes Kepler University Linz, Austria(医学人工智能临床研究中心、林茨约瑟夫·冯·克拉夫特大学,奥地利)

AI总结 提出QUAM-SM后处理框架,通过针对性对抗搜索识别脆弱像素,量化不确定性并分离认知与偶然不确定性,在公开数据集上优于现有方法。

Comments Accepted at MICCAI 2026

详情
AI中文摘要

可靠的像素级不确定性量化具有通过实现高保真纵向监测和区分真实病理变化与伪影来改变临床工作流程的潜力。理想情况下,这些模型提供关键治疗计划和手术干预所需的稳定性。然而,标准深度学习模型常常遭受校准不良,产生过度自信的预测,掩盖了微妙病理边界处的潜在脆弱性。为了解决这个问题,我们提出了QUAM-SM,一种使用针对性对抗搜索来识别“对抗脆弱”像素的后处理框架。通过主动寻找暴露预测不稳定性的扰动,我们的方法突出了决策最容易被翻转的区域。重要的是,该框架将认知不确定性与偶然不确定性分离。在两个具有多个专家标注的公开数据集上的实验表明,QUAM-SM在可靠性和边界敏感性方面优于标准和最新的不确定性估计方法。代码可在以下网址获取:https://this https URL

英文摘要

Reliable pixel-level uncertainty quantification holds the potential to transform clinical workflows by enabling high-fidelity longitudinal monitoring and distinguishing true pathological changes from artifacts. Ideally, these models provide the stability required for critical treatment planning and surgical intervention. However, standard deep learning models often suffer from miscalibration, yielding overconfident predictions that mask underlying vulnerabilities at subtle pathological boundaries. To address this, we propose QUAM-SM, a post-hoc framework using targeted adversarial search to identify "adversarially fragile" pixels. By actively seeking perturbations that expose predictive instability, our method highlights regions where decisions are most vulnerable to being flipped. Importantly, the framework disentangles epistemic uncertainty from aleatoric uncertainty. Experiments on two public datasets with multiple expert annotations demonstrate that QUAM-SM outperforms both standard and recent uncertainty estimation approaches in terms of reliability and boundary sensitivity. Code is available at https://github.com/HanaJebril/quam_sm

2606.18856 2026-06-18 cs.CL cs.LG 新提交

Approximate Structured Diffusion for Sequence Labelling

近似结构化扩散用于序列标注

Nicolas Floquet, Joseph Le Roux, Nadi Tomeh

发表机构 * Université Sorbonne Paris Nord, CNRS, Laboratoire d’Informatique de Paris Nord, LIPN(巴黎北大学 Sorbonne、法国国家科学研究中心、巴黎北信息学实验室、LIPN)

AI总结 提出一种基于扩散的条件随机场(CRF)训练方法,通过引入标签噪声条件来捕捉长距离依赖,结合近似推理在词性标注任务上实现16.5%的错误率降低。

详情
AI中文摘要

序列标注是自然语言处理(NLP)的核心任务,涉及为输入句子的每个标记分配一个标签。从机器学习的角度来看,序列标注通常被建模为由神经网络参数化的线性链条件随机场(CRF)。虽然这种方法在经验上取得了良好结果,但CRF假设有限的决策跨度(例如标签二元组),这可能会限制其表达能力,并在需要长距离依赖时损害性能。我们证明可以利用扩散来训练一个以整个标签序列为条件的CRF,但条件是标签的噪声版本。实验表明,该方法结合近似CRF推理,在词性标注任务上实现了16.5%的错误率降低,提高了标签准确性。

英文摘要

Sequence labelling, a core task of Natural Language Processing (NLP), consists in assigning each token of an input sentence a label. From a Machine Learning point of view, sequence labelling is often cast as a Linear-Chain Conditional Random Field (CRF) parametrised by a neural network. While this approach gives good empirical results, CRFs assume a finite decision span (eg label bigrams) which can limit their expressivity and hurt performance when long-range dependencies are required. We show we can leverage diffusion to train a CRF conditioned on an entire label sequence, with the caveat that the condition is on a noisy version of labels. We show experimentally that this method, in conjunction with approximate CRF inference, improves label accuracy with a 16.5% error reduction for POS-tagging.

2606.18852 2026-06-18 cs.CL cs.AI 新提交

Aligning Implied Statements for Implicit Hate Speech Generalizability with Context-Bounded Semi-hard Negative Mining

对齐隐含陈述:通过上下文边界半硬负挖掘实现隐式仇恨言论的泛化性

Wicaksono Leksono Muhamad, Yunita Sari

发表机构 * Mantera Studio(Mantera工作室) Universitas Gadjah Mada(加雅玛大学)

AI总结 提出ImpSH三元组框架,通过将帖子与隐含陈述对齐并使用上下文边界半硬负样本聚焦学习,提升隐式仇恨言论的跨域泛化能力,在多个数据集上优于对比基线。

详情
AI中文摘要

隐式仇恨言论分类仍然是一个挑战,因为意图通常通过暗示和上下文而非明确辱骂来掩盖。先前的监督对比方法改进了域内检测,但可能过拟合表面线索,且难以跨数据集迁移。我们提出ImpSH,一个基于三元组的框架,当隐含陈述可用时将其与帖子对齐,并使用上下文边界半硬负样本将学习聚焦于近混淆项。我们还研究了AugSH,它通过数据增强形成正样本。在使用BERT和HateBERT对IHC、SBIC和DynaHate进行的受控评估中,ImpSH是标准监督对比基线的可行替代方案,并且在匹配的预处理和调优预算下通常能提高跨域性能。使用对齐性和均匀性进行的表示分析表明,正样本对更紧密且全局分布平衡,定性最近邻案例研究展示了域转移下的典型假负例。这些结果表明,通过上下文边界挖掘将帖子与其隐含陈述对齐,提供了到相关暗示的更稳定、类似双射的映射,克服了传统基于聚类的表示学习固有的波动性。

英文摘要

Classifying implicit hate speech remains a challenge, as intent is often masked through insinuation and context rather than explicit slurs. Prior supervised contrastive approaches improve in-domain detection but can overfit surface cues and struggle to transfer across datasets. We propose ImpSH, a triplet-based framework that aligns posts with implied statements when available and uses context-bounded semi-hard negatives to focus learning on near confusions. We also examine AugSH, which forms positives via data augmentation. In controlled evaluations on IHC, SBIC, and DynaHate with BERT and HateBERT, ImpSH is a viable alternative to standard supervised contrastive baselines and often improves cross-domain performance under matched preprocessing and tuning budgets. Representation analysis using alignment and uniformity indicates tighter positive pairs with balanced global spread, and qualitative nearest-neighbor case studies illustrate typical false negatives under domain shift. These results demonstrate that aligning posts with their implied statements via context-bounded mining provides a more stable, bijective-like mapping to related insinuations, overcoming the volatility inherent in traditional clustering-based representation learning.

2606.18850 2026-06-18 cs.CL cs.IR 新提交

ScholarSum: Student-Teacher Abstractive Summarization via Knowledge Graph Reasoning and Reflective Refinement

ScholarSum:基于知识图谱推理与反思性精炼的师生式抽象摘要生成

Bohou Zhang, Xiaoyu Tao, Mingyue Cheng, Huijie Liu, Qi Liu

发表机构 * State Key Laboratory of Cognitive Intelligence(认知智能国家重点实验室)

AI总结 提出ScholarSum框架,通过构建层次知识图谱引导学生生成初稿,并利用教师式审阅者迭代检查与修正,实现科学文献摘要的流畅性与事实一致性。

详情
AI中文摘要

抽象摘要生成在实现科学文献高效理解中起着关键作用,但它本质上要求同时具备语言流畅性和事实忠实性。现有方法往往难以协调这两个要求。抽取式方法依赖僵硬的句子拼接,破坏了宏观层面的逻辑连贯性;而基于大语言模型的生成式方法尽管掌握了语言流畅性,但事实一致性有限。在这项工作中,我们提出了ScholarSum,一个层次化反思性图框架,模拟师生写作过程以实现流畅且忠实的科学摘要生成。ScholarSum首先通过将文档分割成语义连贯的单元,组织成层次知识图谱,其多层社区结构捕获全局逻辑和宏观主题。在该全局结构引导下,学生生成初稿,随后通过细粒度证据检索进行精炼。为确保事实一致性,教师式审阅者迭代检查初稿,识别不支持的内容,并触发有针对性的重新检索和重写,直到摘要达到严格的质量标准。大量实验表明,ScholarSum在完整性和忠实性方面显著优于之前的基线方法。我们的代码可在该https URL获取。

英文摘要

Abstractive summarization plays a crucial role in enabling efficient understanding of scientific literature, yet it inherently demands both linguistic fluency and factual faithfulness. Existing approaches often fail to reconcile these two requirements. Extractive methods rely on rigid sentence splicing that disrupts macro-level logical coherence, while large language model (LLM)-based generative approaches, despite mastering linguistic fluency, exhibit limited factual consistency. In this work, we propose ScholarSum, a hierarchical reflective graph-based framework that emulates a student-teacher writing process for fluent and faithful scientific summarization. ScholarSum first organizes the document into a hierarchical knowledge graph by segmenting it into semantically coherent units, whose multi-layered community structure captures global logic and macro-level themes. Guided by this global structure, the student generates an initial draft, which is subsequently refined through fine-grained evidence retrieval. To ensure factual consistency, a teacher-like reviewer then iteratively examines the draft, identifies unsupported content, and prompts targeted re-retrieval and rewriting until the summary meets rigorous quality standards. Extensive experiments demonstrate that ScholarSum significantly outperforms previous baselines in terms of both completeness and faithfulness. Our code is available at https://github.com/Xiaoyu-Tao/ScholarSum.

2606.18847 2026-06-18 cs.AI 新提交

WorldLines: Benchmarking and Modeling Long-Horizon Stateful Embodied Agents

WorldLines: 对长时域有状态具身智能体进行基准测试与建模

Yehang Zhang, Jianchong Su, Haojian Huang, Yifan Chang, Tianhao Zhou, Xinli Xu, Yingjie Xu, Yinchuan Li, Zexi Li, Ying-Cong Chen

发表机构 * HKUST(GZ)(香港科技大学(广州)) HKUST(香港科技大学) Knowin

AI总结 提出WorldLines基准,通过构建带时间跨度的家庭轨迹(含对话、动作、状态变化等)评估具身智能体的长时记忆与任务规划能力,并设计ObsMem记忆框架提升状态感知决策。

Comments 27 pages, 18 figures

详情
AI中文摘要

为了在真实家庭环境中长时间协助人类,具身智能体必须记住用户习惯、世界状态和过去的交互。现有的长期记忆基准主要评估以语言为中心的检索和问答,而具身基准通常关注短时域任务执行,未测试在动态环境中长期记忆的使用。我们引入WorldLines,一个项目驱动的长时域具身家庭辅助基准。它构建了带时间跨度的家庭轨迹,包含对话、动作、执行反馈、物体和设备状态变化,并将其转换为带有证据链接的样本,用于记忆问答和具身任务规划。我们进一步提出ObsMem,一个观察者锚定的记忆框架,维护可见性感知的记忆和动作原生状态轨迹,以实现状态感知的决策。实验揭示了在部分可观测性、被覆盖的世界状态以及将长期记忆转化为具身规划方面的持续挑战,而ObsMem为此场景提供了更强的参考架构。

英文摘要

To assist humans over extended periods in real homes, embodied agents must remember user routines, world states, and past interactions. Existing long-term memory benchmarks mainly evaluate language-centric retrieval and question answering, while embodied benchmarks often focus on short-horizon task execution without testing long-term memory use in dynamic environments. We introduce WorldLines, a project-driven benchmark for long-horizon embodied household assistance. It constructs temporally extended household traces with dialogues, actions, execution feedback, object and device state changes, and converts them into evidence-linked samples for Memory QA and Embodied Task Planning. We further propose ObsMem, an observer-grounded memory framework that maintains visibility-aware memories and action-native state trails for state-aware decisions. Experiments reveal persistent challenges in partial observability, overwritten world states, and translating long-term memory into embodied plans, while ObsMem offers a stronger reference architecture for this setting.

2606.18846 2026-06-18 cs.CV 新提交

From Bounding Boxes to Visual Reasoning: An On-Policy Data Annotation Tool for Vision-Language Models

从边界框到视觉推理:一种用于视觉语言模型的在线策略数据标注工具

Like Zhang, Runliang Niu, Shiqi Wang, Xiyu Hu, Qianli Xing, Pan Wang, Qingzu He, Qi Wang

发表机构 * School of Artificial Intelligence, Jilin University(吉林大学人工智能学院) College of Computer Science, Jilin University(吉林大学计算机科学与技术学院) OPPO

AI总结 提出ScreenAnnotator,通过统一标注原子模式、在线策略循环与贝叶斯验证器,解决现有工具表达力不足、标注-训练脱节和数据复用性差的问题,实现高效多任务数据生成。

Comments 14 pages, 7 figures

详情
AI中文摘要

视觉语言模型(VLM)正快速向复杂的基于基础的结构化视觉推理发展。训练具备此类高级能力的模型需要一种新型数据,该数据能将空间坐标、开放词汇描述、结构化属性和拓扑关系无缝统一为单一表示。然而,现有数据标注工具从根本上无法满足这些复杂需求,存在三个系统性瓶颈:表达力有限、严重的标注-训练解耦以及数据复用性差。为弥补这一基础设施差距,我们引入了一个开源标注工具ScreenAnnotator。首先,我们定义了一个统一的标注原子模式,将空间、语义和结构基元绑定为单个单元。其次,我们实现了一个嵌入贝叶斯标注验证器(BAV)的在线策略标注循环。最后,我们设计了一个模板驱动的多任务数据合成过程,动态地将静态原子转化为多样化的多维推理任务,消除了冗余的重新标注。在线策略循环将流程图上的标注接受率提升至近100%,GUI截图上的接受率达到77%,同时随着标注数据的积累,每张图像的标注时间稳步减少。在流程图场景中,微调VLM的平均准确率达到76.1%,绝对提升了35.1个百分点。我们的代码可在以下网址获取:this https URL。

英文摘要

Vision-language models (VLMs) are rapidly advancing toward sophisticated grounded structured visual reasoning. Training models for such advanced capabilities demands a new genre of data that seamlessly unifies spatial coordinates, open-vocabulary descriptions, structured attributes, and topological relationships into a singular representation. However, existing data annotation tools fundamentally fail to meet these intricate demands, suffering from three systematic bottlenecks: limited expressiveness, severe annotation-training decoupling, and poor data reusability. To bridge this infrastructure gap, we introduce an open-source annotation tool, ScreenAnnotator. First, we define a unified annotation atom schema that binds spatial, semantic, and structural primitives into a single unit. Second, we implement an on-policy annotation loop embedded with a Bayesian Annotation Verifier (BAV). Finally, we design a template-driven multi-task data synthesis process dynamically transforms static atoms into diverse multi-dimensional reasoning tasks, eliminating redundant re-annotation. The on-policy loop drives the annotation accept rate to nearly 100% on flowcharts and 77% on GUI screenshots, while steadily reducing per-image annotation time as labeled data accumulate. In the flowchart scenario, fine-tuning a VLM yields 76.1% average accuracy, which is a 35.1% point absolute gain. Our code is available at: https://github.com/WnQinm/Annotator.

2606.18844 2026-06-18 cs.LG 新提交

Learning from Your Own Mistakes: Constructing Learnable Micro-Reflective Trajectories for Self-Distillation

从自身错误中学习:为自蒸馏构建可学习的微反思轨迹

Zhilin Huang, Hang Gao, Ziqiang Dong, Yuan Chen, Yifeng Luo, Chujun Qin, Jingyi Wang, Yang Yang, Guanjun Jiang

发表机构 * Qwen Business Unit of Alibaba(阿里巴巴通义千问事业部) Tsinghua University(清华大学) Peking University(北京大学)

AI总结 提出TAPO方法,通过对比正确与错误轨迹构建微反思修正,实现从隐式分布对齐到显式轨迹构建的自蒸馏改进,在多个数学推理基准上优于GRPO。

详情
AI中文摘要

自蒸馏通过使用模型自身的生成作为训练信号来改进大型语言模型的推理能力,通常通过隐式的logit级对齐来实现,最小化与特权目标分布的KL散度。然而,由于这种监督是通过无控制采样生成的,它无法提供关于模型特定错误的诊断性洞察,也无法针对其个体失败模式提供纠正性指导。因此,模型学习的是模仿特权分布,而不是接收精确指出其推理失败位置和原因的细粒度修正。在本文中,我们提出了轨迹增强策略优化(TAPO),将自蒸馏从隐式分布对齐推进到显式轨迹构建。在强化学习训练期间,模型对同一查询同时产生正确和错误的生成轨迹,TAPO利用这种对比结构来构建微反思修正——新的训练轨迹,保留模型在失败点之前的错误推理,然后插入自然语言诊断和由同一采样组中的正确参考引导的修正推理。由于每条轨迹都锚定在学习者自身的前缀和解决方案上,与基于KL的方法施加的位置级对齐相比,修正信号在更大程度上保留了模型的在策略分布。为了整合这些轨迹,TAPO在模型能力边界引入了难度感知的候选选择,并采用解耦优势估计以防止梯度污染。在AIME 2024、AIME 2025和HMMT 2025上的实验表明,在相同训练步数下,TAPO相比GRPO取得了一致的改进。进一步分析表明,TAPO增强了首次推理和错误纠正的有效性。

英文摘要

Self-distillation improves reasoning in large language models by using the model's own rollouts as training signal, typically through implicit logit-level alignment that minimizes KL divergence toward a privileged target distribution. However, because this supervision is generated via uncontrolled sampling, it provides no diagnostic insight into the model's specific errors or corrective guidance for its individual failure patterns. Consequently, the model learns to imitate a privileged distribution rather than receiving fine-grained corrections that pinpoint where and why its reasoning fails. In this paper, we propose Trajectory-Augmented Policy Optimization (TAPO), which advances self-distillation from implicit distributional alignment to explicit trajectory construction. During RL training, the model produces both correct and incorrect rollouts to the same query, and TAPO leverages this contrastive structure to construct micro-reflective corrections, new training trajectories that retain the model's erroneous reasoning up to the point of failure, then insert a natural-language diagnosis and corrected reasoning guided by a correct reference from the same sampling group. Since each trajectory is anchored in the learner's own prefix and solutions, the corrective signal preserves the model's on-policy distribution to a greater extent than the position-wise alignment imposed by KL-based methods. To integrate these trajectories, TAPO introduces difficulty-aware candidate selection at the model's capability boundary and decoupled advantage estimation to prevent gradient contamination. Experiments on AIME 2024, AIME 2025, and HMMT 2025 show that TAPO achieves consistent improvements over GRPO under the same number of training steps. Further analysis demonstrates that TAPO strengthens both first-pass reasoning and error-correction effectiveness.

2606.18841 2026-06-18 cs.CV 新提交

Rethinking Air-Ground Collaboration: A Progressive Cross-Task Benchmark and Socialized Learning Framework

重新思考空地协作:渐进式跨任务基准与社会化学习框架

Zhoupeng Guo, Yunqi Zhu, Zhihe Fan, Xinjie Yao, Ruipu Zhao, Boan Tao, Yiming Sun, Zhen Wang, Pengfei Zhu

发表机构 * School of Automation, Southeast University(东南大学自动化学院) School of Computer Science and Engineering, University of New South Wales(新南威尔士大学计算机科学与工程学院) School of Sports Training, Tianjin University of Sport(天津体育学院运动训练学院) Faculty of Information Engineering and Automation, Kunming University of Science and Technology(昆明理工大学信息工程与自动化学院) School of Artificial Intelligence, Tianjin University(天津大学人工智能学院) School of Artificial Intelligence, Hebei University of Technology(河北工业大学人工智能学院)

AI总结 提出空地渐进协作基准AGPC和社会化协同感知框架SCP,通过双层级路由器实现跨视角跨任务选择性交互,在异构空地感知中提升下游性能7.86%。

详情
AI中文摘要

空地协同感知对于真实世界动态环境中的鲁棒视觉理解至关重要。然而,现有研究通常将协作建模为单任务跨视角融合,忽视了定位、目标关联和细粒度解析之间的功能依赖关系。此外,空中和地面视角的异构性引入了显著的几何、尺度和遮挡差异,使得统一特征共享容易受到负迁移的影响。为解决这些问题,我们将空地感知建模为渐进式跨任务协作任务,并构建了空地渐进协作(AGPC)基准,这是一个包含超过745K原始视频帧的时空对齐基准。基于该基准,我们提出了社会化协同感知(SCP),一个从空中全局定位到地面目标关联和身份感知解析的渐进式协作框架。其核心模块——双层级路由器(DLR),将输入侧的多尺度专家选择与输出侧的任务条件调制解耦,实现了选择性的跨视角和跨任务交互,同时抑制有害干扰。大量实验证明了SCP的有效性。它实现了3.73%的协同进化增益和7.86%的平均下游性能提升。这些结果表明,对于异构空地感知,任务条件协作比统一融合更有效。代码可在该网址获取。

英文摘要

Air-ground collaborative perception is crucial for robust visual understanding in real-world dynamic environments. However, existing studies typically formulate collaboration as single-task cross-view fusion, overlooking the functional dependencies among localization, target association, and fine-grained parsing. In addition, the heterogeneous nature of aerial and ground views introduces substantial geometric, scale, and occlusion discrepancies, making uniform feature sharing vulnerable to negative transfer. To tackle these issues, we model air-ground perception as a progressive cross-task collaboration task and construct the Air-Ground Progressive Collaboration (AGPC) benchmark, a spatio-temporally aligned benchmark comprising more than 745K raw video frames. Built upon this benchmark, we propose Socialized Co-Perception (SCP), a coarse-to-fine framework that organizes collaboration progressively from aerial global localization to ground target association and identity-aware parsing. Its core module, the Dual-Layer Router (DLR), decouples input-side multi-scale expert selection from output-side task-conditioned modulation, enabling selective cross-view and cross-task interaction while suppressing harmful interference. Extensive experiments demonstrate the effectiveness of SCP. It achieves a 3.73\% coevolutionary gain and a 7.86\% improvement in average downstream performance. These results show that task-conditioned collaboration is more effective than uniform fusion for heterogeneous air-ground perception. The code is available at https://github.com/g1136639260-spec/AGSCP.

2606.18839 2026-06-18 cs.LG cs.CV 新提交

Semantic Robustness Certification for Vision-Language Models

视觉语言模型的语义鲁棒性认证

Peiyu Yang, Paul Montague, Feng Liu, Andrew C. Cullen, Amardeep Kaur, Christopher Leckie, Sarah M. Erfani

发表机构 * School of Computing \& Information Systems, University of Melbourne, Australia

AI总结 提出首个无需额外数据即可认证视觉语言模型在语义层面(如形状、大小、风格)鲁棒性的框架,通过文本提示作为语义代理并量化决策边界,确保预测类别在语义变换下不变。

Comments Accepted to ICML

详情
AI中文摘要

视觉语言模型(VLM)现在被广泛用于下游任务。然而,现实世界的应用常常使VLM面临由语义变化(例如形状、大小和风格)引起的分布偏移。鲁棒性认证确定当对输入应用变换时模型的预测是否改变。虽然大多数认证框架研究输入的几何或像素级变换,但本文提出了一种新颖的框架,能够在语义级变换下认证VLM的鲁棒性。利用VLM的开放词汇能力,我们使用文本提示作为语义代理来构建由控制语义变化程度的范围参数化的变换。通过以封闭形式表征VLM决策边界,我们的框架定量地认证了在语义变换下预测类别保持不变的范围区间。我们的框架是第一个在语义级变化下认证VLM鲁棒性而无需为每种变化提供额外数据的框架,使其易于应用。在合成数据和真实数据上的实验表明,我们的框架能够在各种场景下认证针对多种语义变化的鲁棒性。

英文摘要

Vision-language models (VLMs) are now widely used in downstream tasks. However, real-world applications often expose VLMs to distribution shifts induced by semantic variation (e.g., shape, size, and style). Robustness certification determines if a model's prediction changes when transformations are applied to its input. While most certification frameworks study geometric or pixel-level transformations over inputs, this work proposes a novel framework that enables certifying VLM robustness under semantic-level transformations. Leveraging the open-vocabulary capability of VLMs, we use text prompts as semantic proxies to construct transformations parameterized by an extent that controls the degree of semantic variation. By characterizing the VLM decision boundary in closed form, our framework quantitatively certifies extent intervals for which the predicted class remains unchanged under the semantic transformation. Our framework is the first to certify VLM robustness under semantic-level variations without requiring additional data for each variation, making it practical to apply. Experiments on both synthetic and real-world data show that our framework enables certifying robustness under diverse semantic variations across scenarios.

2606.18834 2026-06-18 cs.LG 新提交

Identifying Structural Biases from Causal Mechanism Shifts

从因果机制变化中识别结构性偏差

Praharsh Nanavati, Jilles Vreeken, David Kaltenpoth

发表机构 * CISPA Helmholtz Center for Information Security(CISPA赫尔姆霍茨信息安全中心)

AI总结 提出利用环境间机制变化识别隐藏混淆和选择偏差,基于互信息构建可检验准则,并设计StruBI算法,在合成和真实数据上显著优于现有方法。

详情
AI中文摘要

因果发现方法通常假设所有数据独立同分布(i.i.d.),且系统中没有未测量的变量影响。在实践中,这些假设经常被违反,导致推断不准确。在本文中,我们研究如何从因果机制变化中识别隐藏混淆和选择偏差。特别地,我们表明结构性偏差会导致依赖的机制变化。也就是说,通过考虑在不同环境下的数据中哪些变量的机制发生了变化,我们可以判断哪些变量是无偏的,哪些受到隐藏混淆的影响,哪些正在经历选择偏差。我们将此形式化为一个基于互信息的经验可检验准则,并展示在哪些条件下它能识别结构性偏差。为了判断哪些节点受到何种偏差的影响,我们引入了StruBI算法。在合成和真实数据上的实验表明,StruBI在实践中表现良好,准确恢复了受影响的变量集和偏差类型,以较大优势超越了现有技术水平。

英文摘要

Causal discovery methods commonly assume that all data is independently and identically distributed (i.i.d.) and that there are no unmeasured variables affecting the system. In practice, these assumptions are often violated, leading to inaccurate inference. In this paper, we study how to identify hidden confounding and selection biases from causal mechanism shifts. In particular, we show that structural biases lead to dependent mechanism shifts. That is, by considering for which variables the mechanisms change given data from different environments, we can tell which variables are unbiased, which are subject to hidden confounding, and which are undergoing selection bias. We formalize this into an empirically testable criterion based on mutual information, and show under which conditions it identifies structural biases. To tell which nodes are subject to what kind of bias, we introduce the StruBI algorithm. Experiments on synthetic and real-world data show that StruBI works well in practice, accurately recovering affected variable sets and types of biases, outperforming the state-of-the-art by a wide margin.

2606.18833 2026-06-18 cs.LG 新提交

Seed-Guided Semi-Supervised Clustering by A-Contrario Anomaly Detection

基于A-Contrario异常检测的种子引导半监督聚类

Nassir Mohammad

发表机构 * Cyber Innovation Lab, Airbus, Newport, UK(空中客车公司网络创新实验室(英国纽波特))

AI总结 提出一种基于统计对偶性的半监督聚类框架,通过a-contrario推理和感知算法,利用种子标签初始化并迭代排除异常点,实现鲁棒聚类,在少量种子下达到强性能。

详情
AI中文摘要

本文介绍了一种基于分组原则与异常检测之间统计对偶性的半监督聚类框架。我们解决了噪声环境中鲁棒聚类定义的挑战——在该任务中,划分算法往往过度分配离群点,而基于密度的方法仍对启发式全局参数敏感。借鉴\textit{a-contrario}统计推理和格式塔邻近原则,我们将聚类定义为相对于均匀随机性零假设不包含任何异常点的最大数据点子集。该方法的核心是感知算法,该算法利用基于期望的原则性阈值($\mathbb{E} < 1$)来识别异常点,无需手动参数调整。通过将聚类视为异常检测的对偶问题,我们采用迭代的“通过排除进行聚类”机制。该算法由种子引导,利用最少的用户提供标签来初始化鲁棒的聚类中位数并形成初始组,随后通过接纳非异常点进行扩展。这种方法自然地隔离了边缘点、孤立噪声和新兴的未知聚类。我们在合成和真实基准数据集上评估了该方法,包括通过原始、线性降维和邻域保持嵌入表示的图像和文本数据集。结果表明,在每个聚类仅使用10-30个种子的情况下,所提出的方法在实用的低调优基准测试协议下实现了具有竞争力且通常非常强的性能,同时在固定种子聚类数和迭代次数下,对观测数和维度均保持线性可扩展性。

英文摘要

This paper introduces a semi-supervised clustering framework grounded in the statistical duality between grouping principles and anomaly detection. We address the challenge of robust cluster definition in noisy environments -- a task where partitioning algorithms often over-assign outliers and density-based methods remain sensitive to heuristic global parameters. Drawing on \textit{a-contrario} statistical reasoning and Gestalt proximity principles, we define a cluster as a maximal subset of data points containing no anomalies relative to a null hypothesis of uniform randomness. Central to this approach is the Perception algorithm, which utilises a principled expectation-based threshold ($\mathbb{E} < 1$) to identify outliers without manual parameter tuning. By treating clustering as the dual of anomaly detection, we employ an iterative ``clustering-by-exclusion'' mechanism. The algorithm is seed-guided, leveraging minimal user-provided labels to initialise robust cluster medians and form initial groups, which are subsequently expanded by admitting non-anomalous points. This approach naturally isolates fringe points, isolated noise, and emerging unknown clusters. We evaluate the method on synthetic and real-world benchmarks, including image and text datasets represented through raw, linear-reduced, and neighbourhood-preserving embeddings. Results demonstrate that with as few as 10--30 seeds per cluster, the proposed method achieves competitive and often very strong performance under a practical low-tuning benchmarking protocol, while maintaining linear scalability with respect to both observations and dimensionality for a fixed number of seeded clusters and iterations.

2606.18832 2026-06-18 cs.LG cs.AI 新提交

Target-confidence Recourse Using tSeTlin machines: TRUST

使用Tsetlin机器的目标置信度追索:TRUST

K. Darshana Abeyrathna, Sara El Mekkaoui, Nils Enric Canut Taugbøl, Anuja Vats

发表机构 * Group Research and Development Det Norske Veritas (DNV)(挪威船级社(DNV)集团研发部)

AI总结 提出TRUST框架,通过概率Tsetlin机器和贝叶斯优化直接搜索满足用户指定置信度目标的最小输入变化,生成更稳健和可解释的反事实解释。

详情
AI中文摘要

反事实解释被广泛用于高风险决策系统中的算法追索。大多数现有方法寻求最小化改变输入以翻转模型决策。然而,决策者通常不仅依赖预测标签,还依赖置信度阈值和风险边际。刚好越过决策边界的反事实在噪声或模型变化下可能脆弱且不稳定。本文提出使用Tsetlin机器的目标置信度追索(TRUST),一种用户明确指定追索所需预测置信度的框架。TRUST不是先生成反事实再评估置信度,而是直接搜索满足用户定义置信度目标的最小变化,从而在成本、置信度和鲁棒性方面比较追索选项。我们使用概率Tsetlin机器(PTM)结合贝叶斯优化实例化TRUST。PTM基于概率子句的结构将预测置信度与决策规则的稳定性联系起来。我们表明,满足相同规则的反事实在可靠性上可能差异很大,取决于它们满足这些规则的安全程度,揭示了决策是由稳健还是脆弱的子句激活支持的。在合成和真实数据集上的实验表明,目标置信度反事实比传统的基于边界的方法产生更稳健和可解释的追索。在多个基准测试中,TRUST实现了完美的鲁棒性,同时保持较低的追索成本,包括在Haberman数据集上以0.92置信度达到0.10的L2距离。通过显式控制置信度和暴露规则级稳定性,TRUST为高风险决策支持提供了可操作的追索。

英文摘要

Counterfactual explanations are widely used to provide algorithmic recourse in high-stakes decision-making systems. Most existing methods seek the smallest change to an input that flips a model's decision. However, decision-makers often rely not only on predicted labels but also on confidence thresholds and risk margins. Counterfactuals that barely cross a decision boundary can be fragile and unstable under noise or model variation. In this paper, we propose Target-confidence Recourse Using tSeTlin machines (TRUST), a framework in which users explicitly specify the desired prediction confidence for recourse. Rather than generating counterfactuals and evaluating confidence afterward, TRUST directly searches for minimal changes that satisfy a user-defined confidence target, enabling comparison of recourse options in terms of cost, confidence, and robustness. We instantiate TRUST using a Probabilistic Tsetlin Machine (PTM) combined with Bayesian optimization. The probabilistic clause-based structure of PTM links prediction confidence to the stability of decision rules. We show that counterfactuals satisfying the same rules can still differ substantially in reliability depending on how securely they satisfy those rules, revealing whether decisions are supported by robust or fragile clause activations. Experiments on synthetic and real-world datasets demonstrate that target-confidence counterfactuals produce more robust and interpretable recourse than conventional boundary-based approaches. Across multiple benchmarks, TRUST achieves perfect robustness while maintaining low recourse cost, including an L2 distance of 0.10 on the Haberman dataset at 0.92 confidence. By explicitly controlling confidence and exposing rule-level stability, TRUST provides actionable recourse for high-stakes decision support.

2606.18831 2026-06-18 cs.CL cs.AI 新提交

Beyond Reward Engineering: A Data Recipe for Long-Context Reinforcement Learning

超越奖励工程:长上下文强化学习的数据配方

Xiaoyue Xu, Sikui Zhang, Xiaorong Wang, Xu Han, Chaojun Xiao

发表机构 * OpenBMB Tsinghua University(清华大学)

AI总结 提出一种简单有效的数据配方,结合最小化基于结果的GRPO设置,显著提升大语言模型的长上下文推理能力,在多个基准和智能体任务上取得平均+3.2至+7.2点的提升。

Comments 15 pages, 6 figures, 12 tables

详情
AI中文摘要

长上下文推理是大语言模型的一项关键能力,特别是当它们作为必须推理长轨迹的自主智能体部署时。强化学习最近成为提升这一能力的主要范式,然而现有工作主要关注奖励工程,而多样化的训练数据仍然稀缺。我们从数据为中心的角度重新审视这个问题,并表明仅凭一种简单有效的数据配方,结合最小化基于结果的GRPO设置,就足以显著提升长上下文推理。我们的配方针对三个互补的任务族——检索、多证据合成和推理——我们构建并整理了八个数据集,总计约1.4万个示例。在三个模型(Qwen3-4B/8B/30B-A3B)上的实验在七个长上下文基准上取得了平均+7.2/+3.2/+6.4分的提升,超过了之前的强化学习训练集。我们进一步证明这些增益可以迁移到智能体任务中,在基于智能体调整的模型上继续使用我们的数据配方进行强化学习训练,GAIA提升+4.8分,BrowseComp提升+7.0分。我们将发布我们的数据集以促进未来研究。

英文摘要

Long-context reasoning is an essential capability for large language models, particularly when they are deployed as autonomous agents that must reason over lengthy trajectories. Reinforcement learning (RL) has recently emerged as a dominant paradigm for improving this ability, yet existing work largely focuses on reward engineering while diverse training data remains scarce. We revisit this problem from a data-centric perspective and show that a simple yet effective data recipe alone, paired with a minimal outcome-based GRPO setup, suffices to substantially improve long-context reasoning. Our recipe targets three complementary task families -- retrieval, multi-evidence synthesis, and reasoning -- for which we construct and curate eight datasets totaling ~14K examples. Experiments on three models (Qwen3-4B/8B/30B-A3B) yield average gains of +7.2/+3.2/+6.4 points across seven long-context benchmarks, surpassing prior RL training sets. We further demonstrate that these gains transfer to agentic tasks, where continuing RL training on an agent-tuned model with our data recipe improves GAIA by +4.8 and BrowseComp by +7.0 points. We will release our datasets to facilitate future research.

2606.18829 2026-06-18 cs.LG cs.CL 新提交

GateMem: Benchmarking Memory Governance in Multi-Principal Shared-Memory Agents

GateMem:多主体共享内存代理中的内存治理基准

Zhe Ren, Yibo Yang, Yimeng Chen, Zijun Zhao, Benshuo Fu, Zhihao Shu, Bingjie Zhang, Yangyang Xu, Dandan Guo, Shuicheng Yan

发表机构 * School of Artificial Intelligence, Jilin University(吉林大学人工智能学院) Shanghai Jiao Tong University(上海交通大学) King Abdullah University of Science and Technology (KAUST)(卡尔斯鲁厄大学) Tsinghua University(清华大学) National University of Singapore(新加坡国立大学)

AI总结 提出GateMem基准,评估多主体共享内存代理在效用、访问控制和遗忘三方面的治理能力,发现现有方法无法同时满足三者。

Comments 24 pages, 8 figures. Code and dataset are available at https://github.com/rzhub/GateMem and https://huggingface.co/datasets/Ray368/GateMem

详情
AI中文摘要

LLM代理的内存基准主要假设单用户设置,而医院、工作场所、校园和家庭中的共享助手研究不足。在这些部署中,多个主体写入公共内存池并根据不同角色、范围和关系进行查询,因此内存质量需要治理和召回。我们引入GateMem,一个多主体共享内存代理的基准。GateMem联合评估合法长期请求的效用(含状态更新)、跨上下文授权边界的访问控制,以及显式删除请求后的主动遗忘。它涵盖医疗、办公、教育和家庭领域,包含长形式多方情节、增量内存注入、隐藏检查点、结构化评判和泄漏目标注释。在多种基线和骨干模型上,没有方法能同时实现强效用、鲁棒访问控制和可靠遗忘。长上下文提示通常以高令牌成本获得最佳治理分数,而基于检索和外部内存的方法降低成本但仍泄漏未授权或已删除信息。这些结果表明,当前内存代理远未达到可靠的共享机构部署水平。

英文摘要

Memory benchmarks for LLM agents largely assume single-user settings, leaving shared assistants for hospitals, workplaces, campuses, and households understudied. In these deployments, multiple principals write to a common memory pool and query it under different roles, scopes, and relationships, so memory quality requires governance as well as recall. We introduce GateMem, a benchmark for multi-principal shared-memory agents. GateMem jointly evaluates utility for legitimate long-horizon requests with state updates, access control across contextual authorization boundaries, and agent-facing active forgetting after explicit deletion requests. It spans medical, office, education, and household domains, with long-form multi-party episodes, incremental memory injection, hidden checkpoints, structured judging, and leak-target annotations. Across diverse baselines and backbone models, no method simultaneously achieves strong utility, robust access control, and reliable forgetting. Long-context prompting often yields the best governance score at high token cost, while retrieval-based and external-memory methods reduce cost yet still leak unauthorized or deleted information. These results show current memory agents remain far from reliable shared institutional deployment.

2606.18828 2026-06-18 cs.RO cs.AI 新提交

Space Is Intelligence: Neural Semigroup Superposition for Riemannian Metric Generation

空间即智能:用于黎曼度量生成的神经半群叠加

Chenghao Xu

发表机构 * National Engineering Research Center of Robot Visual Perception and Control Technology, Hunan University(湖南大学机器人视觉感知与控制技术国家工程研究中心)

AI总结 提出将智能置于空间本身,通过神经半群叠加机制生成黎曼度量,使动作简化为测地线跟随,在单障碍场景训练后零样本泛化到未见配置。

详情
AI中文摘要

传统方法将智能置于智能体中,无论是作为学习策略还是搜索过程。我们则将智能置于空间本身:场景在构型流形上诱导一个黎曼度量,动作简化为跟随该度量的测地线,而无需调用单独的规划器或碰撞检查器。一个单一的编码器-路由器网络通过三个互补的参数组实现这一思想——框架参数(定向生成器)、调制参数(控制空间传播)和基本系数(决定强度)。这些组通过共享的半群叠加机制组合,产生单个黎曼度量场,形成一种紧凑的架构,其几何复杂度自然随场景复杂度扩展。在单个双障碍场景上训练后,该模型在未见过的障碍配置上展现出鲁棒的零样本泛化能力,无碰撞路径成本与障碍穿透路径成本相差数个数量级。

英文摘要

Traditional approaches place intelligence in the agent, whether as a learned policy or a search procedure. We instead place intelligence in the space itself: a scene induces a Riemannian metric on the configuration manifold, and action reduces to following the geodesics of that metric rather than invoking a separate planner or collision checker. A single Encoder-Router network realizes this idea through three complementary parameter groups -- frame parameters that orient the generators, modulation parameters that govern their spatial propagation, and basic coefficients that determine their strength. These groups combine through a shared semigroup-superposition mechanism to produce a single Riemannian metric field, yielding a compact architecture whose geometry scales naturally with scene complexity. Trained on a single two-obstacle scene, the model demonstrates robust zero-shot generalization across unseen obstacle configurations, with orders-of-magnitude separation between collision-free and obstacle-penetrating path costs.

2606.18825 2026-06-18 cs.CV 新提交

DreamReg: Belief-Driven World Model for 2D-3D Ultrasound Registration

DreamReg:基于信念驱动的世界模型用于2D-3D超声配准

Luoyao Kang, Yuelin Zhang, Jiwei Shan, Haifan Gong, Qingpeng Ding, Shing Shin Cheng

发表机构 * T Stone Robotics Institute, The Chinese University of Hong Kong(香港中文大学T Stone机器人研究所) Multi-scale Medical Robotics Center(多尺度医疗机器人中心) Perelman School of Medicine, University of Pennsylvania(宾夕法尼亚大学佩雷尔曼医学院)

AI总结 提出DreamReg框架,将2D-3D超声配准建模为信念更新,通过世界模型模拟探头运动并整合想象结果,在CAMUS和u-RegPro数据集上实现鲁棒且准确的实时配准。

详情
AI中文摘要

超声(US)广泛应用于手术导航,但由于部分可观测性、散斑噪声以及依赖于动作的US采集,术中2D切片与术前3D体积之间的实时配准仍然具有挑战性。现有方法是一次性的或短视的,难以随时间收集证据或捕捉外科医生如何根据屏幕反馈调整探头运动。我们提出DreamReg,一个基于信念驱动的世界模型框架,将2D-3D配准形式化为对刚性变换的信念更新。DreamReg维护一个潜在信念状态,总结过去的观测和位姿信息,并在新切片到达时通过学习到的动态不断细化变换。在训练期间,DreamReg暴露于模拟临床扫描行为的探头运动轨迹,并通过将位姿细化条件于当前US观测来学习更新其信念。在推理期间,DreamReg通过内部想象来细化配准:它展开学习到的世界模型以模拟候选探头运动及其预测的观测,并整合这些想象的结果以收敛到准确的刚性变换。在CAMUS和u-RegPro数据集上的实验表明,与最先进方法相比,DreamReg在实时引导中具有改进的鲁棒性和有竞争力的配准精度。

英文摘要

Ultrasound (US) is widely used for surgical navigation, yet real-time registration between intraoperative 2D slices and preoperative 3D volumes remains challenging due to partial observability, speckle noise, and the action-dependent US acquisition. Existing methods are one-shot or short-horizon, making it hard for them to gather evidence over time or capture how surgeons adjust probe motion based on on-screen feedback. We propose DreamReg, a belief-driven world-model framework that formulates 2D-3D registration as belief updating over rigid transformations. DreamReg maintains a latent belief state that summarizes past observations and poses information, and continuously refines the transformation through learned dynamics as new slices arrive. During training, DreamReg is exposed to probe-motion trajectories that mimic clinical scanning behavior and learns to update its belief by conditioning pose refinement on the current US observation. During inference, DreamReg refines registration via internal imagination: it rolls out the learned world model to simulate candidate probe motions and their predicted observations, and integrates these imagined outcomes to converge to an accurate rigid transformation. Experiments on CAMUS and u-RegPro datasets demonstrate improved robustness and competitive registration accuracy for real-time guidance compared with state-of-the-art methods.

2606.18824 2026-06-18 cs.CV cs.LG 新提交

Where Will They Go? Modelling Multimodal Pedestrian Manoeuvres from Ego-centric Videos

他们将去哪里?从自我中心视频建模多模态行人机动

Yuxuan Xie, Nicolas Pugeault, Chongfeng Wei, Hubert P. H. Shum, Edmond S. L. Ho

发表机构 * School of Computing Science, University of Glasgow(格拉斯哥大学计算机科学学院) James Watt School of Engineering, University of Glasgow(格拉斯哥大学詹姆斯·瓦特工程学院) Department of Computer Science, Durham University(杜伦大学计算机科学系)

AI总结 提出MMPM框架,通过行为感知交互模块和基于CVAE的模态感知轨迹预测器,分别建模行人过马路和不过马路两种模式,提升自我中心视角下多模态轨迹预测准确性。

Comments Accepted at The IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2026

详情
AI中文摘要

从自我中心摄像头进行行人轨迹预测具有挑战性,因为它依赖于与车辆和场景上下文的复杂交互以及行人的意图。通过建模行人历史与未来轨迹的相关性和意图,通常会产生多模态(即多个模式)分布。现有的随机预测器通常从单一单峰分布中采样多个未来轨迹,这可能导致次优的“混合模式”轨迹,这些轨迹位于不同的运动模式之间,并在真实场景中变得不合理。在本文中,我们提出MMPM,一种模态感知框架,基于行人的过马路行为将未来轨迹分布分别建模为语义上有意义的模式。MMPM由两个模块组成:行为感知行人交互模块(PIM),通过引入注视、头部和手势来联合捕捉行人-车辆和行人-环境交互;以及基于CVAE的模态感知轨迹预测器(MTP)模块,分别对过马路和不过马路两种模式的未来轨迹分布进行建模。基于查询的解码器进一步在解码过程中强制执行模态一致性。在PIE和JAAD数据集上的实验表明,我们的方法超越了最先进的基线。我们提出的MTP是模型无关的,可以集成到现有框架如BiTrap-NP和SGNet-ED中,以进一步提高未来轨迹预测性能。我们还引入了一种数据驱动的验证协议,将预测与时空一致的真实轨迹匹配,展示了相比先前工作改进的逐帧位移误差。

英文摘要

Pedestrian trajectory prediction from an ego-centric camera is challenging since it depends on complex interactions with vehicles and scene context, as well as the intention of the pedestrian. By modelling correlation and intent from the historical and future trajectories of the pedestrian, it will usually result in a multimodal (i.e. multiple modes) distribution. Existing stochastic predictors often sample multiple futures from a single unimodal distribution, which can yield sub-optimal 'mixed-mode' trajectories that lie between distinct motion patterns and become implausible in real scenes. In this paper, we propose MMPM, a mode-aware framework that separately models future trajectory distributions into semantically meaningful modes based on the pedestrian's crossing behavior. MMPM consists of two modules: behavior-aware Pedestrian Interaction Module (PIM) that jointly captures pedestrian-vehicle and pedestrian-environment interactions by introducing gaze, head and hand gesture, and a CVAE-based Mode-aware Trajectory Predictor (MTP) module to model the future trajectory distributions on two modes, crossing and non-crossing the road, separately. A query-based decoder further enforces mode consistency during decoding. Experiments on PIE and JAAD datasets show that our method surpasses state-of-the-art baselines. Our proposed MTP is model-agnostic, which can be integrated into existing frameworks such as BiTrap-NP and SGNet-ED to further improve future trajectory prediction performance. We additionally introduce a data-driven validation protocol that matches predictions to spatio-temporally consistent ground-truth trajectories, demonstrating improved frame-wise displacement errors over previous work.

2606.18820 2026-06-18 cs.LG cs.AI 新提交

Maturing Markov Decision Processes: Decision Making under Increasing Information and Shrinking Action Sets

成熟马尔可夫决策过程:信息增加与动作集缩小下的决策制定

Jiaxi Liu, Aiping Yang, Yuhang Yang, Shuqi Zhang, Zewei Dong, Jiangming Yang, Xuebin Chen

发表机构 * Ant International(蚂蚁国际) School of Economics, Sichuan University(四川大学经济学院) School of Economics, Fudan University(复旦大学经济学院)

AI总结 针对决策过程中信息增加与动作集缩小的不对称性,提出成熟马尔可夫决策过程(MMDP)框架,并基于过期动作优先级原则开发结构感知强化学习方法,实验证明其能提升学习效率。

Comments 25 pages, 9 figures

详情
AI中文摘要

序列决策问题通常表现出信息和决策灵活性的不对称演化:随着决策周期的展开,智能体获得更丰富的信息,而由于操作截止、承诺或资源约束,可行动作逐渐过期。标准的MDP公式通常将这种结构扁平化为阶段相关的状态描述和动作掩码,从而掩盖了嵌套的信息-动作不对称性,而这种不对称性决定了哪些决策是紧急的、哪些可以推迟。我们引入了成熟马尔可夫决策过程(MMDP),这是一种围绕这种信息-动作不对称性构建的公式。我们通过一个过期动作优先级原则来刻画其关键后果之一,该原则识别出必须在下一阶段之前解决的动作。受此结构启发,我们开发了一个结构感知的强化学习框架,包括阶段感知的策略设计、过期动作抽象以及带有蒸馏的搜索增强学习。在受控的多供应商补货问题、复杂度递增的简化现金管理环境以及生产级模拟器上的实验表明,显式建模这种不对称性可以提高学习效率,并且随着决策问题的规模扩大,其价值日益增加。

英文摘要

Sequential decision problems often exhibit an asymmetric evolution of information and decision flexibility: as a decision cycle unfolds, the agent receives richer information while feasible actions expire due to operational cutoffs, commitments, or resource constraints. Standard MDP formulations typically flatten this structure into stage-dependent state descriptions and action masks, thereby obscuring the nested information--action asymmetry that determines which decisions are urgent and which can be deferred. We introduce Maturing Markov Decision Processes (MMDPs), a formulation built around this information--action asymmetry. We characterize one of its key consequences through an expiring-action priority principle, which identifies the actions that must be resolved before the next stage. Motivated by this structure, we develop a structure-aware reinforcement learning framework with stage-aware policy design, expiring-action abstraction, and search-augmented learning with distillation. Experiments on a controlled multi-supplier replenishment problem, simplified cash-management environments of increasing complexity, and a production-scale simulator show that explicitly modeling this asymmetry improves learning efficiency and becomes increasingly valuable as decision problems scale.

2606.18810 2026-06-18 cs.LG cs.AI 新提交

Learning from Own Solutions: Self-Conditioned Credit Assignment for Reinforcement Learning with Verifiable Rewards

从自身解中学习:面向可验证奖励强化学习的自条件化信用分配

Yingyu Shan, Yuhang Guo, Zihao Cheng, Zeming Liu, Xiangrong Zhu, Xinyi Wang, Jiashu Yao, Wei Lin, Hongru Wang, Heyan Huang

发表机构 * Beijing Institute of Technology(北京理工大学) Beihang University(北京航空航天大学) Independent Researcher(独立研究者)

AI总结 提出SC-GRPO方法,利用自条件化分布间的KL散度作为GRPO梯度的乘性权重,实现细粒度信用分配,在数学、代码和智能体任务上平均提升8.1%。

详情
AI中文摘要

具有可验证奖励的强化学习(RLVR)在训练LLMs进行推理任务方面取得了显著进展,但代表性方法如GRPO对所有token分配统一信用,浪费了常规token上的梯度,同时低估了关键推理步骤。现有的token级信用分配方法需要超出模型自身rollout的资源。GRPO变体依赖于过程奖励模型或真实答案。知识蒸馏通过每个token的散度分配信用,但需要外部教师(在线策略蒸馏)或特权信息(在线策略自蒸馏)。然而,这些依赖性限制了在纯RLVR设置中的适用性。我们观察到,将模型以其自身验证过的轨迹为条件,会在原始分布和条件分布之间诱导出可测量的每token KL散度,并证明当存在多个验证过的轨迹时,从由验证过的轨迹构建的自教师进行蒸馏会导致不可行的加权平均解。我们提出SC-GRPO(自条件化GRPO),它使用前述KL散度作为GRPO梯度的乘性权重。在涵盖数学、代码和智能体任务的五个基准上,SC-GRPO一致优于GRPO 8.1%,优于DAPO 5.9%,并具有更强的分布外性能。此外,SC-GRPO实现了比OPD更高的性能。

英文摘要

Reinforcement learning with verifiable rewards (RLVR) has driven substantial progress in training LLMs for reasoning tasks, but representative methods such as GRPO assign uniform credit across all tokens, wasting gradient on routine tokens while under-crediting pivotal reasoning steps. Existing token-level credit assignment methods require resources beyond the model's own rollouts. GRPO variants rely on process reward models or ground-truth answers. Knowledge distillation assigns credit through per-token divergence but requires external teachers (On-Policy Distillation) or privileged information (On-Policy Self Distillation). However, these dependencies limit applicability in the pure RLVR setting. We observe that conditioning the model on its own verified trajectories induces a measurable per-token KL divergence between the original and conditioned distributions, and prove that distilling from a self-teacher constructed by verified trajectories leads to infeasible weighted-average solutions when multiple verified trajectories exist. We propose SC-GRPO (Self-Conditioned GRPO), which uses KL divergence mentioned before as a multiplicative weight on GRPO gradients. Across five benchmarks spanning math, code, and agentic tasks, SC-GRPO consistently outperforms 8.1% over GRPO and 5.9% over DAPO with stronger OOD performance. Moreover, SC-GRPO achieves higher performance than OPD.