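arXivDaily: a daily academic digest synced with the full arXiv data, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and more.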

2606.19774 2026-06-19 cs.RO New submission

Start Right, Arrive Right: Asynchronous Execution via Initial Noise Selection

Trong-Bao Ho, Quang-Tan Nguyen, Thien-Loc Ha, Gia-Binh Nguyen, Viet-Thanh Nguyen, Long Dinh, Minh N. Vu, Duy M. H. Nguyen, An Thai Le, Ngo Anh Vien

Affiliations: VinRobotics; VinUniversity; DFKI (German Research Center for Artificial Intelligence); University of Stuttgart; IMPRS-IS (International Max Planck Research School for Intelligent Systems)

AI summary: To address inconsistencies at action-chunk boundaries when flow-based policies are executed asynchronously, the paper proposes PAINT, a training-free method that achieves prefix consistency through initial noise selection rather than trajectory steering, improving execution consistency and task performance on 12 simulated benchmarks and 6 real-world manipulation tasks.

Comments: First version 19 pages, project site: https://paint-action-chunking.github.io

Abstract

Action chunking enables robot policies to produce temporally coherent behavior, but generating multi-step action sequences with flow-based policies incurs latency that is incompatible with real-time control. Under asynchronous execution, the robot continues executing the current chunk while the next one is generated, causing even minor delays to create inconsistencies at chunk boundaries. Existing methods address this problem by steering generation toward the already executed action prefix. We instead show that prefix consistency can be achieved by selecting an appropriate initial noise before generation begins, allowing the unmodified flow ODE to produce a coherent next chunk. This reframes asynchronous inference as a noise selection problem rather than a trajectory steering problem. We introduce PAINT, a training-free method that finds this noise via backward Euler inversion and constructs the final chunk through a repainting rule. In summary, PAINT requires no gradients, retraining, or policy modification; yet it improves execution consistency and task performance across 12 simulated benchmarks and 6 real-world manipulation tasks spanning single-arm, bimanual, and humanoid embodiments. Website: https://paint-action-chunking.github.io
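
One way to read "backward Euler inversion" is as an exact inverse of a forward Euler sampler: each forward step is undone by solving an implicit equation for the previous point. The sketch below is a toy 1-D illustration under that assumption, not the authors' implementation. The velocity field `toy_velocity`, the step count, and the fixed-point solver are all hypothetical, and the repainting rule is not shown because the abstract does not define it.

```python
from typing import Callable

Velocity = Callable[[float, float], float]


def generate(x0: float, velocity: Velocity, steps: int) -> float:
    """Forward Euler integration of dx/dt = velocity(x, t) from t=0 to t=1."""
    h = 1.0 / steps
    x = x0
    for k in range(steps):
        x = x + h * velocity(x, k * h)
    return x


def invert(x1: float, velocity: Velocity, steps: int, fp_iters: int = 50) -> float:
    """Recover the initial noise whose forward Euler rollout ends at x1.

    Each forward step x_next = x + h * v(x, t) is undone by solving the
    implicit equation x = x_next - h * v(x, t) with fixed-point iteration,
    which is a backward Euler step of the time-reversed ODE.
    """
    h = 1.0 / steps
    x_next = x1
    for k in reversed(range(steps)):
        t = k * h
        x = x_next  # initial guess for the implicit solve
        for _ in range(fp_iters):
            x = x_next - h * velocity(x, t)
        x_next = x
    return x_next


def toy_velocity(x: float, t: float) -> float:
    """Hypothetical velocity field that pulls samples toward 2.0."""
    return 2.0 - x


# Pick the noise that makes the unmodified sampler land on a desired value.
noise = invert(1.5, toy_velocity, steps=10)
reconstructed = generate(noise, toy_velocity, steps=10)
```

In this toy setting the fixed-point iteration converges because the step size times the Lipschitz constant of the velocity field is below 1; a learned policy would need that condition checked rather than assumed.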

2606.19771 2026-06-19 cs.AI New submission

Beyond Entropy: Learning from Token-Level Distributional Deviations for LLM Reasoning

Xuanzhi Feng, Zhengyang Li, Zeyu Liu, Haoxi Li, Yuming Jiang, Bing Guo, Jingcai Guo, Jie Zhang, Song Guo

Affiliations: The Hong Kong University of Science and Technology; Sichuan University; The Hong Kong Polytechnic University

AI summary: To address the entropy collapse or explosion caused by token updates in RLVR, the paper proposes the ICT framework, which uses JS divergence to identify critical tokens and balances policy concentration through selective updates, improving reasoning performance.

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) has significantly advanced Large Language Model (LLM) reasoning; however, it faces a fundamental optimization instability: uniform token updates precipitate entropy collapse, leading to premature convergence to suboptimal strategies, whereas excessive Shannon Entropy maximization can cause entropy explosion, driving blind exploration toward incoherent reasoning chains. To resolve this dichotomy, we introduce the Independent Combinatorial Tokens (ICT) framework, which shifts the optimization focus from scalar uncertainty to the distributional properties of token logits. By leveraging the Jensen-Shannon (JS) divergence between token logits distributions, ICT identifies tokens with distinctive distributional patterns as critical branching points for guiding effective exploration in LLM reasoning. Our theoretical analysis, grounded in both Shannon and second-order Rényi entropy, proves that selectively updating on these tokens regulates policy concentration: it reduces the overall distribution uncertainty measured by Shannon entropy, while controlling probability concentration captured by second-order Rényi entropy. This dual effect prevents over-concentrated token generation from weakening exploration and effectively stabilizes the training landscape. Empirical results demonstrate that updating only the top 10% of unique tokens on Qwen2.5 (0.5B/1.5B/7B) models yields an average pass@4 improvement of 4.58%, with a maximum gain of 14.9%, over GRPO, 20-Entropy, and STAPO baselines across seven benchmarks spanning math, commonsense, and Olympiad-level problems.
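
The three quantities the abstract relies on (Shannon entropy, second-order Rényi entropy, and Jensen-Shannon divergence between softmaxed token logits) have standard definitions, sketched below in plain Python. How ICT selects or ranks tokens from these quantities is not stated in the abstract, so the sketch stops at the definitions; the function names are illustrative.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def shannon_entropy(p: Sequence[float]) -> float:
    """H(p) = -sum_i p_i log p_i, in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def renyi2_entropy(p: Sequence[float]) -> float:
    """Second-order Renyi entropy H_2(p) = -log sum_i p_i^2, in nats."""
    return -math.log(sum(pi * pi for pi in p))


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q); assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence: symmetric and bounded by log 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


peaked = softmax([2.0, 0.0, 0.0, 0.0])
uniform = softmax([0.0, 0.0, 0.0, 0.0])
distance = js_divergence(peaked, uniform)
```

For any distribution the second-order Rényi entropy is at most the Shannon entropy, with equality for the uniform distribution, which is why the two can measure overall uncertainty and probability concentration separately.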

2606.19770 2026-06-19 cs.LG New submission

An Information Theoretic Framework for Graph Novelty Generation via Latent Mixture Modeling

Itsuki Nakagawa, Kenji Yamanishi

Affiliations: Graduate School of Information Science and Technology, The University of Tokyo

AI summary: Proposes an information-theoretic framework that uses latent mixture modeling and description-length constraints to generate novel graph data that differ from existing patterns while preserving global structural consistency.

Abstract

We propose an information-theoretic framework for graph novelty generation, which aims to generate data that are distinct from existing patterns while preserving global structural consistency. Our approach embeds data into a latent space, models the latent distribution using finite mixture models, and generates novel samples by imposing explicit novelty and reliability conditions formulated in terms of description length. Specifically, novelty is enforced by requiring generated samples to be poorly explained by all existing mixture components, while reliability constrains their impact on the overall mixture structure under the Minimum Description Length (MDL) principle. We provide a theoretical analysis showing that, with appropriate threshold choices, the probabilities of misclassifying non-novel or unreliable samples converge to zero with explicit rates. Experiments on synthetic and benchmark graph datasets demonstrate that the proposed method enables principled novelty generation with quantifiable risk.
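
The novelty condition ("poorly explained by all existing mixture components") can be illustrated with code lengths. The sketch below assumes a 1-D Gaussian mixture in the latent space and takes the code length of a sample under a component to be the negative log of the component weight times its density. The threshold, the mixture parameters, and this exact form of the condition are assumptions; the paper's reliability condition is not reproduced because the abstract does not give its formula.

```python
import math
from typing import List, Tuple

Component = Tuple[float, float, float]  # (weight, mean, std), hypothetical values


def code_length(x: float, component: Component) -> float:
    """Nats needed to encode x with one component: -log(weight * density)."""
    weight, mean, std = component
    log_density = (
        -0.5 * math.log(2 * math.pi * std ** 2)
        - (x - mean) ** 2 / (2 * std ** 2)
    )
    return -(math.log(weight) + log_density)


def is_novel(x: float, mixture: List[Component], threshold: float) -> bool:
    """A sample counts as novel if even its best component encodes it poorly."""
    return min(code_length(x, c) for c in mixture) > threshold


mixture = [(0.5, 0.0, 1.0), (0.5, 5.0, 1.0)]
near_existing = is_novel(0.0, mixture, threshold=4.0)
far_from_all = is_novel(10.0, mixture, threshold=4.0)
```

Taking the minimum over components is what makes the test "all components explain it poorly": a sample close to any single component gets a short code and is rejected.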

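2606.19759 2026-06-19 cs.AI cs.SI New submission

Optimal Scheduling in a Question-Answering Forum of Knowledge Workers

Rohit Negi, Mustafa Yilmaz

Affiliations: Carnegie Mellon University

AI summary: For question-answering forums of knowledge workers, the paper proposes a request scheduling model based on experts' levels of expertise, computes the system capacity, designs a capacity-achieving scheduler, and examines how expert collaboration increases capacity.

Comments: 14 pages, 4 figures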

2606.19752 2026-06-19 cs.RO cs.AI New submission

OnDeFog: Online Decision Transformer under Frame Dropping

Daiki Yotsufuji, Kenta Nishihara, Shoma Shimizu, Kento Uchida, Shinichi Shirakawa

Affiliations: Yokohama National University

AI summary: To address the performance degradation caused by frame dropping, the paper proposes OnDeFog, which combines the mechanisms of DeFog with the online decision transformer and learns policies through direct environment interaction; it outperforms ODT in environments with high frame-dropping rates and outperforms DeFog on datasets with many low-reward samples.

Comments: Accepted to PRICAI 2025

DOI: 10.1007/978-981-95-7072-0_10

Abstract

In challenging real-world reinforcement learning applications, communication delays or sensor failures often cause frame dropping, in which the agent cannot receive the dropped states and associated rewards. To address the performance degradation caused by frame dropping, the Decision Transformer under Random Frame Dropping (DeFog) was developed by incorporating additional mechanisms into the decision transformer to tackle frame dropping. Although DeFog can mitigate performance degradation in frame-dropping environments, since DeFog is an offline learning method, it struggles to effectively generalize to novel states not adequately represented in the training dataset. In this study, we propose OnDeFog, which integrates the mechanisms in DeFog with the online decision transformer (ODT), an online reinforcement learning method that learns policies through direct environmental interaction. Comprehensive experimental evaluation demonstrates that our proposed OnDeFog achieves superior performance compared to ODT in environments characterized by high dropping frame rate and outperforms DeFog on datasets containing a large amount of low-reward data.
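
The frame-dropping setting itself can be simulated in a few lines. The abstract does not spell out DeFog's mechanisms, so the sketch below shows only the problem setting under one assumption: when a frame is dropped, the agent keeps the last received state and reward and tracks how many steps have passed since it last received a frame. The function name and the tuple layout are hypothetical.

```python
from typing import List, Tuple

Frame = Tuple[float, float]          # (state, reward) emitted by the environment
Observed = Tuple[float, float, int]  # (last state, last reward, steps since last frame)


def apply_frame_dropping(frames: List[Frame], dropped: List[bool]) -> List[Observed]:
    """Return what the agent sees at each step when some frames are dropped."""
    if dropped[0]:
        raise ValueError("the first frame must be received")
    seen: List[Observed] = []
    last_state, last_reward = frames[0]
    span = 0
    for (state, reward), is_dropped in zip(frames, dropped):
        if is_dropped:
            span += 1  # stale observation: one more step without fresh data
        else:
            last_state, last_reward, span = state, reward, 0
        seen.append((last_state, last_reward, span))
    return seen


frames = [(0.0, 1.0), (1.0, 1.0), (2.0, 0.0), (3.0, 1.0)]
observed = apply_frame_dropping(frames, dropped=[False, True, True, False])
```

A wrapper like this can sit between an environment and either an offline dataset builder or an online learner, which is the distinction the abstract draws between DeFog and OnDeFog.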

2606.19718 2026-06-19 cs.CV New submission

One-Shot Novel View and Pose Human Image Synthesis via 3D Prior Guided Diffusion Model

Shenjian Gong, Kangkan Wang, Shanshan Zhang, Jian Yang

Affiliations: PCA Lab, Key Lab of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, and Jiangsu Key Lab of Image and Video Understanding for Social Security, School of Computer Science and Engineering, Nanjing University of Science and Technology; Advanced Laser Technology Laboratory of Anhui Province, Electronic Engineering Institute, National University of Defense Technology, and Jianghuai Advance Technology Center

AI summary: Proposes a method based on a conditional denoising diffusion model that uses 3D human priors (normal maps and color prompts) as geometry and color conditions to synthesize high-quality human images in arbitrary poses and views from a single reference image, including occluded parts.

Comments: 30 pages, 10 figures

DOI: 10.1016/j.patcog.2026.113644

Abstract

This paper addresses the challenge of one-shot novel view and pose human image synthesis. Existing methods either transfer the reference human image to a target pose using a set of 2D pose keypoints, or synthesize human images based on generalizable human NeRFs, which use human model priors to extract point-wise features. However, pose-transfer-based methods cannot handle complex human poses when conditioned on ambiguous 2D poses, while generalizable human NeRFs may fail to accurately recover occluded/invisible human parts without reliable extracted features. To solve these problems, we propose a novel approach for novel view and pose synthesis from a single human image via a conditional denoising diffusion model. Our diffusion model divides the novel view and pose synthesis problem into a sequence of conditional denoising steps. Specifically, to generate humans with complex and arbitrary poses, we introduce 3D human priors, i.e., 3D normal map and color prompt, as geometry and color conditions into the generation process. By transferring the reference human into the target human with a series of diffusion steps, our diffusion model enables high-quality synthesis including the occluded/invisible parts. Further, we propose a self-reconstruction based customized refinement to enhance fine details when tested on novel persons. Experimental results on different public datasets demonstrate that our approach significantly outperforms previous methods and also shows better generalization ability across datasets. The code will be made publicly available at https://github.com/Yankeegsj/3DPGDM.

2606.19712 2026-06-19 cs.LG cs.CV New submission

Efficient Neural Network Model Selection for Few-Class Application Datasets

Bryan Bo Cao, Abhinav Sharma, Lawrence O'Gorman, Michael Coss, Shubham Jain

Affiliations: Nokia Bell Labs

AI summary: For the few-class datasets common in real applications, the paper proposes a classification difficulty measure based on data properties, enabling model selection 6 to 29 times faster than repeated training and testing, and extends model families to smaller scales, improving efficiency in scenarios such as mobile robots.

Comments: 36 pages, 9 tables, 13 figures

Abstract

While much effort has focused on developing and benchmarking high-performance neural networks, less attention has been given to how dataset properties, known to practitioners, can guide efficient model selection. Neural models are typically evaluated on datasets with thousands of classes, yet many real-world applications involve fewer than ten. To address this understudied but common setting, we develop a measure of classification difficulty based on data-side properties and show how it enables more efficient model selection for few-class datasets, where traditional approaches are less effective. We term this phenomenon "few-class distinctiveness". Our metric allows comparison of models and datasets 6 to 29× faster than repeated training and testing. Leveraging this insight, we extend scaled model families below the smallest published models, achieving greater efficiency at similar accuracy, for example models up to 42% smaller than YOLOv5-nano for a mobile robot task. Targeting resource-constrained applications, we demonstrate few-class model selection across mobile robot, drone, and IoT scenarios, highlighting practical gains in efficiency without sacrificing performance.

2606.19711 2026-06-19 cs.RO cs.LG cs.SY eess.SY New submission

A Differentiable Composite Approximation Framework for Autonomous Underwater Vehicle Maneuvering Modeling from Sea-Trial Data

Aobo Wang, Aifei Xia, Zihao Wang, Lizhu Hao

Affiliations: College of Shipbuilding Engineering, Harbin Engineering University; China Academy of Aerospace Aerodynamics; Institute of Artificial Intelligence, Shanghai University; China Ship Scientific Research Center

AI summary: Proposes a differentiable composite approximation framework that jointly calibrates a polynomial basis and a data-adaptive basis, and adds turning-motion-based ocean-current estimation and compensation, improving the accuracy of AUV maneuvering prediction.

Abstract

Field-based modeling from onboard measurements can produce autonomous underwater vehicle (AUV) maneuvering models that reflect real operating characteristics. From an approximation perspective, conventional maneuvering models use predefined constraint polynomial bases, whereas data-driven models use data-adaptive bases. Motivated by this basis-function view, this paper presents a differentiable composite-approximation formulation, in which the polynomial-basis component and the data-adaptive basis component are treated as differentiable parts of a single predictor and calibrated jointly. A gradient-based co-calibration method is developed for full-scale AUV maneuvering prediction, where a sensitivity-aware mechanism regulates bounded polynomial updates while the neural residual captures remaining nonlinear discrepancies under a shared prediction objective. To account for ocean-current effects in field data, a turning-motion-based current estimation and compensation procedure is incorporated to construct current-compensated learning targets for training and rollout. The framework is evaluated using sea-trial data collected from a 7-meter AUV under multiple maneuvering conditions. Results show that the proposed method improves recursive trajectory and velocity prediction compared with polynomial-only, neural-only, and frozen-prior hybrid baselines, demonstrating its applicability to field-data-based AUV maneuvering modeling.

2606.19710 2026-06-19 cs.CL cs.AI New submission

FineREX: Fine-Tuned NER-RE for Human Smuggling Knowledge Graphs

Elijah Feldman, Dipak Meher, Carlotta Domeniconi

Affiliations: Thomas Jefferson High School for Science and Technology

AI summary: Proposes FineREX, a pipeline built around a fine-tuned LLM that extracts entities and relationships from legal documents to build knowledge graphs, with absolute F1 improvements of 15.50% and 31.46% for entities and relationships, respectively, and a 50% reduction in processing time.

Comments: Code available at https://github.com/ElijahFeldman7/FineREX

Abstract

Court proceedings contain valuable evidence about human smuggling networks, but this information is often buried within unstructured, jargon-heavy legal documents. While large language models (LLMs) can support knowledge graph construction through automated information extraction, existing approaches rely on general-purpose models that are not tailored to the entity and relationship definitions required in this domain. We introduce FineREX, a streamlined knowledge graph construction pipeline built around a fine-tuned LLM for named entity recognition and relationship extraction (NER-RE). Using a manually annotated dataset of 512 text chunks, FineREX achieves absolute improvements of 15.50% and 31.46% in entity and relationship F1-score, respectively, compared to a larger general-purpose baseline. These gains translate into higher-quality knowledge graphs, reducing legal noise by nearly half and lowering node duplication on long documents from 17.78% to 11.17%. By eliminating document rewriting and redundant extraction stages, FineREX also reduces end-to-end processing time by 50.0%. Our results demonstrate that domain-specific fine-tuning can substantially outperform larger general-purpose models while improving both the quality and efficiency of knowledge graph construction for illicit network analysis.
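
For readers unfamiliar with the metric, entity-level F1 is the harmonic mean of precision and recall over extracted items, and an "absolute improvement" is a difference in percentage points rather than a ratio. The sketch below uses exact matching on hypothetical (text, type) pairs; the paper's matching criteria and label set are not given in the abstract.

```python
from typing import Set, Tuple

Entity = Tuple[str, str]  # (surface text, entity type), hypothetical labels


def f1_score(predicted: Set[Entity], gold: Set[Entity]) -> float:
    """Harmonic mean of precision and recall under exact matching."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


gold = {("A", "PERSON"), ("B", "LOCATION"), ("D", "ORGANIZATION")}
baseline = {("A", "PERSON"), ("C", "PERSON")}
fine_tuned = {("A", "PERSON"), ("B", "LOCATION"), ("C", "PERSON")}

# Absolute improvement: a plain difference of the two scores.
absolute_gain = f1_score(fine_tuned, gold) - f1_score(baseline, gold)
```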

2606.19706 2026-06-19 cs.CV cs.CL New submission

Efficiently Representing Algorithms in Chain-of-Thought Transformers

Yanhong Li, Anej Svete, Ashish Sabharwal, William Merrill

Affiliations: Allen Institute for AI; ETH Zürich

AI summary: Proves that chain-of-thought transformers can efficiently simulate Word RAM algorithms, including sorting and Dijkstra's algorithm, with poly-logarithmic overhead, which is better than the quadratic overhead of simulating Turing machines.

Abstract

The increasing popularity of reasoning models (language models that output a series of reasoning or thought tokens before producing an answer) is justified, in part, by theoretical results showing that chain-of-thought (CoT) transformers can simulate Turing machines, and thus perform arbitrary computation. However, the Turing machine, while suitable for complexity-theoretic analysis, is not convenient, intuitive, or efficient for discussing algorithms. Algorithms are typically designed and analyzed at a higher level of abstraction, captured by the Word RAM model with random-access memory and unit-cost operations on $O(\log n)$-bit words. As a result, Word RAM algorithms can be substantially more efficient than their Turing machine counterparts, raising the question: Can CoT transformers efficiently simulate Word RAM algorithms? For instance, can they sort $n$ items in $O(n \log n)$ steps or run Dijkstra's algorithm in $O(E + V \log V)$ steps? We answer affirmatively, up to poly-logarithmic overhead. We first establish this for finite-precision transformers with poly-logarithmic width and rightmost unique hard attention, then strengthen the result to two more practical settings with finite width and log-precision: continuous CoT, where reasoning takes the form of vectors rather than tokens, and a hybrid architecture in which transformer layers sit atop a recurrent (linear RNN) layer. In all three cases, we find that CoT can efficiently simulate any Word RAM algorithm with only a poly-logarithmic overhead in $n$. This overhead reduces to log-square when the Word RAM has a "flat" instruction set, and only logarithmic for multiplication-free flat instructions, in stark contrast to known CoT simulations of Turing machines, which require quadratic overhead over Word RAM.

2606.19688 2026-06-19 cs.SD eess.AS New submission

Latency-Configurable Streaming Speech Enhancement via Asymmetric Temporal Padding

Yunsik Kim, Yoonyoung Chung

Affiliations: Department of Electrical Engineering, Pohang University of Science and Technology (POSTECH); Intus Co. Ltd.

AI summary: Proposes LaCo-SENet, which uses asymmetric temporal padding and dual-buffer streaming to trade latency against quality through a single hyperparameter; on VoiceBank+DEMAND, a 1.37M-parameter backbone covers latencies of 12.5-75.0 ms with PESQ rising from 3.35 to 3.43.

Comments: 5 pages, 3 figures. Accepted for presentation at Interspeech 2026

Abstract

流式语音增强需要在算法延迟和质量之间取得平衡，但现有方法大多将其视为因果与非因果的二元选择。LaCo-SENet通过单个训练时超参数参数化的两种机制解决了这个问题。首先，非对称时间填充重新分配卷积中的过去和未来上下文，实现系统性的延迟配置。其次，双缓冲流式结合了过去上下文的状体缓冲区和在输入和特征层面提供未来上下文的超前缓冲区。选择性状态更新还防止未来帧泄漏到流式状态中，确保训练-推理一致性。在VoiceBank+DEMAND上，固定预算（1.37M参数）的主干网络产生了覆盖12.5-75.0毫秒的模型系列，PESQ从3.35上升到3.43。在仅12.5毫秒（完全因果）时，PESQ为3.35，达到或超过了先前的因果最先进水平（46.5毫秒时为3.27）。

英文摘要

Streaming speech enhancement requires balancing algorithmic latency against quality, yet existing approaches largely treat this as a binary causal versus non-causal choice. LaCo-SENet addresses this issue with two mechanisms parameterized by a single training-time hyperparameter. First, asymmetric temporal padding redistributes past and future context in convolutions, enabling systematic latency configuration. Second, dual-buffer streaming combines state buffers for past context with lookahead buffers that supply future context at both the input and feature levels. Selective state updates also prevent future-frame leakage into the streaming state, ensuring training-inference consistency. On VoiceBank+DEMAND, a fixed-budget (1.37M parameters) backbone yields a family of models spanning 12.5-75.0 ms, with PESQ rising from 3.35 to 3.43. At just 12.5 ms (fully causal), a PESQ of 3.35 matches or exceeds the prior causal state-of-the-art (3.27 at 46.5 ms).
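
Asymmetric temporal padding can be shown on a single 1-D convolution: the same kernel is applied, but the zero padding is split unevenly so that each output frame sees a chosen number of future frames. The sketch below is a minimal plain-Python illustration, not LaCo-SENet's layers; the `future` parameter is a hypothetical stand-in for the paper's hyperparameter, and the dual-buffer streaming logic is not shown.

```python
from typing import List, Sequence


def conv1d_asymmetric(
    signal: Sequence[float], kernel: Sequence[float], future: int
) -> List[float]:
    """Same-length 1-D correlation with configurable lookahead.

    Output frame t depends on input frames t - past .. t + future,
    where past = len(kernel) - 1 - future. future=0 is fully causal.
    """
    k = len(kernel)
    if not 0 <= future <= k - 1:
        raise ValueError("future must be between 0 and len(kernel) - 1")
    past = k - 1 - future
    padded = [0.0] * past + list(signal) + [0.0] * future
    return [
        sum(w * padded[t + i] for i, w in enumerate(kernel))
        for t in range(len(signal))
    ]


signal = [1.0, 2.0, 3.0, 4.0]
kernel = [1.0, 1.0, 1.0]
causal = conv1d_asymmetric(signal, kernel, future=0)      # no lookahead
one_ahead = conv1d_asymmetric(signal, kernel, future=1)   # one frame of lookahead
```

With `future=0` the first three outputs are unaffected by the last input frame, which is the causality property; each extra lookahead frame trades added algorithmic latency for future context.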

2606.19687 2026-06-19 cs.RO New submission

Route-Constrained Robust Fusion Estimation for MEMS/GNSS Integrated Navigation of Unmanned Ground Vehicles in GNSS Degraded Environments

Jingzhi Cui, Chao Zhang, Yuliang Mao, Shaolin Lü, Dongmei Li, Huan Che, Rong Zhang

Affiliations: State Key Laboratory of Precision Space-time Information Sensing Technology, Tsinghua University; Xiaomi Inc.

AI summary: To address cumulative localization drift of unmanned ground vehicles in structured road environments under severe GNSS signal occlusion, the paper proposes a robust route-constrained state estimation method that matches the historical dead-reckoning trajectory against a high-definition map to generate pseudo-position observations, and continuously injects road-level constraints through an Extended Kalman Filter, suppressing position deviation and improving azimuth estimation.

Comments: Accepted workshop paper, 1st Workshop on Robot Meets GNSS and Ranging for Seamless Autonomy, IEEE ICRA 2026

Journal ref: 1st Workshop on Robot Meets GNSS and Ranging for Seamless Autonomy, IEEE ICRA 2026, Vienna, Austria, June 5, 2026

Abstract

To address cumulative localization drift of unmanned ground vehicles in structured road environments under severe Global Navigation Satellite System signal occlusion, this paper proposes a robust route-constrained state estimation method. During periods without satellite signals, the proposed method establishes the correspondence between the historical dead reckoning trajectory and local segments of the mission route extracted from a high-definition map, and estimates a route-referenced position via a two-dimensional rigid transformation. The estimated position is then formulated as a pseudo-position observation and incorporated into an Extended Kalman Filter update. In this way, route constraints at the road level can be continuously injected into a unified state estimation framework, thereby suppressing position deviation relative to the mission route while indirectly improving azimuth estimation. To enhance practical applicability, engineering strategies, such as trigger control, matching quality validation, route offset compensation, and single update correction limiting, are further introduced. Experiments in three representative scenarios, including a long tunnel, a multi-segment tunnel, and a curved tunnel, show that the proposed method effectively suppresses error accumulation during satellite outages, reduces the risk of large maximum deviation, and improves localization continuity and road-level usability.
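
The two-dimensional rigid transformation step has a closed-form least-squares solution once point correspondences are fixed. The sketch below assumes the dead-reckoning points and the matched route points are already paired one to one; how the paper establishes that correspondence, validates the match, and feeds the result into the Extended Kalman Filter is not shown. The function names are illustrative.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def fit_rigid_2d(src: List[Point], dst: List[Point]) -> Tuple[float, float, float]:
    """Least-squares rotation angle and translation mapping src onto dst."""
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    s_dot = 0.0
    s_cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay = px - sx, py - sy  # centered source point
        bx, by = qx - dx, qy - dy  # centered destination point
        s_dot += ax * bx + ay * by
        s_cross += ax * by - ay * bx
    theta = math.atan2(s_cross, s_dot)
    c, s = math.cos(theta), math.sin(theta)
    tx = dx - (c * sx - s * sy)
    ty = dy - (s * sx + c * sy)
    return theta, tx, ty


def apply_rigid_2d(point: Point, theta: float, tx: float, ty: float) -> Point:
    """Rotate by theta, then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    x, y = point
    return (c * x - s * y + tx, s * x + c * y + ty)


dead_reckoning = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
route_segment = [(5.0, 1.0), (5.0, 2.0), (5.0, 3.0)]
theta, tx, ty = fit_rigid_2d(dead_reckoning, route_segment)
pseudo_position = apply_rigid_2d(dead_reckoning[-1], theta, tx, ty)
```

Under this reading, the transformed latest dead-reckoning point plays the role of the route-referenced position that the abstract treats as a pseudo-position observation.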

2606.19684 2026-06-19 cs.CV New submission

Exploring Multi-Modal Large Language Models and Two-Stage Fine-Tuning for Fashion Image Retrieval

Nguyen Cao Hoang, Hoang Bui Le, Nam Vo Hoang, Trung-Nghia Le

Affiliations: University of Science, VNU-HCM; Vietnam National University, Ho Chi Minh

AI summary: Proposes using a multi-modal large language model (LLaVA) to generate attribute-aware triplets and adopting a two-stage fine-tuning strategy to strengthen contrastive learning, addressing the scarcity of labeled data and overly simple negative sampling in fashion image retrieval.

Comments: SOICT 2025

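2606.19683 2026-06-19 cs.AI cs.MA cs.SY eess.SY New submission

Exit-and-Join Dynamics for Decentralized Coalition Formation

Quanyan Zhu

Affiliations: New York University Tandon School of Engineering; Department of Electrical and Computer Engineering

AI summary: Studies decentralized coalition formation dynamics based on unilateral exit-and-join decisions, using the Aumann-Dreze value to compute individual payoffs, linking cooperative payoff allocation to non-cooperative best responses, and analyzing equilibrium characterization and the effect of costs on local stability.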