arXivDaily arXiv每日学术速递 周一至周五更新
全部学科分类 2094
专题追踪
2606.19971 2026-06-19 cs.RO 新提交

Evaluation of Augmented Reality-based Intuitive Interface for Robot-Assisted Transesophageal Echocardiography: A User Study

基于增强现实的机器人辅助经食管超声心动图直观界面评估:用户研究

Xiu Zhang*, Matteo Di Mauro*, Sofia Breschi, Angela Peloso, Emiliano Votta, Arianna Menciassi, Elena De Momi

AI总结 本研究提出并评估了一种基于增强现实的直观界面,用于机器人辅助经食管超声心动图,通过3D可视化与尖端控制显著提升空间精度并降低操作误差。

详情
AI中文摘要

经食管超声心动图(TEE)对于诊断和引导结构性心脏病(SHD)介入治疗至关重要。然而,手动TEE操作需要操作者具备丰富的专业技能,体力消耗大,并且在透视下操作会使临床医生暴露于辐射中。机器人辅助TEE系统已被引入以改进探头操作并减少操作者疲劳,但直观有效的用户界面设计仍是一个开放挑战。本研究提出并评估了一种模型增强的、基于增强现实(AR)的直观界面,用于机器人辅助TEE,旨在提高空间意识和控制直观性。使用集成电磁跟踪和虚拟模拟器的机器人TEE平台,比较了三种在可视化和交互模式上不同的用户界面:2D关节级(2D-JI)、3D关节级(3D-JI)和3D尖端级(3D-TI)。36名参与者执行标准化导航任务以再现目标超声心动图视图,通过位置和方向误差、完成时间和NASA-TLX工作量评分评估性能。结果表明,3D可视化显著提高了空间精度,与2D界面相比,中位位置误差从13毫米减少到3毫米,方向误差减半。尖端级交互相比关节级控制,方向误差进一步降低50%,并减少了用户间变异性。总体而言,3D-TI配置结合了沉浸式可视化与直接尖端级控制,被证明是最有效且符合人体工程学的界面,支持将基于AR的可视化和直观控制范式集成到下一代机器人TEE系统中,以增强操作者性能和手术安全性。

英文摘要

TransEsophageal Echocardiography (TEE) is essential for diagnosing and guiding Structural Heart Disease (SHD) interventions. However, manual TEE manipulation demands significant operator expertise, is physically demanding, and exposes clinicians to radiation when performed alongside fluoroscopy. Robotic-assisted TEE systems have been introduced to improve probe handling and reduce operator fatigue, yet the design of intuitive and effective user interfaces remains an open challenge. This study presents and evaluates a model-enhanced, Augmented Reality (AR)-based intuitive interface for robot-assisted TEE, designed to improve spatial awareness and control intuitiveness. A robotic TEE platform integrated with electromagnetic tracking and a virtual simulator was used to compare three user interfaces differing in visualization and interaction modalities: 2D jointlevel (2D-JI), 3D joint-level (3D-JI), and 3D tip-level (3D-TI). Thirty six participants performed standardized navigation tasks to reproduce target echocardiographic views, with performance assessed via position and orientation errors, completion time, and NASA-TLX workload scores. Results show that 3D visualization significantly improved spatial accuracy, reducing median position error from 13 mm to 3 mm and halving the orientation error compared with the 2D interface. Tip-level interaction yielded a further 50% reduction in orientation error and reduced interuser variability relative to joint-level control. Overall, the 3D-TI configuration, combining immersive visualization with direct tip-level control, proved the most effective and ergonomic interface, supporting the integration of AR-based visualization and intuitive control paradigms into next-generation robotic TEE systems to enhance operator performance and procedural safety.

2606.19946 2026-06-19 cs.CL cs.LG 新提交

GEMS: Geometric Constraints Enable Multi-Semantic Superposition in LLMs

GEMS: 几何约束使LLM中多语义叠加成为可能

Yu Deng

AI总结 提出GEMS方法,通过范数保持加权叠加、目标注意力路径注入和实时正交化两个几何约束,解决无训练多方向激活干预中的分布偏差和方向干扰问题,在GSM8K上保持98%准确率。

Comments 30 pages, 5 figures, 20 tables. Code and logs are available at: https://github.com/LuLu663939/gems-multi-semantic-steering

详情
AI中文摘要

激活引导通过在推理时修改中间隐藏状态来控制模型行为,无需重新训练。现有方法仅处理单方向注入;当多个语义方向无约束叠加时,模型崩溃。我们证明这种崩溃分解为两个独立作用的来源:分布偏差(加法扰动在层间累积范数并将激活推出训练分布)和方向干扰(非正交语义向量叠加时相互抑制)。这两个来源定义了任何无训练多方向干预必须满足的设计约束。作为这些原则的一个实例,我们提出GEMS,一种无训练方法,将每个来源映射到相应的几何约束:针对分布偏差的范数保持加权叠加和目标注意力路径注入,以及针对方向干扰的实时正交化。在GSM8K上,注入三个并发非数学方向保持98%的准确率(基线92%),而无约束加法崩溃至4%;在Wikitext-2上,相同注入仅导致2.2%的PPL增加。组件消融隔离了每个约束的因果作用,层级探针确认正交化信号通过FFN路径存活并以语义特异性到达输出分布。定性引导效果跨架构从3B到31B迁移。

英文摘要

Activation steering controls model behavior by modifying intermediate hidden states at inference time without retraining. Existing methods handle only single-direction injection; when multiple semantic directions are superposed without constraints, the model collapses. We show that this collapse decomposes into two independently acting sources: distributional deviation, where additive perturbations accumulate in norm across layers and drive activations outside the training distribution, and directional interference, where non-orthogonal semantic vectors mutually dampen when superposed. These two sources define the design constraints that any training-free multi-directional intervention must address. As one instantiation of these principles, we propose GEMS, a training-free method that maps each source to a corresponding geometric constraint: norm-preserving weighted superposition and targeted attention-pathway injection for distributional deviation, and real-time orthogonalization for directional interference. On GSM8K, injecting three concurrent non-mathematical directions preserves accuracy at 98% (baseline 92%), while unconstrained addition collapses to 4%; on Wikitext-2, the same injection incurs only 2.2% PPL increase. Component ablation isolates the causal role of each constraint, and layer-level probes confirm that orthogonalized signals survive the FFN pathway and reach the output distribution with semantic specificity. Qualitative steering effects transfer across architectures from 3B to 31B.

2606.19769 2026-06-19 cs.RO cs.AI 新提交

Data Standards for Humanoid Robotics: The Missing Infrastructure for Physical AI

人形机器人数据标准:物理AI缺失的基础设施

Shaoshan Liu, Xiugong Qin, Xuan Wu, Xuan Xia, Ning Ding, Jialu Liu, Jie Tang

AI总结 本文论证数据标准是人形机器人可扩展性的关键基础设施,通过提出ISO/WD 26264-1标准,解决数据非累积性问题,使具身经验可解释、可共享、可追溯和可复用。

详情
AI中文摘要

人形机器人的可扩展性不仅取决于模型和硬件,还取决于物理经验能否在机器人、任务、组织及时间维度上积累。基于作者在ISO/TC 299/WG 16内制定ISO/WD 26264-1《人形机器人数据集——第1部分:通用要求》的工作,本文论证数据标准正成为物理AI的基础设施。我们提出三个见解:第一,人形机器人数据是具身交互数据,而非孤立数字样本的集合;有用的数据集必须保留机器人本体、动作、任务、场景、执行轨迹和结果之间的关系。第二,其价值取决于物理一致性:多模态流仅在时序、坐标系、标定、运动学、单位和同步假设可检查时才可复用。第三,主要瓶颈不仅是数据稀缺,更是由高采集成本、数据孤岛和不一致评估导致的非累积性数据。我们认为人形机器人数据标准通过使具身经验可解释、可共享、可追溯和可复用来解决这些瓶颈。通用标准应为生命周期管理、元数据、来源、质量、版本控制和可追溯性提供横向基础设施,而能力特定部分应定义操作、移动、人机交互、认知及未来人形能力的领域语法。随着AI从屏幕进入实体,数据标准必须从组织数字信息演变为结构化物理交互。

英文摘要

The scalability of humanoid robots will depend not only on models and hardware, but also on whether physical experience can accumulate across robots, tasks, organizations, and time. Drawing on the authors' work in developing ISO/WD 26264-1, Humanoid robot datasets -- Part 1: General requirements, within ISO/TC 299/WG 16, this article argues that data standards are becoming foundational infrastructure for Physical AI. We develop three insights. First, humanoid robot data is embodied interaction data, not a collection of isolated digital samples; a useful dataset must preserve the relationship among robot body, action, task, scene, execution trace, and outcome. Second, its value depends on physical coherence: multimodal streams are reusable only when timing, coordinate frames, calibration, kinematics, units, and synchronization assumptions remain inspectable. Third, the main bottleneck is not only data scarcity, but non-cumulative data caused by high collection costs, data silos, and inconsistent evaluation. We argue that humanoid robot data standards address these bottlenecks by making embodied experience interpretable, shareable, traceable, and reusable. A general standard should provide horizontal infrastructure for lifecycle management, metadata, provenance, quality, versioning, and traceability, while capability-specific parts should define domain grammar for manipulation, locomotion, human-robot interaction, cognition, and future humanoid capabilities. As AI moves from screens into bodies, data standards must evolve from organizing digital information to structuring physical interaction.

2606.19690 2026-06-19 cs.LG 新提交

Multi-Granular Attention-Driven Reinforcement Learning Framework for Web Intelligent Enhancement Systems

多粒度注意力驱动的强化学习框架用于Web智能增强系统

Navin Chhibber, Deepak Singh, Anokh Kishore, Nikita Chawla, K. Anguraj

AI总结 提出MGAR-WIES框架,通过语义图建模、注意力机制和自适应强化学习,解决Web环境中异构动态数据的语义理解与可扩展性问题,在准确率上达到80%。

Comments 2026 3rd International Conference on Integrated Intelligence and Communication Systems (ICIICS), 6 Pages

详情
AI中文摘要

近年来,Web智能增强系统越来越依赖异构和动态的Web数据来提供个性化的上下文感知服务。然而,传统的机器学习、深度学习和强化学习模型在持续演化的Web环境中往往难以应对语义理解、适应性和可扩展性的挑战。本研究提出了一种基于多粒度注意力的强化Web智能增强系统(MGAR-WIES),通过集成语义图建模、注意力机制和自适应强化学习来应对这些挑战。首先,收集包括结构化、半结构化和非结构化来源的异构Web数据,并进行预处理以生成统一特征表示。这些表示被转换为动态语义图,其中实体及其关系通过注意力机制增强的图嵌入进行建模,以捕捉局部相关性和全局上下文依赖。随后,一种自适应多智能体强化学习策略利用注意力感知的语义状态来优化个性化Web动作,如内容推荐、导航优化和服务自适应。最后,持续在线反馈被进一步集成,以实时更新图表示和学习策略,确保持续的适应性和性能。与现有方法相比,提出的MGAR-WIES在准确率(80%)方面取得了更好的结果。

英文摘要

From the past few years, web intelligent enhancement systems increasingly rely on heterogeneous and dynamic web data to deliver personalized, context-aware services. However, traditional machine learning, deep learning, and reinforcement learning models often struggle with semantic understanding, adaptability, and scalability in continuously evolving web environments. In this research, a Multi-Granular Attention-based Reinforcement Web Intelligent Enhancement System (MGAR-WIES) is proposed to address the challenges by integrating semantic graph modeling, attention mechanisms, and adaptive reinforcement learning. Initially, heterogeneous web data comprising structured, semi-structured and unstructured sources are collected and preprocessed for generating unified feature representations. These representations are transformed into a dynamic semantic graph, where entities and their relationships are modeled by using graph embeddings enhanced by attention mechanisms for capturing both local relevance and global contextual dependencies. Subsequently, an adaptive multi-agent reinforcement learning strategy leverages the attention-aware semantic states to optimize personalized web actions like content recommendation, navigation optimization, and service adaptation. Finally, the continuous online feedback is further integrated to update graph representations and learning policies in real time by ensuring sustained adaptability and performance. The proposed MGAR-WIES acheived better results in terms of accuracy (80%) when compared with existing approaches.

2606.19676 2026-06-19 cs.CV cs.AI 新提交

TeleMorpher: Toward Robust Simultaneous Motion-Location Editing

TeleMorpher: 迈向鲁棒的同步运动-位置编辑

Haengbok Chung

AI总结 提出TeleMorpher,一种基于扩散模型的一步式框架,通过运动先验、姿态扭曲和基线运动编辑器注入,实现视频中主角运动与位置的同步编辑,在定量和定性评估中表现优异。

详情
AI中文摘要

扩散模型在图像和视频生成与编辑中取得了显著成功。尽管最近的研究将工作扩展到运动编辑,但同步变换运动与位置——尽管具有实际重要性——仍基本未被探索。为了更好地理解鲁棒的运动-位置编辑,我们首先分析了降低其质量的根本因素。基于此分析,我们提出了TeleMorpher,据我们所知,这是首个用于同步运动-位置编辑的一步式框架之一。我们的方法利用运动先验(从现成模型生成的目标运动中心视频作为运动编辑指导)和真实运动,实现更可控和精确的运动-位置编辑。通过这种方式,我们的框架工作如下:(1) 首先通过预训练的分割和修复模型分离主角和背景。(2) 然后,我们引入一种无需训练的姿势扭曲,以运动先验为指导编辑主角的运动。(3) 扭曲运动视频的结果在推理时直接注入基线运动编辑器,减轻源运动与目标运动之间的差异,同时保留源视频的外观。(4) 为提高定量评估的可靠性,我们提出了两个新的基于LPIPS的指标,分别测量运动编辑前后背景一致性以及通过测量从源视频和目标视频中提取的主角骨架差异来评估运动编辑性能的保真度。在野外视频和TaiChi数据集上的实验表明,TeleMorpher在定量和定性测量(真实人类评估)中均取得了优越性能,凸显了其有效性。

英文摘要

Diffusion models have achieved remarkable success in image and video generation and editing. While recent studies have extended these efforts toward motion editing, simultaneously transforming both motion and location-despite its practical importance-remains largely unexplored. To better understand robust motion-location editing, we first analyze the fundamental factors that degrade its quality. Based on this analysis, we propose TeleMorpher, one of the first one-shot frameworks to the best of our knowledge, for simultaneous motion-location editing. Our approach leverages motion priors, a target motion-centric video generated from an off-the-shelf model as motion-editing guidance, and the ground truth motion to enable more controllable and precise motion-location editing. Via this, our framework works as follows: (1) we first disentangle the protagonist and the background via pre-trained segmentation and inpainting models. (2) Then, we introduce a training-free pose warping that edits the protagonist's motion with the motion prior as the guidance. (3) The result of warped motion video is directly injected into a baseline motion editor during inference, mitigating the difference between source and target motions while preserving the appearance of the source video. (4) To enhance the reliability of quantitative evaluations, we propose two new LPIPS-based metrics that measure the background consistency before and after the motion editing and the fidelity of motion editing performance via measuring the difference between the extracted protagonist's skeletons from source and target videos. Experiments with in-the-wild videos and the TaiChi dataset demonstrate that TeleMorpher achieves superior performance across both quantitative and qualitative measurements (real-human evaluation), underscoring its effectiveness.

2606.19638 2026-06-19 cs.CL 新提交

MiqraBERT: Regression-Based Sentence-BERT Finetuning for Biblical Hebrew Parallel Detection

MiqraBERT:基于回归的Sentence-BERT微调用于圣经希伯来语平行检测

David M. Smiley

AI总结 提出MiqraBERT模型,通过余弦相似度回归微调Sentence-BERT,在圣经希伯来语中检测文本平行,将分布分离度提升2.7倍,重叠区域从24%降至6%。

详情
AI中文摘要

文本复用遍及希伯来圣经,但用于检测的计算方法仍主要依赖词汇重叠,一旦平行涉及释义、词汇替换或句法重组,这些方法就会失效。本文介绍MiqraBERT,一个从AlephBERT(现代希伯来语编码器)微调而来的Sentence-BERT模型,用于圣经希伯来语的诗句级语义相似度。训练集包含1,650个标注的诗句和半诗句对:825个来自编年史同源材料和诗歌平行基础研究的真实平行,与825个随机采样的负例平衡。通过余弦相似度回归,模型学习到一个嵌入空间,其中平行诗句聚集在一起,无关诗句彼此远离。我们使用基于分布的指标、Wasserstein距离和重叠系数,在十个随机种子上评估分离度。MiqraBERT将分布分离度比预训练基线提高了2.7倍,并将模糊重叠区域从约24%减少到约6%。叙事同源平行的召回@10达到87.1%;诗歌平行仍然困难,低于9%。这种依赖于体裁的不对称性将模型的可靠范围限制在叙事文本复用。MiqraBERT在此https URL公开可用。

英文摘要

Textual reuse pervades the Hebrew Bible, yet the computational methods used to detect it still rest largely on lexical overlap, and they falter once a parallel involves paraphrase, lexical substitution, or syntactic reworking. This paper introduces MiqraBERT, a Sentence-BERT model finetuned from AlephBERT (a Modern Hebrew encoder) for verse-level semantic similarity in Biblical Hebrew. The training set comprises 1,650 labeled verse and half-verse pairs: 825 true parallels drawn from the Chronicles synoptic material and from foundational studies of poetic parallelism, balanced against 825 randomly sampled negatives. Through cosine-similarity regression, the model learns an embedding space in which parallel verses cluster together and unrelated verses move apart. We evaluate separation with distribution-based metrics, Wasserstein distance and the overlap coefficient, across ten random seeds. MiqraBERT improves distributional separation 2.7-fold over the pre-trained baseline and reduces the ambiguous overlap region from roughly 24% to about 6%. Narrative synoptic parallels reach a recall@10 of 87.1%; poetic parallels remain difficult, below 9%. This genre-dependent asymmetry confines the model's reliable scope to narrative textual reuse. MiqraBERT is publicly available at https://huggingface.co/davidmsmiley/MiqraBERT

2606.19568 2026-06-19 cs.SD cs.AI 新提交

Exploring Feature Extraction Technique Parameters for Acoustic Gunshot Classification

声学枪声分类的特征提取技术参数探索

Sinclair Gurny, Ryan Quinn

AI总结 本文系统研究了特征提取技术及其参数对声学枪声分类的影响,使用ResNet-18在23000条枪声数据集上评估,发现正确技术可提升top-1准确率20%,参数优化可再提升4.7%。

详情
AI中文摘要

声学枪声检测是一个在民用公共安全、军事行动和野生动物保护中都有应用的问题,但该领域缺乏对特征提取技术的严格探索,且未关注对现实数据的泛化能力。商业枪声检测与分类系统的混合有效性表明,当前文献未能充分解决这一开放问题。在本文中,我们使用包含85种枪械和21种口径的23000条枪声记录数据集,对常见特征提取技术进行了系统研究。我们使用ResNet-18对三种特征提取技术及其12个独特参数集进行了基准测试。结果表明,使用正确的特征提取技术可将top-1准确率提升高达20%,而针对给定特征提取技术使用正确的参数可进一步提升高达4.7%。

英文摘要

Acoustic gunshot detection is a problem with applications across civilian public safety, military operations, and wildlife conservation, yet the field lacks a rigorous exploration of feature extraction techniques with a focus on generalization to realistic data. The mixed effectiveness of commercial gunshot detection and classification systems indicates an open problem that is not adequately addressed by the current literature. In this paper, we present a systematic investigation of common feature extraction techniques using a dataset of 23,000 gunshot recordings across 85 firearms and 21 calibers. We benchmark three feature extraction techniques with 12 total unique parameter sets using ResNet-18. Our results demonstrate that using the correct feature extraction technique can improve top-1 accuracy by up to 20%, and utilizing the correct parameters for a given feature extraction technique can improve that value by up to 4.7%.

2606.19525 2026-06-19 cs.RO 新提交

A Categorial and Sheaf-Theoretic Semantics for Autonomic Component Ensembles

自主组件集合的范畴与层论语义

Manuel Hernández, Eduardo Sánchez-Soto

AI总结 针对自主组件集合语言SCEL,提出基于范畴论和层论的多层数学模型,将机器人社会建模为拓扑空间上的层,通过层上同调量化系统故障,将分布式系统验证转化为几何分析。

详情
AI中文摘要

大规模、去中心化的自主代理系统(如机器人集群和网络化信息物理系统)的激增对传统形式化方法提出了严峻挑战。软件组件集合语言(SCEL)为这类系统提供了形式化模型,但其操作语义不适合推理全局、结构和涌现属性。本报告利用范畴论和层论为SCEL提出了一种新的多层数学模型。我们认为,用SCEL描述的机器人社会可以形式化地建模为拓扑空间上的层,其中组件是点,集合是开集,分布式知识构成层的数据。在此框架下,信息共享等计算过程等价于“粘合”局部数据的层论操作。系统故障可以被理解并量化为拓扑障碍,通过层上同调可测量。该方法将复杂分布式系统的验证转化为数学对象的几何分析,为设计鲁棒的自主系统提供了深刻的结构性见解。

英文摘要

The proliferation of large-scale, decentralized systems of autonomous agents, such as swarms of robots and networked cyber-physical systems, presents a formidable challenge to traditional formal methods. The Software Component Ensemble Language (SCEL) offers a formal model for such systems, but its operational semantics is not ideal for reasoning about global, structural, and emergent properties. This report proposes a new, multi-layered mathematical model for SCEL using category theory and sheaf theory. We argue that a society of robots described in SCEL can be formally modeled as a sheaf on a topological space, where components are points, ensembles are open sets, and distributed knowledge forms the sheaf's data. In this framework, computational processes like information sharing become equivalent to the sheaf-theoretic operation of "gluing" local data. System failures can then be understood and quantified as topological obstructions, measurable by sheaf cohomology. This approach transforms the verification of a complex distributed system into the analysis of the geometry of a mathematical object, providing deep, structural insights for the design of robust autonomic systems.

2606.19397 2026-06-19 cs.RO 新提交

DiffusionVS: A Generative Framework for Robust Visual Servoing Based on Diffusion Policy

DiffusionVS:基于扩散策略的鲁棒视觉伺服生成框架

Hongkang Cui, Rui He, Haoyao Chen

AI总结 提出基于扩散策略的视觉伺服方法,通过条件去噪生成相机速度,并采用在线训练增强泛化能力,仿真成功率近100%,物理实验93%。

Comments 8 pages, 4 figures, 7 tables

详情
AI中文摘要

视觉伺服是机器人操作和导航中的基础技术。基于回归的视觉伺服常因噪声敏感的单步映射和分布偏移时的误差累积而出现轨迹抖动。相比之下,扩散策略通过预测动作序列保持时间一致性,并通过隐式数据增强提高鲁棒性。本文提出一种新颖的基于扩散的伺服方法。基于扩散策略,该方法使用观测标签角点的归一化图像坐标作为输入,通过条件去噪生成相机速度。为了克服在静态数据集上训练的模型的泛化限制,采用了在线训练范式,通过交互经验收集持续扩展训练数据的多样性。该策略显著提升了模型的性能和泛化能力。全面的仿真和实际实验证明了该方法的有效性,在仿真中实现了近100%的成功率,在物理实验中达到93%。除了具体的流程,我们进一步验证了扩散机制的通用性。实验表明,现有的视觉伺服网络在与我们的扩散模块集成时,性能持续提升。这些结果表明,所提出的策略具有广泛的适用性,能够增强除本文具体架构之外的各种视觉伺服系统。

英文摘要

Visual servoing is a fundamental technique in robotic manipulation and navigation. Regression-based visual servoing frequently experiences trajectory jitter as a result of noise-sensitive single-step mappings and the accumulation of errors during distribution shifts. In contrast, Diffusion Policy maintains temporal consistency by predicting action sequences and improves robustness through implicit data augmentation. This paper presents a novel diffusion-based servoing method. Based on Diffusion Policy, the proposed approach uses normalized image coordinates of observed tag corners as input and generates camera velocity through conditional denoising. To overcome the generalization limitations of models trained on static datasets, an online training paradigm is adopted, continuously expanding the diversity of training data through interactive experience collection. This strategy substantially enhances both the performance and generalization capability of the model. Comprehensive simulations and real-world experiments demonstrate the effectiveness of the proposed method, achieving success rates of nearly 100\% in simulation and 93\% in physical experiments. Beyond the specific pipeline, we further validate the generality of the diffusion mechanism. Experiments show that existing visual servoing networks consistently achieve improved performance when integrated with our diffusion-based module. These results indicate that the proposed strategy possesses broad applicability and can enhance various visual servoing systems beyond the specific architecture presented here.

2606.20325 2026-06-19 cs.LG cs.SC math.DS 新提交

Recurrent neural networks approximate continuous functions

递归神经网络近似连续函数

Valentin Abadie, Clemens Hutter, Helmut Bölcskei

AI总结 本文证明,对于[-1,1]上的任意连续函数,存在一个固定权重和隐藏维度的ReLU递归神经网络,其时间演化可以均匀逼近该函数,并给出了收敛速率和极小极大下界。

详情
AI中文摘要

经典逼近定理要求每当目标精度提高时,就需要一个新的神经网络。本文研究相反的可能性:能否一劳永逸地选择网络,而仅通过让其运行更长时间来换取精度?我们证明这对于[-1,1]上的每个连续函数都是可能的。更准确地说,每个这样的函数都可以通过一个具有固定权重和固定隐藏维度的单ReLU递归神经网络的时间演化来均匀逼近。该构造背后的机制是一个新的中间模型——带神经单元的图灵机(TMNU)。该模型保留了实现多项式逼近方案所需的算法自由度,同时保持足够的刚性,以便被具有显式隐藏维度和权重幅度界限的RNN模拟。由此产生的收敛速率反映了底层多项式逼近的速率。我们通过极小极大下界补充了该构造,表明运行时间不仅仅是证明的产物,而是这种固定网络逼近范式中不可避免的资源。

英文摘要

Classical approximation theorems ask for a new neural network whenever the target accuracy is improved. This paper studies the opposite possibility: can the network be chosen once and for all, and can accuracy be bought only by letting it run longer? We prove that this is possible for every continuous function on [-1,1]. More precisely, each such function is uniformly approximated by the time evolution of a single ReLU recurrent neural network with fixed weights and fixed hidden dimension. The mechanism behind the construction is a new intermediate model, the Turing machine with neural units (TMNU). This model retains the algorithmic freedom needed to implement polynomial approximation schemes, while remaining rigid enough to be simulated by RNNs with explicit bounds on hidden dimension and weight magnitude. The resulting convergence rates reflect the underlying polynomial approximation rates. We complement the construction with minimax lower bounds showing that runtime is not merely a proof artifact, but an unavoidable resource in this fixed-network approximation paradigm.

2606.19876 2026-06-19 cs.LG math.OC 新提交

Global Convergence of Gradient Descent for Score Matching in Gaussian Mixtures via Reverse Fisher Divergence

通过反向Fisher散度实现高斯混合模型中得分匹配的梯度下降全局收敛

Alexander Tyurin

AI总结 研究反向Fisher散度下梯度下降拟合高斯混合模型的全局收敛性,证明从任意初始化或随机初始化下学生分量收敛到最近教师分量,并给出全变差距离收敛条件。

详情
AI中文摘要

得分匹配问题是现代生成建模、扩散模型、拟合非归一化统计模型和逆问题中的核心训练目标。标准方法是最小化前向Fisher散度,其中期望相对于教师分布取。然而,最近结果表明,即使在简单的高斯混合模型设置中,该目标也可能导致不良且依赖初始化的收敛行为。本文研究另一种目标:反向Fisher散度,其中期望相对于学生分布取。我们分析梯度下降(GD)拟合高斯混合模型,并表明目标函数的这一改变导致显著更好的优化性质。首先,当教师分布是单个高斯分布且学生是固定权重和单位协方差的高斯混合模型时,我们证明了从任意初始化出发GD的全局收敛性。其次,我们将分析扩展到教师也是高斯混合模型的情况,并在全局随机初始化方案和目标均值满足$\widetilde{\Omega}(1)$-分离假设下证明了全局收敛保证。特别地,以高概率,每个学生分量收敛到其最近的教师分量,并且我们提供了学生分布在全变差距离下收敛的条件。我们的证明依赖于基于Lyapunov的梯度下降动力学新分析,表明反向Fisher散度比前向Fisher散度具有更有利的优化景观。

英文摘要

The score matching problem is a central training objective in modern generative modeling, diffusion models, fitting unnormalized statistical models, and inverse problems. A standard approach is to minimize the forward Fisher divergence, where the expectation is taken with respect to the teacher distribution. However, recent results show that even in simple Gaussian mixture model settings, this objective can lead to undesirable and initialization-dependent convergence behavior. In this paper, we study an alternative objective: the reverse Fisher divergence, where the expectation is taken with respect to the student distribution. We analyze gradient descent (GD) for fitting Gaussian mixture models and show that this change in the objective leads to significantly better optimization properties. First, when the teacher distribution is a single Gaussian and the student is a Gaussian mixture model with fixed weights and identity covariances, we prove the global convergence of GD from arbitrary initializations. Second, we extend the analysis to the case where the teacher is also a Gaussian mixture model and prove global convergence guarantees under a global random initialization scheme and a $\widetildeΩ(1)$-separation assumption on the target means. In particular, with high probability, each student component converges near its closest teacher component, and we provide conditions under which the student distribution converges in total variation distance. Our proofs rely on a new Lyapunov-based analysis of the gradient descent dynamics, showing that the reverse Fisher divergence has a much more favorable optimization landscape than the forward Fisher divergence.

2606.20347 2026-06-19 cs.LG cond-mat.dis-nn 新提交

Critical Percolation as a Synthetic Data Model for Interpretability

临界渗流作为可解释性的合成数据模型

Aryeh Brill, Tom Ingebretsen Carlson

AI总结 提出基于临界平均场渗流簇的层次函数合成数据集,具有稀疏、分形和幂律分布特性,支持几乎线性时间算法生成任意规模数据,可用于评估可解释性方法。

Comments 21 pages, 10 figures, accepted to the Mechanistic Interpretability Workshop at ICML 2026

详情
AI中文摘要

神经网络学习反映自然数据层次化、多尺度结构的特征。用于评估可解释性方法的合成数据集通常缺乏这种结构,限制了其作为现实玩具模型的价值。为弥补这一差距,我们引入了一系列合成数据集,由定义在高维数据空间中嵌入的临界平均场渗流簇上的层次函数组成。渗流数据由稀疏、低维的分形簇组成,具有幂律大小分布。模拟分类层次结构的潜变量生成每个数据点的目标值。该数据模型在分析上易于处理,具有已知的临界指数,无需超参数调整即可固定其属性。我们利用渗流簇、随机树和加法凝聚之间的映射,提出了一种几乎线性时间的算法,用于联合采样随机树及其层次潜变量分解,从而能够生成任意规模的数据。通过探测实验,我们发现模型的地面真值潜变量可以从神经网络激活中线性解码。稀疏性、自相似性、幂律统计和分析可处理性共同使临界渗流成为可解释性研究的原理性测试平台。

英文摘要

Neural networks learn features that reflect the hierarchical, multi-scale structure of natural data. Synthetic datasets used to evaluate interpretability methods typically lack this structure, limiting their value as realistic toy models. To close this gap, we introduce a family of synthetic datasets consisting of hierarchical functions defined on critical mean-field percolation clusters embedded in a high-dimensional data space. The percolation data consists of sparse, low-dimensional fractal clusters with a power-law size distribution. Latent variables modeling a taxonomic hierarchy generate each data point's target value. The data model is analytically tractable with known critical exponents that fix its properties without requiring hyperparameter tuning. We leverage a mapping between percolation clusters, random trees, and additive coalescence to propose an almost linear-time algorithm to jointly sample a random tree and its hierarchical latent decomposition, enabling data generation at arbitrary scale. Using probing experiments, we find that the model's ground-truth latent variables can be linearly decoded from neural network activations. Together, sparsity, self-similarity, power-law statistics, and analytical tractability make critical percolation a principled testbed for interpretability research.

2606.18951 2026-06-19 cs.RO 新提交

A High-accuracy Event-based Underwater SLAM System

高精度事件相机水下SLAM系统

Yifan Peng, Qihang Liu, Haoying Li, Yuzhe Li, Junfeng Wu, Ziyang Hong

AI总结 针对事件相机水下SLAM中时间曲面成像质量差和匹配失败问题,提出基于结构感知度量和贝叶斯优化的高精度立体SLAM系统,并贡献首个高质量水下事件数据集UWE。

详情
AI中文摘要

虽然事件相机为水下SLAM提供了巨大潜力,但现有的基于时间曲面(TS)的方法在水下部署时被证明非常不可靠。波动的相机速度严重降低了TS成像质量,而宽立体基线和重复的水下纹理导致关键匹配失败,频繁引发系统崩溃。为克服这些挑战,我们开发了首个高精度事件相机水下立体SLAM系统。基于结构张量相干性和梯度,设计了一种结构感知度量来定量评估TS结构信息密度。通过将最优TS生成解耦为基于系统初始化的两个不同阶段,贝叶斯优化(BO)在初始化前首先预测最优先验TS,同时我们设置异步在线局部搜索方法,在跟踪阶段实时获取合适的TS。我们使用先验视差保证精确的数据关联,并采用“最新观测优先”三角测量机制实现稳定三角测量。作为这些解决方案的基准和社区资源,我们还贡献了UWE,这是首个高质量真实世界水下事件数据集,包含变化的相机运动、复杂纹理和不同轨迹特征。在公共数据集和UWE上的广泛评估表明,所提出的SLAM系统与最先进的事件相机方法相比具有竞争力的精度性能。代码和数据将开源。

英文摘要

While event cameras offer immense potential for underwater SLAM, existing Time Surface (TS)-based methods prove highly unreliable when deployed underwater. Fluctuating camera velocities severely degrade TS imaging quality, while wide stereo baselines and repetitive underwater textures induce critical matching failures, frequently triggering system failure. To overcome these challenges, we develop the first high-accuracy event-based underwater stereo SLAM system. A structure-aware metric for TS is designed based on structure tensor coherence and gradients to quantitatively evaluate TS structural information density. By decoupling the optimal TS generation into two distinct stages based on system initialization, Bayesian Optimization(BO) first predicts an optimal prior TS sequentially before initialization while we set an asynchronous online local searching method periodically to obtain appropriate TS in real-time during the tracking stage. We use the prior disparity to guarantee precise data association and "latest-observation-first'' triangulation mechanism to realize stable triangulation. As a benchmark for these solutions and a resource for the community, we also contribute UWE, the first high-quality real-world underwater event dataset containing variable camera motions, complex textures and different trajectory features. Extensive evaluations on public datasets and UWE show the competitive accuracy performance of the proposed SLAM system compared to the state-of-the-art event-based method. The code and data will be open-sourced.

2606.12500 2026-06-19 cs.LG cs.AI 新提交

Improving Crash Frequency Prediction from Simulated Traffic Conflicts Using Machine Learning Based Microsimulation

基于机器学习的微观仿真从模拟交通冲突改进碰撞频率预测

Xian Liu, Carlo G. Prato, Gustav Markkula

AI总结 本文利用机器学习行为模型替代传统规则模型进行交通微观仿真,通过极端值理论分析模拟冲突预测碰撞频率,在英国利兹五个信号交叉口验证了ML模型无需地点校准即可提升预测准确性。

详情
AI中文摘要

交通微观仿真结合替代安全措施越来越多地被用作历史碰撞数据的主动替代方案,用于预测当前或计划道路基础设施设计的碰撞频率。然而,现有的基于微观仿真的安全研究采用了简化的基于规则的行为模型,这些模型能较好地再现交通流,但往往无法生成真实的冲突动态,限制了碰撞预测的准确性。机器学习(ML)行为模型的最新进展提供了一个有希望的机会,通过直接从大规模轨迹数据集中学习人类驾驶行为,可能提高微观仿真的真实性和碰撞频率预测。为了研究这种可能性,我们对英国利兹的五个真实信号交叉口进行了交通微观仿真,使用了标准的基于规则模型和最先进的ML模型。使用二维碰撞时间指标分析模拟车辆轨迹以识别模拟冲突,然后使用极端值理论建模以预测碰撞频率。结果表明,ML模型的冲突产生的碰撞预测与实际碰撞数据一致,而基于规则的模型由于缺乏对特定模拟交叉口的模型校准,无法产生有意义的预测。直接使用ML生成的模拟碰撞来预测实际碰撞频率也产生了较差的结果,这表明尽管当前的ML模型可以真实地再现冲突,但尚不能生成真实的碰撞。总体而言,研究结果表明,基于ML的行为模型在无需特定地点模型校准的情况下,有望从模拟冲突中改进碰撞预测,并为基于ML的交通微观仿真指明了明确的未来方向。

英文摘要

Traffic microsimulation combined with surrogate safety measures has increasingly been used as a proactive alternative to historical crash data for predicting crash frequency for current or planned road infrastructure designs. However, existing microsimulation-based safety studies have adopted simplified rule-based behaviour models, which reproduce traffic flow reasonably well but often fail to generate realistic conflict dynamics, limiting crash prediction accuracy. Recent advances in machine learning (ML)-based behaviour models offer a promising opportunity to potentially improve microsimulation realism and crash frequency predictions by learning human driving behaviour directly from large-scale trajectory datasets. To investigate this possibility, traffic microsimulation was conducted for five real-world signalised intersections in Leeds, UK, using both a standard rule-based model and a state-of-the-art ML model. Simulated vehicle trajectories were analysed using a two-dimensional Time-to-Collision metric to identify simulated conflicts, which were then modelled using Extreme Value Theory to predict crash frequency. Results show that conflicts from the ML model yielded crash predictions in line with the real-world crash data, whereas the rule-based model did not permit meaningful predictions, presumably due to a lack of model calibration to the specific simulated intersections. Directly using ML-generated simulated crashes to predict real-world crash frequency also yielded poor results, suggesting that while current ML models can realistically reproduce conflicts, they are not yet able to generate realistic crashes. Overall, the findings demonstrate that ML-based behaviour models are promising for improving crash prediction from simulated conflicts, without a need for location-specific model calibration, and suggest clear future directions for ML-based traffic microsimulation.

2606.10136 2026-06-19 cs.CV 新提交

iSAGE: A Human-in-the-Loop Framework for Remote Sensing Semantic Segmentation via Sparse Point Supervision

iSAGE: 一种通过稀疏点监督进行遥感语义分割的人机协同框架

Osmar Luiz Ferreira de Carvalho, Osmar Abilio de Carvalho Junior, Anesmar Olino de Albuquerque, Daniel Guerreiro e Silva

AI总结 提出iSAGE框架,通过专家点击模型错误像素而非任意像素,无需辅助机制即可匹配密集监督,在BsB Aerial和ISPRS Vaihingen数据集上以极低标注率达到与密集监督相当的性能。

Comments 47 pages, 8 tables, 6 figures

详情
AI中文摘要

遥感中的语义分割需要昂贵的像素级标注,且由于模型很少能在传感器、平台或地理区域间迁移,几乎每个问题都需要新的数据集。现有的人机协同框架通过辅助机制(伪标签、传播、CRF、基础模型提示、辅助头)将稀疏点击扩展为密集监督,这些机制均基于模型的预测分布。在该分布中,一个自信的错误像素与一个自信的正确像素在结构上无法区分,因此任何读取该分布的规则都无法区分两者;区分信号位于模型外部。本文假设,专家针对模型错误(而非任意像素)的点击足以匹配密集监督,无需扩展机制。iSAGE(基于专家指导的迭代稀疏标注)在一个集成的开源平台上实现了这一假设,其中错误加权损失放大了每次点击的梯度,而标注记录本身即为数据集,可扩展、可纠正、可审计。实验采用最小努力策略:每帧每类最多一个标注像素。在BsB Aerial上,iSAGE恢复了密集监督的97.2%(在0.040%的像素上达到74.79% mIoU),并呈现出对比性的类别动态:无定形类别(渗透区域)从种子点开始饱和,而小类别(汽车)需要后期迭代的努力。在ISPRS Vaihingen(外部基准)上,iSAGE以0.011%的像素达到76.78% mIoU,匹配密集基线(76.65%)并超越所有已发表方法。在相同流程下,四种输出读取机制(预算1-100倍的oracle熵、阈值0.90-0.99的伪标签、基于CRF的传播、均匀随机)比iSAGE低7.4至14.5个百分点。在调查的31种方法中,iSAGE是唯一无需辅助机制即可运行的迭代式人机协同框架。

英文摘要

Semantic segmentation in remote sensing requires costly pixel-level annotations, and nearly every problem demands a new dataset since models rarely transfer across sensors, platforms, or geographies. Existing human-in-the-loop frameworks expand sparse clicks into dense supervision via auxiliary machinery (pseudo-labels, propagation, CRFs, foundation-model prompts, auxiliary heads), all operating on the model's predictive distribution. A confidently wrong pixel is indistinguishable from a confidently correct one in that distribution by construction, so no rule reading it can separate the two; the distinguishing signal is external to the model. This paper hypothesizes that expert clicks targeting confident model errors, not arbitrary pixels, suffice to match dense supervision, with no expansion machinery. iSAGE (Iterative Sparse Annotation Guided by Expert) realizes this hypothesis on an integrated open-source platform, where an error-weighted loss amplifies the gradient at each click and the annotation record itself is the dataset, extensible, correctable, and auditable. Experiments use a minimum-effort regime: at most one labeled pixel per class per frame. On BsB Aerial, iSAGE recovers 97.2% of dense supervision (74.79% mIoU on 0.040% of pixels) with contrasting class dynamics: amorphous classes (permeable areas) saturate from the seed, while small classes (cars) require late-iteration effort. On ISPRS Vaihingen (external benchmark), iSAGE reaches 76.78% mIoU with 0.011% of pixels, matching the dense baseline (76.65%) and exceeding all published methods. Under the same pipeline, four output-reading mechanisms (oracle entropy across budgets 1--100x, pseudo-labels across thresholds 0.90--0.99, CRF-based propagation, uniform random) plateau 7.4 to 14.5 pp below iSAGE. Across 31 surveyed methods, iSAGE is the only iterative human-in-the-loop framework operating without auxiliary machinery.

2606.11171 2026-06-19 cs.LG cond-mat.stat-mech cs.IT math.IT math.OC math.ST stat.TH 新提交

Indexed Bellman Information Complexity

核赌博机中的算法与极小极大复杂度

Yunbei Xu

AI总结 本文通过统一MAIR框架,将GP-UCB与MAMS算法置于共同语言下,提出结合两者优势的安全主算法,并证明在过参数化模型中算法复杂度比类宽极小极大或DEC证书更具信息性。

详情
AI中文摘要

高斯过程上置信界(GP-UCB)和决策估计系数(DEC)方法乍看之下可能属于不同的理论。本文将这两种观点置于一个共同的算法信息语言中,用于频率学派RKHS赌博机。GP-UCB固定了一个算法性的(而非真实的)高斯过程先验,并利用实现轨迹的复杂度以及计算可处理性,而MAMS优化了一个鲁棒的类宽MAIR/DEC包络。通过统一的MAIR框架和异质半正定算法先验,我们推广了GP-UCB分析和MAMS算法,提出了一种结合两者优势的安全主算法,并提供了一个核赌博机构造,表明在过参数化模型中算法复杂度可以比类宽极小极大或DEC证书更具信息性。由此得出的信息是:算法信息和类宽极小极大系数回答不同的问题,并可能导致不同的差距;核赌博机提供了一个干净的环境,使得这种区别在数学上变得可见。

英文摘要

We develop indexed Bellman information complexity, a representation-level theory of interactive decision making centered on information indices and reference histories. The representation strips away problem-specific syntax and retains only the ingredients needed for dynamic programming and information accounting, thereby unifying the earlier framework of indexed algorithmic information ratios (AIR). On the upper-bound side, regret is controlled by Bellman supersolutions or potential identities whose gradient bracket is paid for by indexed information. Upper-confidence-bound (UCB), estimation-to-decision/decision-estimation-coefficient (E2D/DEC), and adaptive-minimax-sampling or exploration-by-optimization (AMS/EBO) methods appear as three relaxations of this same identity. On the lower-bound side, the posterior-reference trajectory supplies both the information telescope and the ghost quantile of small-regret trajectories. The resulting critical radius in the lower bound is an effective-dimension-scale quantity, as in Fano and local-prior-mass lower bounds, rather than the constant radius of a two-point Le Cam argument. The examples show that DEC is best viewed as a one-step relaxation of indexed Bellman information complexity, not as a universally tight conversion mechanism. We illustrate the framework through several applications, with particular emphasis on kernel bandits. In this setting, the active action marginal provides a concrete basis for comparing UCB, E2D, and AMS/EBO.

2605.13438 2026-06-19 cs.AI cs.CL 版本更新

CogniFold: Always-On Proactive Memory via Cognitive Folding

CogniFold: 通过认知折叠实现始终在线的主动记忆

Suli Wang, Yiqun Duan, Yu Deng, Rundong Zhao, Dai Shi, Minghua Deng, Chen Chen, Xinliang Zhou

AI总结 提出CogniFold,一种受大脑启发的主动记忆系统,通过将互补学习系统扩展为三层(海马体、新皮层、前额叶意图层)并利用图拓扑自组织,实现事件流的持续认知结构涌现,在认知评估和常规记忆基准上均表现优异。

Comments Code is available at https://github.com/OpenNorve/CogniFold

详情
AI中文摘要

现有的智能体记忆主要仍是被动反应式和基于检索的,缺乏自主将经验组织成持久认知结构的能力。为了迈向真正自主的智能体,我们引入了CogniFold,一种受大脑启发的“始终在线”智能体记忆,专为下一代主动助手设计。CogniFold持续将碎片化事件流折叠成自涌现的认知结构,从传入事件和积累的知识中逐步引导出更高层次的认知。我们通过将互补学习系统(CLS)理论从两层(海马体、新皮层)扩展到三层,增加了一个前额叶意图层来奠定基础。模仿前额叶皮层作为意图控制和决策制定的中心,CogniFold通过图拓扑自组织实现这一点:认知结构在事件流下主动组装,语义相似时合并,过时时衰减,通过联想回忆重新链接,并在概念簇密度超过阈值时浮现意图。我们使用CogEval-Bench评估结构形成,证明CogniFold独特地产生了符合认知期望和概念涌现的记忆结构。此外,在跨越五个认知领域的7个广泛覆盖的基准测试中,我们验证了CogniFold在常规记忆基准上同时表现出稳健的性能。

英文摘要

Existing agent memory remains predominantly reactive and retrieval-based, lacking the capacity to autonomously organize experience into persistent cognitive structure. Toward genuinely autonomous agents, we introduce CogniFold, a brain-inspired "always-on" agent memory designed for the next generation of proactive assistants. CogniFold continuously folds fragmented event streams into self-emerging cognitive structures, bootstrapping progressively higher-level cognition from incoming events and accumulated knowledge. We ground this by extending Complementary Learning Systems (CLS) theory from two layers (hippocampus, neocortex) to three, adding a prefrontal intent layer. Emulating the prefrontal cortex as the locus of intentional control and decision-making, CogniFold achieves this through graph-topology self-organization: cognitive structures proactively assemble under the stream, merge when semantically similar, decay when stale, relink through associative recall, and surface intents when concept-cluster density crosses a threshold. We evaluate structural formation using CogEval-Bench, demonstrating that CogniFold uniquely produces memory structures that match cognitive expectations and concept emergence. Furthermore, across eight downstream benchmarks -- two probing long-term conversational memory (LoCoMo, LongMemEval) and six spanning other cognitive domains -- we validate that CogniFold simultaneously performs robustly on conventional memory tasks. Our code is available at https://github.com/OpenNorve/CogniFold.

2605.05481 2026-06-19 cs.LG 版本更新

Approximate Next Policy Sampling: Replacing Conservative Target Policy Updates in Deep RL

近似下一策略采样:替代深度强化学习中的保守目标策略更新

Dillon Sandhu, Ronald Parr

AI总结 提出近似下一策略采样(ANPS)方法,通过修改训练分布而非约束策略更新来解决强化学习中的“鸡生蛋”问题,并基于此设计稳定值近似策略迭代(SV-API)算法,在Atari和连续控制任务上实现更大目标策略更新且性能匹配或提升。

详情
AI中文摘要

我们重新审视强化学习中一个经典的“鸡生蛋”问题:为了安全地改进策略,价值函数必须在更新策略的状态访问分布上准确。该状态分布是未知的,且无法为训练价值函数而采样。保守更新解决了这个问题,但代价是缩小策略更新。本文探索了一种替代方案,即近似下一策略采样(ANPS),它通过修改训练分布而非约束策略更新来解决问题。如果训练数据的分布近似于下一策略的分布,则ANPS成立。为了证明ANPS的可行性和有效性,我们引入了稳定值近似策略迭代(SV-API)。SV-API修改了标准的近似策略迭代循环,在迭代更新的行为策略收集相关经验的同时,保持目标策略固定。它仅在满足收敛准则后才承诺采用新策略。如果满足某些稳定性准则,则更新保证是安全的;否则,其安全性不低于标准近似策略迭代。将SV-API应用于PPO得到稳定值PPO(SV-PPO),在高维离散(Atari)和连续控制基准测试中,SV-PPO在执行显著更大的目标策略更新的同时,性能匹配或提升。这些结果证明了ANPS作为RL中这一经典挑战的新解决方案的可行性。

英文摘要

We revisit a classic "chicken-and-egg" problem in reinforcement learning: to safely improve a policy, the value function must be accurate on the state-visitation distribution of the updated policy. That distribution over states is unknown and cannot be sampled for the purposes of training the value function. Conservative updates solve this problem, but at the cost of shrinking the policy update. This paper explores an alternative solution, Approximate Next Policy Sampling (ANPS), which addresses the problem by modifying the training distribution rather than constraining the policy update. ANPS is satisfied if the distribution of the training data approximates that of the next policy. To demonstrate the feasibility and efficacy of ANPS, we introduce Stable Value Approximate Policy Iteration (SV-API). SV-API modifies the standard approximate policy iteration loop to hold the target policy fixed while an iteratively updated behavioral policy gathers relevant experience. It only commits to a new policy once a convergence criterion has been met. If certain stability criteria are met, the update is guaranteed to be safe; otherwise, it remains no less safe than standard approximate policy iteration. Applying SV-API to PPO yields Stable Value PPO (SV-PPO), which matches or improves performance on high-dimensional discrete (Atari) and continuous control benchmarks while executing substantially larger target policy updates. These results demonstrate the viability of ANPS as a new solution to this classic challenge in RL.

2602.13139 2026-06-19 cs.CL 版本更新

OpenLID-v3: Improving the Precision of Closely Related Language Identification -- An Experience Report

OpenLID-v3:提高近亲语言识别精度的经验报告

Mariia Fedorova, Nikolay Arefyev, Maja Buljan, Jindřich Helcl, Stephan Oepen, Egil Rønningstad, Yves Scherrer

AI总结 针对现有语言识别工具对近亲语言和噪声区分困难的问题,通过增加训练数据、合并问题语言变体簇和引入噪声标签扩展OpenLID分类器,提出OpenLID-v3,在多个基准上提升精度。

Comments VarDial'26 workshop at the EACL 2026 conference

详情
AI中文摘要

语言识别(LID)是从网络数据构建高质量多语言数据集的关键步骤。现有的LID工具(如OpenLID或GlotLID)通常难以识别近亲语言,也难以区分有效自然语言与噪声,这污染了特定语言子集,尤其是低资源语言。在本工作中,我们通过增加更多训练数据、合并有问题的语言变体簇以及引入一个专门标记噪声的标签来扩展OpenLID分类器。我们将这个扩展系统称为OpenLID-v3,并在多个基准上将其与GlotLID进行评估。在开发过程中,我们重点关注三组近亲语言(波斯尼亚语、克罗地亚语和塞尔维亚语;意大利北部和法国南部的罗曼语变体;以及斯堪的纳维亚语言),并在现有评估数据集不足的地方贡献了新的评估数据集。我们发现集成方法提高了精度,但也显著降低了对低资源语言的覆盖。OpenLID-v3可在该https URL上获取。

英文摘要

Language identification (LID) is an essential step in building high-quality multilingual datasets from web data. Existing LID tools (such as OpenLID or GlotLID) often struggle to identify closely related languages and to distinguish valid natural language from noise, which contaminates language-specific subsets, especially for low-resource languages. In this work we extend the OpenLID classifier by adding more training data, merging problematic language variant clusters, and introducing a special label for marking noise. We call this extended system OpenLID-v3 and evaluate it against GlotLID on multiple benchmarks. During development, we focus on three groups of closely related languages (Bosnian, Croatian, and Serbian; Romance varieties of Northern Italy and Southern France; and Scandinavian languages) and contribute new evaluation datasets where existing ones are inadequate. We find that ensemble approaches improve precision but also substantially reduce coverage for low-resource languages. OpenLID-v3 is available on https://huggingface.co/HPLT/OpenLID-v3.

2510.27568 2026-06-19 cs.AI cs.CL 版本更新

SIGMA: Search-Augmented On-Demand Knowledge Integration for Agentic Mathematical Reasoning

SIGMA: 搜索增强的按需知识集成用于智能体数学推理

Ali Asgarov, Umid Suleymanov, Aadyant Khatri

AI总结 提出SIGMA框架,通过多智能体独立推理、定向搜索和协调机制,实现上下文敏感的知识集成,在MATH500等基准上提升7.4%的绝对性能。

Comments AAAI 2026 LMReasoning

详情
AI中文摘要

解决数学推理问题不仅需要准确访问相关知识,还需要仔细的多步骤思考。然而,当前的检索增强模型通常依赖单一视角,遵循僵化的搜索策略,并且难以有效结合来自多个来源的信息。我们提出了SIGMA(搜索增强的按需知识集成用于智能体数学推理),这是一个统一框架,通过协调机制编排专门智能体独立推理、执行定向搜索并综合发现。每个智能体生成假设段落以优化其分析视角的检索,确保知识集成既上下文敏感又计算高效。在MATH500、AIME和博士级科学问答GPQA等具有挑战性的基准测试中,SIGMA持续优于开源和闭源系统,实现了7.4%的绝对性能提升。我们的结果表明,多智能体按需知识集成显著提高了推理准确性和效率,为复杂、知识密集型问题解决提供了可扩展的方法。代码将在发表后公开。

英文摘要

Solving mathematical reasoning problems requires not only accurate access to relevant knowledge but also careful, multi-step thinking. However, current retrieval-augmented models often rely on a single perspective, follow inflexible search strategies, and struggle to effectively combine information from multiple sources. We introduce SIGMA (Search-Augmented On-Demand Knowledge Integration for AGentic Mathematical reAsoning), a unified framework that orchestrates specialized agents to independently reason, perform targeted searches, and synthesize findings through a moderator mechanism. Each agent generates hypothetical passages to optimize retrieval for its analytic perspective, ensuring knowledge integration is both context-sensitive and computation-efficient. When evaluated on challenging benchmarks such as MATH500, AIME, and PhD-level science QA GPQA, SIGMA consistently outperforms both open- and closed-source systems, achieving an absolute performance improvement of 7.4%. Our results demonstrate that multi-agent, on-demand knowledge integration significantly enhances both reasoning accuracy and efficiency, offering a scalable approach for complex, knowledge-intensive problem-solving. We will release the code upon publication.

2507.05169 2026-06-19 cs.LG cs.AI cs.CL cs.CV cs.RO 版本更新

Critique of World Model

世界模型批判:一种用于世界建模的生成式潜在预测架构

Eric Xing, Mingkai Deng, Jinyu Hou

AI总结 本文从心理学“假设性思维”出发,提出世界模型的核心目标是模拟真实世界的所有可行动可能性,并设计了一种基于状态化、分层、多级、混合连续/离散表示的生成式潜在预测(GLP)架构。

详情
AI中文摘要

世界模型,即生物智能体所经历并对其采取行动的真实世界环境的算法模拟器,近年来因开发具有人工(通用)智能的虚拟智能体的需求日益增长而成为一个新兴课题。关于世界模型究竟是什么、如何构建、如何使用以及如何评估,已有许多讨论。本文从著名科幻经典《沙丘》中的想象出发,并借鉴心理学文献中“假设性思维”的概念,论证世界模型的主要目标是模拟真实世界中所有可行动的可能性,以进行有目的的推理和行动。我们审视了世界建模的关键设计维度:数据、表示、架构、学习目标和使用,调查了现有方法并分析了它们的权衡。在此基础上,我们提出了一种新的通用世界模型生成式潜在预测(GLP)架构,基于有状态的、分层的、多层次的、混合连续/离散表示,以及生成式和自监督学习框架,并展望了由这种模型支持的物理、智能体和嵌套(PAN)AGI系统。

英文摘要

World Model, the algorithmic simulator of the real-world environment which biological agents experience and act upon, has been an emerging topic in recent years due to the rising need to develop virtual agents with artificial (general) intelligence. There has been much discussion on what a world model really is, how to build it, how to use it, and how to evaluate it. In this essay, starting from the imagination in the famed Sci-Fi classic Dune, and drawing inspiration from the concept of ``hypothetical thinking'' in psychology literature, we argue the primary goal of a world model to be {\it simulating all actionable possibilities of the real world for purposeful reasoning and acting}. We examine the key design dimensions of world modeling: data, representation, architecture, learning objective, and usage, surveying existing approaches and analyzing their tradeoffs. Building on this examination, we propose a new Generative Latent Prediction (GLP) architecture for a general-purpose world model, based on stateful, hierarchical, multi-level, and mixed continuous/discrete representations, and a generative and self-supervised learning framework, with an outlook of a Physical, Agentic, and Nested (PAN) AGI system enabled by such a model.

2606.19943 2026-06-19 eess.IV cs.AI 新提交

SIMBA: ABidirectional Retrieval Forward Simulation Framework for Modeling FY-4A GIIRS Hyperspectral Infrared Radiances Toward NWP Applications

SIMBA:面向NWP应用的FY-4A GIIRS高光谱红外辐射双向检索正向模拟框架

Jingdong Shen, Fu Wang*, Qifeng Lu, Hao Huang, Chunqiang Wu, Chi Yang, Xiaofang Liu

AI总结 提出SIMBA框架,联合进行大气廓线检索和辐射重建,通过循环一致性约束和双向Mamba模块增强耦合,在FY-4A GIIRS数据上优于多种深度学习基线。

详情
AI中文摘要

高光谱红外观测是数值天气预报(NWP)的重要数据源,因为它们提供了大气温度和湿度垂直结构的丰富信息。然而,现有的深度学习方法主要关注从辐射到大气廓线的单向检索,而反向辐射模拟过程以及大气状态空间与辐射观测空间之间的一致性考虑不足。在本研究中,我们提出了SIMBA,一个用于FY-4A GIIRS高光谱红外辐射建模的统一双向检索-正向模拟框架,面向NWP应用。该框架联合执行大气廓线检索和辐射重建,引入循环一致性约束以加强两个过程之间的耦合,并采用双向Mamba状态空间模块来捕捉沿气压层的长程依赖。利用配准的FY-4A GIIRS观测和ERA5再分析数据,该方法在温度检索、比湿检索、长波辐射重建和中波辐射重建上进行了评估。实验结果表明,SIMBA在检索和重建任务上均优于多个代表性深度学习基线,而消融实验证实了双向设计和循环一致性机制的贡献。这些结果表明,所提出的框架对于联合大气廓线检索和高光谱红外辐射建模是有效的,并显示出未来在雅可比相关分析和面向NWP扩展方面的潜力。

英文摘要

Hyperspectral infrared observations are an important data source for numerical weather prediction (NWP) because they provide rich information on the vertical structure of atmospheric temperature and humidity. However, most existing deep learning methods mainly focus on one-way retrieval from radiances to atmospheric profiles, while the reverse radiance simulation process and the consistency between atmospheric state space and radiance observation space are insufficiently considered. In this study, we propose SIMBA, a unified bidirectional retrieval-forward simulation framework for FY-4A GIIRS hyperspectral infrared radiance modeling toward NWP applications. The framework jointly performs atmospheric profile retrieval and radiance reconstruction, introduces a cycle-consistency constraint to strengthen the coupling between the two processes, and employs a bidirectional Mamba state-space module to capture long-range dependencies along pressure levels. Using collocated FY-4A GIIRS observations and ERA5 reanalysis data, the proposed method is evaluated for temperature retrieval, specific humidity retrieval, long-wave radiance reconstruction, and medium-wave radiance reconstruction. Experimental results show that SIMBA outperforms several representative deep learning baselines across both retrieval and reconstruction tasks, while ablation experiments confirm the contribution of the bidirectional design and cycle-consistency mechanism. These results demonstrate that the proposed framework is effective for joint atmospheric profile retrieval and hyperspectral infrared radiance modeling, and suggest potential for future Jacobian-related analysis and NWP-oriented extensions.

2606.19574 2026-06-19 eess.IV cs.CV 新提交

FrequencyFormer: A Co-Designed Sensor-to-Processor Pipeline for Frequency-Domain Vision Transformer Inference

FrequencyFormer: 面向频域视觉Transformer推理的协同设计传感器到处理器流水线

Chengwei Zhou, Ovishake Sen, Xuming Chen, Rishith Paramasivam, Shaahin Angizi, Swarup Bhunia, Baibhab Chatterjee, Gourav Datta

AI总结 提出FrequencyFormer,通过多尺度DCT标记化将图像压缩为频域令牌,结合近传感器LUT硬件和低功耗通信架构,实现高达128倍数据压缩和28.8 TOPS/W能效,兼容多种视觉任务。

详情
AI中文摘要

在传感器边缘系统上部署视觉Transformer(ViT)不仅受限于设备计算能力,还受限于从传感器到处理器传输高维图像数据所需的能量和带宽。虽然传感器内和近传感器计算通过早期特征提取降低了这一成本,但现有方法通常仅提供适度的压缩。我们观察到频域提供了视觉信息的自然紧凑表示,并且可以在传感器级别利用以减少传感器到处理器的数据移动。基于这一见解,我们提出了FrequencyFormer,一种用于高效ViT推理的协同设计传感器到处理器流水线。FrequencyFormer包括:(1)多尺度DCT标记化器,将224x224图像压缩为紧凑的频域令牌,实现高达128倍的片外数据量减少,且精度损失较小;(2)基于查找表(LUT)的近传感器硬件实现,利用固定DCT系数实现无乘法器、节能且面积高效的标记化;(3)改进的基于MIPI的低功耗通信架构,进一步降低传输能量。FrequencyFormer可作为标准ViT补丁嵌入的直接替代,并与分类、检测和分割任务的预训练骨干网络兼容。该流水线实现了28.8 TOPS/W的能效,将通信能量降低230倍,并将总传感器侧能量降低2.22倍,展示了频域标记化作为传感器内ViT部署的可扩展基础。

英文摘要

Deploying vision transformers (ViTs) on sensor-edge systems is limited not only by on-device compute, but also by the energy and bandwidth required to transmit high-dimensional image data from the sensor to the processor. While in-sensor and near-sensor computing reduce this cost through early feature extraction, existing methods often provide only modest compression. We observe that the frequency domain provides a naturally compact representation of visual information and can be exploited at the sensor level to reduce sensor-to-processor data movement. Building on this insight, we present FrequencyFormer, a co-designed sensor-to-processor pipeline for efficient ViT inference. FrequencyFormer includes: (1) a multi-scale DCT tokenizer that compresses a 224x224 image into compact frequency-domain tokens, achieving up to 128x reduction in off-chip data volume with modest accuracy loss; (2) a LUT-based near-sensor hardware implementation that leverages fixed DCT coefficients for multiplier-free, energy- and area-efficient tokenization; and (3) a modified MIPI-based low-power communication architecture that further reduces transfer energy. FrequencyFormer serves as a drop-in replacement for standard ViT patch embedding and remains compatible with pretrained backbones across classification, detection, and segmentation tasks. The pipeline achieves 28.8 TOPS/W, reduces communication energy by 230x, and lowers total sensor-side energy by 2.22x, demonstrating frequency-domain tokenization as a scalable foundation for in-sensor ViT deployment.

2606.20520 2026-06-19 cs.CR cs.AI cs.DC cs.LG 新提交

Sovereign Execution Brokers: Enforcing Certificate-Bound Authority in Agentic Control Planes

主权执行代理:在智能体控制平面中强制执行证书绑定权限

Jun He, Deying Yu

AI总结 针对自主代理在生产环境中执行变更时缺乏强制权限验证的问题,提出主权执行代理(SEB),通过证书验证、状态检查和范围身份实现运行时强制权限控制,并在AWS和Kubernetes上验证了其安全性和性能。

Comments 19 pages, 6 figures, 10 tables

详情
AI中文摘要

自主代理越来越多地连接到云、部署和数据控制工作流,但生产环境的变更权限不应存在于非确定性推理过程中。现有的访问控制机制授权身份,而保证层认证提议的操作;两者单独都无法在变更时刻提供对认证权限的强制执行点。本文介绍了主权执行代理(SEB),一种用于证书绑定智能体基础设施的运行时强制边界。SEB消耗由主权保证边界(SAB)颁发的证书,验证请求的变更与认证的执行合约匹配,检查有效期窗口、策略时期、撤销时期和实时状态漂移,铸造范围执行身份,调用基础设施API,并记录签名的决策和结果记录。通过分离提议、准入和执行,SEB将认证权限转化为短暂的、可撤销的、可审计的运行时能力,前提是生产变更API拒绝非代理身份。我们展示了SEB执行模型、证书和重放验证谓词、范围身份语义、绕过预防部署模式、失败行为以及一个具体的原型实现。我们在AWS和Kubernetes集群上评估了原型,测量了延迟开销、撤销传播、漂移检测以及故障注入下的安全性。

英文摘要

Autonomous agents are increasingly connected to cloud, deployment, and data-control workflows, but production mutation authority should not reside inside non-deterministic reasoning processes. Existing access-control mechanisms authorize identities, while assurance layers certify proposed actions; neither alone provides a mandatory enforcement point for certified authority at the moment of mutation. This paper introduces the Sovereign Execution Broker (SEB), a runtime enforcement boundary for certificate-bound agentic infrastructure. SEB consumes certificates issued by the Sovereign Assurance Boundary (SAB), verifies that the requested mutation matches the certified execution contract, checks validity windows, policy epochs, revocation epochs, and live-state drift, mints scoped execution identity, invokes infrastructure APIs, and records signed decision and outcome records. By separating proposal, admission, and execution, SEB turns certified authority into a short-lived, revocable, auditable runtime capability, provided that production mutation APIs reject non-broker identities. We present the SEB execution model, certificate and replay-verification predicates, scoped identity semantics, bypass-prevention deployment patterns, failure behavior, and a concrete prototype implementation. We evaluate the prototype on AWS and Kubernetes clusters, measuring latency overheads, revocation propagation, drift detection, and security under fault injection.

2606.20388 2026-06-19 cs.HC cs.AI cs.DB 新提交

DataMagic: Transforming Tabular Data into Data Insight Video

DataMagic: 将表格数据转化为数据洞察视频

Yupeng Xie, Chen Ma, Zhenyang Wang, Liangwei Wang, Jiayi Zhu, Chuxuan Zeng, Zhouan Shen, Boyan Li, Yuyu Luo

AI总结 提出DataMagic系统,通过声明式规范DVSpec和多智能体架构,将原始表格数据和自然语言查询转化为叙事性数据洞察视频,并支持交互式探索。

Comments 5 pages, 3 figures, accepted at VLDB 2026

详情
AI中文摘要

数据视频整合动态图表、语音叙述和同步动画,以时间叙事的方式传达数据洞察,使其成为提高数据管理生命周期中数据消费效率的有效媒介。然而,制作高质量的数据视频需要涵盖数据分析、叙事设计和视频制作的专业知识。现有方法存在不足:静态可视化工具(如BI仪表板)缺乏叙事逻辑和动画;创作工具要求用户预先准备可视化,而非从原始数据开始;像素级视频生成模型无法保证数据保真度或来源。我们演示了DataMagic,一个端到端的交互式系统,将原始表格数据和自然语言查询转化为叙事性数据洞察视频。为确保数据保真度,DataMagic引入了声明式规范DVSpec,通过数据驱动的语义引用将视觉和动画元素绑定到底层数据字段。为解决设计空间的组合爆炸问题,DataMagic采用先生成后编排的多智能体架构,并行生成候选场景,然后通过全局编排优化叙事连贯性。利用DVSpec逻辑与渲染的解耦,系统进一步支持三种交互模式和基于结构化来源的数据问答,将单向视频转化为可探索的交互式数据界面。在109个真实世界样本上的评估验证了DataMagic的有效性。主页:此 https URL

英文摘要

Data videos integrate dynamic charts, voice narration, and synchronized animations to communicate data insights as temporal narratives, making them an effective medium for improving data consumption efficiency in the data management lifecycle. However, producing high-quality data videos requires expertise spanning data analysis, narrative design, and video production. Existing approaches fall short: static visualization tools (e.g., BI dashboards) lack narrative logic and animation; authoring tools require users to pre-prepare visualizations rather than working from raw data; pixel-level video generation models cannot guarantee data fidelity or provenance. We demonstrate DataMagic, an end-to-end interactive system that transforms raw tabular data and natural language queries into narrative data-insight videos. To ensure data fidelity, DataMagic introduces the declarative specification DVSpec, which binds visual and animation elements to underlying data fields through data-driven semantic references. To address the combinatorial explosion of the design space, DataMagic adopts a Generate-then-Orchestrate multi-agent architecture that generates candidate scenes in parallel and then optimizes narrative coherence through global orchestration. Leveraging DVSpec's decoupling of logic and rendering, the system further supports three interaction modes and structured provenance-based data Q&A, transforming one-way videos into explorable interactive data interfaces. Evaluation on 109 real-world samples validates the effectiveness of the DataMagic. Homepage: https://datamagic-home.github.io/

2606.20324 2026-06-19 cs.SE cs.LG 新提交

A Model-Driven Approach for Developing Families of Reinforcement Learning Environments

一种模型驱动的方法用于开发强化学习环境族

Xiaoran Liu, Istvan David

AI总结 提出一种模型驱动方法,通过混合遗传算法和模型转换自动生成强化学习训练环境族,以解决手动开发环境族耗时且易错的问题,并在野火缓解场景中验证了其有效性。

详情
AI中文摘要

虚拟训练环境是软件密集型系统,强化学习(RL)智能体在其中学习、适应并展示有意义的行为。虚拟训练环境为在现实环境中训练智能体提供了一种安全且成本效益高的替代方案。然而,为了收敛,大多数现实的RL问题需要在多个相似但略有不同的环境中进行训练——即环境变体族。环境族的典型开发过程是一项劳动密集型且容易出错的手动工作,难以扩展。为了缓解这些问题,本文提出了一种模型驱动的方法来开发RL训练环境族。为了获得环境族,我们开发了一种方法和原型工具。在我们的方法中,一种混合遗传算法——基于种群的全局搜索和启发式局部搜索的结合——生成环境族。变异和约束被表达为模型转换,并通过最先进的模型转换引擎操作化为搜索过程。我们在野火缓解场景和课程学习(一种依赖于环境族的特定学习范式)中展示了我们方法的有效性。

英文摘要

Virtual training environments are software-intensive systems in which reinforcement learning (RL) agents learn, adapt, and demonstrate meaningful behavior. Virtual training environments offer a safe and cost-efficient alternative to training agents in real-world settings. However, to converge, most realistic RL problems require training in multiple, mostly similar but slightly different environments - i.e., families of environment variants. The typical development process of environment families is a labor-intensive and error-prone manual endeavor that does not scale well. To alleviate these issues, in this paper, we propose a model-driven approach for developing families of RL training environments. To obtain the family of environments, we develop an approach and prototype tool. In our approach, a hybrid genetic algorithm - a combination of population-based global search and heuristic local search - generates environment families. Mutations and constraints are expressed as model transformations and are operationalized into a search process by a state-of-the-art model transformation engine. We demonstrate the soundness of our approach in a wildfire mitigation scenario and curriculum learning - a particular learning paradigm that relies on environment families.

2606.20151 2026-06-19 cs.NE cs.AI 新提交

Hybrid ANN-SNN Pipeline with Local Plasticity

混合ANN-SNN流水线与局部可塑性

Denis Larionov, Khairutin Shtanchaev, Mikhail Kiselev, Mikhail Korovin, Ivan Tugoy

AI总结 提出一种混合ANN-SNN流水线,利用预训练ANN的丰富嵌入实现高性能SNN,通过速率编码和局部学习规则训练,在64类ImageNet上达到99.09%准确率。

Comments 9 pages, 4 figues, source-code available

详情
AI中文摘要

本文提出了一种混合ANN-SNN流水线,有效利用预训练人工神经网络(ANN)的丰富嵌入来实现高性能脉冲神经网络(SNN)。该架构将预训练的EfficientNet编码器与CoLaNET脉冲分类器耦合。我们通过速率编码将编码器的激活转换为脉冲序列,并使用局部、生物启发的学习规则训练后续的SNN分类器,绕过了端到端的梯度传播。该方法在64类ImageNet基准测试中达到了99.09%的准确率,展现了与传统深度网络相当的性能。该工作为将强大的预训练编码器适应于下游脉冲神经网络任务提供了一种生物上合理且高效的框架。

英文摘要

This work proposes a hybrid ANN-SNN pipeline that effectively leverages the rich embeddings of pretrained artificial neural networks (ANNs) to enable high-performance spiking neural networks (SNNs). The architecture couples a pretrained EfficientNet encoder with a CoLaNET spiking classifier. We convert the encoder's activations into spike trains via rate-coding and train the subsequent SNN classifier using local, biologically inspired learning rules, bypassing end-to-end gradient propagation. This approach achieves 99.09% accuracy on a 64-class ImageNet benchmark, demonstrating performance on par with conventional deep networks. The work presents a biologically plausible and efficient framework for adapting powerful pretrained encoders to downstream spiking neural network tasks.

2606.19975 2026-06-19 cs.CY cs.AI 新提交

The Algorithmic-Human Manager: AI, Apps, and Workers in the Indian Gig Economy

算法-人类管理者:印度零工经济中的AI、应用程序与工人

Omir Kumar, Krishnan Narayanan

AI总结 本文研究AI和数字技术对印度蓝领零工经济中算法管理的影响,发现其虽扩大就业机会但引发公平性、透明度和工人尊严问题,提出算法-人类管理者混合治理模型。

Comments Published by the Centre for Responsible AI (CeRAI) at IIT Madras

详情
AI中文摘要

本文考察了人工智能和数字技术对印度蓝领零工经济的影响,重点关注算法管理——即在基于位置的服务(如拼车和配送)中使用自动化系统来分配、监控和评估工作。采用社会正义框架和混合方法(包括对16名零工工人和21名关键利益相关者的访谈),研究揭示了一个双重现实:虽然AI驱动的系统扩大了工作机会并产生了运营效率,但它们同时引入了与公平、透明度和工人尊严相关的重大挑战。关键发现表明,算法系统设计上不透明,产生不公平的结果,并且其结构不能为额外劳动提供相应报酬。研究倡导一种务实的混合治理模型——算法-人类管理者框架,其中技术效率和人类问责制共同运作而非对立。研究结果对政策制定者、平台公司以及致力于为印度和全球南方的零工经济设计公平AI治理框架的民间社会组织具有启示意义。

英文摘要

This paper examines the impact of artificial intelligence and digital technologies on the blue-collar gig economy in India, focusing on algorithmic management. This paper examines the impact of artificial intelligence and digital technologies on the blue collar gig economy in India, focusing on algorithmic management he use of automated systems to allocate, monitor, and evaluate work in location-based services such as ride sharing and delivery. Using a social justice framework and a mixed-methods approach comprising interviews with 16 gig workers and 21 key stakeholders, the study uncovers a dual reality: while AI-powered systems expand access to work and generate operational efficiencies, they simultaneously introduce significant challenges related to fairness, transparency, and worker dignity. Key findings reveal that algorithmic systems are opaque by design, produce inequitable outcomes, and are not structured to reward additional labour with proportionate pay. The study advocates for a pragmatic hybrid governance model an Algorithmic Human Manager framework in which technological efficiency and human accountability operate together rather than in opposition. The findings carry implications for policymakers, platform companies, and civil society organizations working to design equitable AI governance frameworks for the gig economy in India and across the Global South.

2606.19799 2026-06-19 cs.SE cs.LG 新提交

The Hidden Environmental Cost of Poor Coding Practices in TensorFlow and Keras Applications: A Study on Resource Leaks and Carbon Emissions

TensorFlow和Keras应用中不良编码实践的隐藏环境成本:资源泄漏与碳排放研究

Bashar Abdallah, Gustavo Santos, Rola Al Bataineh, Alain Abran, Mohammad Hamdaqa

AI总结 研究TensorFlow/Keras中两种资源泄漏气味(IMR和UTR)对能耗和碳排放的影响,实验表明两者分别增加约32%和46%的电力消耗,证明资源泄漏显著降低ML能效并增加环境负担。

详情
AI中文摘要

效率和可持续性是机器学习(ML)应用开发和部署中的关键考量。在影响可持续性的因素中,ML代码中的资源泄漏可能引入隐藏的低效率,从而增加能源消耗和CO2排放。尽管如此,量化其环境影响的实证证据仍然有限。这篇新兴结果论文对两种常见的资源泄漏气味,即不当模型重用(IMR)和未释放张量引用(UTR),及其对TensorFlow和Keras工作负载中能源消耗和CO2排放的影响进行了初步实证研究。通过执行相同的训练任务,并与无气味基线进行比较,对每种气味进行了受控实验。我们的初步结果表明,两种气味都持续增加了估计的用电量和碳排放。IMR和UTR分别使电力消耗增加约32%和46%,CO2排放也成比例增加。配对统计检验表明这些差异是系统性的且具有统计显著性,提供了初步的实证证据,表明资源泄漏气味可能降低ML的能效和环境可持续性。这些发现表明,资源泄漏气味对软件质量和可持续性构成可衡量的风险,强调了将资源生命周期管理和能效考虑纳入ML开发的重要性。

英文摘要

Efficiency and sustainability are critical considerations in the development and deployment of machine learning (ML) applications. Among the factors influencing sustainability, resource leaks in ML code can introduce hidden inefficiencies that elevate energy consumption and CO2 emissions. Despite this, empirical evidence quantifying their environmental impact remains limited. This emerging results paper presents an initial empirical investigation of two common resource-leak smells, namely Improper Model Reuse (IMR) and Unreleased Tensor References (UTR), and their impact on energy consumption and CO2 emissions in TensorFlow and Keras workloads. Controlled experiments were conducted for each smell by executing identical training tasks while comparing against a smell-free baseline. Our preliminary results show that both smells consistently increase estimated electricity usage and carbon emissions. IMR and UTR increased electricity consumption by approximately 32% and 46%, respectively, with proportional increases in CO2 emissions. Paired statistical tests indicate that these differences are systematic and statistically significant, providing initial empirical evidence that resource-leak smells may degrade ML energy efficiency and environmental sustainability. These findings suggest that resource-leak smells pose measurable risks to both software quality and sustainability, emphasizing the importance of integrating resource-lifecycle management and energy-efficiency considerations into ML development.

2606.19660 2026-06-19 cs.CR cs.CL 新提交

A Layered Security Framework Against Prompt Injection in RAG-Based Chatbots

基于RAG的聊天机器人中针对提示注入的分层安全框架

Gulshan Saleem, Nisar Ahmed, Muhammad Imran Zaman, Ali Hassan

AI总结 提出三层防御框架,通过输入过滤、上下文指令层级和输出审计,将提示注入攻击成功率从71.4%降至11.3%,误报率4.8%,延迟开销61.2毫秒。

Comments Submitted in ICCK Transactions on Information Security and Cryptography

详情
AI中文摘要

提示注入被OWASP Top 10 for LLM Applications列为大语言模型(LLM)部署中最关键的漏洞,然而现有防御措施仅在孤立的流水线阶段运行且不完整。输入过滤器无法检查检索到的文档,而输出监控器无法阻止恶意载荷到达模型。因此,检索增强生成(RAG)聊天机器人仍然容易受到间接注入攻击,其中被污染的知识库文档会损害每个检索到它的用户。我们提出了一个三层框架,在推理流水线中拦截直接和间接的提示注入。第一层使用基于规则的模式库和微调后的语义异常分类器筛选用户输入。第二层在上下文组装期间强制执行基于来源的指令层级,防止检索到的内容覆盖操作员策略。第三层在交付前使用策略规则引擎和语义漂移检测器审计模型输出。一个持续审计循环聚合结构化日志,并支持重新训练以适应新兴攻击模式。该框架与模型无关,作为中间件部署,无需修改底层LLM。在GPT-4o、Llama 3和Mistral 7B上对5,080个样本的评估显示,该框架将攻击成功率(ASR)从71.4%降至11.3%,比最佳单层基线高出27.3个百分点,比已发布的护栏系统高出23.8个百分点,同时保持4.8%的误报率和61.2毫秒的中位延迟开销。消融研究证实,所有三层提供互补保护,且其组合效果超过单个贡献的总和。

英文摘要

Prompt injection is ranked as the most critical vulnerability in large language model (LLM) deployments by the OWASP Top 10 for LLM Applications, yet existing defenses operate at isolated pipeline stages and remain incomplete. Input filters cannot inspect retrieved documents, while output monitors cannot prevent malicious payloads from reaching the model. Consequently, retrieval-augmented generation (RAG) chatbots remain vulnerable to indirect injection, where a poisoned knowledge-base document compromises every user whose query retrieves it. We present a three-layer framework that intercepts both direct and indirect prompt injection throughout the inference pipeline. Layer 1 screens user input using a rule-based pattern library and a fine-tuned semantic anomaly classifier. Layer 2 enforces a provenance-based instruction hierarchy during context assembly, preventing retrieved content from overriding operator policy. Layer 3 audits model output using a policy rule engine and semantic drift detector before delivery. A continuous audit loop aggregates structured logs and supports retraining to adapt the classifier to emerging attack patterns. The framework is model-agnostic and deploys as middleware without modifying the underlying LLM. Evaluation on 5,080 samples across GPT-4o, Llama 3, and Mistral 7B shows that the framework reduces Attack Success Rate (ASR) from 71.4\% to 11.3\%, outperforming the best single-layer baseline by 27.3 percentage points and a published guardrail system by 23.8 percentage points, while maintaining a 4.8\% false positive rate and a median latency overhead of 61.2 ms. Ablation studies confirm that all three layers provide complementary protection and that their combined effect exceeds the sum of individual contributions.