2604.28179 2026-05-01 cs.CV

Stop Holding Your Breath: CT-Informed Gaussian Splatting for Dynamic Bronchoscopy

Andrea Dunn Beltran, Daniel Rho, Aarav Mehta, Xinqi Xiong, Raúl San José Estépar, Ron Alterovitz, Marc Niethammer, Roni Sengupta

Affiliations: University of North Carolina at Chapel Hill; Harvard Medical School; University of California, San Diego

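AI summary: The paper replaces breath-hold protocols with a patient-specific respiratory model. Registering paired exhale-inhale CT scans reduces respiratory motion to a single scalar breathing phase, which enables continuous, deformation-aware reconstruction.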

Abstract

Bronchoscopic navigation relies on registering endoscopic video to a preoperative CT scan, but respiratory motion deforms the airway by 5-20 mm, creating CT-to-body divergence that limits localization accuracy. In practice, this is mitigated through breath-hold protocols, which attempt to match the intraoperative anatomy to a static CT, but are difficult to reproduce and disrupt clinical workflow. We propose to eliminate the need for breath-hold protocols by leveraging patient-specific respiratory modeling. Paired inhale-exhale CT scans, already acquired for planning, implicitly define the patient-specific deformation space of the breathing airway. By registering these scans, we reduce respiratory motion to a single scalar breathing phase per frame, constraining all reconstructions to anatomically observed configurations. We embed this representation within a mesh-anchored Gaussian splatting framework, where a lightweight estimator infers breathing phase directly from endoscopic RGB, enabling continuous, deformation-aware reconstruction throughout the respiratory cycle without breath-holds or external sensing. To enable quantitative evaluation, we introduce RESPIRE, a physically grounded bronchoscopy simulation pipeline with per-frame ground truth for geometry, pose, breathing phase, and deformation. Experiments on RESPIRE show that our approach achieves geometrically faithful reconstruction, over 20x faster training, and 1.22 mm target localization accuracy (within the 3 mm clinically relevant tolerance), outperforming unconstrained single-CT baselines. Please check out our website for additional visuals: https://asdunnbe.github.io/RESPIRE/
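As a minimal sketch of how one scalar phase can index the patient-specific deformation space, the snippet below assumes (our assumption, not necessarily the authors' model) that airway mesh vertices move linearly along the displacement field obtained by registering the exhale CT to the inhale CT, and that each Gaussian is anchored to a mesh triangle through fixed barycentric weights. The helpers `deform_airway` and `anchored_centers` are hypothetical names.

```python
import numpy as np

# Hypothetical helpers (not from the paper): linear phase blend + triangle anchoring.

def deform_airway(exhale_vertices, exhale_to_inhale_disp, phase):
    """Vertex positions at breathing phase `phase` (0 = exhale CT, 1 = inhale CT).

    Clipping keeps every reconstruction inside the range spanned by the two
    observed scans, mirroring the idea of staying within observed anatomy.
    """
    phase = float(np.clip(phase, 0.0, 1.0))
    return exhale_vertices + phase * exhale_to_inhale_disp

def anchored_centers(vertices, faces, bary):
    """Gaussian centers tied to mesh triangles (one Gaussian per row).

    faces: (M, 3) vertex indices; bary: (M, 3) barycentric weights summing to 1.
    Because the centers depend on the deformed vertices, they follow the airway.
    """
    tri = vertices[faces]                      # (M, 3, 3): triangle corner coordinates
    return np.einsum("mk,mkc->mc", bary, tri)  # weighted sum of corners

# One triangle whose vertices all move +2 mm along z between exhale and inhale.
verts_exhale = np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [0.0, 3.0, 0.0]])
disp = np.array([[0.0, 0.0, 2.0]] * 3)
faces = np.array([[0, 1, 2]])
bary = np.array([[1 / 3, 1 / 3, 1 / 3]])

mid = deform_airway(verts_exhale, disp, phase=0.5)
print(anchored_centers(mid, faces, bary))  # centroid near (1, 1, 1): lifted by 1 mm
```

In this framing, a per-frame phase estimator only has to output one number, and the geometry that number produces is always a blend of the two scanned configurations.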

2604.28178 2026-05-01 cs.AI

LLM as Clinical Graph Structure Refiner: Enhancing Representation Learning in EEG Seizure Diagnosis

Lincan Li, Zheng Chen, Yushun Dong

Affiliations: Department of Computer Science, Florida State University; SANKEN, The University of Osaka

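AI summary: The paper uses large language models to refine graph structure and improve EEG representation learning for seizure diagnosis. A two-stage framework removes redundant edges, raising diagnostic accuracy and making the graph structure more meaningful.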

Comments This paper has been accepted at the 35th International Joint Conference on Artificial Intelligence (IJCAI-ECAI 2026)

Abstract

Electroencephalogram (EEG) signals are vital for automated seizure detection, but their inherent noise makes robust representation learning challenging. Existing graph construction methods, whether correlation-based or learning-based, often generate redundant or irrelevant edges due to the noisy nature of EEG data. This significantly impairs the quality of graph representation and limits downstream task performance. Motivated by the remarkable reasoning and contextual understanding capabilities of large language models (LLMs), we explore the idea of using LLMs as graph edge refiners. Specifically, we propose a two-stage framework: we first verify that LLM-based edge refinement can effectively identify and remove redundant connections, leading to significant improvements in seizure detection accuracy and more meaningful graph structures. Building on this insight, we further develop a robust solution where the initial graph is constructed using a Transformer-based edge predictor and a multilayer perceptron, assigning probability scores to potential edges and applying a threshold to determine their existence. The LLM then acts as an edge set refiner, making informed decisions based on both textual and statistical features of node pairs to validate the remaining connections. Extensive experiments on the TUSZ dataset demonstrate that our LLM-refined graph learning framework not only enhances task performance but also yields cleaner and more interpretable graph representations.
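The two-stage flow described above (threshold predicted edge probabilities, then let an LLM validate the surviving edges) can be sketched as follows. This is a hypothetical outline, not the authors' code: `edge_prompt` shows one possible way to combine textual features (electrode names) and statistical features (a correlation) of a node pair, and the `verifier` argument stands in for the LLM call, replaced here by a simple correlation rule so the example runs offline.

```python
from typing import Callable, Dict, List, Tuple

Edge = Tuple[int, int]

def initial_edges(edge_probs: Dict[Edge, float], threshold: float = 0.5) -> List[Edge]:
    """Stage 1: keep candidate edges whose predicted probability passes the threshold."""
    return sorted(e for e, p in edge_probs.items() if p >= threshold)

def edge_prompt(edge: Edge, names: List[str], corr: Dict[Edge, float]) -> str:
    """Hypothetical prompt mixing textual (channel names) and statistical features."""
    i, j = edge
    return (f"EEG channels {names[i]} and {names[j]} have Pearson correlation "
            f"{corr[edge]:.2f}. Should this edge be kept for seizure detection? "
            f"Answer yes or no.")

def refine_edges(edges: List[Edge], verifier: Callable[[Edge], bool]) -> List[Edge]:
    """Stage 2: a verifier (an LLM in the paper) validates each remaining edge."""
    return [e for e in edges if verifier(e)]

names = ["FP1", "F7", "T3", "T5"]
probs = {(0, 1): 0.9, (0, 2): 0.4, (1, 2): 0.7, (2, 3): 0.6}
corr = {(0, 1): 0.82, (0, 2): 0.35, (1, 2): 0.08, (2, 3): 0.55}

candidates = initial_edges(probs)          # [(0, 1), (1, 2), (2, 3)]
print(edge_prompt(candidates[0], names, corr))
# Offline stand-in for the LLM (hypothetical rule): keep edges with |corr| >= 0.3.
kept = refine_edges(candidates, lambda e: abs(corr[e]) >= 0.3)
print(kept)                                # [(0, 1), (2, 3)]
```

Keeping the verifier as a plain callable makes it easy to swap the offline rule for a real LLM client that sends `edge_prompt(...)` and parses a yes/no answer.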

2604.28175 2026-05-01 cs.LG

Strait: Perceiving Priority and Interference in ML Inference Serving

Haidong Zhao, Nikolaos Georgantas

Affiliations: Inria & Sorbonne University, Paris, France; Inria, Paris, France

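AI summary: Strait combines priority-aware scheduling with interference prediction to raise the deadline satisfaction rate of high-priority tasks while incurring acceptable overhead on low-priority tasks.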

Abstract

Machine learning (ML) inference serving systems host deep neural network (DNN) models and schedule incoming inference requests across deployed GPUs. However, limited support for task prioritization and insufficient latency estimation under concurrent execution may restrict their applicability in on-premises scenarios. We present Strait, a serving system designed to enhance deadline satisfaction for dual-priority inference traffic under high GPU utilization. To improve latency estimation, Strait models potential contention during data transfer and accounts for kernel execution interference through an adaptive prediction model. By drawing on these predictions, it performs priority-aware scheduling to deliver differentiated handling. Evaluation results under intense workloads suggest that Strait reduces deadline violations for high-priority tasks by 1.02 to 11.18 percentage points while incurring acceptable costs on low-priority tasks. Compared to software-defined preemption approaches, Strait also exhibits more equitable performance.
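The abstract does not spell out Strait's scheduling policy, so the following is only a hypothetical sketch of the general pattern it describes: interference-aware latency predictions feeding a priority-aware queue. The interference model here is an assumed linear slowdown per concurrently running kernel (`alpha`), requests execute one after another, and the queue serves high-priority work first and then orders by earliest deadline. None of these names or parameters come from the paper.

```python
import heapq
from typing import List, Tuple

HIGH, LOW = 0, 1  # smaller value is served first

def predicted_latency(solo_ms: float, concurrent: int, alpha: float) -> float:
    """Assumed interference model: each co-running kernel adds alpha x solo latency."""
    return solo_ms * (1.0 + alpha * concurrent)

def schedule(requests: List[Tuple[str, int, float, float]],
             concurrent: int = 2, alpha: float = 0.15):
    """requests: (name, priority, deadline_ms, solo_latency_ms).

    Returns (name, predicted_finish_ms, meets_deadline) in dispatch order.
    """
    heap = []
    for seq, (name, prio, deadline, solo) in enumerate(requests):
        # seq breaks ties so equal (priority, deadline) keep arrival order
        heapq.heappush(heap, (prio, deadline, seq, name, solo))
    clock, plan = 0.0, []
    while heap:
        _prio, deadline, _seq, name, solo = heapq.heappop(heap)
        clock += predicted_latency(solo, concurrent, alpha)
        plan.append((name, clock, clock <= deadline))
    return plan

reqs = [("lo-a", LOW, 50.0, 10.0), ("hi-a", HIGH, 30.0, 10.0), ("hi-b", HIGH, 20.0, 5.0)]
plan = schedule(reqs)
for name, finish, ok in plan:
    print(f"{name}: predicted finish {finish:.1f} ms, deadline met: {ok}")
```

A real system would also predict contention during data transfer and update its interference model online, which is what the abstract attributes to Strait's adaptive predictor.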

2604.28169 2026-05-01 cs.CV cs.AI cs.LG

PhyCo: Learning Controllable Physical Priors for Generative Motion

Sriram Narayanan, Ziyu Jiang, Srinivasa Narasimhan, Manmohan Chandraker

Affiliations: Carnegie Mellon University; NEC Labs America; UC San Diego

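AI summary: PhyCo adds physically controllable conditioning to a generative video model, improving physical consistency and controllability in video generation without a simulator or geometry reconstruction.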

Comments CVPR 2026. Project Page: https://phyco-video.github.io/

Abstract

Modern video diffusion models excel at appearance synthesis but still struggle with physical consistency: objects drift, collisions lack realistic rebound, and material responses seldom match their underlying properties. We present PhyCo, a framework that introduces continuous, interpretable, and physically grounded control into video generation. Our approach integrates three key components: (i) a large-scale dataset of over 100K photorealistic simulation videos where friction, restitution, deformation, and force are systematically varied across diverse scenarios; (ii) physics-supervised fine-tuning of a pretrained diffusion model using a ControlNet conditioned on pixel-aligned physical property maps; and (iii) VLM-guided reward optimization, where a fine-tuned vision-language model evaluates generated videos with targeted physics queries and provides differentiable feedback. This combination enables a generative model to produce physically consistent and controllable outputs through variations in physical attributes, without any simulator or geometry reconstruction at inference. On the Physics-IQ benchmark, PhyCo significantly improves physical realism over strong baselines, and human studies confirm clearer and more faithful control over physical attributes. Our results demonstrate a scalable path toward physically consistent, controllable generative video models that generalize beyond synthetic training environments.

2604.28161 2026-05-01 cs.RO

RopeDreamer: A Kinematic Recurrent State Space Model for Dynamics of Flexible Deformable Linear Objects

Tim Missal, Lucas Domingues, Berk Guler, Simon Manschitz, Jan Peters, Paula Dornhofer Paro Costa

Affiliations: Technical University of Darmstadt; School of Electrical and Computer Engineering, Universidade Estadual de Campinas (UNICAMP); Instituto de Pesquisas Eldorado; Honda Research Institute Europe GmbH; German Research Center for Artificial Intelligence (DFKI); Robotics Institute Germany (RIG); Centre for Cognitive Science; Artificial Intelligence Lab, Recod.ai

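AI summary: The paper proposes a latent dynamics framework that combines a recurrent state space model with a quaternionic kinematic chain representation to predict the state of flexible deformable linear objects. Constraining predictions to a physically valid manifold reduces self-intersections and non-physical deformations and improves long-horizon prediction.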

Abstract

The robotic manipulation of Deformable Linear Objects (DLOs) is a fundamental challenge due to the high-dimensional, non-linear dynamics of flexible structures and the complexity of maintaining topological integrity during contact-rich tasks. While recent data-driven methods have utilized Recurrent and Graph Neural Networks for dynamics modeling, they often struggle with self-intersections and non-physical deformations, such as tangling and link stretching. In this paper, we propose a latent dynamics framework that combines a Recurrent State Space Model with a Quaternionic Kinematic Chain representation to enable robust, long-term forecasting of DLO states. By encoding the DLO as a sequence of relative rotations (quaternions) rather than independent Cartesian positions, we inherently constrain the model to a physically valid manifold that preserves link-length constancy. Furthermore, we introduce a dual-decoder architecture that decouples state reconstruction from future-state prediction, forcing the latent space to capture the underlying physics of deformation. We evaluate our approach on a large-scale simulated dataset of complex pick-and-place trajectories involving self-intersections. Our results demonstrate that the proposed model achieves a 40.52% reduction in open-loop prediction error over 50-step horizons compared to the state-of-the-art baseline, while reducing inference time by 31.17%. Our model further maintains superior topological consistency in scenarios with multiple crossings, proving its efficacy as a compositional primitive for long-horizon manipulation planning.
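To see why encoding a rope as relative rotations keeps link lengths constant, consider forward kinematics for a chain: each link is a fixed-length segment whose direction is the composition of all relative rotations up to that link. The sketch below is a generic illustration under our own conventions (Hamilton product, `(w, x, y, z)` order, rest direction along +x); it is not RopeDreamer's decoder, and the function names are ours.

```python
import numpy as np

def quat_mul(q, r):
    """Hamilton product of quaternions in (w, x, y, z) order."""
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = r
    return np.array([
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    ])

def rotate(q, v):
    """Rotate 3-vector v by unit quaternion q, computed as q * (0, v) * conj(q)."""
    q_conj = q * np.array([1.0, -1.0, -1.0, -1.0])
    return quat_mul(quat_mul(q, np.concatenate(([0.0], v))), q_conj)[1:]

def chain_positions(rel_quats, link_length, rest_dir=(1.0, 0.0, 0.0)):
    """Node positions of a kinematic chain from per-link relative rotations."""
    rest_dir = np.asarray(rest_dir, dtype=float)
    q_abs = np.array([1.0, 0.0, 0.0, 0.0])
    nodes = [np.zeros(3)]
    for q in rel_quats:
        q = np.asarray(q, dtype=float)
        q = q / np.linalg.norm(q)          # project any raw output onto unit quaternions
        q_abs = quat_mul(q_abs, q)         # compose with the parent link's frame
        q_abs = q_abs / np.linalg.norm(q_abs)  # guard against numerical drift
        nodes.append(nodes[-1] + link_length * rotate(q_abs, rest_dir))
    return np.stack(nodes)

# Even arbitrary (random, unnormalized) "predictions" yield equal link lengths.
rng = np.random.default_rng(0)
nodes = chain_positions(rng.normal(size=(20, 4)), link_length=0.05)
print(np.linalg.norm(np.diff(nodes, axis=0), axis=1).round(6))
```

Predicting Cartesian node positions directly offers no such guarantee, which is the stretching failure mode the abstract mentions.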

2604.28159 2026-05-01 cs.CV

Continuous-tone Simple Points: An $\ell_0$-Norm of Cyclic Gradient for Topology-Preserving Data-Driven Image Segmentation

Wenxiao Li, Faqiang Wang, Yuping Duan, Li Cui, Liqiang Zhang, Jun Liu

Affiliations: Laboratory of Mathematics and Complex Systems (Ministry of Education), School of Mathematical Sciences, Beijing Normal University; State Key Laboratory of Remote Sensing Science, Faculty of Geographical Science, Beijing Normal University

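AI summary: The paper computes simple points directly on continuous-valued images, enabling differentiable topological inference that improves the topological consistency and structural accuracy of image segmentation.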

Abstract

Topological features play an essential role in ensuring geometric plausibility and structural consistency in image analysis tasks such as segmentation and skeletonization. However, integrating topology-preserving learning based on simple points into deep learning tasks remains challenging, as existing simple point detection methods are confined to binary images and are non-differentiable, rendering them incompatible with gradient-based optimization in modern deep learning. Moreover, morphological and purely data-driven approaches often fail to guarantee topological consistency. To address these limitations, we propose a novel method that directly computes simple points on continuous-valued images, enabling differentiable topological inference. Building on this theory, we develop an efficient skeleton extraction algorithm that preserves topological structures in binary and continuous-valued images. Furthermore, we design a variational model that enforces topological constraints by preserving topologically non-removable (i.e., non-simple) points, which can be seamlessly integrated into any deep segmentation network with softmax or sigmoid outputs. Experimental results demonstrate that the proposed approach effectively improves topological integrity and structural accuracy across multiple benchmarks. The code is available at https://github.com/levnsio/CSP.
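For reference, the binary notion that the paper generalizes can be stated concretely. In a 2D binary image with 8-connectivity for the foreground and 4-connectivity for the background, a foreground pixel is simple (removable without changing topology) when its 3×3 neighborhood, excluding the pixel itself, contains exactly one 8-connected foreground component and exactly one 4-connected background component touching the pixel's 4-neighbors. The check below implements that classical test; it is a discrete, non-differentiable rule, which is the limitation the paper's continuous-valued formulation targets. The function names are ours.

```python
from collections import deque
import numpy as np

N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]
FOUR_NEIGHBORS = {(0, 1), (1, 0), (1, 2), (2, 1)}  # 4-neighbors of the center (1, 1)

def _components(cells, steps):
    """Connected components of a set of (row, col) cells under the given adjacency."""
    remaining, comps = set(cells), []
    while remaining:
        seed = remaining.pop()
        comp, queue = {seed}, deque([seed])
        while queue:
            r, c = queue.popleft()
            for dr, dc in steps:
                nb = (r + dr, c + dc)
                if nb in remaining:  # the center is never in the set, so paths avoid it
                    remaining.remove(nb)
                    comp.add(nb)
                    queue.append(nb)
        comps.append(comp)
    return comps

def is_simple_binary(patch):
    """Classical (8, 4) simple-point test on a 3x3 binary patch with a foreground center."""
    patch = np.asarray(patch, dtype=bool)
    if patch.shape != (3, 3) or not patch[1, 1]:
        raise ValueError("expected a 3x3 patch whose center is foreground")
    ring = [(r, c) for r in range(3) for c in range(3) if (r, c) != (1, 1)]
    fg = [p for p in ring if patch[p]]
    bg = [p for p in ring if not patch[p]]
    n_fg = len(_components(fg, N8))
    n_bg = sum(1 for comp in _components(bg, N4) if comp & FOUR_NEIGHBORS)
    return n_fg == 1 and n_bg == 1

print(is_simple_binary([[0, 0, 0], [0, 1, 1], [0, 0, 0]]))  # end of a line
print(is_simple_binary([[0, 0, 0], [1, 1, 1], [0, 0, 0]]))  # middle of a line
```

Removing the middle pixel of a line would split one component into two, so it is not simple; removing a line's endpoint only shortens it, so it is.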

2604.28156 2026-05-01 cs.RO cs.AI cs.LG

FlexiTac: A Low-Cost, Open-Source, Scalable Tactile Sensing Solution for Robotic Systems

Binghao Huang, Yunzhu Li

Affiliations: Columbia University

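AI summary: FlexiTac is a low-cost, open-source, scalable tactile sensing module that captures dense tactile signals with flexible sensor pads and a compact readout board, and supports modern tactile learning pipelines.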

Comments Website: https://flexitac.github.io/

Abstract

We present FlexiTac, a low-cost, open-source, and scalable piezoresistive tactile sensing solution designed for robotic end-effectors. FlexiTac is a practical "plug-in" module consisting of (i) thin, flexible tactile sensor pads that provide dense tactile signals and (ii) a compact multi-channel readout board that streams synchronized measurements for real-time control and large-scale data collection. FlexiTac pads adopt a sealed three-layer laminate stack (FPC-Velostat-FPC) with electrode patterns directly integrated into flexible printed circuits, substantially improving fabrication throughput and repeatability while maintaining mechanical compliance for deployment on both rigid and soft grippers. The readout electronics use widely available, low-cost components and stream tactile signals to a host computer at 100 Hz via serial communication. Across multiple configurations, including fingertip pads and larger tactile mats, FlexiTac can be mounted on diverse platforms without major mechanical redesign. We further show that FlexiTac supports modern tactile learning pipelines, including 3D visuo-tactile fusion for contact-aware decision making, cross-embodiment skill transfer, and real-to-sim-to-real fine-tuning with GPU-parallel tactile simulation. Our project page is available at https://flexitac.github.io/.

2604.28149 2026-05-01 cs.LG

Explainable Load Forecasting with Covariate-Informed Time Series Foundation Models

Matthias Hertel, Alexandra Nikoltchovska, Sebastian Pütz, Ralf Mikut, Benjamin Schäfer, Veit Hagenmeyer

Affiliations: Karlsruhe Institute of Technology

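AI summary: The paper proposes an efficient method for computing SHAP values to make time series foundation models (TSFMs) more transparent, and evaluates two TSFMs on a load forecasting task to show their reliability and interpretability for power systems.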

DOI: 10.1145/3744255.3811724

Abstract

Time Series Foundation Models (TSFMs) have recently emerged as general-purpose forecasting models and show considerable potential for applications in energy systems. However, applications in critical infrastructure like power grids require transparency to ensure trust and reliability and cannot rely on pure black-box models. To enhance the transparency of TSFMs, we propose an efficient algorithm for computing Shapley Additive Explanations (SHAP) tailored to these models. The proposed approach leverages the flexibility of TSFMs with respect to input context length and provided covariates. This property enables efficient temporal and covariate masking (selectively withholding inputs), allowing for a scalable explanation of model predictions using SHAP. We evaluate two TSFMs, Chronos-2 and TabPFN-TS, on a day-ahead load forecasting task for a transmission system operator (TSO). In a zero-shot setting, both models achieve predictive performance competitive with a Transformer model trained specifically on multiple years of TSO data. The explanations obtained through our proposed approach align with established domain knowledge, particularly as the TSFMs appropriately use weather and calendar information for load prediction. Overall, we demonstrate that TSFMs can serve as transparent and reliable tools for operational energy forecasting.
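The masking idea can be illustrated with exact Shapley values over a few input groups: the value of a coalition is the model's forecast when only that coalition's inputs are provided and the rest are withheld, which TSFMs permit because they accept variable context and optional covariates. This sketch enumerates every coalition, so its cost is exponential in the number of groups; it does not reproduce the paper's efficient algorithm, and `toy_forecast` is a hypothetical stand-in for a call to a model such as Chronos-2 or TabPFN-TS.

```python
from itertools import combinations
from math import factorial

def exact_shapley(players, value):
    """Exact Shapley values; value(frozenset_of_players) returns the model output."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(n):  # size of the coalition that excludes player i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi

def toy_forecast(provided):
    """Hypothetical stand-in for a TSFM forecast with some inputs withheld."""
    load = 100.0                               # forecast with every input withheld
    if "history" in provided:
        load += 20.0
    if "weather" in provided:
        load += 8.0
    if "calendar" in provided:
        load -= 5.0
    if {"weather", "calendar"} <= provided:    # interaction between two covariates
        load += 2.0
    return load

phi = exact_shapley(["history", "weather", "calendar"], toy_forecast)
print({k: round(v, 6) for k, v in phi.items()})
```

The interaction term is split evenly between weather and calendar (9 and -4, alongside 20 for history), and the three values sum to the gap between the fully informed and fully masked forecasts, which is the efficiency property that makes SHAP attributions add up.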

2604.28148 2026-05-01 cs.RO eess.IV physics.ins-det

Design and Characteristics of a Thin-Film ThermoMesh for the Efficient Embedded Sensing of a Spatio-Temporally Sparse Heat Source

Sajjad Boorghan Farahan, Ahmed Alajlouni, Jingzhou Zhao

Affiliations: Department of Mechanical Engineering, State University of New York at Binghamton, Binghamton, NY

Comments 45 pages, 13 figures, 63 references, under review in Sensors and Actuators A: Physical

2604.28147 2026-05-01 cs.CL

On the Proper Treatment of Units in Surprisal Theory

Samuel Kiegeland, Vésteinn Snæbjarnarson, Tim Vieira, Ryan Cotterell

Affiliations: ETH Zürich; University of Copenhagen

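AI summary: The paper examines how linguistic units should be handled in surprisal theory, argues that defining the unit should be kept separate from choosing the predictive region, and proposes a unified framework that works with arbitrary unit inventories.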

Comments ACL 2026 (main conference)

2604.28144 2026-05-01 cs.LG math.OC

Global Optimality for Constrained Exploration via Penalty Regularization

Florian Wolf, Ilyas Fatkhullin, Niao He

Affiliations: (1) The Computing & Mathematical Sciences Department, California Institute of Technology, Pasadena, CA; (2) Department of Computer Science, ETH Zurich, Switzerland; (3) ETH AI Center, ETH Zurich, Switzerland

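AI summary: The paper proposes the Policy Gradient Penalty (PGP) method, which addresses constrained exploration through quadratic-penalty regularization and achieves global convergence to a near-optimal policy.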

Abstract

Efficient exploration is a central problem in reinforcement learning and is often formalized as maximizing the entropy of the state-action occupancy measure. While unconstrained maximum-entropy exploration is relatively well understood, real-world exploration is often constrained by safety, resource, or imitation requirements. This constrained setting is particularly challenging because entropy maximization lacks additive structure, rendering Bellman-equation-based methods inapplicable. Moreover, scalable approaches require policy parameterization, inducing non-convexity in both the objective and the constraints. To our knowledge, the only prior model-free policy-gradient approach for this setting under general policy parameterization is due to Ying et al. (2025). Unfortunately, their guarantees are limited to weak regret and ergodic averages, which do not imply that the final output is a single deployable policy that is near-optimal and nearly feasible. In this work, we take a different approach and propose the Policy Gradient Penalty (PGP) method, a single-loop policy-space method that enforces general convex occupancy-measure constraints via quadratic-penalty regularization. PGP constructs pseudo-rewards that yield gradient estimates of the penalized objective, subsequently exploiting the classical Policy Gradient Theorem. We further establish the regularity of the penalized objective, providing the smoothness properties needed to justify the convergence of PGP. Leveraging hidden convexity and strong duality, we then establish global last-iterate convergence guarantees, attaining an $\varepsilon$-optimal constrained entropy value with $\varepsilon$-bounded constraint violation despite policy-induced non-convexity. We validate PGP through ablations on a grid-world benchmark and further demonstrate scalability on two challenging continuous-control tasks.
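A toy version of the quadratic-penalty idea, stripped of the MDP, may help: take a single state with four actions, so the "occupancy measure" is just a distribution `d` over actions parameterized by softmax logits; maximize its entropy; and enforce the linear constraint `c @ d <= b` through the penalty `(rho / 2) * max(0, c @ d - b) ** 2`. Plain gradient ascent on the logits then settles on a near-feasible distribution whose violation shrinks as `rho` grows, as is typical of quadratic penalties. Everything here (the one-state setting, `rho`, step size, iteration count) is our assumption for illustration; PGP itself estimates such gradients from sampled trajectories via pseudo-rewards and the Policy Gradient Theorem.

```python
import numpy as np

def softmax(theta):
    z = np.exp(theta - theta.max())  # shift for numerical stability
    return z / z.sum()

def penalized_max_entropy(c, b, rho=100.0, lr=0.2, steps=20000):
    """Gradient ascent on softmax logits for H(d) - (rho/2) * max(0, c@d - b)^2 (toy)."""
    c = np.asarray(c, dtype=float)
    theta = np.zeros_like(c)
    for _ in range(steps):
        d = softmax(theta)
        violation = max(0.0, float(c @ d) - b)
        # dH/dd = -(log d + 1); the penalty contributes -rho * violation * c
        grad_d = -(np.log(d) + 1.0) - rho * violation * c
        # chain rule through softmax: J = diag(d) - d d^T, so J @ g = d * (g - d @ g)
        grad_theta = d * (grad_d - d @ grad_d)
        theta += lr * grad_theta
    return softmax(theta)

# Constraint: action 0 may be used at most 10% of the time (uniform would give 25%).
d = penalized_max_entropy(c=[1.0, 0.0, 0.0, 0.0], b=0.1)
print(d.round(3))  # action 0 slightly above 0.1; the rest shared equally
```

With `rho = 100` the remaining violation is about 0.01; increasing `rho` tightens feasibility at the cost of a stiffer objective, which is the usual trade-off penalty methods must manage.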

2604.28136 2026-05-01 cs.CV

Beyond Pixel Fidelity: Minimizing Perceptual Distortion and Color Bias in Night Photography Rendering

Furkan Kınlı

Affiliations: Bahçeşehir University, Department of Artificial Intelligence Engineering, İstanbul, Türkiye

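AI summary: The paper proposes pHVI-ISPNet, which uses an improved HVI color space and four key optimizations to improve the visual quality and color consistency of night photography rendering; experiments report new best results on the CIEDE2000 color difference and LPIPS metrics.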

Comments 6 pages, 3 figures, Accepted to 2026 IEEE International Conference on Image Processing

2604.28126 2026-05-01 cs.CV cs.AI

Do Sparse Autoencoders Capture Concept Manifolds?

Usha Bhalla, Thomas Fel, Can Rager, Sheridan Feucht, Tal Haklay, Daniel Wurgaft, Siddharth Boppana, Matthew Kowal, Vasudev Shyam, Jack Merullo, Atticus Geiger, Ekdeep Singh Lubana

Affiliations: Harvard University; Northeastern University; Technion IIT; Stanford University

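AI summary: The paper studies whether sparse autoencoders capture concept manifolds, finds that existing SAEs recover continuous structure poorly, and argues that geometric objects, rather than individual directions, should be the basis of interpretability.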

Abstract

Sparse autoencoders (SAEs) are widely used to extract interpretable features from neural network representations, often under the implicit assumption that concepts correspond to independent linear directions. However, a growing body of evidence suggests that many concepts are instead organized along low-dimensional manifolds encoding continuous geometric relationships. This raises three basic questions: what does it mean for an SAE to capture a manifold, when do existing SAE architectures do so, and how? We develop a theoretical framework that answers these questions and show that SAEs can capture manifolds in two fundamentally different ways: globally, by allocating a compact group of atoms whose linear span contains the entire manifold, or locally, by distributing it across features that each selectively tile a restricted region of the underlying geometry. Empirically, we find that SAEs suboptimally recover continuous structures, mixing the global subspace and local tiling solutions in a fragmented regime we call dilution. This explains why manifold structure is rarely visible at the level of individual concepts and motivates post-hoc unsupervised discovery methods that search for coherent groups of atoms rather than isolated directions. More broadly, our results suggest that future representation learning methods should treat geometric objects, not just individual directions, as the basic units of interpretability.

2604.28115 2026-05-01 cs.RO cs.CV

FreeOcc: Training-Free Embodied Open-Vocabulary Occupancy Prediction

Zeyu Jiang, Changqing Zhou, Xingxing Zuo, Changhao Chen

Affiliations: The Hong Kong University of Science and Technology (Guangzhou); MBZUAI

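AI summary: FreeOcc performs open-vocabulary occupancy prediction without 3D annotations through a four-stage pipeline, more than doubling IoU and mIoU relative to prior self-supervised methods on EmbodiedOcc-ScanNet, and introduces the ReplicaOcc benchmark for evaluating performance in new environments.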

Comments RSS 2026

AI-generated abstract (translated from Chinese)

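Existing learning-based occupancy prediction methods rely on large-scale 3D annotations and generalize poorly. This paper proposes FreeOcc, a training-free framework for embodied open-vocabulary occupancy prediction from monocular or RGB-D sequences. Unlike prior methods that require voxel-level supervision and ground-truth camera poses, FreeOcc needs no 3D annotations, no ground-truth poses, and no learning stage. FreeOcc progressively builds a globally consistent occupancy map through a four-stage pipeline: a SLAM backbone estimates poses and sparse geometry; geometrically consistent Gaussian updates build a dense 3D Gaussian map; open-vocabulary semantics from an off-the-shelf vision-language model are associated with the Gaussian primitives; and a probabilistic Gaussian-to-occupancy projection produces dense voxel occupancy. Despite being entirely training-free and pose-agnostic, FreeOcc improves IoU and mIoU by more than 2x over prior self-supervised methods on EmbodiedOcc-ScanNet. We further introduce ReplicaOcc, a benchmark for indoor open-vocabulary occupancy prediction, and show that FreeOcc transfers zero-shot to new environments, significantly outperforming supervised and self-supervised baselines. Project page: https://the-masses.github.io/freeocc-web/.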
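FreeOcc's last stage, a probabilistic Gaussian-to-occupancy projection, could take several forms. One plausible choice, which is our assumption rather than the paper's stated rule, treats each Gaussian as independently occupying a voxel center with probability equal to its opacity times its unnormalized density there, and combines Gaussians with a noisy-OR. The function name `gaussians_to_occupancy` is hypothetical.

```python
import numpy as np

def gaussians_to_occupancy(voxel_centers, means, covs, opacities):
    """Occupancy probability per voxel from 3D Gaussians (hypothetical noisy-OR rule).

    voxel_centers: (V, 3); means: (G, 3); covs: (G, 3, 3); opacities: (G,) in [0, 1].
    Gaussian i occupies voxel v with probability alpha_i * exp(-0.5 * mahalanobis^2),
    and p(v) = 1 - prod_i (1 - that probability), i.e. independent events.
    """
    inv = np.linalg.inv(covs)                                  # (G, 3, 3) batched inverse
    diff = voxel_centers[:, None, :] - means[None, :, :]       # (V, G, 3)
    m2 = np.einsum("vgi,gij,vgj->vg", diff, inv, diff)         # squared Mahalanobis distances
    contrib = opacities[None, :] * np.exp(-0.5 * m2)           # (V, G), each in [0, 1]
    return 1.0 - np.prod(1.0 - contrib, axis=1)

voxels = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
means = np.zeros((1, 3))
covs = np.eye(3)[None] * 0.01                                  # 10 cm standard deviation
print(gaussians_to_occupancy(voxels, means, covs, np.array([0.8])))
```

A voxel at a Gaussian's center inherits its opacity, distant voxels fall to nearly zero, and overlapping Gaussians reinforce each other without exceeding probability 1.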

English abstract

Fine-tuning large language models (LLMs) on narrowly misaligned data generalizes to broadly misaligned behavior, a phenomenon termed emergent misalignment (EM). While prior work has found a correlation between harmful behavior and self-assessment in emergently misaligned models, it remains unclear how consistent this correspondence is across tasks and whether it varies across fine-tuning domains. We characterize the consistency of the EM persona by fine-tuning Qwen 2.5 32B Instruct on six narrowly misaligned domains (e.g., insecure code, risky financial advice, bad medical advice) and administering experiments including harmfulness evaluation, self-assessment, choosing between two descriptions of AI systems, output recognition, and score prediction. Our results reveal two distinct patterns: coherent-persona models, in which harmful behavior and self-reported misalignment are coupled, and inverted-persona models, which produce harmful outputs while identifying as aligned AI systems. These findings reveal a more fine-grained picture of the effects of emergent misalignment, calling into question the consistency of the EM persona.
