arXivDaily: arXiv daily academic digest, updated Monday through Friday
2605.11463 2026-05-13 cs.CV

Encore: Conditioning Trajectory Forecasting via Biased Ego Rehearsals

Conghao Wong, Ziqian Zou, Xinge You

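AI summary: This paper studies how to learn and represent agents' subjectivities in trajectory prediction, a challenging but crucial problem. The authors propose Encore, which introduces a biased ego rehearsal mechanism: from short-term observations, the model generates biased rehearsal trajectories for all participants in the scene and uses them as conditions to guide the final prediction, thereby simulating the subjective behavior of different agents more accurately. Experiments show performance gains on multiple datasets and offer a clear interpretation of subjectivity in trajectories.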

Abstract

Learning and representing the subjectivities of agents has become a challenging but crucial problem in the trajectory prediction task. Such subjectivities not only present specific spatial or temporal structures, but also are anisotropic for all interaction participants. Despite great efforts, it remains difficult to explicitly learn and forecast these subjectivities, let alone further modulate models' predictions through a specific ego's subjectivity. Inspired by prefactual thoughts in psychology and relevant theatrical concepts, we interpret such subjectivities in future trajectories as the continuous process from rehearsal to encore. In the rehearsal phase, the proposed ego predictor focuses on how each ego agent learns to derive and direct a set of explicitly biased rehearsal trajectories for all participants in the scene from the short-term observations. Then, these rehearsal trajectories serve as immediate controls to condition final predictions, providing direct yet distinct ego biases for the prediction network to simulate agents' various subjectivities. Experiments across datasets not only demonstrate a consistent improvement in the performance of the proposed Encore trajectory prediction model but also provide clear interpretability regarding subjectivities as biased ego rehearsals.

2605.11462 2026-05-13 cs.CV cs.AI

SpatialForge: Bootstrapping 3D-Aware Spatial Reasoning from Open-World 2D Images

Zishan Liu, Ruoxi Zang, Yanglin Zhang, Wei Liu, Yin Zhang, Jian Yao, Jiayin Zheng, Zhengzhe Liu

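AI summary: This work proposes SpatialForge, a scalable data synthesis pipeline that generates supervision for 3D spatial reasoning from open-world 2D images, addressing the weak spatial reasoning of current large vision-language models. By decomposing spatial reasoning into perception and relation and constructing structured supervision signals covering depth, layout, and viewpoint-dependent reasoning, the pipeline automatically produces high-quality spatial QA data. On this basis, the authors build SpatialForge-10M, a large-scale dataset of 10 million spatial QA pairs, and validate it on multiple spatial reasoning benchmarks, where it significantly improves the spatial reasoning ability of vision-language models.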

Abstract

Recent advancements in Large Vision-Language Models (VLMs) have demonstrated exceptional semantic understanding, yet these models consistently struggle with spatial reasoning, often failing at fundamental geometric tasks such as depth ordering and precise coordinate grounding. Recent efforts introduce spatial supervision from scene-centric datasets (e.g., multi-view scans or indoor video), but are constrained by the limited number of underlying scenes. As a result, the scale and diversity of such data remain significantly smaller than those of web-scale 2D image collections. To address this limitation, we propose SpatialForge, a scalable data synthesis pipeline that transforms in-the-wild 2D images into spatial reasoning supervision. Our approach decomposes spatial reasoning into perception and relation, and constructs structured supervision signals covering depth, layout, and viewpoint-dependent reasoning, with automatic verification to ensure data quality. Based on this pipeline, we build SpatialForge-10M, a large-scale dataset containing 10 million spatial QA pairs. Extensive experiments across multiple spatial reasoning benchmarks demonstrate that training on SpatialForge-10M significantly improves the spatial reasoning ability of standard VLMs, highlighting the effectiveness of scaling 2D data for 3D-aware spatial reasoning.

2605.11460 2026-05-13 cs.LG cs.SY eess.SY

Beyond Prediction: Interval Neural Networks for Uncertainty-Aware System Identification

Mehmet Ali Ferah, Tufan Kumbasar

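AI summary: This paper proposes an Interval Neural Network (INN) framework for uncertainty-aware system identification, addressing the inability of traditional methods to capture uncertainty when modeling nonlinear dynamical systems. By extending conventional neural networks to interval form, the authors develop Interval LSTM and NODE models that propagate uncertainty. They propose two training strategies: Cascade INN (C-INN), a two-stage approach that converts a trained crisp network into an INN, and Joint INN (J-INN), which optimizes prediction accuracy and interval precision jointly in one stage. Experiments show strong performance on multiple system identification datasets, and the concept of channel-wise elasticity is introduced to analyze how uncertainty is distributed across model parameters.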

Comments Under review

Abstract

System identification (SysID) is critical for modeling dynamical systems from experimental data, yet traditional approaches often fail to capture nonlinear behaviors. While deep learning offers powerful tools for modeling such dynamics, incorporating uncertainty quantification is essential to ensure reliable predictions. This paper presents a systematic framework for constructing and training interval Neural Networks (INNs) for uncertainty-aware SysID. By extending crisp neural networks into interval counterparts, we develop Interval LSTM and NODE models that propagate uncertainty through interval arithmetic without probabilistic assumptions. This design allows them to represent uncertainty and produce prediction intervals. For training, we propose two strategies: Cascade INN (C-INN), a two-stage approach converting a trained crisp NN into an INN, and Joint INN (J-INN), a one-stage framework jointly optimizing prediction accuracy and interval precision. Both strategies employ uncertainty-aware loss functions and parameterization tricks to ensure reliable learning. Comprehensive experiments on multiple SysID datasets demonstrate the effectiveness of both approaches and benchmark their performance against well-established uncertainty-aware baselines: C-INN achieves superior point prediction accuracy, whereas J-INN yields more accurate and better-calibrated prediction intervals. Furthermore, to reveal how uncertainty is represented across model parameters, the concept of channel-wise elasticity is introduced, which is used to identify distinct patterns across the two training strategies. The results of this study demonstrate that the proposed framework effectively integrates deep learning with uncertainty-aware modeling.
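
The abstract's idea of propagating uncertainty "through interval arithmetic without probabilistic assumptions" can be illustrated with a minimal sketch of one interval layer. This is not the authors' Interval LSTM or NODE implementation; the function names `interval_affine` and `interval_relu` and all numeric values are hypothetical, chosen only to show how lower and upper bounds flow through an affine map and a ReLU.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound)


def interval_affine(
    x: List[Interval],
    w: List[List[Interval]],
    b: List[Interval],
) -> List[Interval]:
    """Compute y_j = sum_i w[j][i] * x[i] + b[j] with interval operands."""
    out = []
    for w_row, (b_lo, b_hi) in zip(w, b):
        lo, hi = b_lo, b_hi
        for (w_lo, w_hi), (x_lo, x_hi) in zip(w_row, x):
            # The product of two intervals is bounded by the extreme
            # values of the four endpoint products.
            products = (w_lo * x_lo, w_lo * x_hi, w_hi * x_lo, w_hi * x_hi)
            lo += min(products)
            hi += max(products)
        out.append((lo, hi))
    return out


def interval_relu(x: List[Interval]) -> List[Interval]:
    """ReLU is monotone, so it can be applied to each endpoint."""
    return [(max(0.0, lo), max(0.0, hi)) for lo, hi in x]


# Hypothetical layer: two interval inputs, one output unit.
inputs = [(1.0, 2.0), (0.0, 1.0)]
weights = [[(0.5, 1.0), (-1.0, -0.5)]]
biases = [(0.0, 0.0)]

pre_activation = interval_affine(inputs, weights, biases)
prediction_interval = interval_relu(pre_activation)
```

The width of the output interval plays the role of a prediction interval: wider weight or input intervals give wider outputs, with no distributional assumption involved.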

2605.11448 2026-05-13 cs.LG cs.AI

Deep Minds and Shallow Probes

Su Hyeong Lee, Risi Kondor

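AI summary: This paper studies the symmetries that relate the hidden coordinates of neural representations across different realizations, and argues that structure in a representation should be revealed with symmetry-stable shallow probes rather than probes tied to a particular basis. By analyzing the exact setting of the final readout layer, the authors identify a unique hierarchy of shallow probes, with linear probes as its degree-1 member. They also show that cross-model probe transfer should be based on the probe-visible quotient of the representation rather than the full hidden state. Experiments on synthetic and real-world tasks support the approach.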

Abstract

Neural representations are not unique objects. Even when two systems realize the same downstream computation, their hidden coordinates may differ by reparameterization. A probe family intended to reveal structure already present in a representation should therefore be stable under the relevant representation symmetries rather than be tied to a particular basis. We study this group action in the tractable exact setting of the final readout layer, where equivalent realizations induce affine changes of hidden coordinates. The resulting symmetry principle singles out a unique hierarchy of shallow coordinate-stable probes, with linear probes as its degree-1 member. We also show that a natural object for cross-model probe transfer is a shared probe-visible quotient--the representation modulo directions invisible to the probe family--rather than the full hidden state. Experiments on synthetic and real-world tasks support both predictions, showing where degree-2 probes help beyond linear ones and how quotient-based transfer enables coverage-aware monitor portability across model families. These results point toward a broader geometric representation theory of neural probing, with coverage-aware monitor transfer as a concrete operational consequence.

2605.11439 2026-05-13 cs.CV cs.LG

Instruct-ICL: Instruction-Guided In-Context Learning for Post-Disaster Damage Assessment

Armin Zarbaft, Ehsan Karimi, Nhut Le, Maryam Rahnemoonfar

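AI summary: This paper investigates whether structured reasoning strategies can improve the reliability of pretrained multimodal large language models (MLLMs) on post-disaster visual question answering. It proposes Instruct-ICL, in which one MLLM generates task-specific instructions that serve as Chain-of-Thought (CoT) guidance for a second MLLM during answer generation, combined with varying degrees of in-context learning (ICL). Experiments on the FloodNet dataset show that the method consistently improves answer accuracy, offering a more reliable approach to rapid post-disaster assessment.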

Comments Accepted by the 2026 IEEE International Geoscience and Remote Sensing Symposium (IGARSS 2026)

Abstract

Rapid and accurate situational awareness is essential for effective response during natural disasters, where delays in analysis can significantly hinder decision-making. Training task-specific models for post-disaster assessment is often time-consuming and computationally expensive, making such approaches impractical in time-critical scenarios. Consequently, pretrained multimodal large language models (MLLMs) have emerged as a promising alternative for post-disaster visual question answering (VQA), a task that aims to answer structured questions about visual scenes by jointly reasoning over images and text. While these models demonstrate strong multimodal reasoning capabilities, their responses can be sensitive to prompt formulation, which can limit their reliability in real-world disaster assessment scenarios. In this paper, we investigate whether structured reasoning strategies can improve the reliability of pretrained MLLMs for post-disaster VQA. Specifically, we explore multiple prompting paradigms in which one MLLM is used to generate task-specific instructions that serve as Chain-of-Thought (CoT) guidance for a second MLLM. These instructions are incorporated during answer generation with varying degrees of in-context learning (ICL), enabling the model to leverage both explicit reasoning guidance and contextual examples. We conduct our evaluation on the FloodNet dataset and compare these approaches against a zero-shot baseline. Our results demonstrate that integrating instruction-driven CoT reasoning consistently improves answer accuracy.

2605.11438 2026-05-13 cs.CV

Beyond Masks: The Case for Medical Image Parsing

Siddharth Gupta, Alan L. Yuille, Zongwei Zhou

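AI summary: This paper proposes medical image parsing as the central output of medical imaging research, arguing that the field should move beyond conventional per-voxel segmentation masks toward structured representations of entities, attributes, and relationships that describe image content more completely. It finds that current systems handle entities well but fall far short on attributes, relationships between entities, and closure. The authors argue for richer outputs and training signals that reward them, moving models from measuring to explaining and closer to clinical needs.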

Abstract

Medical imaging research has spent a decade getting very good at one thing: producing per-voxel masks. Masks tell us size, volume, and location, and a decade of clinical infrastructure rests on those outputs. Yet the report a radiologist writes contains almost nothing a mask can express. We argue that medical imaging research should adopt medical image parsing as its central output: a structured representation in which entities, attributes, and relationships are emitted together and mutually consistent. Entities are the named structures and findings, present or absent. Attributes describe those entities, capturing things like margin regularity, enhancement pattern, or severity grade. Relationships connect them, naming where one structure sits relative to another, what abuts what, and what has changed since the prior scan. A good parse satisfies three properties, in order: (1) decision (the parse names the right things in the current image), (2) reconstruction (its content is rich enough to regenerate that image), and (3) prediction (its content is rich enough to forecast how the patient state will evolve). Quantitative measurements are derived from this content; they are not predicted alongside it. To test how close the field is to producing such an output, we audit eleven representative systems against the three parsing primitives plus closure. None emits a well-formed parse. Entities are largely solved. Attributes, relationships, and closure remain near-empty. The path forward is not a new architecture. It is a commitment to a richer output, and to training signals that reward it. Segmentation taught models to measure. Parsing asks them to explain.

2605.11436 2026-05-13 cs.CL cs.AI

Agent-BRACE: Decoupling Beliefs from Actions in Long-Horizon Tasks via Verbalized State Uncertainty

Joykirat Singh, Zaid Khan, Archiki Prasad, Justin Chih-Yao Chen, Akshay Nambi, Hyunji Lee, Elias Stengel-Eskin, Mohit Bansal

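AI summary: This paper proposes Agent-BRACE, which addresses uncertainty management and context growth when large language models act in long-horizon, partially observable environments. The method decouples the belief state from the policy and builds a structured belief representation from natural-language claims annotated with verbalized certainty labels, helping the model handle uncertainty more effectively when making decisions. Experiments show that Agent-BRACE significantly improves performance on multiple long-horizon tasks while keeping the context window nearly constant regardless of episode length.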

Comments Code: https://github.com/joykirat18/Agent-BRACE

Abstract

Large language models (LLMs) are increasingly deployed on long-horizon tasks in partially observable environments, where they must act while inferring and tracking a complex environment state over many steps. This leads to two challenges: partial observability requires maintaining uncertainty over unobserved world attributes, and long interaction history causes context to grow without bound, diluting task-relevant information. A principled solution to both challenges is a belief state: a posterior distribution over environment states given past observations and actions, which compactly encodes history for decision making regardless of episode length. In LLM agents, however, the open-ended nature of text makes it unclear how to represent such a distribution. Therefore, we introduce Agent-BRACE: Agent Belief state Representation via Abstraction and Confidence Estimation, a method that decouples an LLM agent into a belief state model and a policy model, jointly optimized via reinforcement learning. The belief state model produces a structured approximation of the belief distribution: a set of atomic natural language claims about the environment, each annotated with an ordinal verbalized certainty label ranging from certain to unknown. The policy model conditions on this compact, structured approximate belief rather than the full history, learning to select actions under explicit uncertainty. Across long-horizon, partially observable embodied language environments, Agent-BRACE achieves an average absolute improvement of +14.5% (Qwen2.5-3B-Instruct) and +5.3% (Qwen3-4B-Instruct), outperforming strong RL baselines while maintaining a near-constant context window independent of episode length. Further analysis shows that the learned belief becomes increasingly calibrated over the course of an episode as evidence accumulates.
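
The structured belief described above, a set of atomic claims each tagged with an ordinal certainty label, might be represented as in the following sketch. It is an illustration only, not the released Agent-BRACE code: the class names, the `render_belief` helper, and the intermediate labels `LIKELY` and `POSSIBLE` are assumptions (the abstract names only the endpoints "certain" and "unknown").

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class Certainty(IntEnum):
    # Only the endpoints CERTAIN and UNKNOWN come from the abstract;
    # the two intermediate levels are assumed for illustration.
    UNKNOWN = 0
    POSSIBLE = 1
    LIKELY = 2
    CERTAIN = 3


@dataclass(frozen=True)
class Claim:
    text: str
    certainty: Certainty


def render_belief(claims: List[Claim]) -> str:
    """Serialize the belief as compact text, most certain claims first.

    A policy model would condition on this string instead of the full
    interaction history, so its size depends on the number of claims,
    not on the episode length.
    """
    ordered = sorted(claims, key=lambda c: c.certainty, reverse=True)
    return "\n".join(f"[{c.certainty.name.lower()}] {c.text}" for c in ordered)


belief = [
    Claim("the key is in the drawer", Certainty.LIKELY),
    Claim("the door is locked", Certainty.CERTAIN),
    Claim("contents of the chest", Certainty.UNKNOWN),
]
policy_context = render_belief(belief)
```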

2605.11435 2026-05-13 cs.CV

ZeroIDIR: Zero-Reference Illumination Degradation Image Restoration with Perturbed Consistency Diffusion Models

Hai Jiang, Zhen Liu, Yinjie Lei, Songchen Han, Bing Zeng, Shuaicheng Liu

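AI summary: This paper proposes ZeroIDIR, a zero-reference diffusion-based framework for restoring illumination-degraded images. Trained only on low-quality degraded images, it decouples illumination correction from diffusion-based reconstruction and introduces an adaptive gamma correction module with a histogram-guided illumination correction loss, improving illumination consistency and providing reliable inputs for the subsequent diffusion process. A perturbed diffusion consistency loss further improves detail restoration and stability. Experiments show that the method outperforms existing unsupervised methods on several public datasets and generalizes well across scenes.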

Comments Accepted by CVPR 2026

Abstract

In this paper, we propose a zero-reference diffusion-based framework, named ZeroIDIR, for illumination degradation image restoration, which decouples the restoration process into adaptive illumination correction and diffusion-based reconstruction while being trained solely on low-quality degraded images. Specifically, we design an adaptive gamma correction module that performs spatially varying exposure correction to generate illumination-corrected only representations to mitigate exposure bias and serve as reliable inputs for subsequent diffusion processes, where a histogram-guided illumination correction loss is introduced to regularize the corrected illumination distribution toward that of natural scenes. Subsequently, the illumination-corrected image is treated as an intermediate noisy state for the proposed perturbed consistency diffusion model to reconstruct details and suppress noise. Moreover, a perturbed diffusion consistency loss is proposed to constrain the forward diffusion trajectory of the final restored image to remain consistent with the perturbed state, thus improving restoration fidelity and stability in the absence of supervision. Extensive experiments on publicly available benchmarks show that the proposed method outperforms state-of-the-art unsupervised competitors and is comparable to supervised methods while being more generalizable to various scenes. Code is available at https://github.com/JianghaiSCU/ZeroIDIR.
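
As a rough illustration of "spatially varying exposure correction", the sketch below applies a per-pixel gamma to a grayscale image. In ZeroIDIR the gamma map comes from a learned adaptive module; here it is set by hand, and the function name and all values are hypothetical.

```python
from typing import List

Image = List[List[float]]  # grayscale intensities, nominally in [0, 1]


def spatially_varying_gamma(image: Image, gamma_map: Image) -> Image:
    """Clamp each pixel to [0, 1], then raise it to its own gamma."""
    corrected = []
    for row, gamma_row in zip(image, gamma_map):
        corrected.append([
            min(1.0, max(0.0, pixel)) ** gamma
            for pixel, gamma in zip(row, gamma_row)
        ])
    return corrected


# Hand-set example: gamma < 1 brightens an under-exposed pixel, while
# gamma = 1 leaves a well-exposed pixel unchanged.
image = [[0.25, 0.81]]
gamma_map = [[0.5, 1.0]]
corrected = spatially_varying_gamma(image, gamma_map)
```

Because gamma varies per pixel, dark and bright regions of the same image can be corrected by different amounts, which a single global gamma cannot do.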

2605.11430 2026-05-13 cs.CV cs.AI cs.LG

Diabetic Retinopathy Classification using Downscaling Algorithms and Deep Learning

Nishi Doshi, Urvi Oza, Pankaj Kumar

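AI summary: To address the large and varying image sizes in diabetic retinopathy (DR) classification, this study preprocesses retinal images with several downscaling algorithms before feeding them to a deep learning network. Combining the Kaggle dataset and the Indian Diabetic Retinopathy Image Dataset, it runs classification experiments on a modified multi-channel Inception V3 architecture and reports accuracy, specificity, and sensitivity that outperform prior methods, offering a more effective solution for automatic DR grading.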

Journal ref
2020 7th International Conference on Signal Processing and Integrated Networks (SPIN)
Abstract

Diabetic Retinopathy (DR) is an art and science of recording and classifying the retinal images of a diabetic patient. DR classification deals with classifying retinal fundus images into five stages on the basis of severity of diabetes. One of the major issues faced while dealing with the DR classification problem is the large and varying size of images. In this paper, we propose and explore the use of several downscaling algorithms before feeding the image data to a Deep Learning Network for classification. To improve training and testing, we amalgamate two datasets: Kaggle and Indian Diabetic Retinopathy Image Dataset. Our experiments have been performed on a novel Multi Channel Inception V3 architecture with a unique self-crafted preprocessing phase. We report results of the proposed approach using accuracy, specificity and sensitivity, which outperform the previous state-of-the-art methods. Index Terms: Diabetic Retinopathy, Downscaling Algorithms, Multichannel CNN Architecture, Deep Learning

2605.11428 2026-05-13 cs.LG

FastUMAP: Scalable Dimensionality Reduction via Bipartite Landmark Sampling

Hongmin Li

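AI summary: This paper proposes FastUMAP, a scalable dimensionality reduction method aimed at the low computational efficiency of nonlinear dimensionality reduction in repeated-use settings. Based on bipartite landmark sampling, it builds a sparse point-landmark fuzzy graph, uses the Nyström method for spectral initialization, and then optimizes a UMAP-style objective on the bipartite graph, substantially reducing runtime while retaining reasonable accuracy. Experiments show faster runtimes than conventional methods on multiple datasets, making FastUMAP suited to scenarios that require frequent exploratory embedding.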

Comments 17 pages, 5 figures

Abstract

Exploratory analysis of high-dimensional data rarely stops at a single embedding. In practice, analysts rerun dimensionality reduction after changing preprocessing, subsets, or hyperparameters, and standard nonlinear methods can quickly become the bottleneck. We introduce FastUMAP (Bipartite Manifold Approximation and Projection), a landmark-based method designed for this repeated-use setting. FastUMAP builds a sparse point-landmark fuzzy graph, computes a Nyström spectral warm start from the induced landmark affinity, and then refines all sample coordinates with a UMAP-style objective on the bipartite graph. The landmark ratio r = m/n provides a direct way to trade runtime against fidelity. On 9 benchmark datasets spanning 178 to 70,000 samples, FastUMAP has the lowest runtime on 7 datasets in our reported default-implementation comparison on one workstation. On MNIST and Fashion-MNIST (n = 70,000), it runs in about 4.6 seconds, compared with about 73–75 seconds for Barnes–Hut t-SNE, while reaching 91.4% mean kNN accuracy versus 94.6% for the strongest accuracy baseline. FastUMAP is therefore best viewed as a fast option for repeated exploratory embedding, rather than as a replacement for accuracy-first methods.
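
To make the role of the landmark ratio concrete, here is a minimal sketch of building a sparse point-landmark graph, reading m as the number of landmarks and n as the number of samples. It is not the FastUMAP implementation: uniform random landmark selection, Euclidean distance, and the function name are assumptions, and the fuzzy edge weights, Nyström warm start, and refinement step are omitted.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def build_point_landmark_graph(
    points: List[Point], r: float, k: int, seed: int = 0
) -> Tuple[List[int], List[List[Tuple[int, float]]]]:
    """Connect every point to its k nearest landmarks.

    Returns the sampled landmark indices and, for each point, a list of
    (landmark_position, distance) edges. The graph has n * k edges
    instead of the n * n pairs of a dense affinity matrix.
    """
    n = len(points)
    m = max(k, min(n, round(r * n)))  # landmark count from ratio r = m / n
    rng = random.Random(seed)
    landmark_ids = rng.sample(range(n), m)  # assumption: uniform sampling

    graph = []
    for p in points:
        dists = sorted(
            (math.dist(p, points[j]), pos) for pos, j in enumerate(landmark_ids)
        )
        graph.append([(pos, d) for d, pos in dists[:k]])
    return landmark_ids, graph


# Hypothetical data: 20 points on a line, one landmark per four points.
points = [(float(i), 0.0) for i in range(20)]
landmark_ids, graph = build_point_landmark_graph(points, r=0.25, k=2)
```

A smaller r means fewer landmarks and cheaper neighbor searches, at the cost of a coarser description of the data, which is the runtime-versus-fidelity trade the abstract describes.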

2605.11427 2026-05-13 cs.CV

PD-4DGS: Progressive Decomposition of 4D Gaussian Splatting for Bandwidth-Adaptive Dynamic Scene Streaming

Jiachen Li, Guangzhi Han, Jin Wan, Delong Han, Yuan Gao, Min Li, Mingle Zhou, Gang Li

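AI summary: PD-4DGS is a progressive 4D Gaussian Splatting compression framework for dynamic scene streaming. It targets two problems of existing 4DGS models: long rendering delays on bandwidth-limited devices and incompatibility with adaptive-bitrate delivery. Hierarchical Deformation Decomposition (HDD) splits the motion structure of 4DGS into three independently transmittable layers so that any prefix of the stream is renderable, enabling scalable streaming. Experiments show that PD-4DGS significantly reduces transmission bandwidth and first-frame latency while preserving rendering quality, making real-time 4DGS streaming on mobile devices feasible.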

Abstract

4D Gaussian Splatting (4DGS) enables high-quality dynamic novel view synthesis, yet current models remain monolithic bitstreams that clients must download in full before any frame can be rendered, causing black-screen waits of tens to hundreds of seconds on mobile bandwidth and leaving 4DGS incompatible with modern adaptive-bitrate delivery. Progressive 3DGS compression alleviates this for static scenes, but it acts only on spatial anchors and cannot partition the temporal deformation networks that dominate dynamic-scene size. We present PD-4DGS, the first framework for progressive compression and on-demand transmission of 4DGS. Hierarchical Deformation Decomposition (HDD) externalises the coarse-to-fine motion hierarchy already latent in 4DGS into three independently transmittable layers (a static scaffold, a global deformation, and a local refinement) so that any prefix of the bitstream is already renderable, turning a single training run into a scalable, DASH/HLS-compatible bitstream. A Gaussian-entropy attribute rate-distortion loss and a temporal mask consistency regulariser shrink the base layer while suppressing low-bitrate flicker; a capacity-weighted rollout schedule, gated online by a learnt activation rate rho, then prevents deformation-network under-training without any per-scene hyperparameter. On the Dycheck iPhone benchmark, PD-4DGS cuts the streamed bitstream by >60% at matched rendering fidelity and reduces first-frame latency from 73–930 s to ~1.7 s on a 2 Mbps link, uniquely enabling true on-demand progressive streaming for 4DGS.

2605.11426 2026-05-13 cs.AI

A Mechanistic Investigation of Supervised Fine Tuning

Ruhaan Chopra

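AI summary: This study examines how supervised fine-tuning (SFT) affects the activations of large language models. Although the cosine similarity between hidden activations before and after fine-tuning is very high, projecting them through a pretrained sparse autoencoder (SAE) reveals significant differences in the sparse latents. The study proposes an SAE-based analysis pipeline that exposes task-specific and layer-specific changes in semantic features during fine-tuning and identifies a layer-wise update profile associated with safety alignment. The pipeline provides a high-resolution diagnostic tool for understanding the mechanisms of SFT.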

Abstract

The cosine similarity between a large language model's hidden activations before and after Supervised Fine-Tuning (SFT) remains very high. This, at first glance, suggests that SFT leaves the model's activation geometry largely undisturbed. However, projecting both sets of activations through a Sparse Autoencoder (SAE) pretrained on the base model reveals that the underlying sparse latents diverge significantly. We introduce a novel investigative pipeline which utilizes these pretrained SAEs as a high-resolution diagnostic tool to mechanistically investigate the drivers of this representational divergence. Through our analytical pipeline, we discover task-specific and layer-specific distributions of the precise semantic features that are systematically altered during supervised fine-tuning. We additionally identify a layer-wise update profile specific to safety alignment. All code, experimental scripts, and analysis files associated with this work are publicly available at: https://github.com/ruhzi/sae-investigation.
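
The contrast described above, high cosine similarity alongside diverging sparse latents, can be reproduced on a toy example. The sketch below is not the authors' pipeline: the three-feature encoder with identity weights and a negative bias, the activation vectors, and the use of Jaccard overlap of active latents as the divergence measure are all assumptions made for illustration.

```python
import math
from typing import List, Set


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def sae_active_set(
    h: List[float], weights: List[List[float]], bias: List[float]
) -> Set[int]:
    """Indices of latents whose ReLU(W h + b) activation is positive."""
    active = set()
    for j, (w_row, b_j) in enumerate(zip(weights, bias)):
        pre_activation = sum(w * x for w, x in zip(w_row, h)) + b_j
        if pre_activation > 0.0:
            active.add(j)
    return active


def jaccard(a: Set[int], b: Set[int]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Toy encoder: the negative bias acts as a firing threshold per latent.
weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
bias = [-0.1, -0.1, -0.1]

h_base = [1.0, 0.05, 0.0]  # hypothetical activation before SFT
h_sft = [1.0, 0.0, 0.15]   # hypothetical activation after SFT

similarity = cosine(h_base, h_sft)
overlap = jaccard(
    sae_active_set(h_base, weights, bias),
    sae_active_set(h_sft, weights, bias),
)
```

Here the two vectors are nearly parallel, yet a small shift pushes one latent across its threshold, so the sets of active features differ. Thresholded sparse codes can therefore register changes that a dense similarity score averages away.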

2605.11424 2026-05-13 cs.CV

VidSplat: Gaussian Splatting Reconstruction with Geometry-Guided Video Diffusion Priors

Jimin Tang, Wenyuan Zhang, Junsheng Zhou, Zian Huang, Kanle Shi, Shenkun Xu, Yu-Shen Liu, Zhizhong Han

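AI summary: VidSplat is a generative reconstruction framework based on Gaussian Splatting that addresses missing and occluded regions in multi-view surface reconstruction from sparse views. It uses video diffusion priors to iteratively synthesize novel views that fill in regions not covered by the inputs, recovering complete 3D scenes. Its core components are a training-free, stage-wise denoising strategy and an iterative refinement mechanism, which improve the geometric consistency and completeness of the reconstruction.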

Comments Accepted by SIGGRAPH Conference 2026. Project Page: https://tangjm24.github.io/VidSplat

Abstract

Gaussian Splatting has achieved remarkable progress in multi-view surface reconstruction, yet it exhibits notable degradation when only few views are available. Although recent efforts alleviate this issue by enhancing multi-view consistency to produce plausible surfaces, they struggle to infer unseen, occluded, or weakly constrained regions beyond the input coverage. To address this limitation, we present VidSplat, a training-free generative reconstruction framework that leverages powerful video diffusion priors to iteratively synthesize novel views that compensate for missing input coverage, and thereby recover complete 3D scenes from sparse inputs. Specifically, we tackle two key challenges that enable the effective integration of generation and reconstruction. First, for 3D consistent generation, we elaborate a training-free, stage-wise denoising strategy that adaptively guides the denoising direction toward the underlying geometry using the rendered RGB and mask images. Second, to enhance the reconstruction, we develop an iterative mechanism that samples camera trajectories, explores unobserved regions, synthesizes novel views, and supplements training through confidence weighted refinement. VidSplat performs robustly to sparse input and even a single image. Extensive experiments on widely used benchmarks demonstrate our superior performance in sparse-view scene reconstruction.

2605.11418 2026-05-13 cs.AI cs.CR

Under the Hood of SKILL.md: Semantic Supply-chain Attacks on AI Agent Skill Registry

Shoumik Saha, Kazem Faghih, Soheil Feizi

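AI summary: This paper studies natural-language semantic supply-chain attacks on AI agent skill registries, revealing how SKILL.md files can be maliciously exploited during skill discovery, selection, and governance. Experiments show that attackers can use carefully crafted textual triggers to raise the visibility of malicious skills, steer agents toward functionally equivalent adversarial variants, and evade safety review. The study concludes that SKILL.md is not mere documentation but operational text that shapes agent behavior, exposing a significant security risk in current mechanisms for extending agent capabilities.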

Comments 31 pages, 21 figures, 10 tables

Abstract

Autonomous AI agents increasingly extend their capabilities through Agent Skills: modular filesystem packages whose SKILL.md files describe when and how agents should use them. While this design enables scalable, on-demand capability expansion, it also introduces a semantic supply-chain risk in which natural-language metadata and instructions can affect which skills are admitted, surfaced, selected, and loaded. We study SKILL.md-only attacks across three registry-facing stages of the Agent Skill lifecycle, using real ClawHub skills and realistic registry mechanisms. In Discovery, short textual triggers can manipulate embedding-based retrieval and improve adversarial skill visibility, achieving up to 86% pairwise win rate and 80% Top-10 placement. In Selection, description-only framing biases agents toward functionally equivalent adversarial variants, which are selected in 77.6% of paired trials on average. In Governance, semantic evasion strategies cause malicious skills to avoid a blocking verdict in 36.5%-100% of cases. Overall, our results show that SKILL.md is not passive documentation but operational text that shapes which third-party capabilities agents find, trust, and use.

2605.11414 2026-05-13 cs.LG cs.AI

Generative Diffusion Prior Distillation for Long-Context Knowledge Transfer

Nilushika Udayangani, Kishor Nandakishor, Marimuthu Palaniswami

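AI summary: This paper studies how to transfer knowledge from a full-sequence classifier to a classifier that sees only partial sequences in time-series classification. To address the loss of generalization caused by the lack of discriminative patterns in partial data, the authors propose Generative Diffusion Prior Distillation (GDPD), a knowledge distillation framework that treats short-context student features as degraded observations of full-context teacher features. Drawing on the iterative restoration capability of diffusion models, it learns a generative prior over teacher features and uses it to guide the student toward long-context knowledge, improving partial-sequence classification. Experiments across datasets and architectures show effective full-to-partial knowledge transfer.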

Comments Published as a conference paper at ICLR 2026 (Brazil, Rio de Janeiro)

Journal ref
The Fourteenth International Conference on Learning Representations 2026
Abstract

While traditional time-series classifiers assume full sequences at inference, practical constraints (latency and cost) often limit inputs to partial prefixes. The absence of class-discriminative patterns in partial data can significantly hinder a classifier's ability to generalize. This work uses knowledge distillation (KD) to equip partial time series classifiers with the generalization ability of their full-sequence counterparts. In KD, a high-capacity teacher transfers supervision to aid student learning on the target task. Matching with teacher features has shown promise in closing the generalization gap due to limited parameter capacity. However, when the generalization gap arises from training-data differences (full versus partial), the teacher's full-context features can be an overwhelming target signal for the student's short-context features. To provide progressive, diverse, and collective teacher supervision, we propose Generative Diffusion Prior Distillation (GDPD), a novel KD framework that treats short-context student features as degraded observations of the target full-context features. Inspired by the iterative restoration capability of diffusion models, we learn a diffusion-based generative prior over teacher features. Leveraging this prior, we posterior-sample target teacher representations that could best explain the missing long-range information in the student features and optimize the student features to be minimally degraded relative to these targets. GDPD provides each student feature with a distribution of task-relevant long-context knowledge, which benefits learning on the partial classification task. Extensive experiments across earliness settings, datasets, and architectures demonstrate GDPD's effectiveness for full-to-partial distillation.

2605.11408 2026-05-13 cs.LG cs.AI cs.CL

MaskTab: Scalable Masked Tabular Pretraining with Scaling Laws and Distillation for Industrial Classification

Bo Zheng, Yudong Chen, Zihua Xiong, Shuai Fang, Peidong He, Yang Yang, Sheng Guo

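AI summary: MaskTab is a unified pre-training framework designed for industrial-scale tabular data, which is high-dimensional, full of missing values, and rarely labeled at scale. It introduces learnable tokens for missing values and a hybrid supervised pre-training scheme, combined with an MoE-augmented loss, to improve performance on large-scale industrial data. Experiments show that MaskTab significantly outperforms prior methods on industrial benchmarks and distills effectively into lightweight models that retain their advantage under strict latency and interpretability constraints.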

Abstract

Tabular data forms the backbone of high-stakes decision systems in finance, healthcare, and beyond. Yet industrial tabular datasets are inherently difficult: high-dimensional, riddled with missing entries, and rarely labeled at scale. While foundation models have revolutionized vision and language, tabular learning still leans on handcrafted features and lacks a general self-supervised framework. We present MaskTab, a unified pre-training framework designed specifically for industrial-scale tabular data. MaskTab encodes missing values via dedicated learnable tokens, enabling the model to distinguish structural absence from random dropout. It jointly optimizes a hybrid supervised pre-training scheme--utilizing a twin-path architecture to reconcile masked reconstruction with task-specific supervision--and an MoE-augmented loss that adaptively routes features through specialized subnetworks. On industrial-scale benchmarks, it achieves +5.04% AUC and +8.28% KS over prior art under rigorous scaling. Moreover, its representations distill effectively into lightweight models, yielding +2.55% AUC and +4.85% KS under strict latency and interpretability constraints, while improving robustness to distribution shifts. Our work demonstrates that tabular data admits a foundation-model treatment--when its structural idiosyncrasies are respected.

2605.11406 2026-05-13 cs.LG

A Boundary-Aware Non-parametric Granular-Ball Classifier Based on Minimum Description Length

Zeqiang Xian, Caihui Liu, Yong Zhang, Wenjing Qiu, Duoqian Miao, Witold Pedrycz

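AI summary: This paper proposes MDL-GBC, a boundary-aware non-parametric granular-ball classifier based on the Minimum Description Length principle, to address the reliance of existing granular-ball classification methods on handcrafted quality measures and heuristic rules. It formulates class-conditional granular-ball construction as a local model selection problem: the description lengths of a single-ball model, a two-ball model, and a core-boundary model are compared to decide whether a ball is retained, split, or refined, which yields explicit modeling of boundary-sensitive regions and consistency with the classification mechanism. Experiments on multiple benchmark datasets show competitive classification performance and good interpretability.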

Comments 13 pages, 2 figures

Abstract

Existing granular-ball classification methods are often driven by handcrafted quality measures, neighborhood rules, or heuristic splitting and stopping criteria, which may reduce the transparency of local construction decisions and hinder explicit modeling of boundary-sensitive regions. To address this issue, this paper proposes a Minimum Description Length based Granular-Ball Classifier (MDL-GBC), a boundary-aware non-parametric and interpretable granular-ball classifier. MDL-GBC formulates class-conditional granular-ball construction as a local model selection problem under the Minimum Description Length principle. For each class, samples from the target class provide positive class evidence, while samples from the remaining classes provide negative boundary evidence. For each current granular ball, three candidate explanations are compared under a unified description-length criterion: a single-ball model, a two-ball model, and a core-boundary model. The selected model determines whether the ball is retained, geometrically split, or refined into core and boundary-sensitive child balls, thereby making local construction decisions consistent with the MDL-based classification mechanism. During prediction, a class-level mixture coding rule aggregates stable granular balls of the same class and assigns the test sample by comparing class-wise coding costs. Experiments on 18 benchmark datasets show that MDL-GBC achieves competitive classification performance against classical classifiers and representative granular-ball-based methods, obtaining the best average Accuracy, Macro-F1, and average rank. These results indicate that MDL-GBC provides an effective and interpretable alternative to conventional heuristic granular-ball classification strategies.

2605.11404 2026-05-13 cs.AI

Attributing Emergence in Million-Agent Systems

Ling Tang, Jilin Mei, Qian Chen, Qihan Ren, Linfeng Zhang, Quanshi Zhang, Jing Shao, Xia Hu, Dongrui Liu

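AI summary This study examines how to attribute macro-level emergent phenomena to individual agents in million-agent systems. Because of their computational complexity, existing methods apply only to small systems, whereas real social phenomena often occur at the scale of millions of agents. The study therefore extends Aumann-Shapley path-integral attribution to million-agent scale, yielding an efficient attribution computation that satisfies all four axioms. An empirical analysis reveals structural differences between attributions computed on a small sample and on the full data, and the study proves that full-scale attribution is theoretically necessary for nonlinear macro indicators.

Details
English abstract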

Large language models (LLMs) can simulate human-like reasoning and decision-making in individual agents. LLM-powered multi-agent systems (MAS) combine such agents to simulate population-scale social phenomena such as polarization, information cascades, and market panics. Such studies require attributing macro emergence to individual agents, but existing axiomatic methods scale combinatorially in $N$ and have been confined to $N \lesssim 10^3$, while the phenomena they explain occur at $N \geq 10^6$. We address this gap by adapting Aumann--Shapley path-integral attribution to LLM-powered MAS at million-agent scale; the resulting method satisfies all four axioms and runs four to five orders of magnitude faster than sampled Shapley on the same hardware. We use this method to test the scale gap empirically: across 14 days of public Bluesky data ($1{,}671{,}587$ active users), we compute the attribution at both full scale and the visibility-biased $N = 10^2$ convenience sample used by small-scale studies, and the two disagree structurally. At full scale the long tail and middle tier jointly carry the majority; the biased small panel attributes almost everything to a few high-follower accounts. We then prove that under any nonlinear macro indicator the disagreement cannot be reduced by post-hoc rescaling: an Attribution Scaling Bias theorem shows that no global rescaling factor can reconcile small-scale and full-scale attribution. Full-scale attribution is therefore not a methodological choice but a theoretical requirement for any nonlinear macro indicator.
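
As a rough illustration of the path-integral idea, the generic Aumann-Shapley attribution of a differentiable indicator $F$ at point $x$ relative to a baseline $b$ is $\phi_i = (x_i - b_i)\int_0^1 \partial_i F(b + t(x - b))\,dt$. The sketch below approximates that integral with a midpoint rule. It is not the paper's implementation: the function names, the toy indicator, and the step count are all assumptions made for illustration, and the abstract does not say how the gradient is obtained for LLM-powered agents.

```python
from typing import Callable, List, Sequence


def aumann_shapley(
    grad_f: Callable[[Sequence[float]], Sequence[float]],
    x: Sequence[float],
    baseline: Sequence[float],
    steps: int = 64,
) -> List[float]:
    """Approximate phi_i = (x_i - b_i) * integral_0^1 dF/dx_i(b + t*(x - b)) dt."""
    n = len(x)
    delta = [xi - bi for xi, bi in zip(x, baseline)]
    grad_sum = [0.0] * n
    for k in range(steps):
        t = (k + 0.5) / steps  # midpoint of the k-th sub-interval of [0, 1]
        point = [bi + t * di for bi, di in zip(baseline, delta)]
        g = grad_f(point)  # one gradient evaluation per path step
        for i in range(n):
            grad_sum[i] += g[i]
    return [delta[i] * grad_sum[i] / steps for i in range(n)]


# Hypothetical nonlinear macro indicator: F(x) = (sum_i x_i)^2.
def toy_indicator(x: Sequence[float]) -> float:
    return sum(x) ** 2


def toy_gradient(x: Sequence[float]) -> List[float]:
    s = 2.0 * sum(x)  # dF/dx_i is the same for every agent i
    return [s] * len(x)


activity = [1.0, 2.0, 3.0]
zero = [0.0, 0.0, 0.0]
phi = aumann_shapley(toy_gradient, activity, zero)
```

In this sketch the attributions sum to `toy_indicator(activity) - toy_indicator(zero)`, which is the completeness (efficiency) property, and the work is one gradient evaluation per path step rather than an enumeration of agent coalitions.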

2605.11403 2026-05-13 cs.LG cs.AI cs.CL

FG-ExPO: Frontier-Guided Exploration-Prioritized Policy Optimization via Adaptive KL and Gaussian Curriculum

Mingxiong Lin, Zhangquan Gong, Maowen Tang, Qian Li, Chuangchuang Wang, Jian Ma, Sutian Huang, Kai Tang, Haonan Lu

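AI summary This study targets two efficiency problems in Group Relative Policy Optimization (GRPO), the dominant algorithm for Reinforcement Learning with Verifiable Rewards (RLVR), and proposes FG-ExPO. The method introduces two lightweight components, Accuracy-Conditioned KL Scaling (AKL) and Gaussian Curriculum Sampling (GCS), which respectively adjust the strength of the constraint on policy exploration dynamically and reshape the question sampling distribution, improving training efficiency on mathematical reasoning tasks. Experiments show that FG-ExPO clearly outperforms vanilla GRPO on several mainstream benchmarks, with especially large gains on tasks such as AIME 2025.

Details
English abstract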

Reinforcement Learning with Verifiable Rewards (RLVR) has become the standard paradigm for LLM mathematical reasoning, with Group Relative Policy Optimization (GRPO) serving as the dominant algorithm. We identify two overlooked inefficiencies inherent in GRPO. First, a fixed KL coefficient overly restricts policy exploration at moments when the model needs to diverge significantly from the reference policy. Second, uniform question sampling overlooks that moderately difficult problems produce the most informative gradient signals. We propose FG-ExPO, short for Frontier-Guided Exploration-Prioritized Policy Optimization, which integrates two lightweight components. Accuracy-Conditioned KL Scaling (AKL) adjusts the KL penalty strength through a smooth nonlinear function of batch average accuracy, loosening the constraint when the model performs poorly and strengthening it when the model achieves satisfactory results. Gaussian Curriculum Sampling (GCS) assigns sampling weights to questions following a Gaussian distribution centered at a moderate accuracy level around 0.5, focusing model training on its learning frontier. We conduct evaluations on DeepSeek-R1-Distill-Qwen-1.5B and Qwen3-8B-Base across six mainstream mathematical reasoning benchmarks. Experimental results demonstrate that FG-ExPO consistently outperforms vanilla GRPO. It delivers an absolute improvement of 13.34 on the AIME 2025 pass@32 metric, rising from 63.33 percent to 76.67 percent, and obtains an average pass@32 gain of 2.66 on the 8B model. The substantially larger performance gains observed on pass@32 compared to pass@1 verify that FG-ExPO enlarges the model's effective exploration space under a fixed inference budget.
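
The two components can be sketched in a few lines, under stated assumptions. The abstract only says that AKL is a smooth nonlinear function of batch average accuracy and that GCS uses a Gaussian centered near 0.5; the sigmoid form, the `beta_min`/`beta_max` range, `sharpness`, and `sigma` below are hypothetical choices for illustration, not values from the paper.

```python
import math
import random
from typing import List, Sequence


def akl_coefficient(
    batch_accuracy: float,
    beta_min: float = 0.001,   # assumed lower bound on the KL coefficient
    beta_max: float = 0.04,    # assumed upper bound on the KL coefficient
    sharpness: float = 10.0,   # assumed steepness of the transition
    midpoint: float = 0.5,
) -> float:
    """Smoothly increase the KL penalty as batch accuracy rises (sigmoid is an assumption)."""
    gate = 1.0 / (1.0 + math.exp(-sharpness * (batch_accuracy - midpoint)))
    return beta_min + (beta_max - beta_min) * gate


def gcs_weights(
    accuracies: Sequence[float], center: float = 0.5, sigma: float = 0.2
) -> List[float]:
    """Normalized Gaussian sampling weights over per-question accuracy."""
    raw = [math.exp(-((a - center) ** 2) / (2.0 * sigma ** 2)) for a in accuracies]
    total = sum(raw)
    return [r / total for r in raw]


def sample_questions(accuracies: Sequence[float], k: int, seed: int = 0) -> List[int]:
    """Draw k question indices (with replacement) according to the GCS weights."""
    rng = random.Random(seed)
    return rng.choices(range(len(accuracies)), weights=gcs_weights(accuracies), k=k)


per_question_accuracy = [0.0, 0.5, 1.0]
weights = gcs_weights(per_question_accuracy)
batch = sample_questions(per_question_accuracy, k=5)
```

With these assumed shapes, questions the model always fails or always solves receive equal, low weight, questions near 50% accuracy receive the most, and the KL coefficient is smaller for a low-accuracy batch than for a high-accuracy one.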

2605.11402 2026-05-13 cs.LG cs.CR cs.NI

More Than Meets the Eye: A Semantics-Aware Traffic Augmentation Framework for Generalizable Website Fingerprinting

Youquan Xian, Xueying Zeng, Lingjia Meng, Lei Cui, Runhan Song, Wei Wang, Zhengquan Ding, Peng Liu, Zhiyu Hao

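AI summary This paper proposes SATA, a semantics-aware traffic augmentation framework that addresses the poor generalization of deep learning-based website fingerprinting in real-world environments. The method performs application-layer semantic augmentation based on protocol rules, expanding the resource composition patterns and frame sequence patterns in traffic, and introduces a cross-layer feature alignment mechanism that aligns the augmented semantics with observable traffic features. Experiments show that SATA generates traffic patterns that are absent from the training set but genuinely exist in the test set, and significantly improves mainstream models across a range of complex scenarios; in open-world settings in particular, accuracy and AUROC improve by 90.81% and 48.37%, respectively.

Comments 18 pages, 19 figures, Submitted to NDSS 2027

Details
English abstract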

Deep learning-based website fingerprinting has emerged as an effective technique for inferring the websites users visit. Although existing methods achieve strong performance on closed-world datasets, they often fail to generalize to real-world environments, especially under geographic and temporal shifts. This limitation fundamentally stems from the coupled effects of two key challenges: application-layer resource composition variability and observable feature instability induced by cross-layer encapsulation. Intertwined, these factors induce systematic shifts between underlying application semantics and observable traffic features. To address the above challenges, we propose SATA, a semantics-aware traffic augmentation framework. Specifically, SATA first performs application-layer semantic augmentation based on protocol rules, expanding the resource composition patterns within each flow and frame sequence patterns under protocol constraints. Based on these augmented frame sequences, we further introduce a cross-layer feature alignment mechanism via knowledge distillation. It aligns frame sequence with packet-length sequence features, enabling cross-layer feature alignment between enhanced semantics and observable sequences. Extensive experiments show that SATA successfully generates traffic patterns that are absent from the training set but genuinely exist in the test set, and significantly improves the performance of mainstream models across diverse and complex scenarios. In particular, in open-world settings, SATA improves ACC by 90.81% and AUROC by 48.37%. The source code of the prototype system is available at https://anonymous.4open.science/r/SATA-B6C2/.

2605.11398 2026-05-13 cs.AI cs.CL

AcuityBench: Evaluating Clinical Acuity Identification and Uncertainty Alignment

Robin Linzmayer, Georgianna Lin, Di Coneybeare, Jason Chu, Trudi Cloyd, Manish Garg, Miles Gordon, Elizabeth Hartofilis, Benjamin Hong, Ashraf Hussain, Eugene Y. Kim, Oluchi Iheagwara King, Ross McCormack, Erica Olsen, John K. Riggins, Mustafa N. Rasheed, Dana L. Sacco, Vinay Saggar, Osman R. Sayan, Amit Shembekar, Janice Shin-Kim, Wendy W. Sun, Bernard P. Chang, David Kessler, Noémie Elhadad

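AI summary This paper presents AcuityBench, a benchmark for evaluating whether language models correctly identify the urgency of care from users' medical descriptions. The benchmark combines five public datasets covering user conversations, forum posts, clinical vignettes, and patient portal messages, and evaluates them under a unified four-level acuity framework. The study finds that models differ substantially in their performance on clear and ambiguous cases and that the choice of task format affects the type of triage error, underscoring clinical acuity identification as a key safety capability.

Comments 41 pages, 5 figures. Preprint under review for the Track on Evaluations and Datasets at NeurIPS 2026

Details
English abstract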

We introduce AcuityBench, a benchmark for evaluating whether language models identify the appropriate urgency of care from user medical presentations. Existing health benchmarks emphasize medical question answering, broad health interactions, or narrow workflow-specific triage tasks, but they do not offer a unified evaluation of acuity identification across these settings. AcuityBench addresses this gap by harmonizing five public datasets spanning user conversations, online forum posts, clinical vignettes, and patient portal messages under a shared four-level acuity framework ranging from home monitoring to immediate emergency care. The benchmark contains 914 cases, including 697 consensus cases for standard accuracy evaluation and 217 physician-confirmed ambiguous cases for uncertainty-aware evaluation. It supports two complementary task formats: explicit four-way classification in a QA setting, and free-form conversational responses evaluated with a rubric-based judge anchored to the same framework. Across 12 frontier proprietary and open-weight models, we find substantial variation in clear-case acuity accuracy and error direction. Comparing task formats reveals a systematic tradeoff: conversational responses reduce over-triage but increase under-triage relative to QA, especially in higher-acuity cases. In ambiguous cases, no model closely matches the distribution of physician judgments, and model predictions are more concentrated than expert clinical uncertainty. We also compare expert and model adjudication on a subset of maximally ambiguous cases, using those cases to examine the role of clinical uncertainty in label disagreement. Together, these results position acuity identification as a distinct safety-critical capability and show that AcuityBench enables systematic comparison and stress-testing of how well models guide users to the right level of care in real-world health use.

2605.11396 2026-05-13 cs.LG

MuonQ: Enhancing Low-Bit Muon Quantization via Directional Fidelity Optimization

Yupeng Su, Ruijie Zhang, Ziyue Liu, Yequan Zhao, Zheng Zhang

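AI summary This paper proposes MuonQ, a low-bit training framework for the Muon optimizer based on directional fidelity optimization, which addresses the Muon optimizer's sensitivity to errors under quantized training. Through pre-quantization normalization, structural decomposition, and μ-law companding quantization, MuonQ suppresses the accumulation and directional bias of quantization errors and achieves stable, efficient 4-bit quantized training. Experiments show that MuonQ keeps training loss and downstream task accuracy close to full-precision Muon while reducing optimizer state memory by up to 7.3×.

Comments MuonQ enables stable 4-bit quantization of Muon's optimizer states by preserving directional fidelity through pre-quantization normalization, structural decomposition, and companding quantization

Details
English abstract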

The Muon optimizer has emerged as a compelling alternative to Adam for training large language models, achieving remarkable computational savings through gradient orthogonalization. However, Muon's optimizer state is more sensitive to quantization errors: because the orthogonalization discards the magnitudes of singular values and retains only directional information, even small quantization errors in singular vector directions are amplified in the update. In this work, we propose MuonQ, a low-bit Muon training framework built on the principle of directional fidelity optimization. First, we apply a pre-quantization normalization so that each step introduces quantization errors of the same magnitude, preventing the accumulated error from developing a preferred direction. Second, we introduce a structural decomposition that separately quantizes the dominant singular components via power iteration, ensuring that quantization errors perturb only singular value magnitudes rather than rotating singular vector directions. Third, we adopt $\mu$-law companding quantization to allocate higher resolution to densely packed momentum values, shifting the quantization objective from outlier preservation to dense-region distinguishability. Together, these techniques enable stable 4-bit quantization of Muon's optimizer states. Pre-training experiments on GPT-style and LLaMA-style models demonstrate that MuonQ at 4-bit precision closely matches full-precision Muon in both training loss and downstream task accuracy, while reducing optimizer state memory by up to $7.3\times$. Our code is available at https://github.com/YupengSu/MuonQ.
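
To make the third step concrete, here is a minimal sketch of textbook $\mu$-law companding followed by uniform 4-bit quantization. It assumes values already normalized to $[-1, 1]$ and uses $\mu = 255$, the value familiar from audio coding; the abstract does not state the $\mu$, normalization, or code layout MuonQ uses, so the function names and defaults here are illustrative assumptions only.

```python
import math
from typing import List, Sequence


def mu_law_encode(values: Sequence[float], bits: int = 4, mu: float = 255.0) -> List[int]:
    """Compress with mu-law, then quantize uniformly to integer codes in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    codes = []
    for v in values:
        v = max(-1.0, min(1.0, v))  # clamp to the assumed normalized range
        compressed = math.copysign(math.log1p(mu * abs(v)) / math.log1p(mu), v)
        codes.append(round((compressed + 1.0) / 2.0 * levels))
    return codes


def mu_law_decode(codes: Sequence[int], bits: int = 4, mu: float = 255.0) -> List[float]:
    """Map integer codes back to [-1, 1] and invert the mu-law compression."""
    levels = (1 << bits) - 1
    out = []
    for c in codes:
        compressed = c / levels * 2.0 - 1.0
        out.append(math.copysign(((1.0 + mu) ** abs(compressed) - 1.0) / mu, compressed))
    return out


momentum = [-1.0, -0.01, 0.0, 0.01, 1.0]
codes = mu_law_encode(momentum)
restored = mu_law_decode(codes)
```

The logarithmic compression spends more of the 16 codes on small magnitudes: in this sketch 0.0 and 0.01 receive different codes, whereas a plain uniform 4-bit grid over $[-1, 1]$ would round both to the same level.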

2605.11392 2026-05-13 cs.AI

Transformer Interpretability from Perspective of Attention and Gradient

Yongjin Cui, Xiaohui Fan, Huajun Chen

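AI summary This paper studies the interpretability of Transformer models in depth from the perspective of attention and gradient, and proposes a method that achieves more comprehensive and fine-grained interpretation of feature regions by guiding the gradient direction (that is, the attention direction). The method helps to better understand how Transformers work. It also exposes differences between how a Vision Transformer (ViT) and humans perceive images, demonstrating nearly imperceptible tampering with an image's class, which may pose security risks in certain scenarios.

Details
English abstract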

Although researchers focus mainly on the performance of Transformer models, the interpretation of Transformers cannot be ignored. Gradients are widely used in Transformer interpretation. From the perspective of attention and gradient, we conduct an in-depth study of Transformer interpretation and propose a method that achieves it by guiding the gradient direction, or more precisely, the attention direction. The method enables more comprehensive interpretation of feature regions, offers detailed interpretation, and helps to better understand the Transformer mechanism. Leveraging the difference in how Vision Transformer (ViT) and humans perceive images, we alter the class of an image in a way that is almost imperceptible to the human eye. This class rewriting phenomenon may pose security risks in certain scenarios.

2605.11388 2026-05-13 cs.CL cs.AI

Deep Reasoning in General Purpose Agents via Structured Meta-Cognition

Dean Light, Michael Theologitis, Kshitish Ghate, Shuyue Stella Li, Benjamin Newman, Chirag Shah, Aylin Caliskan, Pang Wei Koh, Dan Suciu, Yulia Tsvetkov

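AI summary This study proposes an approach called Deep Reasoning that aims to make general-purpose agents more flexible and adaptive on reasoning tasks. Through structured meta-reasoning, the approach dynamically constructs task-specific reasoning scaffolds during inference, which lets it handle complex problems more effectively. Experiments show that DOLORES, a general-purpose agent built on this approach, significantly outperforms existing methods on several hard benchmarks, demonstrating its strengths in structured reasoning and task adaptability.

Comments Preprint under review

Details
English abstract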

Humans intuitively solve complex problems by flexibly shifting among reasoning modes: they plan, execute, revise intermediate goals, resolve ambiguity through associative judgment, and apply formal procedures to well-specified subproblems. Current LLM agents lack this flexibility, as their scaffolds hard-code such reasoning decisions in advance. These scaffolds are effective when their prescribed structure matches the task, but brittle when solving the task requires adapting the structure of reasoning itself. We introduce Deep Reasoning -- an inference-time approach for constructing task-specific scaffolds through structured meta-reasoning. Deep Reasoning uses a formal language that represents meta-reasoning as executable decompositions over associative inference, formal computation, and recursive subproblem solving, enabling decomposition principles to be encoded as in-context examples that guide test-time scaffold construction. We instantiate this approach in a general-purpose agent (DOLORES) that distributes complex tasks across more controlled reasoning threads. We evaluate it against state-of-the-art scaffolding methods across four hard benchmarks: multi-hop reasoning, long-chain question answering, long-context aggregation, and deep research-style information seeking. DOLORES outperforms all evaluated scaffolds across three model sizes and two model families, improving over the strongest evaluated scaffold baseline by 24.8% on average. DOLORES distributes cognition across structured, lower-load reasoning threads, thereby reducing premature termination and hallucinations. This advantage can even bridge the scaling gap, with an 8B version surpassing all evaluated 32B baselines from the same family in more than half the settings. These results point toward future agentic systems that treat scaffolding as adaptive reasoning, constructing the structure each task requires just-in-time.

2605.11387 2026-05-13 cs.LG cs.RO

Behavioral Mode Discovery for Fine-tuning Multimodal Generative Policies

Alberta Longhini, David Emukpere, Jean-Michel Renders, Seungsu Kim

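AI summary This paper studies how to fine-tune pre-trained generative policies with reinforcement learning while preserving the multimodality of their action distributions. To address the collapse of behavior into a single mode that existing methods cause when they improve task performance, the authors propose an unsupervised behavioral mode discovery framework that uncovers latent behavioral modes in the policy and uses mutual information as an intrinsic reward, raising task success while maintaining behavioral diversity. Experiments show that the method outperforms conventional fine-tuning on robotic manipulation tasks, achieving higher success rates and preserving richer multimodal action distributions.

Details
Journal ref
International Conference on Machine Learning, 2026
English abstract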

We address the problem of fine-tuning pre-trained generative policies with reinforcement learning (RL) while preserving the multimodality of their action distributions. Existing methods for RL fine-tuning of generative policies (e.g., diffusion policies) improve task performance but often collapse diverse behaviors into a single reward-maximizing mode. To mitigate this issue, we propose an unsupervised mode discovery framework that uncovers latent behavioral modes within generative policies. The discovered modes enable the use of mutual information as an intrinsic reward, regularizing RL fine-tuning to enhance task success while maintaining behavioral diversity. Experiments on robotic manipulation tasks demonstrate that our method consistently outperforms conventional fine-tuning approaches, achieving higher success rates and preserving richer multimodal action distributions.

2605.11386 2026-05-13 cs.AI

Revisiting Privacy Preservation in Brain-Computer Interfaces: Conceptual Boundaries, Risk Pathways, and a Protection-Strength Grading Framework

Lei Sun, Xiuqing Mao, Shuai Zhang, Qingyu Zeng, Min Zhao, Jiyuan Li, Wenle Dong

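AI summary As brain-computer interface (BCI) technology moves from the laboratory into clinical and real-world use, its privacy problems are becoming increasingly prominent. This paper systematically reviews the pathways by which privacy can leak in BCI systems and proposes a three-dimensional classification framework covering protection object, lifecycle stage, and protection-strength level, which sorts existing work into four levels of protection strength. The study stresses that BCI privacy protection must not only hide data but also separate out task-irrelevant sensitive information while preserving the system's utility, and notes that mental privacy and neuroethical risks remain open problems.

Details
English abstract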

Brain-computer interfaces (BCIs) are moving rapidly from laboratory research into clinical, edge, and real-world settings. Under ISO/IEC 8663:2025, a BCI is a direct communication link between central nervous system activity and external software or hardware systems. This link expands privacy risk beyond raw neural-signal leakage: neural data, derived representations, model assets, and decoded outputs can be re-associated with individuals across collection, transmission, storage, training, inference, and feedback, or used to infer information beyond what a task requires. Starting from the general BCI paradigm, this review defines privacy-protection boundaries, protection objects, and the relationship between user data privacy and model privacy within a shared risk pathway. It then proposes a three-dimensional framework - protection object, lifecycle stage, and dominant protection-strength level - to classify existing work into four levels of protection strength. Finally, mental privacy and neuroethical risks are treated as open issues, emphasizing that BCI privacy protection should not only obscure data but also disentangle task-irrelevant sensitive information while preserving downstream utility. Keywords: Brain-computer interface, Neural data privacy, User data privacy, Model privacy, Disentanglement of task-irrelevant sensitive information, Protection-strength grading, Neuroethical risks

2605.11385 2026-05-13 cs.CV cs.RO

JACoP: Joint Alignment for Compliant Multi-Agent Prediction

Qingze Liu, Alen Mrdovic, Danrui Li, Mathew Schwartz, Sejong Yoon, Mubbasir Kapadia

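AI summary This paper proposes JACoP, a multi-stage framework that addresses collective compliance in multi-agent trajectory prediction. Its core method combines anchor-based filtering of individual trajectories with joint trajectory alignment based on a Markov Random Field, effectively reducing social collisions between trajectories and environmental violations. JACoP markedly improves scene-level plausibility while maintaining prediction accuracy, offering a safer and more reliable prediction scheme for practical applications.

Comments Accepted by CVPRF 2026

Details
English abstract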

Stochastic Human Trajectory Prediction (HTP) using generative modeling has emerged as a significant area of research. Although state-of-the-art models excel in optimizing the accuracy of individual agents, they often struggle to generate predictions that are collectively compliant, leading to output trajectories marred by social collisions and environmental violations, thus rendering them impractical for real-world applications. To bridge this gap, we present JACoP: Joint Alignment for Compliant Multi-Agent Prediction, an innovative multi-stage framework that ensures scene-level plausibility. JACoP incorporates an Anchor-Based Agent-Centric Profiler for effective initial compliance filtering and employs a Markov Random Field (MRF) based aligner to formalize the joint selection for scene predictions. By representing inter-agent spatial and social costs as MRF energy potentials, we successfully infer and sample from the joint trajectory distribution, achieving prediction with optimal scene compliance. Comprehensive experiments show that JACoP not only achieves competitive accuracy, but also sets a new standard in reducing both environmental violations and social collisions, thereby confirming its ability to produce collectively feasible and practically applicable trajectory predictions.
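
The joint-selection idea can be illustrated with a toy pairwise MRF: each agent picks one of its candidate trajectories, a unary cost scores each candidate on its own, and a pairwise cost penalizes time steps where two chosen trajectories come too close. The sketch below finds the lowest-energy joint choice by exhaustive enumeration, which only works for tiny scenes. The cost definitions, `radius`, `pair_weight`, and the example scene are assumptions for illustration; the abstract describes inference and sampling from the joint distribution but does not specify these details.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Trajectory = Sequence[Point]


def collision_cost(a: Trajectory, b: Trajectory, radius: float = 0.5) -> float:
    """Count time steps at which two trajectories are closer than `radius`."""
    return float(sum(1 for p, q in zip(a, b) if math.dist(p, q) < radius))


def select_joint(
    candidates: Sequence[Sequence[Trajectory]],
    unary: Sequence[Sequence[float]],
    pair_weight: float = 10.0,
) -> Tuple[Tuple[int, ...], float]:
    """Return the candidate index per agent that minimizes total MRF energy."""
    best_choice: Tuple[int, ...] = ()
    best_energy = math.inf
    for choice in itertools.product(*(range(len(c)) for c in candidates)):
        energy = sum(unary[i][k] for i, k in enumerate(choice))
        for i, j in itertools.combinations(range(len(choice)), 2):
            energy += pair_weight * collision_cost(
                candidates[i][choice[i]], candidates[j][choice[j]]
            )
        if energy < best_energy:
            best_choice, best_energy = choice, energy
    return best_choice, best_energy


# Hypothetical scene: agent 0 can walk straight or detour; agent 1 walks head-on.
straight = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
detour = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
oncoming = [(2.0, 0.0), (1.0, 0.0), (0.0, 0.0)]
candidates: List[List[Trajectory]] = [[straight, detour], [oncoming]]
unary = [[0.0, 1.0], [0.0]]  # lower is better; the detour is individually less likely
choice, energy = select_joint(candidates, unary)
```

Here the individually best trajectory for agent 0 collides with agent 1 at the middle time step, so the joint optimum switches agent 0 to the detour despite its higher unary cost.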

2605.11383 2026-05-13 cs.CV

HamBR: Active Decision Boundary Restoration Based on Hamiltonian Dynamics for Learning with Noisy Labels

Ningkang Peng, Jingyang Mao, Qianfeng Yu, Xiaoqian Peng, Peirong Ma, Yanhui Gu

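AI summary In large-scale visual recognition and data mining tasks, noisy labels severely harm the generalization of deep neural networks. This paper is the first to propose an active decision boundary restoration method based on Hamiltonian dynamics, HamBR, which uses a Spherical Hamiltonian Monte Carlo mechanism to actively probe inter-class ambiguous regions in feature space and synthesize high-quality virtual outliers, then uses energy-based modeling to build robust barriers at the decision boundaries, restoring their discriminative sharpness. Experiments show that HamBR achieves state-of-the-art performance on several benchmark datasets and significantly improves the model's out-of-distribution detection.

Details
English abstract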

In large-scale visual recognition and data mining tasks, the presence of noisy labels severely undermines the generalization capability of deep neural networks (DNNs). Prevalent sample selection methods rely primarily on training loss or prediction confidence for passive screening. However, within a feature space degraded by noise, decision boundaries undergo systematic boundary collapse. This phenomenon hinders the ability of the model to distinguish between hard clean samples and noisy samples at the decision margins, thereby creating a significant performance bottleneck. This study is the first to emphasize the pivotal importance of active boundary restoration for noise-robust learning. We propose HamBR, a novel paradigm based on Hamiltonian dynamics. The core approach leverages the Spherical Hamiltonian Monte Carlo (Spherical HMC) mechanism to actively probe inter-class ambiguous regions within the representation space and synthesize high-quality virtual outliers. By imposing explicit repulsion constraints via energy-based modeling, these synthesized samples establish robust energy barriers at the decision boundaries. This mechanism forces real samples to move from dispersed overlapping regions toward their respective class centers, thereby restoring the discriminative sharpness of the decision boundaries. HamBR demonstrates exceptional versatility and can be integrated as a plug-and-play defense module into existing semi-supervised noisy label learning frameworks. Empirical evaluations show that the proposed paradigm significantly enhances the discriminative accuracy of hard boundary samples, achieving state-of-the-art (SOTA) performance on CIFAR-10/100 and real-world noise benchmarks. Furthermore, it exhibits superior convergence efficiency and reliable robustness, while significantly improving the model's capability for Out-of-Distribution (OOD) detection.

2605.11381 2026-05-13 cs.RO cs.DC

Kairos: A Scalable Serving System for Physical AI

Yinwei Dai, Ganesh Ananthanarayanan, Landon Cox, Xenofon Foukas, Bozidar Radunovic, Ravi Netravali

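AI summary As physical AI grows more capable in general environments, its inference characteristics differ markedly from those of digital AI, and existing digital AI serving systems cannot meet its needs. This paper presents Kairos, the first physical AI serving system designed for multiple robots, which treats the generate-execute loop as a core mechanism and significantly improves task execution efficiency. Experiments show that across a variety of physical AI models and robot platforms, Kairos reduces average end-to-end task latency by 31.8% to 66.5% relative to existing digital AI serving approaches, with gains that grow with the size of the robot fleet.

Details
English abstract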

Physical AI is experiencing rapid growth with frontier foundation models increasing its capabilities across general environments. Physical AI tasks are characterized by inference properties that are markedly different from digital AI. They consist of multiple rounds of inference and action execution, generating a chunk of actions in each inference round, and asynchronously interleaving inference and execution. This makes existing digital AI serving systems unsuited for physical AI; a shortcoming that is critical for enabling their wide adoption, considering their size and the scale of the robot fleets they have to serve. To fill this gap, we design Kairos, the first multi-robot serving system that makes the generate-execute loop a first-class citizen, with active involvement in the execution phase. Across a wide range of physical AI models and robots, Kairos reduces the average end-to-end task latency by 31.8--66.5% over state-of-the-art digital AI serving practices, with gains scaling with the robot fleet size.

2605.11380 2026-05-13 cs.LG cs.AI

TRACE: Temporal Routing with Autoregressive Cross-channel Experts for EEG Representation Learning

Fan Ma, Qier An, Peng Chen, Lingfei Qian, Xiang Lan, Mingyang Jiang, Zhiling Gu, Xenophon Papademetris, Hua Xu

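AI summary This paper proposes TRACE, an autoregressive EEG pre-training framework that addresses the difficulty of learning transferable representations from multi-channel, non-stationary EEG signals. TRACE predicts future EEG patches from causal context and performs temporally adaptive, cross-channel coherent computation at each time step, allowing flexible modeling of different temporal regimes and inter-channel relationships. The method supports heterogeneous pre-training across different channel configurations and recording domains, and experiments show that it performs well on multiple downstream tasks and is particularly competitive on motor imagery and clinical event classification.

Details
English abstract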

Learning transferable representations for electroencephalography (EEG) remains challenging because EEG signals are inherently multi-channel and non-stationary. Channels observed at the same time provide coupled measurements of neural activity, while the relevant temporal dynamics vary across contexts. This structure is poorly matched by architectures that apply uniform computation across time or route each channel patch independently. To this end, we propose TRACE, an autoregressive EEG pre-training framework that predicts future EEG patches from causal context while performing temporally adaptive and cross-channel coherent computation. At each temporal step, TRACE derives an expert routing decision from the causal cross-channel history and applies it jointly to all channels at that step. This preserves instantaneous cross-channel coherence while allowing different temporal regimes to activate different computation. Since routing is defined over the available channel set and causal temporal context, TRACE is compatible with heterogeneous pre-training across corpora with different channel counts, montages, sequence lengths, and recording domains. Across eight downstream EEG benchmarks, TRACE is evaluated in both settings: when downstream domains are seen only as unlabeled pre-training data and when downstream datasets are completely unseen during pre-training. It obtains the best results on several benchmarks while remaining competitive on motor imagery and clinical event classification tasks, with ablations supporting the importance of cross-channel temporal routing.