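arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.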

2605.01232 2026-05-05 cs.RO

A Principled Approach for Creating High-fidelity Synthetic Demonstrations for Imitation Learning

Moniruzzaman Akash, Momotaz Begum

Abstract

Recent advances in 3D Gaussian Splatting (3DGS) have enabled visually realistic demonstration generation from a single expert trajectory and a short multi-view scan. However, existing 3DGS-based synthesis pipelines typically generate new motions using sampling-based planners or trajectory optimization, which often deviate substantially from the expert's demonstrated path. While such deviations may be acceptable for tasks insensitive to motion shape, they discard subtle spatial and temporal structure that is critical for contact-rich and shape-sensitive manipulation, causing increased demonstration diversity to harm downstream policy learning. We argue that demonstration synthesis should treat the expert trajectory as a strong prior. Building on this principle, we propose a framework that synthesizes diverse task demonstrations while explicitly preserving expert motion structure. We model the expert trajectory using Dynamic Movement Primitives (DMPs) and retarget it to new goals, object configurations, and viewpoints within a reconstructed 3DGS scene, yielding phase-consistent, shape-preserving motion by construction. To safely realize this expert-preserving diversity in cluttered scenes, we introduce an analytic obstacle-aware DMP formulation that operates directly on the continuous density field induced by the 3DGS representation. This enables collision avoidance while minimally perturbing the nominal expert motion, unifying photorealistic rendering and geometric reasoning without additional scene representations. We evaluate our approach on a Spot mobile manipulator across three manipulation tasks with increasing sensitivity to trajectory fidelity. Compared to planner- and optimization-based synthesis, our method produces trajectories with lower deviation and collision rates and yields higher task success when training diffusion-based visuomotor policies.
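The retargeting idea in this abstract (fit a DMP to one expert trajectory, then replay it toward a new goal while keeping the motion shape) can be illustrated with a minimal 1-D sketch. This is a textbook discrete DMP, not the authors' obstacle-aware formulation: the gains `ALPHA`, `BETA`, `ALPHA_X`, the basis count, the Euler integrator, and the function names `fit_dmp` and `rollout` are all assumptions made for illustration.

```python
import math

# Assumed gains: BETA = ALPHA / 4 gives a critically damped spring.
ALPHA, BETA, ALPHA_X = 25.0, 6.25, 4.0

def _basis(x, centers, widths):
    """Gaussian basis activations at canonical phase x."""
    return [math.exp(-h * (x - c) ** 2) for c, h in zip(centers, widths)]

def fit_dmp(demo, dt, n_basis=20):
    """Fit forcing-term weights so the DMP reproduces a 1-D demonstration."""
    n = len(demo)
    tau = (n - 1) * dt
    y0, g = demo[0], demo[-1]
    scale = g - y0  # assumes the demo does not end where it starts

    def diff(seq):
        # Central differences, one-sided at the two ends.
        out = []
        for i in range(n):
            lo, hi = max(i - 1, 0), min(i + 1, n - 1)
            out.append((seq[hi] - seq[lo]) / ((hi - lo) * dt))
        return out

    vel = diff(demo)
    acc = diff(vel)
    centers = [math.exp(-ALPHA_X * k / (n_basis - 1)) for k in range(n_basis)]
    widths = [n_basis ** 1.5 / c / ALPHA_X for c in centers]
    num = [0.0] * n_basis
    den = [1e-10] * n_basis
    for i in range(n):
        x = math.exp(-ALPHA_X * i * dt / tau)
        # Forcing term that would make the spring-damper follow the demo.
        f_target = tau ** 2 * acc[i] - ALPHA * (BETA * (g - demo[i]) - tau * vel[i])
        s = x * scale
        for k, psi in enumerate(_basis(x, centers, widths)):
            num[k] += psi * s * f_target
            den[k] += psi * s * s
    weights = [a / b for a, b in zip(num, den)]  # locally weighted regression
    return {"w": weights, "c": centers, "h": widths, "tau": tau}

def rollout(dmp, y0, goal, dt, steps):
    """Integrate the DMP with explicit Euler toward a (possibly new) goal."""
    y, z, x = y0, 0.0, 1.0
    traj = [y]
    for _ in range(steps):
        psi = _basis(x, dmp["c"], dmp["h"])
        f = sum(p * w for p, w in zip(psi, dmp["w"])) / (sum(psi) + 1e-10) * x * (goal - y0)
        dz = (ALPHA * (BETA * (goal - y) - z) + f) / dmp["tau"]
        dy = z / dmp["tau"]
        z += dz * dt
        y += dy * dt
        x += -ALPHA_X * x / dmp["tau"] * dt
        traj.append(y)
    return traj

# Hypothetical expert demo: a minimum-jerk reach from 0 to 1 over one second.
dt = 0.01
demo = [10 * u ** 3 - 15 * u ** 4 + 6 * u ** 5 for u in (i / 100 for i in range(101))]
dmp = fit_dmp(demo, dt)
base = rollout(dmp, 0.0, 1.0, dt, 100)       # replay toward the original goal
retargeted = rollout(dmp, 0.0, 2.0, dt, 100)  # same shape, new goal
```

Because the forcing term is scaled by `goal - y0`, the retargeted rollout keeps the phase and shape of the demonstration and only stretches its amplitude, which is the shape-preserving property the abstract relies on.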

2605.01231 2026-05-05 cs.LG

CombinationTS: A Modular Framework for Understanding Time-Series Forecasting Models

Xiaorui Wang, Fanda Fan, Chenxi Wang, Yuxuan Yang, Rui Tang, Kuoyu Gao, Simiao Pang, Yuanfeng Shang, Zhipeng Liu, Wanling Gao, Lei Wang, Jianfeng Zhan

Comments Accepted by ICML 2026 main track. Code available at https://github.com/BenchCouncil/CombinationTS

2605.01229 2026-05-05 cs.LG cs.CL

Attention Sinks in Massively Multilingual Neural Machine Translation: Discovery, Analysis, and Mitigation

Hillary Mutisya, John Mugane

2605.01227 2026-05-05 cs.RO

Dynamics Aware Quadrupedal Locomotion via Intrinsic Dynamics Head

Aman Arora, Nalini Ratha

Comments 8 pages, 6 figures

2605.01226 2026-05-05 cs.LG

Arbitrarily Conditioned Hierarchical Flows for Spatiotemporal Events

Keyan Chen, Qiwei Yuan, Zhitong Xu, Bin Shen, Shandian Zhe

2605.01224 2026-05-05 cs.CL

Lost in the Tower of Babel: The Adverse Effects of Incidental Multilingualism in LLMs

Anjishnu Mukherjee, Chutong Meng, Antonios Anastasopoulos

Comments under review

2605.01222 2026-05-05 cs.AI

Zero-Shot Signal Temporal Logic Planning with Disjunctive Branch Selection in Dynamic Semantic Maps

Bowen Ye, Ancheng Hou, Junyue Huang, Ruijia Liu, Xiang Yin

2605.01221 2026-05-05 cs.LG

Local Hessian Spectral Filtering for Robust Intrinsic Dimension Estimation

Genki Osada

Comments Accepted at ICML 2026

2605.01220 2026-05-05 cs.CV

Visual Implicit Autoregressive Modeling

Pengfei Jiang, Jixiang Luo, Luxi Lin, Zhaohong Huang, Xuelong Li

Comments ICML 2026

2605.01217 2026-05-05 cs.CV

Asymmetric Invertible Threat: Learning Reversible Privacy Defense for Face Recognition

Jiabei Zhang, Ziyuan Yang, Andrew Beng Jin Teoh, Yi Zhang

2605.01214 2026-05-05 cs.AI cs.CY

Agentic AI Systems Should Be Designed as Marginal Token Allocators

Siqi Zhu

2605.01208 2026-05-05 cs.AI

Faithful Mobile GUI Agents with Guided Advantage Estimator

Haowen Hu, Pengzhou Cheng, Zheng Wu, Lingzhong Dong, Gongshen Liu, Zhuosheng Zhang

2605.01201 2026-05-05 cs.RO

To Do or Not to Do: Ensuring the Safety of Visuomotor Policies Learned from Demonstrations

Riad Ahmed, Moniruzzaman Akash, Momotaz Begum

2605.01199 2026-05-05 cs.LG

Focus and Dilution: The Multi-stage Learning Process of Attention

Zheng-An Chen, Pengxiao Lin, Zhi-Qin John Xu, Tao Luo

Comments ICML 2026 spotlight

2605.01197 2026-05-05 cs.SD cs.MM

MG-Former: A Transformer-Based Framework for Music-Driven 3D Conducting Gesture Generation

Ke Qiu, Yawen Qin, Tianzhi Jia, Xiaole Yang, Kaimin Wang, Kaixing Yang

2605.01192 2026-05-05 cs.LG cs.IT math.IT

Linear-Readout Floors and Threshold Recovery in Computation in Superposition

Hector Borobia, Elies Seguí-Mas, Guillermina Tormo-Carbó

Comments 38 pages, preprint, no figures; comments welcome

2605.01189 2026-05-05 cs.AI

NEURON: A Neuro-symbolic System for Grounded Clinical Explainability

Anuradha Chandrasekaran, Dimitrios Zikos, Mutlu Mete, Alan Pang, Brady D. Lund, Kewei Sha

2605.01185 2026-05-05 cs.CV

Phase-map synthesis from magnitude-only MR images using conditional score-based diffusion models with application in training of accelerated MRI reconstruction models

M. Berk Sahin, Dilek Yalcinkaya, Abolfazl Hashemi, Behzad Sharif

Abstract

Accelerated magnetic resonance imaging (MRI) enabled by the training of deep learning (DL)-based image recon. models requires large and diverse raw k-space datasets. In most clinical MRI applications, due to storage and patient privacy concerns, raw k-space data is discarded and magnitude-only images are the only component saved. Consequently, a large portion of the DL-based MRI recon. literature has either relied on small training datasets or has used one of the few available open-source k-space datasets. At the same time, the growing number of anonymized magnitude-only image registries/databases motivates the development of techniques that can use them as training datasets for generalizable DL-based recon. models. Here we propose to address this challenge by employing a generative approach based on conditional score-based diffusion models (SBDMs): given a magnitude-only MR image, it synthesizes a phase map (in the image domain) that realistically corresponds to the magnitude-only image. We evaluate its generative capabilities in a downstream DL-based recon. task whereby a large k-space dataset is generated by combining the SBDM-synthesized phase-maps and the corresponding magnitude-only images, and this k-space dataset is then used to train a DL model for accelerated MRI recon. We compare the performance of the resulting DL model versus those trained according to (a) a naive approach that uses smooth phase, (b) a k-space training dataset generated using synthesized phase maps derived from a generative adversarial network, and (c) the ground truth k-space data. Our results suggest that the DL model trained from SBDM-synthesized k-space data outperforms the other approaches in terms of quantitative metrics as well as qualitatively observed recon. fidelity, i.e., whether the reconstructed images include erroneous or hallucinated features that could adversely impact diagnostic accuracy.
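The step of combining a magnitude-only image with a synthesized phase map to obtain k-space data can be sketched as follows. This is only an illustration under simplifying assumptions (single coil, fully sampled Cartesian grid, a naive DFT suitable for tiny arrays); the function names `dft2` and `synthesize_kspace` are hypothetical, and the phase values are hand-written placeholders rather than output of a score-based diffusion model.

```python
import cmath
import math

def dft2(img):
    """Naive 2-D discrete Fourier transform of a small complex image."""
    rows, cols = len(img), len(img[0])
    out = [[0j] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            total = 0j
            for r in range(rows):
                for c in range(cols):
                    angle = -2.0 * math.pi * (u * r / rows + v * c / cols)
                    total += img[r][c] * cmath.exp(1j * angle)
            out[u][v] = total
    return out

def synthesize_kspace(magnitude, phase):
    """Form the complex image m * exp(i * phi) and transform it to k-space."""
    complex_img = [
        [m * cmath.exp(1j * p) for m, p in zip(m_row, p_row)]
        for m_row, p_row in zip(magnitude, phase)
    ]
    return dft2(complex_img)

# Placeholder inputs: a 2x2 magnitude image and a phase map in radians.
magnitude = [[1.0, 2.0], [3.0, 4.0]]
phase = [[0.0, 0.1], [0.2, 0.3]]
kspace = synthesize_kspace(magnitude, phase)
kspace_zero_phase = synthesize_kspace(magnitude, [[0.0, 0.0], [0.0, 0.0]])
```

The phase map changes how energy is distributed across k-space but not the total energy, so a more realistic phase yields more realistic training data without altering the stored magnitude image.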

2605.01172 2026-05-05 cs.LG stat.ML

A Theory of Generalization in Deep Learning

Elon Litman, Gabe Guo

2605.01168 2026-05-05 cs.CL

Quantifying and Predicting Disagreement in Graded Human Ratings

Leixin Zhang, Çağrı Çöltekin

Comments Accepted by the 5th Workshop on Perspectivist Approaches to NLP at LREC

2605.01167 2026-05-05 cs.LG cs.AI

Minimizing Collateral Damage in Activation Steering

Tam Nguyen, Tu Anh Nguyen, Sina Alemohammad, Richard G. Baraniuk

2605.01165 2026-05-05 cs.CV

CEZSAR: A Contrastive Embedding Method for Zero-Shot Action Recognition

Valter Estevam, Rayson Laroca, Helio Pedrini, David Menotti

Comments Accepted for presentation at the International Conference on Pattern Recognition (ICPR) 2026

2605.01164 2026-05-05 cs.AI

LLMs Should Not Yet Be Credited with Decision Explanation

Wenshuo Wang

2605.01154 2026-05-05 cs.LG cs.AI

Multi-Perspective Transformers in ARC-AGI-2 Challenge

Caleb Talley, Vedant Tibrewal, Seun Adekunle, Weiwen Dong, Xinyu Wu, Fariha Sheikh

2605.01148 2026-05-05 cs.AI cs.CL

Arithmetic in the Wild: Llama uses Base-10 Addition to Reason About Cyclic Concepts

Sheridan Feucht, Tal Haklay, Usha Bhalla, Daniel Wurgaft, Can Rager, Raphaël Sarfati, Jack Merullo, Thomas McGrath, Owen Lewis, Ekdeep Singh Lubana, Thomas Fel, Atticus Geiger

2605.01147 2026-05-05 cs.AI

Position: Safety and Fairness in Agentic AI Depend on Interaction Topology, Not on Model Scale or Alignment

Tanav Singh Bajaj, Nikhil Singh, Karan Anand, Eishkaran Singh

Comments 18 pages, 8 figures. Position paper

2605.01144 2026-05-05 cs.CV cs.AI

Semantic Context-aware mOdality fUsion Transformer (SCOUT): A Context-Aware Multimodal Transformer for Concept-Grounded Pathology Report Generation

Suryakant Singh, Saarthak Kapse, Joel Saltz, Prateek Prasanna

Abstract

Whole-slide images (WSIs) present a fundamental challenge for computational pathology due to their extreme resolution, multi-scale heterogeneity, and the requirement for clinically reliable interpretation. Although recent pathology foundation models have enabled fluent report generation, they often lack clinical grounding, failing to accurately represent key diagnostic concepts and relationships observed by pathologists. This limitation arises from the difficulty of integrating heterogeneous visual evidence spanning fine-grained cellular patterns, slide-level tissue architecture, and high-level diagnostic concepts, while maintaining interpretability and clinical coherence. Here we present SCOUT: Semantic Context-aware mOdality fUsion Transformer, a context-aware concept-grounded multimodal framework for pathology report generation that enables progressive conditioning of image representations by global slide information and explicit diagnostic concepts. The method integrates local histological patterns, whole-slide context, and expert-curated semantic descriptors within a unified learning paradigm, allowing visual features to be dynamically refined throughout the encoding process. By combining depth-aware contextual modulation with adaptive multimodal fusion during text generation, the framework produces clinically coherent reports while preserving complementarity across representational scales. Using CONCH1.5 features, we evaluate SCOUT against WSI-Caption, HistGen, and BiGen on TCGA-BRCA, MICCAI REG, and HistAI. SCOUT achieves the best BLEU-1 to BLEU-4 and METEOR scores on all datasets, plus the best ROUGE-L on TCGA-BRCA and MICCAI REG. On TCGA-BRCA, it reaches 0.436/0.303/0.202/0.156 BLEU-1/2/3/4 and 0.204 METEOR; on REG 2025, it achieves 0.865/0.834/0.805/0.780 and 0.568. These results support progressive contextual conditioning for grounded pathology report generation.

2605.01143 2026-05-05 cs.AI

A Low-Latency Fraud Detection Layer for Detecting Adversarial Interaction Patterns in LLM-Powered Agents

Sheldon Yu, Yingcheng Sun, Hanqing Guo, Julian McAuley, Qianqian Tong

2605.01137 2026-05-05 cs.LG cs.CR

Metric-Normalized Posterior Leakage (mPL): Attacker-Aligned Privacy for Joint Consumption

Gaoyi Chen, Minghao Li, Weishi Shi, Yan Huang, Yusheng Wei, Sourabh Yadav, Chenxi Qiu

2605.01136 2026-05-05 cs.LG cs.SI math.SP stat.ML

Spectral Graph Sparsification Preserves Representation Geometry in Graph Neural Networks

Sanjukta Krishnagopal

Comments 9 pages, 4 figures