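arXivDaily: a daily academic digest synchronized with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.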

2602.00181 2026-04-15 cs.CV cs.AI

CamReasoner: Reinforcing Camera Movement Understanding via Structured Spatial Reasoning

Hang Wu, Yujun Cai, Zehao Li, Haonan Ge, Bowen Sun, Junsong Yuan, Yiwei Wang

Abstract

Understanding camera dynamics is a fundamental pillar of video spatial intelligence. However, existing multimodal models predominantly treat this task as a black-box classification, often confusing physically distinct motions by relying on superficial visual patterns rather than geometric cues. We present CamReasoner, a framework that reformulates camera movement understanding as a structured inference process to bridge the gap between perception and cinematic logic. Our approach centers on the Observation-Thinking-Answer (O-T-A) paradigm, which compels the model to articulate spatio-temporal observations and reason about motion patterns within an explicit reasoning block. To instill this capability, we construct a Large-scale Inference Trajectory Suite comprising 18k SFT reasoning chains and 38k RL feedback samples. To the best of our knowledge, we are the first to employ RL for logical alignment in camera movement understanding, ensuring motion inferences are grounded in structured visual reasoning rather than contextual guesswork. Built upon Qwen2.5-VL-7B, CamReasoner-7B improves binary classification accuracy from 73.8% to 78.4% and VQA accuracy from 60.9% to 74.5% over its backbone, consistently outperforming both proprietary and open-source baselines across multiple benchmarks.

2601.21297 2026-04-15 cs.RO cs.SY eess.SY

Deep QP Safety Filter: Model-free Learning for Reachability-based Safety Filter

Byeongjun Kim, H. Jin Kim

Comments Accepted to the 8th Annual Learning for Dynamics and Control Conference (L4DC 2026)

2601.11047 2026-04-15 cs.CL cs.LG

CoG: Controllable Graph Reasoning via Relational Blueprints and Failure-Aware Refinement over Knowledge Graphs

Yuanxiang Liu, Songze Li, Xiaoke Guo, Zhaoyan Gong, Qifei Zhang, Huajun Chen, Wen Zhang

Comments ACL 2026 Main

2601.10398 2026-04-15 cs.AI

LatentRefusal: Latent-Signal Refusal for Unanswerable Text-to-SQL Queries

Xuancheng Ren, Shijing Hu, Zhihui Lu, Jiangqi Huang, Qiang Duan

2601.09313 2026-04-15 cs.CL cs.AI

Understanding or Memorizing? A Case Study of German Definite Articles in Language Models

Jonathan Drechsel, Erisa Bytyqi, Steffen Herbold

Comments Accepted at ACL 2026

2601.09152 2026-04-15 cs.AI

PrivacyReasoner: Can LLM Emulate a Human-like Privacy Mind?

Yiwen Tu, Xuan Liu, Lianhui Qin, Haojian Jin

2601.06794 2026-04-15 cs.AI

No More Stale Feedback: Co-Evolving Critics for Open-World Agent Learning

Zhicong Li, Lingjie Jiang, Yulan Hu, Xingchen Zeng, Yixia Li, Xiangwen Zhang, Guanhua Chen, Zheng Pan, Xin Li, Yong Liu

2512.13961 2026-04-15 cs.CL cs.LG

Olmo 3

Team Olmo: Allyson Ettinger, Amanda Bertsch, Bailey Kuehl, David Graham, David Heineman, Dirk Groeneveld, Faeze Brahman, Finbarr Timbers, Hamish Ivison, Jacob Morrison, Jake Poznanski, Kyle Lo, Luca Soldaini, Matt Jordan, Mayee Chen, Michael Noukhovitch, Nathan Lambert, Pete Walsh, Pradeep Dasigi, Robert Berry, Saumya Malik, Saurabh Shah, Scott Geng, Shane Arora, Shashank Gupta, Taira Anderson, Teng Xiao, Tyler Murray, Tyler Romero, Victoria Graf, Akari Asai, Akshita Bhagia, Alexander Wettig, Alisa Liu, Aman Rangapur, Chloe Anastasiades, Costa Huang, Dustin Schwenk, Harsh Trivedi, Ian Magnusson, Jaron Lochner, Jiacheng Liu, Lester James V. Miranda, Maarten Sap, Malia Morgan, Michael Schmitz, Michal Guerquin, Michael Wilson, Regan Huff, Ronan Le Bras, Rui Xin, Rulin Shao, Sam Skjonsberg, Shannon Zejiang Shen, Shuyue Stella Li, Tucker Wilde, Valentina Pyatkin, Will Merrill, Yapei Chang, Yuling Gu, Zhiyuan Zeng, Ashish Sabharwal, Luke Zettlemoyer, Pang Wei Koh, Ali Farhadi, Noah A. Smith, Hannaneh Hajishirzi

Comments minor edit updates

2512.03963 2026-04-15 cs.CV

TempR1: Improving Temporal Understanding of MLLMs via Temporal-Aware Multi-Task Reinforcement Learning

Tao Wu, Li Yang, Gen Zhan, Yabin Zhang, Yiting Liao, Junlin Li, Deliang Fu, Li Zhang, Limin Wang

2511.22364 2026-04-15 cs.RO cs.AI

BINDER: Instantly Adaptive Mobile Manipulation with Open-Vocabulary Commands

Seongwon Cho, Daechul Ahn, Donghyun Shin, Hyeonbeom Choi, San Kim, Jonghyun Choi

Comments 12 pages, 8 figures

2511.22039 2026-04-15 cs.CV

SparseWorld-TC: Trajectory-Conditioned Sparse Occupancy World Model

Jiayuan Du, Yiming Zhao, Zhenglong Guo, Yong Pan, Wenbo Hou, Zhihui Hao, Kun Zhan, Qijun Chen

Comments Accepted by CVPR2026 as an oral

2511.17097 2026-04-15 cs.RO

Progress-Think: Semantic Progress Reasoning for Vision-Language Navigation

Shuo Wang, Yucheng Wang, Guoxin Lian, Yongcai Wang, Maiyue Chen, Kaihui Wang, Bo Zhang, Zhizhong Su, Yutian Zhou, Wanting Li, Deying Li, Zhaoxin Fan

2511.10453 2026-04-15 cs.CL cs.AI

Reasoning about Intent for Ambiguous Requests

Irina Saparina, Mirella Lapata

2511.09803 2026-04-15 cs.CL

Retrieval as a Decision: Training-Free Adaptive Gating for Efficient RAG

Yufeng Wang, Lu Wei, Haibin Ling

2511.09780 2026-04-15 cs.LG

Hail to the Thief: Exploring Attacks and Defenses in Decentralised GRPO

Nikolay Blagoev, Oğuzhan Ersoy, Lydia Yiyu Chen

Comments Accepted to ACL Findings 2026

2511.06341 2026-04-15 cs.LG cs.RO cs.SY eess.SY math.OC

Scalable Verification of Neural Control Barrier Functions Using Linear Bound Propagation

Nikolaus Vertovec, Frederik Baymler Mathiesen, Thom Badings, Luca Laurenti, Alessandro Abate

Comments accepted at the 8th Annual Conference on Learning for Dynamics and Control (L4DC 2026)

2510.21697 2026-04-15 cs.CV cs.LG

Visual Diffusion Models are Geometric Solvers

Nir Goren, Shai Yehezkel, Omer Dahary, Andrey Voynov, Or Patashnik, Daniel Cohen-Or

Comments Project page: https://kariander1.github.io/visual-geo-solver/

2510.14420 2026-04-15 cs.CL cs.AI

Instructions are all you need: Self-supervised Reinforcement Learning for Instruction Following

Qingyu Ren, Qianyu He, Powei Chang, Jie Zeng, Zeye Sun, Fei Yu, Jiaqing Liang, Yanghua Xiao

2510.13793 2026-04-15 cs.CV cs.CR cs.LG

NoisePrints: Distortion-Free Watermarks for Authorship in Private Diffusion Models

Nir Goren, Oren Katzir, Abhinav Nakarmi, Eyal Ronen, Mahmood Sharif, Or Patashnik

Comments code available at: https://github.com/nirgoren/NoisePrints

2510.11715 2026-04-15 cs.CV

Point Prompting: Counterfactual Tracking with Video Diffusion Models

Ayush Shrivastava, Sanyam Mehta, Daniel Geng, Andrew Owens

Comments ICLR 2026. Project link: https://point-prompting.github.io

2510.01186 2026-04-15 cs.CV

ASTRA: Let Arbitrary Subjects Transform in Video Editing

Fei Shen, Weihao Xu, Rui Yan, Dong Zhang, Xiangbo Shu, Jinhui Tang, Maocheng Zhao

2510.00310 2026-04-15 cs.LG cs.MA

Robust Federated Inference

Akash Dhasade, Sadegh Farhadkhani, Rachid Guerraoui, Nirupam Gupta, Maxime Jacovella, Anne-Marie Kermarrec, Rafael Pinot

Comments Accepted at ICLR 2026

2509.25843 2026-04-15 cs.AI

ASGuard: Activation-Scaling Guard to Mitigate Targeted Jailbreaking Attack

Yein Park, Jungwoo Park, Jaewoo Kang

Comments ICLR 2026, 29 pages, 11 figures

2509.25758 2026-04-15 cs.AI

Thinking Sparks!: Emergent Attention Heads in Reasoning Models During Post Training

Yein Park, Minbyul Jeong, Jaewoo Kang

Abstract

The remarkable capabilities of modern large reasoning models are largely unlocked through post-training techniques such as supervised fine-tuning (SFT) and reinforcement learning (RL). However, the architectural mechanisms behind such improvements remain largely opaque. In this work, we use circuit analysis to demonstrate that post-training for complex reasoning sparks the emergence of novel, functionally specialized attention heads. These heads collectively support structured reasoning and computation. Our comparative analysis across various model families reveals that these emergent heads evolve differently under different training regimes. Distillation and SFT foster a cumulative addition of stable reasoning heads. In contrast, group relative policy optimization (GRPO) operates in a dynamic search mode: relatively few attention heads are iteratively activated, evaluated, and pruned, with their survival closely tracking fluctuations in the task reward signal. Furthermore, we find that controllable "think on/off" models do not possess dedicated "thinking" heads. Instead, turning off explicit reasoning triggers a broader, but less efficient, set of compensatory heads. Through ablation and qualitative analyses, we connect these circuit-level dynamics to a crucial performance trade-off: strengthened heads enable sophisticated problem-solving strategies for difficult problems but can also introduce "over-thinking" failure modes, such as calculation errors or logical loops on simpler tasks. These findings connect circuit-level dynamics to macro-level performance, identifying an inherent tension where complex reasoning comes at the cost of elementary computations. More broadly, our work points to future directions for training policy design, emphasizing the need to balance the development of effective reasoning strategies with the assurance of reliable, flawless execution.

2509.16806 2026-04-15 cs.CV

MedGS: Gaussian Splatting for Multi-Modal 3D Medical Imaging

Kacper Marzol, Ignacy Kolton, Weronika Smolak-Dyżewska, Joanna Kaleta, Żaneta Świderska-Chadaj, Marcin Mazur, Mirosław Dziekiewicz, Tomasz Markiewicz, Przemysław Spurek

2509.08660 2026-04-15 cs.LG

Replicable Reinforcement Learning with Linear Function Approximation

Eric Eaton, Marcel Hussing, Michael Kearns, Aaron Roth, Sikata Bela Sengupta, Jessica Sorrell

Comments ICLR 2026

2509.07177 2026-04-15 cs.CL

Towards EnergyGPT: A Large Language Model Specialized for the Energy Sector

Amal Chebbi, Babajide Kolade

Comments Code and artifacts available at: https://github.com/fitila/energygpt-release

2509.03497 2026-04-15 cs.LG

Invariant Features for Global Crop Type Classification

Xin-Yi Tong, Sherrie Wang

2509.03234 2026-04-15 cs.LG

TeRA: Vector-based Random Tensor Network for High-Rank Adaptation of Large Language Models

Yuxuan Gu, Wuyang Zhou, Giorgos Iacovides, Danilo Mandic

Comments Accepted at ACL main conference 2026. Code is available at https://github.com/guyuxuan9/TeRA

2508.07267 2026-04-15 cs.RO

Bio-Inspired Topological Autonomous Navigation with Active Inference in Robotics

Daria de Tinguy, Tim Verbelen, Emilio Gamba, Bart Dhoedt

Comments Conference ICCAS 2025 - accepted (in processing)

DOI: 10.1007/978-3-032-16955-6_19
Journal ref: Communications in Computer and Information Science, vol 2857, 2025

Abstract

Achieving fully autonomous exploration and navigation remains a critical challenge in robotics, requiring integrated solutions for localisation, mapping, decision-making and motion planning. Existing approaches either rely on strict navigation rules lacking adaptability or on pre-training, which requires large datasets. These AI methods are often computationally intensive or based on static assumptions, limiting their adaptability in dynamic or unknown environments. This paper introduces a bio-inspired agent based on the Active Inference Framework (AIF), which unifies mapping, localisation, and adaptive decision-making for autonomous navigation, including exploration and goal-reaching. Our model creates and updates a topological map of the environment in real time, planning goal-directed trajectories to explore or reach objectives without requiring pre-training. Key contributions include a probabilistic reasoning framework for interpretable navigation, robust adaptability to dynamic changes, and a modular ROS2 architecture compatible with existing navigation systems. Our method was tested in simulated and real-world environments. The agent successfully explores large-scale simulated environments and adapts to dynamic obstacles and drift, proving to be comparable to other exploration strategies such as Gbplanner, FAEL and Frontiers. This method offers a scalable and transparent approach for navigating complex, unstructured environments.
