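arXivDaily daily academic digest: syncs the full arXiv feed with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.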

2604.02870 2026-04-06 cs.CV

Token Warping Helps MLLMs Look from Nearby Viewpoints

Phillip Y. Lee, Chanho Park, Mingue Park, Seungwoo Yoo, Juil Koo, Minhyuk Sung

Comments CVPR 2026, Project Page: https://token-warping-mllm.github.io

Abstract

Can warping tokens, rather than pixels, help multimodal large language models (MLLMs) understand how a scene appears from a nearby viewpoint? While MLLMs perform well on visual reasoning, they remain fragile to viewpoint changes, and warping at the pixel level is not a reliable remedy: pixel-wise warping is highly sensitive to small depth errors and often introduces geometric distortions. Drawing on theories of mental imagery that posit part-level structural representations as the basis for human perspective transformation, we examine whether image tokens in ViT-based MLLMs serve as an effective substrate for viewpoint changes. We compare forward and backward warping and find that backward token warping is more stable and better preserves semantic coherence under viewpoint shifts. Backward token warping defines a dense grid on the target view and retrieves a corresponding source-view token for each grid point. Experiments on our proposed ViewBench benchmark demonstrate that token-level warping enables MLLMs to reason reliably from nearby viewpoints. It consistently outperforms all baselines, including pixel-wise warping approaches, spatially fine-tuned MLLMs, and a generative warping method.
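The abstract describes backward token warping as a dense grid on the target view, where each grid point fetches a source-view token. The NumPy sketch below illustrates that idea under stated assumptions and is not the authors' implementation. The function name `backward_warp_tokens` is hypothetical. The sketch also assumes:

- a known depth at each target-view patch center;
- pinhole intrinsics `K` shared by both views;
- a known relative pose (`R`, `t`) from the target camera to the source camera;
- simple nearest-token retrieval.

```python
import numpy as np

def backward_warp_tokens(src_tokens, tgt_depth, K, R, t, patch_size):
    """Hypothetical sketch of backward token warping (not the paper's code).

    src_tokens: (Hp, Wp, D) grid of source-view ViT patch tokens.
    tgt_depth:  (Hp, Wp) depth at each target-view patch center (assumption).
    K:          (3, 3) pinhole intrinsics shared by both views (assumption).
    R, t:       (3, 3) rotation and (3,) translation mapping target-camera
                coordinates into source-camera coordinates (assumption).
    patch_size: side length of one ViT patch in pixels.
    Returns the warped (Hp, Wp, D) token grid and an (Hp, Wp) validity mask.
    """
    Hp, Wp, D = src_tokens.shape

    # Dense grid on the target view: one point at each patch center.
    ys, xs = np.meshgrid(np.arange(Hp), np.arange(Wp), indexing="ij")
    u = (xs + 0.5) * patch_size
    v = (ys + 0.5) * patch_size
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)

    # Unproject target grid points to 3D using the assumed target depth.
    rays = pix @ np.linalg.inv(K).T
    pts_tgt = rays * tgt_depth.reshape(-1, 1)

    # Transform into the source camera frame and project onto its image.
    pts_src = pts_tgt @ R.T + t
    z = pts_src[:, 2]
    in_front = z > 1e-6
    z_safe = np.where(in_front, z, 1.0)  # avoid dividing by ~0 for invalid points
    proj = pts_src @ K.T
    us = proj[:, 0] / z_safe
    vs = proj[:, 1] / z_safe

    # Nearest-token retrieval: fetch the source patch containing the point.
    cols = np.floor(us / patch_size).astype(int)
    rows = np.floor(vs / patch_size).astype(int)
    valid = in_front & (cols >= 0) & (cols < Wp) & (rows >= 0) & (rows < Hp)

    # Grid points that land outside the source view keep a zero token.
    warped = np.zeros((Hp * Wp, D), dtype=src_tokens.dtype)
    warped[valid] = src_tokens[rows[valid], cols[valid]]
    return warped.reshape(Hp, Wp, D), valid.reshape(Hp, Wp)
```

The identity pose returns the source grid unchanged. A lateral shift of exactly one patch moves every token one column over and leaves the last column empty. Because whole tokens are fetched, a small depth error only changes the result when it pushes a projected point across a patch boundary. That is one plausible intuition for the stability the abstract reports.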

2604.02869 2026-04-06 cs.AI

Multi-Turn Reinforcement Learning for Tool-Calling Agents with Iterative Reward Calibration

Wachiravit Modecrua, Krittanon Kaewtawee, Krittin Pachtrachai, Touchapon Kraisingkorn

2604.02867 2026-04-06 cs.CV

HairOrbit: Multi-view Aware 3D Hair Modeling from Single Portraits

Leyang Jin, Yujian Zheng, Bingkui Tong, Yuda Qiu, Zhenyu Xie, Hao Li

Comments 17 pages, 6 figures

2604.02866 2026-04-06 cs.CL

LLM-based Atomic Propositions help weak extractors: Evaluation of a Propositioner for triplet extraction

Luc Pommeret, Thomas Gerald, Patrick Paroubek, Sahar Ghannay, Christophe Servan, Sophie Rosset

2604.02860 2026-04-06 cs.CV cs.AI

A Paradigm Shift: Fully End-to-End Training for Temporal Sentence Grounding in Videos

Allen He, Qi Liu, Kun Liu, Xinchen Liu, Wu Liu

Comments Accepted as CVPR 2026 Workshop PVUW

2604.02847 2026-04-06 cs.CV

HiDiGen: Hierarchical Diffusion for B-Rep Generation with Explicit Topological Constraints

Shurui Liu, Weide Chen, Ancong Wu

2604.02845 2026-04-06 cs.CV

Deformation-based In-Context Learning for Point Cloud Understanding

Chengxing Lin, Jinhong Deng, Yinjie Lei, Wen Li

Comments Accepted by CVPR 2026. Code: https://github.com/linchengxing/DeformPIC

2604.02836 2026-04-06 cs.CV

Factorized Multi-Resolution HashGrid for Efficient Neural Radiance Fields: Execution on Edge-Devices

Kim Jun-Seong, Mingyu Kim, GeonU Kim, Tae-Hyun Oh, Jin-Hwa Kim

Comments Accepted for publication in IEEE Robotics and Automation Letters (RA-L)

DOI: 10.1109/LRA.2024.3460419
Journal ref: IEEE Robotics and Automation Letters (RA-L), 2024

Abstract

We introduce Fact-Hash, a novel parameter-encoding method for training neural radiance fields on-device. Neural Radiance Fields (NeRF) have proven pivotal in 3D representation, but their applications are limited by their large computational requirements. On-device training could open up broad application areas, offering advantages under communication limitations, for privacy, and for fast adaptation to frequently changing scenes. However, limited resources (GPU memory, storage, and power) impede on-device deployment. To address this, Fact-Hash merges tensor factorization with hash encoding. This integration offers two benefits: rich high-resolution features and robustness in few-shot settings. In Fact-Hash, we project 3D coordinates into multiple lower-dimensional forms (2D or 1D), apply the hash function to each, and then aggregate the results into a single feature. Comparative evaluations against state-of-the-art methods demonstrate Fact-Hash's superior memory efficiency while preserving quality and rendering speed. Compared with previous encoding methods, Fact-Hash reduces memory usage by over one-third while maintaining PSNR. On-device experiments confirm that Fact-Hash outperforms alternative positional encoding methods in computational efficiency and energy consumption. These findings highlight Fact-Hash as a promising way to improve feature-grid representations, address memory constraints, and improve quality across applications. Project page: https://facthash.github.io/
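The abstract's encoding step projects 3D coordinates to lower-dimensional forms, hashes each projection, and aggregates the results into one feature. A small NumPy sketch can illustrate that step. Every specific below is an assumption rather than the paper's design:

- The names `fact_hash_encode`, `hash_coords`, and `PLANES` are hypothetical.
- The XOR-of-primes spatial hash and its primes are borrowed from Instant-NGP-style multiresolution hash encoding.
- The projections are the three axis-aligned planes.
- Features are read at the nearest lower grid vertex without interpolation.
- Per-projection features are aggregated by summation.

```python
import numpy as np

# Spatial-hash primes in the style of Instant-NGP (assumption, not from the paper).
PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)

# Hypothetical projection set: the three axis-aligned 2D planes.
# A 1D variant would use ((0,), (1,), (2,)) instead.
PLANES = ((0, 1), (0, 2), (1, 2))

def hash_coords(cells, table_size):
    """XOR-of-primes hash of non-negative integer grid cells (N, k) -> indices (N,)."""
    cells = cells.astype(np.uint64)
    h = np.zeros(cells.shape[0], dtype=np.uint64)
    for d in range(cells.shape[1]):
        h ^= cells[:, d] * PRIMES[d]  # uint64 overflow wraps, as intended for hashing
    return (h % np.uint64(table_size)).astype(np.int64)

def fact_hash_encode(xyz, tables, resolutions, projections=PLANES):
    """Hypothetical sketch of a factorized hash encoding.

    xyz:         (N, 3) points, assumed normalized to [0, 1).
    tables:      tables[level][p] is a (T, F) feature table for projection p.
    resolutions: grid resolution for each level.
    Returns (N, L * F): within each level, the features looked up on every
    projection are summed (assumed aggregation); levels are concatenated.
    """
    feats = []
    for level, res in enumerate(resolutions):
        level_feat = 0.0
        for p, axes in enumerate(projections):
            # Project 3D -> 2D (or 1D), then snap to the lower grid vertex.
            cells = np.floor(xyz[:, list(axes)] * res)
            table = tables[level][p]
            idx = hash_coords(cells, table.shape[0])
            level_feat = level_feat + table[idx]
        feats.append(level_feat)
    return np.concatenate(feats, axis=-1)
```

Each projection hashes only 2D (or 1D) keys. At resolution R, a plane has R² cells rather than the R³ of a full 3D grid, so a table of a given size suffers fewer collisions. This is one way to read the memory savings the abstract reports, though the paper's exact mechanism may differ.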

2604.02834 2026-04-06 cs.AI

ESL-Bench: An Event-Driven Synthetic Longitudinal Benchmark for Health Agents

Chao Li, Cailiang Liu, Ang Gao, Kexin Deng, Shu Zhang, Langping Xu, Xiaotong Shi, Xionghao Ding, Jian Pei, Xun Jiang

2604.02829 2026-04-06 cs.CV cs.RO

STRNet: Visual Navigation with Spatio-Temporal Representation through Dynamic Graph Aggregation

Hao Ren, Zetong Bi, Yiming Zeng, Zhaoliang Wan, Lu Qi, Hui Cheng

Comments CVPR 2026

2604.02828 2026-04-06 cs.CV cs.AI

NavCrafter: Exploring 3D Scenes from a Single Image

Hongbo Duan, Peiyu Zhuang, Yi Liu, Zhengyang Zhang, Yuxin Zhang, Pengting Luo, Fangming Liu, Xueqian Wang

Comments 8 pages, accepted by ICRA 2026

2604.02827 2026-04-06 cs.RO

Orientation Matters: Learning Radiation Patterns of Multi-Rotor UAVs In-Flight to Enhance Communication Availability Modeling

Martin Zoula, Daniel Bonilla Licea, Jan Faigl, Václav Navrátil, Martin Saska

Comments 9 pages, 8 figures

2604.02820 2026-04-06 cs.RO

MFE: A Multimodal Hand Exoskeleton with Interactive Force, Pressure and Thermo-haptic Feedback

Ziyuan Tang, Yitian Guo, Chenxi Xiao

Comments 8 pages, 7 figures, 2 tables

DOI: 10.1109/LRA.2026.3662616
Journal ref: IEEE Robotics and Automation Letters 11 (2026) 3756-3763

Abstract

Recent advancements in virtual reality and robotic teleoperation have greatly increased the variety of haptic information that must be conveyed to users. While existing haptic devices typically provide unimodal feedback to enhance situational awareness, a gap remains in their ability to deliver rich, multimodal sensory feedback encompassing force, pressure, and thermal sensations. To address this limitation, we present the Multimodal Feedback Exoskeleton (MFE), a hand exoskeleton designed to deliver hybrid haptic feedback. The MFE features 20 degrees of freedom for capturing hand pose. For force feedback, it employs an active mechanism capable of generating 3.5-8.1 N of pushing and pulling forces at the fingers' resting pose, enabling realistic interaction with deformable objects. The fingertips are equipped with flat actuators based on the electro-osmotic principle, providing pressure and vibration stimuli and achieving up to 2.47 kPa of contact pressure to render tactile sensations. For thermal feedback, the MFE integrates thermoelectric heat pumps capable of rendering temperatures from 10 to 55 degrees Celsius. We validated the MFE by integrating it into a robotic teleoperation system using the X-Arm 6 and Inspire Hand manipulator. In user studies, participants successfully recognized and manipulated deformable objects and differentiated remote objects with varying temperatures. These results demonstrate that the MFE enhances situational awareness, as well as the usability and transparency of robotic teleoperation systems.

2604.02819 2026-04-06 cs.CL

Student-in-the-Loop Chain-of-Thought Distillation via Generation-Time Selection

Chaoqun He, Yingfa Chen, Chaojun Xiao, Xu Han, Lijie Wen

Comments 17 pages, 6 figures

2604.02817 2026-04-06 cs.CV

MMPhysVideo: Scaling Physical Plausibility in Video Generation via Joint Multimodal Modeling

Shubo Lin, Xuanyang Zhang, Wei Cheng, Weiming Hu, Gang Yu, Jin Gao

Comments Project Page: https://shubolin028.github.io/MMPhysVideo-Page

2604.02816 2026-04-06 cs.CV cs.AI

QAPruner: Quantization-Aware Vision Token Pruning for Multimodal Large Language Models

Xinhao Wang, Zhonyu Xia, Zhiwei Lin, Zhe Li, Yongtao Wang

Comments 12 pages

2604.02808 2026-04-06 cs.CV

CMCC-ReID: Cross-Modality Clothing-Change Person Re-Identification

Haoxuan Xu, Hanzi Wang, Guanglin Niu

2604.02804 2026-04-06 cs.CV cs.AI cs.MM

PaveBench: A Versatile Benchmark for Pavement Distress Perception and Interactive Vision-Language Analysis

Dexiang Li, Zhenning Che, Haijun Zhang, Dongliang Zhou, Zhao Zhang, Yahong Han

2604.02799 2026-04-06 cs.CV

UNICA: A Unified Neural Framework for Controllable 3D Avatars

Jiahe Zhu, Xinyao Wang, Yiyu Zhuang, Yanwen Wang, Jing Tian, Yao Yao, Hao Zhu

Comments Opensource code: https://github.com/zjh21/UNICA

2604.02795 2026-04-06 cs.CL cs.AI

Rubrics to Tokens: Bridging Response-level Rubrics and Token-level Rewards in Instruction Following Tasks

Tianze Xu, Yanzhao Zheng, Pengrui Lu, Lyumanshan Ye, Yong Wu, Zhentao Zhang, Yuanqiang Yu, Chao Ma, Jihuai Zhu, Pengfei Liu, Baohua Dong, Hangcheng Zhu, Ruohui Huang, Gang Yu

2604.02794 2026-04-06 cs.AI

CharTool: Tool-Integrated Visual Reasoning for Chart Understanding

Situo Zhang, Yifan Zhang, Zichen Zhu, Da Ma, Lei Pan, Danyang Zhang, Zihan Zhao, Lu Chen, Kai Yu

2604.02788 2026-04-06 cs.LG

Structure-Aware Commitment Reduction for Network-Constrained Unit Commitment with Solver-Preserving Guarantees

Guangwen Wang, Jiaqi Wu, Yang Weng, Baosen Zhang

Comments 10 pages

2604.02787 2026-04-06 cs.CV cs.AI

LumaFlux: Lifting 8-Bit Worlds to HDR Reality with Physically-Guided Diffusion Transformers

Shreshth Saini, Hakan Gedik, Neil Birkbeck, Yilin Wang, Balu Adsumilli, Alan C. Bovik

2604.02786 2026-04-06 cs.RO

QuadAgent: A Responsive Agent System for Vision-Language Guided Quadrotor Agile Flight

Ao Zhuang, Feng Yu, Tianbao Zhang, Linzuo Zhang, Danping Zou

2604.02785 2026-04-06 cs.CV

CANDLE: Illumination-Invariant Semantic Priors for Color Ambient Lighting Normalization

Rong-Lin Jian, Ting-Yao Chen, Yu-Fan Lin, Chia-Ming Lee, Fu-En Yang, Yu-Chiang Frank Wang, Chih-Chung Hsu

Comments CVPRW 2026 Camera Ready; NTIRE 2026 Ambient Lighting Normalization (2nd & 3rd in Color & White Light Track)

2604.02780 2026-04-06 cs.CV

A Unified Perspective on Adversarial Membership Manipulation in Vision Models

Ruize Gao, Kaiwen Zhou, Yongqiang Chen, Feng Liu

Comments Accepted by CVPR 2026

2604.02778 2026-04-06 cs.CL

When Modalities Remember: Continual Learning for Multimodal Knowledge Graphs

Linyu Li, Zhi Jin, Yichi Zhang, Dongming Jin, Yuanpeng He, Haoran Duan, Gadeng Luosang, Nyima Tashi

2604.02773 2026-04-06 cs.CV

Generalized Small Object Detection: A Point-Prompted Paradigm and Benchmark

Haoran Zhu, Wen Yang, Guangyou Yang, Chang Xu, Ruixiang Zhang, Fang Xu, Haijian Zhang, Gui-Song Xia

2604.02772 2026-04-06 cs.CL

Multiple-Debias: A Full-process Debiasing Method for Multilingual Pre-trained Language Models

Haoyu Liang, Peijian Zeng, Wentao Huang, Aimin Yang, Dong Zhou

2604.02770 2026-04-06 cs.AI

Improving Role Consistency in Multi-Agent Collaboration via Quantitative Role Clarity

Guoling Zhou, Wenpei Han, Fengqin Yang, Li Wang, Yingcong Zhou, Zhiguo Fu