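arXivDaily: a daily academic digest synchronized with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.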

2604.19267 2026-04-22 cs.RO

Multimodal embodiment-aware navigation transformer

Louis Dezons, Quentin Picard, Rémi Marsal, François Goulette, David Filliat

Comments 8 pages, 7 figures

English abstract

Goal-conditioned navigation models for ground robots trained using supervised learning show promising zero-shot transfer, but their collision-avoidance capability nevertheless degrades under distribution shift, i.e., changes in environment, robot, or sensor configuration. We propose ViLiNT, a multimodal, attention-based policy for goal navigation, trained on heterogeneous data from multiple platforms and environments, which improves robustness with two key features. First, we fuse RGB images, 3D LiDAR point clouds, a goal embedding, and a robot's embodiment descriptor with a transformer architecture to capture complementary geometry and appearance cues. The transformer's output is used to condition a diffusion model that generates navigable trajectories. Second, using automatically generated offline labels, we train a path clearance prediction head for scoring and ranking trajectories produced by the diffusion model. Both the diffusion conditioning and the trajectory ranking head depend on a robot's embodiment token, which allows our model to generate and select trajectories with respect to the robot's dimensions. Across three simulated environments, ViLiNT improves Success Rate on average by 166% over an equivalent state-of-the-art vision-only baseline (NoMaD). This increase in performance is confirmed through real-world deployments of a rover navigating in obstacle fields. These results highlight that combining multimodal fusion with our collision prediction mechanism leads to improved off-road navigation robustness.

2604.19264 2026-04-22 cs.CV

DR-MMSearchAgent: Deepening Reasoning in Multimodal Search Agents

Shengqin Wang, Wentao Yan, Huichi Zhou, Yihang Chen, Kun Shao, Zhizhong Zhang, Yuan Xie

2604.19262 2026-04-22 cs.CL cs.AI

CulturALL: Benchmarking Multilingual and Multicultural Competence of LLMs on Grounded Tasks

Peiqin Lin, Chenyang Lyu, Wenjiang Luo, Haotian Ye, Md Mehrab Hossain, Chunlan Ma, Shaoxiong Ji, Younes Samih, Bo Zeng, Fan Jiang, Yuanbin Cao, Dilda Duisenbek, Adrian Neo Sau Xun, Daria Pozdniakova, Liubou Misevich, Nevena Marinković, Ngoc Gia Linh Nguyen, Thi Khanh Linh Do, Sarakmatak Sophy, Baotian Hu, Guanhua Chen, Gongbo Tang, Alham Fikri Aji, Longyue Wang, Weihua Luo

2604.19261 2026-04-22 cs.CL

Towards a Linguistic Evaluation of Narratives: A Quantitative Stylistic Framework

Alessandro Maisto

Comments 9th International Workshop on Computational Models of Narrative (CMN '26), 8-11 June 2026, Madrid. 15 pages

2604.19259 2026-04-22 cs.CV

Feature Perturbation Pool-based Fusion Network for Unified Multi-Class Industrial Defect Detection

Yuanchan Xu, Wenjun Zang, Ying Wu

2604.19257 2026-04-22 cs.CV

Unposed-to-3D: Learning Simulation-Ready Vehicles from Real-World Images

Hongyuan Liu, Bochao Zou, Qiankun Liu, Haochen Yu, Qi Mei, Jianfei Jiang, Chen Liu, Cheng Bi, Zhao Wang, Xueyang Zhang, Yifei Zhan, Jiansheng Chen, Huimin Ma

Comments Accepted by CVPR 2026

2604.19254 2026-04-22 cs.CL cs.AI

ShadowPEFT: Shadow Network for Parameter-Efficient Fine-Tuning

Xianming Li, Zongxi Li, Tsz-fung Andrew Lee, Jing Li, Haoran Xie, Qing Li

2604.19240 2026-04-22 cs.AI

Industrial Surface Defect Detection via Diffusion Generation and Asymmetric Student-Teacher Network

Shuo Feng, Runlin Zhou, Yuyang Li, Guangcan Liu

2604.19238 2026-04-22 cs.CV

Allo{SR}$^2$: Rectifying One-Step Super-Resolution to Stay Real via Allomorphic Generative Flows

Zihan Wang, Xudong Huang, Junbo Qiao, Wei Li, Jie Hu, Xinghao Chen, Shaohui Lin

2604.19233 2026-04-22 cs.CV

Adaptive Slicing-Assisted Hyper Inference for Enhanced Small Object Detection in High-Resolution Imagery

Francesco Moretti, Yi Jin, Guiqin Mario

English abstract

Deep learning-based object detectors have achieved remarkable success across numerous computer vision applications, yet they continue to struggle with small object detection in high-resolution aerial and satellite imagery, where dense object distributions, variable shooting angles, diminutive target sizes, and substantial inter-class variability pose formidable challenges. Existing slicing strategies that partition high-resolution images into manageable patches have demonstrated promising results for enlarging the effective receptive field of small targets; however, their reliance on fixed slice dimensions introduces significant redundant computation, inflating inference cost and undermining detection speed. In this paper, we propose Adaptive Slicing-Assisted Hyper Inference (ASAHI), a novel slicing framework that shifts the paradigm from prescribing a fixed slice size to adaptively determining the optimal number of slices according to image resolution, thereby substantially mitigating redundant computation while preserving beneficial overlap between adjacent patches. ASAHI integrates three synergistic components: (1) an adaptive resolution-aware slicing algorithm that dynamically generates 6 or 12 overlapping patches based on a learned threshold, (2) a slicing-assisted fine-tuning (SAF) strategy that constructs augmented training data comprising both full-resolution and sliced image patches, and (3) a Cluster-DIoU-NMS (CDN) post-processing module that combines the geometric merging efficiency of Cluster-NMS with the center-distance-aware suppression of DIoU-NMS to achieve robust duplicate elimination in crowded scenes. Extensive experiments on VisDrone2019 and xView demonstrate that ASAHI achieves state-of-the-art performance with 56.8% on VisDrone2019-DET-val and 22.7% on xView-test, while reducing inference time by 20-25% compared to the baseline SAHI method.
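
The adaptive slicing idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `adaptive_slices`, the threshold value of 1920 pixels, the overlap ratio of 0.15, and the 3x2 and 4x3 grid layouts are all assumptions chosen for illustration. The abstract states only that 6 or 12 overlapping patches are generated according to a learned threshold.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


def _starts_and_size(length: int, parts: int, overlap: float) -> Tuple[List[int], int]:
    """Split `length` pixels into `parts` equal windows overlapping by ratio `overlap`."""
    if parts == 1:
        return [0], length
    # Window size w solves: w + (parts - 1) * w * (1 - overlap) = length
    size = min(length, int(round(length / (1 + (parts - 1) * (1 - overlap)))))
    # Spread the window starts evenly so the last window ends exactly at `length`
    step = (length - size) / (parts - 1)
    return [int(round(i * step)) for i in range(parts)], size


def adaptive_slices(width: int, height: int,
                    threshold: int = 1920,      # hypothetical; learned in the paper
                    overlap: float = 0.15) -> List[Box]:  # hypothetical ratio
    """Return 6 or 12 overlapping patch boxes depending on image resolution."""
    # Assumption: the decision compares the longer image side to the threshold
    n_slices = 6 if max(width, height) <= threshold else 12
    # Assumption: 6 patches form a 3x2 grid, 12 patches a 4x3 grid,
    # with the larger count placed along the longer image side
    long_parts, short_parts = (3, 2) if n_slices == 6 else (4, 3)
    cols, rows = (long_parts, short_parts) if width >= height else (short_parts, long_parts)
    xs, w = _starts_and_size(width, cols, overlap)
    ys, h = _starts_and_size(height, rows, overlap)
    return [(x, y, x + w, y + h) for y in ys for x in xs]
```

Under these assumptions, `adaptive_slices(1920, 1080)` yields six boxes and `adaptive_slices(4000, 3000)` yields twelve. The patch size therefore scales with the image instead of being fixed in advance, which is the shift the abstract describes.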

2604.19218 2026-04-22 cs.CV

Thinking Before Matching: A Reinforcement Reasoning Paradigm Towards General Person Re-Identification

Quan Zhang, Jingze Wu, Jialong Wang, Xiaohua Xie, Jianhuang Lai, Hongbo Chen

Comments 10 pages

2604.19217 2026-04-22 cs.CV cs.AI

Attention-based Multi-modal Deep Learning Model of Spatio-temporal Crop Yield Prediction with Satellite, Soil and Climate Data

Gopal Krishna Shyam, Ila Chandrakar

Comments 6 pages, 2 figures

2604.19216 2026-04-22 cs.CV

An Object-Centered Data Acquisition Method for 3D Gaussian Splatting using Mobile Phones

Yuezhe Zhang, Luqian Bai, Mengting Yu, Lei Wei, Shuai Wan, Yifan Zhang

2604.19212 2026-04-22 cs.LG cs.LO

The Logical Expressiveness of Topological Neural Networks

Amirreza Akbari, Amauri H. Souza, Vikas Garg

Comments 39 pages, Published at the 14th International Conference on Learning Representations (ICLR 2026)

2604.19211 2026-04-22 cs.AI

ClawNet: Human-Symbiotic Agent Network for Cross-User Autonomous Cooperation

Zhiqin Yang, Zhenyuan Zhang, Xianzhang Jia, Jun Song, Wei Xue, Yonggang Zhang, Yike Guo

Comments 13 pages

2604.19209 2026-04-22 cs.SD

Audio Spoof Detection with GaborNet

Waldek Maciejko

Comments Industrial conference materials

2604.19206 2026-04-22 cs.CV

When Can We Trust Deep Neural Networks? Towards Reliable Industrial Deployment with an Interpretability Guide

Hang-Cheng Dong, Yuhao Jiang, Yibo Jiao, Lu Zou, Kai Zheng, Bingguo Liu, Dong Ye, Guodong Liu

2604.19196 2026-04-22 cs.CV

Benchmarking Vision Foundation Models for Domain-Generalizable Face Anti-Spoofing

Mika Feng, Pierre Gallin-Martel, Koichi Ito, Takafumi Aoki

Comments 2026 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

2604.19193 2026-04-22 cs.CV

How Far Are Video Models from True Multimodal Reasoning?

Xiaotian Zhang, Jianhui Wei, Yuan Wang, Jie Tan, Yichen Li, Yan Zhang, Ziyi Chen, Daoan Zhang, Dezhi YU, Wei Xu, Songtao Jiang, Zuozhu Liu

2604.19191 2026-04-22 cs.CV cs.AI

Improved Anomaly Detection in Medical Images via Mean Shift Density Enhancement

Pritam Kar, Gouri Lakshmi S, Saptarshi Bej

2604.19189 2026-04-22 cs.CL

Headlines You Won't Forget: Can Pronoun Insertion Increase Memorability?

Selina Meyer, Magdalena Abel, Michael Roth

Comments To be published at the 15th edition of the Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2026)

2604.19186 2026-04-22 cs.LG cs.AI

Inductive Subgraphs as Shortcuts: Causal Disentanglement for Heterophilic Graph Learning

Xiangmeng Wang, Qian Li, Haiyang Xia, Hao Miao, Qing Li, Guandong Xu

Comments SIGIR 2026

2604.19185 2026-04-22 cs.CL cs.AI

SCURank: Ranking Multiple Candidate Summaries with Summary Content Units for Enhanced Summarization

Bo-Jyun Wang, Ying-Jia Lin, Hung-Yu Kao

Comments Accepted by ACL 2026 Findings

2604.19172 2026-04-22 cs.AI

Reasoning-Aware AIGC Detection via Alignment and Reinforcement

Zhao Wang, Max Xiong, Jianxun Lian, Zhicheng Dou

2604.19171 2026-04-22 cs.LG

FOCAL-Attention for Heterogeneous Multi-Label Prediction

Chenghao Zhang, Qingqing Long, Ludi Wang, Wenjuan Cui, Jianjun Yu, Yi Du

Comments 24 pages, 4 figures

2604.19167 2026-04-22 cs.LG cs.AI

LBLLM: Lightweight Binarization of Large Language Models via Three-Stage Distillation

Siqing Song, Chuang Wang, Yong Lang, Yi Yang, Xu-Yao Zhang

2604.19162 2026-04-22 cs.CL stat.AP

Mind the Unseen Mass: Unmasking LLM Hallucinations via Soft-Hybrid Alphabet Estimation

Hongxing Pan, Yingying Guo, Wenqing Kuang, Jiashi Lu

Comments 7 pages, 1 figure, 3 tables

2604.19159 2026-04-22 cs.CV cs.LG

MSDS: Deep Structural Similarity with Multiscale Representation

Danling Kang, Xue-Hua Chen, Bin Liu, Keke Zhang, Weiling Chen, Tiesong Zhao

2604.19157 2026-04-22 cs.LG

SAW-INT4: System-Aware 4-Bit KV-Cache Quantization for Real-World LLM Serving

Jinda Jia, Jisen Li, Zhongzhu Zhou, Jung Hwan Heo, Jue Wang, Tri Dao, Shuaiwen Leon Song, Ben Athiwaratkun, Chenfeng Xu, Tianyi Zhang, Xiaoxia Wu

2604.19149 2026-04-22 cs.CL cs.AI

How Do Answer Tokens Read Reasoning Traces? Self-Reading Patterns in Thinking LLMs for Quantitative Reasoning

Haoyang Chen, Yi Liu, Jianzhi Shao, Tao Zhang, Chengfu Huo, Wei Hu

Comments Accepted in the Findings of ACL 2026