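arXivDaily daily academic digest: synchronized with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.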

2504.16315 2026-04-20 cs.CV cs.CL

SignX: Continuous Sign Recognition in Compact Pose-Rich Latent Space

Sen Fang, Yalin Feng, Chunyu Sui, Hongbin Zhong, Yanxin Zhang, Hongwei Yi, Hezhen Hu, Dimitris N. Metaxas

Comments 33 pages, CSLR SOTA (2026). More demo at https://signerx.github.io/SignX/

2504.15214 2026-04-20 cs.LG cs.SD

Histogram-based Parameter-efficient Tuning for Passive and Active Sonar Classification

Amirmohammad Mohammadi, Davelle Carreiro, Alexandra Van Dine, Joshua Peeples

Comments 5 pages, 3 figures. This work has been accepted to IEEE IGARSS 2026

2504.12482 2026-04-20 cs.AI

Agentic AI Optimisation (AAIO): what it is, how it works, why it matters, and how to deal with it

Luciano Floridi, Carlotta Buttaboni, Nicolas Gertler, Emmie Hine, Jessica Morley, Claudio Novelli, Tyler Schroder

2504.10466 2026-04-20 cs.CV

Art3D: Training-Free 3D Generation from Flat-Colored Illustration

Xiaoyan Cong, Jiayi Shen, Zekun Li, Rao Fu, Tao Lu, Srinath Sridhar

Comments Technical Report. Project Page: https://joy-jy11.github.io/

2504.09792 2026-04-20 cs.LG

A Tale of Two Learning Algorithms: Multiple Stream Random Walk and Asynchronous Gossip

Peyman Gholami, Hulya Seferoglu

2504.06355 2026-04-20 cs.LG

An Information-Geometric Approach to Artificial Curiosity

Alexander Nedergaard, Pablo A. Morales

Comments 24 pages, 2 figures; version accepted for publication at AISTATS 2026

2504.03621 2026-04-20 cs.CV

PILOT: A Promptable Interleaved Layout-aware OCR Transformer

Laziz Hamdi, Amine Tamasna, Pascal Boisson, Thierry Paquet

2504.01137 2026-04-20 cs.CL

Follow the Flow: On Information Flow Across Textual Tokens in Text-to-Image Models

Guy Kaplan, Michael Toker, Yuval Reif, Yonatan Belinkov, Roy Schwartz

Comments Accepted to ACL 2026

2504.00966 2026-04-20 cs.RO cs.SY eess.SY

Time-optimal Convexified Reeds-Shepp Paths on a Sphere

Sixu Li, Deepak Prakash Kumar, Swaroop Darbha, Yang Zhou

2503.22171 2026-04-20 cs.CV

An Empirical Study of Validating Synthetic Data for Text-Based Person Retrieval

Min Cao, Yuxin Lu, Ziyin Zeng, Dong Yi, Jinqiao Wang, Mang Ye

Comments 20 pages, 13 figures

2503.07520 2026-04-20 cs.CV cs.IR

From Limited Labels to Open Domains: An Efficient Learning Method for Drone-view Geo-Localization

Zhongwei Chen, Zhao-Xu Yang, Hai-Jun Rong, Jiawei Lang, Guoqi Li

Comments Accepted by IEEE Transactions on Multimedia 2026

Details

English abstract

Traditional supervised drone-view geo-localization (DVGL) methods depend heavily on paired training data and have difficulty learning cross-view correlations from unpaired data. Moreover, when deployed in a new domain, these methods require new paired data and subsequent retraining for model adaptation, which significantly increases computational overhead. Existing unsupervised methods generate pseudo-labels based on cross-view similarity to infer pairing relationships. However, geographical similarity and spatial continuity often produce visually similar features at different geographical locations. This feature confusion compromises the reliability of pseudo-label generation, and incorrect pseudo-labels drive negative optimization. Given these challenges inherent in both supervised and unsupervised DVGL methods, we propose a novel cross-domain invariant knowledge transfer network (CDIKTNet) with limited supervision, whose architecture consists of a cross-domain invariance sub-network (CDIS) and a cross-domain transfer sub-network (CDTS). This architecture forms a closed-loop framework for invariant feature learning and knowledge transfer. The CDIS is designed to learn cross-view structural and spatial invariance from a small amount of paired data that serves as prior knowledge. It endows the shared feature space of unpaired data with similar implicit cross-view correlations at initialization, which alleviates feature confusion. Based on this, the CDTS employs dual-path contrastive learning to further optimize each subspace while preserving consistency in a shared feature space. Extensive experiments demonstrate that CDIKTNet achieves state-of-the-art performance under full supervision compared with supervised methods, and further surpasses existing unsupervised methods in both few-shot and cross-domain initialization.

2503.05578 2026-04-20 cs.CV cs.RO

Scalable Unseen Objects 6-DoF Absolute Pose Estimation with Robotic Integration

Jian Liu, Wei Sun, Kai Zeng, Jin Zheng, Hui Yang, Hossein Rahmani, Ajmal Mian, Lin Wang

Comments Accepted by TRO 2026, 18 pages, 9 figures

Details

Journal ref: IEEE Transactions on Robotics, 2026

English abstract

Pose-estimation-guided 6-DoF robotic manipulation of unseen objects is a key task in robotics. However, the scalability of current pose estimation methods to unseen objects remains a fundamental challenge, as they generally rely on CAD models or dense reference views of unseen objects, which are difficult to acquire and ultimately limit their scalability. In this paper, we introduce a novel task setup, referred to as SinRef-6D, which addresses 6-DoF absolute pose estimation for unseen objects using only a single pose-labeled reference RGB-D image captured during robotic manipulation. This setup is more scalable yet technically nontrivial due to large pose discrepancies and the limited geometric and spatial information contained in a single view. To address these issues, our key idea is to iteratively establish point-wise alignment in a common coordinate system with state space models (SSMs) as backbones. Specifically, to handle large pose discrepancies, we introduce an iterative object-space point-wise alignment strategy. Then, Point and RGB SSMs are proposed to capture long-range spatial dependencies from a single view, offering superior spatial modeling capability with linear complexity. Once pre-trained on synthetic data, SinRef-6D can estimate the 6-DoF absolute pose of an unseen object using only a single reference view. With the estimated pose, we further develop a hardware-software robotic system and integrate the proposed SinRef-6D into it in real-world settings. Extensive experiments on six benchmarks and in diverse real-world scenarios demonstrate that our SinRef-6D offers superior scalability. Additional robotic grasping experiments further validate the effectiveness of the developed robotic system. The code and robotic demos are available at https://paperreview99.github.io/SinRef-6DoF-Robotic.

2503.03509 2026-04-20 cs.RO

Sampling-Based Multi-Modal Multi-Robot Multi-Goal Path Planning

Valentin N. Hartmann, Tirza Heinle, Yijiang Huang, Stelian Coros

Comments 25 pages, 17 figures

2503.00214 2026-04-20 cs.RO

Tendon-driven Grasper Design for Aerial Robot Perching on Tree Branches

Haichuan Li, Ziang Zhao, Ziniu Wu, Parth Potdar, Long Tran, Ali Tahir Karasahin, Shane Windsor, Stephen G. Burrow, Basaran Bahadir Kocer

Comments 7 pages, 9 figures

2502.20689 2026-04-20 cs.AI cs.CL

WiseMind: a knowledge-guided multi-agent framework for accurate and empathetic psychiatric diagnosis

Yuqi Wu, Guangya Wan, Jingjing Li, Shengming Zhao, Lingfeng Ma, Tianyi Ye, Ion Pop, Yanbo Zhang, Jie Chen

Comments Accepted at npj Digital Medicine (2026)

Details

DOI: 10.1038/s41746-026-02559-9

English abstract

Large Language Models (LLMs) offer promising opportunities to support mental healthcare workflows, yet they often lack the structured clinical reasoning needed for reliable diagnosis and may struggle to provide the emotionally attuned communication essential for patient trust. Here, we introduce WiseMind, a novel multi-agent framework inspired by the theory of Dialectical Behavior Therapy and designed to facilitate psychiatric assessment. By integrating a "Reasonable Mind" Agent for evidence-based logic and an "Emotional Mind" Agent for empathetic communication, WiseMind effectively bridges the gap between instrumental accuracy and humanistic care. Our framework utilizes a Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5)-guided Structured Knowledge Graph to steer diagnostic inquiries, significantly reducing hallucinations compared to standard prompting methods. Using a combination of virtual standard patients, simulated interactions, and real human interaction datasets, we evaluate WiseMind across three common psychiatric conditions. WiseMind outperforms state-of-the-art LLM methods in both identifying critical diagnostic nodes and establishing accurate differential diagnoses. Across 1206 simulated conversations and 180 real user sessions, the system achieves 85.6% top-1 diagnostic accuracy, approaching reported diagnostic performance ranges of board-certified psychiatrists and surpassing knowledge-enhanced single-agent baselines by 15-54 percentage points. Expert review by psychiatrists further validates that WiseMind generates responses that are not only clinically sound but also psychologically supportive, demonstrating the feasibility of empathetic, reliable AI agents to conduct psychiatric assessments under appropriate human oversight.

2502.20503 2026-04-20 cs.CL

Protecting multimodal large language models against misleading visualizations

Jonathan Tonglet, Tinne Tuytelaars, Marie-Francine Moens, Iryna Gurevych

Comments Preprint. Code and data available at https://github.com/UKPLab/arxiv2025-misleading-visualizations

2502.19312 2026-04-20 cs.LG cs.AI cs.CL cs.HC stat.ML

FSPO: Few-Shot Optimization of Synthetic Preferences Personalizes to Real Users

Anikait Singh, Sheryl Hsu, Kyle Hsu, Eric Mitchell, Stefano Ermon, Tatsunori Hashimoto, Archit Sharma, Chelsea Finn

Comments Website: https://fewshot-preference-optimization.github.io/

2501.05281 2026-04-20 cs.CV cs.LG

Comparison Study: Glacier Calving Front Delineation in Synthetic Aperture Radar Images With Deep Learning

Nora Gourmelon, Konrad Heidler, Erik Loebel, Daniel Cheng, Julian Klink, Anda Dong, Fei Wu, Noah Maul, Moritz Koch, Marcel Dreier, Dakota Pyles, Thorsten Seehaus, Matthias Braun, Andreas Maier, Vincent Christlein

Comments Accepted as short paper in IEEE Transactions on Pattern Analysis and Machine Intelligence

2412.04300 2026-04-20 cs.CV cs.AI

T2I-FactualBench: Benchmarking the Factuality of Text-to-Image Models with Knowledge-Intensive Concepts

Ziwei Huang, Wanggui He, Quanyu Long, Yandi Wang, Haoyuan Li, Zhelun Yu, Fangxun Shu, Long Chan, Hao Jiang, Fei Wu, Leilei Gan

2411.18328 2026-04-20 cs.CV

EventCrab: Harnessing Frame and Point Synergy for Event-based Action Recognition and Beyond

Meiqi Cao, Xiangbo Shu, Jiachao Zhang, Rui Yan, Zechao Li, Jinhui Tang

Comments The experiments in this paper are not comprehensive enough to make the conclusions convincing. The authors are adding more experimental scenarios and will resubmit after completion

2411.12502 2026-04-20 cs.LG cs.AI stat.ML

Transformer Neural Processes - Kernel Regression

Daniel Jenson, Jhonathan Navott, Mengyan Zhang, Makkunda Sharma, Elizaveta Semenova, Seth Flaxman

Comments This was superseded by 'Scalable Spatiotemporal Inference with Biased Scan Attention Transformer Neural Processes' (arXiv:2506.09163)

2411.10446 2026-04-20 cs.RO cs.AI

VeriGraph: Scene Graphs for Execution Verifiable Robot Planning

Daniel Ekpo, Mara Levy, Saksham Suri, Chuong Huynh, Archana Swaminathan, Abhinav Shrivastava

Comments Accepted to ICRA 2026. Project website: https://verigraph-agent.github.io

2410.13149 2026-04-20 cs.RO

Power in Numbers: Primitive Algorithm for Swarm Robot Navigation in Unknown Environments

Yusuke Tsunoda, Shoken Otsuka, Kazuki Ito, Runze Xiao, Keisuke Naniwa, Yuichiro Sueoka, Koichi Osuka

Comments 11 pages, 22 figures

2406.15809 2026-04-20 cs.CL cs.LG

LaMSUM: Amplifying Voices Against Harassment through LLM Guided Extractive Summarization of User Incident Reports

Garima Chhikara, Anurag Sharma, V. Gurucharan, Kripabandhu Ghosh, Abhijnan Chakraborty

Comments Accepted at ICWSM 2026

2305.10947 2026-04-20 cs.LG cs.AI cs.CV cs.PF

Revisiting 16-bit Neural Network Training: A Practical Approach for Resource-Limited Learning

Juyoung Yun, Sol Choi, Francois Rameau, Byungkon Kang, Zhoulai Fu

2604.16239 2026-04-20 stat.ML cs.LG

Adaptive multi-fidelity optimization with fast learning rates

Come Fiegel, Victor Gabillon, Michal Valko

Comments Published at International Conference on Artificial Intelligence and Statistics (AISTATS) 2020

2604.16224 2026-04-20 cs.HC cs.AI cs.CY

"Taking Stock at FAccT": Using Participatory Design to Co-Create a Vision for the Fairness, Accountability and Transparency Community

Shiran Dudy, Jan Simson, Yanan Long

Comments Accepted at FAccT 2026, 27 pages, 9 figures

2604.16205 2026-04-20 cond-mat.mtrl-sci cs.AI physics.chem-ph

ChemGraph-XANES: An Agentic Framework for XANES Simulation and Analysis

Vitor F. Grizzi, Thang Duc Pham, Luke N. Pretzie, Jiayi Xu, Murat Keceli, Cong Liu

2604.16116 2026-04-20 cs.ET cs.AI cs.CY

The Relic Condition: When Published Scholarship Becomes Material for Its Own Replacement

Lin Deng, Chang-bo Liu

Details

English abstract

We extracted the scholarly reasoning systems of two internationally prominent humanities and social science scholars from their published corpora alone, converted those systems into structured inference-time constraints for a large language model, and tested whether the resulting scholar-bots could perform core academic functions at expert-assessed quality. The distillation pipeline used an eight-layer extraction method and a nine-module skill architecture grounded in local, closed-corpus analysis. The scholar-bots were then deployed across doctoral supervision, peer review, lecturing and panel-style academic exchange. Expert assessment involved three senior academics producing reports and appointment-level syntheses. Across the preserved expert record, all review and supervision reports judged the outputs benchmark-attaining, appointment-level recommendations placed both bots at or above Senior Lecturer level in the Australian university system, and recovered panel scores placed Scholar A between 7.9 and 8.9/10 and Scholar B between 8.5 and 8.9/10 under multi-turn debate conditions. A research-degree-student survey showed high performance ratings across information reliability, theoretical depth and logical rigor, with pronounced ceiling effects on a 7-point scale, despite all participants already being frontier-model users. We term this the Relic condition: when publication systems make stable reasoning architectures legible, extractable and cheaply deployable, the public record of intellectual labor becomes raw material for its own functional replacement. Because the technical threshold for this transition is already crossed at modest engineering effort, we argue that the window for protective frameworks covering disclosure, consent, compensation and deployment restriction is the present, while deployment remains optional rather than infrastructural.

2604.16104 2026-04-20 eess.IV cs.AI cs.CV

Dual-Modal Lung Cancer AI: Interpretable Radiology and Microscopy with Clinical Risk Integration

Baramee Sukumal, Aueaphum Aueawatthanaphisut

Comments 16 pages, 6 figures, 3 tables, 8 equations