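arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and other fields.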

2604.20800 2026-04-23 cs.CV cs.LG

LEXIS: LatEnt ProXimal Interaction Signatures for 3D HOI from an Image

Dimitrije Antić, Alvaro Budria, George Paschalidis, Sai Kumar Dwivedi, Dimitrios Tzionas

Comments 26 pages, 11 figures, 4 tables. Project page: https://anticdimi.github.io/lexis

Details

Abstract

Reconstructing 3D Human-Object Interaction from an RGB image is essential for perceptive systems. Yet, this remains challenging as it requires capturing the subtle physical coupling between the body and objects. While current methods rely on sparse, binary contact cues, these fail to model the continuous proximity and dense spatial relationships that characterize natural interactions. We address this limitation via InterFields, a representation that encodes dense, continuous proximity across the entire body and object surfaces. However, inferring these fields from single images is inherently ill-posed. To tackle this, our intuition is that interaction patterns are characteristically structured by the action and object geometry. We capture this structure in LEXIS, a novel discrete manifold of interaction signatures learned via a VQ-VAE. We then develop LEXIS-Flow, a diffusion framework that leverages LEXIS signatures to estimate human and object meshes alongside their InterFields. Notably, these InterFields help in a guided refinement that ensures physically-plausible, proximity-aware reconstructions without requiring post-hoc optimization. Evaluation on Open3DHOI and BEHAVE shows that LEXIS-Flow significantly outperforms existing SotA baselines in reconstruction, contact, and proximity quality. Our approach not only improves generalization but also yields reconstructions perceived as more realistic, moving us closer to holistic 3D scene understanding. Code & models will be public at https://anticdimi.github.io/lexis.
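
The abstract says the LEXIS signatures are learned with a VQ-VAE. As a rough illustration of the quantization step that any VQ-VAE relies on, the sketch below maps a continuous latent vector to its nearest codebook entry. It is not the authors' implementation: the function names, the 2-D codebook, and all numeric values are assumptions chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latent: Sequence[float],
             codebook: List[Sequence[float]]) -> Tuple[int, Sequence[float]]:
    """Return the index and value of the codebook entry closest to `latent`."""
    index = min(range(len(codebook)),
                key=lambda i: squared_distance(latent, codebook[i]))
    return index, codebook[index]


# Hypothetical 2-D codebook with three entries (illustrative values only).
codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, 2.0)]
index, code = quantize((0.9, 1.2), codebook)
print(index, code)
```

The discrete index is what makes the learned space a finite set of signatures; a full VQ-VAE would also train the encoder, decoder, and codebook jointly, which this sketch omits.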

2604.20799 2026-04-23 cs.RO

A Hough transform approach to safety-aware scalar field mapping using Gaussian Processes

Muzaffar Qureshi, Trivikram Satharasi, Tochukwu E. Ogri, Kyle Volle, Rushikesh Kamalapurkar

2604.20796 2026-04-23 cs.CV

LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model

Inclusion AI, Tiwei Bie, Haoxing Chen, Tieyuan Chen, Zhenglin Cheng, Long Cui, Kai Gan, Zhicheng Huang, Zhenzhong Lan, Haoquan Li, Jianguo Li, Tao Lin, Qi Qin, Hongjun Wang, Xiaomei Wang, Haoyuan Wu, Yi Xin, Junbo Zhao

Comments LLaDA2.0-Uni Technical Report

2604.20795 2026-04-23 cs.AI

Automatic Ontology Construction Using LLMs as an External Layer of Memory, Verification, and Planning for Hybrid Intelligent Systems

Pavel Salovskii, Iuliia Gorshkova

Comments Artificial Intelligence; Knowledge Representation and Reasoning; Information Retrieval; Machine Learning

Details

DOI: 10.5281/zenodo.19696042

Abstract

This paper presents a hybrid architecture for intelligent systems in which large language models (LLMs) are extended with an external ontological memory layer. Instead of relying solely on parametric knowledge and vector-based retrieval (RAG), the proposed approach constructs and maintains a structured knowledge graph using RDF/OWL representations, enabling persistent, verifiable, and semantically grounded reasoning. The core contribution is an automated pipeline for ontology construction from heterogeneous data sources, including documents, APIs, and dialogue logs. The system performs entity recognition, relation extraction, normalization, and triple generation, followed by validation using SHACL and OWL constraints, and continuous graph updates. During inference, LLMs operate over a combined context that integrates vector-based retrieval with graph-based reasoning and external tool interaction. Experimental observations on planning tasks, including the Tower of Hanoi benchmark, indicate that ontology augmentation improves performance in multi-step reasoning scenarios compared to baseline LLM systems. In addition, the ontology layer enables formal validation of generated outputs, transforming the system into a generation-verification-correction pipeline. The proposed architecture addresses key limitations of current LLM-based systems, including lack of long-term memory, weak structural understanding, and limited reasoning capabilities. It provides a foundation for building agent-based systems, robotics applications, and enterprise AI solutions that require persistent knowledge, explainability, and reliable decision-making.
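
The generation-verification-correction idea in the abstract can be pictured with a toy example: candidate triples are checked against typing constraints before they are admitted to the graph. The sketch below is a plain-Python stand-in, not SHACL or OWL and not the paper's pipeline. The `CONSTRAINTS` table, the `onPeg` predicate, and the entity names are hypothetical, loosely inspired by the Tower of Hanoi task the abstract mentions.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical constraints: predicate -> (required subject type, required object type).
CONSTRAINTS: Dict[str, Tuple[str, str]] = {
    "onPeg": ("Disk", "Peg"),
}


def validate(triples: List[Triple], types: Dict[str, str]) -> List[str]:
    """Return human-readable violations; an empty list means every triple passed."""
    violations = []
    for subj, pred, obj in triples:
        if pred not in CONSTRAINTS:
            violations.append(f"unknown predicate {pred!r}")
            continue
        subj_type, obj_type = CONSTRAINTS[pred]
        if types.get(subj) != subj_type:
            violations.append(f"{subj!r} must be a {subj_type} for {pred!r}")
        if types.get(obj) != obj_type:
            violations.append(f"{obj!r} must be a {obj_type} for {pred!r}")
    return violations


# Hypothetical entity typing and two candidate outputs from a generator.
entity_types = {"disk1": "Disk", "pegA": "Peg"}
good = [("disk1", "onPeg", "pegA")]
bad = [("pegA", "onPeg", "disk1")]  # subject and object swapped

print(validate(good, entity_types))
print(validate(bad, entity_types))
```

In a loop of this kind, the violation messages would be fed back to the generator as correction hints; a real system would express the constraints as SHACL shapes or OWL axioms rather than a Python dictionary.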

2604.20791 2026-04-23 cs.CL cs.AI

Can "AI" Be a Doctor? A Study of Empathy, Readability, and Alignment in Clinical LLMs

Mariano Barone, Francesco Di Serio, Roberto Moio, Marco Postiglione, Giuseppe Riccio, Antonio Romano, Vincenzo Moscato

2604.20784 2026-04-23 cs.CV

GeoRect4D: Geometry-Compatible Generative Rectification for Dynamic Sparse-View 3D Reconstruction

Zhenlong Wu, Zihan Zheng, Xuanxuan Wang, Qianhe Wang, Hua Yang, Xiaoyun Zhang, Qiang Hu, Wenjun Zhang

2604.20779 2026-04-23 cs.AI cs.CY cs.SE

SWE-chat: Coding Agent Interactions From Real Users in the Wild

Joachim Baumann, Vishakh Padmakumar, Xiang Li, John Yang, Diyi Yang, Sanmi Koyejo

2604.20777 2026-04-23 cs.LG

Efficient Multi-Cohort Inference for Long-Term Effects and Lifetime Value in A/B Testing with User Learning

Dario Simionato, Andrea Tonon, Mingxue Wang, Weiguo Wang, Tong Gui, Xiaoyue Li

2604.20775 2026-04-23 cs.LG

Relative Entropy Estimation in Function Space: Theory and Applications to Trajectory Inference

Chao Wang, Luca Nepote, Giulio Franzese, Pietro Michiardi

2604.20760 2026-04-23 cs.CV

Exploring High-Order Self-Similarity for Video Understanding

Manjin Kim, Heeseung Kwon, Karteek Alahari, Minsu Cho

2604.20755 2026-04-23 cs.AI cs.LG

V-tableR1: Process-Supervised Multimodal Table Reasoning with Critic-Guided Policy Optimization

Yubo Jiang, Yitong An, Xin Yang, Abudukelimu Wuerkaixi, Xuxin Cheng, Fengying Xie, Zhiguo Jiang, Cao Liu, Ke Zeng, Haopeng Zhang

Comments 15 pages, 4 figures, 4 tables

2604.20749 2026-04-23 cs.AI

Where and What: Reasoning Dynamic and Implicit Preferences in Situated Conversational Recommendation

Dongding Lin, Jian Wang, Yongqi Li, Wenjie Li

Comments Accepted by ACL 2026

2604.20748 2026-04-23 cs.CV

Amodal SAM: A Unified Amodal Segmentation Framework with Generalization

Bo Zhang, Zhuotao Tian, Xin Tao, Songlin Tang, Jun Yu, Wenjie Pei

2604.20745 2026-04-23 cs.LG cs.CV

Lifecycle-Aware Federated Continual Learning in Mobile Autonomous Systems

Beining Wu, Jun Huang

Comments Submitted to IEEE

2604.20744 2026-04-23 cs.AI cs.LG cs.RO

AAC: Admissible-by-Architecture Differentiable Landmark Compression for ALT

An T. Le, Vien Ngo

Comments 50 pages, 8 figures, 24 tables, submitted to Transactions on Machine Learning Research

2604.20738 2026-04-23 cs.CL

RespondeoQA: a Benchmark for Bilingual Latin-English Question Answering

Marisa Hudspeth, Patrick J. Burns, Brendan O'Connor

Comments Published in LREC 2026

2604.20736 2026-04-23 cs.LG

F²LP-AP: Fast & Flexible Label Propagation with Adaptive Propagation Kernel

Yutong Shen, Ruizhe Xia, Jingyi Liu, Yinqi Liu

Comments 16 pages, 5 figures

2604.20735 2026-04-23 cs.LG cs.SY eess.SY physics.comp-ph

Fast Bayesian equipment condition monitoring via simulation based inference: applications to heat exchanger health

Peter Collett, Alexander Johannes Stasik, Simone Casolo, Signe Riemer-Sørensen

Comments Submitted, 15 pages, 9 figures, code available on github

2604.20733 2026-04-23 cs.LG

Near-Future Policy Optimization

Chuanyu Qin, Chenxu Yang, Qingyi Si, Naibin Gu, Dingyu Yao, Zheng Lin, Peng Fu, Nan Duan, Jiaqi Wang

Comments Work in progress

2604.20728 2026-04-23 cs.AI cs.SY eess.SY

Interval POMDP Shielding for Imperfect-Perception Agents

William Scarbro, Ravi Mangal

Comments 15 pages, 7 figures

2604.20727 2026-04-23 cs.LG cs.AI

Supplement Generation Training for Enhancing Agentic Task Performance

Young Min Cho, Daniele Bonadiman, Divya Bhargavi, Tamer Alkhouli, Salvatore Romeo, Dongwei Jiang, Khushbu Pahwa, Yubin Ge, Etsuko Ishii, Monica Sunkara, Yi Zhang

Comments Accepted to the Findings of ACL 2026

2604.20723 2026-04-23 cs.LG cs.AI

Tokenised Flow Matching for Hierarchical Simulation Based Inference

Giovanni Charles, Cosmo Santoni, Seth Flaxman, Elizaveta Semenova

Comments 31 pages, 11 figures

2604.20721 2026-04-23 cs.RO

ALAS: Adaptive Long-Horizon Action Synthesis via Async-pathway Stream Disentanglement

Yutong Shen, Hangxu Liu, Lei Zhang, Penghui Liu, Yinqi Liu, Liuxiang Yang, Tongtong Feng

Comments 10 pages, 7 figures. arXiv admin note: substantial text overlap with arXiv:2508.07842

2604.20720 2026-04-23 cs.LG cs.AI cs.CL

COMPASS: COntinual Multilingual PEFT with Adaptive Semantic Sampling

Noah Flynn

2604.20719 2026-04-23 cs.SD cs.AI cs.MM eess.AS

ONOTE: Benchmarking Omnimodal Notation Processing for Expert-level Music Intelligence

Menghe Ma, Siqing Wei, Yuecheng Xing, Yaheng Wang, Fanhong Meng, Peijun Han, Luu Anh Tuan, Haoran Luo

Comments 12 pages, 8 figures

2604.20715 2026-04-23 cs.CV

GeoRelight: Learning Joint Geometrical Relighting and Reconstruction with Flexible Multi-Modal Diffusion Transformers

Yuxuan Xue, Ruofan Liang, Egor Zakharov, Timur Bagautdinov, Chen Cao, Giljoo Nam, Shunsuke Saito, Gerard Pons-Moll, Javier Romero

Comments CVPR 2026 Highlight; Project page: https://yuxuan-xue.com

2604.20714 2026-04-23 cs.AI

Learning to Evolve: A Self-Improving Framework for Multi-Agent Systems via Textual Parameter Graph Optimization

Shan He, Runze Wang, Zhuoyun Du, Huiyu Bai, Zouying Cao, Yu Cheng, Bo Zheng

2604.20712 2026-04-23 cs.RO

Visual-Tactile Peg-in-Hole Assembly Learning from Peg-out-of-Hole Disassembly

Yongqiang Zhao, Xuyang Zhang, Zhuo Chen, Matteo Leonetti, Emmanouil Spyrakos-Papastavridis, Shan Luo

Details

DOI: 10.1109/LRA.2026.3679227
Journal ref: IEEE Robotics and Automation Letters, vol. 11, no. 6, pp. 6712-6719, June 2026

Abstract

Peg-in-hole (PiH) assembly is a fundamental yet challenging robotic manipulation task. While reinforcement learning (RL) has shown promise in tackling such tasks, it requires extensive exploration. In this paper, we propose a novel visual-tactile skill learning framework for the PiH task that leverages its inverse task, i.e., peg-out-of-hole (PooH) disassembly, to facilitate PiH learning. Compared to PiH, PooH is inherently easier as it only needs to overcome existing friction without precise alignment, making data collection more efficient. To this end, we formulate both PooH and PiH as Partially Observable Markov Decision Processes (POMDPs) in a unified environment with shared visual-tactile observation space. A visual-tactile PooH policy is first trained; its trajectories, containing kinematic, visual and tactile information, are temporally reversed and action-randomized to provide expert data for PiH. In the policy learning, visual sensing facilitates the peg-hole approach, while tactile measurements compensate for peg-hole misalignment. Experiments across diverse peg-hole geometries show that the visual-tactile policy attains 6.4% lower contact forces than its single-modality counterparts, and that our framework achieves average success rates of 87.5% on seen objects and 77.1% on unseen objects, outperforming direct RL methods that train PiH policies from scratch by 18.1% in success rate. Demos, code, and datasets are available at https://sites.google.com/view/pooh2pih.
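
The temporal-reversal idea in the abstract can be sketched in a few lines. The example below assumes that actions are displacement deltas, so that each state equals the previous state plus the action; under that assumption, reversing a disassembly rollout means reversing the state order and negating the actions. The function name and the 1-D peg-height data are hypothetical, and the action randomization the abstract also mentions is left out.

```python
from typing import List, Tuple

Vec = Tuple[float, ...]


def reverse_trajectory(states: List[Vec],
                       actions: List[Vec]) -> Tuple[List[Vec], List[Vec]]:
    """Turn a disassembly rollout into an assembly demonstration.

    Assumes states[t + 1] == states[t] + actions[t], so there is exactly
    one more state than there are actions.
    """
    if len(states) != len(actions) + 1:
        raise ValueError("expected one more state than actions")
    rev_states = states[::-1]
    # Undo each displacement, last step first.
    rev_actions = [tuple(-a for a in act) for act in reversed(actions)]
    return rev_states, rev_actions


# Hypothetical 1-D peg height: the peg is pulled out from 0.0 to 2.0 in two steps.
pooh_states = [(0.0,), (1.0,), (2.0,)]
pooh_actions = [(1.0,), (1.0,)]
pih_states, pih_actions = reverse_trajectory(pooh_states, pooh_actions)
print(pih_states, pih_actions)
```

The reversed rollout starts where the disassembly ended and finishes with the peg seated, which is why it can serve as demonstration data for insertion under the stated assumption.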

2604.20711 2026-04-23 cs.AI cs.HC

Participatory provenance as representational auditing for AI-mediated public consultation

Sachit Mahajan

2604.20707 2026-04-23 cs.LG cs.SY eess.SY

Generative Flow Networks for Model Adaptation in Digital Twins of Natural Systems

Pascal Archambault, Houari Sahraoui, Eugene Syriani

Comments Under Review