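arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.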

2510.21293 2026-04-23 cs.AI cs.HC

Understanding AI Trustworthiness: A Scoping Review of AIES & FAccT Articles

Siddharth Mehrotra, Jin Huang, Xuelong Fu, Roel Dobbe, Clara I. Sánchez, Maarten de Rijke

Comments Submitted to Journal of Artificial Intelligence Research (JAIR)

Details

DOI: 10.1613/jair.1.20729
Journal ref: Journal of Artificial Intelligence Research (2026)

Abstract

Background: Trustworthy AI serves as a foundational pillar for two major AI ethics conferences: AIES and FAccT. However, current research often adopts techno-centric approaches, focusing primarily on technical attributes such as reliability, robustness, and fairness, while overlooking the sociotechnical dimensions critical to understanding AI trustworthiness in real-world contexts. Objectives: This scoping review aims to examine how the AIES and FAccT communities conceptualize, measure, and validate AI trustworthiness, identifying major gaps and opportunities for advancing a holistic understanding of trustworthy AI systems. Methods: We conduct a scoping review of AIES and FAccT conference proceedings to date, systematically analyzing how trustworthiness is defined, operationalized, and applied across different research domains. Our analysis focuses on conceptualization approaches, measurement methods, verification and validation techniques, application areas, and underlying values. Results: While significant progress has been made in defining technical attributes such as transparency, accountability, and robustness, our findings reveal critical gaps. Current research predominantly emphasizes technical precision at the expense of social and ethical considerations. The sociotechnical nature of AI systems remains less explored, and trustworthiness emerges as a contested concept shaped by those with the power to define it. Conclusions: An interdisciplinary approach combining technical rigor with social, cultural, and institutional considerations is essential for advancing trustworthy AI. We propose actionable measures for the AI ethics community to adopt holistic frameworks that genuinely address the complex interplay between AI systems and society, ultimately promoting responsible technological development that benefits all stakeholders.

2510.18263 2026-04-23 cs.LG cs.CV cs.GR

From Competition to Synergy: Unlocking Reinforcement Learning for Subject-Driven Image Generation

Ziwei Huang, Ying Shu, Hao Fang, Quanyu Long, Wenya Wang, Qiushi Guo, Tiezheng Ge, Leilei Gan

2510.17261 2026-04-23 cs.RO cs.LG

High-Level Multi-Robot Trajectory Planning And Spurious Behavior Detection

Fernando Salanova, Jesús Roche, Cristian Mahulea, Eduardo Montijano

Comments 6 pages, 3 figures, Iberian Robotics Conference 2025

2510.15339 2026-04-23 cs.CL cs.AI

AutoGraph-R1: End-to-End Reinforcement Learning for Knowledge Graph Construction

Hong Ting Tsang, Jiaxin Bai, Haoyu Huang, Qiao Xiao, Tianshi Zheng, Baixuan Xu, Shujie Liu, Yangqiu Song

2510.14274 2026-04-23 cs.CL

Retrofitting Small Multilingual Models for Retrieval: Matching 7B Performance with 300M Parameters

Lifu Tu, Yingbo Zhou, Semih Yavuz

Comments minor update from previous version

2510.13928 2026-04-23 cs.CL cs.AI

LLMs Can Get "Brain Rot": A Pilot Study on Twitter/X

Shuo Xing, Junyuan Hong, Yifan Wang, Runjin Chen, Zhenyu Zhang, Ananth Grama, Zhengzhong Tu, Zhangyang Wang

Comments Updated experiments with corrected data

2510.11041 2026-04-23 cs.RO

Unveiling Uncertainty-Aware Autonomous Cooperative Learning Based Planning Strategy

Shiyao Zhang, Liwei Deng, Shuyu Zhang, Weijie Yuan, Hong Zhang

Comments Accepted by IEEE RA-L

2510.10417 2026-04-23 cs.CV cs.AI cs.LG

Combo-Gait: Unified Transformer Framework for Multi-Modal Gait Recognition and Attribute Analysis

Zhao-Yang Wang, Zhimin Shao, Anirudh Nanduri, Basudha Pal, Laura McDaniel, Jieneng Chen, Rama Chellappa

2510.09574 2026-04-23 cs.RO

Online Structure Learning and Planning for Autonomous Robot Navigation using Active Inference

Daria de tinguy, Tim Verbelen, Emilio Gamba, Bart Dhoedt

Comments yet to be submitted

2510.04225 2026-04-23 cs.CV cs.AI cs.CL

Locate-Then-Examine: Grounded Region Reasoning Improves Detection of AI-Generated Images

Yikun Ji, Yan Hong, Bowen Deng, Jun Lan, Huijia Zhu, Weiqiang Wang, Liqing Zhang, Jianfu Zhang

Comments 18 pages, 11 figures (including supplementary material)

2510.03323 2026-04-23 cs.CL

Enhancing Agentic Textual Graph Retrieval with Synthetic Stepwise Supervision

Ge Chang, Jinbo Su, Jiacheng Liu, Pengfei Yang, Yuhao Shang, Huiwen Zheng, Hongli Ma, Yan Liang, Yuanchun Li, Yunxin Liu

2510.02215 2026-04-23 cs.LG

Improving Large-Scale Recommender Systems with Auxiliary Learning

Mertcan Cokbas, Ziteng Liu, Zeyi Tao, Elder Veliz, Qin Huang, Ellie Wen, Huayu Li, Qiang Jin, Murat Duman, Benjamin Au, Guy Lebanon, Sagar Chordia, Chengkai Zhang

2510.01706 2026-04-23 cs.LG cs.AI

Representational Alignment Across Model Layers and Brain Regions with Multi-Level Optimal Transport

Shaan Shah, Meenakshi Khosla

2509.25844 2026-04-23 cs.CL cs.HC

Believing without Seeing: Quality Scores for Contextualizing Vision-Language Model Explanations

Keyu He, Tejas Srinivasan, Brihi Joshi, Xiang Ren, Jesse Thomason, Swabha Swayamdipta

2509.24765 2026-04-23 cs.AI

Semantic-Aware Logical Reasoning via a Semiotic Framework

Yunyao Zhang, Xinglang Zhang, Junxi Sheng, Wenbing Li, Junqing Yu, Yi-Ping Phoebe Chen, Wei Yang, Zikai Song

Comments Accepted at ACL 2026 (Main Conference)

2509.22343 2026-04-23 cs.CL cs.AI cs.LG cs.LO

Transformers Can Learn Connectivity in Some Graphs but Not Others

Amit Roy, Abulhair Saparov

Comments This paper contains some assumption which is not correct

Details

Abstract

Reasoning capability is essential to ensure the factual correctness of the responses of transformer-based Large Language Models (LLMs), and robust reasoning about transitive relations is instrumental in many settings, such as causal inference. Hence, it is essential to investigate the capability of transformers in the task of inferring transitive relations (e.g., knowing A causes B and B causes C, then A causes C). The task of inferring transitive relations is equivalent to the task of connectivity in directed graphs (e.g., knowing there is a path from A to B, and there is a path from B to C, then there is a path from A to C). Past research focused on whether transformers can learn to infer transitivity from in-context examples provided in the input prompt. However, transformers' capability to infer transitive relations from training examples and how scaling affects the ability is unexplored. In this study, we seek to answer this question by generating directed graphs to train transformer models of varying sizes and evaluate their ability to infer transitive relations for various graph sizes. Our findings suggest that transformers are capable of learning connectivity on "grid-like" directed graphs where each node can be embedded in a low-dimensional subspace, and connectivity is easily inferable from the embeddings of the nodes. We find that the dimensionality of the underlying grid graph is a strong predictor of transformers' ability to learn the connectivity task, where higher-dimensional grid graphs pose a greater challenge than low-dimensional grid graphs. In addition, we observe that increasing the model scale leads to increasingly better generalization to infer connectivity over grid graphs. However, if the graph is not a grid graph and contains many disconnected components, transformers struggle to learn the connectivity task, especially when the number of components is large.
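
The equivalence the abstract draws between transitive inference and directed-graph connectivity can be illustrated with a small reachability check. This is only a sketch of the classical task, not the paper's training setup or model: the function name `reachable` and the toy edge list `causes` are hypothetical, introduced here for illustration.

```python
from collections import deque


def reachable(edges, source, target):
    """Return True if a directed path leads from source to target (breadth-first search)."""
    # Build an adjacency list from (cause, effect) pairs.
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)

    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Hypothetical toy graph: A causes B, B causes C; D and E form a separate component.
causes = [("A", "B"), ("B", "C"), ("D", "E")]

print(reachable(causes, "A", "C"))  # True: A -> B -> C, the transitive relation
print(reachable(causes, "C", "A"))  # False: edges are directed
print(reachable(causes, "A", "E"))  # False: E lies in a disconnected component
```

The last query reflects the disconnected-components case that the abstract reports as hard for transformers; a graph search handles it trivially, which is what makes it a useful ground truth for such experiments.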

2509.15174 2026-04-23 cs.CL cs.AI

SMARTER: A Data-efficient Framework to Improve Toxicity Detection with Explanation via Self-augmenting Large Language Models

Huy Nghiem, Advik Sachdeva, Hal Daumé

Comments ACL 2026. NLP, Hate speech detection, explanation, LLM. Version 3

2509.00800 2026-04-23 cs.CV

Semantic-guided Gaussian Splatting for High-Fidelity Underwater Scene Reconstruction

Zhuodong Jiang, Haoran Wang, Guoxi Huang, Brett Seymour, Nantheera Anantrasirichai

Details

Abstract

Accurate 3D reconstruction in degraded imaging conditions remains a key challenge in photogrammetry and neural rendering. In underwater environments, spatially varying visibility caused by scattering, attenuation, and sparse observations leads to highly non-uniform information quality. Existing 3D Gaussian Splatting (3DGS) methods typically optimize primitives based on photometric signals alone, resulting in imbalanced representation, with overfitting in well-observed regions and insufficient reconstruction in degraded areas. In this paper, we propose SWAGSplatting (Semantic-guided Water-scene Augmented Gaussian Splatting), a multimodal framework that integrates semantic priors into 3DGS for robust, high-fidelity underwater reconstruction. Each Gaussian primitive is augmented with a learnable semantic feature, supervised by CLIP-based embeddings derived from region-level cues. A semantic consistency loss is introduced to align geometric reconstruction with high-level semantics, improving structural coherence and preserving salient object boundaries under challenging conditions. Furthermore, we propose an adaptive Gaussian primitive reallocation strategy that redistributes representation capacity based on both primitive importance and reconstruction error, mitigating the imbalance introduced by conventional densification. This enables more effective modeling of low-visibility regions without increasing computational cost. Extensive experiments on real-world datasets, including SeaThru-NeRF, Submerged3D, and S-UW, demonstrate that the proposed method consistently outperforms state-of-the-art approaches in terms of average PSNR, SSIM, and LPIPS. The results validate the effectiveness of integrating semantic priors for high-fidelity underwater scene reconstruction. Code is available at https://github.com/theflash987/SWAGSplatting.

2509.00798 2026-04-23 cs.CV cs.AI

Progressive Multimodal Search and Reasoning for Knowledge-Intensive Visual Question Answering

Changin Choi, Wonseok Lee, Jungmin Ko, Wonjong Rhee

2508.18609 2026-04-23 cs.CL cs.AI cs.LG

Task-Stratified Knowledge Scaling Laws for Post-Training Quantized Large Language Models

Chenxi Zhou, Pengfei Cao, Jiang Li, Bohan Yu, Jinyu Ye, Jun Zhao, Kang Liu

Comments Accepted to Findings of ACL 2026

2508.18168 2026-04-23 cs.CL

Improving End-to-End Training of Retrieval-Augmented Generation Models via Joint Stochastic Approximation

Hongyu Cao, Yuxuan Wu, Yucheng Cai, Xianyu Zhao, Zhijian Ou

2508.17761 2026-04-23 cs.LG stat.ML

Evaluating the Quality of the Quantified Uncertainty for (Re)Calibration of Data-Driven Regression Models

Jelke Wibbeke, Nico Schönfisch, Sebastian Rohjans, Andreas Rauh

Details

DOI: 10.1016/j.ijar.2026.109685
Journal ref: International Journal of Approximate Reasoning, Volume 195, 2026, 109685, ISSN 0888-613X

Abstract

In safety-critical applications data-driven models must not only be accurate but also provide reliable uncertainty estimates. This property, commonly referred to as calibration, is essential for risk-aware decision-making. In regression a wide variety of calibration metrics and recalibration methods have emerged. However, these metrics differ significantly in their definitions, assumptions and scales, making it difficult to interpret and compare results across studies. Moreover, most recalibration methods have been evaluated using only a small subset of metrics, leaving it unclear whether improvements generalize across different notions of calibration. In this work, we systematically extract and categorize regression calibration metrics from the literature and benchmark these metrics independently of specific modelling methods or recalibration approaches. Through controlled experiments with real-world, synthetic and artificially miscalibrated data, we demonstrate that calibration metrics frequently produce conflicting results. Our analysis reveals substantial inconsistencies: many metrics disagree in their evaluation of the same recalibration result, and some even indicate contradictory conclusions. This inconsistency is particularly concerning as it potentially allows cherry-picking of metrics to create misleading impressions of success. We identify the Expected Normalized Calibration Error (ENCE) and the Coverage Width-based Criterion (CWC) as the most dependable metrics in our tests. Our findings highlight the critical role of metric selection in calibration research.
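
For readers unfamiliar with ENCE, the sketch below follows the commonly cited binned definition: samples are sorted by predicted standard deviation, split into bins, and each bin's root mean variance (RMV) is compared with its root mean squared error (RMSE). This is an assumption about the general form of the metric; the exact variant, binning scheme, and bin count used in the paper are not stated in the abstract. The function name `ence` and the toy data are hypothetical.

```python
import math


def ence(y_true, y_pred, sigma, n_bins=2):
    """Expected Normalized Calibration Error over bins of predicted std.

    Assumes every predicted standard deviation in sigma is strictly positive.
    """
    # Sort sample indices by predicted uncertainty, then cut into contiguous bins.
    order = sorted(range(len(sigma)), key=lambda i: sigma[i])
    size = math.ceil(len(order) / n_bins)
    bins = [order[i:i + size] for i in range(0, len(order), size)]

    total = 0.0
    for idx in bins:
        # Root mean variance: what the model claims its error scale is.
        rmv = math.sqrt(sum(sigma[i] ** 2 for i in idx) / len(idx))
        # Root mean squared error: what the error scale actually is.
        rmse = math.sqrt(sum((y_true[i] - y_pred[i]) ** 2 for i in idx) / len(idx))
        total += abs(rmv - rmse) / rmv
    return total / len(bins)


# Hypothetical toy data: errors of magnitude 1, 1, 2, 2.
y_true = [0.0, 0.0, 0.0, 0.0]
y_pred = [1.0, -1.0, 2.0, -2.0]

print(ence(y_true, y_pred, [1.0, 1.0, 2.0, 2.0]))  # 0.0: predicted std matches the errors
print(ence(y_true, y_pred, [0.5, 0.5, 1.0, 1.0]))  # 1.0: predicted std is half the true error
```

A value of zero means the predicted uncertainty matches the observed error in every bin; larger values indicate over- or under-confidence. The result depends on the number of bins, which is one reason scores from different studies can be hard to compare.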

2508.14098 2026-04-23 cs.RO cs.AI

No More Marching: Learning Humanoid Locomotion for Short-Range SE(2) Targets

Pranay Dugar, Mohitvishnu S. Gadde, Jonah Siekmann, Yesh Godse, Aayam Shrestha, Alan Fern

2508.10171 2026-04-23 cs.CV cs.ET

SynSpill: Improved Industrial Spill Detection With Synthetic Data

Aaditya Baranwal, Abdul Mueez, Jason Voelker, Guneet Bhatia, Shruti Vyas

Comments Accepted at ICCV (VISION'25 Workshop) 2025

Details

DOI: 10.1109/ICCVW69036.2025.00152
Journal ref: 2025 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), pp. 1425-1434

Abstract

Large-scale Vision-Language Models (VLMs) have transformed general-purpose visual recognition through strong zero-shot capabilities. However, their performance degrades significantly in niche, safety-critical domains such as industrial spill detection, where hazardous events are rare, sensitive, and difficult to annotate. This scarcity -- driven by privacy concerns, data sensitivity, and the infrequency of real incidents -- renders conventional fine-tuning of detectors infeasible for most industrial settings. We address this challenge by introducing a scalable framework centered on a high-quality synthetic data generation pipeline. We demonstrate that this synthetic corpus enables effective Parameter-Efficient Fine-Tuning (PEFT) of VLMs and substantially boosts the performance of state-of-the-art object detectors such as YOLO and DETR. Notably, in the absence of synthetic data (SynSpill dataset), VLMs still generalize better to unseen spill scenarios than these detectors. When SynSpill is used, both VLMs and detectors achieve marked improvements, with their performance becoming comparable. Our results underscore that high-fidelity synthetic data is a powerful means to bridge the domain gap in safety-critical applications. The combination of synthetic generation and lightweight adaptation offers a cost-effective, scalable pathway for deploying vision systems in industrial environments where real data is scarce/impractical to obtain. Project Page: https://synspill.vercel.app

2508.09958 2026-04-23 cs.CL cs.LG

Neural Bandit Based Optimal LLM Selection for a Pipeline of Subtasks

Baran Atalar, Eddie Zhang, Carlee Joe-Wong

2508.08508 2026-04-23 cs.CV cs.CL

Re:Verse -- Can Your VLM Read a Manga?

Aaditya Baranwal, Madhav Kataria, Naitik Agrawal, Yogesh S Rawat, Shruti Vyas

Comments Accepted (oral) at ICCV (AISTORY Workshop) 2025

Details

DOI: 10.1109/ICCVW69036.2025.00398
Journal ref: 2025 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), pp. 3820-3830

Abstract

Current Vision Language Models (VLMs) demonstrate a critical gap between surface-level recognition and deep narrative reasoning when processing sequential visual storytelling. Through a comprehensive investigation of manga narrative understanding, we reveal that while recent large multimodal models excel at individual panel interpretation, they systematically fail at temporal causality and cross-panel cohesion, core requirements for coherent story comprehension. We introduce a novel evaluation framework that combines fine-grained multimodal annotation, cross-modal embedding analysis, and retrieval-augmented assessment to systematically characterize these limitations. Our methodology includes (i) a rigorous annotation protocol linking visual elements to narrative structure through aligned light novel text, (ii) comprehensive evaluation across multiple reasoning paradigms, including direct inference and retrieval-augmented generation, and (iii) cross-modal similarity analysis revealing fundamental misalignments in current VLMs' joint representations. Applying this framework to Re:Zero manga across 11 chapters with 308 annotated panels, we conduct the first systematic study of long-form narrative understanding in VLMs through three core evaluation axes: generative storytelling, contextual dialogue grounding, and temporal reasoning. Our findings demonstrate that current models lack genuine story-level intelligence, struggling particularly with non-linear narratives, character consistency, and causal inference across extended sequences. This work establishes both the foundation and practical methodology for evaluating narrative intelligence, while providing actionable insights into the capability of deep sequential understanding of Discrete Visual Narratives beyond basic recognition in Multimodal Models. Project Page: https://re-verse.vercel.app

2508.06614 2026-04-23 cs.LG cond-mat.stat-mech quant-ph

Local Diffusion Models and Phases of Data Distributions

Fangjun Hu, Guangkuo Liu, Yifan F. Zhang, Xun Gao

Comments 11+23 pages, 4+4 figures

Details

Abstract

As a class of generative artificial intelligence frameworks inspired by statistical physics, diffusion models have shown extraordinary performance in synthesizing complicated data distributions through a denoising process gradually guided by score functions. Real-life data, like images, is often spatially structured in low-dimensional spaces. However, ordinary diffusion models ignore this local structure and learn spatially global score functions, which are often computationally expensive. In this work, motivated by recent advances in non-equilibrium statistical physics, we develop a generic framework for defining phases of data distributions and use it to analyze the locality requirements of denoisers in diffusion models. We define two distributions as belonging to the same data distribution phase if they can be mutually connected via spatially local operations such as local denoisers, along the same evolution path as the diffusion. We demonstrate that the reverse denoising process consists of an early trivial phase and a late data phase, sandwiching a rapid phase transition where local denoisers must fail. We further demonstrate that the performance of local denoisers is closely tied to spatial Markovianity, which provides an operational criterion for diagnosing such phase transitions. We validate this criterion through numerical experiments on real-world datasets. Our work suggests guidance for simpler and more efficient architectures of diffusion models: far from the phase transition point, we can use small local neural networks to compute the score function; global neural networks are only necessary around the narrow time interval of phase transitions. This result also opens up new directions for studying phases of data distributions, the broader science of generative artificial intelligence, and guiding the design of neural networks inspired by physics concepts.

2508.02644 2026-04-23 cs.AI

D2PPO: Diffusion Policy Policy Optimization with Dispersive Loss

Guowei Zou, Weibing Li, Hejun Wu, Yukun Qian, Yuhang Wang, Haitao Wang

2508.01575 2026-04-23 cs.LG

KANMixer: a minimal KAN-centered mixer for long-term time series forecasting

Lingyu Jiang, Dengzhe Hou, Yuping Wang, Yao Su, Shuo Xing, Wenjing Chen, Xin Zhang, Zhengzhong Tu, Ziming Zhang, Fangzhou Lin, Michael Zielewski, Kazunori D Yamada

Comments 11 pages, 3 figures, 5 tables

2508.00414 2026-04-23 cs.AI cs.CL

Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training

Tianqing Fang, Zhisong Zhang, Xiaoyang Wang, Rui Wang, Can Qin, Yuxuan Wan, Jun-Yu Ma, Ce Zhang, Jiaqi Chen, Xiyun Li, Yonglin Wang, Jingchen Ni, Tianshi Zheng, Chun Chen, Wenhao Yu, Zhenwen Liang, Hongming Zhang, Haitao Mi, Dong Yu

Comments 21 pages