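arXivDaily daily academic digest: synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.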

2601.07663 2026-04-22 cs.AI cs.CL

Reasoning Models Will Sometimes Lie About Their Reasoning

William Walden, Miriam Wanner

Abstract

Hint-based faithfulness evaluations have established that Large Reasoning Models (LRMs) may not say what they think: they do not always volunteer information about how key parts of the input (e.g. answer hints) influence their reasoning. Yet, these evaluations also fail to specify what models should do when confronted with hints or other unusual prompt content -- even though versions of such instructions are standard security measures (e.g. for countering prompt injections). Here, we study faithfulness under this more realistic setting in which models are explicitly alerted to the possibility of unusual inputs. We find that such instructions can yield strong results on faithfulness metrics from prior work. However, results on new, more granular metrics proposed in this work paint a mixed picture: although models may acknowledge the presence of hints, they will often deny intending to use them -- even when permitted to use hints and even when it can be demonstrated that they are using them. Our results thus raise broader challenges for CoT monitoring and interpretability.

2601.07056 2026-04-22 cs.CV cs.AI

Adversarial Attacks on Medical Hyperspectral Imaging Exploiting Spectral-Spatial Dependencies and Multiscale Features

Yunrui Gu, Zhenzhe Gao, Cong Kong, Jiawei Du, Zhaoxia Yin

2601.04925 2026-04-22 cs.CL

Can AI-Generated Persuasion Be Detected? Persuaficial Benchmark and AI vs. Human Linguistic Differences

Arkadiusz Modzelewski, Paweł Golik, Anna Kołos, Giovanni Da San Martino

Comments Accepted to ACL 2026 Main Conference

2601.04562 2026-04-22 cs.AI

Reasoning Over Space: Enabling Geographic Reasoning for LLM-Based Generative Next POI Recommendation

Dongyi Lv, Qiuyu Ding, Heng-Da Xu, Zhaoxu Sun, Zhi Wang, Feng Xiong, Mu Xu

2512.13684 2026-04-22 cs.CV

Recurrent Video Masked Autoencoders

Daniel Zoran, Nikhil Parthasarathy, Yi Yang, Drew A Hudson, Joao Carreira, Andrew Zisserman

2511.21931 2026-04-22 cs.LG cs.AI

Does the Model Say What the Data Says? A Simple Heuristic for Model Data Alignment

Henry Salgado, Meagan R. Kendall, Martine Ceberio

2511.21893 2026-04-22 cs.LG

Breaking the Illusion: Consensus-Based Generative Mitigation of Adversarial Illusions in Multi-Modal Embeddings

Fatemeh Akbarian, Anahita Baninajjar, Yingyi Zhang, Ananth Balashankar, Amir Aminifar

2511.16164 2026-04-22 cs.LG stat.AP

Achieving Skilled and Reliable Daily Probabilistic Forecasts of Wind Power at Subseasonal-to-Seasonal Timescales over France

Eloi Lindas, Yannig Goude, Philippe Ciais

2511.11793 2026-04-22 cs.CL

MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling

MiroMind Team, Song Bai, Lidong Bing, Carson Chen, Guanzheng Chen, Yuntao Chen, Zhe Chen, Ziyi Chen, Jifeng Dai, Xuan Dong, Wenhan Dou, Yue Deng, Yunjie Fu, Junqi Ge, Chenxia Han, Tammy Huang, Zhenhang Huang, Jerry Jiao, Shilei Jiang, Tianyu Jiao, Xiaoqi Jian, Lei Lei, Ruilin Li, Gen Luo, Tiantong Li, Xiang Lin, Ziyuan Liu, Zhiqi Li, Jie Ni, Qiang Ren, Pax Sun, Shiqian Su, Chenxin Tao, Bin Wang, Wenhai Wang, Haonan Wang, James Wang, Jin Wang, Jojo Wang, Letian Wang, Shizun Wang, Weizhi Wang, Zixuan Wang, Jinfan Xu, Sen Xing, Chenyu Yang, Hai Ye, Jiaheng Yu, Yue Yu, Muyan Zhong, Tianchen Zhao, Xizhou Zhu, Yanpeng Zhou, Yifan Zhang, Zhi Zhu

Comments Technical Report

2511.08418 2026-04-22 cs.LG

Physics-Informed Neural Operators for Cardiac Electrophysiology

Hannah Lydon, Milad Kazemi, Martin Bishop, Nicola Paoletti

Comments All code used in this work, including experimental results, can be found at https://github.com/janet-9/CardiacEP-PINOS. This work was accepted for a poster presentation at the 2026 L4DC conference.

2511.05540 2026-04-22 cs.RO cs.AI cs.CV cs.LG cs.NE

Constructing the Umwelt: Cognitive Planning through Belief-Intent Co-Evolution

Shiyao Sang

Comments 12 pages, 8 figures. A paradigm shift from reconstructing the world to understanding it: planning through Belief-Intent Co-Evolution

2511.04320 2026-04-22 cs.RO

MacroNav: Multi-Task Context Representation Learning Enables Efficient Navigation in Unknown Environments

Kuankuan Sima, Longbin Tang, Zhenyu Yang, Haozhe Ma, Lin Zhao

Comments Accepted by IEEE Robotics and Automation Letters

2510.20087 2026-04-22 cs.CV

Endoshare: A Publicly Available, Surgeons-Friendly Solution to De-Identify and Manage Surgical Videos

Lorenzo Arboit, Dennis N. Schneider, Britty Baby, Vinkle Srivastav, Pietro Mascagni, Nicolas Padoy

Comments 13 pages, 6 figures. Source-available software: https://camma-public.github.io/Endoshare/

DOI: 10.1007/s00464-026-12699-4
Journal ref: Surg Endosc, 2026

Abstract

Video-based assessment and surgical data science can advance surgical training, research, and quality improvement, yet adoption remains limited by heterogeneous recording formats and privacy concerns linked to video sharing. This work develops, evaluates, and publicly releases Endoshare, a surgeon-friendly application that merges, standardizes, and de-identifies endoscopic videos. Development followed an iterative, user-centered software life cycle. In the analysis phase, an internal survey of four clinicians and four computer scientists, based on 10 usability heuristics, identified early requirements and guided a cross-platform, privacy-by-design architecture. Prototype testing reported high usability for clinicians (4.68 +/- 0.40 out of 5) and for computer scientists (4.03 +/- 0.51 out of 5), with the lowest score (4.00 +/- 0.93 out of 5) relating to label clarity, prompting interface refinement to streamline case selection, video merging, automated out-of-body removal, and filename pseudonymization. In the testing phase, ten surgeons completed an external survey combining the same heuristics with Technology Acceptance Model constructs, reporting high perceived usefulness (5.07 +/- 1.75 out of 7), ease of use (5.15 +/- 1.71 out of 7), heuristic usability (4.38 +/- 0.48 out of 5), and strong recommendation likelihood (9.20 +/- 0.79 out of 10). A performance assessment across different hardware and configurations showed that processing time increased proportionally with video duration and was consistently lower in fast mode. Endoshare is a publicly available solution to manage surgical videos, with potential to support training, research, and quality improvement. Compliance certification and broader interoperability validation are needed to establish it as a reliable tool for surgical video management. The software is available at https://camma-public.github.io/Endoshare

2510.16822 2026-04-22 cs.CV cs.AI

ReefNet: A Large-Scale Dataset and Benchmark for Fine-Grained Coral Reef Recognition

Abdulwahab Felemban, Yahia Battach, Faizan Farooq Khan, Yuqian Fu, Xuhui Liu, Yesmeen M. Khattab, Yousef A. Radwan, Xiang Li, Fabio Marchese, Sara Beery, Burton H. Jones, Francesca Benzoni, Mohamed Elhoseiny

2510.14630 2026-04-22 cs.CV

Adapting Self-Supervised Representations as a Latent Space for Efficient Generation

Ming Gui, Johannes Schusterbauer, Timy Phan, Felix Krause, Josh Susskind, Miguel Angel Bautista, Björn Ommer

Comments ICLR 2026, Code: https://github.com/CompVis/RepTok

2510.09204 2026-04-22 cs.RO cs.LG

Flow-Opt: Scalable Centralized Multi-Robot Trajectory Optimization with Flow Matching and Differentiable Optimization

Simon Idoko, Prajyot Jadhav, Arun Kumar Singh

Abstract

Centralized trajectory optimization in the joint space of multiple robots allows access to a larger feasible space that can result in smoother trajectories, especially while planning in tight spaces. Unfortunately, it is often computationally intractable beyond a very small swarm size. In this paper, we propose Flow-Opt, a learning-based approach towards improving the computational tractability of centralized multi-robot trajectory optimization. Specifically, we reduce the problem to first learning a generative model to sample different candidate trajectories and then using a learned Safety-Filter (SF) to ensure fast inference-time constraint satisfaction. We propose a flow-matching model with a diffusion transformer (DiT) augmented with permutation-invariant robot position and map encoders as the generative model. We develop a custom solver for our SF and equip it with a neural network that predicts context-specific initialization. The initialization network is trained in a self-supervised manner, taking advantage of the differentiability of the SF solver. We advance the state of the art in the following respects. First, we show that we can generate trajectories for tens of robots in cluttered environments in a few tens of milliseconds. This is several times faster than existing centralized optimization approaches. Moreover, our approach also generates smoother trajectories orders of magnitude faster than competing baselines based on diffusion models. Second, each component of our approach can be batched, allowing us to solve a few tens of problem instances in a fraction of a second. We believe this is the first such result; no existing approach provides such capabilities. Finally, our approach can generate a diverse set of trajectories between a given set of start and goal locations, which can capture different collision-avoidance behaviors.

2510.04800 2026-04-22 cs.CL

Hybrid Architectures for Language Models: Systematic Analysis and Design Insights

Sangmin Bae, Bilge Acun, Chien-Yu Lin, Haroun Habeeb, Seungyeon Kim, Liang Luo, Junjie Wang, Carole-Jean Wu

Comments 41 pages, 8 figures, 22 tables

2509.24803 2026-04-22 cs.LG cs.AI

TimeOmni-1: Incentivizing Complex Reasoning with Time Series in Large Language Models

Tong Guan, Zijie Meng, Dianqi Li, Shiyu Wang, Chao-Han Huck Yang, Qingsong Wen, Zuozhu Liu, Sabato Marco Siniscalchi, Ming Jin, Shirui Pan

Comments Accepted by the 14th International Conference on Learning Representations (ICLR 2026)

Abstract

Recent advances in multimodal time series learning underscore a paradigm shift from analytics centered on basic patterns toward advanced time series understanding and reasoning. However, existing multimodal time series datasets mostly remain at the level of surface alignment and question answering, without reaching the depth of genuine reasoning. The absence of well-defined tasks that genuinely require time series reasoning, along with the scarcity of high-quality data, has limited progress in building practical time series reasoning models (TSRMs). To this end, we introduce Time Series Reasoning Suite (TSR-Suite), which formalizes four atomic tasks that span three fundamental capabilities for reasoning with time series: (1) perception, acquired through scenario understanding and causality discovery; (2) extrapolation, realized via event-aware forecasting; and (3) decision-making, developed through deliberation over perception and extrapolation. TSR-Suite is the first comprehensive time series reasoning suite that supports not only thorough evaluation but also the data pipeline and training of TSRMs. It contains more than 23K samples, of which 2.3K are carefully curated through a human-guided hierarchical annotation process. Building on this foundation, we introduce TimeOmni-1, the first unified reasoning model designed to address diverse real-world problems demanding time series reasoning. The model is trained in multiple stages, integrating a mixture of task scenarios, novel reward functions, and tailored optimizations. Experiments show that TimeOmni-1 delivers strong out-of-distribution generalization across all tasks and achieves a high rate of valid responses. It significantly improves causality discovery accuracy (64.0% vs. 35.9% with GPT-4.1) and raises the valid response rate by over 6% compared to GPT-4.1 on the event-aware forecasting task.

2509.07966 2026-04-22 cs.CV cs.CL

Visual-TableQA: Open-Domain Benchmark for Reasoning over Table Images

Boammani Aser Lompo, Marc Haraoui

Comments Accepted at the First Workshop on Foundations of Reasoning in Language Models, NeurIPS 2025. Available at: https://openreview.net/forum?id=fvJRsGwhPf

2508.14170 2026-04-22 cs.CL cs.CY

Comparing energy consumption and accuracy in text classification inference

Johannes Zschache, Tilman Hartwig

Comments Key results in Figure 2, accepted in Nature Sci Rep, 32 pages

2508.12121 2026-04-22 cs.LG math.DS

Time-Scale Coupling Between States and Parameters in Recurrent Neural Networks

Lorenzo Livi

Comments final version

2508.04818 2026-04-22 cs.CV eess.IV stat.ML

Single-Step Reconstruction-Free Anomaly Detection and Segmentation via Diffusion Models

Mehrdad Moradi, Marco Grasso, Bianca Maria Colosimo, Kamran Paynabar

Comments 9 pages, 8 figures, 1 table. Accepted to 2025 International Conference on Machine Learning and Applications (ICMLA)

DOI: 10.1109/ICMLA66185.2025.00095
Journal ref: Proc. 2025 International Conference on Machine Learning and Applications (ICMLA), Boca Raton, FL, USA, 2025, pp. 663-670

Abstract

Generative models have demonstrated significant success in anomaly detection and segmentation over the past decade. Recently, diffusion models have emerged as a powerful alternative, outperforming previous approaches such as GANs and VAEs. In typical diffusion-based anomaly detection, a model is trained on normal data, and during inference, anomalous images are perturbed to a predefined intermediate step in the forward diffusion process. The corresponding normal image is then reconstructed through iterative reverse sampling. However, reconstruction-based approaches present three major challenges: (1) the reconstruction process is computationally expensive due to multiple sampling steps, making real-time applications impractical; (2) for complex or subtle patterns, the reconstructed image may correspond to a different normal pattern rather than the original input; and (3) choosing an appropriate intermediate noise level is challenging because it is application-dependent and often assumes prior knowledge of anomalies, an assumption that does not hold in unsupervised settings. We introduce Reconstruction-free Anomaly Detection with Attention-based diffusion models in Real-time (RADAR), which overcomes the limitations of reconstruction-based anomaly detection. Unlike current SOTA methods that reconstruct the input image, RADAR directly produces anomaly maps from the diffusion model, improving both detection accuracy and computational efficiency. We evaluate RADAR on real-world 3D-printed material and the MVTec-AD dataset. Our approach surpasses state-of-the-art diffusion-based and statistical machine learning models across all key metrics, including accuracy, precision, recall, and F1 score. Specifically, RADAR improves F1 score by 7% on MVTec-AD and 13% on the 3D-printed material dataset compared to the next best model. Code available at: https://github.com/mehrdadmoradi124/RADAR
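
Editor's illustration (not from the paper, and not RADAR itself): the abstract mentions perturbing an image to a predefined intermediate step of the forward diffusion process. A minimal sketch of that perturbation step, assuming the standard DDPM closed form x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps with a linear beta schedule, could look like the following. The schedule endpoints, the function names, and the use of a flat list of pixel values are all assumptions made for illustration.

```python
import math
import random


def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (assumed; the paper's schedule is not stated here)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bar(betas, t):
    """Cumulative product of (1 - beta_s) for s = 0..t inclusive."""
    prod = 1.0
    for beta in betas[: t + 1]:
        prod *= 1.0 - beta
    return prod


def perturb(x0, t, betas, rng):
    """Sample x_t from q(x_t | x_0) in closed form, one value per pixel."""
    a = alpha_bar(betas, t)
    signal, noise = math.sqrt(a), math.sqrt(1.0 - a)
    return [signal * v + noise * rng.gauss(0.0, 1.0) for v in x0]


if __name__ == "__main__":
    betas = linear_betas(1000)
    pixels = [0.5, -0.25, 1.0]  # hypothetical normalized pixel values
    noisy = perturb(pixels, t=250, betas=betas, rng=random.Random(0))
    print(noisy)
```

A larger `t` keeps less of the original signal. That is the trade-off behind challenge (3) in the abstract: the intermediate noise level has to be chosen per application.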

2508.03337 2026-04-22 cs.CV

Less is More: Token-Efficient Video-QA via Adaptive Frame-Pruning and Semantic Graph Integration

Shaoguang Wang, Weiyu Guo, Ziyang Chen, Yijie Xu, Xuming Hu, Hui Xiong

Comments Accepted to CVPR 2026 Findings

2508.02384 2026-04-22 cs.CV

SMART-Ship: A Comprehensive Synchronized Multi-modal Aligned Remote Sensing Targets Dataset and Benchmark for Berthed Ships Analysis

Chen-Chen Fan, Peiyao Guo, Linping Zhang, Kehan Qi, Haolin Huang, Yong-Qiang Mao, Yuxi Suo, Zhizhuo Jiang, Yu Liu, You He

2508.01959 2026-04-22 cs.CL

SitEmb-v1.5: Improved Context-Aware Dense Retrieval for Semantic Association and Long Story Comprehension

Junjie Wu, Jiangnan Li, Yuqing Li, Lemao Liu, Liyan Xu, Jiwei Li, Dit-Yan Yeung, Jie Zhou, Mo Yu

Comments ACL 2025 Main Conference. Our trained models can be downloaded from: https://huggingface.co/SituatedEmbedding

2507.09861 2026-04-22 cs.CV cs.AI

A Survey on MLLM-based Visually Rich Document Understanding: Methods, Challenges, and Emerging Trends

Yihao Ding, Siwen Luo, Yue Dai, Yanbei Jiang, Zechuan Li, Qiang Sun, Geoffrey Martin, Wei Liu, Yifan Peng

Comments Accepted at ACL 2026 Findings

2506.22174 2026-04-22 cs.RO cs.LG

ASVSim (AirSim for Surface Vehicles): A High-Fidelity Simulation Framework for Autonomous Surface Vehicle Research

Bavo Lesy, Siemen Herremans, Robin Kerstens, Jan Steckel, Walter Daems, Siegfried Mercelis, Ali Anwar

Comments 18 Pages, 13 Figures. Accepted at IEEE ACCESS

2506.18871 2026-04-22 cs.CV cs.AI cs.CL

OmniGen2: Towards Instruction-Aligned Multimodal Generation

Chenyuan Wu, Pengfei Zheng, Ruiran Yan, Shitao Xiao, Xin Luo, Yueze Wang, Wanli Li, Xiyan Jiang, Yexin Liu, Junjie Zhou, Ze Liu, Ziyi Xia, Chaofan Li, Haoge Deng, Jiahao Wang, Kun Luo, Bo Zhang, Defu Lian, Xinlong Wang, Zhongyuan Wang, Tiejun Huang, Zheng Liu

2506.09373 2026-04-22 cs.LG cs.AI cs.CV

LPO: Towards Accurate GUI Agent Interaction via Location Preference Optimization

Jiaqi Tang, Yu Xia, Yi-Feng Wu, Yuwei Hu, Yuhui Chen, Qing-Guo Chen, Xiaogang Xu, Xiangyu Wu, Hao Lu, Yanqing Ma, Shiyin Lu, Qifeng Chen

Comments Accepted by ACL 2026 Findings

2505.16662 2026-04-22 cs.RO eess.SP

Joint Magnetometer-IMU Calibration via Maximum A Posteriori Estimation

Chuan Huang, Gustaf Hendeby, Isaac Skog

Comments Accepted version