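arXivDaily daily academic digest: synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.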

2604.16177 2026-04-22 cs.CV

Winner of CVPR2026 NTIRE Challenge on Image Shadow Removal: Semantic and Geometric Guidance for Shadow Removal via Cascaded Refinement

Lorenzo Beltrame, Jules Salzinger, Filip Svoboda, Jasmin Lampert, Phillipp Fanta-Jende, Radu Timofte, Marco Körner

Comments 10 pages, 4 figures, 5 tables, accepted at the CVPR 2026 Workshops (NTIRE 2026 Image Shadow Removal Challenge). Code and materials are available at https://github.com/AIT-Assistive-Autonomous-Systems/SGCR-SR . Corrected author name spelling in metadata and manuscript

2604.16079 2026-04-22 cs.CV

The Amazing Stability of Flow Matching

Rania Briq, Michael Kamp, Ohad Fried, Sarel Cohen, Stefan Kesselheim

Comments EurIPS 2025 Workshop on Principles of Generative Modeling (PriGM)

2604.15804 2026-04-22 cs.CL eess.AS

Qwen3.5-Omni Technical Report

Qwen Team

Abstract

In this work, we present Qwen3.5-Omni, the latest advancement in the Qwen-Omni model family. Representing a significant evolution over its predecessor, Qwen3.5-Omni scales to hundreds of billions of parameters and supports a 256k context length. By leveraging a massive dataset comprising heterogeneous text-vision pairs and over 100 million hours of audio-visual content, the model demonstrates robust omni-modality capabilities. Qwen3.5-Omni-plus achieves SOTA results across 215 audio and audio-visual understanding, reasoning, and interaction subtasks and benchmarks, surpassing Gemini-3.1 Pro in key audio tasks and matching it in comprehensive audio-visual understanding. Architecturally, Qwen3.5-Omni employs a Hybrid Attention Mixture-of-Experts (MoE) framework for both Thinker and Talker, enabling efficient long-sequence inference. The model facilitates sophisticated interaction, supporting over 10 hours of audio understanding and 400 seconds of 720P video (at 1 FPS). To address the inherent instability and unnaturalness in streaming speech synthesis, often caused by encoding efficiency discrepancies between text and speech tokenizers, we introduce ARIA. ARIA dynamically aligns text and speech units, significantly enhancing the stability and prosody of conversational speech with minimal latency impact. Furthermore, Qwen3.5-Omni expands linguistic boundaries, supporting multilingual understanding and speech generation across 10 languages with human-like emotional nuance. Finally, Qwen3.5-Omni exhibits superior audio-visual grounding capabilities, generating script-level structured captions with precise temporal synchronization and automated scene segmentation. Remarkably, we observed the emergence of a new capability in omnimodal models: directly performing coding based on audio-visual instructions, which we call Audio-Visual Vibe Coding.

2604.15702 2026-04-22 cs.CL cs.LG

The Metacognitive Monitoring Battery: A Cross-Domain Benchmark for LLM Self-Monitoring

Jon-Paul Cacioli

Comments 11 pages, 6 figures, 3 tables. Submitted to NeurIPS 2026 Evaluations and Datasets Track. Code, data, and Croissant metadata: https://github.com/synthiumjp/metacognitive-monitoring-battery

2604.14726 2026-04-22 cs.LG cs.AI

Catching Every Ripple: Enhanced Anomaly Awareness via Dynamic Concept Adaptation

Jiaqi Zhu, Shaofeng Cai, Jie Chen, Fang Deng, Beng Chin Ooi, Wenqiao Zhang

Comments Accepted by IEEE TPAMI

2604.14164 2026-04-22 cs.CL

How to Fine-Tune a Reasoning Model? A Teacher-Student Cooperation Framework to Synthesize Student-Consistent SFT Data

Zixian Huang, Kaichen Yang, Xu Huang, Feiyang Hao, Qiming Ge, Bowen Li, He Du, Kai Chen, Qipeng Guo

2604.12600 2026-04-22 cs.CV cs.NA math.NA

Spatial-Spectral Adaptive Fidelity and Noise Prior Reduction Guided Hyperspectral Image Denoising

Xuelin Xie, Xiliang Lu, Zhengshan Wang, Yang Zhang, Long Chen

2604.12258 2026-04-22 cs.CL cs.AI

Coding-Free and Privacy-Preserving Agentic Framework for Data-Driven Clinical Research

Taehun Kim, Hyeryun Park, Hyeonhoon Lee, Yushin Lee, Kyungsang Kim, Hyung-Chul Lee

Comments 10 pages, 5 figures, 2 tables, Supplementary Appendix

2604.11721 2026-04-22 cs.CL cs.AI cs.LG

Evaluating Cooperation in LLM Social Groups through Elected Leadership

Ryan Faulkner, Anushka Deshpande, David Guzman Piedrahita, Joel Z. Leibo, Zhijing Jin

Comments Main text: 11 pages, 4 figures, 4 tables

2604.11582 2026-04-22 cs.CL cs.AI cs.LG

A Triadic Suffix Tokenization Scheme for Numerical Reasoning

Olga Chetverina

Comments v3: Updated to include analysis of N=1 scalability for SLM architectures

2604.11284 2026-04-22 cs.LG cs.AI cs.LO

THEIA: Learning Complete Kleene Three-Valued Logic in a Pure-Neural Modular Architecture

Augustus Haoyang Li

Comments 40 pages, 3 figures, 15 tables, 8 appendices (A-H)

Abstract

We present THEIA, a 2.75M modular neural architecture that learns the complete Kleene three-valued logic (K3) truth table from task data without external symbolic inference or hand-encoded K3 gate primitives. Across 5 seeds, THEIA achieves all 39 K3 rules at >99% per-rule accuracy. K3 learnability is not the central finding: Transformer baselines also reach >99% on all 39 rules, and flat MLPs match THEIA on Phase-1 accuracy within 0.04pp. The central findings are two properties of the learned system. (1) Uncertainty-verdict asymmetric propagation. The network preserves Has-Unknown at every upstream boundary (80.0/91.1/90.8/99.7% across Arith/Order/Set/Logic vs. ~52% majority) while final-verdict decodability stays at or below a 73.4% U-vs-non-U oracle reference under linear and nonlinear MLP probes. Activation patching on non-absorbent T->U configurations flips 4,898/4,898 OR pairs (4,719/4,719 AND) across 5 seeds, ruling out residual shortcuts. (2) Reliability spectrum under discretized end-to-end training, on task structures decomposable along the engine boundaries. A mod-3 sequential composition task generalizes from 5- to 500-step eval at 99.96+-0.04% (5 seeds). Under identical Gumbel-softmax training, flat MLPs collapse to chance by 50 steps; a 2x2 ResMLP depth x expansion grid reaches >=99% on only 3/20 (config, seed) trials; a pre-LN Transformer reaches 99.24+-0.34%. The 500-step figure is dominated by straight-through discretization preventing 0.999^500 compounding; the architectural separator is sustaining Phase-1 accuracy under Phase-3 end-to-end Gumbel training, where flat MLPs fail. Auxiliary: under matched optimizer settings THEIA reaches 12/12 Kleene coverage 6.5x faster than a parameter-comparable 8L Transformer; the ratio narrows to ~3.6x under Transformer-standard tuning. We did not perform a THEIA-optimal sweep; ratios are specific-config, not asymptotic.
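For readers unfamiliar with the logic named in the abstract, the following is a minimal sketch of strong Kleene three-valued logic (K3) and of the per-step error compounding the abstract alludes to. It is not the THEIA architecture or the authors' code: the integer encoding, the function names (`k3_not`, `k3_and`, `k3_or`, `truth_table`), and the assumed 0.999 per-step accuracy are illustrative assumptions, and the abstract does not say which operators make up its 39 rules.

```python
from itertools import product

# Strong Kleene (K3) truth values, encoded so that F < U < T.
# This integer encoding is an illustrative assumption, not taken from the paper.
F, U, T = 0, 1, 2
NAMES = {F: "F", U: "U", T: "T"}


def k3_not(a):
    """Negation swaps T and F and leaves U (unknown) unchanged."""
    return 2 - a


def k3_and(a, b):
    """Conjunction is the minimum under the ordering F < U < T."""
    return min(a, b)


def k3_or(a, b):
    """Disjunction is the maximum under the ordering F < U < T."""
    return max(a, b)


def truth_table(op):
    """Return the full 3x3 truth table of a binary K3 operator."""
    return {
        (NAMES[a], NAMES[b]): NAMES[op(a, b)]
        for a, b in product((F, U, T), repeat=2)
    }


and_table = truth_table(k3_and)
or_table = truth_table(k3_or)

# Compounding: if each step were independently correct with probability
# 0.999 (an assumed figure), a 500-step chain would be fully correct
# with probability 0.999 ** 500.
compound = 0.999 ** 500

if __name__ == "__main__":
    for name, table in (("AND", and_table), ("OR", or_table)):
        print(name)
        for (a, b), result in table.items():
            print(f"  {a} {name} {b} = {result}")
    print(f"0.999 ** 500 = {compound:.4f}")
```

Under these definitions, `T AND U` evaluates to `U` while `T OR U` evaluates to `T`, which shows how an unknown input propagates through one operator and is absorbed by the other. The final value, about 0.61, illustrates why an undiscretized 99.9%-accurate step would not by itself sustain accuracy over 500 steps.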

2604.10401 2026-04-22 cs.CL

NameBERT: Scaling Name-Based Nationality Classification with LLM-Augmented Open Academic Data

Cong Ming, Ruixin Shi, Yifan Hu

Comments 12 pages, 3 figures, 8 tables; accepted at the 39th Canadian Conference on Artificial Intelligence (Canadian AI 2026)

2604.08404 2026-04-22 cs.LG stat.ML

Adversarial Label Invariant Graph Data Augmentations for Out-of-Distribution Generalization

Simon Zhang, Ryan P. DeMilt, Kun Jin, Cathy H. Xia

Comments 22 pages, 3 figures, accepted at ICML SCIS 2023

2604.08014 2026-04-22 cs.CV

Bridging Time and Space: Decoupled Spatio-Temporal Alignment for Video Grounding

Xuezhen Tu, Jingyu Wu, Fangyu Kang, Qingpeng Nong, Kaijin Zhang, Chaoyue Niu, Fan Wu

2604.06798 2026-04-22 cs.LG cs.AI

MoBiE: Efficient Inference of Mixture of Binary Experts under Post-Training Quantization

Zhixiong Zhao, Zukang Xu, Zhixuan Chen, Dawei Yang

Comments Although previously revised, per strict university regulations regarding incorrect affiliation, I am unauthorized to retain this manuscript. Furthermore, fundamental derivation errors in the NGES section compromise the mathematical framework, alongside misleading overlapping wording. The paper is therefore withdrawn

2604.06665 2026-04-22 cs.CV

VDPP: Video Depth Post-Processing for Speed and Scalability

Daewon Yoon, Injun Baek, Sangyu Han, Yearim Kim, Nojun Kwak

Comments 8 pages, 6 figures. Accepted to CVPR 2026 ECV Workshop. Project page: https://github.com/injun-baek/VDPP

2604.05090 2026-04-22 cs.CL cs.LG

Multilingual Language Models Encode Script Over Linguistic Structure

Aastha A K Verma, Anwoy Chatterjee, Mehak Gupta, Tanmoy Chakraborty

Comments Accepted at ACL 2026 (Main)

2604.04516 2026-04-22 cs.LG cs.AI

GAIN: Multiplicative Modulation for Domain Adaptation

Hengshuai Yao, Xing Chen, Ahmed Murtadha, Guan Wang

2604.03476 2026-04-22 cs.CV cs.AI q-bio.BM

Fine-tuning DeepSeek-OCR-2 for Molecular Structure Recognition

Haocheng Tang, Xingyu Dang, Junmei Wang

2604.03261 2026-04-22 cs.CL cs.CY cs.HC

VIGIL: An Extensible System for Real-Time Detection and Mitigation of Cognitive Bias Triggers

Bo Kang, Sander Noels, Tijl De Bie

2604.03037 2026-04-22 cs.RO cs.AI cs.CV

ARM: Advantage Reward Modeling for Long-Horizon Manipulation

Yiming Mao, Zixi Yu, Weixin Mao, Yinhao Li, Qirui Hu, Zihan Lan, Minzhao Zhu, Hua Chen

2604.02368 2026-04-22 cs.AI cs.CL

Xpertbench: Expert Level Tasks with Rubrics-Based Evaluation

Xue Liu, Xin Ma, Yuxin Ma, Yongchang Peng, Duo Wang, Zhoufutu Wen, Ge Zhang, Kaiyuan Zhang, Xinyu Chen, Yida Ding, Tianci He, Jiani Hou, Liang Hu, Ziyun Huang, Yongzhe Hui, Jianpeng Jiao, Chennan Ju, Yingru Kong, Yiran Li, Jiashuo Liu, Mengyun Liu, Luyao Ma, Fei Ni, Yiqing Ni, Pengbo Niu, Yueyan Qiu, Yanle Ren, Xinyu Shen, Zilin Shi, Zaiyuan Wang, Wenjie Yue, Chun Zhang, Shiyu Zhang, Xinyi Zhang, Kaiwen Zhao, Zhenwei Zhu, Shanshan Wu, Qi Zhao, Wenhao Huang

2604.01375 2026-04-22 cs.AI

RIFT: A RubrIc Failure Mode Taxonomy and Automated Diagnostics

Zhengyang Qi, Charles Dickens, Derek Pham, Amanda Dsouza, Armin Parchami, Frederic Sala, Paroma Varma

2604.00161 2026-04-22 cs.CV

Q-Mask: Query-driven Causal Masks for Text Anchoring in OCR-Oriented Vision-Language Models

Longwei Xu, Feng Feng, Shaojie Zhang, Xin Chen, Hang Li, Anan Du, Hailong Yu, Pei Fu, Zhenbo Luo, Jian Luan

2603.29078 2026-04-22 cs.CL cs.LG

PolarQuant: Optimal Gaussian Weight Quantization via Hadamard Rotation for LLM Compression

Caio Vicentino

Comments Found some errors, I need to fix

2603.27889 2026-04-22 cs.CL

Article and Comment Frames Shape the Quality of Online Comments

Matteo Guida, Yulia Otmakhova, Eduard Hovy, Lea Frermann

2603.26815 2026-04-22 cs.CL cs.AI cs.IR

Resolving the Robustness-Precision Trade-off in Financial RAG through Hybrid Document-Routed Retrieval

Zhiyuan Cheng, Longying Lai, Yue Liu

Comments 18 pages, 4 figures, 9 tables. Submitted to Intelligent Systems with Applications

2603.22608 2026-04-22 cs.AI cs.CL

Understanding LLM Performance Degradation in Multi-Instance Processing: The Roles of Instance Count and Context Length

Jingxuan Chen, Mohammad Taher Pilehvar, Jose Camacho-Collados

Comments ACL 2026

2603.21298 2026-04-22 cs.CL cs.AI

More Than Sum of Its Parts: Deciphering Intent Shifts in Multimodal Hate Speech Detection

Runze Sun, Yu Zheng, Zexuan Xiong, Zhongjin Qu, Lei Chen, Jie Zhou, Jiwen Lu

2603.20530 2026-04-22 cs.RO cs.CV

Memory Over Maps: 3D Object Localization Without Reconstruction

Rui Zhou, Xander Yap, Jianwen Cao, Allison Lau, Boyang Sun, Marc Pollefeys

Comments 8 pages, 6 figures