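arXivDaily daily academic digest: synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.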

2407.11764 2026-04-14 cs.LG

Adversarial Robustness of Graph Transformers

Philipp Foth, Lukas Gosch, Simon Geisler, Leo Schwinn, Stephan Günnemann

Comments: TMLR 2025 (J2C-Certification: Presented @ ICLR 2026). A preliminary version appeared at the Differentiable Almost Everything Workshop at ICML 2024. Code available at https://github.com/isefos/gt_robustness

2407.11077 2026-04-14 cs.LG cs.AI

Deep deterministic policy gradient with symmetric data augmentation for lateral attitude tracking control of a fixed-wing aircraft

Yifei Li, Erik-Jan van Kampen

2405.11597 2026-04-14 cs.CL cs.AI

Language Reconstruction with Brain Predictive Coding from fMRI Data

Congchi Yin, Ziyi Ye, Piji Li

Comments: Accepted by ACL 2026

2310.17245 2026-04-14 cs.LG cs.AI

CROP: Conservative Reward for Model-based Offline Policy Optimization

Hao Li, Xiao-Hu Zhou, Shu-Hai Li, Mei-Jiang Gui, Xiao-Liang Xie, Shi-Qi Liu, Shuang-Yi Wang, Zhen-Qiu Feng, Zeng-Guang Hou

2305.16272 2026-04-14 cs.LG cs.GT stat.ML

Incentivizing Honesty among Competitors in Collaborative Learning and Optimization

Florian E. Dorner, Nikola Konstantinov, Georgi Pashaliev, Martin Vechev

Comments: Updated experimental results after fixing a mistake in the code. Previous version published in NeurIPS 2023; 37 pages, 5 figures

2211.14456 2026-04-14 cs.CV

TetraSphere: A Neural Descriptor for O(3)-Invariant Point Cloud Analysis

Pavlo Melnyk, Andreas Robinson, Michael Felsberg, Mårten Wadenbäck

Comments: CVPR 2024

2210.01751 2026-04-14 cs.AI cs.LO

Proportoids

Christian Antić

2604.11072 2026-04-14 cs.AI

Hodoscope: Unsupervised Monitoring for AI Misbehaviors

Ziqian Zhong, Shashwat Saxena, Aditi Raghunathan

2604.11070 2026-04-14 cs.AI

PRISM Risk Signal Framework: Hierarchy-Based Red Lines for AI Behavioral Risk

Seulki Lee

Comments: 13 pages, 13 tables, 1 appendix

2604.11066 2026-04-14 cs.CL

ks-pret-5m: a 5 million word, 12 million token kashmiri pretraining dataset

Haq Nawaz Malik, Nahfid Nissar

2604.11065 2026-04-14 cs.AI

AI Integrity: A New Paradigm for Verifiable AI Governance

Seulki Lee

Comments: 13 pages, 8 tables

2604.11061 2026-04-14 cs.LG cs.AI

Pando: Do Interpretability Methods Work When Models Won't Explain Themselves?

Ziqian Zhong, Aashiq Muhamed, Mona T. Diab, Virginia Smith, Aditi Raghunathan

2604.11052 2026-04-14 cs.SD

LaDA-Band: Language Diffusion Models for Vocal-to-Accompaniment Generation

Qi Wang, Zhexu Shen, Meng Chen, Guoxin Yu, Chaoxu Pang, Weifeng Zhao, Wenjiang Zhou

Comments: Submitted to ACMMM 2026. Under review

Abstract

Vocal-to-accompaniment (V2A) generation, which aims to transform a raw vocal recording into a fully arranged accompaniment, inherently requires jointly addressing an accompaniment trilemma: preserving acoustic authenticity, maintaining global coherence with the vocal track, and producing dynamic orchestration across a full song. Existing open-source approaches typically make compromises among these goals. Continuous-latent generation models can capture long musical spans but often struggle to preserve fine-grained acoustic detail. In contrast, discrete autoregressive models retain local fidelity but suffer from unidirectional generation and error accumulation in extended contexts. We present LaDA-Band, an end-to-end framework that introduces Discrete Masked Diffusion to the V2A task. Our approach formulates V2A generation as Discrete Masked Diffusion, i.e., a global, non-autoregressive denoising formulation that combines the representational advantages of discrete audio codec tokens with full-sequence bidirectional context modeling. This design improves long-range structural consistency and temporal synchronization while preserving crisp acoustic details. Built on this formulation, LaDA-Band further introduces a dual-track prefix-conditioning architecture, an auxiliary replaced-token detection objective for weakly anchored accompaniment regions, and a two-stage progressive curriculum to scale Discrete Masked Diffusion to full-song vocal-to-accompaniment generation. Extensive experiments on both academic and real-world benchmarks show that LaDA-Band consistently improves acoustic authenticity, global coherence, and dynamic orchestration over existing baselines, while maintaining strong performance even without auxiliary reference audio. Codes and audio samples are available at https://github.com/Duoluoluos/TME-LaDA-Band .

2604.11050 2026-04-14 cs.CL cs.AI

Shared Emotion Geometry Across Small Language Models: A Cross-Architecture Study of Representation, Behavior, and Methodological Confounds

Jihoon Jeong

Comments: 34 pages, 6 figures, 1 table in main text + appendix. Ongoing series on Model Medicine

2604.11042 2026-04-14 cs.CV

Improving Layout Representation Learning Across Inconsistently Annotated Datasets via Agentic Harmonization

Renyu Li, Vladimir Kirilenko, Yao You, Crag Wolfe

Comments: 12 pages, 6 figures, 5 tables

2604.11041 2026-04-14 cs.AI

From Topology to Trajectory: LLM-Driven World Models For Supply Chain Resilience

Jia Luo

2604.11040 2026-04-14 cs.AI

Intelligent Approval of Access Control Flow in Office Automation Systems via Relational Modeling

Dugang Liu, Zulong Chen, Chuanfei Xu, Jiaxuan He, Yunlu Ma, Jia Xu

2604.11038 2026-04-14 cs.CV

EgoFun3D: Modeling Interactive Objects from Egocentric Videos using Function Templates

Weikun Peng, Denys Iliash, Manolis Savva

Comments: Project website: https://3dlg-hcvc.github.io/EgoFun3D/

2604.11037 2026-04-14 cs.LG cs.AI

RTMC: Step-Level Credit Assignment via Rollout Trees

Tao Wang, Suhang Zheng, Xiaoxiao Xu

2604.11036 2026-04-14 cs.CL cs.AI

Uncertainty-Aware Web-Conditioned Scientific Fact-Checking

Ashwin Vinod, Katrin Erk

2604.11035 2026-04-14 cs.AI

Introspective Diffusion Language Models

Yifan Yu, Yuqing Jian, Junxiong Wang, Zhongzhu Zhou, Donglin Zhuang, Xinyu Fang, Sri Yanamandra, Xiaoxia Wu, Qingyang Wu, Shuaiwen Leon Song, Tri Dao, Ben Athiwaratkun, James Zou, Fan Lai, Chenfeng Xu

2604.11025 2026-04-14 cs.CV

Test-time Scaling over Perception: Resolving the Grounding Paradox in Thinking with Images

Zheng Jiang, Yiming Chen, Nan He, Jiahui Chen, Chaoyang Li, Houde Qian, Lifeng Sun

2604.11020 2026-04-14 cs.RO cs.HC

Inferring World Belief States in Dynamic Real-World Environments

Jack Kolb, Aditya Garg, Nikolai Warner, Karen M. Feigh

Comments: 7 pages, 4 figures

2604.11014 2026-04-14 cs.CV

UHD-GPGNet: UHD Video Denoising via Gaussian-Process-Guided Local Spatio-Temporal Modeling

Weiyuan He, Chen Wu, Pengwen Dai, Wei Wang, Dianjie Lu, Guijuan Zhang, Linwei Fan, Yongzhen Wang, Zhuoran Zheng

2604.11012 2026-04-14 cs.AI cs.CL cs.LG

Min-$k$ Sampling: Decoupling Truncation from Temperature Scaling via Relative Logit Dynamics

Yuanhao Ding, Meimingwei Li, Esteban Garces Arias, Matthias Aßenmacher, Christian Heumann, Chongsheng Zhang

Comments: Accepted at ACL 2026 (Main Conference)

2604.11011 2026-04-14 cs.LG cs.CL cs.NE

K-Way Energy Probes for Metacognition Reduce to Softmax in Discriminative Predictive Coding Networks

Jon-Paul Cacioli

Comments: 33 pages, 3 figures

Abstract

We present this as a negative result with an explanatory mechanism, not as a formal upper bound. Predictive coding networks (PCNs) admit a K-way energy probe in which each candidate class is fixed as a target, inference is run to settling, and the per-hypothesis settled energies are compared. The probe appears to read a richer signal source than softmax, since the per-hypothesis energy depends on the entire generative chain. We argue this appearance is misleading under the standard Pinchetti-style discriminative PC formulation. We present an approximate reduction showing that with target-clamped CE-energy training and effectively-feedforward latent dynamics, the K-way energy margin decomposes into a monotone function of the log-softmax margin plus a residual that is not trained to correlate with correctness. The decomposition predicts that the structural probe should track softmax from below. We test this across six conditions on CIFAR-10: extended deterministic training, direct measurement of latent movement during inference, a post-hoc decoder fairness control on a backpropagation network, a matched-budget PC vs BP comparison, a five-point Langevin temperature sweep, and trajectory-integrated MCPC training. In every condition the probe sat below softmax. The gap was stable across training procedures within the discriminative PC family. Final-state and trajectory-integrated training produced probes whose AUROC_2 values differed by less than 10^-3 at deterministic evaluation. The empirical regime is small: single seed, 2.1M-parameter network, 1280 test images. We frame the result as a preprint inviting replication. We discuss conditions under which the decomposition does not apply (bidirectional PC, prospective configuration, generative PC, non-CE energy formulations) and directions for productive structural probing the analysis does not foreclose.

2604.11010 2026-04-14 cs.CV

Byte-level generative predictions for forensics multimedia carving

Jaewon Lee, Md Eimran Hossain Eimon, Avinash Srinivasan, Hari Kalva

Comments: Accepted for publication at the "SPIE Defense + Security" Conference

2604.11007 2026-04-14 cs.CV

Data-Efficient Semantic Segmentation of 3D Point Clouds via Open-Vocabulary Image Segmentation-based Pseudo-Labeling

Takahiko Furuya

2604.11006 2026-04-14 cs.CV

Towards Realistic 3D Emission Materials: Dataset, Baseline, and Evaluation for Emission Texture Generation

Zhiyuan Zhang, Zijian Zhou, Linjun Li, Long Chen, Hao Tang, Yichen Gong

Comments: Dataset will be available at https://github.com/yx345kw/EmissionGen

2604.11005 2026-04-14 cs.AI

Diffusion-CAM: Faithful Visual Explanations for dMLLMs

Haomin Zuo, Yidi Li, Luoxiao Yang, Xiaofeng Zhang

Comments: Accepted by ACL 2026 main conference