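arXivDaily: a daily academic digest that syncs the full arXiv feed and provides AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems science, and other fields.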

2604.22740 2026-04-27 eess.SP cs.IT math.IT

Minimax Optimal Procedures for Joint Detection and Estimation

Dominik Reinhard, Michael Fauß, Abdelhak M. Zoubir

Comments 13 pages, 3 figures, 2 tables

2604.22737 2026-04-27 eess.SY cs.SY math.CO math.OC

A Vehicle Routing Problem for Human-Centered Electric Mobility

Mostafa Emam, Björn Martens, Thomas Rottmann, Matthias Gerdts

Comments 7 pages, 5 figures, standard IEEE double-column format

2604.22724 2026-04-27 cs.RO cs.SY eess.SY

GCImOpt: Learning efficient goal-conditioned policies by imitating optimal trajectories

Jon Goikoetxea, Jesús F. Palacián

Comments Accepted for publication at the 8th Annual Conference on Learning for Dynamics and Control (L4DC 2026). 16 pages (including appendix), 1 figure. For project website, see https://jongoiko.github.io/gcimopt/

2604.22706 2026-04-27 eess.SP

When AI Meets Terahertz: A Survey on the Symbiosis of Artificial Intelligence and Terahertz Networks

Chong Han, Jingting Jiang, Zhengdong Hu, Meixia Tao, Wenjun Zhang

2604.22695 2026-04-27 eess.SP cs.LG

Time-Localized Parametric Decomposition of Respiratory Airflow for Sub-Breath Analysis

Victoria Ribeiro Rodrigues, Paul W. Davenport, Nicholas J. Napoli

Comments Submitted to IEEE Journal of Biomedical and Health Informatics (under review). 18 pages, 7 figures, 5 tables

2604.22682 2026-04-27 eess.SP

Mobility Aware Power Control for VCSEL Based Indoor OWC

Walter Zibusiso Ncube, Ahmad Adnan Qidan, Taisir El-Gorashi, Jaafar M. H. Elmirghani

2604.19935 2026-04-27 eess.SP

A Hybrid Gauss Markov LSTM Mobility Model for Indoor OWC

Walter Zibusiso Ncube, Ahmad Adnan Qidan, Taisir El-Gorashi, Jaafar M. H. Elmirghani

2604.18820 2026-04-27 stat.ML cs.LG eess.SP math.OC stat.AP

Sparse Network Inference under Imperfect Detection and its Application to Ecological Networks

Aoran Zhang, Tianyao Wei, Maria J. Guerrero, César A. Uribe

Comments 13 pages, 4 figures

2602.23338 2026-04-27 eess.SP physics.ins-det

CubeSounder: Low SWaP-C 180 GHz Radiometer for Atmospheric Sensing Tested on High Altitude Balloons

Kyle D. Massingill, Tyler M. Karasinski, Sean Bryan, Michael Baricuatro, Daniel Bliss, Delondrae Carter, Walter Goodwin, Jonathan Greenfield, Christopher Groppi, Philip Mauskopf, Philip Rybak, Scott Smas, Roshni Suresh, Sage Tinlin, Bianca Wullen, Peter Wullen

Comments 8 Pages, 11 Figures, Submitted to IEEE Transactions on Instrumentation and Measurement

2511.10571 2026-04-27 cs.LG cs.SY eess.SY math.PR

Differentiable Filtering for Learning Hidden Markov Models

Reginald Zhiyan Chen, Heng-Sheng Chang, Prashant G. Mehta

Comments 20 pages, 8 figures, accepted to conference: L4DC 2026

2511.06203 2026-04-27 eess.IV

SPASHT: An image-enhancement method for sparse-view MPI SPECT

Zezhang Yang, Zitong Yu, Nuri Choi, Janice Tania, Wenxuan Xue, Barry A. Siegel, Abhinav K. Jha

Comments The paper was withdrawn because the original submission was an early draft manuscript and not the final version for publication

2307.07580 2026-04-27 math.OC cs.SY eess.SY

Home Battery Dispatch under a Tiered Peak Power Tariff

David Pérez-Piñeiro, Sigurd Skogestad, Stephen Boyd

2604.22624 2026-04-27 math.OC cs.SY eess.SY

Compositional Online Learning for Multi-Objective System Co-Design

Meshal Alharbi, Munther A. Dahleh, Gioele Zardini

2604.22579 2026-04-27 eess.IV cs.CV cs.LG

Useful nonrobust features are ubiquitous in biomedical images

Coenraad Mouton, Randle Rabe, Niklas C. Koser, Nicolai Krekiehn, Christopher Hansen, Jan-Bernd Hövener, Claus-C. Glüer

Comments Accepted at The IEEE International Symposium on Biomedical Imaging (ISBI), 2026

2604.22557 2026-04-27 eess.IV cs.CV cs.LG

Are Natural-Domain Foundation Models Effective for Accelerated Cardiac MRI Reconstruction?

Anam Hashmi, Mayug Maniparambil, Julia Dietlmeier, Kathleen M. Curran, Noel E. O'Connor

Comments Accepted to CVPRW 2026

2604.22492 2026-04-27 eess.IV cs.CV

MTT-Bench: Predicting Social Dominance in Mice via Multimodal Large Language Models

Yunquan Chen, Haoyu Chen

Comments 8 pages, 2 figures. Submitted to conference

2604.22479 2026-04-27 cs.CV eess.IV

Improving Driver Drowsiness Detection via Personalized EAR/MAR Thresholds and CNN-Based Classification

Gökdeniz Ersoy, Mehmet Alper Tatar, Eray Tonbul, Serap Kırbız

2604.22478 2026-04-27 eess.SP

Time-Frequency Pilot Sequence Design and LoS Delay-Doppler Estimation

Aadarsh Devanand, Praful D. Mankar

Comments 6 pages

2604.22469 2026-04-27 eess.SP

The manifold of unitary and symmetric matrices: characterization, Riemannian optimization and application to BD-RIS design

Ignacio Santamaria, Carlos Beltrán, Eduard Jorswieck, Mohammad Soleymani, Jesus Gutiérrez

Comments 12 pages, 5 figures. arXiv admin note: text overlap with arXiv:2601.13877.

2604.22467 2026-04-27 eess.AS

DM-ASR: Diarization-aware Multi-speaker ASR with Large Language Models

Li Li, Ming Cheng, Weixin Zhu, Yannan Wang, Juan Liu, Ming Li

Abstract

Multi-speaker automatic speech recognition (ASR) aims to transcribe conversational speech involving multiple speakers, requiring the model to capture not only what was said, but also who said it and sometimes when it was spoken. Recent Speech-LLM approaches have shown the potential of unified modeling for this task, but jointly learning speaker attribution, temporal structure, and lexical recognition remains difficult and data-intensive. At the current stage, leveraging reliable speaker diarization as an explicit structural prior provides a practical and efficient way to simplify this task. To effectively exploit such priors, we propose DM-ASR, a diarization-aware multi-speaker ASR framework that reformulates the task as a multi-turn dialogue generation process. Given an audio chunk and diarization results, DM-ASR decomposes transcription into a sequence of speaker- and time-conditioned queries, each corresponding to one speaker in one time segment. This formulation converts multi-speaker recognition into a series of structured sub-tasks, explicitly decoupling speaker-temporal structure from linguistic content and enabling effective integration of diarization cues with the reasoning capability of large language models. We further introduce an optional word-level timestamp prediction mechanism that interleaves word and timestamp tokens, yielding richer structured outputs and better transcription quality. Our analysis shows that diarization systems provide more reliable speaker identities and segment-level boundaries, while LLMs excel at modeling linguistic content and long-range dependencies, demonstrating their complementary strengths. Experiments on Mandarin and English benchmarks show that the proposed approach achieves strong performance with relatively small models and training data, while remaining competitive with or outperforming existing unified approaches.
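A minimal Python sketch of the decomposition described above, assuming diarization output arrives as (speaker, start, end) segments: each segment becomes one speaker- and time-conditioned query, ordered by start time, so that each query can serve as one turn of the dialogue. It also shows one way to interleave word and timestamp tokens. The `Segment` class, the prompt wording, and the `<t>` timestamp-token format are hypothetical placeholders, not DM-ASR's actual templates or tokenizer.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    """One diarization segment (hypothetical structure)."""
    speaker: str  # speaker label from the diarization system, e.g. "spk1"
    start: float  # segment start time in seconds
    end: float    # segment end time in seconds


def build_queries(segments: List[Segment]) -> List[str]:
    """Return one speaker- and time-conditioned query per segment,
    in chronological order; each query would be one dialogue turn.

    The prompt wording is a placeholder, not DM-ASR's template.
    """
    ordered = sorted(segments, key=lambda s: s.start)
    return [
        f"Transcribe what {s.speaker} says from {s.start:.2f}s to {s.end:.2f}s."
        for s in ordered
    ]


def interleave_timestamps(words: List[Tuple[str, float]]) -> str:
    """Interleave each word with a timestamp token, e.g. 'hi <0.40>'.

    Assumes each word is paired with its end time; the '<t>' token
    format is hypothetical.
    """
    return " ".join(f"{word} <{t:.2f}>" for word, t in words)


segments = [Segment("spk2", 3.1, 5.0), Segment("spk1", 0.0, 3.1)]
for query in build_queries(segments):
    print(query)
print(interleave_timestamps([("hello", 0.52), ("world", 0.9)]))
```

In a full pipeline, each query would be followed by the model's transcription for that speaker and segment; the model call itself is outside the scope of this sketch.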

2604.22392 2026-04-27 cs.IT eess.SP math.IT

Multi-User ISAC with Heterogeneous Unknown Parameters: Optimal Beamforming based on Distribution Information

Chan Xu, Shuowen Zhang

Comments Accepted to appear in IEEE International Symposium on Information Theory (ISIT), 2026

2604.22338 2026-04-27 eess.IV cs.CV

Selective Depthwise Separable Convolution for Lightweight Joint Source-Channel Coding in Wireless Image Transmission

Ming Ye, Kui Cai, Cunhua Pan, Zhen Mei, Wanting Yang, Chunguo Li

Comments 5 pages, 6 figures, journal

2604.22327 2026-04-27 eess.SY cs.SY

Multi-robot obstacle-aware shepherding of non-cohesive target agents

Cinzia Tomaselli, Stefano Covone, Andreagiovanni Reina, Mario di Bernardo

Comments Accepted at ICRA 2026

2604.22323 2026-04-27 eess.SP

Fundamental Theorems on Controllability in Wave-domain Processing for Holographic MIMO

Davide Dardari

Comments 10 pages, 10 figures. Submitted to IEEE Trans. on Wireless Communications

2604.22318 2026-04-27 math.OC cs.GT cs.SY eess.SY

Strategically Robust Linear Quadratic Dynamic Games

Boris Velasevic, Nicolas Lanzetti, Eric Mazumdar

Comments 6 pages, 5 figures, 2 tables. Submitted to the 2026 IEEE Conference on Decision and Control (CDC)

2604.22315 2026-04-27 eess.SY cs.SY

Control of Multi-agent Systems under STL Specifications based on Prescribed Performance Observers

Tommaso Zaccherini, Siyuan Liu, Dimos V. Dimarogonas

Comments arXiv admin note: text overlap with arXiv:2602.05586.

2604.22290 2026-04-27 cs.SD cs.MM eess.AS

Transformer-Based Rhythm Quantization of Performance MIDI Using Beat Annotations

Maximilian Wachter, Sebastian Murgul, Michael Heizmann

Comments Accepted to the 5th International Conference on SMART MULTIMEDIA (ICSM), 2025

Abstract

Rhythm transcription is a key subtask of notation-level Automatic Music Transcription (AMT). While deep learning models have been extensively used for detecting the metrical grid in audio and MIDI performances, beat-based rhythm quantization remains largely unexplored. In this work, we introduce a novel deep learning approach for quantizing MIDI performances using a priori beat information. Our method leverages the transformer architecture to effectively process synchronized score and performance data for training a quantization model. Key components of our approach include dataset preparation, a beat-based pre-quantization method to align performance and score times within a unified framework, and a MIDI tokenizer tailored for this task. We adapt a transformer model based on the T5 architecture to meet the specific requirements of rhythm quantization. The model is evaluated using a set of score-level metrics designed for objective assessment of quantization performance. Through systematic evaluation, we optimize both data representation and model architecture. Additionally, we apply performance and score augmentations, such as transposition, note deletion, and performance-side time jitter, to enhance the model's robustness. Finally, a qualitative analysis compares our model's quantization performance against state-of-the-art probabilistic and deep-learning models on various example pieces. Our model achieves an onset F1-score of 97.3% and a note value accuracy of 83.3% on the ASAP dataset. It generalizes well across time signatures, including those not seen during training, and produces readable score output. Fine-tuning on instrument-specific datasets further improves performance by capturing characteristic rhythmic and melodic patterns. This work contributes a robust and flexible framework for beat-based MIDI quantization using transformer models.
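One plausible reading of a beat-based pre-quantization step, sketched in Python under stated assumptions: performance onsets in seconds are mapped to fractional beat positions by linear interpolation between annotated beat times, then snapped to a fixed subdivision grid. The function names, the linear interpolation, and the default of four subdivisions per beat are assumptions for illustration; the abstract does not specify the authors' exact procedure.

```python
from bisect import bisect_right
from typing import List


def time_to_beat(t: float, beats: List[float]) -> float:
    """Map a time in seconds to a fractional beat position.

    Assumes beats are strictly increasing. Interpolates linearly inside the
    annotated beat interval containing t; outside the annotated range it
    extrapolates using the first or last interval.
    """
    if len(beats) < 2:
        raise ValueError("need at least two beat annotations")
    i = bisect_right(beats, t) - 1
    i = min(max(i, 0), len(beats) - 2)  # clamp to a valid interval index
    b0, b1 = beats[i], beats[i + 1]
    return i + (t - b0) / (b1 - b0)


def pre_quantize(onsets: List[float], beats: List[float],
                 subdivisions: int = 4) -> List[float]:
    """Snap each onset to the nearest 1/subdivisions of a beat (in beat units).

    Python's round() breaks exact ties toward the even grid index.
    """
    return [round(time_to_beat(t, beats) * subdivisions) / subdivisions
            for t in onsets]


beats = [0.0, 0.5, 1.0, 1.6]      # annotated beat times in seconds
onsets = [0.01, 0.26, 0.74, 1.3]  # performed note onsets in seconds
print(pre_quantize(onsets, beats))  # [0.0, 0.5, 1.5, 2.5]
```

Interpolating per beat interval, rather than assuming a constant tempo, lets the grid follow tempo changes in the performance: in the example, the last interval is longer (0.6 s), yet the onset at 1.3 s still lands on beat position 2.5.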

2604.22276 2026-04-27 eess.AS cs.SD

Audio Effect Estimation with DNN-Based Prediction and Search Algorithm

Youichi Okita, Haruhiro Katayose

Comments Accepted for ICASSP2026

2604.22264 2026-04-27 eess.SP cs.IT math.IT

A General EM-Based Channel Model for Reconfigurable Antenna Systems

Chen Xu, Xianghao Yu

Comments 6 pages, 5 figures, conference

2604.22245 2026-04-27 eess.AS

Listening with Time: Precise Temporal Awareness for Long-Form Audio Understanding

Mingchen Shao, Hang Su, Wenjie Tian, Bingshen Mu, Zhennan Lin, Lichun Fan, Zhenbo Luo, Jian Luan, Lei Xie