arXivDaily: arXiv daily academic digest, updated Monday through Friday
2604.22740 2026-04-27 eess.SP cs.IT math.IT

Minimax Optimal Procedures for Joint Detection and Estimation

Dominik Reinhard, Michael Fauß, Abdelhak M. Zoubir

Comments 13 pages, 3 figures, 2 tables

We investigate the problem of jointly testing a pair of composite hypotheses and, depending on the test result, estimating a random parameter under distributional uncertainties. Specifically, it is assumed that the distribution of the data given the parameter of interest is subject to uncertainty. Both a Bayesian formulation and a Neyman-Pearson-like formulation are considered. It is shown that the optimal policy induces an $f$-similarity that must be maximized to identify the least favorable distributions. Besides the general results, the implementation is investigated using a band-type uncertainty model. For designing the minimax procedures, existing algorithms are modified to increase convergence speed while maintaining numerical stability. The proposed theory is supplemented by numerical results for both formulations.

2604.22737 2026-04-27 eess.SY cs.SY math.CO math.OC

A Vehicle Routing Problem for Human-Centered Electric Mobility

Mostafa Emam, Björn Martens, Thomas Rottmann, Matthias Gerdts

Comments 7 pages, 5 figures, standard IEEE double-column format

In this paper, we present the Electric Mobility Dial-a-Ride Problem (EM-DARP), which extends the Electric Vehicle Dial-a-Ride Problem (EV-DARP) to better accommodate human-focused mobility services. The problem involves utilizing a fleet of heterogeneous Electric Vehicles (EVs) to fulfill a set of customer requests with DARP and mobility-related specifications, while incorporating visits to charging stations amid requests. The problem is formulated as a Mixed-Integer Linear Program (MILP) and subsequently solved for a number of curated evaluation scenarios to demonstrate its practical applicability.

2604.22724 2026-04-27 cs.RO cs.SY eess.SY

GCImOpt: Learning efficient goal-conditioned policies by imitating optimal trajectories

Jon Goikoetxea, Jesús F. Palacián

Comments Accepted for publication at the 8th Annual Conference on Learning for Dynamics and Control (L4DC 2026). 16 pages (including appendix), 1 figure. For project website, see https://jongoiko.github.io/gcimopt/

Imitation learning is a well-established approach for machine-learning-based control. However, its applicability depends on having access to demonstrations, which are often expensive to collect and/or suboptimal for solving the task. In this work, we present GCImOpt, an approach to learn efficient goal-conditioned policies by training on datasets generated by trajectory optimization. Our approach for dataset generation is computationally efficient, can generate thousands of optimal trajectories in minutes on a laptop computer, and produces high-quality demonstrations. Further, by means of a data augmentation scheme that treats intermediate states as goals, we are able to increase the training dataset size by an order of magnitude. Using our generated datasets, we train goal-conditioned neural network policies that can control the system towards arbitrary goals. To demonstrate the generality of our approach, we generate datasets and then train policies for various control tasks, namely cart-pole stabilization, planar and three-dimensional quadcopter stabilization, and point reaching using a 6-DoF robot arm. We show that our trained policies can achieve high success rates and near-optimal control profiles, all while being small (less than 80,000 neural network parameters) and fast enough (up to more than 6,000 times faster than a trajectory optimization solver) that they could be deployed onboard resource-constrained controllers. We provide videos, code, datasets and pre-trained policies under a free software license; see our project website https://jongoiko.github.io/gcimopt/.
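
The augmentation step that treats intermediate states as goals can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `relabel_goals`, the `(state, goal, action)` tuple layout, and the toy trajectory are all assumptions made here.

```python
def relabel_goals(states, actions):
    """Build (state, goal, action) samples from one trajectory.

    Every later state states[j] (j > i) is treated as a goal for states[i],
    paired with the action actions[i] taken at step i.
    Assumed layout (hypothetical): len(states) == len(actions) + 1.
    """
    samples = []
    for i, action in enumerate(actions):
        for j in range(i + 1, len(states)):
            samples.append((states[i], states[j], action))
    return samples


# Toy 1-D trajectory with 4 states and 3 actions (illustrative values).
states = [0.0, 1.0, 2.0, 3.0]
actions = [1, 1, 1]
samples = relabel_goals(states, actions)

# Using only the final state as the goal would give 3 samples;
# relabeling with every later state gives 3 + 2 + 1 = 6.
print(len(samples))
```

For a trajectory with T actions this relabeling yields T(T+1)/2 samples instead of T, which is one way the dataset could grow by an order of magnitude; how the paper selects intermediate goals may differ.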

2604.22706 2026-04-27 eess.SP

When AI Meets Terahertz: A Survey on the Symbiosis of Artificial Intelligence and Terahertz Networks

Chong Han, Jingting Jiang, Zhengdong Hu, Meixia Tao, Wenjun Zhang

The Terahertz (THz) band (0.1-10 THz) has emerged as a critical frontier for future communication systems, offering ultra-wide bandwidths that enable Terabits-per-second (Tbps) wireless links and high-precision sensing and imaging. However, practical deployment of THz systems is hindered by unique challenges, including intricate channel characteristics, high-dimensional and large-scale optimization problems, and highly dynamic network environments. Artificial Intelligence (AI) serves as a transformative enabler to address these challenges, providing robust capabilities for precise modeling, advanced signal processing, complex optimization, real-time decision-making, and prediction, among others. Reciprocally, the unprecedented bandwidth and high-resolution sensing capabilities of THz networks provide a promising physical infrastructure for AI, facilitating training, inference, and data collection. This survey presents a systematic and comprehensive overview of AI-driven solutions across the entire THz communication network and the symbiosis of AI and THz networks. To begin with, a foundational overview of AI technologies tailored for wireless communications is presented. Subsequently, AI-based innovations are investigated, spanning from hardware design, channel modeling, physical layer optimization, up to higher-layer network protocols and advanced THz services, including mobile edge computing and sensing-empowered applications. In parallel, the capacity of THz networks to serve AI is examined, underscoring a profound paradigm shift towards a mutual symbiosis where AI and THz co-evolve and empower each other. Finally, by synthesizing these state-of-the-art advancements and identifying open research directions, this survey highlights the potential of AI as a copilot in the development of THz communication systems.

2604.22695 2026-04-27 eess.SP cs.LG

Time-Localized Parametric Decomposition of Respiratory Airflow for Sub-Breath Analysis

Victoria Ribeiro Rodrigues, Paul W. Davenport, Nicholas J. Napoli

Comments Submitted to IEEE Journal of Biomedical and Health Informatics (under review). 18 pages, 7 figures, 5 tables

Respiratory airflow signals provide critical insight into breathing mechanics, yet conventional analysis methods remain limited in their ability to characterize the internal structure of individual breaths. Traditional approaches treat airflow as a quasi-periodic signal and rely on global descriptors such as tidal volume or peak flow, obscuring sub-breath events that reflect neuromuscular coordination and compensatory breathing strategies. This study introduces a parametric framework for decomposing inspiratory airflow into a small number of time-localized components with explicit amplitude, onset time, and duration parameters. Unlike spectral or data-adaptive methods, the proposed approach employs physiologically grounded basis functions (Half-Sine, Gaussian, and Beta) to represent intrabreath waveform morphology through constrained nonlinear optimization. Evaluation across 8,276 breaths demonstrates high reconstruction accuracy (mean squared error $< 0.001$ for four-component models) and robust parameter precision under moderate noise. Component-derived features describing sub-breath timing and coordination improved classification of cognitive fatigue states arising from cognitive-respiratory competition by up to 30.7% in Matthews correlation coefficient compared with classical respiratory metrics. These results establish that modeling airflow as a sum of parameterized, time-localized primitives provides an interpretable and precise foundation for quantifying intrabreath organization, compensatory breathing dynamics, and respiratory motor control adaptation under cognitive-respiratory dual-task demands.
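
A sum of time-localized components with amplitude, onset, and duration parameters might look like the sketch below. Only the half-sine primitive is shown; its exact functional form, the function names, and the parameter values are assumptions made for illustration, and the constrained nonlinear fitting step is not included.

```python
import math


def half_sine(t, amplitude, onset, duration):
    """One half-sine lobe that is nonzero only on [onset, onset + duration]."""
    if onset <= t <= onset + duration:
        return amplitude * math.sin(math.pi * (t - onset) / duration)
    return 0.0


def airflow_model(t, components):
    """Sum of components, each given as (amplitude, onset, duration)."""
    return sum(half_sine(t, a, t0, d) for a, t0, d in components)


# Two overlapping components (illustrative values, arbitrary units).
components = [(1.0, 0.0, 1.0), (0.5, 0.5, 1.0)]
samples = [airflow_model(0.1 * k, components) for k in range(16)]
```

Each component contributes three interpretable parameters, so a four-component model of this kind would be described by twelve numbers per breath.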

2604.22682 2026-04-27 eess.SP

Mobility Aware Power Control for VCSEL Based Indoor OWC

Walter Zibusiso Ncube, Ahmad Adnan Qidan, Taisir El-Gorashi, Jaafar M. H. Elmirghani

Optical wireless communication (OWC) is a promising technology for supporting data intensive services in indoor environments due to its large unregulated spectrum, high spatial reuse, and potential for multigigabit data rates. In particular, vertical cavity surface emitting laser (VCSEL) based systems enable highly directional transmission, allowing efficient spatial separation of users and improved link performance. However, the use of narrow optical beams also makes system performance highly sensitive to user mobility and device orientation, as movement directly affects beam alignment and optical channel gain. Consequently, power allocation strategies that ignore mobility dynamics often provision excess optical power to maintain reliable connectivity, resulting in inefficient energy use. In this work, a power control framework for dynamic indoor OWC networks that explicitly accounts for mobility driven channel variation is developed. It uses a hybrid Gauss Markov and learning based approach that captures both user movement continuity and behaviour driven orientation changes. The mobility states are then used to guide power allocation decisions. Simulation results show that incorporating mobility aware channel prediction enables more accurate power allocation, and improves energy efficiency compared with conventional power control schemes in dynamic indoor environments.

2604.19935 2026-04-27 eess.SP

A Hybrid Gauss Markov LSTM Mobility Model for Indoor OWC

Walter Zibusiso Ncube, Ahmad Adnan Qidan, Taisir El-Gorashi, Jaafar M. H. Elmirghani

Optical wireless communication (OWC) has emerged as a promising candidate for future high-capacity indoor wireless networks, driven by its large unregulated spectrum, high spatial reuse, and ability to support multi-gigabit data rates. However, OWC systems are highly sensitive to user mobility, as link performance depends strongly on the spatial alignment between transmitter and receiver. Accurate modelling of user position and device orientation is therefore essential for reliable channel estimation and system evaluation. To that effect, this paper proposes a hybrid Gauss-Markov and long short-term memory (GM-LSTM) mobility model for indoor OWC environments. The Gauss-Markov component captures the temporal correlation of user motion, while the LSTM learns residual behaviour to model non-linear movement patterns and orientation dynamics. The proposed model jointly predicts user position and device orientation, enabling improved representation of mobility in OWC channels. Performance is evaluated using prediction accuracy and per-user data rate evolution. Results show that the proposed hybrid GM-LSTM model outperforms conventional Random Waypoint and Gauss-Markov models, providing more accurate mobility prediction and more stable communication performance in dynamic indoor environments.
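
The temporal correlation captured by a Gauss-Markov component can be illustrated with the commonly used first-order update $s_t = \alpha s_{t-1} + (1-\alpha)\mu + \sigma\sqrt{1-\alpha^2}\,w_t$ with $w_t \sim \mathcal{N}(0,1)$. The sketch below assumes this textbook form for a single scalar quantity (for example, speed); the function names and parameter values are hypothetical, and the LSTM residual model is not shown.

```python
import math
import random


def gauss_markov_step(value, mean, alpha, sigma, rng):
    """One first-order Gauss-Markov update; alpha in [0, 1] sets the memory."""
    noise = rng.gauss(0.0, 1.0)
    return (alpha * value
            + (1.0 - alpha) * mean
            + sigma * math.sqrt(1.0 - alpha ** 2) * noise)


def simulate(initial, mean, alpha, sigma, steps, seed=0):
    """Generate a trajectory of length steps + 1 starting from `initial`."""
    rng = random.Random(seed)
    values = [initial]
    for _ in range(steps):
        values.append(gauss_markov_step(values[-1], mean, alpha, sigma, rng))
    return values


# alpha close to 1 gives smooth, strongly correlated motion;
# alpha = 0 gives memoryless samples around the mean.
speeds = simulate(initial=1.0, mean=1.2, alpha=0.9, sigma=0.3, steps=50)
```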

2604.18820 2026-04-27 stat.ML cs.LG eess.SP math.OC stat.AP

Sparse Network Inference under Imperfect Detection and its Application to Ecological Networks

Aoran Zhang, Tianyao Wei, Maria J. Guerrero, César A. Uribe

Comments 13 pages, 4 figures

Recovering latent structure from count data has received considerable attention in network inference, particularly when one seeks both cross-group interactions and within-group similarity patterns in bipartite networks, which are widely used in ecological research. Such networks are often sparse and inherently imperfect in their detection. Existing models mainly focus on interaction recovery, while the induced similarity graphs are much less studied. Moreover, sparsity is often not controlled, and scale is unbalanced, leading to oversparse or poorly rescaled estimates with degraded structural recovery. To address these issues, we propose a framework for structured sparse nonnegative low-rank factorization with detection probability estimation. We impose nonconvex $\ell_{1/2}$ regularization on the latent similarity and connectivity structures to promote sparsity in within-group similarity and cross-group connectivity with better relative scale. The resulting optimization problem is nonconvex and nonsmooth. To solve it, we develop an ADMM-based algorithm with adaptive penalization and scale-aware initialization and establish its asymptotic feasibility and KKT stationarity of cluster points under mild regularity conditions. Experiments on synthetic and real-world ecological datasets demonstrate improved recovery of latent factors and similarity/connectivity structure relative to existing baselines.

2602.23338 2026-04-27 eess.SP physics.ins-det

CubeSounder: Low SWaP-C 180 GHz Radiometer for Atmospheric Sensing Tested on High Altitude Balloons

Kyle D. Massingill, Tyler M. Karasinski, Sean Bryan, Michael Baricuatro, Daniel Bliss, Delondrae Carter, Walter Goodwin, Jonathan Greenfield, Christopher Groppi, Philip Mauskopf, Philip Rybak, Scott Smas, Roshni Suresh, Sage Tinlin, Bianca Wullen, Peter Wullen

Comments 8 Pages, 11 Figures, Submitted to IEEE Transactions on Instrumentation and Measurement

Microwave sounding is the leading driver of global numerical weather forecasting, but is limited by the scalability of such instruments. With modern machining and commercial microwave components, it is now possible to design low size, weight, power, and cost (SWaP-C) microwave spectrometers while maintaining wide bandwidth performance. Here we report on the status of CubeSounder, a spectrometer tailored for water vapor radiometry that utilizes passive waveguide filter banks. After developing a prototype and high altitude balloon payload, we demonstrated CubeSounder on commercial stratospheric balloon flights. We report on our design process, especially the simulation and fabrication of the custom millimeter-wave filter banks. We also report the initial results of the data collected from the balloon flights.

2511.10571 2026-04-27 cs.LG cs.SY eess.SY math.PR

Differentiable Filtering for Learning Hidden Markov Models

Reginald Zhiyan Chen, Heng-Sheng Chang, Prashant G. Mehta

Comments 20 pages, 8 figures, accepted to conference: L4DC 2026

Hidden Markov Models (HMMs) are fundamental for modeling sequential data, yet learning their parameters from observations remains challenging. Classical methods like the Baum-Welch algorithm are computationally intensive and prone to local optima, while modern spectral algorithms offer provable guarantees but may produce probability outputs outside valid ranges. This work introduces Belief Net, a differentiable filtering framework that learns HMM parameters by formulating the forward filter as a structured neural network and optimizing it with stochastic gradient descent. This architecture recursively updates the belief state, which represents the posterior probability distribution over hidden states based on the observation history. Unlike black-box transformer models, Belief Net's learnable weights are explicitly the logits of the initial distribution, transition matrix, and emission matrix, ensuring full interpretability. The model processes observation sequences using a decoder-only (causal) architecture and is trained end-to-end with standard autoregressive next-observation prediction loss. On synthetic HMM data, Belief Net achieves faster convergence than Baum-Welch while successfully recovering parameters in both undercomplete and overcomplete settings, whereas spectral methods prove ineffective in the latter. Comparisons with transformer-based models are also presented on real-world language data.
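
The recursive belief update described above is the classical HMM forward filter. The sketch below parameterizes it by logits, as the abstract describes, and returns the log-likelihood, whose negative equals the summed next-observation prediction loss. It is a plain-Python illustration with hypothetical names, not the Belief Net implementation, and the gradient-based training loop is omitted.

```python
import math


def softmax(logits):
    """Map a list of logits to a probability vector (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward_filter(init_logits, trans_logits, emit_logits, observations):
    """Return the belief after each observation and the total log-likelihood."""
    pi = softmax(init_logits)
    A = [softmax(row) for row in trans_logits]  # A[i][j] = P(next=j | current=i)
    B = [softmax(row) for row in emit_logits]   # B[i][y] = P(obs=y | state=i)
    n = len(pi)
    predicted = pi  # prior over the hidden state before seeing the observation
    beliefs, log_lik = [], 0.0
    for y in observations:
        unnorm = [predicted[i] * B[i][y] for i in range(n)]
        evidence = sum(unnorm)  # P(y_t | y_1, ..., y_{t-1})
        log_lik += math.log(evidence)
        belief = [u / evidence for u in unnorm]  # posterior over hidden states
        beliefs.append(belief)
        # Propagate the posterior through the transition matrix.
        predicted = [sum(belief[i] * A[i][j] for i in range(n)) for j in range(n)]
    return beliefs, log_lik


# Two hidden states, two observation symbols (illustrative logits).
beliefs, log_lik = forward_filter(
    init_logits=[0.0, 0.0],
    trans_logits=[[2.0, 0.0], [0.0, 2.0]],
    emit_logits=[[1.5, 0.0], [0.0, 1.5]],
    observations=[0, 0, 1],
)
```

Because every operation here is differentiable in the logits, the same recursion written in an automatic-differentiation framework could be trained by gradient descent on `-log_lik`.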

2511.06203 2026-04-27 eess.IV

SPASHT: An image-enhancement method for sparse-view MPI SPECT

Zezhang Yang, Zitong Yu, Nuri Choi, Janice Tania, Wenxuan Xue, Barry A. Siegel, Abhinav K. Jha

Comments The paper was withdrawn because the original submission was an early draft manuscript and not the final version for publication

Single-photon emission computed tomography for myocardial perfusion imaging (MPI SPECT) is a widely used diagnostic tool for coronary artery disease. However, the procedure requires considerable scanning time, leading to patient discomfort and the potential for motion-induced artifacts. Reducing the number of projection views while keeping the time per view unchanged provides a mechanism to shorten the scanning time. However, this approach leads to increased sampling artifacts, higher noise, and hence limited image quality. To address these issues, we propose sparse-view SPECT image enhancement (SPASHT), inherently training the algorithm to improve performance on defect-detection tasks. We objectively evaluated SPASHT on the clinical task of detecting perfusion defects in a retrospective clinical study using data from patients who underwent MPI SPECT, where the defects were clinically realistic and synthetically inserted. The study was conducted for several reduced numbers of projection views, including 1/6, 1/3, and 1/2 of the typical projection views for MPI SPECT. Performance on the detection task was quantified using area under the receiver operating characteristic curve (AUC). Images obtained with SPASHT yielded significantly improved AUC compared to those obtained with the sparse-view protocol for all the considered reduced numbers of projection views. To further assess performance, a human observer study on the task of detecting perfusion defects was conducted. Results from the human observer study showed improved detection performance with images reconstructed using SPASHT compared to those from the sparse-view protocol. The results provide evidence of the efficacy of SPASHT in improving the quality of sparse-view MPI SPECT images and motivate further clinical validation.

2307.07580 2026-04-27 math.OC cs.SY eess.SY

Home Battery Dispatch under a Tiered Peak Power Tariff

David Pérez-Piñeiro, Sigurd Skogestad, Stephen Boyd

We consider the problem of operating a battery in a home connected to the grid to minimize electricity cost, which combines an energy charge and a tiered peak power charge based on the average of the $N$ largest daily peak powers in each billing month. With perfect foresight of loads and prices, the minimum cost is the solution of a mixed-integer linear program (MILP), which provides a lower bound on the cost of any implementable policy. We propose a model predictive control (MPC) policy that uses simple forecasts of loads and prices and solves a small MILP at each time step. Numerical experiments on one year of data from a home in Trondheim, Norway, show that the MPC policy attains a cost within $1.7\%$ of the prescient bound, and saves close to three times as much as the best rule-based policy we consider.
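
The peak power charge described above depends on the average of the $N$ largest daily peak powers in the billing month, which is then mapped to a tier. A minimal sketch of that billing rule follows; the tier limits, prices, and function names are hypothetical placeholders, not the actual tariff used in the paper.

```python
def average_top_peaks(daily_peaks, n):
    """Average of the n largest daily peak powers (kW) in a billing month."""
    top = sorted(daily_peaks, reverse=True)[:n]
    return sum(top) / len(top)


def tiered_charge(avg_peak, tiers):
    """Look up the monthly charge for the tier containing avg_peak.

    tiers: list of (upper_limit_kw, monthly_charge), sorted by upper limit.
    """
    for upper_limit, charge in tiers:
        if avg_peak <= upper_limit:
            return charge
    raise ValueError("tiers must end with an unbounded upper limit")


# Hypothetical month with five daily peaks and made-up tiers.
daily_peaks = [3.0, 7.5, 4.2, 6.0, 5.1]
tiers = [(2.0, 10.0), (5.0, 20.0), (10.0, 35.0), (float("inf"), 60.0)]

avg = average_top_peaks(daily_peaks, n=3)  # (7.5 + 6.0 + 5.1) / 3
charge = tiered_charge(avg, tiers)
```

The step-shaped tier lookup is what makes the cost nonconvex in the battery schedule, which is consistent with the abstract's use of a mixed-integer formulation.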

2604.22624 2026-04-27 math.OC cs.SY eess.SY

Compositional Online Learning for Multi-Objective System Co-Design

Meshal Alharbi, Munther A. Dahleh, Gioele Zardini

Many engineered systems must balance competing objectives, such as performance and safety, cost and reliability, or efficiency and sustainability, and are naturally modeled as compositions of interacting subsystems. We study online multi-objective decision-making in monotone co-design, where functionalities and resources are partially ordered, and the goal is to identify the target-feasible antichain of non-dominated trade-offs using few expensive evaluations. We introduce optimistic evaluators: history-dependent bounds on functionality and resource mappings that enable safe elimination of implementations before full evaluation. Based on these evaluators, we develop an elimination-based rejection-sampling algorithm, prove its soundness, and show that the admissible region shrinks monotonically as information accumulates. We instantiate the framework under monotonicity, Lipschitz continuity, and linear-parametric structure. For compositional co-design problems modeled by multigraphs, we show how local optimistic certificates propagate through the tractable remainder of the graph to yield system-level optimistic feasibility and resource bounds. Experiments on multi-robot fleet design, intermodal mobility systems, and synthetic monotone and Lipschitz benchmarks show substantial sample-efficiency gains over uniform sampling, Bayesian optimization, and multi-objective evolutionary algorithms.
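
An antichain of non-dominated trade-offs is the set of points that no other point improves on in every coordinate. The sketch below filters such a set from fully evaluated resource vectors under componentwise minimization; it illustrates the target object only, with hypothetical names and data, and does not model the optimistic evaluators or the elimination-based sampling of the paper.

```python
def dominates(a, b):
    """True if a is no worse than b in every coordinate and differs from b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def antichain(points):
    """Keep the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (cost, mass) pairs for five candidate designs.
resources = [(1, 5), (2, 2), (3, 3), (5, 1), (2, 6)]
front = antichain(resources)
print(front)  # [(1, 5), (2, 2), (5, 1)]
```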

2604.22579 2026-04-27 eess.IV cs.CV cs.LG

Useful nonrobust features are ubiquitous in biomedical images

Coenraad Mouton, Randle Rabe, Niklas C. Koser, Nicolai Krekiehn, Christopher Hansen, Jan-Bernd Hövener, Claus-C. Glüer

Comments Accepted at The IEEE International Symposium on Biomedical Imaging (ISBI), 2026

We study whether deep networks for medical imaging learn useful nonrobust features - predictive input patterns that are not human interpretable and highly susceptible to small adversarial perturbations - and how these features impact test performance. We show that models trained only on nonrobust features achieve well above chance accuracy across five MedMNIST classification tasks, confirming their predictive value in-distribution. Conversely, adversarially trained models that primarily rely on robust features sacrifice in-distribution accuracy but yield markedly better performance under controlled distribution shifts (MedMNIST-C). Overall, nonrobust features boost standard accuracy yet degrade out-of-distribution performance, revealing a practical robustness-accuracy trade-off in medical imaging classification tasks that should be tailored to the requirements of the deployment setting.

2604.22557 2026-04-27 eess.IV cs.CV cs.LG

Are Natural-Domain Foundation Models Effective for Accelerated Cardiac MRI Reconstruction?

Anam Hashmi, Mayug Maniparambil, Julia Dietlmeier, Kathleen M. Curran, Noel E. O'Connor

Comments Accepted to CVPRW 2026

The emergence of large-scale pretrained foundation models has transformed computer vision, enabling strong performance across diverse downstream tasks. However, their potential for physics-based inverse problems, such as accelerated cardiac MRI reconstruction, remains largely underexplored. In this work, we investigate whether natural-domain foundation models can serve as effective image priors for accelerated cardiac MRI reconstruction, and compare the performance obtained against domain-specific counterparts such as BiomedCLIP. We propose an unrolled reconstruction framework that incorporates pretrained, frozen visual encoders, such as CLIP, DINOv2, and BiomedCLIP, within each cascade to guide the reconstruction process. Through extensive experiments, we show that while task-specific state-of-the-art reconstruction models such as E2E-VarNet achieve superior performance in standard in-distribution settings, foundation-model-based approaches remain competitive. More importantly, in challenging cross-domain scenarios, where models are trained on cardiac MRI and evaluated on anatomically distinct knee and brain datasets, foundation models exhibit improved robustness, particularly under high acceleration factors and limited low-frequency sampling. We further observe that natural-image-pretrained models, such as CLIP, learn highly transferable structural representations, while domain-specific pretraining (BiomedCLIP) provides modest additional gains in more ill-posed regimes. Overall, our results suggest that pretrained foundation models offer a promising source of transferable priors, enabling improved robustness and generalization in accelerated MRI reconstruction.

2604.22492 2026-04-27 eess.IV cs.CV

MTT-Bench: Predicting Social Dominance in Mice via Multimodal Large Language Models

Yunquan Chen, Haoyu Chen

Comments 8 pages, 2 figures. Submitted to conference

Understanding social dominance in animal behavior is critical for neuroscience and behavioral studies. In this work, we explore the capability of Multimodal Large Language Models (MLLMs) to analyze raw behavioral video of mice and predict their dominance hierarchy. We introduce MTT-Bench, a novel benchmark comprising annotated videos of pairwise mouse interactions for Mouse Tube Test analysis. Building on existing MLLM architectures, we fine-tune these models to perform zero-shot inference on unseen behavioral sequences, predicting social dominance without explicit labels during testing. Our framework demonstrates promising results, showing high agreement with tube test rankings. This work opens a new direction for applying foundation models to ethology and social behavior analysis, without the need to design domain-specific models.

2604.22479 2026-04-27 cs.CV eess.IV

Improving Driver Drowsiness Detection via Personalized EAR/MAR Thresholds and CNN-Based Classification

Gökdeniz Ersoy, Mehmet Alper Tatar, Eray Tonbul, Serap Kırbız

Driver drowsiness is a major cause of traffic accidents worldwide, posing a serious threat to public safety. Vision-based driver monitoring systems often rely on fixed Eye Aspect Ratio (EAR) and Mouth Aspect Ratio (MAR) thresholds; however, such fixed values frequently fail to generalize across individuals due to variations in facial structure, illumination, and driving conditions. This paper proposes a personalized driver drowsiness detection system that monitors eyelid movements, head position, and yawning behavior in real time and provides warnings when signs of fatigue are detected. The system employs driver-specific EAR and MAR thresholds, calibrated before driving, to improve classical metric-based detection. In addition, deep learning-based Convolutional Neural Network (CNN) models are integrated to enhance accuracy in challenging scenarios. The system is evaluated using publicly available datasets as well as a custom dataset collected under diverse lighting conditions, head poses, and user characteristics. Experimental results show that personalized thresholding improves detection accuracy by 2-3% compared to fixed thresholds, while CNN-based classification achieves 99.1% accuracy for eye state detection and 98.8% for yawning detection, demonstrating the effectiveness of combining classical metrics with deep learning for robust real-time driver monitoring.
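
The Eye Aspect Ratio is commonly computed from six eye landmarks $p_1,\dots,p_6$ as $\mathrm{EAR} = \frac{\|p_2 - p_6\| + \|p_3 - p_5\|}{2\,\|p_1 - p_4\|}$. The sketch below uses that standard definition and derives a driver-specific threshold as a fraction of a calibrated open-eye baseline. The calibration rule, the `fraction=0.75` value, and the function names are assumptions for illustration; the abstract does not specify how the thresholds are calibrated.

```python
import math


def eye_aspect_ratio(p):
    """EAR from six (x, y) landmarks ordered p1..p6 around the eye.

    p[0] and p[3] are the eye corners; p[1], p[2] lie on the upper lid and
    p[5], p[4] are the corresponding points on the lower lid.
    """
    vertical = math.dist(p[1], p[5]) + math.dist(p[2], p[4])
    horizontal = math.dist(p[0], p[3])
    return vertical / (2.0 * horizontal)


def personalized_threshold(calibration_ears, fraction=0.75):
    """Hypothetical rule: a fixed fraction of the driver's open-eye mean EAR."""
    baseline = sum(calibration_ears) / len(calibration_ears)
    return fraction * baseline


def is_eye_closed(ear, threshold):
    return ear < threshold


# Synthetic landmarks for an open eye (illustrative coordinates).
open_eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
ear = eye_aspect_ratio(open_eye)          # (2 + 2) / (2 * 3)
threshold = personalized_threshold([ear, ear, ear])
```

A Mouth Aspect Ratio for yawning detection can be computed the same way from mouth landmarks, with its own calibrated threshold.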

2604.22478 2026-04-27 eess.SP

Time-Frequency Pilot Sequence Design and LoS Delay-Doppler Estimation

Aadarsh Devanand, Praful D. Mankar

Comments 6 pages

We present a novel framework for line-of-sight (LoS) delay-Doppler (DD) estimation in dense scattering propagation environments. We present two time-frequency (TF) domain pilot sequences inspired by the Zadoff-Chu sequence that exhibit desirable autocorrelation properties. Further, we present a twisted convolution-based approach for LoS DD estimation directly from the TF-domain received signal, avoiding an additional TF to DD transformation, which is commonly found in literature. Numerical results from simulations demonstrate that the proposed framework significantly outperforms traditional single-carrier Zadoff-Chu sequences in both delay and Doppler estimation over a wide range of Rician fading factor and SNR values.
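
The autocorrelation property that makes Zadoff-Chu sequences attractive as pilots can be checked numerically. The sketch below generates the classical odd-length sequence $x_u[n] = e^{-j\pi u n(n+1)/N}$ and evaluates its periodic autocorrelation; it covers only the standard single-carrier sequence, not the time-frequency pilot designs or the twisted-convolution estimator proposed in the paper, and the root and length are illustrative choices.

```python
import cmath


def zadoff_chu(root, length):
    """Classical Zadoff-Chu sequence for odd `length` and gcd(root, length) = 1."""
    return [cmath.exp(-1j * cmath.pi * root * n * (n + 1) / length)
            for n in range(length)]


def periodic_autocorrelation(seq, lag):
    """Circular autocorrelation sum_k seq[k + lag] * conj(seq[k])."""
    n = len(seq)
    return sum(seq[(k + lag) % n] * seq[k].conjugate() for k in range(n))


zc = zadoff_chu(root=5, length=13)
peak = abs(periodic_autocorrelation(zc, 0))
sidelobes = [abs(periodic_autocorrelation(zc, lag)) for lag in range(1, 13)]
```

The peak equals the sequence length while every nonzero lag is zero up to rounding error, so a delayed copy of the pilot produces a single sharp correlation peak at the delay.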

2604.22469 2026-04-27 eess.SP

The manifold of unitary and symmetric matrices: characterization, Riemannian optimization and application to BD-RIS design

Ignacio Santamaria, Carlos Beltrán, Eduard Jorswieck, Mohammad Soleymani, Jesus Gutiérrez

Comments 12 pages, 5 figures. arXiv admin note: text overlap with arXiv:2601.13877

This paper proposes and analyzes Riemannian optimization algorithms on the manifold of unitary and symmetric matrices, denoted ${\cal {U}}_s$, which naturally models the scattering matrices of passive and reciprocal devices such as beyond-diagonal reconfigurable intelligent surfaces (BD-RISs). Despite its relevance, the geometry of ${\cal {U}}_s$ has remained largely unexplored, and existing BD-RIS optimization methods either ignore the symmetry constraint or rely on costly Takagi-based parameterizations. We first provide a rigorous geometric characterization of ${\cal {U}}_s$, deriving its tangent space, a simple retraction, and closed-form expressions for geodesics. Building on these results, we develop two Riemannian manifold optimization (MO) algorithms tailored to ${\cal {U}}_s$: a line-search (LS) based scheme and a phase-optimization (PO) update along geodesics. We then apply the proposed framework to BD-RIS-assisted multiple-input multiple-output (MIMO) links, addressing sum-gain maximization, rate maximization, and minimum mean-square error problems, where they outperform existing approaches. Furthermore, we show that when the number of BD-RIS elements exceeds the total number of antennas, the optimal scattering matrix is low-rank, which motivates and enables efficient low-rank variants of the proposed algorithms.
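
A quick way to obtain a point on the set of unitary and symmetric matrices is to form $S = UU^{\mathsf{T}}$ from any unitary $U$: then $S^{\mathsf{T}} = S$ and $S^{\mathsf{H}}S = I$. The sketch below verifies both properties numerically for a 2x2 example. It is only a sanity check of the constraint set with illustrative values and helper names chosen here; it does not implement the retraction, geodesics, or optimization algorithms of the paper.

```python
import cmath
import math


def matmul(a, b):
    """Multiply two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transpose(a):
    return [list(row) for row in zip(*a)]


def conj_transpose(a):
    return [[x.conjugate() for x in row] for row in zip(*a)]


def max_abs_diff(a, b):
    """Largest entrywise absolute difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# An arbitrary 2x2 unitary matrix (angles are illustrative values).
alpha = math.cos(0.7) * cmath.exp(0.3j)
beta = math.sin(0.7) * cmath.exp(-0.9j)
U = [[alpha, beta], [-beta.conjugate(), alpha.conjugate()]]

S = matmul(U, transpose(U))  # symmetric and unitary by construction
identity = [[1, 0], [0, 1]]

symmetry_error = max_abs_diff(S, transpose(S))
unitarity_error = max_abs_diff(matmul(conj_transpose(S), S), identity)
```

Note that $S$ is symmetric, not Hermitian: the plain transpose is used, which matches the reciprocity constraint on scattering matrices.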

2604.22467 2026-04-27 eess.AS

DM-ASR: Diarization-aware Multi-speaker ASR with Large Language Models

Li Li, Ming Cheng, Weixin Zhu, Yannan Wang, Juan Liu, Ming Li

Multi-speaker automatic speech recognition (ASR) aims to transcribe conversational speech involving multiple speakers, requiring the model to capture not only what was said, but also who said it and sometimes when it was spoken. Recent Speech-LLM approaches have shown the potential of unified modeling for this task, but jointly learning speaker attribution, temporal structure, and lexical recognition remains difficult and data-intensive. At the current stage, leveraging reliable speaker diarization as an explicit structural prior provides a practical and efficient way to simplify this task. To effectively exploit such priors, we propose DM-ASR, a diarization-aware multi-speaker ASR framework that reformulates the task as a multi-turn dialogue generation process. Given an audio chunk and diarization results, DM-ASR decomposes transcription into a sequence of speaker- and time-conditioned queries, each corresponding to one speaker in one time segment. This formulation converts multi-speaker recognition into a series of structured sub-tasks, explicitly decoupling speaker-temporal structure from linguistic content and enabling effective integration of diarization cues with the reasoning capability of large language models. We further introduce an optional word-level timestamp prediction mechanism that interleaves word and timestamp tokens, yielding richer structured outputs and better transcription quality. Our analysis shows that diarization systems provide more reliable speaker identities and segment-level boundaries, while LLMs excel at modeling linguistic content and long-range dependencies, demonstrating their complementary strengths. Experiments on Mandarin and English benchmarks show that the proposed approach achieves strong performance with relatively small models and training data, while remaining competitive with or outperforming existing unified approaches.

2604.22392 2026-04-27 cs.IT eess.SP math.IT

Multi-User ISAC with Heterogeneous Unknown Parameters: Optimal Beamforming based on Distribution Information

Chan Xu, Shuowen Zhang

Comments Accepted to appear in IEEE International Symposium on Information Theory (ISIT), 2026

This paper studies an integrated sensing and communication (ISAC) system where a multi-antenna base station (BS) communicates with multiple single-antenna users in the downlink and senses the unknown and random angle information of a target based on its prior distribution information and the received echo signals. We focus on a challenging scenario with heterogeneous unknown parameters where the target's reflection coefficient is also unknown with no prior information. We consider a general transmit beamforming structure with both communication beams and dedicated sensing beams, where the communication users can cancel the interference caused by the pre-determined sensing signals. By adopting the periodic posterior Cramér-Rao bound (PCRB) to quantify a lower bound of the mean-cyclic error (MCE) for sensing the periodic angle parameter, we optimize the transmit beamforming to minimize the periodic PCRB, subject to individual communication user rate constraints, which is a non-convex problem. By leveraging the semi-definite relaxation (SDR) technique and Lagrange duality theory, we derive the optimal solution and prove that at most one dedicated sensing beam is needed. Numerical results validate our analysis and the effectiveness of the proposed beamforming design.

2604.22338 2026-04-27 eess.IV cs.CV

Selective Depthwise Separable Convolution for Lightweight Joint Source-Channel Coding in Wireless Image Transmission

Ming Ye, Kui Cai, Cunhua Pan, Zhen Mei, Wanting Yang, Chunguo Li

Comments 5 pages, 6 figures, journal

Details
English abstract

Depthwise separable convolutional (DSConv) layers have been successfully applied to deep learning (DL)-based joint source-channel coding (JSCC) schemes to reduce computational complexity. However, a systematic investigation of the layerwise and ratio-wise replacement of standard convolutional (Conv) layers with DSConv layers in JSCC systems for wireless image transmission remains largely unexplored. In this letter, we propose a configurable lightweight JSCC framework that incorporates a selective replacement strategy, enabling flexible substitution of standard Conv layers with DSConv layers at various layer positions and replacement ratios. By adjusting the proportion of layers replaced, we achieve different model compression levels and analyze their impact on reconstruction performance. Furthermore, we investigate how replacements at different encoder and decoder depths influence reconstruction quality under a fixed replacement ratio. Our results show that Conv-to-DSConv replacement at intermediate layers achieves a favorable complexity-performance trade-off, revealing layer-wise redundancy in DL-based JSCC systems. Extensive experiments further demonstrate that the proposed framework achieves substantial parameter reduction with only slight performance degradation, enabling flexible complexity-performance trade-offs for resource-constrained edge devices.
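
The complexity saving behind Conv-to-DSConv replacement can be illustrated with a weight count. The sketch below is not the authors' framework: the helper names, the channel widths, and the 5 x 5 kernel are assumptions chosen for illustration, and biases are ignored. It only uses the standard counts for a k x k convolution (k·k·C_in·C_out weights) and for a depthwise k x k convolution followed by a 1 x 1 pointwise convolution (k·k·C_in + C_in·C_out weights).

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def dsconv_params(c_in, c_out, k):
    """Weights in a depthwise k x k conv plus a 1 x 1 pointwise conv (bias ignored)."""
    return k * k * c_in + c_in * c_out


def model_params(channels, k, replaced):
    """Total weights of a layer stack.

    channels: channel widths, e.g. [3, 256, 256, 16] describes three layers.
    replaced: set of layer indices swapped from Conv to DSConv (hypothetical
              stand-in for a selective replacement strategy).
    """
    total = 0
    for i, (c_in, c_out) in enumerate(zip(channels[:-1], channels[1:])):
        if i in replaced:
            total += dsconv_params(c_in, c_out, k)
        else:
            total += conv_params(c_in, c_out, k)
    return total


if __name__ == "__main__":
    widths = [3, 256, 256, 16]  # assumed encoder widths, illustration only
    baseline = model_params(widths, k=5, replaced=set())
    middle = model_params(widths, k=5, replaced={1})
    print(baseline, middle, round(middle / baseline, 3))
```

In this toy stack the intermediate layer holds most of the weights, so replacing only that layer removes most of the parameters. Whether reconstruction quality survives the replacement is an empirical question that a count like this cannot answer.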

2604.22327 2026-04-27 eess.SY cs.SY

Multi-robot obstacle-aware shepherding of non-cohesive target agents

Cinzia Tomaselli, Stefano Covone, Andreagiovanni Reina, Mario di Bernardo

Comments Accepted at ICRA 2026

Details
English abstract

This paper presents a novel control strategy for multi-agent shepherding of non-cohesive targets in obstacle-rich environments. Unlike previous approaches that assume cohesive flocking behavior, our method handles targets that interact only with nearby herders through repulsive forces and exhibit no inter-target coordination. Each herder employs a hybrid control policy that combines direct goal-oriented steering with obstacle-tangent maneuvering, enabling targets to circumnavigate obstacles while being guided toward a goal region. The herder dynamics integrate three key behaviors: return-to-goal motion when idle, target steering with adaptive directional control, and obstacle avoidance using both normal and tangential force components. Numerical simulations demonstrate superior performance compared to existing shepherding methods, achieving higher target confinement rates in cluttered environments. Experimental validation using TurtleBot4 herders and Osoyoo target robots in an indoor arena confirms the practical effectiveness of the proposed approach.

2604.22323 2026-04-27 eess.SP

Fundamental Theorems on Controllability in Wave-domain Processing for Holographic MIMO

Davide Dardari

Comments 10 pages, 10 figures. Submitted to IEEE Trans. on Wireless Communications

Details
English abstract

Wave-domain processing is an emerging paradigm where signal processing operations are partially shifted from the digital to the electromagnetic (EM) domain. Leveraging reconfigurable EM devices, this approach aims to reduce complexity, energy consumption, and latency in next-generation wireless systems employing holographic MIMO. This paper establishes fundamental theorems on the controllability of generic reconfigurable EM devices, where wave processing is achieved through the dynamic configuration of passive scatterers. Specifically, we derive necessary and sufficient conditions for controllability as a function of geometry and mutual coupling between elements. Finally, we provide a detailed discussion and numerical results characterizing the interplay between the number of elements, physical size, degrees of freedom, and directivity.

2604.22318 2026-04-27 math.OC cs.GT cs.SY eess.SY

Strategically Robust Linear Quadratic Dynamic Games

Boris Velasevic, Nicolas Lanzetti, Eric Mazumdar

Comments 6 pages, 5 figures, 2 tables. Submitted to the 2026 IEEE Conference on Decision and Control (CDC)

Details
English abstract

We study linear quadratic dynamic games where players are uncertain about each other's control policies or goals and consequently seek to be strategically robust. Building on recent work on strategically robust and risk-averse game theory, we first formalize the problem of strategically robust linear quadratic dynamic games. We show that these can be rewritten as simple transformations of linear quadratic games in which each player chooses a controller in a fictitious game in which they are faced with an adversary who is penalized for deviating from the other players' policies. This formulation naturally induces a novel notion of dynamic equilibrium, which we call a strategically robust dynamic equilibrium. We establish existence and uniqueness of such equilibria and furthermore show that the equilibrium policies are Markovian, linear, and can be efficiently computed via coupled backward Riccati equations. Through numerical simulations, including experiments in a network game, we illustrate the benefits of strategic robustness in designing robust and resilient decentralized control schemes. Our experiments also expose a "free-lunch" phenomenon in games in which robustness does not incur a corresponding loss in performance but can yield improvements in players' utilities and social welfare.
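
For readers unfamiliar with backward Riccati recursions, the sketch below shows the classical single-player, discrete-time, finite-horizon case. It is not the coupled multi-player recursion of the paper, whose exact form the abstract does not give; the function name and the discrete-time setting are assumptions made here for illustration.

```python
import numpy as np


def lqr_backward_riccati(A, B, Q, R, Qf, horizon):
    """Finite-horizon discrete-time LQR solved by a backward Riccati recursion.

    Dynamics x[t+1] = A x[t] + B u[t], stage cost x'Qx + u'Ru, terminal cost x'Qf x.
    Returns the feedback gains K[0..horizon-1] (with u[t] = -K[t] x[t])
    and the cost-to-go matrix at time 0.
    """
    P = np.asarray(Qf, dtype=float)
    gains = []
    for _ in range(horizon):
        # Gain that minimizes the one-step cost plus the cost-to-go P.
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Riccati update: P <- Q + A'P A - A'P B K.
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    gains.reverse()  # gains were computed from the final step backwards
    return gains, P


if __name__ == "__main__":
    one = np.array([[1.0]])
    gains, P0 = lqr_backward_riccati(one, one, one, one, one, horizon=50)
    print(gains[0], P0)
```

The resulting policy is linear in the state and depends only on the current state and time, which is the single-player analogue of the Markovian, linear structure the abstract reports for the equilibrium policies.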

2604.22315 2026-04-27 eess.SY cs.SY

Control of Multi-agent Systems under STL Specifications based on Prescribed Performance Observers

Tommaso Zaccherini, Siyuan Liu, Dimos V. Dimarogonas

Comments arXiv admin note: text overlap with arXiv:2602.05586

Details
English abstract

This paper addresses decentralized control of large-scale heterogeneous multi-agent systems subject to bounded external disturbances and limited communication, with the objective of satisfying cooperative Signal Temporal Logic (STL) specifications. The considered specifications involve spatiotemporal tasks that require collaboration among multiple agents, including agents beyond direct communication neighborhoods. To address the communication constraints, a $k$-hop Prescribed Performance State Observer ($k$-hop PPSO) is designed to enable each agent to estimate the states of agents up to $k$ communication hops away using only information from $1$-hop neighbors, while guaranteeing predefined performance bounds on the estimation errors. The estimation error bounds are explicitly incorporated into a reformulation of the spatial robustness of the STL specifications, yielding robustness measures that account for worst-case estimation uncertainty. Based on the modified robustness, a decentralized continuous-time feedback control law is designed to guarantee satisfaction of the STL specifications in the presence of bounded disturbances and estimation errors. The proposed framework provides formal correctness guarantees using only local information and limited communication. Numerical simulations illustrate the theoretical results.

2604.22290 2026-04-27 cs.SD cs.MM eess.AS

Transformer-Based Rhythm Quantization of Performance MIDI Using Beat Annotations

Maximilian Wachter, Sebastian Murgul, Michael Heizmann

Comments Accepted to the 5th International Conference on SMART MULTIMEDIA (ICSM), 2025

Details
English abstract

Rhythm transcription is a key subtask of notation-level Automatic Music Transcription (AMT). While deep learning models have been extensively used for detecting the metrical grid in audio and MIDI performances, beat-based rhythm quantization remains largely unexplored. In this work, we introduce a novel deep learning approach for quantizing MIDI performances using a priori beat information. Our method leverages the transformer architecture to effectively process synchronized score and performance data for training a quantization model. Key components of our approach include dataset preparation, a beat-based pre-quantization method to align performance and score times within a unified framework, and a MIDI tokenizer tailored for this task. We adapt a transformer model based on the T5 architecture to meet the specific requirements of rhythm quantization. The model is evaluated using a set of score-level metrics designed for objective assessment of quantization performance. Through systematic evaluation, we optimize both data representation and model architecture. Additionally, we apply performance and score augmentations, such as transposition, note deletion, and performance-side time jitter, to enhance the model's robustness. Finally, a qualitative analysis compares our model's quantization performance against state-of-the-art probabilistic and deep-learning models on various example pieces. Our model achieves an onset F1-score of 97.3% and a note value accuracy of 83.3% on the ASAP dataset. It generalizes well across time signatures, including those not seen during training, and produces readable score output. Fine-tuning on instrument-specific datasets further improves performance by capturing characteristic rhythmic and melodic patterns. This work contributes a robust and flexible framework for beat-based MIDI quantization using transformer models.
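
One way to picture a beat-based pre-quantization step is as a change of time axis from seconds to beats. The sketch below is an assumption about how such a step could look, not the authors' method: the function names, the piecewise-linear interpolation between beat annotations, and the fixed subdivision grid are all hypothetical.

```python
import numpy as np


def seconds_to_beats(onsets_sec, beat_times_sec):
    """Map performance onset times (seconds) to fractional beat positions.

    Beat annotation i is assigned position i; onsets between two beats are
    placed by linear interpolation. np.interp clamps onsets that fall outside
    the annotated range to the first or last beat.
    """
    beat_times = np.asarray(beat_times_sec, dtype=float)
    beat_index = np.arange(len(beat_times), dtype=float)
    return np.interp(onsets_sec, beat_times, beat_index)


def snap_to_grid(beat_positions, subdivisions=4):
    """Round beat positions to the nearest 1/subdivisions of a beat (naive baseline)."""
    return np.round(np.asarray(beat_positions, dtype=float) * subdivisions) / subdivisions


if __name__ == "__main__":
    beats = [0.0, 0.5, 1.1, 1.6]  # annotated beat times with a tempo change
    onsets = [0.25, 0.8, 1.6]     # performed note onsets in seconds
    print(seconds_to_beats(onsets, beats))
```

Once onsets are expressed in beats, tempo fluctuations no longer distort the time axis, so performance and score times share one frame. The naive `snap_to_grid` rounding stands in for the kind of rule-based baseline that a learned quantization model is meant to improve on.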

2604.22276 2026-04-27 eess.AS cs.SD

Audio Effect Estimation with DNN-Based Prediction and Search Algorithm

Youichi Okita, Haruhiro Katayose

Comments Accepted for ICASSP2026

Details
Journal ref
Proceedings of the 2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 15952-15956, 2026
English abstract

Audio effects play an essential role in sound design. This research addresses the task of audio effect estimation, which aims to estimate the configuration of applied effects from a wet signal. Existing approaches to this problem can be categorized into predictive approaches, which use models pre-trained in a data-driven manner, and search-based approaches, which are based on wet signal reconstruction. In this study, we propose a novel approach that integrates these approaches: first, DNNs predict the dry signal and effect configuration, and then a search is performed based on wet signal reconstruction using these predictions. By estimating the dry signal in the prediction stage, it becomes possible to complement or improve the predictions using reconstruction similarity as an objective function. The experimental evaluation showed that methods based on the proposed approach outperformed the method solely based on the predictive approach. Furthermore, the findings suggest that the task division of predicting the effect type combination followed by the search-based estimation of order and parameters was the most effective across various metrics.

2604.22264 2026-04-27 eess.SP cs.IT math.IT

A General EM-Based Channel Model for Reconfigurable Antenna Systems

Chen Xu, Xianghao Yu

Comments 6 pages, 5 figures, conference

Details
English abstract

Reconfigurable antenna systems (RASs), such as fluid antennas and movable antennas, are poised to play a pivotal role in sixth-generation (6G) systems by dynamically adapting the antenna elements for system performance enhancement. However, unlocking their full potential requires channel models that accurately capture the influence of antenna configurations on the radiation, propagation, and reception of signals. Existing channel models suffer from several limitations, such as neglecting polarization effects, being restricted to specific antenna types, or relying on oversimplified assumptions. In this paper, we propose a general electromagnetic (EM)-based channel model grounded in spherical vector wave expansion (SVWE). The proposed EM-based channel model captures the impact of antenna position and orientation on the channel gain, thereby making it particularly well-suited for RASs. The effectiveness and accuracy are validated through comparisons with commercial simulation software, demonstrating excellent agreement in predicted channel gains. Moreover, it is shown that antenna orientation is a critical factor governing communication performance, and that dynamically adjusting the antenna orientation yields up to 70% improvement in achievable communication rate compared to a fixed-antenna configuration.

2604.22245 2026-04-27 eess.AS

Listening with Time: Precise Temporal Awareness for Long-Form Audio Understanding

Mingchen Shao, Hang Su, Wenjie Tian, Bingshen Mu, Zhennan Lin, Lichun Fan, Zhenbo Luo, Jian Luan, Lei Xie

Details
English abstract

While Large Audio Language Models (LALMs) achieve strong performance on short audio, they degrade on long-form inputs. This degradation is more severe in temporal awareness tasks, where temporal alignment becomes increasingly inaccurate as audio duration grows. We attribute these limitations to the lack of data, benchmarks, and modeling approaches tailored for long-form temporal awareness. To bridge this gap, we first construct LAT-Chronicle, a 1.2k-hour long-form audio dataset with temporal annotations across real-world scenarios. We further develop LAT-Bench, the first human-verified benchmark supporting audio up to 30 minutes while covering three core tasks: Dense Audio Caption, Temporal Audio Grounding, and Targeted Audio Caption. Leveraging these resources, we propose LAT-Audio, formulating temporal awareness as a progressive global-to-local reasoning paradigm. A global timeline is first constructed as an aligned temporal-semantic context, and the Think-With-Audio Chain-of-Thought (TWA-CoT) is then introduced to perform iterative reasoning by incorporating local audio information via tool use. Experiments show that LAT-Audio surpasses existing models on long-form audio temporal awareness tasks and improves robustness to input duration. We release the dataset, benchmark, and model to facilitate future research at https://github.com/alanshaoTT/LAT-Audio-Repo.