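arXivDaily: a daily academic digest synced with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and more.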

2502.01941 2026-05-13 cs.CL cs.AI

Semantic Integrity Matters: Benchmarking and Preserving High-Density Reasoning in KV Cache Compression

Xiang Liu, Zhenheng Tang, Hong Chen, Peijie Dong, Zeyu Li, Xiuze Zhou, Bo Li, Xuming Hu, Xiaowen Chu

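AI summary: This paper studies how key-value (KV) cache compression affects high-density reasoning in large language model inference, noting that current evaluations focus on sparse retrieval tasks and overlook the integrity of chain-of-thought (CoT) reasoning. The authors introduce the KVFundaBench benchmark, which reveals severe, task-dependent degradation on reasoning tasks at high compression ratios. Building on this, they propose ShotKV, which separates the prefill and decoding phases and keeps semantic units intact, improving accuracy on long-context generation tasks while reducing inference latency.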

Comments ICML 2026

2501.06857 2026-05-13 cs.AI

A Counterfactual Cause in Situation Calculus

Daxin Liu, Vaishak Belle

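AI summary: This paper proposes a notion of cause based on counterfactual analysis for explaining the causes of quantified effects in action histories within the situation calculus. Unlike existing definitions of actual achievement cause, the approach starts from a counterfactual perspective, generalizes more naturally to a definition of achievement cause, and is compared against the work of Batusov and Soutchanski. The paper also examines how this notion relates to Halpern and Pearl's theory of actual causality, in particular how the counterfactual perspective applies to disjunctive goals.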

Comments This version changes the working title of the extended report and fixes some errors

2501.03717 2026-05-13 cs.CV cs.AI cs.GR

Materialist: Physically Based Editing Using Single-Image Inverse Rendering

Lezhong Wang, Duc Minh Tran, Ruiqi Cui, Thomson TG, Anders Bjorholm Dahl, Siavash Arjomand Bigdeli, Jeppe Revall Frisvad, Manmohan Chandraker

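AI summary: This paper presents Materialist, a physically based editing method built on single-image inverse rendering that targets the lack of physical consistency in image editing. It combines neural networks with physically based rendering: a network predicts initial material properties, which progressive differentiable rendering then refines, enabling high-quality editing of materials, lighting, and object insertion. The method can edit transparent materials without full scene geometry, performs strongly on environment map estimation, and achieves strong results on both synthetic and real datasets.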

Comments More Comprehensive IJCV Camera-Ready Version. Project website: https://lez-s.github.io/materialist_project/

2412.05225 2026-05-13 cs.CL cs.AI cs.NE

BEExformer: A Fast Inferencing Binarized Transformer with Early Exits

Wazib Ansar, Saptarsi Goswami, Amlan Chakrabarti

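AI summary: BEExformer is an efficient Transformer that combines binarization with early exits to speed up language model inference on constrained resources. It introduces a Selective-Learn Forget Network (SLFN) in each block and a binarization-aware training method, which together shrink the model and accelerate inference. The early-exit mechanism relies on the fractional reduction in entropy across intermediate blocks with a soft-routing loss, cutting computation while also improving accuracy and yielding a favorable performance-efficiency trade-off.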

Comments This revised manuscript includes 18 pages, 6 figures, and 6 tables. Methodology and results sections have been improved for clarity and depth, incorporating additional comparisons, ablations, and new evaluation datasets. A few relevant references were added, and overall organization refined for better readability


DOI: 10.1109/TSUSC.2026.3666456
Journal ref: in IEEE Transactions on Sustainable Computing, vol. 11, no. 2, pp. 98-110, 2026

Abstract

Large Language Models (LLMs) based on transformers achieve cutting-edge results on a variety of applications. However, their enormous size and processing requirements hinder deployment on constrained resources. To enhance efficiency, binarization and Early Exit (EE) have proved to be effective solutions. However, binarization may lead to performance loss as reduced precision affects gradient estimation and parameter updates. Besides, research on EE mechanisms is still in its early stages. To address these challenges, we introduce Binarized Early Exit Transformer (BEExformer), a first-of-its-kind selective learning-based transformer integrating Binarization-Aware Training (BAT) with EE for efficient and fast textual inference. Each transformer block has an integrated Selective-Learn Forget Network (SLFN) to enhance contextual retention while eliminating irrelevant information. The BAT employs a differentiable second-order approximation to the sign function, enabling gradient computation that captures both the sign and magnitude of the weights. This aids in 21.30 times reduction in model size. The EE mechanism hinges on fractional reduction in entropy among intermediate transformer blocks with soft-routing loss estimation. This accelerates inference by reducing FLOPs by 52.27% and even improves accuracy by 3.22% by resolving the "overthinking" problem inherent in deep networks. Extensive evaluation through comparison with the SOTA methods and various ablations across nine datasets covering multiple NLP tasks demonstrates its Pareto-optimal performance-efficiency trade-off.
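The entropy-based early exit described in the abstract above can be illustrated with a minimal sketch. It assumes one plausible reading of the criterion: stop at the first block whose output entropy fell by less than a chosen fraction relative to the previous block. The function names, the threshold value, and the use of per-block logits are assumptions for illustration, not the authors' implementation.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    # Shannon entropy in nats; zero-probability terms contribute nothing.
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def early_exit_block(block_logits, threshold=0.1):
    """Return the index of the block at which inference would stop.

    block_logits: per-block class logits (hypothetical intermediate heads).
    threshold: assumed minimum fractional entropy reduction needed to continue.
    """
    prev = None
    for i, logits in enumerate(block_logits):
        h = entropy(softmax(logits))
        if prev is not None and prev > 0.0:
            fractional_reduction = (prev - h) / prev
            if fractional_reduction < threshold:
                return i  # entropy has stopped dropping enough: exit here
        prev = h
    return len(block_logits) - 1  # no early exit: use the final block

# Entropy drops sharply at block 1, then barely changes at block 2.
logits_per_block = [[1.0, 1.0, 1.0], [2.0, 1.0, 0.0], [2.1, 1.0, 0.0], [5.0, 0.0, 0.0]]
exit_index = early_exit_block(logits_per_block)
```

In this toy input the third block (index 2) triggers the exit, so the fourth block is never evaluated; that skipped computation is where an early-exit model saves FLOPs.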


2411.19240 2026-05-13 cs.CL

How far can bias go? Tracing bias from pretraining data to alignment

Marion Thaler, Abdullatif Köksal, Alina Leidinger, Anna Korhonen, Hinrich Schütze

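AI summary: As large language models (LLMs) are increasingly deployed in user-facing settings, addressing biases that may reinforce social inequality becomes more important. This paper studies how gender-occupation bias in pretraining data shapes LLM outputs, using the Dolma dataset and OLMo models as a case study. Zero-shot prompting and token co-occurrence analysis show that bias in the training data is amplified in model outputs. Instruction tuning partly mitigates representational bias, but overall gender stereotypes persist, underscoring the importance of tackling bias at the pretraining stage.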

2411.16769 2026-05-13 cs.LG cs.CL cs.CR cs.CV

Red-Teaming Text-to-Image Models via In-Context Experience Replay and Semantic-Preserving Prompt Rewriting

Zhi-Yi Chin, Pin-Yu Chen, Wei-Chen Chiu, Mario Fritz

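AI summary: This paper studies how to automatically red-team text-to-image models, generating prompts that elicit harmful content in order to assess model safety. Existing methods rely on white-box access, generalize poorly, or produce uninterpretable attack samples. The authors propose the ICER framework, which uses LLM-based prompt rewriting and in-context experience replay to generate semantics-preserving natural-language attack prompts, and optimizes its strategy with reinforcement learning to balance exploration and exploitation. Experiments show that ICER outperforms existing methods under multiple safety mechanisms and transfers to commercial systems such as DALL-E 3 and Midjourney.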

Comments The source code is available at https://github.com/zhiyichin/ICER

2411.13311 2026-05-13 cs.CV cs.AI

A Resource Efficient Fusion Network for Object Detection in Bird's-Eye View using Camera and Raw Radar Data

Kavin Chandrasekaran, Sorin Grigorescu, Gijs Dubbelman, Pavol Jancura

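AI summary: This work proposes a resource-efficient fusion network for object detection in bird's-eye view (BEV) using camera and raw radar data. It consumes the radar's raw range-Doppler (RD) spectrum directly, avoiding complex radar signal processing, extracts features from camera images through a separate processing pipeline, and fuses the camera and radar features for detection. The method reduces computational complexity while maintaining detection accuracy, offering a more efficient and robust perception option for autonomous driving systems.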

Comments IEEE Intelligent Transportation Systems Conference (ITSC) 2024

2407.00805 2026-05-13 cs.AI

Towards Shutdownable Agents via Stochastic Choice

Elliott Thornley, Alexander Roman, Christos Ziakas, Leyton Ho, Louis Thomson

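AI summary: This paper studies how to train AI agents that pursue their tasks effectively without resisting shutdown. It proposes the Discounted Reward for Same-Length Trajectories (DReST) reward function, which encourages agents to choose stochastically between trajectory lengths so that they are both "useful" and "neutral". Experiments training simple agents in gridworlds show that the method improves both usefulness and neutrality, offering initial theoretical support and empirical evidence for building shutdownable advanced AI agents.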

2406.05615 2026-05-13 cs.CL

Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives

Thong Nguyen, Yi Bin, Junbin Xiao, Leigang Qu, Yicong Li, Jay Zhangjie Wu, Cong-Duy Nguyen, See-Kiong Ng, Luu Anh Tuan

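AI summary: This survey reviews progress in video-language understanding, organizing the field's main tasks, challenges, and solutions from the perspectives of model architecture, model training, and data. The authors compare the performance of existing methods and discuss promising directions for future research.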

Comments Accepted at ACL 2024 (Findings). Code is available at https://github.com/nguyentthong/video-language-understanding

2404.05120 2026-05-13 cs.RO cs.SY eess.SY

Rollbot: a Spherical Robot Driven by a Single Actuator

Jingxian Wang, Michael Rubenstein

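AI summary: This paper introduces Rollbot, a spherical robot that achieves controllable motion in the 2D plane with a single actuator, challenging the assumption that spherical robots need at least two. Rollbot varies the acceleration and deceleration of its single motor and attached mass and, using the derived quasi-steady-state dynamics and control law, controls the curvature of its rolling trajectory to achieve controllable circular motion and path following. The work provides theoretical analysis, a design method, and a control strategy, and validates the framework.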

Comments Accepted by ICRA 2026

2402.16860 2026-05-13 cs.CV cs.IR

Interactive Mars Image Content-Based Search with Interpretable Machine Learning

Bhavan Vasu, Steven Lu, Emily Dunkel, Kiri L. Wagstaff, Kevin Grimes, Michael McAuley

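AI summary: This paper studies interactive content-based search of Mars images using interpretable machine learning, in support of scientific exploration and user interests. The authors propose a prototype-based classification architecture that lets users understand and verify the evidence the classifier relies on when processing Curiosity rover images. Beyond providing explanations for classifications, the work examines the diversity and correctness of the evidence used. The system is planned for deployment in the NASA Planetary Data System Image Atlas, replacing the current non-interpretable system.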

Comments Published at the Thirty-Sixth Annual Conference on Innovative Applications of Artificial Intelligence (IAAI-24). Corrected citation: Proc. AAAI 38(21): 22976-22982 (2024)

2402.07619 2026-05-13 cs.SD cs.AI eess.AS

Developing a Multi-variate Prediction Model For COVID-19 From Crowd-sourced Respiratory Voice Data

Yuyang Yan, Wafaa Aljbawi, Sami O. Simons, Visara Urovi

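AI summary: This study develops a multivariate deep learning model for detecting COVID-19 from crowd-sourced respiratory voice data. Using voice samples from the Cambridge COVID-19 Sound database, it extracts several feature types, including Mel spectrograms, MFCCs, and CNN encoder features, and builds LSTM, CNN, and HuBERT classifiers. The HuBERT model outperforms traditional machine learning methods, reaching 86% accuracy and an AUC of 0.93, which indicates the potential of voice data for COVID-19 diagnosis.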

Comments arXiv admin note: text overlap with arXiv:2209.03727

2312.06950 2026-05-13 cs.CV cs.CL

READ: Recurrent Adapter with Partial Video-Language Alignment for Parameter-Efficient Transfer Learning in Low-Resource Video-Language Modeling

Thong Nguyen, Xiaobao Wu, Xinshuai Dong, Khoi Le, Zhiyuan Hu, Cong-Duy Nguyen, See-Kiong Ng, Luu Anh Tuan

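AI summary: Targeting low-resource video-language modeling, this work proposes READ, a parameter-efficient fine-tuning method. It introduces recurrent adapters with temporal modeling capability and a Partial Video-Language Alignment (PVLA) objective, capturing temporal relations between video frames and text while preserving task-critical information. Experiments show that READ significantly outperforms existing fine-tuning strategies on multiple low-resource benchmarks.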

Comments Accepted at AAAI 2024

2312.02549 2026-05-13 cs.CV cs.CL

DemaFormer: Damped Exponential Moving Average Transformer with Energy-Based Modeling for Temporal Language Grounding

Thong Nguyen, Xiaobao Wu, Xinshuai Dong, Cong-Duy Nguyen, See-Kiong Ng, Luu Anh Tuan

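AI summary: This paper addresses temporal language grounding, the task of locating the video moments that semantically correspond to a natural language query. To overcome the limitations of conventional attention in modeling the relationship between video moments and text, the authors propose an energy-based modeling framework that explicitly learns the moment-query distribution. They also design DemaFormer, a Transformer architecture that uses an exponential moving average with a learnable damping factor to encode inputs more effectively. The method outperforms prior state-of-the-art approaches on four public datasets.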

Comments Accepted at EMNLP 2023 (Findings). Code is available at https://github.com/nguyentthong/demaformer
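For readers unfamiliar with the damped exponential moving average mentioned in the DemaFormer summary above, here is a minimal scalar sketch. The listing does not give the formula, so the recurrence below is an assumption: a commonly used damped EMA form, h_t = alpha * x_t + (1 - alpha * delta) * h_{t-1}. In a trained model `alpha` and `delta` would be learned, typically per dimension; here they are fixed numbers for illustration.

```python
def damped_ema(xs, alpha, delta):
    """Damped EMA over a sequence of floats (assumed form, see lead-in).

    alpha: weight on the current input, in (0, 1).
    delta: damping factor in (0, 1]; delta = 1.0 recovers the ordinary EMA.
    """
    h = 0.0
    out = []
    for x in xs:
        # A smaller delta retains more of the previous state.
        h = alpha * x + (1.0 - alpha * delta) * h
        out.append(h)
    return out

plain = damped_ema([1.0, 1.0, 1.0], alpha=0.5, delta=1.0)   # ordinary EMA
damped = damped_ema([1.0, 1.0, 1.0], alpha=0.5, delta=0.5)  # slower decay of history
```

With `delta = 1.0` the output is the familiar EMA sequence 0.5, 0.75, 0.875; lowering `delta` keeps more of the past state, so the same input accumulates faster.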

2305.12678 2026-05-13 cs.CL

Gradient-Boosted Decision Tree for Listwise Context Model in Multimodal Review Helpfulness Prediction

Thong Nguyen, Xiaobao Wu, Xinshuai Dong, Anh Tuan Luu, Cong-Duy Nguyen, Zhen Hai, Lidong Bing

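AI summary: This paper studies multimodal review helpfulness prediction (MRHP), which ranks product reviews by their predicted helpfulness scores. Conventional fully connected networks partition features inefficiently, and pairwise losses fail to capture the overall ranking objective. To address this, the authors propose a listwise attention network and a listwise optimization objective that model the ranking context of reviews more accurately, and they use a gradient-boosted decision tree as the score predictor to partition review representations more effectively. Experiments show strong performance and generalization on two large-scale benchmark datasets.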

Comments Published in ACL 2023 (Findings). Code is available at https://github.com/nguyentthong/gbdt_listwise_mrhp

2304.09479 2026-05-13 cs.CV cs.GR cs.LG

DiFaReli++: Diffusion Face Relighting with Consistent Cast Shadows

Puntawat Ponglertnapakorn, Nontawat Tritrong, Supasorn Suwajanakorn

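AI summary: This paper presents DiFaReli++, a single-view face relighting method that produces realistic lighting with temporally consistent cast shadows on in-the-wild images. It does not require accurate intrinsic decomposition and is trained on 2D images only, removing the need for lighting annotations. A conditional diffusion implicit model (DDIM) is conditioned on a rendered shading reference and a shadow map, which simplifies modeling the complex interaction between light and geometry. The single-shot variant outperforms its teacher model across metrics, and the method achieves state-of-the-art relighting results.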

Comments Published in IEEE TPAMI (vol. 48, no. 5, May 2026). This is an extended version of the ICCV 2023 paper (DiFaReli)


DOI: 10.1109/TPAMI.2025.3648667
Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 48, no. 5, pp. 5068-5082, May 2026

Abstract

We introduce a novel approach to single-view face relighting in the wild, addressing challenges such as global illumination and cast shadows. A common scheme in recent methods involves intrinsically decomposing an input image into 3D shape, albedo, and lighting, then recomposing it with the target lighting. However, estimating these components is error-prone and requires many training examples with ground-truth lighting to generalize well. Our work bypasses the need for accurate intrinsic estimation and can be trained solely on 2D images without any light stage data, relit pairs, multi-view images, or lighting ground truth. Our key idea is to leverage a conditional diffusion implicit model (DDIM) for decoding a disentangled light encoding along with other encodings related to 3D shape and facial identity inferred from off-the-shelf estimators. We propose a novel conditioning technique that simplifies modeling the complex interaction between light and geometry. It uses a rendered shading reference along with a shadow map, inferred using a simple and effective technique, to spatially modulate the DDIM. Moreover, we propose a single-shot relighting framework that requires just one network pass, given pre-processed data, and even outperforms the teacher model across all metrics. Our method realistically relights in-the-wild images with temporally consistent cast shadows under varying lighting conditions. We achieve state-of-the-art performance on the standard benchmark Multi-PIE and rank highest in user studies. Please visit our page: https://diffusion-face-relighting-pp.github.io


2302.12039 2026-05-13 cs.CL cs.AI

Natural Language Processing in the Legal Domain

Dirk Hartung, Daniel Martin Katz, Michael J. Bommarito, Lauritz Gerlach, Abhik Jana, Jerrold Soh

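AI summary: This survey reviews recent developments in natural language processing for the legal domain, analyzing the technical and substantive progress of nearly one thousand papers published between 2013 and 2024. It finds that the number of legal NLP studies, the range of tasks, and the languages covered have all grown markedly in recent years. Methods have also become more sophisticated, approaching the level of general NLP, and the field now meets higher professional standards for data availability and code reproducibility. These trends point to substantial potential for legal NLP going forward.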

Comments 15 pages, 7 figures, 2 tables

2211.03524 2026-05-13 cs.CL

Adaptive Contrastive Learning on Multimodal Transformer for Review Helpfulness Predictions

Thong Nguyen, Xiaobao Wu, Anh-Tuan Luu, Cong-Duy Nguyen, Zhen Hai, Lidong Bing

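AI summary: For multimodal review helpfulness prediction, this work proposes a multimodal Transformer trained with adaptive contrastive learning. The method explicitly models mutual information in cross-modal relations, introduces an adaptive weighting scheme for more flexible optimization, and adds a multimodal interaction module to address alignment issues in the data. Experiments show state-of-the-art performance on two public datasets.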

Comments Accepted to the main EMNLP 2022 conference. Code is available at https://github.com/nguyentthong/adaptive_contrastive_mrhp

2109.10616 2026-05-13 cs.CL

Enriching and Controlling Global Semantics for Text Summarization

Thong Nguyen, Anh Tuan Luu, Truc Lu, Tho Quan

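AI summary: Transformer-based summarization models fall short in capturing a document's global semantics. This paper proposes a neural topic model equipped with normalizing flows to enrich the global semantics available to the summarizer. To keep global semantics from overwhelming local representations, it adds a control mechanism that regulates how much global information participates in generation. Experiments show the method outperforms prior state-of-the-art models on several widely used summarization datasets.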

Comments Accepted to the main EMNLP 2021 conference. Code is available at https://github.com/nguyentthong/topicflow-sum

2605.12461 2026-05-13 math.ST cs.DS cs.LG stat.ML stat.TH

A proximal gradient algorithm for composite log-concave sampling

Linghai Liu, Sinho Chewi

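AI summary: This paper proposes a proximal gradient algorithm for sampling from composite log-concave distributions of the form $\pi \propto e^{-f-g}$, assuming access to gradients of $f$ and a restricted Gaussian oracle (RGO) for $g$. The algorithm combines gradient information with RGO sampling. When $f + g$ is strongly convex and $f$ is smooth, it reaches $\varepsilon$ accuracy in total variation distance within $\widetilde{\mathcal{O}}(\kappa\sqrt{d}\,\log^4(1/\varepsilon))$ iterations, matching the best known results. The analysis is further extended to non-log-concave distributions and non-smooth $f$.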

2605.12453 2026-05-13 eess.SP cs.AI cs.DB cs.LG cs.NI

Enabling AI-Native Mobility in 6G: A Real-World Dataset for Handover, Beam Management, and Timing Advance

Mannam Veera Narayana, Rohit Singh, Deepa M. R, Radha Krishna Ganti

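AI summary: Handover (HO) for 5G user equipment (UE) in high-mobility scenarios suffers from long interruption times and heavy measurement-report overhead. This work presents a dataset collected in a real deployed network, covering UE mobility data for walking, cycling, car, bus, and train travel at different speeds. The dataset focuses on timing advance (TA) measurements during handover, including key signaling events such as RACH triggers, MAC CE, and PDCCH grants, filling a gap in existing research. It supports training and evaluating AI/ML models for handover management, beam management, and TA prediction, providing a foundation for research on intelligent mobility in 6G.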

2605.12410 2026-05-13 stat.ML cs.LG math.OC math.ST stat.TH

Model-based Bootstrap of Controlled Markov Chains

Ziwei Su, Imon Banerjee, Diego Klabjan

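AI summary: This paper proposes and analyzes a model-based bootstrap for estimating the transition kernel of finite controlled Markov chains (CMCs), allowing for non-stationary or history-dependent control policies. This setting matters for offline reinforcement learning when the behavior policy is unknown. Using a new bootstrap law of large numbers and a martingale central limit theorem, the authors establish distributional consistency of the bootstrap transition estimator. They extend the results to off-policy evaluation and optimal policy recovery, obtaining asymptotically valid confidence intervals for value functions and Q-functions. Experiments show better coverage accuracy than existing methods, especially with small samples and short episodes.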

Comments 45 pages, 7 figures, 19 tables

2605.12391 2026-05-13 astro-ph.EP astro-ph.SR cs.LG

Trajectory-Agnostic Asteroid Detection in TESS with Deep Learning

Brian P. Powell, Jorge Martinez-Palomera, Amy Tuson, Christina Hedges, Jessie Dotson, Jordan Caraballo-Vega

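AI summary: This paper presents a deep learning method for detecting asteroids and other moving objects in TESS time-series image data. It stacks two 3D U-Nets (called a W-Net) for background filtering and moving-object identification, and augments the training data by rotating image cubes so that the model is robust to variations in asteroid speed and direction without a preset parameter range. The work also proposes an adaptive normalization method that improves data processing, and releases the tool library used to generate training data, which applies to other time-domain surveys.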

Comments Accepted by The Astronomical Journal, 11 May 2026

2605.12365 2026-05-13 quant-ph cs.AI

QAP-Router: Tackling Qubit Routing as Dynamic Quadratic Assignment with Reinforcement Learning

Kien X. Nguyen, Ankit Kulshrestha, Ilya Safro, Xiaoyuan Liu

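AI summary: Qubit routing is a fundamental problem in quantum compilation; because it is dynamic, local decisions compound over time and a globally optimal solution is hard to obtain. This paper proposes QAP-Router, which formulates qubit routing as a dynamic quadratic assignment problem and solves it with reinforcement learning. Gate interactions are modeled as a flow matrix and the hardware topology as a distance matrix, giving a unified representation of the coupling between interaction and distance, on which the reward function of the reinforcement learning environment is defined. Experiments show a significant reduction in post-routing CNOT count on multiple real-world quantum circuit datasets.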
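The flow-matrix and distance-matrix view in the QAP-Router summary above corresponds to the standard quadratic assignment objective, sketched below for a static snapshot. The 3-qubit line topology, the gate counts, and the brute-force search are illustrative assumptions; the paper's dynamic formulation and its reinforcement learning solver are not reproduced here.

```python
from itertools import permutations

def qap_cost(flow, dist, placement):
    """Standard QAP objective: sum of flow[i][j] * dist[placement[i]][placement[j]].

    flow[i][j]: how often logical qubits i and j interact (two-qubit gates).
    dist[a][b]: hop distance between physical qubits a and b on the hardware.
    placement[i]: physical qubit assigned to logical qubit i.
    """
    n = len(flow)
    return sum(flow[i][j] * dist[placement[i]][placement[j]]
               for i in range(n) for j in range(n))

# Hypothetical device: three physical qubits on a line, 0 - 1 - 2.
dist = [[0, 1, 2],
        [1, 0, 1],
        [2, 1, 0]]
# Hypothetical circuit: q0-q1 interact once, q0-q2 interact twice (symmetric).
flow = [[0, 1, 2],
        [1, 0, 0],
        [2, 0, 0]]

identity_cost = qap_cost(flow, dist, (0, 1, 2))
# Brute force is only feasible for tiny instances; it is used here for clarity.
best = min(permutations(range(3)), key=lambda p: qap_cost(flow, dist, p))
best_cost = qap_cost(flow, dist, best)
```

Placing the busiest logical qubit (q0) on the middle physical qubit lowers the cost from 10 to 6, which is the intuition behind reducing inserted SWAP gates and hence CNOT count.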

2605.12364 2026-05-13 cs.CR cs.LG cs.MA

Attacks and Mitigations for Distributed Governance of Agentic AI under Byzantine Adversaries

Matthew D. Laws, Alina Oprea, Cristina Nita-Rotaru

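AI summary: This paper studies attacks on, and defenses for, distributed governance of agentic AI in the presence of Byzantine adversaries. The authors analyze the attacks a malicious Provider can mount and propose four architectures with different security-performance trade-offs: SAGA-BFT (Byzantine fault tolerant), SAGA-MON (lightweight server-side monitoring), SAGA-AUD (client-side auditing), and the hybrid SAGA-HYB. Together these improve system security while accommodating the needs of different deployment scenarios.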

Comments 18 pages, 18 figures, 4 tables


Abstract

Agentic AI governance is a critical component of agentic AI infrastructure ensuring that agents follow their owner's communication and interaction policies, and providing protection against attacks from malicious agents. The state-of-the-art solution, SAGA, assumes a logically centralized point of trust, the Provider, which serves as a repository for user and agent information and actively enforces policies. While SAGA provides protection against malicious agents, it remains vulnerable to a malicious Provider that deviates from the protocol, undermining the security of the identity and access control infrastructure. Deployment on both private and public clouds, each susceptible to insider threats, further increases the risk of Provider compromise. In this work, we analyze the attacks that can be mounted from a compromised Provider, taking into account the different system components and realistic deployments. We identify and execute several concrete attacks with devastating effects: undermining agent attributability, extracting private data, or bypassing access control. We then present three types of solutions for securing the Provider that offer different trade-offs between security and performance. We first present SAGA-BFT, a fully byzantine-resilient architecture that provides the strongest protection, but incurs significant performance degradation, due to the high-cost of byzantine resilient protocols. We then propose SAGA-MON and SAGA-AUD, two novel solutions that leverage lightweight server-side monitoring or client-side auditing to provide protection against most classes of attacks with minimal overhead. Finally, we propose SAGA-HYB, a hybrid architecture that combines byzantine-resilience with monitoring and auditing to trade-off security for performance. We evaluate all the architectures and compare them with SAGA. We discuss which solution is best and under what conditions.


2605.12362 2026-05-13 cs.NE cs.AI

A Family of Quaternion-Valued Differential Evolution Algorithms for Numerical Function Optimization

Gerardo Altamirano-Gomez, Álvaro Gallardo, Carlos Ignacio Hernández Castellanos

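AI summary: This paper proposes a family of quaternion-valued differential evolution (QDE) algorithms for numerical optimization of continuous functions. The algorithms operate directly in quaternion space and include several mutation strategies that exploit the algebraic and geometric properties of quaternions, improving convergence speed and optimization performance. Experiments show that QDE outperforms conventional real-valued differential evolution on the BBOB benchmark.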

2605.12341 2026-05-13 stat.ML cs.LG

Multi-Variable Conformal Prediction: Optimizing Prediction Sets without Data Splitting

Laura Lützow, Simone Garatti, Marco C. Campi, Lars Lindemann, Matthias Althoff

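AI summary: This paper proposes Multi-Variable Conformal Prediction (MCP), a framework that optimizes the shape of prediction sets without data splitting while retaining finite-sample coverage guarantees. MCP extends standard conformal prediction to vector-valued score functions and multiple calibration variables, unifying prediction-set design and calibration in a single optimization problem. The authors present two efficient variants, RemMCP and RelMCP, suited to different kinds of optimization needs. Experiments show that, while maintaining target coverage, they yield prediction sets of smaller or comparable size and significantly reduce variance in calibration.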

2605.12335 2026-05-13 cs.IR cs.AI cs.LG

EHR-RAGp: Retrieval-Augmented Prototype-Guided Foundation Model for Electronic Health Records

Saeed Shurrab, Mariam Al-Omari, Dana El Samad, Farah E. Shamout

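AI summary: Electronic health records (EHRs) contain rich longitudinal patient information and are widely used for predictive modeling, but making effective use of historical data remains difficult because of long trajectories, heterogeneous events, and irregular timing. This paper proposes EHR-RAGp, a retrieval-augmented, prototype-guided foundation model that dynamically integrates the most relevant historical information across clinical event types to improve prediction. Its prototype-guided retrieval module aligns historical data with the prediction task and scores its relevance, steering the model toward the most informative context. The model outperforms existing state-of-the-art models on multiple clinical prediction tasks.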

Comments Retrieval Augmented EHR Foundation Model

2605.12303 2026-05-13 cs.HC cs.CV cs.LG

From Model Uncertainty to Human Attention: Localization-Aware Visual Cues for Scalable Annotation Review

Moussa Kassem Sbeyti, Joshua Holstein, Philipp Spitzer, Nadja Klein, Gerhard Satzger

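AI summary: High-quality annotated data is essential for training robust machine learning models, yet annotation remains costly at scale. This paper studies how visualizing a model's spatial uncertainty can help human annotators review annotations more effectively, proposing localization-aware visual cues that point annotators to regions likely to contain errors. Experiments show that annotators using the cues work more efficiently overall while maintaining annotation quality, supporting spatial uncertainty as an effective way to improve human-AI collaborative annotation.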

2605.12287 2026-05-13 eess.AS cs.SD

The SMC Blind Spot: A Failure Mode Analysis of State-of-the-Art Beat Tracking

Jaehoon Ahn, Tae Gum Hwang, Moon-Ryul Jung

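AI summary: Beat tracking models based on deep neural networks now perform well on mainstream percussive datasets but consistently struggle on the SMC dataset. This paper analyzes the failure modes of state-of-the-art models on SMC and finds that the main problems are octave errors, continuity errors, and complete tracking failures, and that the models tend to produce "confident but wrong" activations. It also shows that the standard DBN's default minimum-tempo limit prevents correct tempo inference on 21% of SMC tracks, which lowers overall performance. These findings give concrete directions for improving beat and downbeat detection.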

Comments 6 pages, 3 figures. Technical report on beat tracking failure modes; prepared for ISMIR 2026