arXivDaily: arXiv daily academic digest, updated Monday to Friday
2605.10317 2026-05-12 cs.LG cs.AI

Relations Are Channels: Knowledge Graph Embedding via Kraus Decompositions

Sayan Kumar Chaki

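Affiliations Inria, Laboratoire Hubert Curien, Université Jean Monnet

AI summary This paper proposes a knowledge graph embedding method based on Kraus decompositions. By imposing three structural axioms (linearity, trace preservation, and complete positivity), it formalizes relation operators as Kraus channels, giving relation modeling a theoretical foundation. The method naturally handles many-to-many relations, supports multi-hop reasoning, and removes the need for norm constraints on entity embeddings; it also yields the first theoretically grounded measure of relation complexity. Experiments show that the model consistently outperforms strong baselines on many-to-many relations.

Abstract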

Knowledge graph embedding (KGE) models typically represent each relation as an operator on entity embeddings. In this work, we identify three structural axioms that any principled relation operator must satisfy (linearity, trace preservation, and complete positivity) and show that they characterize a Kraus channel structure via the Kraus representation theorem. The completeness constraint defining this family is equivalent to these axioms, providing a principled foundation rather than an externally imposed condition. Under this formulation, most existing operator-based KGE models are recoverable as special cases with Kraus rank $\kappa = 1$ under specific embedding choices. We further generalize this characterization to arbitrary metric geometries by introducing w-Kraus channels, which satisfy completeness by construction within their respective spaces. Building on this theory, we propose KrausKGE, a principled KGE model that naturally handles $1$-to-$N$ and $N$-to-$N$ relations, supports $k$-hop reasoning without requiring explicit path encoders, and eliminates the need for norm constraints on entity embeddings. Additionally, our framework yields the first theoretically grounded per-relation complexity measure in the KGE literature, with a provable lower bound in terms of the empirical relation matrix rank. Empirical evaluation demonstrates that KrausKGE consistently outperforms strong baselines on $N$-to-$N$ relations, with performance gains that increase monotonically with relation fan-out, in alignment with theoretical predictions.
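
To make the completeness constraint concrete, here is a minimal sketch of a generic Kraus channel, assuming the textbook bit-flip channel on real 2x2 matrices. It is not the KrausKGE model; the helper names (`bit_flip_kraus`, `completeness`, `apply_channel`) are hypothetical. It checks that the Kraus operators satisfy sum_i K_i^T K_i = I and that applying the channel preserves the trace.

```python
import math

def matmul(a, b):
    """Multiply two small matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    """Transpose; equals the conjugate transpose because entries are real."""
    return [list(row) for row in zip(*a)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def bit_flip_kraus(p):
    """Kraus operators of the bit-flip channel: sqrt(1-p)*I and sqrt(p)*X."""
    s0, s1 = math.sqrt(1.0 - p), math.sqrt(p)
    return [[[s0, 0.0], [0.0, s0]], [[0.0, s1], [s1, 0.0]]]

def completeness(kraus):
    """Return sum_i K_i^T K_i, which is the identity for a valid channel."""
    total = [[0.0, 0.0], [0.0, 0.0]]
    for k in kraus:
        total = add(total, matmul(transpose(k), k))
    return total

def apply_channel(kraus, rho):
    """Return sum_i K_i rho K_i^T."""
    out = [[0.0, 0.0], [0.0, 0.0]]
    for k in kraus:
        out = add(out, matmul(matmul(k, rho), transpose(k)))
    return out

KRAUS = bit_flip_kraus(0.25)           # illustrative flip probability
RHO = [[0.7, 0.0], [0.0, 0.3]]         # illustrative trace-one input
OUT = apply_channel(KRAUS, RHO)
```

With two Kraus operators the channel mixes two linear maps, which is one way to picture why a rank above 1 can spread a single input over several outputs.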

2605.10315 2026-05-12 cs.LG cs.AI

Active Tabular Augmentation via Policy-Guided Diffusion Inpainting

Zheyu Zhang, Shuo Yang, Bardh Prenkaj, Gjergji Kasneci

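Affiliations Technical University of Munich; Munich Center for Machine Learning (MCML)

AI summary This paper studies how generating tabular data can improve downstream model performance when data are scarce. Conventional methods focus on the distributional fidelity of the generated data, which does not reliably improve model performance. The authors propose TAP, which combines diffusion inpainting with a conditional policy that decides what to generate and when to inject it, so as to maximize the benefit to the current learner. Experiments on multiple real-world datasets show that TAP outperforms existing methods, improving classification accuracy by up to 15.6 percentage points and reducing regression RMSE by up to 32%.

Comments Accepted for publication at ICML 2026

Abstract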

Generative tabular augmentation is appealing in data-scarce domains, yet the prevailing focus on distributional fidelity does not reliably translate into better downstream models. We formalize a fidelity-utility gap: common generative objectives prioritize distributional plausibility, whereas augmentation succeeds only when injected samples reduce the current learner's held-out evaluation loss. This gap motivates learning not just how to generate, but what to generate and when to inject as training evolves. We propose TAP (Tabular Augmentation Policy), which couples diffusion inpainting with a lightweight, learner-conditioned policy to steer generation toward high-utility regions and controls safe injection via explicit gating and conservative windowed commitment. Under severe data scarcity, TAP consistently outperforms strong generative baselines on seven real-world datasets, improving classification accuracy by up to 15.6 percentage points and reducing regression RMSE by up to 32%.

2605.10313 2026-05-12 cs.LG math.OC

Signature Approach for Contextual Bandits with Nonlinear and Path-dependent Rewards

Xin Guo, Grace He, Xinyu Li

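Affiliations University of California, Berkeley; University of Oxford

AI summary This paper studies contextual bandits with nonlinear and path-dependent rewards and proposes a new approach based on the signature transform: continuous path-dependent reward functionals are approximated by linear functionals in signature space, so that efficient linear contextual bandit algorithms can be applied while sequential structure is preserved. Building on this framework, the authors design DisSigUCB, a signature-based disjoint upper confidence bound algorithm, and prove a high-probability, data-dependent sublinear regret bound under certain assumptions. Experiments show that the algorithm outperforms classical linear and kernelized methods in nonlinear and path-dependent settings.

Abstract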

We study contextual bandits with nonlinear and path-dependent rewards through a novel signature-transform-based approach. Leveraging the universal nonlinearity property of signatures, we approximate continuous path-dependent reward functionals by linear functionals in the signature space. This representation enables the use of efficient linear contextual bandit methods while preserving expressive sequential structure. Building on this framework, we propose DisSigUCB, a signature-based disjoint upper confidence bound (UCB) algorithm. Under boundedness and non-degeneracy assumptions, we prove a high-probability data-dependent sublinear regret bound of order \(\tilde{\mathcal O}(\sqrt{(d+m)KT})\) where \(d\) is the context dimension and \(m\) is the signature feature dimension. Synthetic experiments and numerical applications on temperature sensor monitoring, sleep-stage classification, and hospital nurse staffing demonstrate that DisSigUCB consistently outperforms classical linear and kernelized contextual bandit baselines in nonlinear and path-dependent settings.
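
As an illustration of "linear functionals in the signature space", the sketch below computes the signature of a piecewise-linear path truncated at depth 2 and then reads off the Lévy area, a path-dependent quantity, as a plain dot product with the signature features. The truncation depth, the function names, and the example path are assumptions for illustration; this is not the DisSigUCB algorithm.

```python
def signature_depth2(path):
    """Depth-2 signature of a piecewise-linear path (list of d-dim points).

    Uses Chen's identity: appending a linear segment with increment dx updates
    level 2 by S1 (x) dx + 0.5 * dx (x) dx, then level 1 by dx.
    """
    d = len(path[0])
    s1 = [0.0] * d
    s2 = [[0.0] * d for _ in range(d)]
    for prev, cur in zip(path, path[1:]):
        dx = [c - p for p, c in zip(prev, cur)]
        for i in range(d):
            for j in range(d):
                s2[i][j] += s1[i] * dx[j] + 0.5 * dx[i] * dx[j]
        for i in range(d):
            s1[i] += dx[i]
    return s1, s2

def signature_features(path):
    """Flatten (1, level 1, level 2) into a single feature vector."""
    s1, s2 = signature_depth2(path)
    return [1.0] + s1 + [x for row in s2 for x in row]

def linear_functional(weights, features):
    return sum(w * f for w, f in zip(weights, features))

# Illustrative 2-D path: right one unit, then up one unit.
PATH = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
S1, S2 = signature_depth2(PATH)

# Levy area = 0.5 * (S^{12} - S^{21}); the weights pick out those two entries.
LEVY_WEIGHTS = [0.0, 0.0, 0.0, 0.0, 0.5, -0.5, 0.0]
LEVY_AREA = linear_functional(LEVY_WEIGHTS, signature_features(PATH))
```

A linear bandit method would learn such a weight vector per arm from observed rewards instead of fixing it by hand.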

2605.10298 2026-05-12 cs.LG

Set Prediction for Next-Day Active Fire Forecasting

Yuchen Bai, Georgios Athanasiou, Xin Yu, Diogenis Antonopoulos, Ioannis Papoutsis, Stijn Hantson, Nuno Carvalhais

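Affiliations Max Planck Institute for Biogeochemistry; Orion Lab; University of Utah; National Observatory of Athens; Earth System Science Program, School of Sciences and Engineering, Universidad del Rosario; Departamento de Ciências e Engenharia do Ambiente, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa; ELLIS Unit Jena

AI summary This paper proposes WISP, a model for high-resolution next-day active fire forecasting that recasts the problem as point-set prediction. From 48 hours of multi-source data (meteorology, vegetation, geography, and fire history), the model predicts a fixed-size ranked set of future active fire cluster centres on a 375 m grid and is trained end-to-end with Hungarian matching. On a global test set, the best variant achieves 38.2% average precision and covers 53.4% of FRP-weighted fire cluster mass, providing a new approach and benchmark for high-resolution wildfire forecasting.

Abstract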

Accurate next-day active fire forecasts can support early warning, disaster response, forest risk assessment, and downstream estimation of fire-related carbon emissions. Existing machine learning approaches to wildfire forecasting typically predict wildfire danger or fire probability on kilometre-scale daily grids, which is useful for regional warning but does not directly represent localized fire events. We propose Wildfire Ignition Set Predictor (WISP), a query-based model that reformulates next-day active fire forecasting as point-set prediction. From 48 hours of covariates including meteorology, satellite vegetation products, static land, and fire history, WISP predicts a fixed-size ranked set of future active fire cluster centres on a 375 m grid across globally distributed regions. The model is trained end-to-end with Hungarian matching; to address the conflicting roles of the classification score in assignment, ranking, and query activation, we use asymmetric classification-localization weighting in matching and loss. We further construct a globally distributed, hourly, multi-source benchmark for this task. On a held-out test set spanning fire regions worldwide, the best WISP variant achieves 38.2% average precision (AP) for ranked fire-centre detections, covers 53.4% of fire cluster mass weighted by fire radiative power (FRP), and localizes 54.1% of observed clusters within 5 km. These results establish sparse set prediction as a viable formulation for high-resolution wildfire forecasting and provide a benchmark for future work in this regime.
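
The matching step in set prediction can be pictured as a minimum-cost one-to-one assignment between predicted and observed points. The sketch below finds that optimum by exhaustive search, which gives the same assignment the Hungarian algorithm would on tiny inputs; it uses plain Euclidean distance as the cost and omits the classification term and the asymmetric weighting described in the abstract. The coordinates and the function name are hypothetical.

```python
import math
from itertools import permutations

def match_targets(predictions, targets):
    """Assign each target to a distinct prediction, minimizing total distance.

    Brute force over all injective assignments; only suitable for tiny sets.
    Returns (assignment, cost), where assignment[t] is the prediction index
    matched to target t.
    """
    best_assignment, best_cost = None, math.inf
    for candidate in permutations(range(len(predictions)), len(targets)):
        cost = sum(math.dist(predictions[p], targets[t])
                   for t, p in enumerate(candidate))
        if cost < best_cost:
            best_assignment, best_cost = candidate, cost
    return best_assignment, best_cost

# Illustrative values: three predicted centres, two observed cluster centres.
PREDICTIONS = [(0.0, 0.0), (5.0, 5.0), (9.0, 1.0)]
TARGETS = [(5.0, 4.0), (0.0, 1.0)]
ASSIGNMENT, COST = match_targets(PREDICTIONS, TARGETS)
```

Predictions left unmatched (index 2 here) would be trained toward a "no fire" label in a typical set-prediction loss.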

2605.10296 2026-05-12 cs.CL cs.AI cs.IR cs.LG

Qwen Goes Brrr: Off-the-Shelf RAG for Ukrainian Multi-Domain Document Understanding

Anton Bazdyrev, Ivan Bashtovyi, Ivan Havlytskyi, Oleksandr Kharytonov, Artur Khodakovskyi

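Affiliations National Technical University of Ukraine

AI summary This paper studies how off-the-shelf retrieval-augmented generation (RAG) can solve a Ukrainian multi-domain document understanding task: answering multiple-choice questions from PDF documents and localizing the supporting evidence. The authors propose a pipeline built on contextual chunking, question-aware dense retrieval and reranking, and constrained answer generation. Experiments show that retrieval and reranking with Qwen-family models improve recall and answer accuracy (reranking raises Recall@1 from 0.6957 to 0.7935 on a held-out split), and the system scores 0.9452 and 0.9598 on the public and private leaderboards, supporting the value of structure preservation and answer-space awareness under strict competition constraints.

Comments Accepted to The Fifth Ukrainian Natural Language Processing Conference (UNLP 2026)

Abstract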

We participated in the Fifth UNLP shared task on multi-domain document understanding, where systems must answer Ukrainian multiple-choice questions from PDF collections and localize the supporting document and page. We propose a retrieval-augmented pipeline built around three ideas: contextual chunking of PDFs, question-aware dense retrieval and reranking conditioned on both the question and answer options, and constrained answer generation from a small set of reranked passages. Our final system uses Qwen3-Embedding-8B for retrieval, a fine-tuned Qwen3-Reranker-8B for passage ranking, and Qwen3-32B for answer selection. On a held-out split, reranking improves Recall@1 from 0.6957 to 0.7935, while using the top-2 reranked passages raises answer accuracy from 0.9348 to 0.9674. Our best leaderboard run reached 0.9452 on the public leaderboard and 0.9598 on the private leaderboard. Our results suggest that, under strict code-competition constraints, preserving document structure and making relevance estimation aware of the answer space are more effective than adding complex downstream heuristics.
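
For readers unfamiliar with the retrieval metric, the sketch below computes Recall@k under the assumption that each question has exactly one gold passage, so the metric is the share of questions whose gold passage appears in the top k results. The passage IDs are made up, and the shared task's exact scoring may differ.

```python
def recall_at_k(ranked_lists, gold_ids, k):
    """Fraction of questions whose gold passage is among the top-k results."""
    hits = sum(1 for ranked, gold in zip(ranked_lists, gold_ids)
               if gold in ranked[:k])
    return hits / len(gold_ids)

# Hypothetical ranked passage IDs for four questions, best first.
RANKED = [["p3", "p1"], ["p2", "p7"], ["p9", "p4"], ["p5", "p6"]]
GOLD = ["p3", "p7", "p4", "p8"]

R1 = recall_at_k(RANKED, GOLD, 1)
R2 = recall_at_k(RANKED, GOLD, 2)
```

A reranker raises Recall@1 by moving gold passages that were already retrieved, such as "p7" and "p4" here, to the first position.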

2605.10295 2026-05-12 cs.CL

DECO-MWE: building a linguistic resource of Korean multiword expressions for feature-based sentiment analysis

Jaeho Han, Changhoe Hwang, Seongyong Choi, Gwanghoon Yoo, Eric Laporte, Jeesun Nam

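Affiliations DICORA, Department of Linguistics and Cognitive Science, Hankuk University of Foreign Studies, Korea; Université Paris-Est, LIGM, CNRS, UPEM, ESIEE, ENPC, France

AI summary This paper builds DECO-MWE, a linguistic resource of Korean multiword expressions (MWEs) for feature-based sentiment analysis. To build sentiment-related MWE resources efficiently, the study uses the Local Grammar Graph (LGG) methodology and formalizes DECO-MWE as a finite-state transducer that expresses the lexical and syntactic restrictions on MWEs. From a corpus of cosmetics reviews and an empirical analysis of it, the study identifies four types of MWEs and reaches an F-measure of 0.806 on the test corpus, providing a general-purpose polarity MWE lexicon and a reusable finite-state methodology for feature-based sentiment analysis.

Journal ref 13th Workshop on Asian Language Resources, May 2018, Miyazaki, Japan, pp.14-20

Abstract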

This paper aims to construct a linguistic resource of Korean Multiword Expressions for Feature-Based Sentiment Analysis (FBSA): DECO-MWE. Dealing with multiword expressions (MWEs) has been a critical issue in FBSA since many constructs reveal lexical idiosyncrasy. To construct linguistic resources of sentiment MWEs efficiently, we utilize the Local Grammar Graph (LGG) methodology: DECO-MWE is formalized as a Finite-State Transducer that represents lexical-syntactic restrictions on MWEs. In this study, we built a corpus of cosmetics review texts, which show particularly frequent occurrences of MWEs. Based on an empirical examination of the corpus, four types of MWEs have been distinguished. The DECO-MWE thus covers the following four categories: Standard Polarity MWEs (SMWEs), Domain-Dependent Polarity MWEs (DMWEs), Compound Named Entity MWEs (EMWEs) and Compound Feature MWEs (FMWEs). In retrieval, DECO-MWE reaches an F-measure of 0.806 on the test corpus. This study brings a twofold outcome: first, a sizeable general-purpose polarity MWE lexicon, which may be broadly used in FBSA; second, a finite-state methodology for treating domain-dependent MWEs such as idiosyncratic polarity expressions, named entity expressions or feature expressions, which may be reused in describing linguistic properties of other corpus domains.

2605.10293 2026-05-12 cs.LG cs.AI

Robust Probabilistic Shielding for Safe Offline Reinforcement Learning

Maris F. L. Galesloot, Thomas Rhemrev, Nils Jansen

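Affiliations Radboud University, Nijmegen, The Netherlands; Ruhr University & Radboud University, Bochum, Germany

AI summary This paper studies safe policy improvement in offline reinforcement learning and proposes robust probabilistic shielding. By combining safe policy improvement (SPI) with shielding, and relying only on the available dataset and knowledge of safe and unsafe states, it provides both performance and safety guarantees during policy optimization. With high probability, the improved policy both outperforms the baseline policy and satisfies the safety constraints; experiments show better average and worst-case performance, particularly in low-data regimes.

Abstract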

In offline reinforcement learning (RL), we learn policies from fixed datasets without environment interaction. The major challenges are to provide guarantees on the (1) performance and (2) safety of the resulting policy. A technique called safe policy improvement (SPI) provides a performance guarantee: with high probability, the new policy outperforms a given baseline policy, which is assumed to be safe. Orthogonally, in the context of safe RL, a shield provides a safety guarantee by restricting the action space to those actions that are provably safe with respect to a given safety-relevant model. We integrate these paradigms by extending shielding to offline RL, relying solely on the available dataset and knowledge of safe and unsafe states. Then, we shield the policy improvement steps, guaranteeing, with high probability, a safe policy. Experimental results demonstrate that shielded SPI outperforms its unshielded counterpart, improving both average and worst-case performance, particularly in low-data regimes.

2605.10292 2026-05-12 cs.LG cs.AI

LeapTS: Rethinking Time Series Forecasting as Adaptive Multi-Horizon Scheduling

Sheng Pan, Ming Jin, Bo Du, Shirui Pan

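Affiliations Griffith University

AI summary This paper proposes LeapTS, a time series forecasting framework that recasts forecasting, conventionally treated as a fixed mapping, as a dynamic multi-step scheduling process so that it adapts better as future time points unfold. LeapTS makes multi-level decisions through a hierarchical controller and neural controlled differential equations, dynamically selecting the prediction scale and advancement length, which improves its ability to capture non-stationary dynamics. Experiments on real-world and synthetic datasets show that LeapTS improves forecasting performance by at least 7.4% and runs inference 2.6 to 5.3 times faster than representative Transformer-based models.

Abstract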

Time series forecasting serves as an essential tool for many real-world applications, supporting tasks such as resource optimization and decision-making. Despite significant architectural advancements, most modern models still treat the forecasting task as a fixed mapping from history to target horizons. This induces temporal decoupling across future time points and limits the model's ability to adapt to the evolving context as forecasting progresses. In this work, we present LeapTS, a novel framework that reformulates time series forecasting as a dynamic scheduling process over the prediction horizon. Specifically, LeapTS organizes the forecasting process into multi-level decisions using: (1) a hierarchical controller that dynamically selects the optimal prediction scale and advancement length at each step, and (2) continuous-time state evolution driven by neural controlled differential equations. Within this process, the controlled update mechanism explicitly couples the irregular temporal dynamics with discrete scheduling feedback. Extensive evaluations on both real-world and synthetic datasets demonstrate that LeapTS improves overall forecasting performance by at least 7.4% while achieving a 2.6$\times$ to 5.3$\times$ inference speedup over representative Transformer-based models. Furthermore, by explicitly tracing the scheduling trajectories, we reveal how the model autonomously adapts its forecasting behavior to capture non-stationary dynamics.

2605.10286 2026-05-12 cs.AI

AgentRx: A Benchmark Study of LLM Agents for Multimodal Clinical Prediction Tasks

Baraa Al Jorf, Farah E. Shamout

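Affiliations New York University Abu Dhabi

AI summary This paper evaluates large language model (LLM)-based agents on multimodal clinical prediction tasks, studying their performance on heterogeneous data such as electronic health records, medical images, reports, and clinical notes. Systematic experiments on large-scale real-world medical data show that single-agent frameworks outperform naive multi-agent systems on multimodal tasks, handle the data better, and are better calibrated. The study provides a new benchmark for further development of agentic systems in healthcare, and the code and evaluation framework are open-sourced.

Comments Accepted at the AHLI Conference on Health, Inference, and Learning 2026

Abstract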

Building effective clinical decision support systems requires the synthesis of complex heterogeneous multimodal data. Such modalities include temporal electronic health records data, medical images, radiology reports, and clinical notes. Large language model (LLM)-based agents have shown impressive performance in various healthcare tasks, especially those involving textual modalities. Considering the fragmentation of healthcare data across hospital systems, collaborative agent frameworks present a promising direction to mitigate data sharing challenges. However, the effectiveness of LLM agents for multimodal clinical risk prediction remains largely unexamined. In this work, we conduct a systematic evaluation of LLM-based agents for clinical prediction tasks using large-scale real-world data. We assess performance in unimodal and multimodal settings and quantify performance gaps between single-agent and multi-agent systems. Our findings highlight that single-agent frameworks outperform naive multi-agent systems, are better at handling multimodal data, and are better calibrated. This underscores a critical need for improving multi-agent collaboration to better handle heterogeneous inputs. By open-sourcing our code and evaluation framework, this work offers a new benchmark to support future developments relating to agentic systems in healthcare.

2605.10281 2026-05-12 cs.SD cs.AI

Drum Synthesis from Expressive Drum Grids via Neural Audio Codecs

Konstantinos Soiledis, Maximos Kaliakatsos-Papakostas, Dimos Makris, Konstantinos Tsamis

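Affiliations Dept. of Music Technology and Acoustics, Hellenic Mediterranean University

AI summary This paper studies how to generate realistic drum audio directly from expressive drum grids (a MIDI representation with microtiming and velocity information) and proposes an approach based on neural audio codecs. A Transformer-based model maps the drum grid to a sequence of discrete codec tokens, and a pre-trained codec decoder converts them to waveform audio. Experiments on E-GMD, a large dataset of human drum performances, show good audio fidelity and musical alignment, establishing an effective route for drum grid-to-audio generation and offering practical guidance on choosing audio tokenizers for percussive synthesis.

Abstract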

Generating realistic drum audio directly from symbolic representations is a challenging task at the intersection of music perception and machine learning. We propose a system that transforms an expressive drum grid, a time-aligned MIDI representation with microtiming and velocity information, into drum audio by predicting discrete codes of a neural audio codec. Our approach uses a Transformer-based model to map the drum grid input to a sequence of codec tokens, which are then converted to waveform audio via a pre-trained codec decoder. We experiment with multiple state-of-the-art neural codecs, namely EnCodec, DAC, and X-Codec, to assess how the choice of audio representation impacts the quality of the generated drums. The system is trained and evaluated on the Expanded Groove MIDI Dataset, E-GMD, a large collection of human drum performances with paired MIDI and audio. We evaluate the fidelity and musical alignment of the generated audio using objective metrics. Overall, our results establish codec-token prediction as an effective route for drum grid-to-audio generation and provide practical insights into selecting audio tokenizers for percussive synthesis.

2605.10279 2026-05-12 cs.LG

DeepLog: A Software Framework for Modular Neurosymbolic AI

Robin Manhaeve, Stefano Colamonaco, Vincent Derkinderen, Rik Adriaensen, Lucas Van Praet, Luc De Raedt, Giuseppe Marra

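Affiliations Department of Computer Science and Leuven.AI

AI summary DeepLog is a modular neurosymbolic AI framework built on PyTorch that unifies logical reasoning and deep learning in a single workflow. It treats diverse neurosymbolic languages as high-level specifications and automatically compiles them into optimized arithmetic circuits, lowering the barrier for machine learning practitioners and giving neurosymbolic developers a shared, high-performance platform. Its core contribution is making neurosymbolic systems modular and general, which eases the integration of, and experimentation with, different approaches.

Comments Preprint accepted at IJCAI2026 Demo Track

Abstract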

DeepLog is an operational neurosymbolic framework that unifies logic and deep learning within standard PyTorch workflows. While existing neurosymbolic systems focus on a particular paradigm and semantics, DeepLog serves as a universal backend that can emulate many systems in the neurosymbolic alphabet soup. By treating diverse neurosymbolic languages as high-level specifications, the DeepLog software automatically compiles them into optimized arithmetic circuits. This design lowers the barrier for machine learning practitioners by treating logic as composable modules, while providing neurosymbolic developers with a shared, high-performance basis for prototyping new integration strategies. The code is available here: https://github.com/ML-KULeuven/deeplog

2605.10278 2026-05-12 cs.LG

Predictive Radiomics for Evaluation of Cancer Immune SignaturE in Glioblastoma: the PRECISE-GBM study

Prajwal Ghimire, Junjie Li, Liu Yaou, Marc Modat, Thomas Booth

Affiliations School of Biomedical Engineering & Imaging Sciences, King’s College London, UK; Department of Neurosurgery, King’s College Hospital, London, UK; Department of Neuroradiology, Beijing Tiantan Hospital, Beijing, China; Department of Neuroradiology, King’s College Hospital, London, UK

AI summary This study develops and validates imaging biomarkers for assessing immune signatures in IDH-wildtype glioblastoma using radiogenomics. Using multicenter retrospective data, it combines MRI radiomic features from deep learning-based segmentations with genomic data to build and validate radiomics-based models that predict immune signatures. The results show that the proposed models can non-invasively predict the macrophage subtype M0 immune signature with good stability and generalizability, and may help stratify glioblastoma patients for immunotherapy.

Comments Abstract: 226; Importance of study: 109; Manuscript: 5690 (excluding references); Figures: 4; Tables: 2; Supplemental File: 1

Journal ref Neuro-Oncology Advances 2026. Published online May 2, 2026

Abstract

Background: Radiogenomics allows identification of radiological biomarkers for genomic phenotypes. In glioblastoma, these biomarkers could potentially complement patient stratification strategies. We aim to develop and analytically validate radiological biomarkers that capture immune cell signatures within the IDH-wildtype glioblastoma microenvironment using radiogenomic analysis. Methods: This was a retrospective multicenter study using curated open-access anonymized imaging and genomic data from the TCGA-GBM, CPTAC, IvyGAP, REMBRANDT and CGGA datasets. Imaging data consisted of MRI-based radiomic features extracted from the necrotic core, enhancing, and edema regions of deep learning-based auto-segmented tumors. Radiomic feature selection was performed using nested cross-validated LASSO. Support vector machine and ensemble models were trained using seventeen immune and cell-specific score labels extracted from deconvoluted transcriptomic data using pan-cancer and glioblastoma immune signature matrices as reference standards. Seventeen classifier models trained in three cross-cohort strategies were validated on three held-out datasets, assessing stability and generalizability. Results: One-hundred-and-seventy-six patients were included in the study. The immune-related radiomic signatures obtained after feature selection were shape, first order and higher order radiomic features. Models predicting the macrophage subtype immune signature showed stable mean performance on balanced accuracy (0.67) and precision (0.89) metrics for three independent holdout datasets, with the ensemble model outperforming the support vector machine model. Conclusion: Radiogenomic models non-invasively predicted the macrophage subtype M0 immune signature in IDH-wildtype glioblastoma. These biomarkers have the potential to stratify patients for immunotherapy within prospective glioblastoma clinical trials.

2605.10277 2026-05-12 cs.LG math.AP stat.ML

Generalization Error Bounds for Picard-Type Operator Learning in Nonlinear Parabolic PDEs

Koichi Taniguchi, Sho Sonoda

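Affiliations Department of Mathematical and Systems Engineering, Faculty of Engineering; RIKEN AIP / CyberAgent

AI summary This paper studies learning the solution operators of nonlinear parabolic partial differential equations (PDEs) based on Duhamel-Picard iteration. It formulates an abstract state-transition model and derives implementation-agnostic generalization error bounds that separate the implementation error from the estimation error. The key finding is that increasing the Picard depth reduces the truncation error without unbounded growth of the entropy-based estimation error; the theory is illustrated with a Picard-type Fourier neural operator for nonlinear heat equations on the torus.

Comments 39 pages

Abstract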

Operator learning for partial differential equations (PDEs) aims to learn solution operators on infinite-dimensional function spaces from finite-resolution data. In this setting, it is important for the learned model to be discretization-invariant, or resolution-robust, and to reflect PDE-specific structure. It is therefore natural to ask how such structure should be encoded in the model architecture, hypothesis class, or learning procedure. In this paper, we study operator learning for solution operators of nonlinear parabolic PDEs based on Duhamel--Picard iteration. We formulate Picard iteration as an abstract state-transition model and present a theoretical framework for Picard-type operator learning. We derive implementation-agnostic generalization error bounds that separate the implementation error from the estimation error associated with the abstract state-transition model induced by Picard iteration. A key consequence is that increasing the Picard depth reduces the Picard truncation error without causing an unbounded growth of the entropy-based estimation error. We also extend the analysis to long-time prediction by rolling out the same learned local model over successive time blocks. Finally, we illustrate the theory for nonlinear heat equations on the torus using a Picard-type Fourier neural operator as a concrete implementation.
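
For orientation, the standard Duhamel-Picard scheme for a semilinear heat equation $\partial_t u = \Delta u + N(u)$ with $u(0) = u_0$ can be written as follows. The generic nonlinearity $N$ and the notation are textbook assumptions, not taken from the paper.

```latex
u^{(0)}(t) = e^{t\Delta} u_0, \qquad
u^{(k+1)}(t) = e^{t\Delta} u_0
  + \int_0^t e^{(t-s)\Delta}\, N\bigl(u^{(k)}(s)\bigr)\, ds,
  \quad k = 0, 1, 2, \dots
```

Stopping after a finite number of iterations gives an approximate solution; the gap to the fixed point of this map is what the abstract calls the Picard truncation error.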

2605.10275 2026-05-12 cs.CV

PolarVSR: A Unified Framework and Benchmark for Continuous Space-Time Polarization Video Reconstruction

Chenggong Li, Yidong Luo, Junchao Zhang, Boxin Shi, Degui Yang

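Affiliations School of Automation, Central South University; Hunan Provincial Key Laboratory of Optic-Electronic Intelligent Measurement and Control; Zhejiang University; School of Engineering, Westlake University; State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University; National Engineering Research Center of Visual Technology, School of Computer Science, Peking University

AI summary This paper proposes PolarVSR, a unified space-time polarization video reconstruction framework, to address the challenging inverse problem of recovering polarization parameters from mosaic arrays in mainstream division-of-focal-plane (DoFP) polarization imaging. The method jointly models polarization directions in space and time and uses a polarization-aware implicit neural representation for continuous, high-fidelity super-resolution reconstruction. It also introduces a flow-guided polarization variation loss to supervise polarization dynamics and establishes the first large-scale color DoFP polarization video benchmark; experiments confirm the method's effectiveness.

Abstract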

Polarimetric imaging captures surface polarization characteristics, such as the Degree of Linear Polarization (DoLP) and the Angle of Polarization (AoP). In mainstream Division-of-Focal-Plane (DoFP) color polarization imaging, recovering polarization parameters from captured mosaic arrays remains a challenging inverse problem. Existing DoFP cameras also face hardware bottlenecks and often cannot support high-frame-rate acquisition, limiting polarimetric imaging in dynamic video tasks. These limitations motivate joint spatial and temporal enhancement. To this end, we propose the first space-time polarization video reconstruction architecture. The method jointly models polarization directions in space and time and uses a polarization-aware implicit neural representation for continuous, high-fidelity upsampling. By analyzing temporal variations in polarization parameters, we further introduce a flow-guided polarization variation loss to supervise polarization dynamics. We also establish the first large-scale color DoFP polarization video benchmark to support this research direction. Extensive experiments on this benchmark demonstrate the effectiveness of the method.
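
As background on the quantities being reconstructed, the sketch below applies the standard relations between the four polarizer-angle intensities of a DoFP sensor (0°, 45°, 90°, 135°), the linear Stokes parameters, DoLP, and AoP. It works on a single already-demosaicked pixel and is unrelated to the paper's network; the function names and sample intensities are illustrative.

```python
import math

def stokes_from_intensities(i0, i45, i90, i135):
    """Linear Stokes parameters from intensities behind four polarizers."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)  # total intensity
    s1 = i0 - i90                       # horizontal vs vertical
    s2 = i45 - i135                     # diagonal vs anti-diagonal
    return s0, s1, s2

def dolp_aop(s0, s1, s2):
    """Degree of linear polarization and angle of polarization (radians)."""
    dolp = math.hypot(s1, s2) / s0 if s0 > 0 else 0.0
    aop = 0.5 * math.atan2(s2, s1)
    return dolp, aop

# Fully polarized light aligned with the 0-degree polarizer.
DOLP_H, AOP_H = dolp_aop(*stokes_from_intensities(1.0, 0.5, 0.0, 0.5))
# Fully polarized light aligned with the 45-degree polarizer.
DOLP_D, AOP_D = dolp_aop(*stokes_from_intensities(0.5, 1.0, 0.5, 0.0))
```

Because each DoFP pixel records only one of the four angles, the other three must be interpolated before these formulas apply, which is the inverse problem the abstract mentions.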

2605.10272 2026-05-12 cs.LG cs.AI cs.CR cs.DC

DP-LAC: Lightweight Adaptive Clipping for Differentially Private Federated Fine-tuning of Language Models

Haaris Mehmood, Jie Xu, Karthikeyan Saravanan, Rogier Van Dalen, Mete Ozay

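Affiliations Samsung AI Centre Cambridge

AI summary This paper proposes DP-LAC, a lightweight adaptive clipping method for differentially private fine-tuning of language models in federated learning. It first sets an initial clipping threshold using private histogram estimation and then adapts the threshold during training without consuming additional privacy budget or introducing new hyperparameters. Experiments show that DP-LAC outperforms existing adaptive clipping methods and vanilla DP-SGD, with an average accuracy gain of 6.6%.

Comments Accepted at ICASSP 2026

Abstract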

Federated learning (FL) enables the collaborative training of large-scale language models (LLMs) across edge devices while keeping user data on-device. However, FL still exposes sensitive information through client-provided gradients. Differentially private stochastic gradient descent (DP-SGD) mitigates this risk by clipping each client's contribution to a threshold $C$ and adding noise proportional to $C$. Existing adaptive clipping techniques dynamically adjust $C$ but demand tedious hyperparameter tuning, which can erode the privacy budget. In this paper, we introduce DP-LAC, a method that first estimates an initial clipping threshold within an order of magnitude of the optimum using private histogram estimation, and then adapts this threshold during training without consuming additional privacy budget or introducing new hyperparameters. Empirical results show that DP-LAC outperforms both state-of-the-art adaptive clipping methods and vanilla DP-SGD, achieving an average accuracy gain of $6.6\%$.
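
The clipping-and-noise step the abstract describes for DP-SGD can be sketched as below: each client update is scaled so its L2 norm is at most $C$, and Gaussian noise with standard deviation proportional to $C$ is added to the sum. This is the fixed-threshold baseline, not DP-LAC's adaptive rule; `noise_multiplier` and the function names are assumed for illustration, and no privacy accounting is performed.

```python
import math
import random

def l2_norm(vec):
    return math.sqrt(sum(x * x for x in vec))

def clip_update(update, c):
    """Scale the update so its L2 norm is at most c."""
    norm = l2_norm(update)
    scale = min(1.0, c / norm) if norm > 0 else 1.0
    return [x * scale for x in update]

def private_mean(updates, c, noise_multiplier, rng):
    """Average clipped updates after adding N(0, (noise_multiplier * c)^2) noise."""
    clipped = [clip_update(u, c) for u in updates]
    dim = len(updates[0])
    summed = [sum(u[i] for u in clipped) for i in range(dim)]
    noisy = [s + rng.gauss(0.0, noise_multiplier * c) for s in summed]
    return [x / len(updates) for x in noisy]

# Illustrative client updates: one large, one small.
UPDATES = [[3.0, 4.0], [0.0, 0.5]]
CLIPPED_LARGE = clip_update(UPDATES[0], 1.0)
CLIPPED_SMALL = clip_update(UPDATES[1], 1.0)
NOISELESS_MEAN = private_mean(UPDATES, 1.0, 0.0, random.Random(0))
```

The trade-off is visible in the code: a small `c` distorts large updates heavily, while a large `c` inflates the noise, which is why the choice of threshold matters.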

2605.10269 2026-05-12 cs.CV cs.RO

Increasing the Efficiency of DETR for Maritime High-Resolution Images

Tinsae Yehuala, Hao Cheng, Ville Lehtola

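Affiliations Dept. of Earth Observation Science, ITC Faculty, University of Twente

AI summary Motivated by object detection in high-resolution images for the safe navigation of unmanned surface vessels (USVs), this paper studies how to make DETR more efficient. The authors use Vision Mamba (ViM), which builds on state space models (SSMs), as the backbone, and combine tokenized image sequences with a tailored Feature Pyramid Network to better detect distant, small objects under large scale variations. With further optimizations such as token pruning, the method reduces computation and memory overhead while maintaining detection accuracy, offering a more efficient solution for real-time maritime object detection.

Comments Accepted to IEEE ITSC 2026. Copyright 2026 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses. DOI to be added upon publication

Abstract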

Maritime object detection is critical for the safe navigation of unmanned surface vessels (USVs), requiring accurate recognition of obstacles from small buoys to large vessels. Real-time detection is challenging due to long distances, small object sizes, large-scale variations, edge computing limitations, and the high memory demands of high-resolution imagery. Existing solutions, such as downsampling or image splitting, often reduce accuracy or require additional processing, while memory-efficient models typically handle only limited resolutions. To overcome these limitations, we leverage Vision Mamba (ViM) backbones, which build on State Space Models (SSMs) to capture long-range dependencies while scaling linearly with sequence length. Images are tokenized into sequences for efficient high-resolution processing. For further computational efficiency, we design a tailored Feature Pyramid Network with successive downsampling and SSM layers, as well as token pruning to reduce unnecessary computation on background regions. Compared to state-of-the-art methods like RT-DETR with a ResNet50 backbone, our approach achieves a better balance between performance and computational efficiency in maritime object detection.

2605.10268 2026-05-12 cs.CL cs.AI

MemReread: Enhancing Agentic Long-Context Reasoning via Memory-Guided Rereading

Baibei Ji, Xiaoyang Weng, Juntao Li, Zecheng Tang, Yihang Lou, Min Zhang

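Affiliations Soochow University; Peking University

AI summary To handle long-context reasoning without the quadratic complexity of standard attention, agent-memory approaches process document chunks linearly while maintaining a dynamically updated memory. Existing methods, however, can lose latent evidence when memory is overwritten. MemReread addresses this with question decomposition and rereading, triggered when the final memory is insufficient, which recovers indirect facts that were discarded prematurely and supports non-linear reasoning while preserving the logical flow of document comprehension. The work also introduces a reinforcement learning framework that improves length extrapolation and dynamically sets the number of rereading passes according to task complexity, balancing performance against computational overhead.

Abstract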

To tackle long-context reasoning tasks without the quadratic complexity of standard attention mechanisms, approaches based on agent memory have emerged, which typically maintain a dynamically updated memory when linearly processing document chunks. To mitigate the potential loss of latent evidence in this memorize-while-reading paradigm, recent works have integrated retrieval modules that allow agents to recall information previously discarded during memory overwriting. However, retrieval-based recall suffers from both evidence loss during memory formation and interference induced by invalid queries. To overcome these limitations, we propose MemReread. Built upon streaming reading, MemReread circumvents intermediate retrieval. It triggers question decomposition and rereading when the final memory is insufficient, enabling the recovery of indirect facts that were prematurely discarded. This design supports non-linear reasoning while preserving the inherent logical flow of document comprehension. To further enhance practicality, we introduce a reinforcement learning framework that enhances length extrapolation capability while dynamically determining the number of rereading passes based on task complexity, thereby flexibly controlling computational overhead. Extensive experiments demonstrate that MemReread consistently outperforms baseline frameworks on long-context reasoning tasks, while maintaining linear time complexity with respect to context length.

2605.10261 2026-05-12 cs.AI cs.LG

E-TCAV: Formalizing Penultimate Proxies for Efficient Concept Based Interpretability

Hasib Aslam, Muhammad Ali Chattha, Muhammad Taha Mukhtar, Muhammad Imran Malik, Andreas Dengel, Sheraz Ahmed

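Affiliations National University of Sciences and Technology; German Research Centre for Artificial Intelligence

AI summary This paper proposes E-TCAV, an efficient concept-based interpretability framework that addresses the computational overhead, inter-layer score disagreement, and statistical instability of conventional TCAV. Based on an in-depth analysis of three key aspects of TCAV, E-TCAV uses the penultimate layer as a fast proxy for earlier layers, which improves computational efficiency, and is validated across multiple architectures and datasets. Experiments show that layers in the network's final block strongly agree with the penultimate layer on TCAV scores and that the variance of the scores is mainly attributable to the choice of latent classifier, making efficient model debugging and real-time concept-guided training feasible.

Abstract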

TCAV (Testing with Concept Activation Vectors) is an interpretability method that assesses the alignment between the internal representations of a trained neural network and human-understandable, high-level concepts. Though effective, TCAV suffers from significant computational overhead, inter-layer disagreement of TCAV scores, and statistical instability. This work takes a step toward addressing these challenges by introducing E-TCAV, a framework for efficient approximation of TCAV scores, which is based on extensive investigation into three key aspects of the TCAV methodology: 1) the effect of latent classifiers on the stability of TCAV scores, 2) the inter-layer agreement of TCAV scores, and 3) the use of the penultimate layer as a fast proxy for earlier layers for TCAV computation. To ensure a solid foundation for E-TCAV, we conduct extensive evaluations across four different architectures and five datasets, encompassing problems from both computer vision and natural language domains. Our results show that the layers in the final block of the neural network strongly agree with the penultimate layer in terms of the TCAV scores, and the commonly observed variance of the TCAV scores can be attributed to the choice of the latent classifier. Leveraging this inter-layer agreement and the degeneracy of directional sensitivities at the penultimate layer, E-TCAV guarantees linearly scaling speed-ups with respect to the network's size and the number of evaluation samples, marking a step towards efficient model debugging and real-time concept-guided training.
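
The degeneracy mentioned in the abstract can be illustrated with a small sketch. It assumes the standard TCAV definition (the score is the fraction of inputs whose class-logit gradient has a positive directional derivative along the concept activation vector) and a linear classification head, so that the gradient of a class logit with respect to the penultimate activations is the same weight row for every input. The function names and toy numbers are hypothetical and are not taken from E-TCAV.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def tcav_score(gradients, cav):
    """Fraction of samples whose directional derivative along the CAV is positive."""
    positives = sum(1 for g in gradients if dot(g, cav) > 0)
    return positives / len(gradients)


def penultimate_tcav_score(class_weights, cav):
    """Assumes logit_k = w_k . h + b_k, so d(logit_k)/dh = w_k for every sample.

    Every sample then shares one directional derivative, and the score
    collapses to 0 or 1 without any per-sample backward pass.
    """
    return 1.0 if dot(class_weights, cav) > 0 else 0.0


# Toy data (hypothetical): gradients at an earlier layer vary per sample.
cav = [1.0, 0.0, -1.0]
earlier_layer_grads = [
    [0.5, 0.1, 0.2],
    [-0.3, 0.0, 0.4],
    [0.2, 0.2, 0.1],
    [0.9, -0.1, 0.3],
]
class_weights = [0.4, 0.2, 0.1]  # weight row of the target class in the linear head

print(tcav_score(earlier_layer_grads, cav))
print(penultimate_tcav_score(class_weights, cav))
```

Under these assumptions the penultimate-layer score needs one dot product per class and concept, whereas the earlier-layer score needs one gradient per evaluation sample.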

2605.10257 2026-05-12 cs.AI

Towards Autonomous Railway Operations: A Semi-Hierarchical Deep Reinforcement Learning Approach to the Vehicle Rescheduling Problem

Alberto Castagna, Stefan Zahlner, Adrian Egli, Christian Eichenberger, Daniel Boos, Manuel Meyer, Anton Fuxjager

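Affiliations * enliteAI; SBB CFF FFS; Flatland Association

AI summary This paper studies how a semi-hierarchical deep reinforcement learning method can handle disruptions in railway vehicle rescheduling, with the aim of raising the level of automation in railway operations. The method designs dedicated action and observation spaces for dispatching and routing, so that policies can specialise in different decision scopes and cope with the imbalance between rare dispatch decisions and frequent routing updates. Experiments show better coordination, resource utilisation, and robustness than heuristic baselines and monolithic RL, with substantially more trains reaching their destinations and low deadlock rates under dense traffic.

Details
English abstract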

Managing disruptions in railway traffic management is a major challenge. Rising traffic density and infrastructure limits increase complexity, making the Vehicle Routing and Scheduling Problem (VRSP) difficult to solve reliably and in real time. While Operational Research (OR) methods are widely used, most dispatching still relies on human expertise due to the problem's exponential combinatorial complexity. Reinforcement Learning (RL) has gained attention for its potential in multi-agent coordination, but existing RL approaches often underperform OR methods and struggle to scale in dense rail networks. This paper addresses this gap from a machine learning perspective by introducing a semi-hierarchical RL formulation tailored to operational railway constraints. The method separates dispatching from routing through dedicated action and observation spaces, enabling policies to specialise in distinct decision scopes and addressing the imbalance between rare dispatch decisions and frequent routing updates. The approach is evaluated on the Flatland-RL simulator across five difficulty levels and 50 random seeds, with 7 to 80 trains. Results show substantially improved coordination, resource utilisation, and robustness compared with heuristic baselines and monolithic RL, nearly doubling the number of trains reaching their destinations, while keeping deadlock rates below 5% and adaptively sequencing, delaying, or cancelling trains under heavy congestion.

2605.10256 2026-05-12 cs.SD cs.AI

A Cold Diffusion Approach for Percussive Dereverberation

Dimos Makris, András Barják, Maximos Kaliakatsos-Papakostas

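Affiliations * Department of Music Technology and Acoustics, Hellenic Mediterranean University

AI summary This paper proposes a cold diffusion framework for percussive dereverberation, motivated by the fact that current dereverberation research focuses on speech and largely overlooks percussive signals. It models reverberation as a deterministic degradation process that progressively transforms anechoic signals into reverberant ones. The study introduces two reverse-process parameterizations, uses a UNet and a diffusion Transformer as backbones, and trains and evaluates on datasets of acoustic and electronic drum recordings; experiments show the method outperforms existing score-based and conditional diffusion baselines on multiple metrics.

Comments Accepted for the 2026 IEEE World Congress on Computational Intelligence, IJCNN Track, 21-26 June 2026, Maastricht, the Netherlands

Details
English abstract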

Most recent advances in audio dereverberation focus almost exclusively on speech, leaving percussive and drum signals largely unexplored despite their importance in music production. Percussive dereverberation poses distinct challenges due to sharp transients and dense temporal structure. In this work, we propose a cold diffusion framework for dereverberating stereo drum stems (downmixes), modeling reverberation as a deterministic degradation process that progressively transforms anechoic signals into reverberant ones. We investigate two reverse-process parameterizations, Direct (next-state) and a Delta-normalized residual (velocity-style) prediction, and implement the framework using both a UNet and a diffusion Transformer backbone. The models are trained and evaluated on curated datasets comprising both acoustic and electronic drum recordings, with reverberation generated using a combination of synthetic and real room impulse responses. Extensive experiments on in-domain and fully out-of-domain test sets demonstrate that the proposed method consistently outperforms strong score-based and conditional diffusion baselines, evaluated using signal-based and perceptual metrics tailored to percussive audio.
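
As a rough illustration of a deterministic degradation process and of the two kinds of reverse-step targets named in the abstract, the sketch below interpolates linearly between a dry (anechoic) signal and its reverberant version. The linear schedule, the function names, and the toy signals are assumptions made for illustration; the abstract does not specify the actual degradation operator or how the residual is normalized, so no normalization is applied here.

```python
def degrade(dry, wet, t, num_steps):
    """Deterministic degradation (assumed linear): step 0 is dry, step num_steps is wet."""
    alpha = t / num_steps
    return [(1.0 - alpha) * d + alpha * w for d, w in zip(dry, wet)]


def next_state_pair(dry, wet, t, num_steps):
    """Input and target for a model that maps x_t directly to x_{t-1}."""
    return degrade(dry, wet, t, num_steps), degrade(dry, wet, t - 1, num_steps)


def residual_target(dry, wet, t, num_steps):
    """Target for a residual-style model: the change x_{t-1} - x_t (unnormalized)."""
    x_t, x_prev = next_state_pair(dry, wet, t, num_steps)
    return [p - c for p, c in zip(x_prev, x_t)]


# Toy signals (hypothetical): an impulse and the same impulse with a decaying tail.
dry = [1.0, 0.0, 0.0, 0.0]
wet = [1.0, 0.5, 0.25, 0.125]
num_steps = 4

# Walk back from the fully reverberant state using oracle residuals.
# A trained network would predict these residuals instead.
x = degrade(dry, wet, num_steps, num_steps)
for t in range(num_steps, 0, -1):
    x = [c + r for c, r in zip(x, residual_target(dry, wet, t, num_steps))]
recovered = x
print(recovered)
```

The loop only checks that the forward schedule and the reverse targets are mutually consistent; it says nothing about how well a learned model would perform.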

2605.10251 2026-05-12 cs.CV

Efficient Hybrid CNN-GNN Architecture for Monocular Depth Estimation

Ishan Narayan

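Affiliations * IMCS Lab, CSIR-CSIO

AI summary This paper proposes GraphDepth, a monocular depth estimation architecture that introduces graph neural networks (GNNs) into a convolutional encoder-decoder framework to model long-range spatial relationships that local convolutions struggle to capture. It embeds efficient GraphSAGE layers at multiple scales of a ResNet-101 U-Net backbone and combines them with channel-attention gated skip connections and a heteroscedastic uncertainty estimation module, improving accuracy and robustness. Experiments show that, compared with transformer-based hybrids, GraphDepth achieves a comparable global receptive field at lower computational cost and performs strongly on several benchmark datasets.

Details
English abstract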

We present GraphDepth, a monocular depth estimation architecture that synergistically integrates Graph Neural Networks (GNNs) within a convolutional encoder-decoder framework. Our approach embeds efficient GraphSAGE layers at multiple scales of a ResNet-101 U-Net backbone, enabling explicit modeling of long-range spatial relationships that lie beyond the receptive field of local convolutions. Key technical contributions include: (1) batch-parallelized graph construction with configurable k-NN and grid-based adjacency for scalable training; (2) multi-scale GraphSAGE integration at bottleneck and decoder stages (1/32, 1/16, 1/8 resolution) to propagate global context throughout the feature hierarchy; (3) channel-attention gated skip connections that adaptively weight encoder features before fusion; and (4) heteroscedastic uncertainty estimation via a dedicated aleatoric uncertainty head, enabling confidence-aware loss weighting during optimization. Unlike transformer-based hybrids, which suffer from quadratic complexity in sequence length, GraphDepth scales linearly with spatial resolution while achieving comparable global receptive fields through iterative message passing. Experiments on NYU Depth V2, WHU Aerial, ETH3D, and Mid-Air benchmarks demonstrate competitive accuracy within 4.6% of state-of-the-art transformers on indoor scenes with substantially lower computational cost (25 FPS vs 9 FPS, 3.8 GB vs 8.8 GB VRAM). GraphDepth achieves the best reported result on WHU Aerial (RMSE 8.24 m) and exhibits superior zero-shot cross-domain transfer to the Mid-Air synthetic aerial dataset, validating the generalization power of explicit relational reasoning for depth estimation.
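
The idea of widening the receptive field through iterative message passing over a grid graph can be sketched as follows. This is a minimal illustration, assuming 4-connected grid adjacency, scalar node features, fixed scalar weights in place of learned matrices, and a mean aggregator followed by ReLU; none of these choices are confirmed details of GraphDepth.

```python
def grid_adjacency(height, width):
    """4-connected neighbours for every cell of a height x width feature map."""
    neighbours = {}
    for r in range(height):
        for c in range(width):
            cells = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            neighbours[(r, c)] = [
                (i, j) for i, j in cells if 0 <= i < height and 0 <= j < width
            ]
    return neighbours


def sage_mean_layer(features, neighbours, w_self=0.5, w_neigh=0.5):
    """One GraphSAGE-style update: combine a node with the mean of its neighbours.

    Scalar weights stand in for learned weight matrices (an assumption).
    """
    updated = {}
    for node, h in features.items():
        nbrs = neighbours[node]
        mean = sum(features[n] for n in nbrs) / len(nbrs) if nbrs else h
        updated[node] = max(0.0, w_self * h + w_neigh * mean)  # ReLU
    return updated


# Toy 1 x 5 feature map with a single active cell at the left edge.
neighbours = grid_adjacency(1, 5)
features = {(0, c): 0.0 for c in range(5)}
features[(0, 0)] = 1.0

# states[k] holds the features after k rounds of message passing.
states = [features]
for _ in range(4):
    states.append(sage_mean_layer(states[-1], neighbours))
print(states[4])
```

In this toy setup information travels one hop per layer, and the number of edges grows linearly with the number of cells, which is the intuition behind the linear-scaling claim.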

2605.10247 2026-05-12 cs.LG

Teaching LLMs to See Graphs: Unifying Text and Structural Reasoning

Dario Vajda

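Affiliations * Faculty of Computer and Information Science, University of Ljubljana

AI summary This paper studies how to make large language models (LLMs) handle graph-structured data more effectively and proposes the Graph Transformer Language Model (GTLM). By injecting graph-aware attention biases into the attention modules, GTLM lets an LLM process graph structure natively and avoids the semantic bottleneck that arises when earlier methods compress textual attributes into single tokens. GTLM is highly parameter-efficient: with only 0.015% additional parameters, a 1B-parameter GTLM matches or exceeds 7B-parameter state-of-the-art models on standard Text-Attributed Graph benchmarks and clearly surpasses baselines on GraphQA.

Details
English abstract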

Using Large Language Models (LLMs) to process graph-structured data is an active research area, yet current state-of-the-art approaches typically rely on multi-step pipelines with Graph Neural Network (GNN) encoders that compress rich textual attributes into solitary tokens, creating a significant semantic bottleneck. In this paper, we introduce the Graph Transformer Language Model (GTLM), a novel architecture that enables pretrained LLMs to natively process graph topologies while entirely eliminating this compressive bottleneck. GTLM is exceptionally parameter-efficient: by injecting graph-aware attention biases directly into the LLM's attention modules, it introduces only 0.015% additional parameters relative to the base model. We theoretically prove that our bidirectional attention prefix preserves node permutation equivariance while maintaining exact backward compatibility with the pretrained base model. Extensive evaluations demonstrate that a 1B-parameter GTLM matches or exceeds the performance of 7B-parameter state-of-the-art models on standard Text-Attributed Graph benchmarks, while significantly surpassing baselines on GraphQA. Finally, we demonstrate that GTLM attention heads implicitly learn to simulate message passing, explaining its superior performance on algorithmic tasks. This paradigm shift enables true algorithmic reasoning within LLMs and provides a scalable foundation for next-generation GraphRAG and relational deep learning.
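
A graph-aware attention bias can be sketched as an additive term on the attention logits. The example below assumes a single learned scalar added for every pair of connected nodes, which is a hypothetical parameterization; the abstract does not describe how GTLM builds its biases. It also shows why a zero bias leaves the base attention unchanged, which is one simple way to read the backward-compatibility claim.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(queries, keys, bias=None):
    """Scaled dot-product attention weights with an optional additive bias matrix."""
    d = len(queries[0])
    weights = []
    for i, q in enumerate(queries):
        logits = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        if bias is not None:
            logits = [l + bias[i][j] for j, l in enumerate(logits)]
        weights.append(softmax(logits))
    return weights


def graph_bias(num_nodes, edges, edge_bias):
    """Hypothetical bias: one scalar added for every pair of connected nodes."""
    bias = [[0.0] * num_nodes for _ in range(num_nodes)]
    for u, v in edges:
        bias[u][v] = edge_bias
        bias[v][u] = edge_bias
    return bias


# Toy example: three node tokens, one undirected edge between nodes 0 and 1.
queries = keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = [(0, 1)]

plain = attention_weights(queries, keys)
zero_biased = attention_weights(queries, keys, graph_bias(3, edges, 0.0))
biased = attention_weights(queries, keys, graph_bias(3, edges, 2.0))
print(plain[0], biased[0])
```

With a positive bias, node 0 attends more strongly to its neighbour node 1; with the bias set to zero, the weights equal those of plain attention.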

2605.10242 2026-05-12 cs.LG cs.AI

When Normality Shifts: Risk-Aware Test-Time Adaptation for Unsupervised Tabular Anomaly Detection

Wei Huang, Hezhe Qiao, Kailai Zhang, Zaisheng Ye, Yu-Ming Shang, Xiangling Fu

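Affiliations * IEEE Publication Technology Group; Piscataway, NJ

AI summary This paper addresses the incomplete characterization of normal patterns that results from limited training data in unsupervised tabular anomaly detection and proposes RTTAD, a risk-aware test-time adaptation method. During training, collaborative dual-task learning establishes a robust normal prior; during testing, a test-time contrastive learning module updates the model with high-confidence pseudo-normal samples while constraining anomalous ones, which handles shifts in normal patterns. Experiments show that RTTAD achieves state-of-the-art detection performance on 15 tabular datasets.

Comments 13 pages, 6 figures

Details
English abstract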

Unsupervised tabular anomaly detection methods typically learn feature patterns from normal samples during training and subsequently identify samples that deviate from these patterns as anomalies during testing. However, in practical scenarios, the limited scale and diversity of training data often lead to an incomplete characterization of normal patterns. While test-time adaptation offers a remedy, its isolated focus on test-time optimization ignores the critical synergy with training-phase learning. Furthermore, indiscriminate adaptation to unlabeled test data inevitably triggers anomaly contamination, preventing the model from fully realizing its discriminative capability between normal and anomalous samples. To address these issues, we propose RTTAD, a Risk-aware Test-time adaptation method for unsupervised Tabular Anomaly Detection. RTTAD holistically tackles normality shifts via a synergistic two-stage mechanism. During training, collaborative dual-task learning captures multi-level representations to establish a robust normal prior. During testing, a Test-Time Contrastive Learning (TTCL) module explicitly accounts for adaptation risk by selectively updating the model using high-confidence pseudo-normal samples while constraining anomalous ones. Additionally, TTCL incorporates a k-nearest neighbor-based contrastive objective to refine embedding distributions, thereby further enhancing the model's discriminative capacity. Extensive experiments on 15 tabular datasets demonstrate that RTTAD achieves state-of-the-art overall detection performance.

2605.10241 2026-05-12 cs.CL cs.LG

Building Korean linguistic resource for NLU data generation of banking app CS dialog system

Jeongwoo Yoon, On-yu Park, Changhoe Hwang, Gwanghoon Yoo, Eric Laporte, Jeesun Nam

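Affiliations * DICORA, Hankuk University of Foreign Studies; Université Gustave Eiffel

AI summary This paper aims to build Korean annotated training data for natural language understanding (NLU) in banking customer service dialog systems. It presents FIAD, a linguistic resource for the financial domain, identifies three linguistic patterns in Korean request utterances from a corpus of banking app reviews, and uses Local Grammar Graphs (LGGs) to generate annotated data covering diverse intents and entities. Models trained on FIAD-generated data reach high scores on intent and topic recognition, which supports the usefulness of the resource.

Journal ref 29th International Conference on Computational Linguistics (COLING), Workshop on Pattern-based Approaches to NLP in the Age of Deep Learning (Pan-DL), Oct 2022, Gyeongju, South Korea, pp.29-37

Details
English abstract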

Natural language understanding (NLU) is integral to task-oriented dialog systems, but demands a considerable amount of annotated training data to increase the coverage of diverse utterances. In this study, we report the construction of a linguistic resource named FIAD (Financial Annotated Dataset) and its use to generate Korean annotated training data for NLU in the banking customer service (CS) domain. By an empirical examination of a corpus of banking app reviews, we identified three linguistic patterns occurring in Korean request utterances: TOPIC (ENTITY, FEATURE), EVENT, and DISCOURSE MARKER. We represented them in LGGs (Local Grammar Graphs) to generate annotated data covering diverse intents and entities. To assess the practicality of the resource, we evaluate the performance of DIET-only (Intent: 0.91 / Topic [entity+feature]: 0.83), DIET+HANBERT (I: 0.94 / T: 0.85), DIET+KoBERT (I: 0.94 / T: 0.86), and DIET+KorBERT (I: 0.95 / T: 0.84) models trained on FIAD-generated data to extract various types of semantic items.

2605.10237 2026-05-12 cs.LG

The Benefits of Temporal Correlations: SGD Learns k-Juntas from Random Walks Efficiently

Elisabetta Cornacchia, Dan Mikulincer, Elchanan Mossel

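Affiliations * Bocconi University; University of Washington; Massachusetts Institute of Technology

AI summary This paper studies how temporal correlations in data make certain sparse learning problems efficiently solvable by gradient-based methods. It focuses on Boolean k-juntas, a canonical sparse learning problem, and shows that when samples are generated by a lazy random walk on the hypercube, a two-layer ReLU network trained with a temporal-difference loss learns the problem efficiently, with sample complexity essentially linear in the ambient dimension. By contrast, large-batch gradient methods with standard convex pointwise losses do not gain the same advantage.

Comments 10 pages main body, 3 figures

Details
English abstract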

We study how temporal correlations in the data can make certain sparse learning problems efficiently learnable by gradient-based methods. Our focus is on Boolean k-juntas, a canonical sparse learning problem known to pose barriers for gradient-based methods under independent uniform samples. We show that this picture changes when the samples are generated by a lazy random walk on the hypercube. In this setting, the temporal dependencies can be exploited by a two-layer ReLU network trained using stylized-SGD with a temporal-difference loss, which compares target and predicted increments across consecutive samples. For every fixed k, the resulting sample complexity is essentially linear in the ambient dimension d. By contrast, we show that for large-batch gradient methods using standard convex pointwise losses, temporal correlations do not provide the same advantage.
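
The data model and the shape of the loss can be sketched in a few lines. The sketch assumes one common definition of a lazy walk (stay put with probability 1/2, otherwise flip one uniformly chosen coordinate), a parity over two coordinates as the example junta, and a squared-difference form for the temporal-difference loss; these specifics, and all names below, are illustrative assumptions rather than the paper's exact setup.

```python
import random


def lazy_walk(dim, steps, rng, stay_prob=0.5):
    """Lazy random walk on {-1, +1}^dim: stay, or flip one uniformly chosen coordinate."""
    x = [rng.choice((-1, 1)) for _ in range(dim)]
    path = [tuple(x)]
    for _ in range(steps):
        if rng.random() >= stay_prob:
            i = rng.randrange(dim)
            x[i] = -x[i]
        path.append(tuple(x))
    return path


RELEVANT = (0, 2)  # hypothetical relevant coordinates of the junta


def parity_junta(x):
    """Example k-junta: depends only on the coordinates listed in RELEVANT."""
    out = 1
    for i in RELEVANT:
        out *= x[i]
    return out


def td_loss(predict, target, x_prev, x_next):
    """Squared gap between predicted and target increments over consecutive samples."""
    predicted_step = predict(x_next) - predict(x_prev)
    target_step = target(x_next) - target(x_prev)
    return (predicted_step - target_step) ** 2


path = lazy_walk(dim=8, steps=200, rng=random.Random(0))
pairs = list(zip(path, path[1:]))

# The label changes only when a relevant coordinate was flipped, so consecutive
# samples carry a signal about which coordinates matter.
informative = [(p, q) for p, q in pairs if parity_junta(p) != parity_junta(q)]
print(len(informative), "of", len(pairs), "steps change the label")
```

Independent uniform samples have no such consecutive structure, which is one way to see why the two settings can differ.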

2605.10230 2026-05-12 cs.LG

FORGE: Fragment-Oriented Ranking and Generation for Context-Aware Molecular Optimization

Qingchuan Zhang, He Cao, Hao Li, Yanjun Shao, Zhiyuan Liu, Shihang Wang, Shufang Xie, Shenghua Gao, Xinwu Ye

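Affiliations * University of Science and Technology of China; International Digital Economy Academy; Peking University; Yale University; National University of Singapore; Macao Polytechnic University; Zhongguancun Academy; University of Hong Kong

AI summary FORGE is a two-stage framework for molecular optimization that improves a molecule's properties through local edits while preserving structural similarity. It replaces human annotation with automatically mined fragment edit pairs: the first stage ranks candidate fragments in their molecular context to inject a chemical prior, and the second stage generates concrete fragment replacements. FORGE outperforms existing methods on several benchmarks, pointing to fragment-level supervision as a new route for molecular optimization.

Details
English abstract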

Molecular optimization seeks to improve a molecule through small structural edits while preserving similarity to the starting compound. Recent language-model approaches typically treat this task as prompt-conditioned sequence generation. However, relying on natural language introduces an inherent data-scaling bottleneck, often leads to chemical hallucinations, and ignores the strong context dependence of fragment effects. We present FORGE, a two-stage framework that reformulates molecular optimization as context-aware local editing. FORGE trains on automatically mined, verified low-to-high edit pairs instead of expensive human text annotations: Stage 1 ranks candidate fragments by their property contribution under the full molecular context to inject a chemical prior, and Stage 2 generates explicit fragment replacements. Built on a compact 0.6B language model, FORGE further adapts to unseen black-box objectives through in-context demonstrations. Across Prompt-MolOpt, PMO-1k and ChemCoTBench, FORGE consistently outperforms prior methods, including substantially larger language models and graph methods. These results highlight the value of explicit fragment-level supervision as a more easily obtainable, scalable, and hallucination-free alternative to natural language training.

2605.10229 2026-05-12 cs.CV cs.CY

VPD-100K: Towards Generalizable and Fine-grained Visual Privacy Protection

Xiaobin Hu, Enpu Zuo, Lanping Hu, Kaiwen Yang, Dianshu Liao, Tianyi Zhang, Bo Yin, Yinsi Zhou, Shidong Pan, Xiaoyu Sun

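Affiliations * National University of Singapore; Australian National University; New York University; The University of New South Wales

AI summary As visual data sharing becomes ubiquitous, privacy protection has become an important requirement, yet existing privacy detection algorithms are held back by the lack of comprehensive datasets. This paper presents VPD-100K, a large-scale, fine-grained visual privacy dataset covering four domains (Human Presence, On-Screen Personally Identifiable Information, Physical Identifiers, and Location Indicators) with 100,000 images and over 190,000 annotated object instances, characterized by long-tailed distributions, small objects, and high visual complexity. The study also designs a lightweight frequency-enhanced module that better captures subtle features of sensitive information; experiments show that both the dataset and the method perform well across a range of benchmarks.

Comments Accepted at the 43rd International Conference on Machine Learning (ICML 2026)

Details
English abstract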

Privacy protection has become a critical requirement in the era of ubiquitous visual data sharing, imposing higher demands on efficient and robust privacy detection algorithms. However, current robust detection models are severely hindered by the lack of comprehensive datasets. Existing privacy-oriented datasets often suffer from limited scale, coarse-grained annotations, and narrow domain coverage, failing to capture the intricate details of sensitive information in real-world environments. To bridge this gap, we present a large-scale, fine-grained Visual Privacy Dataset (VPD-100K), designed to facilitate generalized privacy detection. We establish a holistic taxonomy comprising four primary domains: Human Presence, On-Screen Personally Identifiable Information (PII), Physical Identifiers, and Location Indicators, containing 100,000 images annotated with 33 fine-grained classes and over 190,000 object instances. Statistical analysis reveals that our dataset features long-tailed distributions, small object scales, and high visual complexity. These characteristics make the dataset particularly valuable for demanding, unconstrained applications such as live streaming, where actors frequently face unintentional, real-time information leakage. Furthermore, we design an effective frequency-enhanced lightweight module, consisting of frequency-domain attention fusion and an adaptive spectral gating mechanism, that breaks the limitations of spatial pixel intensity to better capture the subtle details of sensitive information. Extensive experiments conducted on diverse image and streaming-video benchmarks consistently demonstrate the effectiveness of our VPD-100K dataset and the well-curated frequency mechanism. The code and dataset are available at https://vpd-100k.github.io/.

2605.10224 2026-05-12 cs.AI

Hypothesis-Driven Deep Research with Large Language Models: A Structured Methodology for Automated Knowledge Discovery

Michael Chin

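Affiliations * Independent Researcher

AI summary This paper proposes Hypothesis-Driven Deep Research (HDRI), a methodology that uses hypotheses as organizing instruments for the research process in order to make AI-assisted research more systematic and proactive. The method introduces six core principles and an eight-stage pipeline; its main innovations are a gap-driven iterative research mechanism and a traceable fact reasoning framework, which together automate knowledge discovery and verification. Experiments report gains in fact density, subject matching accuracy, and multi-source verification confidence, and five case studies validate its practical applicability.

Details
English abstract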

Current AI-powered research systems adopt a direct search-then-summarize paradigm that treats hypotheses as end products of scientific discovery. We argue this leaves a critical gap: hypotheses can serve a far more powerful role as organizational instruments that structure the research process itself. We propose the Hypothesis-Driven Deep Research (HDRI) methodology - the first framework using hypotheses to organize general-purpose deep research across arbitrary domains, rather than merely validating claims within specific domains. This transforms research from reactive information retrieval into proactive, verifiable, and iterative knowledge discovery. HDRI is formalized with six core principles and an eight-stage pipeline. A central innovation is the gap-driven iterative research mechanism - a closed-loop quality assurance system that automatically identifies informational and logical gaps, triggering targeted supplementary investigation. We further introduce a fact reasoning framework with traceable reasoning chains and quantified confidence propagation, a subject locking mechanism to prevent entity confusion, and a multi-dimensional quality assessment scheme. The methodology is realized in the INFOMINER system. Experiments demonstrate improvements of 22.4% in fact density, 90% subject matching accuracy, 0.92 multi-source verification confidence, and 14% completeness gain from gap-driven supplementation. Five case studies validate its practical applicability, achieving an average quality rating of 4.46/5.0.

2605.10223 2026-05-12 cs.AI cs.SE

Beyond Autonomy: A Dynamic Tiered AgentRunner Framework for Governable and Resilient Enterprise AI Execution

Kai Pan, Rong Hou

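Affiliations * a2alab

AI summary Current large language model agent frameworks emphasize autonomy and lack the safety and control mechanisms that enterprise deployment requires. This paper proposes the Dynamic Tiered AgentRunner framework, which uses Risk-Adaptive Tiering, a Separation of Powers architecture, and Resilience-by-Design to reach Pareto-optimal trade-offs between safety and efficiency, offering a safer, more governable, and more reliable approach to enterprise AI execution.

Comments 9 pages, 2 figures, 3 tables

Details
English abstract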

Current large language model agent frameworks prioritize autonomy but lack the governability mechanisms required for enterprise deployment. High-risk write operations proceed without independent review, complex tasks lack acceptance verification, and computational resources are allocated uniformly regardless of risk level. We propose the Dynamic Tiered AgentRunner, a controlled execution protocol distilled from a production-grade multi-tenant SaaS platform. The framework introduces three core mechanisms: (1) Risk-Adaptive Tiering that dynamically allocates computational resources and review intensity based on task risk profiles, achieving Pareto-optimal trade-offs between safety and efficiency; (2) Separation of Powers architecture where proposal, review, execution, and verification are performed by independent agents with physically isolated boundaries; and (3) Resilience-by-Design through a Verifier-Recovery closed loop that treats failure as a first-class system state. We formalize the tier selectio

2605.10218 2026-05-12 cs.CL

Relative Score Policy Optimization for Diffusion Language Models

Zichao Yu, Shengze Xu, Bingqing Jiang, Wenyi Zhang, Difan Zou

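Affiliations * University of Science and Technology of China; The Chinese University of Hong Kong; The University of Hong Kong

AI summary Diffusion large language models (dLLMs) are promising for parallel and efficient text generation, but improving their reasoning ability requires effective post-training. Standard reinforcement learning with verifiable rewards (RLVR) is hard to apply directly to dLLMs because tractable sequence-level log-ratios are unavailable, which forces reliance on high-variance ELBO approximations and hurts training stability. This paper proposes Relative Score Policy Optimization (RSPO), an RLVR method that interprets the reward advantage as a target for the relative log-ratio between the current and reference policies, and uses it to calibrate noisy estimates and make policy updates more accurate. Experiments show especially strong gains on planning tasks and competitive performance on mathematical reasoning.

Details
English abstract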

Diffusion large language models (dLLMs) offer a promising route to parallel and efficient text generation, but improving their reasoning ability requires effective post-training. Reinforcement learning with verifiable rewards (RLVR) is a natural choice for this purpose, yet its application to dLLMs is hindered by the absence of tractable sequence-level log-ratios, which are central to standard policy optimization. The lack of tractable sequence-level log-ratios forces existing methods to rely on high-variance ELBO-based approximations, where high verifier rewards can amplify inaccurate score estimates and destabilize RL training. To overcome this issue, we propose Relative Score Policy Optimization (RSPO), a simple RLVR method that uses verifiable rewards to calibrate noisy likelihood estimates in dLLMs. The core of our algorithm relies on a key observation: a reward advantage can be interpreted not only as an update direction, but also as a target for the relative log-ratio between the current and reference policies. Accordingly, RSPO calibrates this noisy relative log-ratio estimate by comparing its reward advantage with the reward-implied target relative log-ratio, updating the policy according to the gap between the current estimate and the target rather than the raw advantage alone. Experiments on mathematical reasoning and planning benchmarks show that RSPO yields especially strong gains on planning tasks and competitive mathematical-reasoning performance.