arXivDaily: arXiv daily academic digest, updated Monday through Friday
2605.10216 2026-05-12 cs.CL

The Impact of Editorial Intervention on Detecting Native Language Traces

Ahmet Yavuz Uluslu, Mark Gales, Kate Knill, Gerold Schneider

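Affiliations: University of Cambridge; University of Zurich

AI summary: This paper studies how editorial intervention affects the detection of an author's native-language traces, examining the robustness of native language identification models under varying degrees of grammatical error correction and paraphrasing. It finds that native-language features depend not only on surface-level grammatical errors but also on deeper factors such as lexico-semantic choices, pragmatic transfer, and cultural perspective. Light editing preserves these features and keeps identification accuracy high, whereas heavy rewriting substantially degrades model performance.

Abstract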

Native Language Identification (NLI) is the task of determining an author's native language (L1) from their non-native writings. With the advent of human-AI co-authorship, non-native texts are routinely corrected and rewritten by large language models, fundamentally altering the linguistic features NLI models depend on. In this paper, we investigate the robustness of L1 traces across increasing degrees of editorial intervention. By processing 450 essays from the Write & Improve 2024 corpus through varying levels of grammatical error correction (GEC) and paraphrasing, we demonstrate that L1 attribution does not entirely depend on surface-level errors. Instead, the detection models leverage deeper L1 features: unidiomatic lexico-semantic choices, pragmatic transfer, and the author's underlying cultural perspective. We find that minimal edits preserve these structural traces and maintain high profiling accuracy. In contrast, fluency edits and paraphrasing normalize these L1 features, leading to a severe degradation in performance.

2605.10211 2026-05-12 cs.CL cs.AI cs.IR

To Redact, or not to Redact? A Local LLM Approach to Deliberative Process Privilege Classification

Maik Larooij, David Graus

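Affiliations: University of Amsterdam

AI summary: This study targets "deliberative process privilege" information that must be redacted before government documents are released, and proposes an automatic classification method based on local large language models as an alternative to processing through third-party cloud APIs. Using small models such as Qwen3.5 9B, it achieves high-accuracy classification on consumer-grade hardware. Combining Chain-of-Thought prompting with few-shot prompting on error-based examples markedly improves recall and F2 score, approaching the performance of the commercial model Gemini 2.5 Flash. The analysis shows that deliberative content often contains first-person wording and opinion-expressing verbs, and these linguistic features are key cues for classification.

Comments: Accepted to The First Workshop on Artificial Intelligence & Open Government at the 21st International Conference on Artificial Intelligence and Law (ICAIL), June 8, 2026, Singapore

Abstract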

Government transparency laws, like the Freedom of Information Acts (FOIA) in the United States and United Kingdom, and the Woo (Open Government Act) in the Netherlands, grant citizens the right to directly request documents from the government. As these documents might contain sensitive information, such as personal information or threats to national security, the laws allow governments to redact sensitive parts of the documents prior to release. We build on prior research to perform automatic sensitivity classification for the FOIA Exemption 5 deliberative process privilege using Large Language Models (LLMs). However, processing documents not yet cleared for review via third-party cloud APIs is often legally or politically untenable. Therefore, in this work, we perform sensitivity classification with a small, local model, deployable on consumer-grade hardware (Qwen3.5 9B). We compare eight variants of applying LLMs for sentence classification, using well-known prompting techniques, and find that a combination of Chain-of-Thought prompting and few-shot prompting with error-based examples outperforms classification models of earlier work in terms of recall and F2 score. This method also closely approaches the performance of a widely used, cost-efficient commercial model (Gemini 2.5 Flash). In an additional analysis, we find that sentences that are predicted as deliberative contain more verbs that indicate the expression of opinions, and are more often phrased in the first person. Above all, deliberativeness seems characterized by the presence of a combination of multiple indicators, in particular the combination of first-person words with a verb for expressing opinion.
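The abstract above reports results in terms of recall and F2 score. As a reminder of what that metric rewards, here is a minimal sketch of the standard F-beta formula with beta = 2, which weights recall more heavily than precision. The function name and the precision/recall numbers are illustrative assumptions, not values from the paper.

```python
def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """F-beta score: weighted harmonic mean of precision and recall.

    With beta = 2 (the F2 score), recall counts more than precision.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical classifiers (numbers are made up for illustration only).
high_recall = f_beta(precision=0.5, recall=0.9)
high_precision = f_beta(precision=0.9, recall=0.5)

print(f"F2 of the high-recall classifier:    {high_recall:.3f}")
print(f"F2 of the high-precision classifier: {high_precision:.3f}")
```

In a redaction setting, a missed sensitive sentence is usually costlier than an unnecessary flag, which is one plausible reason to prefer a recall-weighted metric.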

2605.10210 2026-05-12 cs.RO cs.CV

Nano-U: Efficient Terrain Segmentation for Tiny Robot Navigation

Federico Pizzolato, Francesco Pasti, Nicola Bellotto

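Affiliations: Dept of Information Engineering, University of Padua

AI summary: This paper studies how to achieve efficient terrain segmentation on tiny robots to support autonomous navigation in unstructured outdoor environments. To address the difficulty of deploying existing models on resource-constrained microcontrollers, the authors propose Nano-U, a lightweight binary segmentation network, and train it with Quantization-Aware Distillation, which markedly improves model performance. The model performs well on multiple datasets and is deployed on a low-cost microcontroller through an improved compiler toolchain, delivering low-power, low-latency real-time terrain perception.

Comments: Code repository: https://github.com/federico-pizz/Nano-U

Abstract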

Terrain segmentation is a fundamental capability for autonomous mobile robots operating in unstructured outdoor environments. However, state-of-the-art models are incompatible with the memory and compute constraints typical of microcontrollers, limiting scalable deployment in small robotics platforms. To address this gap, we develop a complete framework for robust binary terrain segmentation on a low-cost microcontroller. At the core of our approach we design Nano-U, a highly compact binary segmentation network with a few thousand parameters. To compensate for the network's minimal capacity, we train Nano-U via Quantization-Aware Distillation (QAD), combining knowledge distillation and quantization-aware training. This allows the final quantized model to achieve excellent results on the Botanic Garden dataset and to perform very well on TinyAgri, a custom agricultural field dataset with more challenging scenes. We deploy the quantized Nano-U on a commodity microcontroller by extending MicroFlow, a compiler-based inference engine for TinyML implemented in Rust. By eliminating interpreter overhead and dynamic memory allocation, the quantized model executes on an ESP32-S3 with a minimal memory footprint and low latency. This compiler-based execution demonstrates a viable and energy-efficient solution for perception on low-cost robotic platforms.
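The abstract describes Quantization-Aware Distillation as a combination of knowledge distillation and quantization-aware training. The sketch below illustrates only the quantization side, using the generic "fake quantization" idea (quantize to an integer grid, then dequantize) that quantization-aware training commonly relies on. It is not taken from the Nano-U code; the function name, the int8 range, and the scale value are assumptions for illustration.

```python
def fake_quantize(x: float, scale: float, zero_point: int = 0,
                  qmin: int = -128, qmax: int = 127) -> float:
    """Simulate int8 affine quantization of a single value.

    The value is mapped to the integer grid, clamped to [qmin, qmax],
    and mapped back to a float, so downstream computation sees the
    rounding and clipping error that the deployed model would see.
    """
    q = round(x / scale) + zero_point      # nearest integer level
    q = max(qmin, min(qmax, q))            # clamp to the int8 range
    return (q - zero_point) * scale        # dequantize


# Hypothetical weight values with an assumed scale of 0.1.
print(fake_quantize(0.26, scale=0.1))    # rounded to the nearest grid point
print(fake_quantize(100.0, scale=0.1))   # clipped at the top of the range
```

During training, such an operation would typically be applied to weights and activations so that a compact student learns under the same numeric constraints it faces on the microcontroller.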

2605.10205 2026-05-12 cs.LG

Unveiling High-Probability Generalization in Decentralized SGD

Jiahuan Wang, Ping Luo, Ziqing Wen, Dongsheng Li, Tao Sun

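Affiliations: College of Computer Science and Technology

AI summary: This paper studies the generalization performance of decentralized stochastic gradient descent (D-SGD) in large-scale distributed learning, aiming to close the theoretical gap between high-probability generalization bounds for traditional SGD and those for D-SGD. The authors develop a learning theory based on pointwise uniform stability and derive high-probability generalization bounds for D-SGD in convex, strongly convex, and non-convex settings, targeting the optimal rate $\mathcal{O}\left(\frac{1}{\sqrt{mn}}\log (1/\delta)\right)$. They also analyze gradient-based measures and optimization error bounds in the non-convex case. Accounting for communication overhead, the study further analyzes the generalization of local models in time-varying frameworks.

Abstract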

Decentralized stochastic gradient descent (D-SGD) is an efficient method for large-scale distributed learning. Existing generalization studies mainly address expected results, achieving rates limited to $\mathcal{O}\left(\frac{1}{\delta\sqrt{mn}}\right)$, where $\delta$ is the confidence parameter, $m$ the number of workers, and $n$ the sample size. When $m=1$, D-SGD reduces to traditional SGD, whose optimal high-probability generalization bound is $\mathcal{O}\left(\frac{1}{\sqrt{n}}\log (1/\delta)\right)$. This discrepancy reveals a gap between high-probability guarantees for SGD and those for D-SGD. To close this gap, we develop a high-probability learning theory for D-SGD, aiming for the optimal $\mathcal{O}\left(\frac{1}{\sqrt{mn}}\log (1/\delta)\right)$ rate. We refine bounds for D-SGD using pointwise uniform stability in distributed learning (a weaker notion than uniform stability) and analyze them across convex, strongly convex, and non-convex settings. We also provide high-probability results for gradient-based measures in non-convex cases where only local minima exist, and derive optimization error and excess risk bounds. Finally, accounting for communication overhead, we analyze generalization bounds for local models within time-varying frameworks.
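For readers unfamiliar with D-SGD, the following sketch shows one commonly used form of the update: each of the m workers averages its neighbors' parameters through a mixing matrix and then takes a local gradient step. The abstract does not spell out the exact variant analyzed, so the update order, the ring-shaped mixing matrix, the quadratic local objectives, and the use of exact rather than stochastic gradients are all simplifying assumptions.

```python
def dsgd_step(params, grads, mixing, lr):
    """One D-SGD round for scalar parameters.

    params[i] is worker i's parameter, grads[i] its local gradient,
    mixing[i][j] the weight worker i gives to worker j's parameter.
    """
    m = len(params)
    return [
        sum(mixing[i][j] * params[j] for j in range(m)) - lr * grads[i]
        for i in range(m)
    ]


# Toy setup: worker i minimizes 0.5 * (x - targets[i])**2, so the global
# minimizer of the averaged objective is the mean of the targets (3.0).
targets = [1.0, 2.0, 3.0, 6.0]
W = [  # doubly stochastic mixing matrix for a 4-node ring (assumed)
    [0.50, 0.25, 0.00, 0.25],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.25, 0.00, 0.25, 0.50],
]

params = [0.0] * 4
for _ in range(500):
    grads = [x - a for x, a in zip(params, targets)]  # exact local gradients
    params = dsgd_step(params, grads, W, lr=0.05)

average = sum(params) / len(params)
print("local parameters:", [round(x, 3) for x in params])
print("network average: ", round(average, 6))
```

With a constant step size the local parameters stay slightly apart while their average converges to the global minimizer, which is why analyses such as the one described above distinguish between the averaged model and the local models.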

2605.10204 2026-05-12 cs.CV

3DReflecNet: A Large-Scale Dataset for 3D Reconstruction of Reflective, Transparent, and Low-Texture Objects

Zhicheng Liang, Haoyi Yu, Boyan Li, Dayou Zhang, Zijian Cao, Tianyi Gong, Junhua Liu, Shuguang Cui, Fangxin Wang

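Affiliations: The Chinese University of Hong Kong, Shenzhen; Capital Normal University; University of Southern California

AI summary: This paper introduces 3DReflecNet, a large-scale dataset designed for 3D vision methods that reconstruct objects with reflective, transparent, and low-texture surfaces. It contains more than 120,000 synthetic instances produced by physically based rendering and more than 1,000 real objects captured with consumer devices, exceeds 22 TB in total, and spans diverse materials, complex lighting conditions, and geometric forms. The authors also design benchmarks for five core tasks, which reveal the limitations of existing methods on such challenging materials and motivate more robust 3D vision models.

Comments: This paper has been accepted by CVPR 2026 Oral

Abstract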

Accurate 3D reconstruction of objects with reflective, transparent, or low-texture surfaces remains notoriously challenging. Such materials often violate key assumptions in multi-view reconstruction pipelines, such as photometric consistency and the availability of distinct geometric texture cues. Existing datasets primarily focus on diffuse, textured objects, and therefore provide limited insight into performance under real-world material complexities. We introduce 3DReflecNet, a large-scale hybrid dataset exceeding 22 TB that is specifically designed to benchmark and advance 3D vision methods for these challenging materials. 3DReflecNet combines two types of data: over 120,000 synthetic instances generated via physically-based rendering of more than 12,000 shapes, and over 1,000 real-world objects captured using consumer devices. Together, these data consist of more than 7 million multi-view frames. The dataset spans diverse materials, complex lighting conditions, and a wide range of geometric forms, including shapes generated from both real and LLM-synthesized 2D images using diffusion-based pipelines. To support robust evaluation, we design benchmarks for five core tasks: image matching, structure-from-motion, novel view synthesis, reflection removal, and relighting. Extensive experiments demonstrate that state-of-the-art methods struggle to maintain accuracy across these settings, highlighting the need for more resilient 3D vision models.

2605.10203 2026-05-12 cs.SD eess.AS

Polyphonia: Zero-Shot Timbre Transfer in Polyphonic Music with Acoustic-Informed Attention Calibration

Haowen Li, Tianxiang Li, Yi Yang, Boyu Cao, Qi Liu

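Affiliations: School of Future Technology, South China University of Technology, Guangzhou, China

AI summary: This study proposes Polyphonia, a zero-shot timbre transfer framework that addresses the tendency of the background accompaniment to be damaged when the timbre of a specific stem in polyphonic music is edited. Its core is an Acoustic-Informed Attention Calibration mechanism that uses a probabilistic acoustic prior to establish coarse boundaries, so that the target stem can be localized and modified more precisely while the semantic integrity of non-target stems is preserved. Experiments show a 15.5% improvement in target alignment over existing methods while maintaining competitive music fidelity and non-target integrity.

Comments: Accepted by ICML 2026

Abstract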

The advancement of diffusion-based text-to-music generation has opened new avenues for zero-shot music editing. However, existing methods fail to achieve stem-specific timbre transfer, which requires altering specific stems while strictly preserving the background accompaniment. This limitation severely hinders practical application, since real-world production necessitates precise manipulation of components within dense mixtures. Our key finding is that, while vanilla cross-attention captures semantic features of stems, it lacks the spectral resolution to strictly localize targets in dense mixtures, leading to boundary leakage. To resolve this dilemma, we propose Polyphonia, a zero-shot editing framework with Acoustic-Informed Attention Calibration. Rather than relying solely on diffuse semantic attention, Polyphonia leverages a probabilistic acoustic prior to establish coarse boundaries, enabling precise semantic synthesis while preserving non-target stems. For evaluation, we propose PolyEvalPrompts, a standardized prompt set with 1,170 timbre transfer tasks in polyphonic music. Specifically, Polyphonia achieves an increase of 15.5% in target alignment compared to baselines, while maintaining competitive music fidelity and non-target integrity.

2605.10202 2026-05-12 cs.LG cs.CL

Task-Aware Calibration: Provably Optimal Decoding in LLMs

Tim Tomov, Dominik Fuchsgruber, Rajeev Verma, Stephan Günnemann

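Affiliations: School of Computation, Information & Technology, Technical University of Munich; Munich Data Science Institute; Munich Center for Machine Learning; University of Amsterdam

AI summary: This paper studies suboptimal decisions in large language model (LLM) decoding that arise when the model's predictive distribution is misaligned with the true generating distribution. The authors propose task calibration, which calibrates the model's predictive distribution in a task-induced latent space and thereby enables a better decoding strategy. Building on Minimum Bayes Risk (MBR) decoding, they show that decoding on the task-calibrated latent distribution is optimal, and they introduce Task Calibration Error (TCE) as a metric of calibration quality. Experiments show that the method improves generation quality across multiple tasks.

Abstract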

LLM decoding often relies on the model's predictive distribution to generate an output. Consequently, misalignment with respect to the true generating distribution leads to suboptimal decisions in practice. While a natural solution is to calibrate the model's output distribution, for LLMs, this is ill-posed at the combinatorially vast level of free-form language. We address this by building on the insight that in many tasks, these free-form outputs can be interpreted in a semantically meaningful latent structure, for example, discrete class labels, integers, or sets. We introduce task calibration as a paradigm to calibrate the model's predictive distribution in the task-induced latent space. We apply a decision-theoretic result to show that Minimum Bayes Risk (MBR) decoding on the task-calibrated latent distribution is the optimal decoding strategy on latent model beliefs. Empirically, it consistently improves generation quality across different tasks and baselines. We also introduce Task Calibration Error (TCE), an application-aware calibration metric that quantifies the excess loss due to miscalibration. Our work demonstrates that task calibration enables more reliable model decisions across various tasks and applications.
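To make the decoding rule in the abstract above concrete, the sketch below applies generic Minimum Bayes Risk decoding to a small latent distribution over integer answers: the decoder picks the candidate with the lowest expected task loss under the model's beliefs. It assumes the latent distribution is already calibrated and restricts candidates to the support of that distribution. The belief values and function names are hypothetical, and the calibration step itself is not shown.

```python
def mbr_decode(latent_probs, loss):
    """Return the candidate that minimizes expected loss under latent_probs.

    latent_probs maps each latent outcome (e.g. an integer answer) to its
    probability; loss(action, outcome) is the task loss.
    """
    def expected_loss(action):
        return sum(p * loss(action, y) for y, p in latent_probs.items())

    return min(latent_probs, key=expected_loss)


# Hypothetical (assumed calibrated) beliefs over an integer-valued answer.
beliefs = {1: 0.4, 2: 0.15, 3: 0.15, 10: 0.3}

# Under 0-1 loss the MBR decision is the most probable answer.
mode_answer = mbr_decode(beliefs, lambda a, y: 0.0 if a == y else 1.0)

# Under absolute-error loss the MBR decision is the median instead.
median_answer = mbr_decode(beliefs, lambda a, y: abs(a - y))

print(mode_answer, median_answer)
```

The two decisions differ even though the beliefs are identical, which illustrates why the optimal output depends on the task loss and why miscalibrated beliefs can translate into excess loss.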

2605.10199 2026-05-12 cs.CL eess.AS

How Should LLMs Listen While Speaking? A Study of User-Stream Routing in Full-Duplex Spoken Dialogue

Hui Lu, Xueyuan Chen, Huimeng Wang, Shuhai Peng, Shiyin Kang, Xixin Wu, Zhiyong Wu

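Affiliations: The Chinese University of Hong Kong; SenseTime Research; Tsinghua University

AI summary: This paper studies how a large language model (LLM) in full-duplex spoken dialogue can keep listening to user input while generating its own spoken response. The authors argue that how the user stream is routed into the LLM is a key architectural question for system performance and compare two routing strategies: injecting the user stream directly into the model input, and accessing it as external memory through cross-attention. Experiments show that direct injection performs better on semantic understanding and question answering but is prone to context corruption in situations such as user interruptions, whereas cross-attention routing is slightly weaker on question answering but keeps the generation context more stable and is more robust. The study offers practical guidance for designing full-duplex spoken dialogue systems.

Abstract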

Full-duplex spoken dialogue requires a model to keep listening while generating its own spoken response. This is challenging for large language models (LLMs), which are designed to extend a single coherent sequence and do not naturally support user input arriving during generation. We argue that how the user stream is routed into the LLM is therefore a key architectural question for full-duplex modeling. To study this question, we extend a text-only LLM into a unified full-duplex spoken dialogue system and compare two routing strategies under a shared training pipeline: (i) channel fusion, which injects the user stream directly into the LLM input, and (ii) cross-attention routing, which keeps the user stream as external memory accessed through cross-attention adapters. Experiments on spoken question answering and full-duplex interaction benchmarks reveal a clear tradeoff. Channel fusion yields stronger semantic grounding and consistently better question-answering performance. However, under semantically overlapping conditions such as user interruptions, it is more vulnerable to context corruption: if the model fails to stop in time, the overlapping user stream can interfere with ongoing generation and lead to semantically incoherent continuations. Cross-attention routing underperforms on question answering, but better preserves the LLM generation context and is more robust to this failure mode. These results establish user-stream routing as a central design axis in full-duplex spoken dialogue and offer practical guidance on the tradeoff between semantic integration and context robustness. We provide a demo page for qualitative inspection.

2605.10198 2026-05-12 cs.LG cs.AI

Empty SPACE: Cross-Attention Sparsity for Concept Erasure in Diffusion Models

Nicola Novello, Andrea M. Tonello

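Affiliations: University of Klagenfurt

AI summary: This paper studies how to erase specific concepts from text-to-image diffusion models to avoid generating copyrighted or inappropriate content. To address the drop in effectiveness of existing closed-form concept erasure methods on larger models, the authors propose SPACE, an efficient concept erasure method based on sparse cross-attention. SPACE iteratively updates the model's cross-attention parameters to erase concepts and induce parameter sparsity at the same time, which improves erasure effectiveness and robustness and substantially reduces storage requirements.

Abstract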

Erasing specific concepts from text-to-image diffusion models is essential for avoiding the generation of copyrighted and explicit content. Closed-form concept erasure methods offer a fast alternative to backpropagation-based techniques, but they become less effective when scaling from smaller models such as Stable Diffusion 1.5 to larger models like Stable Diffusion XL. To maintain erasure effectiveness in these larger-scale architectures, we propose SParse cross-Attention-based Concept Erasure (SPACE). SPACE iteratively modifies the cross-attention parameters of a model with a closed-form update that jointly induces sparsity and erases target concepts. By concentrating the concept mapping to a lower-dimensional subspace, SPACE achieves superior erasure efficacy compared to dense baselines. Extensive experimental results show improvements in erasure effectiveness and robustness against adversarial prompts. Furthermore, SPACE achieves 80%-90% cross-attention sparsity, reducing the storage requirements for saving the modified parameters by 70%, demonstrating its memory efficiency.

2605.10196 2026-05-12 cs.LG

Many Needles in a Haystack: Active Hit Discovery for Perturbation Experiments

Andrea Rubbi, Arpit Merchant, Samuel Ogden, Amir Akbarnejad, Pietro Liò, Sattar Vakili, Mo Lotfollahi

Affiliations: Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK; Cambridge Center for AI in Medicine, University of Cambridge, Cambridge, UK; Cambridge Stem Cell Institute, University of Cambridge, Cambridge, UK; Department of Computer Science and Technology, University of Cambridge, Cambridge, UK; MediaTek Research, Cambridge, UK

AI summary: This study addresses how to efficiently discover interventions with significant phenotypic effects in high-throughput gene perturbation experiments and proposes a probabilistic active experimental design method. Its core is the Probability-of-Hit acquisition function, which uses the posterior probability to assess directly whether a candidate perturbation exceeds a predefined effect threshold, and thereby identifies effective interventions more efficiently. The method performs strongly on both synthetic and real biological data, with improvements of up to 6.4% over baselines on some datasets.

Comments: To be published in International Conference on Machine Learning (ICML) 2026

Abstract

High-throughput gene perturbation experiments can test several genetic interventions in parallel, yet experimental budgets remain limited. A central goal is hit discovery: identifying as many perturbations as possible whose phenotypic effect exceeds a predefined threshold. Pure exploration strategies are statistically inefficient, wasting budget on low-value regions. Bayesian optimization methods offer a principled alternative but target a single global optimum, over-exploiting dominant modes while neglecting other high-value regions. We formalize hit discovery as a sequential experimental design problem and propose Probability-of-Hit, an acquisition function that directly targets threshold exceedance by ranking candidates according to their posterior probability of being a hit. We prove asymptotic optimality of this approach and demonstrate strong empirical performance on both synthetic benchmarks and real biological immunology datasets, including up to 6.4% improvement over baselines on the Schmidt IL-2 dataset.
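The acquisition rule described above ranks candidates by their posterior probability of exceeding the hit threshold. As a rough sketch, assume each candidate's effect has an independent Gaussian posterior with a known mean and standard deviation. That Gaussian form, the candidate names, and the numbers are assumptions for illustration; the abstract does not specify the surrogate model.

```python
import math


def probability_of_hit(mean: float, std: float, threshold: float) -> float:
    """P(effect > threshold) under an assumed Gaussian posterior N(mean, std^2)."""
    if std <= 0.0:
        return 1.0 if mean > threshold else 0.0
    z = (mean - threshold) / std
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF


def rank_candidates(posteriors: dict, threshold: float) -> list:
    """Sort candidate names by decreasing probability of being a hit."""
    return sorted(
        posteriors,
        key=lambda name: probability_of_hit(*posteriors[name], threshold),
        reverse=True,
    )


# Hypothetical posteriors: name -> (posterior mean, posterior std).
posteriors = {
    "gene_A": (1.2, 0.1),   # modest effect, very certain
    "gene_B": (0.9, 1.0),   # just below threshold, uncertain
    "gene_C": (2.0, 3.0),   # largest mean, very uncertain
}
ranking = rank_candidates(posteriors, threshold=1.0)
print(ranking)
```

In this toy case gene_C has the largest posterior mean, yet gene_A ranks first because it is the most likely to clear the threshold. That contrast is the difference between targeting threshold exceedance and targeting a single optimum.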

2605.10194 2026-05-12 cs.AI cs.LG

TRACE: Distilling Where It Matters via Token-Routed Self On-Policy Alignment

Jiaxuan Wang, Xuan Ouyang, Zhiyu Chen, Yulan Hu, Zheng Pan, Xin Li, Lan-Zhe Guo

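Affiliations: State Key Laboratory of Novel Software Technology, Nanjing University; School of Intelligence Science and Technology, Nanjing University; AMAP, Alibaba Group; University of Wisconsin–Madison; Tsinghua University

AI summary: This paper proposes TRACE, a new strategy for improving self-distillation in reinforcement learning with verifiable rewards. By aligning only on annotator-marked critical reasoning spans, it reduces redundant gradient updates and privileged-information leakage. TRACE combines forward KL divergence, reverse KL divergence, and GRPO, and gradually reduces the influence of the KL channel early in training. Experiments show that TRACE outperforms existing methods on multiple math benchmarks while preserving performance on out-of-distribution tasks, demonstrating its effectiveness for improving reasoning and generalization.

Comments: work in progress

Abstract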

On-policy self-distillation (self-OPD) densifies reinforcement learning with verifiable rewards (RLVR) by letting a policy teach itself under privileged context. We find that when this guidance spans the full response, all-token KL spends gradients on mostly redundant positions and amplifies privileged-information leakage, causing entropy rise, shortened reasoning, and out-of-distribution degradation in long-horizon math training. We propose Token-Routed Alignment for Critical rEasoning (TRACE), which distills only on annotator-marked critical spans: forward KL on key spans of correct rollouts, optional reverse KL on localized error spans, and GRPO on all remaining tokens, with the KL channel annealed away after a short warm-up. Our analysis explains TRACE through two effects: forward KL provides non-vanishing lift to teacher-supported tokens that the student under-allocates, while span masking and decay keep cumulative privileged-gradient exposure finite. On four held-out math benchmarks plus GPQA-Diamond, TRACE improves over GRPO by 2.76 percentage points on average and preserves the Qwen3-8B base OOD score on GPQA-Diamond, where GRPO and all-token self-OPD baselines degrade. Gains persist under online self-annotation (+1.90 percentage points, about 69% of the strong-API gain), reducing the concern that TRACE merely imports external annotator capability. Across scales, the best routed action is base-dependent: on Qwen3-8B it is forward KL on key spans, while on Qwen3-1.7B it shifts to reverse KL on error spans.
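The token routing described in the abstract can be pictured with a small sketch: per-token teacher and student distributions are compared with forward KL on tokens marked as key spans and with reverse KL on tokens marked as error spans, while all other tokens contribute nothing to the distillation term. The sketch assumes the usual convention that forward KL means KL(teacher || student). The GRPO term and the annealing schedule are omitted, and all names and numbers are hypothetical rather than taken from the paper.

```python
import math


def kl(p, q):
    """KL(p || q) for two discrete distributions with q > 0 everywhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def routed_kl_loss(teacher, student, key_mask, error_mask):
    """Sum the distillation loss over routed tokens only.

    Key-span tokens use forward KL(teacher || student); error-span tokens
    use reverse KL(student || teacher); unmarked tokens are skipped.
    """
    total = 0.0
    for t, s, is_key, is_err in zip(teacher, student, key_mask, error_mask):
        if is_key:
            total += kl(t, s)
        elif is_err:
            total += kl(s, t)
    return total


# Three token positions over a two-word vocabulary (made-up numbers).
teacher = [[0.5, 0.5], [0.9, 0.1], [0.2, 0.8]]
student = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
loss = routed_kl_loss(
    teacher, student,
    key_mask=[False, True, False],
    error_mask=[False, False, True],
)
print(round(loss, 4))
```

Because unmarked positions are skipped entirely, the number of tokens exposed to teacher-side gradients is bounded by the size of the marked spans, which matches the intuition given in the abstract for limiting privileged-information leakage.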

2605.10190 2026-05-12 cs.CV

DetRefiner: Model-Agnostic Detection Refinement with Feature Fusion Transformer

Soichiro Okazaki, Tatsuya Sasaki, Hiroki Ohashi

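Affiliations: Hitachi, Ltd. Research and Development Group

AI summary: DetRefiner is a model-agnostic detection refinement framework for open-vocabulary object detection that aims to improve detection of both seen and unseen categories. It fuses global image features and local patch-level features through a lightweight Transformer encoder and infers attribute reliability to recalibrate the base detector's confidence. DetRefiner requires neither the base model's internal features nor retraining; it only provides auxiliary calibration of detection results at inference time. It markedly improves several open-vocabulary detectors on multiple datasets, with gains of up to +10.1 AP on unseen categories.

Comments: CVPR 2026 Findings

Abstract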

Open-vocabulary object detection (OVOD) aims to detect both seen and unseen categories, yet existing methods often struggle to generalize to novel objects due to limited integration of global and local contextual cues. We propose DetRefiner, a simple yet effective plug-and-play framework that learns to fuse global and local features to refine open-vocabulary detection. DetRefiner processes global image features and patch-level image features from foundation models (e.g., DINOv3) through a lightweight Transformer encoder. The encoder produces a class vector capturing image-level attributes and patch vectors representing local region attributes, from which attribute reliability is inferred to recalibrate the base model's confidence. Notably, DetRefiner is trained independently of the base OVOD model, requiring neither access to its internal features nor retraining. At inference, it operates solely on the base detector's predictions, producing auxiliary calibration scores that are merged with the base detector's scores to yield the final refined confidence. Despite this simplicity, DetRefiner consistently enhances multiple OVOD models across COCO, LVIS, ODinW13, and Pascal VOC, achieving gains of up to +10.1 AP on novel categories. These results highlight that learning to fuse global and local representations offers a powerful and general mechanism for advancing open-world object detection. Our code and models are available at https://github.com/hitachi-rd-cv/detrefiner.

2605.10189 2026-05-12 cs.LG cs.AI

ProteinOPD: Towards Effective and Efficient Preference Alignment for Protein Design

Yulin Zhang, He Cao, Zihao Jiang, Chenyi Zi, Zhipeng Zhou, Zijing Liu, Yu Li, Jia Li, Ziqi Gao

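Affiliations: Tsinghua University; International Digital Economy Academy; Hong Kong University of Science and Technology (Guangzhou); Nanyang Technological University

AI summary: This paper proposes ProteinOPD, a multi-objective preference alignment framework that addresses the tension in protein design between preference steering and preserving the model's original design capability. Drawing on On-Policy Distillation (OPD), the method performs token-level knowledge distillation on the student model's own trajectories, merging the knowledge of teachers for multiple preference objectives into one shared student. This balances multiple competing objectives while retaining the designability of the protein language model. Experiments show that ProteinOPD improves target-preference performance while training substantially faster than reinforcement-learning-based alignment methods.

Abstract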

Designing proteins with desired functions or properties represents a core goal in synthetic biology and drug discovery. Recent advances in protein language models (PLMs) have enabled the generation of highly designable protein sequences, while preference alignment provides a promising way to steer designs toward desired functions and properties. Nevertheless, they often trigger catastrophic forgetting of pretrained knowledge, degrading basic designability and failing to balance multiple competing objectives. To address these issues, we draw inspiration from On-Policy Distillation (OPD), an advanced post-training method renowned for mitigating catastrophic forgetting through its mode-seeking nature. In this work, we propose ProteinOPD, a multi-objective preference alignment framework that can effectively balance multiple preference objectives while maintaining the inherent designability of PLMs. ProteinOPD adapts a pretrained PLM into preference-specific teachers and distills their knowledge into a shared student via token-level OPD on the student's own trajectories. During this process, the student is aligned to a unique normalized geometric consensus of weighted teachers while ensuring bounded optimization under conflicts. This bridges the gap for OPD in multi-objective/teacher alignment. Extensive experiments show that ProteinOPD achieves substantial gains on target preference objectives without compromising the designability, with an 8x training speedup over RL-based alignment competitors.
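The phrase "normalized geometric consensus of weighted teachers" admits a natural reading as a weighted geometric mean of the teachers' next-token distributions, renormalized to sum to one. The sketch below implements that reading only as an illustration. The abstract does not give the exact definition, so the formula, the function name, and the toy distributions are assumptions.

```python
import math


def geometric_consensus(teacher_probs, weights):
    """Normalized weighted geometric mean of discrete distributions.

    consensus[v] is proportional to the product over teachers k of
    teacher_probs[k][v] ** weights[k]. Assumes strictly positive
    probabilities; computed in log space for numerical stability.
    """
    vocab = len(teacher_probs[0])
    log_scores = [
        sum(w * math.log(p[v]) for p, w in zip(teacher_probs, weights))
        for v in range(vocab)
    ]
    peak = max(log_scores)
    unnorm = [math.exp(s - peak) for s in log_scores]
    z = sum(unnorm)
    return [u / z for u in unnorm]


# Two hypothetical preference-specific teachers that disagree.
teachers = [[0.8, 0.2], [0.2, 0.8]]
balanced = geometric_consensus(teachers, weights=[0.5, 0.5])
skewed = geometric_consensus(teachers, weights=[0.9, 0.1])
print(balanced, [round(p, 3) for p in skewed])
```

Under this reading, equal weights on symmetric, conflicting teachers yield a uniform target, and shifting the weights moves the target smoothly toward the favored teacher.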

2605.10186 2026-05-12 cs.CL cs.AI

LegalCiteBench: Evaluating Citation Reliability in Legal Language Models

Sijia Chen, Hang Yin, Shunfan Zhou

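Affiliations: Northeastern University; Phala

AI summary: This paper presents LegalCiteBench, a benchmark for evaluating the citation reliability of legal language models when no external information is available. It finds that even the strongest models struggle to recover or produce correct legal citations in a closed-book setting, with Misleading Answer Rates above 94% for 20 of the 21 evaluated models on retrieval-heavy tasks. The benchmark comprises five citation-centric tasks designed to diagnose how models generate incorrect citations, verify citation accuracy, and abstain from answering when external grounding is missing.

Comments: Preprint. 23 pages including references and appendices

Abstract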

Large language models (LLMs) are increasingly integrated into legal drafting and research workflows, where incorrect citations or fabricated precedents can cause serious professional harm. Existing legal benchmarks largely emphasize statutory reasoning, contract understanding, or general legal question answering, but they do not directly study a central common-law failure mode: when asked to provide case authorities without external grounding, models may return plausible-looking but incorrect citations or cases. We introduce LegalCiteBench, a benchmark for studying closed-book citation recovery, citation verification, and case matching in legal language models. LegalCiteBench contains approximately 24K evaluation instances constructed from 1,000 real U.S. judicial opinions from the Case Law Access Project. The benchmark covers five citation-centric tasks: citation retrieval, citation completion, citation error detection, case matching, and case verification and correction. Across 21 LLMs, exact citation recovery remains highly challenging in this closed-book setting: even the strongest models score below 7/100 on citation retrieval and completion. Within the evaluated models, scale and legal-domain pretraining provide limited gains and do not resolve this difficulty. Models also frequently provide concrete but incorrect or low-overlap authorities under our evaluation protocol, with Misleading Answer Rates (MAR) exceeding 94% for 20 of 21 evaluated models on retrieval-heavy tasks. A prompt-only abstention experiment shows that explicit uncertainty instructions reduce some confident fabrication but do not improve citation correctness. LegalCiteBench is intended as a diagnostic framework for studying authority generation failures, verification behavior, and abstention when external grounding is absent, incomplete, or bypassed.

2605.10184 2026-05-12 cs.CV cs.AI

Developing a foundation model for high-resolution remote sensing data of the Netherlands

Paul Vermeeren, Heysem Kaya

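Affiliations: Utrecht University, Department of Information and Computing Sciences

AI summary: This paper presents a foundation model built on high-resolution (1.2 m) satellite imagery of the Netherlands. It combines a convolutional neural network with a Vision Transformer to capture both fine textures, edges, and small objects and large terrain structures, elevation patterns, and land-cover distributions. By taking temporal data as input, the model learns contextual information across time and better captures temporal dependencies such as topographic features, land-cover changes, and seasonal dynamics, which reduces feature ambiguity, strengthens representation learning, and improves generalization with few labeled samples. Experiments show strong performance on tasks such as vegetation monitoring in the Netherlands and results competitive with state-of-the-art models on several global benchmarks, demonstrating that general representations can be learned with limited data and parameters.

Comments: 9 pages, 4 figures, under review in a journal

Abstract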

We develop a foundation model using 1.2 m high-resolution satellite images of the Netherlands. By combining a Convolutional Neural Network and a Vision Transformer, the model captures both high- and low-frequency landscape features: fine textures, edges, and small objects as well as large terrain structures, elevation patterns, and land-cover distributions. Leveraging temporal data as input, the model learns from broader contextual information across time, allowing it to exploit temporal dependencies such as topographic features, land-cover changes, and seasonal dynamics. These additional constraints reduce feature ambiguity, improve representation learning, and enable better generalization with fewer labeled samples. The foundation model is evaluated on multiple downstream tasks, ranging from use cases within the Netherlands to global benchmarking datasets. On the vegetation monitoring dataset of the Netherlands, the model shows clear performance improvements by incorporating temporal information instead of relying on a single time point. Despite using a smaller model and less pretraining data limited to the Netherlands, it achieves competitive results on global benchmarks when compared to state-of-the-art models. These results demonstrate that the model can learn rich, generalizable representations from limited data while using a fraction of the parameters of larger state-of-the-art remote sensing models. To maximize reproducibility and reuse, we made the scripts and the model accessible on GitHub.

2605.10183 2026-05-12 cs.LG

Fix the Loss, Not the Radius: Rethinking the Adversarial Perturbation of Sharpness-Aware Minimization

Jinping Wang, Qinhan Liu, Zhiwu Xie, Zhiqiang Gao

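Affiliations: CSMT, Wenzhou-Kean University; International Frontier Interdisciplinary Research Institute, Wenzhou-Kean University

AI summary: This paper revisits the mismatch between the loss and the perturbation radius in Sharpness-Aware Minimization (SAM) and proposes Loss-Equated SAM (LE-SAM), which fixes a loss-space budget instead of the conventional fixed parameter-space perturbation radius. This weakens gradient-norm-dominated learning signals and steers optimization toward curvature-dominated flat minima. Experiments show that LE-SAM generalizes better on multiple benchmarks, outperforming the original SAM and its variants and reaching state-of-the-art performance.

Comments: Accepted by ICML 2026

Abstract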

Sharpness-Aware Minimization (SAM) improves generalization by minimizing the worst-case loss within a fixed parameter-space radius neighborhood. SAM and its variants mainly rely on a first-order linearized surrogate, while flat minima are inherently a second-order (curvature) notion. We revisit this mismatch and propose Loss-Equated SAM (LE-SAM), which inverts the traditional SAM mechanism by replacing the fixed perturbation radius with a fixed loss-space budget, effectively removing gradient-norm-dominated learning signals and shifting optimization toward curvature-dominated terms. Extensive experiments across diverse benchmarks and tasks demonstrate the strong generalization ability of LE-SAM, which consistently outperforms SAM and its variants, achieving state-of-the-art performance.
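For context, the standard SAM formulation that the abstract contrasts against can be written as follows. This is the well-known fixed-radius objective and its first-order solution from the SAM literature, with $\rho$ denoting the perturbation radius. It is not the LE-SAM update, whose exact form the abstract does not give.

```latex
\[
\min_{w}\; \max_{\|\epsilon\|_2 \le \rho} L(w + \epsilon),
\qquad
\hat{\epsilon}(w) = \rho\,\frac{\nabla L(w)}{\|\nabla L(w)\|_2},
\qquad
L\bigl(w + \hat{\epsilon}(w)\bigr) \approx L(w) + \rho\,\|\nabla L(w)\|_2 .
\]
```

Under this first-order approximation the sharpness term equals $\rho\,\|\nabla L(w)\|_2$, so at a fixed radius it scales with the gradient norm rather than with curvature. This appears to be the gradient-norm-dominated signal that a fixed loss-space budget is meant to remove.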

2605.10179 2026-05-12 cs.LG cs.AI

One-Step Graph-Structured Neural Flows for Irregular Multivariate Time Series Classification

Mengzhou Gao, Kaiwei Wang, Pengfei Jiao

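Affiliations: School of Cyberspace, Hangzhou Dianzi University

AI summary: This study proposes Graph-Structured Neural Flows (GSNF), a one-step model for irregular multivariate time series classification. To address the shortcomings of existing methods in modeling inter-variable interactions, GSNF introduces two auxiliary-trajectory self-supervision strategies that strengthen graph structure learning through trajectory divergence and reverse-time generation. Experiments show state-of-the-art classification performance on multiple real-world datasets with competitive training efficiency and low memory usage.

Abstract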

Neural Flows efficiently model irregular multivariate time series by directly learning ODE solution trajectories with neural networks, bypassing step-by-step numerical solvers. Despite their efficiency, many existing approaches treat variables independently, leaving inter-variable interactions underexplored. Moreover, their one-step mapping makes interaction modeling inherently challenging, as it removes the iterative refinement of interactions during learning. To address this challenge, we propose one-step Graph-Structured Neural Flows (GSNF), which introduce two auxiliary-trajectory self-supervision strategies to strengthen interaction learning: (i) interaction-aware trajectory generation via re-initialization, which induces trajectory divergence to expose graph-induced interactions, with a theoretically derived lower bound on divergence; and (ii) reverse-time trajectory generation, which enforces forward-backward consistency to regularize graph learning, enabled by flow invertibility. Experiments on five real-world datasets show that GSNF achieves state-of-the-art classification performance with highly competitive training time and memory usage.

2605.10177 2026-05-12 cs.CV cs.AI cs.RO

MTA-RL: Robust Urban Driving via Multi-modal Transformer-based 3D Affordances and Reinforcement Learning

Guangli Chen, Dianzhao Li, Wenjian Zhong, Bangquan Xie, Ostap Okhrin

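Affiliations * Dongguan Key Laboratory of Intelligent Equipment and Smart Industry, School of Advanced Engineering, Great Bay University; Chair of Applied Statistics, Technische Universität Dresden; Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI); College of Automation, Guangdong University of Technology

AI summary This paper proposes MTA-RL, a framework that improves the robustness of urban autonomous driving through multi-modal Transformer-based 3D affordance representations and reinforcement learning. The method fuses RGB images and LiDAR point clouds into structured, geometry-aware affordance representations that serve as input to the reinforcement learning policy, improving decision efficiency and stability. Experiments show that MTA-RL outperforms existing methods across traffic scenarios of varying density and exhibits strong zero-shot generalization in unseen urban environments.

Details
English abstract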

Robust urban autonomous driving requires reliable 3D scene understanding and stable decision-making under dense interactions. However, existing end-to-end models lack interpretability, while modular pipelines suffer from error propagation across brittle interfaces. This paper proposes MTA-RL, the first framework that bridges perception and control through Multi-modal Transformer-based 3D Affordances and Reinforcement Learning (RL). Unlike previous fusion models that directly regress actions, MTA-RL fuses RGB images and LiDAR point clouds with a transformer architecture to predict explicit, geometry-aware affordance representations. These structured representations serve as a compact observation space, enabling the RL policy to operate purely on predicted driving semantics, which significantly improves sample efficiency and stability. Extensive evaluations in CARLA Town01-03 across varying densities (20-60 background vehicles) show that MTA-RL consistently outperforms state-of-the-art baselines. Trained solely on Town03, our method demonstrates superior zero-shot generalization in unseen towns, achieving up to a 9.0% increase in Route Completion, an 11.0% increase in Total Distance, and an 83.7% improvement in Distance Per Violation. Furthermore, ablation studies confirm that our multi-modal fusion and reward shaping are critical: the full model significantly outperforms image-only and unshaped variants, demonstrating the effectiveness of MTA-RL for robust urban autonomous driving.

2605.10174 2026-05-12 cs.CV

BathyFacto: Refraction-Aware Two-Media Neural Radiance Fields for Bathymetry

Markus Brezovsky, Anatol Günthner, Frederik Schulte, Lukas Winiwarter, Boris Jutzi, Gottfried Mandlburger

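Affiliations * Department of Geodesy and Geoinformation, TU Wien; Institute of Photogrammetry and Remote Sensing (IPF), Karlsruhe Institute of Technology (KIT); Unit of Geometry and Surveying, University of Innsbruck

AI summary BathyFacto is a refraction-aware two-media neural radiance field method for bathymetry. It targets the depth bias that light refraction causes when conventional bundle-adjustment-based reconstruction is applied to underwater scenes. The method introduces a medium-conditioned color head and a hash-grid-based density field, and uses Snell's law to model the refracted ray path at the air-water interface, yielding more accurate underwater point clouds. Experiments show that BathyFacto significantly improves reconstruction accuracy and completeness on a simulated scene, outperforming both conventional methods and a neural radiance field baseline without refraction handling.

Comments 16 pages, 8 figures, 3 tables. Submitted to ISPRS Open Journal of Photogrammetry and Remote Sensing, Special Issue "3D Underwater Mapping from Above and Below"

Details
English abstract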

Through-water photogrammetry based on UAV imagery enables shallow-water bathymetry, but refraction at the air-water interface violates the straight-ray assumption of Structure-from-Motion and causes systematic depth bias. We present BathyFacto, a refraction-aware two-media extension of Nerfacto integrated into Nerfstudio that targets metrically precise underwater point clouds. BathyFacto uses a shared hash-grid-based density field with a medium-conditioned color head that receives a one-bit medium flag (air or water) and traces each camera ray as two segments: a straight segment in air up to a planar water surface and a refracted segment in water computed via Snell's law with known refractive indices. To allocate samples efficiently across the air-water boundary, we employ a single proposal-network sampler that operates on a virtual straight ray spanning both media, combined with a kinked density wrapper that transparently corrects water-segment positions along the refracted direction before density evaluation. A data adaptation pipeline converts photogrammetric reconstructions to a Nerfstudio-compatible format, estimates the water plane from boundary markers, and provides per-pixel medium masks to gate refraction. We also extend the point cloud export with refraction-corrected backprojection and reversible coordinate transforms to world and global frames. On a simulated two-media scene with known ground truth, BathyFacto with refraction achieves a Cloud-to-Mesh mean distance of 0.06 m and 87% completeness, compared to 0.52 m / 29% for the Nerfacto baseline and 0.36 m / 21% for conventional MVS without refraction correction.
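The two-segment ray described in the abstract can be illustrated with the vector form of Snell's law. The sketch below is not BathyFacto's code; it assumes a horizontal water plane at z = `z_water`, a camera above the water looking downward, and a refractive index of 1.333 for water. All function and parameter names are hypothetical.

```python
import math

def refract(d, n, n1, n2):
    """Refract unit direction d at a surface with unit normal n (pointing against d).

    Vector form of Snell's law; returns None on total internal reflection.
    """
    eta = n1 / n2
    cos_i = -sum(di * ni for di, ni in zip(d, n))
    sin2_t = eta * eta * (1.0 - cos_i * cos_i)
    if sin2_t > 1.0:
        return None
    cos_t = math.sqrt(1.0 - sin2_t)
    return [eta * di + (eta * cos_i - cos_t) * ni for di, ni in zip(d, n)]

def point_on_two_media_ray(origin, d, z_water, s, n_air=1.0, n_water=1.333):
    """Point at path length s along a downward ray that bends at the plane z = z_water.

    Assumes origin[2] > z_water and d[2] < 0, so the ray always reaches the surface;
    going from air into water never triggers total internal reflection.
    """
    s_hit = (z_water - origin[2]) / d[2]  # path length to the water surface
    if s <= s_hit:
        return [o + s * di for o, di in zip(origin, d)]
    hit = [o + s_hit * di for o, di in zip(origin, d)]
    t = refract(d, [0.0, 0.0, 1.0], n_air, n_water)
    return [h + (s - s_hit) * ti for h, ti in zip(hit, t)]
```

A ray that hits the surface at 45 degrees continues at roughly 32 degrees from the vertical in water, which is why a straight-ray model places the bottom at the wrong depth.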

2605.10172 2026-05-12 cs.CV cs.CL

V-ABS: Action-Observer Driven Beam Search for Dynamic Visual Reasoning

Zhiwei Ning, Xuanang Gao, Jiaxi Cao, Gengming Zhang, Shengnan Ma, Wenwen Tong, Hanming Deng, Jie Yang, Wei Liu

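Affiliations * School of Automation and Intelligent Sensing, Shanghai Jiao Tong University; Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University; SenseTime Research; Institute of Medical Robotics, Shanghai Jiao Tong University

AI summary This paper proposes V-ABS, an action-observer driven beam search framework for complex multi-step tasks in dynamic visual reasoning. By combining thinker-actor-observer iterations with an entropy-based adaptive weighting algorithm, the method mitigates the imagination-action-observer (IAO) bias and improves the stability and optimality of reasoning. Experiments show that V-ABS achieves leading performance on multiple benchmarks and clearly outperforms existing models.

Details
English abstract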

Multimodal large language models (MLLMs) have achieved remarkable success in general perception, yet complex multi-step visual reasoning remains a persistent challenge. Although recent agentic approaches incorporate tool use, they often neglect critical execution feedback. Consequently, they suffer from the imagination-action-observer (IAO) bias, a misalignment between prior imagination and observer feedback that undermines reasoning stability and optimality. To bridge this gap, we introduce V-ABS, an action-observer driven beam search framework that enables deliberate reasoning through thinker-actor-observer iterations. We also propose an entropy-based adaptive weighting algorithm to mitigate the IAO bias by dynamically balancing the confidence scores between the policy priors and the observational feedback. Moreover, we construct a large-scale supervised fine-tuning (SFT) dataset comprising over 80k samples to guide the model to assign higher prior confidence to correct action paths. Extensive experiments across eight diverse benchmarks show that V-ABS achieves state-of-the-art performance, delivering an average improvement of 19.7% on the Qwen3-VL-8B baseline and consistent gains across both open-source and proprietary models.
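The abstract does not give the weighting formula, so the following is only one plausible shape for an entropy-based balance between policy priors and observer feedback inside a beam search step. The rule used here (prior weight = 1 minus the normalized entropy of the prior) and every name in the block are assumptions made for illustration, not the authors' algorithm.

```python
import math

def normalized_entropy(probs):
    """Shannon entropy of a distribution, scaled to [0, 1] by log(K)."""
    k = len(probs)
    if k < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(k)

def combine_scores(prior_probs, observer_scores):
    """Blend policy priors with observer feedback for K candidate actions.

    Assumed rule (not from the paper): a confident prior (low entropy) gets a
    weight near 1, while an uncertain prior (high entropy) defers to the observer.
    """
    w_prior = 1.0 - normalized_entropy(prior_probs)
    return [w_prior * p + (1.0 - w_prior) * o
            for p, o in zip(prior_probs, observer_scores)]

def beam_select(candidates, prior_probs, observer_scores, beam_width):
    """Keep the beam_width candidates with the highest combined score."""
    scores = combine_scores(prior_probs, observer_scores)
    ranked = sorted(zip(candidates, scores), key=lambda cs: cs[1], reverse=True)
    return [c for c, _ in ranked[:beam_width]]
```

Under this assumed rule, a uniform prior hands the decision entirely to the observer, and a one-hot prior ignores the observer.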

2605.10171 2026-05-12 cs.CL cs.AI

When Reviews Disagree: Fine-Grained Contradiction Analysis in Scientific Peer Reviews

Sandeep Kumar, Yash Kamdar, Abid Hossain, Bharti Kumari, Tanik Saikh, Asif Ekbal

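Affiliations * Department of Computer Science and Engineering, Indian Institute of Technology Patna, India; School of Computer Engineering, KIIT Deemed to be University, Bhubaneswar, India

AI summary Scientific peer reviews often contain disagreements among experts, and as conference submission volumes grow, identifying and understanding these disagreements becomes increasingly challenging. This paper proposes a fine-grained contradiction analysis approach that identifies contradiction evidence spans in full reviews and assigns disagreement intensity scores, characterizing the degree of conflict between reviews more precisely. To this end, the authors build the RevCI dataset and design the IMPACT framework, which combines multi-agent reasoning with evidence extraction to model contradictions and their severity, and they propose the lightweight model TIDE for efficient inference.

Comments accepted at ACL 2026

Details
English abstract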

Scientific peer reviews frequently contain conflicting expert judgments, and the increasing scale of conference submissions makes it challenging for Area Chairs and editors to reliably identify and interpret such disagreements. Existing approaches typically frame reviewer disagreement as binary contradiction detection over isolated sentence pairs, abstracting away the review-level context and obscuring differences in the severity of evaluative conflict. In this work, we introduce a fine-grained formulation of reviewer contradiction analysis that operates over full peer reviews by explicitly identifying contradiction evidence spans and assigning graded disagreement intensity scores. To support this task, we present RevCI, an expert-annotated benchmark of peer-review pairs with evidence-level contradiction annotations and graded intensity labels. We further propose IMPACT, a structured multi-agent framework that integrates aspect-conditioned evidence extraction, deliberative reasoning, and adjudication to model reviewer contradictions and their intensity. To support efficient deployment, we distill IMPACT into TIDE, a small language model that predicts contradiction evidence and intensity in a single forward pass. Experimental results show that IMPACT substantially outperforms strong single-agent and generic multi-agent baselines in both evidence identification and intensity agreement, while TIDE achieves competitive performance at significantly lower inference cost.

2605.10170 2026-05-12 cs.LG

Balancing Efficiency and Fairness in Traffic Light Control through Deep Reinforcement Learning

Matteo Cederle, Giacomo Scatto, Gian Antonio Susto

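Affiliations * University of Padova

AI summary This paper studies how to balance efficiency and fairness in traffic light control with deep reinforcement learning. It proposes a novel deep reinforcement learning agent that accounts for the fairness requirements of both vehicle and pedestrian flows under dynamic traffic conditions and coordinates the two dynamically. Experiments show that the method relieves congestion while ensuring fair service for different road users, offering a practical and flexible solution for traffic management in smart cities.

Comments Paper accepted to the 2026 IFAC World Congress, held in Busan (KOR), August 23rd-28th, 2026

Details
English abstract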

Urban traffic congestion presents a significant challenge for modern cities, impacting both mobility and sustainability. Traditional traffic light control systems often fail to adapt to dynamic conditions, leading to inefficiencies. This paper proposes a novel deep reinforcement learning agent for traffic light control that addresses this limitation by explicitly integrating fairness considerations for both vehicular and pedestrian traffic. Unlike prior work, our approach dynamically balances these flows based on real-time demand, moving beyond systems focused solely on vehicles. Experimental results demonstrate that our agent effectively reduces congestion while ensuring equitable service for both categories of road users. This research contributes to a practical and adaptable solution for intelligent traffic management within the framework of smart cities, paving the way for more efficient and inclusive urban mobility.

2605.10169 2026-05-12 cs.AI cs.GT

Automated Approach for Solving Infinite-state Polynomial Reachability Games

Krishnendu Chatterjee, Ehsan Kafshdar Goharshady, Mehrdad Karrabi, Maximilian Seeliger, Đorđe Žikelić

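Affiliations * Institute of Science and Technology Austria (ISTA); ETH Zurich; Singapore Management University

AI summary This paper studies turn-based reachability games on infinite-state graphs, focusing on determining whether the REACH player has a winning strategy and on computing it. The authors propose ranking certificates as a sound and complete proof rule and design a fully automated algorithm for polynomial reachability games that computes a winning strategy in sub-exponential time together with a formal correctness witness. Experiments show that the method solves challenging cases beyond the reach of existing methods; for example, it is the first to compute an optimal winning strategy for an arbitrary precision parameter in the classical Cinderella-Stepmother game.

Details
English abstract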

Reachability games are two-player games played on a graph, where the objective of the $\texttt{REACH}$ player is to reach the target set whereas the objective of the $\texttt{SAFE}$ player is to stay away from the target set. Reachability games have important applications in artificial intelligence and reactive synthesis, and many of these applications give rise to infinite-state reachability games. In this paper, we study turn-based reachability games on infinite-state graphs defined over valuations of a finite set of real variables. We consider the problem of determining the existence of and computing a winning strategy for the $\texttt{REACH}$ player. Our contributions are twofold. First, we propose ranking certificates for reachability games, a sound and complete proof rule for proving that the $\texttt{REACH}$ player has a winning strategy from the specified initial state. Second, we consider polynomial reachability games, where transitions and objectives are described by polynomial constraints over real variables, and propose a fully automated algorithm for computing a winning strategy for the $\texttt{REACH}$ player together with a formal correctness witness in the form of a ranking certificate. The algorithm is sound, semi-complete, and runs in sub-exponential time. Our experiments demonstrate the ability of our method to solve challenging examples from the literature that were out of the reach of existing methods. Specifically, for the classical Cinderella-Stepmother game, we are able to compute an optimal winning strategy for an arbitrary precision parameter for the first time.
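On a finite graph, the classical attractor construction gives a simple analogue of a ranking argument: each state from which REACH can force a visit to the target receives a rank bounding the number of moves needed, and the rank strictly decreases along REACH's chosen move and along every SAFE move. The sketch below covers only this finite-state case and is not the paper's algorithm for polynomial games; the graph encoding and names are hypothetical.

```python
def attractor_ranks(successors, owner, targets):
    """Return {state: rank} for all states from which REACH can force the target.

    successors: dict state -> list of successor states
    owner:      dict state -> "REACH" or "SAFE" (who moves in that state)
    targets:    set of target states (rank 0)
    Non-target states without successors are treated as not winning for REACH.
    """
    rank = {s: 0 for s in targets}
    level = 0
    changed = True
    while changed:
        changed = False
        level += 1
        newly_won = []
        for s, succs in successors.items():
            if s in rank or not succs:
                continue
            won = [t in rank for t in succs]
            # REACH needs one winning move; SAFE must have no escape.
            if (owner[s] == "REACH" and any(won)) or (owner[s] == "SAFE" and all(won)):
                newly_won.append(s)
        for s in newly_won:
            rank[s] = level
            changed = True
    return rank

game = {"a": ["b", "c"], "b": ["goal"], "c": ["c"], "d": ["b", "c"], "goal": ["goal"]}
who = {"a": "REACH", "b": "SAFE", "c": "SAFE", "d": "SAFE", "goal": "REACH"}
ranks = attractor_ranks(game, who, {"goal"})
```

In this toy game, state `c` is a safe self-loop for SAFE, so `c` and `d` receive no rank, while `a` wins by moving to `b`.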

2605.10168 2026-05-12 cs.CL cs.IR

ASTRA-QA: A Benchmark for Abstract Question Answering over Documents

Shu Wang, Shansong Zhou, Xinyang Wang, Shiwei Wang, Hulong Wu, Yixiang Fang

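Affiliations * The Chinese University of Hong Kong, Shenzhen; Data Science Group

AI summary This paper presents ASTRA-QA, a benchmark dataset for abstract question answering over documents, intended to address the limited support in existing QA benchmarks for abstract questions that require synthesizing information across multiple documents. The dataset contains 869 QA instances covering five abstract question types and three controlled retrieval scopes, and it provides explicit evaluation annotations for each instance, such as answer topic sets, unsupported topics, and aligned evidence. By directly scoring topic coverage and unsupported content, ASTRA-QA enables scalable evaluation without exhaustive head-to-head comparisons, and experiments on several retrieval-augmented generation methods confirm its diagnostic power for coverage, hallucination, and retrieval robustness.

Details
English abstract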

Document-based question answering (QA) increasingly includes abstract questions that require synthesizing scattered information from long documents or across multiple documents into coherent answers. However, this setting is still poorly supported by existing benchmarks and evaluation methods, which often lack stable abstract references or rely on coarse similarity metrics and unstable head-to-head comparisons. To alleviate this issue, we introduce ASTRA-QA, a benchmark for AbSTRAct Question Answering over documents. ASTRA-QA contains 869 QA instances over academic papers and news documents, covering five abstract question types and three controlled retrieval scopes. Each instance is equipped with explicit evaluation annotations, including answer topic sets, curated unsupported topics, and aligned evidence. Building on these annotations, ASTRA-QA assesses whether answers cover required key points and avoid unsupported content by directly scoring topic coverage and curated unsupported content, enabling scalable evaluation without exhaustive head-to-head comparisons. Experiments with representative Retrieval-Augmented Generation (RAG) methods spanning vanilla, graph-based, and hierarchical retrieval settings show that ASTRA-QA provides reference-grounded diagnostics for coverage, hallucination, and retrieval-scope robustness. Our dataset and code are available at https://xinyangsally.github.io/astra-benchmark.
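The idea of scoring an answer directly against annotated topic sets, rather than comparing answers head to head, can be sketched as follows. This is a simplified illustration and not the benchmark's scoring code: it assumes the topics mentioned by an answer have already been extracted as labels, and the two ratios are hypothetical stand-ins for the benchmark's actual metrics.

```python
def score_answer(answer_topics, required_topics, unsupported_topics):
    """Score one answer from topic labels.

    coverage:    fraction of required topics that the answer mentions
    unsupported: fraction of curated unsupported topics that the answer mentions
    """
    answer = set(answer_topics)
    required = set(required_topics)
    unsupported = set(unsupported_topics)
    coverage = len(answer & required) / len(required) if required else 1.0
    unsupported_rate = len(answer & unsupported) / len(unsupported) if unsupported else 0.0
    return {"coverage": coverage, "unsupported": unsupported_rate}
```

Because each answer is scored on its own against fixed references, the cost grows linearly with the number of systems instead of quadratically as in pairwise comparison.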

2605.10164 2026-05-12 cs.LG stat.ML

Hyperparameter Transfer for Dense Associative Memories

Roi Holtzman, Dmitry Krotov, Boris Hanin

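Affiliations * Rudolf Peierls Centre for Theoretical Physics, University of Oxford, Oxford OX1 3PU, UK; Dynamical Mind, IBM Research; Princeton ORFE

AI summary This paper studies how to apply hyperparameter transfer to Dense Associative Memory (DenseAM) models, which are neural networks performing temporal dynamics on an energy landscape and which share weights both within and across layers. Because DenseAMs use rapidly peaking activation functions that are rare in conventional feed-forward networks, existing hyperparameter transfer methods are hard to apply directly. The paper proposes hyperparameter transfer methods for DenseAMs, derives explicit prescriptions for transferring hyperparameters from small models to large ones, and verifies experimentally that the theoretical analysis agrees with empirical results.

Details
English abstract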

Dense Associative Memory (DenseAM) is a promising family of AI architectures that is represented by a neural network performing temporal dynamics on an energy landscape. While hyperparameter transfer methods are well-studied for feed-forward networks, these methods have not been developed for settings in which weights are shared across layers and within the layer, which is common in DenseAMs. Additionally, DenseAMs utilize rapidly peaking activation functions that are rarely used in feed-forward architectures. The confluence of these aspects makes DenseAM a challenging framework for using existing methods for hyperparameter transfer. Our work initiates the development of hyperparameter transfer methods for this class of models. We derive explicit prescriptions for how the hyperparameters tuned on small models can be transferred to models trained at scale. We demonstrate excellent agreement between these theoretical findings and empirical results.

2605.10162 2026-05-12 cs.CV

Active-SAOOD: Active Sparsely Annotated Oriented Object Detection in Remote Sensing Images

Yu Lin, Jianghang Lin, Kai Ye, Shengchuan Zhang, Liujuan Cao

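Affiliations * Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, 361005, P.R. China

AI summary This paper proposes Active-SAOOD, an active-learning-based method for sparsely annotated oriented object detection in remote sensing images, aiming to lower the annotation cost of oriented object detection. Using a model state observation module, the method jointly considers orientation, classification, and localization uncertainty as well as inter- and intra-class diversity at the instance level, and actively selects the sparse samples most valuable to the current model, enabling stable detection under completely randomly initialized sparse annotations. Experiments show that Active-SAOOD significantly improves the performance and stability of existing sparse-annotation methods on several datasets, with a 9% performance gain at an annotation ratio of only 1%, further enhancing its practical value in remote sensing.

Details
English abstract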

Reducing the annotation cost of oriented object detection in remote sensing remains a major challenge. Recently, sparse annotation has gained attention for effectively reducing annotation redundancy in dense remote sensing scenes. However, (1) the reliance of sparse data on class-dependent sampling and (2) the lack of in-depth investigation into the characteristics of sparse samples hinder its further development. This paper proposes an active learning-based sparsely annotated oriented object detection (SAOOD) method, termed Active-SAOOD. Based on a model state observation module, Active-SAOOD actively selects the most valuable sparse samples at the instance level that are best suited to the current model state, by jointly considering orientation, classification, and localization uncertainty, as well as inter- and intra-class diversity. This design enables SAOOD to operate stably under completely randomly initialized sparse annotations and extends its applicability to broader real-world scenarios. Experiments on multiple datasets demonstrate that Active-SAOOD significantly improves both the performance and the stability of existing SAOOD methods under various random sparse annotations. In particular, with an annotation ratio of only 1%, it achieves a 9% performance gain over the baseline, further enhancing the practical value of SAOOD in remote sensing. The code will be made public.

2605.10161 2026-05-12 cs.LG

OUIDecay: Adaptive Layer-wise Weight Decay for CNNs Using Online Activation Patterns

Alberto Fernández-Hernández, Jose I. Mestre, Cristian Pérez-Corral, Manuel F. Dolz, Jose Duato, Enrique S. Quintana-Ortí

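Affiliations * Universitat Politècnica de València; Universitat Jaume I; Openchip & Software Technologies S.L.

AI summary This paper proposes OUIDecay, an adaptive layer-wise weight decay method for training convolutional neural networks. Based on the Overfitting-Underfitting Indicator (OUI) computed from activation patterns, the method dynamically adjusts each layer's weight decay coefficient; it needs no validation data and is lightweight enough for online use. Experiments show that OUIDecay outperforms fixed decay and gradient-based adaptive methods across multiple datasets and architectures, effectively improving generalization.

Details
English abstract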

Weight decay remains one of the most widely used regularization mechanisms for training convolutional neural networks, yet it is still commonly applied as a fixed coefficient shared by all layers throughout training. This uniform treatment ignores that different layers may follow different structural dynamics and therefore may require different regularization strengths. In this work, we propose OUIDecay, an adaptive layer-wise and time-dependent weight decay scheduler for CNNs driven by the Overfitting-Underfitting Indicator (OUI), an activation-based metric previously shown to provide early information about regularization quality. OUIDecay uses a lightweight batch-based formulation of OUI to monitor the structural behavior of each layer online and periodically rescales its weight decay relative to the other layers in the network. Unlike gradient-based adaptive decay methods, our approach relies on functional information extracted from activation patterns and does not require validation data. Experiments on EfficientNet-B0 with Stanford Cars, ResNet50 with Food101, DenseNet121 with CIFAR100, and MobileNetV2 with CIFAR10 show that OUIDecay achieves the best mean best-validation-loss in 7 out of 8 evaluated settings. These results indicate that activation-driven weight decay adaptation is a practical and effective alternative to fixed decay and gradient-based adaptive decay, while keeping the method lightweight and suitable for online use.

2605.10159 2026-05-12 cs.LG cs.NA math.NA physics.comp-ph

jNO: A JAX Library for Neural Operator and Foundation Model Training

Leon Armbruster, Rathan Ramesh, Georg Kruse, Christopher Straub

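Affiliations * Fraunhofer Institute for Integrated Systems and Device Technology

AI summary jNO is a JAX-based library for training neural operators and foundation models, with unified support for both data-driven and physics-informed training. Its core design is a tracing system that lets users write domains, model calls, residuals, supervised losses, and diagnostics in one symbolic language and compiles them into one optimization pipeline, so users can switch between tasks without restructuring code. jNO also supports multi-model compositions, fine-grained parameter-level control, hyperparameter tuning, and JAX-native workflows for PDE foundation-model families.

Details
English abstract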

jNO (jax Neural Operators) is a JAX-native library for neural operators and foundation models with unified support for both data-driven and physics-informed training. Its core design is a tracing system in which domains, model calls, residuals, supervised losses, and diagnostics are written in one symbolic language and compiled into one optimization pipeline. This allows users to move between operator regression, mesh-aware residual evaluation, and PDE-constrained training without restructuring the surrounding code. jNO also supports multi-model compositions, fine-grained control at parameter level (model, optimizer, and learning rate), hyperparameter tuning, and JAX-native workflows for translated PDE foundation-model families. The source repository is available at https://github.com/FhG-IISB/jNO.

2605.10158 2026-05-12 cs.LG

Unsupervised Process Reward Models

Artyom Gadetsky, Maxim Kodryan, Siba Smarak Panigrahi, Hang Guo, Maria Brbic

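Affiliations * Swiss Federal Institute of Technology

AI summary This paper proposes an unsupervised process reward model (uPRM) that requires no human supervision and is used to guide the reasoning of large language models. The method defines a scoring function from the LLM's next-token probabilities that jointly assesses the position of the first erroneous step across multiple reasoning trajectories, enabling evaluation and guidance of the reasoning process. Experiments show that uPRM performs well at identifying erroneous steps, as a verifier for test-time scaling, and as a reward signal in reinforcement learning, opening a new path toward scalable reward modeling for complex reasoning tasks.

Comments preprint

Details
English abstract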

Process Reward Models (PRMs) are a powerful mechanism for steering large language model reasoning by providing fine-grained, step-level supervision. However, this effectiveness comes at a significant cost: PRMs require expert annotations for every reasoning step, making them costly and difficult to scale. Here, we propose a method for training unsupervised PRMs (uPRM) that requires no human supervision, neither at the level of step-by-step annotations nor through ground-truth verification of final answers. The key idea behind our approach is to define a scoring function, derived from LLM next-token probabilities, that jointly assesses candidate positions of first erroneous steps across a batch of reasoning trajectories. We demonstrate the effectiveness of uPRM across diverse scenarios: (i) uPRM achieves up to 15% absolute accuracy improvements over the LLM-as-a-Judge in identifying first erroneous steps on the ProcessBench dataset; (ii) as a verifier for test-time scaling, uPRM performs comparably to supervised PRMs and outperforms the majority voting baseline by up to 6.9%; and (iii) when used as a reward signal in reinforcement learning, uPRM enables more robust policy optimization throughout training compared to a supervised PRM trained using ground-truth labels. Overall, our results open a path toward scalable reward modeling for complex reasoning tasks.

2605.10155 2026-05-12 cs.CL

NyayaAI: An AI-Powered Legal Assistant Using Multi-Agent Architecture and Retrieval-Augmented Generation

Deepanshu, Divi Saxena, Deepali Rana, Ayesha Varshney, Sahinur Rahman Laskar

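Affiliations * School of Computer Science UPES, Dehradun, India

AI summary This paper introduces NyayaAI, an AI legal assistant built on a multi-agent architecture and retrieval-augmented generation, aimed at making Indian legal information easier to access despite complex legal language and large document volumes. The system combines large language models with a retrieval-augmented generation pipeline built on an Indian legal knowledge base, coordinates multiple agents for legal research, document summarization, case law retrieval, and drafting, and includes a compliance module to ensure output accuracy. Experiments show that the system reaches relatively high levels of domain classification, retrieval, and response accuracy, demonstrating the potential of structured multi-agent LLM systems to improve legal accessibility and workflow efficiency.

Comments 3 pages, 1 figure

Details
English abstract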

Legal information in India remains largely inaccessible due to the complexity of legal language and the sheer volume of legal documentation involved in research and case analysis. This paper presents NyayaAI, an AI-powered legal assistant that automates and simplifies legal workflows for lawyers, law students, and general users. The system combines Large Language Models with a Retrieval-Augmented Generation pipeline grounded in a curated Indian legal knowledge base comprising constitutional provisions, statutes, case laws, and judicial precedents. A multi-agent architecture orchestrated through the Mastra TypeScript framework coordinates a main agent with specialized sub-agents handling legal research, document summarization, case law retrieval, and drafting assistance. A compliance module validates all responses before delivery. Domain classification achieved 70% precision across test samples, with RAG retrieval precision at 74% and overall response accuracy at 72%, demonstrating that structured multi-agent LLM systems can meaningfully improve legal accessibility and workflow efficiency. The code is made publicly available at https://github.com/B97784/NyayaAI for the benefit of the research community.