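arXivDaily: a daily academic digest that syncs the full arXiv feed and adds AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and related fields.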

2606.04238 2026-06-04 cs.LG cs.AI

Recover-LoRA for Aggressive Quantization: Reclaiming Accuracy in 2-Bit Language Models via Low-Rank Adaptation with Knowledge Distillation on Synthetic Data

Devleena Das, Rajeev Patwari, Elliott Delaye, Ashish Sirasao

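Affiliations: Advanced Micro Devices, Inc.

AI summary: Aggressive 2-bit quantization causes severe accuracy loss in large language models. The paper extends Recover-LoRA to this setting, combining a selective mixed-precision strategy (only the MLP gate and up layers are quantized to 2 bits) with low-rank adapter training based on distillation over synthetic data. On Qwen3-4B, with 10k synthetic samples, it recovers 80-95% of accuracy on 9 of 12 benchmarks.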

Abstract

Aggressive weight quantization to 2-bit precision offers substantial throughput and memory gains for large language model (LLM) inference, but typically incurs severe accuracy degradation. These gains are particularly relevant for edge and on-device deployment, where memory capacity and bandwidth are primary constraints. In this work, we extend Recover-LoRA, a lightweight, data-free accuracy recovery method originally developed for general model weight corruption, to the setting of ultra-low-bit quantization. We propose a selective mixed-precision strategy in which only gate and up projection layers of the MLP are quantized to 2-bit (W2), while all other linear layers remain at higher precision, yielding a mixed-precision GateUp configuration. We demonstrate via roofline analysis across three model families (4B-20B) and two hardware platforms that a W4/W2-GateUp deployment (4-bit base with 2-bit gate/up) delivers 7.5-23.3% TPS improvement over uniform W4 depending on model and context length, while confining quantization error to a predictable subset of layers. We then apply Recover-LoRA, training low-rank adapters on the quantized layers via logit distillation with synthetic data, to recover accuracy lost from 2-bit quantization of the gate and up layers. In a case study on Qwen3-4B, Recover-LoRA achieves 80-95% accuracy recovery on 9 of 12 benchmarks, using only 10k synthetic training samples and no labeled data. We further demonstrate that synthetic data performs comparably to curated labeled data for distillation-based recovery, and that recovery generalizes to out-of-distribution evaluation tasks. Our results present Recover-LoRA as a practical post-quantization accuracy recovery tool for aggressive weight compression in deployment settings.
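To make the abstract's core idea concrete, the NumPy sketch below quantizes a weight matrix to four levels (2 bits) and adds a low-rank correction on top of the frozen quantized weights. Everything in it is an assumption for illustration: the per-row min/max quantizer, the function names, and the SVD-based fit of the correction are stand-ins. The paper trains its adapters by logit distillation on synthetic data, which this sketch does not reproduce.

```python
import numpy as np

def quantize_2bit(w):
    """Hypothetical per-row uniform quantizer with 4 levels (2 bits)."""
    lo = w.min(axis=1, keepdims=True)
    hi = w.max(axis=1, keepdims=True)
    scale = (hi - lo) / 3.0                      # 4 levels span 3 steps
    scale = np.where(scale == 0, 1.0, scale)     # guard constant rows
    codes = np.clip(np.round((w - lo) / scale), 0, 3)
    return codes * scale + lo                    # dequantized weights

def low_rank_correction(w, w_q, rank):
    """Rank-`rank` fit of the quantization residual via SVD.

    Stand-in for a trained adapter: returns A, B with A @ B ~ (w - w_q).
    """
    u, s, vt = np.linalg.svd(w - w_q, full_matrices=False)
    return u[:, :rank] * s[:rank], vt[:rank]

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))                    # synthetic weights
w_q = quantize_2bit(w)
a, b = low_rank_correction(w, w_q, rank=8)

err_quantized = np.linalg.norm(w - w_q)
err_corrected = np.linalg.norm(w - (w_q + a @ b))
print(err_quantized, err_corrected)
```

The corrected error is lower than the quantized error because a truncated SVD is the best low-rank fit to the residual in Frobenius norm. A distillation-trained adapter optimizes output logits rather than weight error, so the two are not equivalent.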

2606.04236 2026-06-04 cs.CL cs.AI cs.LG

Estimating Normal and Shear Interface Pressures in Prosthetic Sockets via Least Squares and Mechanical Modeling

Axel González Cornejo, Tianhao Yu, Chi Hwan Lee, Edgar Bolívar-Nieto

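Affiliations: University of California, Berkeley; University of Michigan

AI summary: To address missing shear measurements and sensor crosstalk in prosthetic socket interface pressure sensing, the paper proposes a quasi-static spring-mass contact model identified by least squares from sparse sensing, and validates it against global force/moment data and local pressure data.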

Abstract

Prosthetic socket fitting remains largely manual and iterative, and objective fit metrics are still limited. Part of the challenge is the lack of long-term real-life pressure data at the residual limb-socket interface. Traditional pressure sensors are prone to drift over time, and capture only normal pressures at sparse locations within the socket, missing a critical component for biomechanical analysis: shear. Although some sensors can report both normal and shear interface stresses, these components are often difficult to decouple because of measurement crosstalk. One potential path forward is to develop models that can augment available measurements. This work introduces a testbed to evaluate model performance under sparse pressure sensing using two complementary validation signals: (i) the global wrench (i.e., total forces and moments expressed in an orthonormal frame) transmitted through the socket by an artificial residual limb, and (ii) local interface loads (i.e., decoupled normal and shear pressure components in a right-hand-rule orthogonal frame that lives in each instrumented location) measured by sparse sensing clusters, each composed of four capacitance-sensing channels. Rather than presenting full-field pressure estimates, the focus is on an analysis sequence that quantifies how well candidate mechanical models explain both global and local measurements under controlled conditions. A quasi-static spring-mass contact model is evaluated, and its parameters are identified via a two-stage convex least-squares problem. Validation under static loading shows that estimating constant bias terms reduces steady offsets in the wrench channels and improves agreement with local measurements. A Pareto-front sensitivity analysis further illustrates how the trade-off between global and local objectives changes when bias terms are included.
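The effect of estimating a constant bias term can be illustrated with an ordinary least-squares fit. The sketch below is a single-stage toy on synthetic data, not the paper's two-stage problem or its spring-mass model; the regressor matrix, the true parameters, and the function name `fit_with_bias` are all assumptions made for illustration.

```python
import numpy as np

def fit_with_bias(A, y):
    """Least-squares fit of y ~ A @ theta + b with a constant bias b."""
    A_aug = np.hstack([A, np.ones((A.shape[0], 1))])   # append bias column
    sol, *_ = np.linalg.lstsq(A_aug, y, rcond=None)
    return sol[:-1], sol[-1]

rng = np.random.default_rng(1)
A = rng.normal(size=(200, 3))                 # synthetic regressors
theta_true = np.array([2.0, -1.0, 0.5])
bias_true = 0.7                               # steady offset in the channel
y = A @ theta_true + bias_true

theta_hat, bias_hat = fit_with_bias(A, y)
theta_nobias, *_ = np.linalg.lstsq(A, y, rcond=None)

resid_with_bias = np.linalg.norm(A @ theta_hat + bias_hat - y)
resid_no_bias = np.linalg.norm(A @ theta_nobias - y)
print(bias_hat, resid_with_bias, resid_no_bias)
```

Without the bias column, the steady offset stays in the residual, which is the behavior the abstract reports for the wrench channels.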

2606.04221 2026-06-04 cs.SD cs.AR eess.AS

Feasibility of Time-Domain DNN-Based Speech Enhancement on Embedded FPGA for Hearing Aid

Feyisayo Olalere, Umut Altin, Kiki van der Heijden, Marcel van Gerven

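Affiliations: Radboud University, Donders Institute for Brain, Cognition, and Behaviour, The Netherlands; Mortimer B. Zuckerman Mind, Brain, Behavior Institute, Columbia University, USA

AI summary: The paper deploys a lightweight SuDoRM-RF++ model on an AMD-Xilinx Kria KV260 and evaluates speech separation and denoising at FP32 and 16-bit fixed-point precision. Data movement turns out to be the main bottleneck, and the fixed-point denoising accelerator reaches a first-sample latency of 9.7 ms, within the 10 ms clinical threshold.

Comments: 13 pages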

2606.04209 2026-06-04 cs.LG

Cross-Prompt Generalization in Detecting AI-Generated Fake News with Interpretable Linguistic Features

Aya Vera-Jimenez, Samuel Jaeger, Calvin Ibenye, Dhrubajyoti Ghosh

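Affiliations: Department of Mathematics; School of Data Science and Analytics; Department of Computer Science

AI summary: The study extracts lexical diversity, readability, and emotion features and uses a random forest classifier in a cross-prompt setup to detect AI-generated fake news. Performance is stable across prompts (AUC 0.988-1.000), suggesting that these features generalize.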

Abstract

The increasing use of large language models has raised concerns about the spread of AI-generated fake news, particularly under varying prompting strategies. Most existing detection models are trained and evaluated under a single generation setting, leaving their ability to generalize across unseen prompts unclear. In this study, we investigate cross-prompt generalization in fake news detection using three datasets of AI-generated articles produced under distinct prompts, combined with real news articles. We extract interpretable linguistic features capturing lexical diversity, readability, and emotion-based characteristics and evaluate a random forest classifier under a cross-prompt framework, where models trained on one prompt are tested on another. Across all six train-test combinations, performance remains consistently high, with AUC values ranging from 0.988 to 1.000. Analysis of feature distributions shows that AI-generated text exhibits increased lexical diversity, reduced readability, and substantially lower emotional intensity compared to the overall dataset, with variations across prompts. Despite these distributional shifts, the classifier maintains strong performance, indicating that these features capture stable properties of AI-generated text that generalize across prompting strategies. These findings suggest that feature-based approaches can provide robust detection of AI-generated fake news under prompt variability.
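The count of six train-test combinations is consistent with taking ordered pairs of the three prompt-specific datasets, training on one and testing on another. The short sketch below enumerates those pairs; the prompt names are hypothetical placeholders, and the pairing scheme is an inference from the abstract rather than a documented protocol.

```python
from itertools import permutations

# Hypothetical labels for the three prompt-specific datasets.
prompts = ["prompt_A", "prompt_B", "prompt_C"]

# Ordered (train, test) pairs with train != test.
pairs = list(permutations(prompts, 2))

for train_prompt, test_prompt in pairs:
    print(f"train on {train_prompt}, test on {test_prompt}")
```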

2606.04198 2026-06-04 cs.CV

Spatial Artifact Coherence Determines Codec Robustness in Patch-Based rPPG

Achraf Ben Ahmed

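Affiliations: PlesmoSense SARL

AI summary: The paper proposes the Spatial Artifact Coherence (SAC) metric to explain why patch-based rPPG methods outperform global-projection methods under codec compression, and designs the PatchPCA algorithm family. Experiments show that SAC explains 93.8% of the between-variant variance in PCA advantage.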

Abstract

Remote photoplethysmography (rPPG) achieves low heart-rate error on uncompressed benchmarks yet is deployed over compressed video channels in telehealth, neonatal ICU, and driver fatigue applications. No prior work identifies the physical quantity determining when spatial decomposition outperforms global-projection methods under codec compression. We propose Spatial Artifact Coherence (SAC), defined as the ratio of off-diagonal to diagonal energy in the 4x4 inter-patch Green-channel covariance matrix (bandpass 0.75-2.5 Hz), and the PatchPCA algorithm family (four codec-aware rPPG algorithms). We evaluate 280 subjects across three public datasets, 11 codec degradation variants (MPEG-4, H.265, H.264, JPEG, chroma subsampling), and 13 algorithms via Wilcoxon tests (BH-FDR, q < 0.05, 904 tests). SAC explains 93.8% of between-variant variance in PCA advantage (r = +0.969), with zero overlap between codec families: non-MPEG-4 variants cluster at SAC 0.10-0.18 with 84-90% PCA win rates, while MPEG-4 variants cluster at SAC 0.48-0.59 with 61% win rate and a 5.8x reduction in mean improvement. Within subjects, 78% confirm the expected pattern (p < 10^-22, dz = 0.73). Within-variant subject-level SAC correlation is r = +0.099, confirming SAC classifies codec families rather than predicting individual outcomes. MPEG-4's effect is structural (macroblock DCT geometry, not noise amplitude), governed by source codec state, not resolution. P-Hybrid is identified as the most deployment-robust algorithm. Two necessary operating conditions for PatchPCA advantage are established: SAC < 0.30 and low-to-moderate motion, directly ruling out raw-to-MPEG-4 transcoding pipelines. SAC provides a physically grounded metric for codec-aware rPPG algorithm selection in clinical remote monitoring systems.
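The SAC definition in the abstract, off-diagonal over diagonal energy of an inter-patch covariance matrix, can be sketched in a few lines of NumPy. Several details here are assumptions: "energy" is read as the sum of squared entries, four patch signals are used so that the covariance matrix is 4x4, and the 0.75-2.5 Hz band-pass step is presumed to have been applied already. The function name is hypothetical.

```python
import numpy as np

def spatial_artifact_coherence(signals):
    """Off-diagonal / diagonal energy of the inter-patch covariance.

    signals: array of shape (n_patches, n_frames), one green-channel
    trace per patch, assumed already band-passed.
    """
    cov = np.cov(signals)
    diag_energy = np.sum(np.diag(cov) ** 2)
    off_energy = np.sum(cov ** 2) - diag_energy
    return off_energy / diag_energy

rng = np.random.default_rng(0)
independent = rng.normal(size=(4, 1000))           # uncorrelated patches
shared = np.tile(rng.normal(size=1000), (4, 1))    # identical patches

sac_low = spatial_artifact_coherence(independent)
sac_high = spatial_artifact_coherence(shared)
print(sac_low, sac_high)
```

Under this reading, fully shared artifacts across four patches give a ratio of 3 (twelve off-diagonal entries against four diagonal ones), while independent patches give a value near zero. The thresholds in the abstract (for example SAC < 0.30) depend on the paper's exact normalization and may not map onto this sketch.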

2606.04194 2026-06-04 cs.LG cs.CL cs.IR

Training-Free Lexical-Dense Fusion for Conversational-Memory Retrieval

Christian Lysenstøen

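Affiliations: Inland Norway University of Applied Sciences; University of California, Berkeley

AI summary: The paper proposes a training-free, CPU-only retrieval method that fuses, at the score level, the maximum query-turn similarity (late interaction) with BM25. This substantially improves hit rate for multi-session conversational memory retrieval, and the paper analyzes the effect of different encoders and pooling strategies.

Comments: 9 pages, 3 figures, 10 tables. Code, data, and per-table receipts: https://github.com/Chrislysen/opsem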

Abstract

Retrieving the few past turns that answer a new query across long multi-session histories is the retrieval bottleneck behind long-term conversational memory (LoCoMo, LongMemEval). Recent concurrent work, Nano-Memory, shows that scoring a session by the maximum query-turn similarity (late interaction, "Turn Isolation Retrieval") beats mean-pooled session embeddings. We do not claim that effect; we replicate it and ask what a training-free, CPU-only retrieval stage should add around it. We report four findings. (1) Fuse: score-level fusion of the late-interaction dense score with BM25, under a single leave-one-conversation-out weight, adds +8.8 to +17.2 points of LoCoMo Hit@1 over late interaction alone across six encoders (all p<1e-4), reaching Hit@1 0.752 / NDCG@5 0.829 (e5-large-v2), +11.2 pp over BM25. (2) An off-the-shelf web-search cross-encoder reranker over the fused top-10 hurts here, degrading Hit@1 by 6.9 pp (one reranker, one configuration). (3) A pooling-operator ablation shows top-k late interaction matches max-similarity, but a naive smooth-max (log-sum-exp) collapses for half the encoders. (4) The late-minus-early gap is large for all six encoders and tends to be larger for larger ones, while the marginal fusion gain shrinks; on LongMemEval-S, a lexical regime where BM25 saturates, the net fusion gain over BM25 is small and not significant. A per-category analysis frames the gain as a division of labor: dense late interaction helps most on multi-hop and temporal questions but trails BM25 on adversarial ones. The contribution is a controlled, reproducible account of a strong training-free retrieval recipe, not the late-interaction retriever itself (Nano-Memory's). We make no claim to a complete memory architecture; this is a retrieval-stage study.
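The two ingredients named in finding (1), a max-similarity session score and score-level fusion with BM25, can be sketched as follows. The min-max normalization, the convex weighting, and all function names are assumptions for illustration; the abstract does not specify the normalization, and the paper selects its weight by leave-one-conversation-out rather than by hand. BM25 scores are taken as given inputs here.

```python
import numpy as np

def late_interaction_score(query_vec, turn_vecs):
    """Score a session by its best-matching turn (max cosine similarity)."""
    q = query_vec / np.linalg.norm(query_vec)
    t = turn_vecs / np.linalg.norm(turn_vecs, axis=1, keepdims=True)
    return float(np.max(t @ q))

def min_max(x):
    """Rescale scores to [0, 1]; constant inputs map to zeros."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return np.zeros_like(x) if span == 0 else (x - x.min()) / span

def fuse(dense_scores, bm25_scores, weight):
    """Convex combination of normalized dense and lexical scores."""
    return weight * min_max(dense_scores) + (1.0 - weight) * min_max(bm25_scores)

# Toy example: two sessions, hand-set weight of 0.5.
query = np.array([1.0, 0.0])
session_turns = np.array([[0.0, 1.0], [2.0, 0.0]])   # second turn matches
dense = late_interaction_score(query, session_turns)
fused = fuse([0.2, 0.9], [5.0, 1.0], weight=0.5)
print(dense, fused)
```

Taking the maximum over turns, rather than averaging turn embeddings, is what keeps one relevant turn from being diluted by the rest of the session.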

2606.04191 2026-06-04 cs.LG cs.AI

KODA: Contrastive Representation Comparison and Alignment for Vision-Language Foundation Models

Youqi Wu, Mohammad Jalali, Farzan Farnia

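Affiliations: University of California, Berkeley

AI summary: The paper proposes KODA, a kernel-optimization framework that contrasts the representations of vision-language foundation models, identifies sample subsets that cluster weakly under one representation but strongly under another, and supports representation alignment.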

Abstract

Vision-language foundation models such as CLIP and SigLIP provide widely used representations for multimodal learning systems. While these models are typically compared through downstream performance, such evaluations often do not explain how their representations differ structurally. In this work, we study this problem through the task of Contrastive Embedding Clustering: identifying sample subsets that are weakly clustered under one representation but strongly clustered under another. We propose Kernel Optimization for Discrepancy Analysis (KODA), a kernel-based framework for contrastive representation comparison and alignment. KODA constructs unified multimodal kernels through modality-wise kernel composition and formulates discrepancy discovery as a constrained optimization problem that searches for coherent structures in one representation while suppressing coherence in a reference representation. This yields interpretable discrepancy directions associated with specific sample subsets and modality interactions. To scale KODA to large vision-language datasets, we develop randomized low-dimensional approximations of joint kernels using random projections, including Random Fourier Features for shift-invariant kernels. Empirically, KODA identifies consistent and interpretable discrepancy structures across vision-language representations and provides sample subsets for representation alignment. The code is available at https://github.com/yokiwuuu/KODA.
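Random Fourier Features, which the abstract names as its scaling device for shift-invariant kernels, can be sketched for a single Gaussian kernel. This is the standard construction on synthetic data, not KODA's joint multimodal kernel; the kernel choice, `gamma`, the feature count, and the function name are assumptions for illustration.

```python
import numpy as np

def rff_features(x, n_features, gamma, rng):
    """Random Fourier Features for k(x, y) = exp(-gamma * ||x - y||^2)."""
    d = x.shape[1]
    # Frequencies drawn from the kernel's spectral density, N(0, 2*gamma*I).
    w = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(x @ w + b)

rng = np.random.default_rng(0)
x = rng.normal(size=(50, 5))          # synthetic embeddings
gamma = 0.1

z = rff_features(x, n_features=4000, gamma=gamma, rng=rng)
k_approx = z @ z.T                    # inner products of random features

sq_dists = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
k_exact = np.exp(-gamma * sq_dists)

max_err = np.abs(k_approx - k_exact).max()
print(max_err)
```

The feature matrix `z` has one row per sample, so downstream computations scale with the number of features instead of requiring the full sample-by-sample kernel matrix.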

2606.04177 2026-06-04 cs.CL cs.AI

A Systematic Analysis of Linguistic Features in AI-Generated Text Detection Across Domains and Models

Yassir El Attar, Esra Dönmez, Maximilian Maurer, Agnieszka Falenska

Affiliations: Institute for Natural Language Processing, University of Stuttgart; Interchange Forum for Reflecting on Intelligent Systems, University of Stuttgart; GESIS Leibniz Institute for the Social Sciences; Heinrich-Heine University Düsseldorf

AI summary: A large-scale empirical study systematically evaluates the robustness of 284 interpretable linguistic features across 27 LLMs and 10 text domains, and finds lexical richness to be the most reliable signal across models and domains.

Comments: preprint

Abstract

Interpretable linguistic features offer a promising approach for explaining why a given text appears machine-generated, particularly for non-expert users. However, existing findings on which features reliably indicate LLM-generated text remain fragmented across feature sets, models, and text domains. To address this gap, we conduct a large-scale empirical study assessing the robustness of linguistic signals for characterizing AI-generated text. Our analysis covers 284 interpretable linguistic features across outputs from 27 LLMs and ten text domains under cross-model and cross-domain generalization settings. We show that classifiers based solely on linguistic features can reliably distinguish AI-generated from human-written text. However, many previously proposed indicators prove strongly context-dependent, with the exception of measures of lexical richness, which remain robust signals across model families and text domains. These results demonstrate which linguistic signals generalize across contexts and provide a foundation for more reliable, interpretable analyses of AI-generated language.
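One common lexical richness measure is the type-token ratio, the number of distinct words divided by the total number of words. The sketch below computes it with a simple regex tokenizer. Whether this specific measure is among the paper's 284 features is an assumption; the tokenizer and the function name are illustrative choices.

```python
import re

def type_token_ratio(text):
    """Distinct word types divided by total word tokens (0.0 if empty)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return len(set(tokens)) / len(tokens) if tokens else 0.0

ratio = type_token_ratio("the cat and the hat")   # 4 types over 5 tokens
print(ratio)
```

The raw ratio falls as texts get longer, so comparisons across texts of very different lengths usually need a length-normalized variant.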

2606.04176 2026-06-04 cs.LG math.ST stat.ML stat.TH

End-to-End Text Line Detection and Ordering

Benjamin Kiessling

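Affiliations: ALMAnaCH, Inria, France

AI summary: The paper proposes Orli, a model that unifies text line detection and reading-order determination as a single image-to-sequence problem. It generates baselines autoregressively for end-to-end processing and reaches state-of-the-art performance on a range of historical documents.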

Abstract

Practical text-recognition pipelines for historical documents typically decompose layout analysis into line detection followed by a separate reading-order step, with the latter most often handled by a hand-coded geometric heuristic that struggles with marginalia, multiple columns, tables, and source-specific editorial conventions. This article introduces Orli (Ordered Regression of Lines), an end-to-end model that casts both sub-tasks as a single image-to-sequence problem: from a page image, Orli autoregressively generates text-line baselines directly in reading order. Baselines are represented in a chord-frame parameterization that anchors a line's position, orientation, and extent while encoding local geometry through perpendicular offsets; an iterative refinement head and a local visual refiner produce the final curve. Trained on a heterogeneous corpus of 196,691 pages spanning ten writing systems, Orli marginally exceeds the previously reported state of the art for cBAD line detection without dataset-specific training, reaches near perfect coverage and ordering on multiple reading-order benchmarks zero-shot, and adapts to more specialized out-of-domain layouts with limited fine-tuning. The method's source code and model weights are available under an open license at https://github.com/mittagessen/orli.
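One way to read the chord-frame parameterization is as a straight chord (start point, orientation, length) plus perpendicular offsets sampled along it. The sketch below decodes such a tuple into polyline points. This is a guess at the geometry based on the abstract's wording; the parameter layout, the even spacing of sample points, and the function name `decode_baseline` are all hypothetical and may differ from Orli's actual representation.

```python
import numpy as np

def decode_baseline(start, angle, length, offsets):
    """Turn chord parameters plus perpendicular offsets into 2D points.

    start:   (x, y) of the chord's first endpoint
    angle:   chord orientation in radians
    length:  chord length (the line's extent)
    offsets: perpendicular displacement at evenly spaced chord positions
    """
    start = np.asarray(start, dtype=float)
    offsets = np.asarray(offsets, dtype=float)
    along = np.array([np.cos(angle), np.sin(angle)])      # chord direction
    normal = np.array([-np.sin(angle), np.cos(angle)])    # perpendicular
    t = np.linspace(0.0, 1.0, len(offsets))
    return start + np.outer(t * length, along) + np.outer(offsets, normal)

points = decode_baseline(start=(10.0, 20.0), angle=0.0, length=100.0,
                         offsets=[0.0, 5.0, 0.0])
print(points)
```

Separating the chord from the offsets keeps position, orientation, and extent in a few global numbers, while the local curvature lives in small perpendicular corrections.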

2606.04164 2026-06-04 cs.LG cs.AI

ADAPTOOD: Uncertainty-Aware Fine-Tuning for Out-of-Distribution ECG Time Series Models

Sotirios Vavaroutas, Yu Yvonne Wu, Ali Etemad, Cecilia Mascolo

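Affiliations: University of Cambridge; Dartmouth College; Queen’s University

AI summary: The paper proposes ADAPTOOD, a framework that uses data uncertainty to quantify distribution shift severity and combines it with low-rank updates and adaptive hyperparameter optimization. On out-of-distribution ECG time series tasks it improves accuracy by up to 7% and precision by up to 12.9%.

Comments: 11 pages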

Abstract

Data samples used for training often differ from those encountered during fine-tuning and deployment, and while ML models show promise, their performance remains limited when only small annotated datasets are available. Performance often degrades under distribution shifts caused by diverse sensors, populations, and application settings. Although pre-training helps, models frequently encounter out-of-distribution (OOD) data in real-world settings, leading to reduced robustness. Existing adaptation methods usually assume fixed distribution shifts and struggle when multiple types or severities occur. In particular, they overlook shift severity, for example treating adaptation to a large familiar dataset the same as adaptation to a small dataset with a new task, which limits generalisation. To address this, we propose ADAPTOOD, a novel framework that leverages data uncertainty to quantify distribution shift severity and guide fine-tuning for time series. This uncertainty measures how strongly samples from the target deployment distribution deviate from the pre-training distribution, providing a direct signal of OOD severity. Our framework combines this uncertainty with low-rank model updates and adaptive hyperparameter optimisation to improve adaptation. We show that ADAPTOOD achieves up to 7% higher accuracy and 12.9% higher precision than existing methods in OOD tasks, maintaining strong performance as distribution shift severity increases.

2606.04161 2026-06-04 cs.LG

When Offline Selectors Cannot Beat the Best Single Model: A Diagnostic Study on edX Dropout Prediction

Tyler Crosse, Alan Nadelsticher Ruvalcaba, Dustin Khang LeDuc, Thomas Trask, Nicholas Lytle, David Joyner

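Affiliations: edX

AI summary: Offline selectors often underperform the best single model in practice. The paper proposes a three-stage diagnostic (k-NN label consistency, a comparison of offline learners, and a state-feature ablation) that identifies local representational ambiguity as the bottleneck, and recommends improving the state or collecting new data rather than tuning the learner.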

Abstract

Different predictors often excel on different inputs, so picking the best one per instance promises higher accuracy than committing to a single model. In practice, selectors trained from logged data routinely fail to beat the strongest single predictor. Three causes typically go unseparated before more tuning is applied: a mismatched learner, a state that does not predict which model wins, or buffer-to-deployment label shift. A three-stage diagnostic rules them out on a shared buffer. Stage 1 estimates a local ceiling on oracle recovery from k-NN label consistency. Stage 2 asks whether paired BC and offline-RL learners (BC, DQN, and CQL across penalty weights) reach that ceiling. Stage 3 ablates the selector state to test whether richer features would raise it. The combined verdict points to the most promising next step: tuning the learner, redesigning the state, or collecting new data. We apply it to selecting among five dropout-prediction models on edX clickstream data. Across 16 windows, the oracle beats the strongest single base model by 9.7 accuracy points on average, yet BC, DQN, and CQL land in the same test-accuracy band below it (robust to a tenfold buffer sweep and N = 2,000 held-out examples). The bottleneck is local representational ambiguity: CQL closes the imitation gap without a deployment gain (not conservatism), regret clusters tightly across learners (not tie-breaking), and the three learners converge on test accuracy (not shift). The next iteration should change the state or collect new data, not tune the offline learner further.
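The gap between the oracle and the strongest single model can be computed directly from a per-example correctness table. The sketch below assumes the usual reading of "oracle": a selector that picks a correct base model whenever at least one exists. The toy matrix and the function name are made up for illustration and do not come from the edX data.

```python
import numpy as np

def oracle_vs_best_single(correct):
    """correct[i, m] is True when base model m is right on example i.

    Returns (best single-model accuracy, oracle selector accuracy).
    """
    correct = np.asarray(correct, dtype=bool)
    best_single = correct.mean(axis=0).max()     # best column average
    oracle = correct.any(axis=1).mean()          # any model right per row
    return float(best_single), float(oracle)

# Toy table: 4 examples, 2 base models.
table = [[1, 0],
         [0, 1],
         [1, 1],
         [0, 0]]
best_single, oracle = oracle_vs_best_single(table)
print(best_single, oracle)
```

The difference between the two numbers is the headroom a perfect selector could claim. The abstract's point is that a learned selector only captures that headroom if the state actually predicts which model wins.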

2606.04160 2026-06-04 cs.CL cs.LG

Expert-Aware Refusal Steering

Anna C. Marbut, Daniel R. Olson, Travis J. Wheeler

Affiliations: Department of Interdisciplinary Studies; University of Montana; Department of Pharmacy Practice & Science; University of Arizona; European Bioinformatics Institute; European Molecular Biology Laboratory; Wellcome Genome Campus

AI summary: The study suppresses refusal behavior in Mixture-of-Experts (MoE) large language models with expert-aware steering vectors. It finds that the output of a single expert is enough to steer effectively, and that attention plays an important role in MoE refusal behavior.

Comments: Under review for COLM 2026

Abstract

Safety alignment in instruction-tuned large language models (LLMs) depends on a model's ability to reliably refuse to respond to harmful or disallowed requests. Recent work has shown that a steering vector can be applied to a dense LLM during inference to effectively suppress refusal behavior, inducing response to harmful requests. We extend this refusal steering method to three open-source Mixture-of-Experts (MoE) LLMs and find that steering performance is uninhibited by the complex routing patterns inherent to the MoE architecture. We then propose two expert-aware refusal steering methods that leverage refusal-specific expert routing patterns and expert-specific steering directions to suppress normal refusal behavior. We find that refusal behavior can be effectively steered based on the output of a single expert. Our results show that refusal signals captured by steering methods differ from expert routing behavior, suggesting a substantial role for attention in MoE refusal behavior.

2606.04158 2026-06-04 cs.RO

Multi-Agent Next-Best-View Optimization for Risk-Averse Planning

Amirhossein Mollaei Khass, Vivek Pandey, Guangyi Liu, Athanasios Cosse, Emrah Bayrak, Nader Motee

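Affiliations: Department of Mechanical Engineering and Mechanics, Lehigh University; Amazon Robotics

AI summary: The paper proposes a distributed, risk-aware multi-agent next-best-view framework that optimizes information gain with consensus ADMM and models collision risk. It approaches the mapping quality and trajectory safety of centralized methods while reducing communication overhead.

Comments: 8 pages, 5 figures. Submitted to IROS 2026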

Abstract

Multi-agent Next-Best-View (NBV) selection for safe path planning in uncertain and unknown environments requires informative, safety-aware, and efficient coordination. Centralized approaches rely on sharing raw sensor data or significant communication overhead, resulting in limited scalability. We propose a distributed, risk-aware multi-agent NBV framework in which each robot maintains a private local 3D Gaussian Splatting map and the team jointly maximizes expected information gain (EIG) restricted to masked zones along planned trajectories. The resulting distributed objective is solved by Consensus ADMM (C-ADMM) over a communication graph, with each robot exchanging only candidate viewpoints, planned trajectory descriptors, and scalar EIG contributions. Collision risk along each trajectory is modeled via Average Value-at-Risk (AV@R) over the local 3DGS map and used both to shape the masking radius and to score planned paths. Experiments in Gibson environments at multiple team sizes show that the distributed formulation approaches the centralized baseline in mapping quality and trajectory safety while reducing communication by orders of magnitude.
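Average Value-at-Risk, the risk measure named in the abstract, can be estimated from samples as the mean of the worst fraction of outcomes. The sketch below uses that empirical form on plain numbers. It assumes `alpha` denotes the tail fraction (some texts parameterize by the confidence level instead), and it does not model the paper's 3DGS-based collision losses; the function name is hypothetical.

```python
import numpy as np

def average_value_at_risk(losses, alpha):
    """Empirical AV@R: mean of the worst alpha-fraction of sampled losses."""
    ordered = np.sort(np.asarray(losses, dtype=float))[::-1]   # worst first
    k = max(1, int(np.ceil(alpha * len(ordered))))             # tail size
    return float(ordered[:k].mean())

samples = np.arange(1, 11)                        # losses 1..10
tail_risk = average_value_at_risk(samples, alpha=0.2)
mean_risk = average_value_at_risk(samples, alpha=1.0)
print(tail_risk, mean_risk)
```

With `alpha=1.0` the measure reduces to the plain mean, and smaller values focus on increasingly rare bad outcomes, which is what makes it suitable for risk-averse path scoring.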
