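An Agentic AI Framework: Mitigating Premature Diagnostic Handoff and Silent Hallucinations in Medical Applications

Divyansh Srivastava, Shreya Ghosh, Anshul Verma, Rajkumar Buyya

Affiliations: Distributed Systems (qCLOUDS) Lab, School of Computing and Information Systems, The University of Melbourne, Australia; Department of Computer Science and Engineering, School of Electrical and Computer Sciences (SECS), Indian Institute of Technology Bhubaneswar, India; Department of Computer Science, Banaras Hindu University, Varanasi, India

AI summary: Proposes a multi-agent framework that uses deterministic orchestration constraints and two safety mechanisms (a neuro-symbolic state-tracking gate and a semantic-entropy uncertainty quantification gate) to address premature diagnostic handoff and silent hallucinations in LLM medical dialogue, improving diagnostic precision by 11.3 percentage points.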

Abstract

Recent advances in Large Language Models (LLMs) and multi-agent systems have driven the rise of Agentic AI, showing promise for medical reasoning. However, open-ended conversational agents remain prone to two critical failure modes: premature diagnostic handoff and silent clinical hallucinations that may go undetected before reaching the patient. In this work, we propose a multi-agent framework that addresses both issues by replacing "LLM-as-a-judge" routing with deterministic orchestration constraints. The framework incorporates two safety mechanisms. First, a neuro-symbolic state-tracking gate enforces completeness of the OLDCARTS clinical protocol (Onset, Location, Duration, Character, Aggravating/Alleviating factors, Radiation, Timing, and Severity) by blocking diagnostic transitions until all required dimensions are collected. Second, an epistemic uncertainty quantification (UQ) gate computes semantic entropy ($H$) across $K=5$ independent diagnostic samples to identify and intercept divergent outputs before delivery. We evaluate the system using simulated patient agents powered by the llama-3.1-70b-instruct model on 150 test cases. The full architecture achieves 49.3% diagnostic precision, representing an absolute improvement of 11.3 percentage points over an unconstrained baseline. Additionally, we observe a statistically significant negative correlation ($r = -0.181$, $p < 0.05$) between OLDCARTS completeness ($\sigma$) and semantic entropy ($H$), suggesting that structured information gathering is associated with reduced diagnostic uncertainty.
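
The two gates described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The dimension keys, the function names, the exact-string clustering rule, and the 0.7 entropy threshold are all assumptions made for illustration; a real system would need a semantic-equivalence check to group sampled diagnoses by meaning.

```python
import math
from collections import Counter

# The eight OLDCARTS dimensions named in the abstract (key names are hypothetical).
OLDCARTS = ("onset", "location", "duration", "character",
            "aggravating_alleviating", "radiation", "timing", "severity")


def completeness(state):
    """Fraction of OLDCARTS dimensions that hold a non-empty value (sigma)."""
    filled = sum(1 for dim in OLDCARTS if state.get(dim))
    return filled / len(OLDCARTS)


def may_diagnose(state):
    """Deterministic gate: permit the diagnostic transition only when complete."""
    return completeness(state) == 1.0


def semantic_entropy(samples):
    """Entropy in nats over clusters of sampled diagnoses.

    Assumption: two samples fall in the same cluster when their lowercased,
    stripped strings are identical. This stands in for a semantic check.
    """
    clusters = Counter(s.strip().lower() for s in samples)
    total = sum(clusters.values())
    return -sum((c / total) * math.log(c / total) for c in clusters.values())


def deliver(samples, threshold=0.7):
    """Hypothetical UQ gate: release the output only if entropy is low enough."""
    return semantic_entropy(samples) <= threshold
```

With $K=5$ samples that all agree, the entropy is zero and the output passes; five mutually different samples give the maximum entropy $\ln 5$ and are intercepted.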

2606.18066 2026-06-17 cs.LG New submission

NoiseTilt: Noise-Tilted Reverse Kernels for Diffusion Reward Alignment

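Jisung Hwang, Yunhong Min, Jaihoon Kim, I-Chao Shen, Minhyuk Sung

Affiliations: KAIST; The University of Tokyo

AI summary: Proposes the Noise-Tilted Reverse Kernel (NTRK), which performs reward-guided sampling by injecting reward gradients into the noise term while leaving the pretrained reverse kernel unchanged and needing only one sample per step; it outperforms existing methods on reward alignment tasks without losing sample quality.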

Comments 52 pages

Abstract

We introduce the Noise-Tilted Reverse Kernel (NTRK), a reward-guided diffusion sampler that injects reward gradients through the noise term, leaving the pretrained reverse kernel unchanged and requiring only a single sample per step. Reward-guided sampling at inference time has greatly expanded the versatility of pretrained diffusion models. Yet existing methods face a trade-off. Gradient-based guidance shifts the reverse mean, steering generation but pushing intermediate states outside the region that the model was trained on and degrading quality. Search-based methods preserve quality but gain no gradient signal. No prior method achieves both. NTRK resolves this by keeping the reverse mean fixed and biasing the noise term toward high reward. We introduce a whitening operator, the central mechanism behind NTRK, that makes the reward gradient safe to inject as noise without losing its guiding signal. Across various reward alignment tasks, NTRK outperforms recent state-of-the-art baselines without losing sample quality. Remarkably, on aesthetic generation, NTRK surpasses the reward of the best baseline at 500 NFEs using only 25 NFEs, a 20$\times$ reduction in compute.

2606.18063 2026-06-17 cs.CV cs.AI cs.LG New submission

When LLMs Analyze Scars: From Images to Clinically-Meaningful Features

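Ruman Wang, Hangting Ye

Affiliations: Liaoning University of Traditional Chinese Medicine; School of Artificial Intelligence, Jilin University

AI summary: Proposes the ScaFE framework, which uses an LLM as a knowledge-driven feature engineer to turn high-dimensional images into low-dimensional, clinically interpretable features, outperforming end-to-end deep learning methods on data-scarce scar classification.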

Abstract

Medical image classification faces a fundamental dilemma: while deep learning models achieve remarkable performance at scale, real-world clinical scenarios often suffer from severe data scarcity due to annotation costs, privacy constraints, and disease rarity. This challenge is particularly pronounced in pathological scar classification, where differentiating keloids from hypertrophic scars requires subtle expert knowledge and labeled images are extremely limited. We propose a novel paradigm that repositions large language models (LLMs) as knowledge-driven feature engineers rather than end-to-end classifiers. We call this framework ScaFE (Scar Feature Engineering). Our key insight is that LLMs encode rich medical knowledge that can be externalized as executable feature extraction code, enabling the transformation of high-dimensional images into low-dimensional, clinically interpretable representations. Specifically, we prompt an LLM with established scar assessment criteria to generate deterministic Python code that extracts features aligned with clinical scoring systems such as the Vancouver Scar Scale. Our approach offers three key advantages: (1) data efficiency, achieving robust performance with limited training samples by decoupling knowledge acquisition from statistical learning; (2) privacy preservation, as raw images are processed locally without exposure to external LLMs; and (3) interpretability, through explicit features grounded in clinical reasoning. Extensive experiments on scar classification demonstrate that our method consistently outperforms end-to-end deep learning baselines or using LLMs as black-box classifiers under limited data conditions, establishing a promising direction for integrating LLMs into data-efficient and clinically transparent medical AI systems.

2606.18062 2026-06-17 cs.CL cs.AI cs.CR cs.HC New submission

Security and Privacy Prompts in the Wild: What Users Ask LLMs and How LLMs Respond

Hobin Kim, Xiaoyuan Wu, Omer Akgul, Lujo Bauer, Nicolas Christin

Affiliations: Carnegie Mellon University; RSAC Labs

AI summary: Using the WildChat dataset, analyzes the security and privacy questions users ask large language models, categorizes them, and evaluates the quality and consistency of model responses.

Abstract

Large language models (LLMs) are widely used to fulfill users' information needs; users ask LLMs about the weather, pose educational questions, and consult them for legal assistance. One particularly understudied area is digital security and privacy (S&P), where users may seek LLMs' help on how to secure their online accounts or protect their computers from cyber attacks. To the best of our knowledge, no prior study has collected or analyzed the S&P questions users ask LLMs; prior research on LLM response quality relied on expert-authored S&P misconceptions or FAQs rather than user queries. Drawing from WildChat, a dataset of 3.2M user-LLM conversations collected in the wild, our study identifies 14,727 S&P prompts and categorizes them into nine categories covering a wide range of S&P topics. From the S&P prompts, we sampled 450 and performed a thematic analysis to characterize the S&P questions users ask LLMs. Separate from the thematic analysis, we curated 270 advice-seeking S&P prompts, where users ask for recommendations, guidance, or specific S&P information. We measured LLM response quality and consistency when posing the prompt to LLMs 10 times. We found that commercial LLMs outperform open-weight models (GPT 5.5 provided "good enough" responses on 98% of prompts; Llama 4 on 47%). However, among prompts that received high-quality responses on average, commercial models sometimes produce contradictory responses across runs, risking confusing or misleading users.

2606.18060 2026-06-17 cs.AI cs.CL New submission

PseudoBench: Measuring How Agentic Auto-Research Fuels Pseudoscience

Xinyang Liao, Lingyu Li, Huacan Liu, Tianle Gu, Yang Yao, Tong Zhu, Yan Teng, Yingchun Wang

Affiliations: Shanghai Artificial Intelligence Laboratory; Xi’an Jiao Tong University; Shanghai Jiao Tong University

AI summary: Proposes the PseudoBench benchmark, which uses 200 pseudoscientific claim-evidence pairs to evaluate whether AI agents can identify and resist pseudoscience; current systems readily generate persuasive pseudoscientific reports, with near-zero refusal rates.

Comments 26 pages, 21 figures

Abstract

As Large Language Model based agents enter autonomous scientific research, their ability to resist pseudoscience becomes increasingly important. Otherwise, such systems may rapidly generate plausible yet misleading studies that contaminate academic literature and erode trust in science. We present PseudoBench, an adversarial benchmark for evaluating whether agentic auto-research systems can identify and resist pseudoscientific narratives. PseudoBench contains 200 curated pseudoscientific claim-evidence pairs across five domains and evaluates agents through an end-to-end research pipeline from experiments to writing. Testing seven state-of-the-art agents, we find that current systems readily produce persuasive reports that align with pseudoscientific premises with near-zero refusal rates and the highest resistance of only 27.4%. Stronger agents risk packaging pseudoscience in more sophisticated scientific language, increasing its apparent credibility. These findings reveal an alarming capacity to fuel pseudoscience, calling for scientific alignment before widespread deployment.

2606.18056 2026-06-17 cs.CL New submission

ConSA: Controllable Sparsity in Hybrid Attention via Learnable Allocation

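Yao Chen, Yinqi Yang, Junyuan Shang, Xiangzhao Hao, Simeng Zhang, Yilong Chen, Tingwen Liu, Shuohuan Wang, Dianhai Yu

Affiliations: Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences; Baidu Inc.

AI summary: Proposes the ConSA framework, which learns the allocation between full attention and sliding-window attention via L0 regularization and an augmented Lagrangian constraint to meet a user-specified sparsity target; it outperforms rule-based baselines on LLMs at the 0.6B and 1.7B scales and finds a concentrated pattern with SWA in the bottom layers and FA in the middle layers.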

Abstract

Hybrid architectures combining full attention (FA) and sliding-window attention (SWA) are a promising paradigm for efficient LLM inference. However, existing methods typically rely on hand-crafted rules or simple post-hoc heuristics for FA/SWA allocation and offer limited analysis of the attention behaviors underlying these designs. We propose Controllable Sparsity in Hybrid Attention (ConSA), a framework that learns optimal FA/SWA assignment under a user-specified sparsity target. ConSA employs L0 regularization to learn binary masks selecting between FA and SWA for each attention unit, while an augmented Lagrangian constraint enforces the target sparsity at either layer or KV-head granularity. We evaluate ConSA on two LLMs at the 0.6B and 1.7B scales. Learned allocations consistently outperform rule-based baselines, with KV-head-wise allocation yielding clear gains over layer-wise allocation. The learned patterns place SWA in the bottom layers and concentrate FA into contiguous middle-layer blocks, diverging from evenly interleaved patterns in rule-based methods. This structure persists across model scales, sparsity levels, and allocation granularities, revealing a fine-grained spectrum of intrinsic attention behaviors that underlies the learned allocation.

2606.18053 2026-06-17 cs.RO New submission

A Hybrid Optimization Framework for Grasp Synthesis under Partial Observations

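Wenzheng Zhang, Fahira Afzal Maken, Tin Lai, Fabio Ramos

Affiliations: School of Computer Science, The University of Sydney; Data61, CSIRO; NVIDIA

AI summary: Proposes a hybrid framework combining a learning-based energy model with an analytical iterative closest point method to generate robust grasps from partially observed point clouds, achieving an average success rate of 60.9% over 5,360 grasp attempts on 67 objects and outperforming existing methods.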

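2606.18051 2026-06-17 cs.CL New submission

When English Is Not the Best Teacher: Source-Language Effects in Cross-Lingual In-Context Learning

Fred Philippy, Siwen Guo, Jacques Klein, Tegawendé F. Bissyandé

Affiliations: SnT, University of Luxembourg; Luxembourg Institute of Science and Technology

AI summary: Studies the effect of source-language choice in cross-lingual in-context learning (ICL), finds that expectations derived from fine-tuning do not hold for ICL, and proposes alternative heuristics for selecting source languages effectively.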

Comments Accepted at 1st Workshop on Multilinguality in the Era of Large Language Models (MeLLM 2026), co-located with ACL 2026

2606.18024 2026-06-17 cs.LG cs.AI New submission

Catastrophic Forgetting is Low-Rank: A Function-Space Theory for Continual Adaptation

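Ido Nitzan Hidekel, Dan Raviv

Affiliations: Tel Aviv University

AI summary: Within the neural tangent kernel (NTK) framework, proposes a function-space theory that derives a closed-form expression for the old-task prediction drift caused by new-task training, shows that forgetting concentrates in a small number of old-task NTK eigenmodes, and gives a low-rank characterization and a Kronecker scaling rule.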

Comments Accepted to the ICML 2026 Workshop on Continual Adaptation at Scale: Towards Sustainable AI

Abstract

Catastrophic forgetting in continual adaptation is usually studied through parameter drift, replay, or distillation, but these views do not identify which output-space directions are vulnerable. We give a function-space account in the NTK regime: new-task training induces old-task prediction drift through the cross-task kernel, yielding a closed-form predictor for the forgetting vector before any new-task gradient step. In frozen-backbone linear-head PEFT-CL, where the model is linear in the trainable parameters, the predictor is exact up to numerical precision; for nonlinear adapters/full fine-tuning, it is a local NTK approximation. The same expression reveals that forgetting concentrates in a small number of old-task NTK eigenmodes and under frozen linear heads gives a Kronecker scaling rule for the vulnerable rank. These results clarify the relation to prior NTK-overlap theory, explain why parameter-space regularizers can miss output-space interference, and motivate a targeted spectral regularizer.
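
The cross-task-kernel mechanism can be checked on a toy linear model. The sketch below is an illustration under stated assumptions, not the paper's predictor: it covers a single gradient step on a squared loss, with a model $f(x) = w \cdot \phi(x)$ that is linear in its trainable parameters. In that setting the old-task drift equals $-\eta\, K_{\text{old,new}}\, r_{\text{new}}$, where $K_{\text{old,new}}$ holds the feature inner products and $r_{\text{new}}$ is the new-task residual. All function names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def predict(w, feats):
    """Linear-head predictions f(x) = w . phi(x) for each feature vector."""
    return [dot(w, phi) for phi in feats]


def predicted_drift(w, old_feats, new_feats, new_targets, lr):
    """Kernel-side prediction of old-task drift: -lr * K_old,new @ residual_new."""
    residual = [p - y for p, y in zip(predict(w, new_feats), new_targets)]
    return [
        -lr * sum(dot(phi_o, phi_n) * r for phi_n, r in zip(new_feats, residual))
        for phi_o in old_feats
    ]


def actual_drift(w, old_feats, new_feats, new_targets, lr):
    """Take one gradient step on 0.5 * sum(residual^2), then re-predict old inputs."""
    residual = [p - y for p, y in zip(predict(w, new_feats), new_targets)]
    grad = [sum(r * phi[k] for phi, r in zip(new_feats, residual))
            for k in range(len(w))]
    w_new = [wk - lr * gk for wk, gk in zip(w, grad)]
    before = predict(w, old_feats)
    after = predict(w_new, old_feats)
    return [a - b for a, b in zip(after, before)]
```

For a model that is linear in its parameters, the two functions agree up to floating-point error, which mirrors the abstract's remark that the predictor is exact in the frozen-backbone linear-head case and only a local approximation otherwise.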

2606.18023 2026-06-17 cs.LG cs.AI New submission

LoopCoder-v2: Only Loop Once for Efficient Test-Time Computation Scaling

Jian Yang, Shawn Guo, Wei Zhang, Tianyu Zheng, Yaxin Du, Haau-Sing Li, Jiajun Wu, Yue Song, Yan Xing, Qingsong Cai, Zelong Huang, Chuan Hao, Ran Tao, Xianglong Liu, Wayne Xin Zhao, Mingjie Tang, Weifeng Lv, Ming Zhou, Bryan Dai

Affiliations: Beihang University; IQuest Research; Langboat; Renmin University of China

AI summary: Studies loop-count selection for Parallel Loop Transformers (PLT); the two-loop variant brings significant gains on code generation and related tasks, while variants with three or more loops regress, revealing a gain-cost trade-off.

Abstract

Looped Transformers scale latent computation by repeatedly applying shared blocks, but sequential looping increases latency and KV-cache memory with the loop count. Parallel loop Transformers (PLT) alleviate this cost through cross-loop position offsets (CLP) and shared-KV gated sliding-window attention, making loop count a practical design choice. We therefore study PLT loop-count selection through a gain--cost view: an extra loop may refine representations, but CLP also introduces a positional mismatch at each loop boundary. We instantiate this study by training LoopCoder-v2, a family of 7B PLT coders with different loop counts, from scratch on 18T tokens, followed by matched instruction tuning and evaluation. Empirically, the two-loop variant delivers broad gains over the non-looped baseline across code generation, code reasoning, agentic software engineering, and tool-use benchmarks, improving SWE-bench Verified from 43.0 to 64.4 points and Multi-SWE from 14.0 to 31.0 points. In contrast, variants with three or more loops regress, revealing a strongly non-monotonic loop-count effect. Our diagnostics show that loop 2 provides the main productive refinement, while later loops yield diminishing, oscillatory updates and reduced representational diversity. Because the CLP-induced mismatch remains roughly fixed as refinement gains shrink, the offset cost increasingly dominates. This gain--cost trade-off explains PLT's saturation at two loops and provides diagnostics for loop-count selection.

2606.18022 2026-06-17 cs.LG New submission

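C2FL: Clustered Continual Federated Learning under Spatial and Temporal Drift

Davide Domini, Gianluca Aguzzi, Lorenzo Pellegrini, Mirko Viroli, Lukas Esterle

Affiliations: University of Bologna; Aarhus University

AI summary: For privacy-preserving collective adaptation of nodes under spatial heterogeneity and temporal drift, proposes C2FL, in which nodes self-organize into learning groups through spatial clustering and combine experience replay with dwell-time-aware adaptive averaging to achieve robust collective adaptation.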

Abstract

Collective Adaptive Systems (CAS) increasingly rely on machine learning to let each node learn from locally sensed data, aligning its behavior with the surrounding environment. Scaling this intelligence, however, raises fundamental challenges: sensed data is often privacy-sensitive, preventing centralized collection; nodes are mobile, traversing regions where nearby nodes perceive similar phenomena while distant ones observe radically different conditions, creating natural spatial clusters; and these distributions evolve over time due to mobility, introducing temporal drift that makes local models progressively stale. These dynamics arise across domains - vehicular sensing, drone-based monitoring, smartphone crowdsensing - yet the interplay of privacy, spatial heterogeneity, and temporal drift severely undermines conventional learning strategies. Therefore, we propose C2FL, a fully distributed Federated Learning (FL) approach where nodes self-organize into learning groups through spatial clustering, reflecting the geographic structure of the environment. To counteract temporal drift, each node combines experience replay with a dwell-time-aware adaptive averaging step, progressively incorporating the regional consensus as it remains longer within the same area, while preserving previously acquired knowledge under evolving distributions. We evaluate our approach on synthetic experiments that systematically reproduce spatial and temporal shifts, showing that standard federated strategies degrade significantly under these conditions and that our method restores robust collective adaptation.

2606.18001 2026-06-17 cs.LG New submission

Half a Link can Be Enough to Predict a Whole Link: Understanding Generalization in Knowledge Graph Foundation Models

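Cosimo Gregucci, Obaidah Theeb, Daniel Hernandez, Antonio Vergari, Steffen Staab

Affiliations: Institute for AI, University of Stuttgart; University of Southampton; University of Edinburgh

AI summary: Analyzes zero-shot generalization of knowledge graph foundation models on unseen graphs, finds that models exploit partially seen "half-links" for prediction, and derives a taxonomy of four scenarios that exposes the generalization mechanisms of existing models and where they can improve.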

Abstract

Knowledge graph (KG) foundation models (KGFMs) are zero-shot generalizers: trained once, they can predict links on unseen graphs without retraining. However, understanding when and how they can robustly generalize across KGs is still an open question. In this paper, we shed some light on their generalization mechanisms, highlighting how their performance on unseen KGs is not uniform when it comes to partially seen links, which we call half-links. In fact, we show that to predict a test triple $(h,r,t)$, it might suffice in practice to have observed the half-link $(h,r)$ or $(r,t)$ in the inference graph. This yields a taxonomy of four scenarios, according to which of these half-links are observed. In a rigorous stratified analysis over these scenarios, we reveal that SoTA KGFMs use seen half-links for predictions, while unseen half-links pose different challenges. As such, our finer-grained taxonomy can be a diagnostic protocol for robust KGFM generalization and highlights where novel KGFMs can improve.
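
The four-scenario taxonomy can be sketched as a small classifier over test triples. This is an illustrative reading of the abstract, not the authors' protocol. The function name and the scenario labels are hypothetical, and the inference graph is assumed to be a plain list of `(head, relation, tail)` tuples.

```python
def half_link_scenario(test_triple, inference_graph):
    """Classify a test triple by which of its half-links occur in the graph.

    A half-link (h, r) counts as seen if some triple in the inference graph
    has head h and relation r; (r, t) is seen if some triple has relation r
    and tail t.
    """
    h, r, t = test_triple
    head_seen = any(gh == h and gr == r for gh, gr, _ in inference_graph)
    tail_seen = any(gr == r and gt == t for _, gr, gt in inference_graph)
    if head_seen and tail_seen:
        return "both half-links seen"
    if head_seen:
        return "only (h, r) seen"
    if tail_seen:
        return "only (r, t) seen"
    return "no half-link seen"
```

Grouping test triples by the returned label gives the strata over which per-scenario link-prediction metrics could be reported.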

2606.17998 2026-06-17 cs.CV New submission

AIGS-Net: Compact Illumination Field Modeling via 2D Gaussian Splatting for Fast Low-Light Image Enhancement

Yuhan Chen, Kunyang Huang, Fuchen Li, Zhuohan Qin, Guofa Li, Wenbo Chu, Keqiang Li

Affiliations: College of Mechanical and Vehicle Engineering, Chongqing University; Department of Electrical and Computer Engineering, Carnegie Mellon University; Herbert Wertheim College of Engineering, University of Florida; School of Mathematics and Statistics, Qingdao University; National Innovation Center of Intelligent and Connected Vehicles; School of Vehicle and Mobility, Tsinghua University

AI summary: Proposes AIGS-Net, which uses an input-adaptive 2D Gaussian Splatting illumination field and zero-parameter multiscale contextual encoding to perform low-light image enhancement with about 40 learnable parameters, balancing enhancement quality and inference efficiency on the LOL and LSRW benchmarks.

Abstract

Existing low-light image enhancement methods often face a bottleneck between the representation capacity of illumination-field modeling and computational complexity. To address this issue, this paper proposes an Adaptive Illumination Gaussian Splatting Network (AIGS-Net), an ultra-lightweight architecture for fast low-light enhancement. Unlike conventional static priors, AIGS-Net constructs an input-adaptive 2D Gaussian Splatting illumination field. The opacity of Gaussian basis functions is dynamically modulated by relative luminance statistics of the input image, and spatially varying illumination compensation is rendered through ordered alpha compositing. To guide adaptive illumination compensation efficiently, a zero-parameter nonlinear multiscale contextual encoding module is introduced to extract low-frequency structures and local contrast cues without additional convolutional weights. To suppress noise amplification and sensor-induced color bias, AIGS-Net integrates noise-mask estimation, locked single-channel Gamma mapping, cross-channel consistency regularization, and target color-alignment constraints. Experiments on LOL and LSRW benchmarks show that AIGS-Net improves detail recovery and color fidelity while requiring only approximately 40 learnable parameters, achieving an effective trade-off between enhancement quality and extreme inference efficiency.

2606.17996 2026-06-17 cs.LG cs.AI New submission

Multiple cyclicity and Wavelet Decomposition with Channel Correlation for Long-term Time Series Forecasting

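Bin Wang, Heming Yang, Jinfang Sheng

Affiliations: School of Computer Science and Engineering, Central South University

AI summary: Proposes the McWC model, which builds multi-layer cyclicity, extracts inter-channel correlations with a multi-layer perceptron, fuses high- and low-frequency information via multi-level wavelet decomposition, and decouples intra-channel autocorrelation in the frequency domain for efficient and accurate long-term forecasting.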

Abstract

Cyclicity and trend are important components of time series data, and many studies based on them have achieved good results in long-term time series forecasting. However, we believe that current work neglects the influence of real-world inter-channel correlations in time series data, which leads to suboptimal predictions. Furthermore, these models rely on complex designs to capture diverse information, resulting in low computational efficiency. To address these challenges, we propose McWC, a long-term time series forecasting model that separately models cyclicity, trend, and inter-channel correlations. Specifically, McWC first decouples cyclical information from the data using a multi-layer cyclicity construction module. Then, it extracts inter-channel correlations using a multi-layer perceptron. Next, it models and fuses the multi-layer high-frequency and low-frequency information in the data using a multi-level wavelet decomposition module. Finally, it aggregates the results of the different components to obtain the output. In addition, we decouple intra-channel autocorrelations by computing the loss function in the frequency domain. Experiments on six real-world datasets demonstrate that McWC achieves state-of-the-art performance, exhibiting excellent computational efficiency and historical information extraction capabilities.
