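arXivDaily, a daily academic digest synced with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and more.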

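2606.11264 2026-06-11 q-bio.QM cs.AI New submission

OmniBioTwin: A System-of-Twinned-Systems Framework for Health Digital Twins

Zhaohui Wang, Yu Huang, Jiang Bian

Affiliations: University of Science and Technology of China

AI summary: Proposes the OmniBioTwin framework, which achieves system-level integration of cross-scale health digital twins through modular twins and interaction operators in a multi-layer network architecture, and validates it on the GLP-1 signaling pathway in Alzheimer's disease.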

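2606.11263 2026-06-11 math.ST cs.LG cs.NA math.NA math.PR stat.TH New submission

Geometric bias in eigenspace perturbation under random heterogeneous noise

Fengkai Liu, Ke Wang, Wanjie Wang

Affiliations: Department of Mathematics, Hong Kong University of Science and Technology; Department of Statistics and Data Science, National University of Singapore

AI summary: For signal-plus-noise matrices under sparse noise with heterogeneous variances, the paper finds that empirical eigenvectors carry a systematic geometric bias that classical perturbation bounds cannot capture, and derives near-optimal, non-asymptotic perturbation bounds via the Quadratic Vector Equation and fine-grained isotropic local laws.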

Comments 104 pages, 1 figure

Abstract

Spectral methods rely fundamentally on the stability of principal eigenspaces under random perturbations. Classically, this stability is quantified by the Davis-Kahan and Wedin theorems, which bound the eigenspace error using the operator norm of the noise and the relevant spectral gaps. While these worst-case bounds are sharp for arbitrary deterministic perturbations, they can be wasteful in the low-rank signal-plus-random-noise setting, as they fail to capture the fine-grained interaction between the signal geometry and the noise distribution. In this paper, we study the spectral perturbation of signal-plus-noise matrices corrupted by sparse, random noise with an arbitrary, inhomogeneous variance profile. We demonstrate that under heterogeneous noise variances, the empirical eigenvectors suffer a systematic, deterministic geometric bias that is entirely invisible to classical perturbation bounds. By leveraging the Quadratic Vector Equation (QVE) and establishing fine-grained isotropic local laws, we derive near-optimal, non-asymptotic perturbation bounds for the leading eigenspaces in the operator and $2\to\infty$ norms. The bounds separate the usual signal-to-noise contribution, stochastic fluctuations, and structured geometric bias terms determined by the alignment between the signal eigenspaces and the row-wise variance profile.
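
For reference, the classical bound that the abstract contrasts against can be written out. The statement below is the Yu-Wang-Samworth variant of the Davis-Kahan $\sin\Theta$ theorem, quoted from the general literature rather than from this paper, so the notation ($\Sigma$, $E$, $V$, $\hat V$, $\delta$) is an assumption and may differ from the authors'.

```latex
% Symmetric \Sigma with eigenvalues \lambda_1 \ge \dots \ge \lambda_n, perturbed to \hat\Sigma = \Sigma + E.
% V, \hat V \in \mathbb{R}^{n \times d}: orthonormal eigenvectors of \Sigma and \hat\Sigma
% for the indices r, \dots, s, with d = s - r + 1.
\[
  \|\sin\Theta(\hat V, V)\|_F
  \;\le\;
  \frac{2\,\min\bigl(\sqrt{d}\,\|E\|_{\mathrm{op}},\ \|E\|_F\bigr)}{\delta},
  \qquad
  \delta = \min(\lambda_{r-1} - \lambda_r,\ \lambda_s - \lambda_{s+1}) > 0 ,
\]
% with the conventions \lambda_0 = +\infty and \lambda_{n+1} = -\infty.
% The row-wise norm that appears in the abstract:
\[
  \|A\|_{2\to\infty} = \max_{i} \|A_{i,\cdot}\|_2 .
\]
```

The right-hand side sees the noise only through its norm, which is consistent with the abstract's point that a bias driven by the alignment between the signal eigenspaces and the row-wise variance profile is invisible to this kind of bound.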

2606.11256 2026-06-11 physics.chem-ph cs.LG cs.NE New submission

My Chemical Harness: Evolutionary Molecular Design over Synthetic Pathways with Large Language Model Agents

César Ojeda, Darius A. Faroughy, Maryam Karimi, Payam Zarrintaj, Mir Mehdi Seyedebrahimi, Martín Carballo-Pacheco

Affiliations: Institute of Mathematics, Faculty of Science, University of Potsdam; NHETC, Department of Physics and Astronomy, Rutgers University; Potsdam Transfer, University of Potsdam; E3 LLC

AI summary: Proposes an evolutionary framework whose population consists of executable synthetic pathways and in which large language models act only as strategy controllers, reaching state-of-the-art performance on a soluble epoxide hydrolase proxy task.

Comments 27 pages | 10 figures

Abstract

Designing molecules with target properties is most useful when candidate structures are accompanied by feasible synthetic routes. We introduce My Chemical Harness, a route-native evolutionary framework for goal-directed molecular design in which the search population consists of executable synthetic pathways rather than isolated molecular graphs. Each route is built from purchasable building blocks and reaction templates, executed by deterministic chemistry tools, and scored through task-specific molecular oracles. Large language models (LLMs) are used only as strategy controllers that select high-level preferences over route length, move type, reaction families, motifs, and exploration pressure, while local code performs route construction, validation, deduplication, scoring, selection, and memory updates. This separation lets the LLM guide exploration without allowing it to introduce hallucinated products or unsupported reaction steps. On a soluble epoxide hydrolase proxy task, our LLM agent improves over single-pass LLM and deterministic controllers, reaching state-of-the-art performance across the sEH score, synthetic accessibility score, and AiZynthFinder success rate metrics. These results suggest that constrained LLM agents can play a significant role in molecular discovery without requiring training, fine-tuning, or dedicated generative models.
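
The division of labour described in the abstract (a controller that only chooses preferences, local code that does everything else) can be sketched as a small loop. This is a toy illustration, not the authors' implementation: the building blocks, template names, validity rule, scoring function, and the `controller` stub that stands in for an LLM call are all hypothetical.

```python
import random

# Hypothetical toy data: building blocks and reaction templates are plain strings.
BLOCKS = ["A", "B", "C", "D"]
TEMPLATES = ["amide", "suzuki"]

def controller(history):
    """Stand-in for the LLM strategy controller: returns high-level preferences only."""
    # A real controller would be an LLM call; this stub alternates two settings.
    return {"max_steps": 2 + len(history) % 2, "template": TEMPLATES[len(history) % 2]}

def build_route(prefs, rng):
    """Local code constructs a route as a tuple of (template, block) steps."""
    steps = rng.randint(1, prefs["max_steps"])
    return tuple((prefs["template"], rng.choice(BLOCKS)) for _ in range(steps))

def is_valid(route):
    """Toy validation rule: no building block is used twice in one route."""
    blocks = [b for _, b in route]
    return len(blocks) == len(set(blocks))

def score(route):
    """Toy oracle: rewards block 'C' and penalizes long routes."""
    return sum(b == "C" for _, b in route) - 0.1 * len(route)

def evolve(generations=5, pop_size=4, seed=0):
    rng = random.Random(seed)
    population, history = [], []
    for _ in range(generations):
        prefs = controller(history)                       # strategy only
        candidates = [build_route(prefs, rng) for _ in range(8)]
        valid = [r for r in candidates if is_valid(r)]    # validation
        pool = set(population) | set(valid)               # deduplication
        # Selection: best score first, ties broken by the route itself.
        population = sorted(pool, key=lambda r: (-score(r), r))[:pop_size]
        history.append(prefs)                             # memory update
    return population
```

Because the controller never emits a route or a product itself, anything it suggests still has to pass `is_valid` and `score` in local code, which is the property the abstract emphasizes.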

2606.11244 2026-06-11 cs.AR cs.AI New submission

SPEAR: A System for Post-Quantization Error-Adaptive Recovery Enabling Efficient Low-Bit LLM Serving

Hongyuan Liu, Yawei Li, Zhiqiang Que, Qinli Yang, Junming Shao, Guosheng Hu

Affiliations: University of Electronic Science and Technology of China; University of Bristol; ETH Zurich

AI summary: To address the quality loss that low-bit quantization causes in LLMs, proposes the SPEAR system, which selectively corrects high-error layers with input-aware gated Error Compensators (ECs), combined with adaptive kernel-fusion dispatch and an SLO-aware scheduler, recovering 56-75% of the perplexity gap between W4 and FP16 with less than 1% memory overhead.

Abstract

Efficient large language model (LLM) serving is increasingly constrained by deployment cost. Quantization is a key technique for reducing serving cost, yet even state-of-the-art 4-bit quantizers exhibit a noticeable quality gap from FP16, particularly for smaller models where low-bit serving is most beneficial. We identify a fundamental cause of this gap: quantization error is highly input-dependent and varies substantially across tokens, while existing post-quantization compensation methods are static and apply identical corrections to all inputs. As a result, easy tokens are over-corrected while hard tokens remain under-corrected. We present SPEAR, a system for post-quantization error-adaptive recovery that improves low-bit LLM serving. SPEAR introduces lightweight Error Compensators (ECs) modulated by per-token gates and places them only at the most error-sensitive layers identified through a CKA-guided entropy-aware diagnostic. This focuses a small parameter budget where it is most effective. Efficient deployment of ECs presents several systems challenges, including additional computation, tensor-parallel synchronization caused by input-dependent gating, and latency instability across configurations. SPEAR addresses these issues through adaptive kernel-fusion dispatch, combining an epilogue-integrated peer-reduction kernel with P2P dual-write to fuse the post-EC computation into low-bit GEMMs, and an SLO-constrained EC-aware scheduler for predictable serving performance. Across challenging per-channel quantization settings, SPEAR recovers 56-75% of the perplexity gap between W4 and FP16 while adding less than 1% model memory overhead and maintaining latency comparable to a widely used 4-bit serving deployment.
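
The idea of a per-token gate scaling a correction on top of a quantized layer can be sketched in a few lines. This is a minimal illustration under strong assumptions: the compensator here is simply the exact residual `W - W_q`, whereas the abstract describes lightweight learned ECs whose form it does not give, and the function names and gate parametrization are hypothetical.

```python
import math

def quantize_per_channel(weights, bits=4):
    """Symmetric per-output-channel quantization; returns dequantized weights and scales."""
    qmax = 2 ** (bits - 1) - 1           # 7 for signed 4-bit
    deq, scales = [], []
    for row in weights:                   # one row = one output channel
        scale = max(abs(w) for w in row) / qmax or 1.0   # avoid a zero scale
        deq.append([max(-qmax, min(qmax, round(w / scale))) * scale for w in row])
        scales.append(scale)
    return deq, scales

def matvec(m, x):
    return [sum(a * b for a, b in zip(row, x)) for row in m]

def token_gate(x, w_gate, bias=0.0):
    """Hypothetical per-token gate: sigmoid of a linear score of the input."""
    s = sum(a * b for a, b in zip(w_gate, x)) + bias
    return 1.0 / (1.0 + math.exp(-s))

def gated_output(w_q, residual, x, gate):
    """y = W_q x + gate * (R x), where R stands in for an error compensator."""
    base, corr = matvec(w_q, x), matvec(residual, x)
    return [b + gate * c for b, c in zip(base, corr)]
```

With `gate = 0` the layer behaves like the plain quantized layer, and with `gate = 1` and the exact residual it reproduces the full-precision output; an input-dependent gate moves each token between these two extremes.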

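2606.11236 2026-06-11 cs.NE cs.CV cs.LG New submission

A2SG: Adaptive and Asymmetric Surrogate Gradients for Training Deep Spiking Neural Networks

Yechan Kang, Yongjin Kweon, Mingyeong Seo, Sohee Park, Yeonguk Jeon, Jongkil Park, Hyun Jae Jang, Jaewook Kim, YeonJoo Jeong, Suyoun Lee, Seongsik Park

Affiliations: KAIST

AI summary: Proposes the adaptive and asymmetric surrogate gradients (A2SG) framework, in which an adaptive window maintains the directional consistency of gradients and asymmetric gradients reflect neuronal dynamics, reducing gradient variation and promoting convergence to flatter minima, and improving accuracy and energy efficiency across a range of SNN models and tasks.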

Comments Accepted at ICML 2026

Abstract

Training deep spiking neural networks (SNNs) remains challenging due to sharp loss landscapes and temporal inconsistency caused by surrogate gradients. To address these challenges, we propose a unified framework: adaptive and asymmetric surrogate gradients (A2SG). The adaptive gradients adjust an effective window for spatio-temporal adaptation, reducing spatial gradient variation and maintaining directional consistency of gradients over time. The asymmetric gradients reflect neuronal dynamics by assigning larger gradients to neurons with higher membrane potentials, and we prove that they yield lower variation than symmetric surrogates. Our analysis further establishes a direct connection between local gradient variation and the curvature of the loss landscape, providing a principled explanation for how A2SG promotes convergence to flatter minima and improves generalization. We conduct extensive experiments on diverse models, including CNN-based and Transformer-based SNNs, across various tasks such as image classification using both static and neuromorphic datasets, as well as segmentation. The results demonstrate that A2SG consistently improves accuracy and energy efficiency, establishing it as a general and reliable solution for training deep SNNs. Our code is available at https://github.com/KIST-NCL/A2SG.git.
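
To make "asymmetric" concrete, here is one possible surrogate that gives larger gradients above the firing threshold than below it. The functional form (a triangle with different left and right widths) and all parameter names are assumptions for illustration; the abstract does not state the form A2SG actually uses.

```python
def asymmetric_surrogate(v, threshold=1.0, left_width=0.5, right_width=1.0):
    """Triangular surrogate gradient with separate widths below and above threshold.

    Hypothetical form: peaks at 1.0 when v == threshold and decays linearly to 0
    over `left_width` below the threshold and over `right_width` above it.
    With right_width > left_width, a neuron at threshold + d receives a larger
    gradient than one at threshold - d.
    """
    d = v - threshold
    width = right_width if d >= 0 else left_width
    return max(0.0, 1.0 - abs(d) / width)
```

Setting `left_width == right_width` recovers an ordinary symmetric triangular surrogate, and changing the widths during training would be one simple way to realize an adjustable effective window.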

2606.11218 2026-06-11 cs.CY cs.AI New submission

An Ethical eValuation Agent (EeVA): Results of a Proof-of-Concept Test on a Prototype Agentic-like Workflow to Assist Ethical Deliberations

Stephen Milford, B. Zara Malgir, Miguel Vazquez

Affiliations: Institute for Biomedical Ethics, Basel University; North-West University; Barcelona Supercomputing Center

AI summary: Proposes EeVA, an agentic-like LLM-based workflow that evaluates use cases against 10 ethical frameworks and produces structured evaluations and syntheses, supporting ethical reflection rather than giving definitive answers; feasibility is demonstrated on three cases.

Abstract

Ethical deliberation is often misunderstood as a search for single right or wrong answers, creating difficulties for non-ethically trained personnel who must address ethically laden challenges. We developed EeVA, an agentic-like LLM-based workflow designed to support comparative ethical reflection rather than deliver definitive ethical answers. EeVA was programmed in n8n using three interconnected workflows: starter, worker, and emitter. It evaluated uploaded use cases against 10 ethical frameworks through evaluator and synthesis prompts. Proof-of-concept testing used three published cases from urban mobility, peer-to-peer energy trading, and social-service resource allocation. Across all cases, EeVA produced consistently structured framework-specific evaluations and integrated syntheses. Outputs differentiated between frameworks, identified convergences and divergences, recommended modifications to increase alignment, and highlighted persistent ethical tensions. Syntheses were readable for non-specialists and shifted attention away from simplistic answers toward design conditions, safeguards, and areas where full cross-framework agreement was unlikely. The findings suggest that LLMs can be organised into usable workflows that preserve ethical plurality while helping bridge the communicative gap between ethicists and non-ethically trained personnel. EeVA's value lies not in replacing ethicists or resolving moral disagreement, but in scaffolding structured ethical deliberation. EeVA offers a promising proof of concept for supporting ethical reflection where access to ethics expertise is limited. Further work is needed on reproducibility, human evaluation, user testing, and efficiency before it can be considered a mature tool.

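2606.11217 2026-06-11 cs.CY cs.AI cs.HC New submission

Preregistration for Experiments with AI Agents

Michelle Vaccaro

Affiliations: MIT

AI summary: To address methodological vulnerabilities in experiments with AI agents, proposes extending preregistration practices to this setting and designs a dedicated template to improve research credibility.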

Comments Accepted at ICML 2026 as a Spotlight (Top 5%) Position Paper

Abstract

The proliferation of large language models (LLMs) and autonomous AI agents has given rise to a rapidly growing methodological paradigm: "in silico" behavioral experiments. Originally conceived as a way to use AI agents as proxies for human participants in studies of cognition, decision-making, and social dynamics, this approach has taken on new significance -- as AI agents increasingly negotiate, transact, and make consequential decisions on behalf of people and organizations, understanding their behavior has become a research priority in its own right. While these experiments with AI agents offer unprecedented advantages in terms of scalability, cost efficiency, and experimental control, they also inherit, and in some cases amplify, methodological vulnerabilities that have long plagued human subjects research. To address these issues, this paper argues that preregistration practices -- central to improving the credibility of human subjects experiments -- should now be extended to experiments with AI agents. We systematically catalog the researcher degrees of freedom that experiments with AI agents introduce -- model selection, prompt wording, settings, and outcome-contingent redesign, for example -- and show how the low cost of iteration and lack of reporting norms make these choices both easy to exploit and difficult to detect. We propose a preregistration template tailored to experiments with AI agents and call on conferences, journals, and funding agencies to make preregistration standard practice for this emerging research paradigm.
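
One way to picture what fixing the researcher degrees of freedom in advance could look like is a frozen record with a content hash. This is a hypothetical sketch, not the paper's template: the field names are chosen for illustration from the degrees of freedom the abstract lists (model, prompt wording, settings), and the hashing step is an added assumption.

```python
import hashlib
import json
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class Preregistration:
    """Choices fixed before any agent is run (illustrative fields only)."""
    model: str
    prompt: str
    temperature: float
    n_runs: int
    primary_outcome: str

    def digest(self) -> str:
        """SHA-256 over a canonical JSON dump, so later edits are detectable."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()
```

Publishing the digest before running the experiment means that an outcome-contingent change to, say, the prompt wording produces a different digest from the one on record.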

2606.11215 2026-06-11 cs.CY cs.AI New submission

The Environmental Cost of LLMs in AIED: Reporting and Practices

Sabrina C. Eimler, Lukas Erle, Daniel Flood, Aditi Haiman, Luca Häckert, André Helgert, Lachlan McGinness, Büsra Yapici

Affiliations: Institute of Computer Science and Institute of Positive Computing, Ruhr West University of Applied Sciences; Centre for Computational Science and Mathematical Modelling, Coventry University; Carnegie Mellon University; Australian National University and CSIRO

AI summary: To address the lack of standardised reporting of the computational and environmental costs of LLMs in the AIED community, proposes an open-source method for measuring and reporting carbon footprint on both local and cloud-based hardware, together with a formula for the computational expense of frontier LLMs whose parameter counts are unknown.

Abstract

Large Language Model (LLM) usage in recent years has become increasingly widespread in the Artificial Intelligence in Education (AIED) community. While LLMs offer unique avenues for learners and educators, using LLMs comes with computational and environmental costs. These costs are mostly hidden due to a lack of standardised procedures to measure and report these impacts. To address this gap, we first conducted a literature review of all papers published as part of the AIED 2025 conference proceedings, determining if and how computational or environmental costs of LLMs are reported. Most projects use LLMs, but few report computational resources used and almost none discuss environmental impacts of LLMs as an ethical concern. To address this lack of standardised reporting practices, we propose an open-source method for systematically measuring and reporting the computational expense of LLMs and environmental impact of running Machine Learning (ML) AIED systems. We provide software solutions to measure the carbon footprint for both local and cloud-based hardware. We also provide an easy-to-use formula to calculate the computational expense of frontier LLMs even when the exact number of parameters is not known. Overall, we hope to motivate colleagues to use our method to strive for more transparent reporting of hidden costs of using LLMs in the AIED community.
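
The abstract does not reproduce the authors' formula, so the sketch below uses widely cited back-of-the-envelope approximations instead, purely to show the kind of arithmetic involved: roughly 2 FLOPs per parameter per generated or processed token for a dense transformer forward pass, and emissions as energy times grid carbon intensity. The function names, the PUE factor, and every number in the usage note are assumptions.

```python
def inference_flops(n_params, n_tokens):
    """Rule-of-thumb forward-pass cost of a dense transformer: ~2 FLOPs per parameter per token."""
    return 2 * n_params * n_tokens

def energy_kwh(avg_power_watts, hours, pue=1.0):
    """Energy drawn by the hardware, scaled by a data-centre overhead factor (PUE)."""
    return avg_power_watts * hours * pue / 1000.0

def emissions_kg(energy_kwh_used, grid_kg_co2_per_kwh):
    """Carbon footprint as energy times the carbon intensity of the local grid."""
    return energy_kwh_used * grid_kg_co2_per_kwh
```

As a usage example with placeholder values: a hypothetical 7-billion-parameter model processing 1,000 tokens costs about `inference_flops(7_000_000_000, 1000)` = 1.4e13 FLOPs, and a 300 W device running for 10 hours uses `energy_kwh(300, 10)` = 3.0 kWh before any overhead factor.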

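2606.11214 2026-06-11 cs.CY cs.AI cs.HC New submission

From Awareness to Action: Understanding and Overcoming the Research-Practice Gap in Algorithmic Fairness for Public Health

Sara Altamirano, Tijs Portegies, Sennay Ghebreab

Affiliations: Informatics Institute, University of Amsterdam

AI summary: Through a mixed-methods study, examines the gap between awareness and action on algorithmic fairness in ML-driven public health research, and proposes the Fairness-to-Action framework, which integrates methodological, organizational, and systemic dimensions; it finds that fairness is weakly institutionalized, translation mechanisms are externally driven, and system-level priorities favor accuracy over fairness.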

Comments Extended version of an accepted IASEAI'26 paper; includes technical appendices. 22 pages, 2 figures

Abstract

Algorithmic fairness is essential for responsible ML-driven public health research, yet its practical implementation remains limited. To investigate this awareness-action gap, we conducted a sequential mixed-methods study comprising expert interviews, an online survey, and systematic mapping. The expert interviews informed the design of the survey, which in turn revealed fragmented definitions of fairness, limited training and guidance, reliance on external sources, and rare use of formal assessment, mitigation, or monitoring. These findings were subsequently mapped onto three established research-practice gap lenses: the Knowledge-Practice Gap, the Knowledge-to-Action Cycle, and the Knowing-Doing Gap, each offering complementary perspectives. Building on this synthesis, we introduce the Fairness-to-Action framework, which integrates methodological, organizational, and systemic dimensions to identify where translation of algorithmic fairness knowledge stalls. Our analysis shows that fairness remains weakly institutionalized, translation mechanisms are externally driven, and system-level priorities continue to emphasize accuracy over fairness. These insights suggest critical leverage points for advancing safe, fair, and ethical ML-driven public health research practice.

2606.11197 2026-06-11 eess.AS cs.AI cs.CL cs.SD New submission

MA-DLE: Speech-based Automatic Depression Level Estimation via Memory Augmentation

Xuzhi Wang, Xinran Wu, Ziping Zhao, Jianhua Tao, Björn W. Schuller

Affiliations: Tianjin Normal University; Tsinghua University; Technical University of Munich; Imperial College London

AI summary: Proposes a memory-based feature augmentation method that selectively integrates historical temporal features and dynamic memory features, combined with a Hierarchical Attention Fusion module, achieving state-of-the-art performance on the DAIC-WOZ and E-DAIC datasets.

Comments Accepted at IEEE TAC

Abstract

Speech-based automatic estimation of depression levels is essential for enabling early detection and timely intervention, particularly in resource-constrained mental health settings. In recent years, deep learning has demonstrated impressive success across various domains, including affective computing and mental health assessment. Most existing approaches rely on RNN-based architectures (such as LSTM and GRU) to model temporal information for depression estimation. However, the extracted features often emphasize only a few adjacent speech segments, limiting their ability to capture long-range dependencies. To overcome this limitation, we introduce a memory-based feature augmentation method that enhances the representational capacity of GRU-extracted features. Rather than indiscriminately incorporating historical data, our memory bank is designed to selectively integrate two types of components in order to reduce redundancy and irrelevance: (1) historical temporal features that closely resemble the current GRU output, offering complementary contextual information; and (2) dynamic memory features identified based on feature variability, which capture behavioral and emotional fluctuations indicative of depressive symptoms. To effectively fuse the memory-augmented features with GRU outputs, we further design a Hierarchical Attention Fusion (HAF) module. Our method is evaluated on the widely used DAIC-WOZ and E-DAIC datasets, achieving state-of-the-art performance.
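
The two selection rules described for the memory bank can be sketched with plain lists standing in for GRU outputs. This is an illustration under assumptions: cosine similarity is used for "closely resemble", and step-to-step Euclidean change is used for "feature variability"; the abstract does not specify either measure, and the function names are hypothetical.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def select_similar(history, current, k=2):
    """Indices of the k past feature vectors most similar to the current one."""
    ranked = sorted(range(len(history)),
                    key=lambda i: cosine(history[i], current), reverse=True)
    return ranked[:k]

def select_dynamic(history, k=1):
    """Indices of the k past steps that changed most relative to the previous step."""
    change = [math.dist(history[i], history[i - 1]) for i in range(1, len(history))]
    ranked = sorted(range(1, len(history)),
                    key=lambda i: change[i - 1], reverse=True)
    return ranked[:k]
```

Selecting a few indices by each rule, rather than appending the whole history, mirrors the abstract's stated goal of reducing redundancy and irrelevance in what gets fused with the current output.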

2606.11195 2026-06-11 cs.CY cs.AI cs.HC New submission

From Consumption to Reflection: Designing Human-AI Relations for Stable Reasoning

Rikard Rosenbacke, Carl Rosenbacke, Victor Rosenbacke, Martin McKee

Affiliations: Faculty of Medicine, Lund University; Department of Economics, Lund University School of Economics and Management; Department of Health Services Research and Policy, London School of Hygiene & Tropical Medicine

AI summary: Proposes Relational Reflective Intelligence (RRI), an inference-time governance layer that operationalizes reflection through auditable reasoning loops, turning human-AI interaction into a joint reasoning system in which each side compensates for the other's limitations to achieve stable reasoning.

Abstract

Large language models (LLMs) have transformed how humans access information, but not how we reason with it. Their fluency accelerates consumption while bypassing the slow, reflective processes that underpin sound judgment. This paper introduces Relational Reflective Intelligence (RRI), an inference-time governance layer that operationalizes reflection through auditable reasoning loops. RRI operates not inside the model but around it, providing a practical structure for stable, auditable reasoning between humans and LLMs. The core premise is that LLMs inherit cognitive vulnerabilities similar to those that shape human thought: reliance on intuitive shortcuts, confusion between representation and reality, and a preference for coherence over falsification. When humans and models share these tendencies, their errors compound. We refer to this as relational drift, a failure that arises from interaction rather than from the model alone. Addressing this requires a shift from modeling relations between words to structuring relations between model outputs and human reasoning. RRI provides this missing layer through three components: the Rose-Frame, which identifies likely breakdowns in reasoning; the Architect's Pen, which introduces targeted reflection steps at critical moments; and an inference-time workflow that embeds these steps without retraining the model. Together, these elements transform human-AI interaction into a joint reasoning system with explicit checkpoints, conflict surfacing, and an auditable trail of assumptions. Rather than making machines think like humans or forcing humans to reason like machines, RRI creates a structured interaction in which both compensate for each other's limitations. It reframes AI safety as a cognitive architecture problem, where reliable decisions depend on embedding reflection directly into the interaction process.

2606.11107 2026-06-11 eess.IV cs.CV cs.LG Updated version

Multimodal Brain Tumour Classification Using Feature Fusion

Wajih ul Islam, Muhammad Yaqoob, Javed Ali Khan, Volker Steuber

Affiliations: School of Physics, Engineering and Computer Science, University of Hertfordshire

AI summary: Proposes a two-branch multimodal network that fuses MRI images with 91 radiomic features for brain tumour classification; gated fusion reaches the best accuracy of 96.13%.

Abstract

Clinicians diagnose brain tumors by synthesizing patient symptoms, medical history, and quantitative imaging data from modalities such as MRI and CT scans into a unified clinical judgement. However, most deep learning models rely on MRI/CT images alone, failing to replicate clinicians' multimodal reasoning. We explore a two-branch multimodal network combining raw MRI scans with 91 extracted radiomic features (intensity, texture, shape, and boundary descriptors) to classify brain tumors into glioma, meningioma, pituitary, and no-tumor. A pre-trained CNN backbone encodes the image stream, whereas a dedicated MLP encodes the radiomic stream. The two streams are fused via concatenation, gating, or bidirectional cross-modal attention. Across nine experimental runs on a balanced 7,200-image dataset, all multimodal configurations outperform unimodal baselines, with gated fusion achieving the best accuracy of 96.13%.
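
Gated fusion of two feature streams is commonly written as an elementwise convex combination controlled by a learned sigmoid gate. The sketch below shows that generic pattern, assuming both branches have already been projected to the same dimension; it is not taken from the paper, and the parameter names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gated_fusion(img_feat, rad_feat, gate_weights, gate_bias):
    """Elementwise gate g in (0, 1): fused[i] = g[i] * img[i] + (1 - g[i]) * rad[i].

    gate_weights[i] is a weight vector over the concatenation [img_feat; rad_feat],
    so every gate value can depend on both streams.
    """
    joint = img_feat + rad_feat                       # list concatenation
    fused = []
    for i, (a, b) in enumerate(zip(img_feat, rad_feat)):
        score = sum(w * x for w, x in zip(gate_weights[i], joint)) + gate_bias[i]
        g = sigmoid(score)
        fused.append(g * a + (1.0 - g) * b)
    return fused
```

Unlike plain concatenation, the gate lets the network lean on the image stream for some feature dimensions and on the radiomic stream for others, on a per-sample basis.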

2606.10120 2026-06-11 cs.IR cs.AI cs.HC Updated version

MetaPlate: Counterfactual-Guided RAG-LLM Tool for Personalized Food Recommendation and Hyperglycemia Prevention

Asiful Arefeen, Carol Johnston, Hassan Ghasemzadeh

Affiliations: College of Health Solutions, Arizona State University; School of Computing and Augmented Intelligence, Arizona State University

AI summary: Proposes the MetaPlate framework, which combines counterfactual explanations, machine-learning prediction, and a RAG-LLM layer to generate personalized meal recommendations for preventing postprandial hyperglycemia; an evaluation with registered dietitians indicates its feasibility and effectiveness.

Abstract

Postprandial hyperglycemia is a key risk factor for metabolic disorders; however, existing dietary guidance is often static, impractical, and insufficiently personalized, providing recommendations that are difficult to follow or not impactful. While recent advances leverage continuous glucose monitoring (CGM) and machine learning to predict glycemic responses, these approaches are largely predictive and lack actionable guidance. Moreover, recommendation systems are often misaligned with user goals and require extensive input. We present MetaPlate, a counterfactual explanation (CF) guided, context-aware decision-support framework that generates personalized meal recommendations to mitigate postprandial glucose excursions in healthy adults. MetaPlate integrates multimodal data, including CGM readings, wearable-derived physiological signals, and user-provided meal inputs from 25 individuals, to model pre-meal context. A machine learning model predicts glucose response, while a CF optimization module adjusts meal composition by modifying macronutrient amounts to maintain glucose levels within a target range ($\leq 140$ mg/dL). An LLM-based retrieval-augmented generation (RAG) layer enhances interpretability by producing human-readable recommendations using constrained search of the USDA food database. We evaluate MetaPlate via a structured expert-in-the-loop assessment with registered dietitians (RDs), comparing performance before and after prompt refinement. Results show improvements in meal realism, portion suitability, and recommendation likelihood, with expert feedback indicating a shift from clinically implausible outputs to actionable, contextually appropriate recommendations. Our findings emphasize the importance of domain knowledge and structured constraints in LLM-driven systems and highlight the potential of MetaPlate as a real-time personalized dietary decision-support tool.
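
The counterfactual step (change the meal until the predicted response falls inside the target range) can be illustrated with a toy search. Everything numeric here is a placeholder: the linear predictor and its coefficients are invented stand-ins for the learned model and carry no physiological meaning, and the greedy carbohydrate reduction is one simple strategy, not the optimization the authors use.

```python
def predict_peak_glucose(meal, baseline=95.0):
    """Hypothetical linear stand-in for the learned predictor (mg/dL)."""
    return (baseline
            + 0.9 * meal["carbs_g"]
            - 0.2 * meal["protein_g"]
            - 0.1 * meal["fat_g"])

def counterfactual_meal(meal, target=140.0, step_g=5.0):
    """Greedy search: lower carbohydrates in small steps until the prediction meets the target."""
    cf = dict(meal)                      # leave the original meal untouched
    while predict_peak_glucose(cf) > target and cf["carbs_g"] > 0:
        cf["carbs_g"] = max(0.0, cf["carbs_g"] - step_g)
    return cf
```

The difference between the original and the returned meal is the counterfactual explanation: the smallest change, along the one axis this toy search explores, that brings the predicted peak to 140 mg/dL or below.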

2606.09964 2026-06-11 quant-ph cs.LG Updated version

JGRA: Jacobian Geometry Robustness Assessment in NISQ Noise-Aware Quantum Neural Networks

Gianluca Scanu, Luca Barletta, Stefano Rini

Affiliations: Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano; Department of Electrical and Computer Engineering, National Yang Ming Chiao Tung University

AI summary: Proposes the JGRA framework, which assesses the robustness of noise-aware quantum neural networks via Jacobian geometry, comprising entropy-matched noise calibration, noise-aware training, and noise-conditioned Jacobian extraction, and relates clean-regime structure to noisy inference behaviour.

Comments Accepted at IEEE qCCL 2026. Author accepted manuscript. 6 pages; cleaned source files, no changes to manuscript content

Abstract

The NISQ era places stringent constraints on quantum computation, where noise and decoherence fundamentally limit performance. In classical deep learning, model robustness and resilience to perturbations are well studied: deep neural networks (DNNs) maintain high performance despite pruning, noise injection, and structural perturbations due to inherent redundancy in their representations. A central challenge in quantum machine learning is to transfer this notion of robustness to quantum neural networks (QNNs) under realistic NISQ noise. While classical deep learning exhibits robustness through structural redundancy, analogous principles for QNNs remain underdeveloped. We propose JGRA: a framework for assessing robustness in noise-aware QNNs via Jacobian geometry, capturing model sensitivity to parameter perturbations induced by noise. Our method includes entropy-matched noise calibration, noise-aware training, and noise-conditioned Jacobian extraction, yielding geometric descriptors that link clean-regime structure to noisy inference behaviour. We also empirically demonstrate that these descriptors encode predictive information about robustness under unseen noise.
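
A one-qubit toy model shows what a noise-conditioned Jacobian looks like in the simplest case. The model is an assumption chosen for illustration, not the paper's setup: a single RY(theta) rotation on |0>, a Z measurement, and a depolarizing channel of the form rho -> (1 - p) rho + p I/2, for which the expectation value is (1 - p) cos(theta).

```python
import math

def expectation_z(theta, p=0.0):
    """<Z> after RY(theta) on |0>, followed by a depolarizing channel of strength p."""
    return (1.0 - p) * math.cos(theta)

def jacobian_fd(f, theta, p, eps=1e-6):
    """Central finite-difference derivative of f with respect to theta at fixed noise p."""
    return (f(theta + eps, p) - f(theta - eps, p)) / (2.0 * eps)
```

In this toy case the Jacobian is -(1 - p) sin(theta), so noise shrinks the sensitivity uniformly by the factor (1 - p); in multi-qubit circuits with structured noise the effect need not be a simple rescaling, which is the regime geometric descriptors are meant to capture.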

2606.08493 2026-06-11 q-bio.GN cs.LG stat.ML 版本更新

Querying Counterfactuals on Tissue Graphs with Supervised Disentanglement

在组织图上通过监督解缠查询反事实

Abdul Moeed, Stefan Schrod, Martin Rohbeck, Marc Jan Bonder, Pavlo Lutsik, Oliver Stegle, Daniel Dimitrov

Affiliations: Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany; Helmholtz Information & Data Science School for Health, Germany; Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany; Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands; Oncode Institute, Utrecht, The Netherlands; KU Leuven, Leuven, Belgium; Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK

AI summary: Formalizes tissue graph counterfactuals as spatial interventions and proposes Cellina, a framework that uses supervised disentanglement to separate a cell's intrinsic state from its spatial context for counterfactual prediction. It outperforms existing methods on colorectal cancer and mouse brain data.

AI中文摘要

组织图反事实询问在改变的空间邻居上下文中细胞的表达将如何变化。这类查询对于预测组织中细胞行为至关重要，但缺乏统一定义，现有方法针对特定干预类型或将细胞视为独立同分布。在这项工作中，我们首先将组织图反事实形式化为一类空间干预，这些干预要么重新连接细胞之间的边（边扰动），要么修改其邻居的表达（节点扰动）。然后，我们介绍Cellina（https://cellina.readthedocs.io），一个使用监督解缠将细胞内在状态从其空间上下文中分解出来的框架，将后者作为反事实预测的条件输入。在跨越结直肠癌和小鼠大脑中超过250万个空间分辨细胞的基准测试中，Cellina在组织扰动、解缠和可扩展性方面优于空间感知和非空间的竞争对手。此外，我们展示了Cellina以无监督方式揭示生物学上不同的癌症子域，并实现靶向邻居扰动模拟。

英文摘要

Tissue graph counterfactuals ask how a cell's expression would change under altered spatial neighbor contexts. Such queries are central to predicting cell behavior in tissues, but lack a unified definition, with existing methods targeting specific intervention types or treating cells as i.i.d. In this work, we first formalize tissue graph counterfactuals as a class of spatial interventions that either rewire connections between cells (edge perturbation) or modify the expression of their neighbors (node perturbation). We then introduce Cellina (https://cellina.readthedocs.io) - a framework that uses supervised disentanglement to decompose a cell's intrinsic state from its spatial context, using the latter as a conditioning input for counterfactual predictions. Across benchmarks spanning over 2.5 million spatially-resolved cells in colorectal cancer and mouse brain, Cellina outperforms spatially-informed and non-spatial competitors in in-silico graph perturbations, disentanglement, and scalability. Additionally, we show that Cellina reveals biologically distinct cancer subdomains in an unsupervised manner and enables targeted neighbor perturbation simulations.
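The two intervention types named in the abstract can be illustrated on a toy graph. This is a sketch under simplifying assumptions: one scalar "gene" per cell, and the spatial context summarized as the mean neighbor expression. The function names are hypothetical and this is not the Cellina API.

```python
def neighbor_context(graph, expression, cell):
    """Mean expression over a cell's spatial neighbors (one scalar gene for brevity)."""
    neighbors = graph[cell]
    if not neighbors:
        return 0.0
    return sum(expression[n] for n in neighbors) / len(neighbors)

def edge_perturbation(graph, cell, remove, add):
    """Return a copy of the graph in which one of `cell`'s edges is rewired."""
    new = {k: set(v) for k, v in graph.items()}
    new[cell].discard(remove)
    new[remove].discard(cell)
    new[cell].add(add)
    new[add].add(cell)
    return new

def node_perturbation(expression, cell, value):
    """Return a copy of the expression table with one cell's value changed."""
    new = dict(expression)
    new[cell] = value
    return new

# Toy undirected tissue graph: cell -> set of neighboring cells.
graph = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}, "d": set()}
expression = {"a": 1.0, "b": 2.0, "c": 4.0, "d": 10.0}

baseline = neighbor_context(graph, expression, "a")
rewired = neighbor_context(edge_perturbation(graph, "a", "c", "d"), expression, "a")
modified = neighbor_context(graph, node_perturbation(expression, "b", 0.0), "a")
```

Both perturbations change the context seen by cell `a` while leaving its own expression untouched. A counterfactual model would then be asked how `a` responds to the new context.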

2606.06940 2026-06-11 eess.AS cs.SD 版本更新

Beyond Semantic Dominance: Cognitive Affective Reasoning and Empathetic Response Alignment in Audio Language Models

超越语义主导：音频语言模型中的认知情感推理与共情响应对齐

Zhixian Zhao, Shuiyuan Wang, Wenjie Tian, Jingbin Hu, Ziyu Zhang, Lei Xie

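Affiliations: Northwestern Polytechnical University

AI summary: Proposes the CogAudio-LLM framework. The LIME-440K dataset supports acoustic-semantic decoupling, the EIPS chain-of-thought mechanism performs psychological reasoning, and the DR-SAPO optimization strategy balances logical rigor against empathetic quality, addressing semantic dominance and shallow affective cognition in audio language models.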

Comments Accepted by Interspeech2026

AI中文摘要

虽然音频语言模型（ALM）表现出强大的语义理解能力，但在复杂的情感交互方面仍存在困难。具体来说，文本语义主导常常掩盖声学细微差别，而缺乏认知深度导致生成通用、与情感无关的响应。我们提出了CogAudio-LLM（https://github.com/zxzhao0/CogAudio-LLM），一种新颖的认知情感推理框架。为了缓解语义主导，我们构建了LIME-440K，一个“词汇相同、多情感”的数据集，旨在促进声学-语义解耦。我们引入了EIPS，一种包含心理推理的4步思维链（CoT）机制。为了提高推理效率，多阶段训练通过监督微调显式建立EIPS，然后将这种逻辑提炼为隐式生成过程。最后，我们设计了DR-SAPO（双路径软自适应策略优化）来动态平衡CoT的逻辑严谨性与直接响应的共情质量。

英文摘要

While Audio Language Models (ALMs) demonstrate strong semantic understanding, they struggle with complex affective interactions. Specifically, textual semantic dominance often overshadows acoustic nuances, and a lack of cognitive depth leads to generic, emotion-agnostic responses. We propose CogAudio-LLM (https://github.com/zxzhao0/CogAudio-LLM), a novel cognitive affective reasoning framework. To mitigate semantic dominance, we build LIME-440K, a "lexically-identical, multi-emotion" dataset designed to facilitate acoustic-semantic decoupling. We introduce EIPS, a 4-step Chain-of-Thought (CoT) mechanism incorporating psychological reasoning. For inference efficiency, multi-stage training explicitly establishes EIPS via supervised fine-tuning, then distills this logic into an implicit generation process. Finally, we design DR-SAPO (Dual-Route Soft Adaptive Policy Optimization) to dynamically balance the logical rigor of the CoT with the empathetic quality of the direct response.

2606.07001 2026-06-11 cs.DB cs.AI 版本更新

DataEvolver: Automatic Data Preparation for Large Language Models through Multi-Level Self-Evolving

DataEvolver: 通过多级自我进化实现大型语言模型的自动数据准备

Chao Deng, Shaolei Zhang, Ju Fan, Xiaoyong Du

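Affiliations: Renmin University of China

AI summary: Proposes DataEvolver, described as the first self-evolving data preparation system. It uses a multi-level mechanism to automatically build pipelines that turn raw data into high-quality data, improving downstream LLM performance by 10% on average across seven benchmarks.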

AI中文摘要

高质量训练数据对大型语言模型（LLMs）至关重要，通常需要大量且昂贵的人工整理。现有的自动数据准备方法依赖于预定义管道或定制化人工指令，这限制了它们对不同数据分布的适应性，并且缺乏来自高质量示例的原则性指导。在本文中，我们介绍了DataEvolver，这是首个自我进化的数据准备系统，能够自动构建管道将原始数据转化为高质量数据。DataEvolver采用多级机制来确保管道的可执行性和有效性。在算子级别，它逐步扩展算子集以构建逻辑计划，同时解决依赖冲突。在管道级别，它将逻辑计划实例化为可执行代码，并通过反馈循环迭代优化管道编排，从而减少准备数据与高质量示例之间的分布差距。在七个基准上的实验表明，与在原始数据上训练相比，DataEvolver显著提高了数据质量，并使下游LLM性能平均提升10%，突显了LLM与数据迭代协同进化的新机遇。

英文摘要

High-quality training data is essential to large language models (LLMs) and typically requires extensive and costly manual curation. Existing automatic data preparation methods rely on predefined pipelines or customized human instructions, which limits their adaptability to diverse data distributions and leaves them without principled guidance from high-quality examples. In this paper, we introduce DataEvolver, the first self-evolving data preparation system that automatically constructs pipelines to transform raw data into high-quality data. DataEvolver employs a multi-level mechanism to ensure both pipeline executability and effectiveness. At the operator level, it incrementally expands the operator set to construct a logical plan while resolving dependency conflicts. At the pipeline level, it instantiates logical plans into executable code and iteratively refines pipeline orchestration through a feedback loop that reduces the distribution gap between prepared data and high-quality examples. Experiments on seven benchmarks show that DataEvolver substantially improves data quality and achieves an average 10% gain in downstream LLM performance compared with training on original data, highlighting new opportunities for the iterative co-evolution of LLMs and data.

2606.05907 2026-06-11 cs.IR cs.LG 版本更新

Knowledge Manifold: A Riemannian Geometric Framework for Semantic Mapping and Geodesic Analysis of Scientific Literature

知识流形：用于科学文献语义映射和测地线分析的黎曼几何框架

Tomonaga Okabe, Kazuhiko Komatsu

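Affiliations: Department of Aerospace Engineering, Tohoku University; Research Center for Green X-Tech, Tohoku University

AI summary: Proposes the knowledge manifold framework, which combines character n-gram TF-IDF, SPH interpolation, Gaussian process regression, and Riemannian geodesic paths to support semantic mapping of literature, virtual knowledge generation, and discovery of conceptual bridges.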

AI中文摘要

我们提出了知识流形：一个黎曼几何空间，其中文档语料库根据从字符n-gram TF-IDF表示中导出的语义位置关系进行排列。该框架包含五个紧密耦合的阶段。首先，每篇文档被转换为字符级n-gram TF-IDF向量（4-7克，最多250,000个特征，L2归一化），并通过带有排斥、方差和中心正则化项的约束应力最小化嵌入到二维知识地图中。其次，通过使用三次样条核的平滑粒子流体动力学（SPH）插值估计任意查询点的知识，得到可进行语言表征的插值TF-IDF特征向量。第三，从SPH插值图计算0、45和90度方向的知识梯度，并通过内积和余弦相似度量化成对方向相似性。第四，一个高斯过程回归（GPR）模型，使用在10维SVD投影上拟合的Constant × RBF + White核，提供查询点的贝叶斯后验均值、不确定性估计和每篇文档的贡献率。第五，通过最小化由SPH诱导度量张量导出的离散黎曼路径能量，使用L-BFGS-B算法和七个确定性初始路径候选，获得知识空间中的测地线。我们将该公式应用于20篇纤维增强复合材料与航空航天结构力学论文的语料库，表明语义地图恢复了有意义的研究聚类，测地线路径揭示了遥远主题之间的自然概念桥梁，并且SPH/GPR插值能够生成虚拟知识：描述未研究但几何预测的研究方向的假设论文摘要。

英文摘要

We present the knowledge manifold: a Riemannian geometric space in which a corpus of documents is arranged according to semantic positional relationships derived from character n-gram TF-IDF representations. The framework proceeds in five tightly coupled stages. First, each document is converted to a character-level n-gram TF-IDF vector (4-7 grams, up to 250,000 features, L2-normalized) and embedded in a two-dimensional knowledge map via constrained stress minimization with repulsion, variance, and centering regularizers. Second, knowledge at an arbitrary query point is estimated through Smoothed Particle Hydrodynamics (SPH) interpolation using a cubic-spline kernel, yielding an interpolated TF-IDF feature vector that can be linguistically characterized. Third, directional knowledge gradients at 0, 45, and 90 degrees are computed from the SPH interpolation map, and pairwise directional similarity is quantified via inner product and cosine similarity. Fourth, a Gaussian Process Regression (GPR) model, with a Constant x RBF + White kernel fitted on a 10-dimensional SVD projection, provides a Bayesian posterior mean, uncertainty estimate, and per-document contribution rate at the query point. Fifth, geodesics in the knowledge space are obtained by minimizing a discrete Riemannian path energy derived from the SPH-induced metric tensor, using L-BFGS-B with seven deterministic initial-path candidates. We apply the formulation to a corpus of 20 papers in fiber-reinforced composite materials and aerospace structural mechanics, showing that the semantic map recovers meaningful research clusters, geodesic paths reveal natural conceptual bridges between distant topics, and SPH/GPR interpolation enables the generation of virtual knowledge: hypothetical paper abstracts describing unstudied but geometrically predicted research directions.
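The second stage, SPH interpolation with a cubic-spline kernel, can be sketched as follows. The kernel is the standard two-dimensional cubic spline with normalization 10/(7πh²). The weighted-average (Shepard-normalized) form of the interpolant and the smoothing length `h` are assumptions, since the abstract gives neither.

```python
import math

def cubic_spline_kernel(r, h):
    """Standard 2D cubic-spline SPH kernel with compact support of radius 2h."""
    q = r / h
    sigma = 10.0 / (7.0 * math.pi * h * h)
    if q <= 1.0:
        return sigma * (1.0 - 1.5 * q**2 + 0.75 * q**3)
    if q <= 2.0:
        return sigma * 0.25 * (2.0 - q) ** 3
    return 0.0

def sph_interpolate(query, positions, features, h):
    """Kernel-weighted average of document feature vectors at a 2D query point."""
    weights = [cubic_spline_kernel(math.dist(query, p), h) for p in positions]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no documents within the kernel support of the query point")
    dim = len(features[0])
    return [sum(w * f[k] for w, f in zip(weights, features)) / total
            for k in range(dim)]

# Two documents on the 2D map, each with a tiny stand-in for a TF-IDF vector.
positions = [(0.0, 0.0), (1.0, 0.0)]
features = [[1.0, 0.0], [0.0, 1.0]]
midpoint = sph_interpolate((0.5, 0.0), positions, features, h=1.0)
near_first = sph_interpolate((0.0, 0.0), positions, features, h=0.4)
```

At the midpoint both documents contribute equally. With the smaller `h`, the second document falls outside the kernel support, so the interpolated vector equals the first document's features.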

2606.05608 2026-06-11 cs.SE cs.AI 版本更新

Agentic Software: How AI Agents Are Restructuring the Software Paradigm

代理式软件：AI代理如何重构软件范式

Zhenfeng Cao

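Affiliations: Lingxi Intelligent Investment (Shenzhen) Development Co., Ltd.

AI summary: Using a first-principles analysis, argues that AI agent systems with an LLM as the reasoning engine are fundamentally restructuring the software paradigm, shifting from traditional software (code carries the decision logic) to agent systems (code is a temporary tool), and proposes agentic engineering as an emerging discipline.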

Comments 15 pages, 2 figures, and 3 tables

AI中文摘要

半个多世纪以来，软件工程一直基于一个基本前提：人类工程师分解问题，将决策逻辑编码为静态代码，并随着需求演变手动调整代码。本文认为，AI代理——即大型语言模型作为主要推理引擎、动态生成和丢弃代码作为工具资源的系统——的出现并非渐进式改进，而是对软件范式的根本性重构。基于复杂性缩放的第一性原理分析，我们形式化了传统软件（代码是决策逻辑的载体）与代理系统（代码是LLM驱动推理循环的临时工具）之间的区别。我们追溯了从许可软件到SaaS再到我们所谓的代理即服务（AaaS）的历史轨迹，表明每次转变都将额外的复杂性从最终用户转移出去。我们引入了代理工程作为一门新兴学科——其核心研究对象、控制模型和人类角色均不同于软件工程。通过分析最近的基准证据，包括SWE-bench Verified、EvoClaw和LangChain的多代理协调研究，我们展示了代理范式的变革潜力及其当前局限性。最后，我们提出了一个迈向自我进化代理生态系统的四阶段路线图，并为应对这一转变的从业者提供了具体建议。

英文摘要

For over half a century, software engineering has operated on a foundational premise: human engineers decompose problems, encode decision logic into static code, and manually adapt that code as requirements evolve. This paper argues that the emergence of AI agents -- systems where large language models serve as the primary reasoning engine, dynamically generating and discarding code as an instrumental resource -- constitutes a fundamental restructuring of what software is, not an incremental tool improvement. We formalize the distinction between traditional deterministic software and agentic software: in the former, code is the carrier of pre-written decision logic; in the latter, the agent itself is the software, and its decision logic is generated at runtime. We trace the historical arc from licensed software to SaaS to Agent-as-a-Service (AaaS), showing that each shift transferred additional complexity away from end-users -- with the agentic shift transferring not just operational complexity but decision-making complexity itself. We introduce Agentic Engineering as an expansion of the software engineering discipline into a new paradigm, distinct in its core object of study (agent systems rather than static source code), its control model (LLM-driven rather than human-predefined), and its human role (intent architect rather than code author). Through analysis of recent benchmark evidence including SWE-bench Verified, EvoClaw, and LangChain's multi-agent coordination studies, we demonstrate both the transformative potential of the agentic paradigm and its current limitations. We conclude with a four-stage roadmap toward self-evolving agent ecosystems and concrete recommendations for practitioners navigating this transition.

2606.05551 2026-06-11 stat.ML cs.AI cs.LG 版本更新

Conformal Risk-Averse Decision Making with Action Conditional Guarantee

具有行动条件保证的共形风险规避决策

Zihan Zhu, Shayan Kiyani, George Pappas, Hamed Hassani

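Affiliations: University of California, Berkeley

AI summary: Proposes action-conditional conformal prediction, with a pinball-loss minimization algorithm that optimizes action-conditional value-at-risk and provides action-conditional safety guarantees in finite samples.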

AI中文摘要

由机器学习模型驱动的可靠决策管道需要具有明确安全保证的不确定性量化（UQ）方法。共形预测通过将ML预测包装成预测集来提供这种UQ，而Kiyani等人（2025b）的最新工作表明，这些集合可以转化为最优的风险规避决策策略——但仅继承边际安全保证。我们通过以下方式推广并加强了他们的结果：（i）引入行动条件共形预测，该预测产生明确条件于决策者所采取的每个行动的安全保证；（ii）表明行动条件预测集可作为风险规避决策者旨在优化行动条件风险价值的可行决策空间的代理；（iii）提出一种基于分位数损失最小化的原则性有限样本算法，将Gibbs等人（2025）的框架与行动条件保证联系起来。在两个真实世界数据集上的实验证实，我们的方法在行动条件性能上显著优于共形基线。

英文摘要

Reliable decision making pipelines powered by machine learning models require uncertainty quantification (UQ) methods that come with explicit safety guarantees. Conformal prediction provides such UQ by wrapping ML predictions into prediction sets, and recent work by Kiyani et al. (2025b) established that these sets can be translated into optimal risk-averse decision policies -- yet only inheriting marginal safety guarantees. We generalize and strengthen their results by (i) introducing action-conditional conformal prediction, which yields safety guarantees conditioned explicitly on each action taken by the decision maker, (ii) showing that action-conditional prediction sets serve as a proxy for the feasible decision space for risk-averse decision makers aiming to optimize action-conditional value-at-risk, and (iii) proposing a principled finite-sample algorithm based on pinball-loss minimization, connecting the framework of Gibbs et al. (2025) to action-conditional guarantees. Experiments on two real-world datasets confirm that our approach significantly improves action-conditional performance over conformal baselines.
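Two ingredients mentioned in the abstract are easy to sketch: the pinball loss and a finite-sample conformal threshold. The per-action calibration below is a simple group-wise (Mondrian-style) stand-in chosen for illustration. It is not the paper's algorithm, which the abstract describes only as based on pinball-loss minimization, and all function names are hypothetical.

```python
import math

def pinball_loss(y, q, tau):
    """Quantile (pinball) loss of predicting quantile q for outcome y at level tau."""
    diff = y - q
    return tau * diff if diff >= 0 else (tau - 1.0) * diff

def conformal_threshold(scores, alpha):
    """Split-conformal threshold: the ceil((n+1)(1-alpha))-th smallest score."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this coverage level
    return sorted(scores)[rank - 1]

def thresholds_by_action(scores, actions, alpha):
    """Calibrate a separate threshold on the scores observed under each action."""
    groups = {}
    for score, action in zip(scores, actions):
        groups.setdefault(action, []).append(score)
    return {a: conformal_threshold(g, alpha) for a, g in groups.items()}

scores = [1, 2, 3, 10, 20, 30]
actions = ["a", "a", "a", "b", "b", "b"]
per_action = thresholds_by_action(scores, actions, alpha=0.5)
marginal = conformal_threshold(scores, alpha=0.5)
```

The single marginal threshold sits between the two groups' scores, so it is too loose for action `a` and too tight for action `b`. That mismatch is what conditioning on the action is meant to avoid.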

2605.31506 2026-06-11 cs.IR cs.CL 版本更新

Evaluating Factual Density in Multi-Source RAG: A Study in Medical AI Accuracy

评估多源RAG中的事实密度：医学AI准确性研究

Michael R. DeMarco

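Affiliations: NexusAgentics

AI summary: Standard RAG pipelines overlook high-density factual evidence because of the Expert Blindness Effect. The paper proposes Factual Density (FD*) as a retrieval optimization signal, removes length bias through probabilistic factuality preprocessing and Z-score normalization, and reaches 100% systematic review coverage on the HealthFC benchmark.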

Comments 16 pages, 8 tables. Includes Experiment 3 results (n=11, Wilcoxon p=0.0619). Preliminary findings; powered Experiment 3 and Graph RAG extension identified as future work. Updated from v1

AI中文摘要

检索增强生成（RAG）是当前将AI锚定于现实世界事实的行业标准。传统检索方法依赖关键词匹配和主题接近度，根据内容与用户查询的相似程度进行排序。但它们并未衡量内容实际包含多少经过验证的事实。这种结构性差距被称为专家盲视效应，导致标准RAG管道持续将高密度事实证据埋没，而偏向于同一主题的词汇主导文本。为解决这一差距，本文引入事实密度（FD*），一种新颖的检索优化信号，衡量经过验证的原子声明相对于总标记数的比例。使用NexusAgentics Ghost Audit预处理管道，通过概率事实性分析对原始文本进行事实特异性评分，在语料库摄入前过滤内容。初始公式引入了严重的文档长度混杂因素（Pearson R = -0.8636，p = 2.27e-07）。在长度区间内实施Z-score归一化解决了这一偏差，验证了FD*作为长度无关的密度信号（p = 0.0749）。在HealthFC基准（由医学专家标记为支持、反驳或无证据的750个健康声明）上评估，FD*优化的检索是唯一在top-5结果中实现100%系统综述饱和度的条件，使标准余弦相似度排名前十之外的Cochrane证据浮现。真实验证确认了跨越七个HealthFC支持声明的25个映射。由于语料库-基准对齐的限制，n=50个查询的完整统计验证仍是未来工作，但这些发现确立了事实密度重排序作为一种低成本、高影响力的干预措施，用于提高健康RAG架构的事实精度。

英文摘要

Retrieval-Augmented Generation (RAG) is the current industry standard for grounding AI in real-world facts. Traditional retrieval methods rely on keyword matching and topic proximity, ranking content based on how closely it sounds like the user's query. What they do not measure is how many verified facts the content actually contains. This structural gap, termed the Expert Blindness Effect, causes standard RAG pipelines to consistently bury high-density factual evidence in favor of lexically dominant text on the same topic. To address this gap, this paper introduces Factual Density (FD*), a novel retrieval optimization signal that measures the proportion of verified atomic claims relative to total token count. Using the NexusAgentics Ghost Audit preprocessing pipeline, raw text is scored for factual specificity using probabilistic factuality analysis to filter content before corpus ingestion. An initial formulation introduced a severe document-length confound (Pearson R = -0.8636, p = 2.27e-07). Implementing Z-score normalization within length bins resolved this bias, validating FD* as a length-independent density signal (p = 0.0749). Evaluated against the HealthFC benchmark (750 health claims labeled Supported, Refuted, or No Evidence by medical experts), FD*-optimized retrieval was the only condition to achieve 100% systematic review saturation in top-5 results, surfacing Cochrane evidence that standard cosine similarity ranked outside the top ten. Ground truth verification confirmed 25 mappings across seven HealthFC-supported claims. While full statistical validation across n=50 queries remains future work due to constraints on corpus-benchmark alignment, these findings establish factual density reranking as a low-cost, high-impact intervention for improving factual precision in health RAG architectures.
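The density signal and its length correction can be sketched as follows, taking the abstract's definition literally (verified claims divided by token count) and standardizing within length bins. The bin edges, the use of the population standard deviation, and the function names are assumptions. The claim verification itself is outside the scope of this sketch.

```python
import statistics

def factual_density(verified_claims, token_count):
    """Raw density: verified atomic claims per token."""
    return verified_claims / token_count

def zscore_within_length_bins(docs, bin_edges):
    """Standardize densities separately inside each document-length bin.

    docs: list of (token_count, density) pairs; returns z-scores in input order.
    """
    def bin_index(tokens):
        return sum(tokens >= edge for edge in bin_edges)

    bins = {}
    for i, (tokens, _density) in enumerate(docs):
        bins.setdefault(bin_index(tokens), []).append(i)

    z = [0.0] * len(docs)
    for members in bins.values():
        values = [docs[i][1] for i in members]
        mean = statistics.fmean(values)
        std = statistics.pstdev(values)
        for i in members:
            z[i] = (docs[i][1] - mean) / std if std > 0 else 0.0
    return z

# Short documents have higher raw densities than long ones in this toy corpus.
docs = [(100, 0.10), (120, 0.20), (1000, 0.01), (1200, 0.03)]
z_scores = zscore_within_length_bins(docs, bin_edges=[500])
```

After binning, the denser document in each length class gets the same positive score, so a long document is no longer penalized simply for being long.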

2605.26234 2026-06-11 math.DG cs.LG math.GT 版本更新

Minimal surfaces, Knots, and Neural Networks

极小曲面、纽结与神经网络

Tancredi Schettini Gherardini, Marco Usula

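Affiliations: GitHub

AI summary: Uses physics-informed neural networks to solve the minimal surface equation in hyperbolic space. By computing minimal surfaces bounded by knots and their self-intersection numbers, it provides empirical support for Fine's conjecture.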

Comments 38 pages, 12 figures; small cosmetic update

2605.22346 2026-06-11 stat.ML cs.LG cs.SI 版本更新

The ASE-LSE Disagreement Landscape: An End-to-End Characterisation of Extremes and Structural Drivers

ASE-LSE分歧全景：极端情形与结构驱动因素的端到端刻画

Minh Triet Pham, Ian Gallagher

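Affiliations: School of Mathematics and Statistics; The University of Melbourne

AI summary: Studies the structural reasons why adjacency spectral embedding and Laplacian spectral embedding give different results on the same network, showing how degree heterogeneity and community structure strength affect latent subspace disagreement.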

Comments This paper is being withdrawn as it was submitted without the consent of all listed authors, and contains work that is currently under academic assessment. It will be resubmitted at an appropriate time once evaluation is complete

AI中文摘要

图数据分析中，邻接谱嵌入和拉普拉斯谱嵌入两种最常用方法在相同网络上常产生不同结果。本文提供了结构上的解释。我们证明正则性是完美一致的充分条件：当每个节点具有相同数量的连接时，两种方法产生相同的潜在子空间。任何偏离正则性都会引入分歧，我们证明了一个显式的界限，其两个术语表明控制分歧的结构因素：度异质性推动方法分离，社区结构强度则拉近它们。我们通过成千上万个模拟网络验证了这两种驱动因素，确认异质性推动分歧增加，社区强度抑制它，其比值提供了两种嵌入可以互换或不可互换的强预测。

英文摘要

Two of the most widely used methods for analysing graph data, Adjacency Spectral Embedding and Laplacian Spectral Embedding, often produce different results when applied to the same graph. Yet the structural reasons behind this disagreement remain incompletely understood. This paper provides an end-to-end account of ASE-LSE latent subspace disagreement. We first prove that the two methods produce identical latent subspaces for every embedding dimension whenever the Laplacian is a scalar multiple of the adjacency matrix, and show that this scalar relationship holds if and only if the graph is either regular or bipartite biregular. This anchor result identifies a sufficient condition for perfect agreement that pins down the floor of the disagreement spectrum and supplies the baseline for the perturbation analysis. We then prove that no maximal-disagreement graph or family of graphs exists: the disagreement is always strictly below its theoretical ceiling, and we exhibit a witness family demonstrating that no finite maximum is attainable, so the disagreement landscape has no maximiser. With both endpoints established, we derive a Regularity Departure Bound whose two terms isolate degree heterogeneity and eigengap as the primary structural factors influencing disagreement in the middle regime. Empirical validation across thousands of simulated graphs confirms the mechanisms predicted by the bound: heterogeneity pushes disagreement up, eigengap suppresses it, and their joint ratio emerges as a unified predictor of ASE-LSE disagreement, suggesting when the two embeddings can be treated as interchangeable and when they cannot.
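The anchor result can be checked numerically on small graphs. The sketch assumes the normalized form $D^{-1/2} A D^{-1/2}$ that is commonly used for Laplacian spectral embedding. The abstract does not state which Laplacian the paper uses, and the helper names are hypothetical. Graphs must have no isolated vertices.

```python
import math

def normalized_laplacian(adj):
    """D^{-1/2} A D^{-1/2} for a 0/1 adjacency matrix with no isolated vertices."""
    degrees = [sum(row) for row in adj]
    n = len(adj)
    return [[adj[i][j] / math.sqrt(degrees[i] * degrees[j]) for j in range(n)]
            for i in range(n)]

def scalar_multiple(adj, lap, tol=1e-12):
    """Return c if lap == c * adj on every edge, otherwise None."""
    n = len(adj)
    ratios = [lap[i][j] / adj[i][j] for i in range(n) for j in range(n) if adj[i][j]]
    return ratios[0] if max(ratios) - min(ratios) <= tol else None

cycle4 = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]  # 2-regular
star3 = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]   # bipartite biregular
path4 = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]   # neither

c_cycle = scalar_multiple(cycle4, normalized_laplacian(cycle4))
c_star = scalar_multiple(star3, normalized_laplacian(star3))
c_path = scalar_multiple(path4, normalized_laplacian(path4))
```

For the regular and biregular graphs the two matrices differ only by a constant factor, so they share eigenvectors and the embeddings span the same subspaces. The path graph has mixed degrees and no such constant exists.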

2603.12901 2026-06-11 stat.ML cond-mat.dis-nn cs.IT cs.LG math.IT 版本更新

A theory of learning data statistics in diffusion models, from easy to hard

扩散模型中学习数据统计的理论：从容易到困难

Lorenzo Bardone, Claudia Merger, Sebastian Goldt

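Affiliations: University of California, Berkeley

AI summary: Studies the distributional simplicity bias of diffusion models when learning data statistics, shows that pairwise and higher-order statistics require different sample complexities, and introduces an invariant called the diffusion information exponent.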

Journal ref: ICML 2026

AI中文摘要

尽管扩散模型已成为强大的生成模型，但其学习动态仍不明确。我们通过实验证明，标准扩散模型在自然图像上学习时存在分布简单性偏差，先学习简单的 pairwise 输入统计，再转向更高阶相关性。我们在简单的去噪器上用最小数据模型混合累积模型重现了这一行为，并精确控制了输入的 pairwise 和 higher-order 相关性。我们识别出一个模型不变量，即扩散信息指数，类比于不同学习范式中的相关不变量。利用这一不变量，我们证明去噪器在线性样本复杂度下学习输入的简单 pairwise 统计，而更复杂的 higher-order 统计如四阶累积量需要至少立方样本复杂度。我们还证明，如果 pairwise 和 higher-order 统计共享相关潜在结构，则学习四阶累积量的样本复杂度是线性的。本文描述了扩散模型如何学习越来越复杂分布的关键机制。

英文摘要

While diffusion models have emerged as a powerful class of generative models, their learning dynamics remain poorly understood. We address this issue first by empirically showing that standard diffusion models trained on natural images exhibit a distributional simplicity bias, learning simple, pair-wise input statistics before specializing to higher-order correlations. We reproduce this behaviour in simple denoisers trained on a minimal data model, the mixed cumulant model, where we precisely control both pair-wise and higher-order correlations of the inputs. We identify a scalar invariant of the model that governs the sample complexity of learning pair-wise and higher-order correlations that we call the diffusion information exponent, in analogy to related invariants in different learning paradigms. Using this invariant, we prove that the denoiser learns simple, pair-wise statistics of the inputs at linear sample complexity, while more complex higher-order statistics, such as the fourth cumulant, require at least cubic sample complexity. We also prove that the sample complexity of learning the fourth cumulant is linear if pair-wise and higher-order statistics share a correlated latent structure. Our work describes a key mechanism for how diffusion models can learn distributions of increasing complexity.
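For a scalar input, the "pair-wise" statistic is the variance (second cumulant) and the simplest higher-order statistic discussed in the abstract is the fourth cumulant. A minimal sketch of the plug-in estimators follows. It only illustrates the quantities involved, not the mixed cumulant model or the sample-complexity results.

```python
def central_moment(samples, order):
    """Plug-in estimate of the central moment of the given order."""
    mean = sum(samples) / len(samples)
    return sum((x - mean) ** order for x in samples) / len(samples)

def second_and_fourth_cumulants(samples):
    """Variance and fourth cumulant: kappa_4 = m_4 - 3 * m_2**2."""
    m2 = central_moment(samples, 2)
    m4 = central_moment(samples, 4)
    return m2, m4 - 3.0 * m2 * m2

# Balanced +/-1 samples: unit variance but a clearly non-Gaussian fourth cumulant.
rademacher = [-1.0, 1.0] * 50
variance, kappa4 = second_and_fourth_cumulants(rademacher)
```

A Gaussian has zero fourth cumulant, so a nonzero `kappa4` is exactly the kind of information that second-order statistics alone cannot reveal.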

2411.10959 2026-06-11 econ.EM cs.LG math.ST stat.AP stat.ME stat.ML stat.TH 版本更新

Program Evaluation with Remotely Sensed Outcomes

利用遥感结果的项目评估

Ashesh Rambachan, Rahul Singh, Davide Viviano

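Affiliations: MIT; Harvard

AI summary: Studies the causal inference problems that arise in experiments and quasi-experiments when remotely sensed variables measure economic outcomes imperfectly, and proposes a method for nonparametric identification of causal parameters that combines experimental and observational data for inference at the $n^{-1/2}$ rate.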

2605.17557 2026-06-11 cs.GR cs.CV 版本更新

Real-Time Neural Hair Denoising

实时神经头发去噪

Chenghao Wu, Yuefan Shen, Tao Huang, Kai Yan, Zahra Montazeri, Kui Wu

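Affiliations: University of Manchester

AI summary: Proposes a lightweight real-time method for reconstructing strand-based hair G-buffers from severely undersampled rasterized input. The method first applies neural spatial reconstruction and temporal accumulation to recover hair coverage (the fractional hair visibility within a pixel) and tangent vectors, then uses a tangent-guided reconstruction step to complete the position information, which feeds physically based deferred hair shading. Evaluated on a range of hairstyles in static and dynamic scenes, it reconstructs hair better than existing hair-specific denoising techniques and general-purpose industrial neural reconstruction solutions such as DLSS and FSR.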

2605.05368 2026-06-11 math.LO cs.AI 版本更新

Towards an Inferentialist Account of Information Through Proof-theoretic Semantics

通过证明论语义走向信息的推理主义阐释

Matthew Collinson, Timo Eckhardt, David Pym

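Affiliations: University of Aberdeen, King’s College; UCL & Institute of Philosophy, University of London; University of London, Senate House, Malet Street

AI summary: Takes first steps toward an inferentialist theory of information based on proof-theoretic semantics. The work rests on three core components (conceptual analysis, logic, and systems), aims to give information a mathematical-logical foundation, and focuses on information as correlation.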

Comments Manuscript

AI中文摘要

信息是当今时代讨论最广泛的概念之一。然而，尽管已有大量富有洞见的工作，信息仍未获得完全令人信服的逻辑或数学基础。没有这样的基础，我们就缺乏足够的推理工具来理解社会所依赖的由众多系统构成的复杂生态。我们迈出发展信息的推理主义语义理论的第一步，以弥补这一不足。该工作有三个相互作用的关键组成部分。首先是概念分析：信息的形而上学。Dretske用意向性、真理和可传递性来表述信息的关键概念。我们用可推断性代替真理，并追溯这一替换的后果。其次是逻辑：证明论语义（P-tS）为推理主义推理提供了数学-逻辑上的实现。借助P-tS，我们为信息的推理主义基本单位“inferon”的数学-逻辑理论迈出了第一步。这种证明论方法与情境理论中阐述的信息的模型论观点形成对照。此外，我们认为它有助于处理van Benthem和Martinez对信息理解所作分类的全部三个方面：作为范围、作为相关性和作为代码。我们的重点是作为相关性的信息。第三是系统：我们开发的P-tS工具为分布式系统建模的数学阐述提供了基础，而分布式系统建模是信息学中理解信息处理系统组织方式的关键工具。由此得到分布式系统模型中信息流的一种基于推理的理论。总体而言，我们力图对信息及其在信息学中的作用给出概念上严谨的数学-逻辑阐述，并以推断和推理为基础。

英文摘要

Information is one of the most widely-discussed concepts of the current era. However, a great deal of insightful work notwithstanding, it is yet to be given wholly convincing logical or mathematical foundations. Without them, we lack adequate reasoning tools for understanding the complex ecosystems of systems upon which the society depends. We seek to rectify this by taking a first step towards developing an inferentialist semantic theory of information. There are three key interacting components. First, conceptual analysis: the metaphysics of information. Dretske expressed the key concepts of information in terms of intentionality, truth, and transmissibility. We replace truth with inferability, and trace the consequences of this replacement. Second, logic: proof-theoretic semantics (P-tS) provides a mathematical-logical realization of inferentialist reasoning. Using P-tS, we develop the first steps towards a mathematical-logical theory of an inferentialist primitive unit of information, the 'inferon'. This proof-theoretic approach counterpoints the model-theoretic view of information articulated in situation theory. Furthermore, we argue that it facilitates addressing all three components of van Benthem and Martinez's categorization of the understandings of information, as range, as correlation, and as code. Our focus is on information-as-correlation. Third, systems: the P-tS tools we develop provide the basis for a mathematical account of distributed systems modelling -- a key tool from informatics for understanding the organization of information processing systems. This yields a reasoning-based theory of information flow in models of distributed systems. Overall, we seek to give a conceptually rigorous mathematical-logical account of information and its role within informatics, grounded in inference and reasoning.

2605.14084 2026-06-11 cs.SE cs.AI cs.CL 版本更新

CRANE: Constrained Reasoning Injection for Code Agents via Nullspace Editing

CRANE：通过零空间编辑实现代码代理的约束推理注入

Mingzhi Zhu, Michele Merler, Raju Pavuluri, Stacy Patterson

Affiliations: Rensselaer Polytechnic Institute; IBM Research

AI summary: CRANE uses nullspace editing to combine reasoning and tool-use capabilities, improving code agent performance with notable gains on several benchmarks.

AI中文摘要

代码代理必须同时对长周期的仓库状态进行推理并遵守严格的工具使用协议。在配对的Instruct/Thinking检查点中，这些能力互补但并不一致。Instruct模型简洁且严守工具使用规范，而Thinking模型提供更强的规划和恢复行为，但往往过度思考并降低代理性能。我们提出CRANE（通过零空间编辑实现代码代理的约束推理注入），一种无需训练的参数编辑方法，将Thinking-Instruct的delta视为Instruct骨干的候选推理编辑方向池。CRANE结合幅度阈值化来对delta去噪，用保守泰勒门保留同时有利于推理迁移和工具使用保持的编辑，并用渐进Sigmoid投影抑制对格式至关重要的更新方向。通过合并配对的Instruct和Thinking检查点，CRANE相对任一单独模型都取得显著提升，同时保持Instruct级别的效率：在Roo-Eval上，Qwen3-30B-A3B的pass1达到66.2%（+19.5%），Qwen3-Next-80B-A3B达到81.5%（+8.7%）；在SWE-bench-Verified上，它在两个规模上最多多解决14个实例（122/500和180/500）；在Terminal-Bench v2上，它将pass1/pass5最多提高2.3%/7.8%，分别达到7.6%/17.9%和14.8%/30.3%，在全部三个基准上一致优于其他合并策略。

英文摘要

Code agents must both reason over long-horizon repository state and obey strict tool-use protocols. In paired Instruct/Thinking checkpoints, these capabilities are complementary but misaligned. The Instruct model is concise and tool-disciplined, whereas the Thinking model offers stronger planning and recovery behavior but often over-deliberates and degrades agent performance. We present CRANE (Constrained Reasoning Injection for Code Agents via Nullspace Editing), a training-free parameter-editing method that treats the Thinking-Instruct delta as a directional pool of candidate reasoning edits for the Instruct backbone. CRANE combines magnitude thresholding to denoise the delta, a Conservative Taylor Gate to retain edits that are jointly beneficial for reasoning transfer and tool-use preservation, and Graduated Sigmoidal Projection to suppress format-critical update directions. By merging paired Instruct and Thinking checkpoints, CRANE delivers strong gains over either individual model while preserving Instruct-level efficiency: on Roo-Eval it achieves pass1 of 66.2% (+19.5%) for Qwen3-30B-A3B and 81.5% (+8.7%) for Qwen3-Next-80B-A3B; on SWE-bench-Verified it resolves up to 14 additional instances at both scales (122/500 and 180/500); and on Terminal-Bench v2 it improves pass1/pass5 by up to 2.3%/7.8%, reaching 7.6%/17.9% and 14.8%/30.3%, respectively, consistently outperforming alternative merging strategies across all three benchmarks.
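The first CRANE step, magnitude thresholding of the Thinking-Instruct delta, can be sketched on flat parameter lists. The top-fraction rule used here is an assumed stand-in, since the abstract does not give the thresholding criterion. The Conservative Taylor Gate and Graduated Sigmoidal Projection are not modelled, and the function names are hypothetical.

```python
def threshold_delta(instruct, thinking, keep_fraction):
    """Zero out all but the largest-magnitude entries of (thinking - instruct)."""
    delta = [t - i for t, i in zip(thinking, instruct)]
    keep = max(1, round(keep_fraction * len(delta)))
    cutoff = sorted((abs(d) for d in delta), reverse=True)[keep - 1]
    # Ties at the cutoff are all kept, so slightly more than `keep` entries may survive.
    return [d if abs(d) >= cutoff else 0.0 for d in delta]

def merge(instruct, sparse_delta, scale=1.0):
    """Apply the denoised delta on top of the Instruct parameters."""
    return [i + scale * d for i, d in zip(instruct, sparse_delta)]

instruct = [0.0, 0.0, 0.0, 0.0]
thinking = [0.5, -0.01, 0.02, -2.0]
sparse = threshold_delta(instruct, thinking, keep_fraction=0.5)
merged = merge(instruct, sparse)
```

Small entries of the delta are treated as noise and dropped, so only the two large changes reach the merged parameters.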

2605.10907 2026-06-11 cs.CR cs.AI 版本更新

Engineering Robustness into Personal Agents with the AI Workflow Store

通过AI工作流存储增强个人代理的鲁棒性

Roxana Geambasu, Mariana Raykova, Pierre Tholoniat, Trishita Tiwari, Lillian Tsai, Wen Zhang

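Affiliations: Columbia University and Google; Google

AI summary: Explores integrating rigorous software engineering processes into the agent loop to produce reliable, secure, and deterministically constrained agent workflows, improving performance in high-stakes settings.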

2605.06100 2026-06-11 eess.SP cs.AI cs.LG cs.RO 版本更新

CredibleDFGO: Differentiable Factor Graph Optimization with Credibility Supervision

可信DFGO：具有可信度监督的可微因子图优化

Liang Qian, Penggao Yan, Penghui Xu, Li-Ta Hsu

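Affiliations: Department of Aeronautical and Aviation Engineering

AI summary: To address unreliable GNSS covariance, proposes the CredibleDFGO framework, which pairs a differentiable Gauss-Newton solver with a weighting generation network and supervises the predictive distribution with proper scoring rules, improving covariance credibility and positioning accuracy.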

Comments Submitted to NAVIGATION: Journal of the Institute of Navigation

AI中文摘要

全球导航卫星系统（GNSS）定位广泛用于城市导航，但GNSS求解器报告的协方差在城市峡谷中通常不可靠。现有的可微因子图优化（DFGO）方法通过求解器学习测量加权，但仍仅使用位置目标。因此，位置估计可能改善，而报告的协方差仍然过小、过大或方向错误。我们提出CredibleDFGO（CDFGO），一种可微GNSS因子图框架，将协方差可信度作为显式训练目标。加权生成网络（WGN）预测每颗卫星的可靠性权重，可微高斯-牛顿求解器将这些权重映射到位置估计和基于Hessian的后验协方差。我们使用适当评分规则端到端监督东-北预测分布。我们研究了负对数似然（NLL）、能量分数（ES）及其组合。在三个UrbanNav测试场景上的结果表明，协方差可信度持续提升。定位精度在中度城市和严峻城市场景中也有所提高；在深度城市场景中，平均水平误差和第95百分位误差均有所改善。在严峻城市的旺角（MK）场景中，与DFGO（MAE）相比，CDFGO-Combined将平均水平误差从13.77米降至11.68米，将NLL从40.63降至6.59，将ES从12.31降至9.05。案例研究将MK改进归因于更好的轴向一致性、更可信的局部协方差椭圆以及卫星级重新加权。

英文摘要

Global navigation satellite system (GNSS) positioning is widely used for urban navigation, but the covariance reported by the GNSS solver is often unreliable in urban canyons. Existing differentiable factor graph optimization (DFGO) methods learn measurement weighting through the solver, but they still use position-only objectives. As a result, the position estimate may improve while the reported covariance remains too small, too large, or incorrectly oriented. We propose CredibleDFGO (CDFGO), a differentiable GNSS factor graph framework that makes covariance credibility an explicit training target. A Weighting Generation Network (WGN) predicts per-satellite reliability weights, and a differentiable Gauss-Newton solver maps these weights to a position estimate and a Hessian-derived posterior covariance. We use proper scoring rules to supervise the East-North predictive distribution end to end. We study negative log-likelihood (NLL), the energy score (ES), and their combination. Results on three UrbanNav test scenes show consistent gains in covariance credibility. Positioning accuracy also improves on the medium-urban and harsh-urban scenes; on the deep-urban scene, both the mean horizontal error and the 95th-percentile error improve. On the harsh-urban Mong Kok (MK) scene, CDFGO-Combined reduces the mean horizontal error from 13.77 m to 11.68 m, reduces NLL from 40.63 to 6.59, and reduces ES from 12.31 to 9.05 relative to DFGO (MAE). Case studies link the MK improvement to better axis-wise consistency, more credible local covariance ellipses, and satellite-level reweighting.
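The reason a proper scoring rule such as NLL penalizes a non-credible covariance can be seen in a small East-North example. This sketch evaluates the Gaussian negative log-likelihood of a fixed position error under two reported covariances. The numbers are illustrative and not taken from the paper.

```python
import math

def gaussian_nll_2d(error, cov):
    """NLL of a 2D position error (east, north) under a zero-mean Gaussian."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance must be positive definite")
    e, n = error
    # Squared Mahalanobis distance via the closed-form inverse of a 2x2 matrix.
    maha = (d * e * e - (b + c) * e * n + a * n * n) / det
    return 0.5 * maha + 0.5 * math.log(det) + math.log(2.0 * math.pi)

error = (3.0, 4.0)                          # metres east and north
calibrated = [[9.0, 0.0], [0.0, 16.0]]      # spread matches the error scale
overconfident = [[0.01, 0.0], [0.0, 0.01]]  # reported covariance far too small

nll_calibrated = gaussian_nll_2d(error, calibrated)
nll_overconfident = gaussian_nll_2d(error, overconfident)
```

The position error is identical in both cases, yet the overconfident covariance receives a far worse score. A position-only objective cannot see this difference, which is the gap the abstract describes.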
