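arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.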

2604.13213 2026-06-10 stat.ML cs.LG math.OC physics.chem-ph (version update)

Rare Event Analysis via Stochastic Optimal Control

Yuanqi Du, Jiajun He, Dinghuai Zhang, Eric Vanden-Eijnden, Carles Domingo-Enrich

Affiliations: Microsoft Research New England; Cornell University; University of Cambridge; Courant Institute of Mathematical Sciences, NYU

AI summary: Proposes recasting committor estimation in rare event analysis as a stochastic optimal control problem, using feedback control to steer trajectory sampling; develops two loss functions and a method for handling metastability, and obtains more accurate results on benchmark systems.

Abstract

Rare events such as conformational changes in biomolecules, phase transitions, and chemical reactions are central to the behavior of many physical systems, yet they are extremely difficult to study computationally because unbiased simulations seldom produce them. Transition Path Theory (TPT) provides a rigorous statistical framework for analyzing such events: it characterizes the ensemble of reactive trajectories between two designated metastable states (reactant and product), and its central object--the committor function, which gives the probability that the system will next reach the product rather than the reactant--encodes all essential kinetic and thermodynamic information. We introduce a framework that casts committor estimation as a stochastic optimal control (SOC) problem. In this formulation the committor defines a feedback control--proportional to the gradient of its logarithm--that actively steers trajectories toward the reactive region, thereby enabling efficient sampling of reactive paths. To solve the resulting hitting-time control problem we develop two complementary objectives: a direct backpropagation loss and a principled off-policy Value Matching loss, for which we establish first-order optimality guarantees. We further address metastability, which can trap controlled trajectories in intermediate basins, by introducing an alternative sampling process that preserves the reactive current while lowering effective energy barriers. On benchmark systems, the framework yields markedly more accurate committor estimates, reaction rates, and equilibrium constants than existing methods.
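
For orientation, the relationship between the committor and the feedback control can be written out in the simplest standard setting. The block below is a sketch under an assumption not stated in the abstract, namely overdamped Langevin dynamics with potential $V$ and inverse temperature $\beta$. The symbols $V$, $\beta$, $q$, $A$ (reactant), $B$ (product), and $\tau_A$, $\tau_B$ (first hitting times) are notation chosen here, and the paper's own formulation may differ.

```latex
\begin{align*}
% Assumed uncontrolled dynamics (overdamped Langevin)
dX_t &= -\nabla V(X_t)\,dt + \sqrt{2\beta^{-1}}\,dW_t, \\
% Committor: probability of reaching the product B before the reactant A
q(x) &= \mathbb{P}\left(\tau_B < \tau_A \mid X_0 = x\right), \\
% Boundary-value problem satisfied by the committor
0 &= -\nabla V \cdot \nabla q + \beta^{-1}\Delta q
  \quad \text{in } (A \cup B)^c, \qquad q|_{\partial A} = 0, \quad q|_{\partial B} = 1, \\
% Dynamics conditioned to reach B before A (Doob h-transform)
dX_t &= \left(-\nabla V(X_t) + 2\beta^{-1}\nabla \log q(X_t)\right)dt + \sqrt{2\beta^{-1}}\,dW_t .
\end{align*}
```

In this setting the extra drift $2\beta^{-1}\nabla \log q$ is proportional to the gradient of the logarithm of the committor, which matches the description of the feedback control in the abstract.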

2604.10760 2026-06-10 cs.MA cs.AI (version update)

Prosociality by Coupling, Not Mere Observation: Homeostatic Sharing in an Inspectable Recurrent Artificial Life Agent

Aishik Sanyal

Affiliations: Independent Research Engineer

AI summary: Studies prosocial behavior in artificial agents achieved through homeostatic coupling rather than direct reward or observation; finds that coupling makes the agent actively help its partner, whereas mere observation has no effect.

Comments: Accepted at ALIFE 2026 Conference, Waterloo Institute for Complexity & Innovation

Abstract

Artificial agents can be made to "help" through explicit social rewards, hard-coded prosocial bonuses, or direct access to another agent's state. I isolate a narrower route: homeostatic coupling. Building on ReCoN-Ipsundrum, I add a scalar homeostat and a social coupling channel while keeping action selection self-directed: the planner scores only the actor's predicted internal state, with no partner-welfare reward. In a one-step FoodShareToy, an exact solver finds a switch from EAT to PASS at $\lambda^\star \approx 0.91$ for the default state. In a multi-step SocialCorridorWorld, partner-state access without coupling leaves behavior unchanged, whereas coupled agents fetch, carry, and pass food to the partner. Sham lesions preserve helping; coupling-off and shuffled-partner lesions abolish it. A coupling/load sweep shows that coupling creates a low-load helping regime but does not guarantee rescue under higher metabolic load. This is not a claim about empathy, altruism, consciousness, or moral status. It is a minimal ALife demonstration that, in this controller, partner-state access is behaviorally inert unless partner distress is routed into self-regulation.

2603.29730 2026-06-10 stat.ML cs.LG (version update)

mlr3mbo: Bayesian Optimization in R

Marc Becker, Lennart Schneider, Martin Binder, Lars Kotthoff, Bernd Bischl

Affiliations: Department of Statistics, LMU Munich; Munich Center for Machine Learning (MCML); University of St Andrews

AI summary: Introduces mlr3mbo, a modular Bayesian optimization toolbox for R that supports single- and multi-objective optimization, multi-point proposals, and parallelization; a coordinate descent search and benchmarks indicate performance on par with established optimizers.

Abstract

We present mlr3mbo, a modular toolbox for Bayesian optimization in R. mlr3mbo supports single- and multi-objective optimization, multi-point proposals, batch and asynchronous parallelization, and robust error handling. While it can be used for many standard Bayesian optimization variants in applied settings, researchers can also construct custom Bayesian optimization algorithms from its flexible building blocks. In addition to an introduction to the software, its design principles, and its building blocks, the paper presents two extensive empirical evaluations on the surrogate-based benchmark suite YAHPO Gym. To identify robust default configurations for both numeric and mixed-hierarchical optimization regimes, and to gain further insights into the respective impacts of individual settings, we run a coordinate descent search over the mlr3mbo configuration space and analyze its results. Furthermore, we benchmark mlr3mbo against a wide range of established optimizers, including HEBO, SMAC3, Ax, and Optuna, and find that it performs on par with state-of-the-art.

2603.04689 2026-06-10 cs.DS cs.CC cs.CG cs.CY cs.DB cs.LG (version update)

Generalizing Fair Top-$k$ Selection: An Integrative Approach

Guangya Cai

Affiliations: University of Minnesota, Twin Cities

AI summary: Studies the problem of finding a fair linear scoring function under multiple protected groups. The analysis shows that the problem may be computationally intractable even for two-dimensional datasets and small $k$, but that efficiency for small $k$ can be recovered when the number of protected groups is sufficiently small; the paper also introduces utility loss as an alternative disparity measure.

Abstract

Fair top-$k$ selection, which ensures appropriate proportional representation of members from minority or historically disadvantaged groups among the top-$k$ selected candidates, has drawn significant attention. We study the problem of finding a fair (linear) scoring function with multiple protected groups while also minimizing the disparity from a reference scoring function. This generalizes the prior setup, which was restricted to the single-group setting without disparity minimization. Previous studies imply that the number of protected groups may have a limited impact on the runtime efficiency. However, driven by the need for experimental exploration, we find that this implication overlooks a critical issue that may affect the fairness of the outcome. Once this issue is properly considered, our hardness analysis shows that the problem may become computationally intractable even for a two-dimensional dataset and small values of $k$. However, our analysis also reveals a gap in the hardness barrier, enabling us to recover the efficiency for the case of small $k$ when the number of protected groups is sufficiently small. Furthermore, beyond measuring disparity as the "distance" between the fair and the reference scoring functions, we introduce an alternative disparity measure, utility loss, that may yield a more stable scoring function under small weight perturbations. Through careful engineering trade-offs that balance implementation complexity, robustness, and performance, our augmented two-pronged solution demonstrates strong empirical performance on real-world datasets, with experimental observations also informing algorithm design and implementation decisions.
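
To make the setting concrete, the sketch below scores candidates with a linear function, takes the top-$k$, and checks whether every protected group reaches a minimum count. The candidates, group labels, weight vectors, and minimum counts are all made up for illustration, and the helper names `top_k` and `is_fair` are hypothetical. It illustrates only the problem statement (fairness of a linear scoring function and its distance from a reference), not the paper's algorithm.

```python
import math

def top_k(candidates, weights, k):
    """Return the k candidates with the highest linear score w . x."""
    def score(c):
        return sum(w * x for w, x in zip(weights, c["features"]))
    return sorted(candidates, key=score, reverse=True)[:k]

def is_fair(selected, min_counts):
    """Check that each protected group reaches its minimum count."""
    return all(
        sum(c["group"] == g for c in selected) >= m
        for g, m in min_counts.items()
    )

# Hypothetical two-dimensional dataset with protected groups B and C.
candidates = [
    {"id": 1, "group": "A", "features": (0.90, 0.20)},
    {"id": 2, "group": "A", "features": (0.80, 0.20)},
    {"id": 3, "group": "B", "features": (0.40, 0.90)},
    {"id": 4, "group": "C", "features": (0.35, 0.85)},
    {"id": 5, "group": "A", "features": (0.70, 0.10)},
]
min_counts = {"B": 1, "C": 1}   # assumed fairness requirement
reference = (1.0, 0.0)          # assumed reference scoring weights
adjusted = (0.5, 0.5)           # one candidate fair weight vector

ref_top = top_k(candidates, reference, k=3)
adj_top = top_k(candidates, adjusted, k=3)
disparity = math.dist(reference, adjusted)  # distance between weight vectors

print([c["id"] for c in ref_top], is_fair(ref_top, min_counts))
print([c["id"] for c in adj_top], is_fair(adj_top, min_counts))
print(round(disparity, 3))
```

In this toy instance the reference weights select only group A, while the adjusted weights satisfy both minimum counts at the price of a nonzero distance from the reference. The optimization problem described in the abstract is to find such a fair weight vector with the smallest disparity.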

2603.23183 2026-06-10 cs.IR cs.AI (version update)

Reasoning over Semantic IDs Enhances Generative Recommendation

Yingzhi He, Yan Sun, Junfei Tan, Yuxin Chen, Xiaoyu Kong, Chunxu Shen, Xiang Wang, An Zhang, Tat-Seng Chua

Affiliations: National University of Singapore; University of Science and Technology of China; Tencent Inc.

AI summary: Proposes SIDReasoner, a two-stage framework that strengthens the alignment between Semantic IDs and language and applies outcome-driven reinforced optimization, enabling effective reasoning without large amounts of reasoning annotations and improving the accuracy, interpretability, and cross-domain generalization of generative recommendation.

Comments: Accepted by KDD 2026

Abstract

Recent advances in generative recommendation have leveraged pretrained LLMs by formulating sequential recommendation as autoregressive generation over a unified token space comprising language tokens and itemic identifiers, where each item is represented by a compact sequence of discrete tokens, namely Semantic IDs (SIDs). This SID-based formulation enables efficient decoding over large-scale item corpora and provides a natural interface for LLM-based recommenders to leverage rich world knowledge. Meanwhile, breakthroughs in LLM reasoning motivate reasoning-enhanced recommendation, yet effective reasoning over SIDs remains underexplored and challenging. Itemic tokens are not natively meaningful to LLMs; moreover, recommendation-oriented SID reasoning is hard to evaluate, making high-quality supervision scarce. To address these challenges, we propose SIDReasoner, a two-stage framework that elicits reasoning over SIDs by strengthening SID--language alignment to unlock transferable LLM reasoning, rather than relying on large amounts of recommendation-specific reasoning traces. Concretely, SIDReasoner first enhances SID-language alignment via multi-task training on an enriched SID-centered corpus synthesized by a stronger teacher model, grounding itemic tokens in diverse semantic and behavioral contexts. Building on this enhanced alignment, SIDReasoner further improves recommendation reasoning through outcome-driven reinforced optimization, which guides the model toward effective reasoning trajectories without requiring explicit reasoning annotations. Extensive experiments on three real-world datasets demonstrate the effectiveness of our reasoning-augmented SID-based generative recommendation. Beyond accuracy, the results highlight the broader potential of large reasoning models for generative recommendation, including improved interpretability and cross-domain generalization.

2507.01062 2026-06-10 cs.CY cs.AI (version update)

Quantifying Perception-Based Student Success with Generative AI: An Exploratory Monte Carlo Simulation

Seyma Yaman Kayadibi

Affiliations: arXiv

AI summary: Develops an exploratory Monte Carlo simulation framework that combines a structured literature search with probabilistic modelling to quantify students' perceived success when using generative AI tools, and shows how the weighting structure influences the composite score.

Comments: Published in Education Sciences. This article is an extended and substantially revised version of a conference paper presented at the Melbourne Institute of Technology ICETE Conference, Sydney, NSW, Australia, 9-10 February 2026. The earlier conference version is available at DOI 10.25397/ppny-f488

Journal ref: Education Sciences 2026, 16, 832

DOI: 10.3390/educsci16060832

Abstract

Generative artificial intelligence (GenAI) tools such as ChatGPT have attracted growing attention in higher education, particularly in relation to how students perceive their usefulness, usability, and educational value. This study develops an exploratory Monte Carlo simulation framework for quantifying perception-based student success in the context of GenAI use. A PRISMA-informed structured literature search in Scopus identified nineteen empirical studies published between 2023 and 2025, of which six reported item-level means and standard deviations suitable for probabilistic modelling. One coherent 10-item, 5-point Likert-scale usability-oriented instrument was selected as a canonical proof-of-concept dataset and used to parameterise an inverse-variance-weighted Monte Carlo simulation generating 10,000 synthetic observations. The results show that the weighting structure substantially influences the simulated outcome, with System Efficiency and Learning Burden receiving the largest inverse-variance weight and therefore the strongest influence on the composite score. The study offers a transparent, reproducible, and privacy-preserving proof-of-concept framework linking structured literature search, item-level summary statistics, and probabilistic modelling.
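
The idea of an inverse-variance-weighted Monte Carlo simulation can be sketched in a few lines of Python. Everything numeric below is hypothetical: the item means and standard deviations are invented, only four items are shown rather than ten, and drawing clipped normal values is an assumption about the sampling step that the abstract does not specify. The item names `system_efficiency` and `learning_burden` echo the abstract; the other two are placeholders.

```python
import random

# Hypothetical item-level summary statistics (mean, SD) on a 5-point scale.
items = {
    "system_efficiency": (4.1, 0.6),
    "learning_burden": (3.8, 0.7),
    "ease_of_use": (4.0, 1.0),
    "confidence": (3.5, 1.2),
}

# Inverse-variance weights, normalised so that they sum to 1.
raw = {name: 1.0 / sd**2 for name, (_, sd) in items.items()}
total = sum(raw.values())
weights = {name: w / total for name, w in raw.items()}

def simulate(n=10_000, seed=0):
    """Draw n synthetic respondents and return their weighted composite scores."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n):
        composite = 0.0
        for name, (mean, sd) in items.items():
            # Assumed sampling step: normal draw clipped to the 1..5 scale.
            draw = min(5.0, max(1.0, rng.gauss(mean, sd)))
            composite += weights[name] * draw
        scores.append(composite)
    return scores

scores = simulate()
print({name: round(w, 3) for name, w in weights.items()})
print(round(sum(scores) / len(scores), 3))
```

Items with the smallest standard deviation receive the largest weight, which is why, in the abstract, the items with the largest inverse-variance weights have the strongest influence on the composite score.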

2603.08924 2026-06-10 stat.AP cs.AI cs.IR (version update)

Quantifying Uncertainty in AI Visibility: A Statistical Framework for Generative Search Measurement

Ronald Sielinski

Affiliations: IQRush

AI summary: Addresses the stochasticity of visibility measurement in AI generative search: proposes treating citation metrics as sample estimators, uses repeated sampling and bootstrap confidence intervals to expose measurement noise, and gives sample-size recommendations.

Comments: 39 pages, 13 figures

Abstract

AI-powered answer engines are inherently non-deterministic: identical queries submitted at different times can produce different responses and cite different sources. Despite this stochastic behavior, current approaches to measuring domain visibility in generative search typically rely on single-run point estimates of citation share and prevalence, implicitly treating them as fixed values. This paper argues that citation visibility metrics should be treated as sample estimators of an underlying response distribution rather than fixed values. We conduct an empirical study of citation variability across three generative search platforms--Perplexity Search, OpenAI SearchGPT, and Google Gemini--using repeated sampling across three consumer product topics. Two sampling regimes are employed: daily collections over nine days and high-frequency sampling at ten-minute intervals. We show that citation distributions follow a power-law form and exhibit substantial variability across repeated samples. Bootstrap confidence intervals reveal that many apparent differences between domains fall within the noise floor of the measurement process. Distribution-wide rank stability analysis further demonstrates that citation rankings are unstable across samples, not only among top-ranked domains but throughout the frequently cited domain set. These findings demonstrate that single-run visibility metrics provide a misleadingly precise picture of domain performance in generative search. We argue that citation visibility must be reported with uncertainty estimates and provide practical guidance for sample sizes required to achieve interpretable confidence intervals.
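
A percentile bootstrap over repeated runs is one simple way to attach an interval to a citation metric. The sketch below is an illustration under stated assumptions: the nine runs and the domain names are invented, prevalence is taken to mean the fraction of runs that cite a domain, and the percentile method is one common bootstrap variant that may differ from the paper's procedure. The function names `prevalence` and `bootstrap_ci` are hypothetical.

```python
import random

def prevalence(runs, domain):
    """Fraction of runs whose set of cited domains contains `domain`."""
    return sum(domain in cited for cited in runs) / len(runs)

def bootstrap_ci(runs, domain, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the prevalence of `domain`."""
    rng = random.Random(seed)
    stats = sorted(
        # Resample whole runs with replacement, keeping the sample size fixed.
        prevalence([rng.choice(runs) for _ in runs], domain)
        for _ in range(n_boot)
    )
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Hypothetical data: nine repeated runs of one query, each a set of cited domains.
runs = [
    {"a.com", "b.com"}, {"a.com"}, {"b.com", "c.com"},
    {"a.com", "c.com"}, {"a.com", "b.com"}, {"c.com"},
    {"a.com"}, {"b.com"}, {"a.com", "b.com", "c.com"},
]
point = prevalence(runs, "a.com")
lo, hi = bootstrap_ci(runs, "a.com")
print(f"prevalence={point:.2f}, 95% interval=({lo:.2f}, {hi:.2f})")
```

With only nine runs the interval is wide, which is the practical point of the abstract: a single-run point estimate hides this spread.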

2603.02673 2026-06-10 stat.ML cs.LG (version update)

Exact Functional ANOVA Decomposition for Categorical Inputs Models

Baptiste Ferrere, Nicolas Bousquet, Fabrice Gamboa, Jean-Michel Loubes, Joseph Muré

Affiliations: Institut de Mathématiques de Toulouse; Université de Toulouse; CNRS

AI summary: For models with categorical inputs, proposes an assumption-free, closed-form functional ANOVA decomposition that efficiently handles arbitrary dependence structures and naturally generalizes SHAP values.

Abstract

Functional ANOVA offers a principled framework for interpretability by decomposing a model's prediction into main effects and higher-order interactions. For independent features, this decomposition is well-defined, strongly linked with SHAP values, and serves as a cornerstone of additive explainability. However, the lack of an explicit closed-form expression for general dependent distributions has forced practitioners to rely on costly sampling-based approximations. We completely resolve this limitation for categorical inputs. By bridging functional analysis with the extension of discrete Fourier analysis, we derive a closed-form decomposition without any assumption. Our formulation is computationally very efficient. It seamlessly recovers the classical independent case and extends to arbitrary dependence structures, including distributions with non-rectangular support. Furthermore, leveraging the intrinsic link between SHAP and ANOVA under independence, our framework yields a natural generalization of SHAP values for the general categorical setting.
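
As background for the classical case that the abstract says is recovered, the sketch below computes the functional ANOVA decomposition of a function of two independent categorical inputs. It covers only the textbook independent setting, not the paper's closed form for dependent inputs; the levels, probabilities, and function values are invented for illustration.

```python
from itertools import product

# Hypothetical independent categorical inputs and a model f defined on them.
levels1, levels2 = ["a", "b"], ["x", "y", "z"]
p1 = {"a": 0.3, "b": 0.7}
p2 = {"x": 0.2, "y": 0.5, "z": 0.3}
f = {
    ("a", "x"): 1.0, ("a", "y"): 2.0, ("a", "z"): 0.0,
    ("b", "x"): 3.0, ("b", "y"): 1.0, ("b", "z"): 4.0,
}

# Constant term: the overall mean of f under the product distribution.
f0 = sum(p1[u] * p2[v] * f[u, v] for u, v in product(levels1, levels2))

# Main effects: conditional means minus the overall mean.
f1 = {u: sum(p2[v] * f[u, v] for v in levels2) - f0 for u in levels1}
f2 = {v: sum(p1[u] * f[u, v] for u in levels1) - f0 for v in levels2}

# Interaction: what remains after removing the mean and both main effects.
f12 = {
    (u, v): f[u, v] - f0 - f1[u] - f2[v]
    for u, v in product(levels1, levels2)
}

print(round(f0, 4), f1, f2)
```

By construction the four terms add back to `f`, each main effect has zero mean, and the interaction averages to zero over either input. These orthogonality properties are what make the decomposition unique under independence.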

2602.22352 2026-06-10 cs.AR cs.AI (version update)

GRAU: Generic Reconfigurable Activation Unit Design for Neural Network Hardware Accelerators

Yuhao Liu, Salim Ullah, Akash Kumar

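Affiliations: Ruhr University Bochum, Germany; Dresden University of Technology, Germany; Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI Dresden/Leipzig), Germany

AI summary: Proposes GRAU, a reconfigurable activation hardware unit based on piecewise-linear fitting that approximates slopes with powers of two and needs only comparators and a 1-bit shifter. It supports mixed-precision quantization and nonlinear functions such as SiLU, reduces LUT consumption by more than 90% compared with multi-threshold activators, and achieves the best trade-off at 6 to 8 segments.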
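
The core numerical idea in the summary, a piecewise-linear fit whose slopes are rounded to powers of two so that multiplication becomes a shift, can be modelled in software. The sketch below is only that: a floating-point model with equal-width segments, secant slopes, midpoint-matched biases, and general shift amounts. These choices, and the names `fit_pow2_segments` and `evaluate`, are assumptions made here and do not describe the actual GRAU hardware or its 1-bit shifter.

```python
import math

def silu(x):
    """Reference SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))

def fit_pow2_segments(f, lo, hi, n_segments):
    """Fit one line per equal-width segment, rounding each slope to +/- 2**k."""
    width = (hi - lo) / n_segments
    segments = []
    for i in range(n_segments):
        a, b = lo + i * width, lo + (i + 1) * width
        slope = (f(b) - f(a)) / (b - a)            # secant slope on [a, b]
        sign = (slope > 0) - (slope < 0)
        k = round(math.log2(abs(slope))) if slope else 0
        mid = 0.5 * (a + b)
        bias = f(mid) - sign * math.ldexp(mid, k)  # match f at the midpoint
        segments.append((b, sign, k, bias))
    return segments

def evaluate(segments, x):
    """Pick the segment by threshold comparison, then shift and add the bias."""
    for upper, sign, k, bias in segments:
        if x <= upper:
            break
    return sign * math.ldexp(x, k) + bias          # ldexp(x, k) == x * 2**k

segments = fit_pow2_segments(silu, -4.0, 4.0, 8)
grid = [-4.0 + 0.01 * i for i in range(801)]
max_err = max(abs(evaluate(segments, x) - silu(x)) for x in grid)
print(f"max abs error on [-4, 4]: {max_err:.3f}")
```

Rounding a slope to the nearest power of two can misstate it by up to a factor of about $\sqrt{2}$, so the error of such a model depends on segment placement and count. That is consistent with the summary reporting a trade-off that is best at 6 to 8 segments.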

2602.04935 2026-06-10 cs.SE cs.AI (version update)

ASA: Backbone-Training-Free Representation Engineering for Tool-Calling Agents

Youjin Wang, Run Zhou, Yingjie Ma, Rong Fu, Jiani Liang, Shuaishuai Cao, Min Huang, Tao Fang, Liangming Pan

Affiliations: Renmin University of China; University of Macau; Central South University; Jiangxi Normal University; Macau Millennium College; Peking University

AI summary: Targets the Lazy Agent problem of large language models in tool calling: proposes ASA, a training-free method that intervenes on activations at inference time, using a router-conditioned mixture of steering vectors and a probe-guided gate to markedly improve tool-use F1 and reduce the false positive rate.

Comments: The manuscript consists of 24 pages formatted in the ACL style. Youjin Wang, Run Zhou, and Yingjie Ma contributed equally to this work. Tao Fang and Liangming Pan are the co-corresponding authors

Abstract

Adapting LLM agents to domain-specific tool calling remains notably brittle under evolving interfaces. Prompt and schema engineering is easy to deploy but often fragile under distribution shift and strict parsers, while continual parameter-efficient fine-tuning improves reliability at the cost of training, maintenance, and potential forgetting. We identify a critical Lazy Agent failure mode where tool necessity is nearly perfectly decodable from mid-layer activations, yet the model remains conservative in entering tool mode, revealing a representation-behavior gap. We propose Activation Steering Adapter (ASA), a training-free, inference-time controller that performs a single-shot mid-layer intervention and targets tool domains via a router-conditioned mixture of steering vectors with a probe-guided signed gate to amplify true intent while suppressing spurious triggers. On MTU-Bench with Qwen2.5-1.5B, ASA improves strict tool-use F1 from 0.18 to 0.50 while reducing the false positive rate from 0.15 to 0.05, using only about 20KB of portable assets and no weight updates.

2510.06473 2026-06-10 physics.soc-ph cs.AI cs.SI (version update)

Deep Generative Model for Human Mobility Behavior

Ye Hong, Yatao Zhang, Konrad Schindler, Martin Raubal

Affiliations: Institute of Cartography and Geoinformation, ETH Zurich; Department of Human Geography, Lund University; Future Resilient Systems, Singapore-ETH Centre, ETH Zurich; Photogrammetry and Remote Sensing, ETH Zurich; Department of Geography, University College London

AI summary: Proposes MobilityGen, a diffusion-based generative framework that simulates multi-attribute activity-travel sequences, reproduces key patterns such as scaling laws and activity time allocation, and supports analyses of access to urban space and social exposure.

Abstract

Understanding and modeling human mobility is central to challenges in transport planning, sustainable urban design, and public health. Despite decades of effort, simulating individual mobility remains challenging because of its complex, context-dependent, and exploratory nature. Here, building on the activity-based view of daily mobility, we propose MobilityGen, a diffusion-based generative framework for simulating multi-attribute activity-travel sequences over days to weeks at large spatial scales. By linking behavioral attributes with environmental context, MobilityGen reproduces key patterns such as scaling laws for location visits, activity time allocation, and the coupled evolution of travel mode and destination choices. It reflects spatio-temporal variability and generates diverse and plausible mobility patterns consistent with the built environment. Beyond standard validation, MobilityGen enables analyses that have been difficult with earlier models, including how access to urban space varies across travel modes and how co-presence dynamics shape social exposure and segregation. Together, these results support an integrated, data-driven basis for fine-grained studies of human mobility behavior and its societal implications.

2601.20970 2026-06-10 math.OC cs.IT cs.LG math.IT (version update)

The hyper-scaled NLP bound for maximum-entropy remote sampling

Gabriel Ponte, Marcia Fampa, Jon Lee

Affiliations: University of Michigan; Universidade Federal do Rio de Janeiro

AI summary: For the maximum-entropy remote sampling problem, proposes a hyper-scaled NLP bound based on a convex relaxation; its complementary version dominates the earlier complementary NLP bound under sufficient conditions, and the formulation also applies to rank-deficient covariance matrices that satisfy a technical condition. Numerical experiments indicate that it advances the algorithmic state of the art.

Abstract

The maximum-entropy remote sampling problem (MERSP) is to select a subset of $s$ random variables from a set of $n$ random variables, so as to maximize the information concerning a set of target random variables that are not directly observable. We assume that the set of all of these random variables follows a joint Gaussian distribution, and that we have the covariance matrix available. Finally, we measure information using Shannon's differential entropy. The main approach for exact solution of moderate-sized instances of MERSP has been branch-and-bound (B&B), and so previous work concentrated on upper bounds. Prior to our work, there were two upper-bounding methods for MERSP: the so-called "complementary NLP bound" and the "spectral bound", both introduced 25 years ago. We are able now to establish domination results between these two upper bounds. Further, we propose a novel and effective "hyper-scaled NLP bound" (hNLP bound) based on a subtle convex relaxation. The "complementary" version of hNLP bound for MERSP generalizes the previous complementary NLP bound for MERSP. We provide theoretical guarantees, giving sufficient conditions under which the complementary hNLP bound strictly dominates the complementary NLP bound. In addition, the hNLP formulation allows us to derive upper bounds for rank-deficient covariance matrices when they satisfy a technical condition. This is in contrast to the previous NLP bound that worked with only positive definite covariance matrices (because it was wedded to a complementary formulation). Additionally, we describe procedures for calculating hyper-scaling parameters. Finally, for B&B, we provide a variable-fixing methodology and results guiding the best way to construct subproblems. Numerical experiments on benchmark instances demonstrate the effectiveness of our approaches in advancing the algorithmic state-of-the-art for MERSP.

2601.16700 2026-06-10 cs.SE cs.AI cs.ET cs.HC (version update)

Adoption of Generative Artificial Intelligence in the German Software Engineering Industry: An Empirical Study

Ludwig Felder, Tobias Eisenreich, Mahsa Fischer, Stefan Wagner, Chunyang Chen

Affiliations: Technical University of Munich; Heilbronn University of Applied Science

AI summary: Uses a mixed-methods study to examine how German software engineers adopt generative AI tools; finds that experience level moderates perceived benefits, organizational size affects tool selection and intensity of use, and limited awareness of the project context is the main barrier.

Comments: Accepted at FSE '26

DOI: 10.1145/3803437.3805207

Abstract

Generative artificial intelligence (GenAI) tools have seen rapid adoption among software developers. While adoption rates in the industry are rising, the underlying factors influencing the effective use of these tools, including the depth of interaction, organizational constraints, and experience-related considerations, have not been thoroughly investigated. This issue is particularly relevant in environments with stringent regulatory requirements, such as Germany, where practitioners must address the GDPR and the EU AI Act while balancing productivity gains with intellectual property considerations. Despite the significant impact of GenAI on software engineering, to the best of our knowledge, no empirical study has systematically examined the adoption dynamics of GenAI tools within the German context. To address this gap, we present a comprehensive mixed-methods study on GenAI adoption among German software engineers. Specifically, we conducted 18 exploratory interviews with practitioners, followed by a developer survey with 109 participants. We analyze patterns of tool adoption, prompting strategies, and organizational factors that influence effectiveness. Our results indicate that experience level moderates the perceived benefits of GenAI tools, and productivity gains are not evenly distributed among developers. Further, organizational size affects both tool selection and the intensity of tool use. Limited awareness of the project context is identified as the most significant barrier. We summarize a set of actionable implications for developers, organizations, and tool vendors seeking to advance artificial intelligence (AI) assisted software development.

2601.13994 2026-06-10 cs.DC cs.AI (version update)

torch-sla: Differentiable Sparse Linear Algebra with Adjoint Solvers and Sparse Tensor Parallelism for PyTorch

Mingyuan Chi, Shizheng Wen

Affiliations: PyTorch

AI summary: Presents torch-sla, a library that provides differentiable sparse linear algebra solvers through a unified autograd interface and O(1)-graph adjoint differentiation, with support for multiple backends and distributed multi-GPU execution.

Abstract

Differentiable sparse linear algebra is foundational for scientific machine learning, yet PyTorch lacks a unified library for it: torch.sparse provides only low-level kernels and a non-differentiable, CPU-only spsolve, and torch.linalg is dense-only. We present torch-sla, an open-source library that fills this gap. It exposes a single autograd-aware API for direct, iterative, nonlinear, and eigenvalue solvers across five interchangeable backends -- SciPy and Eigen on CPU, cuDSS, CuPy, and a PyTorch-native iterative solver on GPU -- with automatic dispatch by device and problem size. The library further supports batched solves over shared or distinct sparsity patterns and distributed multi-GPU execution via domain decomposition with halo exchange. These capabilities are made scalable by an O(1)-graph adjoint differentiation framework and an autograd-compatible distributed halo-exchange layer. The library is available at https://www.torchsla.com/.
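
The general adjoint idea behind differentiating through a linear solve can be shown without any library. The sketch below does not use torch-sla or PyTorch and says nothing about how torch-sla is implemented; it is a plain-Python illustration on an invented 2x2 system, with a Cramer's-rule helper `solve2` standing in for a real sparse solver. For $x = A^{-1} b$ and a scalar loss $L(x)$, one extra solve with $A^\top$ gives $\lambda = A^{-\top}\,\partial L/\partial x$, from which $\partial L/\partial b = \lambda$ and $\partial L/\partial A_{ij} = -\lambda_i x_j$.

```python
def solve2(A, b):
    """Solve a 2x2 system A x = b by Cramer's rule (stand-in for a sparse solver)."""
    (a, c), (d, e) = A
    det = a * e - c * d
    return [(b[0] * e - c * b[1]) / det, (a * b[1] - d * b[0]) / det]

def transpose(A):
    return [[A[0][0], A[1][0]], [A[0][1], A[1][1]]]

def loss(A, b):
    """Example loss L = 0.5 * ||x||^2 with x = A^{-1} b."""
    x = solve2(A, b)
    return 0.5 * (x[0] ** 2 + x[1] ** 2)

def adjoint_grads(A, b):
    """Gradients of the loss w.r.t. b and A using one extra (adjoint) solve."""
    x = solve2(A, b)                   # forward solve
    g = x                              # dL/dx for L = 0.5 * ||x||^2
    lam = solve2(transpose(A), g)      # adjoint solve: A^T lam = dL/dx
    grad_b = lam
    grad_A = [[-lam[i] * x[j] for j in range(2)] for i in range(2)]
    return grad_b, grad_A

A = [[4.0, 1.0], [2.0, 3.0]]           # hypothetical system matrix
b = [1.0, 2.0]                         # hypothetical right-hand side
grad_b, grad_A = adjoint_grads(A, b)

# Central finite-difference checks on one entry of b and one entry of A.
eps = 1e-6
fd_b0 = (loss(A, [b[0] + eps, b[1]]) - loss(A, [b[0] - eps, b[1]])) / (2 * eps)
A_plus = [[4.0, 1.0 + eps], [2.0, 3.0]]
A_minus = [[4.0, 1.0 - eps], [2.0, 3.0]]
fd_A01 = (loss(A_plus, b) - loss(A_minus, b)) / (2 * eps)
print(grad_b[0], fd_b0)
print(grad_A[0][1], fd_A01)
```

The backward pass costs one additional solve regardless of how many iterations the forward solver needed, which is the usual motivation for adjoint solvers over differentiating through solver iterations.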

2512.18531 2026-06-10 physics.chem-ph cs.LG (version update)

Pushing the limits of one-dimensional NMR spectroscopy for automated structure elucidation using artificial intelligence

Frank Hu, Jonathan M. Tubb, Dimitris Argyropoulos, Sergey Golotvin, Mikhail Elyashberg, Grant M. Rotskoff, Matthew W. Kanan, Thomas E. Markland

Affiliations: Department of Chemistry, Stanford University; ACD/Labs

AI summary: Proposes a transformer-based deep learning framework that, using only one-dimensional $^1$H and $^{13}$C NMR spectra, reaches 60.4% accuracy within the first 15 predictions for organic molecules with up to 40 non-hydrogen atoms, overcoming the combinatorial explosion of chemical space.

Abstract

One-dimensional NMR spectroscopy is one of the most widely used techniques for the characterization of organic compounds and natural products. For molecules with up to 36 non-hydrogen atoms, the number of possible structures has been estimated to range from $10^{20}$ to $10^{60}$. The task of determining the structure (formula and connectivity) of a molecule of this size using only its one-dimensional $^1$H and/or $^{13}$C NMR spectrum, i.e. de novo structure generation, thus appears completely intractable. Here we show how it is possible to achieve this task for systems with up to 40 non-hydrogen atoms across the full elemental coverage typically encountered in organic chemistry (C, N, O, H, P, S, Si, B, and the halogens) using a deep learning framework, thus covering a vast portion of the drug-like chemical space. Leveraging insights from natural language processing, we show that our transformer-based architecture predicts the correct molecule with 60.4% accuracy within the first 15 predictions using only the $^1$H and $^{13}$C NMR spectra, thus overcoming the combinatorial growth of the chemical space while also being extensible to experimental data via fine-tuning.


2511.22331 2026-06-10 math.OC cs.AI cs.LG (updated version)

On the Condition Number Dependency in Bilevel Optimization

关于双层优化中条件数依赖性的研究

Lesi Chen, Jingzhao Zhang

Affiliations: IIIS, Tsinghua University

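AI summary: For bilevel optimization with a nonconvex upper level and a strongly convex lower level, establishes lower bounds on the condition-number dependency, giving the first provable gap in that dependency between bilevel and minimax optimization.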

Comments This new version improves deterministic lower bounds in v1

AI Chinese abstract

双层优化最小化一个由上层问题定义的目标函数，其可行域是下层问题的解集。我们研究当上层问题非凸、下层问题强凸时，使用一阶方法寻找 $\epsilon$-稳定点的 oracle 复杂度。近期工作 (Ji et al., ICML 2021; Arbel and Mairal, ICLR 2022; Chen et al., JMLR 2025) 达到了 $\tilde{\mathcal{O}}(\bar \kappa_y^4 \epsilon^{-2})$ 的上界，在 $\epsilon$ 上接近最优，通过在内循环中朴素应用 Nesterov 加速可降至 $\tilde{\mathcal{O}}(\bar \kappa_y^{7/2} \epsilon^{-2})$，其中 $\bar \kappa_y$ 是全局条件数。然而，条件数的最优依赖性未知。本文建立了新的 $\Omega(\kappa_y^{5/2} \epsilon^{-2})$ 下界，其中 $\kappa_y < \bar \kappa_y$ 是下层条件数，当光滑常数为 $\mathcal{O}(1)$ 时与 $\bar \kappa_y$ 同阶。我们的下界首次证明了在此设定下双层问题与极小极大优化在条件数依赖性上的可证明差距。下界可推广到多种设置，包括高阶光滑函数、随机 oracle 和凸超目标：(1) 对于二阶和任意光滑问题，我们分别给出 $\Omega({\kappa_y^{31/14}} \epsilon^{-12/7})$ 和 $\Omega(\kappa_y^{21/10} \epsilon^{-8/5})$ 的下界。(2) 对于凸-强凸问题，我们将先前最佳下界 (Ji and Liang, JMLR 2022) 从 $\Omega(\kappa_y /\sqrt{\epsilon})$ 改进为 $\Omega(\kappa_y^{3/2} / \sqrt{\epsilon})$。(3) 对于光滑随机问题，我们也给出 $\Omega(\kappa_y^4 \epsilon^{-4})$ 的下界。

English abstract

Bilevel optimization minimizes an objective function, defined by an upper-level problem whose feasible region is the solution of a lower-level problem. We study the oracle complexity of finding an $\epsilon$-stationary point with first-order methods when the upper-level problem is nonconvex, and the lower-level problem is strongly convex. Recent works (Ji et al., ICML 2021; Arbel and Mairal, ICLR 2022; Chen et al., JMLR 2025) achieve a $\tilde{\mathcal{O}}(\bar \kappa_y^4 \epsilon^{-2})$ upper bound that is near-optimal in $\epsilon$, which can be reduced to $\tilde{\mathcal{O}}(\bar \kappa_y^{7/2} \epsilon^{-2})$ by a naive application of Nesterov acceleration in the inner loop, where $\bar \kappa_y$ is the global condition number. However, the optimal dependency on the condition number is unknown. In this work, we establish a new $\Omega(\kappa_y^{5/2} \epsilon^{-2})$ lower bound, where $\kappa_y < \bar \kappa_y$ is the lower-level condition number that is of the same order as $\bar \kappa_y$ when the smoothness constants are $\mathcal{O}(1)$. Our lower bound establishes the first provable gap in terms of condition number dependency between bilevel problems and minimax problems in this setup. Our lower bounds can be extended to various settings, including high-order smooth functions, stochastic oracles, and convex hyper-objectives: (1) For second-order and arbitrarily smooth problems, we show lower bounds of $\Omega({\kappa_y^{31/14}} \epsilon^{-12/7})$ and $\Omega(\kappa_y^{21/10} \epsilon^{-8/5})$, respectively. (2) For convex-strongly-convex problems, we improve the previously best lower bound (Ji and Liang, JMLR 2022) from $\Omega(\kappa_y /\sqrt{\epsilon})$ to $\Omega(\kappa_y^{3/2} / \sqrt{\epsilon})$. (3) For smooth stochastic problems, we also show a lower bound of $\Omega(\kappa_y^4 \epsilon^{-4})$.
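For orientation, the setting described above is usually written as follows. This is the standard textbook formulation, not notation taken from the abstract: the symbols $f$ (upper-level objective), $g$ (lower-level objective), $\varphi$ (hyper-objective), and the dimensions $d_x$, $d_y$ are assumptions introduced here.

```latex
% Bilevel problem: nonconvex in x, with g(x, .) strongly convex in y
\min_{x \in \mathbb{R}^{d_x}} \; \varphi(x) := f\bigl(x, y^*(x)\bigr),
\qquad
y^*(x) := \arg\min_{y \in \mathbb{R}^{d_y}} g(x, y).

% epsilon-stationarity of the hyper-objective
\|\nabla \varphi(x)\| \le \epsilon.

% Hypergradient via the implicit function theorem, evaluated at (x, y^*(x))
\nabla \varphi(x)
  = \nabla_x f - \nabla_{xy}^2 g \,\bigl[\nabla_{yy}^2 g\bigr]^{-1} \nabla_y f.
```

The inverse lower-level Hessian in the hypergradient is one way to see why the conditioning of $g$ in $y$ enters the oracle complexity at all.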


2511.19706 2026-06-10 eess.IV cs.CV (updated version)

Selective Disk Bispectrum: A Complete and Rotation Invariant Image Descriptor

选择性圆盘双谱：一种完备且旋转不变的图像描述符

Adele Myers Lantow, Nina Miolane

Affiliations: Department of Physics; Department of Electrical and Computer Engineering; University of California, Santa Barbara

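AI summary: Proposes the selective disk bispectrum (SDB), a complex-valued rotation-invariant vector that preserves all image information except orientation at reduced computational complexity, and validates its robustness on noisy classification and multi-reference alignment.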

AI Chinese abstract

旋转不变性是许多计算机视觉任务的基本要求。历史上，这种归纳偏置通过手工设计的旋转不变表示来编码。这些表示紧凑、可解释且计算快速，但以描述能力为代价。最近，架构通过学习表示来实现归纳偏置。这些表示高度描述性，实现了强大的经验性能，但以效率和可解释性为代价。在这项工作中，我们提出了两种范式交叉点上的替代方案。我们引入了选择性圆盘双谱（SDB），一种复值旋转不变向量，它保留了图像除方向外的所有信息。我们的关键理论贡献是选择性圆盘双谱、其逆变换、其（降低的）空间和计算复杂度（与完整圆盘双谱相比），以及其在噪声下的期望和方差。此外，我们提出了数值SDB近似，并为其准确性和旋转不变性提供了理论保证。在经验上，我们验证了SDB在噪声分类任务中的不变性和鲁棒性。我们在旋转图像的多参考对齐上测试了我们的重建算法。

English abstract

Rotation invariance is a fundamental requirement across many computer vision tasks. Historically, this inductive bias has been encoded through hand-crafted rotation-invariant representations. These are compact, interpretable, and fast to compute, but they come at the cost of descriptive power. More recently, architectures achieve inductive bias through learned representations. These are highly descriptive and achieve strong empirical performance, at the cost of efficiency and interpretability. In this work, we propose an alternative at the intersection of both paradigms. We introduce the selective disk bispectrum (SDB), a complex-valued rotation-invariant vector that preserves all information about the image except its orientation. Our key theoretical contributions are the selective disk bispectrum, its inversion, its (reduced) spatial and computational complexities (compared to the full disk bispectrum), and its expectation and variance under noise. Furthermore, we propose a numerical SDB approximation and provide theoretical guarantees for its accuracy and rotation invariance. Empirically, we validate SDB's invariance and robustness to noise on classification tasks. We test our reconstruction algorithm on multi-reference alignment of rotated images.
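The invariance idea behind a bispectrum can be illustrated in one dimension. The sketch below is not the SDB from the paper: it computes the classical full bispectrum of a 1D cyclic signal, where a cyclic shift plays the role that rotation plays for a disk image. The function name and test signal are assumptions made for illustration.

```python
import numpy as np


def cyclic_bispectrum(signal: np.ndarray) -> np.ndarray:
    """Full bispectrum B[k1, k2] = F[k1] * F[k2] * conj(F[(k1 + k2) mod N])."""
    f = np.fft.fft(signal)
    n = f.shape[0]
    k = np.arange(n)
    # Shift-induced phases on F[k1] and F[k2] cancel against conj(F[k1 + k2]).
    return f[:, None] * f[None, :] * np.conj(f[(k[:, None] + k[None, :]) % n])


rng = np.random.default_rng(0)
x = rng.normal(size=16)
shifted = np.roll(x, 5)  # cyclic shift: the 1D counterpart of rotating an image

# The bispectrum is unchanged by the shift, while the raw Fourier
# coefficients are not.
print(np.allclose(cyclic_bispectrum(x), cyclic_bispectrum(shifted)))
print(np.allclose(np.fft.fft(x), np.fft.fft(shifted)))
```

The full bispectrum stores N^2 coefficients for a length-N signal, which gives a sense of why a reduced, selective set of coefficients is attractive.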


2507.22017 2026-06-10 eess.IV cs.CV (updated version)

Cyst-X: A Multi-Center MRI Benchmark and Federated Learning Framework for Malignancy-Risk Stratification of Pancreatic Cystic Neoplasm

Cyst-X：用于胰腺囊性肿瘤恶性风险分层的多中心MRI基准与联邦学习框架

Hongyi Pan, Gorkem Durak, Elif Keles, Ziliang Hong, Deniz Seyithanoglu, Zheyuan Zhang, Alpay Medetalibeyoglu, Halil Ertugrul Aktas, Andrea Mia Bejar, Yavuz Taktak, Gulbiz Dagoglu Kartal, Mehmet Sukru Erturk, Timurhan Cebeci, Yury Velichko, Lili Zhao, Emil Agarunov, Federica Proietto Salanitri, Concetto Spampinato, Pallavi Tiwari, Ziyue Xu, Sachin Jambawalikar, Ivo G. Schoots, Marco J. Bruno, Chenchan Huang, Candice W. Bolan, Tamas Gonda, Frank H. Miller, Rajesh N. Keswani, Michael B. Wallace, Ulas Bagci

Affiliations: Machine & Hybrid Intelligence Lab, Department of Radiology, Northwestern University; Istanbul Faculty of Medicine, Istanbul University; Department of Biomedical Engineering and Radiology, University of Wisconsin-Madison; Department of Preventive Medicine, Northwestern University; Division of Gastroenterology and Hepatology, New York University; Department of Electrical, Electronic and Computer Engineering, University of Catania; NVIDIA; Department of Radiology, Columbia University; Department of Radiology and Nuclear Medicine, Erasmus Medical Center; Department of Gastroenterology and Hepatology, Erasmus Medical Center; Department of Radiology, New York University; Division of Gastroenterology and Hepatology, Mayo Clinic Florida; Department of Gastroenterology and Hepatology, Northwestern University

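AI summary: Proposes Cyst-X, a multi-center MRI benchmark and federated learning framework for IPMN malignancy-risk stratification that couples the PanSegNet segmenter with a 3D DenseNet-121 classifier, reaching an AUC of 0.85 on internal cross-validation with performance comparable to radiologists.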

AI Chinese abstract

预计到2030年，胰腺癌将成为第二大致命癌症，因此早期检测至关重要。导管内乳头状黏液性肿瘤（IPMN）是关键的癌前病变，目前指南在恶性风险分层方面存在困难，导致不必要的手术或漏诊。在此，我们介绍Cyst-X，一个用于IPMN恶性风险分层的多中心MRI基准和联邦学习框架。该数据集包含来自七个国际中心764名患者的1,461次腹部MRI扫描，具有基于组织病理学或三年影像随访的三级恶性标签和专家胰腺分割。该流程将PanSegNet胰腺分割器与3D DenseNet-121分类器以及并行放射组学预测器相结合。在内部交叉验证中，深度学习分类器在T2加权MRI上对高风险与低风险或无风险鉴别达到了平均受试者工作特征曲线下面积（AUC）0.85（95%置信区间0.84-0.86），平均精确度从患病率基线0.23提高到0.64。当训练分布在多个机构之间且不交换原始患者图像时，该性能得以保持（AUC 0.85，FedProx）。在仅基于影像条件下评估的629例读者子集上，与三位盲法放射科医生相比，该分类器在特异性相当的情况下达到或超过了敏感性。为了加速早期胰腺癌检测研究，我们公开发布Cyst-X数据集、分割掩膜和训练模型，作为首个用于胰腺囊性肿瘤分析的大规模多中心MRI资源。

English abstract

Pancreatic cancer is projected to be the second-deadliest cancer by 2030, making early detection critical. Intraductal papillary mucinous neoplasms (IPMNs), key cancer precursors, present a clinical dilemma, as current guidelines struggle to stratify malignancy risk, leading to unnecessary surgeries or missed diagnoses. Here, we introduce Cyst-X, a multi-center MRI benchmark and a federated learning framework for IPMN malignancy-risk stratification. The dataset comprises 1,461 abdominal MRI scans from 764 patients at seven international centers, with three-tier malignancy labels anchored in histopathology or three-year imaging follow-up and expert pancreas segmentations. The pipeline couples the PanSegNet pancreas segmenter with a 3D DenseNet-121 classifier and a parallel radiomics predictor. On internal cross-validation, the deep learning classifier reached a mean area under the receiver operating characteristic curve (AUC) of 0.85 (95% confidence interval 0.84-0.86) on T2-weighted MRI for high-risk versus low- or no-risk discrimination, with the average precision rising from a prevalence baseline of 0.23 to 0.64. This performance was preserved (AUC 0.85, FedProx) when training was distributed across institutions without exchange of raw patient images. Benchmarked against three blinded radiologists on a 629-case reader subset evaluated under imaging-only conditions, the classifier matched or exceeded sensitivity at comparable specificity. To accelerate research in early pancreatic cancer detection, we publicly release the Cyst-X dataset, segmentation masks, and trained models as the first large-scale, multi-centre MRI resource for pancreatic cystic neoplasm analysis.
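The abstract names FedProx as the federated training scheme. As a rough illustration of the general FedProx idea (not the Cyst-X training code), each client minimizes its local loss plus a proximal term $(\mu/2)\|w - w_{\text{global}}\|^2$ that keeps local weights near the current global model, and the server averages the results. The quadratic toy clients, the unweighted average, and all hyperparameter values below are assumptions.

```python
import numpy as np


def fedprox_local_step(w, w_global, grad_fn, lr=0.1, mu=0.01):
    """One gradient step on F_k(w) + (mu / 2) * ||w - w_global||^2."""
    return w - lr * (grad_fn(w) + mu * (w - w_global))


def federated_round(w_global, client_grad_fns, local_steps=20, lr=0.1, mu=0.01):
    """Clients start from the global weights, train locally, then the server averages."""
    client_weights = []
    for grad_fn in client_grad_fns:
        w = w_global.copy()
        for _ in range(local_steps):
            w = fedprox_local_step(w, w_global, grad_fn, lr, mu)
        client_weights.append(w)
    # Unweighted mean for simplicity; weighting by client sample count is common.
    return np.mean(client_weights, axis=0)


# Toy clients: client k minimizes 0.5 * ||w - c_k||^2, so its gradient is w - c_k.
# Only gradients are computed locally; the "data" c_k never leaves the client.
centers = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([2.0, 2.0])]
grad_fns = [lambda w, c=c: w - c for c in centers]

w = np.zeros(2)
for _ in range(50):
    w = federated_round(w, grad_fns)
print(w)
```

In this toy problem the global weights settle at the mean of the client optima; with real, heterogeneous data the proximal coefficient `mu` trades local fit against drift from the shared model.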


2510.08906 2026-06-10 stat.ML cs.LG physics.chem-ph (updated version)

Gradient-Guided Furthest Point Sampling for Robust Training Set Selection

梯度引导的最远点采样用于鲁棒训练集选择

Morris Trestman, Stefan Gugler, Felix A. Faber, O. A. von Lilienfeld

Affiliations: Berlin Institute for the Foundations of Learning and Data; Chemical Physics Theory Group, Department of Chemistry, University of Toronto, St. George Campus, Toronto, ON, Canada; Department of Materials Science and Engineering, University of Toronto, St. George Campus, Toronto, ON, Canada; Vector Institute for Artificial Intelligence, Toronto, ON, Canada; Department of Physics, University of Toronto, St. George Campus, Toronto, ON, Canada; Acceleration Consortium, University of Toronto, Toronto, ON, Canada

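AI summary: Proposes Gradient Guided Furthest Point Sampling (GGFPS), which uses molecular force norms to guide sampling of configurational space, improving data efficiency and model robustness over FPS and random sampling on the MD17 dataset.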

Comments 41 pages, 43 figures, 2 algorithms; journal article with supplementary information appended

Journal ref Machine Learning: Science and Technology 7, 035047 (2026)

DOI: 10.1088/2632-2153/ae68b8

AI Chinese abstract

训练集采样方法用于提高机器学习问题中与化学相关的模型性能并降低数据成本。我们引入了梯度引导最远点采样（GGFPS），这是最远点采样（FPS）的一个简单扩展，利用分子力范数指导分子构型空间的高效采样。针对一个玩具系统（Styblinski-Tang函数）以及来自MD17数据集的分子动力学轨迹，提供了数值证据。我们的数值结果表明，与FPS、均匀随机采样（URS）以及已有的监督式FPS风格选择器PCov-FPS和PCov-CUR相比，使用GGFPS时数据效率和模型鲁棒性更优。对MD17数据的分布分析表明，FPS系统性地欠采样平衡几何结构，导致松弛结构测试误差较大。GGFPS纠正了这一缺陷，并且（i）在二维Styblinski-Tang系统中，与FPS相比，在不牺牲预测精度的情况下，训练成本可降低两倍；（ii）系统性地降低了MD17中平衡以及应变结构的预测误差；（iii）在所有MD17构型空间中系统性地降低了预测误差方差。这些结果表明，梯度感知采样方法作为有效的训练集选择工具具有很大潜力，而简单使用FPS可能导致训练不平衡和预测结果不一致。

English abstract

Training set sampling methods are used to improve model performance and lower data costs in machine learning problems relevant to chemistry. We introduce Gradient Guided Furthest Point Sampling (GGFPS), a simple extension of Furthest Point Sampling (FPS) that leverages molecular force norms to guide efficient sampling of configurational spaces of molecules. Numerical evidence is presented for a toy system (the Styblinski-Tang function) as well as for molecular dynamics trajectories from the MD17 dataset. Our numerical results indicate superior data efficiency and model robustness when using GGFPS compared to FPS and uniform random sampling (URS), as well as established supervised FPS-style selectors, PCov-FPS and PCov-CUR. Distribution analysis of the MD17 data suggests that FPS systematically under-samples equilibrium geometries, resulting in large test errors for relaxed structures. GGFPS cures this artifact and (i) enables up to twofold reductions in training cost without sacrificing predictive accuracy compared to FPS in the 2-dimensional Styblinski-Tang system, (ii) systematically lowers prediction errors for equilibrium as well as strained structures in MD17, and (iii) systematically decreases prediction error variances across all of the MD17 configuration spaces. These results suggest that gradient-aware sampling methods hold great promise as effective training set selection tools, and that naive use of FPS may result in imbalanced training and inconsistent prediction outcomes.
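For readers unfamiliar with the baseline, the following is a minimal sketch of plain greedy Furthest Point Sampling, the method GGFPS extends. The abstract does not state how force norms enter the selection rule, so no gradient weighting is attempted here; the function name, Euclidean metric, and toy points are assumptions.

```python
import numpy as np


def furthest_point_sampling(points: np.ndarray, n_samples: int, start: int = 0) -> list:
    """Greedy FPS: repeatedly pick the point furthest from the selected set."""
    selected = [start]
    # Distance from every point to its nearest selected point so far.
    min_dist = np.linalg.norm(points - points[start], axis=1)
    for _ in range(n_samples - 1):
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        dist_to_new = np.linalg.norm(points - points[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
    return selected


points = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 0.0], [1.0, 1.0], [0.5, 0.5]])
print(furthest_point_sampling(points, 3))  # -> [0, 3, 2]
```

Because the rule only maximizes spread, the near-duplicate point at index 1 is picked last, which hints at how densely populated regions such as equilibrium geometries can end up under-sampled.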


2509.17251 2026-06-10 stat.ML cs.LG (updated version)

Risk Comparisons in Linear Regression: Implicit Regularization Dominates Explicit Regularization

线性回归中的风险比较：隐式正则化主导显式正则化

Jingfeng Wu, Peter L. Bartlett, Sham M. Kakade, Jason D. Lee, Bin Yu

Affiliations: University of California, Berkeley; Harvard University; Google DeepMind. Author note: alphabetical order

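AI summary: Provides instance-wise comparisons of the finite-sample risks of gradient descent, ridge regression, and stochastic gradient descent in linear regression, finding that gradient descent dominates ridge regression but is incomparable with stochastic gradient descent, being polynomially worse on some problems.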

Comments Accepted for presentation at the Conference on Learning Theory (COLT) 2026

AI Chinese abstract

现有理论表明，对于按容量和源条件分类的线性回归问题，梯度下降（GD）始终是极小化最优的，而岭回归和在线随机梯度下降（SGD）对于某些类别的问题则是多项式次优的。超越极小化理论，本文为任何良好设定的线性回归问题提供了这些算法有限样本风险的实例比较。我们的分析得出三个关键发现。首先，GD 优于岭回归：在可比较的正则化下，GD 的过剩风险始终在岭回归的一个常数因子内，但即使经过最优调整，岭回归也可能多项式地更差。其次，GD 与 SGD 不可比。虽然已知对于某些问题 GD 可以多项式地优于 SGD，但反之亦然：我们受良性过拟合理论启发构造了问题，其中最优停止的 GD 多项式地更差。最后，对于一类重要子问题——具有快速且连续衰减协方差谱的问题，GD 优于 SGD，这包括所有满足标准容量条件的问题。

English abstract

Existing theory suggests that for linear regression problems categorized by capacity and source conditions, gradient descent (GD) is always minimax optimal, while both ridge regression and online stochastic gradient descent (SGD) are polynomially suboptimal for certain categories of such problems. Moving beyond minimax theory, this work provides instance-wise comparisons of the finite-sample risks for these algorithms on any well-specified linear regression problem. Our analysis yields three key findings. First, GD dominates ridge regression: with comparable regularization, the excess risk of GD is always within a constant factor of that of ridge, but ridge can be polynomially worse even when tuned optimally. Second, GD is incomparable with SGD. While it is known that for certain problems GD can be polynomially better than SGD, the reverse is also true: we construct problems, inspired by benign overfitting theory, where optimally stopped GD is polynomially worse. Finally, GD dominates SGD for a significant subclass of problems -- those with fast and continuously decaying covariance spectra -- which includes all problems satisfying the standard capacity condition.
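To make the two kinds of regularization concrete, the sketch below contrasts early-stopped gradient descent (implicit) with ridge regression (explicit) on a synthetic problem. It is an illustration, not the paper's experiment: the pairing `lam = 1 / (lr * steps)` is a common heuristic correspondence and may differ from the paper's notion of "comparable regularization", and the data, sizes, and noise level are arbitrary assumptions.

```python
import numpy as np


def ridge(X, y, lam):
    """Closed-form minimizer of (1/2n)||Xw - y||^2 + (lam/2)||w||^2."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)


def gradient_descent(X, y, steps, lr):
    """Full-batch GD on (1/2n)||Xw - y||^2 from w = 0; stopping early regularizes."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        w -= lr * (X.T @ (X @ w - y)) / n
    return w


rng = np.random.default_rng(0)
n, d = 50, 20
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.5 * rng.normal(size=n)

# Step size 1 / (largest eigenvalue of the empirical covariance) keeps GD stable.
lr = 1.0 / np.linalg.eigvalsh(X.T @ X / n).max()
for steps in (5, 50, 500):
    w_gd = gradient_descent(X, y, steps, lr)
    w_ridge = ridge(X, y, lam=1.0 / (lr * steps))  # heuristic pairing
    print(steps, np.linalg.norm(w_gd - w_true), np.linalg.norm(w_ridge - w_true))
```

Both estimators approach ordinary least squares as regularization vanishes (many steps, or `lam` near zero); the instance-wise question in the abstract concerns how their risks compare before that limit.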


2506.03672 2026-06-10 stat.ML cs.LG math.OC (updated version)

Latent Guided Sampling for Combinatorial Optimization

面向组合优化的潜在引导采样

Sobihan Surendran, Adeline Fermanian, Sylvain Le Corff

Affiliations: Sorbonne Université and Université Paris Cité, CNRS, Laboratoire de Probabilités, Statistique et Modélisation, F-75005 Paris, France; LOPF, Califrais' Machine Learning Lab, Paris, France

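AI summary: Proposes LGS-Net, a latent space model, together with Latent Guided Sampling, an inference method combining Markov Chain Monte Carlo and stochastic approximation, achieving state-of-the-art performance among NCO baselines on routing tasks.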

Journal ref International Conference on Machine Learning, Jul 2026, Seoul, South Korea

AI Chinese abstract

组合优化问题在物流、制造和药物发现等领域广泛存在，但其NP-hard性质使其计算上具有挑战性。最近的神经组合优化（NCO）方法利用深度学习来学习构建解的策略，通过监督学习或强化学习进行训练。尽管有前景，但这些方法通常依赖于任务特定的增强，在分布外实例上表现不佳，并且缺乏鲁棒的推理机制。此外，现有的潜在空间模型要么需要标记数据，要么使用与实例无关的潜在分布。在这项工作中，我们提出了LGS-Net，一种新颖的以问题实例为条件的潜在空间模型，并引入了一种高效的推理方法——潜在引导采样（LGS），基于马尔可夫链蒙特卡洛和随机逼近。我们证明了我们方法的迭代形成一个时间非齐次马尔可夫链，并提供了严格的理论收敛保证。在基准路由任务上的实证结果表明，我们的方法在NCO基线中达到了最先进的性能。

English abstract

Combinatorial Optimization problems are widespread in domains such as logistics, manufacturing, and drug discovery, yet their NP-hard nature makes them computationally challenging. Recent Neural Combinatorial Optimization (NCO) methods leverage deep learning to learn policies for constructing solutions, trained via Supervised or Reinforcement Learning. While promising, these approaches often rely on task-specific augmentations, perform poorly on out-of-distribution instances, and lack robust inference mechanisms. Moreover, existing latent space models either require labeled data or use an instance-independent latent distribution. In this work, we propose LGS-Net, a novel latent space model that conditions on problem instances, and introduce an efficient inference method, Latent Guided Sampling (LGS), based on Markov Chain Monte Carlo and Stochastic Approximation. We show that the iterations of our method form a time-inhomogeneous Markov Chain and provide rigorous theoretical convergence guarantees. Empirical results on benchmark routing tasks show that our method achieves state-of-the-art performance among NCO baselines.


2503.20272 2026-06-10 stat.ML cs.LG (updated version)

An $(\epsilon,\delta)$-accurate level set estimation with a stopping criterion

一个具有停止准则的 $(\epsilon,\delta)$-精确水平集估计

Hideaki Ishibashi, Kota Matsui, Kentaro Kutsukake, Hideitsu Hino

Affiliations: Kyushu Institute of Technology; Nagoya University / RIKEN AIP; The Institute of Statistical Mathematics / RIKEN AIP

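AI summary: Proposes an acquisition strategy for level set estimation with a stopping criterion, proves that it attains $\epsilon$-accuracy at confidence level $1-\delta$, reduces unnecessary function evaluations, and validates it experimentally.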

AI Chinese abstract

水平集估计问题旨在识别候选点集内未知且评估代价高昂的函数值超过指定阈值的区域，为全面评估函数值提供了一种高效替代方案。传统方法通常采用序列优化策略来寻找 $\epsilon$-精确解，该解允许在阈值轮廓周围留有余量，但往往缺乏有效的停止准则，导致过度探索和效率低下。本文引入了一种带有停止准则的水平集估计获取策略，确保算法在进一步探索不太可能带来改进时停止，从而减少不必要的函数评估。我们从理论上证明，该方法在 $1-\delta$ 的置信水平下满足 $\epsilon$-精确度，弥补了现有方法的一个关键空白。此外，我们表明这还带来了对 F-score 等性能指标下限的保证。数值实验表明，所提出的获取函数在达到与现有方法相当的精确度的同时，确认了停止准则在充分探索后有效终止算法。

English abstract

The level set estimation problem seeks to identify regions within a set of candidate points where an unknown and costly to evaluate function's value exceeds a specified threshold, providing an efficient alternative to exhaustive evaluations of function values. Traditional methods often use sequential optimization strategies to find $\epsilon$-accurate solutions, which permit a margin around the threshold contour but frequently lack effective stopping criteria, leading to excessive exploration and inefficiencies. This paper introduces an acquisition strategy for level set estimation that incorporates a stopping criterion, ensuring the algorithm halts when further exploration is unlikely to yield improvements, thereby reducing unnecessary function evaluations. We theoretically prove that our method satisfies $\epsilon$-accuracy with a confidence level of $1 - \delta$, addressing a key gap in existing approaches. Furthermore, we show that this also leads to guarantees on the lower bounds of performance metrics such as F-score. Numerical experiments demonstrate that the proposed acquisition function achieves comparable precision to existing methods while confirming that the stopping criterion effectively terminates the algorithm once adequate exploration is completed.
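The notion of an $\epsilon$-accurate classification with a margin around the threshold can be sketched with a generic confidence-bound rule of the kind used in the level set estimation literature. This is not the acquisition function or stopping criterion proposed in the paper: the posterior mean and standard deviation are assumed to come from some surrogate model, and `beta`, `eps`, and the simple "nothing left undecided" stopping test are illustrative assumptions.

```python
import numpy as np


def classify_level_set(mean, std, threshold, beta=2.0, eps=0.1):
    """Label candidates +1 (above), -1 (below), or 0 (undecided) from confidence bounds."""
    lower = mean - beta * std
    upper = mean + beta * std
    labels = np.zeros(mean.shape, dtype=int)
    labels[lower > threshold - eps] = 1   # confidently above, up to the eps margin
    # Points whose whole interval sits inside the eps band satisfy both rules;
    # either label is acceptable there, and this second rule takes precedence.
    labels[upper < threshold + eps] = -1  # confidently below, up to the eps margin
    return labels


def should_stop(labels):
    """Naive stopping test: halt once no candidate is left undecided."""
    return bool(np.all(labels != 0))


mean = np.array([1.5, 0.2, 0.95, 1.0])
std = np.array([0.1, 0.1, 0.02, 0.3])
labels = classify_level_set(mean, std, threshold=1.0)
print(labels, should_stop(labels))
```

The last candidate has a wide interval straddling the threshold, so it stays undecided and the loop would continue sampling; shrinking its uncertainty is what further function evaluations buy.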


2407.20242 2026-06-10 cs.CY cs.AI cs.RO (updated version)

BadRobot: Jailbreaking Embodied LLM Agents in the Physical World

BadRobot: 在物理世界中越狱具身LLM智能体

Hangtao Zhang, Chenyu Zhu, Xianlong Wang, Ziqi Zhou, Changgan Yin, Minghui Li, Lulu Xue, Yichen Wang, Shengshan Hu, Aishan Liu, Peijin Guo, Leo Yu Zhang

Affiliations: Huazhong University of Science and Technology; Beihang University; Griffith University

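AI summary: Proposes the BadRobot attack paradigm, which exploits three vulnerabilities (manipulation of LLMs within robotic systems, misalignment between linguistic outputs and physical actions, and flaws in world knowledge) to make embodied LLMs perform harmful actions through voice interaction, and validates its effectiveness on a benchmark.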

Comments Accepted to ICLR 2025. Please cite the conference version. Project page: https://Embodied-LLMs-Safety.github.io

Journal ref International Conference on Learning Representations (ICLR) 2025

AI Chinese abstract

具身AI代表将AI集成到物理实体中的系统。大型语言模型（LLM）展现出强大的语言理解能力，通过促进复杂的任务规划，已被广泛用于具身AI。然而，一个关键的安全问题仍被忽视：这些具身LLM是否会实施有害行为？为此，我们引入了BadRobot，一种新颖的攻击范式，旨在通过典型的基于语音的用户-系统交互，使具身LLM违反安全和伦理约束。具体来说，我们利用了三个漏洞来实现这种攻击：(i) 机器人系统中LLM的操纵，(ii) 语言输出与物理动作之间的错位，以及(iii) 世界知识缺陷导致的意外危险行为。此外，我们构建了一个包含各种恶意物理动作查询的基准，以评估BadRobot的攻击性能。基于该基准，针对现有突出的具身LLM框架（例如Voxposer、Code as Policies和ProgPrompt）的大量实验证明了我们BadRobot的有效性。我们的代码可在以下网址获取：this https URL。

English abstract

Embodied AI represents systems where AI is integrated into physical entities. Large Language Models (LLMs), which exhibit powerful language understanding abilities, have been extensively employed in embodied AI by facilitating sophisticated task planning. However, a critical safety issue remains overlooked: could these embodied LLMs perpetrate harmful behaviors? In response, we introduce BadRobot, a novel attack paradigm aiming to make embodied LLMs violate safety and ethical constraints through typical voice-based user-system interactions. Specifically, three vulnerabilities are exploited to achieve this type of attack: (i) manipulation of LLMs within robotic systems, (ii) misalignment between linguistic outputs and physical actions, and (iii) unintentional hazardous behaviors caused by flaws in world knowledge. Furthermore, we construct a benchmark of various malicious physical action queries to evaluate BadRobot's attack performance. Based on this benchmark, extensive experiments against existing prominent embodied LLM frameworks (e.g., Voxposer, Code as Policies, and ProgPrompt) demonstrate the effectiveness of BadRobot. Our code is available at https://github.com/Rookie143/BadRobot.


2501.04339 2026-06-10 stat.ML cs.LG physics.app-ph (updated version)

Interpretable deep convolutional model for nonlinear multivariate time series in complex systems

可解释的深度卷积模型用于复杂系统中的非线性多元时间序列

Domjan Baric, Davor Horvatic

Affiliations: Department of Physics, Faculty of Science, University of Zagreb

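AI summary: Proposes the DCIts architecture, which factorizes into Focuser and Modeler components to learn locally interpretable interaction structure in nonlinear multivariate time series, recovering stable, signed, lag-resolved interaction patterns while keeping forecasting error competitive.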

Comments 40 pages, 13 figures

Journal ref Chaos 36, 063116 (2026)

DOI: 10.1063/5.0325209

AI Chinese abstract

我们介绍了深度卷积时间序列解释器（DCIts），这是一种用于非线性多元时间序列的深度学习架构，能够提供样本特定、局部可解释的底层交互结构描述。与标准的黑箱预测器不同，DCIts学习一个时间和滞后依赖的转移张量，该张量被显式分解为两个组件：Focuser通过稀疏掩码机制选择相关的源序列和时间滞后，Modeler为这些选定的交互分配符号系数。这种分解为每个预测实例产生局部滞后邻接结构和符号化的源-滞后贡献，从而能够直接检查有效连接；当高阶分支被激活时，同一框架产生阶数分辨的元素级多项式贡献。在架构上，DCIts使用多样化的卷积滤波器库来捕获时间和跨变量依赖关系，这些依赖关系通过瓶颈网络映射到转移张量。在具有已知交互结构的受控基准数据集上，我们证明DCIts在实现竞争性预测误差（相对于强可解释基线）的同时，恢复了稳定的、符号化的、滞后分辨的交互模式。因此，该框架优先考虑内在可解释性，将预测准确性作为忠实性约束而非唯一目标。

English abstract

We introduce the Deep Convolutional Interpreter for Time Series (DCIts), a deep-learning architecture for nonlinear multivariate time series that provides sample-specific, locally interpretable descriptions of the underlying interaction structure. Unlike standard black-box forecasters, DCIts learns a time- and lag-dependent transition tensor explicitly factorized into two components: a Focuser, which selects relevant source series and time lags via a sparse masking mechanism, and a Modeler, which assigns signed coefficients to these selected interactions. This decomposition yields a local lag-adjacency structure and signed source-lag contributions for every forecast instance, enabling direct inspection of effective connectivity; when higher-order branches are activated, the same framework yields order-resolved elementwise polynomial contributions. Architecturally, DCIts uses a diverse bank of convolutional filters to capture temporal and cross-variable dependencies, which are mapped through a bottleneck network to the transition tensor. On controlled benchmark datasets with a known interaction structure, we demonstrate that DCIts achieves competitive forecasting error relative to a strong interpretable baseline while recovering stable, signed, lag-resolved interaction patterns. The framework thus prioritizes intrinsic interpretability, using forecasting accuracy as a faithfulness constraint rather than the sole objective.


2412.16758 2026-06-10 physics.med-ph cs.CV (updated version)

Training Set Augmentation and Biology-Aware Harmonization Improve Radiomic Models for Lung Cancer Prediction in Indeterminate Nodules

训练集增强与生物学感知的谐波化改善不确定肺结节中肺癌预测的影像组学模型

Claire Huchthausen, Menglin Shi, Gabriel L. A. de Sousa, James Larner, Einsley Janowski, Jonathan Colen, Krishni Wijesooriya

Affiliations: Department of Radiation Oncology, University of Virginia School of Medicine; Department of Physics, University of Virginia; Department of Physics, Massachusetts Institute of Technology; Department of Biomedical Engineering, Northwestern University; Department of Radiation Oncology, University of Virginia; Old Dominion University

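AI summary: To address low malignancy rates in early-development pulmonary nodules and variable image acquisition, augments the training set with later-development nodules and corrects acquisition effects with biology-aware harmonization, improving radiomic model performance (ROC-AUC 0.74).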

Comments 22 pages, 5 figures, plus supplemental material; updated with the accepted version of the manuscript

AI Chinese abstract

基于CT影像组学的机器学习有潜力比标准方法更早预测肺结节（PNs）中的肺癌。早期发育PNs的低恶性率和可变的图像采集方式阻碍了用于诊断这些PNs的影像组学模型的开发。为应对这些挑战，我们利用后期发育的PNs扩充训练集，并对采集效应进行谐波化处理。我们研究了低于标准诊断灵敏度的早期发育良性及恶性PNs（n=106）。当仅使用早期发育PNs的ComBat谐波化影像组学特征训练时，分类器预测恶性程度的表现接近随机。随后，我们用后期发育的良性及恶性PNs（n=225）扩充训练集。我们评估了谐波化是否必须纳入影响新增训练数据中采集效应的生物学因素。为校正来自四种采集协议的变异性，我们比较了：1）生物学无感知谐波化，2）使用区分早期发育、后期发育良性、后期发育恶性数据集的协变量进行谐波化，3）分别对每个数据集进行谐波化。使用扩充训练集但采用生物学无感知谐波化的模型未能持续改进。使用协变量谐波化（ROC-AUC 0.74 [0.69-0.79]）或分别谐波化（ROC-AUC 0.71 [0.66-0.77]）的扩充训练数据获得了更高的测试ROC-AUC（Delong检验，p<=0.05）和PR-AUC（Wilcoxon检验，p<=0.05）。在一项原理验证方法学研究中，我们通过一个小型单中心数据集证明，结合来自后期发育良性及恶性PNs的影像组学特征需要生物学感知的谐波化。

English abstract

CT radiomics-based machine learning has potential to predict lung cancer in pulmonary nodules (PNs) earlier than standard-of-care methods. Low malignancy rates in early-development PNs and variable image acquisition hinder development of radiomic models for diagnosing these PNs. To address these challenges, we augmented training using later-development PNs and harmonized for acquisition effects. We examine early-development benign and malignant PNs (n=106) below the sensitivity of standard-of-care diagnosis. Classifiers predicting malignancy performed near chance when trained on ComBat-harmonized radiomic features from only early-development PNs. We then augmented training with later-development benign and malignant PNs (n=225). We evaluated whether harmonization must incorporate biology that impacts acquisition effects in added training data. To correct variability from four acquisition protocols, we compared: 1) biology-unaware harmonization, 2) harmonizing with a covariate distinguishing early-development, later-development benign, later-development malignant datasets, 3) harmonizing each dataset separately. Models trained using augmentation, but biology-unaware harmonization, failed to improve consistently. Augmented training data harmonized with a covariate (ROC-AUC 0.74 [0.69-0.79]) or separately (ROC-AUC 0.71 [0.66-0.77]) yielded higher test ROC-AUC (Delong, p<=0.05) and PR-AUC (Wilcoxon, p<=0.05). In a proof-of-principle methodological study, we demonstrate with a small single-center dataset that combining radiomic features from later-development benign and malignant PNs requires biology-aware harmonization.


2501.01481 2026-06-10 eess.IV cs.CV (updated version)

Unleashing Correlation and Continuity for Hyperspectral Reconstruction from RGB Images

释放相关性与连续性：从RGB图像进行高光谱重建

Fuxiang Feng, Runmin Cong, Shoushui Wei, Yipeng Zhang, Jun Li, Sam Kwong, Wei Zhang

Affiliations: School of Control Science and Engineering, Shandong University; Key Laboratory of Machine Intelligence and System Control, Ministry of Education; University of California, Los Angeles; Hubei Key Laboratory of Intelligent Geo-Information Processing, China University of Geosciences; Lingnan University

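AI summary: Proposes the Correlation and Continuity Network (CCNet), which combines local spectral correlation modeling (GrSCM), global spectral continuity modeling (NeSCM), and patch-wise adaptive fusion (PAF) for state-of-the-art reconstruction of hyperspectral images from RGB images.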

AI Chinese abstract

从RGB图像重建高光谱图像(HSI)可以以较低成本获得高空间分辨率的HSI，显示出巨大的应用潜力。本文揭示了光谱特征的局部相关性和全局连续性对于HSI重建任务至关重要。因此，我们充分探索了这些光谱间关系，并提出了相关性连续性网络(CCNet)用于从RGB图像重建HSI。针对局部光谱的相关性，我们引入了分组光谱相关性建模(GrSCM)模块，该模块在局部范围内高效建立光谱波段相似性。针对全局光谱的连续性，我们设计了邻域光谱连续性建模(NeSCM)模块，该模块利用记忆单元递归地建模全局层面的渐进变化特征。为了探索这两个模块的内在互补性，我们设计了分块自适应融合(PAF)模块，以分块自适应方式将全局连续性特征高效集成到光谱特征中。这些创新提升了重建HSI的质量。我们在光谱重建任务的主流数据集NTIRE2022和NTIRE2020上进行了全面的比较和消融实验。与当前先进的光谱重建算法相比，我们设计的算法达到了最先进(SOTA)性能。

English abstract

Reconstructing Hyperspectral Images (HSI) from RGB images can yield high spatial resolution HSI at a lower cost, demonstrating significant application potential. This paper reveals that local correlation and global continuity of the spectral characteristics are crucial for HSI reconstruction tasks. Therefore, we fully explore these inter-spectral relationships and propose a Correlation and Continuity Network (CCNet) for HSI reconstruction from RGB images. For the correlation of local spectrum, we introduce the Group-wise Spectral Correlation Modeling (GrSCM) module, which efficiently establishes spectral band similarity within a localized range. For the continuity of global spectrum, we design the Neighborhood-wise Spectral Continuity Modeling (NeSCM) module, which employs memory units to recursively model the progressive variation characteristics at the global level. In order to explore the inherent complementarity of these two modules, we design the Patch-wise Adaptive Fusion (PAF) module to efficiently integrate global continuity features into the spectral features in a patch-wise adaptive manner. These innovations enhance the quality of reconstructed HSI. We perform comprehensive comparison and ablation experiments on the mainstream datasets NTIRE2022 and NTIRE2020 for the spectral reconstruction task. Compared to the current advanced spectral reconstruction algorithms, our designed algorithm achieves State-Of-The-Art (SOTA) performance.


2605.14999 2026-06-10 cs.HC cs.AI cs.CY

Towards Gaze-Informed AI Disclosure Interfaces: Eye-Tracking Attentional and Cognitive Load While Reading AI-Assisted News

迈向基于注视的AI披露界面：阅读AI辅助新闻时的眼动追踪注意力与认知负荷

Pooja Prajod, Hannes Cools, Thomas Röggla, Pablo Cesar, Abdallah El Ali

Affiliations: Centrum Wiskunde & Informatica; University of Amsterdam; TU Delft; Utrecht University

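AI summary: Studies how AI-use disclosures affect readers' attentional and cognitive load, finding that one-line disclosures lead to higher fixation durations and saccade counts while detailed disclosures add no extra burden, and proposes gaze-informed adaptive disclosure design.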

AI Chinese abstract

随着生成式AI在新闻业中的深入整合，设计有效的人工智能使用披露以在不给读者造成不必要的负担的情况下提供信息是一个关键挑战。尽管先前研究主要关注信任和可信度，但披露对读者注意力和认知负荷的影响仍被忽视。为填补这一空白，我们进行了一项3×2×2混合因子研究，操纵AI使用披露细节水平（无、一行、详细）、新闻类型（政治、生活方式）和AI的角色（编辑、部分内容生成），通过NASA-TLX和眼动追踪测量负荷。我们的结果揭示了显著的注意力成本：一行披露导致更高的注视持续时间和眼跳次数，尤其是在AI编辑内容中。详细披露未增加额外负担。基于信息间隙理论，我们认为简短标签可能通过提示读者注意AI使用而引发更高的视觉审视，但未提供足够信息。NASA-TLX分数和瞳孔直径在各条件下无显著差异，表明AI使用披露无论细节水平如何均不造成认知负担。访谈见解为这些发现提供了背景，并揭示对详细或“按需详细”设计的强烈偏好。我们的发现为基于注视的自适应信息披露界面设计提供了指导，该界面可根据读者的注意力模式和新闻上下文动态调整透明度水平。

English abstract

As generative AI becomes increasingly integrated into journalism, designing effective AI-use disclosures that inform readers without imposing unnecessary burden is a key challenge. While prior research has primarily focused on trust and credibility, the impact of disclosures on readers' attentional and cognitive load remains underexplored. To address this gap, we conducted a $3\times2\times2$ mixed factorial study manipulating the level of AI-use disclosure detail (none, one-line, detailed), news type (politics, lifestyle), and role of AI (editing, partial content generation), measuring load via NASA-TLX and eye-tracking. Our results reveal a significant attentional cost: one-line disclosures resulted in significantly higher fixation durations and saccade counts, particularly for AI-edited content. Detailed disclosures did not impose additional burden. Drawing on Information-Gap Theory, we argue that brief labels may trigger increased visual scrutiny by alerting readers to AI use without providing enough information. NASA-TLX scores and pupil diameter showed no significant differences across conditions, suggesting that AI-use disclosures do not impose cognitive burden regardless of the detail level. Interview insights contextualize these findings and reveal a strong preference for detailed or "detail-on-demand" designs. Our findings inform the design of gaze-informed adaptive disclosure interfaces that dynamically adjust transparency levels based on readers' attentional patterns and news context.


2603.03339 2026-06-10 cs.CY cs.AR cs.CL cs.HC

Offline-First LLM Architecture for Adaptive Learning in Low-Connectivity Environments

Joseph Walusimbi, Ann Move Oguti, Joshua Benjamin Ssentongo, Keith Ainebyona

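Affiliations: University of Nairobi

AI summary: This paper proposes an offline-first LLM architecture for adaptive learning in low-connectivity environments. Using local inference and hardware-aware model selection, it provides curriculum-aligned explanations and structured academic support, and it adapts to the needs of learners at different educational stages.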

Comments 16 pages, 10 figures, 2 tables

Details

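AI abstract (translated from Chinese)

Artificial intelligence (AI) and large language models (LLMs) are transforming educational technology by enabling conversational tutoring, personalized explanations, and inquiry-based learning. However, most AI-based learning systems depend on continuous internet connectivity and cloud computing, which limits their use in bandwidth-constrained environments. This paper presents an offline-first LLM architecture for low-connectivity environments. The system runs all inference locally on quantized language models and uses hardware-aware model selection so that it can be deployed on low-specification CPU devices. With no dependence on cloud infrastructure, it provides curriculum-aligned explanations and structured academic support through natural-language interaction. To support learners at different educational stages, the system offers adaptive response levels that generate explanations of varying complexity: Simple English, Lower Secondary, Upper Secondary, and Technical. Explanations can thus be matched to student ability, improving the clarity and understanding of academic concepts. The system was deployed in selected secondary and tertiary institutions under limited connectivity and evaluated on technical performance, usability, perceived response quality, and educational impact. Results show stable operation on legacy hardware, acceptable response times, and positive user perceptions of its support for self-directed learning. These findings demonstrate that offline LLM deployment for AI-assisted education is feasible in low-connectivity environments.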

English abstract

Artificial intelligence (AI) and large language models (LLMs) are transforming educational technology by enabling conversational tutoring, personalized explanations, and inquiry-driven learning. However, most AI-based learning systems rely on continuous internet connectivity and cloud-based computation, limiting their use in bandwidth-constrained environments. This paper presents an offline-first large language model architecture designed for AI-assisted learning in low-connectivity settings. The system performs all inference locally using quantized language models and incorporates hardware-aware model selection to enable deployment on low-specification CPU-only devices. By removing dependence on cloud infrastructure, the system provides curriculum-aligned explanations and structured academic support through natural-language interaction. To support learners at different educational stages, the system includes adaptive response levels that generate explanations at varying levels of complexity: Simple English, Lower Secondary, Upper Secondary, and Technical. This allows explanations to be adjusted to student ability, improving clarity and understanding of academic concepts. The system was deployed in selected secondary and tertiary institutions under limited-connectivity conditions and evaluated across technical performance, usability, perceived response quality, and educational impact. Results show stable operation on legacy hardware, acceptable response times, and positive user perceptions regarding support for self-directed learning. These findings demonstrate the feasibility of offline large language model deployment for AI-assisted education in low-connectivity environments.
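The abstract does not spell out how hardware-aware model selection or the adaptive response levels work. As a minimal sketch, assuming selection is driven by available RAM and that each response level maps to a prompt instruction, the idea might look like the following Python. The tier names, RAM thresholds, and prompt wording are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    """A quantized model option (hypothetical names and thresholds)."""
    name: str
    min_ram_gb: float


# Ordered from largest to smallest so the first fit is the best fit.
TIERS = [
    ModelTier("large-q4", 16.0),
    ModelTier("medium-q4", 8.0),
    ModelTier("small-q4", 4.0),
]

# The four response levels named in the abstract; the instructions are assumed.
LEVELS = {
    "simple_english": "Explain in short sentences using everyday words.",
    "lower_secondary": "Explain for a lower secondary student.",
    "upper_secondary": "Explain for an upper secondary student.",
    "technical": "Explain with precise technical terminology.",
}


def select_model(ram_gb: float) -> ModelTier:
    """Return the largest tier whose RAM requirement fits the device."""
    for tier in TIERS:
        if ram_gb >= tier.min_ram_gb:
            return tier
    raise ValueError("device has too little memory for any configured tier")


def build_prompt(question: str, level: str) -> str:
    """Prefix the question with the instruction for the chosen level."""
    if level not in LEVELS:
        raise KeyError(f"unknown response level: {level}")
    return f"{LEVELS[level]}\n\nQuestion: {question}"
```

For example, `select_model(6.0)` picks the smallest tier here, and `build_prompt("What is osmosis?", "simple_english")` yields a prompt that a locally hosted model could answer. A real deployment would likely also weigh CPU features and measured latency.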

2510.17876 2026-06-10 physics.geo-ph cs.LG

Three-dimensional inversion of gravity data using implicit neural representations and scientific machine learning

Pankaj K Mishra, Sanni Laaksonen, Jochen Kamm, Anand Singh

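Affiliations: Geological Survey of Finland; Indian Institute of Technology Bombay

AI summary: This paper proposes a three-dimensional gravity inversion method based on implicit neural representations. It trains a deep neural network directly through a physics-based forward-model loss to invert for a continuous density field without a predefined mesh or discretization, improving the reconstruction of geological structure.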

Comments Codes for reproducing results are at https://zenodo.org/records/19440024

Journal ref Scientific Reports (2026)

Details

DOI: 10.1038/s41598-026-55960-5

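AI abstract (translated from Chinese)

Gravity data inversion is an important method for studying subsurface density variations, with applications in mineral exploration, geothermal assessment, carbon storage, natural hydrogen, groundwater resources, and tectonic evolution. This paper presents a scientific machine-learning approach that uses an implicit neural representation (INR) to represent subsurface density as a continuous field. The method trains a deep neural network directly through a physics-based forward-model loss, mapping spatial coordinates to a continuous density field without a predefined mesh or discretization. Spatial encoding strengthens the network's ability to capture sharp contrasts and short-wavelength features, overcoming the oversmoothing that spectral bias causes in conventional coordinate-based networks. The approach is validated on synthetic examples, including smooth models and a dipping block model, to assess recovery of structures at different depths. The INR framework reconstructs detailed structure and geologically plausible boundaries without explicit regularization or depth weighting, and it reduces the number of inversion parameters as the problem size grows. These results show the potential of implicit representations for scalable, flexible, and interpretable large-scale geophysical inversion. The framework could extend to other geophysical methods and to joint or multiphysics inversion.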

English abstract

Inversion of gravity data is an important method for investigating subsurface density variations relevant to mineral exploration, geothermal assessment, carbon storage, natural hydrogen, groundwater resources, and tectonic evolution. Here we present a scientific machine-learning approach for three-dimensional gravity inversion that represents subsurface density as a continuous field using an implicit neural representation (INR). The method trains a deep neural network directly through a physics-based forward-model loss, mapping spatial coordinates to a continuous density field without predefined meshes or discretisation. Spatial encoding enhances the network's capacity to capture sharp contrasts and short-wavelength features that conventional coordinate-based networks tend to oversmooth due to spectral bias. We demonstrate the approach on synthetic examples including smooth models, representing realistic geological complexity, and a dipping block model to assess recovery of structures at different depths. The INR framework reconstructs detailed structure and geologically plausible boundaries without explicit regularisation or depth weighting, while reducing the number of inversion parameters as the problem size grows. These results highlight the potential of implicit representations to enable scalable, flexible, and interpretable large-scale geophysical inversion. This framework could generalise to other geophysical methods and to joint/multiphysics inversion.
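To make the pipeline in the abstract more concrete, here is a minimal Python sketch of its two ingredients: a coordinate encoding and a physics-based forward-model loss. It is not the authors' implementation. It assumes a sine/cosine (Fourier-feature) encoding, a point-mass approximation for the vertical gravity of each cell, and a linear map over the encoded features as a stand-in for the deep network. All function names are illustrative.

```python
import math

G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2


def fourier_features(x, y, z, n_freqs=3):
    """Encode a coordinate (assumed scaled to about [-1, 1]) with sines and
    cosines at doubling frequencies. The encoding choice is an assumption."""
    feats = []
    for k in range(n_freqs):
        w = (2.0 ** k) * math.pi
        for c in (x, y, z):
            feats.extend((math.sin(w * c), math.cos(w * c)))
    return feats


def inr_density(weights, x, y, z, n_freqs=3):
    """Stand-in for the network: a linear map over the encoded coordinate."""
    feats = fourier_features(x, y, z, n_freqs)
    if len(weights) != len(feats):
        raise ValueError("weights must match the number of features")
    return sum(w * f for w, f in zip(weights, feats))


def forward_gz(density, stations, cells, cell_volume):
    """Vertical gravity at each station, treating every cell as a point mass.

    density: callable (x, y, z) -> density contrast in kg/m^3
    stations, cells: lists of (x, y, z) in metres, z positive downward
    """
    out = []
    for sx, sy, sz in stations:
        total = 0.0
        for cx, cy, cz in cells:
            dx, dy, dz = cx - sx, cy - sy, cz - sz
            r3 = (dx * dx + dy * dy + dz * dz) ** 1.5
            total += density(cx, cy, cz) * cell_volume * dz / r3
        out.append(G * total)
    return out


def data_misfit(predicted, observed):
    """Mean squared difference between predicted and observed gravity."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)
```

In a training loop, the weights (or the parameters of a real network) would be adjusted to reduce `data_misfit(forward_gz(...), observed)`. Because the density is a function of position, it can be sampled at any set of points without a fixed inversion mesh.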

2601.00809 2026-06-10 cs.OH cs.AI cs.MA

A Modular Reference Architecture for MCP-Servers Enabling Agentic BIM Interaction

Tobias Heimig-Elschner, Changyu Du, Anna Scheuvens, André Borrmann, Jakob Beetz

Affiliations: Chair of Design Computation, RWTH Aachen University; Chair of Computing in Civil and Building Engineering, Technical University of Munich; Federal Institute for Research on Building, Urban Affairs and Spatial Development (BBSR); TUM Georg Nemetschek Institute

AI summary: This paper proposes a modular reference architecture that decouples the MCP interface from specific BIM APIs, enabling API-agnostic, isolated, and reproducible agentic BIM interaction, improving reuse, and making research more systematic.

Comments Accepted at the GNI Symposium on Artificial Intelligence for the Built World (Technical University of Munich, May 18--20, 2026)

Details

DOI: 10.14459/2026md1851873

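AI abstract (translated from Chinese)

Agentic workflows driven by large language models are increasingly applied to Building Information Modelling (BIM), enabling natural-language retrieval, modification, and generation of IFC models. Recent work has begun to adopt the emerging Model Context Protocol (MCP) as a uniform tool-calling interface for LLMs, which simplifies the agent side of BIM interaction. Although MCP standardizes how LLMs invoke tools, current BIM-side implementations are still written for specific tools, which limits reuse, evaluation, and workflow portability across environments. This paper addresses the gap by introducing a modular reference architecture that lets MCP servers provide API-agnostic, isolated, and reproducible agentic BIM interaction. From a systematic analysis of capabilities that recur in recent literature, we derive a core set of requirements. These requirements guide a microservice architecture centred on an explicit adapter contract that decouples the MCP interface from specific BIM APIs. A prototype implementation using IfcOpenShell demonstrates feasibility on common modification and generation tasks. Evaluation in representative scenarios shows that the architecture enables reliable workflows, reduces coupling, and provides a reusable foundation for systematic research.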

English abstract

Agentic workflows driven by large language models (LLMs) are increasingly applied to Building Information Modelling (BIM), enabling natural-language retrieval, modification and generation of IFC models. Recent work has begun adopting the emerging Model Context Protocol (MCP) as a uniform tool-calling interface for LLMs, simplifying the agent side of BIM interaction. While MCP standardises how LLMs invoke tools, current BIM-side implementations are still authoring tool-specific and ad hoc, limiting reuse, evaluation, and workflow portability across environments. This paper addresses this gap by introducing a modular reference architecture for MCP servers that enables API-agnostic, isolated and reproducible agentic BIM interactions. From a systematic analysis of recurring capabilities in recent literature, we derive a core set of requirements. These inform a microservice architecture centred on an explicit adapter contract that decouples the MCP interface from specific BIM-APIs. A prototype implementation using IfcOpenShell demonstrates feasibility across common modification and generation tasks. Evaluation across representative scenarios shows that the architecture enables reliable workflows, reduces coupling, and provides a reusable foundation for systematic research.
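The adapter contract mentioned in the abstract can be pictured with a small Python sketch. This is an illustration under stated assumptions, not the paper's design: the `BimAdapter` protocol, its method names, the `InMemoryAdapter` backend, and the `handle_tool_call` dispatcher are all hypothetical, and no MCP SDK or IfcOpenShell calls are used.

```python
from typing import Any, Dict, List, Protocol


class BimAdapter(Protocol):
    """Hypothetical adapter contract that the tool layer depends on."""

    def create_element(self, ifc_class: str, name: str) -> str: ...

    def query_elements(self, ifc_class: str) -> List[Dict[str, Any]]: ...


class InMemoryAdapter:
    """Toy backend standing in for a real BIM API."""

    def __init__(self) -> None:
        self._elements: Dict[str, Dict[str, Any]] = {}

    def create_element(self, ifc_class: str, name: str) -> str:
        element_id = f"e{len(self._elements) + 1}"
        self._elements[element_id] = {
            "id": element_id,
            "ifc_class": ifc_class,
            "name": name,
        }
        return element_id

    def query_elements(self, ifc_class: str) -> List[Dict[str, Any]]:
        return [e for e in self._elements.values() if e["ifc_class"] == ifc_class]


def handle_tool_call(adapter: BimAdapter, tool: str, args: Dict[str, Any]) -> Any:
    """Dispatch a tool call by name, touching only the adapter contract."""
    if tool == "create_element":
        return adapter.create_element(args["ifc_class"], args["name"])
    if tool == "query_elements":
        return adapter.query_elements(args["ifc_class"])
    raise ValueError(f"unknown tool: {tool}")
```

Because `handle_tool_call` only sees the contract, swapping `InMemoryAdapter` for an adapter backed by another BIM API would leave the tool-facing code unchanged, which is the decoupling the abstract describes.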
