arXivDaily arXiv每日学术速递 周一至周五更新
全部学科分类 2094
专题追踪
2606.20394 2026-06-19 cs.RO math.OC 新提交

Agentic AutoResearch forSpace Autonomy: An Auditable, LLM-Driven Research Agent for Aerospace Control Problems

面向空间自主性的智能体自动研究:用于航空航天控制问题的可审计、LLM驱动的研究代理

Amit Jain, Richard Linares

发表机构 * Department of Aeronautics and Astronautics(航空航天学系)

AI总结 提出AutoResearch框架,利用大语言模型作为离线研究代理,自动迭代开发航天控制策略,并通过内置可信层审计结果,消除种子噪声影响,在交会和对接问题上验证了有效性。

详情
AI中文摘要

航天器的制导、导航与控制功能日益通过从专家求解器中提炼的学习策略来实现。开发这样的策略本身就是一个研究过程:研究者选择架构和超参数,运行实验,并必须判断一个明显的改进是真实的还是仅仅是种子噪声。本文提出了AutoResearch框架,其中大语言模型自主驱动这一循环,用于航空航天控制问题,并结合了一个内置在循环中的可信层,该层根据问题自身测量的种子噪声对每个报告的结果进行认证。语言模型仅作为离线研究代理,负责开发控制策略;它产生的训练策略随后部署在航天器上,而模型本身从不操作飞行器。在每次迭代中,代理读取自然语言描述的问题描述和运行历史,对训练脚本提出一次编辑,执行它,并记录结果。任何报告的结果在通过相同的三项检查之前不会被认可:测量的每个问题的种子噪声、最佳配置的重新播种验证,以及代理编辑的留一法剪枝。相同的循环被原样应用于两个航空航天控制问题:Clohessy-Wiltshire相对交会问题和带有安全约束的避碰对接问题(经过禁飞区),每个问题都针对已知的最优控制基准进行了校准。在这两个问题中,经过审计的策略以多个标准差超过了测量的种子噪声;对相同参数的未定向搜索则没有。在对接问题上,差距变得明显:未定向搜索没有产生可行的策略,而学习到的策略在每个种子上都保持在禁飞区之外。

英文摘要

Spacecraft guidance, navigation, and control functions are increasingly realized as learned policies distilled from expert solvers. Developing such a policy is itself a research process: an investigator selects an architecture and hyperparameters, runs experiments, and must determine whether an apparent improvement is genuine or merely seed noise. This paper presents AutoResearch, a framework in which a large language model autonomously drives that loop for aerospace control problems, coupled with a credibility layer, built into the loop, that certifies each reported result against the problem's own measured seed noise. The language model serves only as the offline research agent that develops the control policy; the trained policy it produces is then deployed onboard the spacecraft, while the model itself never operates the vehicle. At each iteration the agent reads a plain-language problem description and the run history, proposes a single edit to the training script, executes it, and logs the outcome. No reported result is credited until it passes the same three checks: measured per-problem seed noise, reseeded verification of the best configuration, and leave-one-out pruning of the agent's edits. The same loop is applied, unchanged, to two aerospace control problems: a Clohessy-Wiltshire relative rendezvous and a safety-constrained collision-avoidance docking past a keep-out zone, each calibrated against a known optimal control benchmark. In both, the audited policy clears the measured seed noise by many standard deviations; an undirected search over the same parameters does not. On the docking problem the gap becomes categorical: undirected search yields no feasible policy, while the learned policy stays outside the keep-out zone on every seed.

2606.20162 2026-06-19 cs.AI cs.IT cs.NI math.IT 新提交

Implicit Semantic-Aware Communication Based on Hypergraph Reasoning

基于超图推理的隐式语义感知通信

Yiwei Liao, Shurui Tu, Yong Xiao, Yingyu Li, Guangming Shi

发表机构 * China Electric Power Research Institute Co., Ltd(中国电力科学研究院有限公司) National Key Laboratory for Power Grid Environmental Protection(电网环境保护国家重点实验室) School of Electronic Information and Communications, Huazhong University of Science and Technology(华中科技大学电子信息与通信学院) Peng Cheng Laboratory(鹏城实验室) Pazhou Laboratory (Huangpu)(琶洲实验室(黄埔)) School of Mechanical Engineering and Electronic Information, China University of Geosciences(中国地质大学机械与电子信息学院)

AI总结 提出基于超图的隐式语义推理框架HISR,通过超图建模多实体高阶关系,在噪声信道下提升语义推理鲁棒性,准确率提升36.6%。

Comments This work is accepted at IEEE Transactions on Communications

详情
AI中文摘要

语义感知通信已成为下一代通信系统的变革性范式,将基本目标从传输比特级符号转变为可靠恢复和理解信息的语义含义。先前研究表明,将源消息的语义内容表示为基于图的结构可以显著提高通信效率和接收端语义推理的准确性。然而,现有解决方案通常采用仅捕获成对关系的图,从而忽略了现实场景中常见的高阶隐式相关性,例如群体交互、多实体关联和复杂关系上下文。这种限制降低了语义表达能力,并使语义推理容易受到歧义和性能下降的影响,尤其是在噪声或损坏的信道条件下。为了解决这些问题,本文提出了一种新颖的基于超图的隐式语义推理框架HISR,该框架利用超图表示语义知识实体之间的复杂多实体关系。在HISR中,实体及其关联的高阶关系被映射到针对不同关系上下文定制的专用语义子空间中。这种设计不仅解耦了多样的语义交互以减轻传统图嵌入方法中常见的过平滑效应,而且即使在传输过程中发生部分信息丢失时也能实现鲁棒的语义推理。数值结果表明,所提出的HISR在隐式语义解释准确率上比最先进的基准提高了36.6%。

英文摘要

Semantic-aware communication has emerged as a transformative paradigm for next-generation communication systems, shifting the fundamental goal from transmitting bit-level symbols to reliably recovering and understanding the semantic meaning of information. Previous studies have demonstrated that representing the semantic content of source messages as graph-based structures can significantly improve communication efficiency and the accuracy of semantic inference at the receiver. However, existing solutions typically employ graphs that capture only pairwise relationships, thereby neglecting higher-order implicit correlations commonly observed in real-world scenarios, such as group interactions, multi-entity associations, and complex relational contexts. This limitation reduces semantic expressiveness and makes semantic inference susceptible to ambiguity and performance degradation, particularly under noisy or corrupted channel conditions. To address these issues, this paper proposes a novel hypergraph-based implicit semantic reasoning framework, HISR, which leverages hypergraphs to represent complex multi-entity relationships among semantic knowledge entities. In HISR, entities and their associated higher-order relations are mapped into dedicated semantic subspaces tailored to distinct relational contexts. This design not only disentangles diverse semantic interactions to mitigate the over-smoothing effects commonly found in traditional graph embedding methods but also enables robust semantic inference even when partial information loss occurs during transmission. Numerical results show that the proposed HISR achieves up to a 36.6% improvement in implicit semantic interpretation accuracy over the state-of-the-art benchmarks.

2606.19878 2026-06-19 cs.LG math.OC stat.ML 新提交

On the Oracle Complexity of Interpolation-Based Gradient Descent

基于插值的梯度下降的预言复杂度

Dongmin Lee, William Lu, Anuran Makur

发表机构 * Purdue University(普渡大学)

AI总结 提出分段多项式插值梯度下降(PPI-GD)方法,通过数据域等距点查询一阶预言构造多项式插值近似全梯度,在强凸和非凸损失下分析预言复杂度,证明在数据维数受限且损失足够光滑时优于多种GD变体。

Comments 16 pages, 2 figures

详情
AI中文摘要

最近关于经验风险最小化(ERM)的一阶优化器的工作表明,可以利用ERM损失函数在训练数据中的光滑性(而非优化参数中的光滑性)来改进梯度下降(GD)方法的预言复杂度。在本文中,我们提出了一种不精确梯度方法——分段多项式插值梯度下降(PPI-GD),该方法通过在数据域中的等距点处查询一阶预言来近似每次迭代中的全梯度,从而在数据域的适当大小的块上构造所得梯度样本的多项式插值。我们分析了PPI-GD在强凸和非凸损失函数下的预言复杂度,其中数据空间维数以训练样本数量的多对数函数为界,并发现当损失函数足够光滑时,PPI-GD在关键区域优于几种GD变体。此外,我们的分析将双三次样条插值误差分析中的几种技术扩展到$d$变量张量积多项式插值的设置中,这可能对插值分析具有独立意义。

英文摘要

Recent work on first-order optimizers for empirical risk minimization (ERM) has suggested that smoothness of ERM loss functions in the training data, rather than in the optimization parameters, can be leveraged to improve the oracle complexity of gradient descent (GD) methods. In this paper, we propose an inexact gradient method, piecewise polynomial interpolation-based gradient descent (PPI-GD), which approximates the full gradient in each iteration by querying the first-order oracle at equidistant points in the data domain to construct polynomial interpolants of the resulting gradient samples over appropriately sized patches of the data domain. We analyze the oracle complexity of PPI-GD for strongly convex and non-convex loss functions when the data space dimension is bounded by a polylogarithmic function of the number of training samples, and find it to outperform several GD variants in key regimes when the loss function is sufficiently smooth. Furthermore, our analysis extends several techniques from the error analysis of bicubic spline interpolants to the setting of $d$-variate tensor product polynomial interpolants which may be of independent interest in interpolation analysis.

2606.19754 2026-06-19 cs.LG cs.NA math.NA 新提交

Learning universal approximations for partial differential equations with Physics-Informed Broad Learning System

基于物理信息广度学习系统的偏微分方程通用逼近学习

Zhiwen Yu, Derong Yang, Liujian Zhang, Kaixiang Yang, Peilin Zhan, Jianmin Lv, Jane You, C. L. Philip Chen

发表机构 * School of Computer Science and Engineering, South China University of Technology(华南理工大学计算机科学与工程学院) Peng Cheng Laboratory(鹏城实验室) School of Future Technology, South China University of Technology(华南理工大学未来技术学院) School of Computer Science and Technology, Guangdong University of Technology(广东工业大学计算机科学与技术学院) Department of Industrial and Systems Engineering, The Hong Kong Polytechnic University(香港理工大学工业及系统工程学系)

AI总结 提出物理信息广度学习系统(PIBLS),通过无反向传播的最小二乘优化高效求解线性和非线性偏微分方程,比传统PINN快1-3个数量级且精度更高。

详情
AI中文摘要

偏微分方程(PDE)在建模复杂的物理、生物和工程系统中起着核心作用。虽然传统的数值求解器很稳健,但由于网格依赖性,它们常常带来高昂的计算成本,而最近的物理信息神经网络(PINN)提供了一种无网格替代方案,但经常遭受收敛缓慢和优化不稳定的问题。为了弥合这一差距,本文提出了物理信息广度学习系统(PIBLS),一种新颖的无反向传播框架,将PDE求解重新表述为直接的最小二乘优化。我们改进了该框架内的一个算法以高效处理非线性PDE,并提供了严格的数学证明,确立了PIBLS对这些方程的通用逼近性质。在线性和非线性PDE上的实验表明,PIBLS比传统PINN快1到3个数量级,同时实现了显著更高的求解精度。该框架为科学机器学习提供了一种计算高效的范式,为实时仿真和设计优化任务提供了一种实用、高速的替代方案。

英文摘要

Partial differential equations (PDEs) play a central role in modeling complex physical, biological, and engineering systems. While traditional numerical solvers are robust, they often incur prohibitive computational costs due to mesh dependencies, whereas recent Physics-Informed Neural Networks (PINNs) offer a mesh-free alternative but frequently suffer from slow convergence and optimization instability. To bridge this gap, this article proposes the Physics-Informed Broad Learning System (PIBLS), a novel backpropagation-free framework that reformulates PDE solving as a direct least-squares optimization. We improved an algorithm within this framework to handle nonlinear PDEs efficiently and provide a rigorous mathematical proof establishing the universal approximation property of PIBLS for these equations. Experiments on linear and nonlinear PDEs demonstrate that PIBLS is one to three orders of magnitude faster than conventional PINNs while achieving significantly higher solution accuracy. This framework provides a computationally efficient paradigm for scientific machine learning, offering a practical, high-speed alternative for real-time simulation and design optimization tasks.

2606.19521 2026-06-19 cs.LG math.OC 新提交

Interactive Pareto navigation for deep multi-task learning

深度多任务学习的交互式帕累托导航

Augustina C. Amakor, Konstantin Sonntag, Sebastian Peitz

发表机构 * Department of Computer Science, TU Dortmund, Dortmund, Germany(多特蒙德工业大学计算机科学系,德国多特蒙德) Lamarr Institute for Machine Learning and Artificial Intelligence(拉马尔机器学习和人工智能研究所)

AI总结 提出偏好帕累托探索(PPE)框架,通过预测-校正方法沿帕累托流形切线方向引导偏好,利用Krylov子空间方法避免Hessian计算,实现高效交互式多目标优化。

详情
AI中文摘要

在多任务学习中,处理越来越多的目标在计算资源和决策者选择适当权衡的能力方面都很快变得具有挑战性。因此,一种广泛使用的方法是通过加权和将各个损失聚合到单个损失函数中。这通常由于帕累托前沿的形状而无法捕捉决策者的偏好,或者需要多次调整和计算,这在深度学习应用中变得过于昂贵。为了解决这些问题,我们引入了一个新颖的框架,偏好帕累托探索(PPE),它在交互式探索过程中强制执行决策者的偏好,同时考虑帕累托集的几何形状。PPE基于预测-校正方法,该方法沿着帕累托最优解流形的切线方向执行预测步骤,遵循决策者的偏好。随后的校正步骤产生反映该偏好的新权衡。为了在表征流形切空间时避免显式的Hessian计算,我们采用了一种仅依赖于矩阵-向量乘积的Krylov子空间方法。这些乘积可以通过自动微分高效获得,确保了整个优化过程的效率和鲁棒性。该方法的有效性和性能通过玩具问题和深度学习示例进行了展示。

英文摘要

In multi-task learning, handling an increasing number of objectives can quickly become challenging, both in terms of the computational resources and the decision maker's capacity to choose appropriate trade-offs. A widely used approach is thus to aggregate the individual losses in a single loss function by a weighted sum. This often fails to capture either the decision maker's preferences as a result of the shape of the Pareto front, or requires multiple adjustments and computations which becomes prohibitively expensive in deep learning applications. To address these issues, we introduce a novel framework, Preference Pareto Exploration (PPE), which enforces the decision maker's preferences while accounting for the geometry of the Pareto set in an interactive exploration process. PPE is based on a predictor-corrector method that performs predictor steps tangential to the manifold of Pareto-optimal solutions, following the decision maker's preference. The subsequent corrector step results in a new trade-off reflecting this preference. To avoid explicit Hessian computations when characterizing the tangent space of the manifold, we employ a Krylov subspace method that relies solely on matrix-vector products. These products can be efficiently obtained via automatic differentiation, ensuring both efficiency and robustness throughout the optimization process. The method's functionality and performance are demonstrated using both toy problems and examples from deep learning.

2606.19361 2026-06-19 cs.LG cs.AI cs.NA math.NA stat.CO stat.ME stat.ML 新提交

Computational Identifiability

计算可识别性

Lucius E. J. Bynum, Rajesh Ranganath, Kyunghyun Cho

发表机构 * New York University(纽约大学)

AI总结 提出“计算可识别性”框架,通过有限计算搜索过程在指定误差容限内找到经验估计量,从而解决理论可识别性在有限样本、模糊图标准等实际场景中的不足。

详情
AI中文摘要

识别条件描述了目标查询或感兴趣参数作为可用信息类型和数量的函数的可计算性。在因果识别中,这些信息通常以因果图的形式表达,数据是针对图中某些变量子集观测或收集的。目标查询可以是单个效应,也可以是给定模型中的一类效应。识别算法的推导在数学上定义了期望中理论上唯一确定所需因果效应的过程。期望中的可识别性,即“理论可识别性”,通常假设渐近性质、无限数据或其他数学理想化条件。在本文中,我们探讨了这种理论理想化的可识别性与一种受计算限制的替代方案之间的根本区别。我们提出的框架——“计算可识别性”——而是为经验估计量定义一个有限的计算搜索过程。如果该过程在期望的误差容限内经验性地找到了估计量,则满足可识别性,条件取决于搜索的指定假设(即参数上的先验分布)以及搜索过程本身。通过多个实验,我们展示了该框架如何回答细粒度的实际识别问题,例如小有限样本下的识别、模糊图标准下的识别、混合观测-干预数据下的识别,以及跨反事实数据和估计量的识别。代码见 https://this https URL。

英文摘要

Identification conditions describe the computability of a target query or parameter of interest as a function of the type and amount of information available. In causal identification, this information is often expressed in the form of a causal graph, and data are observed or collected for some subset of variables in the graph. Target queries may be for a single effect alone or for a class of effects in a given model. The derivation of an identification algorithm then defines mathematically the process by which the desired causal effect(s) can be uniquely determined, theoretically, in expectation. Identifiability in expectation, or 'theoretical identifiability,' generally assumes asymptotic properties, infinite data, or other mathematically idealized conditions. In this paper, we explore a fundamental distinction between this theoretical, idealized notion of identifiability and a proposed alternative that is computation-bound. The framework we propose - 'computational identifiability' - is to instead define a finite computational search procedure for an empirical estimator. If this process finds an estimator empirically, within a desired error tolerance, then identifiability is satisfied, conditional on the specified assumptions of the search (i.e., a prior distribution over the parameters) and conditional on the search procedure itself. Through several experiments, we demonstrate how this framework allows us to answer fine-grained, practical identification questions, such as identification with small finite samples, with ambiguous graphical criteria, with mixed observational-interventional data, and across counterfactual data and estimands. Code is available at https://github.com/lbynum/metadentify.

2606.20467 2026-06-19 cs.LG cs.NA math.NA physics.comp-ph 新提交

Agentic Symbolic Search: Characterizing PDEs Beyond Hand-crafted Expressions, Meshes, and Neural Networks

智能符号搜索:超越手工表达式、网格和神经网络的PDE特征化

Zongmin Yu, Liu Yang

发表机构 * National University of Singapore(新加坡国立大学)

AI总结 提出ASYS框架,通过智能体将PDE理论转化为可微分符号程序,结合进化搜索和梯度优化自动发现解析形式或近似,在多个问题中生成可解释表示。

详情
AI中文摘要

数学家通过数学结构而非计算值表来理解PDE解。历史上,这需要针对每个问题单独进行数学分析。数值模拟和神经网络都不能直接产生这些结构。我们提出智能符号搜索(ASYS),一种先验引导框架,其中智能体将PDE理论、公共问题约束和累积搜索经验转化为可测试的可微分符号程序。数学形式在进化搜索下被精炼,而其连续参数通过基于梯度的优化拟合。这使得搜索成为归纳偏置注入的自动化形式,而非盲目的符号回归。对于已知解析形式的问题,ASYS自然恢复这些形式;对于其他问题,ASYS构建解析近似,可引导数学家进行进一步分析。在我们的实验中,跨越五个问题,包括有界动力学、有限时间爆破和自由边界聚焦,ASYS产生了可解释表示,包括Allen-Cahn 2D动力学的几何界面公式和Keller-Segel趋化爆破的九参数收缩律,这些场景中先前没有闭式描述。ASYS展示了表征PDE解的新范式的可能性,超越了手工解析解、基于网格的数值解和神经网络近似。

英文摘要

Mathematicians understand a PDE solution through mathematical structures rather than tables of computed values. Historically, this has been the product of mathematical analysis, carried out by hand for each problem individually. Neither numerical simulation nor neural networks produce those structures directly. We propose Agentic Symbolic Search (ASYS), a prior-guided framework in which an agent translates PDE theory, public problem constraints, and accumulated search experience into testable differentiable symbolic programs. The mathematical forms are refined under evolutionary search, while their continuous parameters are fit by gradient-based optimization. This makes the search an automated form of inductive-bias injection rather than blind symbolic regression. For problems with known analytical forms, ASYS recovers these forms naturally; for other problems, ASYS constructs analytical approximations which can guide mathematicians toward further analysis. In our experiments, across five problems spanning bounded dynamics, finite-time blow-up, and free-boundary focusing, ASYS produces interpretable representations, including a geometric interface formula for Allen-Cahn 2D dynamics and a nine-parameter contraction law for Keller-Segel chemotactic blow-up, in settings where no closed-form description was previously available. ASYS shows the possibility of a new paradigm for characterizing PDE solutions, beyond handcrafted analytical solutions, mesh-based numerical solutions, and neural network approximations.

2606.20329 2026-06-19 cs.LG physics.geo-ph 新提交

Constrained hybrid modelling to predict microbial dynamics and organic matter turnover in soil systems

约束混合建模预测土壤系统中微生物动态与有机质周转

Paul Collart, Juergen Gall, Andrea Schnepf, Holger Pagel, Lars Doorenbos

发表机构 * Agrosphere (IBG-3), Forschungszentrum Jülich GmbH(农业圈(IBG-3),于利希研究中心) Institute of Crop Science and Resource Conservation, University of Bonn(波恩大学作物科学与资源保护研究所) Institute of Computer Science, University of Bonn(波恩大学计算机科学研究所) Lamarr Institute for Machine Learning and Artificial Intelligence(拉马尔机器学习和人工智能研究所)

AI总结 提出首个混合建模框架,利用神经网络从宏基因组推断功能性状预测过程模型参数,并整合生态理论约束,有效预测微生物动态和有机质周转。

Comments Accepted at ICML '26

详情
AI中文摘要

土壤微生物控制有机质循环,并在很大程度上决定土壤系统如何应对和缓解气候变化及环境威胁。因此,在基于过程的土壤模型中表示微生物动态对于预测土壤碳循环至关重要,尽管从数据中获取信息极具挑战性。改进参数化的一个有前景的方法是整合基因组数据,然而建模基因组与微生物驱动过程之间复杂且未知的关系是一个未解决的问题。在这项工作中,我们提出了第一个混合建模框架,用于从基于DNA测序数据的宏基因组推断功能性状中推导基于过程的土壤有机质周转模型的生物动力学参数值。我们的模型通过神经网络从基因组性状数据预测过程模型的生物动力学参数,并整合来自生态理论和文献的约束,以确保即使是非观测状态变量也能实现逼真的行为。我们在不同复杂度的合成基因组性状数据集和真实数据上评估了我们的方法,结果表明,我们的方法在多个基线上提高了性能,并有效学习了过程模型中不可测量组分的动态,即使是在小训练数据集上也是如此。

英文摘要

Soil microorganisms control organic matter cycling and largely determine how soil systems can cope with and mitigate climate change and environmental threats. Representing microbial dynamics in process-based soil models is therefore critical to predict carbon cycling in soils, albeit highly challenging to inform from data. One promising approach to improve their parametrisation is the integration of genomic data, yet modelling the complex and unknown relationship between genomes and the processes the microbes are driving is an unsolved problem. In this work, we present the first hybrid modeling framework for deriving biokinetic parameter values of a process-based soil organic matter turnover model from metagenome-inferred functional traits based on DNA sequencing data. Our model predicts biokinetic parameters of the process-based model from genomic trait data with a neural network and integrates constraints from ecological theory and literature to ensure realistic behavior, even of non-observed state variables. We evaluate our method on synthetic genomic trait datasets of varying complexity and on real data, showing that our approach improves performance over multiple baselines and learns the dynamics of unmeasurable components of the process-based model effectively, even for small training datasets.

2606.20326 2026-06-19 cs.LG physics.comp-ph 新提交

Quantum-classical physics-informed Kolmogorov-Arnold networks for PDEs

量子-经典物理信息Kolmogorov-Arnold网络求解偏微分方程

Xiang Rao, Yuxuan Shen

发表机构 * School of Petroleum Engineering, Yangtze University(扬州大学石油工程学院) School of Computer Science, Yangtze University(扬州大学计算机科学学院) State Key Laboratory of Low Carbon Catalysis and Carbon Dioxide Utilization (Yangtze University)(扬州大学低碳催化与二氧化碳利用国家重点实验室) Western Research Institute, Yangtze University(扬州大学西部研究院)

AI总结 提出QCPIKAN,首个量子-经典物理信息Kolmogorov-Arnold网络,结合Chebyshev多项式KAN层和参数化量子电路,通过嵌入物理约束加速高频误差指数收敛并抑制数值色散,在多孔介质渗流场景中优于现有量子-经典PINN。

详情
AI中文摘要

我们开发了QCPIKAN,这是首个旨在求解偏微分方程(PDE)的量子-经典物理信息Kolmogorov-Arnold网络。该混合框架基于Chebyshev多项式KAN层和参数化量子电路构建,将物理约束嵌入训练损失中以强制执行物理一致性。我们的基于逼近论的理论研究证明,该设计将高频误差收敛加速至指数速率,并有效抑制数值色散。我们在多孔介质中的三个典型渗流场景(包括单相流、组分运移和两相流)上验证了该框架。与现有的量子-经典物理信息神经网络相比,QCPIKAN在全局预测精度、局部误差控制、动态演化跟踪和驱替前沿定位方面均实现了优越性能。这项工作为求解复杂PDE提供了一种鲁棒且高效的替代方案。

英文摘要

We develop QCPIKAN, the first quantum-classical physics-informed Kolmogorov-Arnold network designed to solve partial differential equations (PDEs). Built upon Chebyshev-polynomial KAN layers and parameterized quantum circuits, this hybrid framework embeds physical constraints into the training loss to enforce physical consistency. Our theoretical investigations grounded in approximation theory prove that this design accelerates high-frequency error convergence to an exponential rate and effectively mitigates numerical dispersion. We validate the framework across three typical seepage scenarios in porous media, including single-phase flow, component transport and two-phase flow. Compared with existing quantum-classical physics-informed neural networks, QCPIKAN achieves superior performance in global prediction accuracy, local error control, dynamic evolution tracking and displacement front localization. This work provides a robust and efficient alternative for solving complex PDEs.

2606.19853 2026-06-19 cs.LG physics.comp-ph 新提交

Physics-Informed Neural Network with Squeeze-Excitation-like Attention

带有挤压-激励式注意力的物理信息神经网络

Yun-Fei Song, Long-Gang Pang, Fu-Peng Li, Jun-Jie Zhang

发表机构 * Key Laboratory of Quark and Lepton Physics (MOE) & Institute of Particle Physics, Central China Normal University(华中师范大学夸克与轻子物理教育部重点实验室及粒子物理研究所) Artificial Intelligence and Computational Physics Research Center, Central China Normal University(华中师范大学人工智能与计算物理研究中心) Key Laboratory of Nuclear Physics and Ion-beam Application (MOE) & Institute of Modern Physics, Fudan University(复旦大学核物理与离子束应用教育部重点实验室及现代物理研究所) Shanghai Research Center for Theoretical Nuclear Physics, NSFC and Fudan University(国家自然科学基金委员会-复旦大学上海理论核物理研究中心) Northwest Institute of Nuclear Technology(西北核技术研究所)

AI总结 提出SEA-PINN架构,通过挤压-激励式注意力机制动态调整神经元重要性,实现稳定初始化,在20个基准问题中17个方差极小,无需傅里叶嵌入或周期激活即可达到与TSA-PINN相当的精度,并可作为轻量插件提升其他PINN性能。

Comments 15 pages, 6 figures

详情
AI中文摘要

我们引入了SEA-PINN,一种新颖的架构,它将类似挤压-激励的注意力机制融入物理信息神经网络,以动态重新校准各层神经元的重要性。SEA-PINN的一个关键特性是其高度稳定的初始化。在20个基准问题中的17个上,SEA-PINN表现出几乎可忽略的方差和显著降低的初始损失,为优化建立了一个准确定且有利的起点。值得注意的是,在没有采用傅里叶特征嵌入或周期激活函数的情况下,SEA-PINN与TSA-PINN(一种通过正弦激活中的可学习频率专门为高频问题设计的模型)相比,达到了具有竞争力的精度(在高频案例7上,相对于FNN-PINN的改进分别为83%和90%)。此外,将SEA-PINN集成到TSA-PINN中使性能提升了42.49%。这些结果强调了SEA-PINN作为一种轻量级插件模块,能够增强非线性表示能力,促进更稳健和高效的收敛,并提高物理信息学习的整体可靠性。

英文摘要

We introduce SEA-PINN, a novel architecture that incorporates a Squeeze-Excitation-like attention mechanism into physics-informed neural networks to dynamically recalibrate the importance of neurons across layers. A key feature of SEA-PINN is its highly stable initialization. On 17 out of 20 benchmark problems, SEA-PINN exhibit nearly negligible variance and significantly reduced initial loss, establishing a quasi-deterministic and favorable starting point for optimization. Notably, without employing Fourier feature embeddings or periodic activation functions, SEA-PINN attained competitive accuracy (83\% vs. 90\% improvement relative to FNN-PINN on the high-frequency case 7) as compared with TSA-PINN-a model specifically engineered for high-frequency problems via learnable frequencies in sinusoidal activations. Furthermore, integrating SEA-PINN into TSA-PINN boosted performance by 42.49\%. These results underscore SEA-PINN as a lightweight plug-in module that enhances nonlinear representation power, promotes more robust and efficient convergence, and strengthens the overall reliability of physics-informed learning.

2606.19562 2026-06-19 cs.LG physics.flu-dyn 新提交

Advances in Scientific Machine Learning for Coupled Fluid Flow and Transport

耦合流体流动与输运的科学机器学习进展

Gabriel F. Barros, Rômulo M. Silva, Alvaro L. G. A. Coutinho

发表机构 * COPPE - Federal University of Rio de Janeiro - UFRJ(里约热内卢联邦大学COPPE学院)

AI总结 综述科学机器学习在耦合流体流动与输运问题中的进展,包括基于SVD的线性降阶和PINNs、β-VAE等神经网络方法,并展示其在浊流和热对流中的应用。

详情
AI中文摘要

本章回顾了科学机器学习(SciML)在模拟由不可压缩Navier-Stokes方程和标量输运方程控制的耦合流体流动与输运现象方面的最新进展。这类系统出现在浊流和热对流等应用中,具有强非线性耦合和多尺度行为,使得高保真模拟计算成本高昂。为此,本章调查了构建高效代理模型的最新SciML方法,包括基于奇异值分解的线性降阶技术(如动态模态分解)和非线性神经网络方法(如物理信息神经网络(PINNs)和β-变分自编码器(β-VAEs))。首先介绍了作者将这些模型与高性能计算策略相结合的工作,包括自适应网格细化/粗化(AMR/C)和科学浮点数据压缩。然后提出了两个新贡献:通过PINNs对浊流进行代理建模,以及使用β-VAEs从热流中提取解缠的非线性模态。控制方程和代表性基准(包括锁交换流和Rayleigh-Bénard对流)说明了这些方法。本章篇幅较长,涵盖了耦合流体流动的数学和物理基础以及最先进建模的计算方面。总体而言,它展示了SciML如何在特定数据范围和建模假设下,实现复杂耦合系统的快速、精确近似,同时相对于全阶模拟大幅降低计算成本。实时预测和不确定性量化等更广泛的能力仍然是活跃的研究方向,其可行性在很大程度上取决于具体问题。

英文摘要

This chapter reviews recent advances in Scientific Machine Learning (SciML) for modeling coupled fluid flow and transport phenomena governed by the incompressible Navier-Stokes and scalar transport equations. Such systems, found in applications like turbidity currents and thermal convection, feature strong nonlinear coupling and multiscale behavior that make high-fidelity simulations computationally expensive. To address this, the chapter surveys state-of-the-art SciML methods for building efficient surrogate models, including linear reduced-order techniques based on Singular Value Decomposition (such as Dynamic Mode Decomposition) and nonlinear neural network approaches like Physics-Informed Neural Networks (PINNs) and $β$-Variational Autoencoders ($β$-VAEs). It first covers the authors' work combining these models with High Performance Computing strategies, including Adaptive Mesh Refinement/Coarsening (AMR/C) and scientific floating-point data compression. It then presents two new contributions: surrogate modeling of turbidity currents via PINNs, and the extraction of disentangled nonlinear modes from thermal flows using $β$-VAEs. Governing equations and representative benchmarks, including lock-exchange flows and Rayleigh-Bénard convection, illustrate these methodologies. The chapter is intentionally long, covering both the mathematical and physical foundations of coupled fluid flow and the computational aspects of state-of-the-art modeling. Overall, it demonstrates how SciML enables fast, accurate approximations of complex coupled systems within the specific data regimes and modeling assumptions considered, while substantially reducing computational cost relative to full-order simulations. Broader capabilities such as real-time prediction and uncertainty quantification remain active research directions whose feasibility depends strongly on the problem at hand.

2606.20231 2026-06-19 cs.AI cond-mat.stat-mech cs.IT math-ph math.IT math.MP nlin.AO 新提交

Thermodynamic Measure of Intelligence

智能的热力学度量

Ishanu Chattopadhyay

发表机构 * Institute for Biomedical Informatics, University of Kentucky(肯塔基大学生物医学信息学研究所) Department of Computer Science, University of Kentucky(肯塔基大学计算机科学系)

AI总结 提出智能是稀有但有效未来的合法放大,通过递归自模拟实现,并给出热力学度量,证明该结构对高智能必要且近乎充分。

详情
AI中文摘要

智能可以被度量吗?我们提出智能可以定义为稀有但有效未来的合法放大:一个系统增加那些在被动动力学下不太可能但在领域约束下仍然可允许的结果的概率。我们从智能系统必须建模世界及其自身在其中的位置这一前提开始。由于系统是其建模世界的一部分,这自然导致递归自模拟:系统表示其自身动作是轨迹一部分的未来。我们的核心结果给出了一个必要性陈述和一个条件性近乎充分性陈述,将该架构与稀有-有效未来的合法放大的精确热力学度量联系起来:高稀有-有效提升是不可能的,除非内部模拟以高保真度识别稀有-有效未来;反之,当稀有-有效保真度高且模拟包含有效策略时,可实现的提升接近受驱动限制的最优值。因此,递归自模拟不仅是智能的一个合理特征,而且在所述假设下,对于高热力学智能是必要且近乎充分的。由此产生的框架使智能在通用尺度上可度量,从被动物质和反馈控制器、大型语言模型、作为文本生成器的人类到麦克斯韦妖式信息引擎。

英文摘要

Can intelligence be measured? We propose that intelligence can be defined as the lawful amplification of rare but valid futures: a system increases the probability of outcomes that would be unlikely under passive dynamics but remain admissible under the constraints of the domain. We start with the premise that an intelligent system must model the world and its own place within it. Because the system is part of the world it models, this leads naturally to recursive self-simulation: the system represents futures in which its own actions are part of the trajectory. Our central results give a necessity statement and a conditional near-sufficiency statement connecting this architecture to a precise thermodynamic measure of lawful amplification of rare-valid futures: high rare-valid lift is impossible unless the internal simulation identifies rare-valid futures with high fidelity; conversely, when rare-valid fidelity is high and the simulation contains an effective policy, the achievable lift approaches the actuation-limited optimum. Thus recursive self-simulation is not merely a plausible feature of intelligence but, under the stated assumptions, is necessary and nearly sufficient for high thermodynamic intelligence. The resulting framework makes intelligence measurable on a universal scale, from passive matter and feedback controllers, large language models, and humans as text generators to Maxwell-demon-like information engines.

2606.19378 2026-06-19 cs.LG cond-mat.mtrl-sci 新提交

A Hybrid GNN-FEM Framework for Phase-Field Fracture Simulation. Physics-Preserving Hybridization for Generalizable Surrogate Modeling

一种用于相场断裂模拟的混合GNN-FEM框架:面向通用代理模型的物理保持混合方法

Hyeonbin Moon, Yongjin Choi, Seunghwa Ryu

发表机构 * KAIST(韩国科学技术院)

AI总结 提出混合GNN-FEM框架,用图神经网络替代相场更新步骤,保留FEM位移求解器,通过无量纲特征设计和物理信息损失实现跨几何、载荷、材料和离散化的通用断裂模拟,降低计算成本并保持精度。

Comments 46 pages

详情
AI中文摘要

科学机器学习(SciML)已成为加速复杂物理系统模拟的一种有前景的方法,但对于非线性、历史依赖问题实现物理一致且可泛化的预测仍然是一个核心挑战。在本研究中,我们提出了一种混合GNN-FEM框架,用于高效且可泛化的相场断裂建模。虽然相场方法为模拟复杂裂纹演化提供了稳健的变分框架,但其高计算成本限制了实际应用,因为需要在增量有限元过程中求解耦合、非线性和历史依赖的系统。为应对这一挑战,我们将图神经网络代理集成到传统的交错方案中,在每个载荷增量下替代相场更新,同时保留基于FEM的位移求解器以强制执行力学平衡和边界条件。通过保留增量求解结构,该框架与历史依赖的断裂演化保持一致,而无需代理近似整个解轨迹。这种选择性代理策略强调识别物理上有意义且增量结构化的学习目标,而非依赖暴力数据生成来学习整个断裂过程。所提出的框架通过无量纲特征设计、基于网格域的图公式以及源自控制相场方程的物理信息损失,实现了跨不同几何、载荷条件、材料属性和离散化的强泛化能力。数值实验表明,与传统FEM相比,该混合方法在保持精度的同时降低了计算成本,并在多种问题设置下展现出稳健的预测性能。

英文摘要

Scientific machine learning (SciML) has emerged as a promising approach for accelerating simulations of complex physical systems, yet achieving physically consistent and generalizable predictions for nonlinear, history-dependent problems remains a central challenge. In this study, we propose a hybrid GNN--FEM framework for efficient and generalizable phase-field fracture modeling. While phase-field approaches provide a robust variational framework for simulating complex crack evolution, their high computational cost limits practical applications because they require solving coupled, nonlinear, and history-dependent systems within an incremental finite element procedure. To address this challenge, a graph neural network surrogate is integrated into the conventional staggered scheme, replacing the phase-field update at each load increment while retaining the FEM-based displacement solver to enforce mechanical equilibrium and boundary conditions. By preserving the incremental solution structure, the framework remains consistent with history-dependent fracture evolution without requiring the surrogate to approximate the full solution trajectory. This selective surrogate strategy emphasizes the identification of a physically meaningful and incrementally structured learning target, rather than relying on brute-force data generation to learn the full fracture process. The proposed framework achieves strong generalization across varying geometries, loading conditions, material properties, and discretizations through dimensionless feature design, a graph-based formulation on mesh-based domains, and a physics-informed loss derived from the governing phase-field equation. Numerical experiments demonstrate that the hybrid approach reduces computational cost while maintaining accuracy compared with conventional FEM, and exhibits robust predictive performance across diverse problem settings.

2606.19375 2026-06-19 cs.LG cond-mat.mtrl-sci 新提交

Physics-Informed Discovery of Yield Functions in Plasticity via Convex Neural Representations

基于凸神经表示的塑性屈服函数物理信息发现

Hyeonbin Moon, Donghyuk Cho, Jecheon Yu, Jeong Whan Yoon, Seunghwa Ryu

发表机构 * KAIST(韩国科学技术院)

AI总结 提出一种物理信息框架,从全场位移和反力数据中自动发现各向异性屈服函数,无需应力观测或预设参数形式,采用凸神经网络表示并嵌入弹塑性应力积分中训练。

Comments 39 pages

详情
AI中文摘要

识别各向异性屈服函数仍然具有挑战性,因为屈服在全场力学测量中无法直接观测,方向标定可能需要多个加载方向,且选择合适的解析形式并非易事。本研究提出一种物理信息框架,用于从全场位移数据和反力数据中发现屈服函数,无需应力观测、塑性应变测量、直接屈服面数据或预设的参数化屈服函数。该框架将屈服函数识别为弹塑性应力积分中受力学约束的本构组成部分,而非通过直接的应力空间监督。屈服函数由凸神经网络表示,该网络强制执行凸性和一次正齐次性,同时施加假定的拉压对称性,并通过可微应力更新和跨多个加载工况的物理信息力平衡损失来训练该神经屈服函数。使用von Mises、Hill 1948和Yld2000-2d屈服函数的有限元基准研究验证了所提框架,评估了屈服轮廓一致性、位移噪声敏感性、通过塑性活跃应力状态的可识别性、认知不确定性和多项式代理部署。本研究提供了一条受力学约束的路径,用于从位移和力数据中发现各向异性屈服函数,同时将识别出的组件保留在弹塑性应力积分的结构内。

英文摘要

Identifying anisotropic yield functions remains challenging since yielding is not directly observed in full-field mechanical measurements, directional calibration can require many loading directions, and selecting an appropriate analytical form is nontrivial. This study proposes a physics-informed framework for discovering yield functions from full-field displacement data and reaction force data, without stress observations, plastic strain measurements, direct yield surface data, or a prescribed parametric yield function. The framework identifies the yield function as a mechanically constrained constitutive component inside elastoplastic stress integration, rather than through direct stress-space supervision. The yield function is represented by a convex neural network that enforces convexity and positive homogeneity of degree one while imposing the assumed tension-compression symmetry, and this neural yield function is trained with a differentiable stress update and a physics-informed force equilibrium loss across multiple loading cases. The proposed framework is validated using finite element (FE) benchmark studies with von Mises, Hill 1948, and Yld2000-2d yield functions, assessing yield contour agreement, displacement-noise sensitivity, identifiability through plastically active stress states, epistemic uncertainty, and polynomial-surrogate deployment. This study provides a mechanics-constrained pathway for discovering anisotropic yield functions from displacement and force data while keeping the identified component within the structure of elastoplastic stress integration.

2606.19245 2026-06-19 cs.AI cs.LG 新提交

TxBench-PP: Analyzing AI Agent Performance on Small-Molecule Preclinical Pharmacology

TxBench-PP:分析AI代理在小分子临床前药理学中的表现

Hannah Le, Ramesh Ramasamy, Alex Urrutia, Mahsa Yazdani, Tim Proctor, Kenny Workman

发表机构 * LatchBio

AI总结 提出TxBench-PP基准,用于评估AI代理从真实实验数据中恢复临床前药理学结论的能力,测试显示最强配置Claude Opus 4.8 / Pi仅通过59.3%的端点尝试。

详情
AI中文摘要

人工智能(AI)代理有望通过压缩解释和决策循环来加速药物发现,但实际部署需要基于现实程序决策的可信评估。我们引入了TherapeuticsBench临床前药理学(TxBench-PP),这是一个针对小分子临床前药理学的可验证基准,也是更广泛的TherapeuticsBench在药物发现阶段和治疗模式中的首个聚焦切片。TxBench-PP测试代理是否能够从真实实验数据中恢复准确的结论,而非从文献中记忆的事实。该基准包含100个评估,按程序阶段、实验类型和任务结构索引,涵盖作用机制(MoA)和药效学(PD)推理、化合物-靶点结合、因果靶点验证、可开发性与安全性以及转化疗效。代理接收现实的工作流程快照,在编码环境中检查文件,并返回确定性评分的结构化答案。在16个模型-工具配置(包括11个模型和4,800条轨迹)中,没有系统能够可靠地恢复临床前药理学决策。最强配置Claude Opus 4.8 / Pi通过了59.3%的端点尝试(178/300;95% CI, 51.1-67.6),其次是GPT-5.5 / Pi,为55.3%(166/300;47.0-63.6)。

英文摘要

Artificial intelligence (AI) agents promise to accelerate drug discovery by compressing interpretation and decision-making loops, but practical deployment requires trusted evaluation on realistic program decisions. We introduce TherapeuticsBench Preclinical Pharmacology (TxBench-PP), a verifiable benchmark for small-molecule preclinical pharmacology and the first focused slice of a broader TherapeuticsBench effort across drug-discovery stages and therapeutic modalities. TxBench-PP tests whether agents can recover accurate conclusions from real-world assay data rather than memorized facts from literature. The benchmark contains 100 evaluations indexed by program stage, assay type, and task structure, spanning mechanism-of-action (MoA) and pharmacodynamic (PD) reasoning, compound-target engagement, causal target validation, developability and safety, and translational efficacy. Agents receive realistic workflow snapshots, inspect files in a coding environment, and return structured answers graded deterministically. Across 16 model-harness configurations, comprising 11 models and 4,800 trajectories, no system reliably recovered preclinical pharmacology decisions. The strongest configuration, Claude Opus 4.8 / Pi, passed 59.3\% of endpoint attempts (178/300; 95\% CI, 51.1-67.6), followed by GPT-5.5 / Pi at 55.3\% (166/300; 47.0-63.6).

2606.19209 2026-06-19 cs.SD 新提交

FineCombo-TTS: Collaborative and Precise Controllable Speech Synthesis Using Text Descriptions and Reference Speech

FineCombo-TTS: 使用文本描述和参考语音的协作式精确可控语音合成

Shuoyi Zhou, Yixuan Zhou, Peiji Yang, Yifan Hu, Yicheng Zhong, Zhisheng Wang, Zhiyong Wu

发表机构 * Shenzhen International Graduate School, Tsinghua University(清华大学深圳国际研究生院) Inner Mongolia University(内蒙古大学) Tencent(腾讯)

AI总结 提出FineCombo-TTS统一框架,通过条件流匹配的语音方差预测器实现基于文本描述的细粒度参考到目标变换,实现灵活精确的声学属性控制。

Comments Accepted by Interspeech 2026

详情
AI中文摘要

可控文本到语音(TTS)已成为一个关键研究焦点。然而,基于参考语音或文本描述的方法缺乏灵活性和精确控制,最近的联合方法仍然松散耦合,语音建模音色而文本控制全局风格。我们提出FineCombo-TTS,一个基于参考语音并由文本描述引导的语音合成统一框架,能够对声学属性进行灵活精确的控制。不同于显式属性解耦,我们学习统一的声学表示,并引入基于条件流匹配(CFM)的语音方差预测器,以建模由文本描述引导的细粒度参考到目标变换。为了支持相对属性控制,我们构建了FineEdit,一个结构化的配对数据集,显式编码源到目标的属性变化。实验表明,我们的方法实现了灵活、精确且富有表现力的可控TTS。

英文摘要

Controllable text-to-speech (TTS) has become a key research focus. However, methods based on either reference speech or text descriptions lack flexibility and precise control, and recent joint approaches remain loosely coupled, with speech modeling timbre and text controlling global style. We propose FineCombo-TTS, a unified framework for speech synthesis grounded in reference speech and guided by text descriptions, enabling flexible and precise control over acoustic attributes. Instead of explicit attribute disentanglement, we learn a unified acoustic representation and introduce a Conditional Flow Matching (CFM)-based Speech Variance Predictor to model fine-grained reference-to-target transformations guided by text descriptions. To support relative attribute control, we construct FineEdit, a structured paired dataset that explicitly encodes source-to-target attribute variations. Experiments demonstrate that our approach achieves flexible, precise, and expressive controllable TTS.

2606.19186 2026-06-19 cs.RO cs.LG 新提交

Learning to Annotate Delayed and False AEB Events: A Practical System for Extreme Class Imbalance and Asymmetric Label Noise

学习标注延迟和误报AEB事件:针对极端类别不平衡和非对称标签噪声的实用系统

Mengxiang Hao, Xin Jiang, Xinghao Huang, Wenliang Su, Zhiteng Wang, Junjie Rao, Xiaotian Yang, Wei Liao, Chengyu Han, Gen Liang, Yulun Song, Zhitao Xu, Xianpeng Lang

发表机构 * Li Auto

AI总结 提出首个自动化AEB标注框架,通过特定数据增强和噪声抑制技术,解决极端类别不平衡和非对称标签噪声问题,将延迟/误报触发召回率提升80%,人工工作量减少50%。

Comments 8 pages, 5 figures, accepted by IEEE International Conference on Robotics and Automation (ICRA)

Journal ref 2026 IEEE International Conference on Robotics and Automation (ICRA)

详情
AI中文摘要

自主紧急制动(AEB)优化依赖于准确标注的真实世界触发事件,特别是揭示系统缺陷的罕见但关键的延迟和误报AEB触发事件。然而,这些少数样本在每天数千次触发事件中占比不到5%,使得大规模人工标注成本过高。我们提出了首个自动化AEB标注框架来解决这一问题。在开发过程中,我们识别出两个严重损害延迟/误报触发标注准确性的基本挑战:(1)极端类别不平衡,其中延迟/误报触发被真实触发淹没;(2)非对称标签噪声,其中误标注的多数样本(真实触发)抑制了少数样本(延迟/误报触发)的学习。为克服这些挑战,我们提出两项关键创新:(1)特定数据增强,通过操纵焦点目标属性、移植自车动态和掩蔽非焦点代理来合成逼真样本;(2)噪声抑制,使用稳定硬度估计和探针引导的自适应阈值来清理误标注的真实触发样本。关键的是,我们将模型部署为具有全栈架构的实用标注系统,从每天数千个AEB事件中高效识别关键的延迟/误报触发。生产结果表明,延迟/误报触发的召回率提高了80%,人工工作量减少了50%。除了直接收益,该系统通过积累高质量标注实现持续自我改进,为车载AEB系统优化奠定了必要的数据基础。

英文摘要

Autonomous Emergency Braking (AEB) optimization relies on accurately annotated real-world trigger events, particularly rare but critical delayed and false AEB triggers that expose system deficiencies. However, these minority samples comprise less than 5% of thousands of daily triggers, making manual annotation prohibitively expensive at scale. We present the first automated AEB annotation framework to address this problem. During development, we identified two fundamental challenges that severely impair delayed/false trigger annotation accuracy: (1) Extreme class imbalance where delayed/false triggers are overwhelmed by true triggers; (2) Asymmetric label noise where mislabeled majority samples (true triggers) suppress minority samples (delayed/false triggers) learning. To overcome these challenges, we propose two key innovations: (1) Specific data augmentation that synthesizes realistic samples by manipulating focal target attributes, transplanting ego-vehicle dynamics, and masking non-focal agents; (2) noise suppression using stable hardness estimation and probe-guided adaptive threshold to clean mislabeled true trigger samples. Crucially, we deploy our model as a practical annotation system with full-stack architecture, efficiently identifying critical delayed/false triggers from thousands of daily AEB events. Production results demonstrate 80% improvement in recall of delayed/false triggers and 50% reduction in manual workload. Beyond immediate gains, the system enables continuous self-improvement through accumulated high-quality annotations, establishing a necessary data foundation for on-vehicle AEB system optimization

2606.19031 2026-06-19 cs.RO 新提交

Congestion-Aware Robot Tour Planning in Crowded Environments

拥挤环境中的拥塞感知机器人巡视规划

Stefano Bernagozzi, Charlie Street, Masoumeh Mansouri, Lorenzo Natale

发表机构 * Istituto Italiano di Tecnologia(意大利理工学院) Università di Genova(热那亚大学) University of Birmingham(伯明翰大学)

AI总结 提出一种基于概率的巡视规划器,通过学习人流预测模型并在线构建马尔可夫决策过程,在拥挤环境中高效规划机器人路径,减少拥塞影响。

Comments Accepted to IEEE IROS 2026

Journal ref IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2026

详情
AI中文摘要

自主移动服务机器人通常需要完成在环境中遍历一组位置的巡视任务。例如,引导人们穿过购物中心、在配送中心递送包裹或在博物馆提供导览。然而,在拥挤环境中,人群的存在可能对机器人性能产生负面影响。例如,人类会触发机器人的碰撞避免操作,从而降低机器人速度。人群随机移动且随时间变化。本文提出一种针对拥挤环境的概率巡视规划器,该规划器明确考虑人类拥塞。我们学习圆形线性流场(CLiFF)地图,该地图根据初始观测预测人类轨迹。然后,我们利用这些预测在线构建并求解马尔可夫决策过程,从而高效地将机器人引导通过环境。我们的方法具有足够的可扩展性,能够在观察到新人群时重新规划。我们在购物中心的真实人群数据集上评估了该方法。

英文摘要

Autonomous mobile service robots are often required to complete tours that require navigating through a set of locations in an environment. Example domains include guiding people through a shopping mall, delivering packages in a fulfilment centre, or giving guided tours in a museum. However, in crowded environments, the presence of people may negatively impact robot performance. For example, humans will activate robot collision avoidance manoeuvres that slow the robot down. Crowds move stochastically and vary throughout the day. In this paper we present a probabilistic tour planner for crowded environments which explicitly reasons over human congestion. We learn circular linear flow field (CLiFF) maps which predict human trajectories given an initial observation. We then use these predictions to build and solve a Markov decision process online which efficiently routes the robot through the environment. Our approach is scalable enough to re-plan as new people are observed. We evaluate our approach on a real-world crowd dataset in a shopping mall.

2606.18970 2026-06-19 cs.LG cs.AI cs.CV 新提交

A Controlled Benchmark of Quantum-Latent GAN Augmentation for Brain MRI

脑MRI的量子潜GAN增强的受控基准测试

Syed Mujtaba Haider, Silvia Figini

发表机构 * Department of Mathematics(数学系) Department of Political and Social Sciences(政治与社会科学系)

AI总结 通过受控基准测试,比较量子与经典生成器在脑MRI数据增强中的性能,发现两者均未显著优于仅用真实数据训练,且量子生成器无额外优势。

详情
AI中文摘要

医学图像分类常受限于有限的标注数据,因此生成式增强被提出;最近,量子生成模型被用于此目的,并经常报告准确率提升。然而,这些声称通常基于单次训练运行,未匹配量子与经典生成器的参数预算,也未表征任何收益出现的数据范围。我们提出了一个受控基准测试,隔离量子生成器对脑MRI增强的贡献。图像被编码到KL正则化的潜在空间中,在该空间中,使用变分量子生成器或参数数量几乎相同的经典生成器(1648 vs. 1632)训练带有梯度惩罚的条件Wasserstein GAN。合成样本被解码并用于增强预训练分类器,覆盖从5%到100%的标注数据比例,通过八个随机种子进行配对显著性检验(多重比较校正)以及集内多样性和潜在分布分析。在所有比例下,没有增强变体显著优于仅用真实数据训练,且量子与经典生成器在统计上无法区分。任何低数据优势表现为正则化而非忠实的数据扩展:合成样本分布外移,并且在数据稀缺时严重模式崩溃,而量子生成器并不比经典生成器更多样化。我们发布该协议作为医学成像中量子生成增强严格评估的测试平台。

英文摘要

Medical image classification is often constrained by limited labeled data, motivating generative augmentation; recently, quantum generative models have been proposed for this purpose, frequently reporting accuracy gains. However, such claims are typically based on single training runs, do not match the parameter budgets of the quantum and classical generators, and do not characterize the data regime in which any benefit appears. We present a controlled benchmark that isolates the contribution of a quantum generator to brain-MRI augmentation. Images are encoded into a KL-regularized latent space in which a conditional Wasserstein GAN with gradient penalty is trained using either a variational quantum generator or a classical generator of near-identical parameter count (1648 vs. 1632). Synthetic samples are decoded and used to augment a pretrained classifier across labeled data fractions from 5% to 100%, evaluated over eight random seeds with paired significance testing (with multiple-comparison correction) and with intraset diversity and latent-distribution analyses. Across all fractions, no augmentation variant significantly outperforms real-data-only training, and the quantum and classical generators are statistically indistinguishable. Any low-data benefit behaves as regularization rather than faithful data expansion:synthetic samples are off distribution and severely mode collapsed precisely where data is scarce, and the quantum generator is no more diverse thanits classical counterpart. We release the protocol as a testbed for rigorous evaluation of quantum generative augmentation in medical imaging.

2606.18960 2026-06-19 cs.CV cs.RO 新提交

Mem-World: Memory-Augmented Action-Conditioned World Models for Persistent Robot Manipulation

Mem-World:用于持久机器人操作的内存增强动作条件世界模型

Zirui Zheng, Jiaqian Yu, Xiongfeng Peng, jun shi, Mingyi Li, Chao Zhang, Weiming Li, Dong Wang, Huchuan Lu, Xu Jia

发表机构 * Dalian University of Technology(大连理工大学) Samsung R&D Institute China-Beijing (SRCB)(三星中国北京研究院)

AI总结 提出Mem-World,通过4D腕部视角曲面元索引内存W-VMem,解决操作中因遮挡和运动导致的场景遗忘问题,实现持久世界建模,提升策略评估与改进效果。

详情
AI中文摘要

动作条件世界模型已成为机器人学习的一种有前景的范式,通过生成动作一致的视频推演,为昂贵的真实世界实验提供了可扩展的替代方案。然而,在操作中持久世界建模仍然具有挑战性:频繁的末端执行器遮挡和快速的腕部相机运动使得当前观测不足以预测未来视图,导致模型遗忘或幻觉先前帧中看到的场景细节。现有的内存检索策略在动态操作场景中往往无法识别信息丰富的历史。为解决这一限制,我们提出了Mem-World,一种内存增强的多视图动作条件世界模型。其核心是W-VMem,一种4D腕部视图为中心的曲面元索引内存,将历史观测锚定到随时间演变的表面元素上。通过显式建模场景元素被观测的时间和位置,W-VMem能够根据未来动作实现几何感知的相关历史帧检索。在生成过程中,通过基于曲面元的渲染和评分选择相关历史帧,为预测提供信息丰富且非冗余的上下文。大量实验表明,Mem-World在复杂操作场景中生成持久推演,比Ctrl-World实现更可靠的策略评估,将皮尔逊相关系数提高14.5%,并通过合成数据生成支持有效的策略改进,在长时域任务中将成功率从58%提升到72%。

英文摘要

Action-conditioned world models have emerged as a promising paradigm for robot learning, offering a scalable alternative to costly real-world experimentation by generating action-consistent video rollouts. However, persistent world modeling remains challenging in manipulation: frequent end-effector occlusions and rapid wrist-camera motion make the current observation insufficient for predicting future views, causing models to forget or hallucinate scene details seen in earlier frames. Existing memory retrieval strategies often fail to identify informative history in dynamic manipulation scenarios. To address this limitation, we propose Mem-World, a memory-augmented multi-view action-conditioned world model. At its core, we present W-VMem, a 4D wrist-view-centered surfel-indexed memory that anchors historical observations to temporally evolving surface elements. By explicitly modeling when and where scene elements are observed, W-VMem enables geometry-aware retrieval of relevant history frames conditioned on future actions. During generation, relevant history frames are selected via surfel-based rendering and scoring, providing informative and non-redundant context for prediction. Extensive experiments show that Mem-World generates persistent rollouts in complex manipulation scenarios, enables more reliable policy evaluation than Ctrl-World, improving the Pearson correlation with real-world performance by 14.5\%, and supports effective policy improvement through synthetic data generation, increasing success rates from 58\% to 72\% on long-horizon tasks.

2606.18950 2026-06-19 cs.AI 新提交

RTSGameBench: An RTS Benchmark for Strategic Reasoning by Vision-Language Models

RTSGameBench: 视觉语言模型战略推理的RTS基准

San Kim, Daechul Ahn, Reokyoung Kim, Hyeonbeom Choi, Seungyeon Jwa, Jonghyun Choi

发表机构 * Seoul National University(首尔国立大学)

AI总结 提出RTSGameBench,基于Beyond All Reason游戏,通过多样化对战、迷你游戏诊断和自进化生成框架,评估视觉语言模型在实时策略游戏中的战略推理能力。

Comments First two authors contributed equally

详情
AI中文摘要

现代视觉语言模型(VLM)在竞争和合作环境中的不确定性下,往往难以进行战略推理,即预测和影响其他智能体的行为。实时策略(RTS)游戏可以作为诊断这一局限性的自然测试平台,因为它们要求与盟友协调、适应对手策略,并在部分可观测性下进行长期规划。然而,现有的RTS基准评估范围有限,缺乏系统的能力诊断,并且局限于预设计的场景覆盖。为了解决这些限制,我们提出了RTSGameBench,它建立在Beyond All Reason之上,这是一款大规模RTS游戏,其扩展战场要求比现有测试平台更广泛的策略多样性。该基准通过多种对战结构提供评估,通过迷你游戏进行诊断性评估,每个迷你游戏针对单个战略能力,并通过自进化生成框架实现可扩展的覆盖,该框架将自由形式的查询转化为新的迷你游戏,并在连续循环中改进。此外,为了让VLM在大规模RTS游戏中运行,我们提供了RTSGameAgent,它通过具有智能体记忆的有限状态机(FSM)管理单位。我们通过实验验证,多个最先进的VLM在对战需要更紧密协调、多智能体协调以及任务规模增加时表现不佳。

英文摘要

Modern Vision-Language Models (VLMs) often struggle with strategic reasoning, i.e., anticipating and influencing other agents' actions, under uncertainty in competitive and cooperative settings. Real-time strategy (RTS) games can be a natural testbed for diagnosing this limitation, as they demand coordination with allies, adaptation to opponents' strategy, and long-horizon planning under partial observability. However, existing RTS benchmarks offer limited evaluation scope, lack systematic competency diagnosis, and remain fixed in the pre-designed scenario coverage. To address these limitations, we present RTSGameBench, which is built on Beyond All Reason, a large-scale RTS game with an expanded battlefield that demands broader strategy diversity than the existing testbeds. The proposed benchmark provides evaluations through diverse gameplay across various matchup structures, diagnostic assessment via mini-games, each targeting an individual strategic competency, and extensible coverage via a self-evolving generation framework that converts free-form queries into new mini-games, improving over successive cycles. Additionally, for VLMs to operate in large-scale RTS games, we provide RTSGameAgent that manages units by an FSM with agentic memory. We empirically validate that multiple state-of-the-art VLMs do not perform well when matchups demand tighter coordination, multiagent coordination and when task scale increases.

2606.18933 2026-06-19 cs.LG cs.IR stat.ME 新提交

Zero-Shot Active Feature Acquisition via LLM-Elicitation

基于LLM启发式的零样本主动特征获取

Binyamin Perets, Natalie Mendelson, Shiran Vainberg, Yehuda Chowers, Shai Shen-Orr, Shie Mannor

发表机构 * Faculty of EE, Technion(技术学院电子工程系) Faculty of Medicine, Technion(技术学院医学院) CytoReason NVIDIA

AI总结 提出通过LLM启发式获取马尔可夫随机场充分统计量的零样本主动特征获取框架,解决数据标注不足问题,在IBD患者诊断中优于现有方法。

详情
AI中文摘要

主动特征获取(AFA)顺序选择要观察的特征以达成分类或排序决策。其主要局限性在于依赖大量标注数据来拟合指导获取的概率模型。大型语言模型(LLM)提供无监督的领域知识,但作为序列规划者表现不佳。要求其同时知晓和决策会混淆最好分开的能力。这里,我们通过严格的启发式方法开发了一个零样本AFA框架:仅要求LLM返回其可被信任返回的内容,即马尔可夫随机场(MRF)的充分统计量——一元偏差和成对协变。我们将该框架应用于两个场景:二分类和top-$k$识别。实践中,LLM可靠地仅返回判别性统计量,即区分类别而非孤立每个类别的统计量,这阻碍了经典AFA。我们应用最大熵闭包来解决这种规范模糊性。我们在炎症性肠病(IBD)患者队列上进行评估,这是一个活跃的临床环境,其中诊断模糊性和患者异质性阻碍了稳定的治疗策略。我们的框架在真实标签和其自身提取的信念上均优于LLM。在最关键的地方,即最困难的患者上,我们的top-$k$获取策略显著优于所有现有方法。

英文摘要

Active feature acquisition (AFA) sequentially selects which features to observe to reach a classification or ranking decision. Its central limitation is reliance on large amount of labeled data to fit probabilistic models guiding acquisition. Large language models (LLMs) supply unsupervised domain knowledge, but are poor sequential planners. Asking one to both know and decide conflates capabilities best kept separate. Here, we develop a framework for zero-shot AFA through disciplined elicitation: asking the LLM only for what it can be trusted to return, the unary deviations and pairwise co-variations that are the sufficient statistics of a Markov random field (MRF). We apply our framework to two settings: binary classification and top-$k$ identification. In practice, the LLM reliably returns only discriminative statistics, what distinguishes the classes rather than each class in isolation, which precludes classical AFA. We apply a maximum-entropy closure that resolves this gauge ambiguity. We evaluate on a cohort of Inflammatory Bowel Disease (IBD) patients, an active clinical setting where diagnostic ambiguity and patient heterogeneity obstruct stable treatment strategies. Our framework outperforms the LLM both on real labels and on its own extracted beliefs. Where it matters most, on the hardest patients, our top-$k$ acquisition policy markedly outperforms all existing methods.

2606.18812 2026-06-19 cs.LG cs.AI 新提交

Reinforcement Learning Foundation Models Should Already Be A Thing

强化学习基础模型本应已经存在

Abdelrahman Zighem, Jill-Jênn Vie

发表机构 * École normale supérieure de Paris, PSL University, Paris, France(巴黎高等师范学院,PSL大学,法国巴黎) Soda team, Inria Saclay, Palaiseau, France(Soda团队,法国国家信息与自动化研究所萨克雷中心,法国帕莱索)

AI总结 提出通过合成MDP构建强化学习基础模型,利用固定大小的充分统计量使注意力架构适用,在线和离线实验均优于传统算法。

详情
AI中文摘要

语言和视觉的基础模型由互联网规模的数据驱动,而结构化领域(表格预测、时间序列预测、图学习、强化学习)则不然。替代方案是合成数据,它将负担从收集转移到先验设计。这种先验已经存在于许多结构化任务中:TabPFN及其后续工作通过一个在合成贝叶斯先验上预训练的Transformer解决表格分类问题。我们提出两点。\textbf{首先},强化学习是明显的空白:采样一个合成MDP与采样一个合成表格数据集一样可行,然而没有上下文强化学习工作将先验设计作为主要目标。\textbf{其次},MDP允许一个固定大小的充分统计量,独立于观察到的回合且形状为表格形式,这使得它们直接适用于用于表格基础模型的基于注意力的架构,只需将策略头替换监督目标。这些共同定义了强化学习基础模型的议程。作为概念验证,我们完全在合成MDP上训练一个模型,并表明,无需任务特定的调优,它就能在上下文中解决留出的表格基准,包括在线和离线:在线时,使用比UCB-VI和表格Q-learning少得多的回合;离线时,与VI-LCB竞争。

英文摘要

Foundation models for language and vision are powered by internet-scale data, while structured domains such as tabular prediction are powered by synthetic data. This substitute shifts the challenge from collection to prior design. Such priors already exist for many structured tasks: TabPFN and its successors solve tabular classification with a transformer pretrained on a synthetic Bayesian prior. We make two points. \textbf{First}, reinforcement learning is the conspicuous gap: sampling a synthetic MDP is as feasible as sampling a synthetic tabular dataset, yet no in-context RL work treats prior design as a primary objective. \textbf{Second}, MDPs admit a fixed-size sufficient statistic, independent of the episodes observed and tabular in shape, which makes them directly amenable to the attention-based architectures used for tabular foundation models, with a policy head replacing the supervised target. Together these define the agenda for an RL foundation model. As a proof of concept, we train a Graph Attention Network entirely on synthetic MDPs and show that, with no task-specific tuning, it solves held-out tabular benchmarks in context, both online and offline: online, in far fewer episodes than UCB-VI and tabular Q-learning, and offline, competitively with VI-LCB.

2606.18613 2026-06-19 cs.CL cs.AI 新提交

Are LLMs Ready to Assist Physicians? PhysAssistBench for Interactive Doctor-Patient-EHR Assistance

LLMs 是否已准备好辅助医生?PhysAssistBench:交互式医患-电子病历辅助基准

Tianming Du, Peijie Yu, Sihan Shang, Danli Shi, My Linh Nguyen, Shengbo Gao, Guangyuan Li, Yinghong Yu, Yan Jiang, Qianlong Zhao, Behzad Bozorgtabar, Shaoxiong Ji, Jiazhen Pan, Daniel Rueckert, Jiancheng Yang

发表机构 * Aalto University(阿尔托大学) Tencent(腾讯) Harbin Institute of Technology, Shenzhen(哈尔滨工业大学(深圳)) Hong Kong Polytechnic University(香港理工大学) Aarhus University(奥胡斯大学) Technical University of Munich(慕尼黑工业大学)

AI总结 提出PhysAssistBench基准,通过构建交互式患者代理评估LLM在医患-EHR交互中的协调能力,发现当前模型不可靠,瓶颈在于多维度协调而非单一能力。

Comments 34 pages with 8 figures

详情
AI中文摘要

医疗LLM最合理的近期角色是辅助而非替代医生,但当前的评估通常测试孤立能力:临床知识、EHR系统交互或患者沟通。而医生辅助需要在同一交互中协调这些能力,其中医生提出不明确的请求,患者模糊描述症状,EHR系统要求精确的工具使用。我们引入PhysAssistBench,一个用于交互式医患-EHR辅助的基准。基于真实的MIMIC-IV病例,PhysAssistBench使用可扩展的流水线构建交互式、记录驱动的患者代理,将静态EHR记录转化为多轮临床场景,同时保持临床事实准确性。PhysAssistBench提供了一个精选的双语评估集,包含1,296个经过人工审查和医生验证的轮次。与领先LLM的实验表明,当前模型在此设置下仍不可靠,这暴露了临床LLM的关键瓶颈:可靠的辅助需要知识、沟通和系统之间的协调,而非任何单一能力的孤立提升。

英文摘要

The most plausible near-term role of medical LLMs is to assist rather than replace physicians, yet current evaluations often test isolated capabilities: clinical knowledge, EHR system interaction, or patient communication. Physician assistance instead requires coordinating these capabilities within the same interaction, where physicians issue underspecified requests, patients describe symptoms ambiguously, and EHR systems demand precise tool use. We introduce PhysAssistBench, a benchmark for interactive doctor-patient-EHR assistance. Built from real MIMIC-IV cases, PhysAssistBench uses a scalable pipeline to construct agentic patients: interactive, record-grounded agents that turn static EHR records into multi-turn clinical scenarios while preserving clinical factuality. PhysAssistBench provides a curated bilingual evaluation set of 1,296 manually reviewed and physician-validated turns. Experiments with leading LLMs show that current models remain unreliable in this setting, which exposes a key bottleneck for clinical LLMs: reliable assistance requires coordination across knowledge, communication, and systems, not isolated gains in any of them.

2606.18611 2026-06-19 cs.SD cs.AI cs.LG stat.ML 新提交

QC-GAN: A Parameter-Efficient Quaternion Conformer GAN for High-Fidelity Speech Enhancement

QC-GAN: 一种参数高效的四元数Conformer GAN用于高保真语音增强

Shogo Yamauchi, Hideaki Tamori, Makoto Sakai, Yosuke Yamano, Tohru Nitta

发表机构 * The Asahi Shimbun Company(朝日新闻社) Tokyo Woman's Christian University(东京女子基督教大学)

AI总结 提出参数高效的QC-GAN,结合四元数Conformer生成器和MetricGAN训练,通过汉密尔顿积共享权重减少参数量,在VoiceBank+DEMAND上以0.89M参数达到PESQ 3.48,性能媲美两倍大小模型。

Comments 10 pages, 6 figures and 5 tables. Accepted at Interspeech2026

详情
AI中文摘要

我们提出了一种参数高效的语音增强框架——四元数Conformer GAN(QC-GAN),它将四元数Conformer生成器与基于MetricGAN的训练相结合。汉密尔顿积通过结构化权重共享对幅度和相位进行编码,在减少层参数数量的同时保持其相互依赖性。采用度量学习判别器,通过优化近似感知评估分数来最大化感知质量。在VoiceBank+DEMAND数据集上,QC-GAN仅用0.89M参数就达到了3.48的语音质量感知评估(PESQ)分数,其性能与最先进模型相当,而参数量不到后者的一半。一个35K参数的变体实现了3.23的PESQ分数,以显著更少的参数超越了传统方法。在DNS-Challenge 3数据集上的评估进一步证实了其在真实世界条件下的泛化能力。

英文摘要

We propose a parameter-efficient speech enhancement framework, Quaternion Conformer GAN (QC-GAN), which combines a Quaternion Conformer generator with MetricGAN-based training. The Hamilton product encodes the magnitude and phase via structured weight sharing, reducing the number of layer parameters while preserving their interdependencies. A metric-learning discriminator was employed to maximize perceptual quality by optimizing the approximate perceptual evaluation scores. On the VoiceBank+DEMAND dataset, QC-GAN achieved a Perceptual Evaluation of Speech Quality (PESQ) score of 3.48 with only 0.89M parameters, delivering a performance comparable to state-of-the-art models at less than half their size. A 35K-parameter variant achieved a PESQ score of 3.23, surpassing conventional methods with significantly fewer parameters. Evaluation on the DNS-Challenge 3 dataset further confirmed generalization to real-world conditions.

2606.18485 2026-06-19 cs.SD cs.AI eess.AS 新提交

MagpieTTS-LF: Inference-Time Long-Form Speech Generation Without Training on Long-Form data

MagpieTTS-LF:无需长语音数据训练的推理时长生成长语音生成

Subhankar Ghosh, Jason Li, Paarth Neekhara, Shehzeen Hussain, Ryan Langman, Xuesong Yang, Roy Fejgin

发表机构 * NVIDIA Corporation(英伟达公司)

AI总结 提出MagpieTTS-LF推理时方法,通过软注意力先验、有状态推理和历史感知文本编码,在不重新训练模型的情况下实现连贯的长语音生成。

Journal ref Interspeech 2026

详情
AI中文摘要

神经文本到语音(TTS)系统在短语句上取得了显著质量,但长语音生成表现出韵律漂移、说话人不一致和句子边界伪影。现有方法要么压缩序列、增加上下文长度,要么简单拼接独立合成的片段。我们提出一种称为MagpieTTS-LF的推理时方法,使MagpieTTS能够在不重新训练模型的情况下生成连贯的长语音。我们的方法引入了三个关键创新:(1)软注意力先验,在保留过去和未来上下文的同时引导单调对齐;(2)有状态推理算法,跨句子块维护上下文,确保韵律连续性;(3)历史感知文本编码,利用过去文本进行语篇级韵律规划。在长文本上的实验表明,与其他基线相比,在长距离可懂度、韵律连贯性、说话人一致性和边界自然度方面有显著改进。

英文摘要

Neural Text-to-Speech (TTS) systems achieve remarkable quality on short utterances but long-form speech generation shows prosodic drift, speaker inconsistencies and sentence boundary artifacts. Existing approaches either compress sequences, increase context length or naively concatenate independently synthesized chunks. We present an inference-time approach called MagpieTTS-LF that enables MagpieTTS to produce coherent long-form speech without model retraining. Our method introduces three key innovations: (1) soft attention priors to guide monotonic alignment while preserving past and future context; (2) a stateful inference algorithm that maintains context across sentence chunks, ensuring prosodic continuity; (3) history-aware text encoding that uses past text for discourse-level prosodic planning. Experiments on long texts show significant improvements in long-range intelligibility, prosodic coherence, speaker consistency, and boundary naturalness compared to other baselines.

2606.18413 2026-06-19 cs.AI cs.HC 新提交

Searching for Synergy in Shared Workspace Human-AI Collaboration

在共享工作空间的人机协作中寻找协同效应

Nachiket Kotalwar, Rohini Das, Carolyn Rose

发表机构 * Carnegie Mellon University(卡内基梅隆大学)

AI总结 研究共享工作空间的人机团队协作,通过Collaborative Gym环境实验发现,缺乏协调结构时增加协作者会降低性能,而结合共享记忆和模拟人在环门控的脚手架可提升团队绩效。

Comments Accepted at ICML 2026 Workshop on Human-AI Co-Creativity

详情
AI中文摘要

自动化AI代理越来越强大,但许多科学和专业任务仍需要人类判断和情境专业知识。我们研究共享工作空间的人机团队,其中AI代理和人类协作者必须在提交最终答案前协调职责。使用Collaborative Gym环境和DiscoveryBench任务,我们考察何时添加模拟人类协作者能提升性能,以及何时过程损失将额外协作者变为协调开销。在1482个会话中,当团队缺乏协调贡献的结构时,添加相关协作者会降低性能。然后我们评估一种脚手架,它结合了共享群体记忆和模拟人在环(HITL)门控,其中选定动作需要指定模拟参与者的批准。这种脚手架在三人团队中最为明显,产生了更高的平均性能,具有更清晰的责任信号和更强的专业知识路由到团队动作。总体而言,人机团队如何协调和整合专业知识与他们可用的能力同样重要。

英文摘要

Automated AI agents are increasingly capable, yet many scientific and professional tasks require human judgment and contextual expertise. We study shared-workspace human-AI teams, where AI agents and human collaborators must coordinate responsibilities before submitting a final answer. Using the Collaborative Gym environment with DiscoveryBench tasks, we examine when adding simulated human collaborators improves performance and when process loss turns additional collaborators into coordination overhead. Across 1,482 sessions, adding relevant collaborators can lower performance when teams lack structure to coordinate their contributions. We then evaluate scaffolding that combines shared group memory with simulated human-in-the-loop (HITL) gates, where selected actions require approval from a designated simulated participant. This scaffolding yields higher mean performance, most clearly in three-person teams, with clearer responsibility signals and stronger routing of expertise to team actions. Overall, how human-AI teams coordinate and integrate expertise matters as much as the capability available to them.

2606.18249 2026-06-19 cs.CV 新提交

Unified Multimodal Autoregressive Modeling with Shared Context-Visual Tokenizer is Key to Unification

统一多模态自回归建模:共享上下文-视觉分词器是实现统一的关键

Wujian Peng, Lingchen Meng, Yuxuan Cai, Xianwei Zhuang, Yuhuan Yang, Rongyao Fang, Chenfei Wu, Junyang Lin, Zuxuan Wu, Shuai Bai

发表机构 * Institute of Trustworthy Embodied AI, Fudan University(可信具身AI研究院,复旦大学) Shanghai Innovation Institute(上海创新研究院) Qwen Team, Alibaba Inc.(通义实验室,阿里公司)

AI总结 提出UniAR框架,通过单一离散视觉分词器桥接视觉理解与生成,采用并行位预测和扩散解码,在图像生成和编辑上达到最优,同时保持多模态理解竞争力。

Comments ICML2026. Project page https://sharelab-sii.github.io/uniar-web

详情
AI中文摘要

统一多模态建模旨在将视觉理解和生成集成到单个系统中。然而,现有方法通常依赖两个不同的视觉分词器,这分割了表示空间并阻碍了真正的统一建模。我们提出UniAR,一个统一的自回归框架,其中单个离散视觉分词器作为理解和生成之间的关键桥梁,使得模型能够直接解释其自身生成的视觉标记而无需额外的重新编码,从而实现共享上下文。UniAR采用预训练的视觉编码器,结合多级特征融合和无查找的逐位量化方案,在保留高层语义和低层细节的同时,以最小代价扩展有效视觉词汇。在此基础上,统一自回归模型采用并行逐位预测来联合预测空间分组的多级视觉编码,大幅减少视觉序列长度并加速生成。最后,基于扩散的视觉解码器对离散视觉标记进行操作,以解码高保真图像。通过大规模预训练,随后进行监督微调和强化学习,UniAR在图像生成和图像编辑上达到了最先进的性能,同时在多模态理解基准上保持竞争力。项目页面可在此URL获取。

英文摘要

Unified Multimodal Modeling aims to integrate visual understanding and generation within a single system. However, existing approaches typically rely on two disparate visual tokenizers, which splits the representation space and hinders truly unified modeling. We propose UniAR, a unified autoregressive framework where a single discrete visual tokenizer serves as the key bridge between understanding and generation, enabling a shared context in which the model can directly interpret its own generated visual tokens without additional re-encoding. UniAR adapts a pretrained vision encoder with multi-level feature fusion and a lookup-free bitwise quantization scheme, preserving both high-level semantics and low-level details while scaling the effective visual vocabulary at minimal cost. Building on this, the unified autoregressive model adopts parallel-bitwise-prediction to jointly predict spatially grouped, multi-level visual codes, substantially reducing visual sequence length and accelerating generation. Finally, a diffusion-based visual decoder operates on discrete visual tokens to decode high-fidelity images. Through large-scale pre-training, followed by supervised fine-tuning and reinforcement learning, UniAR achieves state-of-the-art performance on image generation and image editing while remaining competitive on multimodal understanding benchmarks. The project page is available at https://sharelab-sii.github.io/uniar-web.

2606.18191 2026-06-19 cs.AI cs.MA 新提交

DRFLOW: A Deep Research Benchmark for Personalized Workflow Prediction

DRFLOW:用于个性化工作流预测的深度研究基准

Md Tawkat Islam Khondaker, Raymond Li, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan, Issam H. Laradji

发表机构 * ServiceNow AI Research(ServiceNow人工智能研究)

AI总结 提出DRFLOW基准,评估AI代理从异构源预测个性化工作流的能力,包含5领域100任务,并设计7个诊断指标,实验显示现有代理性能有限。

详情
AI中文摘要

深度研究(DR)系统越来越多地用于复杂信息寻求任务,但现有工作主要关注生成报告和摘要。相比之下,许多企业任务需要代理识别具体的工作流,即一系列行动步骤。例如,代理不应总结预算政策,而应能确定回答诸如“在固定预算下如何申请新员工?”这类问题所需的步骤。因此,我们引入DRFLOW,一个用于评估代理从异构源预测个性化工作流的基准。每个任务要求代理从分散来源中识别相关证据,然后使用这些证据预测用户任务的正确行动步骤序列。DRFLOW包含跨五个领域的100个任务,1246个参考工作流步骤,基于超过3900个来源。我们定义了七个诊断指标,涵盖事实依据、步骤恢复、结构排序、条件解决和个性化。我们进一步提出DRFLOW-Agent(DRFA),一个面向工作流的参考代理,用于预测个性化工作流。我们表明,尽管DRFA相比强基线代理有所改进(平均F1分数提升高达10.02%),但在这些工作流指标上仍有很大的改进空间,表明预测完整且正确的个性化工作流仍然是深度研究的一个挑战性前沿。

英文摘要

Deep research (DR) systems are increasingly used for complex information-seeking tasks, but existing works mainly focus on generating reports and summaries. In contrast, many enterprise tasks instead require an agent to identify concrete workflows which is a sequence of action-steps. For example, rather than summarizing budgeting policies, an agent should be able to determine the steps needed to answer a question such as: "How do I request new headcount given a fixed budget?". Therefore, we introduce DRFLOW, a benchmark for evaluating personalized workflows predicted by agents from heterogeneous sources. Each task requires the agent to identify relevant evidence from scattered sources, then use that evidence to predict the correct action-step sequence for the user's task. DRFLOW contains 100 tasks across five domains, with 1,246 reference workflow steps grounded in more than 3,900 sources. We define seven diagnostic metrics covering factual grounding, step recovery, structural ordering, condition resolution, and personalization. We further present DRFLOW-Agent (DRFA), a workflow-oriented reference agent to predict personalized workflow. We show that although DRFA improves over strong baseline agents (upto 10.02% average F1 score), there is substantial room for improvement remains across these workflow metrics, indicating that predicting complete and correct personalized workflows remains a challenging frontier for deep research.

2606.18112 2026-06-19 cs.RO cs.CV 新提交

Qwen-RobotNav Technical Report: A Scalable Navigation Model Designed for an Agentic Navigation System

Qwen-RobotNav 技术报告:为智能体导航系统设计的可扩展导航模型

Jiazhao Zhang, Gengze Zhou, Hale Yin, Yiyang Huang, Zixing Lei, Qihang Peng, Haoqi Yuan, Jie Zhang, Xudong Guo, Xiaoyue Chen, An Yang, Fei Huang, Zhibo Yang, Junyang Lin, Dayiheng Liu, Jingren Zhou, Zhuoyuan Yu, Jingyang Fan, Zhixuan Liang, Pei Lin, Ye Wang, Anzhe Chen, Kun Yan, Xiao Xu, Jiahao Li, Lulu Hu, Minying Zhang, Shurui Li, Wenhu Xiao, Shuai Bai, Xuancheng Ren, Chenxu Lv, Chenfei Wu, Xiong-Hui Chen

发表机构 * Qwen Team(通义实验室)

AI总结 提出 Qwen-RobotNav 可扩展导航模型,通过参数化接口支持多种任务模式和可调观测参数,在15.6M样本上训练,联合视觉语言数据防止行为坍缩,在多个导航基准上取得新最优结果,并展示零样本泛化能力。

详情
AI中文摘要

智能体导航系统需要一个基础导航模型,其观测策略可以在推理时从外部重新配置,因为指令跟随、目标搜索、目标跟踪和自动驾驶共享相同的感知规划主干,但对视觉流的消费方式有根本不同的要求。我们提出 Qwen-RobotNav,一个建立在 Qwen-RobotNav 上的可扩展导航模型,通过一个具有两个互补维度的参数化接口来解决这个问题:多个任务模式选择导航行为,以及可控的观测参数(例如,token 预算、每个摄像头的权重)控制视觉历史的编码方式。通过训练时对所有参数进行随机化,Qwen-RobotNav 对任何推理时配置都具有鲁棒性,无需对 Qwen-RobotNav 主干进行任何架构修改。我们在15.6M样本上训练 Qwen-RobotNav;与视觉语言数据联合训练防止了在仅轨迹训练中观察到的反应性动作序列映射器的坍缩。参数化接口也使 Qwen-RobotNav 成为智能体系统的自然构建块:对于长时域场景,上层规划器将目标分解为子任务,并在情节中动态切换 Qwen-RobotNav 的任务模式和上下文策略,通过重复调用同一模型组合出复杂行为。大量实验表明,Qwen-RobotNav 在主要导航基准上取得了新的最优结果。该模型从2B到8B参数展现出良好的扩展性,联合多任务训练发展出一个跨任务族迁移的共享空间规划基板,并在多样环境中对真实世界机器人展现出强大的零样本泛化能力。

英文摘要

Agentic navigation systems require a base navigation model whose observation strategy can be externally reconfigured at inference time, because instruction following, object search, target tracking, and autonomous driving share the same perception-planning backbone yet demand fundamentally different strategies for consuming the visual stream. We present Qwen-RobotNav, a scalable navigation model built on Qwen-RobotNav that addresses it through a parameterised interface with two complementary dimensions: multiple task modes that select the navigation behaviour, and controllable observation parameters (e.g., token budget, per-camera weights) that govern how visual history is encoded. With training-time randomization over all parameters, Qwen-RobotNav is robust to any inference-time configuration requiring zero architectural modification to the Qwen-RobotNav backbone. We train Qwen-RobotNav on 15.6M samples; co-training with vision-language data prevents the collapse into reactive action-sequence mappers observed in trajectory-only training. The parameterised interface also makes Qwen-RobotNav a natural building block for agentic systems: for long-horizon scenarios, an upper-level planner decomposes goals into sub-tasks and dynamically switches Qwen-RobotNav's task mode and context strategy mid-episode, composing complex behaviours from repeated calls to the same model. Extensive experiments show that Qwen-RobotNav sets new state-of-the-art results across major navigation benchmarks. The model exhibits favourable scaling from 2B to 8B parameters, with joint multi-task training developing a shared spatial-planning substrate that transfers across task families, and demonstrates strong zero-shot generalisation to real-world robots across diverse environments.