arXivDaily: arXiv daily academic digest, updated Monday through Friday
2606.12425 2026-06-12 cs.CY cs.AI cs.ET cs.HC cs.LG New submission

An Explainable AI Assistant for Introductory Programming Education: Improving Feedback Reliability with Instructor-AI Collaboration

Muntasir Hoq, Griffin Pitts, Bradford Mott, Seung Lee, Jessica Vandenberg, Shuyin Jiao, Narges Norouzi, James Lester, Bita Akram

Affiliations: North Carolina State University; University of California, Berkeley

AI summary: Proposes an explainable-AI-driven classroom assistant that analyzes student code, maps logical errors to instructor-identified misconceptions, and delivers instructor-authored feedback, improving the reliability and explainability of feedback in introductory programming courses.

Comments Full paper accepted to the 27th International Conference on AI in Education (AIED 2026)

Abstract

Active learning is widely recognized as an effective approach for improving learning outcomes in introductory programming courses. However, insufficient instructional support often limits students' access to timely, personalized feedback, which is crucial for mastering foundational programming concepts. Although recent advances in AI, particularly large language models, offer scalable opportunities for feedback, concerns about explainability and reliability remain. In this paper, we present an AI-driven classroom assistant that leverages an explainable AI model to analyze student code, map logical errors to instructor-identified misconceptions, and deliver instructor-authored feedback, thereby grounding reliability in instructor-defined pedagogical knowledge. To evaluate the effectiveness of our framework, we conducted an expert evaluation to examine its alignment with instructor-verified feedback and deployed the system in a classroom setting to assess students' perceptions of its usability. Results indicate that the assistant can provide accurate, instructor-verified feedback to students while fostering a positive experience.

2606.12422 2026-06-12 cs.CY cs.AI cs.HC New submission

Creating and Evaluating K-12 GenAI Assessment Graders Through Context Engineering

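Zewei Tian, Alex Liu, Lief Esbenshade, Michael Xiao, Zachary Zhang, Yulia Lápicus, Thomas Han, Kevin He, Min Sun

Affiliations: University of Washington; Colleague AI

AI summary: Builds LLM graders from commercially available foundation models through context engineering and evaluates their scoring agreement in mathematics, science, and ELA on MCAS data. Models with more parameters perform well in mathematics and science, while results vary more in ELA, suggesting that AI is better suited as a formative tool.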

Comments Published on the Proceedings of NCME 2026 Conference (https://www.xcdsystem.com/proceedings/ncme/8DbqHwv/presentation/28064.cfm?uuid=3EC982ED-A989-8E53-B42BC86334206028)

Abstract

The integration of large language models (LLMs) into educational assessment represents a transformative shift in classroom grading practices. While automated scoring systems and machine learning techniques have existed for decades, generative AI (GenAI) now enables educators to implement standards-based grading (SBG) with unprecedented efficiency and scale. This paper examines the theoretical foundations and evaluates an LLM grader that uses commercially available foundation models with context and prompt engineering to score student work against a rubric. Drawing on an empirical interrater agreement study using Massachusetts Comprehensive Assessment System (MCAS) data, we observed the Quadratic Weighted Kappa (QWK) and Proportional Reduction in Mean-Squared Error (PRMSE) across mathematics, science, and ELA, using Claude Sonnet 4, Haiku 4.5, GPT-5, and GPT-5 Mini. The results demonstrate that LLM graders, especially when based on foundational models with more parameters, achieve substantial agreement with human raters in mathematics and science assessments, while the performances vary in ELA, suggesting generic foundation models can be effective at scoring in given contexts. Additional analysis of teacher and student feedback reveals strong acceptance of AI-generated narrative feedback but skepticism toward numerical scores, suggesting that LLMs function most effectively as formative tools rather than summative evaluators. Our findings indicate that thoughtfully designed hybrid models that combine AI efficiency with teacher judgment can reduce workload, enhance feedback quality, and support equitable assessment practices without displacing professional expertise.
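
As a rough illustration of the Quadratic Weighted Kappa (QWK) agreement metric named in this abstract, the sketch below computes QWK between two lists of integer rubric scores using only the standard library. The function name, the argument layout, and the choice to return 1.0 when both raters give one identical constant score are assumptions made for illustration; the paper's own scoring pipeline is not described in the abstract.

```python
from collections import Counter


def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
    """Quadratic weighted kappa between two equal-length lists of integer scores."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty score lists of equal length")
    n_cat = max_score - min_score + 1
    if n_cat < 2:
        raise ValueError("need at least two score categories")

    n = len(rater_a)
    observed = Counter(zip(rater_a, rater_b))  # joint counts of (score_a, score_b)
    hist_a = Counter(rater_a)                  # marginal counts for rater A
    hist_b = Counter(rater_b)                  # marginal counts for rater B

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(min_score, max_score + 1):
        for j in range(min_score, max_score + 1):
            # Quadratic disagreement weight: 0 on the diagonal, 1 at the extremes.
            weight = (i - j) ** 2 / (n_cat - 1) ** 2
            weighted_observed += weight * observed[(i, j)]
            # Expected count under independence of the two raters.
            weighted_expected += weight * hist_a[i] * hist_b[j] / n

    if weighted_expected == 0.0:
        # Both raters gave the same single score to every item (assumed convention).
        return 1.0
    return 1.0 - weighted_observed / weighted_expected
```

A call such as `quadratic_weighted_kappa(human_scores, model_scores, 0, 4)` would compare a human rater with an LLM grader on a hypothetical 0 to 4 rubric; values near 1 indicate close agreement and values near 0 indicate agreement no better than chance.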

2606.12419 2026-06-12 cs.CY cs.AI New submission

GeoDial: A Multimodal Conversational Tutoring Dataset for Geometry Problem-Solving with Visual Tutor Turns

Sankalan Pal Chowdhury, Junling Wang, Donya Rooein, April Yi Wang, Mrinmaya Sachan

Affiliations: ETH Zurich; ETH AI Center; Bocconi University

AI summary: Introduces the GeoDial dataset of more than 1,300 teacher-student geometry dialogs, with a scalable annotation protocol that integrates dialog acts, visual highlighting, and feedback. Fine-tuned vision-language models struggle to generate accurate diagram highlights.

Abstract

Several educational domains rely heavily on diagrams and visual cues, yet most existing tutoring datasets are limited to text-only interactions. This limits the development of AI tutors that can teach in visually grounded ways used by human instructors. Thus, we introduce GeoDial, a multimodal tutoring dataset of over 1.3K teacher-student dialogs in the domain of geometry collected from experienced math teachers, where instructional turns are explicitly grounded in diagram highlights. We propose a scalable annotation protocol that integrates dialog acts, visual highlighting, and feedback, enabling fine-grained supervision of both language and visual tutoring behavior. To illustrate the challenges posed by this setting, we fine-tune several vision-language models on GeoDial and evaluate their ability to generate tutoring utterances and diagram highlights. While supervised fine-tuning substantially improves the quality of generated dialog, it struggles to produce accurate diagram highlights, revealing a key limitation of current methods and highlighting the need for approaches that more effectively integrate visual reasoning with pedagogical interaction.

2606.12415 2026-06-12 cs.CY cs.AI New submission

The AI Legal Specialist: A Juridically Autonomous Professional Profile for AI Governance

Nicola Fabiano

Affiliations: Studio Legale Fabiano, Italy; Independent Researcher on Artificial Intelligence, Data Protection, and Privacy; Expert in the EDPB’s Support Pool of Experts, Field B: Legal Expertise in New Technologies; Member, IEEE SA P7007 Working Group on Ontological Standards for Ethically Driven Robotics; Member, Editorial Advisory Board, Journal of Systemics, Cybernetics and Informatics (JSCI); Member, International Institute of Informatics and Systemics (IIIS); Member, International Neural Network Society (INNS); Member, United Nations University AI Network (UNU AI Network)

AI summary: Proposes a new professional profile, the AI Legal Specialist. The role is juridically autonomous: it derives from the structure of AI regulatory obligations rather than from technical standards or the extension of adjacent roles. The paper builds a reference competence architecture on the European e-Competence Framework.

Abstract

The rapid global expansion of artificial intelligence regulation has generated, across multiple jurisdictions, a demand for legal expertise dedicated to AI that the market has addressed in a fragmented manner. Data protection officers extend their remit beyond data protection law; privacy lawyers reposition themselves toward AI; compliance officers add AI chapters to their existing manuals. This paper argues that none of these adaptive responses adequately covers the professional space opened by the emerging global AI regulatory landscape, of which the EU Artificial Intelligence Act (Regulation (EU) 2024/1689) is the most comprehensive instance, alongside the Council of Europe Framework Convention on AI, the United States executive and sectoral framework, and analogous initiatives in the United Kingdom, Canada, Brazil, China, Japan, Singapore, and beyond. A distinct professional profile is required: the AI Legal Specialist, conceived as a jurist -- understood broadly to encompass any professional with advanced legal training -- operating at the intersection of legal interpretation and AI governance. The profile is juridically autonomous: it derives its existence from the structure of regulatory obligations generated wherever AI is subject to substantive regulation, rather than from any technical standard or the extension of adjacent roles. The paper provides a juridically grounded definition of the profile, argues for its autonomy from adjacent figures and international standards, proposes a reference competence architecture aligned with the European e-Competence Framework (e-CF, EN 16234-1) as a methodological choice, and articulates the conditions for its operational measurement through key performance indicators. The contribution is intended as a foundation for international standardization of the profile and as a reference for practice, curricula, and adoption across jurisdictions.

2606.13614 2026-06-12 stat.ML cs.LG math.ST stat.TH New submission

Majority-of-Three is Optimal

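Divit Rawal, Nikita Zhivotovskiy

Affiliations: Department of Statistics, University of California, Berkeley

AI summary: Gives a short proof that, in the realizable PAC learning setting, the majority vote of three independent consistent classifiers is an optimal learner, simplifying the algorithmic structure and probabilistic analysis of earlier voting learners.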

Comments 9 pages

Abstract

We give a short proof that the majority vote of three independent consistent classifiers is an optimal learner in the realizable PAC setting. This proves optimality for the simplest voting scheme, while simplifying both the algorithmic structure and the probabilistic analysis of previous voting learners, including the algorithm of S. Hanneke and the analysis of bagging by K. Green Larsen.
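
To make the voting scheme in this abstract concrete, here is a minimal sketch of a majority-of-three learner on a toy hypothesis class (one-dimensional thresholds). Two things are assumptions rather than details from the paper: independence is obtained here by splitting the sample into three disjoint parts, and `fit_threshold` is a hypothetical consistent learner chosen only because it is easy to check by hand.

```python
def fit_threshold(sample):
    """Consistent learner for 1-D thresholds h_t(x) = 1 if x >= t else 0.

    Uses the smallest positively labeled point as the threshold, which labels
    every training point correctly when the data are realizable by a threshold.
    """
    positives = [x for x, y in sample if y == 1]
    t = min(positives) if positives else float("inf")
    return lambda x: 1 if x >= t else 0


def majority_of_three(sample, learner=fit_threshold):
    """Fit one classifier on each of three disjoint parts of the sample, then vote."""
    parts = [sample[i::3] for i in range(3)]  # assumed way of getting independent parts
    h1, h2, h3 = (learner(part) for part in parts)
    return lambda x: 1 if h1(x) + h2(x) + h3(x) >= 2 else 0
```

For threshold classifiers the vote amounts to using the median of the three learned thresholds, so a single unlucky part of the sample cannot move the final decision boundary on its own.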

2606.12892 2026-06-12 stat.ML cs.LG econ.EM math.ST stat.ME stat.TH New submission

Prediction-Powered Causal Inference by Automatic Debiased Machine Learning and Semi-Supervised Riesz Regression

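Masahiro Kato

Affiliations: University of Tokyo

AI summary: Studies semiparametric efficient estimation of causal parameters in a semi-supervised setting. By combining debiased machine learning with semi-supervised Riesz regression, it proposes the DML-PPCI methods (EE-DML-PPCI and TMLE-DML-PPCI), which attain a smaller asymptotic variance than estimators that use labeled data alone.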

Abstract

This study investigates semiparametric efficient estimation of causal and structural parameters in a semi-supervised setting. In our setting, unlabeled auxiliary regressors are available in addition to labeled observations consisting of outcomes and regressors. Our goal is to construct estimators of causal and structural parameters whose asymptotic variances are smaller than those of estimators constructed using only labeled data. We refer to this framework as prediction-powered causal inference (PPCI). We first derive the efficient influence function and the efficiency bound, which imply that the use of auxiliary regressors can attain a smaller asymptotic variance than the efficiency bound attainable from labeled observations alone. Then, by combining the efficient influence function with the debiased machine learning (DML) framework, we propose methods that we call DML-PPCI. If we construct an estimating-equation estimator, we refer to the method as EE-DML-PPCI; if we construct a targeted-learning estimator, we refer to the method as TMLE-DML-PPCI. The asymptotic variances of both estimators match our derived efficiency bound. In the construction of the estimators, estimation of the efficient influence function plays an important role. In our study, the efficient influence function is also a Neyman orthogonal score, which depends on the Riesz representer and the regression function. For Riesz representer estimation, we develop semi-supervised generalized Riesz regression with convergence rate guarantees.

2606.12694 2026-06-12 cs.DS cs.LG math.PR stat.ML New submission

A unified complexity bound for logconcave sampling

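Yunbum Kook, Santosh S. Vempala

Affiliations: University of Texas at Austin

AI summary: Gives a simple, unified, and nearly tight bound for sampling arbitrary logconcave distributions from a warm start with the In-and-Out algorithm and exponential lifting. The main new ingredient is an improved bound on the Poincaré constant of the lifted distribution.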

Comments 5 pages

Abstract

We give a simple, unified, and nearly tight bound for sampling arbitrary logconcave distributions from a warm start using the In-and-Out algorithm along with exponential lifting. The main new ingredient in the analysis is an improved bound on the Poincaré constant of a lifted distribution. As a consequence, the resulting convergence rate is nearly tight for both constrained settings (e.g., Gaussian restricted to a convex body) and well-conditioned settings (e.g., strongly logconcave and smooth densities).

2606.12646 2026-06-12 stat.ML cs.IT cs.LG math.IT New submission

Epistemic Uncertainty Is Not the Reducible Kind

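Robin Young

Affiliations: University of Cambridge

AI summary: Proves that the standard definition of epistemic uncertainty, as the part removable by more data, is extensionally inconsistent with the mutual-information measure, and proposes a three-part decomposition: aleatoric, sample-reducible epistemic, and mechanism-reducible epistemic uncertainty.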

Abstract

The standard taxonomy of predictive uncertainty defines epistemic uncertainty as the part removable by collecting more data, while the standard measure identifies it with a mutual-information term. We prove the definition and the measure are extensionally inconsistent. On an explicit construction, the measure assigns all uncertainty to the epistemic class, yet no quantity of training data reduces it. Reducibility is instead a property of the pair (uncertainty, acquisition class), and the dichotomy resolves into three parts: aleatoric, sample-reducible epistemic, and mechanism-reducible epistemic uncertainty. An exact identity for the value of an observation shows that in-distribution data never reduces mechanism-irreducible uncertainty and generically increases it. Ensemble disagreement, the deployed epistemic estimate, tracks the training procedure rather than the epistemic term. It collapses to zero beneath a positive truth under consistent training, and equals hyperparameter-scaled initialization noise under interpolation. A finite-sample falsification test and seed-swept experiments confirm the theory.
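
For readers unfamiliar with the mutual-information measure this abstract refers to, the sketch below shows one common ensemble-based estimate for a single input: total predictive entropy splits into the average member entropy (usually read as aleatoric) plus a mutual-information term (usually read as epistemic). The function names and the use of an ensemble of class-probability vectors are assumptions for illustration; the paper's explicit construction is not reproduced here.

```python
import math


def entropy(probs):
    """Shannon entropy in nats, skipping zero-probability classes."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def decompose_uncertainty(member_probs):
    """Split predictive uncertainty for one input given per-member class probabilities.

    Returns (total, aleatoric, epistemic), where
      total     = entropy of the averaged prediction,
      aleatoric = average entropy of the individual members,
      epistemic = total - aleatoric, the mutual-information term.
    """
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    mean_probs = [
        sum(member[c] for member in member_probs) / n_members
        for c in range(n_classes)
    ]
    total = entropy(mean_probs)
    aleatoric = sum(entropy(member) for member in member_probs) / n_members
    return total, aleatoric, total - aleatoric
```

When every member outputs the same distribution, the mutual-information term is zero whatever the true data-generating process is, which is the kind of ensemble-disagreement collapse the abstract describes.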

2606.13605 2026-06-12 math.OC cs.LG cs.SY eess.SY New submission

Distribution-Agnostic Robust Trajectory Optimization via Chance-Constrained Reinforcement Learning

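Yashdeep Chaudhary, Roberto Armellin, Harry Holt, Marco Sagliano

Affiliations: Auckland University

AI summary: Proposes a distribution-agnostic robust trajectory-optimization framework that handles uncertainty in initial conditions and process noise through chance-constrained reinforcement learning, combining an offline nominal trajectory with a learned affine closed-loop correction, and validates probabilistic feasibility and fuel efficiency on two different trajectory design problems.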

Comments Preprint. 39 pages, 16 figures

Abstract

This paper presents a distribution-agnostic robust trajectory-optimization framework based on chance-constrained reinforcement learning. The uncertainty is represented here through initial conditions and process noise, with the only requirement being that it can be sampled. A deterministic nominal trajectory is first computed offline, and reinforcement learning is then used only to robustify that baseline through a structured affine closed-loop correction law comprising a feedforward control adjustment and time-varying feedback gains. Probabilistic feasibility is enforced empirically through rollout-based upper-tail quantiles, while terminal dispersion is regulated through covariance-feasibility penalties. The framework is assessed on two materially different trajectory design problems. The flagship case study is a three-dimensional multi-impulse Earth-Mars transfer, where the learned policy is benchmarked against a recent robust trajectory-optimization reference under Gaussian uncertainty and then evaluated under bounded uniform uncertainty and under process disturbances not seen during training. The second case study is a stochastic atmospheric pinpoint rocket landing problem, used to assess portability to a short-horizon continuous-thrust setting with drag, mass depletion, and glide-slope constraints. The results show that the proposed framework can remain competitive in upper-tail fuel cost while preserving probabilistic feasibility, and that the same robustification scaffold can be carried across heterogeneous spacecraft trajectory planning problems without redesign of its core stochastic-control structure.
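
The idea of enforcing probabilistic feasibility through rollout-based upper-tail quantiles can be sketched as follows: collect the value of a constraint function g (feasible when g <= 0) from each sampled rollout, then require the empirical q-quantile of those values to be non-positive. The function names, the quantile convention, and the default level 0.95 are assumptions for illustration; the abstract does not specify the paper's estimator or confidence level.

```python
import math


def empirical_quantile(values, q):
    """Smallest sample value v such that at least a fraction q of the samples are <= v."""
    if not values:
        raise ValueError("need at least one rollout value")
    if not 0.0 < q <= 1.0:
        raise ValueError("q must lie in (0, 1]")
    ordered = sorted(values)
    rank = math.ceil(q * len(ordered))  # 1-based rank of the quantile sample
    return ordered[rank - 1]


def chance_constraint_satisfied(constraint_values, q=0.95):
    """True when g <= 0 holds in at least a fraction q of the sampled rollouts.

    `constraint_values` holds one value of g per rollout, for example the final
    position error minus an allowed tolerance.
    """
    return empirical_quantile(constraint_values, q) <= 0.0
```

Because the check needs only samples of g, it makes no assumption about the underlying noise distribution, which matches the sampling-only requirement stated in the abstract.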

2606.12858 2026-06-12 cs.IT cs.AI cs.CV math.IT New submission

JSCGC: Joint Source-Channel-Generation Coding for Wireless Generative Communications

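Tong Wu, Zhiyong Chen, Guo Lu, Li Song, Feng Yang, Meixia Tao, Wenjun Zhang

Affiliations: Cooperative Medianet Innovation Center, the School of Information Science and Electronic Engineering, Shanghai Jiao Tong University

AI summary: Proposes Joint Source-Channel-Generation Coding (JSCGC), which replaces the conventional decoder with a generative model and recasts communication from reconstruction to controlled generation under perceptual constraints. A joint training and stochastic sampling framework targets mutual information maximization and improves feature-based, semantic-level, and distributional quality in latent-space image transmission.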

Comments submitted to IEEE Journal

Abstract

Conventional communication systems, including both separation-based coding and learning-based joint source-channel coding (JSCC), are typically designed under Shannon's rate-distortion theory. However, relying on generic distortion metrics fails to capture complex human visual perception, often resulting in blurred or unrealistic reconstructions. In this paper, we propose Joint Source-Channel-Generation Coding (JSCGC), a generative communication paradigm that replaces the conventional decoder with a generative model at the receiver. The received signal is treated as a condition that controls the sampling process into the learned conditional distribution, reformulating communication from deterministic reconstruction for distortion minimization to controlled generation for mutual information maximization under perceptual constraints. Based on this formulation, we develop a unified joint training and efficient stochastic sampling framework, and provide theoretical analysis of its effectiveness in both learning and inference stages. Extensive experiments on latent-space image transmission demonstrate that the JSCGC consistently improves feature-based, semantic-level, and distributional quality across diverse channel conditions, while exhibiting a distinct error behavior characterized by semantic inconsistency rather than distortion.

2606.12489 2026-06-12 cs.IT cs.LG math.IT New submission

Masked Neural Detection for Constrained Channel Coding in Molecular Communication

Melih Şahin, Ozgur B. Akan

Affiliations: Centre for neXt Communications (CXC), Department of Engineering, University of Cambridge; Centre for neXt Communications (CXC), Department of Electrical and Electronics Engineering, Koç University

AI summary: Addresses diffusion memory in molecular communication by combining RLIM constrained codes with SBRNN detection. The coded receiver beats the best uncoded receiver in 46 of 59 cases, with a mean gain of 10.36× over those wins, and an RLIM-tailored training mask further improves performance.

Comments 5 pages, 2 figures, 4 tables

Abstract

Molecular communication (MC) suffers from severe diffusion memory because molecules released for one symbol may arrive during later symbols. Neural sequence detectors, especially sliding bidirectional recurrent neural networks (SBRNNs), can substantially outperform threshold detectors in such channels. This raises a central question for MC channel coding: does a code whose advantage was established under threshold detection retain it when both coded and uncoded transmission are evaluated with neural detection? This letter answers this question for run-length-limited ISI-mitigation (RLIM) codes, a class of constrained codes previously shown to provide large BER gains in MC. Across the tested operating points, the best RLIM-SBRNN receiver beats the best uncoded receiver, chosen between threshold and SBRNN detection, in 46 of 59 cases, with a mean gain of 10.36× over those wins. We also propose an RLIM-tailored training mask for compact SBRNN detectors, improving the unmasked RLIM-SBRNN in 227 of 236 comparisons with 3.267× mean gain when masking is beneficial. Finally, the compact masked RLIM-SBRNN is competitive with channel-state-aware MLSE despite using no channel knowledge.

2606.12806 2026-06-12 quant-ph cs.LG New submission

Quantum Reservoir Computing for Short-Term Power Load Forecasting in Resource-Constrained Energy Systems

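Mansi Od, Param Pathak, Nouhaila Innan, Muhammad Shafique

Affiliations: University of Waterloo

AI summary: Proposes a hardware-efficient quantum reservoir computing framework with a fixed quantum reservoir and a compressed classical readout for short-term load forecasting under limited memory and hardware noise. A 6-bit quantized readout preserves full-precision performance while reducing readout memory by 81.2%.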

Comments 11 pages, 9 figures

Abstract

Short-term load forecasting is essential for reliable energy management, but practical deployment on edge devices requires models that remain accurate under limited memory, finite measurement budgets, and hardware noise. This work proposes a hardware-efficient Quantum Reservoir Computing (QRC) framework for energy load forecasting, where a fixed quantum reservoir transforms temporal input windows into high-dimensional features and only a classical Elastic Net readout is trained. To reduce deployment cost, the trained readout is compressed using post-training fixed-point quantization at bit widths from 8 to 2 bits. The framework is evaluated on the Tetouan and Spain energy load datasets under exact statevector simulation, 512-shot finite sampling, and realistic hardware-noise models from IBM FakeTorino and IBM FakeMarrakesh. Results show that 6-bit readout precision preserves full-precision forecasting performance while reducing readout memory by 81.2%. Below this point, degradation becomes dataset dependent, with Tetouan showing stronger sensitivity and Spain degrading more gradually. Hardware-noise validation further shows that the trained readout transfers to noisy reservoir states without retraining. These findings support quantized QRC as a resource-aware forecasting approach for near-term quantum time-series applications.
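
The readout compression step can be pictured with a small sketch of post-training uniform quantization applied to a list of trained readout weights. Everything specific here is an assumption for illustration: the symmetric scheme with a single float scale, the function names, and the 32-bit full-precision baseline, none of which are stated in the abstract. Under that 32-bit assumption, storing 6 bits per weight saves 1 - 6/32 = 81.25% of weight memory, which is consistent with the reported 81.2%.

```python
def quantize_weights(weights, bits):
    """Symmetric uniform quantization to signed integer codes plus one float scale."""
    if bits < 2:
        raise ValueError("need at least 2 bits for a signed code")
    q_max = 2 ** (bits - 1) - 1                  # largest code magnitude, e.g. 31 for 6 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / q_max if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the representable range.
    codes = [max(-q_max, min(q_max, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate real-valued weights."""
    return [code * scale for code in codes]


def memory_reduction(bits, baseline_bits=32):
    """Fraction of weight memory saved relative to an assumed baseline bit width.

    Ignores the single stored scale, which is negligible for long weight vectors.
    """
    return 1.0 - bits / baseline_bits
```

Sweeping `bits` from 8 down to 2 and re-evaluating forecast error with the dequantized weights would mirror the kind of bit-width study the abstract describes, with rounding error per weight bounded by half the scale.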

2606.13581 2026-06-12 cs.CY cs.CL cs.HC physics.soc-ph New submission

The Tone of Awareness: Topic, Sentiment, and Toxicity Maps During Mental Health Month on TikTok

Henrique Ferraz de Arruda, Andreia Sofia Teixeira, Pranay Gundala Reddy, Anindya Mondal, Kleber Andrade Oliveira, Filipi Nascimento Silva

Affiliations: Institute for Biocomputation and Physics of Complex Systems (BIFI); University of Zaragoza; ARAID Foundation; Network Science Institute; Northeastern University London; Kent Medway Medical School; LASIGE; Faculdade de Ciências da Universidade de Lisboa; Department of Psychology, University of Limerick; Observatory on Social Media, Indiana University; CSSI - Kellogg School of Management, Northwestern University

AI summary: Analyzes TikTok videos and comments from Mental Health Awareness Month in 2023 and 2024, extracting topics with BERTopic and quantifying sentiment and toxicity with XLM-T and Detoxify. Video sentiment skews negative while comments are more mixed, and toxicity in comments is long-tailed and concentrated in specific topics.

Comments 12 pages, 6 figures

Abstract

Despite raising concerns about the mental health effects associated with the usage of TikTok, little is known about how related content is framed by creators and received by audiences. We collect the content of 28,341 TikTok videos and 80,130 comments from Mental Health Awareness Month (May) in 2023 and 2024 via the TikTok Research API, and study how the tone of awareness varies across topics and years. We characterize "tone" as the emotional and interpersonal framing of mental health discourse, operationalized through sentiment and toxicity measures. We extract topics from video text using BERTopic and log-odds keywords, then quantify topic-conditioned sentiment (XLM-T) and toxicity (Detoxify) separately for video transcriptions and comments. Sentiment captures the affective valence of content, while toxicity reflects the presence of harmful or abusive language. We find a stable set of recurring themes across years, spanning clinical conditions, emotional disclosure, self-care, and campaign-oriented content, with engagement highly skewed toward a small subset of topics. All sentiment and toxicity analyses are computed separately for video content and comments, allowing us to distinguish between content production and audience reception. Sentiment in videos is often negative for emotionally charged topics, while comments tend to shift toward more mixed or positive polarity, especially for suicide prevention. Toxicity is low in median overall, but exhibits longer-tailed outliers in comments than in videos that are more pronounced in comments and concentrated in specific topics (e.g., "Duet", "Suicide Prevention", and "Psychisch"). Overall, our results provide a topic-level decomposition of mental health discourse on TikTok during awareness-month campaigns.

2606.13422 2026-06-12 quant-ph cs.LG physics.flu-dyn New submission

Foundations of Practical Quantum Advantage in Quantum-Informed Machine Learning for Predicting Chaos

Maida Wang, Xiao Xue, Minh Chung, Peter V. Coveney

Affiliations: Centre for Computational Science, University College London; Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities; Centre for Advanced Research Computing, University College London

AI summary: Proposes a quantum-advantage mechanism based on higher-order quantum statistical priors, proves a two-stage advantage (representation and extraction) with a quantum-classical separation in copy-measurement complexity, and demonstrates it on turbulence and weather forecasting.

Abstract

We develop theoretical foundations for a practical quantum-advantage mechanism in quantum-informed machine learning for chaotic dynamical systems. A family of k-indexed higher-order quantum statistical priors (Q-Priors) hosts the k-point marginal of the invariant measure on n_q = kq qubits, extending the single-site construction of prior work. We prove a two-stage advantage. In the representation stage, superposition and entanglement compactly store non-factorisable spatial correlations of the invariant measure on n_q qubits. In the extraction stage, joint Bell measurements on two copies estimate any post hoc Pauli functional with a copy-pair count independent of n_q, whereas any adaptive single-copy protocol for the corresponding full-Pauli read-out requires Omega(2^(n_q)) copies; this is a provable quantum-classical separation in copy-measurement complexity. The two-copy read-out is realised in simulation and on IQM superconducting processors. Two case studies instantiate the mechanism in workflows of independent scientific value: a turbulent channel-flow study in which the two-copy read-out yields a named non-diagonal correlator of the invariant measure (the velocity-direction coherence), and a medium-range weather forecasting workflow on the European Centre for Medium-Range Weather Forecasts ERA5 reanalysis in which the diagonal k <= 2 Q-Prior steers a Koopman rollout, improves anomaly-correlation skill by 10-39% across 48-240 h lead times, and reduces the long-horizon collapse of rollouts onto a static mean field. The two conditions of our practical-advantage definition are met at complementary levels, identifying a candidate route to practical quantum advantage before fault-tolerant hardware.

2606.12824 2026-06-12 eess.IV cs.AI cs.CV physics.med-ph New submission

Acquisition state behaves as a structured, measurable variable governing lung-nodule AI: kernel-driven measurement instability and noise-driven detection fragility, invisible to DICOM metadata

采集状态作为结构化、可测量变量影响肺结节AI:核驱动的测量不稳定性和噪声驱动的检测脆弱性,DICOM元数据不可见

Daniel Soliman

发表机构 * Daniel Soliman, M.S(丹尼尔·索利曼,硕士)

AI总结 研究通过LUNA16训练的RetinaNet检测器,发现CT采集状态(重建核与噪声)独立影响AI的测量与检测性能,且无法从DICOM元数据恢复,提出采集感知的输入验证层。

详情
AI中文摘要

医学影像AI治理正在规范化:2026年ACR-SIIM实践参数建议本地验收测试和持续漂移监测,ACR Assess-AI注册使用DICOM元数据监测AI输出。我们认为在输出指标之下存在一个必要但目前未监测的层:输入研究是否保持在模型验证过的采集范围内。使用LUNA16训练的MONAI RetinaNet肺结节检测器,我们测试采集状态是否表现为结构化的可测量变量。在仅重建核不同的真实配对CT(NLST B30f vs B80f)上,核单独使AI测量的直径发生偏移,并在5.2%(155个结节中的8个)中翻转了Fleischner尺寸类别,而检测置信度不变(Wilcoxon p=0.22)。在受控的LIDC-IDRI扰动下,效应按轴分离:噪声轴降低检测置信度(p=5.9e-32,集中在6mm以下结节)但不影响测量,而频率/核轴破坏测量(p=8.6e-13)但不影响检测。一个4特征像素指纹恢复了重建身份(真实CT上患者级AUC约0.95,QIBA体模上0.995),而ConvolutionKernel DICOM标签无信息(不同重建标签相同)。核轴跨四个制造商传输(留一制造商AUC 0.94-0.98,与制造商内上限匹配)。因此采集状态映射到不同的AI故障模式:频率内容对应测量可靠性,噪声对应检测灵敏度,且无法从元数据恢复。采集感知的输入侧验证是现在进入影像AI认证的验收测试和漂移监测要求中缺失的层。

英文摘要

AI governance for medical imaging is formalizing: the 2026 ACR-SIIM Practice Parameter recommends local acceptance testing and ongoing drift monitoring, and the ACR Assess-AI registry monitors AI outputs using DICOM metadata for context. We argue that a necessary, currently unmonitored layer sits beneath output metrics: whether incoming studies remain within the acquisition envelope a model was validated on. Using a LUNA16-trained MONAI RetinaNet lung-nodule detector, we test whether acquisition state behaves as a structured, measurable variable. On real paired CT differing only in reconstruction kernel (NLST B30f vs B80f), kernel alone shifted AI-measured diameter and flipped a Fleischner size category in 5.2% (8 of 155) of nodules at fixed patient and acquisition, while detection confidence was unchanged (Wilcoxon p=0.22). Under controlled LIDC-IDRI perturbations the effects dissociated by axis: the noise axis degraded detection confidence (p=5.9e-32, concentrated in nodules under 6 mm) but not measurement, while the frequency/kernel axis corrupted measurement (p=8.6e-13) but not detection. A 4-feature pixel fingerprint recovered reconstruction identity (patient-level AUC about 0.95 on real CT, 0.995 on a QIBA phantom) where the ConvolutionKernel DICOM tag was uninformative (identical labels across reconstructions). The kernel axis transported across four manufacturers (leave-one-vendor-out AUC 0.94-0.98, matching the within-vendor ceiling). Acquisition state thus maps to distinct AI failure modes, frequency content to measurement reliability and noise to detection sensitivity, and is not recoverable from metadata. Acquisition-aware, input-side validation is the missing layer for the acceptance-testing and drift-monitoring requirements now entering imaging-AI accreditation.

2606.12559 2026-06-12 physics.comp-ph cs.LG cs.NA math.NA physics.flu-dyn 新提交

Feature-preserving Latent-EnKF for Data Assimilation of Flows with Shocks

保持特征的潜在EnKF用于含激波流动的数据同化

Hemanth Chandravamsi, Hangchuan Hu, Ponkrshnan Thiagarajan, Tamer A. Zaki

发表机构 * Department of Mechanical Engineering, Johns Hopkins University(约翰霍普金斯大学机械工程系)

AI总结 针对含激波流动中EnKF因多模态统计产生伪振荡的问题,提出在学习的低维潜在空间进行集合更新以保持激波特征,并通过共享解码器恢复物理状态,数值实验验证了无伪振荡的准确特征恢复。

详情
AI中文摘要

集合卡尔曼滤波(EnKF)被广泛用于顺序数据同化,但对于具有间断的解(如可压缩流中的激波)会失效。激波位置的不确定性导致多模态集合统计,违反了EnKF的高斯假设,在分析状态中产生大尺度伪振荡。我们引入了一种保持特征的潜在EnKF,在学习的低维潜在空间中进行集合更新,其中激波和流动特征具有光滑流形表示,从而在EnKF分析期间保持尖锐特征。更新后的潜在状态通过所有集合成员共享的解码器映射回物理状态。该算法消除了先前方法中使用的成员特定有序训练和正性下限。在Sod激波管和马赫2激波与二维圆柱相互作用的数值实验中,使用稀疏和噪声观测,结果显示能够准确恢复激波和接触间断的特征,且无伪振荡。

英文摘要

The ensemble Kalman filter (EnKF) is widely adopted for sequential data assimilation, but fails for solutions with discontinuities, such as shocks in compressible flows. Uncertainty in shock location induces multimodal ensemble statistics that violate the Gaussian assumptions underlying the EnKF, producing large-scale spurious oscillations in the analysis state. We introduce a feature-preserving latent-EnKF that performs the ensemble update in a learned low-dimensional latent space, where shock and flow features admit a smooth manifold representation, thereby preserving sharp features during EnKF analysis. The updated latent state is mapped back to the physical state through a decoder shared by all ensemble members. The algorithm eliminates the member-specific ordered training and positivity flooring used in prior approaches. Numerical experiments on a Sod shock tube and Mach 2 shock interaction with a 2D cylinder, using sparse and noisy observations, show accurate feature recovery of shocks and contact discontinuities without spurious oscillations.
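The idea of updating in latent space and decoding afterwards can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `decode` is a hypothetical one-parameter stand-in for the learned shared decoder (the latent variable is simply a shock position), the update is a textbook stochastic EnKF analysis step with ensemble-estimated covariances, and all sizes and noise levels are made up for the example.

```python
import numpy as np

def decode(z, x, width=0.05):
    """Hypothetical stand-in for a learned decoder: z[0] is a shock position,
    and the output is a smoothed step profile on the grid x."""
    return 0.5 * (1.0 - np.tanh((x - z[0]) / width))

def latent_enkf_update(Z, y_obs, obs_idx, x, obs_std, rng):
    """Stochastic EnKF analysis step applied to latent members.

    Z       : (n_latent, n_members) latent ensemble
    y_obs   : (m,) noisy observations of the decoded field
    obs_idx : (m,) grid indices of the sensors
    """
    n_members = Z.shape[1]
    # Predicted observations: decode each member, then sample at the sensors.
    Y = np.stack([decode(Z[:, i], x)[obs_idx] for i in range(n_members)], axis=1)
    Za = Z - Z.mean(axis=1, keepdims=True)   # latent anomalies
    Ya = Y - Y.mean(axis=1, keepdims=True)   # predicted-observation anomalies
    C_zy = Za @ Ya.T / (n_members - 1)       # latent/observation cross-covariance
    C_yy = Ya @ Ya.T / (n_members - 1)       # predicted-observation covariance
    R = obs_std**2 * np.eye(len(y_obs))      # observation-error covariance
    K = C_zy @ np.linalg.inv(C_yy + R)       # Kalman gain acting on latent variables
    # Perturbed observations, one realisation per member.
    Y_pert = y_obs[:, None] + obs_std * rng.standard_normal(Y.shape)
    return Z + K @ (Y_pert - Y)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 201)
obs_idx = np.arange(0, 201, 10)                      # sparse sensors
truth = decode(np.array([0.6]), x)
y_obs = truth[obs_idx] + 0.05 * rng.standard_normal(obs_idx.size)
Z_prior = 0.5 + 0.1 * rng.standard_normal((1, 40))   # uncertain shock position
Z_post = latent_enkf_update(Z_prior, y_obs, obs_idx, x, 0.05, rng)
fields = np.stack([decode(Z_post[:, i], x) for i in range(Z_post.shape[1])])
```

Because every analysis member is produced by the decoder, each decoded field in this toy setup is still a single monotone front bounded between 0 and 1, whereas averaging physical-space members with different shock positions would smear the front.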

2606.12502 2026-06-12 physics.soc-ph cs.AI 新提交

A Mathematical Theory of Value: a synthesis on goal-directed agency under resource constraints

价值的数学理论:资源约束下目标导向行为的综合

Cheng Qian

发表机构 * Cheng Qian

AI总结 本文提出价值是目标导向主体在资源约束下转化资源为目标进度的速率,通过尺度不变性公理导出对数度量,并推导出价值编码定理,实现价值与信息论的统一。

Comments Also available at https://doi.org/10.5281/zenodo.20487041 (v5)

详情
AI中文摘要

我们提出,价值——目标导向主体创造、毁灭和交换的量——是与信息同类的合法结构量。遵循香农的方法,我们做出一个无情的抽象:价值是主体将资源转化为目标进度的速率,相对于由其目标固定的参考系。尺度不变性公理强制采用对数度量 $V=\sum_i k_i \ln e_i$;通过Peters(2019)的遍历性论证,再投资资源的复利强制了相同的形式。这两条路径是亲缘关系而非独立;它们的一致性是一种一致性检查,而非过度确定。我们推导了价值的编码定理:$\Delta G \le I(X;Y)$,由贝叶斯比例分配实现;实现的价值分解为 $G=D(q\|r)-D(q\|p)$,将错位识别为可测量的浪费。对于群体,价值是参考系相关的,而价格是参考系无关的;共享资源并融合感知的舰队继承上限 $G_{\mathrm{fleet}} \le I(X;Y_{1:m}) \le H(X)$(一个推论;早期的求和形式声明是错误的,并在v5中修正)。动力学层产生了实然/应然不对称性,从该不对称性中,对齐作为控制稳定性条件出现,并具有闭式残差。我们在预注册的规模扩展中测试了单参考系定律于实时语言模型:感知互信息跟踪实际能力而非参数数量(在30个模型×领域点上合并的Spearman $\rho = 0.977$),样本外 $\Delta G$ 跟踪 $I(X;Y)$,过度自信是可测量的耗散;进一步的预注册测试显示,该桥在四种任务形状上形状不变($n=42$,斜率0.953)。这些机制没有一个是全新的——广义Kelly、Armstrong & Mindermann(2018)、经典控制;贡献在于它们的统一以及随之而来的治理映射(监督上的激励设计)。

英文摘要

We propose that value -- the quantity goal-directed agents create, destroy, and exchange -- is a lawful structural quantity in the same category as information. Following Shannon's method, we make one ruthless abstraction: value is the rate at which an agent converts a resource into goal-progress, relative to a frame fixed by its goal. A scale-invariance axiom forces a logarithmic measure, $V=\sum_i k_i \ln e_i$; compounding of a reinvested resource forces the same form via the ergodicity argument of Peters (2019). The two routes are kin rather than independent; their agreement is a consistency check, not an over-determination. We derive a coding theorem of value: $\Delta G \le I(X;Y)$, achieved by Bayes-proportional allocation; realized value decomposes as $G=D(q\|r)-D(q\|p)$, identifying misalignment with measurable waste. For populations, value is frame-relative while price is frame-independent; a fleet that pools its resource and fuses its perception inherits the ceiling $G_{\mathrm{fleet}} \le I(X;Y_{1:m}) \le H(X)$ (a corollary; an earlier sum-form claim was wrong and is corrected in v5). A dynamical layer yields an is/ought asymmetry from which alignment emerges as a control-stability condition with a closed-form residual. We test the single-frame laws on live language models in a pre-registered scale-up: perception mutual information tracks realized capability rather than parameter count (Spearman $\rho = 0.977$ pooled over 30 model$\times$domain points), out-of-sample $\Delta G$ tracks $I(X;Y)$, and over-confidence is measurable dissipation; a further pre-registered test shows the bridge is shape-invariant across four task shapes ($n=42$, slope 0.953). None of the mechanisms is individually new -- generalized Kelly, Armstrong & Mindermann (2018), classical control; the contribution is their unification and the governance mapping (incentive design over oversight) that follows.
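The decomposition $G=D(q\|r)-D(q\|p)$ can be checked numerically under the classical generalized-Kelly reading. The abstract does not define the symbols, so the interpretation here is an assumption: $q$ is the true outcome distribution, $r$ is the reference distribution implied by the payout odds $1/r_i$, and $p$ is the agent's proportional allocation. The function names and the numbers are illustrative only.

```python
import numpy as np

def kl(a, b):
    """Kullback-Leibler divergence D(a || b) in nats, for strictly positive vectors."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(np.sum(a * np.log(a / b)))

def log_growth(q, p, r):
    """Expected log growth when a fraction p_i of the resource is staked on
    outcome i, outcome i pays 1 / r_i per unit staked, and outcomes follow q."""
    q, p, r = (np.asarray(v, float) for v in (q, p, r))
    return float(np.sum(q * np.log(p / r)))

q = [0.5, 0.3, 0.2]        # assumed true distribution
r = [1 / 3, 1 / 3, 1 / 3]  # assumed reference (odds-implied) distribution
p = [0.4, 0.4, 0.2]        # assumed (misaligned) allocation

G = log_growth(q, p, r)
G_aligned = log_growth(q, q, r)   # allocation matched to the truth
waste = kl(q, p)                  # growth lost to misalignment
```

In this reading `G` equals `kl(q, r) - kl(q, p)`, the aligned allocation `p = q` attains `kl(q, r)`, and `waste` is exactly the gap between the two.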

2606.13535 2026-06-12 hep-ex cs.AI hep-ph 新提交

AgentRivet: an automated system for producing Rivet routines from journal publications

AgentRivet:从期刊论文自动生成Rivet例程的系统

Antonio J. Costa, Caterina Doglioni, Christian Gütschow, Andrew D. Pilkington, Sukanya Sinha

发表机构 * Department of Physics & Astronomy, University of Manchester(曼彻斯特大学物理与天文学系) Centre for Advanced Research Computing, University College London(伦敦大学学院先进计算中心)

AI总结 提出基于大语言模型的自动化工作流AgentRivet,从论文提取物理分析信息并生成缺失的Rivet例程,经代码和物理审查实现质量控制,在ATLAS和CMS测量中生成语法错误少、物理保真度合理的例程。

详情
AI中文摘要

粒子物理对撞机实验将Rivet例程作为模型无关测量分析保存策略的一部分。Rivet是一个C++工具包,允许将新的理论模型与测量结果进行比较,从而帮助开发和调整蒙特卡洛事件生成器,以及搜索标准模型之外的新物理。然而,已知分析覆盖不完整,只有39%的测量具有文档化且公开可用的Rivet例程。在本文中,我们设计并实现了一个基于大语言模型的自动化工作流,旨在提供缺失的例程。这个多步骤工作流称为AgentRivet,从已发表的论文中提取物理分析信息,并编写缺失的Rivet例程,中间代码和物理审查作为自主质量控制的一部分。我们报告了使用OpenAI、Anthropic和Google提供的商业大语言模型,针对ATLAS和CMS实验的两个近期测量所获得的结果。我们发现AgentRivet生成了语法错误很少的合格Rivet例程。例程的物理保真度合理,并遵循相关出版物中的解释。然而,物理实现问题确实出现,并使用AgentRivet产生的产物进行了调查。大多数物理实现问题源于给定出版物中微妙但模糊的定义,尽管有些模型即使在给出明确定义时也难以实现复杂的可观测量。

英文摘要

Particle physics collider experiments provide Rivet routines as part of the analysis preservation strategy for model-independent measurements. Rivet is a C++ toolkit that allows new theoretical models to be compared to the measurements, thus aiding the development and tuning of Monte Carlo event generators as well as searches for physics beyond the Standard Model. However, analysis coverage is known to be incomplete, with only 39% of measurements having documented and publicly available Rivet routines. In this article, we design and implement an automated workflow based on Large Language Models with the goal of providing the missing routines. This multi-step workflow, referred to as AgentRivet, extracts the physics analysis information from published papers and writes the missing Rivet routines, with intermediate code and physics reviews as part of an autonomous quality control. We report the results obtained using commercial Large Language Models, provided by OpenAI, Anthropic, and Google, for two recent measurements from the ATLAS and CMS experiments. We find that AgentRivet produces competent Rivet routines with few syntax errors. The physics fidelity of the routines is reasonable and follows the explanations given in the relevant publications. Nevertheless, physics-implementation issues do arise and are investigated using the artefacts produced by AgentRivet. The majority of physics-implementation issues arise from subtle-but-ambiguous definitions in the given publication, although some models struggle to implement complex observables even when clear definitions are given.

2606.13454 2026-06-12 physics.optics cond-mat.dis-nn cs.ET cs.LG 新提交

Optical Implementation of Equilibrium Propagation Using Spatial Photonic Ising Machines

基于空间光子伊辛机的平衡传播光学实现

Dimitri Vanden Abeele, Daniele Veraldi, Davide Pierangeli, Claudio Conti, Serge Massar

发表机构 * Laboratoire d’Information Quantique, Université Libre de Bruxelles (ULB)(量子信息实验室,布鲁塞尔自由大学) Dipartimento di Fisica, Sapienza Università di Roma(物理学系,萨皮恩扎罗马大学)

AI总结 提出利用空间光子伊辛机光学实现平衡传播,通过规范变换方法编码神经元状态和可训练模式,在Wine和MNIST数据集上验证了能效物理实现的可行性。

详情
AI中文摘要

平衡传播(EP)为训练基于能量的网络提供了一种有别于传统机器学习的、颇具吸引力的替代方案。在这里,我们展示了使用空间光子伊辛机(SPIM)的平衡传播混合光学-数字实现。SPIM利用规范变换方法,通过空间光调制器将连续神经元状态和秩1二进制可训练模式光学编码为相位调制,并使用有限差分方案实现推理。实验系统在Wine分类数据集上进行了评估。该方法的潜力,包括使用连续耦合和结构化耦合矩阵,在更复杂的MNIST数据集上进行了数值评估。我们的工作为平衡传播的节能物理实现提供了一条具体路径。

英文摘要

Equilibrium Propagation offers a compelling alternative to traditional machine learning for training energy-based networks. Here we demonstrate a hybrid optical-digital implementation of EP using a Spatial Photonic Ising Machine (SPIM). The SPIM exploits the gauge transformation method to optically encode both continuous neuron states and rank-1 binary trainable patterns as phase modulations via a spatial light modulator, with inference realized using a finite difference scheme. The experimental system is evaluated on the Wine classification dataset. The potential of this approach, including the use of continuous couplings and structured coupling matrices, is evaluated numerically on the more complex MNIST dataset. Our work provides a concrete pathway toward energy-efficient physical implementations of Equilibrium Propagation.

2606.13045 2026-06-12 cond-mat.dis-nn cs.LG 新提交

A solvable model for unsupervised federated learning

无监督联邦学习的一个可解模型

Giovanni Catania, Aurélien Decelle, Gianluca Manzan, Beatriz Seoane, Daniele Tantari

发表机构 * Institute for Cross-disciplinary Physics and Complex Systems IFISC (CSIC-UIB)(跨学科物理与复杂系统研究所(IFISC,CSIC-UIB)) Departamento de Física Teórica, Universidad Complutense de Madrid(马德里康普顿斯大学理论物理系) Escuela Técnica Superior de Ingenieros Industriales, Universidad Politécnica de Madrid(马德里理工大学高等工业工程师学院) GISC - Grupo Interdisciplinar de Sistemas Complejos(跨学科复杂系统小组) Inria Saclay - Tau team(Inria萨克雷中心Tau团队) Department of Mathematics, University of Bologna(博洛尼亚大学数学系)

AI总结 提出一个理论框架,通过教师-多学生交互场景分析联邦学习,证明学生间交互能系统提升学习性能,并推导最优贝叶斯条件,映射到受限玻尔兹曼机。

详情
AI中文摘要

我们引入了一个理论框架,用于在生成式设置中分析联邦学习,通过教师-多学生交互场景,其中每个学生接收不同的数据实现,要么通过不同的噪声破坏,要么通过访问不同的子集,可能大小不同。使用平衡无序系统的理论工具,我们解析地表明学生间的交互系统地提升了学习性能:高噪声学生需要更少的样本来恢复潜在模式,而低噪声学生与真实信号的重叠更大。我们推导了教师恢复的最优贝叶斯条件,作为样本复杂度、噪声水平和交互强度的函数,并通过数值模拟验证了这些预测。得到的动力学可以映射到具有结构化隐藏层的受限玻尔兹曼机中的平衡采样,从而为交互如何改进分布式生成建模提供了原则性的理论理解。

英文摘要

We introduce a theoretical framework for analyzing federated learning in a generative setting through a scenario with one teacher and multiple interacting students, in which each student receives a distinct realization of the data, either through a different noise corruption or by accessing a different subset, possibly of varying size. Using theoretical tools from equilibrium disordered systems, we analytically show that interactions among students systematically enhance learning performance: highly noisy students require fewer samples to recover the underlying pattern, while low-noise students achieve a larger overlap with the ground-truth signal. We derive the optimal Bayesian conditions for teacher recovery as functions of the sample complexity, noise level, and interaction strength, and validate these predictions through numerical simulations. The resulting dynamics can be mapped onto equilibrium sampling in a Restricted Boltzmann Machine with a structured hidden layer, providing a principled theoretical understanding of how interactions improve distributed generative modeling.

2606.11930 2026-06-12 cs.HC cs.AI cs.CV 新提交

Frozen Multimodal Embeddings for AI-Assisted Interview Assessment of Personality and Cognitive Ability

冻结多模态嵌入用于AI辅助面试中的个性与认知能力评估

Kuo-En Hung, Hung-Yue Suen, Shih-Ching Yeh, Hsiang-Wen Wang

发表机构 * Technology Application and Human Resource Development, National Taiwan Normal University(国立台湾师范大学科技应用与人力资源发展学系) Computer Science and Information Engineering, National Central University(国立中央大学资讯工程学系) Institute of Photonic System, National Yang Ming Chiao Tung University(国立阳明交通大学光电系统研究所)

AI总结 针对异步视频面试中标注数据有限的高维多模态学习问题,提出使用冻结多模态编码器(CLIP、Whisper、RoBERTa等)结合低容量下游模型,在个性预测任务上实现MSE降低19.1%,并发现认知能力预测中存在数据集捷径。

Comments 9 pages, 1 figure, 5 tables

详情
AI中文摘要

从异步视频面试(AVI)中预测心理特质是一个具有挑战性的多模态学习问题,因为标注数据集有限,而每个回答包含高维的视觉、声学和语言信号。本文介绍了我们针对ACM多媒体AVI挑战2026的解决方案,该挑战评估两个任务:Track 1从与个性相关的面试回答中预测自我报告的HEXACO个性特质,Track 2从结构化AVI回答中对认知能力水平进行分类。我们将该问题视为小样本表示学习任务。我们不微调大型预训练模型,而是使用冻结的多模态编码器,包括用于视觉特征的CLIP、用于声学特征和转录的Whisper,以及用于文本表示的RoBERTa、E5和DeBERTaV3,随后使用低容量下游模型。对于Track 1,我们的特质特定回归和晚期融合系统实现了平均验证MSE为0.2696,优于官方基线0.3334。消融结果显示,从全局模型(0.3189)到逐特质建模(0.2871)再到逐特质晚期融合(0.2696)的三步改进,相对于官方基线MSE相对降低了19.1%。对于Track 2,一个紧凑的主题属性基线达到了0.5781的准确率,而我们的多模态集成达到了0.5313,两者均高于官方基线0.4062。我们将这一结果解释为验证分割中可能存在主题属性捷径的证据,而非从AVI内容中进行的稳健认知推理。总体而言,我们的发现表明,基于AVI的心理评估受益于特质特定的多模态建模,但认知能力预测需要仔细控制数据集捷径。

英文摘要

Predicting psychological traits from asynchronous video interviews (AVIs) is a challenging problem in AI-assisted interview assessment because labeled datasets are limited while each response contains high-dimensional visual, acoustic, and verbal signals. This paper presents our solution for the ACM Multimedia AVI Challenge 2026, which evaluates two tasks: Track 1 predicts self-reported HEXACO personality traits from personality-related interview responses, and Track 2 classifies cognitive ability levels from structured AVI responses. We treat the problem as a small-sample representation learning task. Instead of fine-tuning large pretrained models, we use frozen multimodal encoders, including CLIP for visual features, Whisper for acoustic features and transcripts, and RoBERTa, E5, and DeBERTaV3 for textual representations, followed by low-capacity downstream models. For Track 1, our trait-specific regression and late-fusion system achieves an average validation MSE of 0.2696, improving over the official baseline of 0.3334. Ablation results show a three-step improvement from a global model (0.3189), to per-trait modeling (0.2871), to per-trait late fusion (0.2696), corresponding to a 19.1% relative MSE reduction over the official baseline. For Track 2, a compact subject-attribute baseline reaches 0.5781 accuracy, while our multimodal ensemble reaches 0.5313, both above the official baseline of 0.4062. We interpret this result as evidence of possible subject-attribute shortcuts in the validation split rather than robust cognitive inference from AVI content. Overall, our findings suggest that AVI-based psychological assessment benefits from trait-specific multimodal modeling, but cognitive ability prediction requires careful control of dataset shortcuts.
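The 19.1% figure follows directly from the MSE values quoted in the abstract. The short check below recomputes it, along with the relative reduction at each ablation step; the helper name and dictionary layout are illustrative, and only the four MSE values come from the abstract.

```python
def relative_reduction(baseline, value):
    """Fractional reduction of `value` relative to `baseline`."""
    return (baseline - value) / baseline

OFFICIAL_BASELINE = 0.3334  # official Track 1 baseline MSE (from the abstract)

# Validation MSE at each ablation step (from the abstract).
steps = {
    "global model": 0.3189,
    "per-trait modeling": 0.2871,
    "per-trait late fusion": 0.2696,
}

# Relative MSE reduction over the official baseline, in percent.
reductions = {
    name: 100.0 * relative_reduction(OFFICIAL_BASELINE, mse)
    for name, mse in steps.items()
}
```

The final step gives (0.3334 - 0.2696) / 0.3334, about 0.191, which matches the reported 19.1%.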

2606.11654 2026-06-12 cs.IR cs.CL cs.HC cs.SI 新提交

The Long Tail, Not the Front Page: Cold-Start Prediction of Crowd Highlight Salience

长尾而非首页:众包高亮显著性的冷启动预测

Kazuki Nakayashiki, Keisuke Watanabe

发表机构 * Glasp Inc.(Glasp公司)

AI总结 本文研究在无读者标记时,如何从文本预测文档的众包高亮显著性,提出基于句子嵌入和位置/上下文特征的对数排序模型,在平均精度上比位置基线提升0.044,并证明该优势源于真实读者标记的学习。

Comments 10 pages, 3 figures, 4 tables

详情
AI中文摘要

社交高亮工具最有用的信号——一群读者标记的段落——仅存在于人们已经阅读过的文档中。能否在标记积累之前,从文本预测文档的聚合众包显著性?先前关于此数据的研究发现,零样本语言模型恢复高亮位置的效果不如简单的基线(位置),因此我们询问,在高亮语料上训练的模型能否击败该基线。使用预注册的模型阶梯和按文档的聚类自助法,我们发现一个微小但稳健的优势:基于句子嵌入和位置/上下文特征的对数排序器比位置基线平均精度高出+0.044(95%置信区间[+0.029, +0.058];在97%的重采样中超过预注册的边界delta=0.03,且在流水线重复运行中稳定)。两种无监督抽取式基线(质心、LexRank风格中心性)均输给位置基线,而训练模型比它们高出+0.108,因此该优势并非由通用无监督代理恢复——它反映了从真实读者标记中学习。在产品术语中,precision@3从0.25上升到0.39(相对提升55%),模型在69%的文档上击败位置基线。消融实验将优势归因于原始嵌入(+0.014)和训练增强(+0.010),每个都有正的置信区间。该优势并非时间泛化失败,我们也没有发现内容漂移或近似重复泄露可以解释它的证据。标准化回归显示,优势主要由文档流行度(流行度越低,优势越大)和标签可靠性决定。它仅在流行度最高的内容上几乎消失;在那里,是位置基线变强,而非模型变弱。由于我们的评估条件设定在最终积累了读者的文档上,这些结果是回顾性的冷启动模拟。

英文摘要

A social highlighter's most useful signal -- which passages a crowd of readers marks -- exists only for documents people have already read. Can the aggregate crowd salience of a document be predicted from its text before its marks accumulate? Prior work on this data found that zero-shot language models recover highlight locations worse than a trivial lead (position) baseline, so we ask whether a model trained on the highlight corpus can beat that baseline. Using a pre-registered ladder of models and a by-document cluster bootstrap, we find a small but robust edge: a logistic ranker over sentence embeddings and positional/contextual features beats the lead baseline by +0.044 average precision (95% CI [+0.029, +0.058]; clears a pre-registered margin delta=0.03 in 97% of resamples, and stable across pipeline re-runs). Two unsupervised extractive baselines (centroid, LexRank-style centrality) lose to lead, and the trained model beats them by +0.108, so the edge is not recovered by generic unsupervised proxies -- it reflects learning from real reader marks. In product terms, precision@3 rises from 0.25 to 0.39 (+55% relative) and the model beats lead on 69% of documents. An ablation attributes the edge to the raw embedding (+0.014) and training augmentation (+0.010), each with a positive CI. The edge is not a temporal-generalization failure, and we find no evidence that content drift or near-duplicate leakage explains it. A standardized regression shows the advantage is governed mainly by document popularity (lower popularity, larger edge) and by label reliability. It nearly vanishes only on the most popular content; there it is the lead baseline that strengthens, not the model that weakens. Because our evaluation conditions on documents that eventually accumulated readers, these results are a retrospective cold-start simulation.
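A by-document cluster bootstrap of the kind mentioned above can be sketched as follows. This is a generic percentile bootstrap, not the authors' pipeline: it assumes average precision is computed per document and that `per_doc_delta` holds the per-document difference (trained model minus lead baseline), and the function name, defaults, and margin handling are illustrative.

```python
import numpy as np

def cluster_bootstrap_ci(per_doc_delta, n_resamples=2000, alpha=0.05,
                         margin=None, seed=0):
    """Percentile bootstrap over documents (the clusters).

    Returns (lower, upper, frac_above_margin); the last item is None
    when no margin is given.
    """
    rng = np.random.default_rng(seed)
    d = np.asarray(per_doc_delta, dtype=float)
    # Resample whole documents with replacement so that all sentences of a
    # document stay together in every replicate.
    idx = rng.integers(0, len(d), size=(n_resamples, len(d)))
    means = d[idx].mean(axis=1)
    lower, upper = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    frac = float(np.mean(means > margin)) if margin is not None else None
    return float(lower), float(upper), frac

# Illustrative usage with synthetic per-document differences.
demo = np.random.default_rng(1).normal(loc=0.04, scale=0.2, size=500)
lo_ci, hi_ci, frac_clear = cluster_bootstrap_ci(demo, margin=0.03)
```

The returned fraction corresponds to the "clears the margin in X% of resamples" style of statement; the interval endpoints are the percentile confidence limits on the mean difference.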

2606.11238 2026-06-12 q-fin.GN cs.AI 新提交

Artificial Intelligence in Ship Finance: Applications, Opportunities, and a Case Study in AI-Augmented Loan Origination

船舶金融中的人工智能:应用、机遇与AI增强贷款发起的案例研究

Lasse Dierich, Orestis Schinas

发表机构 * ShipFinance.ai HHX.blue GmbH Technical University of Munich(慕尼黑工业大学) University of the Aegean(爱琴海大学)

AI总结 本文探讨AI在船舶金融中的应用,提出基于大语言模型的模块化架构,用于文档理解、信息提取和工作流自动化,以支持贷款申请流程。

Comments 9 pages, 1 figure

详情
AI中文摘要

船舶金融是资产担保贷款中数据密集且文档繁重的领域,需要整合来自异构且高度非结构化来源的财务、技术、合同和监管信息。日益严格的环境法规和ESG报告要求进一步增加了承销和贷款发起流程的复杂性。人工智能(AI)的最新进展,特别是大语言模型(LLMs),为处理和分析此类信息创造了新的机遇。本文回顾了AI在船舶金融中的潜在应用,特别关注基于LLM的系统用于文档理解、信息提取和工作流自动化。我们提出了ShipFinance.ai,一个模块化代理架构,用于支持船舶金融中的贷款申请工作流。所提出的系统结合了基于LLM的提取模块、财务分析组件、外部海事数据服务以及带有聊天机器人界面的受控文档生成模块,以支持标准化融资申请的准备工作。本文讨论了在生产中使用此类模型的关键挑战。我们认为,AI辅助系统可以支持海事金融专业人士管理日益复杂的信息和报告要求。

英文摘要

Ship finance is a data-intensive and document-heavy segment of asset-based lending, requiring the integration of financial, technical, contractual, and regulatory information from heterogeneous and largely unstructured sources. Increasing environmental regulation and ESG reporting requirements are adding further complexity to underwriting and loan-origination processes. Recent advances in artificial intelligence (AI), particularly large language models (LLMs), create new opportunities for processing and analysing such information. This paper reviews potential applications of AI in ship finance, with a particular focus on LLM-based systems for document comprehension, information extraction, and workflow automation. We present ShipFinance.ai, a modular agentic architecture to support loan application workflows in ship finance. The proposed system combines an LLM-based extraction module, financial analysis components, external maritime data services, and a controlled document-generation module with a chatbot interface to support the preparation of standardized financing applications. The paper discusses the key challenges for using such models in production. We argue that AI-assisted systems can support maritime finance professionals in managing increasingly complex information and reporting requirements.

2606.09855 2026-06-12 cs.MM cs.CV cs.LG 新提交

MinhwaNet: Faithful but Insufficient Object Grounding in Korean Folk Painting

MinhwaNet: 韩国民俗画中忠实但不足的对象定位

Joonhyung Bae

发表机构 * Korea Advanced Institute of Science and Technology (KAIST)(韩国科学技术院)

AI总结 提出MinhwaNet,通过部分级检测器生成对象证据图,发现韩国民俗画中符号列表不足以预测画作类型,而符号布局更重要,揭示了忠实但不足的解离现象。

详情
AI中文摘要

韩国民俗画(minhwa)由少量吉祥符号构成——老虎代表保护、一对鸟代表婚姻和谐、牡丹代表财富——这些符号在其许多绘画类型中反复出现。这暗示了一种直观的计算方法:识别画作中出现的符号,并从符号清单中读取画作类型。我们使用一个公开语料库,包含整幅画作、八字段双语策展说明以及一组独立的专家对象裁剪图,发现这种方法并不奏效。仅给定画作包含的符号列表的模型,其预测画作类型的效果远不如将图像与策展文本融合的模型,而强制类型表示基于对象定位反而会损害准确性。然而,类型预测所依赖的视觉证据仍然是局部化的且可检查的。从部分级检测器投影出的无泄漏对象证据图,在空间上忠实于策展人隔离符号对象的位置以及基于补丁的替代模型的梯度显著性。我们将这种配置称为忠实但不足的解离。部分级解释诚实地反映了部分级模型所见,但类型目标取决于符号的排列方式而非出现的符号。相同的视角区分了内容标签(在转移到保留的源机构时仍然有效,即类型)和风格标签(无效,即时代),我们通过语料库中的另外两个标签验证了这一预测。我们发布了多模态系统、一幅画作的证据图与其目录的工作示例解读,以及在长尾遗产收藏中反复出现的一系列评估注意事项。

英文摘要

Korean folk painting (minhwa) is built from a small vocabulary of auspicious symbols (a tiger for protection, a pair of birds for marital harmony, a peony for wealth) that recur across many of its painted genres. This suggests an obvious computational approach: identify which symbols appear in a painting and read the genre from the inventory. Working with a public corpus that pairs whole paintings, eight-field bilingual curatorial captions, and a separate set of expert object crops, we find that this approach does not work. A model given only a list of which symbols a painting contains predicts the genre far worse than a model that fuses the image with the curatorial text, and forcing the genre representation to be object-grounded actively hurts accuracy. The visual evidence on which the genre prediction rests is nonetheless localized and inspectable. A leakage-safe object evidence map projected from a part-level detector is spatially faithful to where curators isolated symbolic objects and to a patch-based surrogate's own gradient saliency. We name this configuration a faithful-but-insufficient dissociation. The part-level explanation is honest about what the part-level model sees, yet the genre target turns on how symbols are arranged rather than on which ones appear. The same lens separates a content label that survives transfer to held-out source institutions (genre) from a style label that does not (era), a prediction we confirm on two further labels in the corpus. We release the multimodal system, a worked-example reading of one painting's evidence map against its catalogue, and a set of evaluation cautions that recur in long-tailed heritage collections.

2606.11000 2026-06-12 quant-ph cs.LG cs.NE 新提交

Analog Quantum Asynchronous Event-Based Graph Neural Network

模拟量子异步事件驱动图神经网络

Kristian Sotirov, Shaheen Acheche, Antonio A. Gentile, Osvaldo Simeone

发表机构 * King’s Communications, Learning and Information Processing (KCLIP) lab(国王通讯、学习与信息处理(KCLIP)实验室) Centre for Intelligent Information Processing Systems (CIIPS)(智能信息处理系统中心) Department of Engineering(工程系) Pasqal SAS(Pasqal SAS公司) Institute for Intelligent Networked Systems (INSI)(智能网络化系统研究所) Northeastern University London(伦敦东北大学)

AI总结 提出模拟量子异步事件驱动图神经网络(QA-AEGNN),利用中性原子量子处理器映射事件数据为原子阵列,通过Rydberg哈密顿量模拟消息传递,实现高效事件图计算。

Comments 31 pages, 8 figures, initial version

详情
AI中文摘要

异步、事件驱动的图神经网络(AEGNN)最近成为一种处理事件相机稀疏高时间分辨率数据的有效范式。本文提出量子模拟AEGNN(QA-AEGNN),一种在中性原子量子计算机上实现AEGNN的新框架。中性原子量子处理器基于可控的Rydberg原子相互作用,提供可编程的模拟量子计算平台。为此,我们将流式事件数据映射到被困中性原子阵列,每个原子代表一个图节点(事件),其位置使得几何邻近性反映事件的时空邻域。量子处理器的原生Rydberg哈密顿量被编程以镜像AEGNN的消息传递计算,原子量子比特状态作为节点特征嵌入,原子间相互作用实现图边。此外,我们提出一种混合量子-经典训练方案,其中模拟哈密顿量参数(如激光脉冲幅度和失谐)通过经典反馈优化,以从数据中学习量子AEGNN模型。我们的方法利用中性原子量子系统的连续哈密顿量动力学和大规模并行性,以潜在精度改进原生执行事件图计算。

英文摘要

Asynchronous, event-based graph neural networks (AEGNNs) have recently emerged as an efficient paradigm for processing the sparse and high-temporal-resolution data from event cameras. In this paper, we propose quantum analog AEGNNs (QA-AEGNNs), a novel framework to implement an AEGNN on a neutral-atom quantum computer. Neutral-atom quantum processors offer a programmable analog quantum computing platform based on controllable Rydberg-atom interactions. To this end, we map the streaming event data to an array of trapped neutral atoms, where each atom represents a graph node (event) and is positioned such that geometric proximity reflects the spatio-temporal neighborhood of events. The native Rydberg Hamiltonian of the quantum processor is programmed to mirror the message-passing computations of the AEGNN, with atomic qubit states serving as node feature embeddings and inter-atom interactions realizing graph edges. Furthermore, we propose a hybrid quantum-classical training scheme in which the analog Hamiltonian parameters (e.g., laser pulse amplitudes and detunings) are optimized using classical feedback to learn the quantum AEGNN model from data. Our approach leverages the continuous Hamiltonian dynamics and massive parallelism of neutral-atom quantum systems to natively execute event-based graph computations with potential accuracy improvements.

2606.07218 2026-06-12 cs.IR cs.CL 新提交

HKVM-RAG: Key-Value-Separated Hypergraph Evidence Organization for Multi-Hop RAG

HKVM-RAG:用于多跳RAG的键值分离超图证据组织

Mingyu Zhang, Ying Ma

发表机构 * Faculty of Computing, Harbin Institute of Technology(哈尔滨工业大学计算机学院) School of Computer and Information Engineering, Henan University(河南大学计算机与信息工程学院)

AI总结 提出HKVM-RAG,一种键值分离的证据组织层,通过超图键值检索改进多跳RAG的证据链暴露,在三个基准上提升F1分数。

Comments Submitted to ICDE 2027. 13 pages, 3 figures

详情
AI中文摘要

多跳RAG提出了一个超越段落匹配的数据工程问题:在固定检索预算下,系统必须将检索到的文本组织成能够暴露答案链的证据单元。密集检索器独立评分段落,而基于图的记忆使关联显式化,但通常依赖于成对或实体中心的键,这些键会碎片化多跳证据。我们提出HKVM-RAG,一个键值分离的证据组织层。它从缓存的段落级LLM证据元组中组装答案路径超边,并将其用作检索键,同时保留段落文本作为答案值。为了隔离键空间设计,我们的固定基底协议在成对图和超图变体中保持元组缓存、候选段落、阅读器和评估预算不变。加权超图键值检索在2WikiMultiHopQA上比KG-PPR提高+3.426 F1,在MuSiQue上提高+3.592 F1;HotpotQA显示更高的结构化支持覆盖率不一定带来独立的答案F1增益。因此,我们将WHG-KV视为一种证据控制信号,而非密集检索的替代。Oracle和训练到开发分析表明支持选择是可修复的,一个密集感知控制器使用冻结的ColBERTv2和HKVM排名/分数特征,结合折外HKVM预测。它在三个基准上分别达到88.846、65.073和85.810 F1,比ColBERTv2提高+11.084、+6.763和+5.966 F1。源级消融实验表明,匹配的非WHG结构化信号无法达到WHG-KV的增益。这些结果提供了有界证据,表明键值分离的超图组织可以作为多跳RAG的可重用证据控制机制。

英文摘要

Multi-hop RAG poses a data-engineering problem beyond passage matching: under fixed retrieval budgets, a system must organize retrieved text into evidence units that expose answer chains. Dense retrievers score passages independently, while graph-based memories make associations explicit but often rely on pairwise or entity-centered keys that fragment multi-hop evidence. We present HKVM-RAG, a key-value-separated evidence-organization layer. It assembles answer-path hyperedges from cached passage-level LLM evidence tuples and uses them as retrieval keys, while retaining passage text as answer values. To isolate key-space design, our fixed-substrate protocol holds the tuple cache, candidate passages, reader, and evaluation budget constant across pairwise graph and hypergraph variants. Weighted hypergraph key-value retrieval improves over KG-PPR by +3.426 F1 on 2WikiMultiHopQA and +3.592 F1 on MuSiQue; HotpotQA shows that higher structured support coverage need not yield standalone answer-F1 gains. We therefore study WHG-KV as an evidence-control signal rather than a dense-retrieval replacement. Oracle and train-to-dev analyses identify support selection as repairable, and a dense-aware controller combines frozen ColBERTv2 and HKVM rank/score features using out-of-fold HKVM predictions. It reaches 88.846, 65.073, and 85.810 F1 on the three benchmarks, improving over ColBERTv2 by +11.084, +6.763, and +5.966 F1. Source-level ablations show that matched non-WHG structured signals do not match the WHG-KV gains. These results provide bounded evidence that key-value-separated hypergraph organization can serve as a reusable evidence-control mechanism for multi-hop RAG.

2606.06525 2026-06-12 cs.GR cs.AI 新提交

Agentic Large Language Models for Automated Structural Analysis of 3D Frame Systems

用于三维框架系统自动化结构分析的主体化大型语言模型

Ziheng Geng, Ian Franklin, Santiago Martinez, Jiachen Liu, Yunhe Zhao, Minghui Cheng

发表机构 * Department of Civil and Architectural Engineering, University of Miami(迈阿密大学土木与建筑工程系) School of Architecture, University of Miami(迈阿密大学建筑学院) HBC Engineering Company(HBC工程公司) Department of Electrical and Computer Engineering, University of Miami(迈阿密大学电气与计算机工程系)

AI总结 提出一种主体化LLM框架,通过投影表示和智能体流水线实现从自然语言输入到3D框架的自动化结构分析,平均准确率达90%。

详情
AI中文摘要

大型语言模型(LLM)已成为跨领域具有强推理能力的强大基础模型。除了反应式文本生成,主体化LLM通过模块化任务分解和协调工具使用实现自主工作流执行。在结构工程中,最近的工作开发了用于平面框架自动化分析的主体化LLM。然而,由于不规则几何表示、拓扑一致性和长程推理的挑战,它们向3D框架的扩展仍未充分探索。本文提出了一种主体化LLM框架,用于从自然语言输入自动化分析3D框架。不规则3D框架通过投影到2D平面表示,其中正交网格线定义空间坐标,楼层数矩阵编码每个网格单元的垂直拉伸。基于此表示,框架建立了一个多智能体流水线:问题分析智能体将输入解析为结构化JSON;楼层分解智能体推导每层的空间布局;3D几何由节点、梁、板和柱智能体组装;支撑和荷载智能体分配边界和荷载条件,代码翻译智能体生成可执行的SAP2000脚本。在十个代表性3D框架上评估,所提框架在重复试验中平均准确率达到90%,表现出一致且可靠的性能。

英文摘要

Large language models (LLMs) have emerged as powerful foundation models with strong reasoning capabilities across domains. Beyond reactive text generation, agentic LLMs enable autonomous workflow execution through modular task decomposition and coordinated tool use. In structural engineering, recent efforts have developed agentic LLMs for automated analysis of plane frames. However, their extension to 3D frames remains underexplored due to challenges in irregular geometric representation, topological consistency, and long-horizon reasoning. This paper proposes an agentic LLM framework for automated structural analysis of 3D frames from natural language inputs. Irregular 3D frames are represented by projection onto a 2D plan, where orthogonal gridlines define spatial coordinates and a matrix of story counts encodes the vertical extrusion of each grid cell. Building on this representation, the framework establishes a multi-agent pipeline: a problem analysis agent parses input into structured JSON; a floor decomposition agent derives the spatial layout of each floor; the 3D geometry is assembled by node, girder, slab, and column agents; support and load agents assign boundary and loading conditions; and code translation agents generate an executable SAP2000 script. Evaluated on ten representative 3D frames, the proposed framework achieves an average accuracy of 90% across repeated trials, demonstrating consistent and reliable performance.
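The plan-projection representation can be illustrated with a small sketch that expands gridlines and a story-count matrix into 3D node coordinates. The abstract does not specify the data layout, so everything here is an assumption made for illustration: the function name, the indexing convention (`stories[i][j]` for the cell between `x_lines[i]`, `x_lines[i+1]` and `y_lines[j]`, `y_lines[j+1]`), and a uniform story height. It does not generate SAP2000 input.

```python
from itertools import product

def frame_nodes(x_lines, y_lines, stories, story_height):
    """Expand a 2D plan into 3D frame nodes.

    x_lines, y_lines : gridline coordinates along x and y
    stories[i][j]    : number of stories extruded above grid cell (i, j)
    story_height     : assumed uniform height of one story
    """
    nodes = set()
    for i, row in enumerate(stories):
        for j, n_stories in enumerate(row):
            if n_stories == 0:
                continue  # empty cell: nothing is built above it
            # Each occupied cell contributes its four plan corners at every
            # level from the ground (0) up to its own story count.
            for di, dj in product((0, 1), repeat=2):
                for level in range(n_stories + 1):
                    nodes.add((x_lines[i + di], y_lines[j + dj],
                               level * story_height))
    return sorted(nodes)

# Two bays along x: the first is two stories tall, the second only one.
nodes = frame_nodes(x_lines=[0, 6, 12], y_lines=[0, 5],
                    stories=[[2], [1]], story_height=3)
```

Using a set means corners shared by neighbouring cells are stored once, which is one simple way to keep the node list topologically consistent when adjacent cells have different heights.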

2606.04009 2026-06-12 stat.ML cs.AI cs.LG Updated version

Counterfactual Explanations for Deep Two-Sample Testing

深度双样本检验的反事实解释

Wei-Cheng Lai, Marco Simnacher, Christoph Lippert

Affiliations: Hasso-Plattner-Institute, University of Potsdam; Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai

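AI summary: For deep two-sample testing, proposes a counterfactual explanation framework based on a diffusion autoencoder and MMD optimization that generates sample-level edits to reveal the features driving rejection of the null hypothesis.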

Comments 17 pages

AI-generated Chinese abstract

双样本检验是检测科学领域中分布差异的基本工具,但经典检验(包括基于核的检验)在高维结构化数据(如图像)上可能效果不佳。最近的深度双样本检验通过学习信息表示提高了这些场景下的灵敏度,但它们对哪些数据特征驱动拒绝原假设 $H_0$ 提供的洞察有限。为解决此问题,我们提出了一种用于深度双样本检验的反事实解释框架,该框架生成样本级编辑,将观测值从源组移向目标组,同时明确减少检验所测量的差异。我们的方法将扩散自编码器与预训练的深度双样本检验模型相结合,并在检验模型的表示空间中优化最大均值差异(MMD)目标,以生成合理的反事实。我们通过检验统计量和由此产生的双样本p值的变化来量化分布级效应。我们在合成2D形状数据集和两个MRI队列上评估了该方法。在这两种设置下,反事实变换相对于原始样本持续增加p值,表明编辑后的源集在检验下在统计上更接近目标分布。我们使用LPIPS测量最小性,以确保反事实保持接近原始样本。由此产生的编辑提供了与检测到的组差异相关的特征的可解释证据。在MRI上,局部变化与队列之间已知的解剖学差异一致。

English abstract

Two-sample testing is a fundamental tool for detecting distributional differences across scientific domains, but classical tests (including kernel-based tests) can be ineffective on high-dimensional structured data such as images. Recent deep two-sample tests improve sensitivity in these settings by learning informative representations, yet they provide limited insight into which data features drive rejection of the null hypothesis $H_0$. To address this issue, we propose a counterfactual explanation framework for deep two-sample testing that generates sample-level edits moving observations from a source group toward a target group while explicitly reducing the discrepancy measured by the test. Our method combines a diffusion autoencoder with a pretrained deep two-sample test model and optimizes a maximum mean discrepancy (MMD) objective in the test model's representation space to produce plausible counterfactuals. We quantify distribution-level effects through changes in the test statistic and the resulting two-sample p-values. We evaluate the method on synthetic 2D shape datasets and two MRI cohorts. Across both settings, the counterfactual transformations consistently increase p-values relative to the original samples, indicating that the edited source set becomes statistically closer to the target distribution under the test. We measure minimality using LPIPS to ensure the counterfactuals remain close to the original samples. The resulting edits provide interpretable evidence of the features associated with the detected group differences. On MRI, the localized changes are consistent with known anatomical differences between cohorts.
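
To make the MMD objective and the p-value comparison concrete, here is a minimal sketch. It assumes a Gaussian (RBF) kernel, a biased MMD² estimate, and a permutation test, and it works on plain feature vectors as a stand-in for the test model's representation space. The function names and the toy "edit" (a simple shift toward the target) are hypothetical and are not taken from the paper.

```python
import math
import random


def rbf(x, y, bandwidth=1.0):
    """Gaussian kernel between two feature vectors."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * bandwidth ** 2))


def mmd2(xs, ys, bandwidth=1.0):
    """Biased estimate of the squared maximum mean discrepancy."""
    def mean_k(a, b):
        return sum(rbf(u, v, bandwidth) for u in a for v in b) / (len(a) * len(b))
    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)


def permutation_p_value(xs, ys, n_perm=100, bandwidth=1.0, seed=0):
    """Permutation p-value for H0: both samples share one distribution."""
    rng = random.Random(seed)
    observed = mmd2(xs, ys, bandwidth)
    pooled = list(xs) + list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mmd2(pooled[:len(xs)], pooled[len(xs):], bandwidth) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy 1-D features: the target group is the source group shifted by 3.
source = [(0.1 * i,) for i in range(10)]
target = [(0.1 * i + 3.0,) for i in range(10)]
# Hypothetical "counterfactual edit": move each source point most of the way.
edited = [(x[0] + 2.9,) for x in source]

p_before = permutation_p_value(source, target)
p_after = permutation_p_value(edited, target)
print(mmd2(source, target), mmd2(edited, target), p_before, p_after)
```

In this toy setting the edit lowers MMD² and cannot lower the p-value, which mirrors the direction of the effect the abstract reports; it says nothing about the magnitude seen on real data.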

2606.02778 2026-06-12 astro-ph.EP astro-ph.IM cs.LG Updated version

One Transit Is All You Need: Detecting Exoplanets Through Learned Stellar Behaviour with EXOVEIL

一次凌星足矣:通过EXOVEIL学习恒星行为检测系外行星

Pratik Priyanshu

Affiliations: SRH Hochschule

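AI summary: Presents EXOVEIL, a system that uses a Transformer world model trained with self-supervised learning to detect single-transit events in raw light curves, reaching AUC 0.938 on Kepler DR25 and transferring zero-shot to confirmed TESS planets in the PLATO LOPS2 field.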

Comments v3: appendix gallery of confirmed-planet recoveries added; Section 6 candidate catalogue reframed as transit-like anomalies for follow-up; TLS comparison table expanded

AI-generated Chinese abstract

我提出EXOVEIL,一个凌星检测系统,它学习恒星亮度应有的样子,并在现实不符时发出标记。与需要相位折叠输入的现有系统不同,EXOVEIL在原始通量时间序列上运行,可以检测仅凌星一次的行星。一个Transformer世界模型,在16,499条Kepler光变曲线上通过凌星掩蔽自监督学习训练,预测预期的恒星通量。一个带有方差加权的匹配滤波检测器从预测残差中提取凌星信号。一个学习分类器(XGBoost)将行星与假阳性区分开,在Kepler DR25上达到AUC 0.938。应用于单次凌星注入-恢复,EXOVEIL在1000 ppm深度下恢复了32%的凌星——而所有基于分类的系统由于设计原因得分为0%。对3,737颗Kepler恒星进行盲搜索,发现了179个新的凌星类信号,这些信号不在DR25 TCE目录中,包括46个单次凌星候选者。无需重新训练,应用于PLATO LOPS2场中的47颗已确认TESS行星,EXOVEIL实现了100%的恢复,展示了零样本跨任务迁移。在PLATO的25秒曝光下,检测达到100 ppm——接近地球类似物范围。我提供了共形预测在凌星检测中的首次应用(95.9%经验覆盖率),并发布了该系统,可通过pip install exoveil安装,包含预训练权重和候选目录。

English abstract

I present EXOVEIL, a transit detection system that learns what a star's brightness should look like and flags when reality disagrees. Unlike existing systems that require phase-folded input, EXOVEIL operates on raw flux time series and can detect planets that transit only once. A Transformer world model, trained on 16,499 Kepler light curves with transit-masked self-supervised learning, predicts expected stellar flux. A matched-filter detector with variance weighting extracts transit signals from the prediction residuals. A learned classifier (XGBoost) separates planets from false positives, achieving AUC 0.938 on Kepler DR25. Applied to single-transit injection-recovery, EXOVEIL recovers 32% of transits at 1000 ppm depth, a task where all classification-based systems score 0% by construction. A blind search of 3,737 Kepler stars yields 179 new transit-like signals not present in the DR25 TCE catalogue, including 46 monotransit candidates. Applied without retraining to 47 confirmed TESS planets in the PLATO LOPS2 field, EXOVEIL achieves 100% recovery, demonstrating zero-shot cross-mission transfer. At PLATO's 25-second cadence, detection reaches 100 ppm, approaching the Earth-analog regime. I provide the first application of conformal prediction to transit detection (95.9% empirical coverage) and release the system as pip install exoveil with pretrained weights and a candidate catalogue.
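
The idea of a variance-weighted matched filter on prediction residuals can be sketched as follows. This is an illustrative assumption about the general technique, not the EXOVEIL code or its API: the box-shaped template, the function name `matched_filter_snr`, and the noise-free toy residuals are all hypothetical.

```python
import math


def matched_filter_snr(residuals, variances, duration):
    """Slide a box-shaped dip template over the residuals.

    For a template equal to -1 inside a window of `duration` samples, the
    inverse-variance-weighted signal-to-noise ratio of the window is
        sum(-r_i / var_i) / sqrt(sum(1 / var_i)).
    Returns one SNR value per window start index.
    """
    snrs = []
    for start in range(len(residuals) - duration + 1):
        num = 0.0
        den = 0.0
        for i in range(start, start + duration):
            num += -residuals[i] / variances[i]
            den += 1.0 / variances[i]
        snrs.append(num / math.sqrt(den))
    return snrs


# Toy residuals (observed flux minus predicted flux): flat except for a
# 1000 ppm dip lasting 5 samples, with a per-sample noise level of 500 ppm.
n = 100
residuals = [0.0] * n
for i in range(40, 45):
    residuals[i] = -0.001
variances = [0.0005 ** 2] * n

snrs = matched_filter_snr(residuals, variances, duration=5)
best = max(range(len(snrs)), key=snrs.__getitem__)
print(best, snrs[best])
```

Because each sample is weighted by the inverse of its variance, noisy stretches of the light curve contribute less to the statistic, so a single dip can stand out without phase folding.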

2606.01538 2026-06-12 cs.GR cs.CV cs.LG Updated version

MPMWorlds: Material-Point-Method Simulations for Inferring and Extrapolating Physical Dynamics

MPMWorlds: 用于推断和外推物理动力学的物质点法模拟

Žiga Kovačič, Kevin Ellis

Affiliations: Cornell University

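AI summary: Builds a dataset of 2D Material Point Method (MPM) simulations to study how well physical dynamics can be inferred from video and extrapolated in time, comparing the strengths and weaknesses of code generation and video diffusion approaches.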

Comments 16 pages, 13 figures. Project page: https://zzigak.github.io/mpmworlds/

AI-generated Chinese abstract

为了研究从视频推断物理动力学并将其向前外推的能力,我们组装了一个包含丰富物理现象(如可变形物体、流体、运动物体和发射器)的2D物质点法(MPM)物理模拟数据集。我们在此数据集上研究了代码生成和视频扩散方法,通过改变物理相关辅助信息的数量来识别它们的优缺点。代码生成模型除了提供自动合成MPM模拟的工作演示外,还揭示了这种方法在从视觉输入推断物理参数方面存在困难,但相对于视频扩散,它能产生物理和时间上稳定的向前外推结果,而视频扩散模型能更强烈地从视觉输入中识别几何属性,但会产生物理上不可信的外推结果。

English abstract

To study the ability to infer physical dynamics from videos and extrapolate them forward in time, we assemble a dataset of 2D Material Point Method (MPM) physical simulations covering rich physical phenomena such as deformable objects, fluids, kinetic objects, and emitters. We study code generation and video diffusion approaches on this dataset, identifying their strengths and weaknesses by varying the amount of physically relevant side information. The code generation model gives a working demonstration of automatic synthesis of MPM simulations. It struggles to infer physical parameters from visual input but, relative to video diffusion, produces physically and temporally stable extrapolations forward in time. The video diffusion model identifies geometric properties from visual input more strongly but produces physically implausible extrapolations.