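arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and other fields.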

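2510.16152 2026-06-11 cs.DL cs.AI cs.CL cs.LG Updated version

Mapping Scientific Literature with Large Language Models and Topic Modeling

Mason Smetana, Lev Khazanovich

Affiliations: Department of Civil and Environmental Engineering; University of Pittsburgh

AI summary: Proposes a two-stage LLM-based classification framework that analyzes engineering literature in PNAS from a topic modeling perspective, producing semantically interpretable topics and revealing cross-topic connections, with higher topic diversity and lower overlap than established topic models.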

Comments 35 pages, 10 figures. Accepted for publication in Scientometrics. Final version available via DOI

DOI: 10.1007/s11192-026-05643-9
Journal ref: Scientometrics (2026)

Abstract

Scientific literature is increasingly fragmented by disciplinary boundaries, specialized terminology, and potentially sparse keyword systems, making it difficult to capture the evolving structure of modern science. This study introduces a large language model (LLM)-driven framework for mapping scientific literature from a topic modeling perspective. The approach is demonstrated on a 20-year corpus of more than 1,500 engineering-related articles published in the Proceedings of the National Academy of Sciences (PNAS). A two-stage classification pipeline first assigns a primary thematic category to each article based on its abstract, followed by full-text analysis to identify secondary classifications that reveal latent cross-topic connections within the corpus. Unlike conventional topic models, the LLM-based framework produces semantically interpretable topics while maintaining strong quantitative performance. Comparative evaluation against established topic modeling methods shows higher topic diversity and lower overlap with competitive coherence metrics. Manual validation on a randomly sampled subset of abstracts yields an accuracy of 75.9%. Additional traditional natural language processing analyses confirm that the generated topics correspond to meaningful linguistic patterns in the corpus. A bipartite network linking primary and secondary classifications further reveals implicit thematic relationships that are not readily observable through abstracts or keyword systems alone. The findings indicate that the framework independently recovers much of the journal's editorial dual-classification structure without prior knowledge of its schema. Overall, the proposed approach offers a powerful tool for mapping science and identifying emerging cross-topic connections in research.
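
The abstract compares topic models by topic diversity and overlap without saying how these are computed. A minimal sketch, assuming two common definitions (diversity as the share of unique words among every topic's top words, overlap as the mean pairwise Jaccard similarity of those word sets); the function names and toy topics are hypothetical and not taken from the paper:

```python
from itertools import combinations


def topic_diversity(topics, top_k=10):
    """Fraction of unique words among the top_k words of every topic."""
    top_words = [word for topic in topics for word in topic[:top_k]]
    return len(set(top_words)) / len(top_words)


def mean_jaccard_overlap(topics, top_k=10):
    """Average pairwise Jaccard similarity between the topics' top_k word sets."""
    sets = [set(topic[:top_k]) for topic in topics]
    pairs = list(combinations(sets, 2))
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)


# Hypothetical toy topics; only "flow" is shared between two of them.
toy_topics = [
    ["fluid", "flow", "turbulence", "pressure"],
    ["polymer", "material", "stress", "flow"],
    ["network", "graph", "node", "edge"],
]
diversity = topic_diversity(toy_topics, top_k=4)
overlap = mean_jaccard_overlap(toy_topics, top_k=4)
```

Under these definitions a model whose topics share few top words scores close to 1 on diversity and close to 0 on overlap, which is the direction the abstract reports for the LLM-based framework.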

2510.02660 2026-06-11 cs.HC cs.AI Updated version

When Researchers Say Mental Model/Theory of Mind of AI, What Are They Really Talking About?

Xiaoyun Yin, Elmira Zahmat Doost, Shiwen Zhou, Garima Arya Yadav, Jamie C. Gorman

Affiliations: Center for Human, Artificial Intelligence, and Robot Teaming

AI summary: Argues that current research on AI theory of mind conflates behavioral prediction with genuine cognition, and proposes shifting to a framework of reciprocal theory of mind in human-AI interaction.

Comments This work has been accepted at CogInterp @ NeurIPS 2025

2508.17077 2026-06-11 stat.ML cs.LG Updated version

CP4SBI: Local Conformal Calibration of Credible Sets in Simulation-Based Inference

Luben M. C. Cabezas, Vagner S. Santos, Thiago R. Ramos, Pedro L. C. Rodrigues, Rafael Izbicki

Affiliations: Department of Statistics, Federal University of São Carlos; Institute of Mathematics and Computer Science, University of São Paulo; Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK

AI summary: Proposes the CP4SBI framework, which achieves local Bayesian coverage through regression trees and CDF calibration, provides finite-sample local coverage guarantees for arbitrary scoring functions, and improves the quality of uncertainty quantification in neural posterior estimation.

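2508.18636 2026-06-11 cs.SE cs.AI Updated version

LaQual: An Automated Framework for LLM App Quality Evaluation

Yan Wang, Xinyi Hou, Junjun Si, Yanjie Zhao, Weiguo Lin, Haoyu Wang

Affiliations: University of Science and Technology of China

AI summary: Proposes LaQual, an automated framework that evaluates LLM app quality through static indicator screening and dynamic scenario-adapted evaluation; its scores agree closely with human judgments, and screening reduces the candidate app pool by 66.7% to 81.3%.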

Abstract

Representing a new paradigm in software distribution, LLM app stores are rapidly emerging, offering users diverse choices for content generation, coding assistance, education, and more. However, current ranking and recommendation mechanisms in LLM app stores predominantly rely on static metrics, such as user interactions and favorites, making it challenging for users to efficiently identify high-quality apps. At the same time, current academic research focuses on specific vertical fields and lacks a general, automated evaluation framework applicable to the diverse LLM app ecosystem. To address the above challenges, we present LaQual, an automated framework for LLM app quality evaluation. LaQual integrates three key stages: (1) LLM app labeling and hierarchical classification for precise scenario mapping; (2) static indicator evaluation using time-weighted user engagement and functional capability indicators to filter low-quality apps; and (3) dynamic scenario-adapted evaluation, where an LLM generates scenario-specific evaluation metrics, scoring criteria, and tasks for comprehensive quality evaluation. Experiments on a mainstream LLM app store demonstrate the effectiveness of LaQual. Its automated scores show high consistency with human judgments. Through effective screening, LaQual can reduce the candidate LLM app pool by 66.7% to 81.3%. User studies further validate its significant outperformance over baseline systems, particularly in comparison efficiency (mean 5.45 vs. 3.30) and value of explanatory information (4.75 vs. 2.25). These results demonstrate that LaQual provides a scalable, objective, and user-centric solution for high-quality discovery and recommendation of LLM apps in real-world scenarios.
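
Stage (2) of the abstract filters apps with time-weighted user engagement but does not give the weighting. A minimal sketch, assuming an exponential decay with a 30-day half-life and a fixed threshold; the decay form, the half-life, the threshold, and all names below are hypothetical choices, not LaQual's actual indicators:

```python
import math


def time_weighted_engagement(events, now_day, half_life_days=30.0):
    """Sum interaction counts, halving an event's weight every half_life_days."""
    decay = math.log(2) / half_life_days
    return sum(count * math.exp(-decay * (now_day - day)) for day, count in events)


def filter_apps(apps, now_day, threshold):
    """Keep the apps whose time-weighted engagement reaches the threshold."""
    return [
        name
        for name, events in apps.items()
        if time_weighted_engagement(events, now_day) >= threshold
    ]


# Hypothetical apps: (day, interaction count) pairs with equal raw counts.
apps = {
    "app_recent": [(90, 100)],  # 100 interactions today
    "app_stale": [(0, 100)],    # 100 interactions 90 days (three half-lives) ago
}
kept = filter_apps(apps, now_day=90, threshold=50)
stale_score = time_weighted_engagement(apps["app_stale"], now_day=90)
```

Both toy apps have the same raw interaction count, so a purely static ranking would tie them; the decay is what separates the recently used app from the stale one.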

2508.10807 2026-06-11 quant-ph cs.LG math.OC Updated version

Parity Cross-Resonance: A Multiqubit Gate

Xuexin Xu, Siyu Wang, Radhika Joshi, Rihan Hai, Mohammad H. Ansari

Affiliations: Peter Grünberg Institute, Forschungszentrum Jülich; Jülich-Aachen Research Alliance (JARA); Fundamentals of Future Information Technologies; Institute for Quantum Information, RWTH Aachen University; Department of Software Technology, Delft University of Technology

AI summary: Proposes a native three-qubit entangling gate that uses a hybrid optimization approach to realize control-control-target and control-target-target operations, with applications to GHZ state preparation, Toffoli-class logic, and a controlled-ZZ gate that enables faster, higher-fidelity stabilizer measurements in the surface code.

Comments 19 pages, 10 figures

DOI: 10.1103/6d5v-vrm4
Journal ref: Phys. Rev. Applied 25, 044045 (2026)

Abstract

We present a native three-qubit entangling gate that exploits engineered interactions to realize control-control-target and control-target-target operations in a single coherent step. Unlike conventional decompositions into multiple two-qubit gates, our hybrid optimization approach selectively amplifies desired interactions while suppressing unwanted couplings, yielding robust performance across the computational subspace and beyond. The new gate can be classified as a cross-resonance gate. We show it can be utilized in several ways, for example, in GHZ triplet state preparation, Toffoli-class logic demonstrations with many-body interactions, and in implementing a controlled-ZZ gate. The latter maps the parity of two data qubits directly onto a measurement qubit, enabling faster and higher-fidelity stabilizer measurements in surface-code quantum error correction. In all these examples, we show that the three-qubit gate performance remains robust across Hilbert space sizes, as confirmed by testing under increasing total excitation numbers. This work lays the foundation for co-designing circuit architectures and control protocols that leverage native multiqubit interactions as core elements of next-generation superconducting quantum processors.

2505.17623 2026-06-11 cs.CR cs.AI cs.ET cs.LG cs.PF Updated version

Range-Arithmetic: Verifiable Deep Learning Inference on an Untrusted Party

Ali Rahimi, Babak H. Khalaj, Mohammad Ali Maddah-Ali

Affiliations: University of California, Berkeley

AI summary: Proposes the Range-Arithmetic framework, which converts non-arithmetic operations into verifiable arithmetic steps to enable efficient verification of deep neural network inference, reducing computation and communication overhead.

Abstract

Verifiable computing (VC) has gained prominence in decentralized machine learning systems, where resource-intensive tasks like deep neural network (DNN) inference are offloaded to external participants due to blockchain limitations. This creates a need to verify the correctness of outsourced computations without re-execution. We propose Range-Arithmetic, a novel framework for efficient and verifiable DNN inference that transforms non-arithmetic operations, such as rounding after fixed-point matrix multiplication and ReLU, into arithmetic steps verifiable using sum-check protocols and concatenated range proofs. Our approach avoids the complexity of Boolean encoding, high-degree polynomials, and large lookup tables while remaining compatible with finite-field-based proof systems. Experimental results show that our method not only matches the performance of existing approaches, but also reduces the computational cost of verifying the results, the computational effort required from the untrusted party performing the DNN inference, and the communication overhead between the two sides.
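
The idea of turning rounding and ReLU into an arithmetic identity plus a range constraint can be illustrated over plain integers. This is only a sketch of the general principle: the paper works over a finite field with sum-check protocols and range proofs, none of which are modelled here, and the function names are hypothetical:

```python
def rounding_witness(x, k):
    """Prover side: quotient y and remainder r for round-to-nearest division by 2**k."""
    shifted = x + (1 << (k - 1))
    y = shifted >> k           # floor shift, also valid for negative Python ints
    r = shifted - (y << k)
    return y, r


def check_rounding(x, y, r, k):
    """Verifier side: one arithmetic identity plus one range constraint on r."""
    return x + (1 << (k - 1)) == (y << k) + r and 0 <= r < (1 << k)


def check_relu(x, y, bits):
    """y == max(x, 0) iff y * (y - x) == 0 with y and y - x both in [0, 2**bits)."""
    return y * (y - x) == 0 and 0 <= y < (1 << bits) and 0 <= y - x < (1 << bits)


# Rescale a fixed-point product with k = 4 fractional bits: 1000 / 16 = 62.5 -> 63.
y, r = rounding_witness(1000, 4)
honest = check_rounding(1000, y, r, 4)
# A cheating prover can satisfy the identity alone, but not the range constraint.
cheat = check_rounding(1000, 62, 16, 4)
```

The cheating case shows why the range check matters: the claimed quotient 62 with remainder 16 satisfies the equation, and only the bound on the remainder rejects it.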

2505.08784 2026-06-11 stat.ML cs.LG math.ST stat.ME stat.TH Updated version

PCS-UQ: Uncertainty Quantification via the Predictability-Computability-Stability Framework

Abhineet Agarwal, Fange Xiao, Rebecca Barter, Omer Ronen, Boyu Fan, Bin Yu

Affiliations: Department of Statistics, University of California, Berkeley; Department of Epidemiology, University of Utah; Department of Electrical Engineering and Computer Science, University of California, Berkeley

AI summary: Proposes the PCS-UQ framework, which quantifies uncertainty through prediction checks, bootstrap sampling, and multiplicative calibration; it outperforms or matches conformal prediction methods on regression and classification tasks and comes with a theoretical coverage guarantee.

Abstract

As machine learning (ML) enters high-stakes domains, trustworthy uncertainty quantification (UQ) is essential for safety. In this paper we introduce PCS-UQ, a framework based on the Predictability, Computability, and Stability (PCS) principles for veridical data science. Starting with a candidate set of models or algorithms, PCS-UQ integrates a rigorous prediction-check to screen out unsuitable models in the set and utilizes bootstrap samples to capture both inter-sample variability and algorithmic instability for the prediction-checked algorithms. We then introduce a novel multiplicative calibration scheme to enhance local adaptivity, which essentially corresponds to a new score in conformal prediction. Moreover, we produce a compilation of 17 real-world regression datasets with manually constructed subgroups. On this benchmark, PCS-UQ maintains the target coverage while outperforming or matching conformal methods equipped with oracle-selected algorithms in interval width. PCS-UQ achieves consistent subgroup coverage, outperforming these oracle-selected conformal methods. Notably, PCS-UQ stands out in achieving both competitive interval widths and consistent subgroup coverage. Across 6 classification datasets, PCS-UQ reduces prediction set sizes by 20%. To scale the framework for deep learning, we propose computationally efficient variants that bypass expensive retraining. On three computer vision benchmarks, these variants reduce prediction set sizes by 20% over conformal baselines. Finally, we provide a theoretical proof that a modified PCS-UQ algorithm preserves valid coverage under exchangeability as a form of split conformal inference.

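2504.21072 2026-06-11 cs.CR cs.AI cs.LG Updated version

Erased but Not Forgotten: How Backdoors Compromise Concept Erasure

Tobias Braun, Jonas Henry Grebe, Marcus Rohrbach, Anna Rohrbach

Affiliations: GitHub

AI summary: Reveals a vulnerability called the Erasure Evasion Backdoor (EEB): an adversary binds a backdoor trigger to a concept slated for erasure, and because this malicious link survives subsequent erasure, multiple concept erasure methods are bypassed.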

Abstract

The expansion of text-to-image diffusion models has raised concerns about harmful outputs, from fabricated depictions of public figures to sexually explicit imagery. To mitigate such risks, prior work has proposed concept erasure methods that aim to sever unwanted concepts from the model via fine-tuning, yet it remains unclear whether these approaches truly remove all links to the harmful concept or merely conceal superficial connections. In this work, we reveal a critical vulnerability, the Erasure Evasion Backdoor (EEB): an adversary binds a backdoor trigger to a concept slated for removal, and this malicious link survives subsequent erasure. We show that both black-box and white-box adversaries can instantiate this threat. Across six state-of-the-art erasure methods, including robust ones that explicitly search for alternative representations of the target concept, EEB consistently exposes harmful content: up to 82% success against celebrity-identity unlearning, up to 94% for object erasure, and up to 16 times amplification of explicit-content exposure. While EEB uncovers a blind spot in current erasure methods, it also provides a diagnostic tool for stress-testing future concept erasure techniques.

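2410.24145 2026-06-11 stat.ML cs.LG stat.ME Updated version

Projected random forests and conformal prediction of circular data

Paulo C. Marques F., Rinaldo Artes, Helton Graziadei

Affiliations: Insper University; University of São Paulo

AI summary: Applies conformal prediction to regression with a circular response: a projection approach turns linear regression models into circular ones, and the out-of-bag mechanism of random forests removes the need for a separate calibration sample, yielding prediction sets with finite-sample coverage guarantees and adaptive arc lengths.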

Comments 7 pages; 4 figures

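2409.12707 2026-06-11 physics.flu-dyn cs.LG Updated version

Machine-learning-based multipoint optimization of fluidic injection parameters for improving nozzle performance

Yunjia Yang, Jiazhe Li, Yufei Zhang, Haixin Chen

Affiliations: Tsinghua University

AI summary: For overexpanded single expansion ramp nozzles, uses a pretrained neural network in place of CFD for multipoint optimization, adopts a prior-based prediction strategy to improve accuracy, and computes gradients quickly via back-propagation; optimizing the average thrust coefficient over seven design points yields a 1.14% improvement.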

Abstract

Fluidic injection offers a promising solution to improve the performance of overexpanded single expansion ramp nozzles (SERNs) during vehicle acceleration. However, determining the injection parameters that yield the best overall performance across multiple nozzle operating conditions remains a challenge. The gradient-based optimization method requires gradients of injection parameters at each design point, which can lead to high computational costs when using computational fluid dynamics (CFD) simulations. This paper uses a pretrained neural network to replace CFD during optimization, enabling quick calculation of the nozzle flow field at multiple design points. Considering the physical characteristics of the nozzle flow field, a prior-based prediction strategy is adopted to enhance the model's accuracy. In addition, the neural network's back-propagation algorithm computes gradients quickly by running the computation only once, thereby greatly reducing gradient computation time compared to the finite difference method. As a test case, the average nozzle thrust coefficient of an SERN at seven design points is optimized, resulting in a 1.14% improvement. The time cost is greatly reduced compared with traditional optimization methods, even when the time required to establish the training database is included.
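
The cost argument for back-propagation over finite differences can be seen on a toy model. A minimal sketch, assuming a tiny one-hidden-layer network with made-up weights in place of the paper's pretrained flow-field surrogate; every name and number below is hypothetical:

```python
import math

# Hypothetical fixed weights standing in for a trained surrogate network.
W1 = [[0.5, -0.3, 0.8], [-0.7, 0.2, 0.1], [0.4, 0.9, -0.6]]
W2 = [1.2, -0.5, 0.7]


def surrogate(x):
    """Tiny network: y = sum_j W2[j] * tanh(W1[j] . x)."""
    return sum(
        w2 * math.tanh(sum(w * xi for w, xi in zip(row, x)))
        for w2, row in zip(W2, W1)
    )


def grad_backprop(x):
    """All partial derivatives from one forward and one backward pass."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    delta = [w2 * (1.0 - h * h) for w2, h in zip(W2, hidden)]
    return [sum(d * row[i] for d, row in zip(delta, W1)) for i in range(len(x))]


def grad_finite_difference(x, eps=1e-6):
    """Central differences: two extra network evaluations per input parameter."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        up[i] += eps
        down = list(x)
        down[i] -= eps
        grad.append((surrogate(up) - surrogate(down)) / (2 * eps))
    return grad


x0 = [0.2, -0.5, 1.0]  # stand-in for a vector of injection parameters
g_bp = grad_backprop(x0)
g_fd = grad_finite_difference(x0)
```

Both routines return the same gradient up to discretization error, but the finite-difference version needs 2n model evaluations for n parameters, while the backward pass costs roughly one extra evaluation regardless of n.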

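2304.13905 2026-06-11 cs.CR cs.AI cs.LG Updated version

LSTM based IoT Device Identification

Kahraman Kostas

Affiliations: Kahraman Kostas

AI summary: Proposes an end-to-end machine learning pipeline that processes raw network packets with an LSTM network and identifies 27 classes of IoT devices from sliding-window time-series features, reaching 79.85% accuracy and a 75.70% macro-averaged F1-score with the optimal configuration.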

Abstract

While the use of the Internet of Things is becoming more and more popular, many security vulnerabilities are emerging with the large number of devices being introduced to the market. In this environment, IoT device identification methods provide a preventive security measure as an important factor in identifying these devices and detecting the vulnerabilities they suffer from. In this study, we present an end-to-end machine learning pipeline that identifies IoT devices in the Aalto University dataset (IoT devices captures) using Long Short-Term Memory (LSTM) networks. Raw network packet captures (PCAP) are processed into 25 engineered features, which are then arranged as sliding-window time-series sequences. We systematically evaluate sequence lengths from 2 to 20, reporting that performance improves approximately linearly up to length 6 and thereafter in a wave-like pattern, reaching its peak at length 18. On the final held-out test set with the optimal configuration, the model achieves an accuracy of 79.85% and a macro-averaged F1-score of 75.70% across 27 device classes.
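
Arranging per-packet features as sliding-window sequences can be sketched in a few lines. This assumes a stride of one packet and labels each window with its last packet's label; both choices, the function name, and the toy data are hypothetical, since the abstract does not specify them:

```python
def sliding_windows(rows, labels, seq_len):
    """Turn per-packet feature rows into overlapping sequences of seq_len rows.

    Each sequence takes the label of its last packet (an assumption).
    """
    sequences, targets = [], []
    for end in range(seq_len, len(rows) + 1):
        sequences.append(rows[end - seq_len:end])
        targets.append(labels[end - 1])
    return sequences, targets


# Hypothetical data: 5 packets with 2 features each (the paper uses 25 features).
rows = [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
labels = ["cam", "cam", "cam", "cam", "cam"]
seqs, tgts = sliding_windows(rows, labels, seq_len=3)
```

With n packets and sequence length L this produces n - L + 1 sequences of shape (L, features), the layout an LSTM layer typically expects; the sequence length is the quantity the study sweeps from 2 to 20.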

2605.22509 2026-06-11 cs.HC cs.CL

Reflecti-Mate: A Conversational Agent for Adaptive Decision-Making Support Through System 1 and System 2 Thinking

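Morita Tarvirdians, Senthil Chandrasegaran, Hayley Hung, Catholijn M. Jonker, Catharine Oertel

Affiliations: TU Delft; TU Delft/Leiden University

AI summary: Studies a conversational agent that encourages integration in decision making by adapting to the individual user's thought patterns; compared with a baseline agent, it enables more personalized reflective trajectories and elicits more integrative reflective language.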

Comments Accepted at UMAP 2026

DOI: 10.1145/3774935.3806176
Journal ref: UMAP 2026: Proceedings of the 34th ACM Conference on User Modeling, Adaptation and Personalization

Abstract

Making high-stakes personal decisions involves cognitive, emotional, and intuitive processes, and individuals differ in how they allocate attention across these modes. Integration of these processes has been shown to benefit decision making. Yet, most current decision-support systems focus primarily on supporting cognitive aspects, rather than adapting to the individual's thinking profile to support integration of different types of thoughts. In this study, we investigate an agent designed to encourage integration by adapting to the individual user's thought patterns. We explore its effects on participants' perceptions of the agent and their reflective behavior, in comparison with unaided pre-reflection and a baseline agent. In a between-subjects study (N = 128), our agent, which fostered broad and elaborated thinking, enabled more personalized reflective trajectories, elicited more integrative reflective language, and was perceived as providing stronger support for holistic reflection. In contrast, the baseline agent produced homogenized profiles dominated by cognitive language across participants.

2412.01459 2026-06-11 cs.CY cs.AI cs.HC

Perception Gaps in Risk, Benefit, and Value Between Experts and Public Challenge Socially Accepted AI

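Philipp Brauner, Felix Glawe, Gian Luca Liehner, Luisa Vervier, Martina Ziefle

Affiliations: RWTH Aachen University

AI summary: Compares how the general public and AI experts perceive AI's capabilities and impact across 71 scenarios; experts are more optimistic while the public weighs risks more heavily, pointing to a need for communication and policy interventions.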

DOI: 10.1007/s00146-026-03023-8
Journal ref: AI & Society (2026)

Abstract

Artificial Intelligence (AI) is reshaping many societal domains, raising critical questions about its risks, benefits, and the potential misalignment between public and academic perspectives. This study examines how the general public (N=1110) -- individuals who interact with or are impacted by AI technologies -- and academic AI experts (N=119) -- those elites shaping AI development -- perceive AI's capabilities and impact across 71 scenarios. These scenarios span domains such as sustainability, healthcare, job performance, societal inequality, art, and warfare. Participants evaluated these scenarios across four dimensions using the psychometric model: likelihood, perceived risk and benefit, and overall value (or sentiment). The results suggest significant differences: experts consistently anticipate higher probabilities, perceive lower risks, report greater benefits, and express more positive sentiment toward AI than non-experts. Moreover, the two groups apply different weighting schemes: experts discount risk more heavily relative to benefit than non-experts. Visual mappings of these evaluations uncover areas of convergent evaluations (e.g., AI performing medical diagnoses or criminal use) as well as tension points (e.g., decision of legal cases, political decision making), highlighting areas where communication and policy interventions may be needed. These findings underscore a critical translational challenge: if AI research and deployment are to align with societal priorities, the perception gap between developers and the public must be better understood and addressed. Our results provide an empirical foundation for value-sensitive AI governance and trust-building strategies across stakeholder groups.

2602.13513 2026-06-11 math.OC cs.CE cs.LG cs.NA math.DS math.NA

Learning Gradient Flow: Using Equation Discovery to Accelerate Engineering Optimization

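Grant Norman, Conor Rowan, Kurt Maute, Alireza Doostan

Affiliations: Smead Aerospace Engineering Sciences

AI summary: Uses data-driven equation discovery to learn the continuous-time dynamics of optimization runs and proposes the Learned Gradient Flow optimizer, which builds surrogate models of variable polynomial order to speed up convergence.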

Comments 44 pages, 13 figures. Submitted to CMAME. Changed Topology Optimization example to be 250% acceleration

DOI: 10.1016/j.cma.2026.119099

Abstract

In this work, we investigate the use of data-driven equation discovery for dynamical systems to model and forecast continuous-time dynamics of unconstrained optimization problems. To avoid expensive evaluations of the objective function and its gradient, we leverage trajectory data on the optimization variables to learn the continuous-time dynamics associated with gradient descent, Newton's method, and ADAM optimization. The discovered gradient flows are then solved as a surrogate for the original optimization problem. To this end, we introduce the Learned Gradient Flow (LGF) optimizer, which is equipped to build surrogate models of variable polynomial order in full- or reduced-dimensional spaces at user-defined intervals in the optimization process. We demonstrate the efficacy of this approach on several standard problems from engineering mechanics and scientific machine learning, including two inverse problems, structural topology optimization, and two forward solves with different discretizations. Our results suggest that the learned gradient flows can significantly expedite convergence by capturing critical features of the optimization trajectory while avoiding expensive evaluations of the objective and its gradient.
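
The core loop described above (record a short optimizer trajectory, fit continuous-time dynamics to it, then integrate the fitted model instead of calling the objective) can be sketched in one dimension. This is a deliberately simplified illustration with a linear model and a quadratic objective; the LGF optimizer's variable polynomial order and reduced-dimensional spaces are not modelled, and all names are hypothetical:

```python
def gradient_descent_trajectory(x0, curvature, lr, steps):
    """Iterates of gradient descent on f(x) = 0.5 * curvature * x**2."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] - lr * curvature * xs[-1])
    return xs


def fit_linear_flow(xs, lr):
    """Least-squares fit of dx/dt = a * x, with dx/dt from forward differences."""
    states = xs[:-1]
    rates = [(nxt - cur) / lr for cur, nxt in zip(xs[:-1], xs[1:])]
    return sum(s * r for s, r in zip(states, rates)) / sum(s * s for s in states)


def forecast(x_start, a, horizon, dt=1e-3):
    """Integrate the learned flow with explicit Euler; no objective evaluations."""
    x = x_start
    for _ in range(int(round(horizon / dt))):
        x += dt * a * x
    return x


# 20 real optimizer steps provide the trajectory data.
xs = gradient_descent_trajectory(x0=4.0, curvature=2.0, lr=0.01, steps=20)
a_hat = fit_linear_flow(xs, lr=0.01)   # recovers the gradient-flow rate -curvature
x_future = forecast(xs[-1], a_hat, horizon=5.0)
```

Here the fitted coefficient equals the negative curvature, so integrating the surrogate flow moves the iterate close to the minimizer at 0 without any further gradient evaluations; in realistic problems the fitted model is only approximate and would need periodic refitting.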

2601.07436 2026-06-11 eess.SP cs.LG physics.optics

PIDT: Physics-Informed Digital Twin for Optical Fiber Parameter Estimation

Zicong Jiang, Magnus Karlsson, Erik Agrell, Christian Häger

Affiliations: Dept. of Electrical Engineering, Chalmers Univ. of Technology, Sweden; Dept. of Microtechnology and Nanoscience, Chalmers Univ. of Technology, Sweden

AI summary: Proposes a physics-informed digital twin (PIDT) that combines a parameterized split-step method with a physics-informed loss function to improve the accuracy and convergence speed of optical fiber parameter estimation at lower complexity.

Comments The paper will appear at the Optical Fiber Communications Conference and Exhibition (OFC) 2026

2512.20464 2026-06-11 physics.optics cs.CV cs.NE physics.app-ph

Snapshot 3D image projection using a diffractive decoder

Cagatay Isil, Alexander Chen, Yuhang Li, F. Onuralp Ardic, Shiqi Chen, Che-Yung Shen, Aydogan Ozcan

Affiliations: Electrical and Computer Engineering Department, University of California, Los Angeles, CA, 90095, USA; Bioengineering Department, University of California, Los Angeles, CA, 90095, USA; California NanoSystems Institute (CNSI), University of California, Los Angeles, CA, 90095, USA

AI summary: Proposes a 3D display system built from a digital encoder and a diffractive optical decoder; multi-layer diffractive wavefront decoding and deep learning-based optimization enable high-fidelity snapshot 3D image projection with axial plane separations on the order of a wavelength and dynamic reconfiguration of the image planes.

Comments 22 Pages, 8 Figures

DOI: 10.1038/s41377-026-02378-3
Journal ref: Light: Science & Applications (2026)

Abstract

3D image display is essential for next-generation volumetric imaging; however, dense depth multiplexing for 3D image projection remains challenging because diffraction-induced cross-talk rapidly increases as the axial image planes get closer. Here, we introduce a 3D display system comprising a digital encoder and a diffractive optical decoder, which simultaneously projects different images onto multiple target axial planes with high axial resolution. By leveraging multi-layer diffractive wavefront decoding and deep learning-based end-to-end optimization, the system achieves high-fidelity depth-resolved 3D image projection in a snapshot, enabling axial plane separations on the order of a wavelength. The digital encoder leverages a Fourier encoder network to capture multi-scale spatial and frequency-domain features from input images, integrates axial position encoding, and generates a unified phase representation that simultaneously encodes all images to be axially projected in a single snapshot through a jointly-optimized diffractive decoder. We characterized the impact of diffractive decoder depth, output diffraction efficiency, spatial light modulator resolution, and axial encoding density, revealing trade-offs that govern axial separation and 3D image projection quality. We further demonstrated the capability to display volumetric images containing 28 axial slices, as well as the ability to dynamically reconfigure the axial locations of the image planes, performed on demand. Finally, we experimentally validated the presented approach, demonstrating close agreement between the measured results and the target images. These results establish the diffractive 3D display system as a compact and scalable framework for depth-resolved snapshot 3D image projection, with potential applications in holographic displays, AR/VR interfaces, and volumetric optical computing.

2412.13841 2026-06-11 cs.CY cs.AI cs.HC

Cultural Dimensions of AI Perception: Charting Expectations, Risks, Benefits, Tradeoffs, and Value in Germany and China

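Philipp Brauner, Felix Glawe, Gian Luca Liehner, Luisa Vervier, Martina Ziefle

Affiliations: RWTH Aachen University

AI summary: Compares how participants in Germany and China weigh expectations, risks, and benefits of AI, showing how cultural context shapes AI acceptance and offering insights for aligning AI with societal values.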

DOI: 10.1016/j.actpsy.2026.107094
Journal ref: Acta Psychologica (2026), volume 268, article 107094

Abstract

As artificial intelligence (AI) continues to advance, understanding public perceptions -- including biases, risks, and benefits -- is essential for guiding research priorities and AI alignment, shaping public discourse, and informing policy. This exploratory study investigates cultural differences in mental models of AI using 71 imaginaries of AI's potential futures. Drawing on cross-cultural convenience samples from Germany (N=52) and China (N=60), we identify significant differences in expectations, evaluations, and risk-benefit tradeoffs. Participants from Germany generally provided more cautious assessments, whereas participants from China expressed greater optimism regarding AI's societal benefits. Chinese participants exhibited relatively balanced risk-benefit tradeoffs ($β=-0.463$ for risk and $β=+0.484$ for benefit, $r^2=.630$). In contrast, German participants placed greater emphasis on AI's benefits and comparatively less on risks ($β=-0.337$ for risk and $β=+0.715$ for benefit, $r^2=.839$). Visual cognitive maps illustrate these contrasts, offering new perspectives on how cultural contexts shape AI acceptance. Our findings highlight key factors influencing public perception and provide insights for aligning AI with societal values and promoting equitable and culturally sensitive integration of AI technologies.

2508.11703 2026-06-11 cs.NE cs.LG

Data-Driven Discovery of Interpretable Kalman Filter Variants through Large Language Models and Genetic Programming

Vasileios Saketos, Sebastian Kaltenbach, Sergey Litvinov, Petros Koumoutsakos

Affiliations: University of Reading; University of Cambridge

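AI summary: This paper examines whether Kalman filter variants can be discovered automatically through genetic programming and large language models, and shows that the framework finds near-optimal solutions or interpretable alternatives under different conditions.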

DOI: 10.1007/978-3-032-23607-4_13

Abstract

Algorithmic discovery has traditionally relied on human ingenuity and extensive experimentation. Here we investigate whether a prominent scientific computing algorithm, the Kalman Filter, can be discovered through an automated, data-driven, evolutionary process that relies on Cartesian Genetic Programming (CGP) and Large Language Models (LLM). We evaluate the contributions of both modalities (CGP and LLM) in discovering the Kalman filter under varying conditions. Our results demonstrate that our framework of CGP and LLM-assisted evolution converges to near-optimal solutions when Kalman optimality assumptions hold. When these assumptions are violated, our framework evolves interpretable alternatives that outperform the Kalman filter. These results demonstrate that combining evolutionary algorithms and generative models for interpretable, data-driven synthesis of simple computational modules is a potent approach for algorithmic discovery in scientific computing.
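
For orientation, the baseline that the evolutionary search is compared against can be written in a few lines. The sketch below is a textbook scalar Kalman filter, assuming a random-walk state model with process-noise variance `q` and measurement-noise variance `r`; the function name and defaults are illustrative and are not taken from the paper.

```python
def kalman_1d(measurements, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter for x_k = x_{k-1} + w_k, z_k = x_k + v_k.

    q: process-noise variance, r: measurement-noise variance (assumed known).
    Returns the filtered state estimate after each measurement.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the state is carried over, uncertainty grows by q.
        p = p + q
        # Update: blend prediction and measurement using the Kalman gain.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates


if __name__ == "__main__":
    print(kalman_1d([5.0] * 10, q=1e-4, r=1.0)[-1])
```

When the Gaussian, linear assumptions behind this update hold, it is the optimal linear estimator, which is the regime in which the abstract reports convergence to near-optimal solutions.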

2503.08379 2026-06-11 cs.IR cs.CL

JurisTCU: A Brazilian Portuguese Information Retrieval Dataset with Query Relevance Judgments

Leandro Carísio Fernandes, Leandro dos Santos Ribeiro, Marcos Vinícius Borela de Castro, Leonardo Augusto da Silva Pacheco, Edans Flávius de Oliveira Sandes

Affiliations: Câmara dos Deputados; Tribunal de Contas da União (TCU)

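AI summary: This paper introduces JurisTCU, a Brazilian Portuguese information retrieval dataset of 16,045 jurisprudential documents from the Brazilian Federal Court of Accounts and 150 queries annotated with relevance judgments produced by a hybrid approach. Experiments show that document expansion significantly improves BM25 and that OpenAI embedding models perform best on short keyword-based queries.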

Comments 23 pages

DOI: 10.1007/s10579-025-09881-w

Abstract

This paper introduces JurisTCU, a Brazilian Portuguese dataset for legal information retrieval (LIR). The dataset is freely available and consists of 16,045 jurisprudential documents from the Brazilian Federal Court of Accounts, along with 150 queries annotated with relevance judgments. It addresses the scarcity of Portuguese-language LIR datasets with query relevance annotations. The queries are organized into three groups: real user keyword-based queries, synthetic keyword-based queries, and synthetic question-based queries. Relevance judgments were produced through a hybrid approach combining LLM-based scoring with expert domain validation. We used JurisTCU in 14 experiments using lexical search (document expansion methods) and semantic search (BERT-based and OpenAI embeddings). We show that the document expansion methods significantly improve the performance of standard BM25 search on this dataset, with improvements exceeding 45% in P@10, R@10, and nDCG@10 metrics when evaluating short keyword-based queries. Among the embedding models, the OpenAI models produced the best results, with improvements of approximately 70% in P@10, R@10, and nDCG@10 metrics for short keyword-based queries, suggesting that these dense embeddings capture semantic relationships in this domain, surpassing the reliance on lexical terms. Besides offering a dataset for the Portuguese-language IR research community, suitable for evaluating search systems, the results also contribute to enhancing a search system highly relevant to Brazilian citizens.
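
The three metrics reported above can be computed from a ranked result list and a set of relevance judgments. The sketch below uses the common definitions (linear gain with a log2 discount for nDCG); the function names and the graded-relevance dictionary are illustrative assumptions, and the paper may use a different nDCG variant.

```python
import math


def precision_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for d in ranked_ids[:k] if d in relevant_ids) / k


def recall_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of all relevant documents found in the top-k results."""
    if not relevant_ids:
        return 0.0
    return sum(1 for d in ranked_ids[:k] if d in relevant_ids) / len(relevant_ids)


def ndcg_at_k(ranked_ids, gains, k=10):
    """nDCG@k with linear gain; gains maps doc id -> graded relevance (missing = 0)."""
    dcg = sum(gains.get(d, 0) / math.log2(i + 2)
              for i, d in enumerate(ranked_ids[:k]))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    gains = {"d1": 3, "d2": 1, "d9": 2}   # hypothetical judgments for one query
    ranked = ["d3", "d1", "d7", "d2"]     # hypothetical system output
    print(precision_at_k(ranked, set(gains), k=4))
    print(recall_at_k(ranked, set(gains), k=4))
    print(ndcg_at_k(ranked, gains, k=4))
```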

2412.12944 2026-06-11 math.OC cs.CV

Online optimisation for dynamic electrical impedance tomography

Neil Dizon, Jyrki Jauhiainen, Tuomo Valkonen

Affiliations: School of Mathematics and Statistics, University of New South Wales; Department of Technical Physics, University of Eastern Finland

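AI summary: This paper proposes an online primal-dual optimization method for nonlinear time-discrete inverse problems, analyzes it through regret theory, demonstrates it on real-time monitoring of bodies moving in a fluid, and proves the second-order differentiability of the CEM solution operator on L^∞.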

2502.09084 2026-06-11 cs.CR cs.LG cs.NI

Application of Tabular Transformer Architectures for Operating System Fingerprinting

Rubén Pérez-Jove, Cristian R. Munteanu, Alejandro Pazos, Jose Vázquez-Naya

Affiliations: RNASNA-IMEDIR Research Group, Department of Computer Science and Information Technologies, Facultad de Informática, Universidade da Coruña; CITIC Research Centre, Universidade da Coruña; IKERDATA S.L

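AI summary: This paper applies Tabular Transformer architectures to operating system fingerprinting; on three public datasets, FT-Transformer generally performs best across classification levels, improving accuracy and adaptability in complex network environments.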

Comments Submitted as a preprint (not peer reviewed). 22 pages, 9 figures. Code and datasets available at: https://github.com/rubenpjove/tabularT-OS-fingerprinting

DOI: 10.1186/s42400-025-00494-y

Abstract

Operating System (OS) fingerprinting is essential for network management and cybersecurity, enabling accurate device identification based on network traffic analysis. Traditional rule-based tools such as Nmap and p0f face challenges in dynamic environments due to frequent OS updates and obfuscation techniques. While Machine Learning (ML) approaches have been explored, Deep Learning (DL) models, particularly Transformer architectures, remain unexploited in this domain. This study investigates the application of Tabular Transformer architectures-specifically TabTransformer and FT-Transformer-for OS fingerprinting, leveraging structured network data from three publicly available datasets. Our experiments demonstrate that FT-Transformer generally outperforms traditional ML models, previous approaches and TabTransformer across multiple classification levels (OS family, major, and minor versions). The results establish a strong foundation for DL-based OS fingerprinting, improving accuracy and adaptability in complex network environments. Furthermore, we ensure the reproducibility of our research by providing an open-source implementation.

2406.07909 2026-06-11 eess.AS cs.CL cs.SD stat.ML

Guiding Frame-Level CTC Alignments Using Self-knowledge Distillation

Eungbeom Kim, Hantae Kim, Kyogu Lee

Affiliations: KAIST

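AI summary: This paper uses self-knowledge distillation to guide frame-level CTC alignments, addressing the teacher-student disagreement in frame-level alignment seen in conventional knowledge distillation and improving both performance and resource efficiency.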

Comments Accepted by Interspeech 2024

DOI: 10.21437/Interspeech.2024-363

Abstract

Transformer encoder with connectionist temporal classification (CTC) framework is widely used for automatic speech recognition (ASR). However, knowledge distillation (KD) for ASR displays a problem of disagreement between teacher-student models in frame-level alignment which ultimately hinders it from improving the student model's performance. In order to resolve this problem, this paper introduces a self-knowledge distillation (SKD) method that guides the frame-level alignment during the training time. In contrast to the conventional method using separate teacher and student models, this study introduces a simple and effective method sharing encoder layers and applying the sub-model as the student model. Overall, our approach is effective in improving both the resource efficiency as well as performance. We also conducted an experimental analysis of the spike timings to illustrate that the proposed method improves performance by reducing the alignment disagreement.

2305.13108 2026-06-11 eess.AS cs.CL cs.LG cs.SD

Debiased Automatic Speech Recognition for Dysarthric Speech via Sample Reweighting with Sample Affinity Test

Eungbeom Kim, Yunkee Chae, Jaeheon Sim, Kyogu Lee

Affiliations: Institute of Information & communications Technology Planning & Evaluation (IITP)

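AI summary: This paper proposes Re-SAT, which measures how helpful each sample is for debiasing and reweights samples accordingly, reducing ASR bias against dysarthric speech and improving group robustness for dysarthric speakers.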

Comments Accepted by Interspeech 2023

DOI: 10.21437/Interspeech.2023-2421

Abstract

Automatic speech recognition systems based on deep learning are mainly trained under empirical risk minimization (ERM). Since ERM utilizes the averaged performance on the data samples regardless of a group such as healthy or dysarthric speakers, ASR systems are unaware of the performance disparities across the groups. This results in biased ASR systems whose performance differences among groups are severe. In this study, we aim to improve the ASR system in terms of group robustness for dysarthric speakers. To achieve our goal, we present a novel approach, sample reweighting with sample affinity test (Re-SAT). Re-SAT systematically measures the debiasing helpfulness of the given data sample and then mitigates the bias by debiasing helpfulness-based sample reweighting. Experimental results demonstrate that Re-SAT contributes to improved ASR performance on dysarthric speech without performance degradation on healthy speech.

2107.00693 2026-06-11 eess.SP cs.LG

Inter-Beat Interval Estimation with Tiramisu Model: A Novel Approach with Reduced Error

Asiful Arefeen, Ali Akbari, Seyed Iman Mirzadeh, Roozbeh Jafari, Behrooz A. Shirazi, Hassan Ghasemzadeh

Affiliations: EECS; BME; Texas A&M University; Washington State University; CSE and ECE

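AI summary: This paper uses a Tiramisu autoencoder to suppress motion-artifact noise and make the R-peaks of ECG signals more prominent, so that inter-beat intervals can be estimated more accurately, supporting early indication of potential cardiovascular diseases.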

Comments 16 pages, 14 figures

DOI: 10.1145/3616020

Abstract

Inter-beat interval (IBI) measurement enables estimation of heart-rate variability (HRV), which, in turn, can provide early indication of potential cardiovascular diseases. However, extracting IBIs from noisy signals is challenging since the morphology of the signal is distorted in the presence of noise. The electrocardiogram (ECG) of a person in heavy motion is highly corrupted by noise, known as motion artifact, and IBIs extracted from it are inaccurate. As part of remote health monitoring and wearable system development, denoising ECG signals and estimating IBIs correctly from them have become an emerging topic among signal-processing researchers. Apart from conventional methods, deep-learning techniques have recently been used successfully in signal denoising, making the diagnosis process easier and leading to accuracy levels that were previously unachievable. We propose a deep-learning approach that leverages a Tiramisu autoencoder model to suppress motion-artifact noise and make the R-peaks of the ECG signal prominent even in the presence of high-intensity motion. After denoising, IBIs are estimated more accurately, expediting diagnosis tasks. Results illustrate that our method enables IBI estimation from noisy ECG signals with SNR up to -30 dB, with an average root mean square error (RMSE) of 13 milliseconds for estimated IBIs. At this noise level, our error percentage remains below 8% and outperforms other state-of-the-art techniques.
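
To make the reported quantities concrete, the sketch below shows how IBIs follow from R-peak timestamps and how an RMSE between estimated and reference IBIs is computed. It assumes R-peak times are already available in seconds; peak detection and the denoising model are out of scope, and the function names are illustrative.

```python
import math


def ibis_ms(r_peak_times_s):
    """Inter-beat intervals in milliseconds from consecutive R-peak times in seconds."""
    return [(b - a) * 1000.0 for a, b in zip(r_peak_times_s, r_peak_times_s[1:])]


def rmse(estimated, reference):
    """Root mean square error between two equally long sequences."""
    if len(estimated) != len(reference) or not estimated:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((e - r) ** 2 for e, r in zip(estimated, reference))
    return math.sqrt(squared / len(estimated))


if __name__ == "__main__":
    reference = ibis_ms([0.00, 0.80, 1.60, 2.40])   # hypothetical clean R-peaks
    estimated = ibis_ms([0.00, 0.81, 1.60, 2.39])   # hypothetical denoised R-peaks
    print(rmse(estimated, reference))
```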

2606.12191 2026-06-11 cs.CL cs.AI New submission

Agentic Environment Engineering for Large Language Models: A Survey of Environment Modeling, Synthesis, Evaluation, and Application

Jiachun Li, Zhuoran Jin, Tianyi Men, Yupu Hao, Kejian Zhu, Lingshuai Wang, Dongqi Huang, Longxiang Wang, Shengjia Hua, Lu Wang, Jinshan Gao, Hongbang Yuan, Ruilin Xu, Kang Liu, Jun Zhao

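AI summary: Organized around the environment engineering lifecycle, this survey reviews the modeling, synthesis, evaluation, and application of agentic environments, covering eight attributes and eight domains, two synthesis paradigms, four agent evolution pathways, and three environment evolution paradigms.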

Comments 63 pages, 10 figures

Abstract

Environments serve as interactive systems for large language model (LLM) based agents across diverse scenarios and play a crucial role in driving the continual evolution of model capabilities. Despite this importance, existing work lacks a systematic categorization and deep analysis. This paper systematically studies current research on agentic environments from the perspective of the environment engineering lifecycle, covering their modeling, synthesis, evaluation, and application. Specifically, the paper first introduces representative environments from the perspectives of eight attributes and eight domains, providing detailed analyses of their development paths and highlighting their core capabilities. Second, for automated environment synthesis, two paradigms are introduced, namely symbolic synthesis and neural synthesis, along with the environment evaluation methods used in each paradigm. Third, the corresponding environment applications are discussed from the perspective of agent-environment co-evolution. In particular, the paper characterizes the primary pathways for agent evolution in dynamic environments from four complementary perspectives: memory-centric experience evolution, orchestration-centric workflow evolution, trajectory-centric offline evolution, and exploration-centric online evolution. It also identifies three paradigms of environment evolution, namely neural-driven, difficulty-driven, and scaling-driven approaches. Finally, several promising future directions are discussed, including Environment-as-a-Service, Multi-agent Environments, and Neural-Symbolic Environments.

2606.11891 2026-06-11 cs.RO cs.LG New submission

Critic Architecture Matters: Dual vs. Unified Critics for Humanoid Loco-Manipulation

Mehmet Turan Yardımcı

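AI summary: For multi-objective reinforcement learning on humanoid robots, this paper compares unified and dual critic architectures; dual-critic policies reach targets faster, with higher throughput and higher validated reach rates, and the choice of architecture affects reaching efficiency more than reward engineering does.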

Comments Accepted at the ICRA 2026 Workshop on Reinforcement Learning for Imitation Learning (RL4IL), Vienna, Austria. 4 pages, 2 figures

Abstract

Multi-objective reinforcement learning for humanoid robots must coordinate locomotion and manipulation within a single policy. A natural design choice is whether to use a single (unified) critic that estimates the combined value of all objectives, or separate (dual) critics with disjoint reward signals. We present a controlled comparison on the Unitree G1 humanoid (23 active DoF) in NVIDIA Isaac Lab, training loco-manipulation policies through a sequential curriculum spanning 13 levels from stationary reaching to walking with variable-orientation targets. In standardized evaluation, dual-critic policies reach targets 3.5$\times$ faster (6.5 vs. 22.6 simulation steps), achieve 2$\times$ higher throughput (14.3 vs. 7.0 validated reaches per 1,000 steps), and attain higher validated reach rates (65.2% vs. 53.8%) compared to the unified-critic policy. Notably, additional anti-gaming reward mechanisms provide no further improvement beyond the architectural change alone (60.9% vs. 65.2%). These results have direct implications for the emerging paradigm of RL fine-tuning of imitation-learned policies: when refining a pre-trained manipulation policy with RL, a unified critic risks suppressing the learned behavior through competing locomotion gradients. These findings demonstrate that critic architecture is a primary - and often overlooked - design choice in multi-objective humanoid RL, with greater impact than reward engineering on reaching efficiency.

2606.11783 2026-06-11 cs.CV New submission

A Comprehensive Ecosystem for Open-Domain Customized Video Generation

Jingxu Zhang, Yuqian Hong, Daneul Kim, Kai Qiu, Qi Dai, Jianmin Bao, Yifan Yang, Xiaoyan Sun, Chong Luo

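AI summary: This paper presents PexelsCustom-1M, a million-scale dataset, and CustoMDiT, a parameter-efficient framework that achieves customized video generation with only 8% additional learnable parameters, along with OpenCustom, a benchmark of 1,000+ categories; the authors plan to open-source the whole ecosystem.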

Comments 5 pages, 3 figures, 4 tables. Accepted by ICASSP 2026

Abstract

Recent progress in video generation has shown impressive visual synthesis capabilities. However, open-domain customized video generation remains limited by the lack of large-scale, annotated datasets capturing diverse identity-specific attributes. To address this, we introduce PexelsCustom-1M, the first publicly available million-scale dataset for identity-preserving video generation, containing one million curated <identity, text, video> triplets across 8,000+ categories. Leveraging this, we propose CustoMDiT, a parameter-efficient framework that adapts a pretrained multimodal Diffusion Transformer into a customized video generator with only 8% additional learnable parameters. Our method surpasses prior state-of-the-art. However, benchmarks such as DreamBooth cover only 100 classes, which is insufficient for real-world applications. To overcome this, we construct OpenCustom, a new benchmark with 1,000+ categories, created via cross-dataset knowledge fusion from ImageNet and MS-COCO. Extensive experiments confirm the advantages of both our dataset and model. We will open-source the entire ecosystem--including dataset, pipeline, benchmark, and implementations--to support further research.

2606.11710 2026-06-11 cs.CV New submission

ERN-Net: Evolving Reason Node-Net for Document Binarization

Hsin-Jui Pan, Sheng-Wei Chan, Jen-Shiung Chiang

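AI summary: This paper proposes ERN-Net, which uses evolving reason nodes and multi-scale reasoning to enhance degradation-sensitive regions and, with a ConvNeXt-Tiny backbone and DIBCO pretraining, performs efficient document binarization with little data and low memory.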

2606.11348 2026-06-11 cs.LG New submission

SwiftCTS: Fast Cross-Design Prediction and Pareto Optimization of Clock Tree Metrics via Few-Shot Calibration

Barsat Khadka, Kawsher Roxy, Md Rubel Ahmed

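AI summary: This paper proposes SwiftCTS, a framework that combines a physics-informed surrogate model with a K-shot multiplicative calibration mechanism; it trains in seconds, runs inference in under a millisecond, and supports accurate cross-design prediction and Pareto optimization of clock tree metrics.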

Abstract

Clock Tree Synthesis (CTS) is a computationally expensive stage in the physical design flow, requiring iterative EDA tool invocations to navigate a vast configuration space for optimal power, wirelength, and timing skew. Existing machine learning approaches require computationally expensive retraining or fine-tuning cycles to adapt to unseen macro architectures and are architecturally mismatched to the millions of evaluations demanded by exhaustive combinatorial search. We present SwiftCTS, a physics-informed surrogate framework that addresses both limitations simultaneously. By coupling lightweight, physics-grounded statistical features with gradient-boosted ensembles, SwiftCTS trains in under five seconds on a CPU and delivers sub-millisecond inference without GPU support. To handle out-of-distribution (OOD) designs without retraining or fine-tuning, we introduce a K-shot multiplicative calibration mechanism that anchors predictions to just one or two physical reference runs, reducing power prediction error from 24.5% to 3.3% and wirelength error from 56.6% to under 1% on unseen macros. Integrating this engine with an evolutionary optimizer, SwiftCTS evaluates 100,000 CTS configurations in under ten seconds, yielding Pareto-optimal frontiers that are physically validated within the OpenROAD flow. Closed-loop validation confirms prediction errors below 0.5% for power and wirelength, and timing skew predictions within five picoseconds on an OOD benchmark, consistently outperforming default tool heuristics across all target metrics. Code publicly available at: https://github.com/BarsatKhadka/SwiftCTS
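
The abstract does not spell out the calibration rule, so the following is only one plausible reading of "K-shot multiplicative calibration": a single scale factor is estimated as the mean ratio of measured to predicted values over K reference runs and then applied to every surrogate prediction for that design. The function names and the mean-ratio choice are assumptions, not the authors' implementation.

```python
def kshot_scale(predicted_refs, measured_refs):
    """Scale factor from K reference runs: mean of measured / predicted (assumed rule)."""
    if len(predicted_refs) != len(measured_refs) or not predicted_refs:
        raise ValueError("need K >= 1 matching reference pairs")
    ratios = [m / p for p, m in zip(predicted_refs, measured_refs)]
    return sum(ratios) / len(ratios)


def calibrate(predictions, scale):
    """Apply the multiplicative correction to uncalibrated surrogate predictions."""
    return [scale * p for p in predictions]


if __name__ == "__main__":
    # Hypothetical numbers: the surrogate under-predicts power on an unseen macro.
    scale = kshot_scale(predicted_refs=[10.0, 20.0], measured_refs=[12.0, 24.0])
    print(calibrate([5.0, 15.0], scale))
```

A multiplicative correction of this kind needs no retraining, which fits the abstract's claim of handling unseen designs from one or two physical reference runs.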

2606.11285 2026-06-11 cs.CV New submission

EventRadar: Long-Range Visual UAV Discovery through Spatiotemporal Event Sensing

Zhiting Zhou, Xingchen Liu, Xinglin Yu, Jiashen Chen, Haoyang Wang, Jingao Xu, Yunhao Liu, Xinlei Chen

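AI summary: For detecting small UAVs at long range, this paper proposes EventRadar, which uses an event camera to capture propeller-induced temporal periodicity and combines Scene-Anchored Geometry Evidence (SAGE) with the Comb-guided Harmonic-Group Learned Iterative Shrinkage and Thresholding Algorithm (CHG) to detect UAVs accurately at 700-1500 m.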

Abstract

Unauthorized unmanned aerial vehicle (UAV) activity around airports, public venues, and other sensitive sites has made protected-airspace monitoring increasingly important. A practical sensing system must search a wide angular region, find small long-range targets, and return both bearing support and UAV-specific evidence before a restricted perimeter is breached. Existing UAV detection paths often rely on spatially organized evidence, such as body extent, silhouette, or track continuity. At long range, however, these cues become difficult to preserve and verify as the target footprint weakens and its image-plane support shrinks. EventRadar follows a complementary cue: propeller-induced temporal periodicity, which recent event-camera sensing studies have shown can reveal UAV-specific motion after appearance becomes weak. We extend this cue to kilometer-scale active sensing with an event-camera prototype. Scene-Anchored Geometry Evidence (SAGE) fuses scanning events with IMU pose to maintain a bearing-indexed scene memory, separating transient candidate support from persistent background clutter. Comb-guided Harmonic-Group Learned Iterative Shrinkage and Thresholding Algorithm (CHG) then treats each candidate as a weak high-rate timing signal and recovers phase-insensitive harmonic evidence with fixed compute. Compared with related event-camera baselines on 700-1500 m UAV event recordings, EventRadar achieves 0.990 mAP$_{.3}$ and 0.949 F1$_{.3}$, reduces FN$_{.3}$ to 0.009, and shows real-time feasibility in prototype profiling.
