arXivDaily: arXiv daily academic digest, updated Monday through Friday
2606.12963 2026-06-12 cs.NI cs.DC cs.ET New submission

ScaleAcross: Designing Multi-Data-Center Infrastructure for Geo-Distributed AI Training

Naved Inam, Aryan Alpesh Bhavsar, Masabattula Teja Nikhil, Sidharth Sharma

AI summary: This paper proposes a scalable EVPN-VXLAN-based emulation framework for studying synchronization-intensive communication and cross-site data exchange in geo-distributed AI training, and uses ECMP, BFD, and queue-pair-aware traffic distribution to improve performance.

Abstract

The rapid growth of AI models and increasing data sovereignty requirements are driving the transition toward geo-distributed AI training across multiple data centers. Such deployments introduce system-level challenges arising from synchronization-intensive communication, cross-site data exchange, and wide-area latency constraints. This paper investigates EVPN-VXLAN as an infrastructure foundation for geo-distributed AI training environments and presents a scalable emulation framework for systematically studying distributed AI workloads under realistic wide-area conditions. The proposed framework combines VXLAN overlays with EVPN-based inter-data-center connectivity and is implemented using ContainerLab and FRRouting (FRR). The framework further incorporates Equal-Cost Multi-Path (ECMP) routing, Bidirectional Forwarding Detection (BFD), and a queue-pair-aware traffic distribution mechanism designed to improve communication behavior for synchronization-intensive AI workloads while preserving compatibility with commodity infrastructure. Using realistic WAN emulation, we characterize communication and system behavior under distributed training workloads employing AllReduce and Parameter Server communication patterns. Results provide insights into traffic distribution, resilience, and infrastructure behavior in geo-distributed AI environments, highlighting the potential of reproducible multi-data-center infrastructure frameworks for scalable distributed AI training.
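
The abstract does not say how the queue-pair-aware distribution works. As a rough illustration of why such a mechanism might matter, the sketch below compares a plain 5-tuple ECMP hash with one that also hashes a queue pair number. The addresses, the shared source port, the path count, and the `pick_path` helper are all hypothetical, and SHA-256 stands in for whatever hash a real switch would use; only the RoCEv2 UDP destination port (4791) is a known constant.

```python
import hashlib

def pick_path(key_fields, num_paths):
    """Hash the given header fields onto one of num_paths equal-cost paths."""
    digest = hashlib.sha256("|".join(map(str, key_fields)).encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_paths

NUM_PATHS = 4            # hypothetical number of equal-cost inter-site paths
ROCE_V2_UDP_PORT = 4791  # IANA-assigned UDP destination port for RoCEv2

# Hypothetical flows: 64 queue pairs between the same two hosts,
# all sharing one UDP source port (an assumption for illustration).
flows = [("10.0.1.1", "10.0.2.1", "udp", 49152, ROCE_V2_UDP_PORT, qp)
         for qp in range(64)]

# Classic ECMP: hash only the 5-tuple, so every queue pair lands on one path.
five_tuple_paths = [pick_path(f[:5], NUM_PATHS) for f in flows]
# Queue-pair-aware variant: include the queue pair number in the hash key.
qp_aware_paths = [pick_path(f, NUM_PATHS) for f in flows]

print("paths used, 5-tuple hash: ", len(set(five_tuple_paths)))
print("paths used, QP-aware hash:", len(set(qp_aware_paths)))
```

Under these assumptions the 5-tuple hash concentrates all traffic between the two hosts on a single path, while the queue-pair-aware key spreads it across several.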

2606.12955 2026-06-12 eess.SY cs.SY New submission

Data-Driven Frequency-Selective Output Regulation of Nonlinear Systems under Almost Periodic Exosignals

Yifei Li, Wenjie Liu, Gang Wang, Lihua Xie

AI summary: For unknown nonlinear systems driven by almost periodic exosignals, proposes a data-driven internal-model controller that achieves frequency-selective output regulation without model identification, and uses contraction theory and Fourier-Bohr analysis to prove spectral suppression and bounded energy of the steady-state error.

Abstract

This paper studies output regulation for a class of unknown continuous-time nonlinear systems driven by almost periodic exosignals. The plant dynamics are assumed to be linearly parameterized over a prescribed nonlinear dictionary, while all coefficient matrices in the plant, input channel, output map, and exosignal channel are unknown. Since the plant model is unavailable, exact nonlinear output regulation would generally require model identification followed by the solution of nonlinear regulator equations. To avoid these steps, we pursue a frequency-selective regulation objective: the steady-state regulation error is allowed to be almost periodic, but its Fourier-Bohr coefficients at prescribed exosystem frequencies are guaranteed to vanish, and the residual error energy is explicitly bounded. To this end, a p-copy internal model is embedded in a dynamic controller, yielding an augmented nonlinear system whose unknown constant matrices are represented directly by measured data. A noise-robust semidefinite program is derived to synthesize the controller gain without model identification and without measuring the exosignal amplitudes or phases. The resulting closed-loop vector field is made exponentially contractive on a prescribed operating set, which implies the existence and uniqueness of a bounded and attracting trajectory. By combining contraction theory with Fourier-Bohr analysis, we prove that this steady-state trajectory is almost periodic, that the embedded-frequency components of the regulation error are eliminated, and that the unmodeled spectral components satisfy a Parseval-type time-averaged energy bound. Numerical and physics-based simulations on a quadrotor with a cable-suspended payload illustrate the effectiveness of the proposed data-driven internal-model design.
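
For readers unfamiliar with the terminology, the standard definitions behind the frequency-selective objective can be written as follows for a scalar almost periodic error signal $e(t)$. The notation $a_\omega(e)$ is an assumption made here for illustration; the paper's own symbols and the exact form of its energy bound are not given in the abstract.

```latex
a_\omega(e) = \lim_{T \to \infty} \frac{1}{T} \int_0^T e(t)\, e^{-\mathrm{i}\omega t}\, dt,
\qquad
\lim_{T \to \infty} \frac{1}{T} \int_0^T |e(t)|^2 \, dt = \sum_{\omega} |a_\omega(e)|^2 .
```

The first expression is the Fourier-Bohr coefficient of $e$ at frequency $\omega$, and the second is Parseval's identity for almost periodic functions. In this notation, the regulation objective described above amounts to $a_{\omega_k}(e) = 0$ at each prescribed exosystem frequency $\omega_k$, with the remaining terms of the sum bounded.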

2606.12950 2026-06-12 cs.DC New submission

Maestro: Workload-Aware Cross-Cluster Scheduling for LLM-Based Multi-Agent Systems

Jinghao Wang, Xiao Zhou, Xiaoyang Sun, Yihui Zhang, Yilong Li, Tianyu Wo, Xu Wang, Chunming Hu, Renyu Yang

AI summary: Proposes Maestro, a scheduling system that uses agent semantics to predict output length and memory usage and applies hierarchical scheduling (node-level multi-model co-location, cluster-level latency-aware routing, and global workflow-aware prioritization) to optimize LLM multi-agent serving under strict GPU budgets, reducing KV-reservation HBM by 67.2% and improving high-contention SLO attainment by 23.6 percentage points.

Comments: Accepted to the 46th IEEE International Conference on Distributed Computing Systems (ICDCS 2026). 11 pages

Abstract

Large Language Model based Multi-Agent Systems (LLM-MAS) have emerged as a powerful paradigm for tackling complex tasks by breaking them into collaborative workflows of specialized LLM-powered agents. However, deploying such multi-agent workloads at scale poses significant system challenges. Each user query spawns an iterative pipeline of LLM calls, greatly amplifying resource consumption compared to single-turn queries. In resource-constrained cloud settings, these workflows face non-deterministic and input-dependent costs at decode stage, heavy-tailed multi-model requirements with memory fragmentation and over-provisioning, and cross-cluster scheduling trade-offs. We present Maestro, a workload-aware scheduling system designed for LLM-MAS serving under strict GPU budgets. Maestro explicitly leverages agent semantics and roles: it predicts the output length and memory usage of each stage and uses this prediction to drive a hierarchical scheduler. At the node level, Maestro enables dynamic multi-model co-location via hierarchical weight caching and elastic memory provisioning. At the cluster level, it performs latency-aware routing to avoid cold-start delays and memory overloads. At the global level, it enforces workflow-aware prioritization to minimize head-of-line blocking for interactive tasks. Across prototype experiments and trace-driven simulations, Maestro reduces KV-reservation HBM by 67.2% and improves high-contention SLO attainment over EDF by 23.6 percentage points.

2606.12946 2026-06-12 cs.CY New submission

Data Aphasia: An Institutional Counterfactual Study of the Stability of Academic Cognition Under Letter-Grade Evaluation Systems

Li Li, Yu Cao

AI summary: Introduces the concept of "data aphasia": converting percentage scores to letter grades lowers information entropy by about 69%, destabilizes the cluster structure, and makes diagnostic consistency fluctuate widely, revealing how letter-grade evaluation affects cognitive stability.

Comments: 36 pages, 14 figures, 16 tables

Abstract

Does the letter-grade evaluation system, while achieving its burden-reduction goals, affect the education system's stable understanding of students' academic structures? This paper introduces the concept of "data aphasia," referring to restrictions on diagnostic information expression caused by institutionally mandated forms of data presentation. Using data from 68 mathematics examinations administered to 75 primary school students, we employ an institutional counterfactual simulation method to convert percentage scores into A/B/C/D letter grades and conduct systematic tests at the information, structural, and diagnostic levels. Results show that information entropy decreases by approximately 69% after grade conversion; under the full sample, the letter-grade system appears superficially stable (K=4), but removing a single extreme anchor student causes the optimal K to increase from 4 to 8 and individual diagnostic identity consistency to fall from 95% to 62%; temporal consistency fluctuates between 52% and 96%, far below the 93%-96% baseline of the percentage system. Mechanism analysis indicates that discretization compresses the feature space by approximately nineteenfold across 68 examinations; after standardization, it creates extensive pseudo-heterogeneity regions, flattens density gradients, and makes clustering boundaries highly sensitive to minor perturbations. Based on these findings, this paper proposes a dual-track evaluation mechanism and provides a testable analytical framework for understanding the cognitive costs of educational evaluation reform.
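
The entropy loss from grade conversion can be illustrated with a minimal sketch that compares the Shannon entropy of a set of percentage scores with the entropy of the same scores after binning into A/B/C/D. The scores and the cut-offs (90/80/60) are hypothetical; they are not the paper's data or its conversion rule, so the resulting percentage will not match the reported 69%.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of values."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def to_letter(score, cutoffs=((90, "A"), (80, "B"), (60, "C"))):
    """Map a 0-100 score to a letter grade; the cut-offs are hypothetical."""
    for threshold, letter in cutoffs:
        if score >= threshold:
            return letter
    return "D"

# Synthetic scores for illustration only (not the paper's data).
scores = [55, 62, 67, 71, 74, 78, 81, 83, 85, 88, 90, 92, 93, 95, 97, 99]
letters = [to_letter(s) for s in scores]

h_percent = shannon_entropy(scores)
h_letter = shannon_entropy(letters)
print(f"percentage entropy: {h_percent:.2f} bits")
print(f"letter entropy:     {h_letter:.2f} bits")
print(f"relative loss:      {1 - h_letter / h_percent:.0%}")
```

Because many distinct scores collapse into one letter, the letter-grade distribution can never carry more entropy than the percentage distribution it was derived from.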

2606.12944 2026-06-12 cs.LO New submission

Testing Theory of Truly Concurrent Processes

Yong Wang

AI summary: Building on Hennessy's work, this paper establishes testing semantics for truly concurrent process algebras, inheriting the trinity of operational, axiomatic, and denotational semantics.

Abstract

A process can execute a set of actions in a predefined manner, while a truly concurrent process executes this set of actions in a manner that exhibits true concurrency. The so-called truly concurrent process algebras bridge true concurrency (such as Petri nets and event structures) and interleaving concurrency (such as CCS, CSP, and ACP). In this paper, following Hennessy's great work, we give testing semantics for truly concurrent processes, which inherits the trinity of operational semantics, axiomatic semantics, and denotational semantics.

2606.12887 2026-06-12 cs.CR cs.DC cs.NI New submission

LNTest: A Testbed for Evaluating Bitcoin Lightning Network-Based Botnets

Thomas Bakaysa, Ahmet Kurt, Abdul-Salem Beibitkhan, Jesus Maria Romo Diaz de Leon, Tag Kalat, Joshua Kramer, Estela Rodriguez, Abraham Watkins, Abdullah Aydeger

AI summary: Presents LNTest, a testbed that emulates botnets with containerized Lightning Network nodes, and finds that the D-LNBot protocol forms a clustered chain rather than a uniform chain, that command propagation scales linearly, and that the overlay topology affects the effectiveness of takedown strategies.

Comments: Accepted at the 21st International Conference on Availability, Reliability and Security (ARES 2026)

Abstract

Bitcoin's Lightning Network (LN) can be exploited as a covert, low-cost command-and-control (C&C) channel for botnets, as demonstrated by the LNBot and D-LNBot designs. However, both remain proof-of-concept prototypes evaluated only through simulation, leaving key questions about real-world topology formation, propagation complexity, and resilience to takedowns unanswered. We present LNTest, the first reusable testbed for LN-based botnets, built from Core Lightning nodes containerized with Docker over a shared Bitcoin Core regtest chain. LNTest supports three overlay topology modes (a deterministic chain, autonomous peer discovery, and user-supplied graphs), enabling controlled experiments across different botnet structures. Using LNTest, we report three main findings. First, D-LNBot's autonomous formation protocol does not produce the uniform chain from its design; instead, it creates a clustered chain in which cliques are linked by bridge nodes whose removal fragments the network. Second, command propagation scales linearly with botnet size ($\Theta(n)$), not the $O(m \log n)$ previously claimed, and gains nothing from higher neighbor connectivity. Third, the overlay topology determines the effectiveness of takedown strategies: uniform-degree chains resist targeted removal but fragment under random failure, scale-free topologies show the opposite pattern, and the autonomous clustered chain is fragile under both, making it the most vulnerable of the three. LNTest is released as open source, with a script that reproduces all our experiments, to support reproducible research on LN-based botnet defenses.
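
The first finding, that removing bridge nodes fragments a clustered chain, can be illustrated with a small graph sketch. The layout built by `clustered_chain` (cliques in a line, one bridge node between neighbouring cliques) is a hypothetical simplification and is not the topology measured in the paper; it does not use LNTest or any Lightning Network software.

```python
from collections import deque

def largest_component(adj, removed=frozenset()):
    """Size of the largest connected component after deleting `removed` nodes."""
    seen, best = set(removed), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:  # breadth-first search over the surviving nodes
            node = queue.popleft()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, size)
    return best

def clustered_chain(num_cliques, clique_size):
    """Cliques joined in a line by one bridge node each (hypothetical layout)."""
    adj = {}
    def link(a, b):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    for c in range(num_cliques):
        members = [("c", c, i) for i in range(clique_size)]
        for i, a in enumerate(members):
            for b in members[i + 1:]:
                link(a, b)
        if c > 0:  # connect this clique to the previous one through a bridge
            bridge = ("bridge", c)
            link(bridge, ("c", c - 1, 0))
            link(bridge, members[0])
    return adj

graph = clustered_chain(num_cliques=4, clique_size=5)
bridges = {n for n in graph if n[0] == "bridge"}
# total nodes, largest component intact, largest component without bridges
print(len(graph), largest_component(graph), largest_component(graph, bridges))
# prints: 23 23 5
```

Removing only 3 of the 23 nodes shrinks the largest connected component from the whole network to a single clique, which is the kind of fragility under targeted removal that the abstract attributes to bridge nodes.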

2606.12885 2026-06-12 cs.NE New submission

Mixed-Categorical Black-Box Optimization via Information-Geometric Bilevel Decomposition

Marc Ong, Shinichi Shirakawa, Youhei Akimoto

AI summary: To address the performance degradation caused by strong interactions between categorical and continuous variables in mixed categorical-continuous black-box optimization, proposes an information-geometric bilevel framework that optimizes categorical variables in the outer level and continuous variables in the inner level, cuts computational cost with a warm-starting strategy, and outperforms existing methods on binary-continuous domains.

Comments: Accepted at PPSN 2026

Abstract

Mixed categorical-continuous optimization arises in many practical domains, yet remains challenging. In the black-box setting, evolution strategy-based approaches have shown promise in extending the efficiency and robustness of the CMA-ES to mixed-variable spaces. However, these methods perform worse when strong categorical-continuous interactions are present, as their underlying search distributions assume independence between categorical and continuous variables. To address this limitation, we propose a bilevel optimization framework that explicitly captures such interactions by optimizing over categorical variables in an outer loop, and over continuous variables conditioned on each categorical configuration in an inner loop. We formulate each level of the bilevel problem as a stochastic relaxation under information-geometric optimization. To mitigate the high computational cost inherent to bilevel optimization, we introduce a warm-starting strategy that accelerates the lower-level search by selecting the best among multiple cached configurations and updating the cache after each iteration. Experimental results on binary-continuous domains demonstrate that the proposed method outperforms existing state-of-the-art approaches in interaction-handling capability while also being more computationally efficient across benchmarks encompassing both previously reported and newly proposed types of interaction.

2606.12855 2026-06-12 eess.SY cs.SY New submission

Computing Headway Bounds under Worst-Case Bunching in Fixed-Line Transit Systems

Michael Yuhas, George Gunter, Jose Paulo Talusan, Aron Laszka, Dan Freudberg, Abhishek Dubey

AI summary: To address bus bunching, proposes a dynamic programming method that computes the maximum and minimum headways of a single bus route under bounded dwell and travel times, and applies it to a real-world system to demonstrate its planning utility.

Comments: 11 pages, 9 figures, to be presented at the 2026 IEEE 32nd International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA)

Abstract

Vehicle bunching is a major problem for transit operators. When vehicles bunch together, the lead vehicle will service the majority of passenger demand, leaving the following vehicles to operate below capacity, wasting fuel and money. Furthermore, after the last vehicle in the bunch passes, the time before the next vehicle's arrival (headway) will be large. Transit operators can combat bunching by holding buses at stops along a route, trading riding time for even headway times. While prior work has focused on developing holding policies to minimize average case bunching, no work has focused on analyzing the longest and shortest possible headway times under a broad group of such policies. We assume that dwell times at stops and travel times between stops are bounded and develop a dynamic program that computes the maximum and minimum headway times for a single bus route with an arbitrary number of control points, vehicles, and holding policies. These bounds are tight in the sense that it is always possible to identify the specific sequence of events that lead to their occurrence. We use these bounds to investigate the effects of different holding policies, stop placement, and number of vehicles on route headways and worst-case bunching. Finally, we apply these analysis techniques to a real-world transit system in Nashville, TN and show their utility for transit planning.
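
To give a feel for how bounded dwell and travel times translate into headway bounds, here is a deliberately simplified sketch. It is plain interval arithmetic for two consecutive buses with no holding policy and independently varying segment times; it is not the paper's dynamic program, and the `headway_bounds` helper and the numbers are hypothetical.

```python
def headway_bounds(initial_headway, segments, allow_overtaking=False):
    """Bounds on the headway between two consecutive buses at the end of a route.

    segments: list of (min_time, max_time) pairs, each covering travel plus dwell.
    Assumes no holding and that both buses' segment times vary independently.
    """
    spread = sum(hi - lo for lo, hi in segments)
    shortest = initial_headway - spread  # leader slowest, follower fastest
    longest = initial_headway + spread   # leader fastest, follower slowest
    if not allow_overtaking:
        shortest = max(0, shortest)      # a headway of 0 means the buses are bunched
    return shortest, longest

segments = [(4, 6), (5, 9), (3, 4)]  # minutes; hypothetical values
print(headway_bounds(10, segments))  # prints (3, 17)
print(headway_bounds(5, segments))   # prints (0, 12)
```

Even in this toy setting, a 5-minute dispatch headway can collapse to full bunching while the gap behind the pair grows to 12 minutes, which is the worst-case behavior that holding policies aim to limit.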

2606.12850 2026-06-12 cs.DC New submission

High-Order Spectral Element Methods for Wave Propagation on ARM Multicore CPU with SME: Optimizations and Implications

Yinuo Wang, Lin Gan, Tianqi Mao, Wubing Wan, Zekun Yin, Wenqiang Wang, Wei Xue, Guangwen Yang

AI summary: Optimizes the spectral element wave propagation code SPECFEM3D for ARM multicore CPUs using the Scalable Matrix Extension (SME), contributing an SME-aware batched small-matrix kernel, a hybrid MPI+OpenMP execution scheme, and a dispersion-based accuracy analysis; achieves a 4-6x performance improvement and shows that SME favors higher polynomial orders.

Abstract

Wave propagation based on the spectral element method (SEM) is a representative HPC workload, but existing SEM implementations are not well matched to emerging ARM multicore CPUs with Scalable Matrix Extension (SME). We present an SME-enabled optimization of SPECFEM3D on the emerging LX2 processor that combines an SME-aware batched small-matrix kernel for SEM tensor-product operators, a memory-aware hybrid MPI+OpenMP execution scheme for limited-HBM systems, and a dispersion-based iso-accuracy study of the $(h,p)$ tradeoff. At fixed polynomial order, the optimized implementation improves full-application performance by 4-6$\times$ over the original code and delivers clear gains over optimized non-SME CPU baselines. Beyond these implementation-level gains, our results suggest that SME shifts the performance-favorable operating point toward higher polynomial orders along the dispersion-based iso-accuracy frontier, further reducing time-to-solution and working-set size. These results indicate that SME affects not only kernel efficiency, but also the practical discretization tradeoff for SEM on modern ARM multicore platforms.

2606.12803 2026-06-12 eess.SY cs.SY New submission

Homotopy-Based Re-Initialization for Switched DAEs in Power System Transient Simulation

Ahmad Ali, Hantao Cui

AI summary: To address post-event convergence failures of switched differential-algebraic equations in power system transient simulation, proposes a globalized re-initialization method based on homotopy continuation that restores convergence.

Comments: Manuscript submitted to IEEE Power and Energy Society Letters and is currently under revision

Abstract

The simultaneous solution of switched differential-algebraic equations (DAEs) in power system transient simulation may suffer convergence loss following discontinuous events. This difficulty is typically interpreted as a poor post-event initialization problem. This letter presents a geometric framework that explains the underlying convergence mechanism and clarifies why standard convergence-restoration methods may fail at discontinuities. Based on this interpretation, a homotopy-continuation based globalized re-initialization scheme is developed to restore convergence. The proposed method is validated through numerical simulations of representative discontinuities in power system transient simulation. Results show that in the cases where direct post-event solution fails, the proposed scheme can reliably recover convergence.
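
The general idea of recovering convergence through homotopy continuation can be shown on a scalar toy problem. The sketch below uses the textbook Newton homotopy $H(x, \lambda) = F(x) - (1 - \lambda) F(x_0)$ on $F(x) = \arctan(x)$, where plain Newton diverges from a poor starting point. It is a generic illustration, not the paper's scheme for switched DAEs; the function names, step count, and divergence threshold are assumptions.

```python
import math

def newton(f, df, x, tol=1e-10, max_iter=50):
    """Plain Newton iteration; returns (x, converged)."""
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x, True
        x = x - fx / df(x)
        if abs(x) > 1e8:  # treat runaway iterates as divergence
            return x, False
    return x, False

def homotopy_solve(f, df, x0, steps=10):
    """Newton homotopy: solve f(x) = (1 - lam) * f(x0) while lam goes from 0 to 1."""
    f0 = f(x0)
    x = x0
    for k in range(1, steps + 1):
        lam = k / steps
        # Deformed problem; its derivative in x is still df.
        h = lambda y, lam=lam: f(y) - (1.0 - lam) * f0
        x, ok = newton(h, df, x)  # warm-start from the previous solution
        if not ok:
            return x, False
    return x, True

f = math.atan
df = lambda x: 1.0 / (1.0 + x * x)

print("plain Newton from x0=3:", newton(f, df, 3.0))
print("homotopy from x0=3:    ", homotopy_solve(f, df, 3.0))
```

Each intermediate problem starts from the solution of the previous one, so every Newton solve begins close to its root even though the original starting point lies outside Newton's convergence region.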

2606.12801 2026-06-12 cs.CY New submission

AiAWE: An Open-Source LLM Automated Writing Evaluation System Using LoRA-Adapted Instruction-Tuned Models

John Maurice Gayed

AI summary: Presents AiAWE, an open-source automated writing evaluation system built on a LoRA-adapted, instruction-tuned Gemma-3-27B model, which reaches an RMSE of 0.474 and a 90.56% agreement rate on a TOEFL Independent Writing dataset, outperforming a larger model and a GPT-3.5 baseline.

Comments: 21 pages with 7 tables and 1 figure and appendices

Abstract

This study presents AiAWE, an open-source automated writing evaluation system that scores argumentative essays using a LoRA-adapted instruction-tuned large language model (Gemma-3-27B-it). Using a proprietary Educational Testing Service (ETS) dataset of 480 TOEFL Independent Writing essays, we fine-tune Gemma-3-27B and LLaMA-3.3-70B under identical LoRA configurations on a 120-essay training subset and evaluate on the remaining 360 essays under identical inference quantization. The fine-tuned Gemma model achieves a root mean square error of 0.474, a quadratic weighted kappa of 0.828, and an agreement rate of 90.56% within +/- 0.5 of the human score, outperforming both the larger LLaMA-3.3-70B model and the fine-tuned GPT-3.5 baseline reported in prior work on the same dataset. Three findings are of broader interest: open-weight LLMs can match or exceed proprietary fine-tuning for rubric-aligned scoring; model scale is not a reliable predictor of downstream performance under LoRA adaptation; and identical LoRA hyperparameters produce qualitatively different adaptation behaviors across architectures. The production system runs on a consumer-grade server and is publicly accessible at https://app.awade.gec.waseda.ac.jp. LoRA adapters, application code, and fine-tuning YAMLs are publicly available through their respective repositories.

2606.12798 2026-06-12 eess.SY cs.SY New submission

Pushing the Frontiers for Floating Solar Photovoltaics -- The Case for South America

Soham Ghosh, Anik Goswami, Krishna Kumba

AI summary: Proposes a techno-socio-economic framework for assessing the potential of floating photovoltaics for energy access, water security, and grid flexibility; case studies in Nicaragua, Honduras, and Guyana show 50-398 MW systems yielding more than 1,500-2,000 kWh/kW annually, with capacity factors above 20% and costs competitive with land-based PV.

Comments: 63 pages, 20 tables, 18 figures

Abstract

Floating solar photovoltaic (FSPV) systems provide a land-efficient pathway to expand clean electricity access in energy-poor regions. South America has among the highest global FSPV potential (approximately 38.26 TWh per million acres of water surface), yet deployment remains limited. This study presents a techno-socio-economic framework to assess FSPV for energy access, water security, and grid flexibility, with case studies in Nicaragua, Honduras, and Guyana. Estimated yields for 50 to 398 MW systems exceed 1,500 to 2,000 kWh per kW annually, with capacity factors above 20 percent. At El Cajon, FSPV could significantly reduce emissions relative to fossil generation. Results show competitive costs with land-based PV when accounting for avoided land use, shared hydropower infrastructure, and water benefits. The framework also highlights co-location with hydropower and AI data centers, offering a scalable model for deployment in underserved regions.

2606.12793 2026-06-12 cs.CR cs.IR New submission

Semantic Identification of IoT Devices from Behavioral Primitives

Samuel Witt, Hassan Habibi Gharakheili

AI summary: Proposes using the Access Control Entries (ACEs) in Manufacturer Usage Description (MUD) profiles as behavioral primitives and identifying IoT devices by matching their semantic representations; validated on public datasets and real traffic.

Comments: 14 pages, 3 figures, 4 tables

Abstract

Accurate identification of IoT devices is important for security management and policy enforcement. Existing approaches typically learn device signatures from packets or flow records. These methods operate on low-level communication observations whose traffic patterns may vary across deployments, software versions, and user interactions. This paper studies device identification using Manufacturer Usage Description (MUD) profiles. MUD profiles describe device behavior using Access Control Entries (ACEs), where each ACE represents a behavioral primitive consisting of protocol, endpoint, direction, and port semantics derived from device communication policy. Our contributions are threefold. First, using 28 publicly available MUD profiles containing 1,023 ACE instances, we construct ACE-level semantic representations from compact behavioral text and analyze their geometric properties. ACE-level representations preserve device-level behavioral distinctions more effectively than whole-profile embeddings and remain effective after whitening calibration. Second, we evaluate semantic ACE matching under controlled runtime variations, including unseen ACEs, drifted hostnames, and partial runtime observation. Exact ACE matching performs well when the overlap with the canonical MUD profile remains high, but degrades sharply when the overlap becomes sparse or disappears. In contrast, semantic ACE matching preserves useful identification evidence across these conditions. Third, we evaluate the same approach on real IoT traffic traces comprising more than 800,000 observed flows. Exact overlap remains the strongest signal when stable overlap exists, while semantic ACE matching provides stronger identification evidence during the early stages of observation, frequently retains the correct device among the highest-ranked candidates, and remains effective under sparse-overlap runtime traffic.

2606.12788 2026-06-12 cs.SI cs.CY cs.DC cs.SY econ.GN eess.SY q-fin.EC New submission

To Share or Not to Share: Orchestrating Trustworthy Data in Global Value Chains

Han-Teng Liao, Chang-Yi Kao

AI summary: To address the tension between regulatory transparency and data sovereignty created by the EU CBAM, proposes a RegTech reference architecture based on the IDSA framework that uses sovereign data exchange to enable Digital Product Passports and drive Global Business Services capability demands, and integrates Agentic AI and green finance to offer a scalable blueprint for global industrial clusters.

Abstract

As the EU Carbon Border Adjustment Mechanism (CBAM) approaches, the global semiconductor value chain faces growing structural tensions between regulatory transparency and data sovereignty. This article proposes a RegTech reference architecture using the International Data Spaces (IDSA) framework to orchestrate trustworthy environmental telemetry across the semiconductor-petrochemical nexus. The framework distinguishes the mandatory CBAM requirements from voluntary Science Based Targets initiative (SBTi) frameworks, while addressing the additive complexities of the Safe-and-Sustainable-by-Design (SSbD) framework. Moving beyond standard linear technology stacks, we introduce a prospective roadmapping methodology that transforms upstream physical vulnerabilities into circular, negative feedback loops. Focusing on the Taipei and Penang technology corridor, the article details how sovereign data exchange enables Digital Product Passports (DPPs) to drive Global Business Services (GBSs) capability demands. Finally, we discuss the integration of Agentic AI for autonomous compliance and FinTech green financing, providing a scalable blueprint for global industrial clusters to achieve sovereign, sustainable, and transparent value chains.

2606.12787 2026-06-12 cs.SI cs.CY cs.SY econ.GN eess.SY q-fin.EC q-fin.RM New submission

Orchestrating the Twin Transition in Multinational Corporations: Technology Roadmapping for Green and Digital Global Business Services

Han-Teng Liao, Karen Ang

AI summary: Combines Technology Roadmapping with the ITU innovation ecosystem toolkit into a socio-technical framework, analyzes how the Global Business Services of multinational corporations evolve toward "Sustainable Intelligence" to coordinate the green and digital twin transition, and identifies the role of key hub countries.

Comments: 9 pages, 6 figures

Abstract

Global Business Services (GBS) have emerged as a "living laboratory" for the Twin Transition of Green and Digital Transformation, as multinational corporations (MNCs) face increasing pressure to harmonize digital efficiency with environmental stewardship. Aiming to derive a socio-technical framework, this paper synthesizes Technology Roadmapping (TRM) with the International Telecommunication Union (ITU) ICT-centric innovation ecosystem toolkit. A bibliometric analysis of research clusters reveals an evolutionary shift from basic process automation toward "Sustainable Intelligence," identifying the GBS unit as a central "operational airlock" that mediates between landscape pressures -- such as the EU's dual mandate and Carbon Border Adjustment Mechanisms -- and niche innovations in AI-native workflows. The study further maps these clusters onto a stakeholder engagement canvas, highlighting how resilient "Middle Power" hubs in Poland, Portugal, and Malaysia are bypassing the middle-income trap to provide a "third way" for global value chains amidst a bifurcated geopolitical cloud. The results offer a data-driven design approach for leaders and entrepreneurial support networks to orchestrate talent and supply chain flows, thereby enriching the conceptual understanding of Industry 5.0 and the role of GBS as a primary mechanism for navigating a volatile, multipolar digital economy.

2606.12785 2026-06-12 cs.GT New submission

The No-show Paradox in Single Transferable Vote under One-dimensional Preferences

Farhad Mohsin

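AI summary Studies the group no-show paradox for Single Transferable Vote (STV) under one-dimensional preference models, finding that abstention by voters at the extremes readily triggers the paradox and that its likelihood rises substantially as the number of candidates grows.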

详情
AI中文摘要

群体缺席悖论(GNSP)是指一组选民弃权后,新获胜者更受他们偏好。先前研究表明,即使对于易受此悖论影响的投票规则,在实际选举和多种假设下,该悖论也罕见发生。然而,我们发现,在一维偏好模型(如1D-Euclidean、单峰或单交叉偏好)下,流行的 runoff 规则——单一可转移投票(STV)——极易受到 GNSP 的影响。这与另一类易受 GNSP 影响的规则——Condorcet 规则——形成鲜明对比,后者在这些一维偏好下不会出现悖论。我们从理论上识别了 STV 在一维偏好模型下发生 GNSP 的易于处理且普遍存在的充分条件。通过理论结果和来自这些领域的合成偏好配置实验,我们证明一维频谱两端的选民特别容易因弃权而引发 GNSP。此外,随着备选方案数量的增加,发生的可能性显著增加。

英文摘要

The group no-show paradox (GNSP) occurs when a group of agents abstaining from voting can make the new winner more preferred to them. Previous work has suggested that even for voting rules susceptible to this paradox, it is a rare occurrence in real elections and under various assumptions. However, we find that under one-dimensional preference models such as 1D-Euclidean, single-peaked, or single-crossing preferences, Single Transferable Vote (STV), a popular runoff rule, is highly vulnerable to GNSP. This is in stark contrast to Condorcet rules, another family of rules susceptible to GNSP, where the paradox cannot occur under these one-dimensional preferences. We theoretically identify tractable and prevalent sufficient conditions for GNSP to occur for STV under one-dimensional preference models. Through our theoretical results and experiments with synthetic preference profiles from these domains, we demonstrate that voters at the extremes of the 1D spectrum are particularly likely to cause GNSP by abstaining. Furthermore, the likelihood of occurrence increases substantially as the number of alternatives grows.
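
The paradox described above can be reproduced on a toy profile. The following is a minimal sketch, assuming single-winner STV (instant runoff), three candidates A, B, C placed left to right on a line, and alphabetical tie-breaking; the profile and the function name `stv_single_winner` are illustrative and do not come from the paper.

```python
from collections import Counter


def stv_single_winner(profile):
    """Single-winner STV (instant runoff).

    profile: list of (count, ranking) pairs; each ranking lists every
    candidate, most preferred first. Elimination ties are broken
    alphabetically (an assumption of this sketch).
    """
    remaining = {c for _, ranking in profile for c in ranking}
    while len(remaining) > 1:
        tally = Counter({c: 0 for c in remaining})
        for count, ranking in profile:
            # Each ballot counts for its highest-ranked surviving candidate.
            top = next(c for c in ranking if c in remaining)
            tally[top] += count
        loser = min(sorted(remaining), key=lambda c: tally[c])
        remaining.remove(loser)
    return remaining.pop()


# Single-peaked profile on the axis A - B - C (hypothetical numbers).
full = [(4, ("A", "B", "C")), (3, ("B", "C", "A")), (4, ("C", "B", "A"))]
# Two of the left-extreme voters (A > B > C) abstain.
after_abstention = [(2, ("A", "B", "C")), (3, ("B", "C", "A")), (4, ("C", "B", "A"))]

print(stv_single_winner(full), stv_single_winner(after_abstention))  # C B
```

With everyone voting, B is eliminated first and C wins. When two left-extreme voters stay home, A is eliminated first, A's remaining ballots transfer to B, and B wins, an outcome the abstaining voters prefer to C.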

2606.12768 2026-06-12 eess.SY cs.SY 新提交

Patching Control Lyapunov Barrier Functions for Temporal Logic Specifications with Bounded Controls

带界控制的时序逻辑规格的控制李雅普诺夫障碍函数修补

Ruikun Zhou, Yating Yuan, Haocheng Chang, Yinan Li, Yiming Meng

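AI summary Proposes an abstraction-free framework that combines sequential decomposition of LTL tasks with Control Lyapunov-Barrier Functions, synthesizing controllers for continuous-time systems with bounded control inputs by patching winning sets.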

详情
AI中文摘要

我们提出了一个无抽象框架,用于受线性时序逻辑(LTL)规格和带界控制输入约束的连续时间动力系统的控制器综合。所提出的方法将LTL任务的序列分解与使用形式化认证的控制李雅普诺夫障碍函数(CLBF)相结合。通过将局部规格表述为一系列安全稳定问题,我们系统地近似和修补分解子任务的获胜集。这些局部约束的满足由离线计算的CLBF水平集保证。因此,我们的框架产生了形式化验证的切换反馈控制器,能够实现高效的在线规划和动态重规划。这确保了在存在状态扰动的情况下鲁棒地满足连续规格,避免了文献中通常需要的显式状态空间抽象。通过数值模拟和Crazyflie四旋翼飞行器的硬件演示验证了该方法。

英文摘要

We propose an abstraction-free framework for controller synthesis for continuous-time dynamical systems subject to Linear Temporal Logic (LTL) specifications and bounded control inputs. The proposed method combines the sequential decomposition of LTL tasks with the use of formally certified Control Lyapunov-Barrier Functions (CLBFs). By formulating local specifications as a sequence of safe-stabilization problems, we systematically approximate and patch the winning sets of the decomposed subtasks. The satisfaction of these local constraints is guaranteed by the offline-computed level sets of the CLBFs. As a result, our framework yields formally verified switching feedback controllers that enable efficient online planning and dynamic re-planning. This ensures robust continuous specification satisfaction in the presence of state perturbations, avoiding the explicit state-space abstractions commonly required in the literature. The approach is validated through numerical simulations and a hardware demonstration on a Crazyflie quadrotor.

2606.12753 2026-06-12 cs.DC 新提交

On the Limits of Performance Portability in Directive-Based GPU Programming

基于指令的GPU编程中性能可移植性的极限

Alessandro Romeo, Nitin Shukla, Stefano Truzzi, Alessio Suriano, Andrea Mignone

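AI summary By porting the astrophysical magnetohydrodynamics code gPLUTO from OpenACC to OpenMP, this paper evaluates the performance portability of directive-based GPU programming on NVIDIA A100 and AMD MI250X, finding performance gaps of roughly 3x at the application level and up to 47x at the kernel level, driven mainly by memory latency and compiler limitations.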

Comments 8 pages, 1 plot, 5 tables

详情
AI中文摘要

科学应用向GPU加速的百亿亿次系统的过渡受到性能、可移植性和生产力之间权衡的限制。本文通过将用于天体物理模拟的生产级磁流体动力学代码gPLUTO从OpenACC移植到OpenMP,并分析其在NVIDIA A100(Leonardo Booster)和AMD MI250X(LUMI-G)设备上的性能,评估了基于指令的GPU编程的性能可移植性。在NVIDIA平台上,由于共享编译器后端,OpenACC和OpenMP实现了可比的性能,为评估算法效率提供了一致的基线。相比之下,相同的OpenMP实现在AMD MI250X上的应用级性能比NVIDIA A100上的OpenACC基线慢约三倍,核函数级减速高达一个数量级,这是由于对跨步内存访问模式和编译器限制的敏感性。核函数级分析显示,运行时的主要贡献者是内存延迟受限,而非峰值带宽限制。在低并行度核函数中,C++抽象层增加了寄存器压力和溢出,导致特定情况下高达47倍的极端减速。这些结果表明,跨GPU架构的可移植性能不仅需要应用级更改,还需要编译器后端和架构感知优化策略的持续进步。

英文摘要

The transition of scientific applications to GPU-accelerated exascale systems is constrained by trade-offs between performance, portability, and productivity. This work evaluates the performance portability of directive-based GPU programming by porting gPLUTO, a production-grade magnetohydrodynamics code for astrophysical simulations, from OpenACC to OpenMP, and analyzing its performance on NVIDIA A100 (Leonardo Booster) and AMD MI250X (LUMI-G) devices. On NVIDIA platforms, OpenACC and OpenMP achieve comparable performance due to a shared compiler backend, providing a consistent baseline for assessing algorithmic efficiency. In contrast, the same OpenMP implementation is approximately three times slower at the application level on AMD MI250X with respect to the NVIDIA A100 OpenACC baseline, with kernel-level slowdowns reaching up to an order of magnitude, driven by sensitivity to strided memory-access patterns and compiler limitations. Kernel-level profiling shows that the dominant contributors to run-time are memory-latency-bound rather than limited by peak bandwidth. In low-parallelism kernels, C++ abstraction layers increase register pressure and spilling, leading to extreme slowdowns of up to 47x in specific cases. These results indicate that portable performance across GPU architectures requires not only application-level changes but also continued advances in compiler backends and architecture-aware optimization strategies.

2606.12719 2026-06-12 cs.HC 新提交

A Multiplexing Design Space: Theory, Method, and Application

复用设计空间:理论、方法与应用

Yiwen Xing, Afrah Farea, Saiful Khan, Min Chen

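AI summary Proposes a method for exploring a multiplexing design space constrained by a specific application; using the analysis of multiple 2D scalar fields in machine learning workflows as a case study, a three-step design process preceded by a pre-design step identifies relatively optimal default multiplexing designs along with small user-controllable variations.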

详情
AI中文摘要

许多可视化设计都包含被称为“视觉复用”的现象,即与同一数据点相关的多条信息同时被传达。尽管可视化设计者常常能无意识地将这种现象融入设计,但视觉复用的设计空间非常庞大,且系统性地将其作为设计模式进行探索并不常见。本文提出一种设计方法,用于探索受应用约束的较小设计空间。作为一个说明性案例研究,我们聚焦于开发逼近偏微分方程的机器学习模型的工作流。在这些工作流中,机器学习研究人员需要频繁分析多个二维标量场之间的相互关系。由于将热力图叠加在另一热力图之上并非有效设计,我们制定了三个设计步骤来探索多个二维标量场背景下视觉复用的设计空间。我们的设计方法还包括一个用于领域基础化和理论分析的预设计步骤,并让领域专家参与协同设计和评估活动。该设计过程使我们能够识别出相对最优的默认复用设计,以及领域专家可通过用户界面控制的小变体的需求。

英文摘要

Many visualization designs feature phenomena referred to as "visual multiplexing", where multiple pieces of information associated with the same data point are conveyed simultaneously. Although visualization designers are able to bring such phenomena, often unconsciously, into their designs, the design space of visual multiplexing is huge, and it is uncommon to explore visual multiplexing systematically as design patterns. In this paper, we propose a design method for exploring a smaller design space constrained by an application. As an illustrative case study, we focus on machine learning (ML) workflows for developing ML models that approximate partial differential equations (PDEs). In these workflows, ML researchers need to analyze the inter-relationships among multiple 2D scalar fields frequently. Since superimposing one heatmap on top of another is not an effective design, we formulate three design steps to explore the design space of visual multiplexing in the context of multiple 2D scalar fields. Our design method also includes a pre-design step for domain grounding and theoretical analysis, and involves domain experts in both co-design and evaluation activities. The design process enables us to identify relatively optimal default multiplexing designs as well as the need for small variations that domain experts can control through a user interface.

2606.12695 2026-06-12 eess.SY cs.SY 新提交

Polymer-based Capacitive Micromachined Transducer-Enabled Inline Monitoring of Ultrasonic Welding in Thermoplastic Carbon Fiber Composites

基于聚合物的电容式微机械超声换能器用于热塑性碳纤维复合材料超声焊接的在线监测

Jonas Welsch, Dominik Goerick, Martin Angerer, Jinhao Lu, Sergei Vostrikov, Michael Kupke, Heinz Voggenreiter, Andrea Cossettini, Luca Benini, Edmond Cretu, Robert Rohling

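AI summary Presents a low-cost, wireless ultrasonic non-destructive testing system built around polymer-based CMUTs that monitors the welding of thermoplastic carbon fiber composites in real time and detects every intentionally introduced defect, laying groundwork for intelligent quality monitoring.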

Comments 15 pages, 12 Figures

详情
AI中文摘要

热塑性复合材料结构可实现轻量化、可回收和高通量的航空航天制造,但先进连接工艺的可靠质量保证仍是一个关键挑战。本文提出一种紧凑、低成本、无线的超声无损检测系统,用于实时在线监测热塑性碳纤维复合材料的连续超声焊接。该系统将定制制造的聚合物基电容式微机械超声换能器(polyCMUT)与超低功耗WULPUS平台集成,使其能够在恶劣、高干扰的焊接环境中工作。设计、制造、封装并集成了一个八元线性polyCMUT阵列,中心频率约为3.6 MHz,集成到工业焊接设备中。在焊接碳纤维层压板时进行在线测量,层压板中故意引入了缺陷。过程同步的超声数据揭示了缺陷位置处回波深度的一致偏移,与X射线计算机断层扫描基准结果高度一致。在21次焊接中,所有诱导缺陷均被检测到,无假阴性,且假阳性有限。结果表明,基于聚合物的CMUT技术实现了稳健、可扩展且与制造兼容的超声传感,为下一代热塑性复合材料焊接的智能过程监控和质量保证迈出了决定性的一步。

英文摘要

Thermoplastic composite structures enable lightweight, recyclable, and high-throughput aerospace manufacturing, but reliable quality assurance of advanced joining processes remains a key challenge. This work presents a compact, low-cost, and wireless ultrasonic non-destructive testing system for real-time, inline monitoring of continuous ultrasonic welding of thermoplastic carbon fiber composites. The system integrates custom-fabricated polymer-based capacitive micromachined ultrasonic transducers (polyCMUTs) with the ultra-low-power WULPUS platform, enabling operation in the harsh, high-interference welding environment. An eight-element linear polyCMUT array operating at a center frequency of approximately 3.6 MHz is designed, fabricated, packaged, and integrated into an industrial welding setup. Inline measurements are performed during welding of carbon fiber laminates with intentionally introduced defects. Process-synchronous ultrasonic data reveal consistent depth-of-echo shifts at defect locations, in strong agreement with X-ray computed tomography ground truth. Across 21 welds, all induced defects are detected without false negatives and with limited false positives. The results demonstrate that polymer-based CMUT technology enables robust, scalable, and manufacturing-compatible ultrasonic sensing, representing a decisive step toward intelligent process monitoring and quality assurance for next-generation thermoplastic composite welding.

2606.12692 2026-06-12 cs.DS cs.DM 新提交

Random Proposals: A Softmax-Based Local-Improvement Framework for Maximum Weighted Matching

随机提议:基于Softmax的局部改进框架用于最大加权匹配

Ahmed M. Alzuhair, Ahmed Alherz

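AI summary Proposes a randomized local-improvement algorithm based on softmax-biased sampling that achieves local $\varepsilon$-dominance and an expected $\frac{1}{2}-\varepsilon$ approximation ratio, running in $O(m\log(1/\varepsilon)/p_{\min})$ time, which simplifies to $O(m\log(1/\varepsilon))$ under mild conditions.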

详情
AI中文摘要

我们针对最大加权匹配问题提出了一种随机局部改进算法。该方法引入了一种基于softmax的偏置采样机制,实现局部ε-优势,并达到期望的1/2-ε近似比。我们证明了收敛性保证,并表明算法运行时间为O(m log(1/ε)/p_min),其中p_min是所有边上最小的softmax提议概率;在偏置参数和权重范围的温和条件下,这简化为O(m log(1/ε))。该框架提供了收敛速度与近似质量之间的可调权衡。

英文摘要

We propose a randomized local-improvement algorithm for the Maximum Weighted Matching (MWM) problem. Our method introduces a softmax-based biased sampling mechanism that achieves local $\varepsilon$-dominance and yields an expected $\frac{1}{2}-\varepsilon$ approximation ratio. We prove convergence guarantees and show that the algorithm runs in $O\!\left(m\log(1/\varepsilon)/p_{\min}\right)$ time, where $p_{\min}$ is the minimum softmax proposal probability over all edges; under mild conditions on the bias parameter and weight range, this simplifies to $O(m\log(1/\varepsilon))$. The framework provides a tunable tradeoff between convergence speed and approximation quality.
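
To make the idea of softmax-biased proposals concrete, here is a rough sketch rather than the authors' algorithm. It assumes that edges are proposed with probability proportional to $\exp(\beta w)$ and that a proposal is accepted when its weight exceeds $(1+\varepsilon)$ times the total weight of the matched edges it conflicts with. The acceptance rule, the fixed step budget, and all names (`softmax_probs`, `random_proposals`, `beta`, `steps`) are assumptions, since the abstract does not spell them out.

```python
import math
import random


def softmax_probs(weights, beta):
    """Numerically stable softmax over edge weights with bias parameter beta."""
    top = max(weights)
    exps = [math.exp(beta * (w - top)) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]


def random_proposals(edges, beta=1.0, eps=0.1, steps=2000, seed=0):
    """edges: list of (u, v, weight) tuples with positive weights."""
    rng = random.Random(seed)
    probs = softmax_probs([w for _, _, w in edges], beta)
    mate = {}  # vertex -> matched edge currently covering it
    for _ in range(steps):
        u, v, w = rng.choices(edges, weights=probs, k=1)[0]
        conflicts = {mate[x] for x in (u, v) if x in mate}
        # Assumed acceptance rule: swap in the proposal only if it beats
        # the conflicting matched edges by a (1 + eps) factor.
        if w > (1.0 + eps) * sum(c[2] for c in conflicts):
            for a, b, _ in conflicts:
                del mate[a]
                del mate[b]
            mate[u] = mate[v] = (u, v, w)
    return set(mate.values())


# Path a - b - c - d with a heavy middle edge (toy instance).
matching = random_proposals([("a", "b", 1.0), ("b", "c", 3.0), ("c", "d", 1.0)])
```

On this path the heavy middle edge displaces the two light edges once it is proposed, and neither light edge can displace it afterwards, so the sketch settles on the single edge `("b", "c", 3.0)`. Raising `beta` concentrates proposals on heavy edges, which shrinks the smallest proposal probability $p_{\min}$ that appears in the running-time bound.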

2606.12676 2026-06-12 cs.LO cs.CG 新提交

A Calculus of Apartness over Separoids: Effective Convex Representation, Stratified Conservativity, and the Complexity of Entailment

分离体上的相离关系演算:有效凸表示、分层保守性与蕴含复杂性

Faruk Alpay, Baris Basaran

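AI summary Studies the apartness relation induced by finite families of compact convex bodies, gives an effective rational realization theorem, proves completeness and decidability of Boolean consequence, and analyzes its computational complexity.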

Comments 21 pages, 2 figures. Includes effective rational representation with uniform margins, logical consequence analysis, and a fixed-dimensional hierarchy

详情
AI中文摘要

欧氏空间中每一有限族紧凸体在不相交指标集之间诱导一个相离关系:当对应并集的凸包不相交时,两个集合相离。本文研究以相离为原始关系的有限理论。其基本定律是对称性、双边包含和空性,等价于无环分离体的分离-极性形式。主要贡献是一个具有均匀边界的有效有理实现定理及其支持的精确推论理论。每一有限相离分离体可由有理多面体实现,其坐标由最大分离索引。最大分离和最小Radon划分可从全表、生成元或成员关系预言机枚举;坐标值具有受控的比特高度;每个坐标记录一个可读的最大分离证书。该实现使每一相离对具有至少2的间隙,在半径小于1的外平行扩张下保持正确,并在加厚后产生全维凸体。距离函数层通过Lipschitz比较、包含单调性和外平行体记录标准凸分析稳定性。在逻辑方面,正蕴含恰好是单前提包含。欧氏场景上的布尔推论是可靠、完备且可判定的;可满足性是NP完全的,有效性是coNP完全的,正蕴含对排序编码是线性的。分层定理表明布尔推理不引入超出分离体闭包的新原子相离。固定维度的蕴含关系形成一个严格递减的层级,在n个站点时稳定于维度n减1。

英文摘要

Every finite family of compact convex bodies in Euclidean space induces an apartness relation between disjoint index sets: two sets are apart when the convex hulls of the corresponding unions are disjoint. This paper studies the finite theory obtained by taking apartness as the primitive relation. Its basic laws are symmetry, bilateral subsumption, and vacuity, equivalently the separation-polarity form of acyclic separoids. The main contribution is an effective rational realization theorem with uniform margins and the exact consequence theory it supports. Every finite apartness separoid is realized by rational polytopes whose coordinates are indexed by maximal separations. Maximal separations and minimal Radon partitions can be enumerated from a full table, generators, or a membership oracle; the coordinate values have controlled bit height; and each coordinate records a readable certificate of one maximal separation. The realization separates every apart pair with clearance at least 2, remains correct under outer parallel enlargement by any radius below 1, and yields full-dimensional convex bodies after thickening. The distance-function layer records standard convex-analytic stability through Lipschitz comparison, monotonicity under inclusion, and outer parallel bodies. On the logical side, positive entailment is exactly one-premise subsumption. Boolean consequence over Euclidean scenes is sound, complete, and decidable; satisfiability is NP-complete, validity is coNP-complete, and positive entailment is linear for sorted encodings. A stratification theorem shows that Boolean reasoning introduces no new atomic apartness beyond separoid closure. Fixed-dimensional consequence relations form a strictly decreasing hierarchy that stabilizes in dimension n minus 1 for n sites.

2606.12664 2026-06-12 eess.SY cs.SY 新提交

Modeling and Estimation of Solid Electrolyte Interphase during Formation in Battery Manufacturing

电池制造过程中固态电解质界面相形成过程的建模与估计

Zhiwen Wan, Hamidreza Movahedi, Wenxue Liu, Jingchen Ma, Jason B. Siegel, Andrew Weng, Anna Stefanopoulou

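AI summary Proposes a control-oriented, semi-empirical model that estimates SEI thickness growth from terminal voltage and cell expansion measured in-operando during manufacturing with a low-cost, micrometer-precision integrated-sensing fixture, laying the foundation for closed-loop control.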

Comments 8 pages, 6 figures. Accepted by the 2026 American Control Conference (ACC)

详情
AI中文摘要

固态电解质界面相(SEI)——一种决定锂离子电池寿命、安全性和效率的关键钝化层——在电池制造的最后一步(称为化成)中形成。传统的电池化成协议很大程度上基于经验,导致处理时间长,且对影响SEI质量和寿命性能的SEI生长速率控制有限。本文开发了一种控制导向的半经验模型,利用低成本微米精度集成传感夹具在制造过程中原位测量端电压和电池膨胀,估计SEI厚度增长。模型参数根据电池化成数据进行校准,并采用无迹卡尔曼滤波器估计SEI膜生长。结果为未来SEI生长的闭环控制奠定了基础,从而实现高质量和更高效的化成过程。

英文摘要

The solid electrolyte interphase (SEI) - a critical passivation layer that governs the longevity, safety, and efficiency of lithium-ion batteries - is created during the last step in cell manufacturing called cell formation. Conventional cell formation protocols are largely empirical, resulting in long processing times and limited control over the SEI growth rate that influences SEI quality and lifetime performance. This paper develops a control-oriented, semi-empirical model to estimate SEI thickness growth from terminal voltage and cell expansion measurements acquired in-operando during manufacturing using a low-cost micrometer-precision integrated-sensing fixture. Model parameters are calibrated against cell formation data, and an unscented Kalman filter is employed to estimate the SEI film growth. The results lay the foundation for future closed-loop control of SEI growth, enabling high-quality and more efficient formation processes.
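
As an illustration of how an unscented Kalman filter can track a slowly growing film from an indirect measurement, the sketch below uses a scalar state and two placeholder models: a diffusion-limited growth law `growth` and a linear thickness-to-expansion map `expansion`. Both models, all parameter values, and the function names are hypothetical stand-ins, not the calibrated model from the paper.

```python
import math


def sigma_points(x, P, lam=2.0):
    """Sigma points and weights for a scalar state (n = 1).

    lam = 2 corresponds to alpha = 1, beta = 0, kappa = 2, in which case
    the mean and covariance weights coincide.
    """
    spread = math.sqrt((1.0 + lam) * P)
    w0 = lam / (1.0 + lam)
    wi = 0.5 / (1.0 + lam)
    return [x, x + spread, x - spread], [w0, wi, wi]


def growth(x, rate=0.01):
    # Hypothetical diffusion-limited growth: a thicker film grows more slowly.
    return x + rate / max(x, 1e-6)


def expansion(x, gain=0.5):
    # Hypothetical linear map from film thickness to measured expansion.
    return gain * x


def ukf_step(x, P, y, Q=1e-4, R=1e-2):
    # Predict: push sigma points through the growth model.
    pts, w = sigma_points(x, P)
    fx = [growth(p) for p in pts]
    x_pred = sum(wi * f for wi, f in zip(w, fx))
    P_pred = sum(wi * (f - x_pred) ** 2 for wi, f in zip(w, fx)) + Q
    # Update: redraw sigma points and push them through the measurement model.
    pts, w = sigma_points(x_pred, P_pred)
    hy = [expansion(p) for p in pts]
    y_pred = sum(wi * h for wi, h in zip(w, hy))
    Pyy = sum(wi * (h - y_pred) ** 2 for wi, h in zip(w, hy)) + R
    Pxy = sum(wi * (p - x_pred) * (h - y_pred) for wi, p, h in zip(w, pts, hy))
    K = Pxy / Pyy
    return x_pred + K * (y - y_pred), P_pred - K * K * Pyy


def run(steps=50):
    """Noise-free simulation with a deliberately wrong initial estimate."""
    x_true, x_est, P = 1.0, 2.0, 1.0
    for _ in range(steps):
        x_true = growth(x_true)
        x_est, P = ukf_step(x_est, P, expansion(x_true))
    return x_true, x_est
```

Calling `run()` returns the simulated true thickness and the filter estimate. In this noise-free toy setting the estimate moves close to the truth within a few steps, despite starting at twice the true value.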

2606.12650 2026-06-12 cs.PL cs.PF 新提交

nomp: A Framework for Building Domain Specific Compilers

nomp: 构建领域特定编译器的框架

Thilina Ratnayaka, Kaushik Kulkarni, Nipuna Fernando, Pubudu Hewavitharana, Hirumal Priyashan, Poorna Gunathilaka, Nagitha Abeywickrema, Ravindu Hirimuthugoda, Tarun Prabhu, Kirshanthan Sundararajah, Sanath Jayasena

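AI summary Proposes nomp, a framework that pairs a pragma-based programming model with a runtime and reuses domain-specific optimization patterns to raise programmer productivity while preserving performance and portability.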

详情
AI中文摘要

低层GPU编程模型(CUDA、HIP、OpenCL等)提供对程序数据流和执行计划的精细控制,以提取接近硬件的性能。然而,由于其语法和语义的复杂性,学习曲线陡峭,降低了程序员的生产力。另一方面,高层模型(OpenMP、OpenACC等)作为低层模型的抽象,旨在提高程序员生产力,但实现与低层模型相当的性能是一个挑战。这两种方法在生产效率、可移植性和性能之间存在固有的权衡,没有一种通用解决方案能同时实现三者。然而,我们相信通过重用特定领域的优化模式,可以在不牺牲性能和可移植性的前提下提高程序员生产力。为此,我们提出nomp:一个用于构建领域特定编译器的框架。nomp包含一个基于pragma的编程模型和一个能够根据用户提供的元数据进行代码转换和生成的运行时。

英文摘要

The low-level GPU programming models (CUDA, HIP, OpenCL, etc.) provide detailed control of the data flow and execution plan of a program in order to extract close-to-metal performance. However, these have a steep learning curve due to the intricacies of their syntax and semantics. This reduces programmer productivity. On the other hand, high-level models (OpenMP, OpenACC, etc.) that serve as abstractions over the low-level models are aimed at improving programmer productivity, but achieving performance on par with the low-level models is a challenge. There are inherent trade-offs between productivity, portability, and performance in both approaches, and there is no one-size-fits-all solution that achieves all three simultaneously. However, we believe there is room to improve programmer productivity without sacrificing performance and portability by reusing optimization patterns specific to a given domain. To this end, we propose nomp: a framework for building domain-specific compilers. nomp consists of a pragma-based programming model and a runtime capable of code transformation and generation based on user-provided metadata.

2606.12648 2026-06-12 cs.HC 新提交

OpenRoundup: Multi-Table Data Wrangling Through Interactive Visualization

OpenRoundup:通过交互式可视化进行多表数据整理

Stephen Kasica, Charles Berret, Tamara Munzner

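AI summary Presents OpenRoundup, a system that lets data journalists consolidate multiple tables without writing code through interactive visualization; it follows a schema-first, values-on-demand paradigm and introduces eager table consolidation and a declarative vocabulary (Stack and Pack). A replication study demonstrates its expressive coverage, and a deployment study confirms its utility for practitioners who do not program.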

Comments 18 pages

详情
AI中文摘要

数据记者通常需要整合多个独立发布来源的记录以支持问责报道,但现有的交互式整理工具均以单表而非多表集合作为主要工作单元。我们提出OpenRoundup,一个开源、基于浏览器的系统,使数据记者无需编写代码即可将多个表格合并为单一的分析就绪输出。界面包含五个协调面板,实现了模式优先、按需取值的范式,具有实时模式预览、环境数据质量警报以及操作树的可递归树图可视化。基于DuckDB-WASM的纯客户端架构在浏览器中运行,为敏感的新闻数据提供了强大的隐私保护。该系统引入了两个概念性贡献:急切表合并,即在整理阶段早期通过交互式、增量式组装多个源表来构建复合表;以及一个由两个操作(Stack和Pack)组成的表合并声明式词汇。我们通过一项复制研究(作者仅使用界面重现了17个已发布的记者编程工作流)和一项部署研究(与四位专业数据记者合作)来评估该系统。复制研究证明了系统对现实世界合并任务的表达能力。部署研究确认了其对理解连接概念但缺乏编程技能的执行者的实用性,并揭示了数据新闻教育中一个意想不到的次要价值。

英文摘要

Data journalists routinely integrate records across multiple independently published sources to support accountability reporting, yet no existing interactive wrangling tool treats the collection of tables -- rather than the single table -- as its primary unit of work. We present OpenRoundup, an open-source, browser-based system that enables data journalists to consolidate multiple tables into a single analysis-ready output without writing code. The interface comprises five coordinated panels that implement a schema-first, values-on-demand paradigm with live schema previews, ambient data quality alerts, and a recursive treemap visualization of the evolving operation tree. A client-only architecture powered by DuckDB-WASM runs in the browser, providing strong data privacy guarantees suited to sensitive journalism data. The system introduces two conceptual contributions: eager table consolidation, in which a composite table is assembled early in the wrangling phase via interactive, incremental assembly of multiple source tables; and a declarative vocabulary for table consolidation consisting of two operations, Stack and Pack. We evaluate the system through a replication study in which the authors reproduce 17 published journalist programming workflows using only the interface, and a deployment study with four professional data journalists. The replication study demonstrates expressive coverage of real-world consolidation tasks. The deployment study confirms utility for practitioners who understand joins conceptually but lack the programming skills to execute them, and surfaces an unanticipated secondary value for data journalism education.

2606.12638 2026-06-12 cs.DC cs.AR 新提交

Eidola: Modeling Multi-GPU Network Communication Traffic in Distributed AI Workloads

Eidola: 分布式AI工作负载中多GPU网络通信流量建模

Ranganath R. Selagamsetty, Matthew Poremba, Bradford M. Beckmann, Joshua San Miguel, Mikko H. Lipasti

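AI summary Presents Eidola, a scalable extension to the gem5 simulation framework that uses annotated timing profiles to model inter-GPU communication traffic precisely, supporting fine-grained synchronization analysis and architectural exploration.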

Comments 13 pages, 11 figures, 1 table

详情
AI中文摘要

随着分布式AI工作负载规模的扩大,多GPU系统已成为训练大型模型的关键。尽管内核融合和计算与通信重叠等技术有助于减少延迟,但它们也引入了不规则和瞬态的流量模式,难以用现有工具建模。这些技术高度依赖细粒度同步和点对点通信,对互连带宽和延迟造成显著压力。在这项工作中,我们介绍了Eidola,这是gem5模拟框架的一个可扩展扩展,能够对GPU间通信流量进行详细建模。该扩展具有可扩展性,因为我们的GPU模型作为一个简洁的eidolon,模拟了流量建模所需的最小特征。Eidola使用来自真实应用的注释时序配置,以周期级精度模拟点对点GPU写入。这使得研究人员能够模拟和分析大规模多GPU配置下的同步行为。该模拟器支持可配置的每GPU流量模式,并能够在不同通信场景下进行隔离性能分析。我们通过重现融合内核执行中的变异性以及实现一个受SyncMon启发的同步机制,证实了Eidola的有效性,确认了轮询相关内存流量的减少。我们的结果表明,Eidola为研究GPU间通信提供了一个灵活且可扩展的平台,并支持现代分布式GPU系统中的架构探索。

英文摘要

As distributed AI workloads grow in scale, multi-GPU systems have become essential for training large models. Although techniques like kernel fusion and overlapping communication with computation help reduce delays, they also introduce irregular and transient traffic patterns that are difficult to model using existing tools. These techniques rely heavily on fine-grained synchronization and peer-to-peer communication, which place significant pressure on interconnect bandwidth and latency. In this work, we introduce Eidola, a scalable extension to the gem5 simulation framework that enables detailed modeling of inter-GPU communication traffic. The extension is scalable as our GPU model serves as a succinct eidolon, emulating the minimal characteristics needed for traffic modeling. Eidola uses annotated timing profiles from real applications to emulate peer-to-peer GPU writes with cycle-level precision. This allows researchers to simulate and analyze synchronization behavior across large multi-GPU configurations. The simulator supports configurable per-GPU traffic patterns and enables isolated performance analysis under different communication scenarios. We demonstrate Eidola's effectiveness by reproducing variability in fused kernel execution and by implementing a SyncMon-inspired synchronization mechanism, confirming reductions in polling-related memory traffic. Our results show that Eidola provides a flexible and scalable platform for studying inter-GPU communication and supports architectural exploration in modern distributed GPU systems.

2606.12631 2026-06-12 cs.CC 新提交

The Switching Lemma shows what the Switching Lemma cannot prove: an unconditional natural-proofs barrier

切换引理展示了切换引理无法证明的内容:一个无条件的自然证明障碍

Bruno Loff, Suhail Sherif, Navid Talebanfard, Francesca Ugazio

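AI summary This paper proves unconditionally that AC0-natural proofs cannot establish lower bounds larger than $2^{n^{7/(d-5)}}$ against depth-$d$ circuits, showing that Switching Lemma style arguments cannot go substantially beyond the lower bounds the Switching Lemma already yields.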

Comments 34 pages, 2 figures

详情
AI中文摘要

Razborov和Rudich (JCSS'97) 观察到所有已知的下界证明都遵循某种模式:当证明一个函数$F$是困难的时,证明过程会提供一个区分器,即一个高效的算法,能够区分简单函数和随机函数。他们称这种下界证明为自然证明。然后他们展示了一个自然证明障碍:在标准密码学假设下,自然证明无法证明针对布尔电路的超多项式下界。类似地,可以证明在合适的密码学假设下,自然证明无法显著改进针对恒定深度电路(AC0)的当前最先进下界。目前最先进的下界,使用Håstad的切换引理(SL),对于深度-$d$电路是$2^{n^{1/(d-1)}}$,并且(有条件地)没有自然证明能证明$2^{n^{c/d}}$的下界,其中$c$是某个大常数。在本文中,我们从$\textit{无条件的}$角度重新审视自然证明障碍。我们专注于AC0自然证明,即其区分器可由AC0电路计算的自然证明。Razborov和Rudich观察到基于SL的下界是AC0自然的。我们证明这对于大多数已知的针对恒定深度电路的下界技术都是成立的。然后我们为这类证明建立了一个无条件的障碍。通过局部化Trevisan--Xue伪随机生成器,我们能够证明没有AC0自然证明能证明针对深度-$d$电路的大于$2^{n^{7/(d-5)}}$的下界。这与SL前沿的定量范围相同,后者在$n$的幂次中是$1/(d-1)$。证明具有惊人的自指性质:Trevisan--Xue生成器的安全性证明关键依赖于SL,因此SL被用来证明AC0自然证明(如SL本身)无法证明比SL更好的AC0下界。

英文摘要

Razborov and Rudich (JCSS'97) observed that all known lower-bound proofs follow a certain pattern: when showing that a function $F$ is hard, along the way the proof provides us with a distinguisher, namely, an efficient algorithm which can distinguish easy functions from random functions. They called such lower-bound proofs natural proofs. They then showed a natural-proofs barrier: under standard cryptographic assumptions, natural proofs cannot show superpolynomial lower bounds against Boolean circuits. Along similar lines it can be shown that under a suitable cryptographic assumption, natural proofs cannot significantly improve the current state-of-the-art lower bound against constant-depth circuits (AC0). The state of the art, using Håstad's Switching Lemma (SL), is $2^{n^{1/(d-1)}}$ for depth-$d$ circuits, and (conditionally) no natural proof can prove lower bounds of $2^{n^{c/d}}$ for some large constant $c$. In this paper we revisit the natural-proofs barrier from an $\textit{unconditional}$ perspective. We focus on AC0-natural proofs, i.e. proofs whose distinguishers are computable by AC0 circuits. Razborov and Rudich observed that lower bounds based on SL are AC0-natural. We show that this is true for most known lower-bound techniques against constant-depth circuits. We then establish an unconditional barrier for such proofs. By localizing the Trevisan--Xue pseudorandom generator, we are able to show that no AC0-natural proof can prove a lower bound greater than $2^{n^{7/(d-5)}}$ against depth-$d$ circuits. This is in the same quantitative regime as the SL frontier which instead has $1/(d-1)$ in the power of $n$. The proof has a striking self-referential aspect: the proof of security of the Trevisan--Xue generator crucially relies on SL, and so SL has been used to show that AC0-natural proofs, such as SL itself, cannot prove AC0 lower bounds better than that of SL.

2606.12592 2026-06-12 cs.SE 新提交

Characterizing Tests in IoT Software: Practices, Challenges and Opportunities

物联网软件中的测试特征:实践、挑战与机遇

Rufeng Chen, Hengcheng Zhu, Wuqi Zhang, Zixu Zhou, Lili Wei

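AI summary Through the first empirical study of test cases in open-source IoT software, this paper evaluates test effectiveness, identifies the challenges of interacting with external dependencies, and analyzes the potential of mock objects.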

Comments 15 pages, 4 figures

Journal ref IEEE Transactions on Software Engineering, 2026

详情
AI中文摘要

物联网(IoT)正在经历快速增长。智能设备出现在智能家居和工业应用中,执行关键任务。物联网软件中的错误可能导致严重后果。例如,有缺陷的智能锁可能允许未经授权访问私人财产。测试是暴露软件错误和确保软件质量的主要实践。然而,关于物联网软件如何测试知之甚少。为填补这一空白,我们对开源物联网软件中的测试用例进行了首次实证研究。具体来说,我们评估了物联网软件中测试用例的有效性,探索了测试物联网软件固有的挑战,并分析了模拟对象的使用情况。我们的结果表明,虽然物联网软件通常包含相当数量的测试,但其有效性仍然有限。我们确定测试物联网软件的主要挑战是管理与各种外部依赖的复杂交互,例如其他依赖网络的物联网组件、文件系统、操作系统和数据库。我们还观察到,物联网软件中模拟对象的使用与我们识别的测试挑战密切相关。这种一致性表明模拟作为增强测试覆盖率和解决物联网软件测试复杂性的解决方案的潜力。

英文摘要

The Internet of Things (IoT) is experiencing rapid growth. Smart devices are emerging in smart homes and industrial applications, performing mission-critical tasks. Bugs in IoT software can lead to severe consequences. For example, a buggy smart lock can allow unauthorized access to a private property. Testing is a primary practice to expose software bugs and ensure software quality. However, little is known about how IoT software is tested. To bridge this gap, we conducted the first empirical study on test cases in open-source IoT software. Specifically, we evaluated the effectiveness of test cases in IoT software, explored the challenges inherent in testing IoT software, and analyzed the usage of mock objects. Our results indicate that while IoT software often contains a considerable number of tests, their effectiveness remains limited. We identified the primary challenges in testing IoT software as managing complex interactions with various external dependencies, such as other network-reliant IoT components, file systems, operating systems, and databases. We also observed that the use of mock objects in IoT software closely aligns with our identified testing challenges. This alignment demonstrates the potential of mocking as a solution to enhance test coverage and address the complexities of IoT software testing.
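
The role of mock objects in isolating a network-reliant dependency can be shown with a small example. This is a sketch built around a hypothetical `SmartLock` class and a hypothetical `auth_client.verify` call, neither of which comes from the study; only `unittest.mock.Mock` is a real standard-library API.

```python
from unittest.mock import Mock


class SmartLock:
    """Toy smart lock that asks a remote service whether a token is valid."""

    def __init__(self, auth_client):
        self.auth_client = auth_client  # hypothetical network-backed dependency
        self.locked = True

    def unlock(self, token):
        try:
            allowed = self.auth_client.verify(token)
        except ConnectionError:
            allowed = False  # fail closed when the network is unavailable
        if allowed:
            self.locked = False
        return not self.locked


# Case 1: the remote service accepts the token.
ok_client = Mock()
ok_client.verify.return_value = True
lock_ok = SmartLock(ok_client)
opened = lock_ok.unlock("token-123")

# Case 2: the network is down; the mock raises instead of returning.
down_client = Mock()
down_client.verify.side_effect = ConnectionError("unreachable")
lock_down = SmartLock(down_client)
opened_when_down = lock_down.unlock("token-123")
```

Replacing the remote service with a `Mock` lets the test exercise both the success path and the network-failure path without any device, broker, or server, which is the kind of external interaction the study identifies as hard to test.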

2606.12586 2026-06-12 cs.CR 新提交

Beyond Attack Success Rate: Examining Trigger Leakage in Vision-Language Agentic Systems

超越攻击成功率:视觉-语言智能系统中的触发器泄漏研究

Jiamin Chang, Salil Kanhere, Piotr Koniusz, Jason Xue, Hammond Pearce

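AI summary Introduces the notion of "trigger leakage", which captures the risk that backdoor triggers in vision-language agentic systems unintentionally activate hidden behaviors on visually or semantically similar inputs, and proposes the Neighbor Leakage Rate (NLR) metric to quantify it.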

详情
AI中文摘要

视觉-语言智能系统(VLAS)将视觉感知与规划、工具使用和物理动作相连接。这意味着后门型触发器可以通过决策管道及其连接的接口传播,从而使视觉后门成为系统级威胁。目前对此类后门的评估侧重于干净准确率和攻击成功率(ASR),这些指标衡量触发器是否有效,但并未评估攻击是否真正“精确”——即是否仅在预期时触发隐藏行为。在本工作中,我们将触发器精度的失败形式化为“触发器泄漏”:视觉或语义上接近预期触发器的输入,从而无意中激活攻击者指定的行为。为了量化这种泄漏,我们引入了邻域泄漏率(NLR)。实验表明,在3%的投毒率下,图标和文本触发器对常见视觉变换保持鲁棒性,但其邻近变体严重泄漏,NLR达到0.996(图标)和0.944(文本)。使用文本触发器作为受控探针,我们发现标准微调学习到的是宽泛的激活区域而非精确的触发条件,导致即使精确触发器缺失,邻近字符串也能调用恶意行为。在训练中添加编辑距离为1的硬负样本可显著缩小此激活区域并减少泄漏,包括在图像编辑和具身操作工作流中,泄漏的触发器可能传播到可执行程序和动作序列。

英文摘要

Vision-Language Agentic Systems (VLAS) connect visual perception to planning, tool use, and physical actions. This means backdoor-type triggers can propagate through both decision pipelines and their connected interfaces, thus making visual backdoors a system-level threat. Current evaluations on such backdoors focus on clean accuracy and attack success rate (ASR), metrics that capture whether a trigger works, but not whether an attack is actually "precise" -- i.e. whether it triggers hidden behaviors only when intended. In this work, we formalize the failure of trigger precision as "trigger leakage": inputs that are visually or semantically close to the intended trigger and therefore inadvertently activate the attacker-specified behavior. To quantify this leakage, we introduce Neighbor Leakage Rate (NLR). Our experiments show that at a 3% poisoning ratio, icon and text triggers remain robust to common visual transformations, but their neighboring variants leak heavily, with NLR reaching 0.996 (icon) and 0.944 (text). Using textual triggers as a controlled probe, we show that standard fine-tuning learns a broad activation region rather than an exact trigger condition, causing neighboring strings to invoke the malicious behavior even when the exact trigger is absent. Adding edit-distance-one hard-negative samples during training substantially narrows this activation region and reduces leakage, including in image-editing and embodied-manipulation workflows, where leaked triggers can propagate into executable programs and action sequences.
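
For textual triggers, the edit-distance-one neighborhood mentioned above is easy to enumerate. The sketch below assumes that NLR is the fraction of such neighbors that still activate the backdoor; the abstract does not give the exact formula, so this definition, the helper names, and the toy `activates` predicate standing in for a model are all assumptions.

```python
def edit_distance_one(trigger, alphabet):
    """All strings exactly one deletion, substitution, or insertion away."""
    neighbors = set()
    for i in range(len(trigger)):
        neighbors.add(trigger[:i] + trigger[i + 1:])  # deletion
        for ch in alphabet:
            if ch != trigger[i]:
                neighbors.add(trigger[:i] + ch + trigger[i + 1:])  # substitution
    for i in range(len(trigger) + 1):
        for ch in alphabet:
            neighbors.add(trigger[:i] + ch + trigger[i:])  # insertion
    neighbors.discard(trigger)
    return neighbors


def neighbor_leakage_rate(trigger, alphabet, activates):
    """Assumed NLR: share of edit-distance-one neighbors that fire the backdoor."""
    neighbors = edit_distance_one(trigger, alphabet)
    leaked = sum(1 for s in neighbors if activates(s))
    return leaked / len(neighbors)


# Toy stand-in for a backdoored model with an overly broad activation region:
# it fires on any string containing "a", not only on the exact trigger "ab".
nlr = neighbor_leakage_rate("ab", "ab", lambda s: "a" in s)
print(nlr)  # 0.75
```

The same neighbor set could serve as the hard negatives the abstract describes: pairing each neighbor with the clean, non-malicious behavior during fine-tuning is what narrows the activation region toward the exact trigger.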

2606.12504 2026-06-12 cs.LO 新提交

A Type Theory of Sense: Witnessed Choice in Stratified Semantic Spaces

一种意义类型论:分层语义空间中的见证选择

Iman Poernomo

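AI summary Proposes TTS, a dependent type theory that represents semantic composition by horn filling and records separation witnesses through measurement contexts, replacing globally canonical composition and supporting a geometric account of Fregean sense, reference, and hyperintensional difference.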

详情
AI中文摘要

我们引入TTS,一种依赖类型论,其中语义组合由horn填充表示,可能完成之间的区别相对于显式测量机制被见证。TTS用基于测量索引的不可区分性和构造性分离替换全局规范组合,允许填充空间在完成全部观察连接时被分类为规范的,在两个有根据的完成被正分离时被分类为分叉的。分离见证仅通过记录实际仪器输出的测量上下文进入演算,产生保守性、来源性和空记录无分叉结果。我们证明分叉在细化下持续而规范性可能失败,并精确刻画一个机制所做的识别何时能与另一个机制所做的分离一致共存。该框架支持弗雷格意义作为填充的选择、指称作为约束该选择的边界、超内涵差异作为测量的分离的几何解释,同时为分层表示空间和语言模型生成中的分支行为提供了可证伪的桥梁。

英文摘要

We introduce TTS, a dependent type theory in which semantic composition is represented by horn filling and distinctions between possible completions are witnessed relative to explicit measurement regimes. TTS replaces globally canonical composition with regime-indexed indiscernibility and constructive apartness, allowing filler spaces to be classified as canonical when all completions are observationally connected and forked when two warranted completions are positively separated. Separation witnesses enter the calculus only through measurement contexts recording actual instrument outputs, yielding conservativity, provenance, and a no-fork-from-the-empty-record result. We prove that forks persist under refinement while canonicity may fail, and characterize exactly when an identification made by one regime can consistently coexist with a separation made by another. This framework supports a geometric account of Fregean sense as a choice of filler, reference as the boundary constraining that choice, and hyperintensional difference as measured apartness, while providing a falsifiable bridge to stratified representation spaces and branching behaviour in language-model generation.