arXivDaily: arXiv daily academic digest, updated Monday to Friday
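2606.07301 2026-06-08 q-bio.QM New submission

Structure-guided taxonomic placement of divergent RNA viruses with ViraClass

Sheng Xu, Wenxuan Huang, Shutong Yue, Weiqiang Bai, Shiyang Feng, Xiaohan He, Bo Zhang, Qiantai Feng, Edward C. Holmes, Weifeng Shi, Siqi Sun

AI summary To address the low RdRp sequence similarity that hampers RNA virus classification, the paper proposes ViraClass, a protein-structure-based framework that performs hierarchical placement from phylum to genus and outperforms sequence-based methods at deep evolutionary distances.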

Abstract

Metatranscriptomic sequencing has expanded our knowledge of the RNA virosphere far more rapidly than novel viruses can be taxonomically classified. Taxonomic assignment above the family level is particularly difficult because the RNA-dependent RNA polymerase (RdRp) is often the only gene retained across RNA viruses yet exhibits little sequence similarity among highly divergent viruses. Here we show that RdRp protein structure retains taxonomic signal at evolutionary depths where RdRp primary sequence similarity has largely collapsed, and that the organization of this signal is consistent with the current ICTV hierarchy. Based on this, we developed ViraClass, a hierarchical framework for RNA virus taxonomic placement that uses RdRp structure for rank-by-rank assignment from phylum to genus, stopping at the deepest rank supported by confidence thresholds, and calibrated structural clustering for viruses that remain outside existing reference space. Across random-split, prospective and taxonomic hold-out benchmarks, ViraClass outperforms sequence-based and genome-content baselines. The largest gains emerge at deep evolutionary distances, in benchmarks that withhold entire families, orders or classes from the reference, where sequence-based methods lose most of their signal. In challenging boundary cases such as the Flaviviridae, ViraClass's structure-based placements capture the taxonomic boundary tensions highlighted by recent phylogenetic studies. When applied to a large collection of previously unclassified RdRp sequences, ViraClass places high-confidence queries into existing phyla and organizes the remainder into compact structural groups. ViraClass therefore provides a scalable approach from large-scale virus discovery to hierarchical taxonomic interpretation, particularly at the deep evolutionary ranges that current sequence-based pipelines cannot reach.
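
The rank-by-rank placement described in the abstract, which stops at the deepest rank supported by a confidence threshold, can be sketched as follows. This is only an illustration under stated assumptions, not ViraClass itself: the `place` function, the per-rank score dictionaries, the score values, and the uniform threshold of 0.8 are all hypothetical, and in practice the scores would come from a structure-based classifier.

```python
from typing import Dict, List, Tuple

RANKS = ["phylum", "class", "order", "family", "genus"]


def place(scores: Dict[str, Dict[str, float]],
          thresholds: Dict[str, float]) -> List[Tuple[str, str, float]]:
    """Walk the ranks from phylum to genus and stop at the first rank
    whose best candidate falls below that rank's confidence threshold."""
    placement = []
    for rank in RANKS:
        candidates = scores.get(rank, {})
        if not candidates:
            break
        taxon, confidence = max(candidates.items(), key=lambda kv: kv[1])
        if confidence < thresholds[rank]:
            break  # deeper ranks are not reported once confidence drops
        placement.append((rank, taxon, confidence))
    return placement


# Hypothetical scores for one query RdRp (values invented for illustration).
example_scores = {
    "phylum": {"Kitrinoviricota": 0.97, "Pisuviricota": 0.02},
    "class": {"Flasuviricetes": 0.91, "Alsuviricetes": 0.05},
    "order": {"Amarillovirales": 0.88},
    "family": {"Flaviviridae": 0.55},
    "genus": {"Orthoflavivirus": 0.40},
}
example_thresholds = {rank: 0.8 for rank in RANKS}
result = place(example_scores, example_thresholds)
for rank, taxon, confidence in result:
    print(f"{rank}: {taxon} ({confidence:.2f})")
```

In this toy input the family-level score is below the threshold, so the placement stops at the order and the query is left unassigned at family and genus.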

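2606.06889 2026-06-08 q-bio.GN New submission

From Genomes to Algorithms: Neural Network Applications for Palimpsest Detection in Medieval Manuscripts

James B. Harr, Madelin E. Blong, Tessa Gadomski, Kelly A. Meiklejohn, William E. Gundling

AI summary Using non-destructive sampling and sequencing together with machine learning classifiers (logistic regression and neural networks), the study evaluates how palimpsest preparation affects DNA integrity and explores computational methods for identifying palimpsests.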

Abstract

Biocodicology, the study of biological information preserved in manuscripts, offers new opportunities to examine parchment as both a textual and biological artefact. This study applies non-destructive sampling to isolate and sequence mitochondrial genomes (mtGenomes) from a 14th-century manuscript, Ms. Codex 1629, which contains both single-use and palimpsested folios. We sought to evaluate whether palimpsest preparation, including chemical washing, compromised DNA integrity and whether computational methods could aid in identifying reused parchment. DNA sequencing revealed that both single-use and palimpsested parchments retained sufficient mtGenomes for analysis, with no significant differences in genome coverage or depth. To assess the potential of computational biology in manuscript studies, we implemented machine learning classifiers, including logistic regression and neural networks, to distinguish palimpsests from single-use folios. Models achieved high precision but exhibited reduced recall for the minority palimpsest class, reflecting dataset imbalance. While additional ancient mtGenome samples from palimpsests and further testing are needed, this study demonstrates how integrating molecular biology and neural networks opens new approaches for palimpsest detection and underscores the evolving role of data science in biocodicology.

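2606.06749 2026-06-08 q-bio.QM New submission

Deterministic access to global viral sequence data enables robust agentic scientific discovery

Ferdous Nasri, Sarah Gurev, Patrick Varilly, Krithik Ramesh, Nuala A. O'Leary, Jonah Cool, Bernhard Y. Renard, Pardis C. Sabeti, Laura Luebbert

AI summary To address the high error rates of LLM-based scientific agents in viral data retrieval, the paper proposes gget virus, a deterministic query framework that formalizes NCBI Virus-style filtering, applies metadata constraints before download, and retrieves structured records, raising retrieval accuracy to at least 90% and cutting data transfer by more than 98% for high-volume queries.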

Abstract

Public viral genome resources such as the National Center for Biotechnology Information (NCBI) Virus database are central to outbreak response, evolutionary analysis, vaccine design, and genomic surveillance. Yet many high-value retrieval workflows remain optimized for interactive use rather than deterministic, reproducible programmatic interfaces. This creates a challenge for Large Language Model (LLM)-based scientific agents, where errors in metadata interpretation, filtering logic, or retrieval can propagate into incorrect datasets. To evaluate agentic viral data retrieval, we built VirBench, a manually curated benchmark of 120 queries spanning diverse pathogens, taxonomic levels, and metadata filters. When autonomous AI systems, including Biomni, Claude, GPT, and Edison Analysis, were tasked with these queries without a dedicated retrieval layer, performance varied widely: mean accuracy ranged from 16.9% for Claude Sonnet 4 to 91.3% for GPT-5.5, with newer frontier models showing progress but residual errors remaining consequential. To address this, we built gget virus, a deterministic query framework that formalizes NCBI Virus-style filtering as a reproducible programmatic system. By staging retrieval, applying metadata constraints before sequence download, and retrieving structured GenBank records, gget virus reduces data transfer by more than 98% for high-volume queries while preserving exact-match semantics. Instructing autonomous AI systems to use gget virus increased accuracy to at least 90.0% across all evaluated systems and up to 99.7% for GPT-5.5, improved response stability to 0.92-1.00, reduced error magnitude, and generally decreased runtime and tool calls. Together, this work establishes deterministic data access as critical infrastructure for reliable agentic science and provides a reproducible retrieval layer for robust human- and AI-driven viral genomics workflows.
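
The staged pattern the abstract describes (apply exact-match metadata constraints first, download sequences only for the surviving records) can be illustrated with a small in-memory sketch. It does not use the gget virus interface or any NCBI endpoint: `Record`, `filter_metadata`, `fetch_sequences`, the accession strings, and the stand-in `fake_fetch` downloader are all hypothetical names and data invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Record:
    """Hypothetical metadata row; real records carry many more fields."""
    accession: str
    host: str
    country: str
    length: int


def filter_metadata(records: Iterable[Record], *, host: str, country: str,
                    min_length: int) -> List[Record]:
    """Stage 1: exact-match filtering on metadata only, no sequence data."""
    return [r for r in records
            if r.host == host and r.country == country
            and r.length >= min_length]


def fetch_sequences(accessions: List[str],
                    fetch_one: Callable[[str], str]) -> Dict[str, str]:
    """Stage 2: download sequences only for accessions kept by stage 1."""
    return {acc: fetch_one(acc) for acc in accessions}


# Invented catalogue; accessions and values are placeholders.
catalog = [
    Record("ACC0001", "Homo sapiens", "Sierra Leone", 18950),
    Record("ACC0002", "Homo sapiens", "Guinea", 18940),
    Record("ACC0003", "Pteropus", "Sierra Leone", 18900),
    Record("ACC0004", "Homo sapiens", "Sierra Leone", 900),
]
downloads: List[str] = []


def fake_fetch(accession: str) -> str:
    """Stand-in for a network download; records which accessions were hit."""
    downloads.append(accession)
    return "ACGT"


hits = filter_metadata(catalog, host="Homo sapiens",
                       country="Sierra Leone", min_length=10000)
sequences = fetch_sequences([r.accession for r in hits], fake_fetch)
print(downloads)
```

Because filtering happens before any download, only one of the four catalogue entries triggers a fetch here, which is the mechanism by which a staged design can cut data transfer.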

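2606.06562 2026-06-08 q-bio.QM New submission

Iterative AI-guided optimisation of selective triple-drug combinations for breast cancer

Oghenejokpeme Orhobor, Abbi Abdel-Rehim, Emma Tate, Holly X. Smith, Elizabeth Bourne, Ross J. Collins, Larisa N. Soldatova, Ross D. King

AI summary Proposes an AI-guided, QSAR-driven iterative optimisation framework that combines machine learning with automated experimental screening for closed-loop discovery of selective three-drug combinations, rapidly enriching for effective and selective regimens against MCF7 breast cancer cells.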

Comments 4 figures, 3 tables

Abstract

Personalised cancer therapy aims to tailor treatment to individual tumour profiles, yet tumour heterogeneity and adaptive resistance continue to limit clinical efficacy. Drug combinations offer a strategy to overcome resistance by simultaneously targeting multiple pathways, but their rational design is constrained by the vast combinatorial search space and experimental cost. Here, we present an AI-guided, QSAR-driven iterative optimisation framework that integrates machine learning with automated experimental screening to enable closed-loop discovery of selective multi-drug therapies. Starting from an initial random screen, the system iteratively predicts, tests, and refines three-drug combinations targeting MCF7 breast cancer cells. Incorporation of non-tumorigenic MCF10A cells enables explicit optimisation of tumour-selective efficacy, prioritising regimens that maximise cancer cell killing while sparing healthy cells. Across successive iterations, the framework rapidly enriched for highly selective, high-efficacy combinations, while maintaining chemical and mechanistic diversity and avoiding convergence on a narrow solution space. By continuously learning from experimental feedback, the approach efficiently navigates millions of combinations to identify a small set of validated, tumour-selective regimens. These results establish a scalable proof-of-concept for AI-driven, closed-loop optimisation of higher-order drug combinations, demonstrating how iterative integration of computation and experimentation can enable adaptive and potentially personalised therapeutic design in precision oncology.

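2606.07487 2026-06-08 cs.MA cs.GT cs.SI New submission

Modelling Opinion Dynamics at Scale with Deep MARL

Lukas Seier, Brandon Kaplowitz, Sebastian Towers, Richard Bailey, Jakob Foerster

AI summary Introduces a GPU-accelerated consensus and truth-finding game, extends other-play to general-sum social interactions, validates the model on a subset of the Bluesky network, and finds that high conformity reduces collective accuracy and promotes dishonest behaviour.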

Comments 35 pages, 28 figures, preprint

Abstract

Modelling opinion dynamics typically relies on hand-crafted local interaction rules to study emergent macroscopic phenomena such as consensus and polarisation. In contrast, multi-agent reinforcement learning (MARL) enables agents to learn such behaviours directly by optimising simple rewards. To explore the potential of MARL for opinion dynamics, we introduce a GPU-accelerated consensus and truth-finding game that scales to populations of up to 1000 agents, comparable to many real-world social sub-networks. To prevent unrealistic conventions, we extend other-play to general-sum social interactions. We next validate our model on a subset of the Bluesky network by recovering agent importance structures from graph topology alone via a learned attention layer, finding that highly conforming populations most closely match human data. In large social media networks such high levels of conformity significantly reduce collective accuracy and promote dishonest agents that lie to fit in. By contrast, small, dynamic hunter-gatherer networks are less affected; here, conformity can even improve collective agreement. This suggests a mismatch between evolved human conformity heuristics and modern social media environments as a potential contributor to misinformation.

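2606.07486 2026-06-08 eess.SY cs.SY New submission

OPENPATH: A Supervisor-Specialist Agent System for Personalized, Accessible, and Multi-stop Urban Trip Planning

Ziyang Xiong, He Zong, Zhiyuan Xue, Manxi Wu

AI summary Proposes OPENPATH, a supervisor-specialist multi-agent system that combines LLM parsing of natural language with classical route-optimization algorithms, supporting personalized preferences, multi-stop planning, and accessibility requirements, and enabling city-scale accessibility analysis.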

Abstract

Urban trip-planning systems are commonly optimized for travel time and cost, but they offer limited support for the heterogeneous needs that real travelers bring, such as personalized preferences, multi-stop itinerary construction, and end-to-end wheelchair accessibility. We present OPENPATH, a supervisor-specialist multi-agent system that handles all of these tasks within a single architecture. OPENPATH adopts a deliberate division of labor: LLM agents parse natural-language input, classify request intent, and orchestrate execution, while classical algorithms perform route optimization over curated mobility and accessibility data. This design ensures that the resulting trip honors heterogeneous user preferences and enforces strict accessibility requirements when requested. Beyond per-user planning, OPENPATH doubles as a measurement instrument for city-scale accessibility analysis: applied to NYC, the system reveals substantial ADA infrastructure gaps and quantifies their effect on job accessibility for wheelchair users. Overall, this study shows how a supervisor-specialist LLM agentic framework can support heterogeneous trip planning and transparent, equitable transportation analysis in real urban environments.

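2606.07470 2026-06-08 cs.CR New submission

Verifiable and Confidential DNN Inference on Low-End Edge Devices

Mohamed Khalil Kiri, Ivan De Oliveira Nunes, Aurélien Francillon, Norrathep Rattanavipanon

AI summary Proposes VECODI, a framework built on SHANGRI-LA, an execution abstraction for TrustZone-M TEEs, which runs inference code in the Non-Secure World and relies on minimal Secure-World support to provide model confidentiality and verifiable results on low-end edge devices.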

Comments 12 pages, 4 figures, 5 tables, 1 algorithm

Abstract

Deploying deep neural network (DNN) inference on low-end edge devices raises two key challenges: protecting model confidentiality against a potentially compromised edge system and enabling verifiable inference without incurring prohibitive overhead. Existing approaches either house partial models and inference software within trusted execution environments (TEEs), resulting in high cost and an application-dependent trusted computing base (TCB), or execute in untrusted environments, providing little security. In this work, we present VECODI, a framework for verifiable and confidential DNN inference on constrained edge devices. At its core, VECODI introduces SHANGRI-LA, a new execution abstraction on TrustZone-M TEEs that establishes a third runtime environment with privileges strictly between the Secure and Non-Secure Worlds. VECODI leverages SHANGRI-LA to execute untrusted inference code in the Non-Secure World while using minimal application-agnostic Secure-World support to protect model confidentiality and enable verifiability (with respect to proper execution of inference code and model parameters) of inference results. We realize VECODI on a real-world NUCLEO-L552ZE-Q development board and open-source its prototype. Our results show VECODI's small TCB, memory footprint, and runtime overhead, making it a practical option for secure inference in low-end edge devices.

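2606.07453 2026-06-08 cs.DS cs.DM New submission

Odd Cycle Transversal in $P_k$-Free Graphs

Akramah Faizi, Arash Rafiey

AI summary For $P_k$-free graphs, gives a constant-factor approximation algorithm based on a decomposition into rings of bipartite graphs, with approximation ratio $k-2$ for odd $k$ and $k-3$ for even $k$.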

Abstract

The Odd Cycle Transversal (OCT) problem, which asks for a minimum subset of vertices whose removal renders a graph bipartite, is a central problem in algorithmic graph theory. It is known to be NP-complete even on $P_k$-free graphs for $k \ge 6$. Furthermore, assuming the Unique Games Conjecture (UGC), OCT does not admit a constant-factor approximation algorithm on general graphs. Motivated by these hardness results, we investigate the approximability of OCT on $P_k$-free graphs. We first establish that the problem becomes polynomial-time solvable on specific subclasses of $P_k$-free graphs, most notably $(P_6, C_3)$-free graphs, by exploiting a structural decomposition into rings of bipartite graphs. Leveraging these tractable substructures as a basis, we present a constant-factor approximation algorithm for OCT on general $P_k$-free graphs. We achieve an approximation ratio of $k-2$ when $k$ is odd and $k-3$ when $k$ is even. These results provide the first nontrivial constant-factor approximations for this class dependent on $k$, aligning with the UGC implication that no approximation factor independent of $k$ is likely to exist.
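
To make the problem statement concrete, the sketch below computes a minimum odd cycle transversal by exhaustive search on a tiny graph. It is not the paper's approximation algorithm: it is an exponential-time baseline for illustration only, and the helper names `build_graph`, `is_bipartite`, and `min_oct` are hypothetical.

```python
from collections import deque
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Graph = Dict[int, Set[int]]


def build_graph(edges: Iterable[Tuple[int, int]]) -> Graph:
    """Build an undirected adjacency map from an edge list."""
    graph: Graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def is_bipartite(graph: Graph, removed: FrozenSet[int] = frozenset()) -> bool:
    """2-colour the graph by BFS, ignoring the vertices in `removed`."""
    colour: Dict[int, int] = {}
    for start in graph:
        if start in removed or start in colour:
            continue
        colour[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v in removed:
                    continue
                if v not in colour:
                    colour[v] = 1 - colour[u]
                    queue.append(v)
                elif colour[v] == colour[u]:
                    return False  # an odd cycle survives
    return True


def min_oct(graph: Graph) -> Set[int]:
    """Try vertex subsets in increasing size; the first one whose removal
    leaves a bipartite graph is a minimum odd cycle transversal."""
    vertices = sorted(graph)
    for size in range(len(vertices) + 1):
        for subset in combinations(vertices, size):
            if is_bipartite(graph, frozenset(subset)):
                return set(subset)
    return set(vertices)


# Two triangles sharing vertex 2: deleting vertex 2 breaks both odd cycles.
bowtie = build_graph([(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 2)])
print(min_oct(bowtie))
```

The search visits up to $2^n$ subsets, which is why approximation algorithms with guarantees on restricted graph classes are of interest.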

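2606.07450 2026-06-08 cs.SI q-fin.PM q-fin.ST New submission

Information Networks of Stock Prices

Muhammad Aldy Hassan, Hokky Situngkir

AI summary Comparing Pearson correlation and mutual information on the Indonesian capital market, the paper finds that the combination of Pearson correlation, MST, and Infomap is the most robust for recovering sector classifications, while mutual information combined with PMFG reveals hidden economic substructures.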

Comments 12 pages, 6 figures

Abstract

The collective movement of stock prices harbors complex interdependencies that are conventionally simplified only through a linear lens. This paper explores computed structural network representations in the Indonesian capital market by testing the limits of Pearson correlation and Mutual Information (MI) in unveiling the spectral dynamics of the market. Across 2,328 rolling observation windows from 2015 to 2025, we examine 24 methodological configurations that combine three dependency estimators (Pearson, MI adaptive binning, and MI-kNN), two graph filtering schemes (Minimum Spanning Tree/MST and Planar Maximally Filtered Graph/PMFG), and four community decoders. The empirical results unveil a fundamental reality: topological richness does not always resonate with sectoral classification precision. The Pearson, MST, and Infomap configuration is shown to remain the most robust foundation for recovering conventional sectoral taxonomy. Nevertheless, when deeper observation demands the exposition of local structures and the weave of heterogeneous communities, the architectural relaxation through PMFG demonstrates its superiority. In the realm of residual information detection, MI adaptive binning appears far more proportional than kNN; histogram-based regularization successfully tames empirical noise without sweeping away traces of non-linear dependency. Ultimately, the synergy of MI and PMFG is not positioned to dethrone the dominance of linear correlation, but rather to provide an essential analytical lens for excavating hidden economic sub-structures -- such as the cohesion of commodity regimes -- that have long transcended the rigid boundaries of the market's formal sectors.
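
One of the configurations named in the abstract, Pearson correlation filtered through a Minimum Spanning Tree, can be sketched in a few lines. The abstract does not state which distance transform the authors use; the sketch assumes the common choice $d = \sqrt{2(1-\rho)}$, and the ticker names, return values, and the function `correlation_mst` are invented for illustration.

```python
import math
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Sample Pearson correlation of two equal-length series.
    Raises ZeroDivisionError if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_mst(returns: Dict[str, List[float]]) -> List[Tuple[str, str, float]]:
    """Kruskal's algorithm on the complete graph whose edge weights are
    the assumed distance d = sqrt(2 * (1 - rho))."""
    names = sorted(returns)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            rho = pearson(returns[a], returns[b])
            # max() guards against tiny negative values from rounding.
            edges.append((math.sqrt(max(0.0, 2.0 * (1.0 - rho))), a, b))
    edges.sort()

    parent = {name: name for name in names}

    def find(name: str) -> str:
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    tree = []
    for dist, a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # edge joins two components: keep it
            parent[root_a] = root_b
            tree.append((a, b, dist))
    return tree


# Invented return series for three placeholder tickers.
toy_returns = {
    "AAA": [0.01, -0.02, 0.03, 0.00],
    "BBB": [0.02, -0.04, 0.06, 0.00],   # exactly twice AAA, so rho = 1
    "CCC": [0.03, 0.01, -0.02, 0.02],
}
tree = correlation_mst(toy_returns)
for a, b, dist in tree:
    print(a, b, round(dist, 4))
```

The tree keeps only $n-1$ of the $n(n-1)/2$ pairwise links, which is the filtering step that a PMFG relaxes by retaining more edges.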

2606.07448 2026-06-08 cs.SE New submission

Agentic Very Much! Adoption of Coding Agent in New GitHub Projects

Romain Robbes, Théo Matricon, Thomas Degueule, Andre Hora, Stefano Zacchiroli

AI summary Studies the adoption of coding agents in newly created GitHub projects, finding an adoption rate more than twice that of the previous study and a noticeably higher share of AI-assisted commits.

Abstract

In previous work, we investigated the adoption of coding agents in GitHub projects and found it to be very significant. This study continues that line of work but analyses new projects created after the previous study. In this new sample, we find that the adoption of coding agents is more than twice as high. We also find that adoption is significantly more intensive: the proportion of AI-assisted commits is noticeably higher, despite strong signs that we do not detect all of them.

2606.07439 2026-06-08 cs.AR New submission

A 65 nm Multi-Modal Bayesian Inference Engine with 16.3 fJ/Sample Calibration-Free GRNG for Risk-Aware At-Home Skin Lesion Screening

Steven Davis, Likai Pei, Jianbo Liu, Zephan M. Enciso, Boyang Cheng, Xueji Zhao, Danny Z. Chen, Ningyuan Cao

AI summary Proposes a 65 nm risk-aware multimodal Bayesian inference engine whose compute-in-memory architecture performs in-word Mixture-of-Gaussian sampling, improving uncertainty modeling and surpassing existing unimodal Bayesian neural networks in robustness and accuracy.

Abstract

We present a 65-nm risk-aware multimodal Bayesian inference engine for privacy-preserving, fully on-device skin lesion screening under uncontrolled at-home conditions. The proposed compute-in-memory architecture performs in-word Mixture-of-Gaussian sampling, improving uncertainty modeling beyond conventional unimodal Bayesian neural networks. This added probabilistic expressiveness increases equal-risk operating coverage by 1.4x, improves robustness to user-data perturbations by >1.5x, enhances process-variation resilience by 5.5x, and improves balanced accuracy by 1.8% over state-of-the-art unimodal Bayesian neural networks. Hardware robustness is further supported by calibration-free Gaussian random-number generation using complementary process variation, achieving 16.3 fJ/sample and 168.6 GSa/s/mm^2 efficiency. These results demonstrate a practical, energy-efficient, and risk-aware edge-AI solution for privacy-conscious medical screening.

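2606.07434 2026-06-08 cs.GT New submission

Evidence Markets

Safwan Hossain, Gabriel Andrade, Chengqi Zang, Yiling Chen

AI summary Proposes evidence markets, which use a logarithmic market scoring rule with dynamically adjusted liquidity to incentivize the submission of both evidence and beliefs and to support endogenous resolution; the paper proves bounded loss, shows that evidence is rewarded in proportion to market uncertainty, and establishes that truthful reporting is an ε-DSIC strategy.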

Abstract

Modern prediction markets face two limitations that restrict their applicability in a range of settings: (i) they reveal what the crowd believes but not the evidence or reasoning behind those beliefs, and (ii) they require an event with an external ground truth that resolves at a known future date. We address these twin challenges by introducing evidence markets, a generalization of prediction markets that incentivizes the submission of evidence alongside beliefs and can be endogenously resolved using the crowd-sourced evidence if external resolution is not possible. At its core, the market uses a logarithmic market scoring rule whose liquidity parameter changes dynamically with the accumulated evidence quality. We prove that platform loss is bounded, that evidence is rewarded in proportion to the current market uncertainty, and that the market can be equivalently implemented through an automated market maker. In the case where the market resolves endogenously based on submitted evidence, we characterize how withholding evidence shifts a trader's belief about resolution and use it to prove that truthful belief and evidence reporting is always an $\varepsilon$-dominant strategy incentive compatible (DSIC) strategy. To address operational considerations, we propose evidence verification via an LLM-as-a-Judge framework with staking and give an asynchronous execution algorithm that is not bottlenecked by verification. Throughout the work, we use LLM evaluations -- determining which model is best for a given task -- as a salient and representative running example for our proposed market.
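
For readers unfamiliar with the logarithmic market scoring rule (LMSR) that the abstract builds on, a minimal sketch of the standard fixed-liquidity version follows. It assumes a constant liquidity parameter `b`, whereas the paper's market varies liquidity with accumulated evidence quality, which this sketch does not model. The function names are illustrative. With fixed `b` and `n` outcomes, the cost function is $C(q) = b \ln \sum_i e^{q_i/b}$ and the market maker's worst-case loss is $b \ln n$.

```python
import math
from typing import List


def lmsr_cost(q: List[float], b: float) -> float:
    """C(q) = b * log(sum_i exp(q_i / b)), computed in a numerically
    stable way by factoring out the largest exponent."""
    m = max(x / b for x in q)
    return b * (m + math.log(sum(math.exp(x / b - m) for x in q)))


def lmsr_prices(q: List[float], b: float) -> List[float]:
    """Instantaneous prices p_i = exp(q_i / b) / sum_j exp(q_j / b)."""
    m = max(x / b for x in q)
    weights = [math.exp(x / b - m) for x in q]
    total = sum(weights)
    return [w / total for w in weights]


def trade_cost(q: List[float], delta: List[float], b: float) -> float:
    """What a trader pays to move outstanding shares from q to q + delta."""
    after = [a + d for a, d in zip(q, delta)]
    return lmsr_cost(after, b) - lmsr_cost(q, b)


b = 10.0
start = [0.0, 0.0]                      # two outcomes, no shares sold yet
print(lmsr_prices(start, b))            # uniform prices before any trade
paid = trade_cost(start, [10.0, 0.0], b)
print(round(paid, 4))                   # cost of 10 shares of outcome 0

# Worst case for the market maker: a trader buys many shares of the
# outcome that occurs; loss = payout minus revenue, which is at most b*ln(2).
revenue = trade_cost(start, [1000.0, 0.0], b)
loss = 1000.0 - revenue
print(round(loss, 4), round(b * math.log(2), 4))
```

Larger `b` makes prices move more slowly per share and raises the loss bound, which is the trade-off a dynamically adjusted liquidity parameter has to manage.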

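2606.07427 2026-06-08 cs.CE New submission

High-Frequency Preconditioners for Electromagnetic Integral Equations Based on Helmholtz Regularizations

S. Ciciriello, V. Giunzioni, A. Dély, A. Merlini, S. B. Adrian, F. P. Andriulli

AI summary To address the ill-conditioning of the electric field integral equation across frequency and discretization regimes, the paper proposes a new preconditioning strategy based on the shifted Helmholtz operator that stabilizes iteration counts and achieves quasi-linear complexity.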

Abstract

The numerical solution of the Electric Field Integral Equation (EFIE) via the Boundary Element Method (BEM) can be computationally challenging due to conditioning issues arising in different regimes, such as (i) when the frequency decreases and the discretization density remains constant, (ii) when the frequency is kept constant while the discretization is refined, and (iii) when the frequency increases along with the discretization density. To address these issues, several preconditioning approaches for the related matrix system have been developed in the literature, only a few of which address all regimes simultaneously. This paper investigates one of these techniques and presents a strategy for accelerating the associated matrix-vector products (MVPs). In particular, we propose a novel preconditioning strategy for the shifted Helmholtz operator, for which standard pseudo-inversion techniques have shown unsatisfactory results. Instead, the application of our preconditioning technique stabilizes the number of iterations in all the aforementioned regimes. In view of these achievements, the pseudo-inversion of the shifted Helmholtz operator can be obtained in quasi-linear complexity when proper acceleration strategies are used, thus enabling the numerical solution of the EFIE with the same complexity.

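2606.07420 2026-06-08 cs.CR New submission

Lost in Migration: Exposing Android Framework Vulnerabilities in Parallel Java-Kotlin Implementations

Rui Li, Wenrui Diao, Debin Gao

AI summary Presents the first systematic study of semantic divergences in parallel Java-Kotlin implementations in the Android framework; the ParaDroid analysis framework identifies and compares parallel methods, finding 37 vulnerable divergences, with three vulnerabilities confirmed and two CVE IDs assigned.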

Comments 14 pages

Abstract

Android has adopted Kotlin alongside Java across apps and core system components. During this shift, we observe parallel implementations in the Android Open Source Project (AOSP) where the same component is implemented in both Java and Kotlin. In principle, their functional purposes are identical. In practice, subtle semantic divergences can appear. Such divergences are not vulnerabilities by themselves, but they provide useful clues that may reveal flaws in surrounding enforcement logic. To the best of our knowledge, this paper presents the first systematic study of Java-Kotlin parallel implementations in the Android framework and examines their security implications. We design and build ParaDroid, an analysis framework that identifies parallel methods at scale and compares their behaviors. ParaDroid normalizes code into a bytecode-level intermediate representation, reconstructs class-to-source mappings, and uses large language models to reason about method semantics and identify behavioral divergences. Evaluated on AOSP Android 14-16, ParaDroid identified 329 parallel method pairs and 37 vulnerable divergences. We responsibly disclosed the exploitable issues to the Android Security Team. Three vulnerabilities and two bugs have been confirmed, and two CVE IDs have been assigned. Our results demonstrate that parallel Java-Kotlin code paths provide a practical surface for discovering security flaws in modern Android.

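2606.07408 2026-06-08 cs.DS cs.DB cs.FL cs.LO New submission

Earliest query answering over streamed trees

Mateusz Gienieczko, Martín Muñoz, Filip Murlak, Charles Paperman

AI summary For large streamed JSON/XML documents, presents earliest query answering for unary queries expressible in monadic second-order logic (MSO), achieving constant-time updates with low latency and memory footprint.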

Abstract

Streaming allows executing queries over massive JSON or XML documents whose size makes it infeasible to fully parse them into a tree. Earliest query answering is a radical approach to reducing latency and memory footprint. To minimize latency, a document node must be returned as soon as the node is guaranteed to be an answer regardless of how the document ends. Similarly, to minimize memory footprint, a node must be discarded as soon as it cannot become an answer regardless of how the document ends. For simple queries that select nodes based on the path from the root, the decision for each node can be made on the spot, but practical languages such as XPath or JSONPath support filters, which allow selecting nodes based on information collected from various parts of the document, possibly further down the stream. This makes earliest query answering a challenging task, as candidate nodes must be kept in memory until it becomes clear that they can be safely returned or discarded. We show that this can be done for all unary queries expressible in monadic second-order logic (MSO), while ensuring constant update time -- provided that nodes are returned by passing a suitable iterator, rather than one by one.
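
The easy case mentioned in the abstract, queries that select nodes purely by their path from the root, can be sketched as a streaming generator that decides each node at its opening event. This covers only that simple case and not the filter queries the paper addresses. The event encoding, the function `select_by_path`, and the toy document are assumptions made for illustration.

```python
from typing import Iterable, Iterator, List, Tuple

Event = Tuple[str, str]  # ("open", label) or ("close", label)


def select_by_path(events: Iterable[Event], path: List[str]) -> Iterator[int]:
    """Yield the index (in document order) of every node whose root-to-node
    label path equals `path`. Each node is decided when its opening event
    arrives, so memory use is proportional to the current depth."""
    stack: List[str] = []
    node_id = -1
    for kind, label in events:
        if kind == "open":
            node_id += 1
            stack.append(label)
            if stack == path:
                yield node_id  # answer is certain however the stream ends
        elif kind == "close":
            stack.pop()
        else:
            raise ValueError(f"unknown event kind: {kind!r}")


# Toy stream for: library(book(title), book(title))
toy_events: List[Event] = [
    ("open", "library"),
    ("open", "book"), ("open", "title"), ("close", "title"), ("close", "book"),
    ("open", "book"), ("open", "title"), ("close", "title"), ("close", "book"),
    ("close", "library"),
]
answers = list(select_by_path(toy_events, ["library", "book", "title"]))
print(answers)
```

A filter such as "books that contain a title" breaks this scheme: the `book` node would have to be held as a candidate until a later event confirms or rules it out, which is the situation the paper's result handles.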

2606.07393 2026-06-08 cs.SE 新提交

Is US Defense Acquisition Ready to Acquire AI-Enabled Capabilities? Assessing the DoD Software Acquisition Pathway Through a Scenario-Based Policy Analysis

美国国防采办是否准备好采办人工智能赋能能力?通过基于场景的政策分析评估国防部软件采办路径

Daniel Lugo, James C. Davis

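AI summary Using scenario-based policy analysis, the paper assesses whether the DoD Software Acquisition Pathway is sufficient for the unique demands of AI acquisition, finds an actionability gap in the core guidance, and recommends adding an AI-supporting sub-path and refining artifacts.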

Comments Submitted to ACM Digital Government: Research and Practice in April 2026

详情
AI中文摘要

随着AI系统从实验原型过渡到关键任务工具,它们对动态数据、演化模型和治理的依赖引发了对现有采办路径能否跟上步伐的质疑。美国国防部通过适应性采办框架对其采办流程进行了现代化改造,其中软件采办路径(SWP)是采办软件密集型能力的主要机制。本文评估SWP是否足以应对AI采办的独特需求。我们进行了一项基于场景的评估,通过SWP的关键规划活动追踪一个假设的AI赋能项目,以评估政策如何转化为项目工件和决策。我们使用政策场景分析来检验以SWP为中心的治理堆栈是否为AI采办提供了足够的可操作支持。该治理堆栈为迭代交付和AI测试提供了可行基础。然而,我们在核心指南中发现了一个反复出现的可操作性问题。针对数据溯源、生命周期管理和人工监督的AI特定控制措施仍然分布在补充文件中,而不是嵌入到执行SWP的项目面对机制中。这种脱节使得项目办公室依赖于不一致的本地解释。最后,我们建议增设一个AI支持子路径并针对性地完善工件,以更好地弥合这种政策到工件的差距。

英文摘要

As AI systems transition from experimental prototypes to mission-critical tools, their dependence on dynamic data, evolving models, and governance raises questions about whether existing acquisition pathways can keep pace. The U.S. Department of Defense has modernized its acquisition processes through the Adaptive Acquisition Framework, with the Software Acquisition Pathway (SWP) serving as the primary mechanism for acquiring software-intensive capabilities. This paper evaluates whether SWP is sufficient to address the unique demands of AI acquisition. In this work, we perform a scenario-based evaluation that traces a notional AI-enabled program through key SWP planning activities to assess how policy translates into program artifacts and decisions. We use Policy Scenario Analysis to examine whether the SWP-centered governance stack provides sufficient actionable support for AI acquisition. The governance stack provides a viable foundation for iterative delivery and AI testing. However, we identify a recurring actionability problem in the core guidance. AI-specific controls for data provenance, lifecycle management, and human oversight remain distributed across supplemental documents rather than embedded in the program-facing mechanisms through which SWP is executed. This disconnect leaves program offices reliant on inconsistent local interpretation. We conclude by recommending an AI-supporting sub-path and targeted artifact refinements to better bridge this policy-to-artifact gap.

2606.07375 2026-06-08 eess.SY cs.CR cs.SY 新提交

An End-to-End Encrypted Control Pipeline for Multi-Agent Coordination via CKKS Homomorphic Encryption

基于CKKS同态加密的多智能体协同端到端加密控制管道

Sai Sandeep Damera, Maria Charitidou, Asim Zoulkarni, John S. Baras

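AI summary To address the privacy conflict in cloud-based multi-agent coordination, the paper proposes an end-to-end encrypted control pipeline in which every stage operates on CKKS-encrypted data using only addition, multiplication, and cyclic rotation. It uses steady-state Kalman gains, applies graph Laplacians via the diagonal method, derives a periodic bootstrapping bound to quantify the effect of encryption noise, and validates the pipeline on formation control.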

Comments 8 pages, 4 figures. This work has been submitted to the IEEE for possible publication

详情
AI中文摘要

基于云的多智能体系统协同需要与中央服务器共享状态,这在协同与隐私之间产生了冲突。全同态加密(FHE)原则上解决了这一问题,但其严格的算术约束要求控制循环的每个阶段都从头重新设计。我们提出了一种端到端加密控制管道,其中感知、状态估计、状态传播和共识控制均在CKKS加密数据上仅使用加法、乘法和循环旋转操作。为了克服FHE的计算挑战,我们采用稳态卡尔曼增益而非在线求解矩阵,并通过对角线法以与非零循环对角线数量成比例的成本应用图拉普拉斯,从而在统一框架内适应环、环面和完全图拓扑。为了量化加密噪声的累积效应,我们利用分离定理解耦控制器和观测器误差动态,并推导出周期性自举界,其中CKKS自举作为脉冲扰动;由此产生的稳态误差球取决于自举精度和闭环谱半径,为隐私-精度权衡提供了直接的设计方程。该管道在多智能体编队控制场景中得到验证,确认了加密下闭环运行的稳定性及有界跟踪误差。

英文摘要

Cloud-based coordination of multi-agent systems requires sharing state with a central server, creating a conflict between coordination and privacy. Fully homomorphic encryption (FHE) resolves this in principle, but its severe arithmetic constraints demand that every stage of the control loop be redesigned from first principles. We present an end-to-end encrypted control pipeline in which sensing, state estimation, state propagation, and consensus control all operate on CKKS-encrypted data using only addition, multiplication, and cyclic rotation. To overcome the computational challenges of FHE, we employ steady-state Kalman gains instead of solving for the matrices online, and we apply graph Laplacians via the diagonal method at a cost proportional to the number of nonzero cyclic diagonals, accommodating ring, torus, and complete-graph topologies within a unified framework. To quantify the cumulative effect of encryption noise, we use the separation principle to decouple controller and observer error dynamics and derive a periodic bootstrapping bound in which CKKS bootstrapping acts as an impulsive disturbance; the resulting steady-state error ball depends on the bootstrapping precision and the closed-loop spectral radius, providing a direct design equation for the privacy-accuracy tradeoff. The pipeline is validated on a multi-agent formation control scenario, confirming stable closed-loop operation under encryption with bounded tracking error.
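The diagonal method mentioned above can be illustrated on plaintext data. The sketch below is a plain-Python stand-in, not an encrypted implementation: list rotation plays the role of the homomorphic cyclic rotation, and element-wise multiply and add play the role of the ciphertext operations. All function names are hypothetical, and the ring Laplacian is only one of the topologies the abstract lists.

```python
def rotate(vec, d):
    """Cyclic left rotation: result[i] = vec[(i + d) % n]."""
    d %= len(vec)
    return vec[d:] + vec[:d]


def cyclic_diagonals(matrix):
    """Map offset d to the diagonal [matrix[i][(i + d) % n]], skipping all-zero ones."""
    n = len(matrix)
    diags = {}
    for d in range(n):
        diag = [matrix[i][(i + d) % n] for i in range(n)]
        if any(diag):
            diags[d] = diag
    return diags


def diagonal_matvec(diags, x):
    """Compute M @ x with one rotation, one multiply and one add per nonzero diagonal."""
    y = [0.0] * len(x)
    for d, diag in diags.items():
        rotated = rotate(x, d)  # the only data movement required
        y = [yi + di * ri for yi, di, ri in zip(y, diag, rotated)]
    return y


def ring_laplacian(n):
    """Graph Laplacian of an n-node ring (each node linked to its two neighbours)."""
    lap = [[0.0] * n for _ in range(n)]
    for i in range(n):
        lap[i][i] = 2.0
        lap[i][(i + 1) % n] = -1.0
        lap[i][(i - 1) % n] = -1.0
    return lap


LAPLACIAN = ring_laplacian(6)
DIAGS = cyclic_diagonals(LAPLACIAN)
STATE = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(sorted(DIAGS))                   # [0, 1, 5]
print(diagonal_matvec(DIAGS, STATE))   # [-6.0, 0.0, 0.0, 0.0, 0.0, 6.0]
```

For the ring, only three cyclic diagonals are nonzero regardless of the number of agents, which is why the cost scales with the diagonal count rather than with the matrix size.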

2606.07363 2026-06-08 cs.CR cs.SE 新提交

On the Shoulders of Giants: Empowering Automated Smart Contract Auditing via the GiAnt Corpus

站在巨人的肩膀上:通过GiAnt语料库赋能自动化智能合约审计

Xiaoting Zhang, Zhipeng Gao, Yiran Lv, Xing Hu, Feifei Niu, Xin Xia

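AI summary Proposes GiANT, an automated framework that extracts vulnerability information from real-world audit reports to build the high-quality GiAnt Corpus of 7,711 vulnerability findings, and validates its practicality on tasks such as vulnerability detection.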

详情
AI中文摘要

高质量的智能合约审计数据集对于评估安全工具和推进智能合约安全研究至关重要。现有数据集的两个主要局限是手动引发的可扩展性瓶颈以及数据粒度和多样性的不足。为解决这些局限,我们提出GiANT,一个自动化框架,通过从真实世界审计报告中提炼漏洞见解来策划智能合约审计数据集。GiANT采用分治策略结合思维链技术从Code4rena报告中提取结构化漏洞信息,随后通过LLM-as-a-judge机制进行严格的质量保证。为评估GiANT的有效性,我们在388份真实审计报告上运行它,生成了包含跨五个严重级别的7711个漏洞发现的GiAnt语料库。数据集的人工评估显示出卓越的信息提取可靠性,平均质量得分为$4.76\pm0.37$(满分5分),评分者间一致性$\kappa$为0.88。我们进一步通过在漏洞检测、代码摘要、缓解建议和自动gas优化任务上对4个最先进的LLM进行基准测试,验证了数据集的实用性,建立了性能基线,从而为自动化智能合约审计的未来研究提供了宝贵的数据基础。

英文摘要

High-quality smart contract auditing datasets are crucial for evaluating security tools and advancing smart contract security research. Two major limitations of existing datasets are the scalability bottleneck induced by manual effort and the deficiency in data granularity and diversity. To address these limitations, we propose GiANT, an automated framework designed to curate smart contract auditing datasets by distilling vulnerability insights from real-world auditing reports. GiANT employs a divide-and-conquer strategy coupled with the Chain-of-Thought technique to extract structured vulnerability information from Code4rena reports, followed by an LLM-as-a-judge mechanism to perform rigorous quality assurance. To evaluate GiANT's effectiveness, we run it on 388 real-world audit reports and generate the GiAnt Corpus comprising 7,711 vulnerability findings across five severity levels. Manual assessment of the dataset demonstrates exceptional reliability in information extraction, achieving a mean quality score of $4.76\pm0.37$ (out of 5) with inter-rater agreement $\kappa$ of 0.88. We further validate the practicality of our dataset by benchmarking four state-of-the-art LLMs on vulnerability detection, code summarization, mitigation recommendation, and automated gas optimization tasks to establish performance baselines, thereby providing a valuable data foundation for future research in automated smart contract auditing.

2606.07348 2026-06-08 cs.LO math.LO 新提交

Four intuitionistic modal connectives

四个直觉主义模态连接词

Philippe Balbiani, Çigdem Gencer

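AI summary Introduces intuitionistic modal logics based on diamond connectives in the styles of Prenosil and Wijesekera together with their dual box connectives, analyzes the modal definability and complete axiomatization of classes of frames, and proves the decidability of the minimal intuitionistic modal logic.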

详情
AI中文摘要

我们介绍了基于Prenosil风格的菱形连接词、其对偶方框连接词、Wijesekera风格的菱形连接词及其对偶方框连接词的直觉主义模态逻辑的语法和语义。我们分析了一些基本框架类的模态可定义性。我们研究了由这些框架类确定的有效公式集合的完全公理化。我们证明了由所有框架类确定的最小直觉主义模态逻辑的可判定性。

英文摘要

We introduce the syntax and the semantics of intuitionistic modal logics based on a diamond connective à la Prenosil, its dual box connective, a diamond connective à la Wijesekera and its dual box connective. We analyze the modal definability of some elementary classes of frames. We study the complete axiomatizability of the sets of valid formulas determined by these classes of frames. We prove the decidability of the minimal intuitionistic modal logic determined by the class of all frames.

2606.07341 2026-06-08 cs.CR 新提交

Empirical Evaluation of Large Language Models for Migration of Code Fragments to Post-Quantum Cryptography

大型语言模型在代码片段向后量子密码迁移中的实证评估

Javier Pallarés de Bonrostro, Ana I. González-Tablas, María Isabel González Vasco

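AI summary Evaluates the ability of large language models to migrate classical cryptographic code fragments to post-quantum cryptography; a fine-tuned GPT-4.1-mini reaches a functional correctness rate of 92.5%, outperforming the zero-shot baseline.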

详情
AI中文摘要

向后量子密码(PQC)的过渡不仅需要替换易受攻击的密码原语,还需要重构周围的软件逻辑。虽然现有的PQC迁移框架提供了组织层面的指导,但实际的代码级修复仍然主要依赖人工且容易出错。本文评估了大型语言模型(LLM)是否可以被训练来协助将前量子密码代码片段迁移到后量子对应物,同时保持功能正确性。为此,我们引入了一个可重复的实验框架,该框架基于一个包含800个配对Python代码片段的合成数据集,涵盖六个密码家族和组合多原语案例。每个配对通过类别特定的功能测试进行验证,从而实现了数据集质量控制和模型生成迁移的客观评估。评估了四个模型:零样本设置下的GPT-4.1,以及微调版本的GPT-3.5-turbo、GPT-4.1-mini和CodeLlama-7B-Instruct。结果表明,领域特定的微调对于可靠的密码迁移至关重要。微调后的GPT-4.1-mini模型实现了最佳整体性能,平均静态相似度为0.9072,动态功能正确率为92.5%,显著优于零样本基线。对六个开源仓库的补充验证进一步表明,该方法可以在局部密码模块中产生有用的迁移,同时也揭示了在具有复杂依赖关系和跨模块交互的大型项目中的局限性。这些发现表明,微调后的LLM可以作为未来密码敏捷迁移管道中的实用组件,前提是它们与自动化验证和依赖感知验证相结合。

英文摘要

The transition to post-quantum cryptography (PQC) requires not only replacing vulnerable cryptographic primitives, but also refactoring the surrounding software logic. While existing PQC migration frameworks provide organizational guidance, practical code-level remediation remains largely manual and error-prone. This paper evaluates whether large language models (LLMs) can be trained to assist in the migration of pre-quantum cryptographic code fragments to post-quantum counterparts while preserving functional correctness. To this end, we introduce a reproducible experimental framework built around a synthetic dataset of 800 paired Python code fragments covering six cryptographic families and combined multi-primitive cases. Each pair is validated through category-specific functional tests, enabling both dataset quality control and objective evaluation of model-generated migrations. Four models are assessed: GPT-4.1 in a zero-shot setting, and fine-tuned versions of GPT-3.5-turbo, GPT-4.1-mini, and CodeLlama-7B-Instruct. The results show that domain-specific fine-tuning is essential for reliable cryptographic migration. The fine-tuned GPT-4.1-mini model achieves the best overall performance, with a mean static similarity of 0.9072 and a dynamic functional correctness rate of 92.5%, substantially outperforming the zero-shot baseline. A complementary validation on six open-source repositories further shows that the approach can produce useful migrations in localized cryptographic modules, while also revealing limitations in larger projects with complex dependencies and cross-module interactions. These findings suggest that fine-tuned LLMs can serve as practical components in future crypto-agile migration pipelines, provided they are coupled with automated verification and dependency-aware validation.

2606.07337 2026-06-08 cs.GR 新提交

Skeletal-Anchored Dual Harmonics for Structured 3D Modeling

骨骼锚定双谐波用于结构化三维建模

Zhentao Huang, Changhao Li, Ruizhen Hu, Hui Huang, Minglun Gong

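AI summary Proposes the Skeletal-Anchored Dual Harmonics (SADH) representation, which uses surface patches on internal anchors and dual-channel spherical harmonics to jointly optimize surface geometry and meso-skeletal structure, yielding compact and coherent 3D shape modeling.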

Comments 11 pages

详情
AI中文摘要

我们提出骨骼锚定双谐波(SADH),一种新颖的三维形状表示,它将局部表面几何与内部中轴骨骼组织紧密耦合。SADH 将形状表示为附着在内部锚点上的紧凑表面补丁集合,这些锚点直接在物体体积内部优化。每个补丁使用双通道球谐函数(SH)公式参数化,其中一个通道建模局部径向几何,另一个通过广义视锥定义自适应补丁支持。与各向同性基元(如中轴球体或高斯核)不同,SH 补丁直接编码各向异性的局部表面几何以及自适应空间支持,从而能够紧凑表示细节丰富且方向变化的表面区域。从无组织点云开始,SADH 通过分阶段优化过程联合优化表面几何、锚点位置、补丁方向和结构连通性,逐步形成连贯的中轴骨骼结构。测地锚点图进一步保持相邻补丁之间的结构关系。在复杂三维形状上的实验表明,SADH 在广泛几何形状上实现了精确的表面重建以及紧凑且连贯的骨骼组织。

英文摘要

We present Skeletal-Anchored Dual Harmonics (SADH), a novel 3D shape representation that tightly couples local surface geometry with internal meso-skeletal organization. SADH represents a shape as a collection of compact surface patches rooted on internal anchors optimized directly inside the object volume. Each patch is parameterized using a dual-channel spherical harmonic (SH) formulation, where one channel models local radial geometry while the other defines adaptive patch support through a generalized viewing cone. Unlike isotropic primitives such as medial spheres or Gaussian kernels, SH patches directly encode anisotropic local surface geometry together with adaptive spatial support, enabling compact representation of detailed and directionally varying surface regions. Starting from unorganized point clouds, SADH jointly optimizes surface geometry, anchor locations, patch orientations, and structural connectivity through a staged optimization process that progressively forms a coherent meso-skeletal structure. A geodesic anchor graph further preserves structural relationships between neighboring patches. Experiments on complex 3D shapes demonstrate that SADH achieves accurate surface reconstruction together with compact and coherent skeletal organization across a wide range of geometries.

2606.07335 2026-06-08 cs.CR 新提交

Defending Jailbreak Attacks on Large Language Models via Manifold Trajectory Kinetics

通过流形轨迹动力学防御大语言模型的越狱攻击

Hangtao Zhang, Yucheng Zhao, Sishun Liu, Ziqi Zhou, Zeyu Ye, Wei Wan, Minghui Li, Shengshan Hu, Yanjun Zhang, Yi Liu, Leo Yu Zhang

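AI summary Proposes Manifold Trajectory Kinetics (MTK), which detects jailbreak attacks by analyzing how a prompt's neighborhood structure evolves across model layers, and remains robust under both pseudo-malicious prompts and adaptive attacks.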

Comments Accepted to USENIX Security '26 Cycle 2. Code is available at https://github.com/Rookie143/mtk

详情
AI中文摘要

越狱提示可以绕过大型语言模型(LLM)中的对齐护栏并引发不安全输出,使得可靠的部署时检测至关重要。先前的检测方法主要依赖于固定的度量空间,例如原始输入、梯度或隐藏特征,其中良性提示和越狱提示是线性可分的。我们证明这一假设在以下情况下失效:(i)伪恶意提示,其意图良性但包含安全相关关键词,以及(ii)明确针对部署检测器进行优化的自适应攻击。为克服这一限制,我们将关注点从识别通用度量空间转向分析底层数据流形更鲁棒的邻域结构。我们提出流形轨迹动力学(MTK),将LLM视为一个将输入转换为输出的动力学系统,并通过跟踪提示的邻域结构在层间的演化来检测越狱。良性提示在整个推理过程中保持接近良性邻域,而越狱提示则表现出特征轨迹:从接近恶意种子开始,随后策略性地向良性邻域移动以逃避拒绝。在四个LLM和十种越狱攻击下,MTK对两种失败模式均表现出强鲁棒性:在伪恶意提示上,它在良性提示上达到5%的假阳性率,在伪恶意提示上达到2%的假阳性率,同时越狱真阳性率为95%;在自适应攻击下,它保持85%的真阳性率。我们进一步展示了MTK在视觉语言模型中进行越狱检测的优越性能。我们的代码可在以下网址获取:https://github.com/Rookie143/mtk。

英文摘要

Jailbreak prompts can bypass alignment guardrails in large language models (LLMs) and elicit unsafe outputs, making reliable deployment-time detection critical. Prior detection approaches largely rely on a fixed metric space, e.g., raw inputs, gradients, or hidden features, in which benign and jailbreak prompts are linearly separable. We show this assumption breaks under (i) pseudo-malicious prompts that are benign by intent but contain safety-related keywords, and (ii) adaptive attacks that explicitly optimize against the deployed detector. To overcome this limitation, we shift our focus from identifying a universal metric space to analyzing the more robust neighborhood structure of the underlying data manifold. We present Manifold Trajectory Kinetics (MTK), which treats an LLM as a kinetic system transforming inputs into outputs and detects jailbreaks by tracking how a prompt's neighborhood structure evolves across layers. Benign prompts remain close to benign neighborhoods throughout inference, whereas jailbreak prompts exhibit a characteristic trajectory that begins near malicious seeds and later strategically shifts toward benign neighborhoods to evade refusal. Across four LLMs and ten jailbreak attacks, MTK achieves strong robustness to both failure modes: on pseudo-malicious prompts, it attains a jailbreak true positive rate of 95% at a false positive rate of 5% on benign prompts and 2% on pseudo-malicious prompts, and under adaptive attacks, it maintains a true positive rate of 85%. We further demonstrate the superior performance of MTK for jailbreak detection in vision-language models. Our code is available at https://github.com/Rookie143/mtk.
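The idea of tracking a prompt's neighborhood across layers can be pictured with a toy sketch. Everything below is an assumption made for illustration and is not taken from the MTK code: per-layer embeddings are tiny hand-written 2-D points, the neighborhood statistic is the benign fraction among the k nearest labelled reference prompts, and the decision rule and its thresholds (low, high) are invented.

```python
import math


def knn_benign_fraction(query, refs, k):
    """Fraction of the k nearest reference points (Euclidean) that are benign."""
    ranked = sorted(refs, key=lambda ref: math.dist(query, ref[0]))
    return sum(1 for _, is_benign in ranked[:k] if is_benign) / k


def neighborhood_trajectory(query_by_layer, refs_by_layer, k=3):
    """One neighborhood statistic per layer, ordered from first to last layer."""
    return [
        knn_benign_fraction(query, refs, k)
        for query, refs in zip(query_by_layer, refs_by_layer)
    ]


def starts_malicious_ends_benign(trajectory, low=0.34, high=0.66):
    """Hypothetical rule: flag a shift from malicious to benign neighborhoods."""
    return trajectory[0] <= low and trajectory[-1] >= high


# Toy reference set reused for both "layers": (embedding, is_benign)
REFS = [
    ((0.0, 0.0), True), ((0.0, 1.0), True), ((1.0, 0.0), True),
    ((5.0, 5.0), False), ((5.0, 6.0), False), ((6.0, 5.0), False),
]
REFS_BY_LAYER = [REFS, REFS]

JAILBREAK_LIKE = [(5.0, 5.5), (0.2, 0.2)]  # near malicious seeds, then near benign
BENIGN_LIKE = [(0.1, 0.1), (0.3, 0.2)]     # near benign throughout

print(neighborhood_trajectory(JAILBREAK_LIKE, REFS_BY_LAYER))  # [0.0, 1.0]
print(neighborhood_trajectory(BENIGN_LIKE, REFS_BY_LAYER))     # [1.0, 1.0]
```

A real detector would use hidden states from the model and a learned decision rule; the sketch only shows why a trajectory carries information that a single layer's features do not.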

2606.07332 2026-06-08 cs.DL 新提交

The disruption index does not measure scientific innovation

颠覆性指数不能衡量科学创新

Julien Larregue, Yves Gingras

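AI summary Challenges the disruption index used in a paper published in Science, arguing that the index rests on intuition rather than rigorous validation and does not validly measure scientific innovation, and warns of the risks of using it for policy-making.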

Comments 16 pages, 5 figures

详情
AI中文摘要

一篇最近发表在《科学》杂志政策文章栏目下的论文认为,作者所称的科学颠覆性随学术年龄下降,并且这种下降与老年学者缺乏强制退休有关。自发表以来,其关于强制退休的结论和政策建议引起了媒体的广泛关注。因此,值得仔细审视所提出的颠覆性度量,因为所有分析和结论都基于该指数获得的结果,从而将其视为有效。我们讨论的问题并非该文章特有,在许多使用文献计量数据的论文中都能找到,这些论文基于常识直觉提出新指数,然后将其作为黑箱工具来衡量质量、创新或现在的颠覆性,以创建排名并根据计算出的指数值制定政策行动。

英文摘要

A paper recently published in Science under the rubric of Policy Article argued that what the authors call scientific disruption declines with academic age, and that this decline is related to the absence of mandatory retirement for older academics. Since its publication, its conclusions and policy suggestions in relation to mandatory retirement have received considerable media attention. Thus, it is worth taking a closer look at the proposed measure of disruption, since all the analysis and conclusions are based on the results obtained from this index, which is thereby taken as valid. The issues we address are not specific to this article: they can be found in many papers using bibliometric data that propose a new index on the basis of common-sense intuition and then use it as a black-boxed instrument to measure quality, innovation or, now, disruption, in order to create rankings and formulate policy actions on the basis of the calculated values of the index.

2606.07314 2026-06-08 cs.SE cs.ET quant-ph 新提交

QBugLM: An Agentic Benchmarking Framework for LLM-based Quantum Software Debugging

QBugLM:基于LLM的量子软件调试的智能基准测试框架

An B. B. Pham, Hoa T. Nguyen, Muhammad Usman

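AI summary Proposes QBugLM, a multi-agent framework that automates the quantum software debugging pipeline, and evaluates LLM debugging ability through a case study, finding that iterative feedback substantially raises repair success rates.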

Comments This paper was accepted at IEEE QSW 2026

详情
AI中文摘要

量子软件缺陷通常产生静默的错误输出而非显式错误,这使得它们难以用传统技术检测和修复。尽管大型语言模型(LLM)在经典软件工程任务中表现出色,但其调试量子代码的能力仍未被充分探索。为填补这一空白,我们提出QBugLM,一个多智能体框架,自动化量子软件调试流程,从基于分类学的缺陷注入到基于LLM的检测和修复,最终到基于模拟的验证,适用于框架无关的OpenQASM 3.0程序。我们进一步使用QBugLM进行全面的案例研究,评估两个LLM(Claude 4.6 Sonnet和Qwen3 Coder Next)在不同提示策略、缺陷类别和量子程序上的表现。结果表明,迭代反馈至关重要,单次重试将Pass@1从低于25%提升至超过80%。此外,在固定资源约束下,对于具备推理能力的模型,更简单的结构化提示甚至优于思维链和ReAct。我们的工作迈出了基准测试LLM调试量子程序能力的第一步,并为未来自动化量子软件修复提供了实用见解。

英文摘要

Quantum software bugs often yield silent, incorrect outputs rather than explicit errors, making them particularly difficult to detect and repair with conventional techniques. Although large language models (LLMs) have shown strong performance on classical software engineering tasks, their ability to debug quantum code remains largely unexplored. To bridge this gap, we propose QBugLM, a multi-agent framework that automates the quantum software debugging pipeline, from taxonomy-driven bug injection to LLM-based detection and repair, and finally to simulation-based validation, for framework-agnostic OpenQASM 3.0 programs. We further conduct a comprehensive case study using QBugLM to benchmark two LLMs, Claude 4.6 Sonnet and Qwen3 Coder Next, across different prompting strategies, bug categories, and quantum programs. Our results show that iterative feedback is critical, as a single retry raises Pass@1 from below 25% to above 80%. Moreover, simpler structured prompting can even outperform Chain-of-Thought and ReAct for reasoning-capable models under fixed-resource constraints. Our work takes initial steps toward benchmarking LLM capabilities for debugging quantum programs and offers practical insights to support future efforts in automated quantum software repair.

2606.07285 2026-06-08 cs.GT 新提交

Improved Lower Bounds for Proportionally Fair Clustering

比例公平聚类的下界改进

Benjamin Cookson, Eva Deltl, Yeeseok Oh

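AI summary Studies the existence of the α-core in proportionally fair clustering, constructing an instance that raises the lower bound for α-core non-emptiness from 2 to 2.1508, and uses MILP among other techniques to pin down exact thresholds when there are few candidate centers.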

详情
AI中文摘要

我们研究比例公平聚类,其中必须从度量空间中选择一组$k$个中心来代表$n$个智能体,并且没有足够大的智能体群体应被集体低估。该设置中公平性的核心概念之一是$\alpha$-core。Chen等人[2019]证明了$(1+\sqrt{2})$-core中聚类的存在性,他们也展示了对于每个$\alpha < 2$,$\alpha$-core为空的实例。缩小这一差距七年来一直是一个开放问题。我们从下界方面取得进展,提供了一个实例,其$\alpha$-core对于每个$\alpha < 2.1508$为空。我们的技术依赖于建立core的变体(即Hare core和Droop core)之间的联系;将最优空core实例的搜索简化为高度结构化的聚类实例族;以及使用混合整数线性规划(MILP)在这个缩减空间中搜索最优下界实例。使用这个框架,我们还确定了具有少量可能候选中心且仅需选择一个中心的Droop配额聚类实例的紧界。对于每个中心数$m \in \{3,4,5,6\}$,我们给出了精确阈值$\alpha_m^*$,使得$\alpha_m^*$-core聚类总是存在,而对于每个$\alpha < \alpha_m^*$,存在一个具有$m$个中心的实例其$\alpha$-core为空。尽管这些值最初是通过计算机辅助搜索找到的,我们也提供了不依赖MILP证书的直接证明。

英文摘要

We study proportionally fair clustering, where a set of $k$ centers must be chosen from a metric space to represent $n$ agents, and no sufficiently large group of agents should be collectively underrepresented. One of the central notions of fairness in this setting is the $\alpha$-core. The existence of clusterings in the $(1+\sqrt{2})$-core was established by Chen et al. [2019], who also showed instances where the $\alpha$-core is empty for every $\alpha < 2$. Closing this gap has remained an open problem for seven years. We make progress from the lower-bound side by providing an instance whose $\alpha$-core is empty for every $\alpha < 2.1508$. Our techniques rely on establishing connections between variants of the core, namely the Hare core and the Droop core; reducing the search for optimal empty-core instances to a highly structured family of clustering instances; and using a Mixed Integer Linear Program (MILP) to search for optimal lower-bound instances within this reduced space. Using this framework, we also determine tight bounds for Droop quota clustering instances with a small number of possible candidate centers and a single center to be selected. For each number of centers $m \in \{3,4,5,6\}$, we give the exact threshold $\alpha_m^*$ such that an $\alpha_m^*$-core clustering always exists, while for every $\alpha < \alpha_m^*$ there is an instance with $m$ centers whose $\alpha$-core is empty. Although these values were originally found through computer-aided search, we also provide direct proofs that do not rely on MILP certificates.
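To make the notion concrete, here is a small Python checker. It assumes that the $\alpha$-core coincides with the proportionality condition usually attributed to Chen et al.: a set of $k$ chosen centers is blocked if some candidate center $y$ and at least $\lceil n/k \rceil$ agents all satisfy $\alpha \cdot d(i, y) < d(i, X)$, where $d(i, X)$ is the distance from agent $i$ to its nearest chosen center. The abstract does not spell this definition out, so treat the quota and the inequality as assumptions; the Droop variant would use a different quota that is not reproduced here. Function names and the toy instance are hypothetical.

```python
import math


def blocking_center(agents, candidates, chosen, alpha, dist):
    """Return a candidate that a large enough group prefers by a factor alpha, else None."""
    quota = math.ceil(len(agents) / len(chosen))  # assumed Hare-style quota
    for y in candidates:
        # Agents whose distance would shrink by more than a factor alpha at y.
        deviators = [
            agent for agent in agents
            if alpha * dist(agent, y) < min(dist(agent, x) for x in chosen)
        ]
        if len(deviators) >= quota:
            return y
    return None


def in_alpha_core(agents, candidates, chosen, alpha, dist):
    """True if no candidate center blocks the chosen clustering."""
    return blocking_center(agents, candidates, chosen, alpha, dist) is None


def line_distance(p, q):
    """Metric on the real line, used only for the toy instance."""
    return abs(p - q)


AGENTS = [0.0, 0.0, 10.0, 10.0]   # two agents at 0, two at 10
CANDIDATES = [0.0, 1.0, 10.0]

print(in_alpha_core(AGENTS, CANDIDATES, [0.0, 10.0], 2.0, line_distance))   # True
print(blocking_center(AGENTS, CANDIDATES, [0.0, 1.0], 2.0, line_distance))  # 10.0
```

Choosing centers 0 and 1 leaves the two agents at 10 far from any center, and they form a group of size $n/k = 2$ that would deviate to the candidate at 10 for any $\alpha$.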

2606.07283 2026-06-08 cs.HC 新提交

A Model of Integrated Information Processing in Human-AI Interaction

人机交互中的集成信息处理模型

Tim Schrills, Thomas Franke

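AI summary Proposes the Integrated Information Processing (IIP) model, which treats humans and AI as coupled control loops and links psychological mechanisms to interface design through three information-processing qualities (input adequacy, reference consonance, output operativity), guiding the design and evaluation of human-AI coupling.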

Comments 22 pages

详情
AI中文摘要

为了推动人机交互(HAII)研究的发展,需要将心理机制与界面设计联系起来的理论工作。这类工作应扩展而非取代现有的HCI和自动化研究,适应AI系统日益增强的自主性和能动性。基于先前关注人机交互中角色和层级的框架,从心理学视角仍存在空白:一个以任务为中心、面向过程的描述,将行动调节机制与人机耦合的具体设计和评估杠杆联系起来,并使用统一的人机词汇表达。此外,现有模型可能描述了系统如何设计(例如自动化中的功能分配),但未能展示这种设计如何影响人类行为。我们提出了集成信息处理(IIP)模型,这是一个以任务为中心的控制论模型,将人类、机器及其联合活动概念化为耦合控制回路。IIP模型使用统一建模语言描述人类和智能体,使行动调节的心理模型可用于AI系统设计。作为核心特征,我们认为共享任务中的效能由三种集成质量表征:输入充分性、参考一致性和输出可操作性,它们关键地影响以人为中心的基准,如透明度和可控性。该模型将界面选择(例如XAI技术)映射到理论驱动的用户行为期望,指导界面设计和评估。为此,我们提出:(1) 一个保持连续性的理论论述,将HAII扩展到AI的能动性;(2) 具有三种信息处理质量的IIP模型;(3) IIP模型在示例用例中的应用,展示对界面设计的启示。

英文摘要

For Human-AI Interaction (HAII) research to move forward, theoretical work linking psychological mechanisms to interface design is needed. Such work should extend rather than replace established HCI and automation research, adapting to the increasing autonomy and agency of AI systems. Building on prior frameworks focused on roles and levels in human interaction with automation, a gap remains from a psychological view: a task-centered, process-oriented account that links mechanisms of action regulation to concrete design and evaluation levers for human-AI coupling, expressed in a unified vocabulary for human and machine. Moreover, existing models may describe how a system is designed (e.g., function allocation in automation) but fall short in showing how this design affects human behavior. We present the Integrated Information Processing (IIP) model, a task-centered, cybernetic model that conceptualizes humans, machines, and their joint activity as coupled control loops. The IIP model uses a unified modeling language for human and artificial agents, making psychological models of action regulation accessible for AI system design. As a core feature, we argue that efficacy within a shared task is characterized by three integration qualities, input adequacy, reference consonance, and output operativity, which critically influence benchmarks of human-centeredness such as transparency and controllability. The model maps interface choices (e.g., XAI techniques) to theory-driven expectations of user behavior, guiding interface design and evaluation. To this end, we present (1) a continuity-preserving theoretical discourse that extends HAII to agency in AI; (2) the IIP model with three information-processing qualities; and (3) applications of the IIP model to exemplary use cases demonstrating implications for interface design.

2606.07282 2026-06-08 cs.CR cs.NI 新提交

Rethinking IoT Intrusion Detection: Augmenting Routing Metrics with Radio Features

重新思考物联网入侵检测:用无线电特征增强路由度量

Yichang Sun, Andreas Johnsson, Sourasekhar Banerjee

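AI summary For RPL-based IoT networks, proposes combining transmit and receive radio features with standard RPL features in an LSTM-based intrusion detection system, improving the F1-score by up to about 4% across three attack types.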

Comments 4 pages, 8 figures. Accepted to the Swedish National Computer Networking Workshop (SNCNW) 2026

详情
AI中文摘要

基于机器学习的入侵检测系统(IDS)用于基于RPL的物联网网络时,通常仅依赖路由层特征,这只能提供网络行为的部分视图。在这项工作中,我们研究了在基于LSTM的IDS中,将发送(TX)和接收(RX)无线电特征与标准RPL特征集相结合是否能提高检测性能。我们在三种不同的攻击类型(即DIS泛洪、本地修复和最差父节点)下,针对不同网络规模评估了所提出的方法。结果表明,与仅使用路由层特征相比,结合TX和RX特征使IDS的整体检测性能在F1分数上提高了约4%,其中在最差父节点攻击中观察到最显著的提升。

英文摘要

Machine learning-based intrusion detection systems (IDS) for RPL-based IoT networks often rely solely on routing-layer features, which provide only a partial view of network behaviour. In this work, we investigate whether incorporating Transmit (TX) and Receive (RX) radio features alongside the standard RPL feature set can improve detection performance in an LSTM-based IDS. We evaluate the proposed approach across three different attack types, namely DIS-Flooding, Local Repair, and Worst Parent, under varying network sizes. The results show that incorporating TX and RX features improves the IDS's overall detection performance by up to about 4% in F1-score compared with using routing-layer features alone, with the most notable gain observed for the Worst Parent attack.

2606.07270 2026-06-08 cs.CY 新提交

Two-Phase Simulated Annealing for Equitable Team Formation: Eliminating Complaints in Large Engineering Cohorts

两阶段模拟退火算法实现公平团队组建:消除大型工程班级中的投诉

Yiwei Sun, Xinru Deng, Dimitrios G Papageorgiou

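AI summary Proposes a two-phase algorithm that first forms fixed triads via graph clustering and then pairs them using simulated annealing; across 238 students it achieves zero complaints, a GPA variance of 0.005, and 94.3% preference satisfaction.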

Comments 9 pages, 3 figures

详情
AI中文摘要

贡献:本文提出一种新颖的两阶段算法方法,将偏好满足与公平优化解耦,在不妥协的情况下同时实现学生团队组建的两个目标。该方法将模拟退火——一种核心材料科学技术——应用于教育挑战,展示了行政流程的教学整合。背景:在大型工程班级(100+学生)中组建有效团队需要平衡学生偏好、学术公平和人口多样性。现有工具要么忽略偏好而优化公平(CATME、Team-Anneal),要么在牺牲平衡的情况下容纳偏好(自选),导致投诉率在5-35%之间。预期成果:消除正式投诉,实现团队间GPA方差接近零,防止性别孤立,保持高偏好满意度,同时创建可扩展、可重复的解决方案,适用于工程课程。应用设计:第一阶段通过图论聚类形成固定三人组,最大化相互偏好,保留社会纽带。第二阶段采用模拟退火将三人组配对成六人团队,同时优化GPA方差、性别平衡和规模约束。这种分解反映了材料加工中的分层优化。结果:在238名学生中部署后,算法完全消除了正式投诉(对比>30%基线),实现了GPA方差0.005(对比历史均值9.74),消除了性别孤立的个体,并保持了94.3%的偏好满意度。针对82个历史分组实例(1538个团队,6个学年)的验证证实了相对于传统方法的显著改进。

英文摘要

Contribution: This paper presents a novel two-phase algorithmic approach that decouples preference satisfaction from fairness optimization in student team formation, achieving both objectives without compromise. The method applies simulated annealing, a core materials science technique, to an educational challenge, demonstrating pedagogical integration of administrative processes. Background: Forming effective teams in large engineering cohorts (100+ students) requires balancing student preferences, academic fairness, and demographic diversity. Existing tools either optimize for fairness while ignoring preferences (CATME, Team-Anneal) or accommodate preferences while compromising balance (self-selection), leaving complaint rates at 5-35%. Intended Outcomes: Eliminate formal complaints, achieve near-zero GPA variance between teams, prevent gender isolation, and maintain high preference satisfaction while creating a scalable, reproducible solution applicable across engineering programs. Application Design: Phase 1 forms fixed triads through graph-theoretic clustering that maximizes mutual preferences, preserving social bonds. Phase 2 employs simulated annealing to pair triads into teams of six while optimizing GPA variance, gender balance, and size constraints. This decomposition mirrors hierarchical optimization in materials processing. Findings: Deployed across 238 students, the algorithm eliminated formal complaints entirely (vs. a >30% baseline), achieved GPA variance of 0.005 (vs. a historical mean of 9.74), eliminated gender-isolated individuals, and maintained 94.3% preference satisfaction. Validation against 82 historical grouping instances (1,538 teams, 6 academic years) confirmed significant improvement over conventional methods.
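Phase 2 can be sketched as a small simulated-annealing loop. This is a simplified illustration under stated assumptions, not the authors' implementation: triads are taken as already formed, each triad is reduced to a hypothetical mean GPA, the only objective is the variance of team mean GPAs (gender balance and size constraints are omitted), and the cooling schedule, step count, and all names are invented.

```python
import math
import random
from statistics import pvariance


def team_gpa_variance(order, triad_gpa):
    """Variance of team means when consecutive triads in `order` form a team of six."""
    means = [
        (triad_gpa[order[i]] + triad_gpa[order[i + 1]]) / 2
        for i in range(0, len(order), 2)
    ]
    return pvariance(means)


def anneal_pairing(triad_gpa, steps=5000, t_start=1.0, cooling=0.999, seed=0):
    """Search for a low-variance pairing of triads; expects an even number of triads."""
    rng = random.Random(seed)
    order = list(range(len(triad_gpa)))
    cost = team_gpa_variance(order, triad_gpa)
    best_order, best_cost = order[:], cost
    temp = t_start
    for _ in range(steps):
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]  # propose: swap two triads
        new_cost = team_gpa_variance(order, triad_gpa)
        delta = new_cost - cost
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost = new_cost
            if cost < best_cost:
                best_order, best_cost = order[:], cost
        else:
            order[i], order[j] = order[j], order[i]  # reject: undo the swap
        temp *= cooling  # geometric cooling
    return best_order, best_cost


TRIAD_GPA = [2.0, 2.5, 3.0, 3.5, 4.0, 3.0, 2.2, 3.8]  # made-up triad mean GPAs
INITIAL_COST = team_gpa_variance(list(range(len(TRIAD_GPA))), TRIAD_GPA)
BEST_ORDER, BEST_COST = anneal_pairing(TRIAD_GPA)
print(INITIAL_COST, BEST_COST)
```

Keeping triads fixed means the annealer can only move whole triads, which is how the decomposition preserves the preference-based groups from Phase 1 while the second phase works on balance.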

2606.07258 2026-06-08 cs.CE q-bio.QM 新提交

CaliPPer: quantifying, predicting and improving AI model performance for binding prediction

CaliPPer:量化、预测和改进AI模型在结合预测中的性能

Jian-Qing Zheng, Hantao Lou, Zinan Yin, Sam Farrar, Yuze Zhou, Elie Antoun, Xiangxi Wang, Xuetao Cao, Tao Dong

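AI summary Proposes the CaliPPer framework, which uses a multi-chain Sample-to-Domain Distance and distance-aware Bayesian recalibration to quantify, predict, and improve AI model performance for binding prediction at three resolutions, significantly raising discovery rates on neoepitopes, antigen variants, and chemical scaffolds.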

详情
AI中文摘要

结合预测模型加速了治疗性抗体和TCR的发现,但其在新数据集上的性能不可预测,常导致低发现率。密度比方法(PAPE, M-CBPE)为二分类提供无标签性能估计,但其假设和仅聚合输出限制了在新表位、抗原变体和化学骨架上的结合预测。这里我们提出CaliPPer(性能校准与预测),一个事后框架,将多链样本到域距离(S2DD)与距离感知贝叶斯重校准配对,在三个分辨率上运行:泛化性分数、聚合性能预测和每个样本置信度。在十个模型、八个架构和两个免疫受体域上,CaliPPer达到了距离-性能相关性$|r|=0.80\text{--}0.92$,预测AUROC/AP/F1的平均绝对误差为$0.008\text{--}0.070$,并在未见表位/变体上将AUROC提升高达$+0.20$。回顾性地应用于五个已发表的TCR、BCR、MHC-肽和小分子研究,CaliPPer在所有五个研究中提高了真实发现率(例如,$0/5 \to 3/5$确认的新抗原),在计算预测和实验验证之间提供了一个分诊层。

英文摘要

Binding prediction models accelerate therapeutic antibody and TCR discovery, but their performance on new datasets is unpredictable, often leading to low discovery rates. Density-ratio methods (PAPE, M-CBPE) provide label-free performance estimation for binary classification, but their assumptions and aggregate-only outputs limit binding prediction on neoepitopes, antigen variants and chemical scaffolds. Here we present CaliPPer (Calibration and Prediction of Performance), a post-hoc framework pairing a multi-chain Sample-to-Domain Distance (S2DD) with distance-aware Bayesian recalibration, operating at three resolutions: generalisability score, aggregate performance prediction, and per-sample confidence. Across ten models, eight architectures and two immune-receptor domains, CaliPPer attains distance-performance correlations $|r|=0.80\text{--}0.92$, predicts AUROC/AP/F1 with mean absolute errors $0.008\text{--}0.070$, and improves AUROC by up to $+0.20$ on unseen epitopes/variants. Applied retrospectively to five published TCR, BCR, MHC-peptide and small-molecule studies, CaliPPer raises true discovery rates in all five (e.g., $0/5 \to 3/5$ confirmed neoantigens), providing a triage layer between computational prediction and experimental validation.

2606.07248 2026-06-08 cs.DC 新提交

Clairvoyant: Predictive SJF Scheduling to Mitigate Head-of-Line Blocking in Serial LLM Backends

Clairvoyant: 预测性SJF调度以缓解串行LLM后端中的队头阻塞

Aravind Sundaresan

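AI summary Proposes Clairvoyant, a lightweight sidecar proxy that predicts request length to enable SJF scheduling, mitigating head-of-line blocking in serial LLM backends and significantly reducing latency for short requests.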

Comments 17 pages, 3 figures, 8 tables. Code: https://github.com/Aravind0403/clairvoyant-scheduler

详情
AI中文摘要

串行LLM推理后端(如Ollama)在FCFS准入策略下逐个处理请求,导致高利用率混合工作负载下的队头阻塞(HOLB):短事实查询可能被长生成任务延迟数分钟。虽然云规模部署通过连续批处理(vLLM, Orca)缓解HOLB,但这些方案需要数十GB的VRAM来维护并发KV缓存——对于依赖串行请求分发的内存受限边缘和本地部署不可行。我们提出Clairvoyant,一个即插即用的侧车代理,适用于任何串行OpenAI兼容后端(如Ollama, llama.cpp)。Clairvoyant通过ONNX导出的XGBoost分类器,从19个轻量级词汇特征预测响应长度,每个请求延迟0.029毫秒(比典型生成时间低四个数量级)。由于准入调度依赖于相对顺序而非精确预测,系统优化排序保真度,在自然对话数据集上达到62-96%的分布内准确率和52-66%的跨分布准确率。我们发现,精心设计的指令数据集是长度预测的退化训练源:GPT施加的简洁约束将长类表示减少到示例的0.02%以下,使得自然对话日志成为唯一可行的训练源。在RTX 4090上的端到端GPU基准测试显示,在最大队列压力(100个并发请求)下短请求的P50延迟降低70-76%,在稳态泊松到达(ρ=0.74)下降低17%。Clairvoyant是开源的,无需修改推理后端。

英文摘要

Serial LLM inference backends, such as Ollama, process requests one at a time under FCFS admission, causing Head-of-Line Blocking (HOLB) under mixed workloads at high utilisation: short factual queries can be delayed by minutes behind long generation jobs. While cloud-scale deployments mitigate HOLB via continuous batching (vLLM, Orca), these solutions require tens of GB of VRAM for concurrent KV-caches, which is infeasible for memory-constrained edge and local deployments that rely on serial request dispatch. We present Clairvoyant, a drop-in sidecar proxy for any serial OpenAI-compatible backend (e.g., Ollama, llama.cpp). Clairvoyant predicts response length from 19 lightweight lexical features via an ONNX-exported XGBoost classifier, achieving 0.029 ms per-request latency (four orders of magnitude below typical generation time). Because admission scheduling depends on relative ordering rather than exact prediction, the system optimises ranking fidelity, achieving 62-96% in-distribution and 52-66% cross-distribution accuracy across natural conversation datasets. We find that curated instruction datasets are degenerate training sources for length prediction: GPT-imposed brevity constraints reduce Long-class representation to under 0.02% of examples, making natural conversation logs the only viable training source. End-to-end GPU benchmarks on an RTX 4090 show 70-76% P50 latency reduction for short requests under maximum queue pressure (100 concurrent requests) and 17% under steady-state Poisson arrivals ($\rho=0.74$). Clairvoyant is open-source and requires no modifications to the inference backend.
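The effect of reordering a serial queue can be seen in a toy simulation. The sketch below is not the Clairvoyant code: it assumes all requests are already queued at time zero, uses made-up predicted lengths and service times in place of the classifier's output, and compares FCFS with shortest-job-first ordering on a backend that serves one request at a time. All names are hypothetical.

```python
import heapq
from statistics import median


def serve_serially(requests, priority):
    """Serve queued requests one at a time; return completion time per request id.

    requests: list of (request_id, predicted_length, service_seconds)
    priority: function (arrival_index, request) -> sort key, lowest served first
    """
    heap = [(priority(index, req), index, req) for index, req in enumerate(requests)]
    heapq.heapify(heap)
    clock = 0.0
    completion = {}
    while heap:
        _, _, (request_id, _, service_seconds) = heapq.heappop(heap)
        clock += service_seconds  # serial backend: nothing runs concurrently
        completion[request_id] = clock
    return completion


def fcfs(index, request):
    """First come, first served: order by arrival index."""
    return index


def predicted_sjf(index, request):
    """Shortest job first, ranked by the predicted response length."""
    return request[1]


REQUESTS = [
    ("long1", 900, 60.0),
    ("short1", 20, 1.0),
    ("long2", 800, 50.0),
    ("short2", 30, 1.5),
    ("short3", 10, 0.5),
]
SHORT_IDS = ["short1", "short2", "short3"]

FCFS_DONE = serve_serially(REQUESTS, fcfs)
SJF_DONE = serve_serially(REQUESTS, predicted_sjf)
print(median(FCFS_DONE[r] for r in SHORT_IDS))  # 112.5
print(median(SJF_DONE[r] for r in SHORT_IDS))   # 1.5
```

Only the relative order of the predictions matters here: any predictor that ranks the short requests ahead of the long ones gives the same schedule, which matches the abstract's point about optimising ranking fidelity rather than exact length.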