arXivDaily: arXiv daily academic digest, updated Monday through Friday
2605.06682 2026-05-11 cs.AI cs.CY

Fast and Effective Redistricting Optimization via Composite-Move Tabu Search

Hai Jin, Diansheng Guo

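Affiliations: Department of Land Surveying and Geo-Informatics

AI summary: The paper proposes a composite-move Tabu search (CM-Tabu) that handles the contiguity constraint in redistricting optimization and improves solution quality, robustness, and efficiency.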

Abstract

Spatial redistricting is a practical combinatorial optimization problem that demands high-quality solutions, rapid turnaround, and flexibility to accommodate multi-criteria objectives and interactive refinement. A central challenge is the contiguity constraint: enforcing contiguity in integer-programming or heuristic search can severely shrink the feasible neighborhood, weaken exploration, and trap the search in poor local optima. We introduce a composite-move Tabu search (CM-Tabu) that systematically expands the feasible neighborhood space in Tabu search while preserving contiguity. When a boundary unit cannot be reassigned individually without disconnecting its district, our method identifies a minimal set of units that can move together, or a pair of units (or sets of units) that can be switched, as a contiguity-preserving composite move. Candidate single-unit and composite moves are generated in linear time by analyzing each district's contiguity graph using articulation points and biconnected components. Extensive experiments demonstrate that the proposed approach substantially improves solution quality, run-to-run robustness, and computational efficiency relative to traditional Tabu search and other baselines. For example, in the Philadelphia case, the approach can consistently attain the theoretical global optimum in population-equality and support multi-criteria trade-offs. CM-Tabu delivers optimization performance suitable for real-world practices and decision-support workflows.
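
A minimal Python sketch of the contiguity test the abstract describes, assuming a district is given as an adjacency list of its units. A unit that is an articulation point (cut vertex) of the district's contiguity graph cannot be reassigned alone without disconnecting the district, so it would need a composite move. The function name, the toy district, and the choice of Tarjan's depth-first search are illustrative assumptions, not the authors' implementation; the biconnected-component step that builds composite moves is not shown.

```python
from typing import Dict, List, Optional, Set


def articulation_points(adj: Dict[str, List[str]]) -> Set[str]:
    """Return the cut vertices of an undirected graph (linear-time DFS)."""
    disc: Dict[str, int] = {}  # discovery order of each unit
    low: Dict[str, int] = {}   # earliest discovery index reachable from the subtree
    cut: Set[str] = set()
    counter = 0

    def dfs(u: str, parent: Optional[str]) -> None:
        nonlocal counter
        disc[u] = low[u] = counter
        counter += 1
        children = 0
        for v in adj[u]:
            if v == parent:
                continue
            if v in disc:
                # Back edge: u can reach an earlier unit without its parent.
                low[u] = min(low[u], disc[v])
            else:
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                # The subtree under v cannot bypass u, so u is a cut vertex.
                if parent is not None and low[v] >= disc[u]:
                    cut.add(u)
        # The DFS root is a cut vertex only if it has several DFS children.
        if parent is None and children > 1:
            cut.add(u)

    for node in adj:
        if node not in disc:
            dfs(node, None)
    return cut


# Hypothetical district: unit "a" hangs off "b"; units b, c, d form a cycle.
district = {"a": ["b"], "b": ["a", "c", "d"], "c": ["b", "d"], "d": ["b", "c"]}
movable_alone = set(district) - articulation_points(district)
```

In this toy district only `b` is a cut vertex, so `a`, `c`, and `d` are candidates for single-unit moves, while `b` could only leave as part of a composite move.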

2605.06680 2026-05-11 cs.LG cs.CV physics.flu-dyn

On the Role of Strain and Vorticity in Numerical Integration Error for Flow Matching

Chenxi Tao, Seung-Kyum Choi

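Affiliations: Georgia Institute of Technology

AI summary: The paper analyzes how strain and vorticity affect integration error in flow matching, proving that strain controls exponential error amplification through the logarithmic norm while vorticity affects the local truncation error only linearly. It proposes weighted Jacobian regularization, and experiments confirm the theoretical predictions and improve integration accuracy.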

Comments 16 pages, 7 figures. Preliminary version. Includes qualitative CIFAR-10 comparison and supporting synthetic experiments

Abstract

Flow matching generates data by integrating a learned velocity field, where the number of integration steps (NFE) directly determines inference cost. We analyze which properties of the velocity field govern integration error by decomposing the velocity Jacobian into its symmetric part S (strain rate) and antisymmetric part Omega (vorticity). We prove that strain and vorticity play different roles: strain controls exponential error amplification through the logarithmic norm, while vorticity contributes only linearly to the local truncation error. We further show that the optimal transport velocity field is irrotational and has zero material derivative, implying second-order Euler accuracy; for exact displacement interpolation, the associated Lagrangian particle dynamics are integrated exactly by Euler. Motivated by this analysis, we study weighted Jacobian regularization with strain weight alpha and vorticity weight beta. Experiments on 2D synthetic data confirm the main theoretical predictions, showing up to 2.7x lower integration error at NFE=5. Preliminary CIFAR-10 experiments show consistent trends, with a lightweight fine-tuning procedure improving FID by 14 percent at NFE=10 while preserving high-NFE quality.
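
The decomposition in the abstract can be illustrated on a 2x2 velocity Jacobian. The sketch below splits J into its symmetric part S and antisymmetric part Omega, then evaluates the logarithmic norm under the assumption that the Euclidean norm is used, in which case it equals the largest eigenvalue of S. The function names and example matrices are hypothetical and are not taken from the paper.

```python
import math


def split_jacobian(J):
    """Split a 2x2 matrix into symmetric (strain) and antisymmetric (vorticity) parts."""
    (a, b), (c, d) = J
    S = [[a, (b + c) / 2], [(b + c) / 2, d]]
    W = [[0.0, (b - c) / 2], [(c - b) / 2, 0.0]]
    return S, W


def log_norm_2(J):
    """Euclidean logarithmic norm: the largest eigenvalue of the symmetric part."""
    S, _ = split_jacobian(J)
    p, q, r = S[0][0], S[0][1], S[1][1]
    # Closed-form largest eigenvalue of the symmetric matrix [[p, q], [q, r]].
    return (p + r) / 2 + math.hypot((p - r) / 2, q)


rotation = [[0.0, -1.0], [1.0, 0.0]]  # pure vorticity, zero strain
mixed = [[1.0, -2.0], [2.0, -1.0]]    # strain diag(1, -1) plus vorticity
```

Here the pure rotation has logarithmic norm 0 and the mixed field has logarithmic norm 1, even though its vorticity is twice as large as the rotation's. The antisymmetric part does not enter this quantity at all.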

2605.06679 2026-05-11 cs.LG

Breaking the Illusion: When Positive Meets Negative in Multimodal Decoding

Yubo Jiang, Yitong An, Xin Yang, Abudukelimu Wuerkaixi, Xuxin Cheng, Fengying Xie, Zhiguo Jiang, Cao Liu, Ke Zeng, Haopeng Zhang

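Affiliations: School of Astronautics, Beihang University; Longcat Interaction Team, Meituan; Tianmushan Laboratory, Beihang University

AI summary: The paper proposes the PND framework, which introduces contrasting positive and negative paths during decoding to strengthen visual fidelity and achieves state-of-the-art performance on POPE, MME, and CHAIR without retraining.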

Comments Accepted by CVPR 2026 (Conference on Computer Vision and Pattern Recognition). 11 pages, 5 figures. Code available at: https://github.com/JiangYubo4399/PND

Abstract

Vision-Language Models (VLMs) are frequently undermined by object hallucination, generating content that contradicts visual reality, due to an over-reliance on linguistic priors. We introduce Positive-and-Negative Decoding (PND), a training-free inference framework that intervenes directly in the decoding process to enforce visual fidelity. PND is motivated by our finding of an attention imbalance in VLMs, where visual features are under-weighted. Our framework introduces a dual-path contrast: a positive path that amplifies visual evidence and a negative path that constructs counterfactuals to penalize prior-dominant generation. By contrasting outputs from both paths during decoding, PND steers generation toward visually grounded results. Experiments on POPE, MME, and CHAIR demonstrate state-of-the-art performance without retraining.

2605.06678 2026-05-11 cs.LG q-fin.RM stat.AP

A Wasserstein GAN-based climate scenario generator for risk management and insurance: the case of soil subsidence

Antoine Heranval, Olivier Lopez, Didier Ngatcha, Daniel Nkameni

Affiliations: Biostatistiques et Processus Spatiaux (BioSP), INRAE, Avignon, France; CREST, CNRS, Ecole polytechnique, Groupe ENSAE-ENSAI, ENSAE Paris, Institut Polytechnique de Paris, Palaiseau, France; Fondation du Risque, Institut Louis Bachelier, Paris, France

AI summary: The paper proposes SwiGAN, a model based on conditional generative adversarial networks that generates future spatio-temporal trajectories of climatic indices, simulating drought propagation patterns of the Soil Wetness Index to support the design of adaptive risk management and insurance strategies.

Abstract

According to the United Nations Office for Disaster Risk Reduction (2025), the average annual cost of natural catastrophes increased from 70-80 billion USD between 1970 and 2000 to 180-200 billion USD between 2001 and 2020. Reports from organizations such as the IFOA and the WWF highlight the need for the insurance sector to adapt to this rapidly evolving context by developing medium- to long-term strategies that go beyond the one-year horizon of prudential regulations such as Solvency II. This paper introduces an artificial intelligence framework based on Conditional Generative Adversarial Networks (Conditional GANs) to generate future spatio-temporal trajectories of climatic indices. The approach focuses on the Soil Wetness Index (SWI), a key indicator used in France to assess drought severity. Drought accounts for approximately 30% of the indemnities paid under the French natural catastrophe insurance scheme. The proposed model, SwiGAN, simulates plausible drought propagation patterns up to 2050 for a region of France particularly exposed to this hazard. By generating realistic sequences of SWI maps, SwiGAN provides insights into drought dynamics under climate change scenarios and supports the design of adaptive risk management and insurance strategies. The methodology is also generalizable to other climate-related perils and actuarial applications such as economic scenario generation.

2605.06676 2026-05-11 cs.LG cs.CL

LKV: End-to-End Learning of Head-wise Budgets and Token Selection for LLM KV Cache Eviction

Enshuai Zhou, Yifan Hao, Chao Wang, Rui Zhang, Di Huang, Jiaming Guo, Xing Hu, Zidong Du, Qi Guo, Yunji Chen

Affiliations: University of Science and Technology of China; State Key Lab of Processors, Institute of Computing Technology, CAS, Beijing, China; University of Chinese Academy of Sciences, Beijing, China

AI summary: The paper proposes LKV, which formulates KV cache compression as an end-to-end differentiable optimization problem, learning task-optimized global budgets and intrinsic KV importance to improve long-context inference.

Abstract

Long-context inference in Large Language Models (LLMs) is bottlenecked by the linear growth of Key-Value (KV) cache memory. Existing KV cache compression paradigms are fundamentally limited by heuristics: heuristic budgeting relies on statistical priors rather than task objectives, causing resource misallocation, while heuristic selection relies on coupled query-key interactions or static inductive biases (e.g., attention sinks). To address this limitation, we introduce LKV (Learned KV Eviction), which formulates KV compression as an end-to-end differentiable optimization problem. LKV integrates LKV-H to learn task-optimized global budgets, and LKV-T to derive intrinsic KV importance without materializing attention matrices. This design bypasses heuristic proxies, strictly aligning compression with task objectives. Extensive evaluations demonstrate that LKV achieves state-of-the-art performance on both LongBench and RULER benchmarks at high compression rates. In particular, on LongBench, LKV achieves near-lossless performance with only 15% KV cache retention. Crucially, our analysis identifies learned budgeting as the dominant driver of fidelity, demonstrating that data-driven allocation is essential to overcome the limitations of hand-crafted heuristics.
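
The eviction step that head-wise budgets and token importance feed into can be pictured as a per-head top-k selection. The sketch below assumes the budgets and importance scores are already available as plain lists; in the paper they are learned by LKV-H and LKV-T respectively, and that learning is not reproduced here. The function name and the numbers are hypothetical.

```python
def evict_kv(scores, budgets):
    """Keep the highest-scoring tokens per head.

    scores[h][t] is the importance of token t in head h (assumed given);
    budgets[h] is how many tokens head h may retain (assumed given).
    Returns, for each head, the retained token positions in original order.
    """
    kept = []
    for head_scores, k in zip(scores, budgets):
        ranked = sorted(range(len(head_scores)),
                        key=lambda t: head_scores[t], reverse=True)
        kept.append(sorted(ranked[:k]))
    return kept


# Two heads over four cached tokens; head 1 receives a larger budget than head 0.
scores = [[0.9, 0.1, 0.5, 0.2],
          [0.3, 0.8, 0.1, 0.7]]
retained = evict_kv(scores, budgets=[1, 3])
```

With unequal budgets, head 0 keeps a single token while head 1 keeps three, which is the kind of non-uniform allocation that a fixed per-head heuristic would not produce.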

2605.06675 2026-05-11 cs.LG cs.CL cs.IT math.IT

RateQuant: Optimal Mixed-Precision KV Cache Quantization via Rate-Distortion Theory

Fei Zuo, Zikang Zhou, Hao Cong, Xiaoyan Xi, Ho Fai Leung

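Affiliations: BA TechWorks (BMW Group); National University of Singapore; Tsinghua University

AI summary: RateQuant uses rate-distortion theory to optimize mixed-precision KV cache quantization, assigning different bit-widths to attention heads of different importance to ease the memory bottleneck and improve generation performance.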

Comments 18 pages, 7 figures, 5 tables

Abstract

Large language models cache all previously computed key-value (KV) pairs during generation, and this KV cache grows linearly with sequence length, making it a primary memory bottleneck for serving. Quantizing the KV cache to fewer bits reduces this cost, yet all current quantizers assign the same bit-width to every attention head, ignoring the large variation in head importance. A natural idea is to allocate more bits to important heads and fewer to the rest. We show, however, that such mixed-precision allocation has a hidden pitfall: each quantizer follows a different distortion curve D(b)=alpha*beta^{-b}, and the decay rate beta varies from 3.6 to 5.3 across quantizer designs. Applying one quantizer's distortion model to another inverts the allocation order and makes performance worse than uniform quantization. We call this failure mode distortion model mismatch and propose RateQuant to resolve it. RateQuant fits a per-quantizer distortion model from a small calibration set, then solves the resulting bit-allocation problem in closed form via reverse waterfilling from rate-distortion theory. On Qwen3-8B at 2.5 average bits, calibrated RateQuant reduces KIVI's perplexity from 49.3 to 14.9 (70% reduction) and improves QuaRot by 6.6 PPL. The entire calibration takes 1.6 s on a single GPU and adds zero overhead at inference time.
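
A worked sketch of closed-form bit allocation under the distortion model D(b) = alpha * beta^{-b}, assuming every head shares one decay rate beta, each head has its own alpha, and the objective is the plain sum of per-head distortions under a fixed average bit budget. Setting the Lagrangian derivative to zero gives b_i = B/n + log_beta(alpha_i) - mean_j log_beta(alpha_j); heads that would receive negative bits are clamped to zero and the rest re-solved, in the spirit of reverse water-filling. This is a textbook-style derivation under those assumptions, not the paper's exact procedure, and real bit-widths would still need rounding.

```python
import math


def allocate_bits(alphas, beta, avg_bits):
    """Minimise sum_i alpha_i * beta**(-b_i) subject to mean(b_i) == avg_bits, b_i >= 0."""
    n = len(alphas)
    total = avg_bits * n
    logs = [math.log(a, beta) for a in alphas]
    active = set(range(n))
    while True:
        mean_log = sum(logs[i] for i in active) / len(active)
        share = total / len(active)
        bits = {i: share + logs[i] - mean_log for i in active}
        negative = {i for i, b in bits.items() if b < 0}
        if not negative:
            break
        active -= negative  # these heads get zero bits; re-solve for the rest
    return [bits.get(i, 0.0) for i in range(n)]


# Hypothetical per-head alphas with beta = 4 and a 2-bit average budget.
allocation = allocate_bits([16.0, 4.0, 1.0], beta=4.0, avg_bits=2.0)
```

At the optimum each active head ends up with the same residual distortion alpha_i * beta^{-b_i}. Because the allocation depends on log_beta, fitting beta from the wrong quantizer shifts every head's share, which is the mismatch the abstract warns about.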

2605.06673 2026-05-11 cs.CL cs.AI cs.LG

Domain-level metacognitive monitoring in frontier LLMs: A 33-model atlas

Jon-Paul Cacioli

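Affiliations: Independent Researcher

AI summary: Using the performance of 33 frontier LLMs across MMLU benchmark domains, the paper reveals domain-level differences that aggregate metacognitive scores mask: Applied/Professional knowledge is the easiest domain to monitor, Formal Reasoning and Natural Science are the hardest, and the middle domains show no significant differences.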

Comments 25 pages, 7 figures, 1 supplementary table. Code and data: https://github.com/synthiumjp/metacognitive-profile-atlas

Abstract

Aggregate metacognitive quality scores mask within-model variation across MMLU benchmark domains. We administered 1,500 MMLU items (250 per domain, under an a priori six-domain grouping) to 33 frontier LLMs from eight model families and computed Type-2 AUROC per model-domain cell using verbalized confidence (0-100). Total observations: 47,151. Every model with above-chance aggregate monitoring showed non-trivial domain-level variation. Applied/Professional knowledge was reliably the easiest benchmark domain to monitor (mean AUROC = .742, ranked top-2 in 21 of 33 models); Formal Reasoning and Natural Science were reliably the hardest (one of the two ranked bottom-2 in 27 of 33 models). The three middle domains were statistically indistinguishable (Kendall's W = .164). A subject-level coherence analysis (within-domain similarity ratio = 0.95) confirms the six-domain grouping is a pragmatic benchmark taxonomy, not a validated latent construct. Within-family profile-shape clustering is significant for Anthropic, Google-Gemini, and Qwen (permutation p < .0001) but not DeepSeek, Google-Gemma, or OpenAI. Gemma 4 31B showed a +.202 AUROC improvement over Gemma 3 27B. Three models classified Invalid on binary KEEP/WITHDRAW probes produced normal profiles under verbalized confidence, confirming probe-format specificity. Bootstrap 95% CIs on 198 cells have median width .199. Split-half aggregate stability r = .893; profile-level split-half is weaker (grand median r = .184). These results show stable benchmark-domain variation obscured by aggregate metrics, and support benchmark-stage domain screening as a step before deployment in specific application areas.
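
Type-2 AUROC measures how well a model's stated confidence separates its own correct answers from its incorrect ones. A minimal sketch, assuming confidences on the 0-100 scale and a boolean correctness flag per item, computes it as the probability that a randomly chosen correct item carries higher confidence than a randomly chosen incorrect one, with ties counted as one half. The function name and sample data are hypothetical; the paper's own pipeline may differ.

```python
def type2_auroc(confidence, correct):
    """Probability that a correct item has higher confidence than an incorrect one."""
    pos = [c for c, ok in zip(confidence, correct) if ok]
    neg = [c for c, ok in zip(confidence, correct) if not ok]
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect item")
    # Count pairwise wins; a tie contributes half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical model-domain cell with five items.
conf = [90, 80, 60, 60, 30]
ok = [True, True, False, True, False]
cell_auroc = type2_auroc(conf, ok)
```

A value of 0.5 corresponds to chance-level monitoring and 1.0 to confidence that perfectly ranks correct above incorrect answers. Computing one such value per model-domain cell yields the kind of profile the atlas compares.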

2605.06671 2026-05-11 cs.AI cs.MA

GraphDC: A Divide-and-Conquer Multi-Agent System for Scalable Graph Algorithm Reasoning

Wenjin Li, Jiaming Cui

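Affiliations: Department of Computer Science, Virginia Tech

AI summary: GraphDC improves graph algorithm reasoning with a divide-and-conquer multi-agent framework: it decomposes the graph, assigns the subtasks to specialized agents, and has a master agent integrate the results, improving performance and robustness on large graphs.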

Abstract

Large Language Models (LLMs) have demonstrated strong potential for many mathematical problems. However, their performance on graph algorithmic tasks is still unsatisfying, since graphs are naturally more complex in topology and often require systematic multi-step reasoning, especially on larger graphs. Motivated by this gap, we propose GraphDC, a Divide-and-Conquer multi-agent framework for scalable graph algorithm reasoning. Specifically, inspired by Divide-and-Conquer design, GraphDC decomposes an input graph into smaller subgraphs, assigns each subgraph to a specialized agent for local reasoning, and uses a master agent to integrate the local outputs with inter-subgraph information to produce the final solution. This hierarchical design reduces the reasoning burden on individual agents, alleviates computational bottlenecks, and improves robustness on large graph instances. Extensive experiments show that GraphDC consistently outperforms existing methods on graph algorithm reasoning across diverse tasks and scales, especially on larger instances where direct end-to-end reasoning is less reliable.
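
The divide-and-conquer pattern in the abstract can be mirrored with ordinary functions in place of LLM agents. The sketch below takes connected-component counting as a stand-in task: a "worker" labels components inside each subgraph, and a "master" merges those local labels using only the edges that cross subgraphs. The task choice, function names, and partition are assumptions for illustration and do not describe how GraphDC's agents are prompted or coordinated.

```python
def local_components(nodes, edges):
    """Worker step: label connected components within one subgraph (union-find)."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        parent[find(u)] = find(v)
    return {v: find(v) for v in nodes}


def count_components(parts, edges):
    """Master step: merge local labels using the inter-subgraph edges."""
    part_of = {v: i for i, part in enumerate(parts) for v in part}
    labels = {}
    for i, part in enumerate(parts):
        inner = [(u, v) for u, v in edges
                 if part_of[u] == i and part_of[v] == i]
        labels.update(local_components(part, inner))
    # Cross edges are rewritten in terms of local component representatives.
    cross = [(labels[u], labels[v]) for u, v in edges
             if part_of[u] != part_of[v]]
    merged = local_components(set(labels.values()), cross)
    return len(set(merged.values()))


# Hypothetical graph split into two parts joined by the single cross edge (3, 4).
n_components = count_components([[1, 2, 3], [4, 5, 6]], [(1, 2), (4, 5), (3, 4)])
```

The master only ever sees one representative per local component plus the cross edges, so its input shrinks as the workers absorb more of the graph.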

2605.06623 2026-05-11 cs.AI cs.CL cs.LG cs.MA

MASPO: Joint Prompt Optimization for LLM-based Multi-Agent Systems

Zhexuan Wang, Xuebo Liu, Li Wang, Zifei Shan, Yutong Wang, Zhenxi Song, Min Zhang

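Affiliations: Institute of Computing and Intelligence, Harbin Institute of Technology, Shenzhen, China

AI summary: The paper proposes the MASPO framework, which optimizes prompts in multi-agent systems through a joint evaluation mechanism and a data-driven evolutionary beam search; experiments on 6 tasks show an average accuracy improvement of 2.9.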

Comments Accepted at ICML 2026

Abstract

Large language model (LLM)-based Multi-agent systems (MAS) have shown promise in tackling complex collaborative tasks, where agents are typically orchestrated via role-specific prompts. While the quality of these prompts is pivotal, jointly optimizing them across interacting agents remains a non-trivial challenge, primarily due to the misalignment between local agent objectives and holistic system goals. To address this, we introduce MASPO, a novel framework designed to automatically and iteratively refine prompts across the entire system. A core innovation of MASPO is its joint evaluation mechanism, which assesses prompts not merely by their local validity, but by their capacity to facilitate downstream success for successor agents. This effectively bridges the gap between local interactions and global outcomes without relying on ground-truth labels. Furthermore, MASPO employs a data-driven evolutionary beam search to efficiently navigate the high-dimensional prompt space. Extensive empirical evaluations across 6 diverse tasks demonstrate that MASPO consistently outperforms state-of-the-art prompt optimization methods, achieving an average accuracy improvement of 2.9. We release our code at https://github.com/wangzx1219/MASPO.

2605.06435 2026-05-11 cs.CL cs.AI cs.LG

COVID-19 Infodemic. Understanding content features in detecting fake news using a machine learning approach

Vimala Balakrishnan, Lee Zing Hii, Eric Laporte

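Affiliations: Faculty of Computer Science & Information Technology, Universiti Malaya; LIGM, Univ Gustave Eiffel, CNRS

AI summary: The study uses machine learning to examine content features for fake news detection and finds that textual and linguistic features each improve detection when used separately, while combining them does not significantly improve performance.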

Journal ref Malaysian Journal of Computer Science, 2023, 36 (1), pp.1-13

Abstract

The use of content features, particularly textual and linguistic ones, for fake news detection is under-researched, despite empirical evidence showing that these features could help differentiate real and fake news. To this end, this study investigates a selection of content features, such as word bigrams and part-of-speech distribution, to improve fake news detection. We performed a series of experiments on a new dataset gathered during the COVID-19 pandemic, using Decision Tree, K-Nearest Neighbor, Logistic Regression, Support Vector Machine, and Random Forest. Random Forest yielded the best results, followed closely by Support Vector Machine, across all setups. In general, both the textual and linguistic features were found to improve fake news detection when used separately; however, combining them into a single model did not improve the detection significantly. Differences were also noted between the use of bigrams and part-of-speech tags. The study shows that textual and linguistic features can be used successfully in detecting fake news using the traditional machine learning approach as opposed to deep learning.
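
As a small illustration of one content feature named in the abstract, the sketch below counts word bigrams in a piece of text using only the Python standard library. Lowercasing and whitespace tokenization are simplifying assumptions, the sample sentence is made up, and the study's actual preprocessing and classifiers are not reproduced.

```python
from collections import Counter


def word_bigrams(text):
    """Count adjacent word pairs after lowercasing and whitespace tokenization."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


# Hypothetical headline fragment.
features = word_bigrams("the cure the cure works")
```

Counts like these would typically be turned into a fixed-length vector per article before being passed to a classifier such as a Random Forest.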

2605.06298 2026-05-11 cs.CV cs.AI

Render, Don't Decode: Weight-Space World Models with Latent Structural Disentanglement

Roussel Desmond Nzoyem, Mauro Comi

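Affiliations: Department of Computer Science; University of Manchester; University of Bristol

AI summary: The paper proposes the NOVA framework, which represents state as the weights and biases of an implicit neural representation to obtain an efficient, interpretable world model that supports disentangling and editing structural scene components.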

Comments 35 pages, 30 figures, 8 tables

Abstract

Training world models on vast quantities of unlabelled videos is a critical step toward fully autonomous intelligence. However, the prevailing paradigm of encoding raw pixels into opaque latent spaces and relying on heavy decoders for reconstruction leaves these models computationally expensive and uninterpretable. We address this problem by introducing NOVA, a world modelling framework that represents the system state as the weights and biases of an auxiliary coordinate-based implicit neural representation (INR). This structured representation is analytically rendered, which eliminates the decoder bottleneck while conferring compactness, portability, and zero-shot super-resolution. Furthermore, like most latent action models, NOVA can be distilled into a context-dependent video generator via an action-matching objective. Surprisingly, without resorting to auxiliary losses or adversarial objectives, NOVA can disentangle structural scene components such as background, foreground, and inter-frame motion, enabling users to edit either content or dynamics without compromising the other. We validate our framework on several challenging datasets, achieving strong controllable forecasting while operating on a single consumer GPU at ~40M parameters. Ultimately, structured representations like INRs not only enhance our understanding of latent dynamics but also pave the way for immersive and customisable virtual experiences.

2605.06230 2026-05-11 cs.AI cs.DC

Safactory: A Scalable Agentic Infrastructure for Training Trustworthy Autonomous Intelligence

Xinquan Chen, Zhenyun Yin, Shan He, Bin Huang, Shanzhe Lei, Pengcheng Shi, Kun Cai, Bei Chen, Bangwei Liu, Zeyu Kang, Chao Huang, Yang Zhang, Wenjie Li, Ruijun Ge, Yajie Wang, Tianshun Fang, Tianyang Xu, Yiwen Cong, Meng Jin, Gaolei Li, Xuansheng Wu, Linhan Liu, Zijing He, An Li, Yan Teng, Xin Tan, Dongrui Liu, Jing Shao, ChaoChao Lu, Ji He, Jie Li, Chunfeng Song, Jinya Xu, Fan Song, Shujie Wang, Jianmin Qian, Jie Hou, Xuhong Wang, Yingchun Wang, Hui Wang, Xia Hu

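Affiliations: Shanghai AI Laboratory

AI summary: The paper proposes Safactory, a scalable agent factory for training trustworthy autonomous intelligence. The framework integrates three tightly coupled platforms to achieve reliability and continuous improvement in long-horizon decision making, tool use, and real-environment interaction.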

Comments 50 pages, 21 figures

Abstract

As large models evolve from conversational assistants into autonomous agents, challenges increasingly arise from long-horizon decision making, tool use, and real environment interaction. Existing agentic infrastructure remains fragmented across evaluation, data management, and agent evolution, making it difficult to discover risks systematically and improve models in a continuous closed loop. In this report, we present Safactory, a scalable agent factory for trustworthy autonomous intelligence. Safactory integrates three tightly coupled platforms: a Parallel Simulation Platform for trajectory generation, a Trustworthy Data Platform for trajectory storage and experience extraction, and an Autonomous Evolution Platform for asynchronous reinforcement learning and on-policy distillation. As far as we know, Safactory is the first framework to propose a unified evolutionary pipeline for next-generation trustworthy autonomous intelligence.

2605.06175 2026-05-11 cs.RO

VLA-GSE: Boosting Parameter-Efficient Fine-Tuning in VLA with Generalized and Specialized Experts

Yuhua Jiang, Junjie Lu, Xinyao Qin, Xiaoyu Chen, Kaixin Wang, Feifei Gao, Li Zhao

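Affiliations: Microsoft Research Asia; Tsinghua University

AI summary: VLA-GSE uses generalized and specialized experts to improve parameter-efficient fine-tuning of VLA models for robotic control, preserving pre-trained knowledge while improving adaptation; experiments show strong results on several benchmarks.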

Abstract

Vision-language-action (VLA) models inherit rich visual-semantic priors from pre-trained vision-language backbones, but adapting them to robotic control remains challenging. Full fine-tuning (FFT) is prone to overfitting on downstream robotic data and catastrophic forgetting of pretrained vision-language capabilities. Parameter-efficient fine-tuning (PEFT) better preserves pre-trained knowledge, yet existing PEFT methods still struggle to adapt effectively to robot control tasks. To address this gap, we propose VLA-GSE, a parameter-efficient VLA fine-tuning framework that improves control adaptation while retaining PEFT's knowledge preservation advantage. Specifically, VLA-GSE (Generalized and Specialized Experts) is initialized by spectrally decomposing the frozen backbone, assigning leading singular components to generalized experts (shared experts) and disjoint residual components to specialized experts (routed experts). This decomposition improves adaptation capacity under a fixed trainable-parameter budget. Under a comparable parameter budget, VLA-GSE updates only 2.51% of the full model parameters and consistently outperforms strong FFT and PEFT baselines. It achieves 81.2% average zero-shot success on LIBERO-Plus, preserves pre-trained VLM capability comparably to LoRA on multimodal understanding benchmarks, and improves real-world manipulation success under multiple distribution shifts. Code is available at: https://github.com/YuhuaJiang2002/VLA-GSE

2605.06169 2026-05-11 cs.LG cs.CV

Mean Mode Screaming: Mean-Variance Split Residuals for 1000-Layer Diffusion Transformers

Pengqi Lu

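Affiliations: Beijing, China

AI summary: The paper proposes Mean-Variance Split residuals to address the mean-dominated collapse that appears when Diffusion Transformers reach hundreds of layers; splitting the residual update and using a leaky trunk-mean replacement keeps training stable at extreme depth.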

Comments 43 pages (9-page main paper + appendix)

Abstract

Scaling Diffusion Transformers (DiTs) to hundreds of layers introduces a structural vulnerability: networks can enter a silent, mean-dominated collapse state that homogenizes token representations and suppresses centered variation. Through mechanistic auditing, we isolate the trigger event of this collapse as Mean Mode Screaming (MMS). MMS can occur even when training appears stable, with a mean-coherent backward shock on residual writers that opens deep residual branches and drives the network into a mean-dominated state. We show this behavior is driven by an exact decomposition of these gradients into mean-coherent and centered components, compounded by the structural suppression of attention-logit gradients through the null space of the Softmax Jacobian once values homogenize. To address this, we propose Mean-Variance Split (MV-Split) Residuals, which combine a separately gained centered residual update with a leaky trunk-mean replacement. On a 400-layer single-stream DiT, MV-Split prevents the divergent collapse that crashes the un-stabilized baseline; it tracks close to the baseline's pre-crash trajectory while remaining substantially better than token-isotropic gating methods such as LayerScale across the full schedule. Finally, we present a 1000-layer DiT as a scale-validation run at boundary scales, establishing that the architecture remains stably trainable at extreme depth.

2605.06156 2026-05-11 cs.LG cs.AI

Entropy-Regularized Adjoint Matching for Offline Reinforcement Learning

Abdelghani Ghanem, Mounir Ghogho

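Affiliations: College of Computing; Mohammed VI Polytechnic University

AI summary: The paper proposes the Maximum Entropy Adjoint Matching framework, which uses Mirror Descent entropy maximization and a mixture behavior prior to mitigate popularity bias and support binding, improving the extraction of optimal policies from offline data.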

Abstract

Integrating expressive generative policies, such as flow-matching models, into offline reinforcement learning (RL) allows agents to capture complex, multi-modal behaviors. While Q-learning with Adjoint Matching (QAM) stabilizes policy optimization via the continuous adjoint method, it remains inherently bound to the fixed behavior distribution. This dependence induces a popularity bias that can suppress high-reward actions in low-density regions, and creates a support binding that restricts off-manifold exploration. Existing workarounds, such as appending residual Gaussian policies, often re-introduce the expressivity bottlenecks associated with unimodal distributions. In this work, we propose Maximum Entropy Adjoint Matching (ME-AM), a unified framework that addresses these limitations within the continuous flow formulation. ME-AM incorporates two mechanisms: (1) a Mirror Descent entropy maximization objective that mitigates the popularity bias to facilitate the extraction of optimal policies from offline datasets, and (2) a Mixture Behavior Prior that broadens the geometric support to encompass out-of-distribution high-reward regions. By exploring this extended geometry, ME-AM identifies robust actions while preserving the absolute continuity of the generative vector field. Empirically, ME-AM demonstrates competitive or superior performance compared to prior state-of-the-art (SOTA) methods across a diverse suite of sparse-reward continuous control environments.

2605.06115 2026-05-11 cs.AI

CrossCult-KIBench: A Benchmark for Cross-Cultural Knowledge Insertion in MLLMs

CrossCult-KIBench:一种用于多模态大语言模型跨文化知识插入的基准测试

Zhen Zeng, Leijiang Gu, Feng Li, Jing Yu, Zenglin Shi

Affiliations: Hefei University of Technology; Key Laboratory of Ethnic Language Intelligent Analysis and Security Governance of MOE, Minzu University of China; School of Information Engineering, Minzu University of China

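AI summary: This paper introduces the CrossCult-KIBench benchmark for evaluating how effective cross-cultural knowledge insertion is and how it affects non-target cultures. It uses 9,800 image-grounded cases covering 49 visual scenarios to test adaptation across English, Chinese, and Arabic language-culture groups, proposes the MCKI baseline, and shows that current approaches struggle to balance cultural adaptation against behavior preservation.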

详情
AI中文摘要

多模态大语言模型(MLLMs)主要基于英语数据训练,在跨文化环境下常生成文化不当或不一致的响应。为缓解此问题,我们引入了跨文化知识插入任务,旨在在适应特定文化背景的同时保持模型在其他文化中的原始行为。为促进该领域研究,我们提出了CrossCult-KIBench,一个全面的评估基准,用于评估知识插入的有效性及其对非目标文化的影响。该基准包含9,800个图像基础案例,涵盖英语、中文和阿拉伯语文化相关的49种视觉场景,支持单次插入和连续插入两种评估模式。我们还提出了记忆条件知识插入(MCKI)作为基线方法。MCKI通过使用冻结的MLLM表示从外部记忆中检索相关文化知识,在适用时将匹配条目作为条件提示前置。在CrossCult-KIBench上的广泛实验表明,当前方法在有效文化适应与行为保持之间的平衡上存在困难,突显了开发文化感知MLLMs的关键挑战。我们的工作因此强调了开发更具文化适应性和责任感的MLLMs的重要研究方向。

英文摘要

Multimodal Large Language Models (MLLMs), trained primarily on English-centric data, frequently generate culturally inappropriate or misaligned responses in cross-cultural settings. To mitigate this, we introduce the task of cross-cultural knowledge insertion, which focuses on adapting models to specific cultural contexts while preserving their original behavior in other cultures. To facilitate research in this area, we introduce CrossCult-KIBench, a comprehensive evaluation benchmark for assessing both the effectiveness of knowledge insertion and its unintended side effects on non-target cultures. The benchmark includes 9,800 image-grounded cases covering 49 culturally relevant visual scenarios across English, Chinese, and Arabic language-culture groups. It supports evaluation in both single-insert and sequential-insert settings. We also propose Memory-Conditioned Knowledge Insertion (MCKI) as a baseline method. MCKI retrieves relevant cultural knowledge from an external memory using frozen MLLM representations, prepending matched entries as conditional prompts when applicable. Extensive experiments on CrossCult-KIBench reveal that current approaches struggle to balance effective cultural adaptation with behavioral preservation, highlighting a key challenge in developing culturally-aware MLLMs. Our work thus underscores an important research direction for developing more culturally adaptive and responsible MLLMs.

2605.05958 2026-05-11 cs.AI

Temporal Smoothness Doubly Robust Learning for Debiased Knowledge Tracing

基于时间平滑的双重鲁棒知识追踪学习

Peilin Zhan, Wei Chen, Weilin Chen, Shuyi Pan, Ruichu Cai

Affiliations: School of Computer Science and Technology, Guangdong University of Technology

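AI summary: This paper proposes the TSDR framework, which adds a temporal-smoothness regularizer to a doubly robust estimator that combines a propensity model with an error-imputation model. It targets selection bias in knowledge tracing and improves training stability and predictive accuracy.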

详情
AI中文摘要

知识追踪(KT)是智能教育系统的核心,但依赖于选择性观察的教育日志,导致严重的选择偏差。现有方法忽视此问题,训练时使用标准经验风险,产生偏差的掌握估计并积累误差。本文引入双重鲁棒(DR)框架,整合倾向性模型与误差填补模型,理论上保证无偏性。在序列设置中,估计器性能受方差依赖的随机偏差影响,导致训练不稳定。本文推导泛化界,明确估计器方差的影响,并识别时间平滑为控制关键因素。基于理论,提出TSDR框架,联合优化KT预测器与填补模型,减少方差并保持无偏性。实验表明,TSDR提升多种先进KT模型,凸显系统性偏差校正的重要性。

英文摘要

Knowledge Tracing (KT) is fundamental to intelligent education systems, yet relies on educational logs that are selectively observed. The non-random nature of exercise recommendations and student choices inevitably induces severe selection bias. Most existing KT methods neglect this issue, training on observed logs using standard empirical risk, which yields biased mastery estimates and accumulates errors in subsequent recommendations. To address this, we introduce a doubly robust (DR) formulation for KT that integrates a propensity model with an error imputation model, theoretically guaranteeing unbiasedness if either model is accurate. Beyond unbiasedness, in the sequential setting of KT, we identify that the estimator's performance is compromised by variance-dependent stochastic deviations that accumulate over time, thereby causing training instability and limiting performance. To mitigate this, we derive a generalization bound that explicitly characterizes the impact of estimator variance and identifies temporal smoothness as a key factor in controlling it. Building on these theoretical insights, we propose the Temporal Smoothness Doubly Robust (TSDR) framework. TSDR jointly optimizes the KT predictor and the imputation model with a smoothness regularizer, effectively reducing variance while preserving the unbiasedness guarantee of DR. Experiments on multiple real-world benchmarks demonstrate that TSDR consistently enhances various state-of-the-art KT backbones, underscoring the vital role of principled bias correction in KT.
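
The doubly robust estimate described in this abstract can be sketched in a few lines. This is a minimal illustration that assumes the standard doubly robust form from the causal-inference literature, not the authors' implementation. The function name, argument names, and toy numbers are hypothetical, and the temporal-smoothness regularizer is left out because the abstract does not give its form.

```python
def doubly_robust_risk(errors, observed, propensities, imputed):
    """Doubly robust estimate of the mean prediction error over all interactions.

    errors[i]       -- prediction error, only used where observed[i] == 1
    observed[i]     -- 1 if the interaction was logged, else 0
    propensities[i] -- estimated probability that interaction i is observed
    imputed[i]      -- the imputation model's guess of the error
    """
    total = 0.0
    for e, o, p, e_hat in zip(errors, observed, propensities, imputed):
        # Imputed error everywhere, plus an inverse-propensity-weighted
        # correction on the entries that were actually observed.
        total += e_hat + o * (e - e_hat) / p
    return total / len(errors)


# Toy data (hypothetical numbers).
true_errors = [0.2, 0.8, 0.5, 0.1]
observed = [1, 0, 1, 0]

# The imputation model is exact here, while the propensities are deliberately wrong.
wrong_propensities = [0.5, 0.5, 0.9, 0.3]
estimate = doubly_robust_risk(true_errors, observed, wrong_propensities, true_errors)
print(estimate)  # matches the true mean error because every correction term is zero
```

The symmetric case also holds: if the propensities are exact and the imputation is wrong, the estimate equals the true mean error in expectation over the observation pattern, which is the "either model is accurate" guarantee the abstract refers to.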

2605.05949 2026-05-11 cs.AI cs.SE

MAS-Algorithm: A Workflow for Solving Algorithmic Programming Problems with a Multi-Agent System

MAS-Algorithm: 一个基于多智能体系统的算法问题求解工作流

Yuliang Xu, Xiang Xu, Yao Wan, Hu Wei, Tong Jia

Affiliations: Peking University; Alibaba Group; Huazhong University of Science and Technology

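AI summary: This paper proposes MAS-Algorithm, a multi-agent workflow for solving algorithmic programming problems. Experiments show consistent gains across several Qwen-series models, including an average 6.48% gain in acceptance rate on a self-constructed benchmark and a 4.72% gain on LiveCodeBench-Pro.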

详情
AI中文摘要

算法问题求解是评估AI编码系统结构推理能力的严格测试场,直接反映模型在复杂场景中进行结构推理的能力。现有方法主要依赖模型中心策略,如架构修改和数据扩展,成本高且可解释性差。替代方法利用外部工具或提示技术(如链式思考)往往碎片化且缺乏统一框架。本文提出MAS-Algorithm,一个受竞赛程序员和算法工程师实践启发的系统性多智能体工作流。我们的框架将端到端求解过程分解为模块化阶段,实现结构推理、工具集成和智能体间灵活协调。设计强调严谨性和可扩展性,使其能够推广到多样化的问题类型。在自建基准上的实验结果表明,多个Qwen系列模型均取得一致提升,平均接受率提升6.48%。相比之下,相同数据上的参数高效微调仅带来0.89%的微小提升。我们进一步在LiveCodeBench-Pro上观察到4.72%的提升,同时在其他准确性和效率指标上也取得一致提升。除了性能提升,我们还进行了全面分析以更好地理解工作流内的推理过程,包括错误模式和跨场景行为。我们进一步进行了定制替换和消融研究以探索框架的上限,显示单个智能体可带来高达27.7%的改进。这些结果突显了MAS-Algorithm在推动AI驱动算法推理方面的强大潜力。

英文摘要

Algorithmic problem solving serves as a rigorous testbed for evaluating structured reasoning in AI coding systems, as it directly reflects a model's ability to perform structured reasoning in complex scenarios. Existing approaches predominantly rely on model-centric strategies, such as architectural modifications and data scaling, which are costly and offer limited interpretability. Alternative methods leveraging external tools or prompting techniques (e.g., chain-of-thought) are often fragmented and lack a unified framework. In this paper, we propose MAS-Algorithm, a systematic multi-agent workflow for algorithmic problem solving inspired by the practices of competitive programmers and algorithm engineers. Our framework decomposes the end-to-end solving process into modular stages, enabling structured reasoning, tool integration, and flexible coordination among agents. The design emphasizes both rigor and extensibility, allowing it to generalize across diverse problem types. Experimental results on a self-constructed benchmark demonstrate consistent improvements across multiple Qwen series models, achieving an average gain of 6.48% in acceptance rate. In contrast, parameter-efficient fine-tuning on the same data yields only a marginal improvement of 0.89%. We further observe a 4.72% gain on LiveCodeBench-Pro, along with consistent improvements across additional accuracy and efficiency metrics. Beyond performance gains, we conduct comprehensive analyses to better understand the reasoning process within the workflow, including error patterns and cross-scenario behaviors. We further perform customized replacement and ablation studies to explore the upper bound of the framework, showing that individual agents can contribute improvements of up to 27.7%. These results highlight the strong potential of MAS-Algorithm for advancing AI-driven algorithmic reasoning.

2605.05927 2026-05-11 cs.CL cs.SD eess.AS

Minimizing Modality Gap from the Input Side: Your Speech LLM Can Be a Prosody-Aware Text LLM

从输入侧最小化模态差距:你的语音大语言模型可以成为具有语调意识的文本大语言模型

Wenqian Cui, Xiao-Hui Li, Daxin Tan, Qiyong Zheng, Irwin King

Affiliations: The Chinese University of Hong Kong; Huawei Technologies

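AI summary: This paper proposes TextPro-SLM, which makes spoken input more closely resemble the input of a prosody-aware text LLM and thereby narrows the modality gap. Experiments report the lowest modality gap among leading SLMs at both the 3B and 7B scales, with high data efficiency.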

Comments Work in progress

详情
AI中文摘要

语音大语言模型(SLMs)通常基于文本大语言模型(TLM)检查点构建,但仍存在显著的模态差距。先前工作主要从输出侧减少此差距,通过使语音生成更接近文本,但效果有限。本文认为关键瓶颈在于输入侧。我们提出TextPro-SLM,一种使语音输入更接近具有语调意识的文本大语言模型的SLM。TextPro-SLM结合WhisperPro,一个统一的语音编码器,可生成同步文本标记和语调嵌入,与训练以保留原始TLM语义能力并学习非语言理解的LLM骨干相结合。实验表明,TextPro-SLM在3B和7B规模下实现了最低的模态差距,同时在非语言理解任务上表现出色。这些收益仅需约1000小时的LLM训练语音,表明从输入侧减少模态差距是有效且数据高效的。

英文摘要

Speech large language models (SLMs) are typically built from text large language model (TLM) checkpoints, yet they still suffer from a substantial modality gap. Prior work has mainly attempted to reduce this gap from the output side by making speech generation more text-like, but the gap remains. We argue that the key remaining bottleneck lies on the input side. We propose TextPro-SLM, an SLM that makes spoken input more closely resemble that of a prosody-aware text LLM. TextPro-SLM combines WhisperPro, a unified speech encoder that produces synchronized text tokens and prosody embeddings, with an LLM backbone trained to preserve the semantic capabilities of the original TLM while learning paralinguistic understanding. Experiments show that TextPro-SLM achieves the lowest modality gap among leading SLMs at both 3B and 7B scales, while also delivering strong overall performance on paralinguistic understanding tasks. These gains are achieved with only roughly 1,000 hours of LLM training audio, suggesting that reducing the modality gap from the input side is both effective and data-efficient.

2605.05866 2026-05-11 cs.AI cond-mat.mtrl-sci cs.LG

XDecomposer: Learning Prior-Free Set Decomposition for Multiphase X-ray Diffraction

XDecomposer:学习无先验的多相X射线衍射集分解

Hanyu Gao, Bin Cao, Yunyue Su, Tong-Yi Zhang, Qiang Liu

Affiliations: New Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences (CASIA); School of Advanced Interdisciplinary Sciences, University of Chinese Academy of Sciences; Guangzhou Municipal Key Laboratory of Materials Informatics, The Hong Kong University of Science and Technology (Guangzhou); Green Dynamics, Australia

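AI summary: XDecomposer is a prior-free method for jointly decomposing and identifying multiphase X-ray diffraction patterns. It needs no candidate phase list or structural template, and it improves reconstruction accuracy and phase identification.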

Comments 28pages, 8figures, 6tables

详情
AI中文摘要

多相粉末X射线衍射(PXRD)分析仍是结构鉴定的关键瓶颈,因为实际合成常产生复杂混合物,其组成相难以可靠分离。尽管近期在基于表示的晶体检索和生成方面有进展,但现有方法大多假设单相输入,在多相情况下失效。本文提出XDecomposer,一种无先验假设的框架,用于多相XRD模式的联合分解和识别,无需候选相列表、结构模板或相数先验知识。我们将多相衍射分析建模为集合预测问题,模型推断无序的相分辨组件集合、混合比例及其对应的结构表示。相查询驱动的分解机制与衍射一致的物理重建,使源分离准确且保持晶体学保真度。在模拟和实验数据集上的广泛实验表明,XDecomposer在不同化学系统中显著提高了重建准确性和相识别能力,同时保持对未见混合物的良好泛化能力。这些结果为数据驱动的、源解析的多相XRD分析提供了实用路径,减少了对先验引导的迭代相匹配的依赖。代码在https://github.com/Licht0812/XDecomposer上公开。

英文摘要

Multiphase powder X-ray diffraction (PXRD) analysis remains a fundamental bottleneck in structure identification, as real-world synthesis often produces complex mixtures whose constituent phases (components) cannot be reliably disentangled. While recent advances in representation-based crystal retrieval and generation suggest the possibility of inferring structures directly from PXRD, existing approaches largely assume single-phase inputs and break down in multiphase settings. Here, we present XDecomposer, a prior-free framework for joint decomposition and identification of multiphase XRD patterns without requiring candidate phase lists, structural templates, or prior knowledge of phase number. We formulate multiphase diffraction analysis as a set prediction problem, where the model infers an unordered set of phase-resolved components, their mixture proportions, and corresponding structural representations within a unified architecture. A phase-query-driven decomposition mechanism, together with diffraction-consistent physical reconstruction, enables accurate source separation while preserving crystallographic fidelity. Extensive experiments on both simulated and experimental datasets show that XDecomposer substantially improves reconstruction accuracy and phase identification across diverse chemical systems, while maintaining strong generalization to unseen mixtures. These results provide a practical route toward data-driven, source-resolved multiphase XRD analysis and reduce long-standing dependence on prior-guided iterative phase matching. The code is openly available at https://github.com/Licht0812/XDecomposer

2605.05806 2026-05-11 cs.LG

Retrieval from Within: An Intrinsic Capability of Attention-Based Models

内部检索:基于注意力模型的内在能力

Elad Hoffer, Yochai Blau, Edan Kinderman, Ron Banner, Daniel Soudry, Boris Ginsburg

Affiliations: NVIDIA; Department of Electrical Engineering, Technion, Haifa, Israel

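AI summary: This paper proposes the INTRA framework, which uses decoder attention to retrieve evidence directly from the model's internal representations. It unifies retrieval and generation and improves evidence recall and end-to-end answer quality on question-answering benchmarks.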

详情
AI中文摘要

Retrieval-augmented generation (RAG)通常将检索与生成视为独立系统。我们探讨注意力编码器-解码器是否能直接从内部表征进行检索。我们引入INTRA(通过注意力的内在检索)框架,其中解码器注意力查询评分预先编码的证据片段,然后直接重用这些片段作为生成的上下文。通过设计,INTRA统一了检索与生成,消除了RAG流水线中常见的检索器-生成器不匹配问题。此设计还通过重用预先计算的编码器状态来摊销上下文编码。在问答基准测试中,INTRA在证据召回和端到端答案质量上均优于强大的工程化检索流水线。我们的结果表明,基于注意力的模型已经具备可以被激发而非作为外部模块添加的检索机制。

英文摘要

Retrieval-augmented generation (RAG) typically treats retrieval and generation as separate systems. We ask whether an attention-based encoder-decoder can instead retrieve directly from its own internal representations. We introduce INTRA (INTrinsic Retrieval via Attention), a framework where decoder attention queries score pre-encoded evidence chunks that are then directly reused as context for generation. By construction, INTRA unifies retrieval and generation, eliminating the retriever-generator mismatch typical of RAG pipelines. This design also amortizes context encoding by reusing precomputed encoder states across queries. On question-answering benchmarks, INTRA outperforms strong engineered retrieval pipelines on both evidence recall and end-to-end answer quality. Our results demonstrate that attention-based models already possess a retrieval mechanism that can be elicited, rather than added as an external module.

2605.05732 2026-05-11 cs.LG cs.AI

CRAFT: Forgetting-Aware Intervention-Based Adaptation for Continual Learning

CRAFT:面向持续学习的遗忘感知干预式适应

Md Anwar Hossen, Fatema Siddika, Juan Pablo Munoz, Tanya Roosta, Ali Jannesari

Affiliations: Iowa State University; Maro Systems; University of California, Berkeley; AMD

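AI summary: CRAFT learns low-rank interventions on hidden representations instead of updating model weights. It routes tasks by output-distribution divergence, regularizes with a KL divergence, and merges interventions, which reduces forgetting and improves performance.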

Comments 24 pages

详情
AI中文摘要

大型语言模型(LLMs)可通过微调获得新能力,但持续适应常导致灾难性遗忘。我们提出CRAFT,一种持续学习框架,通过学习隐藏表示上的低秩干预避免更新模型权重。CRAFT分为三个阶段:首先根据输出分布分歧将每个任务路由到相似任务组;然后使用KL散度对齐组的先验状态,直接控制遗忘并决定收敛;最后用相同KL信号将更新任务的干预合并到共享表示中。此设计通过单一KL目标统一了路由、正则化和合并。CRAFT在多个基准和模型规模上优于强LoRA方法,同时对任务顺序鲁棒。这些结果表明,通过输出空间分歧引导的表示空间适应,为LLM的持续学习提供了可扩展且原理化的方案。

英文摘要

Large language models (LLMs) can acquire new capabilities through fine-tuning, but continual adaptation often leads to catastrophic forgetting. We propose CRAFT, a continual learning framework that avoids updating model weights by instead learning low-rank interventions on hidden representations. CRAFT proceeds in three stages: it first routes each task to a group of similar tasks based on output-distribution divergence; it then fine-tunes the model using a Kullback-Leibler (KL) divergence against the group's prior state, which directly controls forgetting and determines convergence; finally, it merges interventions for the updated task into the shared representation using the same KL signal. This design unifies routing, regularization, and merging through a single KL-based objective. CRAFT improves overall performance and reduces forgetting compared to strong LoRA-based approaches across multiple benchmarks and model scales, while remaining robust to task ordering. These results suggest that controlling adaptation in representation space, guided by output-space divergence, provides a scalable and principled approach to continual learning in LLMs.
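
The routing stage described in this abstract can be sketched as a nearest-group lookup under a divergence between output distributions. This is a minimal illustration, not the CRAFT implementation: the use of KL(task || group) in that direction, the group names, and the toy distributions are all assumptions made for the example.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def route_task(task_dist, group_dists):
    """Return the name of the group whose output distribution is closest in KL."""
    return min(
        group_dists,
        key=lambda name: kl_divergence(task_dist, group_dists[name]),
    )


# Hypothetical per-group output distributions over three tokens.
groups = {
    "math": [0.7, 0.2, 0.1],
    "code": [0.1, 0.2, 0.7],
}
new_task = [0.6, 0.3, 0.1]
print(route_task(new_task, groups))  # "math" for these toy numbers
```

In practice the distributions would be averaged over many prompts and a full vocabulary. The same divergence could then serve as the regularization signal, which is the unification the abstract points to.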

2605.05693 2026-05-11 cs.AI cs.LG

Saliency-Aware Regularized Quantization Calibration for Large Language Models

具有显著性感知的正则化量化校准用于大语言模型

Yanlong Zhao, Xiaoyuan Cheng, Huihang Liu, Baihua He, Xinyu Zhang, Harrison Bo Hua Zhu, Wenlong Chen, Li Zeng, Zhuo Sun

Affiliations: University of Science and Technology of China; University College London; Shanghai University of Finance and Economics; Academy of Mathematics and Systems Science, Chinese Academy of Sciences; University of Copenhagen; Imperial College London; Technical University of Denmark; Peking University

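AI summary: This paper proposes SARQC, which adds a saliency-aware regularizer to post-training quantization calibration to improve the generalization of quantized large language models. Experiments on dense and Mixture-of-Experts models show improved perplexity and zero-shot accuracy.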

详情
AI中文摘要

后训练量化(PTQ)是一种在内存和延迟约束下部署大语言模型(LLMs)的有效方法。现有PTQ方法通过最小化预定义校准数据集上的层间重建误差确定量化参数,通常通过缩放搜索或Gram基方法优化。然而,从泛化风险的角度来看,基于有限或不具代表性的校准数据的重建误差可能使量化权重偏离原始浮点权重,从而降低下游性能。为了解决这个问题,我们提出了正则化量化校准(RQC),这是一个统一的框架,通过添加显式控制权重偏离原始权重的正则化项来增强标准PTQ目标。我们进一步将此框架推广以纳入显著性感知正则化,得到显著性感知正则化量化校准(SARQC)。所提出的正则化鼓励量化权重在校准过程中保持接近原始权重,从而在推理时提高泛化能力。SARQC无缝集成到现有的PTQ流程中,并在统一的公式下增强缩放搜索和Gram基方法。在密集和专家混合LLM上的广泛实验显示,SARQC在困惑度和零样本准确率上表现出一致的改进,而无需引入额外的推理开销。

英文摘要

Post-training quantization (PTQ) is an effective approach for deploying large language models (LLMs) under memory and latency constraints. Most existing PTQ methods determine quantization parameters by minimizing a layer-wise reconstruction error on a predetermined calibration dataset, typically optimized via either scale search or Gram-based methods. However, from the perspective of generalization risk, existing PTQ calibration objectives based solely on empirical reconstruction error over limited or unrepresentative calibration data may move the quantized weights away from the original floating-point weights, potentially degrading downstream performance. To address this issue, we propose Regularized Quantization Calibration (RQC), a unified framework that augments standard PTQ objectives with a regularizer that explicitly controls weight deviation from the original weights. We further generalize this framework to incorporate a saliency-aware regularizer, resulting in Saliency-Aware Regularized Quantization Calibration (SARQC). The proposed regularization encourages quantized weights to remain close to the original weights during calibration, leading to improved generalization at inference time. SARQC integrates seamlessly into existing PTQ pipelines and enhances both scale-search-based and Gram-based methods under a unified formulation. Extensive experiments on dense and Mixture-of-Experts LLMs demonstrate consistent improvements in perplexity and zero-shot accuracy, without introducing additional inference overhead.
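
A regularized calibration objective of the kind this abstract describes might look like the sketch below, applied to a scale search over one weight row. The exact objective in the paper is not given in the abstract, so everything here is an assumption: the squared-error penalty, the per-weight `saliency` values, the regularization strength `lam`, and the candidate scales.

```python
def quantize(weights, scale, n_bits=4):
    """Symmetric round-to-nearest quantization, returned in dequantized form."""
    q_max = 2 ** (n_bits - 1) - 1
    return [scale * max(-q_max - 1, min(q_max, round(w / scale))) for w in weights]


def regularized_objective(w, q, calib_inputs, saliency, lam):
    """Reconstruction error on calibration inputs plus a saliency-weighted
    penalty on how far the quantized weights drift from the originals."""
    recon = 0.0
    for x in calib_inputs:
        out_fp = sum(xi * wi for xi, wi in zip(x, w))
        out_q = sum(xi * qi for xi, qi in zip(x, q))
        recon += (out_fp - out_q) ** 2
    penalty = sum(s * (wi - qi) ** 2 for s, wi, qi in zip(saliency, w, q))
    return recon + lam * penalty


def search_scale(w, calib_inputs, saliency, lam, candidates):
    """Pick the candidate scale with the lowest regularized objective."""
    return min(
        candidates,
        key=lambda s: regularized_objective(w, quantize(w, s), calib_inputs, saliency, lam),
    )


# Hypothetical weight row, calibration inputs, and saliency scores.
w = [0.31, -0.12, 0.85, -0.47]
calib = [[1.0, 0.0, 0.5, -1.0], [0.2, 0.3, -0.4, 0.1]]
saliency = [1.0, 0.5, 2.0, 1.0]
candidates = [0.05, 0.1, 0.15, 0.2]
best = search_scale(w, calib, saliency, lam=0.1, candidates=candidates)
print(best)
```

Setting `lam=0` recovers a plain reconstruction-error scale search, so the penalty only changes which scale wins during calibration and adds nothing at inference time.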

2605.05674 2026-05-11 cs.CV cs.AI cs.LG

EGA: Adapting Frozen Encoders for Vector Search with Bounded Out-of-Distribution Degradation

EGA:基于冻结编码器的向量搜索适应方法,具有受限制的分布外退化

Dongfang Zhao

Affiliations: Tacoma School of Engineering and Technology, University of Washington

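AI summary: This paper proposes EGA, a residual adapter that combines zero initialization, a local triplet loss, and hypersphere projection so that systems built on frozen encoders can handle unseen-class queries at deployment, improving worst-case Label Precision in out-of-distribution settings.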

Comments added ack and github link

详情
AI中文摘要

基于冻结视觉编码器的向量搜索系统在部署时会遇到未见类查询,但现有适配器训练在此情况下失效:高容量适配器使用全局对比损失会静默地将未见类样本重新分配到错误的已见类聚类中,导致最坏情况下的标签精度下降超过40个百分点。我们提出欧几里得测地线对齐(EGA),一种残差适配器,结合三个原则:零初始化、局部三元组损失和超球面投影。这些共同诱导了一个自我限制的动态:已经满足小边距的三元组停止产生梯度,因此适配器在局部几何已正确时会自动停止更新。我们的实验表明,当收敛时,96.5%的三元组没有梯度,未见类区域基本未受影响,同时仍能实现已见类的完整容量优化。在五个不同的分布外(OOD)基准测试中,EGA在四个主要分割上实现了最高的最坏情况标签精度,并在第五个分割上实现了持续改进。该设计还适用于更强的主干网络,除了CLIP之外,我们还提供了将梯度稀疏性与受限制的OOD扰动联系起来的分析证明。

英文摘要

Vector search systems built on frozen vision encoders face queries from unseen classes at deployment, yet existing adapter training collapses under this shift: high-capacity adapters with global contrastive losses silently reassign unseen-class samples to wrong seen-class clusters, dropping worst-case Label Precision by over 40 points below the frozen baseline in our tests. We propose Euclidean Geodesic Alignment (EGA), a residual adapter that couples three principles: zero initialization, local triplet loss, and hypersphere projection. These collectively induce a self-limiting dynamic: triplets that already satisfy a small margin stop producing gradients, so the adapter automatically stops updating where the local geometry is already correct. Our experiments show that at convergence 96.5% of triplets are gradient-free, leaving unseen-class regions largely untouched while still enabling full-capacity refinement of seen classes. Across five diverse out-of-distribution (OOD) benchmarks, EGA achieves the highest worst-case Label Precision on the four primary splits and a consistent improvement on the fifth. The design also transfers to stronger backbones in addition to CLIP, and we provide an analytical justification linking gradient sparsity to bounded OOD perturbation.
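
The three principles named in this abstract can be illustrated with a small sketch. It is not the EGA implementation: the linear residual form `x + W x`, the Euclidean distance, the margin value, and the toy vectors are all assumptions chosen to show why a satisfied triplet contributes no gradient.

```python
import math


def normalize(v):
    """Project a vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def residual_adapter(x, weight):
    """Residual adapter x + W x followed by projection.
    With W = 0 (zero initialization) the frozen embedding's direction is unchanged."""
    delta = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in weight]
    return normalize([x_i + d_i for x_i, d_i in zip(x, delta)])


def triplet_loss(anchor, positive, negative, margin=0.05):
    """Hinge triplet loss. It is exactly zero once the margin holds,
    so such a triplet produces no gradient for the adapter."""
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


zero_w = [[0.0] * 3 for _ in range(3)]
a = residual_adapter([1.0, 0.0, 0.0], zero_w)
p = residual_adapter([0.9, 0.1, 0.0], zero_w)
n = residual_adapter([0.0, 1.0, 0.0], zero_w)

print(triplet_loss(a, p, n))  # 0.0: the margin is already satisfied
print(triplet_loss(a, n, p))  # positive: a violated triplet still trains
```

The hinge is what makes the update self-limiting: regions whose local geometry already satisfies the margin sit in the flat part of the loss and are left alone.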

2605.05615 2026-05-11 cs.LG cs.CY

LLMSpace: Carbon Footprint Modeling for Large Language Model Inference on LEO Satellites

LLMSpace:大型语言模型在低地球轨道卫星上的推理碳足迹建模

Lei Jiang, Adrian Ildefonso, Daniel Loveless, Fan Chen

Affiliations: Indiana University

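AI summary: This paper proposes LLMSpace, a framework for modeling the carbon footprint of LLM inference on AI-enabled low Earth orbit satellites, and analyzes the key trade-offs among carbon footprint, latency, hardware design, and operational lifetime.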

Comments 12 pages, 4 figures, 6 tables

详情
AI中文摘要

大型语言模型(LLMs)对能源需求迅速增长,导致由大规模推理驱动的新兴能源和碳危机。太阳能供电、AI赋能的低地球轨道(LEO)卫星已被提出以缓解地面电力消耗,但其生命周期碳足迹仍因发射排放、卫星制造和辐射硬化硬件需求而难以理解。本文提出LLMSpace,首个用于AI赋能LEO卫星上LLM推理的碳建模框架。LLMSpace联合建模运营碳和嵌入式碳、外围子系统、辐射硬化加速器和内存,以及LLM特定的工作负载特征,如prefill-decode行为和token生成。使用现实的卫星和GPU配置,LLMSpace揭示了碳足迹、推理延迟、硬件设计和使用寿命之间的关键权衡,以实现可持续的空间LLM推理。源代码:https://github.com/UnchartedRLab/LLMSpace。

英文摘要

Large language models (LLMs) impose rapidly growing energy demands, creating an emerging energy and carbon crisis driven by large-scale inference. Solar-powered, AI-enabled low Earth orbit (LEO) satellites have been proposed to mitigate terrestrial electricity consumption, but their lifecycle carbon footprint remains poorly understood due to launch emissions, satellite manufacturing, and radiation-hardened hardware requirements. This paper presents LLMSpace, the first carbon modeling framework for LLM inference on AI-enabled LEO satellites. LLMSpace jointly models operational and embodied carbon, peripheral subsystems, radiation-hardened accelerators and memories, and LLM-specific workload characteristics such as prefill-decode behavior and token generation. Using realistic satellite and GPU configurations, LLMSpace reveals key trade-offs among carbon footprint, inference latency, hardware design, and operational lifetime for sustainable space-based LLM inference. Source code: https://github.com/UnchartedRLab/LLMSpace.

2605.05583 2026-05-11 cs.AI cs.CL

Belief Memory: Agent Memory Under Partial Observability

信念记忆:在部分可观测性下的智能体记忆

Junfeng Liao, Qizhou Wang, Jianing Zhu, Bo Du, Rui Yan, Xiuying Chen

Affiliations: MBZUAI; RIKEN AIP; UT Austin; Wuhan University

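AI summary: This paper proposes BeliefMem, which retains multiple candidate conclusions with their probabilities to handle the uncertainty of agent memory in partially observable environments. Experiments report the best average performance on the LoCoMo and ALFWorld benchmarks.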

详情
英文摘要

LLM agents that operate over long context depend on external memory to accumulate knowledge over time. However, existing methods typically store each observation as a single deterministic conclusion (e.g., inferring "API X failed" from temporary errors), even though such observations are inherently partial and potentially ambiguous. By committing to one conclusion and discarding uncertainty, these methods introduce self-reinforcing error: the agent acts on the stored conclusion, never revisits alternatives, and reinforces the conclusion over time. To address this issue, we propose BeliefMem, which shifts the memory paradigm from committing to a single conclusion per observation to retaining multiple candidate conclusions with their probabilities. Concretely, BeliefMem stores the candidate conclusions as separate memory entries, each carrying a probability that is updated via Noisy-OR rules as new observations arrive. At retrieval, all candidates surface together with their probabilities, keeping alternatives visible to the agent. Since each conclusion in memory retains its probability, BeliefMem preserves the uncertainty that the deterministic paradigm discards, enabling the agent to act with high confidence on well-evidenced knowledge while retaining the capacity to update its confidence when new evidence arrives. Empirical evaluations on LoCoMo and ALFWorld benchmarks show that, even with limited data, BeliefMem achieves the best average performance, remarkably outperforming well-known baselines. More broadly, such probabilistic memory produces substantial gains and explores a new direction for agent memory in partially observable environments.
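
The Noisy-OR update mentioned in this abstract can be sketched with the textbook combination rule. This is an illustration under assumptions, not BeliefMem's code: the memory layout, the candidate strings, and the `evidence_strength` values are hypothetical, and the abstract does not say how disconfirming evidence is handled.

```python
def noisy_or_update(prior, evidence_strength):
    """Noisy-OR combination: the conclusion stays unsupported only if
    both the prior belief and the new evidence fail to support it."""
    return 1.0 - (1.0 - prior) * (1.0 - evidence_strength)


# Hypothetical memory: each candidate conclusion is its own entry with a probability.
memory = {
    "API X is permanently broken": 0.2,
    "API X had a transient outage": 0.5,
}

# A new observation (say, a successful retry) supports the transient-outage candidate.
memory["API X had a transient outage"] = noisy_or_update(
    memory["API X had a transient outage"], evidence_strength=0.5
)

# Retrieval surfaces every candidate with its probability, most likely first.
for conclusion, prob in sorted(memory.items(), key=lambda kv: -kv[1]):
    print(f"{prob:.2f}  {conclusion}")
```

Keeping the low-probability entry in memory, instead of discarding it, is what lets the agent revisit an alternative later if the evidence shifts.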

2605.05558 2026-05-11 cs.AI cs.CY

Who Prices Cognitive Labor in the Age of Agents? Compute-Anchored Wages

在代理时代谁定价认知劳动?基于计算的工资

Siqi Zhu

Affiliations: University of Illinois Urbana-Champaign

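AI summary: This paper proposes a compute-anchored wage theory. It argues that agents are not labor but a production technology that converts compute capital into effective units of cognitive labor, which moves the wage-setting margin from the labor market to the compute capital market.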

详情
AI中文摘要

关于人工智能代理经济学的一个自然直觉是,由于代理可以以极低的边际成本复制,代理劳动可能具有高度弹性供应,当其接近替代人类劳动时会向下压力认知劳动工资。我们论证这种框架在机制上是错误的,但在结论上部分正确,这种修正对理论和政策都至关重要。代理并非劳动力;它们是一种生产技术,将计算资本K_c转化为有效认知劳动单位L_A。一旦认识到这一点,弹性供应边界的均衡工资锚定从劳动力市场迁移到计算资本市场。基于经典要素定价框架,我们推导出一个计算锚定工资(CAW)界限,指出在人类和代理产生的认知劳动是替代品的任务上,竞争性的人类工资上限为λ·k·r_c,其中r_c是计算资本的租金率,k是每个有效代理产生的认知劳动单位的计算强度,λ是人类与代理生产力的相对比例。我们通过常弹性替代(CES)聚合泛化这一结果,分别讨论可替代与互补任务,并讨论要素份额后果。结论简洁:认知劳动的价格制定者不再是由劳动力市场决定。

英文摘要

A natural intuition about the economics of AI agents is that, because agents can be replicated at very low marginal cost, agent labor may be supplied highly elastically, placing downward pressure on cognitive-labor wages when it closely substitutes for human labor. We argue this framing is wrong in mechanism but partially correct in conclusion, and that the correction matters for both theory and policy. Agents are not labor; they are a production technology that converts compute capital $K_c$ into effective units of cognitive labor $L_A$. Once this is recognized, the elastic-supply margin that anchors the equilibrium wage migrates from the labor market to the compute capital market. Building on the classic factor-pricing framework (Mankiw, 2020), we derive a Compute-Anchored Wage (CAW) bound stating that, on tasks where human and agent-produced cognitive labor are substitutes, the competitive human wage is bounded above by $\lambda \cdot k \cdot r_c$, where $r_c$ is the rental rate of compute capital, $k$ is the compute intensity of one effective agent-produced cognitive labor unit, and $\lambda$ is the relative human-to-agent productivity. We generalize the result through constant elasticity of substitution (CES) aggregation, separate substitutable from complementary tasks, and discuss factor-share consequences. The conclusion is concise: the price-setter for cognitive labor is no longer the labor market.
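
A worked instance of the bound stated in this abstract may help make it concrete. The formula (the product of lambda, k, and r_c) comes from the abstract; the numbers, the units, and the reading of lambda as "agent-equivalent labor units a human delivers per hour" are assumptions made only for illustration.

```python
def caw_bound(human_to_agent_productivity, compute_per_unit, compute_rental_rate):
    """Upper bound on the competitive human wage: lambda * k * r_c."""
    return human_to_agent_productivity * compute_per_unit * compute_rental_rate


# Hypothetical numbers: one agent-produced labor unit needs k = 2 GPU-hours,
# compute rents at r_c = 1.5 dollars per GPU-hour, and a human delivers
# lambda = 4 such units per hour of work.
cap = caw_bound(
    human_to_agent_productivity=4,
    compute_per_unit=2,
    compute_rental_rate=1.5,
)
print(cap)  # 12.0

# If compute gets cheaper, the cap falls in proportion.
print(caw_bound(4, 2, 0.75))  # 6.0
```

Under these assumed numbers, producing the same output with agents costs 12 dollars per human-hour equivalent, so on substitutable tasks a firm would not pay a human more than that. The labor market never enters the calculation, which is the point of the abstract's closing sentence.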

2605.04279 2026-05-11 cs.LG

Gradient Flow Structure and Quantitative Dynamics of Multi-Head Self-Attention

多头自注意力的梯度流结构与定量动力学

Ayan Pendharkar

Affiliations: Ayan Pendharkar

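AI summary: This paper studies the dynamics of multi-head self-attention and characterizes its gradient-flow structure and clustering behavior. It identifies the obstruction to per-head monotonicity, gives a sufficient condition that restores monotonicity, and derives the critical inverse temperature and clustering rates in a simplified model.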

Comments 20 pages, 5 figures

详情
AI中文摘要

Transformer自注意力可解释为单位球面上的梯度流,在softmax相互作用势下,token演化并趋向聚类。尽管先前研究已确立单头注意力的聚类行为,多头设置因几何干扰导致标准单调性论证失效。本文发展了多头自注意力动力学的理论框架,解决若干开放问题。我们证明,在适当条件下,自然多头能量函数在平坦和球面动态下非递减。识别了单头单调性障碍为径向阴影项,即各头输出对token方向的投影,即使在正交假设下仍存在。我们引入确保单调性的充分条件,并建立对近似正交性的鲁棒性。在简化标量头设置中,推导了关键逆温度的闭式表达式,证明异质头具有超加性聚类速率。在此设置中,还证明了ReLU与softmax注意力在线性动态中的聚类时间分离。最后,建立熵产恒等式,证明注意力熵随聚类进程单调增加。本文结果为多头注意力动力学提供了统一视角,阐明了transformer模型中聚类和稳定性机制。

英文摘要

Transformer self-attention can be interpreted as a gradient flow on the unit sphere, in which tokens evolve under softmax interaction potentials and tend to form clusters. While prior work has established clustering behavior for single-head attention, the multi-head setting remains less understood due to geometric interference between heads, which invalidates standard monotonicity arguments. In this work, we develop a theoretical framework for multi-head self-attention dynamics and resolve several open questions. We show that, under suitable conditions on the score matrices, a natural multi-head energy functional is non-decreasing along both flat and spherical dynamics. We identify the key obstruction to per-head monotonicity as radial shadow terms, which are projections of each head's output onto token directions, persisting even under orthogonality assumptions. We introduce a sufficient condition ensuring monotonicity and establish robustness to approximate orthogonality. In a simplified scalar-head regime with equiangular token configurations, we derive a closed-form expression for the critical inverse temperature governing clustering behavior, and show that heterogeneous heads exhibit super-additive clustering rates. In this regime, we also prove a separation in clustering time between ReLU and softmax attention in the linearized dynamics. Finally, we establish an entropy production identity and show that attention entropy increases monotonically toward equilibrium as clustering progresses. Our results provide a unified perspective on the dynamics of multi-head attention and clarify the mechanisms underlying clustering and stability in transformer models.

2605.03067 2026-05-11 cs.AI cs.GT

Computing Thiele Rules on Interval Elections and their Generalizations

区间选举中Thiele规则及其推广的计算

Dimitris Avramidis, Alexandra Lassota, Ulrike Schmidt-Kraepelin, Adrian Vetta

Affiliations: Department of Mathematics and Computer Science, Eindhoven University of Technology; School of Computer Science, and Department of Mathematics and Statistics, McGill University

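AI summary: This paper studies the computation of Thiele rules on interval elections. On the candidate interval domain these rules are polynomial-time computable via a linear program; the paper shows that on the voter interval domain the standard LP still admits an optimal integral solution and gives a fast algorithm to find it. The technique extends to the VCI and LC domains, and the paper proves NP-hardness on a tree-based generalization of VCI.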

Comments 19 pages

详情
AI中文摘要

在区间选举中,Thiele规则及其推广的计算问题受到社会选择领域的广泛关注。其中,Thiele规则,尤其是比例同意投票(PAV),因其具有比例代表、帕累托最优和支持单调性等 desirable 属性而脱颖而出。其主要缺点是计算Thiele结果在一般情况下是NP难的。然而,由于在结构化偏好下Thiele规则表现更佳,这为问题提供了一丝希望。在候选区间(CI)域上,通过一个具有完全单模矩阵约束的线性规划(LP)可在多项式时间内计算。令人惊讶的是,这种方法在相关的选民区间(VI)域上失效,且该问题的复杂性曾被反复提出作为开放问题。我们的主要结果解决了这一问题:虽然相关矩阵不完全单模,但“标准”LP仍然至少有一个最优整数解,我们提供了一种快速算法来找到它。我们的技术自然扩展到选民-候选人区间(VCI)域,也称为一维选民-候选人范围(1D-VCR)域,以及线性一致(LC)域,两者均扩展了候选和选民区间域。尽管VCI和LC域已在社会选择领域研究,但它们之间的关系尚不清楚。我们通过与图论的联系,证明LC严格包含VCI。我们还提供了一种更接近VCI精神的LC替代定义,并具有自然的批准选举解释;这种等价性可能具有独立价值。最后,我们研究了一种替代的树形VCI扩展,并展示在该域上Thiele规则的计算变得NP难。

英文摘要

Approval-based committee voting has received significant attention in the social choice community. Among the studied rules, Thiele rules, and especially Proportional Approval Voting (PAV), stand out for desirable properties such as proportional representation, Pareto optimality, and support monotonicity. Their main drawback is that computing a Thiele outcome is NP-hard in general. A glimpse of hope comes from the fact that Thiele rules are better behaved under structured preferences. On the candidate interval (CI) domain, they are computable in polynomial time via a linear program (LP) that has a totally unimodular constraint matrix. Surprisingly, this approach fails for the related voter interval (VI) domain, and the complexity of the problem has repeatedly been posed as an open question. Our main result resolves this question: although the relevant matrix is not totally unimodular, the "standard" LP still admits at least one optimal integral solution, and we provide a fast algorithm for finding it. Our technique naturally extends to the voter-candidate interval (VCI) domain, also known as the 1-dimensional voter-candidate range (1D-VCR) domain, and to the linearly consistent (LC) domain, both of which generalize the candidate and voter interval domains. Although both the VCI and LC domains have been studied in social choice, their relationship was unknown. We show, through connections to graph theory, that LC strictly contains VCI. We also provide an alternative definition of LC that is closer in spirit to VCI and has a natural interpretation in approval elections; this equivalence may be of independent interest. Finally, we study an alternative tree-based generalization of VCI and show that Thiele rules become NP-hard to compute on this domain.
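
For readers unfamiliar with PAV, the rule this abstract centers on can be stated as a short brute-force sketch. It uses the standard harmonic-weight definition of the PAV score and an exhaustive search over committees. That search is exponential and is not the LP-based algorithm of the paper; the ballots are a made-up toy profile.

```python
from itertools import combinations


def pav_score(committee, approvals):
    """PAV score: a voter with j approved committee members contributes 1 + 1/2 + ... + 1/j."""
    score = 0.0
    for ballot in approvals:
        j = len(ballot & set(committee))
        score += sum(1.0 / i for i in range(1, j + 1))
    return score


def best_committee(candidates, approvals, k):
    """Exhaustive search over all size-k committees (illustration only)."""
    return max(
        combinations(sorted(candidates), k),
        key=lambda c: pav_score(c, approvals),
    )


# Toy profile: three voters approve {a, b}, one voter approves {c}.
approvals = [{"a", "b"}, {"a", "b"}, {"a", "b"}, {"c"}]
winner = best_committee({"a", "b", "c"}, approvals, k=2)
print(winner, pav_score(winner, approvals))  # ('a', 'b') 4.5
```

The number of committees grows combinatorially with the number of candidates, which is why structured domains such as CI and VI, where an LP yields an optimal integral solution, matter in practice.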

2605.02971 2026-05-11 cs.LG cs.AI cs.CL

Multilingual Safety Alignment via Self-Distillation

通过自蒸馏实现多语言安全对齐

Ruiyang Qin, Qingzhuo Wang, Dongrui Liu, Qiang Li, Zhihua Wei, Wen Shen

Affiliations: Tongji University; Shanghai AI Laboratory

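AI summary: This paper proposes a multilingual self-distillation framework that addresses multilingual safety alignment by transferring safety capabilities from high-resource languages to low-resource languages, without requiring language-specific response data. Experiments show that the method performs strongly on multilingual safety.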

Abstract

Large language models (LLMs) exhibit severe multilingual safety misalignment: they possess strong safeguards in high-resource languages but remain highly vulnerable to jailbreak attacks in low-resource languages. Current safety alignment methods generally rely on high-quality response data for each target language, which is expensive and difficult to generate. In this paper, we propose a cross-lingual safeguard transfer framework named Multilingual Self-Distillation (MSD). This framework transfers an LLM's inherent safety capabilities from high-resource (e.g., English) to low-resource (e.g., Javanese) languages, removing the need for response data in any language. Our framework is flexible and can be integrated with different self-distillation strategies. Specifically, we implement two concrete methods, on-policy MSD and off-policy MSD, both of which enable effective cross-lingual safety transfer using only multilingual queries. Furthermore, we propose Dual-Perspective Safety Weighting (DPSW), a divergence measure to optimize the distillation objective. By jointly considering the perspectives of both the teacher and the student, DPSW adaptively increases the penalty weights on safety-critical tokens while reducing the weights on non-critical tokens. Extensive experiments on representative LLMs across diverse multilingual jailbreak and utility benchmarks demonstrate that our method consistently achieves superior multilingual safety performance. Notably, it generalizes effectively to more challenging datasets and unseen languages while preserving the model's general capabilities.
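The abstract does not give the DPSW formula, so the following is only a rough sketch of the general idea of a token-weighted distillation loss. It assumes the teacher distributions come from the model answering a high-resource-language query and the student distributions from the same model answering the low-resource-language version. The weighting rule here (a softmax over the sum of forward and reverse KL per token) is a hypothetical stand-in, not the paper's method, and all names and numbers are illustrative.

```python
import math


def kl(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def illustrative_token_weights(teacher, student):
    """HYPOTHETICAL weighting, not the paper's DPSW: score each token
    position by forward KL (teacher view) plus reverse KL (student view),
    then softmax the scores and rescale so the mean weight is 1."""
    scores = [kl(t, s) + kl(s, t) for t, s in zip(teacher, student)]
    top = max(scores)
    exps = [math.exp(x - top) for x in scores]
    norm = sum(exps)
    n = len(scores)
    return [n * e / norm for e in exps]


def weighted_distill_loss(teacher, student):
    """Mean over positions of weight * KL(teacher || student)."""
    weights = illustrative_token_weights(teacher, student)
    per_token = [kl(t, s) for t, s in zip(teacher, student)]
    return sum(w * d for w, d in zip(weights, per_token)) / len(weights)


# Toy next-token distributions over a 3-word vocabulary at 2 positions.
# Position 0: teacher and student agree. Position 1: they disagree.
teacher = [[0.7, 0.2, 0.1], [0.9, 0.05, 0.05]]
student = [[0.7, 0.2, 0.1], [0.2, 0.4, 0.4]]

print(illustrative_token_weights(teacher, student))
print(weighted_distill_loss(teacher, student))
```

In this sketch the position where the two distributions disagree receives the larger weight, loosely mirroring the abstract's description of raising the penalty on some tokens and lowering it on others. In a real training loop the weights would typically be treated as constants during backpropagation, which is a further assumption.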