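arXivDaily: a daily academic digest synced with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and more.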

2605.06797 2026-05-11 cs.LG

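Revisiting Adam for Streaming Reinforcement Learning

Florin Gogianu, Adrian Catalin Lutu, Razvan Pascanu

Affiliations * GitHub

AI summary: This paper studies how effective traditional update methods are in the online setting, finds that C51 performs well in streaming reinforcement learning, and on that basis proposes the Adaptive Q(λ) algorithm, which outperforms existing methods.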

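A Closed-Form Upper Bound for Admissible Learning-Rate Steps in Belief-Space Dynamics

Zixi Li, Youzhen Li

Affiliations * Datawhale

AI summary: This paper derives a closed-form upper bound on admissible learning-rate steps in belief-space dynamics, modeling each update as a projected forward step on the probability simplex and defining admissibility in the natural KL/Bregman geometry.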

2605.06740 2026-05-11 cs.LG cs.AI

Geometric Kolmogorov--Arnold Network (GeoKAN)

Abhijit Sen, Bikram Keshari Parida, Giridas Maiti, Mahima Arya, Denys I. Bondar

Affiliations * Department of Physics and Engineering Physics, Tulane University; Institute of Applied Geosciences, Karlsruhe Institute of Technology

AI summary: GeoKAN approximates functions by learning geometry-adapted coordinates, improving modeling for scientific computing and differential-equation problems.

Comments 46 pages, 24 figures, 13 tables

2605.06736 2026-05-11 cs.LG cs.AI cs.HC

STDA-Net: Spectrogram-Based Domain Adaptation for cross-dataset Sleep Stage Classification

Unaza Tallal, Shruti Kshirsagar, Ankita Shukla

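Affiliations * School of Computing, Wichita State University; Computer Science and Engineering Department, University of Nevada, Reno

AI summary: This paper proposes STDA-Net, a framework that combines a CNN for spectrogram feature extraction, a BiLSTM for modeling sleep dynamics, and a DANN for unsupervised domain adaptation, improving the accuracy and stability of cross-dataset sleep stage classification.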

Comments submitted to IEEE SMC conference

Abstract

Accurate sleep stage classification across datasets remains challenging due to variability in EEG channel montages, sampling rates, recording environments, and subject populations. Although deep learning has shown considerable promise for automated sleep staging, most existing cross-dataset methods rely on one-dimensional EEG signal representations, whereas the use of two-dimensional spectrogram-based inputs within an unsupervised domain adaptation framework has remained largely unexplored. Here, we propose STDA-Net (Spectrogram-based Temporal Domain Adaptation Network), a framework that combines a convolutional neural network (CNN) for spectrogram-based feature extraction, a bidirectional long short-term memory (BiLSTM) module for temporal modeling of sleep dynamics, and a domain-adversarial neural network (DANN) for source-to-target feature alignment without requiring any labeled target-domain data during training. Experiments are conducted on three publicly available datasets Sleep-EDF, SHHS-1, and SHHS-2 under six cross-dataset transfer settings. Results show that the proposed framework achieves an average accuracy of 89.03% and an average macro F1-score of 87.64%, consistently outperforming existing 1D baseline methods in terms of balanced classification performance, with substantially lower variance across five independent runs, indicating improved stability and reproducibility. Overall, these findings demonstrate that 2D spectrogram-based representations, combined with temporal modeling and adversarial domain adaptation, provide a robust and competitive alternative to conventional 1D EEG inputs for cross-dataset sleep staging.
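
The two-dimensional spectrogram input mentioned in the abstract can be illustrated with a short-time Fourier transform. The sketch below is not the authors' preprocessing: the function name `spectrogram`, the window length, the hop size, and the 100 Hz sampling rate are all assumptions chosen for illustration, and the "epoch" is a synthetic sine wave rather than EEG.

```python
import numpy as np

def spectrogram(signal, fs, win_len=200, hop=100):
    """Hann-windowed STFT power spectrogram of a 1-D signal.

    Returns (freqs, times, power) with power shaped (freq bins, time frames).
    """
    window = np.hanning(win_len)
    n_frames = 1 + (len(signal) - win_len) // hop
    # Slice the signal into overlapping frames and apply the window to each.
    frames = np.stack([signal[i * hop : i * hop + win_len] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)
    times = (np.arange(n_frames) * hop + win_len / 2) / fs
    return freqs, times, power.T

# Synthetic 30-second epoch at an assumed 100 Hz sampling rate,
# containing a single 10 Hz tone.
fs = 100
t = np.arange(30 * fs) / fs
epoch = np.sin(2 * np.pi * 10 * t)

freqs, times, S = spectrogram(epoch, fs)
peak_hz = freqs[S.mean(axis=1).argmax()]  # frequency bin with the most average power
```

The resulting array `S` is the kind of image-like input a CNN could consume; a real pipeline would also need filtering, normalization, and per-dataset resampling, none of which is shown here.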

2605.06733 2026-05-11 cs.LG cs.AI

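Medical Imaging Classification with Cold-Atom Reservoir Computing using Auto-Encoders and Surrogate-Driven Training

Nuno Batista, Ana Morgado, Oscar Ferraz, Sagar Silva Pratapsi, Jorge Lobo, Gabriel Falcao

Affiliations * Instituto de Telecomunicações, Dept. of Electrical and Computer Engineering, University of Coimbra, Portugal; ISR - Institute of Systems and Robotics, Dept. of Electrical and Computer Engineering, University of Coimbra, Portugal; CFisUC, Department of Physics, University of Coimbra, Portugal

AI summary: This paper proposes a hybrid quantum-classical pipeline based on neutral-atom reservoir computing for medical image classification, specifically the binary task of polyp detection. A guided auto-encoder handles the high-dimensional data, and a differentiable surrogate model works around the non-differentiability of quantum measurements, improving classification accuracy and image recovery.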

Comments 8 pages, 6 figures. Accepted to the 2025 IEEE International Conference on Quantum AI (IEEE QAI). Supported by FCT and the Open Quantum Institute (OQI)

Journal ref 2025 IEEE International Conference on Quantum Artificial Intelligence (QAI)

DOI: 10.1109/QAI63978.2025.00064

Abstract

We introduce a hybrid quantum-classical pipeline, based on neutral-atom reservoir computing, for medical image classification, focusing on the binary classification task of polyp detection. To deal effectively with the high dimensionality, we integrate a guided auto-encoder. This pipeline learns compact and discriminative representations of image data that are also well-suited for quantum reservoir computing. A key challenge in such systems is the non-differentiable nature of quantum measurements, which creates a 'gradient barrier' for standard training. We overcome this barrier by incorporating a differentiable surrogate model that emulates the quantum layer, enabling end-to-end backpropagation through the entire system. This guided training process is jointly optimized for classification accuracy and for faithful image recovery from the auto-encoder. The learned latent representations are encoded as pulse detuning parameters within a Rydberg Hamiltonian, and quantum embeddings are subsequently obtained through expectation values. These embeddings are then passed to a linear classifier. Our simulations show that this method outperforms some traditional approaches that use PCA or unguided autoencoders. We also conduct ablation studies to assess the impact of various quantum and training parameters, demonstrating the robustness and flexibility of our proposed pipeline for real-world medical imaging applications, even in the current NISQ era.

2605.06726 2026-05-11 cs.LG

Transformer-Based Wildlife Species Classification from Daily Movement Trajectories

Obed Irakoze, Prasenjit Mitra

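Affiliations * Department of Electrical & Computer Engineering, Carnegie Mellon University Africa, Kigali, Rwanda

AI summary: This paper trains sequence models to classify large-scale GPS trajectories and finds that Transformers outperform LSTM, CNN, and other models at species classification, with especially large gains when data is limited; a unified 1-hour resolution improves overall performance.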

Comments 8 pages

Abstract

Inferring the identity of wildlife species from daily movement data alone is a challenging task. We train sequence models on large-scale, 7-species GPS trajectories from the Movebank platform. Trajectory models are evaluated using a protocol in which entire telemetry studies or regions are held out during testing. We compare Transformer-based sequence models to LSTM, CNN, and Temporal Convolutional Networks, and find that Transformers consistently achieve higher balanced accuracy with gains of approximately 8 to 22 percentage points, depending on the species and experimental setting. In an elephant binary classification task with 1-hour resolution, the Transformer achieves a balanced accuracy of 0.83 and an AUC of 0.92, substantially outperforming all baseline models. We examine, under data-limited conditions, feature representations by analyzing the differences between a basic displacement-based encoding and an expanded range of movement descriptors that include speed, direction, and turning behavior. With feature augmentation, we see clear performance gains, especially for underrepresented and sparsely represented species, such as large carnivores, lions, and zebras. Finally, experiments comparing 1-hour and 30-minute temporal resolutions show that while finer sampling can capture short-term movement patterns for some species, a unified 1-hour resolution yields more promising performance across studies by reducing missing data and ensuring consistent temporal coverage.
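
Balanced accuracy, the headline metric in this abstract, is the mean of per-class recall, which keeps a majority class from dominating the score. The sketch below uses made-up labels, not data from the paper, and the function name `balanced_accuracy` is an illustrative choice.

```python
from collections import defaultdict

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally regardless of size."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)

# Hypothetical imbalanced toy labels: 8 "elephant" days, 2 "other" days.
y_true = ["elephant"] * 8 + ["other"] * 2
y_pred = ["elephant"] * 8 + ["elephant", "other"]

plain_acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
bal_acc = balanced_accuracy(y_true, y_pred)
```

On this toy split, plain accuracy is 0.9 while balanced accuracy is 0.75, because half of the minority class is misclassified.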

2605.06724 2026-05-11 cs.LG cs.AI eess.SP

Enabling Unsupervised Training of Deep EEG Denoisers With Intelligent Partitioning

Qiyu Rao, Haozhe Tian, Homayoun Hamedmoghadam, Danilo Mandic

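Affiliations * Department of Electrical and Electronic Engineering, Imperial College London; Dyson School of Design Engineering, Imperial College London

AI summary: This paper proposes iPSD, which learns to partition an input EEG segment into independent noisy realizations, enabling denoising without clean references; it performs especially well at low signal-to-noise ratios and under complex noise.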

Abstract

Denoising wearable electroencephalogram (EEG) is inherently challenging since neural activity is not only subtle but also inseparable from spectrally overlapping noise artifacts. Classical signal processing methods, relying on fixed or heuristic rules, cannot handle the time-varying pervasive artifacts in wearable EEGs. Deep learning methods, on the other hand, show promise in decomposition-free EEG denoising using highly expressive neural networks, but the training requires artifact-free EEG, which is inherently unobtainable. To address this, we propose Intelligent Partitioning for Self-supervised Denoising (iPSD). Our method eliminates the need for clean references by learning to partition an input EEG segment into independent noisy realizations with the same underlying signal. This enables self-supervision of deep learning denoisers, even in zero-shot settings where only a single EEG segment to be denoised is available. We validate iPSD through extensive experiments, including validations on wearable EEG from in-ear sensors. The results show that iPSD achieves state-of-the-art performance, most notably under extremely low signal-to-noise ratios (down to -10 dB) and challenging artifacts (e.g., EMG), with spectral fidelity orders of magnitude higher than competitive baselines.

2605.06723 2026-05-11 cs.AI cs.CL cs.LG

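Visual Text Compression as Measure Transport

Lv Tang, Tianyi Zheng, Yang Liu, Bo Li, Xingyu Li

Affiliations * University of Alberta; vivo Mobile Communication Co., Ltd; Tsinghua University

AI summary: This paper analyzes visual text compression through measure transport theory and proposes a label-free routing criterion and a transport-informed foveation mechanism, improving compression efficiency and downstream task performance.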

Abstract

Visual text compression (VTC) promises efficient long-context processing by rendering text into an image and re-encoding it with a vision-language model, often producing $3$--$20\times$ fewer decoder tokens than subword tokenization. Yet token savings do not translate predictably into downstream utility: on some tasks the visual path matches or exceeds the text path, on others it collapses, and the compression ratio itself does not predict which regime will occur. The missing quantity is therefore not another summary of efficiency, but a principled measure of task-relevant information loss induced by visual encoding. We address this problem by formulating VTC in the language of measure transport. Treating text and visual tokens as empirical probability measures, we show that the ViT patch encoder induces a push-forward map whose transport cost decomposes into a precision cost from within-patch aggregation and a coverage cost from cross-patch fragmentation. Both terms are estimable from downstream-label-free probes. This formulation yields two operational consequences: a downstream-label-free routing criterion that selects whether to use the visual path for a given input or benchmark instance, and a transport-informed foveation mechanism that re-encodes high-cost regions at higher resolution. Across $24$ NLP datasets at Qwen3-4B, our label-free rule matches the per-dataset oracle on $17/24$ datasets ($70.8\%$), and improves the average task score by $+3.3\%$ with $-10.3\%$ average tokens relative to a pure-LLM.

2605.06702 2026-05-11 cs.AI cs.CL cs.LG

CASCADE: Case-Based Continual Adaptation for Large Language Models During Deployment

Siyuan Guo, Yali Du, Hechang Chen, Yi Chang, Jun Wang

Affiliations * School of Artificial Intelligence, Jilin University; Engineering Research Center of Knowledge-Driven Human-Machine Intelligence, Jilin University; International Center of Future Science, Jilin University; Department of Informatics, King’s College London; The Alan Turing Institute; AI Centre, Department of Computer Science, UCL

AI summary: This paper proposes CASCADE, a framework that lets LLM agents keep learning during deployment, raising macro-averaged success rate by 20.9% over zero-shot prompting and outperforming baseline methods on tasks across multiple domains.

Abstract

Large language models (LLMs) have become a central foundation of modern artificial intelligence, yet their lifecycle remains constrained by a rigid separation between training and deployment, after which learning effectively ceases. This limitation contrasts with natural intelligence, which continually adapts through interaction with its environment. In this paper, we formalise deployment-time learning (DTL) as the third stage in the LLM lifecycle that enables LLM agents to improve from experience during deployment without modifying model parameters. We present CASCADE (CASe-based Continual Adaptation during DEployment), a general and principled framework that equips LLM agents with an explicit, evolving episodic memory. CASCADE formulates experience reuse as a contextual bandit problem, enabling principled exploration-exploitation trade-offs and establishing no-regret guarantees over long-term interactions. This design allows agents to accumulate, select, and refine task-relevant cases, transforming past experience into actionable knowledge. Across 16 diverse tasks spanning medical diagnosis, legal analysis, code generation, web search, tool use, and embodied interaction, CASCADE improves macro-averaged success rate by 20.9% over zero-shot prompting while consistently outperforming gradient-based and memory-based baselines. By reframing deployment as an adaptive learning process, this work establishes a foundation for continually improving AI systems.
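
To make the contextual-bandit framing of case selection concrete, here is a minimal sketch using disjoint LinUCB, a standard contextual bandit algorithm. The abstract does not say which bandit algorithm CASCADE uses, so LinUCB is a stand-in; the class name, the two-dimensional context, the three stored cases, and the reward rule are all hypothetical.

```python
import numpy as np

class LinUCB:
    """Disjoint LinUCB: one ridge-regression reward model per arm (here, per stored case)."""

    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha                                  # exploration strength
        self.A = [np.eye(dim) for _ in range(n_arms)]       # per-arm design matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]     # per-arm reward-weighted contexts

    def select(self, x):
        """Pick the arm with the highest upper confidence bound for context x."""
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                               # estimated reward weights
            bonus = self.alpha * np.sqrt(x @ A_inv @ x)     # uncertainty bonus
            scores.append(theta @ x + bonus)
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        """Fold the observed reward for the chosen arm back into its model."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

# Hypothetical setup: 3 stored cases, a fixed 2-d task embedding,
# and only case 1 actually helps (reward 1), the others do not (reward 0).
bandit = LinUCB(n_arms=3, dim=2, alpha=0.5)
x = np.array([1.0, 0.0])
for _ in range(20):
    arm = bandit.select(x)
    reward = 1.0 if arm == 1 else 0.0
    bandit.update(arm, x, reward)

choice = bandit.select(x)  # arm the bandit now prefers for this context
```

The uncertainty bonus shrinks as an arm is tried more often, which is the exploration-exploitation trade-off the abstract refers to; no model parameters of the underlying LLM are touched in this loop.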

2605.06696 2026-05-11 cs.AI cs.LG cs.MA

Hidden Coalitions in Multi-Agent AI: A Spectral Diagnostic from Internal Representations

Cameron Berg, Susan L. Schneider, Mark M. Bailey

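Affiliations * Reciprocal Research; Center for the Future of AI, Mind, and Society; Florida Atlantic University; Biological and Computational Intelligence Center; National Intelligence University

AI summary: This paper proposes detecting hidden coalition structure by spectrally partitioning a graph built from the internal neural representations of multi-agent systems, validates the method on reinforcement learning agents and large language models, and reveals a representational hierarchy.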

Comments 18 pages

Abstract

Collections of interacting AI agents can form coalitions, creating emergent group-level organization that is critical for AI safety and alignment. However, observing agent behavior alone is often insufficient to distinguish genuine informational coupling from spurious similarity, as consequential coalitions may form at the level of internal representations before any overt behavioral change is apparent. Here, we introduce a practical method for detecting coalition structure from the internal neural representations of multi-agent systems. The approach constructs a pairwise mutual-information graph from the hidden states of agents and applies spectral partitioning to identify the most salient coalition boundary. We validate this method in two domains. First, in multi-agent reinforcement learning environments, the method successfully recovers programmed hierarchical and dynamic coalition structures and correctly rejects false positives arising from behavioral coordination without informational coupling. Second, using a large language model, the method identifies coalition structures implied by descriptive prompts, tracks dynamic team reassignments, and reveals a representational hierarchy where explicit labels dominate over conflicting interaction patterns. Across both settings, the recovered partition reveals subgroup organization that a scalar cross-agent mutual-information measure cannot distinguish. The results demonstrate that analyzing hidden-state mutual information through spectral partitioning provides a scalable diagnostic for identifying representational coalitions, offering a valuable tool for monitoring emergent structure in distributed AI systems.
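
The graph-then-partition step described above can be sketched with a Fiedler-vector bipartition. This is an illustration under assumptions, not the authors' implementation: the abstract does not specify the Laplacian variant, so the unnormalized Laplacian is used here, and the affinity matrix `W` holds invented numbers standing in for pairwise mutual-information estimates between six agents.

```python
import numpy as np

def spectral_bipartition(W):
    """Split nodes in two using the sign of the Fiedler vector of the graph Laplacian."""
    W = np.asarray(W, dtype=float)
    L = np.diag(W.sum(axis=1)) - W           # unnormalized graph Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)     # eigenvalues returned in ascending order
    fiedler = eigvecs[:, 1]                  # eigenvector of the second-smallest eigenvalue
    return fiedler >= 0

# Hypothetical affinities: agents 0-2 and agents 3-5 are strongly coupled
# within their group (0.9) and only weakly coupled across groups (0.05).
W = np.full((6, 6), 0.05)
W[:3, :3] = 0.9
W[3:, 3:] = 0.9
np.fill_diagonal(W, 0.0)

labels = spectral_bipartition(W)  # boolean coalition label per agent
```

The sign of an eigenvector is arbitrary, so which coalition is labeled `True` can differ between runs or platforms; only the grouping itself is meaningful.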

2605.06690 2026-05-11 cs.AI cs.CL cs.LG

State Representation and Termination for Recursive Reasoning Systems

Debashis Guha, Amritendu Mukherjee, Sanjay Kukreja, Tarun Kumar

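Affiliations * S P Jain School of Global Management; Indian Statistical Institute; eClerx Services Ltd.

AI summary: This paper proposes a state representation and a termination condition for recursive reasoning systems: an epistemic state graph encodes extracted claims, evidential relations, open questions, and confidence weights, and an order gap is defined to decide whether further iteration is needed.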

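Toeplitz MLP Mixers are Low Complexity, Information-Rich Sequence Models

Benjamin L. Badger, Ethan Roland

Affiliations * IBM; AE Studio

AI summary: This paper proposes the Toeplitz MLP Mixer, which replaces attention with triangular-masked Toeplitz matrix multiplication to lower computational complexity while retaining more information and copying better.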

Abstract

Transformer-based large language models are in some respects limited by the quadratic time and space computational complexity of attention. We introduce the Toeplitz MLP Mixer (TMM), a transformer-like architecture that swaps attention for triangular-masked Toeplitz matrix multiplication over the sequence dimension resulting in $\mathcal{O} (dn \log n)$ time and $\mathcal O(dn)$ space complexity during training and $\mathcal O(dn)$ time and space at inference prefill. Despite the lack of sophisticated input modulation or state maintenance present in other sub-quadratic architectures, TMMs yield greater training efficiency in terms of loss achieved per compute and device memory. We demonstrate that TMMs are capable of retaining more input information resulting in improved copying ability, which we argue results from a lack of architectural biases. Consistent with higher input information retention, TMMs exhibit superior information retrieval and in-context learning benchmark accuracy compared to comparable architectures. We conclude with an analysis from the perspective of operator index theory and show that, counterintuitively, trained Toeplitz layers of causal non-invertible models are more likely to be invertible or nearly so than models that are actually invertible over their inputs.
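
A lower-triangular Toeplitz matrix applied along the sequence dimension is a causal convolution, which is why an $\mathcal{O}(dn \log n)$ cost is attainable with the FFT. The sketch below shows that general identity and checks it against a dense matrix product; it is not taken from the paper, and the function name, shapes, and random inputs are assumptions.

```python
import numpy as np

def causal_toeplitz_matmul(kernel, x):
    """Apply the lower-triangular Toeplitz matrix T[i, j] = kernel[i - j] (i >= j) to x via FFT.

    kernel: shape (n,), x: shape (n, d). Cost is O(d n log n) rather than O(d n^2).
    """
    n = x.shape[0]
    size = 2 * n  # zero-pad so circular convolution matches linear convolution
    K = np.fft.rfft(kernel, n=size)
    X = np.fft.rfft(x, n=size, axis=0)
    y = np.fft.irfft(K[:, None] * X, n=size, axis=0)
    return y[:n]  # keep only the first n (causal) outputs

rng = np.random.default_rng(0)
n, d = 8, 4
kernel = rng.standard_normal(n)
x = rng.standard_normal((n, d))

# Dense reference: build the triangular-masked Toeplitz matrix explicitly.
T = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1):
        T[i, j] = kernel[i - j]

dense = T @ x
fast = causal_toeplitz_matmul(kernel, x)
```

Row `i` of the output depends only on positions `0..i` of the input, which is the causal masking the abstract describes; the two results agree up to floating-point error.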

MIND: Monge Inception Distance for Generative Models Evaluation

Conformal Agent Error Attribution

When Does Critique Improve AI-Assisted Theoretical Physics? SCALAR: Structured Critic--Actor Loop for Agentic Reasoning

VITA-QinYu: Expressive Spoken Language Model for Role-Playing and Singing

Revisiting Adam for Streaming Reinforcement Learning

Weblica: Scalable and Reproducible Training Environments for Visual Web Agents

An Aerial Manipulator for Perception-Driven Flower Targeting Toward Contactless Pollination in Vertical Farming

Physics-based Digital Twins for Integrated Thermal Energy Systems Using Active Learning

Gradient Extrapolation-Based Policy Optimization

HumanNet: Scaling Human-centric Video Learning to One Million Hours

A Closed-Form Upper Bound for Admissible Learning-Rate Steps in Belief-Space Dynamics

Geometric Kolmogorov--Arnold Network (GeoKAN)

STDA-Net: Spectrogram-Based Domain Adaptation for cross-dataset Sleep Stage Classification

Beyond Factor Aggregation: Gauge-Aware Low-Rank Server Representations for Federated LoRA

Semantic State Abstraction Interfaces for LLM-Augmented Portfolio Decisions: Multi-Axis News Decomposition and RL Diagnostics

The E$Δ$-MHC-Geo Transformer: Adaptive Geodesic Operations with Guaranteed Orthogonality

Medical Imaging Classification with Cold-Atom Reservoir Computing using Auto-Encoders and Surrogate-Driven Training

Transformer-Based Wildlife Species Classification from Daily Movement Trajectories

Enabling Unsupervised Training of Deep EEG Denoisers With Intelligent Partitioning

When Does a Language Model Commit? A Finite-Answer Theory of Pre-Verbalization Commitment

Conditional generation of antibody sequences with classifier-guided germline-absorbing discrete diffusion

From Storage to Experience: A Survey on the Evolution of LLM Agent Memory Mechanisms

Visual Text Compression as Measure Transport

CASCADE: Case-Based Continual Adaptation for Large Language Models During Deployment

Hidden Coalitions in Multi-Agent AI: A Spectral Diagnostic from Internal Representations

State Representation and Termination for Recursive Reasoning Systems

Robustness of Refugee-Matching Gains to Off-Policy Evaluation Choices

An audio-to-analysis pipeline with certified transcription for information-theoretic profiling of the piano repertoire

From Canopy to Collision: A Hybrid Predictive Framework for Identifying Risk Factors in Tree-Involved Traffic Crashes

Toeplitz MLP Mixers are Low Complexity, Information-Rich Sequence Models