Large Language Models / LLM - arXivDaily Topic

2606.19734 2026-06-19 cs.LG New submission 60%

Federated Bilevel Performative Prediction

Liangxin Qian, Chang Liu, Xuanyu Cao, Jun Zhao, Kwok-Yan Lam

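Affiliations: Nanyang Technological University; Zhejiang University; Washington State University

Topic match (Other LLM): studies bilevel optimization in federated learning, involving distribution shift.

AI summary: Studies bilevel optimization in federated learning when client data distributions depend on the deployed decisions. Proposes the federated bilevel performatively stable point and two methods for computing it; experiments validate the stability thresholds and show improved meta-generalization.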

Comments Accepted by ICML 2026

Abstract

Federated bilevel optimization is widely used for nested learning problems across distributed clients, such as federated hyperparameter tuning and meta-learning under privacy and communication constraints. Most existing formulations assume fixed client data distributions, which can be violated by performativity, where deployed decisions reshape client behavior and data collection, inducing client-specific, decision-dependent distribution shift. We study federated bilevel performative prediction, where both upper-level (UL) and lower-level (LL) objectives are evaluated under client-dependent, decision-dependent distributions. We formalize the federated bilevel performatively stable (FBPS) point under a decoupled-risk perspective and provide sufficient conditions for its existence and uniqueness. We then develop two federated methods to compute the FBPS solution: FBi-RRM, which converges linearly under a contraction condition, and FBi-SGD, a communication-efficient stochastic method based on federated hypergradient estimation with convergence guarantees under diminishing step sizes when sensitivities are sufficiently small. Experiments on strategic regression and meta strategic classification validate the predicted stability thresholds and demonstrate improved meta-generalization over non-performative baselines, and CNN-based classification further demonstrates the practical effectiveness of the proposed methods in nonconvex neural network settings.
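
The fixed-point idea behind performative stability can be illustrated with a minimal single-level sketch. This is not FBi-RRM or FBi-SGD from the paper: the location-shift model `mu_i + eps_i * theta`, the squared loss, and every name and number below are assumptions chosen for illustration only.

```python
# Toy repeated risk minimization (RRM) across clients with decision-dependent data.
# Assumption: after theta is deployed, client i draws z ~ N(mu_i + eps_i * theta, 1),
# and the server minimizes the client-averaged squared loss 0.5 * (z - theta)^2.

def rrm_step(theta, mus, epsilons):
    """One RRM round: under the distributions induced by the deployed theta,
    the minimizer of the averaged squared loss is the mean of the shifted means."""
    shifted_means = [mu + eps * theta for mu, eps in zip(mus, epsilons)]
    return sum(shifted_means) / len(shifted_means)

def run_rrm(mus, epsilons, theta0=0.0, rounds=200, tol=1e-12):
    """Iterate rrm_step until successive iterates differ by less than tol."""
    theta = theta0
    for _ in range(rounds):
        new_theta = rrm_step(theta, mus, epsilons)
        if abs(new_theta - theta) < tol:
            return new_theta
        theta = new_theta
    return theta

mus = [1.0, 2.0, 3.0]        # hypothetical client base means
epsilons = [0.1, 0.3, 0.5]   # hypothetical client sensitivities
theta_stable = run_rrm(mus, epsilons)
```

In this toy the update is `theta -> 2 + 0.3 * theta`, a contraction whose fixed point is `2 / (1 - 0.3)`. If the mean sensitivity reached 1, the map would stop contracting, which is the flavor of stability threshold the abstract refers to.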

2606.19603 2026-06-19 cs.LG New submission 60%

Comparing Linear Probes with Mahalanobis Cosine Similarity

Zhuofan Josh Ying, Peter Hase, Nikolaus Kriegeskorte

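Affiliations: Columbia University; Stanford University; Schmidt Sciences

Topic match (Other LLM): studies methods for comparing linear probes, relevant to LLM interpretability.

AI summary: Shows that Mahalanobis cosine similarity is linearly related to OOD AUROC, gives a theoretical explanation, and validates it as a metric for comparing linear probes.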

Comments 16 pages, 10 figures

Abstract

Linear probes are widely used in interpretability research and often compared by cosine similarity. The Mahalanobis cosine similarity (MCS) between two directions, which reweights the inner product by test data covariance, is a natural task-aware refinement. Ying et al. (2026) report that a probe's MCS to a reference probe trained on the out-of-distribution (OOD) data near-perfectly linearly predicts the probe's OOD AUROC (R^2 = 0.98). Here, we extend this empirical finding across models, layers, and concept domains, and prove this general phenomenon in closed form: For balanced classes whose projections are Gaussian, OOD AUROC and MCS to the reference probe are linear because both are sigmoid-shaped functions of the probe's signal-to-noise ratio (SNR) on the test data. The theory also predicts when this linearity fails, which we verify empirically. MCS offers a theoretically grounded and empirically effective alternative to Euclidean cosine similarity for comparing linear probes.
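
A minimal sketch of the quantity being compared, assuming MCS(u, v) = u^T Σ v / sqrt(u^T Σ u · v^T Σ v) with Σ the test-data covariance. The abstract only says the inner product is reweighted by that covariance, so the exact formula, the function names, and the toy covariance below are assumptions.

```python
import math

def quad_form(u, cov, v):
    """Return u^T cov v for list-based vectors and a list-of-lists matrix."""
    return sum(u[i] * cov[i][j] * v[j] for i in range(len(u)) for j in range(len(v)))

def mahalanobis_cosine(u, v, cov):
    """Cosine similarity under the inner product <u, v> = u^T cov v."""
    return quad_form(u, cov, v) / math.sqrt(quad_form(u, cov, u) * quad_form(v, cov, v))

identity = [[1.0, 0.0], [0.0, 1.0]]
# Hypothetical test covariance with most variance along axis 0.
stretched = [[100.0, 0.0], [0.0, 1.0]]
u, v = [1.0, 1.0], [1.0, -1.0]

euclidean = mahalanobis_cosine(u, v, identity)   # reduces to ordinary cosine similarity
weighted = mahalanobis_cosine(u, v, stretched)   # 99 / 101 for this toy covariance
```

The two probe directions are orthogonal in the Euclidean sense, yet under the stretched covariance they score 99/101, because they agree along the axis where the data actually varies. That is the sense in which the reweighted measure is task-aware.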

2606.19411 2026-06-19 cs.LG New submission 60%

Spectral DPPs via NEPv: A Scalable Continuous Relaxation of Determinantal MAP for Diversity-Aware Data Selection

Richard Yi Da Xu

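Affiliations: Hong Kong Baptist University; TadReamk Limited

Topic match (Other LLM): diversity-aware data selection, applicable to LLM data filtering.

AI summary: Recasts the NP-hard DPP-MAP selection problem as continuous optimization over the Stiefel manifold and solves it in near-linear time via self-consistent field iteration on a nonlinear eigenvalue problem (NEPv), targeting large-scale data selection.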

Abstract

Selecting a small, diverse, high-quality subset from a massive pool of candidates is a recurring primitive in modern machine learning -- data curation and coreset selection for training and fine-tuning large models, active-learning batch acquisition, prompt and exemplar selection for in-context learning, retrieval diversification, and experimental design. Determinantal Point Processes (DPPs) give a principled, well-calibrated notion of diversity for this task, but their MAP objective -- pick a size-$k$ subset $S$ maximizing $\log\det(L_S)$ -- is NP-hard, and the standard greedy and sampling algorithms scale superlinearly in the ground-set size $n$. This cost is prohibitive precisely in the data-centric regime where diversity matters most, where $n$ ranges over millions to billions of candidate examples, features, or embeddings. We recast DPP-MAP as a continuous optimization problem over the Stiefel manifold, and show that its first-order optimality conditions form a Nonlinear Eigenvalue Problem with eigenvector dependency (NEPv) of a previously unstudied form. This NEPv admits a self-consistent field (SCF) iteration with a spectral-gap-based local contraction guarantee, giving a principled iterative solver where the diversity objective drives an eigenvector-dependent operator. The resulting algorithm, \OurMethod, requires only matrix-vector products with the kernel and runs in time $O\!\big((ndk+nk^2)\,t\big)$ for a small number of iterations $t$, scaling near-linearly in $n$ and integrating directly with low-rank and feature-map kernels common in ML. This paper focuses on the relaxation, solver, and scaling analysis; full real-data benchmarking is left to a planned empirical study.
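
For orientation, the MAP objective can be made concrete with a naive greedy baseline, the kind of method the abstract describes as scaling superlinearly. This is not the paper's SCF solver; the helper names and the three toy feature vectors are assumptions, and the implementation recomputes every determinant from scratch for clarity rather than speed.

```python
import math

def logdet(matrix):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    n = len(matrix)
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            chol[i][j] = math.sqrt(s) if i == j else s / chol[j][j]
    return 2.0 * sum(math.log(chol[i][i]) for i in range(n))

def submatrix(kernel, subset):
    """Principal submatrix L_S of the kernel for the index list S."""
    return [[kernel[i][j] for j in subset] for i in subset]

def greedy_map(kernel, k):
    """Naive greedy: repeatedly add the item that maximizes log det(L_S)."""
    selected = []
    for _ in range(k):
        candidates = [i for i in range(len(kernel)) if i not in selected]
        best = max(candidates, key=lambda i: logdet(submatrix(kernel, selected + [i])))
        selected.append(best)
    return selected

# Hypothetical features: items 0 and 1 are near-duplicates, item 2 is different.
features = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
kernel = [[sum(a * b for a, b in zip(x, y)) for y in features] for x in features]
chosen = greedy_map(kernel, 2)
```

After picking item 0, adding the near-duplicate item 1 gives a determinant of about 0.01, while adding item 2 gives 1, so the greedy rule prefers the diverse pair.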

2606.19539 2026-06-19 astro-ph.SR cs.AI New submission 60%

Review of Machine Learning Models for Solar Energetic Particle Prediction

Spiridon Kasapis, Pouya Hosseinzadeh, Kathryn Whitman, Ricky Egeland, Manolis Georgoulis, Angelos Vourlidas, Athanasios Papaioannou, Eleni Lavasa, Anastasios Anastasiadis, Giorgos Giannopoulos, Andres Munoz-Jaramillo, Bala Poduval, Irina N. Kitiashvili, Alexander G. Kosovichev, Viacheslav Sadykov, Soukaina Filali Boubrahimi, Tate T. Hutchins, Hameedullah A. Farooki, Manuel E. Cuesta, Leng Y. Khoo, Sungmin Pak, Robert Czarnota, Jamie S. Rankin, Jamey Szalay, Mitchell M. Shen, Georgios Livadiotis, Zigong Xu, David J. McComas, Nikolaos Sarlis, Dionissios Hristopulos, Arik Posner, Alec J. Engell, Mohammed AbuBakr Ali, Ali G. A. Abdelkawy, Abdelrazek M. K. Shaltout, M. M. Beheary, Christina O. Lee, Sigiava Aminalragia-Giamini, Constantinos Papadimitriou, Ingmar Sandberg, Savvas Raptis, Shah Muhammad Hamdi, Monica Laurenza, Mirko Stumpo, Sumanth A. Rotti, India Jackson, Aatiya Ali, Atilim Gunes Baydin, Nathan Schwadron, Subhamoy Chatterjee, Maher A. Dayeh, Gelu M. Nita, Patrick M. O'Keefe, Chun Jie Chong, Paul Kosovich, Russell D. Marroquin, Berkay Aydin, Petrus C. Martens, Lulu Zhao, Yang Chen, Yian Yu, Monica G. Bobra, Ward Manchester, Tamas Gombosi, Ming Zhang, Jesse Torres, Philip K. Chan, Mohamed Nedal, Kamen Kozarev, Peijin Zhang, Kimberly Moreland, Hazel M. Bain, Samuel Hart, Michael J. Starkey, Alan G. Ling, Simone Benella

Affiliations: Department of Astrophysical Sciences, Princeton University, Princeton, NJ, USA; Computational Physics Branch, NASA Ames Research Center, Moffett Field, CA, USA; Department of Computer Science, Utah State University, Logan, UT, USA; Space Radiation Analysis Group, NASA Johnson Space Center, Houston, TX, USA; Johns Hopkins Applied Physics Lab, 11100 Johns Hopkins Rd, Laurel, MD 20723, United States; Research Center for Astronomy and Applied Mathematics of the Academy of Athens, 4 Soranou Efesiou Street, Athens 11527, Greece; Institute for Astronomy, Astrophysics, Space Applications; Southwest Research Institute, Boulder, CO, USA; Space Science Center, University of New Hampshire, Durham, NH, USA; Department of Physics, New Jersey Institute of Technology, Newark, NJ, USA; Astronomy Department, Georgia State University, Atlanta, GA, USA; Department of Computer Science, Princeton University, Princeton, NJ, USA; Department of Mathematics, Rowan University, Glassboro, NJ, USA; Astronomy, California Institute of Technology, Pasadena, CA, USA; Department of Physics, National and Kapodistrian University of Athens, Athens, Greece; School of Electrical and Computer Engineering, Technical University of Crete, Chania, Greece; Department of Astronomy and Meteorology, Faculty of Science, Al-Azhar University, Cairo, Egypt; Space Sciences Lab, University of California, Berkeley, CA, USA; Research Consultancy, Athens, Greece; Institute for Space Astrophysics; Department of Physics and Astronomy, Georgia State University, Atlanta, GA 30303, USA; Aryabhatta Research Institute of Observational Sciences (ARIES), Manora Peak, Nainital-263001, Uttarakhand, India; Department of Computer Science, Oxford University, Oxford, England; Southwest Research Institute, San Antonio, TX, USA; Computer Science Department, New Jersey Institute of Technology, Newark, NJ, USA; Department of Physics, University of California San Diego, La Jolla, CA 92093, USA; Department of Computer Science, Georgia State University, Atlanta, GA 30303, USA; Department of Climate; Engineering, University of Michigan, Ann Arbor, MI, USA; Department of Statistics, University of Michigan, Ann Arbor, MI, USA; Department of Electrical Engineering and Computer Science, Florida Institute of Technology, Melbourne, FL, USA; Astrophysics Section, School of Cosmic Physics, Dublin Institute for Advanced Studies, DIAS Dunsink Observatory, Dublin D15 XR2R, Ireland; Institute of Astronomy of the Bulgarian Academy of Sciences, Sofia, Bulgaria; Center for Solar-Terrestrial Research, New Jersey Institute of Technology, Newark, NJ 07102, USA; Cooperative Programs for the Advancement of Earth System Science, University Corporation for Atmospheric Research, Boulder, CO, USA; CIRES, University of Colorado Boulder, Boulder, CO, USA; Space Weather Prediction Center, NOAA, Boulder, CO, USA; Astronomy, College of Science, The University of Texas at San Antonio, San Antonio, TX, USA; Space Weather Prediction Center, National Oceanic; The University of Texas at San Antonio, San Antonio, TX, USA; Environmental Research, Inc., MA, USA

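Topic match (Other LLM): a review of machine learning models, not core LLM work.

AI summary: Reviews machine learning models for solar energetic particle prediction, comparing their datasets, architectures, inputs, and outputs, and offers recommendations for future research.

Comments Review Paper, Main text: 23 pages, References: 5 pages, Appendix: 42 pages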

Abstract

Solar energetic particle (SEP) events have attracted increasing attention due to their significant radiation hazards for aviation, spacecraft electronics, and human missions beyond Earth's magnetosphere. From a scientific perspective, SEP events are intriguing because they arise from a set of physical processes extending from the solar surface and corona through the heliosphere, offering insight into particle acceleration and transport mechanisms that are widely applicable across astrophysics. Therefore, advancing our ability to understand and predict SEP events is essential both for deepening our knowledge of such mechanisms and for safeguarding space technologies and exploration. Traditionally, researchers have modeled SEPs using physics-based simulations and empirical methods. More recently, machine learning (ML) has emerged as a new tool for understanding and predicting SEP events. The purpose of this manuscript is to review the currently available ML models for SEP prediction, identify the datasets used for training, compare their architectures, inputs, and outputs, and, based on these insights, outline good practices and recommendations for future research.

2606.16106 2026-06-19 cs.PF cs.AR cs.DC New submission 60%

Edge-Inference Governors Need Memory-Clock State

Beyond CPU-GPU Frequency: Effects of the Memory Clock and Tails on Edge-Inference Latency Estimation

Jaehoon Kang

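Topic match (Other LLM): studies LLM latency estimation in edge inference.

AI summary: Measurements on an NVIDIA Jetson Orin Nano show that the memory clock is a missing dimension, that aggregate miss rates hide burstiness, and that frequency switching incurs latency; all three lie outside the scope of conventional frequency-aware latency models.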

Comments 20 pages, 13 figures, 11 tables. Code and data: https://github.com/dankang21/jetson-latency-lab ; traces: https://doi.org/10.5281/zenodo.20745228

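AI-translated abstract

Frequency-aware latency estimators enable deadline-aware DVFS for edge ML inference by modeling latency over CPU and GPU frequencies. We present a measurement study on an NVIDIA Jetson Orin Nano that shows three phenomena outside this modeling scope. (1) The memory clock is a missing dimension: across the realistic upper EMC range (2133->3199 MHz) it shifts median latency by +11% to +48% depending on the workload, and at the highest GPU clock we observe a repeatable non-monotonic case (-9%) for a synthetic L2-resident kernel. A GPU-frequency estimator profiled under one power profile and deployed under another therefore underestimates latency by up to 32%; tabulating the four lockable EMC points fixes most workloads, whereas a parametric 1/f_emc term does not. (2) Aggregate miss rates hide burstiness: at fixed clocks, 100k-cycle runs show a knife-edge distribution whose deadline-miss cliff spans about 1 ms, yet misses cluster far beyond independence; at a 0.1% aggregate miss rate, the probability that the next cycle also misses is as high as 74% (740x the independent baseline). Gaussian mu+3sigma bounds exceed the 0.1% miss target by 13x to 29x, whereas out-of-sample generalized Pareto bounds stay within ~2x in all eight configurations. (3) Frequency switching is not free: per-domain transition stalls are below 100 microseconds, but a new operating point takes 1/5/8 ms (CPU/GPU/EMC) to take effect, a large fraction of a typical inference cycle for a per-inference-cycle governor. We release the full measurement harness and discuss implications for next-generation frequency-aware estimators and governors.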
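
The burstiness statistic can be reproduced on any per-cycle miss trace. The sketch below is an illustration under assumptions: the function name and the synthetic trace are hypothetical, not part of the released harness. If misses were independent, the conditional probability would equal the aggregate rate, so their ratio is the "times the independent baseline" figure; the reported 74% at a 0.1% aggregate rate corresponds to 0.74 / 0.001 = 740.

```python
def miss_statistics(missed):
    """Return (aggregate miss rate, P(next cycle misses | this cycle missed))."""
    aggregate = sum(missed) / len(missed)
    # Outcomes of the cycles that immediately follow a missed cycle.
    after_miss = [nxt for cur, nxt in zip(missed, missed[1:]) if cur]
    conditional = sum(after_miss) / len(after_miss) if after_miss else 0.0
    return aggregate, conditional

# Hypothetical trace: 10,000 cycles containing one burst of 10 consecutive misses.
trace = [False] * 10_000
for i in range(5_000, 5_010):
    trace[i] = True

aggregate, conditional = miss_statistics(trace)
ratio = conditional / aggregate  # multiple of the independent baseline
```

In this synthetic trace the aggregate rate is 0.1% while nine of the ten misses are followed by another miss, so a QoS budget stated only as an aggregate rate says little about how long an outage lasts.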

Abstract

Frequency-aware latency estimators let deadline-aware DVFS governors schedule edge ML inference by modeling latency over CPU and GPU clocks, but they cannot observe the memory clock (EMC) -- a missing deployment state that decides whether a governor meets its deadlines and at what energy. We show this with a deployed, measured governor on a Jetson Orin NX: an EMC-blind GPU-only fit misses 25-28% of cycles at tight deadlines, whereas an EMC-aware refit holds misses to at most 1.3% under a 2% QoS miss budget by selecting a budget-feasible clock -- the energy-minimal one for periodic vision (calibrated module-rail power). The failure generalizes across three workload classes -- MobileNetV2, a ViT transformer, and Qwen2.5 LLM token decode (where saturated decode makes the aware policy lower-energy than the infeasible blind choice): a CPUxGPU estimator sends the deployed governor to an infeasible operating point, and only an EMC-aware model identifies the feasible side of the energy frontier. The effect is real and outside the CPUxGPU state abstraction: across two Orin SKUs sharing the same lockable EMC points it shifts median latency by up to ~45%, replicates on both, and survives a fused TensorRT fp16 engine. CPUxGPU models do not absorb it: per-lockable-point EMC tables are needed, a scoped inversion shows monotone assumptions can pick the wrong direction, and clustered misses make aggregate QoS rates understate deployment risk. We release the harness; this complements, not rebuts, the state of the art within its CPUxGPU scope.
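
The difference between an EMC-blind and an EMC-aware governor can be sketched as a table lookup. Everything below is an assumption made for illustration: the selection rule, the dictionary layout, and all latency and energy numbers are hypothetical, and only the 2133 and 3199 MHz EMC points echo values quoted for the earlier measurement study.

```python
def pick_operating_point(points, deadline_ms):
    """Return the lowest-energy point whose predicted latency meets the deadline, or None."""
    feasible = [p for p in points if p["latency_ms"] <= deadline_ms]
    return min(feasible, key=lambda p: p["energy_mj"]) if feasible else None

# Hypothetical numbers, not measurements from the paper.
# The EMC-blind model keeps one latency per GPU clock (profiled at the high EMC point).
blind_model = [
    {"gpu_mhz": 600, "latency_ms": 9.0, "energy_mj": 40.0},
    {"gpu_mhz": 900, "latency_ms": 7.0, "energy_mj": 55.0},
]
# The EMC-aware table keeps one entry per (GPU clock, EMC clock) pair.
aware_table = [
    {"gpu_mhz": 600, "emc_mhz": 2133, "latency_ms": 12.5, "energy_mj": 38.0},
    {"gpu_mhz": 600, "emc_mhz": 3199, "latency_ms": 9.0, "energy_mj": 42.0},
    {"gpu_mhz": 900, "emc_mhz": 2133, "latency_ms": 9.8, "energy_mj": 52.0},
    {"gpu_mhz": 900, "emc_mhz": 3199, "latency_ms": 7.0, "energy_mj": 57.0},
]

deadline_ms = 10.0
blind_choice = pick_operating_point(blind_model, deadline_ms)
aware_choice = pick_operating_point(aware_table, deadline_ms)

# Latency the blind choice would actually see if the device runs at the low EMC point.
deployed_emc_mhz = 2133
actual_latency_ms = next(
    p["latency_ms"] for p in aware_table
    if p["gpu_mhz"] == blind_choice["gpu_mhz"] and p["emc_mhz"] == deployed_emc_mhz
)
```

With these made-up numbers the blind model selects the 600 MHz GPU clock believing it meets the 10 ms deadline, but at the low EMC point that clock takes 12.5 ms. The aware table instead selects the cheapest pair that is feasible at a known EMC state.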

2306.12679 2026-06-19 cs.CL 60%

Constructing Colloquial Dataset for Persian Sentiment Analysis of Social Microblogs

Mojtaba Mazoochi, Leila Rabiei, Farzaneh Rahmani, Zeinab Rajabi

Affiliations: Faculty member in ICT Research Institute; Iran Telecommunication Research Center (ITRC); Faculty member in Computer Department; Mehralborz University; Hazrat-e Masoumeh University

Topic match (Other LLM): builds a Persian sentiment-analysis dataset and uses a CNN model.

AI summary: Builds a colloquial Persian dataset and proposes a CNN-based model to improve sentiment analysis of colloquial social microblog text; experiments report 72% accuracy.

Journal ref Multimedia Tools and Applications, 2025

DOI: 10.1007/s11042-025-20777-3

Abstract

Introduction: Microblogging websites have amassed rich data sources for sentiment analysis and opinion mining. Sentiment classification on such data has often proven inefficient, however, because microblog posts typically lack syntactically consistent terms and representative vocabulary, as users on these social networks tend not to write lengthy statements. Low-resource languages pose additional limitations. The Persian language has distinctive characteristics and demands its own annotated data and models for sentiment analysis, which differ from those suited to English text. Method: This paper first constructs a user opinion dataset called ITRC-Opinion in a collaborative environment and in an insourced way. The dataset contains 60,000 informal and colloquial Persian texts from social microblogs such as Twitter and Instagram. Second, this study proposes a new architecture based on the convolutional neural network (CNN) model for more effective sentiment analysis of colloquial text in social microblog posts. The constructed dataset is used to evaluate the presented architecture. Furthermore, several models, such as LSTM, CNN-RNN, BiLSTM, and BiGRU, with different word embeddings, including fastText, GloVe, and word2vec, are also evaluated on our dataset. Results: The results demonstrate the benefit of our dataset and the proposed model (72% accuracy), showing a meaningful improvement in sentiment classification performance.

2606.19366 2026-06-19 cs.LG cs.AI eess.SP New submission 55%

Information Lattice Learning as Probabilistic Graphical Model Structure Learning

Haizi Yu, Lav R. Varshney

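Affiliations: Kocree, Inc.; AI Innovation Institute, Stony Brook University

Topic match (Other LLM): information lattice learning relates to probabilistic graphical models; not LLM work.

AI summary: Interprets information lattice learning (ILL) as structure learning for probabilistic graphical models: interpretable rules are learned by projecting onto a partition lattice, and connections to maximum entropy and factor graphs are established.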

Abstract

Information lattice learning (ILL) learns interpretable rules of a signal by alternately projecting the signal onto a partition lattice that encodes a hierarchy of abstractions and lifting selected rules back to the signal domain. When the signal is a probability mass function, we show the probabilistic rules learned by ILL admit a natural probabilistic graphical model (PGM) interpretation and develop this interpretation in detail. A partition in ILL induces a deterministic quotient variable, and a rule is the marginal law of that quotient variable. A rule set is therefore a collection of marginal constraints over interpretable abstractions. General lifting is the feasible family of all joint distributions satisfying those constraints, while special lifting chooses a maximum-ignorance reconstruction, implemented in ILL by an L2 uniformity principle closely related to maximum entropy. Under a Shannon-entropy lifting, the same constraints yield a log-linear factor graph whose factors are indexed by learned abstractions. The information lattice itself, however, is not a Bayesian network: its edges encode refinement and coarsening of abstractions, not conditional dependence. Thus ILL is best viewed as structure learning for interpretable constraint-based factor graphs over quotient variables. This view clarifies how ILL relates to graphical models and maximum entropy models, while suggesting new directions for inference, identifiability, and hybrid symbolic-probabilistic learning.
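
The projection and lifting steps can be sketched for a single partition. This is a simplified illustration, not the ILL algorithm: ILL works over a lattice of partitions and selects rules, whereas the sketch fixes one partition, and the function names and the parity example are assumptions.

```python
def rule_from_partition(pmf, partition):
    """Marginal law of the quotient variable: total probability mass of each cell."""
    return [sum(pmf[x] for x in cell) for cell in partition]

def uniform_lifting(rule, partition, size):
    """Spread each cell's mass evenly over its elements. For a single partition this
    is the maximum-entropy (and minimum-L2-norm) pmf consistent with the rule."""
    lifted = [0.0] * size
    for mass, cell in zip(rule, partition):
        for x in cell:
            lifted[x] = mass / len(cell)
    return lifted

# Hypothetical signal: a pmf over six outcomes, abstracted by parity of the index.
pmf = [0.30, 0.05, 0.20, 0.05, 0.30, 0.10]
parity = [[0, 2, 4], [1, 3, 5]]

rule = rule_from_partition(pmf, parity)            # cell masses, about [0.8, 0.2]
lifted = uniform_lifting(rule, parity, len(pmf))   # reconstruction from the rule alone
```

Projecting `lifted` back through the same partition recovers the same rule, which is the marginal-constraint view described in the abstract: the rule pins down cell masses, and the lifting chooses the least committal pmf that satisfies them.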
