Ergodic Deviation-Robust Equilibrium under Mirror Descent Learning in Finite Games
Joshua Steier
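AI summary: Proposes Ergodic Deviation-Robust Equilibrium (EDRE), a dynamics-relative equilibrium concept for entropic mirror descent (EMD) learning. It requires that the limit distribution be an ε-Nash equilibrium, that deviation gains along the whole trajectory be of order √T, and that the limit be an EMD fixed point. The paper proves existence in potential games as well as PPAD-hardness.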
Comments: Under Review
Details
We introduce Ergodic Deviation-Robust Equilibrium (EDRE), a dynamics-relative equilibrium concept for repeated finite games in which agents learn via entropic mirror descent (EMD). EDRE requires three properties to hold simultaneously for the same profile and learning run: (E1) the limit profile is an $\varepsilon$-Nash equilibrium at a product distribution; (E2) along the entire learning trajectory, every fixed coalition's cumulative aggregate (summed-unilateral) deviation gain is $\tilde{\mathcal{O}}(\sqrt{T})$ with high probability; and (E3) the limit profile is a fixed point of the EMD map, so that it is selected by the dynamics rather than merely certified as an equilibrium. We prove that the $\sqrt{T}$ deviation-regret rate is order-tight, establish existence in exact-potential games (via Nash's theorem, with a constructive proximal route under concavity) together with Lyapunov monotonicity of EMD (and pointwise convergence when the fixed-point set is a singleton), and extend the selection property to monotone polymatrix games through variational inequalities. Although a static EDRE coincides with an $\varepsilon$-Nash equilibrium, its content is dynamic: robust (positive-measure) selection under EMD excludes linearly unstable equilibria, so EDRE acts as a Nash equilibrium equipped with a dynamic certificate rather than a static refinement. On the complexity side, we show that computing EDRE is PPAD-hard in general polymatrix games and belongs to promise-PPAD for potential games. A worked $2\times 2$ coordination-game example illustrates all components of the framework. Additional results, including a bandit-feedback extension, a period-doubling route to Li-Yorke chaos for the two-strategy EMD map at large step size, a linear-program formulation for minimum-cost steering, and supporting simulations, appear in the appendices.
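To make the fixed-point and selection language concrete, the following is a minimal sketch of entropic mirror descent in a 2×2 coordination game. It is not the paper's worked example. The payoff matrix `PAYOFF`, the step size `eta`, the full-information expected-payoff feedback, and all function names are assumptions chosen for illustration. The update used is the standard multiplicative-weights form of mirror descent with an entropy regularizer.

```python
import math

# Hypothetical common-payoff coordination game (an assumption, not taken from the paper):
# both players receive PAYOFF[a][b] when the row player picks a and the column player picks b.
# The matrix is symmetric, so the same helper serves both players.
PAYOFF = [[2.0, 0.0],
          [0.0, 1.0]]


def expected_payoffs(opponent_mix):
    """Expected payoff of each pure action against the opponent's mixed strategy."""
    return [sum(PAYOFF[a][b] * opponent_mix[b] for b in range(2)) for a in range(2)]


def emd_step(mix, payoffs, eta):
    """One entropic mirror descent (multiplicative-weights) update on the simplex."""
    weights = [mix[a] * math.exp(eta * payoffs[a]) for a in range(2)]
    total = sum(weights)
    return [w / total for w in weights]


def run_emd(x0, y0, eta=0.1, steps=2000):
    """Simultaneous EMD updates for both players, starting from mixed strategies x0 and y0."""
    x, y = list(x0), list(y0)
    for _ in range(steps):
        ux, uy = expected_payoffs(y), expected_payoffs(x)
        x, y = emd_step(x, ux, eta), emd_step(y, uy, eta)
    return x, y


def nash_gap(x, y):
    """Largest unilateral gain from a pure deviation, i.e. the epsilon in epsilon-Nash."""
    ux, uy = expected_payoffs(y), expected_payoffs(x)
    gain_x = max(ux) - sum(x[a] * ux[a] for a in range(2))
    gain_y = max(uy) - sum(y[a] * uy[a] for a in range(2))
    return max(gain_x, gain_y)


if __name__ == "__main__":
    for start in ([0.5, 0.5], [0.2, 0.8], [1.0 / 3.0, 2.0 / 3.0]):
        x, y = run_emd(start, start)
        print(start, "->", [round(p, 4) for p in x], "gap:", nash_gap(x, y))
```

In this assumed game the mixed profile (1/3, 2/3) is a Nash equilibrium and a fixed point of the update, yet runs started on either side of it drift away toward one of the two pure equilibria. This is the kind of behavior the abstract refers to when it says that robust selection under EMD excludes linearly unstable equilibria. The sketch does not track the cumulative coalition deviation gains of (E2).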