StakeBench: Evaluating Language Understanding Grounded in Market Commitment
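AI summary: Proposes the StakeBench framework, which links market commentary to verifiable trading records, automatically derives supervision signals from market behavior, and evaluates how well language models understand market commitment.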
Comments: 21 pages, 2 figures, 20 tables. Preprint. Dataset and evaluation code included.