arXivDaily: arXiv daily academic digest, updated Monday through Friday
2606.20435 2026-06-19 econ.EM New submission

Choosing A Headline Estimand from Matching, DID, and Hybrid Designs: A Minimax-Regret Approach

Yechan Park, Yuya Sasaki

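AI summary: This paper shows that, for estimating causal effects with panel data, the hybrid design (DIDM) estimand lies between the matching (M) and difference-in-differences (DID) estimands and is the minimax-regret choice under a broad class of loss functions. It recommends reporting DIDM as the headline estimate, with matching and DID as bounds.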

Abstract

Researchers using panel data to estimate causal effects routinely choose among three approaches to using past outcomes: difference-in-differences (DID), conditioning on lagged outcomes (matching, M), and a hybrid that does both (DIDM). The corresponding identifying assumptions are non-nested, leaving little guidance on which to report. We give conditions under which the corresponding estimands are ordered, with DIDM bracketed between matching and DID. This makes DIDM the minimax-regret choice among the three under a broad class of loss functions. We recommend reporting DIDM as the headline estimate, with matching and DID as bounds. We illustrate in applications.
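
The bracketing argument can be illustrated with a minimal sketch. It assumes, purely for illustration, that the truth coincides with one of the three estimands (whichever identifying assumption holds) and that loss is an increasing function of absolute error. The function names and numbers are hypothetical and are not taken from the paper.

```python
from typing import Callable, Sequence


def max_regret(report: float, candidates: Sequence[float],
               loss: Callable[[float], float]) -> float:
    """Worst-case loss of reporting `report` when the truth may be any candidate."""
    return max(loss(abs(report - truth)) for truth in candidates)


def minimax_regret_choice(candidates: Sequence[float],
                          loss: Callable[[float], float] = lambda e: e) -> float:
    """Return the candidate whose worst-case loss is smallest."""
    return min(candidates, key=lambda r: max_regret(r, candidates, loss))


# Hypothetical point estimates: matching, DIDM, DID (ordered, DIDM in the middle).
estimates = {"M": 0.8, "DIDM": 1.5, "DID": 2.0}
values = list(estimates.values())

for name, value in estimates.items():
    # Reporting an endpoint risks the full width of the bracket;
    # reporting the middle value risks only the larger of the two gaps.
    print(name, max_regret(value, values, loss=lambda e: e))

print(minimax_regret_choice(values))
```

Because the middle value is never farther from either endpoint than the endpoints are from each other, the same choice results for any increasing loss in this toy setup.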

2606.20286 2026-06-19 econ.EM New submission

Institutions, Inputs, and Agricultural Growth in China: Revisiting Several Controversies, 1949-1986

Jiyuan Lyu

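AI summary: Using a unified dataset and econometric methods, this paper revisits four controversies about China's agricultural growth: the price scissors, heavy industrial investment, the 1978 reforms, and the effect of decollectivization on irrigation.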

Abstract

Scholarly debates on China's agricultural growth between 1949 and 1986 continue to differ over the extent of the price scissors, the effect of heavy industrial investment, the role of the 1978 reforms, and the impact of decollectivization on irrigation. Using a single dataset and complementary econometric methods, this paper addresses each of these controversies. The results show that 1952-1957 was the only net extraction period across all three channels, after which the state channelled a net inflow of about 168.6 billion yuan into agriculture via fiscal and credit instruments. Heavy industrial investment exerted a significant positive lagged effect on agriculture, while the contemporaneous negative correlation stemmed from the zero-sum nature of the investment share indicator. The input-output elasticity shifted abruptly in 1970, and collective agricultural loans broke in 1971, both pointing to the rectification effects of the North China Agricultural Conference. Disaster prevention capacity fell from 0.70 in the collective era to 0.53 after household contracting, mainly because the collective maintenance system collapsed rather than because state investment declined. After 1979 the price elasticity of agricultural supply approached zero, suggesting that the 1979 procurement price increase acted more like a one-off recalibration than a sustained marginal incentive.

2606.19972 2026-06-19 econ.EM New submission

Biodiversity Media Narratives and Stock Market Performance: Evidence from Europe

Andres Azqueta-Gavaldon, Ben Jabeur Sami, Leila Hedhili

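AI summary: Using the GDELT Global Knowledge Graph, the study builds biodiversity media risk indicators for France, Germany, Italy, and Spain over 2015-2025. Panel Granger causality tests and an augmented inverse probability weighting event study show that biodiversity risk significantly lowers stock prices, and that the positive effect of low-risk episodes exceeds the negative effect of high-risk episodes.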

Abstract

This study constructs novel biodiversity-related media risk indicators for France, Germany, Italy, and Spain over 2015-2025, capturing media attention to biodiversity threats using the GDELT Global Knowledge Graph. Using panel Granger causality tests and an augmented inverse probability weighting (AIPW) event-study design, we find highly significant evidence that biodiversity risk reduces stock prices, with effects peaking between 3 and 10 months after a shock. Moreover, we uncover a marked asymmetry whereby the positive effects of low biodiversity risk episodes outweigh the negative effects of high-risk episodes. Results are robust across quantiles of the return distribution and hold when controlling for European equity market volatility and economic policy uncertainty. Our findings provide the first evidence that biodiversity media narratives drive stock market valuations in Europe.
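
For readers unfamiliar with AIPW, the following is a minimal sketch of the generic cross-sectional estimator of an average effect. It is not the authors' event-study implementation: the nuisance estimates (propensity scores and outcome-model predictions) are taken as given, and all names and numbers are hypothetical.

```python
from statistics import fmean
from typing import Sequence


def aipw_ate(y: Sequence[float], d: Sequence[int],
             propensity: Sequence[float],
             mu1: Sequence[float], mu0: Sequence[float]) -> float:
    """Augmented inverse probability weighting estimate of the average effect.

    y: outcomes; d: 1 if treated (e.g. a high-risk episode), else 0;
    propensity: estimated P(d = 1 | x); mu1, mu0: predicted outcomes
    under treatment and control from an outcome model.
    """
    scores = []
    for yi, di, ei, m1, m0 in zip(y, d, propensity, mu1, mu0):
        # Outcome-model contrast plus inverse-probability-weighted residuals.
        score = (m1 - m0
                 + di * (yi - m1) / ei
                 - (1 - di) * (yi - m0) / (1 - ei))
        scores.append(score)
    return fmean(scores)


# Hypothetical toy inputs; nuisance estimates would come from fitted models.
y = [1.0, 3.0, 2.0, 4.0]
d = [0, 1, 0, 1]
propensity = [0.5, 0.5, 0.5, 0.5]
mu1 = [3.0, 3.0, 4.0, 4.0]
mu0 = [1.0, 1.0, 2.0, 2.0]

print(aipw_ate(y, d, propensity, mu1, mu0))
```

When the outcome model fits perfectly, the residual terms vanish and the estimate reduces to the average model contrast; otherwise the weighted residuals correct it.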

2606.20240 2026-06-19 econ.EM stat.AP New submission

Two-Sample IV: Efficient Two-Step Estimation and Tests for Overidentification and Weak-Instruments

Fatima Kasenally, Ruoxi Guan, Frank Windmeijer

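AI summary: For two-sample IV estimation, the paper proposes an efficient two-step estimator and an overidentification test that are robust to heteroskedasticity and sample heterogeneity and require only summary statistics from linear regressions. It also extends weak-instrument tests to this setting.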

Abstract

Two-sample IV is a popular estimation method when the outcome and treatment variables are available in different samples, whereas instruments are available in both samples. The standard estimator is the two-sample two-stage least squares estimator, which is efficient under homoskedasticity and homogeneity of the samples. We develop a robust two-step procedure for efficient estimation under general heteroskedasticity and heterogeneity of the samples, and propose a related two-sample Hansen overidentification test. A key feature of our approach is that only summary statistics from the linear regressions of the reduced form and first stage in the two samples are needed. These are six objects: the estimated coefficient vectors and the homoskedastic and heteroskedasticity-robust estimated variance matrices. We further show that the first-stage F-statistic in the treatment sample can be used as a test for weak instruments in the standard way under homoskedasticity and homogeneity, where the relative bias is a proportional bias. We propose an extension of the effective F-statistic of Montiel-Olea and Pflueger (2013) for the heteroskedastic case, following the generalization in Windmeijer (2025). We illustrate the estimators and tests in an application studying the effect of education on voting behavior from Marshall (2019), with cluster-robust inference.
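
To see why summary statistics can suffice, consider the simplest just-identified case with one instrument. The sketch below is not the authors' efficient two-step estimator. It is the textbook ratio of the reduced-form coefficient (outcome sample) to the first-stage coefficient (treatment sample), with a delta-method standard error that assumes the two samples are independent. All names and numbers are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class RegressionSummary:
    coef: float      # estimated slope on the instrument
    variance: float  # estimated variance of that slope (robust or not)


def two_sample_iv(reduced_form: RegressionSummary,
                  first_stage: RegressionSummary) -> tuple[float, float]:
    """Ratio estimator and delta-method standard error from summary statistics.

    Assumes a single instrument and independent samples (no covariance term).
    """
    beta = reduced_form.coef / first_stage.coef
    var_beta = (reduced_form.variance
                + beta ** 2 * first_stage.variance) / first_stage.coef ** 2
    return beta, math.sqrt(var_beta)


# Hypothetical inputs: reduced form from the outcome sample,
# first stage from the treatment sample.
rf = RegressionSummary(coef=0.30, variance=0.0025)
fs = RegressionSummary(coef=0.60, variance=0.0016)

beta_hat, se_hat = two_sample_iv(rf, fs)
print(f"beta = {beta_hat:.3f}, se = {se_hat:.3f}")
```

Passing heteroskedasticity-robust variances instead of homoskedastic ones changes only the `variance` inputs, which is the sense in which no unit-level data are needed in this special case.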

2606.19846 2026-06-19 econ.GN q-fin.EC New submission

What Capital After Labor? Forecasting the Talent ROI Transition in the Human-AI Era

Kwan Soo Shin, In Seok Kang

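AI summary: Because AI augmentation breaks the accounting link between labor time and contribution, the paper builds a forecasting framework for the shift from time-based to output-based talent ROI, with ROI inversion as its core theorem. It uses Korea's 52-hour workweek as a case documenting an early overhead-pressure signal, and forecasts that output-based firms will lead in TFP growth by 1.5-2.0 percentage points by 2032.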

Comments 90 pages, 6 figures

Abstract

AI augmentation breaks the accounting link between labor time and productive contribution, yet firms continue to evaluate talent through time-based overhead bundles. This paper develops a forecasting framework for the transition from time-based talent accounting to output-based talent ROI in the human-AI era. The framework centres on Theorem 3 (ROI Inversion at τ*) as the empirical spine, with four mechanism theorems: overhead non-additivity, augmentation-saved-time pathways, innovation-premium amplification, and human-AI dyad attribution uncertainty. Korea's staged 52-hour workweek mandate provides an empirical early-warning case. In a DART panel of 365 listed firms (2,281 firm-year observations), the SG&A-to-revenue ratio rose from 18.26 percent in 2018 to 20.06 percent in 2020, corrected mildly in 2021-2022, and peaked at 20.10 percent in 2024. Under the revenue-percentile cohort proxy, two-way fixed effects (+1.56 pp, p = 0.049), pooled event-study estimates (+4.21 pp at t = +3, p = 0.001), and Callaway-Sant'Anna doubly-robust staggered DiD estimates (+4.51 pp at t = +4) converge on a positive overhead-pressure signature. A 2015-2017 backward extension (224 firms, 601 observations) supplies pre-treatment data, providing evidence against pre-existing upward-trend confounds. We read the Korean evidence not as a direct τ* estimate or a point causal magnitude, but as, to our knowledge, the first empirically documented signature of the pre-τ* overhead-pressure regime, where time-based accounting still dominates while AI augmentation and labor-time compression jointly raise overhead. Output-based firms are forecast to outperform time-based peers by 1.5-2.0 percentage points in firm-level TFP growth by 2032. The contribution is a forecasting model and managerial planning tool for the shift to AI-augmented talent ROI accounting.

2606.20041 2026-06-19 econ.GN cs.AI cs.LG q-fin.EC q-fin.GN New submission

AI Economist Agent: An Agentic Framework for Model-Grounded Economic Analysis with RAG, Knowledge Graphs, and Large Language Models

Masahiro Kato

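AI summary: Proposes a RAG-based AI economist agent framework that uses knowledge graphs and large language models for economic scenario analysis. Agents plan the analysis, retrieve evidence, select models, and generate reports, improving the coherence and traceability of economic narratives.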

Abstract

We propose a model-grounded RAG-based AI economist with an agentic framework for economic scenario analysis using large language models (LLMs) and knowledge graphs. While LLMs can generate fluent economic narratives, economists are often required to make economic claims grounded in economic theory and real-world data. Based on this motivation, this study proposes a RAG-based AI economist, which utilizes knowledge graphs including economic data and theory and LLM-based agents to plan the analysis, retrieve relevant evidence, select appropriate models, and generate reports. In our framework, we do not produce quantitative claims directly with the language model alone; instead, we generate narratives grounded in explicit model-based computations and linked to the retrieved evidence via AI agents. We refer to our framework as an AI economist agent. We evaluate the AI economist agent in two applications: economist report generation for U.S. inflation persistence and Federal Reserve policy, and bank stress-test narrative generation for U.S. commercial real estate refinancing stress. The results illustrate how grounding the generated reports improves their economic coherence and traceability.

2606.19794 2026-06-19 econ.GN cs.CY q-fin.EC New submission

Forecasting AI-Era Productivity: The Intellectually Converged Human Framework and a Missing Cognitive Mediator in Production Function Theory

Kwan Soo Shin, In Seok Kang

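AI summary: The paper proposes the Intellectually Converged Human (ICH) framework, which introduces a four-dimensional cognitive construct, convergence capacity (C), as the cognitive mediator between AI and productivity. This is offered as an explanation of the paradox that AI investment has not produced commensurate productivity gains, and an analysis of data from 20 OECD countries is used to support the explanatory power of the AI-C interaction for variation in total factor productivity.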

Comments 78 pages, 3 figures

Abstract

Why does massive AI investment fail to generate commensurate productivity gains? We argue the paradox is theoretically generated: prevailing production function frameworks encounter a structural boundary by treating AI as a separable factor of production without modeling the cognitive mediation through which AI generates productive value. This directs investment toward deployment when productivity requires prior development of what we term convergence capacity (C). We propose the Intellectually Converged Human (ICH) framework, a fifth-stage framework for production function theory: H-hat = H[1 + phi(A,C)], where effective productive capacity equals human capital (H) scaled by an augmentation factor [1 + phi], with phi jointly determined by AI utilization intensity (A) and convergence capacity (C), a four-dimensional cognitive construct encompassing embodied understanding, metacognition, temporal integration, and integrative thinking. The production function Y = F(K, H-hat) provides a human-centered mechanism for Solow's TFP residual: A_Solow = [1 + phi(A,C)]^(1-alpha). The framework predicts three augmentation regimes with distinct policy implications. Descriptive cross-national analysis of 20 OECD economies shows the AIxC interaction is associated with 86% of TFP variance versus 31% for AI alone, a pattern-consistent finding in the small-n theoretical tradition. South Korea exemplifies national-scale under-augmentation: high H, substantial A, and low C produce phi = 0. We distinguish convergence capacity from adjacent constructs (absorptive capacity, dynamic capability, and human capital) and demonstrate that C constitutes the specific cognitive mediator that prior frameworks have left implicit. We derive C-first policy prescriptions and offer three empirically testable propositions with a falsifiable 10-year forecast.
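
The stated Solow-residual expression follows if F is Cobb-Douglas with capital share alpha. The abstract does not name the functional form, so that is an assumption of this short derivation:

```latex
\begin{align*}
Y &= K^{\alpha}\,\hat{H}^{\,1-\alpha}, \qquad \hat{H} = H\,[1+\phi(A,C)] \\
  &= [1+\phi(A,C)]^{1-\alpha}\; K^{\alpha} H^{1-\alpha} \\
  &= A_{\mathrm{Solow}}\; K^{\alpha} H^{1-\alpha},
  \qquad A_{\mathrm{Solow}} = [1+\phi(A,C)]^{1-\alpha}.
\end{align*}
```

With phi = 0, as in the South Korea example, A_Solow = 1 and augmentation adds nothing to measured TFP under this functional form.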

2606.19599 2026-06-19 eess.SY cs.SY econ.EM New submission

Ramping Procurement and Bid-Cost Recovery in Real-Time Market

Cong Chen, Valentina Norambuena, Lang Tong

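AI summary: Studies ramping procurement co-optimized with economic dispatch under net-demand uncertainty, analyzes single-interval and multi-interval co-optimization designs, proposes analytical frameworks for evaluating generator profits, consumer payments, bid-cost recovery, and operational efficiency, and compares three pricing mechanisms.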

Comments 4 figures

Abstract

We study ramping procurement co-optimized with economic dispatch under net-demand uncertainty. We examine two flexible ramp product designs implemented by grid operators: single-interval and multi-interval co-optimization. Both rely on rolling-window stochastic optimization with binding and advisory interval decisions. We develop analytical frameworks to evaluate generator profits, consumer payments, bid cost recovery (BCR), and operational efficiency. In particular, net-demand uncertainty may lead to generator under-compensation, requiring discriminatory BCR. While operational efficiency is invariant to energy and ramp prices, producer profits and consumer payments depend critically on pricing. We examine locational marginal pricing (LMP) and two uniform pricing schemes: maximum dispatch cost pricing (MDCP) and maximum temporal locational marginal pricing (MTLMP). With out-of-market BCR, LMP yields discriminatory energy prices, whereas MDCP eliminates BCR and MTLMP does so in most cases. This property enables us to establish truthful bidding incentives for price-taking generators under MDCP. Our analysis highlights trade-offs between single- and multi-interval co-optimization and pricing designs: single-interval energy-ramp co-optimization is advantageous under high forecast uncertainty and moderate ramping requirements, whereas multi-interval co-optimization is superior when net-demand forecasts are relatively accurate and ramp needs are challenging. Empirical results on CAISO and ERCOT data show that MDCP and MTLMP increase producer profits with negligible BCR, albeit at the expense of higher consumer payments relative to LMP.

2606.19777 2026-06-19 physics.soc-ph econ.GN q-fin.EC New submission

Have Data Centers Raised Your Electric Bill? Causal Evidence from the United States

Asa Watten, John Bistline, Geoffrey Blanford

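AI summary: Using an instrumental variables approach, the paper finds that data centers modestly lowered average retail electricity rates in the United States over 2015-2024, which it attributes to economies of scale in the power system.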

Abstract

Using an instrumental variables approach, we estimate that data centers caused average retail electricity rates to fall modestly in the United States from 2015 to 2024. Despite prevailing sentiment, the finding is consistent with economic reasoning: existing large power system fixed costs, economies of scale in transmission and distribution, and declining unit costs for generation imply that durable demand growth lowers average prices. We find patterns of economies of scale for transmission, distribution, and generation costs as well as within and across retail customer classes. We caution that future supply constraints could reverse the effect.

2606.17165 2026-06-19 stat.ME cs.AI econ.EM math.ST stat.TH New submission

Statistical Foundations of LLM-based A/B Testing: A Surrogacy Framework for Human Causal Inference

Joel Persson, Mårten Schultzberg, Sebastian Ankargren

Affiliations: Spotify USA, Inc.

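AI summary: Proposes a theoretical framework based on surrogate endpoints, shows that calibrated LLM outcomes identify the average treatment effect under conditions weaker than distributional equivalence, and analyzes the bias and variance introduced by stochasticity.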

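AI Chinese abstract (translated to English)

Organizations and researchers are increasingly interested in using large language models (LLMs) in place of human participants in A/B tests, hoping to experiment faster and at lower cost. We study when a treatment effect estimated on LLM outcomes can recover the effect measured on the human population of interest. Distributional equivalence between LLM and human outcomes would make any standard estimator valid, but it is unrealistic. We therefore develop a statistical framework that adapts surrogate endpoint theory to LLMs. The framework shows that calibrating LLM outcomes to human outcomes identifies the average treatment effect under surrogacy and comparability conditions that are jointly weaker than distributional equivalence. When these conditions fail, the effect of interest is only partially identified; we provide diagnostics that can falsify surrogacy on historical experiments, and a bound on the worst-case bias under limited overlap. We further show that the stochasticity inherent to LLMs introduces bias and variance, but that using the average over multiple draws as the surrogate mitigates both. We demonstrate the methods and theory in simulations and in an application to A/B tests on Upworthy headlines. A central takeaway is that the validity of LLM outcomes as surrogates can only be falsified for past treatments, not verified for new ones, so human experiments remain indispensable for novel interventions. We discuss the role of LLM choice, prompting, and temperature as design variables, and how to size human experiments for validation.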

Abstract

Organizations and researchers show increasing interest in using large language models (LLMs) in place of human participants in A/B tests, in the hope of experimenting faster and at lower cost. We study when a treatment effect estimated on LLM outcomes can recover the effect that would have been measured on the human population of interest. Distributional equivalence between LLM and human outcomes would make any standard estimator valid but is unrealistic. We therefore develop a statistical framework that adapts surrogate endpoint theory to LLMs, showing that calibrating LLM outcomes to human outcomes identifies the average treatment effect under surrogacy and comparability conditions that are jointly weaker than distributional equivalence. We present a falsification test for surrogacy and a bound on the worst-case bias from limited overlap between the LLM and human samples. We further show that the stochasticity inherent to LLMs can weaken surrogacy for identification while also introducing bias and variance during estimation, but that using an average over multiple LLM draws per unit as the surrogate mitigates these issues. Simulations validate the results, and an empirical application to A/B tests on Upworthy headlines shows that raw LLM predictions recover only 39% of the human treatment effect while nonparametric calibration closes the gap. A central takeaway is that A/B testing on LLMs yields correct results only by assumption, whereas A/B testing on humans is correct by design, and that the required assumptions are hardest to justify precisely where A/B testing on LLMs promises the greatest benefit. We discuss the role of LLM choice, prompting, and temperature as design variables, the compounded challenge posed by long-term outcomes, and how to size human pilot studies for validation.
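
The variance-reduction part of the multiple-draws argument can be sketched with a toy simulation. It assumes each LLM draw equals a fixed unit-level signal plus independent Gaussian noise, which is an illustrative assumption rather than the paper's model, and it says nothing about the bias component. The function names and parameter values are hypothetical.

```python
import random
import statistics


def averaged_surrogate(signal: float, noise_sd: float, draws: int,
                       rng: random.Random) -> float:
    """Mean of `draws` noisy LLM outputs for one unit."""
    return statistics.fmean(signal + rng.gauss(0.0, noise_sd)
                            for _ in range(draws))


def surrogate_variance(draws: int, units: int = 2000, seed: int = 0) -> float:
    """Variance of the averaged surrogate across units sharing one signal."""
    rng = random.Random(seed)
    values = [averaged_surrogate(1.0, 1.0, draws, rng) for _ in range(units)]
    return statistics.pvariance(values)


for m in (1, 5, 25):
    # With independent noise the variance should shrink roughly like 1/m.
    print(m, round(surrogate_variance(m), 4))
```

The number of draws per unit then becomes a design choice that trades inference cost against surrogate noise.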

2412.17470 2026-06-19 math.ST econ.EM stat.ME stat.TH Version update

A Necessary and Sufficient Condition for Size Controllability of Heteroskedasticity Robust Test Statistics

Benedikt M. Pötscher, David Preinerstorfer

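AI summary: For testing a single restriction in regression models, the paper gives a necessary and sufficient condition for size controllability of heteroskedasticity robust test statistics, improving on existing results that give only sufficient conditions.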

Comments Clarification in Footnote 15 added

Abstract

We revisit size controllability results in Pötscher and Preinerstorfer (2025) concerning heteroskedasticity robust test statistics in regression models. For the special, but important, case of testing a single restriction (e.g., a zero restriction on a single coefficient), we provide a necessary and sufficient condition for size controllability, whereas the condition in Pötscher and Preinerstorfer (2025) is, in general, only sufficient (even in the case of testing a single restriction).

2603.06820 2026-06-19 econ.EM stat.OT Version update

Hippocratic Utility and Status Quo Bias

Tomasz Strzalecki

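AI summary: Using a simple example, the paper shows that a utility function that weighs lives lost more heavily than lives saved has a much narrower scope of applicability than it first appears.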

Abstract

A utility function has been proposed that values lives that are lost more than lives that are saved. I do not dispute the ethical motivation behind this kind of asymmetry. However, I show with a simple example that the scope of applicability of such a decision criterion is considerably more limited than it may first appear.

2410.19333 2026-06-19 econ.GN physics.soc-ph q-fin.EC stat.AP Version update

Swiss-system chess tournaments and unfairness

László Csató, Alex Krumer

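AI summary: Studies the unfairness that arises in Swiss-system chess tournaments from the parity of the number of rounds, finds that players with an extra game as White score significantly more points, and recommends an even number of rounds together with a balanced colour-assignment mechanism.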

Comments 13 pages, 4 tables

Abstract

The Swiss system is an increasingly popular competition format as it provides a favourable trade-off between the number of matches and ranking accuracy. However, there is no empirical study on the potential unfairness of Swiss-system chess tournaments if an odd number of rounds is played. To analyse this issue, our paper compares the number of points scored in the tournament between players who played one game more with the white pieces and players who played one game fewer with the white pieces. Using data from 28 highly prestigious competitions, we find that players with an extra white game score significantly more points. In particular, the advantage exceeds the value of a draw in the four Grand Swiss tournaments. A potential solution to this unfairness could be organising Swiss-system chess tournaments with an even number of rounds, and guaranteeing a balanced colour assignment for all players using a recently proposed pairing mechanism.

2508.20053 2026-06-19 econ.TH Version update

Misperception and informativeness in statistical discrimination

Matteo Escudé, Paula Onuchic, Ludvig Sinander, Quitzé Valenzuela-Stookey

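AI summary: Studies the interplay of information and prior misperceptions in a labor-market model of statistical discrimination, decomposes the effect of greater informativeness on average pay into an instrumental component and a perception-correcting component, and analyzes the implications for pay gaps.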

Abstract

We study the interplay of information and prior (mis)perceptions in a Phelps-Aigner-Cain-type model of statistical discrimination in the labor market. We decompose the effect on average pay of an increase in how informative observables are about workers' skills into a non-negative instrumental component, reflecting increased surplus due to better matching of workers with tasks, and a perception-correcting component capturing how extra information diminishes the importance of prior misperceptions about the distribution of skills in the worker population. We sign the perception-correcting term: it is non-negative (non-positive) if the population was ex-ante under-perceived (over-perceived). We then consider the implications for pay gaps between equally-skilled populations that differ in information, perceptions, or both, and identify conditions under which improving information narrows pay gaps.

2512.02203 2026-06-19 econ.EM stat.AP Version update

Statistical Inference in Large Multi-way Networks

Lucas Resende, Guillaume Lecué, Lionel Wilner, Philippe Choné

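AI summary: Proposes a method based on classification tasks for estimating structural parameters in multi-way networks. It requires no assumptions on the number or structure of fixed effects, avoids the incidental parameter problem, is faster than PPML in sparse networks with more reliable confidence intervals, and is applied to the causal effect of a health care policy reform in France.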

Comments Working paper

Abstract

We propose a new method to estimate structural parameters in multi-way networks while controlling for rich structures of fixed effects. The method is based on a series of classification tasks and is agnostic to both the number and structure of fixed effects. In contrast to full maximum likelihood approaches, our estimator does not suffer from the incidental parameter problem. For sparsely connected networks, it is also computationally faster than PPML. We provide empirical evidence that our estimator yields more reliable confidence intervals than PPML and its bias-correction strategies. These improvements hold even under model misspecification and are more pronounced in sparse settings. While PPML remains competitive in dense, low-dimensional data, our approach offers a robust alternative for multi-way models that scales efficiently with sparsity. The method is applied to study the causal effect of a policy reform on spatial accessibility to health care in France.

2512.17422 2026-06-19 econ.GN q-fin.EC Version update

Hired in High Season: Seasonal Labor Demand and Refugee Labor Market Integration

Felix Degenhardt

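AI summary: Exploiting the quasi-exogenous allocation of refugees in Austria and seasonality in hospitality, the paper finds that gaining labor market access during the hospitality high season raises refugees' early employment probability by 3 percentage points and significantly increases three-year earnings, but deepens industry and workplace segregation.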

Abstract

I examine whether early but temporary access to low-barrier hospitality employment affects refugees' labor market integration. I exploit within-region, within-year variation by combining the quasi-exogenous allocation of refugees to Austrian regions with seasonality in hospitality, where 25% of refugees first find work. Labor market access during high seasonal demand raises early employment probability by 3 percentage points (9% of the mean). Employment gains fade after one year, but treated refugees accumulate significantly higher three-year earnings, with no differences in medium-term wages or job quality. However, early hospitality work increases segregation into refugee-typical industries and firms with fewer Austrian coworkers.
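
The two ways of stating the effect size pin down the baseline: if 3 percentage points equals 9% of the mean, the implied mean early employment probability is about one third. This is back-of-the-envelope arithmetic from the rounded figures in the abstract, not a number reported there:

```latex
\[
\bar{p} \approx \frac{0.03}{0.09} \approx 0.33
\]
```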

2502.06866 2026-06-19 cs.LG cs.AI econ.EM stat.AP stat.ML Version update

Global Ease of Living Index: a machine learning framework for longitudinal analysis of major economies

Arun Kumar Selvaraj, Tanay Panat, Rohitash Chandra

Affiliations: Transitional Artificial Intelligence Research Group, School of Mathematics and Statistics; Centre for Artificial Intelligence and Innovation; Pingla Institute

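AI summary: Proposes the Global Ease of Living Index, which combines socio-economic and infrastructural factors, uses machine learning to handle missing data, reduces dimensionality with principal component analysis and factor analysis, and gives policymakers an actionable tool for improving quality of life.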

Abstract

The drastic changes in the global economy, geopolitical conditions, and disruptions such as the COVID-19 pandemic have impacted the cost of living and quality of life. It is essential to comprehend the long-term implications of the cost of living and quality of life in major economies. A transparent and comprehensive living index must include multiple dimensions of living conditions. In this study, we present an approach to quantifying the quality of life through the Global Ease of Living Index that combines various socio-economic and infrastructural factors into a single composite score. Our index utilises economic indicators that define living standards, which could help in targeted interventions to improve specific areas. We present a machine learning framework to address missing data for certain economic indicators in specific countries. We then curate and update the data and use a dimensionality reduction approach (Principal Component Analysis and Factor Analysis) to create the Ease of Living Index for major economies since 1970. Our work significantly adds to the literature by offering a practical tool for policymakers to identify areas needing improvement, such as healthcare systems, employment opportunities, and public safety. Our approach with open data and code can be easily reproduced and applied to various contexts, providing transparency and accessibility for ongoing research and policy development in quality-of-life assessment.

2202.03332 2026-06-19 stat.ME econ.EM stat.AP Version update

Practical Forecasting of Environmental Maps: A Functional Data Approach

Alexander Gleim, Nazarii Salish

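AI summary: Proposes a statistical method based on functional data analysis for forecasting environmental data over a geographic region through time. It combines spatial and temporal dependence to produce forecast surfaces and is illustrated by forecasting ground-level ozone concentration across Germany.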

Abstract

Environmental problems are receiving increasing attention in socio-economic and health studies, fostering advances in recording and data collection of related real-life processes. However, traditional tools for data processing are often found too restrictive as they do not account for the rich nature of such data sets. In this paper, we propose a simple statistical perspective on forecasting environmental data collected sequentially over time across some predefined geographic region. We treat such a data set as a surface (or functional) time series with a possibly complicated geographical domain. Using techniques from functional data analysis, we develop a forecasting methodology that accounts for both geographic and temporal dependencies. This methodology allows integration of traditional multivariate techniques to provide forecast surfaces. We demonstrate the practical value of our approach with a forecasting example of ground-level ozone concentration across Germany, showcasing its effectiveness and potential for broad application.