2605.00798 2026-05-04 cs.LG cs.CL cs.MA

Modeling Subjective Urban Perception with Human Gaze

Lin Che, Xi Wang, Marc Pollefeys, Konrad Schindler, Martin Raubal, Peter Kiefer

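Affiliations: ETH Zurich

AI summary: This paper proposes a gaze-based urban perception framework that combines gaze data with scene representations to improve the prediction of subjective urban perception.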

Abstract

Urban perception describes how people subjectively evaluate urban environments, shaping how cities are experienced and understood. Existing computational approaches primarily model urban perception directly from street view images, but largely ignore the human perceptual process through which such judgments are formed. In this paper, we introduce Place Pulse-Gaze, an urban perception dataset that augments street view images with synchronized eye-tracking recordings and individual perception labels. Based on this dataset, we propose a Gaze-Guided Urban Perception Framework to study how gaze behavior contributes to the modeling of subjective urban perception. The framework systematically investigates three complementary settings: gaze-only modeling, gaze fusion with explicit semantic scene representations, and gaze fusion with implicit richer visual representations. Experiments show that gaze alone already carries useful predictive signals for subjective urban perception, and that integrating gaze with scene representations further improves prediction under both semantic and richer visual representations. Overall, our findings highlight the importance of incorporating human perceptual processes into urban scene understanding and open a direction for gaze-guided multimodal urban computing.

2605.00762 2026-05-04 cs.LG cs.AI cs.MA

Quantum Gradient-Based Approach for Edge and Corner Detection Using Sobel Kernels

Mohammad Aamir Sohail, Gabriela Pinheiro, Yasemin Poyraz Kocak, Batuhan Hangun, Emre Camkerten, Simge Yigit, Hafize Asude Ertan

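Affiliations: Department of Electrical Engineering and Computer Science; Department of Computer Science; Department of Computer Technologies; Department of Computer Engineering

AI summary: This paper presents a quantum implementation of Sobel edge detection and Harris corner detection using two quantum image encodings, FRQI and QPIE, with quantum gradient computation followed by classical post-processing to improve detection quality. The results are consistent with the classical methods, and the QPIE configuration is more stable under a limited number of measurement shots.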

Abstract

Edge detection refers to identifying points in a digital image where intensity changes sharply, indicating object boundaries or structural features. Corners are locations where gray-level intensity changes abruptly in multiple directions and are widely used in feature extraction, object tracking, and 3D modeling. In this study, we present a quantum implementation of Sobel-based edge detection and Harris-style corner detection. Two quantum image encoding methods - Flexible Representation of Quantum Images (FRQI) and Quantum Probability Image Encoding (QPIE) - are used to encode the input data and are comparatively analyzed. The proposed approach introduces a quantum gradient computation scheme based on lag-2 differences, enabling the evaluation of gradient-like features in superposition. To improve detection quality and reduce false positives, a classical post-processing step is applied to candidate corner points identified by the quantum circuit. Results show that the proposed quantum circuits produce outputs consistent with classical Sobel and Harris operators. Furthermore, the QPIE-based configuration yields more stable and coherent results than FRQI, especially under limited measurement shots. While gradient computation can be performed efficiently at the circuit level, the overall cost remains dominated by state preparation, measurement, and classical post-processing. All experiments are conducted under noiseless simulation, and performance on NISQ hardware may be affected by noise and measurement limitations. Therefore, this work demonstrates a functional and scalable quantum realization of classical edge and corner detection methods rather than an end-to-end speedup.
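
To make the role of lag-2 differences concrete, here is a minimal classical sketch in Python. It is not the paper's quantum circuit: it only illustrates the standard fact that the 3x3 Sobel kernel factors into a lag-2 difference (value two pixels ahead minus the current value, centered on the pixel between them) and a [1, 2, 1] smoothing across the neighbouring rows or columns. The function names `lag2_x`, `lag2_y`, and `sobel` are hypothetical, and how the paper maps this onto FRQI or QPIE states is not shown here.

```python
def lag2_x(img, r, c):
    # Lag-2 difference along a row: right neighbour minus left neighbour.
    return img[r][c + 1] - img[r][c - 1]


def lag2_y(img, r, c):
    # Lag-2 difference along a column: lower neighbour minus upper neighbour.
    return img[r + 1][c] - img[r - 1][c]


def sobel(img):
    """Return (gx, gy) Sobel responses for a 2D list of numbers.

    Border pixels are left at 0 (an assumption made to keep the sketch short).
    """
    h, w = len(img), len(img[0])
    gx = [[0] * w for _ in range(h)]
    gy = [[0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            # [1, 2, 1] smoothing over three rows of horizontal lag-2 differences.
            gx[r][c] = lag2_x(img, r - 1, c) + 2 * lag2_x(img, r, c) + lag2_x(img, r + 1, c)
            # [1, 2, 1] smoothing over three columns of vertical lag-2 differences.
            gy[r][c] = lag2_y(img, r, c - 1) + 2 * lag2_y(img, r, c) + lag2_y(img, r, c + 1)
    return gx, gy


# A vertical step edge: strong horizontal gradient, no vertical gradient.
step = [[0, 0, 1, 1] for _ in range(4)]
gx, gy = sobel(step)
```

A classical reference like this is the kind of baseline a quantum output would be compared against when checking consistency with the Sobel operator.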

2605.00738 2026-05-04 cs.LG

Temporal Data Requirement for Predicting Unplanned Hospital Readmissions

Ramin Mohammadi, Vahab Vahdat, Sarthak Jain, Amir T. Namin, Ramya Palacholla, Sagar Kamarthi

Affiliations: Northeastern University; Partners Healthcare Connected Health Innovation; MGH Institute for Technology Assessment; Harvard Medical School; Tufts University School of Medicine; Department of Public Health and Community Medicine

AI summary: This paper studies how different observation windows affect the prediction of 30-day readmission after hip and knee arthroplasty. The optimal time window for unstructured clinical notes is shorter than for structured data, while performance on structured data levels off as the window is extended.

Abstract

With the proliferation of Electronic Health Records (EHRs), a critical challenge in building predictive models is determining the optimal historical data time window to maximize accuracy. This study investigates the impact of various observation windows, ranging from the day of surgery to three years prior, on predicting 30-day readmission following hip and knee arthroplasties. The dataset encompasses both structured encounter records (over 4 million) and unstructured clinical notes (80,000) from 7,174 patients. To extract meaning from the clinical notes, we employed a suite of non-neural (BOW, count BOW, TF-IDF, LDA) and neural encoders (BERT, 1D CNN, BiLSTM, Average). We subsequently evaluated models utilizing clinical notes alone, structured data alone, and a combination of both modalities. Our results demonstrate that the optimal time window for unstructured clinical notes is significantly shorter than for structured data: maximum predictive performance was achieved using notes from just three to six months prior to surgery. In contrast, performance using structured data improved as the time window lengthened but strictly plateaued after twelve months. These modality-specific temporal patterns remained consistent regardless of model complexity or encoder type. Ultimately, these findings challenge the general assumption that more historical data inherently yields better machine learning predictions, establishing targeted time-window guidelines for optimizing readmission prediction models.
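
The notion of an observation window can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the study's pipeline: the record layout (a dict with a `date` key), the function name `records_in_window`, and the choice to count the window in days are all hypothetical.

```python
from datetime import date, timedelta


def records_in_window(records, surgery_date, lookback_days):
    """Keep records dated within `lookback_days` before surgery.

    The window is inclusive on both ends, so lookback_days=0 keeps only
    records from the day of surgery. Records after surgery are dropped.
    """
    start = surgery_date - timedelta(days=lookback_days)
    return [r for r in records if start <= r["date"] <= surgery_date]


# Hypothetical records for one patient.
surgery = date(2020, 1, 1)
history = [
    {"date": date(2020, 1, 1), "note": "pre-op assessment"},
    {"date": date(2019, 11, 1), "note": "orthopedic consult"},
    {"date": date(2018, 1, 1), "note": "annual physical"},
    {"date": date(2020, 2, 1), "note": "post-op follow-up"},
]

# Feature sets for increasingly long windows: day of surgery, ~3 months, ~3 years.
windows = {days: records_in_window(history, surgery, days) for days in (0, 90, 3 * 365)}
```

Training one model per window and comparing validation performance is one simple way to probe where each modality stops benefiting from additional history.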

2604.27977 2026-05-04 cs.AI cs.LG

D3-Gym: Constructing Real-World Verifiable Environments for Data-Driven Discovery

Hanane Nour Moussa, Yifei Li, Zhuoyang Li, Yankai Yang, Cheng Tang, Tianshu Zhang, Nesreen K. Ahmed, Ali Payani, Ziru Chen, Huan Sun

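Affiliations: The Ohio State University; Cisco Research

AI summary: D3-Gym builds the first automatically constructed dataset of verifiable environments to advance model capabilities in scientific data-driven discovery. Its verification signal is of high quality, and training on it yields substantial gains.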

Abstract

Despite recent progress in language models and agents for scientific data-driven discovery, further advancing their capabilities is held back by the absence of verifiable environments representing real-world scientific tasks. To fill this gap, we introduce D3-Gym, the first automatically constructed dataset with verifiable environments for scientific Data-Driven Discovery. D3-Gym comprises (1) 565 tasks sourced from 239 real scientific repositories across four disciplines where (2) each task is equipped with a natural language instruction, an executable environment with pre-installed dependencies, input dataset and artifact previews, a reference code solution, and an automatically synthesized evaluation script. Rigorous evaluation of the quality of the verification signal in D3-Gym confirms that our evaluation scripts achieve 87.5% agreement with human-annotated gold standards and strong alignment in domain-specific evaluation logic, showing their scientific soundness. Further, training on trajectories sampled from D3-Gym yields consistent and substantial gains across Qwen3 models of varying sizes on ScienceAgentBench, boosting Qwen3-32B by 7.8 absolute points and substantially shrinking the gap with strong proprietary models. All D3-Gym artifacts (environments, creation workflow, trajectories, and models) can be found at https://github.com/OSU-NLP-Group/D3-Gym.

2604.10418 2026-05-04 cs.CL

Turing or Cantor: That is the Question

Eugene Eberbach

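Affiliations: Dept. of Eng. and Science, Rensselaer Polytechnic Institute

AI summary: This paper examines the link between Turing's achievements and Cantor's set theory, proposes a measure of undecidability based on the probability distribution of input data, and defines three new complexity classes for TM-undecidable problems.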

Comments arXiv admin note: text overlap with arXiv:2106.15969

Abstract

Alan Turing is considered a founder of modern computer science, together with Kurt Gödel, Alonzo Church, and John von Neumann. In this paper, multiple new research results are presented. It is demonstrated that Alan Turing's achievements would not have been possible without earlier seminal contributions by Georg Cantor in set theory and the foundations of mathematics. It is proposed to introduce a measure of undecidability for problems unsolvable by Turing machines based on the probability distribution of their input data, i.e., to provide a degree of unsolvability based on the number of undecidable instances of input data versus decidable ones. It is also proposed to extend Turing's work on infinite logics and Oracle machines to a whole class of super-Turing models of computation. Next, three new complexity classes for TM-undecidable problems are defined: U-complete (Universal complete), D-complete (Diagonalization complete), and H-complete (Hypercomputation complete). These classes have never been defined explicitly before by other scientists, and are inspired by the Cook/Levin NP-complete class for intractable problems. Finally, an analogue of the famous unanswered question of whether P is not equal to NP for the NP-complete class has been answered negatively for the U-complete class of undecidable problems.

2604.06940 2026-05-04 cs.LG cs.AI

WildfireVLM: AI-powered Analysis for Early Wildfire Detection and Risk Assessment Using Satellite Imagery

Aydin Ayanzadeh, Prakhar Dixit, Sadia Kamal, Milton Halem

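Affiliations: Department of Computer Science and Electrical Engineering; University of Maryland, Baltimore County

AI summary: WildfireVLM combines satellite-imagery detection with language-driven risk assessment. It uses YOLOv12 to detect fire zones and smoke, and multimodal large language models to generate risk assessments and emergency response recommendations. The paper validates the approach and supports real-time processing and long-term tracking.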

Abstract

Wildfires are a growing threat to ecosystems, human lives, and infrastructure, with their frequency and intensity rising due to climate change and human activities. Early detection is critical, yet satellite-based monitoring remains challenging due to faint smoke signals, dynamic weather conditions, and the need for real-time analysis over large areas. We introduce WildfireVLM, an AI framework that combines satellite imagery wildfire detection with language-driven risk assessment. We construct a labeled wildfire and smoke dataset using imagery from Landsat-8/9, GOES-16, and other publicly available Earth observation sources, including harmonized products with aligned spectral bands. WildfireVLM employs YOLOv12 to detect fire zones and smoke plumes, leveraging its ability to detect small, complex patterns in satellite imagery. We integrate Multimodal Large Language Models (MLLMs) that convert detection outputs into contextualized risk assessments and prioritized response recommendations for disaster management. We validate the quality of risk reasoning using an LLM-as-judge evaluation with a shared rubric. The system is deployed using a service-oriented architecture that supports real-time processing, visual risk dashboards, and long-term wildfire tracking, demonstrating the value of combining computer vision with language-based reasoning for scalable wildfire monitoring. The code and dataset are publicly available on GitHub at https://github.com/Ayanzadeh93/_WildfireVLM_.

2601.21214 2026-05-04 cs.CL cs.LG

Scaling Reasoning Hop Exposes Weaknesses: Demystifying and Improving Hop Generalization in Large Language Models

Zhaoyi Li, Jiatong Li, Gangwei Jiang, Linqi Song, Defu Lian, Ying Wei

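Affiliations: University of Science and Technology of China; City University of Hong Kong; Zhejiang University; City University of Hong Kong, Shenzhen Research Institute

AI summary: Through a study of tasks from multiple domains, this paper finds that reasoning errors concentrate at token positions of a few critical error types rather than being uniformly distributed. It proposes dynamically identifying and deactivating erroneous processing heads during inference to improve reasoning hop generalization.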

Comments 52 pages, accepted by ICLR 2026 main conference

Abstract

Chain-of-thought (CoT) reasoning has become the standard paradigm for enabling Large Language Models (LLMs) to solve complex problems. However, recent studies reveal a sharp performance drop in reasoning hop generalization scenarios, where the required number of reasoning steps exceeds training distributions while the underlying algorithm remains unchanged. The internal mechanisms driving this failure remain poorly understood. In this work, we conduct a systematic study on tasks from multiple domains, and find that errors concentrate at token positions of a few critical error types, rather than being uniformly distributed. Closer inspection reveals that these token-level erroneous predictions stem from internal competition mechanisms: certain attention heads, termed erroneous processing heads (ep heads), tip the balance by amplifying incorrect reasoning trajectories while suppressing correct ones. Notably, removing individual ep heads during inference can often restore the correct predictions. Motivated by these insights, we propose test-time correction of reasoning, a lightweight intervention method that dynamically identifies and deactivates ep heads in the reasoning process. Extensive experiments across different tasks and LLMs show that it consistently improves reasoning hop generalization, highlighting both its effectiveness and potential.

2512.16762 2026-05-04 cs.LG

NRGPT: An Energy-based Alternative for GPT

Nima Dehmamy, Benjamin Hoover, Bishwajit Saha, Leo Kozachkov, Jean-Jacques Slotine, Dmitry Krotov

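Affiliations: IBM Research; Georgia Tech; Brown University; MIT

AI summary: NRGPT unifies GPT with the energy-based modeling framework through minimal modifications. Its inference process is viewed as exploration of an energy landscape, which is shown to reduce to gradient descent under certain conditions. The approach is applied to simple language, algebraic tasks, and more complex language modeling.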

Comments Accepted to ICLR 2026 main conference

2512.01116 2026-05-04 cs.CV

Structural Prognostic Event Modeling for Multimodal Cancer Survival Analysis

Yilan Zhang, Li Nanbo, Changchun Yang, Jürgen Schmidhuber, Xin Gao

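Affiliations: King Abdullah University of Science and Technology

AI summary: This paper proposes SlotSPE, a framework that uses slot attention to compress multimodal data into structured slots, modeling the complex interactions in cancer survival analysis and improving prognostic relevance and interpretability.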

Comments 37 pages, 14 Figures

Journal ref The Fourteenth International Conference on Learning Representations (ICLR2026)

Abstract

The integration of histology images and gene profiles has shown great promise for improving survival prediction in cancer. However, current approaches often struggle to model intra- and inter-modal interactions efficiently and effectively due to the high dimensionality and complexity of the inputs. A major challenge is capturing critical prognostic events that, though few, underlie the complexity of the observed inputs and largely determine patient outcomes. These events, manifested as high-level structural signals such as spatial histologic patterns or pathway co-activations, are typically sparse, patient-specific, and unannotated, making them inherently difficult to uncover. To address this, we propose SlotSPE, a slot-based framework for structural prognostic event modeling. Specifically, inspired by the principle of factorial coding, we compress each patient's multimodal inputs into compact, modality-specific sets of mutually distinctive slots using slot attention. By leveraging these slot representations as encodings for prognostic events, our framework enables both efficient and effective modeling of complex intra- and inter-modal interactions, while also facilitating seamless incorporation of biological priors that enhance prognostic relevance. Extensive experiments on ten cancer benchmarks show that SlotSPE outperforms existing methods in 8 out of 10 cohorts, achieving an overall improvement of 2.9%. It remains robust under missing genomic data and delivers markedly improved interpretability through structured event decomposition.

2510.22819 2026-05-04 cs.LG

Last-Iterate Analyses of FTRL with the 1/2-Tsallis Entropy in Stochastic Bandits

Jingxin Zhan, Yuze Han, Zhihua Zhang

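Affiliations: School of Mathematical Sciences, Peking University; Center for Applied Statistics and School of Statistics, Renmin University of China

AI summary: This paper studies the FTRL algorithm with the 1/2-Tsallis entropy regularizer, proves a last-iterate convergence rate of $t^{-1/2}$, and thereby partially confirms the intuition linking logarithmic regret to a fast last-iterate rate.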

Abstract

The convergence analysis of online learning algorithms is central to machine learning theory, where the last-iterate convergence is particularly important, as it captures the learner's actual decisions and describes the evolution of the learning process over time. However, in multi-armed bandits, most existing algorithmic analyses mainly focus on the order of regret, while the last-iterate (simple regret) convergence rate remains less explored -- especially for the widely studied Follow-the-Regularized-Leader (FTRL) algorithms. Recently, FTRL with the $1/2$-Tsallis entropy regularizer $Ψ(p) = -4\sum_{i=1}^d \sqrt{p_i}$ (the $1/2$-Tsallis-INF algorithm, by arXiv:1807.07623) was shown to achieve logarithmic regret in stochastic bandits. Nevertheless, its last-iterate convergence rate has not yet been studied. Intuitively, logarithmic regret should correspond to a $t^{-1}$ last-iterate convergence rate. This paper studies the $1/2$-Tsallis-INF algorithm and partially confirms this intuition through theoretical analysis, showing that the Bregman divergence, defined by $Ψ(p)$, between the point mass on the optimal arm and the probability distribution over the arm set obtained at iteration $t$, decays at a rate of $t^{-1/2}$.
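
The two quantities named in the abstract can be sketched numerically. The Python code below is a minimal illustration under stated assumptions, not the paper's analysis: it assumes the FTRL step takes the form $p = \arg\min_p \langle p, \hat{L} \rangle + \frac{1}{\eta} Ψ(p)$ over the simplex, with cumulative loss estimates $\hat{L}$ and a fixed learning rate $\eta$ (both hypothetical inputs here). Setting the Lagrangian gradient to zero gives $p_i = 4 / (\eta(\hat{L}_i - x))^2$ for a normalizer $x < \min_i \hat{L}_i$, which the sketch finds by bisection. The divergence formula follows from $Ψ(e_{i^*}) = -4$ and $\nabla Ψ(p)_i = -2/\sqrt{p_i}$.

```python
import math


def tsallis_inf_probs(cum_losses, eta, iters=100):
    """FTRL distribution for the regularizer Psi(p) = -4 * sum(sqrt(p_i)).

    Solves sum_i 4 / (eta * (L_i - x))**2 = 1 for x by bisection.
    """
    d = len(cum_losses)
    l_min = min(cum_losses)
    lo = l_min - 2.0 * math.sqrt(d) / eta  # every p_i <= 1/d here, so mass <= 1
    hi = l_min - 2.0 / eta                 # best arm alone has mass 1 here

    def mass(x):
        return sum(4.0 / (eta * (loss - x)) ** 2 for loss in cum_losses)

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mass(mid) > 1.0:
            hi = mid
        else:
            lo = mid
    x = 0.5 * (lo + hi)
    p = [4.0 / (eta * (loss - x)) ** 2 for loss in cum_losses]
    total = sum(p)
    return [pi / total for pi in p]  # remove residual bisection error


def bregman_to_point_mass(p, best):
    """Bregman divergence D_Psi(e_best, p) for Psi(p) = -4 * sum(sqrt(p_i))."""
    return -4.0 + 2.0 * sum(math.sqrt(pi) for pi in p) + 2.0 / math.sqrt(p[best])


# Hypothetical cumulative loss estimates for three arms; arm 0 is best.
probs = tsallis_inf_probs([0.0, 10.0, 10.0], eta=0.5)
gap = bregman_to_point_mass(probs, best=0)
```

As the loss gap between the best arm and the others grows, `probs` concentrates on the best arm and `gap` shrinks toward zero, which is the quantity whose decay rate the abstract describes.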

2507.22699 2026-05-04 cs.CV

Image-Guided Shape-from-Template Using Mesh Inextensibility Constraints

Thuy Tran, Ruochen Chen, Shaifali Parashar

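Affiliations: CNRS; École Centrale de Lyon; INSA Lyon; Université Claude Bernard Lyon 1; LIRIS

AI summary: This paper proposes an unsupervised Shape-from-Template method that uses image observations and a mesh inextensibility constraint, reconstructing 400 times faster than the best-performing existing unsupervised method and performing better on fine details and severe occlusions.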

Comments Accepted to ICCV 2025. Total 13 pages, 9 figures, 9 tables

Abstract

Shape-from-Template (SfT) refers to the class of methods that reconstruct the 3D shape of a deforming object from images/videos using a 3D template. Traditional SfT methods require point correspondences between images and the texture of the 3D template in order to reconstruct 3D shapes from images/videos in real time. Their performance degrades severely under heavy occlusion because correspondences become unavailable. In contrast, modern SfT methods use a correspondence-free approach by incorporating deep neural networks to reconstruct 3D objects, thus requiring huge amounts of data for supervision. Recent advances use a fully unsupervised or self-supervised approach by combining differentiable physics and graphics to deform a 3D template to match input images. In this paper, we propose an unsupervised SfT method that uses only image observations (color features, gradients, and silhouettes) along with a mesh inextensibility constraint to reconstruct at a $400\times$ faster pace than the best-performing unsupervised SfT. Moreover, when it comes to generating finer details and handling severe occlusions, our method outperforms existing methods by a large margin. Code is available at https://github.com/dvttran/nsft.
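
One common way to express mesh inextensibility is to penalize changes in edge length relative to the template. The Python sketch below illustrates that idea only; the abstract does not give the paper's exact formulation, so the quadratic penalty and the names `rest_lengths` and `inextensibility_energy` are assumptions for illustration.

```python
import math


def rest_lengths(vertices, edges):
    """Edge lengths of the undeformed template mesh."""
    return [math.dist(vertices[i], vertices[j]) for i, j in edges]


def inextensibility_energy(vertices, edges, rest):
    """Sum of squared deviations of edge lengths from their rest lengths.

    Zero for any deformation that preserves every edge length (for example,
    rigid motion or pure bending); positive when the mesh stretches or shrinks.
    """
    total = 0.0
    for (i, j), l0 in zip(edges, rest):
        total += (math.dist(vertices[i], vertices[j]) - l0) ** 2
    return total


# A single template triangle and two hypothetical deformations of it.
template = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
edges = [(0, 1), (1, 2), (0, 2)]
rest = rest_lengths(template, edges)

translated = [(x + 5.0, y, z) for x, y, z in template]       # rigid motion
stretched = [(2.0 * x, 2.0 * y, 2.0 * z) for x, y, z in template]  # uniform scaling
```

In an optimization loop, a term like this would be added to the image-based losses so that the deformed template matches the observations without stretching.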

Posterior Augmented Flow Matching

Generating Statistical Charts with Validation-Driven LLM Workflows

GMGaze: MoE-Based Context-Aware Gaze Estimation with CLIP and Multiscale Transformer

RunAgent: Interpreting Natural-Language Plans with Constraint-Guided Execution

Make Your LVLM KV Cache More Lightweight

SAVGO: Learning State-Action Value Geometry with Cosine Similarity for Continuous Control

Map2World: Segment Map Conditioned Text to 3D World Generation

Observable Performance Does Not Fully Reflect System Organization: A Multi-Level Analysis of Gait Dynamics Under Occlusal Constraint

LASE: Language-Adversarial Speaker Encoding for Indic Cross-Script Identity Preservation

Directed Social Regard: Surfacing Targeted Advocacy, Opposition, Aid, Harms, and Victimization in Online Media

Modeling Subjective Urban Perception with Human Gaze

Meritocratic Fairness in Budgeted Combinatorial Multi-armed Bandits via Shapley Values

Learning the Helmholtz equation operator with DeepONet for non-parametric 2D geometries

NonZero: Interaction-Guided Exploration for Multi-Agent Monte Carlo Tree Search

Quantum Gradient-Based Approach for Edge and Corner Detection Using Sobel Kernels

Temporal Data Requirement for Predicting Unplanned Hospital Readmissions

D3-Gym: Constructing Real-World Verifiable Environments for Data-Driven Discovery

Turing or Cantor: That is the Question

A First Guess is Rarely the Final Answer: Learning to Search in the Traveling Salesperson Problem

How Alignment Routes: Localizing, Scaling, and Controlling Policy Circuits in Language Models

Stepper: Stepwise Immersive Scene Generation with Multiview Panoramas

Detection Is Cheap, Routing Is Learned: Why Refusal-Based Alignment Evaluation Fails

ScreenParse: Moving Beyond Sparse Grounding with Complete Screen Parsing Supervision

The Quantization Trap: Breaking Linear Scaling Laws in Multi-Hop Reasoning

WildfireVLM: AI-powered Analysis for Early Wildfire Detection and Risk Assessment Using Satellite Imagery

Scaling Reasoning Hop Exposes Weaknesses: Demystifying and Improving Hop Generalization in Large Language Models

NRGPT: An Energy-based Alternative for GPT

Structural Prognostic Event Modeling for Multimodal Cancer Survival Analysis

Last-Iterate Analyses of FTRL with the 1/2-Tsallis Entropy in Stochastic Bandits

Image-Guided Shape-from-Template Using Mesh Inextensibility Constraints