Image Generation - arXivDaily Topic

2603.07236 2026-06-19 cs.CV Version update 85%

HY-WU (Part I): An Extensible Functional Neural Memory Framework and An Instantiation in Text-Guided Image Editing

Mengxuan Wu, Xuanlei Zhao, Ziqiao Wang, Ruicheng Feng, Zhangyang Wang, Kai Wang

Affiliations: Tencent HY Team

Topic match: image editing. Proposes the HY-WU framework for text-guided image editing.

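AI summary: Proposes the HY-WU framework, in which a functional neural memory module generates instance-specific weight updates on the fly. This avoids the interference caused by overwriting shared weights and addresses catastrophic forgetting in continual learning and personalization.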

Abstract

Foundation models are transitioning from offline predictors to deployed systems expected to operate over long time horizons. In real deployments, objectives are not fixed: domains drift, user preferences evolve, and new tasks appear after the model has shipped. This elevates continual learning and instant personalization from optional features to core architectural requirements. Yet most adaptation pipelines still follow a static weight paradigm: after training (or after any adaptation step), inference executes a single parameter vector regardless of user intent, domain, or instance-specific constraints. This treats the trained or adapted model as a single point in parameter space. In heterogeneous and continually evolving regimes, distinct objectives can induce separated feasible regions over parameters, forcing any single shared update into compromise, interference, or overspecialization. As a result, continual learning and personalization are often implemented as repeated overwriting of shared weights, risking degradation of previously learned behaviors. We propose HY-WU (Weight Unleashing), a memory-first adaptation framework that shifts adaptation pressure away from overwriting a single shared parameter point. HY-WU implements functional (operator-level) memory as a neural module: a generator that synthesizes weight updates on-the-fly from the instance condition, yielding instance-specific operators without test-time optimization.
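
The abstract describes a generator that synthesizes weight updates from the instance condition while the shared weights stay untouched. A minimal sketch of that idea follows, using only the Python standard library. Everything specific here is an assumption made for illustration and is not taken from the paper: the low-rank form of the update, the linear heads, and the names `UpdateGenerator` and `adapted_forward`.

```python
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix standing in for trained generator weights."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class UpdateGenerator:
    """Hypothetical generator: maps a condition vector to a low-rank update A @ B."""

    def __init__(self, cond_dim, d_out, d_in, rank, seed=0):
        rng = random.Random(seed)
        self.d_out, self.d_in, self.rank = d_out, d_in, rank
        # Linear heads emitting the flattened factors A (d_out x rank) and B (rank x d_in).
        self.head_a = rand_matrix(d_out * rank, cond_dim, rng)
        self.head_b = rand_matrix(rank * d_in, cond_dim, rng)

    def __call__(self, cond):
        a = matvec(self.head_a, cond)
        b = matvec(self.head_b, cond)
        A = [a[i * self.rank:(i + 1) * self.rank] for i in range(self.d_out)]
        B = [b[k * self.d_in:(k + 1) * self.d_in] for k in range(self.rank)]
        # delta[i][j] = sum_k A[i][k] * B[k][j]
        return [
            [sum(A[i][k] * B[k][j] for k in range(self.rank)) for j in range(self.d_in)]
            for i in range(self.d_out)
        ]


def adapted_forward(W_base, generator, cond, x):
    """Apply the per-instance operator (W_base + delta(cond)) to x.

    W_base is read but never modified, so nothing shared is overwritten.
    """
    delta = generator(cond)
    W = [[w + d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W_base, delta)]
    return matvec(W, x)
```

In this sketch each call builds a fresh operator from the condition in a single forward pass, so two different conditions yield two different operators over the same base weights, with no gradient step at inference time.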

2602.01391 2026-06-19 cs.CV Version update 70%

Relighting as a Probe of Visual Priors via Augmented Latent Intrinsics

Xiaoyan Xing, Xiao Zhang, Sezer Karaoglu, Theo Gevers, Anand Bhattad

Affiliations: UvA-Bosch Delta Lab, University of Amsterdam, Amsterdam, Netherlands; The University of Chicago, Chicago, USA; Johns Hopkins University, Baltimore, USA

Topic match: image editing. Relighting falls within the scope of image editing.

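AI summary: Proposes Augmented Latent Intrinsics (ALI), which fuses dense, pixel-aligned visual features into a latent-intrinsic relighting model to balance semantic and photometric fidelity, improving relighting quality on complex materials.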

Comments: Camera-ready version for ICML 2026. Project page: https://augmented-latent-intrinsics.github.io

Abstract

Image-to-image relighting requires representations that separate illumination from scene properties while preserving dense geometry, material, and photometric cues. We use this task as a probe of visual priors: unlike recognition tasks that reward invariance, relighting tests whether visual features retain the information needed for light transfer. Through a controlled generative relighting framework, we find that strong semantic encoders can degrade relighting quality, exposing a semantic-photometric trade-off between abstraction and physical fidelity. We introduce Augmented Latent Intrinsics (ALI), which balances this trade-off by fusing dense, pixel-aligned visual features into a latent-intrinsic relighting model and refining it with self-supervision on unlabeled real image pairs. ALI improves relighting quality, especially on glossy, metallic, and transparent materials, and demonstrates that generative relighting is an effective tool for quantifying what visual encoders encode about the physical world.
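
The abstract does not say how the dense, pixel-aligned features are fused into the latent-intrinsic model. One simple possibility, sketched below purely as an assumption, is to resize the encoder features to the intrinsic grid, concatenate the two feature vectors at each pixel, and apply a linear projection. The helper names `upsample_nearest` and `fuse`, the nearest-neighbour resize, and the projection matrix `proj` are all hypothetical.

```python
def upsample_nearest(feat, out_h, out_w):
    """Nearest-neighbour resize of an H x W grid of feature vectors."""
    in_h, in_w = len(feat), len(feat[0])
    return [
        [feat[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def fuse(intrinsic, dense, proj):
    """Fuse two per-pixel feature grids (assumed scheme, not the paper's).

    intrinsic: H x W grid of latent-intrinsic vectors (lists of floats)
    dense:     h x w grid of encoder vectors, resized here to H x W
    proj:      matrix with one row per output channel, applied to the
               concatenated [intrinsic, dense] vector at each pixel
    """
    h, w = len(intrinsic), len(intrinsic[0])
    dense = upsample_nearest(dense, h, w)
    fused = []
    for r in range(h):
        row = []
        for c in range(w):
            z = intrinsic[r][c] + dense[r][c]  # list concatenation along channels
            row.append([sum(p * v for p, v in zip(p_row, z)) for p_row in proj])
        fused.append(row)
    return fused
```

Because the fusion is per pixel, spatial detail in the dense features is carried into the fused grid rather than pooled away, which is the property the abstract ties to photometric fidelity.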
