arXivDaily arXiv每日学术速递 周一至周五更新

AI 大模型

视频大模型

视频理解、视频生成、视频语言模型和时序视觉推理。

今日/当前日期收录 14 信号源:cs.CV, eess.IV, cs.MM
2606.18702 2026-06-18 cs.CV 新提交 专题 95

UniTemp: Unlocking Video Generation in Any Temporal Order via Bidirectional Distillation

UniTemp: 通过双向蒸馏实现任意时间顺序的视频生成

Lin Zhang, Sicheng Mo, Zefan Cai, Jinhong Lin, Zihao Lin, Jiuxiang Gu, Krishna Kumar Singh, Yuheng Li, Yin Li

专题命中 视频生成 :任意时间顺序的视频生成方法

AI总结 提出UniTemp框架,通过双向蒸馏训练单个自回归模型,支持任意时间方向(前向、后向、中间插值)的视频生成,解决因果3D VAE在后向生成中的不连续性,提升可控性。

2606.18478 2026-06-18 cs.CV 新提交 专题 95

Data-Forcing Distillation: Restoring Diversity and Fidelity in Few-Step Video Generation

数据强制蒸馏:恢复少步视频生成中的多样性和保真度

Siyi Chen, Shaowei Liu, Yixuan Jia, Zian Wang, Huan Ling, Qing Qu, Jun Gao

专题命中 视频生成 :少步视频生成中的蒸馏方法

AI总结 针对分布匹配蒸馏(DMD)在少步视频生成中出现的模式坍塌和过饱和问题,提出数据强制蒸馏(DFD)框架,通过教师评分差异引导学生接近真实数据分布,仅需一行代码修改即可恢复多样性和保真度。

2605.21028 2026-06-18 cs.CV cs.AI 版本更新 专题 95

DySink: Dynamic Frame Sinks for Autoregressive Long Video Generation

DySink:动态帧 sinks 用于自回归长视频生成

Bo Ye, Xinyu Cui, Jian Zhao, Tong Wei, Min-Ling Zhang

专题命中 视频生成 :提出DySink框架用于自回归长视频生成,核心是视频生成。

AI总结 本文提出 DySink,一种基于检索的框架,通过维护紧凑的记忆银行并选择视觉相关的历史帧作为动态帧 sinks,以提高自回归长视频生成的动态性和时间质量。

2502.07531 2026-06-18 cs.CV cs.AI cs.LG cs.MM 版本更新 专题 95

VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation

VidCRAFT3: 面向图像到视频生成的相机、物体与光照控制

Sixiao Zheng, Zimian Peng, Yanpeng Zhou, Yi Zhu, Hang Xu, Xiangru Huang, Yanwei Fu

专题命中 视频生成 :可控图像到视频生成,控制相机、物体和光照

AI总结 提出VidCRAFT3框架,通过显式建模几何、运动与光照的跨因素交互,实现对相机运动、物体运动和光照方向的独立或联合控制,在控制精度和视觉一致性上达到最优。

Comments Accepted to TVCG 2026

2606.18591 2026-06-18 cs.CV 新提交 专题 90

Bridging Creative Intent and Visual Quality: Creator-Driven Recurrent Video Generation with Agentic Feedback Loops

桥接创意意图与视觉质量:基于创作者驱动的循环视频生成与代理反馈循环

Denis Savytski, Aiden Lei, Heding Liu, Warren Yang, Sihan Liang, Alexander Liu, Zhe Zhao

专题命中 视频生成 :CHIEF框架实现创作者驱动循环视频生成

AI总结 提出CHIEF框架,通过人类-AI协作的迭代视频精炼,结合创作者驱动和代理主观反馈,提升长视频的叙事连贯性与创意方向。

Comments Accepted to the Workshop on Human-AI Co-Creativity at ICML 2026

2606.13768 2026-06-18 cs.CV cs.AI 新提交 专题 90

CineOrchestra: Unified Entity-Centric Conditioning for Cinematic Video Generation

CineOrchestra:面向电影视频生成的统一实体中心条件控制

Sharath Girish, Tsai-Shien Chen, Zhikang Dong, Mukesh Singhal, Hao Chen, Sergey Tulyakov, Aliaksandr Siarohin

专题命中 视频生成 :统一控制主体、事件、相机和镜头切换的视频生成

AI总结 提出CineOrchestra,一种统一控制主体、事件、相机和镜头切换的视频扩散模型,通过实体中心条件原语和参数无关的旋转位置编码实现多轴联合控制,在密集描述跟随和镜头切换时序上超越六种专用方法。

Comments Project page: https://snap-research.github.io/CineOrchestra

2606.06361 2026-06-18 cs.CV 版本更新 专题 90

Physics in 2-Steps: Locking Motion Priors Before Visual Refinement Erases Them

两步物理:在视觉细化之前锁定运动先验会擦除它们

Woojung Han, Seil Kang, Youngjun Jun, Min-Hung Chen, Fu-En Yang, Seong Jae Hwang

专题命中 视频生成 :图像到视频扩散模型的物理一致性改进

AI总结 本文发现图像到视频扩散模型在两步生成中比多步生成具有更好的物理一致性,通过频谱分析将原因归结为去噪过程中的相位侵蚀,并提出无需训练的PhaseLock框架,通过从两步推理中提取运动先验并利用潜在增量引导强制到高保真生成中,有效缓解相位退化,提升物理一致性平均6.2点,同时保持视觉保真度且开销极小。

Comments ICML 2026

2605.15824 2026-06-18 cs.CV 版本更新 专题 90

FashionChameleon: Towards Real-Time and Interactive Human-Garment Video Customization

FashionChameleon:迈向实时和交互式的人体服装视频定制

Quanjian Song, Yefeng Shen, Mengting Chen, Hao Sun, Jinsong Lan, Xiaoyong Zhu, Bo Zheng, Liujuan Cao

专题命中 视频生成 :提出实时交互式人体服装视频定制框架。

AI总结 本文提出FashionChameleon框架,通过单件服装视频数据实现交互式多服装视频定制,保留动作一致性,实现实时生成23.8FPS,比现有方法快30-180倍。

Comments Project Page: https://quanjiansong.github.io/projects/FashionChameleon/

2510.21615 2026-06-18 cs.CV 版本更新 专题 90

Epipolar Geometry Improves Video Generation Models

极线几何改进视频生成模型

Orest Kupyn, Théo Uscidda, Marta Tintore Gazulla, Fabian Manhardt, Federico Tombari, Christian Rupprecht

专题命中 视频生成 :利用极线几何约束改进视频生成模型的几何一致性。

AI总结 针对视频生成模型几何不一致和运动伪影问题,提出基于极线几何约束的偏好优化方法,在保持视觉质量的同时将极线误差降低31%,人类评分一致性从54%提升至72%。

2606.19271 2026-06-18 cs.DC 新提交 专题 85

TurboServe: Serving Streaming Video Generation Efficiently and Economically

TurboServe: 高效经济地服务流式视频生成

Youhe Jiang, Haoxu Wang, Haotong Bao, Kai Jiang, Jianfei Chen, Jun Zhu, Fangcheng Fu, Jintao Zhang

专题命中 视频生成 :流式视频生成服务系统TurboServe

AI总结 针对流式视频生成的会话时长和用户需求异构性,提出TurboServe系统,通过在线调度联合优化会话放置与GPU配置,采用迁移感知放置和负载驱动自动缩放,降低延迟和成本。

2606.02800 2026-06-18 cs.CV cs.AI cs.LG cs.MM cs.RO 版本更新 专题 85

Cosmos 3: Omnimodal World Models for Physical AI

Cosmos 3:面向物理AI的全模态世界模型

NVIDIA, :, Aditi, Niket Agarwal, Arslan Ali, Jon Allen, Martin Antolini, Adeline Aubame, Alisson Azzolini, Junjie Bai, Maciej Bala, Yogesh Balaji, Josh Bapst, Aarti Basant, Mukesh Beladiya, Mohammad Qazim Bhat, Zaid Pervaiz Bhat, Dan Blick, Vanni Brighella, Han Cai, Tiffany Cai, Eric Cameracci, Jiaxin Cao, Yulong Cao, Mark Carlson, Carlos Casanova, Ting-Yun Chang, Yan Chang, Yu-Wei Chao, Prithvijit Chattopadhyay, Roshan Chaudhari, Chieh-Yun Chen, Junyu Chen, Ke Chen, Qizhi Chen, Wenkai Chen, Xiaotong Chen, Yu Chen, An-Chieh Cheng, Click Cheng, Xiu Chia, Jeana Choi, Chaeyeon Chung, Wenyan Cong, Yin Cui, Magdalena Dadela, Nalin Dadhich, Wenliang Dai, Joyjit Daw, Alperen Degirmenci, Rodrigo Vieira Del Monte, Robert Denomme, Sameer Dharur, Marco Di Lucca, Ke Ding, Wenhao Ding, Yifan Ding, Yuzhu Dong, Nicole Drumheller, Yilun Du, Aigul Dzhumamuratova, Aleksandr Efitorov, Hamid Eghbalzadeh, Naomi Eigbe, Imad El Hanafi, Hassan Eslami, Benedikt Falk, Jiaojiao Fan, Jim Fan, Amol Fasale, Sergiy Fefilatyev, Liang Feng, Francesco Ferroni, Sanja Fidler, Xiao Fu, Vikram Fugro, Prashant Gaikwad, TJ Galda, Katelyn Gao, Yihuai Gao, Wenhang Ge, Sreyan Ghosh, Arushi Goel, Vivek Goel, Akash Gokul, Rama Govindaraju, Jinwei Gu, Miguel Guerrero, Elfie Guo, Aryaman Gupta, Siddharth Gururani, Hugo Hadfield, Song Han, Ankur Handa, Zekun Hao, Mohammad Harrim, Ali Hassani, Nathan Hayes-Roth, Yufan He, Chris Helvig, Cyrus Hogg, Madison Huang, Michael Huang, Sophia Huang, Yufan Huang, Jacob Huffman, DeLesley Hutchins, Suneel Indupuru, Boris Ivanovic, Arihant Jain, Joel Jang, Ryan Ji, Yanan Jian, Dongfu Jiang, Jingyi Jin, Atharva Joshi, Nikhilesh Joshi, Pranjali Joshi, Andy Ju, Jaehun Jung, Weiwei Kang, Scott Kassekert, Jan Kautz, Ashna Khetan, Julia Kiczka, Slawek Kierat, Gwanghyun Kim, Kuno Kim, Sunny Kim, Kezhi Kong, Xin Kong, Zhifeng Kong, Tomasz Kornuta, Egor Krivov, Hui Kuang, Saurav Kumar, Chia-Wen Kuo, George Kurian, Wojciech Kutak, JF Lafleche, Himangshu Lahkar, Omar Laymoun, Jayjun Lee, Sanggil Lee, Gabriele Leone, Boyi Li, Freya Li, Jiajun Li, Jinfeng Li, Ling Li, Pengcheng Li, Shangru Li, Tingle Li, Xiaolong Li, Xuan Li, Zhaoshuo Li, Zhiqi Li, Hao Liang, Maosheng Liao, Chen-Hsuan Lin, Tsung-Yi Lin, Ming-Yu Liu, Sifei Liu, Zihan Liu, Hai Loc Lu, Xiangyu Lu, Alice Luo, Ruipu Luo, Wenjie Luo, Jiangran Lyu, Martin Ding Ma, Nic Ma, Qianli Ma, Dawid Majchrowski, Louis Marcoux, Miguel Martin, Qing Miao, Ashkan Mirzaei, Shreyas Misra, Kaichun Mo, Durra Mohsin, Hyejin Moon, Pawel Morkisz, Saeid Motiian, Kirill Motkov, Seungjun Nah, Yashraj Narang, Deepak Narayanan, Thabang Ngazimbi, Julian Ouyang, Shubham Pachori, David Page, Yatian Pang, Sehwi Park, Mahesh Patekar, Mostofa Patwary, Marco Pavone, Trung Pham, Wei Ping, Soha Pouya, Shrimai Prabhumoye, Varun Praveen, Delin Qu, Hesam Rabeti, Morteza Ramezanali, Marilyn Reeb, Xuanchi Ren, Kristen Rumley, Wojciech Rymer, Jun Saito, Yeongho Seol, John Shao, Piyush Shekdar, Tianwei Shen, Humphrey Shi, Min Shi, Stella Shi, Kevin Shih, Mohammad Shoeybi, Mateusz Sieniawski, Shuran Song, Alexander Sotelo, Amir Sotoodeh, Sunil Srinivasa, Vignesh Srinivasakumar, Bartosz Stefaniak, Rahul Heinrich Steiger, Shangkun Sun, Jiaxiang Tang, Shitao Tang, Yangyang Tang, Yue Tang, Tolou Tavakkoli, Kayley Ting, Krzysztof Tomala, Wei-Cheng Tseng, Jibin Varghese, Sergei Vasilev, Thomas Volk, Raju Wagwani, Roger Waleffe, Andrew Z. Wang, Boxiang Wang, Haoxiang Wang, Qiao Wang, Shihao Wang, Shijie Wang, Ting-Chun Wang, Yan Wang, Yu Wang, Rohit Watve, David Wehr, Fangyin Wei, Xinshuo Weng, Jay Zhangjie Wu, Kedi Wu, Hongchi Xia, Summer Xiao, Tianjun Xiao, Kevin Xie, Daguang Xu, Jiashu Xu, Mengyao Xu, Ruqing Xu, Xingqian Xu, Yao Xu, Dinghao Yang, Dong Yang, Hans Yang, Xiaodong Yang, Xuning Yang, Yichu Yang, Yurong You, Zhiding Yu, Hao Yuan, Simon Yuen, Xiaohui Zeng, Pengcuo Zeren, Cindy Zha, Haotian Zhang, Jenny Zhang, Jing Zhang, Liangkai Zhang, Paris Zhang, Shun Zhang, Xuanmeng Zhang, Zhizheng Zhang, Ann Zhao, Yilin Zhao, Yuliya Zhautouskaya, Charles Zhou, Fengzhe Zhou, Shilin Zhu, Yuke Zhu, Dima Zhylko, Artur Zolkowski

专题命中 视频生成 :视频生成能力,世界模拟器

AI总结 提出基于统一混合Transformer架构的全模态世界模型Cosmos 3,联合处理语言、图像、视频、音频和动作序列,在理解和生成任务上达到新最优,为具身智能体提供可扩展的通用骨干。

2606.17030 2026-06-18 cs.CV 新提交 专题 80

Qwen-RobotWorld Technical Report: Unifying Embodied World Modeling through Language-Conditioned Video Generation

Qwen-RobotWorld技术报告:通过语言条件视频生成统一具身世界模型

Jie Zhang, Xiaoyue Chen, Anzhe Chen, Dayiheng Liu, Deqing Li, Gengze Zhou, Hale Yin, Haoqi Yuan, Haoyang Li, Jiahao Li, Jiazhao Zhang, Jingren Zhou, Kaiyuan Gao, Kun Yan, Lihan Jiang, Ningyuan Tang, Pei Lin, Qihang Peng, Shengming Yin, Tianhe Wu, Tianyi Yan, Xiao Xu, Yan Shu, Yanran Zhang, Ye Wang, Yi Wang, Yilei Chen, Yixian Xu, Yiyang Huang, Yuxiang Chen, Zekai Zhang, Zhendong Wang, Zixing Lei, Zhixuan Liang, Zihao Liu, Zikai Zhou, Chenxu Lv, Xiong-Hui Chen, Chenfei Wu

专题命中 视频生成 :视频世界模型,生成未来视觉轨迹

AI总结 提出Qwen-RobotWorld,一种以自然语言为统一动作接口的语言条件视频世界模型,通过双流MMDiT、大规模具身世界知识语料和渐进式课程训练,在机器人操作、自动驾驶等任务中实现物理一致的未来视觉轨迹预测,在多个基准上取得最优结果。

2606.13376 2026-06-18 cs.CV 新提交 专题 80

MoVerse: Real-Time Video World Modeling with Panoramic Gaussian Scaffold

MoVerse: 基于全景高斯支架的实时视频世界建模

Yang Zhou, Ziheng Wang, Yuqin Lu, Haofeng Liu, Jun Liang, Shengfeng He, Jing Li

专题命中 视频生成 :实时视频世界建模与渲染

AI总结 提出MoVerse,从单张窄视场图像实时构建可交互漫游的360度全景世界,通过拓扑感知扩散补全视场、全景几何残差预测生成3D高斯支架,并结合双向扩散教师蒸馏为因果自回归学生实现低延迟视频渲染。

Comments Project Page: https://orange-3dv-team.github.io/MoVerse/

2606.19163 2026-06-18 cs.DC 新提交 专题 60

Pulse: Training Acceleration for Large Diffusion Models with Automatic Pipeline Parallelism

Pulse: 面向大规模扩散模型的自动流水线并行训练加速

Boran Sun, Guoyong Jiang, Lin Zhang, Chen Chen, Yuechen Tao, Zhishu Che, Jieling Yu, Shan Chang, Huaxi Gu, Fangming Liu, Bo Li

专题命中 视频生成 :方法适用于视频生成模型训练加速

AI总结 提出PULSE自动流水线并行策略,通过将跳跃连接层同设备放置、局部缓存激活值,消除跨流水线通信,结合动态规划分区器、ILP调度合成器和混合并行调优器,在通信受限硬件上实现最高2.3倍吞吐提升。

Comments Accepted by International Conference on Distributed Computing Systems(ICDCS'26)