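arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and related fields.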



2601.14103 2026-01-21 cs.CV

Interp3D: Correspondence-aware Interpolation for Generative Textured 3D Morphing

Xiaolu Liu, Yicong Li, Qiyuan He, Jiayin Zhu, Wei Ji, Angela Yao, Jianke Zhu

Comments 22 pages, 12 figures

2601.14101 2026-01-21 cs.CV

Curriculum-Based Strategies for Efficient Cross-Domain Action Recognition

Emily Kim, Allen Wu, Jessica Hodgins

Abstract

Despite significant progress in human action recognition, generalizing to diverse viewpoints remains a challenge. Most existing datasets are captured from ground-level perspectives, and models trained on them often struggle to transfer to drastically different domains such as aerial views. This paper examines how curriculum-based training strategies can improve generalization to unseen real aerial-view data without using any real aerial data during training. We explore curriculum learning for cross-view action recognition using two out-of-domain sources: synthetic aerial-view data and real ground-view data. Our results on the evaluation on order of training (fine-tuning on synthetic aerial data vs. real ground data) shows that fine-tuning on real ground data but differ in how they transition from synthetic to real. The first uses a two-stage curriculum with direct fine-tuning, while the second applies a progressive curriculum that expands the dataset in multiple stages before fine-tuning. We evaluate both methods on the REMAG dataset using SlowFast (CNN-based) and MViTv2 (Transformer-based) architectures. Results show that combining the two out-of-domain datasets clearly outperforms training on a single domain, whether real ground-view or synthetic aerial-view. Both curriculum strategies match the top-1 accuracy of simple dataset combination while offering efficiency gains. With the two-step fine-tuning method, SlowFast achieves up to a 37% reduction in iterations and MViTv2 up to a 30% reduction compared to simple combination. The multi-step progressive approach further reduces iterations, by up to 9% for SlowFast and 30% for MViTv2, relative to the two-step method. These findings demonstrate that curriculum-based training can maintain comparable performance (top-1 accuracy within 3% range) while improving training efficiency in cross-view action recognition.


2601.14099 2026-01-21 cs.LG cs.AI

Causal feature selection framework for stable soft sensor modeling based on time-delayed cross mapping

Shi-Shun Chen, Xiao-Yang Li, Enrico Zio

Journal ref Advanced Engineering Informatics 2026, 71, 104337


DOI: 10.1016/j.aei.2026.104337

Abstract

Soft sensor modeling plays a crucial role in process monitoring. Causal feature selection can enhance the performance of soft sensor models in industrial applications. However, existing methods ignore two critical characteristics of industrial processes. Firstly, causal relationships between variables always involve time delays, whereas most causal feature selection methods investigate causal relationships in the same time dimension. Secondly, variables in industrial processes are often interdependent, which contradicts the decorrelation assumption of traditional causal inference methods. Consequently, soft sensor models based on existing causal feature selection approaches often lack sufficient accuracy and stability. To overcome these challenges, this paper proposes a causal feature selection framework based on time-delayed cross mapping. Time-delayed cross mapping employs state space reconstruction to effectively handle interdependent variables in causality analysis, and considers varying causal strength across time delay. Time-delayed convergent cross mapping (TDCCM) is introduced for total causal inference, and time-delayed partial cross mapping (TDPCM) is developed for direct causal inference. Then, in order to achieve automatic feature selection, an objective feature selection strategy is presented. The causal threshold is automatically determined based on the model performance on the validation set, and the causal features are then selected. Two real-world case studies show that TDCCM achieves the highest average performance, while TDPCM improves soft sensor stability and performance in the worst scenario. The code is publicly available at https://github.com/dirge1/TDPCM.
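As a rough illustration of the cross-mapping idea mentioned in the abstract above, the following sketch estimates a lagged driver signal from the delay-embedded (state-space reconstructed) response signal and reports the correlation at several lags. It is a generic simplex-style cross map, not the authors' TDCCM or TDPCM implementation (see their repository for that); the function names, the embedding parameters `dim` and `tau`, and the coupled logistic-map test data are all assumptions made for illustration.

```python
import numpy as np

def delay_embed(series, dim, tau):
    # Rows are [s(t), s(t - tau), ..., s(t - (dim - 1) * tau)] for every valid t.
    start = (dim - 1) * tau
    cols = [series[start - k * tau : len(series) - k * tau] for k in range(dim)]
    return np.stack(cols, axis=1), start

def cross_map_skill(source, target, lag, dim=3, tau=1):
    # Estimate source(t - lag) from the shadow manifold of target (lag >= 0)
    # and return the Pearson correlation between the estimate and the truth.
    manifold, start = delay_embed(target, dim, tau)
    times = np.arange(start, len(target))
    valid = times - lag >= 0
    manifold, times = manifold[valid], times[valid]
    truth = source[times - lag]
    # Pairwise distances on the manifold; exclude each point from its own neighbors.
    dists = np.linalg.norm(manifold[:, None, :] - manifold[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)
    nn = np.argsort(dists, axis=1)[:, : dim + 1]
    d = np.take_along_axis(dists, nn, axis=1)
    # Exponential weights scaled by the nearest-neighbor distance.
    w = np.exp(-d / np.maximum(d[:, :1], 1e-12))
    w /= w.sum(axis=1, keepdims=True)
    estimate = (w * truth[nn]).sum(axis=1)
    return np.corrcoef(estimate, truth)[0, 1]

# Hypothetical test data: x drives y through a delayed coupling term.
n = 300
x = np.empty(n)
y = np.empty(n)
x[0], y[0] = 0.4, 0.2
for t in range(n - 1):
    driver = x[t - 2] if t >= 2 else x[0]
    x[t + 1] = x[t] * (3.8 - 3.8 * x[t])
    y[t + 1] = y[t] * (3.5 - 3.5 * y[t] - 0.1 * driver)

# Cross-map skill as a function of the assumed delay.
skills = {lag: cross_map_skill(x, y, lag) for lag in range(5)}
```

Scanning `skills` over lags gives one simple way to look at how the apparent causal strength varies with time delay; how the paper thresholds or combines these values is not specified in the abstract.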


2601.14092 2026-01-21 cs.LG cs.NI

Optimizing Energy and Data Collection in UAV-aided IoT Networks using Attention-based Multi-Objective Reinforcement Learning

Babacar Toure, Dimitrios Tsilimantos, Omid Esrafilian, Marios Kountouris

2601.14091 2026-01-21 cs.RO cs.AI

Zero-shot adaptable task planning for autonomous construction robots: a comparative study of lightweight single and multi-AI agent systems

Hossein Naderi, Alireza Shojaei, Lifu Huang, Philip Agee, Kereshmeh Afsari, Abiola Akanmu

2601.14086 2026-01-21 cs.CV cs.AI cs.LG

Two-Stream temporal transformer for video action classification

Nattapong Kurpukdee, Adrian G. Bors

2601.14084 2026-01-21 cs.CV cs.AI cs.CL

DermaBench: A Clinician-Annotated Benchmark Dataset for Dermatology Visual Question Answering and Reasoning

Abdurrahim Yilmaz, Ozan Erdem, Ece Gokyayla, Ayda Acar, Burc Bugra Dagtas, Dilara Ilhan Erdil, Gulsum Gencoglan, Burak Temelkuran

2601.14079 2026-01-21 cs.CV

VENI: Variational Encoder for Natural Illumination

Paul Walker, James A. D. Gardner, Andreea Ardelean, William A. P. Smith, Bernhard Egger

Comments Project Repo - https://github.com/paul-pw/veni Project page - https://paul-pw.github.io/veni

2601.14069 2026-01-21 cs.CV cs.AI cs.LG

Unsupervised Video Class-Incremental Learning via Deep Embedded Clustering Management

Nattapong Kurpukdee, Adrian G. Bors

2601.14066 2026-01-21 cs.CV

VERIDAH: Solving Enumeration Anomaly Aware Vertebra Labeling across Imaging Sequences

Hendrik Möller, Hanna Schoen, Robert Graf, Matan Atad, Nathan Molinier, Anjany Sekuboyina, Bettina K. Budai, Fabian Bamberg, Steffen Ringhof, Christopher Schlett, Tobias Pischon, Thoralf Niendorf, Josua A. Decker, Marc-André Weber, Bjoern Menze, Daniel Rueckert, Jan S. Kirschke

2601.14060 2026-01-21 cs.CV

Fine-Grained Zero-Shot Composed Image Retrieval with Complementary Visual-Semantic Integration

Yongcong Ye, Kai Zhang, Yanghai Zhang, Enhong Chen, Longfei Li, Jun Zhou

2601.14056 2026-01-21 cs.CV cs.AI

POCI-Diff: Position Objects Consistently and Interactively with 3D-Layout Guided Diffusion

Andrea Rigo, Luca Stornaiuolo, Weijie Wang, Mauro Martino, Bruno Lepri, Nicu Sebe

2601.14055 2026-01-21 cs.CV cs.AI

Decoder-Free Supervoxel GNN for Accurate Brain-Tumor Localization in Multi-Modal MRI

Andrea Protani, Marc Molina Van Den Bosch, Lorenzo Giusti, Heloisa Barbosa Da Silva, Paolo Cacace, Albert Sund Aillet, Miguel Angel Gonzalez Ballester, Friedhelm Hummel, Luigi Serio

Comments 10 pages, 3 figures

2601.14052 2026-01-21 cs.CV

Vision Also You Need: Navigating Out-of-Distribution Detection with Multimodal Large Language Model

Haoran Xu, Yanlin Liu, Zizhao Tong, Jiaze Li, Kexue Fu, Yuyang Zhang, Longxiang Gao, Shuaiguang Li, Xingyu Li, Yanran Xu, Changwei Wang

2601.14051 2026-01-21 cs.CL cs.AI cs.LG

Kakugo: Distillation of Low-Resource Languages into Small Language Models

Peter Devine, Mardhiyah Sanni, Farid Adilazuarda, Julieta Gil Loizaga, Barry Haddow

2601.14050 2026-01-21 cs.CL

Understanding Multilingualism in Mixture-of-Experts LLMs: Routing Mechanism, Expert Specialization, and Layerwise Steering

Yuxin Chen, Zhengzhou Cai, Xiangtian Ji, Weixiang Zhao, An Zhang, Xiang Wang, Tat-Seng Chua

2601.14046 2026-01-21 cs.CL cs.SD

PRiSM: Benchmarking Phone Realization in Speech Models

Shikhar Bharadwaj, Chin-Jou Li, Yoonjae Kim, Kwanghee Choi, Eunjung Yeo, Ryan Soh-Eun Shim, Hanyu Zhou, Brendon Boldt, Karen Rosero Jacome, Kalvin Chang, Darsh Agrawal, Keer Xu, Chao-Han Huck Yang, Jian Zhu, Shinji Watanabe, David R. Mortensen

2601.14041 2026-01-21 cs.CL cs.AI

Top 10 Open Challenges Steering the Future of Diffusion Language Model and Its Variants

Yunhe Wang, Kai Han, Huiling Zhen, Yuchuan Tian, Hanting Chen, Yongbing Huang, Yufei Cui, Yingte Shu, Shan Gao, Ismail Elezi, Roy Vaughan Miles, Songcen Xu, Feng Wen, Chao Xu, Sinan Zeng, Dacheng Tao

2601.14039 2026-01-21 cs.CV cs.AI

Generalizing Abstention for Noise-Robust Learning in Medical Image Segmentation

Wesam Moustafa, Hossam Elsafty, Helen Schneider, Lorenz Sparrenberg, Rafet Sifa

2601.14038 2026-01-21 cs.CV

Correcting and Quantifying Systematic Errors in 3D Box Annotations for Autonomous Driving

Alexandre Justo Miro, Ludvig af Klinteberg, Bogdan Timus, Aron Asefaw, Ajinkya Khoche, Thomas Gustafsson, Sina Sharif Mansouri, Masoud Daneshtalab

Comments Accepted to The IEEE/CVF Winter Conference on Applications of Computer Vision 2026

2601.14032 2026-01-21 cs.CL

RM-Distiller: Exploiting Generative LLM for Reward Model Distillation

Hongli Zhou, Hui Huang, Wei Liu, Chenglong Wang, Xingyuan Bu, Lvyuan Han, Fuhai Song, Muyun Yang, Wenhao Jiang, Hailong Cao, Tiejun Zhao

2601.14030 2026-01-21 cs.CV

Likelihood-Separable Diffusion Inference for Multi-Image MRI Super-Resolution

Samuel W. Remedios, Zhangxing Bian, Shuwen Wei, Aaron Carass, Jerry L. Prince, Blake E. Dewey

2601.14027 2026-01-21 cs.AI

Numina-Lean-Agent: An Open and General Agentic Reasoning System for Formal Mathematics

Junqi Liu, Zihao Zhou, Zekai Zhu, Marco Dos Santos, Weikun He, Jiawei Liu, Ran Wang, Yunzhou Xie, Junqiao Zhao, Qiufeng Wang, Lihong Zhi, Jia Li, Wenda Li

2601.14022 2026-01-21 cs.LG cs.AI

Credible CO2 Comparisons: A Machine Learning Approach to Vehicle Powertrain Assessment

Rodrigo Pereira David, Luciano Araujo Dourado Filho, Daniel Marques da Silva, João Alfredo Cal-Braz

2601.14007 2026-01-21 cs.CL

BACH-V: Bridging Abstract and Concrete Human-Values in Large Language Models

Junyu Zhang, Yipeng Kang, Jiong Guo, Jiayu Zhan, Junqi Wang

Comments 34 pages, 16 figures, 6 tables, submitted to ACL 2026

2601.14000 2026-01-21 cs.RO cs.LG

Group-Invariant Unsupervised Skill Discovery: Symmetry-aware Skill Representations for Generalizable Behavior

Junwoo Chang, Joseph Park, Roberto Horowitz, Jongmin Lee, Jongeun Choi

Comments 14 pages, 6 figures

2601.13995 2026-01-21 cs.CL

From Tags to Trees: Structuring Fine-Grained Knowledge for Controllable Data Selection in LLM Instruction Tuning

Zihan Niu, Wenping Hu, Junmin Chen, Xiyue Wang, Tong Xu, Ruiming Tang

2601.13989 2026-01-21 cs.LG

A universal linearized subspace refinement framework for neural networks

Wenbo Cao, Weiwei Zhang

Abstract

Neural networks are predominantly trained using gradient-based methods, yet in many applications their final predictions remain far from the accuracy attainable within the model's expressive capacity. We introduce Linearized Subspace Refinement (LSR), a general and architecture-agnostic framework that exploits the Jacobian-induced linear residual model at a fixed trained network state. By solving a reduced direct least-squares problem within this subspace, LSR computes a subspace-optimal solution of the linearized residual model, yielding a refined linear predictor with substantially improved accuracy over standard gradient-trained solutions, without modifying network architectures, loss formulations, or training procedures. Across supervised function approximation, data-driven operator learning, and physics-informed operator fine-tuning, we show that gradient-based training often fails to access this attainable accuracy, even when local linearization yields a convex problem. This observation indicates that loss-induced numerical ill-conditioning, rather than nonconvexity or model expressivity, can constitute a dominant practical bottleneck. In contrast, one-shot LSR systematically exposes accuracy levels not fully exploited by gradient-based training, frequently achieving order-of-magnitude error reductions. For operator-constrained problems with composite loss structures, we further introduce Iterative LSR, which alternates one-shot LSR with supervised nonlinear alignment, transforming ill-conditioned residual minimization into numerically benign fitting steps and yielding accelerated convergence and improved accuracy. By bridging nonlinear neural representations with reduced-order linear solvers at fixed linearization points, LSR provides a numerically grounded and broadly applicable refinement framework for supervised learning, operator learning, and scientific computing.
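The core step described in the abstract above, a direct least-squares solve on the Jacobian-linearized residual at a fixed parameter state, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' LSR implementation: the four-parameter `model`, its analytic `jacobian`, the `rcond` truncation used as a stand-in for subspace reduction, and the sine-fitting data are all hypothetical.

```python
import numpy as np

def model(theta, x):
    # Tiny hypothetical model: one tanh unit with an output scale and bias.
    w, b, a, c = theta
    return a * np.tanh(w * x + b) + c

def jacobian(theta, x):
    # Analytic Jacobian of the outputs with respect to theta, shape (len(x), 4).
    w, b, a, c = theta
    t = np.tanh(w * x + b)
    dt = 1.0 - t ** 2
    return np.stack([a * dt * x, a * dt, t, np.ones_like(x)], axis=1)

def lsr_step(theta0, x, y, rcond=1e-10):
    # Solve min_delta || J delta - (y - f(x; theta0)) ||_2 directly.
    # Singular values below rcond * s_max are discarded by lstsq.
    residual = y - model(theta0, x)
    J = jacobian(theta0, x)
    delta, *_ = np.linalg.lstsq(J, residual, rcond=rcond)
    return delta

def refined_predict(theta0, delta, x):
    # Linearized predictor f(x; theta0) + J(x) delta; theta0 itself stays fixed.
    return model(theta0, x) + jacobian(theta0, x) @ delta

x = np.linspace(-2.0, 2.0, 50)
y = np.sin(x)
theta0 = np.array([0.8, 0.1, 0.9, 0.0])  # stand-in for a trained network state

delta = lsr_step(theta0, x, y)
err_before = np.linalg.norm(y - model(theta0, x))
err_after = np.linalg.norm(y - refined_predict(theta0, delta, x))
```

Because `delta = 0` is always a candidate solution, the training residual of the linearized predictor cannot exceed that of the unrefined model; how much it improves in practice depends on the conditioning of the Jacobian.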


2601.13986 2026-01-21 cs.CV eess.IV

Equivariant Learning for Unsupervised Image Dehazing

Zhang Wen, Jiangwei Xie, Dongdong Chen

Comments Technical report

2601.13979 2026-01-21 cs.RO

Active Cross-Modal Visuo-Tactile Perception of Deformable Linear Objects

Raffaele Mazza, Ciro Natale, Pietro Falco
