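arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and more.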

2502.04501 2026-04-14 cs.CL

Ultra-Low-Dimensional Prompt Tuning via Random Projection

Zijun Wu, Yongchang Hao, Lili Mou

Comments Accepted by EACL 2026 (Main Conference, Long Paper)

2501.07773 2026-04-14 cs.LG

Symmetry-Aware Generative Modeling through Learned Canonicalization

Kusha Sareen, Daniel Levy, Arnab Kumar Mondal, Sékou-Oumar Kaba, Tara Akhound-Sadegh, Siamak Ravanbakhsh

Comments NeurReps 2024 Workshop Version

2501.06416 2026-04-14 cs.LG cs.AI cs.HC

Influencing Humans to Conform to Preference Models for RLHF

Stephane Hatgis-Kessell, W. Bradley Knox, Serena Booth, Peter Stone

2412.20704 2026-04-14 cs.CV cs.LG

HFI: A unified framework for training-free detection and implicit watermarking of latent diffusion model generated images

Sungik Choi, Hankook Lee, Jaehoon Lee, Seunghyun Kim, Stanley Jungkyu Choi, Moontae Lee

2412.15803 2026-04-14 cs.LG cs.AI

WebLLM: A High-Performance In-Browser LLM Inference Engine

Charlie F. Ruan, Yucheng Qin, Akaash R. Parthasarathy, Xun Zhou, Ruihang Lai, Hongyi Jin, Yixin Dong, Bohan Hou, Meng-Shiun Yu, Yiyan Zhai, Sudeep Agarwal, Hangrui Cao, Siyuan Feng, Tianqi Chen

2411.17163 2026-04-14 cs.CV

OSDFace: One-Step Diffusion Model for Face Restoration

Jingkai Wang, Jue Gong, Lin Zhang, Zheng Chen, Xing Liu, Hong Gu, Yutong Liu, Yulun Zhang, Xiaokang Yang

Comments Accepted to CVPR 2025. The code and model will be available at https://github.com/jkwang28/OSDFace

2411.14072 2026-04-14 cs.CL cs.PL

The Master-Slave Encoder Model for Improving Patent Text Summarization: A New Approach to Combining Specifications and Claims

Shu Zhou, Xin Wang, Zhengda Zhou, Haohan Yi, Xuhui Zheng, Hao Wan

Comments 25 pages, 1 figure

2411.11259 2026-04-14 cs.LG

Graph Retention Networks for Dynamic Graphs

Qian Chang, Xia Li, Xiufeng Cheng, Runsong Jia, Jinqing Yang, Guoping Hu, Ciprian Doru Giurcaneanu

Comments Accepted as a full paper at ACM Web Conference 2026 (WWW 2026)

2410.21316 2026-04-14 cs.LG cs.AI cs.DC cs.ET cs.PF

Deep Optimizer States: Towards Scalable Training of Transformer Models Using Interleaved Offloading

Avinash Maurya, Jie Ye, M. Mustafa Rafique, Franck Cappello, Bogdan Nicolae

DOI: 10.1145/3652892.3700781

Abstract

Transformers and large language models (LLMs) have seen rapid adoption in all domains. Their sizes have exploded to hundreds of billions of parameters and keep increasing. Under these circumstances, the training of transformers is very expensive and often hits a "memory wall", i.e., even when using 3D parallelism (pipeline, tensor, data) and aggregating the memory of many GPUs, it is still not enough to hold the necessary data structures (model parameters, optimizer state, gradients, activations) in GPU memory. To compensate, state-of-the-art approaches offload the optimizer state, at least partially, to the host memory and perform hybrid CPU-GPU computations. However, the management of the combined host-GPU memory is often suboptimal and results in poor overlapping between data movements and computations. This leads to missed opportunities to simultaneously leverage the interconnect bandwidth and computational capabilities of CPUs and GPUs. In this paper, we leverage a key observation that the interleaving of the forward, backward, and update phases generates fluctuations in the GPU memory utilization, which can be exploited to dynamically move a part of the optimizer state between the host and the GPU memory at each iteration. To this end, we design and implement Deep Optimizer States, a novel technique to split the LLM into subgroups, whose update phase is scheduled on either the CPU or the GPU based on our proposed performance model that addresses the trade-off between data movement cost, acceleration on the GPUs vs. the CPUs, and competition for shared resources. We integrate our approach with DeepSpeed and demonstrate 2.5× faster iterations over state-of-the-art approaches using extensive experiments.

2407.15389 2026-04-14 cs.LG cs.CR cs.DC

Poisoning with A Pill: Circumventing Detection in Federated Learning

Hanxi Guo, Hao Wang, Tao Song, Tianhang Zheng, Yang Hua, Haibing Guan, Xiangyu Zhang

Comments Accepted by AAAI 2026

2407.10953 2026-04-14 cs.CL

A Multilingual Dataset and Empirical Validation for the Mutual Reinforcement Effect in Information Extraction

Chengguang Gan, Sunbowen Lee, Qingyu Yin, Yunhao Liang, Xinyang He, Hanjun Wei, Younghun Lim, Shijian Wang, Hexiang Huang, Qinghao Zhang, Shiwen Ni, Tatsunori Mori

Comments Accepted by ACL 2026 Findings

2406.09588 2026-04-14 cs.CV cs.LG

Learning Color Equivariant Representations

Yulong Yang, Felix O'Mahony, Christine Allen-Blanchette

Comments Accepted to the 13th International Conference on Learning Representations (ICLR 2025)

2406.05358 2026-04-14 cs.LG math.OC

Reinforcement Learning for Intensity Control: An Application to Choice-Based Network Revenue Management

Huiling Meng, Ningyuan Chen, Xuefeng Gao

2403.15651 2026-04-14 cs.CV

GaNI: Global and Near Field Illumination Aware Neural Inverse Rendering

Jiaye Wu, Saeed Hadadan, Geng Lin, Matthias Zwicker, David Jacobs, Roni Sengupta

2403.01919 2026-04-14 cs.LG

Randomized Approach to Matrix Completion: Applications in Recommendation Systems and Image Inpainting

Antonina Krajewska, Ewa Niewiadomska-Szynkiewicz

Journal ref Machine Learning, 115, 57 (2026)

2309.17257 2026-04-14 cs.CV

A Survey on Deep Learning Techniques for Action Anticipation

Zeyun Zhong, Manuel Martin, Michael Voit, Juergen Gall, Jürgen Beyerer

Comments If any relevant references are missing, please contact the authors for future inclusion

2308.12067 2026-04-14 cs.LG cs.AI cs.CL cs.CV

MM-LIMA: Less Is More for Alignment in Multi-Modal Datasets

Lai Wei, Xiaozhe Li, Zihao Jiang, Weiran Huang, Lichao Sun

Comments Published at Artificial Intelligence for Engineering

2307.01139 2026-04-14 cs.CV cs.AI cs.CL cs.LG

SCITUNE: Aligning Large Language Models with Human-Curated Scientific Multimodal Instructions

Sameera Horawalavithana, Sai Munikoti, Ian Stewart, Henry Kvinge, Karl Pazdernik

Comments In Proceedings of the 1st Workshop on NLP for Science, Association for Computational Linguistics

Journal ref Proc. 1st Workshop on Natural Language Processing for Science (NLP4Science 2024) (2024) 58-72

2305.15404 2026-04-14 cs.CV

RoMa: Robust Dense Feature Matching

Johan Edstedt, Qiyu Sun, Georg Bökman, Mårten Wadenbäck, Michael Felsberg

2305.14299 2026-04-14 cs.CL cs.AI

Template-assisted Contrastive Learning of Task-oriented Dialogue Sentence Embeddings

Minsik Oh, Jiwei Li, Guoyin Wang

Comments Accepted to ACL 2026

2305.09958 2026-04-14 cs.LG cs.SI

SIGMA: An Efficient Heterophilous Graph Neural Network with Fast Global Aggregation

Haoyu Liu, Ningyi Liao, Siqiang Luo

Comments ICDE 2025

1810.07793 2026-04-14 cs.LG stat.ML

The Wasserstein transform

Kun Jin, Facundo Mémoli, Zane Smith, Zhengchao Wan

2604.10358 2026-04-14 cs.RO

COSMIK-MPPI: Scaling Constrained Model Predictive Control to Collision Avoidance in Close-Proximity Dynamic Human Environments

Ege Gursoy, Maxime Sabbah, Arthur Haffemayer, Joao Cavalcanti Santos, Pietro Noah Crestaz, Vladimir Petrik, Nicolas Mansard, Vincent Bonnet

2604.10352 2026-04-14 cs.AI cs.OS cs.SE

ClawVM: Harness-Managed Virtual Memory for Stateful Tool-Using LLM Agents

Mofasshara Rafique, Laurent Bindschaedler

Comments 8 pages, 1 figure, 10 tables; accepted at EuroMLSys '26 (6th Workshop on Machine Learning and Systems, co-located with EuroSys 2026)

2604.10347 2026-04-14 cs.CV

Multi-modal, multi-scale representation learning for satellite imagery analysis just needs a good ALiBi

Patrick Kage, Pavlos Andreadis

Comments Originally appeared at the 4th Space Imaging Workshop at the Georgia Institute of Technology, October 7-9, 2024

2604.10344 2026-04-14 cs.CV

Context Matters: Vision-Based Depression Detection Comparing Classical and Deep Approaches

Maneesh Bilalpur, Saurabh Hinduja, Sonish Sivarajkumar, Nicholas Allen, Yanshan Wang, Itir Onal Ertugrul, Jeffrey F. Cohn

2604.10343 2026-04-14 cs.LG

WaterAdmin: Orchestrating Community Water Distribution Optimization via AI Agents

Jiaqi Wen, Pingbo Tang, Shaolei Ren, Jianyi Yang

2604.10341 2026-04-14 cs.AI

VeriTrans: Fine-Tuned LLM-Assisted NL-to-PL Translation via a Deterministic Neuro-Symbolic Pipeline

Xuan Liu, Dheeraj Kodakandla, Kushagra Srivastva, Mahfuza Farooque

2604.10337 2026-04-14 cs.LG

Integrating SAINT with Tree-Based Models: A Case Study in Employee Attrition Prediction

Adil Derrazi, Javad Pourmostafa Roshan Sharami

Comments Accepted at IntelliSys 2025 (Springer LNNS)

Journal ref Published in Intelligent Systems and Applications (IntelliSys 2025), LNNS, Springer, 2025

DOI: 10.1007/978-3-031-99958-1_27

Abstract

Employee attrition presents a major challenge for organizations, increasing costs and reducing productivity. Predicting attrition accurately enables proactive retention strategies, but existing machine learning models often struggle to capture complex feature interactions in tabular HR datasets. While tree-based models such as XGBoost and LightGBM perform well on structured data, traditional encoding techniques like one-hot encoding can introduce sparsity and fail to preserve semantic relationships between categorical features. This study explores a hybrid approach by integrating SAINT (Self-Attention and Intersample Attention Transformer)-generated embeddings with tree-based models to enhance employee attrition prediction. SAINT leverages self-attention mechanisms to model intricate feature interactions. In this study, we explore SAINT both as a standalone classifier and as a feature extractor for tree-based models. We evaluate the performance, generalizability, and interpretability of standalone models (SAINT, XGBoost, LightGBM) and hybrid models that combine SAINT embeddings with tree-based classifiers. Experimental results show that standalone tree-based models outperform both the standalone SAINT model and the hybrid approaches in predictive accuracy and generalization. Contrary to expectations, the hybrid models did not improve performance. One possible explanation is that tree-based models struggle to utilize dense, high-dimensional embeddings effectively. Additionally, the hybrid approach significantly reduced interpretability, making model decisions harder to explain. These findings suggest that transformer-based embeddings, while capturing feature relationships, do not necessarily enhance tree-based classifiers. Future research should explore alternative fusion strategies for integrating deep learning with structured data.

2604.10335 2026-04-14 cs.CL cs.LG

Adaptive Multi-Expert Reasoning via Difficulty-Aware Routing and Uncertainty-Guided Aggregation

Mohamed Ehab, Ali Hamdi
