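arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and related fields.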

2604.17243 2026-04-21 cs.CV

RemoteShield: Enable Robust Multimodal Large Language Models for Earth Observation

Rui Min, Liang Yao, Shiyu Miao, Shengxiang Xu, Yuxuan Liu, Chuanyi Zhang, Shimin Di, Fan Liu

2604.17241 2026-04-21 cs.RO

GaLa: Hypergraph-Guided Visual Language Models for Procedural Planning

Kun Wang, Yiming Li, Mingcheng Qu, Aqiang Zhang, Guang Yang, Tonghua Su

Comments 14 pages, 7 figures

Journal ref ACL 2026 (Findings)

2604.17240 2026-04-21 cs.AI

Safe and Policy-Compliant Multi-Agent Orchestration for Enterprise AI

Vinil Pasupuleti, Shyalendar Reddy Allala, Siva Rama Krishna Varma Bayyavarapu, Shrey Tyagi

Comments 6 pages, 3 figures, 3 tables, IEEE conference format

2604.17233 2026-04-21 cs.CV cs.AI

Enhancing Zero-shot Personalized Image Aesthetics Assessment with Profile-aware Multimodal LLM

Chun Wang, Chenfeng Wei, Chenyang Liu, Weihong Deng

2604.17231 2026-04-21 cs.CV cs.RO

Fringe Projection Based Vision Pipeline for Autonomous Hard Drive Disassembly

Badrinath Balasubramaniam, Vignesh Suresh, Benjamin Metcalf, Beiwen Li

Comments 20 pages, 11 figures

Abstract

Unrecovered e-waste represents a significant economic loss. Hard disk drives (HDDs) comprise a valuable e-waste stream necessitating robotic disassembly. Automating the disassembly of HDDs requires holistic 3D sensing, scene understanding, and fastener localization; however, current methods are fragmented, lack robust 3D sensing, and lack fastener localization. We propose an autonomous vision pipeline that performs 3D sensing using a Fringe Projection Profilometry (FPP) module, with selective triggering of a depth completion module where FPP fails, and integrates this module with a lightweight, real-time instance segmentation network for scene understanding and critical component localization. By utilizing the same FPP camera-projector system for both our depth sensing and component localization modules, our depth maps and derived 3D geometry are inherently pixel-wise aligned with the segmentation masks without registration, providing an advantage over RGB-D perception systems common in industrial sensing. We optimize both our trained depth completion and instance segmentation networks for deployment-oriented inference. The proposed system achieves a box mAP@50 of 0.960 and mask mAP@50 of 0.957 for instance segmentation, while the selected depth completion configuration with the Depth Anything V2 Base backbone achieves an RMSE of 2.317 mm and MAE of 1.836 mm; the Platter Facing learned inference stack achieves a combined latency of 12.86 ms and a throughput of 77.7 frames per second (FPS) on the evaluation workstation. Finally, we adopt a sim-to-real transfer learning approach to augment our physical dataset. The proposed perception pipeline provides both high-fidelity semantic and spatial data that can be valuable for downstream robotic disassembly. The synthetic dataset developed for HDD instance segmentation will be made publicly available.

2604.17229 2026-04-21 cs.AI

Yanasse: Finding New Proofs from Deep Vision's Analogies, Part 1

Alexandre Linhares

2604.17228 2026-04-21 cs.LG

Revisiting Auxiliary Losses for Conditional Depth Routing: An Empirical Study

Qingwei Lin

Comments 23 pages, 4 figures. Preprint. Controlled empirical study with 3-seed runs at 157.5M parameters; includes a negative result on oracle-style utility/rank supervision for conditional depth routing

Abstract

Conditional depth execution routes a subset of tokens through a lightweight cheap FFN while the remainder execute the standard full FFN at each controlled layer. The central difficulty is gate training: the gate decision must propagate through many layers before it influences the language modeling (LM) loss, so the resulting gradients are weak and noisy. Auxiliary losses are commonly stacked to stabilise training, yet the interactions among them -- particularly between a predictive auxiliary and explicit score supervision -- have not been systematically compared under controlled conditions. We evaluate two gate designs under a 157.5M-parameter decoder-only model with controller-only training, 50% full-path budget, and 3-seed runs on a fineweb-edu subset. The MLP gate (G1) maps the current hidden state to a utility score; the JEPA-guided gate (G3) adds an action-conditional predictor that forecasts, in a low-dimensional latent space, the outcome of executing full vs. cheap per token, aligned against a fixed target head. Under the standard recipe with oracle-style utility regression and pairwise rank supervision (util/rank), G3 improves early-to-mid optimisation over G1 in 3/3 seeds (lower avg LM, faster threshold hits, ~10.3x lower grad norms), with 20k-step endpoint LM within a 0.005 heuristic reference. A key finding (ablation A3): jointly removing util/rank improves best/avg LM and threshold-hit speed in 3/3 seeds for both gates, and the early-to-mid advantage of G3 over G1 disappears. We trace this to an off-policy oracle label that assumes all subsequent layers execute full, whereas gated execution routes only a fraction through full -- making util/rank net-negative under the current recipe. Removing util/rank also cuts the training FLOPs proxy from ~1.53x to ~1.07x full-only (2.87h to 1.75h on a V100-32GB, ~39%). Conclusions are scoped to the studied regime.
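
The routing idea in this abstract can be illustrated with a minimal sketch. It assumes a top-k rule for enforcing the 50% full-path budget; the abstract does not say how the budget is enforced, so this is an assumption. It also uses scalar stand-ins for hidden states. The names `route_tokens`, `full_ffn`, and `cheap_ffn` are hypothetical and not taken from the paper.

```python
from typing import Callable, List, Sequence


def route_tokens(
    hidden: Sequence[float],
    scores: Sequence[float],
    full_ffn: Callable[[float], float],
    cheap_ffn: Callable[[float], float],
    budget: float = 0.5,
) -> List[float]:
    """Send the top `budget` fraction of tokens (by gate score) through full_ffn.

    Assumption: the budget is enforced by top-k selection over gate scores.
    """
    n = len(hidden)
    k = int(n * budget)  # number of tokens allowed on the full path
    # Indices sorted by descending score; ties are broken by position.
    order = sorted(range(n), key=lambda i: (-scores[i], i))
    full_idx = set(order[:k])
    return [
        full_ffn(h) if i in full_idx else cheap_ffn(h)
        for i, h in enumerate(hidden)
    ]


# Toy stand-ins: scalars instead of hidden vectors (assumption for brevity).
hidden = [1.0, 2.0, 3.0, 4.0]
scores = [0.9, 0.1, 0.8, 0.2]
out = route_tokens(
    hidden,
    scores,
    full_ffn=lambda h: h * 10,   # placeholder for the standard full FFN
    cheap_ffn=lambda h: h + 1,   # placeholder for the lightweight FFN
)
print(out)  # [10.0, 3.0, 30.0, 5.0]
```

Here tokens 0 and 2 have the highest gate scores, so they take the full path and the other two take the cheap path.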

2604.17225 2026-04-21 cs.CL

A Multi-Agent Approach for Claim Verification from Tabular Data Documents

Rudra Ranajee Saha, Laks V. S. Lakshmanan, Raymond T. Ng

2604.17224 2026-04-21 cs.LG stat.ML

LASER: Low-Rank Activation SVD for Efficient Recursion

Ege Çakar, Ketan Ali Raghu, Lia Zheng

Comments Accepted to the Latent and Implicit Thinking Workshop at ICLR 2026

2604.17222 2026-04-21 cs.CV cs.AI eess.SP

Region-Affinity Attention for Whole-Slide Breast Cancer Classification in Deep Ultraviolet Imaging

Nagur Shareef Shaik, Teja Krishna Cherukuri, Dong Hye Ye

Comments Accepted at the IEEE Engineering in Medicine and Biology Society Annual International Conference (Proceedings of the 48th International Conference), 2026

2604.17217 2026-04-21 cs.CV cs.AI

Cross-Modal Attention Analysis and Optimization in Vision-Language Models: A Study on Visual Reliability

Lijie Zhou

2604.17215 2026-04-21 cs.LG

Continual Safety Alignment via Gradient-Based Sample Selection

Thong Bach, Dung Nguyen, Thao Minh Le, Truyen Tran

Comments 18 pages

Journal ref ACL 2026 (Findings)

2604.17214 2026-04-21 cs.AI

Beyond the Basics: Leveraging Large Language Model for Fine-Grained Medical Entity Recognition

Nwe Ni Win, Jim Basilakis, Steven Thomas, Seyhan Yazar, Laura Pierce, Stephanie Liu, Paul M. Middleton, Nasser Ghadiri, X. Rosalind Wang

2604.17212 2026-04-21 cs.RO

Planning Smooth and Safe Control Laws for a Unicycle Robot Among Obstacles

Aref Amiri, Basak Sakcak, Steven M. LaValle

Comments This work has been accepted for publication in the 2026 European Control Conference (ECC)

2604.17211 2026-04-21 cs.CV

EmbodiedHead: Real-Time Listening and Speaking Avatar for Conversational Agents

Yu Zhang, Kaiyuan Shen, Yang Li

Comments 24 pages

2604.17210 2026-04-21 cs.LG

Guardrails in Logit Space: Safety Token Regularization for LLM Alignment

Thong Bach, Truyen Tran

Comments 10 pages, 3 figures

2604.17209 2026-04-21 cs.CV cs.AI eess.SP

DREAM: Dynamic Retinal Enhancement with Adaptive Multi-modal Fusion for Expert Precision Medical Report Generation

Nagur Shareef Shaik, Teja Krishna Cherukuri, Dong Hye Ye

Comments Accepted at the IEEE Engineering in Medicine and Biology Society Annual International Conference (Proceedings of the 48th International Conference), 2026

2604.17208 2026-04-21 cs.CV cs.AI

CDSA-Net: Collaborative Decoupling of Vascular Structure and Background for High-Fidelity Coronary Digital Subtraction Angiography

Si Li, Chen-Kai Hu, Zhenhuan Lyu, Yuanqing He

2604.17207 2026-04-21 cs.LG cs.AI cs.CC cs.CL

Demystifying the unreasonable effectiveness of online alignment methods

Enoch Hyunwook Kang

2604.17206 2026-04-21 cs.CV

SciDraw-6K: A Multilingual Scientific Illustration Dataset Generated by Google Gemini

Davie Chen

Comments 9 pages, 5 figures. Dataset: https://huggingface.co/datasets/SciDrawAI/SciDraw-6K. Code: https://github.com/SciDrawAI/scidraw-6k

2604.17200 2026-04-21 cs.CL

Calibrating Model-Based Evaluation Metrics for Summarization

Hongye Liu, Dhanajit Brahma, Ricardo Henao

2604.17199 2026-04-21 cs.RO cs.SY eess.SY

Modeling, Control and Self-sensing of Dielectric Elastomer Soft Actuators: A Review

Y. Zhao, G. Meng

2604.17197 2026-04-21 cs.CL

Learning to Control Summaries with Score Ranking

Hongye Liu, Liang Ding, Ricardo Henao

2604.17195 2026-04-21 cs.CV

DreamShot: Personalized Storyboard Synthesis with Video Diffusion Prior

Junjia Huang, Binbin Yang, Pengxiang Yan, Jiyang Liu, Bin Xia, Zhao Wang, Yitong Wang, Liang Lin, Guanbin Li

Comments Accepted by CVPR 2026 as a Highlight paper

2604.17191 2026-04-21 cs.LG

Do LLM-derived graph priors improve multi-agent coordination?

Nikunj Gupta, Rajgopal Kannan, Viktor Prasanna

2604.17190 2026-04-21 cs.CV

LookasideVLN: Direction-Aware Aerial Vision-and-Language Navigation

Yuwei Ning, Ganlong Zhao, Yipeng Qin, Si Liu, Yang Liu, Liang Lin, Guanbin Li

Comments Accepted by CVPR 2026

2604.17189 2026-04-21 cs.RO

Shepherding UAV Swarm with Action Prediction Based on Movement Constraints

Yusuke Tsunoda, Yusuke Goto, Takao Sato

Abstract

In this study, we propose a new sheepdog-inspired control method for a swarm of small unmanned aerial vehicles (UAVs), which predicts the swarm behavior while explicitly accounting for the motion constraints of real robots. Sheepdog-inspired guidance control refers to a framework in which a small number of navigator agents (sheepdog agents) indirectly drive a large number of autonomous agents (a flock of sheep agents) so as to steer the group toward a target position. In conventional studies on sheepdog-inspired guidance, both types of agents have typically been modeled as point masses, and the guidance law for the navigator agents has been designed using simple interaction vectors based on the instantaneous relative positions between the agents. However, when implementing such methods on real robots such as drones, it is necessary to consider each agent's motion constraints, including upper bounds on velocity and acceleration. Moreover, we argue that guidance can be made more efficient by predicting the future behavior of the autonomous swarm that is observable to the navigator agents. To this end, we propose a three-dimensional guidance control law based on behavior prediction of autonomous agents under motion constraints, inspired by the Dynamic Window Approach (DWA). At each control cycle, the navigator agent generates a set of feasible motion candidates that satisfy its motion constraints, and predicts the short-horizon swarm evolution using an internal model of the autonomous agents maintained within the navigator agent. The motion candidates are then evaluated according to criteria such as the progress velocity toward the target, the positioning strategy with respect to the swarm, and safety margins, and the optimal motion is selected to achieve safe and efficient guidance. Numerical simulation results demonstrate the effectiveness of the proposed guidance control law.
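
The candidate-generation and selection loop in this abstract can be sketched in one dimension. This is a simplified illustration in the spirit of the Dynamic Window Approach, not the authors' control law. It omits the internal swarm-prediction model, and the scoring function is a placeholder assumption, since the abstract lists the evaluation criteria without giving formulas. The names `dynamic_window`, `select_motion`, and `score`, and all numeric values, are hypothetical.

```python
from typing import Callable, List


def dynamic_window(
    v: float, v_max: float, a_max: float, dt: float, samples: int = 5
) -> List[float]:
    """Velocities reachable in one control cycle under speed and acceleration bounds."""
    lo = max(-v_max, v - a_max * dt)
    hi = min(v_max, v + a_max * dt)
    step = (hi - lo) / (samples - 1)
    return [lo + i * step for i in range(samples)]


def select_motion(candidates: List[float], score: Callable[[float], float]) -> float:
    """Return the candidate with the highest score."""
    return max(candidates, key=score)


# Hypothetical 1D setup: navigator at x = 0 moving at 1.0 m/s, target at x = 10.
x, v, goal = 0.0, 1.0, 10.0
v_max, a_max, dt, horizon = 2.0, 1.0, 0.5, 2.0


def score(cand_v: float) -> float:
    # Placeholder criterion: negative distance to the target after a short rollout.
    predicted_x = x + cand_v * horizon
    return -abs(goal - predicted_x)


cands = dynamic_window(v, v_max, a_max, dt)
best = select_motion(cands, score)
print(cands)  # [0.5, 0.75, 1.0, 1.25, 1.5]
print(best)   # 1.5
```

A fuller version would replace `score` with a weighted combination of the criteria the abstract names (progress velocity toward the target, positioning relative to the swarm, and safety margins) evaluated on a predicted swarm state.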

2604.17178 2026-04-21 cs.CL

Cognitive Policy-Driven LLM for Diagnosis and Intervention of Cognitive Distortions in Emotional Support Conversation

Lin Zhong, Renjin Zhu, Shujuan Ma, Jinhao Cui, Lingzhi Wang, Hao Chen, Qing Liao

Comments Accepted at ACL 2026 (Main Conference)

2604.17177 2026-04-21 cs.LG

Decomposing the Depth Profile of Fine-Tuning

Jayadev Billa

Comments 25 pages incl. 13 appendix pages. 1 figure, 19 tables

2604.17174 2026-04-21 cs.CL

Modeling Multi-Dimensional Cognitive States in Large Language Models under Cognitive Crowding

Lin Zhong, Siyu Zhu, Zizhen Yuan, Jinhao Cui, Xinyang Zhao, Lingzhi Wang, Hao Chen, Qing Liao

Comments Accepted at ACL 2026
