arXivDaily每日学术速递，同步arXiv全量数据，AI总结、翻译，覆盖人工智能、机器人、计算机、金融、统计学、数学、物理学、生物学、经济学、电气&系统等方向。

2604.12660 2026-04-15 cs.AI

Broadening the Applicability of Conditional Syntax Splitting for Reasoning from Conditional Belief Bases

Lars-Phillip Spiegel, Jonas Haldimann, Jesse Heyninck, Gabriele Kern-Isberner, Christoph Beierle

详情

英文摘要

In nonmonotonic reasoning from conditional belief bases, an inference operator satisfying syntax splitting postulates allows for taking only the relevant parts of a belief base into account, provided that the belief base splits into subbases based on disjoint signatures. Because such disjointness is rare in practice, safe conditional syntax splitting has been proposed as a generalization of syntax splitting, allowing the conditionals in the subbases to share some atoms. Recently this overlap of conditionals has been shown to be limited to trivial, self-fulfilling conditionals. In this article, we propose a generalization of safe conditional syntax splittings that broadens the applicability of splitting postulates. In contrast to safe conditional syntax splitting, our generalized notion supports syntax splittings of a belief base Δ where the subbases of Δ may share atoms and nontrivial conditionals. We illustrate how this new notion overcomes limitations of previous splitting concepts, and we identify genuine splittings, separating them from simple splittings that do not provide benefits for inductive inference from Δ. We introduce adjusted inference postulates based on our generalization of conditional syntax splitting, and we evaluate several popular inductive inference operators with respect to these postulates. Furthermore, we show that, while every inductive inference operator satisfying generalized conditional syntax splitting also satisfies conditional syntax splitting, the reverse does not hold.

URL PDF HTML ☆

赞 0 踩 0

2604.12659 2026-04-15 cs.LG cs.CL

Do VLMs Truly "Read" Candlesticks? A Multi-Scale Benchmark for Visual Stock Price Forecasting

Kaiqi Hu, Linda Xiao, Shiyue Xu, Ziyi Tang, Mingwen Liu

Comments We evaluate whether VLMs can comprehend multi-scale visual stock price data like human analysts with a proposed benchmark, identifying current VLMs' weak predictive power, significant biases, and limited sensitivity to forecast horizons and prompts

2604.12655 2026-04-15 cs.LG cs.CR

Robust Semi-Supervised Temporal Intrusion Detection for Adversarial Cloud Networks

Anasuya Chattopadhyay, Daniel Reti, Hans D. Schotten

Comments This work has been accepted for publication in IEEE 2026 EuCNC & 6G Summit. This is a preprint version. The final published version will be available via IEEE Xplore

2604.12651 2026-04-15 cs.CL cs.AI

Learning Chain Of Thoughts Prompts for Predicting Entities, Relations, and even Literals on Knowledge Graphs

Alkid Baci, Luke Friedrichs, Caglar Demir, N'Dah Jean Kouagou, Axel-Cyrille Ngonga Ngomo

2604.12650 2026-04-15 cs.CV cs.MM

Listening Deepfake Detection: A New Perspective Beyond Speaking-Centric Forgery Analysis

Miao Liu, Fangda Wei, Jing Wang, Xinyuan Qian

Comments Submitted to ACMMM 2026

2604.12648 2026-04-15 cs.LG cs.AI

TimeSAF: Towards LLM-Guided Semantic Asynchronous Fusion for Time Series Forecasting

Fan Zhang, Shiming Fan, Hua Wang

2604.12647 2026-04-15 cs.SD cs.CL

Adaptive Test-Time Scaling for Zero-Shot Respiratory Audio Classification

Tsai-Ning Wang, Herman Teun den Dekker, Lin-Lin Chen, Neil Zeghidour, Aaqib Saeed

Comments Accepted at AHLI CHIL 2026

2604.12634 2026-04-15 cs.AI cs.CL cs.LG cs.MA

RPRA: Predicting an LLM-Judge for Efficient but Performant Inference

Dylan R. Ashley, Gaël Le Lan, Changsheng Zhao, Naina Dhingra, Zhipeng Cai, Ernie Chang, Mingchen Zhuge, Yangyang Shi, Vikas Chandra, Jürgen Schmidhuber

Comments 10 pages in main text + 6 pages of references + 36 pages of appendices, 12 figures in main text + 37 figures in appendices, 2 tables in main text + 3 table in appendices, 13 prompts in appendices

2604.12633 2026-04-15 cs.CL

Multilingual Multi-Label Emotion Classification at Scale with Synthetic Data

Vadim Borisov

2604.12632 2026-04-15 cs.LG cs.AI

Calibration-Aware Policy Optimization for Reasoning LLMs

Ziqi Wang, Xingzhou Lou, Meiqi Wu, Zhengqi Wen, Junge Zhang

Comments Published as a conference paper at ACL 2026

2604.12630 2026-04-15 cs.CV cs.CL

GeoAlign: Geometric Feature Realignment for MLLM Spatial Reasoning

Zhaochen Liu, Limeng Qiao, Guanglu Wan, Tingting Jiang

2604.12627 2026-04-15 cs.AI

KnowRL: Boosting LLM Reasoning via Reinforcement Learning with Minimal-Sufficient Knowledge Guidance

Linhao Yu, Tianmeng Yang, Siyu Ding, Renren Jin, Naibin Gu, Xiangzhao Hao, Shuaiyi Nie, Deyi Xiong, Weichong Yin, Yu Sun, Hua Wu

2604.12626 2026-04-15 cs.RO cs.CV

Habitat-GS: A High-Fidelity Navigation Simulator with Dynamic Gaussian Splatting

Ziyuan Xia, Jingyi Xu, Chong Cui, Yuanhong Yu, Jiazhao Zhang, Qingsong Yan, Tao Ni, Junbo Chen, Xiaowei Zhou, Hujun Bao, Ruizhen Hu, Sida Peng

Comments Project page: https://zju3dv.github.io/habitat-gs/

2604.12622 2026-04-15 cs.CV cs.AI cs.NI

Efficient Semantic Image Communication for Traffic Monitoring at the Edge

Damir Assylbek, Nurmukhammed Aitymbetov, Marko Ristin, Dimitrios Zorbas

2604.12616 2026-04-15 cs.AI cs.MM

Every Picture Tells a Dangerous Story: Memory-Augmented Multi-Agent Jailbreak Attacks on VLMs

Jianhao Chen, Haoyang Chen, Hanjie Zhao, Haozhe Liang, Tieyun Qian

Comments 12 pages, 9 figures

详情

英文摘要

The rapid evolution of Vision-Language Models (VLMs) has catalyzed unprecedented capabilities in artificial intelligence; however, this continuous modal expansion has inadvertently exposed a vastly broadened and unconstrained adversarial attack surface. Current multimodal jailbreak strategies primarily focus on surface-level pixel perturbations and typographic attacks or harmful images; however, they fail to engage with the complex semantic structures intrinsic to visual data. This leaves the vast semantic attack surface of original, natural images largely unscrutinized. Driven by the need to expose these deep-seated semantic vulnerabilities, we introduce \textbf{MemJack}, a \textbf{MEM}ory-augmented multi-agent \textbf{JA}ilbreak atta\textbf{CK} framework that explicitly leverages visual semantics to orchestrate automated jailbreak attacks. MemJack employs coordinated multi-agent cooperation to dynamically map visual entities to malicious intents, generate adversarial prompts via multi-angle visual-semantic camouflage, and utilize an Iterative Nullspace Projection (INLP) geometric filter to bypass premature latent space refusals. By accumulating and transferring successful strategies through a persistent Multimodal Experience Memory, MemJack maintains highly coherent extended multi-turn jailbreak attack interactions across different images, thereby improving the attack success rate (ASR) on new images. Extensive empirical evaluations across full, unmodified COCO val2017 images demonstrate that MemJack achieves a 71.48\% ASR against Qwen3-VL-Plus, scaling to 90\% under extended budgets. Furthermore, to catalyze future defensive alignment research, we will release \textbf{MemJack-Bench}, a comprehensive dataset comprising over 113,000 interactive multimodal jailbreak attack trajectories, establishing a vital foundation for developing inherently robust VLMs.

URL PDF HTML ☆

赞 0 踩 0

2604.12615 2026-04-15 cs.AI

DeepTest Tool Competition 2026: Benchmarking an LLM-Based Automotive Assistant

Lev Sorokin, Ivan Vasilev, Samuele Pasini

Comments Published in the proceedings of the DeepTest workshop at the 48th International Conference on Software Engineering (ICSE) 2026

2604.12610 2026-04-15 cs.CL

Transforming External Knowledge into Triplets for Enhanced Retrieval in RAG of LLMs

Xudong Wang, Chaoning Zhang, Qigan Sun, Zhenzhen Huang, Chang Lu, Sheng Zheng, Zeyu Ma, Caiyan Qin, Yang Yang, Hengtao Shen

Comments 12 pages, 5 figures

2604.12596 2026-04-15 cs.LG cs.AI

KumoRFM-2: Scaling Foundation Models for Relational Learning

Valter Hudovernik, Federico López, Vid Kocijan, Akihiro Nitta, Jan Eric Lenssen, Jure Leskovec, Matthias Fey

2604.12591 2026-04-15 cs.RO

Machine Learning-Based Real-Time Detection of Compensatory Trunk Movements Using Trunk-Wrist Inertial Measurement Units

Jannis Gabler, Clément Lhoste, Max Quast, Laura Mayrhuber, Andrea Ronco, Olivier Lambercy, Paulius Viskaitis, Dane Donegan

Comments This manuscript has been submitted to IEEE Transactions on Neural Systems and Rehabilitation Engineering for possible publication. This version is a preprint and has not undergone peer review

详情

英文摘要

Compensatory trunk movements (CTMs) are commonly observed after stroke and can lead to maladaptive movement patterns, limiting targeted training of affected structures. Objective, continuous detection of CTMs during therapy and activities of daily living remains challenging due to the typically complex measurements setups required, as well as limited applicability for real-time use. This study investigates whether a two-inertial measurement unit configuration enables reliable, real-time CTM detection using machine learning. Data were collected from ten able-bodied participants performing activities of daily living under simulated impairment conditions (elbow brace restricting flexion-extension, resistance band inducing flexor-synergy-like patterns), with synchronized optical motion capture (OMC) and manually annotated video recordings serving as reference. A systematic location-reduction analysis using OMC identified wrist and trunk kinematics as a minimal yet sufficient set of anatomical sensing locations. Using an extreme gradient boosting classifier (XGBoost) evaluated with leave-one-subject-out cross-validation, our two-IMU model achieved strong discriminative performance (macro-F1 = 0.80 +/- 0.07, MCC = 0.73 +/- 0.08; ROC-AUC > 0.93), with performance comparable to an OMC-based model and prediction timing suitable for real-time applications. Explainability analysis revealed dominant contributions from trunk dynamics and wrist-trunk interaction features. In preliminary evaluation using recordings from four participants with neurological conditions, the model retained good discriminative capability (ROC-AUC ~ 0.78), but showed reduced and variable threshold-dependent performance, highlighting challenges in clinical generalization. These results support sparse wearable sensing as a viable pathway toward scalable, real-time monitoring of CTMs during therapy and daily living.

URL PDF HTML ☆

赞 0 踩 0

2604.12582 2026-04-15 cs.CV

Relaxing Anchor-Frame Dominance for Mitigating Hallucinations in Video Large Language Models

Zijian Liu, Sihan Cao, Pengcheng Zheng, Kuien Liu, Caiyan Qin, Xiaolin Qin, Jiwei Wei, Chaoning Zhang

2604.12575 2026-04-15 cs.CV

StructDiff: A Structure-Preserving and Spatially Controllable Diffusion Model for Single-Image Generation

Yinxi He, Kang Liao, Chunyu Lin, Tianyi Wei, Yao Zhao

Comments Accepted by IEEE Transactions on Multimedia (Regular Paper)

2604.12574 2026-04-15 cs.CV

Cross-Modal Knowledge Distillation for PET-Free Amyloid-Beta Detection from MRI

Francesco Chiumento, Julia Dietlmeier, Ronan P. Killeen, Kathleen M. Curran, Noel E. O'Connor, Mingming Liu

Comments Accepted to CVPR Workshops 2026 (PHAROS-AIF-MIH)

2604.12573 2026-04-15 cs.AI

IDEA: An Interpretable and Editable Decision-Making Framework for LLMs via Verbal-to-Numeric Calibration

Yanji He, Yuxin Jiang, Yiwen Wu, Bo Huang, Jiaheng Wei, Wei Wang

Comments Accepted to ACL 2026

2604.12568 2026-04-15 cs.CV

Evolution-Inspired Sample Competition for Deep Neural Network Optimization

Ying Zheng, Yiyi Zhang, Yi Wang, Lap-Pui Chau

2604.12565 2026-04-15 cs.RO cs.CV

Scalable Trajectory Generation for Whole-Body Mobile Manipulation

Yida Niu, Xinhai Chang, Xin Liu, Ziyuan Jiao, Yixin Zhu

详情

英文摘要

Robots deployed in unstructured environments must coordinate whole-body motion -- simultaneously moving a mobile base and arm -- to interact with the physical world. This coupled mobility and dexterity yields a state space that grows combinatorially with scene and object diversity, demanding datasets far larger than those sufficient for fixed-base manipulation. Yet existing acquisition methods, including teleoperation and planning, are either labor-intensive or computationally prohibitive at scale. The core bottleneck is the lack of a scalable pipeline for generating large-scale, physically valid, coordinated trajectory data across diverse embodiments and environments. Here we introduce AutoMoMa, a GPU-accelerated framework that unifies AKR modeling, which consolidates base, arm, and object kinematics into a single chain, with parallelized trajectory optimization. AutoMoMa achieves 5,000 episodes per GPU-hour (over $80\times$ faster than CPU-based baselines), producing a dataset of over 500k physically valid trajectories spanning 330 scenes, diverse articulated objects, and multiple robot embodiments. Prior datasets were forced to compromise on scale, diversity, or kinematic fidelity; AutoMoMa addresses all three simultaneously. Training downstream IL policies further reveals that even a single articulated-object task requires tens of thousands of demonstrations for SOTA methods to reach $\approx 80\%$ success, confirming that data scarcity -- not algorithmic limitations -- has been the binding constraint. AutoMoMa thus bridges high-performance planning and reliable IL-based control, providing the infrastructure previously missing for coordinated mobile manipulation research. By making large-scale, kinematically valid training data practical, AutoMoMa showcases generalizable whole-body robot policies capable of operating in the diverse, unstructured settings of the real world.

URL PDF HTML ☆

赞 0 踩 0

2604.12559 2026-04-15 cs.CL

FABLE: Fine-grained Fact Anchoring for Unstructured Model Editing

Peng Wang, Biyu Zhou, Xuehai Tang, Jizhong Han, Songlin Hu

Comments ACL 2026 findings

2604.11730 2026-04-15 cs.CV cs.HC cs.LG

Ambivalence/Hesitancy Recognition in Videos for Personalized Digital Health Interventions

Manuela González-González, Soufiane Belharbi, Muhammad Osama Zeeshan, Masoumeh Sharafi, Muhammad Haseeb Aslam, Lorenzo Sia, Nicolas Richet, Marco Pedersoli, Alessandro Lameiras Koerich, Simon L Bacon, Eric Granger

Comments 13 pages, 3 figures. arXiv admin note: substantial text overlap with arXiv:2505.19328

详情

英文摘要

Using behavioural science, health interventions focus on behaviour change by providing a framework to help patients acquire and maintain healthy habits that improve medical outcomes. In-person interventions are costly and difficult to scale, especially in resource-limited regions. Digital health interventions offer a cost-effective approach, potentially supporting independent living and self-management. Automating such interventions, especially through machine learning, has gained considerable attention recently. Ambivalence and hesitancy (A/H) play a primary role for individuals to delay, avoid, or abandon health interventions. A/H are subtle and conflicting emotions that place a person in a state between positive and negative evaluations of a behaviour, or between acceptance and refusal to engage in it. They manifest as affective inconsistency across modalities or within a modality, such as language, facial, vocal expressions, and body language. While experts can be trained to recognize A/H, integrating them into digital health interventions is costly and less effective. Automatic A/H recognition is therefore critical for the personalization and cost-effectiveness of digital health interventions. Here, we explore the application of deep learning models for A/H recognition in videos, a multi-modal task by nature. In particular, this paper covers three learning setups: supervised learning, unsupervised domain adaptation for personalization, and zero-shot inference via large language models (LLMs). Our experiments are conducted on the unique and recently published BAH video dataset for A/H recognition. Our results show limited performance, suggesting that more adapted multi-modal models are required for accurate A/H recognition. Better methods for modeling spatio-temporal and multimodal fusion are necessary to leverage conflicts within/across modalities.

URL PDF HTML ☆

赞 0 踩 0

2604.11435 2026-04-15 cs.CL cs.AI cs.IR cs.LG

Think Before you Write: QA-Guided Reasoning for Character Descriptions in Books

Argyrios Papoudakis, Mirella Lapata, Frank Keller

Comments 20 pages, 16 tables, 1 figure

2604.10992 2026-04-15 cs.CV

ArtiCAD: Articulated CAD Assembly Design via Multi-Agent Code Generation

Yuan Shui, Yandong Guan, Zhanwei Zhang, Juncheng Hu, Jing Zhang, Dong Xu, Qian Yu

2604.10929 2026-04-15 cs.RO

Ro-SLM: Onboard Small Language Models for Robot Task Planning and Operation Code Generation

Wenhao Wang, Yanyan Li, Long Jiao, Jiawei Yuan

Comments 25 pages, 2 figures, ACL 2026