arXivDaily: arXiv daily academic digest, updated Monday through Friday
2605.12852 2026-05-14 cs.LG q-bio.QM

Multitask Multimodal Fusion with Tabular Foundation Models for Peak and Durability Prediction of Pertussis Booster Response

Divya Sitani

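Affiliations Berlin, Germany

AI summary This study aims to jointly predict the peak and the durability of the immune response after pertussis booster vaccination, two phases driven by distinct biological mechanisms. It proposes a multi-task multimodal fusion model that combines frozen TabPFN-v2 encoders, a dual-label contrastive loss, missingness-calibrated modality dropout, and attention-based fusion to handle heterogeneous modalities, missing values, and weakly coupled tasks. Experiments show that the model outperforms conventional baselines on both prediction tasks and that the results agree with known immunology, revealing modality-specific contributions to peak and durability prediction.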

Comments 22 pages, 8 figures, 4 tables. Code available at https://github.com/Divya1205/cmi-pb-multitask

English abstract

Pertussis booster vaccination produces immune responses that vary widely across individuals in both peak magnitude and long-term durability. These two phases are governed by partly distinct biological compartments: peak reflects acute B-cell activation and antibody secretion, while durability reflects the establishment of long-term humoral memory. Yet most computational models target only one, missing the full boost-and-wane trajectory. Jointly predicting both is non-trivial because the two endpoints are biologically dissociated rather than redundant; samples are small, modalities are heterogeneous with structured missingness, and the two tasks rely on different measurement windows. We propose a multi-task contrastive multimodal fusion architecture combining frozen TabPFN-v2 per-modality encoders, a dual-label supervised contrastive loss that treats two subjects as a positive pair if they agree on the Task 1 label or the Task 2 label, modality dropout calibrated to empirical missingness, and missingness-masked attention fusion. Applied to a curated subset of the CMI-PB pertussis booster dataset (n = 158 subjects, four modalities, 44.9% with at least one modality missing; Spearman r = -0.58 between peak and durability, n = 96), the model achieves test AUROC 0.797 (95% CI [0.621, 0.948]) for peak response and 0.755 (95% CI [0.519, 0.945]) for durability, with both significant under joint label permutation (N = 1000; p = 0.002 and p = 0.045). Across logistic regression, XGBoost, and MLP baselines on raw features and on TabPFN embeddings, the proposed model is the only one whose 95% CIs lie above chance on both tasks simultaneously. Per-modality contribution analyses recover task-specific modality contributions consistent with the underlying immunology: peak prediction is carried by cytokine signatures, while durability is carried by baseline antibody features.
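
The positive-pair rule of the dual-label contrastive loss can be illustrated with a small sketch. It covers only the pairing rule stated in the abstract (two subjects are positives if they agree on the Task 1 label or the Task 2 label), not the loss itself. The function name `dual_label_positive_mask` is hypothetical, and the sketch assumes both labels are observed for every subject, which may not hold in the actual dataset.

```python
from typing import List


def dual_label_positive_mask(task1: List[int], task2: List[int]) -> List[List[bool]]:
    """Return mask[i][j] = True when subjects i and j (i != j) share the
    Task 1 label or the Task 2 label.

    Assumption: both labels are present for every subject.
    """
    n = len(task1)
    if len(task2) != n:
        raise ValueError("label lists must have the same length")
    return [
        [i != j and (task1[i] == task1[j] or task2[i] == task2[j]) for j in range(n)]
        for i in range(n)
    ]


# Toy labels for four subjects (illustrative values, not from the paper).
peak = [1, 1, 0, 0]
durability = [0, 1, 1, 0]
mask = dual_label_positive_mask(peak, durability)
```

Under this rule, subjects 0 and 3 count as positives even though their peak labels differ, because they share a durability label; an "and" rule would be far stricter and would yield fewer positives in a cohort of this size.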

2605.12851 2026-05-14 cs.CV cs.AI

PRISM: Perinuclear Ring-based Image Segmentation Method for Acute Lymphoblastic Leukemia Classification

Larissa Ferreira Rodrigues Moreira, Leonardo Gabriel Ferreira Rodrigues, Rodrigo Moreira, André Ricardo Backes

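Affiliations Institute of Exact and Technological Sciences; Federal University of Viçosa; School of Computer Science; Federal University of Uberlândia; Department of Computing; Federal University of São Carlos

AI summary This study addresses the challenges of analyzing peripheral blood smear images for Acute Lymphoblastic Leukemia (ALL) classification and proposes PRISM, a perinuclear ring-based image segmentation method. Instead of segmenting the cytoplasm contour, the method builds adaptive concentric zones around the nucleus, so robust cytoplasmic features can be extracted without accurate cell-boundary detection. Experiments show that, combined with a calibrated ensemble of traditional classifiers, the method reaches an accuracy of 98.46% and a precision-recall AUC of 0.9937.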

Comments Paper accepted for publication at the XXVI Simpósio Brasileiro de Computação Aplicada à Saúde (SBCAS 2026), Ouro Preto, MG, Brazil

English abstract

Automated analysis of peripheral blood smears for Acute Lymphoblastic Leukemia (ALL) is hindered by low contrast and substantial variability in cytoplasmic appearance, which complicate conventional membrane-based segmentation. We found that many recent approaches rely on heavy neural architectures and extensive training, but still struggle to generalize across staining and acquisition variability. To address these limitations, we propose the Perinuclear Ring-based Image Segmentation Method (PRISM), which replaces explicit cytoplasmic delineation with adaptive concentric zones constructed around the nucleus. These perinuclear regions enable the extraction of robust cytoplasmic descriptors by integrating color information with texture statistics derived from grey-level co-occurrence patterns, without requiring accurate cell-boundary detection. A calibrated stacking ensemble of traditional classifiers leverages these descriptors to achieve high performance, with an accuracy of 98.46% and a precision-recall AUC of 0.9937.
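
To make the idea of concentric zones around the nucleus concrete, the sketch below labels pixels by ring index given a binary nucleus mask. It is an illustration only: the fixed `ring_width`, the 8-connected distance, and the function names are assumptions made here, and the abstract does not say how PRISM adapts its zones.

```python
from collections import deque
from typing import List


def distance_from_nucleus(mask: List[List[int]]) -> List[List[int]]:
    """Multi-source BFS: 8-connected step count from each pixel to the
    nearest nucleus pixel (-1 everywhere if the mask has no nucleus)."""
    h, w = len(mask), len(mask[0])
    dist = [[-1] * w for _ in range(h)]
    queue = deque()
    for r in range(h):
        for c in range(w):
            if mask[r][c]:
                dist[r][c] = 0
                queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if 0 <= nr < h and 0 <= nc < w and dist[nr][nc] == -1:
                    dist[nr][nc] = dist[r][c] + 1
                    queue.append((nr, nc))
    return dist


def ring_labels(mask: List[List[int]], ring_width: int, n_rings: int) -> List[List[int]]:
    """Label each pixel with its ring index: 0 for the nucleus or anything
    beyond the last ring, k for the k-th ring outward (1 <= k <= n_rings)."""
    labels = []
    for row in distance_from_nucleus(mask):
        out = []
        for d in row:
            k = (d + ring_width - 1) // ring_width if d > 0 else 0
            out.append(k if 1 <= k <= n_rings else 0)
        labels.append(out)
    return labels


# Toy 5x5 image with a single nucleus pixel in the center.
nucleus = [[0] * 5 for _ in range(5)]
nucleus[2][2] = 1
rings = ring_labels(nucleus, ring_width=1, n_rings=2)
```

Color and texture descriptors would then be computed per ring label rather than inside a traced cell boundary, which is the property the abstract emphasizes.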

2605.12845 2026-05-14 cs.CV cs.AI

AssemblyBench: Physics-Aware Assembly of Complex Industrial Objects

Danrui Li, Jiahao Zhang, Bernhard Egger, Moitreya Chatterjee, Suhas Lohit, Tim K. Marks, Anoop Cherian

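Affiliations Rutgers, The State University of New Jersey; The Australian National University; Friedrich-Alexander-Universität Erlangen-Nürnberg; Mitsubishi Electric Research Laboratories (MERL)

AI summary This paper introduces AssemblyBench, a synthetic dataset of 2,789 industrial objects with multimodal assembly instructions, the corresponding 3D part models, and assembly trajectories, built to capture the complex shapes and assembly paths found in industrial assembly. It also proposes AssemblyDyno, a transformer-based model that jointly predicts assembly order and part trajectories. AssemblyDyno outperforms prior methods in assembly pose estimation and in trajectory feasibility, with feasibility evaluated through physics simulation.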

Comments Accepted at CVPR 2026

English abstract

Assembling objects from parts requires understanding multimodal instructions, linking them to 3D components, and predicting physically plausible 6-DoF motions for each assembly step. Existing datasets focus on simplified scenarios, overlooking shape complexities and assembly trajectories in industrial assemblies. We introduce AssemblyBench, a synthetic dataset of 2,789 industrial objects with multimodal instruction manuals, corresponding 3D part models, and part assembly trajectories. We also propose a transformer-based model, AssemblyDyno, which uses the instructional manual and the 3D shape of each part to jointly predict assembly order and part assembly trajectories. AssemblyDyno outperforms prior works in both assembly pose estimation and trajectory feasibility, where the latter is evaluated by our physics-based simulations.

2605.12843 2026-05-14 cs.LG cs.AI

Bayesian Model Merging

Kaiyang Li, Shaobo Han, Qing Su, Shihao Ji

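Affiliations School of Computing, University of Connecticut; Optical Networking and Sensing, NEC Labs America

AI summary This paper proposes Bayesian Model Merging (BMM), a method for merging multiple task-specific expert models into a single model without joint retraining. BMM uses a bi-level optimization framework: the inner level performs activation-based Bayesian regression under a strong prior induced by an anchor model, which yields an efficient closed-form solution, and the outer level uses Bayesian optimization to search module-specific hyperparameters globally. BMM also reveals a key alignment between activation statistics and task vectors, which enables a data-free variant that needs no auxiliary data. Experiments show that BMM outperforms the baselines it is compared against across benchmarks, including multi-task merging in vision and language.

English abstract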

Model merging aims to combine multiple task-specific expert models into a single model without joint retraining, offering a practical alternative to multi-task learning when data access or computational budget is limited. Existing methods, however, face two key limitations: (1) they overlook the valuable inductive bias of strong anchor models and estimate the merged weights from scratch, and (2) they rely on a shared hyperparameter setting across different modules of the network, lacking a global optimization strategy. This paper introduces Bayesian Model Merging (BMM), a plug-and-play bi-level optimization framework, where the inner level formulates the model merging as an activation-based Bayesian regression under a strong prior induced by an anchor model, yielding an efficient closed-form solution; and the outer level leverages a Bayesian optimization procedure to search module-specific hyperparameters globally based on a small validation set. Furthermore, we reveal a key alignment between activation statistics and task vectors, enabling us to derive a data-free variant of BMM that estimates the Gram matrix for regression without any auxiliary data. Across extensive benchmarks, including up to 20-task merging in vision and 5-task merging in language, BMM consistently outperforms all plug-and-play anchor baselines (e.g., TA, WUDI-Merging, and TSV). In particular, on the ViT-L/14 benchmark for 8-task merging, a single merged model reaches 95.1, closely matching the average performance of eight task-specific experts (95.8).
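
As a rough illustration of why a prior centered on an anchor model gives a closed-form solution, consider a single linear module. The notation below (input activations $X$, target outputs $Y$, anchor weights $W_0$, prior strength $\lambda$) is assumed here for illustration and is not taken from the paper; BMM's actual objective may differ.

```latex
\begin{aligned}
\hat{W} &= \arg\min_{W}\; \lVert W X - Y \rVert_F^2 + \lambda \lVert W - W_0 \rVert_F^2 \\
0 &= (\hat{W} X - Y)\, X^\top + \lambda\, (\hat{W} - W_0) \\
\hat{W} &= \bigl( Y X^\top + \lambda W_0 \bigr) \bigl( X X^\top + \lambda I \bigr)^{-1}
\end{aligned}
```

Here $X X^\top$ is the Gram matrix of the activations, the quantity a data-free variant would have to estimate without auxiliary data. As $\lambda \to \infty$ the solution returns to the anchor $W_0$, and as $\lambda \to 0$ it approaches the ordinary least-squares fit.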

2605.12838 2026-05-14 cs.AI

Multimodal Hidden Markov Models for Persistent Emotional State Tracking

Anamika Ragu, Aneesh Jonelagadda

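Affiliations Kaliber AI, San Mateo, California, USA

AI summary This paper proposes a lightweight hidden Markov model framework over multimodal emotion representations for tracking persistent emotional states in conversation. It applies sticky factorial HDP-HMMs to multimodal valence-arousal features derived from video, audio, and text, to capture the longer-lived emotional phases of a conversation. Experiments show that the model yields more interpretable regime sequences at far lower computational cost than LLM-based methods, and results on a clinical dataset support its use for recovering emotional phases and improving response quality.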

Comments 8 pages, 2 figures

English abstract

Tracking an interpretable emotional arc of a conversation via the sentiment of individual utterances processed as a whole is central to both understanding and guiding communication in applied, especially clinical, conversational contexts. Existing approaches to emotion recognition operate at the utterance level, obscuring the persistent phases that characterize real conversational dynamics. We propose a lightweight framework that models conversational emotion as a sequence of latent emotional regimes using sticky factorial HDP-HMMs over multimodal valence-arousal representations derived from simultaneous video, audio and textual input. We evaluate the quality of regime prediction using LLM-as-a-Judge, geometric, and temporal consistency metrics, demonstrating that the sticky HDP-HMM produces more interpretable regime sequences than the baseline Gaussian HMM at a fraction of the computational cost of LLM-based dialogue state tracking methods. In addition, Question-Answer experiments in a clinical dataset suggest that meaningful emotional phases can reliably be recovered from multimodal valence-arousal trajectories and used to improve the quality of LLM responses in unstable affective regimes via context augmentation. This framework thus opens a path toward interpretable, lightweight, and actionable analysis of conversational emotion dynamics at scale.

2605.12835 2026-05-14 cs.AI

PROMETHEUS: Automating Deep Causal Research Integrating Text, Data and Models

Sridhar Mahadevan

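Affiliations Adobe Research and University of Massachusetts, Amherst

AI summary PROMETHEUS is a framework that integrates text, data, and models into causal atlases in order to automate deep causal research. It builds families of local causal predictive-state models that form a navigable causal atlas and supports comparing and reconciling causal claims across regions. The paper demonstrates the framework on several real-world case studies, including extracting causal relationships from literature and evaluating counterfactuals against source data, with the aim of making causal reasoning more systematic and interpretable.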

Comments 27 pages

English abstract

Large language models can extract local causal claims from text, but those claims become more useful when organized as persistent, navigable world models rather than as flat summaries. We introduce PROMETHEUS, a framework that turns retrieved literature, filings, reviews, reports, agent traces, source data, code, simulations, and scientific models into causal atlases: sheaf-like families of local causal predictive-state models over an explicit cover of a research substrate. Each local region contains causal episodes, structured claim tables, predictive tests, support statistics, and provenance; restriction maps compare overlapping regions; gluing diagnostics expose agreement, drift, contradiction, and underdetermination. The resulting Topos World Model is not a single universal graph. It is a research instrument for navigating what a corpus says, where it says it, how strongly it is supported, and where local claims fail to assemble into a coherent global view. Three literature-atlas case studies -- ocean-temperature impacts on marine populations, GLP-1 weight-loss evidence, and resveratrol/red-wine health-benefit claims -- illustrate deep causal research from text with explicit locality, evidence, persistent state, and gluing tension. Four grounded-counterfactual case studies -- a Nature Climate Change microplastics forcing paper, an Indus Valley hydrology paper with VIC-derived figure data and model code, the canonical Sachs protein-signaling study with single-cell perturbation data, and a Nature singing-mouse study with MAPseq projection matrices -- show a stronger mode: when a paper ships source data, simulation outputs, or code, PROMETHEUS can evaluate a counterfactual against that scientific substrate and then rebuild the sheaf world model around the

2605.12831 2026-05-14 cs.LG

Quantifying Potential Observation Missingness in Inverse Reinforcement Learning

Leo Benac, Abhishek Sharma, Alihan Huyuk, Finale Doshi-Velez

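Affiliations School of Engineering and Applied Sciences; Harvard University

AI summary Inverse reinforcement learning (IRL) infers reward functions from demonstrations and is an important tool for modeling and understanding decision-making behavior. Real-world behavioral data, however, may lack observations that the decision-maker had access to, which makes expert actions look suboptimal and distorts the learned reward function. This paper proposes a method for quantifying how much missing observation would be needed for the expert's actions to appear optimal, develops a corresponding algorithm, and validates it in experiments on navigation tasks, a cancer treatment simulator, and ICU treatment data.

English abstract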

Inverse reinforcement learning (IRL), which infers reward functions from demonstrations, is a valuable tool for modeling and understanding decision-making behavior. Many variants of IRL have been developed to capture complexities of human decision-making, such as subjective beliefs, imperfect planning, and dynamic goals. However, an often-overlooked issue in real-world behavioral datasets is that the recorded data may be missing observations that were available to the original decision-maker. In use-inspired settings such as healthcare, this can make expert actions appear suboptimal, even when they were near-optimal given the information available at the time. As a result, the rewards learned by standard IRL may be misleading. In this paper, we identify the minimal perturbations to the recorded observations needed for the expert's actions to appear optimal. We develop a practical algorithm for this problem and demonstrate its utility for quantifying the possible extent of missing observations in behavioral datasets through extensive experiments on synthetic navigation tasks, a cancer treatment simulator, and ICU treatment data.

2605.12826 2026-05-14 cs.CV cs.AI

FRAME: Forensic Routing and Adaptive Multi-path Evidence Fusion for Image Manipulation Detection

Kaixiang Zhao, Tianrun Yu, Aoxu Zhang, Junhao Su, Porter Jenkins, Amanda Hughes

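Affiliations Brigham Young University; Rutgers University

AI summary As image editing tools and generative AI spread, verifying the authenticity of digital images has become increasingly difficult. To address the limited robustness, fragmented evidence, and weak generalization of existing methods, this paper proposes FRAME, which organizes diverse forensic algorithms into a multi-path analysis space, adaptively selects suitable forensic paths for each image, and fuses complementary evidence to improve detection and localization. FRAME offers a more robust and flexible approach to image forensics while keeping the forensic cues from multiple sources interpretable, and it performs well across a range of manipulation scenarios.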

Comments Accepted to CVPR 2026 SAFE Workshop

English abstract

The proliferation of sophisticated image editing tools and generative artificial intelligence models has made verifying the authenticity of digital images increasingly challenging, with important implications for journalism, forensic analysis, and public trust. Although numerous forensic algorithms, ranging from handcrafted methods to deep learning-based detectors, have been developed for manipulation detection, individual methods often suffer from limited robustness, fragmented evidence, or weak generalization across manipulation types and image conditions. To address these limitations, we present FRAME, a method for Forensic Routing and Adaptive Multi-path Evidence fusion for image manipulation detection. FRAME organizes diverse forensic algorithms into a multi-path analysis space, adaptively selects informative forensic paths for each input image, and fuses complementary evidence to improve detection and localization performance. By moving beyond single-method analysis and fixed fusion strategies, FRAME provides a more robust and flexible approach to image forensic reasoning while preserving interpretable forensic cues from multiple evidence sources. Experimental results demonstrate the effectiveness of FRAME across diverse manipulation scenarios. Code is available at https://github.com/kzhao5/FRAME.

2605.12823 2026-05-14 cs.LG physics.chem-ph physics.comp-ph q-bio.BM

Hessian Matching for Machine-Learned Coarse-Grained Molecular Dynamics

Sanya Murdeshwar, Sanjit Shashi, Kevin Bachelor, William Noid, Ashwin Lokapally, Razvan Marinescu

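Affiliations University of California, Santa Cruz; GiwoTech Inc.; Pennsylvania State University

AI summary This study proposes a machine-learned coarse-grained molecular dynamics method based on Hessian-vector product matching, aiming to improve how well coarse-grained potentials model the curvature of the free-energy surface. Using random probe vectors, the method injects second-order curvature information into the coarse-grained potential without explicitly constructing the Hessian. Experiments show that it outperforms plain force matching on 8 of 9 test proteins, particularly on slow-mode metrics.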

Comments 15 pages, 4 figures, 1 table

English abstract

Coarse-grained (CG) molecular dynamics enables simulations of atomic systems such as biomolecules at timescales inaccessible to all-atom (AA) methods, but existing CG neural potentials trained via force matching capture only the gradient of the free-energy surface, leaving its curvature unconstrained. We introduce a framework that augments force matching with stochastic Hessian-vector product (HVP) matching, instilling second-order curvature information into CG potentials without constructing the full Hessian. We derive a decomposition of the target CG Hessian into a model-independent projected AA Hessian, precomputed once before training, and a model-dependent covariance correction computed online at negligible cost. We construct an unbiased stochastic estimator of the Hessian-matching objective by using random probe vectors. We evaluate our method by comparing against force matching on a benchmark of nine fast-folding proteins unseen during training. HVP matching outperforms plain force matching on 8 of 9 proteins on slow-mode metrics, with reductions of up to 85% in the Kullback-Leibler divergence between the CG and reference distributions along the slowest collective mode of the largest protein. Our results demonstrate that higher-order physical supervision is a practical path to more accurate and transferable CG potentials for biomolecular simulation.
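
The reason random probe vectors can stand in for the full Hessian follows from a standard trace identity, sketched below. The symbols are assumptions for illustration: $H$ is the target CG Hessian, $\hat{H}_\theta$ the Hessian of the learned potential, and $v$ a random probe with identity second moment (for example standard Gaussian). The paper's exact objective and probe distribution are not given in the abstract.

```latex
\begin{aligned}
\mathbb{E}_{v}\bigl[\lVert (H - \hat{H}_\theta)\, v \rVert_2^2\bigr]
 &= \mathbb{E}_{v}\bigl[ v^\top (H - \hat{H}_\theta)^\top (H - \hat{H}_\theta)\, v \bigr] \\
 &= \operatorname{tr}\bigl( (H - \hat{H}_\theta)^\top (H - \hat{H}_\theta)\, \mathbb{E}_{v}[v v^\top] \bigr) \\
 &= \lVert H - \hat{H}_\theta \rVert_F^2 \quad \text{when } \mathbb{E}_{v}[v v^\top] = I .
\end{aligned}
```

Each sample therefore needs only the products $H v$ and $\hat{H}_\theta v$, never the matrices themselves, while its expectation equals the full Frobenius-norm mismatch.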

2605.12817 2026-05-14 cs.LG cs.AI cs.CL

Training Large Language Models to Predict Clinical Events

Benjamin Turtel, Paul Wilczewski, Kris Skotheim

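Affiliations Lightning Rod Labs

AI summary This study trains large language models to predict clinical events from longitudinal clinical notes. It converts time-ordered MIMIC-III notes into prediction examples that pair past patient context with a question about a future event and a label resolved from later documentation, producing a dataset that spans medications, procedures, organ support, microbiology, and mortality. LoRA fine-tuning improves the model's predictive performance and yields reusable clinical prediction supervision without hand-engineered structured features or endpoint-specific classifiers.

English abstract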

Longitudinal clinical notes contain rich evidence of how patients evolve over time, but converting this signal into training supervision for clinical prediction remains challenging. We extend Foresight Learning to clinical prediction by converting time-ordered MIMIC-III notes into examples consisting of past patient context, a natural-language question about a possible future event, and a label resolved from later documentation. This process yields 6,900 prediction examples from 702 admissions across medications, procedures, organ support, microbiology, and mortality. A small LoRA adapter trained on these examples improves over the prompted base model, reducing expected calibration error from 0.1269 to 0.0398 and Brier score from 0.199 to 0.145, while slightly outperforming GPT-5 point estimates on held-out questions. The approach enables reusable clinical prediction supervision from longitudinal notes without hand-engineered structured features or endpoint-specific classifiers.
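
For readers unfamiliar with the two metrics reported above, the sketch below computes the Brier score and a binned expected calibration error (ECE) for binary forecasts. The equal-width binning over the predicted probability of the positive outcome and the default of 10 bins are assumptions; the abstract does not state which ECE variant the authors use.

```python
from typing import List, Sequence, Tuple


def brier_score(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean squared difference between forecast probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def expected_calibration_error(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """Weighted average over equal-width bins of |mean forecast - observed frequency|."""
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_p = sum(p for p, _ in members) / len(members)
        freq = sum(y for _, y in members) / len(members)
        ece += len(members) / total * abs(mean_p - freq)
    return ece


# Toy forecasts (illustrative values, not from the paper).
probs = [0.9, 0.9, 0.1, 0.1]
labels = [1, 1, 0, 0]
brier = brier_score(probs, labels)
ece = expected_calibration_error(probs, labels)
```

Lower is better for both: the Brier score rewards forecasts that are both sharp and correct, while ECE isolates whether stated probabilities match observed frequencies.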

2605.12816 2026-05-14 cs.LG

AGOP as Explanation: From Feature Learning to Per-Sample Attribution in Image Classifiers

Raj Kiran Gupta Katakam

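Affiliations Credit Karma

AI summary This paper examines the role of the Average Gradient Outer Product (AGOP) in neural network feature learning and explores its potential as a per-sample explanation method for image classifiers. It proposes a new attribution method, AGOP-Weighted, which incorporates a training-distribution prior to identify important pixels more accurately, along with two companion variants, AGOP-Local and AGOP-Global. Experiments show that the method outperforms existing attribution methods on several benchmarks, with particular advantages in computational cost and on small-resolution images.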

Comments 8 pages. Accepted at the 4th World Conference on eXplainable Artificial Intelligence (XAI 2026), Late-Breaking Work track, Fortaleza, Brazil, July 1-3, 2026

English abstract

The Average Gradient Outer Product (AGOP) governs feature learning in neural networks: the Neural Feature Ansatz states that weight Gram matrices at each layer align with the corresponding AGOP matrices computed over the training distribution. We ask a complementary question: can this same quantity serve as a post-hoc attribution method for explaining individual predictions? We introduce AGOP-Weighted: a novel attribution method that multiplies the per-sample gradient by sqrt(diag(M) / max diag(M)), a training-distribution prior that suppresses gradient noise and amplifies consistently important pixels -- a combination not present in any prior attribution method. We formalise two companion variants -- AGOP-Local (per-sample gradient, equivalent to VanillaGrad) and AGOP-Global (diag(M) directly as a zero-cost saliency map) -- and implement an efficient training-time accumulation hook; AGOP-Global then requires zero inference cost (disk lookup) while AGOP-Weighted requires only a single gradient pass. We conduct the first rigorous comparison of AGOP attribution against Integrated Gradients (IG), SmoothGrad, GradCAM, and VanillaGrad across two benchmarks with pixel-level ground truth: (i) the synthetic XAI-TRIS benchmark (four classification scenarios, 8x8 images, CNN8by8) and (ii) the photorealistic CLEVR-XAI benchmark (ResNet-18 fine-tuned from ImageNet). AGOP-Weighted achieves 44% higher mIoU than IG on linear tasks; AGOP-Global achieves 7x higher mIoU than IG on multiplicative tasks (where IG falls below random) at zero inference cost. Both findings generalise to ResNet-18 on CLEVR-XAI (+18% and +37% respectively). We further show that GradCAM fails on small-resolution images due to spatial resolution collapse, and that diag(M) quality improves monotonically throughout training even after classification accuracy has plateaued.
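
The weighting rule stated in the abstract can be written out in a few lines. The sketch below assumes a scalar model output, flattened input gradients, and an AGOP diagonal computed as the mean of squared gradient entries over training samples; the function names are hypothetical, and details such as whether absolute values are taken afterwards are not specified in the abstract.

```python
import math
from typing import List


def agop_diagonal(train_grads: List[List[float]]) -> List[float]:
    """diag(M) for M = mean over samples of g g^T: the mean squared
    gradient entry per input dimension (scalar-output assumption)."""
    n = len(train_grads)
    d = len(train_grads[0])
    return [sum(g[k] ** 2 for g in train_grads) / n for k in range(d)]


def agop_weighted(sample_grad: List[float], diag_m: List[float]) -> List[float]:
    """Multiply a per-sample gradient by sqrt(diag(M) / max diag(M))."""
    peak = max(diag_m)
    if peak <= 0:
        raise ValueError("diag(M) must contain a positive entry")
    return [g * math.sqrt(m / peak) for g, m in zip(sample_grad, diag_m)]


# Toy gradients over two input dimensions (illustrative values only).
train_grads = [[2.0, 0.0], [0.0, 1.0]]
diag_m = agop_diagonal(train_grads)            # AGOP-Global style saliency
attribution = agop_weighted([1.0, 1.0], diag_m)
```

In this toy case the second dimension is down-weighted by half because its gradients were smaller on average across the training samples, which is the "training-distribution prior" effect the abstract describes.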

2605.12809 2026-05-14 cs.LG cs.AI

Correcting Influence: Unboxing LLM Outputs with Orthogonal Latent Spaces

Shixing Yu, Promit Ghosal, Kyra Gan

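Affiliations Electrical and Computer Engineering; Department of Statistics; Operations Research and Industrial Engineering

AI summary This study aims to make large language models more reliable in critical domains such as healthcare by identifying the specific training-data tokens that a prediction depends on. To overcome the token-independence assumption and the decomposability limits of existing methods, the authors propose a framework based on orthogonal latent spaces: sparse autoencoders learn approximately independent latent features, and Jacobian-vector products with inverse-Hessian approximations yield token-level influence estimates. Experiments show that the method identifies sparse, interpretable token sets, which supports model trustworthiness and transparent decision-making.

English abstract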

A critical step toward reliable use of large language models (LLMs) in healthcare is attributing predictions to their training data, akin to a medical case study. This requires token-level precision: pinpointing not just which training examples influence a decision, but which tokens within them are responsible. While influence functions offer a principled framework for this, prior work is restricted to autoregressive settings and relies on an implicit assumption of token independence, rendering the identified influences unreliable. We introduce a flexible framework that infers token-level influence through a latent mediation approach for general prediction tasks. Our method attaches sparse autoencoders to any layer of a pretrained LLM to learn a basis of approximately independent latent features. Unlike prior methods where influence decomposes additively across tokens, influence computed over latent features is inherently non-decomposable. To address this, we introduce a novel method using Jacobian-vector products. Token-level influence is obtained by propagating latent attributions back to the input space via token activation patterns. We scale our approach using efficient inverse-Hessian approximations. Experiments on medical benchmarks show our approach identifies sparse, interpretable sets of tokens that jointly influence predictions. Our framework enhances trust and enables model auditing, and it generalizes to high-stakes domains that require transparent and accountable decisions.

2605.12803 2026-05-14 cs.LG

Pitfalls of Unlabeled Disagreement-Based Drift Detection in Streaming Tree Ensembles

Lara Sá Neves, Afonso Lourenço, Lizy K. John, Goreti Marreiros

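Affiliations GECAD, ISEP, Polytechnic of Porto; The University of Texas at Austin

AI summary This paper studies disagreement-based drift detection on unlabeled data streams when applied to ensembles of incremental decision trees. The authors construct batch-specific disagreement measures and find experimentally that the approach works well for ensembles of multi-layer perceptrons but consistently underperforms loss-based detectors for incremental decision tree ensembles. They attribute this to the fact that incremental decision trees learn mainly through structural expansion, which limits model plasticity and prevents disagreement from reliably reflecting learning potential. The paper notes that restructuring incremental decision trees through their decomposition into rules may be a promising way to improve adaptability.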

Comments Published as a conference paper at CAO Workshop at ICLR 2026

English abstract

Detecting concept drift in high-speed data streams remains challenging, particularly when models must operate on unlabeled data and avoid false alarms caused by benign shifts. While disagreement-based uncertainty has shown promise in neural networks, its adaptation to ensembles of incremental decision trees (IDTs) remains largely unexplored. We investigate this approach by constructing batch-specific disagreement measures via label flipping in ensemble members and evaluating their effectiveness for drift detection in tabular data streams. Our experiments show that, although this method performs well in ensembles of multi-layer perceptrons (MLPs), it consistently underperforms loss-based detectors when applied to IDTs. We attribute this behavior to the intrinsic rigidity of IDTs: learning primarily through structural expansion, with limited parameter adaptation, restricts model plasticity and prevents disagreement from reliably reflecting learning potential. Recent work on restructuring IDTs using their intrinsic decomposition into non-overlapping rules offers a promising direction for improving adaptability.

2605.12798 2026-05-14 cs.LG cs.AI cs.CL

Emergent and Subliminal Misalignment Through the Lens of Data-Mediated Transfer

Baris Askin, Muhammed Ustaomeroglu, Anupam Nayak, Gauri Joshi, Guannan Qu, Carlee Joe-Wong

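Affiliations Carnegie Mellon University

AI summary This study examines Emergent Misalignment (EM), which can arise when large language models are fine-tuned on narrow harmful datasets, and Subliminal Learning (SL), where misalignment is transmitted through seemingly benign data generated by a harmful teacher. It argues that such misalignment is not caused by isolated harmful examples but results from the interaction of data structure, task difficulty, and model capability. Experiments show that misalignment appears more readily when fine-tuning and evaluation prompts share similar functional structure, when prompts leave more room for coherent harmful completions, and when the target behavior has been reliably learned by the model. The study is also the first to compare how misalignment transfers under off-policy and on-policy distillation, and it argues for understanding misalignment from the combined perspective of data and training pipeline.

English abstract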

Fine-tuning LLMs on narrow harmful datasets can induce Emergent Misalignment (EM), where models exhibit misaligned behavior far beyond the fine-tuning distribution. We argue that emergent misalignment can be better understood as a data-mediated transfer phenomenon: harmful fine-tuning examples do not induce uniform behavioral spillover, but interact with the structural properties of the dataset and the difficulty of the tasks relative to the model. Across our experiments, we find that misalignment appears more readily when fine-tuning and evaluation prompts share similar underlying functional structure, when prompts leave more room for coherent harmful completions, and when the target behavior has been more reliably learned by the model. The training pipeline itself also matters: pretraining composition shapes later misalignment. We further study Subliminal Learning (SL), where misalignment is transmitted by fine-tuning on seemingly benign data generated by a harmful teacher. Moving beyond the standard SFT setting, we for the first time compare this transfer under off-policy and on-policy distillation as well, allowing us to separate the roles of the teacher guidance and the training data distribution in transmitting misalignment. Together, these results argue for a data-centric view: Emergent/subliminal misalignment should not be treated as a simple consequence of isolated harmful fine-tuning examples, but as the result of interactions between fine-tuning data structure, pretraining distributions, and training channels.

2605.12792 2026-05-14 cs.LG

SoK: A Comprehensive Analysis of the Current Status of Neural Tangent Generalization Attacks with Research Directions

Thushari Hapuarachchi, Kaiqi Xiong

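Affiliations University of South Florida

AI summary This paper gives a comprehensive analysis of the current state of the Neural Tangent Generalization Attack (NTGA), identifying its strengths, weaknesses, and directions for improvement. NTGA is described as the first clean-label generalization attack under black-box settings and is used to counter the use of unauthorized data in deep neural network training. Experiments confirm that NTGA is vulnerable to adversarial training and image transformations and find that several recently proposed clean-label attacks now outperform NTGA on data protection, which motivates further research on NTGA.

English abstract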

A growing concern is that Deep Neural Network (DNN) training increasingly uses unauthorized data. Clean-label generalization attacks, a type of data poisoning attack, have been proposed to address this issue. The Neural Tangent Generalization Attack (NTGA) is considered the first well-known clean-label generalization attack under black-box settings and marked an unprecedented step in data protection. In this paper, we conduct a comprehensive analysis of the state of the art of NTGA; to the best of our knowledge, this is the first thorough analysis of NTGA. First, we provide a classification of attacks against DNNs, with explanations and their relations to NTGA. Then, we present a taxonomy of black-box attacks and demonstrate that NTGA is the first clean-label generalization attack under the black-box setting. We further analyze existing studies of NTGA and give a comprehensive comparison of their findings, conducting our own experiments to verify them. Moreover, our extensive experiments show that NTGA is vulnerable to adversarial training and image transformations, and that applying linear separability to NTGA-generated images makes them more susceptible to such vulnerabilities. We present the pros and cons of NTGA and suggest ways to improve its robustness based on our analysis. Our further experiments indicate that several recently proposed clean-label generalization attacks outperform NTGA on data protection. Finally, we highlight the need for further research on NTGA and offer insights for future work.

2605.12790 2026-05-14 cs.RO

Few-Shot Physics-Informed Neural Network for Shape Reconstruction of Concentric-Tube Robots

Navid Feizi, Filipe C. Pedrosa, Rajni V. Patel, Jagadeesan Jayender

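Affiliations Canada Research Chairs Program

AI summary This paper proposes a physics-informed neural network (PINN) for kinematic modeling of a six-degree-of-freedom concentric tube robot (CTR) with three pre-curved tubes. The method embeds the Cosserat rod differential equations in the network and trains on a small number of observations, giving full-state estimates of the robot's shape, twist angle, torsional strain, bending moment, and orientation. Experiments show that the model achieves lower shape error than a purely physics-based model and is computationally efficient, making it suitable for real-time control.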

Comments To be published in the proceedings of the 2026 IEEE International Conference on Robotics & Automation

English abstract

Modeling concentric tube robots (CTRs) involves complex nonlinear continuum mechanics, and despite recent progress, physics-based models often lack an accurate representation of the experimental setups. To overcome these limitations, deep neural network-based models have been explored as alternatives with superior accuracy; however, they often overlook known mechanics, require large training datasets, and typically omit shape estimation of the robot. We present a physics-informed neural network (PINN) for kinematic modeling of a 6-DoF CTR with three pre-curved tubes that embeds the Cosserat rod differential equations and learns from few-shot observational data, balancing physics priors with data-driven fitting. The PINN enables full-state estimation of shape, twist angle, torsional strain, bending moment, and orientation. Benchmark tests show a mean shape error below 1% of the robot length and accurate recovery of the other kinematic states, outperforming a purely physics-based Cosserat rod model baseline while using a minimal training set. The resulting model is also computationally efficient and robust, making it well-suited for real-time control applications.

2605.12789 2026-05-14 cs.RO

Lifelong Learning in Vision-Language Models: Enhanced EWC with Cross-Modal Knowledge Retention

Hamza Ahmed Durrani, Rafay Suleman Durrani

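Affiliations Sejong University, Computer Science Engineering; Technische Universität Ilmenau, Computer Engineering

AI summary This paper studies catastrophic forgetting in vision-language models under continual learning and proposes a framework that combines enhanced Elastic Weight Consolidation (EWC) with parameter-efficient fine-tuning. Using multi-modal Fisher Information Matrix computation, cross-modal consistency preservation, and adaptive regularization, the method reduces forgetting when new tasks are learned sequentially and preserves alignment between the visual and language modalities at modest computational cost. The authors position the work as support for continual learning in multimodal AI systems operating in dynamic environments such as autonomous driving and intelligent robotics.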

Comments 8 pages, 5 figures, 1 table. Applications in autonomous driving, intelligent robotic assistants, and adaptive robotics systems

English abstract

Large language-vision models (LVLMs) such as CLIP, Flamingo, and BLIP have revolutionized AI by enabling understanding across textual and visual modalities. These models excel at tasks like image captioning, visual question answering, and cross-modal retrieval. However, they face catastrophic forgetting when learning new tasks sequentially, which is particularly challenging in multi-modal settings where preserving cross-modal alignments adds complexity to the learning process. This paper presents a comprehensive continual learning framework for LVLMs that combines enhanced Elastic Weight Consolidation (EWC) with parameter-efficient fine-tuning techniques. We integrate multi-modal Fisher Information Matrix calculation, consistency preservation across modalities, and adaptive regularization that considers dependencies across visual and textual encoders. In extensive evaluation, the framework achieves a 78% reduction in forgetting rate relative to naive sequential training. The framework also preserves alignment between modalities during sequential learning with only 15% additional computational cost. This work advances the state of the art in lifelong learning for multi-modal AI systems, with direct applications to autonomous driving, intelligent robotic assistants, and adaptive robotic systems that must continuously learn in dynamic real-world environments.
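
For context, the standard diagonal-Fisher EWC penalty that this framework builds on adds a quadratic cost for moving parameters that were important to earlier tasks. The sketch below shows only that baseline penalty over flattened parameter lists; the paper's multi-modal Fisher computation and adaptive regularization are not reproduced, and the function name is hypothetical.

```python
from typing import Sequence


def ewc_penalty(
    params: Sequence[float],
    anchor_params: Sequence[float],
    fisher_diag: Sequence[float],
    lam: float,
) -> float:
    """Standard EWC regularizer: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2,
    where theta* are the parameters after the previous task and F_i is the
    diagonal Fisher information estimated on that task."""
    return 0.5 * lam * sum(
        f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, fisher_diag)
    )


# Toy values: the first parameter matters more to the old task (F = 4.0).
penalty = ewc_penalty(
    params=[1.0, 2.0],
    anchor_params=[0.0, 2.0],
    fisher_diag=[4.0, 1.0],
    lam=2.0,
)
```

During sequential training this term is added to the new task's loss, so parameters with high Fisher values stay close to their earlier values while low-Fisher parameters remain free to adapt.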

2605.12788 2026-05-14 cs.LG cs.CY

From Heuristics to Analytics: Forecasting Effort and Progress in Online Learning

Eric S. Qiu, Danielle R. Thomas, Boyuan Guo, Vincent Aleven, Conrad Borchers

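Affiliations * Cornell University; Carnegie Mellon University

AI summary This study forecasts students' weekly practice time and number of newly mastered skills in online learning, to support sustained engagement and learning progress. Using a school year of intelligent tutoring system log data from 425 middle-school students, it compares multiple predictive models and finds that feature-based models reduce prediction error by 22% to 33% relative to heuristic methods. It also identifies target-specific feature-importance patterns and, through interviews with tutors, relates the model findings to how goals are set in tutoring practice, providing a reproducible benchmark for forecasting learning progress in intelligent tutoring systems.

Comments Accepted as full paper to the 19th International Conference on Educational Data Mining (EDM 2026)

Details
English abstract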

Sustained effort is essential for realizing the benefits of intelligent tutoring systems (ITS), yet many learners disengage or underuse available practice time. We introduce engagement forecasting as a supervised prediction task based on ITS logs, targeting two outcomes central to effort and learning progress: minutes practiced per week and new skills mastered per week. Using interaction log data from 425 middle-school students over a school year, we benchmark fifteen predictors including regressions, decision trees, and neural networks. We show that these feature-based models reduce mean absolute error (MAE) by 22-33% relative to heuristic baselines, including fixed-percentile rules adapted from prior work in other behavioral domains. We find that percentile heuristics systematically overpredict, whereas feature-based models better track student practice trajectories across weeks. To support explainability, we analyze feature importance and ablations, revealing target-specific patterns: effort forecasting is driven mainly by recent activity features, while progress forecasting depends more on learner-state and content difficulty signals. Finally, in a semi-structured user interview case study with eight college tutors, we examine how tutors reasoned about system-generated predictive features when setting goals with students. We find that tutors reasoned differently about effort versus progress goals in ways that mirror our pattern analysis. Together, these results establish a reproducible benchmark for forecasting weekly effort and learning progress in ITS. By making patterns of sustained effort and progress visible at a weekly timescale, engagement forecasting offers a foundation for supporting tutor-learner goal setting and timely instructional decisions.
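The headline metric is a relative reduction in mean absolute error. A small sketch of how such a figure can be computed follows; the percentile rule shown is a hypothetical stand-in for the paper's heuristic baselines, and the nearest-rank definition is an assumption.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_mae_reduction(mae_baseline, mae_model):
    """Fraction by which the model lowers MAE relative to the baseline."""
    return (mae_baseline - mae_model) / mae_baseline


def percentile_heuristic(history, q=0.5):
    """Hypothetical baseline: predict next week as the nearest-rank
    q-th percentile of the learner's past weekly values."""
    ordered = sorted(history)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]
```

Under this definition, a reported reduction of 22-33% corresponds to `relative_mae_reduction` values between 0.22 and 0.33.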

2605.12786 2026-05-14 cs.RO cs.HC

Emotional Expression in Low-Degrees-of-Freedom Robots: Assessing Perception with Reachy Mini

Amit Rogel, Elmira Yadollahi, Guy Laban

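Affiliations * Robotic Musicianship Lab; Georgia Institute of Technology; School of Computing and Communications; Ben-Gurion University of the Negev; School of Brain Sciences and Cognition; The Azrieli National Center for Autism and Neurodevelopment Research

AI summary This study examines how people perceive emotions expressed by a low-degree-of-freedom robot (Reachy Mini), addressing a gap in the understanding of affective expression on non-anthropomorphic robots. In an online experiment, 100 participants watched video clips of Reachy Mini expressing different emotions and rated the perceived emotion, valence, and arousal, as well as their social perception of the robot. Despite the robot's constrained expressiveness, participants recovered the broad affective meaning of the expressions, particularly along the valence and arousal dimensions, and perceived positive expressions as warmer and more sociable. The study offers a useful benchmark for research on affective communication in low-DoF robots.

Details
English abstract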

Emotion expression is central to human-robot interaction, yet little is known about how people interpret affect on robots with sparse, non-anthropomorphic expressive capabilities. This study examined how people perceive emotional expressions displayed by Reachy Mini (Pollen Robotics and Hugging Face), a low-degree-of-freedom (low-DoF) robot with a constrained and distinctly non-human expressive repertoire. In an online within-subjects study, 100 participants viewed 10 short video clips of Reachy Mini expressing different emotions and, for each clip, identified the perceived emotion, rated its valence and arousal, and evaluated the robot on social-perception traits. Exact emotion recognition was modest overall and varied considerably across expressions, with anger, sadness, and interest recognized more reliably than emotions such as love, pleasure, shame, and disgust. However, participants were generally more successful at recovering broader affective meaning than exact emotion labels, particularly along valence and arousal dimensions. Emotional expressions also shaped social evaluation, as positive expressions were perceived as warmer and more sociable than negative ones, and animacy varied less across conditions. These findings suggest that even constrained robotic expressions can communicate affective meaning and influence social impressions, positioning Reachy Mini as a useful benchmark for studying affective communication in low-DoF robots.

2605.12782 2026-05-14 cs.LG

Graph-Based Financial Fraud Detection with Calibrated Risk Scoring and Structural Regularization

Yunfei Nie, Jiawei Wang, Ruobing Yan, Yuhan Wang, Zouxiaowei Ma, Yilun Wu

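Affiliations * Brandeis University; University of California, Los Angeles; Georgetown University; Columbia University; Stevens Institute of Technology

AI summary This paper targets challenges in financial transaction fraud detection, including complex relational structure, concealed behavioral patterns, and shifting data distributions, and proposes a detection framework based on graph neural networks. It builds a transaction graph from transaction records and identity information, learns node embeddings through multi-layer message passing, and outputs fraud probabilities and risk scores through a risk scoring head. A weighted supervision objective and a structural consistency regularizer mitigate the training bias caused by class imbalance and improve model stability; experiments show better risk ranking and probability calibration than existing methods.

Details
English abstract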

Financial transaction fraud prevention faces challenges such as complex relationship structures, concealed behavioral patterns, and dynamically changing data distributions. Discriminative models that rely solely on independent per-sample features cannot fully characterize the risks of group collaboration and chain transfers within transaction networks. This paper proposes a graph neural network framework for representation learning and risk discrimination in financial transaction fraud prevention. It integrates transaction records and identity information into node attributes and constructs a transaction graph based on shared attributes and interaction consistency to explicitly model inter-transaction relationships. The model uses a multi-layer message passing mechanism to aggregate neighborhood information and learn node embeddings that encode structural context, and a lightweight risk discrimination head outputs transaction-level fraud probabilities and risk scores. A weighted supervision objective mitigates the training bias caused by class imbalance, and structural consistency regularization suppresses the representation drift induced by noisy edges, improving the stability and usability of risk characterization. Experiments on a publicly available financial transaction dataset compare the method against related approaches under a unified evaluation protocol. The results show that the proposed method outperforms the compared methods in risk ranking and probability calibration quality, supporting the value of combining graph structure modeling with representation learning in financial transaction fraud prevention.
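The abstract does not specify the form of the weighted supervision objective. One common choice for class imbalance is a positively weighted binary cross-entropy, sketched below as an assumption; `pos_weight` is an illustrative parameter name, not one taken from the paper.

```python
import math


def weighted_bce(labels, probs, pos_weight):
    """Binary cross-entropy where positive (fraud) examples are scaled by
    pos_weight, so rare positives contribute more to the average loss."""
    eps = 1e-12  # clamp probabilities to keep log() finite
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)
```

A typical setting is `pos_weight` close to the ratio of negative to positive examples, which roughly equalizes the total loss mass of the two classes.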

2605.12774 2026-05-14 cs.CV

WildPose: A Unified Framework for Robust Pose Estimation in the Wild

Jianhao Zheng, Liyuan Zhu, Zihan Zhu, Iro Armeni

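Affiliations * Stanford University; ETH Zürich

AI summary This paper presents WildPose, a unified monocular pose estimation framework for the key challenge of camera pose estimation in dynamic environments. The method combines the rich perceptual capability of feedforward models with the end-to-end optimization of differentiable bundle adjustment: it builds a 3D-aware update operator on a frozen, pre-trained MASt3R feature backbone and adds a high-capacity motion mask detector, giving robust performance in dynamic, static, and low-ego-motion scenes. Experiments show that WildPose outperforms existing methods on multiple benchmark datasets.

Details
English abstract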

Estimating camera pose in dynamic environments is a critical challenge, as most visual SLAM and SfM methods assume static scenes. While recent dynamic-aware methods exist, they are often not unified: semantic-based approaches are brittle, per-sequence optimization methods fail on short sequences, and other learned models may degrade on static-only scenes. We present WildPose, a unified monocular pose estimation framework that is robust in dynamic environments while maintaining state-of-the-art performance on static and low-ego-motion datasets. Our key insight is to connect two powerful paradigms in modern 3D vision: the rich perceptual frontend of feedforward models and the end-to-end optimization of differentiable bundle adjustment (BA). We achieve this with a 3D-aware update operator built on a frozen, pre-trained MASt3R feature backbone, together with a high-capacity motion mask detector that uses multi-level 3D-aware features from the same backbone. Extensive experiments show WildPose consistently outperforms prior methods across dynamic (Wild-SLAM, Bonn), static (TUM, 7-Scenes), and low-ego-motion (Sintel) benchmarks.

2605.12772 2026-05-14 cs.CV

Just Ask for a Table: A Thirty-Token User Prompt Defeats Sponsored Recommendations in Twelve LLMs

Andreas Maier, Jeta Sopa, Gozde Gul Sahin, Paula Perez-Toro, Siming Bayer

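Affiliations * Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany

AI summary The study builds on the finding that most frontier large language models (LLMs) tend to recommend a sponsored flight costing roughly twice as much when the system prompt contains a soft sponsorship cue. Reproducing the experiment on several open-weight and commercial models, the authors find that a 30-token user prompt asking the model to first provide a neutral comparison table sharply reduces sponsored recommendations, from an average of 46.9% to 1.0% on open-weight models and from 53.0% to 0% on OpenAI models. The study also finds that model responses to sponsorship cues generalise, and it highlights implementation deviations that can arise when reproducing such experiments.

Comments Submitted to Workshop on Textual Information Processing & Synthesis in the Wild

Details
English abstract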

Wu et al. (2026) showed that most frontier large language models (LLMs) recommend a sponsored, roughly twice-as-expensive flight when their system prompt contains a soft sponsorship cue. We reproduce their evaluation on ten open-weight chat models plus the two of their twenty-three models that are still reachable today (gpt-3.5-turbo, gpt-4o). All reported rates in this paper are produced under the same judge the original paper used (gpt-4o); we additionally store every label under an open-weight (gpt-oss-120b) and a smaller proprietary (gpt-4o-mini) judge for an ablation. Three findings emerge. First, a prose description of an LLM evaluation pipeline is not, on its own, sufficient for accurate reproduction: we surfaced three silent implementation failures that each shifted a reported rate by tens of percentage points. Second, the central claims do generalise - the gpt-3.5-turbo logistic-regression intercept of alpha = 0.81 is within four points of the original alpha = 0.86, and 200 of 200 trials on gpt-3.5-turbo and gpt-4o promote a payday lender to a financially distressed user. Third, a thirty-token user prompt that asks the assistant for a neutral comparison table first cuts sponsored recommendation from 46.9% to 1.0% averaged across our ten open-source models, and from 53.0% to 0% averaged across the two OpenAI models. AI literacy and price-comparison portals are likely market-level mitigations; the harmful-product cell is bounded by neither. Raw data, labels and analysis scripts are at https://github.com/akmaier/Paper-LLM-Ads .

2605.12771 2026-05-14 cs.RO cs.AI cs.LG cs.SY eess.SY math.OC

Adaptive Smooth Tchebycheff Attention for Multi-Objective Policy Optimization

Alejandro Murillo-Gonzalez, Mahmoud Ali, Lantao Liu

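Affiliations * Indiana University–Bloomington

AI summary This paper studies how to optimize policies under complex, non-convex objective trade-offs in multi-objective reinforcement learning. Linear scalarization cannot reach non-convex regions of the Pareto front, while static non-linear scalarization tends to suffer from high gradient variance and unstable optimization in deep RL. To address both problems, the authors propose an Adaptive Smooth Tchebycheff attention framework that dynamically modulates the curvature of the optimization landscape to balance stability and exploration. Experiments on a challenging robotic stealth visual search task show that the method discovers non-convex Pareto-optimal policies that conventional methods struggle to reach.

Comments To appear in the Proceedings of Robotics: Science and Systems (RSS) 2026

Details
English abstract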

Multi-objective reinforcement learning in robotic domains requires balancing complex, non-convex trade-offs between conflicting objectives. While linear scalarization methods provide stability, they are theoretically incapable of recovering solutions within non-convex regions of the Pareto front. Conversely, static non-linear scalarizations (e.g., Tchebycheff) can theoretically access these regions but often suffer from severe gradient variance and optimization instability in deep RL. In this work, we propose an Adaptive Smooth Tchebycheff framework that resolves this tension by dynamically modulating the curvature of the optimization landscape. We introduce a novel conflict-driven controller that regulates the optimization smoothness based on real-time gradient interference. This allows the agent to anneal toward precise, non-convex scalarization when objectives align, while elastically reverting to stable, smooth approximations when destructive gradient conflicts emerge. We validate our approach on a challenging robotic stealth visual search task (a proxy for monitoring of protected/fragile ecosystems), where an agent must balance search, exposure/interference minimization and exploration speed. Extensive ablations confirm that our conflict-aware adaptation enables the robust discovery of Pareto-optimal policies in non-convex regions inaccessible to linear baselines and unstable for static non-linear methods. Website: https://alejandromllo.github.io/research/pasta/
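As a rough illustration of the scalarizations the abstract contrasts, the sketch below implements the classical Tchebycheff scalarization and a log-sum-exp smoothing of it for losses to be minimized. The smoothing form and the `conflict_adapted_mu` rule are assumptions for illustration only; the paper's actual controller is not described in the abstract.

```python
import math


def tchebycheff(losses, weights, ideal):
    """Weighted max of per-objective gaps to the ideal point (non-smooth)."""
    return max(w * (l - z) for l, w, z in zip(losses, weights, ideal))


def smooth_tchebycheff(losses, weights, ideal, mu):
    """Log-sum-exp smoothing; approaches tchebycheff() as mu -> 0."""
    terms = [w * (l - z) / mu for l, w, z in zip(losses, weights, ideal)]
    m = max(terms)  # subtract the max for numerical stability
    return mu * (m + math.log(sum(math.exp(t - m) for t in terms)))


def conflict_adapted_mu(grad_a, grad_b, mu_min=0.01, mu_max=1.0):
    """Hypothetical rule: raise the smoothing mu as two objective gradients
    point in increasingly opposite directions (negative cosine similarity)."""
    dot = sum(a * b for a, b in zip(grad_a, grad_b))
    norm = math.sqrt(sum(a * a for a in grad_a)) * math.sqrt(sum(b * b for b in grad_b))
    if norm == 0.0:
        return mu_min
    conflict = max(0.0, -dot / norm)
    return mu_min + (mu_max - mu_min) * conflict
```

With a small `mu` the smooth version tracks the sharp max closely, and with a large `mu` it is smoother but overestimates the max by at most `mu * log(k)` for `k` objectives.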

2605.12763 2026-05-14 cs.LG math.DS math.OC q-bio.NC

State-Space NTK Collapse Near Bifurcations

James Hazelden, Eric Shea-Brown

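Affiliations * University of Washington

AI summary This paper studies feature learning as a model passes through bifurcations in tasks that unfold over time, and develops a local theory of gradient descent based on the empirical state-space neural tangent kernel (sNTK). It finds that bifurcations both dominate and simplify learning dynamics: near a bifurcation the sNTK is approximately a rank-one operator, which gives an analytical description of the local learning geometry of high-dimensional recurrent systems. By decomposing the sNTK into a bifurcation-relevant channel and a residual channel, the paper shows that the bifurcation channel is strongly amplified near common bifurcations, and that low-rank natural gradient methods resolve the resulting learning instability.

Details
English abstract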

Rich feature learning in tasks that unfold over time often requires the model to pass through bifurcations, constituting qualitative changes in the underlying model dynamics. We develop a local theory of gradient descent near these transitions through the empirical state-space neural tangent kernel (sNTK). Our central finding is that bifurcations both dominate and simplify learning dynamics: near bifurcations, we can reduce sNTK to a rank-one operator corresponding to learning in a classical normal form system, providing an analytically tractable description of the local learning geometry, even for high-dimensional recurrent systems. Concretely, we give a procedure for decomposing sNTK into bifurcation-relevant and residual channels, showing that near common codimension-1 bifurcations the relevant channel is a rank-one operator that is highly amplified. This amplification causes the bifurcation channel to dominate the full sNTK. Thus, bifurcations locally warp the learning landscape, funneling gradient descent into a few critical dynamical directions and making the nearby kernel and loss geometry predictable from classical normal forms. We illustrate this in a student-teacher recurrent neural network: the first learned bifurcation coincides with a sharp collapse in sNTK effective rank and the emergence of a dominant parameter direction whose restricted sNTK closely matches the landscape predicted by the scalar pitchfork normal form. Finally, we show that low-rank natural gradient methods resolve the resulting learning instability near bifurcations with very little overhead over SGD.

2605.12762 2026-05-14 cs.LG cs.AI

Multi-Quantile Regression for Extreme Precipitation Downscaling

Hamed Najafi, Gareth Lagerwall, Jayantha Obeysekera, Jason Liu

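Affiliations * Florida International University

AI summary This study addresses the under-prediction of extreme heavy-precipitation events in precipitation downscaling and proposes Q-SRDRN, a deep super-resolution network based on multi-quantile regression. Trained with pinball loss at several quantile levels (up to 0.999), the method captures the tail of the precipitation distribution more accurately. Experiments show substantially improved detection of extreme precipitation events across distinct climate regions (Florida, California, and Texas), especially at high quantiles.

Details
English abstract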

Deep super-resolution networks for precipitation downscaling achieve strong bulk skill yet systematically under-predict the heavy-tail events that drive flood risk. We demonstrate that the primary obstacle is the loss function, not the data: under intensity-weighted MAE, real and synthetic labels at the same input are simply averaged, meaning data augmentation shifts the predicted mean rather than the conditional distribution. We resolve this with Q-SRDRN, a multi-quantile super-resolution network trained with pinball loss at tau in {0.50, 0.95, 0.99, 0.999}. Two CNN-specific design choices make this practical: IncrementBound enforces monotonicity while preserving each quantile channel's gradient identity, and separate per-quantile output heads provide independent filter banks for bulk and tail detection. Under this design, data augmentation via cVAE becomes complementary: the median head absorbs synthetic patterns without contaminating upper quantiles. Empirically, on Florida (convective/tropical-cyclone dominated), the un-augmented Q-SRDRN P999 head detects 1,598 of 2,111 events at 200 mm/day versus 88 for the deterministic baseline, an 18x detection-rate gain (4.2% to 75.7%), with 63% lower KL divergence and 3.9% lower RMSE. Adding cVAE-generated samples lifts the P50 channel from 14 to 1,038 hits at 200 mm/day. On California (atmospheric-river dominated), the architecture reaches near-perfect detection (P999 SEDI >= 0.996 through 300 mm/day). On Texas, the baseline catches only 2 of 10,720 events at 200 mm/day while the P999 head catches 8,776 (81.9%). Although the cVAE does not transfer across regions, multi-quantile regression captures extremes wherever the large-scale signal is strong, and augmentation rescues the median where it is not.
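The pinball (quantile) loss at the heart of this approach can be sketched in a few lines. The `monotone_quantiles` helper below shows one generic way to keep quantile outputs ordered, using non-negative softplus increments; it is an assumption for illustration and is not the paper's IncrementBound layer, whose details the abstract does not give.

```python
import math


def pinball_loss(y_true, y_pred, tau):
    """Quantile loss: under-prediction costs tau per unit of error,
    over-prediction costs (1 - tau) per unit."""
    diff = y_true - y_pred
    return max(tau * diff, (tau - 1.0) * diff)


def softplus(x):
    """Numerically stable log(1 + exp(x)), always positive."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def monotone_quantiles(base, raw_increments):
    """Generic monotonicity trick (assumption): each higher quantile equals
    the previous one plus a positive softplus increment."""
    out = [base]
    for r in raw_increments:
        out.append(out[-1] + softplus(r))
    return out
```

At tau = 0.999, missing an extreme event from below costs roughly 999 times as much as overshooting by the same amount, which is what pushes the upper-quantile head toward the tail of the distribution.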

2605.12759 2026-05-14 cs.LG cs.SI

Predicting Channel Closures in the Lightning Network with Machine Learning

Simone Antonelli, Vincent Davis, Harrison Rush, Anthony Potdevin, Jesse Shrader, Vikash Singh, Emanuele Rossi

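Affiliations * AmbossTech

AI summary This paper studies how to predict the type of channel closure in the Lightning Network from publicly available gossip (routing) data using machine learning, framing the problem as temporal link classification on a dynamic graph. The authors build a dataset covering more than two years of Lightning Network activity and compare several approaches, including multilayer perceptrons and temporal graph neural networks. Experiments show that temporal and behavioural features, such as how recently a node was active and its history of past closures, are the dominant predictive signals, while network topology adds no further benefit. The study also notes that because the Lightning Network's privacy design hides key information, closures are hard to predict accurately from gossip data alone.

Comments 8 pages, 7 figures, 3 tables

Details
English abstract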

The Lightning Network (LN) is a second-layer protocol for Bitcoin designed to enable fast and cost-efficient off-chain transactions. Channels in the LN can be closed either by mutual agreement or unilaterally through a forced closure, which locks the involved capital for an extended period and degrades network reliability. In this paper, we study the problem of predicting channel closure types from publicly available gossip data, framing it as a temporal link classification task over the evolving channel graph. We construct a dataset spanning over two years of LN activity and benchmark a range of machine learning approaches, from MLPs to temporal graph neural networks and spectral encodings. Our experiments reveal that the dominant predictive signals are temporal and behavioural, namely how recently each endpoint was active and the per-node history of past closures, while the surrounding network topology provides no additional benefit. We find that a simple MLP operating on edge-level features, node-level event counts, and temporal patterns outperforms all graph-based approaches, and discuss how the inherent privacy of the LN, where critical information such as channel balances and payment flows remains hidden, fundamentally limits the predictability of closures from gossip data alone. We publicly release the dataset and code at https://github.com/AmbossTech/ln-channel-closure-prediction to encourage further research on this practically relevant task.

2605.12755 2026-05-14 cs.AI

State-Centric Decision Process

Sungheon Jeong, Ryozo Masukawa, Sanggeon Yun, Mahdi Imani, Mohsen Imani

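Affiliations * University of California, Irvine; Northeastern University

AI summary This paper proposes the State-Centric Decision Process (SDP), a runtime framework for language environments (such as web browsers and code terminals) that lack an explicit state space and transition structure. The agent builds the state space incrementally: it describes the desired environment state with natural-language predicates, acts, and checks the resulting observations against them, producing certified state transitions. Experiments show that SDP achieves the best training-free results on multiple benchmarks and supports finer-grained analysis and optimization of agent behavior.

Details
English abstract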

Language environments such as web browsers, code terminals, and interactive simulations emit raw text rather than states, and provide none of the runtime structure that MDP analysis requires: no explicit state space, no observation-to-state mapping, no certified transitions, and no termination criterion. We introduce the State-Centric Decision Process (SDP), a runtime framework that constructs these missing inputs by having the agent build them, predicate by predicate, as it acts. At each step the agent commits to a natural-language predicate describing how the world should look, takes an action to make it true, and checks the observation against it. Predicates that pass become certified states, and the resulting trajectory carries the four objects language environments do not provide, namely a task-induced state space, an observation-to-state mapping, certified transitions, and a termination criterion. We evaluate SDP on five benchmarks spanning planning, scientific exploration, web reasoning, and multi-hop question answering. SDP achieves the best training-free results on all five, with the advantage widening as the horizon grows. The certified trajectories additionally support analyses unavailable to reactive agents, including per-predicate credit assignment, failure localization, partial-progress measurement, and modular operator replacement.

2605.12754 2026-05-14 cs.LG

Constraint-Aware Flow Matching: Decision Aligned End-to-End Training for Constrained Sampling

Jacob K. Christopher, James E. Warner, Ferdinando Fioretto

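Affiliations * University of Virginia; NASA Langley Research Center

AI summary This paper proposes Constraint-Aware Flow Matching, a method that addresses the mismatch between training and sampling objectives when deep generative models must satisfy physical constraints. By explicitly incorporating constraint projections into the training objective, it aligns the model's learned dynamics with the constrained sampling process, reducing the distributional shift caused by projection-based corrections and improving generation quality. Experiments on several real-world scenarios indicate that the method is general and effective.

Details
English abstract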

Deep generative models provide state-of-the-art performance across a wide array of applications, with recent studies showing increasing applicability for science and engineering. Despite a growing corpus of literature focused on the integration of physics-based constraints into the generation process, existing approaches fail to enforce strict constraint satisfaction while maintaining sample quality. In particular, training-free constrained sampling methods, while providing per-sample feasibility guarantees, introduce a fundamental mismatch between the training objective and the constrained sampling procedure, often leading to performance degradation. Identifying this training-sampling misalignment as a central limitation of current constrained generative modeling approaches, this paper proposes Constraint-Aware Flow Matching, a novel end-to-end framework that explicitly incorporates constraint projections into the training objective. By aligning the model's learned dynamics with the constrained sampling process, the proposed method mitigates distributional shift induced by projection-based corrections, enabling high-quality constrained generation. The proposed approach is evaluated on three challenging real-world benchmarks, illustrating the generality and efficacy of the method.

2605.12752 2026-05-14 cs.LG

Low-Rank Adapters Initialization via Gradient Surgery for Continual Learning

Joana Pasquali, Ramiro N. Barros, Arthur S. Bianchessi, Vinícius Conte Turani, João Vitor Boer Abitante, Rafaela Cappelari Ravazio, Christian Mattjie, Otávio Parraga, Lucas S. Kupssinskü, Rodrigo C. Barros

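Affiliations * MALTA, Machine Learning Theory and Applications Lab, PUCRS, Porto Alegre, Brazil

AI summary This paper studies how to initialize low-rank adapters (LoRA) effectively in continual learning to mitigate catastrophic forgetting. The authors propose SLICE, a gradient-surgery-based initialization that accumulates gradients from the current task and from replayed tasks, reconciles them with a projection operator, and derives the adapter weights via truncated singular value decomposition (SVD), improving stability and plasticity in continual learning. Experiments show that SLICE outperforms existing methods on multiple benchmarks, improving average performance and forgetting while preserving general performance.

Details
English abstract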

LoRA is widely adopted for continual fine-tuning of Large Language Models due to its parameter efficiency, modularity across tasks, and compatibility with replay strategies. However, LoRA-based continual learning remains vulnerable to catastrophic forgetting, whose severity depends on how successive task gradients interact: when consecutive task gradients conflict, standard adapter initializations channel updates into subspaces that overwrite previously learned directions. We propose SLICE, a gradient-surgery-based initialization for LoRA adapters in continual learning. SLICE accumulates gradients from both the current task and a replay buffer of prior tasks, reconciles them through a projection operator, and decomposes the result via truncated SVD to initialize the adapter weights. We evaluate SLICE on the TRACE benchmark and sequences of Super-NI tasks, including a set of adversarial Super-NI sequences that we construct by mining task pairs with maximally opposing gradients. Compared to vanilla LoRA, LoRA-GA, and LoRAM, SLICE consistently achieves a better stability-plasticity trade-off, improving Average Performance, Final Performance and Forgetting metrics while preserving General Performance and In Context Performance across both standard and adversarial continual learning sequences.
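The abstract does not state which projection operator SLICE uses. As one familiar instance of gradient surgery, the sketch below shows a PCGrad-style projection: when the current-task gradient conflicts with the replay gradient (negative dot product), the conflicting component is removed. The function name and the flat-list representation are illustrative assumptions.

```python
def project_conflict(grad, ref):
    """PCGrad-style projection (an assumption, not necessarily SLICE's
    operator): if grad conflicts with ref, remove grad's component along ref."""
    dot = sum(g * r for g, r in zip(grad, ref))
    if dot >= 0.0:
        return list(grad)  # no conflict, keep the gradient unchanged
    ref_sq = sum(r * r for r in ref)
    scale = dot / ref_sq  # ref_sq > 0 here, because dot < 0 implies ref != 0
    return [g - scale * r for g, r in zip(grad, ref)]
```

After projection the result is orthogonal to the replay gradient, so a step along it does not, to first order, increase the replay loss.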

2605.12748 2026-05-14 cs.CL cs.AI cs.CY cs.LG

Simulating Students or Sycophantic Problem Solving? On Misconception Faithfulness of LLM Simulators

Heejin Do, Shashank Sonkar, Mrinmaya Sachan

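Affiliations * ETH Zürich; ETH AI Center; University of Central Florida

AI summary This study examines how well large language models (LLMs) work as simulated students. It notes that current evaluations focus on output similarity to real students and overlook whether a simulator maintains a coherent misconception and revises it selectively in response to feedback. The authors propose a new evaluation framework and a metric, the Selective Flip Score (SFS), which measures how selectively a model changes its answer under targeted feedback. Experiments find that existing models change their answers at similar rates across feedback conditions, a sycophantic behavior in which the model abandons the simulated belief and re-solves the problem. The study further proposes a post-training pipeline that improves misconception faithfulness.

Details
English abstract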

Large language models (LLMs) can fluently generate student-like responses, making them attractive as simulated students for training and evaluating AI tutors and human educators. Yet such simulators are typically evaluated by output similarity to real students, not by whether they behave like students with coherent misconceptions during interaction. We introduce a controlled framework for evaluating misconception faithfulness: whether a simulator maintains a misconception-driven belief state and updates selectively when feedback addresses the underlying misconception. Central to our framework is a misconception-contrastive feedback protocol that compares targeted feedback against two controls: misaligned feedback (targeting a different but plausible misconception) and generic feedback (only identifying that the answer is wrong). We propose Selective Flip Score (SFS), which quantifies how much more often a simulator flips its answer under targeted feedback than under contrastive controls. Across seven LLMs (4B-120B), multiple datasets, and prompting strategies, simulators exhibit near-zero SFS, correcting their answers at similarly high rates regardless of feedback relevance. Further analyses reveal a sycophantic failure mode: models behave less like students with misconceptions and more like problem-solvers who treat any corrective signal as a cue to abandon the simulated belief and re-solve from internal knowledge. To address this, we develop a post-training pipeline spanning supervised fine-tuning (SFT), preference optimization, and reinforcement learning (RL) with an SFS-aligned reward; SFT yields notable gains up to +0.56, and SFS-aligned RL provides more consistent improvements than preference optimization. Our results establish misconception faithfulness as a challenging yet trainable property, motivating a shift from static output matching toward interactive, belief-aware student modeling.
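The abstract defines SFS only informally. One plausible reading, sketched below as an assumption, is the flip rate under targeted feedback minus the mean flip rate under the two controls; the exact formula in the paper may differ.

```python
def flip_rate(flips):
    """Fraction of trials in which the simulator changed its answer (1 = flip)."""
    return sum(flips) / len(flips)


def selective_flip_score(targeted, misaligned, generic):
    """Assumed form of SFS: targeted flip rate minus the average flip rate
    under the misaligned and generic control conditions."""
    control = (flip_rate(misaligned) + flip_rate(generic)) / 2.0
    return flip_rate(targeted) - control


# Toy trial outcomes, invented for illustration only.
selective = selective_flip_score([1, 1, 1, 0], [1, 0, 0, 0], [0, 1, 0, 0])
sycophantic = selective_flip_score([1, 1, 1, 0], [1, 1, 1, 0], [1, 1, 1, 0])
```

Under this reading, a simulator that flips equally often whatever the feedback scores 0, matching the near-zero SFS the abstract reports, while one that flips only when its actual misconception is addressed scores close to 1.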