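arXivDaily daily academic digest: synchronized with the full arXiv data, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.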

2501.18873 2026-04-23 cs.LG

Best Policy Learning from Trajectory Preference Feedback

Akhil Agnihotri, Rahul Jain, Deepak Ramachandran, Zheng Wen

Abstract

Reinforcement Learning from Human Feedback (RLHF) has emerged as a powerful approach for aligning generative models, but its reliance on learned reward models makes it vulnerable to mis-specification and reward hacking. Preference-based Reinforcement Learning (PbRL) offers a more robust alternative by directly leveraging noisy binary comparisons over trajectories. We study the best policy identification problem in PbRL, motivated by post-training optimization of generative models, for example, during multi-turn interactions. Learning in this setting combines an offline preference dataset, potentially biased or out-of-distribution and collected from a rater of subpar 'competence', with online pure exploration, making systematic online learning essential. To this end, we propose Posterior Sampling for Preference Learning ($\mathsf{PSPL}$), a novel algorithm inspired by Top-Two Thompson Sampling that maintains posteriors over the reward model and dynamics. We provide the first Bayesian simple regret guarantees for PbRL and introduce an efficient approximation that outperforms existing baselines on simulation and image generation benchmarks.

2412.13697 2026-04-23 cs.LG

Splitting criteria for ordinal decision trees: an experimental study

Rafael Ayllón-Gavilán, Francisco José Martínez-Estudillo, David Guijo-Rubio, César Hervás-Martínez, Pedro Antonio Gutiérrez

Comments 33 pages, 4 figures, 6 tables

DOI: 10.1016/j.patcog.2025.112273
Journal ref: Pattern Recognition, Volume 171, Part B, March 2026, 112273

Abstract

Ordinal Classification (OC) addresses those classification tasks where the labels exhibit a natural order. Unlike nominal classification, which treats all classes as mutually exclusive and unordered, OC takes the ordinal relationship into account, producing more accurate and relevant results. This is particularly critical in applications where the magnitude of classification errors has significant consequences. Despite this, OC problems are often tackled using nominal methods, leading to suboptimal solutions. Although decision trees are among the most popular classification approaches, ordinal tree-based approaches have received less attention when compared to other classifiers. This work provides a comprehensive survey of ordinal splitting criteria, standardising the notations used in the literature to enhance clarity and consistency. Three ordinal splitting criteria, Ordinal Gini (OGini), Weighted Information Gain (WIG), and Ranking Impurity (RI), are compared to the nominal counterparts of the first two (Gini and information gain), by incorporating them into a decision tree classifier. An extensive repository considering $45$ publicly available OC datasets is presented, supporting the first experimental comparison of ordinal and nominal splitting criteria using well-known OC evaluation metrics. The results have been statistically analysed, highlighting that OGini stands out as the best ordinal splitting criterion to date, reducing the mean absolute error achieved by Gini by more than 3.02%. To promote reproducibility, all source code developed, a detailed guide for reproducing the results, the 45 OC datasets, and the individual results for all the evaluated methodologies are provided.

2411.16719 2026-04-23 cs.CV cs.LG

Learn2Synth: Learning Optimal Data Synthesis Using Hypergradients for Brain Image Segmentation

Xiaoling Hu, Xiangrui Zeng, Oula Puonti, Juan Eugenio Iglesias, Bruce Fischl, Yael Balbastre

Comments 16 pages, 5 figures. Accepted by ICCV'25. Bruce Fischl and Yael Balbastre are co-senior authors

2411.10109 2026-04-23 cs.AI cs.HC cs.LG

LLM Agents Grounded in Self-Reports Enable General-Purpose Simulation of Individuals

Joon Sung Park, Carolyn Q. Zou, Jonne Kamphorst, Niles Egan, Aaron Shaw, Benjamin Mako Hill, Carrie Cai, Meredith Ringel Morris, Percy Liang, Robb Willer, Michael S. Bernstein

2410.06239 2026-04-23 cs.RO

Open-Architecture End-to-End System for Real-World Autonomous Robot Navigation

Venkata Naren Devarakonda, Ali Umut Kaypak, Raktim Gautam Goswami, Naman Patel, Rooholla Khorrambakht, Prashanth Krishnamurthy, Farshad Khorrami

2408.07295 2026-04-23 cs.RO cs.AI

Learning Multi-Modal Whole-Body Control for Real-World Humanoid Robots

Pranay Dugar, Aayam Shrestha, Fangzhou Yu, Bart van Marum, Alan Fern

Comments Website: https://masked-humanoid.github.io/mhc/

2408.00929 2026-04-23 cs.LG cs.CR

Verification of Machine Unlearning is Fragile

Binchi Zhang, Zihan Chen, Cong Shen, Jundong Li

Comments ICML 2024

2408.00920 2026-04-23 cs.LG stat.ML

Towards Certified Unlearning for Deep Neural Networks

Binchi Zhang, Yushun Dong, Tianhao Wang, Jundong Li

Comments ICML 2024 (errata)

2407.17395 2026-04-23 cs.LG

The Costs of Pretending That There Are Data-Generating Probability Distributions in the Social World

Benedikt Höltgen, Robert C. Williamson

Comments Accepted at FAccT'26

2403.20208 2026-04-23 cs.LG cs.AI

Unlock the Potential of Large Language Models for Predictive Tabular Tasks in Data Science with Table-Specific Pretraining

Yazheng Yang, Yuqi Wang, Yaxuan Li, Sankalok Sen, Lei Li, Lin Qiu, Qi Liu

Comments 10 pages; Accepted by TKDE

2402.13103 2026-04-23 cs.LG math.ST stat.TH

Multivariate Functional Linear Discriminant Analysis for the Classification of Short Time Series with Missing Data

Rahul Bordoloi, Clémence Réda, Orell Trautmann, Saptarshi Bej, Olaf Wolkenhauer

2402.06266 2026-04-23 cs.LG

Issues with Value-Based Multi-objective Reinforcement Learning: Value Function Interference and Overestimation Sensitivity

Peter Vamplew, Ethan Watkins, Cameron Foale, Richard Dazeley

Comments This updates our previous pre-print to add extended discussion of value-function interference as well as new material illustrating the interaction between Q-value overestimation and non-linear utility

2308.00513 2026-04-23 cs.RO

UVIO: An UWB-Aided Visual-Inertial Odometry Framework with Bias-Compensated Anchors Initialization

Giulio Delama, Farhad Shamsfakhr, Stephan Weiss, Daniele Fontanelli, Alessandro Fornasier

2306.10084 2026-04-23 cs.LG

Convolutional and Deep Learning based techniques for Time Series Ordinal Classification

Rafael Ayllón-Gavilán, David Guijo-Rubio, Pedro Antonio Gutiérrez, Anthony Bagnall, César Hervás-Martínez

Comments 13 pages, 9 figures, 2 tables

2305.09288 2026-04-23 cs.LG

A Dictionary-based approach to Time Series Ordinal Classification

Rafael Ayllón-Gavilán, David Guijo-Rubio, Pedro Antonio Gutiérrez, César Hervás-Martínez

2208.14649 2026-04-23 cs.CV

DetailCLIP: Injecting Image Details into CLIP's Feature Space

Zilun Zhang, Cuifeng Shen, Yuan Shen, Xinyu Zhou, Huixin Xiong, Tiancheng Zhao, Jianwei Yin

2105.09232 2026-04-23 cs.LG math.ST stat.TH

Diffusion Approximations for Thompson Sampling in the Small Gap Regime

Lin Fan, Peter W. Glynn

1810.11624 2026-04-23 cs.LG stat.ML

Time series clustering based on the characterisation of segment typologies

David Guijo-Rubio, Antonio Manuel Durán-Rosal, Pedro Antonio Gutiérrez, Alicia Troncoso, César Hervás-Martínez

Comments 13 pages, 7 figures, 4 tables, 57 refs

DOI: 10.1109/TCYB.2019.2962584
Journal ref: IEEE Transactions on Cybernetics (Volume: 51, Issue: 11, November 2021)

Abstract

Time series clustering is the process of grouping time series with respect to their similarity or characteristics. Previous approaches usually combine a specific distance measure for time series and a standard clustering method. However, these approaches do not take the similarity of the different subsequences of each time series into account, which can be used to better compare the time series objects of the dataset. In this paper, we propose a novel technique of time series clustering based on two clustering stages. In a first step, a least squares polynomial segmentation procedure is applied to each time series, which is based on a growing window technique that returns different-length segments. Then, all the segments are projected into the same dimensional space, based on the coefficients of the model that approximates the segment and a set of statistical features. After mapping, a first hierarchical clustering phase is applied to all mapped segments, returning groups of segments for each time series. These clusters are used to represent all time series in the same dimensional space, after defining another specific mapping process. In a second and final clustering stage, all the time series objects are grouped. We consider internal clustering quality to automatically adjust the main parameter of the algorithm, which is an error threshold for the segmentation. The results obtained on 84 datasets from the UCR Time Series Classification Archive have been compared against two state-of-the-art methods, showing that the performance of this methodology is very promising.
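
The first step in the abstract above (least squares polynomial segmentation with a growing window and an error threshold) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `growing_window_segments`, the parameters `degree`, `max_error`, and `min_len`, and the use of RMSE as the error measure are all assumptions made for illustration.

```python
import numpy as np

def growing_window_segments(series, degree=1, max_error=0.5, min_len=3):
    """Split `series` into variable-length segments (hypothetical helper).

    A window grows one point at a time. When the least squares polynomial
    fit over the enlarged window has an RMSE above `max_error` (an assumed
    error measure), the segment is closed before that point and a new
    window starts there. Returns (start, end) index pairs, end exclusive.
    Assumes min_len >= degree + 1 so every fit is well determined.
    """
    y = np.asarray(series, dtype=float)
    n = len(y)
    segments = []
    start = 0
    end = min(start + min_len, n)
    while start < n:
        while end < n:
            # Candidate window includes one more point than the current segment.
            window = y[start:end + 1]
            x = np.arange(len(window))
            coeffs = np.polyfit(x, window, degree)
            rmse = np.sqrt(np.mean((np.polyval(coeffs, x) - window) ** 2))
            if rmse > max_error:
                break  # the new point does not fit; close the segment
            end += 1
        segments.append((start, end))
        start = end
        end = min(start + min_len, n)
    return segments

# Two linear pieces separated by a jump are split at the jump.
print(growing_window_segments([0, 1, 2, 3, 4, 5, 20, 21, 22, 23, 24, 25]))
```

In this sketch, `max_error` plays the role of the segmentation error threshold that the abstract names as the main parameter of the algorithm; the coefficients returned by `np.polyfit` for each segment are the kind of model coefficients that could then be used as features in the mapping step.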

2604.20842 2026-04-23 cs.CL cs.AI cs.SD

SpeechParaling-Bench: A Comprehensive Benchmark for Paralinguistic-Aware Speech Generation

Ruohan Liu, Shukang Yin, Tao Wang, Dong Zhang, Weiji Zhuang, Shuhuai Ren, Ran He, Caifeng Shan, Chaoyou Fu

Comments Project page: https://speechparaling-bench.github.io/

2604.20841 2026-04-23 cs.CV

DeVI: Physics-based Dexterous Human-Object Interaction via Synthetic Video Imitation

Hyeonwoo Kim, Jeonghwan Kim, Kyungwon Cho, Hanbyul Joo

Comments Project Page: https://snuvclab.github.io/devi/

2604.20835 2026-04-23 cs.CL

Parallel-SFT: Improving Zero-Shot Cross-Programming-Language Transfer for Code RL

Zhaofeng Wu, Shiqi Wang, Boya Peng, Anuj Goyal, Melanie Kambadur, Sebastian Ruder, Yoon Kim, Chloe Bi

2604.20825 2026-04-23 cs.LG cs.AI cs.CV cs.DC eess.SP

FedSIR: Spectral Client Identification and Relabeling for Federated Learning with Noisy Labels

Sina Gholami, Abdulmoneam Ali, Tania Haghighi, Ahmed Arafa, Minhaj Nur Alam

Comments Accepted at the 5th Workshop on Federated Learning for Computer Vision (FedVision), CVPR 2026. Sina Gholami and Abdulmoneam Ali contributed equally

2604.20824 2026-04-23 cs.LG q-bio.QM

Closing the Domain Gap in Biomedical Imaging by In-Context Control Samples

Ana Sanchez-Fernandez, Thomas Pinetz, Werner Zellinger, Günter Klambauer

2604.20822 2026-04-23 cs.CV cs.LG

Global Offshore Wind Infrastructure: Deployment and Operational Dynamics from Dense Sentinel-1 Time Series

Thorsten Hoeser, Felix Bachofer, Claudia Kuenzer

Comments 25 pages, 16 figures

2604.20819 2026-04-23 cs.LG cs.DC

Stream-CQSA: Avoiding Out-of-Memory in Attention Computation via Flexible Workload Scheduling

Yiming Bian, Joshua M. Akey

2604.20817 2026-04-23 cs.CL cs.AI cs.LG

Convergent Evolution: How Different Language Models Learn Similar Number Representations

Deqing Fu, Tianyi Zhou, Mikhail Belkin, Vatsal Sharan, Robin Jia

2604.20816 2026-04-23 cs.LG cs.CV

ParetoSlider: Diffusion Models Post-Training for Continuous Reward Control

Shelly Golan, Michael Finkelson, Ariel Bereslavsky, Yotam Nitzan, Or Patashnik

Comments Project page: https://shelley-golan.github.io/ParetoSlider-webpage/

2604.20813 2026-04-23 cs.CV

Adapting TrOCR for Printed Tigrinya Text Recognition: Word-Aware Loss Weighting for Cross-Script Transfer Learning

Yonatan Haile Medhanie, Yuanhua Ni

Comments Code and models available at https://github.com/YoHa2024NKU/Tigrinya_TrOCR_Printed. Pre-trained models: https://huggingface.co/Yonatanhaile2026/tigrinya-trocrprinted, https://huggingface.co/Yonatanhaile2026/tigrinya-trocrhandwritten

2604.20811 2026-04-23 cs.AI

Diagnosing CFG Interpretation in LLMs

Hanqi Li, Lu Chen, Kai Yu

2604.20806 2026-04-23 cs.CV cs.AI cs.CL

OMIBench: Benchmarking Olympiad-Level Multi-Image Reasoning in Large Vision-Language Model

Qiguang Chen, Chengyu Luan, Jiajun Wu, Qiming Yu, Yi Yang, Yizhuo Li, Jingqi Tong, Xiachong Feng, Libo Qin, Wanxiang Che

Comments ACL 2026 Camera Ready