arXivDaily: arXiv daily academic digest, updated Monday through Friday
2506.05952 2026-05-05 cs.CV cs.AI

MOGO: Residual Quantized Hierarchical Causal Transformer for High-Quality and Real-Time 3D Human Motion Generation

Dongjie Fu, Tengjiao Sun, Pengcheng Fang, Xiaohao Cai, Hansung Kim

Comments 9 pages, 4 figures, conference

Abstract

Recent advances in transformer-based text-to-motion generation have led to impressive progress in synthesizing high-quality human motion. Nevertheless, jointly achieving high fidelity, streaming capability, real-time responsiveness, and scalability remains a fundamental challenge. In this paper, we propose MOGO (Motion Generation with One-pass), a novel autoregressive framework tailored for efficient and real-time 3D motion generation. MOGO comprises two key components: (1) MoSA-VQ, a motion scale-adaptive residual vector quantization module that hierarchically discretizes motion sequences with learnable scaling to produce compact yet expressive representations; and (2) RQHC-Transformer, a residual quantized hierarchical causal transformer that generates multi-layer motion tokens in a single forward pass, significantly reducing inference latency. To enhance semantic fidelity, we further introduce a text condition alignment mechanism that improves motion decoding under textual control. Extensive experiments on benchmark datasets including HumanML3D, KIT-ML, and CMP demonstrate that MOGO achieves competitive or superior generation quality compared to state-of-the-art transformer-based methods, while offering substantial improvements in real-time performance, streaming generation, and generalization under zero-shot settings.
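To make the residual quantization idea concrete, here is a minimal sketch of plain residual vector quantization in Python. It is not the MoSA-VQ module: the learnable scaling described in the abstract is omitted, and the function names (`rvq_encode`, `rvq_decode`), the hand-written codebooks, and the 2-D toy vector are all illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Codebook = Sequence[Vector]


def nearest_code(residual: Vector, codebook: Codebook) -> int:
    """Return the index of the codebook entry closest to `residual` (squared L2)."""
    def sq_dist(code: Vector) -> float:
        return sum((r - c) ** 2 for r, c in zip(residual, code))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def rvq_encode(x: Vector, codebooks: Sequence[Codebook]) -> Tuple[List[int], List[float]]:
    """Quantize `x` layer by layer; each layer encodes what the previous layers missed."""
    residual = list(x)
    indices = []
    for codebook in codebooks:
        idx = nearest_code(residual, codebook)
        indices.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return indices, residual


def rvq_decode(indices: Sequence[int], codebooks: Sequence[Codebook]) -> List[float]:
    """Reconstruct the vector by summing the selected code from every layer."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hypothetical two-layer codebooks: a coarse layer and a finer correction layer.
codebooks = [
    [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],
    [(0.0, 0.0), (0.5, -0.5), (-0.5, 0.5)],
]
indices, residual = rvq_encode((1.4, 0.6), codebooks)
print(indices)                          # [1, 1]
print(rvq_decode(indices, codebooks))   # [1.5, 0.5]
```

Each additional layer shrinks the leftover residual, which is why a stack of small codebooks can stand in for one very large codebook.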

2506.03820 2026-05-05 cs.CL

Automatic Correction of Writing Anomalies in Hausa Texts

Ahmad Mustapha Wali, Sergiu Nisioi

Comments Accepted at ACL 2026

Abstract

Hausa texts are often characterized by writing anomalies, such as incorrect character substitutions and spacing errors, which sometimes hinder natural language processing (NLP) applications. This paper presents an approach to automatically correct anomalies by finetuning transformer-based models. Using a corpus gathered from several public sources, we create a large-scale parallel dataset of over 400,000 noisy-clean Hausa sentence pairs by introducing synthetically generated noise to mimic realistic writing errors. In addition, we finetune several multilingual and African language models, including M2M100, AfriTeVA, NCAIR1/N-ATLaS, UBC-NLP/cheetah-base, and other variants of BART and T5 for this correction task. Our experimental results demonstrate that models such as M2M100 achieve state-of-the-art results despite their smaller size and distinct pretraining, and that correcting errors can have a significant impact in improving downstream tasks such as text classification, machine translation, question answering, and LLM prompting in general. This research provides a methodology, a publicly available dataset, and a comparison of models to improve Hausa text quality, thereby advancing NLP capabilities for the language and offering transferable insights for other low-resource languages.
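The synthetic-noise step can be sketched as follows. This is only an illustration of the general idea of building noisy-clean pairs, not the authors' pipeline: the substitution table (hooked letters replaced by plain ASCII letters), the probabilities, and the names `add_noise` and `make_pairs` are assumptions made for the example.

```python
import random
from typing import List, Tuple

# Assumed substitution table: hooked Hausa letters typed as plain ASCII letters.
HOOKED_TO_PLAIN = {"ɓ": "b", "ɗ": "d", "ƙ": "k", "Ɓ": "B", "Ɗ": "D", "Ƙ": "K"}


def add_noise(sentence: str, rng: random.Random,
              p_sub: float = 0.3, p_space: float = 0.1) -> str:
    """Corrupt a clean sentence with character substitutions and dropped spaces."""
    out = []
    for ch in sentence:
        if ch in HOOKED_TO_PLAIN and rng.random() < p_sub:
            out.append(HOOKED_TO_PLAIN[ch])   # substitution error
        elif ch == " " and rng.random() < p_space:
            continue                          # spacing error: merge two words
        else:
            out.append(ch)
    return "".join(out)


def make_pairs(clean_sentences: List[str], seed: int = 0) -> List[Tuple[str, str]]:
    """Return (noisy, clean) pairs suitable for training a correction model."""
    rng = random.Random(seed)
    return [(add_noise(s, rng), s) for s in clean_sentences]


# With p_sub=1.0 and p_space=0.0 every hooked letter is replaced and spaces are kept.
print(add_noise("ɗan ƙasa", random.Random(0), p_sub=1.0, p_space=0.0))  # dan kasa
```

A sequence-to-sequence model would then be fine-tuned to map the first element of each pair back to the second.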

2505.20340 2026-05-05 cs.CL cs.AI

Latent Trajectory Dynamics in Large Language Models: A Manifold Evolution Framework with Empirical Validation

Yukun Zhang, Qi Dong, Mengkang Li

Abstract

Understanding how latent representations evolve during generation is a central open problem in large language model interpretability. We introduce Dynamical Manifold Evolution Theory (DMET), a phenomenological framework that models LLM generation as a controlled dynamical system evolving along a trajectory on a low-dimensional semantic manifold. DMET formalizes the structural correspondence between Transformer components and a first-order ODE governed by a semantic potential $V$, and characterizes trajectory geometry through three falsifiable proxy metrics: state continuity $C$, attractor clustering quality $Q$, and topological persistence $P$, targeting local smoothness, meso-scale basin structure, and global topological organization, respectively. Across six model architectures, four task types, and 1,080 experimental runs, all three metrics consistently predict text quality outcomes -- log-perplexity, grammaticality, and cross-sentence coherence -- after controlling for decoding parameters, with associations surviving Benjamini--Hochberg correction. Ablation and sanity-check experiments confirm that the effects arise from genuine trajectory structure rather than static distributional artefacts. Furthermore, online monitoring of $C$ drives an adaptive decoding controller that reduces perplexity from 48.5 to 14.6 relative to a fixed-parameter baseline, demonstrating that latent dynamics characterization translates directly into actionable generation control.
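For readers unfamiliar with the Benjamini-Hochberg correction mentioned above, a minimal sketch of the standard step-up procedure follows. The function name `benjamini_hochberg` and the p-values are illustrative assumptions and are unrelated to the paper's data.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Return, for each hypothesis, whether it is rejected at FDR level `alpha`."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    # Find the largest rank k such that p_(k) <= alpha * k / m.
    max_k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            max_k = rank
    # Reject every hypothesis whose rank is at most max_k (the step-up rule).
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= max_k:
            rejected[i] = True
    return rejected


# Hypothetical p-values: only the two smallest survive at alpha = 0.05.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(p))
# [True, True, False, False, False, False, False, False]
```

The step-up rule means a hypothesis can be rejected even when its own p-value exceeds its threshold, as long as some larger-ranked p-value meets its threshold.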

2505.19607 2026-05-05 cs.LG cs.AI

Contrastive Residual Energy Test-time Adaptation

Yewon Han, Seoyun Yang, Taesup Kim

Abstract

Test-time adaptation (TTA) enhances model robustness by enabling adaptation to target distributions that differ from training distributions, improving real-world generalizability. However, most existing TTA approaches focus on adjusting the conditional distribution and therefore exhibit poor calibration, as they rely on uncertain predictions in the absence of labels. Energy-based TTA frameworks provide an alternative by modeling the marginal distribution of target data without depending on label predictions, but their reliance on costly sampling hinders scalability in real-world scenarios where decisions must be made without latency. In this work, we propose Contrastive Residual Energy Test-time Adaptation (CreTTA), a practical solution for reliable adaptation. We theoretically reformulate the marginal distribution adaptation as learning a residual energy function. This formulation leads to a contrastive objective where the intractable partition function mathematically cancels out, removing sampling and approximation error. Crucially, our analysis reveals that this design prevents overfitting through an adaptive gradient reweighting mechanism that leverages relative energy differences, avoiding the self-confirming bias of entropy minimization. Extensive experiments demonstrate that CreTTA achieves scalable and well-calibrated adaptation under real-world computational constraints.

2505.17370 2026-05-05 cs.LG cs.AI

Ellipsoidal Time Series Forecasting

Qilin Wang

Comments Accepted by ICML 2026. Public code at https://anonymous.4open.science/r/FernPaper-58B4

Abstract

We argue that long-term forecasting requires learning local Jacobians with explicit spectral structure, going beyond simple conditional mean matching. Our method, Fern, invokes Brenier's theorem to directly parameterize the Jacobian as a symmetric positive semi-definite (SPD) factorization, treating forecasting as the optimal transport of probability mass from a fixed Gaussian source to data-dependent ellipsoids. This formulation reduces the computational cost of eigendecomposition from cubic to linear time while providing interpretable, geometry-aware projections. To rigorously evaluate robustness, we introduce a synthetic benchmark with controlled non-stationary shocks alongside new metrics like Effective Prediction Time (EPT). Fern demonstrates exceptional stability, outperforming baselines like DLinear and Koopa by over two orders of magnitude (up to 790x) on nonstationary settings where standard benchmarks fail to expose model brittleness.
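The link between Brenier's theorem, a symmetric Jacobian, and ellipsoids can be illustrated with the textbook Gaussian-to-Gaussian case. The derivation below is a standard result offered as background, not a statement of how Fern is implemented; the symbols $\mu$, $\Sigma$, $U$, and $s$ are assumptions introduced for the illustration.

```latex
% Assumption: source z ~ N(0, I_d), target x ~ N(\mu, \Sigma) with \Sigma \succ 0.
\begin{aligned}
T(z) &= \mu + \Sigma^{1/2} z = \nabla \varphi(z), \qquad
\varphi(z) = \mu^\top z + \tfrac{1}{2}\, z^\top \Sigma^{1/2} z, \\
\nabla T(z) &= \Sigma^{1/2} \succeq 0, \\
T\big(\{ z : \|z\|_2 \le 1 \}\big) &= \{ x : (x - \mu)^\top \Sigma^{-1} (x - \mu) \le 1 \}.
\end{aligned}
```

Because $\varphi$ is convex, $T$ is the optimal transport map for quadratic cost, and its Jacobian is symmetric and positive semi-definite. If the Jacobian is parameterized directly as $\Sigma^{1/2} = U \,\mathrm{diag}(s)\, U^\top$ with orthonormal $U$, the ellipsoid's axes are the columns of $U$ and its semi-axis lengths are the entries of $s$, so the spectrum is available without a separate eigendecomposition.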

2505.16850 2026-05-05 cs.LG cs.CL cs.CV

ATR-Bench: A Federated Learning Benchmark for Adaptation, Trust, and Reasoning

Tajamul Ashraf, Mohammed Mohsen Peerzada, Moloud Abdar, Yutong Xie, Yuyin Zhou, Xiaofeng Liu, Iqra Altaf Gillani, Janibul Bashir

Comments This paper is withdrawn due to issues in attribution to related work and the fair attribution of benchmark results, which were not adequately addressed at the time of submission. These issues affect the experimental analysis and require substantial revision

Abstract

Federated Learning (FL) has emerged as a promising paradigm for collaborative model training while preserving data privacy across decentralized participants. As FL adoption grows, numerous techniques have been proposed to tackle its practical challenges. However, the lack of standardized evaluation across key dimensions hampers systematic progress and fair comparison of FL methods. In this work, we introduce ATR-Bench, a unified framework for analyzing federated learning through three foundational dimensions: Adaptation, Trust, and Reasoning. We provide an in-depth examination of the conceptual foundations, task formulations, and open research challenges associated with each theme. We have extensively benchmarked representative methods and datasets for adaptation to heterogeneous clients and trustworthiness in adversarial or unreliable environments. Due to the lack of reliable metrics and models for reasoning in FL, we only provide literature-driven insights for this dimension. ATR-Bench lays the groundwork for a systematic and holistic evaluation of federated learning with real-world relevance. We will make our complete codebase publicly accessible, together with a curated repository that continuously tracks new developments and research in the FL literature.

2505.12546 2026-05-05 cs.CL cs.CY cs.LG

Extracting memorized pieces of (copyrighted) books from open-weight language models

A. Feder Cooper, Mark A. Lemley, Allison Casasola, Ahmed Ahmed, Aaron Gokaslan, Amy B. Cyphert, Christopher De Sa, Daniel E. Ho, Percy Liang

Abstract

Plaintiffs and defendants in copyright lawsuits over generative AI often make sweeping, opposing claims about the extent to which large language models (LLMs) memorize protected expression from books in their training data. We show that these polarized positions dramatically oversimplify the relationship between memorization and copyright. To do so, we develop a technique to measure memorization of books, which we apply to 200 books and 14 open-weight LLMs. Through over 3000 experiments, we show that memorization varies both by model and book. With respect to our specific extraction methodology, we find that most LLMs do not memorize most books -- either in whole or in part; however, there are notable exceptions. For instance, Llama 3.1 70B entirely memorizes some books, like Harry Potter and the Sorcerer's Stone; memorization is so extensive that one can deterministically extract the whole book almost verbatim using the book's first few words as an initial prompt. We discuss why our results have significant implications for copyright cases, though not ones that unambiguously favor either side.

2505.02380 2026-05-05 cs.LG

EntroLLM: Entropy Encoded Weight Compression for Efficient Large Language Model Inference on Edge Devices

Arnab Sanyal, Gourav Datta, Prithwish Mukherjee, Sandeep P. Chinchali, Michael Orshansky

Comments 4 pages, 1 reference page

Abstract

Large Language Models (LLMs) achieve strong performance across tasks, but face storage and compute challenges on edge devices. We propose EntroLLM, a compression framework combining mixed quantization and entropy coding to reduce storage while preserving accuracy. We use a combination of unsigned and asymmetric quantization. Tensor-level quantization produces an entropy-reducing effect, increasing weight compressibility, and improving downstream Huffman encoding by $7\times$ (8-bit) and $11.3\times$ (4-bit) over state-of-the-art methods. Huffman coding further reduces memory bandwidth demands, while a parallel decoding strategy enables efficient weight retrieval with minimal latency. Experiments on edge-scale LLMs (smolLM-1.7B, phi3-mini-4k, mistral-7B) show up to $30\%$ storage savings over uint8 and $65\%$ over uint4 models, with $31.9-146.6\%$ faster inference on memory-limited devices like the NVIDIA JETSON P3450. EntroLLM requires no retraining and is compatible with existing post-training quantization pipelines, making it practical for edge LLM deployment.
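The two ingredients named in the abstract, quantization followed by Huffman coding, can be sketched generically as below. This is not EntroLLM's scheme: the min/max asymmetric quantizer, the function names, and the toy symbol stream are assumptions, and the sketch only counts encoded bits instead of producing a bitstream.

```python
import heapq
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def quantize_asymmetric(weights: Sequence[float], bits: int = 8) -> Tuple[List[int], float, float]:
    """Map floats to unsigned integers in [0, 2**bits - 1] using a min/max range."""
    lo, hi = min(weights), max(weights)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    return [round((w - lo) / scale) for w in weights], scale, lo


def huffman_code_lengths(symbols: Sequence[int]) -> Dict[int, int]:
    """Return the Huffman code length (in bits) assigned to each distinct symbol."""
    counts = Counter(symbols)
    if len(counts) == 1:                       # degenerate case: one symbol, one bit
        return {next(iter(counts)): 1}
    # Heap items: (frequency, unique tie-breaker, symbols in this subtree).
    heap = [(n, i, [s]) for i, (s, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(counts, 0)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, s1 = heapq.heappop(heap)
        n2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:                      # merging pushes these symbols one level deeper
            lengths[s] += 1
        heapq.heappush(heap, (n1 + n2, tie, s1 + s2))
        tie += 1
    return lengths


def encoded_bits(symbols: Sequence[int]) -> int:
    """Total size of the Huffman-coded stream, excluding the code table."""
    lengths = huffman_code_lengths(symbols)
    return sum(lengths[s] for s in symbols)


# A skewed 2-bit symbol stream: 16 symbols cost 32 bits at fixed width.
symbols = [0] * 8 + [1] * 4 + [2] * 2 + [3] * 2
print(encoded_bits(symbols))  # 28
```

The saving grows as the symbol distribution becomes more skewed, which is the sense in which a lower-entropy quantized weight distribution compresses better.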

2504.20605 2026-05-05 cs.CL cs.AI cs.DL cs.LG

TF1-EN-3M: Three Million Synthetic Moral Fables for Training Small, Open Language Models

Mihai Nadas, Laura Diosan, Andrei Piscoran, Andreea Tomescu

Comments 18 pages, 6 tables, 1 figure. v2: revised evaluation with open-weight LLM judge panel, expanded citations

Abstract

Moral stories are a time-tested vehicle for transmitting values, yet modern NLP lacks a large, structured corpus that couples coherent narratives with explicit ethical lessons. We present TF1-EN-3M, to our knowledge the first open dataset of three million English-language fables generated exclusively by instruction-tuned models no larger than 8B parameters. Each story follows a six-slot scaffold (character -> trait -> setting -> conflict -> resolution -> moral), produced through a combinatorial prompt engine that guarantees genre fidelity while covering a broad thematic space. A fully reproducible evaluation pipeline employs a panel of open-weight LLM judges from distinct model families, scoring grammar, creativity, moral clarity, and template adherence, complemented by reference-free diversity and readability metrics. Among ten open-weight generator candidates, an 8B-parameter Llama-3 variant delivers the best quality-cost trade-off, producing high-scoring fables on consumer hardware at approximately $0.135 per 1,000 fables. We release the dataset, generation code, evaluation scripts, and full metadata under a permissive license, enabling exact reproducibility and cost benchmarking. TF1-EN-3M opens avenues for research in instruction following, narrative intelligence, value alignment, and child-friendly educational AI -- demonstrating that large-scale moral storytelling requires neither proprietary giant models nor proprietary evaluation infrastructure.
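A combinatorial prompt engine over the six slots can be sketched as a Cartesian product of slot values. The slot values, the template wording, and the name `generate_prompts` below are hypothetical and are not taken from the released generation code.

```python
from itertools import product
from typing import Dict, Iterator, List

# The six slots named in the abstract, in scaffold order.
SLOT_ORDER = ["character", "trait", "setting", "conflict", "resolution", "moral"]

# Hypothetical slot values; a real engine would use far larger lists.
SLOTS: Dict[str, List[str]] = {
    "character": ["fox", "crow"],
    "trait": ["greedy", "patient"],
    "setting": ["forest", "riverbank"],
    "conflict": ["a shared meal", "a broken promise"],
    "resolution": ["an apology", "a fair trade"],
    "moral": ["honesty builds trust", "patience is rewarded"],
}

TEMPLATE = (
    "Write a short fable about a {trait} {character} in a {setting}. "
    "The conflict involves {conflict}, it is resolved by {resolution}, "
    "and the moral is: {moral}."
)


def generate_prompts(slots: Dict[str, List[str]]) -> Iterator[str]:
    """Yield one prompt per combination of slot values."""
    for combo in product(*(slots[name] for name in SLOT_ORDER)):
        yield TEMPLATE.format(**dict(zip(SLOT_ORDER, combo)))


prompts = list(generate_prompts(SLOTS))
print(len(prompts))  # 64, i.e. 2**6 combinations
print(prompts[0])
```

Because the prompt count is the product of the slot-list sizes, modest lists per slot are enough to reach millions of distinct prompts.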

2504.02293 2026-05-05 cs.CL cs.AI

Breaking the Silence: A Dataset and Benchmark for Bangla Text-to-Gloss Translation

Sharif Mohammad Abdullah, Abhijit Paul, Shubhashis Roy Dipta, Zarif Masud, Shebuti Rayana, Ahmedul Kabir

Abstract

Gloss is a written approximation that bridges Sign Language (SL) and its corresponding spoken language. Despite a deaf and hard-of-hearing population of at least 3 million in Bangladesh, Bangla Sign Language (BdSL) remains largely understudied, with no prior work on Bangla text-to-gloss translation and no publicly available datasets. To address this gap, we construct the first Bangla text-to-gloss dataset, consisting of 1,000 manually annotated and 4,000 synthetically generated Bangla sentence-gloss pairs, along with 159 expert human-annotated pairs used as a test set. Our experimental framework performs a comparative analysis between several fine-tuned open-source models and a leading closed-source LLM to evaluate their performance in low-resource BdSL translation. GPT-5.4 achieves the best overall performance, while a fine-tuned mBART model performs competitively despite being approximately 100x smaller. Qwen-3 outperforms all other models in human evaluation. This work introduces the first dataset and trained model for Bangla text-to-gloss translation. It also demonstrates the effectiveness of systematically generated synthetic data for addressing challenges in low-resource sign language translation.

2503.12001 2026-05-05 cs.CV

3D Gaussian Splatting against Moving Objects for High-Fidelity Street Scene Reconstruction

Peizhen Zheng, Dongjing Jiang, Qingchong Jiao, Redouane EL Bouchtaoui, Flynnwell Jianfei Zhang

Abstract

The accurate reconstruction of dynamic street scenes is critical for applications in autonomous driving, augmented reality, and virtual reality. Traditional methods relying on dense point clouds and triangular meshes struggle with moving objects, occlusions, and real-time processing constraints, limiting their effectiveness in complex urban environments. While multi-view stereo and neural radiance fields have advanced 3D reconstruction, they face challenges in computational efficiency and handling scene dynamics. This paper proposes a novel 3D Gaussian point distribution method for dynamic street scene reconstruction. Our approach introduces an adaptive transparency mechanism that eliminates moving objects while preserving high-fidelity static scene details. Additionally, iterative refinement of Gaussian point distribution enhances geometric accuracy and texture representation. We integrate directional encoding with spatial position optimization to optimize storage and rendering efficiency, reducing redundancy while maintaining scene integrity. Experimental results demonstrate that our method achieves high reconstruction quality, improved rendering performance, and adaptability in large-scale dynamic environments. These contributions establish a robust framework for real-time, high-precision 3D reconstruction, advancing the practicality of dynamic scene modeling across multiple applications. The source code for this work is available to the public at https://github.com/okic-ca/3dgs

2503.07557 2026-05-05 cs.RO

AutoSpatial: Visual-Language Reasoning for Social Robot Navigation through Efficient Spatial Reasoning Learning

Yangzhe Kong, Daeun Song, Jing Liang, Dinesh Manocha, Ziyu Yao, Xuesu Xiao

Abstract

We present AutoSpatial, an efficient method that uses structured spatial grounding to enhance VLMs' spatial reasoning. By combining minimal manual supervision with large-scale auto-labeling of Visual Question-Answering (VQA) pairs, our approach tackles the challenge of VLMs' limited spatial understanding in social navigation tasks. By applying a hierarchical two-round VQA strategy during training, AutoSpatial achieves both global and detailed understanding of scenarios, demonstrating more accurate spatial perception, movement prediction, Chain of Thought (CoT) reasoning, final action, and explanation compared to other SOTA approaches. These five components are essential for comprehensive social navigation reasoning. Our approach was evaluated using both expert systems (GPT-4o, Gemini 2.0 Flash, and Claude 3.5 Sonnet) that provided cross-validation scores and human evaluators who assigned relative rankings to compare model performances across four key aspects. With its enhanced spatial reasoning capabilities, AutoSpatial shows substantial improvements in the averaged cross-validation scores from expert systems for perception & prediction (up to 10.71%), reasoning (up to 16.26%), action (up to 20.50%), and explanation (up to 18.73%) compared to baseline models trained only on manually annotated data.

2502.16810 2026-05-05 cs.AI cs.CL cs.HC econ.GN q-fin.EC

AI Realtor: Towards Grounded Persuasive Language Generation for Automated Copywriting

Jibang Wu, Chenghao Yang, Yi Wu, Simon Mahns, Chaoqi Wang, Hao Zhu, Fei Fang, Haifeng Xu

Abstract

This paper develops an agentic framework that employs large language models (LLMs) for grounded persuasive language generation in automated copywriting, with real estate marketing as a focal application. Our method is designed to align the generated content with user preferences while highlighting useful factual attributes. This agent consists of three key modules: (1) Grounding Module, mimicking expert human behavior to predict marketable features; (2) Personalization Module, aligning content with user preferences; (3) Marketing Module, ensuring factual accuracy and the inclusion of localized features. We conduct systematic human-subject experiments in the domain of real estate marketing, with a focus group of potential house buyers. The results demonstrate that marketing descriptions generated by our approach are preferred over those written by human experts by a clear margin while maintaining the same level of factual accuracy. Our findings suggest a promising agentic approach to automate large-scale targeted copywriting while ensuring factuality of content generation.

2502.15311 2026-05-05 cs.CV cs.ET

A Comprehensive Review of Fish Feeding Behavior Analysis in Aquaculture: Tasks, Techniques, and Applications

Shulong Zhang, Daoliang Li, Jiayin Zhao, Mingyuan Yao, Yingyi Chen, Haihua Wang

Comments 37 pages, 8 figures

Abstract

Fish feeding behavior analysis is a key foundation for intelligent feeding and precision aquaculture management, and plays an important role in improving feed utilization efficiency, reducing production costs, and mitigating environmental burden. Existing reviews mainly focus on specific technical modalities or related applications in smart aquaculture, which makes it difficult to present the overall development of fish feeding behavior analysis in a comprehensive manner. To address these issues, this paper provides a thematic review of fish feeding behavior analysis in aquaculture, and systematically examines its task definition, technical support, and application status. First, from the task perspective, two core subtasks of fish feeding behavior analysis are clearly distinguished, and relevant behavioral characteristics and evaluation metrics are summarized. Second, from the technical perspective, the development trajectories of computer vision, acoustics, sensors, and multimodal fusion technologies are examined, and their advantages, limitations, and applicable scenarios are analyzed. On this basis, the application value of fish feeding behavior analysis in intelligent feeding and aquaculture management is further summarized. Finally, this paper discusses the challenges in robust perception under complex environments, generalization across fish species and farming scenarios, collaborative multimodal modeling and lightweight deployment, closed loop intelligent feeding, coordinated optimization of multiple tasks, and long-term production validation, and outlines future research directions. This review provides a reference for task standardization, technical selection, and engineering application in fish feeding behavior analysis, and offers insights into the development of smart aquaculture and sustainable aquaculture management.

2502.00204 2026-05-05 cs.LG cs.GT

Nearly-Optimal Bandit Learning in Stackelberg Games with Side Information

Maria-Florina Balcan, Martino Bernasconi, Matteo Castiglioni, Andrea Celli, Keegan Harris, Zhiwei Steven Wu

Comments Accepted to ICLR 2026

Abstract

We study the problem of online learning in Stackelberg games with side information between a leader and a sequence of followers. In every round the leader observes contextual information and commits to a mixed strategy, after which the follower best-responds. We provide learning algorithms for the leader which achieve $O(T^{1/2})$ regret under bandit feedback, an improvement from the previously best-known rates of $O(T^{2/3})$. Our algorithms rely on a reduction to linear contextual bandits in the utility space: in each round, a linear contextual bandit algorithm recommends a utility vector, which our algorithm inverts to determine the leader's mixed strategy. We extend our algorithms to the setting in which the leader's utility function is unknown, and also apply them to the problems of bidding in second-price auctions with side information and online Bayesian persuasion with public and private states. Finally, we observe that our algorithms empirically outperform previous results on numerical simulations.
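The round structure described above (the leader commits to a mixed strategy, then the follower best-responds) can be illustrated with a single-round sketch. It models neither the side information nor the learning algorithm; the payoff matrices and function names are hypothetical.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]  # rows: leader actions, columns: follower actions


def follower_best_response(leader_mix: Sequence[float], follower_utility: Matrix) -> int:
    """Follower action maximizing expected utility against the committed mixture."""
    n_actions = len(follower_utility[0])
    expected: List[float] = [
        sum(p * follower_utility[i][a] for i, p in enumerate(leader_mix))
        for a in range(n_actions)
    ]
    return max(range(n_actions), key=expected.__getitem__)  # ties go to the lowest index


def leader_expected_utility(leader_mix: Sequence[float],
                            leader_utility: Matrix,
                            follower_utility: Matrix) -> float:
    """Leader's expected payoff once the follower has best-responded."""
    a = follower_best_response(leader_mix, follower_utility)
    return sum(p * leader_utility[i][a] for i, p in enumerate(leader_mix))


# Hypothetical 2x2 game.
leader_u = [[2.0, 4.0], [1.0, 3.0]]
follower_u = [[1.0, 0.0], [0.0, 2.0]]

print(leader_expected_utility([1.0, 0.0], leader_u, follower_u))  # 2.0
print(leader_expected_utility([0.0, 1.0], leader_u, follower_u))  # 3.0
print(leader_expected_utility([0.5, 0.5], leader_u, follower_u))  # 3.5
```

In this toy game the mixed commitment earns more than either pure commitment, which is why the leader's decision space is the set of mixed strategies.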

2501.00112 2026-05-05 cs.RO

QuadPiPS: A Perception-informed Footstep Planner for Quadrupeds With Semantic Affordance Prediction

Max Asselmeier, Ye Zhao, Patricio A. Vela

Comments Under review. Project site: https://quadpips.github.io/

Abstract

This work proposes QuadPiPS, a perception-informed framework for quadrupedal foothold planning in the perception space. QuadPiPS employs a novel ego-centric local environment representation, known as the legged egocan, that is extended here to capture unique legged affordances through a joint geometric and semantic encoding that supports local motion planning and control for quadrupeds. QuadPiPS takes inspiration from the Augmented Leafs with Experience on Foliations (ALEF) planning framework to partition the foothold planning space into its discrete and continuous subspaces. To facilitate real-world deployment, QuadPiPS broadens the ALEF approach by synthesizing perception-informed, real-time, and kinodynamically-feasible reference trajectories through search and trajectory optimization techniques. To support deliberate and exhaustive searching, QuadPiPS over-segments the egocan floor via superpixels to provide a set of planar regions suitable for candidate footholds. Nonlinear trajectory optimization methods then compute swing trajectories to transition between selected footholds and provide long-horizon whole-body reference motions that are tracked under model predictive control and whole body control. Benchmarking with the ANYmal C quadruped across ten simulation environments and five baselines reveals that QuadPiPS excels in safety-critical settings with limited available footholds. Real-world validation on the Unitree Go2 quadruped equipped with a custom computational suite demonstrates that QuadPiPS enables terrain-aware locomotion on hardware.

2409.14500 2026-05-05 cs.LG cs.AI

GraphLand: Evaluating Graph Machine Learning Models on Diverse Industrial Data

Gleb Bazhenov, Oleg Platonov, Liudmila Prokhorenkova

Comments Accepted at NeurIPS 2025 (Datasets & Benchmarks Track)

Abstract

Although data that can be naturally represented as graphs is widespread in real-world applications across diverse industries, popular graph ML benchmarks for node property prediction only cover a surprisingly narrow set of data domains, and graph neural networks (GNNs) are often evaluated on just a few academic citation networks. This issue is particularly pressing in light of the recent growing interest in designing graph foundation models. These models are supposed to be able to transfer to diverse graph datasets from different domains, and yet the proposed graph foundation models are often evaluated on a very limited set of datasets from narrow applications. To alleviate this issue, we introduce GraphLand: a benchmark of 14 diverse graph datasets for node property prediction from a range of different industrial applications. GraphLand allows evaluating graph ML models on a wide range of graphs with diverse sizes, structural characteristics, and feature sets, all in a unified setting. Further, GraphLand allows investigating such previously underexplored research questions as how realistic temporal distributional shifts under transductive and inductive settings influence graph ML model performance. To mimic realistic industrial settings, we use GraphLand to compare GNNs with gradient-boosted decision trees (GBDT) models that are popular in industrial applications and show that GBDTs provided with additional graph-based input features can sometimes be very strong baselines. Further, we evaluate currently available general-purpose graph foundation models and find that they fail to produce competitive results on our proposed datasets.

2407.11933 2026-05-05 cs.LG

Fairness-Aware Multi-Group Target Detection in Online Discussion

Soumyajit Gupta, Maria De-Arteaga, Matthew Lease

Journal ref 2026 ACM Conference on Fairness, Accountability, and Transparency (FAccT)
Abstract

Target-group detection is the task of detecting which group(s) a piece of content is "directed at or about". Applications include targeted marketing, content recommendation, and group-specific content assessment. Key challenges include: 1) that a single post may target multiple groups; and 2) ensuring consistent detection accuracy across groups for fairness. In this work, we investigate fairness implications of target-group detection in the context of toxicity detection, where the perceived harm of a social media post often depends on which group(s) it targets. Because toxicity is highly contextual, language that appears benign in general can be harmful when targeting specific demographic groups. We show our fairness-aware multi-group target detection approach both reduces bias across groups and shows strong predictive performance, surpassing existing fairness-aware baselines. To enable reproducibility and spur future work, we share our code online.

2407.06150 2026-05-05 cs.CV

PanDORA: Casual HDR Radiance Acquisition of Indoor Scenes for Image-based Lighting

Mohammad Reza Karimi Dastjerdi, Dominique Tanguay-Gaudreau, Frédéric Fortier-Chouinard, Yannick Hold-Geoffroy, Nima Kalantari, Jean-François Lalonde

Comments 10 pages, 11 figures

Abstract

Most novel view synthesis methods -- including Neural Radiance Fields (NeRF) -- struggle to capture the high dynamic range (HDR) radiance required for realistic image-based lighting (IBL). This limitation stems from a reliance on low dynamic range (LDR) imagery, which fails to capture the intensity of light sources found in indoor environments. While exposure bracketing can recover this range, it is often too slow for practical, large-scale acquisition. In this work, we introduce PanDORA: PANoramic Dual-Observer Radiance Acquisition, a system specifically designed for the fast and affordable capture of high-quality HDR radiance maps for IBL. Our approach utilizes two 360° cameras mounted on a portable monopod to simultaneously record videos at different exposures. These videos are processed by our proposed two-stage NeRF-based algorithm featuring a novel self-calibrating pipeline to estimate camera parameters. This pipeline produces non-saturated HDR radiance fields that accurately capture the radiance of a scene. When evaluated on a new dataset of real indoor environments featuring HDR ground truth lighting, PanDORA demonstrates superior fidelity in reconstructing the peak intensities necessary for downstream rendering tasks, providing a scalable and efficient solution for capturing real-world IBLs.

2402.05284 2026-05-05 cs.LG

Analyzing Adversarial Inputs in Deep Reinforcement Learning

Davide Corsi, Guy Amir, Guy Katz, Alessandro Farinelli

Comments Accepted to AISoLA 2025

Abstract

In recent years, Deep Reinforcement Learning (DRL) has become a popular paradigm in machine learning due to its successful applications to real-world and complex systems. However, even the state-of-the-art DRL models have been shown to suffer from reliability concerns -- for example, their susceptibility to adversarial inputs, i.e., small and abundant input perturbations that can fool the models into making unpredictable and potentially dangerous decisions. This drawback limits the deployment of DRL systems in safety-critical contexts, where even a small error cannot be tolerated. In this work, we present a comprehensive analysis of the characterization of adversarial inputs, through the lens of formal verification. Specifically, we present the Adversarial Rate, a metric adapted from the ProVe family, for the systematic evaluation of adversarial inputs in DRL, which partitions the input domain into subregions to enable both quantification and spatial visualization of adversarial inputs. The main contribution of this work is to provide a comprehensive evaluation framework for the effect of adversarial inputs on DRL policies. We present a set of tools and algorithms for its computation. Our analysis empirically demonstrates how adversarial inputs can affect the safety of a given DRL system with respect to such perturbations. Moreover, we analyze the behavior of these configurations to suggest several useful practices and guidelines to help mitigate the vulnerability of trained DRL networks.

2311.10320 2026-05-05 cs.CV eess.IV

Boosting Multimodal Remote Sensing Image Classification with Transformer-based Heterogeneously Salient Graph Representation

Jiaqi Yang, Bo Du, Rong Liu, Zhu Mao, Liangpei Zhang

Abstract

Data collected by different modalities can provide a wealth of complementary information: hyperspectral images (HSI) offer rich spectral-spatial properties, synthetic aperture radar (SAR) provides structural information about the Earth's surface, and light detection and ranging (LiDAR) captures altitude information about ground elevation. Therefore, a natural idea is to combine multimodal images for refined and accurate land-cover interpretation. Although many efforts have been made to achieve multi-source remote sensing image classification, three issues remain: 1) indiscriminate feature representation without sufficiently considering modal heterogeneity, 2) abundant features and complex computations associated with modeling long-range dependencies, and 3) overfitting caused by sparsely labeled samples. To overcome these barriers, a transformer-based heterogeneously salient graph representation (THSGR) approach is proposed in this paper. First, a multimodal heterogeneous graph encoder is presented to encode distinctively non-Euclidean structural features from heterogeneous data. Then, a self-attention-free multi-convolutional modulator is designed for effective and efficient long-term dependency modeling. Finally, a mean forward strategy is developed to avoid overfitting. Based on the above structures, the proposed model is able to break through modal gaps to obtain differentiated graph representation with competitive time cost, even for a small fraction of training samples. Experiments and analyses on three benchmark datasets against various state-of-the-art (SOTA) approaches demonstrate the performance of the proposed THSGR. The code will be available at https://github.com/jqyang22.

2306.04498 2026-05-05 cs.LG cs.CY cs.DC

Near-Optimal Privacy-Preserving Learning for Max-Min Fair Multi-Agent Bandits

Amir Leshem

Comments 17 pages, 3 figures

Details
English abstract

We study fair multi-agent multi-armed bandit learning under collision-only coordination. Agents cannot communicate explicitly during learning and observe only their own rewards and whether collisions occur when several agents access the same arm. The goal is to learn a max-min fair allocation while keeping each agent's reward samples and empirical reward estimates local. We propose a fully distributed algorithm for bounded rewards with unknown support, achieving regret $O\!\left(N^3 f(\log T)\log T\right)$, where $f$ is any nondecreasing diverging function satisfying $f(k-1)/f(k)\to 1$. The algorithm combines distributed agent ordering, cumulative round-robin exploration, endpoint-revalidated warm-started bisection, and a collision-based distributed auction for threshold-feasibility tests. Unlike leader-based optimal algorithms, no agent collects the reward observations, empirical estimates, or preferences of the others. Thus, the protocol preserves reward privacy in the operational sense of avoiding reward sharing, while coordinating only through collision outcomes. Compared with previous privacy-preserving algorithms for max--min fair bandits, which have exponential dependence on the number of agents, our method achieves polynomial $N^3$ dependence while retaining near-logarithmic dependence on $T$. The analysis uses concentration of cumulative empirical estimates and stability of endpoint-revalidated bisection. Simulations confirm the predicted scaling with horizon, number of agents, and max--min gap across representative numerical settings.
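To make the max-min fair objective concrete, the following is a minimal sketch that finds a max-min fair allocation by brute force when the mean rewards are already known. It is a centralized illustration of the target only, not the distributed, collision-based algorithm described in the abstract; the function name `max_min_allocation` and the example reward matrix are hypothetical.

```python
from itertools import permutations


def max_min_allocation(means):
    """Brute-force max-min fair allocation (illustrative, centralized).

    means[i][k] is the (assumed known) expected reward of agent i on arm k.
    Each agent gets a distinct arm; we maximize the smallest reward
    received by any agent. Returns (value, assignment), where
    assignment[i] is the arm given to agent i.
    """
    n_agents = len(means)
    n_arms = len(means[0])
    best_value, best_assignment = float("-inf"), None
    # Enumerate every way to give the agents distinct arms.
    for arms in permutations(range(n_arms), n_agents):
        value = min(means[i][arms[i]] for i in range(n_agents))
        if value > best_value:
            best_value, best_assignment = value, arms
    return best_value, best_assignment


# Hypothetical 3-agent, 3-arm instance.
example_means = [
    [0.9, 0.2, 0.5],
    [0.8, 0.3, 0.1],
    [0.4, 0.7, 0.6],
]
value, assignment = max_min_allocation(example_means)
```

The enumeration is factorial in the number of agents and requires all reward estimates in one place, which is exactly what the setting in the abstract rules out; it serves only to define the quantity being learned.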

2301.08719 2026-05-05 cs.AI physics.med-ph

The stochastic digital human is now enrolling for in silico imaging trials -- Methods and tools for generating digital cohorts

A Badano, M Lago, E Sizikova, JG Delfino, S Guan, MA Anastasio, B Sahiner

Details
Journal ref
Prog. Biomed. Eng. 5 042002 (2023)
English abstract

Randomized clinical trials, while often viewed as the highest evidentiary bar by which to judge the quality of a medical intervention, are far from perfect. In silico imaging trials are computational studies that seek to ascertain the performance of a medical device by collecting this information entirely via computer simulations. The benefits of in silico trials for evaluating new technology include significant resource and time savings, minimization of subject risk, the ability to study devices that are not achievable in the physical world, rapid and effective investigation of new technologies, and representation from all relevant subgroups. To conduct in silico trials, digital representations of humans are needed. We review the latest developments in methods and tools for obtaining digital humans for in silico imaging studies. First, we introduce terminology and a classification of digital human models. Second, we survey available methodologies for generating digital humans with healthy and diseased status and briefly examine the role of augmentation methods. Finally, we discuss the trade-offs of four approaches for sampling digital cohorts and the associated potential for study bias when selecting specific patient distributions.

2007.02392 2026-05-05 cs.LG cs.DS math.ST stat.CO stat.ML stat.TH

Efficient Parameter Estimation of Truncated Boolean Product Distributions

Dimitris Fotakis, Alkis Kalavasis, Christos Tzamos

Comments 33rd Conference on Learning Theory (COLT 2020)

Details
English abstract

We study the problem of estimating the parameters of a Boolean product distribution in $d$ dimensions, when the samples are truncated by a set $S \subset \{0, 1\}^d$ accessible through a membership oracle. This is the first time that the computational and statistical complexity of learning from truncated samples is considered in a discrete setting. We introduce a natural notion of fatness of the truncation set $S$, under which truncated samples reveal enough information about the true distribution. We show that if the truncation set is sufficiently fat, samples from the true distribution can be generated from truncated samples. A stunning consequence is that virtually any statistical task (e.g., learning in total variation distance, parameter estimation, uniformity or identity testing) that can be performed efficiently for Boolean product distributions, can also be performed from truncated samples, with a small increase in sample complexity. We generalize our approach to ranking distributions over $d$ alternatives, where we show how fatness implies efficient parameter estimation of Mallows models from truncated samples. Exploring the limits of learning discrete models from truncated samples, we identify three natural conditions that are necessary for efficient identifiability: (i) the truncation set $S$ should be rich enough; (ii) $S$ should be accessible through membership queries; and (iii) the truncation by $S$ should leave enough randomness in all directions. By carefully adapting the Stochastic Gradient Descent approach of (Daskalakis et al., FOCS 2018), we show that these conditions are also sufficient for efficient learning of truncated Boolean product distributions.

1910.09876 2026-05-05 cs.LG stat.ML

Neural Network Training with Approximate Logarithmic Computations

Arnab Sanyal, Peter A. Beerel, Keith M. Chugg

Details
English abstract

The high computational complexity associated with training deep neural networks limits online and real-time training on edge devices. This paper proposes an end-to-end training and inference scheme that eliminates multiplications through approximate operations in the log-domain, which has the potential to significantly reduce implementation complexity. We implement the entire training procedure in the log-domain, with fixed-point data representations. This training procedure is inspired by hardware-friendly approximations of log-domain addition which are based on look-up tables and bit-shifts. We show that our 16-bit log-based training can achieve classification accuracy within approximately 1% of the equivalent floating-point baselines for a number of commonly used datasets.
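The general idea behind log-domain arithmetic can be sketched as follows: a product becomes a sum of logarithms, and a sum is computed as the larger logarithm plus a correction term read from a look-up table. This is a simplified illustration under stated assumptions, not the authors' scheme: it uses floating-point logs rather than fixed-point, handles positive values only (no sign bit), and the table resolution `STEP` and cutoff `MAX_D` are arbitrary choices made here.

```python
import math


def to_log(x):
    """Encode a positive real number as its base-2 logarithm."""
    return math.log2(x)


def from_log(a):
    """Decode a base-2 logarithm back to a real number."""
    return 2.0 ** a


def log_mul(a, b):
    """Multiply two positive values in the log domain: just add the logs."""
    return a + b


# Look-up table for the correction term log2(1 + 2**-d), sampled on a
# uniform grid of differences d in [0, MAX_D]. Both constants are assumptions.
STEP = 1.0 / 16.0
MAX_D = 16.0
LUT = [math.log2(1.0 + 2.0 ** (-i * STEP)) for i in range(int(MAX_D / STEP) + 1)]


def log_add(a, b):
    """Approximate log2(2**a + 2**b) as max(a, b) + LUT[|a - b|]."""
    hi, lo = (a, b) if a >= b else (b, a)
    d = hi - lo
    if d >= MAX_D:
        # The smaller operand is negligible at this table resolution.
        return hi
    return hi + LUT[int(round(d / STEP))]


product = from_log(log_mul(to_log(3.0), to_log(5.0)))  # close to 15
total = from_log(log_add(to_log(3.0), to_log(5.0)))    # roughly 8, up to table error
```

The multiplication is exact up to floating-point rounding, while the addition carries a small error set by the table resolution. A finer `STEP` reduces that error at the cost of a larger table.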

2605.01656 2026-05-05 q-bio.NC cs.AI cs.LG

From Cortical Synchronous Rhythm to Brain Inspired Learning Mechanism: An Oscillatory Spiking Neural Network with Time-Delayed Coordination

Tingting Dan, Guorong Wu

Comments 19 pages, 6 figures

Details
English abstract

Human cognition emerges from coordinated spiking dynamics in distributed neural circuits, where information is encoded via both firing rates and precise spike timing determined by brain rhythms. Inspired by this notion, we propose a brain-inspired learning primitive in which cognition-level neural synchrony emerges through iterative bottom-up and top-down interactions between micro-scale dynamics of spiking neurons and a macro-scale mechanism of oscillatory synchronization. Specifically, we model each parcel (e.g., a cortical region or an image pixel) in the target system as a spiking neuron embedded in a predefined connectivity scaffold. Low-level information is encoded in a spatiotemporal domain, where neurons are selectively grouped and fire spontaneously over time through self-organized dynamics. In the bottom-up route, oscillatory synchronization is formed from past spiking activity accumulated over a finite memory window. Since brain dynamics operate in a regime of partial and transient synchronization rather than global phase locking, we model oscillatory coordination using a time-delayed synchronization formulation, which enables a top-down modulation of heterogeneous neural spiking for a large-scale distributed system. Together, we devise a spiking-by-synchronization neural network (S2-Net) that uses rhythmic timing as a control mechanism for efficient information processing. Promising results have been achieved across a broad range of tasks, including neural activity decoding, energy-efficient signal processing, temporal binding and semantic reasoning.

2605.01655 2026-05-05 math.CA cs.LG

Exact Loop Controllers for ReLU Realization of Homogeneous Curve Refinements

Boldsaikhan Bolorkhuu, Tsogtgerel Gantumur

Comments 39 pages, 6 figures

Details
English abstract

We study homogeneous refinement operators \((V\gamma)(t)=\sum_{j\in\mathbb Z}A_j\gamma(Mt-j)\), acting on compactly supported continuous piecewise linear curves \(\gamma:\mathbb R\to\mathbb R^p\), where \(M\ge2\) and only finitely many matrices \(A_j\in\mathbb R^{p\times p}\) are nonzero. We prove that the iterates \(V^n\gamma\) admit exact ReLU realizations of fixed width and depth \(O(n)\). The main new ingredient is an exact loop controller for the residual dynamics. Instead of propagating scalar residual surrogates, the construction transports the residual orbit by a forward-exact state on a polygonal loop. Scalar factors and digit selectors are then recovered from this loop state by complementary CPwL readouts. The loop seam is not removed, but its remaining ambiguity is confined to the final readout/selector stage, where it is harmless because the scalar atom is supported away from the seam. This gives a homogeneous \(M\)-ary vector-valued extension of the scalar binary refinable-function construction with a more geometric controller architecture. We also record crude exponential bounds on the network weights and biases. Affine forcing terms are handled by expanding affine iterates into finite sums of homogeneous iterates, giving exact fixed-width realizations with depth \(O(n^2)\), and anchored open curves reduce to compactly supported defects with affine anchor mismatch. We also describe homogeneous polygonal generators, including dragon-type examples and a self-intersecting Hilbert-type prototype in arbitrary dimension. The extended version includes stage-dependent forcing, finite-state stacking reductions, and further geometric constructions such as Koch-, Gosper-, Morton-, and connector-based Hilbert-type variants.
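As a small illustration of the refinement operator itself, the sketch below applies the formula for \(V\) in the scalar case \(p=1\) with \(M=2\), using the standard hat function as the input curve. The hat function is written as a combination of three ReLUs and is a fixed point of the operator for the mask \(A_{-1}=1/2\), \(A_0=1\), \(A_1=1/2\). This only evaluates \(V\) pointwise; it does not reproduce the paper's fixed-width network or loop controller, and the names `refine` and `HAT_MASK` are hypothetical.

```python
def relu(x):
    """Rectified linear unit."""
    return max(0.0, x)


def hat(t):
    """Hat function supported on [-1, 1], written with three ReLUs."""
    return relu(t + 1.0) - 2.0 * relu(t) + relu(t - 1.0)


def refine(gamma, coeffs, M=2):
    """Return the function V(gamma), with (V gamma)(t) = sum_j A_j * gamma(M*t - j).

    `coeffs` maps each integer j to a scalar coefficient A_j (the p = 1 case);
    only finitely many coefficients are given, matching the finite-mask assumption.
    """
    def refined(t):
        return sum(a * gamma(M * t - j) for j, a in coeffs.items())
    return refined


# Mask for which the hat function satisfies the two-scale relation
# hat(t) = 0.5*hat(2t + 1) + hat(2t) + 0.5*hat(2t - 1).
HAT_MASK = {-1: 0.5, 0: 1.0, 1: 0.5}

v_hat = refine(hat, HAT_MASK)       # one application of V
v2_hat = refine(v_hat, HAT_MASK)    # second iterate, V^2
```

Because the hat function reproduces itself under this mask, `v_hat` and `v2_hat` agree with `hat` at every point, which makes the example easy to check by hand. Nested closures like this grow in evaluation cost with each iterate, which is the kind of blow-up a fixed-width realization is meant to avoid.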

2605.01628 2026-05-05 stat.ML cs.LG math.ST stat.TH

Self-Normalized Martingales and Uniform Regret Bounds for Linear Regression

Fan Chen, Jian Qian, Alexander Rakhlin, Nikita Zhivotovskiy

Details
English abstract

Self-normalized martingale inequalities lie at the heart of confidence ellipsoids for online least squares and, more broadly, many bandit and reinforcement-learning results. Yet existing vector and scalar results typically rely on bounded covariates and an explicit regularization matrix, producing bounds that are \emph{not scale-invariant}: although the self-normalized quantity is scale-invariant by definition, its standard upper bounds are not. We characterize when scale-invariant upper bounds on self-normalized martingales are possible. Without further assumptions, we prove that nontrivial scale-invariant bounds exist only in dimension $d=1$; moreover, in $d=1$ we obtain $O(\log T)$ scale-invariant self-normalized bounds without any assumptions on the covariates. In contrast, for $d>1$ we show that no nontrivial scale-invariant bound can hold in full generality. We then connect this dichotomy to \emph{doubly-uniform} regret in online linear regression (i.e., regret bounds that are simultaneously independent of the covariate scale and the comparator norm) and use it to resolve the open question of Gaillard, Gerchinovitz, Huard, and Stoltz, \emph{``Uniform regret bounds over $\mathbb{R}^d$ for the sequential linear regression problem with the square loss''} (ALT 2019): in $d=1$ we give an explicit algorithm with $O(\log T)$ doubly-uniform regret, whereas for $d>1$ sublinear doubly-uniform regret is impossible. Finally, under a natural \emph{smoothness} condition (bounded Radon--Nikodym derivatives of the conditional covariate laws with respect to a fixed base measure), we recover sublinear regret for $d>1$ without bounded covariates and derive a self-normalized concentration inequality free of the usual regularization penalties, yielding arguably a first natural scale-invariant bound for adaptive, non-i.i.d. vector martingales.

2605.01611 2026-05-05 cs.CY cs.AI cs.LG

The Case for ESM3 as a General-Purpose AI Model with Systemic Risk Under the EU AI Act

Taro Qureshi, Jacob Griffith, Koen Holtman, Marcel Mir Teijeiro, Ze Shen Chin, Rokas Gipiškis

Comments 8 pages, 1 figure, Technical AI Safety Conference

Details
English abstract

Due to ambiguity in the wording of the EU AI Act, we examine to what extent frontier biological foundation models such as ESM3 are subject to obligations for general-purpose AI models with systemic risk under the Act. In this paper, we map ESM3 to the biorisk chain, and conclude that it would be desirable if the providers of ESM3 and similar biological models were subject to these obligations, which would require them to assess and mitigate dual-use risks from their models. We then compare the attributes of ESM3 to the classification criteria in the AI Act and the supporting material. We conclude that at this time, ESM3 does not appear to be meaningfully regulated by the Act. We then propose remedies to correct the situation.

2605.01610 2026-05-05 cs.HC cs.AI

Less Interaction But More Explanation: A Communication Perspective on Agentic AI Interfaces

Eunchae Jang, S. Shyam Sundar

Details
Journal ref
Proceedings of the CHI 2026 Workshop on Human-Centered Explainable AI (HCXAI), Barcelona, Spain, 2026
English abstract

AI systems have long been expected to interact with users, answering questions, generating content, and continuing (social) conversations. Agentic AI, however, breaks from this expectation, as its primary objective is workflow execution on behalf of the users. If a system becomes more agentic, do users need less interaction with the system? Our answer is: less routine back-and-forth, but more communication for oversight and explanation, as agentic AI proactively acts, not just responds. Grounded in a communication perspective, we discuss how users perceive the communicative roles of AI systems (whether as the source of actions or merely a channel), and how this can shape trust. Because agentic AI can play multiple communicative roles, it can complicate this source perception and introduce potential risks. To address this, we propose three types of explanations that agentic AI needs to incorporate (action-process, uncertainty, and coordination), and suggest that customization affordances that allow users to decide when and which explanations they see may be key to preserving human agency as AI autonomy increases.