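arXivDaily: a daily academic digest synchronized with the full arXiv feed, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and related fields.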

2604.19851 2026-04-23 cs.GT cs.AI

Is Four Enough? Automated Reasoning Approaches and Dual Bounds for Condorcet Dimensions of Elections

Itai Zilberstein, Ratip Emin Berker, George Li, Ruben Martins

Comments Appears at the 8th Games, Agents, and Incentives Workshop (GAIW-26). Held as part of the Workshops at the 25th International Conference on Autonomous Agents and Multiagent Systems

Abstract

In an election where $n$ voters rank $m$ candidates, a Condorcet winning set is a committee of $k$ candidates such that for any outside candidate, a majority of voters prefer some committee member. Condorcet's paradox shows that some elections admit no Condorcet winning sets with a single candidate (i.e., $k=1$), and the same can be shown for $k=2$. On the other hand, recent work proves that a set of size $k=5$ exists for every election. This leaves an important theoretical gap between the best known lower bound $(k\geq 3)$ and upper bound $(k \leq 5)$ for the number of candidates needed to guarantee existence. We aim to close the gap between the existence guarantees and impossibility results for Condorcet winning sets. We explore an automated reasoning approach to tighten these bounds. We design a mixed-integer linear program (MILP) to search for elections that would serve as counter-examples to conjectured bounds. We employ a number of optimizations, such as symmetry breaking, subsampling, and constraint generation, to enhance the search and model effectively infinite electorates. Furthermore, we analyze the dual of the linear programming relaxation as a path towards obtaining a new upper bound. Despite extensive search on moderate-sized elections, we fail to find any election requiring a committee larger than size 3. Motivated by our experimental results in this direction, we simplify the dual linear program and formulate a conjecture which, if true, implies that a winning set of size 4 always exists. Our automated reasoning results provide strong empirical evidence that the Condorcet dimension of any election may be smaller than currently known upper bounds, at least for small instances. We offer a general-purpose framework for searching elections in ranked voting and a new, concrete analytical path via duality toward proving that smaller committees suffice.
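
The definition of a Condorcet winning set in the abstract above can be checked directly on small elections. The following is a minimal brute-force sketch, not the MILP search the authors describe; the function names are hypothetical, "majority" is assumed to mean strictly more than half of the voters, and every voter is assumed to rank all candidates.

```python
from itertools import combinations


def is_condorcet_winning_set(rankings, committee):
    """Return True if, for every candidate outside `committee`, a strict
    majority of voters rank some committee member above that candidate.

    `rankings` is a list of complete rankings (most preferred first).
    """
    committee = set(committee)
    candidates = set(rankings[0])
    n = len(rankings)
    for outsider in candidates - committee:
        supporters = 0
        for ranking in rankings:
            # Walk down the ballot: the voter supports the committee if a
            # committee member appears before the outsider.
            for c in ranking:
                if c == outsider:
                    break
                if c in committee:
                    supporters += 1
                    break
        if 2 * supporters <= n:  # assumption: strict majority required
            return False
    return True


def condorcet_dimension(rankings):
    """Smallest k admitting a Condorcet winning set, with one witness."""
    candidates = sorted(rankings[0])
    for k in range(1, len(candidates) + 1):
        for committee in combinations(candidates, k):
            if is_condorcet_winning_set(rankings, committee):
                return k, committee


# Condorcet's paradox: three voters with cyclic preferences.
paradox = [["a", "b", "c"], ["b", "c", "a"], ["c", "a", "b"]]
print(condorcet_dimension(paradox))  # (2, ('a', 'b'))
```

Exhaustive enumeration like this only scales to very small elections, which is presumably why the abstract turns to MILP search with symmetry breaking and constraint generation.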

2604.19850 2026-04-23 cs.ET cs.LG cs.NE q-bio.MN q-bio.QM

What Makes a Bacterial Model a Good Reservoir Computer? Predicting Performance from Separability and Similarity

Laura Alonso Bartolomé, Jean-Loup Faulon, Xavier Hinaut

Abstract

Biological systems are promising substrates for computation because they naturally process environmental information through complex internal dynamics. In this study, we investigate whether bacterial metabolic models can act as physical reservoirs and whether their computational performance can be predicted from dynamical properties linked to separability and similarity. We simulated the growth dynamics of five bacterial species, one yeast species, and 29 Escherichia coli single-gene deletion mutants using dynamic flux balance analysis (dFBA), with glucose and xylose concentrations as inputs and growth curves as reservoir states. Computational performance was assessed on random nonlinear classification tasks using a linear readout, while reservoir properties linked to separability and similarity were characterised through kernel and generalisation ranks computed from growth-curve state matrices. Several microbial models achieved high classification accuracy, showing that bacterial metabolic dynamics can support nonlinear computation. Clear differences were observed between species, with some models converging more rapidly and others reaching higher maximum accuracy, revealing a trade-off between convergence speed and peak performance. In contrast, all E. coli mutants were dominated by the wild-type model, suggesting that gene deletions reduce the dynamical richness required for efficient computation. The difference between kernel and generalisation ranks was generally associated with improved accuracy, but deviations across models and sensitivity at low rank values limited its predictive power in practice. Overall, these results show that bacterial metabolic models constitute promising substrates for reservoir computing and provide a first step towards identifying microbial strains with favourable computational properties for future experimental implementations.

2604.19846 2026-04-23 hep-ex astro-ph.HE astro-ph.IM cs.AI cs.LG

Neural posterior estimation of the neutrino direction in IceCube using transformer-encoded normalizing flows on the sphere

R. Abbasi, M. Ackermann, J. Adams, J. A. Aguilar, M. Ahlers, J. M. Alameddine, S. Ali, N. M. Amin, K. Andeen, C. Argüelles, Y. Ashida, S. Athanasiadou, S. N. Axani, R. Babu, X. Bai, A. Balagopal V., S. W. Barwick, V. Basu, R. Bay, J. J. Beatty, J. Becker Tjus, P. Behrens, J. Beise, C. Bellenghi, S. Benkel, S. BenZvi, D. Berley, E. Bernardini, D. Z. Besson, E. Blaufuss, L. Bloom, S. Blot, F. Bontempo, J. Y. Book Motzkin, C. Boscolo Meneguolo, S. Böser, O. Botner, J. Böttcher, J. Braun, B. Brinson, Z. Brisson-Tsavoussis, R. T. Burley, D. Butterfield, K. Carloni, J. Carpio, N. Chau, Z. Chen, D. Chirkin, S. Choi, A. Chubarov, B. A. Clark, G. H. Collin, D. A. Coloma Borja, A. Connolly, J. M. Conrad, D. F. Cowen, C. De Clercq, J. J. DeLaunay, D. Delgado, T. Delmeulle, S. Deng, P. Desiati, K. D. de Vries, G. de Wasseige, T. DeYoung, J. C. Díaz-Vélez, S. DiKerby, T. Ding, M. Dittmer, A. Domi, L. Draper, L. Dueser, D. Durnford, K. Dutta, M. A. DuVernois, T. Ehrhardt, L. Eidenschink, A. Eimer, C. Eldridge, P. Eller, E. Ellinger, D. Elsässer, R. Engel, H. Erpenbeck, W. Esmail, S. Eulig, J. Evans, P. A. Evenson, K. L. Fan, K. Fang, K. Farrag, A. R. Fazely, A. Fedynitch, N. Feigl, C. Finley, D. Fox, A. Franckowiak, S. Fukami, P. Fürst, J. Gallagher, E. Ganster, A. Garcia, M. Garcia, E. Genton, L. Gerhardt, A. Ghadimi, C. Glaser, T. Glüsenkamp, J. G. Gonzalez, S. Goswami, A. Granados, D. Grant, S. J. Gray, S. Griffin, K. M. Groth, D. Guevel, C. Günther, P. Gutjahr, C. Ha, A. Hallgren, L. Halve, F. Halzen, L. Hamacher, M. Handt, K. Hanson, J. Hardin, A. A. Harnisch, P. Hatch, A. Haungs, J. Häußler, K. Helbing, J. Hellrung, B. Henke, L. Hennig, F. Henningsen, L. Heuermann, R. Hewett, N. Heyer, S. Hickford, A. Hidvegi, C. Hill, G. C. Hill, R. Hmaid, K. D. Hoffman, A. Hollnagel, D. Hooper, S. Hori, K. Hoshina, M. Hostert, W. Hou, M. Hrywniak, T. Huber, K. Hultqvist, K. Hymon, A. Ishihara, W. Iwakiri, M. Jacquart, S. Jain, O. Janik, M. Jansson, M. Jin, N. Kamp, D. Kang, W. Kang, A. 
Kappes, L. Kardum, T. Karg, A. Karle, A. Katil, M. Kauer, J. L. Kelley, M. Khanal, A. Khatee Zathul, A. Kheirandish, T. Kim, H. Kimku, F. Kirchner, J. Kiryluk, C. Klein, S. R. Klein, Y. Kobayashi, S. Koch, A. Kochocki, R. Koirala, H. Kolanoski, T. Kontrimas, L. Köpke, C. Kopper, D. J. Koskinen, P. Koundal, M. Kowalski, T. Kozynets, A. Kravka, N. Krieger, T. Krishnan, K. Kruiswijk, E. Krupczak, A. Kumar, E. Kun, N. Kurahashi, C. Lagunas Gualda, L. Lallement Arnaud, M. J. Larson, F. Lauber, J. P. Lazar, K. Leonard DeHolton, A. Leszczyńska, C. Li, J. Liao, C. Lin, Q. R. Liu, Y. T. Liu, M. Liubarska, C. Love, L. Lu, F. Lucarelli, W. Luszczak, Y. Lyu, M. Macdonald, E. Magnus, Y. Makino, E. Manao, S. Mancina, A. Mand, I. C. Mariş, S. Marka, Z. Marka, L. Marten, I. Martinez-Soler, R. Maruyama, J. Mauro, F. Mayhew, F. McNally, K. Meagher, A. Medina, M. Meier, Y. Merckx, L. Merten, J. Mitchell, L. Molchany, S. Mondal, T. Montaruli, R. W. Moore, Y. Morii, A. Mosbrugger, D. Mousadi, E. Moyaux, T. Mukherjee, M. Nakos, U. Naumann, J. Necker, L. Neste, M. Neumann, H. Niederhausen, M. U. Nisa, K. Noda, A. Noell, A. Novikov, A. Obertacke, V. O'Dell, A. Olivas, R. Orsoe, J. Osborn, E. O'Sullivan, B. Owens, V. Palusova, H. Pandya, A. Parenti, N. Park, V. Parrish, E. N. Paudel, L. Paul, C. Pérez de los Heros, T. Pernice, T. C. Petersen, J. Peterson, S. Pick, M. Plum, A. Pontén, V. Poojyam, B. Pries, R. Procter-Murphy, G. T. Przybylski, L. Pyras, C. Raab, J. Rack-Helleis, N. Rad, M. Ravn, K. Rawlins, Z. Rechav, A. Rehman, I. Reistroffer, E. Resconi, S. Reusch, C. D. Rho, W. Rhode, L. Ricca, B. Riedel, A. Rifaie, E. J. Roberts, S. Rodan, M. Rongen, A. Rosted, C. Rott, T. Ruhe, L. Ruohan, D. Ryckbosch, J. Saffer, D. Salazar-Gallegos, P. Sampathkumar, A. Sandrock, G. Sanger-Johnson, M. Santander, S. Sarkar, M. Scarnera, M. Schaufel, H. Schieler, S. Schindler, L. Schlickmann, B. Schlüter, F. Schlüter, N. Schmeisser, T. Schmidt, A. Scholz, F. G. Schröder, S. Schwirn, S. Sclafani, D. 
Seckel, L. Seen, M. Seikh, S. Seunarine, P. A. Sevle Myhr, R. Shah, S. Shah, S. Shefali, N. Shimizu, B. Skrzypek, R. Snihur, J. Soedingrekso, D. Soldin, P. Soldin, G. Sommani, C. Spannfellner, G. M. Spiczak, C. Spiering, J. Stachurska, M. Stamatikos, T. Stanev, T. Stezelberger, T. Stürwald, T. Stuttard, G. W. Sullivan, I. Taboada, S. Ter-Antonyan, A. Terliuk, A. Thakuri, M. Thiesmeyer, W. G. Thompson, J. Thwaites, S. Tilav, K. Tollefson, J. A. Torres, S. Toscano, D. Tosi, K. Upshaw, A. Vaidyanathan, N. Valtonen-Mattila, J. Valverde, J. Vandenbroucke, T. Van Eeden, N. van Eijndhoven, L. Van Rootselaar, J. van Santen, J. Vara, F. Varsi, M. Venugopal, M. Vereecken, S. Vergara Carrasco, S. Verpoest, D. Veske, A. Vijai, J. Villarreal, C. Walck, A. Wang, E. H. S. Warrick, C. Weaver, P. Weigel, A. Weindl, J. Weldert, A. Y. Wen, C. Wendt, J. Werthebach, M. Weyrauch, N. Whitehorn, C. H. Wiebusch, D. R. Williams, L. Witthaus, G. Wrede, X. W. Xu, J. P. Yanez, Y. Yao, E. Yildizci, S. Yoshida, R. Young, F. Yu, S. Yu, T. Yuan, S. Yun-Cárcamo, A. Zander Jurowitzki, A. Zegarelli, S. Zhang, Z. Zhang, P. Zhelnin, P. Zilberman

Abstract

IceCube is a cubic-kilometer-scale neutrino detector located at the geographic South Pole. A precise directional reconstruction of IceCube neutrinos is vital for associations with astronomical objects. In this context, we discuss neural posterior estimation of the neutrino direction via a transformer encoder that maps to a normalizing flow on the 2-sphere. It achieves a new state-of-the-art angular resolution for the two main event morphologies in IceCube, tracks and showers, while being significantly faster than traditional B-spline-based likelihood reconstructions. All-sky scans can be performed within seconds rather than hours and take constant computation time, regardless of whether the posterior extent is arc-minutes or spans the whole sky. We utilize a combination of $C^2$-smooth rational-quadratic splines, scale transformations and rotations to define a novel spherical normalizing-flow distribution whose parameters are predicted as a whole as the output of the transformer encoder. We test several structural choices diverging from the vanilla transformer architecture. In particular, we find that dual residual streams, nonlinear QKV projection and a separate class token with its own cross-attention processing boost test-time performance. The angular resolution for both showers and tracks improves substantially over the whole trained energy range from 100 GeV to 100 PeV. At 100 TeV deposited energy, for example, the median angular resolution improves by a factor of $1.3$ for throughgoing tracks, by a factor of $1.7$ for showers and by a factor of $2.5$ for starting tracks compared to state-of-the-art likelihood reconstructions based on B-splines. While previous machine-learning (ML) efforts have managed to obtain competitive shower resolutions, this is the first time an ML-based method outperforms likelihood-based muon reconstructions above 100 GeV.

2604.19841 2026-04-23 stat.AP cs.LG

Spatio-temporal modelling of electric vehicle charging demand

Kaoutar Bouaachra, Yvenn Amara-Ouali, Yannig Goude, Raphaël Lachieze-Rey

Comments 18 pages, 19 figures

2604.19832 2026-04-23 quant-ph cs.LG

Option Pricing on Noisy Intermediate-Scale Quantum Computers: A Quantum Neural Network Approach

Sebastian Zając, Rafał Pracht

2604.19827 2026-04-23 cs.SE cs.AI

More Is Different: Toward a Theory of Emergence in AI-Native Software Ecosystems

Daniel Russo

2604.19826 2026-04-23 cs.SE cs.AI cs.LG

Co-Located Tests, Better AI Code: How Test Syntax Structure Affects Foundation Model Code Generation

Éric Jacopin

Comments 20 pages. Preprint; arXiv long version of a paper accepted at AIware 2026. Adds Appendices A (cross-language) and B (Python isolation) not present in the ACM camera-ready

2604.19825 2026-04-23 cs.SE cs.AI

SolidCoder: Bridging the Mental-Reality Gap in LLM Code Generation through Concrete Execution

Woojin Lee, Jin-Xia Huang

Comments 23 pages, 2 figures, Accepted at Findings of ACL 2026

2604.19806 2026-04-23 physics.chem-ph cs.AI cs.LG

Improving Molecular Force Fields with Minimal Temporal Information

Ali Mollahosseini, Mohammed Haroon Dupty, Wee Sun Lee

2604.19801 2026-04-23 eess.AS cs.AI cs.CL

Utterance-Level Methods for Identifying Reliable ASR-Output for Child Speech

Gus Lathouwers, Lingyun Gao, Catia Cucchiarini, Helmer Strik

Comments Submitted for Interspeech 2026, currently under review

2604.19799 2026-04-23 cs.HC cs.AI cs.CY q-bio.NC

Measuring Creativity in the Age of Generative AI: Distinguishing Human and AI-Generated Creative Performance in Hiring and Talent Systems

Yigal Rosen, Ilia Rushkin

Comments Research Paper Presented at the BIG.AI@MIT Conference, April 2, 2026

2604.19798 2026-04-23 cs.CY cs.CV econ.EM

Diagnosing Urban Street Vitality via a Visual-Semantic and Spatiotemporal Framework for Street-Level Economics

Xinxin Zhuo, Mengyuan Niu, Ruizhe Wang, Junyan Yang, Qiao Wang

Comments Submitted to ACM Transactions on Spatial Computing. This paper is currently under review

2604.19797 2026-04-23 eess.AS cs.AI cs.CL

Enhancing ASR Performance in the Medical Domain for Dravidian Languages

Sri Charan Devarakonda, Ravi Sastry Kolluru, Manjula Sri Rayudu, Rashmi Kapoor, Madhu G, Anil Kumar Vuppala

2604.19781 2026-04-23 cs.CY cs.AI cs.CL

Do Small Language Models Know When They're Wrong? Confidence-Based Cascade Scoring for Educational Assessment

Tyler Burleigh

Comments 12 pages, 7 figures. Accepted at NCME 2026

2604.19763 2026-04-23 eess.AS cs.AI cs.CL

Explainable Speech Emotion Recognition: Weighted Attribute Fairness to Model Demographic Contributions to Social Bias

Tomisin Ogunnubi, Yupei Li, Björn Schuller

Comments 5 pages, 4 figures

2604.19752 2026-04-23 cs.MA cs.AI cs.CY

Soft-Label Governance for Distributional Safety in Multi-Agent Systems

Aizierjiang Aiersilan, Raeli Savitt

Abstract

Multi-agent AI systems exhibit emergent risks that no single agent produces in isolation. Existing safety frameworks rely on binary classifications of agent behavior, discarding the uncertainty inherent in proxy-based evaluation. We introduce SWARM (System-Wide Assessment of Risk in Multi-agent systems), a simulation framework that replaces binary good/bad labels with soft probabilistic labels $p = P(v = +1) \in [0,1]$, enabling continuous-valued payoff computation, toxicity measurement, and governance intervention. SWARM implements a modular governance engine with configurable levers (transaction taxes, circuit breakers, reputation decay, and random audits) and quantifies their effects through probabilistic metrics including expected toxicity $\mathbb{E}[1 - p \mid \text{accepted}]$ and quality gap $\mathbb{E}[p \mid \text{accepted}] - \mathbb{E}[p \mid \text{rejected}]$. Across seven scenarios with five-seed replication, strict governance reduces welfare by over 40% without improving safety. In parallel, aggressively internalizing system externalities collapses total welfare from a baseline of $+262$ down to $-67$, while toxicity remains invariant. Circuit breakers require careful calibration; overly restrictive thresholds severely diminish system value, whereas an optimal threshold balances moderate welfare with minimized toxicity. Companion experiments show soft metrics detect proxy gaming by self-optimizing agents passing conventional binary evaluations. This basic governance layer applies to live LLM-backed agents (Concordia entities, Claude, GPT-4o Mini) without modification. Results show distributional safety requires continuous risk metrics and governance lever calibration involves quantifiable safety-welfare tradeoffs. Source code and project resources are publicly available at https://www.swarm-ai.org/.
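
The two probabilistic metrics named in the abstract, expected toxicity and quality gap, are conditional means over accepted and rejected interactions. The sketch below computes them from a list of soft labels and accept/reject decisions; the function names and the sample data are hypothetical and are not taken from the SWARM code base.

```python
def _mean(values):
    values = list(values)
    if not values:
        raise ValueError("conditional mean is undefined for an empty group")
    return sum(values) / len(values)


def expected_toxicity(p, accepted):
    """E[1 - p | accepted]: average probability that an accepted
    interaction is harmful, where p[i] = P(v = +1) for interaction i."""
    return _mean(1.0 - pi for pi, a in zip(p, accepted) if a)


def quality_gap(p, accepted):
    """E[p | accepted] - E[p | rejected]: how much better the accepted
    interactions are, on average, than the rejected ones."""
    acc = _mean(pi for pi, a in zip(p, accepted) if a)
    rej = _mean(pi for pi, a in zip(p, accepted) if not a)
    return acc - rej


# Hypothetical soft labels and governance decisions for four interactions.
p = [0.9, 0.8, 0.3, 0.1]
accepted = [True, True, False, False]

print(expected_toxicity(p, accepted))  # approximately 0.15
print(quality_gap(p, accepted))        # approximately 0.65
```

A binary evaluation that thresholds each label at 0.5 would report zero toxicity among the accepted interactions here, whereas the soft metric retains the residual 10 to 20 percent risk carried by each accepted label.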

2604.19750 2026-04-23 cs.SE cs.AI cs.HC

Coding with Eyes: Visual Feedback Unlocks Reliable GUI Code Generating and Debugging

Zhilin Liu, Ye Huang, Ting Xie, Ruizhi Zhang, Wen Li, Lixin Duan

2604.18951 2026-04-23 cs.MA cs.CL

Superficial Success vs. Internal Breakdown: An Empirical Study of Generalization in Adaptive Multi-Agent Systems

Namyoung So, Seokgyu Jang, Taeuk Kim

Comments 27 pages, 4 figures. Equal contribution for the first two authors

2604.17511 2026-04-23 cs.LO cs.AI cs.CR

Atomic Decision Boundaries: A Structural Requirement for Guaranteeing Execution-Time Admissibility in Autonomous Systems

Marcelo Fernandez

Comments 21 pages. 1st paper (Paper 0) in the 6-paper Agent Governance Series (Papers 0-5). Zenodo: https://doi.org/10.5281/zenodo.19670649. Companion: P1/ACP (arXiv:2603.18829), P2/IML (arXiv:2604.17517), P3 (zenodo.19672597), P4 (zenodo.19672608), P5/RAM (zenodo.19669430)

DOI: 10.5281/zenodo.19670649

Abstract

Autonomous systems increasingly execute actions that directly modify shared state, creating an urgent need for precise control over which transitions are permitted to occur. Existing governance mechanisms evaluate policies prior to execution or reconstruct behavior post hoc, but do not enforce admissibility at the exact moment a state transition is committed. We introduce the atomic decision boundary, a structural property of admission control systems in which the decision and the resulting state transition are jointly determined as a single indivisible step in the labeled transition system (LTS) model of execution. We distinguish two classes: atomic systems, where evaluation and transition are coupled within a single LTS step, and split evaluation systems, where they are separate transitions interleaved by environmental actions. The separation introduces an architectural gap -- the decision is evaluated in one system state; the transition fires in a potentially different one -- that no policy, regardless of sophistication, can close from within a split architecture. Under realistic concurrent environments, we prove via a constructive counterexample trace that no construction can make a split system equivalent to an atomic system with respect to admissibility. Three corollaries follow: impossibility of execution-time guarantees in split systems, insufficiency of external state enrichment, and admissibility as an execution-time rather than evaluation-time property. We further formalize the Escalate outcome -- absent from classical TOCTOU analyses -- proving that it transfers rather than eliminates the atomicity requirement: resolution is safe if and only if it is itself atomic. We classify RBAC, ABAC, OPA, Cedar, and AWS IAM as split systems and ACP as atomic, providing a structural taxonomy of existing governance mechanisms. Admissibility is a property of execution, not evaluation.
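
The gap between split evaluation and atomic admission described in the abstract is the familiar check-then-act (TOCTOU) pattern. The sketch below replays one deterministic interleaving against a toy account whose admissibility condition is assumed to be a non-negative balance. The `Account` class and both trace functions are hypothetical illustrations, not the paper's formal LTS construction.

```python
import threading


class Account:
    """Toy shared state; admissibility is assumed to mean balance >= 0."""

    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def evaluate(self, amount):
        # Split architecture, step 1: the decision is computed on its own.
        return self.balance >= amount

    def commit(self, amount):
        # Split architecture, step 2: the transition fires unconditionally.
        self.balance -= amount

    def atomic_withdraw(self, amount):
        # Atomic architecture: decision and transition form one
        # indivisible step, so no other action can interleave.
        with self._lock:
            if self.balance >= amount:
                self.balance -= amount
                return True
            return False


def split_trace():
    acct = Account(100)
    decision = acct.evaluate(80)  # evaluated while balance == 100
    acct.commit(50)               # environmental action interleaves
    if decision:
        acct.commit(80)           # fires while balance == 50
    return acct.balance


def atomic_trace():
    acct = Account(100)
    acct.atomic_withdraw(50)             # environmental action
    admitted = acct.atomic_withdraw(80)  # re-evaluated at commit time
    return acct.balance, admitted


print(split_trace())   # -30
print(atomic_trace())  # (50, False)
```

In the split trace the policy itself is correct, yet the committed state is inadmissible because the decision was taken in a different state from the one in which the transition fired.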

2604.17172 2026-04-23 cs.DC cs.AI

UCCL-Zip: Lossless Compression Supercharged GPU Communication

Shuang Ma, Chon Lam Lao, Zhiying Xu, Zhuang Wang, Ziming Mao, Delong Meng, Jia Zhen, Jun Wu, Ion Stoica, Yida Wang, Yang Zhou

2604.16779 2026-04-23 quant-ph cs.LG

Q-SINDy: Quantum-Kernel Sparse Identification of Nonlinear Dynamics with Provable Coefficient Debiasing

Samrendra Roy, Syed Bahauddin Alam

2604.16756 2026-04-23 cs.SE cs.AI

Mitigating Prompt-Induced Cognitive Biases in General-Purpose AI for Software Engineering

Francesco Sovrano, Gabriele Dominici, Alberto Bacchelli

Comments Accepted for publication in the proceedings of FSE'2026

DOI: 10.1145/3808115

Abstract

Prompt-induced cognitive biases are changes in a general-purpose AI (GPAI) system's decisions caused solely by biased wording in the input (e.g., framing, anchors), not task logic. In software engineering (SE) decision support, where problem statements and requirements are written in natural language, small phrasing shifts (e.g., popularity hints or outcome reveals) can push GPAI models toward suboptimal decisions. We study this with PROBE-SWE, a dynamic benchmark for SE that pairs biased and unbiased versions of the same SE dilemmas, controls for logic and difficulty, and targets eight SE-relevant biases (anchoring, availability, bandwagon, confirmation, framing, hindsight, hyperbolic discounting, overconfidence). We ask whether prompt engineering mitigates bias sensitivity in practice, focusing on actionable techniques that practitioners can apply off-the-shelf in real environments. Testing common strategies (e.g., chain-of-thought, self-debiasing) on cost-effective GPAI systems, we find no statistically significant reductions in bias sensitivity on a per-bias basis. We then adopt a Prolog-style view of the reasoning process: solving SE dilemmas requires making explicit any background axioms and inference assumptions (i.e., SE best practices) that are usually implicit in the prompt. We therefore hypothesize that bias-inducing features short-circuit assumption elicitation, pushing GPAI models toward biased shortcuts. Building on this, we introduce an end-to-end method that elicits best practices and injects axiomatic reasoning cues into the prompt before answering, reducing overall bias sensitivity by 51% on average (p < .001). Finally, we report a thematic analysis that surfaces linguistic patterns associated with heightened bias sensitivity, clarifying when GPAI use is less advisable for SE decision support and where to focus future countermeasures.

2604.15560 2026-04-23 astro-ph.EP astro-ph.IM cs.LG

ExoNet: Calibrated Multimodal Deep Learning for TESS Exoplanet Candidate Vetting using Phase-Folded Light Curves, Stellar Parameters, and Multi-Head Attention

Md. Rashadul Islam

Comments v2: Complete revision. Corrected systematic TOI/TIC cross-identification errors present in v1. Rebuilt inference pipeline using verified NASA Exoplanet Archive catalog (4,720 PC-disposition candidates, up from 200). Updated all results, figures, and performance metrics. 8 pages, 4 figures, 6 tables

2604.12456 2026-04-23 eess.AS cs.AI

X-VC: Zero-shot Streaming Voice Conversion in Codec Space

Qixi Zheng, Yuxiang Zhao, Tianrui Wang, Wenxi Chen, Kele Xu, Yikang Li, Qinyuan Chen, Xipeng Qiu, Kai Yu, Xie Chen

2604.01965 2026-04-23 cs.IR cs.AI cs.CL cs.DL

Do We Need Bigger Models for Science? Task-Aware Retrieval with Small Language Models

Florian Kelber, Matthias Jobst, Yuni Susanti, Michael Färber

Comments Accepted at NSLP@LREC 2026

2603.14222 2026-04-23 cs.CR cs.AI

Membership Inference for Contrastive Pre-training Models with Text-only PII Queries

Ruoxi Cheng, Yizhong Ding, Jian Zhao, Hongyi Zhang, Haoxuan Ma, Tianle Zhang, Yiyan Huang, Xuelong Li

2603.09046 2026-04-23 cs.CR cs.LG cs.OS

FlexServe: A Fast and Secure LLM Serving System for Mobile Devices with Flexible Resource Isolation

Yinpeng Wu, Yitong Chen, Lixiang Wang, Jinyu Gu, Zhichao Hua, Yubin Xia

Comments 13 pages, 11 figures

2602.22437 2026-04-23 cs.DC cs.AI cs.LG

veScale-FSDP: Flexible and High-Performance FSDP at Scale

Zezhou Wang, Youjie Li, Zhiqi Lin, Jiacheng Yang, Cong Xie, Guanyu Feng, Zheng Zhong, Ziyue Huang, Hongyu Zhu, Zhi Zhang, Yanghua Peng, Xin Liu

2602.20181 2026-04-23 cs.CY cs.AI

Catalyzing Informed Residential Energy Retrofit Decisions via Domain-Specific LLM

Lei Shu, Dong Zhao, Jianli Chen, Armin Yeganeh, Sinem Mollaoglu, Jiayu Zhou

2602.15037 2026-04-23 cs.SE cs.AI

CircuChain: Disentangling Competence and Compliance in LLM Circuit Analysis

Mayank Ravishankara

DOI: 10.1109/SoutheastCon63549.2026.11476399

Abstract

As large language models (LLMs) advance toward expert-level performance in engineering domains, reliable reasoning under user-specified constraints becomes critical. In circuit analysis, for example, a numerically correct solution is insufficient if it violates established methodological conventions such as mesh directionality or polarity assignments, errors that can propagate in safety-critical systems. Yet it remains unclear whether frontier models truly apply first-principles reasoning or rely on entrenched training priors that conflict with explicit instructions. We introduce CircuChain, a diagnostic benchmark designed to disentangle instruction compliance from physical reasoning competence in electrical circuit analysis. CircuChain consists of counterbalanced Control/Trap problem pairs across five canonical circuit topologies, augmented with systematic variations in sign conventions, current orientations, and polarity definitions. A multi-stage verification pipeline, combining symbolic solvers, SPICE simulation, and an LLM-based error taxonomy, enables fine-grained attribution of failures to convention errors, physics errors, arithmetic mistakes, or hallucinations. Across 100 tasks per model, we observe a consistent Compliance-Competence Divergence. The strongest model evaluated exhibits near-perfect physical reasoning but a high rate of convention violations when Trap conditions deliberately invert natural sign patterns. Conversely, weaker models display lower physical fidelity yet superior adherence to explicit instructions. These results suggest that increased model capability does not guarantee improved constraint alignment and highlight the need for new evaluation frameworks that stress instruction-following under mathematically rigid domains. CircuChain provides one such framework and offers actionable insights for both engineering education and AI alignment research.
