arXivDaily: arXiv Daily Academic Digest (updated Monday to Friday)

1. Natural Language and Multimodal Intelligence (1 paper)

2606.13247 2026-06-13 cs.AI New submission

EPIG: Emotion-Based Prompting for Personalised Image Generation

Emna Othmen, Mohamed Yassine Landolsi, Lotfi Ben Romdhane

Affiliation: MARS Research Lab LR17ES05, ISITCom, University of Sousse

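AI summary: Proposes EPIG, a training-free method that uses the psychological valence-arousal model to strengthen emotional expression at the prompt level and control the arousal of generated images. On a benchmark of 10 diverse prompts, it reduces mean arousal error by 14% and 12% relative to naive insertion and LLM-based prompt expansion, respectively, and by 17% on prompts with explicit subjects.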

Comments Submitted to arXiv. 20 pages, 4 figures. Work on emotion-based prompt engineering for text-to-image diffusion models with applications in personalized image generation

Abstract

Text-to-image diffusion models have achieved impressive results in synthesizing high-quality images from natural language prompts. However, commonly used prompting strategies remain relatively generic, limiting the model's ability to accurately express emotional intent and nuanced affective attributes. This work proposes EPIG, a method that enhances emotional expressiveness at the prompt level prior to image generation. Grounded in psychologically informed emotion representations (valence-arousal) and leveraging structured, role-aware prompt enrichment, EPIG enriches emotion-related components of prompts without modifying or retraining the image generation backbone. The resulting emotion-aware prompts guide the generative process toward more emotionally coherent visual outputs, with particular effectiveness in controlling arousal. EPIG is lightweight, training-free, and well suited for resource-constrained and personalized image generation scenarios. Experimental results on a benchmark of 10 diverse prompts show that EPIG reduces mean arousal error compared to strong baselines, including naive insertion and LLM-based prompt expansion, with reductions of 14% and 12%, respectively. These improvements are statistically significant. EPIG also preserves valence alignment and semantic consistency, as measured by CLIPScore and supported by ablation studies. The effect is more pronounced on prompts containing explicit subjects such as humans, children, or animals, where the reduction reaches 17%, highlighting the subject-sensitive behavior of the proposed method.
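
To make a figure such as "reduces mean arousal error by 14%" concrete, the sketch below shows one plausible way such a number could be computed. It assumes the error is the mean absolute difference between a target arousal value and the arousal estimated from each generated image, and that the reported percentage is a relative reduction against a baseline. The abstract does not state the exact definition, so the function names, the [0, 1] arousal scale, and the toy values are all assumptions made for illustration, not the authors' implementation or data.

```python
from statistics import mean


def mean_arousal_error(targets, predictions):
    """Mean absolute difference between target and predicted arousal.

    Assumed definition: the abstract does not specify the metric.
    """
    if len(targets) != len(predictions):
        raise ValueError("targets and predictions must have the same length")
    return mean(abs(t - p) for t, p in zip(targets, predictions))


def relative_reduction(baseline_error, method_error):
    """Fraction by which a method lowers the error relative to a baseline."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - method_error) / baseline_error


# Toy, made-up arousal values on an assumed [0, 1] scale.
# These are NOT results from the paper.
targets = [0.8, 0.2, 0.5, 0.9]
baseline_preds = [0.6, 0.4, 0.5, 0.5]  # hypothetical baseline prompting
method_preds = [0.7, 0.3, 0.5, 0.7]    # hypothetical enriched prompting

baseline_err = mean_arousal_error(targets, baseline_preds)
method_err = mean_arousal_error(targets, method_preds)
reduction = relative_reduction(baseline_err, method_err)

print(f"baseline error: {baseline_err:.3f}")
print(f"method error:   {method_err:.3f}")
print(f"relative reduction: {reduction:.1%}")
```

Under this reading, a 14% reduction would mean the method's mean arousal error is 0.86 times the baseline's.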