Abstract
For strings $u, D \in \Sigma^*$, a subsequence embedding of $u$ in $D$ is a function $e \colon \{1, 2, \ldots, |u|\} \to \{1, 2, \ldots, |D|\}$ such that $e(i) < e(i+1)$ for every $i \in \{1, 2, \ldots, |u|-1\}$ and the $i$-th symbol of $u$ equals the $e(i)$-th symbol of $D$ for every $i \in \{1, 2, \ldots, |u|\}$. A gap-constraint for $u$ is a triple $(i, j, L)$, where $1 \leq i < j \leq |u|$ and $L$ is a regular language over $\Sigma$. An embedding $e$ satisfies a gap-constraint $(i, j, L)$ if the factor of $D$ strictly between positions $e(i)$ and $e(j)$ is a word from $L$. We investigate the subsequence matching problem with gap-constraints, which is relevant in the context of complex event recognition (CER): given $u, D \in \Sigma^*$ and a set $C$ of gap-constraints, find an embedding of $u$ in $D$ that satisfies all gap-constraints from $C$.
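To make the definitions concrete, the following is a minimal Python sketch of the problem statement, not of the algorithm developed in this work. It assumes that each gap-constraint language $L$ is supplied as a membership predicate and that an embedding is a sequence of 1-based positions; the names `is_embedding`, `satisfies`, and `find_embedding_bruteforce` are hypothetical. The search at the end simply tries all increasing position tuples and therefore takes exponential time.

```python
from itertools import combinations


def is_embedding(u, D, e):
    """Check that e (1-based positions in D, one per symbol of u) embeds u in D."""
    if len(e) != len(u):
        return False
    if any(not 1 <= p <= len(D) for p in e):
        return False
    increasing = all(e[k] < e[k + 1] for k in range(len(e) - 1))
    matches = all(u[k] == D[e[k] - 1] for k in range(len(u)))
    return increasing and matches


def satisfies(D, e, constraints):
    """Check every gap-constraint (i, j, in_L); in_L is a membership predicate for L."""
    for i, j, in_L in constraints:
        # Factor of D strictly between the 1-based positions e(i) and e(j).
        gap = D[e[i - 1]:e[j - 1] - 1]
        if not in_L(gap):
            return False
    return True


def find_embedding_bruteforce(u, D, constraints):
    """Exhaustive search over increasing position tuples (illustration only)."""
    for e in combinations(range(1, len(D) + 1), len(u)):
        if is_embedding(u, D, e) and satisfies(D, e, constraints):
            return e
    return None


if __name__ == "__main__":
    # Require at least two symbols of D between the images of u[1] and u[2].
    at_least_two = lambda w: len(w) >= 2
    print(find_embedding_bruteforce("ab", "abxab", [(1, 2, at_least_two)]))
```

With $u = ab$, $D = abxab$, and the single constraint that the gap between the two symbols has length at least 2, the search skips the embedding $(1, 2)$, whose gap is empty, and returns $(1, 5)$, whose gap is $bxa$.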
In general, subsequence matching with gap-constraints is NP-complete, and the only known tractable variants restrict the interval structure of the gap-constraints. In this work, we show that subsequence matching with gap-constraints of arbitrary interval structure can be solved rather efficiently (in fact, optimally under the Strong Exponential Time Hypothesis, SETH) in time $O(|D| (|u| + |C|))$ if the gap-constraint languages satisfy a property that we dub left-convexity: whenever $u v w \in L$ and $v \in L$, then also $uv \in L$. Left-convex languages are sufficiently expressive to model interesting real-world scenarios considered in CER, e.g., length constraints $L = \{w \mid a \leq |w| \leq b\}$ for $a, b \in \mathbb{N}$. We also show how our algorithm can be used to efficiently enumerate all satisfying embeddings, which is particularly relevant for possible applications in CER. Finally, we show how non-left-convex languages can lead to intractability: if, in addition to length constraints, we allow $\{aa, \varepsilon\}$ as the only non-left-convex constraint language, then the problem becomes NP-complete again.
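The left-convexity condition can be explored on small examples with a bounded brute-force check. The sketch below is an illustration under stated assumptions, not part of the paper's method: a language is again given as a membership predicate, only words up to a chosen length are examined, and the helper names `words` and `left_convexity_counterexample` are hypothetical. Finding no counterexample up to the bound does not prove left-convexity in general; for length constraints it follows directly, since $|uvw| \leq b$ and $|v| \geq a$ give $a \leq |uv| \leq b$.

```python
from itertools import product


def words(alphabet, max_len):
    """Yield all words over alphabet of length 0..max_len, shortest first."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def left_convexity_counterexample(in_L, alphabet, max_len):
    """Search for (u, v, w) with uvw in L, v in L, but uv not in L.

    Only words uvw of length at most max_len are examined, so a result of
    None means "no violation found up to this bound".
    """
    for x in words(alphabet, max_len):
        if not in_L(x):
            continue
        n = len(x)
        for i in range(n + 1):
            for j in range(i, n + 1):
                u, v, w = x[:i], x[i:j], x[j:]
                if in_L(v) and not in_L(u + v):
                    return (u, v, w)
    return None


if __name__ == "__main__":
    length_1_to_3 = lambda word: 1 <= len(word) <= 3
    aa_or_empty = lambda word: word in ("aa", "")
    print(left_convexity_counterexample(length_1_to_3, "ab", 4))
    print(left_convexity_counterexample(aa_or_empty, "ab", 4))
```

For the length constraint with $a = 1$ and $b = 3$, no violation is found. For $\{aa, \varepsilon\}$, the check reports $u = a$, $v = \varepsilon$, $w = a$: both $uvw = aa$ and $v = \varepsilon$ are in the language, but $uv = a$ is not.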