I haven’t felt like writing anything hard or long, but I have felt like writing, so here is something easy that I have had cause to think about over the past couple days. This is a post about a basic tool in the enumeration toolkit, which I have used countless times since I learned of it, but strangely I have never heard anyone else talk about it. There are less general techniques that are well-known, and there are more general techniques that are significantly more complicated to use, but this Goldilocks zone seems woefully underdisturbed.

I learned this method in my intro to combinatorics course, taught by Chris Godsil in 2014. I don’t think the rest of the class liked him all that much, because the class was early in the morning and he, in the parlance of the times, goes hard in the paint. But I ended up taking four whole courses with him, so suffice it to say he left a good impression on me. I am now feeling the compulsion to wax nostalgic for those days, so let me move on quickly.

A language is simply a subset of all the words—finite sequences—made from some alphabet $\mathcal{A}$ having $k$ letters. Suppose that $k$ is finite, and you have a finite set of words $F$. Then the basic form of the forbidden subword method gives you a generating function
$$L(x) = \sum_{n \ge 0} \ell_n\, x^n$$
for the language $L$ consisting of all words that do not contain anything in $F$ as a subword, where $\ell_n$ is the number of such words of length $n$.

Before I begin, I should hope I do not have to extol the virtues of generating functions to you, dear reader. They are a fundamental piece of enumerative technology, and they are exceedingly versatile. I will give a sample of the applications for which you can use this method, but it is by no means exhaustive, and is truncated for concision. If you want something to read, I perennially recommend Wilf’s book generatingfunctionology.

In case you need a refresher on how languages work, let me prime you with a couple of quick examples.

  • The empty word is denoted $\epsilon$, and is a perfectly valid word. It has length zero, so the generating function of the singleton language $\{\epsilon\}$ is the constant $1$.
  • The alphabet $\mathcal{A}$ is a set of $k$ letters, which are words of length 1, so its generating function is the monomial $kx$.
  • The set of all words $\mathcal{A}^*$ has $k^n$ possible words of length $n$, so its generating function is the geometric series
    $$\frac{1}{1 - kx} = \sum_{n \ge 0} k^n x^n.$$
  • The generating function of the disjoint union of two languages is the sum of their generating functions.
  • If you have two languages $L_1$ and $L_2$, their concatenation is the language
    $$L_1 L_2 = \{\, uv : u \in L_1,\ v \in L_2 \,\}.$$
    The generating function of $L_1 L_2$ is the product of the generating functions of $L_1$ and $L_2$.¹
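To see how these rules snap together, note that every word is either empty or a single letter followed by another word; as an equation of languages, $\mathcal{A}^* = \{\epsilon\} \cup \mathcal{A}\cdot\mathcal{A}^*$. Taking generating functions of both sides gives $G(x) = 1 + kx\,G(x)$ for the generating function $G(x)$ of $\mathcal{A}^*$, and solving recovers the geometric series above. The forbidden subword method is this same trick, performed on slightly fancier equations of languages.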


To start, let’s walk through the method’s internal logic in a simple case, with two letters and one forbidden word: $\mathcal{A} = \{H, T\}$ and $F = \{HTH\}$. The trick is to define an auxiliary language $M$, consisting of all words having precisely one occurrence of $HTH$, and that occurrence being right at the end. $L$ and $M$ are related to each other in some simple ways, and these will enable us to find a system of equations among them, which can be transformed into a system of equations in their generating functions.

The first observation comes from considering the language product $L \cdot \mathcal{A}$. Every such concatenation is a nonempty word, and either it contains no occurrences of $HTH$, or it contains precisely one, right at the end. Accounting for the fact that the empty word does indeed belong to $L$, we find the following equation of languages:
$$\{\epsilon\} \cup L\mathcal{A} = L \cup M.$$
The value of this observation is that it translates directly into an equation of generating functions. Letting $L(x)$ be the generating function for $L$ and $M(x)$ for $M$, we see that
$$1 + 2x\,L(x) = L(x) + M(x).$$
If we can find a second, independent equation of languages, we will have a system in which we can solve for $L(x)$.

And for this, we make a second observation: if you simply concatenate the entirety of $HTH$ to $L$, every resulting word definitely has occurrences of the forbidden word; you just need to track what they look like. It is possible that you have a word in $L$ ending in $HT$, such that the first $H$ of the appended $HTH$ completes an occurrence of the forbidden word, and then there is a $TH$ hanging on afterwards. But if not, then you simply get a word in $M$.

Putting this all together, we find
$$L \cdot \{HTH\} = M \cup M \cdot \{TH\},$$
which overall gives us the following system of generating functions:
$$\begin{aligned} 1 + 2x\,L(x) &= L(x) + M(x), \\ x^3\,L(x) &= M(x) + x^2\,M(x). \end{aligned}$$
Then you solve this system for $L(x)$ just like you would in grade school: substitute $M(x) = \frac{x^3 L(x)}{1 + x^2}$ into the top equation and rearrange to find
$$L(x) = \frac{1 + x^2}{1 - 2x + x^2 - x^3}.$$
Now you can do anything you can normally do with generating functions, such as extract coefficients. Here’s an example of a spicier thing you can do: evaluating $L(x)$ at $x = \tfrac{1}{2}$ gives the expected number of coin flips until you observe the sequence $HTH$. Feel free to verify that the answer in this case is 10.
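If you would rather have a machine do the grade-school algebra, here is a minimal sympy sketch of the computation above (the variable names are mine). The waiting-time check works because $\ell_n/2^n$ is the probability that $n$ fair flips contain no $HTH$, and summing these survival probabilities over all $n$ gives the expected waiting time.

```python
from sympy import symbols, Eq, solve, series, simplify, Rational

x = symbols('x')
L, M = symbols('L M')  # stand-ins for the generating functions L(x), M(x)

# The two language equations for A = {H, T} with forbidden word HTH:
#   {eps} + L.A  = L + M       --->  1 + 2x L = L + M
#   L.{HTH}      = M + M.{TH}  --->  x^3 L    = M + x^2 M
sol = solve([Eq(1 + 2*x*L, L + M), Eq(x**3 * L, M + x**2 * M)], [L, M])

Lx = simplify(sol[L])
print(Lx)                          # equals (1 + x^2)/(1 - 2x + x^2 - x^3)
print(Lx.subs(x, Rational(1, 2)))  # 10, the expected number of flips
print(series(Lx, x, 0, 6))         # counts of HTH-avoiding words: 1, 2, 4, 7, 12, 21
```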


We are ready to examine the general case, where $\mathcal{A}$ and $F$ are arbitrary finite sets.

As before, we define $L$ to be the language with no subwords from $F$. Now, there is one auxiliary language $M_f$ for each $f \in F$, having no occurrence of any forbidden subword except for a single occurrence of $f$ at the end of the word. Without loss of generality, we may assume no word in $F$ is a proper subword of another word in $F$: if some $f$ is properly contained in another $g$, then $M_g$ is empty, because any word ending in $g$ will contain an occurrence of $f$ also.

Then we construct our equations. The first is, as before, $\{\epsilon\} \cup L\mathcal{A} = L \cup \bigcup_{f \in F} M_f$, which translates to
$$1 + kx\,L(x) = L(x) + \sum_{f \in F} M_f(x).$$
Then for each $f \in F$, we obtain an equation whose LHS is $L \cdot \{f\}$. On the RHS, for each triple of words $(g, u, v)$ such that $g \in F$, $f = uv$, and $u$ is a nonempty suffix of $g$, there will be an $M_g \cdot \{v\}$ term.

One way to manage this complexity is to package the $v$’s into sets of “quotients”
$$C_{g,f} = \{\, v : f = uv \text{ for some nonempty suffix } u \text{ of } g \,\},$$
so that
$$L \cdot \{f\} = \bigcup_{g \in F} M_g \cdot C_{g,f}.$$
This translates into the equation
$$x^{|f|}\, L(x) = \sum_{g \in F} C_{g,f}(x)\, M_g(x),$$
where $C_{g,f}(x)$ is the generating function of $C_{g,f}$, which will always be a polynomial of degree strictly less than $|f|$ where each coefficient is 0 or 1.
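For example, with $\mathcal{A} = \{H, T\}$ and $F = \{HTH\}$ as before, the nonempty prefixes of $HTH$ that are also suffixes of $HTH$ are $H$ and $HTH$ itself, leaving the quotients $TH$ and $\epsilon$. So $C_{HTH,HTH}(x) = x^2 + 1$, and the lone equation $x^3 L(x) = (1 + x^2)\,M(x)$ is exactly the second equation we found earlier.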

Now you simply solve this as a linear system of $|F| + 1$ equations in the unknowns $L(x)$ and the $M_f(x)$’s, and you’re done.
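If you want the whole recipe in one place, here is a small sympy sketch (the function name and symbols are my own) that builds the quotient polynomials and solves the linear system for any finite $F$ in which no word is a proper subword of another.

```python
from sympy import symbols, Eq, solve, simplify

def avoiding_gf(k, forbidden):
    """Generating function L(x) of words over a k-letter alphabet that avoid
    every word in `forbidden` (assumes no word is a proper subword of another)."""
    x = symbols('x')
    F = list(forbidden)
    L = symbols('L')
    M = {f: symbols(f'M_{f}') for f in F}

    # Quotient polynomial C_{g,f}(x): one term x^{|v|} for every split f = u v
    # in which u is a nonempty prefix of f that is also a suffix of g.
    def C(g, f):
        return sum(x**(len(f) - i) for i in range(1, len(f) + 1)
                   if g.endswith(f[:i]))

    # The first equation, then one equation per forbidden word.
    eqs = [Eq(1 + k*x*L, L + sum(M[f] for f in F))]
    for f in F:
        eqs.append(Eq(x**len(f) * L, sum(C(g, f) * M[g] for g in F)))

    sol = solve(eqs, [L] + list(M.values()))
    return simplify(sol[L])

print(avoiding_gf(2, ['HTH']))       # the HTH answer from before
print(avoiding_gf(2, ['HH', 'TT']))  # (1 + x)/(1 - x): only alternating words survive
```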


And now, here’s a handful of quick applications and extensions of this method.

  1. What’s the probability of seeing at least $r$ heads in a row if you flip a coin $n$ times?

    Take $\mathcal{A} = \{H, T\}$ and $F = \{H^r\}$, the word of $r$ consecutive heads. Once you have that $L(x) = \frac{1 - x^r}{1 - 2x + x^{r+1}}$, you can scale these numbers down to probabilities, and negate them by computing $\frac{1}{1 - x} - L\!\left(\frac{x}{2}\right)$. Then you simply extract the $n$-th coefficient (there is a worked check in code just after this list).

  2. How many sequences of coin flips having $a$ heads and $b$ tails avoid the word $HTH$?

    Take $\mathcal{A} = \{H, T\}$ and $F = \{HTH\}$ again, but this time use multiple variables in your generating functions: $h$ for $H$ and $t$ for $T$. Now your system of equations looks like:
    $$\begin{aligned} 1 + (h + t)\,L &= L + M, \\ h^2 t\, L &= M + ht\,M. \end{aligned}$$
  3. My friends and I are playing a game where each of us picks a sequence of letters in $\mathcal{A}$, and then we sample letters uniformly at random until one of us sees their word… What are my chances of winning?

    Take $F$ to be the set of everyone’s words, yours included. The chances of winning are $M_w\!\left(\tfrac{1}{|\mathcal{A}|}\right)$, where $w$ is your word, unless $w$ is a superword of some other word, in which case it will either sometimes tie with its suffixes or always lose. You can verify that $\sum_{f \in F} M_f\!\left(\tfrac{1}{|\mathcal{A}|}\right) = 1$ always.

  4. How do I count lattice paths in the plane that can go in any direction but do not backtrack?

    Take $\mathcal{A} = \{N, E, S, W\}$ and $F = \{NS, SN, EW, WE\}$. You can speed up your working by observing that $M_{NS} = M_{SN}$ and $M_{EW} = M_{WE}$ (in fact all four are equal), by a simple bijection argument.

  5. Can $F$ be infinite?

    Technically, yes. There are only finitely many words of any given length, so the RHS of each equation will always converge² and only finitely many equations will affect any given prefix of the generating functions, so $F$ can be infinite. However, this will become extremely difficult to solve by hand, unless $F$ is nice in some analyzable way. If you can put this to use I’d love to see it.

  6. Can $\mathcal{A}$ be infinite?

    It can, but it has to at least be locally finite, meaning that only finitely many letters have any particular weight (cf. example 2). Equivalently, it needs to have a well-defined generating function. That said, interpreting what it is you’ve just counted becomes a little more nuanced in this case, so I would not take this as an easy generalization.

  7. I want the answer to be Fibonacci!

    Uh, okay, I guess. Take $\mathcal{A} = \{H, T\}$ and $F = \{HH\}$ to find that $L(x) = \frac{1 + x}{1 - x - x^2}$.

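To close the loop on application 1, here is a quick sketch with toy parameters of my own choosing ($r = 3$ heads in a row, $n = 10$ flips) that reads the probability off the generating function and checks it against brute force.

```python
from itertools import product
from fractions import Fraction
from sympy import symbols, series

x = symbols('x')
r, n = 3, 10  # at least r heads in a row, somewhere among n flips

# L(x) for words over {H, T} avoiding r consecutive heads, from the method.
Lx = (1 - x**r) / (1 - 2*x + x**(r + 1))
# Scale to probabilities (x -> x/2), negate against 1/(1 - x), read off coefficient n.
gf = 1/(1 - x) - Lx.subs(x, x/2)
prob = series(gf, x, 0, n + 1).removeO().coeff(x, n)

# Brute force over all 2^n flip sequences, for comparison.
brute = Fraction(sum('H' * r in ''.join(w) for w in product('HT', repeat=n)), 2**n)
print(prob, brute)  # both equal 65/128
```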

Finally, I will extemporize a little on the other techniques that I mentioned earlier, and how this method sits among them.

The first that comes to mind is that all these languages are regular languages, which means in principle you can write out unambiguous regular expressions for them and translate those directly into generating functions (sums for unions, products for concatenations, geometric series for stars), without solving a system of equations. True as that is, it is rare that the regular expression appropriate for the problem is simpler to find than writing out and solving this system of equations. Regular expressions are atrocious to work with in all but the nicest circumstances.

The other technique that comes to mind is the more powerful “state machine” method, where you write down the adjacency matrix $A$ of a digraph of states and transitions, and then compute
$$\mathbf{u}^{\top} (I - xA)^{-1}\, \mathbf{v},$$
where $\mathbf{u}$ is the sum (of standard basis vectors) over all permissible source states and $\mathbf{v}$ over the target states.

This can certainly emulate and even trounce the forbidden subword method in some special cases, but for the generic problem that the forbidden subword method solves easily, this technique is bloated and difficult. See, if $m$ is the length of the longest word in $F$ then naively you need about $k^{m-1}$ states, one for each possible window of the last $m - 1$ letters, and reducing the number of states usually requires increasing the complexity of the matrix you’ve just described, which makes the calculations not all that much easier.
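To make the comparison concrete, here is what the state-machine calculation looks like for the single word $HTH$, in a sketch with my own choice of states (each state records the longest suffix read so far that is a prefix of $HTH$). Even in this tiny case you can feel the bookkeeping creeping in.

```python
from sympy import symbols, eye, Matrix, simplify

x = symbols('x')

# States record the longest suffix of the word read so far that is a prefix
# of HTH; any transition that would complete HTH is simply dropped.
states = ['', 'H', 'HT']

def step(state, letter):
    s = state + letter
    if s.endswith('HTH'):
        return None  # the forbidden word appeared; no outgoing edge
    for i in range(min(len(s), 2), -1, -1):
        if s.endswith('HTH'[:i]):
            return 'HTH'[:i]

A = Matrix(3, 3, lambda i, j: sum(1 for c in 'HT' if step(states[i], c) == states[j]))
u = Matrix([[1, 0, 0]])  # source: the empty-suffix start state
v = Matrix([1, 1, 1])    # targets: every surviving state is permissible
print(simplify((u * (eye(3) - x * A).inv() * v)[0]))
# equals (1 + x^2)/(1 - 2x + x^2 - x^3), matching the earlier answer
```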

It’s all about picking the right tool for the job, in the end. And the forbidden subword method is the right tool for a surprising number of jobs. And it’s super easy to remember how it goes, too: I internalized it once in 2014 and I haven’t forgotten it since.

  1. Technically this only works if each word in $L_1 L_2$ can be written uniquely as the concatenation of a word in $L_1$ with a word in $L_2$. Otherwise, the product of generating functions will enumerate $L_1 L_2$ as a multiset, i.e. with multiplicities considered. In the language products considered in this post, this will never be an issue, but technically you’ve gotta be careful.

  2. To be clear, this is convergence in the $x$-adic topology, aka as a formal power series. I know people usually talk about formal power series as not worrying about converging, but technically it’s a form of converging. This is not a generating functions post, so please do not make me get out the generating functions pedantry, I just wanna do a cute little bit of enumeration.