The forbidden subword method
I haven’t felt like writing anything hard or long, but I have felt like writing, so here is something easy that I have had cause to think about over the past couple days. This is a post about a basic tool in the enumeration toolkit, which I have used countless times since I learned of it, but strangely I have never heard anyone else talk about it. There are less general techniques that are wellknown, and there more general techniques that are significantly more complicated to use, but this Goldilocks zone seems woefully underdisturbed.
I learned this method in my intro to combinatorics course, taught by Chris Godsil in 2014. I don’t think the rest of the class liked him all that much, because the class was early in the morning and he, in the parlance of the times, goes hard in the paint. But I ended up taking four whole courses with him, so suffice it to say he left a good impression on me. I am now feeling the compulsion to wax nostalgic for those days, so let me move on quickly.
A language is simply a subset of all the words—finite sequences—made from some alphabet having letters. Suppose that is finite, and you have a finite set of words . Then the basic form of the forbidden subword method gives you a generating function
for the language consisting of all words that do not contain anything in as a subword.
Before I begin, I should hope I do not have to extol the virtues of generating functions to you, dear reader. They are a fundamental piece of enumerative technology, and they are exceedingly versatile. I will give a sample of the applications for which you can use this method, but it is by no means exhaustive, and is truncated for concision. If you want something to read, I perennially recommend Wilf’s book generatingfunctionology.
In case you need a refresher to how languages work, let me prime you with a couple of quick examples.
 The empty word is denoted , and is a perfectly valid word. It has length zero, so the generating function of the singleton language is the constant .
 is a set of letters, which are words of length 1, so its generating function is the monomial .

The set of all words has all possible words of length , so its generating function is the geometric series
 The generating function of the disjoint union of two languages is the sum of their generating functions.

If you have two languages and , their concatenation is the language
The generating function of is the product of the generating functions of and .^{1}
To start, let’s walk through the method’s internal logic in a simple case, with two letters and one word: and . The trick is to define an auxiliary language , consisting of all words having precisely one occurrence of , and that occurrence being right at the end. and are related to each other in some simple ways, and these will enable us to find a system of equations among them, which can be transformed into a system of equations in their generating functions.
The first observation comes from considering the language product . Every such concatenation is a nonempty word, and either it contains no occurrences of , or it contains precisely one, right at the end. Accounting for the fact that the empty word does indeed belong to , we find the following equation of languages:
The value of this observation is that it translates directly into an equation of generating functions. Letting be the generating function for and for , we see that
If we can find a second, independent equation of languages, we will have a system in which we can solve for .
And for this, we make a second observation: if you simply concatenate the entirety of to , that definitely has occurrences of the forbidden word, you just need to track what they look like. It is possible that you have a word in ending in , such that the first completes an occurrence of the forbidden word, and then there is a hanging on afterwards. But if not, then you simply get a word in .
Putting this all together, we find
which overall gives us the following system of generating functions:
Then you solve this system for just like you would in grade school: substitute into the top equation and rearrange to find
Now you can do anything you can normally do with generating functions, such as extract coefficients. Here’s an example of a spicier thing you can do: evaluating at gives the expected number of coin flips until you observe the sequence . Feel free to verify that the answer in this case is 10.
We are ready to examine the general case, where and are arbitrary finite sets.
As before, we define to have no subwords from . Now, there is one auxiliary language for each , having no occurrence of any forbidden subwords except for a single occurrence of at the end of the word. Without loss of generality, we may assume no word in is a proper subword of another word in : if some is properly contained in another , then is empty because any word ending in will contain an occurrence of also.
Then we construct equations. The first is, as before, , which translates to
Then for each , we obtain an equation whose LHS is . On the RHS, for each triple of words such that , , and , there will be an term.
One way to manage this complexity is to package the ’s into sets of “quotients”
so that
This translates into the equation
where is the generating function of , which will always be a polynomial of degree strictly less than where each coefficient is 0 or 1.
Now you simply solve this as a system of equations, and you’re done.
And now, here’s a handful of quick applications and extensions of this method.

What’s the probability of seeing at least heads in a row if you flip a coin times?
Take and . Once you have that , you can scale these numbers down to probabilities, and negate them by computing . Then you simply extract the th coefficient.

How many sequences of coinflips having heads and tails avoid the word ?
Take and , but this time use multiple variables in your generating functions: for and for . Now your system of equations looks like:

My friends and I are playing a game where each of us picks a sequence of letters in , and then we sample it uniformly until one of us sees our word… What are my chances of winning?
Take to be the set of all your friends’ words. The chances of winning is , unless is a superword of some other word, in which case it will either sometimes tie with its suffixes or always lose. You can verify that always.

How do I count lattice paths in the plane that can go in any direction but do not backtrack?
Take and . You can speed up your working by observing that and , by a simple bijection argument.

Can be infinite?
Technically, yes. There are only finitely many words of any length so the RHS of each equation will always converge^{2} and only finitely many equations will affect the prefixes of the generating functions and so can be infinite. However, this will become extremely difficult to solve by hand, unless is nice in some analyzable way. If you can put this to use I’d love to see it.

Can be infinite?
It can, but it has to at least be locally finite, meaning that only finitely many letters have any particular weight (cf. example 2). Equivalently, it needs to have a welldefined generating function. That said, interpreting what it is you’ve just counted becomes a little more nuanced in this case, so I would not take this as an easy generalization.

I want the answer to be Fibonacci!
Uh, okay, I guess. Take and to find that .
Finally, I will extemporate a little on the other techniques that I mentioned earlier, and how this method sits among them.
The first that comes to mind is that all these languages are regular languages, which means in principle you can write out their regular expressions and then multiply together the appropriate generating functions without solving a system of equations. True as that is, it is rare that the regular expression appropriate for the problem is simpler to find than writing out and solving this system of equations. Regular expressions are atrocious to work with in all but the nicest circumstances.
The other technique that comes to mind is the more powerful “state machine” method where you write down the adjacency matrix of a digraph of states and transitions, and then compute
where is the sum over all permissible source states and the target states.
This can certainly emulate and even trounce the forbidden subword method in some special cases, but for the generic problem that the forbidden subword method solves easily, this technique is bloated and difficult. See, if is the length of the longest word in then naively you need states, and reducing the number of states usually requires increasing the complexity of the matrix you’ve just described, which makes the calculations not all that much easier.
It’s all about picking the right tool for the job, in the end. And the forbidden subword method is the right tool for a surprising number of jobs. And it’s super easy to remember how it goes, too: I internalized it once in 2014 and I haven’t forgotten it since.

Technically this only works if each word in can be written uniquely as the concatenation of a word in with a word in . Otherwise, the product of generating functions will enumerate as a multiset, i.e. with multiplicities considered. In the language products considered in this post, this will never be an issue, but technically you’ve gotta be careful. ↩

To be clear, this is convergence in the adic topology aka as a formal power series. I know people usually talk about formal power series as not worrying about converging, but technically it’s a form of converging. This is not a generating functions post, so please do not make me get out the generating functions pedantry, I just wanna do a cute little bit of enumeration. ↩
Comments (0 public)
Submit a comment