The alternative, more poetic title that I was tempted to give this post was A Most Esoteric Tragedy. I relented for three reasons. Morally: I detest clickbait. Practically: it would unhelpfully obscure the topic on the rare occasion that someone might actually find this ‘blog helpful. (Ha!) Most of all, emotionally: it would be comically hammy for me to entertain the idea that the contrivance contained in this analysis could be called a tragedy. But hopefully, the fact that I spent the entire flavour text being snobby about this alternative title sufficiently conveys the self-indulgence and megalomania I feel for having come up with it.

Esoteric programming languages, or esolangs, are, as their name suggests, deliberately minimalist or extravagant or otherwise obfuscated programming languages, which are mostly of recreational—at best, theoretical—value to the disciplines of computer science. Having cut my teeth on them in high school, I feel a deep affinity for them, and their existence is what makes the idea of programming bearable to me. For this reason, I will endeavour to make this readable to what I recall to be myself a decade ago, so if you are a recurring reader and you think I emphasize or clarify something that is out of character for me, rest easy, because I am merely trying to insult my past self’s level of intelligence, not yours.

There are many genres of esolang, but in this post we will focus on one small family, all invented by linguist Chris Barker in 2001 and 2002, that falls into what might be called the combinatory logic genre. I plan to explain these three languages, both in the literal sense of the specification, and in the metaphorical sense of ‘by what mechanisms and principles do they work’. Partly to cap off my own recent study of them, and partly to give an alternative perspective to the internet and anyone else who has the same archaeological curiosity as my past self.

To set the scene, combinatory logic is a rudimentary model of computation, which exists to be compared and contrasted with the lambda calculus. Where lambda calculus reasons about functions and computation analytically, by defining functions outright, combinator calculi1 work synthetically, building complicated functions from a small set of simpler ones. They both describe the same thing, and can live peacefully in the same world, so I will be using ‘terms’ and ‘combinators’ almost interchangeably, but they have slightly different theoretical connotations which don’t matter unless you are a pedant like me. I will be using both of them heavily in what follows, so familiarity with both is ideal; but as with most esolang-related computer science, you can probably fake your way through even if you aren’t familiar, and have merely skimmed the Wikipedia articles.

Finally, I will admit up front that the analysis I will be doing is ahistorical. I will be: assigning a sequential order to developments which did not necessarily occur in that (or any!) order; poetically assigning motivations and narratives that I have imagined; and speculating wildly overall. Chris Barker, if you’re reading this, sorry for putting words in your mouth. Oh, and while I have your attention, please finish renovating your website! You’ve destroyed the links to the original documentation of these langs and I had to create an account on the Esolang wiki to edit its References section to include links to archived versions on the Wayback Machine.

# Warmup

To begin, here are some combinators to consider.

Very famously, every term with no free variables in the lambda calculus can be written as some combination of the S and K combinators. Indeed, for any combinator $X$,2

$$\m{SKK}X = \m KX(\m KX) = X,$$

and then there is a simple algorithm by which you can recursively eliminate abstractions from any term using only S, K, and I:

$$\lambda x.x = \m I, \qquad \lambda x.y = \m Ky \ (y \ne x), \qquad \lambda x.MN = \m S(\lambda x.M)(\lambda x.N).$$
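To make this concrete, the combinators can be written as curried Python lambdas. This is my own illustrative sketch, not part of any language discussed here:

```python
# The classic combinators as curried one-argument functions.
I = lambda x: x                                # I x = x
K = lambda x: lambda y: x                      # K x y = x
S = lambda x: lambda y: lambda z: x(z)(y(z))   # S x y z = x z (y z)

# SKK behaves as the identity: S K K x = K x (K x) = x.
assert S(K)(K)("probe") == "probe"
```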

The reason this matters is to do with a little thing called Turing-completeness, you might have heard of it. That maybe sounds like I’m being flippant for a joke, but actually I’m not going to talk about Turing-completeness at all, because it’s irrelevant. For our purposes we will just leave it at: lambda calculus and combinatory logic are both Turing-complete, whatever that means, so any language or computation model that can express everything they can express, is also Turing-complete.

Anyway, the fact that SKI forms a basis for combinatory logic is actually the main conceit of the very first esolang in this combinatory logic genre, David Madore’s Unlambda. Unlambda is so famous that it not only has entries on the Esolang wiki and on Wikipedia, but it was referenced on my ‘blog in a previous post! To make a medium story short, Unlambda is SKI, plus some extra design quirks, wrapped in a spartan syntax, and then I/O hastily duct-taped on top.

Unlambda set a baseline for the genre, in that you can boil down—or at least taxonomize—the difference into worries about the aesthetics of those four facets: the combinator calculus, the other language design, the syntax, or the I/O capabilities. For example, you can trade out SK for another basis, like BCKW. Or maybe your name is John Tromp and you think you can do I/O better, so you come up with a different model for that and create Lazy K.

And this brings us to the first of Barker’s languages: Iota, from (near as I can place it) 2001.

# Iota

The main conceit of Iota, and what grants it its harsh minimalism, is the use of a single combinator as the basis for its calculus. This universal combinator, which Barker calls $\iota$, which Iota spells i, and which I will call U, is $\lambda f.f\m{SK}$. The other necessary ingredient is the notation for application, for which Barker follows in the footsteps of Unlambda, making it a prefix operator, which he decides to spell *.

Barker summarizes this information as follows.

| syntax | semantics |
| --- | --- |
| $F\to\texttt i$ | $\lambda f.f\m{SK}$ |
| $F\to\texttt{*}FF$ | $[F][F]$ |

To prove U forms a basis by itself, it suffices to write another basis in terms of it. So here is SKI, and how to combine them:

$$\m I = \texttt{*ii}, \qquad \m K = \texttt{*i*i*ii}, \qquad \m S = \texttt{*i*i*i*ii}, \qquad XY = \texttt{*}XY.$$
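As a sanity check, here is a throwaway Iota interpreter in Python. The interpreter is my own sketch, but the *-and-i syntax is exactly Iota’s:

```python
K = lambda x: lambda y: x
S = lambda x: lambda y: lambda z: x(z)(y(z))
U = lambda f: f(S)(K)   # the universal combinator, λf.fSK

def eval_iota(src):
    """Evaluate an Iota program: 'i' means U, and '*XY' applies X to Y."""
    def parse(pos):
        if src[pos] == "i":
            return U, pos + 1
        assert src[pos] == "*", f"unexpected character {src[pos]!r}"
        fun, pos = parse(pos + 1)
        arg, pos = parse(pos)
        return fun(arg), pos
    term, end = parse(0)
    assert end == len(src), "trailing characters"
    return term

# *ii behaves as I, *i*i*ii evaluates to K, and *i*i*i*ii to S.
assert eval_iota("*ii")("probe") == "probe"
assert eval_iota("*i*i*ii")("a")("b") == "a"
assert eval_iota("*i*i*i*ii")(K)(K)("x") == "x"   # SKKx = x
```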

If you don’t like U you can also consider a variant of Iota wherein i codes for a different universal combinator. Barker does not explicitly suggest this, but he includes discussion of alternatives, such as $\lambda f.f\m{KSK}$ or $\lambda f.f\m S(\m{BKK})$.

In any case, Iota can write anything that SKI can, so it gets to enjoy the same Turing-completeness. And it accomplishes this with only two language elements, * and i, which is generally where minimalist esolangs are content to draw the line.

Notably, Iota does not have I/O of any kind. I mean, if you are so inclined you can consider the program as coding for a combinator that takes input according to one of the many data encodings of lambda calculus and spits out a similarly coded output. But it doesn’t really have a framework for that. We will see this addressed by other members of the language family, but not immediately.

Instead, the first issue Barker takes with Iota is the linguistic relationship between * and i. In Barker’s own words, * is syncategorematic3, which is to say that it has no semantic meaning of its own: it necessarily facilitates some modification of other elements, and cannot exist without them. To wit, i is an Iota program, but * is not. Barker is primarily a linguist, so whether application need be handled syncategorematically is probably a natural question for him to ask.

# Jot

In the middlest language of this family, Jot, Barker succeeds in lexicalizing Iota’s application operation, and consequently arrives at a language with a number of interesting properties. However, as I hope to convey to you, there were a couple of design choices made that give me pause, and make me wonder how Barker thinks about his languages. Here, I’ll start with the syntax/semantics table and we’ll work out from there.

| syntax | semantics |
| --- | --- |
| $F \to \varepsilon$ | $\m I$ |
| $F \to F\texttt0$ | $\m U[F]$ |
| $F \to F\texttt1$ | $\m B[F]$ |

The high level overview of Jot’s syntax is that it is left-branching, so each next bit modifies the program, starting with the identity for the empty program. 0 applies the universal combinator (which for now we will insist must remain U and not some other one, for a subtle reason) and 1 uses the lexicalized application. The mapping to SKI is a little uglier now:

$$\m I = \varepsilon, \qquad \m K = \texttt{11100}, \qquad \m S = \texttt{11111000}.$$

In particular, the proofs that these work as they should require a little more subtlety than the simple transliteration we used in Iota. Here’s how you prove that, say, K works as intended: for all programs $F$,

$$[F\texttt{11100}] = \m B(\m B(\m B[F]))\m{SKSK} = \m B(\m B[F])(\m{SK})\m{SK} = \m B[F](\m{SKS})\m K = [F](\m{SKSK}) = [F]\m K.$$

And then for application, you observe that $[F\texttt1]XY = \m B[F]XY = [F](XY)$. Finally, you can say the magic words, “by induction,” and the correctness follows.
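If induction proofs aren’t your thing, you can also just run it. Here is a sketch of a Jot evaluator in Python (mine, with the combinators as curried lambdas):

```python
K = lambda x: lambda y: x
S = lambda x: lambda y: lambda z: x(z)(y(z))
B = lambda x: lambda y: lambda z: x(y(z))   # B x y z = x (y z)

def eval_jot(bits):
    """Fold left over the bits: [ε] = I, [F0] = U[F] = [F]SK, [F1] = B[F]."""
    term = lambda x: x
    for b in bits:
        term = term(S)(K) if b == "0" else B(term)
    return term

# The standard Jot spellings: K is 11100 and S is 11111000.
assert eval_jot("11100")("a")("b") == "a"
assert eval_jot("11111000")(K)(K)("x") == "x"   # SKKx = x
```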

We’re almost done describing what makes Jot cool, but we have enough for me to bring up my first quibble. If you were paying attention to the universal combinator stuff in Iota, then you may have noticed that it’s actually kind of weird for $[w\texttt0]=\m U[w]$ when it’s a lot more directly Iota-like for it to have been $[w]\m U$. In fact, I’ll save you the trouble of checking: if Barker had done this then K would have been 1010100 and S would have been 101010100, and Jot would have been literally the same language as Iota but spelled with 10 instead of *i.4 If you were feeling whimsical, you could call this version of Jot a resemanticization of Iota that categorematifies *.

So why did he change it when moving to Jot? I don’t know for sure, but currently my running hypothesis is, “because he could”. Barker likely noticed that using U in this way still gave access to roughly the same computations as the Iota way did, but the result made the coding of S and K a little bit shorter, so he went with that for Jot. This mutation really isn’t that big of a deal, but you should keep the unsimplified version in mind when we move to the final language in the family.

The final observation that Barker makes about Jot is that, since every term in the SK combinator calculus can be written as a Jot program starting with 1, Jot constitutes a Gödel numbering, that is, for every program there exists a natural number coding for it. The mapping from Gödel number to program is to express the number in binary and then read that as Jot code for a term in the combinator calculus; the reverse mapping is to code your program into combinator calculus, express that in terms of S and K, and then read off the Jot number.

This is admittedly a pretty cool property for a language to have. It’s an extremely natural coding—it’s just binary, and every bit has a concrete meaning!—compared to the hacks that logicians have had to employ in the past. But I feel like there is something devastatingly obvious that Barker missed about this coding, something with serious implications for the aesthetics of the language and its Gödel numbering.

Here, take a look. By fiat, the meaning of the empty Jot program is just the identity combinator. Nothing wrong with that, it’s a perfectly natural choice. But look at what happens to initial 1s in a Jot program as a result:

$$[\texttt1] = \m B[\varepsilon] = \m B\m I = \lambda xy.\m I(xy) = \lambda xy.xy = \m I = [\varepsilon].$$

They have no effect! Any Jot program can begin with arbitrarily few or many 1’s and they would not affect the resultant program. So the “every combinator can be written as a Jot program that starts with 1” observation is kind of a sham, because you can freely add or remove 1’s at no cost.

And here’s the kicker.

When you express a number in binary, you know which bit you can freely add to or remove from the front of it without changing its binary meaning?

The bit 0.

If Jot had all its bits flipped, it would still be a Gödel numbering—just drop all the leading zeroes from your transliteration of the SK expression—but it would also arithmetically respect the binary representations of numbers.

…Tragedy!

Of course, it behooves me to try to explain Barker’s design choice. My hypothesis for this one relies on the original form of Barker’s syntax/semantics table for Jot. In it, he writes $[F\texttt1] = \lambda xy.[F](xy)$, which is more explicit about its purpose. This explicit formulation probably muddles the computations just enough to make the extensional equivalence $[\texttt1] = [\varepsilon]$ feel not so transparent. Even the SK form of this expression, $[F\texttt1] = \m S(\m K[F])$, seems complicated enough that you could conceivably miss what happens for the empty program. But it still feels like an oversight to me, and I can’t have been the only person to have noticed this, so the explanation is probably more nuanced than that. Maybe Barker simply rejects extensional equivalence altogether?

In any case, I hereby declare my Jot relex, Jomplement, an even better Gödel numbering than Jot.
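To make the relex concrete, here is a hypothetical decoder for it (my own sketch, as is the name eval_jomplement): write the Gödel number in binary, complement every bit, and read the result as Jot.

```python
K = lambda x: lambda y: x
S = lambda x: lambda y: lambda z: x(z)(y(z))
B = lambda x: lambda y: lambda z: x(y(z))

def eval_jot(bits):
    term = lambda x: x   # [ε] = I
    for b in bits:
        term = term(S)(K) if b == "0" else B(term)
    return term

def eval_jomplement(n):
    """Decode a Gödel number: binary digits, bitwise complement, then Jot."""
    bits = bin(n)[2:] if n else ""
    return eval_jot(bits.translate(str.maketrans("01", "10")))

# Leading zeroes vanish arithmetically, and leading 0-bits are inert:
# 3 = 0b11 decodes (via the Jot program 00) to a K, and 7 = 0b111 to an S.
assert eval_jomplement(3)("a")("b") == "a"
assert eval_jomplement(7)(K)(K)("x") == "x"   # SKKx = x
```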

# Zot

In 2002, Barker finally tries to tackle I/O. To recap, Iota had no I/O to speak of. Jot was not designed with I/O in mind, but strictly speaking, it can take further bits of input after the given program. However, those bits manifest as combinators of arbitrary complexity, which are passed to the Jot program as arguments, making the task of determining the bits used all but impossible. So for Zot, Barker had to devise some new model in which a program could take discernible bits as input, and then give it a compatible output model too.

Barker describes Zot as a variant of Jot, with the same syntax, and “the only semantic difference is that the values have all been type-lifted using continuations”. This is almost perfectly accurate, but you have to use the unsimplified Jot that I discussed, where $[F\texttt0] = [F]\m U$, and not the standard $\m U[F]$ one. If you recall, that is the one whose combinator transliterations of S and K were exactly the same as Iota’s, and it will be no coincidence that Zot spells S and K the same way too.

| syntax | semantics |
| --- | --- |
| $F \to \varepsilon$ | $\lambda c.c\m I$ |
| $F \to F B$ | $[F][B]$ |
| $B \to \texttt0$ | $\lambda c.c\m U$ |
| $B \to \texttt1$ | $\lambda cL.L(\lambda\ell R.R(\lambda r.c(\ell r)))$ |
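Before dissecting the table, note that it is small enough to transcribe directly into Python (my sketch, with the combinators as curried lambdas):

```python
K = lambda x: lambda y: x
S = lambda x: lambda y: lambda z: x(z)(y(z))
I = lambda x: x
U = lambda f: f(S)(K)

# The continuized meanings of the two bits, straight from the table.
ZERO = lambda c: c(U)
ONE  = lambda c: lambda L: L(lambda l: lambda R: R(lambda r: c(l(r))))

def eval_zot(bits):
    """Start from [ε] = λc.cI and apply each bit's meaning in turn."""
    term = lambda c: c(I)
    for b in bits:
        term = term(ONE if b == "1" else ZERO)
    return term

# Zot spells K and S the same way Iota does, with 1 for * and 0 for i:
assert eval_zot("1010100")("a")("b") == "a"       # K
assert eval_zot("101010100")(K)(K)("x") == "x"    # S, via SKKx = x
```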

If you are not a computational linguist or someone else that has cause to think about continuations, you may say to yourself, “Continuations? What are those needlessly complicated abstractions doing here?” And I do not yet have a ‘blog post in which I explain continuations adequately, so I cannot point you there, for you to see the error of your ways. But suffice it to say that continuations, arcane as they may seem, are an important functional tool for understanding and manipulating control flow, and for Barker, continuization is the method by which Zot is able to build a program and have that program take input, using the same semantics throughout.

I don’t really know a way to describe this that doesn’t require meditation on continuations, so I’ll do this. First I’m going to give a lemma about Zot semantics. Then, towards a sketch of a proof, I’m going to describe the sense in which $[\varepsilon]$, $\texttt0$, and especially $\texttt1$ are continuized versions of their Iota and Jot counterparts. After that, even if you don’t understand continuations, hopefully you will still come out with a sense of how Zot accomplishes what it does.

Lemma. If $F$ is a bitstring where every prefix has at least as many 1’s as 0’s, then $[F] = \lambda c.cX$ for some combinator $X$.

The condition about prefixes should be familiar to any combinatoricists that are reading, and they are probably already coming up with alternative ways of phrasing it that they prefer. One way in particular that I will single out, for its intuitive content, is that if you treat 1’s like opening parentheses and 0’s like closing parentheses, then every closing paren in $F$ matches up to some open one (and then maybe there are some extra open ones that haven’t been closed yet but that’s fine).

This $\lambda c.cX$ form is, roughly, a signal that the meaning of the term is still continuized, so it will interact with other continuized terms in a “program-building” way. But after that, if we reach a 0 bit in $F$ where the 0’s finally outnumber the 1’s, the result is no longer guaranteed to be continuized, and then Zot may stop permitting the next bits to contribute to the definition of the program and instead just treat them as a series of distinguishable arguments.

The argument at the start of the continuized form is meant to be the continuation of the program—to make a long story short, the future of the program ought to be packed into that one argument, so that you apply it to your result when you’ve finished making it. You see, if you have two such continuized combinators, then they dance around each other in an interesting way:

$$(\lambda c.cX)(\lambda c.cY) = (\lambda c.cY)X = XY.$$

Did you catch that? It came out to $XY$ at the end, but both of the continuized versions took a turn being at the head of the computation. In this case they both simply ceded control to the other, so nothing special occurred, but the point is that if you live in this continuized world where everybody promises to give everyone else a turn at the head of the computation, then everyone has a chance to enact something at the top level if they really want to. If the $Y$ continuization was $\lambda c.\m K(cY)$ then there would be a K left on the outside of the computation.

This very short and oversimplificatory explanation is also the basis by which some computational linguists think that meaning in natural languages may also be continuized. How else do you explain the ability of the word “everyone” to slip a universal quantifier around the sentence it contains? Oh hey, come to think of it, I’ve some papers and even a book on the applicability of continuations to natural language5 by a certain linguist named Chris Barker. Where have I seen that name before…

Now we’ll talk about how our combinators specifically were continuized. The simplest form of continuization is just as constants: even though combinators themselves are meant to be functions, you can continuize them as though they are data that are not necessarily intended to take arguments. To do this, you just map your combinator $X$ to $\lambda c.cX$, simple as that.

So $[\varepsilon] = \lambda c.c\m I$ is the constant continuization of the identity combinator, that begins every Jot program, and $\texttt0 = \lambda c.c\m U$ is the constant continuization of our universal combinator, U. (It is notable that here, in the Zot documentation, is the first time Barker suggests that you can swap out the universal combinator if you are not satisfied with his choice, so long as you remember to continuize it.)

1 is different, however. It was never just B as data, it was B applied to the meaning of the program. In Jot, Barker phrased it as a function taking two arguments, applying them to each other, and then handing that off to the rest of the meaning of the program.

So we’re going to do a more complicated continuization, that lifts this combinator into the continuized world as a function. Here’s the plan. We take the continuation first—everybody does, so we agree to play along—and that looks like $\lambda c.({-})$. (I’m gonna use $({-})$ to denote a blank, into which the rest of our work will go.) Then we’re gonna do whatever other work we wanted to get done, and finally once we have our result, say $X$, we finish off by doing $cX$.

Then we start to take our arguments. But remember, our arguments are going to be continuized, so they look like $A = \lambda x.xa$ or something, and the only way we can get access to the juicy data they contain is if we give them a continuation and ask them very nicely to please use it. So the blueprint for that is something like this: $\lambda A.A(\lambda a.({-}))$. The $a$ is the argument we wanted, but the $A$ is the continuized argument that we initially get, and we give it the continuation $\lambda a.({-})$ under the assumption that it will surrender its argument to us, as $a$, when it gets a chance to be the head.

1 takes two arguments, the left one and the right one, so, so far we’re looking at $\lambda c.\lambda L.L(\lambda\ell.\lambda R.R(\lambda r.({-})))$. All we have to do is apply those arguments to each other ($\ell r$), and then finish off by handing that to $c$ like we promised earlier we would ($c(\ell r)$). So overall, we obtain:

$$\texttt1 = \lambda c.\lambda L.L(\lambda\ell.\lambda R.R(\lambda r.c(\ell r))).$$

This is exactly 1 as advertised in the semantics table.

With this understanding, the proof of the lemma is not difficult, just a little tedious to phrase correctly. If I had to prove it, I’d reduce it to the following more explicit claim.

Claim. If $F$ is a bitstring satisfying the prefix condition and having $n$ more 1’s than 0’s, then

for some combinator $X$. In other words, $[F]$ is the ($n+1$)-th order continuization of $X$.

And this is a perfect segue into Zot’s I/O facilities. Zot takes program and input from the same bitstream, one after the other, so the delineation between the two is a lot fuzzier than in other languages. Strictly speaking, you are free to declare any particular bitstring as the program you intended to write, and all subsequent bits as input. But from the Lemma, there is a natural dichotomy between the bits that Zot treats as program and the bits it treats as input.

Combining the bit-by-bit semantics with the Lemma, any bitstring where all prefixes have at least as many 1’s as 0’s will be continuized, and is therefore required to cede the head of the computation to its argument. But by the same token, the first prefix of a bitstring to have more 0’s than 1’s is no longer bound by the Lemma and can potentially treat the following bits purely as arguments.

And, being that $\texttt0 \ne \texttt1$, they can be distinguished. Barker suggests the combinator $\m Q = \lambda f.f\m{IIIK}$, which satisfies

$$\m Q\texttt0 = \m K \qquad\text{and}\qquad \m Q\texttt1 = \m{KI},$$

the standard encodings of true and false, respectively, of booleans into lambda calculus and combinatory logic. (Yes, you read that right, it’s true on 0 and false on 1. Barker gets this mixed up in his Zot ‘page. Another point for the Jomplement truthers!) So now, if $B$ is the Zot meaning of a bit, and $X$ and $Y$ are the combinators you want to evaluate on 0 or 1 respectively, then $\m QBXY = B\m{IIIK}XY$ will accomplish that. Set yourself up in a loop (infinite or finite) and you have a program taking bits of input.
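If you want to check this without trusting my arithmetic, the bit test is a few lines of Python (my sketch, with the combinators as curried lambdas):

```python
K = lambda x: lambda y: x
S = lambda x: lambda y: lambda z: x(z)(y(z))
I = lambda x: x
U = lambda f: f(S)(K)

ZERO = lambda c: c(U)   # the Zot meaning of the bit 0
ONE  = lambda c: lambda L: L(lambda l: lambda R: R(lambda r: c(l(r))))  # bit 1
Q    = lambda f: f(I)(I)(I)(K)   # λf.fIIIK

# QBXY evaluates to X when B is (the meaning of) 0 and to Y when B is 1.
assert Q(ZERO)("on zero")("on one") == "on zero"
assert Q(ONE)("on zero")("on one") == "on one"
```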

Output is less cute, and borne of a non-theoretical necessity, but I will try to describe it nonetheless. You may have noticed a pattern among the terms $\m U, [\varepsilon], \texttt0, \m Q$: they all begin with a lambda abstraction, as in “$\lambda f.$”, and then their action is simply to feed $f$ a sequence of combinators. For U it is $\langle \m S,\m K \rangle$, for 0 it is $\langle \m U \rangle$. In fact, the identity combinator itself is of this form, for the empty sequence $\langle\rangle$.

Predictably, Barker calls combinators of this form sequences. For him, $\langle \m I, \m I, \m I, \m K \rangle$ is an alternative notation for Q. He gives a recursive definition of sequences, and casts input in their terms. You may have even noticed that a Zot program is precisely a binary sequence—that is, a sequence of 1’s and 0’s—applied to the continuization of the identity, which starts the whole process of peeling those continuization layers off and mingling them together into a program. Because they are how program is read and how input is taken, Barker considers binary sequences the appropriate way to give output, too.

If your program is designed to take only certain well-formed inputs, then there is no need to do anything special, and your program will just take input bits until it gets a well-formed input, and then spit out its output. But at this point, Barker imagines what it would be like to have interactive dialogues between input and output. The reasoning goes, the Zot program needs some method of receiving an EOL and optionally surrendering a partial output, and then going back to accepting further input. And this is where it gets a bit ill-specified.

Barker has some proof-of-concept Zot interpreters which signal end-of-input by first sending a new combinator that he calls “output” and that I will call E:

and then, assuming the result is some binary sequence, further sending a printing function to it which recursively prints everything in the sequence it gets. In case E looks silly at first blush, let me point out that it is designed so that $\m{QE} = \m K(\m{KI})$, which in turn has the effect that the bit-distinguishing trick from our input discussion has the result $\m QBXY = \m I$ when $B = \m E$.

This is a fun start of an idea, but Barker seems to leave it unfinished, deferring to the statement, “This allows an appropriately designed system to have orderly dialogs of input and output.” Perhaps the implication is that a Zot program that does not yet wish to halt would, upon receiving E followed by a printer, apply the printer to whatever bits it wishes to output, and then discard the printer and eat more input? It’s unclear, at least to me.

Furthermore—and I understand that this might be a particularly modern or softwarey concern when Barker is a linguist and made Zot almost two decades ago, but still—is any of this stuff something that a Zot interpreter can accurately tell is occurring? I think you bump up against some uncomputable problems if you try to verify whether a Zot program is actually adhering to any of these hypothetical contracts beyond the simple ones. It is uncomputable, for instance, to tell if the print function you have handed to your Zot program is ever going to be used again, so you can’t use that to tell if Zot is done with it. I haven’t checked, but it’s probably even uncomputable to tell if you have a finite binary sequence or not.

Anyway, this is merely nitpicking at this point. It’s not entirely fair for me to grill Chris Barker in absentia. My real point is, the output is a proof of concept. Zot isn’t quite “a version of Jot with I/O”, as much as it is “(unsimplified) Jot with I and the potential for O”.

# Conclusion

These languages are slick and I think they deserve more eyes and more study. They are a rather insular family, admittedly; I think it would be hard to learn specific lessons from them that are applicable to other languages, especially non-esoteric ones. But there is some juicy theory in there.

Jot in particular is extremely interesting to me—not only because it is a variant of the best Gödel numbering, Jomplement—but also because the semantic sleight of hand by which it operates is impressive. This is perhaps a contradictory tone for me to take, after melodramatically saying it had at its heart a grand tragedy, but I am not so silly as to let what a thing actually is inform my opinion of it. Jot is an idea, that forms the dazzling centerpiece of an adorable6 little theme-and-variations progression in the world of esolangs.

Exercises. 1.(i) Prove the combinators $\lambda f.f\m{KSK}$ and $\lambda f.f\m S(\m{BKK})$ are universal. (ii) Write down a universal combinator that is distinct from the three discussed in this post. Do you notice any patterns?

2. Prove the Lemma. Optional advice: prove the Claim first.

3. Write down a translation table from SKI to Zot.

4. I have argued that Zot is a semantic continuization of “unsimplified Jot”. Write down a continuized semantics for actual Jot.

These exercises are ordered thematically rather than by difficulty, so your mileage may vary. Remember, “write down” means you have to come up with it and then prove it’s correct!

1. Yes, plural. I had half a mind to use the plural form for the lambda calculi too, because there’s more than one of those, but I don’t want to scare you off too quickly.

2. I’m skipping some pedantry about extensional versus intensional equality and what implicit foundations the theory of computation rests on. In particular there are some bad evaluation models and bad $X$’s for which this won’t actually be the identity, but instead have the potential to fail to halt. It doesn’t really matter for the purposes of this ‘blog, but if it did matter there would be a lot of interesting stuff to say, so add that to the list of Things I’d Like to Write Down Sometime.

3. I cannot adequately describe the sheer philological decadence the existence of this word inspires in me.

4. This is why I insisted on U as the universal combinator for Jot, by the way. The sense in which it needs to be universal is different from Iota’s. If you wish to use a different combinator for 0, then either it’s universal and you use the Iota-like semantics, or it’s a subtly different property from universality that it needs to satisfy. U satisfies both universality and this other property, so it’s okay.

5. Here are a couple citations, just for you.

Barker, Chris and Chung-chieh Shan. Continuations and Natural Language. doi:10.1093/acprof:oso/9780199575015.001.0001

6. I’ve noticed I use the word ‘adorable’ a lot whenever I positively review something, lately. It’s in like half of my Steam reviews. I’m trying my best to vary but it’s just such a perfectly good little word. One might even say that it’s… nay, I mustn’t!