toki pona is a minimalist constructed language, created by linguist Sonja Lang between 2001 and 2014. It features 9 consonants and 5 vowels in its phonemic inventory (cf. most dialects of English at 24 and 14-25), and those 14 phonemes combine into a 123-word vocabulary. The standard reference document on the conlang is currently Lang’s book from 2014. There are many things to find fascinating about toki pona, but the one I’d like to explore in detail right now is this: toki pona dedicates one of its 123 words to that specific book, despite not having a word for “book”. Before we get into what I think is so spicy about that, let’s all get on the same page about toki pona’s history, philosophy, and current state.

toki pona’s main conceit is its minimalist philosophy. You can see material evidence of this in even its phonemic inventory. Here are all fourteen of its phonemes: they are rather basic and almost exactly match their IPA.

a e i o u j k l m n p s t w
/ɐ~ä/ /ɛ/ /i/ /o~ɔ/ /u/ /j/ /k/ /l/ /m/ /n/ /p/ /s/ /t/ /w/

This inventory is fully compatible with English, as well as most of the world’s other most widely spoken languages. Maybe at a glance it sounds a little infantilised because the most aggressive sounds it can muster are /t/ and /k/, but that only really rears its head if you need to transliterate something. I’m not going to get into the phonotactics, even though it’d be easy to do so, because it’s been done to death and I don’t really need it here. Suffice it to say, the rules are simple and easy to learn.
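For the programmers in the audience, here is a minimal sketch of what "only fourteen letters" means in practice. It checks spelling against the inventory above and nothing more: it does not model the phonotactics at all, and the names `TOKI_PONA_LETTERS` and `foreign_letters` are made up for this illustration.

```python
# The fourteen letters listed above. This is only an alphabet check;
# it says nothing about which syllables are actually allowed.
TOKI_PONA_LETTERS = set("aeioujklmnpstw")


def foreign_letters(word: str) -> set[str]:
    """Return the letters in `word` that fall outside the fourteen-letter inventory."""
    return {
        ch
        for ch in word.lower()
        if ch.isalpha() and ch not in TOKI_PONA_LETTERS
    }


if __name__ == "__main__":
    print(sorted(foreign_letters("toki pona")))  # no foreign letters
    print(sorted(foreign_letters("English")))    # 'g' and 'h' have no counterpart
```

Anything this function flags is a letter you would have to approximate when transliterating a name into toki pona.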

The real meat of toki pona’s minimalism is in its vocabulary, though. As I alluded to earlier, the vocabulary is incredibly small, just 123 words. For reference, my last post alone used over a thousand unique English words, which is more than eight times as many. The function of this vocabulary is to dissect the world into a comparatively small number of basic elements, and challenge you to build the world back out of those primitives with the grammar. The grammar is fairly versatile, allowing you to recombine these atoms in a number of ways whenever you encounter something that lacks its own word, but not so strong that you can just translate from a natural language word-for-word: you need to put some thought into what it really is, and what’s worth saying about it.

For example, jan is the word for ‘person’ and pona is the word for ‘good’, so ‘friend’ is traditionally analysed as a compound of these two words, jan pona: a good person. For another example, a tomo tawa is a movement structure, which is to say a vehicle, and then a tomo tawa waso is a bird vehicle, which probably refers to a plane.
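The compounding pattern in these examples is head first, modifiers after, which is the reverse of English. As a rough sketch, and only that, the toy glosser below reverses the word order and looks each word up in a five-entry table built from the glosses used above. The names `GLOSSES` and `gloss_phrase` are hypothetical, and the one-gloss-per-word table deliberately ignores the polysemy that real toki pona words have.

```python
# One English gloss per word, taken from the examples above.
# Real toki pona words cover far more meanings than this.
GLOSSES = {
    "jan": "person",
    "pona": "good",
    "tomo": "structure",
    "tawa": "movement",
    "waso": "bird",
}


def gloss_phrase(phrase: str) -> str:
    """Gloss a head-first toki pona phrase in English (modifier-first) order."""
    words = phrase.split()
    # The head comes first in toki pona and last in English, so reverse.
    return " ".join(GLOSSES[word] for word in reversed(words))


if __name__ == "__main__":
    for phrase in ("jan pona", "tomo tawa", "tomo tawa waso"):
        print(f"{phrase} -> {gloss_phrase(phrase)}")
```

Running it gives "good person", "movement structure", and "bird movement structure"; getting from those literal glosses to "friend", "vehicle", and "plane" is the interpretive work the language leaves to the listener.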

To get more mileage out of this vocabulary, toki pona often allows its words to occupy more than one part of speech—the adjectival tawa meaning ‘of movement’ can only be distinguished grammatically from the prepositional tawa meaning ‘to’ or ‘towards’ or ‘according to’, or the verbal tawa meaning ‘to move’ transitively (or ‘to move to’ intransitively!)—and wholly embraces the occasional syntactic/semantic ambiguity that arises. “I have an airplane” would be mi jo e tomo tawa waso, but in a different context, that sentence might be read as “I have a house for the bird” instead! You generally have to determine which makes sense based on context. toki pona also does not decline for number or case or anything, preferring instead adjectives (like mute, ‘many’) or prepositions (like lon, ‘in, on, at’) or particles (like e, the direct object particle) or even word order.

The lesson that toki pona is trying to teach is (in my opinion, anyway) that you need to think about what you really mean, and how to most faithfully express it, and maybe even if it deserves to be disambiguated in the way you want it to. Here, here’s a concrete example. toki pona has three words for number: wan, tu, and mute, meaning one, two, and many, respectively. Well, okay, there’s also ala for zero, but that’s just because it’s a polyseme for ‘none’ and also handles verb negation. The claim here is that anything larger is not really important enough to disambiguate. This is probably bold and unsettling to many people at first glance, given how important numbers and sequences of numbers are in our modern lives. There is an alternative system of numbers, in which luka (hand) is assigned 5, mute 20, and ale (all) 100, but this is still terribly unwieldy. The rejection of large numbers is a statement, a conscious decision on the part of toki pona that they don’t, or shouldn’t, matter to a toki pona speaker.
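To see just how unwieldy the alternative system gets, here is a small sketch of it. The word values (ale 100, mute 20, luka 5, tu 2, wan 1, ala 0) come from the paragraph above; the assumption I am adding is that a number is spelled by stacking the largest words first and summing them. The function name `to_toki_pona` is made up for this illustration.

```python
# Word values from the text, largest first.
NUMBER_WORDS = [(100, "ale"), (20, "mute"), (5, "luka"), (2, "tu"), (1, "wan")]


def to_toki_pona(n: int) -> str:
    """Spell a non-negative integer additively, largest words first (assumed rule)."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    if n == 0:
        return "ala"
    words = []
    for value, word in NUMBER_WORDS:
        count, n = divmod(n, value)  # how many copies of this word fit
        words.extend([word] * count)
    return " ".join(words)


if __name__ == "__main__":
    for number in (7, 78, 2014):
        print(number, "->", to_toki_pona(number))
```

Under that assumption, 78 already comes out as "mute mute mute luka luka luka tu wan", and a year like 2014 needs twenty repetitions of ale before it even gets to the remainder.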

Here are some more quick examples, now that you get the picture. Both ‘good’ and ‘simple’ are polysemously accommodated by pona, which means toki pona is intentionally conflating them, and asserts you should too. The word wile means both ‘to want’ and ‘to need’, and you can interpret this as toki pona philosophically objecting to anyone that claims that you can want without needing, or need without wanting. The lone third person pronoun is ona, subsuming all of ‘he’, ‘she’, singular ‘they’, and ‘it’, and the claim here (for English speakers at least!) is clear, I should hope.

There is lots to discuss and dissect here, but overall the result is a language that, while perhaps not being perfectly semantically minimal, forces its speakers to engage with minimalist philosophy in order to speak/write the language profitably, and in turn asks its speakers to consciously be understanding and interpretive in order to read/listen to the communication of others successfully. It’s very fun, and makes for quite a positive community on the whole. The simple vocabulary lends itself to a very low barrier to entry, too, which you can leisurely tackle in a month, or power through in as little as a weekend. (Consider this my token endorsement of the language.)

This, as I’ve described it, is toki pona, according to jan Sonja’s book: Toki Pona: The Language of Good. Or, as the community calls it, pu. And that’s not an in-joke of some kind, that’s the Official Toki Pona Name for that book. TP:tLoG calls itself pu, as one of its 123 words. It’s not even polysemous with other words that could be related, like ‘textbook’ or ‘instruction’ or ‘orthodoxy’. This in itself is interesting already—this level of self-awareness and self-reference places pu within comparison distance to other texts like religious texts and some manifestos—but it gets even more interesting when you look at the history of toki pona.

As I alluded to earlier, the first draft of toki pona was published online in 2001, and received periodic updates until pu was finally published in print in 2014. This, you may have observed, is a long time to live with version 0.x of anything. In the meantime, its community quickly grew from a Yahoo! mailing list to a phpBB forum to an expansive network that now reaches across several websites and social networks: Facebook, Reddit, Discord, Telegram, and likely several more. There are old-timers who grew fluent in toki pona with words that had been deprecated by the time pu was released.

Furthermore, as with any thriving language, new words are coined and grammatical innovations are created regularly. So toki pona exists in three states at once: I shall follow the community in referring to these states as pre-pu, pu, and post-pu. Note that these are not literal temporal states, but more so attitudes towards the language that are causally related to pu.

What complicates the story further is that these states exist in tension with each other. pu implicitly rejects all toki pona that is not pu. It is a critical part of toki pona’s philosophy that you use the language as it is, instead of coining new words for the complex concepts that pu rejects. And, somewhat cleverly, it enforces this by having a word for itself: it is pu to study pu, and whether or not a production is supported or rejected by pu is a discussion that can occur, in toki pona.

In practice, the tensions look something like this. Pre-pu toki pona is rather widely and fluently spoken, and adheres to the same minimalist philosophy, but contains deprecated words, in mild but direct conflict with pu. Post-pu toki pona, on the other hand, is wary (via pu) of some of these pre-pu words, but, being a language with users that have independent thought, it is generally free to take any of them that it pleases, as well as whatever new words it coins, which pu and pre-pu both take issue with.

It is easy to sympathize with pu here, if you take its philosophy seriously: new words are circumventing the minimalism of trying to analyse the world into its primitive elements. Plus, as its fluent speakers constantly demonstrate, it is generally very possible to find the appropriate circumlocutions to talk about any desired topic. A very common and powerful strategy is to explain a concept in as much toki pona as you need, and then refer to it as ijo ni (“that thing”) or jan ni (“that person”) or nasin ni (“that method”) or a ni-modified version of whatever other simple noun is appropriate. And as a final nail in the coffin, pu itself admits that toki pona is a simple language (a toki that is pona) and it can very easily become unsuitable for a world as complicated as ours. toki pona is not for technical documents, according to pu.

Let me argue in post-pu’s defence, though. Post-pu toki pona having more words than pu is not necessarily inconsistent with the minimalist philosophy of pu, if the new words are carefully justified. Consider, for example, the post-pu word tonsi. In pu, there is a single third person pronoun, ona, and the only two words to have any relation to gender are meli (female, femininity; etymologically from Tok Pisin meri) and mije (male, masculinity; etym. Finnish mies). However, if you are trans or nonbinary, there is no word for you in toki pona. Although you can circumlocute with any number of options, like jan pi meli mije or jan pi meli ala pi mije ala or something, this is an unwieldy hassle when all you want to do is state who you are.

So define tonsi (nonbinary; etym. Mandarin tóngzhì for ‘comrade [no political connotation]’ and, in slang, ‘LGBT’). Now nonbinary people don’t have to be circumlocuted or explained or ni-ed, they simply are, and have a language primitive to match the existing ways of talking about gender: jan tonsi to accompany jan meli and jan mije. The assertion is that tonsi is perfectly in keeping with the pu philosophy, and more generally there can exist strong arguments for the introduction of words otherwise excluded or neglected by pu.

And this is merely one of the many “levels of strength” to which one can be post-pu. It is “post-pu” both to introduce an exhaustive system for number-naming (rendering the wan-tu-mute trick irrelevant for the sake of utility) and to use a pu word in a new part of speech (taking its meaning there to be the obvious one suggested by the existing polysemy and the inferred grammar of this modification). Perhaps I am giving off the impression that I am a post-pu sympathizer. That’s true, but in my opinion, it’s not as clear-cut as pu or post-pu being prescriptivist or descriptivist, “right” or “wrong”; and this is an immensely interesting place to be.

One final thing to note. According to pu, pu means either the book, interacting with the book, or being in accordance with the book. But a fairly weak post-pu novelty is that pu can mean orthodoxy in general. And while everyone in the community generally has a pretty clear understanding of what it means for something to be pu or pre-pu or post-pu, I find it just a little funny that depending on which side of the line you fall, you might be inclined to draw the line a little differently.

Because I think this is so important as to deserve archival in a ‘blog post, I would be remiss if I did not make good on my threat to compare pu’s self-reference to that of religious texts. So I searched for prior art and came upon Donald Haase’s Master’s thesis in Religious Studies.[1]

Haase, Donald. Self-Referential Features in Sacred Texts. FIU Electronic Theses and Dissertations (2018), 3726. doi:10.25148/etd.FIDC006911

It is the only thing I could find when I looked for a sincere survey of self-reference in notable ideological texts—for fans of the ‘blog, that means I am excluding Hofstadter from my search on the grounds that it is, as contemporary youth might pejoratively say, “a meme” and “kinda cringe”.

In his study, Haase considers a very broad definition of sacred texts—any fixed and bounded sequence of words that is considered by at least one person to be sacred—which serves our purposes handily. He finds three major categories of self-reference: inlibration, necessity, and untranslatability. pu satisfies the second of these very literally, but not so much the others.

Since I will be skipping over them, let me briefly summarize the inapplicable two categories. Inlibration is the proclamation that a text is the textual manifestation of a deity or other sacred entity. Clearly irrelevant here until someone takes the step of deifying jan Sonja. On the other hand, untranslatability is the instruction that any translation or other non-verbatim communication of the text will fail to inherit its sacredness, being merely mundane, if not outright profane. This has slightly more applicability, if you interpret non-pu lesson materials as non-verbatim communication, as their details frequently come under scrutiny for adherence to pu. But, especially given the existence of an official translation of pu into French, I think modified transmission of pu is not controversial, at least so long as the linguistic rules and minimalist philosophy are represented faithfully.

The necessity self-referential, on the gripping hand, is when a text insists on its own necessity for some (ostensibly important) purpose. Haase suggests many traditionally religious necessities:

A text can describe itself as necessary in (at least) the following ways:

  • Soteriologically Necessary: Necessary for salvation.
  • Eschatologically Necessary: Necessary to bring about the end of the world, or to make the end of the world occur in a desired way.
  • Ritualistically Necessary: Necessary for the correct performance of a ritual.
  • Ontologically Necessary: Necessary for the functioning or existence of reality.
  • Commanded Necessity: Mandated to be used in some way by a sacred source (without explicit invocation of one of the above reasons).[114]

What this is not is any kind of implied necessity of the contents of the text. For instance a text merely describing a ritual or the end of the world does not make it ritualistically or eschatologically necessary. The most common occurrences of this though are commands within a text that the text be read, recited, studied, copied, or otherwise transmitted.

114. Perhaps this could be separated into distinct types. Sample statements that would fall under this would be commands to read, learn, teach, copy, pass down, or safeguard the text. Each of these is impossible to fulfill without the text.

[links mine, footnote from source]

Of those that he lists, pu satisfies mainly the ritualistic necessity and the commanded necessity, by insisting that studying pu is the appropriate way to learn to speak toki pona, both in the main text and, cheekily, in the exercises. I would argue that there is also an indirect component, in pu’s strict control of its specific models of grammar and of parts of speech, which I would like to metaphorize as a glorified nonprogrammer’s version of EBNF. However, Haase explicitly ignores this implied self-referentiality in his study, for the fair purposes of objective comparison.

From Haase’s examples, the only sacred text he considers that falls only under the category of necessity self-referentials is the Papyrus of Ani. This is a personalized funeral scroll, containing passages that would be read at someone’s funeral by a priest so that the deceased is ensured a proper passage into the afterlife. It also contains instructions on how to read those passages and carry out the ceremony, and even has pictures of a priest reading from that same funeral scroll.

At the risk of making light of ancient funeral customs, I think pu and the Papyrus of Ani are comparable in the degree of insistence of their ritualistic and commanded necessities. For contrast, Haase’s other examples, like the Quran and the Book of Mormon, are generally far stricter about their own necessity, and possess further forms of self-referentiality to boot. It would be immensely silly to leave the comparison at that, but I think I’ll have to, being that I don’t feel qualified to talk at length about any of the other texts that Haase uses. So overall, though the comparison is not entirely fair, I think it still definitely places pu somewhere interesting on a hypothetical “sacral continuum”.

I don’t have a conclusion to this case study that isn’t just gesturing towards Haase’s thesis and implying that I don’t have much to add. Strictly speaking, my motivations are orthogonal to Haase’s, though his groundwork is definitely stuff I think needs to be done and I am lucky that I do not have to do it myself just to do this case study saliently.

And, don’t let me let you forget it, this is just a (rather shallow) case study centered around a single text. As I alluded to earlier, this self-referentiality lens probably has some interesting things to say about ideological manifestos, too. I’m more interested in this because it relates to a conlang I like, but I don’t think there’s a similar dynamic going on with too many other conlangs. There definitely might be a couple of the popular ones worth examining, though: my money’s on Lojban, because I don’t know enough about the history of Esperanto to assess its usefulness there.

As far as my pu opinions go, I think tonsi specifically is a good word and belongs in toki pona, and there are very few non-pu words to which I extend the same privilege, even among the generally uncontroversial pre-pu lexicon. An informal survey on Reddit recently found that, out of 86 respondents, more than half agreed that tonsi is a valid toki pona word. Only a slightly larger proportion reported in favour of the meme word kijetesantakalu that Sonja Lang introduced into toki pona on 1 April 2009, which I know is just a funny fact but I think also deserves to be called good progress for tonsi.

I have been working on a math post in toki pona. For anyone that is familiar with toki pona, or that paid attention earlier when I said that toki pona was not built to discuss technical subjects: yeah, I am indeed as stupid as I sound. No ETA on it yet, because I’m not entirely sure that it’s even going to work out comprehensibly, but my fingers are crossed. If there’s an easy topic you’d like to see, leave me a comment below. I won’t make any guarantees, because I’d probably have to pivot the stuff I’ve already written, but inspiration is never unwelcome.

  1. The lattice fans out there will be pleased to know that lattices tried to sneak their way into this post just as much as any of my mathematical ones: I kept typoing Haase as Hasse! If one slipped past the Ctrl+F, then you can officially declare the lattices the winners of this round.