The Clearest of Blue Skies (<a href="http://cleare.st/">http://cleare.st/</a>): I'm technically a professional mathematician. I also like computers, words, and bad jokes. But I hate fun. Fun is banned.

<h1 id="an-excluded-subobject-theorem-for-nondistributive-lattices">An excluded subobject theorem for nondistributive lattices</h1>
<p><em>Ilia Chtcherbakov, 2017-07-07, <a href="http://cleare.st/math/excluded-subobjects-for-nondistributive-lattices">http://cleare.st/math/excluded-subobjects-for-nondistributive-lattices</a></em></p>

<p>I gave a talk yesterday at the PMC’s SASMS. I typeset some notes, and I thought I would share them because I drew pictures.
This typeset version has more details and less visual intuition than what I presented on the blackboard, which is why the whole thing fit into a half hour.
Find a link to the notes and some flavour text under the cut.
<!--more--></p>
<p><a href="/files/sasms1175.pdf">Here is a link to the PDF version of my notes.</a></p>
<p><a href="http://sasms.puremath.club">Short Attention Span Math Seminars</a>—or SASMS—is an event held by the University of Waterloo’s <a href="http://puremath.club">Pure Math Club</a> every four months.
It’s a fun little evening where students can sign up to give 30-minute talks on topics that interest them, and it’s a great way to practice public speaking.
I was club president for probably like a billion years, and I’ve signed up to give a talk every term since before my father was born, so of course I talked this term.
This is probably the last talk I’m going to give, so I wanted to do a good job for once. I think it went over well.</p>
<p>The talk is about lattices, of course, because lattices are basically my favourite things that aren’t matroids.
A lattice is <strong>distributive</strong> if its meet operation distributes over its join, or equivalently vice versa.
It turns out that there is a nice excluded-subobject theorem characterizing distributivity—akin to <a href="https://en.wikipedia.org/wiki/Kuratowski%27s_theorem">Kuratowski’s theorem</a> on planarity—and it “factors” nicely into two halves as far as lattice theory is concerned.</p>
<p>I think I’ll spend the rest of this post explaining the story behind why I gave this talk.</p>
<p>Last fall, I took a course called “Introduction to Substructural Logics”.
This was an undergraduate topics course in philosophy, so it counts as an elective on my transcript, but it was taught by a hardcore logician and was very heavily mathematical in nature.
Basically a perfect course for someone like me who is lazy and can’t stand doing things that aren’t math.
There were at most twelve people taking it, most of them probably upper-year philosophy students with logic leanings.
My good friend Sean Harrap and I, a computer scientist and a mathematician respectively, were also students.</p>
<p>It proceeded slowly enough at first, with many breaks to talk about the history or some connections to philosophical ideas, but it walked and talked like an easy math course.
The grading scheme was based on the class’s performance on three assignments, with the explicit expectation that everyone would attempt as many questions as they could, the curve to be decided holistically after the fact.
Having enough time and interest on my hands, Sean and I decided to try to answer every question. For the first two assignments, this was not very hard or time-consuming, as the exercises were fairly simple.</p>
<p>By the time the third assignment rolled around, however, we had begun to cover algebraic semantics, and while the lectures still proceeded at a reasonable pace, the assignment pulled out all the stops.
One of the questions was to prove that a bounded lattice was a Boolean algebra iff every prime filter was an ultrafilter—you may recall I squeezed a <a href="/math/prime-filters-in-distributive-lattices">series of two blog posts</a> out of this problem—and it was assigned to us as casually as anything.
Another question was to prove the previously mentioned excluded subobject result for lattice distributivity, which is an exercise I would not recommend for even the most intense training regimens.</p>
<p>I managed to walk out of the course with a hundo, so all’s well that ends well, but I worry the marking scheme was a little too adversarial
and might have left some of the less algebraically inclined students high and dry.</p>
<p>In any case, this was a tale I’d recounted many times since, and at one point, as a joke, my partner in substructural crime Sean Harrap suggested I give a talk about this course since I couldn’t stop yammering on about it.
So I put my money where my mouth was and set to work digesting this proof until it fit into 30 minutes.
Even then it’s a bit of a stretch, but at the very least all the necessary ingredients are there.</p>

<h1 id="sl2-for-combinatorialists">sl(2) for Combinatorialists</h1>
<p><em>Ilia Chtcherbakov, 2017-05-24, <a href="http://cleare.st/math/sl2-for-combinatorialists">http://cleare.st/math/sl2-for-combinatorialists</a></em></p>

<p>There is a long and terrific story to tell about Lie theory, and I wish I could do it justice, but there’s far too much to say in a single post.
What I have today is merely one application of one Lie algebraic idea, which ends up being a useful theoretical and practical tool in enumerative combinatorics.
<!--more--></p>
<p>The long and short of it is that the representation theory of a spooky object called <script type="math/tex">\def\sl{\mathfrak{sl}}\sl(2)</script> can be hijacked by combinatorialists
to prove that certain sequences of positive integers are <em>symmetric</em> and <em>unimodal</em>.
Typically symmetry is obvious but unimodality is quite hard to establish,
so this <script type="math/tex">\sl(2)</script> technology does make things somewhat neater.
The other big tool I know about for proving things about unimodality or related properties like log-concavity is the theory of <em>stable polynomials</em>,
which is also rather algebraic.</p>
<h1 id="lie-algebras">Lie algebras</h1>
<p>Fix the field <script type="math/tex">\def\C{\mathbb C}\C</script>. Some of the following math can be done over other fields and even over general rings, but <script type="math/tex">\C</script> is good enough for the combinatorialist and hence for this post.
Formally, a <strong>Lie algebra</strong> is a vector space <script type="math/tex">\def\g{\mathfrak g}\g</script> together with a special bilinear map <script type="math/tex">[{-},{-}] : \g \times \g \to \g</script> called the <em>Lie bracket</em>.
The Lie bracket must be antisymmetric, in that <script type="math/tex">[a,b] = -[b,a]</script> for all <script type="math/tex">a,b \in \g</script>, and it must also satisfy the <em>Jacobi identity</em>,</p>
<script type="math/tex; mode=display">[a,[b,c]] + [b,[c,a]] + [c,[a,b]] = 0,</script>
<p>for all <script type="math/tex">a,b,c \in \g</script>.</p>
<p>Lie algebras arise in a natural way from fantastic objects called <em>Lie groups</em>, which are essentially groups with smooth manifold structure.
There is an enormous amount of theory on this topic, of which I will be needing rather little, and most of what I will talk about today can be done without invoking any of the deep Lie theory underlying everything,
but I thought I would record at least a taste of what lies beneath.</p>
<p>Any associative <script type="math/tex">\C</script>-algebra <script type="math/tex">A</script> gives rise to a Lie algebra on <script type="math/tex">A</script>, by taking the Lie bracket to be the commutator <script type="math/tex">[a,b] = ab - ba</script>.
In particular, the matrix algebra <script type="math/tex">\mathrm{End}(V)</script> of endomorphisms of a finite-dimensional vector space gives a Lie algebra denoted <script type="math/tex">\def\gl{\mathfrak{gl}}\gl(V)</script>.
When the particular vector space is irrelevant we often abbreviate <script type="math/tex">\gl(n) = \gl(\C^n)</script>.</p>
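<p>Antisymmetry of the commutator is immediate, and the Jacobi identity is a line of algebra, but if you would rather let a machine do the expanding, here is a randomized spot check on <script type="math/tex">2 \times 2</script> integer matrices (the helper functions are my own throwaway code, not from any library):</p>

```python
import random

# Spot check that the commutator [a,b] = ab - ba on 2x2 matrices
# is antisymmetric and satisfies the Jacobi identity.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def madd(A, B):
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

def bracket(A, B):
    AB, BA = matmul(A, B), matmul(B, A)
    return [[AB[i][j] - BA[i][j] for j in range(2)] for i in range(2)]

def rand_mat():
    return [[random.randint(-5, 5) for _ in range(2)] for _ in range(2)]

random.seed(0)
for _ in range(100):
    a, b, c = rand_mat(), rand_mat(), rand_mat()
    # antisymmetry: [a,b] = -[b,a]
    assert bracket(a, b) == [[-x for x in row] for row in bracket(b, a)]
    # Jacobi: [a,[b,c]] + [b,[c,a]] + [c,[a,b]] = 0
    jac = madd(madd(bracket(a, bracket(b, c)),
                    bracket(b, bracket(c, a))),
               bracket(c, bracket(a, b)))
    assert jac == [[0, 0], [0, 0]]
```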
<p>The Lie algebra <script type="math/tex">\sl(2)</script> is a sub–Lie algebra of <script type="math/tex">\gl(2)</script>, consisting of those matrices with zero trace.
The trace-zero matrices are not closed under matrix multiplication, so <script type="math/tex">\sl(2)</script> is not an associative subalgebra of <script type="math/tex">\mathrm{End}(\C^2)</script>; but it is true that <script type="math/tex">\def\tr{\operatorname{tr}}\tr(AB) = \tr(BA)</script>, so that <script type="math/tex">\tr([A,B]) = \tr(AB) - \tr(BA) = 0</script>, and hence <script type="math/tex">\sl(2)</script> is closed under the Lie bracket.</p>
<p>To better discuss <script type="math/tex">\sl(2)</script>, let</p>
<script type="math/tex; mode=display">% <![CDATA[
X = \begin{bmatrix} 0&1\\0&0 \end{bmatrix}, \quad Y = \begin{bmatrix} 0&0\\1&0 \end{bmatrix}, \quad H = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}. %]]></script>
<p><script type="math/tex">\{X,Y,H\}</script> is a basis for <script type="math/tex">\sl(2)</script>, and we can compute that <script type="math/tex">[X,Y] = H</script>, <script type="math/tex">[H,X] = 2X</script>, and <script type="math/tex">[H,Y] = -2Y</script>.
In fact, <script type="math/tex">\{X,Y\}</script> together generate <script type="math/tex">\sl(2)</script> as a Lie algebra, in that the only subset of <script type="math/tex">\sl(2)</script> closed under finite linear combinations and brackets, and containing <script type="math/tex">X</script> and <script type="math/tex">Y</script>, is all of <script type="math/tex">\sl(2)</script>.</p>
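<p>For the skeptical, the three bracket relations are a one-liner to check by machine. A self-contained Python sketch (again with throwaway 2-by-2 helpers of my own):</p>

```python
# Verify [X,Y] = H, [H,X] = 2X, [H,Y] = -2Y, where [A,B] = AB - BA.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def bracket(A, B):
    AB, BA = matmul(A, B), matmul(B, A)
    return [[AB[i][j] - BA[i][j] for j in range(2)] for i in range(2)]

def scale(c, A):
    return [[c * A[i][j] for j in range(2)] for i in range(2)]

X = [[0, 1], [0, 0]]
Y = [[0, 0], [1, 0]]
H = [[1, 0], [0, -1]]

assert bracket(X, Y) == H
assert bracket(H, X) == scale(2, X)
assert bracket(H, Y) == scale(-2, Y)
```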
<h1 id="representation-theory">Representation theory</h1>
<p>A <strong>representation</strong> of a Lie algebra <script type="math/tex">\g</script> is a linear map <script type="math/tex">\pi : \g \to \gl(V)</script> for some vector space <script type="math/tex">V</script>, such that <script type="math/tex">\pi([x,y]_\g) = [\pi(x),\pi(y)]_{\gl(V)}</script>.
One famous representation of any Lie algebra is the <em>adjoint representation</em> <script type="math/tex">\mathrm{ad} : \g \to \gl(\g)</script> where <script type="math/tex">\mathrm{ad}(x) = [x,{-}]</script>.
We’re going to investigate the representation theory of <script type="math/tex">\sl(2)</script><sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup> and an arguably combinatorially useful property will fall out.</p>
<p>Let <script type="math/tex">\pi : \g \to \gl(V)</script> be a representation. A subspace <script type="math/tex">W \subseteq V</script> is <strong><script type="math/tex">\pi</script>-invariant</strong> if it is <script type="math/tex">\pi(x)</script>-invariant for all <script type="math/tex">x \in \g</script>, that is, if <script type="math/tex">\pi(x)W \subseteq W</script>.
<script type="math/tex">\pi</script> is <strong>irreducible</strong> if the only nontrivial invariant subspace is <script type="math/tex">V</script>.</p>
<p>One would hope, as in the representation theory of finite groups, that every complex finite-dimensional representation of a Lie algebra <script type="math/tex">\g</script> is a direct sum of irreducibles.
This doesn’t work out unless <script type="math/tex">\g</script> is <em>semisimple</em>.
The definition is a bit involved and doesn’t motivate itself, but it’s not wrong to say that <script type="math/tex">\g</script> is semisimple iff it is a direct sum of <em>simple</em> Lie algebras, which are those where the only nontrivial subspace <script type="math/tex">\mathfrak i</script> such that <script type="math/tex">[\g,\mathfrak i] = \mathfrak i</script> is <script type="math/tex">\g</script> itself.
Point is, <script type="math/tex">\sl(2)</script> is semisimple.</p>
<p>One common abuse of notation is to make <script type="math/tex">\pi : \g \to \gl(V)</script> implicit by declaring that <script type="math/tex">V</script> is a representation of <script type="math/tex">\g</script>
and that <script type="math/tex">xv = \pi(x)v</script> for <script type="math/tex">x \in \g</script> and <script type="math/tex">v \in V</script>.
Never having been one to rock the boat, I’ll do the same when discussing representations of <script type="math/tex">\sl(2)</script>.</p>
<p>Because <script type="math/tex">\{X,Y\}</script> generate <script type="math/tex">\sl(2)</script>, representations of <script type="math/tex">\sl(2)</script> are determined by the images of <script type="math/tex">X</script> and <script type="math/tex">Y</script>.
The coherence conditions they have to satisfy are <script type="math/tex">[H,X] = 2X</script> and <script type="math/tex">[H,Y] = -2Y</script>, where of course <script type="math/tex">H = [X,Y]</script>.
By the bilinearity and antisymmetry of the bracket, any pair of maps <script type="math/tex">(X,Y)</script> satisfying these (two) equations forms a representation of <script type="math/tex">\sl(2)</script>.</p>
<p>If we take for granted that every representation of <script type="math/tex">\sl(2)</script> decomposes as a direct sum of irreducible representations, often abbreviated <em>irreps</em>,
then it suffices to understand the irreps.
Once we have that knowledge I can explain what it’s good for and how a combinatorialist might use it.
(At this point you can skip to the next section if you believe me and don’t care why.)</p>
<p>So let <script type="math/tex">V</script> be a finite-dimensional irrep of <script type="math/tex">\sl(2)</script>.
By semisimplicity, we can use a principle called <em>the preservation of Jordan decomposition</em><sup id="fnref:2"><a href="#fn:2" class="footnote">2</a></sup>.
This tells us that <script type="math/tex">H</script> acts diagonalizably on <script type="math/tex">V</script>, since it itself is diagonal in <script type="math/tex">\sl(2)</script>, and likewise <script type="math/tex">X</script> and <script type="math/tex">Y</script> act nilpotently.
Because <script type="math/tex">H</script> is diagonalizable, let’s decompose <script type="math/tex">V = \bigoplus_\lambda V_\lambda</script> into <script type="math/tex">\lambda</script>-eigenspaces <script type="math/tex">V_\lambda</script> for <script type="math/tex">H</script>.
The eigenvalues <script type="math/tex">\lambda</script> that have nontrivial <script type="math/tex">V_\lambda</script> are called <strong>weights</strong> and the <script type="math/tex">V_\lambda</script> are called <strong>weight spaces</strong>.</p>
<p>If <script type="math/tex">v \in V_\lambda</script>, then</p>
<script type="math/tex; mode=display">HXv = (XH + [H,X])v = X(\lambda v) + (2X)v = (\lambda + 2)Xv</script>
<p>so <script type="math/tex">X(V_\lambda) \subseteq V_{\lambda+2}</script>, and likewise <script type="math/tex">Y(V_\lambda) \subseteq V_{\lambda-2}</script>.
For this reason <script type="math/tex">X</script> and <script type="math/tex">Y</script> are often called <em>raising</em> and <em>lowering</em> operators, respectively.</p>
<p>If some <script type="math/tex">\alpha \in \C</script> has a nontrivial <script type="math/tex">V_\alpha</script>, then <script type="math/tex">\bigoplus_{n \in \mathbb Z} V_{\alpha+2n}</script> is an invariant subrepresentation of <script type="math/tex">V</script>, and hence equals <script type="math/tex">V</script> by irreducibility.
So by finite dimensionality, these eigenvalues show up in an unbroken line as in <script type="math/tex">\alpha, \alpha+2, \alpha+4, \dots, \alpha+2k</script>.</p>
<p>Let <script type="math/tex">v \in V_\alpha</script> be a vector of lowest weight.
Then consider the cyclic subspace spanned by <script type="math/tex">\{v, Xv, X^2v, \dots\}</script>.
Since <script type="math/tex">V_{\alpha-2}</script> is trivial, <script type="math/tex">Yv = 0</script>.
By induction we can show <script type="math/tex">YX^nv = -n(\alpha+n-1)X^{n-1}v</script>: the base case is <script type="math/tex">YXv = XYv - Hv = -\alpha v</script>, and for the inductive step,</p>

<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
YX^{n+1}v &= XYX^nv - HX^nv \\
&= X\bigl( -n(\alpha+n-1) X^{n-1}v \bigr) - (\alpha+2n)X^nv \\
&= -\bigl( n\alpha + n(n-1) + \alpha+2n \bigr)X^nv \\
&= -\bigl( (n+1)\alpha + n(n+1) \bigr) X^nv \\
&= -\bigl( (n+1)(\alpha + (n+1)-1) \bigr) X^nv.
\end{align*} %]]></script>
<p>It follows that this cyclic subspace is a subrepresentation, and by irreducibility, <script type="math/tex">V</script> is equal to this subrepresentation.
But now, because <script type="math/tex">V</script> is finite-dimensional, we can do some numerological magic.
<script type="math/tex">X^nv = 0</script> for some least <script type="math/tex">n = \dim V</script>, and then <script type="math/tex">0 = YX^nv = -n(\alpha+n-1)X^{n-1}v</script>.</p>
<p>Well, <script type="math/tex">X^{n-1}v</script> is a nonzero vector, so the coefficient <script type="math/tex">n(\alpha+n-1)</script> must be zero.
If <script type="math/tex">V</script> is nontrivial, then <script type="math/tex">n \ge 2</script>, so <script type="math/tex">\alpha + n - 1 = 0</script> and hence <script type="math/tex">\alpha = 1-n</script> is a strictly negative integer!</p>
<p>Finally, recall that <script type="math/tex">X(V_\lambda) \subseteq V_{\lambda+2}</script>, so that the weight spaces of <script type="math/tex">V</script> have dimensions <script type="math/tex">1, 0, 1, 0, \dots</script>, starting at <script type="math/tex">\alpha</script>.
Also, since <script type="math/tex">k = n-1 = -\alpha</script> gives the last nonzero vector, the highest nontrivial weight space is <script type="math/tex">\alpha+2k = -\alpha = |\alpha|</script>.</p>
<p>Now we know all that we need to about irreps of <script type="math/tex">\sl(2)</script>.</p>
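<p>If you would like the irreps concretely, here is a small Python sketch that realizes the <script type="math/tex">n</script>-dimensional irrep as matrices on the basis <script type="math/tex">v, Xv, \dots, X^{n-1}v</script>, using the lowest weight <script type="math/tex">\alpha = 1-n</script> and the lowering rule <script type="math/tex">YX^kv = -k(\alpha+k-1)X^{k-1}v</script>, and checks the defining relations. (The normalization <script type="math/tex">Xv_k = v_{k+1}</script> is a choice; any rescaling of the basis works too.)</p>

```python
# Build the n-dimensional irrep of sl(2) as n x n matrices and verify
# [X,Y] = H, [H,X] = 2X, [H,Y] = -2Y on it.

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def bracket(A, B):
    n = len(A)
    AB, BA = matmul(A, B), matmul(B, A)
    return [[AB[i][j] - BA[i][j] for j in range(n)] for i in range(n)]

def irrep(n):
    """Matrices (X, Y, H) on the basis v_0 = v, v_1 = Xv, ..., v_{n-1}."""
    alpha = 1 - n                                 # lowest weight
    X = [[0] * n for _ in range(n)]
    Y = [[0] * n for _ in range(n)]
    H = [[0] * n for _ in range(n)]
    for k in range(n):
        H[k][k] = alpha + 2 * k                   # H v_k = (alpha + 2k) v_k
        if k + 1 < n:
            X[k + 1][k] = 1                       # X v_k = v_{k+1}
            Y[k][k + 1] = -(k + 1) * (alpha + k)  # Y v_{k+1} = -(k+1)(alpha+k) v_k
    return X, Y, H

for n in range(1, 8):
    X, Y, H = irrep(n)
    assert bracket(X, Y) == H
    assert bracket(H, X) == [[2 * x for x in row] for row in X]
    assert bracket(H, Y) == [[-2 * y for y in row] for row in Y]
```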
<h1 id="symmetry-and-unimodality">Symmetry and unimodality</h1>
<p>Taking stock of what just happened, we see that there is one irrep of any particular dimension, and its weight spaces have dimensions <script type="math/tex">1, 0, 1, 0, \dots, 0, 1</script>, symmetrically arranged around the 0-eigenspace.
It follows that any finite-dimensional representation is isomorphic to a direct sum of these.
That is to say, if <script type="math/tex">V = \bigoplus_i V_i</script> is a representation of <script type="math/tex">\sl(2)</script>, graded by its weight spaces, and <script type="math/tex">d_i = \dim V_i</script> is the dimension of the <script type="math/tex">i</script>-th weight space, then the following is true:</p>
<script type="math/tex; mode=display">\cdots \le d_{-4} \le d_{-2} \le d_0 \ge d_2 \ge d_4 \ge \cdots</script>
<script type="math/tex; mode=display">\cdots \le d_{-3} \le d_{-1} = d_1 \ge d_3 \ge \cdots</script>
<p>These two sequences, <script type="math/tex">(\dots, d_{-4}, d_{-2}, d_0, d_2, d_4, \dots)</script> and <script type="math/tex">(\dots, d_{-3}, d_{-1}, d_1, d_3, \dots)</script>, have two properties that are referred to as <strong>symmetry</strong>—that they can be reflected about their center and remain equal—and <strong>unimodality</strong>—that they rise monotonically to some peak, and subsequently fall monotonically.</p>
<p>Because of this, if you would like to prove that some sequence of positive integers is symmetric and unimodal, it would suffice to find a representation of <script type="math/tex">\sl(2)</script> with suitable weight spaces.
The encoding for a sequence <script type="math/tex">(d_i)_{i=0}^n</script> is usually to have <script type="math/tex">d_i</script> be the dimension of the <script type="math/tex">(2i-n)</script>-th weight space.<sup id="fnref:3"><a href="#fn:3" class="footnote">3</a></sup>
To show you some interesting examples, I’ll use a couple of bits of technology, but in principle I could give the coefficients explicitly and that would suffice.</p>
<p>Given any finite set <script type="math/tex">S</script>, there is a natural representation of <script type="math/tex">\sl(2)</script>, called the <em>Boolean representation</em>, on the free vector space <script type="math/tex">\C\mathcal P(S)</script> whose basis is indexed by subsets of <script type="math/tex">S</script>.
Denote the basis vector of <script type="math/tex">A \subseteq S</script> by <script type="math/tex">\tilde A</script>.
Then the representation of <script type="math/tex">\sl(2)</script> is given by</p>
<script type="math/tex; mode=display">X\tilde A = \sum_{a \notin A} (A \cup \{a\})^\sim \quad \text{and} \quad Y\tilde A = \sum_{a \in A} (A \smallsetminus \{a\})^\sim.</script>
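<p>Since these rules are so simple, it is easy to verify by machine that the Boolean representation really is a representation: the commutator of the matrices for <script type="math/tex">X</script> and <script type="math/tex">Y</script> should act diagonally, sending <script type="math/tex">\tilde A \mapsto (2|A| - |S|)\tilde A</script>. A Python sketch (the function names are mine):</p>

```python
from itertools import combinations

# Build the Boolean representation on subsets of S = {0, ..., m-1}:
# X adds one element in all possible ways, Y deletes one in all possible ways.
# Then check that H = [X,Y] is diagonal with eigenvalue 2|A| - |S| on each A.

def boolean_rep(m):
    S = range(m)
    subsets = [frozenset(c) for r in range(m + 1) for c in combinations(S, r)]
    idx = {A: i for i, A in enumerate(subsets)}
    n = len(subsets)
    X = [[0] * n for _ in range(n)]
    Y = [[0] * n for _ in range(n)]
    for A in subsets:
        for a in S:
            if a not in A:
                X[idx[A | {a}]][idx[A]] += 1
            else:
                Y[idx[A - {a}]][idx[A]] += 1
    return subsets, X, Y

def commutator(A, B):
    n = len(A)
    def mm(P, Q):
        return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
                for i in range(n)]
    PQ, QP = mm(A, B), mm(B, A)
    return [[PQ[i][j] - QP[i][j] for j in range(n)] for i in range(n)]

m = 4
subsets, X, Y = boolean_rep(m)
H = commutator(X, Y)
for i, A in enumerate(subsets):
    for j in range(len(subsets)):
        expected = (2 * len(A) - m) if i == j else 0
        assert H[j][i] == expected
```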
<p>Let <script type="math/tex">V</script> be a representation of <script type="math/tex">\sl(2)</script> and suppose it has an action by some group <script type="math/tex">G \le \mathrm{GL}(V)</script> as well.
If the <script type="math/tex">\sl(2)</script>-rep is <strong>equivariant</strong> with respect to the <script type="math/tex">G</script>-action, in that <script type="math/tex">gx = xg</script> for all <script type="math/tex">x \in \sl(2)</script> and <script type="math/tex">g \in G</script>,
then there exists a subrepresentation on the <script type="math/tex">G</script>-invariant vectors, i.e. on the vector space</p>
<script type="math/tex; mode=display">V^G = \{ v \in V : g.v = v\ \text{for all}\ g \in G \}.</script>
<p>To wit, if <script type="math/tex">\{v_1, \dots, v_n\}</script> is some orbit of <script type="math/tex">G</script>, then <script type="math/tex">\sum_i v_i \in V^G</script>, so in some sense <script type="math/tex">V^G</script> is the space of orbits of <script type="math/tex">G</script>.</p>
<p>Now, we can see a couple of examples.</p>
<p>First, let <script type="math/tex">g_n(k)</script> be the number of isomorphism classes of <script type="math/tex">n</script>-vertex <script type="math/tex">k</script>-edge graphs.
Clearly the sequence <script type="math/tex">g_n = ( g_n(k) : 0 \le k \le \binom{n}{2} )</script> is symmetric, via complementation, but unimodality is far far harder to show combinatorially.
Instead, we’ll use <script type="math/tex">\sl(2)</script>!</p>
<p>Let <script type="math/tex">E = \binom{[n]}2</script> be the edge set of the complete graph <script type="math/tex">K_n</script>.
The symmetric group <script type="math/tex">S_n</script> has a natural action on <script type="math/tex">E</script>, by permuting the vertices of <script type="math/tex">K_n</script> and bringing the edges along.
This induces an action on <script type="math/tex">2^E</script>, which can be interpreted as the set of all graphs on the vertex set <script type="math/tex">[n] = \{1,\dots,n\}</script>.
Notably, two graphs are in the same orbit iff they are isomorphic.
It follows that the dimensions of the weight spaces of <script type="math/tex">\C\mathcal P(E)^{S_n}</script> are precisely the <script type="math/tex">g_n(k)</script>’s above,
so by the invariant subrepresentation of the Boolean representation of <script type="math/tex">\sl(2)</script> on <script type="math/tex">E</script>, <script type="math/tex">g_n</script> is symmetric and unimodal.</p>
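<p>As a sanity check on the conclusion (not on the <script type="math/tex">\sl(2)</script> proof), one can brute-force <script type="math/tex">g_n(k)</script> for small <script type="math/tex">n</script> by explicit isomorphism testing and confirm symmetry and unimodality directly. A naive Python sketch, feasible for <script type="math/tex">n \le 5</script> or so:</p>

```python
from itertools import combinations, permutations

def graph_counts(n):
    """g_n(k) for 0 <= k <= n(n-1)/2, by explicit isomorphism testing."""
    all_edges = list(combinations(range(n), 2))
    counts = []
    for k in range(len(all_edges) + 1):
        classes = set()
        for E in combinations(all_edges, k):
            # canonical form: lexicographically least relabelling of the edge set
            canon = min(
                tuple(sorted(tuple(sorted((q[u], q[v]))) for (u, v) in E))
                for q in permutations(range(n))
            )
            classes.add(canon)
        counts.append(len(classes))
    return counts

def is_symmetric(seq):
    return seq == seq[::-1]

def is_unimodal(seq):
    peak = seq.index(max(seq))
    return (all(seq[i] <= seq[i + 1] for i in range(peak)) and
            all(seq[i] >= seq[i + 1] for i in range(peak, len(seq) - 1)))

g4 = graph_counts(4)
assert g4 == [1, 1, 2, 3, 2, 1, 1]   # the 11 graphs on 4 vertices, by edge count
assert is_symmetric(g4) and is_unimodal(g4)
```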
<p>As a second example, let <script type="math/tex">p_{a,b}(k)</script> be the number of <a href="https://en.wikipedia.org/wiki/Partition_%28number_theory%29">integer partitions</a> of <script type="math/tex">k</script> with at most <script type="math/tex">a</script> parts, each of which has size at most <script type="math/tex">b</script>.
It’s a classical result of <a href="https://en.wikipedia.org/wiki/Q-analog">q-combinatorics</a> that this is the coefficient of <script type="math/tex">q^k</script> in the Gaussian polynomial</p>
<script type="math/tex; mode=display">\def\qbinom{\genfrac{[}{]}{0pt}{}} \qbinom{a+b}{a}_q = \prod_{i=1}^a \frac{1-q^{b+i}}{1-q^i}.</script>
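<p>You can compute these coefficients without any polynomial division via the q-Pascal identity <script type="math/tex">\qbinom{m}{j}_q = \qbinom{m-1}{j-1}_q + q^j \qbinom{m-1}{j}_q</script>, and then check numerically that they count box-bounded partitions and form a symmetric unimodal sequence. A Python sketch (naive recursions of my own, fine for small boxes):</p>

```python
def poly_add(p, q):
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [x + y for x, y in zip(p, q)]

def qbinom(m, j):
    """Coefficient list of the Gaussian polynomial [m choose j]_q."""
    if j < 0 or j > m:
        return [0]
    if j == 0 or j == m:
        return [1]
    # q-Pascal identity: [m,j]_q = [m-1,j-1]_q + q^j [m-1,j]_q
    return poly_add(qbinom(m - 1, j - 1), [0] * j + qbinom(m - 1, j))

def bounded_partitions(k, a, b):
    """Partitions of k with at most a parts, each part of size at most b."""
    if k == 0:
        return 1
    if k < 0 or a == 0 or b == 0:
        return 0
    # either no part has size exactly b, or delete one part of size b
    return bounded_partitions(k, a, b - 1) + bounded_partitions(k - b, a - 1, b)

a, b = 3, 4
coeffs = qbinom(a + b, a)
assert coeffs == [bounded_partitions(k, a, b) for k in range(a * b + 1)]
assert coeffs == coeffs[::-1]                                  # symmetric
peak = coeffs.index(max(coeffs))
assert all(coeffs[i] <= coeffs[i + 1] for i in range(peak))    # rises, then
assert all(coeffs[i] >= coeffs[i + 1] for i in range(peak, len(coeffs) - 1))
```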
<p>Let <script type="math/tex">V</script> be the Boolean representation of <script type="math/tex">[a] \times [b]</script>, and let <script type="math/tex">G = S_b \wr S_a</script> be the <a href="https://en.wikipedia.org/wiki/Wreath_product">wreath product</a> of two symmetric groups.
If you don’t know what this group is, then know that its action on <script type="math/tex">\mathcal P([a] \times [b])</script> is exactly to permute the cells within each row, and then also to permute the rows afterwards.
(If you’re reading the Wikipedia article, then this is an action induced by the <em>imprimitive</em> action.)</p>
<p>Given a proper definition, it is not hard to show each orbit of <script type="math/tex">G</script> on <script type="math/tex">\mathcal P([a] \times [b])</script> contains the Ferrers diagram of exactly one partition,
and hence <script type="math/tex">G</script> acts on <script type="math/tex">V</script> such that <script type="math/tex">V^G</script> is a vector space with basis indexed by these <script type="math/tex">a \times b</script>-bounded partitions.
The partitions of <script type="math/tex">n</script> all fall into the weight space with eigenvalue <script type="math/tex">2n - ab</script>,
so the sequence <script type="math/tex">p_{a,b} = ( p_{a,b}(k) : 0 \le k \le ab )</script> is symmetric and unimodal.
This is but one of many proofs of the celebrated unimodality of the coefficients of <script type="math/tex">\qbinom{n}{k}_q</script>.</p>
<p>It is not very hard to give explicit coefficients for this particular representation, actually.
Writing partitions as their multiplicity vectors <script type="math/tex">(m_0, \dots, m_b)</script>, it turns out that</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
X\cdot(m_0, ..., m_b) &= \sum_{i=0}^b (b-i)m_i \cdot(\dots, m_i-1, m_{i+1}+1, \dots), \\
Y\cdot(m_0, ..., m_b) &= \sum_{j=0}^b jm_j \cdot(\dots, m_{j-1}+1, m_j-1, \dots).
\end{align*} %]]></script>
<p>At first glance it may appear that the above is not well-defined, but only valid partitions will show up in summands with nonzero coefficients, so all is well.</p>
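<p>And here is a Python sketch checking that these formulas really do define an <script type="math/tex">\sl(2)</script>-action: on every multiplicity vector, the commutator <script type="math/tex">XY - YX</script> should act as the scalar <script type="math/tex">2n - ab</script>, where <script type="math/tex">n</script> is the number of cells of the partition. (The dictionary-of-coefficients encoding is my own bookkeeping, not anything standard.)</p>

```python
from itertools import product

# Encode a partition in an a x b box as (m_0, ..., m_b) with sum(m) = a,
# represent vectors as {multiplicity vector: coefficient}, and verify
# [X,Y] m = (2n - ab) m with n = sum(i * m_i).

def basis(a, b):
    return [m for m in product(range(a + 1), repeat=b + 1) if sum(m) == a]

def apply_X(m, b):
    out = {}
    for i in range(b):                # raise one part from size i to i+1
        if m[i] > 0:
            t = list(m); t[i] -= 1; t[i + 1] += 1
            out[tuple(t)] = out.get(tuple(t), 0) + (b - i) * m[i]
    return out

def apply_Y(m, b):
    out = {}
    for j in range(1, b + 1):         # lower one part from size j to j-1
        if m[j] > 0:
            t = list(m); t[j] -= 1; t[j - 1] += 1
            out[tuple(t)] = out.get(tuple(t), 0) + j * m[j]
    return out

def apply_op(op, vec, b):
    out = {}
    for m, c in vec.items():
        for t, d in op(m, b).items():
            out[t] = out.get(t, 0) + c * d
    return {m: c for m, c in out.items() if c != 0}

a, b = 3, 3
for m in basis(a, b):
    v = {m: 1}
    xy = apply_op(apply_X, apply_op(apply_Y, v, b), b)
    yx = apply_op(apply_Y, apply_op(apply_X, v, b), b)
    comm = {t: xy.get(t, 0) - yx.get(t, 0) for t in set(xy) | set(yx)}
    comm = {t: c for t, c in comm.items() if c != 0}
    n = sum(i * mi for i, mi in enumerate(m))
    expected = {} if 2 * n - a * b == 0 else {m: 2 * n - a * b}
    assert comm == expected
```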
<h1 id="lets-talk-about-posets-now">Let’s talk about posets now</h1>
<p>Because posets and lattices are my favourite thing on this blog, I would be remiss if I were not to mention a very obvious connection to posets.</p>
<p>A poset <script type="math/tex">P</script> is <strong>graded</strong> if it can be partitioned into disjoint ranks <script type="math/tex">P_i</script>, <script type="math/tex">i \in \{0, ..., r\}</script>, such that the only covering relations are between adjacent ranks <script type="math/tex">P_i</script> and <script type="math/tex">P_{i+1}</script>.
In such a situation, you could prove that <script type="math/tex">P</script> is <em>rank-symmetric</em> and <em>rank-unimodal</em> by finding a representation of <script type="math/tex">\sl(2)</script> on the free vector space <script type="math/tex">\tilde P</script> whose weight spaces are the free subspaces <script type="math/tex">\tilde P_i</script>.</p>
<p>If you additionally require that the representation of <script type="math/tex">\sl(2)</script> respect the poset structure—by saying that <script type="math/tex">X</script> and <script type="math/tex">Y</script> only raise or lower along covering relations,
i.e. whenever <script type="math/tex">X\tilde a = \sum_i x_i \tilde b_i</script> for nonzero <script type="math/tex">x_i</script>, then each <script type="math/tex">b_i</script> covers <script type="math/tex">a</script>—then we say that <script type="math/tex">P</script> carries that representation of <script type="math/tex">\sl(2)</script>.</p>
In this case, you prove not only that <script type="math/tex">P</script> is rank-symmetric and rank-unimodal, but also that it has a third property: any union of <script type="math/tex">k</script> antichains is at most as large as the union of the <script type="math/tex">k</script> largest ranks.</p>
<p>This is called the <a href="https://en.wikipedia.org/wiki/Sperner_property_of_a_partially_ordered_set"><em>strong Sperner</em> property</a>, and those of you who have heard of Sperner theory are probably already groaning and closing your browser window,
so I promise I won’t say much more about it.
Essentially, this property is saying that there are no clever collections of large antichains, and if you want a bunch of them you might as well take from the ranks.
In some sense it guarantees that your poset is not very lopsided.</p>
<p>A poset has the <em>Peck property</em> if it is rank-symmetric, rank-unimodal, and strongly Sperner.
By a theorem of Proctor, a poset is Peck iff it carries a representation of <script type="math/tex">\sl(2)</script>.</p>
<p>The representations given as examples above are actually carried by posets.
The first is carried by the poset of <script type="math/tex">n</script>-vertex isomorphism classes of graphs, ordered by the subgraph relation,
and the second is carried by the lattice of bounded partitions, equivalently viewed as the lattice of order ideals in a product of two chains.</p>
<div class="footnotes">
<ol>
<li id="fn:1">
<p>By “investigate” I mean I’m just going to say some things which aren’t technically wrong, and you can look them up if you don’t believe me; and by “we” I mean I’m reading some <a href="http://csclub.uwaterloo.ca/~mlbaker/s14/">Lie rep theory notes</a> curated by a friend of mine, <a href="https://mlbaker.net/">Michael Baker</a>, and pilfering just enough of the relevant presentation to make me feel bad if I didn’t say anything. <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
<li id="fn:2">
<p>Of course, this depends on semisimplicity. Properly, this is called the preservation of <a href="https://en.wikipedia.org/wiki/Jordan%E2%80%93Chevalley_decomposition">Jordan–Chevalley decomposition</a>, and a precise statement and proof can probably be found in something like Fulton and Harris. <a href="#fnref:2" class="reversefootnote">↩</a></p>
</li>
<li id="fn:3">
<p>It is more convenient to index from 0 when dealing with <script type="math/tex">\sl(2)</script>, for the same reason that both <script type="math/tex">\varnothing</script> and <script type="math/tex">S</script> are subsets of <script type="math/tex">S</script>. <a href="#fnref:3" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>

<h1 id="graph-homomorphisms-and-cores">Graph Homomorphisms and Cores</h1>
<p><em>Ilia Chtcherbakov, 2017-03-06, <a href="http://cleare.st/math/graph-homs-and-cores">http://cleare.st/math/graph-homs-and-cores</a></em></p>

<p>Today I’d like to talk about an open problem I’ve been interested in for the past couple of years.
It’s a very hard problem, in that there are easier special cases that are famously unapproachable,
but it makes for some rather pretty algebra, living at the intersection of graph theory and category theory.
If you ask me, it’s too good to be false.
<!--more--></p>
<p>A <strong>graph homomorphism</strong> <script type="math/tex">f : G \to H</script> between two (simple) graphs <script type="math/tex">G</script> and <script type="math/tex">H</script> is a function <script type="math/tex">f : V(G) \to V(H)</script> of the vertices that preserves edges,
i.e. such that if <script type="math/tex">uv \in E(G)</script>, then <script type="math/tex">f(u) f(v) \in E(H)</script>.
If you want to see a couple of professionals handle these bad boys, I would heartily recommend flipping open Chapter 6 of <em>Algebraic Graph Theory</em> by Godsil and Royle.</p>
<p>This is an appropriate notion of morphism for a category <script type="math/tex">\mathsf{Graph}</script> of graphs, for a number of reasons,
not least of which is that the iso arrows are the usual graph isomorphisms.
We will see this category rear its head again and again, in more or less obvious ways, but that won’t be important until much later.</p>
<p>A homomorphism <script type="math/tex">f : G \to H</script> partitions <script type="math/tex">V(G)</script> into fibres <script type="math/tex">f^{-1}(v) = \{ u \in V(G) : f(u) = v \}</script>, one for each <script type="math/tex">v \in V(H)</script>.
If two vertices of <script type="math/tex">G</script> fall into the same fibre, they cannot be adjacent, because otherwise <script type="math/tex">H</script> would have a loop edge.
So each fibre is a <em>coclique</em><sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup>.
Consequently, a homomorphism <script type="math/tex">G \to K_n</script> is an <script type="math/tex">n</script>-colouring of the graph <script type="math/tex">G</script>,
and it doesn’t take too much imagination to view homomorphisms as a generalization of graph colouring.</p>
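<p>To make this concrete, here is a tiny brute-force check, in throwaway Python of my own (the helper names and vertex labels are invented for illustration), that a proper colouring of <script type="math/tex">C_5</script> really is a homomorphism into <script type="math/tex">K_3</script>:</p>

```python
def complete_graph(n):
    """Edge set of K_n, as ordered pairs (i, j) with i < j."""
    return {(i, j) for i in range(n) for j in range(i + 1, n)}

def is_hom(G_edges, H_edges, f):
    """Does the vertex map f preserve every edge of G?"""
    return all((f[u], f[v]) in H_edges or (f[v], f[u]) in H_edges
               for (u, v) in G_edges)

# A proper 3-colouring of the 5-cycle is exactly a hom C_5 -> K_3.
C5 = {(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)}
K3 = complete_graph(3)
print(is_hom(C5, K3, {0: 0, 1: 1, 2: 0, 3: 1, 4: 2}))  # True
print(is_hom(C5, K3, {0: 0, 1: 1, 2: 0, 3: 1, 4: 0}))  # False: vertices 4 and 0 clash
```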
<p>You can restate a number of other interesting graph theoretical conditions in terms of homomorphisms as well.
For example, <script type="math/tex">K_n \to G</script> iff <script type="math/tex">G</script> has an <script type="math/tex">n</script>-clique, so we can talk about the clique number <script type="math/tex">\omega(G)</script>.
<em>Odd girth</em>—the length of the shortest odd cycle in <script type="math/tex">G</script>—is another example.
Even cycles are bipartite, so <script type="math/tex">C_{2k} \to K_2 \to G</script> so long as <script type="math/tex">G</script> has at least one edge,
but with an odd cycle <script type="math/tex">C_{2k+1}</script>, you have to wrap it around a smaller odd cycle, so e.g. if <script type="math/tex">G</script> is triangle-free then <script type="math/tex">C_3 \not\to G</script>.</p>
<p>Now define a relation <script type="math/tex">\to</script> on the set of graphs <script type="math/tex">\def\G{\mathcal G}\G</script> by saying <script type="math/tex">G \to H</script> if there exists a homomorphism from <script type="math/tex">G</script> to <script type="math/tex">H</script>.
This relation is a preorder on <script type="math/tex">\G</script>, that is, it’s reflexive and transitive.
It’s not a partial order, though, even up to isomorphism:
as we saw just a moment ago, <script type="math/tex">C_{2k} \to K_2</script> and <script type="math/tex">K_2 \to C_{2k}</script>.
More generally, <script type="math/tex">\to</script> will confuse <script type="math/tex">K_2</script> with any bipartite graph that has at least one edge,
and likewise if <script type="math/tex">\chi(G) = \omega(G) = n</script> then <script type="math/tex">G \to K_n</script> and <script type="math/tex">K_n \to G</script>.</p>
<hr />
<p>When faced with a preorder, a standard trick is to quotient out by the equivalence relation generated by the preorder, and look at the resulting partial order.
In our case, the equivalence relation is called <strong>hom-equivalence</strong> <script type="math/tex">\def\fromto{\leftrightarrow}\fromto</script>.
Technically, all of my statements from now on should be about <script type="math/tex">(\G/{\fromto}, {\to})</script>,
but because hom-equivalence respects most of our intuition, I will often talk about the graphs themselves and not their hom-equivalence classes.</p>
<p>So now we’re living in a poset, <script type="math/tex">(\G/{\fromto}, {\to})</script>. Let’s try to understand its structure.
As of late, my favourite objects on this ‘blog have been lattices, so it should come as no surprise that <script type="math/tex">(\G/{\fromto}, {\to})</script> is a lattice.
What may be more surprising is what the meet and join are.</p>
<p>Recall that a <strong>lattice</strong> is a poset such that every pair of elements has a greatest lower bound, called a <strong>meet</strong>, and a least upper bound, called a <strong>join</strong>.
In our case, we have a slightly nonstandard order <script type="math/tex">\to</script>, so in the intuitive picture of something being ‘above’ or ‘greater than’ another,
I am considering the codomain to be above the domain, i.e. <script type="math/tex">({\to}) \approx ({\le})</script>.</p>
<p>First off, let’s try to figure out the join.
That is, for two graphs <script type="math/tex">G</script> and <script type="math/tex">H</script>, we want a graph <script type="math/tex">J</script> such that <script type="math/tex">J \to X</script> iff <script type="math/tex">G \to X</script> and <script type="math/tex">H \to X</script>.
In particular, <script type="math/tex">J \to J</script>, so we should have <script type="math/tex">G \to J</script> and <script type="math/tex">H \to J</script>.</p>
<p>The most brain-dead thing to try that obviously has this property is the disjoint union <script type="math/tex">G + H</script>.
If <script type="math/tex">G + H \to X</script> then we can clearly read off a pair of homs <script type="math/tex">G \to X</script> and <script type="math/tex">H \to X</script>.
Similarly, if <script type="math/tex">g : G \to X</script> and <script type="math/tex">h : H \to X</script>, then the way to construct a hom <script type="math/tex">G + H \to X</script> is to simply send the <script type="math/tex">G</script>-half via <script type="math/tex">g</script> and the <script type="math/tex">H</script>-half via <script type="math/tex">h</script>.
So this poset has a join, namely disjoint union.</p>
<p>Now, let’s consider the meet.
It’s a bit of a counterintuitive answer, but I have to tell you what it is because I will make use of it later.</p>
<p>Define the graph <script type="math/tex">G \times H</script> by taking the vertex set <script type="math/tex">V(G \times H) = V(G) \times V(H)</script>, and the edge set</p>
<script type="math/tex; mode=display">E(G \times H) = \{ (g,h)(g',h') : gg' \in E(G)\ \text{and}\ hh' \in E(H) \}.</script>
<p>This graph is variously called the <strong>direct product</strong>, <strong>tensor product</strong>, or <strong>categorical product</strong> of the two graphs <script type="math/tex">G</script> and <script type="math/tex">H</script>.
The two projection maps <script type="math/tex">\pi_G</script> and <script type="math/tex">\pi_H</script> to the coordinates of the vertex set are in fact surjective graph homomorphisms onto the factors.</p>
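<p>The definition is easy to play with computationally. This little Python sketch (my own, nothing official) builds the edge set of <script type="math/tex">G \times H</script> from the definition and checks a well-known small case: <script type="math/tex">K_2 \times K_3</script> is a 6-cycle.</p>

```python
def direct_product(G_edges, H_edges):
    """Edge set of the direct product: (g,h)(g',h') iff gg' in E(G) and hh' in E(H)."""
    E = set()
    for (g, g2) in G_edges:
        for (h, h2) in H_edges:
            # Each pair of edges contributes two product edges,
            # one for each way of pairing up the endpoints.
            E.add(((g, h), (g2, h2)))
            E.add(((g, h2), (g2, h)))
    return E

K2 = {(0, 1)}
K3 = {(0, 1), (0, 2), (1, 2)}
P = direct_product(K2, K3)
print(len(P))  # 6 edges on 6 vertices: K_2 x K_3 is a 6-cycle
# The projection onto the first coordinate is a graph homomorphism:
print(all((u[0], v[0]) in K2 or (v[0], u[0]) in K2 for (u, v) in P))  # True
```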
<blockquote>
<p><strong>Proposition.</strong> If <script type="math/tex">g : X \to G</script> and <script type="math/tex">h : X \to H</script>,
then there is a homomorphism <script type="math/tex">f : X \to G \times H</script> such that
<script type="math/tex">g = \pi_G \circ f</script> and <script type="math/tex">h = \pi_H \circ f</script>.</p>
<p><strong>Proof.</strong> Define <script type="math/tex">f</script> by <script type="math/tex">f(x) = (g(x),h(x))</script> for each <script type="math/tex">x \in V(X)</script>.
This is a homomorphism if both <script type="math/tex">g</script> and <script type="math/tex">h</script> are, and it is very explicit that postcomposing projection maps recovers <script type="math/tex">g</script> and <script type="math/tex">h</script>. ∎</p>
</blockquote>
<p>It follows that the direct product <script type="math/tex">\times</script> is the meet in our poset, and hence it is a lattice.</p>
<blockquote>
<p><strong>Theorem.</strong> The hom-equivalence classes of graphs form a lattice. ∎</p>
</blockquote>
<hr />
<p>The next step in <a href="https://en.wikipedia.org/wiki/Grok">grokking</a> this lattice <script type="math/tex">(\G/{\fromto},{\to},{\times},{+})</script> will be to get a better handle on its elements.
Specifically, I would like to find a collection of interesting representatives for the hom-equivalence classes, much like the representatives <script type="math/tex">\{0,...,n-1\}</script> for <script type="math/tex">\def\Z{\mathbb Z}\Z/n\Z</script>.
By the serendipity of analogy, it will turn out that the vertex-minimal elements of each hom-equivalence class will have exceptional structure.</p>
<blockquote>
<p><strong>Theorem.</strong> Graphs in a hom-equivalence class having the minimum number of vertices are determined up to isomorphism by the hom-equivalence class.</p>
<p><strong>Proof.</strong> Let <script type="math/tex">G \fromto H</script> be two graphs with the minimum number of vertices for their hom-equivalence class.
Let <script type="math/tex">f : G \to H</script> and <script type="math/tex">g : H \to G</script> be homomorphisms.
By the minimality condition, both <script type="math/tex">f</script> and <script type="math/tex">g</script> must be surjective,
so <script type="math/tex">g \circ f : G \to G</script> is a surjective endomorphism of a finite graph, that is, an automorphism.
Then <script type="math/tex">f</script> has two-sided inverse <script type="math/tex">(g \circ f)^{-1} \circ g</script>,
and it follows that <script type="math/tex">G \cong H</script>. ∎</p>
</blockquote>
<p>The unique-up-to-isomorphism graph of a hom-equivalence class <script type="math/tex">[G]</script> is called the <strong>core</strong> <script type="math/tex">G^\bullet</script>.
It turns out cores are particularly susceptible to this sort of minimality argument, which gives them a number of algebraically pleasing properties.</p>
<blockquote>
<p><strong>Exercise.</strong> <script type="math/tex">G^\bullet</script> is isomorphic to an induced subgraph of <script type="math/tex">G</script>.</p>
</blockquote>
<p>With this result in hand, I will immediately abuse notation to say that <script type="math/tex">G^\bullet</script> <em>is</em> an induced subgraph of <script type="math/tex">G</script>.</p>
<blockquote>
<p><strong>Theorem.</strong> A graph <script type="math/tex">G</script> is a core iff
<script type="math/tex">\def\End{\mathrm{End}}\def\Aut{\mathrm{Aut}}\End(G) = \Aut(G)</script>.</p>
<p><strong>Proof.</strong> Suppose <script type="math/tex">G</script> is a core, and let <script type="math/tex">f \in \End(G)</script> be an endomorphism, that is, a homomorphism <script type="math/tex">f : G \to G</script>.
If <script type="math/tex">f</script> is not an automorphism of <script type="math/tex">G</script>, then it fails to be surjective,
which contradicts minimality.
Conversely, suppose <script type="math/tex">G</script> is a graph such that <script type="math/tex">\End(G) = \Aut(G)</script>.
By the previous exercise, <script type="math/tex">G^\bullet</script> embeds into <script type="math/tex">G</script>.
There exists an endomorphism <script type="math/tex">G \to G^\bullet</script>, and by hypothesis it is an automorphism, so <script type="math/tex">G \cong G^\bullet</script> and hence <script type="math/tex">G</script> is a core. ∎</p>
</blockquote>
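<p>For graphs small enough to enumerate every vertex map, the characterization that every endomorphism is an automorphism turns directly into a (horribly exponential) core test. A throwaway Python sketch of my own:</p>

```python
from itertools import product

def is_core(vertices, edges):
    """G is a core iff every endomorphism of G is an automorphism.

    Brute force over all |V|^|V| vertex maps, so tiny graphs only."""
    vertices = list(vertices)
    adj = set(edges) | {(v, u) for (u, v) in edges}
    for images in product(vertices, repeat=len(vertices)):
        f = dict(zip(vertices, images))
        if all((f[u], f[v]) in adj for (u, v) in edges):  # f is a hom G -> G ...
            if len(set(images)) < len(vertices):          # ... but not surjective
                return False
    return True

C5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
P4 = [(0, 1), (1, 2), (2, 3)]
print(is_core(range(5), C5))  # True: odd cycles are cores
print(is_core(range(4), P4))  # False: the path retracts onto a single edge
```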
<p>This is just the tip of the iceberg, though it is enough for now.
There is a lot of great stuff on cores that I will eventually get to, but I have a particular goal this time.</p>
<p>The question I have is a simple one.
The poset <script type="math/tex">(\G^\bullet,{\to})</script> of cores is isomorphic to the lattice <script type="math/tex">(\G/{\fromto},{\to})</script> of hom-equivalence classes.
<em>So what is the lattice structure on <script type="math/tex">(\G^\bullet,{\to})</script>?</em></p>
<p>In one sense, it’s easy to describe. For <script type="math/tex">X, Y \in \G^\bullet</script>, <script type="math/tex">X</script> meet <script type="math/tex">Y</script> is just <script type="math/tex">(X \times Y)^\bullet</script>, and likewise for the join.
But I claim we can do a bit better than that.</p>
<p>First off, given two cores <script type="math/tex">X</script> and <script type="math/tex">Y</script>, either they are comparable or not.
If <script type="math/tex">X \to Y</script>, then it’s not very hard to see that <script type="math/tex">X \times Y \fromto X</script> and <script type="math/tex">X + Y \fromto Y</script>.
So we may assume that <script type="math/tex">X</script> and <script type="math/tex">Y</script> are <em>hom-incomparable</em> cores.</p>
<p>Let me show you a result about disjoint unions of cores which we can use to attack this problem.</p>
<blockquote>
<p><strong>Lemma.</strong> If <script type="math/tex">X</script> and <script type="math/tex">Y</script> are connected hom-incomparable cores,
then <script type="math/tex">X + Y</script> is a core.</p>
<p><strong>Proof.</strong> Consider an endomorphism <script type="math/tex">f : X + Y \to X + Y</script>.
By the universal property of disjoint unions, this is determined by the two homomorphisms <script type="math/tex">g = f \circ \iota_X : X \to X + Y</script> and <script type="math/tex">h = f \circ \iota_Y : Y \to X + Y</script>.
Since <script type="math/tex">X</script> is connected, <script type="math/tex">g</script> sends <script type="math/tex">X</script> to one of the two components of <script type="math/tex">X + Y</script>.
But then by hom-incomparability, <script type="math/tex">X \not\to Y</script>, so <script type="math/tex">g</script> is essentially an endomorphism of <script type="math/tex">X</script>.
<script type="math/tex">X</script> is a core, so <script type="math/tex">\End(X) = \Aut(X)</script> and <script type="math/tex">g</script> is an isomorphism of <script type="math/tex">X</script> and the <script type="math/tex">X</script> component of the disjoint union.
By symmetry, <script type="math/tex">h</script> is also an isomorphism of <script type="math/tex">Y</script> onto the <script type="math/tex">Y</script>-component of <script type="math/tex">X + Y</script>, and hence <script type="math/tex">f</script> must be an automorphism.
<script type="math/tex">\End(X + Y) = \Aut(X + Y)</script>, so <script type="math/tex">X + Y</script> is a core. ∎</p>
</blockquote>
<p>If you think about it for a bit, this tells us that really the only thing that can go wrong is that if <script type="math/tex">A \to A'</script>, then <script type="math/tex">(A + B) + (A' + C) \fromto A' + B + C</script>.
So if <script type="math/tex">X</script> and <script type="math/tex">Y</script> are cores, then <script type="math/tex">(X + Y)^\bullet</script> consists of the <script type="math/tex">\to</script>-maximal connected components of <script type="math/tex">X + Y</script>.
<em>That</em> is what I consider a satisfactory answer: aside from the NP-complete problem of hom-existence between different components,
we algebraically know exactly what the core is in terms of the factors.</p>
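<p>Here is that recipe in miniature, with a brute-force hom-existence test (my own throwaway code; anything serious would use a smarter search than this):</p>

```python
from itertools import product

def hom_exists(VG, EG, VH, EH):
    """Brute-force test for a homomorphism G -> H (tiny graphs only)."""
    adj = set(EH) | {(v, u) for (u, v) in EH}
    for images in product(VH, repeat=len(VG)):
        f = dict(zip(VG, images))
        if all((f[u], f[v]) in adj for (u, v) in EG):
            return True
    return False

VK2, EK2 = [0, 1], [(0, 1)]
VC4, EC4 = ['a', 'b', 'c', 'd'], [('a', 'b'), ('b', 'c'), ('c', 'd'), ('d', 'a')]
VK3, EK3 = [0, 1, 2], [(0, 1), (0, 2), (1, 2)]

# C_4 -> K_2, so in K_2 + C_4 the component C_4 is absorbed and the core is K_2:
print(hom_exists(VC4, EC4, VK2, EK2))  # True: C_4 is bipartite
print(hom_exists(VK3, EK3, VC4, EC4))  # False: no triangle maps into a bipartite graph
```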
<hr />
<p>So you might look at that and be optimistic about the dual case.
We just need a similar lemma for the direct product—something to do with <script type="math/tex">\times</script>-irreducible hom-incomparable cores.
Then we can just decompose all cores into their <script type="math/tex">\times</script>-factorizations and use our lemma a bunch!</p>
<p>Well, let’s not get hasty. <script type="math/tex">\times</script>-factorization doesn’t actually make sense in general!</p>
<blockquote>
<p><strong>Exercise.</strong> Show that <script type="math/tex">P_4 \times K_3 \cong K_2 \times G</script> for some <script type="math/tex">\times</script>-irreducible graph <script type="math/tex">G</script>, where <script type="math/tex">P_4</script> is the path on four vertices.</p>
</blockquote>
<p>So you have a moment of panic, and then gather your thoughts and say, “Surely this is in the literature!”
Well, yes, it is.
Chapter 8 of the <em>Handbook of Product Graphs</em>, by Hammack, Imrich, and Klavžar, is all you need to know about direct product factorization,
and in fact section 8.5 proves that the connected nonbipartite graphs have unique factorization in the graphs-with-loops.</p>
<p>“Okay, that’s a bit weird. Is that workable? Can we look at the proof and modify it?” Well, maybe. I don’t know yet.
It’s kinda gross because the dependencies go back to the start of Chapter 7,
and the <em>Handbook</em> is not light bedtime reading.
Digesting it is on my to-do list, but nowhere near the top.</p>
<p><a href="http://mathoverflow.net/q/203291/57713">I have asked MathOverflow about this</a> and they don’t seem to know either.
In fact, in a way that is easy to see but hard to formally state, this question is stronger than the following open problem:</p>
<blockquote>
<p><strong>Conjecture</strong> (Hedetniemi, 1966)<strong>.</strong> For all graphs <script type="math/tex">G</script> and <script type="math/tex">H</script>,
<script type="math/tex">\chi(G \times H) = \min\{\chi(G),\chi(H)\}</script>.</p>
</blockquote>
<p>It’s worth noting that the chromatic number of a product is easily upper bounded by that of each factor, because of the projection maps, so most of the work is in showing that the bound is tight.
Currently, we know this conjecture is true when either of <script type="math/tex">G</script> or <script type="math/tex">H</script> is 4-colourable.
If you want some reading material on this topic, the <a href="https://en.wikipedia.org/wiki/Hedetniemi's_conjecture">Wikipedia page</a> has a pretty good list of references.
The following paper in particular looks at Hedetniemi’s conjecture from the hom perspective.</p>
<blockquote>
<p>Häggkvist et al.
<em>On multiplicative graphs and the product conjecture.</em>
Combinatorica (1988) 8: 63.
<a href="https://doi.org/10.1007/BF02122553">doi:10.1007/BF02122553</a></p>
</blockquote>
<p>Another thing MO pointed out to me was some more categorical ideas involving map graphs (don’t look this one up on Wikipedia, it’s something different).
The point is that, not only is the direct product the product of the category of graphs and graph homs, but it has an exponential object with respect to this product.
That route seems even less workable if you look closely, but I haven’t fully convinced myself the approach is useless.</p>
<hr />
<p>It appears that I’ve run out the clock on actually explaining any more of this in more detail.
I definitely have a lot more I’d like to say, but this is a pretty good introduction to the core problem.
Each of the directions to proceed from here could fill a post or two of their own, so I think I’ll do that and refer back to this post as material to set the scene.</p>
<p>To end off on a high note, let me present some good news.
The smallest pair of hom-incomparable graphs is <script type="math/tex">K_3</script> and the <a href="https://en.wikipedia.org/wiki/Gr%C3%B6tzsch_graph">Grötzsch graph</a> <script type="math/tex">G</script> on 11 vertices.
<script type="math/tex">G</script> is <script type="math/tex">\times</script>-irreducible, because 11 is a prime number, so we should expect that <script type="math/tex">K_3 \times G</script> is a core.
This graph has 33 vertices, which is a lot when you’re asking about cores,
but running the world’s laziest SageMath code for a week has confirmed that, yes, <script type="math/tex">K_3 \times G</script> is a core.</p>
<p>The next pair of hom-incomparable <script type="math/tex">\times</script>-irreducible cores I can think of are the Möbius ladder <script type="math/tex">M_8</script> and the Petersen graph, whose product comes to 80 vertices,
but I don’t own any supercomputers yet, so trying to figure out a more efficient way to test for that is yet another interesting direction to explore.</p>
<div class="footnotes">
<ol>
<li id="fn:1">
<p>Some of you call these stable sets or independent sets. But I will call them cocliques until the day I die. <a href="https://en.wikipedia.org/wiki/Coclique">Wikipedia can go suck a lemon.</a> <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>Ilia ChtcherbakovToday I’d like to talk about an open problem I’ve been interested in for the past couple of years.
It’s a very hard problem, in that there are easier special cases that are famously unapproachable,
but it makes for some rather pretty algebra, living at the intersection of graph theory and category theory.
If you ask me, it’s too good to be false.A Few Translation Exercises2017-02-26T23:59:59-05:002017-02-26T23:59:59-05:00http://cleare.st/math/a%20few-translation-exercises<p>One of my favourite books is <a href="https://en.wikipedia.org/wiki/G%C3%B6del,_Escher,_Bach">Gödel, Escher, Bach</a>, by Douglas Hofstadter.
If you’re patient enough to read it all, I highly recommend it: it’s a great book about mathematics and cognitive science.</p>
<p>One of the main goals of the book is to motivate, and sketch a proof of, Gödel’s first incompleteness theorem.
At one point, he provides some exercises in transcribing number theoretical statements in a specific implementation of Peano arithmetic he calls TNT. <!--more-->
They’re pretty easy, but there are a couple that can really stump you
if, like me, you haven’t taken any first-order logic courses.</p>
<p>It would befit me to detail Hofstadter’s notation for TNT, but it’s not really anything out of the ordinary.
We denote material implication with the archaic <script type="math/tex">\def\impl{\mathbin\supset}\impl</script> symbol, and negation with the tilde <script type="math/tex">\def\tnot{\mathop{\sim}}\tnot</script>,
but otherwise it is bog-standard, and the particulars are not that important.</p>
<p>Here is the relevant passage from Chapter 8 of GEB—in my 20th Anniversary Edition, it is on page 215.</p>
<blockquote>
<p class="centered"><strong>A Few More Translation Exercises</strong></p>
<p>And now, a few practice exercises for you, to test your understanding of the
notation of TNT. Try to translate the first four of the following N-sentences
into TNT-sentences, and the last one into an open well-formed formula.</p>
<p class="centered">All natural numbers are equal to 4. <br />
There is no natural number which equals its own square. <br />
Different natural numbers have different successors. <br />
If 1 equals 0, then every number is odd. <br />
<script type="math/tex">b</script> is a power of 2.</p>
<p>The last one you may find a little tricky.
But is it nothing, compared to this one:</p>
<p class="centered"><script type="math/tex">b</script> is a power of 10.</p>
<p>Strangely, this one takes great cleverness to render in our notation.
I would caution you to try it only if you are willing to spend hours
and hours on it—and if you know quite a bit of number theory!</p>
</blockquote>
<p>Let’s just jump right into it.
I know you’re excited to see me bend over backwards to translate that last one, but I don’t like to just skip to the end of a story,
however trite the crescendo.</p>
<hr />
<p>The first few exercises are pretty much freebies, so I won’t say too much about them.
Note that, as with any translation, we sometimes have to massage our goal a bit to be able to translate accurately.</p>
<blockquote>
<p>All natural numbers are equal to 4.</p>
</blockquote>
<p><script type="math/tex">\def\0{\mathsf 0}\def\s{\mathsf S}\def\c{\mathpunct:}\def\eq{\mathbin=}\forall n\c n \eq \s\s\s\s\0</script>.</p>
<blockquote>
<p>There is no natural number which equals its own square.</p>
</blockquote>
<p><script type="math/tex">\tnot\exists n\c n \eq (n \cdot n)</script>.
Note that by convention, all binary arithmetical operations must be delimited with parentheses <script type="math/tex">({-})</script>.
This was probably done to avoid worrying about disambiguation, especially later on when proving the incompleteness theorem.</p>
<blockquote>
<p>Different natural numbers have different successors.</p>
</blockquote>
<p><script type="math/tex">% <![CDATA[
\def\<{\langle}\def\>{\rangle}\forall m\c \forall n\c \< \tnot m \eq n \impl \tnot \s m \eq \s n \> %]]></script>.
Just as the arithmetical operations got parens, the binary logical connectives must be delimited with angle brackets <script type="math/tex">% <![CDATA[
\<{-}\> %]]></script>.
Don’t cross the streams and all that.</p>
<blockquote>
<p>If 1 equals 0, then every number is odd.</p>
</blockquote>
<p><script type="math/tex">% <![CDATA[
\< \s\0 \eq \0 \impl \forall n\c \exists a\c n \eq \s(a+a) \> %]]></script>.
I interpret this exercise as a gentle reminder to keep your quantifiers as tight as possible.</p>
<blockquote>
<p><script type="math/tex">b</script> is a power of 2.</p>
</blockquote>
<p>Okay, finally something juicy. Here we must create a formula with one free variable, <script type="math/tex">b</script>, which is true iff <script type="math/tex">b</script> is a power of two.
We have no way to directly transcribe exponentiation, as our only operations are <script type="math/tex">+</script> and <script type="math/tex">\cdot</script>.
So how can we state that property sufficiently simply to implement that in first order logic?</p>
<p>By the Fundamental Theorem of Arithmetic, every nonzero number which is not a power of two has an odd factor besides <script type="math/tex">\mathsf{S0}</script>.
Every odd number is a factor of <script type="math/tex">\0</script>, so that case is handled as well.
Conversely, every factor of a power of two is either equal to <script type="math/tex">\s\0</script> or is even. So we’ll use that reformulation.</p>
<script type="math/tex; mode=display">% <![CDATA[
\def\u#1#2{ {\underbrace{#1}_{#2}}}\forall a\c \< \u{\exists c\c b \eq (\mathsf{SS} a \cdot c)}{\mathsf{SS} a\ \text{is a factor}} \impl \u{\exists c\c \mathsf{SS} a \eq (c+c)}{\mathsf{SS} a\ \text{is even}} \> %]]></script>
<p>Recall that if we want to talk about a number <script type="math/tex">a' \ge 2</script>, then we have to write it as <script type="math/tex">a' = \mathsf{SS} a</script> for <script type="math/tex">a \ge 0</script>.
Also recall that there’s nothing wrong with using <script type="math/tex">c</script> in both the antecedent and the consequent, because of the tightness of my quantifier bindings.</p>
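<p>As a sanity check, the content of this formula is easy to test mechanically. In this bit of throwaway Python (mine, not Hofstadter’s), the case <script type="math/tex">b = 0</script> is handled separately, since TNT’s quantifier over factors is unbounded while a Python loop is not:</p>

```python
def is_tnt_power_of_two(b):
    """b is a power of two iff every factor of b that is at least 2 is even."""
    if b == 0:
        return False  # every odd number >= 3 divides 0, so 0 fails the formula
    return all(a % 2 == 0 for a in range(2, b + 1) if b % a == 0)

print([b for b in range(1, 40) if is_tnt_power_of_two(b)])  # [1, 2, 4, 8, 16, 32]
```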
<blockquote>
<p><script type="math/tex">b</script> is a power of 10.</p>
</blockquote>
<p>Now we’re going to have to be even more clever than last time.
The previous solution worked because 2 is prime, and we were able to get away with not caring which power of two <script type="math/tex">b</script> was.
But 10 isn’t a prime, and even if we want to say that <script type="math/tex">10^n = 2^n \cdot 5^n</script>, we’ll somehow need to know what that exponent <script type="math/tex">n</script> is.</p>
<p>Well, here’s a slick math trick. <em>Let’s just say we know what <script type="math/tex">n</script> is.</em>
Since we don’t really know how to read off <script type="math/tex">n</script> easily from <script type="math/tex">b = 10^n</script>, we’ll just sweep that problem under the rug with an <script type="math/tex">\exists n</script>.
So as long as we can calculate <script type="math/tex">10^n</script> in TNT, we can stipulate that <script type="math/tex">b = 10^n</script> and be done with it.</p>
<p>How do we do that? Well, let’s just compute <script type="math/tex">10^n</script> the way we usually do.
Namely, start with <script type="math/tex">\mathsf{S0}</script>, then multiply it by <script type="math/tex">\mathsf{SSSSSSSSSS0}</script> <script type="math/tex">n</script> times.
This gives a sequence of numbers, <script type="math/tex">x = (1,10,100,...,10^n)</script>.
What makes this sequence nice is that it has some defining characteristics which are very easily beheld, even by our good friend <a href="https://en.wikipedia.org/wiki/Giuseppe_Peano">Giuseppe</a>:
to wit, <script type="math/tex">x_0 = 1</script>, and <script type="math/tex">x_{i+1} = 10 \cdot x_i</script> for all <script type="math/tex">% <![CDATA[
i < n %]]></script>.</p>
<p>We can view this sequence <script type="math/tex">x</script> as a certificate that <script type="math/tex">b</script> equals <script type="math/tex">10^n</script>,
by asking that <script type="math/tex">x_n = b</script>, and it looks like TNT is in principle smart enough to be able to verify this, since all we really need is multiplication by ten.
Clearly, a certificate for <script type="math/tex">b</script> exists iff <script type="math/tex">b</script> is a power of ten,
so our issue now is to write this certificate in a language that TNT can understand.
It would suffice to encode <script type="math/tex">x</script> as some fixed-length tuple of numbers, because then we can just existentially quantify those suckers and go home.</p>
<p>Our saviour comes in the form of the <a href="https://en.wikipedia.org/wiki/Chinese_remainder_theorem">Chinese Remainder Theorem</a>.
If we have a pairwise-coprime set of moduli <script type="math/tex">m_0, ..., m_n</script> that are sufficiently large, then we can solve the system of equations</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
a &\equiv x_0 \pmod{m_0} \\
a &\equiv x_1 \pmod{m_1} \\
&\;\mathrel{\vdots} \\
a &\equiv x_n \pmod{m_n} \\
\end{align*} %]]></script>
<p>for <script type="math/tex">a</script>. That guarantees the existence of an encoding.
Turning that on its head, given <script type="math/tex">a</script> and <script type="math/tex">m_i</script>, we can access <script type="math/tex">x_i</script> as the remainder of dividing <script type="math/tex">a</script> by <script type="math/tex">m_i</script>.
For convenience, let’s define a relation <script type="math/tex">R</script> for saying that <script type="math/tex">r</script> is the remainder when dividing <script type="math/tex">a</script> by <script type="math/tex">m</script>:</p>
<script type="math/tex; mode=display">% <![CDATA[
R[a,m,r] := \< \u{\exists r'\c m \eq (r + \s r')}{r < m\vphantom\mid} \wedge \u{\exists q\c a \eq ((m \cdot q) + r)}{m \mid a - r} \> %]]></script>
<p>So now we just need a convenient collection of pairwise coprime <script type="math/tex">m_i</script> to use.
I’ll go with <script type="math/tex">m_i = 1 + k(i+1)</script>, where <script type="math/tex">k</script> is a sufficiently large and highly divisible number, so that (1) <script type="math/tex">m_i > x_i</script>, and (2) the <script type="math/tex">m_i</script> are pairwise coprime.
We can guarantee this works, because e.g. <script type="math/tex">k = \max(x_0, x_1, ..., x_n, n)!</script> has precisely the properties we want.
So our certificate will provide <script type="math/tex">k</script> and <script type="math/tex">a</script>.
We also need to know <script type="math/tex">n</script> to know when to stop and compare to <script type="math/tex">b</script>.</p>
<p>We now have our certificate scheme <script type="math/tex">(n,a,k)</script>, and can find one for any valid <script type="math/tex">b</script>. They can get pretty large, though, especially using our overly generous estimate for <script type="math/tex">k</script>.
For example, <script type="math/tex">(1,43,5)</script> works to prove that <script type="math/tex">10 = 10^1</script>, since <script type="math/tex">43 \equiv 1 \pmod{1\cdot5+1}</script> and <script type="math/tex">43 \equiv 10 \pmod{2\cdot5+1}</script>.
However, in our safe estimate, we took <script type="math/tex">k = 10! = 3628800</script> so that the least result CRT can give us is <script type="math/tex">a = 20(k+1) - (2k+1) = 65318419</script>.
<script type="math/tex">(2,75496,34)</script> will certify that <script type="math/tex">100 = 10^2</script>, but our safe estimate gives <script type="math/tex">k = 100! \approx 9.33 \times 10^{157}</script> and some value of <script type="math/tex">% <![CDATA[
a < (k+1)(2k+1)(3k+1) %]]></script>.</p>
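<p>Since the arithmetic here is fiddly, it’s worth checking certificates mechanically. The following throwaway Python (my own, function names included) verifies the <script type="math/tex">(n,a,k)</script> scheme and can hunt for the least valid <script type="math/tex">a</script> by brute force:</p>

```python
def moduli(n, k):
    """The moduli m_i = 1 + k*(i+1) for i = 0, ..., n."""
    return [1 + k * (i + 1) for i in range(n + 1)]

def check_certificate(n, a, k, base=10):
    """Does (n, a, k) encode the sequence (1, base, ..., base**n)?

    Recover x_i = a mod m_i, then check x_0 = 1 and x_{i+1} = base * x_i;
    together these force x_n = base**n."""
    x = [a % m for m in moduli(n, k)]
    return x[0] == 1 and all(x[i + 1] == base * x[i] for i in range(n))

def least_a(n, k, base=10):
    """Smallest a whose remainders encode (1, base, ..., base**n), if any."""
    ms, xs = moduli(n, k), [base ** i for i in range(n + 1)]
    if any(x >= m for x, m in zip(xs, ms)):
        return None  # k too small: some x_i can't appear as a remainder mod m_i
    bound = 1
    for m in ms:
        bound *= m
    for a in range(bound):
        if all(a % m == x for m, x in zip(ms, xs)):
            return a
    return None

print(check_certificate(1, 43, 5))  # True: 43 is 1 mod 6 and 10 mod 11
print(least_a(1, 5))                # 43
print(least_a(2, 34))               # 75496
```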
<p>In any case, we have enough to solve the problem, so let’s assemble the parts.
Our solution will be <script type="math/tex">\exists n\c P[\mathsf{SSSSSSSSSS0},n,b]</script>, where</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
P[m,n,b] &:= \exists a\c \exists k\c \< \< \u{R[a,\s k,\s\0]}{x_0=1} \wedge \u{R[a,\s(k \cdot \s n),b]}{x_n=b} \> \wedge \u{I[m,n,a,k]}{\text{inductive step}} \>, \\
I[m,n,a,k] &:= \forall i\c \< \u{\exists j\c n \eq (i + \s j)}{i < n} \impl \exists y\c \< \u{R[a,\s(k \cdot \s i),y]}{x_i = y} \wedge \u{R[a,\s(k \cdot \s\s i),(m \cdot y)]}{x_{i+1} = my} \> \>.
\end{align*} %]]></script>
<p>Thankfully, that’s it, and we’re done.
Those poor souls who wish to prove that this method of exponentiation is well-defined <em>in Peano Arithmetic</em> have it way, way worse,
because they probably have to fiddle with Cantor coding and use induction to prove that sequences can be extended arbitrarily.
It’s a real stinker, and mathematicians are fine with this because the mere existence of a proof is good enough to justify the Laconic axiom set.</p>
<p>To close out, I’ve gone and done the obvious thing, which was to inline all the definitions to give you one long string of pure TNT.
I’ve thrown in a couple of optimizations to shave off what I could—including obviating an <script type="math/tex">\exists s\c \s k \eq (\s\0 + \s s)</script> by using <script type="math/tex">\s k</script> everywhere we’d otherwise use <script type="math/tex">k</script>—but we’re still at 201 symbols.
So brace yourselves…</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
\exists n\c{}&\exists a\c \exists k\c \< \< \exists q\c a \eq \s(\s\s k \cdot q) \wedge \< \exists s\c (\s k \cdot \s n) \eq (b + s) \wedge \exists q\c a \eq ((\s(\s k \cdot \s n) \cdot q) + b) \> \> \\
&{} \wedge \forall i\c \< \exists j\c n \eq (i + \s j) \impl \exists y\c \< \< \exists s\c (\s k \cdot \s i) \eq (y + s) \wedge \exists q\c a \eq ((\s(\s k \cdot \s i) \cdot q) + y) \> \\
&\qquad{} \wedge \< \exists s\c (\s k \cdot \s\s i) \eq ((\mathsf{SSSSSSSSSS0} \cdot y) + s) \\
&\qquad{} \wedge \exists q\c a \eq ((\s(\s k \cdot \s\s i) \cdot q) + (\mathsf{SSSSSSSSSS0} \cdot y)) \> \> \> \>
\end{align*} %]]></script>
<!-- En:Ea:Ek:<<Eq:a=S(SSk*q)^<Es:(Sk*Sn)=(b+s)^Eq:a=((S(Sk*Sn)*q)+b)>>^Ai:<Ej:n=(i+Sj)/Ey:<<Es:(Sk*Si)=(y+s)^Eq:a=((S(Sk*Si)*q)+y)>^<Es:(Sk*SSi)=((SSSSSSSSSS0*y)+s)^Eq:a=((S(Sk*SSi)*q)+(SSSSSSSSSS0*y))>>>> -->
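<p>For would-be golfers, symbol counts can be checked mechanically. A quick sketch, assuming the one-character-per-symbol plain-text transcription used in the comment above (each variable, parenthesis, connective, and <code class="highlighter-rouge">S</code> counts as one symbol):</p>

```python
# Plain-text transcription of the formula above, split only for readability.
tnt = ("En:Ea:Ek:<<Eq:a=S(SSk*q)^<Es:(Sk*Sn)=(b+s)^Eq:a=((S(Sk*Sn)*q)+b)>>"
       "^Ai:<Ej:n=(i+Sj)/Ey:<<Es:(Sk*Si)=(y+s)^Eq:a=((S(Sk*Si)*q)+y)>"
       "^<Es:(Sk*SSi)=((SSSSSSSSSS0*y)+s)"
       "^Eq:a=((S(Sk*SSi)*q)+(SSSSSSSSSS0*y))>>>>")
assert len(tnt) == 201  # one character per TNT symbol in this transcription
```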
<blockquote>
<p><strong>Exercise.</strong> (For all the code golfers out there.)
Find a formula with fewer than 200 symbols. Can you beat 193?</p>
</blockquote>Ilia ChtcherbakovOne of my favourite books is Gödel, Escher, Bach, by Douglas Hofstadter.
If you’re patient enough to read it all, I highly recommend it: it’s a great book about mathematics and cognitive science.
One of the main goals of the book is to motivate, and sketch a proof of, Gödel’s first incompleteness theorem.
At one point, he provides some exercises in transcribing number theoretical statements in a specific implementation of Peano arithmetic he calls TNT.A Diophantine Contest Problem2017-02-19T18:40:10-05:002017-02-19T18:40:10-05:00http://cleare.st/math/diophantine-contest-problem<p>This is a shorter post, about a silly little problem I came up with a few months ago.
It’s not a very intelligent problem, in that it doesn’t really serve any mathematical purpose; nor is it altogether tough;
but I think it has a certain aesthetic pleasantness to it.
<!--more--></p>
<blockquote>
<p><strong>Problem.</strong>
Let <script type="math/tex">\def\s#1{(\mathrm S_{#1})}\s{15}</script> be the following system of 15 linear Diophantine equations in 15 variables.</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
1 &= O + N + E \\
2 &= T + W + O \\
3 &= T + H + R + E + E \\
&\;\mathrel{\smash{\vdots}} \\
14 &= F + O + U + R + T + E + E + N \\
15 &= F + I + F + T + E + E + N
\end{align*} %]]></script>
<p>Does this system have an integer solution?</p>
<p>More generally, if we define <script type="math/tex">\s n</script> analogously for <script type="math/tex">n \ge 1</script>, what is the greatest <script type="math/tex">n</script> for which there exists an integer solution?</p>
</blockquote>
<p>If you want to try it for yourself, don’t read the next bit because I’ll also say some words about how to solve it.
I think it’s kinda fun to discover the curious properties of <script type="math/tex">\s n</script>,
so easy as it is to just keep reading, consider giving it a shot yourself.</p>
<hr />
<p>So how would you approach such a problem?
Well, writing it out in full would help.
Then if you look closely, you’ll notice some interesting relations.
Here’s a useful one:</p>
<script type="math/tex; mode=display">T + H + I + R + T + E + E + N = 13 = 3 + 10 = T + H + R + E + E + T + E + N.</script>
<p>This implies that <script type="math/tex">I = E</script>.
Thus, <script type="math/tex">9 = N + I + N + E = 2(N + I)</script> for any integral solution, which is impossible.
So <script type="math/tex">\s{15}</script> has no integral solutions.</p>
<p>In fact, this helps us for the second part, because we have shown that even <script type="math/tex">\s{13}</script> has no integral solutions.
Let’s see how far down we can go.
First, note that <script type="math/tex">12 = T + W + E + L + V + E</script> is redundant, because of the following anagrammatical fact:</p>
<script type="math/tex; mode=display">T + W + E + L + V + E = (E + L + E + V + E + N) - (O + N + E) + (T + W + O)</script>
<p>So we can do <script type="math/tex">\s{12}</script> iff we can do <script type="math/tex">\s{11}</script>.</p>
<p>Well, we can get a bit sneakier with our reductions.
<script type="math/tex">11 = E + L + E + V + E + N</script> is the only equation to use the variable <script type="math/tex">L</script>,
so if we can solve the rest, then we can take <script type="math/tex">L = 11 - 3E - V - N</script>.
Following this reduction strategy, we can do it again on <script type="math/tex">(2,W)</script>, <script type="math/tex">(4,U)</script>, <script type="math/tex">(6,X)</script>, and <script type="math/tex">(8,G)</script> to leave the equations <script type="math/tex">\{1,3,5,7,9,10\}</script>.</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
1 &= O + N + E \\
3 &= T + H + R + E + E \\
5 &= F + I + V + E \\
7 &= S + E + V + E + N \\
9 &= N + I + N + E \\
10 &= T + E + N
\end{align*} %]]></script>
<p>But now we can reduce again, this time using variables whose other occurrences were all in already-eliminated equations!
Since we have eliminated equations <script type="math/tex">\{2,4,6\}</script>, we can eliminate <script type="math/tex">(1,O)</script>, <script type="math/tex">(3,R)</script>, <script type="math/tex">(5,F)</script>, and <script type="math/tex">(7,S)</script>,
which pares the problem down to the two equations <script type="math/tex">9 = N + I + N + E</script> and <script type="math/tex">10 = T + E + N</script>.</p>
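<p>For the skeptical, here is one explicit solution of <script type="math/tex">\s{12}</script> (the particular values come from my own back-substitution; many others work), checked mechanically:</p>

```python
# One integer solution of the first twelve equations (my back-substitution).
val = {'O': 0, 'N': 0, 'E': 1, 'T': 9, 'W': -7, 'H': 0, 'R': -8,
       'F': -4, 'U': 16, 'I': 8, 'V': 0, 'S': 5, 'X': -7, 'G': -10, 'L': 8}
words = ['ONE', 'TWO', 'THREE', 'FOUR', 'FIVE', 'SIX',
         'SEVEN', 'EIGHT', 'NINE', 'TEN', 'ELEVEN', 'TWELVE']
for n, w in enumerate(words, start=1):
    assert sum(val[c] for c in w) == n, (n, w)  # each word sums to its value
```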
<p>That’s easy to solve, so we conclude that the answer to the problem is <script type="math/tex">n = 12</script>.</p>Ilia ChtcherbakovThis is a shorter post, about a silly little problem I came up with a few months ago.
It’s not a very intelligent problem, in that it doesn’t really serve any mathematical purpose; nor is it altogether tough;
but I think it has a certain aesthetic pleasantness to it.The call/cc Yin-Yang Puzzle2017-02-12T23:56:00-05:002017-02-12T23:56:00-05:00http://cleare.st/code/call-cc-yin-yang-puzzle<p>The <em>call/cc yin-yang puzzle</em> is an ancient piece of Scheme code,
which was written—or more accurately <em>discovered</em>—by <a href="http://www.madore.org/~david/">David Madore</a>
in the year 1999 upon his invention of the esoteric programming language <a href="http://www.madore.org/~david/programs/unlambda">Unlambda</a>.
It is a rite of passage for aspiring Schemers to grok these five lines, if they claim true mastery over the power of the continuation.
<!--more-->
This is my attempt at understanding its mysterious action, from the only-slightly-less-ancient era of 2012.</p>
<hr />
<p>Here is the code in its original glory.</p>
<div class="language-scheme highlighter-rouge"><pre class="highlight"><code><span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nf">yin</span> <span class="p">((</span><span class="k">lambda</span> <span class="p">(</span><span class="nf">foo</span><span class="p">)</span> <span class="p">(</span><span class="nb">newline</span><span class="p">)</span> <span class="nv">foo</span><span class="p">)</span>
<span class="p">(</span><span class="nb">call/cc</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nf">bar</span><span class="p">)</span> <span class="nv">bar</span><span class="p">))))</span>
<span class="p">(</span><span class="nf">yang</span> <span class="p">((</span><span class="k">lambda</span> <span class="p">(</span><span class="nf">foo</span><span class="p">)</span> <span class="p">(</span><span class="nb">write-char</span> <span class="o">#</span><span class="err">\</span><span class="nv">*</span><span class="p">)</span> <span class="nv">foo</span><span class="p">)</span>
<span class="p">(</span><span class="nb">call/cc</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nf">bar</span><span class="p">)</span> <span class="nv">bar</span><span class="p">)))))</span>
<span class="p">(</span><span class="nf">yin</span> <span class="nv">yang</span><span class="p">))</span>
</code></pre>
</div>
<p>Here is a prefix of its output.</p>
<div class="highlighter-rouge"><pre class="highlight"><code>
*
**
***
****
*****
</code></pre>
</div>
<p>It prints increasingly long lines of asterisks; first one, then two, and so on.
The challenge of the puzzle is to explain why.</p>
<p>To begin my analysis, let me rewrite the problem a bit, to make it clearer what is going on:</p>
<div class="language-scheme highlighter-rouge"><pre class="highlight"><code><span class="p">(</span><span class="k">define</span> <span class="nv">I</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nf">bar</span><span class="p">)</span> <span class="nv">bar</span><span class="p">))</span> <span class="c1">; Identity</span>
<span class="p">(</span><span class="k">define</span> <span class="nv">N</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nf">foo</span><span class="p">)</span> <span class="p">(</span><span class="nb">newline</span><span class="p">)</span> <span class="nv">foo</span><span class="p">))</span> <span class="c1">; Newline</span>
<span class="p">(</span><span class="k">define</span> <span class="nv">A</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nf">foo</span><span class="p">)</span> <span class="p">(</span><span class="nb">write-char</span> <span class="o">#</span><span class="err">\</span><span class="nv">*</span><span class="p">)</span> <span class="nv">foo</span><span class="p">))</span> <span class="c1">; Asterisk</span>
<span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nf">yin</span> <span class="p">(</span><span class="nf">N</span> <span class="p">(</span><span class="nb">call/cc</span> <span class="nv">I</span><span class="p">)))</span>
<span class="p">(</span><span class="nf">yang</span> <span class="p">(</span><span class="nf">A</span> <span class="p">(</span><span class="nb">call/cc</span> <span class="nv">I</span><span class="p">))))</span>
<span class="p">(</span><span class="nf">yin</span> <span class="nv">yang</span><span class="p">))</span>
</code></pre>
</div>
<p>Both bindings have the pattern of generating a continuation, printing something, and then binding the continuation.
I’ll do my best to explain what a continuation is in a second, but first off,
it behooves us to understand which flavour of <code class="highlighter-rouge">let</code> we are using.</p>
<p><code class="highlighter-rouge">(let* ((a foo) (b bar)) blah)</code> performs its assignments in a strictly-enforced order.
First, <code class="highlighter-rouge">foo</code> is evaluated, then bound to <code class="highlighter-rouge">a</code>, and only then <code class="highlighter-rouge">bar</code> is evaluated and bound to <code class="highlighter-rouge">b</code>.
This order of operations is crucial for the continuations to behave correctly.
In particular, <code class="highlighter-rouge">a</code> is bound and visible to <code class="highlighter-rouge">(b bar)</code> when <code class="highlighter-rouge">bar</code> is being computed. Keep this in mind.</p>
<hr />
<p>I guess I should take a moment now to explain precisely what a continuation is, and how <code class="highlighter-rouge">call/cc</code> gives us one,
for the sake of the readers that aren’t familiar.</p>
<p>A <em>continuation</em>, according to <a href="http://en.wikipedia.org/wiki/Continuation">Wikipedia</a>, is “an abstract representation of the control state of a computer program”.
For all intents and purposes, it is a snapshot of a partially executed program,
packaged up into a function, with a hole at the current position.
Invoking the function with an argument will plug that argument into the hole in the program, and continue execution from there.</p>
<p>If I may be permitted an example, consider the following code.</p>
<div class="language-scheme highlighter-rouge"><pre class="highlight"><code><span class="p">(</span><span class="k">define</span> <span class="p">(</span><span class="nf">pythag</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)</span>
<span class="p">(</span><span class="nb">sqrt</span> <span class="p">(</span><span class="nb">+</span> <span class="p">(</span><span class="nb">*</span> <span class="nv">a</span> <span class="nv">a</span><span class="p">)</span>
<span class="p">(</span><span class="nb">*</span> <span class="nv">b</span> <span class="nv">b</span><span class="p">))))</span>
<span class="p">(</span><span class="nf">pythag</span> <span class="mi">3</span> <span class="mi">4</span><span class="p">)</span>
</code></pre>
</div>
<p>Under some reasonable assumptions about the order of evaluation of expressions,<sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup>
the continuation of this program at the second <code class="highlighter-rouge">*</code> operation would be the function <code class="highlighter-rouge">(lambda (arg) (sqrt (+ 9 arg)))</code>.
At that point, the program has just calculated that <code class="highlighter-rouge">(* 4 4)</code> is equal to <code class="highlighter-rouge">16</code>, and is poised to invoke the continuation with that result.</p>
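<p>For readers who think in Python rather than Scheme, that snapshot can be sketched as an ordinary function (my rendering, not part of the original example):</p>

```python
import math

# (pythag 3 4), frozen just after computing (* 3 3) = 9: the continuation
# is the rest of the program, with a hole where (* 4 4) will go.
cont = lambda arg: math.sqrt(9 + arg)

# The program computes 16 and resumes by plugging it into the hole:
assert cont(16) == 5.0
```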
<p><code class="highlighter-rouge">call/cc</code> is short for <a href="http://en.wikipedia.org/wiki/Call-with-current-continuation"><code class="highlighter-rouge">call-with-current-continuation</code></a>, and what it does is it takes the current continuation of your program—at the <code class="highlighter-rouge">call/cc</code>—wraps it up in a lambda, and just <strong>gives</strong> it to the program for use.
More precisely, <code class="highlighter-rouge">(call/cc f)</code> evaluates to <code class="highlighter-rouge">(f Cont)</code> where <code class="highlighter-rouge">Cont</code> is the continuation of <code class="highlighter-rouge">(call/cc f)</code>, i.e. the <em>current</em> one when <code class="highlighter-rouge">call/cc</code> is called.</p>
<p>While it is nothing if not aptly named, its behaviour can be a bit confusing and opaque, and its implications might be unclear at first.
Your first instinct might be to use <code class="highlighter-rouge">call/cc</code> as an early-abort mechanism, a sort of hackish try-catch block where your <code class="highlighter-rouge">f</code> can invoke the continuation to error out safely if it so desires.
But the true power is in calling something like <code class="highlighter-rouge">(call/cc I)</code>, using our identity function <code class="highlighter-rouge">I</code> to <em>let the continuation escape</em>.
Then we can put it in our pocket, do some computation, and subsequently invoke the continuation with the result, effectively <em>going back in time</em> with information from the future.</p>
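<p>The escape-hatch half of that story can be mimicked even in languages without first-class continuations. Here is a sketch of my own (the helper <code class="highlighter-rouge">call_with_escape</code> is invented for illustration) capturing only the “abort with a value” use of <code class="highlighter-rouge">call/cc</code>, not the time-travelling one:</p>

```python
class _Escape(Exception):
    """Carries a value out of the computation; never seen by callers."""
    def __init__(self, value):
        self.value = value

def call_with_escape(f):
    """Call f with a one-shot 'continuation' that aborts f with a value."""
    def escape(value):
        raise _Escape(value)
    try:
        return f(escape)
    except _Escape as e:
        return e.value

def product(xs):
    """Product of xs, bailing out the moment a zero is seen."""
    def go(escape):
        p = 1
        for x in xs:
            if x == 0:
                escape(0)  # jump straight out, skipping the rest
            p *= x
        return p
    return call_with_escape(go)
```

<p>Unlike real <code class="highlighter-rouge">call/cc</code>, this “continuation” can only be invoked while <code class="highlighter-rouge">f</code> is still on the stack; it cannot escape and be re-entered later.</p>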
<p>Some operations, like outputs and <code class="highlighter-rouge">set!</code>s, exist outside of the “program-timeline”,
so there is still a discernible narrative, from the programmer’s point of view.
However, this control of the flow of time and hence of the flow of the program
lets you implement arbitrarily interesting control structures, like <code class="highlighter-rouge">break</code>-able loops, Pythonic generators, full coroutines, and so on, in a uniform way.
I’ll discuss the implementation of this feature a bit more towards the end of the post, but conceptually we now understand how continuations work, at least at a low level.</p>
<hr />
<p>So let’s return to the puzzle.
I will proceed to evaluate<sup id="fnref:2"><a href="#fn:2" class="footnote">2</a></sup> it by hand one step at a time, displaying each line. Then we can refer back to it as I explain.
As a convention, I’ll name the continuations that are created <code class="highlighter-rouge">C0</code>, <code class="highlighter-rouge">C1</code>, <code class="highlighter-rouge">C2</code>, and so on in turn.
Whenever <code class="highlighter-rouge">N</code> or <code class="highlighter-rouge">A</code> is called, I will record that with a comment at right.</p>
<div class="language-scheme highlighter-rouge"><pre class="highlight"><code><span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nf">yin</span> <span class="p">(</span><span class="nf">N</span> <span class="p">(</span><span class="nb">call/cc</span> <span class="nv">I</span><span class="p">)))</span> <span class="p">(</span><span class="nf">yang</span> <span class="p">(</span><span class="nf">A</span> <span class="p">(</span><span class="nb">call/cc</span> <span class="nv">I</span><span class="p">))))</span> <span class="p">(</span><span class="nf">yin</span> <span class="nv">yang</span><span class="p">))</span>
<span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nf">yin</span> <span class="p">(</span><span class="nf">N</span> <span class="p">(</span><span class="nf">I</span> <span class="nv">C0</span><span class="p">)))</span> <span class="p">(</span><span class="nf">yang</span> <span class="p">(</span><span class="nf">A</span> <span class="p">(</span><span class="nb">call/cc</span> <span class="nv">I</span><span class="p">))))</span> <span class="p">(</span><span class="nf">yin</span> <span class="nv">yang</span><span class="p">))</span>
<span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nf">yin</span> <span class="p">(</span><span class="nf">N</span> <span class="nv">C0</span><span class="p">))</span> <span class="p">(</span><span class="nf">yang</span> <span class="p">(</span><span class="nf">A</span> <span class="p">(</span><span class="nb">call/cc</span> <span class="nv">I</span><span class="p">))))</span> <span class="p">(</span><span class="nf">yin</span> <span class="nv">yang</span><span class="p">))</span>
<span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nf">yin</span> <span class="nv">C0</span><span class="p">)</span> <span class="p">(</span><span class="nf">yang</span> <span class="p">(</span><span class="nf">A</span> <span class="p">(</span><span class="nb">call/cc</span> <span class="nv">I</span><span class="p">))))</span> <span class="p">(</span><span class="nf">yin</span> <span class="nv">yang</span><span class="p">))</span> <span class="c1">; N</span>
<span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nf">yin</span> <span class="nv">C0</span><span class="p">)</span> <span class="p">(</span><span class="nf">yang</span> <span class="p">(</span><span class="nf">A</span> <span class="p">(</span><span class="nf">I</span> <span class="nv">C1</span><span class="p">))))</span> <span class="p">(</span><span class="nf">yin</span> <span class="nv">yang</span><span class="p">))</span> <span class="c1">; N</span>
<span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nf">yin</span> <span class="nv">C0</span><span class="p">)</span> <span class="p">(</span><span class="nf">yang</span> <span class="p">(</span><span class="nf">A</span> <span class="nv">C1</span><span class="p">)))</span> <span class="p">(</span><span class="nf">yin</span> <span class="nv">yang</span><span class="p">))</span> <span class="c1">; N</span>
<span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nf">yin</span> <span class="nv">C0</span><span class="p">)</span> <span class="p">(</span><span class="nf">yang</span> <span class="nv">C1</span><span class="p">))</span> <span class="p">(</span><span class="nf">yin</span> <span class="nv">yang</span><span class="p">))</span> <span class="c1">; NA</span>
<span class="p">(</span><span class="nf">C0</span> <span class="nv">C1</span><span class="p">)</span> <span class="c1">; NA</span>
<span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nf">yin</span> <span class="p">(</span><span class="nf">N</span> <span class="nv">C1</span><span class="p">))</span> <span class="p">(</span><span class="nf">yang</span> <span class="p">(</span><span class="nf">A</span> <span class="p">(</span><span class="nb">call/cc</span> <span class="nv">I</span><span class="p">))))</span> <span class="p">(</span><span class="nf">yin</span> <span class="nv">yang</span><span class="p">))</span> <span class="c1">; NA</span>
<span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nf">yin</span> <span class="nv">C1</span><span class="p">)</span> <span class="p">(</span><span class="nf">yang</span> <span class="p">(</span><span class="nf">A</span> <span class="p">(</span><span class="nb">call/cc</span> <span class="nv">I</span><span class="p">))))</span> <span class="p">(</span><span class="nf">yin</span> <span class="nv">yang</span><span class="p">))</span> <span class="c1">; NAN</span>
<span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nf">yin</span> <span class="nv">C1</span><span class="p">)</span> <span class="p">(</span><span class="nf">yang</span> <span class="p">(</span><span class="nf">A</span> <span class="nv">C2</span><span class="p">)))</span> <span class="p">(</span><span class="nf">yin</span> <span class="nv">yang</span><span class="p">))</span> <span class="c1">; NAN</span>
<span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nf">yin</span> <span class="nv">C1</span><span class="p">)</span> <span class="p">(</span><span class="nf">yang</span> <span class="nv">C2</span><span class="p">))</span> <span class="p">(</span><span class="nf">yin</span> <span class="nv">yang</span><span class="p">))</span> <span class="c1">; NANA</span>
<span class="p">(</span><span class="nf">C1</span> <span class="nv">C2</span><span class="p">)</span> <span class="c1">; NANA</span>
<span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nf">yin</span> <span class="nv">C0</span><span class="p">)</span> <span class="p">(</span><span class="nf">yang</span> <span class="p">(</span><span class="nf">A</span> <span class="nv">C2</span><span class="p">)))</span> <span class="p">(</span><span class="nf">yin</span> <span class="nv">yang</span><span class="p">))</span> <span class="c1">; NANA</span>
<span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nf">yin</span> <span class="nv">C0</span><span class="p">)</span> <span class="p">(</span><span class="nf">yang</span> <span class="nv">C2</span><span class="p">))</span> <span class="p">(</span><span class="nf">yin</span> <span class="nv">yang</span><span class="p">))</span> <span class="c1">; NANAA</span>
<span class="p">(</span><span class="nf">C0</span> <span class="nv">C2</span><span class="p">)</span> <span class="c1">; NANAA</span>
<span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nf">yin</span> <span class="p">(</span><span class="nf">N</span> <span class="nv">C2</span><span class="p">))</span> <span class="p">(</span><span class="nf">yang</span> <span class="p">(</span><span class="nf">A</span> <span class="p">(</span><span class="nb">call/cc</span> <span class="nv">I</span><span class="p">))))</span> <span class="p">(</span><span class="nf">yin</span> <span class="nv">yang</span><span class="p">))</span> <span class="c1">; NANAA</span>
<span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nf">yin</span> <span class="nv">C2</span><span class="p">)</span> <span class="p">(</span><span class="nf">yang</span> <span class="p">(</span><span class="nf">A</span> <span class="p">(</span><span class="nb">call/cc</span> <span class="nv">I</span><span class="p">))))</span> <span class="p">(</span><span class="nf">yin</span> <span class="nv">yang</span><span class="p">))</span> <span class="c1">; NANAAN</span>
<span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nf">yin</span> <span class="nv">C2</span><span class="p">)</span> <span class="p">(</span><span class="nf">yang</span> <span class="p">(</span><span class="nf">A</span> <span class="nv">C3</span><span class="p">)))</span> <span class="p">(</span><span class="nf">yin</span> <span class="nv">yang</span><span class="p">))</span> <span class="c1">; NANAAN</span>
<span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nf">yin</span> <span class="nv">C2</span><span class="p">)</span> <span class="p">(</span><span class="nf">yang</span> <span class="nv">C3</span><span class="p">))</span> <span class="p">(</span><span class="nf">yin</span> <span class="nv">yang</span><span class="p">))</span> <span class="c1">; NANAANA</span>
<span class="p">(</span><span class="nf">C2</span> <span class="nv">C3</span><span class="p">)</span> <span class="c1">; NANAANA</span>
<span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nf">yin</span> <span class="nv">C1</span><span class="p">)</span> <span class="p">(</span><span class="nf">yang</span> <span class="p">(</span><span class="nf">A</span> <span class="nv">C3</span><span class="p">)))</span> <span class="p">(</span><span class="nf">yin</span> <span class="nv">yang</span><span class="p">))</span> <span class="c1">; NANAANA</span>
<span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nf">yin</span> <span class="nv">C1</span><span class="p">)</span> <span class="p">(</span><span class="nf">yang</span> <span class="nv">C3</span><span class="p">))</span> <span class="p">(</span><span class="nf">yin</span> <span class="nv">yang</span><span class="p">))</span> <span class="c1">; NANAANAA</span>
<span class="p">(</span><span class="nf">C1</span> <span class="nv">C3</span><span class="p">)</span> <span class="c1">; NANAANAA</span>
<span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nf">yin</span> <span class="nv">C0</span><span class="p">)</span> <span class="p">(</span><span class="nf">yang</span> <span class="p">(</span><span class="nf">A</span> <span class="nv">C3</span><span class="p">)))</span> <span class="p">(</span><span class="nf">yin</span> <span class="nv">yang</span><span class="p">))</span> <span class="c1">; NANAANAA</span>
<span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nf">yin</span> <span class="nv">C0</span><span class="p">)</span> <span class="p">(</span><span class="nf">yang</span> <span class="nv">C3</span><span class="p">))</span> <span class="p">(</span><span class="nf">yin</span> <span class="nv">yang</span><span class="p">))</span> <span class="c1">; NANAANAAA</span>
<span class="p">(</span><span class="nf">C0</span> <span class="nv">C3</span><span class="p">)</span> <span class="c1">; NANAANAAA</span>
<span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nf">yin</span> <span class="p">(</span><span class="nf">N</span> <span class="nv">C3</span><span class="p">))</span> <span class="p">(</span><span class="nf">yang</span> <span class="p">(</span><span class="nf">A</span> <span class="p">(</span><span class="nb">call/cc</span> <span class="nv">I</span><span class="p">))))</span> <span class="p">(</span><span class="nf">yin</span> <span class="nv">yang</span><span class="p">))</span> <span class="c1">; NANAANAAA</span>
<span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nf">yin</span> <span class="nv">C3</span><span class="p">)</span> <span class="p">(</span><span class="nf">yang</span> <span class="p">(</span><span class="nf">A</span> <span class="p">(</span><span class="nb">call/cc</span> <span class="nv">I</span><span class="p">))))</span> <span class="p">(</span><span class="nf">yin</span> <span class="nv">yang</span><span class="p">))</span> <span class="c1">; NANAANAAAN</span>
<span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nf">yin</span> <span class="nv">C3</span><span class="p">)</span> <span class="p">(</span><span class="nf">yang</span> <span class="p">(</span><span class="nf">A</span> <span class="nv">C4</span><span class="p">)))</span> <span class="p">(</span><span class="nf">yin</span> <span class="nv">yang</span><span class="p">))</span> <span class="c1">; NANAANAAAN</span>
<span class="o">...</span>
</code></pre>
</div>
<p>Because the time-travel metaphor is probably the most apt way to understand the operation of the puzzle,
thinking about the “past” and “future” of each continuation is crucial to following the behaviour.</p>
<p>So, to start, <code class="highlighter-rouge">yin</code> spins up a continuation <code class="highlighter-rouge">C0</code>, whose future is to be bound to <code class="highlighter-rouge">yin</code> and then evaluate <code class="highlighter-rouge">yang</code>.
<code class="highlighter-rouge">yang</code> creates a continuation of its own, <code class="highlighter-rouge">C1</code>, and <code class="highlighter-rouge">C1</code> lives in a timeline where <code class="highlighter-rouge">yin</code> is bound to <code class="highlighter-rouge">C0</code>.
We continue to the evaluation of <code class="highlighter-rouge">(yin yang) = (C0 C1)</code>.
This sends <code class="highlighter-rouge">C1</code> back in time to be bound to <code class="highlighter-rouge">yin</code>!</p>
<p>Now when <code class="highlighter-rouge">yang</code> generates a new continuation <code class="highlighter-rouge">C2</code>, it’s in a different timeline
with a different <code class="highlighter-rouge">yin</code> binding than its own past.
We bind <code class="highlighter-rouge">yang</code> to <code class="highlighter-rouge">C2</code> and evaluate <code class="highlighter-rouge">(yin yang) = (C1 C2)</code>.</p>
<p>At this point, <code class="highlighter-rouge">C2</code> goes back (or maybe sideways?) in time to the creation of <code class="highlighter-rouge">C1</code>, and binds <code class="highlighter-rouge">yang</code>.
In this timeline, <code class="highlighter-rouge">yin</code> is <code class="highlighter-rouge">C0</code>, so <code class="highlighter-rouge">(yin yang) = (C0 C2)</code> sends <code class="highlighter-rouge">C2</code> even further back in time <em>to be bound to <code class="highlighter-rouge">yin</code></em>.</p>
<p>Now when <code class="highlighter-rouge">yang</code> generates a new continuation <code class="highlighter-rouge">C3</code>, it’s in a different timeline
with a different <code class="highlighter-rouge">yin</code> binding than its own past.
We bind <code class="highlighter-rouge">yang</code> to <code class="highlighter-rouge">C3</code> and evaluate <code class="highlighter-rouge">(yin yang) = (C2 C3)</code>.
(Do you notice history repeating itself?)</p>
<p><code class="highlighter-rouge">C3</code> is sent back to when <code class="highlighter-rouge">yin</code> was <code class="highlighter-rouge">C1</code>, so we eval <code class="highlighter-rouge">(C1 C3)</code>. <br />
<code class="highlighter-rouge">C3</code> is sent back to when <code class="highlighter-rouge">yin</code> was <code class="highlighter-rouge">C0</code>, so we eval <code class="highlighter-rouge">(C0 C3)</code>. <br />
<code class="highlighter-rouge">C3</code> is sent back to the very beginning, so we spin up <code class="highlighter-rouge">C4</code> and eval <code class="highlighter-rouge">(C3 C4)</code>.</p>
<p>Continuing the pattern, we see that <code class="highlighter-rouge">C[n+1]</code> is created in a timeline where <code class="highlighter-rouge">C[n]</code> binds <code class="highlighter-rouge">yin</code>.
We proceed to evaluate <code class="highlighter-rouge">(C[n] C[n+1])</code>, which passes it down to <code class="highlighter-rouge">(C[n-1] C[n+1])</code>, and so on down to <code class="highlighter-rouge">(C0 C[n+1])</code>.
After that final step, <code class="highlighter-rouge">C[n+1]</code> is the new <code class="highlighter-rouge">yin</code> and we create the next continuation.
The continuations arrange themselves in a chain, and new continuations are passed down this chain until they reach <code class="highlighter-rouge">C0</code>,
at which point they trigger the creation of the next continuation.</p>
<p>It remains to describe the printing.
Well, newlines are printed when we bind <code class="highlighter-rouge">yin</code>—technically <em>just before</em> we bind it—and asterisks when we bind <code class="highlighter-rouge">yang</code>.
Whenever we bind <code class="highlighter-rouge">yin</code>, we know that the next step is to create a new continuation, and repeatedly bind it to <code class="highlighter-rouge">yang</code> as we pass it down the chain to <code class="highlighter-rouge">C0</code>.</p>
<p>Concretely, upon the binding of <code class="highlighter-rouge">C[n]</code> to <code class="highlighter-rouge">yin</code>, we print a newline, generate <code class="highlighter-rouge">C[n+1]</code>, print <code class="highlighter-rouge">n</code>+1 asterisks—one for each of <code class="highlighter-rouge">C[n]</code>, <code class="highlighter-rouge">C[n-1]</code>, …, <code class="highlighter-rouge">C0</code> as we pass it down—and then bind <code class="highlighter-rouge">C[n+1]</code> to <code class="highlighter-rouge">yin</code> anew.
So altogether we should expect a newline, followed by 0+1 asterisks, then a newline followed by 1+1 asterisks, then a newline and 2+1 asterisks, and so on.</p>
<p>This is precisely the output of the yin-yang puzzle!</p>
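<p>As a sanity check, the printing schedule described above can be tabulated directly. This is a sketch in Python rather than Scheme (the names are mine), and it reproduces only the schedule, not the continuation machinery:</p>

```python
def yin_yang_prefix(rounds):
    # Binding C[n] to yin prints a newline; the fresh continuation
    # C[n+1] is then passed down the chain C[n], C[n-1], ..., C0,
    # printing one asterisk per handoff: n+1 asterisks in total.
    out = []
    for n in range(rounds):
        out.append("\n")
        out.append("*" * (n + 1))
    return "".join(out)

print(yin_yang_prefix(4))  # a newline, then *, then **, then ***, then ****
```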
<hr />
<p>To finish off, I’ll make an attempt at rewriting the puzzle in so-called <a href="https://en.wikipedia.org/wiki/Continuation-passing_style"><em>continuation-passing style</em></a>, in which each function is called by explicitly passing its continuation,
and each function is defined by evaluating the provided continuation with the computed result.</p>
<div class="language-scheme highlighter-rouge"><pre class="highlight"><code><span class="c1">; CPS versions of "primitives"</span>
<span class="p">(</span><span class="k">define</span> <span class="p">(</span><span class="nf">I&</span> <span class="nv">x</span> <span class="nv">k</span><span class="p">)</span> <span class="p">(</span><span class="nf">k</span> <span class="nv">x</span><span class="p">))</span>
<span class="p">(</span><span class="k">define</span> <span class="p">(</span><span class="nf">N&</span> <span class="nv">x</span> <span class="nv">k</span><span class="p">)</span> <span class="p">(</span><span class="nb">newline</span><span class="p">)</span> <span class="p">(</span><span class="nf">k</span> <span class="nv">x</span><span class="p">))</span>
<span class="p">(</span><span class="k">define</span> <span class="p">(</span><span class="nf">A&</span> <span class="nv">x</span> <span class="nv">k</span><span class="p">)</span> <span class="p">(</span><span class="nb">write-char</span> <span class="o">#</span><span class="err">\</span><span class="nv">*</span><span class="p">)</span> <span class="p">(</span><span class="nf">k</span> <span class="nv">x</span><span class="p">))</span>
<span class="p">(</span><span class="k">define</span> <span class="p">(</span><span class="nf">call/cc&</span> <span class="nv">f&</span> <span class="nv">k</span><span class="p">)</span> <span class="p">(</span><span class="nf">f&</span> <span class="nv">k</span> <span class="nv">k</span><span class="p">))</span>
<span class="c1">; microwave on high for 4 min</span>
<span class="p">(</span><span class="k">define</span> <span class="p">(</span><span class="nf">yin-yang&</span> <span class="nv">k</span><span class="p">)</span>
<span class="p">(</span><span class="nf">call/cc&</span> <span class="nv">I&</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nf">early-cont</span><span class="p">)</span>
<span class="p">(</span><span class="nf">N&</span> <span class="nv">early-cont</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nf">yin</span><span class="p">)</span>
<span class="p">(</span><span class="nf">call/cc&</span> <span class="nv">I&</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nf">later-cont</span><span class="p">)</span>
<span class="p">(</span><span class="nf">A&</span> <span class="nv">later-cont</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nf">yang</span><span class="p">)</span>
<span class="p">(</span><span class="nf">k</span> <span class="p">(</span><span class="nf">yin</span> <span class="nv">yang</span><span class="p">)))))))))))</span>
<span class="c1">; or pop it in the oven at 350 for 18 min</span>
<span class="p">(</span><span class="k">define</span> <span class="p">(</span><span class="nf">provide-cc&</span> <span class="nv">k</span><span class="p">)</span> <span class="p">(</span><span class="nf">call/cc&</span> <span class="nv">I&</span> <span class="nv">k</span><span class="p">))</span> <span class="c1">;=>(k k)</span>
<span class="p">(</span><span class="k">define</span> <span class="p">(</span><span class="nf">yin-yang</span><span class="p">)</span>
<span class="p">(</span><span class="nf">provide-cc&</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nf">yin</span><span class="p">)</span>
<span class="p">(</span><span class="nb">newline</span><span class="p">)</span>
<span class="p">(</span><span class="nf">provide-cc&</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nf">yang</span><span class="p">)</span>
<span class="p">(</span><span class="nb">write-char</span> <span class="o">#</span><span class="err">\</span><span class="nv">*</span><span class="p">)</span>
<span class="p">(</span><span class="nf">yin</span> <span class="nv">yang</span><span class="p">))))))</span>
<span class="p">(</span><span class="nf">yin-yang</span><span class="p">)</span> <span class="c1">; runs forever</span>
</code></pre>
</div>
<p>Programming in this style seems at first like an exercise in pedantry,
but its purpose is to allow you direct access to the continuation, so that functions like <code class="highlighter-rouge">call/cc</code> can be implemented very easily.
As well, since each function is handed explicit control of the continuation, it can abort the computation at any time, or basically do whatever it wants.</p>
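<p>To see the discipline outside of Scheme, here is a small hypothetical Python rendition: every function takes its continuation <code class="highlighter-rouge">k</code> as an extra argument, and “returning” means calling <code class="highlighter-rouge">k</code>. The helper names are my own invention:</p>

```python
def add_k(x, y, k):
    # CPS addition: hand the sum to the continuation instead of returning it.
    return k(x + y)

def mul_k(x, y, k):
    return k(x * y)

def call_cc_k(f, k):
    # Mirror of (define (call/cc& f& k) (f& k k)): the current continuation
    # is passed to f both as its argument and as the continuation to resume.
    return f(k, k)

def guarded_recip_k(x, k):
    # Explicit control of the continuation lets a function abort:
    # on bad input we never invoke k, so the rest of the computation
    # simply evaporates.
    if x == 0:
        return "aborted"
    return k(1 / x)

# Compute 2 * 3 + 4 in CPS; the identity lambda plays the top-level continuation.
result = mul_k(2, 3, lambda p: add_k(p, 4, lambda s: s))  # 10
```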
<p>Maybe it’s a bit scary to trust each and every function with that power,
but in any case this is why we tend only to give this power to the compiler, and make everybody else ask for permission to see the future.
Also it’s a bit of a misattribution of agency, because programming usually occurs at a level <em>above</em> CPS-transforms,
so this transformation can introduce multiple layers between what would otherwise be a single logical function—viz. <code class="highlighter-rouge">yin-yang&</code>.</p>
<p>It’s pretty trippy to think about, though.
The function <code class="highlighter-rouge">provide-cc&</code> could be reasonably interpreted as ‘giving the future to the future’.</p>
<p>If you have the patience of a computer or a god, then you might want to try to perform a few beta-reductions and alpha-conversions on this code
to see how the printing statements come out of the woodwork upon execution.
Or if you’re sane you can just not do that.</p>
<p>In any case, that’s pretty much all I have to say about the yin-yang puzzle.
Continuations themselves have many more interesting applications and theoretical implications, but that stuff could fill a book or five.
So I’ll save the rest for another time.</p>
<div class="footnotes">
<ol>
<li id="fn:1">
<p>Let me come clean and say that I’m not a computer scientist, nor a programmer, and I haven’t actually touched a Scheme interpreter in like three years. I couldn’t even be bothered to find one for this ‘blog post. So this is me merely covering my ass as I pretend to remember how Scheme works. <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
<li id="fn:2">
<p>See footnote 1. <a href="#fnref:2" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>Ilia ChtcherbakovThe call/cc yin-yang puzzle is an ancient piece of Scheme code,
which was written—or more accurately discovered—by David Madore
in the year 1999 upon his invention of the esoteric programming language Unlambda.
It is a rite of passage for aspiring Schemers to grok these five lines, if they claim true mastery over the power of the continuation.The Groups of a Field2017-02-06T19:40:45-05:002017-02-06T19:40:45-05:00http://cleare.st/math/groups-of-a-field<p><em>The following post is a digested version of a <a href="http://math.stackexchange.com/q/1918834/18850">question I asked</a> on math.SE a few months ago.</em></p>
<p>To every field <script type="math/tex">F = (F, {+}, {*})</script>, we can associate two natural groups.
These are the <em>additive group</em> <script type="math/tex">F^+ = (F, {+})</script> and the <em>multiplicative group of units</em> <script type="math/tex">F^\times = (F \smallsetminus \{0\}, {*})</script>.
A fun question to ask, especially of someone just getting started on basic group and ring theory, is whether or not these two groups are ever isomorphic for any field.</p>
<p>If you haven’t seen this question before, feel free to try it yourself!
<!--more-->
I can wait.</p>
<hr />
<p>The snappiest proof I’ve seen goes like this.
Suppose for a contradiction that <script type="math/tex">\phi : F^+ \mathrel{\tilde\longrightarrow} F^\times</script> is an isomorphism of groups.
Then the equation <script type="math/tex">x + x = 0</script> has exactly as many solutions as <script type="math/tex">y^2 = 1</script>.
Now proceed in two cases.</p>
<p>Clearly <script type="math/tex">x = 0</script> is a solution to <script type="math/tex">2x = 0</script>.
If there are any other solutions, then <script type="math/tex">F</script> must have characteristic two, i.e. <script type="math/tex">2\cdot1 = 2 = 0</script>.
However, in characteristic two, <script type="math/tex">y^2 - 1 = (y-1)^2</script> has only a single repeated root, <script type="math/tex">y = 1</script>.
If <script type="math/tex">x = 0</script> is the only solution, then the characteristic of <script type="math/tex">F</script> is not two,
and then <script type="math/tex">y^2 - 1 = (y-1)(y+1)</script> has two roots, <script type="math/tex">1</script> and <script type="math/tex">-1</script>.</p>
<p>In either case, the two solution counts disagree, so <script type="math/tex">\phi</script> cannot exist.
Thus, for all fields <script type="math/tex">F</script>, <script type="math/tex">F^+ \not\cong F^\times</script>.</p>
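<p>For prime fields the counting argument can be checked mechanically. A quick Python spot-check of my own (not part of the proof):</p>

```python
def solution_counts(p):
    # Over the prime field F_p, count solutions of x + x = 0 and of y^2 = 1.
    additive = sum(1 for x in range(p) if (x + x) % p == 0)
    multiplicative = sum(1 for y in range(1, p) if (y * y) % p == 1)
    return additive, multiplicative

for p in (2, 3, 5, 7, 11, 13):
    a, m = solution_counts(p)
    assert a != m  # the counts always disagree, as the proof predicts
```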
<hr />
<p>Alright, cool. Here’s a different question.
Does there exist a pair of fields <script type="math/tex">E</script> and <script type="math/tex">F</script> such that <script type="math/tex">E^+ \cong F^\times</script>?
Well, that’s easy. We can find an example in the finite fields: <script type="math/tex">\def\F{\mathbb F}\F_2^+ \cong \F_3^\times</script>!</p>
<p>That’s a bit unsatisfying, so let’s take a stab at characterizing all of them.
We can use our casework from the first question to start off.</p>
<p>If the characteristic of <script type="math/tex">E</script> is 2, then there are <script type="math/tex">\def\card#1{\lvert#1\rvert}\card E</script> solutions to <script type="math/tex">2x = 0</script>,
so every element of <script type="math/tex">F^\times</script> must satisfy <script type="math/tex">y^2 = 1</script>.
If <script type="math/tex">F</script> also had characteristic 2, the only such element would be 1 and <script type="math/tex">F^\times</script> would be trivial;
so the characteristic of <script type="math/tex">F</script> is not 2, the equation <script type="math/tex">y^2 = 1</script> has at most two solutions,
and the only thing that works is <script type="math/tex">\card E = 2</script> and <script type="math/tex">\card F = 3</script>—precisely the example I gave above.</p>
<p>Now we may assume that <script type="math/tex">F</script> has characteristic two and <script type="math/tex">E</script> doesn’t.
If <script type="math/tex">E</script> has characteristic <script type="math/tex">p \ge 3</script>, then every element satisfies <script type="math/tex">px = 0</script>, so the nonzero elements of <script type="math/tex">F</script> must be solutions to <script type="math/tex">y^p = 1</script>.
From here it is easy to finish: <script type="math/tex">\card F = p+1 = 2^n</script> for some <script type="math/tex">n</script>,
so whenever <script type="math/tex">p</script> is a <a href="https://en.wikipedia.org/wiki/Mersenne_prime">Mersenne prime</a>, we have <script type="math/tex">\F_p^+ \cong \F_{2^n}^\times</script>.</p>
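<p>Which exponents actually produce examples? A quick Python scan of my own, purely illustrative, over the small Mersenne primes:</p>

```python
def is_prime(m):
    return m > 1 and all(m % d for d in range(2, int(m ** 0.5) + 1))

# Exponents n <= 13 for which p = 2^n - 1 is a Mersenne prime;
# each gives F_p^+ isomorphic to F_{2^n}^x, both cyclic of prime order p.
mersenne_exponents = [n for n in range(2, 14) if is_prime(2 ** n - 1)]
# n = 2, 3, 5, 7, 13 give p = 3, 7, 31, 127, 8191
```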
<p>This leaves the case when <script type="math/tex">E</script> has characteristic zero.
This is where things get a bit jargony, so I recommend some knowledge of group theory and field theory, and maybe a bit of comfort around model theory.
If you wish to leave now then the punchline is this:
we <em>can</em> construct such a pair of fields, thanks to Adler (1978),
but it takes a lot of high-powered tech.</p>
<p><script type="math/tex">E</script> has characteristic zero, so it is torsion-free.
All fields of characteristic zero contain (a copy of) <script type="math/tex">\def\Q{\mathbb Q}\Q</script>,
so that <script type="math/tex">E/\Q</script> is a field extension,
i.e. <script type="math/tex">E^+ \cong \Q^{(I)} := \bigoplus_{i \in I} \Q</script> is a rational vector space, with basis indexed by some set <script type="math/tex">I</script>.
The Wikipedia page for <em><a href="https://en.wikipedia.org/wiki/Divisible_group">divisible group</a></em> makes for some fun reading at this point.</p>
<p>There’s not a lot more we can say without invoking some fancy technology.
So we’ll do that, by taking a look at a lemma from the following paper.</p>
<blockquote>
<p>Adler, Allan.
<em>On the multiplicative semigroups of rings.</em>
Comm. Algebra 6 (1978), no. 17, 1751-1753.
<a href="https://doi.org/10.1080/00927877808822318">doi:10.1080/00927877808822318</a></p>
</blockquote>
<p>This paper is not about this problem, but instead concerns itself with the model theory of multiplicative groups of fields.
Here is more or less the result we care about.</p>
<blockquote>
<p><strong>Lemma 2.</strong> There exists a field <script type="math/tex">F</script> such that <script type="math/tex">F^\times</script> is torsion-free and divisible.</p>
</blockquote>
<p>To prove this, we’re going to need ultraproducts.
Let <script type="math/tex">P = \{2,3,5,...\}</script> be the set of primes.
A <strong>nonprincipal ultrafilter</strong> on <script type="math/tex">P</script> is a collection <script type="math/tex">U \subseteq 2^P</script>
of subsets satisfying:</p>
<ol>
<li>for all <script type="math/tex">A \subseteq P</script>, exactly one of <script type="math/tex">A</script> and <script type="math/tex">P \smallsetminus A</script> belongs to <script type="math/tex">U</script>,</li>
<li>if <script type="math/tex">A \in U</script> and <script type="math/tex">A \subseteq B</script>, then <script type="math/tex">B \in U</script>,</li>
<li>if <script type="math/tex">A,B \in U</script> then <script type="math/tex">A \cap B \in U</script>,</li>
<li>no finite set belongs to <script type="math/tex">U</script>.</li>
</ol>
<p>Intuitively <script type="math/tex">U</script> is a schema for classifying every partition of <script type="math/tex">P</script> as having a small side and a big side (that’s condition 1) in a coherent way (that’s 2 and 3).
Condition 4 is the additional stipulation that you’re not cheating,
where by cheating I mean saying something like “a subset is <em>big</em> iff it contains my favourite element <script type="math/tex">p_0 \in P</script>”.
Proving the existence of nonprincipal ultrafilters on arbitrary infinite sets requires some form of the Axiom of Choice.
But we don’t care about that so we’ll just assume it.</p>
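<p>Condition 4 is doing real work here: on a finite set, conditions 1–3 alone force the “cheating” answer. A brute-force Python check of my own on a 3-element set:</p>

```python
from itertools import combinations

GROUND = frozenset({0, 1, 2})
SUBSETS = [frozenset(c) for r in range(4) for c in combinations(sorted(GROUND), r)]

def satisfies_1_to_3(U):
    U = set(U)
    cond1 = all((A in U) != ((GROUND - A) in U) for A in SUBSETS)
    cond2 = all(B in U for A in U for B in SUBSETS if A <= B)
    cond3 = all((A & B) in U for A in U for B in U)
    return cond1 and cond2 and cond3

families = (frozenset(c) for r in range(len(SUBSETS) + 1)
            for c in combinations(SUBSETS, r))
ultrafilters = [U for U in families if satisfies_1_to_3(U)]

# Every survivor is principal: "big" means "contains my favourite element".
principal = [frozenset(A for A in SUBSETS if x in A) for x in GROUND]
assert set(ultrafilters) == set(principal)
```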
<p>Now for each <script type="math/tex">p \in P</script>, let <script type="math/tex">K_p = \F_{2^p}</script>.
We’re going to use the ultrafilter to take a quotient on the ring-theoretic direct product <script type="math/tex">\prod_p K_p</script>, whose elements are <script type="math/tex">P</script>-indexed sequences <script type="math/tex">a = (a_p)_{p \in P}</script>.
Namely, we’ll say two elements <script type="math/tex">a,b \in \prod_p K_p</script> are <em><script type="math/tex">U</script>-mostly equal</em> if they agree on a <script type="math/tex">U</script>-big set, that is, <script type="math/tex">\{ p \in P : a_p = b_p \} \in U</script>.
The quotient <script type="math/tex">F_0 = \prod_p K_p \big/ U</script> of the direct product by this equivalence relation is called an <strong>ultraproduct</strong>, and can be endowed with ring structure of its own.</p>
<p>What makes ultraproducts useful is the <em>Fundamental Theorem of Ultraproducts</em>, due to Łoś.
It states that a first-order formula is true in <script type="math/tex">F_0</script> iff it is true for a <script type="math/tex">U</script>-big collection of the <script type="math/tex">K_p</script>.
The proof is an unenlightening structural induction, stemming from the definition of the quotient.
In any case, we see that <script type="math/tex">F_0</script> is a field, because for every element <script type="math/tex">a \in F_0</script>, either it has <script type="math/tex">U</script>-many zeroes and hence is equal to zero,
or it has <script type="math/tex">U</script>-many nonzeroes and we only have to invert those.
In fact, we can identify <script type="math/tex">F_0^\times</script> with <script type="math/tex">\prod_p K_p^\times/U</script> by the same reasoning.</p>
<p>If it wasn’t yet clear from context, this ultraproduct is our candidate for a field whose multiplicative group is torsion-free and divisible.
It remains to show that for all <script type="math/tex">n \ge 2</script>, <script type="math/tex">\forall x\mathpunct:(x^n = 1 \to x = 1)</script> and <script type="math/tex">\forall x\mathpunct:\exists y\mathpunct: x = y^n</script>.
And this is where we use the cleverness of our definition of <script type="math/tex">K_p</script>.</p>
<blockquote>
<p><strong>Proposition.</strong> If <script type="math/tex">p</script> and <script type="math/tex">q</script> are distinct primes, then <script type="math/tex">\card{K_p^\times} = 2^p - 1</script> and <script type="math/tex">\card{K_q^\times} = 2^q - 1</script> are relatively prime.</p>
<p><strong>Proof.</strong> Run the Euclidean algorithm on <script type="math/tex">p > q</script>.
Because they are prime, they are relatively prime,
and the sequence of remainders will end in 1.</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
p &= d_0q + r_0 & 2^p - 1 &= \biggl( 2^{r_0} \frac{2^{d_0q}-1}{2^q-1} \biggr) (2^q - 1) + 2^{r_0} - 1 \\
q &= d_1r_0 + r_1 & 2^q - 1 &= \biggl( 2^{r_1} \frac{2^{d_1r_0}-1}{2^{r_0}-1} \biggr) (2^{r_0} - 1) + 2^{r_1} - 1 \\
r_0 &= d_2r_1 + r_2 & 2^{r_0} - 1 &= \biggl( 2^{r_2} \frac{2^{d_2r_1}-1}{2^{r_1}-1} \biggr) (2^{r_1} - 1) + 2^{r_2} - 1 \\
&\;\mathrel{\smash{\vdots}} &&\;\mathrel{\smash{\vdots}}
\end{align*} %]]></script>
<p>Then we can run a parallel computation that implies
the GCD of <script type="math/tex">2^p-1</script> and <script type="math/tex">2^q-1</script> is equal to <script type="math/tex">2^{\gcd(p,q)} - 1 = 2^1 - 1 = 1</script>,
as shown above. ∎</p>
</blockquote>
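<p>The identity that this parallel computation extracts, <script type="math/tex">\gcd(2^p-1, 2^q-1) = 2^{\gcd(p,q)} - 1</script>, holds for arbitrary positive exponents—primality only enters at the final step. A quick Python spot-check:</p>

```python
from math import gcd

# The Euclidean algorithm upstairs mirrors the one downstairs:
# gcd(2^p - 1, 2^q - 1) = 2^gcd(p, q) - 1 for all positive p, q.
for p in range(1, 30):
    for q in range(1, 30):
        assert gcd(2 ** p - 1, 2 ** q - 1) == 2 ** gcd(p, q) - 1

# In particular, distinct primes p and q give gcd 2^1 - 1 = 1.
assert gcd(2 ** 5 - 1, 2 ** 7 - 1) == 1
```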
<p>From the proposition, it follows that for all <script type="math/tex">n \ge 2</script>, there are only finitely many <script type="math/tex">p \in P</script> for which
<script type="math/tex">K_p</script> has nontrivial <script type="math/tex">n</script>-torsion or lacks <script type="math/tex">n</script>-th roots.
(This is some easy group theory, and if you really want to be pedantic then it’s an exercise.)
All finite subsets of <script type="math/tex">P</script> are <script type="math/tex">U</script>-small, so by the Fundamental Theorem of Ultraproducts,
<script type="math/tex">F_0^\times</script> satisfies both <script type="math/tex">\forall x\mathpunct:(x^n = 1 \to x = 1)</script> and <script type="math/tex">\forall x\mathpunct:\exists y\mathpunct: x = y^n</script> for all <script type="math/tex">n \ge 2</script>.
It follows that <script type="math/tex">F_0^\times</script> is torsion-free and divisible, and hence a rational vector space.
This concludes the proof of Lemma 2. ∎</p>
<p>There easily exists a field <script type="math/tex">E</script> whose additive group is isomorphic to <script type="math/tex">F_0^\times</script>—just take a field extension of <script type="math/tex">\Q</script> of appropriate degree.
So we have found <em>one</em> pair of infinite fields with the desired property.</p>
<blockquote>
<p><strong>Exercise.</strong> <em>(For those of you who like ultraproducts.)</em>
Show that there is an ultrafilter <script type="math/tex">U'</script> on the set of prime powers <script type="math/tex">Q</script>
such that the multiplicative group of the ultraproduct <script type="math/tex">\prod_q \F_q \big/ U'</script> is divisible and torsion-free.</p>
</blockquote>
<hr />
<p>Are there any more? Well, this is where it gets a bit silly.</p>
<p>The theory of fields whose multiplicative groups are torsion-free and divisible is a countable first-order theory.
And we have a model <script type="math/tex">F_0</script> of cardinality <script type="math/tex">\card{F_0} = 2^{\aleph_0}</script>.
So by <a href="https://en.wikipedia.org/wiki/L%C3%B6wenheim%E2%80%93Skolem_theorem">Löwenheim–Skolem</a>, we have models of every infinite cardinality.</p>
<p>If the cardinality is uncountable, then the cardinality equals the dimension of the multiplicative group as a rational vector space.
For the model of countable cardinality, we need to take a bit more care to figure out its dimension.</p>
<blockquote>
<p><strong>Exercise.</strong> Let <script type="math/tex">a \ne 0</script> be algebraic over a finite field <script type="math/tex">\F_p</script>.
Then <script type="math/tex">a</script> has finite multiplicative order.</p>
</blockquote>
<blockquote>
<p><strong>Proposition.</strong> For finite <script type="math/tex">n \ge 1</script>,
there does not exist an <script type="math/tex">F</script> such that <script type="math/tex">F^\times \cong \Q^n</script>.</p>
<p><strong>Proof.</strong> <em>(Due to math.SE user <a href="http://math.stackexchange.com/a/2132068/18850">Starfall</a>.)</em>
We have seen that <script type="math/tex">F</script> must have characteristic two.
If <script type="math/tex">a \in F^\times</script> is algebraic over the prime field <script type="math/tex">\F_2</script>,
then by the exercise it has finite order, and hence must be 1.
So there exists a transcendental over <script type="math/tex">\F_2</script>.
Thus there is an embedding of fields <script type="math/tex">\F_2(t) \hookrightarrow F</script>.</p>
<p><script type="math/tex">\F_2[t]</script> is a PID, so <script type="math/tex">\F_2(t)^\times \cong \def\Z{\mathbb Z}\Z^{(\aleph_0)}</script>,
the direct sum of countably many copies of <script type="math/tex">\Z</script>.
So the field embedding induces an embedding of groups
<script type="math/tex">\Z^{(\aleph_0)} \cong \F_2(t)^\times \hookrightarrow F^\times \cong \Q^n</script>.
The domain has <script type="math/tex">\Z</script>-linearly independent sets of arbitrary size,
but the codomain is too small for that, so this is a contradiction. ∎</p>
</blockquote>
<p>Using this proposition we can conclude that the countable model <script type="math/tex">F</script> must have <script type="math/tex">F^\times \cong \Q^{(\aleph_0)}</script>.</p>
<p>As mentioned previously, finding fields <script type="math/tex">E</script> with <script type="math/tex">E^+</script> isomorphic to any particular rational vector space is very easy—just toss in an appropriate number of indeterminates.
So we can say that we have characterized all pairs <script type="math/tex">E^+ \cong F^\times</script> up to group isomorphism:</p>
<ul>
<li>If <script type="math/tex">E</script> has characteristic <script type="math/tex">p > 0</script> and <script type="math/tex">p+1</script> is a prime power, then <script type="math/tex">E = \F_p</script> and <script type="math/tex">F = \F_{p+1}</script>.</li>
<li>If <script type="math/tex">E</script> has characteristic <script type="math/tex">0</script> and <script type="math/tex">\dim E/\Q \ge \aleph_0</script>, then there exists an <script type="math/tex">F</script> by the magic of Löwenheim–Skolem.</li>
</ul>
<p>Let me close with a silly unrelated coincidence.
While researching ultraproducts, in order to digest Allan Adler’s paper,
I came across another short paper by a contemporary, <em>Andrew</em> Adler, on cardinalities of ultraproducts.</p>
<blockquote>
<p>Adler, Andrew.
<em>The cardinality of ultrapowers—an example.</em>
Proc. Amer. Math. Soc. 28 (1971), 311-312.
<a href="https://doi.org/10.1090/S0002-9939-1971-0280361-9">doi:10.1090/S0002-9939-1971-0280361-9</a>.</p>
</blockquote>Ilia ChtcherbakovPrime Filters in Distributive Lattices II2017-01-30T23:51:33-05:002017-01-30T23:51:33-05:00http://cleare.st/math/prime-filters-in-distributive-lattices-2<p>Recall from <a href="/math/prime-filters-in-distributive-lattices">PFDL I</a>, I introduced distributive lattices and filters, and we proved the easy direction of a characterization of Boolean algebras.
Today I’ll detail a proof of the tougher and far more obscure converse—it involves some sneaky technology from formal logic.</p>
<p>Theorem 1 states that, in a Boolean algebra, every (nonempty) prime filter is an ultrafilter.
Its converse is as follows:
<!--more--></p>
<blockquote>
<p><strong>Theorem 2.</strong> Let <script type="math/tex">L</script> be a (distributive, bounded) lattice.
If every nonempty prime filter in <script type="math/tex">L</script> is an ultrafilter,
then <script type="math/tex">L</script> is a Boolean algebra.</p>
</blockquote>
<p>The goal of a proof of Theorem 2 is to find a complement to any given element.
That is, for any <script type="math/tex">a \in L</script>, we want a <script type="math/tex">b \in L</script> such that <script type="math/tex">a \wedge b = \bot</script> and <script type="math/tex">a \vee b = \top</script>.
There is no hope of doing this constructively, given our hypotheses, so we’ll instead try to show that if no element does both, there is a contradiction.</p>
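<p>For intuition, complements are easy to hunt for in a finite example. In the lattice of divisors of <script type="math/tex">n</script> under divisibility (meet is gcd, join is lcm, bottom is 1, top is <script type="math/tex">n</script>), a complement of <script type="math/tex">a</script> is a divisor <script type="math/tex">b</script> with <script type="math/tex">\gcd(a,b)=1</script> and <script type="math/tex">\operatorname{lcm}(a,b)=n</script>. A Python sketch (my toy example, not part of the proof):</p>

```python
from math import gcd

def lcm(a, b):
    return a * b // gcd(a, b)

def complements(n, a):
    # Divisors b of n with a meet b = bottom and a join b = top.
    divisors = [d for d in range(1, n + 1) if n % d == 0]
    return [b for b in divisors if gcd(a, b) == 1 and lcm(a, b) == n]

print(complements(30, 6))  # [5]: the divisors of 30 form a Boolean algebra
print(complements(12, 2))  # []: 2 has no complement among the divisors of 12
```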
<p>To show this, I’ll need to introduce some technology.
A lot of the following is adapted from Chapter 5 of <em>An Introduction to Substructural Logics</em> by Restall, but it has a somewhat different focus.
I can’t in all honesty recommend the book, unless you have an intense interest in nonclassical logic.</p>
<p>A <strong>partial filter-ideal pair</strong> or <strong>PFI-pair</strong> is a pair <script type="math/tex">(F,I)</script> of subsets of <script type="math/tex">L</script>, such that there do not exist <script type="math/tex">f_1, ..., f_m \in F</script> and <script type="math/tex">i_1, ..., i_n \in I</script> satisfying <script type="math/tex">f_1 \wedge \cdots \wedge f_m \le i_1 \vee \cdots \vee i_n</script>.
We will abbreviate this as <script type="math/tex">m \nleq j</script> for <script type="math/tex">\def\finwedge{\bigwedge^{\text{fin}}}m \in \finwedge F</script> the collection of all finite meets in <script type="math/tex">F</script>, and <script type="math/tex">\def\finvee{\bigvee^{\text{fin}}}j \in \finvee I</script> the finite joins in <script type="math/tex">I</script>.
It follows from the reflexivity of <script type="math/tex">\le</script> that <script type="math/tex">F</script> and <script type="math/tex">I</script> must be disjoint.</p>
<p>A PFI-pair <script type="math/tex">(F,I)</script> is <strong>full</strong> if <script type="math/tex">F \cup I = L</script>.
The following should explain my nomenclature, though not the unfortunate redundancy in “full partial filter-ideal pair”.</p>
<blockquote>
<p><strong>Exercise.</strong> If <script type="math/tex">(F,I)</script> is a full PFI-pair, then <script type="math/tex">F</script> is a filter and <script type="math/tex">I</script> is an <em>ideal</em>, that is, a filter in the dual <script type="math/tex">L^*</script>.
Furthermore, if <script type="math/tex">I</script> is nonempty then <script type="math/tex">F</script> is prime, and if <script type="math/tex">F</script> is nonempty then <script type="math/tex">I</script> is prime (as a filter in <script type="math/tex">L^*</script>).</p>
</blockquote>
<p>Say that a pair of sets <script type="math/tex">(S,T)</script> <strong>extends</strong> another pair <script type="math/tex">(s,t)</script> if <script type="math/tex">S \supseteq s</script> and <script type="math/tex">T \supseteq t</script>.
Clearly, if <script type="math/tex">(F',I')</script> is a PFI-pair and extends <script type="math/tex">(F,I)</script>, then <script type="math/tex">(F,I)</script> is also a PFI-pair.
We would like the ability to do the opposite: start with a PFI-pair and extend it <em>to</em> a larger one.
Ideally a full pair, but let’s take it one step at a time.</p>
<blockquote>
<p><strong>Step Lemma.</strong> If <script type="math/tex">(F,I)</script> is a PFI-pair and <script type="math/tex">x \notin F \cup I</script>,
then one of <script type="math/tex">(F \cup \{x\},I)</script> or <script type="math/tex">(F,I \cup \{x\})</script> is a PFI-pair.</p>
<p><strong>Proof.</strong> Suppose that <script type="math/tex">(F,I)</script> is a PFI-pair but neither extension is.
Then there are <script type="math/tex">m,m' \in \finwedge F</script> and <script type="math/tex">j,j' \in \finvee I</script> such that <script type="math/tex">m \wedge x \le j</script> and <script type="math/tex">m' \le x \vee j'</script>.
Then</p>
<script type="math/tex; mode=display">m \wedge m' \le m \wedge (x \vee j') = (m \wedge x) \vee (m \wedge j') \le j \vee j',</script>
<p>contradicting the assumption that <script type="math/tex">(F,I)</script> is a PFI-pair. ∎</p>
</blockquote>
<p>Right on. Now we can just hit this sucker with Zorn’s Lemma.</p>
<blockquote>
<p><strong>Pair Extension Theorem.</strong> <em>(Belnap, 1970s)</em> Every PFI-pair is extended by some full PFI-pair.</p>
<p><strong>Proof.</strong> Given a PFI-pair <script type="math/tex">(F_0,I_0)</script>, let <script type="math/tex">P</script> be the poset of PFI-pairs extending it, ordered by extension.
If <script type="math/tex">C \subseteq P</script> is a chain, then <script type="math/tex">\bigl( \bigcup\{ F : (F,I) \in C \}, \bigcup\{ I : (F,I) \in C \} \bigr)</script> is an upper bound,
because any violation <script type="math/tex">f \le i</script> of the PFI-pair condition would already occur in some sufficiently large element of <script type="math/tex">C</script>, contradicting its membership in <script type="math/tex">P</script>.
So by Zorn’s Lemma, <script type="math/tex">P</script> has a maximal element <script type="math/tex">(F, I)</script>.
The PFI-pair <script type="math/tex">(F, I)</script> must be full, for fear of contradicting the Step Lemma. ∎</p>
</blockquote>
<p>For those of you who are afraid of very infinite sets: if <script type="math/tex">L</script> is countable, then we can get by without Zorn’s Lemma, by listing its elements and iterating the Step Lemma to infinity.</p>
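<p>On a finite lattice, iterating the Step Lemma over a fixed enumeration is the whole story. Here is a Python sketch of my own on the divisors of 12 under divisibility, a bounded distributive lattice with meet gcd and join lcm:</p>

```python
from itertools import combinations
from math import gcd

L12 = [1, 2, 3, 4, 6, 12]           # divisors of 12
def lcm(a, b): return a * b // gcd(a, b)
def leq(a, b): return b % a == 0    # a <= b  iff  a divides b

def folds(S, op):
    # All meets (op = gcd) or joins (op = lcm) of nonempty finite subsets of S.
    out = set()
    for r in range(1, len(S) + 1):
        for c in combinations(sorted(S), r):
            v = c[0]
            for x in c[1:]:
                v = op(v, x)
            out.add(v)
    return out

def is_pfi_pair(F, I):
    return not any(leq(m, j) for m in folds(F, gcd) for j in folds(I, lcm))

def extend_to_full(F, I):
    # Iterate the Step Lemma along a fixed enumeration of the lattice.
    F, I = set(F), set(I)
    for x in L12:
        if x not in F and x not in I:
            if is_pfi_pair(F | {x}, I):
                F.add(x)
            else:
                I.add(x)  # the Step Lemma guarantees this side works
    return F, I

F, I = extend_to_full({4}, {3})
# F = {2, 4, 6, 12} is a prime filter and I = {1, 3} a prime ideal.
```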
<hr />
<p>Now we have all the tools we need to prove Theorem 2. So let’s do that.</p>
<p>Let <script type="math/tex">x \in L</script>. The goal is to find a complement for <script type="math/tex">x</script>, that is, letting</p>
<script type="math/tex; mode=display">A = \{ a \in L : x \vee a = \top \} \quad \text{and} \quad B = \{ b \in L : x \wedge b = \bot \},</script>
<p>to show that <script type="math/tex">A \cap B</script> is nonempty.
These sets are rather mysterious, so we need to learn some more about them.</p>
<blockquote>
<p><strong>Exercise.</strong> <script type="math/tex">A</script> is a nonempty filter and
<script type="math/tex">B</script> is a nonempty ideal (downward-closed and upward-directed).</p>
</blockquote>
<p>That’s pretty good. But it gets better: if <script type="math/tex">a \in A</script> and <script type="math/tex">b \in B</script>, then</p>
<script type="math/tex; mode=display">b = \top \wedge b = (a \vee x) \wedge b = (a \wedge b) \vee (x \wedge b) = (a \wedge b) \vee \bot = a \wedge b \le a.</script>
<p>It easily follows that if <script type="math/tex">A</script> and <script type="math/tex">B</script> intersect, they do so in precisely one element, as we would expect.
Suppose now for a contradiction that <script type="math/tex">A</script> and <script type="math/tex">B</script> are disjoint.</p>
<blockquote>
<p><strong>Claim.</strong> <script type="math/tex">(A, B \cup \{x\})</script> is a PFI-pair.</p>
<p><strong>Proof.</strong> Otherwise, there exist <script type="math/tex">a \in A</script> and <script type="math/tex">b \in B</script> such that either <script type="math/tex">a \le b</script> or <script type="math/tex">a \le b \vee x</script>.
The former is impossible, because <script type="math/tex">a \ge b</script> would then force <script type="math/tex">a = b \in A \cap B</script>,
and the latter is too, since it implies <script type="math/tex">b \vee x \ge a \vee x = \top</script>, i.e. <script type="math/tex">b \in A \cap B</script>. ∎</p>
</blockquote>
<p>So by the Pair Extension Theorem, there exists a full pair <script type="math/tex">(F, I)</script> extending <script type="math/tex">(A, B \cup \{x\})</script>.
Then <script type="math/tex">F</script> is a prime filter—nonempty, since <script type="math/tex">A \subseteq F</script> is—and consequently an ultrafilter by hypothesis.
Since <script type="math/tex">x \in I</script>, we have <script type="math/tex">x \notin F</script>, so the filter generated by <script type="math/tex">F \cup \{x\}</script> properly contains <script type="math/tex">F</script> and must be full; that is, <script type="math/tex">f \wedge x = \bot</script> for some <script type="math/tex">f \in F</script>.
Then <script type="math/tex">f \in B \subseteq I</script>, contradicting the fact that <script type="math/tex">F</script> and <script type="math/tex">I</script> are disjoint.</p>
<p>Thus, <script type="math/tex">A \cap B = \{y\}</script> for some <script type="math/tex">y</script>, and <script type="math/tex">y</script> is a complement for <script type="math/tex">x</script> by definition.
<script type="math/tex">x \in L</script> was arbitrary, so <script type="math/tex">L</script> is a Boolean algebra.
This concludes the proof of Theorem 2. ∎</p>
<hr />
<p>I have tried my best to make that argument as clear and intuitive as I could.
Unfortunately, I don’t think there’s a way around the use of PFI-pairs.
If you’ll permit me to try my hand at an intuitive explanation of these objects, please read on.</p>
<p>Just as in the construction of the integers from the naturals, or of the rationals from the integers,
one often has to emulate some sort of oppositeness or negation by carrying around both the positive and negative parts, collected in an ordered pair.
Here, the task in logic originally solved by PFI-pairs is to show there exist “prime theories”—basically prime filters, except living in the “quasilattice” of propositions in logics with conjunction and disjunction—possibly satisfying some mild desired conditions.</p>
<p>The use of prime theories, in turn, was to coherently select exactly one proposition from every pair of complements, even in logics without a negation.
See, the theories of a logic are, in a sense, those collections of propositions that could coherently be considered true.
Primality is then the assertion that this truth must be <em>witnessed</em>.
For if <script type="math/tex">\top \vdash A \vee B</script> then a prime theory must contain at least one of <script type="math/tex">A</script> or <script type="math/tex">B</script>,
even if it cannot contain both without being trivial, due to e.g. <script type="math/tex">A \wedge B \vdash \bot</script>.</p>
<p>Logicians really love to require that disjunctions be witnessed, because that meshes with our intuitive understanding of the notion of alternatives.
Even if it sometimes makes the math harder.
I can probably come up with a lot of examples involving de Morgan’s law and/or frame semantics but I think I’ll stop before I drown the mathematics in my <a href="https://en.wikipedia.org/wiki/Omphaloskepsis">omphaloskepsis</a>.</p>
<p><em>Ilia Chtcherbakov</em></p>
<hr />
<p><strong>Prime Filters in Distributive Lattices I</strong> (2017-01-29, <a href="http://cleare.st/math/prime-filters-in-distributive-lattices">http://cleare.st/math/prime-filters-in-distributive-lattices</a>)</p>
<p>I’d like to talk about some results pertaining to distributive lattices.
In particular, there’s this one interesting theorem about Boolean algebras I’ve been thinking about lately.
One direction is reasonably famous, pretty useful and not very hard to prove, so I’ll cover that.
But what I really wanna talk about is the converse direction, which is a result that almost nobody I know has ever heard of, and is impossible to find anything about on the internet.
<!--more--></p>
<p>However, since lattice theory isn’t really all that popular among undergrads, there’s a lot of ground to cover, and I think I’ll have to spend a post or so winding up to it, so as to break it all up into digestible chunks.
Some familiarity with the theory of lattices would be nice, but because I would otherwise be stuck in a limbo of “too much content for one post, not enough for two”, I’m going to mention everything I’ll use.
So this should be more or less self-contained, though I might go a little fast.</p>
<hr />
<p>A <strong>lattice</strong> is a set with a partial order <script type="math/tex">\le</script> and well-defined <em>greatest lower bound</em> and <em>least upper bound</em> operations <script type="math/tex">\wedge</script> and <script type="math/tex">\vee</script>, called <strong>meet</strong> and <strong>join</strong>, respectively.
To wit, <script type="math/tex">x \le a \wedge b</script> iff <script type="math/tex">x \le a</script> and <script type="math/tex">x \le b</script>, and likewise <script type="math/tex">a \vee b \le y</script> iff <script type="math/tex">a \le y</script> and <script type="math/tex">b \le y</script>.
If there exists a global lower bound, we denote it by <script type="math/tex">\bot</script> and say the lattice is <strong>bounded below</strong>.
Similarly, the global upper bound is denoted by <script type="math/tex">\top</script> if it exists and the lattice is <strong>bounded above</strong>.
The lattice is <strong>bounded</strong> if it is bounded in both directions.</p>
<p>Here are a few handy elementary facts about lattices.
If the operations <script type="math/tex">\wedge</script> and <script type="math/tex">\vee</script> are well-defined, they are unique.
Also, meets and joins of any finite set are well-defined, by induction.
Finally, note that <script type="math/tex">a \le b</script> iff <script type="math/tex">a = a \wedge b</script>,
and that <script type="math/tex">a \wedge (a \vee b) = a</script>.</p>
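<p>For concreteness, here is a small Python sketch—the encoding is mine, not part of the post—that computes meets and joins of a finite poset as greatest lower and least upper bounds, and checks the elementary facts just listed on the divisors of 12 under divisibility.</p>

```python
from itertools import product

# A finite lattice presented by its elements and order; meet and join are
# computed as greatest lower and least upper bounds, raising if one is
# missing (i.e. if the poset is not actually a lattice).
class FiniteLattice:
    def __init__(self, elements, leq):
        self.elements = list(elements)
        self.leq = leq  # leq(a, b) means a <= b

    def _bound(self, a, b, lower):
        cmp = self.leq if lower else (lambda x, y: self.leq(y, x))
        bounds = [x for x in self.elements if cmp(x, a) and cmp(x, b)]
        for m in bounds:  # the extremal bound dominates all the others
            if all(cmp(x, m) for x in bounds):
                return m
        raise ValueError("not a lattice: a required bound does not exist")

    def meet(self, a, b):
        return self._bound(a, b, lower=True)

    def join(self, a, b):
        return self._bound(a, b, lower=False)

# The divisors of 12 under divisibility: meet is gcd, join is lcm.
L = FiniteLattice([1, 2, 3, 4, 6, 12], lambda a, b: b % a == 0)
assert L.meet(4, 6) == 2 and L.join(4, 6) == 12
# a <= b iff a = a /\ b, and absorption: a /\ (a \/ b) = a.
pairs = list(product(L.elements, repeat=2))
assert all(L.leq(a, b) == (L.meet(a, b) == a) for a, b in pairs)
assert all(L.meet(a, L.join(a, b)) == a for a, b in pairs)
```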
<p>A lattice is <strong>distributive</strong> if <script type="math/tex">\wedge</script> distributes over <script type="math/tex">\vee</script>.
Not every lattice is distributive, but I will only care about distributive lattices in what follows.</p>
<p>If <script type="math/tex">(L,{\le},{\wedge},{\vee})</script> is a lattice, then <script type="math/tex">L^* = (L,{\ge},{\vee},{\wedge})</script> is called the <em>dual lattice</em> of <script type="math/tex">L</script>.
It is precisely the lattice <script type="math/tex">L</script>, flipped upside-down.
It is a standard but somewhat unpleasant result of lattice theory that if <script type="math/tex">L</script> is distributive, it is <em>dual distributive</em>, i.e. <script type="math/tex">\vee</script> distributes over <script type="math/tex">\wedge</script>.
<blockquote>
<p><strong>Proposition.</strong> If <script type="math/tex">L</script> is distributive, so is <script type="math/tex">L^*</script>.</p>
<p><strong>Proof.</strong> This is just a big disgusting computation. Let <script type="math/tex">a,b,c \in L</script>.
First note that <script type="math/tex">a \le a \vee b</script> and <script type="math/tex">b \wedge c \le b \le a \vee b</script>,
so that <script type="math/tex">a \vee (b \wedge c) \le a \vee b</script>.
By the symmetry in <script type="math/tex">b</script> and <script type="math/tex">c</script>,</p>
<script type="math/tex; mode=display">a \vee (b \wedge c) \le (a \vee b) \wedge (a \vee c).</script>
<p>This is called <em>weak dual distributivity</em>.
Now observe that if <script type="math/tex">x \le z</script>, then</p>
<script type="math/tex; mode=display">x \wedge (y \vee z) = (x \wedge y) \vee (x \wedge z) = (x \wedge y) \vee z</script>
<p>by distributivity. This is called the <em>modular law</em>.
We now compute</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
(a \vee c) \wedge (b \vee c)
&= (c \vee a) \wedge (a \vee c) \wedge (b \vee c) \\
&= \bigl( (a \wedge b) \vee c \vee a \bigr) \wedge (a \vee c) \wedge (b \vee c) \\
&= (a \wedge b) \vee c \vee \bigl( a \wedge (a \vee c) \wedge (b \vee c) \bigr) &(*) \\
&= (a \wedge b) \vee c \vee \bigl( a \wedge (b \vee c) \bigr) \\
&= (a \wedge b) \vee c \vee (a \wedge b) \vee (a \wedge c) \\
&= (a \wedge b) \vee c,
\end{align*} %]]></script>
<p>where <script type="math/tex">(*)</script> is an application of the modular law to weak dual distributivity. ∎</p>
</blockquote>
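<p>Not that the proposition needs it, but you can sanity-check both distributive laws by brute force on a small distributive lattice, say the divisors of 30 under divisibility, where meet is gcd and join is lcm. A quick Python check (my own, purely illustrative):</p>

```python
from math import gcd
from itertools import product

# The divisors of 30 under divisibility: meet is gcd and join is lcm.
divs = [d for d in range(1, 31) if 30 % d == 0]

def lcm(a, b):
    return a * b // gcd(a, b)

# Distributivity in one direction accompanies distributivity in the other.
for a, b, c in product(divs, repeat=3):
    assert gcd(a, lcm(b, c)) == lcm(gcd(a, b), gcd(a, c))  # meet over join
    assert lcm(a, gcd(b, c)) == gcd(lcm(a, b), lcm(a, c))  # join over meet
```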
<p>Henceforth, let <script type="math/tex">L</script> be a distributive bounded lattice.</p>
<p><script type="math/tex">L</script> is a <strong>Boolean algebra</strong> if every element is <strong>complemented</strong>, i.e. for every <script type="math/tex">x \in L</script> there is a <script type="math/tex">y \in L</script> such that <script type="math/tex">x \wedge y = \bot</script> and <script type="math/tex">x \vee y = \top</script>.
The familiar lattice <script type="math/tex">2^X</script> of subsets of some set <script type="math/tex">X</script>, ordered by inclusion, is a Boolean algebra where complementation is given by the set complement <script type="math/tex">A \mapsto X \smallsetminus A</script>.
I’ll call this the powerset algebra of <script type="math/tex">X</script>.</p>
<p>In a sense, powerset algebras are the prototypical Boolean algebras.
It is fairly easy to see that every finite Boolean algebra is a powerset algebra.</p>
<p>Sure, not every Boolean algebra is a powerset algebra—infinite powerset algebras must have uncountably many elements, but the sublattice of <script type="math/tex">2^{\mathbb N}</script> containing only finite and cofinite sets is a countable Boolean algebra—but thanks to Birkhoff we know every Boolean algebra must be a sublattice of a powerset algebra.
So there’s no harm in thinking of them as collections of sets with the usual set-lattice operations.</p>
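<p>A minimal sketch of that viewpoint in Python, on the powerset algebra of a 3-element set (again my own encoding, just to make the definitions tangible):</p>

```python
from itertools import chain, combinations

# The powerset algebra of X = {0, 1, 2}: meet is intersection, join is
# union, bottom is the empty set, top is X, and X - A complements A.
X = frozenset({0, 1, 2})
powerset = [frozenset(s) for s in chain.from_iterable(
    combinations(sorted(X), r) for r in range(len(X) + 1))]

bot, top = frozenset(), X
for A in powerset:
    assert A & (X - A) == bot and A | (X - A) == top  # complementation
# Meet distributes over join, as required of a Boolean algebra:
assert all(A & (B | C) == (A & B) | (A & C)
           for A in powerset for B in powerset for C in powerset)
```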
<hr />
<p>We will also need to know about filters.
A <strong>filter</strong> is a subset <script type="math/tex">F \subseteq L</script> which is upward-closed and downward-directed.
To be explicit: if <script type="math/tex">a \in F</script> and <script type="math/tex">a \le a'</script>, then <script type="math/tex">a' \in F</script>; and if <script type="math/tex">a,b \in F</script>, then <script type="math/tex">a \wedge b \in F</script>.
Filters can be defined more generally for any poset, but this way of stating it will be convenient for our purposes.
A filter is <em>nonempty</em> iff it contains <script type="math/tex">\top</script>, and <em>nonfull</em> iff it does not contain <script type="math/tex">\bot</script>.</p>
<p>A nonfull filter <script type="math/tex">F</script> is <strong>prime</strong> if whenever <script type="math/tex">a \vee b \in F</script>, either <script type="math/tex">a \in F</script> or <script type="math/tex">b \in F</script>.
A filter is an <strong>ultrafilter</strong> if it is maximal among nonfull filters, i.e. any filter properly containing it is full.</p>
<blockquote>
<p><strong>Proposition.</strong> Every ultrafilter is prime.</p>
<p><strong>Proof.</strong> Let <script type="math/tex">F</script> be an ultrafilter, and suppose for a contradiction that it is not prime.
Then there are <script type="math/tex">a,b \notin F</script> such that <script type="math/tex">a \vee b \in F</script>.
Let <script type="math/tex">F' = F \cup \{ y : y \ge a \wedge x\ \text{for some}\ x \in F \}</script>.
This is clearly<sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup> a filter, and since <script type="math/tex">a = a \wedge (a \vee b) \in F'</script>,
we have that <script type="math/tex">F \subsetneq F'</script>.</p>
<p>Since <script type="math/tex">F</script> was maximal among nonfull filters, <script type="math/tex">F'</script> is full, so in particular <script type="math/tex">b \in F'</script>.
But <script type="math/tex">b \notin F</script>, so <script type="math/tex">b \ge a \wedge x</script> for some <script type="math/tex">x \in F</script>.
Then by distributivity,</p>
<script type="math/tex; mode=display">(a \vee b) \wedge x = (a \wedge x) \vee (b \wedge x) \le b \vee (b \wedge x) = b.</script>
<p>Since <script type="math/tex">a \vee b</script> and <script type="math/tex">x</script> are both members of <script type="math/tex">F</script>,
it follows that <script type="math/tex">b \in F</script>, which is a contradiction. ∎</p>
</blockquote>
<p>Note that we did not use the boundedness of <script type="math/tex">L</script>.</p>
<blockquote>
<p><strong>Theorem 1.</strong> In a Boolean algebra, every nonempty prime filter is an ultrafilter.</p>
<p><strong>Proof.</strong> Let <script type="math/tex">F</script> be a nonempty prime filter and let <script type="math/tex">F' \supsetneq F</script>.
Then there exists <script type="math/tex">x \in F' \smallsetminus F</script>.
<script type="math/tex">x</script> has a complement <script type="math/tex">\neg x</script>, and <script type="math/tex">x \vee \neg x = \top \in F</script>.
Thus, <script type="math/tex">\neg x \in F</script>. But then both <script type="math/tex">x, \neg x \in F'</script>, so <script type="math/tex">\bot = x \wedge \neg x \in F'</script>.
Every filter strictly containing <script type="math/tex">F</script> is full, so <script type="math/tex">F</script> is maximal among nonfull filters. ∎</p>
</blockquote>
<p>This is the ostensibly famous theorem I mentioned.
Granted, not everybody cares about filters, but definitely functional analysts do.
And I’m sure there are still some ring and model theorists who study ultraproducts.</p>
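<p>Theorem 1—and, anticipating Theorem 2, its converse—can be checked exhaustively in a small case. Here is a brute-force Python sketch (the helper names are mine) confirming that in the powerset algebra of a 3-element set, the nonempty prime filters are exactly the ultrafilters: the three principal filters at the singletons.</p>

```python
from itertools import chain, combinations

def subsets(xs):
    xs = list(xs)
    return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))

# The powerset algebra of a 3-element set; bottom is the empty set.
X = frozenset({0, 1, 2})
elems = [frozenset(s) for s in subsets(X)]
bot = frozenset()

def is_filter(F):
    upward = all(b in F for a in F for b in elems if a <= b)
    meet_closed = all(a & b in F for a in F for b in F)
    return upward and meet_closed

# Enumerate every filter, then the nonfull ones (those avoiding bottom).
filters = [frozenset(F) for F in subsets(elems) if is_filter(frozenset(F))]
nonfull = [F for F in filters if bot not in F]

def is_prime(F):
    return all(a in F or b in F
               for a in elems for b in elems if (a | b) in F)

def is_ultra(F):  # maximal among nonfull filters
    return all(bot in G for G in filters if F < G)

nonempty_prime = {F for F in nonfull if X in F and is_prime(F)}
ultrafilters = {F for F in nonfull if is_ultra(F)}
assert nonempty_prime == ultrafilters and len(ultrafilters) == 3
```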
<hr />
<p>The converse to Theorem 1 that I’m thinking of is the following:</p>
<blockquote>
<p><strong>Theorem 2.</strong> Let <script type="math/tex">L</script> be a bounded distributive lattice.
If every nonempty prime filter in <script type="math/tex">L</script> is an ultrafilter,
then <script type="math/tex">L</script> is a Boolean algebra.</p>
</blockquote>
<p>This is substantially harder to show than Theorem 1, and I will need to introduce a bit of technology first.
Well, it’s more than a bit, in all honesty, so I think I’ll save that for a second post.</p>
<p>I only know two people that have seen a proof of this result: myself and the logic professor who assigned it to me as homework.
Hopefully this ‘blog post and its sequel will correct that.</p>
<p><em>This is the first post in a series. <a href="/math/prime-filters-in-distributive-lattices-2">Here is the next.</a></em></p>
<div class="footnotes">
<ol>
<li id="fn:1">
<p>My original proof here was not as clear as I thought, because I gave a bad definition of <script type="math/tex">F'</script>. Thanks to Niall Mitchell for catching that mistake. <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>
<p><em>Ilia Chtcherbakov</em></p>
<hr />
<p><strong>Why is a group?</strong> (2017-01-25, <a href="http://cleare.st/math/why-is-a-group">http://cleare.st/math/why-is-a-group</a>)</p>
<p>Often when people talk about groups, they say something like: groups are objects that encode the notion of symmetry.
After working a bit with groups and group actions, it’s easy to convince yourself this is the case,
but this sort of <em>a posteriori</em> explanation might seem a little circular—at least, it does to me.
<!--more--></p>
<p>For those who haven’t heard this, these next few sentences are for you.
A <em>group</em> is an object encoding a possible kind of symmetry.
The symmetries understood by a group <script type="math/tex">G</script> are seen (or more evocatively, realized or effected) via <em>group actions</em> of <script type="math/tex">G</script> on an object <script type="math/tex">X</script>,
and sufficiently good actions are branded with adjectives like <em>transitive</em> or <em>faithful</em> or <em>regular</em>.
We record that a set enjoys some particular <script type="math/tex">G</script>-symmetry by equipping it with a representative <script type="math/tex">G</script>-action, and call the result a <em><script type="math/tex">G</script>-set</em>.</p>
<p>What follows is (my attempt at) a more intrinsic explanation.
Readers familiar with category theory might accuse me of cheating here, because it looks like I’m just reading off the categorical model of a group,
but I claim that that’s just a consequence of category theory being such a natural abstraction.
Plus I’m gonna talk about semigroups later and categories can’t handle that so eat it, nerd.</p>
<hr />
<p>Consider an object <script type="math/tex">X</script>. In order not to get too stuck on explaining why objects should be modelled as sets, and to not have to appeal to something like topos theory, I’ll treat this notion as a black box.
But objects are almost always modelled as sets, so think of <script type="math/tex">X</script> as a set.</p>
<p>We will say a <strong>symmetry</strong> of <script type="math/tex">X</script> is a transformation of <script type="math/tex">X</script>—for instance, a function from <script type="math/tex">X</script> to itself, if it were a set—such that the image is “the same” as <script type="math/tex">X</script>.
That is, after we apply the transformation, we recognize that <script type="math/tex">X</script> has remained unchanged in some way.
Note that we want “is the same as” to be an equivalence relation, because it would be really weird if it wasn’t.
For instance, the identity map <script type="math/tex">\def\id{\mathrm{id}}\id_X : X \to X</script> should be a symmetry, because clearly no change at all has occurred to <script type="math/tex">X</script>.</p>
<p>We will ask that the collection of symmetries of <script type="math/tex">X</script> obey two natural rules.
First, if <script type="math/tex">f</script> and <script type="math/tex">g</script> are two symmetries of <script type="math/tex">X</script>, then their composition <script type="math/tex">f \circ g</script> should also be a symmetry.
If <script type="math/tex">X</script> looks the same after applying <script type="math/tex">g</script>, we should be able to apply <script type="math/tex">f</script> afterwards, and the result should again not change in any meaningful way.
Note that function composition is an associative operation.</p>
<p>Second, we will ask that if the transformation <script type="math/tex">f</script> is a symmetry, then it can be undone,
i.e. there exists a symmetry <script type="math/tex">f^{-1}</script> such that <script type="math/tex">f^{-1} \circ f</script> is the identity transformation.
This make sense too: if <script type="math/tex">X</script> looks the same as <script type="math/tex">f(X)</script>, then we can apply <script type="math/tex">f^{-1}</script> to find that
<script type="math/tex">f^{-1}(X)</script> should look the same as <script type="math/tex">f^{-1}(f(X)) = (f^{-1} \circ f)(X) = \id_X(X) = X</script>.
Since “looks the same as” is an equivalence relation, it’s symmetric, so <script type="math/tex">X</script> looks the same as <script type="math/tex">f^{-1}(X)</script>.
Observe now that <script type="math/tex">f \circ f^{-1}</script> is a symmetry, so it has an inverse, and then we can prove that</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
f \circ f^{-1} &= \id_X \circ (f \circ f^{-1}) \\
&= (f \circ f^{-1})^{-1} \circ (f \circ f^{-1}) \circ (f \circ f^{-1}) \\
&= (f \circ f^{-1})^{-1} \circ (f \circ (f^{-1} \circ f) \circ f^{-1}) \\
&= (f \circ f^{-1})^{-1} \circ (f \circ \id_X \circ f^{-1}) \\
&= (f \circ f^{-1})^{-1} \circ (f \circ f^{-1}) \\
&= \id_X,
\end{align*} %]]></script>
<p>so it doesn’t matter if we undo <script type="math/tex">f</script> beforehand or afterwards.</p>
<p>Let <script type="math/tex">S(X)</script> be the set of symmetries of <script type="math/tex">X</script>.
<script type="math/tex">S(X)</script> is closed under the associative binary operation <script type="math/tex">\circ</script>; and has an identity element <script type="math/tex">\id_X</script> with respect to that operation; and every element has a two-sided compositional inverse.
Furthermore, <script type="math/tex">S(X)</script> consists of maps <script type="math/tex">X \to X</script>, so there is a natural “action” <script type="math/tex">\def\End{\mathrm{End}}S(X) \to \End(X)</script>, which is just an inclusion, that associates each symmetry to the corresponding (endo)transformation of <script type="math/tex">X</script>.
Denoting with a slight abuse of notation the action of <script type="math/tex">f \in S(X)</script> on <script type="math/tex">X</script> by <script type="math/tex">\def\act{ {.}}f\act X</script>,
we verify <script type="math/tex">f\act(g\act X) = f\act g(X) = f(g(X)) = (f \circ g)(X) = (f \circ g)\act X</script> and <script type="math/tex">\id_X\act X = \id_X(X) = X</script>.</p>
<p>Because we can interface with <script type="math/tex">S(X)</script> through the action, it really doesn’t matter what the elements of <script type="math/tex">S(X)</script> are,
so long as we know how to associate them to transformations in <script type="math/tex">\End(X)</script> that behave correctly.
A <strong>group</strong>, then, is an abstract version of a collection of symmetries.
Namely, it comprises a set <script type="math/tex">G</script> equipped with an associative binary operation <script type="math/tex">\cdot</script>, such that there exists an identity <script type="math/tex">e \in G</script>, and every element <script type="math/tex">g \in G</script> has an inverse <script type="math/tex">g^{-1}</script>.
<script type="math/tex">S(X)</script> is a group under the operation of composition.</p>
<p>Likewise, an abstract <strong>group action</strong> is an operation <script type="math/tex">G \to \End(X)</script> assigning elements of <script type="math/tex">G</script> to transformations of <script type="math/tex">X</script>.
<script type="math/tex">S(X)</script> acts naturally on <script type="math/tex">X</script> by embedding <script type="math/tex">S(X)</script> into <script type="math/tex">\End(X)</script>.</p>
<p>The astute will notice that the action of <script type="math/tex">S(X)</script> on <script type="math/tex">X</script> arising in this way is always <strong>effective</strong>,
i.e. that two transformations that act the exact same way are equal.
Relaxing this in general for our definitions causes no harm, because after some basic group theory,
we find that any action <script type="math/tex">(.)</script> can be viewed as a homomorphism of groups, and thus has a <em>kernel</em> <script type="math/tex">K</script>.
The action is effective—more commonly referred to as <em>faithful</em>—iff <script type="math/tex">K</script> is trivial,
but if <script type="math/tex">K</script> is nontrivial, we can excise that redundancy by taking the (faithful) action of the <em>quotient group</em> <script type="math/tex">G/K</script> on <script type="math/tex">X</script>.</p>
<p>We can confirm that groups are precisely the objects that arise in this way,
because every group acts <em>regularly</em>—and hence faithfully—on itself by left multiplication.
Explicitly, we can model <script type="math/tex">(G,{\cdot})</script> as <script type="math/tex">(S(X),{\circ})</script> for <script type="math/tex">X = G</script> as a set, and taking its symmetries to be all and only the translations by elements of <script type="math/tex">G</script>.
This is precisely the content of <a href="https://en.wikipedia.org/wiki/Cayley%27s_theorem">Cayley’s theorem</a>.</p>
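<p>Cayley’s theorem is easy to watch in action. A small Python check of mine, using <script type="math/tex">\mathbb{Z}/4</script> under addition, confirms that left translation realizes the group faithfully as permutations of its underlying set:</p>

```python
# Cayley's theorem by brute force for G = Z/4 under addition mod 4:
# each g in G yields the translation x |-> g + x, a permutation of G.
n = 4
G = list(range(n))

def op(g, h):
    return (g + h) % n

# translations[g][x] is the image of x under translation by g
translations = {g: tuple(op(g, x) for x in G) for g in G}

# Each translation is a genuine permutation of the underlying set...
assert all(sorted(t) == G for t in translations.values())
# ...the action is faithful: distinct elements act differently...
assert len(set(translations.values())) == n
# ...and composing translations agrees with the group operation,
# so g |-> translations[g] is an embedding of G into S(G).
assert all(translations[op(g, h)] ==
           tuple(translations[g][translations[h][x]] for x in G)
           for g in G for h in G)
```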
<hr />
<p>By relaxing our notion of symmetry, we can obtain more general objects.
For instance, if we relax the inverse condition, and replace the ‘same as’ relation by a more nebulous ‘part of’ relation, we end up simply looking at all endotransformations, and obtain <em>monoids</em> and monoid actions.</p>
<p>If we relax even further to merely <script type="math/tex">\circ</script>-closed subsets of transformations, then we obtain <em>semigroups</em> and semigroup actions.
The only surviving guarantee is the associativity provided by <script type="math/tex">\circ</script>.</p>
<p>As in the group case, there exists a Cayley theorem for semigroups,
realizing every semigroup <script type="math/tex">S</script> as an effective transformation semigroup of some object.
For the curious, the usual choice of object is the image of <script type="math/tex">S</script> under the free functor <script type="math/tex">\textsf{Semigroup} \to \textsf{Monoid}</script>, which adjoins a two-sided identity if none is present.
(This confirms the sneaking suspicion some of you might have that it doesn’t cause any trouble in general to make sure to include <script type="math/tex">\id_X</script> in the subcollection of transformations.)</p>
<p><em>Ilia Chtcherbakov</em></p>