The cheapest path problem and idempotent semirings

Let me propose an interesting theoretical variant of the shortest path problem. You have a directed graph with source and target vertices, and every edge has a cost to traverse it. However, not every cost is in good old American dollars: some are in alternative currencies, and you don’t know what the exchange rates will be until the day of your trip. You can’t tell for sure how much it’s going to cost, but there’s still a significant amount of precomputation that you can do, and we’re going to employ some fun algebra to do it.

So here’s the setup. You have a directed graph $G$ and each edge $e \in E(G)$ has a cost $c_e$ . You can add costs together $c+c'$ , and you can compare costs, so we have an ordered monoid. To be precise, this is a monoid $(M, {\cdot})$ equipped with a partial order $\le$ such that if $m \le m'$ then $m \cdot n \le m' \cdot n$ and $n \cdot m \le n \cdot m'$ , for all $m, m', n \in M$ . I know I said you add costs together but I’m going to be calling the monoid operation $\cdot$ from now on, so, like, I’m sorry for deceiving you for five seconds. The monoid identity is gonna be called $1$ , too, just so we’re clear.
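To make the definition concrete, here is a minimal sketch of one ordered monoid. It assumes (my choice, nothing above requires it) that a cost is a tuple of non-negative amounts, one per currency; the names `op`, `leq` and `ONE` are hypothetical.

```python
from typing import Tuple

Cost = Tuple[int, ...]  # assumed model: one non-negative amount per currency

def op(m: Cost, n: Cost) -> Cost:
    """Monoid operation: add the amounts currency by currency."""
    return tuple(x + y for x, y in zip(m, n))

def leq(m: Cost, n: Cost) -> bool:
    """Partial order: m <= n iff m is no more expensive in every currency."""
    return all(x <= y for x, y in zip(m, n))

ONE: Cost = (0, 0)  # the identity, here for two currencies

# (3, 0) and (0, 2) are incomparable: neither is cheaper in both currencies.
assert not leq((3, 0), (0, 2)) and not leq((0, 2), (3, 0))
# Compatibility of the order with the operation: m <= m' gives m.n <= m'.n
assert leq(op((1, 0), (2, 2)), op((1, 1), (2, 2)))
```

This particular monoid happens to be commutative and positive, which is more than the general setup asks for.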

Anyway, the task is to produce the set of all cheapest costs, which is to say, the set of all costs that are the cheapest in some linearization of the monoid $M$ : some total order $\le'$ extending the partial order. Such a total order is a characterization of prices as they are on the day of travel. The hope is that in a simple enough monoid, antichains would be reasonably small, so flattening an antichain to a final cost is not as complicated as the preprocessing that is done on the graph now.
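As a rough illustration of that last flattening step, here is a sketch that assumes costs are tuples of per-currency amounts and that the day-of-travel information arrives as a tuple of positive exchange rates. Pricing by rates is only one family of linearizations, and it can produce ties; `price` and `cheapest` are hypothetical names.

```python
def price(cost, rates):
    """Collapse a multi-currency cost to one number at the given rates."""
    return sum(amount * rate for amount, rate in zip(cost, rates))

def cheapest(antichain, rates):
    """Pick the element of the antichain that is cheapest at these rates."""
    return min(antichain, key=lambda cost: price(cost, rates))

# Three pairwise incomparable candidate costs (amounts in two currencies).
candidates = {(3, 0), (0, 2), (1, 1)}

print(cheapest(candidates, (1, 3)))  # (3, 0): currency 0 is cheap today
print(cheapest(candidates, (3, 1)))  # (0, 2): currency 1 is cheap today
print(cheapest(candidates, (2, 3)))  # (1, 1): in-between rates
```

Each candidate wins under some choice of rates, which is why none of them can be thrown away in advance.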

Notice that we are not requesting very much of the monoid of costs: in the currency example it would be very reasonable to stipulate that $M$ be commutative ( $m \cdot n = n \cdot m$ ), positive ( $1 \le m$ ), etc. (Actually, positive is an interesting condition to keep in mind as we proceed, especially if the digraph contains any directed cycles.)

The most important thing to not request is that the order be total: we are not guaranteed to compare any two monoid elements. In fact, equality is a perfectly valid partial order—every monoid is an ordered monoid under “ $m \le n$ iff $m = n$ ”—and this represents the worst case of knowing absolutely nothing about the relative costs of anything, even compared to the trivial cost $1$ .

So what are we to do? The only thing we intuitively can do: carry around all the possible costs that we see, only throwing out those $c$ for which we find a provably better cost $c' < c$ for the same task. Then, I don’t know, I guess you can run some graph algo and just keep all the incomparable results you see and throw away the big ones and it probably works out, right?

Well, I mean, yeah, it does… but for some (arguably) subtle and (debatably) interesting reasons.

So let’s start by making this precise. When solving the graph problem, instead of carrying one cost $c$ , we want to carry a subset of costs. And furthermore, whenever two costs are comparable (comparable by the order in the monoid, and also algorithmically permitted to be compared by having the same source and target) we keep the lesser. A more expensive subpath can’t possibly lead to a less expensive final path, so this checks out. So, we won’t be encountering just any old subsets of the costs, we will be dealing with antichains.

An antichain $A \subseteq M$ is a set of pairwise incomparable elements, which is to say, for all $a,b \in A$ , if $a \le b$ then $a = b$ . The set of finite antichains on a poset like $M$ will be denoted by $\mathfrak A(M)$ , because I haven’t used a fraktur letter in a while and I’m horny. If we have two antichains, we can merge them together and take the best of both, by considering the minimal elements of their union,

$A \wedge B = \text{min'l}(A \cup B).$

I am using the $\wedge$ symbol for this operation because it makes $\mathfrak A$ into a semilattice: it is associative, commutative, and idempotent ( $A \wedge A = A$ ). Formally, it makes no difference whether the semilattice operation is a join or a meet, but in our case we will intuitively be considering it a meet, because we are interested in minimal costs. As such, the partial order on $\mathfrak A$ induced by this meet is $A \le B$ iff $A \wedge B = A$ , iff for all $b \in B$ there exists $a \in A$ with $a \le b$ . This has the somewhat unfortunate side effect that subsets of antichains (which are still antichains) are larger than their supersets: if $A \subseteq B$ then $A \ge B$ . Oh well. Not everything is sunshine and rainbows.
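Here is a minimal sketch of the meet and of the order it induces, again assuming the tuple-of-amounts model of costs with the componentwise order. The function names `minimal`, `meet` and `below` are hypothetical.

```python
def leq(m, n):
    """Assumed order on costs: componentwise comparison of tuples."""
    return all(x <= y for x, y in zip(m, n))

def minimal(costs):
    """Keep only the elements that have nothing strictly below them."""
    costs = set(costs)
    return frozenset(
        c for c in costs
        if not any(d != c and leq(d, c) for d in costs)
    )

def meet(A, B):
    """A meet B: the minimal elements of the union."""
    return minimal(set(A) | set(B))

def below(A, B):
    """A <= B in the induced order, i.e. A meet B == A."""
    return meet(A, B) == frozenset(A)

A = frozenset({(3, 0), (1, 1)})
B = frozenset({(0, 2), (2, 1)})

print(sorted(meet(A, B)))          # [(0, 2), (1, 1), (3, 0)]; (2, 1) loses to (1, 1)
print(below(A, {(3, 0)}))          # True: the superset is the smaller antichain
```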

Now, if you’re me, you might be tempted to do an aside and talk about whether this semilattice is a lattice. And though it is indeed very tempting, it’s a little long to get into the details, so I will just spoil the ending: it is a lattice,

$A \vee B = \{ a \in A : a \ge b \ \text{for some}\ b \in B \} \cup \{ b \in B : b \ge a \ \text{for some}\ a \in A \},$

and the proof that it works is just ugly casework.

In any case, whether or not $\mathfrak A$ has a lattice join doesn’t actually matter, because we only care about minimizing costs. What does matter is the operation on antichains induced by the monoid operation. We know what to do with the singleton antichains— $\{m\} \cdot \{n\} = \{ m \cdot n \}$ —and since every antichain is the meet of the singletons of its elements, we can extend this to all antichains by distributivity:

$A \cdot B = \biggl( \bigwedge_{a \in A} \{a\} \biggr) \cdot \biggl( \bigwedge_{b \in B} \{b\} \biggr) = \bigwedge_{a \in A} \bigwedge_{b \in B} \{a \cdot b\}.$

This is where we rely on the fact that we defined $\mathfrak A$ to be the finite antichains. Up until this point, we could do things for all antichains, but if $\mathfrak A$ is not a complete semilattice then this infinite meet may not be defined. You can’t even dodge this by externally declaring it’s just the minimal elements of the setwise product $\{ a \cdot b : a \in A, b \in B \}$ because there’s no guarantee it has any, let alone is adequately described by them.¹
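For finite antichains the product is easy to sketch. This assumes the same tuple-of-amounts model as a stand-in for $M$ ; `op`, `leq`, `minimal` and `times` are hypothetical names.

```python
def op(m, n):
    """Assumed monoid operation: add amounts currency by currency."""
    return tuple(x + y for x, y in zip(m, n))

def leq(m, n):
    """Assumed order: componentwise comparison."""
    return all(x <= y for x, y in zip(m, n))

def minimal(costs):
    costs = set(costs)
    return frozenset(
        c for c in costs
        if not any(d != c and leq(d, c) for d in costs)
    )

def times(A, B):
    """Antichain product: minimal elements of the setwise product."""
    return minimal(op(a, b) for a in A for b in B)

A = frozenset({(2, 0), (0, 1)})
B = frozenset({(1, 0), (0, 3)})

# The four raw products are (3,0), (2,3), (1,1), (0,4); (2,3) is beaten by (1,1).
print(sorted(times(A, B)))  # [(0, 4), (1, 1), (3, 0)]
```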

This package of data $(\mathfrak A(M), {\wedge}, {\cdot})$ is an example of an idempotent semiring. Recall that a semiring $(R, {+}, {\cdot})$ is a set $R$ equipped with two monoid operations, a commutative addition $+$ with identity $0$ and a not-necessarily-commutative multiplication $\cdot$ with identity $1$ , and the further stipulations that $\cdot$ distributes over $+$ and that $0$ is absorbing for $\cdot$ . Of course, every ring is a semiring, and the most famous example not arising from a ring is the natural numbers $(\mathbb N, {+}, {\times})$ .

A semiring is (additively) idempotent if $r + r = r$ for all $r \in R$ . A particularly famous example is the tropical semiring $(\mathbb R \cup \{\infty\}, {\min}, {+}')$ , where the multiplication $+'$ is the usual real addition extended to have $\infty$ as an absorbing element. (Its fame comes from tropical geometry, a hot topic in algebraic geometry as of late.) Idempotence means the addition is a semilattice operation, and as such defines a partial order on the semiring: $a \le b$ iff $a + b = a$ .² Furthermore, because of distributivity, this order is a monoid order on the multiplicative monoid $(R, {\cdot})$ .

Exercise. Verify that for any idempotent semiring, $\le$ is a semilattice ordering of the multiplicative monoid. That is, show that $\le$ is:

reflexive: $a \le a$ ;

antisymmetric: $a \le b$ and $b \le a$ implies $a = b$ ;

transitive: $a \le b$ and $b \le c$ implies $a \le c$ ;

a meet-semilattice order: $a \le b$ and $a \le c$ iff $a \le b + c$ ;

a monoid order: $a \le b$ implies $a \cdot c \le b \cdot c$ and $c \cdot a \le c \cdot b$ .
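If you want a sanity check before attempting the proof, here is a small brute-force sketch that tests the five properties on a finite sample of the tropical semiring. It is a numerical spot check, not a proof, and the function names are hypothetical.

```python
import math
from itertools import product

INF = math.inf

def add(a, b):
    """Semiring addition in the tropical semiring is min."""
    return min(a, b)

def mul(a, b):
    """Semiring multiplication is ordinary addition, with INF absorbing."""
    return a + b

def leq(a, b):
    """Induced order: a <= b iff a + b = a (in semiring notation)."""
    return add(a, b) == a

def check_order_axioms(sample):
    for a in sample:
        if not leq(a, a):                                    # reflexive
            return False
    for a, b in product(sample, repeat=2):
        if leq(a, b) and leq(b, a) and a != b:               # antisymmetric
            return False
    for a, b, c in product(sample, repeat=3):
        if leq(a, b) and leq(b, c) and not leq(a, c):        # transitive
            return False
        if (leq(a, b) and leq(a, c)) != leq(a, add(b, c)):   # meet-semilattice
            return False
        if leq(a, b) and not (leq(mul(a, c), mul(b, c))
                              and leq(mul(c, a), mul(c, b))):  # monoid order
            return False
    return True

print(check_order_axioms([-3, 0, 1, 2.5, INF]))  # True
```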

Let’s quickly take stock of our idempotent semiring $(\mathfrak A(M), {\wedge}, {\cdot})$ .

$\mathfrak A(M)$ is the set of finite antichains of our ordered monoid $M$ .
$\wedge$ takes the minimal elements of the union of its two operands, so it’s associative, commutative, and its identity element is the empty antichain $\varnothing \in \mathfrak A$ .
$\wedge$ can be interpreted as the meet of a semilattice, so it determines a partial order $\le$ : the order it induces on the singleton antichains agrees with the monoid order on $M$ , and the order it induces on subsets of any fixed antichain agrees with the superset order (if $A \supseteq A'$ then $A \le A'$ ).
$\cdot$ takes the minimal elements of the setwise product of its operands, so it’s associative, and its identity element is $\{1\}$ , the singleton containing the identity of $M$ . $\cdot$ is commutative iff $M$ is.
$\varnothing$ is an absorbing element for $\cdot$ : $\varnothing \cdot A = \varnothing = A \cdot \varnothing$ .
$\varnothing$ is the greatest element of the poset of antichains—representing a literally priceless cost—and if $M$ is positive then $\{1\}$ is the least element.
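The items above can be spot-checked on a few sample antichains. This sketch assumes the tuple-of-amounts model of $M$ ; the names `EMPTY` and `UNIT` stand for $\varnothing$ and $\{1\}$ and, like the helper names, are hypothetical.

```python
def op(m, n):
    return tuple(x + y for x, y in zip(m, n))

def leq(m, n):
    return all(x <= y for x, y in zip(m, n))

def minimal(costs):
    costs = set(costs)
    return frozenset(
        c for c in costs
        if not any(d != c and leq(d, c) for d in costs)
    )

def meet(A, B):
    return minimal(set(A) | set(B))

def times(A, B):
    return minimal(op(a, b) for a in A for b in B)

EMPTY = frozenset()            # the empty antichain, identity for meet
UNIT = frozenset({(0, 0)})     # {1}, identity for the product

A = frozenset({(1, 0), (0, 1)})
B = frozenset({(2, 0)})
C = frozenset({(0, 2), (3, 1)})

assert meet(A, EMPTY) == A                          # identity for meet
assert meet(A, A) == A                              # idempotence
assert times(A, UNIT) == A == times(UNIT, A)        # identity for product
assert times(A, EMPTY) == EMPTY == times(EMPTY, A)  # absorbing element
assert times(A, meet(B, C)) == meet(times(A, B), times(A, C))  # distributivity
```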

Now that we have our costs in a cute little arithmetical package, we can unleash it on the problem. Recall from the setup: $G$ is a directed graph, and $c : E(G) \to M$ is an assignment of costs to the edges. The cost of a path $(e_1, e_2, \dots, e_n)$ is the product of the costs along that path, $c(e_1) \cdot c(e_2) \cdots c(e_n)$ .

Recall also that the goal is to find all possibly cheapest paths from some source $s \in V(G)$ to some target $t \in V(G)$ , subject to the indeterminacy of “some pairs of costs in $M$ may not be comparable”. In $\mathfrak A$ , we still are not able to compare costs, but if they come from paths that have the same start and end points, we can combine them without much thought, by simply taking their meet in $\mathfrak A$ . By construction, we know how to interpret $M$ in $\mathfrak A(M)$ , as singleton sets are antichains.

Immediately we can observe that a cheapest path from $s$ to $t$ is only guaranteed to exist if every directed cycle costs at least as much as the identity, that is, every directed cycle $C$ satisfies $c(C) \ge \{1\}$ . If not, then there is some linearization of the monoid order (some cost conversion eventuality on the day of your trip) where the more you travel around that cycle, the cheaper the trip will be. So in the following analysis I will silently ignore this possibility, because its treatment is exactly the same as in the shortest path problem.

A second observation is that if we have a cheapest path from $s$ to $t$ , then every subpath is also a cheapest path between its own start and endpoints. That is to say, this problem is amenable to dynamic programming in precisely the same way as the shortest path problem is. Some of you reading may already see where this is going, but for everyone else, I will take it one step at a time.

First of all, let’s look at cheapest paths of length at most one. I claim it’s pretty easy to see that it’s solved by the following function $c_1 : V(G)^2 \to \mathfrak A$ , defined as

$c_1(s,t) = \begin{cases} \{1\} & \text{if}\ s=t, \\ \bigwedge( c(e) : s \overset e\longrightarrow t ) & \text{otherwise}. \end{cases}$

$\{1\}$ of course is the least possible cost, representing free transit, which I am implicitly assuming is the cost for simply staying at a given vertex. If the digraph has at most one edge between any two vertices, then the big meet is not necessary, so long as it is acknowledged that nonedges are equivalent to edges whose cost is $\varnothing \in \mathfrak A$ .
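A sketch of building $c_1$ from an edge list, assuming costs are tuples of amounts and the graph is given as `(source, target, cost)` triples. The function `one_step` and the tiny example graph are made up for illustration.

```python
def leq(m, n):
    return all(x <= y for x, y in zip(m, n))

def minimal(costs):
    costs = set(costs)
    return frozenset(
        c for c in costs
        if not any(d != c and leq(d, c) for d in costs)
    )

ONE = (0, 0)  # identity of the assumed two-currency monoid

def one_step(vertices, edges):
    """Build c_1 as a dict keyed by (source, target) pairs."""
    c1 = {}
    for s in vertices:
        for t in vertices:
            if s == t:
                c1[s, t] = frozenset({ONE})   # staying put is free
            else:
                # Meet over all parallel edges; no edge gives the empty antichain.
                c1[s, t] = minimal(c for (u, v, c) in edges if (u, v) == (s, t))
    return c1

V = ["s", "a", "b", "t"]
E = [
    ("s", "a", (1, 0)), ("a", "t", (0, 2)),
    ("s", "b", (2, 0)), ("b", "t", (0, 1)),
    ("s", "t", (0, 4)), ("s", "t", (1, 5)),   # two parallel edges
]
c1 = one_step(V, E)

print(sorted(c1["s", "t"]))  # [(0, 4)]; the parallel edge (1, 5) is dominated
print(sorted(c1["a", "b"]))  # []; a nonedge
```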

Now, the cheapest paths of larger lengths $c_k : V(G)^2 \to \mathfrak A$ are a breeze:

$\begin{align*} c_2(s,t) &= \bigwedge_{v \in V(G)} c_1(s,v) \cdot c_1(v,t), \\ c_3(s,t) &= \bigwedge_{v \in V(G)} c_2(s,v) \cdot c_1(v,t), \\ c_4(s,t) &= \bigwedge_{v \in V(G)} c_3(s,v) \cdot c_1(v,t), \dots \end{align*}$

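And since $c_1(t,t) = \{1\}$ , we have $c_{k+1}(s,t) \le c_k(s,t)$ , which means you just need to repeat this calculation until you are satisfied that no longer paths need to be considered. (With no cost-reducing cycles, a path that revisits a vertex is never cheaper than the same path with the cycle cut out, so paths of at most $|V(G)| - 1$ edges suffice.)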
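The recurrence can be sketched as a loop that stops as soon as nothing changes. This assumes the tuple-of-amounts model of costs, which is positive and so has no cost-reducing cycles; the graph and the names `step` and `cheapest_costs` are hypothetical.

```python
def op(m, n):
    return tuple(x + y for x, y in zip(m, n))

def leq(m, n):
    return all(x <= y for x, y in zip(m, n))

def minimal(costs):
    costs = set(costs)
    return frozenset(
        c for c in costs
        if not any(d != c and leq(d, c) for d in costs)
    )

def step(ck, c1, V):
    """c_{k+1}(s, t) = meet over v of c_k(s, v) . c_1(v, t)."""
    return {
        (s, t): minimal(op(a, b) for v in V for a in ck[s, v] for b in c1[v, t])
        for s in V for t in V
    }

def cheapest_costs(c1, V):
    ck = c1
    for _ in range(len(V)):        # bounded, in case the input is ill-behaved
        nxt = step(ck, c1, V)
        if nxt == ck:              # fixpoint: longer paths add nothing
            break
        ck = nxt
    return ck

V = ["s", "a", "b", "t"]
E = {("s", "a"): (1, 0), ("a", "t"): (0, 2),
     ("s", "b"): (2, 0), ("b", "t"): (0, 1),
     ("s", "t"): (0, 4)}
c1 = {(s, t): frozenset({(0, 0)}) if s == t
      else (frozenset({E[s, t]}) if (s, t) in E else frozenset())
      for s in V for t in V}

best = cheapest_costs(c1, V)
print(sorted(best["s", "t"]))  # [(0, 4), (1, 2), (2, 1)]
```

All three routes from `s` to `t` survive, because each is cheapest under some exchange rate.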

Now, this may look kind of complicated, but you have probably seen an algorithm of this form before, though possibly in an unexpected way. You see, in the semiring $\mathfrak A$ , a computation of the form $\bigwedge_i a_i \cdot b_i$ is a dot product. We can actually view $c_1$ as a $V(G) \times V(G)$ -matrix with coefficients in the semiring $\mathfrak A$ , and then $c_k$ is just the matrix power $c_1^k$ ! The addition is unorthodox compared to your run-of-the-mill linear algebra, sure, but in the arithmetic of $\mathfrak A$ it is perfectly natural, and indeed you can view $c_1$ as an obvious adjacency matrix for $G$ with coefficients from $\mathfrak A$ .³
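As a small worked instance (a hypothetical three-vertex graph, chosen only for illustration), take vertices $s, u, t$ with edges $s \to u$ , $u \to t$ , $s \to t$ of costs $a, b, c \in M$ . Ordering rows and columns as $s, u, t$ ,

$c_1 = \begin{pmatrix} \{1\} & \{a\} & \{c\} \\ \varnothing & \{1\} & \{b\} \\ \varnothing & \varnothing & \{1\} \end{pmatrix}, \qquad c_1^2(s,t) = \{1\} \cdot \{c\} \wedge \{a\} \cdot \{b\} \wedge \{c\} \cdot \{1\} = \text{min'l}\{c,\ a \cdot b\}.$

So the $(s,t)$ entry of the square keeps both $c$ and $a \cdot b$ when they are incomparable, and only the smaller one otherwise.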

One final observation about this computation. Because of the “idempotence” property $c_{k+1} \le c_k$ , overshooting isn’t really a bad thing, so you can repeatedly square the matrix instead of naively multiplying on copies of $c_1$ , taking full advantage of the “exponentiation by squaring” principle. I don’t think this gets you any serious computational gain if you actually track the time and space complexity of building and computing on antichains, but it’s pretty cool, in my opinion.
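A sketch of the squaring variant, under the same assumptions as before (tuple-of-amounts costs, no cost-reducing cycles). The names `square` and `closure_by_squaring` and the five-vertex chain graph are hypothetical.

```python
def op(m, n):
    return tuple(x + y for x, y in zip(m, n))

def leq(m, n):
    return all(x <= y for x, y in zip(m, n))

def minimal(costs):
    costs = set(costs)
    return frozenset(
        c for c in costs
        if not any(d != c and leq(d, c) for d in costs)
    )

def square(X, V):
    """Matrix square over the antichain semiring."""
    return {
        (s, t): minimal(op(a, b) for v in V for a in X[s, v] for b in X[v, t])
        for s in V for t in V
    }

def closure_by_squaring(c1, V):
    X, reach = c1, 1               # X accounts for paths of at most `reach` edges
    while reach < len(V) - 1:      # overshooting |V| - 1 is harmless
        X, reach = square(X, V), 2 * reach
    return X

# A chain 0 -> 1 -> 2 -> 3 -> 4 plus a shortcut 0 -> 4 in the other currency.
V = list(range(5))
E = {(0, 1): (1, 0), (1, 2): (1, 0), (2, 3): (1, 0), (3, 4): (1, 0),
     (0, 4): (0, 3)}
c1 = {(s, t): frozenset({(0, 0)}) if s == t
      else (frozenset({E[s, t]}) if (s, t) in E else frozenset())
      for s in V for t in V}

best = closure_by_squaring(c1, V)
print(sorted(best[0, 4]))  # [(0, 3), (4, 0)]
```

Two squarings cover the four-edge chain here, where multiplying by $c_1$ one copy at a time would take three products.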

To me, this solution is satisfying. To some of you, it might not be.

Perhaps you imagined a stricter variant of the problem, where the task is to produce a list of paths that enacts each of the cheapest costs. Depending on who you are, this is either obviously equivalent or obviously inequivalent. I am of the former position, but regardless of whether or not you agree with me, the procedure to accommodate you is standard. In fact, this whole matrix-over-idempotent-semirings approach is essentially an algebraic recasting of the Floyd–Warshall algorithm, so that discussion may be your next destination. I myself am not particularly interested in that line of study, as it lacks a certain elegance (yes, I think $\mathfrak A$ is elegant) and is more of a necessary evil sort of math.
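For the stricter variant, one possible approach (a sketch of my own, not necessarily the standard procedure alluded to above) is to carry a witness path alongside each cost. It assumes tuple-of-amounts costs, represents an antichain as a dict from cost to one path, and keeps the first witness found for each cost; all names are hypothetical.

```python
def op(m, n):
    return tuple(x + y for x, y in zip(m, n))

def leq(m, n):
    return all(x <= y for x, y in zip(m, n))

def minimal(items):
    """items: (cost, path) pairs. Keep one witness path per minimal cost."""
    best = {}
    for cost, path in items:
        best.setdefault(cost, path)        # first witness seen wins
    return {c: p for c, p in best.items()
            if not any(d != c and leq(d, c) for d in best)}

def step(ck, c1, V):
    """One more multiplication by c_1, concatenating the witness paths."""
    return {
        (s, t): minimal(
            (op(a, b), p + q)
            for v in V
            for a, p in ck[s, v].items()
            for b, q in c1[v, t].items()
        )
        for s in V for t in V
    }

V = ["s", "a", "b", "t"]
E = {("s", "a"): (1, 0), ("a", "t"): (0, 2),
     ("s", "b"): (2, 0), ("b", "t"): (0, 1),
     ("s", "t"): (0, 4)}

def entry(s, t):
    if s == t:
        return {(0, 0): ()}                # free transit, empty path
    if (s, t) in E:
        return {E[s, t]: ((s, t),)}        # a single-edge path
    return {}                              # nonedge

c1 = {(s, t): entry(s, t) for s in V for t in V}
ck = c1
for _ in range(len(V)):
    ck = step(ck, c1, V)

for cost, path in sorted(ck["s", "t"].items()):
    print(cost, path)
```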

This topic is also a good springboard to talk about the use of idempotent semirings in combinatorial optimization. Since the late 19th century, semirings have been trying to find a way to break into mainstream algebra. While they have largely failed to uproot the stranglehold that rings have on algebra, they persist, and by the 1970s or so they finally started appearing in the work of applied mathematicians and computer scientists, who noted how much could be cast in their language. Idempotent semirings are especially valued, precisely for their ability to be viewed as an operation with a compatible and pleasant partial order. A min-plus–style semiring like $(\mathbb R \cup \{\infty\}, {\min}, {+})$ , for example, allows one to perform optimization, like in this ‘blog post, while a max-plus semiring like $(\mathbb R \cup \{-\infty\}, {\max}, {+})$ is more handy for recasting scheduling-type problems.

The study is only about as old as computer science is, and generally lies neglected out of what I can only assume is a distaste for unnecessary algebra, but it is not without its textbooks. I rather like the texture and mouthfeel⁴ of Golan, Semirings and their Applications, but I suppose it would be dishonest not to mention the newer (and not extremely worse typeset) book by Gondran and Minoux. Full disclosure, I haven’t read them all the way through (I actually never learned how to read, I just like the pretty pictures) so I can’t guarantee it’s not a waste of time, but I mean, it’s semiring theory. You’ve already admitted you don’t value your time by reading this ‘blog post all the way through, so why stop after dipping your toes in?

¹ The meet-semilattice of antichains almost coincides with the complete lattice of the upper sets. They coincide when $M$ satisfies the descending chain condition (that no sequence $a_1 > a_2 > \cdots$ can continue indefinitely), which at first blush sounds like a tough guarantee. However, it follows from the simple assumption that the monoid is positive, that is, $1 \le m$ for all $m \in M$ . On any graph, the set of costs that appear on edges is a finite set, and hence gives a finitely generated submonoid of $M$ which inherits the order, and that alongside positivity gives you the preconditions for Higman’s Lemma from universal algebra. The conclusion is that the order is a well-quasi-order, which is equivalent to DCC plus the additional fact that all antichains are finite! So in a sense, upper sets are what we should really be looking at, and antichains are simply the computational approximation to them, and the only time they don’t work as an approximation is when the antichains are infinite anyway.
² Most semiring literature defines the partial order the other way, $a \le b$ iff $a + b = b$ , because addition feels more like a join-y operation. This also has the benefit of making the additive identity $0_R$ the bottom of the semilattice, which matches the notational conventions of order theory. However, this would require an unintuitive flip somewhere in the setup of our cost minimization problem, so for exposition’s sake I will turn it around here. Still, I didn’t lie to you about the tropical semiring: the min-plus formulation is probably more common, and I don’t have an explanation for that; algebraic geometry is just weird.
³ Depending on your persuasion, you could call it either luck or predestination that semirings are precisely the objects which can be coefficients of matrices, and an idempotent semiring is a natural way of recasting a semilattice-ordered monoid.
⁴ 20. Sorry, I couldn’t think of a better place to put a weed joke.
