Izaak Meckler

A "type theoretic" principle for reasoning about smooth bundles over manifolds

Posted on September 20, 2016

\[ \newcommand{\br}{\mathbb{R}} \newcommand{\angled}[1]{\langle{} {#1} \rangle{}} \newcommand{\mrm}{\mathrm} \] I'm now taking a Riemannian geometry course, and one of the problems on the first problem set was the following:

Let \(p\) be any point in a Riemannian n-manifold \((M,g)\). Show that there is a local orthonormal frame near \(p\), that is, a local frame \(E_1,\dots,E_n\) defined in a neighborhood of \(p\) that forms an orthonormal basis for the tangent space at each point.

The way one does this problem is to pick a chart \((U, \varphi : U \to \br^n)\) with \(p \in U\), use that to pull back the standard orthonormal frame on \(\br^n\) to obtain a frame \(E_1, \dots, E_n\) on \(U\), and then essentially do Gram Schmidt pointwise with the \(E_i\) by defining \begin{align*} \mrm{normed}(X)(x) &:= \frac{X(x)}{\sqrt{\angled{x,x}}} \\ G_1 &:= E_1 \\ F_1 &:= \mrm{normed}(G_1) \\ G_{k+1}(x) &:= E_{k+1}(x) - \sum_{i=1}^k \angled{E_{k+1}(x), F_i(x)} F_i(x) \\ F_{k+1} &:= \mrm{normed}(G_{k+1}) \end{align*}

and we could go and verify that this does indeed produce an orthormal basis at each point, just as Gram Schmidt does.

This suggests a more general reasoning principle:

Principal for inner product space bundles: If \(M\) is a manifold with a bundle of inner product spaces (i.e., a vector bundle \(E\) on \(M\) with a smoothly varying inner product) then any construction or theorem in the language of inner product spaces can be carried out in \(E\), interpreting vectors as global sections of \(E\) and the inner product of vectors as a pointwise inner product of sections.

The point is that any construction carried out generically for an inner product space can be interpreted as talking about a bundle of inner product spaces instead.

For example, the Gram Schmidt construction is

Gram Schmidt: For any basis \(e_1, \dots, e_n\) of \(V\), there exists a basis \(f_1, \dots, f_n\) of \(V\) such that \(\angled{f_i, f_j} = \delta_{i,j}\).

Which, interpreting vectors as vector fields and the inner product as the pointwise inner product gives us the construction:

Gram Schmidt for Riemannian manifolds: For any frame \(E_1, \dots, E_n\) of \(V\), there exists a frame \(F_1, \dots, F_n\) of \(V\) such that \((x \mapsto \angled{F_i(x), F_j(x)}) = (x \mapsto \delta_{i,j})\)

Note that this does not say that any vector bundle has an orthonormal frame! That would be false because we cannot in general get global frames. It merely says that if we already have a global frame, we can get an orthonormal global frame. Moreover, it is not possible to obtain a basis in the language of vector spaces (although to be clear, this should be some kind of language of vector spaces as a type theory rather than in FOL, so that constructions may be defined and given semantics).

I think there should be a somewhat more general reasoning principle. Let \(\mathcal{C}\) be a category. If, waving hands vigorously, one defines a \(\mathcal{C}\)-bundle on \(M\) to be a "smoothly varying" family of objects of \(\mathcal{C}\) over \(M\), then any construction or theorem in some kind of internal language for \(\mathcal{C}\) will give a corresponding construction for \(\mathcal{C}\)-bundles in which points of objects correspond to global sections of bundles \(M\). (Perhaps we should think of a global section of a bundle \(E\) as a bundle map from the "point bundle" over \(M\) (isomorphic to \(M\)) to \(E\)).

Edit: Andrej Bauer points out on Reddit that a lot of this is explained in "Sheaves in Geometry and Logic" by Mac Lane and Moerdijk.

Parallel transport and version control

Posted on September 14, 2016
Parallel transport

One of the primary problems of version control is that two people can edit the same version of the same file, and the two sets of changes they made have to be reconciled somehow.

One of the ways of solving this in many version control systems is the notion of "rebase". Say \(v_0\) is a version of a repository and \(u\) is a child version. That is, there is some patch \(c\) taking \(v_0\) to \(u\), which we can depict diagrammatically as \[ \array{ v_0 & \stackrel{c}{\to} & u } \] Now say while we were writing version \(u\), the project we're working on progressed upstream from version \(v_0\) to version \(v_1\) with a patch \(p\). \[ \array{ v_0 & \stackrel{p}{\to} & v_1 } \] The rebase of \(c\) along \(p\) is a patch \(R_p(c)\) going from \(v_1 \to u'\) which is somehow obtained from \(c\) in the most generic possible way. And moreover, if the main version progresses from \(v_0\) to \(v_1\) to \(v_2\) as follows \[ \array{ v_0 & \stackrel{p_1}{\to} & v_1 & \stackrel{p_2}{\to} & v_2 } \] then we should have \[\begin{align}R_{p_2}(R_{p_1}(c)) = R_{p_2 \circ p_1}(c)\end{align}\] where \(p_2 \circ p_1\) denotes the patch obtained by first applying the patch \(p_1\) and then appling the patch \(p_2\). In other words, I should be able to rebase incrementally, or all in one step, and it shouldn't make a difference. We also demand that rebasing against the empty patch \(\mathrm{id}{v_0}\) does nothing. Or in other words, \[\begin{align}R_{\mathrm{id}{v_0}}(c) = c\end{align}\].

One way of summarizing this is that rebasing should be functorial. To me, this looks a lot like parallel transport of (tangent) vectors along paths in manifolds (which is also functorial in the same way). This analogy is even more plausible since we can think of a changeset \(c : v_0 \to u\) as being like a tangent vector at \(v_0\) (if the changeset \(c\) is small enough, and maybe think of it like a path if it's bigger). Making this analogy rigorous and potentially useful for designing a version control system could perhaps involve a combinatorial description of a flat connection for some discrete group (or possibly even monoid) \(G\) which connects the versions of a file.

There have been several formalisms trying to talk about version control. I think it is well known for example that merging should be something like a pushout in an appropriate category, and of course there was Homotopical Patch Theory, which formalized a version control system in Homotopy Type Theory. This is similar to the idea I've mentioned here in that patches are interpreted as a kind of path.

Rhetorical Dishonesty

Posted on May 31, 2016

I was really struck by John McWhorter's response to this question. He barely argued substantively against the idea raised by the questioner, instead using rhetoric to discredit it on other grounds.

I've transcribed the response below, along with my interpretations of the (just barely) subtextual meanings of his statements.

He begins by signalling that the idea is so ridiculous that it merits immediate ridicule.
It's brilliant in its way, and I actually discuss it in the book
"This is a terrible idea and I've already discredited it thoroughly in writing"

"If you don't have future marking in your language then that makes you more sensitive to things like saving money and keeping healthy; if you have it then you're less likely to do it."

That is the idea of Keith Chan who was a brilliant economist at Yale and this...there's a TED talk, and it got around, and the media just ate it up with a spoon
"This guy is not a linguist, and his idea was popular in venues with large audiences, which means it's crap."

so if you don't have future marking such as Chinese does not in the sense of the European language has, then that makes you more likely to save money.

And Chen used statistics and you've got to respect them
"This guy uses fancy numbers which often convey authority, but I'm going to imply that such use really evidences an absence of true holistic understanding."
and apparently that showed that all of this made sense and you know I, um, this is not name dropping but I have spoken to, to him, we have corresponded, there is no battle, but no and I can just say very simply this. It's so easy

He firmly proclaims his opinion, long apparent from other signals in such a way that he seems to believe it's almost self-evident. He declares "It's so easy", implying that the opponent's opinion is so wrong, so shoddily supported, that anyone could see it.

Czech. Slovak. There used to be a country called Czechoslovakia and there's a reason. Czech and Slovak are really, this is going to offend somebody but, they are really the same language. I mean they are variations on a theme. The savings rates of Czechs and Slovaks are vastly different and yet they're speaking the same language with the same lack of a future. I give you that one. If you look at Chen's chart it's full of that it. It just doesn't work. It's not necessarily that he did bad linguistic work. That would be easy, that would be fun.

He slipped in that way in about three places where nobody could help it. I mean you simply have to know Russian well to understand whether or not it has a future (it's not his fault).
"Chen was simply oblivious to certain understandings which I'm privy to, and that resulted in his incorrect conclusion."
But there's a little of that, and then really the chart just doesn't work
"I only have that one minor criticism, so I have to vaguely assert that his chart is just wrong and quickly conclude...
and so it's a gorgeous idea, TED Talks live forever, he is a great presenter, but that idea is utterly bereft of hope.

...by reasserting that this is a superficially attractive idea, presented in popular and thus non-credible forums by a substanceless showman, which is ultimately wofeully mistaken."

Dubious Arguments

Posted on May 25, 2016

There's a famous dubious argument that "proves" the set of \(\mathrm{Tree}\) of (planar, rooted) binary trees is in bijection with the set of \(7\)-tuples of trees \(\mathrm{Tree}^6\).

The argument goes as follows. A binary tree is either empty or a non-empty tree with two children which are binary trees. Thus we have an isomorphism \(\mathrm{Tree} \cong 1 + \mathrm{Tree}^2\). Pretending that \(\mathrm{Tree}\) is a complex number and using the quadratic formula we obtain \[ \mathrm{Tree} = \frac{1\pm 3 i}{2} = e^{\frac{2\pi i}{3}} \] so that \(\mathrm{Tree}\) satisfies \(\mathrm{Tree}^6 = 1\). Multiplying both sides by \(\mathrm{Tree}\) yields \(\mathrm{Tree}^7 = \mathrm{Tree}\) and so we conclude that the set of \(7\)-tuples of trees is in bijection with the set of trees.

This is of course all a bit silly but as Fiore and Leinster showed, there is a general principle which provides legitimate proofs for arguments such as these. Specifically they prove the following. Let \(p, q_1\) and \(q_2\) be polynomials over \(\mathbb{N}\) (with some conditions on the polynomials which I won't mention). Suppose that \(z = p(z)\) implies \(q_1(z) = q_2(z)\) for all \(z \in \mathbb{C}\). Then \(z = p(z)\) implies \(q_1(z) = q_2(z)\) in all categories with a product, coproduct, and terminal object.

This theorem is strong enough to get us our bijection \(\mathrm{Tree}^7 \cong \mathrm{Tree}\), but apparently not to fix all nonsense arguments. Consider the following. Let \(t\) be the type defined by

type t =
  | A0
  | A1 of t
  | A2 of t * t

(or in other words \(t\) is the smallest set satisfying \(t = 1 + t + t^2\) or \(t\) is the initial algebra for the functor \(X \mapsto 1 + X + X^2\) or...).

Then if we imagine \(t\) is a complex number, we have \(0 = 1 + t^2\) so that \(t = \pm i\). Let's just pick \(t = i\) for convenience since the two are indistinguishable. Then just by calculation, we have \(\frac{2}{1 - t} = 1 + t\).

Now let's interpret this equation in terms of types. By another dubious argument, \(\frac{1}{1 - t} = 1 + t + t^2 + \dots = \mathrm{List}(t)\) since \[ 1 = (1 + t + t^2 + \dots) - (t + t^2 + \dots) = (1 - t)(1 + t + t^2 + \dots). \] Note that \(1 + t + t^2 + \dots\) is the type of lists of \(t\)s since such a list is either empty, or a single element of \(t\), or a pair, or a \(3\)-tuple, or...

Our dubiously obtained equation tells us we should have a bijection \(2 \times \mathrm{List}(t) \cong 1 + t\) where \(2\) is a type with \(2\) elements. There is in fact a very nice such bijection (or really, a family of such bijections indexed by the set of infinite binary strings) described below. I believe the existence of such a thing is not provided by Fiore and Leinster's theorem. Where does it come from?

Spoiler: The bijection

Let \(X_n\) for \(n \geq 0\) be the set of elements of \(t\) which are of the form and \(Y_n\) for \(n \geq 1\) those of the form and \(Y_0 = \{ * \}\) where \(*\) is some fresh guy not in \(t\).

Note that for \(n > 0\), \(X_n \cong Y_n \cong t^n\) by the obvious map which makes a list of all the dangling children \(s_i\). It is clear that \(X_n\) and \(Y_n\) are disjoint and with a bit more thought that \(1 + t \cong \{*\} \sqcup t = \bigsqcup_{n = 0}^\infty (X_n \sqcup Y_n)\) so that \[ \begin{align} 1 + t &\cong \bigsqcup_{n = 0}^\infty (t^n \sqcup t^n) \\ &\cong \bigsqcup_{n = 0}^\infty 2t^n \\ &\cong 2 \times \bigsqcup_{n = 0}^\infty t^n \\ &\cong 2 \times \mathrm{List}(t) \end{align} \] which gives the bijection. I say this is really a family of bijections indexed by infinite binary strings because we could have chosen any "spine" to decompose an element of \(t\) along. We just happened to choose the spine \(1111111111...\) (that is, "right, right, right, right,...").

Counting in binary

Posted on April 21, 2016

This past monday, after wandering around the Upper East Side for some time, I went for a jaunt up to East Harlem to visit my friend Brad Cohn to fill some time before meeting Zoe.

Brad has been teaching 4th graders in an afterschool program. The other week, since the kids had regents tests, they had no homework in their regular classes, and asked Brad if they could just play games. He told them "no" but said they could learn about whatever they wanted. They decided they wanted to learn about computers, and among other things, Brad taught them how counting in binary works. He did so in a pretty algebraic/symbolic way, by way of explaining exponentiation, etc.

But! We then thought of a -- I think -- more intuitive way to teach the concept which sidesteps (or sneakily explains) a lot of the necessary background concepts. The idea is to draw a complete binary tree of some depth (say, 4):
Now let's say you've got a number \(n\) (let's use \(11\)) in mind which you'd like to write in binary. Count along the leaves (starting with 0 of course) until reaching leaf number \(n\):

Next, let's find the path from the root node to the \(n^{\text{th}}\) leaf, as a sequence of instructions "take the left branch" or "take the right branch". In our example, the path is

  • take the right branch
  • take the left branch
  • take the right branch
  • take the right branch

Just to be concise, we can say "0" instead of "take the left branch" and "1" instead of "take the right branch". So, we can write our path as "1011". And this is how to write \(11\) in binary. Now I think there's lots of nice questions to ask kids or people new to math/computers after presenting this picture:

  • How do we know there's only one path from the root to a given leaf?
  • (After explaining binary notation in terms of sums of powers of two): Why does this method work?
  • Does this method work well for large numbers or does it take too long? (It does because counting along the leaves requires converting your number to unary.) What could we change to make it take less time?

One reason why the interaction between addition and multiplication is the interesting part of number theory

Posted on January 18, 2016

I've often heard people say that the interesting part of number theory is the interaction between addition and multiplication, but I only yesterday realized one reason why that is, triggered by convincing myself that \(a b = \mathrm{lcm}(a,b)\mathrm{gcd}(a,b)\) (a train of thought triggered by seeing the letters "GCD" graffitied on a wall in Chicago).

The reason is essentially that the multiplicative and additive structures of a ring of "numbers" are individually very well understood. Let's say "number theory" is about studying unique factorization domains which are finitely generated as an abelian group under addition. Then the additive part is well understood by the classification of finitely generated abelian groups, and the multiplicative part (considered as a monoid) is just isomorphic to the multiplicative monoid on the positive integers (given by adding the exponents in the unique factorization) times the group of units in our UFD. So that's easy to understand as well.

So, any statement about a ring of numbers only involving one of addition or multiplication is easy to answer, which means the only statements we can't easily answer (which includes all the "interesting" statements) are ones involving both addition and multiplication.

Visualizing Geometries

Posted on October 28, 2015

Let me just preface this post by saying its a prerequisite-free introduction to some of the ideas underlying this page, which I recommend you go play with now.

Sometimes a straight line isn't a straight line, at least not at first glance. Consider, for example, the surface of a sphere.

The equator marked in that picture is a "straight line" in the sense that for two points on the equator, the shortest path between them is a segment of the equator. And, in general, for any two points on a circle, the shortest path between them is a segment of a circle going around the whole sphere. We call the shortest path between two points a geodesic.

The geometry of the sphere is different from the usual geometry of the plane in other ways as well. For example, we know from middle school that the angles of a triangle always add up to 180 degrees. The sphere has "triangles" whose angles add up to more than 180 degrees, as you can see below. We can describe the situation as follows. Suppose you are an ant on the surface of a bowling ball (or a human on the surface of the earth!). You start at the equator at the point B. Then you face due north and walk to the north pole. Once there, you turn 90 degrees to your right and walk due south to equator. Finally, you turn 90 degrees to your right and walk back to B, where you started. The path you traced out had 3 points and consisted of "straight lines", so we say it is a triangle. And, notice that each of the angles was 90 degrees, so the angles of this triangle add up to 270 degrees, a good sight more than the expected 180.

Imagine we took a balloon (more or less a sphere) and popped it with a needle at the top Then, we'll sort of stretch out the balloon so its flat, stretching out the hole at the top, and keep stretching out further and further to infinity unitl it covers the whole plane. The idea is illustrated by this lovely image, where this procedure has been applied to the surface of the earth. If you'd like another way to help wrap your head around this, go to to the aforementioned page and select "Stereographic sphere" to explore this space. The paths that you walk along are where the "straight lines" (geodesics) on the sphere as described above go when you stretch it out over the whole plane. You can even make triangles whose angles add up to more than 180 degrees.

Roughly speaking, to give a geometry on the plane is to give a notion of distance between points. That is, a way of assigning to each pair of points a number (which we think of as the distance between them). What we did by flattening out the sphere to the whole plane was to transfer the notion of distance we had for points on the sphere to a notion of distance for points on the plane: If you want to know the distance between two points on the plane in this new notion of distance, see what points on the sphere are lying on top of them, and take it to be the distance between those two points.

Geometries can be a lot more exotic than that of the sphere, but the basic idea of a system of distances between points remains the same. Actually, we'll ask for a little more. We'll ask for a notion of length, which is to say a way of assigning numbers (lengths) to paths. I say it's a little more since once we have that, we can define the distance between two points to be the length of the shortest path (geodesic) between them. You can play with some more exotic geometries (including the the Poincare disk and the upper half plane, where the angles of triangles add up to less than 180 degrees) using this tool as mentioned above.

String diagrams, traversables, and positive braids

Posted on July 18, 2015

Today I want to describe some kind of possible connection between what in Haskell-land are called traversable functors and "positive braids. I really don't have many answers here, so I'd say the primary purpose of this post is to explain an observation and then invite comment from others.

Natural Transformations

Throughout this post I'll talk about natural transformations. If you're a mathematician, just think of it as a natural transformation. If you're a Haskeller, think of a natural transformation from a(n) (endo)functor \(F\) to a functor \(G\) as a polymorphic function of type \(\texttt{forall a. F a -> G a}\).

String diagrams

Skip to the next section if you know what string diagrams are.

Given a collection of natural transformations \(\eta_1, \eta_2, \cdots\), there are a bunch of other natural transformations we can define. Namely, we can

  • compose natural transformations whose types match up, and
  • given \(\varphi : S \to T\), for any functor \(F\) we can define both
    • \(F\varphi : FS \to FT\) (written \(\texttt{fmap}\; \varphi\)) for a Haskeller) and
    • \(\varphi F : SF \to TF\) (this is just the restriction of \(\varphi \texttt{ :: forall a. S a -> T a}\) to the type \(\texttt{forall a. S (F a) -> T (F a)}\)).

(where juxtaposition of functors denotes composition). We thus obtain a set of terms inductively defined out of the base set of \(\eta_i\).

Some equations automatically hold between terms built up in this way. The basic equations are that

  • for any \(\varphi : A \to B\), \(\epsilon : C \to D\), we have \(B\epsilon \circ \varphi C = \varphi D \circ A\epsilon\) by naturality
  • for any \(F, \varphi : A \to B, \epsilon : B \to C\), by functoriality \(F(\epsilon \circ \varphi) = F\epsilon \circ F\varphi\)
  • for any \(F, \varphi : A \to B, \epsilon : B \to C\), \((\epsilon \circ \varphi)F = \epsilon F \circ \varphi F\) which is clearly true since these are just restrictions.

Thinking about which terms built up out of functor-mapping and composition are equal is a bit annoying because its not always clear how to apply these equations to check. String diagrams solve this problem. They are an alternative notation for describing natural transformations built out over some base set, but one in which the naturality equations hold automatically so that recognizing when things are equal is a lot easier.

A string diagram is essentially a directed graph with edges labeled by functors and vertices labeled by natural transformations, along with an embedding in the plane (up to isotopy) in which all edges travel downward from source vertex to target vertex.

To really understand string diagrams, let's see how we can translate terms as described above into string diagrams (by "pattern matching" on the term). This will essentially explain the meaning of string diagrams.

  • A basic \(\eta_i : S_{i,1} \cdots S_{i,n} \to T_{i,1} \cdots T_{i,m}\) gets sent to the string diagram
  • If the types of \(\varphi\) and \(\epsilon\) match up so we can compose them, and if and are the string diagrams corresponding to \(\varphi\) and \(\epsilon\) respectively, then is the string diagram corresponding to \(\epsilon \circ \varphi\).
  • If \(\varphi\) has corresponding string diagram as above, then is the string diagram corresponding to \(F\varphi\).
  • If \(\varphi\) has corresponding string diagram as above, then is the string diagram corresponding to \(\varphi F\).

Let's see why naturality holds automatically. Remember that naturality says that for any \(\varphi : A \to B\), \(\epsilon : C \to D\), we have \(B\epsilon \circ \varphi C = \varphi D \circ A\epsilon\). Using our translation into string diagrams, the left side has string diagram while the right side is But these are equal since we can smoothly smush around the picture on the left to get the picture on the right. The other equations (functoriality and the fact that restriction respects composition are even more obviously true on-the-nose for string diagrams).


What I'll call braid diagrams are a sort of extension to string diagrams in which certain basic natural transformations \(TF \to FT\) have some kind of structure (exactly what kind will be discussed later) that makes it justifiable to draw them like this as two strands crossing over each other instead of as a "black box" vertex as in a regular string diagram.

(If you know a bit about braids, note here that I'm ignoring the group structure and working only with the monoid of positive braids. That is, braids where all "over" strands cross from left to right as you move from top to bottom.)

The exact conditions required on a natural transformation to draw it like that amount to guaranteeing that we can treat the "strands" and "crossings" of a braid diagram as physical strands of string and crossings of strands without getting into trouble.

Braids in Haskell

A while ago I noticed some candidate natural transformations to play the role of such braiding maps. Namely, for any \(T\) a traversable and \(F\) an applicative functor we have \(\texttt{sequenceA :: forall a. T (F a) -> F (T a)}\). Hereafter, I'll notate this map \(\sigma^T_F\),

Now, my hope is to represent \(\sigma^T_F\) as the diagram For this to be justifiable (i.e., to be able to manipulate diagrams containing this one as if that was a crossing of physical strings) we have to check that a few basic equations hold, which then guarantee the validity of all possible equations resulting from treating that as a crossing of strings. These equations are which is just a particular instance of the naturality of \(\sigma^{T_1}_{F_1}\) and which is not immediately clearly true. Keeping things grounded, In Haskell the first equation is written

(fmap . fmap) sequenceA . sequenceA = sequenceA . (fmap . fmap) sequenceA

and the second is

sequenceA . fmap sequenceA . sequenceA = fmap sequenceA . sequenceA . fmap sequenceA

To check if this second equation is true, let's first take a detour investigating the laws holding of traversables and how they look when interpreted as equations of braid diagrams. One of them is "linearity", which says that for any traversable \(T\) and applicative functors \(F\) and \(G\), the following diagram commutes Or, as an equation of braid diagrams, this looks like

Another is "naturality for applicative transformations". It says that for any natural transformation \(\alpha : F \to G\) which preserves the applicative structure (this is called an "applicative transformation" see e.g., here or here) we have that the following diagram commutes: Or, as an equation of braid diagrams, this looks like which says we can pull strands over applicative transformations.

Ok, now we can try to use these laws to prove the equation we wanted. The first equation follows from linearity, as does the last. The second equation follows from an application of "naturality for applicative transformations" assuming that the map \(\sigma^T_F\) is in fact an applicative transformation.

I know of some non-trivial \(T\) where \(\sigma^T_F\) is an applicative transformation for each \(F\), but they are all "fixed sized" functors. This includes for example, the (traversable, applicative) functor given by

data Diagonal x = Diagonal x x

Another example is, for \(m\) a monoid, (,) m. By \(T\) being "fixed size", I mean that length . (toList :: forall a. T a -> [a]) is a constant function. I think this should be sufficient since I think (i.e., I have an argument but didn't work it out fully) it implies \(T \cong X \mapsto A \times X^n\) for some \(n\) and for some \(A\) and (I think! I didn't check) the traversals for such functors are always applicative transformations.


Let \(\mathcal{T}\) be the class of traversable applicative functors \(T\) such that \(\sigma^T_F\) is an applicative transformation for each \(F\). Then by the above, for any \(T_1, \dots, T_n \in \mathcal{T}\) and any permutation \(\tau \in S_n\) (the symmetric group on \(n\) thingies), the natural transformations \(\eta : T_1 \cdots T_n \to T_{\tau(1)} \cdots T_{\tau(n)}\) defined over the \(\sigma^{T_i}\) are characterized by the positive braids project to \(\tau\) under the homomorphism from the braid group \(B_n\) to \(S_n\). If one wanted to decide the equality of two programs giving such natural transformations, one could simply compare them as braids, using Garside's/Thurston's algorithm for deciding equality of positive braids (see for example the accurately but to the uninitiated misleadingly named Word Processing in Groups).

My remaining questions are as follows.

  • Is \(T\) being fixed size necessary for \(\sigma^T_F\) to be an applicative transformation for all \(F\)?
  • If not, what are counterexamples and what is a necessary condition?

Discussion on reddit

Exhaustiveness checking in 2 + n easy steps

Posted on May 15, 2015

I recently implemented exhaustiveness checking for the Elm compiler. "Exhaustiveness checking" refers to the process of examining a case expression and seeing if all possible cases are covered by one of its branches.

The algorithm I wrote (which has surely been rediscovered many times) is quite simple and intuitive. The basic idea is to think of a pattern of type \(t\) as representing a set of values: namely the set of values of type \(t\) which match that pattern. For example, a variable pattern of type Int corresponds to the set of all values of type Int. And the pattern _ : xs (where the underscore is a variable pattern of type Int) corresponds to the set of non-empty lists of Ints.

We can "check" if a case on a value of type \(t\) is exhaustive as follows:

  1. Let \(S\) be the whole set of values of type \(t\). \(S\) will be the set of values which have yet to be covered by a pattern.
  2. March down the list of cases. At each branch, update \(S\) by removing from it all values that are covered by that pattern.
  3. \(S\) is now the set of values which are not covered by any pattern in the case expression. So, if \(S\) is empty then the case is exhaustive, and otherwise \(S\) contains examples of values which are not covered.

Now we just need a way to represent sets of values that lets us easily remove another set of values.

Let's take our representation of sets of values to be [Pattern]. We'll imagine that the set of values corresponding to ps :: [Pattern] is the union of the sets corresponding to p for each p <- ps.

Looking at (2) in our algorithm, we need a way to implement the difference of two sets. I.e., given sets \(A\) and \(B\), we have to produce the set containing all elements of \(A\) which are not in \(B\). Let's define \(A \setminus B := A \cap \overline{B}\) where \(\overline{B}\) represents the complement of \(B\) (in some universal set let's say).

So, if we can compute a complement operation and an intersection operation, then we can implement our algorithm. More precisely, for ps :: [Pattern] and q :: Pattern we want to compute

setMinus ps q
  = intersect ps (complement q)
  = union [intersectOneMany p (complement q) | p <- ps]`


union :: [ [Pattern] ] -> [Pattern]
union = concat

intersect :: [Pattern] -> [Pattern] -> [Pattern]
intersect ps qs = union [intersectOneMany p qs | p <- ps]

intersectOneMany :: Pattern -> [Pattern] -> [Pattern]
intersectOneMany p = mapMaybe (intersectOneOne p)

complement      :: Pattern -> [Pattern]
intersectOneOne :: Pattern -> Pattern -> Maybe Pattern

So, we just need to implement these last two functions and we'll be done. The details of these are easy enough to work out (depending on what our particular pattern type is). If you're interested, you can try and work them out yourself for Elm's pattern type or read my implementation.

Animating Abstractly

Posted on February 19, 2015

The goal of this post is to describe a design philosophy I've developed for the purposes of writing interactive animated programs. It's really just a particular instance of the more general philosophy of writing programs as an abstract language paired with an interpreter. I'm not sure who this idea is originally due to, but it's quite a nice one.

I've been TA'ing a course in functional programming which is being done in Elm, and as a result I've been writing quite a few interactive programs. Today we'll talk about the simple example of animating operations on a purely functional queue.

I tried to make this example as simple as possible, but no simpler, and so we'll have to start by describing how to efficiently implement queues in a purely functional setting. If you already know the standard way to do this, as described in Okasaki, feel free to skip the next section.


In a purely functional language, computation takes on an extremely local nature. All we can immediately know about some value is some fixed "length" view into it afforded by pattern matching. That is to say, there is no random access to the far-flung reaches of a data structure. Because of this, implementing a FIFO queue with \(O(1)\) enqueue and dequeue (hereafter "put" and "pop" respectively) seems impossible. To do so would require \(O(1)\) access to both the front and the back of a data structure. But if we change our thinking about what the "front" or the "back" of a data structure is, we can nearly achieve these bounds.

Our representation of queues will be

type alias Queue a = (List a, List a)

A pair

(front, back) : Queue a

represents a queue in the order

front ++ List.reverse back

In this representation the front of the queue is the front of the first list in the pair and the back of the queue is the front of the second list in the pair, so we get \(O(1)\) access to both the front and the back!

  • Popping an element off the front of the queue (assuming it's non-empty) is as easy as taking the head of the first list in the pair.
  • Pushing an element onto the back of the queue is just consing onto the second list in the pair, since we're representing the back of the queue in reverse.

If the front view of the queue is empty when we try to pop, we reverse the back and try to pop from there. Reversing takes time \(O(n)\) where \(n\) is the length of the back view, but assuming we're always operating on the new queue, we won't have to do it again for at least another \(n\) steps, so it has an amortized cost of \(O(1)\).

pop : Queue a -> Maybe (a, Queue a)
pop q = 
  case q of
    ([], [])         -> Nothing
    ([], back)       -> pop (List.reverse back, [])
    (x::front, back) -> Just (x, (front, back))

put is as simple as

put : a -> Queue a -> Queue a
put x (front, back) = (front, x::back)

Let's sort of inline the definition of List.reverse so that we can get a feel for how the queue elements are shuffled over from the back list to the front list.

pop : Queue a -> Maybe (a, Queue a)
pop q = 
  case q of
    ([], [])         -> Nothing
    (x::front, back) -> Just (x, (front, back))
    ([], back)       -> 
      let rightToLeft (front', back') = case back' of
            []          -> (front', [])
            x :: back'' -> go (x::front', back'')
      pop (go ([], back))

Now that we know how queues can be implemented efficently in a purely functional language, we can get on with the animating of the various queue operations.

Just so you know, here's what we'll have at the end of the day.

A conceptual framework for animating: an application of the language-interpreter pattern

My philosophy of animating is this: There is some underlying state of the system we are trying to animate. Often this state changes discretely, like the state of a queue being operated on. Any particular discrete change may be animated continuously, but we should try to represent the discrete changes independently, and let the animation be a sort of projection of these discrete changes, applied at the end of the day and interchangable with any other animation if we desire.

So, we now must have a little dialogue with ourselves.

  • Q: What is it that we want to animate?
  • A: We want to animate the push and pop commands being performed on the above queue data.
  • Q: Where will the commands for the queue come from?
  • A: They will come from the user.
  • Q: How do we represent this input from the user.
  • A: We make a type reifying the possible commands the user can make
type QueueCommand a
  = Put a
  | Pop
  | NoOp

The first interpretation

The QueueCommands themselves aren't directly animatable. The Pop command in particular could have many different animations associated to it depending on the state of the queue: Sometimes it just pops an element off the left list, sometimes it causes the right list to be reversed and shuffled over to the left list. These two things should of course be animated differently. So, we have to somehow interpret the abstract QueueCommand a type as a type with more specific information about operating on a queue.

With that in mind, we define the following type, which more directly represents the state of our system (the queue) as our two operations (Pop and Put) are performed on it.

type PuttingState a = PuttingRight a (Queue a)

type PoppingState a
  = PoppingLeft a (Queue a)
  | RightToLeft a (Queue a)

type Execution a
  = Popping (List (PoppingState a))
  | Putting (List (PuttingState a))
  | NoOpping (Queue a)

where Execution a gives the trace of states an operation put our system through Now we can give our first "interpretation" and interpret QueueCommand a in the type

Queue a -> (Execution a, Queue a)

since we can think of a QueueCommand a as meaning a function from queues to an execution trace of that command and the queue after the completion of that command. That is to say, we'll define a function

interpretCommand : QueueCommand a -> (Queue a -> (Execution a, Queue a))

which maps from the syntactic QueueCommand type to the more semantic interpretation type. The idea is to keep our types as tight as possible for as long as possible, moving to a semantically richer, less constrained type at the last possible moment. (When I say less constrained, I'm referring to the fact that there are many terms of the interpretation type Queue a -> (Execution a, Queue a) which aren't interpreting one of our queue commands).

As for how we actually write this function, note that if we have a value of type PoppingState a, we can always either take a step in the popping algorithm (shuffle the next element from the back queue to the front queue) or finish. With that in mind, we define

type OrDone x a = Done (Queue x) | StillGoing a

stepPopping : PoppingState a -> OrDone a (PoppingState a)
stepPopping ps = case ps of
  PoppingLeft x q             -> Done q
  RightToLeft x (front, back) -> case back of
    []       -> StillGoing (PoppingLeft x (front, back))
    y::back' -> StillGoing (RightToLeft y (x :: front, back'))

Similarly, we can define stepPutting.

stepPutting : PuttingState a -> OrDone a (PuttingState a)
stepPutting (PuttingRight x (front, back)) = Done (front, x :: back)

Now a function which builds up the history of the intermediate states in performing some opertation, as well as returning the queue in its post-operation state.

record : (s -> OrDone a s) -> s -> (List s, Queue a)
record step s = case step s of
  Done q        -> ([s], q)
  StillGoing s' -> let (ss, q) = record step s' in (s :: ss, q)

and now we can easily define

interpretCommand : QueueCommand a -> Queue a -> (Execution a, Queue a)
interpretCommand qc = case qc of
  Put x -> \q -> first Putting <| record stepPutting (PuttingRight x q)
  Pop   -> \q -> case q of
    ([], [])         -> (NoOpping q, q)
    ([], x::back)    ->
      first Popping <| record stepPopping (RightToLeft x ([], back))
    (x::front, back) ->
      first Popping <| record stepPopping (PoppingLeft x (front, back))


first : (a -> a') -> (a, b) -> (a', b)
first f (x, y) = (f x, y)

Animating: the second interpretation

Now to do the actual animating, another layer of interpretation. For this we'll use the type Piece in the piece library. Conceptually, a value of type Piece ForATime a is an a-valued function of Time which is defined from the beginning of time only up to some finite time (think a "piece" of a piecewise function). It is intended to represent the idea of an animation, which is a function of time paired with a duration. For example, we'll represent the position of a falling box as such a value. A value of type Piece Forever a is an everywhere defined a-valued function of Time. A Piece needs to be defined forever before we can safely "run" it to get an animation, since the user's browser doesn't just disappear when an animation completes.

To animate, we'll give an interpretation of Execution a. It's clear that an Execution a gives the basic skeleton for an animation, and our interpreter will fill in the details.

interpretExecution : Execution a -> Piece Forever Form
interpretExecution =
  let animWith anim = Piece.sustain << List.foldr1 (<>) << List.map anim in
  \e -> case e of
    Popping pss -> animWith drawPopping pss
    Putting pss -> animWith drawPutting pss
    NoOpping q  -> Piece.stayForever (drawQueue q)

where (<>) : Piece ForATime a -> Piece t' a -> Piece t' a combines two pieces to get a new one which acts like the first argument until its time is up, and subsequently acts like the second. function. It's a sort of monoid operation for Piece which lays pieces end-to-end. Piece.sustain is the function that extends a partial function of time to a total one by continuing on constantly with the final value of the partial function.

Finally, we put it all together by first calling interpretCommand on the user's command, and then animating the resulting animation with interpretExecution.

main : Signal Element
main =
  Signal.foldp (\qc (_, q) ->
    first interpretExecution <| interpretCommand qc q)
    (Piece.stayForever (drawQueue empty), empty)
    (Signal.subscribe commandChan)
  |> Signal.map fst
  |> Piece.run (Time.every 20)
  |> Signal.map (\f -> 
       flow down
       [ collage 500 500 [moveY -190 f]
       , container 500 40 middle <| buttons

The rest of the program is just the actual drawing code and is available here.

Short words: The beginnings of a new game

Posted on February 2, 2015

I'm currently TA'ing a class on functional programming, and today I demoed my Elm library "stage" for building up piecewise functions of time, typically to make animations.

Check it out

As part of the demo, I showed a small game I built using the library which I call "Short Words". Here it is. There's currently only one (short) level and things are a bit rough around the edges. The goal is to move the blue "R" to match the red "R" using the available moves. The number of moves you have remaining is shown at the top left.

The word "word" in the title "Short Words" refers to "word" in group theory. The idea is that in the game you're given a particular set of elements of the isometry of group (i.e., the transformations performable using the buttons) and you have to construct a short word representing the isometry given by the position/orientation of the red "R". (Strictly speaking it's more like a word in a monoid since I don't give you inverses.)

Update: Just added a few new levels.

Boxes and Types

Posted on January 9, 2015

Usually, I think of types as something very concrete. After all, I spend much of my time "directly" interacting with them while writing programs. But, every so often, I have some variant of this conversation

  • Person at party: What do you study?
  • Me: Math and computer science.
  • P: Oh, what sort of things?
  • Me: Topology, complexity theory, type theory
  • P: Type theory? What's that?
  • Me: Oh! Well it's about...

and at this point I realize that types maybe aren't so concrete.

Recently I had a similar realization, trying to explain one-holed-contexts to Kasia (with whom I developed the following ideas). After a minute or two of trying, I started drawing some pictures.

"Forget that", I said of whatever tack I was currently taking, "A type is like a box."

A "box"

The above image, for example, is the type \(X \times X\) (for some type \(X\)). This type is represented by a box with two \(X\)-shaped holes in it, each waiting to be filled by something \(X\)-shaped. The inhabitants of the type \(X \times X\) are exactly the "complete configurations" of this box. That is, the configurations of this box with all holes filled (we'll do away with holes in a moment, but this is how I explained it at the time).

It's a bit like this game that young type theorists sometimes play:
Even kids love type theory

In more generality, if we have box pictures for types \(A\) and \(B\), the box picture for \(A \times B\) is obtained by gluing these two pictures together like so:

By convention, let's glue them vertically to represent \(\times\). So, evidently, a complete configuration of the \(A \times B\) box is a complete configuration of the \(A\) box along with a complete configuration of the \(B\) box.

The story for \(A + B\) (that's Either A B for the Haskell programmers) is slightly more complicated. First we glue the boxes for \(A\) and \(B\) together (this time our convention will be to glue them horizontally) and then we put a "sliding cover" on the resultant box, which can be slid back and forth to cover either the \(A\) side or the \(B\) side, like so:

The cover lets us ignore the box under it. That is, it excuses us from configuring it. So, we can observe that a complete configuration of the \(A + B\) box is either a configuration of the \(A\) box or a configuration of the \(B\) box, as expected.

The unit type, \(1\), corresponds to a featureless, holeless box since there is exactly one configuration of such a box. Notice that now we don't really need the holes idea anymore, since we have box pictures for \(1\), \(+\), \(\times\), (and we're about to get one for \(\mu\)-types).

There's a straightforward interpretation of fixed point types in box pictures as well. Consider the functor \(F : \mathrm{Type} \to \mathrm{Type}\) defined by \(FX = 1 + X \times X\). In Haskell notation, we might write this (with suggestive names) as

data F x
  = EmptyTree
  | Node x x

We model this as a box drawing as follows: Start drawing the picture for \(F X\), without knowing what \(X\) is. When you encounter an \(X\), replace it with a drawing of \(FX\), and so on. Here's what this looks like for our functor \(F\).

The collection of configurations reachable from the starting configuration of this box in finitely many slides corresponds to the type of finite depth binary trees, \(\mu X. F X\). However, if we permit the configuration to have "infinitely much detail" so to speak, then the collection of configurations corresponds to \(\nu X . F X\), the type of possibly infinite depth binary trees.

If one only has finitely much time to drawing Configurations of this drawing correspond

The greatest fixed point of this functor, \(\nu X. FX\), is the type of possibly infinite depth rooted binary trees.

Let's imagine we made all the doors transparent in this picture so we can see the whole box at once.
The type of binary trees

This picture has quite a lot of symmetries, or near symmetries in any case. We can imagine sort of pushing the hole picture down into the bottom right quarter for example.

I don't have a real definition of the kind of symmetry I'm referring to, but it seems it's essentially represented by a binary tree with one of the leaves missing, in a one-holed context fashion.

Here are some examples of what I'm talking about.

The map \(f\) takes a tree \(T\) and plugs it in at the indicated spot of the indicated tree. Similarly for \(g\).

This isn't exactly a symmetry as traditionally conceived since such symmetries don't have inverses. They form only a semi-group (or a monoid if we throw in the identity) and not a group as a collection of symmetries typically does. However, these maps are injective, so inverting them isn't hopeless. The problem is that their image isn't the whole type of binary trees. It seems like we could somehow enlarge the type of trees (and correspondingly the above picture of the type of trees) to allow these near symmetries to really be symmetries. That's what I've been trying to figure out recently, but it's not totally clear to me how to proceed yet.

My current thinking is as follows. In order to restore symmetry to the image of the tree type, we must zoom out. There is a near symmetry taking the whole image to the bottom quarter, so it should have as its inverse a symmetry taking the bottom quarter one level up. In order for such a symmetry to make sense, the top node should actually be the left child of some other node in our zoomed out picture. But since there should be an inverse symmetry taking the upper right quarter (the right child) to the whole image, the top node should be able to act as the right child of some node as well. So we zoom out as required, allowing the top node to be either a left child or a right child:
Growing our space backwards

The same argument applies to these two new nodes, and we zoom out on them as well, continuing ad infinitum to get a new kind of picture. However, I'm not sure how to interpret this object type theoretically, or even how to compute with it. It also remains to see if there's a simple box picture for function types.

Party time

Posted on January 2, 2015

Here's something that I've been meaning to write for a very long time. It's a little simulation of people at a party. You can check it out here and the code is here.

These dots know how to get down

The model is simple. At every moment, each person has a collection of goals (e.g., drink alcohol, talk to person \(p\), get away from person \(p\)) and each goal has associated with it a utility and a location. It's worth noting that the utilities do depend on the person's state: someone who is already very drunk will assign a lower utility to drinking alcohol. At every moment, each person picks the goal of maximum utility and moves an infinitessimal amount toward it.

The colored lines in the above drawing represent individuals opinions of those around them. A green line indicates affinity and a red line indicates dislike.

Infinite Games in Hyperbolic Groups

Posted on July 9, 2014

The other day I was sitting in The Coop in Cambridge, reading a book on combinatorial games, and so naturally I started wondering about games played in groups, or on their Cayley graphs at least. After a short T ride home, I discussed the idea with my roommate and I settled on the following game:

Let \(G\) be a group with a specified presentation \(\langle a_1, \dots, a_m \mid R \rangle\) and let \(\Gamma\) be the corresponding Cayley graph. Fix some \(k : \mathbb{N}, k \geq 1\). Then the game \(E_{\Gamma, k}\) is played as follows:

There are two players, which for vividness I'll call Rocket and Gravity. The players start on some vertex \(v_0\) corresponding to the identity \(e\) in \(G\). The two players alternate turns, starting with Gravity. On its first move, Gravity moves to any vertex \(v_1\) adjacent to \(v_0\). Since we are in a Cayley graph, this amounts to choosing a generator.

Now suppose it is the \((i + 1)^{\text{th}}\) turn. If it is Gravity's turn, then as before, they select an adjacent vertex \(v_{i + 2}\), but now we must have \(v_{i + 1} \neq v_i\). In other words, the players may not backtrack. In yet other words, the word in \(G\) corresponding to the path the players are building must be freely reduced. If it is Rocket's turn, they are allowed to choose \(v_{i + 2}\) to be any vertex within \(k\) edges of \(v_{i + 1}\), again subject to the restriction that they may not move from \(v_{i + 1}\) to \(v_i\).

For each \(i\), let \(w_i\) be the word labelling a shortest path from \(v_0\) to \(v_i\). Then Gravity wins if \[ \limsup_{i \to \infty} |w_i| = \infty \] and Rocket wins otherwise. I.e., if \[ \limsup_{i \to \infty} |w_i| < \infty \] So, Gravity is trying to pull our hero away to infinity, while Rocket must struggle nobly till the end of time to stay within some bounded neighborhood of the origin.

The question then is, for a given \(\Gamma\), what is the least \(k\) such that Rocket has a winning strategy for the game \(E_{\Gamma, k}\). Since \(\Gamma\) is the Cayley graph of a finitely presented group, we have a weak upper bound (assuming every generator appears in some relator). Namely \(\max_{r \in R} |r| - 1\), where \(R\) again is our collection of relators. If Rocket has this many moves, then their winning strategy is simply to complete some relator containing the generator just chosen by Gravity.

Due to Yann Olivier

It seems clear that in some cases this must be a lower bound as well. For example, consider the genus 2 surface group \(\langle a, b, c, d \mid [a, b][c, d]\rangle\), shown above. If Rocket is given only 6 moves per turn, it seems clear that Gravity can succeed in pushing us out to boundary.

In fact this is the case, and we can say something more general: Take any group \(G = \langle a_1, \dots, a_m | R \rangle\) with corresponding Cayley graph \(\Gamma\), and let's assume that \(R\) is symmetrized (closed under inverses and cyclic permutations) for convenience. If \(G\) is \(C'(1/6)\) and for any \(a_i\) there is an \(a_j\) such that \(a_i a_j\) does not appear as a subword of any relator in \(R\), then Rocket requires at least \(\max_{r \in R} |r| - 1\) moves per turn to have a winning strategy. Using the same proof, we can say something more general than this, but it is a bit complicated to state, so I won't mention it for now.

So let's prove this. We'll give Rocket \(\ell := \max_{r \in R} |r| - 2\) moves per turn, and construct a winning strategy for Gravity.

To prove this we'll use, as with all facts about small cancellation groups, Greendlinger's lemma.

The logic proof is to exhibit a strategy for Gravity such that

  1. If it's Gravity's turn and a geodesic segment corresponding to the current position is \(w\), then Gravity can choose a generator \(a\) such that \(wa\) is still geodesic. In other words, Gravity can increase the distance from the origin by 1 on each turn.

  2. If it's Rocket's turn and a geodesic segment corresponding position is \(w\), then for any \(s\) of length at most \(\ell\) (not backtracking on \(w\)) the length of the shortest word equal to \(w\ell\) (which I'll denote \(\|w\ell\|\) is at least \(|w|\). That is, Rocket can never decrease the distance to the origin.

It then follows that the \(w_i\) are getting monotonically longer, so the strategy succeeds.

I'll follow up this post eventually with proofs and pictures.

2001 Frames: A Markov model movie

Posted on January 21, 2013

If I say the word "swimming" to you, you've got a fair bit of information about what word I'm going to say next. Most likely I'll say "pool", but I might say "is" or any other verb, or something else. Conversely I'm probably not going to say "banana" or "golf" after "swimming". By examining a large chunk of text, for any word w in the text, we can produce a probability distribution on what a random word following w will be. With all these distributions in hand, we can build a new text as follows: Randomly choose some word. From the distribution on words that follow this word, randomly select the next word. Now with the distribution on words that follow this new word, randomly select another word, and continue in this manner for as long as you like. The object underlying this technique is called a Markov chain, here's a good intuitive explanation:

A Markov Chain describes things that change in a way that has no memory. What happens in the future doesn’t depend on what happened in the past. Picture a drunk, staggering home after a night out. Each step he takes is in a random direction. He might recognise the local shop, and walk towards it – but if he gets lost, and finds the shop again, there is nothing to stop him making the same mistake twice and walking in circles – because he can’t remember where he’s been, but only where he is.

People sometimes use Markov Chains to generate poetry, absurdist jokes, or nonsensical text. I used the same basic idea to generate movies by looking at the colors of frames from 2001: A Space Odyssey. I.e., given that the color of the current frame is c, what is the probability distribution on the color of the following frame. Specifically, I discretized the HSV space into 144 by dividing hue and saturation each into 12 parts, and tagged images with the bucket which a plurality of their pixels fell into. Then I computed \(freq(Frame_{i + 1} = x \mid Frame_i = C)\) for each color bucket \(C\) and used those frequencies to train the Markov chain. I also tried using average color, but I found the results weren't as good. This movie, 2001 Frames (actually more like 1998 to fit with the music), is a single run of a markov chain trained in this manner on frames from 2001: A Space Odyssey. I hope you enjoy it.

2001 Frames from Izaak Meckler on Vimeo.

The code is on Github.