# Quadratic proof-complexity lower bounds from large scale geometry

\(\newcommand{\Set}[1]{\left\{#1\right\}}\) \(\newcommand{\angled}[1]{\langle #1 \rangle}\) \(\newcommand{\what}{\widehat}\) \(\newcommand{\bz}{\mathbb{Z}}\) \(\newcommand{\bn}{\mathbb{N}}\)

Today I want to talk about a quadratic proof complexity lower bound for a certain coNP complete problem (in a certain proof system). The most interesting thing about this lower bound is that it comes from a simple application of theorems from large scale geometry/geometric group theory, and so further demonstrates the utility of the large-scale-geometric point-of-view for thinking about computational complexity. (I've written previously about such an application here.)

The coNP complete problem will be the word problem for reversible circuits. Let's say again what that is. There is a certain subgroup \(G\) of the group of permutations of bi-infinite bitstrings \(S(2^\bz)\). It is generated by the following elements:

- \(\sigma\), a shift: \[\begin{align*} \sigma(w) = i \mapsto w(i + 1) \end{align*}\]
- For each of the \(2^3!\) possible 3-bit reversible gates, \(g\), the map \(\what{g}\) which applies this gate to the first three bits: \[\begin{align*} \what{g}(\dots, w_{-1}, w_0, w_1, w_2, w_3, \dots) &= (\dots, w_{-1}, g(w_0, w_1, w_2), w_3, \dots) \end{align*}\]

The word problem for \(G\) is coNP hard since reversible circuits can be encoded as elements of the group. Given a reversible circuit \(C\), let \(|C|\) denote the length of the word representing \(C\) via this encoding.

Via this encoding, we can prove the following lower bound:

Theorem:Suppose we have a finite collection of rewrite rules \[p_i \to q_i\] where \(p_i\) and \(q_i\) are reversible circuits which compute the same function and such that if \(C\) computes the identity function then there exists a sequence of rewrites of length at most \(L(|C|)\) rewriting \(C\) to the empty string.Then for some constant \(c\), \(L(n) \geq c n^2\) infinitely often.

*Proof.* Suppose we had such a set of rewrite rules \(\Set{p_i \to q_i : i \in I}\). Consider the set of relations \(R = \Set{ p_i q_i^{-1} : i \in I}\). I claim that \(\angled{\sigma, \what{g_1}, \dots, \what{g_{2^3!}} | R}\) is a presentation for \(G\) (where \(g_1, \dots, g_{2^3}!\) are the \(2^3!\) possible gates on \(3\) bits).

First, this group surjects onto \(G\) via the map \(\varphi\) which sends each generator to the corresponding generator in \(G\): every relation \(p_i q_i^{-1}\) holds in \(G\) since \(p_i\) and \(q_i\) compute the same functions (which means they are equal as elements in \(G\)). Second, this map is injective. If \(w \in \ker H\), then \(w\) is a circuit representing the identity function, which by assumption there is a sequence of rewrites taking \(w\) to the empty string. Each of these rewrites \(p_i \to q_i\) corresponds to an equality which holds in \(H\) (since \(p_i =_H q_i\)). Thus we have \(w =_H e\), the identity, so \(\ker H\) is trivial.

So \(\angled{\sigma, \what{g_1}, \dots, \what{g_{2^3!}} | R}\) is a presentation for \(G\). Moreover, since every sequence of rewrites taking a trivial word \(C\) to the identity has length at most \(L(|C|)\), the Dehn function for \(G\) with this presentation is at most \(L(n)\). Suppose for every constant \(c\), \(L(n) < c n^2\) eventually. Then, by the isoperimetric gap theorem, \(G\) satisfies a linear isoperimetric inequality.

But then \(G\) must be hyperbolic (see Theorem 3.22 here). However, this is not the case. To show \(G\) is not hyperbolic, it suffices to exhibit an infinite torsion subgroup. That is, an infinite subgroup \(H \leq G\) such that for every \(h \in H\), \(h^k = e\) for some \(k\) (where \(e \in G\) is the identity). Let \(g \in G\) be negation on the \(0\)th bit. I.e., \[ g(w) = i \mapsto \begin{cases} \neg g(i) & i = 0 \\ g(i) & \text{otherwise} \end{cases} \] Clearly \(g^2 = e\). Now consider \(g_k := \sigma^{-k} g \sigma^k\) for \(k \in \bn\). This is negation on the \(k\)th bit. Let \(H = \angled{g_k : k \in \bn}\). \(H\) is infinite, and every \(g_k\) has order two. And each pair \(g_{k_1}, g_{k_2}\) commute (if you negate bit \(k_1\) then bit \(k_2\), it is the same as negating bit \(k_2\) and then bit \(k_1\)). Thus, \(H\) is none other than \(\oplus_{k \in \bn} \mathbb{Z}_2\), an infinite direct sum of \(\mathbb{Z}_2\)s. So, \(H\) is an infinite torsion subgroup, which means \(G\) cannot be hyperbolic.

## Take-aways

At this point, we've shown that any system of "rewrite rules" for discovering triviality of circuits will take at least quadratically many rewrites on some trivial circuit. The proof uses the large scale geometry of a group \(G\) of reversible circuits. Roughly the only part that used the fact that \(G\) was a group was for certifying non-hyperbolicity via the existence of an infinite torsion subgroup. This suggests that it may be possible to extend this argument to non-reversible circuits by constructing a space of circuits whose non-hyperbolicity could be verified by some other means.

# When non-uniformity doesn't help

Today I want to write about a simple proof that certain problems that can be computed by polynomial-sized non-uniform read-once branching programs (also called non-uniform automata) can actually be computed uniformly just as efficiently. That is, in some cases, a degree of non-uniformity doesn't actually help solve the problem any more efficiently.

What is a non-uniform automaton? The basic idea is that of a deterministic finite automaton (i.e., a DFA, a.k.a. a finite state machine), but which is allowed to vary based on the input length. That is, a non-uniform automaton for a language \(L\) is a sequence of DFAs \(M_0, M_1, \dots\) such that for \(x\) of length \(n\), \(x \in L\) iff \(M_n\) accepts.

The fact is the following:

Fact:If \((G, A)\) is a finitely presented group and the word problem for \(G\) can be computed by non-uniform automata with polynomially many states, then the word problem for \(G\) has a linear time (multi-tape Turing machine) algorithm.

*Proof.* Suppose the word problem for \((G, A)\) has a non-uniform automaton \(M_0, M_1, \dots\) with polynomially many states. Say \(S(n)\) is a polynomial upper bound for the number of states of \(M_n\) (assume it is monotonically increasing WLOG). Let \(R(\ell)\) denote the set of elements of \(G\) with a word of length exactly \(\ell\).

I claim that \[ |R(n / 2)| \leq S(n) \] I.e., there are at least as many states in \(M_n\) as there are group elements with words of length exactly \(n/2\). Why? The map from \(R(n/2)\) to the states of \(M_n\) given by running \(M_n\) will be an injection: Take \(x, y \in R(n/2)\) and take words \(w_x, w_y\) of length \(n/2\) representing them. Suppose \(M_n\) is in the same state \(s\) after running on these words. \(M_n\) takes \(n/2\) more input characters, so from this state, we can feed it the word \(w_x^{-1}\), which has length \(n / 2\). Since it should accept \(w_x w_x^{-1}\) (since that's a trivial word), this will put it in an accepting state. But that means \(M_n\) will accept on \(w_y w_x^{-1}\) as well since it will first run on \(w_y\) and enter state \(s\), and then proceed as with the word \(w_x w_x^{-1}\) (since \(w_x\) also puts \(M_n\) in state \(s\)). Thus the word \(w_y w_x^{-1}\) is trivial, which means \(x = y\) in \(G\).

Now let \(B(e, r)\) denote the ball of radius \(r\) in the Cayley graph of \((G, A)\). We have \[ \begin{align*} |B(e, r)| &= \sum_{\ell \leq r} |R(\ell)| \\ &\leq \sum_{\ell \leq r} S(2 \ell) \\ &\leq r S(2 r) \end{align*} \] \(r S(2 r)\) is a polynomial in \(r\), and thus \(G\) has polynomial growth. Thus, by Gromov's theorem, \(G\) is virtually nilpotent. By the work in this paper of Goodman and Shapiro, \(G\) then has a linear time algorithm.

# Categorified Complexity Theory

\(\newcommand{\PSPACE}{\mathrm{PSPACE}}\) \(\newcommand{\IP}{\mathrm{IP}}\) \(\newcommand{\NP}{\mathrm{NP}}\) \(\newcommand{\P}{\mathrm{P}}\) \(\newcommand{\PH}{\mathrm{PH}}\) \(\newcommand{\Set}[1]{\left\{#1\right\}}\) \(\require{AMScd}\)

Here's a famous theorem from complexity theory:

Theorem\[\PSPACE \subseteq \IP\]

There are many theorems like this. Like Toda's theorem:

Theorem\[\PH \subseteq \P^{\#\P}\]

or the PCP theorem:

Theorem\[\NP \subseteq \mathrm{PCP}[O(\log n), O(1)]\]

In general, many theorems are class inclusions: showing that the set of problems characterized by one definition are included in the set of problemss characterized by some other surprisingly different definition.

Let's focus on \(\PSPACE \subseteq \IP\) because I know the proof the best. I want to advance the thesis here that this (and all effective proofs -- more on that later -- of class inclusions) really ought to be thought of as the decatgeorification of constructions of maps of "realizers".

There is a category whose objects are sets of languages (i.e., subsets of \(2^*\)) and whose morphisms are just inclusions of sets. From this point of view, the above theorems are constructions of morphisms in this category.

But they are so much more than that! The proof that \(\PSPACE \subseteq \IP\) is a uniform algorithm which takes a \(\PSPACE\) machine and produces an "\(\IP\) machine", that is, an interactive verifier, in such a way that the language computed by the \(\PSPACE\) machine is equal to the language computed by the \(\IP\) machine.

Let's flesh this out a bit. Instead of thinking of a complexity class as a set of languages, let's think of it as a set of "realizers" \(R\) equipped with a map \(L : R \to \mathcal{P}(2^*)\). So \(L\) maps each realizer to the language which it realizes.

For example, the complexity class \(\P\) has as its realizers pairs \((q, M)\) where \(q\) is a polynomial and \(M\) is a Turing machine running in time \(q(n)\) and the map \(L_\P\) from realizers to languages is \[\begin{align*} L_\P(q, M) &= \Set{ x : M(x) = 1 } \end{align*}\] As a more interesting example, the class \(\NP\) has as its realizers pairs \[\begin{align*} (q, V : 2^n \times 2^{q(n)} \to 2) \end{align*}\] where \(q\) is a polynomial and \(V\) is a polytime Turing machine of the stated type. The interpretation map \(L_\NP\) is \[\begin{align*} L_\NP(q, V) &= \Set{x : \exists y \in 2^{q(|x|)} \; V(x,y) = 1} \end{align*}\]In my head I think of complexity classes in this sense

- as "diagrams" \[ \begin{CD} R \\ @VVV{L} \\ \mathcal{P}(2^*) \end{CD} \]
- as a bundle over the space of languages \(\mathcal{P}(2^*)\), above each language sitting the fiber of realizers.
- or just as the set of associated realizers

In other words, this category of complexity classes is just the over category \(\mathrm{Set} / \mathcal{P}(2^*)\) (pronounced, "sets over languages").

Let's think about how this works out in a really simple example: There's a map \(\iota : \P \to \NP\) given by \[\begin{align*} \iota(q, M) = (0, (x, y) \mapsto M(x)) \end{align*}\] This is a bundle map since for any \(M\) we have \[\begin{align*} L_\P (q, M) &= \Set{ x : M(x) = 1 } \\ &= \Set{ x : \exists y \in 2^0 \; M(x) = 1 } \\ &= \Set{ x : \exists y \in 2^0 \; ((x, y) \mapsto M(x))(y) = 1 } \\ &= L_\NP(\iota(q, M)) \end{align*}\]That is, the map transforms the realizer into one which recognizes the same language.

The \(\PSPACE \subseteq \IP\) theorem is similar. It really exhbits a map \(\PSPACE \to \IP\) in our category which transforms a \(\PSPACE\) realizer (a polyspace Turing machine) into an \(\IP\) realizer (a randomized polytime interactive verifier).

Of course, the mere existence of a map \(C \to D\) (for classes \(C\) and \(D\)) is equivalent to a containment \(L(C) \subseteq L(D)\) (where by \(L(C)\) I mean the set of languages in the image of \(C\)'s interpretation map). But most class inclusion proofs (like \(\IP \subseteq \PSPACE\)) are actually doing far more: they are exhibiting effective constructions -- that is algorithms -- from the set of realizers of one class to the set of realizers of the other.

Apart from being (in my opinion) a more honest way of representing these kinds of facts (since I think everyone probably spends a few brain cells translating class inclusions into maps of realizers) I wonder if it would lead to new insights in complexity theory.

For example, can we get more interesting conclusions if we assume not just inclusions, but uniform inclusions? What happens if there is a "uniform inclusion" \(\NP \to \P / \mathrm{poly}\)? (It's not clear what that should mean, but maybe it would be a \(\P / \mathrm{Poly}\) machine which takes an \(\NP\) verifier \((V, q)\) and a string \(1^n\) as input and produces a circuit in time \(\mathrm{poly}{q(n), n}\) which computes the language corresponding to \(V\) for \(n\)-bit strings). Could you prove \(\P = \NP\) under this assumption?

Also, I wonder if anything can be gained by thinking of the category of complexity classes as bundles over some base space. The task of complexity theory is then classifying bundles. We have better tools for classifying bundles when the objects have more structure (e.g., fiber bundles, vector bundles, etc.). Is there a model of computation where complexity classes are vector bundles? Can we separate \(\P\) from \(\NP\) there?

# Tighter representation theory lower bounds from stronger assumptions

Let me preface this post by saying that the things proved here are more easily proved by the simplest counting arguments. I think the proofs are still worthwhile, however, as they give indications of how complexity theoretic assumptions can be used to prove purely mathematical statements, and the way in which stronger assumptions allow you to prove stronger statements.

Last time, I talked about how complexity theoretic assumptions given you lower bounds on the dimensions of irreducible representations of the symmetric group. This time, I want to show how certain stronger cryptographic assumptions give better lower bounds on such dimensions.

Namely, we'll assume that one way functions exist. I.e., there is some polytime computable function \(f : 2^k \to 2^m\) such that for any non-uniform PPT algorithm \(A\), we have \[ \mathrm{Pr}_{x \sim U(2^k)} \left[ A(f(x)) \in f^{-1}(f(x)) \right ] < \frac{1}{|p(k)|} \] for every polynomial \(k\) and for large enough \(k\). What this says is that \(f\) cannot be inverted on a generic input, except with vanishingly small probability.

Now let's prove the following theorem under this assumption:

TheoremFix a sequence of finite fields \(F_k\) with \(\log |F_k| = \mathrm{poly}(k)\) and operations computable in \(\mathrm{poly}(k)\) time.Let \(d_k\) be the dimension of the minimal \(F_k\)-representation of \(S(2^k)\). Then for any polynomial \(q(k)\), for all sufficiently large \(k\) we have \(d_k > q(k)\)

*Proof:* Let \(S(2^k)\) be the symmetric group on binary strings of length \(k\) equipped with a generating set \(X_k\) corresponding to a collection > of universal gates.

Fix a polynomial \(q(k)\). Suppose to the contrary that for infinitely many \(k\) we have \(d_k \leq q(k)\). Let \(\rho_k\) be the corresponding representation.

Then for such \(k\), we can given a non-uniform polytime algorithm for deciding the word problem in \(S(2^k)\). The advice string is \(\rho_k(x_1), \rho_k(x_2), \dots\) where the \(x_i\) are the generators. There are \(\mathrm{poly}(k)\) many generators and \(d_k\) is bounded by the polynomial \(q\), so the total advice is \(\mathrm{poly}(k)\). Now, we can just solve the word problem by taking our word, replacing generators with the corresonding matrices, multiplying them together and checking if the result is the identity matrix. All this can be done in polynomial time.

The word problem for \(S(2^k)\) on our given generating set is NP-complete (this is somewhat imprecise, what I really mean is described in the last post). We note that inverting any one-way function can be done in polytime with an NP oracle. Since we non-uniformly solve NP hard problems on arbitrarily large input sizes, we can non-uniformly invert one-way functions infinitely often, (come to think of it, one has to do a bit more work, but it should work out). This contradicts our cryptographic assumption.

Examining the proof, we can actually get away with a significantly weaker assumption. Namely, that there is some NP-hard problem \(L\) such that for any \(\mathrm{P/poly}\) algorithm \(A\), there is some constant \(N\) such that for all \(k > N\), some \(x\) of length \(k\) has \(A(k) \neq L(k)\).

# A more quantitative assumption

If we make a more quantitative assumption, such as the following non-uniform exponential time hypothesis:

There is a universal constant \(\delta\) such that any circuit for 3-SAT has size at least \(2^{\delta k}\).

Analyzing the circuit given in the above proof, using the fact that multiplying two \(d_k \times d_k\) matrices can be done in time \(d_k^3 \mathrm{poly}(k)\), we get the inequality \[ d_k^3 \mathrm{poly}(k) \geq 2^{\delta k} \] and thus for any \(\epsilon > 0\), \[ d_k \geq 2^{\frac{1}{3} (\delta - \epsilon) k} \] for large enough \(k\).

Sadly this is still far from the truth, which is that \(d_k = 2^k - 1\). Still, a purely complexity theoretic assumption lets us get pretty close.

# Using complexity theory to prove things in representation theory

The other day, Evan Shapiro and I figured something interesting out about how complexity theoretic assumptions can help to prove asymptotic statements about the representation theory of symmetric groups! At a high level, we proved that \(S_{2^k}\) can't have very small faithful representations over any finite field (given a certain complexity-theoretic assumption). It's a very crude lower bound and the minimal size of a faithful representation of the symmetric group is already known, which makes this proof a bit of a nuclear fly-swatter.

But! I think the proof is interesting because it involves basically nothing about representation theory and just uses a complexity theoretic argument. It's really weird! To me this seems like a fact about geometry of Euclidean space but you can prove it just with complexity theory in a super hands off way. (The argument itself feels a bit like using a cardinality argument to establish that not all reals are computable or something.) Also, the proof generalizes to broader classes of groups to conclude something about their rep theory apparently out of nothing.

Specifically, we proved the following:

Theorem: Fix a sequence of finite fields \(F_k\) whose elements can be represented in \(\mathrm{poly}(k)\) space and where operations can be done in time \(\mathrm{poly}(k)\).Let \((G_1, S_1), (G_2, S_2), \dots\) be a sequence of groups equipped with generating sets with \(|S_k| = \mathrm{poly}(k)\) such that the language \[L := \left\{ (1^k, w) : k \in \mathbb{N}, w \text{ a word in the } S_k,\, w =_{G_k} 1 \right\}\] is coNP-hard. Suppose further that each \((G_k, S_k)\) includes into \((G_{k+1}, S_{k+1})\). Let \(d_k\) be the dimension of the minimal faithful \(F_k\)-representation of \(G_k\).

Then if \(\mathrm{coNP} \not\subset \mathrm{P/poly}\), \(d_k\) is not bounded by any polynomial in \(k\).

Here's a proof: Suppose \(d_k\) is bounded by some polynomial in \(k\). Then we can give a \(\mathrm{P/poly}\) algorithm for the problem \(L\) as follows: Specifically, for the inputs of length \(n\), our advice string will be the \(d_n\)-dimensional faithful representation of \(G_n\), encoded as a list of \(d_n\)-dimensional matrices, one matrix \(S_i\) for each generator \(s_i\).

Now say our input is \((1^k, w)\). We have \(k \leq n\), so \((G_k, S_k)\) includes into \((G_n, S_n)\). Thus we may decide if \(w =_{G_k} 1\) by checking if \(w =_{G_n} 1\). This we do by applying our representation to \(w\). I.e., \(w = s_{i_1} \dots s_{i_r}\), so we compute the matrix \(W = S_{i_1} \dots S_{i_r}\) which we can do in time, say \(d_n^4\). Then, since the representation is faithful, \(W\) is the identity matrix iff \(w = 1\), so we can just check if \(W = I\), which is done in \(d_n^2\) time. In total the runtime of our algorithm is roughly \(d_n^4 + d_n^2\), which is polynomial in \(n\) since by assumption \(d_n\) is.

This contradicts the assumption that \(\mathrm{NP}\) is not contained in \(\mathrm{P / poly}\), so \(d_k\) is not bounded by any polynomial in \(k\).

You can strengthen this a little bit in a few ways, but I won't talk about that here. To conclude that the dimension of the smallest faithful rep of \(S_{2^k}\) isn't bounded by a polynomial in \(k\), we can notice that reversible circuits are basically words in a particular generating set of \(S_{2^k}\). Namely, the generating set consisting of

- apply CNOT on the first two bits
- apply a Toffoli gate on the first 3 bits
- swap the \(i^{\text{th}}\) bit and the \(j^{\text{th}}\) bit

The corresponding language \(L\) essentially consists of the reversible circuits which compute the identity map. Deciding whether a reversible circuit is the identity is coNP hard, so the word problem \(L\) is too and our theorem applies. The conclusion that \(S_{2^k}\) doesn't always have small representations then follows.

I think this result is interesting because it shows how complexity theoretic information can be used as a very general tool for proving things about finitely representable mathematical objects. Certain things have to get very complicated or else they could be used to give efficient algorithms for hard problems! This is also interesting because it says that if we had a more exact complexity theory (which talked about specific polynomials or at least leading terms rather than just P, etc) then we could get even more specific information about the representation theory of such groups.

I think it gives an indication as to why separations in complexity theory are so hard to prove: You need to rule out all algorithms, ones that may based on properties of arbitrary mathematical objects, which could plausibly involved in algorithms given the ubiquity of coding problems in seemingly unrelated objects. So, separating classes ends up proving properties of lots of different objects.

A big thanks to Mario Sanchez, German Stefanich, and especially Nic Brody and Patrick Lutz for thinking about related stuff with me. Thanks also to Brandon Rayhaun for pointing out this is a total nuclear flyswatter.