# Indistinguishability amplification

\(\newcommand{\ob}{\mathcal{O}}\) \(\newcommand{\Q}{\mathbb{Q}}\) \(\newcommand{\what}{\widehat}\) \(\newcommand{\bz}{\mathbb{Z}}\) \(\newcommand{\bn}{\mathbb{N}}\) \(\newcommand{\Set}[1]{\{#1\}}\) \(\newcommand{\angled}[1]{\langle #1 \rangle}\) \(\newcommand{\PSPACE}{\mathsf{PSPACE}}\) \(\newcommand{\IP}{\mathsf{IP}}\) \(\newcommand{\PH}{\mathsf{PH}}\) \(\newcommand{\NP}{\mathsf{NP}}\) \(\renewcommand{\P}{\mathsf{P}}\)

This blog post is about a generalization of indistinguishability obfuscation, which might be useful.

Indistinguishability obfuscation (IO) is a cryptographic primitive (which may or may not exist). The usual way it's defined is as follows.

Definition: An indistinguishability obfuscator \(\ob\) is a probabilistic function satisfying the following.

If \(C, D : 2^n \to 2\) are two circuits computing the same function, then \(\ob(C) \cong \ob(D)\) (where \(\cong\) denotes computational indistinguishability).

A way to think about this definition is that IO takes two values which are indistinguishable according to a certain weak class of distinguishers (namely, those that are only allowed to evaluate circuits at points and observe the outputs) and promotes that to indistinguishability by arbitrary distinguishers.
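
As a toy illustration of the weak distinguisher class (this is just query access, not an obfuscator; the circuits here are my own example), here are two syntactically different circuits with identical truth tables:

```python
from itertools import product

# Two syntactically different "circuits" for the same function (XOR,
# written two ways). These play the roles of C and D in the definition.
def C(x, y):
    return x ^ y

def D(x, y):
    return ((x & ~y) | (~x & y)) & 1

# A distinguisher that may only evaluate the circuit at points sees the
# same behavior from both: the truth tables agree everywhere.
table_C = [C(x, y) for x, y in product([0, 1], repeat=2)]
table_D = [D(x, y) for x, y in product([0, 1], repeat=2)]
assert table_C == table_D == [0, 1, 1, 0]
```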

Thought of this way, a generalization of this functionality becomes apparent. Before discussing it in full generality, let's examine another instance of this kind of *indistinguishability amplification*.

Definition: Consider the field (for example) \(\Q[i]\), with \(a + b i\) represented as the pair \((a, b)\).

An *indistinguishability amplifier* for \(\Q[i]\) is a probabilistic function \(\ob\) satisfying the following. Suppose \(\alpha, \beta \in \Q[i]\) are indistinguishable by all circuits that only use the following operations to compute on values from \(\Q[i]\):

- \(\mathsf{mul} : \Q[i] \times \Q[i] \to \Q[i]\)
- \(\mathsf{add} : \Q[i] \times \Q[i] \to \Q[i]\)
- \(\mathsf{div} : \Q[i] \times \Q[i] \to \Q[i] + \{ \texttt{divide\_by\_zero} \}\)
- \(\mathsf{equal} : \Q[i] \times \Q[i] \to \{ 0, 1 \}\)

Then \(\ob(\alpha) \cong \ob(\beta)\). Moreover, we demand that the above operations can be lifted to work on obfuscated numbers. I.e., there are functions \(\what{\mathsf{mul}}, \what{\mathsf{add}}, \dots\) such that

- \(\what{\mathsf{mul}}(\ob(x), \ob(y))\) is an obfuscation of \(\mathsf{mul}(x, y)\)
- \(\what{\mathsf{add}}(\ob(x), \ob(y))\) is an obfuscation of \(\mathsf{add}(x, y)\)
- If \(y \neq 0\), \(\what{\mathsf{div}}(\ob(x), \ob(y))\) is an obfuscation of \(\mathsf{div}(x, y)\), and \(\what{\mathsf{div}}(\ob(x), \ob(0))\) is the plaintext \(\texttt{divide\_by\_zero}\).
- \(\what{\mathsf{equal}}(\ob(x), \ob(y))\) is the plaintext bit \(\mathsf{equal}(x, y)\).

In other words, \(\ob\) is an indistinguishability amplifier for \(\Q[i]\) if whenever two numbers \(\alpha, \beta\) are *algebraically* indistinguishable, \(\ob(\alpha)\) and \(\ob(\beta)\) are *computationally* indistinguishable. For example, since \(i\) and \(-i\) satisfy all the same rational polynomials, they are algebraically indistinguishable according to our definition, and so their obfuscations \(\ob(i)\) and \(\ob(-i)\) would be computationally indistinguishable.
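
To make the algebraic side concrete, here is a small Python model of \(\Q[i]\) with exactly the four allowed operations (the representation and the sample circuit are mine, for illustration). It checks that evaluating a circuit at \(-i\) just conjugates the result of evaluating it at \(i\), so every observable \(\mathsf{equal}\) bit agrees:

```python
from fractions import Fraction as F

# Elements of Q[i] as pairs (a, b) = a + b*i, with the four allowed operations.
def add(x, y): return (x[0] + y[0], x[1] + y[1])
def mul(x, y): return (x[0] * y[0] - x[1] * y[1], x[0] * y[1] + x[1] * y[0])
def equal(x, y): return x == y
def div(x, y):
    if y == (F(0), F(0)):
        return "divide_by_zero"
    n = y[0] * y[0] + y[1] * y[1]  # |y|^2
    return ((x[0] * y[0] + x[1] * y[1]) / n, (x[1] * y[0] - x[0] * y[1]) / n)

def conj(x): return (x[0], -x[1])  # the automorphism exchanging i and -i

i, minus_i = (F(0), F(1)), (F(0), F(-1))

def circuit(a):
    # a sample algebraic circuit built only from the allowed operations
    one = div(a, a)                  # a/a = 1 (fine here, since a != 0)
    t = add(mul(a, a), one)          # a^2 + 1
    return t, equal(t, mul(a, a))    # an output value and an observed bit

v1, b1 = circuit(i)
v2, b2 = circuit(minus_i)
assert v2 == conj(v1)       # outputs at i and -i are conjugate...
assert b1 == b2             # ...and every observable bit agrees
assert v1 == (F(0), F(0))   # indeed i^2 + 1 = 0, and likewise for -i
```

Since conjugation fixes \(\Q\) and is a field automorphism, this pattern holds for every such circuit, which is exactly the algebraic indistinguishability of \(i\) and \(-i\).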

I'll describe the full general formal setup in a follow-up post.

# Quadratic proof-complexity lower bounds from large scale geometry

Today I want to talk about a quadratic proof complexity lower bound for a certain coNP-complete problem (in a certain proof system). The most interesting thing about this lower bound is that it comes from a simple application of theorems from large scale geometry/geometric group theory, and so further demonstrates the utility of the large-scale-geometric point-of-view for thinking about computational complexity. (I've written previously about such an application here.)

The coNP-complete problem will be the word problem for reversible circuits. Let's say again what that is. There is a certain subgroup \(G\) of the group of permutations of bi-infinite bitstrings, \(S(2^\bz)\). It is generated by the following elements:

- \(\sigma\), the shift map.
- For each of the \(2^3!\) possible 3-bit reversible gates \(g\), the map \(\what{g}\), which applies this gate to the first three bits.
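
These generators can be sketched concretely. Below is a minimal Python model (a toy of my own: bitstrings with finite support are represented by their set of 1-positions, and the shift direction is an arbitrary convention):

```python
# Bi-infinite bitstrings with finite support, represented as the set of
# positions holding a 1.
def shift(s):
    # sigma: shift every bit by one position (direction is a convention)
    return {i - 1 for i in s}

def apply_gate(perm, s):
    # perm is a permutation of {0,...,7}, i.e., a reversible 3-bit gate.
    # It acts on the bits at positions 0, 1, 2 and fixes all other bits.
    block = sum(1 << k for k in (0, 1, 2) if k in s)
    new_block = perm[block]
    rest = {i for i in s if i not in (0, 1, 2)}
    return rest | {k for k in (0, 1, 2) if (new_block >> k) & 1}

# Example gate: swap bits 0 and 1, written as a permutation of 3-bit values.
swap01 = [((b & 1) << 1) | ((b >> 1) & 1) | (b & 4) for b in range(8)]

s = {0, 2, 5}  # the bitstring with 1s at positions 0, 2, and 5
assert apply_gate(swap01, s) == {1, 2, 5}
assert shift(s) == {-1, 1, 4}
```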

The word problem for \(G\) is coNP hard since reversible circuits can be encoded as elements of the group. Given a reversible circuit \(C\), let \(|C|\) denote the length of the word representing \(C\) via this encoding.

Via this encoding, we can prove the following lower bound:

Theorem: Suppose we have a finite collection of rewrite rules \[p_i \to q_i\] where \(p_i\) and \(q_i\) are reversible circuits which compute the same function, such that whenever \(C\) computes the identity function there exists a sequence of at most \(L(|C|)\) rewrites taking \(C\) to the empty string. Then for some constant \(c\), \(L(n) \geq c n^2\) infinitely often.

*Proof.* Suppose we had such a set of rewrite rules \(\Set{p_i \to q_i : i \in I}\). Consider the set of relations \(R = \Set{ p_i q_i^{-1} : i \in I}\). I claim that \(\angled{\sigma, \what{g_1}, \dots, \what{g_{2^3!}} | R}\) is a presentation for \(G\) (where \(g_1, \dots, g_{2^3!}\) are the \(2^3!\) possible gates on \(3\) bits).

Call the presented group \(H\). First, \(H\) surjects onto \(G\) via the map \(\varphi\) which sends each generator to the corresponding generator in \(G\): every relation \(p_i q_i^{-1}\) holds in \(G\) since \(p_i\) and \(q_i\) compute the same function (which means they are equal as elements of \(G\)). Second, \(\varphi\) is injective. If \(w \in \ker \varphi\), then \(w\) is a circuit representing the identity function, so by assumption there is a sequence of rewrites taking \(w\) to the empty string. Each of these rewrites \(p_i \to q_i\) corresponds to an equality which holds in \(H\) (since \(p_i =_H q_i\)). Thus we have \(w =_H e\), the identity, so \(\ker \varphi\) is trivial.

So \(\angled{\sigma, \what{g_1}, \dots, \what{g_{2^3!}} | R}\) is a presentation for \(G\). Moreover, since every trivial word \(C\) admits a sequence of at most \(L(|C|)\) rewrites taking it to the empty string, the Dehn function for \(G\) with this presentation is at most \(L(n)\). Suppose now that for every constant \(c\), \(L(n) < c n^2\) eventually. Then, by the isoperimetric gap theorem, \(G\) satisfies a linear isoperimetric inequality.
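
For reference, the Dehn function being bounded here is the standard one: writing \(\mathrm{Area}(w)\) for the least number of rewrite (relator) applications taking a trivial word \(w\) to the empty word, \[ \mathrm{Dehn}(n) = \max \left\{ \mathrm{Area}(w) : w =_G e,\ |w| \leq n \right\}. \] A linear isoperimetric inequality means \(\mathrm{Dehn}(n) = O(n)\).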

But then (see Theorem 3.22 here), \(G\) must be hyperbolic. However, this is a contradiction. To show \(G\) is not hyperbolic, it suffices to exhibit an infinite torsion subgroup: that is, an infinite subgroup \(H \leq G\) such that for every \(h \in H\), \(h^k = e\) for some \(k\) (where \(e \in G\) is the identity). Let \(g \in G\) be negation of the \(0\)th bit. I.e., \[ g(w) = i \mapsto \begin{cases} \neg w(i) & i = 0 \\ w(i) & \text{otherwise} \end{cases} \] Clearly \(g^2 = e\). Now consider \(g_k := \sigma^{-k} g \sigma^k\) for \(k \in \bn\). This is negation of the \(k\)th bit. Let \(H = \angled{g_k : k \in \bn}\). \(H\) is infinite, and every \(g_k\) has order two. Moreover, each pair \(g_{k_1}, g_{k_2}\) commutes (negating bit \(k_1\) and then bit \(k_2\) is the same as negating bit \(k_2\) and then bit \(k_1\)). Thus, \(H\) is none other than \(\bigoplus_{k \in \bn} \mathbb{Z}_2\), an infinite direct sum of copies of \(\mathbb{Z}_2\). So, \(H\) is an infinite torsion subgroup, which means \(G\) cannot be hyperbolic.

## Takeaways

At this point, we've shown that any system of "rewrite rules" for discovering triviality of circuits will take at least quadratically many rewrites on some trivial circuit. The proof uses the large scale geometry of a group \(G\) of reversible circuits. Roughly the only part that used the fact that \(G\) was a group was for certifying non-hyperbolicity via the existence of an infinite torsion subgroup. This suggests that it may be possible to extend this argument to non-reversible circuits by constructing a space of circuits whose non-hyperbolicity could be verified by some other means.

# When non-uniformity doesn't help

Today I want to write about a simple proof that certain problems that can be computed by polynomial-sized non-uniform read-once branching programs (also called non-uniform automata) can actually be computed uniformly just as efficiently. That is, in some cases, a degree of non-uniformity doesn't actually help solve the problem any more efficiently.

What is a non-uniform automaton? The basic idea is that of a deterministic finite automaton (i.e., a DFA, a.k.a. a finite state machine), but which is allowed to vary based on the input length. That is, a non-uniform automaton for a language \(L\) is a sequence of DFAs \(M_0, M_1, \dots\) such that for \(x\) of length \(n\), \(x \in L\) iff \(M_n\) accepts \(x\).
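
Here's a minimal Python sketch (my own toy example) of such a family: a non-uniform automaton for the non-regular language of strings with equally many 0s and 1s, where each \(M_n\) is just a counter with \(n + 1\) states.

```python
# A sketch of a non-uniform automaton: one DFA M_n per input length n.
# Toy language: strings with equally many 0s and 1s. Each M_n tracks the
# count of 1s with n + 1 states; the family is non-uniform because the
# accepting state depends on n (no single DFA decides this language).
def accepting_states(n):
    return {n // 2} if n % 2 == 0 else set()

def M(n, x):
    assert len(x) == n
    state = 0                  # states 0..n: number of 1s seen so far
    for c in x:
        if c == "1":
            state += 1
    return state in accepting_states(n)

assert M(6, "101010")
assert not M(6, "111010")
assert not M(5, "10101")
```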

The fact is the following:

Fact: If \((G, A)\) is a finitely presented group and the word problem for \(G\) can be computed by non-uniform automata with polynomially many states, then the word problem for \(G\) has a linear time (multi-tape Turing machine) algorithm.

*Proof.* Suppose the word problem for \((G, A)\) has a non-uniform automaton \(M_0, M_1, \dots\) with polynomially many states. Say \(S(n)\) is a polynomial upper bound for the number of states of \(M_n\) (assume it is monotonically increasing WLOG). Let \(R(\ell)\) denote the set of elements of \(G\) with a word of length exactly \(\ell\).

I claim that \[ |R(n / 2)| \leq S(n). \] I.e., \(M_n\) has at least as many states as there are group elements with words of length exactly \(n/2\). Why? The map from \(R(n/2)\) to the states of \(M_n\), given by running \(M_n\) on a representing word, is an injection. Take \(x, y \in R(n/2)\) with representing words \(w_x, w_y\) of length \(n/2\), and suppose \(M_n\) is in the same state \(s\) after reading either word. \(M_n\) still has \(n/2\) more input characters to read, so from this state we can feed it the word \(w_x^{-1}\), which has length \(n/2\). Since \(M_n\) must accept \(w_x w_x^{-1}\) (a trivial word), this puts it in an accepting state. But then \(M_n\) also accepts \(w_y w_x^{-1}\): reading \(w_y\) puts it in state \(s\), after which it proceeds exactly as on \(w_x w_x^{-1}\). Thus the word \(w_y w_x^{-1}\) is trivial, which means \(x = y\) in \(G\).
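
As a sanity check on these bounds (a toy instance of my own), take \(G = \mathbb{Z}\) with generating set \(\{+1, -1\}\): a word is trivial iff its letters sum to zero, and a DFA on length-\(n\) words can track the running sum.

```python
# Sanity check of |R(n/2)| <= S(n) for the simplest infinite group:
# G = Z with generating set {+1, -1}.
def R_size(l):
    # elements of Z with a word of length exactly l: {-l, -l+2, ..., l}
    return l + 1

def S(n):
    # a DFA for length-n words can track the running sum in [-n, n],
    # accepting iff the final sum is 0
    return 2 * n + 1

for n in range(0, 40, 2):
    assert R_size(n // 2) <= S(n)

# Ball growth: |B(e, r)| is the sum of |R(l)| over l <= r, bounded by
# (r + 1) * S(2r), a polynomial in r -- as in the proof below.
def ball_size(r):
    return sum(R_size(l) for l in range(r + 1))

for r in range(1, 20):
    assert ball_size(r) <= (r + 1) * S(2 * r)
```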

Now let \(B(e, r)\) denote the ball of radius \(r\) in the Cayley graph of \((G, A)\). We have \[ \begin{align*} |B(e, r)| &= \sum_{\ell \leq r} |R(\ell)| \\ &\leq \sum_{\ell \leq r} S(2 \ell) \\ &\leq (r + 1) S(2 r). \end{align*} \] \((r + 1) S(2 r)\) is a polynomial in \(r\), and thus \(G\) has polynomial growth. Thus, by Gromov's theorem, \(G\) is virtually nilpotent. By the work in this paper of Goodman and Shapiro, \(G\) then has a linear time algorithm for its word problem.

# Categorified Complexity Theory

Here's a famous theorem from complexity theory:

Theorem: \[\PSPACE \subseteq \IP\]

There are many theorems like this. Like Toda's theorem:

Theorem: \[\PH \subseteq \P^{\#\P}\]

or the PCP theorem:

Theorem: \[\NP \subseteq \mathrm{PCP}[O(\log n), O(1)]\]

In general, many theorems are class inclusions: showing that the set of problems characterized by one definition is included in the set of problems characterized by some other, surprisingly different definition.

Let's focus on \(\PSPACE \subseteq \IP\) because I know the proof the best. I want to advance the thesis here that this (and all effective proofs -- more on that later -- of class inclusions) really ought to be thought of as the decategorification of constructions of maps of "realizers".

There is a category whose objects are sets of languages (i.e., subsets of \(2^*\)) and whose morphisms are just inclusions of sets. From this point of view, the above theorems are constructions of morphisms in this category.

But they are so much more than that! The proof that \(\PSPACE \subseteq \IP\) is a uniform algorithm which takes a \(\PSPACE\) machine and produces an "\(\IP\) machine", that is, an interactive verifier, in such a way that the language computed by the \(\PSPACE\) machine is equal to the language computed by the \(\IP\) machine.

Let's flesh this out a bit. Instead of thinking of a complexity class as a set of languages, let's think of it as a set of "realizers" \(R\) equipped with a map \(L : R \to \mathcal{P}(2^*)\). So \(L\) maps each realizer to the language which it realizes.

For example, the complexity class \(\P\) has as its realizers pairs \((q, M)\) where \(q\) is a polynomial and \(M\) is a Turing machine running in time \(q(n)\), and the map \(L_\P\) from realizers to languages is \[ L_\P(q, M) = \Set{ x : M \text{ accepts } x }. \] As a more interesting example, the class \(\NP\) has as its realizers pairs \((q, V)\) where \(q\) is a polynomial and \(V\) is a polytime Turing machine taking an input \(x\) and a witness \(w\). The interpretation map \(L_\NP\) is \[ L_\NP(q, V) = \Set{ x : \exists w \in 2^{q(|x|)},\ V(x, w) = 1 }. \] In my head I think of complexity classes in this sense:

- as "diagrams" \[ \begin{CD} R \\ @VV{L}V \\ \mathcal{P}(2^*) \end{CD} \]
- as a bundle over the space of languages \(\mathcal{P}(2^*)\), above each language sitting the fiber of realizers.
- or just as the set of associated realizers

In other words, this category of complexity classes is just the over category \(\mathrm{Set} / \mathcal{P}(2^*)\) (pronounced, "sets over languages").

Let's think about how this works out in a really simple example. There's a map \(\iota : \P \to \NP\) given by \[ \iota(q, M) = (q, V), \quad V(x, w) = M(x), \] i.e., the verifier simply ignores its witness. This is a bundle map since for any \(M\) we have \[ L_\NP(\iota(q, M)) = L_\P(q, M). \] That is, the map transforms the realizer into one which recognizes the same language.
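
The same map can be written as executable pseudocode (a toy of my own: plain Python predicates stand in for Turing machines, and the time bounds are purely symbolic):

```python
# Toy version of the realizer map iota : P -> NP. A P-realizer is a pair
# (q, M); iota produces an NP-realizer (q, V) whose verifier ignores w.
def iota(q, M):
    def V(x, w):
        return M(x)            # the verifier just ignores its witness
    return (q, V)

# A P-realizer for the language of even-length strings.
q = lambda n: n + 1
M = lambda x: len(x) % 2 == 0
q2, V = iota(q, M)

# L_NP(q, V) = { x : exists w with |w| <= q(|x|), V(x, w) = 1 }. Since V
# ignores w, trying witnesses of each length is already exhaustive here.
def in_L_NP(x):
    return any(V(x, "0" * k) for k in range(q(len(x)) + 1))

# The bundle-map condition: iota preserves the realized language.
for x in ["", "a", "ab", "abc", "abcd"]:
    assert in_L_NP(x) == M(x)
```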

The \(\PSPACE \subseteq \IP\) theorem is similar. It really exhibits a map \(\PSPACE \to \IP\) in our category which transforms a \(\PSPACE\) realizer (a polyspace Turing machine) into an \(\IP\) realizer (a randomized polytime interactive verifier).

Of course, the mere existence of a map \(C \to D\) (for classes \(C\) and \(D\)) is equivalent to a containment \(L(C) \subseteq L(D)\) (where by \(L(C)\) I mean the set of languages in the image of \(C\)'s interpretation map). But most class inclusion proofs (like \(\PSPACE \subseteq \IP\)) are actually doing far more: they are exhibiting effective constructions -- that is, algorithms -- from the set of realizers of one class to the set of realizers of the other.

Apart from being (in my opinion) a more honest way of representing these kinds of facts (since I think everyone probably spends a few brain cells translating class inclusions into maps of realizers) I wonder if it would lead to new insights in complexity theory.

For example, can we get more interesting conclusions if we assume not just inclusions, but uniform inclusions? What happens if there is a "uniform inclusion" \(\NP \to \P/\mathrm{poly}\)? (It's not clear what that should mean, but maybe it would be a \(\P/\mathrm{poly}\) machine which takes an \(\NP\) verifier \((V, q)\) and a string \(1^n\) as input and produces, in time \(\mathrm{poly}(q(n), n)\), a circuit which computes the language corresponding to \(V\) on \(n\)-bit strings.) Could you prove \(\P = \NP\) under this assumption?

Also, I wonder if anything can be gained by thinking of the category of complexity classes as bundles over some base space. The task of complexity theory is then classifying bundles. We have better tools for classifying bundles when the objects have more structure (e.g., fiber bundles, vector bundles, etc.). Is there a model of computation where complexity classes are vector bundles? Can we separate \(\P\) from \(\NP\) there?

# Tighter representation theory lower bounds from stronger assumptions

Let me preface this post by saying that the things proved here are more easily proved by the simplest counting arguments. I think the proofs are still worthwhile, however, as they give indications of how complexity theoretic assumptions can be used to prove purely mathematical statements, and the way in which stronger assumptions allow you to prove stronger statements.

Last time, I talked about how complexity theoretic assumptions give you lower bounds on the dimensions of irreducible representations of the symmetric group. This time, I want to show how certain stronger cryptographic assumptions give better lower bounds on such dimensions.

Namely, we'll assume that one-way functions exist. I.e., there is some polytime computable function \(f : 2^k \to 2^m\) such that for any non-uniform PPT algorithm \(A\), we have \[ \mathrm{Pr}_{x \sim U(2^k)} \left[ A(f(x)) \in f^{-1}(f(x)) \right ] < \frac{1}{p(k)} \] for every polynomial \(p\) and all sufficiently large \(k\). What this says is that \(f\) cannot be inverted on a generic input, except with vanishingly small probability.

Now let's prove the following theorem under this assumption:

Theorem: Fix a sequence of finite fields \(F_k\) with \(\log |F_k| = \mathrm{poly}(k)\) and field operations computable in \(\mathrm{poly}(k)\) time. Let \(d_k\) be the dimension of the minimal faithful \(F_k\)-representation of \(S(2^k)\). Then for any polynomial \(q(k)\), for all sufficiently large \(k\) we have \(d_k > q(k)\).

*Proof.* Let \(S(2^k)\) be the symmetric group on binary strings of length \(k\), equipped with a generating set \(X_k\) corresponding to a collection of universal gates.

Fix a polynomial \(q(k)\). Suppose to the contrary that for infinitely many \(k\) we have \(d_k \leq q(k)\). Let \(\rho_k\) be the corresponding representation.

Then for such \(k\), we can give a non-uniform polytime algorithm for deciding the word problem in \(S(2^k)\). The advice string is \(\rho_k(x_1), \rho_k(x_2), \dots\) where the \(x_i\) are the generators. There are \(\mathrm{poly}(k)\) many generators and \(d_k\) is bounded by the polynomial \(q\), so the total advice is \(\mathrm{poly}(k)\). Now, we can just solve the word problem by taking our word, replacing generators with the corresponding matrices, multiplying them together, and checking if the result is the identity matrix. All this can be done in polynomial time.
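
The algorithm in this paragraph is easy to sketch. Here's a Python toy in which \(S_3\) with permutation matrices stands in for the hypothetical low-dimensional faithful representation \(\rho_k\):

```python
# Sketch of the algorithm from the proof: replace each generator in a
# word by its representing matrix (the advice), multiply, and compare
# with the identity. Toy instance: S_3 via 3x3 permutation matrices,
# standing in for a small faithful representation rho_k of S(2^k).
def perm_matrix(p):
    # matrix sending basis vector e_j to e_{p(j)}
    n = len(p)
    return [[1 if p[j] == i else 0 for j in range(n)] for i in range(n)]

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

identity = perm_matrix([0, 1, 2])

advice = {                         # the advice: matrices for generators
    "s": perm_matrix([1, 0, 2]),   # transposition (0 1)
    "t": perm_matrix([0, 2, 1]),   # transposition (1 2)
}

def is_trivial(word):
    m = identity
    for g in word:
        m = mat_mul(m, advice[g])
    return m == identity

assert is_trivial("ss")        # s^2 = e
assert is_trivial("ststst")    # (st)^3 = e
assert not is_trivial("st")    # st is a 3-cycle
```

Note that the comparison with the identity decides triviality precisely because the representation is faithful.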

The word problem for \(S(2^k)\) on our given generating set is NP-complete (this is somewhat imprecise; what I really mean is described in the last post). We note that inverting any one-way function can be done in polytime with an NP oracle. Since we can non-uniformly solve NP-hard problems on arbitrarily large input sizes, we can non-uniformly invert one-way functions infinitely often (come to think of it, one has to do a bit more work, but it should work out). This contradicts our cryptographic assumption.

Examining the proof, we can actually get away with a significantly weaker assumption: namely, that there is some NP-hard problem \(L\) such that for any \(\mathrm{P/poly}\) algorithm \(A\), there is some constant \(N\) such that for all \(k > N\), some \(x\) of length \(k\) has \(A(x) \neq L(x)\).

## A more quantitative assumption

If we make a more quantitative assumption, such as the following non-uniform exponential time hypothesis:

There is a universal constant \(\delta\) such that any circuit for 3-SAT on \(k\) variables has size at least \(2^{\delta k}\).

Analyzing the circuit given in the above proof, using the fact that multiplying two \(d_k \times d_k\) matrices can be done in time \(d_k^3 \mathrm{poly}(k)\), we get the inequality \[ d_k^3 \mathrm{poly}(k) \geq 2^{\delta k} \] and thus for any \(\epsilon > 0\), \[ d_k \geq 2^{\frac{1}{3} (\delta - \epsilon) k} \] for large enough \(k\).
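
Spelling out the final step: taking \(\log_2\) of both sides of \(d_k^3 \, \mathrm{poly}(k) \geq 2^{\delta k}\) gives \[ 3 \log_2 d_k + O(\log k) \geq \delta k, \] and since \(O(\log k) \leq \epsilon k\) for all large enough \(k\), this rearranges to \(\log_2 d_k \geq \tfrac{1}{3} (\delta - \epsilon) k\).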

Sadly this is still far from the truth, which is that \(d_k = 2^k - 1\). Still, a purely complexity theoretic assumption lets us get pretty close.