# When non-uniformity doesn't help

Today I want to write about a simple proof that certain problems computable by polynomial-sized non-uniform read-once branching programs (also called non-uniform automata) can in fact be computed uniformly just as efficiently. That is, in some cases, non-uniformity doesn't help solve the problem any more efficiently.

What is a non-uniform automaton? The basic idea is that of a deterministic finite automaton (i.e., a DFA, a.k.a. a finite state machine), but which is allowed to vary based on the input length. That is, a non-uniform automaton for a language \(L\) is a sequence of DFAs \(M_0, M_1, \dots\) such that for \(x\) of length \(n\), \(x \in L\) iff \(M_n\) accepts.
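As a concrete illustration of this definition, here is a minimal Python sketch. The `DFA` class and the counter family `make_M` are my own illustrative names; `make_M(n)` decides the word problem for \(\mathbb{Z}\) over the alphabet `{'a', 'A'}` (with `A` standing for \(a^{-1}\)) on inputs of length \(n\):

```python
# A minimal sketch of a non-uniform automaton: one DFA per input length.
# The names DFA, make_M, in_language are illustrative, not standard machinery.

class DFA:
    def __init__(self, start, accept, delta):
        self.start = start      # initial state
        self.accept = accept    # set of accepting states
        self.delta = delta      # dict: (state, symbol) -> state

    def accepts(self, word):
        state = self.start
        for ch in word:
            state = self.delta[(state, ch)]
        return state in self.accept

# Example family: M_n decides the word problem for Z over {'a', 'A'}
# (A = a^{-1}) by tracking the running exponent sum, which on inputs of
# length n always stays within [-n, n].
def make_M(n):
    delta = {}
    for s in range(-n, n + 1):
        delta[(s, 'a')] = min(s + 1, n)
        delta[(s, 'A')] = max(s - 1, -n)
    return DFA(start=0, accept={0}, delta=delta)

def in_language(x):
    # Non-uniformity: the DFA is chosen based on the input length.
    return make_M(len(x)).accepts(x)
```

Here `make_M(n)` has \(2n + 1\) states, so this particular family uses only linearly many states in the input length.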

The fact is the following:

**Fact.** If \((G, A)\) is a finitely presented group with finite generating set \(A\), and the word problem for \(G\) can be computed by non-uniform automata with polynomially many states, then the word problem for \(G\) has a linear time (multi-tape Turing machine) algorithm.

*Proof.* Suppose the word problem for \((G, A)\) is computed by a non-uniform automaton \(M_0, M_1, \dots\) with polynomially many states. Say \(S(n)\) is a polynomial upper bound on the number of states of \(M_n\); WLOG assume \(S\) is monotonically increasing. Let \(R(\ell)\) denote the set of elements of \(G\) represented by some word of length exactly \(\ell\).

I claim that for even \(n\), \[ |R(n/2)| \leq S(n). \] That is, \(M_n\) has at least as many states as there are group elements represented by words of length exactly \(n/2\). Why? The map from \(R(n/2)\) to the states of \(M_n\) given by running \(M_n\) on a representing word is an injection. Take \(x, y \in R(n/2)\) and words \(w_x, w_y\) of length \(n/2\) representing them, and suppose \(M_n\) is in the same state \(s\) after running on either word. Since \(M_n\) reads inputs of length \(n\), it takes \(n/2\) more input characters from this state, so we can feed it the formal inverse word \(w_x^{-1}\) (reverse \(w_x\) and invert each letter), which also has length \(n/2\). Since \(M_n\) must accept \(w_x w_x^{-1}\) (that's a trivial word), this puts it in an accepting state. But then \(M_n\) accepts \(w_y w_x^{-1}\) as well: running on \(w_y\) puts it in state \(s\), just as \(w_x\) does, and from there it proceeds exactly as on \(w_x w_x^{-1}\). Thus the word \(w_y w_x^{-1}\) is trivial, which means \(x = y\) in \(G\).
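The injection can be checked concretely for \(G = \mathbb{Z}\) with \(A = \{a\}\): a natural \(M_n\) just tracks the running exponent sum, and distinct elements of \(R(n/2)\) do land in distinct states. The sketch below is illustrative; the `run` and `witness` helpers are hypothetical names, not anything from the proof:

```python
# Empirical check of the injection for G = Z, alphabet {'a', 'A'} (A = a^{-1}).
# The counter DFA M_n has states -n..n, so S(n) = 2n + 1.

def run(word):
    """State of the counter DFA after reading `word`: the exponent sum."""
    s = 0
    for ch in word:
        s += 1 if ch == 'a' else -1
    return s

n = 12
half = n // 2

def witness(k):
    """A word of length exactly n/2 representing the integer k."""
    pos = (half + k) // 2       # number of 'a's
    neg = half - pos            # number of 'A's
    return 'a' * pos + 'A' * neg

# R(n/2) for Z: integers k with |k| <= n/2 and k = n/2 (mod 2).
R = [k for k in range(-half, half + 1) if (k - half) % 2 == 0]
states = [run(witness(k)) for k in R]

assert len(set(states)) == len(R)   # distinct elements -> distinct states
assert len(R) <= 2 * n + 1          # |R(n/2)| <= S(n)
```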

Now let \(B(e, r)\) denote the ball of radius \(r\) around the identity in the Cayley graph of \((G, A)\). Every element of \(B(e, r)\) has a word of length \(\ell\) for some \(\ell \leq r\), so \[ \begin{aligned} |B(e, r)| &\leq \sum_{\ell \leq r} |R(\ell)| \\ &\leq \sum_{\ell \leq r} S(2 \ell) \\ &\leq (r+1) S(2 r). \end{aligned} \] Since \((r+1) S(2 r)\) is a polynomial in \(r\), \(G\) has polynomial growth. Thus, by Gromov's theorem, \(G\) is virtually nilpotent. By the work in this paper of Goodman and Shapiro, \(G\) then has a linear time algorithm.
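For intuition about the growth bound, one can measure \(|B(e, r)|\) directly by breadth-first search on a Cayley graph. Here is a sketch for \(\mathbb{Z}^2\) (abelian, hence nilpotent, so Gromov's theorem predicts polynomial growth); with the standard generators the ball sizes follow the quadratic \(2r^2 + 2r + 1\):

```python
# BFS ball sizes in the Cayley graph of Z^2 with standard generators.
# Illustrative sketch: ball_size is a hypothetical helper, not from the proof.
from collections import deque

def ball_size(r):
    gens = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    seen = {(0, 0)}
    frontier = deque([((0, 0), 0)])   # (element, distance from identity)
    while frontier:
        (x, y), d = frontier.popleft()
        if d == r:
            continue                  # don't expand past radius r
        for dx, dy in gens:
            g = (x + dx, y + dy)
            if g not in seen:
                seen.add(g)
                frontier.append((g, d + 1))
    return len(seen)
```

Running `ball_size(r)` for small `r` reproduces the quadratic growth, in contrast to, say, a free group, where ball sizes grow exponentially.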