ARC proposes a computational no-coincidence conjecture formalising why structured circuits should be explainable

The Alignment Research Center has published a formal conjecture that attempts to capture in computational terms the intuition that surprising mathematical coincidences must have underlying explanations — a principle the post traces to a recent paper in Annals of Mathematics and Philosophy by Fields medalist Timothy Gowers.

Gowers’s informal principle, as quoted in the ARC post, states that “if an apparently outrageous coincidence happens in mathematics, then there is a reason for it.” The ARC post sets out to formalise this in a restricted setting relevant to the organisation’s interpretability research.

The resulting conjecture, which ARC calls the computational no-coincidence conjecture, concerns reversible circuits. The post defines the relevant property P(C) as follows: a reversible circuit C mapping 3n-bit inputs to 3n-bit outputs satisfies P(C) if no input ending in n zeros maps to an output that also ends in n zeros. The post explains why this property is highly unlikely to arise by chance: modelling a random reversible circuit as a truly random permutation, the probability that none of the 2^(2n) qualifying inputs maps to an output with n trailing zeros is roughly e^(-2^n) — exponentially small.

The conjecture states that, despite this extreme improbability for a random circuit, circuits that do satisfy P(C) must have internal structure that explains why, and this structure can be expressed in a polynomial-length advice string. More precisely: there exists a polynomial-time verification algorithm V such that for any circuit satisfying P(C), there exists a polynomial-length advice string pi where V(C, pi) = 1, while for 99 percent of random reversible circuits, no such pi exists.

The post sets out ARC’s reasoning for arriving at this specific formulation through earlier attempts, both of which had problems. A first attempt using circuits mapping n-bit inputs to a single bit, with the coincidence being that the circuit always outputs 1, was abandoned because random circuits of that type do not behave like random functions — many natural distributions over random circuits produce mostly constant functions. A second attempt using circuits from 2n bits to n bits, with the coincidence that the circuit never outputs the all-zeros string, was discarded because a naive sampling-based verifier could trivialise the conjecture without needing the structural advice string at all. The reversible-circuit formulation is presented as resolving both problems.

ARC connects the conjecture to complexity theory, noting that it is a weakened version of a known open problem. The post states that determining whether P(C) holds for a given circuit is coNP-hard, and that if the “99 percent” qualifier in the conjecture were replaced with “100 percent,” the resulting statement would be equivalent to NP = coNP. ARC describes the weakened form as, to its knowledge, a new variant.

The post explains why the conjecture matters for ARC’s interpretability research: if circuits exhibiting surprising properties always have polynomial-length structural explanations, then the mechanistic estimation agenda described in ARC’s companion post on matching sampling is not blocked in the worst case by the existence of circuits that defy short description. The no-coincidence conjecture is presented as a desideratum that heuristic explanation frameworks need to satisfy.

ARC notes it is not committed to the specific formulation presented, and says that if a sampling-based verifier can be constructed that trivialises the conjecture, the statement would need to be revised. The post invites input from theoretical computer scientists.