MIRI summarises technical paper on verifying international AI development agreements

The Machine Intelligence Research Institute has published an informal summary of “Mechanisms to Verify International Agreements About AI Development,” a paper by Technical Governance Team researchers Aaron Scher and Lisa Thiergart originally published in November 2024. The summary covers three policy goals the paper examines — tracking AI compute, verifying the absence of large-scale training runs, and certifying model evaluations — and describes potential verification methods for each.

On tracking AI compute, the post describes two approaches. The low-technology approach involves physical inspections: international inspectors visiting datacentres to count chips, examine them for tampering, audit security, and set up cameras. The post notes this is similar in structure to the START nuclear weapons treaty. The high-technology approach involves chips designed or modified to confirm their location remotely, for instance by pinging external servers and measuring latency. The post notes that current chip security measures are insufficient to prevent a well-resourced attacker from extracting a chip’s private key and spoofing its location, and that tamper-proofing remains a research target. The summary recommends tracking chips rather than datacentres, on the basis that the chip supply chain is more concentrated and easier to monitor.
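As a rough illustration of why latency measurements constrain location, the sketch below turns a round-trip time into an upper bound on distance. It is not taken from the paper: the function names are hypothetical, and it assumes only that no signal travels faster than light in a vacuum. A chip can delay its reply to appear farther away, but it cannot appear closer than it is, so the bound only rules out claims that are too far from the server.

```python
# Speed of light in vacuum, in kilometres per millisecond (physical constant).
C_KM_PER_MS = 299.792458


def max_distance_km(rtt_ms: float) -> float:
    """Upper bound on the chip-to-server distance implied by one round trip."""
    if rtt_ms < 0:
        raise ValueError("round-trip time cannot be negative")
    # The signal covers the distance twice, so halve the round-trip time.
    return (rtt_ms / 2) * C_KM_PER_MS


def claim_is_plausible(rtts_ms: list[float], claimed_distance_km: float) -> bool:
    """Check a claimed distance against the fastest observed ping.

    Hypothetical helper: the minimum RTT gives the tightest bound, since
    queuing and processing delays can only add time, never remove it.
    """
    return claimed_distance_km <= max_distance_km(min(rtts_ms))


if __name__ == "__main__":
    pings = [12.0, 10.0, 15.0]  # illustrative measurements in milliseconds
    print(round(max_distance_km(min(pings)), 2))
    print(claim_is_plausible(pings, 1000.0))
    print(claim_is_plausible(pings, 2000.0))
```

Real fibre paths are slower and less direct than this bound assumes, so a practical scheme would be tighter. As the post notes, it would also depend on the chip's private key resisting extraction.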

On verifying the absence of large-scale training runs, the post notes that very large training runs have characteristics that distinguish them from inference workloads: they require high interconnect bandwidth between chip clusters, and have distinct patterns of power draw and network activity. It describes several verification methods, including requiring datacentres to log chip activities, re-running sampled activities on mutually trusted hardware, and classifying workloads based on externally detectable metrics such as power draw and bandwidth. The post notes that indirect methods are difficult to make adversarially robust, particularly as algorithmic efficiency improves and distributed training becomes more feasible.
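To make the idea of classifying workloads from externally detectable metrics concrete, here is a deliberately simple sketch. The feature names and both thresholds are hypothetical placeholders, not values from the paper or the post, and a fixed rule like this is exactly the kind of indirect method the post warns is hard to make adversarially robust.

```python
from dataclasses import dataclass


@dataclass
class WorkloadSample:
    """Externally observable metrics for one monitoring window (assumed schema)."""
    mean_power_fraction: float  # mean power draw as a fraction of rated peak, 0..1
    interconnect_gbps: float    # sustained traffic between chip clusters, per chip


# Hypothetical thresholds, chosen only for illustration.
POWER_THRESHOLD = 0.8
BANDWIDTH_THRESHOLD_GBPS = 100.0


def flag_for_review(sample: WorkloadSample) -> bool:
    """Flag windows showing both sustained high power and high interconnect use.

    The post describes these as characteristics that distinguish very large
    training runs from inference; a flag here means "inspect further",
    not "a violation occurred".
    """
    return (
        sample.mean_power_fraction >= POWER_THRESHOLD
        and sample.interconnect_gbps >= BANDWIDTH_THRESHOLD_GBPS
    )


if __name__ == "__main__":
    print(flag_for_review(WorkloadSample(0.9, 400.0)))  # training-like profile
    print(flag_for_review(WorkloadSample(0.5, 5.0)))    # inference-like profile
```

A flagged window would feed into the direct methods the post describes, such as re-running sampled logged activities on mutually trusted hardware.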

On certifying model evaluations, the post describes the challenge of ensuring that the model being evaluated is the same model that was trained or deployed, and suggests Trusted Execution Environments as a mechanism to verify this. The post states that as of 2026, AI capabilities are “still outpacing evaluators’ ability to test them,” and that the science of evaluations remains in early stages.
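The model-identity problem can be illustrated with a plain digest comparison. This is a minimal sketch under stated assumptions, not the paper's mechanism: the function names are hypothetical, and it assumes the weights can be read as byte chunks. It also stands in for only one piece of what a Trusted Execution Environment would do, since a TEE would additionally attest that the digest was computed on the model actually being run.

```python
import hashlib
import hmac
from typing import Iterable


def weights_digest(chunks: Iterable[bytes]) -> str:
    """SHA-256 digest over model weights, streamed chunk by chunk."""
    hasher = hashlib.sha256()
    for chunk in chunks:
        hasher.update(chunk)
    return hasher.hexdigest()


def same_model(evaluated_digest: str, registered_digest: str) -> bool:
    """Compare two hex digests in constant time."""
    return hmac.compare_digest(evaluated_digest, registered_digest)


if __name__ == "__main__":
    registered = weights_digest([b"layer-0", b"layer-1"])  # toy stand-in for weights
    evaluated = weights_digest([b"layer-0", b"layer-1"])
    tampered = weights_digest([b"layer-0", b"layer-X"])
    print(same_model(evaluated, registered))
    print(same_model(tampered, registered))
```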

The summary also covers cross-cutting mechanisms: whistleblower programmes, AI-enabled workload classification, safety cases for AI deployments (structured arguments that a given application is safe, common in aviation and medical device regulation), and monitoring of deployed model behaviour. The post notes that monitoring every copy of a deployed AI remains difficult but may be feasible if model weights can be secured within a small number of datacentres.

The full paper and executive summary are linked in the post.