Oxford Internet Institute sends five papers to ICLR 2026 in Rio de Janeiro

Several researchers and DPhil students from the Oxford Internet Institute (OII) at the University of Oxford are attending and presenting at the 14th International Conference on Learning Representations (ICLR), held in Rio de Janeiro from April 23 to 27, 2026.

Five OII-connected papers are featured at the conference across poster sessions and workshops. The following descriptions are drawn from the institute’s own summaries of the work.

SimBench, presented by Tiancheng Hu with Paul Röttger as senior author, introduces a benchmark for measuring how well large language models can simulate human behavior. The researchers find that even the best current models “still struggle,” that performance improves with model size, and that current training methods make models less accurate on questions where humans disagree. The paper also finds that models have particular difficulty representing some demographic groups.

LLMs Encode Their Failures, presented by William Gitta Lugoloobi with Chris Russell as senior author, examines whether models can predict, before generating a response, how likely they are to answer a question correctly. The researchers report that models often encode this signal in their internal activations, and they use it to build a routing system that sends each question to the model best suited to answer it. The institute’s summary states that the approach “outperforms the strongest single model while cutting costs by up to 70%.” The paper is being presented at the Latent & Implicit Thinking workshop on April 27.

Task-Specific Knowledge Distillation via Intermediate Probes, also presented at the Latent & Implicit Thinking workshop by Ryan Brown with Chris Russell as senior author, addresses knowledge distillation from large to small models. Rather than using a large model’s final outputs as a training signal, which the researchers argue can be unreliable on reasoning tasks, the approach trains probes on the large model’s internal representations and uses the probes as supervision for the small model. The institute’s summary says the method “consistently boosts accuracy on reasoning tests, especially when there isn’t much training data,” and requires no modification to either model.

A Positive Case for Faithfulness, authored by Harry Mayne with Adam Mahdi as senior author and presented at the Trustworthy AI workshop by Justin Kang of UC Berkeley, examines whether AI models’ self-explanations of their decisions are reliable. The researchers find that, contrary to common assumptions, these explanations “are more reliable than expected” and “do give useful clues about how the model reached its answer.”

LINGOLY-TOO, presented by Karolina Korgul and Ryan Kearns with Adam Mahdi as senior author, introduces a reasoning benchmark designed to minimize the influence of memorized training data. The approach uses templatized orthographic obfuscation to prevent models from relying on patterns recognized from pretraining. The paper is being shown in a poster session in Pavilion 3 (poster P3-#1509). A companion website is at oxrml.com/lingoly-too.

Preprints for all five papers are available via the links listed in the OII’s published announcement. OII researchers are available for press interviews through the institute’s communications office.