This article summarises analysis from One Useful Thing by Ethan Mollick. Assessments and examples are Mollick’s own.

Ethan Mollick, writing in One Useful Thing, describes an experimental four-day class at the University of Pennsylvania in which executive MBA students — mostly working professionals with little or no coding experience — used Claude Code, Google Antigravity, ChatGPT, Claude, and Gemini to build startup prototypes from scratch. He then extends the observation into a framework for deciding when to delegate to AI.

The MBA startup experiment

Mollick writes that the students’ output over a couple of days was “an order of magnitude further along the path to a real startup” than he had seen from students working over a full semester before AI. Most prototypes had a core feature working, ideas were more diverse than usual, and market analyses were more detailed. He attributes part of the result not to his teaching but to the students’ domain expertise: “the key to success was actually… telling the AI what you want.”

Mollick also notes that AI lowered the cost of pivoting: because building is faster and cheaper, students could explore multiple directions without being locked into early decisions.

The delegation equation

Mollick proposes a framework for deciding when AI delegation pays off, based on three variables:

  • Human Baseline Time: how long the task would take a person to do without AI.
  • Probability of Success: how likely the AI is to produce acceptable output on a given attempt.
  • AI Process Time: the time required to write the prompt, wait for output, and evaluate it.

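He illustrates the trade-off: for a task that takes one hour but requires 30 minutes to evaluate, delegation only makes sense when the probability of success is high, because each failed attempt adds the evaluation time on top of the hour the person still has to spend doing the work. For a 10-hour task, the same overhead is small relative to the potential saving, so delegation can pay off at a much lower probability of success.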
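The trade-off can be made concrete with a minimal sketch. It assumes a simplified model that is not Mollick's stated formula: the person makes one AI attempt and, if it fails, does the whole task by hand. The function name and the assumption that the 10-hour task also needs 30 minutes of AI Process Time are illustrative only.

```python
def break_even_probability(baseline_hours: float, ai_process_hours: float) -> float:
    """Success probability at which one AI attempt neither saves nor costs time.

    Assumed model (an illustration, not Mollick's own formula):
    expected cost with AI = ai_process_hours + (1 - p) * baseline_hours.
    Setting that equal to baseline_hours gives p = ai_process_hours / baseline_hours.
    """
    if baseline_hours <= 0:
        raise ValueError("baseline_hours must be positive")
    return ai_process_hours / baseline_hours


# One-hour task with 30 minutes of AI Process Time (figures from the example above).
one_hour_task = break_even_probability(baseline_hours=1.0, ai_process_hours=0.5)

# Ten-hour task; the 30-minute overhead here is an assumption for comparison.
ten_hour_task = break_even_probability(baseline_hours=10.0, ai_process_hours=0.5)

print(f"1-hour task breaks even at  {one_hour_task:.0%}")   # 50%
print(f"10-hour task breaks even at {ten_hour_task:.0%}")   # 5%
```

Under these assumptions the one-hour task needs the AI to succeed at least half the time before delegation saves anything, while the 10-hour task breaks even at a far lower success rate.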

Mollick references the GDPval benchmark from OpenAI, which he describes as pitting experienced human experts in fields from finance to medicine to government against AI systems, with a separate group of experts acting as judges. He writes that tasks took experts an average of seven hours, while AI completed them in minutes but required an hour of expert evaluation. When the benchmark was first published, human experts won the majority of judgements; with GPT-5.2, Mollick reports, models “tied or beat human experts an average of 72% of the time.” Using those figures, he calculates that a worker who tried the AI first on every seven-hour task (prompting the AI, evaluating the output for an hour, and completing the task themselves if the AI failed) would save approximately three hours on average.
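As a rough check on the scale of that saving, the sketch below plugs the reported GDPval figures into a single-attempt model. The model, the function name, and the `prompt_hours` parameter are assumptions for illustration; Mollick's exact calculation is not spelled out here, and this simplified version will not necessarily reproduce his figure.

```python
def expected_hours_saved(
    baseline_hours: float,
    p_success: float,
    evaluation_hours: float,
    prompt_hours: float = 0.0,  # hypothetical extra overhead for prompting and waiting
) -> float:
    """Expected hours saved by trying the AI once before doing the task by hand.

    Assumed model: every attempt costs prompt_hours + evaluation_hours, and a
    failed attempt (probability 1 - p_success) is followed by the full manual task.
    """
    if not 0.0 <= p_success <= 1.0:
        raise ValueError("p_success must be between 0 and 1")
    ai_attempt_hours = prompt_hours + evaluation_hours
    expected_with_ai = ai_attempt_hours + (1.0 - p_success) * baseline_hours
    return baseline_hours - expected_with_ai


# Figures reported in the text: 7-hour tasks, 72% success, 1 hour of evaluation.
saved = expected_hours_saved(baseline_hours=7.0, p_success=0.72, evaluation_hours=1.0)
print(f"Expected saving per task: {saved:.2f} hours")
```

With no extra overhead, this model gives a saving of about four hours per task, somewhat above the roughly three hours Mollick reports. The difference presumably reflects costs in his calculation that this sketch does not model; raising `prompt_hours` shows how quickly additional overhead narrows the gap.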

Applying management techniques to delegation

Mollick argues that three practices from management improve the delegation equation: giving clearer instructions (raising Probability of Success), improving evaluation skills (reducing AI Process Time), and developing better ways to assess quickly whether AI output is good or bad. He says all three are aided by domain expertise.

He gives the example of asking Claude Code to generate a 1980s-style Sierra adventure game in a single prompt, with the game illustrated, playtested, and deployed, as a case where the AI had no specific requirements to meet and could improvise freely. He contrasts this with real delegation, where a specific output is required, and likens the problem of communicating intent to the documentation practices different professions have developed for it: Product Requirements Documents in software development, shot lists in film, design intent documents in architecture, and Five-Paragraph Orders in the US Marines.

This piece is based solely on Mollick’s account in One Useful Thing.