GRASP makes long-horizon planning with learned world models practical through three targeted fixes

Researchers at Berkeley AI Research have published a blog post describing GRASP, a gradient-based planner for learned dynamics models that targets the failure modes behind fragile long-horizon planning. The work, by Michael Psenka, Mike Rabbat, Aditi Krishnapriyan, Yann LeCun, and Amir Bar, proposes three modifications to standard gradient-based trajectory optimization: lifting the trajectory into virtual states to enable parallelization, adding stochasticity to the iterates for exploration, and reshaping gradients to avoid brittle signals through high-dimensional vision encoders.

The post opens by framing the core tension: learned world models are becoming capable of predicting long sequences of future observations in visual spaces, but the ability to plan through these models does not follow automatically from their predictive accuracy. As the post states, “having a powerful predictive model is not the same as being able to use it effectively for control/learning/planning.”

Why long-horizon planning breaks

The post identifies three distinct failure modes that compound as planning horizons grow.

The first is the exploding and vanishing gradient problem. When a sequence of actions is optimized by differentiating through a world model applied repeatedly to its own output, the gradient for an early action passes through a product of per-step Jacobians, and the conditioning of that product degrades exponentially with horizon length. Early actions therefore receive either vanishingly small gradients or numerically unstable updates. The post notes that this is structurally identical to the backpropagation-through-time problem familiar from recurrent network training.
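
To make the scaling concrete, here is a minimal sketch that is not taken from the post. It assumes a hypothetical linear world model s_{t+1} = A s_t + B a_t, for which the gradient of the final state with respect to the first action is A^(T-1) B. The matrices and the function name are illustrative assumptions.

```python
import numpy as np

def first_action_grad_norm(A, B, horizon):
    """Frobenius norm of d s_T / d a_0 for the toy model s_{t+1} = A s_t + B a_t."""
    # Chain rule: d s_T / d a_0 = A^(T-1) B, one extra Jacobian factor per step.
    J = B.copy()
    for _ in range(horizon - 1):
        J = A @ J
    return np.linalg.norm(J)

B = np.eye(2)
contracting = 0.8 * np.eye(2)   # spectral radius below 1: gradients vanish
expanding = 1.2 * np.eye(2)     # spectral radius above 1: gradients explode

for T in (5, 20, 80):
    print(T,
          first_action_grad_norm(contracting, B, T),
          first_action_grad_norm(expanding, B, T))
```

In this toy setting the norm shrinks or grows geometrically with the horizon, which is the behavior the post attributes to sequential rollouts. A learned nonlinear model would have state-dependent Jacobians, but the product structure is the same.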

The second failure mode is a landscape problem. Short-horizon planning can often succeed with a greedy strategy. But as the post describes, longer tasks “are more likely to require non-greedy behavior: going around a wall, repositioning before pushing, backing up to take a better path.” The distance to goal along the optimal path is non-monotonic, and the resulting loss landscape contains local minima that greedy gradient descent gets stuck in.

The third failure mode is specific to deep-learning-based world models that operate in high-dimensional learned latent spaces. The post describes state-input gradients through vision encoders as exhibiting adversarial brittleness: the models are “sharper in normal directions” orthogonal to the training data manifold, which produces misleading signals when those gradients are used to update actions.

The GRASP solution: lifting, stochasticity, and reshaping

GRASP addresses the exploding gradient and local minima problems together through a technique called lifting, also described in planning literature as collocation. Instead of rolling out the trajectory sequentially and differentiating through the entire chain, the planner treats intermediate states as free variables to be jointly optimized alongside actions. The dynamics constraint, that each state must equal the world model applied to the previous state and action, becomes a soft penalty rather than a hard constraint enforced by sequential rollout.

The post describes two immediate benefits of this formulation. First, because each world model evaluation now depends only on local variables (adjacent state-action pairs), all time steps can be computed in parallel rather than sequentially. Second, the optimization landscape changes: the lifted objective shares the same global minimizers as the original rollout objective, but its local behavior is more tractable.
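
The lifted formulation can be sketched as follows. This is an illustrative example under stated assumptions, not the authors' implementation: `world_model` is a hypothetical stand-in for a learned dynamics model, and the squared-error costs and the `penalty` weight are choices made for the sketch.

```python
import numpy as np

def world_model(states, actions):
    """Hypothetical stand-in dynamics, batched over the leading axis."""
    return states + 0.1 * np.tanh(actions)

def rollout_loss(s0, actions, goal):
    # Sequential: each step needs the previous step's output.
    s = s0
    for a in actions:
        s = world_model(s, a)
    return np.sum((s - goal) ** 2)

def lifted_loss(s0, virtual_states, actions, goal, penalty=1.0):
    # virtual_states holds s_1..s_T as free optimization variables.
    prev = np.concatenate([s0[None], virtual_states[:-1]], axis=0)  # s_0..s_{T-1}
    pred = world_model(prev, actions)       # all T steps in one batched call
    dynamics_residual = np.sum((pred - virtual_states) ** 2)  # soft constraint
    goal_cost = np.sum((virtual_states[-1] - goal) ** 2)
    return goal_cost + penalty * dynamics_residual

rng = np.random.default_rng(0)
T, dim = 10, 2
s0, goal = np.zeros(dim), np.ones(dim)
actions = rng.normal(size=(T, dim))

# Build a dynamically consistent trajectory by rolling out once.
consistent = []
s = s0
for a in actions:
    s = world_model(s, a)
    consistent.append(s)
consistent = np.stack(consistent)

print(rollout_loss(s0, actions, goal))
print(lifted_loss(s0, consistent, actions, goal))
```

When the virtual states agree with the model's own rollout, the penalty term vanishes and the two losses coincide. Away from such trajectories the lifted loss stays cheap to evaluate, because each prediction depends only on one adjacent state-action pair.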

The stochasticity component adds noise directly to the state iterates during optimization, providing a form of exploration that helps escape local minima. The gradient reshaping component addresses the vision model brittleness by separating gradient signals through states from gradient signals through actions, providing cleaner updates for action variables.
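
The sketch below shows where these two pieces could enter a lifted update. It rests on several assumptions not confirmed by the text above: the toy dynamics s_{t+1} = s_t + a_t, the annealed Gaussian noise schedule, and the reading of gradient reshaping as a stop-gradient on the model's state input. The function `plan` and all of its parameters are hypothetical.

```python
import numpy as np

def plan(s0, goal, horizon=5, iters=300, lr=0.2, noise0=0.1, seed=0):
    """Noisy gradient descent on lifted variables for s_{t+1} = s_t + a_t."""
    rng = np.random.default_rng(seed)
    dim = s0.shape[0]
    states = np.zeros((horizon, dim))    # virtual states s_1..s_T
    actions = np.zeros((horizon, dim))   # actions a_0..a_{T-1}
    for k in range(iters):
        prev = np.concatenate([s0[None], states[:-1]], axis=0)
        residual = prev + actions - states       # model(prev, a) - s_next
        # Action gradient of sum ||residual||^2, via the model's action input.
        grad_actions = 2.0 * residual
        # State gradient: keep only the path where a state is the target of
        # a prediction. The path through the model's state input (which would
        # add +2 * residual[t + 1]) is dropped: an assumed stop-gradient.
        grad_states = -2.0 * residual
        grad_states[-1] += 2.0 * (states[-1] - goal)   # terminal goal cost
        actions -= lr * grad_actions
        states -= lr * grad_states
        # Exploration noise on the state iterates only, annealed to zero
        # halfway through so the final iterations are plain descent.
        sigma = noise0 * max(0.0, 1.0 - 2.0 * k / iters)
        states += sigma * rng.normal(size=states.shape)
    return states, actions

s0, goal = np.zeros(2), np.array([1.0, 1.0])
states, actions = plan(s0, goal)
final_state = s0 + actions.sum(axis=0)   # roll the plan out through the toy model
print(final_state)
```

This toy problem is convex and its dynamics are linear, so neither the noise nor the dropped gradient path is needed to solve it. The example only illustrates the mechanics: noise perturbs the states rather than the actions, and the actions are updated from the one-step prediction error alone.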

Demonstrated on visual control tasks

The post includes demonstrations on two tasks: BallNav and Push-T, shown as animated GIFs in the blog. These are described as visual control problems where the world model operates in pixel or latent visual space. The post reports performance results across varying horizon lengths on Push-T, showing GRASP outperforming CEM, gradient descent, and LatCo baselines at longer horizons.

The broader claim is that GRASP makes gradient-based planning “much more robust” specifically at long horizons, the regime where prior methods break down. The post argues that as world models scale in capability, planning will become the limiting factor, and that this bottleneck is easier to address now, while models are still advancing.