Mozilla AI releases llamafile v0.10.0 with multimodal support and Anthropic Messages API compatibility

Mozilla AI has announced llamafile 0.10.0, a rebuilt version of its portable model-execution project that adds multimodal model support, tool calling, and an Anthropic Messages API endpoint alongside the project’s existing single-executable distribution format.

According to the Mozilla AI announcement, the team rebuilt llamafile from the ground up following a polyglot build of llama.cpp, designed to make it easier to keep pace with upstream changes. The goal, the post states, was to combine llamafile’s original portability guarantees — running on multiple operating systems and CPU architectures from a single file — with the full feature set available in current versions of llama.cpp.

The 0.10.0 release includes: APE executable support for multiple operating systems and CPU architectures; the full llama.cpp server feature set including support for recent models, multimodal input, tool calling, and an Anthropic Messages API; multimodal model support in the terminal chat interface; a CLI tool, an HTTP server, and a terminal chat UI; Metal GPU support; CUDA GPU support (currently tested on Linux); and CPU optimizations for different architectures. The release also includes Whisperfile for audio transcription.

The Anthropic Messages API addition means that tooling built to call Claude — including Claude Code — can be pointed at a local llamafile instance and run against a locally served model, according to the announcement.

Mozilla AI lists specific models available as pre-built llamafiles covering a range of capabilities: thinking, multimodal, and tool calling, in sizes from 0.6B to 27B parameters. The team also notes that users who already have GGUF model weights on their system can download just the main llamafile executable and load those files directly, without downloading a new bundled model.

The rebuild follows what the announcement describes as a gap between llamafile’s original feature set and what had accumulated in upstream llama.cpp. The post acknowledges features from the older version that have not yet been brought forward, with a documentation page listing them for users who need them, and notes that older binaries and source code remain downloadable from previous releases. Older llamafiles for a wide range of models are also still hosted on Hugging Face, with each listing specifying which version of the software it was built with.

Future work described in the announcement includes a llamafile-builder application for easier bundling of custom executables, Vulkan GPU support, and continued bug fixing. The team says it will prioritise features based on user feedback.

Mozilla AI has distributed the v0.10.0 executables, including the main llamafile and whisperfile binaries, at links included in the announcement.