May 8, 2026

Antirez Ships a Local DeepSeek 4 Flash Inference Engine for Apple Metal

Salvatore Sanfilippo (antirez) has released ds4, a minimal local inference engine targeting Apple Metal for running DeepSeek 4 Flash on-device without cloud dependencies.

ds4 is a lightweight inference engine written to run DeepSeek 4 Flash locally on Apple Silicon via the Metal GPU API. The project comes from Salvatore Sanfilippo, best known as the creator of Redis.

The target runtime is Metal, meaning inference runs on the GPU cores built into M-series Macs. This sidesteps cloud API costs and latency for workloads that fit in unified memory, and it removes the data-egress concern that blocks some professional use cases.

The codebase is minimal by design. Rather than wrapping an existing inference stack, antirez appears to have written the engine from close to first principles, which is consistent with his prior work on small, auditable systems. That approach trades ecosystem breadth for legibility: engineers can read the full kernel path without navigating a large abstraction layer.

For solo founders and small teams already on Apple Silicon, a Metal-native path for DeepSeek 4 Flash means the model can run on hardware they already own. DeepSeek 4 Flash is a fast, low-cost model in the DeepSeek family; running it locally on a MacBook Pro or Mac Studio shortens the iteration loop for prompt engineering, fine-tuning experiments, and eval harnesses that otherwise rack up API calls.

The practical constraint is memory. Unified memory on Apple Silicon is shared between CPU and GPU, so model weight size directly competes with the rest of the working set. Engineers should benchmark their specific Mac configuration before committing ds4 to a production pipeline.
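The memory constraint can be roughed out with back-of-envelope arithmetic before downloading anything. The sketch below is illustrative only and is not taken from ds4: the function names, the 100-billion parameter count, and the 8 GiB reserved for the OS and other applications are all placeholder assumptions, not figures for DeepSeek 4 Flash or any particular Mac. It counts weights only; KV cache and activations need additional room, and the share of unified memory the GPU may actually use can be lower than total RAM.

```python
GIB = 1024 ** 3


def weight_bytes(n_params: int, bits_per_weight: float) -> int:
    """Approximate size of the weights alone (no KV cache, no activations)."""
    return int(n_params * bits_per_weight / 8)


def fits_in_unified_memory(n_params: int,
                           bits_per_weight: float,
                           unified_gib: float,
                           reserved_gib: float = 8.0):
    """Return (fits, headroom_gib) for a weights-only estimate.

    reserved_gib is an assumed allowance for the OS and other apps;
    tune it for the machine being evaluated.
    """
    budget = (unified_gib - reserved_gib) * GIB
    needed = weight_bytes(n_params, bits_per_weight)
    return needed <= budget, (budget - needed) / GIB


if __name__ == "__main__":
    # Placeholder parameter count, NOT the real figure for any model.
    hypothetical_params = 100_000_000_000
    for bits in (16, 8, 4):
        ok, headroom = fits_in_unified_memory(hypothetical_params, bits,
                                              unified_gib=64)
        size_gib = weight_bytes(hypothetical_params, bits) / GIB
        print(f"{bits:>2}-bit weights: {size_gib:6.1f} GiB, "
              f"fits={ok}, headroom={headroom:6.1f} GiB")
```

A negative headroom means the weights alone exceed the assumed budget at that precision. A positive value is only a starting point, which is why measuring on the actual machine still matters.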

The project is open source. Engineers interested in a readable Metal inference implementation, or in running DeepSeek 4 Flash offline, should read the repository directly at github.com/antirez/ds4. Given antirez's track record of tight, well-commented C, the source itself is likely worth studying independent of the use case.