May 15, 2026
WhichLLM Ranks Local Models Against Your Actual Hardware
WhichLLM is an open-source tool that takes your hardware specs and returns a list of local LLMs ranked by benchmark performance, removing the trial and error from model selection.
Picking a local LLM today means cross-referencing VRAM requirements, quantization levels, and benchmark scores across a dozen sources. WhichLLM collapses that into a single lookup.
The tool accepts your hardware profile and returns a sorted list of models ranked by real benchmark data. The ranking accounts for what will actually run on your machine, not just what theoretically fits in memory. That distinction matters: a model that loads is not the same as a model that performs.
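To make the lookup concrete, here is a minimal Python sketch of a fit-then-rank pass. It is not WhichLLM's actual code: the `Model` fields, the 2 GB overhead allowance, the model names, and the scores are all hypothetical placeholders. The only firm fact it relies on is that weight memory is roughly parameter count times bits per weight divided by 8. It also covers only the easy half of the problem (what fits, sorted by score); how the tool judges whether a model performs well on a given machine is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    params_billion: float   # parameter count, in billions
    bits_per_weight: float  # 16 for FP16, 4 for a 4-bit quantization
    score: float            # placeholder benchmark score, higher is better


def weight_gb(m: Model) -> float:
    """Approximate weight memory in GB: params * bits / 8 bytes."""
    return m.params_billion * m.bits_per_weight / 8


def rank(models, vram_gb, overhead_gb=2.0):
    """Keep models whose weights fit in VRAM minus an assumed overhead
    allowance (KV cache, activations, runtime), best score first."""
    usable = vram_gb - overhead_gb
    fits = [m for m in models if weight_gb(m) <= usable]
    return sorted(fits, key=lambda m: m.score, reverse=True)


# Hypothetical catalog; names and scores are made up for illustration.
catalog = [
    Model("model-a-7b-q4", 7, 4, 61.0),
    Model("model-b-13b-q4", 13, 4, 66.5),
    Model("model-c-34b-q4", 34, 4, 72.0),
    Model("model-a-7b-fp16", 7, 16, 63.0),
]

for m in rank(catalog, vram_gb=12):
    print(f"{m.name}: {weight_gb(m):.1f} GB, score {m.score}")
```

On a 12 GB card this keeps only the two 4-bit models under 10 GB and lists the 13B one first. The higher-scoring 34B model and the FP16 7B model are dropped because their weights alone exceed the usable budget.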
For solo founders and engineers running inference locally (whether to cut API costs, stay offline, or stay within a compliance boundary), the friction point has always been discovery. You either rely on word of mouth or spend hours on r/LocalLLaMA calibrating expectations. WhichLLM replaces that loop with a deterministic answer tied to your specific setup.
The project is open-source and lives on GitHub, which means the benchmark data and ranking logic are auditable. That matters more than it sounds. Closed recommendation tools have no accountability for how they weight scores. With the source available, engineers can inspect the methodology, contribute hardware profiles, and extend the benchmark coverage as new models ship.
The timing is good. The local model landscape is moving fast: Mistral, Llama, Phi, Qwen, and Gemma variants ship on short cycles, and keeping a mental model of what runs well on a given GPU is hard. A tool that tracks this systematically has durable value.
Limitations to watch: benchmark coverage depends on community contribution, and the ranking is only as good as the data behind it. Hardware variance (driver versions, thermal throttling, memory bandwidth differences between GPU SKUs) means real-world results will always have noise. Treat the output as a strong starting point, not a guarantee.
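As a rough illustration of why memory bandwidth alone can separate two GPUs with the same VRAM, the sketch below uses a common back-of-the-envelope heuristic: single-stream token generation is often memory-bound, so throughput cannot exceed bandwidth divided by the bytes of weights read per token. This is an assumption-laden ceiling, not WhichLLM's methodology; the function name and the bandwidth figures are hypothetical, and it ignores KV-cache traffic, compute limits, and batching.

```python
def decode_ceiling_tokens_per_sec(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Rough upper bound on single-stream decode speed, assuming every
    weight is read from GPU memory once per generated token."""
    if weights_gb <= 0:
        raise ValueError("weights_gb must be positive")
    return bandwidth_gb_s / weights_gb


# Hypothetical cards with equal VRAM but different memory bandwidth,
# both running the same 6.5 GB quantized model.
for label, bandwidth in [("card-slow", 360.0), ("card-fast", 1000.0)]:
    ceiling = decode_ceiling_tokens_per_sec(bandwidth, 6.5)
    print(f"{label}: at most ~{ceiling:.0f} tokens/s")
```

Measured numbers will land below this ceiling, and by how much depends on the factors listed above, which is one reason a fit-in-memory check is not a performance prediction.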
The project is worth bookmarking for anyone maintaining a local inference setup or evaluating whether a given model fits a constrained deployment target.
Source
news.ycombinator.com