# ShouldIRunThisModel

> A community site for sharing real-world performance measurements and subjective ratings of locally-run LLMs.

There are plenty of resources that tell you whether a model will run on your hardware. Few tell you whether you *should* — and what to actually expect.

ShouldIRunThisModel collects two things:

1. **Subjective ratings** — "Can I run this model?" (does it fit in memory, is it fast enough?) and "Should I run this model?" (does it perform well for my use case?). Users also vote on which use cases a model is good for: coding, chat, reasoning, tool use, vision, writing, RAG/search, and computer use.
2. **Performance measurements** — tokens per second, time to first token, and memory usage at multiple context lengths (e.g. 256, 1024, 4096, and 16384 tokens), submitted per model + hardware + interface combination.

## Key pages

- [/recipes](https://shouldirunthismodel.com/recipes) — Browse and filter ratings (“recipes”: model + hardware + interface)
- [/models/{shortname}](https://shouldirunthismodel.com/models/example) — Model detail page and its recipes (there is no separate model directory)
- [/recipes/new](https://shouldirunthismodel.com/recipes/new) — Submit a recipe (setup + ratings)
- [/performance](https://shouldirunthismodel.com/performance) — Compare performance across hardware
- [/why](https://shouldirunthismodel.com/why) — Why this site exists

## Data model

Recipes are scoped to a **model** (name + quantization), **hardware** (e.g. "MacBook Pro M3 Max 128GB"), and **interface** (e.g. "LM Studio").
Each recipe includes:

- `can_i_run_this_model_rating` — 1–5 scale: whether the model runs usably on the hardware
- `should_i_run_this_model_rating` — 1–5 scale: whether the model performs well enough to recommend
- `use_case_votes` — per-use-case upvotes/downvotes
- `measurements` — array of `{ context_tokens, tokens_per_second, time_to_first_token_seconds, memory_usage_gib }`

## Benchmark tool

A local benchmark tool (`tools/benchmark-server.js`) connects to LM Studio, loads models on demand, runs inference at multiple context sizes, and exports results as JSON that can be imported on the performance page.
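As a minimal sketch of how a benchmark run turns into a measurement record, here is one way the conversion could look. The output field names match the `measurements` schema above; the input shape (`startMs`, `firstTokenMs`, `endMs`, `completionTokens`, `peakMemoryGib`) and the function name are hypothetical, not taken from `tools/benchmark-server.js`.

```javascript
// Hypothetical converter: raw wall-clock timings (milliseconds) from one
// inference run -> a measurement record in the site's import format.
function toMeasurement(run) {
  const generationSeconds = (run.endMs - run.firstTokenMs) / 1000;
  return {
    context_tokens: run.contextTokens,
    // Decode speed: tokens generated after the first token arrived.
    tokens_per_second:
      generationSeconds > 0 ? run.completionTokens / generationSeconds : 0,
    time_to_first_token_seconds: (run.firstTokenMs - run.startMs) / 1000,
    memory_usage_gib: run.peakMemoryGib,
  };
}

// Example: a 4096-token-context run that waited 0.5 s for the first token,
// then produced 256 tokens over the next 4 s.
const m = toMeasurement({
  contextTokens: 4096,
  startMs: 0,
  firstTokenMs: 500,
  endMs: 4500,
  completionTokens: 256,
  peakMemoryGib: 9.2,
});
// m.tokens_per_second → 64, m.time_to_first_token_seconds → 0.5
```

One record like this per context size (e.g. 256, 1024, 4096, 16384) would make up a recipe's `measurements` array.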