Why this site?
There are plenty of resources that tell you whether a local model will run on your hardware. Few tell you whether you should, or what setup is needed to unlock good performance.
We therefore collect recipes along with two subjective ratings: one for whether the model runs on your hardware at all, and one for how well it works in practice. Exactly what these ratings mean is up to you to define.
Of course, you can also submit benchmark measurements. We focus on language models and collect tokens per second, time to first token, and memory usage at different context lengths. Interfaces like LM Studio show this performance information after every message when you run in developer mode.
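If your interface does not report these numbers, you can compute the two timing metrics yourself from a streamed response. Below is a minimal sketch of the timing logic; the `fake_stream` generator is a hypothetical stand-in that simulates a model's streamed tokens, but the same bookkeeping works on any real streaming API from a local server.

```python
import time

def measure_stream(chunks):
    """Compute time-to-first-token and tokens/sec from a token stream."""
    start = time.perf_counter()
    first_token_at = None
    n_tokens = 0
    for _ in chunks:
        now = time.perf_counter()
        if first_token_at is None:
            first_token_at = now  # prompt processing ends here
        n_tokens += 1
    end = time.perf_counter()
    ttft = first_token_at - start
    # Generation speed is measured from the first token onward,
    # so prompt processing time does not distort it.
    gen_time = end - first_token_at
    tps = (n_tokens - 1) / gen_time if gen_time > 0 else 0.0
    return {"ttft_s": ttft, "tokens_per_s": tps, "tokens": n_tokens}

def fake_stream():
    # Hypothetical simulated stream: ~50 ms of prompt processing,
    # then one token roughly every 10 ms.
    time.sleep(0.05)
    for token in ["Hello", ",", " world", "!"]:
        yield token
        time.sleep(0.01)

stats = measure_stream(fake_stream())
```

Memory usage is harder to measure portably from inside a script, which is why we also accept readings taken from your system monitor or the interface itself.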