Yaarbal

LLMs that run on your machine, and a swarm of others doing the same.

Open the chat in your browser, or install it and run a model locally. Either way, the model runs on someone's own hardware, not behind a provider's API.

Open chat.yaarbal.app or run it locally

Run it on your machine

Requires Python 3.11 or newer on macOS or Linux. The first run downloads the Qwen 2.5 0.5B model (~600 MB).

pip install yaarbal && yaarbal run qwen2.5-0.5b-instruct "hello"
Terminal recording: pip install yaarbal followed by yaarbal run qwen2.5-0.5b-instruct streaming a haiku response token-by-token. The full flow takes about three minutes from install to first token.
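
If the install fails, it may help to confirm the stated requirements first. The snippet below is a small standalone check, not part of yaarbal itself; the function name check_prereqs is made up for illustration, and it tests only the two requirements listed above (Python 3.11+, macOS or Linux).

```python
import platform
import sys


def check_prereqs(version_info=None, system=None) -> list[str]:
    """Return a list of unmet requirements (empty list means all good).

    Hypothetical helper: it only mirrors the requirements stated on this
    page and does not call into yaarbal.
    """
    version_info = sys.version_info if version_info is None else version_info
    system = platform.system() if system is None else system

    problems = []
    # Compare only (major, minor) against the stated minimum of 3.11.
    if tuple(version_info[:2]) < (3, 11):
        problems.append("Python 3.11 or newer is required")
    # platform.system() reports "Darwin" on macOS and "Linux" on Linux.
    if system not in {"Darwin", "Linux"}:
        problems.append("macOS or Linux is required")
    return problems


if __name__ == "__main__":
    print(check_prereqs() or "ok")
```

Passing explicit arguments, for example check_prereqs((3, 10, 0), "Windows"), makes it possible to try the check against a machine other than the current one.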

How it works

Three boxes labelled Leecher, Tracker, and Seeder, connected by arrows. The Leecher discovers seeders via the Tracker, then sends prompts directly to a Seeder, which streams tokens back. Credits flow from the Leecher to the Seeder per inference.

Seeders host complete models and earn credits. Leechers spend credits to run inference. The tracker matches them. No central inference provider.
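
The flow above can be sketched as a toy in-memory simulation. This is not yaarbal's actual code or wire protocol: the class names, the first-match selection policy, the flat price of one credit per inference, and the echo-style "inference" are all assumptions made to show the three roles and the direction credits move.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Seeder:
    """Hosts complete models and earns credits (hypothetical shape)."""
    name: str
    models: set[str]
    credits: int = 0

    def infer(self, model: str, prompt: str) -> Iterator[str]:
        # Stand-in for real inference: stream the prompt back word by word.
        if model not in self.models:
            raise ValueError(f"{self.name} does not host {model}")
        for word in prompt.split():
            yield word


@dataclass
class Tracker:
    """Matches leechers to seeders; in this sketch it never handles prompts."""
    seeders: list[Seeder] = field(default_factory=list)

    def register(self, seeder: Seeder) -> None:
        self.seeders.append(seeder)

    def find(self, model: str) -> list[Seeder]:
        return [s for s in self.seeders if model in s.models]


@dataclass
class Leecher:
    """Spends credits to run inference on a seeder."""
    credits: int

    def run(self, tracker: Tracker, model: str, prompt: str, price: int = 1) -> list[str]:
        candidates = tracker.find(model)  # discovery goes through the tracker
        if not candidates:
            raise LookupError(f"no seeder hosts {model}")
        if self.credits < price:
            raise RuntimeError("not enough credits")
        seeder = candidates[0]  # assumed policy: take the first match
        tokens = list(seeder.infer(model, prompt))  # prompt goes directly to the seeder
        # Assumed accounting: flat price per inference, settled after the stream ends.
        self.credits -= price
        seeder.credits += price
        return tokens


tracker = Tracker()
seeder = Seeder(name="seeder-1", models={"qwen2.5-0.5b-instruct"})
tracker.register(seeder)

leecher = Leecher(credits=3)
tokens = leecher.run(tracker, "qwen2.5-0.5b-instruct", "hello from the swarm")
print(tokens, leecher.credits, seeder.credits)
```

The tracker here only answers "who hosts this model"; the prompt and the tokens pass between leecher and seeder directly, which is the property the diagram describes.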