Flare LLM

WASM-first LLM inference, running entirely in your browser.

Status

Initializing WASM module...

1. Load a model

Select a GGUF file. Try SmolLM2-135M Q8_0 (~138 MB).

Paste a direct URL to a GGUF file. Download progress shown in real time.

No GGUF file handy? Try SmolLM2-135M (~138 MB) — fills in model + tokenizer URLs from HuggingFace.

Connecting...
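As a sketch of what the URL path above involves, a loader can stream the download to report progress in real time, then check the file's 4-byte "GGUF" magic before handing bytes to the WASM module. The helper names here are hypothetical, not this app's actual API.

```typescript
// Stream a model file, reporting progress as chunks arrive.
// Hypothetical helpers for illustration; not this app's actual code.
async function downloadWithProgress(
  url: string,
  onProgress: (loaded: number, total: number) => void
): Promise<Uint8Array> {
  const res = await fetch(url);
  if (!res.ok || !res.body) throw new Error(`HTTP ${res.status}`);
  const total = Number(res.headers.get("Content-Length") ?? 0);
  const reader = res.body.getReader();
  const chunks: Uint8Array[] = [];
  let loaded = 0;
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    chunks.push(value);
    loaded += value.length;
    onProgress(loaded, total); // drive the progress bar from here
  }
  const out = new Uint8Array(loaded);
  let offset = 0;
  for (const c of chunks) {
    out.set(c, offset);
    offset += c.length;
  }
  return out;
}

// GGUF files begin with the ASCII magic "GGUF" (0x47 0x47 0x55 0x46).
function isGguf(bytes: Uint8Array): boolean {
  return (
    bytes.length >= 4 &&
    bytes[0] === 0x47 && // 'G'
    bytes[1] === 0x47 && // 'G'
    bytes[2] === 0x55 && // 'U'
    bytes[3] === 0x46    // 'F'
  );
}
```

The magic check is cheap and catches the common failure mode of a URL that returns an HTML error page instead of the model file.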

2. Tokenizer not loaded

Load a HuggingFace tokenizer.json for decoded text output. Without it, raw token IDs are shown.
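To illustrate what the tokenizer buys you: decoding maps token IDs back to text via the vocabulary in tokenizer.json. A real HuggingFace tokenizer.json also carries merges, normalizers, and byte-level mappings; this toy vocabulary and the `buildDecoder` helper are simplified illustrations only.

```typescript
// Invert a tokenizer.json-style vocab (token -> id) into id -> token,
// falling back to a raw-ID placeholder for unknown IDs, as the UI does
// when no tokenizer is loaded. Toy sketch, not a full BPE decoder.
type Vocab = Record<string, number>;

function buildDecoder(vocab: Vocab): (ids: number[]) => string {
  const idToToken = new Map<number, string>();
  for (const [tok, id] of Object.entries(vocab)) idToToken.set(id, tok);
  return (ids) =>
    ids
      .map((id) => idToToken.get(id) ?? `<unk:${id}>`)
      .join("")
      // GPT-2-style byte-level tokenizers mark leading spaces with U+0120 "Ġ".
      .replace(/\u0120/g, " ");
}

// Hypothetical two-token vocab for demonstration.
const decode = buildDecoder({ "Hello": 0, "\u0120world": 1 });
```

With no vocabulary entry for an ID, the decoder falls back to showing the raw ID, mirroring the behavior described above.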

3. Chat

Wraps your prompt in the model's instruction format. Disable for raw completion.

System prompt (optional, applied on first turn)
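The wrapping described above could look like the following ChatML-style sketch, the instruction format used by many instruct-tuned models (including SmolLM2-Instruct). The exact template depends on the model, and `wrapChatML` is a hypothetical helper, not this app's actual code.

```typescript
// Wrap a user message (and optional first-turn system prompt) in a
// ChatML-style template, ending with an open assistant turn for the
// model to complete. Disable wrapping to send the prompt verbatim.
function wrapChatML(user: string, system?: string): string {
  let out = "";
  if (system) out += `<|im_start|>system\n${system}<|im_end|>\n`;
  out += `<|im_start|>user\n${user}<|im_end|>\n`;
  out += `<|im_start|>assistant\n`;
  return out;
}
```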
Sampling parameters
Temperature: 0.70
Top-p: 0.95
Top-k: 40
Repetition penalty: 1.10
Max new tokens: 128
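To show how the sampling values above interact (assuming they are the conventional set: temperature 0.70, top-p 0.95, top-k 40, repetition penalty 1.10, max tokens 128), here is a sketch of temperature scaling plus top-k filtering on raw logits. `topKProbs` is a hypothetical helper for illustration.

```typescript
// Scale logits by temperature (must be > 0), keep only the k largest,
// and softmax the survivors into a probability distribution to sample
// from. Ties at the k-th value are kept. Illustrative sketch only.
function topKProbs(logits: number[], temperature: number, k: number): number[] {
  const scaled = logits.map((l) => l / temperature);
  // Value of the k-th largest scaled logit; everything below is masked out.
  const kth = [...scaled].sort((a, b) => b - a)[Math.min(k, scaled.length) - 1];
  const masked = scaled.map((l) => (l >= kth ? l : -Infinity));
  // Numerically stable softmax over the surviving logits.
  const max = Math.max(...masked);
  const exps = masked.map((l) => Math.exp(l - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}
```

Lower temperature sharpens the distribution toward the top logit; a smaller k restricts sampling to fewer candidates.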
Your message