WASM-first LLM inference, running entirely in your browser.
Initializing WASM module...
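The initialization step can be sketched as a plain `WebAssembly.instantiate` call. This is a hedged sketch, not the app's real loader: in the browser the bytes would typically come from `fetch` (or `WebAssembly.instantiateStreaming`); here the binary is passed in directly, and the `initWasm` name and empty import object are assumptions.

```javascript
// Hedged sketch of instantiating the inference engine's WASM binary.
// In the real app the bytes would be fetched from the network; here
// they are passed in directly so the flow is easy to exercise.
async function initWasm(wasmBytes, imports = {}) {
  const { instance } = await WebAssembly.instantiate(wasmBytes, imports);
  return instance.exports; // the engine's exported functions and memory
}
```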
Select a GGUF file. Try SmolLM2-135M Q8_0 (~138 MB).
Paste a direct URL to a GGUF file; download progress is shown in real time.
No GGUF file handy? Try SmolLM2-135M (~138 MB) — fills in model + tokenizer URLs from HuggingFace.
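GGUF files are large, so real-time progress comes from reading the response body as a stream and comparing bytes received against `Content-Length`. A minimal sketch of that pattern, assuming the server sends `Content-Length` (progress reports 0 as the total when it is absent); `readWithProgress` and its `onProgress` callback are hypothetical names, not the app's actual API — you would call it as `readWithProgress(await fetch(url), updateBar)`:

```javascript
// Stream a fetch Response, reporting progress chunk by chunk, and
// return the full file as one Uint8Array (e.g. the GGUF bytes).
async function readWithProgress(resp, onProgress) {
  if (!resp.ok) throw new Error(`HTTP ${resp.status}`);
  const total = Number(resp.headers.get('Content-Length')) || 0;
  const reader = resp.body.getReader();
  const chunks = [];
  let received = 0;
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    chunks.push(value);
    received += value.length;
    onProgress(received, total); // e.g. update a progress bar
  }
  // Concatenate the chunks into a single contiguous buffer.
  const buf = new Uint8Array(received);
  let offset = 0;
  for (const c of chunks) { buf.set(c, offset); offset += c.length; }
  return buf;
}
```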
Load a HuggingFace tokenizer.json to decode output into readable text.
Without one, raw token IDs are shown.
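The decoding step can be sketched from the tokenizer.json structure: for byte-level BPE models, `model.vocab` maps token strings to IDs, so decoding inverts that map and joins the tokens. This is a deliberately simplified sketch — real decoding (added tokens, byte fallback, other pre-tokenizers) should use a proper tokenizers library — and `makeDecoder` is a hypothetical helper name:

```javascript
// Simplified decoder for the common byte-level BPE case, where "Ġ"
// marks a leading space and "Ċ" a newline in the stored token strings.
function makeDecoder(tokenizerJson) {
  const vocab = tokenizerJson.model.vocab; // { tokenString: id }
  const idToToken = new Map(Object.entries(vocab).map(([t, id]) => [id, t]));
  return (ids) =>
    ids
      .map((id) => idToToken.get(id) ?? '') // unknown IDs render as nothing
      .join('')
      .replaceAll('Ġ', ' ')
      .replaceAll('Ċ', '\n');
}
```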
Wraps your prompt in the model's instruction format. Disable for raw completion.
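The wrapping can be sketched as applying a ChatML-style template, the format SmolLM2's instruct variants use. The exact template is model-specific and is normally read from the GGUF metadata rather than hard-coded; `applyChatTemplate` and the default system prompt below are assumptions for illustration:

```javascript
// Wrap a user prompt in a ChatML-style instruction format. The open
// assistant turn at the end cues the model to generate its reply.
function applyChatTemplate(userPrompt, systemPrompt = 'You are a helpful assistant.') {
  return (
    `<|im_start|>system\n${systemPrompt}<|im_end|>\n` +
    `<|im_start|>user\n${userPrompt}<|im_end|>\n` +
    `<|im_start|>assistant\n`
  );
}
```

With the template disabled, the prompt is fed to the model verbatim and it simply continues the text, which is the "raw completion" behavior.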