Supports running LLMs and building AI agents locally and efficiently on edge devices
20 tokens/s prefill and decoding speed for Phi-3 (measured on a Samsung S23)
CPU, GPU, and hybrid CPU + GPU inference
1.5-bit, 2-bit, 4-bit and 8-bit integer quantization
Android, iOS, macOS, and Windows operating systems
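
As a rough illustration of what the integer quantization listed above involves, here is a minimal sketch of symmetric per-tensor 8-bit quantization in plain Python. It is not this project's implementation: the function names `quantize_int8` and `dequantize_int8` are hypothetical, and real runtimes typically quantize per block or per channel and pack lower bit widths more compactly.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale (assumed scheme)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # Avoid division by zero when every value is 0.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Recover approximate floats from the integers and the shared scale."""
    return [q * scale for q in quantized]


weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
```

Each restored value differs from the original by at most about half of `scale`, which is the trade-off for storing one byte per weight instead of four.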