Large language models (LLMs) have taken over the world of AI, offering vast knowledge in an instant. However, running these models locally typically requires a server or a PC with a powerful accelerator. Microsoft Research has introduced a new “1-bit” LLM with two billion parameters that can run on a CPU.
Microsoft’s 1-bit LLM is trained on a corpus of 4 trillion tokens and offers performance equivalent to leading open-weight, full-precision models of similar size, while being efficient in terms of memory, energy, and latency.
“You can now run 100B parameter models on your local CPU without GPUs. Microsoft finally open-sourced their 1-bit LLM inference framework called bitnet.cpp:
> 6.17x faster inference
> 82.2% less energy on CPUs
> Supports Llama3, Falcon3, and BitNet models”
— Lior⚡ (@LiorOnAI), April 19, 2025
The model demonstrates language understanding, mathematical reasoning, coding proficiency, and conversational ability. It was trained from scratch and supports a maximum sequence length of 4,096 tokens. The activations are quantised to 8-bit integers, and the weights are quantised to ternary values {-1, 0, +1} using absmean quantisation during the forward pass.
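To make that quantisation scheme concrete, here is a minimal illustrative sketch of absmean ternary weight quantisation together with absmax 8-bit activation quantisation. It is not Microsoft’s training code; the per-tensor scaling and the helper names are simplifying assumptions for illustration.

```python
# Illustrative sketch of the quantisation described above: weights mapped to
# {-1, 0, +1} with an absmean scale, activations mapped to 8-bit integers.
# Per-tensor scaling is a simplification; it is not Microsoft's actual code.
import numpy as np

def absmean_ternary(W: np.ndarray, eps: float = 1e-8):
    """Quantise a weight matrix to ternary values {-1, 0, +1}."""
    gamma = np.abs(W).mean() + eps              # absmean scale
    W_q = np.clip(np.round(W / gamma), -1, 1)   # round, then clip to ternary
    return W_q.astype(np.int8), gamma           # dequantise later as W_q * gamma

def absmax_int8(x: np.ndarray, eps: float = 1e-8):
    """Quantise activations to 8-bit integers with an absmax scale."""
    scale = 127.0 / (np.abs(x).max() + eps)
    x_q = np.clip(np.round(x * scale), -128, 127)
    return x_q.astype(np.int8), scale

W = np.random.randn(4, 4)
W_q, gamma = absmean_ternary(W)
print(W_q)  # every entry is -1, 0, or +1
```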
The BitNet b1.58 2B4T model is also open source and requires only 0.4 GB of memory, whereas other similarly sized models typically need between 2 and 5 GB. It can run on a single ARM- or x86-based CPU. The Microsoft Research team is continuing to study the model to better understand how it works and to explore how this technology could serve as an alternative to large language models that require expensive GPUs.
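The 0.4 GB figure follows almost directly from the ternary weight format: a value restricted to {-1, 0, +1} carries about 1.58 bits of information, so roughly two billion such weights fit in about 0.4 GB before activations and the KV cache are counted. A quick back-of-envelope check:

```python
# Back-of-envelope check of the 0.4 GB figure for ~2 billion ternary weights.
import math

params = 2e9
bits_per_weight = math.log2(3)                  # ≈ 1.58 bits for {-1, 0, +1}
weight_gb = params * bits_per_weight / 8 / 1e9  # bits -> bytes -> GB
print(f"{weight_gb:.2f} GB")                    # ≈ 0.40 GB (weights only)
```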
Interested users can download BitNet b1.58 2B4T from Hugging Face and test it for themselves. The model outperforms other similarly sized language models such as LLaMA 3.2 1B, Gemma-3 1B, Qwen2.5 1.5B, SmolLM2 1.7B, and MiniCPM 2B.
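For readers who want to try it, the sketch below loads the model through the standard Hugging Face transformers API and generates a short reply. The repo id is an assumption to be checked against the official model card, and running the model requires a transformers build with BitNet support.

```python
# Minimal sketch: load BitNet b1.58 2B4T from Hugging Face and generate text.
# The repo id below is assumed; confirm it on the official model card, and
# make sure your installed transformers build supports BitNet models.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/BitNet-b1.58-2B-4T"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = "Explain 1-bit quantisation in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```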