BitNet: Scaling 1-bit Transformers for Large Language Models
But in the follow-up paper they switched to 1.58-bit weights (https://arxiv.org/pdf/2402.17764), the name coming from the fact that a ternary weight carries log2(3) ≈ 1.58 bits:
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
In this work, we introduce a 1-bit LLM variant, namely BitNet b1.58, in which every single parameter (or weight) of the LLM is ternary {-1, 0, 1}.
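The b1.58 paper describes an "absmean" weight quantization: divide the weight matrix by its mean absolute value, then round each entry and clip to [-1, 1], so every weight ends up in {-1, 0, 1}. Below is a minimal PyTorch sketch of that step, assuming per-tensor scaling; the function name, eps value, and usage snippet are illustrative, not taken from the paper's code.

```python
import torch

def absmean_ternary_quantize(w: torch.Tensor, eps: float = 1e-5):
    """Ternarize a weight tensor with an absmean scheme in the spirit of BitNet b1.58:
    W_q = RoundClip(W / (gamma + eps), -1, 1), with gamma = mean(|W|)."""
    gamma = w.abs().mean()                           # per-tensor scale: mean absolute value
    w_q = (w / (gamma + eps)).round().clamp(-1, 1)   # each entry becomes -1, 0, or 1
    return w_q, gamma                                # keep gamma to rescale outputs later

# Quick check: every quantized entry lands in {-1, 0, 1}
w = torch.randn(4, 4)
w_q, gamma = absmean_ternary_quantize(w)
print(w_q, gamma)
```

With ternary weights, the matrix multiplications in the Transformer's linear layers reduce to additions and subtractions (the zero entries are simply skipped), which is the main source of the efficiency gains the paper reports.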