
Thanks, but I've skimmed through both and couldn't find an answer as to why they call it "1-bit".


The original BitNet paper (https://arxiv.org/pdf/2310.11453)

  BitNet: Scaling 1-bit Transformers for Large Language Models
was actually binary (weights of -1 or 1); a sketch of that binarization is below.
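For illustration, here's a minimal numpy sketch of the binarization the original paper describes (center the weights on their mean, then take the sign); the function name is mine:

  import numpy as np

  def binarize(W):
      # Binarization as described in the original BitNet paper:
      # center the weights on their mean, then take the sign,
      # so every weight becomes -1 or +1.
      alpha = W.mean()
      return np.where(W >= alpha, 1.0, -1.0)

  W = np.random.randn(4, 4)
  print(binarize(W))  # every entry is -1 or 1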

But then in the follow-up paper they started using 1.58-bit weights (https://arxiv.org/pdf/2402.17764)

  The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
This seems to be the first source I could find of the conflation of "1-bit LLM" with ternary weights.

  In this work, we introduce a 1-bit LLM variant, namely BitNet b1.58, in which every single parameter (or weight) of the LLM is ternary {-1, 0, 1}.
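The "1.58" itself is just the information content of a ternary weight: log2(3) ≈ 1.585 bits. A minimal numpy sketch of the absmean quantization the b1.58 paper describes (function name and epsilon handling are mine):

  import math
  import numpy as np

  # A ternary weight has three states, so it carries log2(3) bits:
  print(math.log2(3))  # ~1.585, hence "1.58-bit"

  def ternary_quantize(W, eps=1e-6):
      # Absmean quantization per the b1.58 paper: scale by the
      # mean absolute weight, round, then clip to {-1, 0, 1}.
      gamma = np.abs(W).mean()
      return np.clip(np.round(W / (gamma + eps)), -1.0, 1.0)

  W = np.random.randn(4, 4)
  print(ternary_quantize(W))  # every entry is -1, 0, or 1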


It’s “1-bit, for particularly large values of ‘bit’”


Should be 1-trit.



