The issue here is that memory in PyTorch is byte-addressable, and that's a limitation we can't lift without making much larger changes to PyTorch. But in your specific case, if you'd like to pack more data into `values`, you can use a combination of clever bit shifting, `torch.cat`, and other bit-twiddling PyTorch ops to squeeze sub-byte data into a byte-sized dtype. It's a trick we use quite heavily in torchao.
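
As a rough illustration of the idea (a minimal sketch, not torchao's actual API; the `pack_int4` / `unpack_int4` names are made up for this example), you can pack two 4-bit values into each uint8 byte with shifts and masks:

```python
import torch

def pack_int4(x: torch.Tensor) -> torch.Tensor:
    """Pack a uint8 tensor whose values fit in 4 bits (0..15), two per byte."""
    assert x.dtype == torch.uint8 and x.shape[-1] % 2 == 0
    lo = x[..., ::2]       # even-indexed elements go in the low nibble
    hi = x[..., 1::2]      # odd-indexed elements go in the high nibble
    return lo | (hi << 4)  # each output byte now holds two input values

def unpack_int4(packed: torch.Tensor) -> torch.Tensor:
    """Invert pack_int4, recovering the original element layout."""
    lo = packed & 0x0F
    hi = (packed >> 4) & 0x0F
    # interleave low/high nibbles back into their original positions
    return torch.stack([lo, hi], dim=-1).flatten(-2)

vals = torch.randint(0, 16, (4, 8), dtype=torch.uint8)
packed = pack_int4(vals)  # half the storage of `vals`
assert torch.equal(unpack_int4(packed), vals)
```

The same pattern generalizes to other sub-byte widths (e.g. four 2-bit values per byte), at the cost of an unpack step before you can operate on the values.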