https://github.com/mit-han-lab/nunchaku

oneshtein · 2024-11-09T15:10:08 1731165008

Cannot compile it locally on Fedora 40:

  nunchaku/third_party/spdlog/include/spdlog/common.h(144): error: namespace "std" has no member "function"
  using err_handler = std::function<void(const std::string &err_msg)>;
                                   ^

mesmertech · 2024-11-09T15:18:32 1731165512

Yeah, it's a pain. I'm trying to make an API endpoint for a website I own, and I'm working on a Docker image. This is what I have for now that "just" works:

The conda always-yes setting makes sure you can just paste the script and it all works, instead of having to press "y" for each install. Also, if you don't feel like installing a wheel from a random person on the internet, replace that step with "pip install -e ." as the repo suggests. I compiled that one with CUDA 12.4 because that's the part that takes the most time and seems to break most often.

Also, I'm not sure if this will work on Fedora. I tried this on a runpod machine with a 4090 (apparently it only works on a few cards: 3090, 4090, A100, etc.), with CUDA 12.4 on the host machine and "runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04" as the base image.

EDIT: using pastebin instead, as HN doesn't seem to jibe with code blocks: https://pastebin.com/zK1z0UdM

oneshtein · 2024-11-09T17:35:28 1731173728

Almost working:

  [2024-11-09 19:33:55.214] [info] Initializing QuantizedFluxModel
  [2024-11-09 19:33:55.359] [info] Loading weights from ~/.cache/huggingface/hub/models--mit-han-lab--svdquant-models/snapshots/d2a46e82a378ec70e3329a2219ac4331a444a999/svdq-int4-flux.1-schnell.safetensors
  [2024-11-09 19:34:01.432] [warning] Unable to pin memory: invalid argument
  [2024-11-09 19:34:02.143] [info] Done.
  terminate called after throwing an instance of 'CUDAError'
    what():  CUDA error: pointer does not correspond to a registered memory region (at /nunchaku/src/Serialization.cpp:32)

mesmertech · 2024-11-09T17:53:14 1731174794

Probably make sure your host machine's CUDA is also 12.4, and if not, update the other CUDA versions I have in the pastebin to the one you have. I don't think it works with CUDA 11.8 though; I remember trying it once.

But yeah, I can't help you outside of runpod; I haven't even tried this on my home PCs yet. For my use case of a serverless API, it seems to work.
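
For anyone wanting to check the CUDA version before editing the script, here is a minimal sketch, not something from the repo or the pastebin. It assumes `nvcc` is on PATH and prints its usual "release X.Y" line; `EXPECTED_CUDA`, `parse_nvcc_release`, and `local_cuda_release` are made-up names for illustration. Note that it reads the toolkit version; the version supported by the host driver is the one reported by `nvidia-smi`.

```python
import re
import shutil
import subprocess

# Assumption: 12.4 is the version the wheel in the thread was compiled with.
EXPECTED_CUDA = (12, 4)


def parse_nvcc_release(text):
    """Return (major, minor) from `nvcc --version` output, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def local_cuda_release():
    """Run nvcc if it is on PATH and parse the release it reports."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(["nvcc", "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    found = local_cuda_release()
    if found is None:
        print("nvcc not found; cannot determine the CUDA toolkit version")
    elif found != EXPECTED_CUDA:
        print(f"CUDA toolkit {found[0]}.{found[1]} differs from the expected 12.4")
    else:
        print("CUDA toolkit matches 12.4")
```

If the versions differ, that would be the point at which to swap the CUDA versions in the script for the local one, as suggested above.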