
Where do you host your model? I'm looking around for somewhere I can deploy one without ruining myself financially.


The easy answer here would be either pure CPU at Hetzner (e.g. a 24-core i9 with 64GB RAM for €84/month) or GPU at lambdalabs (starting at $360/month). Or maybe vast.ai, if you find a trustworthy offering with good uptime.

Though GPU workloads are still an area where building your own server and running it from your basement, or putting it in colo, can be very attractive.


You could easily host your model on https://beam.cloud (I'm a founder). You just add a decorator to your existing Python code:

    from beam import App, Runtime

    # Define the app and request a T4 GPU runtime for it
    app = App(name="gpu-app", runtime=Runtime(gpu="T4"))

    # Expose this function as a REST endpoint
    @app.rest_api()
    def inference():
        print("This is running on a GPU")
Then run `beam deploy {your-app}.py` and boom, it's running in the cloud.
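Once it's deployed, you call it like any other REST endpoint. Something like this (the URL and token below are placeholders; the deploy step prints the real ones):

    import requests

    # Placeholder URL and token -- substitute whatever `beam deploy` prints for your app
    url = "https://<endpoint-printed-by-beam-deploy>"
    headers = {"Authorization": "Bearer YOUR_API_TOKEN"}

    resp = requests.post(url, headers=headers, json={})
    print(resp.status_code, resp.text)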


An A10G for $1,200 per month will ruin me financially


I think the Beam website should be a lot clearer about how things work[0], but it sounds like Beam bills you for your actual usage, in a serverless fashion. So unless you're running computations continuously for the entire month, it won't cost $1,200/mo.
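Back-of-the-envelope (my assumptions, not Beam's actual pricing): if $1,200/mo is the always-on price, that's roughly $1.64/hr, so a couple of hours of real compute per day lands closer to $100/mo:

    # Rough serverless vs. always-on comparison; all numbers are assumptions
    hourly_rate = 1200 / 730        # ~$1.64/hr if $1,200/mo is the 24/7 price
    hours_per_day = 2               # say, bursty inference traffic
    print(f"~${hourly_rate * hours_per_day * 30:.0f}/mo")  # ~$99/mo instead of $1,200/mo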

If it works the way I think it does, it sounds appealing, but the GPUs also feel a bit small. The A10G only has 24GB of VRAM. They say they're planning to add an A100 option, but... only the 40GB model? Nvidia has offered an 80GB A100 for several years now, which seems like it would be far more useful for pushing the limits of today's 70B+ parameter models. Quantization can get a 70B parameter model running on less VRAM, but it's definitely a trade-off, and I'm not sure how the training side of things works with regards to quantized models.
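For context, a rough weights-only VRAM estimate for a 70B model (ignoring KV cache, activations, and framework overhead):

    # Weights-only memory for a 70B-parameter model at different precisions
    params = 70e9
    for name, bytes_per_param in [("fp16", 2), ("int8", 1), ("int4", 0.5)]:
        print(f"{name}: ~{params * bytes_per_param / 1e9:.0f} GB")
    # fp16: ~140 GB -> multiple 80GB A100s
    # int8: ~70 GB  -> one 80GB A100, but not a 40GB one
    # int4: ~35 GB  -> a 40GB A100, or split across two 24GB cards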

Beam's focus on Python apps makes a lot of sense, but what if I want to run `llama.cpp`?

Anyways, Beam is obviously a very small team, so they can't solve every problem for every person.

[0]: What is the "time to idle" for serverless functions? Is it instant? "Pay for what you use, down to the second" sounds good in theory, but AWS also uses per-second billing on tons of stuff, and EC2 instances don't just stop billing you when they go idle; you have to manually shut them down and start them back up. So making the lifecycle clearer would be great. Even a quick example of how you would be billed might be helpful.


Why did you decide to make such a bad pitch for your product?


I found gpu-mart.com but haven't tested it yet.

An A4000 for $139 x 12 is not terrible


Last year I invested in a dual RTX 3090 Ryzen self-built tower. It runs fairly cool in the basement, so I literally self-host. I'm confident I have reached, or soon will reach, the point on the cost curve where self-hosting is cheaper, particularly since the two GPUs see very consistent use.
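Rough break-even math (every figure below is an assumption, not my actual costs):

    # Self-host vs. rented-GPU break-even; all numbers are assumptions
    hardware_cost = 3500            # dual RTX 3090 + Ryzen build, USD
    power_kw = 0.8                  # rough full-load draw of the box
    electricity_per_kwh = 0.30      # local power price
    cloud_rate_per_hr = 1.10        # two rented 3090/A10-class GPUs

    break_even_hours = hardware_cost / (cloud_rate_per_hr - power_kw * electricity_per_kwh)
    print(f"~{break_even_hours:.0f} GPU-hours (~{break_even_hours / 24 / 30:.1f} months of constant use)")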


what are you consistently using it for?


I use Runpod; an A4000 is $0.31/hr



