I think there is some nuance needed here. If you ask support to partition your bucket, they will push back if you request specific partition points and the first part of the prefix is not randomised. They tried to push me to refactor the bucket first to randomise the beginning of the prefix, but eventually they did it.
The auto partitioning is different. It can isolate hot prefixes on its own and intelligently pick the partition points. The problem is that the process is slow, and you can be throttled for more than a day before it kicks in.
They can do this with manual partitioning indeed. I've done it before, but it's not ideal because the auto partitioner will scale beyond almost anything AWS will give you with manual partitioning unless you have 24/7 workloads.
> you can be throttled for more than a day before it kicks in
I expect this would depend on your use case. If you are dropping content that you need to scale out to tons of readers, that is absolutely the case. If you are dropping tons of content with well distributed reads, then the auto partitioner is The Way.
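To make the "randomise the beginning of the prefix" point concrete, here is a minimal sketch of one common approach: deriving a short hash-based prefix from an object id so keys spread across the keyspace instead of clustering under a sequential or date-based prefix. The bucket layout and names here are hypothetical, not anything AWS prescribes.

```python
import hashlib

def randomized_key(object_id: str, filename: str) -> str:
    # Derive a short hex prefix from a hash of the object id. Keys then
    # fan out across up to 65536 (16^4) prefixes instead of piling up
    # under one hot prefix, which gives S3 clean partition split points.
    prefix = hashlib.sha256(object_id.encode()).hexdigest()[:4]
    return f"{prefix}/{object_id}/{filename}"

# Deterministic: the same object id always maps to the same prefix,
# so reads can reconstruct the key without a lookup table.
print(randomized_key("order-12345", "invoice.pdf"))
```

The trade-off is that you lose lexicographic listing by date or id, which is exactly why refactoring an existing bucket this way is painful and why people ask support for manual partition points instead.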
It's 480B params, not 480GB. The 4-bit version of this is 270GB. I believe it's trained at bf16, so you need over a TB of memory to operate the model at bf16. No one should be trying to replace Claude with a quantized 8-bit or 4-bit model. It's simply not possible. Also, this model isn't going to be as versed as Claude in certain libraries and languages. I have something written entirely by Claude which uses the Fyne library extensively in golang for UI. Claude knows it inside and out as it's all vibe coded, but the 4-bit Qwen3 coder just hallucinated functions and parameters that don't exist because it wasn't willing to admit it didn't know what it was doing. Definitely don't judge a model by its quant is all I'm saying.
I tried using the "fp8" model through Hyperbolic, but I question whether it was even that model. It was basically useless through Hyperbolic.
I downloaded the 4-bit quant to my Mac Studio 512GB. 7-8 minutes until first tokens with a big Cline prompt for it to chew on. Performance is exceptional. It nailed all the tool calls, loaded my memory bank, and reasoned about a golang code base well enough to write a blog post on the topic: https://convergence.ninja/post/blogs/000016-ForeverFantasyFr...
Writing blog posts is one of the tests I use for these models. It is a very involved process including a Q&A phase, drafting phase, approval, and deployment. The filenames follow a certain pattern. The file has to be uploaded to s3 in a certain location to trigger the deployment. It's a complex custom task that I automated.
Even the 4-bit model was capable of this, but it was incapable of actually working on my code, preferring to hallucinate methods that would be convenient rather than admitting it didn't know what it was doing. This is the 4-bit "lobotomized" model though. I'm excited to see how it performs at full power.