Hacker News | chisleu's comments

This is going to improve the quality of LLM responses for users. I'm for this.

> You don’t have to randomize the first part of your object keys to ensure they get spread around and avoid hotspots.

As of when? According to internal support, this is still required as of 1.5 years ago.


I think there is some nuance needed here. If you ask support to partition your bucket then they will be a bit annoying if you ask for specific partition points and the first part of the prefix is not randomised. They tried to push me to refactor the bucket first to randomise the beginning of the prefix, but eventually they did it.

The auto partitioning is different. It can isolate hot prefixes on its own and can intelligently pick the partition points. Problem is the process is slow and you can be throttled for more than a day before it kicks in.
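For context, the old advice being discussed amounted to prepending a few hash characters to each key so keys spread across partitions. A minimal sketch of that pattern (the helper name is hypothetical; per the thread, AWS's auto partitioning now makes this unnecessary):

```python
import hashlib

def randomized_key(original_key: str, prefix_len: int = 4) -> str:
    """Prepend a short hex hash so keys scatter across S3 partitions."""
    digest = hashlib.md5(original_key.encode()).hexdigest()
    return f"{digest[:prefix_len]}/{original_key}"

# The leading hex chunk differs per key, spreading request load,
# at the cost of making keys unreadable and ranges unscannable.
print(randomized_key("logs/2024/01/15/app.log"))
```

The readability cost is exactly why support pushing for this refactor was painful: sequential date-based listing breaks once keys start with hash noise.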


> but eventually they did it

They can do this with manual partitioning indeed. I've done it before, but it's not ideal because the auto partitioner will scale beyond almost anything AWS will give you with manual partitioning unless you have 24/7 workloads.

> you can be throttled for more than a day before it kicks in

I expect this depends on your use case. If you are dropping content you need to scale out to tons of readers, that is absolutely the case. If you are dropping tons of content with well distributed reads, then the auto partitioner is The Way.


He's not talking about the prefix, just the beginning of the object key.

The prefix is not separate from the object key. It's part of it. There's no randomization that needs to be done on either anymore.

And indeed, the bucket is not separate from the object key. The API separates it logically "for humans", but it's all one big string.
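A minimal illustration of the point: an S3 key is one flat string, and "folders" are just a slash convention, so any leading substring can serve as a prefix:

```python
# A hypothetical key; S3 stores it as a single string, not a directory tree.
key = "invoices/2024/07/acct-42.pdf"

# The slash-terminated leading substrings people usually call "prefixes":
prefixes = [key[:i] for i in range(1, len(key) + 1) if key[i - 1] == "/"]
print(prefixes)  # ['invoices/', 'invoices/2024/', 'invoices/2024/07/']
```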

/agree

We are in the infancy of LLM technology.


How was he doing "complex agentic coding" when the APIs have such extreme context and throughput limitations?


holy shit it does. The scene with him inventing the new compression algorithm basically foreshadowed the gooning to follow local LLM availability.


I use Opus or Gemini 2.5 Pro for plan mode and Sonnet for act mode in Cline. https://cline.bot

It's my experience that Opus is better at solving architectural challenges where Sonnet struggles.


It looks like qwen3-coder is going to steal K2's thunder in terms of agentic coding use.


Maybe so, but currently I like the sound of K2's writing more than Qwen3's (so far in my testing).


It's 480B params, not 480GB. The 4-bit version of this is 270GB. I believe it's trained at bf16, so you need over a TB of memory to operate the model at bf16. No one should be trying to replace Claude with a quantized 8-bit or 4-bit model. It's simply not possible.

Also, this model isn't going to be as versed as Claude in certain libraries and languages. I have something written entirely by Claude which uses the Fyne library extensively in golang for UI. Claude knows it inside and out, as it's all vibe coded, but the 4-bit Qwen3 Coder just hallucinated functions and parameters that don't exist because it wasn't willing to admit it didn't know what it was doing. Definitely don't judge a model by its quant is all I'm saying.
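The arithmetic behind those numbers, as a rough sketch (weights only; KV cache, activations, and quant-file metadata come on top, which is roughly why the 4-bit build lands near 270GB rather than 240GB):

```python
PARAMS = 480e9  # 480B parameters

def weight_gb(bits_per_param: float) -> float:
    """Raw weight storage in GB at a given precision."""
    return PARAMS * bits_per_param / 8 / 1e9

for name, bits in [("bf16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{name}: ~{weight_gb(bits):.0f} GB")
# bf16: ~960 GB, 8-bit: ~480 GB, 4-bit: ~240 GB
```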


A Mac Studio 512GB can run it in 4bit quantization. I'm excited to see unsloth dynamic quants for this today.


I tried using the "fp8" model through hyperbolic but I question if it was even that model. It was basically useless through hyperbolic.

I downloaded the 4bit quant to my mac studio 512GB. 7-8 minutes until first tokens with a big Cline prompt for it to chew on. Performance is exceptional. It nailed all the tool calls, loaded my memory bank, and reasoned about a golang code base well enough to write a blog post on the topic: https://convergence.ninja/post/blogs/000016-ForeverFantasyFr...

Writing blog posts is one of the tests I use for these models. It is a very involved process including a Q&A phase, drafting phase, approval, and deployment. The filenames follow a certain pattern. The file has to be uploaded to s3 in a certain location to trigger the deployment. It's a complex custom task that I automated.

Even the 4-bit model was capable of this, but it was incapable of actually working on my code, preferring to hallucinate methods that would be convenient rather than admitting it didn't know what it was doing. This is the 4-bit "lobotomized" model though. I'm excited to see how it performs at full power.

