Hacker News | jweir's comments

I switched back to 4.5 Sonnet or Opus yesterday since 4.6 was so slow and often “over thinking” or “over analyzing” the problem space. Tasks that reliably took under a minute in Sonnet 4.5 were still running after 5 minutes in 4.6 (yeah, I had them race for a few tasks).

Some of this could be system overload, I suppose.


Edit ~/.claude/settings.json and add "effortLevel": "medium". Alternatively, you can put it in .claude/settings.json in a project if you want to try it out first.
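
For reference, the resulting file would look something like this (merged with whatever settings you already have):

    {
      "effortLevel": "medium"
    }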

They recommend this in the announcement[1], but the way they suggest doing it is via a bogus /effort command that doesn't exist. See [2] for full details about thinking effort. It also recommends a bogus way to change effort by using the arrow keys when selecting a model, so don't use that either.

[1]: https://www.anthropic.com/news/claude-opus-4-6

[2]: https://code.claude.com/docs/en/model-config#adjust-effort-l...


You can do it via /model and pressing left and right though

That's not a thing, at least not in my installation of Claude Code.

It works for me! (Edited link since the original had my laptop's serial number in it: https://screen.studio/share/3CEvdyji)

Claude Code v2.1.37

EU region, Claude Max 20x plan

Mac -- Tahoe 26.2


Good to know it works for some people! I think it's another issue where they focus too much on MacOS and neglect the Windows and Linux releases. I use WSL for Claude Code since the Windows release is far worse and currently unusable due to several neglected issues.

Hoping to see several missing features land in the Linux release soon.

I'm also feeling weak, and the pull of getting a Mac is getting stronger. But I also really don't like the neglect around being cross-platform. It's "cross-platform" except a bunch of crap doesn't work outside MacOS. This applies to Claude Code, Claude Desktop (MacOS and Windows only - no Linux or WSL support), and Claude Cowork (MacOS only). OpenAI does the same crap - the new Codex desktop app is MacOS only. And now I'm ranting.


What version are you on? Did you run a Claude update?

I'm on v2.1.37 and I have it set to auto-update, which it does. I also tend to run `claude update` when I see a new release thread on Twitter, and usually it has already updated itself.

what? Their documentation is hallucinated?

Yep, and their documentation AI assistant will egregiously hallucinate whatever it thinks you want to hear, then repeat itself in a loop when you tell it that it's wrong.

Yesterday I asked a question about a Claude Code setting inside Claude Code, don't recall which, and their builtin documentation skill—something like that—ended up doing a web search and found a wrong answer on a third party site. Later I went to their documentation site and it was right there in the docs. Wonder why they can't bundle an AI-friendly version of their own docs (can't be more than a few hundred KBs compressed?) inside their 174MB executable.

It's insane that they concluded the builtin introspection skill for claude documentation should do a web search instead of simply packing the correct documentation in local files. I had the same experience as you, wasting tokens and my time because their architecture decision doesn't work in practice.

I have to google the correct Anthropic documentation and pass the link to claude code, because claude can't reliably do the same itself in order to learn how to use its own features.


Also if they bundled the documentation for the version you're running it would have fewer problems due to version differences (like stable vs latest).

They used to? I have a distinct memory of it doing exactly that a few months ago. Maybe it got dropped in the mad dash that passes for CC sprint cycles

Pathetic how they have no support for modifying sampling settings, or even a "logit_bias" so I can ban my claude from using the EM dash (and regular dash), semicolons, or "not". It would also let me upweight things like exclamation points.

Clearly those whose job it is to "monitor" folks use this as their "tell" if someone AI generated something. That's why every major LLM has this particular slop profile. It's infuriating.

I wrote a long-winded rant about this bullshit:

https://gist.github.com/Hellisotherpeople/71ba712f9f899adcb0...
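
For comparison, here is roughly what that control looks like where it does exist, sketched against an OpenAI-style chat API via the ruby-openai gem. The token ID below is hypothetical (real IDs depend on the model's tokenizer), and Anthropic's API exposes no equivalent:

    require "openai" # ruby-openai gem

    client = OpenAI::Client.new(access_token: ENV["OPENAI_API_KEY"])
    response = client.chat(
      parameters: {
        model: "gpt-4o",
        messages: [{ role: "user", content: "Write one sentence." }],
        # -100 effectively bans a token; positive values upweight it.
        # 2001 stands in for the em dash token ID (hypothetical).
        logit_bias: { 2001 => -100 }
      }
    )
    puts response.dig("choices", 0, "message", "content")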


They mentioned in the release notes that if it's over-thinking you should decrease the reasoning effort.

Yeah, nothing is sped up: their initial deployment of 4.6 is so unbearably slow that they're just now offering you the opportunity to pay more for the same experience as 4.5. What's the word for that?

Enslopification.

Remember having to write detailed specs before coding? Then folks realized it was faster and easier to skip the specs and write the code? So now are we back to where we were?

One of the problems with writing detailed specs is that it assumes you already understand the problem - but often the problem is not understood; you learn to understand it through coding and testing.

So where are we now?


Skip specs, and you often ended up writing the wrong program - at substantial cost.

The main difference now is the parrots have reduced the cost of the wrong program to near zero, thereby eliminating much of the perceived value of a spec.


Astronaut 1, AI-assisted developers: You mean, it's critical to plan and spec out what you want to write before you start in on code?

Astronaut 2, Tim Bryce: Always has been...


We’re not “thinking with portals” about these things enough yet. Typically we’d want a detailed spec beforehand, as coding is expensive and time consuming, so we want to make sure we’re coding the right thing. With AI, though, coding is cheap. So let AI skip the spec and write the code badly. Then have it review the solution, build understanding, design a spec for a better solution, and have it write it again. Rinse and repeat as many times as you need.

It’s also nothing new, as it’s basically Joe Armstrong's programming method. It’s just not prohibitively expensive for the first time in history.


Joe should sue.

That’d be challenging for him right now.

Look up Sean Bell - not a stop and frisk, just opening fire.

Once, my wife and I were stopped, but not frisked, and cited for riding bikes on a sidewalk at 2AM, on a stretch of Atlantic Ave that would kill you to ride on. It made no sense until I found out that my neighbor and his friend had been murdered at a street party. There was a dragnet out trying to find the killer, and they stopped anyone for anything.

A tough city.


This is our experience. We have added Sorbet to a 16-year-old Rails app. It is a big win: fewer errors and typos, better documentation and code completion, fewer tests required, etc.

And the LLMs take advantage of the types through the LSP and type checking.


I’d love to hear from you or someone in your shoes: what are some patterns or examples of tests that are made redundant by types?

“It has a field of type X” has never been a useful test for me; my tests are always more like:

“if I send message X I get return value or action Y”

… with my admittedly limited experience of types I don’t see how they replicate this.

Therefore it looks like I’d only be “replacing” tests that I’d never write in the first place.

What am I missing?


One of the big advantages of types is documenting what is *not* allowed. This brings clarity to the developers and additionally ensures that what is not allowed does not happen.

Unit tests typically test for behaviours. These could be both positive and negative tests. But we often test only a subset of possibilities, just because of how people generally think (more positive cases than negative cases). Theoretically we can do all those tests with unit testing. But we need to ask ourselves honestly: do we have the kind of test coverage SQLite has? If yes, do we have that for very large codebases?


Just to clarify, are you saying SQLite is a good example that we should emulate?


SQLite is known for having a lot of testing. Per their docs, around 600x as much test code as application code.

https://sqlite.org/testing.html


We have some tests that ensure the interface is correct - that the correct types of args are passed, say from a batch process to a mailer, and that a mail object is returned.

For these tests we don’t care about the content, only that nothing got incorrectly set and that the mailer interface didn’t change.

Now if the developer changes the Mailer to require a user object, the compiler tells us there is an error. Sorbet will error and say “hey, you need to update your code here and here by adding a User object.”

Before we would have had test coverage for that - or maybe not and missed the error.
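
A minimal sketch of that pattern (class and method names are illustrative, not from our codebase):

    # typed: true
    require "sorbet-runtime"

    class User; end
    class MailMessage; end

    class WelcomeMailer
      extend T::Sig

      # The sig encodes the interface the old tests used to assert.
      sig { params(user: User).returns(MailMessage) }
      def self.welcome(user)
        MailMessage.new
      end
    end

    WelcomeMailer.welcome(User.new) # fine
    # WelcomeMailer.welcome("bob")  # fails `srb tc`, raises via sorbet-runtime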


The first one that pops to mind is some old Python code: the parameter that came in on some functions could be a single string or a list of them. Lots of bugs where arg[0] turned out to be a character rather than a string. So tests had to be written showing both being passed in.
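
In Sorbet terms (a hypothetical sketch, not the original Python), a union type puts that ambiguity in the signature and forces both cases to be handled explicitly:

    require "sorbet-runtime"

    class Notifier
      extend T::Sig

      sig { params(recipients: T.any(String, T::Array[String])).returns(T::Array[String]) }
      def self.normalize(recipients)
        case recipients
        when String then [recipients] # a bare string gets wrapped, never indexed
        else recipients
        end
      end
    end

    p Notifier.normalize("alice")        # => ["alice"]
    p Notifier.normalize(%w[alice bob])  # => ["alice", "bob"]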


We have been adding Sorbet typing to our Rails application and it is a positive enhancement.

It’s not like Ruby becomes Haskell. But it does provide a good deal of additional safety, less testing, good LSP integration, and it is gradual.

There is a performance hit but we found it to be quite small and not an issue.

But there are areas of our application that use Grape, and it is too meta for Sorbet, so we don’t try to use it there.


Same here. T::Struct and T::Enum at API boundaries have been the sweet spot—typed request/response models, runtime validation at ingress/egress.

I’ve been using this pattern for API clients[0] and CLIs[1]: define the shape once with Sorbet, get automatic JSON Schema generation when you need it.

[0] https://github.com/vicentereig/exa-ruby [1] https://github.com/vicentereig/lf-cli
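
A minimal T::Struct sketch of that boundary pattern (the request shape is illustrative, not from the linked repos):

    require "sorbet-runtime"

    class CreateUserRequest < T::Struct
      const :email, String
      const :admin, T::Boolean, default: false
    end

    # Raises TypeError at ingress if the payload doesn't match the declared shape.
    req = CreateUserRequest.new(email: "a@example.com")
    puts req.email # => "a@example.com"
    puts req.admin # => false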


> It’s not like Ruby becomes Haskell.

Well, maybe next time.


And use a single-pixel invisible gif to move things around.

But was Space Jam using multiple images or just one large image with an image map for links?


The author said he had the assets and gave them to Claude. It would be obvious if he had one large image for all the planets instead of individual ones.


And moms are the gatekeepers of their kids' friends.


Speaking of compiling Ruby: any Stripe coders here who have used the Sorbet compiler?

https://sorbet.org/blog/2021/07/30/open-sourcing-sorbet-comp...


It seems to be gone from the repo, and doesn't seem to be worked on any more? A shame.

AOT compiling Ruby is hard. I'm trying [1] [2].

Sorbet would be in a good position, because part of the challenge is that Ruby has a lot of rarely used semantics that make compiled Ruby really hard to make fast. E.g. bignum promotion adds overhead to every single operation unless you can prove invariants about the range of the values; the meta-programming likewise adds overhead and makes even very basic operations really expensive unless you can prove classes (or individual objects) aren't being mucked with...

So starting with type checking is an interesting approach to potentially allow for compiling guarded type-specific fast paths. If my compiler ever gets close enough to feature complete (it's a hobby project, so depends entirely on how much time I get, though now I also justify more time for it by using it as a test-bed for LLM tooling), it's certainly a direction I'd love to eventually explore.

[1] https://hokstad.com/compiler (not updated for a decade)

[2] https://github.com/vidarh/writing-a-compiler-in-ruby/ (updated now primarily by Claude Code; it's currently focusing on actually passing RubySpec and making speedy progress, at the cost of allowing some fairly ugly code - I do cleanup passes occasionally, but most of the cleanup will be deferred until more passes)
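
To make the bignum point concrete, this is legal Ruby, so a naively compiled `+` can't just emit a machine add; it needs an overflow check on every operation unless range invariants can be proven:

    a = 2**61    # still a tagged fixnum in CRuby
    b = a + a    # 2**62 overflows the fixnum range and silently promotes
    puts b       # => 4611686018427387904
    puts b.class # => Integer (now a heap-allocated bignum internally)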


And won’t the authService.register function also error if the user already exists? Or will it allow double-registering the account?

There are deeper problems here that a Result type is not gonna fix.


The authService.register function will error; it says so in the article.


Something that the type system should do is "make impossible states impossible", as Evan Czaplicki said (maybe others too).

We have started to use typed HTML templates in Ruby using Sorbet. It definitely prevents some production bugs (our old HAML templates would have `nil` errors when first going into production).
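
A tiny sketch of what that buys (a hypothetical component, not our real template layer):

    require "sorbet-runtime"

    class Greeting
      extend T::Sig

      # `name: String` makes the nil-in-production state unrepresentable.
      sig { params(name: String).returns(String) }
      def self.render(name)
        "<h1>Hello, #{name}!</h1>"
      end
    end

    puts Greeting.render("Ada")
    # Greeting.render(nil) fails `srb tc` and raises at runtime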

