You might need to turn laws into formal proofs, and the existence of judges makes me think that’s not as likely as you would like. A commenting system would though—trained on countries’s precedents, jurisprudence and traditions might.
This could in theory already happen without any tech, but I suspect since the government is pretty monolithic, any changes in a specific law are all being done by the same set of people.
You might not have merge conflicts but I imagine you could end up with conflicting guidance from two separate pieces of law (e.g., law A says you must wear green on St. Patrick's day, law B outlaws green pajamas).
> “We are not aware of any successful mercenary spyware attacks against a Lockdown Mode-enabled Apple device,” Apple spokesperson Sarah O’Rourke told TechCrunch on Friday.
At least 225 judges have ruled in more than 700 cases that the administration's mandatory immigration detention policy likely violates the right to due process[1] The Fifth Amendment's Due Process Clause generally requires those having federal funds cut off to receive notice and an opportunity for a hearing, which was not provided in many of DOGE's spending freezes[2]
Not really related, but does anybody know if somebody's tracking same models performance on some benchmarks over time? Sometimes I feel like I'm being A/B tested.
Yeah, good tests are associated with cost. I'd like to see benchmarks on big messy codebases and how models perform on a clearly defined task that's easy to verify.
I was thinking that tokens spent in such case could also be an interesting measure, but some agent can do small useful refactoring. Although prompt could specify to do the minimal change required to achieve the goal.
People are loading huge interpreted environments for stuff that can be done from the command line. Run computations on complex objects where it could be a single machine instruction etc. The trend has been around for a long time.
Wow /insights is genuinely useful, perhaps CLI should be pushing that as a tip, if one has enough sessions, instead of keep nagging me about the frontend developer skill which I already have installed
In general CLI could be more reliable and responsive though, it's a text based env yet sometimes feel like running windows 95 on 386dx
It seems clear from the insights that some model is marking failure cases when things went wrong and likely reporting home, so that should be extremely valuable to Anthropic
your CPU, your OS, CPU and firmware on your motherboard chips, ethernet, wifi, HDDs (btw did you know your sim card has JVM?), your browser, all your networking equipment in between, BGP and all the root certs and I'm just scratching the surface
reply