
osti · 2026-04-23T22:09:35 1776982175

Can't they write a script to solve Rubik's cubes?

Jensson · 2026-04-24T03:44:06 1777002246

That doesn't test whether the model can follow and execute a dynamic plan reliably.

osti · 2026-04-20T16:33:42 1776702822

I think this one only needs about 600GB of VRAM, so it could fit on two Mac Studios with 512GB of memory each. That would have cost something like under 20k (though that configuration is no longer available).
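As a rough check on the arithmetic above: two 512GB machines pool 1,024GB, but not all of it is usable for weights. The sketch below assumes a hypothetical 75% usable fraction (headroom for the OS, KV cache, and activations) and, for the size estimate, a hypothetical ~1T-parameter model quantized to ~4.5 bits per weight. Neither figure comes from the thread, and the sketch also assumes the model can be split across both machines at all.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits_on_cluster(model_gb: float, n_devices: int, gb_per_device: float,
                    usable_fraction: float = 0.75) -> bool:
    """True if the model fits in the pooled *usable* memory.

    usable_fraction is a hypothetical allowance, not a measured figure.
    """
    usable_gb = n_devices * gb_per_device * usable_fraction
    return model_gb <= usable_gb


print(weights_gb(1000, 4.5))                                     # 562.5
print(fits_on_cluster(600, n_devices=2, gb_per_device=512))      # True (768 GB usable)
print(fits_on_cluster(600, n_devices=1, gb_per_device=512))      # False (384 GB usable)
```

Pooling memory across two machines also means activations cross the network link between them, which this estimate ignores.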

NitpickLawyer · 2026-04-20T16:38:30 1776703110

Yeah, but that's personal use at best; not much agentic anything happening on that hardware. Macs are great for small models at small-to-medium context lengths, but at > 64k (very common with agentic usage) they struggle and slow down a lot.

The ~100k hardware is suitable for multi-user, small-team usage. That's what you'd use for actual work in reasonable timeframes. For personal use, sure, Macs could work.

osti · 2026-04-20T18:52:10 1776711130

True, but I think for local models, we are mostly considering personal usage.

zozbot234 · 2026-04-20T17:00:05 1776704405

You could run it with SSD offload; earlier experiments with Kimi 2.5 on M5 hardware had it running at 2 tok/s. K2.6 has a similar number of total and active parameters.

osti · 2026-04-20T18:50:54 1776711054

Yeah... I would definitely call 2t/s unusable. For simple chats, I'd want at least 15 t/s. For agentic coding (which this model is advertised for), I'd want good prefill performance as well.
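To put those speeds in perspective, single-request latency is roughly prompt tokens divided by prefill speed, plus output tokens divided by decode speed. A minimal sketch, assuming a hypothetical prefill rate of 200 tok/s (not a measured figure for any hardware mentioned here) and ignoring batching across users:

```python
def request_seconds(prompt_tokens: int, output_tokens: int,
                    prefill_tok_s: float, decode_tok_s: float) -> float:
    """Rough end-to-end latency: prefill the prompt, then decode the reply."""
    prefill = prompt_tokens / prefill_tok_s
    decode = output_tokens / decode_tok_s
    return prefill + decode


# Simple chat: short prompt, 1,000-token reply, so decode dominates.
print(round(request_seconds(200, 1000, prefill_tok_s=200, decode_tok_s=2)))     # 501
print(round(request_seconds(200, 1000, prefill_tok_s=200, decode_tok_s=15)))    # 68
# Agentic turn: 64k-token context, so prefill dominates even at 15 tok/s decode.
print(round(request_seconds(64_000, 1000, prefill_tok_s=200, decode_tok_s=15)))  # 387
```

With these made-up numbers, a 64k-token agentic context spends 320 seconds in prefill alone, which is why prefill performance matters as much as decode speed for agentic coding.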

veber-alex · 2026-04-20T23:48:00 1776728880

That's just throwing money away. The performance with large contexts would have been unusable, especially if you need to serve more than a single person.

osti · 2026-04-20T16:31:25 1776702685

Maybe open source == communism

darkwater · 2026-04-20T16:38:05 1776703085

Good ol' Steve "Developers! Developers! Developers!" Ballmer said so a long time ago. What a visionary!

konart · 2026-04-20T16:58:08 1776704288

But China is not communist, even though the ruling party has the word in its name.

fragmede · 2026-04-20T17:20:46 1776705646

The Democratic People's Republic of Korea would like a word.

pheggs · 2026-04-20T17:29:33 1776706173

What makes you think that China ever gave up its communist goals? Personally, I see everything they do as aimed at that goal: the one-child policy, the huge numbers of empty apartments they build, the stuff they produce almost for free, the fishing. Open sourcing the models fits that culture perfectly too; it's the means of production.

otterley · 2026-04-20T18:32:54 1776709974

The one-child policy died a long time ago. Also, the accumulation of wealth by connected politicians and businesspeople flies in the face of what communism is supposed to stand for.

There is a reason real estate values in popular cities have skyrocketed, and it’s not due to the locals getting wealthier. It’s where Chinese and other oligarchs put their ill-gotten wealth (well, besides Bitcoin).

bwv848 · 2026-04-20T20:23:33 1776716613

The one-child policy did not die; it just morphed into a three-child policy. That is still a form of family planning, and they would probably still fine people for having more than three kids.

pheggs · 2026-04-20T19:14:08 1776712448

> The one-child policy died a long time ago.

True, but as far as I understand, it died because birth rates got too low, so they replaced it with a two-child policy and later with a three-child policy.

> Also, the accumulation of wealth by connected politicians and businesspeople flies in the face of what communism is supposed to stand for.

Yeah, I am sure there are a lot of cases of that. But as far as I know, the number of billionaires has started declining in China, and I don't see how that means they as a country moved away from the goal; it just means there are issues.

> There is a reason real estate values in popular cities has skyrocketed, and it’s not due to the locals getting wealthier.

I don't know about that; you could be right. A Google search for real estate prices in China turns up a lot of news articles about how they are going down, though.

> It’s where Chinese and other oligarchs put their ill-gotten wealth (well, besides Bitcoin).

I wouldn't be surprised if rich people in China invest in real estate. They don't have free capital flows, so it's not easy to invest abroad, and real estate becomes an obvious choice. Bitcoin is banned in China for that reason too.

But again, as far as I know, that does not mean the country has moved away from its goal of reaching communism one day.

otterley · 2026-04-20T19:42:58 1776714178

> I don't see how that means that they as a country moved away from the goal, it just means there's issues

They're further from Communism than they've ever been since the PRC was founded. The gap between rich and poor is growing there, not shrinking.

> A google search for real estate prices in china reveal a lot of news articles how they are going down though.

They're investing outside China (Vancouver, Toronto, NYC, London, Sydney, Melbourne, etc.) because their assets are safer there (these countries all have strong property protection laws). Like Bitcoin, freedom of capital flows may be restricted, but the wealthy seem to be evading these restrictions with impunity.

pheggs · 2026-04-20T20:22:46 1776716566

> They're further from Communism than they've ever been since the PRC was founded. The gap between rich and poor is growing there, not shrinking.

I suppose it depends on what time frame you look at. It has been shrinking since 2010, but inequality rose by more than that in the 80s: https://www.theglobaleconomy.com/China/gini_inequality_index...

However, that's not my point - I did not mean to say that they are going to be successful but rather that it still appears to be a long term goal for them.

> Like Bitcoin, freedom of capital flows may be restricted, but the wealthy seem to be evading these restrictions with impunity.

I don't know about that, without any source of data I guess I just have to take your word for it. I would not be surprised if you were right in this case though.

Saline9515 · 2026-04-20T20:39:45 1776717585

China is a ruthless capitalist country managed by an authoritarian regime. Planning and lack of respect for the individual or the rule of law are not communist per se.

nozzlegear · 2026-04-21T02:35:39 1776738939

> Planning and lack of respect for the individual or the rule of law are not communist per se.

They just happen to be a feature of every single country that's attempted communism to date. Total coincidence.

Saline9515 · 2026-04-22T20:15:04 1776888904

And? Fascism does it, too. Authoritarian rule, such as monarchy, does it too.

osti · 2026-04-20T17:06:46 1776704806

Oh, I’m fully aware of that lol

diegolas · 2026-04-21T12:33:30 1776774810

communism is a goal, capitalism is a stage

tadfisher · 2026-04-20T17:09:32 1776704972

Nah, open source means those who do the work own the result. It's supercapitalism.

pheggs · 2026-04-20T17:50:56 1776707456

I don't think that's right; the models and the GPUs are the means of production.

In capitalism, the people with the capital get the profit, not the people who do the work. However, workers are said to benefit too, through their salaries, just less so.

tadfisher · 2026-04-20T18:07:29 1776708449

The reason regular-capitalism worked is that all production used to depend on workers bottlenecking the free flow of capital by demanding salaries in exchange for their labor. Now that we've removed that obstacle, capitalism demands workers seize the means of production in order to maintain the status quo. Hence, supercapitalism.

throwaway-blaze · 2026-04-20T18:33:36 1776710016

Regular capitalism works, but now that the means of production are not factories, the workers have to become more entrepreneurial. Then they will control their destinies.

pheggs · 2026-04-20T18:31:01 1776709861

workers seizing the means of production is by definition socialism and not capitalism though, that's the whole idea behind socialism

tadfisher · 2026-04-20T23:52:39 1776729159

You miss the point: we advertise the change as workers becoming part of the owner class and realizing all of the economic gains of their work, thus supercapitalism. Don't use the "s" or "c" words.

osti · 2026-04-09T19:59:28 1775764768

Yup, I've mentioned this in another thread: I got GPT 5.4 xhigh to improve the throughput of a very complex, atypical CUDA kernel by 20x. This came from a combination of architecture changes followed by low-level optimizations, and it did the profiling all by itself. I was extremely impressed.

esperent · 2026-04-11T00:20:14 1775866814

Do you mean the non-codex model? Are people preferring normal GPT over codex?

osti · 2026-04-11T00:54:18 1775868858

I was using codex cli with 5.4xhigh. So it was able to iteratively improve from simple prompts on my part (can you give some architectural ideas to improve the performance? And once it does, I just say can you implement and benchmark it).

I think it was a bit like Karpathy's autoresearch, except I was doing manual prompting... Though I feel I could definitely be removed from that equation.

osti · 2026-04-07T17:44:10 1775583850

I had a very complex CUDA kernel, and Codex CLI managed to improve its throughput 20x.

AlexCoventry · 2026-04-08T20:56:40 1775681800

Hmm, I still have some nonrefundable API credits with OpenAI. Maybe I should try to use them for my kernel.

FWIW, this talk[1] from NVIDIA/Meta from March claims that coding agents can often write correct implementations of CUDA kernels, but that they're usually dog slow, like 100x slower than a kernel optimized by a skilled human.

[1] https://www.nvidia.com/en-us/on-demand/session/gtc26-s81653/

osti · 2026-04-09T23:38:10 1775777890

For me it was able to try out different architectures for perf improvement, then once it's settled on some good architectures, it can do lower level optimizations on them by profiling the code etc.

AlexCoventry · 2026-04-11T00:19:41 1775866781

That's great. It seems like a large body of experts are having real problems with this, so maybe you should publish something about your methods, or start a business...

osti · 2026-04-11T04:46:52 1775882812

I can't vouch for whether or not it can beat human experts, though, because I'm no CUDA expert myself. The original CUDA code was human-written, and I first let Codex adapt it to my specific use case. Then I basically let Codex generate ideas and try them out itself (I think it's a bit like Karpathy's autoresearch, except I was still doing manual prompting). That was enough to get me a 20x improvement.

I suspect that when people said AI wrote non-performant CUDA kernels, it was early-to-mid last year, and it has definitely improved vastly since then. The agent's ability to iteratively improve really impressed me.
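The loop described above (generate an idea, implement it, benchmark it, keep it if it wins) can be sketched as a verify-then-benchmark selection step. This is a minimal, hypothetical illustration, not the commenter's setup, Codex's behavior, or autoresearch's implementation: a toy "kernel" (summing a list in Python) stands in for CUDA, and every function name here is made up. The key design choice is that a variant is only timed after its output matches a trusted reference, so a fast but wrong idea can never win.

```python
import time
from typing import Callable, Sequence

Kernel = Callable[[Sequence[float]], float]


def best_time(fn: Kernel, data: Sequence[float], repeats: int = 5) -> float:
    """Best-of-N wall-clock time in seconds, to reduce timing noise."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(data)
        best = min(best, time.perf_counter() - start)
    return best


def pick_fastest(reference: Kernel, candidates: Sequence[Kernel],
                 data: Sequence[float], tol: float = 1e-9) -> tuple[Kernel, float]:
    """Keep the fastest candidate whose output matches the reference."""
    expected = reference(data)
    winner, winner_time = reference, best_time(reference, data)
    for fn in candidates:
        if abs(fn(data) - expected) > tol:
            continue  # wrong output: rejected before it is ever benchmarked
        elapsed = best_time(fn, data)
        if elapsed < winner_time:
            winner, winner_time = fn, elapsed
    return winner, winner_time


def loop_sum(xs: Sequence[float]) -> float:
    """Hypothetical 'original kernel': a plain Python loop."""
    total = 0.0
    for x in xs:
        total += x
    return total


data = [float(i) for i in range(100_000)]
winner, seconds = pick_fastest(loop_sum, [sum, lambda xs: 0.0], data)
print(winner.__name__, f"{seconds:.6f}s")
```

In a real kernel workflow the "candidates" would be rebuilt binaries and the timing would come from a profiler, but the accept-only-if-correct-and-faster rule is the same.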

osti · 2026-04-02T19:16:26 1775157386

But is arc-agi really that useful though? Nowadays it seems to me that it's just another benchmark that needs to be specifically trained for. Maybe the Chinese models just didn't focus on it as much.

sdenton4 · 2026-04-02T19:28:36 1775158116

Doing great on public datasets and underperforming on private benchmarks is not a good look.

Deegy · 2026-04-02T19:46:13 1775159173

Is it though? Do we still have the expectation that LLMs will eventually be able to solve problems they haven't seen before? Or do we just want the most accurate auto complete at the cheapest price at this point?

sdenton4 · 2026-04-02T23:10:12 1775171412

It indicates that there's a good chance that they have trained on the test set, making the eval scores useless. Even if you have given up on the dream of generalization entirely, you can't meaningfully compare models which have trained on test to those which have not.

stavros · 2026-04-02T23:00:07 1775170807

You're not supposed to train for benchmarks, that's their entire point.

osti · 2026-04-01T01:16:50 1775006210

That is true, but revenue from the artisanal stuff is probably only a very small percentage of the overall market, which would imply that a lot of software engineers would have to exit the field. That is what we here don't want to see.

osti · 2026-03-27T03:32:33 1774582353

Doesn't the chat version of ChatGPT or Gemini also have interleaved tool calls? Do those also count as having harnesses?

WiSaGaN · 2026-03-27T06:01:50 1774591310

Harness is fine. I think people here are arguing that what was provided to take the test is not a harness.

osti · 2026-03-24T03:25:09 1774322709

Seems like the high-compute parallel thinking models weren't even needed; both the normal 5.4 and Gemini 3.1 Pro solved it. Somehow Gemini 3 Deep Think couldn't solve it.

osti · 2026-03-21T18:01:56 1774116116

During flights? Sounds a bit harsh.

cobbzilla · 2026-03-21T18:06:28 1774116388

Have you ever tried to sleep while the person next to you watches a movie at full volume?

furyofantares · 2026-03-21T18:15:36 1774116936

Yeah, it sucks. I agree with you, they should be brutally murdered.

nxpnsv · 2026-03-21T18:22:42 1774117362

That's too harsh, a regular murder would suffice.

sharkweek · 2026-03-21T18:33:08 1774117988

Just put them in row 24 on a Boeing 737 max and let the problem take care of itself.

halapro · 2026-03-21T18:42:01 1774118521

Just open the window

lostlogin · 2026-03-21T18:44:56 1774118696

Boeing tried this new feature.

halapro · 2026-03-21T18:46:58 1774118818

Not a bug, works as intended.

lelanthran · 2026-03-21T18:34:06 1774118046

> That's too harsh, a regular murder would suffice.

Correct. Kicking someone off during a flight and not giving them a parachute counts as a regular murder...

verdverm · 2026-03-21T18:48:57 1774118937

Requisite link to satirical study

"Parachute use to prevent death and major trauma when jumping from aircraft: randomized controlled trial"

https://www.bmj.com/content/363/bmj.k5094

anigbrowl · 2026-03-22T02:57:17 1774148237

It's not murder if they're guilty. Those planes come with doors for a reason.

rendaw · 2026-03-21T19:09:07 1774120147

For all siblings, I think parent was suggesting "while in flight". i.e. dropping them from 30k feet. Hence harsh...

quietsegfault · 2026-03-21T18:03:17 1774116197

NO TICKET

lelanthran · 2026-03-21T18:46:56 1774118816

I wonder how many people got this reference.

Anyway, for those who did not: https://www.youtube.com/watch?v=rCZ86O3PO-U

shagie · 2026-03-21T19:54:44 1774122884

Could have also gone for Dogma (which of course references that clip) https://youtu.be/PpckOsftaP4?si=DDlDY3ZK7FoUcKrn&t=41

throwaway894345 · 2026-03-21T18:13:01 1774116781

Seems like this flew right over a few heads.

widowlark · 2026-03-21T18:17:12 1774117032

and yet the joke fell right into our laps

sebastiennight · 2026-03-21T18:19:42 1774117182

United says we should tone down the sarcasm

Hamuko · 2026-03-21T18:53:49 1774119229

Harsh, but fair.

SOLAR_FIELDS · 2026-03-21T18:59:38 1774119578

Now explain why it wouldn’t also be fair to kick people off that were loudly emitting disgusting flatulence. Is it because they “might” not have control over it? Can I not claim I also “might” not have the control over my impulsive desire to listen to music or that I can’t use headphones for a medical issue?

I mean such a thing I would say equally detracts from the flying experience, so why not also kick those people off?

Edit: not sure why I’m getting downvoted, this is a legitimate question. I genuinely want to hear the justification.

DaSHacka · 2026-03-21T19:09:33 1774120173

You'd have a more convincing argument if you argued for a passenger with Tourette's or something. Bodily functions are obviously different from watching a movie at full volume, because there's never a situation where you would be involuntarily blasting the audio of your show or whatever to the whole plane.

SOLAR_FIELDS · 2026-03-21T19:12:25 1774120345

Okay, Tourette’s then. Should we kick people off for Tourette’s?

Your comment also presupposes two things: that flatulence is always involuntary and blasting music isn’t. Let’s say I have a form of Tourette’s that forces me to involuntarily blast noise and music and I have medical papers to prove it. Is it okay then?

I would absolutely support it if you could demonstrate that those two things are actually true. My point is: Who gets to decide what’s legitimately an involuntary medical issue and what isn’t, and where is the line that demarcates it? And what is the point of this exercise? It’s to prevent people from forcing everyone else to have a worse experience for their own personal gain, which flatulence is a form of that you could argue, so why is blasting music fundamentally different?

recursive · 2026-03-21T19:34:24 1774121664

We're talking about music coming from a phone. Not a person. Just turn the phone off or uninstall tiktok. Or put it in your bag.

vel0city · 2026-03-21T19:53:03 1774122783

Are you seriously making the argument blasting music or a movie or whatever is an involuntary bodily function?

SOLAR_FIELDS · 2026-03-21T20:43:55 1774125835

Yes. Because I'm asking the question who decides what is involuntary or not. Who is it? It seems like there is a presupposition here, but who is defining that?

Coming back to the Tourette's example: let's say someone starts shouting cuss words and loudly annoying everyone else "involuntarily". Do they get kicked off the plane? Why or why not? Who decides that? Does the person have to present medical evidence that they have Tourette's to not get kicked off the plane? If so, can they also present medical evidence of a condition that causes them to spontaneously press play on their mobile devices with no headphones and would that be accepted?

I'm obviously not defending the behavior of the loud-music-on-plane-players, or advocating that everyone needs to smell everyone's farts. I'm pointing out that this is something that is arbitrary and weaponizable.

anigbrowl · 2026-03-22T02:59:21 1774148361

I vote to throw you off the plane for disingenuous baitposting.

vel0city · 2026-03-21T23:14:29 1774134869

You don't understand that a phone isn't a part of the human body? Seriously? We as a society can't even come to agreement on that basic fact anymore?

If someone shoots a gun in a crowd is that too an involuntary bodily function? Is the gun not just part of their body? Are you confused by that as well? Where do we draw the limits on what is the human body? Who decides that? If I lay on the ground does the whole earth become my body?

RobotToaster · 2026-03-21T18:57:34 1774119454

Not harsh enough. They belong in the special level of hell reserved for child molesters and people who talk in the theatre.

chisel192 · 2026-03-21T18:06:34 1774116394

> During flights? Sounds a bit harsh.

Sounds harsh to you.

Let the market decide.

Vote with your wallet and fly a different airline.

saint11 · 2026-03-21T18:11:51 1774116711

But kicking someone off mid-flight at high altitude is still a bit harsh. I hope they give them parachutes at least.

dguest · 2026-03-21T19:09:01 1774120141

FUN FACT: Aviation rules require that any plane carrying a parachute must have at least one for every person on board. Hopefully the reason is obvious.

Now given that, do you really want to pay the extra cost of flying with 300 parachutes just so mr-full-volume-phone can have one?

3eb7988a1663 · 2026-03-21T19:56:22 1774122982

That is an incredibly fun fact. Does this only apply to commercial or also a little Cessna? Presumably there is no actual enforcement on the private planes.

dguest · 2026-03-22T08:47:57 1774169277

I made it too fun: what I said was at best an over-generalization. The actual rules [1] apply to aerobatics and say that parachutes are required for everyone when a non-crew passenger is on the plane:

    Unless each occupant of the aircraft is wearing an approved parachute, no pilot of a civil aircraft carrying any person (other than a crewmember) may execute any intentional [acrobatic] maneuver...

So without the passenger no one needs a parachute, with them everyone does.

It's perfectly legal for a 787 to carry a few parachutes just for the full-volume passengers.

[1]: https://faraim.org/faa/far/cfr/title-14/part-91/section-91.3...

jjmarr · 2026-03-21T19:26:28 1774121188

I've packed my own parachute for this hypothetical situation.

HPsquared · 2026-03-21T18:22:19 1774117339

Only if they paid extra at check-in.

doubled112 · 2026-03-21T18:26:09 1774117569

And you specifically have to request it. It isn’t a normal option during purchase.

vel0city · 2026-03-21T19:57:17 1774123037

Nah, with how ticketing is these days, they'll bug you a dozen times to choose between the $50 basic economy disaster package that only has the mask and 50% airflow, or the full package for $100 that includes another 25% airflow and a flotation device. Business executive gets you the parachute, a private life raft, and a few days of MREs for $250.

gumby271 · 2026-03-21T18:12:33 1774116753

Bet it won't happen twice though.

MPSimmons · 2026-03-21T18:35:41 1774118141

> give them parachutes at least

the first time

andrewflnr · 2026-03-21T18:29:23 1774117763

I'm going to vote with my wallet by moving United up my priority list.

integralid · 2026-03-21T18:12:54 1774116774

Either you missed the joke or I missed your sarcasm. I read GP as a joke: being literally kicked out of a flight in the air is a death sentence, which is a bit of a harsh penalty indeed.