Hacker News | programjames's comments

I mean, everyone is still using variational autoencoders for their latent flow models instead of the information bottleneck. It's because it's cheaper (in founder time) to raise 10(0)x more money than to design your own algorithms and architectures for a novel idea that might work in theory, but could be a dead end six months down the line. Just look at LiquidAI. Brilliant idea, but it took them ~5 years to do all the research and another to get their first models to market... which don't yet seem to be any better than models with a similar compute requirement. I find it pretty plausible that none of the "big" LLM companies seriously tried SSMs, because they already have more than enough money to throw at transformers, or took a quick path to a big valuation.

Maybe for an Australian billion? But in American English it would be $100 million.


In both Australian and American English a billion is 1,000,000,000 (one thousand million).

I find it pretty plausible they got an 80% speedup just by making optimized kernels for everything. Even when GPUs say they're being 100% utilized, there are so many improvements to be made, like:

- Carefully interleaving shared memory loading with computation, and the whole kernel with global memory loading.

- Warp shuffling for softmax.

- Avoiding memory access conflicts in matrix multiplication.

I'm sure the guys at ClosedAI have many more optimizations they've implemented ;). They're probably eventually going to design their own chips or use photonic chips for lower energy costs, but there's still a lot of gains to be made in the software.
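
To make the "warp shuffling for softmax" item concrete, here is a minimal pure-Python simulation of the access pattern, not a real kernel. It assumes one value per lane in a 32-lane warp; on an actual GPU the partner exchange would be done with CUDA's `__shfl_xor_sync` intrinsic instead of list indexing. The helper names `butterfly_reduce` and `warp_softmax` are made up for this sketch.

```python
import math

WARP_SIZE = 32  # lanes per warp on current NVIDIA GPUs


def butterfly_reduce(lanes, combine):
    """All-reduce across one simulated warp using XOR-partner exchanges.

    After log2(WARP_SIZE) rounds every lane holds the combined value.
    In the real GPU version this needs no shared-memory traffic.
    """
    values = list(lanes)
    offset = WARP_SIZE // 2
    while offset >= 1:
        # Each lane reads its partner's register (lane index XOR offset).
        values = [combine(values[i], values[i ^ offset]) for i in range(WARP_SIZE)]
        offset //= 2
    return values


def warp_softmax(row):
    """Numerically stable softmax over exactly WARP_SIZE values, one per lane."""
    assert len(row) == WARP_SIZE
    row_max = butterfly_reduce(row, max)  # every lane now holds the max
    exps = [math.exp(x - m) for x, m in zip(row, row_max)]
    totals = butterfly_reduce(exps, lambda a, b: a + b)  # every lane holds the sum
    return [e / t for e, t in zip(exps, totals)]
```

Two reductions (max, then sum) each take five exchange rounds for 32 lanes, which is why the shuffle version can skip the shared-memory round trip that a naive reduction would use.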


Yes, I agree that it is very plausible. But it's unclear whether it is more of a business decision or a real downstream effect of engineering optimizations (which I assume are happening every day at OA).


Just from the title I knew Keenan Crane was the advisor. He has some really good YouTube video explanations for most of the concepts:

https://www.youtube.com/watch?v=bZbuKOxH71o

This work is one of those things that feels like the completely obviously right way to do things in retrospect, and why hasn't anyone implemented this before? It helps that the authors explain it very intuitively, slowly building up the tools to run like stars.

Something I don't see mentioned is that this would be pretty useful for training a physics-based model, specifically a neural ODE. Since the mean squared error scales as O(1/N), to converge you need O(1/Loss) evaluations per point. If you were using a grid approach, the cell size would need to be O(sqrt(Loss)) in size, and the running time couldn't be better than O(1/(Loss * log(Loss))) in two dimensions. To be fair, it probably takes O(1/log(Loss)) time for each Monte Carlo simulation, so it's no worse to use the grid. But, if you go up to three or more dimensions, this method still has the same running time while the best grid method takes at least O(# grid cells) = O(1/Loss^(d/2)) time.
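
For a feel of what a grid-free Monte Carlo solve at a single point looks like, here is a rough sketch of the classic walk-on-spheres estimator for Laplace's equation on the unit disk. This is a simpler stand-in, not the method from the work being discussed, and the function names, the `eps` stopping shell, and the choice of domain are all assumptions made for illustration.

```python
import math
import random


def walk_on_spheres(x, y, boundary_value, rng, eps=1e-3):
    """One random walk for Laplace's equation on the unit disk."""
    while True:
        r = math.hypot(x, y)
        dist = 1.0 - r  # distance from the current point to the unit circle
        if dist < eps:
            # Close enough: snap to the nearest boundary point and read g there.
            return boundary_value(x / r, y / r)
        # Jump to a uniformly random point on the largest circle inside the domain.
        theta = rng.uniform(0.0, 2.0 * math.pi)
        x += dist * math.cos(theta)
        y += dist * math.sin(theta)


def estimate(x, y, boundary_value, n_walks, seed=0):
    """Average n_walks independent walks to estimate the solution at (x, y)."""
    rng = random.Random(seed)
    total = sum(walk_on_spheres(x, y, boundary_value, rng) for _ in range(n_walks))
    return total / n_walks
```

With boundary data g(x, y) = x the exact solution is u(x, y) = x, so `estimate(0.3, 0.2, lambda bx, by: bx, n_walks=4000)` should land near 0.3. The work is per query point and never builds a grid, which is the property the dimension argument above relies on.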


I've seen people with a great deal of natural ability. I'm certainly one of them. But most never got as good at maths as me, because my father began teaching me before an age I can remember. I also never got as good as some others, with arguably less natural ability, because my father didn't know what an amazing maths education looks like either—he grew up in the middle-of-nowhere, went off to college, and realized, "I really should have learned all this stuff earlier," and did his best to do so for me, but he was learning about competition maths at the same time as me. This is not to say that competition maths make an "amazing maths education", just that competition maths coaches exist, and they know exactly how to best teach students maths, while I learned mostly through trial and error.


And, in most places, your public school has the legal obligation to provide an education for all ability levels, including people who show up to their first year with the ability to read. Even in the United States of America, where the culture is that only the bottom 20% of students deserve to learn, most school districts still have this in their bylaws (they just ignore it).


This is the same reason people don't take calculus in ninth grade, or organic chemistry in tenth. "What will you do if you run out of classes?" I don't know, learn on your own? And once you run out of learning, do your own research? You cannot simultaneously view school as a place to advance your child's learning, and also a place you need to hold off on their learning for. Pick one, and if you pick the latter, admit to yourself that they're not going to school, they're getting babysat.


I actually attended a school system which had a system in place for that --- see my post elsethread.

Unfortunately, the Mississippi State Supreme Court ruled it to be an unfair and illegal educational system which conferred undue benefits to the students able to take advantage of it and that the lack of a commensurate compensation for students who were unable to do so was manifestly inappropriate.


By "unable to take advantage of it", do you mean:

- Unable to demonstrate prerequisite knowledge, or

- Unable to go to a school further away?

I can get the latter one, which is why I think we shouldn't just have magnet schools, we should have free, government-run magnet boarding schools. Or, alternatively, how much would it really cost to provide a chauffeur service to kids who have demonstrated intelligence and need? If schools can provide personal aides to 1% of their population, I'm sure they have the budget to treat another 1% equitably.


The former.

The crux of the lawsuit as I understood it from hearing about it from letters my parents received from involved parents was that a student who was unable to learn at the accelerated pace and graduated with only a high-school diploma sued to either be allowed to continue to attend the school for 4 additional years, or to be granted funds to attend a college.

The school was the only public school in the county, and was attended by all the local residents (the student who initiated the lawsuit was one of them) and the children of the personnel of the local Air Force Base --- it was the matching DoD funding which made the school system possible.


From what you say, it sounds like a boneheaded decision to deprive students of a good education because other students aren't ready for it.

The UK excels at something similar, where they are trying to undermine private schools and even higher-level public grammar schools. This is because it's only privileged children who can afford to go there, and the outcomes are way better than public schools.

There is a term for this: "the politics of envy", where it's better to funnel everyone through the same mediocre system so that nobody can gain an advantage. This was very much the logic behind the recent law to tax private schools, and it's an idiotic principle.


If you've ever played Risk online, this becomes obvious, very quickly. The worst players to have around are not your enemies, but the stupid ones. They'll randomly block your troops from attacking mutual enemies, accidentally bait you into attacking other players ("attack red together?" . "okay sure!" . hits three of their troops, after you hit thirty), not hold bonuses but not let you hold them either, waste all their troops on a useless endeavour, feeding the game to a third player, and so many more blunders. Sometimes, you really want to work with them, because they've tried to be nice to you, much more so than the rest of the players. Often, they fail miserably and drag you down with them. I'll take a smart enemy any day. Usually smart players don't want to be your enemy, unless you've done something to provoke them (oops... it's more fun). Even then, they'll ally with you in a heartbeat if you can take a mutually beneficial action.

Something sobering to keep in mind is that the vast majority of people have the logical and mathematical capabilities of a good ten-year-old. There are ten-year-old chess grandmasters, USAJMO qualifiers, or to be less extreme, ten-year-olds that have a solid understanding of algebra and an intuition for proofs. Most people do not, and rely on heuristics or intuitions for everything. They do not even realize you can prove something, except with "it feels or seems right". It isn't because they're mentally incapable of learning the skill, it's because they never bothered to, or didn't think it was important.


> Something sobering to keep in mind is that the vast majority of people have the logical and mathematical capabilities of a good ten-year-old.

I had a period of my life where I read a lot of "you're not special" on reddit, but then I finally understood that the ability to think logically on the most basic level already makes me very special. After spending some time with people from lower social classes and genuinely trying to bond with them I became elitist. These people just don't fucking think.


The years made me realize that too, but there's also plenty of stupid people in upper-classes. And usually, contrary to the first ones, I can't escape having to deal with those.


That's true.


Does this hold true for most multiplayer online games? I don't play, but my kids could often be heard yelling "why would you do that?!?" when they had a game going.


This is really common in fighting games. Many okay players will get beat by really bad players who are really random. Better players in these games play in such a way as to mitigate effects of the randomness of their opponent.


That isn't a fair ranking. Some people are more idealist than you, and a priori want to believe others will play well. If you stick them in a lobby full of smashy noobs, they may not live long enough to update on those beliefs (within the single game). Of course, the same happens in reverse: if you go into a game assuming everyone is a smashy noob, and they're actually good at the game, you're going to end up in the losing position. The rate of mistakes changes the trembling-hand equilibria. The best players will, of course, figure out reputations for the other players as the game progresses, but calling some just 'okay' and others 'better' because they initialize the reputations to different values is not fair.

Think about this: you've probably driven a car to get from Point A to Point B before. If you existed in a society where people were constantly making mistakes, in the sense of crashing their cars several orders of magnitude more often, driving a car to get from Point A to Point B is no longer a good strategy. But it usually is a good, if not the best, move you can make right now, because people aren't making mistakes that frequently.

Here's another example: marriage (long-term relationships). Perhaps not all extra-marital affairs are mistakes, but a significant proportion of them are. If too many people are making these mistakes, leading to messy divorces, it's no longer worth it to even consider dating in the first place.
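
A toy calculation shows how the mistake rate alone can flip the best move. This is a minimal sketch with made-up stag-hunt payoffs (4 for mutual cooperation, 0 for cooperating with someone who slips, 3 for the safe move); it computes a best response against a trembling partner, not a full trembling-hand equilibrium.

```python
def expected_payoffs(tremble_rate, both_cooperate=4.0, betrayed=0.0, safe=3.0):
    """Expected payoff of each strategy when the partner intends to cooperate
    but slips into the safe move with probability tremble_rate.

    The payoff numbers are hypothetical and chosen only for illustration.
    """
    cooperate = (1.0 - tremble_rate) * both_cooperate + tremble_rate * betrayed
    return {"cooperate": cooperate, "play_safe": safe}


def best_response(tremble_rate):
    """Return the strategy with the higher expected payoff."""
    payoffs = expected_payoffs(tremble_rate)
    return max(payoffs, key=payoffs.get)
```

With these numbers, cooperating is the better reply while the partner slips less than a quarter of the time, and playing safe wins once mistakes become more frequent than that. Two players can both be reasoning correctly and still pick different moves if they start with different guesses for the mistake rate.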


"but calling some just 'okay' and others 'better' because they initialize the reputations to different values is not fair"

Why is it not fair? I was just stating how it goes. It's pretty well documented that the best players have the best defense, unless it's a game like UMVC3. The players at higher ranks tend to be better at playing defense and they just let their opponents kill themselves. They don't get blown up by wake up supers, dps, jump ins, etc. Okay players might be good at doing combos, but their game sense isn't great and they will frequently put themselves in positions to have the tables turned on them. They also don't focus on punishing mistakes and capitalizing on their defense. If you have good game sense, you can actually beat people really well with pretty mediocre execution.

Even if you play multiple matches, these okay players will lose a surprising amount to the players who play randomly and are smashy noobs.


(Note: this is Risk).

I recently played a game with higher-level players. One player was incredibly passive; their bonus would get broken every turn, and rather than doing anything to defend it (and they had enough troops to defend it), they pulled their troops into the middle of the bonus and let everyone else take turns breaking their bonus. This could work fine in a lower-level lobby—eventually the other players might get bored and start hitting each other—but not with better players. Good players realize a couple things about them:

1. They won't retaliate, so I can knock off a few of their troops at no risk to myself.

2. They won't do anything, so any fighting among the rest of the players will effectively be a troop subsidy to them.

Since they're unwilling to help anybody, no one wants to give them free troops, and since they've demonstrated an unwillingness to retaliate, there's really no risk with hitting them over and over. So, naturally, they were the next player to be eliminated.

One thing that's nice about Risk is there's very little to the mechanics. There aren't combos to practice or build orders to memorize, mostly just an understanding of what other people want and how to negotiate. Pretty much everyone rated intermediate and above has the mechanics down: how to move troops around to not block each other, how to choke out other players, what moves are game ending for you or another player, and so on. The thing that sets apart intermediates, experts, masters, and grandmasters is almost entirely their ability to work with other players. However, since many (most?) players are "smashy noobs", lots of people rise up the ranks by just assuming everyone else is a smashy noob, and playing extremely passively to compensate. It works, until they end up in a lobby full of masters.


> That means that you can't pipe these things into each other, unless you can build an equivalently fast and efficient light->charge transducer (i.e. a photodetector).

These exist:

https://ultrafast.mit.edu/


There is, and people have trained purely optical neural networks:

https://arxiv.org/abs/2208.01623

The real issue is trying to backpropagate those nonlinear optics. You need a second nonlinear optical component that matches the derivative of the first nonlinear optical component. In the paper above, they approximate the derivative by slightly changing the parameters, but that means the training time scales linearly with the number of parameters in each layer.
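
The linear scaling is easy to see in a generic finite-difference sketch. The code below is an assumption about the general technique (central differences, one parameter perturbed at a time), not a reproduction of the paper's exact scheme, and `finite_difference_gradient` is a name invented here.

```python
def finite_difference_gradient(loss, params, step=1e-5):
    """Central-difference gradient estimate.

    Returns (gradient, loss_evaluations). Each parameter needs its own pair
    of perturbed forward passes, so the cost grows linearly with len(params).
    """
    gradient = []
    evaluations = 0
    for i in range(len(params)):
        up = list(params)
        down = list(params)
        up[i] += step
        down[i] -= step
        gradient.append((loss(up) - loss(down)) / (2.0 * step))
        evaluations += 2
    return gradient, evaluations
```

For a layer with N tunable parameters this costs 2N forward passes per gradient, whereas backpropagation gets the whole gradient from roughly one forward and one backward pass.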

Note: the authors claim it takes O(sqrt N) time, but they're forgetting that the learning rate mu = o(1/sqrt N) if you want to converge to a minimum:

    Loss(theta + dtheta) = Loss(theta) + dtheta * dLoss(theta) + O(dtheta^2)
                         = Loss(theta) + mu * sqrtN * C (assuming Lipschitz continuous)
    ==>     min(Loss)    = mu * sqrtN * C/2

