TheDudeMan's comments | Hacker News

How fast is it if you write a for loop and keep track of the index and value of the smallest element yourself (possibly treating them as ints)?


I'd hazard a guess that it would be the same, because the compiler would produce a loop out of .iter(), would expose the loop index via .enumerate(), and would keep track of that index in .min_by(). I suppose the closure would be inlined, maybe even along with the comparisons.
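
For concreteness, a minimal sketch of the two variants being compared (my own illustration, not the blog's code; assumes a slice of u32 and returns None for an empty slice):

  // Explicit loop: track the index and value of the smallest element by hand.
  fn argmin_loop(xs: &[u32]) -> Option<usize> {
      let mut best_idx = 0;
      let mut best_val = *xs.first()?;
      for (i, &x) in xs.iter().enumerate().skip(1) {
          if x < best_val {
              best_val = x;
              best_idx = i;
          }
      }
      Some(best_idx)
  }

  // Iterator chain: expected to compile down to essentially the same loop.
  fn argmin_iter(xs: &[u32]) -> Option<usize> {
      xs.iter()
          .enumerate()
          .min_by(|(_, a), (_, b)| a.cmp(b))
          .map(|(i, _)| i)
  }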

I wonder whether that could be made faster by using AVX instructions; they allow finding the minimum value among several u32 values, but not immediately its index.


Even without AVX it seems possible to do better than a naive C-style for-loop argmax by manually unrolling the loop a bit and maintaining multiple accumulators.

e.g. using 4 accumulators instead of 1 in the naive for loop gives me around a 15-20% speedup (not using Rust: extremely scalar, terribly naive C code via g++ with -funroll-all-loops -march=native -O3)

if we're expressing argmax via the obvious naive C-style for loop, or a functional reduce, with a single accumulator, we're forcing a dependency chain that isn't really part of the problem. but if we don't care which index we get (if there are multiple minimal elements in the array), then instead of evaluating the reductions in a single rigid chain bound by one accumulator, we can break the chain and get our hardware to do more work in parallel, even if we're only single threaded.
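
to make that concrete, here's a rough sketch of the multiple-accumulator idea (my own illustration in Rust rather than the C I actually measured; written as argmin to match the thread, and assuming the length is a non-zero multiple of 4 to keep it short):

  // Four independent (value, index) accumulators; each lane only ever compares
  // against its own stride, so the four comparison chains are independent and
  // the CPU can overlap them.
  fn argmin_4acc(xs: &[u32]) -> usize {
      assert!(xs.len() >= 4 && xs.len() % 4 == 0); // simplification for the sketch
      let mut best = [(xs[0], 0usize), (xs[1], 1), (xs[2], 2), (xs[3], 3)];
      let mut i = 4;
      while i < xs.len() {
          for lane in 0..4 {
              if xs[i + lane] < best[lane].0 {
                  best[lane] = (xs[i + lane], i + lane);
              }
          }
          i += 4;
      }
      // Final reduction over the four accumulators (ties resolved arbitrarily).
      best.iter().min_by_key(|&&(v, _)| v).map(|&(_, idx)| idx).unwrap()
  }

whether the optimizer keeps four scalar registers or actually vectorizes the inner lane loop is up to it; the point is just that the four compare-and-update chains no longer depend on each other.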

anonymoushn is doing something much cleverer again using intrinsics, but there's still that same idea of "how do we break the dependency chain between different operations so the CPU can kick them off in parallel"


you can have some vector registers n_acc, ns, idx_acc, idxs, then you can do

  // (initialize ns and idxs by reading from the array
  //  and adding the appropriate constant to the old value of idxs.)
  n_acc = min(n_acc, ns);
  const is_new_min = eq(n_acc, ns);
  idx_acc = blend(idx_acc, idxs, is_new_min);
Edit: I wrote this with min, eq, blend, but you can actually use cmpgt, min, blend to avoid having a dependency chain through all three instructions. I'm just used to min, eq, blend because I work with unsigned values, which don't have cmpgt.

you can consult the list of toys here: https://www.intel.com/content/www/us/en/docs/intrinsics-guid...
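
if it helps, here's roughly how that maps onto real AVX2 intrinsics via Rust's std::arch (a sketch only, using the min/eq/blend ordering from above; assumes AVX2 is available, u32 elements, and a length that's a multiple of 8):

  use std::arch::x86_64::*;

  // Vectorized argmin over u32s: n_acc holds 8 running minima, idx_acc their indices.
  // Caller must ensure AVX2 is available (e.g. via is_x86_feature_detected!("avx2")).
  #[target_feature(enable = "avx2")]
  unsafe fn argmin_avx2(xs: &[u32]) -> usize {
      assert!(xs.len() >= 8 && xs.len() % 8 == 0); // simplification for the sketch
      let mut n_acc = _mm256_loadu_si256(xs.as_ptr() as *const __m256i);
      let mut idx_acc = _mm256_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7);
      let mut idxs = idx_acc;
      let step = _mm256_set1_epi32(8);
      for chunk in xs[8..].chunks_exact(8) {
          let ns = _mm256_loadu_si256(chunk.as_ptr() as *const __m256i);
          idxs = _mm256_add_epi32(idxs, step);
          n_acc = _mm256_min_epu32(n_acc, ns);                      // min
          let is_new_min = _mm256_cmpeq_epi32(n_acc, ns);           // eq
          idx_acc = _mm256_blendv_epi8(idx_acc, idxs, is_new_min);  // blend (cmpeq fills whole lanes, so byte blend is safe)
      }
      // Horizontal reduction over the 8 lanes.
      let mut vals = [0u32; 8];
      let mut inds = [0u32; 8];
      _mm256_storeu_si256(vals.as_mut_ptr() as *mut __m256i, n_acc);
      _mm256_storeu_si256(inds.as_mut_ptr() as *mut __m256i, idx_acc);
      let lane = (0..8usize).min_by_key(|&l| vals[l]).unwrap();
      inds[lane] as usize
  }

ties between lanes come back with an arbitrary index, which matches the "don't care which one" assumption above.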


Yes, this is fairly easy to write in AVX, and you can track the index as well. Honestly, the code is cleaner and nicer to read than this mildly obfuscated Rust.


You're referring to nothing and nothing. What exactly are you talking about? It certainly can't be the trivial-to-understand one-liners in the blog.


But how is that slower than sorting the list?!


You know they're getting better, right?


Once they're good enough, I'm sure they will be used somewhere.

These aren't. They're not even ten percent there. I don't get why you'd try to mass-produce and market them.

Tesla is going to have proper autonomous driving in their consumer vehicles before they make one useful humanoid robot.


That's the part I find frustrating as well. The Optimus demos I've seen show a product that is far, far, far from ready for prime time while Musk and others act like it's amazingly capable.

The recent clip posted by Marc Benioff was... painful. It took a few seconds to reply to a simple greeting. Its next bit of speech, in response to a query about where to get a Coke, has a weird moment where it seems to interrupt itself. Optimus offers to take Benioff to the kitchen to get a Coke. Optimus acknowledges Benioff's affirmative response, but just stands there. Then you hear Musk in the background muttering that Optimus is "paranoid" about the space. Benioff backs up a few feet. Optimus slowly turns, then begins shuffling forward. Is it headed to the kitchen? Who knows!

The reaction to that should not be "OMG I cannot wait to pay you $200-$500k for one of these!" It should be "You want HOW MUCH for THIS? Are you nuts?"


OK, so you do get the vision.


Like fusion power reactors are, and have been for a long time?


No, I never said that.


They saw how much money stablecoin issuers are making. Simple as that.


I interpret The Bitter Lesson as suggesting that you should be selecting methods that do not need all that data (in many domains, we don't know those methods yet).


Yes, many PMs suck. And many engineers suck. And communication is always lossy. Having many/all engineers take some calls helps to mitigate those.


I would have loved this as a kid. Walkie-talkie range and battery life made them useless for my adventures.


Interesting. You just articulated why Chandler was annoying rather than funny.


He was extremely funny when I was 17.


The Bitter Lesson is saying, if you're going to use human knowledge, be aware that your solution is temporary. It's not wrong. And it's not wrong to use human knowledge to solve your "today" problem.


System prompts enable changing a model's behavior with a simple code change. Without system prompts, changing the behavior would require some level of retraining. So they're quite practical and aren't going anywhere.


This is because coders didn't spend enough time making their tests efficient. Maybe LLM coding agents can help with that.

