
It's a combination that creates the magic. I'm a big believer that you need to spend time learning math as well as programming and computer architecture. The algorithms are affected by all of these things (this is why teams work best, but you need the right composition).

I'm a researcher and still early in my career. I'm no rockstar, but I'm definitely above average if you consider things like citations or h-index. Most of my work has been on making models more efficient, using fewer resources, mostly because of a lack of GPU access lol. My focus is more on density estimation though (generative modeling).

And to be clear, I'm not saying you need to sit and do calculations all day. But learning this math is necessary for building the intuition and being able to apply it to real-world problems.

I'll give a real-world example though. I was interning at a big company last year, and while learning their framework I was playing around with their smaller model (the big one wasn't released yet). While training it I recognized it was saturating early on, and looking at the data I immediately recognized there were generalization issues. I asked for a week to retrain the model (I only had a single V100 available despite company resources). By the end of the week I had something really promising, but I was still behind on accuracy on the internal test set. I was convinced though, because I understand what causes generalization failures and the baked-in biases of the data acquisition. My boss was not convinced, and I kept asking for other test sets and customer data. Begrudgingly, it was given to me.

I ran the tests and I 3x'd the performance, ending up neck and neck with their giant model that had tons of pretraining (a few percent behind). A dinky little ResNet beating a few-hundred-million-parameter transformer. A few hours to train vs weeks. My boss was shocked. His boss was shocked (and he was very anti-theory). I even got emails from top people asking how I did it. I said that everything I did would only work better on transformers and that we should implement it there (I have experience with similar models at similar scales). And that's the end of the story. Nothing happened. My version wasn't released to customers, nor were the additions I made to the training algorithms merged (everything was optional too, so no harm).
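As a rough illustration of the two symptoms mentioned above (this is a hypothetical sketch, not the actual diagnostics used in the story): early saturation shows up as a loss that stops improving, and a generalization problem shows up as a gap between training and validation loss. Both can be flagged from per-epoch loss histories with a few lines of plain Python. The function names, window size, and threshold here are illustrative choices, not anything from the original comment.

```python
# Hypothetical diagnostics (illustrative only): flag early saturation and
# a train/validation generalization gap from per-epoch loss histories.

def is_saturating(losses, window=3, min_delta=1e-3):
    """True if the loss improved by less than min_delta over the last `window` epochs."""
    if len(losses) <= window:
        return False
    return losses[-window - 1] - losses[-1] < min_delta

def generalization_gap(train_losses, val_losses):
    """Validation minus training loss at the last epoch; large values suggest overfitting."""
    return val_losses[-1] - train_losses[-1]

# Example: training loss keeps dropping while validation loss plateaus,
# i.e. the model is memorizing rather than generalizing.
train = [2.0, 1.2, 0.8, 0.5, 0.3, 0.2]
val = [2.1, 1.4, 1.0, 1.0, 1.0, 1.0]

print(is_saturating(val))              # validation has stalled -> True
print(is_saturating(train))            # training still improving -> False
print(generalization_gap(train, val))  # gap of 0.8 -> cause for concern
```

In practice you'd compute the same signals from whatever metric your framework logs per epoch; the point is only that both failure modes are visible from the curves well before training finishes.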

That's been pretty representative of my experience so far. I can smash some metric at a small scale, and mostly people say "but does it scale" and then do not give me the requisite compute to attempt it. I've seen this pattern with a number of people doing things like me; I'm far from alone, and I've heard the same story at least a dozen times. The truth is that to compete with these giant models you still need a lot of compute. You can definitely get the same performance with 10x and maybe even 100x fewer parameters or lower cost, but 1000x is a lot harder.

I'm more concerned that we aren't really providing good pathways to grow. Science has always worked by starting small and then scaling. Sure, a lot fails along the way, but you have to try. The problem with the GPU poor not being able to contribute to research is more gatekeeping than science. I don't think that should be controversial when you look at other comments in this thread. People say "no one knows" as if the answer is "no one can know, so don't try." That's very short-sighted. But hey, it's not like there's another post today with the exact same sentiment (you can find my comment there too): https://news.ycombinator.com/item?id=43447616



Thank you for that insightful comment! "Start small, do the research, then scale" is a pattern that's often overlooked these days. I wish you all the best for your future endeavours.


Haha, well, it's pretty hard to start big if you don't have the money lol. And thanks! I just want to see our machines get smarter and to get people to be open to trying more ideas. Until we actually have AGI, I think it's too early to say which method will definitely lead us there.



