As somebody from the U.S., I'm struck by how much people from the U.K. value a strong background in theory. At Google my experience has been "what matters is how smart you are, whether you understand the problem we're trying to solve, and whether you have creative solutions". When I applied to DeepMind some time ago, I was grilled, rapid-fire, with a hundred questions covering the breadth of a rigorous undergraduate education in linear algebra, stats, ML, calculus, etc. They seemed content to measure my intelligence by seeing how rapidly and deeply I had assimilated standard courses, rather than by seeing how I approached a problem I'd never seen before.
This guy is obviously talented, but also he comes from a tradition of optimizing for this kind of academic culture. You would be similarly weirded out by the leetcode fetish in tech if that's not what you were used to. I think that's what many commenters are missing.
I've noticed that cultural difference too. I think there are things to take away from both approaches. Extremely knowledgeable people with a lot of background in theory should get better at understanding and creatively applying that knowledge to new problems. And people who are great at problem solving should learn more theory instead of expecting themselves to just materialize the best solution out of thin air.
If anyone's reading over this and feels "Gosh, I'll never be an ML dev; this is way too much": I don't know most of that list, and still manage to be a productive researcher. I learn what I need as I go.
That's probably the optimal strategy. I'm skeptical of first-principles learning. It's great to immerse yourself in theory, but when you've gone all the way to "topology" you've probably gone beyond the limit of what most ML devs care about on a day-to-day basis.
It's still useful to know. I've applied lots of ideas from other fields. But can you acquire that knowledge by forcing yourself to study other fields? Maybe. We all have a finite amount of time, though.
That said, lots of items on this list are key, and it'd be worth ranking them. There's no need to memorize formulas for Adam, but knowing the concept of momentum-per-weight is pretty crucial.
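For readers who haven't seen it spelled out: "momentum-per-weight" just means Adam keeps separate running averages for every parameter. A minimal sketch, not the paper's exact presentation (variable names are mine):

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One simplified Adam update: every weight has its own first moment m
    (momentum) and second moment v (scale), updated elementwise."""
    m = b1 * m + (1 - b1) * grad        # per-weight momentum
    v = b2 * v + (1 - b2) * grad ** 2   # per-weight squared-gradient average
    m_hat = m / (1 - b1 ** t)           # bias correction, step counter t >= 1
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```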
The author and you seem to be talking to different audiences: you're talking about ML engineering, and OP is talking about ML researchers.
Researchers absolutely need to know a lot, not necessarily all the way to topology or w/e but definitely the underlying mathematical principles in order to advance the field (IMO).
There's significant overlap between ML research and ML dev. If I can do it without most of that list, it should give people here some hope of joining the field without needing to immerse themselves in theory.
I'm an ML researcher, and I'm with sillysaurusx on this. I actually know most of the things on the list, but only because my research is mainly in model compression and computational efficiency. Recently I've been interested in adapting diffusion models to generate music (in the raw audio domain), and I'd say only 5 out of the 18 bullet points in the ML section are relevant; the rest ranges from "nice to know" to "irrelevant".
In my field we’d never hire someone with a PhD at entry level; we’d move right past them. Maybe it’s because none of us have PhDs, so we think of you guys as gods who should be above us, whom we don’t deserve and can’t pay enough. Maybe the same thing is happening?
I don't have a PhD, and I'm actually not at the same level as someone who does. Just knowing the things listed in that post doesn't get you the degree. I'm also circumspect about what I mention in an application for that reason.
What you need to know depends largely on what you are working on, and that also holds for a successful Ph.D. candidate. Of course, it is good advice to be open-minded beyond one's narrow field of inquiry, as the OP suggests, but a long list of maths topics may not help beginners much.
The Ph.D. period is when you have some time available to acquire additional skills, and my advice is: try to strengthen those aspects of your education where you currently have the most glaring deficits. For instance, take a statistics course if that is your weak spot, or learn a foreign language if you have evaded that topic so far in your educational journey.
Read the main textbooks of your field and read and re-read ALL relevant papers for your actual Ph.D. topic, once you've been able to identify it (which may well take you most or all of your first year). Take detailed notes, because nobody can remember that much highly concentrated, advanced material. Try to find gaps: ask questions and find out whether people have tried to answer them yet.
Interact with others, e.g. at conferences; it helps to have met some people there already by attending and networking in a prior year.
> Read the main textbooks of your field and read and re-read ALL relevant papers for your actual Ph.D. topic, once you've been able to identify
I'm gonna be real as someone in grad school. Basically no PhDs, grad students, or professors I know read full textbooks. I hear these ideas a lot, and they often sound like one of those Instagram influencer diets: completely unreasonable if you have any constraints in your life, and mostly used to gatekeep who counts as a "real" scientist. Be well studied and knowledgeable about your problem domain, but you have a finite lifespan, so never feel bad that you aren't "educated enough".
Aha, I have probably read that sentence literally hundreds of times, and never spotted the typo. I will never be able to unsee that.
Geron is good but now a bit out-of-date. No transformers (just CNNs/RNNs/etc.) and the coding component is all in scikit-learn and TensorFlow (rather than PyTorch or JAX).
This was a really fun list to read through. I agree with the author that knowing as much as possible about how things work, not just what things do, is extremely useful.
However, I think "Just know stuff" is a secondary requirement (although an important one) for being successful. People really struggle to "just know stuff" if they aren't interested in the subject in the first place. People who aren't interested will settle for knowing only what things do, and won't dive into how they work.
I am interested in this stuff, and actually self-taught (bachelor's in mathematics here). I have experience with a lot of the stuff on this list, not through work or academia, or because I want to make money, but through fiddling on my desktop at home. I too got the "coveted tech job" as a machine learning engineer, but I never would have if I wasn't legitimately interested in this stuff, studying for fun in my spare time. I have seen lots of people fail to progress in this field because they _don't care_, they just want a good job.
Kind of off topic, but this is actually an integral part of my interviewing process. We give candidates a simple dataset to model, and we receive their script. The performance on the hold-out set is only weighted ~20%. The candidate's ability to talk about their process, about the internals of the model they used, about the feature engineering quirks to work around model limitations, about their parameter tuning scheme - these conversations reveal how much someone is actually interested in the field, and they're a great indicator of whether or not someone is going to be a good contributor to the team. I've had candidates who couldn't tell me _anything_ about how the models they used actually worked. PhDs included!
I just wrapped up a machine learning PhD at Caltech (now doing a postdoc in ML at Berkeley) and I disagree strongly with this article. What matters isn't knowing a bunch of random stuff, but rather writing/speaking skills, a willingness to learn new things, perseverance in the face of setbacks, creativity, having enough EQ to navigate the advisor-advisee relationship and departmental politics, and most of all, an ability to follow through and actually get things done. These "intangible" skills are far more important than having any specific knowledge.
This. The article is meh in terms of advice, and reads more like self-promotion (which apparently he is good at, and that is something to learn from). Maybe it somehow works for him and that's fine, but it's no more useful than a table of contents.
But what do you need specifically for a PhD? I argue that "knowing stuff" is what is necessary -- and that indeed it's essentially the purpose of the whole academic institution.
A Ph.D. isn't about knowing things, it's about doing research. Recipients of the degree are given the title "doctor" not for what they know, but because they are first and foremost teachers of knowledge (from the Latin docere, meaning to teach).
"Knowing stuff" is enough to get you to the point where you can formulate a good research question, but being a good researcher is a much broader skillset than just knowing things. And consequently, just knowing things won't get you a Ph.D., because getting one requires you to talk about your research. A lot. Like, all the time. And after all, the last step in getting the Ph.D. is called a "defense", not an "examination", because you are not there to tell anyone what you know -- you are there to defend what you know, and that's a different skillset than gaining knowledge or even disseminating knowledge.
I guess all that is to say, you could know everything in the world, but if you lack the skills to tell anyone else about that knowledge, you'll never get a Ph.D.
> Like, if this person cant say "Look at me, I am UNUSUALLY INTELLIGENT!" then who can?!
No one, that's my point. If academia has not taught the author that his intelligence isn't unusual, the workforce of his new employer certainly will. Listing GitHub stars and Twitter followers in the second paragraph as an achievement suggests, to me, a lack of maturity and a need for external validation. On the bright side, being good at self-promotion when entering a mega-corporation will make sure he has a great career in front of him.
I see, you read it as a form of self-aggrandizement. I think your reading was wrong (in addition to being unkind), but I wouldn't disagree with the core value under discussion.
I think a fair test to apply is whether or not the thing is a fact - if it is a fact, and you think it is impressive, that might say more about you than about the author. I feel like otherwise you are asking people to self-censor too strongly, for fear of betraying some perverse sense of modesty that does not allow for anyone to have done anything worth noticing at all.
I think your post combined with the responses to it are a great example of how messages become stripped of all nuance, even by otherwise very educated/intelligent people. A lot of the critique here ignores the nuance and specificity you give just in the opening paragraphs.
"Just know stuff" deeply resonates. Even much below the author's level, as is the case for me, technical knowledge dominates everything else by orders of magnitude. I'm still appalled at how people can manage to gather the courage to utter they don't need math.
A great list, too. I guess I have stuff to brush up on for the next 10 to 20 years?
> I'm still appalled at how people can manage to gather the courage to utter they don't need math.
I try to (re-)learn math periodically because I feel like I should, like how one ought to eat one's vegetables and one ought to exercise, but extrinsic motivation is the thing that's lacking. I usually end up on recreational math puzzles or something, before dropping it, since at least those are fun. Probably made three or four cracks at it in a decade, each has gone the same way.
I literally don't know what I'd use most of it for—I can't find that need, even when I try. I'm sure I could make something up just to have an excuse to apply what I was learning, but... why? Probably there are other jobs I could find where knowledge of math was absolutely key, but... why? I've been paid to write code for about 23 years, my pay's great, and I've repeatedly gained a reputation at companies for being the guy to go to for tough or low-level problems. But if you gave me an intro to linear algebra final or a calculus 1 final today, I'd be lucky to score 25% on either (hell, I'd be lucky to get anything right on the calc final, but maybe there'd be a couple of easy questions at the beginning—I've never, not once, applied anything I learned in calc, for any purpose, so what little I knew about it to begin with is long gone).
It’s one of those things where if you don’t have the math knowledge, the opportunities to apply it will be literally invisible to you.
If all you have is a hammer, everything looks like a nail—but the converse of that is that if you have never seen a hammer, nails will be invisible and incomprehensible to you, they will just blend into the background of noise.
When I learn about something, I suddenly see it everywhere.
Hypothetically, if you wanted to learn it, I would recommend devoting the first hour of every day to it. Before your brain fully wakes up and starts asking why you are doing it, just do it.
I assume you have the standard CS background, with decent knowledge of discrete math, calculus, linear algebra. I would recommend starting with a graduate-level linear algebra textbook. A friend of mine used to say “linear algebra is the new addition”, it permeates everything, and knowing it well produces massive dividends.
Why should I think I'll find uses this time, when I already burned time and money learning (some of) it once, and lost all that precisely because I never encountered any need for it? An hour a day is a huge time investment for something that's already failed to prove its worth once. Like I'm sure I could go find some jobs that I can't get now because I lack math skills, and I'm sure some (far from all!) of those pay better than what I make now, but... like... that's equally true of jobs that require better business or speaking skills, and unlike linear algebra I can easily point to ways those skills could be beneficial in everyday life and in my existing job. Where's the corresponding immediately-useful benefit for lin. alg.?
Clearly there are people that don’t need to know math. You happen to be one of them, congratulations. Though I know I’d be bored out of my mind if I did software work that only used high school level math and logic.
Sure, some people need it, I was just responding to:
> I'm still appalled at how people can manage to gather the courage to utter they don't need math.
When... well, the vast majority of people really don't. They promptly forget almost everything back to about 6th grade math, shortly after finishing formal education, because they truly never need it, so that knowledge and those skills quickly rust.
If these people in fact could make great use of it, such that it's "appalling" that they don't think they need it, then that's probably what school should focus on teaching, at least for non-math-majors. Laser-focus on application in everyday life. Especially in K-12. If it's actually useful and people are being forced to spend hundreds to thousands of hours learning junior high, high school, and college math, but then losing most of it because they never see any use for it, that's a tremendous failing of curriculum that should be addressed as directly as possible. If such a program wouldn't succeed because it's actually true that most people don't really need most of that math for anything, then we ought not be "appalled" at their correctly assessing that truth.
I see what you're saying. I don't think it's a failure of the curriculum to teach people things when they are young that they don't end up using. Certainly a 13-year-old isn't going to know what their future career path/interests will be (some do, but most don't), and we shouldn't let them shut doors down the road at such a young age. I think at this point high school has devolved to just giving everyone the basic broad skills so that they could feasibly succeed at any college major. The seniors who have already decided they just want to build houses all day can complain during math class (we all heard it: "when will I ever use this?"), but the problem is that the answer is not "never" and it's not "always", it's "we don't know, but you may need it, and closing those doors now will limit your future potential".
> I try to (re-)learn math periodically because I feel like I should, like how one ought to eat one's vegetables and one ought to exercise, but extrinsic motivation is the thing that's lacking.
Personally, if it were “just relearn calculus and move on from there” I think I’d be able to motivate myself. At that point I’m still pretty close to the problems and tasks that interest me. But the reality is I spent much of middle school and high school wasting my time doing things other than math, so in reality I’d have to go much further back and relearn all the prerequisite stuff, and the prerequisites to that, etc. By that point I’m so divorced from the reason I wanted to relearn math in the first place that I just get bored and give up.
> Please, please: learn some probability via measure theory. You’ll start reading machine learning papers wondering how people ever express themselves precisely without it. The entire field seems to be predicated around writing things like x ~ p_\theta(x|z=q_\phi(x)) as if that’s somehow meaningful notation.
Hear hear! How did ML get saddled with such awful notation?
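For what it's worth, here is my guess at what that shorthand is abbreviating, assuming the usual convention that q_\phi is an encoder and p_\theta a decoder -- which is precisely the sort of guesswork the quote is complaining about:

```latex
% One plausible unpacking of  x ~ p_\theta(x | z = q_\phi(x)):
\begin{align*}
  z       &= q_\phi(x)
    && \text{(or } z \sim q_\phi(\cdot \mid x) \text{ if the encoder is stochastic)} \\
  \hat{x} &\sim p_\theta(\cdot \mid z)
    && \text{(sample a reconstruction from the decoder)}
\end{align*}
% The same symbol x stands for both the conditioning data and the sampled
% output -- exactly the overloading that a measure-theoretic treatment
% (random variables as measurable functions, laws as pushforward measures)
% forces you to keep separate.
```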
How many papers with awful notation are actually the reverse engineering of some (barely) working code, cobbled together from random libraries and coefficients?
Thinking about this, I'd also be interested to hear what the author learned and didn't find useful over his PhD. Is this a list of most of what he ended up learning (which could potentially then have a lot of confirmation bias in it), or is it curated from the maze of blind alleys he went down?
So one thing I learned a lot of in my PhD (for literally a whole year), that I literally never needed, was functional analytic methods for PDEs. Stuff like Moser iterations / the De Giorgi-Nash-Moser theorem, etc.
The finer details of Turing machines have never really helped me, although in my case that's probably the exception as I imagine that's still pretty important.
On a more ML note, I have literally never needed SVMs. (And hope I never get asked about them, I've forgotten everything about them haha.)
I think there's a lot of other stuff I could add to the "just-don't-know-stuff" list!
(And to answer your last question: this list is curated, and based on the criteria of (a) is it useful, and (b) is it widely applicable.)
This list is great for moving things from the "unknown unknown" (U-U) to the "known unknown" (K-U) bucket. It's relatively easy to move the things from the K-U to the K-K bucket just by virtue of knowing enough search terms and places to start.
I think all of the topics in TFA's list could come into play at some point (I have explored something to do with the majority of these concepts during my work in private research), and it is important to know how compiler optimizations are done, e.g. in XLA, along with Jacobian accumulation techniques, in order to design fast models. I don't think it matters that you don't master all of them, but being able to quickly spin back up to comprehension after a relatively brief refresher is pretty important when it comes to algorithm design and prototyping.
Knowing things is good, but I think the real benefit of a PhD is in developing the skill to learn new things quickly and fill in any gaps as needed. It’s important to get practice learning many things in depth, but the goal isn’t to become a storehouse of knowledge; it’s to develop the ability to assimilate existing knowledge and apply it to figuring new things out.
The amount of topics I’ve studied in depth dwarfs this list (and the same is certainly true of its author) but the set of things I could teach a class on today without preparation is much smaller. The important thing is that if I have a problem, I can use the impressions from all I’ve learned before to get a sense of where to look next. My memory isn’t great but it doesn’t matter because I can refresh, learn, and figure things out as needed.
It's hard to be sure it really happened this way, but I feel like my PhD began more with a research challenge and then went top-down into learning what I needed in order to think about the challenge, and then worked back up to an academic framing of how the fundamental theory was advanced.
We also had comprehensive exams, so you're forced to know the overall theory of your discipline as part of the rigor of the program.
I personally like the challenge approach to research, it's like what companies call their "north star" sometimes. It's not that you're working directly on that problem necessarily, it's that you're identifying what would have to be true for that problem to be solvable and working on some of those things
Can confirm, the way ML is used in many businesses is like an egg drop: You open up Jupyter, load in some data, and play around with various models until you find one that fits the data, then use it to try to predict future data. If the future results comport with the model, congratulations: your egg is safe, at least for drops of that height.
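A caricature of that workflow, in case anyone hasn't lived it (the dataset, column name, and model choices are placeholder assumptions on my part):

```python
# The "egg drop": load data, try models until one fits the hold-out set, ship it.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

df = pd.read_csv("some_business_data.csv")            # hypothetical file
X, y = df.drop(columns=["target"]), df["target"]      # hypothetical target column
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

candidates = [Ridge(), RandomForestRegressor(), GradientBoostingRegressor()]
best = max(candidates, key=lambda m: m.fit(X_tr, y_tr).score(X_te, y_te))
print(best, best.score(X_te, y_te))   # looks good? then the egg survived this drop
```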
I keep searching how to be a good researcher but I haven't found any answers.
I know almost all of this except for weirdly arbitrary/specific stuff (you really don't need to know Haskell for an ML PhD). It's all pretty basic, half of it you learn during any CS Masters degree, the other half are models and concepts that have been popular enough in the last few years that you would have read the papers and possibly implemented them if you keep up with the field.
This hasn't helped me at all, my PhD has gone horribly and I'm not entirely sure why. The first year I wasted on an EEG-related review paper before realizing that EEG data is garbage, the second year I wasted on an original method that I wanted to pursue but it didn't perform well and got scooped. The third year (which ended up being the fourth year because of severe health issues) I was burnt-out and produced nothing of value.
I don't know how to produce good original research. I know how to program like nobody's business, better than any of my lab colleagues but this has done nothing to help me. I can implement a method in a night, I can implement tons of ideas in a year but if they don't perform better on the important metrics what's the point? I know the math and the ML side of things but this hasn't given me any insights, usually when I have an idea I realize it's already been done while doing the literature review.
It's all a big mystery to me, I think maybe the pressure of holding an industry job and having to finish the PhD before my funding ran out prevented me from experimenting freely but it might just be a convenient excuse.
I'm sorry to hear that things didn't go so well for you!
FWIW being able to "program like nobody's business" is still really really valuable. It's why I dedicate such a large chunk of the post to software dev skills. :)
For example, another commenter mentions low-latency concerns in finance, and that's something I have zero experience with.
HPC has often also meant writing a lot of C++ to do e.g. molecular dynamics or something.
These days, I consider myself HPC-adjacent -- I write scientific ML software, often for use on pretty beefy hardware (TPU pods etc.) So at least for that, here's an off-the-cuff list of a few items that come to mind:
- Know JAX. Really, really well: its internals, how its transforms work. It's definitely a bit bumpy in places, but it's still one of the best things we have for easily scaling programs, e.g. through `jax.pmap` (a minimal sketch follows this list), being able to test on CPU and then run on TPU, etc.
- Triton! New(-ish) kid on the block for GPU programming.
- How CPUs work: L1/L2/L3 caches, branch prediction, etc. Parallelism via OpenMP.
- Know autodiff well. E.g. have a read of the Dex paper, and the concerns with doing autodiff through index operations. Modern scientific computing is moving towards a ubiquitously autodifferentiable future.
- ... plus loads more, haha. Probably I'm still missing ten different things that another reader considers crucial.
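To make the JAX item above concrete, here's a minimal `jax.pmap` sketch of the data-parallel pattern I mean; a toy loss and toy shapes, nothing from a real codebase:

```python
import functools
import jax
import jax.numpy as jnp

def loss(params, x, y):
    # toy linear-regression loss
    return jnp.mean((x @ params - y) ** 2)

@functools.partial(jax.pmap, axis_name="devices")
def update(params, x, y):
    grads = jax.grad(loss)(params, x, y)
    grads = jax.lax.pmean(grads, axis_name="devices")  # average grads across devices
    return params - 0.1 * grads

n = jax.local_device_count()
params = jnp.zeros((n, 8))          # same params replicated on each device
x = jnp.ones((n, 32, 8))            # a different shard of data per device
y = jnp.ones((n, 32))
params = update(params, x, y)       # one synchronized SGD step on all devices
```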
Depends! HPC is a massive umbrella term. Which part of it are you interested in?
For finance, firewall-to-firewall time was an important concept. (The time it takes a signal to get into your datacenter, be processed, then emit a signal back out.)
But e.g. massive data processing is an entirely different beast. Latency isn't too important, whereas parallelization is crucial.
So it's kind of hard to make a list without being pointed in a vague direction.
I know most of the stuff listed there and do not have a published textbook or papers. It sounds like the author achieved success because they work hard on things they’re passionate about. Knowledge and published works are a byproduct of that.
I think the point is that knowing this stuff makes it easier to write a paper, because you don't run out of breath as easily. It's not a guarantee of a good paper and PhD; that part is up to you.
Maybe you too could be a successful PhD student? :) At least you know the basics!
Half of the items, and almost all in the "Mathematics" section, you would have learned during your BSc/MSc (if it's in applied math or physics, and Python programming is your hobby). I can't find the author's CV, but there are five years between him handing in his MSc thesis in 2017 and his PhD thesis in 2022. Maybe he studied ML for all those five years, or maybe he took two years off to travel the world, who knows.
You can learn a lot when you’re in your early 20s, single, ambitious, and getting a stipend from a university to do nothing but study for 16 hours a day.
I was about to comment on the short PhD in "2-and-a-bit years" -- and how the UK expectation for PhD program duration is so different from in the US.
And it wasn't to say that he's a genius. It's that the program is planned to be shorter, and in fact, there's less funding available to go longer even if you wanted to.
I have learned many of these things at some point. Unfortunately, I have also forgotten most of it. Some of it I still remember to some degree, or just the basic idea, and can probably reconstruct it given some time, or easily look it up and remember. However, many of the things I have completely forgotten, and it would take more time to understand it again.
I am not sure I can keep so many things active in my memory, plus everything else, depending on what I have been working on in recent years. If my work in the last few years does not need some of this knowledge, I keep forgetting it.
Maybe my memory is just bad.
I think many of the things listed here are somewhat specific to what the author has worked on, so that it's easier for him to really know all this.
Maybe the point of this post is also more generic: Try to have an active memory over a diverse set of fields.
Haha, so I actually have an atrocious memory! Famously so amongst my friends, I never remember what we've discussed.
When I wrote this list I certainly wasn't expecting/recommending all of this to stay in the reader's head forever.
Rather: if you've worked with something deeply at one time, then -- even if you've forgotten the details -- you still can pattern-match on it later. And then look up whatever you've forgotten!
Maybe I should start a PhD in ML as a university dropout; it sounds like a list of basic stuff taught within the first two years of any math/CS program, but I might be missing something, as the list is far from detailed...
I think it's hard to get those basics in an undergrad program.
CS often has the problem that the math basics are not taught as rigorously as they should be (I think calculus I/II and linear algebra I/II should be exactly the same lectures as for math students). Then it misses measure theory (usually in calc III), and so you are going to miss solid fundamentals in probability. Optimization is also usually absent; it's sometimes squeezed into the numerical math lecture, but it shouldn't be, since numerics (or better, scientific computing) is so important. You also want some statistics course.
Math usually lacks the whole machine learning canon, from SVMs to NNs, from Bayesian methods to statistical learning theory. I just see the same statistics lecture everywhere, building upon probability and deriving good estimators and their properties, and that's all. Also, you might miss CS basics like algorithms and struggle with the basics of scientific computing (programming in C, knowing all your matrix decompositions, etc.). Also, often no knowledge of git or of how to navigate a server with just a shell.
I think you will either have to learn the missing parts after your undergraduate degree, for example during your master's or the first year of your PhD, or be really lucky! I think most people are missing a significant chunk of these topics after completing their undergraduate.
I'd argue that going cross-field is a good avenue for a PhD in CS. Work in collaboration with one of the sciences. There are plenty of papers to be had in application. Not everything in ML research needs to be algorithms. You'll likely find places where the models break down and need algo work too.
People have been saying this for as long as I've been hanging around ML forums; meanwhile, the professors I work with can't keep up with industry demand for ML and ML-adjacent roles and have to open up new classes.
Sounds a lot like my dad telling me to go to law school instead of getting into CS. He thought the ultimate goal of CS is to make everyone in the field obsolete... well, maybe he was right in the end.
Trendy fields often have lots of money in them in both the short term (grants, associateships, funding) and the medium term (first job prospects), as well as benefits like preferential appearance in journals (if two equally insightful articles are up for one spot, would you rather take the article on the topic many are interested in, or the article from a much more niche area?) and possibly more low-hanging fruit, since there is so much more work to piggyback on/respond to/extend.
I think every significant model has come from industry. Us postdocs just tweak hyperparameters and publish answers to questions about them that no one is ever likely to ask.
It's like the "will it blend" series, but for machine learning
While this surely seems required for a PhD program at one of the top universities, more down-to-earth programs would require only about 1/4-1/2 of what's in the list, depending on the area of research. For instance, graph neural networks are very niche.
And if you just want to do research around current SoTA, that would be more like 1/8th.
I am going to disagree with you there. I have physical copies of a couple of books I recommend, and making them easily referenced can be a bit of a pain - not difficult, but maybe not worth the effort if I were compiling a list of topics to serve as my general answer to a question I receive frequently.
Ok, where is your personal list of important topics to understand <field>? I'd like to check that you properly referenced every recommendation.
I think my real gripe here is the implicit demand for more free labor from someone else. We're talking references here; we both know it's not a huge deal if you already have your reference manager fired up, but in the context of this thread the note of 'needs references' feels like a demand, like some kind of obviously missing thing, vs a neutral observation that it would make the thing better.
Kinda like if someone makes a cake and when you get a slice all you say is 'needs strawberries'. Maybe y'aint wrong, still rude though.
Linear algebra is important! Check out Linear Algebra Done Right by Axler.
If the author can’t do this minimal extra step then I question why they’re writing their post since they’re either unqualified or don’t really care about helping people.