Just know stuff (or, how to achieve success in a machine learning PhD) (kidger.site)
241 points by occamschainsaw on Jan 27, 2023 | 103 comments



As somebody from the U.S., I'm struck by how much people from the U.K. value a strong background in theory. At Google my experience has been "what matters is how smart you are, whether you understand the problem we're trying to solve, and whether you have creative solutions". When I applied to DeepMind some time ago, I was grilled, rapid-fire, with a hundred questions covering the breadth of a rigorous undergraduate education in linear algebra, stats, ML, calculus, etc. They seemed content to measure my intelligence by seeing how rapidly and deeply I had assimilated standard courses, rather than by seeing how I approached a problem I'd never seen before.

This guy is obviously talented, but also he comes from a tradition of optimizing for this kind of academic culture. You would be similarly weirded out by the leetcode fetish in tech if that's not what you were used to. I think that's what many commenters are missing.


I've noticed that cultural difference too. I think there are things to take away from both approaches. Extremely knowledgeable people with a lot of background in theory should get better at understanding and creatively applying that knowledge to new problems. And people who are great at problem solving should learn more theory instead of expecting themselves to just materialize the best solution out of thin air.


If anyone's reading over this and feels "Gosh, I'll never be an ML dev; this is way too much": I don't know most of that list, and still manage to be a productive researcher. I learn what I need as I go.

That's probably the optimal strategy. I'm skeptical of first-principles learning. It's great to immerse yourself in theory, but when you've gone all the way to "topology" you've probably gone beyond the limit of what most ML devs care about on a day-to-day basis.

It's still useful to know. I've applied lots of ideas from other fields. But can you force that knowledge by forcing yourself to study other fields? Maybe. We all have a finite amount of time though.

That said, lots of items on this list are key, and it'd be worth ranking them. There's no need to memorize formulas for Adam, but knowing the concept of momentum-per-weight is pretty crucial.
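For what it's worth, the per-weight momentum idea fits in a few lines. A minimal numpy sketch of one Adam step (the names and defaults are just the standard ones from the paper, nothing specific to any library):

    import numpy as np

    def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        # Each weight carries its own momentum (m) and squared-gradient
        # scale (v), so the effective step size is per-weight.
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad**2
        m_hat = m / (1 - beta1**t)  # bias correction for early steps
        v_hat = v / (1 - beta2**t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
        return w, m, v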


The author and you seem to be talking to different audiences: you're talking about ML engineering, and OP is talking about ML researchers.

Researchers absolutely need to know a lot, not necessarily all the way to topology or w/e but definitely the underlying mathematical principles in order to advance the field (IMO).


You're right, I wasn't clear. But I'm a full-time ML researcher. In terms of advancing the field, my contributions so far have been modest, but they're there. Some of my favorite ideas were swarm training (https://battle.shawwn.com/swarm-training-v01a.pdf), stop loss for stabilizing GAN training (https://twitter.com/search?q=from%3Atheshawwn%20stop%20loss&...), and getting GPT to play chess (https://www.theregister.com/2020/01/10/gpt2_chess/) back when that was a shocking idea. And in terms of cited research, https://arxiv.org/abs/2101.00027 has been the most impactful.

There's significant overlap between ML research and ML dev. If I can do it without most of that list, it should give people here some hope of joining the field without needing to immerse themselves in theory.


Aha! Nice, I'm actually reading up on transformers as part of my doctoral programme and had a recollection about your gpt chess project earlier today!


I'm an ML researcher, and I'm with sillysaurusx on this. I actually know most of the things on the list, but only because my research is mainly in model compression and computational efficiency. Recently I've been interested in adapting diffusion models to generate music (in the raw audio domain), and I'd say only 5 of the 18 bullet points in the ML section are relevant - the rest ranges from "nice to know" to "irrelevant".


> Gosh, I'll never be an ML dev

FWIW very few of the ML devs I work with have PhDs. I'm not sure aspiring ML devs are the intended audience.


I feel like I know up to 75% of that stuff and can't get hired even for entry level data science.


In my field we'd never hire someone with a PhD as entry level; we'd move right past them. Maybe it's because none of us have PhDs, so we think of you guys as gods who should be above us: we don't deserve you and we can't pay you enough. Maybe the same thing is happening?


I don't have a PhD and I'm not actually at the same level as someone who has that degree. Just knowing the things that are listed in that post doesn't get you a degree. I'm also circumspect about what I mention in an application for that reason.


Oh sorry, I misunderstood your post given the mental context I had at the time.


Same. Have the PhD also. Never even get to the first round of interviews, or hear back.

And to get to this level of understanding takes around a decade. Feels like a huge waste of time. The things you could be in a decade of work...


What you need to know depends largely on what you are working on - and that also holds for a successful Ph.D. candidate. Of course, it is good advice to be open-minded beyond one's narrow field of inquiry, as the OP suggests, but a long list of maths topics may not be a lot of help to beginners.

The Ph.D. period is the time when you have some time available to acquire additional skills, and my advice is: try to strengthen those aspects of your education where you currently have the most glaring deficits. For instance, take a statistics course if that is your weak spot, or learn a foreign language if you have evaded that topic so far in your educational journey.

Read the main textbooks of your field and read and re-read ALL relevant papers for your actual Ph.D. topic, once you've been able to identify it (which may well take you most or all of your first year). Take detailed notes, because nobody can remember most of that much highly concentrated advanced material. Try to find gaps: ask questions and find out whether people have tried to answer them yet. Interact with others, e.g. at conferences, after meeting some people by attending and networking in a prior year.


> Read the main textbooks of your field and read and re-read ALL relevant papers for your actual Ph.D. topic, once you've been able to identify

I'm gonna be real as someone in grad school. Basically no PhDs, grad students, or professors I know read full textbooks. I hear these ideas a lot, and they often sound like one of those Instagram influencer diets: completely unreasonable if you have any constraints in your life, and mostly used to gatekeep "real" scientists. Be well studied and knowledgeable about your problem domain, but you have a finite lifespan, so never feel bad that you aren't "educated enough".


As the author of this article... I have read maybe one textbook cover-to-cover in my life. :D

(Hands-On Machine Learning, by Geron, back when I made the jump math->ML.)


Elements of Statistical Learning is my cover-to-cover read :). I just think it isn't a requirement to be a "good student"


I have an applied maths PhD but no machine learning background. Would you still recommend Geron for that purpose?

BTW, I spotted a typo in the first paragraph of your thesis abstract: "neural networks and differential equation are two sides".


Aha, I have probably read that sentence literally hundreds of times, and never spotted the typo. I will never be able to unsee that.

Geron is good but now a bit out-of-date. No transformers (just CNNs/RNNs/etc.) and the coding component is all in scikit-learn and TensorFlow (rather than PyTorch or JAX).

FWIW I did ask this question recently over at https://twitter.com/PatrickKidger/status/1602776438159339521, in case any of the responses are helpful.


This was a really fun list to read through. I agree with the author that knowing as much as possible about how things work, not just what things do, is extremely useful.

However, "Just know stuff" I think is a secondary requirement (although an important one) to be successful. People really struggle to "just know stuff" if they aren't interested in the subject in the first place. People who aren't interested will settle for knowing only what things do, and not dive into how they work.

I am interested in this stuff, and actually self-taught (bachelor's in mathematics here). I have experience with a lot of stuff on this list, not through work or academia, or because I want to make money, but through fiddling on my desktop at home. I too got the "coveted tech job" as a machine learning engineer, but I never would have if I wasn't legitimately interested in this stuff, studying for fun in my spare time. I have seen lots of people fail to progress in this field because they _don't care_, they just want a good job.

Kind of off topic, but this is actually an integral part of my interviewing process. We give candidates a simple dataset to model, and we receive their script. The performance on the hold-out set is only weighted ~20%. The candidate's ability to talk about their process, about the internals of the model they used, about the feature-engineering quirks to work around model limitations, about their parameter tuning scheme - these conversations reveal how much someone is actually interested in the field, and are a great indicator of whether or not they are going to be a good contributor to the team. I've had candidates who couldn't tell me _anything_ about how the models they used actually worked. PhDs included!


This is a great point I didn't cover! "Just know stuff" tends to follow naturally from "care about stuff".


I just wrapped up a machine learning PhD at Caltech (now doing a postdoc in ML at Berkeley) and I disagree strongly with this article. What matters isn't knowing a bunch of random stuff, but rather writing/speaking skills, a willingness to learn new things, perseverance in the face of setbacks, creativity, having enough EQ to navigate the advisor-advisee relationship and departmental politics, and most of all, an ability to follow through and actually get things done. These "intangible" skills are far more important than having any specific knowledge.


This. The article is meh in terms of advice, and serves more as self-promotion (which apparently he is good at, and that's something to learn from). Maybe it somehow works for him and that's fine, but it's no more useful than a table of contents.


I think those are all things you need for life.

But what do you need specifically for a PhD? I argue that "knowing stuff" is what is necessary -- and that indeed it's essentially the purpose of the whole academic institution.


A Ph.D. isn't about knowing things, it's about doing research. Recipients of the degree are given the title "doctor" not for what they know, but because they are first and foremost teachers of knowledge (from the Latin docere, meaning to teach).

"Knowing stuff" is enough to get you to the point where you can formulate a good research question, but being a good researcher is a much broader skillset than just knowing things. And consequently, just knowing things won't get you a Ph.D., because getting one requires you to talk about your research. A lot. Like, all the time. And after all, the last step in getting the Ph.D. is called a "defense", not an "examination", because you are not there to tell anyone what you know -- you are there to defend what you know, and that's a different skillset than gaining knowledge or even disseminating knowledge.

I guess all that is to say, you could know everything in the world, but if you lack the skills to tell anyone else about that knowledge, you'll never get a Ph.D.


These are the real skills for moving beyond being stuck in purely technical roles.

These communication skills are also much more emphasized in American PhD programs than in European ones.

In general, I have seen American PhD programs produce more mature, well-rounded, and broad scientists than European ones.

It could also be because the blog poster is only 2.5 years into their postgraduate years, while American PhDs can easily take double that.


The tone of the post is pretty off putting. It reads as a "look how smart I am!" article - the author doesn't even pretend to be modest.


Should they be? In this case wouldn't it end up being false modesty?

Like, if this person can't say "Look at me, I am UNUSUALLY INTELLIGENT!" then who can?!


> Like, if this person can't say "Look at me, I am UNUSUALLY INTELLIGENT!" then who can?!

No one, that's my point. If academia has not taught the author that his intelligence isn't unusual, the workforce of his new employer certainly will. Listing GitHub stars and Twitter followers in the second paragraph as an achievement suggests, to me, a lack of maturity and a need for external validation. On the bright side, being good at self-promotion when entering a mega-corporation will make sure he has a great career in front of him.


I see, you read it as a form of self-aggrandizement. I think your reading was wrong (in addition to being unkind), but I wouldn't disagree with the core value under discussion.

I think a fair test to apply is whether or not the thing is a fact - if it is a fact, and you think it is impressive, that might say more about you than the author. I feel like otherwise you are asking people to self-censor too strongly, for fear of betraying some perverse sense of modesty that does not allow for anyone to have done anything worth noticing at all.


I'm sorry it came across this way for you! Rather, I'm just outlining why folks seem to keep asking me this question. :)


I think your post combined with the responses to it are a great example of how messages become stripped of all nuance, even by otherwise very educated/intelligent people. A lot of the critique here ignores the nuance and specificity you give just in the opening paragraphs.


Even if you can, that does not mean you should.


Why not?


"Just know stuff" deeply resonates. Even much below the author's level, as is the case for me, technical knowledge dominates everything else by orders of magnitude. I'm still appalled at how people can manage to gather the courage to utter they don't need math.

A great list, too. I guess I have stuff to brush up on for the next 10 to 20 years?


> I'm still appalled at how people can manage to gather the courage to utter they don't need math.

I try to (re-)learn math periodically because I feel like I should, like how one ought to eat one's vegetables and one ought to exercise, but extrinsic motivation is the thing that's lacking. I usually end up on recreational math puzzles or something, before dropping it, since at least those are fun. Probably made three or four cracks at it in a decade, each has gone the same way.

I literally don't know what I'd use most of it for—I can't find that need, even when I try. I'm sure I could make something up just to have an excuse to apply what I was learning, but... why? Probably there are other jobs I could find where knowledge of math was absolutely key, but... why? I've been paid to write code for about 23 years, my pay's great, and I've repeatedly gained a reputation at companies for being the guy to go to for tough or low-level problems. But if you gave me an intro to linear algebra final or a calculus 1 final today, I'd be lucky to score 25% on either. (Hell, I'd be lucky to get anything right on the calc final, but maybe there'd be a couple of easy questions at the beginning—I've never, once, applied anything I learned in calc, for any purpose, so what little I knew about it to begin with is long gone.)


It’s one of those things where if you don’t have the math knowledge, the opportunities to apply it will be literally invisible to you.

If all you have is a hammer, everything looks like a nail—but the converse of that is that if you have never seen a hammer, nails will be invisible and incomprehensible to you, they will just blend into the background of noise.

When I learn about something, I suddenly see it everywhere.

Hypothetically, if you wanted to learn it, I would recommend devoting the first hour of every day to it. Before your brain fully wakes up and starts asking why you are doing it, just do it.

I assume you have the standard CS background, with decent knowledge of discrete math, calculus, linear algebra. I would recommend starting with a graduate-level linear algebra textbook. A friend of mine used to say “linear algebra is the new addition”, it permeates everything, and knowing it well produces massive dividends.


Why should I think I'll find uses this time, when I already burned time and money learning (some of) it once, and lost all that precisely because I never encountered any need for it? An hour a day is a huge time investment for something that's already failed to prove its worth once. Like I'm sure I could go find some jobs that I can't get now because I lack math skills, and I'm sure some (far from all!) of those pay better than what I make now, but... like... that's equally true of jobs that require better business or speaking skills, and unlike linear algebra I can easily point to ways those skills could be beneficial in everyday life and in my existing job. Where's the corresponding immediately-useful benefit for lin. alg.?


Clearly there are people that don’t need to know math. You happen to be one of them, congratulations. Though I know I’d be bored out of my mind if I did software work that only used high school level math and logic.


Sure, some people need it, I was just responding to:

> I'm still appalled at how people can manage to gather the courage to utter they don't need math.

When... well, the vast majority of people really don't. They promptly forget almost everything back to about 6th grade math, shortly after finishing formal education, because they truly never need it, so that knowledge and those skills quickly rust.

If these people in-fact could make great use of it, such that it's "appalling" that they don't think they need it, then that's probably what school should focus on teaching, at least for non-math-majors. Laser-focus on application in everyday life. Especially in k-12. If it's actually useful and people are being forced to spend hundreds to thousands of hours learning junior high, high school, and college math, but then losing most of it because they never see any use for it, that's a tremendous failing of curriculum that should be addressed as directly as possible. If such a program wouldn't succeed because it's actually true that most people don't really need most of that math for anything, then we ought not be "appalled" at their correctly assessing that truth.


I see what you're saying. I don't think it's a failure of the curriculum to teach people things when they are young that they don't end up using. Certainly a 13-year-old isn't going to know what their future career path/interests will be (some do, but most don't), and they shouldn't shut doors down the road at such a young age. I think at this point high school has devolved to just giving everyone the basic broad skills with which they could feasibly succeed at any college major. The seniors that have already decided they just want to build houses all day can complain during math - we all heard it, "when will I ever use this" - but the problem is that the answer is not "never" and it's not "always", it's "we don't know, but you may need it, and closing those doors now will limit your future potential".


This is a great description, thank you! What you've said is precisely the reason I emphasised knowing a bit of foundational math, e.g. topology.


> I try to (re-)learn math periodically because I feel like I should, like how one ought to eat one's vegetables and one ought to exercise, but extrinsic motivation is the thing that's lacking.

Personally, if it were “just relearn calculus and move on from there” I think I’d be able to motivate myself. At that point I’m still pretty close to the problems and tasks that interest me. But the reality is I spent much of middle school and high school wasting my time doing things other than math, so in reality I’d have to go much further back and relearn all the prerequisite stuff, and the prerequisites to that, etc. By that point I’m so divorced from the reason I wanted to relearn math in the first place that I just get bored and give up.


...and then you meet your new boss who has no clue about anything and makes your life hell when his imagination doesn't match the "stuff you know".


> Please, please: learn some probability via measure theory. You’ll start reading machine learning papers wondering how people ever express themselves precisely without it. The entire field seems to be predicated around writing things like x ~ p_\theta(x|z=q_\phi(x)) as if that’s somehow meaningful notation.

Hear hear! How did ML get saddled with such awful notation?
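For anyone curious what the careful version buys you, here is one possible unpacking of that line (my own sketch of the measure-theoretic phrasing, not something from the article):

    % Sloppy:  x ~ p_\theta(x | z = q_\phi(x))
    % One careful reading: q_\phi is a measurable map from data to latents,
    % p_\theta is a Markov kernel from latents back to data, and the "sample"
    % is a random variable with the corresponding law.
    Let $q_\phi \colon \mathcal{X} \to \mathcal{Z}$ be measurable, and let
    $p_\theta \colon \mathcal{Z} \times \mathcal{B}(\mathcal{X}) \to [0, 1]$
    be a Markov kernel. For fixed $x \in \mathcal{X}$, let $\tilde{X}$ be a
    random variable whose law is the probability measure
    $A \mapsto p_\theta(q_\phi(x), A)$ on $\mathcal{B}(\mathcal{X})$.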


How many papers with awful notation are actually the reverse engineering of some (barely) working code, cobbled together from random libraries and coefficients?


The problem with this list, although impressive in scope, is that it is not clear how /deeply/ one should know each of those items on the list.

One can write several papers for each one of those items on the list.

Knowing the concept well enough to pass a job interview is a much lower bar than well enough to innovate and push new knowledge.


Thinking about this, I'd also be interested to hear what the author learned and didn't find useful over his PhD. Is this a list of most of what he ended up learning (which could then have a lot of confirmation bias in it), or is it curated from the maze of blind alleys he went down?


Ooh, that's a great suggestion!

So one thing I learned a lot of in my PhD (for literally a whole year), that I literally never needed, was functional analytic methods for PDEs. Stuff like Moser iterations / the De Giorgi-Nash-Moser theorem, etc.

The finer details of Turing machines have never really helped me, although in my case that's probably the exception as I imagine that's still pretty important.

On a more ML note, I have literally never needed SVMs. (And hope I never get asked about them, I've forgotten everything about them haha.)

I think there's a lot of other stuff I could add to the "just-don't-know-stuff" list!

(And to answer your last question: this list is curated, and based on the criteria of (a) is it useful, and (b) is it widely applicable.)


This list is great for moving things from the "unknown unknown" (U-U) to the "known unknown" (K-U) bucket. It's relatively easy to move the things from the K-U to the K-K bucket just by virtue of knowing enough search terms and places to start.

I think all of the topics in TFA's list could come into play at some point (I have explored something to do with the majority of these concepts during my work in private research) and it is important to know how compiler optimizations are done, e.g. XLA and Jacobian accumulation techniques to design fast models. I don't think it matters that you don't master all of them, but being able to quickly spin back up to comprehension upon a relatively brief refresh is pretty important when it comes to algorithm design and prototyping.
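To make the Jacobian-accumulation point concrete, here is a toy JAX sketch (my own example, nothing from TFA): forward and reverse mode build the same matrix in a different number of passes, and `jax.jit` hands the whole computation to XLA.

    import jax
    import jax.numpy as jnp

    A = jnp.arange(300.0).reshape(100, 3)

    def f(x):
        # Toy "tall" function R^3 -> R^100: many outputs, few inputs.
        return jnp.tanh(A @ x)

    x = jnp.ones(3)

    # Reverse mode builds the Jacobian one row per output (~100 passes);
    # forward mode builds it one column per input (~3 passes). Same matrix,
    # very different cost depending on the shape of f.
    J_rev = jax.jacrev(f)(x)
    J_fwd = jax.jacfwd(f)(x)

    # jax.jit compiles the whole thing with XLA (op fusion etc.).
    fast_jac = jax.jit(jax.jacfwd(f))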


Knowing things is good, but I think the real benefit of a PhD is in developing the skill to learn new things quickly and fill in any gaps as needed. It's important to get practice learning many things in depth, but the goal isn't to become a storehouse of knowledge; it's to develop the ability to assimilate existing knowledge and apply it to figuring new things out.

The number of topics I've studied in depth dwarfs this list (and the same is certainly true of its author), but the set of things I could teach a class on today without preparation is much smaller. The important thing is that if I have a problem, I can use the impressions from all I've learned before to get a sense of where to look next. My memory isn't great, but it doesn't matter because I can refresh, learn, and figure things out as needed.


Completely agreed!


It's hard to be sure it really happened this way, but I feel like my PhD began with a research challenge, then went top-down into learning what I needed in order to think about the challenge, and then worked back up to an academic framing of how the fundamental theory was advanced.

We also had comprehensive exams, so you're forced to know the overall theory of your discipline as part of the rigor of the program.

I personally like the challenge approach to research; it's like what companies sometimes call their "north star". It's not that you're necessarily working directly on that problem, it's that you're identifying what would have to be true for that problem to be solvable, and working on some of those things.


"The stuff is what the stuff is, brother." --from a James Mickens talk on machine learning, https://www.youtube.com/watch?v=ajGX7odA87k&t=13m40s

Can confirm, the way ML is used in many businesses is like an egg drop: You open up Jupyter, load in some data, and play around with various models until you find one that fits the data, then use it to try to predict future data. If the future results comport with the model, congratulations: your egg is safe, at least for drops of that height.
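In sketch form (assuming scikit-learn; the file and column names here are made up):

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
    from sklearn.metrics import r2_score

    df = pd.read_csv("some_data.csv")  # load in some data
    X, y = df.drop(columns="target"), df["target"]
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # ...play around with various models until one fits...
    for model in [RandomForestRegressor(), GradientBoostingRegressor()]:
        model.fit(X_tr, y_tr)
        print(type(model).__name__, r2_score(y_te, model.predict(X_te)))
    # If the hold-out score looks good: congratulations, your egg is safe,
    # at least for drops of that height.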


I keep searching how to be a good researcher but I haven't found any answers.

I know almost all of this except for weirdly arbitrary/specific stuff (you really don't need to know Haskell for an ML PhD). It's all pretty basic: half of it you learn during any CS Master's degree, and the other half is models and concepts that have been popular enough in the last few years that you would have read the papers and possibly implemented them if you keep up with the field.

This hasn't helped me at all; my PhD has gone horribly and I'm not entirely sure why. The first year I wasted on an EEG-related review paper before realizing that EEG data is garbage; the second year I wasted on an original method that I wanted to pursue, but it didn't perform well and got scooped. The third year (which ended up being the fourth year because of severe health issues) I was burnt out and produced nothing of value.

I don't know how to produce good original research. I know how to program like nobody's business, better than any of my lab colleagues, but this has done nothing to help me. I can implement a method in a night, and I can implement tons of ideas in a year, but if they don't perform better on the important metrics, what's the point? I know the math and the ML side of things, but this hasn't given me any insights; usually when I have an idea, I realize it's already been done while doing the literature review.

It's all a big mystery to me. I think maybe the pressure of holding an industry job and having to finish the PhD before my funding ran out prevented me from experimenting freely, but it might just be a convenient excuse.


I'm sorry to hear that things didn't go so well for you!

FWIW being able to "program like nobody's business" is still really really valuable. It's why I dedicate such a large chunk of the post to software dev skills. :)


I didn't expect this to be as helpful as it actually was. Great list.

Can anyone suggest something similar for the HPC domain?


So HPC means like ten different things.

For example another commenter mentions low-latency concerns in finance, and that's something I have zero experience with.

HPC has often also meant writing a lot of C++ to do e.g. MD or something.

These days, I consider myself HPC-adjacent -- I write scientific ML software, often for use on pretty beefy hardware (TPU pods etc.). So at least for that, here's an off-the-cuff list of a few items that come to mind:

- Know JAX. Really, really well: its internals, how its transforms work. It's definitely a bit bumpy in places, but it's still one of the best things we have for easily scaling programs, e.g. through `jax.pmap` (see the sketch after this list), being able to test on CPU and then run on TPU, etc.

- Triton! New(-ish) kid on the block for GPU programming.

- How CPUs work: L1/L2/L3 caches, branch prediction, etc. Parallelism via OpenMP.

- How GPUs work: warps etc.

- How BLAS works (e.g. tiling).

- Compiler theory. Inlining functions, argument aliasing, NRVO, ...

- Know autodiff well. E.g. have a read of the Dex paper, and the concerns with doing autodiff through index operations. Modern scientific computing is moving towards a ubiquitously autodifferentiable future.

- ... plus loads more, haha. Probably I'm still missing ten different things that another reader considers crucial.
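And here's the `jax.pmap` sketch promised above - a minimal data-parallel toy (parameters broadcast to every device, batch sharded across however many devices you have):

    import jax
    import jax.numpy as jnp

    def loss(w, x):
        return jnp.mean((x @ w) ** 2)

    n_dev = jax.device_count()
    w = jnp.ones(8)
    xs = jnp.ones((n_dev, 32, 8))  # one shard of the batch per device

    # in_axes=(None, 0): broadcast w to every device, shard xs over axis 0.
    per_device_loss = jax.pmap(loss, in_axes=(None, 0))(w, xs)
    print(per_device_loss)  # one value per device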


Depends! HPC is a massive umbrella term. Which part of it are you interested in?

For finance, firewall-to-firewall time was an important concept. (The time it takes a signal to get into your datacenter, be processed, then emit a signal back out.)

But e.g. massive data processing is an entirely different beast. Latency isn't too important, whereas parallelization is crucial.

So it's kind of hard to make a list without being pointed in a vague direction.


I know most of the stuff listed there and do not have a published textbook or papers. It sounds like the author achieved success because they work hard on things they’re passionate about. Knowledge and published works are a byproduct of that.


I think the point is that knowing this stuff makes it easier to write a paper, because you are not as easily out of breath. It doesn't guarantee you a good paper and PhD; that's your part.

Maybe you too could be a successful PhD student? :) At least you know the basics!


2 years? Sheesh. This is the type of stuff that makes me think genius is a biological thing.


> 2 years?

Half of the items, and almost all in the "Mathematics" section, you would have learned during your BSc/MSc (if it's in applied math or physics, and Python programming is your hobby). I can't find the author's CV, but there are five years between him submitting his MSc thesis in 2017 and his PhD thesis in 2022. Maybe he studied ML for all those 5 years, or maybe he took off 2 years to travel the world, who knows.


You can learn a lot when you’re in your early 20s, single, ambitious, and getting a stipend from a university to do nothing but study for 16 hours a day.


I was about to comment on the short PhD in "2-and-a-bit years" -- and how the UK expectation for PhD program duration is so different from in the US.

And it wasn't to say that he's a genius. It's that the program is planned to be shorter, and in fact, there's less funding available to go longer even if you wanted to.


In the UK, 3 years is common.


Right! This was the situation for me.


One must consider what he spent the previous 22 years doing.


I have learned many of these things at some point. Unfortunately, I have also forgotten most of it. Some of it I still remember to some degree, or just the basic idea, and can probably reconstruct it given some time, or easily look it up and remember. However, many of the things I have completely forgotten, and it would take more time to understand it again.

I am not sure I can keep so many things active in my memory, plus whatever else I need for what I have been working on in recent years. If my work in the last few years does not need some of this knowledge, I keep forgetting it.

Maybe my memory is just bad.

I think many of the things listed here are somewhat specific to what the author has worked on, so that it's easier for him to really know all this.

Maybe the point of this post is also more generic: Try to have an active memory over a diverse set of fields.


Haha, so I actually have an atrocious memory! Famously so amongst my friends, I never remember what we've discussed.

When I wrote this list I certainly wasn't expecting/recommending all of this to stay in the reader's head forever.

Rather: if you've worked with something deeply at one time, then -- even if you've forgotten the details -- you still can pattern-match on it later. And then look up whatever you've forgotten!


+1 for probability via measure theory & functional analysis

I would add geometry and topology, which are becoming more of a prerequisite for picking up modern methods: https://arxiv.org/abs/2104.13478


Maybe I should start a PhD in ML as a university dropout. It sounds like a list of basic stuff taught within the first two years of any math/CS program, but I might be missing something, as it's far from detailed...


I think it's hard to get those basics in an undergrad program.

CS often has the problem that the math basics are not taught as rigorously as they should be (I think calculus I/II and linear algebra I/II should be exactly the same as the lectures for math). Then it misses measure theory (usually in calc III), and therefore you are going to miss solid fundamentals in probability. Optimization is also usually absent; it's sometimes squeezed into the numerical math lecture, but it shouldn't be, since numerics (or better, scientific computing) is so important. You also want some statistics course.

Math usually lacks the whole machine learning canon, from SVMs to NNs, from Bayesian methods to statistical learning theory. I just see the same statistics lecture everywhere, building upon probability and deriving good estimators and their properties, and that's all. Also, you might miss CS basics like algorithms and struggle with the basics of scientific computing (programming in C, knowing all your matrix decompositions, etc.). Also, no knowledge of git, how to navigate a server with just a shell, etc.

I think you will either have to learn the missing parts after your undergraduate, for example as your master, first year in your phd, or be really lucky! I think most miss a significant chunk of the topics after completing their undergraduate.


Now is a terrible time to start a PhD in ML. If you do a CS PhD, pick literally any other subfield.


I'd argue that cross-field work is a good avenue for a PhD in CS. Work in collaboration with one of the sciences. There are plenty of papers to be had in application. Not everything in ML research needs to be algorithms. You'll likely find places where the models break down and need algo work too.


People have been saying this for as long as I've been hanging on ML forums, meanwhile the professors I work with can't keep up with industry demand for ML and ML adjacent roles and have to open up new classes.


Sounds a lot like my dad telling me to go to law school instead of getting into cs. He thought the ultimate goal of cs is to make everyone in the field obsolete... well maybe he was right in the end.


Lawyers too (maybe they are first)


Why?


The subfield is oversupplied with labor relative to the supply of good ideas worth working on for six years, and relative to advising capacity.


It's trendy, and in general it's best to avoid trendy fields.

But if you actually care about ML -- I think it's one of the best things in the world -- go for it! ML has never been more accessible.


Trendy fields often have lots of money in them in both the short term (grants, associateships, funding) and the medium term (first job prospects), as well as benefits like preferential appearance in journals (if two equally insightful articles are up for one spot, would you rather take the article on the topic many are interested in, or the article from a much more niche area?) and possibly more low-hanging fruit, since there is so much more work to piggyback on/respond to/extend.


because the very best machine learning models are distilled from postdoc tears.


I think every significant model has come from industry. We postdocs just tweak hyperparameters and publish answers to questions about them that no one is ever likely to ask.

It's like the "will it blend" series, but for machine learning


While this surely seems required for a PhD program at one of the top universities, more down-to-earth programs would require only about 1/4-1/2 of what's in the list, depending on the area of research. For instance, graph neural networks are very niche.

And if you just want to do research around current SoTA, that would be more like 1/8th.



How come no one mentioned that the most important thing, apart from "Just know stuff", is "have a 200 IQ"!


> Please, please: learn some probability via measure theory. [And integration]

Are there any good, gentle books on measure theory for self-study?


Royden.


That's what I do. I drink and I know stuff.


Amazing resource - very applied and relevant. Thank you for this (coming from someone in Data Science).


This post makes me feel like I need to go back in time and start learning when I was 10.


Good references seem to be missing.


Agreed that it would be nice, but exactly how much of the work of becoming educated are you willing to ask this person to do on your behalf?


If they are as educated on these topics as they claim, adding references is a trivial matter and generally greatly improves the value of the work.


I am going to disagree with you there. I have physical copies of a couple of books I recommend, and making them easily referenced can be a bit of a pain - not difficult, but maybe not worth the effort if I was compiling a list of topics to serve as my general answer to a question I receive frequently.


It’s not hard to drop a title and an author at all. I heavily disagree.


Ok, where is your personal list of important topics to understand <field>? I'd like to check that you properly referenced every recommendation.

I think my real gripe here is the implicit demand for more free labor from someone else. We're talking references here; we both know it's not a huge deal if you already have your reference manager fired up, but in the context of this thread the note of 'needs references' feels like a demand, like some kind of obviously missing thing, vs a neutral observation that it would make the thing better.

Kinda like if someone makes a cake and when you get a slice all you say is 'needs strawberries'. Maybe y'aint wrong, still rude though.


The effort required is minimal. Here, watch!

Linear algebra is important! Check out Linear Algebra Done Right by Axler.

If the author can’t do this minimal extra step then I question why they’re writing their post since they’re either unqualified or don’t really care about helping people.


Yeah, but perhaps they should have written/published this in a form such that others can add references.


Holy hell, that's an idea with legs. Why don't you make that and send it their way?


Maybe not necessary for a PhD, but you should also have some idea about how HW works.


>Don’t expect to cover all of that in a few months

lol

Yes, that was definitely what I was thinking.



