Hacker News: selectron's comments

Hand counting of votes seems like a no-brainer, regardless of whether there was a conspiracy this election.


Agree, we have to make hand counting possible in the first place.


This statement is too general. You could have said the same thing about chess: there are chess grandmasters who devote their lives to studying the game, yet computers play chess at a much higher level than any human.


Chess is rational, following an easily understood set of rules, and both players have perfect information. The big problem has always been analysing all future possibilities.

The stock markets are very far from a rational, perfect information game with simple rules.


If you honestly think that living a real human life, with all the concurrent decisions relentlessly made at the micro and macro level every second of every day, is the same as a single game of chess, then by all means, go trade the stock market and show us how it's done.


An organisation at the scale of IBM was able to create, after many attempts and vast investment, a computer that can beat grandmasters. That insight isn't useful to an individual trying to do the same.


I would say that table is really quite valuable. Kaggle problems come from all types of companies, so it doesn't make sense to say that it is "overfitted patterns that he's adopted in his own realm". With that said, validation on your own dataset will trump general knowledge, so you shouldn't view these parameters as hard and fast rules. But the parameters in that table will provide a useful starting point, and if you stray too far from them that is a warning sign that you might be overfitting.


For image competitions you are right. Neural networks are often in winning teams' ensembles, but they require a lot more work than something like xgboost (gradient-boosted decision trees). For datasets that aren't images or NLP, xgboost is in general much more widely used than neural nets. Neural nets suffer from the amount of computing resources and knowledge needed to apply them, though given infinite knowledge and computing power they are probably on par with or better than xgboost. And if you need to analyze an image, they are great.


1) It depends heavily on the model. Something like xgboost (gradient-boosted decision trees) will handle irrelevant features fairly well, while other models (like linear models, especially without lasso regularization) will have much more trouble. In virtually all cases, adding noise will decrease model performance.

2) Same as 1), depends on the model. With good hyper-parameters xgboost can handle correlated features well, while other models may struggle.

3) With a good model (again, like xgboost), feature engineering is usually the best use of your time. Removing "bad" labels and "noise" from the data is especially dangerous: if you are not extremely careful, you can make your model worse. If you can identify why a label is "bad" then you can remove or correct it, but you need a reason why these bad labels wouldn't also appear in your test dataset. Removing outliers can help your model, but it is risky. In contrast, smart feature engineering is low risk and can provide large gains if you see a pattern the model could not see. Feature selection can be important as well, and is generally pretty quick assuming you have good hardware, so you might as well do it, especially if you have some knowledge about which features you expect to be less useful.
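Points 1) and 2) can be illustrated with a small sketch. This uses scikit-learn's GradientBoostingClassifier as a stand-in for xgboost, on made-up synthetic data, so treat the numbers as illustrative only; validation on your own dataset is what actually matters.

```python
# Sketch: how a gradient-boosted tree model copes with irrelevant features.
# GradientBoostingClassifier stands in for xgboost; the data is synthetic.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 500

# Two informative features, and a label derived from them.
X_informative = rng.normal(size=(n, 2))
y = (X_informative[:, 0] + X_informative[:, 1] > 0).astype(int)

# Twenty pure-noise features appended to the informative ones.
X_noisy = np.hstack([X_informative, rng.normal(size=(n, 20))])

model = GradientBoostingClassifier(random_state=0)
acc_clean = cross_val_score(model, X_informative, y, cv=5).mean()
acc_noisy = cross_val_score(model, X_noisy, y, cv=5).mean()

print(f"clean features:          {acc_clean:.3f}")
print(f"with 20 noise features:  {acc_noisy:.3f}")
```

With a tree ensemble the degradation from noise features is usually small; swapping in an unregularized linear model on the same data typically shows a larger drop.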


There is no way machine learning will be a necessary skill for software engineering, if that is your motivation I would not spend time learning it. However, if you still want to learn it you should first study statistics, for instance http://www-bcf.usc.edu/~gareth/ISL/.


I have started reading this book. So far so good.


Thanks for recommending the stats book.


My advice is Python, but it depends on your background and what you want to do. If this is your first language and you have a stats background, R is a solid choice. If you already know another language, R has a lot of flaws that are quite frustrating. Perhaps the worst thing about R is how hard it is to google answers for, compared with Python.

Like if you google R for loop, the first result http://www.r-bloggers.com/how-to-write-the-first-for-loop-in... is much worse than the equivalent first result for python: https://wiki.python.org/moin/ForLoop
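For reference, the basic Python for loop that page covers is about as simple as it gets:

```python
# Iterate directly over a sequence; no index bookkeeping needed.
squares = []
for x in [1, 2, 3, 4]:
    squares.append(x * x)
print(squares)  # → [1, 4, 9, 16]
```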


True say. This is going to be the first programming language that I will be learning. I will stick to R for now and slowly evolve from there. I really appreciate you sharing your thoughts.


Interesting. After watching the show Billions, and reading up on how much money hedge fund managers make on fees (seems totally ridiculous), I wonder how common is illegal insider trading for hedge funds? No matter how good your model is, you won't beat someone with information your model doesn't have.


Read up on SAC (Point72). Quite a bit of insider trading.

It isn't just about having great models. Half of the world, by definition, can't beat the median return. Everyone has great models. It's figuring out where the models are wrong that matters. (Example: most mortgage models assumed that housing values in all US markets couldn't fall at the same time.) Sometimes that's through qualitative insight, but that's very, very hard. And sometimes it's by having someone pass along information they shouldn't.


To really understand if companies are biased or not, you also need to know the percent of applicants to these companies who are black. If only 2% of applicants to Google are black, I would expect only 2% of new hires at Google to be black.

The assumption that a white applicant and a black applicant should be roughly equal is a strong prior. I would need to see convincing data to counteract this assumption.
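The base-rate arithmetic behind this point can be made concrete. The numbers below are purely hypothetical, just to show how an applicant share translates into an expected hire count under an unbiased process:

```python
# Hypothetical base-rate calculation: under an unbiased selection process,
# the expected share of hires from a group matches its share of applicants.
applicant_share = 0.02   # assumed fraction of applicants from the group
n_hires = 1000           # hypothetical number of new hires

expected_hires = applicant_share * n_hires
print(expected_hires)  # → 20.0
```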


Absolutely, this is rather the point of the article: that diversity efforts focused on eliminating bias during selection are pointless unless you also ensure a diverse candidate pool to begin with.


The goal of research (at least for basic science) isn't to make money, it is to increase knowledge about the universe. This knowledge is a public good, so it makes sense that private industries motivated by profit do not support fundamental research. Scientific progress is a rising tide that lifts all boats.


For most research, most funding sources carry some expectation of ROI at some point. This obviously isn't true for very basic research, but even then there's a hope that the process of exploring the deep unknown produces some practical output. For example, building and designing the LHC produced advances in fields ranging from management theory to precision machining and CAD design.

For the organizations and countries involved, this helps provide for an advanced industrial base that continues to build and drive an ecosystem that supports powerful economic engines.


If you just want ROI, you are better off spending your money elsewhere. This is evidenced by how little money most companies put into scientific research. Further, the gains of science are in general hard to profit from. (http://www.therichest.com/business/they-could-have-been-bill...)

Another major reason governments fund research is to produce people with PhDs. This is why the apprenticeship structure of academia is so sticky, and why in order to get tenure professors have to graduate students.


Price signals are how we tell people what research careers are worth considering.

Unless they don't mind working for peanuts, in which case there's no problem.


Well... it is a tournament model, so price is misleading. Top researchers at industrial labs get paid a decent amount. The problems are (a) getting a coveted job, and (b) mobility when things eventually go south at your employer and they decide research is expendable.


I think both these points are true:

1. Society uses price signals to drive activity to areas "we" want to develop

2. In research, the tournament model prevails so a few winners get most of the gains

However, the question arises:

The tournament model prevails in many other domains, including startups, the movie industry and so forth. However, in those areas most people don't feel that the enterprise is severely underfunded. (I am assuming this, so if people have contrary data, I will reassess.)

The question stands: in spite of the tournament model, why does society fail to deliver enough reward for basic research, when we as a society believe there should be more of it?

(We could ask the same of teachers, and so on.)

It could also be the case that "society", whatever that is, doesn't really feel that science is underfunded.


The problem is that there are plenty of graduate students willing to work for peanuts.


You also don't know how you stack up in the tournament. The people entering graduate school have usually done really well in undergrad. However, being in the top 1% (or whatever) of that small pool provides you with a lot less information than you'd think about your prospects in a pool of people who were also in the top 1% of their own undergrad classes.

Repeat for grad school->postdoc, postdoc->tenure track, and so on.


I agree completely. I also think there is way more luck involved than people want to admit: a lot of interesting results are unexpected, and there are so few jobs that the timing has to work out for you. I have known plenty of great postdocs who couldn't get a professorship, and plenty of professors who seem pretty mediocre.

