Numerai, a hedge fund built by a community of anonymous data scientists

dzdt · on June 22, 2016

Some thoughts on the business model:

* traditional hedge funds have a problem with scaling: if you put more money into the same strategy, returns go down. Numerai hopes to scale the number of strategies it employs by scaling the number of researchers participating.

* by providing researchers only opaque streams of data, they prevent researchers from leaving and competing directly. If you don't know how the data corresponds to the market, you can't replicate the trading at another fund. (Some big hedge funds like D.E.Shaw do the same!)

* researchers may still leave and compete indirectly, using the same algorithms on different market features. But by paying anonymously in bitcoin, Numerai may be hoping for the reverse, that programmers from other quant funds will anonymously moonlight for Numerai using their algorithms from those other funds.

* by being opaque with the data, Numerai keeps researchers from knowing the true value their strategies are providing. That information asymmetry is in Numerai's favor, letting them underpay even strong performers.

lordnacho · on June 22, 2016

Quant fund insider here.

The data is pretty pure, in the sense of not telling you any metadata at all. It's literally just a bunch of numbers and 0/1 labels.

It's hard to implement a strategy without knowing what exactly you're looking at. I get the feeling this "pure dataset" is part of some framework that Numerai thinks will beat the market, given good predictors.

That's not necessarily the case. Say I assume the 0/1 means up/down over some period. Well, being able to guess 0/1 correctly would obviously help. Say I'm right 70% of the time, then I can equal weight my bets and it will be just swell. But say I'm right about 51% of the time. Then it's going to take quite a while longer for the law of large numbers to work in my favour. Remember your ML algo will only be able to give you good predictions if some of the 21 features are actually meaningful, and we have no reason to think they are actually meaningful.

Now, let's say I have some domain knowledge in finance. I want to predict over/underachievement relatively. I would be able to guess which shares go up relative to others, but not the market factor. That would require a different framework to the one I'm supposing is presented here. Is there flexibility for that?

The secrecy thing makes me wonder, too. If it's just a matter of not showing your work, why don't you just have a website where people submit their daily/weekly/monthly portfolios and you keep track of the tally?

valdiorn · on June 22, 2016

> Say I'm right 70% of the time, then I can equal weight my bets and it will be just swell. But say I'm right about 51% of the time. Then it's going to take quite a while longer for the law of large numbers to work in my favour.

That's actually very far from being true. If you trade a single instrument, sure, the variance will kill you in anything but the very long run. But if you trade thousands of securities (like say, the entire US equity market), then a 55% prediction ratio and a market neutral strategy will absolutely crush. Even if you blindly buy/sell on every signal without doing any sort of weighting (excluding low confidence predictions, etc.), you should see a several-sigma strategy.

It only takes a very, very small edge to make a very low risk strategy if you can diversify.

https://en.wikipedia.org/wiki/Signal_averaging

Now add on top of that the fact they will have several low SNR prediction signals, and the effects of signal averaging become even greater.
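A rough way to put numbers on the diversification argument above. This is only a sketch under strong assumptions that are not in the thread: every bet is independent, equally sized, and pays +1 when right and -1 when wrong. Real positions are correlated, so the figures are an optimistic bound, not a backtest. The function names are made up for illustration.

```python
import math

def edge_in_sigmas(hit_rate: float, n_bets: int) -> float:
    """Expected total P&L divided by its standard deviation.

    Assumes n_bets independent, equally sized bets paying +1 or -1.
    """
    mean_per_bet = 2.0 * hit_rate - 1.0               # E[X] for X in {+1, -1}
    std_per_bet = math.sqrt(1.0 - mean_per_bet ** 2)  # Var[X] = E[X^2] - E[X]^2
    return mean_per_bet * math.sqrt(n_bets) / std_per_bet

def bets_needed(hit_rate: float, target_sigmas: float) -> int:
    """Smallest number of independent bets that reaches target_sigmas."""
    mean_per_bet = 2.0 * hit_rate - 1.0
    std_per_bet = math.sqrt(1.0 - mean_per_bet ** 2)
    return math.ceil((target_sigmas * std_per_bet / mean_per_bet) ** 2)

for p in (0.51, 0.55, 0.70):
    print(p,
          round(edge_in_sigmas(p, 1), 2),      # one instrument
          round(edge_in_sigmas(p, 1000), 2),   # a thousand instruments
          bets_needed(p, 3.0))                 # bets for a 3-sigma result
```

Under these assumptions the ratio grows with the square root of the number of bets, which is why a small per-bet edge can still add up across thousands of securities.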

I'm also a "quant fund insider", as you put it...

lordnacho · on June 22, 2016

Yes, Taleb did the actual calculation in one of his books. I'm exaggerating because if I say it's 50.01% it will cause head scratching.

mikkom · on June 22, 2016

> I get the feeling this "pure dataset" is part of some framework that Numerai thinks will beat the market, given good predictors.

> That's not necessarily the case.

This is the part that I find most interesting. They have a hypothesis and they are testing it with real money.

They are even outsourcing computational power, which I think is very interesting, as running an ML fund with thousands of algos would probably be quite hard to scale.

gtrubetskoy · on June 22, 2016

I'm skeptical. There are skyscrapers in the NYCs, Londons, Singapores and Hong Kongs of this world filled with people who are smart, have enormous computer resources and funds, and are paid handsomely to work on solving this problem with all manner of ML and AI at their disposal. The "crowd" has no advantage over them. The "closed system" is much larger than the "crowd" in this case.

mikkom · on June 22, 2016

Unless there are people in the "crowd" who work in machine learning / data mining in a totally different sector (let's say genomics/biostatistics, as this is the example in the article) but have no access to the hedge fund world.

The "crowd" could very well have a long list of very intelligent people who are "experts" in some other sector and who have fresh insights and want to get some anonymous extra income on top of their salary.

This said, I think their compensation seems really low.

karmacondon · on June 22, 2016

I don't think this is true at all. 10,000 people are just going to have more ideas and better individual ideas than 100 experts. The impact of that much creativity and perspective can be exponential, and it's hard to duplicate.

When I'm designing a system, I hate to have to try to out think everyone on the internet. If you have a known set of opponents you can predict what they might do. When you're up against anybody from anywhere, you never know what you're going to get. Global scale collaboration is a very powerful thing because it allows a complete exploration of the solution space, and it's difficult to stop.

iopuy · on June 22, 2016

I absolutely disagree. I'll take the word of 100 experts over 10,000 amateurs.

sseveran · on June 22, 2016

The interesting factor here is strategy sizing. The 100 experts need to manage portfolios of a certain size. The amateurs don't. There are people who can carve out a living off a strategy that has sustainable alpha but would not be attractive to hedge funds due to the capacity of the strategy.

Ameo · on June 22, 2016

This is absolutely true. Many, many strategies that are viable at small portfolio sizes fall off very quickly when millions, tens of millions of dollars start to be used for them.

Although the world's financial markets are massive, the little inefficiencies that can be exploited for profit often aren't.

roel_v · on June 22, 2016

"This is absolutely true. Many, many strategies that are viable at small portfolio sizes fall off very quickly when millions, tens of millions of dollars start to be used for them."

I've thought this for nearly a decade now, yet have never seen or thought of such a thing. Of course I'm just an idiot so the fact that I didn't think of any means nothing; but you'd think that in all that time looking for it, someone somewhere would have described such a thing, even if only to make money on 'how to find small-portfolio investment strategies' ebooks and seminars.

So I'm curious what makes you say 'absolutely true' rather than 'I think so too'.

sseveran · on June 22, 2016

Well, to start with, people running small capacity strategies tend to be just as secretive as those running a large capacity strategy. Strategy capacity is something that professionals talk about and doesn't get as much play when talking to a retail audience.

As for the approach, it's not any different than finding a high capacity strategy. It requires some piece of information or insight that other market participants don't have. Consider a scalping strategy that trades a few different futures contracts. If on average we trade 200 times per day with an expected profit of $5 per trade, we are making $5,000 per week with our strategy. If we say we spend $5,000 per month on the tech to run our business (a risk system, market data, compute time, etc...), we are making $15,000 of profit each month.
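Spelling out the arithmetic in that example: the trade count, profit per trade and monthly cost are from the paragraph above, while five trading days per week and four weeks per month are assumptions chosen because they reproduce the stated weekly and monthly figures.

```python
TRADES_PER_DAY = 200
PROFIT_PER_TRADE = 5.0       # dollars, expected value per trade
MONTHLY_COSTS = 5_000.0      # risk system, market data, compute time, ...

TRADING_DAYS_PER_WEEK = 5    # assumption, implied by the weekly figure
WEEKS_PER_MONTH = 4          # assumption, implied by the monthly figure

daily_gross = TRADES_PER_DAY * PROFIT_PER_TRADE
weekly_gross = daily_gross * TRADING_DAYS_PER_WEEK
monthly_gross = weekly_gross * WEEKS_PER_MONTH
monthly_net = monthly_gross - MONTHLY_COSTS

print(daily_gross, weekly_gross, monthly_gross, monthly_net)
```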

If we are a large hedge fund or prop operation, the $15,000 per month (assuming the same costs) may or may not be worth running. As a trader making, say, $250K base plus some percentage of my P&L, I would definitely need to be running more than that strategy to justify my job. Depending on how much attention it requires, it might not be worth the company running it. If I have two other strategies that each make $100,000 per month for the business, am I better off investing in those strategies or in one that makes a lot less money? The answer could be yes (like maybe I could add a hundred more instruments to trade), but just like any other business the investment will be evaluated versus the expected returns of my other options.

roel_v · on June 23, 2016

I understand the last paragraph, my question is more: what makes strategies not scale. Should you go looking in very niche sectors, where the transaction volume limits the scalability? Or are there particular classes of market insights that by their nature only allow (relatively) small profits? Or are there areas where one can exploit particular technical skills? Or maybe focus on geographically small areas somehow, which by their nature limit the potential profit and thus might keep out big players?

I'm not a trader and the level of my questions probably shows that; still despite my (amateur) research for years, I haven't found evidence of people successfully deploying such strategies. And while particular strategies of big players are secret, there is a lot of information on the general principles; for small setups, nothing (afaik). So that leads me more and more to the conclusion that it's simply not viable.

Looking at it another way: a trader who got his experience in a big fund, and who goes solo (a documented scenario), do they go after such inherently small markets, or do they do the same they'd do at a big fund only with less money or with less sophistication? In other words, if they'd have more money, could they scale up or not?

sseveran · on June 23, 2016

So if you think about any opportunity in the market, you are bounded by how much liquidity is available at a price level that is mispriced. In many cases the size (or capacity) is correlated pretty closely with the expected duration of a trade and the size of the mis-pricing. In my example of a futures contract that is not very liquid, as you move into the market and buy contracts you will naturally move the price toward the level you were expecting it to reach. This limits how much capital can be deployed in a single trade.
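A toy sketch of that capacity limit, under stated assumptions: the order book, the fair value and the function name are all hypothetical, and the model ignores fees, other traders and the book refilling. It just walks the ask side and stops buying once the price reaches what we think the contract is worth.

```python
from typing import List, Optional, Tuple

def capacity_of_mispricing(
    asks: List[Tuple[float, int]], fair_value: float
) -> Tuple[int, float, Optional[float]]:
    """Buy ask levels only while the price is below fair_value.

    Returns (contracts_bought, expected_profit, average_fill_price).
    """
    bought = 0
    cost = 0.0
    for price, size in sorted(asks):   # cheapest level first
        if price >= fair_value:        # no edge left at this level
            break
        bought += size
        cost += price * size
    if bought == 0:
        return 0, 0.0, None
    profit = fair_value * bought - cost
    return bought, profit, cost / bought

# Hypothetical thin book: (ask price, contracts available at that price)
asks = [(100.00, 5), (100.25, 10), (100.50, 20), (100.75, 40), (101.00, 80)]
print(capacity_of_mispricing(asks, fair_value=100.60))
```

In this made-up book most of the size sits above the fair value, so extra capital has nowhere profitable to go, which is the point being made about small-capacity strategies.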

Bloomberg wrote about smaller shops a few months ago: http://www.bloomberg.com/news/articles/2016-03-16/barbarian-...

I currently run a strategy which is consistently profitable. I know others that do as well. What I run currently is work that came out of starting an automated trading shop, so my partner and I have a considerable amount of infrastructure at our disposal that others just starting might not. I currently work elsewhere in finance but may return to it full time when what I am working on now either succeeds or fails. There has been a proliferation of 2 - 5 person shops that are typically pretty secretive about what they do. Several people I know in different ones don't even say that they have a job on LinkedIn.

That being said, there are a lot more off-the-shelf things available now (Quantopian, etc...) than there were 5 years ago when I tried.

To your last point, going to different markets is not just something that individuals can do. There are for instance HFT firms that started trading in places like Brazil given the competitiveness of US markets. Markets are only long run efficient.

dzdt · on June 22, 2016

Dude, in the context of an internet post "this is absolutely true" MEANS "I think so too."

roel_v · on June 22, 2016

Sure, it was just a roundabout way of saying 'please elaborate' while trying to keep the tone non-combative.

taneq · on June 22, 2016

Examples of the word of 10,000 amateurs:

* Anti-vaxxers

* The healing power of crystals

* Moon landing conspiracy nuts

* Multi-level marketing

tomp · on June 22, 2016

Examples of the word of 100 experts:

* LTCM implosion

* The Great Recession

* Libyan and Syrian wars

* Fukushima

darawk · on June 22, 2016

There are also 'experts' in these fields, so your point is moot.

wallace_f · on June 22, 2016

OK, but how many physicists support the moon landing conspiracy? How many aerospace engineers support the moon landing conspiracy? What credentials do the experts of the moon landing conspiracy have that I should trust?

I think he has a valid point if you have bias in which experts you place trust in. There are, in fact, a lot of experts -- and even academically tenured, credentialed, published experts -- who, I agree, don't have much of anything worthwhile to say.

darawk · on June 22, 2016

Ya, true. There are particular areas though where 'wisdom of the crowds' works much better than any expert, and then there are obviously areas where it does not.

I'm not sure exactly what the properties of each type of problem are, but it doesn't seem at all obvious to me that stock picking is not one in which a sort of herd optimization approach might be very effective.

bcherny · on June 22, 2016

The two aren't exclusive. What's to stop the experts from doing this as a side project?

sseveran · on June 22, 2016

They stand to lose a lot for relatively little gain. Hedge fund non-competes are very, very carefully enforced. Making money trading on the side would get you sued.

flashman · on June 22, 2016

The excellent thing is we have fund size as a scoring system.

gtrubetskoy · on June 22, 2016

My point is that in the world of quants, it's more like there are 10,000 paid experts who are not about to share anything and 100 amateurs.

joncooper · on June 22, 2016

"It is intuitively obvious that an open access hedge fund will generate more intelligence than a closed system built on a pre-internet, pre-cryptocurrency, pre-AI organizational design."

Really? Because the folks with the magic black box aren't capable of funding an Interactive Brokers account to keep 100% of their upside and 100% of their IP?

(Also: risk management and order handling are harder problems than signal generation.)

arcanus · on June 22, 2016

Isn't that the magic of OSS? Linux::Windows, matplotlib::mathematica, android::iPhone, etc. In each case, the free variety quickly catches up to the proprietary version, and in doing so, cuts into the profitability of the parent. Furthermore, this often breaks down monopolies, as they must innovate or die.

superuser2 · on June 22, 2016

Matplotlib covers a tiny speck of a footnote of Mathematica's functionality. SageMath (a composition of Numpy, Scipy, Sympy, matplotlib, R, etc.) is a more appropriate analogy.

kefka_p · on June 22, 2016

While some of the examples are more apt, iPhone/iOS actually integrates a good deal of open source software itself, e.g. Darwin, WebKit, LibDispatch, Core Foundation, Swift, etc. Further, the openness of Android has been called into question by many over the years. Android isn't just AOSP any more than iOS is just Darwin. I imagine a degree of the innovation provided by OSS has helped Apple keep profit margins as high as they have.

fpgaminer · on June 22, 2016

I started poking at this out of curiosity, and a desire to begin sharpening my TensorFlow axe, and one thing remains unclear. They give you two spreadsheets, one being the training data and the other is the tournament data (what you need to predict on). Each entry in the spreadsheet is 21 features and a single binary class. The latter is what you predict. But for the submissions they request a probability, not a class. They don't explain what "probability" here means. Does it mean probability of class 0? Probability of class 1? Probability of the moon exploding on a Thursday?

Overall interesting idea. Undecided whether it's real/scam/fake, but definitely very interesting at face value. I just wish their documentation was more clear. Seems kind of important...

EDIT: Found a comment on Reddit that indicates that it means probability of class 1 (https://www.reddit.com/r/MachineLearning/comments/3wdr9e/num...)
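For anyone wondering what "submit a probability of class 1" looks like in practice, here is a minimal sketch in plain Python. Everything about the data is an assumption: it is synthetic, with 21 features to match the description above, and it does not use Numerai's actual files or column names. The model is an ordinary logistic regression trained by stochastic gradient descent, chosen only because its output is directly an estimate of P(class = 1).

```python
import math
import random

N_FEATURES = 21  # matches the feature count described above

def sigmoid(z: float) -> float:
    # Numerically safe logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def make_synthetic(n_rows: int, rng: random.Random):
    """Stand-in data: 21 features in [0, 1); only feature 0 carries signal."""
    rows, labels = [], []
    for _ in range(n_rows):
        x = [rng.random() for _ in range(N_FEATURES)]
        p_one = 0.35 + 0.3 * x[0]
        rows.append(x)
        labels.append(1 if rng.random() < p_one else 0)
    return rows, labels

def fit_logistic(rows, labels, epochs=50, lr=0.1):
    """Plain stochastic gradient descent on the log loss."""
    w = [0.0] * N_FEATURES
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict_proba_class1(w, b, x) -> float:
    """Estimated probability that the label is 1."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

rng = random.Random(0)
train_x, train_y = make_synthetic(500, rng)
w, b = fit_logistic(train_x, train_y)

tournament_x, _ = make_synthetic(5, rng)   # rows to predict on
submission = [predict_proba_class1(w, b, x) for x in tournament_x]
print(submission)
```

Each submitted number is the model's estimate that the row's label is 1, so a value near 0.5 means "no idea" and the probability of class 0 is simply one minus it.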

TrickedOut · on June 22, 2016

Have you found any good TensorFlow examples which handle financial or time series data like this? Please do share! Most of the examples I find are either image processing or text processing. Rarely time series or traditional DB type data.

nl · on June 22, 2016

Generally people use an LSTM for time series if they want to use a NN approach.

See http://robromijnders.github.io/LSTM_tsc/
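Whatever framework is used, recurrent layers such as LSTMs are usually fed input shaped (samples, timesteps, features). The sketch below shows only that preprocessing step on a made-up price series. It is framework-agnostic, and the helper name `make_windows` is hypothetical rather than part of TensorFlow or the linked tutorial.

```python
from typing import List, Sequence, Tuple

def make_windows(
    series: Sequence[float], timesteps: int
) -> Tuple[List[List[List[float]]], List[float]]:
    """Turn a 1-D series into (samples, timesteps, 1) inputs and next-step targets."""
    inputs, targets = [], []
    for start in range(len(series) - timesteps):
        window = series[start:start + timesteps]
        inputs.append([[value] for value in window])  # one feature per step
        targets.append(series[start + timesteps])     # value right after the window
    return inputs, targets

prices = [10.0, 10.5, 10.2, 10.8, 11.0, 10.9]   # made-up series
X, y = make_windows(prices, timesteps=3)
print(len(X), len(X[0]), len(X[0][0]))  # samples, timesteps, features
print(y)
```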

cryptokoala · on June 22, 2016

Numerai comes across as fraudulently abusing cryptographic buzzwords like homomorphic encryption https://medium.com/@Numerai/encrypted-data-for-efficient-mar...

_yvjs · on June 22, 2016

Yes, they still haven't replied to a question about this.

https://www.reddit.com/r/Bitcoin/comments/4p5xgx/ai_hedge_fu...

Based off that article they don't seem to understand the homomorphic in homomorphic encryption.

The mix of technical BS and seemingly expert advisers is weird.

alxmng · on June 22, 2016

Numerai uses order-preserving encryption, which is homomorphic. The algos are trained on the ciphertext itself.
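To illustrate why order preservation would let a model train on transformed values: any strictly increasing map keeps the ranking of the inputs, so a learner that only compares a feature against thresholds (a decision stump here) partitions the rows identically before and after. The map below is a made-up stand-in. It is not real encryption and not Numerai's scheme, whose details are not given in this thread.

```python
import math

def toy_order_preserving_map(x: float) -> float:
    """Strictly increasing stand-in for an order-preserving transform (NOT encryption)."""
    return 1000.0 * math.log1p(x) + 7.0 * x ** 3

def best_stump_split(values, labels):
    """Return the set of row indices sent left by the best single-threshold split."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    best_errors, best_left = None, None
    for cut in range(1, len(order)):
        left, right = order[:cut], order[cut:]
        # Errors when predicting 0 on the left and 1 on the right...
        wrong = sum(labels[i] for i in left) + sum(1 - labels[i] for i in right)
        # ...or the reverse labelling, whichever is better
        errors = min(wrong, len(order) - wrong)
        if best_errors is None or errors < best_errors:
            best_errors, best_left = errors, frozenset(left)
    return best_left

plain = [0.12, 0.80, 0.35, 0.55, 0.05, 0.91]   # hypothetical feature values
labels = [0, 1, 0, 1, 0, 1]
hidden = [toy_order_preserving_map(v) for v in plain]

# The split found on the hidden values selects the same rows as on the plain ones.
print(best_stump_split(plain, labels) == best_stump_split(hidden, labels))
```

Models that depend on distances or linear combinations of features, rather than on ordering alone, would not in general be unaffected by such a transform.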

bberenberg · on June 22, 2016

Understanding which features to create and why is significantly more impactful than just trying new models on the same dataset.

Xcelerate · on June 22, 2016

> Numerai was seed funded by Howard L. Morgan the co-founder of Renaissance Technologies.

Very interesting. This gives this idea some legitimacy in my opinion.

s_q_b · on June 22, 2016

Very much agreed.

For those who are not aware, Renaissance Technologies is a massively successful hedge fund that makes investment decisions solely from data, with perhaps the most sophisticated mathematical models in the marketplace.

Their approach was entirely novel when James Simons founded the firm. Simons is an incredible mathematician, graduated MIT in his teens, and obtained his doctorate at 23. Before and during Renaissance, he made significant contributions to cryptology, topology, and string theory.

His firm essentially invented quantitative trading. To this day, with close to $30 Billion under management, Renaissance Technologies still makes investment decisions purely algorithmically.

vostok · on June 22, 2016

To provide a counterexample, P/NP was doing quantitative trading before RenTech. I will say that I wouldn't comment on implementation details in this industry unless I've worked at the company in question.

s_q_b · on June 22, 2016

That's fair. It was not my intention to provide a comprehensive review of the firm's approach, but rather a quick summary that glosses over a great many details.

To address your second point, I wouldn't comment on implementation details in a company for whom I had worked.

P/NP is Princeton Newport Partners. I have a passing familiarity with that story ;)

vostok · on June 22, 2016

I agree. I wouldn't do that either.

osullivj · on June 22, 2016

I see they've got Packard from Prediction Company as well. Bass, Thomas A., The Predictors, 1999, gives a good account of Packard & Doyne Farmer's work on market prediction.

powera · on June 22, 2016

This is where the "accredited investor" warnings are appropriate. Don't do this with your money if you aren't willing and able to lose it!

In the long run, it's impossible for people to beat the market simply by looking at historic stock prices. Impossible. If it is possible in the short run, more and more people will do it until they don't make any money at all, or a "black swan" event occurs and they go completely bankrupt. (I suppose there's a third option, that they all make so much money that the entire rest of the world goes bankrupt, but that's absurd)

So be careful!

hault · on June 21, 2016

As Peter Thiel says, great startup founders are those which can see the future in ways in which others can't. This idea certainly looks like the future to me. Very interested to see where this goes.

rgbrgb · on June 22, 2016

So cool. This is the first time I've heard of homomorphic encryption. In my case, Open Listings has a lot of real estate data that we're not allowed to vend programmatically (sale prices, list prices, property characteristics). It would be interesting to be able to release this data in a legally encrypted form and let data scientists train predictors. We currently have an offer creation API that's being used by algorithmic investors but they have to get their data to decide what to bid on from another source. My immediate questions maybe someone here will know the answer to...

1) Is it legal to vend a dataset that is encrypted this way if you're not allowed to vend the original? The OP implies that it is, but that seems too good to be true.

2) Is there software purpose-built for this type of thing? What's good in this domain? Our stack is mostly ruby but we're polyglots.

modeless · on June 22, 2016

So in exchange for giving a hedge fund a stock tip that earns 20% in a month, the guy gets $10k? That sounds like a ripoff to me! If you have the skills to do that repeatedly you can make a whole lot more than $10k doing the trades yourself.

onion2k · on June 22, 2016

The point is that you probably can't do it repeatedly, or predict with any confidence that you can even do it once. Numerai enables you to bet using someone else's money, with a vastly reduced reward, but no risk to you.

Numerai won this time (hence the PR piece) but I don't think we should judge their performance on one action in isolation. We should judge whether their approach works based on a year or two of trading on these predictions. Maybe longer, if reacting to unusual events (economic collapse, freak speculation on tulips, etc) is something you care about.

mrkgnao · on June 22, 2016

Plus, Numerai gives them opaque features for ML training that correspond to real-life data in ways that only Numerai knows. So you can't bail and use your model on your own.

agorabinary · on June 22, 2016

Well if you have $50k capital then 20% gains = $10k is about fair. Luckily for numerai they have $1mil capital so the rewards are a bit higher...

oneloop · on June 22, 2016

You know they have $1m, or are you guessing? References please?

davnn · on June 21, 2016

Interesting idea. Not that crowd driven investment algorithms are new, but I have not seen a machine learning one before.

What really annoys me about this kind of business is that they pay tiny prices and shut the competitions once they have found what they were looking for, however Numerai might be completely different in that regard and I wish them the best!

Btw: The article kind of conveys the feeling that machine learning is something new to the hedge fund business, and that's absolutely not the case. Smart people have already been working on really complex algorithms for a couple of years now.

dharmon · on June 21, 2016

Even more than "a couple of years". About 15 years ago I worked at a day trading firm and we were writing models that used machine learning. At the time we thought of it more as "computational statistics", but it's basically what is called ML now and taught in ML courses (although we didn't use Neural Nets).

BTW, even in 2001 we were far from the first to do this.

T-A · on June 22, 2016

Depending on where you draw the line between statistics and machine learning, it could be argued to have originated with Bachelier's thesis in 1900 [1], or Thorp's adaptation of Kelly's work first to gambling and then to finance in the early 60s (he may have been first to use computers for this kind of thing) [2] or maybe with James Simons' Renaissance Technologies in the early 80s [3].

[1] https://en.wikipedia.org/wiki/Louis_Bachelier

[2] https://en.wikipedia.org/wiki/Edward_O._Thorp

[3] https://en.wikipedia.org/wiki/Renaissance_Technologies

hkmurakami · on June 22, 2016

Regarding Rentec, the only decent book on their history has been in "The Quants". Are there any others?

https://www.amazon.com/Quants-Whizzes-Conquered-Street-Destr...

TrickedOut · on June 22, 2016

Almost everything I see on neural nets is related to either image processing or text processing. Rarely time series or traditional DB type data -- is there a classic hello-world example or common demonstration of how neural nets are applied to the tabular data common in financial markets?

richard_craib · on June 21, 2016

Every other machine learning competition I've seen is kind of one-off in nature. But stock market data is being generated all the time so I think there will always be new strategies to learn on Numerai.

shostack · on June 21, 2016

Can you clarify on the "pay tiny prices and shut the competitions once they have found what they were looking for" comment? Has this happened before? Is there anything here that makes it seem like this wouldn't be the case here?

ikeboy · on June 22, 2016

They only require you to give them predictions, not a model. So you can reuse the same model over and over if it works, and they won't be able to cut you out, if that's what you're asking.

shostack · on June 22, 2016

That was part of what I was asking. The other part was around whether there have been instances of firms cutting people out like you described.

thedlade · on June 21, 2016

Exactly, machine learning has already been applied in a number of Wall Street firms for some time now.

ianpurton · on June 22, 2016

> When you’re standing at the beginning of a super exponential curve, that’s the time to buy insurance against any negative outcomes along that curve. So today, we’re allowing users to donate Bitcoin to the Machine Intelligence Research Institute (MIRI) as a hedge against things going horribly right.

If you're the kind of person that falls for this kind of thing, then you should know I'm also standing in front of a super exponential curve raised to the power of infinity and beyond. You can also send me bitcoin as a hedge if you wish.

dharma1 · on June 22, 2016

I've been looking at this a few times. It's like a giant ensemble. But I'm not sure ML will be able to beat chance on average on a data source like this.

And if someone discovers they are making money consistently on numerai, I think they would set up their own fund quite quickly.

I do like the encrypted system though, could be used for other ML competitions where you don't want to give your model away

HappyTypist · on June 22, 2016

I know why they're paying out Bitcoin and keeping everything anonymous. They are hoping quant hedge fund insiders submit their model to the site.

dharma1 · on June 22, 2016

You don't submit a model, just results

abcampbell · on June 21, 2016

But why did the machine want to go long Salmar ASA?

alxmng · on June 21, 2016

The same reason any trader wants to go long Salmar ASA: Their model says so. Except with numerai, the model is a machine-learning algorithm instead of a human with their bias and intuition.

abcampbell · on June 24, 2016

Don't think you understand my question.

brycehidysmith · on June 21, 2016

Does it matter? The machine saw a pattern, and it responded to the pattern. We don't need to know.

datamingle · on June 21, 2016

If the machine is "anonymous", it does matter. Scenario #1: A human gets insider information that Solar City will be bought. He makes his anonymous "AI machine" predict that Solar City is a great stock to buy!

alxmng · on June 21, 2016

That's not possible. The data is encrypted. None of the participants can see which stocks (or anything else) are behind the data they train with. Numerai turns stock prediction into a pure ML problem.

theli0nheart · on June 21, 2016

Doesn't the encrypted chart still need to display price history or volume? If so it seems like it'd be a trivial task to match it up with its real-world counterpart.

richard_craib · on June 21, 2016

It would be easy to match an obfuscated stock market dataset with some third party dataset, and this has happened on many Kaggle competitions (data leaks). That's why the encryption here is important.

Someone · on June 22, 2016

But how do you encrypt a stock's historical performance without removing the information (performs better in summer, went up after 9/11…) hidden inside it?

You can add noise, but I doubt that will be enough.

ikeboy · on June 21, 2016

See https://en.wikipedia.org/wiki/Homomorphic_encryption, which they claim to use.

ves · on June 22, 2016

There's no way in hell they're doing ML on cipher text. It's like orders upon orders of magnitude too slow.

emkman · on June 22, 2016

The model has no idea what stock it is predicting. It is just a random ID that represents a security. Further, there are no encrypted charts and no ability to backtest your model outside of the small amount of data provided by Numerai. Download the CSVs. They are much smaller than I expected.

jldugger · on June 22, 2016

Okay, submit hundreds of bots, and have them generate random picks. One gets lucky, and is selected for highlighting.

richard_craib · on June 22, 2016

Check your coin toss math on 1000, 10000, 100000 predictions.
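The coin toss math can be sketched as follows. The 500 bots and the 52% accuracy bar are hypothetical numbers picked for illustration, and the tail probability uses the normal approximation to the binomial, which is only rough far out in the tail. The question is how likely it is that at least one purely random bot clears the bar as the number of predictions grows.

```python
import math

def tail_prob_random_bot(n_predictions: int, accuracy_threshold: float) -> float:
    """P(a coin-flipping bot scores >= threshold), via the normal approximation."""
    # Accuracy of a random bot has mean 0.5 and std 0.5 / sqrt(n)
    z = 2.0 * (accuracy_threshold - 0.5) * math.sqrt(n_predictions)
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def prob_any_bot_lucky(n_bots: int, n_predictions: int, accuracy_threshold: float) -> float:
    """P(at least one of n_bots independent random bots clears the threshold)."""
    p = tail_prob_random_bot(n_predictions, accuracy_threshold)
    # 1 - (1 - p)^n_bots, computed stably for very small p
    return -math.expm1(n_bots * math.log1p(-p))

for n in (1_000, 10_000, 100_000):
    print(n, prob_any_bot_lucky(500, n, 0.52))
```

With these made-up numbers a lucky bot is close to guaranteed at 1,000 predictions and becomes vanishingly unlikely at 100,000, since the spread of a random bot's accuracy shrinks like one over the square root of the number of predictions.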

discardorama · on June 22, 2016

So when your "AI" makes a recommendation, it recommends some encrypted symbol?

oneloop · on June 22, 2016

Yes, it does matter. Following the machine without understanding what's going on is how you make mistakes.