That's a great question. I've been working up a more detailed analysis of implementation shortfall (and thinking about ways to automate it), but your question prompted me to post a quick comparison here with some current results: https://www.quantopian.com/posts/live-results-vs-backtest-re...
Full disclosure, I work for Quantopian and I wrote this strategy.
zygomega, I'm one of the zipline maintainers and we'd love to collaborate with you. There are a few people doing HFT research with zipline, and there's a lot of work to do. At quantopian (my day job), we focus on longer hold periods, so there is room in the zipline ecosystem for you to do HFT.
The main benefit we've found with python as the algo language is that it allows for stat programming with pandas, but also OO or functional programming for the algo logic. This smooths the transition from research to production, just as you're describing with R -> haskell, but you can stay in one language.
I think one of the biggest potential wins with parallelization comes if you can assume all positions are closed overnight, which is most often true for HFT. That way, each trading day is independent of the others, and you can simulate all the trading days in a test range in parallel. This is quite similar to the parallel processing we do to handle the large number of concurrent backtests running at quantopian. We did all of that with python, but I'd be fascinated to see it done with haskell.
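To make the idea concrete, here is a minimal sketch of day-level parallelism, assuming every day starts and ends flat. The names simulate_day and backtest_parallel and the toy buy-at-open, sell-at-close rule are hypothetical illustrations, not zipline's or quantopian's API. A thread pool keeps the example short; for CPU-bound simulations you would likely swap in ProcessPoolExecutor.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate_day(prices):
    """Hypothetical one-day strategy: buy one share at the open, sell at the close.

    The day starts and ends flat, so its PnL depends on no other day.
    """
    return prices[-1] - prices[0]


def backtest_parallel(days, max_workers=4):
    """Simulate each day independently and return per-day PnL keyed by date."""
    dates = sorted(days)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with dates.
        pnls = list(pool.map(simulate_day, (days[d] for d in dates)))
    return dict(zip(dates, pnls))


# Made-up intraday prices for three days (illustrative data only).
days = {
    "2013-01-02": [100.0, 101.5, 101.0],
    "2013-01-03": [101.0, 100.0, 99.5],
    "2013-01-04": [99.5, 100.5, 102.0],
}
results = backtest_parallel(days)
total_pnl = sum(results.values())
```

Because no state crosses the day boundary, the per-day results can be computed in any order and merged afterwards.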
I would see haskell very much as the plumbing side of things. The tools available for handling and reasoning about streams are streets ahead of anything I've seen elsewhere. With zeromq and protocol buffers (that's what we use in our stack) you could very nicely separate the plumbing of the data from its consumption. I'd love to see something like this as well!
How would you handle the position sizing part of the algo if you're testing all days in parallel? Wouldn't the trade size depend on all of the previous days' PNL?
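One possible way to frame this dependency, as a sketch only: if position size is assumed to scale linearly with account equity and market impact is ignored (both are assumptions, not something stated in this thread), each day can be simulated in parallel as a return on a unit of capital, and the sizing effect is recovered afterwards in a cheap sequential pass. The function name compound is hypothetical.

```python
def compound(daily_returns, starting_equity):
    """Chain per-day fractional returns into an equity curve.

    Assumes each day's trade size is proportional to the equity at the
    start of that day, so equity_t = equity_{t-1} * (1 + r_t).
    """
    equity = starting_equity
    curve = []
    for r in daily_returns:
        equity *= 1.0 + r
        curve.append(equity)
    return curve


# Per-day returns as they might come back from independent simulations
# (made-up numbers for illustration).
curve = compound([0.01, -0.02, 0.03], 10_000.0)
```

If sizing depends on prior PNL in a non-proportional way, this shortcut does not apply and the days have to be run in order.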
Hi fawce, I'm a big fan of quantopian but didn't realize that zipline was a separate project. Will have a good scout around the project and see what you guys are up to :)
It would be cool to do that with this signal, if the algo was buying/selling on another signal. Maybe use the short interest signal as a gate on momentum investing for example.
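A rough sketch of what such a gate could look like. Everything here is hypothetical: the function name, the 10% threshold, and the choice to let a stock through only when its short interest is low are assumptions for illustration, and the numbers are made up.

```python
def gated_momentum(momentum, short_interest, max_short_interest=0.10):
    """Return symbols that pass both the momentum signal and the gate.

    momentum: symbol -> trailing return (positive means upward momentum)
    short_interest: symbol -> short interest as a fraction of float
    Symbols with no short interest data are excluded (treated as failing).
    """
    return sorted(
        symbol
        for symbol, trailing_return in momentum.items()
        if trailing_return > 0
        and short_interest.get(symbol, float("inf")) <= max_short_interest
    )


# Illustrative inputs only.
momentum = {"AAA": 0.12, "BBB": 0.08, "CCC": -0.03, "DDD": 0.05}
short_interest = {"AAA": 0.04, "BBB": 0.25, "CCC": 0.02}
buys = gated_momentum(momentum, short_interest)
```

Here only AAA passes: BBB is blocked by high short interest, CCC has negative momentum, and DDD has no short interest data.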
You're speaking truth. We (quantopian) deal with all of those headaches, and test the algos with fully adjusted data. Splits, symbol changes, mergers, divestitures, dead companies, dividends - they're all covered.
If you click on the code and search for commissions, you'll see how those costs are taken into account. The big missing thing is the market for borrowing the stock to do the short side of the trade.
No, no money has traded on my version. But I understand that asset management firms have licensed the more sophisticated one Jess wrote at TR, so I would think they use it with real money. From what I understand, firms look at numerous signals like this, and then make investments based on a combination of the signals.
The stock market is primarily driven by the news, both on the ultra-short term (minute scale) and at the scale of a few days. So, scraping the web and performing semantic analysis of investor sentiment can be a good way to get an edge in the game.
Though in order to be really effective you'd need to do it before Wall Street traders' reactions adjust the prices, i.e. get access to Bloomberg's B-Pipe [1], do real-time semantic analysis, and place orders with ultra-low latency. Which quite a few trading firms are already doing...
Other factors include P/E ratio (only accurate in the longer term), div yield, recent growth...
Also, the chart in the page is from actual trading on a roughly $25k account, and updates each day.