• Click prediction: How do you accurately predict whether a user will click on an ad in less than a millisecond? Thankfully, you have billions of data points to help you. (sketch below)
• Recommender systems: A standard SVD works well. But what happens when you have to choose the top products among hundreds of thousands for every user, 2 billion times per day, in less than 50ms? (sketch below)
• Auction theory: In a second-price auction, the theoretically optimal strategy is to bid your expected value. But what happens when you run 15 billion auctions per day against the same competitors? (sketch below)
• Explore/exploit: It's easy: UCB and Thompson sampling have low regret. But what happens when new products come and go, and when each ad displayed changes the reward of every arm? (sketch below)
• Offline testing: You can always compute the classification error of a model predicting the probability of a click. But is that really related to the online performance of a new model? (sketch below)
• Optimization: Stochastic gradient descent is great when you have lots of data. But what do you do when not all data are equal and you must distribute the learning over several hundred nodes? (sketch below)
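
Click prediction sketch: logistic regression over hashed categorical features, trained online with SGD. This is one textbook approach to CTR prediction, not necessarily the team's production system; the feature fields, the toy stream, and the hash size are invented for illustration.

    import math

    D = 2 ** 20  # size of the hashed feature space (assumed)

    def features(event):
        # Hashing trick: map each "field=value" string into one of D slots.
        # Python's built-in hash is per-process; a real system would use a
        # stable hash so models survive restarts.
        return [hash("%s=%s" % (k, v)) % D for k, v in sorted(event.items())]

    def predict(w, idxs):
        z = sum(w[i] for i in idxs)
        z = max(min(z, 35.0), -35.0)   # clip so exp() stays finite
        return 1.0 / (1.0 + math.exp(-z))

    def update(w, idxs, y, lr=0.05):
        g = predict(w, idxs) - y       # log-loss gradient per active weight
        for i in idxs:
            w[i] -= lr * g

    w = [0.0] * D
    stream = [({"site": "news", "ad": "shoes"}, 1),
              ({"site": "blog", "ad": "shoes"}, 0)]
    for event, clicked in stream:
        update(w, features(event), clicked)
    print(predict(w, features({"site": "news", "ad": "shoes"})))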
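Recommender sketch: a rank-k SVD of a toy user-product matrix, then brute-force scoring. The matrix, its size, and k are invented; at 2 billion requests per day, the O(#products) scoring line is exactly what blows the 50ms budget.

    import numpy as np

    rng = np.random.default_rng(0)
    ratings = rng.random((100, 1000))   # 100 users x 1000 products (toy)
    U, s, Vt = np.linalg.svd(ratings, full_matrices=False)
    k = 10                              # rank of the approximation
    user_vecs = U[:, :k] * s[:k]        # user embeddings
    item_vecs = Vt[:k].T                # product embeddings

    def top_products(user, n=5):
        scores = item_vecs @ user_vecs[user]   # O(#products) per request
        return np.argsort(scores)[::-1][:n]

    print(top_products(user=42))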
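Auction sketch: a Monte Carlo check that, in a one-shot second-price auction against a fixed competitor distribution, bidding your true expected value maximizes utility (the classic Vickrey result). The value and the competitor distribution are invented; repeated auctions against the same competitors are what break this clean analysis.

    import random

    random.seed(0)
    V = 1.0  # our expected value for the impression (assumed)

    def expected_utility(bid, trials=100_000):
        total = 0.0
        for _ in range(trials):
            rival = random.uniform(0.0, 2.0)   # highest competing bid
            if bid > rival:
                total += V - rival             # win, pay the second price
        return total / trials

    for b in (0.5, 0.8, 1.0, 1.2, 1.5):
        print("bid %.1f -> utility %.4f" % (b, expected_utility(b)))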
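Explore/exploit sketch: Thompson sampling over Bernoulli arms with Beta posteriors, in the stationary textbook setting. The click rates are invented; the bullet's point is that real arms appear, disappear, and influence each other's rewards.

    import random

    random.seed(1)
    true_ctr = [0.02, 0.05, 0.03]   # hidden click rate of each ad (toy)
    alpha = [1.0, 1.0, 1.0]         # Beta posteriors: 1 + clicks ...
    beta = [1.0, 1.0, 1.0]          # ... and 1 + non-clicks

    for _ in range(10_000):
        # Sample a plausible CTR from each posterior, show the best sample.
        samples = [random.betavariate(a, b) for a, b in zip(alpha, beta)]
        arm = samples.index(max(samples))
        reward = 1 if random.random() < true_ctr[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward

    # Posterior means should concentrate on the best arm's true CTR.
    print([round(a / (a + b), 3) for a, b in zip(alpha, beta)])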
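Offline testing sketch: a common offline metric for a click-probability model is the average log loss on held-out outcomes (raw classification error says little when the positive rate is around 1%: predicting "no click" everywhere is already ~99% accurate). Predictions and outcomes are invented; whether a better number here means better online performance is exactly the bullet's question.

    import math

    predictions = [0.02, 0.10, 0.01, 0.30]   # predicted P(click), toy values
    outcomes =    [0,    1,    0,    0]      # observed clicks

    def log_loss(p, y, eps=1e-12):
        p = min(max(p, eps), 1.0 - eps)      # avoid log(0)
        return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

    avg = sum(log_loss(p, y) for p, y in zip(predictions, outcomes)) / len(outcomes)
    print(avg)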
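Optimization sketch: one synchronous data-parallel SGD loop, simulated in a single process. Each "node" computes a gradient on its own shard and the driver averages them, weighting by shard size because the shards are deliberately unequal. Shard sizes, data, and learning rate are all invented.

    import numpy as np

    rng = np.random.default_rng(2)
    true_w = np.array([1.0, -2.0, 0.5])   # target of the toy regression

    shards = []                           # four "nodes" of very unequal size
    for n in (1000, 10, 10, 10):
        X = rng.standard_normal((n, 3))
        y = X @ true_w + 0.1 * rng.standard_normal(n)
        shards.append((X, y))

    w = np.zeros(3)
    for _ in range(200):
        grads, sizes = [], []
        for X, y in shards:               # in reality: one gradient per node
            grads.append(2.0 * X.T @ (X @ w - y) / len(y))
            sizes.append(len(y))
        # Weight each node's gradient by its data size, not uniformly.
        w -= 0.05 * np.average(grads, axis=0, weights=sizes)

    print(np.round(w, 2))                 # should land close to true_w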
More details on the team's missions: http://labs.criteo.com/wp-content/uploads/2015/04/Software-E...
Feel free to drop me a line at n.rassam[at]criteo.com