Wonder what the compute cost for this ends up being. My guess is they'll have to rate limit it somehow - I'm sure the rollout speed will also depend on how much people end up using it. Maybe they'll figure out some ad revenue - but I don't think they can charge users for this.
Would love to know how much compute $ they’re willing to throw at this to attempt to topple Google. Probably whatever it takes (within reason), MS missed out on mobile and search - can’t afford that to happen again.
Over time the inference for these queries will get cheaper (FP8 and later 4-bit precision, sparse/pruned weights, etc.)
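To put rough numbers on that: weight memory (and with it, a big chunk of inference cost) scales linearly with bytes per parameter, so each precision drop roughly halves the footprint. A back-of-envelope sketch, using an illustrative 100B-parameter model (the model size and formats here are assumptions, not figures from the thread):

```python
# Illustrative weight-memory arithmetic for a hypothetical
# 100B-parameter model at different precisions.
PARAMS = 100e9

bytes_per_param = {"FP16": 2.0, "FP8": 1.0, "INT4": 0.5}

for fmt, nbytes in bytes_per_param.items():
    gb = PARAMS * nbytes / 1e9
    print(f"{fmt}: {gb:.0f} GB of weights")
```

So FP16 → FP8 → 4-bit cuts weight memory 200 GB → 100 GB → 50 GB in this sketch; real savings depend on activations, KV cache, and quantization overhead.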
How many queries are only ever searched 2 or 3 times?
I'd question whether it's worth storing an AI output for 3 years just so you don't need to regenerate it when someone searches for it in the future.
For 1 - you're probably going to have a meaningfully better model every year...
Yes, there’s a graph of cache duration to hit ratio.
With no data or particular domain expertise, I’d bet a 24 hour cache would hit at least 30% of the time, and a 7 day cache 60% of the time.
The problem is that a lot of the misses will be for recent events. How do you know whether “how many 737 maxes have crashed” should be served with a cache that is 3 months old?
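One naive mitigation is to give time-sensitive queries a much shorter TTL than evergreen ones. A minimal sketch, assuming a crude keyword heuristic (`classify_ttl` and the keyword list are hypothetical, not anything Bing or Google actually does) and the 24-hour / 7-day durations floated above:

```python
import time

# Sketch of an answer cache with freshness-dependent TTLs.
# Queries that look time-sensitive get a 24h TTL; everything
# else gets 7 days. The keyword heuristic is purely illustrative.

RECENT_KEYWORDS = ("latest", "news", "today", "how many", "crashed")

def classify_ttl(query: str) -> float:
    q = query.lower()
    if any(k in q for k in RECENT_KEYWORDS):
        return 24 * 3600          # 24-hour cache for time-sensitive queries
    return 7 * 24 * 3600          # 7-day cache otherwise

class AnswerCache:
    def __init__(self):
        self._store = {}          # query -> (answer, expiry timestamp)

    def get(self, query: str):
        entry = self._store.get(query)
        if entry and entry[1] > time.time():
            return entry[0]       # fresh hit
        return None               # miss or stale entry

    def put(self, query: str, answer: str) -> None:
        self._store[query] = (answer, time.time() + classify_ttl(query))
```

Under this scheme "how many 737 maxes have crashed" would never be served from a 3-month-old entry, at the cost of regenerating time-sensitive answers daily; the hard part in practice is classifying time-sensitivity reliably.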