Surely they already do this? I would think that implementing high-quality Bayesian-type spam filtering involves much the same thing.
They parse headers all the time. All mail providers do, in order to implement procmail-type functions. They parse the body too, because they support complex boolean logic over string search.
I get it: this feels like a meta step "upward", but I am not sure it really is. Like you, I am of course concerned as well. It's my mail too.
I think there is a difference between comparing strings and actually processing the text semantically. Spam/phishing detection already feels invasive to me (although there is no way I would leave that to most users), and this is just one step further.
Meaning they will actively be parsing emails and indexing them.