It's not actually all that surprising for people who use styled subtitles with antialiased edges. In extreme cases (lots of glyphs on the screen) you can end up with a noticeable framerate reduction.
It shouldn't hurt your framerate much, considering that the font rendering only needs to happen once every few seconds. A new subtitle can be rendered once, kept in memory as a texture, and then just blended in by the GPU as pixels each frame. The subtitles are also known ahead of time, so it's possible to set up a pipeline with no sudden spikes in processing load.
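For what it's worth, here's a minimal sketch of that idea in C++: rasterize a cue once when it changes, keep the result around as a cached "texture", and reuse it every frame. This is purely illustrative, not Netflix's or Oculus's actual code; the glyph rasterization is stubbed out where a real renderer would use something like FreeType or Skia, and all names are made up.

```cpp
// Sketch: rasterize each subtitle cue once, reuse the cached result per frame.
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

struct SubtitleTexture {
    std::string text;           // the cue this texture was built from
    std::vector<uint8_t> rgba;  // antialiased RGBA pixels
    int width = 0, height = 0;
};

// Expensive step: glyph layout + antialiased rasterization.
// Stubbed here with fake metrics and a solid fill.
SubtitleTexture rasterizeSubtitle(const std::string& text) {
    SubtitleTexture tex;
    tex.text = text;
    tex.width = static_cast<int>(text.size()) * 16;
    tex.height = 32;
    tex.rgba.assign(static_cast<size_t>(tex.width) * tex.height * 4, 255);
    std::cout << "rasterized: \"" << text << "\"\n";
    return tex;
}

class SubtitleCache {
public:
    // Called every frame with the cue that should be on screen.
    // Rasterization only happens when the cue actually changes;
    // every other frame just reuses the cached texture (cheap GPU blend).
    const SubtitleTexture& get(const std::string& currentCue) {
        if (currentCue != cached_.text) {
            cached_ = rasterizeSubtitle(currentCue);
        }
        return cached_;
    }
private:
    SubtitleTexture cached_;
};

int main() {
    SubtitleCache cache;
    // Ten "frames" showing two cues: only two rasterizations happen,
    // the rest are cache hits.
    for (int frame = 0; frame < 5; ++frame) cache.get("First subtitle line.");
    for (int frame = 0; frame < 5; ++frame) cache.get("Second subtitle line.");
}
```

Since the cue timings are known from the subtitle track, the rasterization could even be kicked off slightly before the cue appears, which is what keeps the per-frame load flat.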
For context, the Netflix GUI and the video are composited together before the 3D rendering happens. To me, that's a surprising place to find subtitles. Why put them on the virtual screen? Why not let them float in space in front of your eyes? They should be where the sound is, not where the picture is.
I agree conceptually, but our eyes can't focus on two things at once when they're at different distances (or appear to be). Putting the subtitles at the same place in 3D as the 2D screen is less tiring on the eyes.
That assumes we only have a static screen to place them on. In VR, we have the entire environment around the screen, plus the potential for a HUD locked to the user's eyes in space. It would be cool to try different implementations and see which experience is best.
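To make the two options concrete, here's a small sketch of the placement math being discussed: one quad anchored at the virtual screen's depth (so subtitles and picture share a focal distance), and one locked to the user's head like a HUD. The vector math is generic and purely illustrative; the names, offsets, and distances are all assumptions, not anything from an actual Netflix VR implementation.

```cpp
// Sketch: screen-locked vs. head-locked subtitle quad placement.
#include <iostream>

struct Vec3 { float x, y, z; };

Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }

// Screen-locked: place the quad just in front of the virtual screen,
// slightly below center, so it sits at the same apparent depth as the video.
Vec3 screenLockedSubtitlePos(Vec3 screenCenter, Vec3 screenNormal,
                             float verticalOffset) {
    return screenCenter + screenNormal * 0.01f + Vec3{0, -verticalOffset, 0};
}

// Head-locked (HUD): place the quad a fixed distance along the user's gaze
// every frame, so it follows head movement regardless of where the screen is.
Vec3 headLockedSubtitlePos(Vec3 headPos, Vec3 headForward,
                           float distance, float verticalOffset) {
    return headPos + headForward * distance + Vec3{0, -verticalOffset, 0};
}

int main() {
    Vec3 screenCenter{0, 1.6f, -3.0f};  // virtual screen 3 m ahead (made up)
    Vec3 screenNormal{0, 0, 1};         // facing the viewer
    Vec3 headPos{0, 1.6f, 0};
    Vec3 headForward{0, 0, -1};

    Vec3 a = screenLockedSubtitlePos(screenCenter, screenNormal, 0.4f);
    Vec3 b = headLockedSubtitlePos(headPos, headForward, 1.5f, 0.2f);
    std::cout << "screen-locked: " << a.x << ", " << a.y << ", " << a.z << "\n";
    std::cout << "head-locked:   " << b.x << ", " << b.y << ", " << b.z << "\n";
}
```

The vergence point raised above is why the head-locked version is usually placed at or near the screen's distance anyway: if the HUD quad sits much closer than the screen, reading it forces a refocus every time your eyes move between subtitle and picture.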
"noticeable" is probably the surprising part. I'm sure it also uses more power on a normal Android phone too, but not to the degree that it does in VR.
These are the Carmack gems I was looking for.