The statement about just throwing 200k tokens to get best answer for smaller datasets goes against my experience. I commonly find as my prompt gets larger, the less consistent the output becomes, and the poorer following instructions becomes. Does anyone else experience this or a well known way to avoid this? It seems to happen at much less than even 25k tokens.