You are, of course, correct. It's not exactly a bulletproof heuristic. At best, you'd probably only be able to identify likely filler episodes, as opposed to filler scenes.
A truly sophisticated approach capable of identifying filler scenes would probably involve machine learning using data that's not (to my knowledge) actually available to the public, like engagement/watchtime statistics.