I couldn't find a paper that addresses this specific problem. The ones I've found in the past fail on outliers, such as a bike moving through stopped traffic or the scenario above.
The best approach I can think of is to expand the time window used for classification, so that each vehicle is classified from the data recorded before the outlier situation occurs.
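To make the idea of a longer classification window concrete, here is a minimal sketch. It assumes some per-sample classifier already produces a label such as "car" or "bike" for each reading. The function name `smooth_labels`, the labels, and the window size are all made up for illustration. It simply takes a majority vote over the most recent samples, so a short burst of odd readings doesn't flip the classification:

```python
from collections import Counter, deque


def smooth_labels(labels, window=5):
    """Return, for each sample, the majority label among the last `window` samples.

    `labels` is a hypothetical stream of per-sample guesses from some
    upstream classifier; nothing here is specific to a real system.
    """
    recent = deque(maxlen=window)  # keeps only the most recent `window` labels
    smoothed = []
    for label in labels:
        recent.append(label)
        # most_common(1) gives the label with the highest count in the window
        smoothed.append(Counter(recent).most_common(1)[0][0])
    return smoothed


# A device that looked like a car for a while, then briefly looks like a bike
raw = ["car"] * 6 + ["bike"] * 2 + ["car"] * 4
stable = smooth_labels(raw, window=5)
```

With a window of 5, the two stray "bike" samples are outvoted by the surrounding "car" samples. A wider window tolerates longer outlier stretches, at the cost of reacting more slowly when the mode of transport really does change.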
With all the sensors in a phone, telling vehicle types apart is not that hard. There's a lot of research in that field...