So it seems the 486 had a trivial not-taken predictor. That is still different from stalling on every conditional branch, and it does require checkpointing and rollback on misprediction (although with a pipeline only 5 stages deep, that is probably not very complex either).
Edit: the Pentium did have a significantly more sophisticated predictor, of course, although not one without flaws.
I can't find any evidence to support that [1].
I suppose it's technically possible to have branch prediction on a scalar processor, but I imagine it would not be hugely beneficial.
[1] https://books.google.com.sg/books?id=QzsEAAAAMBAJ&pg=PA59&lp...
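To put rough numbers on the difference between stalling on every conditional branch and statically predicting not-taken, here is a back-of-the-envelope sketch for a scalar in-order pipeline. Everything in it is an assumption for illustration: the function name `total_cycles`, the ideal one instruction per cycle, the flat per-branch penalty, and the instruction counts. None of these are measured 486 figures.

```python
def total_cycles(n_instr, n_branches, n_taken, penalty, policy):
    """Rough cycle count for a scalar in-order pipeline.

    Assumes one instruction per cycle when nothing goes wrong, and a
    flat `penalty` (in cycles) whenever the front end loses work.
    All parameters are hypothetical, illustrative integers.
    """
    if policy == "stall":
        # Fetch waits for every conditional branch to resolve,
        # so each branch costs the penalty, taken or not.
        bubbles = n_branches * penalty
    elif policy == "not_taken":
        # Fetch continues down the fall-through path; only taken
        # branches discard the wrongly fetched instructions.
        bubbles = n_taken * penalty
    else:
        raise ValueError(f"unknown policy: {policy}")
    return n_instr + bubbles


# Illustrative workload: 1M instructions, 200k conditional branches,
# 120k of them taken, and an assumed 2-cycle penalty.
stall = total_cycles(1_000_000, 200_000, 120_000, 2, "stall")
not_taken = total_cycles(1_000_000, 200_000, 120_000, 2, "not_taken")
print(stall, not_taken)
```

Under these made-up numbers, the not-taken policy only recovers the cycles spent on branches that fall through, so the gain shrinks as the taken fraction grows and as the penalty gets smaller. That is consistent with the intuition that a short scalar pipeline has less to gain from prediction, though the model says nothing about how large the real penalties were.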