Just tried it with a simple 4-character one and it's bad at it: when it outputs anything at all, it only gets 1 or 2 of the 4 characters right.
It's probably better with the "select the traffic lights" kind of captchas, but if I remember correctly, those can already be solved with other image models.