Super easy to get started, but lacking for larger datasets where you want to understand a bit more about predictions. You generally lose things like prediction probability (though you can recover this if you chop the head off and assign output logits to classes instead of tokens), repeatability across experiments, and the ability to tune the model by changing the data. You can still do fine-tuning, though it'll be more expensive and painful than with a BERT model.
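For what that logits trick looks like in practice, here's a minimal sketch assuming a Hugging Face causal LM and single-token class labels (the model name and labels are placeholders, not recommendations):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; any causal LM works the same way
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

labels = ["positive", "negative"]
# Take the first token of each label; multi-token labels need more care.
label_ids = [tokenizer.encode(" " + label)[0] for label in labels]

prompt = "Review: great product, works as advertised.\nSentiment:"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # next-token logits

# Softmax over just the class tokens recovers a per-class probability.
probs = torch.softmax(logits[label_ids], dim=-1)
print(dict(zip(labels, probs.tolist())))
```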
Still, you can go from 0 to ~mostly~ clean data in a few prompts and iterations, vs. potentially a few hours with a fine-tuning pipeline for BERT. The two can actually work well in tandem: use the LLM to bootstrap some training data, then use both together to refine your classification.
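A rough sketch of that tandem workflow, assuming you already have `llm_labeled` (text, label) pairs produced by an LLM (everything here is illustrative, names and data are hypothetical):

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification,
                          AutoTokenizer, Trainer, TrainingArguments)

llm_labeled = [("great product", 1), ("total junk", 0)]  # placeholder
ds = Dataset.from_dict({
    "text": [t for t, _ in llm_labeled],
    "label": [y for _, y in llm_labeled],
})

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
ds = ds.map(lambda b: tok(b["text"], truncation=True, padding="max_length"),
            batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="clf", num_train_epochs=3),
    train_dataset=ds,
)
trainer.train()
# Low-confidence BERT predictions can then go back to the LLM for relabeling.
```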