That explains your results. 3B and 8B models are tiny - it's remarkable when they produce code that's even vaguely usable, but it's a stretch to expect them to usefully perform an operation as complex as "extract the dataclasses representing events".
You might start to get useful results if you bump up to the 20B range - Mistral 3/3.1/3.2 Small or one of the ~20B range Gemma 3 models. Even those are way off the capabilities of the hosted frontier models though.
My experience is that the hosted frontier models (o3, Gemini 2.5, Claude 4) would handle those problems with ease.
Local models that fit on a laptop are a lot less capable, sadly.