OpenCV and other onboard software can be trained to recognize shapes. More than 10 years ago there was a demo of a NodeCopter-controlled small drone following red flags.
Plugging in the GPS coordinates, flying there, and, once inside a geofence, looking for a shape to crash into doesn't seem impossible given what was possible 10 years ago.
Hell, 30 years ago I was working for the MOD in the UK (they sponsored my PhD and turned it into an RA), creating context-aware neural network inference engines for FLIR (forward-looking infrared) data. We had all sorts of "fun" stuff running on a Meiko Computing Surface: parallelised network training and implementations, temporal and spatial averaging, and relaxation labelling, all thrown into the mix to aid the recognition engine. Recognition was done with a voting system of various architectures sharing a "blackboard" that information could be posted to and read from. Visualisation was all on high-end (for the time) Silicon Graphics workstations.
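For anyone who hasn't met the blackboard pattern, here is a minimal sketch of the general idea: several independent recognisers post hypotheses to a shared store, and a confidence-weighted vote picks a label. This is not the MOD system described above. The class and function names, the labels, and the confidence values are all made up for illustration.

```python
from collections import defaultdict


class Blackboard:
    """Shared store that recognisers post hypotheses to and read from (hypothetical)."""

    def __init__(self):
        # region id -> list of (source, label, confidence)
        self._posts = defaultdict(list)

    def post(self, region, source, label, confidence):
        self._posts[region].append((source, label, confidence))

    def read(self, region):
        return list(self._posts.get(region, []))


def vote(blackboard, region):
    """Confidence-weighted vote over every hypothesis posted for one region."""
    totals = defaultdict(float)
    for _source, label, confidence in blackboard.read(region):
        totals[label] += confidence
    if not totals:
        return None  # nothing has been posted for this region
    return max(totals, key=totals.get)


board = Blackboard()
board.post("r1", "net_a", "vehicle", 0.6)
board.post("r1", "net_b", "rock", 0.7)
board.post("r1", "net_c", "vehicle", 0.5)
# A context source is just another poster, e.g. "this region lies on a road".
board.post("r1", "context", "vehicle", 0.4)
winner = vote(board, "r1")
```

Here `net_b` is the single most confident source, but the combined weight of the other posts, including the context one, outvotes it.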
The context (together with the features extracted) was the killer (forgive the pun) feature though - everything else reduced noise, but context increased signal.
My gast remains flabbered that the sort of thing I was working on back then hasn't become commonplace in the interim. The computing power available today, compared to then, and the accuracy we had (I know for a fact at least one of the designs was made into real hardware, it was called RH7, and "RH" stood for "Red Herring" - oh how we laughed) ... It beggars belief that it was just left to digitally rot.
There is often quite heavy GPS jamming or spoofing. Also, in some of the published videos I think you can see a "no GPS lock" status message, but maybe they just did not bother with GPS if all the drones were manually piloted anyway.
Yes, I assumed they didn't need GPS because they knew exactly where the trucks that were the launch sites were to be placed and they knew approximately their targets would be sitting on a certain section of airport tarmac. The pilots would have had a detailed satellite photo map of their entire route until visual target ID was possible. While GPS was probably partially jammed, that deep in Russia I doubt it would have been as severe as near the front lines. Plus there wouldn't have been heavy jamming of the local drone control frequencies because they weren't expecting a drone attack there.
To me the more interesting question is how they managed sending the real-time video feeds and control data. Since the trucks were mobile, I assume it had to be via a bunch of mobile phones signed up to Russian service providers since Starlink doesn't work inside Russia. To reduce latency, I wonder if the phones were connecting to a covert site in Russia which had a high-bandwidth wired link, maybe a front company established for the operation with servers and broadband internet connections.
GPS is heavily jammed throughout Russia, and the ArduPilot overlays shown on the videos released directly show there was no GPS lock (might not have been equipped as I'm sure they'd be expecting this).
Given these are static targets, it might be possible to relay precise GPS information from that morning’s satellite data. No real-time intelligence required. Just dead reckoning to the target coordinates.
Operation in GNSS-denied areas is already a stock feature on many relatively inexpensive commercial drones. The manufacturers talk about it euphemistically for obvious reasons, but they're designing drones specifically for the Ukraine war. There's a huge amount of engineering effort going into building drones that can remain operational in an extremely hostile RF environment.
Compensating for wind drift is a fairly straightforward software problem when you've got a fast processor, a bunch of high-resolution cameras and a laser rangefinder.
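The geometry behind this is the textbook wind triangle from ordinary flight planning. A rough sketch, assuming the wind speed and direction have already been estimated somehow (the comment above doesn't say how, and estimating them from the sensors is the harder part). The function name and parameters are illustrative and not taken from any autopilot.

```python
import math


def wind_correction(course_deg, airspeed, wind_from_deg, wind_speed):
    """Return (heading_deg, ground_speed) needed to hold a desired ground track.

    course_deg    -- desired track over the ground, degrees clockwise from north
    wind_from_deg -- direction the wind blows FROM, same convention
    Speeds can be in any unit as long as both use the same one.
    """
    rel = math.radians(wind_from_deg - course_deg)
    ratio = wind_speed / airspeed * math.sin(rel)
    if abs(ratio) > 1.0:
        raise ValueError("crosswind too strong to hold this course")
    wca = math.asin(ratio)  # wind correction angle, radians
    heading = (course_deg + math.degrees(wca)) % 360.0
    ground_speed = airspeed * math.cos(wca) - wind_speed * math.cos(rel)
    return heading, ground_speed


# Track due north at 20 m/s airspeed with a 5 m/s wind from the east:
heading, ground_speed = wind_correction(0.0, 20.0, 90.0, 5.0)
```

With a pure crosswind, the vehicle has to point into the wind by `asin(5/20)` and loses a little ground speed in the process.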
If you have a downward-facing camera you can track your movement like an optical mouse by just watching the terrain. Error will creep in, but you only need to fly a few kilometers until you find something that looks like a strategic bomber.
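To put a rough number on "error will creep in", here is a toy sketch of optical-mouse-style dead reckoning. It assumes a simple pinhole camera pointing straight down, flat terrain, a known constant altitude, and per-frame pixel shifts that have already been measured and sign-corrected. Every value (altitude, focal length, bias) is made up for illustration.

```python
def ground_shift(pixel_shift, altitude_m, focal_px):
    """Pinhole model: metres moved over the ground for a given image shift in pixels."""
    return pixel_shift * altitude_m / focal_px


def integrate(shifts, altitude_m, focal_px):
    """Sum per-frame (dx, dy) pixel shifts into a position estimate in metres."""
    x = y = 0.0
    for dx, dy in shifts:
        x += ground_shift(dx, altitude_m, focal_px)
        y += ground_shift(dy, altitude_m, focal_px)
    return x, y


ALTITUDE_M = 50.0   # assumed constant height above flat ground
FOCAL_PX = 1000.0   # assumed focal length expressed in pixels

true_shifts = [(10.0, 0.0)] * 1000                         # 0.5 m per frame
biased_shifts = [(dx + 0.1, dy) for dx, dy in true_shifts]  # 1% measurement bias

true_pos = integrate(true_shifts, ALTITUDE_M, FOCAL_PX)
est_pos = integrate(biased_shifts, ALTITUDE_M, FOCAL_PX)
drift_m = est_pos[0] - true_pos[0]
```

A constant 0.1-pixel bias per frame turns into about 5 m of drift over 500 m travelled, which is small at the scale of a few kilometers but grows without bound because nothing corrects it.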