This is cool - there’s some similar work here https://arxiv.org/pdf/2402.01571 which uses spiking neural networks (essentially Dirac pulses). I think the next step for this would be to learn a tonal embedding of the source alongside the event embedding, so that you don’t have to rely on physically modelled priors. There’s some interesting work on guitar amp tone modelling that’s already doing this: https://zenodo.org/records/14877373
How funny, I actually corresponded with one of the authors of the "Spiking Music..." paper when it first showed up on arXiv. I'll definitely give the amp-modeling paper a read, looks to be right up my alley!
Now that I understand the basics of how this works, I'd like to use a (much) more efficient version of the simulation as an infinite-dataset generator and try to learn a neural operator, or NeRF-like model, that, given a spring-mesh configuration, a sparse control signal, and a time, can produce an approximation of the simulation in a parallel and sample-rate-independent manner. This also (maybe) opens the door to spatial audio, such that you could approximate sound-pressure levels at a particular point in time _and_ space. At this point, I'm just dreaming out loud a bit.
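To make that a little more concrete, here's a rough sketch of the shape I have in mind, in PyTorch. Everything here is hypothetical (class names, dimensions, the Fourier time encoding), not an actual implementation; spatial audio would just mean concatenating a listener position to the same inputs.

```python
# Hypothetical sketch (not the project's actual code): a NeRF-style field over time,
# conditioned on a spring-mesh configuration embedding and a sparse-control embedding.
import math

import torch
import torch.nn as nn


class FourierFeatures(nn.Module):
    """Encode a scalar query time as sinusoidal features so a small MLP can
    represent high-frequency (audio-rate) content."""

    def __init__(self, n_frequencies: int = 64, max_freq_hz: float = 10_000.0):
        super().__init__()
        freqs = torch.logspace(0.0, math.log10(max_freq_hz), n_frequencies)
        self.register_buffer("freqs", freqs)

    def forward(self, t: torch.Tensor) -> torch.Tensor:
        # t: (batch, 1) time in seconds -> (batch, 2 * n_frequencies)
        angles = 2.0 * torch.pi * t * self.freqs
        return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)


class SimulationField(nn.Module):
    """Map (mesh embedding, control embedding, time) -> approximate simulation output.
    Because time is an explicit input, queries are parallel and sample-rate independent."""

    def __init__(self, mesh_dim: int = 128, control_dim: int = 128,
                 n_frequencies: int = 64, hidden: int = 256):
        super().__init__()
        self.time_enc = FourierFeatures(n_frequencies)
        in_dim = mesh_dim + control_dim + 2 * n_frequencies
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.GELU(),
            nn.Linear(hidden, hidden), nn.GELU(),
            nn.Linear(hidden, 1),  # predicted pressure/displacement at time t
        )

    def forward(self, mesh_emb: torch.Tensor, control_emb: torch.Tensor,
                t: torch.Tensor) -> torch.Tensor:
        feats = torch.cat([mesh_emb, control_emb, self.time_enc(t)], dim=-1)
        return self.mlp(feats)


# Query 4096 arbitrary times in one batch -- the parallel, sample-rate-independent part.
model = SimulationField()
mesh_emb = torch.randn(1, 128).expand(4096, -1)     # one mesh configuration, repeated per query
control_emb = torch.randn(1, 128).expand(4096, -1)  # embedding of the sparse control signal
t = torch.rand(4096, 1) * 2.0                       # random query times in [0, 2) seconds
pressure = model(mesh_emb, control_emb, t)          # (4096, 1)
```

The training target would just be the simulation's output at the sampled times, so the "infinite dataset" is generated on the fly rather than stored.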
This is possible but very very hard! Actually getting the model to converge on something that sounds reasonable will make you pull your hair out. It’s definitely a fun and worthwhile project though. I attempted something similar a few years ago. Good luck!