The trick is not to use NNs for the DSP itself but to discover the parameters for the DSP. In other words, you hardcode the signal-flow architecture using a common technique like an FDN (feedback delay network), then train a NN to find "good"-sounding parameters, for example by comparing the output against a convolution reverb or recordings.
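To make that concrete, here is a rough sketch of the idea, not anyone's actual implementation. The FDN structure (four delay lines, a Hadamard mixing matrix) is fixed, and only the per-line feedback gains are fitted against a target impulse response. Everything named here (`DELAYS`, `fdn_impulse_response`, `loss`, `fit`) is made up for illustration. Two big simplifications: a plain random search stands in for NN training, and the loss is a sample-wise squared error, where a real system would likely use something closer to how the reverb is perceived.

```python
import random

# Fixed architecture: 4 delay lines mixed by a normalised Hadamard matrix.
HADAMARD = [[0.5 * s for s in row] for row in (
    (1, 1, 1, 1),
    (1, -1, 1, -1),
    (1, 1, -1, -1),
    (1, -1, -1, 1),
)]
DELAYS = (149, 211, 263, 293)  # delay lengths in samples (illustrative values)


def fdn_impulse_response(gains, n_samples=1000):
    """Impulse response of the fixed FDN, given one feedback gain per line."""
    buffers = [[0.0] * d for d in DELAYS]  # circular buffers hold the state
    pos = [0] * len(DELAYS)
    out = []
    for n in range(n_samples):
        x = 1.0 if n == 0 else 0.0  # unit impulse input
        # Read the oldest sample of each delay line and apply its gain.
        taps = [g * buf[p] for g, buf, p in zip(gains, buffers, pos)]
        out.append(sum(taps))
        # Mix the taps through the matrix, add the input, write back.
        for i, row in enumerate(HADAMARD):
            feedback = sum(m * t for m, t in zip(row, taps))
            buffers[i][pos[i]] = x + feedback
            pos[i] = (pos[i] + 1) % DELAYS[i]
    return out


def loss(gains, target):
    """Mean squared error between the FDN response and the target response."""
    ir = fdn_impulse_response(gains, len(target))
    return sum((a - b) ** 2 for a, b in zip(ir, target)) / len(target)


def fit(target, steps=40, seed=0):
    """Random-search stand-in for training: keep a perturbation if it helps."""
    rng = random.Random(seed)
    best = [0.5] * len(DELAYS)
    best_loss = loss(best, target)
    for _ in range(steps):
        cand = [min(0.99, max(0.0, g + rng.gauss(0.0, 0.05))) for g in best]
        cand_loss = loss(cand, target)
        if cand_loss < best_loss:
            best, best_loss = cand, cand_loss
    return best, best_loss


# Assumption: the "reference reverb" is the same FDN with known gains, so the
# fit has an exact answer. A real target would be a measured or convolution IR.
TRUE_GAINS = [0.9, 0.85, 0.8, 0.75]
target_ir = fdn_impulse_response(TRUE_GAINS)
fitted_gains, fitted_loss = fit(target_ir)
```

The point of the split is that the audio path stays a cheap, linear, well-understood structure at run time, and the learning only happens offline on a handful of numbers.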

The thing about reverbs is that they require a lot of state, and nonlinearity is undesirable.
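A quick back-of-the-envelope sketch of both points, with assumed numbers (48 kHz sample rate, a 2 second tail) and made-up helper names. The first part counts how many past samples a convolution reverb has to remember. The second checks that a convolution obeys superposition, while a `tanh` stage (standing in for a typical NN activation) does not.

```python
import math

SAMPLE_RATE = 48_000   # assumed sample rate in Hz
TAIL_SECONDS = 2.0     # assumed reverb tail length

# A convolution reverb needs one sample of input history per IR sample.
state_samples = int(SAMPLE_RATE * TAIL_SECONDS)


def convolve(signal, ir):
    """Direct-form convolution, a linear operation by construction."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(ir):
            out[i + j] += s * h
    return out


def close(xs, ys):
    """True if two sequences match element-wise within float tolerance."""
    return all(math.isclose(x, y, abs_tol=1e-12) for x, y in zip(xs, ys))


ir = [1.0, 0.5, 0.25, 0.125]   # toy impulse response
a = [1.0, 0.0, -0.5]
b = [0.25, 0.75, 0.0]
mixed = [x + y for x, y in zip(a, b)]

# Linear: processing the mix equals mixing the processed signals.
linear_ok = close(
    convolve(mixed, ir),
    [x + y for x, y in zip(convolve(a, ir), convolve(b, ir))],
)

# Nonlinear: a tanh stage breaks that property.
nonlinear_ok = close(
    [math.tanh(x) for x in mixed],
    [math.tanh(x) + math.tanh(y) for x, y in zip(a, b)],
)
```

With these assumptions `state_samples` comes out to 96000 per channel, `linear_ok` is true and `nonlinear_ok` is false.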