I think we're living in a world where deep learning is winning so consistently that comparison to other methods is often just a time suck. It would be nice to provide a non-DL approach as a baseline, but I would expect it to lag behind the DL methods.
Furthermore, often pre-DL methods can be recast as hand-tuned special cases of DL models - some sequence of linear operations with hand-picked discontinuities sprinkled around. If you can implement the pre-DL method using standard neural network components, then gradient descent training of a neural network "should" find an equivalent or better solution.
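To make that concrete, here is a minimal sketch of my own (assuming PyTorch; nothing in this thread specifies it): an AR(p) forecaster is literally a single linear layer with hand-set weights and no nonlinearity, so gradient descent could in principle recover it or improve on it.

```python
# Illustrative only: a classical AR(3) forecaster expressed as one Linear layer.
import torch
import torch.nn as nn

p = 3                                    # AR order (hypothetical choice)
ar_as_net = nn.Linear(p, 1, bias=True)   # y_t ~ w . [y_{t-1}, y_{t-2}, y_{t-3}] + b

# Hand-set the weights to a known AR(3) model instead of training:
with torch.no_grad():
    ar_as_net.weight.copy_(torch.tensor([[0.5, 0.3, 0.1]]))
    ar_as_net.bias.fill_(0.0)

history = torch.tensor([[1.2, 1.0, 0.8]])  # last p observations, newest first
print(ar_as_net(history))                   # one-step-ahead forecast
```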
Deep learning models are not better across the vast problem areas that already have analytical design algorithms. Deep learning's succession of triumphs has come in areas where analytical design has proven difficult.
First, there are many optimal, or near-optimal, direct design algorithms for systems that are well characterized. These solutions are more concise, easier to analyze, reveal important insights, and come with guarantees regarding reliability, accuracy, stability, resource requirements, and operating regimes. Those are clear advantages over inductively learned solutions.
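As one hedged illustration of that kind of direct design (my example, assuming SciPy, not something named above): classical Butterworth filter design is analytical, needs no training data, and comes with known guarantees about its frequency response and stability.

```python
# Illustrative only: analytically design a 4th-order Butterworth low-pass filter.
from scipy import signal

fs = 1000.0                                         # sample rate in Hz (assumed)
b, a = signal.butter(4, 50.0, btype="low", fs=fs)   # 50 Hz cutoff, maximally flat passband
print("numerator:", b)
print("denominator:", a)
```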
Second, simply assuming that new algorithms are better than older ones is irrational, and it is anathema to the purpose and benefits of science, math, and responsible research in general.
If you are going to propose new algorithms, you need to compare the new algorithm against the previous state of the art.
Otherwise practitioners and future researchers will be driven into dead ends, deploy pointlessly bad designs, forget important knowledge, and, worst of all, lose out on what older algorithms can suggest for improving newer ones, with no excuse but gross carelessness.
This is something that DL researchers like to think, but it is definitely not true for time series forecasting. See https://forecastingdata.org/ for examples where simple non-DL approaches beat state-of-the-art DL systems.
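To make "simple non-DL approaches" concrete, here is a minimal sketch (my own illustrative code, not taken from forecastingdata.org) of the seasonal-naive baseline, which just repeats the last observed season:

```python
import numpy as np

def seasonal_naive(y: np.ndarray, season_length: int, horizon: int) -> np.ndarray:
    """Forecast the next `horizon` points by repeating the last full season."""
    last_season = y[-season_length:]
    reps = int(np.ceil(horizon / season_length))
    return np.tile(last_season, reps)[:horizon]

# Toy example: monthly data with yearly seasonality and a slight trend.
y = np.sin(np.arange(48) * 2 * np.pi / 12) + np.arange(48) * 0.01
print(seasonal_naive(y, season_length=12, horizon=6))
```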
> I think we're living in a world where deep learning is winning so consistently that comparison to other methods is often just a time suck.
This is quite untrue. DL methods work well when there is a lot of data in a closed domain: they learn from large corpora of text and media where reasonable interpolations are possible.
When you don’t have enough data and there is no known foundation model you can run zero-shot, DL doesn’t work better than simpler conventional methods.
> It would be nice to provide a non-DL approach as a baseline, but I would expect it to lag behind the DL methods.
The M-series competitions have repeatedly shown that very old forecasting algorithms work quite well, with, frankly, far less training overhead and data. Ensemble models usually do best, but for a lot of use cases DL is probably overkill versus ARIMA or triple exponential smoothing.
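As a hedged sketch of how lightweight that is (assuming statsmodels; the `ExponentialSmoothing` usage below is illustrative, not from the competitions themselves), triple exponential smoothing fits a single series in milliseconds with no GPU:

```python
# Illustrative only: Holt-Winters (triple exponential smoothing) on a toy series.
import numpy as np
from statsmodels.tsa.holtwinters import ExponentialSmoothing

rng = np.random.default_rng(0)
t = np.arange(72)  # six years of monthly data
y = 10 + 0.1 * t + 2 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.3, 72)

model = ExponentialSmoothing(y, trend="add", seasonal="add", seasonal_periods=12)
fit = model.fit()
print(fit.forecast(12))   # next 12 months
```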