Deep learning and machine learning don't work. Quantitative math will always prevail, as it always has. Unfortunately, mathematical research isn't there yet: we don't have models for vision, audition, or language. Neuroscience and psychology are in their infancy; a good analogy would place these fields where physics was before the Newton and Galileo era. I suspect that in the decades to come, these fields will influence mathematics the same way physics influenced calculus. Physics historically had a huge influence on math; in the coming century it will be neuroscience and psychology, in linking brains to behavior, and in the quantitative laws that allow brains to give rise to minds.
I guess it depends on what "work" means. So I worked on deep networks for quantum chemistry (I'm not a physicist or chemist, but) I can tell you people were ecstatic about the possibility that the approximations the neural nets come up with might get closer to real physics than the current approximations, without any theoretical advances needed. Some challenges in these areas are so difficult that approximations are the best we can do. It's kind of similar to drug discovery now: if there are models that can help narrow down potential molecules/targets, that has tremendous potential, even if the results need to be double-checked by a person. So it's hard to see "don't work" as anything but buzzy. BUT I will agree with you that neuroscience will help develop our understanding of cognition.
I just wanted to highlight these other applications, because I think people get this idea that AI has to be AGI for anything interesting to happen... but there are really niche applications where these tools are no longer considered experimental. And what you describe may already be happening: Geoff Hinton's critique of modern deep nets seems to be a call to get more biological (thinking of capsule networks).
>Deep learning and machine learning don’t work. Quantitative math will always prevail,
I have a neural net onboard my phone which automatically detects songs offline and tells me what they are. Is that semantically 'quantitative math' and not machine learning?
Which app is that? Most of the well known music identification apps like Shazam use acoustic fingerprinting to identify songs. They work well without using neural nets or deep learning. What benefits does your neural net based app offer over this well known approach?
It uses both - the neural network creates its own acoustic fingerprint database, which it then uses to perceive sound.
This shrinks the traditional acoustic fingerprinting data enough to be stored on mobile devices, while allowing for low-power "always-on" identification offline.
It's analogous to a human being able to identify songs by remembering the chorus, except that the NN uses its own learned features for both the memory and the offline perception.
>In 2017 we launched Now Playing on the Pixel 2, using deep neural networks to bring low-power, always-on music recognition to mobile devices. In developing Now Playing, our goal was to create a small, efficient music recognizer which requires a very small fingerprint for each track in the database, allowing music recognition to be run entirely on-device without an internet connection.
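For the curious, here's a minimal sketch of the general shape of such a system. To be clear, this is not Google's actual Now Playing model: the layer sizes, embedding dimension, and matching scheme are all made up for illustration. A small CNN maps a log-mel spectrogram of a few seconds of audio to a compact L2-normalized embedding (the "fingerprint"), and songs are matched by nearest-neighbour search over one stored embedding per track.

    # Hypothetical sketch, NOT the real Now Playing architecture.
    import torch
    import torch.nn as nn

    class FingerprintNet(nn.Module):
        def __init__(self, embedding_dim=96):  # small embedding keeps the DB tiny
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),           # -> (batch, 32, 1, 1)
            )
            self.proj = nn.Linear(32, embedding_dim)

        def forward(self, spectrogram):            # (batch, 1, mels, frames)
            x = self.features(spectrogram).flatten(1)
            emb = self.proj(x)
            # L2-normalise so cosine similarity is a plain dot product
            return nn.functional.normalize(emb, dim=1)

    # The stored "database" is just one embedding per song.
    net = FingerprintNet().eval()
    db = nn.functional.normalize(torch.randn(10_000, 96), dim=1)  # placeholders
    query = net(torch.randn(1, 1, 64, 96))         # a few seconds of audio
    best = (db @ query.T).argmax()                 # nearest neighbour by cosine

The compression win is that each track costs one small vector instead of a full traditional fingerprint, which is presumably what makes on-device storage feasible.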
The biggest issue I have is: "To do this we developed an entirely new system using convolutional neural networks to turn a few seconds of audio into a unique 'fingerprint.'"
Why did you pick a neural network? What mathematical properties does a neural network have that make it appealing for this problem? How were the networks trained? Back propagation? It doesn't converge, and worse, learning weights for a new batch can cause you to forget previous batches. This isn't a desirable property of neural networks or back propagation. You probably had a lot of heuristics on top, fine. How do you know that the weights you ended up with will always work in practice? Given an arbitrary track, can you encode it? What about growing the database? Does the neural network get updated for new songs, or do you use the same neural network to fingerprint new songs and update the database?
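To make the forgetting point concrete, here's a toy illustration (plain least-squares on conflicting batches, nothing to do with Google's actual training setup): fit a tiny model on batch A, keep training on batch B only, and the error on A climbs back up.

    # Toy demonstration of "catastrophic forgetting" across batches.
    import numpy as np

    rng = np.random.default_rng(0)
    w = np.zeros(2)

    def batch(slope, intercept, n=50):
        x = rng.uniform(-1, 1, n)
        return np.c_[x, np.ones(n)], slope * x + intercept

    def sgd(w, X, y, steps=500, lr=0.1):
        for _ in range(steps):
            w -= lr * 2 * X.T @ (X @ w - y) / len(y)   # gradient of MSE
        return w

    XA, yA = batch(slope=+1.0, intercept=0.0)   # "task A"
    XB, yB = batch(slope=-1.0, intercept=0.5)   # "task B", conflicting

    w = sgd(w, XA, yA)
    print("loss on A after training on A:", np.mean((XA @ w - yA) ** 2))
    w = sgd(w, XB, yB)                          # keep training on B only
    print("loss on A after training on B:", np.mean((XA @ w - yA) ** 2))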
Here's how I would have done it:
A song file is just a sequence of amplitudes. I would do some kind of interpolation with piecewise trig functions. Trig functions have very desirable properties: they are continuous everywhere and infinitely differentiable. Moreover, a sine basis decomposition will reconstruct the original signal very well. This is great, because now you can use theories from DSP and Fourier analysis. So we take the entire song and do a continuous-time discrete cosine transform, in blocks of size 32. Now you compute the square norm of all feature vectors, sort them, eliminate the vectors that are within a 1e-3 radius of each other (they are too similar; there's no point in keeping them), and only store the top 25% of feature vectors by square norm. The 25% cutoff threshold and the 1e-3 similarity radius are heuristics, and adjustable parameters.
Now you have a database. For a new song, repeat the procedure and get a feature vector for every 32-sample block. There are probably theories in DSP you could use to get a better similarity measure, but for now we'll just use the L2 norm of the difference. Do a nearest-neighbour search in your database for all feature vectors, and rank the results by hits. I could run all of this on a computer from the 2000s, crappier than modern phones, and have the entire backend run on equally crappy hardware too. All parts of what I'm doing are fully deterministic, updating the DB is incredibly fast, the CTDCT is super fast, there are no questions of convergence, and there's no need for training. You could probably increase the accuracy and speed by doing some DSP and running the nearest-neighbour search on separate voice, bass, instrumental, etc. features.
In practice, how would it compare to your neural network? No idea, but I imagine it should be very competitive. The big benefit is that you have only 3 parameters (similarity radius, cutoff threshold, and block size). This seems very easy to benchmark against; it should take about a week to implement. I'm not sure about the compression of the fingerprint, however. Not sure how much space 1,000,000 songs would take (probably 25%, since that was our cutoff). You could probably borrow from psychoacoustics to make a better database and get a more compressed representation. Another alternative would be to downsample the song to 64 kbps beforehand.
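For concreteness, here's a rough sketch of the pipeline I'm proposing, under the stated heuristics (block size 32, 1e-3 similarity radius, top-25% cutoff). scipy's block-wise DCT stands in for the CTDCT step, and the data here is random placeholder audio:

    import numpy as np
    from scipy.fft import dct

    BLOCK, RADIUS, KEEP = 32, 1e-3, 0.25

    def fingerprint(samples):
        """Song samples -> deduplicated, energy-ranked DCT feature vectors."""
        n = len(samples) // BLOCK * BLOCK
        feats = dct(samples[:n].reshape(-1, BLOCK), axis=1, norm="ortho")
        # Sort by squared norm (descending), drop near-duplicates within
        # RADIUS, then keep only the top 25% of vectors.
        feats = feats[np.argsort(-np.sum(feats**2, axis=1))]
        kept = []
        for f in feats:
            if all(np.linalg.norm(f - k) > RADIUS for k in kept):
                kept.append(f)
        return np.array(kept[: max(1, int(len(feats) * KEEP))])

    def match(query_samples, database):
        """Rank songs by nearest-neighbour hits over the query's vectors."""
        hits = {}
        for f in fingerprint(query_samples):
            best = min(database, key=lambda song: np.min(
                np.linalg.norm(database[song] - f, axis=1)))
            hits[best] = hits.get(best, 0) + 1
        return sorted(hits, key=hits.get, reverse=True)

    db = {"song_a": fingerprint(np.random.randn(32_000)),
          "song_b": fingerprint(np.random.randn(32_000))}
    print(match(np.random.randn(3_200), db))

The brute-force nearest-neighbour search here is the slow part; in a real system you'd put the feature vectors in a k-d tree or similar index.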
I agree with spaced-out. A neural net can capture all those smaller eigenvectors in the signal that are routinely thrown away during traditional feature engineering, like what you describe. When the number of training samples grows big enough, those factors with marginal contribution become significant and allow higher levels of accuracy in prediction or classification than are possible when curating features manually.
Deep nets are here to stay. They're just not magic bullets that solve all problems equally well, especially those where training data is minimal.
This was a paper from Shazam in 2003. It's essentially what I proposed: there is no training, and Shazam works pretty well. It doesn't even go into the mathematical considerations I went into.
>You'll never be able to develop features with the heuristic methods you described that will work as well as the features learned by a neural net.
Quantitative math, or applied math, isn't based on fitting data to an arbitrary mathematical structure. It's looking at real life and deriving the mathematical laws that govern what you see. You could have a neural net predict planetary motion. However, it doesn't know jack shit about physics.
>I have a neural net onboard my phone which automatically detects songs offline and tells me what they are.
MP3 uses something called psychoacoustics, a quantitative model of human perception, to eliminate frequencies that can't be heard according to the model.
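To illustrate the flavour of it (this is a huge simplification of what MP3 encoders actually do; real ones also model masking between nearby tones), you can drop spectral components that fall below Terhardt's classic approximation of the absolute threshold of hearing. The 96 dB full-scale calibration below is an assumption for the sake of the example:

    import numpy as np

    def hearing_threshold_db(f_hz):
        """Approximate absolute threshold of hearing in dB SPL (Terhardt)."""
        f = np.asarray(f_hz, dtype=float) / 1000.0
        return (3.64 * f**-0.8
                - 6.5 * np.exp(-0.6 * (f - 3.3) ** 2)
                + 1e-3 * f**4)

    sr = 44_100
    t = np.arange(sr) / sr
    # A loud 440 Hz tone plus a very quiet 18 kHz tone (inaudible).
    signal = np.sin(2 * np.pi * 440 * t) + 1e-4 * np.sin(2 * np.pi * 18_000 * t)

    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), 1 / sr)
    # Assume digital full scale corresponds to 96 dB SPL.
    level_db = 96 + 20 * np.log10(np.abs(spectrum) / len(signal) + 1e-12)

    audible = level_db > hearing_threshold_db(np.maximum(freqs, 20.0))
    spectrum[~audible] = 0.0                 # discard inaudible components
    print(f"bins kept: {audible.sum()} of {len(audible)}")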
Your neural network doesn't tell you what features make songs distinct. It's not a quantitative model at all, but a black-box heuristic about what the important features superficially are. If actual mathematicians worked on this problem, I guarantee you they'd do a better job, and their models would work on a Commodore 64, with real-time training. Moreover, they would tell you things like who is singing, whether it's a live performance, and which concert it was.
" If actual mathematicians worked on this problem, I guarantee you they'd do a better job"
No, this is wrong.
Some of the most brilliant people in the world have been working on image recognition, voice recognition etc. and AI is crushing all of their work.
"Your neural network doesn't tell you what features make songs distinct, it's not a quantitative model at all" - it doesn't matter at all if our objective is detecting the song. Neither does the mp3 compression algorithm.
>Some of the most brilliant people in the world have been working on image recognition, voice recognition etc. and AI is crushing all of their work.
This is very true. I take my stronger statement back: MAINSTREAM mathematicians attempting this problem have all been wrong, and have been wrong for 50 years. But you do need the right theory, and the right math that realizes this theory.
"AI" is superficially beating the work in computer vision. Computer vision is complete bogus. The gabor filters, fourier transfroms etc. are all wrong conceptually. The known methods do abysmally on basic tasks like object recognition, texture segmentation etc. But they keep trying it.
I would take this one step further: computer vision, audio, and NLP researchers have been stuck in a rut for the past 50 years. DL is beating THEIR math, but because of data and computation speed, not because of any insight. But DL is also wrong, and giving you an illusion of progress. Both are doomed to go the way of GOFAI.
I can go into great detail and carefully explain why MAINSTREAM contemporary ideas in math for vision, audition, and language are completely wrong, and have been wrong for 50 years. What is the right model? Like I mentioned before, the right ideas are emerging; neural networks will dominate, just not DL.
> deriving the mathematical laws that govern what you see
Fitting a model.
> Your neural network doesn't tell you what features make songs distinct
It literally learns better features than you could ever come up with by hand. This is why CNNs do better in computer vision than hand-engineered filters.
> I guarantee you they'd do a better job, and their models would work on a Commodore 64, with real-time training.
LOL if you think that a room full of people can listen to TBs of audio data and decide what combination of mathematical functions describes that data better than a DL model learning its own features.
You don't have the slightest clue what you're talking about.
This is a No True Scotsman. Actual mathematicians did work on this problem, training the neural network to achieve its target task of identifying songs with minimal power and storage consumption, and it works.
> Deep learning and machine learning don’t work. Quantitative math will always prevail, as it always has.
What do you think machine learning is, if not “quantitative math”? Deep learning is just linear algebra and calculus, and things like random forests are even simpler mathematically.
Machine learning is glorified curve fitting. DL isn't even mathematically sound; back propagation has no proof of convergence. Quantitative math is about extracting natural laws and mapping them to mathematical structures. You could use DL to predict planetary motion, and get pretty good at it. But this isn't a quantitative understanding of the world; you didn't learn anything. Physics, in contrast, has the laws of motion and gravitation. You can directly model arbitrary planets. Moreover, you can model arbitrary rigid bodies, from cars to space shuttles. Your ML, DL, random forests, etc. all use math, sure. But so did the Keplerian models of motion. You aren't deducing the math that governs the world; you're forcing an arbitrarily chosen mathematical structure onto your data.
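To make the curve-fitting point concrete, a toy sketch: fit free-fall data with a generic polynomial. It matches the observed data fine, but the coefficients carry no physics, and extrapolation is where the difference shows.

    # Toy rendering of "fitting data to an arbitrary structure" vs. a law.
    import numpy as np

    g = 9.81
    t = np.linspace(0, 2, 40)
    y = -0.5 * g * t**2 + np.random.default_rng(1).normal(0, 0.05, t.size)

    coeffs = np.polyfit(t, y, deg=5)       # arbitrary structure, fits fine
    print("fit residual:", np.abs(np.polyval(coeffs, t) - y).max())
    print("law residual:", np.abs(-0.5 * g * t**2 - y).max())

    # Outside the observed interval, the degree-5 fit has no reason to
    # keep tracking -g*t**2/2; the law does.
    t_out = 5.0
    print("fit at t=5:", np.polyval(coeffs, t_out),
          " law at t=5:", -0.5 * g * t_out**2)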
If we’re throwing out anything that doesn’t have a proof of convergence as “not mathematically sound,” you can kiss fluid mechanics goodbye, as well as lots of other subfields of physics that rely on partial differential equations.
>you do realize that NN/AI is totally state of the art for many tasks?
Being state of the art doesn't imply that these things will solve these problems. In ML terms, how do you know that NN/AI isn't a local maximum that we need to jump out of? All NLP systems are a joke. Sure, replace Watson with DL; it might perform better on Jeopardy. But in real conversations? Forget it.
I wouldn't bet on these things. NNs will win, but not back propagation, ReLU, sigmoid, or whatever pseudoscience is the current buzzword. There are 50 years' worth of understanding in actual neuroscience and cognitive modelling that no one has paid attention to, and new design principles are emerging that will influence mathematics.
Because DL isn't great at NLP (and may never solve "real conversations" aka AGI), it's worthless?
It's the best performing tool we have for NLP, image recognition, etc. Is it a local maximum? Probably. But it's out there solving real problems nonetheless. We'll capture all the gains we can and then move on.
"Being state of the art doesn't imply that these things will solve these problems. "
I suggest you are misinformed about the state of AI.
AI is currently ahead of all other approaches in many fields.
It's led to quite a number of practical advances and breakthroughs.
The 'best examples' are those that I described, but there are many more.
Your comments indicate, I think, some ignorance on the issue. I think I see the point you are trying to make, but I also submit that you're not aware of what AI is doing today.
Self-driving cars, for example, would be impossible without AI today. The vision systems depend on it; it's a breakthrough without which we simply wouldn't have the tech.
What if I don't care about general intelligence? Honestly, it sounds more like a burden than a benefit. A local maximum might actually be exactly the type of solution that I need, especially in cases where no satisfying solution currently exists.
I agree that DL/ML is destined to fail in domains like this but can you expand on this reasoning? What exactly do you mean by "quantitative math" (I haven't heard this phrase used in this way before)? And what were the equivalents of DL/ML for physics before calculus?
Quantitative math might be the wrong term; maybe applied math? In quantitative finance, you make quantitative models about the world and build math that realizes those assumptions and that understanding. A simple example: options are a great financial trading instrument that you can model mathematically, the simplest model being Black-Scholes. You can imply things like the volatility of the stock price from the price of the option, to get a better understanding of what a risk-neutral market is thinking, and compare that to the actual market distribution.
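As a concrete sketch of the implied-volatility idea: price a European call with standard Black-Scholes, then invert the formula numerically to back out the volatility a quoted market price implies. The inputs below are made up; bisection works because the call price is monotone increasing in sigma.

    from math import exp, log, sqrt, erf

    def norm_cdf(x):
        return 0.5 * (1.0 + erf(x / sqrt(2.0)))

    def bs_call(S, K, T, r, sigma):
        """Black-Scholes price of a European call."""
        d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
        d2 = d1 - sigma * sqrt(T)
        return S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)

    def implied_vol(price, S, K, T, r, lo=1e-4, hi=5.0, tol=1e-8):
        """Bisect on sigma until the model price matches the market price."""
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if bs_call(S, K, T, r, mid) < price:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    market_price = 10.45                   # hypothetical quoted option price
    print(implied_vol(market_price, S=100, K=100, T=1.0, r=0.05))  # ~0.20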
>And what were the equivalents of DL/ML for physics before calculus?
This is a good question. Before Newtonian calculus and the laws of gravitation, people were building very complicated conic models (i.e. ellipses, parabolas, etc.) to get better and better predictions of planetary motion. A lot of parametric math came out of this, with ever more sophisticated models getting better and better, giving these astronomers an illusion of progress. However, Newton's insight was that motion is connected to mass, and this insight was the basis for deriving the laws of motion, which gave us the law of gravitation, F = G*m1*m2/r^2. This insight eliminated the previous Keplerian models of motion, because you could now predict the motion of arbitrary rigid bodies using very simple math (we teach it in high school). Of course, Newtonian motion has its limitations; that's why we have quantum physics and Einstein's relativity. But for practical technological applications, Newtonian physics on its own gets you incredibly far.
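To see how far the simple math gets you, a tiny sketch: Newton's law plus a basic integrator steps the Earth-Sun system forward for a year, with no per-body curve fitting (initial conditions are approximate round numbers):

    import numpy as np

    G = 6.674e-11                          # gravitational constant
    m_sun = 1.989e30                       # kg

    pos = np.array([1.496e11, 0.0])        # Earth's position rel. to Sun (m)
    vel = np.array([0.0, 2.978e4])         # Earth's orbital velocity (m/s)
    dt = 3600.0                            # one-hour time steps

    for _ in range(24 * 365):              # integrate one year
        r = np.linalg.norm(pos)
        acc = -G * m_sun * pos / r**3      # a = F/m_earth; Earth's mass cancels
        vel += acc * dt                    # symplectic Euler: velocity first,
        pos += vel * dt                    # then position

    print("distance from Sun after one year:", np.linalg.norm(pos))  # ~1.5e11 m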
Where is ML/DL? It would be akin to Keplerian elliptical motion. More realistically, however, it's closer to the aether theory of light, and will go the way of GOFAI. This stuff isn't grounded in modelling any scientific observation. Moreover, it's mathematically useless: back propagation doesn't converge, and why should you fit your data to an arbitrary mathematical structure? In practice, DL/ML doesn't work at all; you will be much more successful modelling your problem mathematically. For example, consider an aircraft manufacturer, with all kinds of moving parts in its planes. They typically model each part mathematically (e.g. gear x undergoes exponential time decay), imply the parameters from rigorous test data, and then use some sort of empirical statistical model to predict failure.
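A hedged sketch of that kind of deterministic model (the part, the measurements, and the failure threshold are all made up for illustration): assume the part's measured condition follows exponential decay, fit the rate from test data, and solve for when it crosses the threshold.

    import numpy as np
    from scipy.optimize import curve_fit

    def decay(t, c0, k):
        return c0 * np.exp(-k * t)

    hours = np.array([0, 500, 1000, 2000, 4000], dtype=float)
    condition = np.array([1.00, 0.88, 0.79, 0.61, 0.38])  # test-rig data

    (c0, k), _ = curve_fit(decay, hours, condition, p0=(1.0, 1e-4))

    FAIL_AT = 0.2                                 # condition threshold
    t_fail = np.log(c0 / FAIL_AT) / k             # solve decay(t) = FAIL_AT
    print(f"fitted rate k={k:.2e}/h, predicted failure at ~{t_fail:.0f} hours")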
I've seen deep learning companies come and fall flat on their face trying to beat the accuracy of these deterministic systems. Those guys needed a lot of data and GPUs. I'm not even criticizing the fact that DL is a black box. It's worse: it's inferior to everything out there on every metric imaginable. These mathematical models, in contrast, have been in production for decades, with yearly updates; they run in real time with little historical data, they are fully understandable, and they beat every method we know of.
This isn't the first time multi-layer perceptrons gained hype. They didn't work in the 80s, the 90s, or the 2000s, and they don't work now. The math behind DL is the same as what we had in the 80s; they just called it the multi-layer perceptron. None of the ideas in modern ML/DL are new, including reinforcement learning, GANs, etc.
1. Black-Scholes works in a lot of cases but is an approximation: it has edge cases where it does not fare well...
2. Likewise Newtonian physics is also an approximation: it does not fare well near relativistic speeds or high gravity. But at least we have models which seem to be accurate to many decimal places today. Who knows what the future may hold.
3. Not all useful problems can be represented by simple equations, but they can still be computed numerically (e.g. the N-body problem).
4. Ultimately, DL is popular because it works better than anything else in some very specific domains like speech recognition and image recognition. It is overapplied, I'll admit, but if you can do better, then feel free to publish a paper.