I'm currently taking a graduate class in Applied Cryptography and will definitely use this as another resource for reference. Love that they've opened it up and made it free.
I use a Jupyter notebook to explore how a Bayesian might compare two products on Amazon with the goal of finding the probability that one product is better than another.
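A minimal sketch of what that kind of comparison might look like. The review counts, the Beta(1, 1) prior, and the Monte Carlo approach are my own illustrative assumptions, not taken from the notebook itself:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical review counts (positive, negative) for two products.
a_pos, a_neg = 90, 10
b_pos, b_neg = 180, 40

# With a Beta(1, 1) prior, the posterior over each product's
# "true" positive-review rate is Beta(pos + 1, neg + 1).
samples_a = rng.beta(a_pos + 1, a_neg + 1, size=100_000)
samples_b = rng.beta(b_pos + 1, b_neg + 1, size=100_000)

# P(product A's true rate exceeds product B's), estimated by
# comparing posterior samples pairwise.
p_a_better = (samples_a > samples_b).mean()
print(f"P(A better than B) ~ {p_a_better:.3f}")
```

The nice property of this framing is that it naturally penalizes products with few reviews: a 5-star product with three ratings gets a wide posterior and can't confidently beat a 4.6-star product with thousands.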
I basically agree with this rule. I find that my colleagues who overly hype unsupervised approaches typically don't have much experience working on ML problems without labeled data. I suspect this because whenever I give a talk on ML, I have a wealth of personal experience to draw on for examples, while my colleagues almost always reuse slides from projects they never worked on.
I don't disagree with your point, but the unsupervised aspect of NLP typically isn't useful on its own. Usually it's a form of pre-training to help supervised models perform better with less data.
From Google in 2018:
"One of the biggest challenges in natural language processing (NLP) is the shortage of training data. Because NLP is a diversified field with many distinct tasks, most task-specific datasets contain only a few thousand or a few hundred thousand human-labeled training examples. However, modern deep learning-based NLP models see benefits from much larger amounts of data, improving when trained on millions, or billions, of annotated training examples. To help close this gap in data, researchers have developed a variety of techniques for training general purpose language representation models using the enormous amount of unannotated text on the web (known as pre-training). The pre-trained model can then be fine-tuned on small-data NLP tasks like question answering and sentiment analysis, resulting in substantial accuracy improvements compared to training on these datasets from scratch."
As I said, I'm an NLP researcher and practitioner, so you don't need to quote this at me.
The unsupervised aspect is the engine driving all modern NLP advancements. Your comment suggests that it is incidental, which is far from the case. Yes, it is often ultimately then used for a downstream supervised task, but it wouldn't work at all without unsupervised training.
Indeed, one of the biggest applications of deep NLP in recent times, machine translation, is (somewhat arguably) entirely unsupervised.
I didn't mean to make it sound incidental although I do see your point. Just wanted to chime in with how important having a labeled dataset is for a successful ML project.
I think the point is labeling itself is very difficult except for special and limited domains. Manually constructed labels, like feature engineering, are not robust and do not advance the field in general.
That makes sense. I'm coming from the angle of applied ML where solutions need to solve a business problem rather than advance the field of ML. In consulting many problems can't be solved well without a labeled dataset and in lieu of one, less credible data scientists will claim they can solve it in an unsupervised manner.
For sure. There are counter-examples, however: fully unsupervised machine translation for resource-poor languages comes to mind, and it is increasingly getting business applications.
I think that in the future, more and more clever unsupervised approaches will be the path forward in huge AI advances. We've essentially run out of labeled data for a large variety of tasks.
I would argue that GANs by definition aren't unsupervised; they just aren't supervised by humans. Additionally, OpenAI's game-playing work faces similar arguments.
I'm not sure that's correct. The discriminator and the generator both learn to match a training set. You don't need to label the training set at all. You can just throw 70,000 aligned photos at it.
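To make that concrete, here's a minimal sketch (the distributions and batch sizes are illustrative, not from any real GAN) of how a discriminator's training targets are constructed mechanically, with no human labeling anywhere:

```python
import numpy as np

rng = np.random.default_rng(0)

# "Training set": unlabeled samples from the real data distribution.
real_batch = rng.normal(loc=4.0, scale=1.25, size=(64, 1))

# Fake batch produced from noise by a (here untrained, placeholder)
# generator.
noise = rng.normal(size=(64, 1))
fake_batch = noise * 0.5

# The discriminator's targets are generated automatically by the
# training procedure itself: real -> 1, fake -> 0.
x = np.concatenate([real_batch, fake_batch])
y = np.concatenate([np.ones(64), np.zeros(64)])
```

This is why GAN training is sometimes called self-supervised rather than supervised or unsupervised: the labels exist, but the algorithm manufactures them from unlabeled data.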
I think I see what you're saying, but that might be a different definition of "supervised". It seems impossible for one half of the same algorithm to be supervised and the other to be unsupervised. But I like your definition (if it was renamed to something else) because you're right that the discriminator is the only thing that pays attention to the training data, whereas the generator does not.
Time will tell, but as a machine learning engineer, when you see results this good it's more probable that a mistake was made. They could be reporting the training error on an overfit model, or data leakage could be occurring due to an improper train-test split of the data.
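For anyone unfamiliar with that leakage failure mode, here's a minimal numpy sketch (the shapes and the mean-centering step are illustrative assumptions) contrasting leaky preprocessing with a correct split:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))  # stand-in for a real feature matrix

# Shuffle, then split BEFORE fitting anything.
idx = rng.permutation(len(X))
train_idx, test_idx = idx[:800], idx[800:]

# Leaky: preprocessing statistics computed on the full dataset let
# information about the test set reach the model.
mu_leaky = X.mean(axis=0)

# Correct: statistics come from the training split only, and the
# test split is transformed with those same training statistics.
mu_train = X[train_idx].mean(axis=0)
X_train = X[train_idx] - mu_train
X_test = X[test_idx] - mu_train
```

The same principle applies to any fitted transform (scaling, imputation, feature selection): fit on the training split, apply to the test split.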
Also, it is definitely appropriate to use the term AI in this case. AI is not a technical term so it's really in the eye of the beholder, but I think it's safe to say that ML is a subset of AI. Perhaps people are conflating AI with AGI?
It's a 14-person training set with an 8-person test set, so my guess is that it can pretty accurately predict seizures in the small group of people it is trained on. Whether the model could be generalized for a useful broad deployment is unclear. It still requires many electrodes attached to the scalp, so there is still a ways to go before it can be integrated into a watch, for example.
They acknowledge this in the article: the system will have to be fine-tuned via transfer learning for every patient. Which IMO is ideal, but you will need training/validation data, which in this case sounds like it'd be extremely expensive. Furthermore, the system could get automatically better over time _for that patient_, if properly designed and fed clean training samples.
Yeah, watches can't even detect whether you are sleeping or not. The products on the market are mainly accelerometer-based and aren't really reliable if, for example, you're awake but not moving.
They're too good and also bad at the same time. Most laypeople don't realize that "99.6%" accuracy still means a 1-in-250 chance of making an error. If you do inferences every second, that's 14.4 errors per hour. Now granted, some of those errors are false negatives, but I'm not sure which is worse in this case. Depending on which action is expected after receiving an alarm, this could render the device completely useless.
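The arithmetic above checks out; as a quick sanity check (assuming one inference per second, as stated):

```python
accuracy = 0.996
error_rate = 1 - accuracy          # one error every ~250 inferences
per_hour = 60 * 60                 # one inference per second
expected_errors = error_rate * per_hour

print(round(1 / error_rate))       # 250
print(round(expected_errors, 1))   # 14.4
```

The base-rate problem compounds this: seizures are rare events, so even at 99.6% accuracy the vast majority of alarms could be false positives.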
I think you meant AI is a subset of ML. Agree AGI is a red herring here. Mostly this is a marketing problem, but the pattern rec., ML, "AI", etc. branding has been shifting around for decades.
This set is far too small to say anything strong about the results, but the problem is interesting anyway.
I think I disagree in two regards. One: unless we stretch the meaning of AI to extremes, there is plenty of ML that is not AI. I guess that presupposes we sort out the stats-vs.-ML issues, but that's what you get with all these fuzzy terminologies floating around. So it isn't really useful to think of it as a subset, in my opinion.
Secondly, while I know of a small amount of serious non-ML AI and AGI work being done (mostly historically), it has almost nothing to do with the common parlance today, which is nearly entirely ML. Is this what you mean when you talk about non-ML AI, or is there something I'm missing?
For what it's worth, I'm happy with thinking of them as overlapping, although I do think the AI terminology is almost useless at the moment, and ML is slightly better defined.
I was not referring to the inconsistent way in which these terms are used to sell products and obtain funding for startups, but to their academic definitions.
While most recent successes in the field of AI have been brought about from advances in the subfield of Machine Learning, even today's most advanced AI systems have components that are not Machine Learning (e.g. Alpha Go still requires tree search techniques from "classic AI" to work).
In the definitions below, AI refers to any "intelligent agent", whereas ML refers to the subset of techniques that achieve this through learning/experience/data.
ML:
Tom M. Mitchell provided a widely quoted, more formal definition of the algorithms studied in the machine learning field: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E." [0]
AI:
"Computer science defines AI research as the study of "intelligent agents": any device that perceives its environment and takes actions that maximize its chance of successfully achieving its goals."
Ok, we are bogged down in semantics, but I see where you are coming from. I don't buy the proper-subset argument for the reasons above (i.e. I don't buy the broadening of AI to fit that idea of "intelligent agent", as it includes too many things that don't really fit, imo).
Unless something has changed radically since I stopped paying as much attention, there is no actual agreement on these terminologies, at least broadly, in academic circles.
I certainly agree many current systems with a core ML component include other techniques from lots of areas including what you call "classic AI" as well as optimization, etc., but the ML is still the fundamental part of nearly everything recent I've seen. As pretty much every successful system of this type is a hybrid in the sense you mean, I don't find differentiating them from some putative "pure ML" approach very interesting.
There was some good work in very different approaches in the 70s through 80s, but that seems to have tapered off in the 90s really. I'm not very current though and would love to hear of newer interesting things in that vein.
I appreciate this thoughtful and detailed reply. I was thinking all those things in my head while reading as well, but couldn't bring myself to invest the time to address them all. I got the impression that the author isn't someone with a lot of experience building real-world predictive models; otherwise they'd appreciate the trade-offs that sometimes need to be made to get something that works well and can be debugged/interpreted without too much trouble. Of course this isn't to say we shouldn't be striving to develop more interpretable solutions, but I don't think this paper is very helpful due to its lack of rigor and straw-man tactics.
I'm excited to see a couple lectures with Swift. All the work to add interoperability with Python and make swift-jupyter is very appreciated and feels like it's Xmas in June.
You can, but it's not ideal. What if you want to write your own CUDA kernel for your experiment? Python isn't really set up for this unless you want to throw odd C++ integrations into your code. Swift is designed to be a direct match to the underlying instructions. This would make deep learning much more expressive and flexible in Swift, with fewer errors.
This question is addressed extensively in the course. Check out the last two lectures, they do a great job of going over lots of different reasons.
That doesn't really make sense. The code you implement in Swift when in Jupyter needs to also be available at runtime to execute. Meaning you can do the exact same thing in Python, because your model architecture is going to be embedded in the exported model.
For custom kernel code, what you really want to use is a custom TF op. But I doubt that's what you're getting at anyway, because that's for more advanced use cases.
The goal is to allow Swift to be used for writing MLIR and XLA kernels. The new LazyTensor under development already allows for fused XLA operations to be created in Swift. There's an awful lot you can do in Swift which is very very hard to do properly in Python. I've got a bit more background on this here: https://www.fast.ai/2019/03/06/fastai-swift/
Edit: HN isn't letting me reply deeper, so I'll reply to "what are the benefits over C++?" here. The first is that MLIR has dialects that support stuff like polyhedral compilation, which result in much more concise and understandable code, which is often faster too. The second is that using the same language from top to bottom means you can profile/debug/etc your code in one place, which is much more efficient. And you don't have to learn two languages. And you don't have to use C++, which (for me at least) is a big win! ;)
MLIR is going to be orthogonal to C++ performance. You're talking about the efficiency of an intermediate representation. But that IR turns into TF opcodes, which then need to execute natively.
You can achieve the same efficiency in C++ via a custom TF op. You end up with native instructions either way. And you have access to the entire memmapped model in the op, allowing you to do as you please.
You're only debugging in one place, because the same C++ code runs during training in your notebook that runs during inference on your device.
You're getting ease of implementation and usage of Swift. But your code will be less portable; you won't be able to run the same model on Android or on the server. And there's not necessarily any performance benefit over doing the same in C++, definitely not if your kernel is simple.
Thanks. What is the advantage of using Swift over implementing a custom TF op in C++, and using its generated Python wrapper in Jupyter? Just not having to deal with C++?
If you need to debug something, you don't need to use some mixture of pdb and gdb.
Swift is a relatively young language, meaning it does not (yet) have weird hairy bits to work around design decisions made 20 years ago.
Similarly, Swift is still getting defined in many areas. There is (theoretically) the opportunity to influence language design decisions to patterns that mesh better with ML needs.
Yes, but at the cost of portability. You won't be able to run the same model on Android for instance.
You're still debugging in two places; in Swift and Python. Debugging Swift is probably easier than C++ though.
I think using Swift is a valid solution for special cases, but not the best solution for most cases. The TF authors already provide a suitable, general solution in the form of custom TF ops.
And if you don't need a custom kernel, and the chances are you don't, then stick with pure Python for maximum ease of use and portability.