Chiming in to say that this is precisely our observation. The existing ML/DL libraries are not bad as far as those things go. In fact, PyTorch is an amazing library, IMO, especially compared to TensorFlow, Caffe, and the stuff that came before.
But as George points out in the article, unlike "traditional" software, ML requires iteration, data management, monitoring, specific infra requirements, and so on. So our take was that libraries would never be enough; hence the SaaS offering.
We optimized the training, annotation and deploy infra to minimize the time it takes to bring up a custom object detector. The current version only supports detection of object centers (as opposed to full bounding boxes) and tends to do best if the objects don’t vary too much in size.
I work at Nyckel. In fact, I'm the "ML guy" at Nyckel. I have a PhD in ML and did some research at Berkeley, but I mostly consider myself an ML engineer. My most recent job was in the self-driving car industry, leading an ML team there.
Knowing the math/stats is helpful when navigating the vast set of models to choose from when fitting your data. Although I'd argue that some sort of black-magic "intuition" earned by doing this for a long time is more important in practice...
However, when validating a model, there is really only one way: test it on production data. This is what Nyckel does: upload your production data, do some annotations, and see if it works. Nyckel handles model search, cross-validation, etc. for you, which reduces the risk of bugs. In a way, we are making the argument that by focusing on your data, you are most likely to do well.
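For readers unfamiliar with the mechanics: cross-validation just means repeatedly holding out a slice of your annotated data, training on the rest, and averaging the held-out scores. A minimal pure-Python sketch of the idea (the `train_majority` toy model and all names here are illustrative, not Nyckel's internals):

```python
import random

def k_fold_cross_validate(data, labels, train_fn, k=5, seed=0):
    """Estimate accuracy by k-fold cross-validation.

    data/labels: parallel lists; train_fn(X, y) returns a predict(x) callable.
    """
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint held-out slices
    scores = []
    for i in range(k):
        test_idx = set(folds[i])
        train_X = [data[j] for j in idx if j not in test_idx]
        train_y = [labels[j] for j in idx if j not in test_idx]
        predict = train_fn(train_X, train_y)
        correct = sum(predict(data[j]) == labels[j] for j in folds[i])
        scores.append(correct / len(folds[i]))
    return sum(scores) / k

# Toy "model": always predict the majority class seen in training.
def train_majority(X, y):
    majority = max(set(y), key=y.count)
    return lambda x: majority

data = list(range(10))
labels = ["a"] * 7 + ["b"] * 3
acc = k_fold_cross_validate(data, labels, train_majority)  # 0.7 here
```

The point of automating this is exactly the bug-risk argument above: the easy mistake is letting a training item leak into its own test fold, which this loop structurally prevents.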
But what about that pesky out-of-domain issue? Like the tank/cats or whatever? Well, our customers are not trying to develop AGI, but solve narrow problems using image and text classification. And they are also doing it for themselves so they have all the incentives to be honest. Consider one example use-case from a health food store we work with: "what type of legume (from the 10 I offer in bulk) is in this picture"? As long as they train and test on production data from the warehouse camera stream, they are in good shape from a statistical perspective. Sure, if they throw in a picture from anywhere else, they are toast, but why would they?
I believe it is a very common mistake for intelligent people to assume that others will behave at least reasonably. But in my experience, when people do AI without understanding it, all bets are off.
"Sure, if they throw in a picture from anywhere else, they are toast, but why would they?" Since you list a Barcodeless Scanner as an example, the manufacturer of strawberries might run a promotion for blueberries on their box. For a non-expert user, it is unimaginable that a model trained on 3D blueberries might be triggered by a 2D photo of blueberries.
Also, I'm going to go with your legume example. As soon as each new truck arrives, the intern runs out and takes photos of the legumes in their boxes for the AI training. He uploads the images to your website and trains a model. TADA! The model is deployed to production and starts causing issues. But the people working alongside the fancy new celebrated machine don't want to lose their job, so they silently fix what's going wrong. You've just reduced productivity by introducing a costly machine.
Turns out, the different suppliers arrive at different times of day, so the lighting is different. And different suppliers use different box types. But without expert domain knowledge, you wouldn't even consider that this might be a problem. Also, why do you assume the customer will verify their model on independently sampled production data? To someone lacking the domain knowledge, using the exact same set of photos for training and for verification seems just fine. Actually, it's a lot less work that way.
That's what I tried to get at with my blind driver analogy. An untrained person will do things that seem absurdly unreasonable to us. But to them, it's the logical choice. They lack the knowledge to properly understand why what they are doing might be problematic.
Based on your description, however, it sounds like you (and your team of experts) are actively working with this customer and giving them feedback on what to do and how to do it. Have you considered making that part of your offering?
"Use Nyckel to integrate state of the art machine learning into your application. Anyone can curate their data set with our ML platform. A quick chat with an experienced AI engineer helps identify the best model and training procedure for your use case. It only takes minutes to finish your first model. Once created, your functions can be invoked in real-time using our API."
I'm pretty sure any serious business user would be happy to spend $100 for a 15-minute chat with someone who checks that their data is OK and their approach is reasonable. And it's also a nice way to segment out those who'll never become paid users anyway.
Good question. We have been doing this for almost 2 years now and we still find new players almost every week! It's a bit of a wild west for sure.
I can't say what we do differently from everyone else, but a few things that we focus on:
* Speed: we train DL-based models in seconds, so you get real-time feedback on your model/data as you annotate and upload more. This is true for a few, but far from all, of our competitors. In our benchmarking we find that we still perform on par with the competition (at least in the "low-data" regime: https://www.nyckel.com/blog/automl-benchmark-nyckel-google-h...)
* Level of abstraction: many competitors expose ML knobs to their users, thinking it will improve the experience. We found that this induces "ML anxiety" in many. As a result, we have zero knobs: just focus on your data, and we do the rest.
* API: we have spent a ton of time developing clean API abstractions. Some competitors have great APIs, others don't.
* Cost: we are super cheap. Our lowest tier is $50. We don't charge for training or per function/model.
Chiming in on the weak labeling question: As of right now, you can use outside libraries like skweak to create weak labels offline and then PUT those using our API (https://www.nyckel.com/docs#update-annotation). This wouldn't cost anything since we only charge for invokes, but it requires some work.
We may look at adding weak labeling as a first-class feature of our site down the road, but we are not yet sure we need to. With the powerful semantic representations offered by the latest deep nets, we find that a small number of hand-annotated samples often suffices for the desired accuracy, which makes the whole annotation process simpler and faster. Of course, if you have data & evidence to the contrary, we'd love to take a look.
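For the curious, the "few labels on top of good representations" idea can be sketched in a few lines. Assuming you already have fixed embedding vectors from a frozen pretrained network (the 2-D vectors and class names below are made up for illustration), even a nearest-centroid classifier over a handful of labeled examples can separate classes:

```python
def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def nearest_centroid_fit(embeddings, labels):
    """Fit one centroid per class; predict the class of the nearest centroid."""
    by_label = {}
    for e, y in zip(embeddings, labels):
        by_label.setdefault(y, []).append(e)
    centroids = {y: centroid(vs) for y, vs in by_label.items()}

    def predict(e):
        def dist2(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b))
        return min(centroids, key=lambda y: dist2(centroids[y], e))

    return predict

# Pretend these 2-D vectors are embeddings from a frozen pretrained network.
train_emb = [[0.9, 0.1], [1.0, 0.0], [0.1, 0.9], [0.0, 1.0]]
train_lab = ["cat", "cat", "dog", "dog"]
predict = nearest_centroid_fit(train_emb, train_lab)
predict([0.8, 0.2])  # → "cat"
```

With only two labeled examples per class, the classifier generalizes, because the heavy lifting already happened in the embedding. That is the intuition behind needing fewer hand annotations than weak-labeling pipelines assume.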
Hi Cyril_HN! Thanks for your question. What you are asking for is sometimes called "part of speech" tagging. We currently don't support that but will add it down the road along with more advanced image outputs like detection.