To be fair, the hyperparameter tuning behind these AutoML systems are getting fa...

m0zg · on Jan 29, 2019

Are you sure this is actually being used for "AutoML" type services? All of the mentioned methods require a parameter search, which is computationally infeasible in a "quick" AutoML use case, and expensive in case you actually need it. That is, you more or less run several training sessions in parallel, and learn from which performs the best in choosing the next parameters. You don't do a full grid search (that's completely unfeasible most of the time), but you at best tweak only a few parameters, and you don't do it every time you train. Hyperparameters aren't just the learning rate and weight decay, it's also the size and number of layers, where and when to quantize and by how much, structure of the network, parameters of pooling, etc etc. I'd say we're still pretty early in the game with all of that, especially when it comes to efficient architectures that demonstrate high accuracy.

Zephyr314 · on Jan 29, 2019

I agree that this isn't as common for most end-to-end "AutoML" systems that take a CSV, do light feature engineering/combinations, pipe it into a random forest / GBDT, and then output a model. For many of those approaches there are fewer parameters to tune and you don't get as much lift from tuning them right. Often it is more about quantity of models and ease of use vs quality. I do think that quality will increasingly help though so some tuning will start to be used as the volume, variety, or complexity of the models in these systems increases or the value of the models themselves start to increase.

However, or more complex model pipelines where an expert is probably involved there are lots of tools to help with it and it is quickly becoming automated and less of a "dark art." Some of these tools are built into frameworks like Google/Amazon, some are built into open source platforms (like katib in kubeflow), and others are entire companies building model experimentation platforms (like SigOpt). Many of these can handle everything from traditional hyperparameters like learning rate to architecture parameters to tuning feature embeddings, all at once [1]. I agree with the original author that playing with parameters and doing trial and error optimization of hyper-, architecture, or feature transformation parameters will largely stop happening in the manual way it is done today. All of these methods are orders of magnitude quicker than standard brute-force approaches.

Otherwise, I think you are completely right that there are a ton of aspects of modeling that require domain expertise and nuance beyond pulling a model off the shelf. I think a lot of that comes down to picking the model, picking the data that matters, picking the objective that actually solves the problem for the task at hand, etc. I believe less of that will be high-D non-convex optimization done manually.

[1]: https://aws.amazon.com/blogs/machine-learning/fast-cnn-tunin...