
Do you guys have examples of people actually using this in production? I'm curious how it scales beyond dev.


We are currently in production on Boreal https://www.fiveonefour.com/boreal, our hosting solution for Moose, with F45 https://f45training.com, a global fitness studio brand. We wrote a case study with them here: https://www.fiveonefour.com/blog/case-study-f45. So we have a 24/7 consumer-facing deployment that we have been running for the last 5 months.

We are heading towards 1.0 from an API perspective; we just landed what we internally call DMV2, the latest iteration of the abstraction level for the API. Think SST / Terraform CDK, vertically integrated for data.

If you are looking to work with Moose in production we would love to chat with you :)


Hi Zephyr! I'm the Head of Engineering at F45 Training. We had early access to Moose, and we've been using it in production since last year with thousands of our members. We use Moose to manage the backend for LionHeart, our in-studio heart rate tracking system. We also use Moose's paid hosting service, Boreal. It's a new product, so it's still a bit rough around the edges, but it has scaled really well for us and the 514 team has been terrific.


Very cool to see what used to take a team years to build in a simple, intuitive OSS package. Getting a stack like this up and running in 20 lines of python out of the box would have been unthinkable 10 years ago. Congrats to the team. Can't wait to see where you take this!


Hi, I am one of the founders of SigOpt (acquired by Intel in 2020) and I am happy to answer any questions people may have!

You can also jump right to the code here: https://github.com/sigopt/sigopt-server


Hi, I'm one of the founders of SigOpt (Scott Clark) and still working with the team after our acquisition by Intel in 2020. I am happy to answer questions.

I am incredibly proud that the SigOpt product that has helped thousands of researchers worldwide is now completely open to the broader community.


There are also several papers and blog posts diving into the details and tradeoffs of different Bayesian optimization approaches and components here [0]. Example: Covariance Kernels for Avoiding Boundaries [1].

[0]: https://sigopt.com/research/

[1]: https://sigopt.com/blog/covariance-kernels-for-avoiding-boun...


This NVIDIA post goes into extending Bayesian optimization to multiple metrics [0]. It shows how you can use efficient optimization to find a good Pareto frontier [1] (a small sketch of the dominance check is below the links).

[0]: https://devblogs.nvidia.com/sigopt-deep-learning-hyperparame...

[1]: https://en.wikipedia.org/wiki/Pareto_efficiency
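
For intuition, here is a minimal, hypothetical sketch of what "finding the Pareto frontier" means once you already have a set of results across two metrics; the candidate numbers are made up, and lower is assumed to be better for both:

  def dominates(a, b):
      # a dominates b if it is at least as good on every metric and
      # strictly better on at least one (lower is better here).
      return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

  def pareto_frontier(points):
      # Keep only the points that no other point dominates.
      return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

  # Made-up (latency ms, error rate) results for a few candidate models.
  candidates = [(10, 0.20), (12, 0.15), (30, 0.05), (35, 0.06), (50, 0.04)]
  print(pareto_frontier(candidates))  # (35, 0.06) drops out: (30, 0.05) dominates it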


Hyperopt also uses TPEs [0]; this may be a variant/fork of it. A minimal usage sketch is below the link.

[0]: http://hyperopt.github.io/hyperopt/
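
For anyone who hasn't used it, this is roughly what hyperopt's TPE interface looks like; the toy objective and search range are made up for illustration:

  from hyperopt import fmin, tpe, hp

  # Minimize a toy 1-D function with the TPE algorithm.
  best = fmin(
      fn=lambda x: (x - 2) ** 2,
      space=hp.uniform('x', -10, 10),
      algo=tpe.suggest,
      max_evals=100,
  )
  print(best)  # e.g. {'x': 2.0...}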


This can be a very difficult market, but there are a handful of different projects designed to help with this. In-Q-Tel [0] has been accelerating adoption of new technology for the Intelligence Community for many years and often invests in and helps deploy technology from startups of many sizes. SBIRs [1] can be an effective way to quickly get gov money for research and even culminate in a procurement vehicle for sole source contracts if you get through Phase III. There are also a handful of accelerators like DIU [2] and MD5 [3] designed to help small firms navigate this difficult space. It still isn't easy, but it can help level the playing field a bit when you are just a startup.

[0]: https://www.iqt.org/

[1]: https://www.sbir.gov/

[2]: https://www.diu.mil/

[3]: https://community.md5.net/md5/landing


Which in this case nets you >$1M [0] vs the $80k quoted above. This assumes 20% dilution per round over 6 rounds and no retention grants (same arithmetic in code below).

  [0]: 0.02*0.8^6*$200,000,000 = $1,048,576
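
The same arithmetic, generalized so you can plug in your own numbers; the starting equity, dilution rate, round count, and exit value are just the assumptions from [0]:

  def payout(equity, dilution_per_round, rounds, exit_value):
      # Equity is diluted by the same fraction each round, then paid out at exit.
      return equity * (1 - dilution_per_round) ** rounds * exit_value

  print(payout(0.02, 0.20, 6, 200_000_000))  # 1048576.0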


To be fair, the hyperparameter tuning behind these AutoML systems is getting fairly robust. Google bases theirs on Vizier [0]. The Amazon SageMaker group has people from the GPyOpt project [1]. There are also tons of open source projects out there to help with non-enterprise projects [2] [3]. There are also stand-alone companies that help with this explicitly for enterprises [4] (caveat: I am a founder).

Increasingly I think more time will be spent on the creative/bespoke aspects you mention later in your post, like making sure that you are building a system that actually achieves some business value (vs just getting a better academic-oriented metric result). Hyperparameter tuning is basically high-dimensional, non-convex optimization of functions that are time consuming and expensive to sample. Hand tuning is a terrible way to approach this, and, as you point out, it is different for each problem. Experts can leverage their domain expertise and the unique aspects of their data, models, and applications in much better ways. (A rough sketch of what the open source tooling looks like is after the links below.)

[0]: https://www.kdd.org/kdd2017/papers/view/google-vizier-a-serv...

[1]: https://github.com/SheffieldML/GPyOpt

[2]: https://github.com/Yelp/MOE

[3]: https://github.com/hyperopt/hyperopt

[4]: https://sigopt.com
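
As a rough sketch of what this kind of tuning looks like with one of the open source libraries above (GPyOpt [1]); the objective here is a stand-in for "train a model and return validation loss", and the two parameters and their ranges are made-up assumptions:

  import numpy as np
  import GPyOpt

  # Stand-in for an expensive training run; returns a fake "validation loss".
  def objective(x):
      lr, dropout = x[:, 0], x[:, 1]
      return ((np.log10(lr) + 2) ** 2 + (dropout - 0.3) ** 2).reshape(-1, 1)

  domain = [
      {'name': 'learning_rate', 'type': 'continuous', 'domain': (1e-4, 1e-1)},
      {'name': 'dropout', 'type': 'continuous', 'domain': (0.0, 0.6)},
  ]

  opt = GPyOpt.methods.BayesianOptimization(f=objective, domain=domain)
  opt.run_optimization(max_iter=20)
  print('best params:', opt.x_opt, 'best value:', opt.fx_opt)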


Are you sure this is actually being used for "AutoML" type services? All of the mentioned methods require a parameter search, which is computationally infeasible in a "quick" AutoML use case and expensive when you actually need it. That is, you more or less run several training sessions in parallel and learn from whichever performs best when choosing the next parameters. You don't do a full grid search (that's completely infeasible most of the time); at best you tweak only a few parameters, and you don't do it every time you train. Hyperparameters aren't just the learning rate and weight decay; they also include the size and number of layers, where, when, and by how much to quantize, the structure of the network, the parameters of pooling, etc. I'd say we're still pretty early in the game with all of that, especially when it comes to efficient architectures that demonstrate high accuracy.


I agree that this isn't as common for most end-to-end "AutoML" systems that take a CSV, do light feature engineering/combinations, pipe it into a random forest / GBDT, and then output a model. For many of those approaches there are fewer parameters to tune, and you don't get as much lift from tuning them right. Often it is more about quantity of models and ease of use than quality. I do think quality will increasingly matter, though, so some tuning will start to be used as the volume, variety, or complexity of the models in these systems increases, or as the value of the models themselves increases.

However, for more complex model pipelines where an expert is probably involved, there are lots of tools to help with this, and it is quickly becoming automated and less of a "dark art." Some of these tools are built into cloud frameworks (Google/Amazon), some are built into open source platforms (like Katib in Kubeflow), and others are entire companies building model experimentation platforms (like SigOpt). Many of these can handle everything from traditional hyperparameters like learning rate to architecture parameters to tuning feature embeddings, all at once [1] (a rough sketch of such a mixed search space is below the link). I agree with the original author that playing with parameters and doing trial-and-error optimization of hyper-, architecture, or feature transformation parameters will largely stop happening in the manual way it is done today. All of these methods are orders of magnitude quicker than standard brute-force approaches.

Otherwise, I think you are completely right that there are a ton of aspects of modeling that require domain expertise and nuance beyond pulling a model off the shelf. I think a lot of that comes down to picking the model, picking the data that matters, picking the objective that actually solves the problem for the task at hand, etc. I believe less of that will be high-D non-convex optimization done manually.

[1]: https://aws.amazon.com/blogs/machine-learning/fast-cnn-tunin...
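
As a concrete (but hypothetical) illustration of a mixed search space covering both classic hyperparameters and architecture parameters, here is what it might look like with the open source hyperopt mentioned above; the objective is a stub standing in for a real train-and-evaluate loop, and the parameter names and ranges are assumptions:

  from hyperopt import fmin, tpe, hp, STATUS_OK

  # Stub: a real objective would train a model with these settings and
  # return its validation loss. The formula below just fakes one.
  def objective(params):
      lr, n_layers, width = params['learning_rate'], params['n_layers'], params['layer_width']
      fake_val_loss = (lr - 0.01) ** 2 + abs(n_layers - 3) * 0.1 + abs(width - 128) / 1000.0
      return {'loss': fake_val_loss, 'status': STATUS_OK}

  space = {
      'learning_rate': hp.loguniform('learning_rate', -9, -2),      # roughly 1e-4 to 0.14
      'n_layers': hp.choice('n_layers', [2, 3, 4, 5]),               # architecture parameter
      'layer_width': hp.choice('layer_width', [64, 128, 256, 512]),  # architecture parameter
  }

  best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=50)
  print(best)  # note: hp.choice entries come back as indices into the lists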

