Ask HN: What ML platform are you using?
149 points by speedylight on March 13, 2022 | 83 comments
I am interested to know what ML platforms you use for personal/hobbyist projects. Do you rent GPU instances from the likes of Azure, GCP, or AWS, or do you use managed solutions like Paperspace Gradient or Colab? Why or why not?

I am very much a beginner in the space of machine learning and have been overwhelmed by the choices available. Eventually I do want to build my own rig and just train models on that, but I don't have that kind of money right now, nor is it easy to find GPUs even if I could afford them.

So I am basically stuck with cloud solutions for now, which is why I want to hear personal experiences from HN folks who have used any of the available ML platforms: their benefits, shortcomings, which are more beginner-friendly, cost-effective, etc.

I am also not opposed to configuring environments myself rather than using managed solutions (such as Gradient) if it is more cost-effective to do so, or affords better reliability or better-than-average resource availability... I read some complaints that Colab has poor GPU availability since it is shared among subscribers, and that the more you use it the less time is allocated to you. I'm not sure how big of a problem that actually is, though.

I am very motivated to delve into this space (it's been on my mind a while) and I want to do it right, which is why I am asking for personal experiences on this forum: there is a very healthy mix of technology hobbyists and professionals on HN, and the opinions of both are equally valuable to me for different reasons.

Also, please feel free to include any unsolicited advice such as learning resources, anecdotes, etc.

Thanks for reading until the end.




> I am very much a beginner in the space of machine learning

While the (precious and useful) advice around seems to cover mostly the bigger infrastructures, please note that

you can effectively do an important slice of machine learning work (study, personal research) with just a battery-efficiency-level CPU (not GPU), on the order of minutes, on a battery. That comes before going to "Big Data".
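
(To give a concrete sense of scale - a minimal sketch, assuming Python and scikit-learn, neither of which is required: a small feedforward net on scikit-learn's bundled digits dataset trains in seconds on a laptop CPU.)

    # Small feedforward net, CPU only; trains in a few seconds.
    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    X, y = load_digits(return_X_y=True)  # 1797 8x8 grayscale digits
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300, random_state=0)
    clf.fit(X_train, y_train)
    print("test accuracy:", clf.score(X_test, y_test))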

And there are lightweight tools: I am currently enamoured with Genann («minimal, well-tested open-source library implementing feedforward artificial neural networks (ANN) in C», by Lewis Van Winkle), a single C file of 400 lines compiling to a 40 kB object, yet sufficient to solve a number of the problems you may meet.

https://codeplea.com/genann // https://github.com/codeplea/genann

After all, is it a good idea to use tools that automate process optimization while you are still learning the trade? Only partially. You should build - in general and even metaphorically - the legitimacy of your Python ops on a good C ground.

And: note that you can also build ANNs in R (and other math or stats environments). If needed or comfortable...

Also note - reminder - that the MIT lessons of Prof. Patrick Winston for the Artificial Intelligence course (classical AI with a few lessons on ANNs) are freely available. That covers the ground before a climb into the newer techniques.


Note that this won't work with reasonably performant CNNs. Passing an image batch through a large-ish ResNet takes half a second on our GPUs, and several minutes at full load on a CPU. This makes training infeasible, and most models small enough to work on a CPU are so far from state-of-the-art that you can't do any worthwhile computer vision research with them.
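
A rough way to measure that gap on whatever hardware you have (a sketch, assuming torch and torchvision; random weights are fine for timing a forward pass):

    import time
    import torch
    import torchvision

    model = torchvision.models.resnet50().eval()
    batch = torch.randn(32, 3, 224, 224)  # one image batch

    def time_forward(device):
        m, b = model.to(device), batch.to(device)
        with torch.no_grad():
            m(b)  # warm-up pass
            if device == "cuda":
                torch.cuda.synchronize()
            t0 = time.time()
            m(b)
            if device == "cuda":
                torch.cuda.synchronize()
        return time.time() - t0

    print("cpu :", time_forward("cpu"))
    if torch.cuda.is_available():
        print("cuda:", time_forward("cuda"))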


Yes, but note on the other hand that simpler infrastructure, such as a single-digit-GB GPU you could buy and install in your workstation, can be similarly frustrating, because you may easily run into its limits (as in, "I got this semi-specialized equipment and I cannot get an output above 1024x768?!").

So, while one is learning, there is a case for being conservative and working directly with the tools at hand, which will be revealing about scalability requirements, often optimistically: you do not need a full lab to do (reasonable) linear regression, nor to train networks for OCR, and certainly not to get acquainted with the various techniques in the discipline.

When the needs grow, it sometimes will not be merely high-end consumer equipment that solves your problem, so on the hardware side some practical notion of the actual constraints of scale will help orientation. You do not need a GPU for most pathfinding (nor for getting a decent grasp of the techniques I am aware of), and when you want to produce new masterpieces from a Rembrandt "ROM construct"¹ (or much humbler projects), a GPU will not suffice.

(¹reprising the Dixie Flatline module in William Gibson's Neuromancer)


Why start with vision? Do some language models. I used to train those all the time on my laptop.

GPT 5MB for the win. It really works.
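
Not GPT, but to show the flavor: a character-level bigram model in PyTorch trains on a laptop CPU in seconds (a sketch; the text and sizes are made up for illustration).

    import torch
    import torch.nn as nn

    text = "hello hacker news " * 200
    chars = sorted(set(text))
    stoi = {c: i for i, c in enumerate(chars)}
    data = torch.tensor([stoi[c] for c in text])
    x, y = data[:-1], data[1:]  # predict the next character

    model = nn.Sequential(nn.Embedding(len(chars), 32), nn.Linear(32, len(chars)))
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)

    for step in range(200):
        loss = nn.functional.cross_entropy(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    print("final loss:", loss.item())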


...I am curious, now that I know about Fabrice Bellard's LibNC (bellard.org/libnc), if that «image batch through a large-ish ResNet» would be faster using this library - which can work on both CPU and CUDA...


Fast CPU transformers: https://bellard.org/libnc

Fast CPU convolutions: https://NN-512.com

Both are completely stand-alone (no external dependencies).


> Fast CPU transformers: https://bellard.org/libnc

And especially, from Fabrice Bellard (QEMU, FFMPEG...)

I do not know how you found it: it is not even in his site's index!

--

I see that NN-512 is a personal project of yours: congratulations! Though it seems to be a go-lang application that generates specialized C for convolutional NNs... Not a general purpose library, not for beginners.


FWIW this is the first link on the index page: https://bellard.org/nncp/ which mentions libnc as its underlying ML library.


(Yes, well, NNCP is an attempt to perform lossless data compression through ANNs - which is quite interesting, and of definite theoretic interest, though not yet practical for its stated purpose, e.g. because of speed and hence presumably power efficiency. The other is the invention of water.

It is like "Let me show you my new idea for a cupboard..." - ok, nice! - "...I created through a new lightweight portable all-purpose "Fabrice Bellard"-quality Swiss army knife for automation that operates on any material and that you may use if you want" - YES!? REALLY?... Metaphors do not come close.

This, LibNC, is an Artificial Intelligence engine signed Fabrice Bellard, in low level implementation... It's a "revolution".)


I worked in Google Research for over 5 years doing Machine Learning, and recently quit to build my own ML start-up. These days, I solve a mix of NLP, computer vision, and tabular problems, all with state of the art neural network techniques. I've tried many setups.

My advice is to go with Colab Pro ($50/mo) and TensorFlow/Keras. You can go with PyTorch too if you prefer.

I made the mistake of buying a 2080Ti for my desktop thinking it would be better, but no. Consumer grade hardware is nowhere near as good/fast as the server grade hardware you get in Colab. Plus you have the option to use TPUs in Colab if you want to scale up quickly.

You really don't need to get fancy with this setup. The best part of using Colab is you can work on your laptop from anywhere, and never worry about your ML model hogging all your RAM (and swap) or compute and slowing your local machine down. Trust me, this sucks when it happens, and you have to restart!

As for your data, you can host it in a GCS bucket. For small data (<1TB) even better is Google Drive (I know, crazy). Colab can mount your Google Drive and loads from it extremely quickly. It's like having a remote filesystem, except with a handy UI and collaboration options, and an easy way to inspect and edit your data.
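
Mounting Drive from a Colab cell is two lines (this is the standard Colab API; the data path is just an example):

    from google.colab import drive
    drive.mount('/content/drive')

    # Files then appear under /content/drive/MyDrive/ and read like local files:
    import pandas as pd
    df = pd.read_csv('/content/drive/MyDrive/my_dataset.csv')  # hypothetical path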


(OP, please don’t subject yourself to TensorFlow/Keras. The moment Jax became available on TPUs publicly, the moment I stopped using TF. And boy oh boy, “never looked back” is an understatement. I still cringe remembering all the time I spent trying to get tf.function to just please, please work, like a housewife alarmed that neither her partner nor herself are able to actually work.)


To support your point with data, here is a graph of usage of TF vs PyTorch in papers over time: https://horace.io/pytorch-vs-tensorflow/


I've seen this chart too. But research != Production. I bet that TensorFlow is still more commonly used in serving than Pytorch because of great tooling like TensorFlow Serving. I could be wrong though, as I'm not up to date with the latest in the Pytorch ecosystem.


FWIW PyTorch has TorchServe nowadays which does the same thing as TF Serving.


Great advice! BTW, your startup https://creatorml.com/ is very cool, what a creative idea.


Thank you! Feel free to reach out to me on Twitter or Discord (linked on the homepage) if you want to chat.


Note that Colab Pro is $10/month and Pro+ is $50/month.

The $10 is more than enough for learning Deep Learning.


100% agreed. Start with the $10 plan. I forgot how much it cost for the middle tier. The one benefit of the higher tier is you get access to better GPUs, and can run multiple colabs in parallel effectively getting multiple accelerators at once if you're doing distributed hyperparameter tuning.


> The one benefit of the higher tier is you get access to better GPUs

I think it's the same GPUs as Pro. I'm actually surprised you don't recommend buying a 2080Ti. The best GPU you get with Colab Pro+ is a P100, which is slower than a 2080Ti. If you can afford it, having your own GPU workstation is a much better experience than dealing with Colab.


+1, and you can even connect your Colab to a GCP Marketplace Colab runtime that has no time limit (but will cost you) if you e.g. need to run something for a few days (although then you don't get the awesome Google Drive mounting - hope they fix this eventually).


Honestly, I've found that most ML tooling is overly complicated for most ML projects.

I use a Paperspace VM + Parsec for personal ML projects. Whenever I've done the math, the hourly rate on a standard VM w/GPU beats purchasing a local machine, and the complexity of a workflow management tool for ML just isn't worth it unless you are collaborating across many researchers. As an added bonus, you can re-use these VMs for any hobby gaming you might do.

The majority of ML methods train quickly on a single large modern GPU for typical academic datasets. Scaling beyond 1 GPU or 1 host leads into big-model research. While big models are a hot field, this is where you would need large institutional support to do anything interesting. A model isn't big unless it's > 30 GB these days :)

Even in a typical industrial setting, you'll find the majority of scientists using various python scripts to train and preprocess data on a single server. Data wrangling is the main component which requires large compute clusters.


I put together a Linux box with a 2080Ti a few years ago and have been using it consistently for personal ML research ever since. I've found it well worth the investment and learned that the ease with which I can jump into hacking on a project is key, which is why this works so well for me. I can just ssh in at any time and start experimenting with models. Even if it's not technically economical when you do the math, the ease, reliability, and the fact that I know my models aren't being billed per hour encourage me to experiment often, which is key to learning.

As for software, I do everything with JAX and TensorBoard for viewing experiments. JAX is a phenomenal library for personal ML learning as it's extremely flexible and has relatively low-level composable abstractions.
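
A taste of what "low-level composable abstractions" means in practice (a minimal sketch - just jax.grad and jax.jit wrapped around a plain Python loss function):

    import jax
    import jax.numpy as jnp

    def loss(w, x, y):
        pred = jnp.dot(x, w)
        return jnp.mean((pred - y) ** 2)

    grad_fn = jax.jit(jax.grad(loss))  # compiled gradient of the loss w.r.t. w

    w = jnp.zeros(3)
    x, y = jnp.ones((8, 3)), jnp.ones(8)
    for _ in range(100):
        w = w - 0.1 * grad_fn(w, x, y)  # plain SGD step
    print(loss(w, x, y))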


What do you do wrt CUDA and Linux? I'm a Linux person, but every time I try to mess around with CUDA the whole thing gets super annoying. I don't want to have to reinstall everything every time there is a kernel upgrade. Maybe there is some trick with WSL2 now?


I can't speak to using WSL2 on Windows, but on my Linux System76 GPU laptop I get around CUDA configuration time sinks by not updating my configuration for long periods of time. I don't mind spending setup time once every 6 months, but I don't want to waste time doing it frequently. System76 has a new container-oriented CUDA setup that is OK, but I liked just setting everything up on my own, and then not modifying anything for as long as possible.
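
A quick sanity check I find useful after any driver or kernel change (a sketch, assuming PyTorch):

    import torch

    print(torch.__version__, torch.version.cuda)  # CUDA version the build expects
    print(torch.cuda.is_available())              # can the driver/runtime be reached?
    if torch.cuda.is_available():
        print(torch.cuda.get_device_name(0))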


Get a decent NVIDIA GPU. Then install PyTorch and off you go. I advise making all your own tooling, as you likely have a specific use-case, and so your tooling can be tailored to that. Most ML tools are very generic, or so simple you might as well build them yourself. The advantage of having your own box is that (1) you'll learn some systems skills building it and (2) since you invested in it, you should feel obligated to use it! Good luck.


I would say the exact same thing if the circumstances weren’t so dire, but an NVIDIA GPU is so expensive nowadays that it might be a bit better to use rented services (like the paid version of Google Colab) for now, if you have any monetary constraints.

Maybe GPU prices will stabilize after Ethereum switches to PoS and manufacturing pipelines get back to normal, but then I'm not that sure after seeing the US trying to go ham with sanctions all over the place.


I have a 3090 for serious (hobby) work and a 1070 from 4ish years back next to my bed. I think getting anything 1070 or better is good enough at the beginner level (training baby datasets/models from scratch such as MNIST/CIFAR, transfer learning the big models). I just don't understand the cost argument .. you can get this stuff used. Main thing is you need CUDA.

The 3090 machine gets about the same use as the 1070 in my case. While it is nice to have more GPU memory to have huge batches and train things faster, this is a quality of life improvement/bragging to be honest. Serious work in some sub-areas needs multi-GPUs or enterprise grade hardware (e.g. A100s).

Software-wise, I just use PyTorch/PyTorch Lightning/Keras, and Anaconda.

Edit: I used to build my own machines in my younger days. The two machines I spoke of above are Alienware. Got them on Black Friday sales. Cost-wise, they were ridiculously cheap for the power they give/impact on my career.


The problem is that even a 1070 is ridiculously expensive these days (about $400 on Newegg; that was the cost of an RTX 2070 a few years ago!) If you can get a used GPU from a friend that would be great, but other than that you're going to have to shell out hundreds of dollars for an old GPU that you will probably have to upgrade soon.

I don't know the OP's financial situation, but if you're a poor student then these things certainly matter.


Good points .. however, my advice for the cash-strapped students on HN: buying 1 good machine will last you 4+ years easily these days. Buying a system like an Alienware with a 3070-3080 was under 2K Canadian over the recent Black Friday shopping event. Over 4 years, that is $500 a year. If you are studying CS, it is absolutely worth buying a decent machine. You don't need a top-of-the-line machine, but you need to be able to study your craft on something more powerful than a Raspberry Pi or a junky old machine.

I was quite poor growing up, and I recall buying a 3K machine when I started undergrad (that was crap hardware by today's standards). And I have no doubt that having this machine helped me get my first job, and things got better from then on. If you are in CS, think of it as an investment, and make it pay off!

Btw, I am trying to be positive .. pls don't construe anything here as negative. I appreciate that money is tight for a lot of folks. Paying 10% interest to buy a 3K machine is not a good idea!! I just skimped a lot as a student, and some of it was quite pointless. I wish someone explained this to me, and hence my comment.

If you are in a situation where you'd have to take a loan to buy a system, pls don't feel like you need a GPU to do anything useful. I am certain one can make do with just colab and a web browser. Good luck to all the students out there!! Life is hard at that stage .. it gets far easier once you have a paying job in the field.


You do not necessarily need a high-performance GPU to step into ML. I'm running a couple of my hobby projects either on my notebook or on old, discarded former server hardware with an onboard GPU.


+1 on that. I had a Mac Pro 5.1 that was surplus to requirements. It's 10 years old with 64GB of RAM and 2 x 6-core processors. I stripped off the OS and installed Ubuntu along with an nVidia 750 - $100. For learning purposes and running hyper-parameter tuning it is pretty robust.


And you can run your own github/gitlab runner on the GPU box and set up CI in your projects. It's good practice (and good practise). And it's free. This unchains you from your PC so you can push code from anywhere and it'll crunch the numbers for you. You just need to install docker, run the runner and make sure you're using a runner container with the NVIDIA stuff installed and you pass through the GPU to the container.


It is very different when you are paying for DL compute yourself, not as part of a job. I have mostly worked in DL for 7 years, but I also have your use case of running my own experiments and simply wanting to learn new things on my own time.

I am biased towards using Keras and I suggest you bookmark these curated examples https://keras.io/examples/

I bought an at home GPU rig 3 years ago and I regret that decision. As many other people here have mentioned Google Colab is a great resource and will save you so much time because you will not be setting up your infrastructure. Start with the free version and when you really need to, switch to Pro or Pro+.

For more flexibility, set up a GPU VPS instance that you can stop when not in use to save money. I like GCP and AWS, but I used to use Azure and that is also a great service. When a VPS is in a stopped state, you only pay a little money for storage. I will sometimes go weeks without starting up my GPU VPS to run an experiment. Stick with Colab when it is good enough for what you are doing.
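
On AWS, the stop-when-idle workflow can even be scripted in a few lines (a sketch using boto3; the region and instance ID are placeholders):

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")
    ec2.start_instances(InstanceIds=["i-0123456789abcdef0"])  # before a session
    # ... ssh in and run the experiment ...
    ec2.stop_instances(InstanceIds=["i-0123456789abcdef0"])   # afterwards you pay only for storage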

Now for a little off topic tangent: be aware that most knowledge work is in the process of being automated. Don’t be disappointed if things you spend time learning get automated away. Look at the value of studying new tech as being very transitory, and you will always be in the mode you are in right now: a good desire to learn new things. Also, think of deep learning in the context of using it for paid work to solve real problems. As soon as you feel ready, start interviewing for an entry level deep learning or machine learning job.


Honestly, if you're a beginner in the machine learning space, you're not going to need GPUs for a LONG time, and would benefit from learning what's going on under the hood. Install Python on your machine, learn to structure your projects well, environments and requirements, etc. if and when you need more, figure it out then.


I think Colab is very popular since it's free. Should be perfect for a beginner who doesn't want to spend money. I don't think there's a lot of lock-in so just try it and see. There are bigger questions to worry about, like should you use TensorFlow (no) or PyTorch (yes) or JAX (maybe). That's much harder to change later.


The best options for you are Paperspace Gradient and Colab. Both are free and managed.

Learn Machine Learning first. Do not spend time on managing infra for ML while you are learning ML. Focus on learning ML first.

You can make decent cutting edge models and SOTA classic models just with free options. I am saying this because I have done this.

I suggest that you get Colab Pro after that.

AWS burns a hole in your pocket, and you should not spend money on that now. Although AWS SageMaker is a pretty tension-free experience.

I personally use GCP. I find the tooling around it to be the most convenient.

I suggest you learn the basics first. Learn classic ML, CNNs, RNNs, LSTM, Transformers, learn the necessary Maths, and even GANs if you are inclined.

If done in the right way, it will take you anywhere from 5-6 months to 18-20 months, depending on your time commitment and your current grasp of programming and math.

Do not rush or hurry.

When you reach that point, you can think of spending serious money for Deep Learning projects.

A few months back, I got into TPUs, and these are fantastic. And GCP is my only option for them. I have only used TPUs for learning and personal projects and never for work. I intend to keep it that way for a while.


Can you please be specific on "necessary Math"? I'm trying to apply the Pareto principle and cut down the amount of time needed to brush up on what seems like all of the lower-division math courses.


My suggestion is to learn just a high-school amount of Differential Calculus and Linear Algebra. That much Statistics is not needed.

And do these the right way - forget about being able to prove stuff for a test, or remembering the heuristics for solving test problems, or being able to pick the correct option from many on a test.

Just forget how you studied for tests. Learn limited things - very, very deeply.

Learn why and how exactly each thing works. Each and every part.

The resources for these are-

1. Mathematics for Machine Learning: Linear Algebra (Coursera, Imperial)

2. - do - : Calculus

3. Essence of Linear Algebra Playlist: 3blue1brown

4. Essence of Calculus Playlist, Ibid

5. Khan Academy Statistics Playlist for High School

Again, understand each and every part very deeply.

This much Math is enough to get started with Machine Learning.

(You will need much much, much more if you want to be a Research Engineer or an Assistant Professor doing active research.

But you can chart your own path after a while.)

Then you start doing ML.

Then you learn whatever math is needed along the way.

Never, ever load your head with a bunch of math concepts just to "prepare" yourself for studying ML. I very highly advise against it.

So,

Learn very basic stuff, but make the concepts crystal clear -> start doing ML -> learn more math as you face the need.

Learning math is a noble and worthy goal. But do not confuse it with "learning math so that I can study ML".


Not the parent, but I would say you just need a little statistics and a little calculus and linear algebra. If you are interested in theory then you need more.

For statistics I would recommend "All of Statistics".

The level of calculus required is to know how to differentiate.

Algebra is more important. Any introductory linear algebra book would do. If you are able to multiply matrices and solve linear systems, you can postpone a topic until it is necessary - for example eigenvectors or matrix factorization.

My advice is to first jump into the pool and learn swimming as needed. But learn swimming, use the concrete problems to motivate yourself.


Two things I forgot to mention.

If you want to see what the curriculum of a "Math for ML" course from a top research uni looks like, you should check out the website of the Math for ML course offered by the Universität Tübingen [0]. If you know those topics, you will be able to read the math of most papers that you come across.

Secondly, the best way to get started with ML is to do the Andrew Ng classic on Coursera. Then move on to fastai [1]. Fastai is a fantastic learning resource and you will learn many nice things from Jeremy Howard that will help you make your own models. But do NOT limit yourself to fastai. It's crappy software - too many limitations, syntactic sugar, API anti-patterns, etc.

Learn PyTorch for full-fledged projects.

[0]: https://www.tml.cs.uni-tuebingen.de/teaching/2020_maths_for_...

[1]: https://fast.ai


I created something that lets you get a free GPU in VS Code with Google Colab in just one click. Have a look at https://github.com/DerekChia/colab-vscode

This is my default go-to as a poor man's ML setup, with the environment and dependencies set up automatically via a bash script on startup.


As some others have said, using a low power PC without an accelerator is a perfectly good place to start. This will get you 70% of the way there.

In terms of framework, PyTorch seems to be better documented than TensorFlow and supports a more intuitive model for GPU/TPU compute in my opinion. It also natively supports complex number types when backpropagating, so there is no need to implement your own. It also seems like TensorFlow has issues converting Python code to the graph, where PyTorch basically never has issues. It can take me a third less time to program using PyTorch because of this. If you are using high-level interfaces, this shouldn't be an issue though.
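
For example, complex autograd works out of the box in PyTorch (a minimal sketch):

    import torch

    z = torch.randn(4, dtype=torch.complex64, requires_grad=True)
    loss = (z.abs() ** 2).sum()   # real-valued loss of a complex input
    loss.backward()
    print(z.grad)                 # complex gradient, same shape as z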

Colab (and I believe SageMaker) has free instances with high-power GPUs/TPUs. However, I prefer having access to a good graphical debugger, so I develop on my local computer, then run large models on Colab. If you can afford it, I'd recommend a cheap, low-power CUDA-capable GPU for your local computer to develop the network, then use an IPython-based cloud solution when memory/compute becomes limiting. They are also a fine place to start out. It's just that having a graphical debugger can make you more productive.


I work at a large, well-known tech company, with surprisingly basic/non-existent ML until only very recently.

Using Redshift to do a lot of the heavy lifting and initial data preparation, then SageMaker for hosting models and scoring, and Tableau for dashboards.

While you can do training within SageMaker, we have a cluster of EC2 instances using H2O libraries (xgboost) to train, then wrap the resulting model as a docker image and deploy it to ECR and link to a SageMaker endpoint.
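
The training step in a pipeline like that can be as small as the following (a sketch using the plain xgboost package rather than the H2O wrapper mentioned above, with synthetic data):

    import numpy as np
    import xgboost as xgb

    X = np.random.rand(1000, 20)
    y = (X[:, 0] + X[:, 1] > 1).astype(int)

    dtrain = xgb.DMatrix(X, label=y)
    params = {"objective": "binary:logistic", "max_depth": 4, "eta": 0.1}
    booster = xgb.train(params, dtrain, num_boost_round=100)
    booster.save_model("model.json")  # the artifact baked into the Docker image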

Clunky and very much human-in-the-loop for training and deployment, but you can't run before you can crawl in this space.


I found Redshift to be far inferior to Snowflake as a data warehouse for marshalling any tables or views you need for ML work. There's lots of statistical functions available within Snowflake that will speed things up for you if you need pre-calculations on feature sets.


Are you just interested in the training part and managing the trained models, or you'd actually like to productionize the models and serve them at scale?

A lot of end-to-end platforms are available nowadays that try to cover the entire lifecycle of a model from data prep, ETL, to training, serving, monitoring, operating. However, I found none of them really robust enough to cover all these cases perfectly, so I resorted to using different pieces from different vendors combined with my own stuff to make the entire platform suit my needs. This is still not perfect, though, and I think there's a lot of room for improvement in the space to enable really easy to use and scalable MLOps.

Still, some of the tools I found to be OK: TensorFlow TFX, Kubeflow (to some extent - ops are a nightmare), Feast, MLflow; GCP Vertex and AWS SageMaker can get some work done, too.


I'd say for the foreseeable future I simply want to focus on training and running trained models. I don't plan to do anything at scale like launch a business, so the creating and training aspect is the one I want, and probably should, focus on at first either way.

But I like your approach of stitching together various vendors so they fit your use case, I think it can be really flexible but also probably more expensive and slightly harder to manage... I think it can be worth the tradeoff though.

Thank you for the input!


You make a good point there. Personally I’ve struggled quite a bit moving from one off models to taking them to production. Would you mind elaborating on what you mean by none of the platforms being robust enough?


I have been working on my master's thesis for the last 1.5 years, and my setup evolved from Colab -> Kaggle -> Azure ML.

Colab is great for diving into examples that are already premade for Colab.

Kaggle is better, in my opinion, at dataset handling; you can import a public dataset or upload your own with ease. They give you 30+ GPU hours per week with the ability to train your models in the background. This can't be done in Colab.

The Azure ML platform is next-level when you can pay for it. I got credits from school. You can start experiments from the Python SDK with your own configurations, set up Python environments, upload datasets, etc.


Try it out on a local Linux machine first, if you have one. There are plenty of ML techniques outside of neural networks which train perfectly well on a CPU, so I'd start there.

Look at some kind of AutoML framework like AutoGluon, then dive deeper on the components it uses once you've got through the initial setup process. AutoGluon will let you train some basic models with all the data cleaning and normalisation steps handled for you.
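
The AutoGluon tabular API is only a few lines (a sketch; the CSV paths and label column are placeholders):

    from autogluon.tabular import TabularDataset, TabularPredictor

    train_data = TabularDataset("train.csv")             # any tabular file or DataFrame
    predictor = TabularPredictor(label="target").fit(train_data)

    test_data = TabularDataset("test.csv")
    print(predictor.leaderboard(test_data))              # compares the models it tried
    predictions = predictor.predict(test_data)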


> I am also not opposed to configuring environments myself rather than using managed solutions

vast.ai has pretty low prices and gives you remote ssh into a GPU instance that you then have root on (albeit containerized).

Having a local GPU is effectively a requirement for doing "development" work (e.g. getting an architecture and/or codebase to the point where you would even be able to start training). Unfortunately, getting your own GPU is just absurdly expensive these days and probably not worth it. In the meantime, colab/kaggle/paperspace can be _okay_ as dev environments. Unfortunately, renting compute on vast.ai all day just to do occasional dev work gets expensive pretty quickly.

For something in-between vast.ai and AWS, datacrunch.io has slightly higher prices, with remote SSH into a server and a few more "niceties" that you get with a traditional cloud such as CPU instances and the ability to use those to pre-load data onto disk.

If and when you are able to get a GPU - just make sure to get nvidia as they have a stranglehold over the industry. The RTX cards are great - I've been doing tons of multimodal work on an RTX 2070 I bought pre-pandemic for around 350$. It only has 8 GiB of vram but is actually quite similar to a server-style V100 otherwise. I assume it probably costs like 2000$ these days.

If you're interested in the realm of running inference/training on giant models (say GPT-J, or 20B-parameter models), you may find yourself short on VRAM. Using libraries like DeepSpeed, you can split the work across multiple GPUs. I highly recommend investing time in learning multi-GPU libraries or framework-provided features like PyTorch's distributed data parallel, as the size of models becomes a limiting factor very quickly in the case of transformers. A sibling comment mentions that you will need institutional support for training such models. This may be true, unfortunately. All I will say is that if you are even mildly competent, the demand for that type of work has been increasing a lot lately.
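
A minimal distributed-data-parallel sketch, launched with `torchrun --nproc_per_node=2 train_ddp.py` (the model and data here are toy placeholders):

    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    def main():
        dist.init_process_group("nccl")          # torchrun sets the rendezvous env vars
        rank = int(os.environ["LOCAL_RANK"])
        torch.cuda.set_device(rank)
        device = torch.device(f"cuda:{rank}")

        model = DDP(torch.nn.Linear(128, 10).to(device), device_ids=[rank])
        opt = torch.optim.SGD(model.parameters(), lr=0.1)

        for _ in range(100):
            x = torch.randn(32, 128, device=device)
            y = torch.randint(0, 10, (32,), device=device)
            loss = torch.nn.functional.cross_entropy(model(x), y)
            opt.zero_grad()
            loss.backward()                      # gradients are all-reduced across GPUs here
            opt.step()

        dist.destroy_process_group()

    if __name__ == "__main__":
        main()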

Oh and yes, there is a new-ish site called Replicate (https://replicate.com/) that I have been using to let people run inference on models that I've trained without needing to be coders. A lot of people use Colab for this, but that platform is annoying to support in practice.


I like Colab for the most part since I'm biased towards Python, but being centered around Jupyter notebooks does have its shortcomings. Also, despite being a service offered by Google, I prefer PyTorch over Tensorflow.

For smaller projects, I generally find a Towhee pipeline (https://towhee.io/pipelines) that I then fine-tune on my 3080.


I haven't experienced any real issues with GPU availability on Colab, I suggest that you just go ahead and use it and wait with the premature optimization until you actually hit a wall and need it.

For general advice focused on beginners and ESPECIALLY practical, cheap and efficient methods and hacks to do DL, I recommend searching in https://www.fast.ai/ and their forums https://forums.fast.ai/

I'll try to search inside fast.ai for a more specific link to give. I know that one of their chief pieces of advice has been to use Colab and take advantage of the $300 free credit you get (per credit card) when signing up for Google Cloud, which you can use for DL.

Disclaimer - I'm one of the creators of DagsHub, we created the platform especially to help people like you with the difficulties of managing things like data and model versioning, experiment tracking, labeling, etc. we'd love to have you onboard, and thanks for reading until the end :)


We use Kubeflow at our shop. If you use a managed K8s offering, it’s quite simple to manage, and of course you can deploy all your other stuff alongside it using the same tool stack.


I have been using Google Colab and AWS for 'big' personal projects while I use my PC to train light models. Colab is neat because it's free and I like the style of their notebooks. However, you cannot always find an available GPU, and I wouldn't recommend free Colab for anything other than learning and experimenting with ML. I have been using an AWS Ubuntu server set up with PyTorch and had an OK experience, but you need to be careful about pricing and set up policies, as well as remember to turn off your machines when you're not working on them if you don't want your credit card to blow up. In the future, I might give Google Colab Pro a try, but most of the work I now do on the company's server.

Anecdote: When I was taking the 'Computing For Data Science' class, we had a task to learn to use AWS tools like SageMaker, the NLP bot, or DeepRacer and present them in class. The professor was also new to the whole AWS ecosystem. He opened many instances and left them running for a week, which ended up taking $1000 from his bank account... (Moral of the story: don't use AWS with the card where all your money is.)


Personally I stick with the classic SMLNJ


This made me smile


Every single time I see someone say “ML” I read it as meta language



I have been building Deploifai for a year. I built it for myself early on because I wanted to train machine learning models on the cloud, since we don't have the resources for a physical machine. I basically wanted to use my AWS account to create VMs with environments pre-configured, and simply start building my ML models. Deploifai sets up the VM with a pre-selected ML framework, NVIDIA drivers, and JupyterLab. It takes about 15 minutes to set up, but it ends up saving me quite a bit of time.

You can give it a try as well: https://deploif.ai (It says paid on the website, but just get on our Discord and message me). The platform now supports GCP and Azure as well. I am happy to guide you through as well. It's not complete, but in case you choose to go ahead with cloud, this could help you out :)

We'd also be happy to have someone try the tool!


For learning (and for development on most projects, until it actually comes time to train the real model) the K80 or whatever the lower tier is on Colab is fine as a GPU.

The problem with Colab IMO is that if it's your main platform, you'll be pushed to use notebooks for everything, which is not really a good practice. Whatever you use, I'd suggest focusing on building a real train.py script (I'm assuming you'll be using Python) that takes command line arguments for the hyperparameters. Don't get sloppy and just have things run as a bunch of cells.
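
Something like this shape, for example (a sketch; the flags and defaults are placeholders):

    import argparse

    def main():
        parser = argparse.ArgumentParser(description="Train a model")
        parser.add_argument("--lr", type=float, default=1e-3)
        parser.add_argument("--batch-size", type=int, default=64)
        parser.add_argument("--epochs", type=int, default=10)
        parser.add_argument("--data-dir", type=str, default="./data")
        args = parser.parse_args()

        # build the dataset, model and optimizer from args here, then run the loop
        print(f"training for {args.epochs} epochs at lr={args.lr}")

    if __name__ == "__main__":
        main()

It can then be run from a Colab cell with `!python train.py --lr 3e-4` or from any shell.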

If you are learning, my unsolicited advice is: don't use built-in datasets. Make sure you can write datasets/dataloaders yourself so you understand what is going on and can adapt to your own work. All the stock examples using built-in MNIST or whatever gloss over the most important parts of setting up the data.
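
Concretely, "write your own dataset" means implementing the small PyTorch Dataset protocol (a sketch; the rows stand in for whatever you parsed from your own files):

    import torch
    from torch.utils.data import Dataset, DataLoader

    class MyDataset(Dataset):
        def __init__(self, rows):
            self.rows = rows            # list of (feature_list, label) pairs

        def __len__(self):
            return len(self.rows)

        def __getitem__(self, idx):
            features, label = self.rows[idx]
            return torch.tensor(features, dtype=torch.float32), torch.tensor(label)

    rows = [([0.1, 0.2, 0.3], 1), ([0.4, 0.5, 0.6], 0)]
    loader = DataLoader(MyDataset(rows), batch_size=2, shuffle=True)
    for x, y in loader:
        print(x.shape, y)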


I want to add to your advice: not all of your code has to live in Colab. If you create a public repo on GitHub, then on Colab you can simply do a pip install using the git URI for your repo. Your GitHub repo will need to be set up as a proper Python library, but there are many simple examples you can find on the web.
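
In a Colab cell that looks like this (the repo URL is a placeholder for your own library):

    !pip install git+https://github.com/yourname/yourlib.git

    import yourlib  # now importable like any other package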

I find this technique to be particularly useful since the same GitHub based libraries that I use on Colab I sometimes also use from Common Lisp locally on my laptop using py4cl.


> All the stock examples using built in mnist or whatever gloss over the most important parts of setting up the data

Could you elaborate on this / provide a link to a tutorial explaining what’s going on?


One of my deal breakers when choosing tooling is how easy it is to move from a local environment to a distributed environment. Ideally, you want to start locally and move to a distributed env only if you need to. So choose a tool that allows you to get started quickly and move from there.

As an example: one of the reasons why I don't use Kubeflow is that it requires having a Kubernetes cluster up and running, which is overkill in many cases.

Check out the project I'm working on: https://github.com/ploomber/ploomber


I bought a 4x 2080Ti box from Lambda Labs a few years ago and am very happy with the purchase. It's powerful enough to train reasonably sized models, and I charge clients for the compute time at a rate slightly lower than what AWS or GCP would charge. I also mine ETH on it during downtime, and it's already more than paid for itself (I get "free" electricity from my commercial landlord.)

Finances aside, it's really nice being able to iterate locally on things like training/inference pipelines and model serving. My work is more toward the ML engineering space than it is research, so I don't spend much time in Colab.


For personal use I use my laptop for anything ML and Colab Pro for anything Deep Learning. The free edition of Colab is also great for learning Deep Learning.

I personally find most cloud providers annoying to use for personal use. You have to ask for permission to get access to a GPU that's not any better than what you get for free with Colab. Then there's all sorts of configuration you have to do. Colab is much easier, with basically zero wait time to go from logging in to starting to run code.

At work we use Databricks, which is too expensive for personal use.


I see some great recommendations in the thread already, but I think https://cocalc.com is definitely worth checking out if you consider yourself to still be more of a learner. Their focus seems to really be on helping people who are new to the field get started. It offers familiar Jupyter Notebook-like features so you should feel right at home.

I have no affiliation with them whatsoever :) Just a fan of what they’re doing.


In case you want to start creating batch jobs too I’d recommend checking out Orchest (www.orchest.io). It has a generous free tier and supports GPU instances. The platform itself is self-hostable too and open source (https://github.com/orchest/orchest).

The main advantages are its interactive pipeline editor, support for Jupyter notebooks in the pipeline/DAG context, and a simple way to specify environment dependencies. It also supports auto starting and stopping of instances, so you only pay for the compute necessary to run your data pipelines.

Disclosure, I’m one of the creators.


Colab is a cheap (free) way to start, though you won't be training very large models for very long (which you shouldn't be doing if you are a beginner). You learn a different set of skills when you put together your own rig and install/maintain the libraries, which is something I recommend everyone try just to gain an appreciation of devops skills. But that's not a necessary diversion for a beginner (and may needlessly increase the learning curve).


If you want to build a web application on top of your ML project, give https://hal9.com a shot. We designed Hal9 for ease of deployment and maximum compatibility with web technologies, so you can build ML apps with React, Vue, etc. We launched a couple of months ago and could use some early feedback and users. Thank you!


You can run on-demand training jobs on GCP Vertex AI Training. I'm not sure about the price point, but it's pretty useful both for development and training.


I think you have to figure out what kind of problems you want to solve. scikit-learn will run perfectly fine on your CPU and you might not need all this complexity beyond that.

I think you are getting side tracked by a bunch of people at a car show with their hood popped checking out the custom chrome engines each other have. It is a bit pointless to worry about if you don't even know how to drive yet.


I recommend to use Colab for learning because so many research papers publish their own examples as a Colab link nowadays, so you'll have plenty of stuff to try out and explore.

For the actual deployment in production, the only thing that's really affordable is if you send your own GPU workstations to a colocation hosting company. But that's a lot of work.


For inference, we extended KServe (previously KFServing from Kubeflow) to fit our on-prem cluster needs. Highly recommended!


BTW this question probably suits a poll.

https://news.ycombinator.com/newpoll


You need over 200 karma to create a poll. OP doesn't have that much, yet.


At work, I use an Azure Virtual Machine that I scale up/down according to needs. At home, I have a desktop with a GPU, running Linux.


What do you want to train and why?


Colab (Pro+?) should be enough until you decide to spend $500/mo.


And not one MLOps data platform was named in all of the comments.


PyPer built on PyTorch


Colab or RapidMiner.

Both of them work great for scratch projects.


Please give us a try:

https://elbo.ai - Train more. Pay less

We want to make ML tasks as cheap and as easy as possible. We can provision GPU nodes from multiple cloud providers (today we have 4 - TensorDock, AWS, Linode and FluidStack). You don't have to sign up with them, manage keys or passwords, AMI Images, VPCs, Subnets, Firewall rules, EBS volumes or worry about Colab closing your session, network transfer bills, GPU usage approvals, opening ports, billing surprises. We take care of all that and let you focus on learning ML.

I faced the same problem when I started learning ML and tried different cloud providers, Colab, Paperspace, custom PC with RTX30 series GPU. Most of the solutions were either very expensive or very complicated. I started building a tool for myself to deploy GPU nodes with a single command and thought it would be a nice product to have for other ML learners like me.

  1. Sign up at https://elbo.ai for the free tier.
  2. `pip3 install elbo`
  3. `elbo login` with your token (from signup)
  4. Jupyter Notebook in a single command in under 4 minutes (typically)- `elbo notebook`
  5. Setup a GPU node to work remotely over SSH using `elbo create`
  6. Submit ML tasks defined in a YAML file using `elbo run --config <config_file>`

Quick start guide - https://docs.elbo.ai/quick-start

CLI Reference - https://docs.elbo.ai/reference/cli-reference

Looking at our inventory today, you can get a decent Quadro 4000 GPU with 16 CPU and 32 GB memory for about $0.61 an hour.

    PRICE                 GPU CPU   MEM  GPU-MEM  PROVIDER
    $ 0.2700/h      Tesla K80   4   61Gb 12Gb AWS (spot)
    $ 0.6100/h    Quadro 4000  16   32Gb  8Gb TensorDock
    $ 0.9000/h      Tesla K80   4   61Gb 12Gb AWS
    $ 0.9180/h           V100   8   61Gb 16Gb AWS (spot)
    $ 0.9200/h    Quadro 5000   2    4Gb 16Gb FluidStack
    $ 0.9600/h          A5000   2   16Gb 24Gb TensorDock
    $ 1.4900/h          A4000  12   64Gb 16Gb FluidStack
    $ 1.4940/h            A40   2   12Gb 48Gb TensorDock
    $ 1.5000/h    Quadro 6000   8   32Gb  0Gb Linode 
    $ 1.5140/h          A6000   2   16Gb 48Gb TensorDock
    $ 2.1600/h   8x Tesla K80  32  488Gb 12Gb AWS (spot)
    $ 3.0000/h 2x Quadro 6000  16   64Gb  0Gb Linode 
    $ 3.0600/h           V100   8   61Gb 16Gb AWS
    $ 3.6720/h        4x V100  32  244Gb 16Gb AWS (spot)
    $ 3.7460/h        7x V100   6    8Gb 16Gb TensorDock
    $ 4.3200/h  16x Tesla K80  64  732Gb 12Gb AWS (spot)
    $ 4.5000/h 3x Quadro 6000  20   96Gb  0Gb Linode 
    $ 6.0000/h 4x Quadro 6000  24  128Gb  0Gb Linode 
    $ 7.3440/h        8x V100  64  488Gb 16Gb AWS (spot)
    $ 7.9200/h   8x Tesla K80  32  488Gb 12Gb AWS
    $ 9.8318/h        8x A100  96 1152Gb 80Gb AWS (spot)
    $13.0360/h        4x V100  32  244Gb 16Gb AWS
    $14.4000/h  16x Tesla K80  64  732Gb 12Gb AWS
    $24.4800/h        8x V100  64  488Gb 16Gb AWS
    $32.7726/h        8x A100  96 1152Gb 80Gb AWS

If you just need a dedicated machine on the cloud, then I would highly recommend our provider TensorDock (https://tensordock.com/). They have a good range of ML-capable GPUs and are cheaper than many other cloud providers.

We are just getting started, so if you hit any glitches or bugs, please email us at hi@elbo.ai

Thanks for reading till here and for your time!

EDIT: Updated formatting.


> I am very motivated to delve into this space (it's been on my mind a while) and I want to do it right, which is why I am asking for personal experiences on this forum given that there is a very healthy mix of technology hobbyists as well as professionals on HN, of which the opinion of both is equally valuable to me for different reasons.

Regarding personal experiences, I moved to ML engineering after almost 15 years in software development. I found it challenging at first to cope with the terminology and math. Although I was able to create data processing pipelines and simple models, it was still a mystery how it all worked. After a good year and a half of trying to teach myself ML, I decided that I needed formal education. After researching possible options that would work for my work schedule and skill level, the Stanford SCPD AI Certificate program seemed to be the best. Here are some useful pointers (in no particular order).

- This blog by Pavel helped me a lot, to understand what the course was about and how to approach it -- http://coldattic.info/post/122/

- Most of Stanford Lectures notes and slides are publicly available. CS229 is a good beginner class to take (http://cs229.stanford.edu/syllabus.html)

- The best and the most interesting IMO, is CS236 on Generative Modeling. It is taught by Prof. Ermon and his team. Some of the topics covered in class (especially Score based models) were mind blowing. Here is a talk by Prof. Ermon if you are interested in generative modeling (https://www.youtube.com/watch?v=8TcNXi3A5DI).

- If your math skills are a bit rusty, then you will have to practice and work a lot more. I found the TA sessions and office hours extremely helpful.

Some additional personal experiences:

- "Deep Learning with Python" by Francois Chollet (Creator of Keras) is a good book to get started. The code samples are in TF Keras and easy to understand and implement.

- Avoid TensorFlow if you can. It's unnecessarily complicated (personal opinion). You will find PyTorch and PyTorch Lightning much more approachable to start learning with.

- I also found Kaggle tutorials helpful for practical aspects of ML. For example: Categorical Variables (https://www.kaggle.com/alexisbcook/categorical-variables).

- Yannic Kilcher's ML News series is a great way to keep in touch with the latest events in ML (https://www.youtube.com/c/YannicKilcher). Also very entertaining :)

- Prof. Jeff Heaton has a bunch of good videos on practical ML applications - https://www.youtube.com/c/HeatonResearch

ML is very exciting and rewarding. Good luck on your new adventure! Feel free to reach out and I would be happy to help in any way.



