There is some nice information in the appendix, like:
“One training with a schedule similar to the one reported in the paper represents ≈ 30 GPU-days. We also roughly estimate that the total GPU-days used for running all our experiments to 5000, or ≈ 120k GPU-hours. This amounts to total emissions in the order of 20 tons of CO2eq.”
I am not in AI at all, so I have no clue how bad this is. But it’s nice to have some idea of the costs of such projects.
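For a rough sense of where a number like that comes from, here is a back-of-envelope conversion; the power draw and grid intensity are illustrative assumptions of mine, not figures from the paper:

```python
# Back-of-envelope: GPU-hours -> energy -> CO2eq.
# All constants below are illustrative assumptions, not the paper's actual inputs.
gpu_hours = 120_000            # the ~5000 GPU-days reported for all experiments
avg_power_kw = 0.4             # assumed ~400 W average draw per GPU, incl. overhead
grid_kg_co2_per_kwh = 0.4      # assumed grid carbon intensity

energy_kwh = gpu_hours * avg_power_kw
emissions_tons = energy_kwh * grid_kg_co2_per_kwh / 1000
print(f"~{energy_kwh:,.0f} kWh -> ~{emissions_tons:.0f} t CO2eq")
# ~48,000 kWh -> ~19 t CO2eq, i.e. the same order as the "20 tons" quoted above
```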
Assuming you're purchasing from someone with infinite carbon credits and you're spending it in an environment with infinite ability to re-sink the carbon. Sure.
To a greater or lesser degree depending on the action, I try to apply "that rigor" to myself, at least?
And yes, I think the world would be better off if more people considered how their decisions impact others, if that's what you're getting at, but it's unrealistic to expect everyone to care about other people - and of course entirely impossible to account for ALL variables.
How do you come up with a ratio that you consider a fair trade?
I'm really not sure how I'd personally set a metric to decide it. I could go with the stat that one barrel of oil is equivalent to 25,000 hours of human labor. That means each barrel is worth 12.5 years of labor at 40 hours per week. That seems outrageous though - off hand I don't know how many barrels would be used during the flight but it would have to be replacing way more than several engineers working for several years.
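For what it's worth, the arithmetic behind that 12.5-year figure only works out if you assume roughly 50 working weeks a year:

```python
# Sanity check on the barrel-of-oil labor equivalence quoted above.
hours_per_barrel = 25_000        # claimed human-labor equivalent of one barrel of oil
hours_per_year = 40 * 50         # 40 h/week over ~50 working weeks
print(hours_per_barrel / hours_per_year)   # 12.5 years of labor per barrel
```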
> 1. Different energy sources produce varying amounts of co2
Yes.
> 2. This likely does not include co2 to make the GPUs or machines
Definitely not, nobody does that.
Wish they did; in general I feel like a lot of beliefs around sustainability and environmentalism are wrong or backwards precisely because embodied energy is discounted. See e.g. the stats on Western nations getting cleaner, where a large - if not primary - driver of the improvement is just outsourcing manufacturing, so the emissions are attributed to someone else.
Anyway, embodied energy isn't particularly useful here. Energy embodied in GPUs and machines amortizes over their lifetimes and should be counted against all the things those GPUs did, do and will do, of which the training in question is just a small part. Not including it isolates the analysis to contributions from the specific task per se, and makes the results applicable to different hardware/scenarios.
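To make the amortization point concrete, a minimal sketch; the embodied footprint, lifetime, and utilization below are made-up assumptions, not measured values:

```python
# Attribute a share of a GPU's embodied (manufacturing) emissions to the compute used.
# All numbers are assumptions for illustration only.
embodied_kg_per_gpu = 150                   # assumed manufacturing footprint per GPU (kg CO2eq)
lifetime_gpu_hours = 5 * 365 * 24 * 0.7     # ~5-year life at ~70% utilization
kg_per_gpu_hour = embodied_kg_per_gpu / lifetime_gpu_hours   # ~0.005 kg CO2eq per GPU-hour

all_experiments_gpu_hours = 120_000         # the figure quoted from the appendix
embodied_tons = kg_per_gpu_hour * all_experiments_gpu_hours / 1000
print(f"~{embodied_tons:.1f} t CO2eq of embodied emissions attributable to the experiments")
# Roughly 0.6 t under these assumptions: a modest add-on to the ~20 t of operational
# emissions, though the split depends heavily on the assumed footprint per GPU.
```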
> 3. Humans involved are not added to this at all, and all of the impact they have on the environment
This metric is so ill-defined as to be arbitrary. Even more so in conjunction with 2, as you could plausibly include a million people in it.
> 4. No ability to predict future co2 from using this work.
Total, no. Contribution of compute alone given similar GPU-hours per ton of CO2eq, yes.
1. Yes, this is the default CO2eq-per-kWh factor from the tool cited in the paper, but it's actually very hard to know the source of the energy that powers the cluster, so the numbers are only an order of magnitude rather than "real" numbers.
2 & 4: I found that https://huggingface.co/blog/sasha/ai-environment-primer gives a good broad overview of AI's environmental impact (not only CO2eq, which is a limited view imo).
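To illustrate point 1, the same compute maps to very different emissions depending on the assumed grid intensity; the intensities below are rough ballpark values of mine, not the tool's defaults:

```python
# Emissions scale linearly with the grid's carbon intensity, so the energy mix
# of the cluster can shift the estimate by an order of magnitude.
energy_kwh = 48_000   # illustrative energy for ~120k GPU-hours (see earlier sketch)

# Rough ballpark intensities in kg CO2eq per kWh (assumptions, not official figures)
grids = {"hydro/nuclear-heavy": 0.03, "mixed grid": 0.3, "coal-heavy": 0.8}

for name, intensity in grids.items():
    print(f"{name:>20}: ~{energy_kwh * intensity / 1000:.1f} t CO2eq")
```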
> Also if it really matters, then why do it at all? If we’re saying hey this is destroying the environment and we care, then maybe don’t do that work?
Although it may not be the best way to quantify the impact, it gives a good overview of it. I would argue that it matters a lot to quantify it and to popularize such sections in any experimental ML paper (and they should in my opinion be the default, as is now the case for reproducibility and ethics statements).
People don't really know what an AI experiment represents. It may seem very abstract since everything happens in the "cloud", but it is pretty much physical: the clusters, the water consumption, the energy. And as someone who works in AI, I believe it's important to know what this represents, which these kinds of sections show clearly. It was the same in the DINOv2 paper or in the Llama paper.
But let’s say you were able to see it all somehow. Your lab was also the data center, powerplant, etc. You see the fans spinning, the turbines moving, and exhaust coming out. Do you change what you do? Or do you look around, see all the others doing the same and just say welp this is the tragedy of the commons.
I think it’s clear that people generally want to move to clean energy, and use less energy as a whole. That’s a gradual path. Maybe this reinforces the thinking, but ultimately you’re still causing damage. If you really truly cared about the damage, why would you do it at all?
I’m not a big fan of lip service. Just like all these land acknowledgements. Is a criminal more “ethical” if they say “I know I’m stealing from you” as they mug you? If you cared, give back your land and move elsewhere!
yes I agree...
But personally I do wonder what is best between (1) leaving, which has no impact on the rest of the herd, or (2) trying to be careful about what you do, raising awareness, and trying to move the herd in the right direction. I would personally go for (2), since the scale of these papers is usually still o(LLM training).
So say I have a site with 3000 images, 2M pixels each. How many GPU-months would it take to mark them? And how many gigabytes would I have to keep for the model?
The GPU-time figures in the paper cover all experiments, not just the training of the final open-sourced model (which is what's usually reported). People don't just one-shot the final model.
Yes, although the number of parameters is not directly linked to the FLOPs/speed of inference. What's nice about this AE architecture is that most of the compute (message embedding and merging) is done at low resolution, the same idea as behind latent diffusion models.
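A hedged back-of-envelope for the question above; none of these numbers come from the paper (parameter count, precision, and per-image throughput are all my assumptions), so check the released model card for the real figures:

```python
# Rough inference-cost estimate for marking a fixed image collection.
# Every constant here is an assumption, not a figure from the paper.
n_images = 3000
params = 100e6              # assumed ~100M-parameter autoencoder
bytes_per_param = 2         # fp16 weights
seconds_per_image = 0.5     # assumed per-image time on one modern GPU at ~2 MP

model_gb = params * bytes_per_param / 1e9
gpu_hours = n_images * seconds_per_image / 3600
print(f"model weights: ~{model_gb:.1f} GB, marking the set: ~{gpu_hours:.2f} GPU-hours")
# ~0.2 GB of weights and well under one GPU-day: the training cost quoted in the
# appendix is not what you pay to run inference on your own images.
```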