
Link to the paper in the README is broken. I believe this is the correct link to the referenced paper: https://arxiv.org/abs/2411.07231


There is some nice information in the appendix, like:

“One training with a schedule similar to the one reported in the paper represents ≈ 30 GPU-days. We also roughly estimate that the total GPU-days used for running all our experiments to 5000, or ≈ 120k GPU-hours. This amounts to total emissions in the order of 20 tons of CO2eq.”

I am not in AI at all, so I have no clue how bad this is. But it's nice to have some idea of the costs of such projects.


> This amounts to total emissions in the order of 20 tons of CO2eq.

That's about 33 economy class roundtrip flights from LAX to JFK.

https://www.icao.int/environmental-protection/Carbonoffset/P...


33 seats on a flight maybe. It's about one passenger aircraft flight, one way.
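
Rough math, with an assumed per-passenger figure (ballpark from calculators like the ICAO one linked above, not from the paper):

  # Back-of-envelope for both framings; assumed figures, not from the paper.
  total_kg = 20_000            # paper's estimate: 20 t CO2eq
  per_pax_roundtrip_kg = 600   # economy LAX-JFK round trip, rough ICAO-style number
  seats = 180                  # typical single-aisle aircraft

  print(total_kg / per_pax_roundtrip_kg)                # ~33 passenger round trips
  print(total_kg / (per_pax_roundtrip_kg / 2 * seats))  # ~0.4 of one full one-way flight

So depending on the per-passenger figure you assume, it's on the order of a fraction of one full flight to one flight.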


And it has produced a system superior to several engineers working full time for several years.

Seems like a fair carbon trade.


Assuming you're purchasing from someone with infinite carbon credits and you're spending it in an environment with infinite ability to re-sink the carbon. Sure.


Are you applying that same rigor to every action people undertake daily?


To a greater or lesser degree depending on the action, I try to apply "that rigor" to myself, at least?

And yes, I think the world would be better off if more people considered how their decisions impact others, if that's what you're getting at, but it's unrealistic to expect everyone to care about other people - and of course entirely impossible to account for ALL variables.


But is it a trade? Feels additive, assuming the same engineers will continue spending their carbon budget elsewhere...


> Seems like a fair carbon trade.

How do you come up with a ratio that you consider a fair trade?

I'm really not sure how I'd personally set a metric to decide it. I could go with the stat that one barrel of oil is equivalent to 25,000 hours of human labor. That means each barrel is worth 12.5 years of labor at 40 hours per week. That seems outrageous though; offhand I don't know how many barrels would be used during the flight, but it would have to replace way more than several engineers working for several years.
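
For reference, the arithmetic behind that 12.5-years figure (the 25,000-hours stat is the one quoted above, which I haven't verified):

  hours_per_barrel = 25_000   # quoted stat, unverified
  hours_per_week = 40
  weeks_per_year = 50         # assuming ~2 weeks off
  print(hours_per_barrel / (hours_per_week * weeks_per_year))  # 12.5 years per barrel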


> That seems outrageous though

There's a good reason oil is so hard to give up. 6.1 GJ worth of crude oil (https://en.wikipedia.org/wiki/Barrel_of_oil_equivalent) costs about $70 USD.


A barrel of oil is currently $70, which is about 10 person-hours at minimum wage.

I guess you could get a number like that if you are comparing the energy output. But that is a weird way to do it, since we don't use people for energy.


Only if it's actually used... hard to imagine this has much use to begin with.


It's very interesting that this is GPU-time based, because:

1. Different energy sources produce varying amounts of CO2.

2. This likely does not include the CO2 emitted to make the GPUs or machines.

3. The humans involved are not added to this at all, nor is all of the impact they have on the environment.

4. There's no way to predict future CO2 from uses of this work.

Also, if it really matters, then why do it at all? If we're saying "hey, this is destroying the environment" and we care, then maybe don't do that work?


> 1. Different energy sources produce varying amounts of CO2.

Yes.

> 2. This likely does not include the CO2 emitted to make the GPUs or machines.

Definitely not, nobody does that.

Wish they did. In general I feel like a lot of beliefs around sustainability and environmentalism are wrong or backwards precisely because embodied energy is discounted; see e.g. stats on western nations getting cleaner, where a large - if not primary - driver of the improved stats is just outsourcing manufacturing, so the emissions are attributed to someone else.

Anyway, embodied energy isn't particularly useful here. Energy embodied in GPUs and machines amortizes over their lifetimes and should be counted against all the things those GPUs did, do and will do, of which the training in question is just a small part. Not including it isolates the analysis to contributions from the specific task per se, and makes the results applicable to different hardware/scenarios.
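
To put rough numbers on the amortization point (all figures here are made up for illustration; I don't have real embodied-carbon data for data-center GPUs):

  # Hypothetical: spread each GPU's manufacturing footprint over its service life.
  embodied_kg_per_gpu = 150             # made-up figure
  lifetime_hours = 5 * 365 * 24 * 0.8   # ~5 years at 80% utilization
  kg_per_gpu_hour = embodied_kg_per_gpu / lifetime_hours

  training_gpu_hours = 120_000          # this paper's total, from the appendix
  print(kg_per_gpu_hour * training_gpu_hours)  # ~0.5 t, small next to the 20 t of usage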

> 3. The humans involved are not added to this at all, nor is all of the impact they have on the environment.

This metric is so ill-defined as to be arbitrary. Even more so in conjunction with 2, as you could plausibly include a million people in it.

> 4. There's no way to predict future CO2 from uses of this work.

In total, no. The contribution from compute alone, given similar GPU-hours per ton of CO2eq, yes.
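
Concretely, the intensity implied by the paper's own totals (the follow-up numbers below are placeholders):

  kg_co2eq = 20_000    # 20 t CO2eq, from the appendix
  gpu_hours = 120_000
  kg_per_gpu_hour = kg_co2eq / gpu_hours   # ~0.17 kg CO2eq per GPU-hour

  # Compute-only extrapolation for some future run on a similar cluster/energy mix:
  future_gpu_days = 100                    # placeholder
  print(future_gpu_days * 24 * kg_per_gpu_hour)  # ~400 kg CO2eq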


>Definitely not, nobody does that.

Except every proper Life-cycle assessment on carbon emissions ever.


  >proper
That word is doing No-True-Scotsman lifting when the point was that these things are not considered, or are treated as "externalities".


not sure how that invalidates Algernon's point. These things should be considered, and are in a lot of LCAs.


  >should be considered, and are
Not as much as they should be, was his point. Saying something is not proper is the No True Scotsman fallacy.


Just define "proper" to mean "it is an analysis that considers the whole supply chain and would pass academic peer review".


Those would count toward “Scope 3” emissions, right?

https://www.mckinsey.com/featured-insights/mckinsey-explaine...


1. Yes, this is the default CO2eq/watt from the tool cited in the paper, but it's actually very hard to know the source of the energy that powers the cluster, so the numbers are an order of magnitude rather than "real" numbers.

2. and 4. I found that https://huggingface.co/blog/sasha/ai-environment-primer gives a good broad overview (not only of CO2eq, which is limited imo) of AI environmental impact.
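
For anyone curious what this measurement looks like in code, here's a minimal sketch with the codecarbon library (my assumption; the paper may cite a different tool). It estimates CO2eq from measured power draw and a grid carbon intensity that defaults to a generic mix unless you configure your region:

  # pip install codecarbon
  from codecarbon import EmissionsTracker

  def train_model():
      pass                       # stand-in for the actual training loop

  tracker = EmissionsTracker()   # default energy mix unless configured
  tracker.start()
  train_model()
  emissions_kg = tracker.stop()  # estimated kg CO2eq for the tracked block
  print(f"{emissions_kg:.3f} kg CO2eq")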

> Also, if it really matters, then why do it at all? If we're saying "hey, this is destroying the environment" and we care, then maybe don't do that work?

Although it may not be the best way to quantify it, it gives a good overview. I would argue that it matters a lot to quantify and popularize the idea of such sections in any experimental ML paper (and in my opinion it should be the default, as it now is for the reproducibility statement and the ethics statement). People don't really know what an AI experiment represents. It may seem very abstract since everything happens in the "cloud", but it is pretty much physical: the clusters, the water consumption, the energy. And as someone who works in AI, I believe it's important to know what this represents, which these kinds of sections show clearly. It was the same in the DINOv2 paper and in the Llama paper.


But let’s say you were able to see it all somehow. Your lab was also the data center, powerplant, etc. You see the fans spinning, the turbines moving, and exhaust coming out. Do you change what you do? Or do you look around, see all the others doing the same and just say welp this is the tragedy of the commons.

I think it’s clear that people generally want to move to clean energy, and use less energy as a whole. That’s a gradual path. Maybe this reinforces the thinking, but ultimately you’re still causing damage. If you really truly cared about the damage, why would you do it at all?

I’m not a big fan of lip service. Just like all these land acknowledgements. Is a criminal more “ethical” if they say “I know I’m stealing from you” as they mug you? If you cared, give back your land and move elsewhere!


Yes, I agree... But personally I do wonder what is best between (1) leaving, without any impact on the rest of the herd, or (2) trying to be careful about what you do, raising awareness, and trying to move the herd in the right direction. I would personally go for (2), since the scale of these papers is usually still o(LLM training).


So say I have a site with 3000 images, 2M pixels each. How many GPU-months would it take to watermark them? And how many gigabytes would I have to keep for the model?


That amount of compute was used for training. For inference (applying the watermarks), it should hopefully take no more than a few seconds per image.

Llama 3 70B took 6.4M GPU hours to train, emitting 1900 tons of CO2 equivalent.
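
Back-of-envelope for both (the per-image time is a guess; the Llama figures are from Meta's model card):

  # Inference cost for the 3000-image question above:
  images = 3000
  seconds_per_image = 2                     # guess; likely faster on a GPU
  print(images * seconds_per_image / 3600)  # ~1.7 GPU-hours, nowhere near GPU-months

  # Training-scale comparison:
  print(6_400_000 / 120_000)                # Llama 3 70B used ~53x this paper's total compute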


Thanks! I was not at all aware of the scale of training! To me those are crazy amounts of GPU time and resources.


The amounts of GPU time in the paper are for all the experiments, not just for training the final model that is open-sourced (which is what's usually reported). People don't just one-shot the final model.


The embedder is only 1.1M parameters, so it should run extremely fast.
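
Which also answers the storage half of the question above: at float32 that's megabytes, not gigabytes (rough, before any packaging overhead):

  params = 1_100_000
  print(params * 4 / 1e6)   # float32 weights: ~4.4 MB for the embedder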


Yes, although the number of parameters is not directly linked to the FLOPs/speed of inference. What's nice about this autoencoder architecture is that most of the compute (message embedding and merging) is done at low resolution, the same idea as behind latent diffusion models.
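
A toy illustration of the params-vs-FLOPs point: a conv layer's cost scales with the resolution it runs at, while its parameter count doesn't, which is why doing the heavy work at low resolution helps (numbers made up):

  # FLOPs of one KxK conv layer: ~2 * K^2 * C_in * C_out * H * W
  def conv_flops(c_in, c_out, h, w, k=3):
      return 2 * k * k * c_in * c_out * h * w

  # Same layer (~37k params either way), run at two resolutions:
  print(conv_flops(64, 64, 256, 256) / conv_flops(64, 64, 32, 32))  # 64x fewer FLOPs at 32x32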



