I’m curious where you got any of those numbers. Many laptops use <20W. But most local-ai inferencing requires high end, power hungry nvidia GPUs that use multiple hundreds of watts. There’s a reason those GPUs are in high demand, with prices sky high, because those same (or similar) power hungry chips are in data centers.
Compared to traditional computing it seems to me like there’s no way AI is power efficient. Especially when so many of the generated tokens are just platitudes and hallucinations.
> The agreed-on best guess right now for the average chatbot prompt’s energy cost is actually the same as a Google search in 2009: 0.3 Wh. This includes the cost of the answering your prompt, idling AI chips between propmts, cooling in the data center, and other energy costs in the data center. This does not include the cost of training the model, the embodied carbon costs of the AI chips, or the fact that data centers typically draw from slightly more carbon intense sources. If you include all of those, the full carbon emissions of an AI prompt rise to 0.28 g of CO2. This is the same emissions as we cause when we use ~0.8 Wh of energy.
How concerned should you be about spending 0.8 Wh? 0.8 Wh is enough to:
Stream a video for 35 seconds
Watch an LED TV (no sound) for 50 seconds
Upload 9 photos to social media
Drive a sedan at a consistent speed for 4 feet
Leave your digital clock on for 50 minutes
Run a space heater for 0.7 seconds
Print a fifth of a page of a physical book
Spend 1 minute reading this blog post. If you’re reading this on a laptop and spend 20 minutes reading the full post, you will have used as much energy as 20 ChatGPT prompts. ChatGPT could write this blog post using less energy than you use to read it!
W stands for Watts, which means Joules per second.
The energy usage of the human body is measured in kilocalories, aka Calories.
Combustion of gasoline can be approximated by conversion of its chemicals into water and carbon dioxide. You can look up energy costs and energy conversions online.
Some AI usage data is public. TDP of GPUs are also usually public.
I made some assumptions based on H100s and models around the 4o size. Running them locally changes the equation, of course - any sort of compute that can be distributed is going to enjoy economies of scale and benefit from well worn optimizations that won't apply to locally run single user hardware.
Also, for AI specifically, depending on MoE and other sparsity tactics, caching, hardware hacks, regenerative capture at the datacenter, and a bajillion other little things, the actual number is variable. Model routing like OpenAI does further obfuscates the cost per token - a high capabilities 8B model is going to run more efficiently than a 600B model across the board, but even the enormous 2T models can generate many tokens for the equivalent energy of burning µL of gasoline.
If you pick a specific model and gpu, or Google's TPUs, or whatever software/hardware combo you like, you can get to the specifics. I chose µL of gasoline to drive the point across, tokens are incredibly cheap, energy is enormously abundant, and we use many orders of magnitude more energy on things we hardly ever think about, it just shows up in the monthly power bill.
AC and heating, computers, household appliances, lights, all that stuff uses way more energy than AI. Even if you were talking with AI every waking moment, you're not going to be able to outpace other, far more casual expenditures of energy in your life.
A wonderful metric would be average intelligence level per token generated, and then adjust the tokens/Joule with an intelligence rank normalized against a human average, contrasted against the cost per token. That'd tell you the average value per token compared to the equivalent value of a human generated token. Should probably estimate a ballpark for human cognitive efficiency, estimate token/Joule of metabolism for contrast.
Doing something similar for image or music generation would give you a way of valuing the relative capabilities of different models, and a baseline for ranking human content against generations. A well constructed meme clip by a skilled creator, an AI song vs a professional musician, an essay or article vs a human journalist, and so on. You could track the value over context length, length of output, length of video/audio media, size of image, and so on.
Suno and nano banana and Veo and Sora all far exceed the average person's abilities to produce images and videos, and their value even exceeds that of skilled humans in certain cases, like the viral cat playing instrument on the porch clips, or ghiblification, or bigfoot vlogs, or the AI country song that hit the charts. The value contrasted with the cost shows why people want it, and some scale of quality gives us an overall ranking with slop at the bottom up to major Hollywood productions and art at the Louvre and Beethoven and Shakespeare up top.
Anyway, even without trying to nail down the relative value of any given token or generation, the costs are trivial. Don't get me wrong, you don't want to usurp all a small town's potable water and available power infrastructure for a massive datacenter and then tell the residents to pound sand. There are real issues with making sure massive corporations don't trample individuals and small communities. Local problems exist, but at the global scale, AI is providing a tremendous ROI.
AI doombait generally trots out the local issues and projects them up to a global scale, without checking the math or the claims in a rigorous way, and you end up with lots of outrage and no context or nuance. The reality is that while issues at scale do exist, they're not the issues that get clicks, and the issues with individual use are many orders of magnitude less important than almost anything else any individual can put their time and energy towards fixing.
You are cleary biased.
A complex chatgpt 5 thinking runs at 40 Wh per prompt. This is more in line with the estimated load that ai needs to scale. These thinling models wpuld be faster but use similar amount of energy. Humans doing that thinking use far fewer jpiles than gpt 5 thinking. Its not even close.
I just followed their guide last week and was surprised how smooth it went. Their documentation seemed very thorough. I kinda expected a few issues, but everything worked flawlessly. Seems like they do a pretty good job of detecting most of the edge cases that would cause issues. Granted, my installation hasn’t been modified too heavily outside the norm. I think I had one or two modified config files I had to edit, but the helper script found and told me about them and how to handle it.
I had put off the upgrade for a while figuring it would be a breaking change. But it went so smoothly I’ll probably be upgrading to 9.1 pretty soon.
Quite. Its almost as though the docs are written by people who actually use it.
I was (still am sadly) a VMware consultant for about 25 years. It makes me laugh when I hear breathless "enterprise noises" with regards VMware and how PVE isn't quite ready yet.
PVE is just so easy and accommodating. It's Linux on Debian with a few nobs on. The web interface is so quick and uncluttered and simple. The clustering arrangements are superb and simple. The biggest issue for me and many like me was how to deal with iSCSI SANS (no snapshots - long story) It turns out you can pull the SSDs out of a Dell Msomething SAN and wack them into the hosts and you have a hyperconverged Ceph thingie with little effort.
VMware rapidly gets very expensive. Nowadays with Broadcom you have to fork out for the full enterprise thing to get DRS and vDS - that's auto balancing clusters and funky networking. PVE gifts you Open vSwitch support out of the box and all clusters are equal. Storage DRS (migrate virty hard discs on the fly) is free on PVE too. Oh and you get containers too on PVE - VMware Tanzu is seriously expensive.
Anyway, I could grind on about this for quite some time but in my opinion, PVE is a far better base product in general for your VMs. A vCentre is a horrendous waste of resources and the rest of VMware's appliances are pretty tubby too. I recall evaluating their first efforts at SDN with edge firewalls and so on - no thanks!
I had an experiment with vmware to build our next iteration of kubernetes platform, and they were asking why we used rancher and things like that, they got very frustrated when I was trying to do anything with their product and needed to sign up or sign in to a billion things, which I got frustrated and said ‘this! this id why we went with rancher, because there was no friction!’
too bad SUSE is doing the rancher prime stuff now as well.
The whole point (from a savvy business perspective) is throw money at the hardware and throw experience at the software.
In the end, Proxmox is based on KVM and KVM does run a workload or two across the world. VMware isn't KVM and I watched both be born and grow up oh and I should mention Xen but I can't be arsed. Most of the rest are Johnny come latelys.
If I need a massive cloud then I'll go all in on K8s or whatever and get my orchestration hat on big time but for my needs and my customer needs, PVE is more than enough, whilst being just enough.
My hacker news icon has been stuck as the icon for a weather site that I sometimes check. It’s been stuck that way for close to a year now, and has survived an iOS update too.
It persists across profiles and into private browsing mode.
Yes, I also have them in general, e.g. on about:newtab, but for HN, there isn't any shown on the tab (there is if I make a bookmark). Maybe I messed something up.
The tailscale login servers had an issue last week. My local network had an issue at the same time and all connections dropped. Then none of my stuff could reconnect because I couldnt connect to tailscale :(
Looking into setting up my own headscale instance now. This is the first issue I’ve had with tailscale but seems dumb that my local lan devices couldn’t even talk to each other.
(Tailscalar here) We're taking this kind of outage very seriously. In particular this outage meant newly connected devices couldn't reliably reach our control plane and couldn't get the latest network state. IMO that's not okay.
One of Tailscale's fundamental promises is that we want to try as much as possible to get our control plane and infrastructure as out of the way of your connectivity paths, while still using our infra to "assist" when there's connectivity issues (like difficult to traverse NAT), and maintain trust across the network, and keep everything up to date.
It's a tough balance and this year we're dedicating resources to making sure even small blips in our control plane don't mean temporary losses of connectivity across even your newly woken up devices. In particular we're taking a multi-pronged approach, right now. We're working in parallel to increase client tolerance of control outages (in response to cracks shown in this incident) and have an ongoing effort to make the control plane more resilient and available.
There isn’t a date in the article, but I know I had read this months ago. And sure enough, wayback has the text-to-image page from April.
But the image editing page linked at the top is more recent, and was added sometime in September. (And was presumably the intended link) I hadn’t read that page yet. Odd there is no dates, at first glance one might think the pages were made at the same time.
I’ve run into idle bot accounts several times while playing and it’s infuriating. Mainly in the arms race mode.
Players can leave and join that mode at any time. So the bots will constantly be joining and leaving. if the bots manage to become 50% of the game they will vote kick all the remaining players. I’ve had several in progress matches interrupted because a few of the actual players bailed and the bots managed to take over the lobby.
Theres a lot of weird setup often required on the backend in my experience, but when it works, it works well. But until you get everything dialed in it can have weird issues that don't have a clear path to fix them.
It might be better in their weird AIO solution? But i dont like the idea of giving a docker container the ability to spawn more containers. I just use one of their normal docker containers and have had to manually change a lot to make it work as they actually suggest. Like just recently i setup their notify_push plugin as it improves performance - but the provided setup instructions didn't work in my setup and i had to manually tweak several things.
It took a while for me to fully set up Nextcloud with STUN/TURN, Office server, etc. in a properly containerised setup. It clearly felt like it was built before containers and modern devops approaches were a best practice.
And while community is great, I don't think Nextcloud developer community is that big and active. Their plugin system is basic, archaic, lots of things there are begging for rework.
So while Nextcloud is decent once set up, I am happy to see some fresh OSS projects solving similar issues appear. Maybe their approach will be better.
The world has already migrated through so many past now-insecure cryptography setups. If quantum computers start breaking things, people will transition to more secure systems.
In HTTPS for example, the server and client must agree on how to communicate, and we’ve already had to deprecate older, now-insecure cryptography standards. More options get added, and old ones will have to be deprecated. This isn’t a new thing, just maybe some cryptographic schemes will get rotated out earlier than expected.
> If quantum computers start breaking things, people will transition to more secure systems.
that's not really the issue, the real interesting part is existing encrypted information that three letter agencies likely have dutifully stored in a vault and that's going to become readable. A lot of that communication was made under the assumption that it's secure.
Yeah, all the encrypted messages collected when illegal markets got seized will be decrypted. Many of them uses RSA 2048 so by 2030 its gonna be broken according to the timelines.
Its actually something we will notice. Arrests will be announced.
I’ve heard anecdotes of people using an entirely internal domain like “plex.example.com” even if it’s never exposed to the public internet, google might flag it as impersonating plex. Google will sometimes block it based only on name, if they think the name is impersonating another service.
Its unclear exactly what conditions cause a site to get blocked by safe browsing. My nextcloud.something.tld domain has never been flagged, but I’ve seen support threads of other people having issues and the domain name is the best guess.
I'm almost positive GMail scanning messages is one cause. My domain got put on the list for a URL that would have been unknowable to anyone but GMail and my sister who I invited to a shared Immich album. It was a URL like this that got emailed directly to 1 person:
Then suddenly the domain is banned even though there was never a way to discover that URL besides GMail scanning messages. In my case, the server is public so my siblings can access it, but there's nothing stopping Google from banning domains for internal sites that show up in emails they wrongly classify as phishing.
Think of how Google and Microsoft destroyed self hosted email with their spam filters. Now imagine that happening to all self hosted services via abuse of the safe browsing block lists.
if it was just the domain, remember that there is a Cert Transparency log for all TLS certs issued nowadays by valid CAs, which is probably what Google is also using to discover new active domains
It doesn’t seem like email scanning is necessary to explain this. It appears that simply having a “bad” subdomain can trigger this. Obviously this heuristic isn’t working well, but you can see the naive logic of it: anything with the subdomain “apple” might be trying to impersonate Apple, so let’s flag it. This has happened to me on internal domains on my home network that I've exposed to no one. This also has been reported at the jellyfin project: https://github.com/jellyfin/jellyfin-web/issues/4076
That's not going to be gleaned from a CT log or guessed randomly. The URL was only transmitted once to one person via e-mail. The sending was done via MXRoute and the recipient was using GMail (legacy Workspace).
The only possible way for Google to have gotten that URL to start the process would have been by scanning the recipient's e-mail.
I've read almost everything linked in this post and on Reddit and, with what you pointed out considered, I'd say the most likely thing that got my domain flagged is having a redirect to a default styled login page.
The thing that really frustrates me if that's the case is that it has a large impact on non-customized self-hosted services and Google makes no effort to avoid the false positives. Something as simple as guidance for self-hosted apps to have a custom login screen to differentiate from each other would make a huge difference.
Of course, it's beneficial to Google if they can make self-hosting as difficult as possible, so there's no incentive to fix things like this.
Well, that's potentially horrifying. I would love for someone to attempt this in as controlled of a manner as possible. I would assume it's possible for anyone using Google DNS servers to also trigger some type of metadata inspection resulting in this type of situation as well.
Also - when you say banned, you're speaking of the "red screen of death" right? Not a broader ban from the domain using Google Workplace services, yeah?
> Also - when you say banned, you're speaking of the "red screen of death" right?
Yes.
> I would love for someone to attempt this in as controlled of a manner as possible.
I'm pretty confident they scanned a URL in GMail to trigger the blocking of my domain. If they've done something as stupid as tying GMail phishing detection heuristics into the safe browsing block list, you might be able to generate a bunch of phishy looking emails with direct links to someone's login page to trigger the "red screen of death".
This reminds me of another post where a scammer sent a gmail message containing https://site.google.com/xxx link to trick users into click, but gmail didn't detect the risk.
these AI services also won't really distinguish between "user input" and "malicious input that the user is asking about".
Obviously the input here was only designed to be run in a terminal, but if it was some sort of prompt injection attack instead, the AI might not simply decode the base64, it might do something else.
Compared to traditional computing it seems to me like there’s no way AI is power efficient. Especially when so many of the generated tokens are just platitudes and hallucinations.
reply