Updated rate limits for unauthenticated requests

Zdh4DYsGvdjJ · 2025-05-14T18:01:18 1747245678

GitHub answered https://github.com/orgs/community/discussions/159123#discuss...

TheNewsIsHere · 2025-05-14T13:41:44 1747230104

I don’t think the publication date (May 8, as I type this) on the GitHub blog article is the same date this change became effective.

From a long-term, clean network I have been consistently seeing these “whoa there!” secondary rate limit errors for over a month when browsing more than 2-3 files in a repo.

My experience has been that once they’ve throttled your IP under this policy, you cannot even reach a login page to authenticate. The docs direct you to file a ticket (if you’re a paying customer, which I am) if you consistently get that error.

I was never able to file a ticket when this happened because their rate limiter also applies to one of the required backend services that the ticketing system calls from the browser. Clearly they don’t test that experience end to end.

gs17 · 2025-05-15T13:01:19 1747314079

Maybe they expect you to file the ticket from a different IP.

gnabgib · 2025-05-09T16:12:42 1746807162

60 req/hour for unauthenticated users

5000 req/hour for authenticated - personal

15000 req/hour for authenticated - enterprise org

According to https://docs.github.com/en/rest/using-the-rest-api/rate-limi...

I bump into this just browsing a repo's code (unauthenticated)... It seems like one of the side effects of the AI rush.
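For the REST API, the current quota can be checked rather than guessed. The sketch below assumes GitHub's documented `GET /rate_limit` endpoint and reads only the `limit`, `remaining` and `reset` fields of its `core` bucket. It says nothing about the secondary limits behind the web UI's "whoa there!" page, and the helper names are illustrative.

```python
import json
import time
import urllib.request

RATE_LIMIT_URL = "https://api.github.com/rate_limit"


def fetch_rate_limit(token=None):
    """Fetch and decode the JSON body of GET /rate_limit (makes a network call)."""
    req = urllib.request.Request(RATE_LIMIT_URL, headers={"Accept": "application/vnd.github+json"})
    if token:
        # With a token, the quota reported is the per-user one rather than the per-IP one.
        req.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def summarize_core(payload, now=None):
    """Return (limit, remaining, seconds_until_reset, average_per_minute) for the 'core' bucket."""
    core = payload["resources"]["core"]
    now = time.time() if now is None else now
    seconds_until_reset = max(0, int(core["reset"] - now))
    # Spread evenly over the hour: 60/h is 1 per minute, 5000/h is roughly 83 per minute.
    average_per_minute = core["limit"] / 60
    return core["limit"], core["remaining"], seconds_until_reset, average_per_minute
```

`fetch_rate_limit()` makes a real network call; `summarize_core()` only works on an already-decoded response, so it can be tried offline with a hand-written dict.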

mijoharas · 2025-05-14T16:05:10 1747238710

Why would the changelog update not include this? It's the most salient piece of information.

I thought I was just misreading it and failing to see where they stated what the new rate limits were, since that's what anyone would care about when reading it.

naikrovek · 2025-05-15T13:14:24 1747314864

> Why would the changelog update not include this?

I don't know. The limits in the comment that you're replying to are unchanged from where they were a year ago.

So far I don't see anything that has changed, and without an explanation from GitHub I don't think we'll know for sure what has changed.

1oooqooq · 2025-05-15T01:56:22 1747274182

because it will go way lower soon. and because they don't have to.

they already have all your code. they've won.

naikrovek · 2025-05-15T13:16:23 1747314983

you are not ... you don't have any part of your body in reality, do you? you have left the room.

If people training LLMs are excessively scraping GitHub, it is well within GitHub's purview to limit that activity. It's their site and it's up to them to make sure that it stays available. If that means that they curtail the activity of abusive users, then of course they're going to do that.

1oooqooq · 2025-05-15T14:23:26 1747319006

It was never about avoiding scrapers. That's just the excuse. They own the scrapers too, remember.

Why do you think they blocked non-logged-in users from even searching before this? They need your data and they are getting it exactly on their terms. Because, as I've said, they have already won.

sebmellen · 2025-05-15T15:05:59 1747321559

Embrace, extend, extinguish.

naikrovek · 2025-05-15T16:49:29 1747327769

… I… what has been embraced, extended and extinguished?

I see no MS- or GitHub-specific extension here. Copilot exists, and so do many other tools. Copilot can use lots of non-Microsoft models, too, including models from non-Microsoft companies. You can also get git repository hosting from other companies. You can even do it yourself.

So, explain yourself. What has been embraced, extended, and extinguished? Be specific. No “vibes”. Cite your sources or admit you have none. I see no extending unique to MS and I see no extinguishing. So explain yourself.

1oooqooq · 2025-05-16T17:56:31 1747418191

The entire open source community exists on GitHub.

Microsoft has a more successful social network for programmers than HN or Google Circles (heh) ever dreamed of.

The argument had already moved past scrapers' access to the information, since they own the scrapers and all... Why did you bring it back as the main argument? They hijacked what could have been a community hub and turned it into a walled garden to sell a few enterprise licenses.

pdimitar · 2025-05-15T19:24:16 1747337056

I'm with you, but let's not forget that they haven't started the extinguishing yet. They might yet do it. The extending they've done plenty: issue tracker, wiki, discussions etc.

naikrovek · 2025-05-18T02:44:57 1747536297

Those things all existed before Microsoft bought them, and they’re all present in competing products, even free ones.

naikrovek · 2025-05-15T16:46:47 1747327607

[flagged]

tomhow · 2025-05-16T14:05:11 1747404311

> What the hell are … no, this is not a drug. This is a mental illness. Get help.

This is an unacceptable comment on HN and we have to ban accounts that do it repeatedly. We've warned you in the past about inappropriate comments. Please remind yourself of the guidelines and take care to observe them in future.

https://news.ycombinator.com/newsguidelines.html

naikrovek · 2025-05-18T02:46:37 1747536397

Ban me then.

The person I responded to clearly has a mental illness and needs help.

The people behind this site think it’s some bastion of civility, and it just isn’t. People can be assholes using any words they choose, and they do so continuously here, but you mods don’t care because your rules are followed.

“Orange website bad” isn’t a meme for no reason. It’s because the orange website is bad. So fucken ban me.

tomhow · 2025-05-18T22:06:02 1747605962

We don't need to ban you, we just need you, along with everyone else here, to observe the guidelines, the first of which, in the Comments section, is to “be kind”. If everyone made the effort to do that, the site wouldn't be bad. It's no big deal, and it's not that hard to observe the guidelines if you're sincere about making a positive contribution to the site.

naikrovek · 2025-05-19T11:35:13 1747654513

I'm 100% kind when people are kind to me.

I am 0% kind when people are unkind to me.

I've lived my entire life rolling over when people are assholes to me because I don't want to make the situation worse or, as seen here, throw the 2nd punch. The 2nd punch is always the one that gets caught. Never the first.

usernamed7 · 2025-05-14T22:05:48 1747260348

1 request a minute?!? Wow, that's just absurd; you hit it just by looking through code.

rendaw · 2025-05-15T16:35:10 1747326910

I opened a repo in a browser on a spare computer, clicked on a couple of things and got a rate limit error. It feels effectively unusable unless you're logged in now (you already couldn't search; now you can't even browse).

out-of-ideas · 2025-05-14T22:42:19 1747262539

Agreed. When I first read the title I thought "oh, what did they up the rates to?", then I realized it's more of a "downgraded rate limits".

thanks github for the worse experience

blinker21 · 2025-05-14T21:26:11 1747257971

I've hit this over the past week browsing the web UI. For some reason, github sessions are set really short and you don't realise you're not logged in until you get the error message.

I really wish github would stop logging me out.

Novosell · 2025-05-14T21:56:32 1747259792

Hmmmm, GitHub keeps me logged in for months, I feel like. Unless I'm misunderstanding the GitHub security logs, my current session dates back to March.

1oooqooq · 2025-05-15T01:58:28 1747274308

GH is Microsoft's most successful social network.

GH now uses the publisher business model, and as such, they lose money when you're logged out. It's the same reason Google, FB, etc. go years without asking you for a password.

dghlsakjg · 2025-05-15T01:57:32 1747274252

Something strange is going on. I think GH has kept me logged in for months at a time. I honestly can’t remember the last time I had to authenticate.

zarzavat · 2025-05-15T01:57:01 1747274221

Yes, it's not the rate limits that are the problem per se but GitHub's tendency to log you out and make you go through 2fa.

If they would let me stay logged in for a year then I wouldn't care so much.

tux3 · 2025-05-15T11:11:46 1747307506

You might be afflicted with some SSO or enterprise thing, I haven't logged into Github on my personal account in years.

zarzavat · 2025-05-16T02:06:28 1747361188

Nope, just normal GitHub account.

Though GitHub did force me to use 2fa earlier because they said I have a "popular repo", so perhaps my account is considered high risk. Or maybe it's triggered by travelling and changing IP locations? I have no clue, but it's annoying to have to 2fa more than once in a blue moon.

jakebasile · 2025-05-17T18:29:44 1747506584

I bump into these limits just using a few public install scripts for things like Atuin, Babashka, and Clojure on a single machine on my home IP. They're way too low to be reasonable.

ikiris · 2025-05-15T00:52:24 1747270344

1/min? That’s insanely low.

notatoad · 2025-05-15T01:17:25 1747271845

60/hr is not the same as 1/min, unless you're trying to continually make as many requests as possible, like a crawler. And if that is your use case, then your traffic is probably exactly what they're trying to block.
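Whether 60/hour behaves like 1/minute depends on how the window is counted, which GitHub does not document for the web UI. As a minimal sketch, assuming a plain fixed one-hour window (the `FixedWindowLimiter` class is hypothetical, not GitHub's implementation), a burst is allowed up front and the lockout comes afterwards:

```python
class FixedWindowLimiter:
    """Allow up to `limit` requests per `window` seconds, counted in fixed windows."""

    def __init__(self, limit=60, window=3600):
        self.limit = limit
        self.window = window
        self.window_start = None
        self.count = 0

    def allow(self, t):
        # Start a fresh window on the first request or once the current one has expired.
        if self.window_start is None or t - self.window_start >= self.window:
            self.window_start = t
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False


limiter = FixedWindowLimiter()
# Someone clicking every 5 seconds for 10 minutes makes 120 requests.
results = [limiter.allow(t) for t in range(0, 600, 5)]
allowed = sum(results)
first_denied_at = results.index(False) * 5  # seconds into the session
```

Under this assumption the clicker gets all 60 requests in the first 5 minutes and is then refused for the remaining 55: not 1/min, but steady human browsing can still exhaust it.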

zarzavat · 2025-05-15T02:01:58 1747274518

60/h is obviously well within normal human usage of an app and not bot traffic...

A normal rate limit to separate humans and bots would be something like 60 per minute. So it's about an order of magnitude too low.

mjevans · 2025-05-15T03:19:35 1747279175

Use case: crawling possibly related files based on string search hints in a repo you know nothing about...

Something on the order of 6 seconds a page doesn't sound TOO out of human viewing range depending on how quickly things load and how fast rejects are identified.

I could see ~10 pages / min which is 600 pages / hour. I could also see the argument that a human would get tired at that rate and something closer to 200-300 / hr is reasonable.

hansvm · 2025-05-15T11:11:29 1747307489

All of that assuming they're limiting based on human-initiated requests, not the 100x requests actually generated when you click a link.
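mjevans' estimate and hansvm's caveat can be put into numbers. In this sketch `counted_requests_per_page` is a hypothetical parameter, since how many of the requests behind one click count toward the limit is not stated in the thread:

```python
def pages_per_hour(seconds_per_page):
    """Page views in one hour of steady browsing."""
    return 3600 / seconds_per_page


def minutes_until_limit(limit, seconds_per_page, counted_requests_per_page=1):
    """Minutes of steady browsing before an hourly request budget is used up.

    counted_requests_per_page is hypothetical: the number of HTTP requests per click
    that actually count against the limit is not documented in this thread.
    """
    pages_allowed = limit / counted_requests_per_page
    return pages_allowed * seconds_per_page / 60
```

At one counted request per page every 6 seconds, a 60/hour budget lasts about 6 minutes; if each click counted as 5 requests, it would last about 1.2 minutes.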

PaulDavisThe1st · 2025-05-14T16:53:02 1747241582

Several people in the comments seem to be blaming Github for taking this step for no apparent reason.

Those of us who self-host git repos know that this is not true. Over at ardour.org, we've passed 1M unique IPs banned due to AI trawlers sucking down our repository one commit at a time. It was killing our server before we put fail2ban to work.

I'm not arguing that the specific steps Github have taken are the right ones. They might be, or they might not, but they do help to address the problem. Our choice for now has been based on noticing that the trawlers are always fetching commits, so we tweaked things such that the overall HTTP-facing git repo works, but you cannot access commit-based URLs. If you want that, you need to use our GitHub mirror :)
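The comment doesn't say how ardour.org implements the commit-URL block. A minimal sketch of the same idea as Python WSGI middleware, assuming commit views live under a `/commit/` path segment (true for cgit-style URLs, but an assumption here); `block_commit_urls` is a hypothetical name:

```python
import re

# Assumed URL layout: /<repo>/commit/ or /<repo>/commit/<sha>. Adjust for your web front end.
COMMIT_PATH = re.compile(r"/commit(/|$)")


def block_commit_urls(app):
    """WSGI middleware: answer commit-view URLs with 404, pass everything else to `app`."""

    def middleware(environ, start_response):
        if COMMIT_PATH.search(environ.get("PATH_INFO", "")):
            body = b"Not Found\n"
            start_response("404 Not Found", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        return app(environ, start_response)

    return middleware
```

Other paths, such as the `/info/refs` requests used by smart-HTTP clones, pass through untouched, so `git clone` keeps working.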

soraminazuki · 2025-05-15T00:54:09 1747270449

Only, they didn't just start doing this. For many years, GitHub has been crippling unauthenticated browsing, doing it gradually to gauge the response. When you're unauthenticated, code search doesn't work at all and issue search stops working after, like, 5 clicks at best.

This is egregious behavior because Microsoft hasn't been upfront about it while doing it. Many open source projects are probably unaware that their issue tracker has been walled off, creating headaches unbeknownst to them.

jonas21 · 2025-05-15T05:09:37 1747285777

Just sign in, problem solved. It baffles me that a site can provide a useful service that costs money to run, and all you need to do to use it is create a free account -- and people still find that egregious.

soraminazuki · 2025-05-15T05:25:32 1747286732

That's not how consent works. GitHub captured the open source ecosystem under the premise that its code and issue tracker will remain open to all. Silently changing the deal afterwards is reprehensible.

naikrovek · 2025-05-15T13:19:47 1747315187

> GitHub captured the open source ecosystem under the premise that its code and issue tracker will remain open to all. Silently changing the deal afterwards is reprehensible.

It still is "open to all", but you can't abuse the service and expect to retain the ability to abuse the service.

Also where is "silently" coming from? This whole HN page is because someone linked to an article announcing the change...

I'm not really a fan of Microsoft anymore, but some of you have (apparently long ago) turned the corner into "anything Microsoft does that I don't want Microsoft to do is clearly Microsoft being evil", and that is simply not a reality-based viewpoint. Sometimes Microsoft does something which one could consider "evil", but without knowledge that something evil is happening, you're assuming that evil is happening, and that's not really a valid way to think about things if you want to be heard by anyone.

soraminazuki · 2025-05-15T14:58:33 1747321113

I repeat, this didn't start today. It has been happening for years. And no, browsing a few files or searching for an issue or two, which totally does trigger the rate limit, isn't "abuse."

naikrovek · 2025-05-15T16:45:18 1747327518

They don’t rate limit someone who is browsing with a normal usage pattern. They did for a day or two, then discovered their mistake and fixed it.

> years

No.

> a few

I’ve always considered “a few” to be “between 3 and 12” and 60 is more than “a few”.

soraminazuki · 2025-05-15T17:29:41 1747330181

I'm speaking from direct experience over the past few years, from my home, work, and outside with a phone. Do you actually browse GitHub anonymously, or are you reflexively shifting blame?

If you need more proof, this is last year:

https://news.ycombinator.com/item?id=39322838

And this is another year before that:

https://news.ycombinator.com/item?id=36254129

Oh look, there's even visual proof in the discussion:

https://imgur.com/a/github-search-gated-behind-login-BT6uRIe

naikrovek · 2025-05-15T18:01:00 1747332060

Yes I actually browse GitHub anonymously. Not always but I do it every day. Never once had a problem.

In another browser I log in because I do work with code in GitHub frequently. I comment on issues and PRs and all the normal stuff.

I regularly drive two browsers, yes. I alternate between them multiple times per minute, often. In one, I am not logged in. In the other, I am logged in.

Not once have I hit any anonymous rate limit.

I respect one’s desire to use something without logging in, that’s fine. But what you do when you use up the free tier of a service is one of the following: A) you pay for the next tier, B) you (in this case) log in so that your usage is no longer considered “anonymous”, or C) you wait for the next usage measurement period to begin so that you can resume.

It’s their service and they can decide how they want to provide it, in the exact same way that you can decide how to provide any services that you might provide.

If it is your privacy that you are considering by not having an account, fine. By making that choice you are limiting yourself to whatever the services you use decide to give you, and you are entitled to nothing.

“I could do more in the past!” So what? They decided to let you do more in the past, and now they’ve decided to let you do less. They don’t owe you free services; you choose to use the free service and by doing so you’ve chosen to be bound by any usage caps that they decide to apply to you.

Nobody owes you free services AT ALL, but you’re getting them anyway. Instead of feeling entitled to more than you’re getting, maybe be thankful for what you have.

soraminazuki · 2025-05-15T18:23:46 1747333426

> Not once have I hit any anonymous rate limit.

I have a really hard time believing you on this. There's visual evidence from a year ago and it's consistent with my experience. And no, I haven't been hammering their servers.

https://imgur.com/a/github-search-gated-behind-login-BT6uRIe

> “I could do more in the past!” So what?

So, I'll repeat what I said in the first comment that you replied to. GitHub captured the open source ecosystem under the premise that its code and issue tracker will remain open to all. Silently changing the deal afterwards is reprehensible.

> Instead of feeling entitled

Again, I'll just repeat yet another one of my comments. Microsoft didn't just give, they're benefitting massively from open source. And they're looking to extract even more value through data mining from forced logins and stealing GPL licensed code by laundering it using AI. Open source projects that chose GitHub didn't agree to this!

> be thankful for what you have

You can't be serious. Yeah, be grateful to the trillion-dollar company for buying a service it didn't create, extracting as much value as it can from it in questionable ways and tearing up the social contract!

brookst · 2025-05-15T13:57:02 1747317422

Are all contributors to open source under a lifetime obligation to never change their level of investment?

Kind of a rhetorical question I guess, for a while I maintained a small open source project and yes, I still get entitled “why did you even publish this if you’re not going to fix the bug I reported” comments. Like, sorry, but my life priorities changed over the intervening 15 years. Fork it and fix it.

soraminazuki · 2025-05-15T15:06:01 1747321561

Microsoft didn't just give, they're benefitting massively from open source. And they're looking to extract even more value through data mining from forced logins and stealing GPL licensed code by laundering it using AI. There's no room for sympathy here.

brookst · 2025-05-16T13:16:23 1747401383

There’s no sympathy in business. It’s a straw man to claim I’m looking for some emotional response.

But there is obligation. I’m asking if contributing to open source creates an obligation to do so forever, either for individuals or companies.

jnky · 2025-05-15T14:49:59 1747320599

All contributors to open source are not created equal. It is different when a literal 3 trillion dollar company does it, thus demonstrating they were unworthy of the trust and goodwill put in them. They have the money, they have the cloud infrastructure, they are doing all kinds of scraping themselves.

arkh · 2025-05-15T07:24:19 1747293859

Who could have known that Microsoft would pull some shenanigans?

Is 20 years too long ago to learn from then?

Embrace. Extend. Extinguish. This has never gone away.

IsTom · 2025-05-15T09:15:02 1747300502

When github was getting popular it was not owned by MS.

6031769 · 2025-05-15T14:50:21 1747320621

When github did not pull this sort of shenanigans it was not owned by MS.

hannob · 2025-05-15T05:21:12 1747286472

> Several people in the comments seem to be blaming Github for taking this step for no apparent reason.

I mean...

* Github is owned by Microsoft.

* The reason for this are AI crawlers.

* The reason AI crawlers exist in masses is an absurd hype around LLM+AI technology.

* The reason for that is... ChatGPT?

* The main investor behind ChatGPT happens to be...?

1oooqooq · 2025-05-15T14:25:38 1747319138

almost like we bomb children because a politician told us to think of the children. crazy.

uallo · 2025-05-15T08:25:35 1747297535

That is also a problem on a side project I've been running for several years. It is based on a heavily rate-limited third-party API. And the main problem is that bots often cause (huge) traffic spikes which essentially DDoSes the application. Luckily, a large part of these bots can easily be detected based on their behaviour in my specific case. I started serving them trash data and have not been DDoSed since.

VladVladikoff · 2025-05-15T01:24:15 1747272255

Have you noticed significant slowdown and CPU usage from fail2ban with that many banned IPs? I saw it becoming a huge resource hog with far fewer IPs than that.

PaulDavisThe1st · 2025-05-15T14:24:10 1747319050

Yeah, when we hit about 80-100k banned hosts, iptables causes issues.

There are versions of iptables available that apparently can scale to 1M+ addresses, but our approach is just to unban all at that point, and then let things accumulate again.

Since we began responding with 404 to all commit URLs, the rate of banned address accumulation has slowed down quite a bit.

knowitnone · 2025-05-15T00:15:16 1747268116

you mean AI crawlers from Microsoft, owners of Github?

haiku2077 · 2025-05-15T00:32:14 1747269134

The big companies tend to respect robots.txt. The problem is other, unscrupulous actors use fake user agents and residential IPs and don't respect robots.txt or act reasonably.

internetter · 2025-05-15T01:32:18 1747272738

Big companies have thrown robots.txt to the wind when it comes to their precious AI models.

sph · 2025-05-15T09:00:10 1747299610

Yeah, they have openly disregarded copyright law, it's not a puny robots.txt file that's gonna stop them.

haiku2077 · 2025-05-15T12:26:40 1747312000

robots.txt isn't just an on/off switch. You can set crawler rate limits in there that crawlers may choose to respect, and the big companies respect them, because it's in their interest to reduce their crawling cost and not send more requests than they need to.

However, these smaller companies are doing ridiculous things like scraping the same site many thousands of times a day, far more often than the content of the site changes.
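The crawl-rate hints meant here are usually `Crawl-delay` and `Request-rate`, both non-standard extensions that are not part of RFC 9309, so honoring them is voluntary. Python's standard `urllib.robotparser` can read them; the `robots.txt` contents and the `polite_interval` helper below are made up for illustration:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Request-rate: 1/5
Disallow: /private/
"""


def load_robots(text):
    rp = RobotFileParser()
    rp.parse(text.splitlines())
    return rp


def polite_interval(rp, user_agent):
    """Minimum seconds between requests implied by Crawl-delay and Request-rate (0 if neither is set)."""
    delay = rp.crawl_delay(user_agent) or 0
    rate = rp.request_rate(user_agent)  # RequestRate(requests, seconds) or None
    rate_gap = rate.seconds / rate.requests if rate else 0
    return max(delay, rate_gap)


rp = load_robots(ROBOTS_TXT)
interval = polite_interval(rp, "ExampleBot")
```

A crawler following this file would wait at least 10 seconds between requests and skip `/private/`; nothing but goodwill makes it do so.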

PaulDavisThe1st · 2025-05-15T14:25:05 1747319105

I have no idea where they are from. I'd be surprised if MS is using a network of 1M+ residential IP addresses, but they've surprised me before ...

londons_explore · 2025-05-14T23:41:41 1747266101

Surely most AI trawlers have special support for git and just clone the repo once?

Macha · 2025-05-14T23:47:43 1747266463

The AI companies could do work or they could not do work.

They've pretty widely chosen to not do work and just slam websites from proxy IPs instead.

You would think their products would be used by them to do the work if they worked as well as advertised...

ikiris · 2025-05-15T00:53:51 1747270431

I think you vastly overestimate the average dev and their care for handling special cases that are mostly other people’s aggregate problem.

koolba · 2025-05-15T03:41:09 1747280469

Can’t they use the AIs to do it?

1oooqooq · 2025-05-15T14:26:43 1747319203

not if you vibe coded your crawler

NBJack · 2025-05-15T00:12:30 1747267950

Apparently, the vibe coding session didn't account for it. /s

I would more readily assume a large social networking company filled with bright minds would have worked out some kind of agreement on, say, a large corpus of copyrighted training data before using it.

It's the wild wild west right now. Data is king for AI training.

whitehexagon · 2025-05-15T07:29:22 1747294162

If a company the size of MS isn't able to handle the DoS caused by the LLM slurpers, then it really is game over for the open internet. We are going to need government-approved, ID-based logins to even read the adverts at this rate.

But this feels like a further attempt to create a walled garden around 'our' source code. I say our, but the first push to KYC, asking for phone numbers, was enough for me to delete everything and close my account. From the outside, it feels like those walls get taller every month. I often see an interesting project mentioned on HN and clone the repo, but more and more often that fails. Trying to browse online is now limited, and they recently disabled search without an account.

For such a critical piece of worldwide technology infrastructure, maybe it would be better run by a not-for-profit independent foundation. I guess, since it is just git, anyone could start this, and migration would be easy.

notpushkin · 2025-05-15T09:17:34 1747300654

I’m pretty sure they can handle it, but given their continuous (if somewhat bittersweet) relationship with OpenAI, I’m pretty sure they are just trying to protect “their IP“ or something.

graemep · 2025-05-15T10:06:55 1747303615

These exist, and you can self host.

However, a lot of people think Github is the only option, and it benefits from network effects.

Non-profit alternatives suffer from a lack of marketing and deal making. True of most things these days.

brookst · 2025-05-15T13:51:29 1747317089

They also don’t have the resources to ensure perf and reliability if they get really popular, or to invest in UI and other goodness.

Still great for some applications and developers, but not all.

notpushkin · 2025-05-15T10:11:24 1747303884

> Non-profit alternatives suffer from a lack of marketing and deal making

Sad but true. I’m trying to promote these whenever I can.

ghssds · 2025-05-15T08:04:59 1747296299

you mean https://savannah.gnu.org?

notpushkin · 2025-05-15T08:30:55 1747297855

Or maybe https://codeberg.org/.

pdimitar · 2025-05-15T19:11:59 1747336319

Codeberg, Gitea, Forgejo.

jorams · 2025-05-14T15:58:38 1747238318

> These changes will apply to operations like cloning repositories over HTTPS, anonymously interacting with our REST APIs, and downloading files from raw.githubusercontent.com.

Or randomly when clicking through a repository file tree. The first time I hit a rate limit was when I was skimming through a repository on my phone: at about the 5th file I clicked, I was denied and locked out. Not for a few seconds either; it lasted long enough that I gave up waiting and refreshing every ~10 seconds.

zX41ZdbW · 2025-05-14T22:10:03 1747260603

This can affect hosting databases in GitHub repositories.

Yes, it does not look like an intended use of the service, but I used it for a demo: https://github.com/ClickHouse/web-tables-demo/

Anyway, will try to do the same with GitHub pages :)

hardwaresofton · 2025-05-15T02:00:43 1747274443

Does it seem to anyone like eventually the entire internet will be login only?

At this point knowledge seems to be gathered and replicated to great effect and sites that either want to monetize their content OR prevent bot traffic wasting resources seem to have one easy option.

mjevans · 2025-05-15T03:21:12 1747279272

Static, Near Static (not generated on demand at least; generated only on real content update), and Login seems likely.

AI not caching things is a real issue. Sites being difficult TO cache / failing the 'wget mirror test' is the other side of the issue.

grishka · 2025-05-15T03:44:00 1747280640

What about AI not respecting robots.txt? I myself have never run into this, but I've seen complaints from many people who have.

tonyhart7 · 2025-05-15T05:53:53 1747288433

"What about AI not respecting robots.txt?"

Since when do actors that want to gather all your data respect things like this??? How can you enforce such things with just "please don't crawl this directory, thanks"?

arkh · 2025-05-15T07:27:34 1747294054

> how can you enforce such things

A van, some balaclavas and 4 people with big sticks.

grishka · 2025-05-15T16:43:21 1747327401

You can't enforce them, but for the entire preceding history of the internet, most crawlers respected them. But then AI happened and those companies decided that their noble mission of forcing their slop into as many facets of human life as possible is above some stupid rules.

jrochkind1 · 2025-05-15T13:00:07 1747314007

This means it's no longer safe to point to github-hosted repos in `git:` or `github:` dependencies in ruby bundler, yes?

I forget because I don't use them, but weren't there some products meant as dependency package repositories that github had introduced at some point, for some platforms? Does this apply to them? (I would hope not unless they want to kill them?)

This rather enormously changes github's potential place in ecosystems.

What with the poor announcement/rollout (also unusual for what we expect of GitHub, if they had realized how much this affects), I wonder if this was an "emergency" thing, not fully thought out, in response to the crazy decentralized bot deluge we've all been dealing with. I wonder if they will reconsider and come up with another solution; this one and the way it was rolled out do not really match the ingenuity and competence we usually count on from GitHub.

I think it will hurt github's reputation more than they realize if they don't provide a lot more context, with suggested workarounds for various use cases, and/or a rollback. This is actually quite an impactful change, in a way that the subtle rollout seems to suggest they didn't realize?

Animats · 2025-05-15T07:29:07 1747294147

Are the scraper sites using a large number of IP addresses, like a distributed denial of service attack? If not, rather than explicit blocking, consider using fair queuing. Do all the requests from IP addresses that have zero requests pending. Then those from IP addresses with one request pending, and so forth. Each IP address contends with itself, so making massive numbers of requests from one address won't cause a problem.

I put this on a web site once, and didn't notice for a month that someone was making queries at a frantic rate. It had zero impact on other traffic.
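A minimal sketch of the fair-queuing idea described above, assuming requests are keyed by client IP. The `FairQueue` class is hypothetical: it picks the client with the fewest requests in flight and rotates among ties, so a client with a long backlog mostly competes with itself.

```python
from collections import OrderedDict, defaultdict, deque


class FairQueue:
    """Serve the client with the fewest requests in flight; rotate among ties."""

    def __init__(self):
        self.waiting = OrderedDict()       # client -> deque of queued requests
        self.in_flight = defaultdict(int)  # client -> requests currently being served

    def submit(self, client, request):
        self.waiting.setdefault(client, deque()).append(request)

    def next_request(self):
        if not self.waiting:
            return None
        # min() returns the first minimum in iteration order, so ties go to the front of the line.
        client = min(self.waiting, key=lambda c: self.in_flight[c])
        queue = self.waiting[client]
        request = queue.popleft()
        if queue:
            self.waiting.move_to_end(client)  # back of the line for the next tie
        else:
            del self.waiting[client]
        self.in_flight[client] += 1
        return client, request

    def done(self, client):
        self.in_flight[client] -= 1


fq = FairQueue()
for i in range(5):
    fq.submit("198.51.100.7", f"scraper-{i}")  # one client with a backlog
fq.submit("203.0.113.9", "human-0")            # a single request from someone else
served = []
while (item := fq.next_request()) is not None:
    client, request = item
    served.append(request)
    fq.done(client)  # single worker: each request finishes before the next starts
```

Here `served` comes out as scraper-0, human-0, scraper-1, ..., scraper-4: the lone request waits behind one of the scraper's requests, not all five.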

jashmatthews · 2025-05-15T08:52:52 1747299172

Exactly that. It's an arms race between companies that offer a large number of residential IPs as proxies and companies that run unauthenticated web services trying not to die from denial of service.

https://brightdata.com/

fellerts · 2025-05-15T07:43:57 1747295037

Huh, that sounds very reasonable, and it's the first time I've heard it mentioned. Why isn't this more widespread?

hombre_fatal · 2025-05-15T12:20:30 1747311630

Complex, stateful.

I'm not even sure what that would look like for a huge service like GitHub. Where do you hold those many thousands of concurrent http connections and their pending request queues in a way that you can make decisions on them while making more operational sense than a simple rate limit?

A lot of things would be easy if it were viable to have one big all-knowing giga load balancer.

I remember Rap Genius wrote a blog post whining that Heroku did random routing to their dynos instead of routing to the dyno with the shortest request queue. As opposed to just making an all-knowing infiniscaling giga load balancer that knows everything about the system.

10000truths · 2025-05-15T17:38:26 1747330706

A giga load balancer is no less viable than a giga Redis cache or a giga database. Rate limiting is inherently stateful - you can't rate limit a request without knowledge of prior requests, and that knowledge has to be stored somewhere. You can shift the state around, but you can't eliminate it.

Sure, some solutions tend to be more efficient than others, but those typically boil down to implementation details rather than fundamental limitations in system design.
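To make the point concrete: even a simple token bucket needs a stored token count and timestamp per client, and that dictionary is the state that has to live in a load balancer, a Redis instance or somewhere else. A minimal in-process sketch, with hypothetical names and parameters:

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    tokens: float
    last: float


class TokenBucketLimiter:
    """Token-bucket rate limiter; all per-client state lives in self.buckets."""

    def __init__(self, rate_per_sec, burst):
        self.rate = rate_per_sec
        self.burst = burst
        self.buckets = {}  # client -> Bucket: the state that must be stored somewhere

    def allow(self, client, now):
        b = self.buckets.get(client)
        if b is None:
            b = self.buckets[client] = Bucket(tokens=self.burst, last=now)
        # Refill for the time elapsed since the last request, capped at the burst size.
        b.tokens = min(self.burst, b.tokens + (now - b.last) * self.rate)
        b.last = now
        if b.tokens >= 1:
            b.tokens -= 1
            return True
        return False
```

Moving this to a shared store changes where `buckets` lives and how updates are made atomic, not the fact that it has to exist.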

Animats · 2025-05-15T22:47:37 1747349257

> Where do you hold those many thousands of concurrent http connections and their pending request queues in a way that you can make decisions on them while making more operational sense than a simple rate limit?

Holding open an idle HTTP connection is cheap today. That's the use case for "async". Servicing a Github fetch is much more expensive.

Animats · 2025-05-15T22:52:37 1747349557

Because it doesn't help against DDoS attacks with bogus request sources.

It's a good mitigation when you have legit requests, and some requestors create far more load than others. If Github used fair queuing for authenticated requests, heavy users would see slower response, but single requests would be serviced quickly. That tends to discourage overdoing it.

Still, if "git clone" stops working, we're going to need a Github alternative.

jclulow · 2025-05-15T07:45:13 1747295113

Yes, LLM-era scrapers are frequently making use of large numbers of IP addresses from all over the place. Some of them seem to be bot nets, but based on IP subnet ownership it seems also pretty frequently to be cloud companies, many of them outside the US. In addition to fanning out to different IPs, many of the scrapers appear to use User Agent strings that are randomised, or perhaps in some cases themselves generated by the slop factory. It's pretty fucking bleak out there, to be honest.

Animats · 2025-05-15T20:31:06 1747341066

Sounds like a violation of the Computer Fraud and Abuse Act. If a big company training an LLM is doing that, it should be possible to find them and have them prosecuted.

jrochkind1 · 2025-05-14T21:01:44 1747256504

Wow, I'm realizing this applies to even browsing files in the web UI without being logged in, and the limits are quite low?

This rather significantly changes the place of github hosted code in the ecosystem.

I understand it is probably a response to the ill-behaved decentralized bot-nets doing mass scraping with cloaked user-agents (that everyone assumes is AI-related, but I think it's all just speculation and it's quite mysterious) -- which is affecting most of us.

The mystery bot net(s) are kind of destroying the open web, by the counter-measures being chosen.

thih9 · 2025-05-14T19:23:12 1747250592

What does “secondary” stand for here in the error message?

> You have exceeded a secondary rate limit.

Edit and self-answer:

> In addition to primary rate limits, GitHub enforces secondary rate limits

(…)

> These secondary rate limits are subject to change without notice. You may also encounter a secondary rate limit for undisclosed reasons.

https://docs.github.com/en/rest/using-the-rest-api/rate-limi...
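When a script does hit these limits, GitHub's REST docs (as I understand them) suggest honoring `retry-after` if it is present, otherwise waiting until `x-ratelimit-reset` when `x-ratelimit-remaining` is `0`, and otherwise waiting at least a minute. A minimal sketch of that decision; the function name is hypothetical, and it assumes `retry-after` is given in seconds rather than as an HTTP date.

```python
from typing import Mapping


def seconds_to_wait(headers: Mapping[str, str], now_epoch: float) -> float:
    """Hypothetical helper: how long to back off after a 403/429 from GitHub."""
    # HTTP header names are case-insensitive; normalize before looking them up.
    h = {k.lower(): v for k, v in headers.items()}
    if "retry-after" in h:
        return float(h["retry-after"])
    if h.get("x-ratelimit-remaining") == "0" and "x-ratelimit-reset" in h:
        # x-ratelimit-reset is a UTC epoch timestamp, not a duration.
        return max(0.0, float(h["x-ratelimit-reset"]) - now_epoch)
    # No explicit hint, e.g. a secondary limit: back off for at least a minute.
    return 60.0
```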

pogue · 2025-05-14T06:34:11 1747204451

I assume they're trying to keep ai bots from strip mining the whole place.

Or maybe your IP/browser is questionable.

globie · 2025-05-14T16:34:35 1747240475

What's being strip mined is the openness of the Internet, and AI isn't the one closing up shop. Github was created to collaborate on and share source code. The company in the best position to maximize access to free and open software is now just a dragon guarding other people's coins.

The future is a .txt file of John Carmack pointing out how efficient software used to be, locked behind a repeating WAF captcha, forever.

roughly · 2025-05-15T00:43:39 1747269819

AI isn't the one closing up shop, it’s the one looting all the stores and taking everything that isn’t bolted down. The AI companies are bad actors that are exploiting the openness of the internet in a fashion that was obviously going to lead to this result - the purpose of these scrapers is to grab everything they can and repackage it into a commercial product which doesn’t return anything to the original source. Of course this was going to break the internet, and people have been warning about that from the first moment these jackasses started - what the hell else was the outcome of all this going to be?

globie · 2025-05-15T05:53:19 1747288399

This rings the same tune as the MPAA and RIAA utilizing lawfare to destroy freedom online when pirates were the ones "break[ing] the internet."

Could you help me understand what the difference is between your point and the arguments MPAA and RIAA used to ruin the torrent users' lives they concluded were "thieves"?

As a rule of thumb, do you think people who are happy with the services they contribute content to being open access and wish them to remain so should be the ones who are forced to constantly migrate to new services to keep their content free?

When AI can perfectly replicate the browsing behavior of a human being, should Github restrict viewing a git repository to those who have verified blood biometrics or had their eyes scanned by an Orb? If they make that change, will you still place blame on "jackasses"?

roughly · 2025-05-15T17:04:46 1747328686

The moral argument in favor of piracy was that it didn’t cost the companies anything and the uses were noncommercial. Neither of those applies to the AI scrapers - they’re aggressively overusing freely-provided services (listen to some of the other folks on this thread about how the scrapers behave) and they’re doing so to create competing commercial products.

I’m not arguing you shouldn’t be annoyed by these changes, I’m arguing you should be mad at the right people. The scrapers violated the implicit contract of the open internet, and now that’s being made more explicit. GitHub’s not actually a charity, but they’ve been able to provide a free service in exchange for the good will and community that comes along with it driving enough business to cover their costs of providing that service. The scrapers have changed that math, as they did with every other site on the internet in a similar fashion. You can’t loot a store and expect them not to upgrade the locks - as the saying goes, the enemy gets a vote on your strategy, too.

globie · 2025-05-15T20:30:39 1747341039

There are plenty of commercial pirates, and those commercial uses were grouped in with noncommercial sharing in much the same way you are doing with scraping. Am I wrong in assuming most of this scraping comes from people utilizing AI agents for things like AI-assisted coding? If an AI agent scrapes a page at a user's request (say, the billionth git commit scraped today), do you consider that "loot[ing] a store"? What got looted? Is it the bandwidth? The CPU? Or does this require the assumption that the author of that commit wouldn't be excited that their work is being used?

I'd like to focus on your strongest point, which is the cost to the companies. I would love to know what that increase in cost looks like. You can install nginx on a tiny server and serve 10k rps of static content, or like 50 (not 50k) rps of a random web framework that generates the same content. So this increase in cost must be weighed against how efficient the software serving that content is.

If this Github post included a bunch of numbers and details demonstrating that they have reached the end of the line on optimizing their web frontend, that they have run out of things to cache, and that the increase in costs is a real cause for concern to the company (not just a quick shave to the bottom line, not a bigger net/compute check written from Github to their owners), I'd throw my hands up with them and start rallying against the (unquestionably inefficient and borderline hostile) AI agent scrapers causing the increase in traffic.

Because they did not provide that information, I have to assume that Github and Microsoft are doing this out of pure profit motivations and have abandoned any sense of commitment to open access of software. In fact, they have much to gain from building the walls of their garden up as high as they can get away with, and I'm skeptical their increase in costs is very material at all.

I would rather support services that don't camouflage as open and free software proponents one day and victims of a robbery on the next. I still think this question is important and valid: There is tons of software on Github written by users who wish for their work to remain open access. Is that the class of software and people you believe should be shuffled around into smaller and smaller services that haven't yet abandoned the commitments that allowed them to become popular?

roughly · 2025-05-16T00:04:02 1747353842

> There are plenty of commercial pirates, and those commercial uses were grouped in with noncommercial sharing

I don't think many people were particularly sympathetic to people making money off piracy - by and large, people were upset because people committing piracy for personal use were getting hit with the kinds of fines and legal charges usually reserved for, well, people who make money off piracy.

> Am I wrong in assuming most of this scraping comes from people utilizing AI agents for things like AI-assisted coding?

Yes. The huge increases in traffic aren't from, say, Claude going and querying Github when you ask it to, it's from the scraping to drive the initial training process. Claude and the others know the first thing about code because Github and StackOverflow were part of their training corpus, because the companies which made them scraped the whole damn site and used it as part of their training data for making a ~competing product. That's what Github's reacting to, that's what Reddit reacted to, that's what everyone's been reacting to - it's the scraping of the data for training that's leading to these reactions.

To be clear, because I think this is maybe a core of our disagreement: The problem that's leading to this isn't LLM agents acting on behalf of a user - it's not that Cursor googled python code for you - it's that the various companies training the models are aggressively scraping everything they can get their hands on. It's not one request for one repo on behalf of one user, it's the wholesale scraping of everything on the site by a rival company to make a rival product, most likely in violation of terms of service and certainly in violation of anything that anyone could reasonably assume another corporate entity would stand for. Github's not mad at you, they're mad at OpenAI.

> There is tons of software on Github written by users who wish for their work to remain open access. Is that the class of software and people you believe should be shuffled around into smaller and smaller services that haven't yet abandoned the commitments that allowed them to become popular?

You store your money in a bank. The bank gets robbed repeatedly by an organized group of serial bank robbers, and increases security at the branch. You move your money to another bank, because the increased security annoys you. You understand the problem here may repeat itself elsewhere as well, right?

globie · 2025-05-19T17:29:09 1747675749

>You understand the problem here may repeat itself elsewhere as well, right?

I do, and how is this to cap off our discussion:

>You move your money to another bank, because the increased security annoys you.

On my way out, I would quote Benjamin Franklin: "Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety."

I go out of my way to help my community, and I expect those I support to do the same. That bank could put more resources into investigating/catching the robbers who are attacking not just the bank but the security and liberty of their community, or it could treat every customer with more suspicion than they did yesterday. I know where the latter option leads, and I won't stand for it.

You're right that it repeats itself elsewhere. Often, I find.

Dylan16807 · 2025-05-15T07:36:42 1747294602

> Could you help me understand what the difference is

Well the main difference is that this is being used to justify blocking and not demanding thousands of dollars.

> When AI can perfectly replicate the browsing behavior of a human being

They're still being jackasses because I'm willing to pay to give free service to X humans but not 20X bots pretending to be humans.

lionkor · 2025-05-15T19:49:33 1747338573

Free and open source software is on GitHub, but AI and other crawlers do not respect the licenses. As someone who writes a lot of code under specific FOSS licenses, I welcome any change that makes it harder for machines to take my code and just steal it.

voidnap · 2025-05-14T06:52:07 1747205527

I encountered this on GitHub last week. Very aggressive rate limiting. My browser and IP are very ordinary.

Since Microsoft is struggling to make ends meet, maybe they could throw a captcha or proof of work like Anubis by xe iaso.

They already disabled code search for unauthenticated users. It's totally plausible they will disable code browsing as well.

kstrauser · 2025-05-14T15:41:14 1747237274

That hit me, too. I thought it was an accidental bug and didn’t realize it was actually malice.

miohtama · 2025-05-15T07:25:06 1747293906

Just sign in if it's an issue for your usage.

voidnap · 2025-05-15T07:52:25 1747295545

My usage isn't high. I was rate limited to like 5 requests per minute. It was a repo with several small files.

And seriously if they keep this up, with limits on their web interface but leave unauthenticated cloning allowed, I'd rather clone the repo than log in.

GitHub code browsing has gone south since Microsoft bought them anyway. Having a simple proxy that clones a repo and serves it would solve the problems with rate limits and their awful UX.
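A rough sketch of that clone-and-serve idea, assuming `git` is on the PATH and that a shallow clone of the default branch is enough for read-only browsing. The function names and the example URL are hypothetical. It uses only the standard library's `http.server`, which is fine for local use but not something to expose to the internet.

```python
import functools
import http.server
import subprocess
from pathlib import Path
from typing import List


def clone_command(repo_url: str, dest: Path) -> List[str]:
    # Shallow clone: only the latest commit, which is all read-only browsing needs.
    return ["git", "clone", "--depth", "1", repo_url, str(dest)]


def clone_and_serve(repo_url: str, dest: Path, port: int = 8000) -> None:
    """Hypothetical helper: clone once, then serve the working tree locally."""
    subprocess.run(clone_command(repo_url, dest), check=True)
    # Serve files and directory listings from the clone, not the current directory.
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=str(dest))
    with http.server.ThreadingHTTPServer(("127.0.0.1", port), handler) as server:
        server.serve_forever()
```

Calling `clone_and_serve("https://github.com/example/repo.git", Path("repo"))` would hit GitHub once for the clone and then serve everything from `http://127.0.0.1:8000/`.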

confusing3478 · 2025-05-14T06:56:55 1747205815

> Or maybe your IP/browser is questionable.

I'm using Firefox and Brave on Linux from a residential internet provider in Europe, and the 429 error triggers consistently on both browsers. Not sure I would consider my setup questionable, considering their target audience.

grodriguez100 · 2025-05-14T06:59:39 1747205979

I’m browsing from an iPhone in Europe right now and can browse source code just fine without being logged in.

JeremyStinson · 2025-05-15T05:45:15 1747287915

Then it means they're looking at the User-Agent string and determining that an iPhone in Europe most likely has a human using it, and might not require rate-limiting.

tostr · 2025-05-14T06:47:43 1747205263

*other ai bots, ms will obviously mine anything on there.

Personally, I like sourcehut (sr.ht)

immibis · 2025-05-14T15:29:01 1747236541

Same way Reddit sells all its content to Google, then stops everyone else from getting it. Same way Stack Overflow sells all its content to Google, then stops everyone else from getting it.

(Joke's on Reddit, though, because Reddit content became pretty worthless since they did this, and everything before they did this was already publicly archived)

croes · 2025-05-14T12:41:05 1747226465

Other bots or MS bots too?

jhgg · 2025-05-14T21:49:21 1747259361

The truth is this won't actually stop AI crawlers and they'll just move to a large residential proxy pool to work around it. Not sure what the solution is honestly.

dmitrygr · 2025-05-15T00:00:33 1747267233

Criminal charges under CFAA to actual CEOs of actual companies doing this, with long jail terms.

latentsea · 2025-05-15T01:13:25 1747271605

I don't recall ever seeing a CEO go to jail for practically anything. I'm sure there are lots of examples, but at this point in my life I've derived a rule of thumb of "if you want to commit a crime, just disguise it as a legitimate business", based on seeing so many cases where CEOs get off scot-free.

simoncion · 2025-05-15T03:02:36 1747278156

FWIW, Joseph Nacchio (the CEO of Qwest Communications) went to jail for like a decade for refusing to help the NSA violate FISA circa 2001.

umbra07 · 2025-05-15T04:54:42 1747284882

He went to jail because he refused to help the NSA violate FISA, and then he sold the stock knowing that his refusal would cause the stock to drop (i.e. insider trading). His conviction was entirely on the basis of insider trading. The SEC went after him, not the NSA or DOJ or whatever.

dragonwriter · 2025-05-15T04:58:44 1747285124

> His conviction was entirely on the basis of insider trading. The SEC went after him, not the NSA or DOJ or whatever.

The SEC has no criminal prosecution powers; all they can do in that regard is write a note asking the DOJ to pretty-please look into something. The only way to get a federal (civilian) criminal conviction is to have the DOJ go after you.

miohtama · 2025-05-15T07:22:57 1747293777

Criminally charging Russians and Chinese does not work. If we want to play hard, the solution would be to drop these countries off the internet.

The US cannot even stop NSO from hacking systems with spyware, and Israel is a political ally.

dmitrygr · 2025-05-15T09:29:19 1747301359

Most of the ML training crawlers hitting my site come from the USA. Should we drop the USA too?

DonHopkins · 2025-05-15T13:36:03 1747316163

"US Out of North America!" -Ironic Protest T-Shirt and Graffiti (Social-Revolutionary Anarchist Federation) ;)

https://the-t-shirt-chronicles.com/2022/02/15/u-s-out-of/

miohtama · 2025-05-15T07:21:12 1747293672

At GitHub scale, crawlers will run out of IP addresses regardless of whether they use residential addresses.

croemer · 2025-05-15T00:52:03 1747270323

The blog post is tagged with "improvement" - ironic for more restrictive rate limits.

Also, neither the new nor the old rate limits are mentioned.

pdimitar · 2025-05-15T19:19:40 1747336780

A take that I'm not seeing in all the "LLM scrapers are heading to our site, run for your lives!" threads is this:

Why can't people harden their software with guards? Proper DDoS protection? Better caching? Rewrite the hot paths in C, Rust, Zig, Go, Haskell etc.?

It strikes me as very odd, the atmosphere of these threads. So much doom and gloom. If my site was hit by an LLM scraper I'd be like "oh, it's on!", a big smile, and I'll get to work right away. And I'll have that work approved because I'll use the occasion to convince the executives of the need. And I'll have tons of fun.

Can somebody offer a take on why we, the forefront of the tech sector, are just surrendering almost without firing a single shot?

lionkor · 2025-05-15T19:47:37 1747338457

Because our sites are written in layers of abstraction and terrible design, so requests take serious server resources. If we hosted everything "well", you'd get 10-20k req/s per CPU core, but we don't.

pdimitar · 2025-05-15T20:00:53 1747339253

True. I am simply wondering: is the resistance from executives so powerful that it can never be overcome? Can't we ever just tell them "Look, this is like your car with plastic suspension: it will work for a few days or even months, but we can't rely on it forever; it's time to do it properly"?

Especially when the car's plastic suspension is costing them extra money? I don't get it here, for real. I would think that selfish capitalistic interests would have them come around at one point! (Clarification: invest $5M for a year before the whole thing starts costing you extra $30M a year, for example.)

And don't even get me started on the fact that GitHub is written in one of the most hardware-inefficient web frameworks (Rails). I mean OK, Rails is absolutely great for many things: most businesses will never scale that far, so the increased initial developer velocity is an unquestionable one-sided win. I get that, and I stopped hating Rails a long time ago (I still dislike it, but I do recognize where it's a solid and even preferred choice). But I've made a lot of money modernizing and maintaining Rails monoliths, and beyond a certain scale it's just not suited to the job, at least not without paying extremely expensive consultants. Everything can be made to work, but past a certain scale it starts costing exponentially more.

And yet nobody at GitHub figures "Maybe it's time we rewrite some of the hot paths?" or just "Let's cache more aggressively, even if it means some users see data that's 30 seconds out of date"? Nothing at all?

Sorry, I am kind of ranting and not really saying anything to you per se. I am just very puzzled about how paralyzed GitHub seems under Microsoft.

lionkor · 2025-05-15T22:04:02 1747346642

I'm fully with you.

However, execs I know lease cars, not buy them, for that exact reason. You don't care if the suspension is made of plastic, if it's a subscription model. The metaphor very much falls apart but I had a point somewhere.

pdimitar · 2025-05-16T00:18:34 1747354714

Oh yeah. That. :/

Well, there's a solution for that as well: execs should be liable for a number of years even after they move on. Para-troopers that swoop in, reap rewards they never worked for, and parachute away with the gold is something that must be legislated against, hard and aggressive. People should go to jail.

But... these are the people who make the rules so not happening, right?

Oh well, better luck to us in the next life I guess.

lionkor · 2025-05-16T17:01:15 1747414875

I found the best way to improve the product, as a programmer, is to know when you can squeeze in little refactors and improvements. My bosses learn to appreciate this pretty quickly.

Zdh4DYsGvdjJ · 2025-05-14T07:11:27 1747206687

This was announced https://github.blog/changelog/2025-05-08-updated-rate-limits...

dang · 2025-05-14T15:24:14 1747236254

(This was originally posted as a reply to https://news.ycombinator.com/item?id=43981344 but we're merging the threads)

croes · 2025-05-14T12:42:21 1747226541

Doesn‘t make it any better.

Collateral damage of AI I guess

formerly_proven · 2025-05-14T13:55:11 1747230911

It's even more hilarious because this time it's Microsoft/Github getting hit by it. (It's funny because MS themselves are such a bad actor when it comes to AIAIAI).

fragmede · 2025-05-14T14:08:58 1747231738

This is the same Microsoft that owns LinkedIn, which got sued by hiQ; that case produced the ruling that is pushing sites to require logins.

immibis · 2025-05-14T15:28:01 1747236481

Wow! Website terms of use actually meant something in a court of law!

fragmede · 2025-05-14T16:27:21 1747240041

that wasn't what the case was about, so not really.

londons_explore · 2025-05-15T04:03:50 1747281830

Most of these unauthenticated requests are read-only.

All of public github is only 21TB. Can't they just host that on a dumb cache and let the bots crawl to their heart's content?

yorwba · 2025-05-15T04:49:55 1747284595

I guess you're getting the size from the Arctic Code Vault? https://github.blog/news-insights/company-news/github-archiv... That was 5 years ago and is presumably in git's compressed storage format. Caching the corresponding GitHub HTML would take significantly more.

TheDong · 2025-05-15T09:24:27 1747301067

You're talking about the 21TB captured to the arctic code vault, but that 21TB isn't "all of public github"

Quoting from https://archiveprogram.github.com/arctic-vault/

> every *active* public GitHub repository. [active meaning any] repo with any commits between [2019-11-13 and 2020-02-02 ...] The snapshot consists of the HEAD of the default branch of each repository, minus any binaries larger than 100KB in size

So no files larger than 100KB, no commit history, no issues or PR data, no other git metadata.

If we look at this blog post from 2022, the number we get is 18.6 PB for just git data https://github.blog/engineering/architecture-optimization/sc...

Admittedly, that includes private repositories too, and there's no public number for just public repositories, but I'm certain it's at least a noticeable fraction of that ~19PB.

kvemkon · 2025-05-15T11:56:59 1747310219

> At GitHub, we store a lot of Git data: more than 18.6 petabytes of it, to be precise.

About $250,000 for 1,000 HDDs and you get all the data. That means a private person, such as a top FAANG engineer, could afford a copy of the whole thing after 2-3 years on the job. For companies dealing with AI, that raw price is nothing at all.
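Spelling out the arithmetic behind that estimate, assuming decimal units (1 PB = 1000 TB) and no redundancy or backups: 18.6 PB spread over 1,000 drives means each drive has to hold about 18.6 TB, and $250,000 over 1,000 drives means about $250 per drive. The variable names are just for illustration.

```python
PETABYTE = 10**15  # decimal units, as drive vendors use
TERABYTE = 10**12

git_data_bytes = 18.6 * PETABYTE  # figure from GitHub's 2022 engineering post
drives = 1000                     # drive count from the comment
budget_usd = 250_000              # budget from the comment

per_drive_tb = git_data_bytes / drives / TERABYTE
per_drive_usd = budget_usd / drives

print(f"{per_drive_tb:.1f} TB per drive, ${per_drive_usd:.0f} per drive")
# prints: 18.6 TB per drive, $250 per drive
```

Any redundancy (mirroring, RAID parity) would multiply the drive count accordingly.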

jarofgreen · 2025-05-14T06:51:56 1747205516

Also https://github.com/orgs/community/discussions/157887 "Persistent HTTP 429 Rate Limiting on *.githubusercontent.com Triggered by Accept-Language: zh-CN Header" but the comments show examples with no language headers.

I encountered this too once, but thought it was a glitch. Worrying if they can't sort it.

Euphorbium · 2025-05-14T07:09:35 1747206575

I remember getting this error a few months ago; this does not seem like a temporary glitch. They don't want LLM makers to slurp up all the data.

new_user_final · 2025-05-14T07:15:03 1747206903

Isn't git clone faster than browsing the web UI?

PaulDavisThe1st · 2025-05-14T16:49:27 1747241367

Yep. But AI trawlers don't use it. Ask them why.

jopsen · 2025-05-14T22:44:13 1747262653

Do we know it's AI trawlers?

And not just generally degenerate bots? Or just one evil bot network?

PaulDavisThe1st · 2025-05-14T23:32:52 1747265572

Is there any difference between those 3?

trallnag · 2025-05-14T17:11:39 1747242699

Good that tools like Homebrew that heavily rely on GitHub usually support environment variables like GITHUB_TOKEN
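A minimal sketch of what that support amounts to, using Python's standard `urllib`: if a token is present in the environment, send it in the `Authorization` header so the request counts against the authenticated limit rather than the per-IP unauthenticated one. The function name is hypothetical, and the exact environment variable differs from tool to tool; `GITHUB_TOKEN` here just follows the comment.

```python
import os
import urllib.request


def github_request(url: str) -> urllib.request.Request:
    """Hypothetical helper: build a GitHub API request, authenticated if a token is set."""
    headers = {"Accept": "application/vnd.github+json"}
    token = os.environ.get("GITHUB_TOKEN")
    if token:
        # Authenticated requests are counted per user, not per IP address.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, headers=headers)
```

For example, `urllib.request.urlopen(github_request("https://api.github.com/rate_limit"))` returns JSON describing the remaining quota; GitHub documents that querying this endpoint does not itself count against the REST API limit.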

jrochkind1 · 2025-05-14T20:57:17 1747256237

Did I miss where it says what the new rate limits are? Or are they secret?

mmsc · 2025-05-14T22:25:25 1747261525

Even with authenticated requests, viewing a pull request and adding `.diff` to the end of the URL is currently ratelimited at 1 request per minute. Incredibly low, IMO.

soraminazuki · 2025-05-15T01:10:52 1747271452

This is going to make the job of package managers a PITA. Especially Nix.

spacephysics · 2025-05-14T21:14:15 1747257255

Probably to throttle scraping from AI competitors, and have them pay for the privilege as many other services have been doing

InfiniteLoup · 2025-05-14T15:59:08 1747238348

How would this affect Go dependencies?

athorax · 2025-05-14T16:04:26 1747238666

Go doesn't pull dependencies directly from GitHub, they are pulled from https://proxy.golang.org/ by default
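For the curious, a sketch of how a module's version list is requested from the proxy, assuming the default `GOPROXY` (`https://proxy.golang.org,direct`) and the GOPROXY protocol's `/@v/list` endpoint. The protocol case-encodes module paths (each uppercase letter becomes `!` plus its lowercase form) so they stay unambiguous on case-insensitive filesystems. Function names are hypothetical. Note that the proxy itself still fetches from GitHub the first time it sees a module version, so GitHub's limits aren't entirely out of the picture.

```python
def escape_module_path(path: str) -> str:
    """Case-encode a module path: 'A' becomes '!a', other characters pass through."""
    return "".join("!" + c.lower() if "A" <= c <= "Z" else c for c in path)


def version_list_url(module: str, proxy: str = "https://proxy.golang.org") -> str:
    # GET <proxy>/<escaped module>/@v/list returns known versions, one per line.
    return f"{proxy}/{escape_module_path(module)}/@v/list"
```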

watermelon0 · 2025-05-14T06:53:42 1747205622

Time for Mozilla (and other open-source projects) to move repositories to sourcehut/Codeberg or self-hosted Gitlab/Forgejo?

tonyhart7 · 2025-05-15T05:56:12 1747288572

Moving to an alternative GitHub-like service works, until the crawlers go there too and force the same rate limits there as well.

gsich · 2025-05-14T13:03:43 1747227823

Not Mozilla.

stevekemp · 2025-05-14T17:14:58 1747242898

Once again people post in the "community", but nobody official replies; these discussion-pages are just users shouting into the void.

knowitnone · 2025-05-15T00:13:47 1747268027

you mean you want to better track users

micw · 2025-05-14T13:20:44 1747228844

See also: https://github.com/orgs/community/discussions/159123

xnx · 2025-05-14T14:07:21 1747231641

It sucks that we've collectively surrendered the urls to our content to centralized services that can change their terms at any time without any control. Content can always be moved, but moving the entire audience associated with a url is much harder.

turblety · 2025-05-14T15:30:23 1747236623

Gitea [1] is honestly awesome and lightweight. I've been running my own for years, and since they've put Actions in a while ago (with GitHub compatibility) it does everything I need it to. It doesn't have all the AI stuff in it (but for some that's a positive :P)

1. https://about.gitea.com/

kstrauser · 2025-05-14T15:39:43 1747237183

Gitea’s been great, but I think a lot of its development has moved to Forgejo: https://forgejo.org/

That’s what I run on my personal server now.

homebrewer · 2025-05-14T21:46:26 1747259186

I'm stuck on the latest gitea (1.22) that still supports migration to forgejo and unsure where to go next. So I've been following both projects (somewhat lazily), and it seems to me that gitea has the edge on feature development.

Forgejo promised — but is yet to deliver any — interesting features like federation; meanwhile the real features they've been shipping are cosmetic changes like being able to set pronouns in your profile (and then another 10 commits to improve that...)

If you judge by very superficial metrics like commit counts, forgejo's count is heavily inflated by merges (which gitea's development process doesn't use, preferring rebase) and by frequent dependency upgrades. When you remove those, the remaining commits represent maybe half of gitea's development activity.

So I expect to observe both for another year before deciding where to upgrade. They're too similar at the moment.

FWIW, one of gitea's larger users, Blender, continues to use and sponsor gitea and has no plans to switch AFAIK.

kstrauser · 2025-05-15T00:44:16 1747269856

That's an interesting perspective, and I can't strongly disprove it, but that doesn't match my impression. I cloned both repos (Gitea's from GitHub; Forgejo's from Codeberg, which runs on Forgejo) and ran this command:

  git log --since="1 year ago" --format="%an" | sort | uniq -c | sort -n | wc -l

to get an overview of things. That showed 153 people (including a small handful of bots) contributing to Gitea, and 232 people (and a couple bots) contributing to Forgejo. There are some dupes in each list, showing separate accounts for "John Doe" and "johndoe", that kind of thing, but the numbers look small and similar to me so I think they can be safely ignored.

And it looks to me like Forgejo uses a similar process of combining lots of smaller PR commits into a single merge commit. The vast majority of its commits since last June or so seem to be one commit per PR. Changing the above command to `--since="2024-07-1"` reduces the number of unique contributors to 136 for Gitea and 217 for Forgejo. It also shows 1228 commits for Gitea and 3039 for Forgejo, and I do think that's a legitimately apples-to-apples comparison.

If we brute force it and run

  git log --since="1 year ago" | rg '\(\#\d{4,5}\)' | wc -l

to match lines that mention a PR (like "Simplify review UI (#31062)" or "Remove `title` from email heads (#3810)"), then I'm seeing 1256 PR-like Gitea commits and 2181 Forgejo commits.
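The same two counts can be reproduced without `sort`/`uniq`/`rg`, for example on a machine without ripgrep. This sketch assumes the output of `git log --format="%an"` and of plain `git log` has already been captured as lists of lines; the helper names are hypothetical. Python's `re` doesn't need the `#` escaped, but the pattern is otherwise the same.

```python
import re
from collections import Counter
from typing import Iterable

# Same idea as the rg pattern: a PR reference like "(#31062)" with 4-5 digits.
PR_REF = re.compile(r"\(#\d{4,5}\)")


def count_pr_like(log_lines: Iterable[str]) -> int:
    """Count log lines that mention a PR number, like `rg ... | wc -l`."""
    return sum(1 for line in log_lines if PR_REF.search(line))


def unique_authors(author_names: Iterable[str]) -> int:
    """Distinct author names, like `sort | uniq -c | sort -n | wc -l`."""
    # Exact string match, so "John Doe" and "johndoe" still count separately.
    return len(Counter(author_names))
```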

And finally, their respective activity pages (https://github.com/go-gitea/gitea/pulse/monthly and https://codeberg.org/forgejo/forgejo/activity/monthly) show a similar story.

I'm not an expert in methodology here, but from my initial poking around, it would seem to me that Forgejo has a lot more activity and variety of contributors than Gitea does.

oynqr · 2025-05-15T05:39:56 1747287596

I've successfully migrated from Gitea 1.23 by just rolling back migrations manually in SQL to where forgejo supports it again. Of course, I had backups.

mappu · 2025-05-14T23:18:27 1747264707

The development energy has not really moved, Gitea is moving much faster. Forgejo is stuck two versions behind and with their license change they're struggling to keep up.

TheNewsIsHere · 2025-05-14T15:52:28 1747237948

I’ve almost completed the move of my business from GitHub’s corporate offering to self-hosted Forgejo.

Almost went with Gitea, but the ownership structure is murky, feature development seems to have plateaued, and they haven’t even figured out how to host their own code. It’s still all on GitHub.

I’ve been impressed by Forgejo. It’s so much faster than Github to perform operations, I can actually backup my entire corpus of data in a format that’s restorable/usable, and there aren’t useless (AI) upsells cluttering my UX.

kstrauser · 2025-05-14T16:02:41 1747238561

I agree with every word of that.

For listeners at home wondering why you'd want that at all:

I want a centralized Git repo where I can sync config files from my various machines. I have a VPS so I just create a .git directory and start using SSH to push/pull against it. Everything works!

But then, my buddy wants to see some of my config files. Hmm. I can create an SSH user for him and then set the permissions on that .git to give him read-only access. Fine. That works.

Until he improves some of them. Hey, can I give him a read-write repo he can push a branch to? Um, sure, give me a bit to think this through...

And one of his coworkers thinks this is fascinating and wants to look, too. Do I create an SSH account for this person I don't know well at all?

At this point, I've done more work than just installing something like Forgejo and letting my friend and his FOAF create accounts on it. There's a nice UI for configuring their permissions. They don't have SSH access directly into my server. It's all the convenience of something like GitHub, except entirely under my control and I don't have to pay for private repos.

jarofgreen · 2025-05-14T07:10:26 1747206626

https://github.com/orgs/community/discussions/157887 This has been going on for weeks and is clearly not a simple mistake.

dang · 2025-05-14T15:23:26 1747236206

(We detached this subthread from https://news.ycombinator.com/item?id=43981673 so we could include it in the merged thread)

amai · 2025-05-14T13:25:46 1747229146

Triggered by Chinese language on the client side? Interesting.

radicality · 2025-05-14T07:03:00 1747206180

Just tried it on chrome incognito on iOS and do hit this 429 rate limit :S That sucks, it’s already bad enough when GitHub started enforcing login to even do a simple search.