
I had always assumed it was for a technical reason around not having to keep old threads in hot storage.



Kind of ridiculous that any tin-pot hobby forum can keep decades of threads available but the biggest services can't. Where are the economies of scale? It's not supposed to work like that.


I can tell you why Reddit does it: their entire architecture is built around new data being hot and old data not.

The tiny tin-pot hobby forum can keep every single post in memory and it's not really a problem. They can also do a full database scan pretty quickly.

But Reddit can't do that. If a thread isn't in the cache, it takes a lot of work to pull the data from the database, and there is no way to cache the entire dataset. That's why threads get locked at six months: so they can be statically archived for quick access.

Economies of scale aren't really part of it; it's more about Moore's law.
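
A minimal sketch of that hot/cold split, with hypothetical cache/database/archive interfaces (this illustrates the pattern being described, not Reddit's actual code):

    # Hypothetical read path: hot cache, live database, static archive.
    from datetime import datetime, timedelta, timezone

    ARCHIVE_AGE = timedelta(days=182)  # roughly six months

    def get_thread(thread_id, cache, database, archive):
        # Hot path: recent, frequently read threads live in the cache.
        thread = cache.get(thread_id)
        if thread is not None:
            return thread

        # Cold path: anything past the cutoff is served from a static,
        # read-only archive instead of hitting the main database.
        meta = database.get_metadata(thread_id)
        if datetime.now(timezone.utc) - meta.created_at > ARCHIVE_AGE:
            return archive.fetch(thread_id)

        # Warm path: pull from the database and populate the cache.
        thread = database.get_thread(thread_id)
        cache.set(thread_id, thread)
        return thread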


The amount of actual user text content on even a comparatively large forum is ... surprisingly small.

When Google+ was shutting down, I estimated the total amount of public content within the Communities feature. Median size of a post was pretty close to Tweet-sized --- 120 characters or so. (G+ could ingest very large posts --- I never hit a limit, though I wouldn't be surprised if it was book-sized.) The highly active user population was maybe 12 million (another 300m or so posted at least once). Volume seems to have been about a million posts per week, for six years.

Which works out to less than 100GB of actual user-contributed text.

https://old.reddit.com/r/plexodus/comments/afbnvd/so_how_big...
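
A quick back-of-the-envelope check of that figure, using the approximations above:

    posts_per_week = 1_000_000
    years = 6
    bytes_per_post = 120                       # roughly tweet-sized median

    total_posts = posts_per_week * 52 * years  # ~312 million posts
    total_bytes = total_posts * bytes_per_post

    print(total_bytes / 1e9)                   # ~37 GB -- well under 100 GB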

Adding in the rest of G+ multiplied that out a few times, but likely still well under 1 TB of data. Images were associated with about 1/3 of posts and weighed in at a few MB each, call it 3 MB --- so a picture is worth 24,000 posts.

Rendered and delivered, page weight was just under 1 MB (excluding graphics), for a payload-to-page ratio of 0.015%. (Ironically, about the same as the ratio of monthly posting users to all accounts.)
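
And the same treatment for the image and page-weight ratios (round numbers, so the exact figures come out slightly different from the ones above):

    image_bytes = 3_000_000      # ~3 MB per attached image
    post_bytes = 120             # median post size from above

    print(image_bytes // post_bytes)       # 25000 -- same order as "24,000 posts"

    page_bytes = 1_000_000       # ~1 MB rendered page, excluding graphics
    print(100 * post_bytes / page_bytes)   # 0.012 -- roughly 0.01% payload-to-page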

But if you wanted to extract and store just user content and metadata, excluding video, storage requirements are surprisingly modest.

Facebook's data aren't as clear, but:

> There are 2.375 billion monthly active users (as of Q3 2018).

> In a month, the average user likes 10 posts, makes 4 comments, and clicks on 8 ads.

> Hive is Facebook’s data warehouse, with 300 petabytes of data.

> Facebook generates 4 new petabytes of data per day.

https://www.brandwatch.com/blog/facebook-statistics/

I'm going to assume much of the 4 PB is system data, not user-generated.

2.4 billion users posting 4 x 120 byte comments/mo works out to ~15 TB of textual data per year.
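
The arithmetic behind that estimate, reusing the ~120-byte median from the G+ numbers:

    users = 2.375e9           # monthly active users, Q3 2018
    comments_per_month = 4
    bytes_per_comment = 120   # borrowing the tweet-sized median from the G+ estimate

    bytes_per_year = users * comments_per_month * bytes_per_comment * 12
    print(bytes_per_year / 1e12)   # ~13.7 -- call it ~15 TB of comment text per year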

Values are extrapolated, unconfirmed. Corrections or suggestions welcomed.


There’s no cost benefit for a small forum to remove old threads. Reddit would save hundreds of terabytes doing it though.


"hundreds of terabytes" of actual pure text. I find it extremely unlikely that this is actually the case.



But Reddit doesn't remove anything (AFAIK); it just makes old threads read-only.


If you can render an old thread on demand (as opposed to serving pre-rendered HTML, which I doubt is done for forums), there is no storage-cost or performance benefit to preventing comments on it.

New comments don't need to live in the same read-only storage as the old ones. They just need to be found when the old ones are rendered, and that's easy - the new comments are in hot storage after all.
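
A minimal sketch of that merge at render time, with hypothetical storage interfaces just to illustrate the idea:

    def format_comment(c):
        return f"{c.author}: {c.body}"

    def render_thread(thread_id, archive, live_db):
        # Old comments come from an immutable, cheap-to-serve archive;
        # anything posted since archiving lives in hot storage.
        archived = archive.get_comments(thread_id)
        recent = live_db.get_comments(thread_id)

        # Merge at render time -- the read-only store never has to change.
        merged = sorted(archived + recent, key=lambda c: c.created_at)
        return "\n".join(format_comment(c) for c in merged)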


I agree with the first half of this. The second half implies that the original designer felt that historical comments were important, which I think anyone building a social MVP would disagree with.


There is no reason unless you do some aggressive caching, but forum software never did a good job of making it clear that an old thread has been revived.



