> You can still do that now, but it's not as important
I think you hit the nail on the head here.
Most contemporary journalism is just not worth archiving.
There is a small percentage that definitely is worth archiving, but the bulk of it is produced for the moment.
> Most contemporary journalism is just not worth archiving.
I don't agree. There are many interesting tidbits which go unnoticed as bog-standard / disposable news until the dots connect in the future, and one wants to construct the history backwards from that tipping point.
When these "contemporary articles" go missing, many important details of the history are lost.
One of the people who recovered a lot of old Usenet archives once remarked that the reality was that no one cares about the details of some long ago SunOS bug whereas a lot of the cultural discussions about government policies etc. provide a window into the time.
Are you really interested in reading a 50-year-old article in your community newspaper about a collision that happened on a Saturday afternoon?
Of course not. BUT on the other hand that particular collision may become highly relevant in those 50 years. Imagine your own scenario where the facts reported on the day might conflict with something claimed to be true much later, and the person wasn’t notable at the time but they’re highly notable now.
The issue is we don’t know what may be important after many years, decades or centuries have passed. Given how easily text compresses, it would be a shame to not have at least text of real news sources archived perpetually. I know there is also an endless stream of listicles which are just generated to get clicks and ad views, but newspapers are worth archiving.
> Imagine your own scenario where the facts reported on the day might conflict with something claimed to be true much later, and the person wasn’t notable at the time but they’re highly notable now.
I do see your point. I'm reminded of the photographer who snapped what was basically a throwaway digital picture of Monica Lewinsky weeks/months/years before anyone really knew who she was. Later on, he was happy to have that picture, since it was one (or the only one) of her at some event hugging Clinton.
Text is easy enough to store, but making it useful to search and access seems like another problem to solve.
I’d challenge you to find one article where this is the case.
Almost all “journalism” today is opinion notifications without substance, source, or first hand accounts. I suppose that’s its own kind of historical artifact, but not the kind I’ve seen be meaningful in my study of history.
Who knows, maybe in the future one of those opinion pieces will be the first digital record about a specific topic found by some future civilization. In the same way, the Complaint Tablet to Ea-Nāṣir doesn't provide much historical value, but it is interesting as the oldest written customer complaint.
History isn't just about the big things. The history of the mundane is also important in understanding a culture.
The text of every article is easily worth archiving. If we take average article size to be 500-800 words, then 2KB each is plenty. You want to hang on to 10 billion articles? That will cost you 1 (ONE) hard drive. Remember to make backups.
If we allocate 200KB per article for pretty strongly compressed images, then it's still 100 million on a single hard drive. Why not archive the whole lot and let God or future historians sort it out?
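Back-of-the-envelope check of those numbers. The 20 TB drive size is my assumption (it's what "10 billion articles at 2KB" implies), and I'm using decimal units throughout:

```python
# Rough capacity check. The 20 TB drive size is an assumption,
# inferred from "10 billion articles at 2KB each", not a quoted spec.
DRIVE_BYTES = 20 * 10**12           # assumed 20 TB drive, decimal units
TEXT_ARTICLE_BYTES = 2 * 10**3      # 2KB of text per article
IMAGE_ARTICLE_BYTES = 200 * 10**3   # 200KB per article with compressed images


def articles_per_drive(bytes_per_article, drive_bytes=DRIVE_BYTES):
    """Return how many whole articles of the given size fit on one drive."""
    return drive_bytes // bytes_per_article


text_only = articles_per_drive(TEXT_ARTICLE_BYTES)
with_images = articles_per_drive(IMAGE_ARTICLE_BYTES)

print(f"{text_only:,} text-only articles per drive")
print(f"{with_images:,} articles with images per drive")
```

Swap in whatever drive size you actually have; the ratio between the two cases stays at 100x either way.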
You only need to start getting picky when video is involved, but on the other hand when the alternative is total obliteration you can crush video down to 1MB per minute and have a tolerable VHS-like experience. And even 8MB Shrek gets the point across.
I think you undershot that by a factor of 60. But when I said "get picky" I meant things like not archiving all of youtube. And for articles specifically, archiving every video article is a lot smaller.
And going by this, restricting to a reasonable view count will cut the space you need by a factor of 10 to 25.
If we round that to a petabyte per year, it's not in the range of a typical personal archive, but it's a reasonable thing to picture several big libraries doing.
(Though a single person could handle that much if they really wanted to, spending $10 a day on data tapes.)
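For what it's worth, here is the tape price that the $10 a day figure implies. This is only the arithmetic from the numbers above (a petabyte per year, decimal units, 365 days), not a quote of actual tape prices:

```python
# Implied tape budget, derived only from the figures in this thread.
# Assumptions: 1 PB = 1000 TB (decimal), 365 days per year.
ARCHIVE_TB_PER_YEAR = 1000   # one petabyte per year
DOLLARS_PER_DAY = 10         # daily spend on data tapes


def implied_price_per_tb(dollars_per_day, tb_per_year, days_per_year=365):
    """Return the per-TB media price at which the daily budget covers the yearly volume."""
    yearly_budget = dollars_per_day * days_per_year
    return yearly_budget / tb_per_year


yearly_budget = DOLLARS_PER_DAY * 365
price_per_tb = implied_price_per_tb(DOLLARS_PER_DAY, ARCHIVE_TB_PER_YEAR)

print(f"Yearly budget: ${yearly_budget}")
print(f"Break-even media price: ${price_per_tb:.2f} per TB")
```

So the claim holds as long as tape media costs no more than that per TB, ignoring the drive itself and any redundancy.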
Storage is also massively cheaper than in the past. It's worth archiving everything that was published on the Internet because you never know what it might reveal in a future context.
Maybe the comment thread on a trivial clickbait article contains a post by a future dictator, and it will be a crucial piece of her biography fifty years from now.
Who are we to decide? Many television tapes were wiped or lost last century because people didn't think them important, including a big chunk of early Doctor Who and a lot of the Apollo moon landing coverage.
What are you doing posting this on Hacker News? Don't you know it's majority-owned by Jon Danilovsky, the son of the moon landing movie producer? Oh god they have your IP now, you only have hours.
But to a historian it is all a series of moments. To understand how/why one happened, it is useful to know the ones that came before, and things created for the moment might better describe it than those with a wider purview.
Of course you don't need it all, but deciding what to keep is a complex (and bias-ridden, accidentally or otherwise) problem in its own right.
> Most contemporary journalism is just not worth archiving.
It is impossible to know in the present how valuable something will be to someone in the future. I don't know if any historians have this as their motto, but I like to think they do.
>Most contemporary journalism is just not worth archiving
Considering how much historians love ancient rubbish bins, one man's junk is another man's treasure. Even the junk produced during and before the Trump/Brexit 2016 election/referendum would be massively valuable to a future historian in explaining how those things happened.
Most contemporary journalism isn't even worth reading, let alone archiving.
It's published at virtually no cost compared to what newspapers and magazines used to cost. And with LLMs it's going to get even cheaper as you no longer need people to write the stories. We are spiraling into an era where we will be drowned in content that is all worth next to nothing.
This may in fact be one of the most effective forms of archival available to us. (Handwaving away the preservation of compatibility with future hardware, or the maintenance of an "emulation chain" to get us back from whatever GPUs exist in 2124 to something today's models would run on.)
We could train a model every year and preserve it, then future historians can quiz this model that thinks it’s 2024 and ask it whatever they need to know. It’s fascinating because it will probably “know” the kinds of everyday normal people things that are very hard to glean from only reading old news stories. Things like how the average person feels about their world, or how they feel about current events and why.