If even the semantic web people are declaring victory based on a post title and ...

jll29 · on Aug 21, 2024

The blog post does not address why the Semantic Web failed:

1. Trust: How should one know that any data available marked up according to Semantic Web principles can be trusted? This is an even more pressing question when the data is free. Sir Berners-Lee (AKA "TimBL") designed the Semantic Web in a way that makes "trust" a component, when in truth it is an emergent relation between a well-designed system and its users (my own definition).

2. Lack of Incentives: There is no way to get paid for uploading content that is financially very valuable. I know many financial companies that would like to offer their data in a "Semantic Web" form, but they cannot, because they would not get compensated, and their existence depends on selling that data; some even use Semantic Web standards for internal-only sharing.

3. A lot of SW stuff is either boilerplate or re-discovered formal logic from the 1970s. I read lots of papers that propose some "ontology" but no application that needs it.

oneeyedpigeon · on Aug 21, 2024

> title (redundant, as HTML already has a tag for that)

Note that `title` isn't one of the properties that BlogPosting supports. It supports `headline`, which may well be different from the `<title/>`. It's probably analogous to the page's `<h1/>`, but more reliable.

lynx23 · on Aug 21, 2024

I had pretty much the same reaction while reading the article. "BlogPosting" isn't particularly informative. The rest of the metadata looked like it could/should be put in <meta> tags, done.

A very bad example if the intention was to demonstrate how cool and useful semweb is :-)

oneeyedpigeon · on Aug 21, 2024

The schema.org data is much more rich than meta tags, though. Using the latter, an author is just a string of text containing who-knows-what. The former lets you specify a name, email address, and url. And that's just for the Person type—you can specify an Organization too.
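
To make the difference concrete, here is a minimal Python sketch, assuming a hypothetical page that embeds a `BlogPosting` as JSON-LD. The name, email address, and URLs are made-up placeholders, and `describe_author` is a helper invented for this illustration; only the property names (`headline`, `author`, `name`, `email`, `url`) come from schema.org.

```python
import json

# Hypothetical JSON-LD, as it might appear inside a
# <script type="application/ld+json"> element. All values are placeholders.
JSON_LD = """
{
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  "headline": "An example post",
  "author": {
    "@type": "Person",
    "name": "Jane Doe",
    "email": "jane@example.com",
    "url": "https://example.com/jane"
  }
}
"""


def describe_author(doc: dict) -> dict:
    """Return the author's structured fields, tolerating a bare-string author."""
    author = doc.get("author", {})
    if isinstance(author, str):
        # This is all a plain <meta name="author"> value would give us.
        return {"name": author}
    return {key: author[key] for key in ("name", "email", "url") if key in author}


post = json.loads(JSON_LD)
author = describe_author(post)
print(post["headline"])
print(author)
```

With a meta tag, a consumer would be left parsing the single string itself; here each field is addressable by name.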

tsimionescu · on Aug 21, 2024

That's still just tangential metadata. The point of a semantic web would be to annotate the semantic content of text. The vision was always that you can run a query like, say, "physics:particles: proton-mass", over the entire web, and it would retrieve parts of web pages that talk about the proton mass.

rakoo · on Aug 21, 2024

Which was already possible with RDF. It is hard to see JSON-LD as anything other than "RDF but in JSON because we don't like XML".
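
A rough Python sketch of what "RDF but in JSON" means in practice: a nested JSON-LD-like document flattens into subject/predicate/object triples, which can then be queried by pattern. This is a toy under stated assumptions, not the real JSON-LD processing algorithm (it skips `@context` expansion entirely), and `to_triples`, `match`, and the example URL are all hypothetical names made up for illustration.

```python
from itertools import count


def to_triples(node, triples, ids):
    """Flatten a nested JSON-LD-like dict into (subject, predicate, object) tuples.

    Nodes without an "@id" get a generated blank-node label such as "_:b0".
    Returns the subject identifier of the node.
    """
    subject = node.get("@id") or f"_:b{next(ids)}"
    for key, value in node.items():
        if key in ("@id", "@context"):
            continue  # toy simplification: the context is ignored, not expanded
        predicate = "rdf:type" if key == "@type" else key
        values = value if isinstance(value, list) else [value]
        for item in values:
            if isinstance(item, dict):
                # Nested objects become their own subjects, linked by reference.
                item = to_triples(item, triples, ids)
            triples.append((subject, predicate, item))
    return subject


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


doc = {
    "@context": "https://schema.org",
    "@id": "https://example.com/posts/1",  # placeholder URL
    "@type": "BlogPosting",
    "headline": "An example post",
    "author": {"@type": "Person", "name": "Jane Doe"},
}

triples = []
to_triples(doc, triples, count())

# "Who wrote this post?" as two pattern lookups over the triples.
authors = match(triples, p="author")
names = match(triples, s=authors[0][2], p="name")
print(names)
```

The query side is the part the original vision cared about: once everything is triples, the same `match`-style lookup would work across documents from different sites, at least in principle.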

jerf · on Aug 21, 2024

Yeah, this is hiking the original Semantic Web goal post over the horizon, across the ocean, up a mountain, and cutting it down to a little stump downhill in front of the kicker compared to the original claims. "It's going to change the world! Everything will be contained in RDF files that anyone can trivially embed and anyone can run queries against the Knowledge Graph to determine anything they want!"

"We've achieved victory! After over 25 years, if you want to know who wrote a blog post, you can get it from a few sites this way!"

I'd call it damning with faint success, except it really isn't even success. Relative to the promises of "Semantic Web" it's simply a failure. And it's not like Semantic Web was overpromised a bit, but there were good ideas there and the reality is perhaps more prosaic but also useful. No, it's just useless. It failed, and LLMs will be the complete death of it.

The "Semantic Web" is not the idea that the web contains "semantics" and someday we'll have access to them. That the web has information on it is not the solution statement, it's the problem statement. The semantic web is the idea that all this information on the web will be organized, by the owners of the information, voluntarily, and correctly, into a big cross-site Knowledge Graph that can be queried by anybody. To the point that visiting Wikipedia behind the scenes would not be a big chunk of formatted text, but a download of "facts" embedded in tuples in RDF and the screen you read as a human a rendered result of that, where Wikipedia doesn't just use self-hosted data but could grab "the Knowledge Graph" and directly embed other RDF information from the US government or companies or universities. Compare this dream to reality and you can see it doesn't even resemble reality.

Nobody was sitting around twenty years ago going "oh, wow, if we really work at this for 20 years some people might annotate their web blogs with their author and people might be able to write bespoke code to query it, sometimes, if we achieve this it will have all been worth it". The idea is precisely that such an act would be so mundane as to not be something you would think of calling out, just as I don't wax poetic about the <b> tag in HTML being something that changes the world every day. That it would not be something "possible" but that it would be something your browser is automatically doing behind the scenes, along with the other vast amount of RDF-driven stuff it is constantly doing for you all the time. The very fact that someone thinks something so trivial is worth calling out is proof that the idea has utterly failed.

tsimionescu · on Aug 21, 2024

Beautifully said.

I'll also add that I wouldn't even call what he's showing "semantic web", even in this limited form. I would bet that most of the people who add that metadata to their pages view it instead as "implementing the nice sharing link API". The fact that Facebook, Twitter and others decided to converge on JSON-LD with a schema.org schema as the API is mostly an accident of history, rather than someone mining the Knowledge Graph for useful info.