
> The point here is that in semantic web there're supposed to be lots and lots of different ontologies/schemas by design, often describing the same data.

This is incredibly problematic for many reasons, not the least of which is the inevitable promulgation of bad data/schemas. I remember one ontology for scientific instruments where I, a former chemist, identified multiple catastrophically incorrect classifications (I forget the details, but something like classifying NMR as a kind of chromatography. Clear indicators the OWL author didn't know the domain).

The only thing worse than a bad schema is multiple bad schemas of varying badness, and not knowing which to pick. Especially if there are disjoint aspects of each which are (in)correct.

There may have been advancements in the few years since I was in the space, but as of then, any kind of probabilistic/doxastic ontology was unviable.



That's a valid point, but I'm not sure the following problem has a technical solution:

> Clear indicators the OWL author didn't know the domain


It doesn't, which is exactly the problem. Ontologies inevitably have mistakes, and when your reasoning is based on these "strong" graph links, even small mistakes can cascade into absolute garbage. Plus, manual taxonomic classification is super time-consuming (ergo expensive). It also assumes there is very little nebulosity, i.e. that you even have a solid grasp of what is correct and incorrect. Then you have perspectives: there is no monopoly on truth.

It's just not a good model of the world. Soft features and belief-based links are a far better way to describe observations.

Basically, every edge needs a weight, ideally a log-likelihood ratio. 0 means "I have no idea whether this relation is true or false", positive means the edge is more likely true than false, and negative means the opposite.
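A minimal sketch of what that weighting could look like (all names and probabilities here are made up for illustration, not from any real ontology):

```python
import math

def log_odds(p: float) -> float:
    """Turn a probability that an edge holds into a log-likelihood ratio.
    p = 0.5 maps to 0 ("no idea"); p > 0.5 is positive; p < 0.5 negative."""
    return math.log(p / (1.0 - p))

# Toy weighted edges: (subject, relation, object) -> log-odds weight
edges = {
    ("NMR", "is_a", "chromatography"): log_odds(0.05),  # almost surely false
    ("NMR", "is_a", "spectroscopy"):   log_odds(0.99),  # almost surely true
    ("HPLC", "is_a", "chromatography"): log_odds(0.95),
}

# The bogus classification gets a strongly negative weight instead of
# silently poisoning downstream inference as a hard "true" link would.
assert edges[("NMR", "is_a", "chromatography")] < 0
assert edges[("NMR", "is_a", "spectroscopy")] > 0
```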

Really, the whole graph needs to be learnable. It doesn't really matter if NMR is a chromatographic method. Why do you care what kind of instrument it is? Instead, apply attributes based on behaviors ("it analyses chemicals", "it generates n-dim frequency-domain data").
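A toy sketch of that behavior-based view (attribute names are invented for illustration): instruments are described by sets of observed behaviors, and similarity falls out of the overlap, with no class hierarchy involved.

```python
# Describe instruments by what they do, not by where they sit in a taxonomy.
# All attribute names below are illustrative, not from any real ontology.
instruments = {
    "NMR":  {"analyzes_chemicals", "frequency_domain_output", "uses_magnetic_field"},
    "FTIR": {"analyzes_chemicals", "frequency_domain_output"},
    "HPLC": {"analyzes_chemicals", "separates_mixtures"},
}

def jaccard(a: set, b: set) -> float:
    """Similarity between two instruments as overlap of their behaviors."""
    return len(a & b) / len(a | b)

# NMR comes out more similar to FTIR (another frequency-domain technique)
# than to HPLC, without anyone having to decide what "kind" of thing it is.
assert jaccard(instruments["NMR"], instruments["FTIR"]) > \
       jaccard(instruments["NMR"], instruments["HPLC"])
```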


Understood, thank you.

Yes, that's not solvable with just OWL (though it might help a little) or any other popular reasoner I know of. There are papers, proposals, and experimental implementations for generating probability-based inferences, but nothing one can just take and use; still, there are tons of interesting ideas on how to represent that kind of data in RDF or reason about it.

I think the correct solution in SW context would be to add a custom reasoner to the stack.
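One naive shape such a custom reasoner could take (a toy sketch, not any existing SW implementation; the combination rule and all weights are made up): treat each inference chain's credibility as the sum of its edges' log-odds, so a single strongly negative edge vetoes the whole chain.

```python
import math

# Toy weighted edges; all numbers are invented for illustration.
edges = {
    ("NMR", "is_a", "chromatography"): math.log(0.05 / 0.95),          # bogus link
    ("chromatography", "is_a", "separation_method"): math.log(0.9 / 0.1),
}

def chain_weight(path):
    """Naive combination rule: add log-odds along the path. A strongly
    negative edge drags the whole inference toward 'probably false'."""
    return sum(edges[e] for e in path)

# The classic hard-reasoner failure: NMR is_a chromatography is_a
# separation_method would fire unconditionally. Here the bad first
# edge keeps the chained conclusion below zero.
inferred = chain_weight([
    ("NMR", "is_a", "chromatography"),
    ("chromatography", "is_a", "separation_method"),
])
assert inferred < 0
```

Whether adding log-odds is the right combination rule is itself a modeling choice; the point is only that the reasoner degrades gracefully instead of propagating a wrong hard link.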



