
Very happy to see industry discover the power of graphs and especially, a triple-based representation (cf. RDF [0]; subjects are “subjects”, relationships are “predicates”, and objects are “objects”).

Now, a genuine question: why try to shoehorn a freeform graph (freeform because the list of relationships is not hardcoded) into a relational DB instead of using a graph DBMS like Neo4j, Apache Jena (Fuseki), etc.? From looking at the source code briefly [1], I didn’t see any extreme SQL optimizations. This indicates to me that Warrant would either support a very limited set of query types, or be very slow on quite a few of them. Also see the “billion triple challenge” in academia around this.
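To make the concern concrete, here is a minimal sketch of what storing triples in a relational table and traversing them tends to look like. The schema, the sample rows, and the `reachable` helper are all made up for illustration and are not taken from Warrant's code; the point is only that multi-hop traversal in SQL usually ends up as a recursive CTE.

```python
import sqlite3

# Hypothetical schema: one row per (subject, predicate, object) triple.
# This is an illustration, not Warrant's actual table layout.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.execute("CREATE INDEX idx_spo ON triples (subject, predicate, object)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("user:alice", "member", "group:eng"),
        ("group:eng", "member", "group:staff"),
        ("group:staff", "viewer", "doc:handbook"),
        ("user:bob", "viewer", "doc:handbook"),
    ],
)


def reachable(conn, start):
    """Return every node reachable from `start` by following any predicate."""
    rows = conn.execute(
        """
        WITH RECURSIVE reach(node) AS (
            SELECT ?
            UNION  -- UNION (not UNION ALL) deduplicates, so cycles terminate
            SELECT t.object FROM triples t JOIN reach r ON t.subject = r.node
        )
        SELECT node FROM reach WHERE node <> ?
        """,
        (start, start),
    ).fetchall()
    return sorted(r[0] for r in rows)


result = reachable(conn, "user:alice")
print(result)
```

Every hop is another self-join on the same table, which is why I would expect this kind of query to get slow on large or deep graphs unless the SQL is tuned heavily.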

Good luck with your startup!

[0]: https://www.w3.org/TR/rdf11-primer/

[1]: https://github.com/warrant-dev/warrant/tree/main/pkg/authz/o...




Is Neo4j a good option? I've not heard great things about it performance-wise, though this was some years ago when tinkerpop/gremlin was starting to make news in my circles, and we were operating on extremely dense graphs.


I have experience with Neo4j as a consumer of the database, but as part of a project where someone else wrote the queries.

I hate it. It's extremely expensive. It's slow. Very slow. It only recently had multiple databases per instance. It doesn't support per database encryption. Did I mention it's slow?

We also looked at the ongdb effort, but that went offline all of a sudden due to licensing issues. Now it's back but they reset (?) the version number. Confusing. Also, that one is built on version 3-ish. So no multi-db. While you can spin up multiple instances (it's free?), it's still Java, i.e. slow and eats memory.


The only thing I like in Neo4j is Cypher – it's powerful and intuitive. I don't use Neo4j because of two reasons:

1) It has no support for subgraph queries. In other words, you can't run a query on a graph and have the query result be a graph too. Instead, you will get a tabular result set. In SPARQL-based systems, you can run a 'CONSTRUCT' query. Very useful if you want to process the results by other parts of the code that also expect a graph (composability). See [1] and [2] if you want to take SPARQL for a spin.

2) It has no support for a standard graph data format. Their blog had some posts about using CSVs but they are a tabular data format, which means that some acrobatics are needed to extract a graph from CSV (actually, two CSVs) and none of this would be standard. There have also been some attempts to fit a graph peg into a tree-shaped hole (JSON, XML). To my knowledge, RDF is the only widely used standard to actually represent graphs. Unfortunately, there is a lot of confusion around RDF because (a) RDF is actually just a model and there are multiple file formats – I recommend Turtle, and (b) RDF has a semantic web heritage – forget semantic web and just use a graph data format.
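To illustrate point 1, here is a toy sketch of the difference between a tabular result and a graph result. It is plain Python, not SPARQL or Cypher: the graph is just a set of (subject, predicate, object) tuples, and `select_pairs` and `construct` are hypothetical helpers standing in for a SELECT-style and a CONSTRUCT-style query.

```python
# A graph is modeled here as a set of (subject, predicate, object) tuples.
# The data and both helper functions are made up for illustration.
graph = {
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "worksAt", "acme"),
    ("carol", "worksAt", "acme"),
}


def select_pairs(g, predicate):
    """SELECT-style: returns rows (a table of bindings), not a graph."""
    return sorted((s, o) for s, p, o in g if p == predicate)


def construct(g, predicate):
    """CONSTRUCT-style: returns a subgraph, which is again a set of triples."""
    return {(s, p, o) for s, p, o in g if p == predicate}


rows = select_pairs(graph, "knows")      # a table: list of (subject, object)
knows_graph = construct(graph, "knows")  # a graph: set of triples

# Because the CONSTRUCT-style result is a graph, any function that expects
# a graph can consume it directly, including the same query functions.
chained = select_pairs(knows_graph, "knows")
empty = construct(knows_graph, "worksAt")
```

The tabular `rows` would have to be converted back into triples before any graph-expecting code could use them, whereas `knows_graph` can be passed along as is. That is the composability I mean.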

But I know that industry is most familiar with Neo4j, that's why I mentioned it. To my knowledge, Stardog is one of the most advanced and performant systems (with on-prem deployment) but is very expensive. Amazon Neptune and Azure Cosmos are cloud-only, which is a hindrance for many projects. Bottom line is that graph DBMSs have a long way to go and more interest from the community is needed to motivate more dev effort.

P.S. For dense graphs, a graph DBMS may not be the best solution. Graph DBMSs also lose their appeal if your queries are not traversal-heavy.

[1]: https://data.nobelprize.org/sparql

[2]: https://query.wikidata.org/


I have recently gone through a 3.x to 4.x upgrade as an ops person for a ~1TB database.

My takeaway: https://www.youtube.com/watch?v=JNC1CpJQxzg

The engineering quality along with documentation left a pretty bad taste.

Though sometimes the fact that some aspects were really primitive was helpful for getting out of trouble.


Graph databases excel when you need maximum flexibility, that is, when the shape of the graph is constantly changing as new, novel, and unexpected data is added. As soon as you put it behind an application it becomes static and you end up paying a huge price for flexibility that will not be needed or used.


Same reason why vector databases are a thing. When performance matters, you go purpose-built.



