Very happy to see industry discover the power of graphs and especially, a triple...

zdragnar · on June 25, 2023

Is Neo4j a good option? I haven't heard great things about its performance, though that was some years ago, when TinkerPop/Gremlin was starting to make news in my circles, and we were working with extremely dense graphs.

WirelessGigabit · on June 26, 2023

I have experience with Neo4j as a consumer of the database, on a project where someone else wrote the queries.

I hate it. It's extremely expensive. It's slow. Very slow. It only recently gained support for multiple databases per instance. It doesn't support per-database encryption. Did I mention it's slow?

We also looked at the ONgDB effort, but it went offline all of a sudden due to licensing issues. Now it's back, but they reset (?) the version number. Confusing. Also, that one is built on version 3-ish, so no multi-db. While you can spin up multiple instances (it's free?), it's still Java, i.e. slow and memory-hungry.

smarx007 · on June 26, 2023

The only thing I like in Neo4j is Cypher – it's powerful and intuitive. I don't use Neo4j for two reasons:

1) It has no support for subgraph queries. In other words, you can't run a query on a graph and have the query result be a graph too. Instead, you get a tabular result set. In SPARQL-based systems, you can run a 'CONSTRUCT' query. This is very useful if the results will be processed by other parts of the code that also expect a graph (composability). See [1] and [2] if you want to take SPARQL for a spin.

2) It has no support for a standard graph data format. Their blog had some posts about using CSVs, but CSV is a tabular format, which means some acrobatics are needed to extract a graph from CSV (actually, two CSVs), and none of this is standard. There were also some attempts to fit a graph peg into a tree-shaped hole (JSON, XML). To my knowledge, RDF is the only widely used standard for actually representing graphs. Unfortunately, there is a lot of confusion around RDF because (a) RDF is actually just a model, and there are multiple file formats (I recommend Turtle), and (b) RDF has a semantic web heritage; forget the semantic web and just use it as a graph data format.
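To make the SELECT vs. CONSTRUCT distinction from point 1) concrete, here is a toy, stdlib-only Python sketch. It is not SPARQL and not any real engine's API: the set-of-triples representation, the `?var` convention, the `select`/`construct` helpers and the `ex:` names are all made up for illustration. The only point is the shape of the results: the SELECT-like call returns rows, while the CONSTRUCT-like call returns triples that can be fed straight back into another query.

```python
# Toy illustration only: NOT SPARQL and NOT any real engine's API.
# A graph is a set of (subject, predicate, object) string triples;
# terms starting with "?" are variables (a convention borrowed from SPARQL).


def is_var(term):
    return term.startswith("?")


def match(pattern, triple, binding):
    """Extend `binding` so `pattern` matches `triple`; return None on a clash."""
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if extended.setdefault(p, t) != t:
                return None  # variable already bound to a different term
        elif p != t:
            return None  # constant term does not match
    return extended


def solve(graph, patterns):
    """All bindings satisfying every triple pattern (a naive nested-loop join)."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in graph:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


def select(graph, patterns, variables):
    """SELECT-like: a table, one row (tuple) per solution."""
    return [tuple(b[v] for v in variables) for b in solve(graph, patterns)]


def construct(graph, template, patterns):
    """CONSTRUCT-like: instantiate `template` per solution, yielding a new graph."""
    result = set()
    for binding in solve(graph, patterns):
        for triple in template:
            terms = tuple(binding.get(t) if is_var(t) else t for t in triple)
            if None not in terms:  # skip triples with unbound variables
                result.add(terms)
    return result


# Hypothetical example data.
g = {
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:bob", "ex:worksFor", "ex:acme"),
    ("ex:acme", "ex:locatedIn", "ex:berlin"),
}

# Tabular result: rows of (person, org).
rows = select(g, [("?p", "ex:worksFor", "?o")], ["?p", "?o"])

# Graph result: derive ex:basedIn edges...
derived = construct(
    g,
    [("?p", "ex:basedIn", "?city")],
    [("?p", "ex:worksFor", "?o"), ("?o", "ex:locatedIn", "?city")],
)
# ...which the same machinery can query again (composability).
in_berlin = select(derived, [("?who", "ex:basedIn", "ex:berlin")], ["?who"])
print(sorted(rows))
print(sorted(in_berlin))
```

A real SPARQL engine adds indexes, query planning and proper RDF terms; the nested-loop join here is deliberately naive.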
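For point 2), this is roughly what the CSV acrobatics look like: one file for nodes, one for edges, and a convention you invent for what each column means. The sketch below (Python, stdlib only) turns such a pair into N-Triples, the simplest line-based RDF serialization (N-Triples is a subset of Turtle). The column names, the `http://example.org/` namespace and the `name` predicate are hypothetical, and IDs are assumed to be IRI-safe, so they are not escaped.

```python
import csv
import io

BASE = "http://example.org/"  # hypothetical namespace for this sketch
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value):
    # Assumes `value` is already IRI-safe (no spaces, '<', '>', etc.).
    return f"<{value}>"


def literal(value):
    # Escape the characters N-Triples string literals cannot contain raw.
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def csv_to_ntriples(nodes_csv, edges_csv):
    """Hypothetical layout: nodes(id,label,name) and edges(source,type,target)."""
    lines = []
    for row in csv.DictReader(nodes_csv):
        node = iri(BASE + row["id"])
        lines.append(f"{node} {iri(RDF_TYPE)} {iri(BASE + row['label'])} .")
        lines.append(f"{node} {iri(BASE + 'name')} {literal(row['name'])} .")
    for row in csv.DictReader(edges_csv):
        lines.append(f"{iri(BASE + row['source'])} {iri(BASE + row['type'])} "
                     f"{iri(BASE + row['target'])} .")
    return "\n".join(lines)


nodes = io.StringIO('id,label,name\nalice,Person,Alice\nacme,Company,"Acme, Inc."\n')
edges = io.StringIO("source,type,target\nalice,worksFor,acme\n")
print(csv_to_ntriples(nodes, edges))
```

All the graph-specific meaning lives in `csv_to_ntriples`, i.e. in a private convention; the N-Triples output, by contrast, can be loaded by any RDF tool without knowing that convention.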

But I know that industry is most familiar with Neo4j, which is why I mentioned it. To my knowledge, Stardog is one of the most advanced and performant systems (with on-prem deployment), but it is very expensive. Amazon Neptune and Azure Cosmos DB are cloud-only, which is a hindrance for many projects. The bottom line is that graph DBMSs have a long way to go, and more interest from the community is needed to motivate more development effort.

P.S. For dense graphs, a graph DBMS may not be the best solution. Graph DBMSs also lose their appeal if your queries are not traversal-heavy.
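Roughly what "traversal-heavy" means: the answer depends on hopping from node to neighbour an unknown number of times, such as "everyone reachable within k hops". In SQL, each extra hop is another self-join (or a recursive CTE), whereas a graph engine can typically just follow stored adjacency. A minimal stdlib Python sketch of such a bounded breadth-first traversal over a hypothetical `follows` graph:

```python
from collections import deque


def within_hops(adjacency, start, max_hops):
    """Return {node: distance} for every node reachable from start in <= max_hops edges."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in adjacency.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


# Hypothetical adjacency lists: who follows whom.
follows = {
    "ann": ["bob", "cat"],
    "bob": ["dan"],
    "cat": ["dan", "eve"],
    "dan": ["fay"],
}
print(within_hops(follows, "ann", 2))
```

On a dense graph, the frontier covers most nodes within a hop or two, so there is little left for a specialised engine to save, which is one plausible reading of the P.S.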

[1]: https://data.nobelprize.org/sparql

[2]: https://query.wikidata.org/

jnsaff2 · on June 26, 2023

I recently went through a 3.x to 4.x upgrade, as the ops person, for a ~1 TB database.

My takeaway: https://www.youtube.com/watch?v=JNC1CpJQxzg

The engineering quality, along with the documentation, left a pretty bad taste.

Though sometimes, the fact that some aspects were really primitive actually helped in getting out of trouble.

zcw100 · on June 26, 2023

Graph databases excel when you need maximum flexibility, that is, when the shape of the graph is constantly changing as new, novel, and unexpected data is added. As soon as you put one behind an application, it becomes static, and you end up paying a huge price for flexibility that will never be needed or used.

esafak · on June 26, 2023

Same reason vector databases are a thing. When performance matters, you go purpose-built.