For people here experienced with graph databases, do you typically use the graph db as your primary data store or do you use it in combination with something like postgresql? If you're using both, can you talk about how that works and if it's been successful for you?
I'm curious because I've had a couple situations where I thought using neo4j (or some graph db) would be a natural fit for something I wanted to do, but otherwise I thought most of my other data fit into postgresql just fine. My instinct is that if I'm doing this in a web app then querying from two different databases is going to slow down my responses a lot.
It came down to the fact that neo4j was only useful for parts of our queries, and fitting everything into neo4j was just a hassle when most of our data was relational.
From your post: "The biggest issue was that we had data in the graph, that just didn’t feel right in the graph instead of a relational DB." -- What exactly was the problem other than managing complexity? I read through your posts and I didn't see any mention of the technical aspects of your issues with Neo4j. Was your data just so large that going the relational-table route gave you a better understanding of that complexity?
Actually you can have a graph database in postgres as well! Look into queries using "WITH RECURSIVE", and you can do pretty much anything a graph database could do. For the specific use case I had, there was actually no difference in performance between neo4j and postgres. I really enjoyed using cypher, and it was a pain to translate a query written in a graph-specific query language to a postgres equivalent with "WITH RECURSIVE", but because postgres was already part of the stack I stuck with it.
Depending on the data model there are a few ways to deal with cycles. My project involved a friendship graph, and queries like "find friends of X", or "find people who like X who are within 2 degrees of friendship-separation from Y". These are problems where it was okay to have cycles in the graph, as the traversal depth was hard-limited. You'll have more problems answering questions like "find the cheapest path from A to B", although there is certainly a way to cope with cycles there as well.
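To make the depth-limiting concrete, here's a minimal sketch of that kind of query, assuming a made-up friendships(person_id, friend_id) table and psycopg2 (not the commenter's actual schema):

    # Hypothetical schema: friendships(person_id, friend_id), one row per edge.
    # The depth column caps the recursion, so cycles in the friendship graph
    # can't loop forever.
    import psycopg2

    FRIENDS_WITHIN_2 = """
    WITH RECURSIVE reachable(friend_id, depth) AS (
        SELECT f.friend_id, 1
        FROM friendships f
        WHERE f.person_id = %(start)s
      UNION
        SELECT f.friend_id, r.depth + 1
        FROM friendships f
        JOIN reachable r ON f.person_id = r.friend_id
        WHERE r.depth < 2
    )
    SELECT DISTINCT friend_id FROM reachable WHERE friend_id <> %(start)s;
    """

    def friends_within_two_degrees(conn, person_id):
        with conn.cursor() as cur:
            cur.execute(FRIENDS_WITHIN_2, {"start": person_id})
            return [row[0] for row in cur.fetchall()]

    if __name__ == "__main__":
        conn = psycopg2.connect("dbname=social")  # placeholder connection string
        print(friends_within_two_degrees(conn, 42))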
I've got a lot of background in RDF graph stores. It depends a lot on your usage, but I think for your typical web app, you'd be better off using a Postgres install, and making use of fancier features like WITH RECURSIVE as necessary. Graph stores often miss out on features like guaranteed relational integrity and guaranteed constraints, which I find invaluable for safe application development in the face of concurrent updates.
Graph stores are typically much slower for repetitive data that fits cleanly into a relational model. This isn't to say they're not useful - for more irregular data they're a fantastic fit - it's just that very irregularly structured data isn't the common case.
Of course, you can always use two different stores - much like many sites do with a separate lucene/elasticsearch index for text search - but your graphing needs must be relatively componentised for that to work well.
Curious: What RDF triple stores have you used, and in what kind of application?
I was looking into using Stardog for a metadata repository I was building, but we ended up (probably unwisely) bastardizing Postgres into a bunch of self-join hierarchies.
The ones I've spent the most time with are Jena/TDB, Virtuoso, and 3store, along with a couple of proprietary engines. BigOWLIM is also a strong contender in the space. I've used them in the context of both object storage and semantic web data storage.
My experience is that if you don't need constraints/enforced relational integrity, RDF stores make for really simple/easy object storage. There's definitely a performance tradeoff, though - depends on what you need, really!
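To illustrate the "simple/easy object storage" point (not tied to any particular store above), here's roughly what it looks like with rdflib in Python; the EX namespace and the property names are made up:

    # RDF-as-object-store sketch: every attribute is just another triple,
    # so adding a new "field" needs no schema migration.
    from rdflib import Graph, Namespace, Literal, URIRef

    EX = Namespace("http://example.org/")
    g = Graph()

    user = URIRef(EX["user/1"])
    g.add((user, EX.name, Literal("Alice")))
    g.add((user, EX.email, Literal("alice@example.org")))
    g.add((user, EX.nickname, Literal("al")))   # "new field", no ALTER TABLE

    # SPARQL query over the in-memory graph
    for row in g.query(
        "SELECT ?p ?o WHERE { ?s ?p ?o }",
        initBindings={"s": user},
    ):
        print(row.p, row.o)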
Hmm. I guess what I'm getting at is that I typically start building a web app by writing model objects that get turned into db schemas by the ORM (in Python, this is usually the Django ORM or SQLAlchemy), and the ORM turns attribute access into joins (either eagerly or lazily).
So an ORM that usefully interpreted model subclassing etc., created self-joining tables, and could query the resulting model using WITH RECURSIVE as appropriate would be a real boon.
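I'm not aware of an ORM that does this out of the box, but SQLAlchemy gets you part of the way by hand: a self-referential model plus its recursive cte() support. A rough sketch (the Category model and column names are made up):

    # Self-referential table plus a recursive CTE to fetch all descendants.
    from sqlalchemy import Column, Integer, String, ForeignKey, create_engine, select
    from sqlalchemy.orm import declarative_base, Session

    Base = declarative_base()

    class Category(Base):
        __tablename__ = "category"
        id = Column(Integer, primary_key=True)
        name = Column(String, nullable=False)
        parent_id = Column(Integer, ForeignKey("category.id"), nullable=True)

    def descendants_of(session, root_id):
        # Anchor: the root row; recursive part: join children onto the CTE.
        cte = (
            select(Category.id, Category.name, Category.parent_id)
            .where(Category.id == root_id)
            .cte(name="subtree", recursive=True)
        )
        children = select(Category.id, Category.name, Category.parent_id).join(
            cte, Category.parent_id == cte.c.id
        )
        cte = cte.union_all(children)
        return session.execute(select(cte.c.id, cte.c.name)).all()

    if __name__ == "__main__":
        engine = create_engine("postgresql+psycopg2:///app")  # placeholder DSN
        with Session(engine) as session:
            print(descendants_of(session, root_id=1))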
I don't currently use graph databases in production, but I do have some experience.
I use both, in a similar vein to "using Elasticsearch": the graph could be your primary store, but sometimes it's more pragmatic to have two systems, with a "solid" base store.
This is not to say that it can't be done. What I'm stressing is that larger "changes" are hard to handle - which matters a lot at the start of the process, when you're deciding how to model your data, and less at the end. For instance, changing the node layout (new properties? different types? other constraints?) and mass updates are a bit cumbersome.
Usually I have more than one SQL table (naturally) since the data I've used in graph databases is mix and match (otherwise I'd just use a fixed schema and some relational DB).
-- As for "how that works", for me it's:
routinely update the graph from my base database with queries along the lines of "ID > last ID" (see the sketch below).
This has worked as expected, in terms of what data you get in and which limitations you accept (e.g. timeliness).
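Concretely, that sync loop is roughly this shape; the users table, the :User label, and the high-water-mark handling are all made up for illustration:

    # "ID > last ID" sync: pull new rows from Postgres, push them into Neo4j,
    # remember the high-water mark for the next run.
    import psycopg2
    from neo4j import GraphDatabase

    def sync_new_users(pg_conn, neo4j_driver, last_id):
        with pg_conn.cursor() as cur:
            cur.execute(
                "SELECT id, name FROM users WHERE id > %s ORDER BY id", (last_id,)
            )
            rows = cur.fetchall()

        with neo4j_driver.session() as session:
            for user_id, name in rows:
                session.run(
                    "MERGE (u:User {id: $id}) SET u.name = $name",
                    id=user_id, name=name,
                )

        return rows[-1][0] if rows else last_id  # new high-water mark

    if __name__ == "__main__":
        pg = psycopg2.connect("dbname=app")                      # placeholder
        neo = GraphDatabase.driver("bolt://localhost:7687",
                                   auth=("neo4j", "password"))   # placeholder
        last_id = sync_new_users(pg, neo, last_id=0)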
I'm currently making a shift to running all data in my graph database as I've settled on a model (which edges, which nodes, which properties).
> querying from two different databases is going to slow down my responses
True, but depending on your data (do you know one of the queries beforehand - e.g. is your postgresql query enriching whatever your graph query returns?) you might have success tying (inserting) some of the SQL data into your graph database.
A graph can do what a table can do and a lot more, but that's usually not the whole issue. In practice you need to consider things like speed, volume, scale, consistency, redundancy, computation, ad-hoc vs. planned operations, use of resources (disk, memory, CPU, GPU), etc. And as most NoSQL systems just aren't as mature as their table-based counterparts, you'll also have to factor in your tolerance for issues and general system crankiness. All that being said, some applications just cry out for graphs, particularly apps that involve items linked in pairs. Social apps (people linked by friendships), travel (places linked by flights), communications (people linked by messages), all of these can play hell with an SQL database but are naturals for graph databases.
I agree with the idea that tables are just strict graphs and as such a graph database is usually capable of substituting a relational database. I think many graph DBs lack a sophisticated enough query language to bridge that gap. At Orly (https://github.com/orlyatomics/orly) we're working on a powerful query language, and it's nice to see that Cayley is doing the same.
> querying from two different databases is going to slow down my responses
I think querying 2 different systems tends to be slower, but more importantly you lose transactionality. If you can use a single system that is at least on par with your relational system for your run-of-the-mill data and also gives you a very powerful graph, then that's a big win.
I work at a graph-focused firm, XN Logic, where we use an unhydrated graph to store and analyze the relations, an appropriate store for large volumes of information, and Datomic to store mutations to the graph for history analysis.
I first started with neo4j as a primary data store for our semantic graph but there are some limitations that are forcing us to look for alternatives.
1. Adding edges to a neo4j graph is a painfully slow process. For a large graph with a few million nodes - it'll take days.
2. Scaling neo4j on a cluster is either not possible or a painful process - I've yet to find out which.
However, the greatest advantage that neo4j offers is the ability to query a path. So far, no other graph database that I know of has this ability (including Apache Spark and Giraph).
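For context, the kind of path query meant here is Cypher's shortestPath; from Python (via the official neo4j driver) it looks roughly like this, with the :Person label and :KNOWS relationship type made up:

    # Shortest-path query in Cypher, driven from Python. Labels, relationship
    # type, and property names are illustrative only.
    from neo4j import GraphDatabase

    CYPHER = """
    MATCH (a:Person {name: $source}), (b:Person {name: $target}),
          p = shortestPath((a)-[:KNOWS*..6]-(b))
    RETURN [n IN nodes(p) | n.name] AS path
    """

    def shortest_path(driver, source, target):
        with driver.session() as session:
            record = session.run(CYPHER, {"source": source, "target": target}).single()
            return record["path"] if record else None

    if __name__ == "__main__":
        driver = GraphDatabase.driver("bolt://localhost:7687",
                                      auth=("neo4j", "password"))  # placeholder
        print(shortest_path(driver, "Alice", "Bob"))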
It's quite possible to build a directed graph database as an adjacency list in redis. We tried this and it's super fast and scalable. However, querying is very painful.
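For reference, the adjacency-list approach is roughly the following (the key naming is made up); the pain is that anything beyond a simple breadth-first walk has to be written by hand:

    # Directed graph as Redis sets: one set of successor ids per node.
    from collections import deque
    import redis

    r = redis.Redis()

    def add_edge(src, dst):
        r.sadd(f"adj:{src}", dst)

    def neighbours(node):
        return {m.decode() for m in r.smembers(f"adj:{node}")}

    def reachable_within(start, max_depth):
        # Hand-rolled breadth-first search up to max_depth hops.
        seen, queue = {start}, deque([(start, 0)])
        while queue:
            node, depth = queue.popleft()
            if depth == max_depth:
                continue
            for nxt in neighbours(node):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, depth + 1))
        return seen - {start}

    if __name__ == "__main__":
        add_edge("a", "b")
        add_edge("b", "c")
        print(reachable_within("a", 2))  # {'b', 'c'}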
1. Adding nodes and relationships in Neo4j does not have to be slow. It really depends on how you are loading that data in. Neo4j provides many options for data import and a transactional endpoint over HTTP for batching transactions and decreasing disk-write overhead (a rough sketch follows after this list).
2. The reason Neo4j is the only database that allows you to query a path is the same reason that setting up clustering or sharding is difficult. If your graph is complex then the problem is "How do I split up these subgraphs into shards so that traversals don't have to traverse across shards?" -- Building a giant adjacency list and using that as a traversal index is a clever idea, I must admit. :)
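On point 1, a minimal sketch of batching writes through the transactional HTTP endpoint, assuming a 2.x-era server at the default location; the :LINKED relationship and id property are illustrative:

    # Batching many statements per POST means far fewer commits (and disk
    # flushes) than creating one transaction per edge.
    import requests

    ENDPOINT = "http://localhost:7474/db/data/transaction/commit"

    def create_edges(edges, batch_size=1000):
        for i in range(0, len(edges), batch_size):
            statements = [
                {
                    # {src}/{dst} is the 2.x parameter syntax; newer servers use $src/$dst
                    "statement": "MATCH (a {id: {src}}), (b {id: {dst}}) "
                                 "MERGE (a)-[:LINKED]->(b)",
                    "parameters": {"src": src, "dst": dst},
                }
                for src, dst in edges[i:i + batch_size]
            ]
            resp = requests.post(ENDPOINT, json={"statements": statements})
            resp.raise_for_status()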
As someone else said, it's very much dependent on the database engine. Some are faster than others, some scale better than others - it's about picking what's right for your requirements.
If you query the two databases in parallel then the response time should be roughly equal to the slower of the two database responses (not the sum of them).
But if you use two databases then you have to maintain both of them, and if they are on the same server then sharing the same resources could make them slower than just using one db.
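That is, something along these lines, where query_postgres and query_neo4j are placeholders for your real query functions:

    # Firing both queries concurrently: total latency ~= max of the two,
    # not their sum.
    from concurrent.futures import ThreadPoolExecutor

    def query_postgres(user_id):
        ...  # e.g. fetch the profile row

    def query_neo4j(user_id):
        ...  # e.g. fetch friend-of-friend recommendations

    def load_page_data(user_id):
        with ThreadPoolExecutor(max_workers=2) as pool:
            profile_f = pool.submit(query_postgres, user_id)
            graph_f = pool.submit(query_neo4j, user_id)
            return profile_f.result(), graph_f.result()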
We use an RDF datastore (OpenLink Virtuoso, clustered edition) as our primary datastore. We use it in combination with Apache Solr to provide fulltext search over various resources that we extract and pass through an Indexing pipeline to go from RDF Graph -> Search Document.
It's worth noting that Virtuoso (produced by my employer, available in free Open Source and paid Commercial variants, http://virtuoso.openlinksw.com/features-comparison-matrix/) is a hybrid Relational/Graph/XML/FreeText storage and query engine, which natively supports SQL, SPARQL, XPath, XQuery, and many other open standards. It might satisfy the OP's needs on its own.
Virtuoso's support for open standards makes it easy to use it as a complete solution covering all the bases, or, as in @philjohn's case, to plug-and-play with best-in-breed solutions along any axis where our implementation proves not to serve your needs for any reason. (We do want to know how and why we don't measure up, so we can improve that aspect!)