Why HStore2/jsonb is the most important patch of PostgreSQL 9.4

jeremyjh · on Feb 23, 2014

Chasing flavor-of-the-month seems like a pretty lost cause. I would like to see the patch in as well, but this seems like a pretty terrible reason for it.

wisty · on Feb 23, 2014

AJAX was flavour of the month in 2004. JSON has been flavour of the month since ~2006.

While that makes it about 10 years younger than Postgres and MySQL, it's still not a flash in the pan.

jeremyjh · on Feb 23, 2014

You misunderstand me. I am not commenting on the nature of the feature, but rather the motivation cited in the SP for holding the release open to get it in. That reason could be summarized as "node.js and mongo!!".

crucialfelix · on Feb 23, 2014

Its not flavor-of-the-month, its flavor-of-the-generation. That's a 4 or 5 year span- not a ship you want to miss. Django and Rails have brought significant Postgres adoption. If widespread OS adoption and support is a goal then it is wise to pay attention to the features and platforms that will drive adoption.

jeremyjh · on Feb 23, 2014

MongoDB is "flavor-of-the-generation" ? If so, then that generation is truly lost.

crucialfelix · on Feb 23, 2014

no, Django and Rails have been a generation. Node.js and Go are potentially the incoming generation (according to the article)

Node has encouraged MongoDB usage because there were nice packages that made working with Mongo very easy. If Postgres supports JSON with powerful features then developers will make npm packages that support it and this will make Postgres very attractive.

ebiester · on Feb 28, 2014

Inexpensively replicating key-value databases for suitable problems is the "flavor-of-the-generation."

spikels · on Feb 23, 2014

I disagree. Node and MongoDb have both had very strong growth for several years now across a wide variety of users from individual hackers to the largest companies. While I'm skeptical of MongoDb's long term prospects due to its underlying technology and tight sponsorship I can understand the appeal of it's ease of use, especially for development. Node and JavaScript, on the other hand, seem clearly here to stay. While far from perfect are becoming deeply embedded in many key areas.

And even if you don't like MongoDb or Node, JSON is clearly one of the most popular data formats - better support is surely a good thing.

jeffdavis · on Feb 24, 2014

I agree in the sense that nobody should re-architect their product around a trend.

But this is different because the postgres architecture is very much capable of great support here, and getting even better. Postgres has the best support for adding in new types, with amazing capabilities around indexing.

I'm not just talking about different kinds of BTrees, I mean real non-scalar indexing strategies like inverted indexes (used for full text, but also great for JSON), similarity matching indexes, unbalanced indexes, spatial indexes, etc.

The author merely wants to use this infrastructure as intended for the JSON use case. Easier said than done, of course, but no architectural compromises need to be made in the process of providing this support.

rosser · on Feb 23, 2014

Because following the latest fashions is clearly the PostgreSQL Core Committee's raison d'être...

jmnicolas · on Feb 23, 2014

PostgreSQL is the best open source relational database and I hope it remains that way.

While I don't mind seeing some non relational additions like JSON, it shouldn't be detrimental to Postgres relational core.

It's not a pissing contest : if MongoDB is more popular than Postgres for non relational (NoSql) data so what ?

It's not like Postgres is a commercial product and need wider adoption to keep the parent company afloat.

masklinn · on Feb 23, 2014

> While I don't mind seeing some non relational additions like JSON, it shouldn't be detrimental to Postgres relational core.

The patchset in question is an improvement of hstore[0] (typed hstore, nested hstore) and unification of json and hstore (by using the binary hstore format for both). There is no more detriment to the relational core than when hstore or XML were originally added. Also, next-GIN[1] by the same people[2], not sure it's part of the patch mentioned.

[0] [PDF warning] http://www.sai.msu.su/~megera/postgres/talks/hstore-dublin-2...

[1] [PDF warning] http://www.sai.msu.su/~megera/postgres/talks/Next%20generati...

[2] Who previously contributed GiST and GIN, FTS, ltree or intarray

jeltz · on Feb 23, 2014

The GIN improvements were submitted as a set of separate patches and several of them have been merged into 9.4.

masklinn · on Feb 23, 2014

Good to know, thanks a lot for the info.

jeltz · on Feb 23, 2014

I know that the GIN compression patch has been committed and some of the patch for speeding up searches by skipping posting list entires, but I have not seen anything about the third part, speeding up FTS by adding ordering info to the indexes.

https://commitfest.postgresql.org/action/patch_view?id=1357 https://commitfest.postgresql.org/action/patch_view?id=1382

twic · on Feb 23, 2014

They've still got a .su address! Old school!

einhverfr · on Feb 23, 2014

> PostgreSQL is the best open source relational database and I hope it remains that way.

Yes, and it will. In fact the best post-relational features of PostgreSQL work best when used in a relational way.

> While I don't mind seeing some non relational additions like JSON, it shouldn't be detrimental to Postgres relational core.

As I usually tell people "Good object-relational design is usually good object-oriented design and good relational design, and that makes it really, really hard."

> It's not a pissing contest : if MongoDB is more popular than Postgres for non relational (NoSql) data so what ?

I would agree. The fact is that for the last 10 years, PostgreSQL has been the choice of open source rdbms's in business apps for the last 12 years and I don't see that changing.

> It's not like Postgres is a commercial product and need wider adoption to keep the parent company afloat.

Mindshare is important though.

jeltz · on Feb 23, 2014

Adding to what masklinn said, JSON is just another type with related functions and operators. It uses the already implemented extensible type system, and is no different from any other type including varchar and integer.

einhverfr · on Feb 23, 2014

I am interested in it for another reason.

I am working on a not-yet-released (needs test cases!) CPAN module called PGObject::Type::JSON.[1] This module would let you effectively grab JSON types from the db, serialize and deserialize on the db query so you don't have to worry about it.

The serialization and deserialization is cool, but the question is what you can do with it. With the json functions if they become more functional, we could use JSON as an input to stored procedures, allowing arbitrarily complex data types to be fed into or out of stored procedures easily without having to serialize in tuple/array format.

A single, cannonical, well supported format would be huge in that it would allow you to do things with the database with extraordinary ease that are not trivial to do today. For example, really good json support + composite types as tuple elements (including arrays of composite types) + really good indexing support should give you an ability to approximate the power (for at least some workloads) of full custom type development, without needing to go to C.

[1] https://github.com/einhverfr/PGObject-Type-JSON

dorfsmay · on Feb 23, 2014

I don't tkink people care so much for storing json, rather than "talk" json to the database. A different interface with simple insert and update with json is typically what people are after. A sort of ORM that simplifies inserts and update (and retrieval per primary key even) where data is provided as json would be a much better tool to attract the newer generation of programmers.

And like some of the comments said, PLEASE auto-partitionning!

einhverfr · on Feb 23, 2014

Agreed. How about something like:

"Here is a financial transaction. Please make sure it is balanced and post it."

This is not trivial to do right now. Top-notch JSON support would make it a breeze.

jeltz · on Feb 23, 2014

Just having a JSON type should fix this without the need for a JSON based protocol. Actually the current JSON support in 9.3 should be enough for this. Just call a stored procedure with the JSON object describing the transactions.

einhverfr · on Feb 23, 2014

It's reasonably easy to wrap the SQL involved in calling stored procedures.[1]

The JSON in 9.3 is almost good enough for this. It doesn't handle nested data structures to sql nested complex types properly, and so it won't quite get you there but otherwise agreed.

I.e. to make it work currently you have to slice and dice the JSON in some fun ways first. I find it less painful to just handle the complex type serialization/deserialization on the client side.

[1] See https://github.com/einhverfr/PGObject-Simple-Role

dorfsmay · on Feb 23, 2014

Yes, it is easy, but people using Mongo, CouchDB etc... will argue that not having to do it at all is better, and will use that argument to justify their choice.

Same idea as ORMs... Making any CRUD operation is easy in SQL, why did we need ORMs? Yet, there is a fragment of the population who wouldn't use a database they couldn't talk to through an ORM. Today, there is a segment of programers that won't use a database if they can't PUT({"name": "Big Bird", "address": "123 Sesame street"}), because, you know, that's the modern way to do things.

einhverfr · on Feb 23, 2014

It loses the power of SQL which may also seem to be the modern way to do things ;-)

dorfsmay · on Feb 23, 2014

No. Adding support for a json NoSQL like interface doesn't remove the SQL access. You could let the front end guys do Create Update Retrieve by primary key using the json interface, while the backend analysis guys use SQL through the normal SQL driver.

einhverfr · on Feb 24, 2014

I think you misunderstood my point. The "modern" way seems to be to avoid SQL using either NoSQL or something like an ORM.

Personally I prefer to tackle this from the other side with stored procedures and service locators so my application code is free of SQL (aside from the service locator), and all my SQL is in .sql files.....

rdtsc · on Feb 23, 2014

> This is not trivial to do right now. Top-notch JSON support would make it a breeze

What does it matter if it is JSON or not. In financial transactions you want:

* Maximum fault tolerance * Auditability * Security * Performance * Ability to roll it back with all the cleanly defined states in between * Ability to talk to legacy / standard systems

JSON-ifying the contents of the transaction is nice. But I wouldn't think that is the main pain point in the industry at the moment. It is really the behavior of the systems outside the transaction contents itself that matters.

einhverfr · on Feb 24, 2014

> What does it matter if it is JSON or not.

It makes it easier to pass the complex data regarding a financial transaction into a stored procedure all at once. That's the non-trivial part today.

jeltz · on Feb 23, 2014

As far as I can tell hstore is one of the most popular extensions to PostgreSQL, so people do indeed care about storing non-relational data in their database.

spikels · on Feb 23, 2014

In my opinion hstore is one of the most useful newer features in Postgress and hstore2 is a big step improvement on it.

chrisfarms · on Feb 23, 2014

I'd also add that SPA[1]/client-side applications are also an important consumer of databases with decent JSON support.

The "backend" for these types of applications are often relatively thin API wrappers around a database, with the bulk of application logic on the client-side.

Postgres' JSON improvements over the last couple of years have made it a pretty good choice already for these sorts of things, but the JSON indexing improving would make this a no-brainer.

[1] http://en.wikipedia.org/wiki/Single-page_application

pjmlp · on Feb 23, 2014

Who cares about node.js? I am yet to see any Fortune 500 deployment on our consulting projects.

Following fashion usually doesn't lead to good results.

jeffdavis · on Feb 24, 2014

I think it's valuable to show how an "old, traditional"[1] system like postgres can actually be much more flexible and powerful even working in a fashionable environment.

If nothing else, it's a great exercise to make sure that postgres isn't missing out on a real, fundamental use case. Fashionable technologies don't always have something new to offer, but sometimes they help you see something from a new perspective.

[1] Postgres is very non-traditional in many respects, and I personally believe it's revolutionary; but many people perceive it as more traditional.

spikels · on Feb 23, 2014

Walmart, eBay/PayPal, Yahoo, Amazon/Zappos, LinkedIn, CBS, Time Warner, GM, DowJones, Microsoft....

And those a just the ones I have happened to hear about with major node.js projects.

pjmlp · on Feb 23, 2014

Those are Fortune 50, not Fortune 500. Note the extra digit.

rpedela · on Feb 23, 2014

I am not sure it is a good idea to use Fortune 500 companies as a signal for which technology to use. They are typically pretty slow to change.

pjmlp · on Feb 23, 2014

I just came out of a project using AngularJS with a Spring backend, so they also use new stuff, not just everything that is the flavour of the month.

rpedela · on Feb 23, 2014

Interesting. Here is a blog post about Node.js in the enterprise including Fortune 500.

http://blog.appfog.com/node-js-is-taking-over-the-enterprise...

pjmlp · on Feb 23, 2014

I would consider such companies Fortune 10, or Fortune 50 at most.

justincormack · on Feb 23, 2014

Postgres has never been popular. Why the sudden race for popularity now? MongoDB is replacing MySQL not Postgres. I would be interested to know what proportion of Postgres users use the JSON features, I suspect it is still small, even amongst the Node users. The popularity of Postgres is growing at a manageable rate, not an exploding rate and that seems fine to me.

venus · on Feb 23, 2014

> Postgres has never been popular. [..] MongoDB is replacing MySQL not Postgres

I don't know what you are basing that on, but it's the complete opposite of what I am seeing. Postgres is now the default open source database as far as I can see, especially for Rails. MongoDB enjoyed a brief hype-fuelled day in the sun but is now viewed as a specialised tool whose choice over a more general RDBMS would require considerable justification.

I suspect the question of whether its popularity is "exploding" or not depends on your definition of the word. I have certainly noticed a very marked uptick in the last few years. Not exploding perhaps, but certainly ballooning.

justincormack · on Feb 23, 2014

Thats what I meant by "growing at a manageable rate".

Well maybe Postgres and Mongo are replacing MySQL. I haven't seen Postgres being the default, but who knows, I mostly hang around with people who have been long term postgres users anyway.

venus · on Feb 23, 2014

I suppose my main experience is with rails people, and as of rails 4 postgres is pretty clearly the "default" DB, emphatically displacing MySQL.

The author appears to come from much more of a node.js angle and so probably what he sees is different. I'm probably underestimating the number of node.js projects so the author likely has a good point. Still, I'd raise an eyebrow pretty highly at anyone who considered MongoDB a solid default choice for a general purpose application. Most companies I know who opted for MongoDB regret it, for reasons that have been rehashed here any number of times.

threeseed · on Feb 23, 2014

> Postgres is now the default open source database

Really ? I still see MySQL and MongoDB both being the two most deployed.

venus · on Feb 23, 2014

Yeah, I realised after typing it that there's probably a huge amount of availability bias in my statement. It wouldn't surprise me if Wordpress MySQL deploys outnumber all rails/django postgres deploys combined. And the PHP mainstream isn't moving away anytime soon.

MongoDB, though - I do question that, however I have no figures and it again probably depends on the circles in which you move whether you see a lot of it or not. I don't move in node circles much, so I hardly see it at all.

Mea culpa. That said, in the startup-filled tech incubator in which I work, and in which rails is pretty dominant, postgres is king. Make of that what you will.

threeseed · on Feb 23, 2014

I work in the enterprise space and MongoDB is very popular. It isn't replacing MySQL or PostgreSQL at all (neither of which I've ever seen) but rather Oracle. And it's big in the higher end internet space as well:

http://www.mongodb.org/about/production-deployments/

masklinn · on Feb 23, 2014

> Postgres has never been popular.

Postgres is the "default" database for django installations.

> MongoDB is replacing MySQL not Postgres.

That's part of the point, if user bases are switching databases, that's a chance for postgres.

> I would be interested to know what proportion of Postgres users use the JSON features, I suspect it is still small

For good reasons since it's still young (first introduced in 9.2) and fairly limited (9.3 did add operator and indexation as well as a bunch of functions). I'm in a postgres shop and we aren't quite ready yet to drop support for all postgres versions preceding 9.3 (to my dismay).

raverbashing · on Feb 23, 2014

"Postgres is the "default" database for django installations."

Is it? It is the recommended DB, between oh, MySQL, PostgreSQL, Oracle and Sqlite3

But I've yet to see an actual statistic on that.

With 3rd party plugins you can even use it on Microsoft SQL Server (though of course I don't recommend it unless it's a specific thing)

masklinn · on Feb 23, 2014

> It is the recommended DB

Hence "default". You can use any db you want, but if you follow the documentation or ask which one you ought use, Postgres will be suggested and recommended. Sounds like a default to me.

collyw · on Feb 23, 2014

The settings file generates settings for SQLite, so I would say that is "default".

threeseed · on Feb 23, 2014

Show me where it says that PostgreSQL is the default.

https://docs.djangoproject.com/en/dev/ref/databases/

I've only used Django with SQLite but I would be shocked if it was actively recommending a single database. It would imply that somehow Django doesn't work as well with the others.

masklinn · on Feb 23, 2014

> Show me where it says that PostgreSQL is the default.

FAQ: https://docs.djangoproject.com/en/dev/faq/install/#what-are-...

> you’ll also need a database engine. PostgreSQL is recommended, because we’re PostgreSQL fans

geoDjango (bundled): https://docs.djangoproject.com/en/dev/ref/contrib/gis/instal...

> PostGIS is recommended

the migration document also clearly notes that postgres has the best migration support, although it does not outright recommend it (probably because migration is generally an afterthought and production database has already been decided by then)

collyw · on Feb 23, 2014

South does complain every time a MySQL migration has problems and you get recommended to use Postgres there. Other than that I haven't noticed any preference in the Django documentation in 3 years.

raverbashing · on Feb 23, 2014

Yes, but it complains less with some upgrades (I've seen it happen)

So I think it's also South who's not as good in doing upgrades in MySQL as it is with PostgreSQL

jeltz · on Feb 23, 2014

Not a Django guy but my guess is that it has nothing to do with South and all to do with that PostgreSQL supports transactional DDL while MySQL does not.

raverbashing · on Feb 23, 2014

True, however transactional DDL doesn't seem to be a problem after MySQL 5.5

https://wiki.postgresql.org/wiki/Transactional_DDL_in_Postgr...

masklinn · on Feb 23, 2014

This only demonstrates that rollbacking an insertion after creating a table within a transaction now works. That is, it shows DDL does not completely break a transaction it's performed in.

According to the documentation DDL remains non-transactional in MySQL 5.5[0]

> The CREATE TABLE statement in InnoDB is processed as a single transaction. This means that a ROLLBACK from the user does not undo CREATE TABLE statements the user made during that transaction.

[0] http://dev.mysql.com/doc/refman/5.5/en/implicit-commit.html

einhverfr · on Feb 23, 2014

> Postgres has never been popular.

Not my experience. I have found that PostgreSQL has a remarkably stealthy popularity. I went to a conference a couple years ago and everyone knew PostgreSQL, and even there were more booths advertising Pg services than MySQL services.

rdtsc · on Feb 23, 2014

You are both right and wrong.

Postgres is not popular if we talk enterprise world that has been buying Oracle. Oracle is an immense behemoth in that space with skilled salesmen, and they make lots of money.

I once asked an Oracle salesman what he thought of MySQL and PostgreSQL and he laughed, and basically said they are seen as joke by his customers. Maybe he was brainwashed sure, but I suspect that reflects (reflected? that was 2-4 years ago) the reality in that market.

Now with companies outside the enterprise (startups) and for internal projects, I feel Postgres has been very popular. MySQL effectively become unappealing after its trademark issues. Postgres gained a some of those users. It also gained disillusioned NoSQL people, some didn't RTFM, some were mislead, some didn't understand the problem well enough and so on.

pfraze · on Feb 23, 2014

The node-pg driver by brianc is in good shape; I'm not sure why OP sees it as a major issue. Feel free to improve it, though.

freyr · on Feb 23, 2014

Could someone fill me in about what's lacking with the node pg driver by brianc?

spikels · on Feb 23, 2014

My understanding is that it will need to be updated to handle all the new hstore2 features.

darksaints · on Feb 23, 2014

I think this addition is extremely valuable, but courting node users is a weak argument for it. For me, it is more about the growing importance of semi-structured data, as well as other forms of data that are difficult to model outside of row-tuples and joins. Hstore was a hack...really useful, but a hack nonetheless. Hstore2 is a legitimate kv-store. Now if we could just do the same with a graph-store.

sheff · on Feb 23, 2014

The linked post says : "I think we shouldn't release 9.4 unless it goes in."

I haven't been following the PG developers mailing list that closely recently and the article doesn't go into any details as to why it may be delayed.

From the link that @masklinn posted (Thanks, very interesting!) to a hstore presentation, it looks like the hstore side was developed by November last year, so I'm wondering why it wouldn't be ready for release by this September or whether its just the integration of hstore/JSON side of things that is the hold up.

For me, the new PG feature - although its an outside project - which I am most looking forward to in the 9.4 time frame is a newer JDBC driver for Postgres ( https://github.com/impossibl/pgjdbc-ng ).

masklinn · on Feb 23, 2014

http://postgresql.1045698.n5.nabble.com/jsonb-and-nested-hst... seems to be the relevant threads. Apparently there are issues over wire format, and jsonb being in core (and unless I'm misreading things jsonb is to be a new type separate from json, which is unwelcome news, I didn't know about that although it does make sense from a "binary copy of database files" sense)

sandGorgon · on Feb 23, 2014

I would like to suggest a modification to the author's thought about an ORM. What would be very cool would be a PG extension that wraps a mongodb compatible Api on top of postgresql.

That should completely short circuit the whole value proposition discussion of pg vs mongo.

twic · on Feb 23, 2014

This could be done fairly easily, perhaps more easily, as a standalone program, i think. Something that exposes a MongoDB-compatible API, and translates requests into libpq to go to a PostgreSQL backend.

You could run it either on the PG host, or on the same host as the apps wanting to talk to PG.

brannon · on Feb 23, 2014

My understanding of all of this is beginner level at best, but isn't that more or less what waterline does? https://github.com/balderdashy/waterline-docs

twic · on Feb 24, 2014

Based on a quick look at the documentation, i think the API Waterline exposes to clients is not an imitation of the MongoDB API, which would allow PostgreSQL to be used as a drop-in replacement for MongoDB. Rather, it's its own thing.

Waterline does support both MongoDB and PostgreSQL, though, so you could use it as part of a migration strategy: start with MongoDB, install Waterline backed by MongoDB, migrate your app to use Waterline rather than using MongoDB directly, replace MongoDB with PostgreSQL.

jeltz · on Feb 23, 2014

The guy behind plv8 has already started such a project. I have no idea if it is production ready.

https://github.com/umitanuki/mongres

sandGorgon · on Feb 23, 2014

this is very cool. I wonder what happened to it.

rdtsc · on Feb 23, 2014

Just by observing from the outside, Node.js and MongoDB, don't know what it is about those two platforms, but the communities formed around them seem to attract a large number of people I wouldn't want to interact with. There is a lot of immaturity and drama. Maybe it is the unreasonable and over-hyped marketing, not sure, can't quite put my figure on it.

Go is also a new language and framework that is popular but it has a different feel, a better one. Python so far, has one of the largest and most pleasant community (given its size) I've seen.

Json/KV friendly features are welcome, but perhaps looking to impress Node.js and MongoDB users is just barking at the wrong tree.

collyw · on Feb 23, 2014

OK, I now know why it is the most important patch. Is there anywhere I can read what it is what it does exactly?

spikels · on Feb 23, 2014

Maybe try searching for "Postgres 9.4 hstore". Just a thought but I'm to lazy to try. ;)