More

tomiko_nakamura · on March 2, 2015

I don't think XML was all that widely used, and the implementation looked more "looks interesting, let's develop something and people will use that".

With JSONB it's more like "people are already using JSON data type (text + validations) a lot, and want something better, and we do have hstore, and we're working on hstore 2.0, so let's use the ideas for json". And that's how JSONB was born.

My estimate is there are already more people using JSONB than XML (talking about the PostgreSQL data types), so the momentum is very different.

tomiko_nakamura · on March 2, 2015

I think that if you're using JSONB for data that can be easily stored in a proper tabular form (aka relation), you're doing it wrong. No doubt there will be people using it this way, though, and maybe there are cases where it really makes sense.

What JSONB makes much easier is storing data that are not a good fit for relational paradigm. For example if you don't know the exact structure (e.g. if you don't know what keys to expect), you can't reasonably decompose that into relational schema.

This is the case with mail headers, for example - it's highly variable, every client/MTA can define their own headers, and so on. You could decompose that into EAV model (yuck), use tables with generic PARAM_1, ..., PARAM_N columns (yuck!), or you can just stuff that into a JSONB column.

tomiko_nakamura · on March 2, 2015

People are working on other storage types for PostgreSQL. That's all I can say at this moment.

ddorian43 · on March 2, 2015

Strange that they are working in secrecy. I would guess something like[1] would be better.

[1]: http://www.postgresql.org/message-id/CA+U5nM+AFftDf-8UaMoe7Z...

ddorian43 · on March 6, 2015

Maybe you meant about this ?

http://www.datanami.com/2015/03/04/fujitsu-adding-column-ori...

tomiko_nakamura · on March 2, 2015

PostgreSQL does not have in-place update in the first place, on every UPDATE it does a DELETE+INSERT. So "partial update" does not make much sense, because you'll create a new copy of the row anyway.

NickNameNick · on March 2, 2015

That's not quite how PostgreSQL's Multi-value-Concurrency system works.

tomiko_nakamura · on March 2, 2015

MVCC means "Multiversion Concurrency Control" and yes, that's pretty much exactly how it works (creating copies of the rows, so that different snapshots see different rows).

In practice it's more complicated of course. A nice intro to MVCC is available here: http://momjian.us/main/writings/pgsql/mvcc.pdf (and in the Bruce's other talks http://momjian.us/main/presentations/internals.html).

Or maybe you mean something else by Multi-value-Concurrency. In that case please elaborate.

tomiko_nakamura · on March 2, 2015

I really wonder what you mean by UPSERTS. Because the traditional meaning for UPSERT is "UPDATE if exists, INSERT otherwise", aka MERGE. And you're right PostgreSQL is missing that, but hopefully we'll get something like that in PostgreSQL 9.5 at the end of 2015. There are ways to do that in PostgreSQL using writable CTEs, in some cases.

But that's completely unrelated to JSONB, so maybe you mean something else?

Nitpick: it's JSONB, not JSOB.

tomiko_nakamura · on March 2, 2015

It doesn't have MySQL-like pluggable storage, and frankly I'm thankful for that. How many of the MySQL storage engines actually work, including transactions for queries working with tables using different storage engines and such? Or support all the features like fulltext, and so on? Whenever I want to get scared at night, I either watch the first "Alien" movie or read "MySQL restrictions and limitations". So no, thank you very much.

OTOH, PostgreSQL code base is one of the cleanest and well structured code bases I've seen, and adding a new storage engine is not all that complicated, assuming you know what you're doing and have time to do that properly. If you just want something simple, managed outside but accessed by SQL, use FDW.

tomiko_nakamura · on March 2, 2015

I'm not saying "plan locking" is not a useful feature in some cases, and maybe it would really solve this particular issue. But my experience is that when people say "I need it" in most cases it's "I've been working with X for a long time, and that's how you solve problems looking like this there." (where X is usually Oracle or DB2).

If you really need nightly vacuuming (and not just analyze), and you accidentally turn it off, you have bigger problems than planning.

This should be handled by autovacuum in modern versions, so either you're running a very old version (before 8.3 when autovacuum was enabled by default).

So either you're running a very old version (i.e. older than ~5 years - sorry, not much can be done here except for upgrading).

Or you don't have autovacuum tuned properly, or you really have strange workload. In those cases you should really talk to people on pgsql-performance.

x0x0 · on March 2, 2015

This experience was 8 years ago; autovacuum is nice.

Having been bitten two or three times by a fast query suddenly getting very slow because a heuristic in the query planner flipped over (including once in the last couple of years), I'd simply like to fix a performant enough query plan and not worry about it.

Just to make my opinion clear, pg along with linux and vim are the best pieces of software I use.

tomiko_nakamura · on March 1, 2015

Expression indexes are nice, and you can still do that. It however requires you to know what you'll be looking for (and create index on that particular expression).

The stuff I find really awesome is indexing for arbitrary expressions:

CREATE INDEX jsonb_index ON jsonb_table USING GIN (jsonb_column);

And then you can where whatever key/value you want, and it'll use the index.

tomiko_nakamura · on March 1, 2015

Why? What mistake? I mean, many people won't use that, but for many it's very handy. And the features delivered with JSONB (e.g. the indexing) is way better than what was available with XML.

tomiko_nakamura · on March 1, 2015

JSONB had issue with compression, because the compression is decided on the fly by checking if first part of the serialized data, and in the original format the first part was incompressible (thus no compression was used). But this was fixed in beta3, and so 9.4.0 works fine.

Regarding size, it's more complicated. For example after loading the 'delicious-1250k' dataset from

http://randomwalker.info/data/delicious/

I get 1322MB for JSON and 1415MB for JSONB, so yes, in this case it's slightly larger.

FWIW, both JSON and JSONB use TOAST, because that's how PostgreSQL treats works around the 8kB page size (or whatever page size you use).