Like every year before, the Postgres team has blessed us with an early Christmas present. And as with every release post before, I'd like to use this opportunity to say thanks to the team for the awesome job they are doing year after year.
It's not just the database itself (and that's awesome in its own right), but it's also all the peripheral stuff: The documentation is seriously amazing and very complete, and the tools that come with the database are really good too (like psql, which I still prefer to the various UIs out there).
Code-wise, I would recommend that anybody have a look at their git repo and the way they write commit messages: They are a pleasure to read and really explain what's going on. If everybody wrote commit messages like this, we'd be in a much better place where code archaeology is concerned.
Patches from the community are always patiently reviewed and, contrary to many other projects, even new contributors are not really required to have thick skin or flame-retardant suits. The only thing required is a lot of patience, as the level of quality required for a patch to go in is very, very high.
Finally, there's #postgresql on Freenode where core developers spend their time patiently helping people in need of support. Some questions could be solved by spending 30 seconds in the (as I said: excellent) manual and some of them point to really obscure issues, but no matter what time it is: Somebody in #postgresql is there to help you.
I think there's no other free software project out there that just gets everything right: Very friendly community, awesome documentation, awesome tools, and of course an awesome product offering to begin with.
Huge thanks to everybody involved.
Also: Huge YAY for jsonb - I have many, many things in mind I can use that for and I have been looking forward to this for a year now.
> we'd be in a much better place where code archaeology is concerned.
This sounds like a great setup for a sci-fi novel. 500 years into the future, the infrastructure their distant ancestors coded has begun to fail. Now Biff Miffington, code-archaeologist, must sift through millions of forgotten messages using a mysterious tool remembered only as "git." Its interface is arcane and the remaining messages broken, tainted by the destructive influence of Mountain Dew and Cheetos. Will he unravel the mystery that's causing Candy Crush Saga MCCXXXI to kill its users?
Edit: On what OP actually said, I'd also like to say that Postgres is an awesome product.
That is, btw, an absolutely fantastic distant-future sci-fi novel. Among the best of all time. (Vinge doesn't consistently produce books that good, but man, that one is great.)
Just bought it because that premise sounds amazing. Amazon has it listed as "Zones of Thought series Book 2". Should I read book 1 before reading this?
Maybe, but I don't think you necessarily need to. I started with A Deepness in the Sky and then read the other two (and then all the other books Vinge has written).
Especially since it is actually a prequel, I don't think the order is that important.
There are things in Deepness that will have more significance if you've read A Fire upon the Deep, but nothing central to the plot. They're both standalone novels, just in the same setting/universe/background.
Testing the waters, I'd played with the idea of a story with basically this setup:
Two new developers start at a company/startup, and are brought in to do a six-month sprint to fix a stalled/broken product after the development team becomes unavailable because reasons. So, they're dropped into a codebase and are trying to pull everything together.
However, as they work through into deeper and deeper parts of the system, the commit messages and comments get more and more cryptic and unsettling, hinting at the reasons for the prior team's dissolution, the business forces that caused that to happen, and maybe something worse/better going on outside.
I like it. It reminds me of something like System Shock (2 is the one I played), where you learn about what happened on this ship before it was overrun by mutants from journals and such. Instead of a book, this may work well as a game of some sort. Perhaps even text-based?
In Vernor Vinge's A Deepness in the Sky, there really are code archaeologists, as it's often simpler to find programs than to write them from scratch.
They believe that the Unix epoch (since extended many times over) dates to the moon landing (approximately correct), since the exact historical explanation has been lost.
When I say "code archeology", I mean the equivalent of "git blame" without assigning blame - just finding out why stuff is the way it is and not who's responsible because that would be totally pointless, especially years after the incident.
Oh yeah, I knew your meaning. Doing blame and bisect to find a deep-seated bug can be a lot of fun... or frustrating and miserable. I tend to write terrible one-word commit messages while I'm developing, but I always do an interactive rebase to clean things up and make a reasonable view of history. We use Phabricator a lot too, which does a great job of letting you annotate code, commits, etc.
Heh, the twist is that he discovers the secret, which is that Candy Crush is really a giant conspiracy to cull the feeble. He decides not to fix anything.
In addition to those peripheral benefits there are also the intangibles. Mainly that worrying about database stability is just not part of my life the way it was five years ago, before I'd fully migrated to PostgreSQL.
Well said, @pilif. I've been using PostgreSQL since the early 6.x days as the back end to web sites. As time went on, I developed an Accounting & ERP software suite in C++ with a thin wrapper around libpq. We've pretty much standardized on PostgreSQL for the things you just said: "friendly community, awesome documentation, awesome tools, and of course an awesome product...."
I literally have a multi-million dollar business & product because of it. And without it, we would be stuck with Oracle or MS-SQL.
And we just recently added full text search within our product using the PostgreSQL full text add-on. My customers absolutely love the feature and they love us because of this.
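For anyone curious what that involves, a minimal sketch of the built-in full text machinery (table and column names are invented for illustration):

    ALTER TABLE products ADD COLUMN search_vec tsvector;

    UPDATE products
       SET search_vec = to_tsvector('english',
             coalesce(name, '') || ' ' || coalesce(description, ''));

    CREATE INDEX ON products USING gin (search_vec);

    -- Match rows against a plain-language query:
    SELECT name
      FROM products
     WHERE search_vec @@ plainto_tsquery('english', 'wireless keyboard');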
+1 for #postgresql help. Recently I had a really weird problem, and after a few hours with forums and manuals I went there to ask for help. They saved me I don't know how many more hours, in which I would almost surely never have found the completely unexpected answer on my own. Thanks!
But more in general, at least in my experience, understanding and optimizing index usage is the most important and most difficult task with PostgreSQL, and improving the documentation about this could really help. Usually indexes work well on their own, but a few times I was really baffled as to why pg wouldn't use an index I had created to optimize some important and slow query, even though using the index (once I found a way to "convince" pg to do so) cut the execution time by a factor of 100 or even 1000.
The easiest improvement that comes to my mind would be to add to the manual a better and more in-depth explanation about how to optimize index usage; this could include a FAQ where you could also include my edge case. This current page could be a starting point: http://www.postgresql.org/docs/9.3/interactive/indexes-exami...
An even better but much more long-term project would be to improve the EXPLAIN ANALYZE command; specifically, it would be great if it could show the different plans considered, making it easy to understand why a specific plan was discarded. Right now, the only way to nudge pg in the right direction is by trial and error. Also, making the explain output easier to understand would help, but I guess that's difficult.
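For others fighting the same battle, the kind of session I end up running looks roughly like this (table, column and query are invented for the example):

    -- See which plan the planner actually chose, with timing and buffer stats:
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT count(*) FROM orders WHERE created_at >= now() - interval '7 days';

    -- To check whether the index really would be faster, discourage the
    -- sequential scan for this session only and compare:
    SET enable_seqscan = off;
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT count(*) FROM orders WHERE created_at >= now() - interval '7 days';
    RESET enable_seqscan;

If the forced plan wins by a wide margin, stale statistics (ANALYZE) or cost settings like random_page_cost are usually the next things to look at.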
Anyway, thanks again for all the effort of the pg contributors!
Since you're planning on doing some conceptual work with JSONB, just a heads up on one gotcha: duplicate properties are not allowed, i.e.:

    {
      "task": "do stuff",
      "task": "do other stuff"
    }

which is sometimes useful when you have front-end data with an N number of entries but it's a form that serializes to an object instead of an array. There are other use cases too.
> [SHOULD], or the adjective "RECOMMENDED", mean that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course.
Not allowing duplicate keys in JSON objects is very close to the exact opposite of a gotcha. Allowing and round-tripping them would be a gotcha.
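To make it concrete, here's a quick psql check of how the two types treat a duplicated key (9.4 behaviour as I understand it):

    -- json stores the input text verbatim, duplicates and all:
    SELECT '{"task": "do stuff", "task": "do other stuff"}'::json;
    --  {"task": "do stuff", "task": "do other stuff"}

    -- jsonb decomposes the object and keeps only the last value per key:
    SELECT '{"task": "do stuff", "task": "do other stuff"}'::jsonb;
    --  {"task": "do other stuff"}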
Just want to say to the PostgreSQL and EnterpriseDB guys that it's always great to see the progress on this. My hope for 9.5/10 is that we will see PLV8 and replication baked into the actual release.
PLV8 is such a natural fit with the new JSON(B) types that it's probably going to become the most used extension with that data type... and imho it's sorely missing from the out-of-the-box experience. I'm glad that they've concentrated on getting the data structure and storage right first. Hopefully we'll see this in vNext.
As to replication, I understand that this is part of EnterpriseDB's business model; just the same, not having the basic replication pieces baked in still leaves it lacking compared to other databases. Even if the graphical tooling were commercial-only, and all the knob-frobbing via config or command line were more complex, having it in the box is a must imho. I actually really like how MongoDB handles their replica sets, and where RethinkDB is going with this as well. Though they aren't primarily transactional SQL databases, it's a must-have feature these days. Replication with automagic failover is a feature that has gone past enterprise-only.
One last piece would be built-in functions similar to the String.prototype.normalize that was added to JavaScript, so that input strings could be normalized more easily for comparison/indexing, though PLV8 support could/would bring this readily.
All the same, thanks for all of your hard work, and I look forward to the future of PostgreSQL.
Yes, replication with auto failover is far away from enterprise-only today. If you pay $5/month you can get a SQL Database in Azure running on 3 nodes with auto failover. One node is updated synchronously and the second one asynchronously. I wouldn't choose a database today unless setting up something like that is obvious and trivial.
JSONB is getting a lot of attention (and deservedly so) but logical decoding is much more exciting to me. Being able to capture postgres transactions to put into a durable log (like Kafka) for downstream processing is a fundamental tool needed to build a unified logging architecture. If you've worked with hadoop you've probably tried to approximate this by hand by taking regular snapshots of your database or something, but this is much, much more sane. Really exciting. Great work postgres team!
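For anyone who wants to poke at it, the SQL-level interface is already enough to experiment with - a sketch using the bundled test_decoding plugin, assuming wal_level = logical and a free replication slot in the config:

    -- Create a logical replication slot with the example output plugin:
    SELECT * FROM pg_create_logical_replication_slot('hn_demo_slot', 'test_decoding');

    -- ... run some ordinary transactions against your tables ...

    -- Peek at the decoded changes without consuming them:
    SELECT * FROM pg_logical_slot_peek_changes('hn_demo_slot', NULL, NULL);

    -- Pull (and consume) the changes; a real pipeline would push these into
    -- something durable like Kafka:
    SELECT * FROM pg_logical_slot_get_changes('hn_demo_slot', NULL, NULL);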
Wow, that's fantastic. My queries are littered with that type of construct. Unfortunately, we are using SQL Server and Microsoft hardly ever makes any developer-friendly enhancements to T-SQL.
True enough... I can't even begin to wrap my head around the XML wrangling functions in T-SQL, I feel like a n00b copying JS methods for jQuery when I go there. That said, it's probably one of the easiest RDBMS servers to administrate for small-medium sized deployments.
Switched over to PostgreSQL for a personal project for the json datatype. Great if you want some mongo-esque document storage without losing out on having a relational database.
That info is for PostgreSQL 9.3 (and 9.2). 9.4 comes with other enhancements if you're using json. jsonb in particular opens up whole new possibilities if you're doing heavier work with json data.
Next year I really need to switch from MSSQL to Postgres. The work that the Postgres team have done in the last 2-3 years is really amazing.
They are also clearly reaping the benefits of some very smart architectural decisions, and that gives me the confidence that they will be able to continue innovating in the coming years.
There is actually a lot of overlap in the feature set. When I moved from MSSQL to a gig that was all Postgres it was fairly smooth to do so, the main differences being OS related. Thus you don't need to switch over entirely to add a useful tool to the toolbox.
MSSQL has some advantages in parallel query execution and data warehousing enhancements - if you can afford the licensing for the latter. The other view on that is you can spin up as many Postgres production instances as you have hardware for without impacting the software budget.
That's close to what I'm doing, except running FreeBSD 10.1 in Hyper-V. It's actually pretty easy to set up since FBSD 10.1 is available for download as a VHD. Just need to specify the VHD image file as the disk to attach when creating a new HV VM.
It's somewhat more fun/challenging to configure network access for the VM. Once that's done it's very handy for developing web servers/apps. With the web server running in the VM, the client/browser on the host points to the VM just like any remote site. PostgreSQL in the VM performs quite well.
IMO this kind of development is much more satisfying in a unix environment than under Windows, and Hyper-V provides a convenient way to get there.
It looks like PostgreSQL is on track to slowly succeed MySQL as the de-facto open source database.
Microsoft tentatively seems to be settling on them as the preferred RDBMS for non-Windows platforms [1]:
> Within ASP.NET 5 our primary focus is on SQL Server, and then PostgreSQL to support the standard Mac/Linux environment.
I use EF+SQL Server and they're very much complementary and provide an excellent developer experience. NHibernate+SQL Server is woeful unless you want to use the loosely-typed Criteria stuff. NH's LINQ provider is terrible and it gets confused at the drop of a hat (call Distinct and then OrderBy? "I'm sorry Dave, I'm afraid I can't do that"). At this point I'm convinced only MS know how to write LINQ providers that won't fall over the moment you try to do something useful with them.
Microsoft writing a LINQ provider for PgSql is a great thing for running .NET code on non-Windows platforms.
> Microsoft writing a LINQ provider for PgSql is a great thing for running .NET code on non-Windows platforms.
I've heard nothing of this kind with respect to Npgsql, which is the current .NET provider for PostgreSQL. Microsoft has never spent a minute on EF support in Npgsql, so that they bother now is a first. The thing is that to support EF in Npgsql (or any other ADO.NET provider), the ADO.NET provider has to contain a command interpreter which interprets the command trees coming from EF's linq provider, and which are then used to create SQL statements. This isn't simple at all, and as the command trees change with EF7, it will be a struggle for MS to get a lot of ADO.NET providers to support EF7 at the start.
Microsoft's only great linq provider is the one in Linq to Sql: it is able to handle a tremendous amount of edge cases. The thing with linq providers is that a general linq provider only gets you so far: a tremendous number of cases are 'special cases' which have to get their own path from the expression tree to specific SQL. E.g.: ctx.A.Select(a=>a.B.Cs);. This gives a set of sets of C instances. To do this, you have to know at the materialization side which C rows belong to which set (as you have to group them by B, which isn't in the projection). Linq to Sql has a specific piece of code for this; it produces a specifically crafted SQL query which contains an extra column so the materializer can know which C rows to group together. EF doesn't: it obtains a big joined soup.
The irony is that EF7's linq provider will be built on Relinq, which is also the base of NHibernate's linq provider, and they didn't re-use the Linq to Sql linq provider, which is kind of odd, considering Linq to Sql's is pretty db-agnostic.
Writing a linq provider isn't simple btw. It took me a full year full time to write the one for LLBLGen Pro.
> At this point I'm convinced only MS know how to write LINQ providers that won't fall over the moment you try to do something useful with them.
I would argue Revenj + Postgres provide a much better developer experience. But as you said, it's not written by Microsoft, so that attitude doesn't help it.
I'm 100% sure Microsoft can't write LINQ provider which actually understands Postgres and can use it to the fullest (as Revenj can).
The problem with Revenj is that it seems to be tied to dsl-platform, which looks like an online service. People would rather have their compiler tools with them. You don't want your code investments to go 'poof' should the online service fail.
I'm really pumped about the update to GIN indexes, and the ability to concurrently refresh materialized views. Both enhancements are huge for the postgres ecosystem, and especially for productionizing postgres databases.
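For reference, a rough sketch of the matview side of that (names invented; the unique index is what makes the concurrent refresh possible):

    CREATE MATERIALIZED VIEW daily_sales AS
      SELECT date_trunc('day', created_at) AS day, sum(amount) AS total
        FROM orders
       GROUP BY 1;

    CREATE UNIQUE INDEX ON daily_sales (day);

    -- New in 9.4: rebuild the view without locking out readers for the duration:
    REFRESH MATERIALIZED VIEW CONCURRENTLY daily_sales;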
Postgres is about to be the new hotness. I mentioned to our hosting provider that we were looking into moving our in-house Oracle and MySQL to Postgres (off Oracle because it's expensive, off MySQL 'cos it's shit) and he said more than a few customers were looking into this precise move.
We're just getting into PG now, and it's just really nice to set up and use. I really wish more web stuff properly supported PG and didn't pretty much require MySQL.
I'm not familiar enough with MongoDB (or Postgres 9.4) to really answer the original question. My guess is that Mongo will still be applicable to certain use-cases, but—like you mention—the majority of users will be those who really don't understand the technologies and their strengths/weaknesses.
Only someone with a poor understanding of the industry would think that. Mongo is killing it in the enterprise, and their partnerships with Teradata et al are only going to cement their position for likely decades to come.
Plus MongoDB's pluggable engine approach will definitely breathe some new life into it.
This is exactly right. I can't believe how snide and ignorant a few of the commenters are (except I can, because it's the same folks with the same FUD every time). MongoDB isn't standing still and it is getting better and better with every release. Not to mention there are hundreds of massive projects running on MongoDB with success.
Something about MongoDB really drives the Postgres community (and certain NoSQL DB fans) nuts, and I'm guessing it's that MongoDB is eating their lunch, growing faster than them, gaining popularity faster, etc. Keep in mind MongoDB is many years younger too and is maturing more quickly now. Plus, developers absolutely love working with it.
It's a tool like any other and still has issues but I'm afraid some here are dancing on its grave well before it has even shown signs of letting up on its growth.
People here have a problem with it because it is a technically poor solution, even among nosql databases. The main problems are:
1. Historically mongodb was distributed with completely unsafe defaults. It was insane to use it with any data you actually cared about. Once you toggle on the safety features most of the vaunted performance goes away.
2. It doesn't actually scale that well despite claims that it does.
The fact that there are better nosql solutions just makes people further annoyed. It's basically the cargo-cult behaviour of mongodb proponents that people don't like.
1. So then the problem they had was they didn't get the performance they expected since the benchmarks were done with different settings to juice the numbers? There is no upside to arguing this one, Mongo loses one way or the other. Can't play them both.
2. I haven't seen anyone say that Mongo scales in an unlimited fashion. You said that. I think what they might be saying is that it doesn't scale well enough to pay for the trade-offs from using it. If you aren't running a system that can be composed of somewhat-interrelated documents, you're gonna have a bad time.
I like Mongo. And until I really learned about it, I got bitten by the default a few times. Personally I use REPLICA_ACKNOWLEDGED when running in a cluster and FSYNC when writing to a single node.
2 is wrong. Mongos shits all over itself after only a few nodes and moderate traffic. Its balancing is incredibly broken and will kill a small cluster very quickly.
"This just in, Mongo is still in the hype cycle in the slow-moving enterprise space! Obviously after having handily crushed the todo list and twitter clone verticals, they want to expand their influence into mega corporations' dev shops who are tired of dealing with shitty Oracle DBAs. Back to you, Chuck."
It might go against the "no transaction" crowd, but seems useful for performance-critical needs. I'm scheduling a bit of testing time with it next week to see if it's something I'd roll out in production (Maria 10 system)
> In 9.4, Logical Decoding supplies a new API for reading, filtering and manipulating the PostgreSQL replication stream. This interface is the foundation for new replication tools, such as Bi-Directional Replication, which supports the creation of multi-master PostgreSQL clusters. Other improvements in the replication system, such as replication slots and time-delayed replicas, improve management and utility of replica servers.
I think the problem is clustering is still very much a duct-tape situation in postgresql with no real clear consensus on how to build out a cluster.
Postgres-XL looks great for scale out, but you need 4 independent types of servers. Even with all those moving parts, it doesn't provide availability. If you want fail-over, you need pacemaker for the data nodes with traditional sync replication, and something like VRRP for your balancer, and something else to failover the coordinator. Several of these pieces can be tricky to set up in a cloud provider.
BDR looks nice, but it looks like there could be lots of gotchas for consistency in there. Maybe it is a magic bullet though... I don't know much about it yet.
Contrast with something like rethinkdb, mysql-galera, cassandra, etc, you start up enough nodes for quorum, tell them about each other, and you're pretty much done. The clients can handle the balancing, or you can use a pooler/balancer.
In my perfect world, I'd install postgresql-awesome-cluster-edition on 3 nodes, add the 3 IPs (or turn on multicast discovery, if my env can support it), and away we go for read scalability and availability. I do this today for mysql-galera, and other than the fact it's mysql, it's awesome. For writes, if you add 4 or more nodes, there should be some sort of shard system like XL has.
That said, postgresql is still clearly the best SQL and even noSQL single node server out there, it's a really great piece of software.
If you honestly believe that all you have to do is stand up a bunch of instances of Mongo/Cassandra/whatever and you instantly get acceptable HA, then you need to read the [Jepsen series](https://aphyr.com/tags/jepsen)
It depends on what you consider "acceptable HA". There are many instances where I'm not trying to protect from a network partition (single data center, monitored batch data loads, etc) and don't have a requirement for that level of tolerance. However, you're right in that it's important to know that nearly every distributed system has edge cases where things might not appear as you thought. Elasticsearch has a section on their Website detailing their resiliency efforts. I wish every company was as transparent about what they're doing on that front so we can all plan and consider expectations better.
I do think (for once) PostgreSQL is addressing its core weakness and by version 10 will likely have horizontal scalability locked down. The new API is a really positive step.
Because multi-master clustering is extremely hard to do when you need transactional semantics. For those that need it, scaling up is by far the easiest path. Most of us will never exceed the growth rate of server capacity over time, so you can just keep going up. HA using stand-bys and failover is well documented, and quite sufficient I might add.
If you don't need transactional semantics and you do need globally distributed multi-master key/value storage, I would not switch from Cassandra to PG, you already have a good solution for that.
If you don't need transactional semantics and you also do not need globally distributed multi-master key/value storage, use whatever you want, it doesn't really matter.
Cassandra is one of the better ones out there, but you have to deal with its data model and weird consistency promises (which however weird you think they are, are weirder)
The correct way to cluster also changes dramatically depending on your use case. Sure there are things like RAC that promise to make it just work, but those don't scale more than a few nodes.
Mongo is kind of the worst in this - it clusters in one weird way, has bad tooling, and subtly destroys your data at scale.
The general philosophy with postgres is to do it right, or not do it. There are ways to do specific kinds of clustering, but all of them (just like mongo, oracle, etc) have a lot of nuances to them.
If you have a natural shard key, use a bunch of schemas and table inheritance, and eat the downtime during re-shards. Check out citus as well. They have their issues, but they can help you hook up what you need.
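For illustration, the inheritance flavour of that looks something like the following (shard ranges and names invented for the example):

    CREATE TABLE events (id bigserial, tenant_id int, payload jsonb);

    -- One child table per shard, distinguished by a CHECK constraint:
    CREATE TABLE events_shard_1 (CHECK (tenant_id BETWEEN 1    AND 1000)) INHERITS (events);
    CREATE TABLE events_shard_2 (CHECK (tenant_id BETWEEN 1001 AND 2000)) INHERITS (events);

    -- Inserts need a trigger (or application-side routing) to land in the
    -- right child; with constraint_exclusion enabled, queries filtering on
    -- tenant_id only touch the children whose CHECKs can match.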
I take it you haven't actually used Cassandra much in the last few years. Its data model is almost identical to a typical relational one and its consistency promises are quite clear:
And I've scaled Cassandra clusters from 1 to 100 nodes in hours with no issues. It really is quite simple. Likewise have had no issues with MongoDB replica sets. It is definitely not "really, really hard".
> postgres is to do it right, or not do it
What a pathetic cop-out. PostgreSQL has been around for decades; they've had plenty of time to have a proven, stable solution implemented.
Cassandra avoids some of the really visible issues by being AP instead of CP. Hbase hits them, but dodges a bit by only having row level consistency. They are solving very different problems.
The lack of vector clocks in Cassandra can lead to some very non-intuitive (possible wrong) behavior - check out their counter implementation for some rage on that. It's pretty well made though, and I think C*, Hbase and Postgres all have great uses (along with Redis, and a lot of others)
Mongo tends to get things subtly wrong in ways that corrupt data, or that don't scale, and it gives up both A and C.
> A rollback reverts write operations on a former primary when the member rejoins its replica set after a failover. A rollback is necessary only if the primary had accepted write operations that the secondaries had not successfully replicated before the primary stepped down. When the primary rejoins the set as a secondary, it reverts, or “rolls back,” its write operations to maintain database consistency with the other members.
Yeah, it's worse than it sounds though. If you write to 2/3 nodes, and the third gets elected, you can roll back majority confirmed writes. I think it did even worse things when near capacity, but those are harder to pin down.
So the "majority" write concern is not safe. However it seems that the number of nodes that must confirm a write can be configured http://docs.mongodb.org/manual/core/replica-set-write-concer...
From what I read there it seems that if you want to get a confirmation from all the nodes and one gets offline you'll be waiting forever (but there is wtimeout). I'm afraid that any db cluster with any technology has similar problems.
Depending on your exact requirements, you can use pg_shard [1] for real-time inserts/update/delete, or CitusDB [2] for scaling out selects, or combine them.
That is entirely wrong. It might suit your use case (having a bunch of monkeys running around fixing a hack job) but I'd rather use a tool which can expand and reduce capacity when I want, which is quite often. It saves me money on cloud hosting to only provision what I need when I need it and be able to expand and contract it at will, easily. Your solution fails at this.
This is not my solution (and I don't know much about monkeys, sorry).
This is a cost-effective way to be always maxed out (expanded, as you say) at a fixed price, and it can scale well beyond the needs of almost every business.
And, as I said, if you happen to be the next Facebook, Uber, Airbnb or whatever, you will acquire the know-how to scale.
Great news! I'd love to move over to this from MongoDB for a project that has high uptime requirements. But while I think the JSON support will really replace it, does PG have a solution for High Availability (like replica sets) in the works?
I'm newer to Postgres so am not sure. Replica Sets are the killer feature for me, more so than just storing JSON documents. I'd appreciate if someone can chime in. I've done some googling but there seem to be multiple strategies for replication.
Every time I look into tools like this they are quite far behind, for example, AlwaysOn in MSSQL. For example, with PG I have to reseed the original master if it comes back online after an outage. As far as I can tell it's not fully automatic and transparent to me as the guy responsible for managing it. With SQL Server and Elasticsearch+ZooKeeper, for example, nodes can go up and down without anyone noticing and without me having to do anything. Is this still a place where PG is behind?
Yes and no. There are differences in the data guarantees that something like Elasticsearch and PostgreSQL give, so it's not really appropriate to compare them directly.
MS SQL does have easier tooling for replication. The setup for postgresql is complex and it doesn't come with out-of-the-box tools to easily manage failover and recovery, as you mention. Progress is being made in making this easier, but it's still mostly in the low-level functionality:
Hey there. I gave a talk at PGConfNYC '14 earlier this year about our solution to HA using open source solutions. The animations and stuff don't really work too well on Slideshare, but it should point you in the right direction:
pgSQL doesn't have anything built in for fencing, failing-over, etc. by default, but by using stuff like Pacemaker, you can get the job done with a little elbow grease.
There are some open-source options, and several closed source options for this... parts are closer to baked in with 9.4, but being realistic, you need to defer to a commercial option.
EnterpriseDB pricing for this isn't too bad (about $7k/cpu-socket/year), which is a lot less than MS-SQL, DB2 or Oracle for most uses... but it's imho a feature that should be in the box.
> Allow views to be automatically updated even if they contain some non-updatable columns
> Allow control over whether INSERTs and UPDATEs can add rows to an auto-updatable view that would not appear in the view. This is controlled with the new CREATE VIEW clause WITH CHECK OPTION.
> Allow security barrier views to be automatically updatable
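A tiny sketch of what that new WITH CHECK OPTION buys you (table and view names invented for illustration):

    CREATE TABLE tickets (id serial PRIMARY KEY, status text NOT NULL);

    CREATE VIEW open_tickets AS
      SELECT id, status FROM tickets WHERE status = 'open'
      WITH CHECK OPTION;

    INSERT INTO open_tickets (status) VALUES ('open');    -- fine
    INSERT INTO open_tickets (status) VALUES ('closed');  -- rejected: the new row
                                                          -- would not be visible
                                                          -- through the view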
What's a good place to suggest a postgresql [json] improvement? My message was intercepted when posting to pg-performance. A major one at the moment is that OFFSET does the select projection on discarded rows (it's common to use OFFSET in paging). Under normal circumstances this isn't a problem, but when doing a json operation such as reading a field with ->>, this causes major performance degradation. Note that, of course, this optimization can only apply to immutable functions. There are several workarounds, but if PostgreSQL wants to win back some nosql heads it should be straightforward.
In addition, updating a json field isn't straightforward; these operations should be supported by first-class built-in functions.
It's getting close, but it's not quite a nosql killer yet if they are targeting people who didn't originally come from an rdbms background.
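In the meantime, one workaround that tends to help is to page on the raw rows first and apply the json extraction only to the rows you actually keep, e.g.:

    SELECT id, doc ->> 'title' AS title
      FROM (SELECT id, doc
              FROM documents          -- hypothetical table with a jsonb column "doc"
             ORDER BY id
            OFFSET 10000 LIMIT 50) AS page;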
Are there any good books coming out that cover Postgres 9.4? I know the docs are okay but I want something with more of a narrative structure as my history with Postgres is spotty. The only one I've found so far is O'Reilly's "PostgreSQL: Up and Running, 2nd Edition" coming out this month but would prefer a personal rec.
I have O'Reilly's "PostgreSQL: Up and Running, 2nd Edition" (you can buy the ebook already) and it's mostly about setting up and administering postgres, with a look at the tools around postgres. It's not a book that goes in depth on how to actually write SQL for postgres (although it has a section on this) or develop applications with postgres. It's more of an ops book. Anyway, I like it because coming from MySQL I was unfamiliar with how to set up and use postgres and this book set me straight.
Thanks to Safari, I'm now several chapters into the book and it's perfect for me - thanks :) It was the ops side of things that was the biggest sticking point for me, although it seems like it'll dig into the JSONB stuff later on too.
percept is winding me up ;-) Cooper Press is my company and I wrote a book called Beginning Ruby many moons ago. I wrote the first edition while learning the details of Ruby myself and I'd consider writing a Postgres book under similar circumstances if only I had the time! ;-)
Is attribute order stable? Obviously, order is not preserved, but if the order changes on subsequent accesses, this causes problems if you ever serve content directly from a jsonb field without sorting the attributes manually.
Why would this cause problems? Order is irrelevant in JSON. "An object is an unordered set of name/value pairs."[0] From a JSON standpoint there is absolutely no difference between {"name": "Patrick", "age": 24} and {"age": 24, "name": "Patrick"}.
Sometimes it matters whether serialization to a string is a deterministic function. If the JSON is the same, you want the output to be the same. A JSON parser won't care, but it's useful to do a string comparison without parsing (for example when diffing output).
But that's not really a correct thing to expect is it? If an object is an unordered set of name/value pairs then multiple serialized versions of the object may indeed be the same object data.
AFAIK the default serializer does not guarantee the order to be stable, but I haven't participated in the development of this feature that closely. So maybe I'm wrong.
Anyway, it should be possible to write a simple serializer on your own - a trivial PL/V8 function should suffice, I guess. And then you can define a CAST using that function (but generally adding casts is a bit dangerous, as it may have unexpected consequences).
In theory, correct. In practice, here are the two use cases which that approach breaks:
1. Cache control using etags. If the content changes by a single byte, even if semantically identical, the etag should change. Hence '{"foo":1,"bar":2}' is not equivalent to '{"bar":2,"foo":1}'. I can see serving such information directly, or embedding it into a larger JSON response.
2. Committed JSON files. This is an anti-pattern, but one I've seen many times. In one case, it was a translation file that was generated in a separate project. When I joined, it had been like that for more than 5 years. It was like that when I left, although I at least monkey patched REXML to emit sorted attributes in XML (also unordered in theory).
So while in practice I don't depend on a specific ordering, I do need the order (whatever it is) to be stable. For all I care, it could be sorted by the cryptographic hash of the keys, so long as it is consistent.
Just to gather what others have said here: JSONB appears to store fields sorted lexicographically for binary search purposes, and emits them in that order as well. This could be undocumented behavior, yet important for the above two use cases.
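A quick check makes that normalization visible (treat the exact ordering as an implementation detail rather than a documented guarantee):

    SELECT '{"name": "Patrick", "age": 24}'::json;
    --  {"name": "Patrick", "age": 24}         (text preserved exactly as entered)

    SELECT '{"name": "Patrick", "age": 24}'::jsonb;
    --  {"age": 24, "name": "Patrick"}         (keys come back reordered)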
For the former, you really need to sort the keys. If you ever end up passing the JSON through any other encode/decode step, your ETags will break and it won't be obvious why.
For hand-written JSON, I'm not sure there's a good solution besides modding the users' editors to sort for them.
I am pretty happy about the addition of ALTER SYSTEM. I haven't tried it yet, but I think it will make automatic failover to a standby easier to implement. Does anyone have experience with this?
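For reference, a minimal sketch of how the new command is used - settings land in postgresql.auto.conf instead of a hand-edited postgresql.conf, which is what makes it attractive for failover tooling:

    -- Written to postgresql.auto.conf, overriding postgresql.conf:
    ALTER SYSTEM SET synchronous_standby_names = 'standby1';

    -- Apply settings that don't require a restart:
    SELECT pg_reload_conf();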
Even though Rails hides it from me, I honestly don't mind directly working with the SQL interface on this db, and its language interfaces are awesome. Thanks, team.
Not trying to be mean but, why wait? If you're stuck on AWS it's not impossible (though difficult and annoying certainly) to get a decently-performant Postgres instance going. The experience setting it up will lessen the vendor-lock that Amazon has on you (convenience always has a price).
...in exchange for a number of different problems including (in no particular order): severe vendor lock-in, mediocre to acceptable performance, and relatively high cost per unit of performance.
I really don't get all this yadda yadda about AWS lock-in.
Which component on AWS doesn't have an open-source counterpart that you, given the time and knowledge (;P), could implement on your own infrastructure?
Really... It's an honest question from someone who works full time as a senior AWS architect.
Imagine you're an early stage startup. You have a handful of overworked, stressed-out engineers trying to ship v1.0. You don't have the time or resources for a dedicated Ops team, so you go with AWS (I think this is a great use case for AWS or any public cloud, btw--it just makes sense at a really early stage startup).
You ship your product, get some customers. Fast forward 18-24 months. You have grown enormously, you have lots of customers and revenue projections and expectant investors.
Your (now much larger) engineering team has gotten used to AWS conveniences and leveraged a lot of them in the development workflow. Your architecture consists of several layers of ELBs, you have painstakingly set up autoscaling and deployment via Elastic Beanstalk. Your server backups are AMIs. Your data lives in RDS (analytics in Red Shift). Your webservers use Elasticache. SQS is the backbone of your asynchronous job workflow. Customer email traffic goes through SES. Etc.
Now your AWS bill is something like $25-30k/month. You're at the point where the pricing differential between real hardware and AWS is getting big, and it will only get worse over time.
Now management has to make the hard choice:
- Build an Ops team capable of architecting, constructing and deploying new infrastructure on bare metal
- 3-6 months of work to go from design to final cut-over
- Risk some business interruption in the changeover (downtime, unexpected problems) and slower development iteration until the engineering team gets used to new tools/ways of pushing code out.
or
- Continue on AWS and eat the bill as cost of doing business.
Ultimately this is a good problem to have, no? Your startup is alive and growing, you can afford an ops team, your product proved itself. Most startups will have been dead for years at this point.
> Now your AWS bill is something like $25-30k/month. You're at the point where the pricing differential between real hardware and AWS is getting big, and it will only get worse over time.
Until you're at the point where it's on par with hiring a dedicated Ops team, $25-30K/month will really feel like a drop in the bucket.
This is why so many of us are happy to accept our AWS overlords.
We've been running mostly VMs all this time on our app (Windows, SQL Server, Linux w/ Rails), but have been using more and more ACTUAL AWS services (SQS, DynamoDB, Redshift).
I can maybe see where those latter amazon specific services are a lock-in. But I still don't see it that way, really. They aren't a lock-in from the perspective of the code if you put everything behind service interfaces with different implementations. The big guys like Azure and Google Compute all have similar services for DB as a service, blob storage, VMs, etc.
But the act of migrating to something like Azure is just generally a lot of work in ensuring 0 downtime and a smooth transition. That's hard no matter what technologies we use.
And we've talked about it. But just the act of moving is, in my opinion, a multi-week migration process with load tests etc.
And the cost savings would have to outweigh the amount of engineering costs of moving.
All this to say, the real lock-in is lack of dissatisfaction in our situation. "Amazon works well enough for us."
Different problems, but vendor lock-in isn't really one of them imho. If you are skilled enough to setup pg instead of using RDS you can always move off it to systems under your control.
If you don't have the skills, you can migrate to another provider like Heroku.
As far as I can tell from my experience, they aren't extending PG in incompatible ways.
I don't personally get the RDS value proposition, but how does it lead to vendor lock-in, much less severe vendor lock-in? I don't believe there is really any lock-in whatsoever -- backup your database and move it wherever you want, whether you're running the pgsql, Mysql, or SQL Server variants of RDS.
While of course you're correct, it's a bit like saying "you can just pack up your code and run it somewhere else". If the entire architecture and workflow are based around proprietary AWS tools (Elastic Beanstalk, Elastic load balancers, RDS, Cloud Formation, autoscaling, etc) even if your code is technically portable, you're still suffering from vendor lock-in. Arguably worse than the bad old days of the Microsoft monopoly.
RDS is just a database though. All the standard tools work with it. Unless you start dipping into more exotic services, it's pretty hard to become locked in. EC2+ELB+RDS are fairly independent. Even autoscaling doesn't lock you in.
That said, CloudFormation, SQS, Kinesis, DynamoDB, etc. are all very sticky services.
Where you see 'vendor lock-in', I see the opportunity to leverage somebody else's engineering talent to solve commodity problems that our customers don't pay for.
That's a bit of an extrapolation though, isn't it? Many people simply run VMs that connect to some arbitrary pgsql instance, which might happen to be on RDS if they want to simplify the management of that.
Those VMs can absolutely run almost anywhere, and they can connect to just about any instance of pgsql. I heavily use many services on AWS, but can quite literally move the entirety of it to my own servers, Google, Azure, or elsewhere in very short notice. People can choose to use some of the more unique services, but that is not related to RDS.
Outside observer: Man, I figure setting up a "decently performing" Postgres instance on AWS would be such a common thing to do that it would either be known to be impossible, or have cookie-cutter instructions, if not a script. How is it that it's "difficult and annoying" still?
It isn't "difficult and annoying" anymore. It used to be, before SSD-backed EBS and/or provisioned IOPS, because you had to RAID0 together a dozen or so magnetic EBS volumes to get decent disk performance and then deal with the annoyance of sorting out a way to take consistent snapshots of the RAID array for backups.
Now you can just toss a single 1TB SSD backed EBS volume on an instance and get ~3k iops, or use provisioned iops to get almost any performance level you need.
Regardless of whether they're backed by SSD, all EBS volumes on an instance sit behind a 1 Gbps pipe (except for the more exotic and expensive instance types). That's part of the reason why Amazon talks about IOPs instead of raw disk bandwidth.
Go ahead and run:
$ sudo du -hs /*
..on a vanilla m3.* instance and run iotop in a different session. You'll see bandwidth numbers that 2002 would be embarrassed about.
Bandwidth available for EBS volumes varies. For instance, with EBS-optimized volumes it can be 500Mbps, 1Gbps, 2Gbps, or 10Gbps, depending on the instance type, as shown in this chart from an Amazon presentation:
The "NA" of the 8xlarge instance types is because there's no ebs-optimization as an optional feature on that instance type; you automatically get access to 10Gbps.
The SSD EBS claims a default performance of 3 IOPS/GB with a burst of up to 3000 IOPS with a 99% consistency. One of my larger instances uses SSD EBS and so far I am happy with it. In particular, I was very impressed how much faster updating Ubuntu was when I first booted the instance. I am planning to move my Postgres RDS instance to SSD EBS next week. I think (I hope) it will fix a particular stall my application experiences sometimes.
It's not just the setup. For smaller companies with tiny ops teams (or a developer that also serves as an ops guy), RDS can save a bunch of time in maintenance/security/upgrade land. You are going to get decent to good performance for most general cases, and you can easily create failovers and scale up/down without much of any Postgres knowledge. Especially when your product is in "get traction and survive" mode, it's nice to be able to worry more about your app and less about things like running a Postgres server or cluster.
Yes, you'll wait a bit longer for new versions to come out, but that may or may not be a big deal. It isn't for us. We have the in-house knowledge to run our own cluster, but RDS does such a good job that we don't need to.
That really depends on what version you're using now, and what exactly you mean by scalability.
If you're using 9.1 or older, you may see a significant improvement in OLTP workloads on many-core machines (making it linearly scalable to >64 CPUs). This happened in 9.2 (i.e. ~2 years ago).
The main improvement in 9.4 I'm aware of is the GIN fastscan, which significantly improves performance of applications using GIN indexes (e.g. full-text).
Of course, there are many other performance improvements in various places - the principle is not to make the new version slower.
We currently use the 9.4 beta, and it's been rock solid for us. We chose it because of the jsonb data type. JSONB has been a great fit for the type of work that we needed it to do.
I just recently began using PostgreSQL, albeit an older version. Does this 9.4 mean that MongoDB is now pretty much a dud? Being able to store, manipulate, and query JSON data AND have SQL on an established wheel that has been proven reliable and polished far longer than the age of most other databases?
Are there any code examples (preferably Python) that show how to use JSONB? I'd love to see some examples on how to query every record that contains a key in a json, or order rows based on a value in a json object.
off topic: If Meteor.js implements PostgreSQL 9.4 I would seriously consider using it again. That and maybe make DDP scalable.
I'm not a big fan of MongoDB, but I don't think the introduction of JSONB in PostgreSQL 9.4 makes it a dud (which does not mean MongoDB is not a dud for other reasons).
JSONB allows you to do a lot of things that people are often doing with MongoDB (or document databases in general), but there are still some features not available in PostgreSQL. Built-in sharding, for example. There are external tools to do that with PostgreSQL, and I do have my doubts about MongoDB (partially because I only hear about the horror stories), and I expect similar features in PostgreSQL 9.5 / 9.6, but at the moment it's not there.
Not sure what you mean by Python examples - you can either fetch the data as 'text' and convert it in the application (e.g. json.loads) or just use psycopg2 with an adapter (http://initd.org/psycopg/docs/extras.html) and you'll get the data as Python dictionaries.
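On the SQL side, the queries asked about look roughly like this, assuming a hypothetical table events with a jsonb column payload:

    -- Every record whose payload contains the key 'user_id':
    SELECT * FROM events WHERE payload ? 'user_id';

    -- Records matching a key/value pair (containment):
    SELECT * FROM events WHERE payload @> '{"status": "active"}';

    -- Order rows by a value inside the object (compared as text):
    SELECT * FROM events ORDER BY payload ->> 'created_at';

    -- A GIN index on the column speeds up the ? and @> operators:
    CREATE INDEX ON events USING gin (payload);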
Just because you haven't used it doesn't put the adoption of postgres on the same level as a still-experimental programming language. It's a solid open source solution in the run-of-the-mill domain of RDBMS. It has a weird level of hype these days in certain pockets of the internet, as mysql's heyday is long gone and the whole "NoSQL" fervor is gone and now understood to have been oversold.
I've worked on projects that were using it a decade ago. To this day, at a different job, I'm still building on it: shipping it in our products to customers, building internal tools on it, and we've purchased a handful of self-hosted applications that happen to run on it.
You're generalizing from your own experience and assuming that's everyone's experience. Availability of dump-and-load tools to move from mysql doesn't really matter that much. MySQL isn't poison just because there is a new hotness. Keeping old shit on mysql and using postgres for new things is a much more common migration strategy.
Having moved from Mysql to Postgresql in 2005 and continued with it up to the present, I'd say it's a superior database for OLTP, enterprise workloads. Mysql's traditional mode with a transactional engine can work too; though, I've found Postgresql to be a solid, consistent performer for my use cases. I've also used it at home after running into trouble with Sqlite: https://paulrrogers.com/2014/10/self-hosting-owncloud-review... .
Postgresql's excellent documentation made the move much more practical than it would have been using a clearly less documented tool. (One such as MonetDB circa 2005: http://www.monetdb.org/ .)
Similarity to standard SQL and Oracle is also nice if one happens to work in a heavily regulated industry where one may have to migrate to certified platforms.
We certainly are. We're operating an analytical service on PostgreSQL - tens of TBs of data, hundreds of machines, tens of thousands of clients. Initially it was running on MySQL, but for various reasons we migrated to PostgreSQL ~3 years ago and never regretted it.
The reasons were both technical (better performance with this kind of workload, great reliability, excellent code quality, ...) and political (we have contributed numerous patches to PostgreSQL - not sure if you ever tried to do that with MySQL).
If you think there's a simple MySQL -> PostgreSQL migration tool (or a migration between arbitrary databases), you're foolish. Databases are not that interchangeable - every database has some issues with specific workarounds, and the 'good bits' are database-specific too. And those things are anchored in the application code, so if you expect a simple migration tool, you'll be disappointed.
Both Postgres and Rust are grounded in solid, desirable concepts. That's why people like them.
Of course, they are on two sides of the maturity spectrum. Postgres is used by lots of people and we're happy it works so well. Rust is not even released yet, but looks very promising.