
I think the problem is that clustering is still very much a duct-tape situation in postgresql, with no clear consensus on how to build out a cluster.

Postgres-XL looks great for scale out, but you need 4 independent types of servers. Even with all those moving parts, it doesn't provide availability. If you want failover, you need pacemaker for the data nodes with traditional sync replication, something like VRRP for your balancer, and something else to fail over the coordinator. Several of these pieces can be tricky to set up in a cloud provider.

BDR looks nice, but it looks like there could be lots of gotchas for consistency in there. Maybe it is a magic bullet though... I don't know much about it yet.

Contrast that with something like rethinkdb, mysql-galera, cassandra, etc.: you start up enough nodes for quorum, tell them about each other, and you're pretty much done. The clients can handle the balancing, or you can use a pooler/balancer.
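To make "enough nodes for quorum" concrete, here is a minimal sketch assuming a plain majority quorum (more than half the nodes must agree). The function names are made up for illustration, and real systems may weight nodes or let you tune consistency levels, so treat this as arithmetic only.

```python
def quorum_size(nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    if nodes < 1:
        raise ValueError("need at least one node")
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """How many nodes can be lost while a majority still remains."""
    return nodes - quorum_size(nodes)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        print(n, "nodes: quorum", quorum_size(n),
              "tolerates", tolerated_failures(n), "failure(s)")
```

Under that assumption, 3 nodes is the smallest cluster that survives losing one node, and going from 3 to 4 nodes doesn't buy any extra failure tolerance.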

In my perfect world, I'd install postgresql-awesome-cluster-edition on 3 nodes, add the 3 IPs (or turn on multicast discovery, if my env can support it), and away we go for read scalability and availability. I do this today for mysql-galera, and other than the fact it's mysql, it's awesome. For writes, if you add 4 or more nodes, there should be some sort of shard system like XL has.
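For the "clients can handle the balancing" part, a rough sketch of what that could look like on the client side, assuming you just hand it the 3 node IPs. `ReadBalancer` and its methods are hypothetical names, not any real driver's API, and a real client would also need health checks to decide when a node is down.

```python
import itertools


class ReadBalancer:
    """Round-robin over node addresses, skipping nodes marked down."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.down = set()
        self._cycle = itertools.cycle(self.nodes)

    def mark_down(self, node):
        self.down.add(node)

    def mark_up(self, node):
        self.down.discard(node)

    def next_node(self):
        # Try each node at most once per call before giving up.
        for _ in range(len(self.nodes)):
            node = next(self._cycle)
            if node not in self.down:
                return node
        raise RuntimeError("no healthy nodes available")


if __name__ == "__main__":
    # Example addresses only.
    lb = ReadBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
    lb.mark_down("10.0.0.2")
    print([lb.next_node() for _ in range(4)])
```

This only covers spreading reads and routing around a dead node; it says nothing about whether the node you land on has the latest writes.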

That said, postgresql is still clearly the best SQL and even noSQL single-node server out there. It's a really great piece of software.



If you honestly believe that all you have to do is stand up a bunch of instances of Mongo/Cassandra/whatever and you instantly get acceptable HA, then you need to read the [Jepsen series](https://aphyr.com/tags/jepsen).


It depends on what you consider "acceptable HA". There are many cases where I'm not trying to protect against a network partition (single data center, monitored batch data loads, etc.) and don't have a requirement for that level of tolerance. However, you're right that it's important to know that nearly every distributed system has edge cases where things might not behave as you thought. Elasticsearch has a section on their website detailing their resiliency efforts. I wish every company were as transparent about what they're doing on that front so we could all plan and set expectations better.


Setting up Mongo/Cassandra for HA is still several orders of magnitude less work than with PostgreSQL.

And there are plenty of options for dealing with the issues presented in that series.


Elasticsearch can run with ZooKeeper, and ZooKeeper is pretty solid. MSSQL with AlwaysOn seems very robust too.



