* Improvements to partitioning functionality (see the SQL sketch after this list), including:
-- Add support for partitioning by a hash key
-- Add support for PRIMARY KEY, FOREIGN KEY, indexes, and triggers on partitioned tables
-- Allow creation of a “default” partition for storing data that does not match any of the remaining partitions
-- UPDATE statements that change a partition key column now cause affected rows to be moved to the appropriate partitions
-- Improve SELECT performance through enhanced partition elimination strategies during query planning and execution
* Improvements to parallelism, including:
-- CREATE INDEX can now use parallel processing while building a B-tree index
-- Parallelization is now possible in CREATE TABLE ... AS, CREATE MATERIALIZED VIEW, and certain queries using UNION
-- Parallelized hash joins and parallelized sequential scans now perform better
* SQL stored procedures that support embedded transactions
* Optional Just-in-Time (JIT) compilation for some SQL code, speeding evaluation of expressions
* Window functions now support all framing options shown in the SQL:2011 standard, including RANGE distance PRECEDING/FOLLOWING, GROUPS mode, and frame exclusion options
* Covering indexes can now be created, using the INCLUDE clause of CREATE INDEX
* Many other useful performance improvements, including the ability to avoid a table rewrite for ALTER TABLE ... ADD COLUMN with a non-null column default
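As a rough illustration of the biggest of these items (hash partitioning, primary keys on partitioned tables, a default partition, row movement on UPDATE, and procedures with embedded transactions), here is a minimal sketch; the table and procedure names are made up:

  -- Hash partitioning (new in 11), with a primary key on the partitioned table
  CREATE TABLE measurements (
      sensor_id int NOT NULL,
      recorded  timestamptz NOT NULL,
      value     numeric,
      PRIMARY KEY (sensor_id, recorded)   -- must include the partition key column
  ) PARTITION BY HASH (sensor_id);

  CREATE TABLE measurements_p0 PARTITION OF measurements
      FOR VALUES WITH (MODULUS 4, REMAINDER 0);
  CREATE TABLE measurements_p1 PARTITION OF measurements
      FOR VALUES WITH (MODULUS 4, REMAINDER 1);
  CREATE TABLE measurements_p2 PARTITION OF measurements
      FOR VALUES WITH (MODULUS 4, REMAINDER 2);
  CREATE TABLE measurements_p3 PARTITION OF measurements
      FOR VALUES WITH (MODULUS 4, REMAINDER 3);

  -- A default partition (for LIST/RANGE partitioning) catches non-matching rows
  CREATE TABLE events (region text NOT NULL, payload jsonb) PARTITION BY LIST (region);
  CREATE TABLE events_eu    PARTITION OF events FOR VALUES IN ('eu');
  CREATE TABLE events_other PARTITION OF events DEFAULT;

  -- UPDATEs that change the partition key now move rows to the right partition
  UPDATE events SET region = 'eu' WHERE region = 'us';

  -- Stored procedure with an embedded COMMIT (procedures are new in 11)
  CREATE PROCEDURE purge_test_events() LANGUAGE plpgsql AS $$
  BEGIN
      DELETE FROM events WHERE region = 'test';
      COMMIT;
  END;
  $$;
  CALL purge_test_events();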
A small but useful feature: you can now use the keywords quit or exit to leave the psql command line. A helpful change for those of us who forget that the command is "\q".
EDIT: Ctrl+D also works, just be careful not to press it twice, since doing so will exit psql and then exit the shell altogether.
Standardizing commands and making PG more user-friendly isn't frivolous. Usability is one of the most important qualities of a good program or library.
I honestly had never heard of the Ctrl-D escape sequence until a few weeks ago. So having common exit/quit commands makes life a lot easier at least for some of us.
Generally speaking yes, but it's not clear when exactly that will happen (if at all).
Firstly, at this point JIT requires LLVM, which may or may not be available when building (so it depends on the packager). We might add other JIT providers in the future (e.g. GCC), but that's not done yet.
Secondly, we need to improve the costing model, so that we make better decisions about when to use JIT for a query. AFAIK this is the #1 reason why it's not enabled by default even on JIT-enabled builds now.
> Generally speaking yes, but it's not clear when exactly that will happen (if at all).
It's still enabled by default in 12/master. I'd expect it to stay that way. But reports for v11 will obviously influence that.
> Firstly, at this point JIT requires LLVM, which may or may not be available when building (so it depends on the packager). We might add other JIT providers in the future (e.g. GCC), but that's not done yet.
I see very very little point in adding another JIT provider to core pg. If somebody wants to write an extension that does so, ok, go ahead, but why would we maintain two nontrivial pieces of code in core? Also the GCC based stuff isn't even close to being competitive.
> Secondly, we need to improve the costing model, so that we make better decisions about when to use JIT for a query. AFAIK this is the #1 reason why it's not enabled by default even on JIT-enabled builds now.
> > Generally speaking yes, but it's not clear when exactly that will happen (if at all).
>
> It's still enabled by default in 12/master. I'd expect it to stay that way. But reports for v11 will obviously influence that.
Sure, but that's a development version, and I'm not sure it'll stay like that at the time of release. I hope it will, but my understanding is it's enabled mostly to get exposure from the developers by default.
>> Firstly, at this point JIT requires LLVM, which may or may not be available when building (so it depends on the packager). We might add other JIT providers in the future (e.g. GCC), but that's not done yet.
>
> I see very very little point in adding another JIT provider to core pg. If somebody wants to write an extension that does so, ok, go ahead, but why would we maintain two nontrivial pieces of code in core? Also the GCC based stuff isn't even close to being competitive.
Yeah, unfortunately the state of JIT in gcc is a bit sad :-(
> > > Generally speaking yes, but it's not clear when exactly that will happen (if at all).
> > It's still enabled by default in 12/master. I'd expect it to stay that way. But reports for v11 will obviously influence that.
> Sure, but that's a development version, and I'm not sure it'll stay like that at the time of release. I hope it will, but my understanding is it's enabled mostly to get exposure from the developers by default.
It's the stated intent to have it on in 12. We'd pondered disabling JIT in 11 since before the original commit. I think this is broadly analogous to what we did with parallelism (except we won't forget to document the change, as we already noted it in the 12 release notes ;)).
Well, stated intent is great, but it's not the same as certainty. Don't get me wrong - I hope we end up enabling it in 12. But underpromise + overdeliver ;-)
Release notes. To avoid a repeat of the mishap where we enabled parallel query by default in 10, but didn't mention it because it was always kept enabled in master, we've already mentioned it in release-12.sgml.
Effectively you can now do this: a PK index + INCLUDE all other columns in the table. This will be kept up to date without any need for manual reclustering.
Sure, it's duplicating the table data, but it's functionally equivalent to the methods people use now (reclustering, or rematerialising a view of the table), just without the hassle.
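For reference, the "PK + INCLUDE everything" pattern described above looks roughly like this (table and column names are made up):

  CREATE TABLE accounts (
      id      bigint  NOT NULL,
      email   text    NOT NULL,
      balance numeric NOT NULL
  );
  ALTER TABLE accounts
      ADD CONSTRAINT accounts_pkey PRIMARY KEY (id) INCLUDE (email, balance);

  -- Lookups by id can be answered by an index-only scan, without touching the heap
  -- (as long as the visibility map is reasonably up to date):
  SELECT email, balance FROM accounts WHERE id = 42;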
It's not going to be slower vs an unordered heap + a separate index that includes all the columns. Hence MS SQL's implementation is superior. If you don't want a PK, then MS SQL offers you an unordered option.
The partitioning support may now be good enough to rid myself of pg_partman (No offense to the author, great extension, but I'd much prefer not having the dependency).
Figuring out how to convert a partitioning scheme created using pg_partman to native Postgres 11 won't be fun.
Does anyone have an idea of what makes PostgreSQL both such a robust and constantly evolving DB?
I'm amazed at the pace at which they're adding deep features (such as JIT) without breaking anything, release after release, in an open source environment.
Usually this kind of thing is the consequence of a good architecture, but I wonder if anyone has a more in-depth explanation of what in the architecture makes it that good.
Not my personal experience, but one of my professors is trying to get his patch approved into Postgres for an upcoming release;
since his changes are very deep and touch the kernel (he wants to add support for temporal data) he told me the whole process will take years and the whole community will examine these changes before anything is added.
I think that pretty much explains it: Cutting edge features are proposed way earlier than commercial DBs start thinking about implementing them, and the community hones them over years before they are released to the mainstream public.
This mirrors my experience working at a company that achieved similar results on some of its internal products.
The very biggest difference between working there and working anywhere else, is that developers spent only a tiny fraction of time actually writing code.
The biggest chunk of it was spent on peer review. Every change required sign-off from everyone on the immediate team, and, if it was being made to a component that other teams interacted with, they'd also have members reviewing the changes. And, as a newcomer, I found the reviews to be brutal. Over time, though, I came to recognize that what originally felt like others getting all up in my business about pedantic little variable naming issues was actually everyone having my back about legitimate maintainability concerns.
Next biggest chunk was spent on planning - understanding requirements, talking over changes with anyone who might be impacted by them, etc.
The second biggest difference was just how fast we could move. I think a lot of that came from a certain "wei wu wei" that was attributable to the deep understanding of the systems we were working on that developed out of all that time with our hands off our keyboards.
Allow me the vice of inserting a mangled Hitchhiker's quote here: "If human beings don’t keep exercising their [fingers], he thought, their [hands] probably seize up. After a few months’ consideration and observation he abandoned this theory in favor of a new one. If they don’t keep on exercising their [fingers], he thought, their brains start working."
Maybe they have many of the features that are new in 11, but what about what PostgreSQL had already? I have used all three, and like Postgres a hundred times as much. https://www.linkedin.com/pulse/postgresql-vs-ms-sql-server-g... (I did not write this, just found it after a quick search)
I would rather have query parallelism, advanced partitioning, and online index creation than JSON support.
Postgres is an amazing RDBMS and clearly the most advanced open source offering (I use it for my startup), but it will always trail the big commercial systems like SQL Server and Oracle. The query planners for both of those RDBMS' are still considerably more advanced.
Not sure if you're trying to be funny about the hipster thing.
I've seen the json features used at multiple serious organizations. It turned what would have otherwise been unnecessarily complicated queries into really reasonable queries in our ETL pipeline.
Postgres includes many features that increase the number of simple use-cases for the existing data in their database, thus allowing you to replace a Postgres complement (like Mongo) with Postgres for that use-case.
"Enterprise" DBMSes like Oracle and MSSQL, meanwhile, don't focus so much on expanding the use-cases of their offering, as they do on expanding the number of ways a sysadmin can take the existing use-cases and make them scale better, with less manual maintenance. Oracle and Microsoft allow you to replace their ecosystem of tools and extensions with a DBMS-internal feature for the given sysadmin need.
So Postgres has an expanding ecosystem of tooling and extensions, but is slowly eating its complements; while Oracle and MSSQL have an expanding space of complementary offerings, but are slowly eating their own product ecosystems.
Or, to put that another way: Postgres focuses more on making life easier for people who write SQL; Oracle and Microsoft focus more on making life easier for the DBAs who manage the DBMS cluster that people are running SQL against.
> Cutting edge features are proposed way earlier than commercial DBs start thinking about implementing them
Definitely not true. Postgres development is slow, deliberate, and the opposite of cutting edge. It's what has led to a great platform that is reliable and flexible, but every single commercial database out there has far more modern tech implemented.
In some ways that is true, and in others that is not the case. For JSON support, PG really did lead the way as far as RDBMS' go. Things like range types are something I wish MSSQL supported when I used it at my day job.
And plenty more that PostgreSQL will probably lack for the foreseeable future. Especially at the more expensive pricing tiers.
But I'm not sure it's a fair comparison, precisely because of that word, "pricing". It's amazing what you can get people to do when you're paying them big bucks to do it. Part of what makes PostgreSQL so impressive is what it does given the budget it's working with.
TimescaleDB is for time-series data, which is half-way to being one kind of temporal data.
Full support for both kinds of temporal data ("valid time" and "transaction time") requires a bit more engineering.
The PG support for the TSTZRANGE data type, operators built on it, and indices built on those operations, was part of a previous phase of adding support for "valid time" temporal data.
https://pgxn.org/dist/temporal_tables/ is an extension implementing a prototype of "transaction time" temporal data, though in modern Postgres, you can do this much yourself.
In both cases, none of the SQL syntax specific to temporal queries has been implemented yet. Rather than a distinct view and backing "current" and "history" tables etc., you should be able to just have a table that represents all of those things, and do queries on its components using different syntax. (Sort of like how S3 versioned bucket resources work.)
Good question. I've been contributing to Postgres for quite a few years and I still don't have a clear answer to that. It certainly is not a single discrete reason, but IMHO a combination of various factors:
3) not founded/owned by particular company (no one has the upper hand, companies serve customers with different needs, compete but also cooperate, ...)
4) code / documentation quality
I could probably go on, but those four are probably the most important ones. (At least for me.)
Congratulations to the postgres team. How do you create such a high quality product when you have to understand the code written by so many people? I take a long time to understand what someone else has written.
It's not like every contributor (or even committer) understands all parts of the code base - I certainly don't. People usually start by writing external code (e.g. by writing extensions in C), learn the code style and basic constructs (memory context management, data types, etc.) and then gradually expand their knowledge by working on stuff closer to core and/or reviewing stuff that interests them. At least that's how it worked for me and a couple of people around me. I don't think it's particularly specific to Postgres.
Regarding the high quality, we don't do anything magical - we simply apply well-known tools in a careful and methodical way.
1) careful reviews - Not just the code, but appropriate tests, comments and docs, and most importantly overall design/direction. It's not uncommon to reject a patch on the basis that there's a better approach, and get it reworked from scratch. Even patches from the most senior contributors generally go through it too (with the exception of bugfixes, ...).
2) not accepting technical debt - No quick hacks that would make our life harder down the road. No features that are arbitrarily incompatible with other features. No compiler warnings.
3) incremental development - Split large patches, even if it means it's not really useful initially.
4) testing - We have a range of tests that are executed on a variety of machines/platforms/compilers ... (https://buildfarm.postgresql.org/)
5) adopt useful technologies - static analysis, test coverage, valgrind, fuzzing (sqlsmith found quite a few bugs), ...
6) attention to bug reports / fixing them promptly (instead of just stashing them somewhere forever)
I'm sure there are many additional things contributing to the quality. In any case, there's no secret sauce, except for applying those principles consistently.
Thank you very much!
I stand to gain much from your thoughtful answer.
Recently my career took a big downturn. I got laid off and found it hard to get a programming job. I am doing production support now. I feel great despair that I don't measure up as a programmer anymore. I need to get back my programming skills but I am confounded by the huge complexities of the current stacks.
The PostgreSQL ethos, from the very beginning, has been "make it correct, then make it fast". I don't know a lot about the team structure but I do know they don't have much in the way of formal process. It's just a team of competent professionals making the world a better place.
Also the initial release of features is usually very minimal, almost to the point of not being usable in the real world except for a very targeted use case. Then things sloowwwwly expand to cover more and more things as people have time to test and review.
And if your patch adds a good feature but in a way that is bad architecturally then it will be rejected or fully rewritten with no care about what that means from a timing perspective.
It means that it takes a while for things like parallel query to really become valuable, but when they do you know they work and work well.
Yep, it's honestly the epitome of release early and release often applied to something mission critical like a DB. By the time it's in a release, even the minimal version of a feature is pretty good quality, but just not necessarily the full-feature yet, so people can still test it and rely on it like it's a full stable feature missing minor features.
My opinion (as a community member and contributor):
PostgreSQL is willing to accept delays in some features -- even if highly desired -- to enable high overall feature development and quality.
Also, more developers are willing to put time into refactoring and simplifying what they need before starting on their project. In other words the groundwork is laid first and then the new feature.
Another thing that might help is that there is no bug tracker. Bugs are fixed and the fixes backported after a report comes in. Bugs don't end up stuck in a backlog.
I think the simple answer is the philosophy of Kaizen (改善), continuous improvement. I started working with PostgreSQL when it was just Postgres95. I recently found that my earliest question on postgresql-users was from 13th October 1998, 20 years ago (https://postgrespro.com/list/id/3.0.2.32.19981013214318.0069...).
I still use PostgreSQL in my startup for almost all critical production workloads. This in itself is a testament to its improvements. I remember starting with the C libraries of mSQL and Postgres95 and continuing to work with them as they evolved into PostgreSQL and the new kid on the block, MySQL. I liked PostgreSQL due to its adherence to SQL, since I like working with relational algebra. So even though MySQL was more popular in later years, I still used PostgreSQL more often, except for building qmail servers with a MySQL backend.
This might be valid if we were discussing an internal, proprietary software project.
We're talking about an open source project used by millions of people that has gotten CONTINUOUSLY better over the course of decades - to the point where it is now - arguably - the best of its kind.
Given this track record, I think postgres devs are entitled to use whatever management-speak euphemism they like. In this case, the idiom's original meaning/intent seems apt.
I guess it's simple: they have world-class engineers behind it, some of the smartest people in tech, who truly understand their craft. That, and a broad, open community.
DBs might not be glamorous but I'm sure there are many very interesting problems to solve when developing them. Of course on such a large, old project there's got to be a lot of tedious maintenance work as well, but that's true for all software in my experience.
It's often a conversation I have with people looking to get into software development. Often they'll aim for video games or something like that but I often warn them that it might not be nearly as cool as they imagine. You're more likely to end up scripting crappy menu systems than being the next Carmack.
On the other hand some of the most interesting pieces of software I've written were for very unsexy industrial applications. And I actually have good working conditions unlike people working in the videogame industry apparently.
Often they'll aim for video games or something like that
but I often warn them that it might not be nearly as cool
as they imagine. You're more likely to end up scripting
crappy menu systems than being the next Carmack.
Interestingly, Carmack himself said if he wouldn't write videogames, he would work on databases. I can't find the exact quote, but here's a similar reference:
I enjoy lots of different areas of it [programming]...
I'm taking a great deal of enjoyment writing device
drivers for Linux. I could also be having a good time
writing a database manager or something because there
are always interesting problems.
And I recently had a mindblowing moment when I realized that Civilization VI is heavily SQL based. I didn't have time to dive into it, but the graphical UI is basically an interface to visualize and update a database.
Seems quite logical and clever in hindsight, considering the turn-based gameplay.
Now just design a Civ-style game with a SQL command-line interface, but with fancy autocomplete, bright Warcraft 3 colors and a fun storyline. Have a generation of gamers learn SQL.
I'm not sure I understand the "open source" part - it's not as if we do it for free. I'd say for most of the senior PostgreSQL engineers / contributors it's part of their paid jobs. We either work for companies providing various PostgreSQL services, or for companies using PostgreSQL.
We might do the same type of work on a proprietary database, but that would lack the community/cooperation aspect of the project.
Of course, enjoying the craft is somewhat natural in both cases. And yes, it's a bit scary how smart some of the contributors are.
It varies a lot. Many parts of the code base are very accessible and easy to modify for almost anyone, while some parts, especially the query planner, contain some of the most advanced code I have worked with.
And probably (because of that) a very robust codebase. Adding a JIT to a 'textual query'->'results' pipeline is a lot easier if you took the effort to silo off each individual step than it is when you consider 'text'->'result' to be one fixed step. Even if the initial partitioning is a lot of work.
(pure speculation, I haven't seen their code yet, but I am curious now).
Anyone more involved in the pgsql community able to comment on how this balance is being struck? It doesn't seem like the release schedule has changed much, though the version numbers seem to be incrementing faster (no v10.1, 10.2, etc?)
I think there are many factors. On the tech side it's pretty modular and has very detailed comments in the source code; on the ecosystem side there is no single entity behind PG, so it created an ecosystem where multiple entities can contribute and build products and services around PostgreSQL.
There are widely different opinions about that. I have never heard someone proclaim MySQL has a great command line, while I heard that about PostgreSQL many times.
One thing that Postgres does not do well is row churn. If you have a table that has rows that are constantly added/deleted/updated, Postgres tends to bloat pretty quickly because of the way they handle updates. Say what you will about using a relational db for that kind of data, but if you need to, Postgres isn't a great choice.
You're right, UPDATEs may be an issue (because we handle them essentially as DELETE+INSERT). Generally speaking, row churn in the table alone is not a major issue - it's easy to clean up by vacuum, and the space will be reused for new data. And you can limit the amount of bloat by tweaking the autovacuum parameters.
What's more painful is bloated indexes (e.g. due to UPDATEs that modify indexed columns), because that's much harder / more expensive to get rid of.
The thing is - this is part of the MVCC design, and it has some significant advantages too. It's not like the alternative approaches have no downsides.
In some applications, at least, vacuuming is not sufficient to deal with row churn.
I use PostgreSQL in an embedded device. There is a high insertion rate, and eventually when the disk starts to get full I need to get rid of old rows.
Using plain DELETE and VACUUM does not work. The deletes aren't fast enough to keep up with the inserts, and vacuuming reduces performance to the point that I have to drop data that is waiting to be inserted. This is on a high performance SSD and I've tuned postgresql.conf. (Bigger/better hardware is not possible in my application).
Instead, I think partitions with DROP PARTITION are the only way to handle high volume row churn. Dropping a partition is practically instant and incurs no vacuum penalty.
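For anyone curious, the retention-by-partition approach looks something like this (the names and monthly granularity are just an example):

  CREATE TABLE readings (ts timestamptz NOT NULL, payload jsonb)
      PARTITION BY RANGE (ts);
  CREATE TABLE readings_2018_10 PARTITION OF readings
      FOR VALUES FROM ('2018-10-01') TO ('2018-11-01');
  CREATE TABLE readings_2018_11 PARTITION OF readings
      FOR VALUES FROM ('2018-11-01') TO ('2018-12-01');

  -- When the disk fills up, drop the oldest partition; this is near-instant
  -- and leaves nothing behind for vacuum to clean up
  DROP TABLE readings_2018_10;
  -- alternatively, keep the data around but remove it from the partitioned table:
  -- ALTER TABLE readings DETACH PARTITION readings_2018_10;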
Yeah, DROP PARTITION is definitely going to be much more efficient than DELETE + cleanup. No doubt about that.
Not sure what postgresql.conf tuning you've tried, but in general we recommend making autovacuum more frequent, but performing the cleanup in smaller chunks. Also, batching inserts usually helps a lot. But maybe you've already tried all that. There's definitely a limit - a balance between ingestion and cleanup.
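To make that concrete, the kind of per-table tuning meant here looks roughly like the following (the table name and values are placeholders, not recommendations):

  -- Vacuum this table far more aggressively than the ~20%-dead-tuples default
  ALTER TABLE queue_jobs SET (
      autovacuum_vacuum_scale_factor = 0.01,  -- trigger at ~1% dead tuples
      autovacuum_vacuum_cost_limit   = 1000,  -- allow more cleanup work per run
      autovacuum_vacuum_cost_delay   = 10     -- sleep (ms) between cost-limited chunks
  );
  -- Globally, autovacuum_naptime in postgresql.conf controls how often workers wake up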
At the moment, PostgreSQL keeps all versions of all tuples in its heap files. Inserts, updates, and deletes all result in addition of new tuples to the heap, and the engine keeps track of which transactions can see which tuples. The vacuum process deletes tuples which are no longer needed, but in the meantime, there is bloat.
zheap would keep only the latest version of each tuple in its heap files. When a tuple got updated or deleted, the engine would move the old version into separate storage, the "undo" log. A vacuum process would need to clean up the undo log, but the heap would remain unbloated.
This is obviously very practical. But it's a shame that it introduces an asymmetry, where some transactions will be reading tuples from the heap, and some will need to root around in the undo log.
I note that this is the approach that Oracle has always used - as explained in this fine article by the same chap who wrote the blog post about zheap above:
We are about to release an open-source extension for PostgreSQL that will help with the bloat problem you mentioned here, at least for continuous updates like high volume aggregation (see: pipelinedb.com). PipelineDB will be refactored as a standard PostgreSQL extension this month. Continuous aggregation can help with the bloat problem for use cases where you only need aggregated / summary data, like realtime reporting & monitoring.
LSM-trees, fractal trees, B-epsilon trees, or on-disk storage based on similar ideas is what solves the problem. Pretty much no tradeoffs compared to classic approaches; these things are just better.
LSM trees are good for frequent updates on recent data. Frequent updates of old data are much worse on LSM trees since they trigger merges on the oldest (and typically largest) levels of the tree.
Some storage engines allow for in-place updates of records if the fields are fixed-length or if the updated variable-length value fits in the existing space. I've used some of those (e.g., MS SQL Server) to good effect in high-update scenarios.
Depends on your implementation of course, but updates shouldn't be triggering merges on the largest levels. They should be accumulated until there are enough of them to merge or there is some other reason to merge things, like scrubbing. Either way, it's still more efficient.
Only in some cases of churn, plus there are plenty of workarounds where you sacrifice some performance and/or usability to avoid the bloat, so I would still recommend PostgreSQL, since few databases consist only, or almost only, of the bad cases for bloat.
I used to work in online gambling where we had plenty of row churn and not much bloat at all without having to use any of the workarounds.
Some people have problems with the way MVCC is implemented, like MySQL using rollback segments so it doesn't have issues with vacuuming [though there are tradeoffs with doing it that way as well]. The vacuuming issues are really scary and basically involve losing write access to the whole database. Have a look at stuff like this: https://www.joyent.com/blog/manta-postmortem-7-27-2015 . I believe Postgres has made a bunch of changes to reduce the risk of vacuums blowing things up.
This MVCC implementation, coupled with the way indexing works, also causes write amplification. The indices have pointers to the physical place where the row is stored (some other databases might instead record the primary key and do a lookup on the primary key to get the record). So if you update a row and it causes it to move to a different physical page (you need to keep the original row for MVCC, so there needs to be space for the new row in the current page), then you need to update all the indices as well. PG has some optimisation around trying to write the update to the same physical page to reduce the number of writes.
Yeah, we're not particularly good at those out of the box.
1) Horizontal scaling: In some cases it's doable using streaming replication, but it depends if you need to scale reads or writes. Or if you need distributed queries. There are quite a few forks and/or projects built on PostgreSQL that address different use cases (CitusDB, Greenplum, Postgres-XL, BDR, ...). And the features slowly trickle back.
One reason why it's like this is extensibility/flexibility - the project is unlikely to hard-code one particular approach to horizontal scaling, because that would not work for the other use cases. So we need something that does not have that limitation, which takes longer. It's a bit annoying, of course.
2) Storage systems: We don't really have a way to do that now - there are extensions using FDW to do that, but I'd say that's really a misuse of the FDW interface, and it has plenty of annoying limitations (backups, MVCC, ...). But it's something we're currently working on so there's hope for PG12+: https://commitfest.postgresql.org/20/1283/
MS SQL Server still has a few things up its sleeve which can be valuable in specific cases:
1. Incrementally updated materialized views (in MS SQL Server, these are called 'indexed views').
2. SQL Server Management Studio is better than anything I've seen elsewhere.
3. Reporting and analytics (SSRS/SSAS) built in. These are actually pretty good, but not the approach I would recommend.
4. Very solid clustering (I haven't used PostgreSQL's clustering, so this point might be out of date).
The license fees are steep (not as steep as Oracle's...) though, so you've got to really want those features.
I think report builders tools miss the sweet spot. Instead, build the specific reports that your business needs. Or, do a regular export to flat files (tab delimited causes fewest problems) and let people build what they need in Excel/R/whatever. Or both.
In practice, I try SQLite first, and fall back to Postgres if I need concurrent writes.
I'm a fairly casual PG user but I have these issues:
Lack of case and accent insensitive collations is inexcusable at this point. Is there anyone that enjoys sprinkling every bit of SQL with upper() comparisons? (See the workaround sketch after this list.)
Non-materialized CTEs. I like materialized CTEs sometimes and wish MSSQL had them as an option, but I also need it to work the other way.
Native point-in-time recovery out of the box. This looks like it might be possible but the process doesn't fill me with a lot of confidence.
Better connection scaling.
Ability to load custom TS dictionaries in user space (for hosted postgres on google/aws/etc..)
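Regarding the upper() comparisons mentioned above, the usual workarounds are the citext extension or an expression index; a rough sketch (made-up tables):

  -- citext compares case-insensitively by itself
  CREATE EXTENSION IF NOT EXISTS citext;
  CREATE TABLE users (id serial PRIMARY KEY, email citext UNIQUE);
  SELECT * FROM users WHERE email = 'Alice@Example.COM';  -- matches 'alice@example.com'

  -- or keep text and make lower()/upper() comparisons indexable
  CREATE TABLE customers (id serial PRIMARY KEY, name text NOT NULL);
  CREATE INDEX customers_name_lower_idx ON customers (lower(name));
  SELECT * FROM customers WHERE lower(name) = lower('Smith');
  -- accent-insensitivity is harder; the unaccent extension helps, but it's not a collation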
We had Rackspace servers in the place I worked a couple of years ago and they didn't offer Postgres as far as I remember. (You could install and manage it on a virtual machine, but then you had to do everything yourself, while they had managed MySQL servers available.)
Also, setting up master / slave replication seems easier with MySQL as I understand - I never tried with Postgres but did with MySQL.
Multi-region replication like Cassandra? Sort of a question because the last time I really looked around the release of PostgreSQL 9 there wasn't a solution.
Actually an interesting solution to this is using something like kafka-connect. Debezium specifically has libraries around pgsql -> pgsql replication using kafka + kafka-connect that work shockingly well. The lag is not much different than traditional replication but you can have a single primary that kafka-connect is pulling into kafka, and a gazillion replicas off that single stream.
MySQL with Innodb is easier to manage in large replication clusters, doesn't require connection pooling, and doesn't have write amplification problems.
MySQL out of the box is more secure and easier to configure than PostgreSQL, because of pg's public schema and the legacy config files.
Why would it be better than MySQL if you're already using MySQL for years, built knowledge with it and have no issues? Why would I change, why would Postgres be better?
The JSON/JSONB columns and the JSON navigator options are amazing. MySQL isn't even close yet, they have a JSON column that's basically opaque. Other column types like the INET, CIDR and MACADDR are often invaluable.
You can also index ARRAY columns, meaning if you need a "tags" feature it's ridiculously easy to implement, no need for additional tables.
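The tags pattern referred to above, sketched with a made-up posts table:

  CREATE TABLE posts (
      id    serial PRIMARY KEY,
      title text   NOT NULL,
      tags  text[] NOT NULL DEFAULT '{}'
  );
  CREATE INDEX posts_tags_idx ON posts USING gin (tags);

  -- "posts tagged with both 'postgres' and 'release'", answered from the GIN index
  SELECT title FROM posts WHERE tags @> ARRAY['postgres', 'release'];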
There's also PostGIS, which has no MySQL analog, and makes doing actual real-world GIS work possible.
Having tried, and failed to use MySQL Spatial Extension (which some might argue is PostGIS' analog) and failing miserably at some pretty basic things, I agree totally. I'd take PostGIS over ESRI's SDE and Oracle Spatial as well.
One "quality of life" thing - you can wrap changes to the schema in a transaction, so that a given set of changes is atomic. (Elixir's Ecto and Ruby's ActiveRecord both have migrations that do this by default.) As far as I know, that doesn't work in MySQL.
Feature for feature, PG blows MySQL out of the water. PG gives you way more tools to use in your queries, indexes etc. to solve app problems.
The real knowledge gap would come into things like all the various tactics you need to learn to get MySQL to produce good plans for your queries. PG has a much smarter optimizer which is harder to control - if it does the wrong thing, it's harder to encourage it to do the right thing, and philosophically they don't support query hints. OTOH, worst case in PG is usually much better than worst case on MySQL.
I know we got 11 just today, but I can't wait for 12 already! Rumours about alternative storage systems are extremely exciting; I'm hoping for something like a columnar materialised view (just guessing, not sure what the actual implementation will be like).
I can't wait to throw out all the columnar databases and the ETLs I have to support them, all just for a few queries.
pg12 may be removing the (mandatory) optimization fences in CTEs. This is really exciting to me, as it'll allow much more maintainable complex queries without a performance hit.
As a general rule of programming, I never want syntax sugaring or a chosen expression pattern to impact runtime performance. I have found it's optimal to write expressive code first and performant code second - and if there's a trivial expression transformation between the two a compiler or interpreter better be equipped to do it for you.
I'll play devil's advocate: the optimization fence on CTEs is a godsend. The query planner is often the wrong kind of smart. A big table can get autovacuumed in the middle of the night, cause the planner to use a new query plan, and a query that used to be instantaneous takes minutes.
If you know the query plan you want, you can usually force it using CTEs.
Really I'd be happier without a query planner at all.
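To illustrate the fence (assuming hypothetical events and users tables): in PG 11 and earlier the CTE below is always materialized first, so the date filter is guaranteed to run before the join, whatever the planner might otherwise prefer.

  WITH recent AS (
      -- materialized in PG <= 11, so this runs first and acts as a fence
      SELECT id, user_id
      FROM events
      WHERE created_at > now() - interval '1 day'
  )
  SELECT u.name, count(*)
  FROM recent r
  JOIN users u ON u.id = r.user_id
  GROUP BY u.name;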
Don't think I'd like having no query planner; most of the time it does the right thing. But having the ability to more easily correct it when it goes wrong would be nice (arguably Postgres already supports the ideal way of doing this with its custom statistics, though sometimes it's easier to have a more manual knob).
Sorry, to clarify the "(mandatory)" in the comment above, pg12 will be making CTEs optionally optimization fences, so people who want to hand-plan their query still can, but people who wish to use CTEs for organization and clarity will be able to do so without being forced to do so in a manner that matches performance needs.
I think being happy not having any query planner at all is a bit extreme, but there are times when queries need to be tuned to force early data culling or adherence to indexes, I find these times to ultimately be pretty rare and generally have to review all the instances of our manual optimization whenever we update postgres to see which ones the query planner can now do better. This change would still allow you to sidestep the query planner, but would also open up expressive CTEs to use in performance sensitive queries.
I think "will be making" unfortunately includes a bit of wishful thinking. It's far from clear we'll have a patch that's mature enough by the time the feature freeze is there and that also has the necessary support.
It is very hard to say right now what will make it into PostgreSQL 12. Some further improvements to partitioning have already landed and I expect more to land, but for other features I have no idea. There are some very ambitious projects in the pipeline.
> Rumours about alternative storage systems are extremely exciting, the idea of having a columnar materialised view (just guessing, not sure what the actual implementation will be like).
Note that even if we get the pluggable storage work into v12 - which I hope and think we can do, I'm certainly spending more time on it than I'd like - it'll not include a columnar storage on its own. There's others working on storage engines, but the furthest along intended for core aren't, to my knowledge, columnar. And even if somebody submits one for core, it might be a while till it's fast enough to satisfy your demands ;)
Maybe I'm just spoiled from how fast they implement all these features but I wish that, in the near future, table inheritance would also get some love. In its current state, inheritance needs a lot of manual checks[0]:
> unique, primary key, and foreign key constraints are not inherited.
I'm very excited about the JIT query compilation stuff finally making it into postgres! A big win for OLAP queries, although it looks like they compile specific expressions within the query rather than the whole query operator, but that could change. I'm very curious to see how this interacts with the query planner (e.g. when it's worth the overhead or not).
Right, it's "just" expressions and tuple deforming right now. Especially the former required significant refactoring (landing in v10), but that's nothing against the refactoring required to do proper whole query compilation. I'm working on the necessary changes (have posted prototype), which have independent advantages as well.
There's planning logic to decide whether JIT is worthwhile, but that's probably the weakest part right now.
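For anyone who wants to poke at it, the PG 11 knobs involved are roughly these (setting jit_above_cost to 0 just forces JIT for demonstration; the real decision is cost-based):

  SET jit = on;                 -- off by default in 11
  SET jit_above_cost = 0;       -- default 100000; 0 forces JIT for any query
  -- jit_inline_above_cost and jit_optimize_above_cost control the more
  -- expensive inlining/optimization steps (both default to 500000)

  -- EXPLAIN ANALYZE prints a "JIT:" section when compilation was actually used
  EXPLAIN (ANALYZE) SELECT sum(x * 2) FROM generate_series(1, 1000000) AS g(x);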
PostgreSQL has been such a joy to use for both hobby and professional projects. It's so reliable. I wonder when Heroku will support 11 and I'm curious if upgrading from 10 to 11 with heroku using the simple pg_upgrade would break anything. If it's just a pure performance increase, I'm looking forward to doing it to a few production applications.
Piggybacking for a Postgres question: I have thousands of legacy software instances all around the world, running Postgres with very marginal internet connections (think IoT). I’d like to continuously get all their data into my cloud. Should I use Postgres built in async replication for that?
If you added your design goals and constraints (do you want it to be fast? frequent? power-efficient? which ones can you trade off?), this would make a good question for the DBA stackexchange.
Hard to say for sure without knowing more but I would probably build my own custom protocol for syncing this, because the replication (both binary and logical) is built with the assumption that you have a reasonably constant connection from the master to the replica. If not you will start to build up large amounts of write-ahead logs.
Never reinvent the wheel before you've done your research.
Colleagues of mine successfully employed CouchDB in a situation where devices were offline for an x amount of time (running on tablets in airplanes), once they had internet again they would reliably start syncing data with the main database again. This was a number of years ago though, I haven't heard anything about CouchDB since.
Who says you need to feed it directly to the replica? Logical replication provides infrastructure for decoding changes, and you may fetch it any way you want / how often you want / feed it wherever you want (file, another database, ...).
It would be trivial to write something that connects to a bunch of machines regularly, fetches the decoded increment and feeds it somewhere (say, to a single database over a shared connection).
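A bare-bones sketch of that approach using the built-in test_decoding plugin (the slot name is made up; the devices need wal_level = logical):

  -- On each device, once:
  SELECT pg_create_logical_replication_slot('cloud_sync', 'test_decoding');

  -- Whenever the link is up, the collector pulls and consumes the accumulated
  -- changes (consuming them lets the device recycle its WAL):
  SELECT lsn, xid, data
  FROM pg_logical_slot_get_changes('cloud_sync', NULL, NULL);

  -- To look without consuming, there's a peek variant:
  SELECT data FROM pg_logical_slot_peek_changes('cloud_sync', NULL, NULL);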
You need to accumulate the data somewhere. If you don't fetch it from the node, it has to accumulate there. A custom sync protocol does not eliminate this.
Congratulations and big thanks to PG community & Contributors.
It has been my choice for all kinds of systems: OLTP, OLAP and data warehousing. It just gets better with PG 11.
I would love to see native column store format, distributed sharding support and in memory tables in future releases.
The documentation of creating partitions [1] says this:
"When creating a hash partition, a modulus and remainder must be specified. The modulus must be a positive integer, and the remainder must be a non-negative integer less than the modulus. Typically, when initially setting up a hash-partitioned table, you should choose a modulus equal to the number of partitions and assign every table the same modulus and a different remainder (see examples, below). However, it is not required that every partition have the same modulus, only that every modulus which occurs among the partitions of a hash-partitioned table is a factor of the next larger modulus. This allows the number of partitions to be increased incrementally without needing to move all the data at once. For example, suppose you have a hash-partitioned table with 8 partitions, each of which has modulus 8, but find it necessary to increase the number of partitions to 16. You can detach one of the modulus-8 partitions, create two new modulus-16 partitions covering the same portion of the key space (one with a remainder equal to the remainder of the detached partition, and the other with a remainder equal to that value plus 8), and repopulate them with data. You can then repeat this -- perhaps at a later time -- for each modulus-8 partition until none remain. While this may still involve a large amount of data movement at each step, it is still better than having to create a whole new table and move all the data at once."
Theoretically that should just be a data sync while maintaining double-write read-primary, and then delete the data from the nodes you don't need anymore once the data has been synced? Of course with non-hash indexes the deletions start to slow down with size...
I'm assuming joins, indexes, etc are all isolated to the shard data?
Looks like a great release even to a Postgres noob like myself. Only just recently started using Postgres, but I really enjoy it and it inspires me to dust off my database skills and learn more SQL.
I'd like to take the opportunity and ask if anyone has any recommendations or "must have" settings for the Postgres terminal (psql)? Preferably on macOS.
I found the formatting acting kind of weird in some cases when selecting multiple columns or writing long queries. I figure there must be some things you set up at the start and then cannot live without...
My only settings are turning on expanded display mode automatically (to handle output with many columns) and turning on query timing. But it is common for people to also change the NULL symbol from an empty string to something else, or to have unicode borders.
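Those translate into a ~/.psqlrc along these lines (the NULL marker is just a personal choice):

  -- expanded display only when a row is too wide for the screen
  \x auto
  -- print how long each query took
  \timing on
  -- make NULLs distinguishable from empty strings
  \pset null '[null]'
  -- unicode table borders
  \pset linestyle unicode
  \pset border 2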