* Improvements to partitioning functionality (see the SQL sketch after this list), including:
-- Add support for partitioning by a hash key
-- Add support for PRIMARY KEY, FOREIGN KEY, indexes, and triggers on partitioned tables
-- Allow creation of a “default” partition for storing data that does not match any of the remaining partitions
-- UPDATE statements that change a partition key column now cause affected rows to be moved to the appropriate partitions
-- Improve SELECT performance through enhanced partition elimination strategies during query planning and execution
* Improvements to parallelism, including:
-- CREATE INDEX can now use parallel processing while building a B-tree index
-- Parallelization is now possible in CREATE TABLE ... AS, CREATE MATERIALIZED VIEW, and certain queries using UNION
-- Parallelized hash joins and parallelized sequential scans now perform better
* SQL stored procedures that support embedded transactions
* Optional Just-in-Time (JIT) compilation for some SQL code, speeding evaluation of expressions
* Window functions now support all framing options shown in the SQL:2011 standard, including RANGE distance PRECEDING/FOLLOWING, GROUPS mode, and frame exclusion options
* Covering indexes can now be created, using the INCLUDE clause of CREATE INDEX
* Many other useful performance improvements, including the ability to avoid a table rewrite for ALTER TABLE ... ADD COLUMN with a non-null column default
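As a rough illustration of the biggest of these items (hash partitioning, primary keys on partitioned tables, a default partition, row movement on UPDATE, and procedures with embedded transactions), here is a minimal sketch; the table and procedure names are made up:

  -- Hash partitioning (new in 11), with a primary key on the partitioned table
  CREATE TABLE measurements (
      sensor_id int NOT NULL,
      recorded  timestamptz NOT NULL,
      value     numeric,
      PRIMARY KEY (sensor_id, recorded)   -- must include the partition key column
  ) PARTITION BY HASH (sensor_id);

  CREATE TABLE measurements_p0 PARTITION OF measurements
      FOR VALUES WITH (MODULUS 4, REMAINDER 0);
  CREATE TABLE measurements_p1 PARTITION OF measurements
      FOR VALUES WITH (MODULUS 4, REMAINDER 1);
  CREATE TABLE measurements_p2 PARTITION OF measurements
      FOR VALUES WITH (MODULUS 4, REMAINDER 2);
  CREATE TABLE measurements_p3 PARTITION OF measurements
      FOR VALUES WITH (MODULUS 4, REMAINDER 3);

  -- A default partition (for LIST/RANGE partitioning) catches non-matching rows
  CREATE TABLE events (region text NOT NULL, payload jsonb) PARTITION BY LIST (region);
  CREATE TABLE events_eu    PARTITION OF events FOR VALUES IN ('eu');
  CREATE TABLE events_other PARTITION OF events DEFAULT;

  -- UPDATEs that change the partition key now move rows to the right partition
  UPDATE events SET region = 'eu' WHERE region = 'us';

  -- Stored procedure with an embedded COMMIT (procedures are new in 11)
  CREATE PROCEDURE purge_test_events() LANGUAGE plpgsql AS $$
  BEGIN
      DELETE FROM events WHERE region = 'test';
      COMMIT;
  END;
  $$;
  CALL purge_test_events();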
A small but useful feature: you can now use the keywords quit or exit to leave the psql command line. A helpful change for those of us who forget that the command is "\q".
EDIT: Ctrl+D also works, just be careful not to press it twice, since doing so will exit psql and then exit the shell altogether.
Standardizing commands and making PG more user-friendly isn't frivolous. Usability is one of the most important qualities of a good program or library.
I honestly had never heard of the Ctrl-D escape sequence until a few weeks ago. So having common exit/quit commands makes life a lot easier at least for some of us.
Generally speaking yes, but it's not clear when exactly that will happen (if at all).
Firstly, at this point JIT requires LLVM, which may or may not be available when building (so it depends on the packager). We might add other JIT providers in the future (e.g. GCC), but that's not done yet.
Secondly, we need to improve the costing model, so that we make better decisions about when to use JIT for a query. AFAIK this is the #1 reason why it's not enabled by default even on JIT-enabled builds now.
> Generally speaking yes, but it's not clear when exactly that will happen (if at all).
It's still enabled by default in 12/master. I'd expect it to stay that way. But reports for v11 will obviously influence that.
> Firstly, at this point JIT requires LLVM, which may or may not be available when building (so it depends on the packager). We might add other JIT providers in the future (e.g. GCC), but that's not done yet.
I see very very little point in adding another JIT provider to core pg. If somebody wants to write an extension that does so, ok, go ahead, but why would we maintain two nontrivial pieces of code in core? Also the GCC based stuff isn't even close to being competitive.
> Secondly, we need to improve the costing model, so that we make better decisions about when to use JIT for a query. AFAIK this is the #1 reason why it's not enabled by default even on JIT-enabled builds now.
> > Generally speaking yes, but it's not clear when exactly that will happen (if at all).
>
> It's still enabled by default in 12/master. I'd expect it to stay that way. But reports for v11 will obviously influence that.
Sure, but that's a development version, and I'm not sure it'll stay like that at the time of release. I hope it will, but my understanding is it's enabled mostly to get exposure from the developers by default.
>> Firstly, at this point JIT requires LLVM, which may or may not be available when building (so it depends on the packager). We might add other JIT providers in the future (e.g. GCC), but that's not done yet.
>
> I see very very little point in adding another JIT provider to core pg. If somebody wants to write an extension that does so, ok, go ahead, but why would we maintain two nontrivial pieces of code in core? Also the GCC based stuff isn't even close to being competitive.
Yeah, unfortunately the state of JIT in gcc is a bit sad :-(
> > > Generally speaking yes, but it's not clear when exactly that will happen (if at all).
> > It's still enabled by default in 12/master. I'd expect it to stay that way. But reports for v11 will obviously influence that.
> Sure, but that's a development version, and I'm not sure it'll stay like that at the time of release. I hope it will, but my understanding is it's enabled mostly to get exposure from the developers by default.
It's the stated intent to have it on in 12. We'd pondered disabling JIT in 11 since before the original commit. I think this is broadly analogous to what we did with parallelism (except we won't forget to document the change, as we already noted it in the 12 release notes ;)).
Well, stated intent is great, but it's not the same as certainty. Don't get me wrong - I hope we end up enabling it in 12. But underpromise + overdeliver ;-)
Release notes. To avoid a repeat of the mishap where we enabled parallel query by default in 10, but didn't mention it because it was always kept enabled in master, we've already mentioned it in release-12.sgml.
Effectively you can now do this: a PK index + INCLUDE all other columns in the table. This will be kept up to date without any need for manual reclustering.
Sure, it's duplicating the table data, but it's functionally equivalent to the methods people use now (reclustering, or rematerialising a view of the table), just without the hassle.
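For reference, the "PK + INCLUDE everything" pattern described above looks roughly like this (table and column names are made up):

  CREATE TABLE accounts (
      id      bigint  NOT NULL,
      email   text    NOT NULL,
      balance numeric NOT NULL
  );
  ALTER TABLE accounts
      ADD CONSTRAINT accounts_pkey PRIMARY KEY (id) INCLUDE (email, balance);

  -- Lookups by id can be answered by an index-only scan, without touching the heap
  -- (as long as the visibility map is reasonably up to date):
  SELECT email, balance FROM accounts WHERE id = 42;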
It's not going to be slower vs an unordered heap + a separate index that includes all the columns. Hence MS SQL's implementation is superior. If you don't want a PK, then MS SQL offers you an unordered option.
The partitioning support may now be good enough to rid myself of pg_partman (No offense to the author, great extension, but I'd much prefer not having the dependency).
Figuring out how to convert a partitioning scheme created using pg_partman to native Postgres 11 won't be fun.
Does anyone have an idea of what makes PostgreSQL both such a robust and constantly evolving DB?
I'm amazed at the pace at which they're adding deep features (such as JIT) without breaking anything, release after release, in an open source environment.
Usually this kind of thing is the consequence of a good architecture, but I wonder if anyone has a more in-depth explanation of what in the architecture makes it that good.
Not my personal experience, but one of my professors is trying to get his patch approved into Postgres for an upcoming release;
since his changes are very deep and touch the kernel (he wants to add support for temporal data) he told me the whole process will take years and the whole community will examine these changes before anything is added.
I think that pretty much explains it: Cutting edge features are proposed way earlier than commercial DBs start thinking about implementing them, and the community hones them over years before they are released to the mainstream public.
This mirrors my experience working at a company that achieved similar results on some of its internal products.
The very biggest difference between working there and working anywhere else, is that developers spent only a tiny fraction of time actually writing code.
The biggest chunk of it was spent on peer review. Every change required sign-off from everyone on the immediate team, and, if it was being made to a component that other teams interacted with, they'd also have members reviewing the changes. And, as a newcomer, I found the reviews to be brutal. Over time, though, I came to recognize that what originally felt like others getting all up in my business about pedantic little variable naming issues was actually everyone having my back about legitimate maintainability concerns.
Next biggest chunk was spent on planning - understanding requirements, talking over changes with anyone who might be impacted by them, etc.
The second biggest difference was just how fast we could move. I think a lot of that came from a certain "wei wu wei" that was attributable to the deep understanding of the systems we were working on that developed out of all that time with our hands off our keyboards.
Allow me the vice of inserting a mangled Hitchhiker's quote here: "If human beings don’t keep exercising their [fingers], he thought, their [hands] probably seize up. After a few months’ consideration and observation he abandoned this theory in favor of a new one. If they don’t keep on exercising their [fingers], he thought, their brains start working."
Maybe they have many of the features that are new in 11, but what about what PostgreSQL had already? I have used all three, and like Postgres a hundred times as much. https://www.linkedin.com/pulse/postgresql-vs-ms-sql-server-g... (I did not write this, just found it after a quick search)
I would rather have query parallelism, advanced partitioning, and online index creation than JSON support.
Postgres is an amazing RDBMS and clearly the most advanced open source offering (I use it for my startup), but it will always trail the big commercial systems like SQL Server and Oracle. The query planners for both of those RDBMS' are still considerably more advanced.
Not sure if you're trying to be funny about the hipster thing.
I've seen the json features used at multiple serious organizations. It turned what would have otherwise been unnecessarily complicated queries into really reasonable queries in our ETL pipeline.
Postgres includes many features that increase the number of simple use-cases for the existing data in their database, thus allowing you to replace a Postgres complement (like Mongo) with Postgres for that use-case.
"Enterprise" DBMSes like Oracle and MSSQL, meanwhile, don't focus so much on expanding the use-cases of their offering, as they do on expanding the number of ways a sysadmin can take the existing use-cases and make them scale better, with less manual maintenance. Oracle and Microsoft allow you to replace their ecosystem of tools and extensions with a DBMS-internal feature for the given sysadmin need.
So Postgres has an expanding ecosystem of tooling and extensions, but is slowly eating its complements; while Oracle and MSSQL have an expanding space of complementary offerings, but are slowly eating their own product ecosystems.
Or, to put that another way: Postgres focuses more on making life easier for people who write SQL; Oracle and Microsoft focus more on making life easier for the DBAs who manage the DBMS cluster that people are running SQL against.
> Cutting edge features are proposed way earlier than commercial DBs start thinking about implementing them
Definitely not true. Postgres development is slow, deliberate, and the opposite of cutting edge. It's what has led to a great platform that is reliable and flexible, but every single commercial database out there has far more modern tech implemented.
In some ways that is true, and in others that is not the case. For JSON support, PG really did lead the way as far as RDBMS' go. Things like range types are something I wish MSSQL supported when I used it at my day job.
And plenty more that PostgreSQL will probably lack for the foreseeable future. Especially at the more expensive pricing tiers.
But I'm not sure it's a fair comparison, precisely because of that word, "pricing". It's amazing what you can get people to do when you're paying them big bucks to do it. Part of what makes PostgreSQL so impressive is what it does given the budget it's working with.
TimescaleDB is for time-series data, which is half-way to being one kind of temporal data.
Full support for both kinds of temporal data ("valid time" and "transaction time") requires a bit more engineering.
The PG support for the TSTZRANGE data type, operators built on it, and indices built on those operations, was part of a previous phase of adding support for "valid time" temporal data.
https://pgxn.org/dist/temporal_tables/ is an extension implementing a prototype of "transaction time" temporal data, though in modern Postgres, you can do this much yourself.
In both cases, none of the SQL syntax specific to temporal queries has been implemented yet. Rather than a distinct view and backing "current" and "history" tables etc., you should be able to just have a table that represents all of those things, and do queries on its components using different syntax. (Sort of like how S3 versioned bucket resources work.)
Good question. I've been contributing to Postgres for quite a few years and I still don't have a clear answer to that. It certainly is not a single discrete reason, but IMHO a combination of various factors:
3) not founded/owned by particular company (no one has the upper hand, companies serve customers with different needs, compete but also cooperate, ...)
4) code / documentation quality
I could probably go on, but those four are probably the most important ones. (At least for me.)
Congratulations to the postgres team. How do you create such a high quality product when you have to understand the code written by so many people? I take a long time to understand what someone else has written.
It's not like every contributor (or even committer) understands all parts of the code base - I certainly don't. People usually start by writing external code (e.g. by writing extensions in C), learn the code style and basic constructs (memory context management, data types, etc.) and then gradually expand their knowledge by working on stuff closer to core and/or reviewing stuff that interests them. At least that's how it worked for me and a couple of people around me. I don't think it's particularly specific to Postgres.
Regarding the high quality, we don't do anything magical - we simply apply well-known tools in a careful and methodical way.
1) careful reviews - Not just the code, but appropriate tests, comments and docs, and most importantly overall design/direction. It's not uncommon to reject a patch on the basis that there's a better approach, and get it reworked from scratch. Even patches from the most senior contributors generally go through it too (with the exception of bugfixes, ...).
2) not accepting technical debt - No quick hacks that would make our life harder down the road. No features that are arbitrarily incompatible with other features. No compiler warnings.
3) incremental development - Split large patches, even if it means it's not really useful initially.
4) testing - We have a range of tests that are executed on a variety of machines/platforms/compilers ... (https://buildfarm.postgresql.org/)
5) adopt useful technologies - static analysis, test coverage, valgrind, fuzzing (sqlsmith found quite a few bugs), ...
6) attention to bug reports / fixing them promptly (instead of just stashing them somewhere forever)
I'm sure there are many additional things contributing to the quality. In any case, there's no secret sauce, except for applying those principles consistently.
Thank you very much!
I stand to gain much from your thoughtful answer.
Recently my career took a big downturn. I got laid off and found it hard to get a programming job. I am doing production support now. I feel great despair that I don't measure up as a programmer anymore. I need to get back my programming skills but I am confounded by the huge complexities of the current stacks.
The PostgreSQL ethos, from the very beginning, has been "make it correct, then make it fast". I don't know a lot about the team structure but I do know they don't have much in the way of formal process. It's just a team of competent professionals making the world a better place.
Also the initial release of features is usually very minimal, almost to the point of not being usable in the real world except for a very targeted use case. Then things sloowwwwly expand to cover more and more things as people have time to test and review.
And if your patch adds a good feature but in a way that is bad architecturally then it will be rejected or fully rewritten with no care about what that means from a timing perspective.
It means that it takes a while for things like parallel query to really become valuable, but when they do you know they work and work well.
Yep, it's honestly the epitome of release early and release often applied to something mission critical like a DB. By the time it's in a release, even the minimal version of a feature is pretty good quality, but just not necessarily the full-feature yet, so people can still test it and rely on it like it's a full stable feature missing minor features.
My opinion (as a community member and contributor):
PostgreSQL is willing to accept delays in some features -- even if highly desired -- to enable high overall feature development and quality.
Also, more developers are willing to put time into refactoring and simplifying what they need before starting on their project. In other words the groundwork is laid first and then the new feature.
Another thing that might help is that there is no bug tracker. Bugs are fixed and the fixes backported after a report comes in. Bugs don't end up stuck in a backlog.
I think the simple answer is the philosophy of Kaizen (改善), continuous improvement. I started working with PostgreSQL when it was just Postgres95. I recently found that my earliest question on postgresql-users was from 13th October 1998, 20 years ago (https://postgrespro.com/list/id/3.0.2.32.19981013214318.0069...).
I still use PostgreSQL in my startup for almost all critical production workloads. This in itself is a testament to its improvements. I remember starting with the C libraries of mSQL and Postgres95 and continuing to work with them as they evolved into PostgreSQL and the new kid on the block, MySQL. I liked PostgreSQL due to its adherence to SQL, since I like working with relational algebra. So even though MySQL was more popular in later years, I still used PostgreSQL more often, except for building qmail servers with a MySQL backend.
This might be valid if we were discussing an internal, proprietary software project.
We're talking about an open source project used by millions of people that has gotten CONTINUOUSLY better over the course of decades - to the point where it is now - arguably - the best of its kind.
Given this track record, I think postgres devs are entitled to use whatever management-speak euphemism they like. In this case, the idiom's original meaning/intent seems apt.
I guess it's simple: they have world-class engineers behind it, some of the smartest people in tech, who truly understand their craft. That, and a broad, open community.
DBs might not be glamorous but I'm sure there are many very interesting problems to solve when developing them. Of course on such a large, old project there's got to be a lot of tedious maintenance work as well, but that's true for all software in my experience.
It's often a conversation I have with people looking to get into software development. Often they'll aim for video games or something like that but I often warn them that it might not be nearly as cool as they imagine. You're more likely to end up scripting crappy menu systems than being the next Carmack.
On the other hand some of the most interesting pieces of software I've written were for very unsexy industrial applications. And I actually have good working conditions unlike people working in the videogame industry apparently.
Often they'll aim for video games or something like that
but I often warn them that it might not be nearly as cool
as they imagine. You're more likely to end up scripting
crappy menu systems than being the next Carmack.
Interestingly, Carmack himself said if he wouldn't write videogames, he would work on databases. I can't find the exact quote, but here's a similar reference:
I enjoy lots of different areas of it [programming]...
I'm taking a great deal of enjoyment writing device
drivers for Linux. I could also be having a good time
writing a database manager or something because there
are always interesting problems.
And I recently had a mindblowing moment when I realized that Civilization VI is heavily SQL based. I didn't have time to dive into it, but the graphical UI is basically an interface to visualize and update a database.
Seems quite logical and clever in hindsight, considering the turn-based gameplay.
Now just design a Civ-style game with a SQL command-line interface, but with fancy autocomplete, bright Warcraft 3 colors and a fun storyline. Have a generation of gamers learn SQL.
I'm not sure I understand the "open source" part - it's not as if we do it for free. I'd say for most of the senior PostgreSQL engineers / contributors it's part of their paid jobs. We either work for companies providing various PostgreSQL services, or for companies using PostgreSQL.
We might do the same type of work on a proprietary database, but that would lack the community/cooperation aspect of the project.
Of course, enjoying the craft is somewhat natural in both cases. And yes, it's a bit scary how smart some of the contributors are.
It varies a lot. Many parts of the code base are very accessible and easy to modify for almost anyone, while some parts, especially the query planner, contain some of the most advanced code I have worked with.
And probably (because of that) a very robust codebase. Adding a JIT to a 'textual query'->'results' pipeline is a lot easier if you took the effort to silo off each individual step than it is when you consider 'text'->'result' to be one fixed step. Even if the initial partitioning is a lot of work.
(pure speculation, I haven't seen their code yet, but I am curious now).
Anyone more involved in the pgsql community able to comment on how this balance is being struck? It doesn't seem like the release schedule has changed much, though the version numbers seem to be incrementing faster (no v10.1, 10.2, etc?)
I think there are many factors. On the tech side it's pretty modular and has very detailed comments in the source code; on the ecosystem side there is no single entity behind PG, so it created an ecosystem where multiple entities can contribute and build products and services around PostgreSQL.
There are widely different opinions about that. I have never heard someone proclaim MySQL has a great command line, while I heard that about PostgreSQL many times.
One thing that Postgres does not do well is row churn. If you have a table that has rows that are constantly added/deleted/updated, Postgres tends to bloat pretty quickly because of the way they handle updates. Say what you will about using a relational db for that kind of data, but if you need to, Postgres isn't a great choice.
You're right, UPDATEs may be an issue (because we handle them essentially as DELETE+INSERT). Generally speaking, row churn in the table alone is not a major issue - it's easy to clean up by vacuum, and the space will be reused for new data. And you can limit the amount of bloat by tweaking the autovacuum parameters.
What's more painful is bloated indexes (e.g. due to UPDATEs that modify indexed columns), because that's much harder / more expensive to get rid of.
The thing is - this is part of the MVCC design, and it has some significant advantages too. It's not like the alternative approaches have no downsides.
In some applications, at least, vacuuming is not sufficient to deal with row churn.
I use PostgreSQL in an embedded device. There is a high insertion rate, and eventually when the disk starts to get full I need to get rid of old rows.
Using plain DELETE and VACUUM does not work. The deletes aren't fast enough to keep up with the inserts, and vacuuming reduces performance to the point that I have to drop data that is waiting to be inserted. This is on a high performance SSD and I've tuned postgresql.conf. (Bigger/better hardware is not possible in my application).
Instead, I think partitions with DROP PARTITION are the only way to handle high volume row churn. Dropping a partition is practically instant and incurs no vacuum penalty.
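For anyone curious, the retention-by-partition approach looks something like this (the names and monthly granularity are just an example):

  CREATE TABLE readings (ts timestamptz NOT NULL, payload jsonb)
      PARTITION BY RANGE (ts);
  CREATE TABLE readings_2018_10 PARTITION OF readings
      FOR VALUES FROM ('2018-10-01') TO ('2018-11-01');
  CREATE TABLE readings_2018_11 PARTITION OF readings
      FOR VALUES FROM ('2018-11-01') TO ('2018-12-01');

  -- When the disk fills up, drop the oldest partition; this is near-instant
  -- and leaves nothing behind for vacuum to clean up
  DROP TABLE readings_2018_10;
  -- alternatively, keep the data around but remove it from the partitioned table:
  -- ALTER TABLE readings DETACH PARTITION readings_2018_10;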
Yeah, DROP PARTITION is definitely going to be much more efficient than DELETE + cleanup. No doubt about that.
Not sure what postgresql.conf tuning you've tried, but in general we recommend making autovacuum more frequent, but performing the cleanup in smaller chunks. Also, batching inserts usually helps a lot. But maybe you've already tried all that. There's definitely a limit - a balance between ingestion and cleanup.
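To make that concrete, the kind of per-table tuning meant here looks roughly like the following (the table name and values are placeholders, not recommendations):

  -- Vacuum this table far more aggressively than the ~20%-dead-tuples default
  ALTER TABLE queue_jobs SET (
      autovacuum_vacuum_scale_factor = 0.01,  -- trigger at ~1% dead tuples
      autovacuum_vacuum_cost_limit   = 1000,  -- allow more cleanup work per run
      autovacuum_vacuum_cost_delay   = 10     -- sleep (ms) between cost-limited chunks
  );
  -- Globally, autovacuum_naptime in postgresql.conf controls how often workers wake up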
At the moment, PostgreSQL keeps all versions of all tuples in its heap files. Inserts, updates, and deletes all result in addition of new tuples to the heap, and the engine keeps track of which transactions can see which tuples. The vacuum process deletes tuples which are no longer needed, but in the meantime, there is bloat.
zheap would keep only the latest version of each tuple in its heap files. When a tuple got updated or deleted, the engine would move the old version into separate storage, the "undo" log. A vacuum process would need to clean up the undo log, but the heap would remain unbloated.
This is obviously very practical. But it's a shame that it introduces an asymmetry, where some transactions will be reading tuples from the heap, and some will need to root around in the undo log.
I note that this is the approach that Oracle has always used - as explained in this fine article by the same chap who wrote the blog post about zheap above:
We are about to release an open-source extension for PostgreSQL that will help with the bloat problem you mentioned here, at least for continuous updates like high volume aggregation (see: pipelinedb.com). PipelineDB will be refactored as a standard PostgreSQL extension this month. Continuous aggregation can help with the bloat problem for use cases where you only need aggregated / summary data, like realtime reporting & monitoring.
LSM-trees, fractal trees, B-epsilon trees, or on-disk storage based on similar ideas is what solves the problem. Pretty much no tradeoffs compared to classic approaches; these things are just better.
LSM trees are good for frequent updates on recent data. Frequent updates of old data are much worse on LSM trees since they trigger merges on the oldest (and typically largest) levels of the tree.
Some storage engines allow for in-place updates of records if the fields are fixed-length or if the updated variable-length value fits in the existing space. I've used some of those (e.g., MS SQL Server) to good effect in high-update scenarios.
Depends on your implementation of course, but updates shouldn't be triggering merges on the largest levels. They should be accumulated until there are enough of them to merge or there is some other reason to merge things, like scrubbing. Either way, it's still more efficient.
Only in some cases of churn, plus there are plenty of workarounds where you sacrifice some performance and/or usability to avoid the bloat, so I would still recommend PostgreSQL, since few databases consist only, or almost only, of the bad cases for bloat.
I used to work in online gambling where we had plenty of row churn and not much bloat at all without having to use any of the workarounds.
Some people have problems with the way MVCC is implemented, like MySQL using rollback segments so it doesn't have issues with vacuuming [though there are tradeoffs with doing it that way as well]. The vacuuming issues are really scary and basically involve losing write access to the whole database. Have a look at stuff like this: https://www.joyent.com/blog/manta-postmortem-7-27-2015 . I believe Postgres has made a bunch of changes to reduce the risk of vacuums blowing things up.
This MVCC implementation, coupled with the way indexing works, also causes write amplification. The indices have pointers to the physical place where the row is stored (some other databases might instead record the primary key and do a lookup on the primary key to get the record). So if you update a row and it causes it to move to a different physical page (you need to keep the original row for MVCC, so there needs to be space for the new row in the current page), then you need to update all the indices as well. PG has some optimisation around trying to write the update to the same physical page to reduce the number of writes.
Yeah, we're not particularly good at those out of the box.
1) Horizontal scaling: In some cases it's doable using streaming replication, but it depends if you need to scale reads or writes. Or if you need distributed queries. There are quite a few forks and/or projects built on PostgreSQL that address different use cases (CitusDB, Greenplum, Postgres-XL, BDR, ...). And the features slowly trickle back.
One reason why it's like this is extensibility/flexibility - the project is unlikely to hard-code one particular approach to horizontal scaling, because that would not work for the other use cases. So we need something that does not have that limitation, which takes longer. It's a bit annoying, of course.
2) Storage systems: We don't really have a way to do that now - there are extensions using FDW to do that, but I'd say that's really a misuse of the FDW interface, and it has plenty of annoying limitations (backups, MVCC, ...). But it's something we're currently working on so there's hope for PG12+: https://commitfest.postgresql.org/20/1283/
MS SQL Server still has a few things up its sleeve which can be valuable in specific cases:
1. Incrementally updated materialized views (in MS SQL Server, these are called 'indexed views').
2. SQL Server Management Studio is better than anything I've seen elsewhere.
3. Reporting and analytics (SSRS/SSAS) built in. These are actually pretty good, but not the approach I would recommend.
4. Very solid clustering (I haven't used PostgreSQL's clustering, so this point might be out of date).
The license fees are steep (not as steep as Oracle's...) though, so you've got to really want those features.
I think report builders tools miss the sweet spot. Instead, build the specific reports that your business needs. Or, do a regular export to flat files (tab delimited causes fewest problems) and let people build what they need in Excel/R/whatever. Or both.
In practice, I try SQLite first, and fall back to Postgres if I need concurrent writes.
I'm a fairly casual PG user but I have these issues:
Lack of case and accent insensitive collations is inexcusable at this point. Is there anyone that enjoys sprinkling every bit of SQL with upper() comparisons? (See the workaround sketch after this list.)
Non-materialized CTEs. I like materialized CTEs sometimes and wish MSSQL had them as an option, but I also need it to work the other way.
Native point-in-time recovery out of the box. This looks like it might be possible but the process doesn't fill me with a lot of confidence.
Better connection scaling.
Ability to load custom TS dictionaries in user space (for hosted postgres on google/aws/etc..)
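Regarding the upper() comparisons mentioned above, the usual workarounds are the citext extension or an expression index; a rough sketch (made-up tables):

  -- citext compares case-insensitively by itself
  CREATE EXTENSION IF NOT EXISTS citext;
  CREATE TABLE users (id serial PRIMARY KEY, email citext UNIQUE);
  SELECT * FROM users WHERE email = 'Alice@Example.COM';  -- matches 'alice@example.com'

  -- or keep text and make lower()/upper() comparisons indexable
  CREATE TABLE customers (id serial PRIMARY KEY, name text NOT NULL);
  CREATE INDEX customers_name_lower_idx ON customers (lower(name));
  SELECT * FROM customers WHERE lower(name) = lower('Smith');
  -- accent-insensitivity is harder; the unaccent extension helps, but it's not a collation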
We had Rackspace servers in the place I worked a couple of years ago and they didn't offer Postgres as far as I remember. (You could install and manage it on a virtual machine, but then you had to do everything yourself, while they had managed MySQL servers available.)
Also, setting up master / slave replication seems easier with MySQL as I understand - I never tried with Postgres but did with MySQL.
Multi-region replication like Cassandra? Sort of a question because the last time I really looked around the release of PostgreSQL 9 there wasn't a solution.
Actually an interesting solution to this is using something like kafka-connect. Debezium specifically has libraries around pgsql -> pgsql replication using kafka + kafka-connect that work shockingly well. The lag is not much different than traditional replication but you can have a single primary that kafka-connect is pulling into kafka, and a gazillion replicas off that single stream.
MySQL with Innodb is easier to manage in large replication clusters, doesn't require connection pooling, and doesn't have write amplification problems.
MySQL out of the box is more secure and easier to configure than PostgreSQL, because of pg's public schema and the legacy config files.
Why would it be better than MySQL if you're already using MySQL for years, built knowledge with it and have no issues? Why would I change, why would Postgres be better?
The JSON/JSONB columns and the JSON navigator options are amazing. MySQL isn't even close yet, they have a JSON column that's basically opaque. Other column types like the INET, CIDR and MACADDR are often invaluable.
You can also index ARRAY columns, meaning if you need a "tags" feature it's ridiculously easy to implement, no need for additional tables.
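The tags pattern referred to above, sketched with a made-up posts table:

  CREATE TABLE posts (
      id    serial PRIMARY KEY,
      title text   NOT NULL,
      tags  text[] NOT NULL DEFAULT '{}'
  );
  CREATE INDEX posts_tags_idx ON posts USING gin (tags);

  -- "posts tagged with both 'postgres' and 'release'", answered from the GIN index
  SELECT title FROM posts WHERE tags @> ARRAY['postgres', 'release'];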
There's also PostGIS, which has no MySQL analog, and makes doing actual real-world GIS work possible.
Having tried, and failed to use MySQL Spatial Extension (which some might argue is PostGIS' analog) and failing miserably at some pretty basic things, I agree totally. I'd take PostGIS over ESRI's SDE and Oracle Spatial as well.
One "quality of life" thing - you can wrap changes to the schema in a transaction, so that a given set of changes is atomic. (Elixir's Ecto and Ruby's ActiveRecord both have migrations that do this by default.) As far as I know, that doesn't work in MySQL.
Feature for feature, PG blows MySQL out of the water. PG gives you way more tools to use in your queries, indexes etc. to solve app problems.
The real knowledge gap would come into things like all the various tactics you need to learn to get MySQL to produce good plans for your queries. PG has a much smarter optimizer which is harder to control - if it does the wrong thing, it's harder to encourage it to do the right thing, and philosophically they don't support query hints. OTOH, worst case in PG is usually much better than worst case on MySQL.
I know we got 11 just today, but I can't wait for 12 already! Rumours about alternative storage systems are extremely exciting; I'm hoping for something like a columnar materialised view (just guessing, not sure what the actual implementation will be like).
I can't wait to throw out all the columnar databases and the ETLs I have to support them, all just for a few queries.
pg12 may be removing the (mandatory) optimization fences in CTEs. This is really exciting to me, as it'll allow much more maintainable complex queries without a performance hit.
As a general rule of programming, I never want syntax sugaring or a chosen expression pattern to impact runtime performance. I have found it's optimal to write expressive code first and performant code second - and if there's a trivial expression transformation between the two a compiler or interpreter better be equipped to do it for you.
I'll play devil's advocate: the optimization fence on CTEs is a godsend. The query planner is often the wrong kind of smart. A big table can get autovacuumed in the middle of the night, cause the planner to use a new query plan, and a query that used to be instantaneous takes minutes.
If you know the query plan you want, you can usually force it using CTEs.
Really I'd be happier without a query planner at all.
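To illustrate the fence (assuming hypothetical events and users tables): in PG 11 and earlier the CTE below is always materialized first, so the date filter is guaranteed to run before the join, whatever the planner might otherwise prefer.

  WITH recent AS (
      -- materialized in PG <= 11, so this runs first and acts as a fence
      SELECT id, user_id
      FROM events
      WHERE created_at > now() - interval '1 day'
  )
  SELECT u.name, count(*)
  FROM recent r
  JOIN users u ON u.id = r.user_id
  GROUP BY u.name;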
Don't think I'd like having no query planner; most of the time it does the right thing. But having the ability to more easily correct it when it goes wrong would be nice (arguably Postgres already supports the ideal way of doing this with its custom statistics, though sometimes it's easier to have a more manual knob).
Sorry, to clarify the "(mandatory)" in the comment above, pg12 will be making CTEs optionally optimization fences, so people who want to hand-plan their query still can, but people who wish to use CTEs for organization and clarity will be able to do so without being forced to do so in a manner that matches performance needs.
I think being happy not having any query planner at all is a bit extreme, but there are times when queries need to be tuned to force early data culling or adherence to indexes, I find these times to ultimately be pretty rare and generally have to review all the instances of our manual optimization whenever we update postgres to see which ones the query planner can now do better. This change would still allow you to sidestep the query planner, but would also open up expressive CTEs to use in performance sensitive queries.
I think "will be making" unfortunately includes a bit of wishful thinking. It's far from clear we'll have a patch that's mature enough by the time the feature freeze is there and that also has the necessary support.
It is very hard to say right now what will make it into PostgreSQL 12. Some further improvements to partitioning have already landed and I expect more to land, but for other features I have no idea. There are some very ambitious projects in the pipeline.
> Rumours about alternative storage systems are extremely exciting, the idea of having a columnar materialised view (just guessing, not sure what the actual implementation will be like).
Note that even if we get the pluggable storage work into v12 - which I hope and think we can do, I'm certainly spending more time on it than I'd like - it'll not include a columnar storage on its own. There's others working on storage engines, but the furthest along intended for core aren't, to my knowledge, columnar. And even if somebody submits one for core, it might be a while till it's fast enough to satisfy your demands ;)
Maybe I'm just spoiled from how fast they implement all these features but I wish that, in the near future, table inheritance would also get some love. In its current state, inheritance needs a lot of manual checks[0]:
> unique, primary key, and foreign key constraints are not inherited.
I'm very excited about the JIT query compilation stuff finally making it into postgres! A big win for OLAP queries, although it looks like they compile specific expressions within the query rather than the whole query operator, but that could change. I'm very curious to see how this interacts with the query planner (e.g. when it's worth the overhead or not).
Right, it's "just" expressions and tuple deforming right now. Especially the former required significant refactoring (landing in v10), but that's nothing against the refactoring required to do proper whole query compilation. I'm working on the necessary changes (have posted prototype), which have independent advantages as well.
There's planning logic to decide whether JIT is worthwhile, but that's probably the weakest part right now.
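For anyone who wants to poke at it, the PG 11 knobs involved are roughly these (setting jit_above_cost to 0 just forces JIT for demonstration; the real decision is cost-based):

  SET jit = on;                 -- off by default in 11
  SET jit_above_cost = 0;       -- default 100000; 0 forces JIT for any query
  -- jit_inline_above_cost and jit_optimize_above_cost control the more
  -- expensive inlining/optimization steps (both default to 500000)

  -- EXPLAIN ANALYZE prints a "JIT:" section when compilation was actually used
  EXPLAIN (ANALYZE) SELECT sum(x * 2) FROM generate_series(1, 1000000) AS g(x);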
PostgreSQL has been such a joy to use for both hobby and professional projects. It's so reliable. I wonder when Heroku will support 11 and I'm curious if upgrading from 10 to 11 with heroku using the simple pg_upgrade would break anything. If it's just a pure performance increase, I'm looking forward to doing it to a few production applications.
Piggybacking for a Postgres question: I have thousands of legacy software instances all around the world, running Postgres with very marginal internet connections (think IoT). I’d like to continuously get all their data into my cloud. Should I use Postgres built in async replication for that?
If you added your design goals and constraints (do you want it to be fast? frequent? power-efficient? which ones can you trade off?), this would make a good question for the DBA stackexchange.
Hard to say for sure without knowing more but I would probably build my own custom protocol for syncing this, because the replication (both binary and logical) is built with the assumption that you have a reasonably constant connection from the master to the replica. If not you will start to build up large amounts of write-ahead logs.
Never reinvent the wheel before you've done your research.
Colleagues of mine successfully employed CouchDB in a situation where devices were offline for an x amount of time (running on tablets in airplanes), once they had internet again they would reliably start syncing data with the main database again. This was a number of years ago though, I haven't heard anything about CouchDB since.
Who says you need to feed it directly to the replica? Logical replication provides infrastructure for decoding changes, and you may fetch it any way you want / how often you want / feed it wherever you want (file, another database, ...).
It would be trivial to write something that connects to a bunch of machines regularly, fetches the decoded increment and feeds it somewhere (say, to a single database over a shared connection).
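A bare-bones sketch of that approach using the built-in test_decoding plugin (the slot name is made up; the devices need wal_level = logical):

  -- On each device, once:
  SELECT pg_create_logical_replication_slot('cloud_sync', 'test_decoding');

  -- Whenever the link is up, the collector pulls and consumes the accumulated
  -- changes (consuming them lets the device recycle its WAL):
  SELECT lsn, xid, data
  FROM pg_logical_slot_get_changes('cloud_sync', NULL, NULL);

  -- To look without consuming, there's a peek variant:
  SELECT data FROM pg_logical_slot_peek_changes('cloud_sync', NULL, NULL);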
You need to accumulate the data somewhere. If you don't fetch it from the node, it has to accumulate there. A custom sync protocol does not eliminate this.
Congratulations and big thanks to PG community & Contributors.
It has been my choice for all kinds of systems: OLTP, OLAP and data warehousing. It just gets better with PG 11.
I would love to see native column store format, distributed sharding support and in memory tables in future releases.
The documentation of creating partitions [1] says this:
"When creating a hash partition, a modulus and remainder must be specified. The modulus must be a positive integer, and the remainder must be a non-negative integer less than the modulus. Typically, when initially setting up a hash-partitioned table, you should choose a modulus equal to the number of partitions and assign every table the same modulus and a different remainder (see examples, below). However, it is not required that every partition have the same modulus, only that every modulus which occurs among the partitions of a hash-partitioned table is a factor of the next larger modulus. This allows the number of partitions to be increased incrementally without needing to move all the data at once. For example, suppose you have a hash-partitioned table with 8 partitions, each of which has modulus 8, but find it necessary to increase the number of partitions to 16. You can detach one of the modulus-8 partitions, create two new modulus-16 partitions covering the same portion of the key space (one with a remainder equal to the remainder of the detached partition, and the other with a remainder equal to that value plus 8), and repopulate them with data. You can then repeat this -- perhaps at a later time -- for each modulus-8 partition until none remain. While this may still involve a large amount of data movement at each step, it is still better than having to create a whole new table and move all the data at once."
Theoretically that should just be a data sync while maintaining double-write read-primary, and then delete the data from the nodes you don't need anymore once the data has been synced? Of course with non-hash indexes the deletions start to slow down with size...
I'm assuming joins, indexes, etc are all isolated to the shard data?
Looks like a great release even to a Postgres noob like myself. Only just recently started using Postgres, but I really enjoy it and it inspires me to dust off my database skills and learn more SQL.
I'd like to take the opportunity and ask if anyone has any recommendations or "must have" settings for the Postgres terminal (psql)? Preferably on macOS.
I found the formatting acting kind of weird in some cases when selecting multiple columns or writing long queries. I figure there must be some things you set up at the start and then cannot live without...
My only settings are turning on expanded display mode automatically (to handle output with many columns) and turning on query timing. But it is common for people to also change the NULL symbol from an empty string to something else, or to have unicode borders.
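Those translate into a ~/.psqlrc along these lines (the NULL marker is just a personal choice):

  -- expanded display only when a row is too wide for the screen
  \x auto
  -- print how long each query took
  \timing on
  -- make NULLs distinguishable from empty strings
  \pset null '[null]'
  -- unicode table borders
  \pset linestyle unicode
  \pset border 2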