Hacker News new | past | comments | ask | show | jobs | submit login
Postgres Guide (postgresguide.com)
362 points by craigkerstiens on Aug 3, 2015 | hide | past | favorite | 51 comments



I like the idea of such a guide, but the resources currently there can be picked up by someone in less than a day of using postgres. It is a good start however as it covers the basics well except for installation/setup. I think that section needs to include setting up postgres database data in a different partition than the one where the DB itself is installed as well as setting up clusters.

Also it lacks certain information that is really helpful particularly when dealing with Postgres. An example is this: http://blog.jonanin.com/2013/11/20/postgresql-char-varchar/

The harder things that I've had trouble with are not covered. They include escaping JSON correctly when dumping the database to CSV, stored procedures, setting up a cluster, automated backups, fallover DBs and recovering from a crash with minimal downtime.


Yes, it would be lovely if there was a guide for those topics you mentioned.


I'm going to restrict myself to official documentation here as much as possible....

How to dump things to any file the correct way using the postgres COPY command: http://www.postgresql.org/docs/9.4/static/sql-copy.html

Stored procedures: http://www.postgresql.org/docs/9.4/static/plpgsql.html (ok, this one is a bit big... but PL/pgSQL is damn powerful)

Clustering is a topic that lives outside PostgreSQL, there are some helpful posts on the wiki though: https://wiki.postgresql.org/wiki/Replication,_Clustering,_an...

https://wiki.postgresql.org/wiki/Clustering

Failover: http://www.postgresql.org/docs/9.4/static/warm-standby-failo... ( You should really read all of this though: http://www.postgresql.org/docs/9.4/static/high-availability.... )

Recovering from a crash: This is a difficult topic. There isn't a great single page in the documentation, but essentially it just replays x-logs (from what I remember last, this may have changed), and if you lose them, there are a number of options. pg_restore from a backup, use this: http://www.postgresql.org/docs/9.2/static/runtime-config-dev.... the list is pretty long for options.


http://use-the-index-luke.com/ is another great resource though not specifically targeted at Postgres.


Another thing to consider that I don't see covered often is the object relational nature of PostgreSQL. So often I will see PostgreSQL being used like any other RDBMS when there are some very powerful development and querying tools in the system that are uniquely PostgreSQL.

Things like being able to treat relations as types, compound types and the like. Naturally where and when these things are appropriate and when they are not would make for good subject matter not often times covered.


What is "relations as types"? I googled to no avail.


Relations are tables, indexes, views, etc. When you create one, PG will also create a composite data type[1] that corresponds to one of its rows. And you can use those types in interesting ways - for example, you can have a column in a table that stores a complete composite value.

[1] http://www.postgresql.org/docs/9.3/static/rowtypes.html


Yes, this is exactly why I think a guide examining such a topic would be good. You get a doc page for it (see icebraining's comment), but that's about it. Performance gets covered often, thinking about how to get the most of PostgreSQL's type system doesn't.

I don't really use this feature to drive structure (really at all), but do use it extensively when programming functions in PL/pgSQL. I also exploit this in queries sometimes as well. It's not a life changing feature to be sure, but it is relatively unique and can help solve problems that are more cumbersome to solve without the feature.


Why is PostgreSQL considered to have a steep learning curve? At least coming from another SQL database, it's a 5 minute install and has none of the weird quirks like "the empty string is null" etc.


Also, I find the PostgreSQL manual to be one of the best Free Software documentations. It is clear, accurate and up to date.

In general, PostgreSQL has no nasty surprises. It's a very smooth experience.

Even "under the hood", from a developers perspective, PostgreSQL has a very responsive mailing list, is friendly to newcomers and yet has a one of the best quality assurance mechanisms I'm aware of. Tom Lane and all the other PostgreSQL hackers do a really good job there. ("Commit fests", Releases on regular schedule, "Stable" is really stable, beta-versions are honestly marked as "Beta", etc.)


Postgres has fantastic documentation, but it still is rather intimidating to the completely new developer.

For example, take the following scenario:

  1. Go to project's website  
  2. Click docs/documentation.
This is what you get:

Postgres: http://www.postgresql.org/docs/

MongoDB: http://docs.mongodb.org/manual/

RethinkDB: http://rethinkdb.com/docs/

Wipe your entire memory of databases for a second: which one would you chose to delve further into?


Agreed. I think PG has pretty good reference documentation, with a few weaker sections, but horrible introductory material. Once you're familiar with postgres you wonder why you'd ever need anything more than good reference documentation, but that's just because you now know where to look for what.

EDIT: missing word


Actual link to docs is http://www.postgresql.org/docs/9.4/interactive/index.html , come on, be fair.


The GP is being fair. The link provided is the one exposed on the postgresql.org homepage. The GP was explicit in considering the different projects from the POV of a new user unfamiliar with them. If the url that you provide is the most appropriate, then that is the url that should be anchored to the 'Documentation' link on the Postgres homepage. I imagine that rethink and Mongo also have point versions. They both choose not to force new users to choose, devoid of context, which point version is most appropriate (sure you can use heuristics, like highest non-experimental point version, but this heuristic sometimes leads to pitfalls, Python and Perl come to mind).

And the http://www.postgresql.org/docs/ landing page is made even less necessary since the 'interactive' documentation already has links at the top to other point versions. Throw in a minimal sidebar and you could kill the landing page without losing any of the functionality.

Of course, Postgres has been around longer, and their last major website redesign probably pre-dates the entire existence of the other two websites altogether[1], but that has no bearing on which website is the least intimidating to newcomers. If postgres sees fit to make their website less intimidating to newcomers, then their work is cut out for them.

[1] Confirmed via Internet Archive's Wayback Machine. The last postgresql.org redisign was complete by early 2006. Mongo and ReThink websites showed up, in minimal form, in late 2008 and early 2009 respectively.


Man, the MongoDB documentation looks different. I was getting into it a couple years ago, and I was so frustrated with it, like it would explain that something returns an error but not what the values could be, or the prototype would show it returned something but the documentation wouldn't say what, or parameters would be left unexplained, etc. Looks like it's all new and different now, which is probably a good thing.


>At least coming from another SQL database..[]..

Yeah, that'd be it. Not everyone comes to Postgres with an understanding of another RDBMS or even SQL; I would imagine this guide is designed to reduce the learning curve for those developers who are more likely to pick up a popular NoSQL solution simply because it has more user friendly docs and a data structure they are already used to dealing with, i.e JSON (k:v).

HN users seem to agree with the concept "if in doubt, use Postgres" but that is far from the reality out there, especially in those circles not particularly used to handling ops (e.g frontend).


I can't speak as to other's experience but the one thing I found more complex was the user account side of things. Certainly compared to mysql's simple username / password system there was a lot more to understand to set it up.

I'm much happier with it now, and it's a way more robust system – but it was a hurdle to get over.

Other than that I found it a dream to work with straight away (though I have plenty of prior RDBMS experience to draw on).


What needs more documentation is the highlighting of sections where Postgres does differently from say, Oracle, MySQL or SqlServer...

e.g., If you are used to do X this way with MySQL, here's how to do it in Postgres.


Good opportunity missed here too, since a lot of stuff that's a bit of work in other DBs is often less in postgres... if you know the magic words.



Here's a good write-up on the difference between postgres and MS SQL.

http://www.pg-versus-ms.com/

I've been collecting links from this thread for my own personal learning so thought I'd throw in the only one I came here with.


IMO, it will be hard to beat the official documentation (at http://www.postgresql.org/docs/9.4/interactive/index.html)

So, what makes this different from that approach?


The presentation/navigation seems a lot better if anything.


I like the presentation of the guide. I wish something like this had been available when I started.

The "Filtering Data" example at the very bottom of this page [1] could be improved a bit. It's using >= AND <= for finding records between certain dates when BETWEEN is generally better for this. Since the guide is aimed at "beginners and experienced users", maybe a different example could be used or the BETWEEN version could be added below it as an example of cases where there's a more efficient way to filter data?

[1] http://www.postgresguide.com/sql/select.html#filtering-data


You gotta be careful when using BETWEEN.

BETWEEN does

    a >= x AND a <= y  
Usually, you want

    a >= x AND a < y
http://www.postgresql.org/docs/9.3/static/functions-comparis...


This is why I love the Postgres range types so much (http://www.postgresql.org/docs/9.4/static/rangetypes.html). With those, you get to chose based on the value whether your ranges include the boundaries or not. Even better: you get to chose for both start and end.


I recently replaced all my a >= x AND a < y with BETWEEN, and then I realised I got burned by what you describe :-) It's been on the Trello to-do to fix it for weeks...


Good point!

Edit: my parent post is wrong, the guide is using <, not <=, but I can't edit it now.


I'm using Postgres for a new project. After playing around with various NoSql stores, I've come to the following conclusion:

If you want a super scalable eventually consistent database, use one of the more minimally structured global key value store type DBs like Cassandra. In this case you do not want SQL's guarantees or structure, so don't go there.

If you want a structured database with ACID guarantees, just use XXXXing SQL.

NoSql databases that have attempted to implement structured data, consistency guarantees, rules, ACID or near-ACID semantics, etc., have all started to basically just converge with SQL. They end up re-implementing SQL but with a less consistent, hackier query language and they miss spots. They've gone around the circle and re-invented the wheel and in many cases ended up with an inferior one to boot.

Sure SQL is old. So is math. SQL is rooted in set theory and other pieces of immortal mathematical truth. It's a great example of a software system designed around ageless mathematical concepts that will always be valid. It could use some syntactic modernization, but the core of it will be as useful in a million years as it is today. Learn how to properly structure a database (normalization, DRY, etc.) and how to use its more obscure abilities (esoteric joins) and you'll find that it's amazingly powerful.

PostgreSQL is fantastic because it's doing just that: a bit of modernization around a solid core. It gives you SQL when you want structured data, and it also give you JSON columns when you want to store blobs of unstructured data in the database. So it kind of gives you the best of both worlds: SQL plus a JSON document store. You can (to some extent) query your JSON columns too, though if you intend to do this a lot I'd recommend moving that data into SQL-land.

This facilitates a kind of iterated development where you throw temporary and less structured data into JSON columns, then if you discover later that this data wants to be more long-lived and structured you migrate it to real SQL columns. It's a very agile/YAGNI way of doing things -- do it quick at first, then optimize and clean up once you know what wants to live where and what's really important.


What I find interesting is that PostgreSQL is slowly gaining features like that make it a good fit for many use cases of NoSQL datastores. With the BDR extension, it might even be possible to use PostgreSQL to implement an eventually consistent multi-master document store. Of course, none of this comes "out of the box", but it's interesting to see the two technologies becoming more like each other.


Great job, craigkerstiens.

I'm sure it needs more love and more content, but this is definitely a great start.

PostgreSQL's documentation is outstanding, but at 3004 pages (9.4's full documentation PDF) is no piece of cake. This guide serves as a starting point for people wanting to get into PostgreSQL.

Thanks!


Their reference manual is huge, yes, but the tutorial chapters serve as a nice introduction while giving roughly the same information as the site linked by OP:

http://www.postgresql.org/docs/9.4/static/tutorial.html

The tutorial has the additional advantage that it links directly to more advanced chapters in the manual for people willing/needing to go deeper.

It's also kept up to date by the people working on the database itself, so it's bound to be more accurate as time progresses.


Unfortunately none of the parts of the tutorial are regularly updated. It's been years since substantial updates to it have been made.


It looks like a good start.

I see why the performance section might not cover everything under the sun, but given how little it currently covers, I think that a link to some of the classic tuning resources would be very helpful. At the very least, mention that there are entire topics of Postgres performance that are not covered: For instance, per-table statistics targets, or tuning the database configuration to matches the available hardware and database size: If a DB has a lot of memory and is backed by an array of SSDs, the optimum settings will vary wildly from those of a small machine with a hard drive using platters (or, as some "interesting" people have done, hosting the actual database files in a network file system. shudder)


Thanks for the feedback, when initially creating the performance area there weren't a lot of great references. There's now one really great set of slides from a talk that gives you the basics of configuring your own postgresql.conf, I'll make sure to add it soon as an additional reference - http://thebuild.com/presentations/not-your-job.pdf


This looks nice, and I look forward to taking the time to read through it more thoroughly, but one thing that jumped out to me from the navigation bar is the location of 'Window Functions' under 'Postgres Specific Tips'.

Window functions are part of the SQL standard, and certainly not unique to Postgres. I don't have time to research the implementation history across major relational databases, but they've been available in MS SQL Server since 2008 at the latest, and I know they're present in Oracle as well. It would make sense to include these under 'General SQL'.

Edit: Just noticed that this section is under both headings, nearly identically (some links are different). I didn't notice this before.


Looks like a good guide, but some UI issues detract from it. The mobile menu doesn't close when you tap the menu icon again (or tap outside the menu) and it could really use a next page button :)

Great job otherwise!


Thanks! Sadly I'm not a designer. The mobile is definitely far from perfect, and will continue iterating on it. If you're especially design inclined I'll welcome any PRs to improve that experience - https://github.com/craigkerstiens/postgresguide.com


I submitted a PR for previous/next text and mobile menu fix.


We plan to write a great guide, similar to DO model, once Kong (https://github.com/mashape/kong) will support Postgres (https://github.com/Mashape/kong/issues/331)


I always appreciate these kinds of references. Thanks for the material. A couple small items: 1) Some of your links under further reading, don't go anywhere. 2) I'm confused about the Joins section. The second sentence under Tables, says you will cover joins later, but I'm not sure when that happens. Otherwise, helpful material. Thanks again.


Thanks for the report on the broken links, should be all fixed.

As for the section on joins, it's definitely not well labeled, perhaps I'll churn out a page on joins tonight :)


This probably does not apply to users starting out with Postgres, but it would be nice with some tried-and-true advice for tweaking the configuration. The configuration only provides documentation on a per-setting basis and lacks an overview of what you should set for general purpose usage depending on your hardware.


This is simple short and right to the point guide for PostgreSQL newbies. Certainly less intimidating than the official documentation: http://www.postgresql.org/docs/manuals/


In OSX you can install through homebrew, you can add it to installation guide -https://wiki.postgresql.org/wiki/Homebrew


I've found Heroku's postgres.app (http://postgresapp.com/) easier to work with and keep updated.


It appears that the Querying content is in the Joins section.


Doh. You're absolutely right, was due to a bad copy and paste error from the old version. Should be all fixed now, but do need to add some more to it.


Would be great if this guide were available in an offline PDF version .. Any chance we could get that? (Don't want to have to wget -m -np everything ..)


As for a postgres newbie, this is gold.

Concise, clear and everything is in one place, that's what I was looking for. Also I can use it as a reference guide.


The idea is great. But pretty bare bone information in this guide. And my biggest complain currently: No license. I opened a bug about this in his github repo.




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: