It's also an extra point of failure, an extra account to manage, etc. I have seen plenty of low-traffic Rails (and other) apps where neither hosting cost nor performance would be significantly improved by adding a CDN.
I'll add cache invalidation to the list of reasons not to use a CDN. It's a solvable problem, but I've more than once seen irritating issues caused by something getting cached in the CDN layer and the wrong resources loading.
Or simpler, why do you need a CDN? It's rarely worth the additional work (setup + deployment) when most websites are not limited by bandwidth for assets.
If you're serving assets directly from S3 (or even from nginx) you're exposed to a "denial of wallet" attack, given the price markup on outbound networking.
So you have 10 scripts on your page, and you put them on 5 different hosts.
What does the waterfall on that look like? You're probably talking about a web app that transfers 2.6 MB to a user who only wants to read 78 bytes of text. And your primary concern is the markup on outbound networking? Don't design crap pages, and outbound networking isn't a problem. Most of the data transfer is fluff nobody wants or needs anyway.
You built the website! Now you don't think it's worth sending to the user? Or did you maybe go overboard when you were adding things to it?
> Enable keep-alive connections. Keep-alive connections are reusable. They prevent having to re-establish a connection, as well as SSL negotiation. They reduce latency time for all pages made up of several resources.
Pretty sure this only applies to HTTP/1 and you'll get better performance with HTTP/2:
"Connection-specific header fields such as Connection and Keep-Alive are prohibited in HTTP/2 and HTTP/3"
"HTTP persistent connection, also called HTTP keep-alive, or HTTP connection reuse, is the idea of using a single TCP connection to send and receive multiple HTTP requests/responses, as opposed to opening a new connection for every single request/response pair. The newer HTTP/2 protocol uses the same idea and takes it further to allow multiple concurrent requests/responses to be multiplexed over a single connection."
Some good advice here, but the “don’t index boolean columns” needs an “it depends” caveat, since Postgres will sometimes use multiple boolean indexes to perform a bitmap index scan, which can be advantageous.
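For example, something like this can show the combined scan (schema and index names are hypothetical):

```ruby
# Assumes separate single-column indexes on two boolean columns.
User.where(active: true, verified: true).explain
# Postgres may AND the two indexes together instead of scanning the table:
#   BitmapAnd
#     -> Bitmap Index Scan on index_users_on_active
#     -> Bitmap Index Scan on index_users_on_verified
```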
Correct, when the query conditions match the index conditions and both select a low proportion of the rows. For PostgreSQL users unfamiliar with partial indexes, this is worth a read: https://www.postgresql.org/docs/current/indexes-partial.html
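In Rails migration terms, a partial index looks like this (table and column names are hypothetical); it only covers the rows matching the predicate, so it stays small when the flag selects a small slice:

```ruby
# Index only the unfulfilled orders, the slice the hot query reads.
add_index :orders, :created_at,
          where: "fulfilled = FALSE",
          name: "index_orders_unfulfilled_on_created_at"

# A matching query the planner can serve from the partial index:
Order.where(fulfilled: false).order(:created_at)
```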
I like how the PostgreSQL docs use the term “profitable” for whether an index helps.
The purpose of an index is often to lower that “cost” (greater profit) by placing more of the values the query accesses (for filtering, ordering, or selecting) within the index entries themselves, for faster retrieval.
When users have the skills to generate query plans and review whether an index supports a query, verifying that the planner actually picks the index and that it lowers the cost, they can answer this question for their own unique combination of hardware, data distribution, queries, and indexes.
As generic advice, I think more often than not the index won’t be used for Boolean columns. But it’s generic advice and it does depend.
As you suggested, users must check their own system.
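As a minimal sketch of that loop (model and column hypothetical; the `:analyze` option to `explain` needs Rails 7.1+, otherwise run EXPLAIN ANALYZE in psql):

```ruby
# 1. Capture the plan and cost before adding the candidate index.
puts User.where(verified: true).explain(:analyze)

# 2. Add the index in a migration, e.g. add_index :users, :verified,
#    then capture the plan again.
puts User.where(verified: true).explain(:analyze)

# "Profitable" here: the planner switched from a sequential scan to
# the index, and the estimated cost / actual time dropped. If not,
# the index is pure write overhead and should be dropped.
```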
My book also covers the two- and three-value variations (when NULLs are allowed) of Boolean columns.
For readers wanting to build these skills themselves, here’s more info:
This got me to take a look at the Rails site and go through the Getting Started tutorial. It's really well-written and is probably amongst the best documentation I've seen out there. I would love to do more projects in Rails now.
That's great! Hopefully the authors of Rails Guides and the Getting Started tutorial see this. I'll share it with a core member and ask them to reshare it. I'm sure they'd appreciate seeing their hard work get recognized, and would welcome your feedback.
Good stuff, but the `size`, `count`, `length` section just intensifies my dislike for ORMs. ORMs bury all of the SQL, only for devs to dig it back up when they realize it's important for performance. Now you have to be both a SQL expert and an ActiveRecord expert.
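For anyone who hasn't dug it up yet, the difference in question (model name hypothetical):

```ruby
users = User.all

users.count   # always issues SELECT COUNT(*) FROM "users"
users.length  # loads every row into model instances, counts in Ruby
users.size    # COUNT(*) if the relation isn't loaded yet

users.load    # once the records are loaded...
users.size    # ...no query at all, just counts the array in memory
```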
I guess this is a fair point. It is easy to use SQL with AR. But when you do so, you get lambasted in code review by people saying "you can generate this same SQL with this arcane Arel incantation!!". But that is certainly a culture problem and not a technical one!
I tend to agree as a SQL enthusiast. However, across hundreds of apps, I have yet to see a Rails team that doesn’t use Active Record, or one that writes much SQL directly or by default. I’m sure it happens, but in my experience it’s rare.
This is a place where I think tools like RuboCop help. They can be configured to flag method swaps like this (size over count) automatically, which makes the change a relatively low-effort task.
With those rules/linting in place, you aren’t throwing out the benefits of AR (ORM), and you can hopefully leverage its useful methods, like these, that help avoid unnecessary queries.
Please, please don't mix up ORMs with the Active Record pattern. Active Record is one way to implement an ORM, but it's not the only way. I think many people who say they hate ORMs actually mean Active Record.
For bigger projects the Active Record pattern does suck, yes. But you still need some database-layer logic, and that most likely does some object-relational mapping (ORM).
Oh sure, it's all about knowing your requirements. I write raw SQL when performance or complexity call for it. My intention was to argue that both have merits, not that ORMs are the one true solution.
These performance rules apply to all backend development: use compression and caching, index the foreign keys in your database, and tune your SQL queries.
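On the foreign key point, in Rails terms (table and column names hypothetical): `add_reference` creates the index for you, but a column added by hand needs one explicitly.

```ruby
# New association column: constraint plus index in one step.
add_reference :comments, :post, foreign_key: true, index: true

# Existing foreign key column that never got an index:
add_index :comments, :user_id
```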
Great list, but one caveat I'd add: while "SQL will always be faster than your code" is true, in a sufficiently large app with many parallel requests the solution might still be to do some processing in the app, because the app tier can scale horizontally while (most) databases can only scale vertically and are thus more limited.
The hint about using .pluck to only grab what you need from an ActiveRecord query is a pretty good one. I hadn't realized you could do that.
I assume this is telling us it doesn't actually make an ActiveRecord instance out of each row when you do that. And instantiating big bunches of ActiveRecord model instances just to grab a few fields from a result set with a lot of rows can be sooo slow.
That's correct. If I run `User.limit(5).pluck(:id)`, the query it runs is `SELECT "users"."id" FROM "users" LIMIT $1 [["LIMIT", 5]]` and it returns a plain array, not an ActiveRecord relation.
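For contrast, `select` also narrows the columns but still instantiates model objects (the `email` column here is just for illustration):

```ruby
User.limit(5).pluck(:id, :email)
# => [[1, "a@example.com"], ...]  plain nested arrays, no models built

User.limit(5).select(:id, :email)
# => a relation of User instances with only id and email populated
```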
For readers who want all of these and more in book form, with a sample Rails app and generated big data to test with, please consider my book:
High Performance PostgreSQL for Rails https://news.ycombinator.com/item?id=38407585
The book helps readers build database skills with the overall purpose of improved performance and scalability.
Again, great, concise article. I’ll be recommending it to others and it will help a lot of developers!
Thanks!