It's also an extra point of failure, an extra account to manage, etc. I have seen plenty of low-traffic Rails (and other) apps where neither hosting cost nor performance would be significantly improved by adding a CDN.
I'll add cache invalidation to the list of reasons not to use a CDN. It's a solvable problem, but I've more than once seen irritating issues caused by something getting cached in the CDN layer and the wrong resources loading.
Or simpler, why do you need a CDN? It's rarely worth the additional work (setup + deployment) when most websites are not limited by bandwidth for assets.
If you're serving assets directly from S3 (or even from nginx) you're exposed to a "denial of wallet" attack, given the price markup on outbound networking.
So you have 10 scripts on your page, and you put them on 5 different hosts.
What does the waterfall on that look like? You're probably talking about a web app that transfers 2.6 MB to a user who only wants to read 78 bytes of text. And your primary concern is the markup on outbound networking? Don't design crap pages, and outbound networking isn't a problem. Most of the data transfer is fluff nobody wants or needs anyway.
You built the website! Now you don't think it's worth sending to the user? Or did you maybe go overboard when you were adding things to it?
> Enable keep-alive connections. Keep-alive connections are reusable. They prevent having to re-establish a connection, as well as SSL negotiation. They reduce latency time for all pages made up of several resources.
Pretty sure this only applies to HTTP/1 and you'll get better performance with HTTP/2:
"Connection-specific header fields such as Connection and Keep-Alive are prohibited in HTTP/2 and HTTP/3"
"HTTP persistent connection, also called HTTP keep-alive, or HTTP connection reuse, is the idea of using a single TCP connection to send and receive multiple HTTP requests/responses, as opposed to opening a new connection for every single request/response pair. The newer HTTP/2 protocol uses the same idea and takes it further to allow multiple concurrent requests/responses to be multiplexed over a single connection."
Some good advice here, but the “don’t index boolean columns” needs an “it depends” caveat, since Postgres will sometimes use multiple boolean indexes to perform a bitmap index scan, which can be advantageous.
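For example, something like this can show the combined scan (schema and index names are hypothetical):

```ruby
# Assumes separate single-column indexes on two boolean columns.
User.where(active: true, verified: true).explain
# Postgres may AND the two indexes together instead of scanning the table:
#   BitmapAnd
#     -> Bitmap Index Scan on index_users_on_active
#     -> Bitmap Index Scan on index_users_on_verified
```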
Correct, when the query conditions match the index conditions and both select a low proportion of the rows. For PostgreSQL users unfamiliar with partial indexes, this is worth a read: https://www.postgresql.org/docs/current/indexes-partial.html
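In Rails migration terms, a partial index looks like this (table and column names are hypothetical); it only covers the rows matching the predicate, so it stays small when the flag selects a small slice:

```ruby
# Index only the unfulfilled orders, the slice the hot query reads.
add_index :orders, :created_at,
          where: "fulfilled = FALSE",
          name: "index_orders_unfulfilled_on_created_at"

# A matching query the planner can serve from the partial index:
Order.where(fulfilled: false).order(:created_at)
```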
I like how the PostgreSQL docs use the term “profitable” for whether an index helps.
The purpose of an index is often to lower that “cost” (greater profit) by placing more of the values the query accesses (for filtering, ordering, or selecting) within the index entries themselves, for faster retrieval.
When users have the skills to generate query plans and review whether an index supports a query, verifying that the planner actually picks the index and that it lowers the cost, they can answer this question for their own unique combination of hardware, data distribution, queries, and indexes.
As generic advice, I think more often than not the index won’t be used for Boolean columns. But it’s generic advice and it does depend.
As you suggested, users must check their own system.
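As a minimal sketch of that loop (model and column hypothetical; the `:analyze` option to `explain` needs Rails 7.1+, otherwise run EXPLAIN ANALYZE in psql):

```ruby
# 1. Capture the plan and cost before adding the candidate index.
puts User.where(verified: true).explain(:analyze)

# 2. Add the index in a migration, e.g. add_index :users, :verified,
#    then capture the plan again.
puts User.where(verified: true).explain(:analyze)

# "Profitable" here: the planner switched from a sequential scan to
# the index, and the estimated cost / actual time dropped. If not,
# the index is pure write overhead and should be dropped.
```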
My book also covers the two- and three-value variations (when NULLs are allowed) of Boolean columns.
For readers wanting to build these skills themselves, here’s more info:
This got me to take a look at the Rails site and go through the Getting Started tutorial. It's really well-written and is probably amongst the best documentation I've seen out there. I would love to do more projects in Rails now.
That's great! Hopefully the authors of Rails Guides and the Getting Started tutorial see this. I'll share it with a core member and ask them to reshare it. I'm sure they'd appreciate seeing their hard work get recognized, and would welcome your feedback.
Good stuff, but the `size`, `count`, `length` section just intensifies my dislike for ORMs. ORMs bury all of the SQL, only for devs to dig it back up when they realize it's important for performance. Now you have to be both a SQL expert and an ActiveRecord expert.
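For anyone who hasn't dug it up yet, the difference in question (model name hypothetical):

```ruby
users = User.all

users.count   # always issues SELECT COUNT(*) FROM "users"
users.length  # loads every row into model instances, counts in Ruby
users.size    # COUNT(*) if the relation isn't loaded yet

users.load    # once the records are loaded...
users.size    # ...no query at all, just counts the array in memory
```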
I guess this is a fair point. It is easy to use SQL with AR. But when you do so, you get lambasted in code review by people saying "you can generate this same SQL with this arcane Arel incantation!!". But that is certainly a culture problem and not a technical one!
I tend to agree as a SQL enthusiast. However, across hundreds of apps, I have yet to see a Rails team that doesn’t use Active Record, or one that writes much SQL directly or by default. I’m sure it happens, but in my experience it’s rare.
This is a place where I think tools like RuboCop help. They can be configured to flag method swaps like this (size over count) automatically, which makes the change a relatively low-effort task.
With those rules/linting in place, you aren’t throwing out the benefits of AR (ORM), and you can hopefully leverage its useful methods, like these, that help avoid unnecessary queries.
Please, please don't mix up ORMs with the Active Record pattern. Active Record is one way to implement an ORM, but it's not the only way. I think many people who say they hate ORMs actually mean Active Record.
For bigger projects the Active Record pattern does suck, yes. But you still need some database-layer logic, and that most likely does some object-relational mapping (ORM).
Oh sure, it's all about knowing your requirements. I write raw SQL when performance or complexity call for it. My intention was to argue that both have merits, not that ORMs are the one true solution.
These performance rules apply to all backend development: use compression and caching, index the foreign keys in your database, and tune your SQL queries.
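On the foreign key point, in Rails terms (table and column names hypothetical): `add_reference` creates the index for you, but a column added by hand needs one explicitly.

```ruby
# New association column: constraint plus index in one step.
add_reference :comments, :post, foreign_key: true, index: true

# Existing foreign key column that never got an index:
add_index :comments, :user_id
```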
Great list, but one caveat I'd add: while "SQL will always be faster than your code" is true, in a sufficiently large app with many parallel requests the solution might still be to do some processing in the app, because the app tier can scale horizontally while (most) databases can only scale vertically and are thus more limited.
The hint about using .pluck to only grab what you need from an ActiveRecord query is a pretty good one. I hadn't realized you could do that.
I assume this is telling us it doesn't actually make an ActiveRecord instance out of each row when you do that. And instantiating big bunches of ActiveRecord model instances just to grab a few fields from a result set with a lot of rows can be sooo slow.
That's correct. If I run `User.limit(5).pluck(:id)`, the query it runs is `SELECT "users"."id" FROM "users" LIMIT $1 [["LIMIT", 5]]` and it returns a plain array, not an ActiveRecord relation.
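For contrast, `select` also narrows the columns but still instantiates model objects (the `email` column here is just for illustration):

```ruby
User.limit(5).pluck(:id, :email)
# => [[1, "a@example.com"], ...]  plain nested arrays, no models built

User.limit(5).select(:id, :email)
# => a relation of User instances with only id and email populated
```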
For readers who want all of these and more in book form, with a sample Rails app and generated big data to test with, please consider my book:
High Performance PostgreSQL for Rails https://news.ycombinator.com/item?id=38407585
The book helps readers build database skills with the overall purpose of improved performance and scalability.
Again, great, concise article. I’ll be recommending it to others and it will help a lot of developers!
Thanks!