I run an Alexa top-2000 website. (Mangadex is presently at about 6000.) I spend ...

tristan9 · on Sept 7, 2021

Hi, we're trying to lower the requests:pageview ratio in general, but for what it's worth this article essentially:

- ignores the vast majority of "image serving" (most is handled by DDG and our custom CDN)

- the JS fragments thankfully should load only on first visit and then get aggressively cached by DDG/your browser

One of the pain points is that there are a lot of settings for users to decide what they should or shouldn't see (content rating, original language of origin, search tags, etc) and some are already specifically denormarlized (when querying chapter entities, ES indices for those contain some manga-level properties to avoid needing to dereference that first too) -- however this also makes caching substantially less efficient in many places, alas

Thanks!

jiggawatts · on Sept 7, 2021

Hi, I'm a performance tuning expert, and this thread piqued my interest.

The first thing that I noticed is that even with caching enabled, you're loading "too much data". After loading the main page and then clicking one of the tiles, there are several JSON API calls.

Here's an example, 195 kB transferred (528 kB size): https://api.mangadex.org/manga/bbaa17c4-0f36-4bbb-9861-34fc8...

Oof. Half a megabyte of JSON! Ignore the network traffic for a moment, because GZIP does wonders. The real problem is that generating that much JSON is very "heavy" on servers. Lots and lots of small object allocations, which gives the garbage collector a ton of work to do. It's also expensive to decode on the browser for similar reasons.

On my computer, this took a whopping 455ms to transfer, nearly half a second. That results in a noticeable latency hit to the site.

In my consulting gig I always give developers the same advice: "Displaying 1 kilobyte of data should take roughly 1 kilobyte of traffic".

In other words, there's isn't 500 KB of text anywhere on that page! A quick cut & paste shows about 8 KB of user-visible text in the final HTML rendering. That's a 1:60 ratio of content-to-data, which is very poor. I bet that behind the scenes, this took a heck of a lot more back-end network traffic and in-memory processing to generate. Probably tens to hundreds of megabytes of internal traffic, all up.

This is one of the core reasons most sites have difficulty scaling, because for every kilobyte of content output to the screen, they're powering through megabytes or even gigabytes of data behind the scenes.

Can this API query be cut down to match what's displayed on the screen? Can it be cached for all users? Can it be cached precompressed?

Etc...

tristan9 · on Sept 7, 2021

> The real problem is that generating that much JSON is very "heavy" on servers. Lots and lots of small object allocations, which gives the garbage collector a ton of work to do. It's also expensive to decode on the browser for similar reasons.

For what it's worth, this isn't generated live but a mix of existing entity documents

Most of it is page filenames which indeed could be made optional and fetched only by the reader, but that'd be us actively nulling them out in the returned entity, since they are there in the ES documents for the chapters (a manga feed like this being a list of chapters)

jiggawatts · on Sept 7, 2021

You're basically dumping down a database to the web browser, including all of the internal metadata that's likely irrelevant to rendering the HTML.

For example, user role memberships:

   {
        "id": "c80b68c5-09ae-4a50-a447-df7c5a4a6d01",
        "type": "user",
        "attributes": {
            "username": "kinshiki",
            "roles": [
                "ROLE_MEMBER",
                "ROLE_GROUP_MEMBER",
                "ROLE_POWER_UPLOADER"
            ],
            "version": 1
        }
    }

Also record timestamp dates like created/changed, along with contact details that may be revealing sensitive info:

    "attributes": {
        "name": "SENPAI TEAM",
        "locked": true,
        "website": "https:\/\/discord.gg\/84e3j9b",
        "ircServer": null,
        "ircChannel": null,
        "discord": "84e3j9b",
        "contactEmail": "senpai.info@gmail.com",
        "description": null,
        "official": false,
        "verified": false,
        "createdAt": "2021-04-19T21:45:59+00:00",
        "updatedAt": "2021-04-19T21:45:59+00:00",
        "version": 1
    }

But let's just go back to your response:

> Most of it is page filenames which indeed could be made optional

Do that! If you strip them out, the 529 kB document shrinks to 280 kB, which hardly seems worth the hassle, but when gzipped, this is a miniscule 13 kB! This is because those strings are hashes, which significantly reduces their compressibility compared to general JSON, which usually compresses very well.

It's basic stuff like this that can make a website absolutely fly.

Avoid giving computers unnecessary, mandatory work: https://blog.jooq.org/many-sql-performance-problems-stem-fro...

tristan9 · on Sept 7, 2021

As I said, it's not so much that we ask that data to be fetched -- it is there in the first place, and pulled from Elasticsearch, not a SQL database

Because of this model, we also make sure that Elasticsearch merely works a search cache, not as an authoritative content database (hence everything we add in there is considered public, on purpose, and what isn't meant to be public is just not indexed in ES)

However the gzip efficiency improvements would be really neat for sure

Fwiw I also don't work on the backend and there might be good reasons to not expressly filter out data (yet anyway, perhaps it will end up as a separate entity and be a include parameter)

BizarroLand · on Sept 7, 2021

I have to say I'm glad this is being talked about in a public forum. Outsiders rarely get to see brainstorming, troubleshooting & group discussion of technological issues like this.

Someone who is focused on the performance aspect & someone who is focused on stack stability discussing the real world input & output of a business system and showing why performance & UX are not the only metrics that matter is a good thing for us to see.

clambordan · on Sept 7, 2021

You can query Elastic for specific fields only: https://www.elastic.co/guide/en/elasticsearch/reference/curr...

Edit: As you said, there may be reasons on the backend not to filter things out of the query. Though it seems likely that the web response could be trimmed down.

kmeisthax · on Sept 7, 2021

This seems less like a performance problem and more of a security issue. Especially considering that this is a website that hosts unlicensed translations. How much of this information is actually intended to be made public?

krick · on Sept 7, 2021

> Displaying 1 kilobyte of data should take roughly 1 kilobyte of traffic

Is this to be taken literally? I don't consider myself a performance-tuning expert, but I'm not sure how can I make something useful out of this advice. Of course, "the less you transfer, the better" is an obvious thing to say (a bit too obvious to be useful, in fact), but does it really mean I should aspire to transfer only what I'm actually going to display right now? For example, there is a city autocomplete form on the page (well, a couple of thousand relatively short entries). In that case I would probably consider making 1 request to fetch all these cities (on input focus, most likely), instead of making a request to the server on every couple of characters you type. Is it actually a wrong way of thinking?

jiggawatts · on Sept 8, 2021

It's an aspirational goal, not a hard rule.

In your case, you're optimising for round-trips, which is also important. As long as you only send the city names instead of a huge blob that also includes a bunch of metadata, you're probably fine.

The most common example of my rule is that I often see SELECT statements on unindexed columns. This means that behind the scenes, the database engine is forced to do a table scan to find the row. If the query uses a wildcard selector, then it is also forced to return all columns, whether they are used by the application or not.

I commonly see scans over 100 MB tables returning 100 KB to the web tier, which then converts this to 200 KB of JSON to show 100 bytes of text to the end user. Simply adding an index to the table allows the database engine to reduce the data it has to process to 10-30 KB. Selecting specific columns can reduce that to a few kilobytes, and likely also shrink the JSON to match. Eliminating the JSON and directly generating the HTML on the server like in the good old days would cut the Internet network traffic down to minimum 100 bytes required also.

Similarly, you often see performance monitoring, logging, or graphing programs store data in fantastic detail and precision. Meanwhile, the graph needs only 16 bits of data, because screens are typically at most a few thousand pixels across in size! A case in point is Microsoft System Center Operations Manager (SCOM), which has a metric write amplification of something like 300:1, which is why it can't log metrics at a usefully high frequency. Not because that's impossible, but because it's wasting the available computer power to an absurd degree. Azure has inherited this code, and then layered JSON on top. (I guess when you bill by gigabytes ingested, the incentives are all wrong.)

baybal2 · on Sept 7, 2021

> This is one of the core reasons most sites have difficulty scaling, because for every kilobyte of content output to the screen, they're powering through megabytes or even gigabytes of data behind the scenes.

> Can this API query be cut down to match what's displayed on the screen? Can it be cached for all users? Can it be cached precompressed?

This is why you want to bypass the JS realm, (or whatever language does the serdes) and send clients JSON or XML directly from the database, so the client is only getting the data at rest.

maxk42 · on Sept 7, 2021

> the JS fragments thankfully should load only on first visit and then get aggressively cached by DDG/your browser

According to Alexa you have a 46.4% bounce rate. [1]

When 46% of your users aren't coming back, how does 31 round-trips to your server for 100% of first-page visitors save anyone time or bandwidth? Your pageviews per visitor is 6.8, meaning the 53.6% that stick around view an average of 11.8 pages each. Even if there are zero subsequent js requests on other pages (clicking a random page I see 8) you would be generating 31 requests up-front to save 10.8 subsequent requests for about half of your users. (And again - in any scenario where the number of js fragments transferred on subsequent requests >= 1 even this benefit goes out the window.) How does that save you or your users bandwidth, server load, or other overhead?

The scale is not quite linear, but generally speaking, if you get your number of requests down from > 100 to < 5, you'll be able to handle around 20x the traffic with the same number of web-facing servers. Or alternatively the same amount of traffic with around 1 / 20th the servers.

Would that have a material effect on your costs?

[1] https://www.alexa.com/siteinfo/mangadex.org

tristan9 · on Sept 7, 2021

Definitely needs optimising for user experience indeed!

However the serving of this JS has nearly no cost to us (as they are cached at the edge by DDoS-Guard and the frontend is otherwise entirely static on our end)

rowanG077 · on Sept 9, 2021

It does have a cost it's just hidden. The cost is that it increases your bounce rate because of bad UX.

the8472 · on Sept 7, 2021

One issue I see is that flipping back and forth between chapters reloads images from different URLs which means they're uncachable. I guess that's somehow related to the mangadex@home thing, but if the URLs were generated in a more deterministic manner (keyed on some client ID + the chapter being loaded) then the browser could avoid redundant traffic.

tristan9 · on Sept 7, 2021

That's very close to how MD@H works, but it also has a time component and tokens are not generated by our main backends, so it'd require a separate internal http call per chapter

the8472 · on Sept 7, 2021

Another thing. For each page that's being loaded there's a report being sent. Instead this could be aggregated (e.g. once a second) and then processed as a batch on the server side which should be faster.

And if your JS assets are hashed then you can add cache-control: immutable so that a browser doesn't have to reload them when the user F5s.

radicalbyte · on Sept 7, 2021

Do you manage to get as many buzz-words and OSS products into your system as they do? :)

In general the less moving parts you have in a system the more reliable, secure, efficient and cheaper the system becomes.

In their case they run a site that is probably under constant attack by the "hired goons", so they're going to need to have more moving parts than others. Plus they will want to optimise for minimal development time (it's a hobby) so just adding another tried and trusted system into the stack to do something you need makes sense.

maxk42 · on Sept 7, 2021

lol

> In general the less moving parts you have in a system the more reliable, secure, efficient and cheaper the system becomes.

100% agreed. This is not my first high-traffic site, nor even the highest. (I built the analytics system for a an Alexa top-10 site in 2010, reaching some 30 billion writes / day off of a mere 14 small ec2 instances.) I've never seen a k8s implementation in production that was necessary.

I will note that my Alexa-2k site is also a personal site (no revenue) and under constant attack. In fact we frequently suffer DDOSes that we don't even notice until reviewing the logs later because it doesn't suffer any latency under pressure.

radicalbyte · on Sept 7, 2021

Interesting, wouldn't mind having a chat outside of HN if you're interested (see my profile for mail).

I've spent much of my career working on systems with active users from the hundreds to low thousands, but which process a huge number (50k/sec scale) jobs/tasks.

It's a totally different kettle of fish, and if I'm totally honest I'm shocked at how badly "web" scales and how common these naive and super inefficient implementations are (hint: my bare-metal server from 2005 was faster than expensive cloud VMs).

Recently I've worked on two high-usage systems (one of which was "handling" 30k requests/second for the first couple of week).

golergka · on Sept 7, 2021

> I've spent much of my career working on systems with active users from the hundreds to low thousands, but which process a huge number (50k/sec scale) jobs/tasks.

MMO games, by any chance?

Folcon · on Sept 7, 2021

Would you mind outlining your approach?

Really interested to see how you think about this sort of thing =)...

maxk42 · on Sept 7, 2021

My approach to what?

maxk42 · on Sept 7, 2021

(1) Simple beats complex.

(2) You can spend weeks building complex infrastructure or caching systems only to find out that some fixed C in your equation was larger than your overhead savings. In other words: Measure everything. In other other words: Premature optimization is the root of all evil.

(3) Fewer moving parts equals less overhead. (Again: Simple beats complex.) It also makes things simpler to reason about. If you can get by without the fancy frameworks, VMs, containers, ORM, message queues, etc. you'll probably have a more performant system. You need to understand what each of those things does and how and why you're using them. Which brings me to:

(4) Learn your tools. You can push an incredible amount of performance out of MySQL, for instance, if you learn to adjust its settings, benchmark different DB engines for your application, test different approaches to building your schemas, test different queries, make use of tools like the EXPLAIN statement, etc. you'll probably never need to do something silly like make half a dozen round-trips to the database in a single page load.

(5) Understand your data. Reason about the data you will need before you build your application. If you're working with an existing application, make sure you are very familiar with your application's database schema. Reason ahead of time about what requirements you have or will have, and which data will be needed simultaneously for different operations. Design your database tables in such a way as to minimize the number of round-trips you will need to make to the database. (My rule of thumb: Try to do everything in a single request per page, if possible. Two is acceptable. Three is the maximum. If I need to make more than three round-trips to the database in a single page request, I'm either doing something too complex or I seriously need to rethink my schema.)

(6) Networking is slow. Minimize network traversal. Avoid relying on third-party APIs where possible when performance counts. Prefer running small databases local to the web server to large databases that require network traversal to reach. This is how I handled 30 billion writes / day: 12 web servers with separate MySQL instances local to each sharded on primary key IDs. The servers continuously exported data to an "aggregation" server, which was subsequently copied to another server for additional processing. Having the web server and database local to the same VM meant they didn't need to wait for any network traversal to record their data. I could've easily needed several times as many servers if I had gone with a traditional cluster due to the additional latency. When you need to process 25,000 events in a second, every millisecond counts.

(7) Static files beat the hell out of databases for read-only performance. (Generally.)

(8) Sometimes you can get things moving even faster by storing it in memory instead of on disk.

(9) Reiterating what's in (3): Most web frameworks are garbage when it comes to performance. If your framework isn't in the top half of the Techempower benchmarks, (or higher for performance-critical applications) it's probably going to be better for performance to write your own code if you understand what you're doing. Link for reference: https://www.techempower.com/benchmarks/ Note that the Techempower benchmarks themselves can be misleading. Many of the best performers are only there because of some built-in caching, obscure language hack, or standards-breaking corner-cutting. But for the frameworks that aren't doing those things, the benchmark is solid. Again, make sure you know your tools and why the benchmark rating is what it is. Note also that some entire languages don't really show up in the top half of techempower benchmarks. Take that into consideration if performance is critical to your application.

(10) Most applications don't need great performance. Remember that a million hits a day is really just 12 hits per second. Of course the reality is that the traffic doesn't come in evenly across every second of the day, but the point remains: Most applications just don't need that much optimization. Just stick with (1) and (2) if you're not serving a hundred million hits per day and you'll be fine.

arethuza · on Sept 7, 2021

"Simple beats complex."

In the very first lecture of the Computer Science degree I did in the 1980s the lecturer emphasised KISS, and said that while we almost certainly wouldn't believe it at first eventually we'd realise that this is the most important design principle of all. Probably took me ~15 years... ;-)

politelemon · on Sept 7, 2021

Sadly I think this is a lesson that we as an industry consistently keep unlearning.

_6zew · on Sept 7, 2021

> Simple beats complex. > Fewer moving parts equals less overhead.

Took me almost a decade to really comprehend this.

I used to include all sorts of libraries, try out all the fancy patterns/architectures etc...

After countless of hours debugging production issues... the best code i've ever written is the one with the fewer moving parts. Easier to debug and the issues are predictable.

kiba · on Sept 7, 2021

"The best part is no part." is an engineering quote I heard.

bradknowles · on Sept 7, 2021

Said in a slightly different way: No part is better than no part.

I know I’m not the first to use that phrasing, but I’m not sure where I picked it up. If someone wants to point out the etymology of that type of phrase, I’d be glad to read up on what I’ve forgotten/missed.

arethuza · on Sept 7, 2021

I'm sure I've heard something like "engineering is solving problems while doing as little new as possible".

emi2k01 · on Sept 8, 2021

> 12 web servers with separate MySQL instances local to each sharded on primary key IDs.

I don't understand this part. Hopefully you can clarify this to me.

If you're sharding by primary key, doesn't that mean that there's a high chance that the shard in your local DB instance won't have the data the web server is requesting?

I'm not familiar with DB management.

radicalbyte · on Sept 10, 2021

Imagine you have a system which services 50 states. In the vast majority of cases, states only look at or mutate information on their own state.

In that case, you can easily split the data between shards based on ranges of an integer key. It's very easy to code, test, deploy and understand such a design.

Folcon · on Sept 7, 2021

Thanks, this is a good list in general of things to think about =)...

I've not really ever applied 9 myself, I've run comparative benchmarks a couple of times, but not thought about using that as a basis for whether to roll my own on critical performance parts.

_xkvm · on Sept 7, 2021

> But for the frameworks that aren't doing those things, the benchmark is solid.

Any example of such frameworks?

manigandham · on Sept 7, 2021

(ASP).NET is solid. Extremely fast, very reliable, and highly productive.

https://dotnet.microsoft.com/apps/aspnet

radicalbyte · on Sept 10, 2021

As long as you know what you're doing. If you're throwing an ORM like Entity Framework at a problem because you don't understand SQL, then you're going to see poor performance.

probotect0r · on Sept 7, 2021

Can you share how you do logging/monitoring/alerting for your site?

maxk42 · on Sept 7, 2021

Bash scripts and cron. Automatic alerts go out to devs via OpsGenie when resource availability drops so we can get out ahead of it. 0 seconds of downtime in the past 12 months.

Folcon · on Sept 7, 2021

To architecting a high traffic site =)...

Cipater · on Sept 7, 2021

He posted a reply to his own comment.

https://news.ycombinator.com/item?id=28443113

maxk42 · on Sept 7, 2021

Actually, my reply was to Folcon. HN simply doesn't allow you to reply to comments beyond a certain depth sometimes.

Perhaps mods have the ability to extend this for active discussions and that's why I can reply now?

detaro · on Sept 7, 2021

it's timing based. you can always reply by going to the permalink of the comment you want to reply to.

maxk42 · on Sept 7, 2021

Couldn't reply to this comment - but sure enough, the permalink gives me the option. Thank you for the info!

detaro · on Sept 7, 2021

Yeah, it's a somewhat well-meaning feature (supposed to slow down flamewars) that is extremely unintuitive

starfallg · on Sept 7, 2021

>In their case they run a site that is probably under constant attack by the "hired goons", so they're going to need to have more moving parts than others.

That's taken care of by the DDoS-Guard system they placed fronting their infrastructure. The design of their system has to take this into account, but that is mainly on a IP and DNS level. The design of their stack behind the loadbalancer is mainly driven by their functional and non-functional requirements, rather than by the need to prevent DDoS attacks.

radicalbyte · on Sept 7, 2021

The layering - defence in depth - is very much a security consideration. Especially if you're building a pure request/response/sync system you need that. Or you decouple with a queue for mutations and avoid a lot of issues.

starfallg · on Sept 7, 2021

That may be in terms of managing general security, especially with regards to the attack surface of the solution, but here we are talking about DDoS, which is mostly a separate topic and handled on the network level (for volumetric attacks) and load-balancer level (for non-volumetric attacks) or a combination of both.

polote · on Sept 7, 2021

I don't know about you, but they have 42 average Page-views per visit (HN has 3) so Alexa rank is going to be biased

tyingq · on Sept 7, 2021

>A fresh load of the home page generates over 100 requests.

I see 17 requests, all over either h2 or h3. 4 of them JS, and 2 images.

maxk42 · on Sept 7, 2021

Then you're not doing a fresh load of the page. There are over 30 images visible on the front page, so your measure doesn't pass the smell test, does it?

tyingq · on Sept 7, 2021

>Then you're not doing a fresh load of the page

Nope. Different problem.

The article was linked to a page under the domain "mangadex.dev".

Without any other context, I had assumed "home page" meant http://mangadex.dev , or what I got when clicking "Home" on the linked article.

Apparently not.