
First, research on hit rates:

https://zoompf.com/blog/2010/01/should-you-use-javascript-li...
http://statichtml.com/2011/google-ajax-libraries-caching.htm...
https://github.com/h5bp/html5-boilerplate/pull/1327#issuecom...

I have more recent data on this, not yet published, from my day-to-day work at a web performance company. The picture has improved somewhat, but not significantly.

You are correct that "Overheads of fetching library from a CDN are applicable to the first request." The problem is that, because of fragmentation, every website is asking you to hit a different URL, so every request is a "first request". You aren't leveraging the browser cache.

Most sites are already serving you site-specific JS anyway over a warm connection (even more so with HTTP/2), so there is even less benefit to going to a 3rd-party host to potentially avoid downloading a few dozen kilobytes. Couple that with the security implications of injecting 3rd-party code into your page, and it's just plain silly and wasteful to do this for a modern website.




The jQuery CDN cache hit rate is 99.8% [0], and the Google CDN numbers should be comparable. So yes, you are leveraging the browser cache for the most popular libraries.

Also, I was talking about subsequent requests from the same client.

[0]https://www.maxcdn.com/blog/maxscale-jquery/


You are confusing browser cache hits with CDN/edge server cache hits. Neither jQuery nor MaxCDN has any idea what the "hit rate" of a browser cache is.

This sentence should be a big clue: "We usually average around 15,000 hits per second to our CDN with 99.8% of those being cache hits."

"We" in that sentence is Kris Borchers speaking collectively about the jQuery foundation, talking a MaxCDN interviewer. But he is not talking about the browser cache. He can't be, because jQuery, or MaxCDN for that matter has no idea what the "hit rate" of a browser cache is.

Example: If I go to 1.example.com, which links to maxcdn.com/jquery.js, and then later I go to 2.example.com, which links to the same maxcdn.com/jquery.js file, my browser doesn't send any requests! That is the entire point of far-future caching! I was able to use the version of jQuery that was in my browser cache. However, neither MaxCDN nor jQuery has any idea this hit took place.

By the same token, if I go to 1.example.com, which links to maxcdn.com/jquery.js, and then later I go to 2.example.com, which links to a different URL like maxcdn.com/master/jquery.js, my browser has a cache miss. /master/jquery.js is not in my browser's cache; I've never been there. Neither MaxCDN nor jQuery has any idea that I requested something different than before.
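In other words, the browser cache is keyed by the full URL, not by the file's contents. A minimal sketch (Python, reusing the hypothetical maxcdn.com URLs from the example above):

    browser_cache = set()  # the browser cache, keyed by full URL

    def visit(url):
        if url in browser_cache:
            return "browser cache hit: no request is sent, the CDN never sees it"
        browser_cache.add(url)
        return "cache miss: a network request goes out to the CDN"

    print(visit("https://maxcdn.com/jquery.js"))         # miss: first site I visit
    print(visit("https://maxcdn.com/jquery.js"))         # hit: second site, same URL
    print(visit("https://maxcdn.com/master/jquery.js"))  # miss: same bytes, different URL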

CDN cache hit rates have nothing to do with browser caches. In fact, the ability for people who are not you to detect whether something is in your browser cache is a massive security problem. See my talk at BlackHat in 2007, many of Jeremiah Grossman's talks at BlackHat (2006, 2007, 2009), or go all the way back to Ed Felten's work on using timing side channels against browser caches.

In the industry, a "99.8%" cache hit rate on a CDN's edge servers means that 99.8% of the time the edge server can handle the request, instead of the request having to go all the way to the origin. They have no way of knowing how often a random person on the internet loads a random file from their browser cache.

This whole thing proves my point: calling shared, common, publicly hosted copies of popular JS libraries "CDNs" or "JavaScript CDNs" just confuses people. CDNs are about reducing latency. Shared JS libraries are about trying to avoid requests altogether by leveraging the browser cache, and they are largely ineffective.


Maybe they are talking about 200 vs. 304 responses.

A browser can be told to revalidate files, asking the server for the content conditionally using the "If-Modified-Since" and "If-None-Match" headers. This way, the server returns a 304 with an empty body if the file has not changed, or a 200 with the file if it is new or has changed.
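Roughly, that exchange looks like this (a sketch in Python with the requests library; the jQuery URL is just for illustration, and whether the server actually sends an ETag or Last-Modified validator is an assumption):

    import requests  # third-party HTTP client: pip install requests

    url = "https://code.jquery.com/jquery-2.1.4.min.js"  # illustrative static file

    first = requests.get(url)  # 200: full body, plus validators like ETag / Last-Modified
    validators = {name: value for name, value in {
        "If-None-Match": first.headers.get("ETag"),
        "If-Modified-Since": first.headers.get("Last-Modified"),
    }.items() if value}

    second = requests.get(url, headers=validators)  # conditional (revalidation) request
    print(second.status_code, len(second.content))  # 304 and 0 if unchanged, otherwise 200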


You are right, I was confusing browser cache hits with CDN cache hits. In their interview they state:

"Our CDN is a huge part of the jQuery ecosystem. We usually average around 15,000 hits per second to our CDN with 99.8% of those being cache hits. This provides much better performance for those using our CDN as their visitors can use a cached copy of jQuery and not have to download it, thus decreasing load time."

Somehow because of that I assumed that they had analysis done to understand browser caching rates. My bad.

EDIT: Huh, funny thing. What exactly is the origin server for the CDN jQuery library when the request URI is https://code.jquery.com/jquery-2.1.4.min.js ?

What would be the point of going to the origin server at all if versioned jQuery libraries are static and do not change? The edge locations are, for all intents and purposes, the origin server. I think the sibling comment may be more accurate in its assumption: the 99.8% cache hits are most probably 200 vs. 304 responses.

END OF EDIT

Nevertheless, I've spent more time researching the issue of a random person loading a JavaScript library from the browser cache.

Usage on the top 10k websites: Google JS CDN is used on 23.5% [1], jQuery CDN on 4% [4], CDNJS on 4% [2], jsDelivr on 0.5% [3], OSS CDN on 2% [5].

Assuming the sets of websites using each particular JS CDN are disjoint from those using a competing CDN, we can estimate total JS CDN use at roughly 30% of the top 10k websites, plus literally millions of websites scattered around the internet.
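A quick back-of-envelope sum of those shares (the no-overlap assumption is doing all the work here, so treat this as an optimistic upper bound):

    shares = {"Google JS CDN": 23.5, "jQuery CDN": 4.0, "CDNJS": 4.0,
              "jsDelivr": 0.5, "OSS CDN": 2.0}   # BuiltWith figures from [1]-[5]
    print(sum(shares.values()))  # 34.0 -> the "roughly 30%" above, once overlap is discounted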

As JS library popularity follows a power-law distribution and the libraries' cache headers are set for a year or longer, I would suggest that the probability of the top 100 JS libraries already being cached in a browser is really high.
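The "year or longer" part is easy to spot-check against any of these hosts, e.g. (Python again; whether a given host answers HEAD, and what value it returns, is not guaranteed):

    import requests

    r = requests.head("https://code.jquery.com/jquery-2.1.4.min.js")
    print(r.headers.get("Cache-Control"))  # e.g. "max-age=31536000" (one year), if set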

The statistical data hints that JS CDNs are in fact quite effective at achieving their goals, but it certainly doesn't prove anything conclusively.

[0]https://www.maxcdn.com/blog/maxscale-jquery/

[1]https://trends.builtwith.com/cdn/AJAX-Libraries-API

[2]https://trends.builtwith.com/cdn/CDN-JS

[3]https://trends.builtwith.com/cdn/jsDelivr

[4]https://trends.builtwith.com/cdn/jQuery-CDN

[5]https://trends.builtwith.com/cdn/OSS-CDN


Come on, man. Five-year-old data? 20-30% of top websites are currently using Google's CDN, so it seems like you're wrong about the picture not having improved significantly. You really think people don't already have DNS for ajax.googleapis.com resolved? And HTTPS is available (and the default), so you really think somebody is gonna hack Google to serve you bad JS? You also conveniently ignore the benefits of domain sharding, the fact that the CDN will serve the files faster and with lower latency than almost any setup, and that HTTP/2 mitigates the cost of not concatenating your scripts.





