I assume he is talking about Richard King. I don't think he has his own channel, but as mentioned, Stefan Gotteswinter and Keith Rucker both have videos of the scraping skills they learned from him. (Rucker has actually hosted several of King's classes in his workshop.)
edit: I clearly misread the end of OP's post. Definitely not talking about Richard King.
He did years of scraping before he met Rich, but he did take a Rich King class to learn power scraping. That made it easier for me to justify skipping the expensive class.
I suspect that if Netflix went to the ISPs to cache their data, the ISPs would be more than happy to charge them for it, like they do with other content networks. In this case, Netflix is assuming that the volume of Netflix traffic at the ISP will drive the ISP to request on-network caching. Netflix is happy to provide this, but is not willing to pay the ISP for the privilege.
Does Netflix care about how much traffic they are directly serving vs how much is hitting caching servers? Are they playing chicken with the ISPs, hoping the ISPs will blink first and host caching servers onsite for free?
Exactly. It's almost like a peering relationship between providers, but in this instance Netflix is not directly connected to the ISP, so it wouldn't be paying the ISP for bandwidth used by their customers accessing Netflix. It definitely saves the ISP a lot of bandwidth cost.
This is spot on. This is basically how Netflix gets free colocation. Netflix's monopoly is what allows them to do this. ISPs would laugh if anyone else wanted free hosting of their edge equipment, but because Netflix is such a burden on ISPs, they're able to strongarm ISPs into providing free hosting of Open Connect appliances.
I read books constantly, and I do own a 2nd generation Kindle, but I think it is buried in a box somewhere from a move I did last fall. I do 99% of my reading these days on either my Android phone, my laptop, or my desktop PC. I really think the killer feature that Amazon brings to its Kindle brand is actually "Whispersync" more than eInk displays or any other specific device. Being able to pick up my book from where I left off on any internet-enabled device is the #1 reason my book purchases are almost exclusively from Amazon.
You would think so, but I still see new projects started every day using Apache for something nginx would be a much better fit for. I think that mostly the admins and architects on these projects have simply never heard of nginx, or if they have, they don't really know what it does.
Don't underestimate how much people prefer to work with what they know. There have been plenty of times when I've gone with Apache on a server just because nginx's benefits weren't going to make a huge difference and I've been using Apache long enough to be able to configure it in my sleep.
I'm using both Nginx and Apache, depending on my needs and goals. I still like Apache because of its stability, configurability and awesome collection of plugins available.
Example 1: I once needed to send all logs from all web servers to a central machine. Now, I know there's syslog-ng and other stuff like that, but I've had pretty bad experiences with it, and in this instance I just configured Apache to pipe the logs to a simple Perl script I wrote (you know, instead of to a file). This was for a website with 10 million visits per day; this configuration is still in production and has worked wonderfully well. Nginx can't do it.
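For anyone curious, that kind of piped logging is a one-line change with Apache's CustomLog directive (the script path here is made up; the script just reads log lines from stdin):

```apache
# Pipe access log lines to a script instead of writing to a file.
# Apache keeps the pipe open and restarts the program if it dies.
CustomLog "|/usr/local/bin/ship-logs.pl" combined
```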
Example 2: if you ever want to develop Python/Django/wsgi apps, one of the best choices you can make is mod_wsgi. mod_wsgi is a self-healing Python/wsgi server that just works and is integrated perfectly within Apache. Ruby's Passenger was inspired by it and there's no mod_wsgi for Nginx (somebody tried porting it, but the results were awful).
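For context, wiring a WSGI app into Apache with mod_wsgi is only a few directives (paths and names below are illustrative, not from the parent's setup):

```apache
# Run the app in its own daemon process group, isolated from Apache workers
WSGIDaemonProcess myapp processes=2 threads=15
WSGIProcessGroup myapp
# Map the site root to the WSGI entry point
WSGIScriptAlias / /srv/myapp/wsgi.py
```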
Example 3: PHP is bound to Apache and will forever perform at its best as an Apache module. If you try the alternatives, you're just shooting yourself in the foot. Some people are also placing Nginx in front of Apache, but IMHO those are resources that would be better spent on placing Varnish in front (a kick-ass reverse proxy cache that can also do load-balancing).
I do love Nginx though. It is freakishly fast and it uses few resources.
I'm not biased one way or the other, although I find nginx far easier to work with than Apache. And like many here, I've been working with Apache forever.
1. I have no problem consolidating logs. I don't know your particular use case, though, so maybe you have an edge case that is tricky with nginx.
2. I tried using native wsgi on nginx and didn't enjoy it. So I switched to gunicorn. Problem solved. I run graphite and its clan via gunicorn as a key process in many places; it's rock solid.
3. PHP-FPM as a proxy behind nginx performs just as well as mod_php on Apache, if not better. And it's a damn sight easier to scale.
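For reference, the PHP-FPM setup in point 3 is a short location block in nginx (the socket path is an assumption; it varies by distro):

```nginx
location ~ \.php$ {
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    # Hand the request off to the PHP-FPM pool over a unix socket
    fastcgi_pass unix:/var/run/php-fpm.sock;
}
```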
So, I don't share your concerns about nginx. So far, I have found no reason to continue with Apache.
The biggest win has been serving a huge throughput of static images for a particular client's web site. We sized a new machine under Apache, but when we deployed with nginx, it used a tiny percentage of Apache's resources.
> I think that mostly the admins and architects on these projects have simply never heard of nginx, or if they have, they don't really know what it does.
Then let those admins and architects be hired by your competitors. I mean, whoever is skilled at a craft must (not just should) be aware of a few alternative tools. Even when they are not productive with those tools, they should at least be knowledgeable about how those tools stack up against their tools of choice.
It seems like you're making the assumption that 'they' would choose nginx just because it's superior.
There are many reasons for them not to go with nginx. The biggest one is probably that it's just different. Managing it is different. Configuring it is different (regardless of any similarities). Testing is still required.
It's nothing new that a lot of people either want to be lazy or are afraid of taking risks.
Being 'better' alone isn't a good enough reason to expect everyone to switch.
Different is risky. Consider, for example, lighttpd, another lightweight Apache alternative. It was chosen for a certain project at a former workplace of mine. Not long afterward, significant problems with memory leaks started occurring on the deployments that used it, and I found myself tasked with fixing them.
I was unprepared and somewhat astounded to find that lighttpd does not support sending "large files" over CGI, FastCGI, or proxy connections, and the maintainers don't care (cf. http://redmine.lighttpd.net/issues/1283).
Besides the waste of engineering time, customers were impacted.
The getter/setter stuff in Java is a huge waste of space and time. However, at least in the enterprise space, all of the tooling and frameworks assume your code follows the JavaBean naming conventions. For this reason alone, I always make sure all my classes follow the pattern, and whenever I train new developers I make sure they get a large amount of experience building out bean definitions and understand how much time is saved with Spring and Hibernate as long as the conventions are followed.
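For anyone who hasn't seen it, the convention in question is just a no-arg constructor plus getX/setX pairs, which frameworks like Spring and Hibernate discover by reflection (the class and fields below are invented for illustration):

```java
// A minimal JavaBean: no-arg constructor plus getter/setter pairs
// following the getX/setX naming convention that frameworks rely on.
public class Customer {
    private String name;
    private int accountId;

    public Customer() { }  // required no-arg constructor

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public int getAccountId() { return accountId; }
    public void setAccountId(int accountId) { this.accountId = accountId; }
}
```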
But yeah, it's a bunch of ridiculous, generated spam.
Oh thank God somebody in authority is calling this bean bull_hit what it is.
After reading the Eiffel book and thinking about things like programming by contract and class invariants, the bean pattern which constructs empty, initially useless (or at least unreliable) objects seemed like a huge step back.
The crux of the argument (and most of the book for that matter) is that you want to avoid mutability whenever possible. Adding setters for every field by default means mutability is the default mode for your application. Adding getters for every field by default means you lose any advantages of encapsulation.
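A sketch of the alternative that argument points to (the class here is made up): mark fields final, drop the setters entirely, and have "mutation" return a new object instead of changing the existing one:

```java
// Immutable alternative to the bean pattern: all state is fixed at
// construction, so the class has no setters and no invalid empty state.
public final class Money {
    private final long cents;

    public Money(long cents) { this.cents = cents; }

    public long getCents() { return cents; }

    // "Mutation" returns a new instance; the original is never changed.
    public Money plus(Money other) { return new Money(cents + other.cents); }
}
```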
At one time, Java Beans were a heavily marketed pattern by Sun. Joshua Bloch came along and said "hey, this is a bad pattern" (and maybe others, but he's the one I always think of).
Why does everything have a getter/setter by default?
That is a pretty horrible anti-pattern. Does it have something to do with being able to serialise the state (including internal state) of the whole object?
It's about hiding implementation. If you access an object only through methods, then the implementation can change without the client code having to know or care (aka be recompiled).
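A small illustration of that point (the class is invented for the example): the internal representation can change freely while the accessor signatures, and thus all client code, stay the same:

```java
// Clients only ever call setCelsius()/getCelsius(); internally we happen
// to store kelvin. We could switch to storing celsius directly and no
// caller would need to change or recompile.
public class Temperature {
    private double kelvin;

    public void setCelsius(double c) { kelvin = c + 273.15; }
    public double getCelsius() { return kelvin - 273.15; }
}
```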
A few months ago the NY Times wrote an article about how they did static generation for their election-results mini-site. The HN discussion was here:
http://news.ycombinator.com/item?id=2025611
Some people were watching TV online before, now lots of people are watching TV online. Hulu has played a big part in that. I suspect the number of people watching Hulu every week dwarfs the number of people who were "illegally" watching content online before, at least in the US market. Most of the rest of the world is still forced to find extra-legal methods to obtain the same content.
I really appreciate that this example shows a useful testing strategy and handles the big scheduling gotcha (rescheduling jobs). Neither is a hard solution once the answer is known, but having them spelled out will save someone a lot of time.
Does Google require you to use their service the same way Apple does? Or is their service just there for developers that want to use it (at a 10% cost)?
I believe the latter. Android has always allowed app installs not-from-the-market. Some CARRIERS (notably AT&T) have removed that ability for some handsets, but it's not universal and as far as I know, not common.