Well, considering that said traffic originated from Google, and that Google was under no obligation to send it to OP's page, it's a little more nuanced than "stealing". It just means they sent traffic that behaves differently than usual - traffic that only touches one article and returns to Google.
It might discourage webmasters from adopting AMP though, if they have the expectation to lead the visitor to the homepage or other articles.
I really love the speed of AMP pages, but I don't like the fact that the server request is proxied through Google.
This is akin to an ISP injecting their own little optimization toolbar into pages rendered to the user, except that cert pinning isn't even an option here, since Google is doing the content re-writing at the app tier. They even manage to keep the browser security icon green, creating the at-a-glance perception that the data provenance can be trusted!
If Google edge caching really is critical, then I'd rather see an architecture in which the AMP content is signed by the originating site, and the signature is looked up and validated client-side. This might incur a bit more of a performance hit for sites that are being overwhelmed, but really, that's as it should be -- as an end user, it's surprising that my AMP results are essentially a Google cache. And it would seem that Google goes to good lengths to make the look-and-feel appear browser-like, instead of calling out the cached nature of the response.
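To make that idea a bit more concrete, here is a minimal sketch of the client-side check. It is a simplification and not anything AMP actually does: a plain SHA-256 digest stands in for a real signature, and it assumes the client fetched that digest directly from the originating site instead of from the cache. The function names are made up for illustration. A real design would use an asymmetric signature so the client only needs the site's public key.

```python
import hashlib
import hmac


def digest_of(content: bytes) -> str:
    """Return the hex SHA-256 digest of the given bytes."""
    return hashlib.sha256(content).hexdigest()


def cached_copy_is_authentic(cached_content: bytes, origin_digest: str) -> bool:
    """Check a cache-served copy against the digest published by the origin.

    origin_digest is assumed to have been fetched directly from the
    originating site, not from the cache (hypothetical setup).
    """
    # compare_digest does a constant-time comparison of the two hex strings.
    return hmac.compare_digest(digest_of(cached_content), origin_digest)


# The origin publishes a digest of the article it actually wrote.
article = b"<html><body>Original article</body></html>"
published = digest_of(article)

print(cached_copy_is_authentic(article, published))                      # True
print(cached_copy_is_authentic(article + b"<!-- extra -->", published))  # False
```

The catch is that a byte-for-byte check like this only works if the cache serves exactly what the site produced, so any rewriting at the cache would have to be accounted for separately.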
> And it would seem that Google goes to good lengths to make the look-and-feel appear browser-like, instead of calling out the cached nature of the response.
Why would a cached website be shown looking different from a non-cached one?
The fact that it's being served out of Google's servers changes the trust model for the content. End users have no good reason to trust Google with the cached data, but people assign trust to them when they click an AMP link. There's no real indication to an end user that that data is coming out of Google's caches rather than from the content provider. I find that misleading.
Typically, cached web pages come out of edge caches, either maintained by a CDN selected by the content provider, an end-user's ISP, or by a box on the local network. In this case, Google -- the search engine -- is fundamentally changing the nature of how the user obtains the third-party search result content.
IMO this is very different than my company or ISP doing some edge caching. It's different because it is much more of an opt-in sort of relationship, and, more importantly, because it's decentralized -- if Comcast starts to muck with search result content, Verizon users will notice the discrepancy. If Google takes advantage of their monopoly position and alters AMP content, nobody except the content providers will be the wiser.
> End users have no good reason to trust Google with the cached data
I'd imagine that end users who don't trust Google aren't using Google.
> There's no real indication to an end user that that data is coming out of Google's caches rather than from the content provider.
Users who care can see the URL and the domain for which the cert is signed. Most users do not care.
> IMO this is very different than my company or ISP doing some edge caching. It's different because it is much more of an opt-in sort of relationship...
I don't see why I can't substitute "Google" and "Bing" for "Comcast" and "Verizon" in your example.
For that matter, at least in the U.S., it's a lot easier to switch out my search engine than my ISP if Comcast starts to muck with result content. FWIW, Verizon already redirects failures of domain name resolution to itself. I don't like it, and there's not much I can do about it if I want Internet around here.
> If Google takes advantage of their monopoly position and alters AMP content, nobody except the content providers will be the wiser.
That's a fair concern, but the day Google does that is the day a media firestorm blows AMP out of the water as a trustworthy tool.
> That's a fair concern, but the day Google does that is the day a media firestorm blows AMP out of the water as a trustworthy tool.
And in particular, it looks (at a quick check) like the Google-hosted files are in a very predictable location, so it should be easy to write something that checks that version against the site's own version to monitor for discrepancies.
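Something along these lines could be the core of such a monitor. It is only a rough sketch: it assumes both copies have already been fetched as text (no URLs or fetching are shown, since the exact cache location is not spelled out here), and `normalize` and `find_discrepancies` are hypothetical helper names. The normalization only handles whitespace, so any markup that differs between the two copies for legitimate reasons would show up as noise and need its own filtering.

```python
import difflib


def normalize(html: str) -> list[str]:
    """Strip surrounding whitespace and drop blank lines before comparing."""
    return [line.strip() for line in html.splitlines() if line.strip()]


def find_discrepancies(origin_html: str, cached_html: str) -> list[str]:
    """Return unified-diff lines between the origin copy and the cached copy."""
    return list(
        difflib.unified_diff(
            normalize(origin_html),
            normalize(cached_html),
            fromfile="origin",
            tofile="cache",
            lineterm="",
        )
    )


# Toy inputs standing in for the two fetched copies.
origin = "<p>Breaking news</p>\n<p>Details follow.</p>\n"
cached = "<p>Breaking news</p>\n\n  <p>Details changed.</p>\n"

for line in find_discrepancies(origin, cached):
    print(line)
```

An empty result means the two copies match after whitespace normalization; anything else is worth a closer look.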
> I'd imagine that end users who don't trust Google aren't using Google.
Except Google used to just guide you to the content; you just had to trust them to give you the best possible answer. Now you have to trust that they have the real one too.
> Users who care can see the URL and the domain for which the cert is signed. Most users do not care.
Most users do not care if their passwords are stored in plain text, or if a bridge they cross has been built by a proper engineer. Consumer ignorance is not a valid reason for shady practices.
> If Google edge caching really is critical, then I'd rather see an architecture in which the AMP content is signed by the originating site, and the signature is looked up and validated client-side.
AMP's purpose is to deliver content at minimal bandwidth and latency. Requiring the client to side-channel to the originating server (that can be God-knows-where relative to the requesting client) defeats the purpose of creating a system for low-bandwidth users to quickly fetch and display content on mobile devices.
Sure, Google has no legal obligation to send the traffic to the site, but it is still sort of dishonest. Google convinces people to install AMP under the premise that it will help them. Instead, it just allows Google to monetize other people's content, at the expense of the creator.
It's hypocritical to bill AMP as an improved mobile experience while sticking a big button at the top. It's also hypocritical to penalize sites for scraped, duplicate content while doing the same themselves.
Well, it does help them -- it promotes their content ahead of those without AMP, and it still allows ads to be served from an approved ad network of the author's choosing [1].
As I wrote before, it's "more of a mutually-consented handholding with small amounts of arm-twisting" [2].
Google is not stealing traffic, but it is stealing content by displaying it out of the original server's control. The real cost of a website is not hosting it but filling it with things people want to read. Content creators will want to get benefits such as ad revenue to compensate for their effort.
If Google is "stealing" content by displaying it out of the original server's control, then we should stop all of the major search providers from caching content. Obviously this isn't going to happen, because it's considerably more efficient to serve the cached page from Google's fast servers and network.
>The real cost of a website is not hosting it but filling it with things people want to read. Content creators will want to get benefits such as ad revenue to compensate for their effort.
Google doesn't create AMP pages for you - the content creator does. If the content creator wants to insert ads then they can do so via AMP.