> oh wow, Signed Exchanges are worse than AMP!
> "make sure you are visiting mybanksite.com" is no longer safe.
Sounds like you don't trust public key based content signing. This is just broadening public key based signatures beyond the domain to include the domain and the content itself, and using signing to make the authenticity of the content independent of the physical infrastructure that served it.
That's what's being used here to verify the authenticity of the content's source, just like PGP/GPG does for signed emails.
That's a far stronger guarantee than "the data is authentic because it came from IP address range X purchased by company Y".
In fact, without such a signature, there is no guarantee that a piece of content is authentic just because it came from a particular server/datacenter.
With signed exchanges, the chain of authenticity is pushed all the way back to the website's content creators - it doesn't stop at the web server. Also, this can't be phished unless you break the content signing algorithms, and if that happens... we all have bigger problems.
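For illustration - this is just the underlying idea, not the actual SXG wire format - a minimal sketch in TypeScript using Node's built-in crypto module. Note that nothing about the transport enters the verification step:

```typescript
import { generateKeyPairSync, sign, verify } from "crypto";

// The publisher's long-lived keypair. In SXG the public half is bound
// to the domain via a certificate; here it is simply pinned directly.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");

// The publisher signs the content bytes once, at publish time.
const content = Buffer.from("<html>...signed page...</html>");
const signature = sign(null, content, privateKey);

// The verifier checks authenticity using only (content, signature, publicKey).
// No information about who delivered the bytes (CDN, cache, USB stick)
// enters the check at all.
console.log(verify(null, content, publicKey, signature)); // true
```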
first, it breaks the URL specification, as the "host" is no longer a host. it breaks users' expectation of one of the VERY FEW things that everyday users understand about the internet.
one may manage to upload an html file to the bank's server and serve a -signed- page that google amp will cache, and then use it to phish customers from within the bank's domain. Or just use a stolen key to make thousands of such pages before the bank finds out. I think, contrary to what you say, it's a brand new, major attack surface.
> first, it breaks the URL specification, as the "host" is no longer a host.
By this definition, "host" hasn't been a host in a long time, ever since it became possible to route DNS traffic to multiple IP addresses, possibly in different datacenters.
> it breaks users' expectation of one of the VERY FEW things that everyday users understand about the internet.
How is signing content directly less authentic than signing only at the web server? Signing content directly at the time of publishing ensures that it was created using the private keys of the entity in question, regardless of the delivery mechanism for the content.
> one may manage to upload an html file to the bank's server and serve a -signed- page that google amp will cache,
Signed content exchanges specifically limit that by putting the content signing step at the content creator level, not the web server level. Unless you steal the content creator's private keys, you can't represent your content as theirs.
> wouldn't the server sign all http responses by default? all you would need to do is upload a file
No, the content has to be signed when it is created, in the content management system or similar content creation tool, not when the server sends it. The content management system itself must have strong controls on it (ACLs, controlled user accounts, protected private keys stored only on encrypted and access controlled media, regular audits, etc).
Basically the server itself is no longer trusted as the arbiter of content authenticity, the actual content creator is. Concretely, when the editor at a publication approves an article after reviewing it, it is signed for delivery at the moment of publication, not at the moment that the request is served.
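As a sketch of where the signing step would live - a hypothetical CMS publish function, with the key path and file layout made up for illustration:

```typescript
import { createPrivateKey, sign } from "crypto";
import { readFileSync, writeFileSync } from "fs";

// Hypothetical CMS publish step: the signing key lives with the content
// creation tool, under its own access controls, not with the web servers
// that will later serve the bytes.
function publish(articleHtml: string, keyPath: string): void {
  // PEM-encoded Ed25519 private key, readable only by the CMS.
  const privateKey = createPrivateKey(readFileSync(keyPath));
  const signature = sign(null, Buffer.from(articleHtml), privateKey);
  // Content and detached signature travel together from here on; any
  // cache or CDN can serve them without being trusted for authenticity.
  writeFileSync("article.html", articleHtml);
  writeFileSync("article.sig", signature);
}
```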
so that means i can sign a page on the editor's computer, take it with me and serve it to amp from my website? that sounds even more dangerous tbh. it delegates security from people who may know a little bit about it (web hosts) to people who likely know nothing about it (writers)
what happens if someone's key is stolen and they need to re-issue it? All the previously published copies are now invalid?
> first, it breaks the URL specification, as the "host" is no longer a host.
Really, how so? RFC 3986 goes out of its way to make clear that the "host" component doesn't mean DNS, and doesn't even have to denote a host.
"In other cases, the data within the host component identifies a registered name that has nothing to do with an Internet host."
"A URI resolution implementation might use DNS, host tables, yellow pages, NetInfo, WINS, or any other system for lookup of registered names."
> it breaks users' expectation of one of the FEW things that everyday users understand about the internet.
What, exactly and concretely, is that expectation?
> one may manage to upload an html file to the bank's server and serve a -signed- page that google amp will cache, and then use it to phish customers from within the bank's domain.
If the attacker can upload arbitrary pages to the bank's website, just why would they need signed exchanges? They've already got their phishing page on the correct domain.
the RFC uses the word "host" and not "signer". It also says that the "host" is intended to be looked up in some service registry, and there is no such thing for arbitrary signers.
> exactly and concretely, is that expectation
One common piece of security advice banks used to give is "check your browser address that you are in our server"
> just why would they need signed exchanges
with signed exchanges they can fool amp into caching the page long after it has been deleted from the server
The RFC explicitly says that "host" doesn't necessarily mean an actual host and you still insist the opposite. So I don't really know what to say.
> One common piece of security advice banks used to give is "check your browser address that you are in our server"
So you say that everyday users have an expectation that they're "in the bank's server"? That doesn't seem very concrete, since that could mean anything. Surely there is some kind of expectation they have about actual behavior or properties - something that will happen / can't happen right now, but where the opposite would hold with signed exchanges.
> Anyone who has the file can intercept the form data from that page now - a complete phishing attack.
Uhh... And just how would they do that? They can't inject anything into the page, and they can't modify the page. How do you figure they force the browser to submit the form to the wrong server?
assuming that someone finds a way to sign a malicious HTML page (e.g. by sneaking into the editor's office), they can serve it from anywhere, and the browser will pretend it's coming from the bank
> One common piece of security advice banks used to give is "check your browser address that you are in our server"
" in our server" is a simplification of the technical explanation: "signed by our computers using our private keys before delivery to you". That is still maintained in the case of signed content exchange, but instead the transport function is provided by a different server.
It's not much different from, e.g., signing a compiled app with your private keys before uploading it to an app store. Such apps also use hosts to identify themselves and their content, even though they are delivered via app-store mechanisms.
> signed by our computers using our private keys before delivery to you
Please try to explain that to an everyday grandma.
I still don't see how it's an improvement. The file can be served by an arbitrary server god knows where and still be presented to me as valid. Anyone who has the file can intercept the form data from that page now - a complete phishing attack. There are so many things that can go horribly wrong it just makes one wonder what's wrong with googlers these days: https://blog.intelx.io/2019/04/15/a-new-type-of-http-client-...
> one may manage to upload an html file to the bank's server and serve a -signed- page that google amp will cache
Only if you have the bank's private key, and the ability to serve arbitrary content from the bank's domain. In which case... yeah, I don't see how the signed exchanges standard makes that problem significantly worse.
i don't know what the max expiration for amp's cache is, but i could set a really-long expiration date on the file and remove it from the server without the bank ever knowing it existed. SXG doesn't even require an upload - one disgruntled employee could do the same with a stolen key.
Nobody benefits from this shit but google. Do we really need more attack surfaces?
I hadn't realized the content was actually signed; I assumed we were simply trusting Google to send us the content they said they were sending (much like we do when using the Google cache).
I'm curious now: would it be possible to use the content/markup intended for use by the amp cache to view a static/unscripted/readable version of the page's main content? If so, why hasn't anyone built a browser extension to do so?
On a broader note, this also sounds like it could be used to allow caching proxies to work with https; you'd lose the privacy, but you'd gain from being able to cache content on the local network if the browser only had to verify the content, and you trusted the cache not to spy on you.
> I'm curious now: would it be possible to use the content/markup intended for use by the amp cache to view a static/unscripted/readable version of the page's main content? If so, why hasn't anyone built a browser extension to do so?
If the goal is to get around the AMP CDN, you don't even need to read the main page content. The AMP URL contains the original source URL itself [1].
The extension you are describing would just need to capture all requests with the prefix https://www.google.com/amp (or whatever CDN you are trying to get around), parse out the original URL, and then fetch it, and do what you will with it.
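A rough sketch of that parsing step, assuming the https://www.google.com/amp/ prefix with the s/ segment marking an https origin (the extension's request-interception wiring is omitted):

```typescript
// Recover the original source URL from a Google AMP cache URL, e.g.
//   https://www.google.com/amp/s/example.com/news/story.html
//   -> https://example.com/news/story.html
// The "s/" segment marks an https origin; its absence means http.
function deAmp(ampUrl: string): string | null {
  const prefix = "https://www.google.com/amp/";
  if (!ampUrl.startsWith(prefix)) return null;
  let rest = ampUrl.slice(prefix.length);
  let scheme = "http";
  if (rest.startsWith("s/")) {
    scheme = "https";
    rest = rest.slice(2);
  }
  return `${scheme}://${rest}`;
}

console.log(deAmp("https://www.google.com/amp/s/example.com/news/story.html"));
// -> https://example.com/news/story.html
```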
If the goal is to disable scripting on the AMP CDN delivered content, first note that AMP pages can't contain page-author-written JS [2], and any implicit JS has to run async.
But if that's insufficient, you can disable JS in the browser altogether, which would disable it in the loaded AMP content.
You could also try to parse the main content out of the AMP page from your extension, if you know from the URL that it's an AMP page. Because AMP forces relative terseness and simplicity of HTML content, it is probably easier to parse than the original page's content. Obviously that won't generalize easily given the large variety of possible content representations, but you stand a better chance of achieving this with AMP content than with the original content.
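Something like this naive heuristic, run over the fetched AMP HTML in the extension (a sketch, not a production readability algorithm):

```typescript
// Very rough content extraction for a fetched AMP page, run in a
// browser/extension context. AMP's restricted markup (no author-written
// JS, a constrained tag set) is what makes naive scraping like this more
// workable than on an arbitrary page.
function extractReadableText(ampHtml: string): string {
  const doc = new DOMParser().parseFromString(ampHtml, "text/html");
  // Drop scripting/styling residue.
  doc.querySelectorAll("script, style, noscript").forEach(el => el.remove());
  // Prefer an explicit article/main container if the page has one.
  const root = doc.querySelector("article, main") ?? doc.body;
  return root.textContent?.trim() ?? "";
}
```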
And if you generalize it enough, you will end up with one component of a web crawl / indexing system in an extension ;)
I’m not sure you understand the purpose of https. Ensuring integrity of the document served by the server is only one small piece of it.
The other critical components are:
- encryption so middleboxes can't see what you're looking at
- guarantee (via the PKI) that the server you're about to send your banking credentials to is using a cert that belongs to the domain name in the address bar that you trust sending your credentials to.
> encryption so middleboxes can’t see what you’re looking at
> guarantee (via the PKI) that the server you’re about to send your banking credentials to is using a cert that belongs to the domain name in the address bar that you trust sending your credentials to.
The purpose of SXG is to allow publisher signing of edge-cache accelerated public content - i.e. it's read-only - not to encrypt private information like credentials in transport. Https still handles encrypted transport independently of SXG.
Also, why or how would someone create a system that accepted private info or credentials via signed SXG anyway? There's literally no mechanism in it to achieve that. If you tried to build a password entry field for your bank website and distributed it via SXG, it wouldn't even work in the first place.
No, you can distribute whatever content you want. But the content distribution network can't listen for posts from those forms when the content is rendered.
SXG doesn't answer DNS requests for your domain. It only says that a particular piece of content has been signed using private keys that have been registered with the displayed host. That's it.
In fact, you don't even need a CDN or DNS to distribute SXG content. You could distribute it via USB drives, signal flags, USB drives attached to messenger pigeons, whatever. The point is that the authenticity of the content's origin is completely independent of how the content got to you.
When that SXG content, however it is distributed, is rendered, the browser represents that content as originating from your domain, which is in fact exactly where it originated.
There are 100 ways to steal credentials if you manage to convince the user that it’s safe to start typing in the page, since you can serve malicious js that way.
I really don’t understand why the browser would masquerade the url just because the content is signed. At best it is able to say ‘the content is signed with x’s key’
> There are 100 ways to steal credentials if you manage to convince the user that it’s safe to start typing in the page, since you can serve malicious js that way.
That's true, but it's completely independent of SXG. There's no way to trick SXG into showing a URL that it's not signed for. You would have to steal the private keys.
> At best it is able to say ‘the content is signed with x’s key’
Remember that x's key is cryptographically associated with their domain - that's how web certs work - so the browser can also say that "this content is signed with domain x's key". That's exactly what happens with https today, but with https, the chain of attribution implied by the signature stops at the webserver, since it holds the private keys for signing the content.
SXG allows the chain of attribution to be completely independent of the transport mechanism, https or otherwise. Of course, you should still use https to encrypt during data transmission over the internet, but that's orthogonal to content signing.
This is also directly analogous to how app stores distribute cryptographically signed apps. For example, it allows an iPhone to open a local native iOS app in response to a URL click in web content [1]: The app and the URL are both cryptographically signed by the same entity, so iOS can conclude that they are from the same origin, and allow the app to handle the URL.
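To make the domain binding concrete, here's a sketch of that check, assuming an Ed25519 certificate in PEM form and Node's X509Certificate helper (real SXG verification also checks validity windows, the CanSignHttpExchanges certificate extension, and more):

```typescript
import { X509Certificate, verify } from "crypto";

// Sketch of the binding described above: the signature is checked with
// the public key from the publisher's certificate, and the certificate
// is checked against the domain shown in the URL bar.
function signedForDomain(
  content: Buffer,
  signature: Buffer,
  certPem: string,
  domain: string,
): boolean {
  const cert = new X509Certificate(certPem);
  // Does the signature verify under the certificate's key? (Assumes an
  // Ed25519 key, hence the null digest argument.)
  const keyMatches = verify(null, content, cert.publicKey, signature);
  // Is the certificate actually issued for the displayed domain?
  const domainMatches = cert.checkHost(domain) !== undefined;
  return keyMatches && domainMatches;
}
```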
i agree but i just can't justify the connection between the domain and signed content. The root node here is "X's key" and it is used to sign a domain cert and also a document. It's semantically wrong for the browser to pretend that the document belongs to the domain, and even more wrong when the signed document is being served by another domain with a completely different cert, google's!
Even app stores don't do that - if you download a signed app from any domain, it won't pretend it's downloaded from apple.com, but it will report that it's signed by Apple Inc. The situation is not analogous anyway because there are very few app stores from 3-4 highly trusted corporates. If any of their app store private keys are stolen the internet is fucked.
> The root node here is "X's key" and it is used to sign a domain cert and also a document. It's semantically wrong for the browser to pretend that the document belongs to the domain
Browsers "pretend" exactly this every time they download a page via HTTPS. It's how HTTPS works. Did you think that they trust that the content comes from the correct source by just doing a reverse DNS lookup on the IP address? They don't. Instead, they check a signature from the web server against their cert keystore, and if the PKI signature check fails, you get a big scary warning that the connection isn't secure/private. The same thing would happen with SXG based content if the signature didn't match the keystore, except the signature to be checked is carried with the content itself, just like with PGP/GPG.
> Even app stores don't do that - if you download a signed app from any domain, it won't pretend it's downloaded from apple.com, but it will report that it's signed by Apple Inc.
I just checked an iPhone, and they appear to attribute an app to the creator, not Apple, Inc.
But the reason they don't show a download domain is because consumer iOS apps can only be downloaded from Apple, from the App Store, and nowhere else. Adding download source information to the iOS UI would be totally redundant, as the value would always be 'downloads.apple.com' or whatever.
If you look at the actual cert signing procedure for iOS apps, the configuration step includes the domain, which is why Apple can associate an entity's apps with its https websites. Nonetheless, the apps are still signed by the app's creator, not Apple, and the app's creator is responsible for securing the private keys [1].
> The situation is not analogous anyway because there are very few app stores from 3-4 highly trusted corporates.
Why should only the 3-4 big corporates be the entities who can sign or distribute apps or static web content? They are not the only entities capable of securing private keys. Banks do it all the time, as do individual app developers (note the warnings to app developers about private key management on Apple's website). They are also not the only entities capable of distributing content. App and content stores can provide many other services of added value, like aggregation and curation and payment systems, but signing and distributing content isn't one of those services they can uniquely provide.
You could even argue that distributing the ability to sign and distribute content away from the big corporations reduces single points of failure and makes the whole content distribution ecosystem more robust and fault tolerant.
Well, thanks for your reply, i still think sxg breaks semantics.
> Browsers "pretend" exactly this every time they download a page via HTTPS.
yeah, and the big scary warnings are for the connection, not the content. currently browsers tie the url host to DNS, so the semantics are different: the cert certifies the distributor. I also think this is only true for certs that don't have an organization name; at least i think that, for extended-validation SSL, they still show this: https://upload.wikimedia.org/wikipedia/commons/6/63/Firefox_...
> and they appear to attribute an app to the creator, not Apple, Inc.
indeed, i meant that they attribute the app to Apple Inc as the creator, but not their domain, which is, again, different semantics. (although i suppose apple is somehow involved in ensuring that the correct binary is distributed for every developer)
> Why should only the 3-4 big corporates
i'm obviously not saying they should, but that it's not an analogous situation, with their walled gardens and all. the web is nobody's walled garden, and a large part of the content is public domain, which doesn't need any signing. that's why app store logic doesn't apply.
> reduces single points of failure
that's what software hosts already do by providing hashes for binaries. and it's great that sxg can verify content through the browser. but it shows where the content was created, not where it was distributed, and that's why i think it's wrong to change the URL
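for illustration, that hash check sketched in TypeScript - it verifies integrity against a published digest, but unlike a signature it only authenticates the content if the published hash itself came over a trusted channel:

```typescript
import { createHash } from "crypto";
import { readFileSync } from "fs";

// The integrity check software hosts enable by publishing digests:
// hash the downloaded file and compare it to the published value.
function matchesPublishedHash(filePath: string, publishedSha256Hex: string): boolean {
  const digest = createHash("sha256").update(readFileSync(filePath)).digest("hex");
  return digest === publishedSha256Hex;
}
```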