They address this in the RFC. They use JSON as a base transport layer that can be streamed into an HTML renderer. They explicitly target SSR/SSG as use cases for this, and they’re actively working with Next.js for a reference implementation.
I suspect (though they didn’t say so outright) that their choice of a non-HTML layer is because they want to keep the core functionality renderer-agnostic. They do mention React Native as another use case. Presumably the same could be said of renderers targeting, e.g., CLIs or smart TV APIs.
tldr: we use a richer format to preserve state on refetch, but this format can be turned into HTML for first render.
Our underlying transport is not exactly JSON but something like "JSON with holes". It's basically JSON with placeholder string values like "$1", "$2" that get filled in by later rows. This lets us do breadth-first streaming, where we can show some content as early as possible but always have intentional loading states.
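To make that concrete, here's a toy sketch of what rows with holes might look like. This is not React's actual wire format; the row shapes and the resolver below are made up purely for illustration.

```ts
// Toy sketch only -- not React's actual wire format.
// Each row is a self-contained piece of JSON; "$1" and "$2" are holes
// that later rows fill in, so the outer shell can render (with loading
// states) before the inner content has arrived.
const rows = [
  // Row 0: the shell arrives first; Sidebar and Feed props are still holes.
  '{"type":"div","children":[{"type":"Sidebar","props":"$1"},{"type":"Feed","props":"$2"}]}',
  // Later rows fill the holes as their data becomes available.
  '$1:{"user":{"name":"ada"}}',
  '$2:{"posts":[{"id":1,"title":"Hello"}]}',
];

// A toy resolver: record each hole's value as its row arrives.
const resolved = new Map<string, unknown>();
for (const row of rows.slice(1)) {
  const sep = row.indexOf(':');
  resolved.set(row.slice(0, sep), JSON.parse(row.slice(sep + 1)));
}
```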
We chose JSON because we need to be able to:
* Pass component props (which are JSON) to Client Components
* Reconcile the tree on refetches so that state inside isn't thrown away
HTML doesn't give us either of these. However, starting with a richer format and then converting it to HTML for the initial render works.
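For the first point, here's a rough sketch of the kind of boundary this implies. The `.server` / `.client` file convention is from the RFC, but the component names and props are made up:

```tsx
// Note.server.tsx -- sketch only, not code from the RFC.
// A Server Component renders on the server and passes plain,
// JSON-serializable props across the boundary to a Client Component.
import NoteEditor from './NoteEditor.client';

export default function Note({note}: {note: {id: string; title: string; body: string}}) {
  // Only serializable values (plain JSON) can cross the server -> client
  // boundary, which is part of why the transport format is JSON-based.
  return <NoteEditor id={note.id} title={note.title} body={note.body} />;
}
```

For the second point, on a refetch the client can reconcile the new tree against the existing one, so state inside `NoteEditor` (local edits, cursor position, etc.) isn't thrown away.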
Since we can also do this at build time, you could build a website that does all of this work ahead of time and turns it into HTML. However, as you might expect, it won't be interactive without JS.
Thank you, this is a great explanation (and I skimmed some of the RFC on my phone so probably missed some of that). The approach certainly makes sense.
It’s also interesting compared to some of the partial hydration approaches I’ve seen. They all seem to use either wrapper divs/spans (bad for all kinds of reasons) or adjacent script “markers” containing the initial hydration data, which seems similar to the slots approach but doesn’t necessarily enable streaming or prioritizing certain content for first render.
Since you’re here: I know the RFC discusses compile-time static analysis to identify content that could be server-/static-only, but is there any consideration for supporting that where better static analysis is available (e.g. TypeScript/Flow)? I don’t mind manually marking components static or dynamic if it means a better UX, but doing it automatically in the compiler/bundler could be great for DX, as well as for preventing mistakes.
We did experiment with some pretty aggressive compilation approaches a few years ago (see Prepack). Ironically, our conclusion from this (which informed the Server Components design) is that you don't want this to be done automatically, because then you don't have precise control and confidence over what gets shipped to the client and what gets shipped to the server. One compiler bailout, and the difference in the bundle is huge. So you'd want to add a way to enforce things, and now we're back to manual annotations.
I have looked at Prepack! I was wondering if it would be revisited. I certainly understand that lack of confidence, though I think I'd be more worried about the false negative: something that should be available to the client, but that the compiler couldn't identify for whatever reason. That said, I think tooling could go a long way to address the false positive. For example, in Next.js dev mode, there's an indicator that says whether a page can be statically rendered. If there were tools that said "we've automatically marked this component for the client bundle" and allowed the dev to manually opt out if they're sure the component shouldn't be sent to the client, that would still be nice to have for the non-pathological case.
Is the streaming you talk about here and in the RFC over websockets, or the event-stream HTTP content type, or something more custom? Will the architecture have the potential to support realtime applications where the server will push updates to the client without the need to poll on an interval?
It's just regular chunked encoding, which we consume via a ReadableStream[1]. The key part is that we're able to consume partial data because we read it row by row. Each row represents a piece of JSON with placeholders for missing values. When we receive a row, we try to "continue" rendering on the client, and if there's a loading state we can show (even before all values have arrived), we show it.
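A minimal sketch of that row-by-row consumption, assuming newline-delimited rows (an assumption for illustration, not something stated above); `handleRow` is a hypothetical callback standing in for "try to continue rendering with what we have so far":

```ts
// Sketch: read a chunked HTTP response row by row via a ReadableStream.
async function consumeRows(url: string, handleRow: (row: string) => void) {
  const response = await fetch(url);
  const reader = response.body!.getReader();
  const decoder = new TextDecoder();
  let buffered = '';

  while (true) {
    const {done, value} = await reader.read();
    if (done) break;
    buffered += decoder.decode(value, {stream: true});
    // Emit each complete row as soon as it arrives, so the UI can show
    // loading states before the rest of the response has streamed in.
    let newline: number;
    while ((newline = buffered.indexOf('\n')) !== -1) {
      handleRow(buffered.slice(0, newline));
      buffered = buffered.slice(newline + 1);
    }
  }
  if (buffered) handleRow(buffered);
}
```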
>Will the architecture have the potential to support realtime applications where the server will push updates to the client without the need to poll on an interval?