Typical: Data interchange with algebraic data types (github.com/stepchowfun)
129 points by g0xA52A2A on May 21, 2023 | 50 comments


This seems like the thing I wish protocol buffers were, but learning from them and removing the warts. Very promising.

Also, implementing it in Rust and TypeScript as the first languages was a smart choice.


Protocol Buffers lacks good documentation for beginners that fits in a README.md. It's not a community project, I guess.

Basically, enterprisey tooling bores me to death.

Nowadays I trust tooling whose documentation can fit in a README file. It means it's concise and simple.


This comes close: https://protobuf.dev/programming-guides/encoding/

It's long, but dev-friendly. Once I got through it, I understood 99% of the benefits and limitations of protobufs.


I don't know a lot about protocol buffers. What are the warts? I think they don't have a canonical format, is that one of them?


Typical creator here. I'm pleasantly surprised to find this on HN today! Happy to answer any questions about it.


I was wondering if you've considered having an alternative, human-readable encoding (either your own syntax or a JSON-based schema)?

I find it quite useful to be able to inspect data by eye and even hand-edit payloads occasionally, and having a standard syntax for doing so would be nice.

(More generally, it's a shame JSON doesn't support sum types "natively", and I think a human-readable format with Typical's data model would be really cool.)


It's a good question! The binary format is completely inscrutable to human eyes and is not designed for manual inspection/editing. However:

1) For Rust, the generated types implement the `Debug` trait, so they can be printed in a textual format that Rust programmers are accustomed to reading.

2) For JavaScript/TypeScript, deserialized messages are simple passive data objects that can be logged or inspected in the developer console.

So it's easy to log/inspect deserialized messages in a human-readable format, but there's currently no way to read/edit encoded messages directly. In the future, we may add an option to encode messages as JSON, which would match the representation currently used for decoded messages in JavaScript/TypeScript, with sums being encoded as tagged unions.
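
For illustration, such an encoding might look something like this (hypothetical tag names; none of this is implemented yet):

  // A choice rendered as a tagged union in JSON:
  type SendEmailResponse =
    | { field: "success" }
    | { field: "error"; value: string };

  // What a decoded message might look like in the console:
  const response: SendEmailResponse = { field: "error", value: "mailbox full" };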


Thanks. That's a good answer.


> it's a shame JSON doesn't support sum types "natively"

You can describe it using JSON Schema[1], with "oneOf" and "const". Though I prefer the more explicit approach of using "oneOf" combined with "required" to select one of a number of keys[2].
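
Roughly, the two styles look like this (hypothetical shapes; the links below have working examples):

  // "oneOf" + "const": discriminate on a tag property.
  const tagged = {
    oneOf: [
      {
        properties: { kind: { const: "circle" }, radius: { type: "number" } },
        required: ["kind", "radius"],
      },
      {
        properties: { kind: { const: "square" }, side: { type: "number" } },
        required: ["kind", "side"],
      },
    ],
  };

  // "oneOf" + "required": discriminate on which key is present.
  const keyed = {
    oneOf: [
      { properties: { circle: { type: "object" } }, required: ["circle"] },
      { properties: { square: { type: "object" } }, required: ["square"] },
    ],
  };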

[1]: https://www.jsonschemavalidator.net/s/6SCuYNBe

[2]: https://www.jsonschemavalidator.net/s/tNnQmsTd


It seems like the safety rules are buggy because some assumptions are missing?

The safety rules say that adding an asymmetric field is safe, and converting asymmetric to required is safe. If you do both steps, then this implies that adding a required field is safe. But it’s not. As you say, it’s not transitive.

But lack of transitivity also means that a pull request that converts a field from asymmetric to required is not safe, in general. You need to know the history of that field. If you know that the field has always been asymmetric (unlikely) or all the older binaries are gone, then it’s safe. A reviewer can’t determine this by reading a pull request alone.

Maybe waiting until old binaries are gone is what you mean by “a single change” but it seems like that should be made explicit?


No IDL that supports required fields can offer the transitivity property you're referring to.

Typical has no notion of commits or pull requests in your codebase. The only meaningful notion of a "change" from Typical's perspective is a deployment which updates the live version of the code.

When promoting an asymmetric field to required (for example), you need to be sure the asymmetric field has been rolled out first. If you were using any other IDL framework (like Protocol Buffers) and you wanted to promote an optional field to required, you'd be faced with the same situation: you first need to make sure that the code running in production is always setting that field before you do the promotion. Typical just helps more than other IDL frameworks by making it a compile-time guarantee in the meantime.

We should be more careful about how we use the overloaded word "change", so I'm grateful you pointed this out. Another comment also helped me realize how confusing the word "update" can be.


That's reasonable but it's a different notion of what a safe change is than I remember from using protobufs. I believe they just say adding or removing a required field isn't backward compatible.

The word "safe" seems worth clarifying. There are some changes that are always safe, like reordering fields, because they are also transitively safe.

Removing an optional field is safe as long as you remember not to reuse the index. Sometimes the next unused index is informally remembered using a counter in a comment.

Other changes, like converting asymmetric to required, seem like they're in a different category where it can be done safely but requires more caution. It's more like a database migration where you control all the code that accesses the database. There is a closed-world assumption.


> That's reasonable but it's a different notion of what a safe change is than I remember from using protobufs. I believe they just say adding or removing a required field isn't backward compatible.

Safe just means the old code and the new code can coexist (e.g., during a rollout), which requires compatibility in both directions. Not just backward compatibility.

This is true for Protocol Buffers as well, except they have no safe way to introduce or remove required fields. So the common wisdom there is to not use required fields at all.


I think our misunderstanding is really about use cases.

Sometimes Protocol Buffers are used to write log files, and the log files are stored and never migrated. To read the oldest logs, you need backward compatibility all the way back to the first production code that was released. This means transitive safety is needed and the changes you can make to the schema, which is used as a file format, are pretty limited.

This isn't just a limitation of the Protocol Buffer format. Safety rules are different when you do long-term persistence. If Typical were used that way, you could only trust safety rules that are transitive. Asymmetric fields could be added, but the fallbacks never go away.

(Also, a rollback doesn't get rid of any logs that were generated, so it's not a full undo. As you say, both forward and backward compatibility are needed.)

Serialization isn't just used for network calls, and even when it is, sometimes you don't control when clients upgrade, such as when the clients get deployed by different companies, or as part of a mobile app. So it seems worth clarifying the use cases you have in mind when making safety claims.


I think you're right, and now I understand why the rules seemed buggy to you but not to me. You're considering persisted messages that need to be compatible with many versions of the schema, whereas the discussion and rules are formulated in the context of RPC messages between services, which only need to be compatible with at most three versions of the schema: the version that generated the message, the version before that, and the version after. The README could do a better job of clarifying that.

In the persisted messages scenario, there is one change to the rules: you can never introduce a required field (since old messages might not have it). Not even asymmetric fields can be promoted to required in that scenario.


Okay, great! Hope that helps.

To expand on this, a way to think about it is that there are some changes that are always safe and others that depend on what data is still out there (or that’s still being generated) that you want the code to be able to read.

“What writers are out there” isn’t a property of the code alone, though maybe you could use the code to keep track of what you intended. The releases deployed to production determine which writers exist, and they keep running until stopped and perhaps upgraded.

In some cases a serialization schema might be shared in a common library among multiple applications, each with its own release schedule, making it hard to find out which writers are still out there.

It’s much easier when the serialization is only used in services where you control when releases happen and when they’re started and stopped.


From the description:

Thus, asymmetric fields in choices behave like optional fields for writers and like required fields for readers—the opposite of their behavior in structs.

So if you have a schema change which adds an asymmetric field to both a struct and a choice, it seems both writers and readers need to be updated in order to successfully transmit to each other?

Or am I missing something fundamental?


If you add an asymmetric field to a struct, writers need to be updated to set the field for the code to compile.

If you also add an asymmetric field to a choice, readers need to be updated to be able to handle the new case for the code to compile.

You can do both in the same change. The new code can be deployed to the writers and readers in any order. Messages generated from the old code can be read by the new code and vice versa, so it's fine for both versions of the code to coexist during the rollout.

After that first change is rolled out, you can promote the new fields to required. This change can also be deployed to writers and readers in any order. Since writers are already setting the new field in the struct, it's fine for readers to start relying on it. And since readers can already handle the new case in the choice, it's fine for writers to start using it.
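
To make the struct half of that concrete, here's a sketch of how the two steps might constrain the code (hypothetical TypeScript shapes with invented names, not Typical's actual generated types):

  // Step 1: `from` is asymmetric. Writers must set it, but readers
  // can't rely on it, so old and new code can coexist.
  type SendEmailRequestOut = { to: string; from: string };
  type SendEmailRequestIn = { to: string; from?: string };

  // Step 2 (after step 1 is fully rolled out): `from` is required,
  // so readers can now rely on it too.
  type SendEmailRequestInLater = { to: string; from: string };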


Ah, I was thinking in terms of the actual messages, not compilation. So it should read "Fields are required for compilation by default"?

This leads me to versioning. Imagine you have some old code which won't be upgraded, say an embedded system. Either you don't promote to "required", or you do versioning.

Given the lack of any mention of versioning, I take it that's to be dealt with externally? I.e., as a separate schema, detected before handing the rest of the data to the generated code?

edit: Really cool project btw!


If I understand you correctly, I believe your understanding is correct.


Do you think you could ever generate types for Go? The protobuf implementation of oneof in Go is pretty rough to look at, and not fun to type over and over.


I'd love for Typical to support Go! We'd need someone with enough time to implement it.

If anyone is interested in contributing any code generators, you can start by copying the Rust or TypeScript generator and modifying it appropriately. See the contributing guide here: https://github.com/stepchowfun/typical/blob/main/CONTRIBUTIN...


Really cool! Does it work in the browser if I want to compile a .t spec using JS/TS?


Yes! We have comprehensive integration tests that run in the browser to ensure the generated code only uses browser-compatible APIs. Also, the generated code never uses reflection or dynamic code evaluation, so it works in Content Security Policy-restricted environments.

See this section of the README for more info: https://github.com/stepchowfun/typical#javascript-and-typesc...


It's nice that the generated code works in the browser, but I was curious whether it's possible to actually generate code from .t syntax entirely in the browser.

By the looks of it, Typical is only a CLI tool, so I guess not for now. Maybe unless it's ported to WASM...


A WASM port doesn't seem too farfetched. What's the use case for running the code generator in the browser?


Something that could involve giving the end user some control over the schema...


Nice that algebraic types are getting more love. Would be nice if these could be imported into existing systems, like Cap'n Proto.


Curious how this will look when they get to implementations with less expressive type systems. TypeScript and Rust are particularly good. Making a usable library for this in golang won't be easy.

And now that I think about it, Protobuf/Thrift/etc.-type tools are heavily constrained by finding the lowest common denominator of language features to allow for cross serialization. Maybe in the next generation of these tools, languages like golang don't get a seat at the table for the sake of progress -- I could be fine with it.


You're exactly right about other frameworks appealing to the lowest common denominator, whereas Typical isn't willing to make such compromises.

Languages without proper sum types are at a disadvantage here, but it's possible to encode sum types with exhaustive pattern matching in such languages using the visitor pattern. That approach requires some ergonomic sacrifices (e.g., having to use a reified eliminator rather than the built-in `switch` statement), and people using those languages may prefer convenience over strong guarantees. It's an unfortunate impedance mismatch.
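
For example, a generator targeting such a language might emit a reified eliminator along these lines (sketched in TypeScript for brevity; all names invented):

  // Instead of a `switch`, the consumer passes one handler per case, so
  // exhaustiveness is enforced by the type of the handler record.
  interface SendEmailResponse {
    match<R>(handlers: {
      success: () => R;
      error: (message: string) => R;
    }): R;
  }

  const response: SendEmailResponse = {
    match: (handlers) => handlers.error("mailbox full"),
  };

  const text = response.match({
    success: () => "sent!",
    error: (message) => `failed: ${message}`,
  });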


I'd imagine that most people voluntarily using go or similar languages wouldn't be too bothered by just having all the checks occur at runtime in the generated code, rather than encoding them in the type system.

Sum types are still awkward, but most languages can at least approximate them, minus some compile-time checks.


Brilliant, I have been thinking of doing exactly this for a while now, glad I waited for someone else to do it in a better way :)


Asymmetric fields are a really clever idea.


I love everything about this! I think a lot of code could benefit from restructuring via ADTs, and ser/deser is an important piece of that story. But I suppose I do have one nitpick.

Using a fallback for asymmetric fields in sum types seems off to me, albeit pragmatic. If the asymmetric fields for product types use an Option<T>, and Option is basically a sum of a T and a Unit, a close dual is a product (struct/tuple) of a T and the dual of Unit (a Nothing type, e.g. one with no instantiable values, such as an empty enum).

I think this would provide similar safety guarantees, as a writer couldn't produce a value of an added asymmetric sum type variant, but a reader could write handling for it (including all subfields besides the Nothing-typed one)?
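
In TypeScript terms, I'm imagining something like this (hypothetical names):

  // The new variant carries a component of an uninhabited type, so
  // writers can never construct it, but readers can still handle it.
  type Response =
    | { kind: "success" }
    | { kind: "detailedError"; frames: string[]; unwritable: never };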


You're considering an alternative behavior for asymmetric fields in choices, but you need to consider the behavior of optional fields in choices too.

In particular, the following duality is the lynchpin that ties everything together: "asymmetric" behaves like optional for struct readers and choice writers, but required for struct writers and choice readers.

From that duality, the behavior of asymmetric fields is completely determined by the behavior of optional fields. It isn't up to us to decide arbitrarily.

So the question becomes: what is the expected behavior of optional choice fields?

Along the lines you proposed, one could try to make optional choice fields behave as if they had an uninhabited type, so that writers would be unable to instantiate them—then you get exactly the behavior you described for asymmetric choice fields. Optional fields are symmetric, so both readers and writers would treat them as the empty type. This satisfies the safety requirement, but only in a degenerate way: optional fields would then be completely useless.

So this is not the way to go.

It's important to take a step back and consider what "optional" ought to mean: optionality for a struct relaxes the burden on writers (they don't have to set the field), whereas for a choice the burden is relaxed on readers (they don't have to handle the field). So how do you allow readers to ignore a choice field? Not by refusing to construct it (which would make it useless), but rather by providing a fallback. So the insight is not to think of optional as a type operator (1 + T) that should be dualised in some way (0 * T), but rather to think about the requirements imposed on writers and readers.

You're right to note the duality between sums and products and initial and terminal objects, and indeed category theory had a strong influence on Typical's design.


It seems that there are two approaches to schema encoding:

* Writing OpenAPI/Avro schemas and then generating serialization/deserialization code from them (e.g., Avro[0], Tie[1], or Typical)

* Writing the schema using an in-language DSL (e.g. Autodocodec[2])

If I have a single language codebase, why should I prefer the first approach? You can always make your in-language DSL serialize out to a dedicated language at a later point.

Typical isn't focused on JSON, so it doesn't seem to be optimized for the web. Not targeting the web makes it more likely that you don't need support for multiple languages.

You can also limit the metaprogramming: you don't strictly need GHC.Generics for the in-language DSL. But if you're generating code, it's always going to be opaque and hard to debug.

If you keep the DSL in-language, you don't need to generate stubs, since you can use the language's own type system to enforce the mapping to the native records[2].

I have heard the argument that everything should be 'documentation first', which was given as an argument for using Tie. But I don't see why an in-language DSL can't provide enough detail. There is so much manually written OpenAPI out there, any of these approaches is vastly better than that.

I have been reading Designing Data Intensive Applications by Martin Kleppmann but it doesn't cover this trade-off. Which makes sense, since it isn't really a book on programming using DSLs.

[0]: https://hackage.haskell.org/package/avro#generating-code-fro...

[1]: https://github.com/scarf-sh/tie

[2]: https://github.com/NorfairKing/autodocodec#fully-featured-ex...


> If I have a single language codebase, why should I prefer the first approach?

Probably the most compelling reason is that a single language codebase might not be a single language codebase forever. But, as you suggested, the switch to a language-agnostic framework can be deferred until it becomes necessary.

However, there's a reason to use Typical specifically: asymmetric fields. This feature allows you to change your schema over time without breaking compatibility and without sacrificing type safety.

If you ever expect to have newer versions of the codebase reading messages that were generated by older versions of the codebase (or vice versa), this is a concern that will need to be addressed. This can happen when you have a system that isn't deployed atomically (e.g., microservices, web/mobile applications, etc.) or when you have persisted messages that can outlive a single version of the codebase (e.g., files or database records).

An embedded DSL could in principle provide asymmetric fields, but I'm not aware of any that do.

> Typical isn't focused on JSON, so it doesn't seem like it is optimized for web.

It just makes different trade-offs than most web-based systems, but that doesn't make it unsuitable for web use. We have comprehensive integration tests that run in the browser. Deserialized messages are simple passive data objects [1] that can be logged or inspected in the developer console.

[1] https://en.wikipedia.org/wiki/Passive_data_structure


I'd love to use this if it had .NET support.

We use NATS to communicate between services and use JSON for lack of better options. I've been looking for something more efficient and strict. This looks like it would be a good match for F#'s types.


> Typical offers a new solution ("asymmetric" fields) to the classic problem of how to safely add or remove fields in record types without breaking compatibility. The concept of asymmetric fields also solves the dual problem of how to preserve compatibility when adding or removing cases in sum types.

This is indeed an elegant solution, although I am not so sure about its novelty. It seems very similar to how Avro achieves forward and backward compatibility. I wonder how the two strategies differ?


Avro has no equivalent of Typical's asymmetric fields. In Avro:

1. Record types can have optional (but not asymmetric) fields, just like in most IDLs. Avro implements this by taking the union of the field type with a special `null` type, but in practice it's equivalent to having optional fields.

2. Avro doesn't support proper sum types, but it has two approximations of them: unions (but not tagged unions) and enums. Unions have no support for adding/removing new cases safely. Enums can have a default value that is used if the case is not recognized.

From the "Typical perspective", both of these are problematic:

1. If you are trying to introduce a required field in Avro, you first introduce it as optional, and then at some point in the future when you have manually confirmed that the field is always being set by every writer, you can promote the field to required. Typical's asymmetric fields offload the burden of having to do that manual confirmation onto the type checker.

2. For unions, Avro offers no equivalent of optional or asymmetric cases. If you add a new case, you better not use it until all readers can handle it, and the type checker won't enforce that. For enums, the solution of having default values is unsatisfactory, because not every type has a suitable default. In practice, this usually means adding a catch-all default that is only used to signal an unrecognized input, but then what is a reader supposed to do with that? With Typical, if a writer introduces a new optional or asymmetric case, it must then provide an appropriate fallback to use when the new case isn't recognized by readers. For example, if a new specific type of error is introduced (e.g., with a stack trace), the fallback might be a more general type of error (e.g., with only an error message) that readers already know how to handle. If the new case is asymmetric, then you know readers can handle it, so once it's rolled out, you know it's safe to subsequently promote the case to required (so that writers no longer need to provide a fallback).
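
To sketch that error example (hypothetical shapes rendered as TypeScript, not Typical's actual generated code):

  // Writer side: using the new case means also supplying a fallback
  // that old readers already understand.
  type ErrorOut =
    | { field: "message"; value: string }
    | {
        field: "stackTrace";
        value: string[];
        fallback: { field: "message"; value: string };
      };

  // Reader side: a reader that recognizes "stackTrace" uses it; an old
  // reader that doesn't decodes the "message" fallback instead.
  type ErrorIn =
    | { field: "message"; value: string }
    | { field: "stackTrace"; value: string[] };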

Here I've only discussed the challenges with adding new required fields/cases, but you run into similar trouble when removing them. This section of the README discusses all the pitfalls: https://github.com/stepchowfun/typical#required-optional-and...


I don't really understand your summary for record types; it doesn't match with my experience of Avro schema evolution.

Avro's schema resolution rules [0] for record types seem to implement asymmetry almost exactly. It's just that this is done by having separate reader and writer schemas, which is less ergonomic and clear.

To add a new required field, you add it as a *required* field to the writer's schema. To do this asymmetrically, you can make it an *optional* field in the reader's schema, as a union with null. Once all clients are compliant, you can change the reader schema to replace the union with a plain type.

To remove a required field, you can go through this same dance. You can also always delete a field from the reader side, and any written data under that tag will be ignored.
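
Concretely, the transition might look like this (record and field names invented; schemas abbreviated):

  // Writer schema: the new field is plain required.
  const writerSchema = {
    type: "record",
    name: "SendEmailRequest",
    fields: [
      { name: "to", type: "string" },
      { name: "from", type: "string" },
    ],
  };

  // Reader schema during the transition: the same field is a union with
  // null (with a default), so data from old writers still resolves.
  const readerSchema = {
    type: "record",
    name: "SendEmailRequest",
    fields: [
      { name: "to", type: "string" },
      { name: "from", type: ["null", "string"], default: null },
    ],
  };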

Regarding unions and enums: a nit is that unions are tagged, at least on the wire [1]. The writer-specified fallback is an important difference though, I definitely agree with that, and it seems like a genuinely novel improvement.

---

[0] https://avro.apache.org/docs/1.11.1/specification/#schema-re...

[1] https://avro.apache.org/docs/1.11.1/specification/#unions-1


Looks phenomenal, especially regarding robustness around union types


> Note: An optional field in a choice isn't simply a field with an option type/nullable type. The word "optional" here means that readers can ignore it and use a fallback instead, not that its payload might be missing

Hmm, this is new to me -- is this a Typical concept, or something more general around protobuf-enums that I just haven't run into before?

(oh, and besides my question, strong agreement with everyone else here, cool lib and great documentation!)


This is a Typical concept. I haven't seen this approach to optionality for sum types in any other serialization framework.

Programmers have good intuition for what it means for a field in a struct to be optional: it's either there or it's not. But for a sum type, what does it mean for a case to be "missing"? It's not quite as obvious, but it helps to think about what responsibilities are placed on writers vs. readers: optionality for a struct relaxes the burden on writers (they don't have to set the field), whereas for a sum type the burden is relaxed on readers (they don't have to handle the case).


What is the difference from ATD (https://atd.readthedocs.io/en/latest/atd-project.html), which was also designed with algebraic data types in mind?


I can't speak with authority about ATD, but the following might be helpful:

Aside from algebraic data types, the big selling point of Typical is asymmetric fields. That's the crucial feature that distinguishes Typical from every other framework. Without asymmetric fields, there is no safe way to introduce or retire required fields. People using other frameworks fear required fields (rightly so), whereas Typical gives you the tools to embrace them.


Asymmetric fields are great. Shouldn't they be the default?


Asymmetric fields are in a temporary transition state to/from required. So it would seem a bit odd to me for the transition state to be the default.

However, I think I see your reasoning: you don't want to accidentally introduce a required field without first making it asymmetric, and having asymmetric as the default would effectively prevent that.

So I can see the appeal of both designs!


It seems we're back to CORBA.


IMO CORBA was still better than SOAP.



