Using enums to represent state in Rust

diarrhea · on Sept 22, 2023

This is good, but could go further if you're pursuing type-system leverage. For example, why does a deleted user have an `activate` associated function in the first place? It will error out! That's safe in Rust and perfectly fine in terms of control flow: it cannot be forgotten, and cannot easily be handled incorrectly (unlike, say, a bool, where a single exclamation mark can mean a nasty bug). But it's not ideal.

I've been a huge fan of the type-state pattern [1]. I don't see it mentioned often, and never outside of Rust so far. However, it's applicable to most languages, including ones you wouldn't suspect (Python). If you introduce a whole new type (not a big deal in Rust; more of a ceremony in C# et al.) for `DeletedUser`, you can simply leave off the `activate` function! Any action (==state transition) on that type will be legal and possible. Methods have unit value return type, no `Result` needed. You cannot handle that incorrectly! The code won't even compile.
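A minimal sketch of what that could look like. The type and method names here (`PendingUser`, `ActiveUser`, `DeletedUser`, `name`, `delete`) are made up for illustration and are not taken from the article:

```rust
// Hypothetical user states: one type per state.
pub struct PendingUser {
    name: String,
}

pub struct ActiveUser {
    name: String,
}

pub struct DeletedUser {
    name: String,
}

impl PendingUser {
    pub fn new(name: &str) -> Self {
        PendingUser { name: name.to_string() }
    }

    // Takes `self` by value, so the pending user is gone after activation.
    pub fn activate(self) -> ActiveUser {
        ActiveUser { name: self.name }
    }
}

impl ActiveUser {
    pub fn name(&self) -> &str {
        &self.name
    }

    // The only transition out of the active state.
    pub fn delete(self) -> DeletedUser {
        DeletedUser { name: self.name }
    }
}

impl DeletedUser {
    pub fn name(&self) -> &str {
        &self.name
    }
    // No `activate` here: calling it on a `DeletedUser` does not compile.
}
```

Because every transition consumes the old value, a stale `ActiveUser` cannot be used after `delete` either.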

I am still in the process of exploring downsides to the pattern. For example, in classic OOP languages, you can create a type hierarchy, with a top-level `User`, and the different kinds inheriting from it (and then ideally be marked `final`/`sealed` or whatever). You can then treat all users the same by interacting with the top-level type. Useful for DB interaction, for example. The same in Rust would go through traits: `impl User for DeletedUser`. But it's not quite as nice, is it?

1: https://cliffle.com/blog/rust-typestate/

hgomersall · on Sept 22, 2023

It's less good when the state machine traversal is only known at run time. Necessarily you can only capture the errors at run time, so you get less benefit. You can wrap the states in an enum, but then the enum needs to implement the interface of every state. In that pattern though, you do get the benefit that the enum level dispatcher has to properly obey the types' interfaces, so you get lots of confidence that the runtime errors will be correct (or at least, an error will happen if the state is called incorrectly).
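A rough sketch of that enum-wrapper arrangement, with hypothetical names (`Active`, `Deleted`, `AnyUser`, `InvalidTransition`):

```rust
// Hypothetical state types; each exposes only its legal transitions.
#[derive(Debug, PartialEq)]
pub struct Active;

#[derive(Debug, PartialEq)]
pub struct Deleted;

impl Active {
    pub fn delete(self) -> Deleted {
        Deleted
    }
}

// Wrapper for when the path through the state machine is only known at run time.
#[derive(Debug, PartialEq)]
pub enum AnyUser {
    Active(Active),
    Deleted(Deleted),
}

#[derive(Debug, PartialEq)]
pub struct InvalidTransition;

impl AnyUser {
    // The dispatcher has to go through the typed API. `Deleted` has no
    // `delete` method, so the only thing it can do in that arm is report an error.
    pub fn delete(self) -> Result<AnyUser, InvalidTransition> {
        match self {
            AnyUser::Active(user) => Ok(AnyUser::Deleted(user.delete())),
            AnyUser::Deleted(_) => Err(InvalidTransition),
        }
    }
}
```

The errors move to run time, but the match arms are still checked against each state's interface.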

Still, it's a great pattern that I use as often as I can!

admax88qqq · on Sept 23, 2023

Sounds like polymorphism with more steps.

An interface with multiple implementations.

the__alchemist · on Sept 22, 2023

Some of the earlier Rust embedded UIs demonstrate downsides. I like the idea in principle (Letting the compiler catch misconfiguration), but in the implementations I've seen, they don't work well with Rust docs, ie figuring out what types to declare, or how to construct things. My workaround was inserting arbitrary types like `i8` in relevant places and seeing what compiler message is output for the type it was expecting. It can also result in long nests of `<>`.

The typestate pattern felt hostile when libs using it only included examples of use directly in a main function where you didn't have to specify the type; you'd try to use them in a program in a struct field, function signature etc, and wouldn't know what type to put in.

Example of typestates I've seen: `Spi<SPI1, PA5<Alternate<AF11>>, PA6<Alternate<AF69>...>>>>>>>`

When it would be easier to use a plain `Spi` struct.

These aren't necessarily critiques of the typestate pattern in general, but those are the 2 points that pushed me away from it.

saurik · on Sept 22, 2023

It sounds like Rust is in dire need of something similar to C++ decltype.

tialaramex · on Sept 22, 2023

This feature wouldn't make a lot of sense in most cases in Rust. Rust has full type inference inside functions, so if we're inside the function body we can allow the type to be inferred, partially or entirely. For example `Vec<_>` says this is a Vec of something but we're not specifying what it's a Vec of.

In the function signature, Rust deliberately doesn't have inference, you must write down the types and decltype would not be acceptable for that purpose.
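A small illustration of the split between the two (the function `even_squares` is a made-up example):

```rust
// The signature has to spell out every type in full.
pub fn even_squares(limit: u32) -> Vec<u32> {
    // Inside the body, `Vec<_>` names the container and leaves the
    // element type to inference (here it resolves to u32).
    let squares: Vec<_> = (0..limit).map(|n| n * n).collect();
    squares.into_iter().filter(|n| n % 2 == 0).collect()
}
```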

wging · on Sept 22, 2023

There’s one thing I can think of that looks like an exception to the lack of inference at function boundaries: you can return impl SomeTrait rather than worrying about the exact thing you’ll return. It’s useful for iterator adapters in particular, where the types depend on the functions you call and in which order, and thus aren’t stable under small modifications to the source code.

(I wouldn’t count impl trait in function parameters since that acts more like a generic type.)

tialaramex · on Sept 23, 2023

That's not inference, you're literally telling callers "the object I'm giving you might have any type but I promise it implements this Trait".

The compiler knows which concrete type it is, but you needn't and your caller isn't promised it is any particular type (but it is, they just aren't allowed to care)

This is useful because all Rust's functions are types, both lambda and ordinary functions are unique types, but we often want to say I'm going to return say a predicate - we can't name the predicate we're going to return but our caller just wants a predicate so they don't care that we couldn't spell its name.
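For instance, a sketch of returning a predicate whose type cannot be named (`greater_than` is a hypothetical function):

```rust
// The closure's concrete type has no name we could write down, so the
// signature only promises "something callable as Fn(i32) -> bool".
pub fn greater_than(threshold: i32) -> impl Fn(i32) -> bool {
    move |value| value > threshold
}
```

The caller can invoke the result like any function but cannot rely on its concrete type.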

saurik · on Sept 26, 2023

It doesn't make a lot of sense in most cases in C++ either (and as auto gets better it has only made sense in fewer cases since it was introduced)... I write a lot of C++ and I essentially never ever ever need or even merely use decltype; but, when it does make sense, it really truly makes sense, and here we have a place in Rust that sounds exactly like the use case in C++ where this actually comes up.

twic · on Sept 22, 2023

Time to re-pimp my idea of doing this using a single type parameterised to reflect different states:

https://github.com/tim-group/higher-kinded-lifecycle/blob/ma...

myvoiceismypass · on Sept 22, 2023

Seems like more idiomatic scala to have an ADT via a sealed trait hierarchy + pattern matching here.

sfvisser · on Sept 22, 2023

If you like this kind of pattern you might want to read up on Haskell’s GADTs. Generalized algebraic data-types.

This is where you can index the different constructors (enum variants) with an additional type variables and even specialize them when needed. A pretty powerful tool to encode at the type level that similar things are slightly different.

tkz1312 · on Sept 22, 2023

This kind of thing has been very standard in other ML family languages for decades now. Rust is a nice language, but it’s frustrating when its proponents act as if it’s the first language to have these kinds of features. ML is 50 years old at this point…

rakenodiax · on Sept 23, 2023

Everything that was old is new again. I’m glad that features that were relegated to “esoteric” languages like ML's and lisps are becoming more mainstream. There are many lucky 10k’s! :)

tkz1312 · on Sept 24, 2023

Rust mainstreaming a lot of functional concepts and patterns is awesome, very much agree; just pushing back on the claim that it’s the only language where this style is prevalent.

pdpi · on Sept 22, 2023

I kind of consider algebraic data types (the combination of this style of enum (sum types) and structs (product types)) paired with pattern matching as the core feature set that any modern language must support. It makes for really simple, easy to follow code that’s much less error prone than the alternatives.

flohofwoe · on Sept 22, 2023

I just wish Rust wouldn't have called those things 'enums' but 'tagged unions', would have saved us C peasants a lot of confusion when encountering them first ;)

joshmarlow · on Sept 22, 2023

C was my first language, and I actually have the opposite opinion - I suspect that calling them 'enums' helps adoption. My reasoning is that phrases like 'tagged unions' and 'algebraic datatypes' strike some developers as sounding very academic/ivory tower and so somewhat intimidating. I think this is really unfortunate (and is definitely not universal among developers).

Being able to say 'Rust enums can carry arguments' - for some reason - sounds less intimidating and conveys the core feature.

flohofwoe · on Sept 22, 2023

I agree that something like 'sum type' sounds a bit too esoteric, but IMHO 'tagged union' is very descriptive. It's the same thing as a C union plus a tag indicating the currently active content (and that's also what the memory layout looks like under the hood).

(but yeah, 'enum with arguments' also describes it very well)

LoganDark · on Sept 22, 2023

They do point them out as tagged unions many times in the documentation. But the syntax `tagged_union Whatever {}` doesn't exactly have me jumping out of my seat.

pavlov · on Sept 22, 2023

It’s not like “enum” and “mut” are complete English words either, so maybe an abbreviation like “tun” could have worked.

But honestly I think enum was a better choice.

LoganDark · on Sept 22, 2023

I personally think enum was basically the only reasonable choice, as normal enums really are just a subset of tagged unions. They just happen to support tagged unions on top of them.

goku12 · on Sept 22, 2023

All C and C++ peasants should refer to this: https://cheats.rs/#memory-layout . It will probably accelerate your initiation ritual.

Arnavion · on Sept 22, 2023

That would still be technically misleading, since enums don't necessarily have tags due to niche optimization. Eg `Option` is an enum but `Option<Box<Foo>>` does not have any tags; it uses the content being zero to represent None since `Box<Foo>` cannot be zero.
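This can be checked with `std::mem::size_of`. A small sketch (`Foo` and the two helper functions are hypothetical):

```rust
use std::mem::size_of;

pub struct Foo {
    pub value: u64,
}

pub fn option_box_has_no_extra_tag() -> bool {
    // `Box<Foo>` is never null, so `None` can reuse the null bit pattern.
    size_of::<Option<Box<Foo>>>() == size_of::<Box<Foo>>()
}

pub fn option_u64_needs_a_tag() -> bool {
    // Every bit pattern of u64 is a valid value, so there is no spare niche.
    size_of::<Option<u64>>() > size_of::<u64>()
}
```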

Also, Rust does have actual tagged unions for C interop that you have to define yourself as a `struct` with an int field and a `union` field, just like in C.
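A sketch of such a hand-rolled tagged union. The names and the tag convention (0 for the integer, 1 for the float) are assumptions made for the example:

```rust
#[repr(C)]
#[derive(Clone, Copy)]
pub union Payload {
    pub int_value: i32,
    pub float_value: f32,
}

#[repr(C)]
pub struct Tagged {
    // Hypothetical convention: 0 = int_value, 1 = float_value.
    pub tag: u32,
    pub payload: Payload,
}

pub fn describe(t: &Tagged) -> String {
    // Reading a union field is unsafe: the compiler cannot check that
    // the tag and the active field agree.
    match t.tag {
        0 => format!("int {}", unsafe { t.payload.int_value }),
        1 => format!("float {}", unsafe { t.payload.float_value }),
        _ => String::from("unknown tag"),
    }
}
```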

twic · on Sept 22, 2023

Rust enums always have a discriminant: https://doc.rust-lang.org/std/mem/fn.discriminant.html

Their representation in memory may not use it, but it's still defined.
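A short example of using it (the `Shape` enum and `same_variant` are hypothetical):

```rust
use std::mem::discriminant;

pub enum Shape {
    Circle(f64),
    Square(f64),
}

// Compares only which variant is active and ignores the payload.
pub fn same_variant(a: &Shape, b: &Shape) -> bool {
    discriminant(a) == discriminant(b)
}
```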

0x457 · on Sept 22, 2023

They do, but IIRC this is mostly to compare enum variants when it can't implement `Eq` and/or `PartialEq`. For example, in tests.

There are also zero variants enums that don't have any discriminant, but still could be used.

I'm not sure what's the point of comparing it to C because enums in C only carry tags and no data, while unions only carry data and no tag. Enums in rust could do both.

touisteur · on Sept 22, 2023

Discriminated Records is The Way.

newZWhoDis · on Sept 22, 2023

Probably my favorite aspect of Swift as well, enums + payloads/associated types make for extremely safe and easy to understand code.

Combine it with switches and you get compiler guarantees that every state is explicitly handled, and if you are in a particular state you always have the relevant child objects.

I remember being shocked dart lacked this functionality when I tried out flutter.

the__alchemist · on Sept 22, 2023

I joke to myself that I program with "struct and enum-oriented programming". I got it from Rust, but apply it to Python too. (Python enums aren't as ergonomic, and they can't wrap values, but they're a start)

qsort · on Sept 22, 2023

Python's equivalent of Rust's Enums would be the __match_args__ machinery rather than enums themselves, ironically.

They aren't as ergonomic or type-safe, and rather surprisingly the match statement is not an expression in Python's grammar, but regardless of its problems the match statement is very powerful, even more so than static equivalents.

GalaxySnail · on Sept 22, 2023

IMO Python's equivalent of Rust's Enums would be `typing.Union` [1]. It is more ergonomic and type-safe, and mypy supports type narrowing [2] and exhaustiveness checking on it [3].

[1] https://docs.python.org/3/library/stdtypes.html#types-union

[2] https://mypy.readthedocs.io/en/stable/type_narrowing.html

[3] https://docs.python.org/3/library/typing.html#typing.assert_...

boredumb · on Sept 22, 2023

Absolutely. It took me a few projects in Rust to sort of stumble on this and it can make some fairly complex business logic sit into nice matches that are trivial to run through.

ewuhic · on Sept 22, 2023

What would be the equivalent of these patterns in Go?

fsdjkflsjfsoij · on Sept 22, 2023

You can't do the exact same thing in Go because Go doesn't have a way to define a sum type yet outside of generic constraints. The closest you can get is using an interface and a type switch but that won't give you exhaustive matching.

goku12 · on Sept 22, 2023

I sometimes wish Rust had combined structs and enums into a single concept - say, an enum. Structs would have been unnecessary. The compiler can simply avoid the tag or the union when a pure struct or a pure (c-type) enum is required, respectively.

bombela · on Sept 22, 2023

> The compiler can simply avoid the tag or the union when a pure struct or a pure (c-type) enum is required, respectively.

I had to try:

  pub enum Foo {
      Foo { a: i32 },
  }
  
  impl Foo {
      pub fn new() -> Self {
          Foo::Foo { a: 42 }
      }
  
      pub fn get_a(Foo::Foo{a}: &Self) -> &i32 {
          a
      }
  }

At opt level above zero (-C opt-level=1) the tag is elided:

  example::Foo::new:
          mov     eax, 42
          ret
  
  example::Foo::get_a:
          mov     rax, rdi
          ret

https://godbolt.org/z/qKzMqvhb7

goku12 · on Sept 23, 2023

That really does make structs superfluous, doesn't it? The only additional thing needed is a syntax sugar, where the variant name can be avoided if there is only one variant.

bombela · on Sept 23, 2023

Syntactic sugar to avoid the variant where there is only one, not only where used, but also when declared: enum Foo { Foo{...} }, what to do with the second Foo. And what about FFI (foreign function interface) where you have to maintain ABI compatibility over time. Factor all of that in, and it turns out Rust already has syntax for all of that: struct Foo {}.

abrgr · on Sept 22, 2023

Rust ADTs and pattern matching are so much better than other mainstream languages I find that once my code compiles it actually is almost always correct.

The next step is to encode your transition logic in the From impls between the enum structs and you've got yourself a first-rate state machine.
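A sketch of the idea with hypothetical `Draft` and `Published` states:

```rust
// Hypothetical states; each legal transition is a `From` impl.
#[derive(Debug, PartialEq)]
pub struct Draft {
    pub body: String,
}

#[derive(Debug, PartialEq)]
pub struct Published {
    pub body: String,
}

impl From<Draft> for Published {
    fn from(draft: Draft) -> Self {
        Published { body: draft.body }
    }
}

// There is deliberately no `From<Published> for Draft`, so trying to
// convert in that direction fails to compile.
```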

zozbot234 · on Sept 22, 2023

> Rust ADTs and pattern matching are so much better than other mainstream languages

Pascal and Delphi have always had variant records as part of the language. Are these not "mainstream" enough, especially Delphi?

diarrhea · on Sept 22, 2023

Uuugh, no. Not anymore at least.

LoganDark · on Sept 22, 2023

Those languages have other issues that make them weird to use. Rust has its fair share of weirdness as well, but most of that weirdness is just a direct side effect of the way that it is.

Klonoar · on Sept 22, 2023

Neither has been mainstream in years.

metaltyphoon · on Sept 22, 2023

I wish it had nested object destructions like C# has. The best part of it all is that it can return expressions.

duped · on Sept 22, 2023

Enums are fantastic. But they could be better!

One thing Rust could really use are anonymous unions (A | B |C instead of E::A(A), E::B(B), E::C(C)). They are to enums what tuple types are to structs.

Another thing that a new language designer might consider is a mechanism to control the layout. For example say I have a pair of nested enums

    enum A {
        A0(B),
        ...
        A15(B),
    } 

    enum B {
        B0,
        ...
        B15,
    }

The outer enum A can be represented as `u8` where the upper nibble is the tag for `A` and the lower nibble is the value of `B`.

This is kind of a niche thing, but you see it in binary protocols from time to time and losing the ergonomics of enum/match because the enum can't represent your data without widening it is a shame.
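Today the packing has to be done by hand. A cut-down sketch with two variants of `A` and three of `B` (the `pack`, `unpack` and `from_nibble` helpers are hypothetical):

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum B {
    B0 = 0,
    B1 = 1,
    B2 = 2,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum A {
    A0(B),
    A1(B),
}

impl B {
    fn from_nibble(n: u8) -> Option<B> {
        match n {
            0 => Some(B::B0),
            1 => Some(B::B1),
            2 => Some(B::B2),
            _ => None,
        }
    }
}

impl A {
    // Upper nibble: which variant of A. Lower nibble: the value of B.
    pub fn pack(self) -> u8 {
        match self {
            A::A0(b) => b as u8,
            A::A1(b) => (1 << 4) | b as u8,
        }
    }

    pub fn unpack(byte: u8) -> Option<A> {
        let b = B::from_nibble(byte & 0x0F)?;
        match byte >> 4 {
            0 => Some(A::A0(b)),
            1 => Some(A::A1(b)),
            _ => None,
        }
    }
}
```

The in-memory `A` is still laid out however the compiler chooses; only the packed byte follows the nibble layout.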

Another problem that shows up is this

    enum E {
        A = 0,
        B = 1,
        Rest(u8),
    }

This can't be represented in 1 byte because `Rest` could be 0 or 1. There's no way to tell the compiler that the value of E::Rest is disjoint from any other values in the enum definition - the only way is to add `Rest1, Rest2, ...` variants for all possible values of the underlying data.

This problem crops up when you use the `zerocopy` crate.

And finally something that is super difficult to reason about (and has many implications) is storing the tag out-of-band of the enum data. I believe Zig can do this, but I'm not sure how it works.

These are super minor gripes about using enums in Rust, but I feel like not enough discussion goes towards some of their limitations and tradeoffs, particularly for high performance applications.

twic · on Sept 22, 2023

> One thing Rust could really use are anonymous unions (A | B |C instead of E::A(A), E::B(B), E::C(C)). They are to enums what tuple types are to structs.

Ceylon had union types, which is the only place i've seen these: https://github.com/eclipse-archived/ceylon-lang.org/blob/mas...

Another thing Rust enums are missing is having each variant be a type. If you have an enum Shape with variants Circle, Rectangle, and Polygon, there is no way to write a function which only takes a Circle. So you end up defining a struct for each case, then making your enum a trivial wrapper round the three structs. You end up with Shape::Circle and Circle, which are different things, and writing code like c.0.radius to get at the fields. It's rather inelegant. So either variants should be types in their own right, or an enum should be defined as a composition of existing types.
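The workaround described above looks roughly like this (the field names and the `circumference` and `area` functions are illustrative):

```rust
pub struct Circle {
    pub radius: f64,
}

pub struct Rectangle {
    pub width: f64,
    pub height: f64,
}

// The enum is a thin wrapper around the standalone structs.
pub enum Shape {
    Circle(Circle),
    Rectangle(Rectangle),
}

// Only possible because `Circle` is its own type and not just a variant.
pub fn circumference(c: &Circle) -> f64 {
    2.0 * std::f64::consts::PI * c.radius
}

pub fn area(shape: &Shape) -> f64 {
    match shape {
        Shape::Circle(c) => std::f64::consts::PI * c.radius * c.radius,
        Shape::Rectangle(r) => r.width * r.height,
    }
}
```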

k_g_b_ · on Sept 22, 2023

Ceylon had a really neat type system - sadly it didn't take off. However before that these types in particular were a feature of OCaml: https://v2.ocaml.org/manual/polyvariant.html There they're called Polymorphic Variants. Note also the implementation efficiency concerns on that page - given that Rust is a systems language, defaulting to the current sum types instead of set-theoretic unions was a reasonable choice. Having the option to use them natively and without macros (there's some crates) would be nice still when aware of the additional performance overhead.

gaganyaan · on Sept 24, 2023

I haven't been following this closely, so I looked it up and it looks like that's not going to happen for the foreseeable future unfortunately:

https://github.com/rust-lang/lang-team/issues/122

Kind of a shame, but wrapper types work well enough that I understand. It does look like if there was someone with enough resources to make it happen that they'd be receptive to it.

duped · on Sept 22, 2023

Typescript and Dart have union types that work this way.

Expurple · on Sept 23, 2023

Also, Python with type annotations. But it's not a very fair comparison, since objects in Python/TS are all heap-allocated, introspectable and contain runtime type information. Rust doesn't have any of that by default.

bPspGiJT8Y · on Sept 22, 2023

> They are to enums what tuple types are to structs.

But this is just a generic sum type?

    data Sum a b = L a | R b
    infixr 5 type Sum as ⊕
    type E₂ a b z = a ⊕ b ⊕ z
    type E₃ a b c z = a ⊕ b ⊕ c ⊕ z
    -- and so on…

Here, `Eₙ` represents a sum type with at least `n` members indexed by their position, and `z` represents any type so that it's possible to keep extending the number of positions via further nesting. When you're done you set it to a type with no members:

    type E₃AndNoMore a b c = a ⊕ b ⊕ c ⊕ Void

I don't know Rust so I can't claim if it allows it, but I'm almost certain it does.

duped · on Sept 22, 2023

> But this is just a generic sum type?

No, it's actually less generic. It's not determined by position but by type. For example `A | B | A` is the same type as `A | B`.

This is useful as a shorthand when you don't want/need a new type to represent your problem, similar to tuples.

bPspGiJT8Y · on Sept 22, 2023

So you're talking about untagged unions?

> This is useful as a shorthand when you don't want/need a new type to represent your problem, similar to tuples.

Yes this is handled perfectly by the generic sum type, you don't need untagged unions for this. Rust used to have Either in its standard library, but they removed it and kept Result only. Semantically they're the same (a ⊕ b) but Result's name implies it has something to do with some "results". Anyways nothing stops you from creating one yourself, or even using Result if you're fine with the weird-sounding name.

duped · on Sept 22, 2023

No, I am not talking about untagged unions. I don't understand your notation. To be concrete, I am talking about tagged, disjoint union type, that does not require naming a new type to use.

This is also not covered by the Either/Result type.

Rust supports untagged unions, but they cannot be matched (because they have no tag). An anonymous union would still be tagged internally, but would be less general purpose than the generic enum type.

bPspGiJT8Y · on Sept 22, 2023

> To be concrete, I am talking about tagged, disjoint union type

But you just said "For example `A | B | A` is the same type as `A | B`". How would this be possible for tagged union types?

> that does not require naming a new type to use

> This is also not covered by the Either/Result type

It's more probable that I'm just not understanding what you're talking about, but *the only* re-usable tagged union type similar to tuples is *the* sum type.

Let's say you're dealing coffee. People want it either with sugar or without sugar. You don't want to create a new sum type CoffeeFlavor? Fine, just use Either<Sugar, NoSugar>. This is *the* equivalent of a tuple. You need more than 2 options? No problem, Either<Sugar, Either<JustABit, NoSugar>>. I don't know what else could be a "anonymous tagged union".
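In Rust terms, a sketch of that encoding might be the following. `Either` is defined locally here since the standard library does not ship one, and the other names are made up:

```rust
pub enum Either<L, R> {
    Left(L),
    Right(R),
}

pub struct Sugar;
pub struct JustABit;
pub struct NoSugar;

// Three options encoded by nesting the two-way sum.
pub type Sweetness = Either<Sugar, Either<JustABit, NoSugar>>;

pub fn describe(s: &Sweetness) -> &'static str {
    match s {
        Either::Left(Sugar) => "sugar",
        Either::Right(Either::Left(JustABit)) => "just a bit",
        Either::Right(Either::Right(NoSugar)) => "no sugar",
    }
}
```

Each option is identified by its position in the nesting, which is the property the rest of this subthread argues about.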

duped · on Sept 22, 2023

Ah ok I think we're mixing up terms here - in the context of systems programming languages, a "tagged" union refers to an integer in front of a bag of bytes that holds the data of the "un tagged" union. Rust has both tagged (enums) and untagged (unions) union types.

What you're asking about is a discriminated vs non-discriminated union, and indeed, that's exactly what I'm talking about.

A | B |C is not the same type as Either<A, Either<B, C>> because Either<A, Either<A, B>> cannot type check as Either<A, B>.

But even if you want to argue that you can represent things that way, it misses the point. The goal is to remove complexity from the type hierarchy of the program, not add to it.

bPspGiJT8Y · on Sept 22, 2023

> because Either<A, Either<A, B>> cannot type check as Either<A, B>

Why would you want the former to type check as the latter? Where do you see the complexity?

k_g_b_ · on Sept 22, 2023

Because your code might not need to care about the position you insert your A or B (left/right for Either), you also might not care whether it's an (encoding as) Either<A,B> or SomeoneElsesEither<A,B> and you also don't want to have to deal with flattening nested Either's as in the example.

These types are also called "set-theoretic" types as A|B means exactly the set of all values that can be typed as A or typed as B - note that this also induces a whole subtyping rule by set inclusion and this is in contrast to sum types where a value typed Either<A,B> can never be typed A or B - to move between them you need to apply extractors/match/constructors/(not sure of standard type theory nomenclature).

Implementation of these union types might still need additional tags and construction/matching/extraction underneath, but from a programming perspective there's less complexity as compared to involving an additional named type Either (or EitherOf3 and EitherOf4 and ...) and manually implementing set-theoretic laws.

bPspGiJT8Y · on Sept 22, 2023

> Because your code might not need to care about the position you insert your A or B

This is understandable. But what does it have to do with "collapsing" `a | a` into `a`? Throughout your post I think you're talking about plain untagged union types but that's something the guy I've been replying to already ruled out. Position problem can be handled beautifully by variants based on row polymorphism, such as in OCaml or PureScript. There you can access the fields not by their position but by a key, like keys in objects in JS, meaning that they don't have to be ordered at all. It's like an inverse of a struct: in a struct all fields/keys are guaranteed to exist, but in a variant only one of them exists. Due to row polymorphism they can also be extensible. You can even "handle" a particular field/key and remove it from the type but keep all the other ones and delay handling them.

> you also might not care whether it's an (encoding as) Either<A,B> or SomeoneElsesEither<A,B>

This is a theoretical issue but in practice I don't think I've ever seen anyone using some non-standard Either-like datatype in languages I've dealt with. Where Either needs to be used people just use Either.

> and you also don't want to have to deal with flattening nested Either's as in the example

What would "flattening" mean here? Fundamentally there are only 2 operations you can do on a generic sum type like this: either inject a value (construct the type) or try to get the value at a certain position. You might also think pattern matching will get tedious, but that's not the case either, you can just have a function `actOnAorBorC` and call it with `actOnA`, `actOnB` and `actOnC` and do the pattern matching inside these functions.

k_g_b_ · on Sept 22, 2023

> Position problem can be handled beautifully by variants based on row polymorphism, such as in OCaml or PureScript. There you can access the fields not by their position but by a key, like keys in objects in JS, meaning that they don't have to be ordered at all. It's like an inverse of a struct: in a struct all fields/keys are guaranteed to exist, but in a variant only one of them exists. Due to row polymorphism they can also be extensible. You can even "handle" a particular field/key and remove it from the type but keep all the other ones and delay handling them.

Exactly. OCaml's polymorphic variants implement a subset of set theoretic types for specifically defined types - see also this ICFP'16 paper https://dl.acm.org/doi/abs/10.1145/2951913.2951928

For languages with more first-class/principled set-theoretic types see the Ceylon type system (sadly dead and archived at Eclipse ceylon-lang.org) or TypeScript (though they obviously also have to deal with JS which makes everything more messy than necessary).

With "Flattening" I mean applying the usual laws of set theory for simplified types: Either<Either<A,B>,A> doesn't express our intent for a function return or parameter type if we don't care about the position of A, just whether it is an A, the same with Either<A, Either<A,B>>/etc, so we'd want all nested variations normalized to Either<A,B>. But we also don't care about the difference between Either<A,B> and Either<B,A> - normalizing this is already not easy without metaprogramming/type reflection. At this point it ceases to have any significant relationship to the original Either type. If we'd use it still to signify A|B and would actively need to call normalizing functions to keep our types clean and simple in this way, that adds non-semantic (regarding the intent of our code) noise to our code or we need to hide the complexity by using more abstract tools like e.g. monad transformers. If instead the language already provided these types, this complexity caused by embedding set theory inside the language doesn't leak into our code and our intent can be expressed more clearly in types without "bookkeeping" artifacts. This is only exacerbated when going to higher arities of sets/Either.

bPspGiJT8Y · on Sept 22, 2023

> Either<Either<A,B>,A>> doesn't express our intent for a function return or parameter type if we don't care about the position of A, just whether it is an A

> so we'd want all nested variations normalized to Either<A,B>.

Sorry, perhaps my thinking is shaped by nominal type systems rather than structural, but if the only thing we care about is whether the type is A, then how do we end up having Either<Either<A, B>, A> in the first place? Thinking about this in terms of a nominal type system, the specific type you present here has to have some specific meaning associated with, specifically, this type, otherwise we would have chosen some other type. So the key thing here is that if we have Either<A, A> then it HAS to be distinct from simply A, otherwise we wouldn't have this type in the first place. Us constructing it means we associate it with a specific meaning so it has to be distinct from A. But if we DON'T care, then, I guess, we shouldn't use this type? Use the type we do care about? The same goes for Either<A, B> and Either<B, A>.

> or we need to hide the complexity by using more abstract tools like e.g. monad transformers

This is interesting, how do monad transformers relate to this problem?

k_g_b_ · on Sept 23, 2023

Well, you might use a different custom named type than Either, but sum types only give you basically that meaning - you can't enforce the invariants we want. You could use other type mechanisms of your language (e.g. type classes or dependent types) to embed some form of set theoretic types and hopefully leak less of your abstraction (needing "bookkeeping" to keep the invariants) or deal with restricted forms (the type violating some of the invariants we want in some situations).

The examples above or Either<A,A> could result from polymorphic functions that would return a set of types that the function is abstracting about, something like: pickRandom<S,T> : S, T -> S|T. With Either<S,T> you would get pickRandom<A,A> a1 a2 : Either<A,A> (requiring cleanup if you want the invariants I wrote about), with set theoretic types you'd get A. If you have pickRandom<A|B, B|C> x y you would get nested Either's or just A|B|C respectively.

Either is a Monad and so Haskell and others allow us to hide a bunch of complexity of reducing nestings by using abstractions and custom magic syntax (do notation) built for them - but the underlying complexity of the type and necessary mental model remains. Monad transformers become a necessity because you already needed the Monad magic for the cleanup, but you also have another Monad you care much more about than Either (like IO), see e.g the answer here https://stackoverflow.com/questions/67617871/reduce-nestedne... Note that this isn't talking about nested Either's, just the nested syntax for handling them without using it as a Monad and do notation, with actual nested Either's you'd need to do more cleanup.

bPspGiJT8Y · on Sept 23, 2023

> If you have pickRandom<A|B, B|C> x y you would get nested Either's

If this wasn't the case, how would the information about what you got be retained? It's either positional, or by a tag/key (row-polymorphic variants), or none retained.

I don't see why would you want to use monadic API for approaching an "anonymous sum type" problem in the first place. As I said before, there are fundamentally just 2 operations you would want to use: inject and project. Maybe you could also mention assoc for re-association but I'd say if you're using it you're likely handling the problem the wrong way. So I still don't see how monad transformers play into this. They are a nice (decent, at least) trick for dealing with some situations but the problem we're talking about here isn't one of them.

duped · on Sept 22, 2023

Consider this code:

    fn foo () -> A | B | C {
        if condition {
            bar()
        } else {
            baz()
        }
    }

    fn bar() -> A | B {
        ...
    }

    fn baz() -> B | C {
        ...
    }

vs

    fn foo () -> Either<A, Either<B, C>> {
        if condition {
            match bar() {
                Either::Left(a) => Either::Left(a),
                Either::Right(b) => Either::Right(Either::Left(b)),
            }
        } else {
            match baz() {
                Either::Left(b) => Either::Right(Either::Left(b)),
                Either::Right(c) => Either::Right(Either::Right(c)),
            }
        }
    }

    fn bar() -> Either<A, B> {
        ...
    }

    fn baz() -> Either<B, C> {
        ...
    }

The latter code composes poorly and requires an extra branch at runtime. Dispatching on nested discriminated unions is fundamentally more complex than dispatching on flat non-discriminated unions, both for the programmer to write and read and for the runtime to execute.

The compiler can also optimize the representation of the anonymous enum based on the context in which it's created, whereas it's more difficult to do that in the discriminated case.

This isn't a controversial opinion; there are mountains of TypeScript written in this style.
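For comparison, here is a rough sketch of how the flat version tends to be approximated in Rust as it exists today, which has no anonymous `A | B | C` unions: named enums plus `From` conversions for widening. Every name here (`AorB`, `BorC`, `AorBorC`, and the unit structs `A`, `B`, `C`) is hypothetical and only stands in for the types in the example above.

```rust
// Hypothetical payload types standing in for A, B and C.
#[derive(Debug, PartialEq)]
struct A;
#[derive(Debug, PartialEq)]
struct B;
#[derive(Debug, PartialEq)]
struct C;

// Named stand-ins for the anonymous unions `A | B`, `B | C` and `A | B | C`.
#[derive(Debug, PartialEq)]
enum AorB {
    A(A),
    B(B),
}

#[derive(Debug, PartialEq)]
enum BorC {
    B(B),
    C(C),
}

#[derive(Debug, PartialEq)]
enum AorBorC {
    A(A),
    B(B),
    C(C),
}

// Widening conversions: written once per pair of types, then reused via `.into()`.
impl From<AorB> for AorBorC {
    fn from(v: AorB) -> Self {
        match v {
            AorB::A(a) => AorBorC::A(a),
            AorB::B(b) => AorBorC::B(b),
        }
    }
}

impl From<BorC> for AorBorC {
    fn from(v: BorC) -> Self {
        match v {
            BorC::B(b) => AorBorC::B(b),
            BorC::C(c) => AorBorC::C(c),
        }
    }
}

fn bar() -> AorB {
    AorB::B(B)
}

fn baz() -> BorC {
    BorC::C(C)
}

// The call site stays flat; the re-tagging lives in the `From` impls.
fn foo(condition: bool) -> AorBorC {
    if condition {
        bar().into()
    } else {
        baz().into()
    }
}
```

The widening `match` is still there, so this does not remove the extra dispatch; it only moves it out of `foo` and into the conversions.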

bPspGiJT8Y · on Sept 22, 2023

So basically the idea here is that you want to have TS-style untagged unions, but instead they're also tagged, but still unify and compose the way they do in TS? Then why couldn't you just do `{ tag: A, data: … } | { tag: B, … } | { tag: C, … }`? Wouldn't it solve your problem?

We didn't start with composability as a requirement but you're right in that if it's a goal then nesting Either's is a rather poor solution. A better fit would be variants based on row polymorphism as I described in the reply to the other poster.

It wouldn't be a 1:1 mapping to your first example, though: if your union is ultimately closed (as in your first example), you'd still need one extra no-op function call to unify the types. Not a big deal, but row-polymorphic variants lose here. On the other hand, IMO the possibility of having them open as well is the killer feature.

Ultimately though, I don't like the style of type unification happening in your first example. Shaped by the languages I work with, I simply don't end up in situations where I'd need something like this. I just approach the problems differently. But this is more subjective territory.

zozbot234 · on Sept 22, 2023

You can do this by just implementing From<u8> and Into<u8> for your type - using the enum representation only for assignment or pattern matching. The more principled solution AIUI would be to have "patterns" as a first-class citizen within the language.

Storing the tag 'out of band' is something you can only do as part of some larger object, in which case you can similarly have getters and setters that take or return enums and do the appropriate conversion.
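A minimal sketch of that approach, under stated assumptions: the enum `E`, its variants, and the `Packet` struct are hypothetical, and `Rest` wraps `NonZeroU8` here so that a zero payload cannot be constructed at all.

```rust
use std::num::NonZeroU8;

// Hypothetical enum view over a raw byte: zero is special, everything else is `Rest`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum E {
    Zero,
    Rest(NonZeroU8),
}

impl From<u8> for E {
    fn from(raw: u8) -> Self {
        // `NonZeroU8::new` returns `None` exactly when `raw` is 0.
        match NonZeroU8::new(raw) {
            Some(n) => E::Rest(n),
            None => E::Zero,
        }
    }
}

impl From<E> for u8 {
    fn from(e: E) -> Self {
        match e {
            E::Zero => 0,
            E::Rest(n) => n.get(),
        }
    }
}

// Hypothetical larger object that stores the raw byte and only exposes the
// enum through a getter and a setter.
struct Packet {
    raw_kind: u8,
}

impl Packet {
    fn kind(&self) -> E {
        E::from(self.raw_kind)
    }

    fn set_kind(&mut self, kind: E) {
        self.raw_kind = kind.into();
    }
}
```

With this shape every `u8` maps to exactly one `E` and back, so the conversions round-trip in both directions.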

duped · on Sept 22, 2023

That's not equivalent. Notably if you have a `&[u8]` you can't transmute it to `&[E]`. It's also noisy when you have the enum as part of a larger struct, and can make encoding/decoding very verbose.

brigadier132 · on Sept 22, 2023

> From<u8> and Into<u8> for your type

It gets verbose fast when you are talking about all combinations of variants of an enum.

zozbot234 · on Sept 22, 2023

The point is that you only have to do it once, when defining the object. Everything else then happens via the From and Into implementations, which the compiler will generally be smart enough to inline. So it'll be just as efficient as working on the underlying u8.

brigadier132 · on Sept 22, 2023

You have to do it once for each subset of variants your state machine requires, which, if you enumerate them all, is 2^N `From` implementations.

hurril · on Sept 22, 2023

Then what does this mean?

    let x: u8 = E::Rest(0).into();
    let y: E = x.into();

duped · on Sept 22, 2023

Bad example, that should panic (in a perfect world it should be a compile error). The invariant of the type is that `E::Rest` cannot hold zero.

hurril · on Sept 22, 2023

Why is that a bad example? It's proof that implementing From does not solve the problem.
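To make the objection concrete, here is a sketch that assumes `E` is defined with a plain `u8` payload (the definition below is a guess at the shape under discussion, not a quote from the thread). With that shape `E::Rest(0)` is constructible, and the round trip through `u8` silently turns it into a different value:

```rust
// Assumed shape of `E`: nothing stops `Rest` from holding 0.
#[derive(Debug, Clone, Copy, PartialEq)]
enum E {
    Zero,
    Rest(u8),
}

impl From<u8> for E {
    fn from(raw: u8) -> Self {
        if raw == 0 {
            E::Zero
        } else {
            E::Rest(raw)
        }
    }
}

impl From<E> for u8 {
    fn from(e: E) -> Self {
        match e {
            E::Zero => 0,
            E::Rest(n) => n,
        }
    }
}

// Convert to the raw byte and back again.
fn round_trip(e: E) -> E {
    let x: u8 = e.into();
    x.into()
}
```

Here `round_trip(E::Rest(0))` yields `E::Zero`, so the invariant lives only in the conversion code, not in the type.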

cpuguy83 · on Sept 22, 2023

[1] Using enums as a compile-time safe state-machine.

Very powerful tool. I wish Go had (real) enums.

https://github.com/containerd/runwasi/blob/ba5ab5ada5a401762...

dep_b · on Sept 22, 2023

I love enums in Swift because they can have associated data. It's great to see this in other languages as well. They're great for modelling the different states your application can be in, and they can replace a lot of nullable values that are only set in some states and not in others.

manoDev · on Sept 22, 2023

As someone not versed in Rust I understand the point of the article is showing the Enum construct is powerful, but if you are worried about self-documenting code and want to avoid illegal state transitions, isn't it as simple as defining a map of (pseudocode here):

    {Active -> Inactive,
     Inactive -> Active,
     Active -> Suspended,
     Suspended -> Active,
     ...}

And then having only _one_ function that mutates and checks for the valid transitions? In the author's implementation you need to read a lot of code to derive the state machine from the methods' implementations instead of it being immediately obvious from looking at a data structure. I understand there's a benefit to the implementation being checked by the compiler in this way, but at the same time it seems to spread logic across many methods. Is there an alternative middle ground?
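In Rust, that table-driven idea could look roughly like the sketch below. The names `State`, `TRANSITIONS`, `IllegalTransition` and `transition` are made up for illustration, and the table contains only the four transitions listed above. Note that the check happens at run time, not at compile time.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Active,
    Inactive,
    Suspended,
}

// The whole state machine in one place: every legal (from, to) pair.
const TRANSITIONS: &[(State, State)] = &[
    (State::Active, State::Inactive),
    (State::Inactive, State::Active),
    (State::Active, State::Suspended),
    (State::Suspended, State::Active),
];

#[derive(Debug, PartialEq, Eq)]
struct IllegalTransition {
    from: State,
    to: State,
}

// The single function that mutates the state, after consulting the table.
fn transition(current: &mut State, next: State) -> Result<(), IllegalTransition> {
    if TRANSITIONS.contains(&(*current, next)) {
        *current = next;
        Ok(())
    } else {
        Err(IllegalTransition {
            from: *current,
            to: next,
        })
    }
}
```

An illegal request such as Suspended -> Inactive returns an `Err` and leaves the state untouched.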

taeric · on Sept 22, 2023

Agreed. I've actually been criticized at several positions for a big "manage_transition" function that is basically either chained if statements or a switch. Either is fine as long as you keep it linear and organized. Both can be terrible if you don't do those things. But, most alternatives can get unwieldy for the same reasons.

The worst is when someone refactors it into a "modern" approach and then proceeds to break the general flow of the state machine again and again.

bombela · on Sept 22, 2023

To me this feels similar to C++, where you have to choose between static and dynamic dispatch in the implementation (templates vs virtual methods). This means the user is forced into one or the other.

While in Rust, the implementation is done for a Trait, and the user can choose static or dynamic dispatch (Trait vs dyn Trait).

I feel the same dissonance between static and dynamic state machines in Rust (type states vs enum). Sometimes I want to enforce it at compile time, while sometimes, at runtime. And the implementation is forced to choose for the user.

I am sure one could write some (proc) macro, and there might be some crates to do that already. But it doesn't feel as elegant as the static/dynamic Trait in my mind.
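For readers less familiar with the trait side of that comparison, a small sketch: the trait is implemented once, and each caller picks the dispatch style. `Greeter`, `English`, `French` and both functions are hypothetical names used only for illustration.

```rust
trait Greeter {
    fn greet(&self) -> String;
}

struct English;
struct French;

impl Greeter for English {
    fn greet(&self) -> String {
        "hello".to_string()
    }
}

impl Greeter for French {
    fn greet(&self) -> String {
        "bonjour".to_string()
    }
}

// Static dispatch: the compiler generates one copy per concrete type `G`.
fn greet_static<G: Greeter>(g: &G) -> String {
    g.greet()
}

// Dynamic dispatch: one copy, and the method is looked up through a vtable.
fn greet_dynamic(g: &dyn Greeter) -> String {
    g.greet()
}
```

The same `impl` blocks serve both functions, which is the flexibility that type-state versus enum state machines do not offer out of the box.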

brigadier132 · on Sept 22, 2023

How is that checked at compile time?

aranw · on Sept 22, 2023

I really like enums in Rust. They're one of the features that make me want to use it. I kind of wish Go had Rust-like enums built into the language.

the__alchemist · on Sept 22, 2023

Adding to the section at the end about deserializing an integer to an enum: the `num_enum` crate (https://docs.rs/num_enum/latest/num_enum/) is great for that. I make heavy use of `repr()` and `TryFromPrimitive` for [de]serializing data to/from byte arrays for IO.
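For context, this is roughly the kind of conversion one would otherwise write by hand with only the standard library. It is a sketch, not the code the crate generates; `Command`, its variants, and the `decode`/`encode` helpers are hypothetical.

```rust
use std::convert::TryFrom;

// Hypothetical wire-level command byte.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
enum Command {
    Ping = 1,
    Read = 2,
    Write = 3,
}

impl TryFrom<u8> for Command {
    // The unrecognized byte is handed back as the error.
    type Error = u8;

    fn try_from(raw: u8) -> Result<Self, Self::Error> {
        match raw {
            1 => Ok(Command::Ping),
            2 => Ok(Command::Read),
            3 => Ok(Command::Write),
            other => Err(other),
        }
    }
}

// Decode a byte slice, stopping at the first unknown byte.
fn decode(bytes: &[u8]) -> Result<Vec<Command>, u8> {
    bytes.iter().map(|&b| Command::try_from(b)).collect()
}

// Encoding is infallible: `as u8` reads the discriminant set by `#[repr(u8)]`.
fn encode(commands: &[Command]) -> Vec<u8> {
    commands.iter().map(|&c| c as u8).collect()
}
```

The `match` has to be kept in sync with the enum by hand, which is the repetitive part a derive macro takes over.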

milliams · on Sept 22, 2023

I found this (old [2016] but still relevant) blog post expanding on this at https://hoverbear.org/blog/rust-state-machine-pattern/ which was very useful for describing type-safe state machines in Rust.

hresvelgr · on Sept 22, 2023

Perhaps I am being cynical, but this doesn't really provide a better explanation about Rust's ADT enums than the official Rust book, albeit with the note about `#[repr(...)]`. Seems a bit low effort, I would have liked to have seen some examples that are more practical and less foobar-ish.

kortex · on Sept 22, 2023

> Deleted { deleted_at: DateTime<Utc> },

What does this look like under the hood (in memory)? Does the compiler automatically generate a struct/union? Does the value take up the same width regardless of state?

sowbug · on Sept 22, 2023

Yes; they take up the size of the largest variant. If you're concerned about that, you can Box<> the contents, and then each variant is effectively a same-sized pointer to something on the heap (plus some bookkeeping).

goku12 · on Sept 22, 2023

Reposting: https://cheats.rs/#memory-layout

> Does the value take up the same width regardless of state?

Yes. As the other commenter mentioned, it's the size of the largest variant (same as a union in C) + a tag (almost the same as an enum in C). In some rare cases, the compiler even manages to optimize out the tag.
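These claims can be checked with `std::mem::size_of`. The sketch below uses hypothetical enums (`UserState`, `Inline`, `Boxed`) and a `u64` timestamp in place of `DateTime<Utc>` so that it needs no external crate. Exact sizes of the user-defined enums are left to the compiler and not asserted here; only the `Option<Box<T>>` and `Option<NonZeroU8>` cases are documented layout guarantees.

```rust
use std::mem::size_of;
use std::num::NonZeroU8;

// Stand-in for the article's enum, with a u64 timestamp instead of DateTime<Utc>.
#[allow(dead_code)]
enum UserState {
    Active,
    Deleted { deleted_at: u64 },
}

// One large variant stored inline: every value pays for the 1024 bytes.
#[allow(dead_code)]
enum Inline {
    Empty,
    Full([u8; 1024]),
}

// Same data behind a Box: the enum itself only holds a pointer.
#[allow(dead_code)]
enum Boxed {
    Empty,
    Full(Box<[u8; 1024]>),
}

// Collect the sizes so they can be printed or inspected.
fn sizes() -> Vec<(&'static str, usize)> {
    vec![
        ("UserState", size_of::<UserState>()),
        ("Inline", size_of::<Inline>()),
        ("Boxed", size_of::<Boxed>()),
        // Tag optimized away: `None` reuses the null pointer / the zero value.
        ("Option<Box<u8>>", size_of::<Option<Box<u8>>>()),
        ("Option<NonZeroU8>", size_of::<Option<NonZeroU8>>()),
    ]
}
```

Printing the result of `sizes()` on a given target shows how much the tag and padding add on top of the largest variant.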

WhereIsTheTruth · on Sept 23, 2023

I said it for Zig, but the same is valid for Rust: it's one of the features that make these languages better than C.