> This is useful as a shorthand when you don't want/need a new type to represent your problem, similar to tuples.
Yes this is handled perfectly by the generic sum type, you don't need untagged unions for this. Rust used to have Either in its standard library, but they removed it and kept Result only. Semantically they're the same (a ⊕ b) but Result's name implies it has something to do with some "results". Anyways nothing stops you from creating one yourself, or even using Result if you're fine with the weird-sounding name.
No, I am not talking about untagged unions. I don't understand your notation. To be concrete, I am talking about tagged, disjoint union type, that does not require naming a new type to use.
This is also not covered by the Either/Result type.
Rust supports untagged unions, but they cannot be matched (because they have no tag). An anonymous union would still be tagged internally, but would be less general purpose than the generic enum type.
> To be concrete, I am talking about tagged, disjoint union type
But you just said "For example `A | B | A` is the same type as `A | B`". How would this be possible for tagged union types?
> that does not require naming a new type to use
> This is also not covered by the Either/Result type
It's more probable that I'm just not understanding what you're talking about, but *the only* re-usable tagged union type similar to tuples is *the* sum type.
Let's say you're dealing coffee. People want it either with sugar or without sugar. You don't want to create a new sum type CoffeeFlavor? Fine, just use Either<Sugar, NoSugar>. This is *the* equivalent of a tuple. You need more than 2 options? No problem, Either<Sugar, Either<JustABit, NoSugar>>. I don't know what else could be a "anonymous tagged union".
Ah ok I think we're mixing up terms here - in the context of systems programming languages, a "tagged" union refers to an integer in front of a bag of bytes that holds the data of the "un tagged" union. Rust has both tagged (enums) and untagged (unions) union types.
What you're asking about is a discriminated vs non-discriminated union, and indeed, that's exactly what I'm talking about.
A | B |C is not the same type as Either<A, Either<B, C>> because Either<A, Either<A, B>> cannot type check as Either<A, B>.
But even if you want to argue that you can represent things that way, it misses the point. The goal is to remove complexity from the type hierarchy of the program, not add to it.
Because your code might not need to care about the position you insert your A or B (left/right for Either), you also might not care whether it's an (encoding as) Either<A,B> or SomeoneElsesEither<A,B> and you also don't want to have to deal with flattening nested Either's as in the example.
These types are also called "set-theoretic" types as A|B means exactly the set of all values that can be typed as A or typed as B - note that this also induces a whole subtyping rule by set inclusion and this is in contrast to sum types where a value typed Either<A,B> can never be typed A or B - to move between them you need to apply extractors/match/constructors/(not sure of standard type theory nomenclature).
Implementation of these union types might still need additional tags and construction/matching/extraction underneath, but from a programming perspective there's less complexity as compared to involving an additional named type Either (or EitherOf3 and EitherOf4 and ...) and manually implementing set-theoretic laws.
> Because your code might not need to care about the position you insert your A or B
This is understandable. But what does it have to do with "collapsing" `a | a` into `a`? Throughout your post I think you're talking about plain untagged union types but that's something the guy I've been replying to already ruled out.
Position problem can be handled beautifully by variants based on row polymorphism, such as in OCaml or PureScript. There you can access the fields not by their position but by a key, like keys in objects in JS, meaning that they don't have to be ordered at all. It's like an inverse of a struct: in a struct all fields/keys are guaranteed to exist, but in a variant only one of them exists. Due to row polymorphism they can also be extensible. You can even "handle" a particular field/key and remove it from the type but keep all the other ones and delay handling them.
> you also might not care whether it's an (encoding as) Either<A,B> or SomeoneElsesEither<A,B>
This is a theoretical issue but in practice I don't think I've ever seen anyone using some non-standard Either-like datatype in languages I've dealt with. Where Either needs to be used people just use Either.
> and you also don't want to have to deal with flattening nested Either's as in the example
What would "flattening" mean here? Fundamentally there are only 2 operations you can do on a generic sum type like this: either inject a value (construct the type) or try to get the value at a certain position. You might also think pattern matching will get tedious, but that's not the case either, you can just have a function `actOnAorBorC` and call it with `actOnA`, `actOnB` and `actOnC` and do the pattern matching inside these functions.
> Position problem can be handled beautifully by variants based on row polymorphism, such as in OCaml or PureScript. There you can access the fields not by their position but by a key, like keys in objects in JS, meaning that they don't have to be ordered at all. It's like an inverse of a struct: in a struct all fields/keys are guaranteed to exist, but in a variant only one of them exists. Due to row polymorphism they can also be extensible. You can even "handle" a particular field/key and remove it from the type but keep all the other ones and delay handling them.
For languages with more first-class/principles set-theoretic types see the Ceylon type system (sadly dead and archived at Eclipse ceylon-lang.org) or TypeScript (though they obviously also have to deal with JS which makes everything more messy than necessary).
With "Flattening" I mean applying the usual laws of set theory for simplified types: Either<Either<A,B>,A>> is doesn't express our intent for a function return or parameter type if we don't care about the position of A, just whether it is an A, the same with Either<A, Either<A,B>>>/etc, so we'd want all nested variations normalized to Either<A,B>. But we also don't care about the difference between Either<A,B> and Either<B,A> - normalizing this is already not easy without metaprogramming/type reflection. At this point it ceases to have any significant relationship to the original Either type. If we'd use it still to signify A|B and would actively need to call normalizing functions to keep our types clean and simple in this way, that adds non-semantic (regarding the intent of our code) noise to our code or we need to hide the complexity by using more abstract tools like e.g. monad transformers. If instead the language already provided these types, this complexity caused by embedding set theory inside the language doesn't leak into our code and our intent can be expressed more clearly in types without "bookkeeping" artifacts.
This is only exacerbated when going to higher arities of sets/Either.
> Either<Either<A,B>,A>> doesn't express our intent for a function return or parameter type if we don't care about the position of A, just whether it is an A
> so we'd want all nested variations normalized to Either<A,B>.
Sorry, perhaps my thinking is shaped by nominal type systems rather than structural, but if the only thing we care about is whether the type is A, then how do we end up having Either<Either<A, B>, A>> in the first place? Thinking about this in terms of a nominal type system, the specific type you present here has to have some specific meaning associated with, specifically, this type, otherwise we would have chosen some other type. So the key thing here is that if we have Either<A, A> then it HAS to be distinct from simply A, otherwise we wouldn't have this type in the first place. Us constructing it means we associate it with a specific meaning so it has to be distinct from A.
But if we DON'T care, then, I guess, we shouldn't use this type? Use the type we do care about? The same goes for Either<A, B> and Either<B, A>.
> or we need to hide the complexity by using more abstract tools like e.g. monad transformers
This is interesting, how do monad transformers relate to this problem?
Well, you might use a different custom named type than Either, but sum types only give you basically that meaning - you can't enforce the invariants we want. You could use other type mechanisms of your language (e.g. type classes or dependent types) to embed some form of set theoretic types and hopefully leak less of your abstraction (needing "bookkeeping" to keep the invariants) or deal with restricted forms (the type violating some of the invariants we want in some situations).
The examples above or Either<A,A> could result from polymorpic functions that would return a set of types that the function is abstracting about, something like: pickRandom<S,T> : S, T -> S|T. With Either<S,T> you would get pickRandom<A,A> a1 a2 : Either<A,A> (requiring cleanup if you want the invariants I wrote about), with set theoretic types you'd get A. If you have pickRandom<A|B, B|C> x y you would get nested Either's or just A|B|C respectively.
Either is a Monad and so Haskell and others allow us to hide a bunch of complexity of reducing nestings by using abstractions and custom magic syntax (do notation) built for them - but the underlying complexity of the type and necessary mental model remains. Monad transformers become a necessity because you already needed the Monad magic for the cleanup, but you also have another Monad you care much more about then Either (like IO), see e.g the answer here https://stackoverflow.com/questions/67617871/reduce-nestedne...
Note that this isn't talking about nested Either's, just the nested syntax for handling them without using it as a Monad and do notation, with actual nested Either's you'd need to do more cleanup.
> If you have pickRandom<A|B, B|C> x y you would get nested Either's
If this wasn't the case, how would the information about what you got be retained? It's either positional, or by a tag/key (row-polymorphic variants), or none retained.
I don't see why would you want to use monadic API for approaching an "anonymous sum type" problem in the first place. As I said before, there are fundamentally just 2 operations you would want to use: inject and project. Maybe you could also mention assoc for re-association but I'd say if you're using it you're likely handling the problem the wrong way. So I still don't see how monad transformers play into this. They are a nice (decent, at least) trick for dealing with some situations but the problem we're talking about here isn't one of them.
The latter code composes poorly and requires an extra branch at runtime. It is fundamentally more complex to dispatch on nested discriminated unions instead of flat non-discriminated unions both for the programmer to write, read, and for the runtime to execute.
The compiler can also optimize the representation of the anonymous enum based on the context in which its created, whereas its more difficult to do that in the discriminated case.
This isn't a controversial opinion, there are mountains of Typescript written in this style.
So basically the idea here is that you want to have TS-style untagged unions, but instead they're also tagged, but still unify and compose the way they do in TS? Then why couldn't you just do `{ tag: A, data: … } | { tag: B, … } | { tag: C, … }`? Wouldn't it solve your problem?
We didn't start with composability as a requirement but you're right in that if it's a goal then nesting Either's is a rather poor solution. A better fit would be variants based on row polymorphism as I described in the reply to the other poster.
It wouldn't be a 1:1 mapping to your first example though, if your union is ultimately closed (as in your first example) then you'd still need to have one extra no-op function call to unify the types. Not a big deal but row-polymorphic variants lose here. On the other hand, IMO the possibility of having them open as well is the killer feature.
Ultimately though, I don't like this style of type unification as the one happening in your first example. Shaped by the languages I'm working with, I simply don't end up in situations where I'd need something like this. I just approach the problems differently. But this is more of a subjective territory here.
> This is useful as a shorthand when you don't want/need a new type to represent your problem, similar to tuples.
Yes this is handled perfectly by the generic sum type, you don't need untagged unions for this. Rust used to have Either in its standard library, but they removed it and kept Result only. Semantically they're the same (a ⊕ b) but Result's name implies it has something to do with some "results". Anyways nothing stops you from creating one yourself, or even using Result if you're fine with the weird-sounding name.