This is the first I've heard of Shen, so I can't say.
However, the Racket engineers, particularly Matthias Felleisen, are doing cutting-edge research into types. Typing is not a solved problem, because if it were, there wouldn't be a static vs. dynamic type debate that lingers ad nauseam.
Racket is in an interesting sweet spot for typing research because it is a traditional Lisp with dynamic types, to which gradual typing that works in both typed and untyped land has been added, and that work continues to receive extensive attention.
"Typing is not a solved problem, because if it were, there wouldn't be a static vs. dynamic type debate that lingers ad nauseam." Could you clarify? Surely even if typing were completely solved, we would still have debates of that kind, because there are benefits to both approaches and there will always be people preferring one or the other.
The reason typing is not a solved problem is this: there are valid programs which can be expressed in an untyped language that cannot be (directly) expressed in a typed one at the moment. For example:
(define (foo p?)
  (if p?
      42
      "forty-two"))
What is the type of `foo`? It could be `bool -> int` or `bool -> string`, depending on the result of `p?`! In an untyped language, this is a perfectly valid program. In a typed language, it presents a type error, even if `p?` happens to always be `true` or always be `false`. The programmer reading the program knows that if `p?` is always `true`, then the type of `foo` is `bool -> int`, and that if `p?` is always `false`, the type is `bool -> string`. However, this program, as-is, will be rejected as ill-typed.
In a language providing sum types, we can work around this by using a type like `(int * string) either`:
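(The snippet this refers to seems to have gone missing from the thread; here is a minimal OCaml sketch. Note that standard OCaml would write the type as `(int, string) either` with capitalized constructors, whereas the surrounding text uses `(int * string) either` with `left`/`right`.)

```ocaml
(* Sketch only: a hand-rolled two-constructor sum type. *)
type ('a, 'b) either = Left of 'a | Right of 'b

(* foo now has the single type bool -> (int, string) either. *)
let foo p =
  if p then Left 42 else Right "forty-two"
```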
Where `left` and `right` are type constructors for values of type `('a * 'b) either`. Now, the type of `foo` is `bool -> (int * string) either`. This workaround effectively adds a run-time tag to the `either` values to distinguish the `left` case from the `right` case, which is essentially the way dynamically checked languages operate for values of all types but in a more restricted, specific, and arguably meaningful form. In many (most?) languages with sum types, the language will ensure that the programmer always checks the tag of the `either` values before accessing the underlying "actual" value -- this is the only way to guarantee type safety while ensuring that the program does not barf due to a "type" error at run-time.
Now, it could be argued that the typing problem presented above is basically solved by sum types. But it could also be argued that this doesn't really solve the problem, because what we'd really like is a way to specify that the type of `foo` depends on the result of `p?`, which could maybe be written as `p:=bool -> (p ? int : string)`, for example. Perhaps dependent typing could help us come to a true solution for this case, but there are other cases where other solutions will be needed.
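As a speculative aside (not something claimed above), OCaml's GADTs can already get close to that hypothetical `p:=bool -> (p ? int : string)` signature by reifying the boolean at the type level; a sketch:

```ocaml
(* A singleton-style stand-in for bool: which constructor you pass
   fixes the type index, so the return type depends on the "boolean". *)
type _ branch = True_ : int branch | False_ : string branch

let foo : type a. a branch -> a = function
  | True_ -> 42
  | False_ -> "forty-two"
```

Of course, `branch` is not `bool` itself, so this is a workaround of its own; full dependent types would let the result type mention the actual run-time value.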
According to Benjamin C. Pierce in Types and Programming Languages:
"Being static, type systems are necessarily also conservative: they can categorically prove the absence of some bad program behaviors, but they cannot prove their presence, and hence they must also sometimes reject programs that actually behave well at run time. ... The tension between conservativity and expressiveness is a fundamental fact of life in the design of type systems. The desire to allow more programs to be typed -- by assigning more accurate types to their parts -- is the main force driving research in the field."[0]
In essence, while there is already a vast array of programs we know how to type, and while it's possible to work around cases that resist typing, type theory is still an active field of research with plenty of open questions yet to be answered. The corollary is that there are still valid reasons to prefer untyped languages to typed ones, as my sibling commenter pointed out. The day typing is solved is the day people can no longer prefer untyped languages without looking like bozos, but until that day comes, untyped languages remain fundamentally more expressive than typed ones (in the sense that they're fundamentally capable of expressing more programs[1]).
---
[0]: Pierce, Benjamin C. Types and Programming Languages. MIT Press, 2002. §1.1, pp. 2-3.
[1]: There's also a sense in which typed languages are more expressive than untyped ones, and that's the sense that they are capable of expressing more properties of the program as part of the program itself.
SBCL is based on Kaplan & Ullman flow-graph analysis (inherited from CMUCL), not Hindley-Milner. See also http://home.pipeline.com/~hbaker1/TInference.html (Nimble Type Inference, H. Baker). Apparently, Typed Racket's papers have other references about related work, albeit not explicitly those.
In OCaml, a function which throws an exception does not have a different type from one which doesn't, so you can get runtime errors even when you typecheck. In SBCL, the NIL type (bottom) effectively conveys the lack of a return value and is used when you call (ERROR ...) or simply have an infinite loop (LOOP). Things are different because the needs are different. Basically, types in ML must be disjoint, which is impractical in Lisp.
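For instance (a minimal illustration with made-up functions):

```ocaml
(* Both functions have type int -> int; the possibility that the
   second raises at run time is invisible in its type. *)
let ok n = n + 1
let risky n = if n < 0 then failwith "negative" else n + 1
```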
Is that the type of FOO or the return type of FOO? How are the types or number of arguments specified for function types?
> SBCL is based on Kaplan & Ullman flow-graph analysis (inherited from CMUCL), not Hindley-Milner. See also http://home.pipeline.com/~hbaker1/TInference.html (Nimble Type Inference, H. Baker). Apparently, Typed Racket's papers have other references about related work, albeit not explicitly those.
I've read some of Henry Baker's other work, and the man is brilliant. I wish he were more well-known.
This is no exception. However, it seems that this inferencer is useful primarily for enabling compiler optimizations rather than for performing type checking. Is that correct?
What I mean is, say you want to pass the result from FOO above into another function BAR which expects a numeric argument. It seems to me that BAR would need to have an argument of type (OR (INTEGER 42 42) (SIMPLE-ARRAY CHARACTER (9))) in order to guarantee type safety. Otherwise, the type checker couldn't reject a program that passes a string result of FOO as a numeric argument of BAR. Am I understanding this correctly?
> In OCaml, a function which throws an exception has not a different type than one which doesn't. So you can get runtime errors even when you typecheck.
Right, but I'm talking specifically about run-time type errors, i.e., errors that occur because some value was passed to an operation that does not handle values of such type. What I meant by my statement is that even with type checking, the programmer must handle all cases of a sum type when dealing with it in order for the program to be type safe (in the sense that not only does it not "go wrong" in the Milner sense, but also that it does not suffer a type error at run-time).
> Is that the type of FOO or the return type of FOO?
It is the type of FOO; it reads as: a function which takes one argument of any type (T) and returns exactly one value, which is either 42 or a string of length 9.
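(The code block this refers to appears to have been lost; below is a purely hypothetical reconstruction of XYZ, written only to be consistent with the description that follows and with the out-of-bounds warning quoted later in the thread.)

```lisp
;; Hypothetical reconstruction -- the original XYZ is missing.
;; Z is a 1024-byte vector; the return type, (UNSIGNED-BYTE 8),
;; is derived from the AREF on Z.
(defun xyz (x y z)
  (declare (type integer x)
           (type (float (0.0)) y) ; assumed stand-in for POSITIVE-FLOAT
           (type (simple-array (unsigned-byte 8) (1024)) z))
  (aref z (+ x (floor y))))
```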
Here there are three arguments, the third one being a vector of bytes of length 1024. Note that the return type was inferred from the inputs.
> This is no exception, however it seems that this inferencer is useful primarily for enabling compiler optimizations rather than for performing type checking. Is that correct?
It is a mix of both, really.
CL is primarily designed to be dynamic. Static analyses are used to optimize code and to prevent classes of errors when they can be detected in advance. If you define detecting a type error as a positive test, then SBCL allows false negatives. That happens in cases where the expected and actual types overlap: there might be an error, or there might not, so the actual check is delegated to runtime.
Another thing is that with global functions (defun), there seems to be some widening happening, for the return type in particular, so that (OR INTEGER STRING) is treated as T. This does not happen with inline or local functions. Note also that global functions can be called from anywhere, can be redefined (except standard ones), and are generally responsible for checking their arguments, except when you explicitly turn the safety knob down and add type declarations.
So let's say that XYZ above is declared to be inlined, and we use it as follows:
(defun use-xyz-1 (x y z)
  (declare (type positive-float y)
           (type (integer 0 3000) x))
  (xyz x y z))
The above is compiled without problems, even though you could give values which would make an out-of-bounds access. However, you will surely agree that there are theoretical limits to static type checking, so it is expected that not all expressions can be typed in CL as precisely as you could wish (of course, going up the lattice, functions accept type T arguments). However, when you change type declarations so that the intersection of expected/actual types is empty:
(defun use-xyz-2 (x y z)
  (declare (type positive-float y)
           (type (integer 2000 3000) x))
  (xyz x y z))
... the compiler warns you that:
;; Derived type (INTEGER 2000 4611686018427387900) is not a suitable
;; index for (VECTOR (UNSIGNED-BYTE 8) 1024)
The way SBCL treats declarations is that they are used as assertions (except for return types in global declarations; see the manual).
So what does it mean to treat declarations as assertions? Here is FOO:
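(FOO's definition appears to be missing from the thread; here is a hypothetical sketch matching the description that follows.)

```lisp
;; Hypothetical reconstruction -- the original FOO is missing.
(defun foo (x)
  (declare (type float x))
  ;; N copies of #\#, where N is computed from the float input.
  (make-string (floor x) :initial-element #\#))
```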
It makes a new string made of N times character "#", where N is computed from the float input. Then, we call FOO from BAR:
(defun bar (x)
  (foo x)
  (typecase x
    (float 0)
    (integer 1)
    (t 2)))
The unique value returned by BAR is of type `(integer 0 0)`, because knowing that `(foo x)` succeeds allows us to conclude that X was indeed a FLOAT, and thus the TYPECASE expression necessarily returns zero. Declarations, assertions, etc. can be used by the compiler.
Note that the equivalent (w.r.t. return value) function below has a different type:
(defun bar (x)
  (prog1
      (typecase x
        (float 0)
        (integer 1)
        (t 2))
    (foo x)))
This time, the return type is (MOD 3), i.e. the set {0,1,2}, even though propagation could be applied backward. However, backward propagation seems to pose problems w.r.t. the CL standard, at least that's what is said here (a good reference, by the way):
Static typing in SBCL gives something I had not yet witnessed in other languages. I defined a state machine with local functions, roughly as follows:
(defun sm ()
  (let ((state))
    (labels ((a () (setf state #'b))
             (b () (setf state #'c))
             (c () (if (plusp (random 2))
                       (setf state #'d)
                       (setf state #'b)))
             (d () (setf state #'a))
             (e () (return-from sm)))
      (setf state #'a) ; enter the machine in state A
      (loop (funcall state)))))
So the local variable "state" holds the current function. And so, this compiles (and the actual, longer code did as well) with a note saying "deleting unused function (LABELS E :IN SM)", because "state" is known to never reach E. By the way, the function SM never returns normally, as conveyed by the NIL return type. That was a useful and unexpected thing to notice.
Wow! That's a much more detailed response than I'd expected. Thank you! I wish I could upvote you twice! :)
I've used Common Lisp and even SBCL quite a bit, and had no idea SBCL could do some of this stuff. That last example is particularly impressive, given how hairy control-flow analysis can get when higher-order functions are involved.
Interesting! Such an extensional definition of a type could be really flexible, but seems like it would be limited to finite types. Is this the case? Or can you specify types with arbitrary judgments?
It's only limited in that you must be able to express the rule using the Shen lisp language. The datatypes are sequents, essentially propositions, that the type checker attempts to prove true.
For example, I could define odd integers like this:
(datatype odd-integer
  if (integer? X)
  if (odd? X)
  ________________
  X : odd-integer;)
This is quite a nice write-up, thank you. Regarding this point:
>because what we'd really like is a way to specify that the type of `foo` depends on the result of `p?`
Is this situation not resolved by allowing p? to not be a bool necessarily but also a sum type that informs foo which case of a different sum type to return?
Anyway, I like your footnote [1], which, for me, is what allows a typed program to actually be more expressive: you can literally express the types in your head as you code, which is difficult or simply not supported in a dynamic language, even though most developers will be "thinking in types" regardless of the language's typing.
Now, "expressive" also often means "elegant" because you can express an idea easily with less verbosity in a dynamic language.
A language with type inference is an interesting medium that sort of removes the expressivity of explicit type annotations while gaining the expressivity of a dynamic language in some respects.
> Is this situation not resolved by allowing p? to not be a bool necessarily but also a sum type that informs foo which case of a different sum type to return?
In the example I provided, the `bool -> (int * string) either` function is essentially a map from a `bool` to an `(int * string) either`.
From a type-theoretic perspective, bool (which has two nullary type constructors, `true` and `false`) can be expressed as the sum of two units: `1 + 1` (or equivalently, `() + ()` or `unit + unit`) = `2`. If I understand your question correctly, then you're asking if we could just use some other sum type to express the branch to take, like: `type if-branch = consequent | alternative;`, and the answer is that such a type is precisely isomorphic to a Boolean -- because it consists of two nullary constructors, it can also be expressed as the sum of two units, `1 + 1` = `2`.
To answer your question more directly, `p?` already is a sum type that informs `foo` which case to return: `type bool = true | false;`.
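To make the isomorphism concrete, here is a sketch (the `if_branch` type is my rendering of the hypothetical type above):

```ocaml
type if_branch = Consequent | Alternative

(* These two maps compose to the identity in both directions,
   witnessing if_branch ~ bool: both types are 1 + 1 = 2. *)
let of_bool b = if b then Consequent else Alternative
let to_bool = function Consequent -> true | Alternative -> false
```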
> Anyway, I like your footnote [1] which, for me, is what allows a typed program to actually be more expressive, in the sense that you can literally express the types in your head as you code, which is difficult or simply not provided in a dynamic language, despite that most developers will be "thinking in types" regardless of typing of the language.
"Expressive" can mean any of quite a few different things, so I wanted to be specific about which I meant. In this case, I mean that untyped languages are more expressive because they're capable of expressing a greater number of programs than their typed counterparts. The ultimate goal of type theory is to eliminate that gap, but short of some miracle revelation, we've resigned ourselves to gradually filling it in.
Note that this isn't the same as saying you can express a program more elegantly or less verbosely in an untyped language -- there are some programs that are strictly not expressible with current type systems. That's on purpose: the point of a type system is to reject programs which have bugs caused by type errors, so we make those programs inexpressible deliberately, and it's usually a good thing to be less expressive in that way. The problem is that our type systems sometimes reject bug-free programs that would actually run totally fine, simply because they couldn't prove that this was the case. Again, type theorists are working to improve the situation.
What it comes down to is a question of what is being expressed. Taking programs as descriptions of processes, at the lowest level a language allows us to express a run-time process. Going up one level to the level of types, we can express information about the program itself. Thus, in a sense, by using a typed language we trade in the expressiveness of some processes for the expressiveness of some metaprocesses.
[1] - http://www.shenlanguage.org/learn-shen/types/types_sequent_c...