Deca - a systems language based on modern PL principles

Dn_Ab · on Jan 2, 2012

I keep a (small) list of active awesome and interesting alternatives to C. Cyclone has been mentioned already, adding Deca now. the other two I know are:

* ATS http://www.ats-lang.org/#what_is_ats_good_for

* Clay http://claylabs.com/clay/

ehsanu1 · on Jan 2, 2012

You may be interested in this (see the section about Nothing and Maru): http://www.vpri.org/pdf/tr2011004_steps11.pdf

A lot of work there has been done in completely replacing the entire current software development stack with something more reasonable (in terms of size/bloat and design). They aim for a full system (frontend user apps to the metal) in under 20,000 lines of code. Check it out. Quite the alternative to C, and everything.

Dn_Ab · on Jan 2, 2012

Thanks yeah, I've heard of VPRI and read a few of thier papers and seen some of their videos. Alan Kay is one of my heroes and I really do think that the idea of breaking down problems to domains and working within that is very important going forward.

nponeccop · on Jan 2, 2012

I'm shamelessly adding my own HNC here :) http://code.google.com/p/inv/ https://github.com/kayuri/HNC/wiki I'm moving to GitHub - so 2 repos. "C++ is an evil language used at early stages of HNC development", so it will go away. See also discussions on reddit (submitted not by me lol) http://www.reddit.com/r/haskell/comments/mldzq/hnspl_a_bette... and here on HN long time ago http://news.ycombinator.com/item?id=2116337 (submitted by me)

nponeccop · on Jan 2, 2012

See also Vala, Mercury, Felix and Haxe.

nickik · on Jan 2, 2012

Dont you know about Rust? If you do why is it not on the list?

rntz · on Jan 2, 2012

Rust, while interesting and admirable, isn't quite as low-level as C or Deca. For example, one of Deca's explicit goals is to not require a garbage collector or a runtime library. Rust has GC and a light-weight-threading runtime. So it's not quite a replacement for C, though it's certainly a good replacement for much of the application-level programming C and C++ are (unfortunately) widely used for.

nickik · on Jan 2, 2012

Ok, thx.

Rust certently has a runtime, cant argue with that.

I thought Rust allows a pretty tight control over allocation with the pointer hierarchy feature. The GC is designed not to be global and should be pretty easy ignore. They even have diffrent kinds lambdas to allow the programmer if the should be allocated on the heap or the stack.

I have not done alot of low level programmig but I want to learn more about writting Jits and the pretty much need to be rather low level atm there is not really an alternative then writting them in C (witch sucks).

Dn_Ab · on Jan 2, 2012

Yes I know of Rust and think it's very interesting. For most system level programming it should be an excellent choice. It's great that the dearth of systems capable languages is something that is becoming less of an issue.

But when i was making the list I was thinking about if the language could possibly be used in an embed context. C is still king there. Rust I believe, is partially garbage collected in a way that is not practically avoidable. Any use of boxed types or abstract datatypes will incur this penalty. So with that criterion rust is not an option, in an embed system continuous execution is most vital. The languages I listed are either not garbage collected or with an ability to turn it off or use an alternative to abstract datatypes and have basically no overhead. The last time I saw Rust mentioned on HN a team member validly stated that such concerns are not the priority at the moment.

obtu · on Jan 2, 2012

Garbage-collected languages can still be fast on microcontrollers. When there isn't a lot of memory it only takes a few milliseconds to run a collection cycle. The problem would be when you need real-time and can't deal with the jitter.

pjmlp · on Jan 3, 2012

There are real time GC working on mission critical embedded scenarios in real life.

One example is one of the french radar system for missile defense, Normandie.

nponeccop · on Jan 2, 2012

Can modern garbage collectors deal with almost full heaps? There is an urban legend that GC only shows good performance with 50% of memory free, which is certainly unacceptable for microcontrollers. Also, most garbage collectors require boxing which prevents tight packing of heap data. Are all these problems already solved somewhere? Jitter is the least important issue as there are a lot of works on realtime collectors.

obtu · on Jan 2, 2012

Unfortunately I've no idea how the various GCs deal with the trade-offs; my point is that (memory bandwidth / memory size) in these embedded things is high enough that garbage-collection is fast, contrary to popular wisdom. I was looking at OCAPIC for an example, which has a very simple stop and copy gc which takes 1.5ms to collect. The trade-offs would have some 2× impact on some metric, but not change the feasibility of the thing.

nponeccop · on Jan 2, 2012

My point is that 2x slowdown is critical for many systems - embedded, gamedev, HPC, systems software. Controlled measurement of slow-down and heap slack caused by GC is beyond capabilities of most embedded devs. Academia provided us with some experimental results saying that in half-empty heaps GC is lightning fast, but there's no (recent) data for 90% full heaps and for very large heaps.

wbhart · on Jan 2, 2012

It's great to see a language which doesn't ignore 20th and 21st century programming language theory. It's tremendously boring waiting for the future to happen.... Anyhow, the type system looks quite interesting.

The choice of LLVM for C interface and performance is also really sensible.

I was surprised to find the language implemented in Scala and Java though. Perhaps there will be some kind of bootstrap, or maybe the ability to access the java libraries is intentional.

Without proper documentation it is a bit hard to evaluate this more fully at this stage. I looked but didn't find many docs yet. Certainly a language to watch.

eli_gottlieb · on Jan 2, 2012

The language is a compiled language, with the JVM being used (originally) because I wanted a particular parser generator. Deca code can't access Java libraries because Deca code compiles to LLVM bitcode and thence to machine code. decac is a compiler, not an interpreter, so it takes the Deca code in and outputs the LLVM bitcode.

Sorry about the lack of docs. I've been slowly dumping my undergraduate thesis on this into the wiki, but the language has been evolving and I've been starting a career. Oy ;-).

_Y_ · on Jan 2, 2012

Sorry to nag, but how is outputting to LLVM bitcode not interpreting? You are taking source code and turning it into intermediate form (i.e. LLVM bitcode), no?

skrebbel · on Jan 2, 2012

An interpreter means a program that runs a program, straight from its source code.

A compiler transforms a program from one format into another. decac does not execute programs, so it's not an interpreter.

nponeccop · on Jan 2, 2012

LLVM bytecode does not exist after linking - it turns into usual machine code. It is not even a JIT compiler, talk less of an interpreter.

eternalban · on Jan 2, 2012

I greped for "core", "memory manager", "thread", "threading", and "cache", in the pdf [1]. Am I missing something? I'll probably get flak for this, but this programmer looks for a modern systems language that directly addresses these concerns.

"The fundamental problems of a systems-programming task or environment are hard limits on computational resources and a lack of safety protections. Systems programs have to deal with hardware-imposed limitations on their CPU time, registers, memory space, storage space on I/O devices, and I/O bandwidth. They also often have to deal with a lack of the safety features normally given to most programming environments: garbage-collection of memory, validation of memory accesses, synchronization primitives, abstracted access to I/O devices, and transparent multitasking. In fact, the point of systems programming is usually to create such abstractions and program such protections."

Let's take Go, as an example. (Or D). Per above definition, neither is a "systems programming" language. I know for a fact [2] that Go team would disagree.

So what is the accepted definition of a "modern" systems programming language?

[1]: http://code.google.com/p/decac/downloads/detail?name=Deca%20...

[2]: http://golang.org/doc/go_faq.html#What_is_the_purpose_of_the...

smosher · on Jan 2, 2012

I've always taken "systems language" to mean something you might write an OS in while the Go team takes it to mean something you'd replace Java with. Whichever way you want to look at it one thing the Go team can't say is that writing an OS is a "non-systems" problem.

Ideally these would be disjoint sets (in my opinion) but consider this: ASM -> C -> C++ -> Java -> ... looks like a continuum if you're writing any kind of Java code, but there's a sharp discontinuity in there if you're writing (say) a boot loader.

eternalban · on Jan 2, 2012

> Whichever way you want to look at it one thing the Go team can't say is that writing an OS is a "non-systems" problem.

I do agree with that. (Perhaps it would be helpful to apply the high/low qualifier to systems languages?)

> Ideally these would be disjoint sets (in my opinion) but consider this: ASM -> C -> C++ -> Java -> ... looks like a continuum if you're writing any kind of Java code, but there's a sharp discontinuity in there if you're writing (say) a boot loader.

Food for thought. I am not arguing for needless complexity (in a language) as far as the boot up phase is concerned. As far as I understand it, the loader has its critical, but short lived, life-cycle role to play and then it is out of the way. Arguably, it is distinct from the OS. Can we not continue to write them in ASM/C and load operating systems that are written in a more "modern" language?

smosher · on Jan 2, 2012

Personally I'd sooner map the somewhat fuzzier concept of "applications language" onto that supposed continuum instead. It would put Go in a clearer category, since it seems to compete with Java and Python better than it does with C.

I still don't imagine an OS being written in a non-systems application language, since you've got drivers and performance critical subsystems to write and things like GC to worry about from the apps-only languages.

pjmlp · on Jan 3, 2012

Have a look at Spin or Native Oberon, just to name two operating systems written in GC enabled languages.

Before C existed, there were already operating systems written in PL/I and ALGOL, which provide better type safety and memory management as C or C++.

Unix and C success has regressed what meant to be a proper systems programming language.

There is even a paper from Adele Goldberg about this phenomena, unfortunately I cannot recall the title now.

wbhart · on Jan 2, 2012

One of the examples in the repository is an implementation of malloc.

I tend to think of Go as an attempt to redefine what a systems programming language is (hence the "modern").

I don't think that there can be an established definition of what a modern systems language is, otherwise it would no longer be termed modern. Almost by definition theorists will differ on what the essential characteristics must be.

eternalban · on Jan 2, 2012

> One of the examples in the repository is an implementation of malloc.

Right. And (imho) malloc is quite possibly an "old fashioned" way of looking at things:

   function malloc(num_bytes: nat): @byte

A reference to a byte block obtained from specifying the number of bytes! I would like to see the modern memory manger be type aware, have a very rich memory model, and allow for the propagation of application level semantics to OS (so they can cooperate in dealing with the current issue: memory access latencies and concurrency).

> I don't think that there can be an established definition of [a] modern systems language

Key word is "modern" so here is an attempt to box that a bit: There are 2 possible dimensions to the notion of modern:

1 - it addresses new hardware realities

2 - it enables "progress" in dealing with matter rendered difficult by existing paradigms of system programming.

The realities of a modern system surely include: multi-core SMP; strong likelihood to be employed as distributed nodes; virtualization. So "modern" platforms and deployment patterns are not addressed.

Second, Deca, I gather, would delegate dealing with all this to some higher level (user level) framework built on top of the language. I don't see any progress in that critical front, either.

eli_gottlieb · on Jan 2, 2012

Deca does deal with SMP and concurrency in one way: module-level variables are tagged thread-local in the bitcode by default, at least if they're mutable.

Once the "newsig" branch (it's a rewrite from scratch after the original code had troubles with multi-module compilation) with the new signature system (region, effect, type, mutability) can compile some example code again, I'm merging it back into the main branch.

rdtsc · on Jan 2, 2012

> I would like to see the modern memory manger be type aware

Because these are system languages, that would happen when hardware is type aware.

nponeccop · on Jan 2, 2012

It is not necessary. With software replacement for MMU possible (see Singularity OS), hardly any hardware safety support is a requirement nowadays. TAL is the way to go - AFAIK it is possible to make plain x86 assembly dependently typable by attaching proof witnesses to EXE.

elehack · on Jan 2, 2012

Nice to see something actually trying a CLOS-style object system. There are good ideas in there that don't get enough presence in more recent language efforts.

nickik · on Jan 2, 2012

Look at Dylan, stuff started happening again.

http://opendylan.org/

nponeccop · on Jan 2, 2012

I wonder what happened to SELF-like object systems (slots, prototypes and all that).

fanf2 · on Jan 2, 2012

JavaScript

nponeccop · on Jan 2, 2012

I mean high-performance implementations. Techniques to (statically) compile prototypes into efficient code were developed, but they are largely irrelevant to Javascript. High-performance modern implementations of JS rely on tracing JIT.

cyrus_ · on Jan 12, 2012

Pretty sure polymorphic inline caches are standard fare in both Javascript and other VMs (JVM, CLR, etc.). What other techniques are you referring to?

pwpwp · on Jan 1, 2012

Looks interesting. When I read "modern PL principles" I was afraid to read about auto-generated getters and setters or something. :P

nponeccop · on Jan 1, 2012

Looks like you have never took academia papers seriously :)

andrewflnr · on Jan 2, 2012

I see that "lispy macros" will not be supported, but one of the things that bugs me about C (in my brief experience) is the verbosity and repititiousness. Good generics will help, but I'd still like to see a better macro system than the C preprocessor.

One possible feature that stands out would be macros local to a scope. I actually did this, defined a macro right in the middle of a function to automate some error-handling junk. It felt nasty, but not quite as nasty as copy-pasting or retyping the code, as long as I don't try to re-use the name. It would be nice if my language handled stuff like this.

This is obviously not a huge issue. The language looks awesome.

obtu · on Jan 2, 2012

Macros can be band-aid to overly verbose languages, but plenty of languages are concise without (whereas C is particularly bad). Higher order functions and an expressive type system (features it shares with OCaml, Oz, etc) are where the magic is.

zem · on Jan 2, 2012

ltu discussion: lambda-the-ultimate.org/deca

agumonkey · on Jan 1, 2012

Finally a successor to the now frozen bitc

nponeccop · on Jan 1, 2012

What about Cyclone?

agumonkey · on Jan 2, 2012

I'm not sure I've even heard the name once. Looks interesting , gradually safer than C, I like that.

nponeccop · on Jan 2, 2012

Well, BitC is not too popular either. Another interesting experiment is Sing#/Singularity, which is a successful attempt to get IO performance of FreeBSD despite designing whole OS in modified dependently-typed C# with usual JIT and garbage collection. The question whether GC is applicable in memory-constrained environments is still open - I couldn't find a single research in this direction.

mvzink · on Jan 1, 2012

Looks awesome, especially the type system.

Check out http://code.google.com/p/decac/source/browse/examples/list.d...

I'd love to see some discussions on possible/practical shortcomings.

eli_gottlieb · on Jan 2, 2012

The "newsig" branch is current development. That old stuff still has older syntax and semantics.

nponeccop · on Jan 1, 2012

The type system looks novel, but it doesn't provide controlled effects or safety as the system for Disciple language described in the "Type inference and optimization in impure world" paper. So it doesn't qualify as 'awesome' IMO

eli_gottlieb · on Jan 2, 2012

The docs are out of date. I read the Disciple papers about a month after finishing the thesis, and immediately began working out how to add "back" (regions were always intended) regions and effects.

nponeccop · on Jan 2, 2012

Yeah, I have the same story. I have long wanted a "mutable" type system for my HNC project ( https://github.com/kayuri/HNC/wiki, http://code.google.com/p/inv/ ) - it was here on HackerNews about 1 year ago but got little to no interest. I asked on cstheory.stackexchange.com but got no useful answers. And then someone pointed my attention to DDC. I knew about it but I thought it was just an eager revamping of Haskell, just like many other eager haskells (speculative execution etc) out there.

andrewcooke · on Jan 1, 2012

how do you combine first class functions (and so closures) with no garbage collection? don't you end up with possibly multiple references to memory and no way of knowing who owns it?

gsg · on Jan 2, 2012

If you have a function that returns a function, basically you need to rewrite things to lift the referenced state into the outer scope. In pseudo-C:

    (void -> int) incrementer(int initial_value) {
	int count = initial_value;
	return lambda() { return count++; }
    }

might be rewritten as:

    struct hidden_environment_type {
	int count;
    }

    int incrementer_lambda(hidden_environment_type *env) {
	return env->count++;
    }

    void incrementer(hidden_environment_type *hidden_arg, int initial_value) {
	hidden_arg->count = initial_value;
    }

Now the caller will allocate a variable of type hidden_environment_type, and pass its address as the first (artificial) argument.

Note that lambda functions require a different calling convention to C function pointers, since they need to be passed a pointer to their environment in addition to their regular arguments. Introducing lambdas either requires some resolution of this complication, such as specialisation machinery (ie in C++11, templates provide the necessary specialisation). Another option is to use the environment-passing convention for every indirect call, possibly giving a minor performance hit for calling raw function pointers.

Another issue is that the size of an environment type depends on the code of a function involving it, so lambdas don't have uniform size unless some additional indirection is introduced, probably to heap storage (Apple's blocks extension to C takes this approach). This makes it either impossible or slightly more expensive to store lambda functions in data structures, depending on which implementation you choose.

After all this, your ownership/lifetime question boils down to "who is holding onto the instance of hidden_environment_type?", which is a pretty simple question to answer in C-like languages.

nponeccop · on Jan 1, 2012

If you read the dissertation, it turns out that variables are captured by value. But even if they are captured by reference - solutions are possible such as the one in C++11. The decac doesn't provide more memory safety than C, so there are no problems :) Yet another solution is to inline all higher-order functions so no closures exist at runtime.

eli_gottlieb · on Jan 2, 2012

Actually, this issue is what spurred a major evolution in the language. I hadn't had a real region or effect system before, and now I do. Why? I needed some way to register what pointers (regions) are getting captured by an existential package (aka: a lexical closure). Since regions are now registered as part of an existential package, the regions captured by a package can be constrained to suit the region of the stack up to which a package is being returned.

In short, closures now remember what pointers they capture, and you won't be allowed to return a closure over a pointer further up the stack than safe. Other than that, closures capture by value, copying the captured variable into an environment structure.

olliesaunders · on Jan 1, 2012

Isn’t the name of the language Decac?

nponeccop · on Jan 1, 2012

The dissertation says it's Deca. DecaC stands for Deca Compiler.

tom626 · on Jan 2, 2012

They should have used more C-like syntax.

skucherenko · on Jan 2, 2012

http://bikeshed.com/