These are very good optimisations indeed. Inlining adds a lot more speed than people think.
I don't understand why people want closures in their languages. You basically want a function that carries its own scope around with it... I don't get what the big deal is.
One needs closures, that is, function values bound to some context, for higher order functions (HOFs). For example, in C++ much of the STL was pretty useless until C++11 added lambdas. Many systems provide closures in C by requiring you to register callbacks as event handlers, and these callbacks invariably carry an associated client-data pointer. A C callback function together with a pointer to an arbitrary client-data object is a hand-constructed closure.
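In a language with first-class functions, the callback and its client-data pointer collapse into a single value. A minimal Python sketch (the names `make_counter` and `bump` are my own, purely illustrative):

```python
def make_counter(start):
    # 'count' is the captured context: the closure is a
    # function value bound to this variable.
    count = start
    def bump(step):
        nonlocal count
        count += step
        return count
    return bump

c = make_counter(10)
print(c(1))   # 11
print(c(5))   # 16
```

The returned `bump` is exactly the callback-plus-client-data pair, but packaged by the language rather than by hand.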
The utility of closures derives from the ability to split a data context into two pieces and program the two halves separately and independently. For example, for many data structures you can write a visitor function which accepts a closure and calls it with each visited value. Let's say we have N data structures.
Independently, you can write different calculations over those values, formed incrementally one value at a time, such as addition or multiplication. Let's say we have M operations.
Now you can perform N * M distinct calculations whilst writing only N + M functions. You have achieved this because both halves of the computation are functions: lambda abstraction reduces a quadratic problem to a linear one.
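Here is a sketch of the N + M split in Python, with N = 2 visitors and M = 2 incremental calculations (all names are mine, for illustration only):

```python
# Two data structures, each with a visitor HOF (N = 2).
def visit_list(xs, f):
    for x in xs:
        f(x)

def visit_tree(t, f):          # t is (left, value, right) or None
    if t is not None:
        left, value, right = t
        visit_tree(left, f)
        f(value)
        visit_tree(right, f)

# Two incremental calculations packaged as closures (M = 2).
# Each returns a step function plus a mutable accumulator cell.
def make_sum():
    acc = [0]
    def step(x): acc[0] += x
    return step, acc

def make_product():
    acc = [1]
    def step(x): acc[0] *= x
    return step, acc

# Any of the N * M = 4 combinations works, yet only
# N + M = 4 functions were written.
step, acc = make_sum()
visit_list([1, 2, 3], step)
print(acc[0])                  # 6

step, acc = make_product()
visit_tree((None, 2, (None, 3, None)), step)
print(acc[0])                  # 6
```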
The downside is that the abstraction gets in the way of the compiler recombining the visitor HOF and the calculation function to generate more efficient code.
There's another, more serious, structural problem though. The client of a HOF is a callback. In C this is very lame because plain functions have no state. In more advanced languages callbacks can have state. That's an improvement, but it isn't good enough, because it's still a callback, and callbacks are unusable for anything complex.
A callback is a slave; the HOF that calls it is a master. Even when the slave's context is a finite state machine, for which there is theory telling you how to maintain the state, doing so by hand is very hard. What you really want is for the calculation half of the problem to be a master, just like the HOF itself: you want your calculation to read its data, maintaining its state in the first instance on the stack.
Many people do not believe this and answer that they have no problems with callbacks, but these people are very ignorant. Just try to write a simple program that is called with data from a file instead of reading the file. You might think an operating system is just one big HOF that calls back into your program with stream data, but there is an important difference: both your program and the operating system are masters. Your program reads the data; it isn't a function that accepts the data as an argument.
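The contrast shows up even on a toy problem: counting occurrences of "ab" in a character stream. The reading version keeps its position in the pattern in the program counter; the callback version must reify that position as explicit state carried between invocations. A Python sketch (the API names are invented):

```python
# Master style: the program reads; where it is in the pattern
# is held by the program counter, not by a state variable.
def count_ab(next_char):
    count = 0
    ch = next_char()
    while ch:
        if ch == 'a':
            ch = next_char()
            if ch == 'b':
                count += 1
                ch = next_char()
        else:
            ch = next_char()
    return count

# Slave style: the same logic as a callback must carry its
# position in the pattern as explicit state across calls.
class AbCounter:
    def __init__(self):
        self.count = 0
        self.after_a = False
    def on_char(self, ch):     # invoked by some event loop / HOF
        if self.after_a and ch == 'b':
            self.count += 1
            self.after_a = False
        else:
            self.after_a = (ch == 'a')

it = iter("xabyaab")
print(count_ab(lambda: next(it, '')))    # 2

counter = AbCounter()
for ch in "xabyaab":
    counter.on_char(ch)
print(counter.count)                     # 2
```

With a single boolean the state machine is manageable; try inverting a recursive-descent parser this way and the explicit state explodes.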
Another common example of master/master programming is the client/server paradigm, usually implemented with two threads or processes and a communication link.
Felix's special ability is high-performance control inversion: it allows you to write threads which read data, and mechanically translates them into callbacks.
Some programming languages can do this in a limited context. For example, Python has generators: you can write functions that yield values without losing their context, but this only works in the special case of a sequential visitor.
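The Python special case can be sketched with a coroutine: the calculation "reads" via `yield`, and its state survives between callbacks in the suspended stack frame (the name `running_average` is illustrative):

```python
def running_average():
    # Reads values as if it were a master: the state
    # (total, n) lives in the generator's suspended frame.
    total = 0.0
    n = 0
    avg = None
    while True:
        x = yield avg      # 'read' the next value
        total += x
        n += 1
        avg = total / n

g = running_average()
next(g)             # prime the coroutine
print(g.send(4))    # 4.0
print(g.send(8))    # 6.0
```

The driver pushes values in with `send`, yet the calculation is written as if it were pulling them, which is exactly the inversion being described, restricted to a single sequential stream.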
Felix can do this in general. It was actually designed to support monitoring half a million phone calls with threads that could perform complex calculations such as minimal-cost routing. At the time no OS could come close to launching that number of pre-emptive threads, yet Felix could do the job on a 1990s desktop PC, with a transaction rate around 500K/sec, where a transaction was either a thread creation or the sending of a null message.