Depends on the tools... it's usually done in C or C++ with low-level locking primitives, and that can be pretty tricky to make efficient. You can easily write code that's correct but slow, or fast but error-prone.
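To make the "correct and slow" side concrete, here's a rough sketch of a shared counter in C++. The function names (`count_with_mutex`, `count_with_local_sums`) are made up for illustration and the workload is a toy, so treat it as a picture of the trade-off rather than a benchmark. The first version takes one lock per increment, which is easy to get right but serializes the threads. The second has each thread count privately and publish once, which is usually faster but only works because the updates happen to be independent.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Correct but slow: every increment takes the same lock,
// so the threads spend most of their time waiting on each other.
std::uint64_t count_with_mutex(unsigned threads, std::uint64_t per_thread) {
    assert(threads > 0);
    std::uint64_t total = 0;
    std::mutex m;
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&total, &m, per_thread] {
            for (std::uint64_t i = 0; i < per_thread; ++i) {
                std::lock_guard<std::mutex> guard(m);
                ++total;
            }
        });
    }
    for (auto& th : pool) th.join();
    return total;
}

// Usually faster: each thread accumulates into a private variable
// and touches shared state exactly once, via a single atomic add.
std::uint64_t count_with_local_sums(unsigned threads, std::uint64_t per_thread) {
    assert(threads > 0);
    std::atomic<std::uint64_t> total{0};
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&total, per_thread] {
            std::uint64_t local = 0;
            for (std::uint64_t i = 0; i < per_thread; ++i) {
                ++local;
            }
            // Relaxed ordering is enough here: join() below synchronizes
            // with each thread before the final value is read.
            total.fetch_add(local, std::memory_order_relaxed);
        });
    }
    for (auto& th : pool) th.join();
    return total.load();
}
```

The "fast and error-prone" variant is what you get by dropping the lock in the first function and incrementing `total` directly from every thread: that is a data race, undefined behavior in C++, and it will often appear to work in testing. On most toolchains this needs to be built with thread support (for example `-pthread` with GCC or Clang).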
IMO there's a lot of room for improvement on the tools side. I don't think a new architecture is needed/warranted given the improvement we could get just from better programming models and tools.
Eh, it is really not that bad.