True, but I suspect that without a truly global, prescient scheduler it is almost never worth switching cores unless your tasks are generally very long.

For a core context switch to pay off, the scheduler must accurately predict two things: that the source (current) core won't become free for the full duration of the context switch, and that the sink (destination) core will be free by the time the meta context gets there and will still be free when the rest of the context arrives. Otherwise the scheduler ends up thrashing the CPU. (It is actually a bit worse than that: a future task might need the same context, so the scheduler also has to be aware of the future.) So, to know all this, the scheduler would need to be:

- Global: the only scheduler on the system, or basically running Raft-style consensus with all the other schedulers on the system.

- Prescient: the scheduler(s) would need to predict all tasks, their contexts, and the work time per task perfectly, which could really only happen when everything is static and hence deterministic.
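To make the conditions above concrete, here is a minimal Python sketch of the check a prescient scheduler would have to run before moving a task. It assumes the scheduler already knows, as predictions measured from "now", when the source core frees up, when the sink core frees up and when it gets claimed again, and how long the meta context and the full context take to transfer. `MigrationEstimate` and `migration_pays_off` are hypothetical names, and "still free when the rest arrives" is my reading of the condition. It deliberately ignores the "future tasks might need the same context" problem.

```python
from dataclasses import dataclass


@dataclass
class MigrationEstimate:
    # Hypothetical predictions, all in the same time unit and measured from "now".
    source_free_at: float      # when the current (source) core would next be free
    sink_free_at: float        # when the destination (sink) core becomes free
    sink_busy_again_at: float  # when the sink core would be claimed by other work
    meta_transfer: float       # time for the "meta" context to reach the sink
    full_transfer: float       # time for the rest of the context to reach the sink


def migration_pays_off(e: MigrationEstimate) -> bool:
    """Return True only if switching cores beats waiting on the source core."""
    # 1. The source core must stay busy for the whole switch; if it frees up
    #    sooner, simply waiting for it is cheaper.
    if e.source_free_at < e.full_transfer:
        return False
    # 2. The sink must already be free when the meta context arrives...
    if e.sink_free_at > e.meta_transfer:
        return False
    # 3. ...and must not have been grabbed by other work before the rest arrives.
    if e.sink_busy_again_at < e.full_transfer:
        return False
    return True


if __name__ == "__main__":
    print(migration_pays_off(MigrationEstimate(100, 0, 1000, 5, 20)))  # True
    print(migration_pays_off(MigrationEstimate(10, 0, 1000, 5, 20)))   # False
```

The point is that every field is a prediction. A real scheduler facing unpredictable tasks has none of these numbers, so each time it runs a check like this it is guessing.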

For example, I think most of the tasks people are throwing at async are web requests. Most of them take the core an order of magnitude less time to compute than it takes to pass their context from one core to another, and they are all unpredictable to the scheduler. In this scenario I could see the scheduler taking up the majority of the computational time on the system. So turn on multi-threading + async on a quad core and, for all your pains, you will get worse bandwidth and latency (always).
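Some back-of-the-envelope arithmetic for the web-request case, as a Python sketch. The numbers are hypothetical (50 µs of work versus 500 µs to move the context, i.e. the order-of-magnitude gap above), and the model generously assumes the scheduler knows exactly how long everything takes. The function names are made up for illustration.

```python
# Hypothetical numbers: a ~50 us request handler and ~500 us to move its
# context to another core. Real values vary widely by hardware and workload.
TASK_US = 50
MIGRATION_US = 500


def finish_if_waiting(queue_ahead_us: list[int], task_us: int) -> int:
    """Finish time if the task stays queued on its current (source) core."""
    return sum(queue_ahead_us) + task_us


def finish_if_migrating(migration_us: int, task_us: int, sink_free_in_us: int = 0) -> int:
    """Finish time if the task moves to another (sink) core.

    The task can only start once its context has arrived and the sink is free.
    """
    return max(migration_us, sink_free_in_us) + task_us


def break_even_depth(task_us: int, migration_us: int) -> int:
    """Smallest number of equal-length tasks queued ahead for which migrating
    to an already-idle core finishes strictly earlier than waiting."""
    # Waiting wins or ties while depth * task_us <= migration_us.
    return migration_us // task_us + 1


if __name__ == "__main__":
    queue = [TASK_US] * 3  # three similar requests already queued on the source core
    print("wait:   ", finish_if_waiting(queue, TASK_US), "us")
    print("migrate:", finish_if_migrating(MIGRATION_US, TASK_US), "us")
    print("break-even queue depth:", break_even_depth(TASK_US, MIGRATION_US))
```

With those numbers, waiting behind three queued requests finishes at 200 µs while migrating finishes at 550 µs, and migrating to an idle core only wins once more than 10 equal requests are queued ahead. That is with perfect knowledge; with unpredictable request times the scheduler can't even compute this reliably.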

EDIT: Although this single data point suggests I am wrong (see the video description):

https://www.youtube.com/watch?v=IG-wGXENTt8