I think you are misapplying that paper? This as a library is the "batteries" to C++'s no-batteries-included standard library which does not implement asynchronous coroutines at all.
The paper is much more on the side of application and system performance. But you couldn't even write such a system without a library like this providing you the tools to do so. This is much more in the domain of "basic tool for ecosystem" than "library for specific tasks". It's on the user of the tool to address the paper's question, not the builder of the tools.
You are not incorrect in stating that the primary focus of the paper is more on the application side. However, I think providers of a parallel computation infrastructure would benefit from profiling a wide range of potential use cases across several work load sizes. This could then lead to a section in a README where the baseline overhead was broken down per workload/worksize measurements and a back of the envelope estimate by an application developer would be more particularly motivated when deciding which infrastructure tool may be the best fit for their application's specific requirements.
"Concurrency != parallelism" is an important distinction in this context. The base C++ coroutines feature is not about threads or parallel processing, but rather is a generalization of the concept of "subroutine" with respect to control flow and stack usage. An example using coroutines to service many tasks (if not otherwise involving threading features) is not much different (at the level of what the CPU sees) from a single threaded implementation using a loop or continuation passing to process concurrent tasks. Performance should be identical between both the language-supported coroutines code and the manually implemented single threaded loop if the work is batched the same.
I would tend to agree with your open assertion that Concurrency != Parallelism. However, I'm not sure that it is really germane in this situation. I am aware that library in question here uses C++ coroutines for 'task encapsulation' according to the developer; however, this library is being compared directly to TBB and OpenMP which are two of the 'go-to' implementations for parallel computation. So I don't think the focus on parallelism in this comment chain is in any way inaccurate or misapplied.
The paper is much more on the side of application and system performance. But you couldn't even write such a system without a library like this providing you the tools to do so. This is much more in the domain of "basic tool for ecosystem" than "library for specific tasks". It's on the user of the tool to address the paper's question, not the builder of the tools.