Because the GitHub post is the original source. The HN submission should point to the original source, and a link to the "interesting comment(er)s" on Twitter can go in the comments here.
For a green-thread implementation as featureful as a platform thread, there is no fundamental reason the two cannot be made almost equivalent (there are a few minor hardware overheads on the performance side, but they are fairly negligible). In practice, though, platform threads on most kernels carry far more overhead than is theoretically necessary, so the inherent benefits of green threads are overstated: much of their practical advantage comes from kernel inefficiency rather than anything fundamental.
However, one major advantage of green threads is that they do not need to be as featureful as platform threads. You can strip features so the implementation fits your exact needs on a per-program basis, instead of accepting a one-size-fits-all implementation.
For instance, you could use a smaller stack size. Most operating systems require thread stacks to be a multiple of the memory-mapping granularity (the page size) so that robust guard pages can be placed around them. A green thread can get away with a tiny stack if that works for the program.
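To make that concrete, here's a minimal sketch (Linux/POSIX assumed; the names and sizes are arbitrary illustrations, not recommendations) contrasting a page-granular, guard-paged stack of the kind the kernel hands out with the tiny heap allocation a green thread can live on:

    #include <stdlib.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* Platform-thread style: page-granular mapping with an inaccessible
     * guard page below the usable stack so overflows fault immediately. */
    static void *alloc_guarded_stack(size_t size) {
        long page = sysconf(_SC_PAGESIZE);
        size = (size + page - 1) / page * page;     /* round up to page size */
        char *mem = mmap(NULL, size + page, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (mem == MAP_FAILED) return NULL;
        mprotect(mem, page, PROT_NONE);             /* the guard page */
        return mem + page;                          /* usable region starts here */
    }

    /* Green-thread style: if you know the task only ever needs a few KiB,
     * a plain heap allocation with no rounding and no guard page will do. */
    static void *alloc_tiny_stack(void) {
        return malloc(4096);
    }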
If you know for certain that a green thread will not use certain registers, you can shrink the task control block by omitting those registers.
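For example, on x86-64 under the System V ABI, a cooperative switch happens at a function-call boundary, so only the callee-saved registers ever need to live in the context block. A sketch (the names are mine; the switch routine itself would be a few lines of assembly, declared here but not shown):

    #include <stdint.h>

    /* A cooperative switch happens at a call site, so the ABI already
     * guarantees the caller-saved registers (rax, rcx, rdx, rsi, rdi,
     * r8-r11, all xmm state) are dead.  Only the callee-saved set plus
     * the stack pointer must be preserved -- far less than the full
     * register file the kernel saves on a preemptive switch. */
    struct green_ctx {
        uint64_t rsp;   /* stack pointer; return address lives on the stack */
        uint64_t rbp;
        uint64_t rbx;
        uint64_t r12;
        uint64_t r13;
        uint64_t r14;
        uint64_t r15;
    };

    /* Saves the callee-saved set into `from`, restores it from `to`, and
     * resumes on the other stack; roughly a dozen assembly instructions. */
    void green_switch(struct green_ctx *from, struct green_ctx *to);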
You can choose a simpler scheduler among the green threads. This turns out to be the source of very significant improvements: task-scheduling code, not hardware costs such as saving and restoring registers as most people believe, accounts for the vast majority of context-switch time.
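Concretely, the scheduler for a cooperative green-thread system can collapse to a round-robin walk over a circular run queue. A sketch, reusing the hypothetical `green_ctx`/`green_switch` from above:

    /* An intrusive circular run queue; `yield` is the whole scheduler. */
    struct task {
        struct green_ctx ctx;      /* register block from the sketch above */
        struct task *next;
    };

    static struct task *current;

    void yield(void) {
        struct task *prev = current;
        current = current->next;   /* "scheduling decision": one pointer chase */
        green_switch(&prev->ctx, &current->ctx);
    }

Compare that one pointer chase with the run-queue bookkeeping, priority logic, and locking a general-purpose kernel scheduler executes on every switch.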
Green threads sort of pretend to be threads; they aren't really the same thing. So the short answer is yes, there is a fundamental limitation.
The largest performance difference is probably that real threads must pay a larger startup cost, as each needs its own copy of some resources (like stack space) that green threads can share. There's not really a way around that without changing how you define a platform thread.
Google's fiber implementation is actually a take on that. Each fiber, IIRC, is an actual kernel thread, but special syscalls allow context switching directly to another specific thread, bypassing the slow scheduling logic.
This saves a lot of the CPU time spent scheduling threads, but they are still ordinary threads, with ordinary stacks that take up the normal amount of memory.
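For a sense of what those special syscalls buy you, here's a sketch of the handoff ordinary futexes force on you (Linux-specific; `handoff` and the flag words are my own illustration). IIRC Google later published this mechanism as the FUTEX_SWAP RFC patch set, which fused the wake and the wait into a single call naming the exact thread to run next; to my knowledge it was never merged upstream:

    #define _GNU_SOURCE
    #include <linux/futex.h>
    #include <stdint.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Vanilla-futex handoff between two threads: wake the target, then
     * sleep.  Two syscalls, and the kernel scheduler still has to work
     * out on its own that `target` is the thread that should run next. */
    static void handoff(uint32_t *me, uint32_t *target) {
        __atomic_store_n(me, 0, __ATOMIC_RELAXED);      /* arm our wait first */
        __atomic_store_n(target, 1, __ATOMIC_RELEASE);
        syscall(SYS_futex, target, FUTEX_WAKE, 1, NULL, NULL, 0);
        while (__atomic_load_n(me, __ATOMIC_ACQUIRE) == 0)
            syscall(SYS_futex, me, FUTEX_WAIT, 0, NULL, NULL, 0);
    }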
On the one hand, there's a mismatch between the OS (aka C) model, where you're expected to have roughly one thread per CPU core and each comes relatively heavy, carrying a full C stack's worth of memory pages plus kernel-level data structures, and the managed languages' desire to offer zillions of them in their APIs as very lightweight concepts (not quite, but close to, as lightweight as allocating memory).
This is probably best thought of from the Erlang perspective, where an OS-level thread is called a "scheduler", and what you as an Erlang programmer think of as a thread is called an "Erlang process", of which you can have quite literally hundreds of thousands or millions in a large enough app.
Interestingly, you can think of it as the counterpoint to the (IIRC) early-2000s Java migration from an M:N threading model (where M Java threads would map onto N OS threads) to a 1:1 threading model (where the JVM finally gave up the belief that it knew better than the OS and just mapped them 1:1).
On the other hand, there's also a lot to be said about async-style APIs (e.g. goroutines in Go, or the recent async work in Rust; admittedly neither is a managed language) creating lots and lots of very short-lived threads. In a managed-language context, the JIT would be able to prove interesting properties about such a thread (e.g. that it is extremely short-lived, blocks the "parent" thread until it finishes, and uses no memory shared with concurrent threads) and treat it in a way that bypasses the whole save-CPU-state-in-the-OS context switch.