I believe the GP was not talking about shared memory spaces but shared memory buses. It would not be trivial to have 50 processors sharing a single pool of physical memory even if the processes don't share any physical memory addresses. Getting all of them to agree on the contents of that shared pool is quite a nightmare.
I'm not sure how they're doing it now, but AMD used to design their multi-core processors optimistically: RAM was partitioned between cores, and a core would only go over the central bus when it needed memory that resided in another core's partition.
This actually works quite well, since the OS tends to keep a process on the same core, so processes mostly end up accessing their local memory partition.
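You can also force that locality yourself. A minimal Linux-only sketch using sched_setaffinity(2), with the core number chosen arbitrarily and error handling kept to a minimum:

    /* Pin the current process to one core so its allocations tend to stay
     * in that core's local memory partition. Linux-specific. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(0, &set);               /* core 0, chosen arbitrarily */

        if (sched_setaffinity(0, sizeof(set), &set) != 0) {
            perror("sched_setaffinity");
            return EXIT_FAILURE;
        }

        printf("now running on CPU %d\n", sched_getcpu());
        return EXIT_SUCCESS;
    }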
If that's the way they're doing it, then it sounds bad.
Why? Because the "working quite well" only really holds for single-threaded workloads with no shared data structures. In other words, when you pretend a thread is a process.
As soon as you have more than one high-load thread, the OS will want to split them across multiple cores, which means you're now trying to share the same chunk of memory across partitions. If the OS tries to keep them on the same core instead, you've got two threads competing for CPU time while another core sits idle.
Then again, even turning them into processes on a modern OS wouldn't spread out the memory contention all the time: memory is usually copy-on-write across forked processes, which means that until you've actually written to it, reads are still contending for the same bus.
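For what it's worth, here's a small sketch of what copy-on-write looks like from the program's side. The physical page sharing itself isn't visible here (you'd need /proc/&lt;pid&gt;/pagemap for that); this only shows the program-level effect:

    /* After fork(), the child starts out sharing the parent's pages;
     * only a write forces a private copy of the touched page. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        size_t len = 1024 * 4096;       /* many pages */
        char *buf = malloc(len);
        memset(buf, 'A', len);          /* parent touches every page */

        pid_t pid = fork();
        if (pid == 0) {
            printf("child reads: %c\n", buf[0]);  /* pages still shared */
            buf[0] = 'B';               /* first write copies that one page */
            printf("child after write: %c\n", buf[0]);
            _exit(0);
        }

        waitpid(pid, NULL, 0);
        printf("parent still sees: %c\n", buf[0]);   /* prints 'A' */
        free(buf);
        return 0;
    }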
Threads are the same as processes to the Linux scheduler.
When the alternative is using a shared bus all the time, it works out nicely this way.
Most programs are single-threaded, and even most multithreaded programs don't share that much between threads. Of course the scheduler is going to spread work across CPUs in a reasonable way, but if it makes sense to keep something on the same CPU, it does that. The point is to keep bus contention to a minimum, and this does that in the average case.
Yes, but the typical data access patterns differ between threads and processes.
Again, this architecture makes sense for the "lots of totally independent processes" case. The problem is that this case isn't as common as you'd expect. On Linux, if you fork a process, the parent and child share memory until one of them writes to it. With threads, you're sharing all read-only data unless you've explicitly duplicated it.
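A tiny sketch contrasting the two (compile with -pthread; the values are just markers):

    /* A forked child gets copy-on-write pages, so its writes stay private;
     * a thread writes the very same memory the rest of the program sees. */
    #include <pthread.h>
    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static int counter = 0;

    static void *thread_body(void *arg)
    {
        (void)arg;
        counter = 42;                   /* same page as the main thread */
        return NULL;
    }

    int main(void)
    {
        pid_t pid = fork();
        if (pid == 0) {
            counter = 7;                /* copy-on-write: private to the child */
            _exit(0);
        }
        waitpid(pid, NULL, 0);
        printf("after forked child wrote: %d\n", counter);   /* still 0 */

        pthread_t t;
        pthread_create(&t, NULL, thread_body, NULL);
        pthread_join(&t, NULL);
        printf("after thread wrote:       %d\n", counter);   /* 42 */
        return 0;
    }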