I used them at the (US) Naval Research Laboratory, programming in a dialect of C called C*. C* automatically distributed arrays among the many processors, similar to how modern Fortran can work with coarrays.
If the problem was very data-parallel, one could get nearly perfect linear speedups.