What definition of vector processor are you thinking of? Wikipedia’s definition appears to agree with the parent, and even states “Modern graphics processing units […] can be considered vector processors” https://en.wikipedia.org/wiki/Vector_processor
Yes, the definition matches.
But that doesn't mean that the architecture and the micro-architecture used are similar.
So, yes! We can say that these architectures are some kind of "vector processor", but it is ambiguous with regard to the programming model and the architecture used.
I’m interested to hear what you mean by “vector processor”. What does that imply to the lay person, and how is it different enough to be confusing when applied to GPUs? What does the term imply to you in terms of architecture?
The term "vector processor" generally refers to a processor with a traditional programming model, but which features a "vector" unit capable of performing operations on large vectors of fixed or variable size. It can occupy the vector unit for a significant amount of cycles.
The RISC-V Vector extension is a good example of what makes a vector processor.
However, and this is a source of confusion, the standard definition is abstract enough that many other architectures can be called a "vector processor".
Regarding modern GPGPUs (with a SIMT architecture), we are dealing with a programming model named SIMT (Single-Instruction, Multiple-Thread) in which the programmer must take into account that there is one piece of code for multiple threads (a block of threads): each instruction will be executed by several "cores" simultaneously.
This has implications: the hardware has a limited number of "cores", so it must split the block of threads into sub-blocks called wraps (1 wrap = 32 threads on Nvidia machines).
When we offload compute to a GPU we send it several blocks of wraps, and all the wraps are executed progressively: the GPU scheduler's job is to pick a ready wrap, execute one instruction from it, then pick a new wrap and repeat.
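To make that concrete, here is a minimal CUDA sketch of the model (the kernel and the launch configuration are illustrative, not something from this thread): every thread runs the same code and picks its own element, and the hardware groups the threads of each block into warps of 32.

```
// One piece of code, executed by every thread of every block.
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x; // unique index of this thread
    if (i < n)                                     // extra threads do nothing
        y[i] = a * x[i] + y[i];
}

// Host side: launch blocks of 256 threads (8 warps each), enough to cover n.
// saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, d_x, d_y);
```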
This means that wraps from a block have the ability to get out of sync. With a classical vector processor this kind of situation is not possible (or not visible architecturally): it is not possible for a portion of the vector to be 5 instructions ahead, for example.
Therefore, GPUs include instructions to resynchronize the wraps of a group, while vector processors don't need this. But it also means that you expose many more unintentional dependencies between data with a vector processor.
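A hedged sketch of what those resynchronization instructions look like in CUDA (hypothetical kernel, assuming a block of 256 threads): __syncwarp() reconverges the lanes of one warp after a divergent branch, and __syncthreads() is a barrier across all the warps of a block. A classical vector instruction needs neither, since the whole vector is produced by a single operation.

```
__global__ void divergence_demo(int *out)
{
    __shared__ int tmp[256];          // assumes the block has 256 threads
    int lane = threadIdx.x % 32;      // position inside the warp

    int v;
    if (lane < 16)
        v = 1;                        // half of the warp takes this path...
    else
        v = 2;                        // ...the other half takes that one
    __syncwarp();                     // reconverge the lanes of this warp

    tmp[threadIdx.x] = v;
    __syncthreads();                  // barrier: the warps of the block resynchronize
    out[threadIdx.x] = tmp[(threadIdx.x + 32) % blockDim.x]; // read a value written by another warp
}
```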
It seems like you’re making the case that the term “vector processor” should be interpreted as something general, and not something specific? Since the Cray vector processor predates RISC-V by ~35 years, isn’t the suggestion above to use it the way Cray did fairly reasonable? It doesn’t seem like it’s really adding much confusion to include GPUs under this already existing umbrella term...
> With a classical vector processor […] it is not possible for a portion of the vector to be 5 instructions ahead
Just curious here, the WP article talks about how one difference between “vector” processing and SIMD is that vector computers are about variable length vectors by design, where SIMD vectors are usually fixed length. How does that square up with what you’re saying about not having any divergence?
This feels like it’s comparing apples to oranges a little… a SIMT machine has different units ahead of others because they’re basically mini independent co-processors. If you have a true vector processor according to your definition, but simply put several of them together, then you would end up with one being ahead of the others. That’s all a modern GPU SIMT machine is: multiple vector processors on one chip, right? It seems like time and scale and Moore’s Law would inevitably have turned vector processors into a machine that can handle divergent and/or independent blocks of execution.
BTW, not sure if it was just auto-correct, but you mean “warp” and not “wrap”, right?
> BTW, not sure if it was just auto-correct, but you mean “warp” and not “wrap”, right?
Oh, sorry, I totally meant "warp", not "wrap"; I don't know how I introduced that typo.
> It seems like you’re making the case that the term “vector processor” should be interpreted as something general, and not something specific?
Not exactly: I am in favor of using a specific term, and in particular of keeping the term "vector processor" for machines similar to the Cray ones. But I admit that the term is used in a more abstract way. For instance, the Intel AVX extension stands for "Advanced Vector Extensions" even though it is definitely a SIMD extension.
Computer architecture lacks accurate/strict definitions, probably because there are often many possible implementations of the same idea. So we sometimes find ourselves using words that are a bit disconnected from their original idea.
The architectures that Cray's engineers came up with don't have much to do with modern SIMT architectures. That's why I find it confusing.
> vector computers are about variable length vectors by design, where SIMD vectors are usually fixed length. How does that square up with what you’re saying about not having any divergence?
Not sure if I understood the question correctly.
But after execution of a vector or SIMD instruction, the vector or SIMD register is seen as containing the outcome of the operation; it's not possible to observe in the register a temporary value, or an old value that has not been processed yet. With a SIMT programming model and architecture, on the other hand, it is possible to observe such a value if we omit synchronization.
This is a very clear difference in observable architectural states.
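For example, here is a hedged CUDA sketch (hypothetical kernel, assuming a block of 64 threads, i.e. two warps) of the kind of state you can observe when synchronization is omitted: the second warp may read either the old value or the new one, something a single vector or SIMD instruction never exposes architecturally.

```
__global__ void stale_read(int *out)
{
    __shared__ int buf[64];            // assumes a block of 64 threads = 2 warps
    buf[threadIdx.x] = 0;
    __syncthreads();                   // everyone sees the initialised buffer

    if (threadIdx.x < 32)              // warp 0 produces new values
        buf[threadIdx.x] = threadIdx.x + 1;

    // __syncthreads() deliberately omitted here: this is a data race, shown
    // on purpose. Warp 1 may observe the old value (0) or the new one,
    // depending on how far ahead warp 0 happens to be.
    if (threadIdx.x >= 32)             // warp 1 consumes warp 0's values
        out[threadIdx.x - 32] = buf[threadIdx.x - 32];
}
```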
> If you have a true vector processor according to your definition, but simply put several of them together, then you would end up with one being ahead of the others.
Of course you can reproduce a model similar to SIMT with a lot of vector or scalar processors by changing the programming model and the architecture significantly.
But then it seems reasonable to me to call that a SIMT programming model & architecture.
> That’s all a modern GPU SIMT machine is: multiple vector processors on one chip, right?
Sort of... But splitting the compute into groups and warps is not negligible: it implies big differences in the architecture, the uarch, the design, and the programming model.
So it makes sense to give it a different name when there are many significant changes.