[PATCH] D57504: RFC: Prototype & Roadmap for vector predication in LLVM

Wed Feb 12 08:42:42 PST 2020

lkcl added a comment.

In D57504#1854330 <https://reviews.llvm.org/D57504#1854330>, @simoll wrote:

>

> But you need //some// such analysis, and must not hoist when those tricks all fail, because there's no general guarantee that the condition you're hoisting out of is independent from "%evl > element count?". A trivial (if pathological) example of this is when the condition is never true in any execution and the EVL value is larger than W. A more real-world example, if you insist, comes from one proposed way to port hand-crafted fixed-width SIMD algorithms to RVV: check at runtime whether vector registers are at least as large as required by the SIMD algorithm, if so set the VL register to a constant and execute vector code,

ah... ah... you can't.  at least, the last version of the RVV spec that i read (7?) still explicitly states, "regardless of what *you* want VL to be set to, the *hardware* gets to decide exactly what value *actually* goes into the VL CSR".

the only guarantee that you have is that if you set VL to a non-zero value and read it back immediately afterwards, it will still be non-zero.

this specifically *does not matter* on RVV (sigh: when RVV is not done on top of the FP regfile, and there is a separate vector regfile), because the vector regfile is specifically designed to refer to *vectors*... not to individual elements.

for SimpleV, because we designed it right from the start to sit on top of the int and fp regfiles, what VL is set to *really does matter*, because it defines precisely and exactly how many of the scalar registers are to be used *as* "vector elements".

thus, for RVV, when converting SIMD assembly patterns to RVV, you absolutely *must* use the "loop pattern" described in https://www.sigarch.org/simd-instructions-considered-harmful/

if you try to hard-code-set VL to anything specific, this has the (unintended) side-effect of destroying the entire paradigm on which RVV is based, namely that you are not *supposed* to know the actual hardware vector "lane" size... at all.  so, if you had really minimalist hardware which only *had* one actual "Lane", then if you tried to explicitly set VL=4, that hardware is absolutely hosed, as it is literally unable to support, at the hardware level, the three extra lanes requested/demanded.

this is why you have to "ask" for a VL, and the instruction will put the *actual* number of elements that VL got set to into a destination register, because you need to subtract that number of (processed) elements from the count of elements the loop still has left to do.
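
as a rough illustration of that "ask, then subtract" loop, here is a minimal sketch in python.  it is a toy software model, not RVV code: `setvl`, `hw_max_vl` and `vector_add` are hypothetical names made up for this example, and the "grant min(requested, hardware maximum)" rule is a simplifying assumption (a real implementation is free to choose differently within what the spec allows).

```python
def setvl(requested, hw_max_vl):
    """Hypothetical model of 'asking' for a vector length.

    Assumption: the hardware grants min(requested, hw_max_vl).
    A non-zero request always yields a non-zero grant.
    """
    if requested == 0:
        return 0
    return min(requested, hw_max_vl)


def vector_add(a, b, hw_max_vl):
    """Strip-mined element-wise add that never assumes a lane count."""
    assert len(a) == len(b)
    out = [0] * len(a)
    granted = []          # record of what the "hardware" actually gave us
    remaining = len(a)    # elements still to be processed
    i = 0                 # index of the next unprocessed element
    while remaining > 0:
        vl = setvl(remaining, hw_max_vl)   # ask; the hardware decides
        for lane in range(vl):             # stands in for one vector op
            out[i + lane] = a[i + lane] + b[i + lane]
        i += vl
        remaining -= vl                    # subtract what was processed
        granted.append(vl)
    return out, granted


# the same loop works on single-lane and four-lane "hardware"
print(vector_add([1, 2, 3, 4, 5], [10, 20, 30, 40, 50], hw_max_vl=1))
print(vector_add([1, 2, 3, 4, 5], [10, 20, 30, 40, 50], hw_max_vl=4))
```

the point of the sketch is that the loop body only ever uses the granted `vl`, so the one-lane case simply takes more iterations instead of breaking, which is what hard-coding VL=4 would not give you.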

of course, with the idea of dropping RVV on top of the FP regfile, that goes somewhat out the window.  however i'm not... welcome, shall we say... in the RV WG, so you'd need to take this up with them directly. and try not to mention my name too much because they're quite likely to sabotage things (to everyone's detriment) just because i was the one that came up with the insights.  *shakes head*...

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D57504