[PATCH] D57504: RFC: Prototype & Roadmap for vector predication in LLVM

Thu Feb 6 17:09:25 PST 2020

lkcl added a comment.

In D57504#1862202 <https://reviews.llvm.org/D57504#1862202>, @andrew.w.kaylor wrote:

> > TTI would tell front ends and optimizations that `%evl` is a no-go for your target. Is this enough discouragement?
>
> In theory, yes. In practice, it will depend on how optimizations make use of that information. Your explanation of how the ExpandVectorPredicationPass will make this palatable to the backend worries me a little, because it essentially means that optimizations don't have to care that the target doesn't support this feature. They can generate IR that uses it and EVPP will smooth over it. Obviously, we could handle this on a case-by-case basis as it comes up. As you say, TTI will provide sufficient information for passes to make the decision.

ok so it is starting to sink in what is being proposed: a *mainstream* pass in llvm that *always* puts in vector predication, and then various backends, depending on hardware capability, will either have passes that turn that mandatory vector predication into scalar loops or SIMD / SIMT code (getting rid of %evl in the process), or, in the case of Cray-inspired hardware, emit SETVL instructions.

if that's accurate, then wow that's quite bold and has a lot of advantages.

i have a suggestion.  for SimpleV we definitely need to have an explicit way to specify MVL. this is because it literally specifies precisely how many scalar registers are to be allocated for a vector op.

however for SIMD (ARM, x86, other) i have a suspicion that being able to "hint" the best size of SIMD instruction width to use is probably a good idea.

if a SIMD width hint is available it happens to be synonymous with SimpleV's (hard) requirement to be able to specify MVL.

a scalar system would ignore both %evl and %mvl (or better, %mpvl - max partition vector length), i.e. passes would eliminate them.

a SIMD system would use %mpvl to choose the best SIMD opcodes for the job: the passes would subdivide the work into chunks of that width, then generate the suitable corner-case last loop as well, *ignoring* %evl in the process.

SimpleV would use both to generate opcodes, coordinating with the regfile allocator, correctly and efficiently.
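
to make the three cases above a bit more concrete, here is a rough sketch in python of how the same `n` elements might get carved up on each class of target. this is not LLVM code and not part of the patch: the function names are made up for illustration, `mpvl` stands in for the proposed %mpvl, and the scalar tail loop in the SIMD case is just one possible choice of corner-case handling.

```python
# hypothetical sketch: each function returns a list of (start, length)
# work chunks covering elements 0..n-1. names are illustrative only.

def lower_scalar(n):
    # scalar target: %evl and %mpvl are eliminated, one element per step
    return [(i, 1) for i in range(n)]

def lower_simd(n, mpvl):
    # SIMD target: %mpvl picks the fixed opcode width; %evl is ignored
    # and the remainder is handled by a separate corner-case loop
    # (assumed here to be a plain scalar loop)
    if mpvl <= 0:
        raise ValueError("mpvl must be positive")
    full = n - n % mpvl
    chunks = [(i, mpvl) for i in range(0, full, mpvl)]
    tail = [(i, 1) for i in range(full, n)]
    return chunks + tail

def lower_setvl(n, mpvl):
    # Cray-style target: each iteration requests the remaining element
    # count and is granted at most mpvl, so the tail needs no extra loop
    if mpvl <= 0:
        raise ValueError("mpvl must be positive")
    chunks = []
    i = 0
    while i < n:
        vl = min(n - i, mpvl)  # what a SETVL-like instruction would grant
        chunks.append((i, vl))
        i += vl
    return chunks
```

for example, with `n = 10` and `mpvl = 4` the SIMD sketch gives two full-width chunks followed by two scalar steps, whereas the SETVL-style sketch gives two chunks of 4 and one of 2, with the short final chunk coming out of the same loop.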

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D57504/new/

https://reviews.llvm.org/D57504