[PATCH] D57504: RFC: EVL Prototype & Roadmap for vector predication in LLVM

Simon Moll via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Feb 5 02:36:30 PST 2019


simoll added a comment.

In D57504#1384114 <https://reviews.llvm.org/D57504#1384114>, @rkruppe wrote:

> In D57504#1380587 <https://reviews.llvm.org/D57504#1380587>, @simoll wrote:
>
> > In D57504#1380504 <https://reviews.llvm.org/D57504#1380504>, @programmerjake wrote:
> >
> > > We will also need to adjust gather/scatter and possibly other load/store kinds to allow the address vector length to be a divisor of the main vector length (similar to mask vector length). I didn't check if there are intrinsics for strided load/store, those will need to be changed too, to allow, for example, storing <scalable 3 x float> to var.v in:
> >
> >
> > .. and as a side effect evl_load/evl_store are subsumed by evl_gather/evl_scatter:
> >
> >   evl.load(%p, %M, %L) ==  evl.gather(<1 x double*> %p, <256 x i1>..) ==  evl.gather(double* %p, <256 x i1> %M, i32 %L)
>
>
> This seems shaky. When generalized to scalable vector types, it means a load of a scalable vector would be `evl.gather(<1 x double*> %p, <scalable n x i1>)`, which mixes fixed and scaled vector sizes. While it's no big deal to test the divisibility, allowing "mixed scalability" increases the surface area of the feature and not in a direction that seems desirable. For example, it strongly suggests permitting `evl.add(<scalable n x i32>, <scalable n x i32>, <n x i1>, ...)` where each mask bit controls `vscale` many lanes -- quite unnatural, and not something that seems likely to ever be put into hardware.


Mixing fixed-width vector types and scalable vector types is illegal and is not what I was suggesting. Rather, a scalar pointer would be passed to convey a consecutive load/store starting at a single address.
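To illustrate the distinction (a minimal sketch; the intrinsic names, overload mangling, and exact signatures below are only illustrative and still subject to this proposal), the scalar-pointer form expresses a consecutive access under a mask and explicit vector length, while the vector-of-pointers form remains a true gather:

  ; consecutive: load %L contiguous doubles starting at the scalar pointer %p,
  ; under mask %M -- this is what evl.load(%p, %M, %L) would map to
  %v = call <256 x double> @llvm.evl.gather.v256f64.p0f64(double* %p,
                                                          <256 x i1> %M, i32 %L)

  ; true gather: one (potentially distinct) address per lane
  %w = call <256 x double> @llvm.evl.gather.v256f64.v256p0f64(<256 x double*> %ptrs,
                                                              <256 x i1> %M, i32 %L)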

> And what for? I see technical disadvantages (less readable IR, needing more finicky pattern matching in the backend, more complexity in IR passes that work better on loads than on general gathers) and few if any technical advantages. It's a little conceptual simplification, but only at the level of abstraction where you don't care about uniformity.

//Less readable IR:// The address computation would actually become simpler, e.g. there is no need to synthesize a consecutive constant only to have it pattern-matched and subsumed in the backend (e.g. <0, 1, 2, ..., 15, 0, 1, 2, 3, ..., 15, 0, ...>).
//Finicky pattern matching:// It is trivial to legalize this by expanding it into a more standard gather/scatter, or by splitting it into consecutive memory accesses (see the sketch after these points). We can even keep the EVL_LOAD and EVL_STORE SDNodes in the backend, so you wouldn't even notice that an llvm.evl.gather was used for a consecutive load at the IR level.
//More complexity in IR passes that work on standard loads:// We are already using intrinsics here, and I'd rather advocate pushing for a solution that makes general passes work on predicated gather/scatter (which is not the case at the moment, as far as I know).
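On the legalization point above, a sketch of the straightforward expansion (shown with 8 lanes for brevity; names again illustrative): lowering the scalar-pointer form to a conventional gather just re-materializes the consecutive offset constant, which is exactly the pattern the backend would otherwise have to recognize and strip:

  ; hypothetical expansion of the consecutive form into a standard gather:
  ; build the per-lane addresses %p + <0, 1, ..., 7> with a vector GEP
  %ptrs = getelementptr double, double* %p,
            <8 x i64> <i64 0, i64 1, i64 2, i64 3, i64 4, i64 5, i64 6, i64 7>
  %v = call <8 x double> @llvm.evl.gather.v8f64.v8p0f64(<8 x double*> %ptrs,
                                                        <8 x i1> %M, i32 %L)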

But again, we can leave this out of the first version and keep discussing it as an extension.


Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D57504/new/

https://reviews.llvm.org/D57504




