[PATCH] D57504: RFC: EVL Prototype & Roadmap for vector predication in LLVM

Mon Feb 4 12:59:46 PST 2019

rkruppe added inline comments.

================
Comment at: include/llvm/IR/Intrinsics.td:1132
+                                LLVMMatchType<0>,
+                                LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
+                                llvm_i32_ty]>;
----------------
simoll wrote:
> programmerjake wrote:
> > simoll wrote:
> > > programmerjake wrote:
> > > > We will need to change the mask parameter length to allow for mask lengths that are a divisor of the main vector length.
> > > > See http://lists.llvm.org/pipermail/llvm-dev/2019-February/129845.html
> > > Can we make the vector length operate at the granularity of the mask?
> > > 
> > > In your case [1] that would mean that the AVL refers to multiples of the short element vector (eg `<3 x float>`).
> > > 
> > > [1] http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-January/000433.html
> > I was initially assuming that the vector length would be in the granularity of the mask.
> > That would work for my ISA extension. I think it would also work for the RISC-V V extension, but I would have to double-check or get someone who's working on it to check. On AVX512, I don't think it would work without multiplying the vector length, assuming a shift is used to generate the mask. I have no clue about ARM SVE or other architectures.
> So we are on the same page here.
> 
> What I actually had in mind is how this would interact with scalable vectors, e.g.:
> 
>    <scalable 2 x float> evl.fsub(<scalable 2 x float> %x, <scalable 2 x float> %y, <scalable 2 x i1> %M, i32 %L) 
> 
> In that case, the vector length should refer to packets of two elements. That would be a perfect match for NEC SX-Aurora, where AVL always refers to 64-bit elements (e.g., there is a packed float mode).
That definitely wouldn't work for RISC-V V, as its vector length register counts elements, not larger packets. For example, in the latest public version of the spec at the moment (v0.6-draft), `<scalable 4 x i8>` is a natural type for a vector of 8-bit integers. You might use it in a loop that doesn't need 16- or 32-bit elements, and operations on it have to interpret the active vector length in terms of 8-bit elements, not 32-bit elements, to match the hardware.
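
To make the difference concrete, here is a rough Python model of the two interpretations. It is only a sketch: the function names are invented for illustration, `n` and `vscale` follow the `<scalable n x T>` notation used in this thread, and nothing here is taken from the patch or from the RISC-V spec text.

```python
def active_lanes_element_granular(n, vscale, vl):
    """Per-lane activity when vl counts individual elements.

    This models the behaviour described for RISC-V V, where the
    vector length register counts elements.
    """
    total = n * vscale
    return [lane < vl for lane in range(total)]


def active_lanes_packet_granular(n, vscale, vl):
    """Per-lane activity when vl counts packets of n elements.

    This models the interpretation suggested in the quoted comment,
    where the vector length operates at the granularity of the mask.
    """
    total = n * vscale
    return [lane < vl * n for lane in range(total)]
```

With `n = 4` and `vscale = 2` (eight `i8` lanes), the element-granular model with `vl = 3` enables exactly three lanes. The packet-granular model can only enable zero, four, or eight lanes, so it has no `vl` value that expresses the same thing.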

Moreover, it seems incongruent with the scalable vector type proposal to treat vlen as being in terms of `vscale` rather than in terms of vector elements. `<scalable n x T>` is simply an `(n * vscale)`-element vector; the fact that the `vscale` factor is not known at compile time is inconsequential for numbering or interpreting the lanes (e.g., lane indices for shuffles or element inserts/extracts go from 0 to `(n * vscale) - 1`). In fact, I believe it is currently the case that scalable vectors can be legalized by picking some constant for vscale (e.g., 1) and simply replacing every `<scalable n x T>` with `<(CONST_VSCALE * n) x T>` and every call to `llvm.vscale()` with that constant.
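
The type substitution described above can be illustrated with a toy textual rewrite. This is a sketch only, not an LLVM pass: it works on strings, the helper names are hypothetical, it uses the `<scalable n x T>` spelling from this thread, and it does not attempt to replace calls to `llvm.vscale()`.

```python
import re

# Matches the "<scalable n x T>" spelling used in this thread.
SCALABLE_TYPE = re.compile(r"<scalable (\d+) x ([^<>]+)>")


def pin_vscale(ir_text, const_vscale=1):
    """Rewrite every <scalable n x T> as <(const_vscale * n) x T>."""
    def replace(match):
        n = int(match.group(1))
        elem_type = match.group(2)
        return f"<{const_vscale * n} x {elem_type}>"

    return SCALABLE_TYPE.sub(replace, ir_text)


def lane_indices(n, vscale):
    """Lane indices of an (n * vscale)-element vector: 0 .. n*vscale - 1."""
    return range(n * vscale)
```

For example, `pin_vscale("<scalable 2 x float> %x", 4)` yields `<8 x float> %x`, and `lane_indices(2, 4)` runs from 0 to 7. The lane numbering is the same whether or not `vscale` was known up front.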

I don't think it would be a good match for SVE or other "predication only" architectures either: as Jacob pointed out for the case of AVX-512, it seems to require an extra multiplication/shift to generate the mask corresponding to the vector length. This is probably secondary, but it feels like another hint that this line of thought is not exactly a smooth, natural extension.
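
The extra step can be seen in a small model of shift-based mask generation. This is a sketch under the assumption, taken from the quoted comment, that the mask is produced with a shift; the function names and the bitmask representation are illustrative and do not correspond to any particular instruction sequence.

```python
def mask_from_element_vl(vl, total_lanes):
    """Bitmask with the low `vl` lanes set; vl counts elements."""
    lanes = min(vl, total_lanes)
    return (1 << lanes) - 1


def mask_from_packet_vl(vl, n, total_lanes):
    """Bitmask when vl counts packets of n elements.

    The vector length has to be scaled by n (a multiply, or a shift
    when n is a power of two) before the mask can be formed.
    """
    lanes = min(vl * n, total_lanes)
    return (1 << lanes) - 1
```

With 16 lanes, `mask_from_element_vl(3, 16)` is `0b111`, while `mask_from_packet_vl(3, 2, 16)` is `0b111111`. The second form needs the additional `vl * n` computation on every mask materialization.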

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D57504/new/

https://reviews.llvm.org/D57504