[PATCH] D69891: [VP,Integer,#1] Vector-predicated integer intrinsics

Mon Nov 18 09:29:34 PST 2019

simoll marked an inline comment as done.
simoll added inline comments.

================
Comment at: llvm/docs/LangRef.rst:14702
+The '``llvm.vp.add``' intrinsic performs integer addition (:ref:`add <i_add>`) of the first and second vector operand on each enabled lane.
+The result on disabled lanes is undefined.
+
----------------
cameron.mcinally wrote:
> simoll wrote:
> > sdesmalen wrote:
> > > simoll wrote:
> > > > sdesmalen wrote:
> > > > > jdoerfert wrote:
> > > > > > simoll wrote:
> > > > > > > sdesmalen wrote:
> > > > > > > > Have you considered adding an extra argument to specify the value of the false lanes, similar to how this is currently done for `llvm.masked.load`? 
> > > > > > > > By passing `undef`, it would have similar behaviour as the current definition, yet it would remove the need to add explicit `select` statements for handling merging- or zeroing predication if an instruction supports it. For SVE for example, most instructions have either merging or zeroing predication and the intrinsics expose the merging/zeroing/undef predication directly in the C/C++ intrinsic API. I think it would be really nice to have that capability represented in the vp intrinsics as well.
> > > > > > > Yes i considered that. However, as you suggested you can express this without loss with a  `select` and i'd like to keep the intrinsics simple.
> > > > > > FWIW, I like the idea of simple intrinsics and explicit selection.
> > > > > If we want to keep the vp intrinsics simple, shouldn't we reconsider the need for `%evl` as an explicit parameter?
> > > > > 
> > > > > That value can be folded into the predicate using a special intrinsic that enables the lanes upto `%evl`, e.g.
> > > > > 
> > > > > `<256 x i1> llvm.vp.enable.lanes(<256 x i1> %p, i32 %evl)`
> > > > > 
> > > > > If any operation needs `%evl` as an explicit parameter, it would be a matter of having a pattern that matches the enable.lanes intrinsic, similar to how merging/zeroing predication can be implemented by matching the selects.
> > > > > If any operation needs %evl as an explicit parameter, it would be a matter of having a pattern that matches the enable.lanes intrinsic, similar to how merging/zeroing predication can be implemented by matching the selects.
> > > > 
> > > > The SX-Aurora and RiSC-V V extension natively support an explicit vector length parameter just as they support predicated vector. Mask and EVL should be treated the same: if there is a vector mask parameter there should also be an EVL parameter.
> > > I understand that certain instructions take an explicit vector length parameter, but that doesn't really explain why an arch-independent intrinsic needs to expose this. Is there any reason that modelling this using something like `llvm.vp.enable.lanes` won't work?
> > That's already two architectures that do support EVL on **all** vector instructions and it is vital to model it. EVL has performance implications (smaller EVL -> lower latency; no need to compute and store a mask where EVL can be used) . For SVE/MVE, you can simply turn the feature off by passing `i32 -1`.
> > 
> > From the perspective of RVV and SX-Aurora, saying we do not need EVL is equivalent to saying we do not need vector predication.
> > Yes i considered that. However, as you suggested you can express this without loss with a select and i'd like to keep the intrinsics simple.
> 
> Having an explicit `passthru` is a good idea. Keeping the select glued to the intrinsics may be difficult. For example:
> 
> `select %mask, %r, undef`
> 
> It would be easy for a select peep to replace this with %r, since the masked bits are undefined. But for a situation like a trapping instruction, we wouldn't want the the masked elements to execute.
> > select %mask, %r, undef
This is the current behavior - the values on masked-off lanes (by EVL or the mask) are `undef`. 

My problem with passthru is that it encourages people to specify arbitrary values on masked-off lanes and to do so often. Yes, for some combination of target architecture, instruction, passthru value and type it might be a perfect match for one machine instruction . But, for most combinations you will have to emit an extra `select` anyway.

Besides, if the select is explicit and can be folded into another operation to simplify it, then that is actually a good thing.
But there is more, for things like coalescing predicated registers and vector register spilling, it is helpful to have undef-on-masked-off and a separate select instruction that can be re-scheduled.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69891/new/

https://reviews.llvm.org/D69891