[llvm-dev] LV: predication

Sjoerd Meijer via llvm-dev llvm-dev at lists.llvm.org
Mon May 18 05:52:50 PDT 2020


Hi,
I abandoned that approach and followed Eli's suggestion (see earlier in this thread): emit an intrinsic that represents/computes the active mask. I've just uploaded a new revision of D79100 that implements this.
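For illustration, the vector body with such an active-mask intrinsic looks roughly like this (a sketch only; the intrinsic and operand names follow the llvm.get.active.lane.mask form proposed in D79100 and may differ from the final revision):

  vector.body:
    %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]
    ; lanes %index .. %index+3 that are still below the trip count %N
    %mask = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 %index, i32 %N)
    %bval = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %bp, i32 4, <4 x i1> %mask, <4 x i32> undef)
    %cval = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %cp, i32 4, <4 x i1> %mask, <4 x i32> undef)
    %sum = add <4 x i32> %bval, %cval
    call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> %sum, <4 x i32>* %ap, i32 4, <4 x i1> %mask)
    %index.next = add i32 %index, 4

Since the mask is an ordinary data dependence of the masked loads and stores, generic transformations cannot separate it from the memory operations it guards.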
Cheers.

________________________________
From: Simon Moll <Simon.Moll at EMEA.NEC.COM>
Sent: 18 May 2020 13:32
To: Sjoerd Meijer <Sjoerd.Meijer at arm.com>
Cc: Roger Ferrer Ibáñez <rofirrim at gmail.com>; Eli Friedman <efriedma at quicinc.com>; listmail at philipreames.com <listmail at philipreames.com>; llvm-dev <llvm-dev at lists.llvm.org>; Sander De Smalen <Sander.DeSmalen at arm.com>; hanna.kruppe at gmail.com <hanna.kruppe at gmail.com>
Subject: Re: [llvm-dev] LV: predication

On 5/5/20 12:07 AM, Sjoerd Meijer via llvm-dev wrote:
what we would like to generate is a vector loop with implicit predication, which works by setting up the number of elements processed by the loop:

hwloop 10
  [i:4] = b[i:4] + c[i:4]

Why couldn't you use VP intrinsics and scalable types for this?

   %bval = call <vscale x 4 x double> @llvm.vp.load(..., /* %evl */ 10)
   %cval = call <vscale x 4 x double> @llvm.vp.load(..., /* %evl */ 10)
   %sum = fadd <vscale x 4 x double> %bval, %cval
   store [..]

I see three issues with the llvm.set.loop.elements approach:
1) It is conceptually broken: as others have pointed out, optimizations can move the intrinsic around, since the intrinsic doesn't have any dependencies that would naturally keep it in place.
2) The whole proposed set of intrinsics is vendor-specific: this causes fragmentation, and I don't see why we would want to emit vendor-specific intrinsics in a generic optimization pass. Soon we would see reports a la "your optimization caused regressions for MVE - add a check that the transformation must not touch llvm.set.loop.* or llvm.active.mask intrinsics when compiling for MVE..". I doubt you would tolerate it if that intrinsic were somehow removed in performance-critical code, which would then remain scalar as a result.. so, I do not see the "beauty of the approach".
3) We need a reliable solution to properly support vector ISAs such as the RISC-V V extension, SX-Aurora, and also MVE.. I don't see that reliability in this proposal.
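To make point 1 concrete, here is a hypothetical before/after sketch (the llvm.set.loop.elements name and signature are assumed for illustration): since the intrinsic's only operand is loop-invariant and it has no memory or data dependencies the optimizer knows about, a generic pass is entitled to hoist it:

  ; before: the intrinsic marks the loop for the backend
  vector.body:
    call void @llvm.set.loop.elements.i32(i32 %n)
    ...

  ; after hoisting: still valid IR as far as mid-level passes can tell,
  ; but the marker is no longer where the backend expects it
  vector.preheader:
    call void @llvm.set.loop.elements.i32(i32 %n)
  vector.body:
    ...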

If, for whatever reason, the above does not work or seems too far away from your proposal, here is another idea to make more explicit hwloops work with the VP intrinsics - in a way that does not break under optimization:

vector.preheader:
  %evl = call i32 @llvm.hwloop.set.elements(i32 %n)

vector.body:
  %lastevl = phi i32 [ %evl, %vector.preheader ], [ %next.evl, %vector.body ]
  %aval = call @llvm.vp.load(%Aptr, ..., i32 %lastevl)
  call @llvm.vp.store(%Bptr, %aval, ..., i32 %lastevl)
  %next.evl = call i32 @llvm.hwloop.decrement(i32 %lastevl)

Note that the way VP intrinsics are designed, it is not possible to break this code by hoisting the VP calls out of the loop: passing "%evl >= the operation's vector size" constitutes UB (see https://llvm.org/docs/LangRef.html#vector-predication-intrinsics). We can use attributes to do the same for sinking (e.g., don't move VP calls across hwloop.decrement).

- Simon