[PATCH] D79100: [LV][TTI] Emit new IR intrinsic llvm.get.active.mask for tail-folded loops

Simon Moll via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed May 20 05:23:43 PDT 2020


simoll added a comment.

In D79100#2041782 <https://reviews.llvm.org/D79100#2041782>, @SjoerdMeijer wrote:

> In D79100#2041747 <https://reviews.llvm.org/D79100#2041747>, @SjoerdMeijer wrote:
>
> > In D79100#2041646 <https://reviews.llvm.org/D79100#2041646>, @samparker wrote:
> >
> > > I was expecting the intrinsic to be performing the icmp because it feels as though if a target wants an intrinsic like this, that it would want it to do //something//?
> >
> >
> > That was my initial thought too, so I started drafting an intrinsic that would take the induction step, backedge taken count, and the comparison operator, thus replacing the icmp and feeding its result into the masked load/stores.
> > This, however, turned out to be a massively invasive change because the induction variable is widened (which creates the induction step and icmp) in different places from where the masking happens. This change is very minimal, makes explicit exactly the same information, and thus had my preference.
>
>
> Hmm, but thinking about it, after my initial attempt to do this I got some more VPlan plumbing experience, and I now see that I could try again and that it is perhaps not very different. If it helps for acceptance, I can try this, but I think my previous points still stand that this change is minimal and leaves the IR intact.


The comparison really should be encapsulated in the intrinsic itself because for scalable types it is not clear how many bits the stepvector type needs to enumerate its lanes without overflow:

  %lane_ids = <vscale x 1 x i8> llvm.stepvector() ; This will overflow if 'vscale > 256' at runtime (note that a 'stepvector' intrinsic does not even exist at this point)
  %lane_mask = icmp ule %lane_ids, (splat %n)

That is not a problem if you have

  %lane_mask = <vscale x 1 x i1> llvm.active.mask(i32 0, i32 %n)

We need such an intrinsic anyway for the VP expansion pass to legalize the EVL parameter of VP intrinsics for targets that have scalable types but no active vector length (tail loop predication).
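
To make the overflow argument concrete, here is a minimal scalar sketch in C++. It is only a model of the two snippets above, not LLVM code: the function names are hypothetical, the lane count stands in for the runtime vector length, and the mask semantics (lane i is active iff base + i <= n, computed without wrapping) are an assumption chosen to match the 'icmp ule' form shown above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of "stepvector + icmp ule" with i8 lane ids.
// Lane ids are truncated to 8 bits, so they wrap once the lane index reaches 256.
// Assumes n < 256 so that the splat of n is representable in i8.
std::vector<bool> maskViaNarrowStepVector(std::size_t numLanes, std::uint32_t n) {
  std::vector<bool> mask(numLanes);
  for (std::size_t lane = 0; lane < numLanes; ++lane) {
    std::uint8_t laneId = static_cast<std::uint8_t>(lane); // wraps modulo 256
    mask[lane] = laneId <= n;                              // icmp ule %lane_ids, (splat %n)
  }
  return mask;
}

// Hypothetical model of an active-mask intrinsic taking (base, n).
// Assumed semantics: lane i is active iff base + i <= n, evaluated in a type
// wide enough that base + i cannot wrap.
std::vector<bool> maskViaActiveMask(std::size_t numLanes, std::uint32_t base,
                                    std::uint32_t n) {
  std::vector<bool> mask(numLanes);
  for (std::size_t lane = 0; lane < numLanes; ++lane) {
    std::uint64_t index = static_cast<std::uint64_t>(base) + lane; // no wrap-around
    mask[lane] = index <= n;
  }
  return mask;
}
```

With 300 lanes and n = 9, the narrow model marks lanes 256 to 265 as active again because their truncated ids are 0 to 9, while the active-mask model enables only lanes 0 to 9.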


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79100/new/

https://reviews.llvm.org/D79100