[PATCH] D86147: [LangRef] WIP: Revise semantics of get.active.lane.mask

Sjoerd Meijer via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Aug 19 02:48:13 PDT 2020


SjoerdMeijer added a comment.

Thanks @efriedma and @samparker !

In D86147#2224773 <https://reviews.llvm.org/D86147#2224773>, @efriedma wrote:

> I think this makes sense, assuming you're comfortable with the current code in the ARM backend for proving that `%n - %base` doesn't overflow.

The overflow check for BTC + 1 was the main show stopper at the moment, so getting rid of that is a big step forward which I think we should progress anyway.  Slightly orthogonal or different to that is the next checks we need to do. You mentioned "proving that `%n - %base` doesn't overflow", which I think refers to this part/checks in the ARM backend:

> 2. The element count needs to be sufficiently large that the decrement of element counter doesn't overflow,

where the element count is `%n` in our examples used here. I am not entirely happy with the checks currently for this because as you remarked, they are conservative but safe. Since I haven't seen it rejecting cases that we want to support, I am overall not too unhappy with this. Sam's comments related to this are also very useful and interesting:

> Just for complete clarity in my mind of the problem: 'i' defined as the maximum vector width for the loop?

Yep.

> So we would expect a loop body like this:
>
> loop:
>  %base = phi i32 [ 0, %entry ], [ %base.next, %loop ]
>  %count = phi i32 [ 0, %entry ], [ %count.next, %loop ]
>  %mask = get.active.lane.mask(i32 %base, i32 %n)
>  %base.next = add i32 %base, %vector.width
>  %count.next = add nuw i32 %count, 1
>  %cmp = icmp ne i32 %count.next, %vector.trip.count
>  br i1 %cmp, label %loop, label %exit

Yep. That's exactly right I think. Depending on where we are in the pipeline, we don't need `%count`, but that doesn't change anything about this observation:

> And so now we need to prove that (%base + (%vector.trip.count * %vector.width)) doesn't overflow for (%vector.trip.count - 1) iterations? If so, that sounds like a task for AddRecExpr::evaluateAtIteration?

which looks like the overflow problem formulated differently (than what we currently have), and might be easier to analyse.

I am now first going to prototype replacing the BTC with the tripcount to avoid the overflow checks for BTC+1 and will put the different patches up for review if I haven't found any new surprises.
After that, I will progress the `%n - %base` checks, and see if we can improve that using Sam's suggestions.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86147/new/

https://reviews.llvm.org/D86147



More information about the llvm-commits mailing list