[PATCH] D86147: [LangRef] WIP: Revise semantics of get.active.lane.mask
Sjoerd Meijer via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Aug 19 02:48:13 PDT 2020
SjoerdMeijer added a comment.
Thanks @efriedma and @samparker !
In D86147#2224773 <https://reviews.llvm.org/D86147#2224773>, @efriedma wrote:
> I think this makes sense, assuming you're comfortable with the current code in the ARM backend for proving that `%n - %base` doesn't overflow.
The overflow check for BTC + 1 was the main show stopper at the moment, so getting rid of that is a big step forward which I think we should progress anyway. Slightly orthogonal or different to that is the next checks we need to do. You mentioned "proving that `%n - %base` doesn't overflow", which I think refers to this part/checks in the ARM backend:
> 2. The element count needs to be sufficiently large that the decrement of element counter doesn't overflow,
where the element count is `%n` in our examples used here. I am not entirely happy with the checks currently for this because as you remarked, they are conservative but safe. Since I haven't seen it rejecting cases that we want to support, I am overall not too unhappy with this. Sam's comments related to this are also very useful and interesting:
> Just for complete clarity in my mind of the problem: 'i' defined as the maximum vector width for the loop?
Yep.
> So we would expect a loop body like this:
>
> loop:
> %base = phi i32 [ 0, %entry ], [ %base.next, %loop ]
> %count = phi i32 [ 0, %entry ], [ %count.next, %loop ]
> %mask = get.active.lane.mask(i32 %base, i32 %n)
> %base.next = add i32 %base, %vector.width
> %count.next = add nuw i32 %count, 1
> %cmp = icmp ne i32 %count.next, %vector.trip.count
> br i1 %cmp, label %loop, label %exit
Yep. That's exactly right I think. Depending on where we are in the pipeline, we don't need `%count`, but that doesn't change anything about this observation:
> And so now we need to prove that (%base + (%vector.trip.count * %vector.width)) doesn't overflow for (%vector.trip.count - 1) iterations? If so, that sounds like a task for AddRecExpr::evaluateAtIteration?
which looks like the overflow problem formulated differently (than what we currently have), and might be easier to analyse.
I am now first going to prototype replacing the BTC with the tripcount to avoid the overflow checks for BTC+1 and will put the different patches up for review if I haven't found any new surprises.
After that, I will progress the `%n - %base` checks, and see if we can improve that using Sam's suggestions.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D86147/new/
https://reviews.llvm.org/D86147
More information about the llvm-commits
mailing list