[LLVMdev] Missed vectorization opportunities?
Vaivaswatha N
vaivaswatha at yahoo.co.in
Thu Apr 23 00:11:20 PDT 2015
Hi,
An update on my experiment with the first loop:
For the first loop, if I change the pragma to "#pragma clang loop vectorize_width(4) interleave_count(2)" and force the legality check in isStridedPtr() to pass, the loop gets vectorized and runs faster too.
So in summary, the issue with vectorizing the first loop seems to be (1) an overly strict legality check that does not understand that the index cannot really overflow, and (2) a cost computation that says it's not profitable to vectorize the loop.
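For reference, here is a minimal sketch of how that pragma attaches to a strided loop. The function name, element type, and trip count are hypothetical, since the original loop is not reproduced in this message. The only thing taken from the experiment is the pragma itself. Compilers other than Clang will typically just ignore it.

```c
#include <stddef.h>

enum { N = 1024 };  /* hypothetical, statically known trip count */

/* Hypothetical stride-2 read: out[j] = in[2 * j].
 * With a constant bound of N, neither j + 1 nor 2 * j can overflow an int. */
void gather_even(short *out, const short *in) {
#pragma clang loop vectorize_width(4) interleave_count(2)
    for (int j = 0; j < N; j++)
        out[j] = in[2 * j];
}
```

The pragma only overrides the cost model; it does not bypass the legality check, which is why the isStridedPtr() change was also needed in my experiment.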
Thanks,
- Vaivaswatha
On Thursday, 23 April 2015 11:05 AM, Vaivaswatha N <vaivaswatha at yahoo.co.in> wrote:
Thank you Sanjoy for the explanation. Is it worth filing a bug over this at this point?
Hi James,
> Your first example is similar to the strided loops that Hao is working on vectorizing with his indexed load intrinsics.
I'm curious. For the example I mentioned, legality check fails because the corresponding SCEV doesn't have nsw set and hence isStridedPtr() returns false. In reality the induction variable has a statically known bound and it cannot overflow, so it is really legal to vectorize the loop. Did you face this problem (and solve it)?
Thanks everyone for your response and clarification.
- Vaivaswatha
_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
On Thursday, 23 April 2015 12:22 AM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:
> I expect SCEV treats them differently because of MAX_INT handling.
> Look at the definedness of both if n == MAX_INT. The first has
> undefined behavior, the second does not.
> If you change the second into the first, you introduce undefined behavior.
> (or maybe it's implementation defined, but whatever)
To elaborate a little further on this:
In the first loop, you can never enter the loop with "j == INT_SMAX"
since INT_SMAX will never be < anything. This means j + 1 cannot
overflow. In the second loop you /can/ enter the loop with "j ==
INT_SMAX" if "n == INT_SMAX" so j + 1 can potentially overflow.
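A small C sketch of the two entry conditions, assuming the loops differ only in using "j < n" versus "j <= n", as the reasoning above implies. The helper names are made up for illustration, and INT_MAX stands in for INT_SMAX.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helpers modelling the loop guards. */
static bool enters_lt(int j, int n) { return j < n; }   /* first loop  */
static bool enters_le(int j, int n) { return j <= n; }  /* second loop */

/* j + 1 overflows exactly when the body is reached with j == INT_MAX. */
static bool increment_overflows(int j) { return j == INT_MAX; }
```

For every n, enters_lt(INT_MAX, n) is false, so the body of the first loop never sees a j for which increment_overflows(j) holds. enters_le(INT_MAX, INT_MAX) is true, so the second loop can reach exactly that case.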
Ideally SCEV should be able to infer the nsw'ness of the additions
from the nsw bits in the source IR; but that's more complex than it
sounds since SCEV does not have a notion of control flow within the
loop and it hashes SCEVs by the operands and not by the nsw/nuw bits.
Crude example:
define void @x(i32 %a, i32 %b, i1 %c) {
entry:
  %m = add i32 %a, %b
  br i1 %c, label %do, label %dont

do:
  %m1 = add nsw i32 %a, %b
  br label %dont

dont:
  ret void
}
both %m and %m1 get mapped to the *same* SCEV, and you cannot mark
that SCEV as nsw even though %m1 is nsw.
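To make the uniquing problem concrete, here is a toy model in C. It is only an illustration under my own assumptions: the AddExpr struct, the table, and get_add_expr() are hypothetical and are not LLVM's actual data structures. The point it shows is that when the lookup key is the operands alone, two adds that differ only in nsw resolve to one shared node.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; not how LLVM implements SCEV. */
typedef struct {
    int lhs, rhs;  /* operand ids, e.g. 1 for %a and 2 for %b */
    bool nsw;      /* a flag stored on the shared node */
} AddExpr;

#define MAX_EXPRS 16
static AddExpr table[MAX_EXPRS];
static size_t num_exprs = 0;

/* Unique by operands only: the nsw flag is not part of the key. */
static AddExpr *get_add_expr(int lhs, int rhs) {
    for (size_t i = 0; i < num_exprs; i++)
        if (table[i].lhs == lhs && table[i].rhs == rhs)
            return &table[i];
    if (num_exprs == MAX_EXPRS)
        return NULL;  /* table full */
    table[num_exprs] = (AddExpr){lhs, rhs, false};
    return &table[num_exprs++];
}
```

Calling get_add_expr(1, 2) once for %m and once for %m1 returns the same pointer. Setting nsw on that node because of %m1 would also claim it for %m, which is not justified.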
-- Sanjoy
>
>
> This is the:
> if (!getUnsignedRange(RHS).getUnsignedMax().isMaxValue()) {
>
> check in that function simplify.
>
> But you should file a bug anyway.