<table cellspacing="0" cellpadding="0" border="0"><tr><td valign="top"><div>Hi,<br />An update on my experiment with the first loop:<br />For the first loop, if I change the pragma to "#pragma clang loop vectorize_width(4) interleave_count(2)" and force the legality check in isStridedPtr() to pass, the loop gets vectorized and runs faster too.<br />So in summary, the issue with vectorizing the first loop seems to be (1) an overly strict legality check that does not understand that the index cannot really overflow and (2) a cost computation that says it's not profitable to vectorize the loop.<br />Thanks,<br /> - Vaivaswatha <br /><br /><br /> <br /><br /> On Thursday, 23 April 2015 11:05 AM, Vaivaswatha N &lt;vaivaswatha@yahoo.co.in&gt; wrote:<br /> <br /><br /> Thank you Sanjoy for the explanation. Is it worth filing a bug over this at this point?<br />Hi James,<br />> Your first example is similar to the strided loops that Hao is working on vectorizing with his indexed load
intrinsics.<br />I'm curious. For the example I mentioned, the legality check fails because the corresponding SCEV doesn't have nsw set and hence isStridedPtr() returns false. In reality the induction variable has a statically known bound and cannot overflow, so it is really legal to vectorize the loop. Did you face this problem (and solve it)?<br />Thanks everyone for your responses and clarifications.<br /><br /><br /> - Vaivaswatha <br /><br /><br /><br />_______________________________________________<br />LLVM Developers mailing list<br />LLVMdev@cs.uiuc.edu http://llvm.cs.uiuc.edu<br />http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev<br /><br /><br /> On Thursday, 23 April 2015 12:22 AM, Sanjoy Das &lt;sanjoy@playingwithpointers.com&gt; wrote:<br /> <br /><br /> > I expect SCEV treats them differently because of MAX_INT handling.<br />> Look at the definedness of both if n == MAX_INT. The first has<br />> undefined behavior, the second does not.<br
/>> If you change the second into the first, you introduce undefined behavior.<br />> (or maybe it's implementation defined, but whatever)<br /><br />To elaborate a little further on this:<br /><br />In the first loop, you can never enter the loop with "j == INT_SMAX"<br />since INT_SMAX will never be &lt; anything. This means j + 1 cannot<br />overflow. In the second loop you /can/ enter the loop with "j ==<br />INT_SMAX" if "n == INT_SMAX" so j + 1 can potentially overflow.<br /><br />Ideally SCEV should be able to infer the nsw'ness of the additions<br />from the nsw bits in the source IR; but that's more complex than it<br />sounds since SCEV does not have a notion of control flow within the<br />loop and it hashes SCEVs by the operands and not by the nsw/nuw bits.<br />Crude example:<br /><br />define void @x(i32 %a, i32 %b, i1 %c) {<br /> entry:<br /> %m = add i32 %a, %b<br /> br i1 %c, label %do, label %dont<br /><br /> do:<br /> %m1 =
add nsw i32 %a, %b<br /> br label %dont<br /><br /> dont:<br /> ret void<br />}<br /><br />both %m and %m1 get mapped to the *same* SCEV, and you cannot mark<br />that SCEV as nsw even though %m1 is nsw.<br /><br />-- Sanjoy<br /><br /><br /><br />><br />><br />> This is the:<br />> if (!getUnsignedRange(RHS).getUnsignedMax().isMaxValue()) {<br />><br />> check in that function simplify.<br />><br />> But you should file a bug anyway.<br /><br /><br /> </div></td></tr></table>