[llvm-dev] LoopVectorize fails to vectorize loops with induction variables with PtrToInt/IntToPtr conversions

Sat Jun 17 16:07:10 PDT 2017

FYI.

On Sat, Jun 17, 2017 at 3:41 PM, Adrien Guinet via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> Hello all,
>
> Clang 4.0 misses a vectorization opportunity on the attached file.
>
> When compiled with -O2, the "op_distance" function gets vectorized,
> but the "op" function does not.
>
> For information, this test case has been reduced from a file generated
> by the Pythran compiler (https://github.com/serge-sans-paille/pythran).
>
> If we take a look at the generated IR without vectorization (using the
> -fno-vectorize clang flag), we get:
>
>> $ clang -O2 -S -emit-llvm op_zip_iterator.cpp -std=c++11 -o - -fno-vectorize
>
>> ; Function Attrs: norecurse uwtable
>> define void @_Z11op_distancePi16add_zip_iteratorS0_(i32* nocapture, i32*, i32* nocapture readonly, i32*, i32* nocapture readnone) local_unnamed_addr #0 {
>> ; This one is vectorized!
>>   %6 = ptrtoint i32* %1 to i64
>>   %7 = ptrtoint i32* %3 to i64
>>   %8 = sub i64 %7, %6
>>   %9 = icmp sgt i64 %8, 0
>>   br i1 %9, label %10, label %26
>>
>> ; <label>:10:                                     ; preds = %5
>>   %11 = lshr exact i64 %8, 2
>>   br label %12
>>
>> ; <label>:12:                                     ; preds = %12, %10
>>   %13 = phi i64 [ %23, %12 ], [ %11, %10 ]
>>   %14 = phi i32* [ %22, %12 ], [ %0, %10 ]
>>   %15 = phi i32* [ %21, %12 ], [ %2, %10 ]
>>   %16 = phi i32* [ %20, %12 ], [ %1, %10 ]
>>   %17 = load i32, i32* %16, align 4, !tbaa !1
>>   %18 = load i32, i32* %15, align 4, !tbaa !1
>>   %19 = add nsw i32 %18, %17
>>   store i32 %19, i32* %14, align 4, !tbaa !1
>>   %20 = getelementptr inbounds i32, i32* %16, i64 1
>>   %21 = getelementptr inbounds i32, i32* %15, i64 1
>>   %22 = getelementptr inbounds i32, i32* %14, i64 1
>>   %23 = add nsw i64 %13, -1
>>   %24 = icmp sgt i64 %13, 1
>>   br i1 %24, label %12, label %25
>>
>> ; <label>:25:                                     ; preds = %12
>>   br label %26
>>
>> ; <label>:26:                                     ; preds = %25, %5
>>   ret void
>> }
>>
>> ; Function Attrs: norecurse uwtable
>> define void @_Z2opPi16add_zip_iteratorS0_(i32* nocapture, i32*, i32* nocapture readonly, i32*, i32* nocapture readnone) local_unnamed_addr #0 {
>> ; This one isn't!
>>   %6 = ptrtoint i32* %1 to i64
>>   %7 = ptrtoint i32* %3 to i64
>>   %8 = sub i64 %6, %7
>>   %9 = icmp sgt i64 %8, 0
>>   br i1 %9, label %10, label %28
>>
>> ; <label>:10:                                     ; preds = %5
>>   %11 = lshr exact i64 %8, 2
>>   br label %12
>>
>> ; <label>:12:                                     ; preds = %12, %10
>>   %13 = phi i64 [ %25, %12 ], [ %11, %10 ]
>>   %14 = phi i32* [ %24, %12 ], [ %0, %10 ]
>>   %15 = phi i32* [ %23, %12 ], [ %2, %10 ]
>>   %16 = phi i64 [ %22, %12 ], [ %6, %10 ]
>>   %17 = inttoptr i64 %16 to i32*
>>   %18 = load i32, i32* %17, align 4, !tbaa !1
>>   %19 = load i32, i32* %15, align 4, !tbaa !1
>>   %20 = add nsw i32 %19, %18
>>   store i32 %20, i32* %14, align 4, !tbaa !1
>>   %21 = getelementptr inbounds i32, i32* %17, i64 1
>>   %22 = ptrtoint i32* %21 to i64
>>   %23 = getelementptr inbounds i32, i32* %15, i64 1
>>   %24 = getelementptr inbounds i32, i32* %14, i64 1
>>   %25 = add nsw i64 %13, -1
>>   %26 = icmp sgt i64 %13, 1
>>   br i1 %26, label %12, label %27
>>
>> ; <label>:27:                                     ; preds = %12
>>   br label %28
>>
>> ; <label>:28:                                     ; preds = %27, %5
>>   ret void
>> }
>
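
To make the difference between the two loops concrete, here is a
source-level analogue in C++. It is only a sketch, not the attached test
case: the names add_ptr and add_int_cursor are made up, and I am not
claiming clang emits exactly the IR above for it. The first loop carries
a pointer between iterations (like %16 in "op_distance"); the second
carries the same cursor as an integer (like the %16/%17/%21/%22 chain in
"op").

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Pointer induction variable: the cursor over `a` stays a pointer.
void add_ptr(int* out, const int* a, const int* b, std::ptrdiff_t n) {
  assert(n >= 0);
  for (; n > 0; --n) {
    *out++ = *a + *b++;
    ++a;  // plays the role of a getelementptr on the pointer phi
  }
}

// Same computation, but the cursor over `a` is carried across iterations
// as an integer and converted back to a pointer at each use.
void add_int_cursor(int* out, const int* a, const int* b, std::ptrdiff_t n) {
  assert(n >= 0);
  std::uintptr_t cursor = reinterpret_cast<std::uintptr_t>(a);  // ptrtoint
  for (; n > 0; --n) {
    const int* p = reinterpret_cast<const int*>(cursor);        // inttoptr
    *out++ = *p + *b++;
    cursor = reinterpret_cast<std::uintptr_t>(p + 1);  // GEP, then ptrtoint
  }
}
```

Both functions compute the same result; only the type of the loop-carried
value differs.
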
> If we look only at the "op" function, with the loop vectorizer's
> debug output enabled, here is what we get:
>
>> $ clang -O2 -S -emit-llvm op_zip_iterator.cpp -std=c++11 -o - -fno-vectorize |~/dev/epona-llvm/build_debug_shared/bin/opt -debug -debug-only loop-vectorize -O2 -S
>>
>> LV: Checking a loop in "_Z2opPi16add_zip_iteratorS0_" from <stdin>
>> LV: Loop hints: force=? width=0 unroll=0
>> LV: Found a loop:
>> LV: Found an induction variable.
>> LV: Found an induction variable.
>> LV: Found an induction variable.
>> LV: Found an unidentified PHI.  %16 = phi i64 [ %22, %12 ], [ %6, %10 ]
>> LV: Can't vectorize the instructions or CFG
>> LV: Not vectorizing: Cannot prove legality.
>> [...]
>
> The issue seems to be that the phi node "%16" is not recognized as an
> induction variable. On closer inspection, the cause seems to be in
> ScalarEvolution, in the createSCEV function
> (http://llvm.org/docs/doxygen/html/ScalarEvolution_8cpp_source.html#l04770):
>
>>  // It's tempting to handle inttoptr and ptrtoint as no-ops, however this can
>>  // lead to pointer expressions which cannot safely be expanded to GEPs,
>>  // because ScalarEvolution doesn't respect the GEP aliasing rules when
>>  // simplifying integer expressions.
>
> Indeed, SCEV (legitimately) does not treat inttoptr/ptrtoint as
> no-ops, and does not handle them. As a result, in our case the chain
> that advances %16 (the inttoptr in %17, the GEP in %21 and the
> ptrtoint in %22) is not analyzed by SCEV, and the PHI %16 is thus not
> considered an induction variable.
>
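
One way to read the quoted SCEV comment, offered as an illustration
rather than as a description of SCEV's internals: once pointers have
become integers, any algebraic rewrite is allowed. The sketch below (the
helper name address_via is hypothetical) computes the address of one
object starting from the address of another. As integer arithmetic this
is well defined and always yields the address of q. If the same
expression were expanded back into a GEP with p as its base, the result
would point into an object that p is not based on.

```cpp
#include <cassert>
#include <cstdint>

// Returns the address of `q`, computed starting from the address of `p`.
// Unsigned arithmetic wraps, so ip + (iq - ip) is always equal to iq.
std::uintptr_t address_via(const int* p, const int* q) {
  assert(p != nullptr && q != nullptr);
  const std::uintptr_t ip = reinterpret_cast<std::uintptr_t>(p);
  const std::uintptr_t iq = reinterpret_cast<std::uintptr_t>(q);
  return ip + (iq - ip);
}
```
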
> To confirm this hypothesis, I created a small out-of-tree pass
> (https://github.com/aguinet/llvm-intptrcleanup) which runs before
> loop vectorization and does the following:
>
> * first, it searches for phi nodes that have these properties:
>   - every incoming value of the phi node is a ptrtoint instruction,
>     and the source pointer type of every such ptrtoint is the same
>     type T
>   - every user of the phi node is an inttoptr instruction to that
>     same type T
> * for each of these phi nodes, it creates a new phi node which takes
>   the original pointers as incoming values, and replaces the uses of
>   the inttoptr instructions that use the original phi node with the
>   new one
> * it then removes those inttoptr instructions and the original phi
>   node
>
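
The steps above can be sketched on a toy IR. This is not the code of the
linked pass and it uses no LLVM API: Op, Node, Graph and
cleanupIntPtrPhis are names invented for the illustration, instructions
are indices into a vector, and there is a single pointer type, so the
"same type T" check is assumed to hold.

```cpp
#include <cassert>
#include <vector>

// Toy instruction kinds; Other stands for any instruction not listed.
enum class Op { Arg, PtrToInt, IntToPtr, Phi, Gep, Other };

struct Node {
  Op op;
  std::vector<int> operands;  // indices of operand nodes in the graph
  bool dead;                  // set once the node has been removed
};

using Graph = std::vector<Node>;

// Rewrites integer phis fed only by ptrtoint and used only by inttoptr
// into phis over the original pointers. Returns the number rewritten.
int cleanupIntPtrPhis(Graph& g) {
  int rewritten = 0;
  const int originalSize = static_cast<int>(g.size());
  for (int phi = 0; phi < originalSize; ++phi) {
    if (g[phi].dead || g[phi].op != Op::Phi) continue;

    // Property 1: every incoming value is a ptrtoint.
    bool ok = !g[phi].operands.empty();
    for (int in : g[phi].operands) ok = ok && g[in].op == Op::PtrToInt;

    // Property 2: every user of the phi is an inttoptr.
    std::vector<int> casts;
    for (int u = 0; ok && u < static_cast<int>(g.size()); ++u) {
      if (g[u].dead) continue;
      for (int operand : g[u].operands) {
        if (operand != phi) continue;
        if (g[u].op == Op::IntToPtr) casts.push_back(u);
        else ok = false;
      }
    }
    if (!ok || casts.empty()) continue;

    // New phi over the pointers that were fed to the ptrtoints.
    Node ptrPhi{Op::Phi, {}, false};
    for (int in : g[phi].operands) {
      assert(g[in].operands.size() == 1);
      ptrPhi.operands.push_back(g[in].operands[0]);
    }
    const int ptrPhiIdx = static_cast<int>(g.size());
    g.push_back(ptrPhi);

    // Redirect every use of the inttoptr casts to the new phi.
    for (Node& node : g)
      for (int& operand : node.operands)
        for (int c : casts)
          if (operand == c) operand = ptrPhiIdx;

    // Remove the casts and the original integer phi.
    for (int c : casts) g[c].dead = true;
    g[phi].dead = true;
    ++rewritten;
  }
  return rewritten;
}
```

Applied to the relevant part of the "op" loop (%1, %6, %16, %17, %21 and
%22 as nodes 0 to 5), this replaces the i64 phi with a pointer phi over
%21 and %1, which is the shape the loop has in "op_distance".
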
> The way I understand inttoptr and ptrtoint, this transformation should
> be valid (but I might have missed something!). Please note that this is
> a quick'n'dirty pass, which hasn't been heavily tested. Using this pass,
> the previous example is now vectorized correctly by the loop vectorizer.
> This can be seen by looking at the output of:
>
>> $ clang -Xclang -load -Xclang IntToPtrCleanup.so -O2 ./example/op_zip_operator.cpp -S -emit-llvm -o - -std=c++11
>
> The remaining question for me is how this should be properly fixed:
>
> 1) make SCEV support these inttoptr/ptrtoint conversions when they
> are no-ops (as in this case)
> 2) insert the above transformation at some point in the optimization
> pipeline
> 3) fix the pass(es?) that generated this pattern in the first place.
>
> I have to admit I'm not really sure which option is best. 3) seems
> to be the way to go, but it might require some tedious work and does
> not prevent the issue from coming back in the future. 2) seems to be
> a quick patch that could be inserted into some "canonicalization"
> pass, provided it is a valid transformation in the first place. I
> don't know SCEV well enough to judge the difficulty/feasibility of 1).
>
> The purpose of this mail is to discuss this issue and how to fix it
> properly :)
>
> Thanks everyone :)
>
> --
> Adrien Guinet.

-- 
Davide

"There are no solved problems; there are only problems that are more
or less solved" -- Henri Poincare