[llvm-dev] LoopVectorize fails to vectorize loops with induction variables with PtrToInt/IntToPtr conversions
Davide Italiano via llvm-dev
llvm-dev at lists.llvm.org
Sat Jun 17 16:07:10 PDT 2017
FYI.
On Sat, Jun 17, 2017 at 3:41 PM, Adrien Guinet via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> Hello all,
>
> clang 4.0 misses a vectorization opportunity on the attached file.
>
> When it is compiled with -O2, the "op_distance" function gets
> vectorized, but the "op" function does not.
>
> For information, this test case has been reduced from a file generated
> by the Pythran compiler (https://github.com/serge-sans-paille/pythran).
>
> If we take a look at the generated IR without vectorization (using the
> -fno-vectorize clang flag), we get:
>
>> $ clang -O2 -S -emit-llvm op_zip_iterator.cpp -std=c++11 -o - -fno-vectorize
>
>> ; Function Attrs: norecurse uwtable
>> define void @_Z11op_distancePi16add_zip_iteratorS0_(i32* nocapture, i32*, i32* nocapture readonly, i32*, i32* nocapture readnone) local_unnamed_addr #0 {
>> ; This one is vectorized!
>> %6 = ptrtoint i32* %1 to i64
>> %7 = ptrtoint i32* %3 to i64
>> %8 = sub i64 %7, %6
>> %9 = icmp sgt i64 %8, 0
>> br i1 %9, label %10, label %26
>>
>> ; <label>:10: ; preds = %5
>> %11 = lshr exact i64 %8, 2
>> br label %12
>>
>> ; <label>:12: ; preds = %12, %10
>> %13 = phi i64 [ %23, %12 ], [ %11, %10 ]
>> %14 = phi i32* [ %22, %12 ], [ %0, %10 ]
>> %15 = phi i32* [ %21, %12 ], [ %2, %10 ]
>> %16 = phi i32* [ %20, %12 ], [ %1, %10 ]
>> %17 = load i32, i32* %16, align 4, !tbaa !1
>> %18 = load i32, i32* %15, align 4, !tbaa !1
>> %19 = add nsw i32 %18, %17
>> store i32 %19, i32* %14, align 4, !tbaa !1
>> %20 = getelementptr inbounds i32, i32* %16, i64 1
>> %21 = getelementptr inbounds i32, i32* %15, i64 1
>> %22 = getelementptr inbounds i32, i32* %14, i64 1
>> %23 = add nsw i64 %13, -1
>> %24 = icmp sgt i64 %13, 1
>> br i1 %24, label %12, label %25
>>
>> ; <label>:25: ; preds = %12
>> br label %26
>>
>> ; <label>:26: ; preds = %25, %5
>> ret void
>> }
>>
>> ; Function Attrs: norecurse uwtable
>> define void @_Z2opPi16add_zip_iteratorS0_(i32* nocapture, i32*, i32* nocapture readonly, i32*, i32* nocapture readnone) local_unnamed_addr #0 {
>> ; This one isn't!
>> %6 = ptrtoint i32* %1 to i64
>> %7 = ptrtoint i32* %3 to i64
>> %8 = sub i64 %6, %7
>> %9 = icmp sgt i64 %8, 0
>> br i1 %9, label %10, label %28
>>
>> ; <label>:10: ; preds = %5
>> %11 = lshr exact i64 %8, 2
>> br label %12
>>
>> ; <label>:12: ; preds = %12, %10
>> %13 = phi i64 [ %25, %12 ], [ %11, %10 ]
>> %14 = phi i32* [ %24, %12 ], [ %0, %10 ]
>> %15 = phi i32* [ %23, %12 ], [ %2, %10 ]
>> %16 = phi i64 [ %22, %12 ], [ %6, %10 ]
>> %17 = inttoptr i64 %16 to i32*
>> %18 = load i32, i32* %17, align 4, !tbaa !1
>> %19 = load i32, i32* %15, align 4, !tbaa !1
>> %20 = add nsw i32 %19, %18
>> store i32 %20, i32* %14, align 4, !tbaa !1
>> %21 = getelementptr inbounds i32, i32* %17, i64 1
>> %22 = ptrtoint i32* %21 to i64
>> %23 = getelementptr inbounds i32, i32* %15, i64 1
>> %24 = getelementptr inbounds i32, i32* %14, i64 1
>> %25 = add nsw i64 %13, -1
>> %26 = icmp sgt i64 %13, 1
>> br i1 %26, label %12, label %27
>>
>> ; <label>:27: ; preds = %12
>> br label %28
>>
>> ; <label>:28: ; preds = %27, %5
>> ret void
>> }
>
> If we compile only the "op" function with debug output enabled,
> here is the output:
>
>> $ clang -O2 -S -emit-llvm op_zip_iterator.cpp -std=c++11 -o - -fno-vectorize |~/dev/epona-llvm/build_debug_shared/bin/opt -debug -debug-only loop-vectorize -O2 -S
>>
>> LV: Checking a loop in "_Z2opPi16add_zip_iteratorS0_" from <stdin>
>> LV: Loop hints: force=? width=0 unroll=0
>> LV: Found a loop:
>> LV: Found an induction variable.
>> LV: Found an induction variable.
>> LV: Found an induction variable.
>> LV: Found an unidentified PHI. %16 = phi i64 [ %22, %12 ], [ %6, %10 ]
>> LV: Can't vectorize the instructions or CFG
>> LV: Not vectorizing: Cannot prove legality.
>> [...]
>
> The issue seems to be that the PHI node "%16" is not recognized as an
> induction variable. On closer inspection, the cause seems to be in
> ScalarEvolution, in the createSCEV function
> (http://llvm.org/docs/doxygen/html/ScalarEvolution_8cpp_source.html#l04770):
>
>> // It's tempting to handle inttoptr and ptrtoint as no-ops, however this can
>> // lead to pointer expressions which cannot safely be expanded to GEPs,
>> // because ScalarEvolution doesn't respect the GEP aliasing rules when
>> // simplifying integer expressions.
>
> Indeed, SCEV (legitimately) does not treat inttoptr/ptrtoint as
> no-ops, and does not handle them. The thing is that, in our case, the GEP
> in %21 (fed back to the PHI through the ptrtoint in %22) is thus not
> analyzed by SCEV, and the PHI %16 is thus not considered an induction
> variable.
>
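To make the two loop shapes concrete at the source level, here is a minimal C++ sketch. It is not the attached test case: the function names add_ptr and add_int_roundtrip are hypothetical, and nothing here establishes that a given clang version emits exactly the IR quoted above for this code. It only mirrors the pattern, where "op_distance" carries a pointer across iterations and "op" carries an integer that is cast back to a pointer, advanced, and cast to an integer again (the role played by %16).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Pointer induction: every loop-carried iterator stays a pointer.
void add_ptr(int* out, const int* a, const int* b, std::ptrdiff_t n) {
    for (; n > 0; --n) {
        *out++ = *a++ + *b++;
    }
}

// Integer round trip: one iterator is carried as an integer, so each
// iteration does inttoptr -> load -> pointer increment -> ptrtoint.
void add_int_roundtrip(int* out, const int* a, const int* b, std::ptrdiff_t n) {
    std::uintptr_t it = reinterpret_cast<std::uintptr_t>(a);
    for (; n > 0; --n) {
        const int* p = reinterpret_cast<const int*>(it);  // inttoptr
        *out++ = *p + *b++;
        it = reinterpret_cast<std::uintptr_t>(p + 1);     // GEP, then ptrtoint
    }
}

// Runs both shapes on the same input and checks that they agree.
void check_shapes_agree() {
    std::vector<int> a{1, 2, 3, 4}, b{10, 20, 30, 40}, o1(4), o2(4);
    add_ptr(o1.data(), a.data(), b.data(), 4);
    add_int_roundtrip(o2.data(), a.data(), b.data(), 4);
    assert(o1 == o2);
}
```

Both functions compute the same result. The only difference is the type of the loop-carried value, which is what separates the recognized induction variables from the unidentified PHI in the debug output.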
> To confirm this hypothesis, I created a small out-of-tree pass
> (https://github.com/aguinet/llvm-intptrcleanup) which registers before
> loop vectorization and does the following:
>
> * first, it searches for PHI nodes that have these properties:
>   - every incoming value of the PHI node is a ptrtoint instruction. The
>     original pointer type of every ptrtoint instruction must be the same type T.
>   - every user of the PHI node is an inttoptr instruction to the
>     same type T
> * for each of these PHI nodes, it creates a new PHI node which takes the
> original pointers as incoming values, and replaces the uses of the
> inttoptr instructions that use the original PHI node with the new one
> * it then removes the now-dead inttoptr instructions and the original
> PHI node
>
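The steps listed above can be sketched on a toy IR. This is not the LLVM API and not the code from the linked repository: Node, Op, matchedPointerType and rewritePhi are hypothetical names made up for illustration, and the sketch says nothing about whether the rewrite is legal in general. Requiring at least one user is also an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Toy IR, NOT the LLVM API: just enough structure to express the rewrite.
// Op::Pointer stands for any pointer-typed value (argument, GEP, ...).
enum class Op { Pointer, PtrToInt, IntToPtr, Phi };

struct Node {
    Op op;
    std::string type;             // result type, e.g. "i32*" or "i64"
    std::vector<Node*> operands;  // incoming values for a Phi, source otherwise
    std::vector<Node*> users;     // nodes that use this node as an operand
};

// Returns the common pointer type T if `phi` matches the pattern, else "".
std::string matchedPointerType(const Node& phi) {
    if (phi.op != Op::Phi || phi.operands.empty() || phi.users.empty()) return "";
    std::string T;
    for (const Node* in : phi.operands) {  // every incoming value is ptrtoint(T)
        if (in->op != Op::PtrToInt) return "";
        assert(!in->operands.empty() && "ptrtoint needs a source operand");
        if (T.empty()) T = in->operands.front()->type;
        else if (in->operands.front()->type != T) return "";
    }
    for (const Node* u : phi.users)        // every user is inttoptr to T
        if (u->op != Op::IntToPtr || u->type != T) return "";
    return T;
}

// Builds the pointer-typed PHI and redirects every use of the inttoptr casts
// to it. Returns nullptr when the pattern does not match.
std::unique_ptr<Node> rewritePhi(Node& phi) {
    const std::string T = matchedPointerType(phi);
    if (T.empty()) return nullptr;
    std::unique_ptr<Node> ptrPhi(new Node{Op::Phi, T, {}, {}});
    for (Node* in : phi.operands)          // ptrtoint(p) becomes p
        ptrPhi->operands.push_back(in->operands.front());
    for (Node* cast : phi.users) {         // uses of inttoptr(phi) become ptrPhi
        for (Node* user : cast->users) {
            std::replace(user->operands.begin(), user->operands.end(),
                         cast, ptrPhi.get());
            ptrPhi->users.push_back(user);
        }
        cast->users.clear();               // the cast is now dead
    }
    phi.users.clear();                     // the original PHI is now dead
    return ptrPhi;
}
```

On the "op" loop, the integer PHI %16 has the two ptrtoint values %22 and %6 as incoming values and the inttoptr %17 as its only user, so it matches. The new PHI would take %21 and %1 as incoming values, which is the same shape as the pointer PHI in "op_distance".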
> The way I understand inttoptr and ptrtoint, this transformation should
> be valid (but I might have missed something!). Please note that this is
> a quick'n'dirty pass, which hasn't been heavily tested. Using this pass,
> the previous example is now vectorized correctly by the loop vectorizer.
> This can be seen by looking at the output of:
>
>> $ clang -Xclang -load -Xclang IntToPtrCleanup.so -O2 ./example/op_zip_operator.cpp -S -emit-llvm -o - -std=c++11
>
> The question that remains for me is how this should be properly fixed:
>
> 1) make SCEV support these inttoptr/ptrtoint conversions, which are
> no-ops in this case
> 2) insert the above transformation at some point in the optimization
> pipeline
> 3) fix the pass(es?) that generated this pattern in the first place.
>
> I have to admit I'm not really sure which option is best. 3) seems
> to be the way to go but might require some tedious work, and does not
> prevent the issue from coming back in the future. 2) seems to be a quick
> patch that could be inserted in some "canonicalization" pass, provided
> it is a valid transformation in the first place. I don't know SCEV well
> enough to judge the difficulty/feasibility of 1).
>
> This mail is thus to discuss this issue and how to fix this properly :)
>
> Thanks everyone :)
>
> --
> Adrien Guinet.
>
--
Davide
"There are no solved problems; there are only problems that are more
or less solved" -- Henri Poincare