<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <p>Hi, Adrien,</p>

    <p>Thanks for reporting this. I recommend that you file a bug report

      at <a class="moz-txt-link-freetext" href="https://bugs.llvm.org/">https://bugs.llvm.org/</a></p>

    <p>Whenever I see reports of missed optimization opportunities in

      the face of ptrtoint/inttoptr, my first question is: why are these

      instructions present in the first place? At the IR level, use of

      inttoptr is highly discouraged. Our aliasing analysis, for

      example, does not look through them, and so you'll generally see a

      lot of missed optimizations when they're around.</p>

    <p>In this case, inttoptr seems to be introduced by SROA. SROA

      should not be introducing inttoptr, but rather should be using

      GEPs on i8* (at least), to avoid introducing pointers that our AA

      can't analyze.</p>

    <p>Beyond that, if we need to handle inttoptr/ptrtoint in SCEV, then

      maybe there's a way to make it smarter about the expressions with

      which it can deal. I'm actually not sure to what "aliasing rules"

      the comment you quote below is referring. I can certainly

      understand not being able to place "inbounds" on some generated

      GEPs, but otherwise this seems non-obvious to me (i.e. either the

      expander can identify a base pointer from which to generate the

      GEP, or it can't, in which case it needs to generate an inttoptr).</p>
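    <p>To make the SCEV point concrete, here is a minimal sketch on a toy
      model of the loop-carried values in "op" (plain Python dictionaries,
      not the LLVM API; the function names and the tuple layout are
      hypothetical). It only illustrates why the PHI %16 stops looking like
      a start-plus-stride recurrence when the casts are treated as opaque,
      and it deliberately ignores the safety concern raised in the quoted
      ScalarEvolution comment.</p>

    <pre wrap="">
```python
# Toy model of the loop-carried chain in "op". Each name maps to a tuple
# (opcode, operand, ...). This layout is an assumption for illustration only.
LOOP = {
    "%16": ("phi", "%6", "%22"),   # (phi, start value, back-edge value)
    "%17": ("inttoptr", "%16"),
    "%21": ("gep", "%17", 4),      # advance by one i32, i.e. 4 bytes
    "%22": ("ptrtoint", "%21"),
}

CASTS = ("inttoptr", "ptrtoint")


def strip_casts(ir, name):
    """Follow inttoptr/ptrtoint operands until a non-cast value is reached."""
    while name in ir and ir[name][0] in CASTS:
        name = ir[name][1]
    return name


def add_recurrence(ir, phi, look_through_casts):
    """Return (start, byte_step) if the PHI advances by a constant, else None."""
    opcode, start, cur = ir[phi]
    assert opcode == "phi"
    step = 0
    while cur != phi:
        if look_through_casts:
            cur = strip_casts(ir, cur)
            if cur == phi:
                break
        inst = ir.get(cur)
        if inst is None or inst[0] != "gep":
            return None            # opaque value: no recurrence found
        step += inst[2]
        cur = inst[1]
    return (start, step)


opaque = add_recurrence(LOOP, "%16", look_through_casts=False)
transparent = add_recurrence(LOOP, "%16", look_through_casts=True)
print(opaque, transparent)
```
</pre>

    <p>With the casts opaque, the walk gives up at the ptrtoint in %22. With
      the casts looked through, it reports a start of %6 and a 4-byte step,
      which is the shape the vectorizer needs.</p>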

    <p>Sanjoy, thoughts?</p>

    <p> -Hal<br>

    </p>

    <div class="moz-cite-prefix">On 06/17/2017 05:41 PM, Adrien Guinet

      via llvm-dev wrote:<br>

    </div>

    <blockquote

      cite="mid:a8519902-ec18-7913-b77e-318d39283125@quarkslab.com"

      type="cite">

      <pre wrap="">Hello all,

There is a missing vectorization opportunity issue with clang 4.0 with

the file attached.

Indeed, when compiled with -O2, the "op_distance" function gets

vectorized, but not the "op" one.

For information, this test case has been reduced from a file generated

by the Pythran compiler (<a class="moz-txt-link-freetext" href="https://github.com/serge-sans-paille/pythran">https://github.com/serge-sans-paille/pythran</a>).

If we take a look at the generated IR without vectorization (using the

-fno-vectorize clang flag), we get:

</pre>

      <blockquote type="cite">

        <pre wrap="">$ clang -O2 -S -emit-llvm op_zip_iterator.cpp -std=c++11 -o - -fno-vectorize

</pre>

      </blockquote>

      <pre wrap="">

</pre>

      <blockquote type="cite">

        <pre wrap="">; Function Attrs: norecurse uwtable

define void @_Z11op_distancePi16add_zip_iteratorS0_(i32* nocapture, i32*, i32* nocapture readonly, i32*, i32* nocapture readnone) local_unnamed_addr #0 {

; This one is vectorized!

  %6 = ptrtoint i32* %1 to i64

  %7 = ptrtoint i32* %3 to i64

  %8 = sub i64 %7, %6

  %9 = icmp sgt i64 %8, 0

  br i1 %9, label %10, label %26

; <label>:10:                                     ; preds = %5

  %11 = lshr exact i64 %8, 2

  br label %12

; <label>:12:                                     ; preds = %12, %10

  %13 = phi i64 [ %23, %12 ], [ %11, %10 ]

  %14 = phi i32* [ %22, %12 ], [ %0, %10 ]

  %15 = phi i32* [ %21, %12 ], [ %2, %10 ]

  %16 = phi i32* [ %20, %12 ], [ %1, %10 ]

  %17 = load i32, i32* %16, align 4, !tbaa !1

  %18 = load i32, i32* %15, align 4, !tbaa !1

  %19 = add nsw i32 %18, %17

  store i32 %19, i32* %14, align 4, !tbaa !1

  %20 = getelementptr inbounds i32, i32* %16, i64 1

  %21 = getelementptr inbounds i32, i32* %15, i64 1

  %22 = getelementptr inbounds i32, i32* %14, i64 1

  %23 = add nsw i64 %13, -1

  %24 = icmp sgt i64 %13, 1

  br i1 %24, label %12, label %25

; <label>:25:                                     ; preds = %12

  br label %26

; <label>:26:                                     ; preds = %25, %5

  ret void

}

; Function Attrs: norecurse uwtable

define void @_Z2opPi16add_zip_iteratorS0_(i32* nocapture, i32*, i32* nocapture readonly, i32*, i32* nocapture readnone) local_unnamed_addr #0 {

; This one isn't!

  %6 = ptrtoint i32* %1 to i64

  %7 = ptrtoint i32* %3 to i64

  %8 = sub i64 %6, %7

  %9 = icmp sgt i64 %8, 0

  br i1 %9, label %10, label %28

; <label>:10:                                     ; preds = %5

  %11 = lshr exact i64 %8, 2

  br label %12

; <label>:12:                                     ; preds = %12, %10

  %13 = phi i64 [ %25, %12 ], [ %11, %10 ]

  %14 = phi i32* [ %24, %12 ], [ %0, %10 ]

  %15 = phi i32* [ %23, %12 ], [ %2, %10 ]

  %16 = phi i64 [ %22, %12 ], [ %6, %10 ]

  %17 = inttoptr i64 %16 to i32*

  %18 = load i32, i32* %17, align 4, !tbaa !1

  %19 = load i32, i32* %15, align 4, !tbaa !1

  %20 = add nsw i32 %19, %18

  store i32 %20, i32* %14, align 4, !tbaa !1

  %21 = getelementptr inbounds i32, i32* %17, i64 1

  %22 = ptrtoint i32* %21 to i64

  %23 = getelementptr inbounds i32, i32* %15, i64 1

  %24 = getelementptr inbounds i32, i32* %14, i64 1

  %25 = add nsw i64 %13, -1

  %26 = icmp sgt i64 %13, 1

  br i1 %26, label %12, label %27

; <label>:27:                                     ; preds = %12

  br label %28

; <label>:28:                                     ; preds = %27, %5

  ret void

}

</pre>

      </blockquote>

      <pre wrap="">

If we compile only the "op" function with debug output enabled,

here is the output:

</pre>

      <blockquote type="cite">

        <pre wrap="">$ clang -O2 -S -emit-llvm op_zip_iterator.cpp -std=c++11 -o - -fno-vectorize |~/dev/epona-llvm/build_debug_shared/bin/opt -debug -debug-only loop-vectorize -O2 -S

LV: Checking a loop in "_Z2opPi16add_zip_iteratorS0_" from <stdin>

LV: Loop hints: force=? width=0 unroll=0

LV: Found a loop: 

LV: Found an induction variable.

LV: Found an unidentified PHI.  %16 = phi i64 [ %22, %12 ], [ %6, %10 ]

LV: Can't vectorize the instructions or CFG

LV: Not vectorizing: Cannot prove legality.

[...]

</pre>

      </blockquote>

      <pre wrap="">

The issue seems to be that the phi node "%16" can't be deduced as an

induction variable. If we take a closer look, the cause seems to be in

ScalarEvolution, in the createSCEV function

(<a class="moz-txt-link-freetext" href="http://llvm.org/docs/doxygen/html/ScalarEvolution_8cpp_source.html#l04770">http://llvm.org/docs/doxygen/html/ScalarEvolution_8cpp_source.html#l04770</a>)

:

</pre>

      <blockquote type="cite">

        <pre wrap=""> // It's tempting to handle inttoptr and ptrtoint as no-ops, however this can

 // lead to pointer expressions which cannot safely be expanded to GEPs,

 // because ScalarEvolution doesn't respect the GEP aliasing rules when

 // simplifying integer expressions.

</pre>

      </blockquote>

      <pre wrap="">

Indeed, SCEV (legitimately) does not treat inttoptr/ptrtoint as

no-ops, and does not handle them. As a result, in our case, the GEP

in %21 is not analyzed by SCEV, and the PHI %16 is not

recognized as an induction variable.

To confirm this hypothesis, I created a small out-of-tree pass

(<a class="moz-txt-link-freetext" href="https://github.com/aguinet/llvm-intptrcleanup">https://github.com/aguinet/llvm-intptrcleanup</a>) which registers before

loop vectorization and does the following:

* first, it searches for PHI nodes that have these properties:

  - every incoming value of the phi node is a ptrtoint instruction. The

original pointer type of every ptrtoint instruction must be the same type T.

  - every user of this PHI node is an inttoptr instruction of the

previous type T

* for each of these PHI nodes, it creates a new PHI node which takes the

original pointers as incoming values, and replaces the uses of the

inttoptr instructions that use the original PHI node with the new one

* it then removes the previous inttoptr instructions and the original

PHI node

The way I understand inttoptr and ptrtoint, this transformation should

be valid (but I might have missed something!). Please note that this is

a quick'n'dirty pass, which hasn't been heavily tested. Using this pass,

the previous example is now vectorized correctly by the loop vectorizer.

This can be seen by looking at the output of:

</pre>

      <blockquote type="cite">

        <pre wrap="">$ clang -Xclang -load -Xclang IntToPtrCleanup.so -O2 ./example/op_zip_operator.cpp -S -emit-llvm -o - -std=c++11

</pre>

      </blockquote>
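      <p>A minimal sketch of the rewrite described above, on a toy
        dictionary-based IR rather than the LLVM API. The function name
        cleanup_int_phis, the tuple layout, and the ".ptr" suffix are
        assumptions made for illustration and are not taken from the linked
        pass. Like the pass itself, it assumes the transformation is
        valid.</p>

      <pre wrap="">
```python
# Toy IR: name -> (opcode, result_type, [operand names]). Hypothetical layout;
# every operand must itself be defined in the dictionary.
IR = {
    "%1": ("arg", "i32*", []),
    "%6": ("ptrtoint", "i64", ["%1"]),
    "%16": ("phi", "i64", ["%22", "%6"]),
    "%17": ("inttoptr", "i32*", ["%16"]),
    "%18": ("load", "i32", ["%17"]),
    "%21": ("gep", "i32*", ["%17"]),
    "%22": ("ptrtoint", "i64", ["%21"]),
}


def cleanup_int_phis(ir):
    """Turn integer PHIs fed by ptrtoint and used by inttoptr into pointer PHIs.

    Returns a new dictionary; the input is left untouched.
    """
    out = dict(ir)
    for name, (op, _ty, ops) in ir.items():
        if op != "phi":
            continue
        # Property 1: every incoming value is a ptrtoint ...
        if not all(ir[o][0] == "ptrtoint" for o in ops):
            continue
        # ... and all the original pointers share one type T.
        sources = [ir[o][2][0] for o in ops]
        ptr_types = {ir[s][1] for s in sources}
        if len(ptr_types) != 1:
            continue
        (ptr_ty,) = ptr_types
        # Property 2: every user of the PHI is an inttoptr back to T.
        users = [u for u, inst in ir.items() if name in inst[2]]
        if not users or not all(
            ir[u][0] == "inttoptr" and ir[u][1] == ptr_ty for u in users
        ):
            continue
        # New PHI over the original pointers.
        new_phi = name + ".ptr"
        out[new_phi] = ("phi", ptr_ty, sources)
        # Remove the old inttoptr instructions and the original PHI.
        for dead in users + [name]:
            del out[dead]
        # Redirect every use of a removed inttoptr to the new PHI.
        for v, (vop, vty, vops) in list(out.items()):
            out[v] = (vop, vty, [new_phi if o in users else o for o in vops])
    return out


new_ir = cleanup_int_phis(IR)
for name, inst in new_ir.items():
    print(name, inst)
```
</pre>

      <p>In this toy run the load and the GEP end up using a pointer-typed
        PHI directly. The now-unused ptrtoint in %22 is left in place for a
        later dead-code cleanup.</p>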

      <pre wrap="">

The remaining question, for me, is how this should be properly fixed:

1) make SCEV support these no-op (in this case) inttoptr/ptrtoint

conversions

2) insert the above transformation at some point in the optimization

pipeline

3) clean the pass(es?) that somehow generated this case.

I have to admit I'm not really sure which option is best. 3) seems

to be the way to go but might require some tedious work, and does not

prevent the issue from coming back in the future. 2) seems to be a quick

patch that could be inserted in some "canonicalization" pass, provided it is

a valid transformation in the first place. I don't know SCEV well enough to

judge the difficulty/feasibility of 1).

This mail is thus to discuss this issue and how to fix this properly :)

Thanks everyone :)

</pre>


    </blockquote>

    <br>

    <pre class="moz-signature" cols="72">-- 

Hal Finkel

Lead, Compiler Technology and Programming Languages

Leadership Computing Facility

Argonne National Laboratory</pre>

  </body>

</html>