<html>

    <head>

      <base href="https://llvm.org/bugs/" />

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW --- - Two equivalent values not folded in loop"

   href="https://llvm.org/bugs/show_bug.cgi?id=28006">28006</a>

          </td>

        </tr>


        <tr>

          <th>Summary</th>

          <td>Two equivalent values not folded in loop

          </td>

        </tr>


        <tr>

          <th>Product</th>

          <td>libraries

          </td>

        </tr>


        <tr>

          <th>Version</th>

          <td>trunk

          </td>

        </tr>


        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>


        <tr>

          <th>OS</th>

          <td>All

          </td>

        </tr>


        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>


        <tr>

          <th>Severity</th>

          <td>normal

          </td>

        </tr>


        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>


        <tr>

          <th>Component</th>

          <td>Scalar Optimizations

          </td>

        </tr>


        <tr>

          <th>Assignee</th>

          <td>unassignedbugs@nondot.org

          </td>

        </tr>


        <tr>

          <th>Reporter</th>

          <td>code@klickverbot.at

          </td>

        </tr>


        <tr>

          <th>CC</th>

          <td>llvm-bugs@lists.llvm.org

          </td>

        </tr>


        <tr>

          <th>Classification</th>

          <td>Unclassified

          </td>

        </tr></table>

      <p>

        <div>

        <pre>Consider the following function – a simple straight-line loop for selecting the

minimal element from a range of i32s –, which apart from loop unrolling is a

fixed point for the default optimizer pipeline (opt -O3

-disable-loop-unrolling/…) on current master:


---

target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"

target triple = "x86_64-apple-macosx"


; Function Attrs: noinline norecurse nounwind readonly uwtable

define i32 @foo({ i64, i32* } %r_arg, i32 %seedElement_arg) #0 {

  %1 = extractvalue { i64, i32* } %r_arg, 0

  %2 = extractvalue { i64, i32* } %r_arg, 1

  %3 = icmp eq i64 %1, 0

  br i1 %3, label %endfor, label %forbody


forbody:                                          ; preds = %0, %forbody

  %extremeElement.0 = phi i32 [ %.extremeElement.0, %forbody ], [

%seedElement_arg, %0 ]

  %extremeElementMapped.0 = phi i32 [ %.extremeElementMapped.0, %forbody ], [

%seedElement_arg, %0 ]

  %__key2831.02 = phi i64 [ %7, %forbody ], [ 0, %0 ]

  %4 = getelementptr i32, i32* %2, i64 %__key2831.02

  %5 = load i32, i32* %4, align 4

  %6 = icmp slt i32 %5, %extremeElementMapped.0

  %.extremeElement.0 = select i1 %6, i32 %5, i32 %extremeElement.0

  %.extremeElementMapped.0 = select i1 %6, i32 %5, i32 %extremeElementMapped.0

  %7 = add nuw i64 %__key2831.02, 1

  %exitcond = icmp eq i64 %7, %1

  br i1 %exitcond, label %endfor, label %forbody


endfor:                                           ; preds = %forbody, %0

  %extremeElement.1 = phi i32 [ %seedElement_arg, %0 ], [ %.extremeElement.0,

%forbody ]

  ret i32 %extremeElement.1

}


attributes #0 = { noinline norecurse nounwind readonly uwtable

"target-cpu"="haswell"

"target-features"="+sse2,+cx16,-tbm,-avx512ifma,-avx512dq,-fma4,-prfchw,+bmi2,-xsavec,+fsgsbase,+popcnt,+aes,-pcommit,-xsaves,-avx512er,-clwb,-avx512f,-pku,-smap,+mmx,-xop,-rdseed,-hle,-sse4a,-avx512bw,-clflushopt,+xsave,-avx512vl,+invpcid,-avx512cd,+avx,-rtm,+fma,+bmi,-mwaitx,+rdrnd,+sse4.1,+sse4.2,+avx2,+sse,+lzcnt,+pclmul,-prefetchwt1,+f16c,+ssse3,-sgx,+cmov,-avx512vbmi,+movbe,+xsaveopt,-sha,-adx,-avx512pf,+sse3"

}

---


For reasons that aren't immediately obvious to me, %extremeElement.0 and

%extremeElementMapped.0 are not folded together, even though they are trivially

equivalent. This isn't caught during instruction selection, either (note the

two cmov instructions, ~30% slower on an i7-4980HQ):


---

_foo:

    .cfi_startproc

    test    rdi, rdi

    je    LBB0_3

    mov    eax, edx

    .p2align    4, 0x90

LBB0_2:

    mov    ecx, dword ptr [rsi]

    cmp    ecx, eax

    cmovl    edx, ecx

    cmovle    eax, ecx

    add    rsi, 4

    add    rdi, -1

    jne    LBB0_2

LBB0_3:

    mov    eax, edx

    ret

    .cfi_endproc

---


I disabled loop unrolling only for clarity, as it just makes matters worse (all

the selects are duplicated). Crucially, this also causes the loop vectorizer

not to trigger.</pre>

        </div>

      </p>

      <hr>

      <span>You are receiving this mail because:</span>

      
      <ul>

          <li>You are on the CC list for the bug.</li>

      </ul>

    </body>

</html>