<html>
    <head>
      <base href="https://llvm.org/bugs/" />
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW --- - Performance degradation of ‘fft’ test from eembc.1.1 suite on x86 Avoton-1.7 due to [DAGCombine]-shift changes"
   href="https://llvm.org/bugs/show_bug.cgi?id=24373">24373</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>Performance degradation of ‘fft’ test from eembc.1.1 suite on x86 Avoton-1.7  due to [DAGCombine]-shift changes
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>libraries
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>Linux
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>normal
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Scalar Optimizations
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>sergey.k.okunev@gmail.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>benny.kra@gmail.com, david.l.kreitzer@intel.com, denis.briltz@intel.com, llvm-bugs@lists.llvm.org, michael.m.kuperstein@intel.com, sergey.k.okunev@gmail.com, sergos.gnu@gmail.com, zia.ansari@intel.com
          </td>
        </tr>

        <tr>
          <th>Classification</th>
          <td>Unclassified
          </td>
        </tr></table>
      <p>
        <div>
        <pre>Created <span class=""><a href="attachment.cgi?id=14700" name="attach_14700" title="test">attachment 14700</a> <a href="attachment.cgi?id=14700&action=edit" title="test">[details]</a></span>
test

The performance degradation of the eembc.1.1/fft00 test is caused by commit
rev. 240787, which has the following commit message.

commit 3791d56da63baf5072fa6ecaa872ace6adbc6892
Author: Benjamin Kramer <<a href="mailto:benny.kra@googlemail.com">benny.kra@googlemail.com</a>>
Date:   Fri Jun 26 14:51:36 2015 +0000

    [DAGCombine] fold (X >>?,exact C1) << C2 --> X << (C2-C1)

    Instcombine also does this but many opportunities only become visible
    after GEPs are lowered.

    git-svn-id: <a href="https://llvm.org/svn/llvm-project/llvm/trunk@240787">https://llvm.org/svn/llvm-project/llvm/trunk@240787</a> 91177308-0d34-0410-b5e6-96231b3b80d8

The degradation is in the benchmark's hottest inner loop and arises in the
load address calculations. The IR dumps before the ‘Expand ISel
Pseudo-instructions’ phase are the same for both revisions and look as follows.

.lr.ph10:                                         ; preds = %33, %.lr.ph10
  %46 = phi i32 [ %sext21, %.lr.ph10 ], [ %44, %33 ]
  %47 = add nsw i32 %46, %29                                   !!
  %sext4 = shl i32 %47, 16                                     !!
  %48 = ashr exact i32 %sext4, 16                              !!
  %49 = getelementptr inbounds [256 x i16], [256 x i16]* %RealBitRevData, i32 0, i32 %48
  %50 = load i16, i16* %49, align 2, !tbaa !2
  %51 = sext i16 %50 to i32
  %52 = mul nsw i32 %51, %38
  %53 = getelementptr inbounds [256 x i16], [256 x i16]* %ImagBitRevData, i32 0, i32 %48
  %54 = load i16, i16* %53, align 2, !tbaa !2
  %55 = sext i16 %54 to i32
…………………. 

After the ‘Expand ISel Pseudo-instructions’ phase, the shifts are replaced by
a single ‘movswl’ instruction in rev. 240786, whereas in rev. 240787 they
remain in the code as a SHL32ri/SAR32ri pair, which leads to the degradation.
The corresponding machine IR fragments for the loads in question are as follows.

rev. 240786:
----------------
BB#10: derived from LLVM BB %.lr.ph10  
    Predecessors according to CFG: BB#9 BB#10 
        %vreg17<def> = PHI %vreg16, <BB#9>, %vreg18, <BB#10>; GR32_NOSP:%vreg17 GR32:%vreg16,%vreg18
        %vreg62<def,tied1> = ADD32rr %vreg17<tied0>, %vreg9, %EFLAGS<imp-def,dead>; GR32:%vreg62,%vreg9 GR32_NOSP:%vreg17
        %vreg63<def> = COPY %vreg62:sub_16bit; GR16:%vreg63 GR32:%vreg62
        %vreg64<def> = MOVSX32rr16 %vreg63<kill>; GR32_NOSP:%vreg64 GR16:%vreg63    !! movswl
        %vreg65<def> = MOVSX32rm16 <fi#0>, 2, %vreg64, 0, %noreg; mem:LD2[%49](tbaa=<0x5761c08>) GR32:%vreg65 GR32_NOSP:%vreg64
        %vreg66<def,tied1> = IMUL32rr %vreg65<tied0>, %vreg13, %EFLAGS<imp-def,dead>; GR32:%vreg66,%vreg65,%vreg13
        %vreg67<def> = MOVSX32rm16 <fi#1>, 2, %vreg64, 0, %noreg; mem:LD2[%53](tbaa=<0x5761c08>) GR32:%vreg67 GR32_NOSP:%vreg64
…………………………….

vs.

rev. 240787:
-----------
BB#10: derived from LLVM BB %.lr.ph10
    Predecessors according to CFG: BB#9 BB#10
        %vreg17<def> = PHI %vreg16, <BB#9>, %vreg18, <BB#10>; GR32_NOSP:%vreg17 GR32:%vreg16,%vreg18
        %vreg62<def,tied1> = ADD32rr %vreg17<tied0>, %vreg9, %EFLAGS<imp-def,dead>; GR32:%vreg62,%vreg9 GR32_NOSP:%vreg17
        %vreg63<def,tied1> = SHL32ri %vreg62<tied0>, 16, %EFLAGS<imp-def,dead>; GR32:%vreg63,%vreg62    !!
        %vreg64<def,tied1> = SAR32ri %vreg63<tied0>, 15, %EFLAGS<imp-def,dead>; GR32_NOSP:%vreg64 GR32:%vreg63    !!
        %vreg65<def> = MOVSX32rm16 <fi#0>, 1, %vreg64, 0, %noreg; mem:LD2[%47] GR32:%vreg65 GR32_NOSP:%vreg64
        %vreg66<def,tied1> = IMUL32rr %vreg65<tied0>, %vreg13, %EFLAGS<imp-def,dead>; GR32:%vreg66,%vreg65,%vreg13
        %vreg67<def> = MOVSX32rm16 <fi#1>, 1, %vreg64, 0, %noreg; mem:LD2[%51] GR32:%vreg67 GR32_NOSP:%vreg64
………………………………………………..
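
For illustration only, here is a small Python sketch of the arithmetic behind
the two fragments. It is not taken from the attachment: the helper names
(to_i32, shl32, ashr32, offset_rev_240786, offset_rev_240787) are hypothetical,
and it assumes the only difference is that the scale-by-2 of the i16 index
has been folded into the shift pair (shl 16; sar 16; scale 2 becomes
shl 16; sar 15; scale 1). Both forms compute the same byte offset, so the
fold is value-preserving, but the second one no longer matches the
sign-extend pattern that is selected as ‘movswl’.

```python
MASK32 = 0xFFFFFFFF


def to_i32(v):
    """Wrap a Python int to a signed 32-bit two's-complement value."""
    v &= MASK32
    return v - (1 << 32) if v & (1 << 31) else v


def shl32(x, c):
    """32-bit shift left, like 'shl i32'."""
    return to_i32(x << c)


def ashr32(x, c):
    """32-bit arithmetic shift right, like 'ashr i32'."""
    # Python's >> on a negative int is already an arithmetic shift.
    return to_i32(x) >> c


def offset_rev_240786(x):
    """shl 16; ashr exact 16 (sign-extend low 16 bits), then scale by 2."""
    index = ashr32(shl32(x, 16), 16)
    return to_i32(index * 2)


def offset_rev_240787(x):
    """After the fold: shl 16; ashr 15, then scale by 1."""
    doubled_index = ashr32(shl32(x, 16), 15)
    return to_i32(doubled_index * 1)


if __name__ == "__main__":
    for x in (0, 1, 0x7FFF, 0x8000, 0xFFFF, 0x12345, -1):
        print(x, offset_rev_240786(x), offset_rev_240787(x))
```

The two functions agree for every input, which is consistent with the
degradation coming from instruction selection rather than from a change in
the computed address.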

The test fft00.ll and the IR dumps for both revisions are in the attachment.
The command line to reproduce the issue is the following.

clang -m32 -fPIE -fuse-ld=gold -O2 -ffast-math -mfpmath=sse -march=slm -mllvm -print-after-all fft00.ll


Okunev Sergey,
Software Engineer
Intel Compiler Team</pre>
        </div>
      </p>
    </body>
</html>