<html>

    <head>

      <base href="https://llvm.org/bugs/" />

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW --- - Performance degradation of ‘fft’ test from eembc.1.1 suite on x86 Avoton-1.7 due to [DAGCombine]-shift changes"

   href="https://llvm.org/bugs/show_bug.cgi?id=24373">24373</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>Performance degradation of ‘fft’ test from eembc.1.1 suite on x86 Avoton-1.7 due to [DAGCombine]-shift changes

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>libraries

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>trunk

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>Linux

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>normal

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>Scalar Optimizations

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>sergey.k.okunev@gmail.com

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>benny.kra@gmail.com, david.l.kreitzer@intel.com, denis.briltz@intel.com, llvm-bugs@lists.llvm.org, michael.m.kuperstein@intel.com, sergey.k.okunev@gmail.com, sergos.gnu@gmail.com, zia.ansari@intel.com

          </td>

        </tr>

        <tr>

          <th>Classification</th>

          <td>Unclassified

          </td>

        </tr></table>

      <p>

        <div>

        <pre>Created <span class=""><a href="attachment.cgi?id=14700" name="attach_14700" title="test">attachment 14700</a></span>

test

The performance degradation of the eembc.1.1/fft00 test is caused by commit

rev. 240787, which has the following commit message.

commit 3791d56da63baf5072fa6ecaa872ace6adbc6892

Author: Benjamin Kramer <<a href="mailto:benny.kra@googlemail.com">benny.kra@googlemail.com</a>>

Date:   Fri Jun 26 14:51:36 2015 +0000

    [DAGCombine] fold (X >>?,exact C1) << C2 --> X << (C2-C1)

    Instcombine also does this but many opportunities only become visible

    after GEPs are lowered.

    git-svn-id: <a href="https://llvm.org/svn/llvm-project/llvm/trunk@240787">https://llvm.org/svn/llvm-project/llvm/trunk@240787</a> 91177308-0d34-0410-b5e6-96231b3b80d8

The degradation is in the benchmark's hottest inner loop and occurs in the

load address calculations. The IR dumps before the ‘Expand ISel

Pseudo-instructions’ phase are identical for both revisions and look as follows.

.lr.ph10:                                         ; preds = %33, %.lr.ph10

  %46 = phi i32 [ %sext21, %.lr.ph10 ], [ %44, %33 ]

  %47 = add nsw i32 %46, %29                                   !!

  %sext4 = shl i32 %47, 16                                     !!

  %48 = ashr exact i32 %sext4, 16                              !!

  %49 = getelementptr inbounds [256 x i16], [256 x i16]* %RealBitRevData, i32 0, i32 %48

  %50 = load i16, i16* %49, align 2, !tbaa !2 

  %51 = sext i16 %50 to i32 

  %52 = mul nsw i32 %51, %38 

  %53 = getelementptr inbounds [256 x i16], [256 x i16]* %ImagBitRevData, i32 0, i32 %48

  %54 = load i16, i16* %53, align 2, !tbaa !2

  %55 = sext i16 %54 to i32

…………………. 

After the ‘Expand ISel Pseudo-instructions’ phase, the shifts are replaced by a

single ‘movswl’ instruction in rev. 240786, whereas in rev. 240787 they remain

in the code as a shift pair (SHL by 16, SAR by 15), which leads to the

degradation. The corresponding dump fragments for the loads in question are

the following.

rev. 240786:

----------------

BB#10: derived from LLVM BB %.lr.ph10  

    Predecessors according to CFG: BB#9 BB#10 

        %vreg17<def> = PHI %vreg16, <BB#9>, %vreg18, <BB#10>; GR32_NOSP:%vreg17 GR32:%vreg16,%vreg18

        %vreg62<def,tied1> = ADD32rr %vreg17<tied0>, %vreg9, %EFLAGS<imp-def,dead>; GR32:%vreg62,%vreg9 GR32_NOSP:%vreg17

        %vreg63<def> = COPY %vreg62:sub_16bit; GR16:%vreg63 GR32:%vreg62

        %vreg64<def> = MOVSX32rr16 %vreg63<kill>; GR32_NOSP:%vreg64 GR16:%vreg63   !! movswl

        %vreg65<def> = MOVSX32rm16 <fi#0>, 2, %vreg64, 0, %noreg; mem:LD2[%49](tbaa=<0x5761c08>) GR32:%vreg65 GR32_NOSP:%vreg64

        %vreg66<def,tied1> = IMUL32rr %vreg65<tied0>, %vreg13, %EFLAGS<imp-def,dead>; GR32:%vreg66,%vreg65,%vreg13

        %vreg67<def> = MOVSX32rm16 <fi#1>, 2, %vreg64, 0, %noreg; mem:LD2[%53](tbaa=<0x5761c08>) GR32:%vreg67 GR32_NOSP:%vreg64

…………………………….

vs.

rev. 240787:

-----------

BB#10: derived from LLVM BB %.lr.ph10

    Predecessors according to CFG: BB#9 BB#10

        %vreg17<def> = PHI %vreg16, <BB#9>, %vreg18, <BB#10>; GR32_NOSP:%vreg17 GR32:%vreg16,%vreg18

        %vreg62<def,tied1> = ADD32rr %vreg17<tied0>, %vreg9, %EFLAGS<imp-def,dead>; GR32:%vreg62,%vreg9 GR32_NOSP:%vreg17

        %vreg63<def,tied1> = SHL32ri %vreg62<tied0>, 16, %EFLAGS<imp-def,dead>; GR32:%vreg63,%vreg62   !!

        %vreg64<def,tied1> = SAR32ri %vreg63<tied0>, 15, %EFLAGS<imp-def,dead>; GR32_NOSP:%vreg64 GR32:%vreg63   !!

        %vreg65<def> = MOVSX32rm16 <fi#0>, 1, %vreg64, 0, %noreg; mem:LD2[%47] GR32:%vreg65 GR32_NOSP:%vreg64

        %vreg66<def,tied1> = IMUL32rr %vreg65<tied0>, %vreg13, %EFLAGS<imp-def,dead>; GR32:%vreg66,%vreg65,%vreg13

        %vreg67<def> = MOVSX32rm16 <fi#1>, 1, %vreg64, 0, %noreg; mem:LD2[%51] GR32:%vreg67 GR32_NOSP:%vreg64

………………………………………………..
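
For illustration only (this is not part of the attachment, and the function
names are hypothetical), the following minimal Python sketch models the two
address computations above on 32-bit values. It assumes the index is the low
16 bits of the ADD32rr result, sign-extended, and that the loads differ only in
the scale factor (2 in rev. 240786, 1 in rev. 240787). Under these assumptions
both sequences produce the same byte offset, so the revisions differ in the
instructions executed, not in the address that is loaded.

```python
def to_signed32(value):
    """Interpret the low 32 bits of value as a two's-complement i32."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def offset_rev_240786(x):
    """Model movswl (sign-extend low 16 bits), then a load with scale 2."""
    index = to_signed32((x & 0xFFFF) << 16) >> 16  # sign extension via shifts
    return index * 2


def offset_rev_240787(x):
    """Model SHL32ri 16 followed by SAR32ri 15, then a load with scale 1."""
    shifted = to_signed32(x << 16)  # logical left shift, truncated to 32 bits
    return shifted >> 15            # Python's >> on ints is arithmetic


if __name__ == "__main__":
    for x in (0, 1, 0x7FFF, 0x8000, 0xFFFF, 0x12345, 0xFFFFFFFF):
        print(hex(x), offset_rev_240786(x), offset_rev_240787(x))
```

The shift pair folds the GEP's multiply-by-2 into the arithmetic shift
(16 - 1 = 15), which is what prevents the shl/ashr pair from being matched as
a single sign extension.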

The test fft00.ll and the IR dumps for both revisions are in the attachment.

The command line to reproduce is the following.

clang -m32 -fPIE -fuse-ld=gold -O2 -ffast-math -mfpmath=sse -march=slm -mllvm -print-after-all fft00.ll

Okunev Sergey,

Software Engineer

Intel Compiler Team</pre>

        </div>

      </p>


    </body>

</html>