<html>

    <head>

      <base href="https://llvm.org/bugs/" />

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW --- - Performance degradations of tests from eembc.1.1 suite on x86 Avoton-1.7 due to ‘SCEVExpander’ changes"

   href="https://llvm.org/bugs/show_bug.cgi?id=23070">23070</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>Performance degradations of tests from eembc.1.1 suite on x86 Avoton-1.7 due to ‘SCEVExpander’ changes

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>libraries

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>trunk

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>Linux

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>normal

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>Scalar Optimizations

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>sergey.k.okunev@gmail.com

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>david.l.kreitzer@intel.com, denis.briltz@intel.com, elena.demikhovsky@intel.com, llvmbugs@cs.uiuc.edu, michael.m.kuperstein@intel.com, sanjoy@playingwithpointers.com, sergos.gnu@gmail.com, zia.ansari@intel.com

          </td>

        </tr>

        <tr>

          <th>Classification</th>

          <td>Unclassified

          </td>

        </tr></table>

      <p>

        <div>

        <pre>During our performance testing, regressions were detected on the autcor00,
aifftr01, and aiifft01 tests from the eembc.1.1 suite. Bisect analysis showed
that LLVM revision 231018 is responsible for these degradations. The commit
message is as follows.

commit caee94bbb4fb44971f594fe09fd61692dc4aa719
Author: Sanjoy Das <<a href="mailto:sanjoy@playingwithpointers.com">sanjoy@playingwithpointers.com</a>>
Date:   Mon Mar 2 21:41:07 2015 +0000

    Revert some changes that were made to fix PR20680.

    This re-lands change r230921.  r230921 was reverted because it broke a
    clang test; a checkin fixing the clang test will be commited shortly.

    Summary:

    As far as I can tell, the real bug causing the issue was fixed in
    r230533.  SCEVExpander should mark an increment operation as nuw or nsw
    only if it can *prove* that the operation does not overflow.  There
    shouldn't be any situation where we have to do something different
    because of no-wrap flags generated by SCEVExpander.

    Revert "IndVarSimplify: Allow LFTR to fire more often"

    This reverts commit 1ade0f0faa98877b688e0b9da58e876052c1e04e (SVN: 222213).

    Revert "IndVarSimplify: Don't let LFTR compare against a poison value"

    This reverts commit c0f2b8b528d8a37b0a1522aae90af649d6357eb5 (SVN: 217102).

    Reviewers: majnemer, atrick, spatel

    Differential Revision: <a href="http://reviews.llvm.org/D7979">http://reviews.llvm.org/D7979</a>

    git-svn-id: <a href="https://llvm.org/svn/llvm-project/llvm/trunk@231018">https://llvm.org/svn/llvm-project/llvm/trunk@231018</a> 91177308-0d34-0410-b5e6-96231b3b80d8
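
For readers less familiar with the no-wrap flags mentioned in the commit
message: on an LLVM 'add', 'nuw' states that the operation does not wrap when
the operands are read as unsigned integers, and 'nsw' states the same for
signed integers; if the stated property does not actually hold, the result is
a poison value. The minimal C sketch below spells out the two conditions for
32-bit operands. The helper names are hypothetical (they are not LLVM APIs),
and 32-bit int/unsigned int with 64-bit long long is assumed.

```c
/* Minimal sketch of what LLVM's no-wrap flags on a 32-bit 'add' assert.
   Hypothetical helper names (not LLVM APIs); assumes 32-bit int and
   unsigned int and 64-bit long long, as on the x86 targets discussed. */

/* nuw: a + b does not wrap when both are read as unsigned 32-bit values. */
int add_has_no_unsigned_wrap(unsigned int a, unsigned int b)
{
    return 0xFFFFFFFFu - a >= b;
}

/* nsw: a + b does not overflow when both are read as signed 32-bit values. */
int add_has_no_signed_wrap(int a, int b)
{
    long long sum = (long long)a + (long long)b;  /* cannot overflow */
    if (sum > 2147483647LL)
        return 0;
    return sum >= -2147483647LL - 1;
}
```

For instance, 'add i32 -1, 1' may carry nsw but not nuw. For an increment by
one, such as '%inc = add nuw nsw i32 %i.029, 1' in the IR dumps, the proof
obligation is that the incremented value is never 0xFFFFFFFF (for nuw) or
2147483647 (for nsw).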

The submitted changes prevent some subsequent loop optimizations from being
applied, which leads to additional instructions in the x86 loop code in the
degraded cases. Consider the following example from eembc.1.1, with fragments
of IR dumps and assembly code for the revisions before (r231017) and after
(r231018) the degradation.

Options: -O2 -ffast-math -m32 -mfpmath=sse -march=slm -fPIE -pie

1) eembc_1_1/autcor00

---------------------
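
For orientation, the hypothetical C sketch below reconstructs the shape of the
autcor00 loop nest from the IR and assembly fragments in this report. Only the
function name fxpAutoCorrelation appears in the assembly; the sketch's name,
parameter names, signature, the per-lag reset of the accumulator, and the
output array are assumptions, and the actual EEMBC source may differ. The
outer loop runs over lag, and the inner loop runs DataSize - lag times (cf.
%sub and %cmp428) while accumulating shifted products of InputData[i] and
InputData[i + lag].

```c
/* Hypothetical reconstruction of the autcor00 loop nest (cf. the function
   fxpAutoCorrelation in the listings).  Names, signature, the per-lag reset
   of the accumulator and the output array are assumptions.  Assumes '>>' on
   a negative int is an arithmetic shift, as with GCC/Clang on x86 (the
   listings use sar). */
void autocorrelation_sketch(const short *InputData, int DataSize,
                            int NumberOfLags, int Scale, int *Output)
{
    for (int lag = 0; NumberOfLags > lag; lag++) {
        int acc = 0;  /* the accumulator (%eax in the inner loop) */
        /* Inner trip count is DataSize - lag; cf. %sub and %cmp428. */
        for (int i = 0; DataSize - lag > i; i++)
            acc += (InputData[i] * InputData[i + lag]) >> Scale;
        Output[lag] = acc;
    }
}
```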

The test contains a nested loop region with an accumulator. In the r231018
case, the second pass of ‘Induction Variable Simplification’ no longer
rewrites the inner loop exit condition (no LFTR of the recurrence), so ‘Loop
Strength Reduction’ cannot transform the loop condition any further. The
corresponding loop IR dump fragments for the two versions are as follows.

r231017:

-------

*** IR Dump After Induction Variable Simplification ***
for.body:                               ; preds = %for.body.lr.ph, %for.end
  %indvars.iv = phi i32 [ %1, %for.body.lr.ph ], [ %indvars.iv.next, %for.end ]
  %lag.032 = phi i32 [ 0, %for.body.lr.ph ], [ %inc16, %for.end ]
  %sub = sub nsw i32 %conv2, %lag.032
  %cmp428 = icmp sgt i32 %sub, 0
  br i1 %cmp428, label %for.body6.lr.ph, label %for.end

for.end:                      ; preds = %for.cond3.for.end_crit_edge, %for.body
........
  %inc16 = add nuw nsw i32 %lag.032, 1
  %indvars.iv.next = add nsw i32 %indvars.iv, -1
  %exitcond34 = icmp ne i32 %lag.032, %3              !!
  br i1 %exitcond34, label %for.body, label %for.cond.for.end17_crit_edge

for.body6:                              ; preds = %for.body6.lr.ph, %for.body6
..........
  %inc = add nuw nsw i32 %i.029, 1
  %exitcond = icmp ne i32 %i.029, %indvars.iv        !! inner loop cond. is transformed
  br i1 %exitcond, label %for.body6, label %for.cond3.for.end_crit_edge

*** IR Dump After Loop Strength Reduction ***
for.body6:                          ; preds = %for.body6.preheader, %for.body6
  %lsr.iv35 = phi i16* [ %InputData, %for.body6.preheader ], [ %scevgep, %for.body6 ]
  %lsr.iv = phi i32 [ %indvars.iv.in, %for.body6.preheader ], [ %lsr.iv.next, %for.body6 ]
.........
  %lsr.iv.next = add i32 %lsr.iv, -1
  %scevgep = getelementptr i16, i16* %lsr.iv35, i32 1
  %exitcond = icmp eq i32 %lsr.iv.next, 0              !! further transformation
  br i1 %exitcond, label %for.end.loopexit, label %for.body6

vs.

r231018:

-------

*** IR Dump After Induction Variable Simplification ***
for.body:                               ; preds = %for.body.lr.ph, %for.end
  %indvars.iv = phi i32 [ %0, %for.body.lr.ph ], [ %indvars.iv.next, %for.end ]
  %lag.032 = phi i32 [ 0, %for.body.lr.ph ], [ %inc16, %for.end ]
  %sub = sub nsw i32 %conv2, %lag.032
  %cmp428 = icmp sgt i32 %sub, 0
  br i1 %cmp428, label %for.body6.lr.ph, label %for.end

for.end:                    ; preds = %for.cond3.for.end_crit_edge, %for.body
.......
  %inc16 = add nuw nsw i32 %lag.032, 1
  %indvars.iv.next = add nsw i32 %indvars.iv, -1
  %exitcond34 = icmp ne i32 %inc16, %1                !! inner loop cond. is not transformed
  br i1 %exitcond34, label %for.body, label %for.cond.for.end17_crit_edge

*** IR Dump After Loop Strength Reduction ***
for.body6:                          ; preds = %for.body6.preheader, %for.body6
  %lsr.iv = phi i16* [ %InputData, %for.body6.preheader ], [ %scevgep, %for.body6 ]
.........
  %inc = add nuw nsw i32 %i.029, 1
  %scevgep = getelementptr i16, i16* %lsr.iv, i32 1
  %exitcond = icmp eq i32 %indvars.iv, %inc            !!
  br i1 %exitcond, label %for.end.loopexit, label %for.body6
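
In C terms, the two inner-loop exit tests can be sketched as follows
(hypothetical function names and signature). In r231017, LFTR and LSR leave a
counter that runs down to zero, so the decrement itself sets the flags used by
the branch; in r231018, the induction variable counts up and is compared with
a separately held bound, which needs an explicit compare and keeps one more
value live in the loop. Both functions assume count > 0, as guaranteed by the
loop guard (%cmp428).

```c
/* Hypothetical sketches of the two inner-loop shapes; both compute the sum
   of (p[k] * p[k + lag]) >> scale for k = 0 .. count - 1 and assume
   count > 0 and an arithmetic '>>' on negative ints (GCC/Clang on x86). */

/* r231017 shape: a pointer walks the input and a counter counts down to 0. */
int inner_loop_count_down(const short *p, int lag, int count, int scale)
{
    int acc = 0;
    do {
        acc += (p[0] * p[lag]) >> scale;
        p++;
    } while (--count != 0);   /* roughly: add $0xffffffff,%ebp; jne */
    return acc;
}

/* r231018 shape: the induction variable counts up to a separate bound. */
int inner_loop_count_up(const short *p, int lag, int count, int scale)
{
    int acc = 0;
    int i = 0;
    do {
        acc += (p[0] * p[lag]) >> scale;
        p++;
        i++;                  /* roughly: add $0x1,%edi */
    } while (i != count);     /* roughly: cmp %edi,%ebx; jne */
    return acc;
}
```

On 32-bit x86, which has only seven general-purpose registers besides %esp,
keeping both i and its bound live plausibly explains why the r231018 loop has
to reload a value from the stack on every iteration (the 'mov 0x8(%esp),%ebp'
in its listing).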

As a result, the loop code produced by the version before the degradation
(r231017) is better: the inner loop has fewer instructions (8 instead of 10)
and one iteration is 1 clock shorter. The corresponding assembly code of the
inner loop is as follows.

r231017:

-------

0xf7723c00 26 2787 movswl (%edx),%edi           !!
0xf7723c03 27 228 movswl (%edx,%ebx,2),%esi    !!
0xf7723c07 28 2614 add    $0x2,%edx
0xf7723c0a 29 117 imul   %edi,%esi
0xf7723c0d 30 3498 sar    %cl,%esi
0xf7723c0f 31 3245 add    %esi,%eax
0xf7723c11 32 2636 add    $0xffffffff,%ebp     !!
0xf7723c14 33 136 jne    f7723c00 <fxpAutoCorrelation+0x50>

vs.

r231018:

-------

0xf7736c10 27 1816 mov    0x8(%esp),%ebp       !! additional fill (stack reload)
0xf7736c14 28 934 movswl (%edx),%esi
0xf7736c17 29 1874 add    $0x1,%edi            !! add is operand of 'cmp'
0xf7736c1a 30 879 movswl (%edx,%ebp,2),%ebp    !! + 1 clock in the loop
0xf7736c1e 31 1868 add    $0x2,%edx
0xf7736c21 32 924 imul   %esi,%ebp
0xf7736c24 33 2417 sar    %cl,%ebp
0xf7736c26 34 3499 add    %ebp,%eax
0xf7736c28 35 2084 cmp    %edi,%ebx            !! add -> cmp instead of --i
0xf7736c2a 36 917 jne    f7736c10 <fxpAutoCorrelation+0x60>

Okunev Sergey,

Software Engineer

Intel Compiler Team</pre>

        </div>

      </p>


    </body>

</html>