[llvm-bugs] [Bug 24373] New: Performance degradation of ‘fft’ test from eembc.1.1 suite on x86 Avoton-1.7 due to [DAGCombine]-shift changes

Thu Aug 6 03:39:38 PDT 2015

https://llvm.org/bugs/show_bug.cgi?id=24373

            Bug ID: 24373
           Summary: Performance degradation of ‘fft’ test from eembc.1.1
                    suite on x86 Avoton-1.7  due to [DAGCombine]-shift
                    changes
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P
         Component: Scalar Optimizations
          Assignee: unassignedbugs at nondot.org
          Reporter: sergey.k.okunev at gmail.com
                CC: benny.kra at gmail.com, david.l.kreitzer at intel.com,
                    denis.briltz at intel.com, llvm-bugs at lists.llvm.org,
                    michael.m.kuperstein at intel.com,
                    sergey.k.okunev at gmail.com, sergos.gnu at gmail.com,
                    zia.ansari at intel.com
    Classification: Unclassified

Created attachment 14700
  --> https://llvm.org/bugs/attachment.cgi?id=14700&action=edit
test

The performance degradation of the eembc.1.1/fft00 test was introduced by
rev. 240787, whose commit message follows.

commit 3791d56da63baf5072fa6ecaa872ace6adbc6892
Author: Benjamin Kramer <benny.kra at googlemail.com>
Date:   Fri Jun 26 14:51:36 2015 +0000

    [DAGCombine] fold (X >>?,exact C1) << C2 --> X << (C2-C1)

    Instcombine also does this but many opportunities only become visible
    after GEPs are lowered.

    git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@240787 91177308-0d34-0410-b5e6-96231b3b80d8
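
To make the fold concrete, here is a small Python model of the i32 arithmetic (not LLVM code; the helper names `to_i32`, `shl`, `ashr`, `unfolded` and `folded` are hypothetical). The commit title only states the C2 >= C1 case, X << (C2-C1). The C1 > C2 branch, which yields X >> (C1-C2), is an assumption here, chosen because it matches the SAR32ri 15 that appears in the rev. 240787 dump.

```python
MASK32 = 0xFFFFFFFF


def to_i32(v: int) -> int:
    """Wrap a Python int to a signed 32-bit (two's complement) value."""
    v &= MASK32
    return v - (1 << 32) if v & 0x80000000 else v


def shl(x: int, c: int) -> int:
    """i32 'shl': bits shifted past bit 31 are discarded."""
    return to_i32(x << c)


def ashr(x: int, c: int) -> int:
    """i32 'ashr': Python's >> on ints is already an arithmetic (flooring) shift."""
    return to_i32(x) >> c


def unfolded(x: int, c1: int, c2: int) -> int:
    """(X >>exact C1) << C2, the pattern before the DAGCombine fold."""
    # 'exact' promises that the bits shifted out by the ashr are all zero.
    assert x & ((1 << c1) - 1) == 0, "exact requires the low C1 bits of X to be zero"
    return shl(ashr(x, c1), c2)


def folded(x: int, c1: int, c2: int) -> int:
    """Single-shift replacement; the C1 > C2 branch is assumed, not quoted from the commit."""
    if c2 >= c1:
        return shl(x, c2 - c1)
    return ashr(x, c1 - c2)


# Presumed fft00 instance: C1 = 16 from the IR, C2 = 1 from scaling an i16 index by 2.
x = shl(0x8001, 16)
print(unfolded(x, 16, 1), folded(x, 16, 1))  # both -65534
```

With C1 = 16 and C2 = 1 the fold turns a shl 16 / ashr 16 / shl 1 chain into shl 16 / ashr 15, which is the shape seen in the rev. 240787 machine code.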

The degradation occurs in the hottest inner loop of the benchmark, around the
load address calculations. The IR dumps before the ‘Expand ISel
Pseudo-instructions’ phase are identical for both revisions and look as follows:

.lr.ph10:                                         ; preds = %33, %.lr.ph10
  %46 = phi i32 [ %sext21, %.lr.ph10 ], [ %44, %33 ]
  %47 = add nsw i32 %46, %29                                   !!
  %sext4 = shl i32 %47, 16                                     !!
  %48 = ashr exact i32 %sext4, 16                              !!
  %49 = getelementptr inbounds [256 x i16], [256 x i16]* %RealBitRevData, i32 0, i32 %48
  %50 = load i16, i16* %49, align 2, !tbaa !2 
  %51 = sext i16 %50 to i32 
  %52 = mul nsw i32 %51, %38 
  %53 = getelementptr inbounds [256 x i16], [256 x i16]* %ImagBitRevData, i32 0, i32 %48
  %54 = load i16, i16* %53, align 2, !tbaa !2
  %55 = sext i16 %54 to i32
…………………. 
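
The `shl 16` / `ashr exact 16` pair on `%47` sign-extends the low 16 bits of the sum, and the `exact` flag holds because the `shl` has already cleared the 16 bits that the `ashr` discards. A minimal Python model (helper names are hypothetical) checks both facts. Reading the pair as a truncation to a 16-bit type and back is an assumption, since the eembc source is not included in this report.

```python
def to_i32(v: int) -> int:
    """Wrap a Python int to a signed 32-bit (two's complement) value."""
    v &= 0xFFFFFFFF
    return v - (1 << 32) if v & 0x80000000 else v


def sext16(v: int) -> int:
    """Sign-extend the low 16 bits of v to i32 (what movswl computes)."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v


def index_from_ir(sum32: int) -> int:
    """Model of: %sext4 = shl i32 %47, 16 ; %48 = ashr exact i32 %sext4, 16."""
    sext4 = to_i32(sum32 << 16)
    # The low 16 bits of %sext4 are zero, so the ashr drops no set bits ('exact').
    assert sext4 & 0xFFFF == 0
    return sext4 >> 16


def gep_byte_offset(sum32: int) -> int:
    """getelementptr on [256 x i16] scales the index by sizeof(i16) == 2."""
    return 2 * index_from_ir(sum32)


print(index_from_ir(0x18001), gep_byte_offset(0x18001))  # -32767 -65534
```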

After the ‘Expand ISel Pseudo-instructions’ phase, the shifts are replaced by a
‘movswl’ instruction in rev. 240786, whereas in rev. 240787 they remain in the
code untransformed, which causes the degradation. The corresponding machine IR
dump fragments for these loads are:

rev. 240786:
----------------
BB#10: derived from LLVM BB %.lr.ph10
    Predecessors according to CFG: BB#9 BB#10
        %vreg17<def> = PHI %vreg16, <BB#9>, %vreg18, <BB#10>; GR32_NOSP:%vreg17 GR32:%vreg16,%vreg18
        %vreg62<def,tied1> = ADD32rr %vreg17<tied0>, %vreg9, %EFLAGS<imp-def,dead>; GR32:%vreg62,%vreg9 GR32_NOSP:%vreg17
        %vreg63<def> = COPY %vreg62:sub_16bit; GR16:%vreg63 GR32:%vreg62
        %vreg64<def> = MOVSX32rr16 %vreg63<kill>; GR32_NOSP:%vreg64 GR16:%vreg63    !! movswl
        %vreg65<def> = MOVSX32rm16 <fi#0>, 2, %vreg64, 0, %noreg; mem:LD2[%49](tbaa=<0x5761c08>) GR32:%vreg65 GR32_NOSP:%vreg64
        %vreg66<def,tied1> = IMUL32rr %vreg65<tied0>, %vreg13, %EFLAGS<imp-def,dead>; GR32:%vreg66,%vreg65,%vreg13
        %vreg67<def> = MOVSX32rm16 <fi#1>, 2, %vreg64, 0, %noreg; mem:LD2[%53](tbaa=<0x5761c08>) GR32:%vreg67 GR32_NOSP:%vreg64
…………………………….

vs.

rev. 240787:
-----------
BB#10: derived from LLVM BB %.lr.ph10
    Predecessors according to CFG: BB#9 BB#10
        %vreg17<def> = PHI %vreg16, <BB#9>, %vreg18, <BB#10>; GR32_NOSP:%vreg17 GR32:%vreg16,%vreg18
        %vreg62<def,tied1> = ADD32rr %vreg17<tied0>, %vreg9, %EFLAGS<imp-def,dead>; GR32:%vreg62,%vreg9 GR32_NOSP:%vreg17
        %vreg63<def,tied1> = SHL32ri %vreg62<tied0>, 16, %EFLAGS<imp-def,dead>; GR32:%vreg63,%vreg62    !!
        %vreg64<def,tied1> = SAR32ri %vreg63<tied0>, 15, %EFLAGS<imp-def,dead>; GR32_NOSP:%vreg64 GR32:%vreg63    !!
        %vreg65<def> = MOVSX32rm16 <fi#0>, 1, %vreg64, 0, %noreg; mem:LD2[%47] GR32:%vreg65 GR32_NOSP:%vreg64
        %vreg66<def,tied1> = IMUL32rr %vreg65<tied0>, %vreg13, %EFLAGS<imp-def,dead>; GR32:%vreg66,%vreg65,%vreg13
        %vreg67<def> = MOVSX32rm16 <fi#1>, 1, %vreg64, 0, %noreg; mem:LD2[%51] GR32:%vreg67 GR32_NOSP:%vreg64
………………………………………………..
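
Both sequences compute the same byte offset; they differ only in how it is materialized. Rev. 240786 uses one MOVSX32rr16 (movswl) and lets the MOVSX32rm16 addressing mode apply the scale of 2, while rev. 240787 uses two dependent shifts (SHL32ri 16, SAR32ri 15) with scale 1. A Python model of the two lowerings (function names are hypothetical) checks that they agree:

```python
def to_i32(v: int) -> int:
    """Wrap a Python int to a signed 32-bit (two's complement) value."""
    v &= 0xFFFFFFFF
    return v - (1 << 32) if v & 0x80000000 else v


def offset_r240786(sum32: int) -> int:
    """COPY sub_16bit + MOVSX32rr16 (movswl), then scale 2 in MOVSX32rm16."""
    low16 = sum32 & 0xFFFF                                # %vreg63 = COPY %vreg62:sub_16bit
    idx = low16 - 0x10000 if low16 & 0x8000 else low16    # %vreg64 = MOVSX32rr16 %vreg63
    return 2 * idx                                        # <fi#0>/<fi#1>, 2, %vreg64, 0


def offset_r240787(sum32: int) -> int:
    """SHL32ri 16 then SAR32ri 15, then scale 1 in MOVSX32rm16."""
    t = to_i32(sum32 << 16)   # %vreg63 = SHL32ri %vreg62, 16
    idx = t >> 15             # %vreg64 = SAR32ri %vreg63, 15 (arithmetic shift)
    return 1 * idx            # <fi#0>/<fi#1>, 1, %vreg64, 0


for v in (0x0001, 0x7FFF, 0x8001, 0xFFFF, 0x18001):
    print(hex(v), offset_r240786(v), offset_r240787(v))
```

Because the offsets agree, the regression comes from instruction selection rather than from any change in semantics: the hot loop now executes two dependent shifts where it previously executed a single movswl.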

The test fft00.ll and the IR dumps for both revisions are attached. To
reproduce, run:

clang -m32 -fPIE -fuse-ld=gold -O2 -ffast-math -mfpmath=sse -march=slm -mllvm -print-after-all fft00.ll

Okunev Sergey,
Software Engineer
Intel Compiler Team
