[llvm-bugs] [Bug 24373] New: Performance degradation of ‘fft’ test from eembc.1.1 suite on x86 Avoton-1.7 due to [DAGCombine]-shift changes
via llvm-bugs
llvm-bugs at lists.llvm.org
Thu Aug 6 03:39:38 PDT 2015
https://llvm.org/bugs/show_bug.cgi?id=24373
Bug ID: 24373
Summary: Performance degradation of ‘fft’ test from eembc.1.1
suite on x86 Avoton-1.7 due to [DAGCombine]-shift
changes
Product: libraries
Version: trunk
Hardware: PC
OS: Linux
Status: NEW
Severity: normal
Priority: P
Component: Scalar Optimizations
Assignee: unassignedbugs at nondot.org
Reporter: sergey.k.okunev at gmail.com
CC: benny.kra at gmail.com, david.l.kreitzer at intel.com,
denis.briltz at intel.com, llvm-bugs at lists.llvm.org,
michael.m.kuperstein at intel.com,
sergey.k.okunev at gmail.com, sergos.gnu at gmail.com,
zia.ansari at intel.com
Classification: Unclassified
Created attachment 14700
--> https://llvm.org/bugs/attachment.cgi?id=14700&action=edit
test
The performance degradation of the eembc.1.1/fft00 test is caused by commit
rev. 240787, whose log message is:
commit 3791d56da63baf5072fa6ecaa872ace6adbc6892
Author: Benjamin Kramer <benny.kra at googlemail.com>
Date: Fri Jun 26 14:51:36 2015 +0000
[DAGCombine] fold (X >>?,exact C1) << C2 --> X << (C2-C1)
Instcombine also does this but many opportunities only become visible
after GEPs are lowered.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@240787 91177308-0d34-0410-b5e6-96231b3b80d8
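The fold named in the commit title can be sanity-checked on concrete values. The C sketch below is an illustration written for this report, not code from LLVM. It assumes 32-bit operands and C2 >= C1, reads the "?" in ">>?" as "either lshr or ashr", models lshr with unsigned >> and ashr with a hypothetical portable helper ashr32(), and only builds inputs whose low C1 bits are zero, so that the right shift is "exact".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: portable 32-bit arithmetic shift right (LLVM "ashr"),
   written on uint32_t so it does not depend on implementation-defined signed >>.
   Requires c < 32. */
static uint32_t ashr32(uint32_t x, unsigned c) {
    uint32_t r = x >> c;
    if (x & 0x80000000u)
        r |= ~(UINT32_MAX >> c);          /* replicate the sign bit */
    return r;
}

/* Checks (X >>exact C1) << C2 == X << (C2 - C1) for C2 >= C1, for both the
   lshr and the ashr form, on values whose low C1 bits are zero. */
static int fold_holds(unsigned c1, unsigned c2, uint32_t limit) {
    assert(c1 <= c2 && c2 < 32);
    for (uint32_t v = 0; v < limit; ++v) {
        uint32_t xs[2] = { v << c1, ~v << c1 };   /* sign bit clear and set */
        for (int i = 0; i < 2; ++i) {
            uint32_t x = xs[i];
            uint32_t folded = x << (c2 - c1);     /* X << (C2-C1) */
            if (((x >> c1) << c2) != folded)      /* lshr form */
                return 0;
            if ((ashr32(x, c1) << c2) != folded)  /* ashr form */
                return 0;
        }
    }
    return 1;
}
```

Because the low C1 bits are zero, shifting them out and back in loses nothing, so both forms reduce to a single left shift by C2-C1 modulo 2^32.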
The degradation occurs in the hottest inner loop of the benchmark, in the
address calculations for the loads. The IR dumps before the ‘Expand ISel
Pseudo-instructions’ pass are identical for both revisions and look as follows:
.lr.ph10: ; preds = %33, %.lr.ph10
%46 = phi i32 [ %sext21, %.lr.ph10 ], [ %44, %33 ]
%47 = add nsw i32 %46, %29          ; !!
%sext4 = shl i32 %47, 16            ; !!
%48 = ashr exact i32 %sext4, 16     ; !!
%49 = getelementptr inbounds [256 x i16], [256 x i16]* %RealBitRevData, i32 0, i32 %48
%50 = load i16, i16* %49, align 2, !tbaa !2
%51 = sext i16 %50 to i32
%52 = mul nsw i32 %51, %38
%53 = getelementptr inbounds [256 x i16], [256 x i16]* %ImagBitRevData, i32 0, i32 %48
%54 = load i16, i16* %53, align 2, !tbaa !2
%55 = sext i16 %54 to i32
………………….
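In the IR above, the shl 16 / ashr exact 16 pair on %47 is the sign extension of its low 16 bits, which is exactly what a single movswl (MOVSX32rr16) computes. A minimal C sketch of that equivalence, assuming 32-bit wrap-around arithmetic; the helper names are hypothetical and not taken from LLVM.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: portable 32-bit arithmetic shift right (c < 32). */
static uint32_t ashr32(uint32_t x, unsigned c) {
    uint32_t r = x >> c;
    if (x & 0x80000000u)
        r |= ~(UINT32_MAX >> c);          /* replicate the sign bit */
    return r;
}

/* %sext4 = shl i32 %47, 16 ; %48 = ashr exact i32 %sext4, 16 */
static uint32_t shl16_ashr16(uint32_t v) {
    return ashr32(v << 16, 16);
}

/* What movswl does: sign-extend the low 16 bits to 32 bits. */
static uint32_t sext_low16(uint32_t v) {
    uint32_t lo = v & 0xFFFFu;
    return (lo & 0x8000u) ? (lo | 0xFFFF0000u) : lo;
}

/* Compares both forms, also with garbage in the upper 16 bits of the input. */
static int matches_movswl(uint32_t limit) {
    for (uint32_t v = 0; v < limit; ++v) {
        uint32_t w = v ^ 0xABCD0000u;
        if (shl16_ashr16(v) != sext_low16(v) || shl16_ashr16(w) != sext_low16(w))
            return 0;
    }
    return 1;
}
```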
In the dumps after the ‘Expand ISel Pseudo-instructions’ pass, the shift pair
has been replaced by a single ‘movswl’ (MOVSX32rr16) in the rev. 240786 case,
whereas in the rev. 240787 case it remains in the code untransformed (SHL32ri 16
followed by SAR32ri 15, with the loads using scale 1 instead of 2), which causes
the degradation. The corresponding fragments of the machine-IR dumps for these
loads are:
rev. 240786:
----------------
BB#10: derived from LLVM BB %.lr.ph10
Predecessors according to CFG: BB#9 BB#10
%vreg17<def> = PHI %vreg16, <BB#9>, %vreg18, <BB#10>; GR32_NOSP:%vreg17 GR32:%vreg16,%vreg18
%vreg62<def,tied1> = ADD32rr %vreg17<tied0>, %vreg9, %EFLAGS<imp-def,dead>; GR32:%vreg62,%vreg9 GR32_NOSP:%vreg17
%vreg63<def> = COPY %vreg62:sub_16bit; GR16:%vreg63 GR32:%vreg62
%vreg64<def> = MOVSX32rr16 %vreg63<kill>; GR32_NOSP:%vreg64 GR16:%vreg63   !! movswl
%vreg65<def> = MOVSX32rm16 <fi#0>, 2, %vreg64, 0, %noreg; mem:LD2[%49](tbaa=<0x5761c08>) GR32:%vreg65 GR32_NOSP:%vreg64
%vreg66<def,tied1> = IMUL32rr %vreg65<tied0>, %vreg13, %EFLAGS<imp-def,dead>; GR32:%vreg66,%vreg65,%vreg13
%vreg67<def> = MOVSX32rm16 <fi#1>, 2, %vreg64, 0, %noreg; mem:LD2[%53](tbaa=<0x5761c08>) GR32:%vreg67 GR32_NOSP:%vreg64
…………………………….
vs.
rev. 240787:
-----------
BB#10: derived from LLVM BB %.lr.ph10
Predecessors according to CFG: BB#9 BB#10
%vreg17<def> = PHI %vreg16, <BB#9>, %vreg18, <BB#10>; GR32_NOSP:%vreg17 GR32:%vreg16,%vreg18
%vreg62<def,tied1> = ADD32rr %vreg17<tied0>, %vreg9, %EFLAGS<imp-def,dead>; GR32:%vreg62,%vreg9 GR32_NOSP:%vreg17
%vreg63<def,tied1> = SHL32ri %vreg62<tied0>, 16, %EFLAGS<imp-def,dead>; GR32:%vreg63,%vreg62   !!
%vreg64<def,tied1> = SAR32ri %vreg63<tied0>, 15, %EFLAGS<imp-def,dead>; GR32_NOSP:%vreg64 GR32:%vreg63   !!
%vreg65<def> = MOVSX32rm16 <fi#0>, 1, %vreg64, 0, %noreg; mem:LD2[%47] GR32:%vreg65 GR32_NOSP:%vreg64
%vreg66<def,tied1> = IMUL32rr %vreg65<tied0>, %vreg13, %EFLAGS<imp-def,dead>; GR32:%vreg66,%vreg65,%vreg13
%vreg67<def> = MOVSX32rm16 <fi#1>, 1, %vreg64, 0, %noreg; mem:LD2[%51] GR32:%vreg67 GR32_NOSP:%vreg64
………………………………………………..
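Both sequences compute the same byte offset into the i16 arrays; rev. 240787 simply spends two shifts where rev. 240786 uses one movswl plus the scale-2 addressing mode. The SAR by 15 looks like the GEP's implicit scaling by 2 (the size of i16) folded into the ashr exact 16, i.e. C1 = 16 and C2 = 1. The C sketch below models only the index arithmetic (the frame-index base is omitted) and assumes 32-bit wrap-around; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: portable 32-bit arithmetic shift right (c < 32). */
static uint32_t ashr32(uint32_t x, unsigned c) {
    uint32_t r = x >> c;
    if (x & 0x80000000u)
        r |= ~(UINT32_MAX >> c);          /* replicate the sign bit */
    return r;
}

/* rev. 240786: MOVSX32rr16 (movswl), then MOVSX32rm16 with scale 2. */
static uint32_t offset_r240786(uint32_t v) {
    uint32_t lo = v & 0xFFFFu;
    uint32_t idx = (lo & 0x8000u) ? (lo | 0xFFFF0000u) : lo;  /* movswl */
    return idx * 2u;                                            /* scale 2 */
}

/* rev. 240787: SHL32ri 16, SAR32ri 15, then MOVSX32rm16 with scale 1. */
static uint32_t offset_r240787(uint32_t v) {
    return ashr32(v << 16, 15);                                 /* scale 1 */
}

/* Checks that both revisions address the same element. */
static int offsets_agree(uint32_t limit) {
    for (uint32_t v = 0; v < limit; ++v)
        if (offset_r240786(v) != offset_r240787(v))
            return 0;
    return 1;
}
```

The results match, so the fold is correct; the regression comes from instruction selection no longer seeing the shl/ashr-by-16 pair that it would otherwise turn into movswl.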
The test (fft00.ll) and the IR dumps for both revisions are attached. To
reproduce, use the following command line:
clang -m32 -fPIE -fuse-ld=gold -O2 -ffast-math -mfpmath=sse -march=slm -mllvm -print-after-all fft00.ll
Okunev Sergey,
Software Engineer
Intel Compiler Team