[LLVMbugs] [Bug 23294] New: Performance degradation of eembc.1.1/idctrn01 test on x86 Avoton-1.7 due to adding of LICM pass after loop unrolling
bugzilla-daemon at llvm.org
Mon Apr 20 05:31:46 PDT 2015
https://llvm.org/bugs/show_bug.cgi?id=23294
Bug ID: 23294
Summary: Performance degradation of eembc.1.1/idctrn01 test on
x86 Avoton-1.7 due to adding of LICM pass after loop
unrolling
Product: libraries
Version: trunk
Hardware: PC
OS: Linux
Status: NEW
Severity: normal
Priority: P
Component: Loop Optimizer
Assignee: unassignedbugs at nondot.org
Reporter: sergey.k.okunev at gmail.com
CC: david.l.kreitzer at intel.com, denis.briltz at intel.com,
elena.demikhovsky at intel.com, llvmbugs at cs.uiuc.edu,
michael.m.kuperstein at intel.com, sergos.gnu at gmail.com,
zia.ansari at intel.com
Classification: Unclassified
Created attachment 14231
--> https://llvm.org/bugs/attachment.cgi?id=14231&action=edit
Initial ll-file of considered 't_run_test' function
Bisection shows that LLVM revision r232011 is responsible for the
degradation. The commit message is as follows.
commit a56999c5decca0023e5ce481fc08571e227e3aa3
Author: Kevin Qin <Kevin.Qin at arm.com>
Date: Thu Mar 12 05:36:01 2015 +0000
Reapply 'Run LICM pass after loop unrolling pass.'
It's firstly committed at r231630, and reverted at r231635.
Function pass InstructionSimplifier is inserted as barrier to
make sure loop unroll pass won't affect on LICM pass.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232011
91177308-0d34-0410-b5e6-96231b3b80d8
LLVM-clang options: -O2 -ffast-math -m32 -mfpmath=sse -march=slm -fPIE -pie
The eembc_1_1/idctrn01 test has several hot loops with 3 levels of nesting (8 x
8 x 8 iterations). Each of these loops was fully unrolled. With the loop
invariant code motion pass (LICM) now running after the loop unroll pass
(r232011), 16 loads with constant addresses are hoisted above the hot loop and
the 16 matching stores are sunk below it.
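The hoisting and sinking described above corresponds roughly to the following source-level rewrite. This is a minimal sketch in C, not the benchmark source: the function names, the flat array layout, the simplified multiply-accumulate, and the constants CHAINS, LANES and ITERS are assumptions chosen to mirror the 16 <4 x i32> values in this report and the 8 trips of the remaining loop seen in the asm listing.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed shape: 16 accumulators of 4 x i32 each, 8 loop iterations. */
enum { CHAINS = 16, LANES = 4, N = CHAINS * LANES, ITERS = 8 };

/* r232010 shape: each accumulator is loaded from and stored back to its
   fixed address on every iteration. */
static void accumulate_in_memory(int32_t *acc, const int32_t *coef,
                                 const int8_t *x)
{
    assert(acc && coef && x);
    for (int i = 0; i < ITERS; ++i)
        for (int k = 0; k < N; ++k)
            acc[k] += coef[k] * x[i];   /* load acc[k], add, store acc[k] */
}

/* r232011 shape: the accumulators are promoted to temporaries. This is
   only valid when coef and x are known not to alias acc. */
static void accumulate_promoted(int32_t *acc, const int32_t *coef,
                                const int8_t *x)
{
    int32_t tmp[N];
    assert(acc && coef && x);
    memcpy(tmp, acc, sizeof tmp);       /* stands in for the 16 hoisted loads */
    for (int i = 0; i < ITERS; ++i)
        for (int k = 0; k < N; ++k)
            tmp[k] += coef[k] * x[i];   /* accumulators live in temporaries */
    memcpy(acc, tmp, sizeof tmp);       /* stands in for the 16 sunk stores */
}
```

Both functions compute the same result; the only difference is where the accumulators live during the loop. The promoted form keeps 16 vector temporaries live across the loop, while 32-bit x86 code has only eight xmm registers.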
When machine-specific code is generated for the x86_32 AVT1.7 architecture,
these 16 loaded and stored values are passed through stack spill and fill
instructions because there are not enough xmm registers. As a result, r232011
generates additional loads and stores (fills and spills) inside, before and
after the loop. The number of loads and stores inside the loop is very close
for both revisions, so the 'additional' loads and stores before and after the
loop cause the regression.
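The point that the in-loop traffic barely changes while the extra traffic sits outside the loop can be checked with a rough count. The sketch below, in C, counts only memory operations on the 16 accumulator chains. It assumes that every promoted value is spilled, and it takes the trip count of 8 from the asm listing in this report (%ecx advances by 0x20 until it reaches 0x100). The struct and function names are hypothetical.

```c
#include <assert.h>

/* Memory operations on the accumulator chains, split by position. */
struct mem_ops {
    unsigned before_loop;
    unsigned in_loop;
    unsigned after_loop;
};

static unsigned total_ops(struct mem_ops m)
{
    return m.before_loop + m.in_loop + m.after_loop;
}

/* r232010: one load and one store per chain per iteration. */
static struct mem_ops ops_without_licm(unsigned chains, unsigned iters)
{
    struct mem_ops m = { 0, 2 * chains * iters, 0 };
    assert(chains > 0 && iters > 0);
    return m;
}

/* r232011, assuming all promoted values are spilled:
   before the loop: hoisted load + spill per chain,
   inside the loop: fill + spill per chain per iteration,
   after the loop:  fill + sunk store per chain. */
static struct mem_ops ops_with_licm_all_spilled(unsigned chains, unsigned iters)
{
    struct mem_ops m = { 2 * chains, 2 * chains * iters, 2 * chains };
    assert(chains > 0 && iters > 0);
    return m;
}
```

Under these assumptions, 16 chains and 8 iterations give 256 operations for r232010 and 320 for r232011: the same 256 inside the loop, plus 64 around it.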
In simplified IR, the changes enabled by this revision look as follows.
  +16 <4 x i32> loads -> virt_regs1         !! hoisted up loads with const. addr
  br label %l_loop
%l_loop:
  +16 <4 x i32> phi assignments
  calculations (virt_regs1 / virt_regs2)    !! loads and stores were replaced by virtual regs
  br %exitcond, label %l_exit, label %l_loop
%l_exit:
  +16 <4 x i32> phi assignments
  +16 virt_regs2 -> <4 x i32> stores        !! sunk stores with const. addr
The corresponding asm loop fragments, showing one load-store chain for each
revision, are as follows.
r232010:
-------
xor %ecx,%ecx
; start of loop1
l_f772f5a0:
add $0x20,%ecx
movsbl -0x38(%eax),%edx
movdqu 0x2d0(%ebx),%xmm6 !! load with const addr. is inside loop
movd %edx,%xmm0
pshufd $0x0,%xmm0,%xmm2
pmulld %xmm2,%xmm3
paddd %xmm3,%xmm6
movdqu %xmm6,0x2d0(%ebx) !! store with const. addr. is inside loop
; ... and 15 more blocks like that ...
add $0x1,%eax
cmp $0x100,%ecx
jne l_f772f5a0 <t_run_test+0x840>
; end of loop1
vs.
r232011:
-------
xor %ecx,%ecx
movdqu 0x2d0(%ebx),%xmm0 !! hoisted up load with const. addr
movdqa %xmm0,0xd0(%esp) !! spill-instr. before loop
; and 15 more movs like that
; start of loop1
l_f77c7680:
add $0x20,%ecx
movsbl -0x38(%eax),%edx
movdqa 0xd0(%esp),%xmm7 !! fill-instr. inside loop
movd %edx,%xmm2
pshufd $0x0,%xmm2,%xmm2
pmulld %xmm2,%xmm0
paddd %xmm0,%xmm7
movdqa %xmm7,0xd0(%esp) !! spill-instr. inside loop
; ... and 15 more blocks like that ...
add $0x1,%eax
cmp $0x100,%ecx
jne l_f77c7680 <t_run_test+0x920>
; end of loop1
movdqa 0xd0(%esp),%xmm0 !! fill-instr. after loop
movdqu %xmm0,0x2d0(%ebx) !! sunk stores with const. addr
; and 15 more movs like that
Okunev Sergey,
Software Engineer
Intel Compiler Team