[LLVMbugs] [Bug 24348] New: [regalloc] A possible weakness of edge bundle based region splitting
bugzilla-daemon at llvm.org
Mon Aug 3 17:53:26 PDT 2015
https://llvm.org/bugs/show_bug.cgi?id=24348
Bug ID: 24348
Summary: [regalloc] A possible weakness of edge bundle based
region splitting
Product: libraries
Version: trunk
Hardware: PC
OS: All
Status: NEW
Severity: normal
Priority: P
Component: Register Allocator
Assignee: unassignedbugs at nondot.org
Reporter: wmi at google.com
CC: llvmbugs at cs.uiuc.edu
Classification: Unclassified
Created attachment 14687
--> https://llvm.org/bugs/attachment.cgi?id=14687&action=edit
testcase 1.cpp
The problem is found when analyzing https://llvm.org/bugs/show_bug.cgi?id=24278
Testcase 1.cpp is attached. The commandline is:
~/workarea/llvm-r243652/build/bin/clang++ -O2 -std=c++11
-fno-omit-frame-pointer -fexceptions -fno-tree-vectorize -S 1.cpp -o 1.s
One of the generated kernel loops contains two reloads of spilled values:
.LBB2_9: # %for.body.41
movq -72(%rbp), %rax # 8-byte Reload
movslq (%rax,%rdi,4), %rax
movslq %edi, %rbx
movswl (%r10,%rbx,2), %ebx
movswl (%r9,%rdi,2), %edx
leal (%rdx,%rdx,2), %ecx
leal (%rbx,%rcx,2), %ecx
movswl (%r8,%rdi,2), %ebx
addl %ebx, %ecx
addl %edx, %ebx
shll $2, %ebx
movl %ecx, (%r11,%rax,4)
addq %rsi, %rax
movl %ebx, (%r11,%rax,4)
incq %rdi
movq -80(%rbp), %rax # 8-byte Reload
cmpl %edi, %eax
jne .LBB2_9
However, the spills could be reduced if the live ranges of the VirtRegs were
split properly, because %r13 is never used directly in the loop, nor is it
assigned to any variable that is live through the loop.
From the debug trace, it seems that a weakness in the existing edge bundle
based region splitting algorithm blocks the proper region split. Here are the
findings:
The first spill was generated when selectOrSplit was called for vreg13. The dbg
trace is:
*** dbg trace ***
selectOrSplit GR64:%vreg13 [800r,3824B:0) 0@800r w=1.945538e+02
RS_Split Cascade 6
Analyze counted 2 instrs in 2 blocks, through 14 blocks.
Compact region bundles, v=9, none.
Cost of isolating all blocks = 665763.1429
%RAX no positive bundles
%RCX static = 665762.5238, v=4 no bundles.
%RDX static = 665762.5238, v=4 no bundles.
%RSI static = 1331525.048 worse than no bundles
%RDI static = 1331525.048 worse than no bundles
%R8 static = 1331525.048 worse than no bundles
%R9 static = 1331525.048 worse than no bundles
%R10 static = 1331525.048 worse than no bundles
%R11 static = 1331525.048 worse than no bundles
%RBX no positive bundles
%R14 no positive bundles
%R15 no positive bundles
%R12 no positive bundles
%R13 static = 0.619047619, v=6 no bundles.
Inline spilling GR64:%vreg13 [800r,3824B:0) 0@800r
From original %vreg13
Merged spilled regs: SS#9 [800r,3824B:0) 0@x
spillAroundUses %vreg13
rewrite: 800r %vreg178<def> = LEA64r %vreg6, 4, %vreg58, 4, %noreg;
GR64:%vreg178,%vreg6 GR64_NOSP:%vreg58
spill: 808r MOV64mr <fi#9>, 1, %noreg, 0, %noreg, %vreg178<kill>;
mem:ST8[FixedStack9] GR64:%vreg178
reload: 3208r %vreg179<def> = MOV64rm <fi#9>, 1, %noreg, 0, %noreg;
mem:LD8[FixedStack9] GR64:%vreg179
rewrite: 3232r %vreg106<def> = MOVSX64rm32 %vreg179<kill>, 4,
%vreg145, 0, %noreg; mem:LD4[%scevgep169](tbaa=!3) GR64_NOSP:%vreg106,%vreg145
GR64:%vreg179
******************
The simplified CFG is:
           BB1
            |
           BB2
          /   \
       BB5     ...
       ...      |
        |       |
   -->BB14     BB8
   |   |  \   /   \
   ----    \ /    BB9
            \ /  /
           BB15
vreg13 is defined in BB1 and used in BB14, so its live range covers almost the
whole function. Before region splitting is tried for vreg13, %R13's interference
is [320r,1040r)[1040r,1136r)[1184r,1488B)[1584r,2112r)[3696r,3792r). Note that
BB14 spans [3168B,3632r), so vreg13 does not interfere with %R13 inside the
loop of BB14. vreg13 does interfere with %R13 from the entry of BB8 to the
middle of BB9.
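The overlap claim can be checked mechanically. The following is only a minimal
sketch, not LLVM code: the helper names are made up for illustration, and the
only LLVM fact it relies on is that slots within one index are ordered B, e, r,
d. The segments are copied from the numbers above.

```python
# Slot kinds within one instruction index, in SlotIndex order:
# B (block), e (early-clobber), r (register), d (dead).
SLOT_ORDER = {"B": 0, "e": 1, "r": 2, "d": 3}


def parse_slot(text):
    """Turn a printed slot index such as '3168B' into a sortable tuple."""
    text = text.strip()
    return (int(text[:-1]), SLOT_ORDER[text[-1]])


def parse_segment(text):
    """Parse a half-open segment such as '[320r,1040r)'."""
    start, end = text.strip("[)").split(",")
    return (parse_slot(start), parse_slot(end))


def overlaps(a, b):
    """Half-open intervals overlap when each starts before the other ends."""
    return a[0] < b[1] and b[0] < a[1]


# Values copied from the report.
R13_INTERFERENCE = [
    parse_segment(s)
    for s in ["[320r,1040r)", "[1040r,1136r)", "[1184r,1488B)",
              "[1584r,2112r)", "[3696r,3792r)"]
]
BB14 = parse_segment("[3168B,3632r)")
VREG13 = parse_segment("[800r,3824B)")


def interfering_segments(rng):
    """Return the %R13 interference segments that overlap the given range."""
    return [seg for seg in R13_INTERFERENCE if overlaps(seg, rng)]


print(len(interfering_segments(BB14)))    # segments overlapping the loop block
print(len(interfering_segments(VREG13)))  # segments overlapping all of vreg13
```

No segment overlaps BB14, while every segment overlaps the full live range of
vreg13, which matches the observation that %R13 is free inside the loop but not
across the whole function.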
From the dbg trace above, vreg13 cannot use %R13 as a split candidate because
no edge bundle node is positive for candidate %R13 after the Hopfield network
iterations. This is because the entry and exit of BB14, the entry of BB15, the
exit of BB8 and the exit of BB9 all belong to the same edge bundle. In
RAGreedy::addSplitConstraints, the entry and exit of BB14 are marked PrefReg
and contribute a very strong BiasP to the associated edge bundle node. However,
this is still not enough to make the edge bundle positive: vreg13 interferes
with %R13 from the entry of BB8 to the middle of BB9, so the constraint on
BB8's exit is marked MustSpill, and the related edge bundle node is then marked
Negative directly in RAGreedy::addThroughConstraints. That is why the split
never happens, even though splitting vreg13's live range at the boundary of
the BB14 loop with %R13 as the split candidate is plausible and beneficial.
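To make the bundle sharing concrete, here is a rough sketch in Python rather
than the actual LLVM implementation. It assumes that every CFG edge joins the
exit of its source block with the entry of its destination block, and it uses
only the edges visible in the simplified CFG above. The node_sign function is a
toy model of the static bias only: the frequency is hypothetical, and treating
MustSpill as an unbounded negative bias is a simplifying assumption.

```python
def edge_bundles(edges):
    """Group block entries/exits into bundles with a small union-find.

    Each CFG edge src -> dst joins ("out", src) with ("in", dst).
    Returns a function mapping a node to its bundle representative.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for src, dst in edges:
        a, b = find(("out", src)), find(("in", dst))
        if a != b:
            parent[a] = b
    return find


# Only the edges visible in the simplified CFG; the "..." parts are omitted.
EDGES = [("BB14", "BB14"), ("BB14", "BB15"),
         ("BB8", "BB9"), ("BB8", "BB15"), ("BB9", "BB15")]

bundle_of = edge_bundles(EDGES)
shared = bundle_of(("in", "BB15"))
for node in [("in", "BB14"), ("out", "BB14"), ("out", "BB8"), ("out", "BB9")]:
    print(node, bundle_of(node) == shared)


def node_sign(constraints):
    """Toy model: sign of a bundle node from its static biases alone."""
    bias_p = bias_n = 0.0
    for kind, freq in constraints:
        if kind == "PrefReg":
            bias_p += freq
        elif kind == "PrefSpill":
            bias_n += freq
        elif kind == "MustSpill":
            bias_n = float("inf")  # assumption: saturates the negative bias
    return -1 if bias_n > bias_p else (1 if bias_p > bias_n else 0)


LOOP_FREQ = 1000.0  # hypothetical block frequency of BB14
print(node_sign([("PrefReg", LOOP_FREQ), ("PrefReg", LOOP_FREQ),
                 ("MustSpill", 1.0)]))
```

In this model a single MustSpill constraint on the shared bundle outweighs any
amount of PrefReg bias from the loop, which is the behaviour seen in the trace.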
This looks like a general problem with using edge bundles. I am not sure
whether it is already known, and I have no idea how to fix it yet. Adding an
empty block after the loop of BB14, so that the exit of BB14 gets a different
edge bundle node from the entry of BB15, seems a possible fix for this case,
but it may not be general enough.
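The effect of such an extra block on the bundles can be sketched as follows.
This is an illustration only, not a patch: "BB14x" is a hypothetical name for
the empty block on the loop exit edge, the edge list covers only the simplified
CFG above, and bundles are modelled as connected groups of block entries and
exits.

```python
def bundles(edges):
    """Return a list of sets; each set is one bundle of block entries/exits."""
    groups = []
    for src, dst in edges:
        merged = {("out", src), ("in", dst)}
        rest = []
        for group in groups:
            if group & merged:
                merged |= group  # the new edge connects to this bundle
            else:
                rest.append(group)
        groups = rest + [merged]
    return groups


def same_bundle(edges, a, b):
    """True when nodes a and b end up in the same bundle."""
    return any(a in group and b in group for group in bundles(edges))


ORIGINAL = [("BB14", "BB14"), ("BB14", "BB15"),
            ("BB8", "BB9"), ("BB8", "BB15"), ("BB9", "BB15")]

# "BB14x" is a hypothetical empty block inserted on the BB14 -> BB15 edge.
WITH_EXIT_BLOCK = [("BB14", "BB14"), ("BB14", "BB14x"), ("BB14x", "BB15"),
                   ("BB8", "BB9"), ("BB8", "BB15"), ("BB9", "BB15")]

loop_exit, bb8_exit = ("out", "BB14"), ("out", "BB8")
print(same_bundle(ORIGINAL, loop_exit, bb8_exit))
print(same_bundle(WITH_EXIT_BLOCK, loop_exit, bb8_exit))
```

With the extra block, the entry and exit of BB14 share a bundle only with the
entry of the new block, so the MustSpill constraint on BB8's exit no longer
lands on the loop's bundle in this model.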