[LLVMbugs] [Bug 24348] New: [regalloc] A possible weakness of edge bundle based region splitting

Mon Aug 3 17:53:26 PDT 2015

https://llvm.org/bugs/show_bug.cgi?id=24348

            Bug ID: 24348
           Summary: [regalloc] A possible weakness of edge bundle based
                    region splitting
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: All
            Status: NEW
          Severity: normal
          Priority: P
         Component: Register Allocator
          Assignee: unassignedbugs at nondot.org
          Reporter: wmi at google.com
                CC: llvmbugs at cs.uiuc.edu
    Classification: Unclassified

Created attachment 14687
  --> https://llvm.org/bugs/attachment.cgi?id=14687&action=edit
testcase 1.cpp

The problem was found while analyzing https://llvm.org/bugs/show_bug.cgi?id=24278

Testcase 1.cpp is attached. The command line is:
~/workarea/llvm-r243652/build/bin/clang++ -O2 -std=c++11
-fno-omit-frame-pointer -fexceptions -fno-tree-vectorize -S 1.cpp -o 1.s

One of the generated kernel loops contains two reloads of spilled values:
.LBB2_9:                                # %for.body.41
        movq    -72(%rbp), %rax         # 8-byte Reload
        movslq  (%rax,%rdi,4), %rax
        movslq  %edi, %rbx
        movswl  (%r10,%rbx,2), %ebx
        movswl  (%r9,%rdi,2), %edx
        leal    (%rdx,%rdx,2), %ecx
        leal    (%rbx,%rcx,2), %ecx
        movswl  (%r8,%rdi,2), %ebx
        addl    %ebx, %ecx
        addl    %edx, %ebx
        shll    $2, %ebx
        movl    %ecx, (%r11,%rax,4)
        addq    %rsi, %rax
        movl    %ebx, (%r11,%rax,4)
        incq    %rdi
        movq    -80(%rbp), %rax         # 8-byte Reload
        cmpl    %edi, %eax
        jne     .LBB2_9

However, the spills could be reduced if the live ranges of the virtual
registers were split properly, because %R13 is never used directly in the loop
and is never assigned to a variable that lives through the loop.

From the debug trace, the existing edge bundle based region splitting
algorithm appears to have a weakness that blocks the proper region split. Here
are the findings:

The first spill was generated when selectOrSplit was called for vreg13. The dbg
trace is:

*** dbg trace ***
selectOrSplit GR64:%vreg13 [800r,3824B:0)  0@800r w=1.945538e+02
RS_Split Cascade 6
Analyze counted 2 instrs in 2 blocks, through 14 blocks.
Compact region bundles, v=9, none.
Cost of isolating all blocks = 665763.1429
%RAX    no positive bundles
%RCX    static = 665762.5238, v=4 no bundles.
%RDX    static = 665762.5238, v=4 no bundles.
%RSI    static = 1331525.048 worse than no bundles
%RDI    static = 1331525.048 worse than no bundles
%R8     static = 1331525.048 worse than no bundles
%R9     static = 1331525.048 worse than no bundles
%R10    static = 1331525.048 worse than no bundles
%R11    static = 1331525.048 worse than no bundles
%RBX    no positive bundles
%R14    no positive bundles
%R15    no positive bundles
%R12    no positive bundles
%R13    static = 0.619047619, v=6 no bundles.
Inline spilling GR64:%vreg13 [800r,3824B:0)  0@800r
From original %vreg13
Merged spilled regs: SS#9 [800r,3824B:0)  0@x
spillAroundUses %vreg13
        rewrite: 800r   %vreg178<def> = LEA64r %vreg6, 4, %vreg58, 4, %noreg;
GR64:%vreg178,%vreg6 GR64_NOSP:%vreg58
        spill:   808r   MOV64mr <fi#9>, 1, %noreg, 0, %noreg, %vreg178<kill>;
mem:ST8[FixedStack9] GR64:%vreg178
        reload:   3208r %vreg179<def> = MOV64rm <fi#9>, 1, %noreg, 0, %noreg;
mem:LD8[FixedStack9] GR64:%vreg179
        rewrite: 3232r  %vreg106<def> = MOVSX64rm32 %vreg179<kill>, 4,
%vreg145, 0, %noreg; mem:LD4[%scevgep169](tbaa=!3) GR64_NOSP:%vreg106,%vreg145
GR64:%vreg179
******************

The simplified CFG is:
            BB1
             |
            BB2
           /   \
        BB5     ...
        ...       |
         |        |
    -->BB14      BB8
   |    | \      / \
    ----   \    /  BB9
            \  /  /
             BB15

vreg13 is defined in BB1 and used in BB14, so its live range covers almost the
whole function. Before region splitting is tried for vreg13, %R13's
interference is
[320r,1040r)[1040r, 1136r)[1184r,1488B)[1584r, 2112r)[3696r, 3792r). Note that
BB14 spans [3168B, 3632r), so vreg13 has no interference with %R13 inside the
BB14 loop. vreg13 does interfere with %R13 from the entry of BB8 to the middle
of BB9.

From the dbg trace above, vreg13 cannot use %R13 as a split candidate because
no edge bundle node is positive for candidate %R13 after the Hopfield network
iterations. This is because the entry and exit of BB14, the entry of BB15, the
exit of BB8 and the exit of BB9 are all bound to the same edge bundle. In
RAGreedy::addSplitConstraints, the entry and exit of BB14 have PrefReg
constraints, which give a very strong BiasP to the associated edge bundle
node. However, this is still not enough to make the edge bundle positive.
vreg13 interferes with %R13 from the entry of BB8 to the middle of BB9, so the
constraint on BB8's exit is marked as MustSpill, and the related edge bundle
node is then marked as negative directly in RAGreedy::addThroughConstraints.
That is why vreg13's live range is not split at the boundary of the BB14 loop
with %R13 as the split candidate, although such a split is plausible and
beneficial.
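
The effect described above can be illustrated with a toy model. This is only a
sketch under stated assumptions: the BundleNode type, the Border enum, the
rule that MustSpill saturates the negative bias, and the frequencies are all
made up for illustration, loosely following the BiasP / PrefReg / MustSpill
terms used in this report. It is not LLVM's spill placement code, and it
ignores the links between bundles.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical border constraint kinds, named after the terms in the report.
enum class Border { DontCare, PrefReg, PrefSpill, MustSpill };

constexpr uint64_t MaxBias = std::numeric_limits<uint64_t>::max();

// Saturating addition so that a large bias never wraps around.
uint64_t satAdd(uint64_t A, uint64_t B) {
  return A > MaxBias - B ? MaxBias : A + B;
}

// Toy edge bundle node that accumulates positive and negative bias.
struct BundleNode {
  uint64_t BiasP = 0; // weight in favour of keeping the value in a register
  uint64_t BiasN = 0; // weight in favour of spilling

  void addBias(uint64_t Freq, Border B) {
    switch (B) {
    case Border::PrefReg:   BiasP = satAdd(BiasP, Freq); break;
    case Border::PrefSpill: BiasN = satAdd(BiasN, Freq); break;
    // Assumption: a hard constraint pins the negative bias at the maximum,
    // regardless of how cold the constrained block is.
    case Border::MustSpill: BiasN = MaxBias; break;
    case Border::DontCare:  break;
    }
  }

  // +1: prefer register, -1: prefer spill, 0: undecided.
  int value() const {
    if (BiasP > BiasN) return 1;
    if (BiasN > BiasP) return -1;
    return 0;
  }
};

// Models the single bundle shared by BB14's entry and exit and BB8's exit.
int sharedBundleValue(bool Bb8ExitMustSpill) {
  BundleNode Node;
  const uint64_t LoopFreq = 1000000;       // assumed hot-loop frequency
  Node.addBias(LoopFreq, Border::PrefReg); // entry of BB14
  Node.addBias(LoopFreq, Border::PrefReg); // exit of BB14
  if (Bb8ExitMustSpill)
    Node.addBias(1, Border::MustSpill);    // exit of BB8, assumed cold
  return Node.value();
}
```

In this model, sharedBundleValue(true) is negative however large LoopFreq is,
while sharedBundleValue(false) is positive. A single hard constraint on a cold
block is enough to override the preference of the hot loop when both sit on
the same bundle.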

This looks like a general problem with using edge bundles. I am not sure
whether it is already known, and I have no idea how to fix it yet. Adding an
empty block after the BB14 loop, so that the exit of BB14 gets a different
edge bundle node from the entry of BB15, seems a possible fix for this case,
but it may not be general enough.
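
To make the bundle sharing and the effect of the extra block concrete, here is
a minimal sketch that groups block entries and exits with a union-find
structure. It assumes that the exit of a block is joined with the entry of
each of its successors. The Bundles class, the block constants and the Pad
block are hypothetical, and only the lower part of the simplified CFG is
modelled.

```cpp
#include <cassert>
#include <numeric>
#include <utility>
#include <vector>

// Toy edge bundle computation: slot 2*B is the entry of block B and slot
// 2*B+1 is its exit. Each CFG edge joins the exit of the source block with
// the entry of the destination block.
class Bundles {
  std::vector<unsigned> Parent;

  unsigned find(unsigned X) {
    while (Parent[X] != X) {
      Parent[X] = Parent[Parent[X]]; // path halving
      X = Parent[X];
    }
    return X;
  }

public:
  Bundles(unsigned NumBlocks,
          const std::vector<std::pair<unsigned, unsigned>> &Edges)
      : Parent(2 * NumBlocks) {
    std::iota(Parent.begin(), Parent.end(), 0u);
    for (const auto &E : Edges) {
      assert(E.first < NumBlocks && E.second < NumBlocks && "bad block id");
      unsigned ExitRoot = find(2 * E.first + 1);
      unsigned EntryRoot = find(2 * E.second);
      Parent[ExitRoot] = EntryRoot;
    }
  }

  unsigned entryOf(unsigned Block) { return find(2 * Block); }
  unsigned exitOf(unsigned Block) { return find(2 * Block + 1); }
};

// Dense ids for the blocks of interest; Pad is the proposed empty block.
constexpr unsigned BB8 = 0, BB9 = 1, BB14 = 2, BB15 = 3, Pad = 4;

// Lower part of the simplified CFG from the report.
Bundles originalCfg() {
  return Bundles(4, {{BB14, BB14}, {BB14, BB15}, {BB8, BB9},
                     {BB8, BB15},  {BB9, BB15}});
}

// Same CFG with an empty block inserted on the BB14 -> BB15 edge.
Bundles paddedCfg() {
  return Bundles(5, {{BB14, BB14}, {BB14, Pad}, {Pad, BB15},
                     {BB8, BB9},   {BB8, BB15}, {BB9, BB15}});
}
```

In originalCfg() the entry and exit of BB14, the entry of BB15 and the exits
of BB8 and BB9 all end up in one bundle. In paddedCfg() the entry and exit of
BB14 share a bundle only with the entry of Pad, so a hard constraint on BB8's
exit no longer lands on the bundle that the loop uses.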
