[LLVMbugs] [Bug 1512] NEW: sse code regalloc/spillfill problems (only 2 registers used for 280 instructions)

bugzilla-daemon at cs.uiuc.edu bugzilla-daemon at cs.uiuc.edu
Fri Jun 15 02:46:11 PDT 2007


           Summary: sse code regalloc/spillfill problems (only 2 registers
                    used for 280 instructions)
           Product: libraries
           Version: trunk
          Platform: Macintosh
        OS/Version: MacOS X
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Register Allocator
        AssignedTo: unassignedbugs at nondot.org
        ReportedBy: duraid at octopus.com.au

Consider the following x86-64 SSE code (emitted after the fixes to PR1509 and PR1510):

0x00000001040a2796:     movaps %xmm0,15744(%rsp)  <- store xmm0 here (it gets clobbered)
0x00000001040a279e:     movaps 15536(%rsp),%xmm0
0x00000001040a27a6:     xorps  %xmm2,%xmm0
0x00000001040a27a9:     movaps %xmm0,15536(%rsp)
0x00000001040a27b1:     movaps 15296(%rsp),%xmm0
0x00000001040a27b9:     orps   15536(%rsp),%xmm0
0x00000001040a27c1:     movaps %xmm0,15296(%rsp)  <- store new xmm0 here for a second
0x00000001040a27c9:     movaps 15744(%rsp),%xmm0  <- restore the old xmm0
0x00000001040a27d1:     andps  15296(%rsp),%xmm0  <- only to and the new xmm0 with it
0x00000001040a27d9:     movaps %xmm0,15744(%rsp)  <- and store it where the old one was
0x00000001040a27e1:     movaps 15792(%rsp),%xmm0  <- before clobbering xmm0 again
// note: sp+15296 and +15536 are never referenced again.
// (15792 and 15744 *are* used again)

At first glance, this code looks like it was produced under severe register pressure, e.g. xmm0 is 
spilled/filled just so it can be used to AND a single value computed immediately prior.

If there wasn't any register pressure (i.e. if you could use registers other than xmm0 and xmm2), you 
could rewrite the above fragment as simply:

movaps %xmm0,15744(%rsp)
movaps 15536(%rsp),%xmm1
xorps  %xmm2,%xmm1
movaps 15296(%rsp),%xmm3
orps   %xmm1,%xmm3
andps  %xmm3,%xmm0
movaps %xmm0,15744(%rsp)
movaps 15792(%rsp),%xmm0

...saving 3 instructions, but also making it possible to hoist some loads and/or issue some instructions 
in parallel. (In the fragment above, everything is serialized through xmm0, and it doesn't look like any 
kind of renaming breaks these dependencies for SSE registers.) The only catch is that this new code 
would clobber anything in xmm1 or xmm3.
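
As a sanity check on the rewrite, here is a minimal Python sketch that interprets both fragments 
and compares the state that is still live afterwards. It is an illustration only, not anything from 
LLVM: the helper names (run, equivalent) are made up, registers are modelled as plain 128-bit 
integers, and it assumes (per the note above) that sp+15296 and sp+15536 are dead after the 
fragment and that xmm1/xmm3 may be clobbered.

```python
import random

ORIGINAL = """
movaps %xmm0,15744(%rsp)
movaps 15536(%rsp),%xmm0
xorps  %xmm2,%xmm0
movaps %xmm0,15536(%rsp)
movaps 15296(%rsp),%xmm0
orps   15536(%rsp),%xmm0
movaps %xmm0,15296(%rsp)
movaps 15744(%rsp),%xmm0
andps  15296(%rsp),%xmm0
movaps %xmm0,15744(%rsp)
movaps 15792(%rsp),%xmm0
"""

REWRITTEN = """
movaps %xmm0,15744(%rsp)
movaps 15536(%rsp),%xmm1
xorps  %xmm2,%xmm1
movaps 15296(%rsp),%xmm3
orps   %xmm1,%xmm3
andps  %xmm3,%xmm0
movaps %xmm0,15744(%rsp)
movaps 15792(%rsp),%xmm0
"""

# Bitwise SSE ops in AT&T order: "op src,dst" computes dst = dst OP src.
BITWISE = {
    "xorps": lambda dst, src: dst ^ src,
    "orps": lambda dst, src: dst | src,
    "andps": lambda dst, src: dst & src,
}


def run(asm, regs, mem):
    """Interpret a fragment; return copies of the final registers and stack slots."""
    regs, mem = dict(regs), dict(mem)

    def read(operand):
        if operand.startswith("%"):
            return regs[operand[1:]]
        return mem[int(operand.split("(")[0])]  # "15744(%rsp)" -> slot 15744

    def write(operand, value):
        if operand.startswith("%"):
            regs[operand[1:]] = value
        else:
            mem[int(operand.split("(")[0])] = value

    for line in asm.strip().splitlines():
        op, operands = line.split(None, 1)
        src, dst = (s.strip() for s in operands.split(","))
        if op == "movaps":
            write(dst, read(src))
        else:
            write(dst, BITWISE[op](read(dst), read(src)))
    return regs, mem


def equivalent(trials=1000, seed=0):
    """Compare only the live-out state: xmm0, xmm2 and slots 15744 and 15792."""
    rng = random.Random(seed)
    for _ in range(trials):
        regs = {f"xmm{i}": rng.getrandbits(128) for i in range(4)}
        mem = {slot: rng.getrandbits(128) for slot in (15296, 15536, 15744, 15792)}
        r1, m1 = run(ORIGINAL, regs, mem)
        r2, m2 = run(REWRITTEN, regs, mem)
        if any(r1[r] != r2[r] for r in ("xmm0", "xmm2")):
            return False
        if any(m1[s] != m2[s] for s in (15744, 15792)):
            return False
    return True
```

Calling equivalent() runs both fragments from random starting states. The dead slots and 
xmm1/xmm3 are deliberately left out of the comparison, since those are exactly where the two 
versions are allowed to differ.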

And that's where this bug report comes in. The fragment up top is actually insn #14527 in a block of 
23745. However, the first time a register that *isn't* xmm0 or xmm2 appears is actually 123 
instructions earlier. And the first time a register not used in the above fragment gets referenced after 
the fragment is 145 instructions later. So basically, the above code is sitting in a window of ~280 
instructions where the only registers used are xmm0 and xmm2. Eek!
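
For reference, the ~280 figure is roughly the 123 instructions before the fragment, plus the 11 in 
the fragment, plus the 145 after it (279). A window like this can be measured mechanically from a 
disassembly listing. The Python sketch below shows one way to do it; register_window is a 
hypothetical helper written for this illustration, and it assumes one instruction per list entry 
in AT&T syntax.

```python
import re

XMM = re.compile(r"%xmm\d+")


def register_window(instructions, index, allowed):
    """Return (start, end), inclusive, of the widest run of instructions around
    `index` in which every xmm register mentioned is in `allowed`."""
    allowed = set(allowed)

    def ok(i):
        return set(XMM.findall(instructions[i])) <= allowed

    if not ok(index):
        raise ValueError("instruction at index uses a register outside `allowed`")

    start = index
    while start > 0 and ok(start - 1):
        start -= 1
    end = index
    while end + 1 < len(instructions) and ok(end + 1):
        end += 1
    return start, end
```

With the real listing, one would pass the index of any instruction in the fragment and 
allowed={"%xmm0", "%xmm2"}; end - start + 1 is then the window length.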

An improvement would be to have regs xmm1 and xmm3 spilled *once*, at the top of the "window", 
filled at the bottom, and code such as the stuff I've written above in between.

Not sure what the root cause of this behaviour is.

I will try and distill a smaller test case showing this, but it's hard to see this kind of "hysteresis" in very 
small cases. (It just looks like regalloc is doing a normal job.) Right now, pr1509's testcase shows the 
same problem.
