[LLVMbugs] [Bug 1512] NEW: sse code regalloc/spillfill problems (only 2 registers used for 280 instructions)
bugzilla-daemon at cs.uiuc.edu
Fri Jun 15 02:46:11 PDT 2007
http://llvm.org/bugs/show_bug.cgi?id=1512
Summary: sse code regalloc/spillfill problems (only 2 registers
used for 280 instructions)
Product: libraries
Version: trunk
Platform: Macintosh
OS/Version: MacOS X
Status: NEW
Severity: enhancement
Priority: P2
Component: Register Allocator
AssignedTo: unassignedbugs at nondot.org
ReportedBy: duraid at octopus.com.au
Consider the following x86-64 SSE code (emitted after the fixes to PR1509 and PR1510):
0x00000001040a2796: movaps %xmm0,15744(%rsp) <- store xmm0 here (it gets clobbered)
0x00000001040a279e: movaps 15536(%rsp),%xmm0
0x00000001040a27a6: xorps %xmm2,%xmm0
0x00000001040a27a9: movaps %xmm0,15536(%rsp)
0x00000001040a27b1: movaps 15296(%rsp),%xmm0
0x00000001040a27b9: orps 15536(%rsp),%xmm0
0x00000001040a27c1: movaps %xmm0,15296(%rsp) <- store new xmm0 here for a second
0x00000001040a27c9: movaps 15744(%rsp),%xmm0 <- restore the old xmm0
0x00000001040a27d1: andps 15296(%rsp),%xmm0 <- only to and the new xmm0 with it
0x00000001040a27d9: movaps %xmm0,15744(%rsp) <- and store it where the old one was
0x00000001040a27e1: movaps 15792(%rsp),%xmm0 <- before clobbering xmm0 again
// note: sp+15296 and +15536 are never referenced again.
// (15792 and 15744 *are* used again)
At first glance, this code looks like it was produced under severe register pressure, e.g. xmm0 is
spilled/filled just so it can be used to AND a single value computed immediately prior.
If there wasn't any register pressure (i.e. if you could use registers other than xmm0 and xmm2), you
could rewrite the above fragment as simply:
movaps %xmm0,15744(%rsp)
movaps 15536(%rsp),%xmm1
xorps %xmm2,%xmm1
movaps 15296(%rsp),%xmm3
orps %xmm1,%xmm3
andps %xmm3,%xmm0
movaps %xmm0,15744(%rsp)
movaps 15792(%rsp),%xmm0
...saving 3 instructions, and also making it possible to hoist some loads and/or issue some instructions
in parallel. (In the fragment above, everything is serialized through xmm0, and it doesn't look like any
kind of renaming breaks these dependencies for SSE registers.) The only catch is that this new code would
clobber anything in xmm1 or xmm3.
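As a sanity check on the rewrite, here is a minimal Python sketch that models both fragments and compares what they leave behind. Everything in it is an assumption made for illustration: the tiny interpreter (`run`), the tuple encoding of instructions, and the choice to compare only xmm0, xmm2 and the two stack slots noted above as being used again (15744 and 15792). Registers and slots are modelled as 128-bit integers, which is enough for the bitwise ops involved.

```python
import random

def run(program, regs, mem):
    """Execute (op, src, dst) tuples in AT&T operand order.
    Integer operands name stack slots (offsets from %rsp); strings name registers."""
    regs, mem = dict(regs), dict(mem)

    def read(x):
        return mem[x] if isinstance(x, int) else regs[x]

    def write(x, value):
        if isinstance(x, int):
            mem[x] = value
        else:
            regs[x] = value

    for op, src, dst in program:
        a = read(src)
        if op == "movaps":
            write(dst, a)
        elif op == "xorps":
            write(dst, read(dst) ^ a)
        elif op == "orps":
            write(dst, read(dst) | a)
        elif op == "andps":
            write(dst, read(dst) & a)
        else:
            raise ValueError(op)
    return regs, mem

# The 11-instruction fragment as emitted.
ORIGINAL = [
    ("movaps", "xmm0", 15744),
    ("movaps", 15536, "xmm0"),
    ("xorps", "xmm2", "xmm0"),
    ("movaps", "xmm0", 15536),
    ("movaps", 15296, "xmm0"),
    ("orps", 15536, "xmm0"),
    ("movaps", "xmm0", 15296),
    ("movaps", 15744, "xmm0"),
    ("andps", 15296, "xmm0"),
    ("movaps", "xmm0", 15744),
    ("movaps", 15792, "xmm0"),
]

# The 8-instruction rewrite that also uses xmm1 and xmm3.
REWRITTEN = [
    ("movaps", "xmm0", 15744),
    ("movaps", 15536, "xmm1"),
    ("xorps", "xmm2", "xmm1"),
    ("movaps", 15296, "xmm3"),
    ("orps", "xmm1", "xmm3"),
    ("andps", "xmm3", "xmm0"),
    ("movaps", "xmm0", 15744),
    ("movaps", 15792, "xmm0"),
]

def live_state(program, seed):
    """Run `program` on random inputs and return only the values assumed live afterwards."""
    rng = random.Random(seed)
    regs = {f"xmm{i}": rng.getrandbits(128) for i in range(4)}
    mem = {slot: rng.getrandbits(128) for slot in (15296, 15536, 15744, 15792)}
    out_regs, out_mem = run(program, regs, mem)
    # xmm1/xmm3 and slots 15296/15536 are deliberately ignored here.
    return out_regs["xmm0"], out_regs["xmm2"], out_mem[15744], out_mem[15792]

if __name__ == "__main__":
    same = all(live_state(ORIGINAL, s) == live_state(REWRITTEN, s) for s in range(100))
    print("fragments agree on live values:", same)
    print("instructions saved:", len(ORIGINAL) - len(REWRITTEN))
```

The comparison skips xmm1, xmm3 and slots 15296/15536 on purpose: the rewrite clobbers the first two and no longer updates the latter two, which is only acceptable under the assumptions stated above.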
And that's where this bug report comes in. The fragment up top is actually insn #14527 in a block of
23745. However, the first time a register that *isn't* xmm0 or xmm2 appears is actually 123
instructions earlier. And the first time a register not used in the above fragment gets referenced after
the fragment is 145 instructions later. So basically, the above code is sitting in a window of ~280
instructions where the only registers used are xmm0 and xmm2. Eek!
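For anyone wanting to measure such a window in their own disassembly, something like the following Python sketch might do. It is not the method used to get the numbers above; the helper names (`xmm_regs`, `window_around`) are hypothetical, and it assumes one instruction per line in AT&T syntax with registers written as `%xmmN`.

```python
import re

XMM = re.compile(r"%xmm(\d+)")

def xmm_regs(line):
    """Return the set of xmm register numbers mentioned on one disassembly line."""
    return {int(n) for n in XMM.findall(line)}

def window_around(lines, index, allowed):
    """Return (start, end) of the widest run of lines containing `index` in which
    every line mentions only xmm registers from `allowed`. `end` is exclusive.
    Lines that mention no xmm register at all do not end the run."""
    start = index
    while start > 0 and xmm_regs(lines[start - 1]) <= allowed:
        start -= 1
    end = index + 1
    while end < len(lines) and xmm_regs(lines[end]) <= allowed:
        end += 1
    return start, end

if __name__ == "__main__":
    # Synthetic demo input, not taken from the real function.
    demo = [
        "movaps %xmm5,64(%rsp)",
        "movaps 16(%rsp),%xmm0",
        "xorps %xmm2,%xmm0",
        "movaps %xmm0,16(%rsp)",
        "movaps 32(%rsp),%xmm7",
    ]
    start, end = window_around(demo, 2, {0, 2})
    print("window:", start, end, "length:", end - start)
```

If the two distances quoted above are measured from the ends of the 11-instruction fragment, the arithmetic is 123 + 11 + 145 = 279, hence the ~280.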
An improvement would be to have regs xmm1 and xmm3 spilled *once*, at the top of the "window",
filled at the bottom, and code such as the stuff I've written above in between.
Not sure what the root cause of this behaviour is.
I will try and distill a smaller test case showing this, but it's hard to see this kind of "hysteresis" in very
small cases. (It just looks like regalloc is doing a normal job.) Right now, pr1509's testcase shows the
same problem.