[llvm-commits] Bad COPY removal in register coalescer
James Molloy
james.molloy at arm.com
Mon Oct 29 09:34:12 PDT 2012
Hi [Jakob],
I've found a bug in the Register Coalescer, but am not really sure how
to fix it. Hopefully you can help!
The short version is that the coalescer decides to coalesce (via
adjustCopiesBackFrom()) a COPY that it shouldn't.
The copy is this:
  %vreg31<def> = COPY %vreg24; QPR:%vreg31,%vreg24
The code calls isCoalescable() with this argument:
  %vreg24:dsub_0<def> = COPY %vreg31:dsub_0; QPR:%vreg24,%vreg31
... which returns true.
Unless I'm being daft, it can't coalesce these copies, because the other
subregister (dsub_1) is not defined by the ACopyMI. Indeed, this causes
a codegen fault where a dsub register is clobbered.
I *think* the code at RegisterCoalescer.cpp:354 should probably handle it:
  } else {
    // DstReg is virtual.
    if (DstReg != Dst)
      return false;
    // Registers match, do the subregisters line up?
    return compose(TRI, SrcIdx, SrcSub) == compose(TRI, DstIdx, DstSub);
  }
But those compositions don't sit quite right with me and I don't really
know how I should be reading them to make them make sense, even after
reading the documentation for TRI::composeSubRegIndices many times.
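A toy model of that final comparison may help with reading it. This is only
a sketch, not LLVM's code: the enum, the composition table and
subregsLineUp() are made up for illustration. It assumes that index 0 means
"the whole register" and acts as the identity under composition, which is
how I understand TRI::composeSubRegIndices, and that SrcIdx/DstIdx come from
the full copy (both 0) while SrcSub/DstSub come from the dsub_0 copy.

```cpp
#include <cassert>

// Toy subregister indices for a 128-bit Q register.
// Hypothetical values; this is not LLVM's generated enum.
enum SubIdx : unsigned {
  NoSubReg = 0,                    // the whole register
  dsub_0, dsub_1,                  // 64-bit halves
  ssub_0, ssub_1, ssub_2, ssub_3,  // 32-bit quarters
  InvalidIdx = ~0u                 // composition not modelled in this toy
};

// Compose two indices: take subregister A, then subregister B of that.
// Assumption: index 0 is the identity on either side.
unsigned compose(unsigned A, unsigned B) {
  if (A == NoSubReg) return B;
  if (B == NoSubReg) return A;
  if (A == dsub_0 && B == ssub_0) return ssub_0;
  if (A == dsub_0 && B == ssub_1) return ssub_1;
  if (A == dsub_1 && B == ssub_0) return ssub_2;
  if (A == dsub_1 && B == ssub_1) return ssub_3;
  return InvalidIdx;
}

// Mirrors only the last line of the quoted snippet: it compares composed
// indices and does not look at which lanes the candidate copy defines.
bool subregsLineUp(unsigned SrcIdx, unsigned DstIdx,
                   unsigned SrcSub, unsigned DstSub) {
  return compose(SrcIdx, SrcSub) == compose(DstIdx, DstSub);
}
```

Under these assumptions, subregsLineUp(NoSubReg, NoSubReg, dsub_0, dsub_0)
is true, because both sides compose to dsub_0. Nothing in this model notices
that dsub_1 is left undefined by the partial copy.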
I'm attaching the testcase I'm using, if it helps - the machine code
goes bad (a copy that shouldn't gets removed) just after the second call
to adjustCopiesBackFrom().
[The attached testcase loads a <4 x float> vector, extracts its four
elements and returns their sum.]
Cheers,
James
-------------- next part --------------
define float @f(<4 x float>* %pinput) {
  %input = load <4 x float>* %pinput, align 16
  %e0 = extractelement <4 x float> %input, i32 0
  %e1 = extractelement <4 x float> %input, i32 1
  %e2 = extractelement <4 x float> %input, i32 2
  %e3 = extractelement <4 x float> %input, i32 3
  %sum0 = fadd float %e0, %e1
  %sum1 = fadd float %sum0, %e2
  %sum2 = fadd float %sum1, %e3
  ret float %sum2
}