[llvm-commits] [llvm] r161152 - in /llvm/trunk: include/llvm/Target/TargetInstrInfo.h lib/CodeGen/PeepholeOptimizer.cpp lib/Target/X86/X86InstrInfo.cpp lib/Target/X86/X86InstrInfo.h test/CodeGen/X86/2012-05-19-avx2-store.ll test/CodeGen/X86/break-sse-dep.ll test/CodeGen/X86/fold-load.ll test/CodeGen/X86/fold-pcmpeqd-1.ll test/CodeGen/X86/sse-minmax.ll test/CodeGen/X86/vec_compare.ll
Jakob Stoklund Olesen
stoklund at 2pi.dk
Thu Aug 2 11:02:39 PDT 2012
On Aug 2, 2012, at 12:31 AM, Michael Liao <michael.liao at intel.com> wrote:
> Some cases conflict with the previous effort by Bruno Cardoso Lopes to remove
> partial register update stalls.
>
> For example, sqrtsd with a memory operand is one such instruction: it updates
> only part of its SSE destination register. It should be selected when the code
> is optimized for size. Otherwise, the sequence movsd + sqrtsd is
> preferred over sqrtsd with a memory operand.
Actually, our current approach to this is not very good.
We prevent loads from being folded into sqrtsd:
movsd (…), %xmm0
sqrtsd %xmm0, %xmm0
But we don't make any effort to give the sqrtsd input and output operands the same register, so we could just as easily produce:
movsd (…), %xmm0
sqrtsd %xmm0, %xmm1
That is completely pointless: sqrtsd writes only the low 64 bits of its destination, so there is still a partial register dependency on %xmm1.
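To make the dependency concrete, here is a toy C++ model (an illustration only, not LLVM code) that treats an XMM register as two 64-bit double lanes. The names `Xmm`, `movsd_load` and `sqrtsd` are hypothetical; the semantics follow the legacy SSE encodings: movsd from memory zeroes the upper lane, while sqrtsd writes only the low lane and carries the upper lane over from its old destination.

```cpp
#include <array>
#include <cmath>

// Toy model of a 128-bit XMM register as two 64-bit double lanes
// (illustration only; lane 0 is the low 64 bits).
using Xmm = std::array<double, 2>;

// movsd from memory: writes the low lane and zeroes the upper lane, so the
// result never reads the register's previous contents.
Xmm movsd_load(double mem) { return {mem, 0.0}; }

// sqrtsd %src, %dst (legacy SSE encoding): writes only the low lane; the
// upper lane is copied from the OLD dst, so the instruction cannot complete
// until the previous writer of dst has finished.
Xmm sqrtsd(const Xmm &src, const Xmm &dst) {
  return {std::sqrt(src[0]), dst[1]};
}
```

With `sqrtsd(xmm0, xmm1)` the upper lane of the result is the old `xmm1[1]`, which is exactly the dependency on the previous writer of %xmm1. With `sqrtsd(xmm0, xmm0)` every input comes from the freshly loaded %xmm0, so nothing older is read.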
A better approach would be to fold the load aggressively:
sqrtsd (…), %xmm1
And then teach X86InstrInfo::breakPartialRegDependency() to unfold the load instead of inserting an xorps dependency-breaking instruction:
xorps %xmm1, %xmm1
sqrtsd (…), %xmm1
This would become:
movsd (…), %xmm1
sqrtsd %xmm1, %xmm1
Since this happens after register allocation, we can make sure to pick the same register for the sqrtsd input and output. The load will also only be unfolded where there is a nearby def of %xmm1.
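A rough sketch of the proposed decision, written as a self-contained C++ toy rather than against LLVM's real MachineInstr API. The `Instr` struct, `Opcode` enum, `breakDepByUnfolding()` and `toAsm()` functions, and the `kClearance` threshold standing in for "a nearby def" are all hypothetical. The idea shown is only this: keep the folded load when the last def of the destination is far away; otherwise unfold it into a full-width movsd followed by a register-form sqrtsd that reuses the destination register as its input.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy instruction model; NOT LLVM's MachineInstr API.
enum class Opcode { MOVSDrm, SQRTSDrm, SQRTSDrr };

struct Instr {
  Opcode op;
  int dst;          // destination XMM register number (already allocated)
  int src;          // source XMM register, or -1 for memory forms
  std::string mem;  // memory operand text, empty for register forms
};

// Hypothetical stand-in for "a nearby def": if dst was defined fewer than
// this many instructions ago, the partial update is treated as a stall risk.
constexpr unsigned kClearance = 16;

// Given a folded sqrtsd and the distance to the previous def of its
// destination, return the instructions to emit in its place.
std::vector<Instr> breakDepByUnfolding(const Instr &mi, unsigned distSinceLastDef) {
  assert(mi.op == Opcode::SQRTSDrm && "sketch only handles folded sqrtsd");
  if (distSinceLastDef >= kClearance)
    return {mi};  // No nearby def: keep the smaller folded form.
  // After register allocation dst is known, so reuse it as the input:
  // movsd fully defines dst (upper lane zeroed), and the register-form
  // sqrtsd then depends only on that movsd.
  return {Instr{Opcode::MOVSDrm, mi.dst, -1, mi.mem},
          Instr{Opcode::SQRTSDrr, mi.dst, mi.dst, ""}};
}

// Render an instruction in AT&T syntax, matching the examples in this mail.
std::string toAsm(const Instr &i) {
  const auto xmm = [](int r) { return "%xmm" + std::to_string(r); };
  switch (i.op) {
  case Opcode::MOVSDrm:
    return "movsd " + i.mem + ", " + xmm(i.dst);
  case Opcode::SQRTSDrm:
    return "sqrtsd " + i.mem + ", " + xmm(i.dst);
  case Opcode::SQRTSDrr:
    return "sqrtsd " + xmm(i.src) + ", " + xmm(i.dst);
  }
  return "";
}
```

For example, a folded `sqrtsd (%rdi), %xmm1` (the `(%rdi)` operand is illustrative) whose destination was written two instructions earlier is rewritten to `movsd (%rdi), %xmm1` followed by `sqrtsd %xmm1, %xmm1`; with no recent def of %xmm1 it stays folded.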
/jakob