[llvm-commits] [llvm] r161152 - in /llvm/trunk: include/llvm/Target/TargetInstrInfo.h lib/CodeGen/PeepholeOptimizer.cpp lib/Target/X86/X86InstrInfo.cpp lib/Target/X86/X86InstrInfo.h test/CodeGen/X86/2012-05-19-avx2-store.ll test/CodeGen/X86/break-sse-dep.ll test/CodeGen/X86/fold-load.ll test/CodeGen/X86/fold-pcmpeqd-1.ll test/CodeGen/X86/sse-minmax.ll test/CodeGen/X86/vec_compare.ll

Thu Aug 2 11:02:39 PDT 2012

On Aug 2, 2012, at 12:31 AM, Michael Liao <michael.liao at intel.com> wrote:

> Some cases are considered conflicting with the previous effort to remove
> partial register update stall by Bruno Cardoso Lopes.
> 
> For example, sqrtsd with memory operand is such an instruction updating
> only parts of the registers in SSE. It should be selected if the code is
> optimized for size. Otherwise, the sequence of movsd + sqrtsd is
> preferred over sqrtsd with a memory operand.

Actually, our current approach to this is not very good.

We prevent loads from being folded into sqrtsd:

  movsd (…), %xmm0
  sqrtsd %xmm0, %xmm0

But we make no effort to give the sqrtsd the same input and output register, so we could just as well end up producing:

  movsd (…), %xmm0
  sqrtsd %xmm0, %xmm1

That is completely pointless, because the sqrtsd still has a partial register dependency on %xmm1.

A better approach would be to fold the load aggressively:

  sqrtsd (…), %xmm1

And then teach X86InstrInfo::breakPartialRegDependency() to unfold the load instead of inserting an xorps dependency-breaking instruction:

  xorps %xmm1, %xmm1
  sqrtsd (…), %xmm1

Would become:

  movsd (…), %xmm1
  sqrtsd %xmm1, %xmm1

Since this happens after register allocation, we can make sure to pick the same register for the sqrtsd input and output. The load would also only be unfolded where there is a nearby def of %xmm1, that is, where the partial register dependency could actually cause a stall.
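
To make the idea concrete, here is a rough sketch as a toy model. It is not LLVM code and does not use the MachineInstr API: Inst, kMem, distanceToPrevDef(), unfoldNearDefs() and the minClearance threshold are all hypothetical names made up for illustration. It only handles the scalar-double case within a single block, and relies on the fact that the load form of movsd writes the whole register.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical placeholder for "some memory operand".
const std::string kMem = "(mem)";

// Toy instruction record; a stand-in for illustration only.
struct Inst {
  std::string opcode;  // e.g. "sqrtsd", "movsd"
  std::string src;     // source register name, or kMem for a folded load
  std::string dst;     // destination register name
  bool partialUpdate;  // true if only part of dst is written
};

// Number of instructions between block[i] and the closest earlier
// instruction that writes reg (1 means the immediately preceding one).
// Returns -1 if reg is not written earlier in the block.
int distanceToPrevDef(const std::vector<Inst>& block, std::size_t i,
                      const std::string& reg) {
  for (std::size_t k = i; k > 0; --k) {
    if (block[k - 1].dst == reg)
      return static_cast<int>(i - (k - 1));
  }
  return -1;
}

// For every partial-update instruction with a folded load whose destination
// was written fewer than minClearance instructions earlier, unfold the load
// into the destination register. The load form of movsd writes the full
// register, so it breaks the dependency, and the arithmetic instruction then
// reads and writes the same register.
std::vector<Inst> unfoldNearDefs(const std::vector<Inst>& block,
                                 int minClearance) {
  std::vector<Inst> out;
  for (std::size_t i = 0; i < block.size(); ++i) {
    Inst inst = block[i];
    int dist = distanceToPrevDef(block, i, inst.dst);
    bool nearDef = dist >= 0 && dist < minClearance;
    if (inst.partialUpdate && inst.src == kMem && nearDef) {
      out.push_back({"movsd", kMem, inst.dst, false});
      inst.src = inst.dst;  // same register for input and output
    }
    out.push_back(inst);
  }
  return out;
}
```

With a block holding a full write of %xmm1 followed by a sqrtsd from memory into %xmm1, unfoldNearDefs() emits the movsd + sqrtsd pair on %xmm1. With no earlier def of %xmm1 in the block, the folded sqrtsd is left alone.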

/jakob