[llvm-commits] [llvm] r161152 - in /llvm/trunk: include/llvm/Target/TargetInstrInfo.h lib/CodeGen/PeepholeOptimizer.cpp lib/Target/X86/X86InstrInfo.cpp lib/Target/X86/X86InstrInfo.h test/CodeGen/X86/2012-05-19-avx2-store.ll test/CodeGen/X86/break-sse-dep.ll test/CodeGen/X86/fold-load.ll test/CodeGen/X86/fold-pcmpeqd-1.ll test/CodeGen/X86/sse-minmax.ll test/CodeGen/X86/vec_compare.ll

Michael Liao michael.liao at intel.com
Thu Aug 2 14:49:13 PDT 2012


On Thu, 2012-08-02 at 14:22 -0700, Jakob Stoklund Olesen wrote:
> On Aug 2, 2012, at 12:49 PM, Michael Liao <michael.liao at intel.com> wrote:
> 
> >> And then teach X86InstrInfo::breakPartialRegDependency() to unfold the load instead of inserting an xorps dependency breaking instruction:
> >> 
> >>  xorps %xmm1, %xmm1
> >>  sqrtsd (…), %xmm1
> > 
> > In fact, this is what I wanted to suggest for breaking the partial register
> > stall. The xorps idiom is better than movsd + sqrtsd: it saves 1 byte of
> > instruction encoding and is handled much more efficiently by OOO processors.
> > 
> > If no one is working on that, I will start developing a machine pass to
> > break this kind of partial register stall.
> 
> No need, ExecutionDepsFix.cpp already does that. See X86InstrInfo::getPartialRegUpdateClearance() and breakPartialRegDependency().
> 

Just found that.

> AFAICT, the only thing missing is that hasPartialRegUpdate() doesn't yet know about the 'rm' versions of the instructions.
> 
> We also need to do something about AVX instructions where the superfluous dependency is explicit in an extra operand. I think they just all get xmm0 currently.

Yeah, AVX insns always update the full register. I could not confirm whether
XMM0 is the only register allocated for the undef operand. If not, one option
is to tune reg-alloc to make a better choice: if we can confirm that some
physical register has not been updated for a while (the threshold is arbitrary
and needs tuning), picking it saves one XOR.

Yours
- Michael

> 
> /jakob
> 




