<div dir="ltr"><div><div><div>Are you sure that's it? I commented that block out, rebuilt LLVM 3.3, and it still duplicates the constant.<br></div>My concern is that long constant loads increase code size, and avoiding them through better targeting would be a win. My project's use of LLVM involves a lot of long constants, so this could be a significant optimization.<br>

</div>I'll do some more debugging now that you have pointed me in the right direction.<br><br>thanks<br></div>/maurice<br><div><div><div><div><div></div><div style id="__af745f8f43-e961-4b88-8424-80b67790c964__"></div>

</div></div></div></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Fri, Aug 2, 2013 at 5:48 PM, Jakob Stoklund Olesen <span dir="ltr"><<a href="mailto:stoklund@2pi.dk" target="_blank">stoklund@2pi.dk</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5"><br>

On Aug 2, 2013, at 1:37 PM, Rafael Espíndola <<a href="mailto:rafael.espindola@gmail.com">rafael.espindola@gmail.com</a>> wrote:<br>

<br>

>> I expected that this optimization would be picked<br>

>> up in a cse, gvn, machine-cse or even peepholing pass.<br>

>><br>

>> Comments?<br>

><br>

><br>

> At the LLVM IR level this is represented as<br>

><br>

> define i64 @caller() #0 {<br>

> entry:<br>

>  store i64* @val, i64** @p, align 8, !tbaa !0<br>

>  store i64 12345123400, i64* @val, align 8, !tbaa !3<br>

>  %call = tail call i64 @xtr(i64 12345123400) #2<br>

>  ret i64 %call<br>

> }<br>

><br>

> Which is probably the best representation to have at this relatively high level.<br>

><br>

> At the machine level it looks like it is the register coalescer that<br>

> is duplicating the constant. It transforms<br>

><br>

> 0B      BB#0: derived from LLVM BB %entry<br>

> 16B             %vreg0<def> = MOV64rm %RIP, 1, %noreg,<br>

> <ga:@val>[TF=5], %noreg; mem:LD8[GOT] GR64:%vreg0<br>

> 32B             %vreg1<def> = MOV64rm %RIP, 1, %noreg, <ga:@p>[TF=5],<br>

> %noreg; mem:LD8[GOT] GR64:%vreg1<br>

> 48B             MOV64mr %vreg1, 1, %noreg, 0, %noreg, %vreg0;<br>

> mem:ST8[@p](tbaa=!"any pointer") GR64:%vreg1,%vreg0<br>

> 64B             %vreg2<def> = MOV64ri 12345123400; GR64:%vreg2<br>

> 80B             MOV64mr %vreg0, 1, %noreg, 0, %noreg, %vreg2;<br>

> mem:ST8[@val](tbaa=!"long long") GR64:%vreg0,%vreg2<br>

> 96B             %RDI<def> = COPY %vreg2; GR64:%vreg2<br>

> 112B            TCRETURNdi64 <ga:@xtr>, 0, <regmask>, %RSP<imp-use>,<br>

> %RDI<imp-use,kill><br>

><br>

> into<br>

><br>

> 0B      BB#0: derived from LLVM BB %entry<br>

> 16B             %vreg0<def> = MOV64rm %RIP, 1, %noreg,<br>

> <ga:@val>[TF=5], %noreg; mem:LD8[GOT] GR64:%vreg0<br>

> 32B             %vreg1<def> = MOV64rm %RIP, 1, %noreg, <ga:@p>[TF=5],<br>

> %noreg; mem:LD8[GOT] GR64:%vreg1<br>

> 48B             MOV64mr %vreg1, 1, %noreg, 0, %noreg, %vreg0;<br>

> mem:ST8[@p](tbaa=!"any pointer") GR64:%vreg1,%vreg0<br>

> 64B             %vreg2<def> = MOV64ri 12345123400; GR64:%vreg2<br>

> 80B             MOV64mr %vreg0, 1, %noreg, 0, %noreg, %vreg2;<br>

> mem:ST8[@val](tbaa=!"long long") GR64:%vreg0,%vreg2<br>

> 96B             %RDI<def> = MOV64ri 12345123400<br>

> 112B            TCRETURNdi64 <ga:@xtr>, 0, <regmask>, %RSP<imp-use>,<br>

> %RDI<imp-use,kill><br>

><br>

> I am not sure why. Maybe this should be delayed until the register<br>

> allocator, which can split the range if it cannot assign rdi to vreg2?<br>

><br>

> Jakob, should I open a bug?<br>

<br>

</div></div>MachineCSE skips cheap instructions on purpose:<br>

<br>

  // Heuristics #1: Don't CSE "cheap" computation if the def is not local or in<br>

  // an immediate predecessor. We don't want to increase register pressure and<br>

  // end up causing other computation to be spilled.<br>

  if (MI->isAsCheapAsAMove()) {<br>

    MachineBasicBlock *CSBB = CSMI->getParent();<br>

    MachineBasicBlock *BB = MI->getParent();<br>

    if (CSBB != BB && !CSBB->isSuccessor(BB))<br>

      return false;<br>

  }<br>

<br>

This code is older than the greedy register allocator. We could delete it if somebody is willing to check for regressions in the test suite.<br>

<br>

I thought we had a PR about this already, but I can’t find it now.<br>

<br>

Thanks,<br>

/jakob<br>

<br>

</blockquote></div><br></div>
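<div>For reference, the MachineCSE check quoted above can be modeled in isolation to see which cases it actually vetoes. This is only a rough sketch: <code>Block</code>, <code>Instr</code> and <code>passesCheapHeuristic</code> are hypothetical stand-ins invented for illustration, not LLVM types or APIs, and the <code>cheap</code> flag simply stands in for <code>MI->isAsCheapAsAMove()</code>.</div>

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for MachineBasicBlock: only successor edges are modeled.
struct Block {
  std::vector<const Block *> succs;

  bool isSuccessor(const Block *b) const {
    return std::find(succs.begin(), succs.end(), b) != succs.end();
  }
};

// Hypothetical stand-in for MachineInstr: a parent block plus a "cheap" flag
// that models the result of MI->isAsCheapAsAMove().
struct Instr {
  const Block *parent;
  bool cheap;
};

// Mirrors only the quoted "Heuristics #1" check. Returns false when the
// heuristic vetoes CSE of `mi` against the earlier equivalent instruction
// `csmi`, and true when this particular check lets the CSE proceed.
bool passesCheapHeuristic(const Instr &csmi, const Instr &mi) {
  if (mi.cheap) {
    const Block *csbb = csmi.parent;
    const Block *bb = mi.parent;
    // Veto only when the earlier def is neither in the same block nor in an
    // immediate predecessor of the block containing `mi`.
    if (csbb != bb && !csbb->isSuccessor(bb))
      return false;
  }
  return true;
}
```

<div>Under this model the check only rejects a cheap instruction whose earlier equivalent lives in a different, non-predecessor block. In the dump above everything sits in BB#0, and the second MOV64ri only appears after the register coalescer has run, so a check of this shape would not be expected to matter for this test case. That would be consistent with the duplication persisting after the block was commented out.</div>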