[LLVMdev] Missing optimization - constant parameter

Fri Aug 2 15:48:44 PDT 2013

On Aug 2, 2013, at 1:37 PM, Rafael Espíndola <rafael.espindola at gmail.com> wrote:

>> I expected that this optimization would be picked
>> up in a cse, gvn, machine-cse or even peepholing pass.
>> 
>> Comments?
> 
> 
> At the LLVM IR level this is represented as
> 
> define i64 @caller() #0 {
> entry:
>  store i64* @val, i64** @p, align 8, !tbaa !0
>  store i64 12345123400, i64* @val, align 8, !tbaa !3
>  %call = tail call i64 @xtr(i64 12345123400) #2
>  ret i64 %call
> }
> 
> Which is probably the best representation to have at this relatively high level.
> 
> At the machine level it looks like it is the register coalescer that
> is duplicating the constant. It transforms
> 
> 0B      BB#0: derived from LLVM BB %entry
> 16B             %vreg0<def> = MOV64rm %RIP, 1, %noreg, <ga:@val>[TF=5], %noreg; mem:LD8[GOT] GR64:%vreg0
> 32B             %vreg1<def> = MOV64rm %RIP, 1, %noreg, <ga:@p>[TF=5], %noreg; mem:LD8[GOT] GR64:%vreg1
> 48B             MOV64mr %vreg1, 1, %noreg, 0, %noreg, %vreg0; mem:ST8[@p](tbaa=!"any pointer") GR64:%vreg1,%vreg0
> 64B             %vreg2<def> = MOV64ri 12345123400; GR64:%vreg2
> 80B             MOV64mr %vreg0, 1, %noreg, 0, %noreg, %vreg2; mem:ST8[@val](tbaa=!"long long") GR64:%vreg0,%vreg2
> 96B             %RDI<def> = COPY %vreg2; GR64:%vreg2
> 112B            TCRETURNdi64 <ga:@xtr>, 0, <regmask>, %RSP<imp-use>, %RDI<imp-use,kill>
> 
> into
> 
> 0B      BB#0: derived from LLVM BB %entry
> 16B             %vreg0<def> = MOV64rm %RIP, 1, %noreg, <ga:@val>[TF=5], %noreg; mem:LD8[GOT] GR64:%vreg0
> 32B             %vreg1<def> = MOV64rm %RIP, 1, %noreg, <ga:@p>[TF=5], %noreg; mem:LD8[GOT] GR64:%vreg1
> 48B             MOV64mr %vreg1, 1, %noreg, 0, %noreg, %vreg0; mem:ST8[@p](tbaa=!"any pointer") GR64:%vreg1,%vreg0
> 64B             %vreg2<def> = MOV64ri 12345123400; GR64:%vreg2
> 80B             MOV64mr %vreg0, 1, %noreg, 0, %noreg, %vreg2; mem:ST8[@val](tbaa=!"long long") GR64:%vreg0,%vreg2
> 96B             %RDI<def> = MOV64ri 12345123400
> 112B            TCRETURNdi64 <ga:@xtr>, 0, <regmask>, %RSP<imp-use>, %RDI<imp-use,kill>
> 
> I am not sure why. Maybe this should be delayed until the register
> allocator, which can split the range if it cannot assign rdi to vreg2?
> 
> Jakob, should I open a bug?

MachineCSE skips cheap instructions on purpose:

  // Heuristics #1: Don't CSE "cheap" computation if the def is not local or in
  // an immediate predecessor. We don't want to increase register pressure and
  // end up causing other computation to be spilled.
  if (MI->isAsCheapAsAMove()) {
    MachineBasicBlock *CSBB = CSMI->getParent();
    MachineBasicBlock *BB = MI->getParent();
    if (CSBB != BB && !CSBB->isSuccessor(BB))
      return false;
  }

This code is older than the greedy register allocator. We could delete it if somebody is willing to check for regressions in the test suite.
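
To make the condition easier to see in isolation, here is a minimal standalone sketch of the same check. Block, Instr and cheapHeuristicAllowsCSE are hypothetical stand-ins made up for illustration, not LLVM's MachineBasicBlock or MachineInstr API, and the Cheap flag only models what isAsCheapAsAMove() would report:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for MachineBasicBlock: only tracks successor edges.
struct Block {
  std::vector<const Block *> Succs;
  bool isSuccessor(const Block *BB) const {
    return std::find(Succs.begin(), Succs.end(), BB) != Succs.end();
  }
};

// Hypothetical stand-in for MachineInstr.
struct Instr {
  const Block *Parent; // block containing the instruction
  bool Cheap;          // models MI->isAsCheapAsAMove()
};

// Returns true if the "cheap computation" heuristic lets MI be CSE'd against
// the earlier, equivalent instruction CSMI. Cheap instructions are only
// CSE'd when CSMI is in the same block or in an immediate predecessor.
bool cheapHeuristicAllowsCSE(const Instr &MI, const Instr &CSMI) {
  if (MI.Cheap) {
    const Block *CSBB = CSMI.Parent;
    const Block *BB = MI.Parent;
    if (CSBB != BB && !CSBB->isSuccessor(BB))
      return false;
  }
  return true;
}
```

In this model, a cheap instruction two blocks away from the earlier def is rejected, while a non-cheap one at the same distance is still a candidate. The real pass applies further checks that this sketch leaves out.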

I thought we had a PR about this already, but I can’t find it now.

Thanks,
/jakob