[LLVMdev] Missing optimization - constant parameter
Jakob Stoklund Olesen
stoklund at 2pi.dk
Fri Aug 2 15:48:44 PDT 2013
On Aug 2, 2013, at 1:37 PM, Rafael Espíndola <rafael.espindola at gmail.com> wrote:
>> I expected that this optimization would be picked
>> up in a cse, gvn, machine-cse or even peepholing pass.
>>
>> Comments?
>
>
> At the LLVM IR level this is represented as
>
> define i64 @caller() #0 {
> entry:
> store i64* @val, i64** @p, align 8, !tbaa !0
> store i64 12345123400, i64* @val, align 8, !tbaa !3
> %call = tail call i64 @xtr(i64 12345123400) #2
> ret i64 %call
> }
>
> Which is probably the best representation to have at this relatively high level.
>
> At the machine level it looks like it is the register coalescer that
> is duplicating the constant. It transforms
>
> 0B BB#0: derived from LLVM BB %entry
> 16B %vreg0<def> = MOV64rm %RIP, 1, %noreg,
> <ga:@val>[TF=5], %noreg; mem:LD8[GOT] GR64:%vreg0
> 32B %vreg1<def> = MOV64rm %RIP, 1, %noreg, <ga:@p>[TF=5],
> %noreg; mem:LD8[GOT] GR64:%vreg1
> 48B MOV64mr %vreg1, 1, %noreg, 0, %noreg, %vreg0;
> mem:ST8[@p](tbaa=!"any pointer") GR64:%vreg1,%vreg0
> 64B %vreg2<def> = MOV64ri 12345123400; GR64:%vreg2
> 80B MOV64mr %vreg0, 1, %noreg, 0, %noreg, %vreg2;
> mem:ST8[@val](tbaa=!"long long") GR64:%vreg0,%vreg2
> 96B %RDI<def> = COPY %vreg2; GR64:%vreg2
> 112B TCRETURNdi64 <ga:@xtr>, 0, <regmask>, %RSP<imp-use>,
> %RDI<imp-use,kill>
>
> into
>
> 0B BB#0: derived from LLVM BB %entry
> 16B %vreg0<def> = MOV64rm %RIP, 1, %noreg,
> <ga:@val>[TF=5], %noreg; mem:LD8[GOT] GR64:%vreg0
> 32B %vreg1<def> = MOV64rm %RIP, 1, %noreg, <ga:@p>[TF=5],
> %noreg; mem:LD8[GOT] GR64:%vreg1
> 48B MOV64mr %vreg1, 1, %noreg, 0, %noreg, %vreg0;
> mem:ST8[@p](tbaa=!"any pointer") GR64:%vreg1,%vreg0
> 64B %vreg2<def> = MOV64ri 12345123400; GR64:%vreg2
> 80B MOV64mr %vreg0, 1, %noreg, 0, %noreg, %vreg2;
> mem:ST8[@val](tbaa=!"long long") GR64:%vreg0,%vreg2
> 96B %RDI<def> = MOV64ri 12345123400
> 112B TCRETURNdi64 <ga:@xtr>, 0, <regmask>, %RSP<imp-use>,
> %RDI<imp-use,kill>
>
> I am not sure why. Maybe this should be delayed until the register
> allocator, which can split the range if it cannot assign rdi to vreg2?
>
> Jakob, should I open a bug?
MachineCSE skips cheap instructions on purpose:
// Heuristics #1: Don't CSE "cheap" computation if the def is not local or in
// an immediate predecessor. We don't want to increase register pressure and
// end up causing other computation to be spilled.
if (MI->isAsCheapAsAMove()) {
MachineBasicBlock *CSBB = CSMI->getParent();
MachineBasicBlock *BB = MI->getParent();
if (CSBB != BB && !CSBB->isSuccessor(BB))
return false;
}
This code is older than the greedy register allocator. We could delete it if somebody is willing to check for regressions in the test suite.
I thought we had a PR about this already, but I can’t find it now.
Thanks,
/jakob
More information about the llvm-dev
mailing list