[LLVMdev] Question about shouldMergeGEPs in InstructionCombining

Tue Feb 24 04:17:22 PST 2015

On Mon, Feb 23, 2015 at 2:17 PM, Hal Finkel <hfinkel at anl.gov> wrote:

> ----- Original Message -----
> > From: "Francois Pichet" <pichet2000 at gmail.com>
> > To: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>
> > Sent: Sunday, February 22, 2015 5:34:11 PM
> > Subject: [LLVMdev] Question about shouldMergeGEPs in InstructionCombining
> >
> > Hello
> >
> > I am not sure I understand the logic for merging GEPs in
> > InstructionCombining.cpp:
> >
> > static bool shouldMergeGEPs (GEPOperator &GEP, GEPOperator &Src) {
> > // If this GEP has only 0 indices, it is the same pointer as
> > // Src. If Src is not a trivial GEP too, don't combine
> > // the indices.
> > if (GEP.hasAllZeroIndices() && !Src.hasAllZeroIndices() &&
> > !Src.hasOneUse())
> > return false;
> > return true;
> > }
> >
> >
> >
> > I have a case where
> > GEP: %arrayidx7 = getelementptr inbounds i32* %arrayidx, i32 %shl6
> > Src: %arrayidx = getelementptr inbounds [4096 x i32]* @phasor_4096,
> > i32 0, i32 %shl2
> >
> >
> > GEP. hasAllZeroIndices() will return false and the merge will occur
> > Why do we want to combine these 2 getelementptr?
> >
> >
> > On my out of tree target, combining these 2 GetElementPtr create a
> > performance regression because since GEP is in a loop (Src is out of
> > loop), GEP will lower to a more complicated address for a subsequent
> > load. (the complicated address needs to be calculated over and over
> > in the loop)
> >
>
> There are a couple of issues here. One, InstCombine's job is the move the
> IR toward a reasonable canonical form. It is doing that here, and I think
> that's the right thing to do. However, the problem you point out is a
> general one. Can you please explain why the MachineLICM pass is not able to
> hoist the redundant parts of the addressing computation out of the loop? We
> might also want to adjust for this in CodeGenPrep (CGP already has
> addressing-mode aware GEP splitting logic, although not quite for this
> case).
>
>
Hi Hal,

MachineLICM is not able to hoist anything because the address mode is not
loop invariant.

Here is a reduction of the code I am talking about.

extern const unsigned   phasor[4096];
void test(unsigned* out , unsigned step_size)
{
  unsigned big_step_size = step_size<<2;
  int *phasor_ptr_temp_1 = &phasor[big_step_size];
  for (int i = 0 ; i < 1020 ; i+=4)
  out[i] = phasor_ptr_temp_1[i<<step_size];
}

I am getting slightly better code on my target (Octasic's Opus) if I return
false for shouldMergeGEPs.
I just tried with ppc64 and x86-64 and I am also getting better code
without the GEP merging in InstructionCombining. I am not sure what the
solution is yet but I think we are too aggressive when merging GEPs in
InstructionCombining.

Here is the details for ppc64: http://pastie.org/9978022
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150224/606b7b39/attachment.html>