[PATCH] [AArch64] Improve and enable the SeparateConstOffsetFromGEP for AArch64 backend.

Hal Finkel hfinkel at anl.gov
Wed Oct 22 00:33:33 PDT 2014


----- Original Message -----
> From: "Hao Liu" <Hao.Liu at arm.com>
> To: "Hao Liu" <Hao.Liu at arm.com>, "t p northover" <t.p.northover at gmail.com>, jingyue at google.com
> Cc: hfinkel at anl.gov, "amara emerson" <amara.emerson at arm.com>, llvm-commits at cs.uiuc.edu
> Sent: Wednesday, October 22, 2014 2:07:12 AM
> Subject: Re: [PATCH] [AArch64] Improve and enable the SeparateConstOffsetFromGEP for AArch64 backend.
> 
> >>! In D5864#5, @hfinkel wrote:
> >> And most important is that it will also transform complex GEPs
> >> into a "ptrtoint + arithmetic + inttoptr" form, so that it is
> >> able to find CSE opportunities across basic blocks.
> > 
> > You'll need to disable this transformation when AA is enabled in
> > CodeGen. You can do this just like CGP does (essentially by
> > calling TM->getSubtarget<TargetSubtargetInfo>().useAA()). CGP
> > splits the GEPs into simpler GEPs when AA is in use, and uses
> > inttoptr otherwise. This all relates to the way that we use
> > BasicAA to override TBAA and catch cases of basic type punning --
> > but BasicAA does not analyze inttoptr, and so can no longer
> > fulfill this need after new inttoptrs are introduced. You could
> > try using simpler GEPs like CGP does when AA is in use, or just
> > disable this part of the transformation for now. Either is fine
> > with me.
> 
> Hi Hal,
> 
> Thanks for the explanation on useAA(). We do transformation because
> using simpler GEPs to replace a complex GEP like CGP does has small
> impact. Such transformation is not in CGP, because we want to make
> use of the constant extraction in SeparateConstOffsetFromGEP pass,
> and we also want to call other passes like EarlyCSE to remove common
> sub-expressions after such transformation. For the performance
> concern, I prefer to enable such transformation in AArch64 backend.
> But unfortunately, AArch64 backend enables AA for both cortex-a53 and
> cortex-a57:
>       bool useAA() const override { return isCortexA53() ||
>       isCortexA57(); }
> This was introduced recently in http://reviews.llvm.org/D5103. I find
> most of the backend including X86,ARM doesn't enable AA. AArch64
> backend also disables AA when CPU is cyclone. As Sanjin also says
> enabling AA for CortexA57 doesn't have performance changes in D5103,
> I think we can just remove CortexA57 from useAA(), so that we can
> get benefit from such transformation and has no regression.
> 
> For CortexA53, I don't test how much benefit from enabling AA in code
> gen. Maybe we can test following situations and find the best
> solution:
>   (1) disable AA, enable Transformation.
>   (2) enable AA, disable Transformation.
>   (3) enable AA and Transformation.
> Anyway, such test need a lot of time. So I think we'd better only
> consider about A57, so I've attached a new patch (diff ID 15234)
> following your suggestion and remove A57 form useAA().

Hi Hao,

This is not quite what I had in mind... when I said that we should disable the transformation, I did not mean that you should disable the entire pass, but rather, that you should only disable the part of SeparateConstOffsetFromGEP that breaks apart GEPs into ptrtoint/inttoptr when useAA is true like CGP does. Does SeparateConstOffsetFromGEP not convey benefits without this part of the transformation? Can you try implementing the same thing by breaking the GEPs into simpler ones like CGP does when useAA is true?

As a general note, while I understand why CGP breaks GEPs into ptrtoint/inttoptr, I'd like to move the generic CodeGen passes away from doing that. If you look at CGP, implementing this in terms of GEPs is pretty straightforward. When testing on x86 using -addr-sink-using-gep (even with useAA false), performance generally improves over using the ptrtoint/inttoptr (although there are a couple of regressions I need to investigate when I have some time). As a quasi-separate issue, I'm curious to know what using -addr-sink-using-gep does on ARM/AArch64. Regardless, I'm excited to see this work because I'd like to make SeparateConstOffsetFromGEP available to more backends -- including PowerPC that uses AA more aggressively -- and so I don't want the pass itself to introduce ptrtoint/inttoptr (at least not when useAA is true).

Thanks again,
Hal

> 
> Review please.
> 
> Thanks,
> -Hao
> 
> http://reviews.llvm.org/D5864
> 
> 
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory



More information about the llvm-commits mailing list