[llvm-dev] Load combine pass

Wed Sep 11 04:47:41 PDT 2019

Hi,

Can I ask what is the status of load widening. It seems there is no load
widening on IR at all.

// Paweł

On Wed, Oct 5, 2016 at 1:49 PM Artur Pilipenko via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Philip and I talked about this is person. Given the fact that load
> widening in presence of atomics is irreversible transformation we agreed
> that we don't want to do this early. For now it can be implemented as a
> peephole optimization over machine IR. MIR is preferred here because X86
> backend does GEP reassociation at MIR level and it can make information
> about addresses being adjacent available.
>
> Inline cost of the original pattern is a valid concern and we might want
> to revisit our decision later. But in order to do widening earlier we need
> to have a way to undo this transformation.
>
> I’m going to try implementing MIR optimization but not in the immediate
> future.
>
> Artur
>
> > On 28 Sep 2016, at 19:32, Artur Pilipenko <apilipenko at azulsystems.com>
> wrote:
> >
> > One of the arguments for doing this earlier is inline cost perception of
> the original pattern. Reading i32/i64 by bytes look much more expensive
> than it is and can prevent inlining of interesting function.
> >
> > Inhibiting other optimizations concern can be addressed by careful
> selection of the pattern we’d like to match. I limit the transformation to
> the case when all the individual have no uses other than forming a wider
> load. In this case it’s less likely to loose information during this
> transformation.
> >
> > I didn’t think of atomicity aspect though.
> >
> > Artur
> >
> >> On 28 Sep 2016, at 18:50, Philip Reames <listmail at philipreames.com>
> wrote:
> >>
> >> There's a bit of additional context worth adding here...
> >>
> >> Up until very recently, we had a form of widening implemented in GVN.
> We decided to remove this in https://reviews.llvm.org/D24096 precisely
> because its placement in the pass pipeline was inhibiting other
> optimizations. There's also a major problem with doing widening at the IR
> level which is that widening a pair of atomic loads into a single wider
> atomic load can not be undone. This creates a major pass ordering problem
> of its own.
> >>
> >> At this point, my general view is that widening transformations of any
> kind should be done very late.  Ideally, this is something the backend
> would do, but doing it as a CGP like fixup pass over the IR is also
> reasonable.
> >>
> >> With that in mind, I feel both the current placement of LoadCombine
> (within the inliner iteration) and the proposed InstCombine rule are
> undesirable.
> >>
> >> Philip
> >>
> >>
> >> On 09/28/2016 08:22 AM, Artur Pilipenko wrote:
> >>> Hi,
> >>>
> >>> I'm trying to optimize a pattern like this into a single i16 load:
> >>> %1 = bitcast i16* %pData to i8*
> >>> %2 = load i8, i8* %1, align 1
> >>> %3 = zext i8 %2 to i16
> >>> %4 = shl nuw i16 %3, 8
> >>> %5 = getelementptr inbounds i8, i8* %1, i16 1
> >>> %6 = load i8, i8* %5, align 1
> >>> %7 = zext i8 %6 to i16
> >>> %8 = shl nuw nsw i16 %7, 0
> >>> %9 = or i16 %8, %4
> >>>
> >>> I came across load combine pass which is motivated by virtualliy the
> same pattern. Load combine optimizes the pattern by combining adjacent
> loads into one load and lets the rest of the optimizer cleanup the rest.
> From what I see on the initial review for load combine (
> https://reviews.llvm.org/D3580) it was not enabled by default because it
> caused some performance regressions. It's not very surprising, I see how
> this type of widening can obscure some facts for the rest of the optimizer.
> >>>
> >>> I can't find any backstory for this pass, why was it chosen to
> optimize the pattern in question in this way? What is the current status of
> this pass?
> >>>
> >>> I have an alternative implementation for it locally. I implemented an
> instcombine rule similar to recognise bswap/bitreverse idiom. It relies on
> collectBitParts (Local.cpp) to determine the origin of the bits in a given
> or value. If all the bits are happen to be loaded from adjacent locations
> it replaces the or with a single load or a load plus bswap.
> >>>
> >>> If the alternative approach sounds reasonable I'll post my patches for
> review.
> >>>
> >>> Artur
> >>
> >
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20190911/3c8f495a/attachment.html>