[cfe-dev] [llvm-dev] [RFC] Loading Bitfields with Smallest Needed Types

Bill Wendling via cfe-dev cfe-dev at lists.llvm.org
Fri May 29 22:50:39 PDT 2020


On Fri, May 29, 2020 at 10:09 PM Carrot Wei <carrot at google.com> wrote:
>
> On Fri, May 29, 2020 at 5:59 PM John McCall via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> >
> > On 29 May 2020, at 19:29, Bill Wendling wrote:
> > > On Fri, May 29, 2020 at 4:00 PM Richard Smith <richard at metafoo.co.uk>
> > > wrote:
> > >> On Fri, 29 May 2020 at 11:06, John McCall via llvm-dev
> > >> <llvm-dev at lists.llvm.org> wrote:
> > >>> On 28 May 2020, at 18:42, Bill Wendling wrote:
> > >>>
> > >>>> On Tue, May 26, 2020 at 7:49 PM James Y Knight via llvm-dev
> > >>>> <llvm-dev at lists.llvm.org> wrote:
> > >>>>>
> > >>>>> At least in this test-case, the "bitfield" part of this seems to
> > >>>>> be a
> > >>>>> distraction. As Eli notes, Clang has lowered the function to LLVM
> > >>>>> IR
> > >>>>> containing consistent i16 operations. Despite that being a
> > >>>>> different
> > >>>>> choice from GCC, it should still be correct and consistent.
> > >>>>>
> > >>>> I suspect that this is more prevalent with bitfields as they're
> > >>>> more
> > >>>> likely to have the load / bitwise op / store operations done on
> > >>>> them,
> > >>>> resulting in an access type that can be shortened. But yes, it's
> > >>>> not
> > >>>> specific to just bitfields.
> > >>>>
> > >>>> I'm more interested in consistency, to be honest. If the loads and
> > >>>> stores for the bitfields (or other such shorten-able objects) were
> > >>>> the
> > >>>> same, then we wouldn't run into the store-to-load forwarding issue
> > >>>> on
> > >>>> x86 (I don't know about other platforms, but suspect that
> > >>>> consistency
> > >>>> wouldn't hurt). I liked Arthur's idea of accessing the object using
> > >>>> the type size the bitfield was defined with (i8, i16, i256). It
> > >>>> would
> > >>>> help with improving the heuristic. The downside is that it could
> > >>>> lead
> > >>>> to un-optimal code, but that's the situation we have now, so...
> > >>>
> > >>> Okay, but what concretely are you suggesting here?  Clang IRGen is
> > >>> emitting accesses with consistent sizes, and LLVM is making them
> > >>> inconsistent.  Are you just asking Clang to emit smaller accesses
> > >>> in the hope that LLVM won’t mess them up?
> > >>
> > >>
> > >> I don't think this has anything to do with bit-fields or Clang's
> > >> lowering. This seems to "just" be an optimizer issue (one that
> > >> happens to show up for bit-field access patterns, but also affects
> > >> other cases). Much-reduced testcase:
> > >>
> > >> unsigned short n;
> > >> void set() { n |= 1; }
> >
> > To be clear, I agree with you that this is not a frontend problem.
> > I asked because the question seemed to lead towards changing the width
> > that Clang uses for accesses.
> >
> > > We're a bit (heh) tied in what we can do. LLVM's IR doesn't convey
> > > that we're working with a bitfield, but as many pointed out it's not
> > > bitfield-specific (I didn't mean to imply that it was *only* bitfields
> > > affected by this with my initial post, that was just where it showed
> > > up). The front-end generates code that's perfectly valid for the
> > > various accesses, and changing what it generates doesn't seem like it
> > > would be worth the hassle. (E.g. using i1, i3, etc. for the bitfields
> > > (note I'm *not* suggesting we do this, I'm just using as an example).)
> > >
> > > So what we're left with is trying to ensure that we don't generate
> > > sub-optimal code via the DAG combiner and other passes. Some ideas off
> > > the top of my head:
> > >
> > > 1. Avoid generating byte-sized bitwise operations, because as Richard
> > > pointed out they can lead to pessimizations.
> > >
> > > 2. Come up with a better heuristic to determine if we're going to
> > > generate inconsistent load/store accesses to the same object.
> > >
> > > 3. Transform the machine instructions before register allocation to
> > > ensure that all accesses to the same address are the same size, or at
> > > least that the stores are of equal or greater size than the loads.
> > >
> > > Option (1) may be the simplest and could potentially solve our
> > > immediate issue.
> > >
> > > Option (2) is difficult because the DAG combiner operates on a single
> > > BB at a time and doesn't have visibility into past DAGs it modified.
> > >
> > > Option (3) allows us to retain the current behavior and gives us
> > > visibility into the whole function, not just an individual BB.
> > >
> > > I kind of like option 3, but I'm open to other suggestions.
> >
> > What’s the concrete suggestion for (3), that we analyze the uses
> > of a single address and try to re-widen the access to make it use
> > a consistent width?  Widening accesses is difficult in general,
> > unless the information is still present that it was originally
> > wider before DAG combine.
> >
> > At any rate, seems like a backend problem. :)
> >
>  In order to modify load / store instructions, all related alu instructions must
> also be analyzed and modified. It is not convenient in machine instruction
> level. LLVM IR is more appropriate for this work. But DAG combine and
> selection can narrow and widen those load / store again. Actually most of
> current load / store narrow / widen problems come from DAG combine and
> selection. Maybe we should move these load / store narrow / widen
> optimization from DAG to an LLVM pass.

Even better. It's certainly easier to work with LLVM IR. :-)

-bw


More information about the cfe-dev mailing list