[cfe-dev] [llvm-dev] [RFC] Loading Bitfields with Smallest Needed Types

Fri May 29 20:23:01 PDT 2020

On Fri, May 29, 2020 at 5:59 PM John McCall <rjmccall at apple.com> wrote:
> On 29 May 2020, at 19:29, Bill Wendling wrote:
> > On Fri, May 29, 2020 at 4:00 PM Richard Smith <richard at metafoo.co.uk>
> > wrote:
> >> On Fri, 29 May 2020 at 11:06, John McCall via llvm-dev
> >> <llvm-dev at lists.llvm.org> wrote:
> >>> On 28 May 2020, at 18:42, Bill Wendling wrote:
> >>>
> >>>> On Tue, May 26, 2020 at 7:49 PM James Y Knight via llvm-dev
> >>>> <llvm-dev at lists.llvm.org> wrote:
> >>>>>
> >>>>> At least in this test-case, the "bitfield" part of this seems to
> >>>>> be a
> >>>>> distraction. As Eli notes, Clang has lowered the function to LLVM
> >>>>> IR
> >>>>> containing consistent i16 operations. Despite that being a
> >>>>> different
> >>>>> choice from GCC, it should still be correct and consistent.
> >>>>>
> >>>> I suspect that this is more prevalent with bitfields as they're
> >>>> more
> >>>> likely to have the load / bitwise op / store operations done on
> >>>> them,
> >>>> resulting in an access type that can be shortened. But yes, it's
> >>>> not
> >>>> specific to just bitfields.
> >>>>
> >>>> I'm more interested in consistency, to be honest. If the loads and
> >>>> stores for the bitfields (or other such shorten-able objects) were
> >>>> the
> >>>> same, then we wouldn't run into the store-to-load forwarding issue
> >>>> on
> >>>> x86 (I don't know about other platforms, but suspect that
> >>>> consistency
> >>>> wouldn't hurt). I liked Arthur's idea of accessing the object using
> >>>> the type size the bitfield was defined with (i8, i16, i256). It
> >>>> would
> >>>> help with improving the heuristic. The downside is that it could
> >>>> lead
> >>>> to un-optimal code, but that's the situation we have now, so...
> >>>
> >>> Okay, but what concretely are you suggesting here?  Clang IRGen is
> >>> emitting accesses with consistent sizes, and LLVM is making them
> >>> inconsistent.  Are you just asking Clang to emit smaller accesses
> >>> in the hope that LLVM won’t mess them up?
> >>
> >>
> >> I don't think this has anything to do with bit-fields or Clang's
> >> lowering. This seems to "just" be an optimizer issue (one that
> >> happens to show up for bit-field access patterns, but also affects
> >> other cases). Much-reduced testcase:
> >>
> >> unsigned short n;
> >> void set() { n |= 1; }
>
> To be clear, I agree with you that this is not a frontend problem.
> I asked because the question seemed to lead towards changing the width
> that Clang uses for accesses.
>
> > We're a bit (heh) tied in what we can do. LLVM's IR doesn't convey
> > that we're working with a bitfield, but as many pointed out it's not
> > bitfield-specific (I didn't mean to imply that it was *only* bitfields
> > affected by this with my initial post, that was just where it showed
> > up). The front-end generates code that's perfectly valid for the
> > various accesses, and changing what it generates doesn't seem like it
> > would be worth the hassle. (E.g. using i1, i3, etc. for the bitfields
> > (note I'm *not* suggesting we do this, I'm just using as an example).)
> >
> > So what we're left with is trying to ensure that we don't generate
> > sub-optimal code via the DAG combiner and other passes. Some ideas off
> > the top of my head:
> >
> > 1. Avoid generating byte-sized bitwise operations, because as Richard
> > pointed out they can lead to pessimizations.
> >
> > 2. Come up with a better heuristic to determine if we're going to
> > generate inconsistent load/store accesses to the same object.
> >
> > 3. Transform the machine instructions before register allocation to
> > ensure that all accesses to the same address are the same size, or at
> > least that the stores are of equal or greater size than the loads.
> >
> > Option (1) may be the simplest and could potentially solve our
> > immediate issue.
> >
> > Option (2) is difficult because the DAG combiner operates on a single
> > BB at a time and doesn't have visibility into past DAGs it modified.
> >
> > Option (3) allows us to retain the current behavior and gives us
> > visibility into the whole function, not just an individual BB.
> >
> > I kind of like option 3, but I'm open to other suggestions.
>
> What’s the concrete suggestion for (3), that we analyze the uses
> of a single address and try to re-widen the access to make it use
> a consistent width?  Widening accesses is difficult in general,
> unless the information is still present that it was originally
> wider before DAG combine.
>
Alternatively, we could move the shortening logic out of DAG combine
and into a separate pass. I'll come up with a more concrete proposal.
In the meantime, some flags were added which may get us past the
initial issue. (Crossing fingers that it doesn't pessimise other
code.)

> At any rate, seems like a backend problem. :)

Yes. :)

-bw