[llvm-dev] [cfe-dev] [RFC] Loading Bitfields with Smallest Needed Types
Craig Topper via llvm-dev
llvm-dev at lists.llvm.org
Fri May 29 16:49:56 PDT 2020
I thought the IR you showed in March had the store in one function and the
load in another function. Option 3 would not handle that.
~Craig
On Fri, May 29, 2020 at 4:29 PM Bill Wendling via cfe-dev <
cfe-dev at lists.llvm.org> wrote:
> On Fri, May 29, 2020 at 4:00 PM Richard Smith <richard at metafoo.co.uk>
> wrote:
> >
> > On Fri, 29 May 2020 at 11:06, John McCall via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
> >>
> >> On 28 May 2020, at 18:42, Bill Wendling wrote:
> >>
> >> > On Tue, May 26, 2020 at 7:49 PM James Y Knight via llvm-dev
> >> > <llvm-dev at lists.llvm.org> wrote:
> >> >>
> >> >> At least in this test-case, the "bitfield" part of this seems to be a
> >> >> distraction. As Eli notes, Clang has lowered the function to LLVM IR
> >> >> containing consistent i16 operations. Despite that being a different
> >> >> choice from GCC, it should still be correct and consistent.
> >> >>
> >> > I suspect that this is more prevalent with bitfields as they're more
> >> > likely to have the load / bitwise op / store operations done on them,
> >> > resulting in an access type that can be shortened. But yes, it's not
> >> > specific to just bitfields.
> >> >
> >> > I'm more interested in consistency, to be honest. If the loads and
> >> > stores for the bitfields (or other such shorten-able objects) were the
> >> > same, then we wouldn't run into the store-to-load forwarding issue on
> >> > x86 (I don't know about other platforms, but suspect that consistency
> >> > wouldn't hurt). I liked Arthur's idea of accessing the object using
> >> > the type size the bitfield was defined with (i8, i16, i256). It would
> >> > help with improving the heuristic. The downside is that it could lead
> >> > to un-optimal code, but that's the situation we have now, so...
> >>
> >> Okay, but what concretely are you suggesting here? Clang IRGen is
> >> emitting accesses with consistent sizes, and LLVM is making them
> >> inconsistent. Are you just asking Clang to emit smaller accesses
> >> in the hope that LLVM won’t mess them up?
> >
> >
> > I don't think this has anything to do with bit-fields or Clang's
> lowering. This seems to "just" be an optimizer issue (one that happens to
> show up for bit-field access patterns, but also affects other cases).
> Much-reduced testcase:
> >
> > unsigned short n;
> > void set() { n |= 1; }
> >
> > For this testcase, -O2 generates a 1-byte 'or' instruction, which will
> often be a pessimization when there are also full-width accesses. I don't
> think the frontend can or should be working around this.
>
> We're a bit (heh) tied in what we can do. LLVM's IR doesn't convey
> that we're working with a bitfield, but as many pointed out it's not
> bitfield-specific (I didn't mean to imply that it was *only* bitfields
> affected by this with my initial post, that was just where it showed
> up). The front-end generates code that's perfectly valid for the
> various accesses, and changing what it generates doesn't seem like it
> would be worth the hassle. (E.g. using i1, i3, etc. for the bitfields
> (note I'm *not* suggesting we do this, I'm just using as an example).)
>
> So what we're left with is trying to ensure that we don't generate
> sub-optimal code via the DAG combiner and other passes. Some ideas off
> the top of my head:
>
> 1. Avoid generating byte-sized bitwise operations, because as Richard
> pointed out they can lead to pessimizations.
>
> 2. Come up with a better heuristic to determine if we're going to
> generate inconsistent load/store accesses to the same object.
>
> 3. Transform the machine instructions before register allocation to
> ensure that all accesses to the same address are the same size, or at
> least that the stores are of equal or greater size than the loads.
>
> Option (1) may be the simplest and could potentially solve our immediate
> issue.
>
> Option (2) is difficult because the DAG combiner operates on a single
> BB at a time and doesn't have visibility into past DAGs it modified.
>
> Option (3) allows us to retain the current behavior and gives us
> visibility into the whole function, not just an individual BB.
>
> I kind of like option 3, but I'm open to other suggestions.
>
> -bw
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200529/67d82e3f/attachment.html>
More information about the llvm-dev
mailing list