[cfe-dev] [llvm-dev] [RFC] Loading Bitfields with Smallest Needed Types
Bill Wendling via cfe-dev
cfe-dev at lists.llvm.org
Fri May 29 16:56:51 PDT 2020
On Fri, May 29, 2020 at 4:50 PM Craig Topper <craig.topper at gmail.com> wrote:
>
> I thought the IR you showed in March had the store in one function and the load in another function. Option 3 would not handle that.
>
Yeah. That's an entirely different issue though as it'll need IPA. I'm
a bit skeptical about the initial report as I suspect that a store at
the end of a function would have retired or at least made it to a
cache by the time the function returns, the stack's readjusted,
registers restored, etc. So I think some of the details in the initial
reports may have been a bit off.
-bw
> On Fri, May 29, 2020 at 4:29 PM Bill Wendling via cfe-dev <cfe-dev at lists.llvm.org> wrote:
>>
>> On Fri, May 29, 2020 at 4:00 PM Richard Smith <richard at metafoo.co.uk> wrote:
>> >
>> > On Fri, 29 May 2020 at 11:06, John McCall via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> >>
>> >> On 28 May 2020, at 18:42, Bill Wendling wrote:
>> >>
>> >> > On Tue, May 26, 2020 at 7:49 PM James Y Knight via llvm-dev
>> >> > <llvm-dev at lists.llvm.org> wrote:
>> >> >>
>> >> >> At least in this test-case, the "bitfield" part of this seems to be a
>> >> >> distraction. As Eli notes, Clang has lowered the function to LLVM IR
>> >> >> containing consistent i16 operations. Despite that being a different
>> >> >> choice from GCC, it should still be correct and consistent.
>> >> >>
>> >> > I suspect that this is more prevalent with bitfields as they're more
>> >> > likely to have the load / bitwise op / store operations done on them,
>> >> > resulting in an access type that can be shortened. But yes, it's not
>> >> > specific to just bitfields.
>> >> >
>> >> > I'm more interested in consistency, to be honest. If the loads and
>> >> > stores for the bitfields (or other such shorten-able objects) were the
>> >> > same, then we wouldn't run into the store-to-load forwarding issue on
>> >> > x86 (I don't know about other platforms, but suspect that consistency
>> >> > wouldn't hurt). I liked Arthur's idea of accessing the object using
>> >> > the type size the bitfield was defined with (i8, i16, i256). It would
>> >> > help with improving the heuristic. The downside is that it could lead
>> >> > to un-optimal code, but that's the situation we have now, so...
>> >>
>> >> Okay, but what concretely are you suggesting here? Clang IRGen is
>> >> emitting accesses with consistent sizes, and LLVM is making them
>> >> inconsistent. Are you just asking Clang to emit smaller accesses
>> >> in the hope that LLVM won’t mess them up?
>> >
>> >
>> > I don't think this has anything to do with bit-fields or Clang's lowering. This seems to "just" be an optimizer issue (one that happens to show up for bit-field access patterns, but also affects other cases). Much-reduced testcase:
>> >
>> > unsigned short n;
>> > void set() { n |= 1; }
>> >
>> > For this testcase, -O2 generates a 1-byte 'or' instruction, which will often be a pessimization when there are also full-width accesses. I don't think the frontend can or should be working around this.
>>
>> We're a bit (heh) tied in what we can do. LLVM's IR doesn't convey
>> that we're working with a bitfield, but as many pointed out it's not
>> bitfield-specific (I didn't mean to imply that it was *only* bitfields
>> affected by this with my initial post, that was just where it showed
>> up). The front-end generates code that's perfectly valid for the
>> various accesses, and changing what it generates doesn't seem like it
>> would be worth the hassle. (E.g. using i1, i3, etc. for the bitfields
>> (note I'm *not* suggesting we do this, I'm just using as an example).)
>>
>> So what we're left with is trying to ensure that we don't generate
>> sub-optimal code via the DAG combiner and other passes. Some ideas off
>> the top of my head:
>>
>> 1. Avoid generating byte-sized bitwise operations, because as Richard
>> pointed out they can lead to pessimizations.
>>
>> 2. Come up with a better heuristic to determine if we're going to
>> generate inconsistent load/store accesses to the same object.
>>
>> 3. Transform the machine instructions before register allocation to
>> ensure that all accesses to the same address are the same size, or at
>> least that the stores are of equal or greater size than the loads.
>>
>> Option (1) may be the simplest and could potentially solve our immediate issue.
>>
>> Option (2) is difficult because the DAG combiner operates on a single
>> BB at a time and doesn't have visibility into past DAGs it modified.
>>
>> Option (3) allows us to retain the current behavior and gives us
>> visibility into the whole function, not just an individual BB.
>>
>> I kind of like option 3, but I'm open to other suggestions.
>>
>> -bw
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
More information about the cfe-dev
mailing list