[llvm-dev] [cfe-dev] [RFC] Loading Bitfields with Smallest Needed Types

Fri May 29 17:59:43 PDT 2020

On 29 May 2020, at 19:29, Bill Wendling wrote:
> On Fri, May 29, 2020 at 4:00 PM Richard Smith <richard at metafoo.co.uk> 
> wrote:
>> On Fri, 29 May 2020 at 11:06, John McCall via llvm-dev 
>> <llvm-dev at lists.llvm.org> wrote:
>>> On 28 May 2020, at 18:42, Bill Wendling wrote:
>>>
>>>> On Tue, May 26, 2020 at 7:49 PM James Y Knight via llvm-dev
>>>> <llvm-dev at lists.llvm.org> wrote:
>>>>>
>>>>> At least in this test-case, the "bitfield" part of this seems to 
>>>>> be a
>>>>> distraction. As Eli notes, Clang has lowered the function to LLVM 
>>>>> IR
>>>>> containing consistent i16 operations. Despite that being a 
>>>>> different
>>>>> choice from GCC, it should still be correct and consistent.
>>>>>
>>>> I suspect that this is more prevalent with bitfields as they're 
>>>> more
>>>> likely to have the load / bitwise op / store operations done on 
>>>> them,
>>>> resulting in an access type that can be shortened. But yes, it's 
>>>> not
>>>> specific to just bitfields.
>>>>
>>>> I'm more interested in consistency, to be honest. If the loads and
>>>> stores for the bitfields (or other such shorten-able objects) were 
>>>> the
>>>> same, then we wouldn't run into the store-to-load forwarding issue 
>>>> on
>>>> x86 (I don't know about other platforms, but suspect that 
>>>> consistency
>>>> wouldn't hurt). I liked Arthur's idea of accessing the object using
>>>> the type size the bitfield was defined with (i8, i16, i256). It 
>>>> would
>>>> help with improving the heuristic. The downside is that it could 
>>>> lead
>>>> to un-optimal code, but that's the situation we have now, so...
>>>
>>> Okay, but what concretely are you suggesting here?  Clang IRGen is
>>> emitting accesses with consistent sizes, and LLVM is making them
>>> inconsistent.  Are you just asking Clang to emit smaller accesses
>>> in the hope that LLVM won’t mess them up?
>>
>>
>> I don't think this has anything to do with bit-fields or Clang's 
>> lowering. This seems to "just" be an optimizer issue (one that 
>> happens to show up for bit-field access patterns, but also affects 
>> other cases). Much-reduced testcase:
>>
>> unsigned short n;
>> void set() { n |= 1; }

To be clear, I agree with you that this is not a frontend problem.
I asked because the question seemed to lead towards changing the width
that Clang uses for accesses.

> We're a bit (heh) tied in what we can do. LLVM's IR doesn't convey
> that we're working with a bitfield, but as many pointed out it's not
> bitfield-specific (I didn't mean to imply that it was *only* bitfields
> affected by this with my initial post, that was just where it showed
> up). The front-end generates code that's perfectly valid for the
> various accesses, and changing what it generates doesn't seem like it
> would be worth the hassle. (E.g. using i1, i3, etc. for the bitfields
> (note I'm *not* suggesting we do this, I'm just using as an example).)
>
> So what we're left with is trying to ensure that we don't generate
> sub-optimal code via the DAG combiner and other passes. Some ideas off
> the top of my head:
>
> 1. Avoid generating byte-sized bitwise operations, because as Richard
> pointed out they can lead to pessimizations.
>
> 2. Come up with a better heuristic to determine if we're going to
> generate inconsistent load/store accesses to the same object.
>
> 3. Transform the machine instructions before register allocation to
> ensure that all accesses to the same address are the same size, or at
> least that the stores are of equal or greater size than the loads.
>
> Option (1) may be the simplest and could potentially solve our 
> immediate issue.
>
> Option (2) is difficult because the DAG combiner operates on a single
> BB at a time and doesn't have visibility into past DAGs it modified.
>
> Option (3) allows us to retain the current behavior and gives us
> visibility into the whole function, not just an individual BB.
>
> I kind of like option 3, but I'm open to other suggestions.

What’s the concrete suggestion for (3), that we analyze the uses
of a single address and try to re-widen the access to make it use
a consistent width?  Widening accesses is difficult in general,
unless the information is still present that it was originally
wider before DAG combine.

At any rate, seems like a backend problem. :)

John.