[llvm-dev] [cfe-dev] [RFC] Loading Bitfields with Smallest Needed Types

Xinliang David Li via llvm-dev llvm-dev at lists.llvm.org
Wed May 27 09:25:15 PDT 2020


There is a related discussion about load widening (byte read --> 64bit
read) transformation 9 years ago:
http://lists.llvm.org/pipermail/llvm-dev/2011-December/046322.html

David

On Tue, May 26, 2020 at 8:37 PM Craig Topper via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> Similar issue with aligned extload widening. The load and store are both
> i16 in IR.
>
> struct s {
> int x; // force alignment
> short y;
> };
>
> void g();
> short f2(s *a, short b) {
> a->y = b;
> g();
> return a->y - b;
> }
>
> produces this
> movq %rdi, %rbx
> movw %bp, 4(%rdi) // 2 byte store
> callq g()
> movl 4(%rbx), %eax // 4 byte load
> subl %ebp, %eax
>
> ~Craig
>
>
> On Tue, May 26, 2020 at 7:49 PM James Y Knight via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> At least in this test-case, the "bitfield" part of this seems to be a
>> distraction. As Eli notes, Clang has lowered the function to LLVM IR
>> containing consistent i16 operations. Despite that being a different choice
>> from GCC, it should still be correct and consistent.
>>
>> Of course that insight does mean it's quite easy to create a test-case
>> with the exact same problematic store->load mismatch which doesn't use
>> bit-fields at all. For example:
>> short f2(short *bfs) {
>>     *bfs &= ~0x1;
>>     g();
>>     return *bfs;
>> }
>>
>> creates the same bad sequence:
>>         movq    %rdi, %rbx
>>         andb    $-2, (%rdi)
>>         callq   g()
>>         movzwl  (%rbx), %eax
>>
>> On Tue, May 26, 2020 at 9:30 PM John McCall via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> On 26 May 2020, at 20:31, Arthur O'Dwyer wrote:
>>>
>>> On Tue, May 26, 2020 at 7:32 PM John McCall via cfe-dev <
>>> cfe-dev at lists.llvm.org> wrote:
>>>
>>> On 26 May 2020, at 18:28, Bill Wendling via llvm-dev wrote:
>>>
>>> [...] The store is a byte:
>>>
>>> orb $0x1,0x4a(%rbx)
>>>
>>> while the read is a word:
>>>
>>> movzwl 0x4a(%r12),%r15d
>>>
>>> The problem is that between the store and the load the value hasn't
>>> been retired / placed in the cache. One would expect store-to-load
>>> forwarding to kick in, but on x86 that doesn't happen because x86
>>> requires the store to be of equal or greater size than the load. So
>>> instead the load takes the slow path, causing unacceptable slowdowns.
>>>
>>> [...]
>>>
>>> Clang used to generate narrower loads and stores for bit-fields, but a
>>> long time ago it was intentionally changed to generate wider loads
>>> and stores, IIRC by Chandler. There are some cases where I think the
>>> “new” code goes overboard, but in this case I don’t particularly have
>>> an issue with the wider loads and stores. I guess we could make a
>>> best-effort attempt to stick to the storage-unit size when the
>>> bit-fields break evenly on a boundary. But mostly I think the frontend’s
>>> responsibility ends with it generating same-size accesses in both
>>> places, and if inconsistent access sizes trigger poor performance,
>>> the backend should be more careful about intentionally changing access
>>> sizes.
>>>
>>> FWIW, when I was at Green Hills, I recall the rule being "Always use the
>>> declared type of the bitfield to govern the size of the read or write."
>>> (There was a similar rule for the meaning of `volatile`. I hope I'm not
>>> just getting confused between the two. Actually, since of the compilers
>>> on
>>> Godbolt, only MSVC follows this rule <https://godbolt.org/z/Aq_APH>, I'm
>>> *probably* wrong.) That is, if the bitfield is declared `int16_t`, then
>>> use 16-bit loads and stores for it; if it's declared `int32_t`, then use
>>> 32-bit loads and stores.
>>>
>>> I’ve always liked MSVC’s bit-field rules as a coherent whole, but they
>>> are
>>> quite different from the standard Unix rules. On Windows, T x : 3
>>> literally allocates an entire T in the structure, and successive
>>> bit-fields get packed into that T only if their base type is of the
>>> same size (and they haven’t exhausted the original T). So of course
>>> all accesses to that bit-field are basically of the full size of the T;
>>> there’s no overlap to be concerned with. On Unix, bit-fields will
>>> typically
>>> get packed together regardless of the base type; the base type does have
>>> some influence, but it’s target-specific and somewhat odd.
>>>
>>> I’d prefer if we degraded to a Windows-like access behavior as much
>>> as we can, but it’s not always possible because of that packing.
>>>
>>> John.
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200527/bd917a96/attachment-0001.html>


More information about the llvm-dev mailing list