[llvm-dev] [cfe-dev] [RFC] Loading Bitfields with Smallest Needed Types
John McCall via llvm-dev
llvm-dev at lists.llvm.org
Tue May 26 18:29:42 PDT 2020
On 26 May 2020, at 20:31, Arthur O'Dwyer wrote:
> On Tue, May 26, 2020 at 7:32 PM John McCall via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
>> On 26 May 2020, at 18:28, Bill Wendling via llvm-dev wrote:
>>> [...] The store is a byte:
>>>
>>> orb $0x1,0x4a(%rbx)
>>>
>>> while the read is a word:
>>>
>>> movzwl 0x4a(%r12),%r15d
>>>
>>> The problem is that between the store and the load the value hasn't
>>> been retired / placed in the cache. One would expect store-to-load
>>> forwarding to kick in, but on x86 that doesn't happen because x86
>>> requires the store to be of equal or greater size than the load. So
>>> instead the load takes the slow path, causing unacceptable
>>> slowdowns.
>> [...]
>>
>> Clang used to generate narrower loads and stores for bit-fields, but a
>> long time ago it was intentionally changed to generate wider loads
>> and stores, IIRC by Chandler. There are some cases where I think the
>> “new” code goes overboard, but in this case I don’t particularly have
>> an issue with the wider loads and stores. I guess we could make a
>> best-effort attempt to stick to the storage-unit size when the
>> bit-fields break evenly on a boundary. But mostly I think the frontend’s
>> responsibility ends with it generating same-size accesses in both
>> places, and if inconsistent access sizes trigger poor performance,
>> the backend should be more careful about intentionally changing access
>> sizes.
>>
>
> FWIW, when I was at Green Hills, I recall the rule being "Always use the
> declared type of the bitfield to govern the size of the read or write."
> (There was a similar rule for the meaning of `volatile`. I hope I'm not
> just getting confused between the two. Actually, since of the compilers on
> Godbolt, only MSVC follows this rule <https://godbolt.org/z/Aq_APH>, I'm
> *probably* wrong.) That is, if the bitfield is declared `int16_t`, then
> use 16-bit loads and stores for it; if it's declared `int32_t`, then use
> 32-bit loads and stores.
I’ve always liked MSVC’s bit-field rules as a coherent whole, but they are
quite different from the standard Unix rules. On Windows, `T x : 3`
literally allocates an entire `T` in the structure, and successive
bit-fields get packed into that `T` only if their base type is of the
same size (and they haven’t exhausted the original `T`). So of course
all accesses to that bit-field are basically of the full size of the `T`;
there’s no overlap to be concerned with. On Unix, bit-fields will typically
get packed together regardless of the base type; the base type does have
some influence, but it’s target-specific and somewhat odd.
I’d prefer if we degraded to a Windows-like access behavior as much
as we can, but it’s not always possible because of that packing.
John.