[cfe-dev] [llvm-dev] [RFC] Loading Bitfields with Smallest Needed Types

Arthur O'Dwyer via cfe-dev cfe-dev at lists.llvm.org
Tue May 26 17:31:13 PDT 2020


On Tue, May 26, 2020 at 7:32 PM John McCall via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> On 26 May 2020, at 18:28, Bill Wendling via llvm-dev wrote:
> > [...] The store is a byte:
> >
> >     orb    $0x1,0x4a(%rbx)
> >
> > while the read is a word:
> >
> >     movzwl 0x4a(%r12),%r15d
> >
> > The problem is that between the store and the load the value hasn't
> > been retired / placed in the cache. One would expect store-to-load
> > forwarding to kick in, but on x86 that doesn't happen because x86
> > requires the store to be of equal or greater size than the load. So
> > instead the load takes the slow path, causing unacceptable slowdowns.
> [...]
>
> Clang used to generate narrower loads and stores for bit-fields, but a
> long time ago it was intentionally changed to generate wider loads
> and stores, IIRC by Chandler.  There are some cases where I think the
> “new” code goes overboard, but in this case I don’t particularly have
> an issue with the wider loads and stores.  I guess we could make a
> best-effort attempt to stick to the storage-unit size when the
> bit-fields break evenly on a boundary.  But mostly I think the frontend’s
> responsibility ends with it generating same-size accesses in both
> places, and if inconsistent access sizes trigger poor performance,
> the backend should be more careful about intentionally changing access
> sizes.
>

FWIW, when I was at Green Hills, I recall the rule being "Always use the
declared type of the bitfield to govern the size of the read or write."
(There was a similar rule for the meaning of `volatile`. I hope I'm not
just confusing the two. Actually, since MSVC is the only compiler on
Godbolt that follows this rule <https://godbolt.org/z/Aq_APH>, I'm
*probably* wrong.)  That is, if the bitfield is declared `int16_t`, then
use 16-bit loads and stores for it; if it's declared `int32_t`, then use
32-bit loads and stores. This gives the programmer a reason to prefer one
declared type over another. For example, in

template<class T>
struct A {
    T w : 5;
    T x : 3;
    T y : 4;
    T z : 4;
};

the only differences between A<char> and A<short> are
- whether the struct's alignment is 1 or 2, and
- whether you use 8-bit or 16-bit accesses to modify its fields.
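
A minimal sketch to check the layout half of that claim. It assumes a
typical x86-64 ABI (Itanium or MSVC), where both instantiations occupy two
bytes; other ABIs may lay bit-fields out differently. The unsigned types
are swapped in for char/short only to avoid implementation-defined
bit-field signedness, and `set_and_get_x` is a hypothetical helper, not
something from the thread. Which access width the compiler actually emits
is not visible from C++ and has to be checked in the generated assembly.

```cpp
#include <cstddef>

template <class T>
struct A {
    T w : 5;
    T x : 3;
    T y : 4;
    T z : 4;
};

// Layout facts for the two instantiations. The values are ABI-dependent
// (assumption: x86-64 with GCC, Clang, or MSVC), so they are exposed as
// constants rather than hard-coded as static_asserts.
constexpr std::size_t align_char  = alignof(A<unsigned char>);
constexpr std::size_t align_short = alignof(A<unsigned short>);
constexpr std::size_t size_char   = sizeof(A<unsigned char>);
constexpr std::size_t size_short  = sizeof(A<unsigned short>);

// Hypothetical helper: store into the 3-bit field x and read it back.
// For an unsigned T the stored value is v modulo 8.
template <class T>
unsigned set_and_get_x(unsigned v) {
    A<T> a{};                  // value-initialize all fields to zero
    a.x = static_cast<T>(v);   // truncated to 3 bits by the bit-field
    return a.x;
}
```

Under the declared-type rule, `a.x = ...` would compile to an 8-bit
read-modify-write for `A<unsigned char>` and a 16-bit one for
`A<unsigned short>`; the observable value of `x` is the same either way.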

"The backend should be more careful about intentionally changing access
sizes" sounds like absolutely the correct diagnosis, to me.

my $.02,
Arthur