<div dir="ltr"><div dir="ltr">On Tue, May 26, 2020 at 7:32 PM John McCall via cfe-dev &lt;<a href="mailto:cfe-dev@lists.llvm.org">cfe-dev@lists.llvm.org</a>&gt; wrote:</div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

On 26 May 2020, at 18:28, Bill Wendling via llvm-dev wrote:<br>> [...] The store is a byte:<br>

><br>

>     orb    $0x1,0x4a(%rbx)<br>

><br>

> while the read is a word:<br>

><br>

>     movzwl 0x4a(%r12),%r15d<br>

><br>

> The problem is that between the store and the load the value hasn't<br>

> been retired / placed in the cache. One would expect store-to-load<br>

> forwarding to kick in, but on x86 that doesn't happen because x86<br>

> requires the store to be of equal or greater size than the load. So<br>

> instead the load takes the slow path, causing unacceptable slowdowns. [...]<br><br>

Clang used to generate narrower loads and stores for bit-fields, but a<br>

long time ago it was intentionally changed to generate wider loads<br>

and stores, IIRC by Chandler.  There are some cases where I think the<br>

“new” code goes overboard, but in this case I don’t particularly have<br>

an issue with the wider loads and stores.  I guess we could make a<br>

best-effort attempt to stick to the storage-unit size when the<br>

bit-fields break evenly on a boundary.  But mostly I think the frontend’s<br>

responsibility ends with it generating same-size accesses in both<br>

places, and if inconsistent access sizes trigger poor performance,<br>

the backend should be more careful about intentionally changing access<br>

sizes.<br></blockquote><div><br></div><div>FWIW, when I was at Green Hills, I recall the rule being "Always use the declared type of the bitfield to govern the size of the read or write."  (There was a similar rule for the meaning of `volatile`. I hope I'm not just getting confused between the two. Actually, since <a href="https://godbolt.org/z/Aq_APH">of the compilers on Godbolt, only MSVC follows this rule</a>, I'm <i>probably</i> wrong.)  That is, if the bitfield is declared `int16_t`, then use 16-bit loads and stores for it; if it's declared `int32_t`, then use 32-bit loads and stores. This gives the programmer a reason to prefer one declared type over another. For example, in</div><div><br></div><div>template&lt;class T&gt;</div><div>struct A {</div><div>    T w : 5;</div><div>    T x : 3;</div><div>    T y : 4;</div><div>    T z : 4;</div><div>};</div><div><br></div><div>the only differences between A&lt;char&gt; and A&lt;short&gt; are</div><div>- whether the struct's alignment is 1 or 2, and</div><div>- whether you use 8-bit or 16-bit accesses to modify its fields.</div><div><br></div><div>"The backend should be more careful about intentionally changing access sizes" sounds like absolutely the correct diagnosis, to me. </div><div><br></div><div>my $.02,</div><div>Arthur</div></div></div>