<div dir="ltr"><div dir="ltr"></div><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, 29 May 2020 at 11:06, John McCall via llvm-dev &lt;<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 28 May 2020, at 18:42, Bill Wendling wrote:<br>
<br>
> On Tue, May 26, 2020 at 7:49 PM James Y Knight via llvm-dev<br>
> &lt;<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>&gt; wrote:<br>
>><br>
>> At least in this test-case, the "bitfield" part of this seems to be a <br>
>> distraction. As Eli notes, Clang has lowered the function to LLVM IR <br>
>> containing consistent i16 operations. Despite that being a different <br>
>> choice from GCC, it should still be correct and consistent.<br>
>><br>
> I suspect that this is more prevalent with bitfields as they're more<br>
> likely to have the load / bitwise op / store operations done on them,<br>
> resulting in an access type that can be shortened. But yes, it's not<br>
> specific to just bitfields.<br>
><br>
> I'm more interested in consistency, to be honest. If the loads and<br>
> stores for the bitfields (or other such shorten-able objects) were the<br>
> same, then we wouldn't run into the store-to-load forwarding issue on<br>
> x86 (I don't know about other platforms, but suspect that consistency<br>
> wouldn't hurt). I liked Arthur's idea of accessing the object using<br>
> the type size the bitfield was defined with (i8, i16, i256). It would<br>
> help with improving the heuristic. The downside is that it could lead<br>
> to suboptimal code, but that's the situation we have now, so...<br>
<br>
Okay, but what concretely are you suggesting here? Clang IRGen is<br>
emitting accesses with consistent sizes, and LLVM is making them<br>
inconsistent. Are you just asking Clang to emit smaller accesses<br>
in the hope that LLVM won’t mess them up?<br></blockquote><div><br></div><div>I don't think this has anything to do with bit-fields or Clang's lowering. This seems to "just" be an optimizer issue (one that happens to show up for bit-field access patterns, but also affects other cases). Much-reduced testcase:</div><div><br></div>unsigned short n;<br><div>void set() { n |= 1; }</div><div><br></div><div>For this testcase, -O2 generates a 1-byte 'or' instruction, <a href="http://quick-bench.com/e61y0Wn1qR-9K1YM6Bf9YoS6qfY">which will often be a pessimization</a> when there are also full-width accesses. I don't think the frontend can or should be working around this.</div></div></div>
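<div><br></div><div>To make the mixed-width pattern concrete, here is a minimal sketch that pairs the reduced testcase with a full-width read of the same object. The names get() and run_mixed_width() are hypothetical additions for illustration and are not part of the original testcase. Whether the 'or' is actually narrowed to one byte depends on the compiler version, target, and flags, and the calls would need to stay out of line (for example in a separate translation unit) for the pattern to survive optimization.</div>

```c
#include <assert.h>

unsigned short n;

/* Read-modify-write of a 16-bit global. Only bit 0 changes, so an
   optimizer may narrow this to a 1-byte `or` on a single byte of n. */
void set(void) { n |= 1; }

/* Hypothetical helper: a full-width (16-bit) read of the same object. */
unsigned short get(void) { return n; }

/* Hypothetical driver: alternates the narrow-able write with the
   full-width read. If the store is 1 byte wide and the following load is
   2 bytes wide, the load cannot be satisfied from that single store. */
unsigned run_mixed_width(int iters) {
    assert(iters >= 0);
    unsigned sum = 0;
    for (int i = 0; i < iters; ++i) {
        set();
        sum += get();
    }
    return sum;
}
```

<div>Comparing the generated assembly for set() and get() (for example with -O2 -S) shows whether the store and the load ended up with different widths on a given toolchain.</div>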