<div dir="ltr"><div>I thought the IR you showed in March had the store in one function and the load in another function. Option 3 would not handle that.<br></div><div><br></div><div>~Craig<br></div><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, May 29, 2020 at 4:29 PM Bill Wendling via cfe-dev <<a href="mailto:cfe-dev@lists.llvm.org">cfe-dev@lists.llvm.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Fri, May 29, 2020 at 4:00 PM Richard Smith <<a href="mailto:richard@metafoo.co.uk" target="_blank">richard@metafoo.co.uk</a>> wrote:<br>

><br>

> On Fri, 29 May 2020 at 11:06, John McCall via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>> wrote:<br>

>><br>

>> On 28 May 2020, at 18:42, Bill Wendling wrote:<br>

>><br>

>> > On Tue, May 26, 2020 at 7:49 PM James Y Knight via llvm-dev<br>

>> > <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>> wrote:<br>

>> >><br>

>> >> At least in this test-case, the "bitfield" part of this seems to be a<br>

>> >> distraction. As Eli notes, Clang has lowered the function to LLVM IR<br>

>> >> containing consistent i16 operations. Despite that being a different<br>

>> >> choice from GCC, it should still be correct and consistent.<br>

>> >><br>

>> > I suspect that this is more prevalent with bitfields as they're more<br>

>> > likely to have the load / bitwise op / store operations done on them,<br>

>> > resulting in an access type that can be shortened. But yes, it's not<br>

>> > specific to just bitfields.<br>

>> ><br>

>> > I'm more interested in consistency, to be honest. If the loads and<br>

>> > stores for the bitfields (or other such shorten-able objects) were the<br>

>> > same, then we wouldn't run into the store-to-load forwarding issue on<br>

>> > x86 (I don't know about other platforms, but suspect that consistency<br>

>> > wouldn't hurt). I liked Arthur's idea of accessing the object using<br>

>> > the type size the bitfield was defined with (i8, i16, i256). It would<br>

>> > help with improving the heuristic. The downside is that it could lead<br>

>> > to un-optimal code, but that's the situation we have now, so...<br>

>><br>

>> Okay, but what concretely are you suggesting here?  Clang IRGen is<br>

>> emitting accesses with consistent sizes, and LLVM is making them<br>

>> inconsistent.  Are you just asking Clang to emit smaller accesses<br>

>> in the hope that LLVM won’t mess them up?<br>

><br>

><br>

> I don't think this has anything to do with bit-fields or Clang's lowering. This seems to "just" be an optimizer issue (one that happens to show up for bit-field access patterns, but also affects other cases). Much-reduced testcase:<br>

><br>

> unsigned short n;<br>

> void set() { n |= 1; }<br>

><br>

> For this testcase, -O2 generates a 1-byte 'or' instruction, which will often be a pessimization when there are also full-width accesses. I don't think the frontend can or should be working around this.<br>

<br>

We're a bit (heh) tied in what we can do. LLVM's IR doesn't convey<br>

that we're working with a bitfield, but as many pointed out it's not<br>

bitfield-specific (I didn't mean to imply that it was *only* bitfields<br>

affected by this with my initial post, that was just where it showed<br>

up). The front-end generates code that's perfectly valid for the<br>

various accesses, and changing what it generates doesn't seem like it<br>

would be worth the hassle. (E.g. using i1, i3, etc. for the bitfields<br>

(note I'm *not* suggesting we do this, I'm just using as an example).)<br>

<br>

So what we're left with is trying to ensure that we don't generate<br>

sub-optimal code via the DAG combiner and other passes. Some ideas off<br>

the top of my head:<br>

<br>

1. Avoid generating byte-sized bitwise operations, because as Richard<br>

pointed out they can lead to pessimizations.<br>

<br>

2. Come up with a better heuristic to determine if we're going to<br>

generate inconsistent load/store accesses to the same object.<br>

<br>

3. Transform the machine instructions before register allocation to<br>

ensure that all accesses to the same address are the same size, or at<br>

least that the stores are of equal or greater size than the loads.<br>

<br>

Option (1) may be the simplest and could potentially solve our immediate issue.<br>

<br>

Option (2) is difficult because the DAG combiner operates on a single<br>

BB at a time and doesn't have visibility into past DAGs it modified.<br>

<br>

Option (3) allows us to retain the current behavior and gives us<br>

visibility into the whole function, not just an individual BB.<br>

<br>

I kind of like option 3, but I'm open to other suggestions.<br>

<br>

-bw<br>

_______________________________________________<br>

cfe-dev mailing list<br>

<a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a><br>

<a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev" rel="noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev</a><br>

</blockquote></div>