<div class="gmail_extra"><div class="gmail_quote">On Tue, Aug 28, 2012 at 5:02 PM, John McCall <span dir="ltr"><<a href="mailto:rjmccall@apple.com" target="_blank" class="cremed">rjmccall@apple.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div><div class="im"><div>On Aug 28, 2012, at 4:34 PM, Chandler Carruth wrote:</div><blockquote type="cite">

<div class="gmail_extra"><div class="gmail_quote">On Tue, Aug 28, 2012 at 3:52 PM, John McCall <span dir="ltr"><<a href="mailto:rjmccall@apple.com" target="_blank" class="cremed">rjmccall@apple.com</a>></span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div>On Aug 28, 2012, at 3:18 PM, Chandler Carruth wrote:<br>

> I've added an explanatory comment. Hopefully won't get stale.<br>

><br>

> Any other comments? I'd love to get this into the tree as we're seeing actual races in the wild due to it....<br>

<br>

</div>If you would like a fast review, perhaps a more minimal change would<br>

have been in order. We're not going to rush into a total rewrite of bitfield<br>

IR-generation, especially not one that introduces a host of new assumptions<br>

and dependencies on a very general but poorly-specified IR feature that<br>

has not previously been used in code with binary compatibility<br>

requirements.<br></blockquote><div><br></div><div>So, first off, sorry for being in a hurry. I'm not trying to get a rushed review, just hoping to get review sooner rather than later. =D Anyways, I would *really* like thorough reviews of this. I'm very aware and sensitive to the importance of this code. In fact, that's why I'm working on it.</div>


<div><br></div><div>Also, I too would have preferred a more minimal change. None seemed readily available when both I and Richard Smith were looking at this in detail.</div><div><br></div><div>The primary idea we had was to bound the strided accesses to the size of the bitfield storage rather than just to the containing struct.</div>

</div></div></blockquote><div><br></div></div><div>Why not simply bound it to the offset of the next non-bitfield field (or, if there is none, the data size of the type)?</div></div></div></blockquote><div><br></div><div>

You also have to bound the start, not just the stop. That means reasonably often either issuing an unaligned load or peeling off the unaligned chunks. In my testing, this happened reasonably often due to the ABI layout not requiring alignment in many cases, even when the type would typically provide some.</div>

<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div><div>  Is that challenging for some reason I'm not understanding?</div>

</div></div></blockquote><div><br></div><div>No, it's not challenging. As I explained in my previous email, doing this isn't hard, but in my testing it would very often result in byte-by-byte loads and stores. There seemed to be very good reason to not want that for performance reasons (allowing more aggressive combining across the bitfield, etc). What would be quite a bit more code (not particularly challenging, just a lot of code) would be peeling off the unaligned small chunks on the front and the back, and using wide loads and stores for sections entirely within the bitfield. It seemed like the *exact* code we would already need in LLVM to lower these integers effectively, and so it seemed preferably for the frontend to emit a simple single integer load and store, and teach the backend where necessary to handle it.</div>

<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div><div class="im"><div><br></div><blockquote type="cite"><div class="gmail_extra">

<div class="gmail_quote"><div>I also happened to be hacking on SROA recently, and that pass was in fact forming similar IR structures you're worried about: arbitrarily sized integers with bit insertion and extraction as a "vector" of "bits". I'm not claiming that it produced the exact same structures or that there aren't bugs here. I would be more than happy to provide extra testing and try to flush out any crufty bits of LLVM here. I'm just saying I don't think this pattern is *completely* unused...</div>


<div><br></div><div>Anyways, does even the vague approach seem good? Should I work on a different strategy? Are there particular tests that I might be able to run? Any of this would be helpful feedback for me here.</div>

</div>

</div>

</blockquote></div></div><br><div>I asked Daniel to make a detailed review, since he wrote the current implementation and did a lot of investigation for it.</div><div><br></div><div>The things that worry me most are:</div>

<div><br></div><div>(1) How well-specified and -tested are the paths for non-power-of-2 integer types in LLVM?  Do we actually promise that, say, storing to an i17 will not touch the fourth byte?  Do we provide reasonably portable behavior across targets, or is this an instance of the x86 backend having received a lot of attention and everything else being a festering morass of latent issues?  I know that SROA's behavior has exposed bugs before (IIRC with the occasional response of making SROA less aggressive), not least of which because SROA's behavior doesn't really kick in a lot.</div>

</div></blockquote><div><br></div><div>Sure, this is a great point. I'm happy to do any testing I can here. One thing that I think I've already done is ensure that we only produce i8-multiples, not i17s and such. I'll double check, but I would not expect to see any of those. My hope was that this would in the worst cased be trivially legalized to a sequence of i8 operations, and in the best case, the optimal pattern.</div>

<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div>(2) How limiting is this going to be for bitfields on "weird" platforms?</div>

</div></blockquote><div><br></div><div>I would not expect this to be a problem, but perhaps I don't know the platforms where it will crop up. The intent is to treat this as a literal bag-of-bits. Any platform wher ethat works should be well supported.</div>

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div>  Do we get anything useful from this change on big-endian targets?</div>

</div></blockquote><div><br></div><div>I think so. The code is much less confusing to me now. That is, if I understood the original code correctly. I'm not at all sure we have enough test coverage of big-endian IR generation to catch bugs, and I know that I'm not an expert on their ABI requirements. Maybe someone can tell me what to test for or look for in the output?</div>

<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div>Will we find ourselves completely reimplementing it to support MS-style packed bitfields?</div>

</div></blockquote><div><br></div><div>I don't know anything about MS-style packed bitfields, but I'll go digging....</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div style="word-wrap:break-word"><div>When I surveyed a few other people for their first impressions (including Chris and Daniel), there was also a lot of concern that the backend will often not be able to widen these loads and stores because the high-level information simply isn't there.</div>

</div></blockquote><div><br></div><div>Correct. The point is to *not* widen them. ;]</div><div><br></div><div>From a memory model perspective, the frontend should emit the widest possible loads and stores. That tells the backend it has exactly that much latitude. This change actually helps this in several cases by emitting *wider* loads and stores and thus clarifying that there cannot be a race across that memory.</div>

<div><br></div><div>Hopefully, the optimizer is always going to widen the *load* side of this -- certainly there are some such optimizations already, but maybe we need to add more.</div><div><br></div><div>There is performance work to be done here though as I mentioned in the original email. The frontend should further widen stores to include any padding when that padding is safe to read and write. This should, for example, allow the common pattern of using an i32 store for a bitfield with between 17 and 24 bits rather than an i24 store. As I mentioned in my original email, I'm very happy to add this optimization in the frontend when it is demonstrably safe, I didn't include it in the original patch because... well as you point out, it's huge! =]</div>

<div><br></div><div>However, it is a fundamental aspect of the C++ memory model (or LLVM's which we wrote to support that of C++ and Java among others) that store widening isn't generally feasible for the reasons you mention. (The cases where it is, we can usually form an SSA value, obviating the entire thing.)</div>

</div></div>