Chris, pinging again. It's been 2 weeks since my last ping, and longer since Daniel Dunbar handed it off to you. ;]<div class="gmail_extra"><br><br><div class="gmail_quote">On Thu, Sep 20, 2012 at 6:35 PM, Chandler Carruth <span dir="ltr"><<a href="mailto:chandlerc@google.com" target="_blank">chandlerc@google.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Ping.<div><br></div><div>I really want to move forward with this. We are observing real world bugs, races, crashes, and other badness surrounding bitfields used across threads. We cannot even distinguish between user error and compiler error because the compiler is known to generate the wrong thing.</div>
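[For readers of the archive, the failure mode being described stems from the fact that a bitfield store is inherently a read-modify-write of some storage unit: if the compiler picks a unit wider than the field's own bytes, the RMW silently rewrites neighboring memory, which is a data race whenever another thread owns those bytes. A minimal sketch of the *narrow* RMW a correct compiler must emit — illustrative helper only, not Clang's actual output:]

```c
#include <assert.h>
#include <stdint.h>

/* Update bit 0 of buf[0] with a byte-wide read-modify-write.
 * A wider RMW (e.g. loading and re-storing 16 bits) would also
 * rewrite buf[1] -- racing with any concurrent writer of that byte. */
static void store_bit0(uint8_t *buf, unsigned v) {
  uint8_t b = buf[0];                       /* load only the field's byte */
  buf[0] = (uint8_t)((b & ~1u) | (v & 1u)); /* splice in the new bit */
}
```

[The complaint that "the compiler is known to generate the wrong thing" corresponds to it emitting the wide variant of this RMW for some layouts.]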
<div><br></div><div>I'd love to hear any feedback or suggestions for further work here.</div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><br><div class="gmail_quote">On Fri, Sep 14, 2012 at 2:42 PM, Daniel Dunbar <span dir="ltr"><<a href="mailto:daniel@zuster.org" target="_blank">daniel@zuster.org</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div>On Tue, Sep 11, 2012 at 11:02 AM, Chandler Carruth <<a href="mailto:chandlerc@google.com" target="_blank">chandlerc@google.com</a>> wrote:<br>
> On Fri, Aug 31, 2012 at 11:46 AM, Daniel Dunbar <<a href="mailto:daniel@zuster.org" target="_blank">daniel@zuster.org</a>> wrote:<br>
>><br>
>> Hi Chandler,<br>
>><br>
>> I haven't done an extensive review of the patch yet (by which I would<br>
>> mostly mean looking at the codegen, I trust you to verify the<br>
>> correctness), so treat this as initial comments.<br>
><br>
><br>
> Thanks for these. I'm attaching an updated patch with some fixes and<br>
> improvements, see below.<br>
><br>
>><br>
>> My one main initial comment is that I would like to treat the PR and<br>
>> the rewrite as distinct issues. Obviously fixing the PR is a<br>
>> no-brainer, but it would be nice to put in the minimal patch for that<br>
>> in the current code. I wouldn't expect this to be hard or to result in<br>
>> byte-by-byte accesses, but I haven't looked closely at what the change<br>
>> should be. It sounds however like you attempted this and decided it<br>
>> was hard enough to prefer a rewrite...<br>
><br>
><br>
> Yes. Both Richard Smith and I looked at it, and neither of us had any clear<br>
> ideas about how to adapt the existing code to avoid the problem without<br>
> creating worse stores. As I explained in my previous email, the existing<br>
> strategy has two properties that make this hard: 1) it assumes it can read<br>
> before the bitfield in order to start off with an aligned load, and 2) it<br>
> wants to pre-split into strides which requires the frontend to implement<br>
> intelligent striding logic.<br>
><br>
>><br>
>> The one thing I am worried about is that the simplicity of your<br>
>> approach is contingent on us using irregular integer sized types. If<br>
>> we don't do that, I feel like the current version of the code is<br>
>> pretty good (feel free to disagree though), so really the question<br>
>> comes down to do we want to move to irregular sized integer types. At<br>
>> the time the code was written, it was a conscious decision not to do<br>
>> that. I agree with the general sentiment that SROA is already doing<br>
>> this, I am just worried that if we ever needed to break that we would<br>
>> be back to doing something like the current code is doing.<br>
><br>
><br>
> Yes, we may, some day need to go back and change things. But we'll have the<br>
> old code kicking around in history. This at least gets us to correct<br>
> behavior today and simplifies the code today.<br>
><br>
>><br>
>> I did a reasonable amount of looking at the codegen on bitfield test<br>
>> cases when I wrote the old code, but I probably did a poor job of<br>
>> turning these into test cases. I'm assuming you probably already did<br>
>> the same kind of testing on the new code? Do you have any test cases<br>
>> that demonstrate better or worse codegen from the new model?<br>
><br>
><br>
> Test cases I have looked at in depth are included. ;] Mostly, I carefully<br>
> checked the output of the existing bitfield tests. There are actually quite<br>
> a few. I added a test case to cover the correctness issue.<br>
><br>
> I'll work on a few stress test cases based on the examples you and Eli gave,<br>

> as well as some that I've come up with.<br>
><br>
><br>
>><br>
>> I don't understand why this is true (the byte-by-byte part). If your<br>
>> rewrite is generating a wider load, then there is nothing in the<br>
>> structure of the old code that would prevent generating larger<br>
>> accesses as well. This seems independent of the structural change. It<br>
>> seems like implementing this in the current framework is a small<br>
>> amount of code.<br>
><br>
><br>
> Well, both Richard and I looked at it, and after I had tried a few times,<br>
> neither of us had really good ideas about how to make it a small amount of<br>
> code. To be fair, I didn't really *work* to find an elegant way to implement<br>
> it because after a few initial stabs and readings, I came to this<br>
> conclusion:<br>
><br>
>><br>
>> I agree that after we did that it might be very questionable why we<br>
>> were doing it in the frontend.<br>
><br>
><br>
> And once I was there, I wanted to do the right thing and solve this using<br>
> the facilities LLVM provides.<br>
><br>
>><br>
>> >> (1) How well-specified and -tested are the paths for non-power-of-2<br>
>> >> integer types in LLVM? Do we actually promise that, say, storing to an<br>
>> >> i17<br>
>> >> will not touch the fourth byte? Do we provide reasonably portable<br>
>> >> behavior<br>
>> >> across targets, or is this an instance of the x86 backend having<br>
>> >> received a<br>
>> >> lot of attention and everything else being a festering morass of latent<br>
>> >> issues? I know that SROA's behavior has exposed bugs before (IIRC with<br>
>> >> the<br>
>> >> occasional response of making SROA less aggressive), not least of which<br>
>> >> because SROA's behavior doesn't really kick in a lot.<br>
>> ><br>
>> ><br>
>> > Sure, this is a great point. I'm happy to do any testing I can here. One<br>
>> > thing that I think I've already done is ensure that we only produce<br>
>> > i8-multiples, not i17s and such. I'll double check, but I would not<br>
>> > expect<br>
>> > to see any of those. My hope was that this would in the worst case be<br>
>> > trivially legalized to a sequence of i8 operations, and in the best<br>
>> > case,<br>
>> > the optimal pattern.<br>
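[As a rough picture of the worst-case legalization mentioned here — an i24 store decomposed into i8 operations. This sketch assumes little-endian byte order; a real backend might instead emit, say, a 16-bit plus an 8-bit store:]

```c
#include <assert.h>
#include <stdint.h>

/* Worst-case legalization of a 24-bit (i24) store: three byte stores,
 * touching exactly bytes [0,3) of p and nothing beyond them. */
static void store_i24(uint8_t *p, uint32_t v) {
  p[0] = (uint8_t)(v);        /* low byte first: little-endian assumed */
  p[1] = (uint8_t)(v >> 8);
  p[2] = (uint8_t)(v >> 16);
}

/* Matching 24-bit load, reassembled from the same three bytes. */
static uint32_t load_i24(const uint8_t *p) {
  return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16);
}
```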
>><br>
>> This is a concern of mine too. Of course, it's also something we should<br>
>> just probably make sure is nicely defined in the backend given that<br>
>> SROA already uses it.<br>
><br>
><br>
> So I talked to Duncan specifically about this, and this is actually the use<br>
> case he feels was intended originally and should be best supported. I've<br>
> done some LLVM-side testing and it seems to bear that out. Very reasonable<br>
> code was generated for integers in my testing.<br>
<br>
</div></div>Ok, I have more or less been convinced this is a better approach.<br>
<br>
I think you should get explicit sign off from Chris though, because<br>
when I discussed it in person with him he had a very strong feeling<br>
that we did not want the FE to start generating off-size integer<br>
memory accesses (as opposed to off-sized integer values). I don't<br>
think I can clearly convey his argument, though, so I'd like to see<br>
him chime in.<br>
<span><font color="#888888"><br>
- Daniel<br>
</font></span><div><div><br>
><br>
>><br>
>> I agree with the approach. I'm worried there might be cases where the<br>
>> backend does a worse job in codegen *today* on the off-size integer<br>
>> types, but I could be wrong. And of course the right thing to do here<br>
>> is fix up the backend.<br>
>><br>
>> Regardless, thanks for working in this area!<br>
><br>
><br>
> Well, I've fixed a tiny bug, and done significantly more correctness<br>
> testing. The patch attached passes self-host and the full testsuite. The<br>
> test suite shows no significant performance regressions.<br>
><br>
> I've also added something to try to mitigate the performance concerns. I've<br>
> added logic to detect the case where there are guaranteed to be padding bytes<br>
> after the bitfield and to widen the bitfield into the padding when doing so<br>
> produces more natural integer sizes.<br>
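[A hedged sketch of what such a widening heuristic can look like — hypothetical helper, not the code in the attached patch: round the access up to the next power-of-two byte width, but only when the extra bytes are known to be padding we are allowed to touch:]

```c
#include <assert.h>

/* Widen an access of `size` bytes up to the next power-of-two width,
 * but only if the widened access still fits within the bytes we may
 * legally touch (field storage plus guaranteed trailing padding). */
static unsigned widened_access_size(unsigned size, unsigned available) {
  unsigned w = 1;
  while (w < size)
    w *= 2;                             /* next power-of-two >= size */
  return (w <= available) ? w : size;   /* widen only into padding */
}
```

[E.g. a 3-byte bitfield followed by one guaranteed padding byte becomes a natural 4-byte access; with no padding available, it stays at 3 bytes.]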
</div></div></blockquote></div><br></div>
</div></div></blockquote></div><br></div>