[cfe-commits] PATCH: Large re-working of bitfield IR-gen, and a fix for PR13691

Daniel Dunbar daniel at zuster.org
Fri Aug 31 11:46:29 PDT 2012


Hi Chandler,

I haven't done an extensive review of the patch yet (by which I would
mostly mean looking at the codegen, I trust you to verify the
correctness), so treat this as initial comments.

My main initial comment is that I would like to treat the PR and
the rewrite as distinct issues. Obviously fixing the PR is a
no-brainer, but it would be nice to land the minimal patch for that
against the current code. I wouldn't expect this to be hard or to
result in byte-by-byte accesses, but I haven't looked closely at what
the change should be. It sounds, however, like you attempted this and
decided it was hard enough that a rewrite was preferable...

Brain dead question: do we need to make the changes required for the
PR for C code, or only for C++? If the latter, we are giving up some
optimization latitude by doing it universally.

I was relatively happy with the code as currently structured, in that
I thought it did a reasonable job of separating out the bitfield
access policy. However, it sounds like you found it confusing, so that
may be a matter of preference, and I'm not sure anyone else (maybe
Eli) is going to have a strong opinion. In general I am a fan of less
code, and your approach seems to win on that front.

The one thing I am worried about is that the simplicity of your
approach is contingent on us using irregularly sized integer types. If
we don't do that, I feel like the current version of the code is
pretty good (feel free to disagree though), so really the question
comes down to whether we want to move to irregularly sized integer
types. At the time the code was written, it was a conscious decision
not to do that. I agree with the general sentiment that SROA is
already doing this; I am just worried that if we ever needed to move
away from that, we would be back to doing something like what the
current code is doing.

I did a reasonable amount of looking at the codegen for bitfield test
cases when I wrote the old code, but I probably did a poor job of
turning those observations into checked-in tests. I'm assuming you
already did the same kind of testing on the new code? Do you have any
test cases that demonstrate better or worse codegen from the new
model?

A few more comments inline...

On Tue, Aug 28, 2012 at 5:33 PM, Chandler Carruth <chandlerc at google.com> wrote:
> On Tue, Aug 28, 2012 at 5:02 PM, John McCall <rjmccall at apple.com> wrote:
>>
>> On Aug 28, 2012, at 4:34 PM, Chandler Carruth wrote:
>>
>> On Tue, Aug 28, 2012 at 3:52 PM, John McCall <rjmccall at apple.com> wrote:
>>>
>>> On Aug 28, 2012, at 3:18 PM, Chandler Carruth wrote:
>>> > I've added an explanatory comment. Hopefully won't get stale.
>>> >
>>> > Any other comments? I'd love to get this into the tree as we're seeing
>>> > actual races in the wild due to it....
>>>
>>> If you would like a fast review, perhaps a more minimal change would
>>> have been in order. We're not going to rush into a total rewrite of
>>> bitfield IR-generation, especially not one that introduces a host of
>>> new assumptions and dependencies on a very general but
>>> poorly-specified IR feature that has not previously been used in code
>>> with binary compatibility requirements.
>>
>>
>> So, first off, sorry for being in a hurry. I'm not trying to get a rushed
>> review, just hoping to get review sooner rather than later. =D Anyways, I
>> would *really* like thorough reviews of this. I'm very aware and sensitive
>> to the importance of this code. In fact, that's why I'm working on it.
>>
>> Also, I too would have preferred a more minimal change. None seemed
>> readily available when both I and Richard Smith were looking at this in
>> detail.
>>
>> The primary idea we had was to bound the strided accesses to the size of
>> the bitfield storage rather than just to the containing struct.
>>
>>
>> Why not simply bound it to the offset of the next non-bitfield field (or,
>> if there is none, the data size of the type)?
>
>
> You also have to bound the start, not just the stop. That means reasonably
> often either issuing an unaligned load or peeling off the unaligned chunks.
> In my testing, this happened reasonably often due to the ABI layout not
> requiring alignment in many cases, even when the type would typically
> provide some.
>
>>  Is that challenging for some reason I'm not understanding?
>
>
> No, it's not challenging. As I explained in my previous email, doing this
> isn't hard, but in my testing it would very often result in byte-by-byte
> loads and stores. There seemed to be very good reason to not want that for
> performance reasons (allowing more aggressive combining across the bitfield,
> etc). What would be quite a bit more code (not particularly challenging,
> just a lot of code) would be peeling off the unaligned small chunks on the
> front and the back, and using wide loads and stores for sections entirely
> within the bitfield. It seemed like the *exact* code we would already need
> in LLVM to lower these integers effectively, and so it seemed preferable for
> the frontend to emit a simple single integer load and store, and teach the
> backend where necessary to handle it.

I don't understand why this is true (the byte-by-byte part). If your
rewrite is generating a wider load, then there is nothing in the
structure of the old code that would prevent generating larger
accesses as well. This seems independent of the structural change. It
seems like implementing this in the current framework would be a small
amount of code.

I agree that once we did that, it would be fair to question why we
were doing it in the frontend at all.
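
To make sure I'm reading the new scheme correctly, here is roughly the
access pattern I understand it to produce for a store to one member of
a bitfield run, written out by hand in C. The field widths (3 and 14
bits), the 3-byte storage unit, and the little-endian layout are all
illustrative assumptions on my part, not something taken from the
patch:

    #include <stdint.h>
    #include <string.h>

    /* Conceptual expansion of "s.b = v" when 'a' (3 bits) and 'b' (14
     * bits) share a single 3-byte storage unit: load the whole unit,
     * insert the new bits, store the whole unit back.  Nothing outside
     * the unit is read or written. */
    void store_b(unsigned char *unit_ptr, uint32_t v) {
        uint32_t unit = 0;
        memcpy(&unit, unit_ptr, 3);           /* load the 24-bit unit  */
        unit &= ~(UINT32_C(0x3FFF) << 3);     /* clear b (bits 3..16)  */
        unit |= (v & UINT32_C(0x3FFF)) << 3;  /* insert the new value  */
        memcpy(unit_ptr, &unit, 3);           /* store the unit back   */
    }

If that matches your intent, then my remaining worries are mostly
about how well the backend copes with the resulting integer widths
today.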

>> I also happened to be hacking on SROA recently, and that pass was in
>> fact forming IR structures similar to the ones you're worried about:
>> arbitrarily sized integers with bit insertion and extraction as a
>> "vector" of "bits". I'm not
>> claiming that it produced the exact same structures or that there aren't
>> bugs here. I would be more than happy to provide extra testing and try to
>> flush out any crufty bits of LLVM here. I'm just saying I don't think this
>> pattern is *completely* unused...
>>
>> Anyways, does even the vague approach seem good? Should I work on a
>> different strategy? Are there particular tests that I might be able to run?
>> Any of this would be helpful feedback for me here.
>>
>>
>> I asked Daniel to make a detailed review, since he wrote the current
>> implementation and did a lot of investigation for it.
>>
>> The things that worry me most are:
>>
>> (1) How well-specified and -tested are the paths for non-power-of-2
>> integer types in LLVM?  Do we actually promise that, say, storing to an i17
>> will not touch the fourth byte?  Do we provide reasonably portable behavior
>> across targets, or is this an instance of the x86 backend having received a
>> lot of attention and everything else being a festering morass of latent
>> issues?  I know that SROA's behavior has exposed bugs before (IIRC with the
>> occasional response of making SROA less aggressive), not least
>> because SROA's behavior doesn't really kick in a lot.
>
>
> Sure, this is a great point. I'm happy to do any testing I can here. One
> thing that I think I've already done is ensure that we only produce
> i8-multiples, not i17s and such. I'll double check, but I would not expect
> to see any of those. My hope was that this would in the worst case be
> trivially legalized to a sequence of i8 operations, and in the best case,
> the optimal pattern.

This is a concern of mine too. Of course, it's also something we
should probably just make sure is nicely defined in the backend, given
that SROA already uses it.

>> (2) How limiting is this going to be for bitfields on "weird" platforms?
>
>
> I would not expect this to be a problem, but perhaps I don't know the
> platforms where it will crop up. The intent is to treat this as a literal
> bag-of-bits. Any platform where that works should be well supported.
>
>>
>>  Do we get anything useful from this change on big-endian targets?
>
>
> I think so. The code is much less confusing to me now. That is, if I
> understood the original code correctly. I'm not at all sure we have enough
> test coverage of big-endian IR generation to catch bugs, and I know that I'm
> not an expert on their ABI requirements. Maybe someone can tell me what to
> test for or look for in the output?
>
>> Will we find ourselves completely reimplementing it to support MS-style
>> packed bitfields?
>
>
> I don't know anything about MS-style packed bitfields, but I'll go
> digging....

I think that I agree with Eric's follow-up comment that this change
shouldn't affect things one way or the other.

>> When I surveyed a few other people for their first impressions (including
>> Chris and Daniel), there was also a lot of concern that the backend will
>> often not be able to widen these loads and stores because the high-level
>> information simply isn't there.
>
>
> Correct. The point is to *not* widen them. ;]
>
> From a memory model perspective, the frontend should emit the widest
> possible loads and stores. That tells the backend it has exactly that much
> latitude. This change actually helps this in several cases by emitting
> *wider* loads and stores and thus clarifying that there cannot be a race
> across that memory.

I agree with the approach. I'm worried there might be cases where the
backend does a worse job in codegen *today* on the odd-sized integer
types, but I could be wrong. And of course the right thing to do here
is to fix up the backend.
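
To spell out the hazard I have in mind (schematic only; the field
widths, the layout, and the notion of a 3-byte unit are my own
illustration, not something taken from the PR or the patch):

    /* Two threads touching adjacent members of the same struct.  Under
     * the C++11 memory model these are distinct memory locations. */
    struct S {
        unsigned bits : 17;  /* written by thread 1 */
        char flag;           /* written by thread 2 */
    };

    void thread1(struct S *s) { s->bits = 0x1FFFF; }
    void thread2(struct S *s) { s->flag = 1; }

    /* If thread1's store is emitted as a 4-byte read-modify-write that
     * covers 'flag', it can silently undo thread2's store -- that is
     * the race.  A store no wider than the bitfield's own storage unit
     * keeps the two locations disjoint. */

With that in mind, wider accesses within the bitfield's own storage
seem fine; the trouble only starts when a store spills into bytes
someone else may own.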

Regardless, thanks for working in this area!

 - Daniel

> Hopefully, the optimizer is always going to widen the *load* side of this --
> certainly there are some such optimizations already, but maybe we need to
> add more.
>
> There is performance work to be done here though as I mentioned in the
> original email. The frontend should further widen stores to include any
> padding when that padding is safe to read and write. This should, for
> example, allow the common pattern of using an i32 store for a bitfield with
> between 17 and 24 bits rather than an i24 store. As I mentioned in my
> original email, I'm very happy to add this optimization in the frontend when
> it is demonstrably safe; I didn't include it in the original patch
> because... well, as you point out, it's huge! =]
>
> However, it is a fundamental aspect of the C++ memory model (or LLVM's which
> we wrote to support that of C++ and Java among others) that store widening
> isn't generally feasible for the reasons you mention. (The cases where it
> is, we can usually form an SSA value, obviating the entire thing.)
>


