[cfe-commits] PATCH: Large re-working of bitfield IR-gen, and a fix for PR13691

Daniel Dunbar daniel at zuster.org
Fri Aug 31 15:35:35 PDT 2012


On Tue, Aug 28, 2012 at 5:33 PM, Chandler Carruth <chandlerc at google.com> wrote:
> On Tue, Aug 28, 2012 at 5:02 PM, John McCall <rjmccall at apple.com> wrote:
>>
>> On Aug 28, 2012, at 4:34 PM, Chandler Carruth wrote:
>>
>> On Tue, Aug 28, 2012 at 3:52 PM, John McCall <rjmccall at apple.com> wrote:
>>>
>>> On Aug 28, 2012, at 3:18 PM, Chandler Carruth wrote:
>>> > I've added an explanatory comment. Hopefully won't get stale.
>>> >
>>> > Any other comments? I'd love to get this into the tree as we're seeing
>>> > actual races in the wild due to it....
>>>
>>> If you would like a fast review, perhaps a more minimal change would
>>> have been in order. We're not going to rush into a total rewrite of
>>> bitfield IR-generation, especially not one that introduces a host of
>>> new assumptions and dependencies on a very general but poorly-specified
>>> IR feature that has not previously been used in code with binary
>>> compatibility requirements.
>>
>>
>> So, first off, sorry for being in a hurry. I'm not trying to get a rushed
>> review, just hoping to get review sooner rather than later. =D Anyways, I
>> would *really* like thorough reviews of this. I'm very aware of and
>> sensitive to the importance of this code. In fact, that's why I'm working
>> on it.
>>
>> Also, I too would have preferred a more minimal change. None seemed
>> readily available when both Richard Smith and I were looking at this in
>> detail.
>>
>> The primary idea we had was to bound the strided accesses to the size of
>> the bitfield storage rather than just to the containing struct.
>>
>>
>> Why not simply bound it to the offset of the next non-bitfield field (or,
>> if there is none, the data size of the type)?
>
>
> You also have to bound the start, not just the stop. That means reasonably
> often either issuing an unaligned load or peeling off the unaligned chunks.
> In my testing this came up fairly often, because the ABI layout in many
> cases does not require alignment even when the type would typically provide
> some.
>
>>  Is that challenging for some reason I'm not understanding?
>
>
> No, it's not challenging. As I explained in my previous email, doing this
> isn't hard, but in my testing it would very often result in byte-by-byte
> loads and stores. There seemed to be very good reason not to want that,
> for performance (it allows more aggressive combining across the bitfield,
> etc.). What would be quite a bit more code (not particularly challenging,
> just a lot of code) is peeling off the unaligned small chunks on the front
> and the back and using wide loads and stores for the sections entirely
> within the bitfield. That is *exactly* the code we would already need in
> LLVM to lower these integers effectively, so it seemed preferable for the
> frontend to emit a single, simple integer load and store, and to teach the
> backend to handle it where necessary.
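>
> To make that concrete, here is a rough sketch of the trade-off (the layout
> assumes a typical Itanium-ABI allocation; exact offsets will vary):
> --
> struct S {
>   char c;          /* bitfield run starts at byte offset 1 */
>   unsigned a : 7;
>   unsigned b : 9;  /* run covers bits 8..23, i.e. bytes 1..2 */
> };
>
> /* Bounding accesses to the bitfield storage means touching bytes 1..2.
>  * With only naturally aligned accesses that peels into two i8
>  * operations; the alternative is one unaligned two-byte access at
>  * offset 1. The patch instead emits a single integer load or store
>  * covering the run and lets the backend pick the lowering. */
> --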
>
>>
>> I also happened to be hacking on SROA recently, and that pass was in fact
>> forming IR structures similar to the ones you're worried about: arbitrarily
>> sized integers with bit insertion and extraction as a "vector" of "bits".
>> I'm not claiming that it produced exactly the same structures or that there
>> aren't bugs here. I would be more than happy to provide extra testing and
>> try to flush out any crufty bits of LLVM here. I'm just saying I don't
>> think this pattern is *completely* unused...
>>
>> Anyways, does even the vague approach seem good? Should I work on a
>> different strategy? Are there particular tests that I might be able to run?
>> Any of this would be helpful feedback for me here.
>>
>>
>> I asked Daniel to do a detailed review, since he wrote the current
>> implementation and did a lot of investigation for it.
>>
>> The things that worry me most are:
>>
>> (1) How well-specified and -tested are the paths for non-power-of-2
>> integer types in LLVM?  Do we actually promise that, say, storing to an i17
>> will not touch the fourth byte?  Do we provide reasonably portable behavior
>> across targets, or is this an instance of the x86 backend having received a
>> lot of attention and everything else being a festering morass of latent
>> issues?  I know that SROA's behavior has exposed bugs before (IIRC with the
>> occasional response of making SROA less aggressive), not least because
>> SROA's behavior doesn't really kick in that often.
>
>
> Sure, this is a great point. I'm happy to do any testing I can here. One
> thing that I think I've already done is ensure that we only produce
> i8-multiples, not i17s and such. I'll double check, but I would not expect
> to see any of those. My hope was that this would in the worst case be
> trivially legalized to a sequence of i8 operations, and in the best case
> produce the optimal pattern.
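>
> As a concrete (hypothetical) illustration, assuming the usual Itanium-ABI
> layout:
> --
> struct S {
>   unsigned f : 17;  /* bits 0..16, i.e. bytes 0..2 */
>   char c;           /* lives at byte offset 3 */
> };
>
> /* The patch should access 'f' as an i24 (three whole bytes):
>  *   %v = load i24* %p   ; IR sketch: touches bytes 0..2 only
>  * rather than as an i17, where legalization would have to decide which
>  * bytes beyond bit 16 may be touched. Rounding up to a multiple of i8
>  * pins down exactly which bytes are in play, and byte 3 ('c') stays
>  * off limits. */
> --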
>
>> (2) How limiting is this going to be for bitfields on "weird" platforms?
>
>
> I would not expect this to be a problem, but perhaps I don't know the
> platforms where it will crop up. The intent is to treat the storage as a
> literal bag-of-bits; any platform where that works should be well supported.
>
>>
>>  Do we get anything useful from this change on big-endian targets?
>
>
> I think so. The code is much less confusing to me now. That is, if I
> understood the original code correctly. I'm not at all sure we have enough
> test coverage of big-endian IR generation to catch bugs, and I know that I'm
> not an expert on their ABI requirements. Maybe someone can tell me what to
> test for or look for in the output?
>
>> Will we find ourselves completely reimplementing it to support MS-style
>> packed bitfields?
>
>
> I don't know anything about MS-style packed bitfields, but I'll go
> digging....
>
>> When I surveyed a few other people for their first impressions (including
>> Chris and Daniel), there was also a lot of concern that the backend will
>> often not be able to widen these loads and stores because the high-level
>> information simply isn't there.
>
>
> Correct. The point is to *not* widen them. ;]
>
> From a memory model perspective, the frontend should emit the widest
> possible loads and stores. That tells the backend it has exactly that much
> latitude. This change actually helps in several cases by emitting *wider*
> loads and stores than before, making it clear that there cannot be a race
> across that memory.
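>
> A sketch of what I mean (layout per the usual Itanium ABI):
> --
> struct S {
>   unsigned a : 3;
>   unsigned b : 13;  /* a and b share bits 0..15: one memory location */
>   char c;           /* a separate memory location at byte offset 2 */
> };
>
> /* Storing to 'a' as an i16 read-modify-write of bytes 0..1 tells the
>  * backend explicitly that no other thread may race on 'b': the language
>  * folds adjacent bitfields into a single memory location. A wider access
>  * would introduce a race with 'c'; a narrower i8 access would hide
>  * latitude the backend is allowed to exploit. */
> --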

We also want to know that the optimizer will narrow these when necessary
(the current code also has that problem, but to a less severe degree). If
you are interested, here is a test case that we currently fail to narrow:
--
#pragma pack(1)
struct s0 {
  unsigned a : 9;
  unsigned b0 : 5;
  unsigned b1 : 5;
  unsigned b2 : 5;
  unsigned b3 : 5;
  unsigned b4 : 5;
  unsigned b5 : 5;
  unsigned b6 : 5;
  unsigned b7 : 5;
  unsigned b8 : 5;
  unsigned b9 : 5;
  unsigned b10 : 5;
  unsigned b11 : 5;
  char c;
};

unsigned f0(struct s0 *p) {
  return p->b8;
}
--
which leaves us with a 64-bit load.
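
(My arithmetic, worth double-checking: the run is 9 + 12*5 = 69 bits, so
under #pragma pack(1) it occupies bytes 0..8; b8 sits at bits 49..53,
entirely within byte 6, so in principle a single i8 load at offset 6
would be enough for this read.)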

 - Daniel

>
> Hopefully, the optimizer is always going to widen the *load* side of this --
> certainly there are some such optimizations already, but maybe we need to
> add more.
>
> There is performance work to be done here, though, as I mentioned in the
> original email. The frontend should further widen stores to include any
> padding when that padding is safe to read and write. This should, for
> example, let us use an i32 store rather than an i24 store for the common
> pattern of a bitfield with between 17 and 24 bits. As I mentioned in my
> original email, I'm very happy to add this optimization in the frontend
> when it is demonstrably safe; I didn't include it in the original patch
> because... well, as you point out, the patch is huge already! =]
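>
> Something like this pair, assuming the usual Itanium-ABI layout, is what
> I have in mind:
> --
> struct S1 {
>   unsigned f : 20;  /* bits 20..31 are padding: an i32 store is safe */
> };
>
> struct S2 {
>   unsigned f : 20;
>   char c;           /* occupies byte 3: widening to i32 would clobber it */
> };
> --
> For S1 the bytes past the bitfield belong to the same object and nothing
> else can legitimately touch them; for S2 a widened store races with 'c'.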
>
> However, it is a fundamental aspect of the C++ memory model (or LLVM's,
> which we wrote to support those of C++ and Java, among others) that store
> widening isn't generally feasible, for the reasons you mention. (In the
> cases where it is, we can usually form an SSA value, obviating the entire
> thing.)
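>
> The canonical illustration of why (nothing bitfield-specific about it):
> --
> struct S {
>   int a;
>   int b;
> };
>
> /* Thread 1 executes s->a = 1; thread 2 executes s->b = 2. The program
>  * has no data race. Widening thread 1's store into a 64-bit
>  * read-modify-write of both fields can silently discard thread 2's
>  * update, so the widened program is *not* equivalent under the C++11
>  * (and LLVM) memory model. */
> --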
>


