[cfe-commits] PATCH: Large re-working of bitfield IR-gen, and a fix for PR13691

Chandler Carruth chandlerc at google.com
Tue Sep 11 11:08:56 PDT 2012


On Fri, Aug 31, 2012 at 3:35 PM, Daniel Dunbar <daniel at zuster.org> wrote:

> On Tue, Aug 28, 2012 at 5:33 PM, Chandler Carruth <chandlerc at google.com>
> wrote:
> > On Tue, Aug 28, 2012 at 5:02 PM, John McCall <rjmccall at apple.com> wrote:
> >>
> >> On Aug 28, 2012, at 4:34 PM, Chandler Carruth wrote:
> >>
> >> On Tue, Aug 28, 2012 at 3:52 PM, John McCall <rjmccall at apple.com> wrote:
> >>>
> >>> On Aug 28, 2012, at 3:18 PM, Chandler Carruth wrote:
> >>> > I've added an explanatory comment. Hopefully won't get stale.
> >>> >
> >>> > Any other comments? I'd love to get this into the tree as we're seeing
> >>> > actual races in the wild due to it....
> >>>
> >>> If you would like a fast review, perhaps a more minimal change would
> >>> have been in order. We're not going to rush into a total rewrite of
> >>> bitfield
> >>> IR-generation, especially not one that introduces a host of new
> >>> assumptions
> >>> and dependencies on a very general but poorly-specified IR feature that
> >>> has not previously been used in code with binary compatibility
> >>> requirements.
> >>
> >>
> >> So, first off, sorry for being in a hurry. I'm not trying to get a rushed
> >> review, just hoping to get review sooner rather than later. =D Anyways, I
> >> would *really* like thorough reviews of this. I'm very aware of and
> >> sensitive to the importance of this code. In fact, that's why I'm working
> >> on it.
> >>
> >> Also, I too would have preferred a more minimal change. None seemed
> >> readily available when both Richard Smith and I were looking at this in
> >> detail.
> >>
> >> The primary idea we had was to bound the strided accesses to the size of
> >> the bitfield storage rather than just to the containing struct.
> >>
> >>
> >> Why not simply bound it to the offset of the next non-bitfield field
> >> (or, if there is none, the data size of the type)?
> >
> >
> > You also have to bound the start, not just the stop. That means reasonably
> > often either issuing an unaligned load or peeling off the unaligned
> > chunks. In my testing, this happened reasonably often due to the ABI
> > layout not requiring alignment in many cases, even when the type would
> > typically provide some.
> >
> >>  Is that challenging for some reason I'm not understanding?
> >
> >
> > No, it's not challenging. As I explained in my previous email, doing this
> > isn't hard, but in my testing it would very often result in byte-by-byte
> > loads and stores. There seemed to be very good reason to not want that for
> > performance reasons (allowing more aggressive combining across the
> > bitfield, etc). What would be quite a bit more code (not particularly
> > challenging, just a lot of code) would be peeling off the unaligned small
> > chunks on the front and the back, and using wide loads and stores for
> > sections entirely within the bitfield. It seemed like the *exact* code we
> > would already need in LLVM to lower these integers effectively, and so it
> > seemed preferable for the frontend to emit a simple single integer load
> > and store, and teach the backend where necessary to handle it.
> >
> >>
> >> I also happened to be hacking on SROA recently, and that pass was in fact
> >> forming similar IR structures to the ones you're worried about:
> >> arbitrarily sized integers with bit insertion and extraction as a
> >> "vector" of "bits". I'm not claiming that it produced the exact same
> >> structures or that there aren't bugs here. I would be more than happy to
> >> provide extra testing and try to flush out any crufty bits of LLVM here.
> >> I'm just saying I don't think this pattern is *completely* unused...
> >>
> >> Anyways, does even the vague approach seem good? Should I work on a
> >> different strategy? Are there particular tests that I might be able to
> >> run? Any of this would be helpful feedback for me here.
> >>
> >>
> >> I asked Daniel to make a detailed review, since he wrote the current
> >> implementation and did a lot of investigation for it.
> >>
> >> The things that worry me most are:
> >>
> >> (1) How well-specified and -tested are the paths for non-power-of-2
> >> integer types in LLVM?  Do we actually promise that, say, storing to an
> >> i17 will not touch the fourth byte?  Do we provide reasonably portable
> >> behavior across targets, or is this an instance of the x86 backend having
> >> received a lot of attention and everything else being a festering morass
> >> of latent issues?  I know that SROA's behavior has exposed bugs before
> >> (IIRC with the occasional response of making SROA less aggressive), not
> >> least because SROA's behavior doesn't really kick in a lot.
> >
> >
> > Sure, this is a great point. I'm happy to do any testing I can here. One
> > thing that I think I've already done is ensure that we only produce
> > i8-multiples, not i17s and such. I'll double-check, but I would not expect
> > to see any of those. My hope was that this would in the worst case be
> > trivially legalized to a sequence of i8 operations, and in the best case,
> > the optimal pattern.
> >
> >> (2) How limiting is this going to be for bitfields on "weird" platforms?
> >
> >
> > I would not expect this to be a problem, but perhaps I don't know the
> > platforms where it will crop up. The intent is to treat this as a literal
> > bag-of-bits. Any platform where that works should be well supported.
> >
> >>
> >>  Do we get anything useful from this change on big-endian targets?
> >
> >
> > I think so. The code is much less confusing to me now. That is, if I
> > understood the original code correctly. I'm not at all sure we have
> > enough test coverage of big-endian IR generation to catch bugs, and I
> > know that I'm not an expert on their ABI requirements. Maybe someone can
> > tell me what to test for or look for in the output?
> >
> >> Will we find ourselves completely reimplementing it to support MS-style
> >> packed bitfields?
> >
> >
> > I don't know anything about MS-style packed bitfields, but I'll go
> > digging....
> >
> >> When I surveyed a few other people for their first impressions
> >> (including Chris and Daniel), there was also a lot of concern that the
> >> backend will often not be able to widen these loads and stores because
> >> the high-level information simply isn't there.
> >
> >
> > Correct. The point is to *not* widen them. ;]
> >
> > From a memory model perspective, the frontend should emit the widest
> > possible loads and stores. That tells the backend it has exactly that
> > much latitude. This change actually helps this in several cases by
> > emitting *wider* loads and stores and thus clarifying that there cannot
> > be a race across that memory.
>
> We also want to know that the optimizer will narrow these when necessary
> (the current code also has that problem, but to a less severe degree). If
> you are interested, here is a test case that we do not currently narrow:
> --
> #pragma pack(1)
> struct s0 {
>   unsigned a : 9;
>   unsigned b0 : 5;
>   unsigned b1 : 5;
>   unsigned b2 : 5;
>   unsigned b3 : 5;
>   unsigned b4 : 5;
>   unsigned b5 : 5;
>   unsigned b6 : 5;
>   unsigned b7 : 5;
>   unsigned b8 : 5;
>   unsigned b9 : 5;
>   unsigned b10 : 5;
>   unsigned b11 : 5;
>   char c;
> };
>
> unsigned f0(struct s0 *p) {
>   return p->b8;
> }
> --
> which leaves us with a 64-bit load.


Yeah, this ends up with the same 64-bit load. Is that really bad? I suppose
it might be nice to teach the backend to narrow to a single-byte load where
we can, if only for the encoding shrink.

I'll add this as a test case if you like, but it'll have to be a split test
case: one in clang to produce the giant integer operation, and one in llvm
to test how it lowers it.