r215614 - CodeGen: When bitfields fall on natural boundaries, split them up
chandlerc at google.com
Thu Aug 14 01:17:44 PDT 2014
On Wed, Aug 13, 2014 at 9:31 PM, Justin Bogner <mail at justinbogner.com>
> Chandler Carruth <chandlerc at google.com> writes:
> > CodeGen: When bitfields fall on natural boundaries, split them up
> > Please revert. We used to do this, and I worked very hard to specifically
> > design the system to work the way it does now. There are copious
> > email threads, and discussions where we debated how to handle this and
> > to do so in the exact way.
> > If you want to re-open the discussion, cool. But please do so as a
> > and cite specifically why the rationale for using fully-wide loads and
> > is inapplicable or causes insurmountable problems.
> Hm, I hadn't really realized the history on this. I guess you're talking
> about r169489, "Rework the bitfield access IR generation". Looks to me
> like we really did a lot more than just break bitfields up on natural
The old code was written by ddunbar. I talked to him extensively before
rewriting most of it. The attempt of the old code was to slice it up into
sizeof(int) chunks. There was lots of complexity as well, but that was the
primary motivation for the code.
> Note that my change here doesn't change the way we lay the bitfields out
> *at all*, and that all of the tests added in r169489 pass unchanged with
Yes, I understand what the change is doing.
> I can understand the argument against extra complexity though: I'll
> get back to that in a minute.
This isn't an argument against extra complexity, this is the *wrong model*.
Very fundamentally, it is essential that the frontend emit the widest loads
and stores permitted by the language spec. The memory model very narrowly
constrains the backends ability to widen loads or stores or to merge
adjacent loads and stores across control dependencies. By emitting the full
load and store at each point, LLVM is able to combine *much* more
aggressively around control dependencies without violating the memory model.
In addition to breaking this theoretical power of the middle-end
optimizers, it also effectively masks racing memory accesses to different
bitfield slots from tools like ThreadSanitizer. By using the full width in
the initial load/store emission, sanitizers are aware of the *potential*
domain of any race regardless of what gets dropped during lowering.
I'll reply separately to the specific performance problems, but please
revert this until there is an actual discussion about changing this very
fundamental design constraint. =/ This is not "obvious" or something that
should go in without review and careful consideration.
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the cfe-commits