[cfe-commits] PATCH: Large re-working of bitfield IR-gen, and a fix for PR13691

Chandler Carruth chandlerc at google.com
Wed Nov 28 16:36:10 PST 2012


Hey Chris!

We chatted briefly about this patch during the devmtg, and I wanted to
follow up here about it. What follows is a very brief summary of our
discussion as I recall it -- chime in if I missed anything important.

On Fri, Oct 5, 2012 at 5:09 PM, Chris Lattner <clattner at apple.com> wrote:
> Hi Chandler,
>
> Sorry for the delay, I still haven't had a chance to look at your patch.
> That said, here are some high level comments based on my understanding of
> it:
>
> 1) I'm not worried about "weird" integer sizes in register variables.  LLVM
> is pretty good at promoting stuff to "native" integers, and if there are
> bugs handling them, we want to know anyway.  If this is a convenient way to
> model arithmetic on an i56 bitfield, then that's cool with me.

Cool, no disagreement here.
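
To make the "weird width" case concrete, here is a minimal sketch (the
struct and the names are mine, purely illustrative, not from the patch)
of the kind of field where arithmetic could end up modeled on an i56
value in the IR:

  struct Packet {
    unsigned long long tag     : 8;
    unsigned long long payload : 56;  // loads and arithmetic on this field
                                      // could be modeled as i56 in the IR
  };

  unsigned long long bump(const Packet &p) {
    return p.payload + 1;  // i56-style arithmetic; LLVM legalizes it to
                           // native integer widths during codegen
  }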


> 2) I don't like the idea of "weird" integer sizes on loads and stores to
> struct memory though.  One of the major things that the current bitfield
> design is intended to encourage is to allow CSE and DSE of accesses to
> struct fields.  It currently does this by widening out loads (too much!) but
> if the widening is the problem, I'd rather see this expanded out into a
> series of loads (e.g. a i8 load and an i16 load, instead of an i24 load).
> The reason for this is that the optimizer won't be splitting up i24 loads -
> codegen will, and exposing truth to the mid-level optimizer is the right
> thing to do.

This is, I think, the interesting question. We discussed it quite a bit
in person, and I'm not going to try to repeat all of that here...

I think my essential argument is that, given the way the memory model
(both C++'s and LLVM's) works, it is specifically useful for the
frontend to emit maximally widened and/or merged stores. This exposes
the maximal width of store that will not introduce a data race. Without
this information, the optimizer can't freely use natural-width
operations in many cases, and it becomes harder for it to combine loads
and stores to unrelated parts of a bitfield, because they may appear to
need to stay separate to avoid data races.
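
To illustrate the point with a concrete (hypothetical) example of my
own: in C++11, a maximal sequence of adjacent non-zero-width bit-fields
is a single memory location, so a store may legally touch that whole
sequence but nothing beyond it.

  struct S {
    unsigned a : 10;
    unsigned b : 14;  // 'a' and 'b' together form one 24-bit memory location
    char     c;       // a separate memory location; on common ABIs it can be
                      // packed into the same 32-bit word as the bit-fields
  };

  void set_a(S &s) {
    s.a = 1;  // may legally read-modify-write all 24 bits covering a and b,
              // but widening the store to 32 bits could race with a
              // concurrent write to 'c'
  }

Emitting the store as an i24 access records exactly that: the widest
access which cannot introduce a race.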


One possibility which we specifically discussed was pre-splitting the
wide load or store operations into "natural" integer widths for the
target (see the sketch after this list). This has, in my mind, two
significant downsides:

1) LLVM will hoist some of these operations across basic blocks, which
will prevent the target from merging them when that would be profitable
(say, when the memory is aligned and it can do a 128-bit store).

2) It duplicates the logic for intelligently decomposing a load or
store of a wide integer between the frontend and the codegen layer in
LLVM. This is particularly frustrating because the codegen layer will
in many cases do a *better* job of decomposing things than the frontend
will, by taking advantage of the exact instructions available and the
final alignment of the memory location.
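
For reference, the pre-splitting alternative amounts to something like
the following sketch (mine, assuming a little-endian layout and a
3-byte, i24-sized bit-field storage unit): the frontend itself would
emit the i16 and i8 pieces rather than a single i24 access.

  #include <cstdint>
  #include <cstring>

  // Reassemble a 3-byte bit-field storage unit from a 16-bit load plus an
  // 8-bit load -- roughly the decomposition the frontend would have to
  // hard-code if it did the splitting itself.
  uint32_t load_i24_split(const unsigned char *p) {
    uint16_t lo;
    uint8_t  hi;
    std::memcpy(&lo, p, sizeof(lo));      // 16-bit piece
    std::memcpy(&hi, p + 2, sizeof(hi));  // 8-bit piece
    return static_cast<uint32_t>(lo) | (static_cast<uint32_t>(hi) << 16);
  }

Which piece goes where, and whether the pieces can later be re-merged
into a single wider access, depends on endianness and on the final
alignment of the storage -- exactly the information the codegen layer
has and the frontend would be guessing at.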


> 3) This is the sort of change that needs some numbers to back it up.
> Performance results would be best, but a decent proxy for them in this case
> would be to look at the code size of functions in 176.gcc or something else
> that uses a bunch of bitfields.

I think this is crucial. =] I've run the LNT test suite with and
without this patch, and I see no regressions (or improvements) in
performance. That said, my box is quite noisy; once I can submit the
patch, I suspect the LNT build bots will give us much more stable data.


I don't have 176.gcc handy, but I looked at the binary sizes for every
program in the LNT test suite. Here are the benchmarks whose '.text'
section changed in size by at least 1%. The columns are the benchmark,
the old size of the '.text' section, the new size, and "(new - old) /
old" to show the fractional change.

MultiSource/Benchmarks/Fhourstones-3.1/Output/fhourstones3.1.simple, 4872, 4984, +0.0230

SingleSource/UnitTests/Output/2009-04-16-BitfieldInitialization.simple, 952, 888, -0.0672
SingleSource/UnitTests/Output/ms_struct-bitfield-init-1.simple, 664, 680, +0.0241
SingleSource/UnitTests/Output/2006-01-23-UnionInit.simple, 904, 920, +0.0177
SingleSource/Regression/C++/Output/2011-03-28-Bitfield.simple, 568, 552, -0.0282

So, one benchmark grew by about 2.3%, but it's a tiny benchmark to
start with, so this isn't terribly interesting.

Of the bitfield-heavy single-source tests, we have +1.8%, +2.4%, -2.8%,
and -6.7%, so the growth here doesn't seem worrisome at all.


Any other benchmarking you'd like me to do to validate this?


