[cfe-commits] PATCH: Large re-working of bitfield IR-gen, and a fix for PR13691
Chandler Carruth
chandlerc at google.com
Wed Nov 28 17:20:42 PST 2012
On Wed, Nov 28, 2012 at 4:49 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
> On Wed, Nov 28, 2012 at 4:36 PM, Chandler Carruth <chandlerc at google.com> wrote:
>> I don't have 176.gcc handy, but I looked at the binary sizes for every
>> program in the LNT test suite. Here are the benchmarks which changed
>> by at least 1%. First column is the benchmark, then the old size of
>> the '.text' section, then the new size, and finally "(new - old) /
>> old" to show the % change.
>>
>> MultiSource/Benchmarks/Fhourstones-3.1/Output/fhourstones3.1.simple,
>> 4872, 4984, 0.022988505747126436
>
> The actual numbers don't matter, but it's an interesting proxy for our
> handling of the following struct from that benchmark:
>
> typedef struct {
> unsigned biglock:LOCKSIZE;
> unsigned bigwork:6;
> unsigned newlock:LOCKSIZE;
> unsigned newscore:3;
> unsigned bigscore:3;
> } hashentry;
>
> The code-size increase probably indicates we're not doing a good job
> of narrowing the large loads generated by the new bitfield code.
I'm not sure what constitutes a "good job" here... The primary
difference arises because this is a 64-bit machine, and thus codegen only
tries to narrow the loads and stores to 64-bit loads and stores. The
old bitfield code unconditionally split the loads and stores into
32-bit chunks.
This particular bitfield collection is exactly 64 bits wide, and so I
would expect loading and storing 64 bits at a time to be a good thing
on the whole, as it should be able to do fewer loads and stores and
instead perform bit-wise arithmetic to extract the values. Notably,
there are none of the "strange-sized" loads or stores that Chris was
worried about; these are all nice, target-legal, 64-bit integer loads
and stores.
Looking at the code in this benchmark, it goes both ways depending on
the circumstance: there are some places where doing 64-bit wide
operations generates better code, other places where it generates
worse code (both in terms of size and expected performance).
From what I can tell, the codesize issues here are just a result of
the backend not being clever enough to split wide operations when
doing so makes the code significantly smaller, for example by removing
the need for very large immediate masks. We should improve the backend for
these types of inputs, and then we'll be able to have the best of both
worlds -- wide stores when preferable, and narrow when that simplifies
something. We can't generally achieve this unless the frontend starts
off by using the wide loads and stores.