<div style="font-family: arial, helvetica, sans-serif; font-size: 10pt"><div dir="ltr">Chris, any final comments? Everyone else seems happy with this approach now.</div><div class="gmail_extra"><br><br><div class="gmail_quote">

On Wed, Nov 28, 2012 at 4:36 PM, Chandler Carruth <span dir="ltr"><<a href="mailto:chandlerc@google.com" target="_blank">chandlerc@google.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Hey Chris!<br>

<br>

We chatted briefly about this patch during the devmtg, and I wanted to<br>

follow up here about it. What follows is a very brief summary of our<br>

discussion as I recall it -- chime in if I missed anything important.<br>

<div class="im"><br>

On Fri, Oct 5, 2012 at 5:09 PM, Chris Lattner <<a href="mailto:clattner@apple.com">clattner@apple.com</a>> wrote:<br>

> Hi Chandler,<br>

><br>

> Sorry for the delay, I still haven't had a chance to look at your patch.<br>

> That said, here are some high level comments based on my understanding of<br>

> it:<br>

><br>

> 1) I'm not worried about "weird" integer sizes in registers variables.  LLVM<br>

> is pretty good at promoting stuff to "native" integers, and if there are<br>

> bugs handling them, we want to know anyway.  If this is a convenient way to<br>

> model arithmetic on an i56 bitfield, then that's cool with me.<br>

<br>

</div>Cool, no disagreement here.<br>

<div class="im"><br>

<br>

> 2) I don't like the idea of "weird" integer sizes on loads and stores to<br>

> struct memory though.  One of the major things that the current bitfield<br>

> design is intended to encourage is to allow CSE and DSE of accesses to<br>

> struct fields.  It currently does this by widening out loads (too much!) but<br>

> if the widening is the problem, I'd rather see this expanded out into a<br>

> series of loads (e.g. a i8 load and an i16 load, instead of an i24 load).<br>

> The reason for this is that the optimizer won't be splitting up i24 loads -<br>

> codegen will, and exposing truth to the mid-level optimizer is the right<br>

> thing to do.<br>

<br>

</div>This is I think the interesting question. We discussed quite a bit in<br>

person and I'm not going to try to repeat all of it here...<br>

<br>

I think my essential argument is that the way the memory model (both<br>

C++'s and LLVM's) works, it is specifically useful for the frontend to<br>

emit maximally widened and/or merged stores. This exposes the maximal<br>

width of store which will not introduce a data race. Without this<br>

information, the optimizer can't freely use natural width operations<br>

in many cases, and it becomes harder for the optimizer to combine<br>

loads and stores to unrelated parts of a bitfield because they may<br>

appear to be important to leave separate to avoid data races.<br>

<br>

<br>

One possibility which we specifically discussed was pre-splitting the<br>

wide load or store operations into "natural" integer widths for the<br>

target. This has (in my mind) two significant downsides:<br>

<br>

1) LLVM will hoist some of these operations across basic blocks which<br>

will prevent the target from merging them when profitable (say the<br>

memory is aligned, and it can do a 128bit store)<br>

<br>

2) It duplicates the logic to intelligently decompose a load or store<br>

of a wide integer between the frontend and the codegen layer in LLVM.<br>

This is particularly frustrating because the codegen layer will in<br>

many cases do a *better* job of decomposing things that the frontend<br>

will by taking advantage of the exact instructions available and the<br>

final alignment of the memory location.<br>

<div class="im"><br>

<br>

> 3) This is the sort of change that needs some numbers to back it up.<br>

> Performance results would be best, but a decent proxy for them in this case<br>

> would be to look at the code size of functions in 176.gcc or something else<br>

> that uses a bunch of bitfields.<br>

<br>

</div>I think this is crucial. =] I've run the LNT test suite with and<br>

without this patch, and I get no regressions (or improvements) in<br>

performance. That said, my box is quite noisy. If I can submit the<br>

patch, the LNT build bots will give us much more stable data I<br>

suspect.<br>

<br>

<br>

I don't have 176.gcc handy, but I looked at the binary sizes for every<br>

program in the LNT test suite. Here are the benchmarks which changed<br>

by at least 1%. First column is the benchmark, then the old size of<br>

the '.text' section, then the new size, and finally "(new - old) /<br>

old" to show the % change.<br>

<br>

MultiSource/Benchmarks/Fhourstones-3.1/Output/fhourstones3.1.simple,<br>

4872, 4984, 0.022988505747126436<br>

<br>

SingleSource/UnitTests/Output/2009-04-16-BitfieldInitialization.simple,<br>

952, 888, -0.067226890756302518<br>

SingleSource/UnitTests/Output/ms_struct-bitfield-init-1.simple, 664,<br>

680, 0.024096385542168676<br>

SingleSource/UnitTests/Output/2006-01-23-UnionInit.simple, 904, 920,<br>

0.017699115044247787<br>

SingleSource/Regression/C++/Output/2011-03-28-Bitfield.simple, 568,<br>

552, -0.028169014084507043<br>

<br>

So, one benchmark grew by 2.2%, but its a tiny benchmark to start<br>

with, so this isn't terribly interesting.<br>

<br>

Of the bitfield-heavy single source tests, we have 1.7%, 2.4%, -2.8%<br>

and -6.7%. So the growth here doesn't seem worrisome at all.<br>

<br>

<br>

Any other benchmarking you'd like me to do to validate this?<br>

</blockquote></div><br></div></div>