<div dir="ltr"><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Aug 26, 2013 at 1:08 PM, Quentin Colombet <span dir="ltr">&lt;<a href="mailto:qcolombet@apple.com" target="_blank" class="cremed">qcolombet@apple.com</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div>After looking into SROA, Nadav and I agreed that it does the right thing.</div><div>Therefore, the initial proposed patch is still the one to be reviewed.</div>

<div class="im"><div></div></div></blockquote></div><br>Very, very sorry for the late comment here. I misunderstood this comment and the result.</div><div class="gmail_extra"><br></div><div class="gmail_extra">I agree that SROA is correct here, and I also think that this should be the canonical form. With the LLVM memory model, it is very hard to merge two smaller loads back into a wider one when that would be profitable, and in many cases it is essentially impossible to do the same for two smaller stores. As such, it is very useful to preserve the widest known-safe load and store size as far as possible through the optimizer. At a minimum, it is essential to carry it through to the backend, where the cost of the various load and store operations is known.</div>

<div class="gmail_extra"><br></div><div class="gmail_extra">Can we canonicalize toward the wide loads and stores with appropriate masking to extract narrow values, and then match these back to the small stores in the backend?</div>
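<div class="gmail_extra"><br></div><div class="gmail_extra">To make the two forms concrete, here is a rough C-level sketch of what I mean by a wide load with masking versus the equivalent narrow loads. The struct and function names are made up for illustration and are not taken from the patch or from SROA, and the byte-order probe is only there so the sketch behaves the same on little- and big-endian hosts.</div>

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical aggregate: two 16-bit fields occupying one 32-bit word. */
struct Pair {
    uint16_t a;
    uint16_t b;
};

_Static_assert(sizeof(struct Pair) == 4, "sketch assumes no padding");

/* Narrow form: each field is read with its own 16-bit load. */
uint16_t load_a_narrow(const struct Pair *p) { return p->a; }
uint16_t load_b_narrow(const struct Pair *p) { return p->b; }

/* Returns 1 on a little-endian host, 0 otherwise. */
int is_little_endian(void) {
    const uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Wide form: a single 32-bit load of the whole aggregate. */
uint32_t load_wide(const struct Pair *p) {
    uint32_t w;
    memcpy(&w, p, sizeof w);
    return w;
}

/* Extract each field from the wide value with a shift and a mask. */
uint16_t load_a_wide(const struct Pair *p) {
    uint32_t w = load_wide(p);
    /* 'a' sits at the lowest address: low half on little-endian. */
    return (uint16_t)((is_little_endian() ? w : (w >> 16)) & 0xFFFFu);
}

uint16_t load_b_wide(const struct Pair *p) {
    uint32_t w = load_wide(p);
    /* 'b' sits at the higher address: high half on little-endian. */
    return (uint16_t)((is_little_endian() ? (w >> 16) : w) & 0xFFFFu);
}
```

<div class="gmail_extra"><br></div><div class="gmail_extra">Going from the wide form to the narrow one is a purely local rewrite of the shift and mask, which is why I would expect the backend to be able to do it. Going the other way means proving that the two narrow accesses are adjacent, that the wider access is safe, and that nothing in between interferes.</div>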

<div class="gmail_extra"><br></div><div class="gmail_extra">Both SROA and optimal bitfield code generation rely on this (at least for x86), so changing it will regress some things.</div></div>