Add a pass to convert aggregate loads/stores into scalar load stores

Fri Oct 31 12:00:19 PDT 2014

On 10/30/2014 04:31 PM, Reid Kleckner wrote:
>
>
> On Thu, Oct 30, 2014 at 2:08 PM, Philip Reames 
> <listmail at philipreames.com <mailto:listmail at philipreames.com>> wrote:
>
>
>     On 10/28/2014 09:27 AM, Chandler Carruth wrote:
>>
>>     On Tue, Oct 28, 2014 at 6:55 AM, Hal Finkel <hfinkel at anl.gov
>>     <mailto:hfinkel at anl.gov>> wrote:
>>
>>         Chandler,
>>
>>         Can you please look at this? How do you think we should
>>         canonicalize this (is this the right approach)?
>>
>>
>>     Oof... yea.
>>
>>     I wish I were more confident of what the "right" answer is here
>>     any more. =[
>     At the dev conference, Hal and I talked about a couple of
>     options.  I'm sure which is the "right" one, so let me lay out
>     what I remember of that discussion.
>
>     Option 1 - Leave the load alone, improve GVN
>     - not necessarily a bad option, but slightly risky in that it
>     involves changes to key infrastructure with little in tree motivation
>     - the original change doing this was rejected
>
>     Option 2 - Transform to load of component element types
>     - tricky to get layout exactly right, but definitely doable
>
>     Option 3 - Transform to load of iN where N is sizeof(agg)*8.
>
>     Option 4 - Transform to series of smaller integer loads
>     - This appears to be what this patch implements.  Not entirely
>     sure why this was chosen.
>
>     Option 5 - Transform to alloca and memcpy
>     - Clang appears to emit loads of structs via an alloca (for the
>     local) and a memcpy.  The optimizer deconstructs this where
>     appropriate.
>     - I have no idea why Clang chose this option.  My best guess is to
>     exploit information about POD types?
>
>     Personally, I'd lean towards 5,1,2 above (in roughly that order). 
>     1 & 2 seem like better long term solutions, but 5 probably works
>     fairly well today.  I'm not really a fan of either 3 or 4 since we
>     loose things like distinctions between pointers and integers.  If
>     we had to choose, I'd take 3 over 4.
>
>     I think we also discussed the trade off between a pass and an
>     instcombine transform.  I would lean towards making whichever
>     option we chose a canonicalization rule in instcombine.
>
>     Also, this patch implements option 2 for a struct with a single
>     element type which seems like a (independently) useful
>     canonicalization.  Should we introduce that transform as a
>     canonicalization in instcombine?  I'd lean towards that.
>
>     Chandler, Hal - Thoughts, opinions?
>
>
> 2 seems like the best option to me.
As others have pointed out, 2 is not ideal in that it looses information 
about the widest legal load or store.
> When you load and store an FCA, you don't get to copy the padding of 
> the struct with you. Once it's loaded, each element is its own value. 
> There's no way to recover the padding. memcpy doesn't represent this.
Reed, I'm a bit confused here.  Do you mean the padding *around* a 
struct, or the padding *within* a struct?  For the later, with data 
layout our existing memcpy based optimizations from clang seem to deal 
with it just fine.  Do you have a particular test case in mind which 
shows the memcpy approach not working?

This is what clang appears to use today.  If you have a case it doesn't 
work for, we should fix that.  :)

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20141031/c82d89de/attachment.html>