Add a pass to convert aggregate loads/stores into scalar load stores

Thu Oct 30 17:34:17 PDT 2014

You can also model the clobbering of padding in aggregate stores with store
of undef.

I don't like the memcpy approach because we introduce a new memory
temporary that wasn't present in the input LLVM IR, and then rely on SROA
to zap it for us later. I think we should skip that step and transform
aggregate-load+extractvalue directly into scalar loads.

On Thu, Oct 30, 2014 at 4:53 PM, Finkel, Hal J. <hfinkel at anl.gov> wrote:

>  Good point. You can add struct tbaa to the memcpy to preserve padding.
>
>  -Hal
>
>  *Sent from my Verizon Wireless 4G LTE DROID*
>
>
> Reid Kleckner <rnk at google.com> wrote:
>
>
>
> On Thu, Oct 30, 2014 at 2:08 PM, Philip Reames <listmail at philipreames.com>
> wrote:
>
>>
>> On 10/28/2014 09:27 AM, Chandler Carruth wrote:
>>
>>
>> On Tue, Oct 28, 2014 at 6:55 AM, Hal Finkel <hfinkel at anl.gov> wrote:
>>
>>> Chandler,
>>>
>>> Can you please look at this? How do you think we should canonicalize
>>> this (is this the right approach)?
>>>
>>
>> Oof... yea.
>>
>>  I wish I were more confident of what the "right" answer is here any
>> more. =[
>>
>>  At the dev conference, Hal and I talked about a couple of options.  I'm
>> sure which is the "right" one, so let me lay out what I remember of that
>> discussion.
>>
>> Option 1 - Leave the load alone, improve GVN
>> - not necessarily a bad option, but slightly risky in that it involves
>> changes to key infrastructure with little in tree motivation
>> - the original change doing this was rejected
>>
>> Option 2 - Transform to load of component element types
>> - tricky to get layout exactly right, but definitely doable
>>
>> Option 3 - Transform to load of iN where N is sizeof(agg)*8.
>>
>> Option 4 - Transform to series of smaller integer loads
>> - This appears to be what this patch implements.  Not entirely sure why
>> this was chosen.
>>
>> Option 5 - Transform to alloca and memcpy
>> - Clang appears to emit loads of structs via an alloca (for the local)
>> and a memcpy.  The optimizer deconstructs this where appropriate.
>> - I have no idea why Clang chose this option.  My best guess is to
>> exploit information about POD types?
>>
>> Personally, I'd lean towards 5,1,2 above (in roughly that order).  1 & 2
>> seem like better long term solutions, but 5 probably works fairly well
>> today.  I'm not really a fan of either 3 or 4 since we loose things like
>> distinctions between pointers and integers.  If we had to choose, I'd take
>> 3 over 4.
>>
>> I think we also discussed the trade off between a pass and an instcombine
>> transform.  I would lean towards making whichever option we chose a
>> canonicalization rule in instcombine.
>>
>> Also, this patch implements option 2 for a struct with a single element
>> type which seems like a (independently) useful canonicalization.  Should we
>> introduce that transform as a canonicalization in instcombine?  I'd lean
>> towards that.
>>
>> Chandler, Hal - Thoughts, opinions?
>>
>
>  2 seems like the best option to me. When you load and store an FCA, you
> don't get to copy the padding of the struct with you. Once it's loaded,
> each element is its own value. There's no way to recover the padding.
> memcpy doesn't represent this.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20141030/dd296183/attachment.html>