Add a pass to convert aggregate loads/stores into scalar load stores
Philip Reames
listmail at philipreames.com
Fri Oct 31 12:00:19 PDT 2014
On 10/30/2014 04:31 PM, Reid Kleckner wrote:
>
>
> On Thu, Oct 30, 2014 at 2:08 PM, Philip Reames
> <listmail at philipreames.com <mailto:listmail at philipreames.com>> wrote:
>
>
> On 10/28/2014 09:27 AM, Chandler Carruth wrote:
>>
>> On Tue, Oct 28, 2014 at 6:55 AM, Hal Finkel <hfinkel at anl.gov
>> <mailto:hfinkel at anl.gov>> wrote:
>>
>> Chandler,
>>
>> Can you please look at this? How do you think we should
>> canonicalize this (is this the right approach)?
>>
>>
>> Oof... yea.
>>
>> I wish I were more confident of what the "right" answer is here
>> any more. =[
> At the dev conference, Hal and I talked about a couple of
> options. I'm sure which is the "right" one, so let me lay out
> what I remember of that discussion.
>
> Option 1 - Leave the load alone, improve GVN
> - not necessarily a bad option, but slightly risky in that it
> involves changes to key infrastructure with little in tree motivation
> - the original change doing this was rejected
>
> Option 2 - Transform to load of component element types
> - tricky to get layout exactly right, but definitely doable
>
> Option 3 - Transform to load of iN where N is sizeof(agg)*8.
>
> Option 4 - Transform to series of smaller integer loads
> - This appears to be what this patch implements. Not entirely
> sure why this was chosen.
>
> Option 5 - Transform to alloca and memcpy
> - Clang appears to emit loads of structs via an alloca (for the
> local) and a memcpy. The optimizer deconstructs this where
> appropriate.
> - I have no idea why Clang chose this option. My best guess is to
> exploit information about POD types?
>
> Personally, I'd lean towards 5,1,2 above (in roughly that order).
> 1 & 2 seem like better long term solutions, but 5 probably works
> fairly well today. I'm not really a fan of either 3 or 4 since we
> loose things like distinctions between pointers and integers. If
> we had to choose, I'd take 3 over 4.
>
> I think we also discussed the trade off between a pass and an
> instcombine transform. I would lean towards making whichever
> option we chose a canonicalization rule in instcombine.
>
> Also, this patch implements option 2 for a struct with a single
> element type which seems like a (independently) useful
> canonicalization. Should we introduce that transform as a
> canonicalization in instcombine? I'd lean towards that.
>
> Chandler, Hal - Thoughts, opinions?
>
>
> 2 seems like the best option to me.
As others have pointed out, 2 is not ideal in that it looses information
about the widest legal load or store.
> When you load and store an FCA, you don't get to copy the padding of
> the struct with you. Once it's loaded, each element is its own value.
> There's no way to recover the padding. memcpy doesn't represent this.
Reed, I'm a bit confused here. Do you mean the padding *around* a
struct, or the padding *within* a struct? For the later, with data
layout our existing memcpy based optimizations from clang seem to deal
with it just fine. Do you have a particular test case in mind which
shows the memcpy approach not working?
This is what clang appears to use today. If you have a case it doesn't
work for, we should fix that. :)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20141031/c82d89de/attachment.html>
More information about the llvm-commits
mailing list