[PATCH][SROA] Also slice the STORE when slicing a LOAD in AllocaSliceRewriter

Tue Sep 2 03:46:58 PDT 2014

Hi all,

I've just thrashed this out further with Chandler on IRC, and thought I'd
summarize what came up here. Chandler will rebut if I make a mistake, I'm
sure :)

The main point is that there are instcombines and a bunch of
arithmetic-simplifying optimizations that rely on SROA producing wide
stores. Strictly, what matters is not that the store is wide but that the
stored value is a single merged SSA node.

%a = i32 ...
%b = i32 ...
%c = shl i64 (zext i32 %a to i64), 32
%d = or i64 %c, (zext i32 %b to i64)
store i64 %d, i64* %p

%e = load i64* %p
...

In the above example, optimizers can follow the load from %p back through
the store and end up with the two constituent i32s, which they can then do
magic with. In the alternative scenario:

%a = i32 ...
%b = i32 ...
%q = bitcast i64* %p to i32*
store i32 %a, i32* (getelementptr i32* %q, i32 0)
store i32 %b, i32* (getelementptr i32* %q, i32 1)

%e = load i64* %p
...

The optimizer can no longer easily see that %e is the concatenation of %a
and %b.
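
To make the arithmetic concrete, here is a small Python sketch of the kind
of identity an optimizer can exploit once it knows %e is the shl/or
concatenation of %a and %b. This is not LLVM code: the helper names
(concat, high_half, low_half) are made up for illustration, and it only
models the integer arithmetic of the first example, not memory layout or
any real pass.

```python
MASK32 = 0xFFFFFFFF  # models truncation to i32


def concat(a, b):
    """Model of %d: (zext(a) << 32) | zext(b), as a 64-bit value."""
    return ((a & MASK32) << 32) | (b & MASK32)


def high_half(e):
    """Model of trunc(lshr(e, 32)) to i32."""
    return (e >> 32) & MASK32


def low_half(e):
    """Model of trunc(e) to i32."""
    return e & MASK32


a, b = 0xDEADBEEF, 0x12345678
e = concat(a, b)  # plays the role of %e once the store is forwarded to the load

# With the merged value visible, both extractions fold back to the inputs.
assert high_half(e) == a
assert low_half(e) == b
```

With two separate narrow stores there is no single value like concat(a, b)
for the load to be forwarded from, so these folds are not directly
available.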

This is an important property, and is the main reason for not splitting
wide stores to match their loads. Most importantly, this optimization
happens in InstCombine and we run InstCombine after SLP and Loop
Vectorization, which means the IR should be in this form at least up until
the end of vectorization (or we lose this optimization after vectorization).

This then means that we need to teach the vectorizers and code metrics
about the cost of splitting concatenated stores, which is likely to be
awkward but we are left with little choice.

Cheers,

James

On 2 September 2014 11:20, Jiangning Liu <liujiangning1 at gmail.com> wrote:

> Hi Chandler,
>
>
>> Once you start slicing up memory accesses, *you break SSA form* and all
>> of the analyses that depend on it. I cannot express how strongly I feel
>> this is a very bad idea and the wrong direction in the middle end.
>>
>>
> What do you mean by "you break SSA form" for the case Hao's patch is
> solving? Do you mean some SSA form info could be lost?
>
> The transformation of Hao's patch is to change a single wide store to two
> separate narrow stores, but the address of those two narrow stores are
> still sequential. For the memory stored here we don't have any SSA
> information attached at all, right?
>
> So what SSA form information could be lost? And what optimization could be
> affected? Can you give an example?
>
> Thanks,
> -Jiangning
>