[PATCH][SROA] Also slice the STORE when slicing a LOAD in AllocaSliceRewriter

Fri Aug 29 02:23:09 PDT 2014

On Wed, Aug 27, 2014 at 8:46 PM, Hao Liu <Hao.Liu at arm.com> wrote:

> I think your concern is that when the narrow LOADs are folded back, we can’t
> fuse the narrow STOREs back. Do I get your point?
>

No, my concern is about when we completely remove the loads through GVN or
even the mem2reg process that runs in SROA itself. The sliced loads are
expected to go away and become SSA registers. We might well be able to then
fold away the zext / shl / or / etc into the math that feeds those SSA
values. But we will in most cases fail to fuse the stores back together
once they are split.

> But I think the narrow LOADs won’t be folded back, as SROA checks that such
> LOADs can be split and removed. So if the narrow LOADs no longer exist, there
> are two choices for us:
>
> (1)    The additional ZEXT/SHL/OR and wide STORE.
>
> (2)    Two narrow STOREs.
>
> I still prefer (2) over (1). I think a better optimization for (1)
> is to split the STORE. Even if there are other optimizations to change the
> wide STORE, we’ll still have ZEXT/SHL/OR left. Such code with bit math
> does not seem to be the best choice.
>

I'm not really sure what you mean about having instructions left over.

The fundamental thing is this: the width of memory stored to is actually a
very important property of the source program. It clarifies the maximum
width of memory that is correct to store to in a single instruction.
Splitting or narrowing a store is often irreversible because fusing or
widening can introduce data races.
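
To make the data race point concrete, here is a minimal C++ sketch of one way
a widened store can go wrong. Everything in it is hypothetical and made up for
illustration (the Pair struct, the function names, the values 5 and 7); it is
not taken from the patch. It replays a single bad interleaving on one thread,
so it is deterministic and contains no real race.

```cpp
#include <cstdint>

// Hypothetical shared object: two adjacent 32-bit fields. In the C++ memory
// model these are two distinct memory locations, so different threads may
// write them concurrently without a data race.
struct Pair {
  std::uint32_t lo;
  std::uint32_t hi;
};

// What the source program asked for: a 32-bit store that touches only `lo`.
void storeLoNarrow(Pair &shared, std::uint32_t value) { shared.lo = value; }

// Replays, sequentially, an interleaving in which thread A's store to `lo`
// has been widened into a read-modify-write of the whole 64-bit pair while
// thread B performs its own narrow store to `hi`.
Pair replayWidenedStore() {
  Pair shared{0, 0};
  Pair snapshot = shared;  // thread A: read both fields
  shared.hi = 7;           // thread B: narrow store to `hi`
  snapshot.lo = 5;         // thread A: merge the new `lo` into the snapshot
  shared = snapshot;       // thread A: wide write-back, `hi` is stale again
  return shared;
}

// The same two writes with the original narrow stores: neither is lost.
Pair replayNarrowStores() {
  Pair shared{0, 0};
  storeLoNarrow(shared, 5);  // thread A
  shared.hi = 7;             // thread B
  return shared;
}
```

In the widened replay thread B's update to `hi` is lost, while the narrow
version keeps both writes. That is why the store width in the source tells
the optimizer which bytes it is allowed to write.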

As a consequence, it is a conscious choice throughout the optimizer to
preserve the maximal width of stores (and to a lesser extent loads). This
preserves the information in the middle-end about what freedoms the source
program has w.r.t. memory accesses and data races.

Also, with the IR produced by SROA, the information needed is still
present. I think the problem is that both backends need to be taught the
trick of using multiple stores at indexed offsets to avoid the math that
combines two values. That's my suggestion for how to improve the quality of
code for these patterns.
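
As a rough illustration of the two forms, here is a C++ sketch (not LLVM IR
and not backend code) showing that combining two 32-bit values with
zext/shl/or and storing 64 bits writes the same bytes as storing each value
at its own offset. The function names and the 32/64-bit widths are
assumptions chosen for the example; the offsets are picked from a runtime
byte-order probe so the comparison holds on both little- and big-endian
hosts.

```cpp
#include <cstdint>
#include <cstring>

// Probe the host byte order by looking at the first byte of a 16-bit 1.
bool hostIsLittleEndian() {
  const std::uint16_t probe = 1;
  unsigned char firstByte;
  std::memcpy(&firstByte, &probe, 1);
  return firstByte == 1;
}

// Form (1): zext both halves, shl the high one, or them together, then do a
// single 64-bit store.
void storeCombined(unsigned char *dst, std::uint32_t lo, std::uint32_t hi) {
  const std::uint64_t wide =
      static_cast<std::uint64_t>(lo) | (static_cast<std::uint64_t>(hi) << 32);
  std::memcpy(dst, &wide, sizeof wide);
}

// Form (2): two 32-bit stores at fixed offsets, with no combining math.
void storeAtOffsets(unsigned char *dst, std::uint32_t lo, std::uint32_t hi) {
  const bool littleEndian = hostIsLittleEndian();
  std::memcpy(dst + (littleEndian ? 0 : 4), &lo, sizeof lo);
  std::memcpy(dst + (littleEndian ? 4 : 0), &hi, sizeof hi);
}

// Returns true when both forms leave identical bytes in memory.
bool formsAgree(std::uint32_t lo, std::uint32_t hi) {
  unsigned char combined[8];
  unsigned char split[8];
  storeCombined(combined, lo, hi);
  storeAtOffsets(split, lo, hi);
  return std::memcmp(combined, split, sizeof combined) == 0;
}
```

Since the bytes written are identical, a backend that sees the combined form
can still choose to emit the two indexed stores, whereas the middle-end keeps
the wide store and the information that comes with it.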