[PATCH] Add basic support for removal of load that are fed by a store of an aggregate

Chandler Carruth chandlerc at google.com
Fri Jan 2 16:23:05 PST 2015


On Fri, Jan 2, 2015 at 4:11 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:

> So, basically you operate on an aggregate that contains two i8. In one
> case it is copied using memcpy and the other it is copied using load/store.
> What triggers your “bug” is that SROA does not handle these two copies the
> same way.
> The memcpy version is turned into two independent i8 stores:
>
>   %v.sroa.0.0.dst.sroa_idx = getelementptr inbounds { i8, i8 }* %0, i64 0, i32 0
>   store i8 1, i8* %v.sroa.0.0.dst.sroa_idx, align 1
>   %v.sroa.2.0.dst.sroa_idx = getelementptr inbounds { i8, i8 }* %0, i64 0, i32 1
>   store i8 2, i8* %v.sroa.2.0.dst.sroa_idx, align 1
>
> while the load/store version does not change the store:
>
>   %tmp.fca.0.insert = insertvalue { i8, i8 } undef, i8 1, 0
>   %tmp.fca.1.insert = insertvalue { i8, i8 } %tmp.fca.0.insert, i8 2, 1
>   store { i8, i8 } %tmp.fca.1.insert, { i8, i8 }* %0
>
>
> Now I am not sure if SROA shouldn’t produce the same result for the two
> inputs in this case? (i.e. splitting the store to the aggregate in store to
> the individual element)
> If not then teaching GVN about this case is probably correct.
>

I think it's neither...

The store that remains has nothing to do with an alloca. It is just a store
off to wild memory as an FCA. SROA shouldn't be touching it, and I don't
think we want to try to teach the entire optimizer about FCAs.

You're teaching GVN about FCAs in this patch, but we also reason about
store-to-load forwarding in *many* other places. I don't think it is really
feasible to teach every part of LLVM this.
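
To make the forwarding problem concrete, here is a rough sketch of the kind
of pattern I have in mind. The value names (%agg, %p, %q, %x) are made up
for illustration and are not taken from the patch:

  ; An FCA store followed by a load of one of its elements.
  store { i8, i8 } %agg, { i8, i8 }* %p
  %q = getelementptr inbounds { i8, i8 }* %p, i64 0, i32 1
  %x = load i8* %q, align 1
  ; To forward here, a pass has to know that %x is element 1 of %agg,
  ; i.e. that it could be replaced with:
  ;   %x = extractvalue { i8, i8 } %agg, 1

Every pass that does forwarding would need its own version of that
reasoning.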

The alternative is to define the canonical form as extracting the values
from the FCA and storing them individually. SROA does this for loads and
stores into allocas as a matter of correctness, but we could also teach
instcombine to do this for all FCA loads and stores as a matter of
canonical form and optimization. I think that is probably the right
direction long term, but I'm a little scared of the downstream
ramifications.
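
As a sketch only (names are illustrative, and this is not necessarily the
exact output such a transform would produce), the canonical form for a
store of the same { i8, i8 } type would look roughly like this:

  ; Before: a single FCA store.
  store { i8, i8 } %agg, { i8, i8 }* %p

  ; After: extract each element and store it individually.
  %agg.0 = extractvalue { i8, i8 } %agg, 0
  %p.0 = getelementptr inbounds { i8, i8 }* %p, i64 0, i32 0
  store i8 %agg.0, i8* %p.0, align 1
  %agg.1 = extractvalue { i8, i8 } %agg, 1
  %p.1 = getelementptr inbounds { i8, i8 }* %p, i64 0, i32 1
  store i8 %agg.1, i8* %p.1, align 1

and an FCA load would be handled symmetrically:

  ; Before: a single FCA load.
  %v = load { i8, i8 }* %p

  ; After: load each element and rebuild the aggregate with insertvalue.
  %p.0 = getelementptr inbounds { i8, i8 }* %p, i64 0, i32 0
  %v.0 = load i8* %p.0, align 1
  %p.1 = getelementptr inbounds { i8, i8 }* %p, i64 0, i32 1
  %v.1 = load i8* %p.1, align 1
  %v.i0 = insertvalue { i8, i8 } undef, i8 %v.0, 0
  %v = insertvalue { i8, i8 } %v.i0, i8 %v.1, 1

Once everything is in this form, the existing scalar store-to-load
forwarding applies without any FCA-specific logic.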

The patch is trivial, and the code is already in SROA. Let me factor it out
and I can post a patch to see if it also solves your problems. But I want
to spend some time looking at what knock-on effects this has and whether
they are reasonable.


As a side note, I don't know what Rust's ABI concerns are, but if at all
possible, I would suggest moving away from FCAs in the frontend. Everything
I have seen in LLVM suggests that they obstruct optimization in significant
ways.