[PATCH][SROA] Also slice the STORE when slicing a LOAD in AllocaSliceRewriter
chandlerc at gmail.com
Tue Sep 2 00:27:36 PDT 2014
On Tue, Sep 2, 2014 at 12:11 AM, Hao Liu <Hao.Liu at arm.com> wrote:
> Hi James & Chandler,
> I have two small test cases to show James’ first concern. Test results
> show that the loop vectorizer generates quite poor code for the wide store.
> To see the result, run the following command lines:
> opt -S -loop-vectorize < wide-store.ll
> opt -S -loop-vectorize < narrow-stores.ll
> The wide-store.ll and narrow-stores.ll are generated from the attached
> struct.cpp with or without my patch. This cpp case is simplified
> from a hot function in SPEC CPU 2006 473.astar. Currently the poor code
> affects the performance.
> Hi Chandler,
> I also agree with your concern. On the other hand, if the input is
> zext/shl/or and a wide store, the patch in SROA cannot handle such a case.
> For example, if the input is wide-store.ll, only a separate pass or
> function specific to such cases can generate simpler code.
> But there is a conflict: even if we add code in the backend, we still
> can’t solve the problem of the wide store affecting loop
> vectorization. For this reason, I think maybe we should prefer narrow
> stores to a wide store.
Before I dig into trying to explain various ways it is or isn't possible to
generate better code with the wide stores, I think it is really important
to understand why you aren't concerned about the memory model implications
here which cause us to *lose information* in the IR when splitting stores.
Once fundamental invariants of the program are lost, they simply cannot be
recovered. This seems to me to be the overriding concern. The fact that we
need to improve lots of other parts of LLVM -- well, yes, we need to do
lots of improvements to LLVM.
And none of these improvements seem bad. A user could just as easily have
written this kind of wide store in their code, and we will fail to optimize
it in all the ways you outline. No changes to SROA will fix this. We can
only emit efficient code when the *user* provides a wide store by actually
teaching the optimizer to analyze and emit efficient code for it. Once we
do that, we have also solved the "problem" for SROA.
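
For readers following along, here is a minimal sketch of the two store
shapes being discussed. It is not taken from the attached struct.cpp; the
names Pair, store_narrow, store_wide and check_equivalent are hypothetical,
and the wide form assumes a little-endian target and a Pair with no padding.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical two-field struct; not the one from struct.cpp.
struct Pair {
  std::uint32_t x;
  std::uint32_t y;
};
static_assert(sizeof(Pair) == 8, "sketch assumes no padding in Pair");

// Narrow form: one 32-bit store per field.
void store_narrow(Pair *dst, std::uint32_t x, std::uint32_t y) {
  dst->x = x;
  dst->y = y;
}

// Wide form written by hand: widen both values (zext), shift one into the
// high half (shl), combine them (or), then write all 8 bytes at once.
// Assumes little-endian layout, so the low half lands in the x field.
void store_wide(Pair *dst, std::uint32_t x, std::uint32_t y) {
  std::uint64_t wide = static_cast<std::uint64_t>(x) |
                       (static_cast<std::uint64_t>(y) << 32);
  std::memcpy(dst, &wide, sizeof wide);
}

// Sanity check: both forms leave identical bytes in memory.
void check_equivalent(std::uint32_t x, std::uint32_t y) {
  Pair a{}, b{};
  store_narrow(&a, x, y);
  store_wide(&b, x, y);
  assert(std::memcmp(&a, &b, sizeof(Pair)) == 0);
}
```

The second function is the kind of wide store a user could write directly,
so the optimizer would still need to handle it well regardless of what SROA
emits.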