[PATCH][SROA] Also slice the STORE when slicing a LOAD in AllocaSliceRewriter
chandlerc at gmail.com
Tue Sep 2 02:43:04 PDT 2014
On Tue, Sep 2, 2014 at 2:26 AM, James Molloy <james at jamesmolloy.co.uk> wrote:
> So my (biased) summary would be:
> *Wide stores*
> + Preserve semantic information about consecutive/wide accesses
> + Users can already write them, so we have to handle them somehow anyway
> - We don't currently handle them well in all cases in the vectorizer or in
> the code generator.
> *Narrow stores*
> + Fewer constraints in the IR, provides more flexibility for optimizers
I don't understand this at all. The optimizer has *less* flexibility here.
Perhaps what you mean to say is that the optimizer already tends to
generate good code for these? That much is true.
> + IR closer matches expected generated machine code - IR-based
> heuristics more accurate
I mean, sure. But this seems pretty insignificant to me. I don't understand
why jump-threading would care. I don't think the inliner would care enough
for it to ever matter.
If you want to tilt at this windmill, there are just piles of places where
we diverge more wildly. For example, the existence of bitcasts. Or any of
the illegal operations on vector types that will cause an explosion of
machine code during legalization.
> - Have to write code to split up wide stores into narrow stores, if
> deemed useful (if they come from an OR/SHL?)
And fix the *myriad* places where we suddenly stop re-combining
this arithmetic later on. Passes like instcombine can reason about a single
store being fed by this arithmetic *fundamentally better* than reasoning
about two stores having consecutive pointers. How would you even teach it
about such pointers?
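
To make the two shapes under discussion concrete, here is a minimal sketch in C rather than IR. It is not taken from the patch: the helper names (store_wide, store_narrow, forms_agree, is_little_endian) are made up for illustration. The wide form is a single 64-bit store whose value is one shift/or expression; the narrow form is two 32-bit stores whose relationship exists only through their consecutive addresses.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: detect host byte order at run time. */
static int is_little_endian(void) {
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Wide form: one 64-bit store. The stored value is a single
 * expression built from the two halves with a shift and an or. */
static void store_wide(unsigned char *p, uint32_t lo, uint32_t hi) {
    uint64_t v = ((uint64_t)hi << 32) | (uint64_t)lo;
    memcpy(p, &v, sizeof v);
}

/* Narrow form: two 32-bit stores to consecutive addresses.
 * Which half lands at the lower address depends on byte order. */
static void store_narrow(unsigned char *p, uint32_t lo, uint32_t hi) {
    size_t lo_off = is_little_endian() ? 0 : 4;
    memcpy(p + lo_off, &lo, sizeof lo);
    memcpy(p + (4 - lo_off), &hi, sizeof hi);
}

/* Returns 1 when both forms leave identical bytes in memory. */
static int forms_agree(uint32_t lo, uint32_t hi) {
    unsigned char a[8], b[8];
    store_wide(a, lo, hi);
    store_narrow(b, lo, hi);
    return memcmp(a, b, sizeof a) == 0;
}
```

Both forms write the same eight bytes. In the wide form, the fact that the halves belong together is carried by one value; in the narrow form it has to be rediscovered from the two pointers.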
> - Have to reconstruct consecutive pointer information, we already do
> this but has the potential to fail in some cases.
I guess maybe this is where you were hinting at the above problem.
Once you start slicing up memory accesses, *you break SSA form* and all of
the analyses that depend on it. I cannot express how strongly I feel this
is a very bad idea and the wrong direction in the middle end.
> *An alternative?*
> * If the above hasn't convinced you, how about an intrinsic that
> concatenates operands into memory? This could preserve the semantics and
> also can be inspected and treated differently in the vectorizers (and
> doesn't require an OR/SHL sequence).
> declare void @llvm.store.wide.i64(i64* %ptr, ...)
I really don't know why we wouldn't just match the bit-math sequences that
form this? Is there something that makes matching these patterns really
deeply problematic? I understand that the DAG may just be missing the
information due to the basic block boundary, but the vectorizers should
definitely be able to reconstruct it, as should stuff like codegenprep and