[llvm-commits] PATCH: A new SROA implementation

Thu Aug 30 01:37:25 PDT 2012

As with my responses to Duncan, I'm replying to issues of interest here,
and will post a new patch after incorporating your feedback.

Thanks again for taking the time to look at this.

On Wed, Aug 22, 2012 at 8:35 AM, Chris Lattner <clattner at apple.com> wrote:

> On Aug 21, 2012, at 10:46 PM, Chandler Carruth <chandlerc at gmail.com>
> wrote:
> > Ok, I think this patch is pretty close to being ready to go into the
> tree, modulo any detailed code review folks are willing to do. Just send me
> comments if you have them.
>
> Very nice, I look forward to never seeing an i1024 from SRoA again :)
>
> Here are some initial thoughts, I haven't done a detailed algorithmic
> analysis of this yet:
>
> > I've run it through the nightly test suite now as well as a bootstrap,
> and everything looks better than clear, it looks great. ;]
> >
> > I'm seeing some pretty significant execution performance improvements.
> One test is 27% faster, several range from 2% to 6% faster. Only a couple
> really slow down. I haven't dug in to the performance too much as currently
> we're still running the old SROA in the CGSCC passes.
> >
> > I've still got some code cleanup, minor refactorings I'd like to do, and
> the SSAUpdater work, but I'm trying to not grow the patch on people.
>
> +/// which are merely transfered in and out of external memory remain
> unchanged
>
> typo: transferred.
>
> +/// Because this also performs alloca promotion, it can be thought of as
> a also
>
> "as a also"
>
> Several #includes can be pruned out, GlobalVariable and Function.h at
> least.
>

Yes, I've not pruned the includes from the old pass. Sorry about any
confusion there, I just didn't want to mess w/ compilation errors.

>
>
> +  struct Partition {
> +  struct PartitionUse {
>
> Doxygen comments please.  These look similar, the comment should explain
> what they're for and why they're different.
>

Done, and one better. They now share a common base to factor away the
commonality.

>
> +  /// \brief MemTransferInst auxilliary data.
>
> typo "auxilliary" in a few places.
>
>
>
> I'm not crazy about template instantiation of your entire "BuilderBase"
> class, how much code size does this cost, to what benefit?
>

I don't know, and frankly I didn't even measure it. It's four relatively
small functions, and some instance variables. I mostly factored it out
because I was tired of fixing bugs in two places, and sharing the instance
variables was a real pain.

I can't imagine this is a big deal, at two copies of these functions. If it
is, I would actually blame the compiler. ;] At a fundamental level I don't
see a lot of easy ways to make the base class not be templated. I tried a
few times and it always ended up really uglying the code up....

>
> +static Value *getNaturalGEPWithType(IRBuilder<> IRB, const TargetData &TD,
> +                                    Value *BasePtr, Type *Ty, Type
> *TargetTy,
> +                                    SmallVectorImpl<Value *> &Indices,
> +                                    const Twine &Prefix) {
> ...
> +    if (SequentialType *SeqTy = dyn_cast<SequentialType>(ElementTy)) {
>
> Do we really want to form GEP's into vectors?
>

Hmm, I suppose I would be OK having the rewriter canonicalize *all* vector
accesses as whole vector accesses with insert element and extract element.
However, in the case that this is buried within a much larger unpromoteable
structure, and we may never access the vector as a whole, I worry a bit
about actually making the code bigger as a consequence.

It seems more important to work hard at doing this when there is a chance
of promotion.

>
> +class SROA : public FunctionPass {
> +bool SROA::rewriteAllocaPartition(AllocaInst &AI,
> ...
>
> Doxygen comments please.
>

Yes, many many doxygen comments (and non-doxygen comments) have been added.

Is my mixture of doxygen / non-doxygen comments problematic? I tend to use
doxygen comments whenever the comment is really documenting an interface,
and not when the comment only makes since as an aid to the reader of the
code itself.

>
>
> getNaturalGEPWithType and others can be simplified through the use of
> IRB.getInt32()
>

Wow, yea. I'm sad I kept missing these helpers when reading IRBuilder.

> getNaturalGEPWithOffset, getAdjustedPtr, etc should pass APInts by const
> reference.
>
> TargetTy != IRB.getInt8Ty()
> -> should be "!TargetTy->isIntegerTy(8)" for efficiency.
>
>
> +    // Find the operand we need to rewrite here.
> +    User::op_iterator OI = PN.op_begin(), OE = PN.op_end();
> +    for (; OI != OE && *OI != OldPtr; ++OI)
> +      ;
> +    assert(OI != OE);
>
> Is it feasible to not lose the Use index?  PHI nodes can have a lot of
> operands to scan though.
>

Well, this code is really just broken.

There isn't a single index -- the PHI might use the same pointer in several
different operands. We unique stuff so we only will visit the PHI node once
for a particular used value, but we need to replace that value anywhere
within the PHI. I've fixed the code to do this. The only other way I can
see to write this is to walk the use list of the OldPtr and check each User
is the PHI node? Not sure that would be any cheaper.

Note that we also do a full linear walk of the operands when doing
speculation. =/

> bool SROA::runOnAlloca(AllocaInst &AI) {
> ...
>
> This code handles the case when something is a trivially promotable scalar
> (an important base case for SRoA) but it doesn't handle the case where we
> have a trivial aggregate aggregate being accessed in a "type safe" way.  Is
> this an interesting case to handle specially to improve compile time?
>

Can you clarify what you mean here by 'trivial aggregate aggregate being
accessed in a "type safe" way"? I'm not sure which exact case you're
referring to.

> +  // No partitions to split. Leave the dead alloca for a later pass to
> clean up.
> +  if (P.begin() == P.end())
>
> P.empty()?
>

One more method I have to add to AllocaPartitioning. With only one callsite
it hardly seemed worth it.

>
> +void SROA::deleteDeadInstructions() {
>
> Is this the same as RecursivelyDeleteTriviallyDeadInstructions?

Yep, it's just an attempt to decrease the malloc traffic by re-using the
nice scratch space we already have. I also cheat and clear some of the
state at the same time. It also keeps nice statistics. Not sure how much
its worth re-using the recursive function, the bulk of the work is in
isInstructionTriviallyDead which it already uses.

>  Perhaps RecursivelyDeleteTriviallyDeadInstructions could be enhanced to
> take an optional pointer to a vector, where (when present) it records all
> the instruction*'s it deletes?
>

Yea, I can see a way to factor the existing utility function to be what I
want here. I can do that if you want either now or later.

> It would be nice if the "load speculation" stuff was pulled out of
> AllocaPartitionRewriter as some sort of orthogonal transformation.  Perhaps
> the initial analysis of an alloca can build a list of "selects and phis
> that need to be speculated" in order to unblock the transformation, and
> they could be processed independently of the other heavy lifting going on.
>

This is definitely something I think should happen eventually. There should
be a simple function call that does the speculation for a PHI node or a
select. That said, I don't really want to do this until the pass stabilizes
a bit in the interest of avoiding hoisting and abstracting the wrong logic
here.

> It might make sense to split this pass up into two files to separate the
> partition analysis code from the transformation and utility code.  Just a
> single header file (in Transforms/Scalar) might be enough, or go with
> another header + .cpp file in Transforms/Scalar.  What do you think?
>

Meh. I mean, I see how I could split it up like that, definitely, but it
doesn't quite seem worth it at this stage. Maybe later as it gets more
complicated?

I would vote doing this later if at all. It'd just get in my way to have it
in multiple files right now.

> FYI, on a random note, Dan Gohman is looking enhancing memcpy to get TBAA
> and "hole" information per PR12260/PR11946, which seems like it should drop
> in nicely.

Yep. I have been talking with him (and thanks for forwarding the proposal
while the mailing lists were down). I actually have a counter proposal for
one particular aspect of it I need to write up. ;] Anyways, thanks for the
feedback. I'm going to try to squash a few crashers Duncan found and then
post an updated patch.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20120830/a32ec2fb/attachment.html>