[llvm-commits] PATCH: A new SROA implementation

Thu Sep 13 03:15:25 PDT 2012

Hello all,

Here is an updated patch. It contains all of the fixes I discussed here,
and it also contains some significant updates:

- Now has additionally survived intensive testing by Duncan. All crashers
found with all the dragonegg test suites (including many ada tests etc) are
fixed. The initial and immediate miscompile issues fixed. It's looking
pretty good testing wise.

- *Many* test cases added. Lots of bugs fixed. Generally, quite a bit of
hardening here. Duncan found several soft areas that I then was able to
more thoroughly exercise.

I think this is getting *really* close to being ready for an initial
commit. It is still behind a default-off flag. It will stay there for a
bit. In particular I want to get more testing, I want to implement
SSAUpdater-logic and tie it into the rest of the pass manager and collect
final performance results before enabling it. But I don't want to keep
hacking on this out of tree. =]

Anyways, review greatly appreciated.

On Thu, Aug 30, 2012 at 1:37 AM, Chandler Carruth <chandlerc at gmail.com>wrote:

> As with my responses to Duncan, I'm replying to issues of interest here,
> and will post a new patch after incorporating your feedback.
>
> Thanks again for taking the time to look at this.
>
> On Wed, Aug 22, 2012 at 8:35 AM, Chris Lattner <clattner at apple.com> wrote:
>
>> On Aug 21, 2012, at 10:46 PM, Chandler Carruth <chandlerc at gmail.com>
>> wrote:
>> > Ok, I think this patch is pretty close to being ready to go into the
>> tree, modulo any detailed code review folks are willing to do. Just send me
>> comments if you have them.
>>
>> Very nice, I look forward to never seeing an i1024 from SRoA again :)
>>
>> Here are some initial thoughts, I haven't done a detailed algorithmic
>> analysis of this yet:
>>
>> > I've run it through the nightly test suite now as well as a bootstrap,
>> and everything looks better than clear, it looks great. ;]
>> >
>> > I'm seeing some pretty significant execution performance improvements.
>> One test is 27% faster, several range from 2% to 6% faster. Only a couple
>> really slow down. I haven't dug in to the performance too much as currently
>> we're still running the old SROA in the CGSCC passes.
>> >
>> > I've still got some code cleanup, minor refactorings I'd like to do,
>> and the SSAUpdater work, but I'm trying to not grow the patch on people.
>>
>> +/// which are merely transfered in and out of external memory remain
>> unchanged
>>
>> typo: transferred.
>>
>> +/// Because this also performs alloca promotion, it can be thought of as
>> a also
>>
>> "as a also"
>>
>> Several #includes can be pruned out, GlobalVariable and Function.h at
>> least.
>>
>
> Yes, I've not pruned the includes from the old pass. Sorry about any
> confusion there, I just didn't want to mess w/ compilation errors.
>
>
>>
>>
>> +  struct Partition {
>> +  struct PartitionUse {
>>
>> Doxygen comments please.  These look similar, the comment should explain
>> what they're for and why they're different.
>>
>
> Done, and one better. They now share a common base to factor away the
> commonality.
>
>
>>
>> +  /// \brief MemTransferInst auxilliary data.
>>
>> typo "auxilliary" in a few places.
>>
>>
>>
>> I'm not crazy about template instantiation of your entire "BuilderBase"
>> class, how much code size does this cost, to what benefit?
>>
>
> I don't know, and frankly I didn't even measure it. It's four relatively
> small functions, and some instance variables. I mostly factored it out
> because I was tired of fixing bugs in two places, and sharing the instance
> variables was a real pain.
>
> I can't imagine this is a big deal, at two copies of these functions. If
> it is, I would actually blame the compiler. ;] At a fundamental level I
> don't see a lot of easy ways to make the base class not be templated. I
> tried a few times and it always ended up really uglying the code up....
>
>
>>
>> +static Value *getNaturalGEPWithType(IRBuilder<> IRB, const TargetData
>> &TD,
>> +                                    Value *BasePtr, Type *Ty, Type
>> *TargetTy,
>> +                                    SmallVectorImpl<Value *> &Indices,
>> +                                    const Twine &Prefix) {
>> ...
>> +    if (SequentialType *SeqTy = dyn_cast<SequentialType>(ElementTy)) {
>>
>> Do we really want to form GEP's into vectors?
>>
>
> Hmm, I suppose I would be OK having the rewriter canonicalize *all* vector
> accesses as whole vector accesses with insert element and extract element.
> However, in the case that this is buried within a much larger unpromoteable
> structure, and we may never access the vector as a whole, I worry a bit
> about actually making the code bigger as a consequence.
>
> It seems more important to work hard at doing this when there is a chance
> of promotion.
>
>
>>
>> +class SROA : public FunctionPass {
>> +bool SROA::rewriteAllocaPartition(AllocaInst &AI,
>> ...
>>
>> Doxygen comments please.
>>
>
> Yes, many many doxygen comments (and non-doxygen comments) have been added.
>
> Is my mixture of doxygen / non-doxygen comments problematic? I tend to use
> doxygen comments whenever the comment is really documenting an interface,
> and not when the comment only makes since as an aid to the reader of the
> code itself.
>
>
>>
>>
>> getNaturalGEPWithType and others can be simplified through the use of
>> IRB.getInt32()
>>
>
> Wow, yea. I'm sad I kept missing these helpers when reading IRBuilder.
>
>
>> getNaturalGEPWithOffset, getAdjustedPtr, etc should pass APInts by const
>> reference.
>>
>> TargetTy != IRB.getInt8Ty()
>> -> should be "!TargetTy->isIntegerTy(8)" for efficiency.
>>
>>
>> +    // Find the operand we need to rewrite here.
>> +    User::op_iterator OI = PN.op_begin(), OE = PN.op_end();
>> +    for (; OI != OE && *OI != OldPtr; ++OI)
>> +      ;
>> +    assert(OI != OE);
>>
>> Is it feasible to not lose the Use index?  PHI nodes can have a lot of
>> operands to scan though.
>>
>
> Well, this code is really just broken.
>
> There isn't a single index -- the PHI might use the same pointer in
> several different operands. We unique stuff so we only will visit the PHI
> node once for a particular used value, but we need to replace that value
> anywhere within the PHI. I've fixed the code to do this. The only other way
> I can see to write this is to walk the use list of the OldPtr and check
> each User is the PHI node? Not sure that would be any cheaper.
>
> Note that we also do a full linear walk of the operands when doing
> speculation. =/
>
>
>> bool SROA::runOnAlloca(AllocaInst &AI) {
>> ...
>>
>> This code handles the case when something is a trivially promotable
>> scalar (an important base case for SRoA) but it doesn't handle the case
>> where we have a trivial aggregate aggregate being accessed in a "type safe"
>> way.  Is this an interesting case to handle specially to improve compile
>> time?
>>
>
> Can you clarify what you mean here by 'trivial aggregate aggregate being
> accessed in a "type safe" way"? I'm not sure which exact case you're
> referring to.
>
>
>> +  // No partitions to split. Leave the dead alloca for a later pass to
>> clean up.
>> +  if (P.begin() == P.end())
>>
>> P.empty()?
>>
>
> One more method I have to add to AllocaPartitioning. With only one
> callsite it hardly seemed worth it.
>
>
>>
>> +void SROA::deleteDeadInstructions() {
>>
>> Is this the same as RecursivelyDeleteTriviallyDeadInstructions?
>
>
> Yep, it's just an attempt to decrease the malloc traffic by re-using the
> nice scratch space we already have. I also cheat and clear some of the
> state at the same time. It also keeps nice statistics. Not sure how much
> its worth re-using the recursive function, the bulk of the work is in
> isInstructionTriviallyDead which it already uses.
>
>
>>  Perhaps RecursivelyDeleteTriviallyDeadInstructions could be enhanced to
>> take an optional pointer to a vector, where (when present) it records all
>> the instruction*'s it deletes?
>>
>
> Yea, I can see a way to factor the existing utility function to be what I
> want here. I can do that if you want either now or later.
>
>
>> It would be nice if the "load speculation" stuff was pulled out of
>> AllocaPartitionRewriter as some sort of orthogonal transformation.  Perhaps
>> the initial analysis of an alloca can build a list of "selects and phis
>> that need to be speculated" in order to unblock the transformation, and
>> they could be processed independently of the other heavy lifting going on.
>>
>
> This is definitely something I think should happen eventually. There
> should be a simple function call that does the speculation for a PHI node
> or a select. That said, I don't really want to do this until the pass
> stabilizes a bit in the interest of avoiding hoisting and abstracting the
> wrong logic here.
>
>
>> It might make sense to split this pass up into two files to separate the
>> partition analysis code from the transformation and utility code.  Just a
>> single header file (in Transforms/Scalar) might be enough, or go with
>> another header + .cpp file in Transforms/Scalar.  What do you think?
>>
>
> Meh. I mean, I see how I could split it up like that, definitely, but it
> doesn't quite seem worth it at this stage. Maybe later as it gets more
> complicated?
>
> I would vote doing this later if at all. It'd just get in my way to have
> it in multiple files right now.
>
>
>> FYI, on a random note, Dan Gohman is looking enhancing memcpy to get TBAA
>> and "hole" information per PR12260/PR11946, which seems like it should drop
>> in nicely.
>
>
> Yep. I have been talking with him (and thanks for forwarding the proposal
> while the mailing lists were down). I actually have a counter proposal for
> one particular aspect of it I need to write up. ;] Anyways, thanks for the
> feedback. I'm going to try to squash a few crashers Duncan found and then
> post an updated patch.
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20120913/5cc5a6ee/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sroa-rewrite.patch
Type: application/octet-stream
Size: 157616 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20120913/5cc5a6ee/attachment.obj>