[llvm-commits] PATCH: A new SROA implementation

Wed Aug 22 08:35:25 PDT 2012

On Aug 21, 2012, at 10:46 PM, Chandler Carruth <chandlerc at gmail.com> wrote:
> Ok, I think this patch is pretty close to being ready to go into the tree, modulo any detailed code review folks are willing to do. Just send me comments if you have them.

Very nice, I look forward to never seeing an i1024 from SRoA again :)

Here are some initial thoughts, I haven't done a detailed algorithmic analysis of this yet:

> I've run it through the nightly test suite now as well as a bootstrap, and everything looks better than clear, it looks great. ;]
> 
> I'm seeing some pretty significant execution performance improvements. One test is 27% faster, several range from 2% to 6% faster. Only a couple really slow down. I haven't dug in to the performance too much as currently we're still running the old SROA in the CGSCC passes.
> 
> I've still got some code cleanup, minor refactorings I'd like to do, and the SSAUpdater work, but I'm trying to not grow the patch on people.

+/// which are merely transfered in and out of external memory remain unchanged

typo: transferred.

+/// Because this also performs alloca promotion, it can be thought of as a also

"as a also"

Several #includes can be pruned out, GlobalVariable and Function.h at least.

+  struct Partition {
+  struct PartitionUse {

Doxygen comments please.  These look similar, the comment should explain what they're for and why they're different.

+  /// \brief MemTransferInst auxilliary data.

typo "auxilliary" in a few places.

I'm not crazy about template instantiation of your entire "BuilderBase" class, how much code size does this cost, to what benefit?

+static Value *getNaturalGEPWithType(IRBuilder<> IRB, const TargetData &TD,
+                                    Value *BasePtr, Type *Ty, Type *TargetTy,
+                                    SmallVectorImpl<Value *> &Indices,
+                                    const Twine &Prefix) {
...
+    if (SequentialType *SeqTy = dyn_cast<SequentialType>(ElementTy)) {

Do we really want to form GEP's into vectors?

+class SROA : public FunctionPass {
+bool SROA::rewriteAllocaPartition(AllocaInst &AI,
...

Doxygen comments please.

getNaturalGEPWithType and others can be simplified through the use of IRB.getInt32()

getNaturalGEPWithOffset, getAdjustedPtr, etc should pass APInts by const reference.

TargetTy != IRB.getInt8Ty()
-> should be "!TargetTy->isIntegerTy(8)" for efficiency.

+    // Find the operand we need to rewrite here.
+    User::op_iterator OI = PN.op_begin(), OE = PN.op_end();
+    for (; OI != OE && *OI != OldPtr; ++OI)
+      ;
+    assert(OI != OE);

Is it feasible to not lose the Use index?  PHI nodes can have a lot of operands to scan though.

bool SROA::runOnAlloca(AllocaInst &AI) {
...

This code handles the case when something is a trivially promotable scalar (an important base case for SRoA) but it doesn't handle the case where we have a trivial aggregate aggregate being accessed in a "type safe" way.  Is this an interesting case to handle specially to improve compile time?

+  // No partitions to split. Leave the dead alloca for a later pass to clean up.
+  if (P.begin() == P.end())

P.empty()?

+void SROA::deleteDeadInstructions() {

Is this the same as RecursivelyDeleteTriviallyDeadInstructions?  Perhaps RecursivelyDeleteTriviallyDeadInstructions could be enhanced to take an optional pointer to a vector, where (when present) it records all the instruction*'s it deletes?

It would be nice if the "load speculation" stuff was pulled out of AllocaPartitionRewriter as some sort of orthogonal transformation.  Perhaps the initial analysis of an alloca can build a list of "selects and phis that need to be speculated" in order to unblock the transformation, and they could be processed independently of the other heavy lifting going on.

It might make sense to split this pass up into two files to separate the partition analysis code from the transformation and utility code.  Just a single header file (in Transforms/Scalar) might be enough, or go with another header + .cpp file in Transforms/Scalar.  What do you think?

FYI, on a random note, Dan Gohman is looking enhancing memcpy to get TBAA and "hole" information per PR12260/PR11946, which seems like it should drop in nicely.

-Chris