[llvm-commits] [llvm] r91184 - in /llvm/trunk: lib/Transforms/Scalar/ScalarReplAggregates.cpp test/Transforms/ScalarRepl/2009-12-11-NeonTypes.ll

Bob Wilson bob.wilson at apple.com
Fri Dec 18 23:10:21 PST 2009


I made most of the changes you requested.  Comments/questions below....

On Dec 17, 2009, at 2:19 PM, Chris Lattner wrote:
> 
>> +  } else {
>> +    const ArrayType *AT = dyn_cast<ArrayType>(T);
>> +    assert(AT && "unexpected type for scalar replacement");
> 
> Please use cast<> if you know that it has to be an array.  However, is this really true?  Can't it be a vector?  If so, use SequentialType and explicitly assert it isn't a pointer.

It is really true.  The caller of that function checks for arrays and structs only.  I changed it to use cast<>.

> 
> 
>> +    T = AT->getElementType();
>> +    uint64_t EltSize = TD->getTypeAllocSize(T);
>> +    Idx = (unsigned)(Offset / EltSize);
> 
> The returned Idx should be a uint64_t for large indexes into large arrays.

OK.  I had considered that earlier but since SROA only handles small arrays I decided it didn't matter.  I guess you could have a nested aggregate where the inner array was too big to address with 32 bits.  I'm still not sure if it wouldn't have been better to check for large arrays and exclude them earlier, but it's not a big deal either way.

> 
> With your rewrite, I think this constraint could be relaxed, allowing non-zero constant ints as the first index.  SRoA should be able to handle stuff like this pretty easily:
> 
>  A = alloca { i32, [4 x i32] }
>  B = gep A, 0, 1, 0
>  C = gep B, 2

Yes, it shouldn't be too hard to handle that.  I'll take a look at it.

> 
> Actually, on reflection, I'm not sure that this transformation is safe without more checking.  Consider indexing into the array with a variable index in a type like {[2 x i32], i32, i32}.  SRoA shouldn't try to canonicalize this because the index might be '3' (even though that is gross).
> 
> I'm not really sure how useful "variable indexes in two element arrays" really is anyway though.  It might be better to just remove this transformation.  That would simplify the code quite a bit too.

Yikes!  You are right.  The original code would only allow the 2-element array to be the outermost level of the aggregate, but with my changes, there is nothing to prevent it from being nested.

The same thing is true of 1-element arrays with variable indexes, too.

I guess I will take that out....

>> 
>> +    if (BitCastInst *BC = dyn_cast<BitCastInst>(User)) {
>> +      if (BC->getOperand(0) == AI)
>> +        BC->setOperand(0, NewElts[0]);
> 
> What does this setOperand do?  If the invariant is that RewriteForScalarRepl leaves I with zero uses after  the transformation, then this isn't needed.  If I does have uses, this seems like potentially the wrong thing to do.  At the very least, this would benefit from a comment.

It was definitely the wrong thing to do, and that was why I had to revert the original patch.  In the current version of the code, if something needs to be modified, a new instruction is added and the uses are updated only after iterating through them.  Dead instructions are kept on a list and removed after walking over the instructions.

> 
>> +      // If the bitcast type now matches the operand type, it will be removed
>> +      // after processing its uses.
>> +      RewriteForScalarRepl(BC, AI, Offset, NewElts);
>> +    } else if (GetElementPtrInst *GEPI = dyn_cast<GetElementPtrInst>(User)) {
>> +      RewriteGEP(GEPI, AI, Offset, NewElts);
>> +    } else if (MemIntrinsic *MI = dyn_cast<MemIntrinsic>(User)) {
>> +      ConstantInt *Length = dyn_cast<ConstantInt>(MI->getLength());
>> +      uint64_t MemSize = Length->getZExtValue();
>> +      if (Offset == 0 &&
>> +          MemSize == TD->getTypeAllocSize(AI->getAllocatedType()))
>> +        RewriteMemIntrinUserOfAlloca(MI, I, AI, NewElts);
> 
> What is the 'else' case here?  If Offset != 0 it seems like a failure of the safety checking code?  Should the 'if' be turned into an assert?

I added a comment to explain that nothing needs to be done in the 'else' case: we've already checked that the mem intrinsic either references the whole aggregate or a single element.  In the latter case, the address operand will be updated and nothing needs to change in the mem intrinsic itself.

>> 
>> +  // Delete unused instructions and identity bitcasts.
>> +  if (I->use_empty())
>> +    I->eraseFromParent();
>> +  else if (BitCastInst *BC = dyn_cast<BitCastInst>(I)) {
>> +    if (BC->getDestTy() == BC->getSrcTy()) {
>> +      BC->replaceAllUsesWith(BC->getOperand(0));
>> +      BC->eraseFromParent();
> 
> This code (zapping identity bitcasts) seems both unneeded and harmful: this will add another use of the parent instruction whose use list is being iterated over.  Was this something that existed before or did you add it?  It is probably best to just remove this.

Yes, it's already gone.


