[LLVMdev] Overzealous PromoteCastOfAllocation

Matthijs Kooijman matthijs at stdin.nl
Tue Sep 23 04:16:03 PDT 2008


Hi Dan,

> Oh, ok. So code that takes an alloca, bitcasts the address to a higher
> alignment,
Since alignment is not a property of a type, I don't think you can "bitcast to
a higher alignment" as such. I still understand what you mean, though :-)

> and then does a load or store at the higher alignment, is
> invoking undefined behavior. The alignment attribute on a load or store
> is an assertion about the actual alignment of the memory.
Should this undefined behaviour be caught by the verifier, or is it ok for it
to exist?
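
To make the pattern concrete, here is a minimal hypothetical sketch of the
kind of IR in question (the struct and function names are made up):

```llvm
%struct.pair = type <{ i16, i16 }>

define i32 @bad() {
entry:
	; The alloca only guarantees the ABI alignment of %struct.pair
	; (at most 2 bytes here, since the struct is packed).
	%S = alloca %struct.pair
	%p = bitcast %struct.pair* %S to i32*
	; The "align 8" below asserts the memory actually is 8-byte
	; aligned; since the alloca does not provide that, this load
	; invokes undefined behaviour.
	%v = load i32* %p, align 8
	ret i32 %v
}
```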

In any case, since this behaviour is undefined, there is no real need for the
instcombine pass to propagate the ABI alignment of the type bitcasted to
upwards into the alloca instruction, as you suggested. The frontend generating
the IR should already have done this where needed (when a load with that higher
alignment uses the bitcasted value, or when the bitcasted value is passed to
another function, which requires ABI alignment).
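
As a hypothetical sketch of the frontend side: when the frontend knows the
value will be accessed at a higher alignment, it should raise the alignment on
the alloca itself rather than relying on instcombine to propagate it:

```llvm
%struct.pair = type <{ i16, i16 }>

define i32 @good() {
entry:
	; The frontend knows the memory will be loaded as an i32 at
	; 4-byte alignment, so it emits the alloca with that alignment
	; up front.
	%S = alloca %struct.pair, align 4
	%p = bitcast %struct.pair* %S to i32*
	%v = load i32* %p, align 4
	ret i32 %v
}
```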

I've now created a patch that makes sure that a struct-typed alloca is never
changed by instcombine, unless all its uses are bitcasts to the same type.
This seems like the sanest way to do things. This change did expose an
alignment-related bug in llvm-gcc, see PR2823. This means that committing
preserve-struct.diff will result in a test-suite failure (at least
SingleSource/UnitTests/Vector/divides; I haven't finished testing MultiSource
yet), but this is caused by llvm-gcc, not instcombine.
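
In other words (hypothetical sketch), an alloca like the following may still
be rewritten, since its only uses are bitcasts to a single type:

```llvm
%struct.pair = type <{ i16, i16 }>

define void @all_uses_same(i32 %val) {
entry:
	; Every use of %S is a bitcast to i32*, so instcombine may still
	; turn this into "%S = alloca i32". Any other kind of use, or a
	; bitcast to a different type, now leaves the alloca untouched.
	%S = alloca %struct.pair
	%p = bitcast %struct.pair* %S to i32*
	store i32 %val, i32* %p
	ret void
}
```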

Related to this issue, I've created two more patches, which I'm including here
for review; comments are welcome. I'm not sure these patches make a lot of
difference in real-world applications, and they are not yet sufficient to let
scalarrepl do its work completely on our example code (lots of struct passing
and the resulting memmoves), but they will at least make the resulting code
more logical and make it possible to get there.

I've tested these patches against the dejagnu tests and the test-suite,
finding only two failures. The first is the one regarding PR2823 referenced
above. The second one concerns
MultiSource/Benchmarks/MiBench/security-blowfish: for some reason opt emits an
invalid bitcode file (the bitcode reader asserts). From playing with the
debugger, it seems a store instruction is asked to store an i8* to an
[8 x i8]*. However, since the verifier doesn't catch this before the bitcode
is written out, it's probably something more involved. I'll look into this
issue more closely later (currently I don't even know which of the three
patches causes it).

Anyway, review of these patches and opinions as to their usefulness would be
welcome!

Gr.

Matthijs
-------------- next part --------------
Index: test/Transforms/InstCombine/2008-09-22-small-struct-memmove.ll
===================================================================
--- test/Transforms/InstCombine/2008-09-22-small-struct-memmove.ll	(revision 0)
+++ test/Transforms/InstCombine/2008-09-22-small-struct-memmove.ll	(revision 0)
@@ -0,0 +1,23 @@
+; This checks if instcombine replaces small struct memmoves with loads and
+; stores of the struct, instead of using an equally sized integer.
+
+; RUN: llvm-as < %s | opt -instcombine | llvm-dis > %t
+; Use . instead of % in grep, since %s in %struct gets replaced...
+; RUN: cat %t | grep {load .struct.two*}
+
+%struct.two = type <{ i16, i16 }>
+
+; Dummy function to give %S another use below.
+declare void @fill(%struct.two*)
+
+define void @main(%struct.two* %D) {
+entry:
+	%S = alloca %struct.two
+        call void @fill (%struct.two* %S)
+	%tmpS = bitcast %struct.two* %S to i8*
+	%tmpD = bitcast %struct.two* %D to i8*
+        call void @llvm.memmove.i32(i8* %tmpD, i8* %tmpS, i32 4, i32 1)
+        ret void
+}
+
+declare void @llvm.memmove.i32(i8*, i8*, i32, i32) nounwind
Index: lib/Transforms/Scalar/InstructionCombining.cpp
===================================================================
--- lib/Transforms/Scalar/InstructionCombining.cpp	(revision 56433)
+++ lib/Transforms/Scalar/InstructionCombining.cpp	(working copy)
@@ -8707,7 +8800,7 @@
           break;
       }
       
-      if (SrcETy->isSingleValueType())
+      if (SrcETy->isFirstClassType())
         NewPtrTy = PointerType::getUnqual(SrcETy);
     }
   }
-------------- next part --------------
Index: test/Transforms/InstCombine/2008-09-22-preserve-struct-alloca.ll
===================================================================
--- test/Transforms/InstCombine/2008-09-22-preserve-struct-alloca.ll	(revision 0)
+++ test/Transforms/InstCombine/2008-09-22-preserve-struct-alloca.ll	(revision 0)
@@ -0,0 +1,27 @@
+; This checks if instcombine leaves struct allocas intact. It used to change
+; the alloca below to an alloca i32, which doesn't really make sense (and can
+; in a lot of cases confuse other passes such as scalarrepl).
+; Now, instcombine only touches struct allocas when all uses are bitcasts to
+; the same type.
+
+; RUN: llvm-as < %s | opt -instcombine | llvm-dis > %t
+; RUN: cat %t | grep {alloca .struct.two}
+
+%struct.two = type <{ i16, i16 }>
+
+; Dummy function to give %S another use below.
+declare void @fill(%struct.two*)
+
+define void @main(%struct.two* %D) {
+entry:
+	%S = alloca %struct.two
+        call void @fill (%struct.two* %S)
+        ; Copy %S to %D by loading and storing an i32. This is written this way
+        ; to lure instcombine into replacing the alloca above with an alloca
+        ; i32.
+	%tmpS = bitcast %struct.two* %S to i32*
+	%tmpD = bitcast %struct.two* %D to i32*
+	%val = load i32* %tmpS	
+	store i32 %val, i32* %tmpD
+        ret void
+}
Index: lib/Transforms/Scalar/InstructionCombining.cpp
===================================================================
--- lib/Transforms/Scalar/InstructionCombining.cpp	(revision 56433)
+++ lib/Transforms/Scalar/InstructionCombining.cpp	(working copy)
@@ -6946,6 +6951,17 @@
   // increasing the alignment of the resultant allocation.  If we keep it the
   // same, we open the door to infinite loops of various kinds.
   if (!AI.hasOneUse() && CastElTyAlign == AllocElTyAlign) return 0;
+  
+  // If the allocation has a struct type and has other uses than casts to
+  // PTy, don't change the alloca. Changing it would prevent passes like
+  // scalarrepl from doing their work later on.
+  if (!AI.hasOneUse() && isa<StructType>(AllocElTy))
+    for (Value::use_iterator UI = AI.use_begin(), UE = AI.use_end();
+         UI != UE; ++UI) {
+      CastInst *CI = dyn_cast<CastInst>(UI);
+      if (!CI || CI->getDestTy() != PTy)
+        return 0;
+    }
 
   uint64_t AllocElTySize = TD->getABITypeSize(AllocElTy);
   uint64_t CastElTySize = TD->getABITypeSize(CastElTy);
-------------- next part --------------
Index: test/Transforms/ScalarRepl/2008-09-22-vector-gep.ll
===================================================================
--- test/Transforms/ScalarRepl/2008-09-22-vector-gep.ll	(revision 0)
+++ test/Transforms/ScalarRepl/2008-09-22-vector-gep.ll	(revision 0)
@@ -0,0 +1,24 @@
+; This test checks to see if scalarrepl also works when a gep with all zeroes is
+; used instead of a bitcast to prepare a memmove pointer argument. Previously,
+; this would not work when there was a vector involved in the struct, preventing
+; scalarrepl from removing the alloca below.
+
+; RUN: llvm-as < %s | opt -scalarrepl | llvm-dis > %t
+; RUN: cat %t | not grep alloca
+
+%struct.two = type <{ < 2 x i8 >, i16 }>
+
+define void @main(%struct.two* %D, i16 %V) {
+entry:
+	%S = alloca %struct.two
+        %S.2 = getelementptr %struct.two* %S, i32 0, i32 1
+        store i16 %V, i16* %S.2
+        ; This gep is effectively a bitcast to i8*, but is sometimes generated
+        ; because the type of the first element in %struct.two is i8.
+	%tmpS = getelementptr %struct.two* %S, i32 0, i32 0, i32 0 
+	%tmpD = bitcast %struct.two* %D to i8*
+        call void @llvm.memmove.i32(i8* %tmpD, i8* %tmpS, i32 4, i32 1)
+        ret void
+}
+
+declare void @llvm.memmove.i32(i8*, i8*, i32, i32) nounwind
Index: lib/Transforms/Scalar/ScalarReplAggregates.cpp
===================================================================
--- lib/Transforms/Scalar/ScalarReplAggregates.cpp	(revision 56433)
+++ lib/Transforms/Scalar/ScalarReplAggregates.cpp	(working copy)
@@ -530,8 +530,9 @@
       return MarkUnsafe(Info);
     }
   }
+ 
+  bool hasVector = false;
   
-  
   // Walk through the GEP type indices, checking the types that this indexes
   // into.
   for (; I != E; ++I) {
@@ -539,22 +540,29 @@
     if (isa<StructType>(*I))
       continue;
     
-    // Don't SROA pointers into vectors.
-    if (isa<VectorType>(*I))
-      return MarkUnsafe(Info);
-    
-    // Otherwise, we must have an index into an array type.  Verify that this is
-    // an in-range constant integer.  Specifically, consider A[0][i].  We
-    // cannot know that the user isn't doing invalid things like allowing i to
-    // index an out-of-range subscript that accesses A[1].  Because of this, we
-    // have to reject SROA of any accesses into structs where any of the
-    // components are variables.
     ConstantInt *IdxVal = dyn_cast<ConstantInt>(I.getOperand());
     if (!IdxVal) return MarkUnsafe(Info);
-    if (IdxVal->getZExtValue() >= cast<ArrayType>(*I)->getNumElements())
+
+    // Are all indices still zero?
+    IsAllZeroIndices &= IdxVal->isZero();
+    
+    if (isa<ArrayType>(*I)) {
+      // We have an index into an array type.  Verify that this is
+      // an in-range constant integer.  Specifically, consider A[0][i].  We
+      // cannot know that the user isn't doing invalid things like allowing i to
+      // index an out-of-range subscript that accesses A[1].  Because of this, we
+      // have to reject SROA of any accesses into structs where any of the
+      // components are variables.
+      if (IdxVal->getZExtValue() >= cast<ArrayType>(*I)->getNumElements())
+        return MarkUnsafe(Info);
+    }
+  
+    // Note if we've seen a vector type yet
+    hasVector |= isa<VectorType>(*I);
+    
+    // Don't SROA pointers into vectors, unless all indices are zero.
+    if (hasVector && !IsAllZeroIndices)
       return MarkUnsafe(Info);
-    
-    IsAllZeroIndices &= IdxVal->isZero();
   }
   
   // If there are any non-simple uses of this getelementptr, make sure to reject
@@ -661,6 +669,11 @@
       // It is likely that OtherPtr is a bitcast, if so, remove it.
       if (BitCastInst *BC = dyn_cast<BitCastInst>(OtherPtr))
         OtherPtr = BC->getOperand(0);
+      // All zero GEPs are effectively casts
+      if (GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(OtherPtr))
+        if (GEP->hasAllZeroIndices())
+          OtherPtr = GEP->getOperand(0);
+        
       if (ConstantExpr *BCE = dyn_cast<ConstantExpr>(OtherPtr))
         if (BCE->getOpcode() == Instruction::BitCast)
           OtherPtr = BCE->getOperand(0);