[llvm-commits] PATCH: A new SROA implementation

Chandler Carruth chandlerc at gmail.com
Fri Sep 14 02:25:48 PDT 2012


On Thu, Sep 13, 2012 at 1:32 PM, Duncan Sands <baldrick at free.fr> wrote:

> Hi Chandler,
>
>
> On 13/09/12 12:15, Chandler Carruth wrote:
>
>> Hello all,
>>
>> Here is an updated patch. It contains all of the fixes I discussed here,
>> and it
>> also contains some significant updates:
>>
>> - Now has additionally survived intensive testing by Duncan. All crashers
>> found with all the dragonegg test suites (including many Ada tests etc.) are
>> fixed. The initial and immediate miscompile issues are fixed. It's looking
>> pretty good testing-wise.
>>
>> - *Many* test cases added. Lots of bugs fixed. Generally, quite a bit of
>> hardening here. Duncan found several soft areas that I was then able to
>> exercise more thoroughly.
>>
>>
>> I think this is getting *really* close to being ready for an initial commit.
>> It is still behind a default-off flag, and it will stay there for a bit. In
>> particular, I want to get more testing, implement the SSAUpdater logic, tie
>> it into the rest of the pass manager, and collect final performance results
>> before enabling it. But I don't want to keep hacking on this out of tree. =]
>>
>> Anyways, review greatly appreciated.
>>
>
> looks great to me, I think you should apply it.


Thanks so much for the reviews and testing.

I've committed this version in r163883 as discussed on IRC, and will be
addressing your comments below with follow-up patches. If there are any
further comments, I'll try to address them promptly with follow-up patches
as well.


> Some small comments:
>
>  --- /dev/null
>> +++ b/lib/Transforms/Scalar/SROA.cpp
>>
> ...
>
>> +namespace {
>> +/// \brief Alloca partitioning representation.
>> +///
>> +/// This class represents a partitioning of an alloca into slices, and
>>
>> +/// information about the nature of uses of each slice of the alloca.
>> The goal
>> +/// is that this information is sufficient to decide if and how to split
>> the
>> +/// alloca apart and replace slices with scalars. It is also intended
>> that this
>> +/// structure can capture the relevant information needed both due
>> decide about
>>
>
> due decide -> to decide
>
>  +/// and to enact these transformations.
>> +class AllocaPartitioning {
>> +public:
>> +  /// \brief A common base class for representing a half-open byte range.
>> +  struct ByteRange {
>> +    /// \brief The beginning offset of the range.
>> +    uint64_t BeginOffset;
>> +
>> +    /// \brief The ending offset, not included in the range.
>> +    uint64_t EndOffset;
>> +
>> +    ByteRange() : BeginOffset(), EndOffset() {}
>> +    ByteRange(uint64_t BeginOffset, uint64_t EndOffset)
>> +        : BeginOffset(BeginOffset), EndOffset(EndOffset) {}
>> +
>> +    /// \brief Support for ordering ranges.
>> +    ///
>> +    /// This provides an ordering over ranges such that start offsets are
>> +    /// always increasing, and within equal start offsets, the end
>> offsets are
>> +    /// decreasing. Thus the spanning range comes first in in cluster
>> with the
>>
>
> in in -> in a
>
>  +    /// same start position.
>>
> ...
>
>> +  /// \brief A partition of an alloca.
>> +  ///
>> +  /// This structure represents a contiguous partition of the alloca.
>> These are
>> +  /// formed by examining the uses of the alloca. During formation, they
>> may
>> +  /// overlap but once an AllocaPartitioning is built, the Partitions
>> within it
>> +  /// are all disjoint.
>> +  struct Partition : public ByteRange {
>> +    /// \brief Whether this partition is splittable into smaller
>> partitions.
>> +    ///
>> +    /// We flag partitions as splittable when they are formed entirely
>> due to
>> +    /// accesses by trivially split operations such as memset and memcpy.
>>
>
> trivially split -> trivially splittable
>
>  +    ///
>> +    /// FIXME: At some point we should consider loads and stores of FCAs
>> to be
>> +    /// splittable and eagerly split them into scalar values.
>> +    bool IsSplittable;
>> +
>> +    Partition() : ByteRange(), IsSplittable() {}
>> +    Partition(uint64_t BeginOffset, uint64_t EndOffset, bool
>> IsSplittable)
>> +        : ByteRange(BeginOffset, EndOffset), IsSplittable(IsSplittable)
>> {}
>> +  };
>> +
>> +  /// \brief A particular use of a partition of the alloca.
>> +  ///
>> +  /// This structure is used to associate uses of a partition with it.
>> They
>> +  /// mark the range of bytes which are referenced by a particular
>> instruction,
>> +  /// and includes a handle to the user itself and the pointer value in
>> use.
>> +  /// The bounds of these uses are determined by intersecting the bounds
>> of the
>> +  /// memory use itself with a particular partition. As a consequence
>> there is
>> +  /// intentionally overlap between various usues of the same partition.
>>
>
> usues -> uses
>
> ...
>
>  +  /// \brief Allow iterating the dead users for this alloca.
>> +  ///
>> +  /// These are instructions which will never actually use the alloca as
>> they
>> +  /// are outside the allocated range. They are safe to replace with
>> undef and
>> +  /// delete.
>>
>
> If you were feeling bold you could replace them by "unreachable" (in this
> context probably a store to undef would be more appropriate).
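
A minimal sketch of the store-to-undef variant (purely illustrative; 'DeadUser'
is a hypothetical stand-in for an instruction from the dead_user range, not
something in the committed patch):

    // Redirect a known-dead store through an undef pointer so later passes
    // see the UB explicitly, without having to split the block the way an
    // 'unreachable' would require.
    if (StoreInst *SI = dyn_cast<StoreInst>(DeadUser))
      SI->setOperand(SI->getPointerOperandIndex(),
                     UndefValue::get(SI->getPointerOperand()->getType()));
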
>
>  +  /// @{
>> +  typedef SmallVectorImpl<Instruction *>::const_iterator
>> dead_user_iterator;
>> +  dead_user_iterator dead_user_begin() const { return DeadUsers.begin();
>> }
>> +  dead_user_iterator dead_user_end() const { return DeadUsers.end(); }
>> +  /// @}
>> +
>> +  /// \brief Allow iterating the dead operands referring to this alloca.
>>
>
> Maybe "dead expressions" would be clearer.  These are just some addresses
> computed from the alloca address, right?
>
> ...
>
>> +  /// \brief Compute a common type among the uses of a particular
>> partition.
>> +  ///
>> +  /// This routines walks all of the uses of a particular partition and
>> tries
>> +  /// to find a common type between them. Untyped operations such as
>> memset and
>> +  /// memcpy are ignored.
>>
>> +  Type *getCommonType(iterator I) const;
>> +
>> +  void print(raw_ostream &OS, const_iterator I, StringRef Indent = "  ")
>> const;
>> +  void printUsers(raw_ostream &OS, const_iterator I,
>> +                  StringRef Indent = "  ") const;
>> +  void print(raw_ostream &OS) const;
>> +  void dump(const_iterator I) const LLVM_ATTRIBUTE_NOINLINE
>> LLVM_ATTRIBUTE_USED;
>> +  void dump() const LLVM_ATTRIBUTE_NOINLINE LLVM_ATTRIBUTE_USED;
>>
>
> A patch just went in to wrap "dump" functions in ifndef NDEBUG (plus some
> other
> condition), maybe you should do the same here.
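
As a sketch of that shape, reusing the declarations from the patch (and leaving
out whatever the other condition in that recent change was):

    #ifndef NDEBUG
      void dump(const_iterator I) const LLVM_ATTRIBUTE_NOINLINE LLVM_ATTRIBUTE_USED;
      void dump() const LLVM_ATTRIBUTE_NOINLINE LLVM_ATTRIBUTE_USED;
    #endif
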
>
>  +  /// \brief The partitions of the alloca.
>> +  ///
>>
>> +  /// We store a vector of the partitions over the alloca here. This
>> vector is
>> +  /// sorted by increasing begin offset, and then by decreasing end
>> offset. See
>> +  /// the Partition inner class for more details. Initially there are
>> overlaps,
>> +  /// be during construction we form a disjoint sequence toward the end.
>>
>
> be during -> ?
> I couldn't make much sense of this sentence.
>
> ...
>
>  +  bool computeConstantGEPOffset(GetElementPtrInst &GEPI, uint64_t
>> &GEPOffset) {
>>
>
> Is this routine really not available from some other part of LLVM?
>
>  +  Value *foldSelectInst(SelectInst &SI) {
>> +    // If the condition being selected on is a constant or the same
>> value is
>> +    // being selected between, fold the select. Yes this does (rarely)
>> happen
>> +    // early on.
>> +    if (ConstantInt *CI = dyn_cast<ConstantInt>(SI.getCondition()))
>> +      return SI.getOperand(1+CI->isZero());
>> +    if (SI.getOperand(1) == SI.getOperand(2)) {
>> +      assert(*U == SI.getOperand(1));
>>
>
> Urgh, what is the *U that turns up here?  Seems like breaking the
> abstraction.
>
>
>  +      return SI.getOperand(1);
>> +    }
>> +    return 0;
>> +  }
>> +};
>>
>
> ...
>
>  +  bool handleLoadOrStore(Type *Ty, Instruction &I) {
>> +    uint64_t Size = TD.getTypeStoreSize(Ty);
>> +
>>
>> +    // If this memory access can be shown to *statically* extend outside
>> the
>> +    // bounds of of the allocation, it's behavior is undefined, so simply
>> +    // ignore it. Note that this is more strict than the generic clamping
>> +    // behavior of insertUse. We also try to handle cases which might
>> run the
>> +    // risk of overflow.
>> +    // FIXME: We should instead consider the pointer to have escaped if
>> this
>>
>> +    // function is being instrumented for addressing bugs or race
>> conditions.
>> +    if (Offset >= AllocSize || Size > AllocSize || Offset + Size >
>> AllocSize) {
>>
>
> Such accesses could be replaced by "unreachable".
>
>
>  +  bool visitGetElementPtrInst(GetElementPtrInst &GEPI) {
>> +    //unsigned IntPtrWidth = TD->getPointerSizeInBits();
>> +    //assert(IntPtrWidth == Offset.getBitWidth());
>>
>
> Commented out code above...
>
>
>  +    uint64_t GEPOffset;
>> +    if (!computeConstantGEPOffset(GEPI, GEPOffset))
>> +      return markAsEscaping(GEPI);
>> +
>> +    enqueueUsers(GEPI, GEPOffset);
>> +    return true;
>> +  }
>>
>
> ...
>
>> +  bool visitMemSetInst(MemSetInst &II) {
>>
>
> It is impossible for the use to be the size or the byte argument, right,
> because
> you only deal with expressions of pointer type?  But if someone adds
> ptrtoint
> support one day then you could be in trouble here, so better to check...
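
A hedged sketch of that kind of guard (assuming the current use is reachable
as *U and that markAsEscaping/insertUse behave as elsewhere in the quoted
code):

    bool visitMemSetInst(MemSetInst &II) {
      // If the pointer being tracked is not the destination operand (say it
      // reached the length or byte value through some future ptrtoint path),
      // be conservative and treat the alloca as escaping.
      if (*U != II.getRawDest())
        return markAsEscaping(II);
      ConstantInt *Length = dyn_cast<ConstantInt>(II.getLength());
      insertUse(II, Length ? Length->getZExtValue() : AllocSize - Offset, Length);
      return true;
    }
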
>
>  +    ConstantInt *Length = dyn_cast<ConstantInt>(II.getLength());
>> +    insertUse(II, Length ? Length->getZExtValue() : AllocSize - Offset,
>> Length);
>> +    return true;
>> +  }
>>
>
> ...
>
>  +  // Disable SRoA for any intrinsics except for lifetime invariants.
>>
>
> What about debug intrinsics?
>
> ...
>
>
>  +/// \brief Use adder for the alloca partitioning.
>> +///
>> +/// This class adds the uses of an alloca to all of the partitions which
>> it
>> +/// uses.
>>
>
> Maybe should read: This class adds the uses of an alloca to all of the
> partitions which they use.
>
>
>  For splittable partitions, this can end up doing essentially a linear
>
>> +/// walk of the partitions, but the number of steps remains bounded by
>> the
>> +/// total result instruction size:
>> +/// - The number of partitions is a result of the number unsplittable
>> +///   instructions using the alloca.
>> +/// - The number of users of each partition is at worst the total number
>> of
>> +///   splittable instructions using the alloca.
>> +/// Thus we will produce N * M instructions in the end, where N are the
>> number
>> +/// of unsplittable uses and M are the number of splittable. This
>> visitor does
>> +/// the exact same number of updates to the partitioning.
>> +///
>> +/// In the more common case, this visitor will leverage the fact that the
>> +/// partition space is pre-sorted, and do a logarithmic search for the
>> +/// partition needed, making the total visit a classical ((N + M) *
>> log(N))
>> +/// complexity operation.
>>
>
> ...
>
>  +  void visitGetElementPtrInst(GetElementPtrInst &GEPI) {
>> +    if (GEPI.use_empty())
>> +      return markAsDead(GEPI);
>>
>> +
>> +    //unsigned IntPtrWidth = TD->getPointerSizeInBits();
>> +    //assert(IntPtrWidth == Offset.getBitWidth());
>>
>
> Commented out code...
>
>  +    uint64_t GEPOffset;
>> +    if (!computeConstantGEPOffset(GEPI, GEPOffset))
>> +      llvm_unreachable("Unable to compute constant offset for use");
>>
>> +
>> +    enqueueUsers(GEPI, GEPOffset);
>> +  }
>>
>
> ...
>
>  +  /// \brief A set of deleted alloca instructions.
>> +  ///
>> +  /// These pointers are *no longer valid* as they have been deleted.
>> They are
>> +  /// used to remove deleted allocas from the list of promotable allocas.
>> +  SmallPtrSet<AllocaInst *, 4> DeletedAllocas;
>>
>
> Does this have to live as a member variable?  It is only used in a very
> localized way: it could be declared in runOnFunction and passed to
> deleteDeadInstructions to fill in.  What is dangerous about this kind of
> set
> is that you mustn't allocate any new alloca's after adding something to the
> set (and before clearing the set) since it might be allocated with the
> address
> of the dead alloca, causing confusion and non-determinism.  Keeping the set
> highly local makes it obvious that this can't happen.
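
For concreteness, a sketch of the localized shape being suggested (names taken
from the quoted patch; the exact plumbing of deleteDeadInstructions is
assumed):

    // Inside SROA::runOnFunction, rather than as a member variable:
    SmallPtrSet<AllocaInst *, 4> DeletedAllocas;
    deleteDeadInstructions(DeletedAllocas);

    // Prune the promotable list before any dead alloca's address could be
    // recycled by a newly created AllocaInst.
    for (std::vector<AllocaInst *>::iterator I = PromotableAllocas.begin();
         I != PromotableAllocas.end();)
      if (DeletedAllocas.count(*I))
        I = PromotableAllocas.erase(I);
      else
        ++I;
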
>
>  +
>> +  /// \brief A collection of alloca instructions we can directly promote.
>> +  std::vector<AllocaInst *> PromotableAllocas;
>>
>
> ...
>
>  +/// \brief Accumulate the constant offsets in a GEP into a single APInt
>> offset.
>> +///
>> +/// If the provided GEP is all-constant, the total byte offset formed by
>> the
>> +/// GEP is computed and Offset is set to it. If the GEP has any
>> non-constant
>> +/// operands, the function returns false and the value of Offset is
>> unmodified.
>> +static bool accumulateGEPOffsets(const TargetData &TD, GEPOperator &GEP,
>> +                                 APInt &Offset) {
>>
>
> This also sounds like the kind of routine that maybe exists already (or
> should
> exist already).
>
> ...
>
>> +/// \brief Recursively compute indices for a natural GEP.
>> +///
>> +/// This is the recursive step for getNaturalGEPWithOffset that walks
>> down the
>> +/// element types adding appropriate indices for the GEP.
>> +static Value *getNaturalGEPRecursively(IRBuilder<> &IRB, const
>> TargetData &TD,
>> +                                       Value *Ptr, Type *Ty, APInt
>> &Offset,
>>
>> +                                       Type *TargetTy,
>> +                                       SmallVectorImpl<Value *> &Indices,
>> +                                       const Twine &Prefix) {
>> +  if (Offset == 0)
>> +    return getNaturalGEPWithType(IRB, TD, Ptr, Ty, TargetTy, Indices,
>> Prefix);
>> +
>> +  // We can't recurse through pointer types.
>> +  if (Ty->isPointerTy())
>> +    return 0;
>> +
>> +  if (VectorType *VecTy = dyn_cast<VectorType>(Ty)) {
>> +    unsigned ElementSizeInBits = VecTy->getScalarSizeInBits();
>> +    if (ElementSizeInBits % 8)
>> +      return 0; // GEPs over multiple of 8 size vector elements are
>> invalid.
>>
>
> over multiple -> over non-multiple
> You invented a new rule here, maybe add a comment that this whole area is
> a can
> of worms and not properly defined right now.
>
>  +    APInt ElementSize(Offset.getBitWidth(), ElementSizeInBits / 8);
>> +    APInt NumSkippedElements = Offset.udiv(ElementSize);
>> +    if (NumSkippedElements.ugt(VecTy->getNumElements()))
>> +      return 0;
>>
>> +    Offset -= NumSkippedElements * ElementSize;
>> +    Indices.push_back(IRB.getInt(NumSkippedElements));
>> +    return getNaturalGEPRecursively(IRB, TD, Ptr,
>> VecTy->getElementType(),
>> +                                    Offset, TargetTy, Indices, Prefix);
>> +  }
>>
>
> Above you seem to be assuming that the elements are naturally aligned,
> i.e. have
> alloc size equal to the size in bits.  I'm not sure that is guaranteed.
>
> ...
>
>  +  StructType *STy = dyn_cast<StructType>(Ty);
>> +  if (!STy)
>> +    return 0;
>> +
>> +  const StructLayout *SL = TD.getStructLayout(STy);
>> +  uint64_t StructOffset = Offset.getZExtValue();
>> +  if (StructOffset > SL->getSizeInBytes())
>>
>
> Probably this should be >= here.
>
>  +    return 0;
>> +  unsigned Index = SL->getElementContainingOffset(StructOffset);
>> +  Offset -= APInt(Offset.getBitWidth(), SL->getElementOffset(Index));
>> +  Type *ElementTy = STy->getElementType(Index);
>> +  if (Offset.uge(TD.getTypeAllocSize(ElementTy)))
>> +    return 0; // The offset points into alignment padding.
>> +
>> +  Indices.push_back(IRB.getInt32(Index));
>> +  return getNaturalGEPRecursively(IRB, TD, Ptr, ElementTy, Offset,
>> TargetTy,
>> +                                  Indices, Prefix);
>> +}
>>
>
> ...
>
>  +
>> +/// \brief Get a natural GEP from a base pointer to a particular offset
>> and
>> +/// resulting in a particular type.
>> +///
>> +/// The goal is to produce a "natural" looking GEP that works with the
>> existing
>> +/// composite types to arrive at the appropriate offset and element type
>> for
>> +/// a pointer. TargetTy is the element type the returned GEP should
>> point-to if
>> +/// possible. We recurse by decreasing Offset, adding the appropriate
>> index to
>> +/// Indices, and setting Ty to the result subtype.
>> +///
>> +/// If no natural GEP can be constructed, this function returns a null
>> Value*.
>>
>
> returns a null Value* -> returns null
>
> ...
>
>  +/// \brief Test whether the given alloca partition can be promoted to a
>> vector.
>> +///
>> +/// This is a quick test to check whether we can rewrite a particular
>> alloca
>> +/// partition (and its newly formed alloca) into a vector alloca with
>> only
>> +/// whole-vector loads and stores such that it could be promoted to a
>> vector
>> +/// SSA value. We only can ensure this for a limited set of operations,
>> and we
>> +/// don't want to do the rewrites unless we are confident that the
>> result will
>> +/// be promotable, so we have an early test here.
>> +static bool isVectorPromotionViable(const TargetData &TD,
>> +                                    Type *AllocaTy,
>> +                                    AllocaPartitioning &P,
>> +                                    uint64_t PartitionBeginOffset,
>> +                                    uint64_t PartitionEndOffset,
>> +                                    AllocaPartitioning::const_use_iterator
>> I,
>> +                                    AllocaPartitioning::const_use_iterator
>> E) {
>> +  VectorType *Ty = dyn_cast<VectorType>(AllocaTy);
>> +  if (!Ty)
>> +    return false;
>> +
>> +  uint64_t VecSize = TD.getTypeSizeInBits(Ty);
>> +  uint64_t ElementSize = Ty->getScalarSizeInBits();
>> +
>> +  // While the definition of LLVM vectors is bitpacked, we don't support
>> sizes
>> +  // that aren't byte sized.
>> +  if (ElementSize % 8)
>> +    return false;
>>
>
> You should probably also bail out if ElementSize is not the same as the
> alloc
> size, since no-one cares about those and they are very easy to get wrong.
>  In
> fact that would make the % 8 test redundant.  I reckon you should replace
> all
> of your % 8 tests everywhere with a utility function that checks that the
> alloc
> size and the bit size coincide.
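
A sketch of such a utility (hypothetical name, using the TargetData the code
already has around):

    /// \brief Test whether a type's bit size equals the size it actually
    /// occupies in memory, i.e. it is byte sized and has no tail padding.
    static bool isDenselyPacked(const TargetData &TD, Type *Ty) {
      return TD.getTypeSizeInBits(Ty) == TD.getTypeAllocSizeInBits(Ty);
    }

Since the alloc size is always a whole number of bytes, a check like this
subsumes the various "% 8" tests.
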
>
> ...
>
>> +/// \brief Try to find a partition of the aggregate type passed in for a
>> given
>>
>> +/// offset and size.
>> +///
>> +/// This recurses through the aggregate type and tries to compute a
>> subtype
>> +/// based on the offset and size. When the offset and size span a
>> sub-section
>> +/// of an array, it will even compute a new array type for that
>> sub-section.
>> +static Type *getTypePartition(const TargetData &TD, Type *Ty,
>> +                              uint64_t Offset, uint64_t Size) {
>> +  if (Offset == 0 && TD.getTypeAllocSize(Ty) == Size)
>> +    return Ty;
>> +
>> +  if (SequentialType *SeqTy = dyn_cast<SequentialType>(Ty)) {
>> +    // We can't partition pointers...
>> +    if (SeqTy->isPointerTy())
>> +      return 0;
>> +
>>
>> +    Type *ElementTy = SeqTy->getElementType();
>> +    uint64_t ElementSize = TD.getTypeAllocSize(ElementTy);
>> +    uint64_t NumSkippedElements = Offset / ElementSize;
>> +    if (ArrayType *ArrTy = dyn_cast<ArrayType>(SeqTy))
>> +      if (NumSkippedElements >= ArrTy->getNumElements())
>> +        return 0;
>> +    if (VectorType *VecTy = dyn_cast<VectorType>(SeqTy))
>> +      if (NumSkippedElements >= VecTy->getNumElements())
>> +        return 0;
>>
>> +    Offset -= NumSkippedElements * ElementSize;
>> +
>> +    // First check if we need to recurse.
>> +    if (Offset > 0 || Size < ElementSize) {
>> +      // Bail if the partition ends in a different array element.
>> +      if ((Offset + Size) > ElementSize)
>> +        return 0;
>> +      // Recurse through the element type trying to peel off offset
>> bytes.
>> +      return getTypePartition(TD, ElementTy, Offset, Size);
>> +    }
>> +    assert(Offset == 0);
>> +
>> +    if (Size == ElementSize)
>> +      return ElementTy;
>> +    assert(Size > ElementSize);
>> +    uint64_t NumElements = Size / ElementSize;
>> +    if (NumElements * ElementSize != Size)
>> +      return 0;
>> +    return ArrayType::get(ElementTy, NumElements);
>> +  }
>> +
>> +  StructType *STy = dyn_cast<StructType>(Ty);
>> +  if (!STy)
>> +    return 0;
>> +
>> +  const StructLayout *SL = TD.getStructLayout(STy);
>> +  if (Offset > SL->getSizeInBytes())
>> +    return 0;
>> +  uint64_t EndOffset = Offset + Size;
>> +  if (EndOffset > SL->getSizeInBytes())
>> +    return 0;
>> +
>> +  unsigned Index = SL->getElementContainingOffset(Offset);
>> +  if (SL->getElementOffset(Index) != Offset)
>> +    return 0; // Inside of padding.
>> +  Offset -= SL->getElementOffset(Index);
>> +
>> +  Type *ElementTy = STy->getElementType(Index);
>>
>> +  uint64_t ElementSize = TD.getTypeAllocSize(ElementTy);
>> +  if (Offset >= ElementSize)
>> +    return 0; // The offset points into alignment padding.
>> +
>> +  // See if any partition must be contained by the element.
>>
>> +  if (Offset > 0 || Size < ElementSize) {
>> +    if ((Offset + Size) > ElementSize)
>> +      return 0;
>> +    // Bail if this is a poniter element, we can't recurse through them.
>> +    if (ElementTy->isPointerTy())
>> +      return 0;
>>
>
> Why handle this special case here, since getTypePartition will take care
> of it?
>
>  +    return getTypePartition(TD, ElementTy, Offset, Size);
>> +  }
>> +  assert(Offset == 0);
>> +
>> +  if (Size == ElementSize)
>> +    return ElementTy;
>> +
>> +  StructType::element_iterator EI = STy->element_begin() + Index,
>>
>> +                               EE = STy->element_end();
>> +  if (EndOffset < SL->getSizeInBytes()) {
>> +    unsigned EndIndex = SL->getElementContainingOffset(EndOffset);
>>
>
> Shouldn't this be EndOffset - 1?  (In which case it should be <= above).
>
>  +    if (Index == EndIndex)
>> +      return 0; // Within a single element and its padding.
>> +    assert(Index < EndIndex);
>> +    assert(Index + EndIndex <= STy->getNumElements());
>> +    EE = STy->element_begin() + EndIndex;
>> +  }
>> +
>> +  // Try to build up a sub-structure.
>>
>> +  SmallVector<Type *, 4> ElementTys;
>> +  do {
>> +    ElementTys.push_back(*EI++);
>> +  } while (EI != EE);
>> +  StructType *SubTy = StructType::get(STy->getContext(), ElementTys,
>> +                                      STy->isPacked());
>> +  const StructLayout *SubSL = TD.getStructLayout(SubTy);
>> +  if (Size == SubSL->getSizeInBytes())
>> +    return SubTy;
>> +
>> +  // FIXME: We could potentially recurse down through the last element
>> in the
>> +  // sub-struct to find a natural end point.
>>
>> +  return 0;
>> +}
>> +
>> +/// \brief Rewrite an alloca partition's users.
>> +///
>> +/// This routine drives both of the rewriting goals of the SROA pass. It
>> tries
>> +/// to rewrite uses of an alloca partition to be conducive for SSA value
>> +/// promotion. If the partition needs a new, more refined alloca, this
>> will
>> +/// build that new alloca, preserving as much type information as
>> possible, and
>> +/// rewrite the uses of the old alloca to point at the new one and have
>> the
>> +/// appropriate new offsets. It also evaluates how successful the
>> rewrite was
>> +/// at enabling promotion and if it was successful queues the alloca to
>> be
>> +/// promoted.
>> +bool SROA::rewriteAllocaPartition(AllocaInst &AI,
>> +                                  AllocaPartitioning &P,
>>
>> +                                  AllocaPartitioning::iterator PI) {
>> +  uint64_t AllocaSize = PI->EndOffset - PI->BeginOffset;
>> +  if (P.use_begin(PI) == P.use_end(PI))
>> +    return false; // No live uses left of this partition.
>> +
>>
>> +  // Try to compute a friendly type for this partition of the alloca.
>> This
>> +  // won't always succeed, in which case we fall back to a legal integer
>> type
>> +  // or an i8 array of an appropriate size.
>> +  Type *AllocaTy = 0;
>> +  if (Type *PartitionTy = P.getCommonType(PI))
>> +    if (TD->getTypeAllocSize(PartitionTy) >= AllocaSize)
>> +      AllocaTy = PartitionTy;
>> +  if (!AllocaTy)
>>
>> +    if (Type *PartitionTy = getTypePartition(*TD, AI.getAllocatedType(),
>> +                                             PI->BeginOffset,
>> AllocaSize))
>> +      AllocaTy = PartitionTy;
>> +  if ((!AllocaTy ||
>> +       (AllocaTy->isArrayTy() &&
>> +        AllocaTy->getArrayElementType()->isIntegerTy())) &&
>> +      TD->isLegalInteger(AllocaSize * 8))
>> +    AllocaTy = Type::getIntNTy(*C, AllocaSize * 8);
>> +  if (!AllocaTy)
>> +    AllocaTy = ArrayType::get(Type::getInt8Ty(*C), AllocaSize);
>>
>
> How about an assert here that the alloc size of AllocaTy is >= AllocaSize.
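
For example, a one-line sketch using the names already in scope above:

    assert(TD->getTypeAllocSize(AllocaTy) >= AllocaSize &&
           "Chosen alloca type is too small for the partition");
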
>
>  +  // Check for the case where we're going to rewrite to a new alloca of
>> the
>> +  // exact same type as the original, and with the same access offsets.
>> In that
>> +  // case, re-use the existing alloca, but still run through the
>> rewriter to
>> +  // performe phi and select speculation.
>> +  AllocaInst *NewAI;
>> +  if (AllocaTy == AI.getAllocatedType()) {
>> +    assert(PI->BeginOffset == 0 &&
>> +           "Non-zero begin offset but same alloca type");
>> +    assert(PI == P.begin() && "Begin offset is zero on later partition");
>> +    NewAI = &AI;
>> +  } else {
>> +    // FIXME: The alignment here is overly conservative -- we could in
>> many
>> +    // cases get away with much weaker alignment constraints.
>>
>> +    NewAI = new AllocaInst(AllocaTy, 0, AI.getAlignment(),
>> +                           AI.getName() + ".sroa." + Twine(PI -
>> P.begin()),
>> +                           &AI);
>> +    ++NumNewAllocas;
>> +  }
>> +
>> +  DEBUG(dbgs() << "Rewriting alloca partition "
>> +               << "[" << PI->BeginOffset << "," << PI->EndOffset << ")
>> to: "
>> +               << *NewAI << "\n");
>> +
>> +  AllocaPartitionRewriter Rewriter(*TD, P, PI, *this, AI, *NewAI,
>> +                                   PI->BeginOffset, PI->EndOffset);
>> +  DEBUG(dbgs() << "  rewriting ");
>> +  DEBUG(P.print(dbgs(), PI, ""));
>> +  if (Rewriter.visitUsers(P.use_begin(PI), P.use_end(PI))) {
>> +    DEBUG(dbgs() << "  and queuing for promotion\n");
>> +    PromotableAllocas.push_back(NewAI);
>> +  } else if (NewAI != &AI) {
>> +    // If we can't promote the alloca, iterate on it to check for new
>> +    // refinements exposed by splitting the current alloca. Don't
>> iterate on an
>> +    // alloca which didn't actually change and didn't get promoted.
>> +    Worklist.insert(NewAI);
>> +  }
>> +  return true;
>> +}
>>
>
> ...
>
> Ciao, Duncan.
>