<div dir="ltr">I know it's a pain to have post-commit design review, but it's impossible to catch everything pre-commit.<div class="gmail_extra"><br><div class="gmail_quote">On Mon, Jan 26, 2015 at 2:51 PM, Chad Rosier <span dir="ltr"><<a href="mailto:mcrosier@codeaurora.org" target="_blank">mcrosier@codeaurora.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Author: mcrosier<br>

Date: Mon Jan 26 16:51:15 2015<br>

New Revision: 227149<br>

<br>

URL: <a href="http://llvm.org/viewvc/llvm-project?rev=227149&view=rev" target="_blank">http://llvm.org/viewvc/llvm-project?rev=227149&view=rev</a><br>

Log:<br>

Commoning of target specific load/store intrinsics in Early CSE.<br>

<br>

Phabricator revision: <a href="http://reviews.llvm.org/D7121" target="_blank">http://reviews.llvm.org/D7121</a><br>

Patch by Sanjin Sijaric <<a href="mailto:ssijaric@codeaurora.org">ssijaric@codeaurora.org</a>>!<br></blockquote><div><br></div><div>The thing I completely missed when skimming the subject was the fact that this is not just target-specific, but leveraging TTI...</div><div><br></div><div>So, first question: why is this desirable? Why do we need this? There seems to be no real justification for this in the commit log, comments, or anywhere.</div><div><br></div><div>EarlyCSE is a very high-level pass. I don't know why it is reasonable to teach it how to deal with every kind of target-specific load intrinsic we come up with. This is really contrary to the entire IR's design.</div><div><br></div><div>The point of TTI was to expose *cost models* from the code generator to the IR pass. This is something completely different and subverts that design in a really fundamental way. I'm pretty opposed to this entire direction unless there is some deep and fundamental reason why we *have* to do this and so far I'm not seeing it. I could speculate about any number of other ways you might solve similar or related problems, but it seems pointless until there is a clear and precise description of the problem this was intended to solve.</div><div><br></div><div>So on multiple levels I feel like this is not the right design.</div><div><br></div><div>-Chandler</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

Added:<br>

    llvm/trunk/test/Transforms/EarlyCSE/AArch64/<br>

    llvm/trunk/test/Transforms/EarlyCSE/AArch64/intrinsics.ll<br>

    llvm/trunk/test/Transforms/EarlyCSE/AArch64/lit.local.cfg<br>

Modified:<br>

    llvm/trunk/include/llvm/Analysis/TargetTransformInfo.h<br>

    llvm/trunk/lib/Analysis/TargetTransformInfo.cpp<br>

    llvm/trunk/lib/Target/AArch64/AArch64TargetTransformInfo.cpp<br>

    llvm/trunk/lib/Transforms/Scalar/EarlyCSE.cpp<br>

<br>

Modified: llvm/trunk/include/llvm/Analysis/TargetTransformInfo.h<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Analysis/TargetTransformInfo.h?rev=227149&r1=227148&r2=227149&view=diff" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Analysis/TargetTransformInfo.h?rev=227149&r1=227148&r2=227149&view=diff</a><br>

==============================================================================<br>

--- llvm/trunk/include/llvm/Analysis/TargetTransformInfo.h (original)<br>

+++ llvm/trunk/include/llvm/Analysis/TargetTransformInfo.h Mon Jan 26 16:51:15 2015<br>

@@ -23,6 +23,7 @@<br>

 #define LLVM_ANALYSIS_TARGETTRANSFORMINFO_H<br>

<br>

 #include "llvm/IR/Intrinsics.h"<br>

+#include "llvm/IR/IntrinsicInst.h"<br>

 #include "llvm/Pass.h"<br>

 #include "llvm/Support/DataTypes.h"<br>

<br>

@@ -35,6 +36,20 @@ class Type;<br>

 class User;<br>

 class Value;<br>

<br>

+/// \brief Information about a load/store intrinsic defined by the target.<br>

+struct MemIntrinsicInfo {<br>

+  MemIntrinsicInfo()<br>

+      : ReadMem(false), WriteMem(false), Vol(false), MatchingId(0),<br>

+        NumMemRefs(0), PtrVal(nullptr) {}<br>

+  bool ReadMem;<br>

+  bool WriteMem;<br>

+  bool Vol;<br>

+  // Same Id is set by the target for corresponding load/store intrinsics.<br>

+  unsigned short MatchingId;<br>

+  int NumMemRefs;<br>

+  Value *PtrVal;<br>

+};<br>

+<br>

 /// TargetTransformInfo - This pass provides access to the codegen<br>

 /// interfaces that are needed for IR-level transformations.<br>

 class TargetTransformInfo {<br>

@@ -443,6 +458,20 @@ public:<br>

   /// any callee-saved registers, so would require a spill and fill.<br>

   virtual unsigned getCostOfKeepingLiveOverCall(ArrayRef<Type*> Tys) const;<br>

<br>

+  /// \returns True if the intrinsic is a supported memory intrinsic.  Info<br>

+  /// will contain additional information - whether the intrinsic may write<br>

+  /// or read to memory, volatility and the pointer.  Info is undefined<br>

+  /// if false is returned.<br>

+  virtual bool getTgtMemIntrinsic(IntrinsicInst *Inst,<br>

+                                  MemIntrinsicInfo &Info) const;<br>

+<br>

+  /// \returns A value which is the result of the given memory intrinsic.  New<br>

+  /// instructions may be created to extract the result from the given intrinsic<br>

+  /// memory operation.  Returns nullptr if the target cannot create a result<br>

+  /// from the given intrinsic.<br>

+  virtual Value *getOrCreateResultFromMemIntrinsic(IntrinsicInst *Inst,<br>

+                                                   Type *ExpectedType) const;<br>

+<br>

   /// @}<br>

<br>

   /// Analysis group identification.<br>

<br>

Modified: llvm/trunk/lib/Analysis/TargetTransformInfo.cpp<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Analysis/TargetTransformInfo.cpp?rev=227149&r1=227148&r2=227149&view=diff" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Analysis/TargetTransformInfo.cpp?rev=227149&r1=227148&r2=227149&view=diff</a><br>

==============================================================================<br>

--- llvm/trunk/lib/Analysis/TargetTransformInfo.cpp (original)<br>

+++ llvm/trunk/lib/Analysis/TargetTransformInfo.cpp Mon Jan 26 16:51:15 2015<br>

@@ -254,6 +254,16 @@ unsigned TargetTransformInfo::getCostOfK<br>

   return PrevTTI->getCostOfKeepingLiveOverCall(Tys);<br>

 }<br>

<br>

+Value *TargetTransformInfo::getOrCreateResultFromMemIntrinsic(<br>

+    IntrinsicInst *Inst, Type *ExpectedType) const {<br>

+  return PrevTTI->getOrCreateResultFromMemIntrinsic(Inst, ExpectedType);<br>

+}<br>

+<br>

+bool TargetTransformInfo::getTgtMemIntrinsic(IntrinsicInst *Inst,<br>

+                                             MemIntrinsicInfo &Info) const {<br>

+  return PrevTTI->getTgtMemIntrinsic(Inst, Info);<br>

+}<br>

+<br>

 namespace {<br>

<br>

 struct NoTTI final : ImmutablePass, TargetTransformInfo {<br>

@@ -656,6 +666,15 @@ struct NoTTI final : ImmutablePass, Targ<br>

     return 0;<br>

   }<br>

<br>

+  bool getTgtMemIntrinsic(IntrinsicInst *Inst,<br>

+                          MemIntrinsicInfo &Info) const override {<br>

+    return false;<br>

+  }<br>

+<br>

+  Value *getOrCreateResultFromMemIntrinsic(IntrinsicInst *Inst,<br>

+                                           Type *ExpectedType) const override {<br>

+    return nullptr;<br>

+  }<br>

 };<br>

<br>

 } // end anonymous namespace<br>

<br>

Modified: llvm/trunk/lib/Target/AArch64/AArch64TargetTransformInfo.cpp<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/AArch64/AArch64TargetTransformInfo.cpp?rev=227149&r1=227148&r2=227149&view=diff" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/AArch64/AArch64TargetTransformInfo.cpp?rev=227149&r1=227148&r2=227149&view=diff</a><br>

==============================================================================<br>

--- llvm/trunk/lib/Target/AArch64/AArch64TargetTransformInfo.cpp (original)<br>

+++ llvm/trunk/lib/Target/AArch64/AArch64TargetTransformInfo.cpp Mon Jan 26 16:51:15 2015<br>

@@ -44,6 +44,12 @@ class AArch64TTI final : public Immutabl<br>

   /// are set if the result needs to be inserted and/or extracted from vectors.<br>

   unsigned getScalarizationOverhead(Type *Ty, bool Insert, bool Extract) const;<br>

<br>

+  enum MemIntrinsicType {<br>

+    VECTOR_LDST_TWO_ELEMENTS,<br>

+    VECTOR_LDST_THREE_ELEMENTS,<br>

+    VECTOR_LDST_FOUR_ELEMENTS<br>

+  };<br>

+<br>

 public:<br>

   AArch64TTI() : ImmutablePass(ID), TM(nullptr), ST(nullptr), TLI(nullptr) {<br>

     llvm_unreachable("This pass cannot be directly constructed");<br>

@@ -131,6 +137,11 @@ public:<br>

   void getUnrollingPreferences(const Function *F, Loop *L,<br>

                                UnrollingPreferences &UP) const override;<br>

<br>

+  Value *getOrCreateResultFromMemIntrinsic(IntrinsicInst *Inst,<br>

+                                           Type *ExpectedType) const override;<br>

+<br>

+  bool getTgtMemIntrinsic(IntrinsicInst *Inst,<br>

+                          MemIntrinsicInfo &Info) const override;<br>

<br>

   /// @}<br>

 };<br>

@@ -554,3 +565,83 @@ void AArch64TTI::getUnrollingPreferences<br>

   // Disable partial & runtime unrolling on -Os.<br>

   UP.PartialOptSizeThreshold = 0;<br>

 }<br>

+<br>

+Value *AArch64TTI::getOrCreateResultFromMemIntrinsic(IntrinsicInst *Inst,<br>

+                                                     Type *ExpectedType) const {<br>

+  switch (Inst->getIntrinsicID()) {<br>

+  default:<br>

+    return nullptr;<br>

+  case Intrinsic::aarch64_neon_st2:<br>

+  case Intrinsic::aarch64_neon_st3:<br>

+  case Intrinsic::aarch64_neon_st4: {<br>

+    // Create a struct type<br>

+    StructType *ST = dyn_cast<StructType>(ExpectedType);<br>

+    if (!ST)<br>

+      return nullptr;<br>

+    unsigned NumElts = Inst->getNumArgOperands() - 1;<br>

+    if (ST->getNumElements() != NumElts)<br>

+      return nullptr;<br>

+    for (unsigned i = 0, e = NumElts; i != e; ++i) {<br>

+      if (Inst->getArgOperand(i)->getType() != ST->getElementType(i))<br>

+        return nullptr;<br>

+    }<br>

+    Value *Res = UndefValue::get(ExpectedType);<br>

+    IRBuilder<> Builder(Inst);<br>

+    for (unsigned i = 0, e = NumElts; i != e; ++i) {<br>

+      Value *L = Inst->getArgOperand(i);<br>

+      Res = Builder.CreateInsertValue(Res, L, i);<br>

+    }<br>

+    return Res;<br>

+  }<br>

+  case Intrinsic::aarch64_neon_ld2:<br>

+  case Intrinsic::aarch64_neon_ld3:<br>

+  case Intrinsic::aarch64_neon_ld4:<br>

+    if (Inst->getType() == ExpectedType)<br>

+      return Inst;<br>

+    return nullptr;<br>

+  }<br>

+}<br>

+<br>

+bool AArch64TTI::getTgtMemIntrinsic(IntrinsicInst *Inst,<br>

+                                    MemIntrinsicInfo &Info) const {<br>

+  switch (Inst->getIntrinsicID()) {<br>

+  default:<br>

+    break;<br>

+  case Intrinsic::aarch64_neon_ld2:<br>

+  case Intrinsic::aarch64_neon_ld3:<br>

+  case Intrinsic::aarch64_neon_ld4:<br>

+    Info.ReadMem = true;<br>

+    Info.WriteMem = false;<br>

+    Info.Vol = false;<br>

+    Info.NumMemRefs = 1;<br>

+    Info.PtrVal = Inst->getArgOperand(0);<br>

+    break;<br>

+  case Intrinsic::aarch64_neon_st2:<br>

+  case Intrinsic::aarch64_neon_st3:<br>

+  case Intrinsic::aarch64_neon_st4:<br>

+    Info.ReadMem = false;<br>

+    Info.WriteMem = true;<br>

+    Info.Vol = false;<br>

+    Info.NumMemRefs = 1;<br>

+    Info.PtrVal = Inst->getArgOperand(Inst->getNumArgOperands() - 1);<br>

+    break;<br>

+  }<br>

+<br>

+  switch (Inst->getIntrinsicID()) {<br>

+  default:<br>

+    return false;<br>

+  case Intrinsic::aarch64_neon_ld2:<br>

+  case Intrinsic::aarch64_neon_st2:<br>

+    Info.MatchingId = VECTOR_LDST_TWO_ELEMENTS;<br>

+    break;<br>

+  case Intrinsic::aarch64_neon_ld3:<br>

+  case Intrinsic::aarch64_neon_st3:<br>

+    Info.MatchingId = VECTOR_LDST_THREE_ELEMENTS;<br>

+    break;<br>

+  case Intrinsic::aarch64_neon_ld4:<br>

+  case Intrinsic::aarch64_neon_st4:<br>

+    Info.MatchingId = VECTOR_LDST_FOUR_ELEMENTS;<br>

+    break;<br>

+  }<br>

+  return true;<br>

+}<br>

<br>

Modified: llvm/trunk/lib/Transforms/Scalar/EarlyCSE.cpp<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Scalar/EarlyCSE.cpp?rev=227149&r1=227148&r2=227149&view=diff" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Scalar/EarlyCSE.cpp?rev=227149&r1=227148&r2=227149&view=diff</a><br>

==============================================================================<br>

--- llvm/trunk/lib/Transforms/Scalar/EarlyCSE.cpp (original)<br>

+++ llvm/trunk/lib/Transforms/Scalar/EarlyCSE.cpp Mon Jan 26 16:51:15 2015<br>

@@ -18,6 +18,7 @@<br>

 #include "llvm/ADT/Statistic.h"<br>

 #include "llvm/Analysis/AssumptionCache.h"<br>

 #include "llvm/Analysis/InstructionSimplify.h"<br>

+#include "llvm/Analysis/TargetTransformInfo.h"<br>

 #include "llvm/IR/DataLayout.h"<br>

 #include "llvm/IR/Dominators.h"<br>

 #include "llvm/IR/Instructions.h"<br>

@@ -273,6 +274,7 @@ class EarlyCSE : public FunctionPass {<br>

 public:<br>

   const DataLayout *DL;<br>

   const TargetLibraryInfo *TLI;<br>

+  const TargetTransformInfo *TTI;<br>

   DominatorTree *DT;<br>

   AssumptionCache *AC;<br>

   typedef RecyclingAllocator<<br>

@@ -383,14 +385,83 @@ private:<br>

     bool Processed;<br>

   };<br>

<br>

+  /// \brief Wrapper class to handle memory instructions, including loads,<br>

+  /// stores and intrinsic loads and stores defined by the target.<br>

+  class ParseMemoryInst {<br>

+  public:<br>

+    ParseMemoryInst(Instruction *Inst, const TargetTransformInfo *TTI)<br>

+        : Load(false), Store(false), Vol(false), MayReadFromMemory(false),<br>

+          MayWriteToMemory(false), MatchingId(-1), Ptr(nullptr) {<br>

+      MayReadFromMemory = Inst->mayReadFromMemory();<br>

+      MayWriteToMemory = Inst->mayWriteToMemory();<br>

+      if (IntrinsicInst *II = dyn_cast<IntrinsicInst>(Inst)) {<br>

+        MemIntrinsicInfo Info;<br>

+        if (!TTI->getTgtMemIntrinsic(II, Info))<br>

+          return;<br>

+        if (Info.NumMemRefs == 1) {<br>

+          Store = Info.WriteMem;<br>

+          Load = Info.ReadMem;<br>

+          MatchingId = Info.MatchingId;<br>

+          MayReadFromMemory = Info.ReadMem;<br>

+          MayWriteToMemory = Info.WriteMem;<br>

+          Vol = Info.Vol;<br>

+          Ptr = Info.PtrVal;<br>

+        }<br>

+      } else if (LoadInst *LI = dyn_cast<LoadInst>(Inst)) {<br>

+        Load = true;<br>

+        Vol = !LI->isSimple();<br>

+        Ptr = LI->getPointerOperand();<br>

+      } else if (StoreInst *SI = dyn_cast<StoreInst>(Inst)) {<br>

+        Store = true;<br>

+        Vol = !SI->isSimple();<br>

+        Ptr = SI->getPointerOperand();<br>

+      }<br>

+    }<br>

+    bool isLoad() { return Load; }<br>

+    bool isStore() { return Store; }<br>

+    bool isVolatile() { return Vol; }<br>

+    bool isMatchingMemLoc(const ParseMemoryInst &Inst) {<br>

+      return Ptr == Inst.Ptr && MatchingId == Inst.MatchingId;<br>

+    }<br>

+    bool isValid() { return Ptr != nullptr; }<br>

+    int getMatchingId() { return MatchingId; }<br>

+    Value *getPtr() { return Ptr; }<br>

+    bool mayReadFromMemory() { return MayReadFromMemory; }<br>

+    bool mayWriteToMemory() { return MayWriteToMemory; }<br>

+<br>

+  private:<br>

+    bool Load;<br>

+    bool Store;<br>

+    bool Vol;<br>

+    bool MayReadFromMemory;<br>

+    bool MayWriteToMemory;<br>

+    // For regular (non-intrinsic) loads/stores, this is set to -1. For<br>

+    // intrinsic loads/stores, the id is retrieved from the corresponding<br>

+    // field in the MemIntrinsicInfo structure.  That field contains<br>

+    // non-negative values only.<br>

+    int MatchingId;<br>

+    Value *Ptr;<br>

+  };<br>

+<br>

   bool processNode(DomTreeNode *Node);<br>

<br>

   void getAnalysisUsage(AnalysisUsage &AU) const override {<br>

     AU.addRequired<AssumptionCacheTracker>();<br>

     AU.addRequired<DominatorTreeWrapperPass>();<br>

     AU.addRequired<TargetLibraryInfoWrapperPass>();<br>

+    AU.addRequired<TargetTransformInfo>();<br>

     AU.setPreservesCFG();<br>

   }<br>

+<br>

+  Value *getOrCreateResult(Value *Inst, Type *ExpectedType) const {<br>

+    if (LoadInst *LI = dyn_cast<LoadInst>(Inst))<br>

+      return LI;<br>

+    else if (StoreInst *SI = dyn_cast<StoreInst>(Inst))<br>

+      return SI->getValueOperand();<br>

+    assert(isa<IntrinsicInst>(Inst) && "Instruction not supported");<br>

+    return TTI->getOrCreateResultFromMemIntrinsic(cast<IntrinsicInst>(Inst),<br>

+                                                  ExpectedType);<br>

+  }<br>

 };<br>

 }<br>

<br>

@@ -420,7 +491,7 @@ bool EarlyCSE::processNode(DomTreeNode *<br>

   /// as long as there in no instruction that reads memory.  If we see a store<br>

   /// to the same location, we delete the dead store.  This zaps trivial dead<br>

   /// stores which can occur in bitfield code among other things.<br>

-  StoreInst *LastStore = nullptr;<br>

+  Instruction *LastStore = nullptr;<br>

<br>

   bool Changed = false;<br>

<br>

@@ -475,10 +546,11 @@ bool EarlyCSE::processNode(DomTreeNode *<br>

       continue;<br>

     }<br>

<br>

+    ParseMemoryInst MemInst(Inst, TTI);<br>

     // If this is a non-volatile load, process it.<br>

-    if (LoadInst *LI = dyn_cast<LoadInst>(Inst)) {<br>

+    if (MemInst.isValid() && MemInst.isLoad()) {<br>

       // Ignore volatile loads.<br>

-      if (!LI->isSimple()) {<br>

+      if (MemInst.isVolatile()) {<br>

         LastStore = nullptr;<br>

         continue;<br>

       }<br>

@@ -486,27 +558,35 @@ bool EarlyCSE::processNode(DomTreeNode *<br>

       // If we have an available version of this load, and if it is the right<br>

       // generation, replace this instruction.<br>

       std::pair<Value *, unsigned> InVal =<br>

-          AvailableLoads->lookup(Inst->getOperand(0));<br>

+          AvailableLoads->lookup(MemInst.getPtr());<br>

       if (InVal.first != nullptr && InVal.second == CurrentGeneration) {<br>

-        DEBUG(dbgs() << "EarlyCSE CSE LOAD: " << *Inst<br>

-                     << "  to: " << *InVal.first << '\n');<br>

-        if (!Inst->use_empty())<br>

-          Inst->replaceAllUsesWith(InVal.first);<br>

-        Inst->eraseFromParent();<br>

-        Changed = true;<br>

-        ++NumCSELoad;<br>

-        continue;<br>

+        Value *Op = getOrCreateResult(InVal.first, Inst->getType());<br>

+        if (Op != nullptr) {<br>

+          DEBUG(dbgs() << "EarlyCSE CSE LOAD: " << *Inst<br>

+                       << "  to: " << *InVal.first << '\n');<br>

+          if (!Inst->use_empty())<br>

+            Inst->replaceAllUsesWith(Op);<br>

+          Inst->eraseFromParent();<br>

+          Changed = true;<br>

+          ++NumCSELoad;<br>

+          continue;<br>

+        }<br>

       }<br>

<br>

       // Otherwise, remember that we have this instruction.<br>

-      AvailableLoads->insert(Inst->getOperand(0), std::pair<Value *, unsigned>(<br>

-                                                      Inst, CurrentGeneration));<br>

+      AvailableLoads->insert(MemInst.getPtr(), std::pair<Value *, unsigned>(<br>

+                                                   Inst, CurrentGeneration));<br>

       LastStore = nullptr;<br>

       continue;<br>

     }<br>

<br>

     // If this instruction may read from memory, forget LastStore.<br>

-    if (Inst->mayReadFromMemory())<br>

+    // Load/store intrinsics will indicate both a read and a write to<br>

+    // memory.  The target may override this (e.g. so that a store intrinsic<br>

+    // does not read  from memory, and thus will be treated the same as a<br>

+    // regular store for commoning purposes).<br>

+    if (Inst->mayReadFromMemory() &&<br>

+        !(MemInst.isValid() && !MemInst.mayReadFromMemory()))<br>

       LastStore = nullptr;<br>

<br>

     // If this is a read-only call, process it.<br>

@@ -537,17 +617,19 @@ bool EarlyCSE::processNode(DomTreeNode *<br>

     if (Inst->mayWriteToMemory()) {<br>

       ++CurrentGeneration;<br>

<br>

-      if (StoreInst *SI = dyn_cast<StoreInst>(Inst)) {<br>

+      if (MemInst.isValid() && MemInst.isStore()) {<br>

         // We do a trivial form of DSE if there are two stores to the same<br>

         // location with no intervening loads.  Delete the earlier store.<br>

-        if (LastStore &&<br>

-            LastStore->getPointerOperand() == SI->getPointerOperand()) {<br>

-          DEBUG(dbgs() << "EarlyCSE DEAD STORE: " << *LastStore<br>

-                       << "  due to: " << *Inst << '\n');<br>

-          LastStore->eraseFromParent();<br>

-          Changed = true;<br>

-          ++NumDSE;<br>

-          LastStore = nullptr;<br>

+        if (LastStore) {<br>

+          ParseMemoryInst LastStoreMemInst(LastStore, TTI);<br>

+          if (LastStoreMemInst.isMatchingMemLoc(MemInst)) {<br>

+            DEBUG(dbgs() << "EarlyCSE DEAD STORE: " << *LastStore<br>

+                         << "  due to: " << *Inst << '\n');<br>

+            LastStore->eraseFromParent();<br>

+            Changed = true;<br>

+            ++NumDSE;<br>

+            LastStore = nullptr;<br>

+          }<br>

           // fallthrough - we can exploit information about this store<br>

         }<br>

<br>

@@ -556,13 +638,12 @@ bool EarlyCSE::processNode(DomTreeNode *<br>

         // version of the pointer.  It is safe to forward from volatile stores<br>

         // to non-volatile loads, so we don't have to check for volatility of<br>

         // the store.<br>

-        AvailableLoads->insert(SI->getPointerOperand(),<br>

-                               std::pair<Value *, unsigned>(<br>

-                                   SI->getValueOperand(), CurrentGeneration));<br>

+        AvailableLoads->insert(MemInst.getPtr(), std::pair<Value *, unsigned>(<br>

+                                                     Inst, CurrentGeneration));<br>

<br>

         // Remember that this was the last store we saw for DSE.<br>

-        if (SI->isSimple())<br>

-          LastStore = SI;<br>

+        if (!MemInst.isVolatile())<br>

+          LastStore = Inst;<br>

       }<br>

     }<br>

   }<br>

@@ -584,6 +665,7 @@ bool EarlyCSE::runOnFunction(Function &F<br>

   DataLayoutPass *DLP = getAnalysisIfAvailable<DataLayoutPass>();<br>

   DL = DLP ? &DLP->getDataLayout() : nullptr;<br>

   TLI = &getAnalysis<TargetLibraryInfoWrapperPass>().getTLI();<br>

+  TTI = &getAnalysis<TargetTransformInfo>();<br>

   DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree();<br>

   AC = &getAnalysis<AssumptionCacheTracker>().getAssumptionCache(F);<br>

<br>

<br>

Added: llvm/trunk/test/Transforms/EarlyCSE/AArch64/intrinsics.ll<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/EarlyCSE/AArch64/intrinsics.ll?rev=227149&view=auto" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/EarlyCSE/AArch64/intrinsics.ll?rev=227149&view=auto</a><br>

==============================================================================<br>

--- llvm/trunk/test/Transforms/EarlyCSE/AArch64/intrinsics.ll (added)<br>

+++ llvm/trunk/test/Transforms/EarlyCSE/AArch64/intrinsics.ll Mon Jan 26 16:51:15 2015<br>

@@ -0,0 +1,231 @@<br>

+; RUN: opt < %s -S -mtriple=aarch64-none-linux-gnu -mattr=+neon -early-cse | FileCheck %s<br>

+<br>

+define <4 x i32> @test_cse(i32* %a, [2 x <4 x i32>] %s.coerce, i32 %n) {<br>

+entry:<br>

+; Check that @llvm.aarch64.neon.ld2 is optimized away by Early CSE.<br>

+; CHECK-LABEL: @test_cse<br>

+; CHECK-NOT: call { <4 x i32>, <4 x i32> } @llvm.aarch64.neon.ld2.v4i32.p0i8<br>

+  %s.coerce.fca.0.extract = extractvalue [2 x <4 x i32>] %s.coerce, 0<br>

+  %s.coerce.fca.1.extract = extractvalue [2 x <4 x i32>] %s.coerce, 1<br>

+  br label %for.cond<br>

+<br>

+for.cond:                                         ; preds = %for.body, %entry<br>

+  %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.body ]<br>

+  %res.0 = phi <4 x i32> [ undef, %entry ], [ %call, %for.body ]<br>

+  %cmp = icmp slt i32 %i.0, %n<br>

+  br i1 %cmp, label %for.body, label %for.end<br>

+<br>

+for.body:                                         ; preds = %for.cond<br>

+  %0 = bitcast i32* %a to i8*<br>

+  %1 = bitcast <4 x i32> %s.coerce.fca.0.extract to <16 x i8><br>

+  %2 = bitcast <4 x i32> %s.coerce.fca.1.extract to <16 x i8><br>

+  %3 = bitcast <16 x i8> %1 to <4 x i32><br>

+  %4 = bitcast <16 x i8> %2 to <4 x i32><br>

+  call void @llvm.aarch64.neon.st2.v4i32.p0i8(<4 x i32> %3, <4 x i32> %4, i8* %0)<br>

+  %5 = bitcast i32* %a to i8*<br>

+  %vld2 = call { <4 x i32>, <4 x i32> } @llvm.aarch64.neon.ld2.v4i32.p0i8(i8* %5)<br>

+  %vld2.fca.0.extract = extractvalue { <4 x i32>, <4 x i32> } %vld2, 0<br>

+  %vld2.fca.1.extract = extractvalue { <4 x i32>, <4 x i32> } %vld2, 1<br>

+  %call = call <4 x i32> @vaddq_s32(<4 x i32> %vld2.fca.0.extract, <4 x i32> %vld2.fca.0.extract)<br>

+  %inc = add nsw i32 %i.0, 1<br>

+  br label %for.cond<br>

+<br>

+for.end:                                          ; preds = %for.cond<br>

+  ret <4 x i32> %res.0<br>

+}<br>

+<br>

+define <4 x i32> @test_cse2(i32* %a, [2 x <4 x i32>] %s.coerce, i32 %n) {<br>

+entry:<br>

+; Check that the first @llvm.aarch64.neon.st2 is optimized away by Early CSE.<br>

+; CHECK-LABEL: @test_cse2<br>

+; CHECK-NOT: call void @llvm.aarch64.neon.st2.v4i32.p0i8(<4 x i32> %3, <4 x i32> %3, i8* %0)<br>

+; CHECK: call void @llvm.aarch64.neon.st2.v4i32.p0i8(<4 x i32> %3, <4 x i32> %4, i8* %0)<br>

+  %s.coerce.fca.0.extract = extractvalue [2 x <4 x i32>] %s.coerce, 0<br>

+  %s.coerce.fca.1.extract = extractvalue [2 x <4 x i32>] %s.coerce, 1<br>

+  br label %for.cond<br>

+<br>

+for.cond:                                         ; preds = %for.body, %entry<br>

+  %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.body ]<br>

+  %res.0 = phi <4 x i32> [ undef, %entry ], [ %call, %for.body ]<br>

+  %cmp = icmp slt i32 %i.0, %n<br>

+  br i1 %cmp, label %for.body, label %for.end<br>

+<br>

+for.body:                                         ; preds = %for.cond<br>

+  %0 = bitcast i32* %a to i8*<br>

+  %1 = bitcast <4 x i32> %s.coerce.fca.0.extract to <16 x i8><br>

+  %2 = bitcast <4 x i32> %s.coerce.fca.1.extract to <16 x i8><br>

+  %3 = bitcast <16 x i8> %1 to <4 x i32><br>

+  %4 = bitcast <16 x i8> %2 to <4 x i32><br>

+  call void @llvm.aarch64.neon.st2.v4i32.p0i8(<4 x i32> %3, <4 x i32> %3, i8* %0)<br>

+  call void @llvm.aarch64.neon.st2.v4i32.p0i8(<4 x i32> %3, <4 x i32> %4, i8* %0)<br>

+  %5 = bitcast i32* %a to i8*<br>

+  %vld2 = call { <4 x i32>, <4 x i32> } @llvm.aarch64.neon.ld2.v4i32.p0i8(i8* %5)<br>

+  %vld2.fca.0.extract = extractvalue { <4 x i32>, <4 x i32> } %vld2, 0<br>

+  %vld2.fca.1.extract = extractvalue { <4 x i32>, <4 x i32> } %vld2, 1<br>

+  %call = call <4 x i32> @vaddq_s32(<4 x i32> %vld2.fca.0.extract, <4 x i32> %vld2.fca.0.extract)<br>

+  %inc = add nsw i32 %i.0, 1<br>

+  br label %for.cond<br>

+<br>

+for.end:                                          ; preds = %for.cond<br>

+  ret <4 x i32> %res.0<br>

+}<br>

+<br>

+define <4 x i32> @test_cse3(i32* %a, [2 x <4 x i32>] %s.coerce, i32 %n) #0 {<br>

+entry:<br>

+; Check that the first @llvm.aarch64.neon.ld2 is optimized away by Early CSE.<br>

+; CHECK-LABEL: @test_cse3<br>

+; CHECK: call { <4 x i32>, <4 x i32> } @llvm.aarch64.neon.ld2.v4i32.p0i8<br>

+; CHECK-NOT: call { <4 x i32>, <4 x i32> } @llvm.aarch64.neon.ld2.v4i32.p0i8<br>

+  %s.coerce.fca.0.extract = extractvalue [2 x <4 x i32>] %s.coerce, 0<br>

+  %s.coerce.fca.1.extract = extractvalue [2 x <4 x i32>] %s.coerce, 1<br>

+  br label %for.cond<br>

+<br>

+for.cond:                                         ; preds = %for.body, %entry<br>

+  %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.body ]<br>

+  %res.0 = phi <4 x i32> [ undef, %entry ], [ %call, %for.body ]<br>

+  %cmp = icmp slt i32 %i.0, %n<br>

+  br i1 %cmp, label %for.body, label %for.end<br>

+<br>

+for.body:                                         ; preds = %for.cond<br>

+  %0 = bitcast i32* %a to i8*<br>

+  %vld2 = call { <4 x i32>, <4 x i32> } @llvm.aarch64.neon.ld2.v4i32.p0i8(i8* %0)<br>

+  %vld2.fca.0.extract = extractvalue { <4 x i32>, <4 x i32> } %vld2, 0<br>

+  %vld2.fca.1.extract = extractvalue { <4 x i32>, <4 x i32> } %vld2, 1<br>

+  %1 = bitcast i32* %a to i8*<br>

+  %vld22 = call { <4 x i32>, <4 x i32> } @llvm.aarch64.neon.ld2.v4i32.p0i8(i8* %1)<br>

+  %vld22.fca.0.extract = extractvalue { <4 x i32>, <4 x i32> } %vld22, 0<br>

+  %vld22.fca.1.extract = extractvalue { <4 x i32>, <4 x i32> } %vld22, 1<br>

+  %call = call <4 x i32> @vaddq_s32(<4 x i32> %vld2.fca.0.extract, <4 x i32> %vld22.fca.0.extract)<br>

+  %inc = add nsw i32 %i.0, 1<br>

+  br label %for.cond<br>

+<br>

+for.end:                                          ; preds = %for.cond<br>

+  ret <4 x i32> %res.0<br>

+}<br>

+<br>

+<br>

+define <4 x i32> @test_nocse(i32* %a, i32* %b, [2 x <4 x i32>] %s.coerce, i32 %n) {<br>

+entry:<br>

+; Check that the store prevents @llvm.aarch64.neon.ld2 from being optimized<br>

+; away by Early CSE.<br>

+; CHECK-LABEL: @test_nocse<br>

+; CHECK: call { <4 x i32>, <4 x i32> } @llvm.aarch64.neon.ld2.v4i32.p0i8<br>

+  %s.coerce.fca.0.extract = extractvalue [2 x <4 x i32>] %s.coerce, 0<br>

+  %s.coerce.fca.1.extract = extractvalue [2 x <4 x i32>] %s.coerce, 1<br>

+  br label %for.cond<br>

+<br>

+for.cond:                                         ; preds = %for.body, %entry<br>

+  %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.body ]<br>

+  %res.0 = phi <4 x i32> [ undef, %entry ], [ %call, %for.body ]<br>

+  %cmp = icmp slt i32 %i.0, %n<br>

+  br i1 %cmp, label %for.body, label %for.end<br>

+<br>

+for.body:                                         ; preds = %for.cond<br>

+  %0 = bitcast i32* %a to i8*<br>

+  %1 = bitcast <4 x i32> %s.coerce.fca.0.extract to <16 x i8><br>

+  %2 = bitcast <4 x i32> %s.coerce.fca.1.extract to <16 x i8><br>

+  %3 = bitcast <16 x i8> %1 to <4 x i32><br>

+  %4 = bitcast <16 x i8> %2 to <4 x i32><br>

+  call void @llvm.aarch64.neon.st2.v4i32.p0i8(<4 x i32> %3, <4 x i32> %4, i8* %0)<br>

+  store i32 0, i32* %b, align 4<br>

+  %5 = bitcast i32* %a to i8*<br>

+  %vld2 = call { <4 x i32>, <4 x i32> } @llvm.aarch64.neon.ld2.v4i32.p0i8(i8* %5)<br>

+  %vld2.fca.0.extract = extractvalue { <4 x i32>, <4 x i32> } %vld2, 0<br>

+  %vld2.fca.1.extract = extractvalue { <4 x i32>, <4 x i32> } %vld2, 1<br>

+  %call = call <4 x i32> @vaddq_s32(<4 x i32> %vld2.fca.0.extract, <4 x i32> %vld2.fca.0.extract)<br>

+  %inc = add nsw i32 %i.0, 1<br>

+  br label %for.cond<br>

+<br>

+for.end:                                          ; preds = %for.cond<br>

+  ret <4 x i32> %res.0<br>

+}<br>

+<br>

+define <4 x i32> @test_nocse2(i32* %a, [2 x <4 x i32>] %s.coerce, i32 %n) {<br>

+entry:<br>

+; Check that @llvm.aarch64.neon.ld3 is not optimized away by Early CSE due<br>

+; to mismatch between st2 and ld3.<br>

+; CHECK-LABEL: @test_nocse2<br>

+; CHECK: call { <4 x i32>, <4 x i32>, <4 x i32> } @llvm.aarch64.neon.ld3.v4i32.p0i8<br>

+  %s.coerce.fca.0.extract = extractvalue [2 x <4 x i32>] %s.coerce, 0<br>

+  %s.coerce.fca.1.extract = extractvalue [2 x <4 x i32>] %s.coerce, 1<br>

+  br label %for.cond<br>

+<br>

+for.cond:                                         ; preds = %for.body, %entry<br>

+  %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.body ]<br>

+  %res.0 = phi <4 x i32> [ undef, %entry ], [ %call, %for.body ]<br>

+  %cmp = icmp slt i32 %i.0, %n<br>

+  br i1 %cmp, label %for.body, label %for.end<br>

+<br>

+for.body:                                         ; preds = %for.cond<br>

+  %0 = bitcast i32* %a to i8*<br>

+  %1 = bitcast <4 x i32> %s.coerce.fca.0.extract to <16 x i8><br>

+  %2 = bitcast <4 x i32> %s.coerce.fca.1.extract to <16 x i8><br>

+  %3 = bitcast <16 x i8> %1 to <4 x i32><br>

+  %4 = bitcast <16 x i8> %2 to <4 x i32><br>

+  call void @llvm.aarch64.neon.st2.v4i32.p0i8(<4 x i32> %3, <4 x i32> %4, i8* %0)<br>

+  %5 = bitcast i32* %a to i8*<br>

+  %vld3 = call { <4 x i32>, <4 x i32>, <4 x i32> } @llvm.aarch64.neon.ld3.v4i32.p0i8(i8* %5)<br>

+  %vld3.fca.0.extract = extractvalue { <4 x i32>, <4 x i32>, <4 x i32> } %vld3, 0<br>

+  %vld3.fca.2.extract = extractvalue { <4 x i32>, <4 x i32>, <4 x i32> } %vld3, 2<br>

+  %call = call <4 x i32> @vaddq_s32(<4 x i32> %vld3.fca.0.extract, <4 x i32> %vld3.fca.2.extract)<br>

+  %inc = add nsw i32 %i.0, 1<br>

+  br label %for.cond<br>

+<br>

+for.end:                                          ; preds = %for.cond<br>

+  ret <4 x i32> %res.0<br>

+}<br>

+<br>

+define <4 x i32> @test_nocse3(i32* %a, [2 x <4 x i32>] %s.coerce, i32 %n) {<br>

+entry:<br>

+; Check that @llvm.aarch64.neon.st3 is not optimized away by Early CSE due to<br>

+; mismatch between st2 and st3.<br>

+; CHECK-LABEL: @test_nocse3<br>

+; CHECK: call void @llvm.aarch64.neon.st3.v4i32.p0i8<br>

+; CHECK: call void @llvm.aarch64.neon.st2.v4i32.p0i8<br>

+  %s.coerce.fca.0.extract = extractvalue [2 x <4 x i32>] %s.coerce, 0<br>

+  %s.coerce.fca.1.extract = extractvalue [2 x <4 x i32>] %s.coerce, 1<br>

+  br label %for.cond<br>

+<br>

+for.cond:                                         ; preds = %for.body, %entry<br>

+  %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.body ]<br>

+  %res.0 = phi <4 x i32> [ undef, %entry ], [ %call, %for.body ]<br>

+  %cmp = icmp slt i32 %i.0, %n<br>

+  br i1 %cmp, label %for.body, label %for.end<br>

+<br>

+for.body:                                         ; preds = %for.cond<br>

+  %0 = bitcast i32* %a to i8*<br>

+  %1 = bitcast <4 x i32> %s.coerce.fca.0.extract to <16 x i8><br>

+  %2 = bitcast <4 x i32> %s.coerce.fca.1.extract to <16 x i8><br>

+  %3 = bitcast <16 x i8> %1 to <4 x i32><br>

+  %4 = bitcast <16 x i8> %2 to <4 x i32><br>

+  call void @llvm.aarch64.neon.st3.v4i32.p0i8(<4 x i32> %4, <4 x i32> %3, <4 x i32> %3, i8* %0)<br>

+  call void @llvm.aarch64.neon.st2.v4i32.p0i8(<4 x i32> %3, <4 x i32> %3, i8* %0)<br>

+  %5 = bitcast i32* %a to i8*<br>

+  %vld3 = call { <4 x i32>, <4 x i32>, <4 x i32> } @llvm.aarch64.neon.ld3.v4i32.p0i8(i8* %5)<br>

+  %vld3.fca.0.extract = extractvalue { <4 x i32>, <4 x i32>, <4 x i32> } %vld3, 0<br>

+  %vld3.fca.1.extract = extractvalue { <4 x i32>, <4 x i32>, <4 x i32> } %vld3, 1<br>

+  %call = call <4 x i32> @vaddq_s32(<4 x i32> %vld3.fca.0.extract, <4 x i32> %vld3.fca.0.extract)<br>

+  %inc = add nsw i32 %i.0, 1<br>

+  br label %for.cond<br>

+<br>

+for.end:                                          ; preds = %for.cond<br>

+  ret <4 x i32> %res.0<br>

+}<br>

+<br>

+; Function Attrs: nounwind<br>

+declare void @llvm.aarch64.neon.st2.v4i32.p0i8(<4 x i32>, <4 x i32>, i8* nocapture)<br>

+<br>

+; Function Attrs: nounwind<br>

+declare void @llvm.aarch64.neon.st3.v4i32.p0i8(<4 x i32>, <4 x i32>, <4 x i32>, i8* nocapture)<br>

+<br>

+; Function Attrs: nounwind readonly<br>

+declare { <4 x i32>, <4 x i32> } @llvm.aarch64.neon.ld2.v4i32.p0i8(i8*)<br>

+<br>

+; Function Attrs: nounwind readonly<br>

+declare { <4 x i32>, <4 x i32>, <4 x i32> } @llvm.aarch64.neon.ld3.v4i32.p0i8(i8*)<br>

+<br>

+define internal fastcc <4 x i32> @vaddq_s32(<4 x i32> %__p0, <4 x i32> %__p1) {<br>

+entry:<br>

+  %add = add <4 x i32> %__p0, %__p1<br>

+  ret <4 x i32> %add<br>

+}<br>

<br>

Added: llvm/trunk/test/Transforms/EarlyCSE/AArch64/lit.local.cfg<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/EarlyCSE/AArch64/lit.local.cfg?rev=227149&view=auto" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/EarlyCSE/AArch64/lit.local.cfg?rev=227149&view=auto</a><br>

==============================================================================<br>

--- llvm/trunk/test/Transforms/EarlyCSE/AArch64/lit.local.cfg (added)<br>

+++ llvm/trunk/test/Transforms/EarlyCSE/AArch64/lit.local.cfg Mon Jan 26 16:51:15 2015<br>

@@ -0,0 +1,5 @@<br>

+config.suffixes = ['.ll']<br>

+<br>

+targets = set(config.root.targets_to_build.split())<br>

+if not 'AArch64' in targets:<br>

+    config.unsupported = True<br>

<br>

<br>

_______________________________________________<br>

llvm-commits mailing list<br>

<a href="mailto:llvm-commits@cs.uiuc.edu">llvm-commits@cs.uiuc.edu</a><br>

<a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits</a><br>

</blockquote></div><br></div></div>