<div dir="ltr">I know its a pain to have post-commit design review, but its impossible to catch everything pre-commit.<div class="gmail_extra"><br><div class="gmail_quote">On Mon, Jan 26, 2015 at 2:51 PM, Chad Rosier <span dir="ltr"><<a href="mailto:mcrosier@codeaurora.org" target="_blank">mcrosier@codeaurora.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Author: mcrosier<br>
> Date: Mon Jan 26 16:51:15 2015
> New Revision: 227149
>
> URL: http://llvm.org/viewvc/llvm-project?rev=227149&view=rev
> Log:
> Commoning of target specific load/store intrinsics in Early CSE.
>
> Phabricator revision: http://reviews.llvm.org/D7121
> Patch by Sanjin Sijaric <ssijaric@codeaurora.org>!

The thing I completely missed when skimming the subject was that this is not just target-specific, but leverages TTI...

So, first question: why is this desirable? Why do we need this? There seems to be no real justification for it in the commit log, the comments, or anywhere else.

EarlyCSE is a very high-level pass. I don't know why it is reasonable to teach it how to deal with every kind of target-specific load intrinsic we come up with. This is really contrary to the entire design of the IR.

The point of TTI was to expose *cost models* from the code generator to IR passes. This is something completely different and subverts that design in a really fundamental way. I'm pretty opposed to this entire direction unless there is some deep and fundamental reason why we *have* to do this, and so far I'm not seeing it. I could speculate about any number of other ways you might solve similar or related problems, but that seems pointless until there is a clear and precise description of the problem this was intended to solve.

So on multiple levels I feel like this is not the right design.

-Chandler
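P.S. To make the distinction concrete, here is a minimal sketch -- hypothetical stand-in types, *not* the real LLVM interfaces -- of the two shapes of interface. A cost-model query only rates IR whose semantics the pass already owns; the hooks added here instead have the target describe the memory behavior of an opaque intrinsic and even build replacement IR on the pass's behalf:

#include <cstdio>
#include <string>

// Stand-ins for llvm::Instruction / llvm::Value; illustrative only.
struct Instruction { std::string Name; };
struct Value { std::string Name; };

// Cost-model-style TTI: the target merely *rates* constructs that the
// IR pass already understands.
struct CostModelTTI {
  virtual ~CostModelTTI() {}
  virtual unsigned getOperationCost(const Instruction &) const { return 1; }
};

// The commit's hooks are different in kind: the target *describes* the
// memory behavior of an otherwise opaque intrinsic, and may materialize
// new IR standing in for its result.
struct MemIntrinsicInfoSketch {
  bool ReadMem = false;
  bool WriteMem = false;
  bool Vol = false;
  Value *PtrVal = nullptr;
};

struct SemanticTTI : CostModelTTI {
  // Returns true and fills Info only if the target recognizes the intrinsic.
  virtual bool getTgtMemIntrinsic(const Instruction &,
                                  MemIntrinsicInfoSketch &) const {
    return false;
  }
  // Returns a replacement value, or nullptr if none can be synthesized.
  virtual Value *getOrCreateResultFromMemIntrinsic(const Instruction &) const {
    return nullptr;
  }
};

int main() {
  SemanticTTI TTI;
  Instruction Ld{"llvm.sometarget.ld2"}; // hypothetical target intrinsic
  MemIntrinsicInfoSketch Info;
  // A cost query leaves the intrinsic opaque to the pass...
  std::printf("cost = %u\n", TTI.getOperationCost(Ld));
  // ...whereas the semantic hook makes the pass's correctness depend on
  // target-supplied read/write/volatility facts.
  std::printf("recognized = %d\n", (int)TTI.getTgtMemIntrinsic(Ld, Info));
  return 0;
}

The sketch is only meant to show where ownership of the semantics moves, not to argue the hooks are unimplementable.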
>
> Added:
>     llvm/trunk/test/Transforms/EarlyCSE/AArch64/
>     llvm/trunk/test/Transforms/EarlyCSE/AArch64/intrinsics.ll
>     llvm/trunk/test/Transforms/EarlyCSE/AArch64/lit.local.cfg
> Modified:
>     llvm/trunk/include/llvm/Analysis/TargetTransformInfo.h
>     llvm/trunk/lib/Analysis/TargetTransformInfo.cpp
>     llvm/trunk/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
>     llvm/trunk/lib/Transforms/Scalar/EarlyCSE.cpp
>
> Modified: llvm/trunk/include/llvm/Analysis/TargetTransformInfo.h
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Analysis/TargetTransformInfo.h?rev=227149&r1=227148&r2=227149&view=diff
> ==============================================================================
> --- llvm/trunk/include/llvm/Analysis/TargetTransformInfo.h (original)
> +++ llvm/trunk/include/llvm/Analysis/TargetTransformInfo.h Mon Jan 26 16:51:15 2015
> @@ -23,6 +23,7 @@
>  #define LLVM_ANALYSIS_TARGETTRANSFORMINFO_H
>
>  #include "llvm/IR/Intrinsics.h"
> +#include "llvm/IR/IntrinsicInst.h"
>  #include "llvm/Pass.h"
>  #include "llvm/Support/DataTypes.h"
>
> @@ -35,6 +36,20 @@ class Type;
>  class User;
>  class Value;
>
> +/// \brief Information about a load/store intrinsic defined by the target.
> +struct MemIntrinsicInfo {
> +  MemIntrinsicInfo()
> +      : ReadMem(false), WriteMem(false), Vol(false), MatchingId(0),
> +        NumMemRefs(0), PtrVal(nullptr) {}
> +  bool ReadMem;
> +  bool WriteMem;
> +  bool Vol;
> +  // Same Id is set by the target for corresponding load/store intrinsics.
> +  unsigned short MatchingId;
> +  int NumMemRefs;
> +  Value *PtrVal;
> +};
> +
>  /// TargetTransformInfo - This pass provides access to the codegen
>  /// interfaces that are needed for IR-level transformations.
>  class TargetTransformInfo {
> @@ -443,6 +458,20 @@ public:
>    /// any callee-saved registers, so would require a spill and fill.
>    virtual unsigned getCostOfKeepingLiveOverCall(ArrayRef<Type*> Tys) const;
>
> +  /// \returns True if the intrinsic is a supported memory intrinsic. Info
> +  /// will contain additional information - whether the intrinsic may write
> +  /// or read to memory, volatility and the pointer. Info is undefined
> +  /// if false is returned.
> +  virtual bool getTgtMemIntrinsic(IntrinsicInst *Inst,
> +                                  MemIntrinsicInfo &Info) const;
> +
> +  /// \returns A value which is the result of the given memory intrinsic. New
> +  /// instructions may be created to extract the result from the given intrinsic
> +  /// memory operation. Returns nullptr if the target cannot create a result
> +  /// from the given intrinsic.
> +  virtual Value *getOrCreateResultFromMemIntrinsic(IntrinsicInst *Inst,
> +                                                   Type *ExpectedType) const;
> +
>    /// @}
>
>    /// Analysis group identification.
>
> Modified: llvm/trunk/lib/Analysis/TargetTransformInfo.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Analysis/TargetTransformInfo.cpp?rev=227149&r1=227148&r2=227149&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Analysis/TargetTransformInfo.cpp (original)
> +++ llvm/trunk/lib/Analysis/TargetTransformInfo.cpp Mon Jan 26 16:51:15 2015
> @@ -254,6 +254,16 @@ unsigned TargetTransformInfo::getCostOfK
>    return PrevTTI->getCostOfKeepingLiveOverCall(Tys);
>  }
>
> +Value *TargetTransformInfo::getOrCreateResultFromMemIntrinsic(
> +    IntrinsicInst *Inst, Type *ExpectedType) const {
> +  return PrevTTI->getOrCreateResultFromMemIntrinsic(Inst, ExpectedType);
> +}
> +
> +bool TargetTransformInfo::getTgtMemIntrinsic(IntrinsicInst *Inst,
> +                                             MemIntrinsicInfo &Info) const {
> +  return PrevTTI->getTgtMemIntrinsic(Inst, Info);
> +}
> +
>  namespace {
>
>  struct NoTTI final : ImmutablePass, TargetTransformInfo {
> @@ -656,6 +666,15 @@ struct NoTTI final : ImmutablePass, Targ
>      return 0;
>    }
>
> +  bool getTgtMemIntrinsic(IntrinsicInst *Inst,
> +                          MemIntrinsicInfo &Info) const override {
> +    return false;
> +  }
> +
> +  Value *getOrCreateResultFromMemIntrinsic(IntrinsicInst *Inst,
> +                                           Type *ExpectedType) const override {
> +    return nullptr;
> +  }
>  };
>
>  } // end anonymous namespace
>
> Modified: llvm/trunk/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/AArch64/AArch64TargetTransformInfo.cpp?rev=227149&r1=227148&r2=227149&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Target/AArch64/AArch64TargetTransformInfo.cpp (original)
> +++ llvm/trunk/lib/Target/AArch64/AArch64TargetTransformInfo.cpp Mon Jan 26 16:51:15 2015
> @@ -44,6 +44,12 @@ class AArch64TTI final : public Immutabl
>    /// are set if the result needs to be inserted and/or extracted from vectors.
>    unsigned getScalarizationOverhead(Type *Ty, bool Insert, bool Extract) const;
>
> +  enum MemIntrinsicType {
> +    VECTOR_LDST_TWO_ELEMENTS,
> +    VECTOR_LDST_THREE_ELEMENTS,
> +    VECTOR_LDST_FOUR_ELEMENTS
> +  };
> +
>  public:
>    AArch64TTI() : ImmutablePass(ID), TM(nullptr), ST(nullptr), TLI(nullptr) {
>      llvm_unreachable("This pass cannot be directly constructed");
> @@ -131,6 +137,11 @@ public:
>    void getUnrollingPreferences(const Function *F, Loop *L,
>                                 UnrollingPreferences &UP) const override;
>
> +  Value *getOrCreateResultFromMemIntrinsic(IntrinsicInst *Inst,
> +                                           Type *ExpectedType) const override;
> +
> +  bool getTgtMemIntrinsic(IntrinsicInst *Inst,
> +                          MemIntrinsicInfo &Info) const override;
>
>    /// @}
>  };
> @@ -554,3 +565,83 @@ void AArch64TTI::getUnrollingPreferences
>    // Disable partial & runtime unrolling on -Os.
>    UP.PartialOptSizeThreshold = 0;
>  }
> +
> +Value *AArch64TTI::getOrCreateResultFromMemIntrinsic(IntrinsicInst *Inst,
> +                                                     Type *ExpectedType) const {
> +  switch (Inst->getIntrinsicID()) {
> +  default:
> +    return nullptr;
> +  case Intrinsic::aarch64_neon_st2:
> +  case Intrinsic::aarch64_neon_st3:
> +  case Intrinsic::aarch64_neon_st4: {
> +    // Create a struct type
> +    StructType *ST = dyn_cast<StructType>(ExpectedType);
> +    if (!ST)
> +      return nullptr;
> +    unsigned NumElts = Inst->getNumArgOperands() - 1;
> +    if (ST->getNumElements() != NumElts)
> +      return nullptr;
> +    for (unsigned i = 0, e = NumElts; i != e; ++i) {
> +      if (Inst->getArgOperand(i)->getType() != ST->getElementType(i))
> +        return nullptr;
> +    }
> +    Value *Res = UndefValue::get(ExpectedType);
> +    IRBuilder<> Builder(Inst);
> +    for (unsigned i = 0, e = NumElts; i != e; ++i) {
> +      Value *L = Inst->getArgOperand(i);
> +      Res = Builder.CreateInsertValue(Res, L, i);
> +    }
> +    return Res;
> +  }
> +  case Intrinsic::aarch64_neon_ld2:
> +  case Intrinsic::aarch64_neon_ld3:
> +  case Intrinsic::aarch64_neon_ld4:
> +    if (Inst->getType() == ExpectedType)
> +      return Inst;
> +    return nullptr;
> +  }
> +}
> +
> +bool AArch64TTI::getTgtMemIntrinsic(IntrinsicInst *Inst,
> +                                    MemIntrinsicInfo &Info) const {
> +  switch (Inst->getIntrinsicID()) {
> +  default:
> +    break;
> +  case Intrinsic::aarch64_neon_ld2:
> +  case Intrinsic::aarch64_neon_ld3:
> +  case Intrinsic::aarch64_neon_ld4:
> +    Info.ReadMem = true;
> +    Info.WriteMem = false;
> +    Info.Vol = false;
> +    Info.NumMemRefs = 1;
> +    Info.PtrVal = Inst->getArgOperand(0);
> +    break;
> +  case Intrinsic::aarch64_neon_st2:
> +  case Intrinsic::aarch64_neon_st3:
> +  case Intrinsic::aarch64_neon_st4:
> +    Info.ReadMem = false;
> +    Info.WriteMem = true;
> +    Info.Vol = false;
> +    Info.NumMemRefs = 1;
> +    Info.PtrVal = Inst->getArgOperand(Inst->getNumArgOperands() - 1);
> +    break;
> +  }
> +
> +  switch (Inst->getIntrinsicID()) {
> +  default:
> +    return false;
> +  case Intrinsic::aarch64_neon_ld2:
> +  case Intrinsic::aarch64_neon_st2:
> +    Info.MatchingId = VECTOR_LDST_TWO_ELEMENTS;
> +    break;
> +  case Intrinsic::aarch64_neon_ld3:
> +  case Intrinsic::aarch64_neon_st3:
> +    Info.MatchingId = VECTOR_LDST_THREE_ELEMENTS;
> +    break;
> +  case Intrinsic::aarch64_neon_ld4:
> +  case Intrinsic::aarch64_neon_st4:
> +    Info.MatchingId = VECTOR_LDST_FOUR_ELEMENTS;
> +    break;
> +  }
> +  return true;
> +}
>
> Modified: llvm/trunk/lib/Transforms/Scalar/EarlyCSE.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Scalar/EarlyCSE.cpp?rev=227149&r1=227148&r2=227149&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Transforms/Scalar/EarlyCSE.cpp (original)
> +++ llvm/trunk/lib/Transforms/Scalar/EarlyCSE.cpp Mon Jan 26 16:51:15 2015
> @@ -18,6 +18,7 @@
>  #include "llvm/ADT/Statistic.h"
>  #include "llvm/Analysis/AssumptionCache.h"
>  #include "llvm/Analysis/InstructionSimplify.h"
> +#include "llvm/Analysis/TargetTransformInfo.h"
>  #include "llvm/IR/DataLayout.h"
>  #include "llvm/IR/Dominators.h"
>  #include "llvm/IR/Instructions.h"
> @@ -273,6 +274,7 @@ class EarlyCSE : public FunctionPass {
>  public:
>    const DataLayout *DL;
>    const TargetLibraryInfo *TLI;
> +  const TargetTransformInfo *TTI;
>    DominatorTree *DT;
>    AssumptionCache *AC;
>    typedef RecyclingAllocator<
> @@ -383,14 +385,83 @@ private:
>      bool Processed;
>    };
>
> +  /// \brief Wrapper class to handle memory instructions, including loads,
> +  /// stores and intrinsic loads and stores defined by the target.
> +  class ParseMemoryInst {
> +  public:
> +    ParseMemoryInst(Instruction *Inst, const TargetTransformInfo *TTI)
> +        : Load(false), Store(false), Vol(false), MayReadFromMemory(false),
> +          MayWriteToMemory(false), MatchingId(-1), Ptr(nullptr) {
> +      MayReadFromMemory = Inst->mayReadFromMemory();
> +      MayWriteToMemory = Inst->mayWriteToMemory();
> +      if (IntrinsicInst *II = dyn_cast<IntrinsicInst>(Inst)) {
> +        MemIntrinsicInfo Info;
> +        if (!TTI->getTgtMemIntrinsic(II, Info))
> +          return;
> +        if (Info.NumMemRefs == 1) {
> +          Store = Info.WriteMem;
> +          Load = Info.ReadMem;
> +          MatchingId = Info.MatchingId;
> +          MayReadFromMemory = Info.ReadMem;
> +          MayWriteToMemory = Info.WriteMem;
> +          Vol = Info.Vol;
> +          Ptr = Info.PtrVal;
> +        }
> +      } else if (LoadInst *LI = dyn_cast<LoadInst>(Inst)) {
> +        Load = true;
> +        Vol = !LI->isSimple();
> +        Ptr = LI->getPointerOperand();
> +      } else if (StoreInst *SI = dyn_cast<StoreInst>(Inst)) {
> +        Store = true;
> +        Vol = !SI->isSimple();
> +        Ptr = SI->getPointerOperand();
> +      }
> +    }
> +    bool isLoad() { return Load; }
> +    bool isStore() { return Store; }
> +    bool isVolatile() { return Vol; }
> +    bool isMatchingMemLoc(const ParseMemoryInst &Inst) {
> +      return Ptr == Inst.Ptr && MatchingId == Inst.MatchingId;
> +    }
> +    bool isValid() { return Ptr != nullptr; }
> +    int getMatchingId() { return MatchingId; }
> +    Value *getPtr() { return Ptr; }
> +    bool mayReadFromMemory() { return MayReadFromMemory; }
> +    bool mayWriteToMemory() { return MayWriteToMemory; }
> +
> +  private:
> +    bool Load;
> +    bool Store;
> +    bool Vol;
> +    bool MayReadFromMemory;
> +    bool MayWriteToMemory;
> +    // For regular (non-intrinsic) loads/stores, this is set to -1. For
> +    // intrinsic loads/stores, the id is retrieved from the corresponding
> +    // field in the MemIntrinsicInfo structure. That field contains
> +    // non-negative values only.
> +    int MatchingId;
> +    Value *Ptr;
> +  };
> +
>    bool processNode(DomTreeNode *Node);
>
>    void getAnalysisUsage(AnalysisUsage &AU) const override {
>      AU.addRequired<AssumptionCacheTracker>();
>      AU.addRequired<DominatorTreeWrapperPass>();
>      AU.addRequired<TargetLibraryInfoWrapperPass>();
> +    AU.addRequired<TargetTransformInfo>();
>      AU.setPreservesCFG();
>    }
> +
> +  Value *getOrCreateResult(Value *Inst, Type *ExpectedType) const {
> +    if (LoadInst *LI = dyn_cast<LoadInst>(Inst))
> +      return LI;
> +    else if (StoreInst *SI = dyn_cast<StoreInst>(Inst))
> +      return SI->getValueOperand();
> +    assert(isa<IntrinsicInst>(Inst) && "Instruction not supported");
> +    return TTI->getOrCreateResultFromMemIntrinsic(cast<IntrinsicInst>(Inst),
> +                                                  ExpectedType);
> +  }
>  };
>  }
>
> @@ -420,7 +491,7 @@ bool EarlyCSE::processNode(DomTreeNode *
>    /// as long as there in no instruction that reads memory. If we see a store
>    /// to the same location, we delete the dead store. This zaps trivial dead
>    /// stores which can occur in bitfield code among other things.
> -  StoreInst *LastStore = nullptr;
> +  Instruction *LastStore = nullptr;
>
>    bool Changed = false;
>
> @@ -475,10 +546,11 @@ bool EarlyCSE::processNode(DomTreeNode *
>        continue;
>      }
>
> +    ParseMemoryInst MemInst(Inst, TTI);
>      // If this is a non-volatile load, process it.
> -    if (LoadInst *LI = dyn_cast<LoadInst>(Inst)) {
> +    if (MemInst.isValid() && MemInst.isLoad()) {
>        // Ignore volatile loads.
> -      if (!LI->isSimple()) {
> +      if (MemInst.isVolatile()) {
>          LastStore = nullptr;
>          continue;
>        }
> @@ -486,27 +558,35 @@ bool EarlyCSE::processNode(DomTreeNode *
>        // If we have an available version of this load, and if it is the right
>        // generation, replace this instruction.
>        std::pair<Value *, unsigned> InVal =
> -          AvailableLoads->lookup(Inst->getOperand(0));
> +          AvailableLoads->lookup(MemInst.getPtr());
>        if (InVal.first != nullptr && InVal.second == CurrentGeneration) {
> -        DEBUG(dbgs() << "EarlyCSE CSE LOAD: " << *Inst
> -              << " to: " << *InVal.first << '\n');
> -        if (!Inst->use_empty())
> -          Inst->replaceAllUsesWith(InVal.first);
> -        Inst->eraseFromParent();
> -        Changed = true;
> -        ++NumCSELoad;
> -        continue;
> +        Value *Op = getOrCreateResult(InVal.first, Inst->getType());
> +        if (Op != nullptr) {
> +          DEBUG(dbgs() << "EarlyCSE CSE LOAD: " << *Inst
> +                << " to: " << *InVal.first << '\n');
> +          if (!Inst->use_empty())
> +            Inst->replaceAllUsesWith(Op);
> +          Inst->eraseFromParent();
> +          Changed = true;
> +          ++NumCSELoad;
> +          continue;
> +        }
>        }
>
>        // Otherwise, remember that we have this instruction.
> -      AvailableLoads->insert(Inst->getOperand(0), std::pair<Value *, unsigned>(
> -                                                      Inst, CurrentGeneration));
> +      AvailableLoads->insert(MemInst.getPtr(), std::pair<Value *, unsigned>(
> +                                                   Inst, CurrentGeneration));
>        LastStore = nullptr;
>        continue;
>      }
>
>      // If this instruction may read from memory, forget LastStore.
> -    if (Inst->mayReadFromMemory())
> +    // Load/store intrinsics will indicate both a read and a write to
> +    // memory. The target may override this (e.g. so that a store intrinsic
> +    // does not read from memory, and thus will be treated the same as a
> +    // regular store for commoning purposes).
> +    if (Inst->mayReadFromMemory() &&
> +        !(MemInst.isValid() && !MemInst.mayReadFromMemory()))
>        LastStore = nullptr;
>
>      // If this is a read-only call, process it.
> @@ -537,17 +617,19 @@ bool EarlyCSE::processNode(DomTreeNode *
>      if (Inst->mayWriteToMemory()) {
>        ++CurrentGeneration;
>
> -      if (StoreInst *SI = dyn_cast<StoreInst>(Inst)) {
> +      if (MemInst.isValid() && MemInst.isStore()) {
>          // We do a trivial form of DSE if there are two stores to the same
>          // location with no intervening loads. Delete the earlier store.
> -        if (LastStore &&
> -            LastStore->getPointerOperand() == SI->getPointerOperand()) {
> -          DEBUG(dbgs() << "EarlyCSE DEAD STORE: " << *LastStore
> -                << " due to: " << *Inst << '\n');
> -          LastStore->eraseFromParent();
> -          Changed = true;
> -          ++NumDSE;
> -          LastStore = nullptr;
> +        if (LastStore) {
> +          ParseMemoryInst LastStoreMemInst(LastStore, TTI);
> +          if (LastStoreMemInst.isMatchingMemLoc(MemInst)) {
> +            DEBUG(dbgs() << "EarlyCSE DEAD STORE: " << *LastStore
> +                  << " due to: " << *Inst << '\n');
> +            LastStore->eraseFromParent();
> +            Changed = true;
> +            ++NumDSE;
> +            LastStore = nullptr;
> +          }
>            // fallthrough - we can exploit information about this store
>          }
>
> @@ -556,13 +638,12 @@ bool EarlyCSE::processNode(DomTreeNode *
>          // version of the pointer. It is safe to forward from volatile stores
>          // to non-volatile loads, so we don't have to check for volatility of
>          // the store.
> -        AvailableLoads->insert(SI->getPointerOperand(),
> -                               std::pair<Value *, unsigned>(
> -                                   SI->getValueOperand(), CurrentGeneration));
> +        AvailableLoads->insert(MemInst.getPtr(), std::pair<Value *, unsigned>(
> +                                                     Inst, CurrentGeneration));
>
>          // Remember that this was the last store we saw for DSE.
> -        if (SI->isSimple())
> -          LastStore = SI;
> +        if (!MemInst.isVolatile())
> +          LastStore = Inst;
>        }
>      }
>    }
> @@ -584,6 +665,7 @@ bool EarlyCSE::runOnFunction(Function &F
>    DataLayoutPass *DLP = getAnalysisIfAvailable<DataLayoutPass>();
>    DL = DLP ? &DLP->getDataLayout() : nullptr;
>    TLI = &getAnalysis<TargetLibraryInfoWrapperPass>().getTLI();
> +  TTI = &getAnalysis<TargetTransformInfo>();
>    DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree();
>    AC = &getAnalysis<AssumptionCacheTracker>().getAssumptionCache(F);
>
>
> Added: llvm/trunk/test/Transforms/EarlyCSE/AArch64/intrinsics.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/EarlyCSE/AArch64/intrinsics.ll?rev=227149&view=auto
> ==============================================================================
> --- llvm/trunk/test/Transforms/EarlyCSE/AArch64/intrinsics.ll (added)
> +++ llvm/trunk/test/Transforms/EarlyCSE/AArch64/intrinsics.ll Mon Jan 26 16:51:15 2015
> @@ -0,0 +1,231 @@
> +; RUN: opt < %s -S -mtriple=aarch64-none-linux-gnu -mattr=+neon -early-cse | FileCheck %s
> +
> +define <4 x i32> @test_cse(i32* %a, [2 x <4 x i32>] %s.coerce, i32 %n) {
> +entry:
> +; Check that @llvm.aarch64.neon.ld2 is optimized away by Early CSE.
> +; CHECK-LABEL: @test_cse
> +; CHECK-NOT: call { <4 x i32>, <4 x i32> } @llvm.aarch64.neon.ld2.v4i32.p0i8
> +  %s.coerce.fca.0.extract = extractvalue [2 x <4 x i32>] %s.coerce, 0
> +  %s.coerce.fca.1.extract = extractvalue [2 x <4 x i32>] %s.coerce, 1
> +  br label %for.cond
> +
> +for.cond:                                         ; preds = %for.body, %entry
> +  %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
> +  %res.0 = phi <4 x i32> [ undef, %entry ], [ %call, %for.body ]
> +  %cmp = icmp slt i32 %i.0, %n
> +  br i1 %cmp, label %for.body, label %for.end
> +
> +for.body:                                         ; preds = %for.cond
> +  %0 = bitcast i32* %a to i8*
> +  %1 = bitcast <4 x i32> %s.coerce.fca.0.extract to <16 x i8>
> +  %2 = bitcast <4 x i32> %s.coerce.fca.1.extract to <16 x i8>
> +  %3 = bitcast <16 x i8> %1 to <4 x i32>
> +  %4 = bitcast <16 x i8> %2 to <4 x i32>
> +  call void @llvm.aarch64.neon.st2.v4i32.p0i8(<4 x i32> %3, <4 x i32> %4, i8* %0)
> +  %5 = bitcast i32* %a to i8*
> +  %vld2 = call { <4 x i32>, <4 x i32> } @llvm.aarch64.neon.ld2.v4i32.p0i8(i8* %5)
> +  %vld2.fca.0.extract = extractvalue { <4 x i32>, <4 x i32> } %vld2, 0
> +  %vld2.fca.1.extract = extractvalue { <4 x i32>, <4 x i32> } %vld2, 1
> +  %call = call <4 x i32> @vaddq_s32(<4 x i32> %vld2.fca.0.extract, <4 x i32> %vld2.fca.0.extract)
> +  %inc = add nsw i32 %i.0, 1
> +  br label %for.cond
> +
> +for.end:                                          ; preds = %for.cond
> +  ret <4 x i32> %res.0
> +}
> +
> +define <4 x i32> @test_cse2(i32* %a, [2 x <4 x i32>] %s.coerce, i32 %n) {
> +entry:
> +; Check that the first @llvm.aarch64.neon.st2 is optimized away by Early CSE.
> +; CHECK-LABEL: @test_cse2
> +; CHECK-NOT: call void @llvm.aarch64.neon.st2.v4i32.p0i8(<4 x i32> %3, <4 x i32> %3, i8* %0)
> +; CHECK: call void @llvm.aarch64.neon.st2.v4i32.p0i8(<4 x i32> %3, <4 x i32> %4, i8* %0)
> +  %s.coerce.fca.0.extract = extractvalue [2 x <4 x i32>] %s.coerce, 0
> +  %s.coerce.fca.1.extract = extractvalue [2 x <4 x i32>] %s.coerce, 1
> +  br label %for.cond
> +
> +for.cond:                                         ; preds = %for.body, %entry
> +  %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
> +  %res.0 = phi <4 x i32> [ undef, %entry ], [ %call, %for.body ]
> +  %cmp = icmp slt i32 %i.0, %n
> +  br i1 %cmp, label %for.body, label %for.end
> +
> +for.body:                                         ; preds = %for.cond
> +  %0 = bitcast i32* %a to i8*
> +  %1 = bitcast <4 x i32> %s.coerce.fca.0.extract to <16 x i8>
> +  %2 = bitcast <4 x i32> %s.coerce.fca.1.extract to <16 x i8>
> +  %3 = bitcast <16 x i8> %1 to <4 x i32>
> +  %4 = bitcast <16 x i8> %2 to <4 x i32>
> +  call void @llvm.aarch64.neon.st2.v4i32.p0i8(<4 x i32> %3, <4 x i32> %3, i8* %0)
> +  call void @llvm.aarch64.neon.st2.v4i32.p0i8(<4 x i32> %3, <4 x i32> %4, i8* %0)
> +  %5 = bitcast i32* %a to i8*
> +  %vld2 = call { <4 x i32>, <4 x i32> } @llvm.aarch64.neon.ld2.v4i32.p0i8(i8* %5)
> +  %vld2.fca.0.extract = extractvalue { <4 x i32>, <4 x i32> } %vld2, 0
> +  %vld2.fca.1.extract = extractvalue { <4 x i32>, <4 x i32> } %vld2, 1
> +  %call = call <4 x i32> @vaddq_s32(<4 x i32> %vld2.fca.0.extract, <4 x i32> %vld2.fca.0.extract)
> +  %inc = add nsw i32 %i.0, 1
> +  br label %for.cond
> +
> +for.end:                                          ; preds = %for.cond
> +  ret <4 x i32> %res.0
> +}
> +
> +define <4 x i32> @test_cse3(i32* %a, [2 x <4 x i32>] %s.coerce, i32 %n) #0 {
> +entry:
> +; Check that the first @llvm.aarch64.neon.ld2 is optimized away by Early CSE.
> +; CHECK-LABEL: @test_cse3
> +; CHECK: call { <4 x i32>, <4 x i32> } @llvm.aarch64.neon.ld2.v4i32.p0i8
> +; CHECK-NOT: call { <4 x i32>, <4 x i32> } @llvm.aarch64.neon.ld2.v4i32.p0i8
> +  %s.coerce.fca.0.extract = extractvalue [2 x <4 x i32>] %s.coerce, 0
> +  %s.coerce.fca.1.extract = extractvalue [2 x <4 x i32>] %s.coerce, 1
> +  br label %for.cond
> +
> +for.cond:                                         ; preds = %for.body, %entry
> +  %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
> +  %res.0 = phi <4 x i32> [ undef, %entry ], [ %call, %for.body ]
> +  %cmp = icmp slt i32 %i.0, %n
> +  br i1 %cmp, label %for.body, label %for.end
> +
> +for.body:                                         ; preds = %for.cond
> +  %0 = bitcast i32* %a to i8*
> +  %vld2 = call { <4 x i32>, <4 x i32> } @llvm.aarch64.neon.ld2.v4i32.p0i8(i8* %0)
> +  %vld2.fca.0.extract = extractvalue { <4 x i32>, <4 x i32> } %vld2, 0
> +  %vld2.fca.1.extract = extractvalue { <4 x i32>, <4 x i32> } %vld2, 1
> +  %1 = bitcast i32* %a to i8*
> +  %vld22 = call { <4 x i32>, <4 x i32> } @llvm.aarch64.neon.ld2.v4i32.p0i8(i8* %1)
> +  %vld22.fca.0.extract = extractvalue { <4 x i32>, <4 x i32> } %vld22, 0
> +  %vld22.fca.1.extract = extractvalue { <4 x i32>, <4 x i32> } %vld22, 1
> +  %call = call <4 x i32> @vaddq_s32(<4 x i32> %vld2.fca.0.extract, <4 x i32> %vld22.fca.0.extract)
> +  %inc = add nsw i32 %i.0, 1
> +  br label %for.cond
> +
> +for.end:                                          ; preds = %for.cond
> +  ret <4 x i32> %res.0
> +}
> +
> +
> +define <4 x i32> @test_nocse(i32* %a, i32* %b, [2 x <4 x i32>] %s.coerce, i32 %n) {
> +entry:
> +; Check that the store prevents @llvm.aarch64.neon.ld2 from being optimized
> +; away by Early CSE.
> +; CHECK-LABEL: @test_nocse
> +; CHECK: call { <4 x i32>, <4 x i32> } @llvm.aarch64.neon.ld2.v4i32.p0i8
> +  %s.coerce.fca.0.extract = extractvalue [2 x <4 x i32>] %s.coerce, 0
> +  %s.coerce.fca.1.extract = extractvalue [2 x <4 x i32>] %s.coerce, 1
> +  br label %for.cond
> +
> +for.cond:                                         ; preds = %for.body, %entry
> +  %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
> +  %res.0 = phi <4 x i32> [ undef, %entry ], [ %call, %for.body ]
> +  %cmp = icmp slt i32 %i.0, %n
> +  br i1 %cmp, label %for.body, label %for.end
> +
> +for.body:                                         ; preds = %for.cond
> +  %0 = bitcast i32* %a to i8*
> +  %1 = bitcast <4 x i32> %s.coerce.fca.0.extract to <16 x i8>
> +  %2 = bitcast <4 x i32> %s.coerce.fca.1.extract to <16 x i8>
> +  %3 = bitcast <16 x i8> %1 to <4 x i32>
> +  %4 = bitcast <16 x i8> %2 to <4 x i32>
> +  call void @llvm.aarch64.neon.st2.v4i32.p0i8(<4 x i32> %3, <4 x i32> %4, i8* %0)
> +  store i32 0, i32* %b, align 4
> +  %5 = bitcast i32* %a to i8*
> +  %vld2 = call { <4 x i32>, <4 x i32> } @llvm.aarch64.neon.ld2.v4i32.p0i8(i8* %5)
> +  %vld2.fca.0.extract = extractvalue { <4 x i32>, <4 x i32> } %vld2, 0
> +  %vld2.fca.1.extract = extractvalue { <4 x i32>, <4 x i32> } %vld2, 1
> +  %call = call <4 x i32> @vaddq_s32(<4 x i32> %vld2.fca.0.extract, <4 x i32> %vld2.fca.0.extract)
> +  %inc = add nsw i32 %i.0, 1
> +  br label %for.cond
> +
> +for.end:                                          ; preds = %for.cond
> +  ret <4 x i32> %res.0
> +}
> +
> +define <4 x i32> @test_nocse2(i32* %a, [2 x <4 x i32>] %s.coerce, i32 %n) {
> +entry:
> +; Check that @llvm.aarch64.neon.ld3 is not optimized away by Early CSE due
> +; to mismatch between st2 and ld3.
> +; CHECK-LABEL: @test_nocse2
> +; CHECK: call { <4 x i32>, <4 x i32>, <4 x i32> } @llvm.aarch64.neon.ld3.v4i32.p0i8
> +  %s.coerce.fca.0.extract = extractvalue [2 x <4 x i32>] %s.coerce, 0
> +  %s.coerce.fca.1.extract = extractvalue [2 x <4 x i32>] %s.coerce, 1
> +  br label %for.cond
> +
> +for.cond:                                         ; preds = %for.body, %entry
> +  %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
> +  %res.0 = phi <4 x i32> [ undef, %entry ], [ %call, %for.body ]
> +  %cmp = icmp slt i32 %i.0, %n
> +  br i1 %cmp, label %for.body, label %for.end
> +
> +for.body:                                         ; preds = %for.cond
> +  %0 = bitcast i32* %a to i8*
> +  %1 = bitcast <4 x i32> %s.coerce.fca.0.extract to <16 x i8>
> +  %2 = bitcast <4 x i32> %s.coerce.fca.1.extract to <16 x i8>
> +  %3 = bitcast <16 x i8> %1 to <4 x i32>
> +  %4 = bitcast <16 x i8> %2 to <4 x i32>
> +  call void @llvm.aarch64.neon.st2.v4i32.p0i8(<4 x i32> %3, <4 x i32> %4, i8* %0)
> +  %5 = bitcast i32* %a to i8*
> +  %vld3 = call { <4 x i32>, <4 x i32>, <4 x i32> } @llvm.aarch64.neon.ld3.v4i32.p0i8(i8* %5)
> +  %vld3.fca.0.extract = extractvalue { <4 x i32>, <4 x i32>, <4 x i32> } %vld3, 0
> +  %vld3.fca.2.extract = extractvalue { <4 x i32>, <4 x i32>, <4 x i32> } %vld3, 2
> +  %call = call <4 x i32> @vaddq_s32(<4 x i32> %vld3.fca.0.extract, <4 x i32> %vld3.fca.2.extract)
> +  %inc = add nsw i32 %i.0, 1
> +  br label %for.cond
> +
> +for.end:                                          ; preds = %for.cond
> +  ret <4 x i32> %res.0
> +}
> +
> +define <4 x i32> @test_nocse3(i32* %a, [2 x <4 x i32>] %s.coerce, i32 %n) {
> +entry:
> +; Check that @llvm.aarch64.neon.st3 is not optimized away by Early CSE due to
> +; mismatch between st2 and st3.
> +; CHECK-LABEL: @test_nocse3
> +; CHECK: call void @llvm.aarch64.neon.st3.v4i32.p0i8
> +; CHECK: call void @llvm.aarch64.neon.st2.v4i32.p0i8
> +  %s.coerce.fca.0.extract = extractvalue [2 x <4 x i32>] %s.coerce, 0
> +  %s.coerce.fca.1.extract = extractvalue [2 x <4 x i32>] %s.coerce, 1
> +  br label %for.cond
> +
> +for.cond:                                         ; preds = %for.body, %entry
> +  %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
> +  %res.0 = phi <4 x i32> [ undef, %entry ], [ %call, %for.body ]
> +  %cmp = icmp slt i32 %i.0, %n
> +  br i1 %cmp, label %for.body, label %for.end
> +
> +for.body:                                         ; preds = %for.cond
> +  %0 = bitcast i32* %a to i8*
> +  %1 = bitcast <4 x i32> %s.coerce.fca.0.extract to <16 x i8>
> +  %2 = bitcast <4 x i32> %s.coerce.fca.1.extract to <16 x i8>
> +  %3 = bitcast <16 x i8> %1 to <4 x i32>
> +  %4 = bitcast <16 x i8> %2 to <4 x i32>
> +  call void @llvm.aarch64.neon.st3.v4i32.p0i8(<4 x i32> %4, <4 x i32> %3, <4 x i32> %3, i8* %0)
> +  call void @llvm.aarch64.neon.st2.v4i32.p0i8(<4 x i32> %3, <4 x i32> %3, i8* %0)
> +  %5 = bitcast i32* %a to i8*
> +  %vld3 = call { <4 x i32>, <4 x i32>, <4 x i32> } @llvm.aarch64.neon.ld3.v4i32.p0i8(i8* %5)
> +  %vld3.fca.0.extract = extractvalue { <4 x i32>, <4 x i32>, <4 x i32> } %vld3, 0
> +  %vld3.fca.1.extract = extractvalue { <4 x i32>, <4 x i32>, <4 x i32> } %vld3, 1
> +  %call = call <4 x i32> @vaddq_s32(<4 x i32> %vld3.fca.0.extract, <4 x i32> %vld3.fca.0.extract)
> +  %inc = add nsw i32 %i.0, 1
> +  br label %for.cond
> +
> +for.end:                                          ; preds = %for.cond
> +  ret <4 x i32> %res.0
> +}
> +
> +; Function Attrs: nounwind
> +declare void @llvm.aarch64.neon.st2.v4i32.p0i8(<4 x i32>, <4 x i32>, i8* nocapture)
> +
> +; Function Attrs: nounwind
> +declare void @llvm.aarch64.neon.st3.v4i32.p0i8(<4 x i32>, <4 x i32>, <4 x i32>, i8* nocapture)
> +
> +; Function Attrs: nounwind readonly
> +declare { <4 x i32>, <4 x i32> } @llvm.aarch64.neon.ld2.v4i32.p0i8(i8*)
> +
> +; Function Attrs: nounwind readonly
> +declare { <4 x i32>, <4 x i32>, <4 x i32> } @llvm.aarch64.neon.ld3.v4i32.p0i8(i8*)
> +
> +define internal fastcc <4 x i32> @vaddq_s32(<4 x i32> %__p0, <4 x i32> %__p1) {
> +entry:
> +  %add = add <4 x i32> %__p0, %__p1
> +  ret <4 x i32> %add
> +}
>
> Added: llvm/trunk/test/Transforms/EarlyCSE/AArch64/lit.local.cfg
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/EarlyCSE/AArch64/lit.local.cfg?rev=227149&view=auto
> ==============================================================================
> --- llvm/trunk/test/Transforms/EarlyCSE/AArch64/lit.local.cfg (added)
> +++ llvm/trunk/test/Transforms/EarlyCSE/AArch64/lit.local.cfg Mon Jan 26 16:51:15 2015
> @@ -0,0 +1,5 @@
> +config.suffixes = ['.ll']
> +
> +targets = set(config.root.targets_to_build.split())
> +if not 'AArch64' in targets:
> +    config.unsupported = True