[PATCH] D13611: [Polly] Create virtual independent blocks

Michael Kruse via llvm-commits llvm-commits at lists.llvm.org
Fri Nov 6 17:54:08 PST 2015


Meinersbur added a comment.

I have the impression that this very much relies on accesses (mostly SCALAR READs, but also PHI writes in non-affine subregions) being computed per instruction, whereas in http://reviews.llvm.org/D13762 I tried to make them per-ScopStmt. The difference is that, e.g., if there are two reads in the same BB, two loads would be generated; in http://reviews.llvm.org/D13762 I tried to ensure that there is only one load.

This patch seems to skip PHI accesses completely. In San Jose you mentioned that only loop-carried PHIs would be skipped. Where is that part? Where would DeLICM tell what memory can be reused?

I got the impression we could generalize this within the SCEV expander. A SCEV-like object (we already support SDiv/SRem) could store an entire operand tree of movable instructions, and CodeGen would synthesize/materialize it on request. This looks more streamlined with the current mechanisms than introducing a new one. What do you think?


================
Comment at: include/polly/CodeGen/BlockGenerators.h:390
@@ -389,1 +389,3 @@
 
+  /// @brief Recompute all scalars needed in this statement.
+  ///
----------------
"all" does not match the description below

================
Comment at: include/polly/CodeGen/BlockGenerators.h:394
@@ +393,3 @@
+  /// description as well as the dependences. However, they are only moved if
+  /// we can recompute them in the statements they are used. This method will
+  /// perform the recomputation before we clone the original statement into the
----------------
... where they are used.

================
Comment at: include/polly/CodeGen/BlockGenerators.h:396
@@ +395,3 @@
+  /// perform the recomputation before we clone the original statement into the
+  /// new, optimized region, thus ensure all scalars are available.
+  void recomputeDependentScalars(ScopStmt &Stmt, ValueMapT &BBMap,
----------------
... thus ensures that all ...

================
Comment at: include/polly/CodeGen/BlockGenerators.h:451
@@ -438,1 +450,3 @@
+  ///           o NULL, if no value is found and TryOnly is set.
+  ///           o Otherwise a trap is triggered.
   Value *getNewValue(ScopStmt &Stmt, Value *Old, ValueMapT &BBMap,
----------------
llvm_unreachable is not defined to trigger a trap.

================
Comment at: include/polly/CodeGen/BlockGenerators.h:456
@@ -441,3 +455,3 @@
   void copyInstScalar(ScopStmt &Stmt, Instruction *Inst, ValueMapT &BBMap,
-                      LoopToScevMapT &LTS);
+                      LoopToScevMapT &LTS, bool Recompute = false);
 
----------------
Missing documentation?

================
Comment at: include/polly/CodeGen/BlockGenerators.h:517
@@ -502,1 +516,3 @@
   ///                    memory accesses.
+  /// @param Recompute   Flag to indicate the instruction is a scalar that
+  ///                    needs to be recomputed in this statement. It basically
----------------
Flag to indicate that the instruction ...

================
Comment at: include/polly/LinkAllPasses.h:36
@@ -35,3 +35,2 @@
 llvm::Pass *createDOTViewerPass();
-llvm::Pass *createIndependentBlocksPass();
 llvm::Pass *createJSONExporterPass();
----------------
Separate patch to remove IndependentBlocks?

================
Comment at: include/polly/ScopInfo.h:518
@@ +517,3 @@
+  /// @param Stmt    The statement the copied access should reside in.
+  MemoryAccess *copy(AccFuncSetType &AccList, ScopStmt *Stmt) const;
+
----------------
We can get the AccList from getStatement()->getParent()->AccFuncMap. Not necessary to pass it as parameter.

Rename to "copyTo"?

================
Comment at: include/polly/ScopInfo.h:543
@@ -536,3 +542,3 @@
   /// @brief Get the type of a memory access.
-  enum AccessType getType() { return AccType; }
+  enum AccessType getType() const { return AccType; }
 
----------------
Unrelated?

================
Comment at: include/polly/ScopInfo.h:925
@@ +924,3 @@
+  /// @param Front  Flag to indicate where the access should be added.
+  void addAccess(MemoryAccess *Access, bool Front = false);
+
----------------
I think we should not depend on the order of accesses. ISL does not understand such an order and depending on it will likely cause some bugs (Think of a load after storing to the same location; do we even handle that case yet?).

AFAIK ISL assumes that all loads happen before the stores.

================
Comment at: include/polly/ScopInfo.h:933
@@ -914,1 +932,3 @@
+  ///
+  void removeMemoryAccess(MemoryAccess *MA, bool OnlyMA);
 
----------------
OnlyMA=false removes multiple accesses, so move to a "removeAccessesCausedBy" function?

================
Comment at: include/polly/ScopInfo.h:968
@@ +967,3 @@
+  /// @brief Return the scalars that need to be recomputed in this statement.
+  const SetVector<Instruction *> &getDependentScalars() const {
+    return DependentScalars;
----------------
It would be better if the class would not expose its implementation as an API. It could return iterators or an ArrayRef.

================
Comment at: include/polly/ScopInfo.h:1275
@@ +1274,3 @@
+  /// @returns True, if all instructions can be recomputed in @p Stmt.
+  bool canRecomputeInStmt(ScopStmt &Stmt, SmallPtrSet<Instruction *, 4> &Insts);
+
----------------
It would be more idiomatic if each call checked only a single Instruction instead of passing a set implementation.

================
Comment at: include/polly/ScopInfo.h:1291
@@ +1290,3 @@
+  /// operands properly.
+  using NonTrivialOperandsPairTy =
+      std::pair<InstructionSetTy, OutsideOperandsSetTy>;
----------------
I think you should clarify that you actually mean a flattened operand _tree_. I think LLVM uses the term "operands" for the direct arguments/parameters of an instruction.

================
Comment at: include/polly/ScopInfo.h:1295
@@ +1294,3 @@
+  /// @brief Map from in SCoP instructions to their non-trivial operands.
+  DenseMap<Instruction *, NonTrivialOperandsPairTy> NonTrivialOperandsMap;
+
----------------
AFAICS it is only used during the execution of simplifyScalarAccesses. You could make it local and pass it to the functions that need it.

================
Comment at: include/polly/ScopInfo.h:1330
@@ +1329,3 @@
+  /// original code region at all.
+  void simplifyScalarAccesses();
+
----------------
The name "simplifyScalarAccesses" is very general. How about "recomputableDependentScalars"?

================
Comment at: lib/Analysis/ScopInfo.cpp:667
@@ +666,3 @@
+  isl_map *AR = getAccessRelation();
+  if (StmtDims > Dims)
+    AR = isl_map_add_dims(AR, isl_dim_in, StmtDims - Dims);
----------------
What if it has fewer dimensions? If it is impossible, could you add an assertion to make it clear?

================
Comment at: lib/Analysis/ScopInfo.cpp:673
@@ +672,3 @@
+  return CopyMA;
+}
+
----------------
Doesn't this update InstructionToAccesses?

================
Comment at: lib/Analysis/ScopInfo.cpp:1420
@@ +1419,3 @@
+  auto MALEnd = MAL.end();
+  auto MemAccsIt = MemAccs.begin();
+  while (true) {
----------------
I'd still ask you to refactor this code. It is hard to understand and looks fragile, as it depends on order. Maybe you could use some functions from <algorithm>.

================
Comment at: lib/Analysis/ScopInfo.cpp:2778
@@ +2777,3 @@
+
+  // TODO: Check if we can actually move the instructions.
+  return false;
----------------
What are your plans here? When can instructions with side effects be recomputed?

================
Comment at: lib/Analysis/ScopInfo.cpp:2783
@@ +2782,3 @@
+void Scop::collectNonTrivialOperands() {
+  // See: Scop::simplifyScalarAccesses()
+
----------------
What is there to see?

================
Comment at: lib/Analysis/ScopInfo.cpp:2788
@@ +2787,3 @@
+    for (MemoryAccess *MA : Stmt) {
+      if (MA->isExplicit() || MA->isRead() || MA->isPHI())
+        continue;
----------------
Why are exit node PHIs processed, but not standard PHIs?

================
Comment at: lib/Analysis/ScopInfo.cpp:2800
@@ +2799,3 @@
+
+      auto &NonTrivialOperands = NonTrivialOperandsMap[AccessInst];
+      auto &SideEffectOperands = NonTrivialOperands.first;
----------------
Why for each access separately? Shouldn't this be per-ScopStmt because it only matters for cross-stmt operands?

================
Comment at: lib/Analysis/ScopInfo.cpp:2815
@@ +2814,3 @@
+
+        for (auto &InstOp : Inst->operands())
+          if (Instruction *InstOpInst = dyn_cast<Instruction>(InstOp)) {
----------------
auto -> Use ?

================
Comment at: lib/Analysis/ScopInfo.cpp:2817
@@ +2816,3 @@
+          if (Instruction *InstOpInst = dyn_cast<Instruction>(InstOp)) {
+            if (R.contains(InstOpInst))
+              Worklist.insert(InstOpInst);
----------------
This will probably have issues with PHIs in the scop's entry, just as in PR25394, i.e. if InstOpInst is a PHI with an incoming block from outside the scop.

================
Comment at: lib/Analysis/ScopInfo.cpp:2824
@@ +2823,3 @@
+        if (Inst->mayHaveSideEffects() || Inst->mayReadFromMemory())
+          SideEffectOperands.insert(Inst);
+
----------------
and continue ?

================
Comment at: lib/Analysis/ScopInfo.cpp:2876
@@ +2875,3 @@
+      auto &SideEffectOperands = NonTrivialOperands.first;
+      if (!canRecomputeInStmt(Stmt, SideEffectOperands))
+        continue;
----------------
Wouldn't it be possible to recompute some of the dependent values, even if some operands have side effects?

================
Comment at: lib/Analysis/ScopInfo.cpp:2880
@@ +2879,3 @@
+      auto &OutsideOperands = NonTrivialOperands.second;
+      for (auto &OutsideOperandPair : OutsideOperands) {
+        Instruction *OutsideOperand = OutsideOperandPair.first;
----------------
Why only instructions that have operands from outside? For movable instructions (DependentScalars) without such operands but with implicit accesses, don't we need to remove their memory accesses as well?

================
Comment at: lib/Analysis/ScopInfo.cpp:2885
@@ +2884,3 @@
+        assert(UserStmt);
+        auto *UseMAs = UserStmt->lookupAccessesFor(OutsideUser);
+        if (!UseMAs)
----------------
This is using the list of accesses this function modifies on the fly ("InstructionToAccess"). As such, it seems to me to depend on the order in which we iterate over all statements. E.g., the previous iteration may already have removed the read access of "OutsideUser", hence it will not be added to AdditionalAccesses anymore, and we miss some read accesses.

Or am I wrong? Why is it that this cannot happen?

================
Comment at: lib/Analysis/ScopInfo.cpp:2890
@@ +2889,3 @@
+        for (const MemoryAccess *UseMA : *UseMAs)
+          if (UseMA->getBaseAddr() == OutsideOperand)
+            AdditionalAccesses.push_back(UseMA->copy(AccList, &Stmt));
----------------
What does this condition mean?

I think we are generally misusing "BaseAddr" for scalar accesses, because it is actually not an address.

================
Comment at: lib/Analysis/ScopInfo.cpp:2894
@@ +2893,3 @@
+
+      Stmt.addDependentScalar(DefInst);
+      ResolvedAccesses.push_back(MA);
----------------
I'd expect this to be computed in collectNonTrivialOperands(), like a partition of instructions in the operand tree into

- Movable
- Outside/Sideeffect (operand tree leaves)

I understand your algorithm works differently, but to understand: what is the rationale for having only direct operands in DependentScalars, while we want to copy/move an entire operand tree?

================
Comment at: lib/Analysis/ScopInfo.cpp:2932
@@ +2931,3 @@
+        auto *UserMAs = UserStmt->lookupAccessesFor(UserInst);
+        if (!UserMAs)
+          continue;
----------------
If a user has no more accesses, is it because it is intended to be removed?

================
Comment at: lib/Analysis/ScopInfo.cpp:2935
@@ +2934,3 @@
+
+        for (auto *UserMA : *UserMAs) {
+          if (UserMA->isExplicit() || UserMA->isWrite())
----------------
auto -> MemoryAccess ?

================
Comment at: lib/Analysis/ScopInfo.cpp:2940
@@ +2939,3 @@
+          AllUsersRemoved = false;
+          break;
+        }
----------------
Is this break meant to leave both for-loops?

================
Comment at: lib/Analysis/ScopInfo.cpp:2947
@@ +2946,3 @@
+
+      ResolvedAccesses.push_back(MA);
+    }
----------------
This seems to be a different kind of "resolved" than in removeRecomputableScalarReads.

================
Comment at: lib/Analysis/ScopInfo.cpp:2963
@@ +2962,3 @@
+void Scop::simplifyScalarAccesses() {
+  // First iterate over all implicit write accesses, hence scalar definitions
+  // and collect all operands that might have side effects or read memory as
----------------
collectNonTrivialOperands actually skips PHIs which are also implicit accesses.

================
Comment at: lib/Analysis/ScopInfo.cpp:2972
@@ +2971,3 @@
+
+  // In the second step traverse all implicit read accesses, hence scalar uses
+  // in statements that do not define the scalar. However, at the moment we
----------------
hence -> i.e.

================
Comment at: lib/CodeGen/BlockGenerators.cpp:67
@@ +66,3 @@
+  for (auto *Inst : Stmt.getDependentScalars())
+    if (!GlobalMap.count(Inst))
+      copyInstruction(Stmt, Inst, BBMap, LTS, NewAccesses, true);
----------------
When does a dependent scalar become a global?

================
Comment at: lib/CodeGen/BlockGenerators.cpp:86
@@ +85,3 @@
+            if (R.contains(Inst))
+              copyInstruction(Stmt, Inst, BBMap, LTS, nullptr, true);
+        }
----------------
Shouldn't these be in the BBMap after recomputeDependentScalars() has executed?

================
Comment at: lib/CodeGen/BlockGenerators.cpp:130
@@ -111,3 +129,3 @@
 
-  if (Value *New = trySynthesizeNewValue(Stmt, Old, BBMap, LTS, L))
-    return New;
+  if (canSynthesize(Old, &LI, &SE, &Stmt.getParent()->getRegion()))
+    if (Value *New = trySynthesizeNewValue(Stmt, Old, BBMap, LTS, L))
----------------
Why does this become necessary? Shouldn't "trySynthesizeNewValue" determine itself whether it is synthesizable?

================
Comment at: lib/CodeGen/BlockGenerators.cpp:147
@@ -124,4 +146,3 @@
   // The scalar dependence is neither available nor SCEVCodegenable.
   llvm_unreachable("Unexpected scalar dependence in region!");
 }
----------------
This essentially becomes

    assert(TryOnly);
    return nullptr;

================
Comment at: lib/CodeGen/BlockGenerators.cpp:169
@@ +168,3 @@
+      Instruction *NewOperandInst = dyn_cast_or_null<Instruction>(NewOperand);
+      if (!NewOperand || (NewOperandInst && R.contains(NewOperandInst))) {
+        if (Instruction *OldOperandInst = dyn_cast<Instruction>(OldOperand)) {
----------------
What is the rationale for this condition?

================
Comment at: lib/CodeGen/BlockGenerators.cpp:171
@@ +170,3 @@
+        if (Instruction *OldOperandInst = dyn_cast<Instruction>(OldOperand)) {
+          copyInstruction(Stmt, OldOperandInst, BBMap, LTS, nullptr, Recompute);
+          NewOperand = BBMap[OldOperand];
----------------
Shouldn't this have been handled by recomputeDependentScalars() already? How do we know that the operand is even a DependentScalar?

If this is because DependentScalars is unordered, wouldn't it be cleaner to ensure they are in their order of dependence?

================
Comment at: lib/CodeGen/BlockGenerators.cpp:226
@@ -191,3 +225,3 @@
 
-Loop *BlockGenerator::getLoopForInst(const llvm::Instruction *Inst) {
-  return LI.getLoopFor(Inst->getParent());
+Loop *BlockGenerator::getLoopForStmt(const ScopStmt &Stmt) const {
+  auto *StmtBB =
----------------
I like this change, but it could be committed independently. Maybe also as a member function of ScopStmt?

================
Comment at: lib/CodeGen/BlockGenerators.cpp:228
@@ +227,3 @@
+  auto *StmtBB =
+      Stmt.isBlockStmt() ? Stmt.getBasicBlock() : Stmt.getRegion()->getEntry();
+  return LI.getLoopFor(StmtBB);
----------------
This idiom appears multiple times. Introduce a helper function?

================
Comment at: lib/CodeGen/BlockGenerators.cpp:334
@@ +333,3 @@
+
+  recomputeDependentScalars(Stmt, BBMap, LTS, NewAccesses);
+
----------------
Any reason to do this before generateScalarLoads()? Naively, I'd think it belongs after it, because some of the dependent instructions might still use scalars (e.g. read-only ones).

================
Comment at: test/Isl/CodeGen/srem-in-other-bb.ll:1
@@ -1,3 +1,2 @@
-; RUN: opt %loadPolly -polly-codegen -S \
-; RUN:     < %s | FileCheck %s
+; RUN: opt %loadPolly -polly-codegen -S < %s | FileCheck %s
 ;
----------------
Nice, but unrelated


http://reviews.llvm.org/D13611




