[PATCH] D113424: ext-tsp basic block layout

Mon Nov 8 21:19:44 PST 2021

rahmanl added a comment.

Can you try the new layout with CSPGO?  I think the improvements should be higher there since it has more precise profiles.

Please also report memory overhead. My concern is that the `Chain` objects take up non-linear space.

================
Comment at: llvm/lib/CodeGen/MachineBlockPlacement.cpp:3464
+    BlockFrequency BlockFreq = MBFI->getBlockFreq(&MBB);
+    BlockSizes[BlockIndex[&MBB]] = MBB.size();
+    BlockCounts[BlockIndex[&MBB]] = BlockFreq.getFrequency();
----------------
This will give you the number of instructions which is not exactly the binary size of the block.

================
Comment at: llvm/lib/Transforms/Utils/CodeLayout.cpp:51-54
+const double ForwardWeight = 0.1;
+const double BackwardWeight = 0.1;
+const size_t ForwardDistance = 1024;
+const size_t BackwardDistance = 640;
----------------
Please consider adding parameters for these. I think it will be useful for users to tune based on their architecture.

================
Comment at: llvm/lib/Transforms/Utils/CodeLayout.cpp:93
+using BlockPair = std::pair<Block *, Block *>;
+using JumpList = std::vector<std::pair<BlockPair, uint64_t>>;
+using BlockIter = std::vector<Block *>::const_iterator;
----------------
Please define a struct for the Jump (Src, Sink, Weight).

================
Comment at: llvm/lib/Transforms/Utils/CodeLayout.cpp:152
+  // Outgoing jumps from the block.
+  std::vector<std::pair<Block *, uint64_t>> OutJumps;
+  // Incoming jumps to the block.
----------------
The abstraction for block-edges is very loosely defined. I think defining a struct will help.

================
Comment at: llvm/lib/Transforms/Utils/CodeLayout.cpp:228
+  // Whether the chain starts with the entry basic block.
+  bool IsEntry;
+  // Cached ext-tsp score for the chain.
----------------
Do we really need to store a boolean here? It adds extra burden to make sure we keep it consistent after merging.

================
Comment at: llvm/lib/Transforms/Utils/CodeLayout.cpp:234
+  // Adjacent chains and corresponding edges (lists of jumps).
+  std::vector<std::pair<Chain *, Edge *>> Edges;
+};
----------------
How do we ensure the obsolete chains are removed from this structure? Would it make life easier if we define it as a map from the Chain pointers (or Chain ids) ?

================
Comment at: llvm/lib/Transforms/Utils/CodeLayout.cpp:240
+/// there is always at most one edge between a pair of chains
+class Edge {
+public:
----------------
Maybe call this ChainEdge so it's not confused with a basic block edge/jump.

================
Comment at: llvm/lib/Transforms/Utils/CodeLayout.cpp:474
+
+  /// For a pair of blocks, A and B, block B is the fallthrough successor of A,
+  /// if (i) all jumps (based on profile) from A goes to B and (ii) all jumps
----------------
I think fallthrough may be a misnomer here. Normally, a block's fallthrough successor is its fallthrough block in the original layout regardless of whether these blocks have other outgoing edges.
I suggest renaming the concept. My suggestion is "mutually forced" or "mutually dominated" edges.

================
Comment at: llvm/lib/Transforms/Utils/CodeLayout.cpp:492
+    // data isn't 100% accurate).
+    // Break the cycles by choosing the block with smallest index as the tail
+    for (auto &Block : AllBlocks) {
----------------
Please explain why you choose such block.

================
Comment at: llvm/lib/Transforms/Utils/CodeLayout.cpp:591
+
+  /// Compute ExtTSP score for a given order of basic blocks.
+  double score(const MergedChain &MergedBlocks, const JumpList &Jumps) const {
----------------
This computes the total score for the jumps in the jump list. Please change the comment accordingly.

================
Comment at: llvm/lib/Transforms/Utils/CodeLayout.cpp:634
+
+    // Attach (a part of) ChainPred before the first block of ChainSucc
+    for (auto &Jump : ChainSucc->blocks().front()->InJumps) {
----------------
I see the following two loops try to form a new fallthrough between ChainPred and ChainSucc and this is independent from the size of the ChainPred. This could be an independent optimization. Do you think you can do this as a later patch? That way, we can evaluate its benefit /overhead in a better way.

================
Comment at: llvm/lib/Transforms/Utils/CodeLayout.cpp:675-678
+#ifndef NDEBUG
+          auto BB2 = ChainPred->blocks()[Offset];
+          assert(BB1->FallthroughSucc == BB2 && "Fallthrough not preserved");
+#endif
----------------
This seems to be an assertion that is unrelated to the purpose of this code block. Could you please remove or move it to a relevant position?

================
Comment at: llvm/lib/Transforms/Utils/CodeLayout.cpp:683
+        // Try to split the chain in different ways
+        Gain = computeMergeGain(Gain, ChainPred, ChainSucc, Jumps, Offset,
+                                MergeTypeTy::X1_Y_X2);
----------------
Why not doing X2_Y_X1 here?

================
Comment at: llvm/lib/Transforms/Utils/CodeLayout.cpp:695
+
+  /// Merge two chains and update the best gain.
+  MergeGainTy computeMergeGain(const MergeGainTy &CurGain,
----------------
Does this actually merge the chains together?

================
Comment at: llvm/lib/Transforms/Utils/CodeLayout.cpp:709
+    // The gain for the new chain
+    const auto NewScore = score(MergedBlocks, Jumps) - ChainPred->score();
+    auto NewGain = MergeGainTy(NewScore, MergeOffset, MergeType);
----------------
Is this the new score or the change in the score? `ChainSucc`'s intra-chain jumps are not included in the `Jumps` vector.

================
Comment at: llvm/lib/Transforms/Utils/CodeLayout.cpp:711
+    auto NewGain = MergeGainTy(NewScore, MergeOffset, MergeType);
+    return CurGain < NewGain ? NewGain : CurGain;
+  }
----------------
This is inconsistent with the function comment. CurGain is not updated here, but rather in the caller of this function.

My suggestion is to add the following function to MergeGainTy and use it in the callsite. This way you can remove the first parameter from `computeMergeGain` and also `computeMergeGain` can return what it computes.

```
void updateIfLessThan(const MergeGainTy &OtherGain) {
      if (*this < OtherGain)
            * this = OtherGain;
}
```

================
Comment at: llvm/lib/Transforms/Utils/CodeLayout.cpp:720
+  MergedChain mergeBlocks(const std::vector<Block *> &X,
+                          const std::vector<Block *> &Y, size_t MergeOffset,
+                          MergeTypeTy MergeType) const {
----------------
Rename this to XSplitOffset.

================
Comment at: llvm/lib/Transforms/Utils/CodeLayout.cpp:766
+    auto Iter = std::remove(HotChains.begin(), HotChains.end(), From);
+    HotChains.erase(Iter, HotChains.end());
+
----------------
It seems that the obsolete chains are not deallocated from the `AllChains` vector. Does this raise any memory concerns?

================
Comment at: llvm/test/CodeGen/X86/code_placement_ext_tsp.ll:1
+; RUN: llc -mcpu=corei7 -mtriple=x86_64-linux -enable-ext-tsp-block-placement=1 < %s | FileCheck %s
+
----------------
The interesting code path (splitting the chain) is not exercised by these tests. Please add a test for that.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D113424/new/

https://reviews.llvm.org/D113424