[PATCH] D95136: [X86] Fix tile config register spill issue.

Pengfei Wang via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sat Jan 23 18:48:33 PST 2021


pengfei added inline comments.


================
Comment at: llvm/lib/Target/X86/X86PreTileConfig.cpp:284
+  // 3. There're AMX instructions in the BB before a call instruction.
+  void findInsertPoints(MachineBasicBlock *MBB, MachineInstr *MI = nullptr) {
+    bool HasAMX = false;
----------------
LuoYuanke wrote:
> It seems a lot of change in the patch. If you drop tilerelease, will the patch be simpler. I would like solve tile config issue first and move forward.
Yes, it will be. But the former BFS + DFS algorithm is still complex than the single DFS one, which means we still need major changes.
I would like to keep the tilerelease handling here. I didn't find bug of it for now and the calculating math is straightforward. I can show you the 4 status in below:
```
    ______________________________
S3 |begin ... AMX ... call ... end|
    ------------------------------
    ______________________________
S2 |begin ... call ... AMX ... end|
    ------------------------------
           _____________
S1        |begin ... end|
           -------------
          _/_       _\_
   max(S(|BB0|), S(|BB1|), ...) >= 1
          ---       ---
```
S0 is similar with S1 where all its successors' status are all 0. It doesn't matter if there's loop in the graph. Because when calculating the status, we assign 0 to current BB at first (if there's no AMX in it). Then we will iterate all its successors. It turns to 1 if one of other successors is not 0, and keeps 0 if all other successors are 0.
We insert tilerelease after the last AMX of current BB only if all its successors' status are 0. Otherwise, we will iterate all its first level successors and insert tilerelease at the begining of BB whoes status is 0. We also need to consider the BB whoes status is 1. We will recursively find their successors and insert tilerelease for BBs whoes status is 0.
All BB will be calculated for tilerelease inserting case once at most, since we record them in BBNeedInsertVisited. There's no big algorithm complexity increasement here.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95136/new/

https://reviews.llvm.org/D95136



More information about the llvm-commits mailing list