[PATCH] D100026: [X86] Support AMX fast register allocation

Xiang Zhang via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Apr 15 17:41:25 PDT 2021


xiangzhangllvm added a comment.

Address pengfei's comments:



================
Comment at: llvm/lib/Target/X86/X86FastTileConfig.cpp:181
+    else
+      ShapedTiles.push_back(&MO);
+  }
----------------
pengfei wrote:
> Do you need to check MO's register class? If the KeyMI is store, you will save its tile to shape?
Checked at 176, MO must be TilePhysReg, Yes, KeyMI's all tile operands' shapes should be saved.
KeyMI  can never be a tilestore, because for a volatile model, tile data in tilestore must comes from tileload.
So the KeyMI prefer tileload.  line 154 in getKeyAMXInstr should never happen, I'll replace it with a assert.


================
Comment at: llvm/lib/Target/X86/X86FastTileConfig.cpp:192
+void X86FastTileConfig::getShapeCfgInstrs(
+    MachineInstr *MI, std::map<unsigned, MachineInstr *> &RowCfgs,
+    std::map<unsigned, MachineInstr *> &ColCfgs) {
----------------
pengfei wrote:
> If the number is consecutive, use SmallVector should have better performance.
because we don't know the num of the shapes at first, we may meet the shapes not in order.


================
Comment at: llvm/lib/Target/X86/X86FastTileConfig.cpp:197
+
+  for (auto II = Cfg; II != MBB->begin(); II--) {
+    if (isAMXInstr(*II) || II->isTerminator() || II->isCall())
----------------
pengfei wrote:
> Is it possible the shapes are not in current BB? E.g. The previous BB been split etc.
I think it is impossible in O0


================
Comment at: llvm/lib/Target/X86/X86FastTileConfig.cpp:229
+// Here is the data format for the tile config.
+// 0      palette   = 0 now.
+// 1      start_row = 0 now.
----------------
pengfei wrote:
> Should palette = 1?
Yes, I miss it, thanks!


================
Comment at: llvm/lib/Target/X86/X86FastTileConfig.cpp:283
+    SmallVector<MachineInstr *, 2> CFGs;
+    for (MachineInstr &MI : MBB)
+      if (MI.getOpcode() == X86::LDTILECFG)
----------------
pengfei wrote:
> I think it's better to collect the shape config here.
Do you mean materializeTileCfg(MI) here ? It will modify the MBB, that will broken the iterators of MBB


================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:56-58
+static cl::opt<bool>
+    X86ScalarizeAMX("enable-x86-scalar-amx", cl::init(false),
+                    cl::Hidden, cl::desc("X86: enable AMX scalarizition."));
----------------
pengfei wrote:
> You can move it into the namespace. By the way, clang-format.
I think that is not sensitive for a static opt.


================
Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:388
+  if (IsPHI) {
+    Value *PhiOp = dyn_cast<PHINode>(V)->getIncomingValue(0);
+    II = cast<IntrinsicInst>(PhiOp);
----------------
pengfei wrote:
> Better use cast to help check for failures.
the caller passed IsPHI will make sure it is phi. 


================
Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:392
+    II = cast<IntrinsicInst>(V);
+  Value *Row = II->getOperand(0);
+  Value *Col = II->getOperand(1);
----------------
pengfei wrote:
> Will `II` be nullptr?
It can't be nullptr, Current all tile def should comes from IntrinsicInst. (PHI has specially handled)


================
Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:510
+// ------------------------------------------------------
+void X86VolatileTileData::volatileTilePHI(PHINode *PHI) {
+  BasicBlock *BB = PHI->getParent();
----------------
pengfei wrote:
> Can we handle the phi used by another phi?
I begin to think this case, I think it should never happened, do we have meet Recursive PHI before? I think the " Recursive PHI" should be PHI which has more than 2 operands.


================
Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:616
+
     X86LowerAMXType LAT(F);
     bool C = LAT.visit();
----------------
pengfei wrote:
> Is this necessary for O0?
Greedy also need this AMX Lower Type pass.


================
Comment at: llvm/lib/Target/X86/X86PreAMXConfig.cpp:205
+      return false;
+    return Loads.contains(ST);
+  }
----------------
pengfei wrote:
> You can check Loads[0] directly.
What we care is that there should be only 1 tileload for tilestore. 


================
Comment at: llvm/test/CodeGen/X86/AMX/amx-configO0toO0.ll:2
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc < %s -O0 -mtriple=x86_64-unknown-unknown -mattr=+amx-int8 -mattr=+avx512f | FileCheck %s --check-prefix=AMX_O0
+source_filename = "amx_api.c"
----------------
pengfei wrote:
> You don't need prefix for the single RUN. The same below.
you mean --check-prefix=AMX_O0 ? I just thought it is more clear for this test.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100026/new/

https://reviews.llvm.org/D100026



More information about the llvm-commits mailing list