[PATCH] D100026: [X86] Support AMX fast register allocation
Pengfei Wang via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Apr 15 07:32:43 PDT 2021
pengfei added inline comments.
================
Comment at: llvm/lib/CodeGen/TargetPassConfig.cpp:1321
+ // Allow targets to change the register assignments before rewriting.
+ addPreRewrite();
return true;
----------------
Will this affect other targets? I think being called from FastRA might not be what they expect.
================
Comment at: llvm/lib/Target/X86/X86FastTileConfig.cpp:44-45
+ const X86Subtarget *ST = nullptr;
+ const TargetRegisterInfo *TRI;
+ const TargetInstrInfo *TII;
+ MachineRegisterInfo *MRI = nullptr;
----------------
It looks a bit strange to leave TRI and TII uninitialized.
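For example (just a sketch):

  const X86Subtarget *ST = nullptr;
  const TargetRegisterInfo *TRI = nullptr;
  const TargetInstrInfo *TII = nullptr;
  MachineRegisterInfo *MRI = nullptr;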
================
Comment at: llvm/lib/Target/X86/X86FastTileConfig.cpp:131
+bool X86FastTileConfig::isAMXInstr(MachineInstr &MI) {
+ // TODO: May need to handle some special nontile amx instrucion.
+ if (MI.getOpcode() == X86::LDTILECFG)
----------------
You should exclude debug MI here.
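Something like this sketch (exact placement depends on the existing early exits):

  // Ignore debug instructions so that -g does not change the tile config.
  if (MI.isDebugInstr())
    return false;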
================
Comment at: llvm/lib/Target/X86/X86FastTileConfig.cpp:134
+ return false;
+
+ for (MachineOperand &MO : MI.operands())
----------------
Extra spaces.
================
Comment at: llvm/lib/Target/X86/X86FastTileConfig.cpp:181
+ else
+ ShapedTiles.push_back(&MO);
+ }
----------------
Do you need to check MO's register class? If the KeyMI is a store, won't you push its tile operand into ShapedTiles as well?
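For example (a sketch, assuming the operands are already tile registers from X86::TILERegClass at this point):

  // Only collect operands that are really tile registers.
  if (!MO.isReg() || !X86::TILERegClass.contains(MO.getReg()))
    continue;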
================
Comment at: llvm/lib/Target/X86/X86FastTileConfig.cpp:192
+void X86FastTileConfig::getShapeCfgInstrs(
+ MachineInstr *MI, std::map<unsigned, MachineInstr *> &RowCfgs,
+ std::map<unsigned, MachineInstr *> &ColCfgs) {
----------------
If the numbers are consecutive, using a SmallVector should give better performance.
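For example, if the keys are small consecutive tile numbers, a sketch:

  // Indexed directly by tile number instead of a std::map lookup.
  SmallVector<MachineInstr *, 8> RowCfgs;
  SmallVector<MachineInstr *, 8> ColCfgs;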
================
Comment at: llvm/lib/Target/X86/X86FastTileConfig.cpp:197
+
+ for (auto II = Cfg; II != MBB->begin(); II--) {
+ if (isAMXInstr(*II) || II->isTerminator() || II->isCall())
----------------
Is it possible that the shapes are not in the current BB? E.g., the preceding BB has been split, etc.
================
Comment at: llvm/lib/Target/X86/X86FastTileConfig.cpp:229
+// Here is the data format for the tile config.
+// 0 palette = 0 now.
+// 1 start_row = 0 now.
----------------
Should palette = 1?
================
Comment at: llvm/lib/Target/X86/X86FastTileConfig.cpp:272
+ // Orderly keep the tile uses and def in ShapedTiles;
+ unsigned NumCfg = getTileShapesCfg(CfgMI, ShapedTiles);
+ assert(NumCfg && "Not find shapes config!");
----------------
Nit: maybe it's better to make it return void and check ShapedTiles.size() in the assert.
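I.e., roughly:

  // Keep the tile uses and defs in order in ShapedTiles.
  getTileShapesCfg(CfgMI, ShapedTiles);
  assert(!ShapedTiles.empty() && "Failed to find shape configs!");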
================
Comment at: llvm/lib/Target/X86/X86FastTileConfig.cpp:283
+ SmallVector<MachineInstr *, 2> CFGs;
+ for (MachineInstr &MI : MBB)
+ if (MI.getOpcode() == X86::LDTILECFG)
----------------
I think it's better to collect the shape config here.
================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:56-58
+static cl::opt<bool>
+ X86ScalarizeAMX("enable-x86-scalar-amx", cl::init(false),
+ cl::Hidden, cl::desc("X86: enable AMX scalarizition."));
----------------
You can move it into the namespace. By the way, please run clang-format on this.
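For example (just a sketch of the moved and reformatted version):

  namespace {
  cl::opt<bool> X86ScalarizeAMX("enable-x86-scalar-amx", cl::init(false),
                                cl::Hidden,
                                cl::desc("X86: enable AMX scalarizition."));
  } // namespace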
================
Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:20
+/// -emit-llvm t.c" + "llc t.ll") we should make sure the amx data is volatile,
+/// because that is nessary for AMX fast register allocation. (In Fast register
+/// allocation, register will be allocated before spill/reload, so there is no
----------------
necessary
================
Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:388
+ if (IsPHI) {
+ Value *PhiOp = dyn_cast<PHINode>(V)->getIncomingValue(0);
+ II = cast<IntrinsicInst>(PhiOp);
----------------
Better to use cast<> so failures are caught by the assertion.
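I.e.:

  // cast<> asserts on a type mismatch, so a failure is caught right here.
  Value *PhiOp = cast<PHINode>(V)->getIncomingValue(0);
  II = cast<IntrinsicInst>(PhiOp);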
================
Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:392
+ II = cast<IntrinsicInst>(V);
+ Value *Row = II->getOperand(0);
+ Value *Col = II->getOperand(1);
----------------
Will `II` be nullptr?
================
Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:510
+// ------------------------------------------------------
+void X86VolatileTileData::volatileTilePHI(PHINode *PHI) {
+ BasicBlock *BB = PHI->getParent();
----------------
Can we handle a PHI that is used by another PHI?
================
Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:616
+
X86LowerAMXType LAT(F);
bool C = LAT.visit();
----------------
Is this necessary for O0?
================
Comment at: llvm/lib/Target/X86/X86PreAMXConfig.cpp:205
+ return false;
+ return Loads.contains(ST);
+ }
----------------
You can check Loads[0] directly.
================
Comment at: llvm/lib/Target/X86/X86PreAMXConfig.cpp:216
+
+ // The def of KeyAMX should be stored.
+ // Todo: is it key amx can be no def?
----------------
Why can't it be DPSSD etc.?
================
Comment at: llvm/test/CodeGen/X86/AMX/amx-configO0toO0.ll:2
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc < %s -O0 -mtriple=x86_64-unknown-unknown -mattr=+amx-int8 -mattr=+avx512f | FileCheck %s --check-prefix=AMX_O0
+source_filename = "amx_api.c"
----------------
You don't need a prefix for a single RUN line. Same below.
================
Comment at: llvm/test/CodeGen/X86/AMX/amx-configO2toO0.ll:39
+; AMX_O0-NEXT: vxorps %xmm0, %xmm0, %xmm0
+; AMX_O0-NEXT: vmovdqa64 %zmm0, {{[0-9]+}}(%rsp)
+; AMX_O0-NEXT: movb %al, %sil
----------------
You should always set palette = 1 after it.
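Roughly, the pass would have to emit something like this after the zeroing store (CfgFI standing for the config-area frame index; just a sketch):

  // Store palette = 1 into byte 0 of the just-zeroed tile config area.
  addFrameReference(BuildMI(MBB, MI, DL, TII->get(X86::MOV8mi)), CfgFI)
      .addImm(1);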
================
Comment at: llvm/test/CodeGen/X86/AMX/amx-fast-tile-config.mir:31
+ %16 = bitcast <16 x i32>* %6 to i8*
+ store <16 x i32> zeroinitializer, <16 x i32>* %6, align 64
+ %amx.tmm.0.shape.row1 = getelementptr i8, i8* %16, i64 48
----------------
But the alignments of the store and the alloca don't match. This may cause a runtime crash due to the misalignment.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D100026/new/
https://reviews.llvm.org/D100026