[PATCH] D100026: [X86] Support AMX fast register allocation
Xiang Zhang via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Apr 14 01:00:53 PDT 2021
xiangzhangllvm added a comment.
Addressing Pengfei's comments:
================
Comment at: llvm/lib/Target/X86/X86ExpandPseudo.cpp:483
+ case X86::PLDTILECFGV: {
+ MI.setDesc(TII->get(X86::LDTILECFG));
+ return true;
----------------
pengfei wrote:
> Maybe we don't need a pseudo instruction?
Yes, I'll change it. (Due to some historical reasons, I used a pseudo instruction.)
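For reference, a commented sketch of the expansion pattern as it stands, assuming the surrounding opcode switch in X86ExpandPseudo::ExpandMI:

  // The pseudo is rewritten in place: its operands are kept, and only the
  // descriptor is swapped, so later passes see the real LDTILECFG.
  case X86::PLDTILECFGV: {
    MI.setDesc(TII->get(X86::LDTILECFG));
    return true;
  }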
================
Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:521
+ Value *Op = PHI->getIncomingValue(i);
+ Instruction *I = dyn_cast<Instruction>(Op);
+ assert(I && "We shouldn't fold AMX instruction!");
----------------
pengfei wrote:
> You may need to exclude debug intrinsics.
PHI's operands shouldn't be debug intrinsics.
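A minimal sketch of why the walk is safe: debug info is carried by standalone llvm.dbg.* instructions in the block, never as a PHI incoming value, so the only non-instruction incoming values to guard against are constants and arguments (PHI is the name from the snippet above):

  // Incoming values of a PHI are the SSA values flowing in from each
  // predecessor; llvm.dbg.* calls are standalone instructions and can
  // never appear here.
  for (unsigned Idx = 0, E = PHI->getNumIncomingValues(); Idx != E; ++Idx) {
    Value *Op = PHI->getIncomingValue(Idx);
    auto *I = dyn_cast<Instruction>(Op);
    assert(I && "We shouldn't fold AMX instruction!");
    (void)I; // Silence the unused-variable warning when asserts are off.
  }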
================
Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:529
+ if (DomBB == BB)
+ Pos = I;
+ else if (DomBB != LastDomBB)
----------------
pengfei wrote:
> `I` may not be the last phi.
There's no need for a domination relation here; I'll remove it. (For historical reasons, I put the allocation instruction in the dominated BB.)
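Without dominance, a simple and always-safe choice is to insert per incoming edge, right before the incoming block's terminator; a sketch, where storeTileToStack is a hypothetical helper:

  // SSA guarantees an incoming value dominates the end of its incoming
  // block, so inserting before that block's terminator is always valid
  // and needs no extra dominance computation.
  for (unsigned Idx = 0, E = PHI->getNumIncomingValues(); Idx != E; ++Idx) {
    Instruction *Def = cast<Instruction>(PHI->getIncomingValue(Idx));
    Instruction *Pos = PHI->getIncomingBlock(Idx)->getTerminator();
    storeTileToStack(Def, Pos); // hypothetical helper
  }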
================
Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:596
+ else
+ AMXDefInsts.push_back(&I);
+ }
----------------
pengfei wrote:
> Do you also insert a store for the load intrinsic?
If the tileload comes from the user's source code, O0 or my patch will generate a store for it.
We shouldn't generate a tilestore for an auto-generated tileload.
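To make "generate a store for it" concrete, a sketch of spilling a user-written tileload; Buf and Stride stand for a hypothetical stack slot, and I'm assuming the llvm.x86.tilestored64.internal intrinsic this series uses:

  // Insert a tilestore right after the user's tileload; the tile's shape
  // (row/col) is reused from the load's first two operands.
  IRBuilder<> Builder(TileLoad->getNextNode());
  Value *Row = TileLoad->getOperand(0);
  Value *Col = TileLoad->getOperand(1);
  Builder.CreateIntrinsic(Intrinsic::x86_tilestored64_internal, {},
                          {Row, Col, Buf, Stride, TileLoad});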
================
Comment at: llvm/lib/Target/X86/X86PreAMXConfig.cpp:60
+static bool IsAMXIntrinsic(IntrinsicInst *II) {
+ for (Value *Operand : II->operands())
+ if (Operand->getType()->isX86_AMXTy())
----------------
pengfei wrote:
> You may need to exclude debug intrinsics.
Debug IRs are not IntrinsicInsts.
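A completed sketch of the helper, under the assumption that the full version also checks the result type (the excerpt only shows the operand walk); debug intrinsics carry no x86_amx operands and return void, so they fall out as false without a special case:

  // An intrinsic is AMX-related iff it produces or consumes an x86_amx
  // value; llvm.dbg.* intrinsics do neither.
  static bool IsAMXIntrinsic(IntrinsicInst *II) {
    if (II->getType()->isX86_AMXTy())
      return true;
    for (Value *Operand : II->operands())
      if (Operand->getType()->isX86_AMXTy())
        return true;
    return false;
  }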
================
Comment at: llvm/lib/Target/X86/X86PreAMXConfig.cpp:173
+ Type *V512Ty = VectorType::get(Builder.getInt32Ty(), 16, false);
+ Align Alignment = DL.getPrefTypeAlign(Type::getInt32Ty(Ctx));
+
----------------
pengfei wrote:
> Should be V512Ty?
We set the config memory alignment to 4 bytes earlier, so this stays in sync with it.
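For reference, a sketch of the pairing described here: the config area is written as a <16 x i32> (64 bytes) but keeps the 4-byte i32 alignment chosen when the buffer was created (Builder, DL, Ctx, and ConfigPtr are assumed to be in scope):

  // The store writes all 64 bytes of the tile config, but the buffer was
  // only allocated with i32 alignment (4 bytes), so the store must not
  // claim a larger alignment.
  Type *V512Ty = VectorType::get(Builder.getInt32Ty(), 16, false);
  Align Alignment = DL.getPrefTypeAlign(Type::getInt32Ty(Ctx));
  Builder.CreateAlignedStore(Constant::getNullValue(V512Ty), ConfigPtr,
                             Alignment);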
================
Comment at: llvm/lib/Target/X86/X86PreAMXConfig.cpp:291
+
+ I = getConfigPosEnd(I, PosAndShapes[&*I]);
+ Find = true;
----------------
pengfei wrote:
> Better to add an assert checking that I is a load.
BB.end() is possible here, so we can't assert that I is a load.
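The loop shape that makes the assert unsafe; a sketch assuming getConfigPosEnd advances past the configured region and may return BB.end(), with Find as in the surrounding function:

  // I can legally end up at BB.end() after getConfigPosEnd, so
  // dereferencing it to assert isa<LoadInst>(*I) would be invalid.
  for (auto I = BB.begin(), E = BB.end(); I != E;) {
    if (!PosAndShapes.count(&*I)) {
      ++I;
      continue;
    }
    I = getConfigPosEnd(I, PosAndShapes[&*I]); // may return BB.end()
    Find = true;
  }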
================
Comment at: llvm/test/CodeGen/X86/AMX/amx-fast-tile-config.mir:416
+ ; CHECK: renamable $tmm2 = PTILELOADDV renamable $di, renamable $cx, killed renamable $r9, 1, renamable $rsi, 0, $noreg
+ ; CHECK: renamable $tmm0 = PTILELOADDV renamable $ax, renamable $cx, killed renamable $r8, 1, renamable $rsi, 0, $noreg
+ ; CHECK: renamable $tmm0 = PTDPBSSDV renamable $ax, renamable $cx, killed renamable $di, renamable $tmm0, killed renamable $tmm1, killed renamable $tmm2
----------------
pengfei wrote:
> The shapes for tmm0 are ax and cx, but the shape stored on the stack is $sil and 8?
This is a mistake; I'll fix it, thanks!
================
Comment at: llvm/test/CodeGen/X86/AMX/amx-fast-tile-config.mir:441
+ renamable $di = MOV16ri 8
+ renamable $tmm1 = PTILELOADDV renamable $ax, renamable $di, killed renamable $r10, 1, renamable $rsi, 0, $noreg
+ renamable $tmm2 = PTILELOADDV renamable $di, renamable $cx, killed renamable $r9, 1, renamable $rsi, 0, $noreg
----------------
pengfei wrote:
> I don't find where we store this shape.
The mov8/16* instructions above are pre-generated before register allocation; their store positions will be adjusted after this pass is tested.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D100026/new/
https://reviews.llvm.org/D100026