[PATCH] D100026: [X86] Support AMX fast register allocation
Xiang Zhang via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Apr 14 01:00:53 PDT 2021
xiangzhangllvm added a comment.
Addressing Pengfei's comments:
================
Comment at: llvm/lib/Target/X86/X86ExpandPseudo.cpp:483
+ case X86::PLDTILECFGV: {
+ MI.setDesc(TII->get(X86::LDTILECFG));
+ return true;
----------------
pengfei wrote:
> Maybe we don't need a pseudo instruction?
Yes, I'll change it. (Due to some historical reasons, I used a pseudo instruction.)
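For reference, a commented sketch of the expansion pattern as it stands, assuming the surrounding opcode switch in X86ExpandPseudo::ExpandMI:

  // The pseudo is rewritten in place: its operands are kept, and only the
  // descriptor is swapped, so later passes see the real LDTILECFG.
  case X86::PLDTILECFGV: {
    MI.setDesc(TII->get(X86::LDTILECFG));
    return true;
  }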
================
Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:521
+ Value *Op = PHI->getIncomingValue(i);
+ Instruction *I = dyn_cast<Instruction>(Op);
+ assert(I && "We shouldn't fold AMX instruction!");
----------------
pengfei wrote:
> You may need to exclude debug intrinsics.
PHI's operands shouldn't be debug intrinsics.
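A minimal sketch of why the walk is safe: debug info is carried by standalone llvm.dbg.* instructions in the block, never as a PHI incoming value, so the only non-instruction incoming values to guard against are constants and arguments (PHI is the name from the snippet above):

  // Incoming values of a PHI are the SSA values flowing in from each
  // predecessor; llvm.dbg.* calls are standalone instructions and can
  // never appear here.
  for (unsigned Idx = 0, E = PHI->getNumIncomingValues(); Idx != E; ++Idx) {
    Value *Op = PHI->getIncomingValue(Idx);
    auto *I = dyn_cast<Instruction>(Op);
    assert(I && "We shouldn't fold AMX instruction!");
    (void)I; // Silence the unused-variable warning when asserts are off.
  }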
================
Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:529
+ if (DomBB == BB)
+ Pos = I;
+ else if (DomBB != LastDomBB)
----------------
pengfei wrote:
> `I` may not be the last phi.
There's no need for a domination relation here; I'll remove it. (For historical reasons, I put the allocation instruction in the dominated BB.)
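Without dominance, a simple and always-safe choice is to insert per incoming edge, right before the incoming block's terminator; a sketch, where storeTileToStack is a hypothetical helper:

  // SSA guarantees an incoming value dominates the end of its incoming
  // block, so inserting before that block's terminator is always valid
  // and needs no extra dominance computation.
  for (unsigned Idx = 0, E = PHI->getNumIncomingValues(); Idx != E; ++Idx) {
    Instruction *Def = cast<Instruction>(PHI->getIncomingValue(Idx));
    Instruction *Pos = PHI->getIncomingBlock(Idx)->getTerminator();
    storeTileToStack(Def, Pos); // hypothetical helper
  }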
================
Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:596
+ else
+ AMXDefInsts.push_back(&I);
+ }
----------------
pengfei wrote:
> Do you also insert a store for the load intrinsic?
If the tileload comes from the user's source code, O0 or my patch will generate a store for it.
We shouldn't generate a tilestore for an auto-generated tileload.
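To make "generate a store for it" concrete, a sketch of spilling a user-written tileload; Buf and Stride stand for a hypothetical stack slot, and I'm assuming the llvm.x86.tilestored64.internal intrinsic this series uses:

  // Insert a tilestore right after the user's tileload; the tile's shape
  // (row/col) is reused from the load's first two operands.
  IRBuilder<> Builder(TileLoad->getNextNode());
  Value *Row = TileLoad->getOperand(0);
  Value *Col = TileLoad->getOperand(1);
  Builder.CreateIntrinsic(Intrinsic::x86_tilestored64_internal, {},
                          {Row, Col, Buf, Stride, TileLoad});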
================
Comment at: llvm/lib/Target/X86/X86PreAMXConfig.cpp:60
+static bool IsAMXIntrinsic(IntrinsicInst *II) {
+ for (Value *Operand : II->operands())
+ if (Operand->getType()->isX86_AMXTy())
----------------
pengfei wrote:
> You may need to exclude debug intrinsics.
Debug IRs are not IntrinsicInsts.
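A completed sketch of the helper, under the assumption that the full version also checks the result type (the excerpt only shows the operand walk); debug intrinsics carry no x86_amx operands and return void, so they fall out as false without a special case:

  // An intrinsic is AMX-related iff it produces or consumes an x86_amx
  // value; llvm.dbg.* intrinsics do neither.
  static bool IsAMXIntrinsic(IntrinsicInst *II) {
    if (II->getType()->isX86_AMXTy())
      return true;
    for (Value *Operand : II->operands())
      if (Operand->getType()->isX86_AMXTy())
        return true;
    return false;
  }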
================
Comment at: llvm/lib/Target/X86/X86PreAMXConfig.cpp:173
+ Type *V512Ty = VectorType::get(Builder.getInt32Ty(), 16, false);
+ Align Alignment = DL.getPrefTypeAlign(Type::getInt32Ty(Ctx));
+
----------------
pengfei wrote:
> Should be V512Ty?
We set the config memory alignment to 4 bytes earlier, so this stays in sync with it.
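For reference, a sketch of the pairing described here: the config area is written as a <16 x i32> (64 bytes) but keeps the 4-byte i32 alignment chosen when the buffer was created (Builder, DL, Ctx, and ConfigPtr are assumed to be in scope):

  // The store writes all 64 bytes of the tile config, but the buffer was
  // only allocated with i32 alignment (4 bytes), so the store must not
  // claim a larger alignment.
  Type *V512Ty = VectorType::get(Builder.getInt32Ty(), 16, false);
  Align Alignment = DL.getPrefTypeAlign(Type::getInt32Ty(Ctx));
  Builder.CreateAlignedStore(Constant::getNullValue(V512Ty), ConfigPtr,
                             Alignment);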
================
Comment at: llvm/lib/Target/X86/X86PreAMXConfig.cpp:291
+
+ I = getConfigPosEnd(I, PosAndShapes[&*I]);
+ Find = true;
----------------
pengfei wrote:
> Better to add an assert checking that I is a load.
BB.end() is possible here, so we can't assert that I is a load.
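The loop shape that makes the assert unsafe; a sketch assuming getConfigPosEnd advances past the configured region and may return BB.end(), with Find as in the surrounding function:

  // I can legally end up at BB.end() after getConfigPosEnd, so
  // dereferencing it to assert isa<LoadInst>(*I) would be invalid.
  for (auto I = BB.begin(), E = BB.end(); I != E;) {
    if (!PosAndShapes.count(&*I)) {
      ++I;
      continue;
    }
    I = getConfigPosEnd(I, PosAndShapes[&*I]); // may return BB.end()
    Find = true;
  }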
================
Comment at: llvm/test/CodeGen/X86/AMX/amx-fast-tile-config.mir:416
+ ; CHECK: renamable $tmm2 = PTILELOADDV renamable $di, renamable $cx, killed renamable $r9, 1, renamable $rsi, 0, $noreg
+ ; CHECK: renamable $tmm0 = PTILELOADDV renamable $ax, renamable $cx, killed renamable $r8, 1, renamable $rsi, 0, $noreg
+ ; CHECK: renamable $tmm0 = PTDPBSSDV renamable $ax, renamable $cx, killed renamable $di, renamable $tmm0, killed renamable $tmm1, killed renamable $tmm2
----------------
pengfei wrote:
> The shapes for tmm0 are ax and cx, but the shape stored on the stack is $sil and 8?
This is a mistake; I'll fix it, thanks!
================
Comment at: llvm/test/CodeGen/X86/AMX/amx-fast-tile-config.mir:441
+ renamable $di = MOV16ri 8
+ renamable $tmm1 = PTILELOADDV renamable $ax, renamable $di, killed renamable $r10, 1, renamable $rsi, 0, $noreg
+ renamable $tmm2 = PTILELOADDV renamable $di, renamable $cx, killed renamable $r9, 1, renamable $rsi, 0, $noreg
----------------
pengfei wrote:
> I don't find where we store this shape.
The mov8/16* instructions above are pre-generated before register allocation; their store positions will be adjusted after this pass is tested.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D100026/new/
https://reviews.llvm.org/D100026