[PATCH] D100026: [X86] Support AMX fast register allocation
Pengfei Wang via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Apr 15 07:32:43 PDT 2021
pengfei added inline comments.
================
Comment at: llvm/lib/CodeGen/TargetPassConfig.cpp:1321
+ // Allow targets to change the register assignments before rewriting.
+ addPreRewrite();
return true;
----------------
Will this affect other targets? I think being called from FastRA might not be what they expect.
================
Comment at: llvm/lib/Target/X86/X86FastTileConfig.cpp:44-45
+ const X86Subtarget *ST = nullptr;
+ const TargetRegisterInfo *TRI;
+ const TargetInstrInfo *TII;
+ MachineRegisterInfo *MRI = nullptr;
----------------
It looks a bit strange to leave TRI and TII uninitialized.
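For example (just a sketch):

  const X86Subtarget *ST = nullptr;
  const TargetRegisterInfo *TRI = nullptr;
  const TargetInstrInfo *TII = nullptr;
  MachineRegisterInfo *MRI = nullptr;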
================
Comment at: llvm/lib/Target/X86/X86FastTileConfig.cpp:131
+bool X86FastTileConfig::isAMXInstr(MachineInstr &MI) {
+ // TODO: May need to handle some special nontile amx instrucion.
+ if (MI.getOpcode() == X86::LDTILECFG)
----------------
You should exclude debug MI here.
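Something like this sketch (exact placement depends on the existing early exits):

  // Ignore debug instructions so that -g does not change the tile config.
  if (MI.isDebugInstr())
    return false;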
================
Comment at: llvm/lib/Target/X86/X86FastTileConfig.cpp:134
+ return false;
+
+ for (MachineOperand &MO : MI.operands())
----------------
Extra spaces.
================
Comment at: llvm/lib/Target/X86/X86FastTileConfig.cpp:181
+ else
+ ShapedTiles.push_back(&MO);
+ }
----------------
Do you need to check MO's register class? If the KeyMI is a store, won't you push its tile operand into ShapedTiles as well?
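For example (a sketch, assuming the operands are already tile registers from X86::TILERegClass at this point):

  // Only collect operands that are really tile registers.
  if (!MO.isReg() || !X86::TILERegClass.contains(MO.getReg()))
    continue;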
================
Comment at: llvm/lib/Target/X86/X86FastTileConfig.cpp:192
+void X86FastTileConfig::getShapeCfgInstrs(
+ MachineInstr *MI, std::map<unsigned, MachineInstr *> &RowCfgs,
+ std::map<unsigned, MachineInstr *> &ColCfgs) {
----------------
If the numbers are consecutive, using a SmallVector should give better performance.
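For example, if the keys are small consecutive tile numbers, a sketch:

  // Indexed directly by tile number instead of a std::map lookup.
  SmallVector<MachineInstr *, 8> RowCfgs;
  SmallVector<MachineInstr *, 8> ColCfgs;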
================
Comment at: llvm/lib/Target/X86/X86FastTileConfig.cpp:197
+
+ for (auto II = Cfg; II != MBB->begin(); II--) {
+ if (isAMXInstr(*II) || II->isTerminator() || II->isCall())
----------------
Is it possible that the shapes are not in the current BB? E.g., the preceding BB has been split, etc.
================
Comment at: llvm/lib/Target/X86/X86FastTileConfig.cpp:229
+// Here is the data format for the tile config.
+// 0 palette = 0 now.
+// 1 start_row = 0 now.
----------------
Should palette = 1?
================
Comment at: llvm/lib/Target/X86/X86FastTileConfig.cpp:272
+ // Orderly keep the tile uses and def in ShapedTiles;
+ unsigned NumCfg = getTileShapesCfg(CfgMI, ShapedTiles);
+ assert(NumCfg && "Not find shapes config!");
----------------
Nit: maybe it's better to make it return void and check ShapedTiles.size() in the assert.
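I.e., roughly:

  // Keep the tile uses and defs in order in ShapedTiles.
  getTileShapesCfg(CfgMI, ShapedTiles);
  assert(!ShapedTiles.empty() && "Failed to find shape configs!");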
================
Comment at: llvm/lib/Target/X86/X86FastTileConfig.cpp:283
+ SmallVector<MachineInstr *, 2> CFGs;
+ for (MachineInstr &MI : MBB)
+ if (MI.getOpcode() == X86::LDTILECFG)
----------------
I think it's better to collect the shape config here.
================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:56-58
+static cl::opt<bool>
+ X86ScalarizeAMX("enable-x86-scalar-amx", cl::init(false),
+ cl::Hidden, cl::desc("X86: enable AMX scalarizition."));
----------------
You can move it into the namespace. By the way, please run clang-format on this.
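For example (just a sketch of the moved and reformatted version):

  namespace {
  cl::opt<bool> X86ScalarizeAMX("enable-x86-scalar-amx", cl::init(false),
                                cl::Hidden,
                                cl::desc("X86: enable AMX scalarizition."));
  } // namespace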
================
Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:20
+/// -emit-llvm t.c" + "llc t.ll") we should make sure the amx data is volatile,
+/// because that is nessary for AMX fast register allocation. (In Fast register
+/// allocation, register will be allocated before spill/reload, so there is no
----------------
necessary
================
Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:388
+ if (IsPHI) {
+ Value *PhiOp = dyn_cast<PHINode>(V)->getIncomingValue(0);
+ II = cast<IntrinsicInst>(PhiOp);
----------------
Better to use cast<> so failures are caught by the assertion.
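I.e.:

  // cast<> asserts on a type mismatch, so a failure is caught right here.
  Value *PhiOp = cast<PHINode>(V)->getIncomingValue(0);
  II = cast<IntrinsicInst>(PhiOp);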
================
Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:392
+ II = cast<IntrinsicInst>(V);
+ Value *Row = II->getOperand(0);
+ Value *Col = II->getOperand(1);
----------------
Will `II` be nullptr?
================
Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:510
+// ------------------------------------------------------
+void X86VolatileTileData::volatileTilePHI(PHINode *PHI) {
+ BasicBlock *BB = PHI->getParent();
----------------
Can we handle a PHI that is used by another PHI?
================
Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:616
+
X86LowerAMXType LAT(F);
bool C = LAT.visit();
----------------
Is this necessary for O0?
================
Comment at: llvm/lib/Target/X86/X86PreAMXConfig.cpp:205
+ return false;
+ return Loads.contains(ST);
+ }
----------------
You can check Loads[0] directly.
================
Comment at: llvm/lib/Target/X86/X86PreAMXConfig.cpp:216
+
+ // The def of KeyAMX should be stored.
+ // Todo: is it key amx can be no def?
----------------
Why can't it be DPSSD etc.?
================
Comment at: llvm/test/CodeGen/X86/AMX/amx-configO0toO0.ll:2
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc < %s -O0 -mtriple=x86_64-unknown-unknown -mattr=+amx-int8 -mattr=+avx512f | FileCheck %s --check-prefix=AMX_O0
+source_filename = "amx_api.c"
----------------
You don't need a prefix for a single RUN line. Same below.
================
Comment at: llvm/test/CodeGen/X86/AMX/amx-configO2toO0.ll:39
+; AMX_O0-NEXT: vxorps %xmm0, %xmm0, %xmm0
+; AMX_O0-NEXT: vmovdqa64 %zmm0, {{[0-9]+}}(%rsp)
+; AMX_O0-NEXT: movb %al, %sil
----------------
You should always set palette = 1 after it.
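Roughly, the pass would have to emit something like this after the zeroing store (CfgFI standing for the config-area frame index; just a sketch):

  // Store palette = 1 into byte 0 of the just-zeroed tile config area.
  addFrameReference(BuildMI(MBB, MI, DL, TII->get(X86::MOV8mi)), CfgFI)
      .addImm(1);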
================
Comment at: llvm/test/CodeGen/X86/AMX/amx-fast-tile-config.mir:31
+ %16 = bitcast <16 x i32>* %6 to i8*
+ store <16 x i32> zeroinitializer, <16 x i32>* %6, align 64
+ %amx.tmm.0.shape.row1 = getelementptr i8, i8* %16, i64 48
----------------
But the alignments of the store and the alloca don't match. This may cause a runtime crash due to the misalignment.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D100026/new/
https://reviews.llvm.org/D100026