[PATCH] D100026: [X86] Support AMX fast register allocation
Pengfei Wang via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Apr 13 23:08:56 PDT 2021
pengfei added inline comments.
================
Comment at: llvm/lib/Target/X86/X86ExpandPseudo.cpp:483
+ case X86::PLDTILECFGV: {
+ MI.setDesc(TII->get(X86::LDTILECFG));
+ return true;
----------------
Maybe don't need pseudo instruction?
================
Comment at: llvm/lib/Target/X86/X86FastTileConfig.cpp:26
+#include "llvm/ADT/DenseMap.h"
+#include "llvm/ADT/PostOrderIterator.h"
+#include "llvm/CodeGen/LiveIntervals.h"
----------------
You can remove it if you don't use it.
================
Comment at: llvm/lib/Target/X86/X86FastTileConfig.cpp:28
+#include "llvm/CodeGen/LiveIntervals.h"
+#include "llvm/CodeGen/MachineDominators.h"
+#include "llvm/CodeGen/MachineFrameInfo.h"
----------------
Same above.
================
Comment at: llvm/lib/Target/X86/X86FastTileConfig.cpp:37
+#include "llvm/CodeGen/TileShapeInfo.h"
+#include "llvm/CodeGen/VirtRegMap.h"
+#include "llvm/InitializePasses.h"
----------------
Same above.
================
Comment at: llvm/lib/Target/X86/X86FastTileConfig.cpp:52-55
+ MachineDominatorTree *DomTree = nullptr;
+ MachineRegisterInfo *MRI = nullptr;
+ VirtRegMap *VRM = nullptr;
+ LiveIntervals *LIS = nullptr;
----------------
You can remove them if you don't use.
================
Comment at: llvm/lib/Target/X86/X86FastTileConfig.cpp:63
+
+ void FastTileConfig();
+ bool IsTileLoad(MachineInstr &MI);
----------------
Naming conversions: function should start with lower case letter.
================
Comment at: llvm/lib/Target/X86/X86FastTileConfig.cpp:111-114
+void X86FastTileConfig::getAnalysisUsage(AnalysisUsage &AU) const {
+ AU.setPreservesAll();
+ MachineFunctionPass::getAnalysisUsage(AU);
+}
----------------
You can remove it if you don't require any analysis.
================
Comment at: llvm/lib/Target/X86/X86ISelDAGToDAG.cpp:4705
}
+ case Intrinsic::x86_ldtilecfg_internal: {
+ unsigned Opc = X86::PLDTILECFGV;
----------------
You can move it to .td file.
================
Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:521
+ Value *Op = PHI->getIncomingValue(i);
+ Instruction *I = dyn_cast<Instruction>(Op);
+ assert(I && "We shouldn't fold AMX instrution!");
----------------
You may need to exclude debug intrinsics.
================
Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:529
+ if (DomBB == BB)
+ Pos = I;
+ else if (DomBB != LastDomBB)
----------------
I maybe not the last phi.
================
Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:596
+ else
+ AMXDefInsts.push_back(&I);
+ }
----------------
Do you also insert a store for load intrinsic?
================
Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:632
+ // return false;
+ DominatorTree *DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree();
+
----------------
We may need to postpone it after we find a AMX intrinsic.
================
Comment at: llvm/lib/Target/X86/X86PreAMXConfig.cpp:60
+static bool IsAMXIntrinsic(IntrinsicInst *II) {
+ for (Value *Operand : II->operands())
+ if (Operand->getType()->isX86_AMXTy())
----------------
You may need to exclude debug intrinsics.
================
Comment at: llvm/lib/Target/X86/X86PreAMXConfig.cpp:173
+ Type *V512Ty = VectorType::get(Builder.getInt32Ty(), 16, false);
+ Align Alignment = DL.getPrefTypeAlign(Type::getInt32Ty(Ctx));
+
----------------
Should be V512Ty?
================
Comment at: llvm/lib/Target/X86/X86PreAMXConfig.cpp:193
+
+// Todo: We may need to handle "more than one store" case in the future.
+bool X86PreAMXConfig::checkVolatileModel(SmallSet<Value *, 4> &Loads,
----------------
Better add assert for this case.
================
Comment at: llvm/lib/Target/X86/X86PreAMXConfig.cpp:291
+
+ I = getConfigPosEnd(I, PosAndShapes[&*I]);
+ Find = true;
----------------
Better add assert to check I is load.
================
Comment at: llvm/lib/Target/X86/X86PreAMXConfig.cpp:356
+ // Prepare for fast register allocation at O0.
+ if (TM->getOptLevel() == CodeGenOpt::None) {
+
----------------
Do we need to check it since the pass is only created under O0?
================
Comment at: llvm/test/CodeGen/X86/AMX/amx-configO2toO0.ll:2
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc < %s -O0 -mtriple=x86_64-unknown-unknown -mattr=+amx-int8 -mattr=+avx512f | FileCheck %s --check-prefix=AMX_O0
+
----------------
Better change one case to use avx or sse to check if stack cleared correctly.
================
Comment at: llvm/test/CodeGen/X86/AMX/amx-fast-tile-config.mir:416
+ ; CHECK: renamable $tmm2 = PTILELOADDV renamable $di, renamable $cx, killed renamable $r9, 1, renamable $rsi, 0, $noreg
+ ; CHECK: renamable $tmm0 = PTILELOADDV renamable $ax, renamable $cx, killed renamable $r8, 1, renamable $rsi, 0, $noreg
+ ; CHECK: renamable $tmm0 = PTDPBSSDV renamable $ax, renamable $cx, killed renamable $di, renamable $tmm0, killed renamable $tmm1, killed renamable $tmm2
----------------
The shapes for tmm0 is ax and cx, but the stored shape in stack is $sil and 8?
================
Comment at: llvm/test/CodeGen/X86/AMX/amx-fast-tile-config.mir:441
+ renamable $di = MOV16ri 8
+ renamable $tmm1 = PTILELOADDV renamable $ax, renamable $di, killed renamable $r10, 1, renamable $rsi, 0, $noreg
+ renamable $tmm2 = PTILELOADDV renamable $di, renamable $cx, killed renamable $r9, 1, renamable $rsi, 0, $noreg
----------------
I don't find where we shore this shape.
================
Comment at: llvm/utils/gn/secondary/llvm/lib/Target/X86/BUILD.gn:113
"X86LowerAMXType.cpp",
+ "X86PreAMXConfig.cpp",
"X86LowerTileCopy.cpp",
----------------
Why don't add X86FastTileConfig?
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D100026/new/
https://reviews.llvm.org/D100026
More information about the llvm-commits
mailing list