[PATCH] D93594: [X86] Pass to transform amx intrinsics to scalar operation.
LuoYuanke via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sat Feb 20 00:57:24 PST 2021
LuoYuanke added inline comments.
================
Comment at: llvm/include/llvm/CodeGen/Passes.h:496
+
+ FunctionPass *createX86LowerAMXIntrinsicsPass();
} // End llvm namespace
----------------
Add comments to describe what the pass does?
================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:1
+//===- llvm/CodeGen/TileShapeInfo.h - ---------------------------*- C++ -*-===//
+//
----------------
This seems wrong file name.
================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:11
+/// This pass is only enabled with -O0. With -O0, the def of shape to amx
+/// intrinsics is near the amx intrinsics code. We are not bale to find a
+/// point which post-dominate all the shape and dominate all amx intrinsics.
----------------
Type 'able'.
================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:159
+ Value *Ptr, Value *Stride, Value *Tile) {
+ Loop *RowLoop = LI.AllocateLoop();
+ Loop *ColLoop = LI.AllocateLoop();
----------------
Not sure if we can extract the common code of createTileLoadLoops and createTileStoreLoops, so that it can be used by both and some other functions.
================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:250
+ FixedVectorType *V256I32Ty = FixedVectorType::get(B.getInt32Ty(), 256);
+ // Type *EltTy = V256I32Ty->getElementType();
+ Value *VecC, *VecA, *VecB;
----------------
Delete the dead code.
================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:267
+ // %vec.c.phi.row = phi <256 x i32> [ %VecC, %continue ], [ %NewVecC,
+ // %tiledpbssd.scalarize.rows.latch ] %vec.d.phi.row = phi <256 x i32> [
+ // zeroinitializer, %continue ], [ %NewVecD, %tiledpbssd.scalarize.rows.latch
----------------
It should be in another line.
================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:280
+ // %tiledpbssd.scalarize.rows.body ], [ %NewVecC,
+ // %tiledpbssd.scalarize.cols.latch ] %vec.d.phi.col = phi <256 x i32> [
+ // %vec.d.phi.row, %tiledpbssd.scalarize.rows.body ], [ %NewVecD,
----------------
Better to be in a new line.
================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:292
+ // %tiledpbssd.scalarize.cols.body ], [ %NewVecC,
+ // %tiledpbssd.scalarize.inner.latch ] %vec.d.inner.phi = phi <256 x i32> [
+ // %vec.d.phi.col, %tiledpbssd.scalarize.cols.body ], [ %NewVecD,
----------------
Better to be in a new line.
================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:372
+ Instruction *InsertI = TileDPBSSD;
+ IRBuilder<> BuilderPrepare(TileDPBSSD);
+ BuilderPrepare.SetInsertPoint(TileDPBSSD);
----------------
The name seems not good. Is "PreBuilder" better? And why we need two builder in the function?
================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:377
+ // %k_dword = udiv i16 %k, 4
+ Value *NDWord = BuilderPrepare.CreateUDiv(N, BuilderPrepare.getInt16(4));
+ Value *KDWord = BuilderPrepare.CreateUDiv(K, BuilderPrepare.getInt16(4));
----------------
Maybe use right shift instruction which is more efficient. Don't the following pass can optimize the operation.
================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:411
+ Instruction *InsertI = TileLoad;
+ IRBuilder<> BuilderPrepare(TileLoad);
+ BuilderPrepare.SetInsertPoint(TileLoad);
----------------
Is "PreBuilder" better?
================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:415
+ Value *StrideDWord =
+ BuilderPrepare.CreateUDiv(Stride, BuilderPrepare.getInt64(4));
+ BasicBlock *Start = InsertI->getParent();
----------------
Shift?
================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:448
+ Instruction *InsertI = TileStore;
+ IRBuilder<> BuilderPrepare(TileStore);
+ BuilderPrepare.SetInsertPoint(TileStore);
----------------
PreBuilder?
================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:487
+ SmallVector<Instruction *, 8> WorkList;
+ for (BasicBlock *BB : depth_first(&Func)) {
+ for (BasicBlock::iterator II = BB->begin(), IE = BB->end(); II != IE;) {
----------------
Do we iterate the instructions in topology order or in post order?
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D93594/new/
https://reviews.llvm.org/D93594
More information about the llvm-commits
mailing list