[PATCH] D93594: [X86] Pass to transform amx intrinsics to scalar operation.

Sat Feb 20 00:57:24 PST 2021

LuoYuanke added inline comments.

================
Comment at: llvm/include/llvm/CodeGen/Passes.h:496
+
+  FunctionPass *createX86LowerAMXIntrinsicsPass();
 } // End llvm namespace
----------------
Add comments to describe what the pass does?

================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:1
+//===- llvm/CodeGen/TileShapeInfo.h - ---------------------------*- C++ -*-===//
+//
----------------
This seems wrong file name.

================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:11
+/// This pass is only enabled with -O0. With -O0, the def of shape to amx
+/// intrinsics is near the amx intrinsics code. We are not bale to find a
+/// point which post-dominate all the shape and dominate all amx intrinsics.
----------------
Type 'able'.

================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:159
+                                 Value *Ptr, Value *Stride, Value *Tile) {
+  Loop *RowLoop = LI.AllocateLoop();
+  Loop *ColLoop = LI.AllocateLoop();
----------------
Not sure if we can extract the common code of createTileLoadLoops and createTileStoreLoops, so that it can be used by both and some other functions.

================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:250
+  FixedVectorType *V256I32Ty = FixedVectorType::get(B.getInt32Ty(), 256);
+  // Type *EltTy = V256I32Ty->getElementType();
+  Value *VecC, *VecA, *VecB;
----------------
Delete the dead code.

================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:267
+  // %vec.c.phi.row = phi <256 x i32> [ %VecC, %continue ], [ %NewVecC,
+  // %tiledpbssd.scalarize.rows.latch ] %vec.d.phi.row = phi <256 x i32> [
+  // zeroinitializer, %continue ], [ %NewVecD, %tiledpbssd.scalarize.rows.latch
----------------
It should be in another line.

================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:280
+  // %tiledpbssd.scalarize.rows.body ], [ %NewVecC,
+  // %tiledpbssd.scalarize.cols.latch ] %vec.d.phi.col = phi <256 x i32> [
+  // %vec.d.phi.row, %tiledpbssd.scalarize.rows.body ], [ %NewVecD,
----------------
Better to be in a new line.

================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:292
+  // %tiledpbssd.scalarize.cols.body ], [ %NewVecC,
+  // %tiledpbssd.scalarize.inner.latch ] %vec.d.inner.phi = phi <256 x i32> [
+  // %vec.d.phi.col, %tiledpbssd.scalarize.cols.body ], [ %NewVecD,
----------------
Better to be in a new line.

================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:372
+  Instruction *InsertI = TileDPBSSD;
+  IRBuilder<> BuilderPrepare(TileDPBSSD);
+  BuilderPrepare.SetInsertPoint(TileDPBSSD);
----------------
The name seems not good. Is "PreBuilder" better? And why we need two builder in the function?

================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:377
+  // %k_dword = udiv i16 %k, 4
+  Value *NDWord = BuilderPrepare.CreateUDiv(N, BuilderPrepare.getInt16(4));
+  Value *KDWord = BuilderPrepare.CreateUDiv(K, BuilderPrepare.getInt16(4));
----------------
Maybe use right shift instruction which is more efficient. Don't the following pass can optimize the operation.

================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:411
+  Instruction *InsertI = TileLoad;
+  IRBuilder<> BuilderPrepare(TileLoad);
+  BuilderPrepare.SetInsertPoint(TileLoad);
----------------
Is "PreBuilder" better?

================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:415
+  Value *StrideDWord =
+      BuilderPrepare.CreateUDiv(Stride, BuilderPrepare.getInt64(4));
+  BasicBlock *Start = InsertI->getParent();
----------------
Shift?

================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:448
+  Instruction *InsertI = TileStore;
+  IRBuilder<> BuilderPrepare(TileStore);
+  BuilderPrepare.SetInsertPoint(TileStore);
----------------
PreBuilder?

================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:487
+  SmallVector<Instruction *, 8> WorkList;
+  for (BasicBlock *BB : depth_first(&Func)) {
+    for (BasicBlock::iterator II = BB->begin(), IE = BB->end(); II != IE;) {
----------------
Do we iterate the instructions in topology order or in post order?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93594/new/

https://reviews.llvm.org/D93594