[PATCH] D93594: [X86] Pass to transform amx intrinsics to scalar operation.

Wed Mar 3 05:52:55 PST 2021

pengfei accepted this revision.
pengfei added a comment.
This revision is now accepted and ready to land.

LGTM with some nitpicks 😊

================
Comment at: llvm/include/llvm/CodeGen/Passes.h:496
+
+  /// The pass transform amx intrinsics to scalar operation if the function has
+  /// optnone attribute or it is O0.
----------------
transforms 

================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:9
+//
+/// \file Pass to transform amx intrinsics to scalar operation.
+/// This pass is only enabled with -O0. With -O0, the def of shape to amx
----------------
operations

================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:10
+/// \file Pass to transform amx intrinsics to scalar operation.
+/// This pass is only enabled with -O0. With -O0, the def of shape to amx
+/// intrinsics is near the amx intrinsics code. We are not able to find a
----------------
We always enable it. Also need to mention optnone.

================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:46
+static bool isV256I32Ty(Type *Ty) {
+  if (auto *FVT = dyn_cast<FixedVectorType>(Ty)) {
+    return FVT->getNumElements() == 256 &&
----------------
Curly brackets are not necessary here.

================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:82
+  DTU.applyUpdatesPermissive({
+      {DominatorTree::Delete, Preheader, Tmp},
+      {DominatorTree::Insert, Header, Body},
----------------
Do we need to remove the successor? Isn't it still being dominated?

================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:96
+
+template <bool isTileLoad>
+static Value *createTileLoadStoreLoops(BasicBlock *Start, BasicBlock *End,
----------------
IsTileLoad

================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:117
+  BasicBlock *ColBody =
+      createLoop(RowBody, RowLatch, Col, B.getInt16(ColStep),
+                 IntrinName + ".scalarize.cols", B, DTU, ColLoop, LI);
----------------
Use 1 directly?

================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:121
+  BasicBlock *ColLoopLatch = ColBody->getSingleSuccessor();
+  BasicBlock *ColumnLoopHeader = ColBody->getSinglePredecessor();
+  BasicBlock *RowLoopHeader = RowBody->getSinglePredecessor();
----------------
Better use the same naming conversion, i.e. `ColLoopHeader`

================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:125-126
+  Value *CurrentCol = &*ColumnLoopHeader->begin();
+  FixedVectorType *V256I32Ty = FixedVectorType::get(B.getInt32Ty(), 256);
+  Type *EltTy = V256I32Ty->getElementType();
+
----------------
Better to change the order, e.g.
```
Type *EltTy = B.getInt32Ty();
FixedVectorType *V256I32Ty = FixedVectorType::get(EltTy, 256);
```

================
Comment at: llvm/test/CodeGen/X86/AMX/amx-low-intrinsics-no-amx-bitcast.ll:1
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt -mtriple=x86_64 -lower-amx-intrinsics %s -S | FileCheck %s
----------------
I think we should move the files to llvm/test/Transforms/

================
Comment at: llvm/test/CodeGen/X86/AMX/amx-type.ll:2
 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
-; RUN: opt -lower-amx-type %s -S | FileCheck %s
+; RUN: opt --codegen-opt-level=2 -mtriple=x86_64 -lower-amx-type %s -S | FileCheck %s

----------------
Why adding this? Is it O2 by default?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93594/new/

https://reviews.llvm.org/D93594