[PATCH] D93594: [X86] Pass to transform amx intrinsics to scalar operation.

Pengfei Wang via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Feb 9 01:45:18 PST 2021


pengfei added inline comments.


================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:433
+bool X86LowerAMXIntrinsics::visit() {
+  bool C;
+  SmallVector<Instruction *, 8> TileDPBSSDs;
----------------
`bool C = false`


================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:439
+
+  for (BasicBlock *BB : post_order(&Func)) {
+    for (BasicBlock::reverse_iterator II = BB->rbegin(), IE = BB->rend();
----------------
We can use a forward order to iterate it.
Besides, we cannot assume there will always be a bitcast after, e.g., `x86_tileloadd64_internal`, so we need to insert a bitcast as required.


================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:470
+
+  for (auto *Inst : TileLoads) {
+    C |= lowerTileLoad(Inst);
----------------
Remove the `{}` for a single-line loop body.


================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:506
+    X86LowerAMXIntrinsics LAT(F, &DT, &LI);
+    bool C = LAT.visit();
+    return C;
----------------
You can just return it directly: `return LAT.visit();`.


================
Comment at: llvm/test/CodeGen/X86/AMX/amx-low-intrinsics.ll:60
+entry:
+  %amx = call x86_amx @llvm.x86.tileloadd64.internal(i16 %row, i16 %col, i8* %ptr, i64 %stride)
+  %vec = bitcast x86_amx %amx to <256 x i32>
----------------
Maybe we can use a zero-mask load in a future optimization.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93594/new/

https://reviews.llvm.org/D93594


