[PATCH] D91927: [X86] Add x86_amx type for intel AMX.

Wed Dec 23 19:57:05 PST 2020

pengfei added a comment.

In D91927#2470818 <https://reviews.llvm.org/D91927#2470818>, @LuoYuanke wrote:

> In D91927#2469977 <https://reviews.llvm.org/D91927#2469977>, @pengfei wrote:
>
>>> In my test case, it is transformed after Combine redundant instructions.
>>
>> Can we disable it for AMX type? The pointer to AMX type is meaningless and may result in bad perfomance.
>
> Ok, I'll disable the transform for AMX type.

Good job.

================
Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:123

-  LoadMap[Inst] = std::make_pair(Lo, Hi);
+  auto *Tile = Bitcast->getOperand(0);
+  auto *II = cast<IntrinsicInst>(Tile);
----------------
Value

================
Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:196
+    // %2 = load <256 x i32>, <256 x i32>* %addr, align 1024
+    auto *II = dyn_cast<IntrinsicInst>(Src);
+    if (!II)
----------------
How about the `Tile` comes from tdpbssd?

================
Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:47
+  Type *V256I32Ty = VectorType::get(Builder.getInt32Ty(), 256, false);
+  auto AllocaAlignment = DL.getPrefTypeAlign(V256I32Ty);
+  unsigned AllocaAS = DL.getAllocaAddrSpace();
----------------
Currently, we don't have HW type for v256i32. I think 64 bytes(512bits) should be enough here.

================
Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:124
+  auto *II = cast<IntrinsicInst>(Tile);
+  // Tile is output from AMX intrinsic. The first operand of the
+  // intrinsic is row, the second operand of the intrinsic is column.
----------------
How about the `Tile` comes from tdpbssd?

================
Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:79
+    // -->
+    // %addr = alloca <256 x i32>, align 1024
+    // store <256 x i32> %src, <256 x i32>* %addr, align 1024
----------------
LuoYuanke wrote:
> pengfei wrote:
> > Why the alignment not be 64?
> 1024 is conservatives, because vector require the alignment to be the vector size. Here generate vector <256 x i32> load/store.
We don't need to align to 1024. 64 should be enough. The same for below comments.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91927/new/

https://reviews.llvm.org/D91927