[PATCH] D107544: [X86] [AMX] Replace bitcast with specific AMX intrinsics with X86 specific cast.

Bing Yu via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Aug 17 00:07:57 PDT 2021


yubing marked 6 inline comments as done.
yubing added inline comments.


================
Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:844
+    for (User *V : make_early_inc_range(OldPN->users())) {
+      Instruction *ACI = dyn_cast<Instruction>(V);
+      if (ACI && isAMXCast(ACI)) {
----------------
LuoYuanke wrote:
> What does ACI stand for? AMX cast intrinsic?
Yes, AMX cast intrinsic.


================
Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:920
+    for (Instruction &I : BB) {
+      if (isAMXCast(&I)) {
+        if (PHINode *PN = dyn_cast<PHINode>(I.getOperand(0)))
----------------
LuoYuanke wrote:
> We can erase dead cast code from Vec2TileInsts and Tile2VecInsts and get the AMX cast instructions from there, so that we can avoid iterating over the basic block again.
We will refactor it in the next patch.


================
Comment at: llvm/test/CodeGen/X86/AMX/lat-combine-amx-bitcast.ll:106
+for.body.i.lr.ph.i:                               ; preds = %wrapper_entry
+  %1 = call x86_amx @llvm.x86.cast.vector.to.tile.v110i32(<110 x i32> undef)
+  %2 = call x86_amx @llvm.x86.cast.vector.to.tile.v616i8(<616 x i8> undef)
----------------
LuoYuanke wrote:
> We can optimize an undef or zero vector with tilezero. We may do it in another patch.
Sure.
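
For reference, the optimization the reviewer suggests would look roughly like the following IR rewrite (a sketch; the `%row`/`%col` shape operands are hypothetical stand-ins for the tile's actual shape values):

```llvm
; Before: casting an undef (or zeroinitializer) vector into a tile
%t = call x86_amx @llvm.x86.cast.vector.to.tile.v110i32(<110 x i32> undef)
; After: materialize the tile directly with tilezero instead
%t = call x86_amx @llvm.x86.tilezero.internal(i16 %row, i16 %col)
```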


================
Comment at: llvm/test/CodeGen/X86/AMX/lat-transform-amx-bitcast.ll:291
+;
+  %t0 = load <256 x i32>, <256 x i32>* %pa, align 64
+  %t1 = call x86_amx @llvm.x86.cast.vector.to.tile.v256i32(<256 x i32> %t0)
----------------
LuoYuanke wrote:
> We can combine load and cast to @llvm.x86.tileloadd64.internal.
Sure, we will do it in the next patch.
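
The load-plus-cast combine being deferred here would, sketched in IR, turn the pattern above into a single tile load (a sketch; the `%row`/`%col`/`%stride` operands are hypothetical stand-ins for the tile's shape and the load stride):

```llvm
; Before: plain vector load followed by a cast to x86_amx
%t0 = load <256 x i32>, <256 x i32>* %pa, align 64
%t1 = call x86_amx @llvm.x86.cast.vector.to.tile.v256i32(<256 x i32> %t0)
; After: fold both into one AMX tile load
%p  = bitcast <256 x i32>* %pa to i8*
%t1 = call x86_amx @llvm.x86.tileloadd64.internal(i16 %row, i16 %col, i8* %p, i64 %stride)
```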


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107544/new/

https://reviews.llvm.org/D107544


