[PATCH] D91927: [X86] Add x86_amx type for intel AMX.
Pengfei Wang via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Wed Dec 23 19:57:05 PST 2020
pengfei added a comment.
In D91927#2470818 <https://reviews.llvm.org/D91927#2470818>, @LuoYuanke wrote:
> In D91927#2469977 <https://reviews.llvm.org/D91927#2469977>, @pengfei wrote:
>
>>> In my test case, it is transformed after Combine redundant instructions.
>>
>> Can we disable it for AMX type? The pointer to AMX type is meaningless and may result in bad perfomance.
>
> Ok, I'll disable the transform for AMX type.
Good job.
================
Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:123
- LoadMap[Inst] = std::make_pair(Lo, Hi);
+ auto *Tile = Bitcast->getOperand(0);
+ auto *II = cast<IntrinsicInst>(Tile);
----------------
Value
================
Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:196
+ // %2 = load <256 x i32>, <256 x i32>* %addr, align 1024
+ auto *II = dyn_cast<IntrinsicInst>(Src);
+ if (!II)
----------------
How about the `Tile` comes from tdpbssd?
================
Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:47
+ Type *V256I32Ty = VectorType::get(Builder.getInt32Ty(), 256, false);
+ auto AllocaAlignment = DL.getPrefTypeAlign(V256I32Ty);
+ unsigned AllocaAS = DL.getAllocaAddrSpace();
----------------
Currently, we don't have HW type for v256i32. I think 64 bytes(512bits) should be enough here.
================
Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:124
+ auto *II = cast<IntrinsicInst>(Tile);
+ // Tile is output from AMX intrinsic. The first operand of the
+ // intrinsic is row, the second operand of the intrinsic is column.
----------------
How about the `Tile` comes from tdpbssd?
================
Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:79
+ // -->
+ // %addr = alloca <256 x i32>, align 1024
+ // store <256 x i32> %src, <256 x i32>* %addr, align 1024
----------------
LuoYuanke wrote:
> pengfei wrote:
> > Why the alignment not be 64?
> 1024 is conservatives, because vector require the alignment to be the vector size. Here generate vector <256 x i32> load/store.
We don't need to align to 1024. 64 should be enough. The same for below comments.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D91927/new/
https://reviews.llvm.org/D91927
More information about the cfe-commits
mailing list