[PATCH] D93594: [X86] Pass to transform amx intrinsics to scalar operation.
Pengfei Wang via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Tue Feb 9 01:45:18 PST 2021
pengfei added inline comments.
================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:433
+bool X86LowerAMXIntrinsics::visit() {
+ bool C;
+ SmallVector<Instruction *, 8> TileDPBSSDs;
----------------
Initialize it: `bool C = false;`. Otherwise the later `C |= ...` reads an indeterminate value.
================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:439
+
+ for (BasicBlock *BB : post_order(&Func)) {
+ for (BasicBlock::reverse_iterator II = BB->rbegin(), IE = BB->rend();
----------------
We can iterate it in forward order instead.
Besides, we cannot assume there is always a bitcast after e.g. x86_tileloadd64_internal, so we need to insert a bitcast where one is missing.
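To illustrate the "insert a bitcast where one is missing" idea, here is a minimal standalone sketch. It is an analogue only: `Instr` and the string opcodes are hypothetical stand-ins, not LLVM API; the real pass would walk `Instruction`s in each `BasicBlock` in forward order.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an instruction stream.
using Instr = std::string;

// Insert a "bitcast" right after any AMX producer that is not already
// followed by one, so later lowering can always rely on its presence.
void insertMissingBitcasts(std::vector<Instr> &Insts) {
  for (std::size_t I = 0; I < Insts.size(); ++I) {
    if (Insts[I] != "x86_tileloadd64_internal")
      continue;
    if (I + 1 < Insts.size() && Insts[I + 1] == "bitcast")
      continue; // the required bitcast already exists
    Insts.insert(Insts.begin() + I + 1, "bitcast");
  }
}
```

Forward order makes this natural: the "is the next instruction already a bitcast?" check is a simple look-ahead, which a reverse walk would have to express awkwardly.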
================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:470
+
+ for (auto *Inst : TileLoads) {
+ C |= lowerTileLoad(Inst);
----------------
Per the LLVM coding standards, remove the `{}` around a single-statement loop body.
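For illustration, the brace-free form the LLVM coding standards prefer for single-statement bodies (`lowerTileLoad` here is a hypothetical stand-in, not the pass's actual member function):

```cpp
#include <vector>

// Hypothetical stand-in for the pass's per-instruction lowering;
// returns true if it changed anything.
static bool lowerTileLoad(int /*Inst*/) { return true; }

static bool lowerAll(const std::vector<int> &TileLoads) {
  bool C = false;
  // Single-statement body: LLVM style omits the braces.
  for (int Inst : TileLoads)
    C |= lowerTileLoad(Inst);
  return C;
}
```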
================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:506
+ X86LowerAMXIntrinsics LAT(F, &DT, &LI);
+ bool C = LAT.visit();
+ return C;
----------------
You can just write `return LAT.visit();`.
================
Comment at: llvm/test/CodeGen/X86/AMX/amx-low-intrinsics.ll:60
+entry:
+ %amx = call x86_amx @llvm.x86.tileloadd64.internal(i16 %row, i16 %col, i8* %ptr, i64 %stride)
+ %vec = bitcast x86_amx %amx to <256 x i32>
----------------
Maybe we can use a zero-mask (masked) load here in a future optimization, instead of an unconditional full-tile load.
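For reference, a hedged sketch of what a masked load with a zero passthru might look like in IR. The exact mangled intrinsic name, alignment argument, and pointer typing depend on the LLVM version; the vector shape and `%mask` value are illustrative, not taken from the patch:

```llvm
; Load only the lanes selected by %mask; unselected lanes take the
; zeroinitializer passthru, giving a "zero mask load".
%vec = call <256 x i32> @llvm.masked.load.v256i32.p0v256i32(
          <256 x i32>* %ptr, i32 64, <256 x i1> %mask,
          <256 x i32> zeroinitializer)
```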
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D93594/new/
https://reviews.llvm.org/D93594