[PATCH] D93594: [X86] Pass to transform amx intrinsics to scalar operation.

Mon Mar 1 23:21:40 PST 2021

yubing marked 15 inline comments as done.
yubing added inline comments.

================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:311
+  Value *ResElt = B.CreateAdd(EltC, SubVecR);
+  Value *NewVecC = B.CreateInsertElement(VecCPhi, ResElt, IdxC);
+  Value *NewVecD = B.CreateInsertElement(VecDPhi, ResElt, IdxC);
----------------
pengfei wrote:
> yubing wrote:
> > pengfei wrote:
> > > Is it necessary to insert the ResElt to VecC?
> > Yes, it is necessary since you should use updated eltC(aka, Cij) when you are doing matrix dotproduct:
> > Cij =Cij+Ai1.*B1j
> > Cij =Cij+Ai2.*B2j
> > ....
> > Cij =Cij+AiK.*BKj
> But you don't need to update both C and D. Something like the psudo code should enough:
> ```
> for (k : K)
>   Dij += Aik * Bkj;
> Dij += Cij
> ```
I change code into the following style, and it can also reduce inner loop's size:
```
for (k : K)
  Cij += Aik * Bkj;
Dij = Cij
```
Besides, I hoist the procedure of calculating (i,j)'s linear index above inner loops.

================
Comment at: llvm/test/CodeGen/X86/AMX/amx-low-intrinsics-no-bitcast.ll:13
+; CHECK-NEXT:    br label [[TILELOAD_SCALARIZE_ROWS_BODY:%.*]]
+; CHECK:       tileload.scalarize.rows.body:
+; CHECK-NEXT:    br label [[TILELOAD_SCALARIZE_COLS_HEADER:%.*]]
----------------
pengfei wrote:
> It seems the body block is not necessary
In fact, ISEL PASS can merge basicblocks together.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93594/new/

https://reviews.llvm.org/D93594