[PATCH] D93594: [X86] Pass to transform amx intrinsics to scalar operation.
LuoYuanke via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Mar 2 01:24:17 PST 2021
LuoYuanke added inline comments.
================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:311
+ Value *ResElt = B.CreateAdd(EltC, SubVecR);
+ Value *NewVecC = B.CreateInsertElement(VecCPhi, ResElt, IdxC);
+ Value *NewVecD = B.CreateInsertElement(VecDPhi, ResElt, IdxC);
----------------
yubing wrote:
> pengfei wrote:
> > yubing wrote:
> > > pengfei wrote:
> > > > Is it necessary to insert the ResElt to VecC?
> > > Yes, it is necessary since you should use updated eltC(aka, Cij) when you are doing matrix dotproduct:
> > > Cij =Cij+Ai1.*B1j
> > > Cij =Cij+Ai2.*B2j
> > > ....
> > > Cij =Cij+AiK.*BKj
> > But you don't need to update both C and D. Something like the psudo code should enough:
> > ```
> > for (k : K)
> > Dij += Aik * Bkj;
> > Dij += Cij
> > ```
> I change code into the following style, and it can also reduce inner loop's size:
> ```
> for (k : K)
> Cij += Aik * Bkj;
> Dij = Cij
> ```
> Besides, I hoist the procedure of calculating (i,j)'s linear index above inner loops.
It seems keeping vector C unchanged is simpler. We can eliminate the phi, extract and insert instruction for vector C.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D93594/new/
https://reviews.llvm.org/D93594
More information about the llvm-commits
mailing list