[PATCH] D124378: [X86][AMX] combine tile cast and load/store instruction.

LuoYuanke via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Jun 15 02:32:38 PDT 2023


LuoYuanke added inline comments.


================
Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:930
+  // stride.
+  Value *Stride = Builder.getInt64(64);
+  Value *I8Ptr =
----------------
yubing wrote:
> LuoYuanke wrote:
> > yubing wrote:
> > > Why is the stride 64 here instead of Col?
> > Both 64 and Col should work as long as the load and store keep the same stride value, but 64 is a constant, so it is preferred.
> how about the following IR:
> %tile = call x86_amx @llvm.x86.tileloadd64.internal(i16 8, i16 32, i8* %src_ptr, i64 64)
> %vec = call <256 x i8> @llvm.x86.cast.tile.to.vector.v256i8(x86_amx %tile)
> store <256 x i8> %vec, <256 x i8>* %dst_ptr, align 256
> 
> if you combine into:
> %tile = call x86_amx @llvm.x86.tileloadd64.internal(i16 8, i16 32, i8* %src_ptr, i64 64)
> call void @llvm.x86.tilestored64.internal(i16 8, i16 32, i8* %dst_ptr, i64 64, x86_amx %tile)
> 
> it will definitely go out of bounds.
Why is there a <256 x i8>? Shouldn't the tile type be <256 x i32>, which is 1024 bytes?
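For illustration, the out-of-bounds concern in the quoted IR can be checked with simple offset arithmetic (a sketch only; the rows=8, cols=32, stride=64 values are taken from the tileloadd64 call above, and the 256-byte buffer size from the <256 x i8> store):

```python
# A tile store of shape (rows x cols) with a byte stride writes
# row r to bytes [r*stride, r*stride + cols) of the destination.
rows, cols, stride = 8, 32, 64

# One-past-the-last byte the tile store would touch.
last_byte = (rows - 1) * stride + cols   # 7*64 + 32 = 480

# The original vector store only covers the first 256 bytes.
vec_store_bytes = 256

print(last_byte)                          # 480
print(last_byte > vec_store_bytes)        # True: writes past the 256-byte buffer
```

With stride 64 the combined tilestored64 would touch bytes up to offset 480, while the replaced <256 x i8> store only covered 256 bytes, which is the out-of-bounds case yubing describes; using Col (32) as the stride would keep the writes within 256 bytes.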


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124378/new/

https://reviews.llvm.org/D124378



More information about the llvm-commits mailing list