[llvm-dev] Does middle-end pass need to consider some special type when doing optimization? Or letting back-end to revert the optimization accordingly?
Luo, Yuanke via llvm-dev
llvm-dev at lists.llvm.org
Mon Mar 22 07:02:50 PDT 2021
Yes, the bitcasts are introduced by the frontend when it calls the AMX intrinsics. We use a vector type to represent a 2D AMX tile in the C language; on the other hand, we don’t want AMX tiles to mix with ordinary vector operations, so x86_amx was introduced to isolate the AMX intrinsics from normal vector code. The bitcast marks the point where a normal vector is passed to an AMX intrinsic. In the example below, we need to transform the bitcast into a vector store plus an AMX load intrinsic. The x86_amx* pointer type is unexpected to begin with, but the InstCombine pass in the middle-end generates the x86_amx pointer.
define dso_local void @test_src_add(<256 x i32> %x, <256 x i32> %y, i16 %r, i16 %c, i8* %buf, i64 %s) {
; CHECK-LABEL: @test_src_add(
; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = alloca <256 x i32>, align 64
; CHECK-NEXT: [[ADD:%.*]] = add <256 x i32> [[Y:%.*]], [[X:%.*]]
; CHECK-NEXT: [[TMP1:%.*]] = bitcast <256 x i32>* [[TMP0]] to i8*
; CHECK-NEXT: store <256 x i32> [[ADD]], <256 x i32>* [[TMP0]], align 1024
; CHECK-NEXT: [[TMP2:%.*]] = call x86_amx @llvm.x86.tileloadd64.internal(i16 [[R:%.*]], i16 [[C:%.*]], i8* [[TMP1]], i64 64)
; CHECK-NEXT: call void @llvm.x86.tilestored64.internal(i16 [[R]], i16 [[C]], i8* [[BUF:%.*]], i64 [[S:%.*]], x86_amx [[TMP2]])
; CHECK-NEXT: ret void
;
entry:
%add = add <256 x i32> %y, %x
%t = bitcast <256 x i32> %add to x86_amx
call void @llvm.x86.tilestored64.internal(i16 %r, i16 %c, i8* %buf, i64 %s, x86_amx %t)
ret void
}
Thanks
Yuanke
From: Florian Hahn <florian_hahn at apple.com>
Sent: Monday, March 22, 2021 9:40 PM
To: Zhang, Xiang1 <xiang1.zhang at intel.com>; llvm-dev <llvm-dev at lists.llvm.org>
Cc: James Y Knight <jyknight at google.com>; Luo, Yuanke <yuanke.luo at intel.com>
Subject: Re: [llvm-dev] Does middle-end pass need to consider some special type when doing optimization? Or letting back-end to revert the optimization accordingly?
On Mar 19, 2021, at 02:04, Zhang, Xiang1 via llvm-dev <llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote:
Yes, that is equivalent, but in the frontend we don’t have an existing type to express the AMX type.
The “AMX type” in the C/C++ language is expressed with the following structure:
typedef int tile1024i __attribute__((__vector_size__(1024), __aligned__(64)));
typedef struct __tile1024i_str {
const unsigned short row;
const unsigned short col;
tile1024i tile;
} __tile1024i;
So we handle
%src = load <256 x i32>, <256 x i32>* %addr, align 64
%2 = bitcast <256 x i32> %src to x86_amx
rather than
%2 = load x86_amx, x86_amx* %addr, align 64
Are the bitcasts introduced by the frontend? If you need different semantics for loading from an `x86_amx` pointer, could the frontend generate a call to an intrinsic instead?
Cheers,
Florian