[llvm-dev] Does middle-end pass need to consider some special type when doing optimization? Or letting back-end to revert the optimization accordingly?
Luo, Yuanke via llvm-dev
llvm-dev at lists.llvm.org
Mon Mar 22 07:02:50 PDT 2021
Yes, the bitcasts are introduced by the frontend when it calls the AMX intrinsics. We use a vector type to represent a 2D AMX tile in the C language; on the other hand, we don’t want AMX tiles to mix with ordinary vector operations, so x86_amx was introduced to isolate the AMX intrinsics from normal vector code. The bitcast marks the point where a normal vector is passed to an AMX intrinsic. In the example below, we need to transform the bitcast into a vector store plus an AMX load intrinsic. The x86_amx* pointer type is unexpected to begin with, but the InstCombine pass in the middle-end generates the x86_amx pointer.
define dso_local void @test_src_add(<256 x i32> %x, <256 x i32> %y, i16 %r, i16 %c, i8* %buf, i64 %s) {
; CHECK-LABEL: @test_src_add(
; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = alloca <256 x i32>, align 64
; CHECK-NEXT: [[ADD:%.*]] = add <256 x i32> [[Y:%.*]], [[X:%.*]]
; CHECK-NEXT: [[TMP1:%.*]] = bitcast <256 x i32>* [[TMP0]] to i8*
; CHECK-NEXT: store <256 x i32> [[ADD]], <256 x i32>* [[TMP0]], align 1024
; CHECK-NEXT: [[TMP2:%.*]] = call x86_amx @llvm.x86.tileloadd64.internal(i16 [[R:%.*]], i16 [[C:%.*]], i8* [[TMP1]], i64 64)
; CHECK-NEXT: call void @llvm.x86.tilestored64.internal(i16 [[R]], i16 [[C]], i8* [[BUF:%.*]], i64 [[S:%.*]], x86_amx [[TMP2]])
; CHECK-NEXT: ret void
;
entry:
%add = add <256 x i32> %y, %x
%t = bitcast <256 x i32> %add to x86_amx
call void @llvm.x86.tilestored64.internal(i16 %r, i16 %c, i8* %buf, i64 %s, x86_amx %t)
ret void
}
Thanks
Yuanke
From: Florian Hahn <florian_hahn at apple.com>
Sent: Monday, March 22, 2021 9:40 PM
To: Zhang, Xiang1 <xiang1.zhang at intel.com>; llvm-dev <llvm-dev at lists.llvm.org>
Cc: James Y Knight <jyknight at google.com>; Luo, Yuanke <yuanke.luo at intel.com>
Subject: Re: [llvm-dev] Does middle-end pass need to consider some special type when doing optimization? Or letting back-end to revert the optimization accordingly?
On Mar 19, 2021, at 02:04, Zhang, Xiang1 via llvm-dev <llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote:
Yes, that is equivalent, but in the frontend we don’t have an existing type to express the AMX type.
The “AMX type” in the C/C++ language is expressed with the following structure:
typedef int tile1024i __attribute__((__vector_size__(1024), __aligned__(64)));
typedef struct __tile1024i_str {
const unsigned short row;
const unsigned short col;
tile1024i tile;
} __tile1024i;
So we handle
%src = load <256 x i32>, <256 x i32>* %addr, align 64
%2 = bitcast <256 x i32> %src to x86_amx
rather than
%2 = load x86_amx, x86_amx* %addr, align 64
Are the bitcasts introduced by the frontend? If you need different semantics for loading from an `x86_amx` pointer, could the frontend generate a call to an intrinsic instead?
Cheers,
Florian