<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:DengXian;
panose-1:2 1 6 0 3 1 1 1 1 1;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:"\@DengXian";
panose-1:2 1 6 0 3 1 1 1 1 1;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
span.EmailStyle18
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="blue" vlink="purple" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal">Hi Florian,<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Yes, we use `bitcast` in the frontend to convert between regular vectors and AMX values.<o:p></o:p></p>
<p class="MsoNormal">Approach 1 looks elegant to me. Thank you for the good idea. We will prototype approach 1. Hopefully it can solve all the issues in the middle-end.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Thanks<o:p></o:p></p>
<p class="MsoNormal">Yuanke<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b>From:</b> Florian Hahn &lt;florian_hahn@apple.com&gt; <br>
<b>Sent:</b> Monday, March 22, 2021 11:04 PM<br>
<b>To:</b> Luo, Yuanke &lt;yuanke.luo@intel.com&gt;; llvm-dev &lt;llvm-dev@lists.llvm.org&gt;<br>
<b>Cc:</b> Zhang, Xiang1 &lt;xiang1.zhang@intel.com&gt;; James Y Knight &lt;jyknight@google.com&gt;<br>
<b>Subject:</b> Re: [llvm-dev] Does middle-end pass need to consider some special type when doing optimization? Or letting back-end to revert the optimization accordingly?<o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal"><br>
<br>
<o:p></o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<p class="MsoNormal">On Mar 22, 2021, at 14:02, Luo, Yuanke &lt;<a href="mailto:yuanke.luo@intel.com">yuanke.luo@intel.com</a>&gt; wrote:<o:p></o:p></p>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div>
<p class="MsoNormal">Yes, the bitcasts are introduced by the frontend when it calls the AMX intrinsics. We use a vector to represent a 2D AMX tile in C. On the other hand, we don’t want to mix AMX tiles with other vector operations, so x86_amx was introduced to isolate the AMX intrinsics
from normal vector operations. The bitcast marks the point where a normal vector is passed to an AMX intrinsic. In the example below, we need to transform the bitcast into a vector store and an AMX load intrinsic. We did not expect x86_amx* at the beginning, but
the middle-end generates the x86_amx pointer during the InstCombine pass.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">define dso_local void @test_src_add(&lt;256 x i32&gt; %x, &lt;256 x i32&gt; %y, i16 %r, i16 %c, i8* %buf, i64 %s) {<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">; CHECK-LABEL: @test_src_add(<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">; CHECK-NEXT: entry:<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">; CHECK-NEXT: [[TMP0:%.*]] = alloca &lt;256 x i32&gt;, align 64<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">; CHECK-NEXT: [[ADD:%.*]] = add &lt;256 x i32&gt; [[Y:%.*]], [[X:%.*]]<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">; CHECK-NEXT: [[TMP1:%.*]] = bitcast &lt;256 x i32&gt;* [[TMP0]] to i8*<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">; CHECK-NEXT: store &lt;256 x i32&gt; [[ADD]], &lt;256 x i32&gt;* [[TMP0]], align 1024<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">; CHECK-NEXT: [[TMP2:%.*]] = call x86_amx @llvm.x86.tileloadd64.internal(i16 [[R:%.*]], i16 [[C:%.*]], i8* [[TMP1]], i64 64)<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">; CHECK-NEXT: call void @llvm.x86.tilestored64.internal(i16 [[R]], i16 [[C]], i8* [[BUF:%.*]], i64 [[S:%.*]], x86_amx [[TMP2]])<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">; CHECK-NEXT: ret void<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">;<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">entry:<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">&nbsp;&nbsp;%add = add &lt;256 x i32&gt; %y, %x<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">&nbsp;&nbsp;%t = bitcast &lt;256 x i32&gt; %add to x86_amx<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"> call void @llvm.x86.tilestored64.internal(i16 %r, i16 %c, i8* %buf, i64 %s, x86_amx %t)<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"> ret void<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">}<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</div>
</blockquote>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">OK, I think I understand the issue better now. IIUC, you use `bitcast` in the frontend to convert between regular vectors and AMX values?<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">This doesn’t really match the way `bitcast` is defined (as discussed earlier), and this mismatch seems to be the source of the issues. I don’t think you should use `bitcast`s that way; instead, adjust the frontend to emit different code
for the conversion between vector and AMX values (e.g. use an intrinsic to convert between vector and AMX values; the intrinsic can be directly lowered to the conversion code).<o:p></o:p></p>
</div>
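<div>
<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
</div>
<div>
<p class="MsoNormal">As a rough sketch of what I mean, the frontend could emit something like the IR below for the test_src_add example quoted above. The intrinsic name @llvm.x86.cast.vector.to.tile.v256i32 is only a placeholder I am assuming for illustration; no such intrinsic has been agreed on in this thread, and its exact name and signature would be up to the prototype.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
</div>
<div>
<p class="MsoNormal">; Hypothetical conversion intrinsic (name and signature assumed for illustration).<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">declare x86_amx @llvm.x86.cast.vector.to.tile.v256i32(&lt;256 x i32&gt;)<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">declare void @llvm.x86.tilestored64.internal(i16, i16, i8*, i64, x86_amx)<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
</div>
<div>
<p class="MsoNormal">define dso_local void @test_src_add(&lt;256 x i32&gt; %x, &lt;256 x i32&gt; %y, i16 %r, i16 %c, i8* %buf, i64 %s) {<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">entry:<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">&nbsp;&nbsp;%add = add &lt;256 x i32&gt; %y, %x<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">&nbsp;&nbsp;; Explicit conversion call instead of: bitcast &lt;256 x i32&gt; %add to x86_amx<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">&nbsp;&nbsp;%t = call x86_amx @llvm.x86.cast.vector.to.tile.v256i32(&lt;256 x i32&gt; %add)<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">&nbsp;&nbsp;call void @llvm.x86.tilestored64.internal(i16 %r, i16 %c, i8* %buf, i64 %s, x86_amx %t)<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">&nbsp;&nbsp;ret void<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">}<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
</div>
<div>
<p class="MsoNormal">Because the conversion is then an opaque call rather than a `bitcast`, the generic `bitcast` folds in instcombine would presumably not apply to it, and the AMX lowering could expand the call into the vector store plus tile load shown in your CHECK lines.<o:p></o:p></p>
</div>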
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">I think there are at least two ways forward:<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">1. Avoid using bitcasts for the conversion in the frontend.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">2. Try to define the semantics of bitcast/load for AMX types such that the transformations you want to exclude in instcombine are illegal.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">If you decide to go with 2, you will probably have to make a convincing argument for why this is the right thing to do and why the alternatives do not work, because it means that certain general transformations that are legal at the moment
become illegal for certain types (as illustrated by the instcombine patches you mentioned).<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Cheers.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">Florian<o:p></o:p></p>
</div>
</div>
</body>
</html>