<html>
<head>
<meta http-equiv="Content-Type" content="text/html;
charset=windows-1252">
</head>
<body>
<p>I find your answer unconvincing. I'm not going to debate it as I
don't wish to take the time to build the appropriate context, but
my initial response is skepticism.</p>
<p>Philip<br>
</p>
<div class="moz-cite-prefix">On 8/14/20 4:49 PM, Luo, Yuanke wrote:<br>
</div>
<blockquote type="cite"
cite="mid:SN6PR11MB3135825D36C644EA2DFD84609A400@SN6PR11MB3135.namprd11.prod.outlook.com">
<meta http-equiv="Content-Type" content="text/html;
charset=windows-1252">
<meta name="Generator" content="Microsoft Word 15 (filtered
medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:SimSun;
panose-1:2 1 6 0 3 1 1 1 1 1;}
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:"\@SimSun";
panose-1:2 1 6 0 3 1 1 1 1 1;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin-top:0cm;
margin-right:0cm;
margin-bottom:8.0pt;
margin-left:0cm;
line-height:105%;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:#0563C1;
text-decoration:underline;}
pre
{mso-style-priority:99;
mso-style-link:"HTML Preformatted Char";
margin:0cm;
margin-bottom:.0001pt;
font-size:10.0pt;
font-family:"Courier New";}
p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
{mso-style-priority:34;
margin-top:0cm;
margin-right:0cm;
margin-bottom:8.0pt;
margin-left:0cm;
text-indent:21.0pt;
line-height:105%;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
span.HTMLPreformattedChar
{mso-style-name:"HTML Preformatted Char";
mso-style-priority:99;
mso-style-link:"HTML Preformatted";
font-family:"Courier New";}
span.EmailStyle23
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
{page:WordSection1;}
/* List Definitions */
@list l0
{mso-list-id:1441491452;
mso-list-type:hybrid;
mso-list-template-ids:-344847632 67698703 67698713 67698715 67698703 67698713 67698715 67698703 67698713 67698715;}
@list l0:level1
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;}
@list l0:level2
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;}
@list l0:level3
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
@list l0:level4
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;}
@list l0:level5
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;}
@list l0:level6
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
@list l0:level7
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;}
@list l0:level8
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;}
@list l0:level9
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
ol
{margin-bottom:0cm;}
ul
{margin-bottom:0cm;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
<div class="WordSection1">
<p class="MsoNormal"><span lang="EN-US">[Yuanke] AMX registers are
special. They need to be configured before use, and the config
instruction is expensive. To avoid unnecessary tile
configuration, we collect as much tile shape information as
possible and combine it into one ldtilecfg instruction.
The ldtilecfg instruction should dominate any AMX
instruction that accesses a tile register. On the other hand,
the ldtilecfg should post-dominate the instructions that
define the tile shape. A tile register spill should avoid a
re-config caused by a different tile shape, so the spilled
register should be reloaded into a register that shares the
same tile shape. Since tile register allocation is special
and may need to allocate general virtual registers to configure
the tile registers, we can add a separate pass that does it before
the general register allocation pass. After register allocation,
the tile shape information is no longer needed, so we can
transform each pseudo AMX instruction into a real AMX instruction
by removing the row and column operands.<o:p></o:p></span></p>
<p><span lang="EN-US">[Philip]<o:p></o:p></span></p>
<p><span lang="EN-US">This seems complicated.<o:p></o:p></span></p>
<p><span lang="EN-US">Reading through the documentation, there
appears to be a single global tile config for all tile
registers at any time.<o:p></o:p></span></p>
<p><span lang="EN-US">Why not simply model this tile config as a
designated special register and the tile instructions as
having an implicit use of this register? That would seem to
ensure that the register allocator has all the constraints
needed. You'd need to teach it how to spill the special
registers with the appropriate instructions, but that seems
a lot more straightforward?<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:10.5pt;line-height:105%" lang="EN-US">[Yuanke]
In that case users would need to configure the tile registers
themselves. Spilling the configuration register is very expensive,
because reconfiguring clears all the tile data registers to zero. In our
proposal, the compiler is responsible for deducing the shape of each
virtual tile data register, allocating physical registers
for them, and then configuring those physical registers. We may
build the dependency as you proposed, and it can be used as a
machine IR check to ensure that tile data registers are configured
before use. <o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:10.5pt;line-height:105%" lang="EN-US"><o:p> </o:p></span></p>
<div>
<div style="border:none;border-top:solid #E1E1E1
1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"
style="margin-bottom:0cm;margin-bottom:.0001pt;line-height:normal">
<b><span lang="EN-US">From:</span></b><span lang="EN-US">
Philip Reames <a class="moz-txt-link-rfc2396E" href="mailto:listmail@philipreames.com"><listmail@philipreames.com></a>
<br>
<b>Sent:</b> Saturday, August 15, 2020 1:17 AM<br>
<b>To:</b> Luo, Yuanke <a class="moz-txt-link-rfc2396E" href="mailto:yuanke.luo@intel.com"><yuanke.luo@intel.com></a>;
<a class="moz-txt-link-abbreviated" href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>; <a class="moz-txt-link-abbreviated" href="mailto:florian_hahn@apple.com">florian_hahn@apple.com</a>; Kaylor,
Andrew <a class="moz-txt-link-rfc2396E" href="mailto:andrew.kaylor@intel.com"><andrew.kaylor@intel.com></a>; Topper, Craig
<a class="moz-txt-link-rfc2396E" href="mailto:craig.topper@intel.com"><craig.topper@intel.com></a>; Lu, Hongjiu
<a class="moz-txt-link-rfc2396E" href="mailto:hongjiu.lu@intel.com"><hongjiu.lu@intel.com></a><br>
<b>Subject:</b> Re: [llvm-dev] Intel AMX programming
model discussion.<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p><span lang="EN-US"><o:p> </o:p></span></p>
<div>
<p class="MsoNormal"><span lang="EN-US">On 8/14/20 6:27 AM,
Luo, Yuanke via llvm-dev wrote:<o:p></o:p></span></p>
</div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal"><span lang="EN-US">Hi,<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Intel Advanced Matrix
Extensions (Intel AMX) is a new programming paradigm
consisting of two components: a set of 2-dimensional
registers (tiles) representing sub-arrays from a larger
2-dimensional memory image, and accelerators able to
operate on tiles. The capabilities of an Intel AMX
implementation are enumerated by palettes. Two palettes are
supported: palette 0 represents the initialized state, and
palette 1 consists of 8 tile registers, each up to 1 KB in
size, which are controlled by a tile control register.<o:p></o:p></span></p>
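<p class="MsoNormal"><span lang="EN-US">To make the role of the tile
control register more concrete, here is a minimal C sketch of the
64-byte memory image that ldtilecfg loads, following the layout
described in the instruction manual. The struct and helper names
(tile_config, tile_config_init, tile_config_set, tile_bytes) are
hypothetical and exist only for this illustration; they are not
part of any Intel or LLVM header.<o:p></o:p></span></p>
<pre>
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { AMX_MAX_ROWS = 16, AMX_MAX_COLSB = 64, AMX_PALETTE1_TILES = 8 };

/* 64-byte memory image read by ldtilecfg. Field names are made up for
   this sketch; the byte offsets follow the instruction manual. */
typedef struct {
    uint8_t  palette_id;    /* byte 0: 0 = initialized state, 1 = 8 tiles */
    uint8_t  start_row;     /* byte 1: restart row for interrupted load/store */
    uint8_t  reserved[14];  /* bytes 2-15: must be zero */
    uint16_t colsb[16];     /* bytes 16-47: bytes per row of each tile */
    uint8_t  rows[16];      /* bytes 48-63: number of rows of each tile */
} tile_config;

_Static_assert(sizeof(tile_config) == 64, "ldtilecfg operand is 64 bytes");
_Static_assert(offsetof(tile_config, colsb) == 16, "colsb starts at byte 16");
_Static_assert(offsetof(tile_config, rows) == 48, "rows start at byte 48");

/* Start from palette 1 with every tile unconfigured (rows = colsb = 0). */
static void tile_config_init(tile_config *cfg)
{
    memset(cfg, 0, sizeof *cfg);
    cfg->palette_id = 1;
}

/* Record the shape of one tile. Returns 0 on success, -1 if the tile
   index or the shape does not fit palette 1. */
static int tile_config_set(tile_config *cfg, int tile, int rows, int colsb)
{
    if (tile < 0 || tile >= AMX_PALETTE1_TILES) return -1;
    if (rows < 1 || rows > AMX_MAX_ROWS) return -1;
    if (colsb < 1 || colsb > AMX_MAX_COLSB) return -1;
    cfg->rows[tile] = (uint8_t)rows;
    cfg->colsb[tile] = (uint16_t)colsb;
    return 0;
}

/* Number of data bytes covered by one configured tile. */
static int tile_bytes(const tile_config *cfg, int tile)
{
    assert(tile >= 0 && tile < AMX_PALETTE1_TILES);
    return cfg->rows[tile] * cfg->colsb[tile];
}
```
</pre>
<p class="MsoNormal"><span lang="EN-US">With palette 1, a tile
configured as 16 rows of 64 bytes covers the full 1 KB; smaller
shapes simply use fewer rows or fewer bytes per row.<o:p></o:p></span></p>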
<p class="MsoNormal"><span lang="EN-US">The instruction manual
is posted at <a
href="https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html"
moz-do-not-send="true">
https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html</a>.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">The AMX ABI proposal
is posted at <a
href="https://groups.google.com/g/x86-64-abi/c/NRejFm7pwb4"
moz-do-not-send="true">
https://groups.google.com/g/x86-64-abi/c/NRejFm7pwb4</a>.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">This email is to
discuss the programming model for AMX. Florian has
introduced the matrix type and intrinsics to the LLVM
community. We’d like to adopt some ideas from it.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Here is what we
propose for the AMX programming model.<o:p></o:p></span></p>
<p class="MsoListParagraph"
style="margin-left:36.0pt;text-indent:-18.0pt;mso-list:l0
level1 lfo2">
<!--[if !supportLists]--><span lang="EN-US"><span
style="mso-list:Ignore">1.<span style="font:7.0pt
"Times New Roman"">
</span></span></span><!--[endif]--><span lang="EN-US"> Data
type. <o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">We’d like to have a
fixed vector type for AMX. Since the shape of an AMX register
is configurable, the vector size is the maximum size
of an AMX register, that is, 1024 bytes.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">The C code may look
like this.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">typedef int _tile_data
__attribute__((__vector_size__(1024), __aligned__(64)));<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">_tile_data tile;<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">And the LLVM IR may
look like this.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">@tile = dso_local
local_unnamed_addr global <256 x i32>
zeroinitializer, align 64<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">For LLVM IR, it would be
nice to have a new type x86_amxtile that can be mapped to
AMX registers.<o:p></o:p></span></p>
<p class="MsoListParagraph"
style="margin-left:36.0pt;text-indent:-18.0pt;mso-list:l0
level1 lfo2">
<!--[if !supportLists]--><span lang="EN-US"><span
style="mso-list:Ignore">2.<span style="font:7.0pt
"Times New Roman"">
</span></span></span><!--[endif]--><span lang="EN-US">AMX
Intrinsics. <o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">The internal
intrinsics are 1:1 mapped to AMX instructions. The
parameters m, n and k identify the shape of the tile. The
shape can be variable, but it cannot exceed the size that
the AMX hardware supports. The compiler can deduce the shape
of the tile from the AMX intrinsics.<o:p></o:p></span></p>
<p class="MsoNormal" style="text-indent:5.5pt"><span
lang="EN-US">_tile_data _tile_loadd_internal(char m, short
n, const void *base, int stride);<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">_tile_data
_tile_dpbssd_internal(char m, short n, short k, _tile_data
dst, _tile_data src1, _tile_data src2);<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">_tile_data
_tile_dpbf16ps_internal(char m, short n, short k,
_tile_data dst, _tile_data src1, _tile_data src2);<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">void
_tile_stored_internal(char m, short n, void *base, int
stride, _tile_data tile);<o:p></o:p></span></p>
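<p class="MsoNormal"><span lang="EN-US">For reference, the
arithmetic behind _tile_dpbssd_internal can be modelled in plain
C. The sketch below is a scalar model of the TDPBSSD operation as
described in the instruction manual, under the assumption that n
and k are byte counts per row: dst is m rows of n/4 32-bit values,
src1 is m rows of k signed bytes, and src2 is k/4 rows of n signed
bytes. The names tile_buf and ref_tdpbssd are made up for this
example and are not part of the proposal.<o:p></o:p></span></p>
<pre>
```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { MAX_ROWS = 16, MAX_COLSB = 64 };

/* One tile register viewed as raw bytes: up to 16 rows of 64 bytes. */
typedef struct {
    uint8_t bytes[MAX_ROWS][MAX_COLSB];
} tile_buf;

/* Scalar model of dst += src1 * src2 on groups of four signed bytes.
   m: rows of dst and src1; n: bytes per row of dst and src2;
   k: bytes per row of src1 (src2 then has k/4 rows).
   Returns 0 on success, -1 if the shape exceeds the tile limits. */
static int ref_tdpbssd(int m, int n, int k, tile_buf *dst,
                       const tile_buf *src1, const tile_buf *src2)
{
    assert(dst && src1 && src2);
    if (m < 1 || m > MAX_ROWS) return -1;
    if (n < 4 || n > MAX_COLSB || n % 4 != 0) return -1;
    if (k < 4 || k > MAX_COLSB || k % 4 != 0) return -1;

    for (int r = 0; r < m; ++r) {
        for (int c = 0; c < n / 4; ++c) {
            int32_t acc;
            memcpy(&acc, &dst->bytes[r][4 * c], sizeof acc);
            for (int g = 0; g < k / 4; ++g) {
                for (int i = 0; i < 4; ++i) {
                    int8_t a = (int8_t)src1->bytes[r][4 * g + i];
                    int8_t b = (int8_t)src2->bytes[g][4 * c + i];
                    acc += (int32_t)a * (int32_t)b;
                }
            }
            memcpy(&dst->bytes[r][4 * c], &acc, sizeof acc);
        }
    }
    return 0;
}
```
</pre>
<p class="MsoNormal"><span lang="EN-US">The shape check at the top
mirrors the statement above that the shape can vary but cannot
exceed what the hardware supports.<o:p></o:p></span></p>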
<p class="MsoListParagraph"
style="margin-left:36.0pt;text-indent:-18.0pt;mso-list:l0
level1 lfo2">
<!--[if !supportLists]--><span lang="EN-US"><span
style="mso-list:Ignore">3.<span style="font:7.0pt
"Times New Roman"">
</span></span></span><!--[endif]--><span lang="EN-US">User
interfaces.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">The tile shape and
tile data are combined into a struct in C. The
shape of the tile is only allowed to be initialized once.
The user interface looks like this.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">#define __DEFAULT_FN_AMX \<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">__attribute__((__always_inline__, __nodebug__, __target__("amx-int8")))<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">/* Tile shape (set once at initialization) plus the tile data. */<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">typedef struct __tile_str {<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">const char row;<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">const short col;<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">_tile_data tile;<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">} __tile;<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">__DEFAULT_FN_AMX<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">void __tile_loadd(__tile *dst, const void *base, long stride) {<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">dst->tile = _tile_loadd_internal(dst->row, dst->col, base, stride);<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">}<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">__DEFAULT_FN_AMX<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">void __tile_dpbsud(__tile *dst, __tile src1, __tile src2) {<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">/* m = src1.row, n = src2.col, k = src1.col */<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">dst->tile = _tile_dpbssd_internal(src1.row, src2.col, src1.col, dst->tile, src1.tile, src2.tile);<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">}<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">__DEFAULT_FN_AMX<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">void __tile_stored(void *base, long stride, __tile src) {<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">_tile_stored_internal(src.row, src.col, base, stride, src.tile);<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">}<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"> <o:p></o:p></span></p>
<p class="MsoListParagraph"
style="margin-left:36.0pt;text-indent:-18.0pt;mso-list:l0
level1 lfo2">
<!--[if !supportLists]--><span lang="EN-US"><span
style="mso-list:Ignore">4.<span style="font:7.0pt
"Times New Roman"">
</span></span></span><!--[endif]--><span lang="EN-US">Example
code<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">The example shows how
to use the user interface in a function.
<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">void api(int cond, short row, short col) {<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">/* The shape of each tile is fixed here, at function entry. */<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">__tile a = {row, col};<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">__tile b = {row, col};<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">__tile c = {row, col};<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">if (cond) {<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">__tile_loadd(&a, buf, STRIDE);<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">__tile_loadd(&b, buf, STRIDE);<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">__tile_loadd(&c, buf, STRIDE);<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">} else {<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">__tile_loadd(&a, buf2, STRIDE);<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">__tile_loadd(&b, buf2, STRIDE);<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">__tile_loadd(&c, buf2, STRIDE);<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">}<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="IT">__tile_dpbsud(&c, a, b);</span><span lang="EN-US"><o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">__tile_stored(buf, STRIDE, c);<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">}<o:p></o:p></span></p>
<p class="MsoListParagraph"
style="margin-left:36.0pt;text-indent:-18.0pt;mso-list:l0
level1 lfo2">
<!--[if !supportLists]--><span lang="EN-US"><span
style="mso-list:Ignore">5.<span style="font:7.0pt
"Times New Roman"">
</span></span></span><!--[endif]--><span lang="EN-US">LLVM
IR<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">The LLVM IR intrinsics
take the row and column information as input
parameters, so that the compiler can deduce the shape of the tile
data. The remaining parameters are what the AMX instructions
require. This is the LLVM IR corresponding to the example
code.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">define dso_local void @api(i32 %cond, i16 signext %row, i16 signext %col) local_unnamed_addr #2 {<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">entry:<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">%tobool = icmp eq i32 %cond, 0<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">%sext = shl i16 %col, 8<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">%conv.i31 = ashr exact i16 %sext, 8<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">br i1 %tobool, label %if.else, label %if.then<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">if.then: ; preds = %entry<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">%0 = tail call <256 x i32> @llvm.x86.tileloadd64(i16 %row, i16 %conv.i31, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf, i64 0, i64 0), i64 32) #3<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">%1 = tail call <256 x i32> @llvm.x86.tileloadd64(i16 %row, i16 %conv.i31, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf, i64 0, i64 0), i64 32) #3<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">%2 = tail call <256 x i32> @llvm.x86.tileloadd64(i16 %row, i16 %conv.i31, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf, i64 0, i64 0), i64 32) #3<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">br label %if.end<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">if.else: ; preds = %entry<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">%3 = tail call <256 x i32> @llvm.x86.tileloadd64(i16 %row, i16 %conv.i31, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf2, i64 0, i64 0), i64 32) #3<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">%4 = tail call <256 x i32> @llvm.x86.tileloadd64(i16 %row, i16 %conv.i31, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf2, i64 0, i64 0), i64 32) #3<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">%5 = tail call <256 x i32> @llvm.x86.tileloadd64(i16 %row, i16 %conv.i31, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf2, i64 0, i64 0), i64 32) #3<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">br label %if.end<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">if.end: ; preds = %if.else, %if.then<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">%a.sroa.1186.0 = phi <256 x i32> [ %3, %if.else ], [ %0, %if.then ]<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">%b.sroa.1068.0 = phi <256 x i32> [ %4, %if.else ], [ %1, %if.then ]<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">%c.sroa.1149.0 = phi <256 x i32> [ %5, %if.else ], [ %2, %if.then ]<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">%6 = tail call <256 x i32> @llvm.x86.tdpbssd(i16 %row, i16 %conv.i31, i16 %conv.i31, <256 x i32> %c.sroa.1149.0, <256 x i32> %a.sroa.1186.0, <256 x i32> %b.sroa.1068.0) #3<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">tail call void @llvm.x86.tilestored64(i16 %row, i16 %conv.i31, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf, i64 0, i64 0), i64 32, <256 x i32> %6) #3<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">ret void<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">}<o:p></o:p></span></p>
<p class="MsoListParagraph"
style="margin-left:36.0pt;text-indent:-18.0pt;mso-list:l0
level1 lfo2">
<!--[if !supportLists]--><span lang="EN-US"><span
style="mso-list:Ignore">6.<span style="font:7.0pt
"Times New Roman"">
</span></span></span><!--[endif]--><span lang="EN-US">Shape
propagation<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">In an -O0 build,
the front-end generates some general loads and stores for the tile
vector. We need to start from the AMX intrinsics and
propagate the shape information to the virtual tile
registers. If an AMX intrinsic uses the result of a load
instruction, the shape is propagated to the load and the
load is transformed into a tile load intrinsic. If a store
instruction uses the result of an AMX intrinsic, the shape is
propagated to the store instruction and the store is
transformed into a tile store intrinsic.<o:p></o:p></span></p>
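<p class="MsoNormal"><span lang="EN-US">The propagation rule can
be sketched on a toy instruction list in C. Everything below (the
op_kind enum, the inst struct, propagate_shapes) is a hypothetical
model for illustration and not LLVM code. It also makes a
simplifying assumption: for an AMX operation with parameters m, n
and k, the operands are treated as dst m x n, src1 m x k and src2
k x n, ignoring how the hardware packs src2.<o:p></o:p></span></p>
<pre>
```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    OP_VEC_LOAD,    /* general vector load emitted by the front-end */
    OP_VEC_STORE,   /* general vector store emitted by the front-end */
    OP_TILE_LOAD,   /* load rewritten into a tile load */
    OP_TILE_STORE,  /* store rewritten into a tile store */
    OP_AMX_DP       /* AMX dot-product intrinsic carrying m, n, k */
} op_kind;

typedef struct {
    op_kind kind;
    int src[3];    /* indices of the defining instructions, -1 if unused */
    int m, n, k;   /* shape parameters, only meaningful on OP_AMX_DP */
    int row, col;  /* propagated shape, 0 means "not known yet" */
} inst;

static void set_shape(inst *i, op_kind kind, int row, int col)
{
    i->kind = kind;
    i->row = row;
    i->col = col;
}

/* Root at the AMX intrinsics: push shapes back to the loads that feed
   them, then forward to the stores that consume their results. */
static void propagate_shapes(inst *code, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        inst *dp = &code[i];
        if (dp->kind != OP_AMX_DP) continue;
        /* Operand order is dst, src1, src2 (see the assumption above). */
        const int rows[3] = { dp->m, dp->m, dp->k };
        const int cols[3] = { dp->n, dp->k, dp->n };
        dp->row = dp->m;
        dp->col = dp->n;
        for (int j = 0; j < 3; ++j) {
            int def = dp->src[j];
            if (def < 0) continue;
            assert(def < (int)count);
            if (code[def].kind == OP_VEC_LOAD)
                set_shape(&code[def], OP_TILE_LOAD, rows[j], cols[j]);
        }
    }
    for (size_t i = 0; i < count; ++i) {
        inst *st = &code[i];
        if (st->kind != OP_VEC_STORE || st->src[0] < 0) continue;
        const inst *def = &code[st->src[0]];
        if (def->kind == OP_AMX_DP)
            set_shape(st, OP_TILE_STORE, def->row, def->col);
    }
}
```
</pre>
<p class="MsoNormal"><span lang="EN-US">Loads and stores that are
not connected to any AMX intrinsic keep their original kind, so
ordinary vector code is left untouched.<o:p></o:p></span></p>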
<p class="MsoListParagraph"
style="margin-left:36.0pt;text-indent:-18.0pt;mso-list:l0
level1 lfo2">
<!--[if !supportLists]--><span lang="EN-US"><span
style="mso-list:Ignore">7.<span style="font:7.0pt
"Times New Roman"">
</span></span></span><!--[endif]--><span lang="EN-US">Machine
IR<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Since the AMX
intrinsics take the row and column as input
parameters, we can create a pseudo instruction
for each of them. The AMX intrinsics are lowered to
pseudo AMX instructions, which carry the extra row and column
operands of the corresponding intrinsic. The real AMX
instructions don’t need the row and column operands. The
row and column information should be configured by
ldtilecfg before any AMX instruction executes.<o:p></o:p></span></p>
<p class="MsoListParagraph"
style="margin-left:36.0pt;text-indent:-18.0pt;mso-list:l0
level1 lfo2">
<!--[if !supportLists]--><span lang="EN-US"><span
style="mso-list:Ignore">8.<span style="font:7.0pt
"Times New Roman"">
</span></span></span><!--[endif]--><span lang="EN-US">Register
allocation<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">AMX registers are
special. They need to be configured before use, and the
config instruction is expensive. To avoid unnecessary tile
configuration, we collect as much tile shape information
as possible and combine it into one ldtilecfg
instruction. The ldtilecfg instruction should dominate any
AMX instruction that accesses a tile register. On the other
hand, the ldtilecfg should post-dominate the instructions
that define the tile shape. A tile register spill
should avoid a re-config caused by a different tile shape,
so the spilled register should be reloaded into a register
that shares the same tile shape. Since tile register
allocation is special and may need to allocate general virtual
registers to configure the tile registers, we can add a separate
pass that does it before the general register allocation pass.
After register allocation, the tile shape information is
no longer needed, so we can transform each pseudo AMX
instruction into a real AMX instruction by removing the row
and column operands.<o:p></o:p></span></p>
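<p class="MsoNormal"><span lang="EN-US">As a rough illustration of
the dominance requirement only, the C sketch below computes
dominator sets for a small control-flow graph and picks the
deepest block that dominates every block containing a tile
instruction, which is one candidate position for a single
ldtilecfg. The cfg struct and the function names are hypothetical,
every block is assumed to be reachable from block 0, and the
constraint relating ldtilecfg to the shape definitions is not
modelled here.<o:p></o:p></span></p>
<pre>
```c
#include <assert.h>
#include <stdint.h>

enum { MAX_BLOCKS = 32 };

typedef struct {
    int count;                   /* number of basic blocks, block 0 is entry */
    uint32_t preds[MAX_BLOCKS];  /* preds[b]: bitmask of predecessors of b */
} cfg;

static int popcount32(uint32_t x)
{
    int n = 0;
    for (; x; x &= x - 1) ++n;
    return n;
}

/* dom[b] becomes the bitmask of blocks that dominate b, b included. */
static void compute_dominators(const cfg *g, uint32_t dom[MAX_BLOCKS])
{
    assert(g->count >= 1 && g->count <= MAX_BLOCKS);
    uint32_t all = g->count == 32 ? 0xFFFFFFFFu : ((1u << g->count) - 1u);
    dom[0] = 1u;
    for (int b = 1; b < g->count; ++b) dom[b] = all;
    for (int changed = 1; changed; ) {
        changed = 0;
        for (int b = 1; b < g->count; ++b) {
            uint32_t meet = all;
            for (int p = 0; p < g->count; ++p)
                if (g->preds[b] & (1u << p)) meet &= dom[p];
            uint32_t next = meet | (1u << b);
            if (next != dom[b]) { dom[b] = next; changed = 1; }
        }
    }
}

/* users: bitmask of blocks that contain tile instructions.
   Returns the deepest block dominating all of them, or -1 if none. */
static int ldtilecfg_block(const cfg *g, uint32_t users)
{
    uint32_t dom[MAX_BLOCKS];
    uint32_t common = 0xFFFFFFFFu;
    int found = 0, best = -1;

    compute_dominators(g, dom);
    for (int b = 0; b < g->count; ++b)
        if (users & (1u << b)) { common &= dom[b]; found = 1; }
    if (!found) return -1;
    /* Common dominators form a chain; the deepest has the largest set. */
    for (int b = 0; b < g->count; ++b)
        if ((common & (1u << b)) &&
            (best < 0 || popcount32(dom[b]) > popcount32(dom[best])))
            best = b;
    return best;
}
```
</pre>
<p class="MsoNormal"><span lang="EN-US">For the diamond-shaped
example function (entry, if.then, if.else, if.end), tile
instructions appear in the last three blocks, so the only block
that dominates all of them is entry.<o:p></o:p></span></p>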
</blockquote>
<p><span lang="EN-US">This seems complicated.<o:p></o:p></span></p>
<p><span lang="EN-US">Reading through the documentation, there
appears to be a single global tile config for all tile
registers at any time.<o:p></o:p></span></p>
<p><span lang="EN-US">Why not simply model this tile config as a
designated special register and the tile instructions as
having an implicit use of this register? That would seem to
ensure that the register allocator has all the constraints
needed. You'd need to teach it how to spill the special
registers with the appropriate instructions, but that seems
a lot more straightforward?<o:p></o:p></span></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoListParagraph"
style="margin-left:36.0pt;text-indent:-18.0pt;mso-list:l0
level1 lfo2">
<!--[if !supportLists]--><span lang="EN-US"><span
style="mso-list:Ignore">9.<span style="font:7.0pt
"Times New Roman"">
</span></span></span><!--[endif]--><span lang="EN-US">Use
recommendation <o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Due to the shape
configuration issue, we recommend that users define the tile
shape at function entry and inline functions
as much as possible. The AMX instructions focus
on computation rather than storage, so global variables for
tile data are not recommended.<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:10.5pt;line-height:105%" lang="EN-US"> </span><span
lang="EN-US"><o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:10.5pt;line-height:105%" lang="EN-US">Thanks</span><span
lang="EN-US"><o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:10.5pt;line-height:105%" lang="EN-US">Yuanke</span><span
lang="EN-US"><o:p></o:p></span></p>
<p class="MsoNormal"
style="margin-bottom:0cm;margin-bottom:.0001pt;line-height:normal">
<span lang="EN-US"><br>
<br>
<o:p></o:p></span></p>
<pre><span lang="EN-US">_______________________________________________<o:p></o:p></span></pre>
<pre><span lang="EN-US">LLVM Developers mailing list<o:p></o:p></span></pre>
<pre><span lang="EN-US"><a href="mailto:llvm-dev@lists.llvm.org" moz-do-not-send="true">llvm-dev@lists.llvm.org</a><o:p></o:p></span></pre>
<pre><span lang="EN-US"><a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" moz-do-not-send="true">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><o:p></o:p></span></pre>
</blockquote>
</div>
</blockquote>
</body>
</html>