<html>
<head>
<meta http-equiv="Content-Type" content="text/html;
charset=windows-1252">
</head>
<body>
<p><br>
</p>
<div class="moz-cite-prefix">On 8/14/20 6:27 AM, Luo, Yuanke via
llvm-dev wrote:<br>
</div>
<blockquote type="cite"
cite="mid:SN6PR11MB313542E13EAA851C2E6ED43D9A400@SN6PR11MB3135.namprd11.prod.outlook.com">
<div class="WordSection1">
<p class="MsoNormal"><span lang="EN-US">Hi,<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Intel Advanced Matrix
Extensions (Intel AMX) is a new programming paradigm
consisting of two components: a set of 2-dimensional
registers (tiles) representing sub-arrays from a larger
2-dimensional memory image, and accelerators able to operate
on tiles. The capabilities of an Intel AMX implementation are
enumerated by palettes. Two palettes are supported: palette
0 represents the initialized state, and palette 1 consists of
8 tile registers of up to 1 KB each, whose configuration is
controlled by a tile control register.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">The instruction manual
is posted at <a
href="https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html"
moz-do-not-send="true">
https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html</a>.<o:p></o:p></span></p>
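<p class="MsoNormal"><span lang="EN-US">As a rough illustration of
what the tile control register holds, the sketch below models the
64-byte memory image that ldtilecfg reads, following the layout
given in the instruction manual. The names tile_config,
tile_config_init, and tile_config_set are hypothetical and are not
part of the proposal; the limits are those of palette 1 (8 tiles,
each at most 16 rows of 64 bytes).</span></p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Palette 1 limits: 8 tiles, each at most 16 rows x 64 bytes = 1 KB. */
enum { AMX_MAX_TILES = 8, AMX_MAX_ROWS = 16, AMX_MAX_COLSB = 64 };

/* Hypothetical C view of the 64-byte image consumed by ldtilecfg. */
typedef struct {
    uint8_t  palette_id;    /* 0 = initialized state, 1 = palette 1 */
    uint8_t  start_row;     /* restart position for interrupted operations */
    uint8_t  reserved[14];  /* must be zero */
    uint16_t colsb[16];     /* bytes per row, one entry per tile */
    uint8_t  rows[16];      /* number of rows, one entry per tile */
} tile_config;

static_assert(sizeof(tile_config) == 64, "tile config image is 64 bytes");

/* Start from an all-zero image with palette 1 selected. */
static void tile_config_init(tile_config *cfg) {
    memset(cfg, 0, sizeof *cfg);
    cfg->palette_id = 1;
}

/* Record the shape of one tile; returns false if the shape is out of range. */
static bool tile_config_set(tile_config *cfg, unsigned tile,
                            unsigned rows, unsigned colsb) {
    if (tile >= AMX_MAX_TILES || rows == 0 || rows > AMX_MAX_ROWS ||
        colsb == 0 || colsb > AMX_MAX_COLSB)
        return false;
    cfg->rows[tile] = (uint8_t)rows;
    cfg->colsb[tile] = (uint16_t)colsb;
    return true;
}
```

<p class="MsoNormal"><span lang="EN-US">One image describes all
tile registers at once, which is why changing the shape of a single
tile means loading a whole new configuration.</span></p>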
<p class="MsoNormal"><span lang="EN-US">The AMX ABI proposal is
posted at <a
href="https://groups.google.com/g/x86-64-abi/c/NRejFm7pwb4"
moz-do-not-send="true">
https://groups.google.com/g/x86-64-abi/c/NRejFm7pwb4</a>.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">This email is to discuss
the programming model for AMX. Florian has introduced the
matrix type and intrinsics to the LLVM community, and we’d like to
adopt some ideas from that work.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Here is what we propose
for the AMX programming model.<o:p></o:p></span></p>
<p class="MsoListParagraph"
style="margin-left:36.0pt;text-indent:-18.0pt;mso-list:l0
level1 lfo2">
<!--[if !supportLists]--><span lang="EN-US"><span
style="mso-list:Ignore">1.<span style="font:7.0pt
"Times New Roman"">
</span></span></span><!--[endif]--><span lang="EN-US"> Data
type. <o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">We’d like to have a
fixed-size vector type for AMX. Since the shape of an AMX register
is configurable, the vector size is the maximum size of an AMX
register, that is, 1024 bytes.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">The C code may look like
this.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">typedef int _tile_data
__attribute__((__vector_size__(1024), __aligned__(64)));<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">_tile_data tile;<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">And the LLVM IR may look
like this.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">@tile = dso_local
local_unnamed_addr global <256 x i32> zeroinitializer,
align 64<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">For LLVM IR, it would be nice
to have a new type, x86_amxtile, that can be mapped to AMX
registers.<o:p></o:p></span></p>
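<p class="MsoNormal"><span lang="EN-US">The correspondence between
the C typedef and the &lt;256 x i32&gt; IR type is simple arithmetic:
1024 bytes of 4-byte int elements give 256 lanes. A small check,
assuming a GCC- or Clang-compatible compiler that supports the
vector_size extension and a 4-byte int; the name tile_vec is a
stand-in chosen for this sketch so that it does not clash with the
proposed _tile_data.</span></p>

```c
#include <assert.h>
#include <stddef.h>

/* Same attributes as the proposed _tile_data, under a stand-in name. */
typedef int tile_vec __attribute__((__vector_size__(1024), __aligned__(64)));

/* 1024 bytes of 4-byte int elements: 256 lanes, matching <256 x i32>. */
enum { TILE_VEC_LANES = sizeof(tile_vec) / sizeof(int) };

static_assert(sizeof(tile_vec) == 1024, "tile vector is 1 KB");
static_assert(TILE_VEC_LANES == 256, "256 lanes of i32");
```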
<p class="MsoListParagraph"
style="margin-left:36.0pt;text-indent:-18.0pt;mso-list:l0
level1 lfo2">
<!--[if !supportLists]--><span lang="EN-US"><span
style="mso-list:Ignore">2.<span style="font:7.0pt
"Times New Roman"">
</span></span></span><!--[endif]--><span lang="EN-US">AMX
Intrinsics. <o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">The internal intrinsics
map 1:1 to AMX instructions. The parameters m, n, and k
identify the shape of the tiles. The shape can be variable,
but it cannot exceed the size that the AMX hardware supports.
The compiler can deduce the shape of each tile from the AMX
intrinsics.<o:p></o:p></span></p>
<p class="MsoNormal" style="text-indent:5.5pt"><span
lang="EN-US">_tile_data _tile_loadd_internal(char m, short
n, const void *base, int stride);<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">_tile_data
_tile_dpbssd_internal(char m, short n, short k, _tile_data
dst, _tile_data src1, _tile_data src2);<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">_tile_data
_tile_dpbf16ps_internal(char m, short n, short k, _tile_data
dst, _tile_data src1, _tile_data src2);<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">void
_tile_stored_internal(char m, short n, void *base, int
stride, _tile_data tile);<o:p></o:p></span></p>
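<p class="MsoNormal"><span lang="EN-US">To make the roles of m, n,
and k concrete, here is a scalar reference sketch of what a
dpbssd-style tile multiply computes, based on the TDPBSSD
description in the instruction manual. It is not the proposed
intrinsic: the names ref_tile and ref_dpbssd are hypothetical, and
the convention that n and k are given in bytes is an assumption made
for this illustration.</span></p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { MAX_ROWS = 16, MAX_COLSB = 64 };  /* palette 1 limits per tile */

/* A tile as plain memory: up to 16 rows of up to 64 bytes each. */
typedef union {
    int8_t  b[MAX_ROWS][MAX_COLSB];      /* byte view (sources) */
    int32_t d[MAX_ROWS][MAX_COLSB / 4];  /* dword view (accumulator) */
} ref_tile;

/*
 * dst (m rows x n bytes) += src1 (m rows x k bytes) * src2 (k/4 rows x n bytes).
 * Each 32-bit element of dst accumulates dot products of four signed bytes.
 * Returns 0 on success, -1 if the shape exceeds the limits above.
 * (Assumption: n and k are byte counts and multiples of 4.)
 */
static int ref_dpbssd(unsigned m, unsigned n, unsigned k,
                      ref_tile *dst, const ref_tile *src1,
                      const ref_tile *src2) {
    assert(dst != NULL && src1 != NULL && src2 != NULL);
    if (m == 0 || m > MAX_ROWS ||
        n == 0 || n > MAX_COLSB || n % 4 != 0 ||
        k == 0 || k > MAX_COLSB || k % 4 != 0)
        return -1;
    for (unsigned i = 0; i < m; ++i)
        for (unsigned l = 0; l < k / 4; ++l)
            for (unsigned j = 0; j < n / 4; ++j)
                for (unsigned b = 0; b < 4; ++b)
                    dst->d[i][j] += (int32_t)src1->b[i][4 * l + b] *
                                    (int32_t)src2->b[l][4 * j + b];
    return 0;
}
```

<p class="MsoNormal"><span lang="EN-US">The shape check at the top
mirrors the constraint stated above: the shape may vary at run time,
but it must stay within what the hardware supports.</span></p>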
<p class="MsoListParagraph"
style="margin-left:36.0pt;text-indent:-18.0pt;mso-list:l0
level1 lfo2">
<!--[if !supportLists]--><span lang="EN-US"><span
style="mso-list:Ignore">3.<span style="font:7.0pt
"Times New Roman"">
</span></span></span><!--[endif]--><span lang="EN-US">User
interfaces.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">The tile shape and tile
data are combined into a struct in C. The shape of
a tile may be initialized only once. The user
interface looks like this.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">#define __DEFAULT_FN_AMX \<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">__attribute__((__always_inline__, __nodebug__,
__target__("amx-int8")))<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">typedef struct __tile_str {<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">const char row;<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">const short col;<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">_tile_data tile;<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">}__tile;<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">__DEFAULT_FN_AMX<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">void
__tile_loadd(__tile *dst, const void *base, long stride) {<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">dst->tile =
_tile_loadd_internal(dst->row, dst->col, base,
stride);<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">}<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">__DEFAULT_FN_AMX<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">void
__tile_dpbsud(__tile *dst, __tile src1, __tile src2) {<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">dst->tile =
_tile_dpbssd_internal(src1.row, src2.col, src1.col,
dst->tile, src1.tile, src2.tile);<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">}<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">__DEFAULT_FN_AMX<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">void
__tile_stored(void *base, long stride, __tile src) {<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">_tile_stored_internal(src.row, src.col, base, stride,
src.tile);<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">}<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoListParagraph"
style="margin-left:36.0pt;text-indent:-18.0pt;mso-list:l0
level1 lfo2">
<!--[if !supportLists]--><span lang="EN-US"><span
style="mso-list:Ignore">4.<span style="font:7.0pt
"Times New Roman"">
</span></span></span><!--[endif]--><span lang="EN-US">Example
code<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">The example shows how to
use the user interface in a function.
<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">void api(int cond,
short row, short col) {<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">__tile a = {row,
col};<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">__tile b = {row,
col};<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">__tile c = {row,
col};<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">if(cond) {<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">__tile_loadd(&a, buf, STRIDE);<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">__tile_loadd(&b, buf, STRIDE);<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">__tile_loadd(&c, buf, STRIDE);<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">} else {<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">__tile_loadd(&a, buf2, STRIDE);<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">__tile_loadd(&b, buf2, STRIDE);<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">__tile_loadd(&c, buf2, STRIDE);<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">}<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="IT">__tile_dpbsud(&c,
a, b);</span><span lang="EN-US"><o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">__tile_stored(buf,
STRIDE, c);<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">}<o:p></o:p></span></p>
<p class="MsoListParagraph"
style="margin-left:36.0pt;text-indent:-18.0pt;mso-list:l0
level1 lfo2">
<!--[if !supportLists]--><span lang="EN-US"><span
style="mso-list:Ignore">5.<span style="font:7.0pt
"Times New Roman"">
</span></span></span><!--[endif]--><span lang="EN-US">LLVM
IR<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">The LLVM IR intrinsics
take the row and column information as input parameters,
so that the compiler can deduce the shape of the tile data. The
remaining parameters are what the AMX instructions require. This
is the LLVM IR corresponding to the example code.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">define dso_local void
@api(i32 %cond, i16 signext %row, i16 signext %col)
local_unnamed_addr #2 {<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">entry:<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">%tobool = icmp eq
i32 %cond, 0<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">%sext = shl i16
%col, 8<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">%conv.i31 = ashr
exact i16 %sext, 8<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">br i1 %tobool,
label %if.else, label %if.then<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">if.then: ; preds =
%entry<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">%0 = tail call
<256 x i32> @llvm.x86.tileloadd64(i16 %row, i16
%conv.i31, i8* getelementptr inbounds ([1024 x i8], [1024 x
i8]* @buf, i64 0, i64 0), i64 32) #3<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">%1 = tail call
<256 x i32> @llvm.x86.tileloadd64(i16 %row, i16
%conv.i31, i8* getelementptr inbounds ([1024 x i8], [1024 x
i8]* @buf, i64 0, i64 0), i64 32) #3<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">%2 = tail call
<256 x i32> @llvm.x86.tileloadd64(i16 %row, i16
%conv.i31, i8* getelementptr inbounds ([1024 x i8], [1024 x
i8]* @buf, i64 0, i64 0), i64 32) #3<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">br label %if.end<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">if.else: ; preds =
%entry<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">%3 = tail call
<256 x i32> @llvm.x86.tileloadd64(i16 %row, i16
%conv.i31, i8* getelementptr inbounds ([1024 x i8], [1024 x
i8]* @buf2, i64 0, i64 0), i64 32) #3<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">%4 = tail call
<256 x i32> @llvm.x86.tileloadd64(i16 %row, i16
%conv.i31, i8* getelementptr inbounds ([1024 x i8], [1024 x
i8]* @buf2, i64 0, i64 0), i64 32) #3<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">%5 = tail call
<256 x i32> @llvm.x86.tileloadd64(i16 %row, i16
%conv.i31, i8* getelementptr inbounds ([1024 x i8], [1024 x
i8]* @buf2, i64 0, i64 0), i64 32) #3<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">br label %if.end<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">if.end: ; preds =
%if.else, %if.then<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">%a.sroa.1186.0 =
phi <256 x i32> [ %3, %if.else ], [ %0, %if.then ]<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">%b.sroa.1068.0 =
phi <256 x i32> [ %4, %if.else ], [ %1, %if.then ]<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">%c.sroa.1149.0 =
phi <256 x i32> [ %5, %if.else ], [ %2, %if.then ]<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">%6 = tail call
<256 x i32> @llvm.x86.tdpbssd(i16 %row, i16 %conv.i31,
i16 %conv.i31, <256 x i32> %c.sroa.1149.0, <256 x
i32> %a.sroa.1186.0, <256 x i32> %b.sroa.1068.0) #3<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">tail call void
@llvm.x86.tilestored64(i16 %row, i16 %conv.i31, i8*
getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf, i64
0, i64 0), i64 32, <256 x i32> %6) #3<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">ret void<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">}<o:p></o:p></span></p>
<p class="MsoListParagraph"
style="margin-left:36.0pt;text-indent:-18.0pt;mso-list:l0
level1 lfo2">
<!--[if !supportLists]--><span lang="EN-US"><span
style="mso-list:Ignore">6.<span style="font:7.0pt
"Times New Roman"">
</span></span></span><!--[endif]--><span lang="EN-US">Shape
propagation<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">In an -O0 build, the
front end generates ordinary loads and stores of the tile vector.
We need to start from the AMX intrinsics and propagate
the shape information to the virtual tile registers. If
an AMX intrinsic uses the result of a load instruction, the
shape is propagated to the load and the load is transformed
into a tile load intrinsic. If a store instruction stores the
result of an AMX intrinsic, the shape is propagated to the store
instruction and the store is transformed into a tile store
intrinsic.<o:p></o:p></span></p>
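<p class="MsoNormal"><span lang="EN-US">The propagation described
above can be sketched on a toy instruction list. This is only an
illustration under simplifying assumptions: the inst structure, the
opcode names, and propagate_shapes are hypothetical, operands are
indices into the same array, and each AMX intrinsic is assumed to
already know the shape required of each operand.</span></p>

```c
#include <assert.h>
#include <stddef.h>

typedef enum { OP_LOAD, OP_STORE, OP_AMX, OP_TILELOAD, OP_TILESTORE } opcode;
typedef struct { int row, col; } shape;

/* Toy instruction: operands are indices of other instructions, -1 if unused. */
typedef struct {
    opcode op;
    int    src[3];        /* value operands */
    shape  src_shape[3];  /* for OP_AMX: shape each operand must have */
    shape  result;        /* shape of the produced (or stored) tile */
} inst;

/* Root at each AMX intrinsic and push its shapes onto adjacent plain
 * loads and stores, rewriting them into tile loads and tile stores. */
static void propagate_shapes(inst *code, size_t n) {
    assert(code != NULL || n == 0);
    /* Loads feeding an AMX intrinsic become tile loads. */
    for (size_t i = 0; i < n; ++i) {
        if (code[i].op != OP_AMX)
            continue;
        for (int k = 0; k < 3; ++k) {
            int s = code[i].src[k];
            if (s >= 0 && code[s].op == OP_LOAD) {
                code[s].op = OP_TILELOAD;
                code[s].result = code[i].src_shape[k];
            }
        }
    }
    /* Stores of an AMX intrinsic's result become tile stores. */
    for (size_t i = 0; i < n; ++i) {
        int s = code[i].src[0];
        if (code[i].op == OP_STORE && s >= 0 && code[s].op == OP_AMX) {
            code[i].op = OP_TILESTORE;
            code[i].result = code[s].result;
        }
    }
}
```

<p class="MsoNormal"><span lang="EN-US">Loads and stores that never
touch an AMX intrinsic keep their original opcode, so ordinary
vector code is left alone.</span></p>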
<p class="MsoListParagraph"
style="margin-left:36.0pt;text-indent:-18.0pt;mso-list:l0
level1 lfo2">
<!--[if !supportLists]--><span lang="EN-US"><span
style="mso-list:Ignore">7.<span style="font:7.0pt
"Times New Roman"">
</span></span></span><!--[endif]--><span lang="EN-US">Machine
IR<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Since the AMX intrinsics
take the row and column as input parameters, we can
create pseudo instructions corresponding to them. Each AMX
intrinsic is lowered to a pseudo AMX instruction that
has extra row and column operands matching those of the
intrinsic. The real AMX instructions don’t need the row and
column operands; the row and column information must be
configured by ldtilecfg before any AMX instruction
executes.<o:p></o:p></span></p>
<p class="MsoListParagraph"
style="margin-left:36.0pt;text-indent:-18.0pt;mso-list:l0
level1 lfo2">
<!--[if !supportLists]--><span lang="EN-US"><span
style="mso-list:Ignore">8.<span style="font:7.0pt
"Times New Roman"">
</span></span></span><!--[endif]--><span lang="EN-US">Register
allocation<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">AMX registers are special.
They need to be configured before use, and the config
instruction is expensive. To avoid unnecessary tile
configuration, we collect as much tile shape information as
possible and combine it into one ldtilecfg instruction.
The ldtilecfg instruction should dominate every AMX
instruction that accesses a tile register. On the other hand,
the ldtilecfg should post-dominate the instructions that
define the tile shapes. For tile register spills,
re-configuration due to a different tile shape should be avoided:
a spilled register should be reloaded into a register that shares
the same tile shape. Since tile register allocation is special
and may need to allocate general virtual registers to configure
the tile registers, we can add a separate pass to do it before
the general register allocation pass. After register allocation,
the tile shape information is no longer needed, so we can
transform the pseudo AMX instructions into real AMX instructions
by removing the row and column operands.</span></p>
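<p class="MsoNormal"><span lang="EN-US">As a minimal sketch of the
"combine into one ldtilecfg" step, the code below merges the shapes
carried by pseudo AMX instructions into a single configuration and
reports when one physical tile register would need two different
shapes, which is the case that forces a re-config. The types
pseudo_amx and region_cfg and the function collect_config are
hypothetical names for this illustration, not part of the proposed
implementation.</span></p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { NUM_TILE_REGS = 8 };

/* Pseudo AMX instruction after tile register assignment: it still carries
 * the row/col operands that the real instruction does not encode. */
typedef struct {
    int      tmm;  /* assigned physical tile register, 0..7 */
    uint16_t row;  /* number of rows */
    uint16_t col;  /* bytes per row */
} pseudo_amx;

/* One configuration covering every tile register used in a region. */
typedef struct {
    uint8_t  rows[NUM_TILE_REGS];
    uint16_t colsb[NUM_TILE_REGS];
    bool     used[NUM_TILE_REGS];
} region_cfg;

/* Merge the shapes of all pseudo instructions into a single config.
 * Returns false if one register would need two different shapes, which
 * would require a second ldtilecfg. */
static bool collect_config(const pseudo_amx *code, size_t n, region_cfg *cfg) {
    for (int r = 0; r < NUM_TILE_REGS; ++r) {
        cfg->used[r] = false;
        cfg->rows[r] = 0;
        cfg->colsb[r] = 0;
    }
    for (size_t i = 0; i < n; ++i) {
        int r = code[i].tmm;
        assert(r >= 0 && r < NUM_TILE_REGS);
        if (cfg->used[r] &&
            (cfg->rows[r] != code[i].row || cfg->colsb[r] != code[i].col))
            return false;
        cfg->used[r] = true;
        cfg->rows[r] = (uint8_t)code[i].row;
        cfg->colsb[r] = code[i].col;
    }
    return true;
}
```

<p class="MsoNormal"><span lang="EN-US">Once such a configuration
exists, the row and col fields are redundant for code emission, which
matches the final step of dropping those operands from the pseudo
instructions.</span></p>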
</div>
</blockquote>
<p>This seems complicated.</p>
<p>Reading through the documentation, there appears to be a single
global tile config for all tile registers at any time.</p>
<p>Why not simply model this tile config as a designated special
register and the tile instructions as having an implicit use of
this register? That would seem to ensure that the register
allocator has all the constraints needed. You'd need to teach it
how to spill the special registers with the appropriate
instructions, but that seems a lot more straightforward?</p>
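<p>To illustrate the idea loosely (this is a toy model, not LLVM's
MachineInstr API; the names minst, REG_TILECFG, and
implicit_uses_satisfied are made up for the sketch): each tile
instruction carries an implicit use of the config register, and the
config load carries a def, so an ordinary def-before-use check
already captures the ordering constraint on straight-line code.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { REG_TILECFG = 1u << 0 };  /* hypothetical special register, as a bit */

typedef struct {
    const char *name;
    unsigned defs;           /* registers written */
    unsigned implicit_uses;  /* registers read without appearing as operands */
} minst;

/* Straight-line check: every implicit use must follow a def of that register. */
static bool implicit_uses_satisfied(const minst *code, size_t n) {
    assert(code != NULL || n == 0);
    unsigned defined = 0;
    for (size_t i = 0; i < n; ++i) {
        if (code[i].implicit_uses & ~defined)
            return false;
        defined |= code[i].defs;
    }
    return true;
}
```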
<blockquote type="cite"
cite="mid:SN6PR11MB313542E13EAA851C2E6ED43D9A400@SN6PR11MB3135.namprd11.prod.outlook.com">
<div class="WordSection1">
<p class="MsoNormal"><span lang="EN-US"><o:p></o:p></span></p>
<p class="MsoListParagraph"
style="margin-left:36.0pt;text-indent:-18.0pt;mso-list:l0
level1 lfo2">
<!--[if !supportLists]--><span lang="EN-US"><span
style="mso-list:Ignore">9.<span style="font:7.0pt
"Times New Roman"">
</span></span></span><!--[endif]--><span lang="EN-US">Usage
recommendation <o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Because of the cost of
shape configuration, we recommend that users define the tile shapes
at function entry and inline functions as
much as possible. The AMX instructions focus on computation
rather than storage, so global variables for tile data are not
recommended.<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:10.5pt;line-height:105%" lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span
style="font-size:10.5pt;line-height:105%" lang="EN-US">Thanks<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:10.5pt;line-height:105%" lang="EN-US">Yuanke<o:p></o:p></span></p>
</div>
</blockquote>
</body>
</html>