<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
</head>
<body>
<p>Hi, Yuanke,</p>
<p>Thanks for writing this up. Let me back up a bit because the
scheme I proposed last week doesn't work without further
modification: within a particular "configuration region" (i.e.,
the code in between the LDTILECFG and the TILERELEASE (or next
LDTILECFG)), each tile register can only be used with one shape,
and in addition, no register can have its shape changed without
zeroing out all of the tile registers. Thus, just using different
register classes for the different shapes, as I had suggested,
isn't sufficient to model the allocation requirements. That would
not prevent the same register from essentially being assigned to
differently-shaped virtual registers with non-overlapping live
ranges within one configuration region.</p>
<p>Also, as you point out, when multiple non-static tile shapes are
in use, if you use one register class for each shape, you would
need different register classes for these too. Luckily, I don't
think that using the separate register classes actually buys us
anything, so please disregard that suggestion of mine. Use only
one register class.</p>
<p>Once the configuration regions are identified, you'll know how
many tile register shapes are required. If this number is greater
than eight, then you'll need to cut the region (requiring all live
tiles to be spilled and restored around each re-configuration
point). After that, we'll assume that we have eight or fewer
distinct shapes.</p>
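<p>As a minimal sketch of that cutting step (in Python rather than
LLVM C++, and over a straight-line list of tile uses rather than a
real CFG; <code>split_regions</code> and <code>MAX_TILE_REGS</code>
are names made up for illustration), a greedy scan could start a new
configuration region whenever a ninth distinct shape shows up:</p>
<pre>
MAX_TILE_REGS = 8  # tmm0 through tmm7


def split_regions(shapes_in_order):
    """Greedily cut a sequence of per-instruction tile shapes into
    regions that each use at most MAX_TILE_REGS distinct shapes."""
    regions = []
    current = []
    seen = set()
    for shape in shapes_in_order:
        if shape not in seen and len(seen) == MAX_TILE_REGS:
            # A ninth distinct shape would be needed: cut the region
            # here (a re-configuration point).
            regions.append(current)
            current = []
            seen = set()
        seen.add(shape)
        current.append(shape)
    if current:
        regions.append(current)
    return regions


# Ten distinct shapes in a row end up in two regions.
regions = split_regions([(1, cols) for cols in range(1, 11)])
print([len(region) for region in regions])
</pre>
<p>A real pass would also have to account for control flow and for
the spill/restore cost at each cut, both of which this ignores.</p>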
<p>Now the problem is that you need to allocate registers,
satisfying all of the usual constraints (non-overlapping live
ranges, etc.), but with an additional constraint: once a physical
register has been used with some particular tile shape, it cannot
be assigned to any other tile shape.</p>
<p>I think that the current infrastructure can support this as
follows:</p>
<p> 1. Add an override X86RegisterInfo::getRegAllocationHints. Like
SystemZRegisterInfo::getRegAllocationHints sometimes does, the
function will return true when hinting the tile registers (to
indicate a hard constraint). As registers are assigned in
RegAllocGreedy, getRegAllocationHints is called for each virtual
register. For virtual tile registers, look at the passed
VirtRegMap, etc. for already-assigned tile virtual registers whose
shape requirements differ from those of the current virtual register
(you'll need to cache the shape requirements in
X86MachineFunctionInfo for this to be efficient), and return a
hints list consisting of all non-reserved tile registers other than
the ones assigned to those differently-shaped virtual registers.</p>
<p> 2. To support RegAllocFast, which doesn't use
getRegAllocationHints, you would need to make the configuration
regions small enough that it doesn't matter (and if you're doing
this around every tile instruction, this is automatically true).</p>
<p> 3. To support RegAllocPBQP (which is likely a good thing to do,
but probably not required), I believe you can support this by
adding custom constraints to the solver (kind of like what
AArch64PBQPRegAlloc.cpp does).<br>
</p>
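<p>The filtering in step 1 can be sketched as follows. This is a toy
Python model, not the LLVM API: <code>tile_hints</code>,
<code>TILE_REGS</code>, and the <code>reg_shape</code> dictionary
(standing in for whatever you would derive from the VirtRegMap plus
the cached shapes) are hypothetical, and live ranges are ignored
entirely.</p>
<pre>
TILE_REGS = ["tmm%d" % i for i in range(8)]


def tile_hints(vreg_shape, reg_shape, reserved=()):
    """Return the physical tile registers that a virtual register of
    shape vreg_shape may use. reg_shape maps each physical register to
    the shape it has already been given in this configuration region;
    registers with no entry are still free to take any shape."""
    return [
        reg
        for reg in TILE_REGS
        if reg not in reserved
        and reg_shape.get(reg, vreg_shape) == vreg_shape
    ]


# tmm0 already holds a 1x1 tile, so a 1x2 virtual register must not
# be hinted to it, while another 1x1 virtual register may reuse it.
bound = {"tmm0": (1, 1)}
print(tile_hints((1, 2), bound))
print(tile_hints((1, 1), bound))
</pre>
<p>Because the override returns true, the allocator would treat this
list as the only acceptable choices rather than as a preference.</p>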
<p>Once the allocation process is complete, you'll need to go back
and update the LDTILECFG data to reflect the chosen
shape-to-register mapping.<br>
</p>
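<p>For that last step, here is a sketch of building the 64-byte
configuration block. It assumes the layout described in Intel's AMX
documentation as I understand it (palette in byte 0, start_row in
byte 1, per-tile bytes-per-row as 16-bit little-endian values
starting at byte 16, per-tile row counts starting at byte 48), so
please check it against the ISA reference before relying on it.
<code>build_tile_config</code> is a hypothetical helper, and shapes
are given as (rows, bytes per row).</p>
<pre>
def build_tile_config(reg_shape, palette=1):
    """Build the 64-byte memory operand for LDTILECFG from a mapping
    of tile register index (0 to 7) to (rows, bytes_per_row)."""
    cfg = bytearray(64)
    cfg[0] = palette  # byte 0: palette id
    # Byte 1 (start_row) and the reserved bytes 2 to 15 stay zero.
    for idx, (rows, colsb) in reg_shape.items():
        if idx not in range(8):
            raise ValueError("tile register index out of range")
        # Assumed layout: 16-bit bytes-per-row entries from byte 16,
        # 8-bit row-count entries from byte 48.
        cfg[16 + 2 * idx:18 + 2 * idx] = colsb.to_bytes(2, "little")
        cfg[48 + idx] = rows
    return bytes(cfg)


# After allocation: tmm0 got a 3-row, 16-byte-wide shape and tmm2 a
# 16-row, 64-byte-wide shape.
config = build_tile_config({0: (3, 16), 2: (16, 64)})
print(config.hex())
</pre>
<p>Registers that were never assigned keep zero rows and columns in
the block.</p>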
<p>What I don't know, however, is how well the getRegAllocationHints
method will work. The benefit is that you don't need to write a
custom pre-allocator allocator. The risk is that it might visit
the virtual registers to assign in a suboptimal order because it
doesn't really understand the constraint being imposed (generally,
we just assign larger live ranges first). Then again, it is
a greedy algorithm, and if you want something systematically closer
to optimal, maybe you should be using PBQP anyway. If you do end
up needing a custom allocator for these, I recommend looking at
the PBQP solver (which, as I recall, is independently reusable).<br>
</p>
<p>Hopefully, this is more helpful advice.</p>
<p> -Hal<br>
</p>
<div class="moz-cite-prefix">On 8/21/20 9:54 PM, Luo, Yuanke wrote:<br>
</div>
<blockquote type="cite" cite="mid:SN6PR11MB3135C8871C491F33C0A788549A580@SN6PR11MB3135.namprd11.prod.outlook.com">
<div class="WordSection1">
<p class="MsoNormal">It seems I made a mistake about sharing
register units. Can we share a register unit among tile registers
that are in different tile register classes (each register class
has a different tile shape)? Consider two
virtual tile registers, <i>%2:vtile1x1</i> and <i>%3:vtile1x2</i>.
First %2 is allocated to $tmm0; then, after %2 is killed,
%3 is allocated to $tmm0. This is not allowed, because when
$tmm0 is allocated to %2, its shape is configured as 1x1. If
we reallocate $tmm0 to %3, we need to reconfigure $tmm0 as
1x2, which causes $tmm0 through $tmm7 to be clobbered.</p>
<p class="MsoNormal">&nbsp;</p>
<p class="MsoNormal">Yuanke</p>
<p class="MsoNormal">&nbsp;</p>
<div>
<div style="border:none;border-top:solid #E1E1E1
1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"
style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal">
<b>From:</b> Luo, Yuanke <br>
<b>Sent:</b> Friday, August 21, 2020 2:12 PM<br>
<b>To:</b> Hal Finkel <a class="moz-txt-link-rfc2396E"
href="mailto:hfinkel@anl.gov" moz-do-not-send="true">&lt;hfinkel@anl.gov&gt;</a>;
Topper, Craig <a class="moz-txt-link-rfc2396E"
href="mailto:craig.topper@intel.com"
moz-do-not-send="true">&lt;craig.topper@intel.com&gt;</a>;
Kaylor, Andrew <a class="moz-txt-link-rfc2396E"
href="mailto:andrew.kaylor@intel.com"
moz-do-not-send="true">&lt;andrew.kaylor@intel.com&gt;</a>;
Philip Reames <a class="moz-txt-link-rfc2396E"
href="mailto:listmail@philipreames.com"
moz-do-not-send="true">&lt;listmail@philipreames.com&gt;</a>;
<a class="moz-txt-link-abbreviated"
href="mailto:llvm-dev@lists.llvm.org"
moz-do-not-send="true">llvm-dev@lists.llvm.org</a>; <a
class="moz-txt-link-abbreviated"
href="mailto:florian_hahn@apple.com"
moz-do-not-send="true">florian_hahn@apple.com</a>; Lu,
Hongjiu <a class="moz-txt-link-rfc2396E"
href="mailto:hongjiu.lu@intel.com"
moz-do-not-send="true">&lt;hongjiu.lu@intel.com&gt;</a><br>
<b>Subject:</b> RE: [llvm-dev] Intel AMX programming model
discussion.</p>
</div>
</div>
<p class="MsoNormal">&nbsp;</p>
<p class="MsoNormal">Hi Hal,</p>
<p class="MsoNormal">The proposal is attractive to me, but there
is something I still can't figure out. Let's take the MIR below as
an example. We assume we have 256 register classes (vtile1x1,
vtile1x2, ..., vtile16x16).</p>
<p class="MsoListParagraph"
style="margin-left:.5in;text-indent:-.25in;mso-list:l0 level1
lfo2">
<!--[if !supportLists]--><span style="mso-list:Ignore">1.<span
style="font:7.0pt &quot;Times New Roman&quot;">&nbsp;&nbsp;&nbsp;</span></span><!--[endif]-->After
instruction selection, the pseudo AMX instructions are
generated. The names of the pseudo instructions have a 'P' prefix.
At this point all the AMX pseudo instructions take vtile as their
register class. Let's assume %13 is the constant 3, %10 is the
constant 4, and %14 is a variable.</p>
<p class="MsoNormal"><i> %1:vtile = <b><span style="color:red">P</span></b>TILELOADDV
%13:gr16, %10:gr16, %17:gr64, 1, %18:gr64_nosp, 0, $noreg<o:p></o:p></i></p>
<p class="MsoNormal"><i> %2:vtile = <b>P</b>TILELOADDV
%10:gr16, %14:gr16, %17:gr64, 1, %18:gr64_nosp, 0, $noreg<o:p></o:p></i></p>
<p class="MsoNormal"><i> %3:vtile = <b>P</b>TILELOADDV
%13:gr16, %14:gr16, %17:gr64, 1, %18:gr64_nosp, 0, $noreg<o:p></o:p></i></p>
<p class="MsoNormal"><i>%21:vtile = <b>P</b>TDPBSSDV %13:gr16,
%10:gr16, %14:gr16, %3:vtile(tied-def 0), %1:vtile, %2:vtile
<o:p></o:p></i></p>
<p class="MsoListParagraph"
style="margin-left:.5in;text-indent:-.25in;mso-list:l0 level1
lfo2">
<!--[if !supportLists]--><span style="mso-list:Ignore">2.<span
style="font:7.0pt &quot;Times New Roman&quot;">&nbsp;&nbsp;&nbsp;</span></span><!--[endif]-->The
configuration-placement pass looks at all of the AMX
pseudo-instructions and identifies regions in which the
pseudo-instructions use the same configuration parameters. It
first replaces the register class of every tile register whose
shape is known at compile time. Since the shape of %1 is
constant, it replaces %1:vtile with %1:vtile3x4, which
changes the register class and morphs the pseudo instruction into
the real AMX instruction. The shapes of %2 and %3 are unknown
at compile time, so it arbitrarily picks tile register classes
that have not been assigned before and assigns them to
%2 and %3. After register class allocation, the code is
transformed as follows; the register classes vtile1x1 and
vtile1x2 have been allocated to %2 and %3.</p>
<p class="MsoNormal"><i> <b>P</b>LDTILECFG<o:p></o:p></i></p>
<p class="MsoNormal"><i> %1:vtile3x4 = TILELOADDV %17:gr64, 1,
%18:gr64_nosp, 0, $noreg<o:p></o:p></i></p>
<p class="MsoNormal"><i> %2:vtile1x1 = TILELOADDV %17:gr64, 1,
%18:gr64_nosp, 0, $noreg<o:p></o:p></i></p>
<p class="MsoNormal"><i> %3:vtile1x2 = TILELOADDV %17:gr64, 1,
%18:gr64_nosp, 0, $noreg<o:p></o:p></i></p>
<p class="MsoNormal"><i>%21:vtile1x2 = TDPBSSDV
%3:vtile1x2(tied-def 0), %1:vtile3x4, %2:vtile1x1 </i></p>
<p class="MsoNormal">Some things I have not figured out:</p>
<p class="MsoListParagraph"
style="margin-left:.5in;text-indent:-.25in;mso-list:l3 level1
lfo4">
<!--[if !supportLists]--><span style="mso-list:Ignore">a.<span
style="font:7.0pt &quot;Times New Roman&quot;">&nbsp;&nbsp;&nbsp;</span></span><!--[endif]-->I
am not sure whether an AMX instruction's inputs and outputs can
fit multiple register classes (vtile1x1, ..., vtile16x16);
otherwise we need 256 pseudo instructions.</p>
<p class="MsoListParagraph"
style="margin-left:.5in;text-indent:-.25in;mso-list:l3 level1
lfo4">
<!--[if !supportLists]--><span style="mso-list:Ignore">b.<span
style="font:7.0pt &quot;Times New Roman&quot;">&nbsp;&nbsp;&nbsp;</span></span><!--[endif]-->Whether
256 register classes are enough to allocate. There may be
more than 256 tile registers of unknown shape.</p>
<p class="MsoListParagraph"
style="margin-left:.5in;text-indent:-.25in;mso-list:l3 level1
lfo4">
<!--[if !supportLists]--><span style="mso-list:Ignore">c.<span
style="font:7.0pt &quot;Times New Roman&quot;">&nbsp;&nbsp;&nbsp;</span></span><!--[endif]-->In
this pass we also find the proper point (common dominator)
at which to insert ldtilecfg, but at this time the registers are
not yet allocated, so we don't know the shape of each physical tile
register. So we just insert a pseudo tile config instruction.</p>
<p class="MsoListParagraph"
style="margin-left:.5in;text-indent:-.25in;mso-list:l0 level1
lfo2">
<!--[if !supportLists]--><span style="mso-list:Ignore">3.<span
style="font:7.0pt &quot;Times New Roman&quot;">&nbsp;&nbsp;&nbsp;</span></span><!--[endif]-->All
tile register classes share the same register units. We do
register allocation with the framework, and the code is
transformed as follows.</p>
<p class="MsoNormal"><i> $tmm0 = TILELOADDV %17:gr64, 1,
%18:gr64_nosp, 0, $noreg<o:p></o:p></i></p>
<p class="MsoNormal"><i> $tmm1 = TILELOADDV %17:gr64, 1,
%18:gr64_nosp, 0, $noreg<o:p></o:p></i></p>
<p class="MsoNormal"><i> $tmm2 = TILELOADDV %17:gr64, 1,
%18:gr64_nosp, 0, $noreg<o:p></o:p></i></p>
<p class="MsoNormal"><i>$tmm2 = TDPBSSDV $tmm2(tied-def 0),
$tmm0, $tmm1<o:p></o:p></i></p>
<p class="MsoListParagraph"
style="margin-left:.5in;text-indent:-.25in;mso-list:l0 level1
lfo2">
<!--[if !supportLists]--><span style="mso-list:Ignore">4.<span
style="font:7.0pt &quot;Times New Roman&quot;">&nbsp;&nbsp;&nbsp;</span></span><!--[endif]-->Run
the config pass to collect the shape of each physical tile
register and configure them. The code can be generated as below.
Here is the problem: how can we know the shape of each physical
tile register?</p>
<p class="MsoNormal"><b><i> MOV row, col info to %stack.0 for
each physical tile register ??????<o:p></o:p></i></b></p>
<p class="MsoNormal"><b><i> LDTILECFG %stack.0, 1, $noreg, 0,
$noreg, implicit-def $tmm0, implicit-def $tmm1,
implicit-def $tmm2, implicit-def $tmm3, implicit-def
$tmm4, implicit-def $tmm5, implicit-def $tmm6,
implicit-def $tmm7<o:p></o:p></i></b></p>
<p class="MsoNormal"><i> $tmm0 = TILELOADDV %17:gr64, 1,
%18:gr64_nosp, 0, $noreg<o:p></o:p></i></p>
<p class="MsoNormal"><i> $tmm1 = TILELOADDV %17:gr64, 1,
%18:gr64_nosp, 0, $noreg<o:p></o:p></i></p>
<p class="MsoNormal"><i> $tmm2 = TILELOADDV %17:gr64, 1,
%18:gr64_nosp, 0, $noreg<o:p></o:p></i></p>
<p class="MsoNormal"><i>$tmm2 = TDPBSSDV $tmm2(tied-def 0),
$tmm0, $tmm1<o:p></o:p></i></p>
<p class="MsoNormal">&nbsp;</p>
<p class="MsoNormal">Thanks</p>
<p class="MsoNormal">Yuanke</p>
<p class="MsoNormal">&nbsp;</p>
...<o:p></o:p> </div>
</blockquote>
<pre class="moz-signature" cols="72">--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre>
</body>
</html>