<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
</head>
<body>
<p><br>
</p>
<div class="moz-cite-prefix">On 8/20/20 2:47 PM, Topper, Craig
wrote:<br>
</div>
<blockquote type="cite" cite="mid:MWHPR11MB0046DC94CD931CD57620FD42935A0@MWHPR11MB0046.namprd11.prod.outlook.com">
<div class="WordSection1">
<p class="MsoNormal">I think I’m still missing something here.
The configuration is per tile. The multiply instructions take
a MxK tile and multiply it by a KxN tile and accumulate into
an MxN tile. So the configuration needs to know how many of
each size of tile it needs to avoid a spill. Wouldn’t the
register allocator then need to know which physical tiles have
been configured to which sizes so that it only chooses those
tiles for an operand that needs that size?</p>
</div>
</blockquote>
<p><br>
</p>
<p>Yes, I think so. But it will know, because that information is
essentially encoded in the virtual register classes. I certainly
could be missing something. It seems like you first figure out how
many tiles of each size you need, and then you assign virtual tile
registers corresponding to the correct tile sizes. Perhaps this
comes down to what you mean by "avoid a spill." We still might
spill, and I assume that the infrastructure always needs to deal
with that. We should continue to do instruction scheduling in order
to minimize register pressure. Once we assign the right virtual
register classes to the AMX instructions, shouldn't this
automatically happen? If we do spill, since none of the original
live ranges cross the ldtilecfg, there shouldn't be any fundamental
issue with using a regular load/store spill implementation.</p>
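<p>To make the shape bookkeeping concrete, here is a minimal C sketch
of which tile shapes a single multiply needs. It is only an
illustration under stated assumptions: the names tile_shape,
shapes_for_matmul and distinct_shape_count are hypothetical, and
shapes are counted in rows x 32-bit columns with the 16x16 maximum
used in this thread.</p>

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a tile shape in rows x columns of 32-bit elements,
   following the "16x16 maximum" convention used in this thread. */
typedef struct {
    int rows;
    int cols;
} tile_shape;

enum { MAX_DIM = 16 };

/* A shape is usable only if both dimensions are within 1..16. */
static bool shape_is_valid(tile_shape s) {
    return s.rows >= 1 && s.rows <= MAX_DIM &&
           s.cols >= 1 && s.cols <= MAX_DIM;
}

static bool same_shape(tile_shape x, tile_shape y) {
    return x.rows == y.rows && x.cols == y.cols;
}

/* The three operands of one multiply: dst(MxN) += a(MxK) * b(KxN). */
typedef struct {
    tile_shape a;
    tile_shape b;
    tile_shape dst;
} matmul_shapes;

static matmul_shapes shapes_for_matmul(int m, int k, int n) {
    matmul_shapes r = { {m, k}, {k, n}, {m, n} };
    return r;
}

/* Number of distinct shapes (1 to 3) that the configuration must provide
   for this multiply; each distinct shape corresponds to one virtual
   register class in the scheme discussed above. */
static int distinct_shape_count(matmul_shapes s) {
    int count = 1; /* shape of a */
    if (!same_shape(s.b, s.a)) {
        count++;
    }
    if (!same_shape(s.dst, s.a) && !same_shape(s.dst, s.b)) {
        count++;
    }
    return count;
}
```

<p>For M=4, K=2, N=4 this yields three distinct shapes (4x2, 2x4 and
4x4), so the operands would come from three different virtual
register classes.</p>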
<p>I'm definitely not an expert in this instruction set, so I may
just not understand some aspect of this. If there's something I'm
overlooking, a little example would be helpful.<br>
</p>
<p>Thanks again,</p>
<p>Hal</p>
<p><br>
</p>
<blockquote type="cite"
cite="mid:MWHPR11MB0046DC94CD931CD57620FD42935A0@MWHPR11MB0046.namprd11.prod.outlook.com">
<div class="WordSection1">
<p class="MsoNormal"><o:p></o:p></p>
<p class="MsoNormal">~Craig<o:p></o:p></p>
<div>
<div style="border:none;border-top:solid #E1E1E1
1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"
style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal">
<b>From:</b> Hal Finkel <a class="moz-txt-link-rfc2396E" href="mailto:hfinkel@anl.gov"><hfinkel@anl.gov></a> <br>
<b>Sent:</b> Thursday, August 20, 2020 12:35 PM<br>
<b>To:</b> Topper, Craig <a class="moz-txt-link-rfc2396E" href="mailto:craig.topper@intel.com"><craig.topper@intel.com></a>;
Kaylor, Andrew <a class="moz-txt-link-rfc2396E" href="mailto:andrew.kaylor@intel.com"><andrew.kaylor@intel.com></a>; Luo,
Yuanke <a class="moz-txt-link-rfc2396E" href="mailto:yuanke.luo@intel.com"><yuanke.luo@intel.com></a>; Philip Reames
<a class="moz-txt-link-rfc2396E" href="mailto:listmail@philipreames.com"><listmail@philipreames.com></a>;
<a class="moz-txt-link-abbreviated" href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>; <a class="moz-txt-link-abbreviated" href="mailto:florian_hahn@apple.com">florian_hahn@apple.com</a>; Lu,
Hongjiu <a class="moz-txt-link-rfc2396E" href="mailto:hongjiu.lu@intel.com"><hongjiu.lu@intel.com></a><br>
<b>Subject:</b> Re: [llvm-dev] Intel AMX programming model
discussion.<o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p><o:p> </o:p></p>
<div>
<p class="MsoNormal">On 8/19/20 3:09 PM, Topper, Craig wrote:<o:p></o:p></p>
</div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal">The width and height can be runtime
values that we would just copy into the 64-byte configuration
block we pass to ldtilecfg. So the code doesn’t need to be
multiversioned. The user code would also use those values to
update pointers in the loops they write using the tiles. If
we can’t determine that two tiles were defined with the same
width and height, we need to assume their shapes are different
and try to avoid ever assigning them the same physical tile.<o:p></o:p></p>
<p class="MsoNormal">Hal, under your suggestion, would the
mapping of physical registers to register classes be defined
dynamically before register allocation?<o:p></o:p></p>
</blockquote>
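<p>As an illustration of copying runtime sizes into the 64-byte
configuration block, here is a minimal C sketch. The field layout
follows my reading of Intel's programming reference (palette id,
start row, reserved bytes, then per-tile bytes-per-row and row
counts) and should be checked against the manual; the names
tile_config, tile_config_init and tile_config_set are hypothetical,
and the sketch only fills memory without executing ldtilecfg.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout of the 64-byte memory operand of ldtilecfg. */
typedef struct {
    uint8_t  palette_id;    /* byte 0 */
    uint8_t  start_row;     /* byte 1 */
    uint8_t  reserved[14];  /* bytes 2..15, kept zero */
    uint16_t colsb[16];     /* bytes 16..47: bytes per row of each tile */
    uint8_t  rows[16];      /* bytes 48..63: number of rows of each tile */
} tile_config;

enum { NUM_TILES = 8, MAX_ROWS = 16, MAX_COLS_DWORDS = 16 };

/* Start from an all-zero block that selects palette 1. */
static void tile_config_init(tile_config *cfg) {
    memset(cfg, 0, sizeof *cfg);
    cfg->palette_id = 1;
}

/* Record a runtime shape (rows x 32-bit columns) for one physical tile.
   Returns 0 on success, -1 if the tile index or shape is out of range. */
static int tile_config_set(tile_config *cfg, int tile, int rows,
                           int cols_dwords) {
    if (tile < 0 || tile >= NUM_TILES) {
        return -1;
    }
    if (rows < 1 || rows > MAX_ROWS) {
        return -1;
    }
    if (cols_dwords < 1 || cols_dwords > MAX_COLS_DWORDS) {
        return -1;
    }
    cfg->rows[tile] = (uint8_t)rows;
    cfg->colsb[tile] = (uint16_t)(cols_dwords * 4); /* 4 bytes per dword */
    return 0;
}
```

<p>Because the sizes are ordinary runtime integers written into
memory, nothing here requires the code to be multiversioned.</p>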
<p><o:p> </o:p></p>
<p>Here's my thought:<o:p></o:p></p>
<p>First, you have a set of intrinsics that take tile values
along with tile configuration parameters (which, presently,
seem just to be the sizes). These get lowered into
pseudo-instructions that do the same. Thus, you have some
register class that represents these arbitrarily-sized tile
registers that you'll assign to these pseudo-instruction
operands (i.e., they take virtual tile registers right after
instruction selection). You might use the 16x16 tile register
class for this purpose, but it shouldn't really matter.<o:p></o:p></p>
<p>Second, you run this configuration-placement pass. This pass
looks at all of the AMX pseudo-instructions and identifies
regions in which the pseudo-instructions use the same
configuration parameters (i.e., the same SSA values and/or
constants). This pass might reorder the pseudo-instructions
when legal in order to form larger regions. Then it places the
ldtilecfg at the start of each region (in some common
dominating position). ldtilecfg implicitly defines all of the
tile registers in every concrete class of tile registers (all
256 of them, or whatever). The pseudo-instructions are
replaced by real MI instructions taking a tile register class
appropriate for the configuration (which will default to the
16x16 class for cases where the configuration is not a
compile-time-known constant). When the configuration is a
known constant, the instructions take operands with a register
class appropriate for that configuration (e.g., 1x1, 4x4).<o:p></o:p></p>
<p>Third, the rest of the framework runs as usual. Tile
registers from the appropriate class are allocated by the
register allocator. No live range of any virtual tile register
can pass through the ldtilecfg (because it defines them all),
but that's okay: none of the live ranges will, by construction (the
configuration-placement pass ensures this).<o:p></o:p></p>
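<p>The second step above can be sketched in C for the simplest case,
a straight-line sequence of pseudo-instructions. This is a
deliberately reduced model, not a real pass: it does no reordering
and ignores dominance, and amx_pseudo, config_id and place_configs
are hypothetical names. A config_id stands for the set of SSA values
and/or constants that define the configuration.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pseudo-instruction: only the identity of its
   configuration parameters matters for region formation. */
typedef struct {
    int config_id;
} amx_pseudo;

/* Walk a straight-line sequence and mark every position where a new
   region starts, i.e. where an ldtilecfg would be placed before the
   instruction. Returns the number of regions (ldtilecfg instructions). */
static size_t place_configs(const amx_pseudo *insts, size_t n,
                            int *needs_cfg) {
    size_t regions = 0;
    for (size_t i = 0; i < n; i++) {
        int starts_region =
            (i == 0) || (insts[i].config_id != insts[i - 1].config_id);
        needs_cfg[i] = starts_region;
        regions += (size_t)starts_region;
    }
    return regions;
}
```

<p>With configurations 1, 1, 2, 2, 2, 1 this places three ldtilecfg
instructions. A pass that could legally move the last instruction
next to the first two would need only two, which is the motivation
for the reordering mentioned above.</p>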
<p>Thanks again,<o:p></o:p></p>
<p>Hal<o:p></o:p></p>
<p><o:p> </o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal"> <o:p></o:p></p>
<div>
<div style="border:none;border-top:solid #E1E1E1
1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"
style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal">
<b>From:</b> Hal Finkel <a
href="mailto:hfinkel@anl.gov" moz-do-not-send="true"><hfinkel@anl.gov></a>
<br>
<b>Sent:</b> Wednesday, August 19, 2020 12:52 PM<br>
<b>To:</b> Kaylor, Andrew <a
href="mailto:andrew.kaylor@intel.com"
moz-do-not-send="true"><andrew.kaylor@intel.com></a>;
Luo, Yuanke
<a href="mailto:yuanke.luo@intel.com"
moz-do-not-send="true"><yuanke.luo@intel.com></a>;
Philip Reames <a
href="mailto:listmail@philipreames.com"
moz-do-not-send="true">
<listmail@philipreames.com></a>; <a
href="mailto:llvm-dev@lists.llvm.org"
moz-do-not-send="true">llvm-dev@lists.llvm.org</a>;
<a href="mailto:florian_hahn@apple.com"
moz-do-not-send="true">florian_hahn@apple.com</a>;
Topper, Craig
<a href="mailto:craig.topper@intel.com"
moz-do-not-send="true"><craig.topper@intel.com></a>;
Lu, Hongjiu
<a href="mailto:hongjiu.lu@intel.com"
moz-do-not-send="true"><hongjiu.lu@intel.com></a><br>
<b>Subject:</b> Re: [llvm-dev] Intel AMX programming
model discussion.<o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"> <o:p></o:p></p>
<p> <o:p></o:p></p>
<div>
<p class="MsoNormal">On 8/19/20 10:24 AM, Kaylor, Andrew
wrote:<o:p></o:p></p>
</div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p>> When the tile shape is unknown at compile time, how
do you plan to do the register allocation of the tiles? My
question is: do you do the allocation for this case in the
same way as you would if you knew the size was 16x16
(i.e., conservatively assume the largest size)?<o:p></o:p></p>
<p class="MsoNormal">I think what will happen is that the
registers are allocated based on a number of runtime
values that are assumed to be different from one another
but less than or equal to 16. So, for example, we’ll
allocate registers for MxN tiles, NxM tiles and MxM tiles
without knowing what M and N are. Then at runtime the
values of these variables will be used to create the
actual tile configuration. The instructions that need to
know the shape take these runtime values as operands.<o:p></o:p></p>
</blockquote>
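<p>One way to picture allocating for MxN, NxM and MxM tiles "without
knowing what M and N are" is to compare shapes by the identity of
the runtime values rather than by their numeric contents. The C
sketch below is an assumption-laden illustration: sym_shape,
row_id, col_id and classify_shapes are hypothetical names, and an id
stands for one runtime value such as M or N.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Symbolic shape: ids of the runtime values that give rows and columns.
   Two different ids are conservatively assumed to hold different values. */
typedef struct {
    int row_id;
    int col_id;
} sym_shape;

static bool same_sym_shape(sym_shape a, sym_shape b) {
    return a.row_id == b.row_id && a.col_id == b.col_id;
}

/* Give every tile a class number such that tiles share a class only when
   their shapes are provably identical. Returns the number of classes. */
static size_t classify_shapes(const sym_shape *tiles, size_t n,
                              size_t *class_of) {
    size_t num_classes = 0;
    for (size_t i = 0; i < n; i++) {
        size_t j;
        for (j = 0; j < i; j++) {
            if (same_sym_shape(tiles[i], tiles[j])) {
                class_of[i] = class_of[j];
                break;
            }
        }
        if (j == i) { /* no earlier tile has the same symbolic shape */
            class_of[i] = num_classes++;
        }
    }
    return num_classes;
}
```

<p>With M as id 0 and N as id 1, the tiles MxN, NxM, MxM and a second
MxN fall into three classes, and only the two MxN tiles may ever
share a physical tile.</p>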
<p> <o:p></o:p></p>
<p>So you're going to multiversion the code?<o:p></o:p></p>
<p>In any case, my point is that you probably don't need a
custom register allocator. If you just define the tile
registers and make sure that the ldtilecfgs implicitly
define them all, then the regular infrastructure likely
works. You'll have a bunch of register classes, but that's
not necessarily a problem. I recommend trying this and letting
us know what you discover before we go down the road of a
new, dedicated allocator just for these registers.<o:p></o:p></p>
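<p>The consequence of an ldtilecfg that defines every tile register is
that no tile live range may span one. A minimal C sketch of that
check follows, assuming instructions are numbered by position in a
straight-line sequence; live_range, crosses_config and
count_crossings are hypothetical names, not LLVM APIs.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical live range of one virtual tile register, expressed as
   the positions of its definition and of its last use. */
typedef struct {
    int def;
    int last_use;
} live_range;

/* An ldtilecfg at cfg_pos redefines every tile register, so a value
   defined before it and used after it would be lost. */
static bool crosses_config(live_range r, int cfg_pos) {
    return r.def < cfg_pos && cfg_pos < r.last_use;
}

/* Count the live ranges that span at least one ldtilecfg position. */
static size_t count_crossings(const live_range *ranges, size_t n,
                              const int *cfg_pos, size_t m) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < m; j++) {
            if (crosses_config(ranges[i], cfg_pos[j])) {
                count++;
                break; /* count each range once */
            }
        }
    }
    return count;
}
```

<p>A result of zero corresponds to the case where the regular
allocator never has to keep a tile value alive across a
reconfiguration.</p>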
<p> -Hal<o:p></o:p></p>
<p> <o:p></o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal">There may be some artifacts coming from
the front end that conservatively assume a 16x16 tile, but
I think those generally go away in SROA or later
specialized passes. Yuanke can confirm or correct my
understanding of this.<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<div>
<div style="border:none;border-top:solid #E1E1E1
1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"
style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal">
<b>From:</b> Hal Finkel <a
href="mailto:hfinkel@anl.gov" moz-do-not-send="true"><hfinkel@anl.gov></a>
<br>
<b>Sent:</b> Wednesday, August 19, 2020 5:14 AM<br>
<b>To:</b> Luo, Yuanke <a
href="mailto:yuanke.luo@intel.com"
moz-do-not-send="true"><yuanke.luo@intel.com></a>;
Kaylor, Andrew
<a href="mailto:andrew.kaylor@intel.com"
moz-do-not-send="true"><andrew.kaylor@intel.com></a>;
Philip Reames
<a href="mailto:listmail@philipreames.com"
moz-do-not-send="true"><listmail@philipreames.com></a>;
<a href="mailto:llvm-dev@lists.llvm.org"
moz-do-not-send="true">
llvm-dev@lists.llvm.org</a>; <a
href="mailto:florian_hahn@apple.com"
moz-do-not-send="true">florian_hahn@apple.com</a>;
Topper, Craig
<a href="mailto:craig.topper@intel.com"
moz-do-not-send="true"><craig.topper@intel.com></a>;
Lu, Hongjiu
<a href="mailto:hongjiu.lu@intel.com"
moz-do-not-send="true"><hongjiu.lu@intel.com></a><br>
<b>Subject:</b> Re: [llvm-dev] Intel AMX programming
model discussion.<o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"> <o:p></o:p></p>
<p> <o:p></o:p></p>
<div>
<p class="MsoNormal">On 8/19/20 5:34 AM, Luo, Yuanke
wrote:<o:p></o:p></p>
</div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal">Having 256 register classes is not a
problem; it just seems like a lot of register classes to me.<o:p></o:p></p>
<p class="MsoNormal">We don’t assume that the shape of each
physical register is 16x16; it is defined by the user. By
variable shape, I mean that the shape is known at runtime but
unknown at compile time. Take the code below as
an example: %row and %col are variables rather than
constants. The compiler recognizes llvm.x86.tileloadd64 and
deduces that the shape of %0 is %row x %col.<o:p></o:p></p>
<p class="MsoNormal">%0 = tail call <256 x i32>
@llvm.x86.tileloadd64(i16 %row, i16 %col, i8*
getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf,
i64 0, i64 0), i64 32)<o:p></o:p></p>
</blockquote>
<p> <o:p></o:p></p>
<p>When the tile shape is unknown at compile time, how do
you plan to do the register allocation of the tiles? My
question is: do you do the allocation for this case in the
same way as you would if you knew the size was 16x16
(i.e., conservatively assume the largest size)?<o:p></o:p></p>
<p>Thanks again,<o:p></o:p></p>
<p>Hal<o:p></o:p></p>
<p> <o:p></o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal"> <o:p></o:p></p>
<div>
<div style="border:none;border-top:solid #E1E1E1
1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"
style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal">
<b>From:</b> Hal Finkel <a
href="mailto:hfinkel@anl.gov"
moz-do-not-send="true"><hfinkel@anl.gov></a>
<br>
<b>Sent:</b> Wednesday, August 19, 2020 4:58 PM<br>
<b>To:</b> Luo, Yuanke <a
href="mailto:yuanke.luo@intel.com"
moz-do-not-send="true"><yuanke.luo@intel.com></a>;
Kaylor, Andrew
<a href="mailto:andrew.kaylor@intel.com"
moz-do-not-send="true"><andrew.kaylor@intel.com></a>;
Philip Reames
<a href="mailto:listmail@philipreames.com"
moz-do-not-send="true"><listmail@philipreames.com></a>;
<a href="mailto:llvm-dev@lists.llvm.org"
moz-do-not-send="true">
llvm-dev@lists.llvm.org</a>; <a
href="mailto:florian_hahn@apple.com"
moz-do-not-send="true">florian_hahn@apple.com</a>;
Topper, Craig
<a href="mailto:craig.topper@intel.com"
moz-do-not-send="true"><craig.topper@intel.com></a>;
Lu, Hongjiu
<a href="mailto:hongjiu.lu@intel.com"
moz-do-not-send="true"><hongjiu.lu@intel.com></a><br>
<b>Subject:</b> Re: [llvm-dev] Intel AMX programming
model discussion.<o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"> <o:p></o:p></p>
<p> <o:p></o:p></p>
<div>
<p class="MsoNormal">On 8/19/20 2:21 AM, Luo, Yuanke
wrote:<o:p></o:p></p>
</div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">Hi Hal,<o:p></o:p></p>
<p class="MsoNormal">There are 3 aspects to be solved. <o:p></o:p></p>
<p class="MsoListParagraph"
style="margin-left:.5in;text-indent:-.25in;mso-list:l0
level1 lfo2">
<!--[if !supportLists]--><span style="mso-list:Ignore">1.<span
style="font:7.0pt "Times New Roman"">
</span></span><!--[endif]-->The HW supports a maximum shape of
16x16, so there are many register classes, from 1x1 to
16x16. We need 256 register classes.
<o:p></o:p></p>
<p class="MsoListParagraph"
style="margin-left:.5in;text-indent:-.25in;mso-list:l0
level1 lfo2">
<!--[if !supportLists]--><span style="mso-list:Ignore">2.<span
style="font:7.0pt "Times New Roman"">
</span></span><!--[endif]-->We want to support
variable shapes, so the compiler doesn’t know which register
class fits a tile shape, as the shape is only known at
runtime.<o:p></o:p></p>
<p class="MsoListParagraph"
style="margin-left:.5in;text-indent:-.25in;mso-list:l0
level1 lfo2">
<!--[if !supportLists]--><span style="mso-list:Ignore">3.<span
style="font:7.0pt "Times New Roman"">
</span></span><!--[endif]-->Tile configuration
configures the physical tile registers, so we need to
allocate registers first; only then do we know the shape of each
physical tile register and can configure
it.<o:p></o:p></p>
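<p class="MsoNormal">The count of 256 in the first point is just 16
possible row counts times 16 possible column counts. A small C
sketch of one possible numbering follows; class_index and
class_shape are hypothetical helpers, and the row-major numbering is
an arbitrary choice made for illustration.</p>

```c
#include <assert.h>

enum { MAX_DIM = 16, NUM_CLASSES = MAX_DIM * MAX_DIM };

/* Map a constant shape (rows x cols, each in 1..16) to a class number
   in 0..255. Returns -1 for shapes the hardware cannot hold. */
static int class_index(int rows, int cols) {
    if (rows < 1 || rows > MAX_DIM || cols < 1 || cols > MAX_DIM) {
        return -1;
    }
    return (rows - 1) * MAX_DIM + (cols - 1);
}

/* Inverse mapping: recover the shape from a class number. */
static void class_shape(int index, int *rows, int *cols) {
    assert(index >= 0 && index < NUM_CLASSES);
    *rows = index / MAX_DIM + 1;
    *cols = index % MAX_DIM + 1;
}
```

<p class="MsoNormal">A shape that is only known at runtime has no
such index at compile time, which is the difficulty described in the
second point.</p>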
<p class="MsoNormal">I think your suggestion is helpful
to reduce the complexity if we only support fixed
(constant) tile shape.<o:p></o:p></p>
<p class="MsoNormal">-Yuanke<o:p></o:p></p>
</blockquote>
<p> <o:p></o:p></p>
<p>Thanks, Yuanke.<o:p></o:p></p>
<p>It's not clear to me that having 256 register classes
is, in itself, a problem. Is it?<o:p></o:p></p>
<p>What does it mean to support variable-shape tiles in
this context? Do you do something other than
conservatively assume that they are 16x16 for
register-allocation purposes?<o:p></o:p></p>
<p> -Hal<o:p></o:p></p>
<p> <o:p></o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal"> <o:p></o:p></p>
<div>
<div style="border:none;border-top:solid #E1E1E1
1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"
style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal">
<b>From:</b> Hal Finkel <a
href="mailto:hfinkel@anl.gov"
moz-do-not-send="true"><hfinkel@anl.gov></a>
<br>
<b>Sent:</b> Wednesday, August 19, 2020 8:20 AM<br>
<b>To:</b> Kaylor, Andrew <a
href="mailto:andrew.kaylor@intel.com"
moz-do-not-send="true"><andrew.kaylor@intel.com></a>;
Philip Reames
<a href="mailto:listmail@philipreames.com"
moz-do-not-send="true"><listmail@philipreames.com></a>;
Luo, Yuanke
<a href="mailto:yuanke.luo@intel.com"
moz-do-not-send="true"><yuanke.luo@intel.com></a>;
<a href="mailto:llvm-dev@lists.llvm.org"
moz-do-not-send="true">
llvm-dev@lists.llvm.org</a>; <a
href="mailto:florian_hahn@apple.com"
moz-do-not-send="true">florian_hahn@apple.com</a>;
Topper, Craig
<a href="mailto:craig.topper@intel.com"
moz-do-not-send="true"><craig.topper@intel.com></a>;
Lu, Hongjiu
<a href="mailto:hongjiu.lu@intel.com"
moz-do-not-send="true"><hongjiu.lu@intel.com></a><br>
<b>Subject:</b> Re: [llvm-dev] Intel AMX
programming model discussion.<o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"> <o:p></o:p></p>
<p>Hi, Andy,<o:p></o:p></p>
<p>I don't quite understand everything that's going on
here. Could we model this as:<o:p></o:p></p>
<p> 1. Define a collection of register classes, one for
2x4 tiles, one for 4x2 tiles, etc., each populated with
a set of tile registers. Registers can have aliasing
relationships (instead of worrying about any kind of
subregister/superregister relationships -- these won't
be useful anyway).<o:p></o:p></p>
<p> 2. Define the tile-configuration instructions so
that they implicitly define all of the registers in
all of the classes.<o:p></o:p></p>
<p>Then you would still need to pre-schedule the tile
operations as you've described, and collect the
configuration information in order to add the
ldtilecfgs, but the regular register allocator can
handle the allocation itself in the usual way. What do
you think?<o:p></o:p></p>
<p> -Hal<o:p></o:p></p>
<div>
<p class="MsoNormal">On 8/18/20 6:58 PM, Kaylor,
Andrew via llvm-dev wrote:<o:p></o:p></p>
</div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal"
style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal">
The AMX registers are complicated. The single
configuration register (which is mostly used
implicitly, similar to MXCSR for floating point)
controls the shape of all the tile registers, and if
you change the tile configuration every single tile
register is cleared. In practice, if we have to
change the configuration while any of the tile
registers are live, performance is going to be
terrible. We need to handle this case for
correctness, but users of this programming interface
will need to have enough awareness of the
performance issues and the hardware details to
prevent this. We’ll also want a diagnostic that lets
the user know when this has happened.<o:p></o:p></p>
<p class="MsoNormal"
style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal">
<o:p></o:p></p>
<p class="MsoNormal"
style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal">
When the tile configuration is set, the shape of
each tile is locked in, so the individual tile
registers aren’t interchangeable at that point. If a
function needs 2x4 tiles, 4x2 tiles, and 4x4 tiles,
the configuration needs to be set with this in mind.
The shape isn’t explicit in every instruction and
intrinsic. It must be deduced. And again, we’ll need
a way to tell the user when efficient allocation
can’t be done. In practice, I don’t expect any
function to be using more than three tile shapes.<o:p></o:p></p>
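<p class="MsoNormal">Setting the configuration "with this in mind"
amounts to dividing the physical tiles among the shapes a function
needs. The C sketch below illustrates that budgeting under the
assumption of the 8 tile registers of palette 1; partition_tiles and
its parameters are hypothetical names, and the contiguous assignment
is just one simple policy.</p>

```c
#include <assert.h>
#include <stddef.h>

enum { NUM_PHYS_TILES = 8 };

/* needed[s] is the number of simultaneously live tiles of shape s.
   On success, first_tile[s] is the first physical tile reserved for
   shape s (shapes get contiguous blocks) and the total number of tiles
   used is returned. Returns -1 when the shapes do not fit, i.e. when
   spilling or reconfiguration would be required. */
static int partition_tiles(const int *needed, size_t num_shapes,
                           int *first_tile) {
    int next = 0;
    for (size_t s = 0; s < num_shapes; s++) {
        if (needed[s] < 0 || needed[s] > NUM_PHYS_TILES - next) {
            return -1;
        }
        first_tile[s] = next;
        next += needed[s];
    }
    return next;
}
```

<p class="MsoNormal">For two tiles each of 2x4, 4x2 and 4x4, six of
the eight tiles are used; a request for 4, 4 and 1 tiles does not
fit and would be a case for the diagnostic mentioned above.</p>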
<p class="MsoNormal"
style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal">
<o:p></o:p></p>
<p class="MsoNormal"
style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal">
The implication of all this is that I don’t think
the greedy register allocator is well suited to
figure all of this out. We need a special pass to
pre-allocate these registers. If the function is
written in a way that makes good performance
possible, it should be a relatively simple task to
allocate everything with minimal spilling. If it
isn’t possible to get good performance, we don’t
need to do anything especially clever. We can just
do something straightforward that is correct and let
the user know that they aren’t going to be happy
with the results.<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">-Andy<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<div>
<div style="border:none;border-top:solid #E1E1E1
1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"
style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal">
<b>From:</b> Philip Reames <a
href="mailto:listmail@philipreames.com"
moz-do-not-send="true"><listmail@philipreames.com></a>
<br>
<b>Sent:</b> Friday, August 14, 2020 8:29 PM<br>
<b>To:</b> Luo, Yuanke <a
href="mailto:yuanke.luo@intel.com"
moz-do-not-send="true"><yuanke.luo@intel.com></a>;
<a href="mailto:llvm-dev@lists.llvm.org"
moz-do-not-send="true">llvm-dev@lists.llvm.org</a>;
<a href="mailto:florian_hahn@apple.com"
moz-do-not-send="true">
florian_hahn@apple.com</a>; Kaylor, Andrew <a
href="mailto:andrew.kaylor@intel.com"
moz-do-not-send="true">
<andrew.kaylor@intel.com></a>; Topper,
Craig <a href="mailto:craig.topper@intel.com"
moz-do-not-send="true">
<craig.topper@intel.com></a>; Lu,
Hongjiu <a href="mailto:hongjiu.lu@intel.com"
moz-do-not-send="true"><hongjiu.lu@intel.com></a><br>
<b>Subject:</b> Re: [llvm-dev] Intel AMX
programming model discussion.<o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"> <o:p></o:p></p>
<p>I find your answer unconvincing. I'm not going to
debate it as I don't wish to take the time to build
the appropriate context, but my initial response is
skepticism.<o:p></o:p></p>
<p>Philip<o:p></o:p></p>
<div>
<p class="MsoNormal">On 8/14/20 4:49 PM, Luo, Yuanke
wrote:<o:p></o:p></p>
</div>
<blockquote
style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal">[Yuanke] AMX registers are
special. They need to be configured before use, and
the config instruction is expensive. To avoid
unnecessary tile configuration, we collect as much tile
shape information as possible and combine
it into one ldtilecfg instruction. The ldtilecfg
instruction should dominate any AMX instruction
that accesses a tile register. On the other side, the
ldtilecfg should post-dominate the instructions
that define the tile shape. For tile register
spills, we should avoid reconfiguration caused by a
different tile shape: a spilled register should
be reloaded into a register that shares the same
tile shape. Since tile register allocation is
special, and it may allocate general virtual
registers to configure the tile registers, we can add a
separate pass to do it before the general register
allocation pass. After register allocation, the
tile shape information is no longer needed, so
we can transform the pseudo AMX instructions into
real AMX instructions by removing the row and
column operands.<o:p></o:p></p>
<p>[Philip]<o:p></o:p></p>
<p>This seems complicated.<o:p></o:p></p>
<p>Reading through the documentation, there appears
to be a single global tile config for all tile
registers at any time.<o:p></o:p></p>
<p>Why not simply model this tile config as a
designated special register and the tile
instructions as having an implicit use of this
register? That would seem to ensure that the
register allocator has all the constraints
needed. You'd need to teach it how to spill the
special registers with the appropriate
instructions, but that seems a lot more
straightforward?<o:p></o:p></p>
<p class="MsoNormal"><span
style="font-size:10.5pt;line-height:105%">[Yuanke]
In that case the user needs to configure the tile
registers themselves. Spilling the configuration
register is very expensive, because it clears
all the tile data registers to zero. In our
proposal, the compiler is responsible for deducing the
shape of each virtual tile data register,
allocating physical registers for them, and then
configuring those physical registers. We may build
the dependency as you proposed, and it can be
used for machine IR checks to ensure that a tile data
register is configured before use. </span><o:p></o:p></p>
<p class="MsoNormal"><span
style="font-size:10.5pt;line-height:105%"> </span><o:p></o:p></p>
<div>
<div style="border:none;border-top:solid #E1E1E1
1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"
style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal">
<b>From:</b> Philip Reames <a
href="mailto:listmail@philipreames.com"
moz-do-not-send="true"><listmail@philipreames.com></a>
<br>
<b>Sent:</b> Saturday, August 15, 2020 1:17 AM<br>
<b>To:</b> Luo, Yuanke <a
href="mailto:yuanke.luo@intel.com"
moz-do-not-send="true"><yuanke.luo@intel.com></a>;
<a href="mailto:llvm-dev@lists.llvm.org"
moz-do-not-send="true">llvm-dev@lists.llvm.org</a>;
<a href="mailto:florian_hahn@apple.com"
moz-do-not-send="true">
florian_hahn@apple.com</a>; Kaylor, Andrew <a
href="mailto:andrew.kaylor@intel.com"
moz-do-not-send="true">
<andrew.kaylor@intel.com></a>; Topper,
Craig <a href="mailto:craig.topper@intel.com"
moz-do-not-send="true">
<craig.topper@intel.com></a>; Lu,
Hongjiu <a href="mailto:hongjiu.lu@intel.com"
moz-do-not-send="true"><hongjiu.lu@intel.com></a><br>
<b>Subject:</b> Re: [llvm-dev] Intel AMX
programming model discussion.<o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"> <o:p></o:p></p>
<p> <o:p></o:p></p>
<div>
<p class="MsoNormal">On 8/14/20 6:27 AM, Luo,
Yuanke via llvm-dev wrote:<o:p></o:p></p>
</div>
<blockquote
style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal">Hi,<o:p></o:p></p>
<p class="MsoNormal">Intel Advanced Matrix
Extensions (Intel AMX) is a new programming
paradigm consisting of two components: a set of
2-dimensional registers (tiles) representing
sub-arrays from a larger 2-dimensional memory
image, and accelerators able to operate on
tiles. The capabilities of an Intel AMX implementation are
enumerated by palettes. Two palettes are
supported: palette 0 represents the initialized
state, and palette 1 consists of 8 tile registers
of up to 1 KB each, which are controlled by a
tile control register.<o:p></o:p></p>
<p class="MsoNormal">The instruction manual is
posted at <a
href="https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html"
moz-do-not-send="true">
https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html</a>.<o:p></o:p></p>
<p class="MsoNormal">The AMX abi proposal is
posted at <a
href="https://groups.google.com/g/x86-64-abi/c/NRejFm7pwb4"
moz-do-not-send="true">
https://groups.google.com/g/x86-64-abi/c/NRejFm7pwb4</a>.<o:p></o:p></p>
<p class="MsoNormal">This email is to discuss the
programming model for AMX. Florian has
introduced the matrix type and intrinsics to
the LLVM community. We’d like to adopt some ideas
from that work.<o:p></o:p></p>
<p class="MsoNormal">Here is what we propose for
the AMX programming model.<o:p></o:p></p>
<p class="MsoListParagraph"
style="margin-left:.5in;text-indent:-.25in;mso-list:l1
level1 lfo4">
<!--[if !supportLists]--><span
style="mso-list:Ignore">1.<span
style="font:7.0pt "Times New
Roman"">
</span></span><!--[endif]--> Data type. <o:p></o:p></p>
<p class="MsoNormal">We’d like to have a fixed
vector type for AMX. Since the shape of an AMX
register is configurable, the vector size is
the maximum size of an AMX register. That means the
vector size is 1024 bytes.<o:p></o:p></p>
<p class="MsoNormal">The C code may look like
this.<o:p></o:p></p>
<p class="MsoNormal">typedef int _tile_data
__attribute__((__vector_size__(1024),
__aligned__(64)));<o:p></o:p></p>
<p class="MsoNormal">_tile_data tile;<o:p></o:p></p>
<p class="MsoNormal">And the LLVM IR may look like
this.<o:p></o:p></p>
<p class="MsoNormal">@tile = dso_local
local_unnamed_addr global <256 x i32>
zeroinitializer, align 64<o:p></o:p></p>
<p class="MsoNormal">For LLVM IR, it would be nice to
have a new type x86_amxtile that can be mapped
to AMX registers.<o:p></o:p></p>
<p class="MsoListParagraph"
style="margin-left:.5in;text-indent:-.25in;mso-list:l1
level1 lfo4">
<!--[if !supportLists]--><span
style="mso-list:Ignore">2.<span
style="font:7.0pt "Times New
Roman"">
</span></span><!--[endif]-->AMX Intrinsics. <o:p></o:p></p>
<p class="MsoNormal">The internal intrinsics are
mapped 1:1 to AMX instructions. The parameters m,
n, and k identify the shape of the tile. The shape
can be variable, but it cannot exceed the size
that the AMX HW can support. The compiler can deduce
the shape of the tile from the AMX intrinsics.<o:p></o:p></p>
<p class="MsoNormal" style="text-indent:5.5pt">_tile_data
_tile_loadd_internal(char m, short n, const void
*base, int stride);<o:p></o:p></p>
<p class="MsoNormal">_tile_data
_tile_dpbssd_internal(char m, short n, short k,
_tile_data dst, _tile_data src1, _tile_data
src2);<o:p></o:p></p>
<p class="MsoNormal">_tile_data
_tile_dpbf16ps_internal(char m, short n, short
k, _tile_data dst, _tile_data src1, _tile_data
src2);<o:p></o:p></p>
<p class="MsoNormal">void
_tile_stored_internal(char m, short n, void
*base, int stride, _tile_data tile);<o:p></o:p></p>
<p class="MsoListParagraph"
style="margin-left:.5in;text-indent:-.25in;mso-list:l1
level1 lfo4">
<!--[if !supportLists]--><span
style="mso-list:Ignore">3.<span
style="font:7.0pt "Times New
Roman"">
</span></span><!--[endif]-->User interfaces.<o:p></o:p></p>
<p class="MsoNormal">The tile shape and tile data
are combined into a struct in the C language. The
shape of the tile is only allowed to be
initialized once. The user interface looks like
this.<o:p></o:p></p>
<p class="MsoNormal">#define
__DEFAULT_FN_AMX \<o:p></o:p></p>
<p class="MsoNormal">&nbsp;&nbsp;__attribute__((__always_inline__, __nodebug__,
__target__("amx-int8")))<o:p></o:p></p>
<p class="MsoNormal">typedef struct
__tile_str {<o:p></o:p></p>
<p class="MsoNormal">&nbsp;&nbsp;const char row;<o:p></o:p></p>
<p class="MsoNormal">&nbsp;&nbsp;const short col;<o:p></o:p></p>
<p class="MsoNormal">&nbsp;&nbsp;_tile_data tile;<o:p></o:p></p>
<p class="MsoNormal">}__tile;<o:p></o:p></p>
<p class="MsoNormal">&nbsp;<o:p></o:p></p>
<p class="MsoNormal">__DEFAULT_FN_AMX<o:p></o:p></p>
<p class="MsoNormal">void __tile_loadd(__tile
*dst, const void *base, long stride) {<o:p></o:p></p>
<p class="MsoNormal">&nbsp;&nbsp;dst->tile =
_tile_loadd_internal(dst->row, dst->col,
base, stride);<o:p></o:p></p>
<p class="MsoNormal">}<o:p></o:p></p>
<p class="MsoNormal">&nbsp;<o:p></o:p></p>
<p class="MsoNormal">__DEFAULT_FN_AMX<o:p></o:p></p>
<p class="MsoNormal">void __tile_dpbsud(__tile
*dst, __tile src1, __tile src2) {<o:p></o:p></p>
<p class="MsoNormal">&nbsp;&nbsp;dst->tile =
_tile_dpbssd_internal(src1.row, src2.col,
src1.col, dst->tile, src1.tile, src2.tile);<o:p></o:p></p>
<p class="MsoNormal">}<o:p></o:p></p>
<p class="MsoNormal">&nbsp;<o:p></o:p></p>
<p class="MsoNormal">__DEFAULT_FN_AMX<o:p></o:p></p>
<p class="MsoNormal">void __tile_stored(void
*base, long stride, __tile src) {<o:p></o:p></p>
<p class="MsoNormal">&nbsp;&nbsp;_tile_stored_internal(src.row, src.col, base,
stride, src.tile);<o:p></o:p></p>
<p class="MsoNormal">}<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoListParagraph"
style="margin-left:.5in;text-indent:-.25in;mso-list:l1
level1 lfo4">
<!--[if !supportLists]--><span
style="mso-list:Ignore">4.<span
style="font:7.0pt "Times New
Roman"">
</span></span><!--[endif]-->Example code<o:p></o:p></p>
<p class="MsoNormal">The example shows how to use
the user interface in a function.
<o:p></o:p></p>
<p class="MsoNormal">void api(int cond, short
row, short col) {<o:p></o:p></p>
<p class="MsoNormal">__tile a = {row, col};<o:p></o:p></p>
<p class="MsoNormal">__tile b = {row, col};<o:p></o:p></p>
<p class="MsoNormal">__tile c = {row, col};<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">if (cond) {<o:p></o:p></p>
<p class="MsoNormal">__tile_loadd(&a,
buf, STRIDE);<o:p></o:p></p>
<p class="MsoNormal">__tile_loadd(&b,
buf, STRIDE);<o:p></o:p></p>
<p class="MsoNormal">__tile_loadd(&c,
buf, STRIDE);<o:p></o:p></p>
<p class="MsoNormal">} else {<o:p></o:p></p>
<p class="MsoNormal">__tile_loadd(&a,
buf2, STRIDE);<o:p></o:p></p>
<p class="MsoNormal">__tile_loadd(&b,
buf2, STRIDE);<o:p></o:p></p>
<p class="MsoNormal">__tile_loadd(&c,
buf2, STRIDE);<o:p></o:p></p>
<p class="MsoNormal">}<o:p></o:p></p>
<p class="MsoNormal">__tile_dpbsud(&c, a, b);<o:p></o:p></p>
<p class="MsoNormal">__tile_stored(buf,
STRIDE, c);<o:p></o:p></p>
<p class="MsoNormal">}<o:p></o:p></p>
<p class="MsoListParagraph"
style="margin-left:.5in;text-indent:-.25in;mso-list:l1
level1 lfo4">
<!--[if !supportLists]--><span
style="mso-list:Ignore">5.<span
style="font:7.0pt "Times New
Roman"">
</span></span><!--[endif]-->LLVM IR<o:p></o:p></p>
<p class="MsoNormal">The LLVM IR intrinsics take
the row and column information as input
parameters so that the compiler can deduce the
shape of the tile data. The remaining parameters
are what the AMX instructions require. This is
the LLVM IR corresponding to the example code.<o:p></o:p></p>
<p class="MsoNormal">define dso_local void
@api(i32 %cond, i16 signext %row, i16 signext
%col) local_unnamed_addr #2 {<o:p></o:p></p>
<p class="MsoNormal">entry:<o:p></o:p></p>
<p class="MsoNormal">%tobool = icmp eq i32
%cond, 0<o:p></o:p></p>
<p class="MsoNormal">%sext = shl i16 %col, 8<o:p></o:p></p>
<p class="MsoNormal">%conv.i31 = ashr exact
i16 %sext, 8<o:p></o:p></p>
<p class="MsoNormal">br i1 %tobool, label
%if.else, label %if.then<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">if.then: ; preds = %entry<o:p></o:p></p>
<p class="MsoNormal">%0 = tail call <256 x
i32> @llvm.x86.tileloadd64(i16 %row, i16
%conv.i31, i8* getelementptr inbounds ([1024 x
i8], [1024 x i8]* @buf, i64 0, i64 0), i64 32)
#3<o:p></o:p></p>
<p class="MsoNormal">%1 = tail call <256 x
i32> @llvm.x86.tileloadd64(i16 %row, i16
%conv.i31, i8* getelementptr inbounds ([1024 x
i8], [1024 x i8]* @buf, i64 0, i64 0), i64 32)
#3<o:p></o:p></p>
<p class="MsoNormal">%2 = tail call <256 x
i32> @llvm.x86.tileloadd64(i16 %row, i16
%conv.i31, i8* getelementptr inbounds ([1024 x
i8], [1024 x i8]* @buf, i64 0, i64 0), i64 32)
#3<o:p></o:p></p>
<p class="MsoNormal">br label %if.end<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">if.else: ; preds = %entry<o:p></o:p></p>
<p class="MsoNormal">%3 = tail call <256 x
i32> @llvm.x86.tileloadd64(i16 %row, i16
%conv.i31, i8* getelementptr inbounds ([1024 x
i8], [1024 x i8]* @buf2, i64 0, i64 0), i64 32)
#3<o:p></o:p></p>
<p class="MsoNormal">%4 = tail call <256 x
i32> @llvm.x86.tileloadd64(i16 %row, i16
%conv.i31, i8* getelementptr inbounds ([1024 x
i8], [1024 x i8]* @buf2, i64 0, i64 0), i64 32)
#3<o:p></o:p></p>
<p class="MsoNormal">%5 = tail call <256 x
i32> @llvm.x86.tileloadd64(i16 %row, i16
%conv.i31, i8* getelementptr inbounds ([1024 x
i8], [1024 x i8]* @buf2, i64 0, i64 0), i64 32)
#3<o:p></o:p></p>
<p class="MsoNormal">br label %if.end<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">if.end: ; preds = %if.else, %if.then<o:p></o:p></p>
<p class="MsoNormal">%a.sroa.1186.0 = phi
<256 x i32> [ %3, %if.else ], [ %0,
%if.then ]<o:p></o:p></p>
<p class="MsoNormal">%b.sroa.1068.0 = phi
<256 x i32> [ %4, %if.else ], [ %1,
%if.then ]<o:p></o:p></p>
<p class="MsoNormal">%c.sroa.1149.0 = phi
<256 x i32> [ %5, %if.else ], [ %2,
%if.then ]<o:p></o:p></p>
<p class="MsoNormal">%6 = tail call <256 x
i32> @llvm.x86.tdpbssd(i16 %row, i16
%conv.i31, i16 %conv.i31, <256 x i32>
%c.sroa.1149.0, <256 x i32>
%a.sroa.1186.0, <256 x i32>
%b.sroa.1068.0) #3<o:p></o:p></p>
<p class="MsoNormal">tail call void
@llvm.x86.tilestored64(i16 %row, i16 %conv.i31,
i8* getelementptr inbounds ([1024 x i8], [1024 x
i8]* @buf, i64 0, i64 0), i64 32, <256 x
i32> %6) #3<o:p></o:p></p>
<p class="MsoNormal">ret void<o:p></o:p></p>
<p class="MsoNormal">}<o:p></o:p></p>
<p class="MsoListParagraph"
style="margin-left:.5in;text-indent:-.25in;mso-list:l1
level1 lfo4">
<!--[if !supportLists]--><span
style="mso-list:Ignore">6.<span
style="font:7.0pt "Times New
Roman"">
</span></span><!--[endif]-->Shape propagation<o:p></o:p></p>
<p class="MsoNormal">In an -O0 build, the front
end generates some general loads and stores of
tile vectors. We need to start from the AMX
intrinsics and propagate the shape information to
the virtual tile registers. If an AMX intrinsic
uses the result of a load instruction, the shape
is propagated to the load and the load is
transformed into a tile load intrinsic. If a
store instruction uses the result of an AMX
intrinsic, the shape is propagated to the store
instruction and the store is transformed into a
tile store intrinsic.<o:p></o:p></p>
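<p class="MsoNormal">As a rough illustration of the propagation described above, here is a minimal sketch over a toy instruction list. The <code>struct inst</code> layout, the opcode names and <code>propagate_shapes</code> are hypothetical and are not LLVM data structures. For simplicity the sketch assumes every operand shares the shape recorded on the AMX intrinsic and that operands appear earlier in the list than their users.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy IR for illustration only; not LLVM's representation. */
enum opcode { OP_LOAD, OP_STORE, OP_AMX, OP_TILE_LOAD, OP_TILE_STORE };

struct inst {
    enum opcode op;
    int src[2];     /* indices of operand instructions, -1 if unused */
    short row, col; /* shape; 0 means "not known yet" */
};

/* Root at each AMX intrinsic and push its shape to plain loads that feed it
   and to plain stores that consume its result. */
static void propagate_shapes(struct inst *code, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (code[i].op == OP_AMX) {
            /* AMX intrinsic uses a load result: the load becomes a tile load. */
            for (int k = 0; k < 2; k++) {
                int s = code[i].src[k];
                if (s >= 0 && code[s].op == OP_LOAD) {
                    code[s].op = OP_TILE_LOAD;
                    code[s].row = code[i].row;
                    code[s].col = code[i].col;
                }
            }
        } else if (code[i].op == OP_STORE) {
            /* Store uses an AMX result: the store becomes a tile store. */
            int s = code[i].src[0];
            if (s >= 0 && code[s].op == OP_AMX) {
                code[i].op = OP_TILE_STORE;
                code[i].row = code[s].row;
                code[i].col = code[s].col;
            }
        }
    }
}
```

<p class="MsoNormal">Loads and stores that never touch an AMX intrinsic are left untouched, which matches the idea that only tile-typed values need a shape.</p>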
<p class="MsoListParagraph"
style="margin-left:.5in;text-indent:-.25in;mso-list:l1
level1 lfo4">
<!--[if !supportLists]--><span
style="mso-list:Ignore">7.<span
style="font:7.0pt "Times New
Roman"">
</span></span><!--[endif]-->Machine IR<o:p></o:p></p>
<p class="MsoNormal">Since the AMX intrinsics take
the row and column as input parameters, we can
create a pseudo instruction corresponding to each
of them. The AMX intrinsics are lowered to pseudo
AMX instructions, which carry extra row and
column operands corresponding to those of the
intrinsic. The real AMX instructions do not need
the row and column operands. The row and column
information must be configured by ldtilecfg
before any AMX instruction executes.<o:p></o:p></p>
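<p class="MsoNormal">To make the pseudo versus real distinction concrete, here is a small sketch. The types <code>pseudo_tdp</code>, <code>real_tdp</code>, <code>tile_cfg</code> and the function <code>lower_pseudo</code> are hypothetical names invented for this illustration. The sketch only checks the destination tile against the configured shape before dropping the shape operands.</p>

```c
#include <assert.h>

enum { NUM_TILES = 8 };

/* Hypothetical pseudo instruction: tile register numbers plus the shape
   operands (m, n, k) inherited from the intrinsic. */
struct pseudo_tdp { int dst, src1, src2; short m, n, k; };

/* Hypothetical real instruction: register operands only. */
struct real_tdp { int dst, src1, src2; };

/* Shape assumed to be programmed for each tile register by ldtilecfg. */
struct tile_cfg { short rows[NUM_TILES]; short cols[NUM_TILES]; };

/* Drop the shape operands. Returns 0 on success, or -1 if the destination
   register is out of range or is not configured with the m x n shape the
   pseudo instruction expects. */
static int lower_pseudo(const struct pseudo_tdp *p,
                        const struct tile_cfg *cfg,
                        struct real_tdp *out)
{
    if (p->dst < 0 || p->dst >= NUM_TILES)
        return -1;
    if (cfg->rows[p->dst] != p->m || cfg->cols[p->dst] != p->n)
        return -1;
    out->dst = p->dst;
    out->src1 = p->src1;
    out->src2 = p->src2;
    return 0;
}
```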
<p class="MsoListParagraph"
style="margin-left:.5in;text-indent:-.25in;mso-list:l1
level1 lfo4">
<!--[if !supportLists]--><span
style="mso-list:Ignore">8.<span
style="font:7.0pt "Times New
Roman"">
</span></span><!--[endif]-->Register
allocation<o:p></o:p></p>
<p class="MsoNormal">AMX registers are special.
They need to be configured before use, and the
config instruction is expensive. To avoid
unnecessary tile configuration, we collect as
much tile shape information as possible and
combine it into one ldtilecfg instruction. The
ldtilecfg instruction should dominate any AMX
instruction that accesses a tile register. On the
other hand, the ldtilecfg should post-dominate
the instructions that define the tile shape. For
tile register spills, re-configuration due to
differing tile shapes should be avoided, so a
spilled register should be reloaded into a
register that shares the same tile shape. Since
tile register allocation is special and may
allocate general virtual registers to configure
the tile registers, we can add a separate pass
that runs before the general register allocation
pass. After register allocation, the tile shape
information is no longer needed, so we can
transform the pseudo AMX instructions into real
AMX instructions by removing the row and column
operands.<o:p></o:p></p>
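<p class="MsoNormal">As a sketch of what "combine them into one ldtilecfg" amounts to, the code below fills the 64-byte memory image that ldtilecfg reads. The layout used here (palette in byte 0, bytes per row as 16-bit values starting at byte 16, row counts starting at byte 48, at most 16 rows and 64 bytes per row for palette 1) follows my reading of the Intel AMX documentation and should be checked against it. <code>tile_shape</code> and <code>build_tile_config</code> are hypothetical names, and the column value is assumed to be expressed in bytes.</p>

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { NUM_TILES = 8, CFG_BYTES = 64 };

/* Hypothetical per-register shape collected before register allocation. */
struct tile_shape {
    uint8_t rows;   /* number of rows */
    uint16_t colsb; /* bytes per row */
};

/* Fill the ldtilecfg memory image for tiles 0..count-1.
   Returns 0 on success, -1 if count or a shape is out of range. */
static int build_tile_config(uint8_t cfg[CFG_BYTES],
                             const struct tile_shape *shapes, int count)
{
    if (count < 0 || count > NUM_TILES)
        return -1;
    memset(cfg, 0, CFG_BYTES);
    cfg[0] = 1; /* palette 1 (assumed) */
    for (int i = 0; i < count; i++) {
        if (shapes[i].rows > 16 || shapes[i].colsb > 64)
            return -1;
        /* bytes per row: little-endian 16-bit field at offset 16 + 2*i */
        cfg[16 + 2 * i] = (uint8_t)(shapes[i].colsb & 0xff);
        cfg[16 + 2 * i + 1] = (uint8_t)(shapes[i].colsb >> 8);
        /* row count: one byte at offset 48 + i */
        cfg[48 + i] = shapes[i].rows;
    }
    return 0;
}
```

<p class="MsoNormal">Because one image describes all eight tile registers, every shape that is live in a region has to be known before the single ldtilecfg is emitted, which is why the shapes are collected first.</p>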
</blockquote>
<p>This seems complicated.<o:p></o:p></p>
<p>Reading through the documentation, there appears
to be a single global tile config for all tile
registers at any time.<o:p></o:p></p>
<p>Why not simply model this tile config as a
designated special register and the tile
instructions as having an implicit use of this
register? That would seem to ensure that the
register allocator has all the constraints
needed. You'd need to teach it how to spill the
special registers with the appropriate
instructions, but that seems a lot more
straightforward?<o:p></o:p></p>
<blockquote
style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoListParagraph"
style="margin-left:.5in;text-indent:-.25in;mso-list:l1
level1 lfo4">
<!--[if !supportLists]--><span
style="mso-list:Ignore">9.<span
style="font:7.0pt "Times New
Roman"">
</span></span><!--[endif]-->Use recommendation
<o:p></o:p></p>
<p class="MsoNormal">Because of the shape
configuration issue, we recommend that users
define the tile shape at function entry and
inline functions as much as possible. The AMX
instructions focus on computation rather than
storage, so global variables for tile data are
not recommended.<o:p></o:p></p>
<p class="MsoNormal"><span
style="font-size:10.5pt;line-height:105%"> </span><o:p></o:p></p>
<p class="MsoNormal"><span
style="font-size:10.5pt;line-height:105%">Thanks</span><o:p></o:p></p>
<p class="MsoNormal"><span
style="font-size:10.5pt;line-height:105%">Yuanke</span><o:p></o:p></p>
<pre>_______________________________________________<o:p></o:p></pre>
<pre>LLVM Developers mailing list<o:p></o:p></pre>
<pre><a href="mailto:llvm-dev@lists.llvm.org" moz-do-not-send="true">llvm-dev@lists.llvm.org</a><o:p></o:p></pre>
<pre><a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" moz-do-not-send="true">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><o:p></o:p></pre>
</blockquote>
</blockquote>
</blockquote>
<pre>-- <o:p></o:p></pre>
<pre>Hal Finkel<o:p></o:p></pre>
<pre>Lead, Compiler Technology and Programming Languages<o:p></o:p></pre>
<pre>Leadership Computing Facility<o:p></o:p></pre>
<pre>Argonne National Laboratory<o:p></o:p></pre>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</div>
</blockquote>
<pre class="moz-signature" cols="72">--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre>
</body>
</html>