<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
  </head>
  <body>
    <p><br>
    </p>
    <div class="moz-cite-prefix">On 8/20/20 2:47 PM, Topper, Craig
      wrote:<br>
    </div>
    <blockquote type="cite" cite="mid:MWHPR11MB0046DC94CD931CD57620FD42935A0@MWHPR11MB0046.namprd11.prod.outlook.com">
      
      <div class="WordSection1">
        <p class="MsoNormal">I think I’m still missing something here.
          The configuration is per tile. The multiply instructions take
          an MxK tile and multiply it by a KxN tile and accumulate into
          an MxN tile. So the configuration needs to know how many of
          each size of tile it needs to avoid a spill. Wouldn’t the
          register allocator then need to know which physical tiles have
          been configured to which sizes so that it only chooses those
          tiles for an operand that needs that size?</p>
      </div>
    </blockquote>
    <p><br>
    </p>
    <p>Yes, I think so. But it will know, because that information is
      essentially encoded in the virtual register classes. I certainly
      could be missing something. It seems like you first figure out
      how many tiles of each size are needed, and then you assign
      virtual tile registers corresponding to the correct tile sizes.
      Perhaps this comes down to what you mean by "avoid a spill." We
      still might spill, and I assume that the infrastructure always
      needs to deal with that. We should continue to do instruction
      scheduling in order to minimize register pressure. Once we assign
      the right virtual register classes to the AMX instructions,
      shouldn't this automatically happen? If we do spill, then, since
      none of the original live ranges cross the ldtilecfg, there
      shouldn't be any fundamental issue with using a regular
      load/store spill implementation.</p>
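    <p>To make the "encoded in the register classes" point concrete,
      here is a minimal sketch, not LLVM code. It assumes a
      hypothetical table <code>configured</code> that maps physical
      tile numbers to the shape they were configured with, and it
      treats shapes as abstract (rows, columns) pairs, ignoring the
      byte-based and packed layouts of the real hardware. An operand of
      a given shape may only be assigned a tile from the matching
      set:</p>
    <pre>
```python
def tiles_with_shape(configured, shape):
    """Return the physical tile indices whose configured shape equals `shape`."""
    return [tile for tile, s in sorted(configured.items()) if s == shape]


def multiply_candidates(configured, m, k, n):
    """Candidate physical tiles for C[MxN] += A[MxK] * B[KxN].

    `configured` is a hypothetical dict: tile index -> (rows, cols).
    """
    return {
        "acc": tiles_with_shape(configured, (m, n)),
        "lhs": tiles_with_shape(configured, (m, k)),
        "rhs": tiles_with_shape(configured, (k, n)),
    }


# Illustrative configuration only: two 4x4 tiles, one 4x2, one 2x4.
configured = {0: (4, 4), 1: (4, 4), 2: (4, 2), 3: (2, 4)}
print(multiply_candidates(configured, 4, 2, 4))
```
    </pre>
    <p>In this picture each distinct shape plays the role of one
      register class, and an empty candidate list corresponds to a
      shape that the configuration did not provide for.</p>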
    <p>I'm definitely not an expert in this instruction set, so I may
      just not understand some aspect of this. If there's something I'm
      overlooking, a little example would be helpful.<br>
    </p>
    <p>Thanks again,</p>
    <p>Hal</p>
    <p><br>
    </p>
    <blockquote type="cite"
cite="mid:MWHPR11MB0046DC94CD931CD57620FD42935A0@MWHPR11MB0046.namprd11.prod.outlook.com">
      <div class="WordSection1">
        <p class="MsoNormal"><o:p></o:p></p>
        <p class="MsoNormal">~Craig<o:p></o:p></p>
        <div>
          <div style="border:none;border-top:solid #E1E1E1
            1.0pt;padding:3.0pt 0in 0in 0in">
            <p class="MsoNormal"
              style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal">
              <b>From:</b> Hal Finkel <a class="moz-txt-link-rfc2396E" href="mailto:hfinkel@anl.gov">&lt;hfinkel@anl.gov&gt;</a> <br>
              <b>Sent:</b> Thursday, August 20, 2020 12:35 PM<br>
              <b>To:</b> Topper, Craig <a class="moz-txt-link-rfc2396E" href="mailto:craig.topper@intel.com">&lt;craig.topper@intel.com&gt;</a>;
              Kaylor, Andrew <a class="moz-txt-link-rfc2396E" href="mailto:andrew.kaylor@intel.com">&lt;andrew.kaylor@intel.com&gt;</a>; Luo,
              Yuanke <a class="moz-txt-link-rfc2396E" href="mailto:yuanke.luo@intel.com">&lt;yuanke.luo@intel.com&gt;</a>; Philip Reames
              <a class="moz-txt-link-rfc2396E" href="mailto:listmail@philipreames.com">&lt;listmail@philipreames.com&gt;</a>;
              <a class="moz-txt-link-abbreviated" href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>; <a class="moz-txt-link-abbreviated" href="mailto:florian_hahn@apple.com">florian_hahn@apple.com</a>; Lu,
              Hongjiu <a class="moz-txt-link-rfc2396E" href="mailto:hongjiu.lu@intel.com">&lt;hongjiu.lu@intel.com&gt;</a><br>
              <b>Subject:</b> Re: [llvm-dev] Intel AMX programming model
              discussion.<o:p></o:p></p>
          </div>
        </div>
        <p class="MsoNormal"><o:p> </o:p></p>
        <p><o:p> </o:p></p>
        <div>
          <p class="MsoNormal">On 8/19/20 3:09 PM, Topper, Craig wrote:<o:p></o:p></p>
        </div>
        <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
          <p class="MsoNormal">The width and height can be runtime
            values that we would just copy into the 64-byte
            configuration block that we pass to ldtilecfg. So the code
            doesn’t need to be multiversioned. The user code would also
            use those values to update pointers in the loops they write
            using the tiles. If we can’t determine that two tiles were
            defined with the same width and height, we need to assume
            that their shapes are different and try to avoid ever giving
            them the same tile.<o:p></o:p></p>
          <p class="MsoNormal">Hal, in your suggestion, would the
            assignment of physical registers to register classes be
            defined dynamically before register allocation?<o:p></o:p></p>
        </blockquote>
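        <p>As an illustration of copying runtime shapes into the
          64-byte block, here is a minimal sketch. The byte offsets
          (palette in byte 0, bytes-per-row as 16-bit little-endian
          values starting at byte 16, row counts starting at byte 48)
          and the limit of eight tiles are my reading of the published
          AMX layout, not something stated in this thread, so treat
          them as assumptions. The helper name
          <code>make_tile_config</code> is hypothetical.</p>
        <pre>
```python
def make_tile_config(shapes, palette=1):
    """Build a 64-byte ldtilecfg memory image.

    `shapes` is a list of (rows, bytes_per_row) pairs, one per tile, in
    tile-number order. Offsets below are assumed, see the note above.
    """
    if len(shapes) > 8:
        raise ValueError("assumed limit: 8 tile registers")
    cfg = bytearray(64)
    cfg[0] = palette  # byte 0: palette id; byte 1 (start_row) stays 0
    for i, (rows, colsb) in enumerate(shapes):
        if rows not in range(1, 17) or colsb not in range(1, 65):
            raise ValueError("shape out of range: %r" % ((rows, colsb),))
        # Assumed layout: 16-bit bytes-per-row at offset 16 + 2*i.
        cfg[16 + 2 * i:18 + 2 * i] = colsb.to_bytes(2, "little")
        # Assumed layout: 8-bit row count at offset 48 + i.
        cfg[48 + i] = rows
    return bytes(cfg)


# Runtime values would be plugged in here; constants are only for show.
image = make_tile_config([(16, 64), (4, 16)])
print(len(image), image[0], image[48], image[49])
```
        </pre>
        <p>Because the shapes are ordinary values written into memory,
          nothing here needs to be known at compile time, which is why
          multiversioning is not required.</p>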
        <p><o:p> </o:p></p>
        <p>Here's my thought:<o:p></o:p></p>
        <p>First, you have a set of intrinsics that take tile values
          along with tile configuration parameters (which, presently,
          seem just to be the sizes). These get lowered into
          pseudo-instructions that do the same. Thus, you have some
          register class that represents these arbitrarily-sized tile
          registers that you'll assign to these pseudo-instruction
          operands (i.e., they take virtual tile registers right after
          instruction selection). You might use the 16x16 tile register
          class for this purpose, but it shouldn't really matter.<o:p></o:p></p>
        <p>Second, you run this configuration-placement pass. This pass
          looks at all of the AMX pseudo-instructions and identifies
          regions in which the pseudo-instructions use the same
          configuration parameters (i.e., the same SSA values and/or
          constants). This pass might reorder the pseudo-instructions
          when legal in order to form larger regions. Then it places the
          ldtilecfg at the start of each region (in some common
          dominating position). ldtilecfg implicitly defines all of the
          tile registers in every concrete class of tile registers (all
          256 of them, or whatever). The pseudo-instructions are
          replaced by real MI instructions taking a tile register class
          appropriate for the configuration (which will default to the
          16x16 class for cases where the configuration is not a
          compile-time-known constant). When the configuration is a
          known constant, the instructions take operands with a register
          class appropriate for that configuration (e.g., 1x1, 4x4).<o:p></o:p></p>
        <p>Third, the rest of the framework runs as usual. Tile
          registers from the appropriate class are allocated by the
          register allocator. No live range of any virtual tile register
          can pass through the ldtilecfg (because it defines them all),
          but that's okay: by construction, none of the live ranges
          will (the configuration-placement pass ensures this).<o:p></o:p></p>
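        <p>The second step can be sketched roughly as follows. This is
          a deliberately simplified model, assuming straight-line code
          only, no reordering, and a hypothetical representation in
          which each pseudo-instruction is an (opcode, configuration)
          pair whose configuration is a tuple of SSA names and/or
          constants. A real pass would also have to handle control flow
          and dominance.</p>
        <pre>
```python
def place_tile_configs(pseudo_ops):
    """Insert an ("ldtilecfg", config) marker before each region.

    A region is a maximal run of consecutive pseudo-instructions that
    share the same configuration key. Straight-line code only.
    """
    out = []
    current = None
    for opcode, config in pseudo_ops:
        if config != current:
            # Configuration changes here: start a new region.
            out.append(("ldtilecfg", config))
            current = config
        out.append((opcode, config))
    return out


ops = [
    ("tileload", ("%m", "%k")),
    ("tileload", ("%m", "%k")),
    ("tilestore", (16, 16)),
]
for item in place_tile_configs(ops):
    print(item)
```
        </pre>
        <p>Comparing configuration keys for equality mirrors the
          "same SSA values and/or constants" test described above; two
          keys that cannot be proven equal simply start separate
          regions.</p>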
        <p>Thanks again,<o:p></o:p></p>
        <p>Hal<o:p></o:p></p>
        <p><o:p> </o:p></p>
        <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
          <p class="MsoNormal"> <o:p></o:p></p>
          <div>
            <div style="border:none;border-top:solid #E1E1E1
              1.0pt;padding:3.0pt 0in 0in 0in">
              <p class="MsoNormal"
                style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal">
                <b>From:</b> Hal Finkel <a
                  href="mailto:hfinkel@anl.gov" moz-do-not-send="true">&lt;hfinkel@anl.gov&gt;</a>
                <br>
                <b>Sent:</b> Wednesday, August 19, 2020 12:52 PM<br>
                <b>To:</b> Kaylor, Andrew <a
                  href="mailto:andrew.kaylor@intel.com"
                  moz-do-not-send="true">&lt;andrew.kaylor@intel.com&gt;</a>;
                Luo, Yuanke
                <a href="mailto:yuanke.luo@intel.com"
                  moz-do-not-send="true">&lt;yuanke.luo@intel.com&gt;</a>;
                Philip Reames <a
                  href="mailto:listmail@philipreames.com"
                  moz-do-not-send="true">
                  &lt;listmail@philipreames.com&gt;</a>; <a
                  href="mailto:llvm-dev@lists.llvm.org"
                  moz-do-not-send="true">llvm-dev@lists.llvm.org</a>;
                <a href="mailto:florian_hahn@apple.com"
                  moz-do-not-send="true">florian_hahn@apple.com</a>;
                Topper, Craig
                <a href="mailto:craig.topper@intel.com"
                  moz-do-not-send="true">&lt;craig.topper@intel.com&gt;</a>;
                Lu, Hongjiu
                <a href="mailto:hongjiu.lu@intel.com"
                  moz-do-not-send="true">&lt;hongjiu.lu@intel.com&gt;</a><br>
                <b>Subject:</b> Re: [llvm-dev] Intel AMX programming
                model discussion.<o:p></o:p></p>
            </div>
          </div>
          <p class="MsoNormal"> <o:p></o:p></p>
          <p> <o:p></o:p></p>
          <div>
            <p class="MsoNormal">On 8/19/20 10:24 AM, Kaylor, Andrew
              wrote:<o:p></o:p></p>
          </div>
          <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
            <p>&gt; When the tile shape is unknown at compile time, how
              do you plan to do the register allocation of the tiles? My
              question is: do you do the allocation for this case in the
              same way as you would if you knew the size was 16x16
              (i.e., conservatively assume the largest size)?<o:p></o:p></p>
            <p class="MsoNormal">I think what will happen is that the
              registers are allocated based on a number of runtime
              values that are assumed to be different from one another
              but less than or equal to 16. So, for example, we’ll
              allocate registers for MxN tiles, NxM tiles and MxM tiles
              without knowing what M and N are. Then at runtime the
              values of these variables will be used to create the
              actual tile configuration. The instructions that need to
              know the shape take these runtime values as operands.<o:p></o:p></p>
          </blockquote>
          <p> <o:p></o:p></p>
          <p>So you're going to multiversion the code?<o:p></o:p></p>
          <p>In any case, my point is that you probably don't need a
            custom register allocator. If you just define the tile
            registers and make sure that the ldtilecfgs implicitly
            define them all, then the regular infrastructure likely
            works. You'll have a bunch of register classes, but that's
            not necessarily a problem. I recommend trying this, and
            letting us know what you discover, before we go down the
            road of a new, dedicated allocator just for these registers.<o:p></o:p></p>
          <p> -Hal<o:p></o:p></p>
          <p> <o:p></o:p></p>
          <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
            <p class="MsoNormal">There may be some artifacts coming from
              the front end that conservatively assume a 16x16 tile, but
              I think those generally go away in SROA or later
              specialized passes. Yuanke can confirm or correct my
              understanding of this.<o:p></o:p></p>
            <p class="MsoNormal"> <o:p></o:p></p>
            <div>
              <div style="border:none;border-top:solid #E1E1E1
                1.0pt;padding:3.0pt 0in 0in 0in">
                <p class="MsoNormal"
                  style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal">
                  <b>From:</b> Hal Finkel <a
                    href="mailto:hfinkel@anl.gov" moz-do-not-send="true">&lt;hfinkel@anl.gov&gt;</a>
                  <br>
                  <b>Sent:</b> Wednesday, August 19, 2020 5:14 AM<br>
                  <b>To:</b> Luo, Yuanke <a
                    href="mailto:yuanke.luo@intel.com"
                    moz-do-not-send="true">&lt;yuanke.luo@intel.com&gt;</a>;
                  Kaylor, Andrew
                  <a href="mailto:andrew.kaylor@intel.com"
                    moz-do-not-send="true">&lt;andrew.kaylor@intel.com&gt;</a>;
                  Philip Reames
                  <a href="mailto:listmail@philipreames.com"
                    moz-do-not-send="true">&lt;listmail@philipreames.com&gt;</a>;
                  <a href="mailto:llvm-dev@lists.llvm.org"
                    moz-do-not-send="true">
                    llvm-dev@lists.llvm.org</a>; <a
                    href="mailto:florian_hahn@apple.com"
                    moz-do-not-send="true">florian_hahn@apple.com</a>;
                  Topper, Craig
                  <a href="mailto:craig.topper@intel.com"
                    moz-do-not-send="true">&lt;craig.topper@intel.com&gt;</a>;
                  Lu, Hongjiu
                  <a href="mailto:hongjiu.lu@intel.com"
                    moz-do-not-send="true">&lt;hongjiu.lu@intel.com&gt;</a><br>
                  <b>Subject:</b> Re: [llvm-dev] Intel AMX programming
                  model discussion.<o:p></o:p></p>
              </div>
            </div>
            <p class="MsoNormal"> <o:p></o:p></p>
            <p> <o:p></o:p></p>
            <div>
              <p class="MsoNormal">On 8/19/20 5:34 AM, Luo, Yuanke
                wrote:<o:p></o:p></p>
            </div>
            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
              <p class="MsoNormal">There is no problem with having 256
                register classes; it just seems like a lot of register
                classes to me.<o:p></o:p></p>
              <p class="MsoNormal">We don’t assume that the shape of
                each physical register is 16x16; it is defined by the
                user. By variable shape, I mean that the shape is known
                only at run time and is unknown at compile time. Take
                the code below as an example: %row and %col are
                variables rather than constants. The compiler recognizes
                llvm.x86.tileloadd64 and deduces that the shape of %0 is
                %row x %col.<o:p></o:p></p>
              <p class="MsoNormal">%0 = tail call &lt;256 x i32&gt;
                @llvm.x86.tileloadd64(i16 %row, i16 %col, i8*
                getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf,
                i64 0, i64 0), i64 32)<o:p></o:p></p>
            </blockquote>
            <p> <o:p></o:p></p>
            <p>When the tile shape is unknown at compile time, how do
              you plan to do the register allocation of the tiles? My
              question is: do you do the allocation for this case in the
              same way as you would if you knew the size was 16x16
              (i.e., conservatively assume the largest size)?<o:p></o:p></p>
            <p>Thanks again,<o:p></o:p></p>
            <p>Hal<o:p></o:p></p>
            <p> <o:p></o:p></p>
            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
              <p class="MsoNormal"> <o:p></o:p></p>
              <div>
                <div style="border:none;border-top:solid #E1E1E1
                  1.0pt;padding:3.0pt 0in 0in 0in">
                  <p class="MsoNormal"
                    style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal">
                    <b>From:</b> Hal Finkel <a
                      href="mailto:hfinkel@anl.gov"
                      moz-do-not-send="true">&lt;hfinkel@anl.gov&gt;</a>
                    <br>
                    <b>Sent:</b> Wednesday, August 19, 2020 4:58 PM<br>
                    <b>To:</b> Luo, Yuanke <a
                      href="mailto:yuanke.luo@intel.com"
                      moz-do-not-send="true">&lt;yuanke.luo@intel.com&gt;</a>;
                    Kaylor, Andrew
                    <a href="mailto:andrew.kaylor@intel.com"
                      moz-do-not-send="true">&lt;andrew.kaylor@intel.com&gt;</a>;
                    Philip Reames
                    <a href="mailto:listmail@philipreames.com"
                      moz-do-not-send="true">&lt;listmail@philipreames.com&gt;</a>;
                    <a href="mailto:llvm-dev@lists.llvm.org"
                      moz-do-not-send="true">
                      llvm-dev@lists.llvm.org</a>; <a
                      href="mailto:florian_hahn@apple.com"
                      moz-do-not-send="true">florian_hahn@apple.com</a>;
                    Topper, Craig
                    <a href="mailto:craig.topper@intel.com"
                      moz-do-not-send="true">&lt;craig.topper@intel.com&gt;</a>;
                    Lu, Hongjiu
                    <a href="mailto:hongjiu.lu@intel.com"
                      moz-do-not-send="true">&lt;hongjiu.lu@intel.com&gt;</a><br>
                    <b>Subject:</b> Re: [llvm-dev] Intel AMX programming
                    model discussion.<o:p></o:p></p>
                </div>
              </div>
              <p class="MsoNormal"> <o:p></o:p></p>
              <p> <o:p></o:p></p>
              <div>
                <p class="MsoNormal">On 8/19/20 2:21 AM, Luo, Yuanke
                  wrote:<o:p></o:p></p>
              </div>
              <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                <p class="MsoNormal"> <o:p></o:p></p>
                <p class="MsoNormal">Hi Hal,<o:p></o:p></p>
                <p class="MsoNormal">There are 3 aspects to be solved. <o:p></o:p></p>
                <p class="MsoListParagraph"
                  style="margin-left:.5in;text-indent:-.25in;mso-list:l0
                  level1 lfo2">
                  <!--[if !supportLists]--><span style="mso-list:Ignore">1.<span
                      style="font:7.0pt &quot;Times New Roman&quot;">      
                    </span></span><!--[endif]-->The HW supports a maximum
                  shape of 16x16, so there are many register classes,
                  from 1x1 to 16x16. We need 256 register classes.
                  <o:p></o:p></p>
                <p class="MsoListParagraph"
                  style="margin-left:.5in;text-indent:-.25in;mso-list:l0
                  level1 lfo2">
                  <!--[if !supportLists]--><span style="mso-list:Ignore">2.<span
                      style="font:7.0pt &quot;Times New Roman&quot;">      
                    </span></span><!--[endif]-->We want to support
                  variable shapes, so the compiler doesn’t know which
                  register class fits a tile’s shape, as the shape is
                  known only at run time.<o:p></o:p></p>
                <p class="MsoListParagraph"
                  style="margin-left:.5in;text-indent:-.25in;mso-list:l0
                  level1 lfo2">
                  <!--[if !supportLists]--><span style="mso-list:Ignore">3.<span
                      style="font:7.0pt &quot;Times New Roman&quot;">      
                    </span></span><!--[endif]-->The tile configuration
                  applies to physical tile registers, so we need to
                  allocate registers first; only then do we know the
                  shape of each physical tile register and can configure
                  it.<o:p></o:p></p>
                <p class="MsoNormal">I think your suggestion helps
                  reduce the complexity if we only support fixed
                  (constant) tile shapes.<o:p></o:p></p>
                <p class="MsoNormal">-Yuanke<o:p></o:p></p>
              </blockquote>
              <p> <o:p></o:p></p>
              <p>Thanks, Yuanke.<o:p></o:p></p>
              <p>It's not clear to me that having 256 register classes
                is, in itself, a problem. Is it?<o:p></o:p></p>
              <p>What does it mean to support variable-shape tiles in
                this context? Do you do something other than
                conservatively assume that they are 16x16 for
                register-allocation purposes?<o:p></o:p></p>
              <p> -Hal<o:p></o:p></p>
              <p> <o:p></o:p></p>
              <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                <p class="MsoNormal"> <o:p></o:p></p>
                <div>
                  <div style="border:none;border-top:solid #E1E1E1
                    1.0pt;padding:3.0pt 0in 0in 0in">
                    <p class="MsoNormal"
                      style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal">
                      <b>From:</b> Hal Finkel <a
                        href="mailto:hfinkel@anl.gov"
                        moz-do-not-send="true">&lt;hfinkel@anl.gov&gt;</a>
                      <br>
                      <b>Sent:</b> Wednesday, August 19, 2020 8:20 AM<br>
                      <b>To:</b> Kaylor, Andrew <a
                        href="mailto:andrew.kaylor@intel.com"
                        moz-do-not-send="true">&lt;andrew.kaylor@intel.com&gt;</a>;
                      Philip Reames
                      <a href="mailto:listmail@philipreames.com"
                        moz-do-not-send="true">&lt;listmail@philipreames.com&gt;</a>;
                      Luo, Yuanke
                      <a href="mailto:yuanke.luo@intel.com"
                        moz-do-not-send="true">&lt;yuanke.luo@intel.com&gt;</a>;
                      <a href="mailto:llvm-dev@lists.llvm.org"
                        moz-do-not-send="true">
                        llvm-dev@lists.llvm.org</a>; <a
                        href="mailto:florian_hahn@apple.com"
                        moz-do-not-send="true">florian_hahn@apple.com</a>;
                      Topper, Craig
                      <a href="mailto:craig.topper@intel.com"
                        moz-do-not-send="true">&lt;craig.topper@intel.com&gt;</a>;
                      Lu, Hongjiu
                      <a href="mailto:hongjiu.lu@intel.com"
                        moz-do-not-send="true">&lt;hongjiu.lu@intel.com&gt;</a><br>
                      <b>Subject:</b> Re: [llvm-dev] Intel AMX
                      programming model discussion.<o:p></o:p></p>
                  </div>
                </div>
                <p class="MsoNormal"> <o:p></o:p></p>
                <p>Hi, Andy,<o:p></o:p></p>
                <p>I don't quite understand everything that's going on
                  here. Could we model this as:<o:p></o:p></p>
                <p> 1. Define a collection of register classes, one for
                  2x4 tiles, one for 4x2 tiles, etc., each populated
                  with a set of tile registers. Registers can have
                  aliasing relationships (instead of worrying about any
                  kind of subregister/superregister relationships --
                  these won't be useful anyway).<o:p></o:p></p>
                <p> 2. Define the tile-configuration instructions so
                  that they implicitly define all of the registers in
                  all of the classes.<o:p></o:p></p>
                <p>Then you would still need to pre-schedule the tile
                  operations as you've described, and collect the
                  configuration information in order to add the
                  ldtilecfgs, but the regular register allocator can
                  handle the allocation itself in the usual way. What do
                  you think?<o:p></o:p></p>
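                <p>The property this model relies on, namely that no
                  tile live range spans a configuration instruction,
                  can be checked with a few lines. This is only a
                  sketch over a hypothetical linearized instruction
                  numbering, where each virtual tile register has a
                  single (definition, last use) interval; it is not how
                  LLVM represents live intervals.</p>
                <pre>
```python
def ranges_crossing_config(live_ranges, config_points):
    """Return the virtual registers whose live range spans a ldtilecfg.

    live_ranges: dict vreg -> (def_index, last_use_index)
    config_points: instruction indices at which a ldtilecfg appears
    """
    crossing = []
    for vreg, (start, end) in sorted(live_ranges.items()):
        # A ldtilecfg strictly between the def and the last use would
        # redefine (clobber) the tile while its value is still needed.
        if any(p in range(start + 1, end) for p in config_points):
            crossing.append(vreg)
    return crossing


live = {"%t0": (1, 3), "%t1": (2, 6)}
print(ranges_crossing_config(live, [0, 4]))
```
                </pre>
                <p>If the tile-configuration instruction implicitly
                  defines every tile register, an empty result is
                  exactly the condition under which the regular
                  allocator can treat the tiles like any other
                  registers.</p>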
                <p> -Hal<o:p></o:p></p>
                <div>
                  <p class="MsoNormal">On 8/18/20 6:58 PM, Kaylor,
                    Andrew via llvm-dev wrote:<o:p></o:p></p>
                </div>
                <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                  <p class="MsoNormal"
                    style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal">
                    The AMX registers are complicated. The single
                    configuration register (which is mostly used
                    implicitly, similar to MXCSR for floating point)
                    controls the shape of all the tile registers, and if
                    you change the tile configuration every single tile
                    register is cleared. In practice, if we have to
                    change the configuration while any of the tile
                    registers are live, performance is going to be
                    terrible. We need to handle this case for
                    correctness, but users of this programming interface
                    will need to have enough awareness of the
                    performance issues and the hardware details to
                    prevent this. We’ll also want a diagnostic that lets
                    the user know when this has happened.<o:p></o:p></p>
                  <p class="MsoNormal"
                    style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal">
                     <o:p></o:p></p>
                  <p class="MsoNormal"
                    style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal">
                    When the tile configuration is set, the shape of
                    each tile is locked in, so the individual tile
                    registers aren’t interchangeable at that point. If a
                    function needs 2x4 tiles, 4x2 tiles, and 4x4 tiles,
                    the configuration needs to be set with this in mind.
                    The shape isn’t explicit in every instruction and
                    intrinsic. It must be deduced. And again, we’ll need
                    a way to tell the user when efficient allocation
                    can’t be done. In practice, I don’t expect any
                    function to be using more than three tile shapes.<o:p></o:p></p>
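                  <p class="MsoNormal">As a rough illustration of
                    setting the configuration "with this in mind", the
                    sketch below partitions the physical tiles among
                    the shapes a function needs. It assumes eight
                    physical tile registers (not stated in this
                    thread) and a hypothetical <code>demand</code>
                    table giving, per shape, the largest number of
                    tiles of that shape that are live at once. A
                    result of <code>None</code> stands for the case
                    where efficient allocation can’t be done and the
                    user should be told.</p>
                  <pre>
```python
NUM_TILES = 8  # assumption: eight physical tile registers


def assign_tiles(demand):
    """Partition physical tiles among shapes.

    demand: dict shape -> peak number of simultaneously live tiles.
    Returns dict shape -> list of tile indices, or None if the total
    demand does not fit (some spilling or reconfiguration is needed).
    """
    if sum(demand.values()) > NUM_TILES:
        return None
    assignment = {}
    next_tile = 0
    for shape in sorted(demand):
        count = demand[shape]
        assignment[shape] = list(range(next_tile, next_tile + count))
        next_tile += count
    return assignment


print(assign_tiles({(2, 4): 2, (4, 2): 2, (4, 4): 4}))
```
                  </pre>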
                  <p class="MsoNormal"
                    style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal">
                     <o:p></o:p></p>
                  <p class="MsoNormal"
                    style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal">
                    The implication of all this is that I don’t think
                    the greedy register allocator is well suited to
                    figure all of this out. We need a special pass to
                    pre-allocate these registers. If the function is
                    written in a way that makes good performance
                    possible, it should be a relatively simple task to
                    allocate everything with minimal spilling. If it
                    isn’t possible to get good performance, we don’t
                    need to do anything especially clever. We can just
                    do something straightforward that is correct and let
                    the user know that they aren’t going to be happy
                    with the results.<o:p></o:p></p>
                  <p class="MsoNormal"> <o:p></o:p></p>
                  <p class="MsoNormal">-Andy<o:p></o:p></p>
                  <p class="MsoNormal"> <o:p></o:p></p>
                  <div>
                    <div style="border:none;border-top:solid #E1E1E1
                      1.0pt;padding:3.0pt 0in 0in 0in">
                      <p class="MsoNormal"
                        style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal">
                        <b>From:</b> Philip Reames <a
                          href="mailto:listmail@philipreames.com"
                          moz-do-not-send="true">&lt;listmail@philipreames.com&gt;</a>
                        <br>
                        <b>Sent:</b> Friday, August 14, 2020 8:29 PM<br>
                        <b>To:</b> Luo, Yuanke <a
                          href="mailto:yuanke.luo@intel.com"
                          moz-do-not-send="true">&lt;yuanke.luo@intel.com&gt;</a>;
                        <a href="mailto:llvm-dev@lists.llvm.org"
                          moz-do-not-send="true">llvm-dev@lists.llvm.org</a>;
                        <a href="mailto:florian_hahn@apple.com"
                          moz-do-not-send="true">
                          florian_hahn@apple.com</a>; Kaylor, Andrew <a
                          href="mailto:andrew.kaylor@intel.com"
                          moz-do-not-send="true">
                          &lt;andrew.kaylor@intel.com&gt;</a>; Topper,
                        Craig <a href="mailto:craig.topper@intel.com"
                          moz-do-not-send="true">
                          &lt;craig.topper@intel.com&gt;</a>; Lu,
                        Hongjiu <a href="mailto:hongjiu.lu@intel.com"
                          moz-do-not-send="true">&lt;hongjiu.lu@intel.com&gt;</a><br>
                        <b>Subject:</b> Re: [llvm-dev] Intel AMX
                        programming model discussion.<o:p></o:p></p>
                    </div>
                  </div>
                  <p class="MsoNormal"> <o:p></o:p></p>
                  <p>I find your answer unconvincing.  I'm not going to
                    debate it as I don't wish to take the time to build
                    the appropriate context, but my initial response is
                    skepticism.<o:p></o:p></p>
                  <p>Philip<o:p></o:p></p>
                  <div>
                    <p class="MsoNormal">On 8/14/20 4:49 PM, Luo, Yuanke
                      wrote:<o:p></o:p></p>
                  </div>
                  <blockquote
                    style="margin-top:5.0pt;margin-bottom:5.0pt">
                    <p class="MsoNormal">[Yuanke] AMX registers are
                      special. They need to be configured before use,
                      and the configuration instruction is expensive. To
                      avoid unnecessary tile configuration, we collect
                      as much tile shape information as possible and
                      combine it into one ldtilecfg instruction. The
                      ldtilecfg instruction should dominate every AMX
                      instruction that accesses a tile register. On the
                      other hand, the ldtilecfg should post-dominate the
                      instructions that define the tile shape. Tile
                      register spilling should avoid re-configuration
                      due to a different tile shape: a spilled register
                      should be reloaded into a register that shares the
                      same tile shape. Since tile register allocation is
                      special, and it may allocate general virtual
                      registers to configure the tile registers, we can
                      add a separate pass to do it before the general
                      register allocation pass. After register
                      allocation, the tile shape information is no
                      longer needed, so we can transform each pseudo AMX
                      instruction into a real AMX instruction by
                      removing the row and column operands.<o:p></o:p></p>
                    <p>[Philip]<o:p></o:p></p>
                    <p>This seems complicated.<o:p></o:p></p>
                    <p>Reading through the documentation, there appears
                      to be a single global tile config for all tile
                      registers at any time.<o:p></o:p></p>
                    <p>Why not simply model this tile config as a
                      designated special register and the tile
                      instructions as having an implicit use of this
                      register?  That would seem to ensure that the
                      register allocator has all the constraints
                      needed.  You'd need to teach it how to spill the
                      special registers with the appropriate
                      instructions, but that seems a lot more
                      straightforward?<o:p></o:p></p>
                    <p class="MsoNormal"><span
                        style="font-size:10.5pt;line-height:105%">[Yuanke]
                        In that case users would need to configure the tile
                        registers themselves. Spilling the configuration
                        register is very expensive, because it clears
                        all the tile data registers to zero. In our
                        proposal, the compiler is responsible for deducing
                        the shape of each virtual tile data register,
                        allocating physical registers for them, and then
                        configuring those physical registers. We could build
                        the dependency as you propose and use it as a
                        machine IR check to ensure that each tile data
                        register is configured before use. </span><o:p></o:p></p>
                    <p class="MsoNormal"><span
                        style="font-size:10.5pt;line-height:105%"> </span><o:p></o:p></p>
                    <div>
                      <div style="border:none;border-top:solid #E1E1E1
                        1.0pt;padding:3.0pt 0in 0in 0in">
                        <p class="MsoNormal"
                          style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal">
                          <b>From:</b> Philip Reames <a
                            href="mailto:listmail@philipreames.com"
                            moz-do-not-send="true"><listmail@philipreames.com></a>
                          <br>
                          <b>Sent:</b> Saturday, August 15, 2020 1:17 AM<br>
                          <b>To:</b> Luo, Yuanke <a
                            href="mailto:yuanke.luo@intel.com"
                            moz-do-not-send="true"><yuanke.luo@intel.com></a>;
                          <a href="mailto:llvm-dev@lists.llvm.org"
                            moz-do-not-send="true">llvm-dev@lists.llvm.org</a>;
                          <a href="mailto:florian_hahn@apple.com"
                            moz-do-not-send="true">
                            florian_hahn@apple.com</a>; Kaylor, Andrew <a
                            href="mailto:andrew.kaylor@intel.com"
                            moz-do-not-send="true">
                            <andrew.kaylor@intel.com></a>; Topper,
                          Craig <a href="mailto:craig.topper@intel.com"
                            moz-do-not-send="true">
                            <craig.topper@intel.com></a>; Lu,
                          Hongjiu <a href="mailto:hongjiu.lu@intel.com"
                            moz-do-not-send="true"><hongjiu.lu@intel.com></a><br>
                          <b>Subject:</b> Re: [llvm-dev] Intel AMX
                          programming model discussion.<o:p></o:p></p>
                      </div>
                    </div>
                    <p class="MsoNormal"> <o:p></o:p></p>
                    <p> <o:p></o:p></p>
                    <div>
                      <p class="MsoNormal">On 8/14/20 6:27 AM, Luo,
                        Yuanke via llvm-dev wrote:<o:p></o:p></p>
                    </div>
                    <blockquote
                      style="margin-top:5.0pt;margin-bottom:5.0pt">
                      <p class="MsoNormal">Hi,<o:p></o:p></p>
                      <p class="MsoNormal">Intel Advanced Matrix
                        Extensions (Intel AMX) is a new programming
                        paradigm consisting of two components: a set of
                        2-dimensional registers (tiles) representing
                        sub-arrays from a larger 2-dimensional memory
                        image, and accelerators able to operate on
                        tiles. The capabilities of an Intel AMX
                        implementation are enumerated by palettes. Two
                        palettes are supported: palette 0 represents the
                        initialized state, and palette 1 consists of 8 tile
                        registers of up to 1 KB each, controlled by a
                        tile control register.<o:p></o:p></p>
                      <p class="MsoNormal">The instruction manual is
                        posted at <a
href="https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html"
                          moz-do-not-send="true">
https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html</a>.<o:p></o:p></p>
                      <p class="MsoNormal">The AMX ABI proposal is
                        posted at <a
                          href="https://groups.google.com/g/x86-64-abi/c/NRejFm7pwb4"
                          moz-do-not-send="true">
https://groups.google.com/g/x86-64-abi/c/NRejFm7pwb4</a>.<o:p></o:p></p>
                      <p class="MsoNormal">This email is to discuss the
                        programming model for AMX. Florian has
                        introduced the matrix type and intrinsics to
                        the LLVM community, and we’d like to adopt some
                        ideas from that work.<o:p></o:p></p>
                      <p class="MsoNormal">Here is what we propose for
                        the AMX programming model.<o:p></o:p></p>
                      <p class="MsoListParagraph"
                        style="margin-left:.5in;text-indent:-.25in;mso-list:l1
                        level1 lfo4">
                        <!--[if !supportLists]--><span
                          style="mso-list:Ignore">1.<span
                            style="font:7.0pt "Times New
                            Roman"">      
                          </span></span><!--[endif]--> Data type. <o:p></o:p></p>
                      <p class="MsoNormal">We’d like to have a fixed
                        vector type for AMX. Since the shape of an AMX
                        register is configurable, the vector size is
                        the maximum size of an AMX register, that is,
                        1024 bytes.<o:p></o:p></p>
                      <p class="MsoNormal">The C code may look like
                        this.<o:p></o:p></p>
                      <p class="MsoNormal">typedef int _tile_data
                        __attribute__((__vector_size__(1024),
                        __aligned__(64)));<o:p></o:p></p>
                      <p class="MsoNormal">_tile_data tile;<o:p></o:p></p>
                      <p class="MsoNormal">And the LLVM IR may look like
                        this.<o:p></o:p></p>
                      <p class="MsoNormal">@tile = dso_local
                        local_unnamed_addr global <256 x i32>
                        zeroinitializer, align 64<o:p></o:p></p>
                      <p class="MsoNormal">For LLVM IR, it would be nice
                        to have a new type, x86_amxtile, that maps
                        to AMX registers.<o:p></o:p></p>
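                      <p class="MsoNormal">As a small sanity check of the
                        sizes involved, the sketch below reuses the typedef
                        from the proposal and confirms that the vector
                        covers 1024 bytes, which is 256 lanes of 32 bits
                        (the &lt;256 x i32&gt; seen in the IR). The helper
                        name tile_data_lanes is hypothetical, and the code
                        assumes a GCC or Clang compiler with a 4-byte int.<o:p></o:p></p>

```c
#include <assert.h>
#include <stddef.h>

/* Same typedef as in the proposal (GCC/Clang vector extension). */
typedef int _tile_data __attribute__((__vector_size__(1024), __aligned__(64)));

/* The vector always has the maximum tile size, whatever shape is configured. */
static_assert(sizeof(_tile_data) == 1024, "tile vector is 1 KB");

/* Hypothetical helper: number of 32-bit lanes, i.e. the N in <N x i32>.
   Assumes sizeof(int) == 4. */
static size_t tile_data_lanes(void) {
    return sizeof(_tile_data) / sizeof(int);
}
```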
                      <p class="MsoListParagraph"
                        style="margin-left:.5in;text-indent:-.25in;mso-list:l1
                        level1 lfo4">
                        <!--[if !supportLists]--><span
                          style="mso-list:Ignore">2.<span
                            style="font:7.0pt "Times New
                            Roman"">      
                          </span></span><!--[endif]-->AMX Intrinsics. <o:p></o:p></p>
                      <p class="MsoNormal">The internal intrinsics map
                        1:1 to AMX instructions. The parameters m,
                        n and k identify the shape of the tile. The shape
                        can be variable, but it cannot exceed the size
                        that the AMX hardware supports. The compiler can
                        deduce the shape of the tile from the AMX intrinsics.<o:p></o:p></p>
                      <p class="MsoNormal" style="text-indent:5.5pt">_tile_data
                        _tile_loadd_internal(char m, short n, const void
                        *base, int stride);<o:p></o:p></p>
                      <p class="MsoNormal">_tile_data
                        _tile_dpbssd_internal(char m, short n, short k,
                        _tile_data dst, _tile_data src1, _tile_data
                        src2);<o:p></o:p></p>
                      <p class="MsoNormal">_tile_data
                        _tile_dpbf16ps_internal(char m, short n, short
                        k, _tile_data dst, _tile_data src1, _tile_data
                        src2);<o:p></o:p></p>
                      <p class="MsoNormal">void
                        _tile_stored_internal(char m, short n, void
                        *base, int stride, _tile_data tile);<o:p></o:p></p>
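                      <p class="MsoNormal">To illustrate the "cannot exceed
                        the size that the hardware supports" constraint,
                        here is a minimal sketch of a shape check. The
                        helper names are hypothetical. The limits (16 rows
                        of 64 bytes per tile in palette 1) come from the
                        Intel ISA reference, and the sketch assumes that m
                        is a row count while n and k are column sizes in
                        bytes. The row count of src2 is not passed to the
                        intrinsic, so it is not checked here.<o:p></o:p></p>

```c
#include <assert.h>
#include <stdbool.h>

/* Palette 1 limits: each tile holds at most 16 rows of 64 bytes. */
enum { AMX_MAX_ROWS = 16, AMX_MAX_COLSB = 64 };

static_assert(AMX_MAX_ROWS * AMX_MAX_COLSB == 1024, "a full tile is 1 KB");

/* Hypothetical helper: true if a (rows, bytes-per-row) shape fits in a tile. */
static bool tile_shape_fits(int rows, int colsb) {
    return rows >= 1 && rows <= AMX_MAX_ROWS &&
           colsb >= 1 && colsb <= AMX_MAX_COLSB;
}

/* Hypothetical check for the m, n, k arguments of _tile_dpbssd_internal.
   Assumption: dst is m rows by n bytes and src1 is m rows by k bytes. */
static bool dpbssd_shape_fits(int m, int n, int k) {
    return tile_shape_fits(m, n) && tile_shape_fits(m, k);
}
```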
                      <p class="MsoListParagraph"
                        style="margin-left:.5in;text-indent:-.25in;mso-list:l1
                        level1 lfo4">
                        <!--[if !supportLists]--><span
                          style="mso-list:Ignore">3.<span
                            style="font:7.0pt "Times New
                            Roman"">      
                          </span></span><!--[endif]-->User interfaces.<o:p></o:p></p>
                      <p class="MsoNormal">The tile shape and tile data
                        are combined into a struct in C. The
                        shape of a tile may be initialized only
                        once. The user interface looks like
                        this.<o:p></o:p></p>
                      <p class="MsoNormal">#define
                        __DEFAULT_FN_AMX    \<o:p></o:p></p>
                      <p class="MsoNormal">  
                        __attribute__((__always_inline__, __nodebug__,
                        __target__("amx-int8")))<o:p></o:p></p>
                      <p class="MsoNormal">typedef struct
                        __tile_str {<o:p></o:p></p>
                      <p class="MsoNormal">  const char row;<o:p></o:p></p>
                      <p class="MsoNormal">  const short col;<o:p></o:p></p>
                      <p class="MsoNormal">  _tile_data tile;<o:p></o:p></p>
                      <p class="MsoNormal">}__tile;<o:p></o:p></p>
                      <p class="MsoNormal"> <o:p></o:p></p>
                      <p class="MsoNormal">__DEFAULT_FN_AMX<o:p></o:p></p>
                      <p class="MsoNormal">void __tile_loadd(__tile
                        *dst, const void *base, long stride) {<o:p></o:p></p>
                      <p class="MsoNormal">  dst->tile =
                        _tile_loadd_internal(dst->row, dst->col,
                        base, stride);<o:p></o:p></p>
                      <p class="MsoNormal">}<o:p></o:p></p>
                      <p class="MsoNormal"> <o:p></o:p></p>
                      <p class="MsoNormal">__DEFAULT_FN_AMX<o:p></o:p></p>
                      <p class="MsoNormal">void __tile_dpbsud(__tile
                        *dst, __tile src1, __tile src2) {<o:p></o:p></p>
                      <p class="MsoNormal">  dst->tile =
                        _tile_dpbssd_internal(src1.row, src2.col,
                        src1.col, dst->tile, src1.tile, src2.tile);<o:p></o:p></p>
                      <p class="MsoNormal">}<o:p></o:p></p>
                      <p class="MsoNormal"> <o:p></o:p></p>
                      <p class="MsoNormal">__DEFAULT_FN_AMX<o:p></o:p></p>
                      <p class="MsoNormal">void __tile_stored(void
                        *base, long stride, __tile src) {<o:p></o:p></p>
                      <p class="MsoNormal">  
                        _tile_stored_internal(src.row, src.col, base,
                        stride, src.tile);<o:p></o:p></p>
                      <p class="MsoNormal">}<o:p></o:p></p>
                      <p class="MsoNormal"> <o:p></o:p></p>
                      <p class="MsoListParagraph"
                        style="margin-left:.5in;text-indent:-.25in;mso-list:l1
                        level1 lfo4">
                        <!--[if !supportLists]--><span
                          style="mso-list:Ignore">4.<span
                            style="font:7.0pt "Times New
                            Roman"">      
                          </span></span><!--[endif]-->Example code<o:p></o:p></p>
                      <p class="MsoNormal">The example shows how to use
                        the user interface in a function.
                        <o:p></o:p></p>
                      <p class="MsoNormal">void api(int cond, short
                        row, short col) {<o:p></o:p></p>
                      <p class="MsoNormal">  __tile a = {row, col};<o:p></o:p></p>
                      <p class="MsoNormal">  __tile b = {row, col};<o:p></o:p></p>
                      <p class="MsoNormal">  __tile c = {row, col};<o:p></o:p></p>
                      <p class="MsoNormal"> <o:p></o:p></p>
                      <p class="MsoNormal">  if(cond) {<o:p></o:p></p>
                      <p class="MsoNormal">    __tile_loadd(&a,
                        buf, STRIDE);<o:p></o:p></p>
                      <p class="MsoNormal">    __tile_loadd(&b,
                        buf, STRIDE);<o:p></o:p></p>
                      <p class="MsoNormal">    __tile_loadd(&c,
                        buf, STRIDE);<o:p></o:p></p>
                      <p class="MsoNormal">  } else {<o:p></o:p></p>
                      <p class="MsoNormal">    __tile_loadd(&a,
                        buf2, STRIDE);<o:p></o:p></p>
                      <p class="MsoNormal">    __tile_loadd(&b,
                        buf2, STRIDE);<o:p></o:p></p>
                      <p class="MsoNormal">    __tile_loadd(&c,
                        buf2, STRIDE);<o:p></o:p></p>
                      <p class="MsoNormal">  }<o:p></o:p></p>
                      <p class="MsoNormal"><span lang="IT">  
                          __tile_dpbsud(&c, a, b);</span><o:p></o:p></p>
                      <p class="MsoNormal">  __tile_stored(buf,
                        STRIDE, c);<o:p></o:p></p>
                      <p class="MsoNormal">}<o:p></o:p></p>
                      <p class="MsoListParagraph"
                        style="margin-left:.5in;text-indent:-.25in;mso-list:l1
                        level1 lfo4">
                        <!--[if !supportLists]--><span
                          style="mso-list:Ignore">5.<span
                            style="font:7.0pt "Times New
                            Roman"">      
                          </span></span><!--[endif]-->LLVM IR<o:p></o:p></p>
                      <p class="MsoNormal">The LLVM IR intrinsics take
                        the row and column information as input
                        parameters, so that the compiler can deduce the
                        shape of the tile data. The remaining parameters
                        are what the AMX instructions require. This is the
                        LLVM IR corresponding to the example code.<o:p></o:p></p>
                      <p class="MsoNormal">define dso_local void
                        @api(i32 %cond, i16 signext %row, i16 signext
                        %col) local_unnamed_addr #2 {<o:p></o:p></p>
                      <p class="MsoNormal">entry:<o:p></o:p></p>
                      <p class="MsoNormal">  %tobool = icmp eq i32
                        %cond, 0<o:p></o:p></p>
                      <p class="MsoNormal">  %sext = shl i16 %col, 8<o:p></o:p></p>
                      <p class="MsoNormal">  %conv.i31 = ashr exact
                        i16 %sext, 8<o:p></o:p></p>
                      <p class="MsoNormal">  br i1 %tobool, label
                        %if.else, label %if.then<o:p></o:p></p>
                      <p class="MsoNormal"> <o:p></o:p></p>
                      <p class="MsoNormal">if.then:
                        ; preds = %entry<o:p></o:p></p>
                      <p class="MsoNormal">  %0 = tail call <256 x
                        i32> @llvm.x86.tileloadd64(i16 %row, i16
                        %conv.i31, i8* getelementptr inbounds ([1024 x
                        i8], [1024 x i8]* @buf, i64 0, i64 0), i64 32)
                        #3<o:p></o:p></p>
                      <p class="MsoNormal">  %1 = tail call <256 x
                        i32> @llvm.x86.tileloadd64(i16 %row, i16
                        %conv.i31, i8* getelementptr inbounds ([1024 x
                        i8], [1024 x i8]* @buf, i64 0, i64 0), i64 32)
                        #3<o:p></o:p></p>
                      <p class="MsoNormal">  %2 = tail call <256 x
                        i32> @llvm.x86.tileloadd64(i16 %row, i16
                        %conv.i31, i8* getelementptr inbounds ([1024 x
                        i8], [1024 x i8]* @buf, i64 0, i64 0), i64 32)
                        #3<o:p></o:p></p>
                      <p class="MsoNormal">  br label %if.end<o:p></o:p></p>
                      <p class="MsoNormal"> <o:p></o:p></p>
                      <p class="MsoNormal">if.else:
                        ; preds = %entry<o:p></o:p></p>
                      <p class="MsoNormal">  %3 = tail call <256 x
                        i32> @llvm.x86.tileloadd64(i16 %row, i16
                        %conv.i31, i8* getelementptr inbounds ([1024 x
                        i8], [1024 x i8]* @buf2, i64 0, i64 0), i64 32)
                        #3<o:p></o:p></p>
                      <p class="MsoNormal">  %4 = tail call <256 x
                        i32> @llvm.x86.tileloadd64(i16 %row, i16
                        %conv.i31, i8* getelementptr inbounds ([1024 x
                        i8], [1024 x i8]* @buf2, i64 0, i64 0), i64 32)
                        #3<o:p></o:p></p>
                      <p class="MsoNormal">  %5 = tail call <256 x
                        i32> @llvm.x86.tileloadd64(i16 %row, i16
                        %conv.i31, i8* getelementptr inbounds ([1024 x
                        i8], [1024 x i8]* @buf2, i64 0, i64 0), i64 32)
                        #3<o:p></o:p></p>
                      <p class="MsoNormal">  br label %if.end<o:p></o:p></p>
                      <p class="MsoNormal"> <o:p></o:p></p>
                      <p class="MsoNormal">if.end:
                        ; preds = %if.else, %if.then<o:p></o:p></p>
                      <p class="MsoNormal">  %a.sroa.1186.0 = phi
                        <256 x i32> [ %3, %if.else ], [ %0,
                        %if.then ]<o:p></o:p></p>
                      <p class="MsoNormal">  %b.sroa.1068.0 = phi
                        <256 x i32> [ %4, %if.else ], [ %1,
                        %if.then ]<o:p></o:p></p>
                      <p class="MsoNormal">  %c.sroa.1149.0 = phi
                        <256 x i32> [ %5, %if.else ], [ %2,
                        %if.then ]<o:p></o:p></p>
                      <p class="MsoNormal">  %6 = tail call <256 x
                        i32> @llvm.x86.tdpbssd(i16 %row, i16
                        %conv.i31, i16 %conv.i31, <256 x i32>
                        %c.sroa.1149.0, <256 x i32>
                        %a.sroa.1186.0, <256 x i32>
                        %b.sroa.1068.0) #3<o:p></o:p></p>
                      <p class="MsoNormal">  tail call void
                        @llvm.x86.tilestored64(i16 %row, i16 %conv.i31,
                        i8* getelementptr inbounds ([1024 x i8], [1024 x
                        i8]* @buf, i64 0, i64 0), i64 32, <256 x
                        i32> %6) #3<o:p></o:p></p>
                      <p class="MsoNormal">  ret void<o:p></o:p></p>
                      <p class="MsoNormal">}<o:p></o:p></p>
                      <p class="MsoListParagraph"
                        style="margin-left:.5in;text-indent:-.25in;mso-list:l1
                        level1 lfo4">
                        <!--[if !supportLists]--><span
                          style="mso-list:Ignore">6.<span
                            style="font:7.0pt "Times New
                            Roman"">      
                          </span></span><!--[endif]-->Shape propagation<o:p></o:p></p>
                      <p class="MsoNormal">In an -O0 build, the front
                        end generates some general loads and stores of
                        the tile vector. We need to start from the AMX
                        intrinsics and propagate the shape information to
                        the virtual tile registers. If an AMX
                        intrinsic uses the result of a load instruction,
                        the shape is propagated to the load and the load
                        is transformed into a tile load intrinsic. If a
                        store instruction uses the result of an AMX
                        intrinsic, the shape is propagated to the store
                        and the store is transformed into a tile
                        store intrinsic.<o:p></o:p></p>
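                      <p class="MsoNormal">The propagation described above
                        can be sketched on a toy IR. Everything in this
                        block (the node struct, the opcode names, and
                        propagate_shapes) is invented for illustration and
                        is not an LLVM data structure or pass. It only shows
                        the idea of rooting at each AMX intrinsic and
                        rewriting the plain vector loads that feed it and
                        the plain vector stores that consume it.<o:p></o:p></p>

```c
#include <assert.h>
#include <stddef.h>

/* Toy IR, invented for illustration only. */
typedef enum {
    OP_VEC_LOAD, OP_VEC_STORE, OP_TILE_LOAD, OP_TILE_STORE, OP_AMX
} opcode;

enum { MAX_SRCS = 3 };

typedef struct node {
    opcode op;
    struct node *src[MAX_SRCS]; /* operands; NULL when unused */
    int src_row[MAX_SRCS];      /* OP_AMX only: shape required of each operand */
    int src_col[MAX_SRCS];
    int row, col;               /* shape of this node's value; -1 if unknown */
} node;

/* Root at every AMX intrinsic and push its shape information outwards:
   vector loads feeding it become tile loads, and vector stores of its
   result become tile stores. */
static void propagate_shapes(node *nodes, size_t count) {
    assert(nodes != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        node *n = &nodes[i];
        if (n->op == OP_AMX) {
            for (int s = 0; s < MAX_SRCS; ++s) {
                node *def = n->src[s];
                if (def != NULL && def->op == OP_VEC_LOAD) {
                    def->op = OP_TILE_LOAD;
                    def->row = n->src_row[s];
                    def->col = n->src_col[s];
                }
            }
        } else if (n->op == OP_VEC_STORE && n->src[0] != NULL &&
                   n->src[0]->op == OP_AMX) {
            n->op = OP_TILE_STORE;
            n->row = n->src[0]->row;
            n->col = n->src[0]->col;
        }
    }
}
```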
                      <p class="MsoListParagraph"
                        style="margin-left:.5in;text-indent:-.25in;mso-list:l1
                        level1 lfo4">
                        <!--[if !supportLists]--><span
                          style="mso-list:Ignore">7.<span
                            style="font:7.0pt "Times New
                            Roman"">      
                          </span></span><!--[endif]-->Machine IR<o:p></o:p></p>
                      <p class="MsoNormal">Since the AMX intrinsics take
                        the row and column as input parameters, we
                        can create a pseudo instruction corresponding to
                        each of them. The AMX intrinsics are lowered to
                        pseudo AMX instructions, which carry extra row and
                        column operands corresponding to those of the
                        intrinsic. The real AMX instructions don’t need the
                        row and column operands. The row and column
                        information should be configured by ldtilecfg
                        before any AMX instruction is executed.<o:p></o:p></p>
                      <p class="MsoListParagraph"
                        style="margin-left:.5in;text-indent:-.25in;mso-list:l1
                        level1 lfo4">
                        <!--[if !supportLists]--><span
                          style="mso-list:Ignore">8.<span
                            style="font:7.0pt "Times New
                            Roman"">      
                          </span></span><!--[endif]-->Register
                        allocation<o:p></o:p></p>
                      <p class="MsoNormal">AMX registers are special. They
                        need to be configured before use, and the config
                        instruction is expensive. To avoid unnecessary
                        tile configuration, we collect as much tile shape
                        information as possible and combine it
                        into one ldtilecfg instruction. The ldtilecfg
                        instruction should dominate any AMX instruction
                        that accesses a tile register. On the other hand,
                        the ldtilecfg should post-dominate the
                        instructions that define the tile shape. A tile
                        register spill should avoid a re-config caused by
                        a different tile shape, so the spilled register
                        should be reloaded into a register that shares
                        the same tile shape. Since tile register
                        allocation is special and may allocate
                        general virtual registers to configure the tile
                        registers, we can add a separate pass to do it
                        before the general register allocation pass. After
                        register allocation, the tile shape information
                        is no longer needed, so we can transform the
                        pseudo AMX instructions into real AMX instructions
                        by removing the row and column operands.<o:p></o:p></p>
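                      <p class="MsoNormal">For readers unfamiliar with what
                        "combining the shapes into one ldtilecfg" means in
                        practice, here is a minimal sketch of filling the
                        64-byte configuration block that ldtilecfg reads.
                        The field layout follows the palette 1 description
                        in the Intel ISA reference as I understand it. The
                        struct and function names (tile_config, tile_shape,
                        build_tile_config) are hypothetical and not part of
                        the proposal.<o:p></o:p></p>

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { AMX_NUM_TILES = 8, AMX_MAX_ROWS = 16, AMX_MAX_COLSB = 64 };

/* 64-byte memory operand of ldtilecfg (palette 1 layout, assumed from the
   Intel ISA reference). Unused entries must stay zero. */
typedef struct {
    uint8_t  palette_id;   /* byte 0 */
    uint8_t  start_row;    /* byte 1 */
    uint8_t  reserved[14]; /* bytes 2..15 */
    uint16_t colsb[16];    /* bytes 16..47: bytes per row, one entry per tile */
    uint8_t  rows[16];     /* bytes 48..63: row count, one entry per tile */
} tile_config;

static_assert(sizeof(tile_config) == 64, "ldtilecfg reads 64 bytes");

/* Hypothetical per-register shape collected by the compiler. */
typedef struct {
    uint8_t  rows;
    uint16_t colsb;
} tile_shape;

/* Combine the shapes of up to 8 physical tile registers into one config. */
static void build_tile_config(tile_config *cfg, const tile_shape *shapes, int n) {
    assert(n >= 0 && n <= AMX_NUM_TILES);
    memset(cfg, 0, sizeof *cfg);
    cfg->palette_id = 1;
    for (int i = 0; i < n; ++i) {
        assert(shapes[i].rows <= AMX_MAX_ROWS);
        assert(shapes[i].colsb <= AMX_MAX_COLSB);
        cfg->rows[i] = shapes[i].rows;
        cfg->colsb[i] = shapes[i].colsb;
    }
}
```

                      <p class="MsoNormal">On hardware with AMX, a block
                        built this way would typically be handed to the
                        _tile_loadconfig intrinsic from immintrin.h. That
                        step is omitted here so the sketch compiles on any
                        machine.<o:p></o:p></p>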
                    </blockquote>
                    <p>This seems complicated.<o:p></o:p></p>
                    <p>Reading through the documentation, there appears
                      to be a single global tile config for all tile
                      registers at any time.<o:p></o:p></p>
                    <p>Why not simply model this tile config as a
                      designated special register and the tile
                      instructions as having an implicit use of this
                      register?  That would seem to ensure that the
                      register allocator has all the constraints
                      needed.  You'd need to teach it how to spill the
                      special registers with the appropriate
                      instructions, but that seems a lot more
                      straightforward?<o:p></o:p></p>
                    <blockquote
                      style="margin-top:5.0pt;margin-bottom:5.0pt">
                      <p class="MsoListParagraph"
                        style="margin-left:.5in;text-indent:-.25in;mso-list:l1
                        level1 lfo4">
                        <!--[if !supportLists]--><span
                          style="mso-list:Ignore">9.<span
                            style="font:7.0pt "Times New
                            Roman"">      
                          </span></span><!--[endif]-->Use recommendation
                        <o:p></o:p></p>
                      <p class="MsoNormal">Because of the shape
                        configuration issue, we recommend that users define
                        the tile shape at function entry and inline
                        functions as much as possible. The AMX
                        instructions focus on computation rather than
                        storage, so global variables for tile data are not
                        recommended.<o:p></o:p></p>
                      <p class="MsoNormal"><span
                          style="font-size:10.5pt;line-height:105%"> </span><o:p></o:p></p>
                      <p class="MsoNormal"><span
                          style="font-size:10.5pt;line-height:105%">Thanks</span><o:p></o:p></p>
                      <p class="MsoNormal"><span
                          style="font-size:10.5pt;line-height:105%">Yuanke</span><o:p></o:p></p>
                      <pre>_______________________________________________<o:p></o:p></pre>
                      <pre>LLVM Developers mailing list<o:p></o:p></pre>
                      <pre><a href="mailto:llvm-dev@lists.llvm.org" moz-do-not-send="true">llvm-dev@lists.llvm.org</a><o:p></o:p></pre>
                      <pre><a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" moz-do-not-send="true">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><o:p></o:p></pre>
                    </blockquote>
                  </blockquote>
                </blockquote>
                <pre>-- <o:p></o:p></pre>
                <pre>Hal Finkel<o:p></o:p></pre>
                <pre>Lead, Compiler Technology and Programming Languages<o:p></o:p></pre>
                <pre>Leadership Computing Facility<o:p></o:p></pre>
                <pre>Argonne National Laboratory<o:p></o:p></pre>
              </blockquote>
            </blockquote>
          </blockquote>
        </blockquote>
      </div>
    </blockquote>
    <pre class="moz-signature" cols="72">-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre>
  </body>
</html>