[llvm-dev] Intel AMX programming model discussion.
Hal Finkel via llvm-dev
llvm-dev at lists.llvm.org
Fri Aug 14 17:45:59 PDT 2020
On 8/14/20 6:39 PM, Luo, Yuanke wrote:
> *From:*Hal Finkel <hfinkel at anl.gov>
> *Sent:* Friday, August 14, 2020 11:27 PM
> *To:* Luo, Yuanke <yuanke.luo at intel.com>; llvm-dev at lists.llvm.org;
> florian_hahn at apple.com; Kaylor, Andrew <andrew.kaylor at intel.com>;
> Topper, Craig <craig.topper at intel.com>; Lu, Hongjiu <hongjiu.lu at intel.com>
> *Subject:* Re: [llvm-dev] Intel AMX programming model discussion.
> On 8/14/20 8:27 AM, Luo, Yuanke via llvm-dev wrote:
> 8. Register allocation
> AMX registers are special. They need to be configured before use, and
> the config instruction is expensive. To avoid unnecessary tile
> configuration, we collect as much tile shape information as
> possible and combine it into one ldtilecfg instruction. The
> ldtilecfg instruction should dominate any AMX instruction that
> accesses a tile register. On the other hand, the ldtilecfg should
> post-dominate the instructions that define the tile shapes. For
> tile register spills, to avoid re-configuration due to a
> different tile shape, the spilled register should be reloaded into
> a register that shares the same tile shape. Since tile register
> allocation is special and may allocate general virtual registers
> to configure the tile registers, we can add a separate pass to do it
> before the general register allocation pass. After register
> allocation, the tile shape information is no longer needed, so
> we can transform the pseudo AMX instructions into real AMX
> instructions by removing the row and column operands.
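To make the "combine the shapes into one ldtilecfg" step concrete, here is a rough Python sketch of packing per-tile shapes into the 64-byte block that ldtilecfg loads. The layout (palette in byte 0, 16-bit bytes-per-row entries starting at byte 16, one-byte row counts starting at byte 48, at most 8 tiles of 16 rows by 64 bytes) is my reading of Intel's ISA extensions reference for palette 1, and the helper name build_tile_config is made up for illustration; it is not part of the proposal or of LLVM.

```python
import struct

# Assumed palette 1 limits: tmm0..tmm7, up to 16 rows of up to 64 bytes each.
MAX_TILES = 8
MAX_ROWS = 16
MAX_COLSB = 64


def build_tile_config(shapes, palette=1):
    """Pack {tile_index: (rows, bytes_per_row)} into a 64-byte config block.

    Hypothetical helper: one block describes every tile, so a single
    configuration load can cover all shapes collected in a function.
    """
    cfg = bytearray(64)          # unused tiles and reserved bytes stay zero
    cfg[0] = palette             # byte 0: palette id (byte 1, start_row, stays 0)
    for tile, (rows, colsb) in shapes.items():
        if not 0 <= tile < MAX_TILES:
            raise ValueError(f"tile index {tile} out of range")
        if not (1 <= rows <= MAX_ROWS and 1 <= colsb <= MAX_COLSB):
            raise ValueError(f"shape {rows}x{colsb} not representable")
        struct.pack_into("<H", cfg, 16 + 2 * tile, colsb)  # bytes per row
        cfg[48 + tile] = rows                               # row count
    return bytes(cfg)
```

For example, build_tile_config({0: (16, 64), 1: (16, 64), 2: (8, 32)}) describes three tiles in one block, so the function that uses them needs only one configuration as long as none of the shapes changes.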
> Can you take advantage of our IPRA capability so that internal
> function calls might avoid this reconfiguration if the necessary
> configuration is always done in the caller?
> [Yuanke] I am not familiar with the IPRA capability, and I am very
> interested in it. Could you post some links that introduce IPRA?
Interestingly, it looks like some documentation was written but never
<https://reviews.llvm.org/D23980> - in general, if you search for IPRA
in LLVM, you'll see the relevant pieces. The really short description is
that functions are emitted in topological order, leaves of the call
graph first, so that customized clobber register masks can be attached
to call sites of relevant internal functions.
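
As a toy illustration of that short description (not LLVM's actual implementation), the Python sketch below walks a call graph in post-order so that callees are summarized before their callers, and gives each function a clobber set made of its own clobbers plus those of its callees. The register names, the DEFAULT_CLOBBERS fallback for external or not-yet-processed callees, and both function names are assumptions made up for the example.

```python
# Hypothetical register file; tmm0/tmm1 stand in for AMX tile registers.
ALL_REGS = frozenset({"r0", "r1", "r2", "r3", "tmm0", "tmm1"})
# Conservative fallback: an unknown callee is assumed to clobber everything.
DEFAULT_CLOBBERS = ALL_REGS


def post_order(call_graph, roots):
    """Return functions with the leaves of the call graph first."""
    seen, order = set(), []

    def visit(fn):
        if fn in seen or fn not in call_graph:   # skip external functions
            return
        seen.add(fn)
        for callee in call_graph[fn]:
            visit(callee)
        order.append(fn)                          # emitted after its callees

    for root in roots:
        visit(root)
    return order


def compute_clobbers(call_graph, own_clobbers, roots):
    """Map each function to the registers a call to it may clobber."""
    summary = {}
    for fn in post_order(call_graph, roots):
        clobbered = set(own_clobbers[fn])
        for callee in call_graph[fn]:
            # Recursive or external callees have no summary yet.
            clobbered |= summary.get(callee, DEFAULT_CLOBBERS)
        summary[fn] = frozenset(clobbered)
    return summary
```

A call site of an internal function can then treat every register outside that function's summary as preserved across the call, which is the information a customized clobber mask carries.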
> How will the implementation of __builtin_setjmp/longjmp be affected?
> [Yuanke] That depends on the ABI. We propose that all tile registers be
> caller-saved, so I think setjmp/longjmp is not affected.
> Thanks again,
> 9. Use recommendation
> Due to the shape configuration issue, we recommend that users define
> the tile shape at function entry and inline functions
> as much as possible. The AMX instructions focus on computation
> rather than storage, so a global variable for tile data is not
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory