[llvm-dev] [RFC] Matrix support (take 2)
Simon Moll via llvm-dev
llvm-dev at lists.llvm.org
Wed Dec 19 14:37:29 PST 2018
On 12/19/18 11:07 PM, Adam Nemet via llvm-dev wrote:
>> On Dec 19, 2018, at 1:31 PM, Stephen Canon <scanon at apple.com
>> <mailto:scanon at apple.com>> wrote:
>>> On Dec 19, 2018, at 11:09 AM, Stephen Canon via llvm-dev
>>> <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
>>>> On Dec 18, 2018, at 10:18 PM, Adam Nemet <anemet at apple.com
>>>> <mailto:anemet at apple.com>> wrote:
>>>>> I don’t understand this. What is the benefit of providing layout
>>>>> info to element wise operations? This defeats the goal of having
>>>>> simple lowering and representation: you are encoding an ND vector
>>>>> form into the IR in a really ugly way, and this will cause a
>>>>> proliferation of intrinsics that are redundant with the core ops.
>>>> The reason we need that information is so that, for example, we can
>>>> lower an operation on a 3-element column into a 2-wide vector op plus
>>>> a scalar op. This should be beneficial for power consumption: in the
>>>> case of a 3x3 matrix with a single element of padding per column, you
>>>> would operate on only 9 elements rather than 12 (vector ops consume
>>>> more power than their scalar counterparts).
>>>> That said, we should be able to remove these intrinsics in the long
>>>> term. Once we have masking on the core ops in the IR, we should be
>>>> able to express the same semantics without dedicated intrinsics.
>>> There may be some cases where this holds (maybe with 5x5 or
>>> something), but most of the time I would expect to get better power
>>> from doing a four-element vector op with one wasted lane than from
>>> doing two arithmetic ops (plus possibly extracts and inserts,
>>> depending on physical layout details).
>>> Explicit masking or arranging for zero in padding lanes seems like a
>>> better way forward to me.
>>> – Steve
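To make the trade-off concrete, the two lowerings being compared for a
single 3-element column add can be sketched in scalar pseudocode (an
illustration only; the function names and the Python setting are mine,
not LLVM's):

```python
def add_column_split(a, b):
    """Lower a 3-element column add as one 2-wide vector op plus one
    scalar op, touching only the 3 live lanes (Adam's variant)."""
    lo = [a[0] + b[0], a[1] + b[1]]  # models a <2 x float> fadd
    return lo + [a[2] + b[2]]        # models a scalar fadd

def add_column_padded(a, b):
    """Lower the same add as one 4-wide vector op with one wasted
    padding lane: fewer instructions, one dead lane of work
    (Steve's variant)."""
    return [a[i] + b[i] for i in range(4)]  # models a <4 x float> fadd
```

Both compute the same three live results; they differ only in instruction
count versus lanes of work, which is exactly the power question above.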
>> I spent some time chatting with Adam about this and have a better
>> understanding of his concerns here. It seems to me that if having
>> masking intrinsics is the long-term solution we want, we should do
>> that now (for add and sub) rather than building arbitrary matrix
>> layout info into intrinsics, since a mask has all the information
>> that we actually need.
> I think that sounds like a reasonable compromise. We already have
> masked load/store intrinsics so adding add and sub just follows that
> precedent. If the decision is made to move masking to the core
> operations, the new intrinsics would just move as well.
> So an add->multiply sequence for option B with masking intrinsics would look like:
> %a = load <12 x float>, <12 x float>* %A, align 16
> %b = load <12 x float>, <12 x float>* %B, align 16
> %c = load <8 x float>, <8 x float>* %C, align 16
> %add = call <12 x float> @llvm.masked.fadd(
>     <12 x float> %a, <12 x float> %b,
>     ; mask: where false, the element is taken from the passthrough
>     <12 x i1> <i1 true, i1 true, i1 true, i1 false,
>                i1 true, i1 true, i1 true, i1 false,
>                i1 true, i1 true, i1 true, i1 false>,
>     ; passthrough:
>     <12 x float> <float undef, float undef, float undef, float undef,
>                   float undef, float undef, float undef, float undef,
>                   float undef, float undef, float undef, float undef>)
> %mul = call <8 x float> @llvm.matrix.multiply(
>     <12 x float> %add, <8 x float> %c,
>     ; 3 x 3 times 3 x 2, column-major:
>     i32 3, i32 3, i32 3, i32 2, i1 true)
> store <8 x float> %mul, <8 x float>* %MUL, align 16
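For readers following along, the semantics of the two calls above can be
modeled in Python. This is a sketch only: `llvm.masked.fadd` is a
proposed intrinsic, and I am assuming each column of the 3x3 and 3x2
operands is padded to a stride of 4 lanes, matching the mask in the
example:

```python
def masked_fadd(a, b, mask, passthrough):
    """Lane-wise: a[i] + b[i] where mask[i] is true, else passthrough[i]."""
    return [x + y if m else p for x, y, m, p in zip(a, b, mask, passthrough)]

def matrix_multiply(a, b, m, k, n, stride=4):
    """Column-major (m x k) * (k x n); each column padded to `stride` lanes."""
    out = [0.0] * (stride * n)
    for j in range(n):       # result column
        for i in range(m):   # result row
            out[j * stride + i] = sum(
                a[c * stride + i] * b[j * stride + c] for c in range(k))
    return out
```

A quick sanity check on the layout: multiplying a padded 3x3 identity by
a padded 3x2 operand reproduces the 3x2 operand unchanged.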
We've started an RFC that proposes exactly this.
The RFC proposes intrinsics that take a mask and an explicit vector
length argument. The explicit vector length is aimed at RISC-V V and NEC
SX-Aurora, and it can be legalized away for targets that do not support
it (e.g. AVX512). We also propose a couple of new attributes that should
help with function-call vectorization.
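As a rough model of the proposed semantics (the intrinsic names and
exact rules are up to the RFC; this illustration is mine): each
operation takes a mask and an explicit vector length, and a lane is
computed only when it is both below the vector length and enabled by
the mask:

```python
def vp_fadd(a, b, mask, evl):
    """Model of a predicated fadd with an explicit vector length (evl):
    lane i is computed only when i < evl and mask[i]; all other lanes
    are undefined (modeled as None here)."""
    return [a[i] + b[i] if i < evl and mask[i] else None
            for i in range(len(a))]
```

A target without explicit vector length support would legalize `evl`
away by folding it into the mask, i.e. using `mask[i] and i < evl` as
the effective predicate.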
I'll present this at the upcoming LLVM Social in Zurich on January 10th
for people who are interested. I also talked a bit about this at the
last DevMtg (from ~15:00 in https://youtu.be/BAZClv6nMxY).
Researcher / PhD Student
Compiler Design Lab (Prof. Hack)
Saarland University, Computer Science
Building E1.3, Room 4.31
Tel. +49 (0)681 302-57521 : moll at cs.uni-saarland.de
Fax. +49 (0)681 302-3065 : http://compilers.cs.uni-saarland.de/people/moll