[llvm-dev] [RFC] Vector Predication

Mon Feb 4 16:16:28 PST 2019

On 1/31/19 5:41 PM, Saito, Hideki wrote:
>
> I think you and I are talking about two different things.
>
> As far as Intel’s vector function ABI is concerned, unless the 
> programmer specifically says otherwise, given an OpenMP declare simd 
> function, the compiler will deduce the VF from the HW vector register 
> size and other function signatures. Of course, there can be different 
> vector function ABIs for different targets. The Intel compiler cost 
> model uses the vector function VF as part of loop vectorization VF 
> determination. So, it’s tightly coupled.
>
> A hypothetical vector target may vectorize such a vector function for 
> a 4096b vector, with an explicit VF parameter of 20 also passed to it, 
> to execute only the lower 20 elements of the whole thing.
>
> I think this scenario answers Philip’s question on why separate mask 
> and VF parameters and why VF can’t be conservatively deduced from the 
> mask/mask compute.
>
I think this does come close, yes.  There's still the question of just 
how common a short vectorized function of this form is in practice after 
inlining, but I can understand why being able to represent this 
cleanly/concisely would be useful.  My scheme would require the 
mask->length computation code to be inserted as essentially part of the 
prolog, and doing so might be reasonably expensive.
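
To make the mask->length computation concrete, here is a minimal sketch 
in C.  It assumes the mask is bit-packed into a 64-bit integer (bit i = 
lane i active), which is an illustration only; vlen_from_mask is a 
hypothetical name, not anything in LLVM.  The conservative length is the 
index of the highest active lane plus one.

```c
#include <stdint.h>

/* Hypothetical helper: derive a conservative vector length from a
 * bit-packed mask (assumption: at most 64 lanes, bit i = lane i).
 * Every active lane has an index strictly below the returned value. */
static unsigned vlen_from_mask(uint64_t mask)
{
    unsigned vlen = 0;
    /* Shift right until no set bits remain; the number of shifts is
     * the position of the highest set bit plus one. */
    while (mask != 0) {
        vlen++;
        mask >>= 1;
    }
    return vlen;
}
```

On a real target the mask would more likely be a vector of i1 or a 
predicate register rather than a scalar, so the equivalent computation 
could be noticeably more costly than this loop suggests.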

On the other hand, if the vector length is already part of the ABI - 
which it sounds like this case is - inserting a bit of dummy code which 
enforces that the predicate mask only has bits set below VLen could be 
done with a simple shift/dec/and sequence.  While the sequence itself 
would be dynamically useless, it would make it obvious what the vlen for 
the function was if it hadn't been expressed in the IR.
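
As a rough sketch of that shift/dec/and sequence, again assuming a 
bit-packed 64-bit mask purely for illustration (clamp_mask_to_vlen is a 
hypothetical name):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: clear every mask bit at or above vlen.
 * Assumption: bit-packed mask of at most 64 lanes, bit i = lane i. */
static uint64_t clamp_mask_to_vlen(uint64_t mask, unsigned vlen)
{
    assert(vlen <= 64);
    /* Shifting a 64-bit value by 64 is undefined in C, so the
     * full-width case is handled separately. */
    if (vlen == 64)
        return mask;
    uint64_t below_vlen = (UINT64_C(1) << vlen) - 1; /* shift, dec */
    return mask & below_vlen;                        /* and */
}
```

With the vlen of 20 from the quoted example, an all-ones mask comes back 
as 0xFFFFF.  If the ABI already guarantees the mask has no bits set at 
or above vlen, the result always equals the input, which is the sense in 
which the sequence is dynamically useless but still informative.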

Or alternatively, we could use the calling convention ABI detail to 
*assume* (and thus insert during SelectionDAG) the VLEN parameter's 
relation to the vector mask parameter.

My point in the above is not that this is obviously the right answer - 
it's not - simply that it probably could be made to work.  As such, I 
don't think we should be automatically assuming we have to match the IR 
definition precisely to the hardware. Doing so is a recipe for 
over-fitting and a hard-to-maintain long-term design.

It's worth pointing out that including the vlen parameter in the 
intrinsic definitions creates exactly the opposite problem on a SIMD 
platform.  (i.e. we have to mask out the predicate based on the length 
when generating code.)
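
A scalar model of what that might look like on a fixed-width SIMD 
target, as a sketch only: predicated_add and the 8-lane width are 
assumptions for illustration, not the proposed intrinsics.  The target 
has no length register, so codegen has to fold vlen into the predicate 
before the operation.

```c
#include <stdint.h>

enum { LANES = 8 }; /* assumed fixed SIMD width, for illustration */

/* Hypothetical scalar model of a predicated add that takes both a
 * mask and a vlen.  Lane i is written only if i < vlen and mask bit i
 * is set; all other lanes of dst are left untouched. */
static void predicated_add(int32_t *dst, const int32_t *a,
                           const int32_t *b, uint8_t mask, unsigned vlen)
{
    /* Extra work on a SIMD target: turn vlen into a mask ... */
    uint8_t len_mask = vlen >= LANES ? 0xFF
                                     : (uint8_t)((1u << vlen) - 1u);
    /* ... and combine it with the incoming predicate. */
    uint8_t active = (uint8_t)(mask & len_mask);

    for (unsigned i = 0; i < LANES; i++) {
        if (active & (1u << i))
            dst[i] = a[i] + b[i];
    }
}
```

The two statements before the loop are the cost in question: they are 
needed for every operation unless the backend can prove the mask is 
already limited to the first vlen lanes.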

Philip

p.s. Reminder, just playing devil's advocate.  No strong opinions 
actually held.  :)

> *From:*Bruce Hoult [mailto:bruce at hoult.org]
> *Sent:* Thursday, January 31, 2019 5:13 PM
> *To:* Saito, Hideki <hideki.saito at intel.com>
> *Cc:* Philip Reames <listmail at philipreames.com>; Robin Kruppe 
> <robin.kruppe at gmail.com>; David Greene <dag at cray.com>; via llvm-dev 
> <llvm-dev at lists.llvm.org>; Maslov, Sergey V 
> <sergey.v.maslov at intel.com>; Topper, Craig <craig.topper at intel.com>
> *Subject:* Re: [llvm-dev] [RFC] Vector Predication
>
> On Thu, Jan 31, 2019 at 4:31 PM Saito, Hideki via llvm-dev 
> <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
>
>     >when we have a mask loaded from an external source (memory,
>     function call boundary, etc...) and a short sequence of vector ops
>
>     Mask value from function call parameter is common. OpenMP declare
>     simd function does exactly that for the masked cases.
>
> Such a mask is at the application level, not at the vector 
> strip-mining loop level.
>
> As well as possibly being many times longer than the masks the 
> hardware works with, it's likely to not even be in the format the 
> hardware uses: different library APIs might pack a mask into bits, or 
> use one mask element per byte, short, or int.
>
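
To illustrate the format mismatch Bruce describes in the quoted text, 
here is a small sketch that converts a one-element-per-byte mask into a 
bit-packed one.  Both layouts and the name pack_byte_mask are 
assumptions for illustration; an actual library or target may use 
something different.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pack a byte-per-element mask (nonzero = lane
 * active) into bits, with element i going to bit i.  Only the first
 * 64 elements are considered. */
static uint64_t pack_byte_mask(const uint8_t *bytes, size_t n)
{
    uint64_t bits = 0;
    for (size_t i = 0; i < n && i < 64; i++) {
        if (bytes[i] != 0)
            bits |= UINT64_C(1) << i;
    }
    return bits;
}
```

An application-level mask longer than 64 elements would have to be 
converted one chunk at a time, which is the strip-mining step mentioned 
above.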