[llvm-dev] sum elements in the vector

Rail Shafigulin via llvm-dev llvm-dev at lists.llvm.org
Wed May 18 10:55:14 PDT 2016


On Wed, May 18, 2016 at 5:56 AM, Martin J. O'Riordan <
martin.oriordan at movidius.com> wrote:

> Hi Rail,
>
>
>
> We used a very simple pattern expansion (actually, not a pattern
> fragment).  For example, for AND, ADD (horizontal sum), OR and XOR of 4
> elements we use something like the following TableGen structure:
>
>
>
> class HORIZ_Op4<SDNode opc, RegisterClass regVT, ValueType rt,
>                 ValueType vt, string asmstr> :
>     SHAVE_Instr<(outs regVT:$dst), (ins VRF128:$src),
>                     !strconcat(asmstr, " $dst $src"),
>                     [(set regVT:$dst,
>                       (opc (rt (vector_extract(vt VRF128:$src), 0 ) ),
>                         (opc (rt (vector_extract(vt VRF128:$src), 1 ) ),
>                           (opc (rt (vector_extract(vt VRF128:$src), 2 ) ),
>                             (rt (vector_extract(vt VRF128:$src), 3 ) )
>                           )
>                         )
>                       )
>                     )]>;
>
>
>
> This is okay for 4 element vectors, and it will get selected if the
> programmer writes something like:
>
>
>
> vec[0] & vec[1] & vec[2] & vec[3]
>
>
>
> but not with a simple variant like:
>
>
>
> vec[0] & vec[2] & vec[1] & vec[3]
>
>
>
> If this was properly represented by an ISD node, the other permutations
> could be more easily handled through normalisation.  We “could” write
> patterns for each of the permutations, but it is verbose, and in practice
> most people only write it one way anyway.
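
A toy illustration of the ordering point above, in plain C rather than LLVM code. It assumes the operation is commutative and associative (true for AND, OR, XOR and integer ADD, not for floating-point add), so only the set of lanes matters, not the order they were written in. The function names are made up for the example.

```c
#include <stdbool.h>
#include <stddef.h>

#define LANES 4  /* assumed vector width, as in the 4-element pattern */

/* Order-sensitive check: roughly what one fixed pattern does.
 * idx[] holds the lane indices in the order the source wrote them. */
bool matches_fixed_order(const int idx[LANES]) {
    for (int i = 0; i < LANES; i++)
        if (idx[i] != i)
            return false;
    return true;
}

/* Normalised check: accept any order, as long as every lane
 * appears exactly once. */
bool covers_all_lanes(const int idx[LANES]) {
    bool seen[LANES] = { false };
    for (size_t i = 0; i < LANES; i++) {
        if (idx[i] < 0 || idx[i] >= LANES || seen[idx[i]])
            return false;  /* out of range or duplicated lane */
        seen[idx[i]] = true;
    }
    return true;
}
```

With this model, the index list {0, 2, 1, 3} from the second expression fails the fixed-order check but passes the normalised one, which is the behaviour a dedicated node plus normalisation would give.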
>
>
>
> The 8-lane equivalent leaves TableGen thinking for quite a long time,
> and the 16-lane equivalent seems to hang TableGen.
>
>
>
>             MartinO
>

Martin,

Thanks for the reply. If I read the pattern correctly (and I'm not sure that
I do), you extract data from the vector first and then perform an
operation. What I'm trying to do is place data into a vector and then perform
an operation. Here is what I'm talking about:

Convert the following:

int a[] = {1, 2, 3, 4};
int sum = 0;
for (int i = 0; i < 4; i++)
  sum+= a[i];

into

vector.load vector.register.0, addressOfA
horizontal.add gpr.0, vector.register.0

My original thought was to match a pattern of adds and then use the insert_elt
instruction, but this doesn't produce a good result, since it uses
more instructions than a chain of adds. Do you think there is a simple
solution to the problem I'm describing, or do I have to make some major
code changes? As I said before, my experience with all of this is very
limited, so any help is greatly appreciated.
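
To make the intended equivalence concrete, here is a rough C model of the two forms. It is only a sketch: vec4i, vector_load and horizontal_add are made-up stand-ins for the vector register and the two pseudo-instructions above, not anything that exists in LLVM or in my target.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4  /* assumed vector width, matching the 4-element array */

/* The scalar loop from the example: a linear chain of adds. */
int sum_loop(const int *a, size_t n) {
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Hypothetical stand-in for one 4-lane vector register. */
typedef struct { int lane[LANES]; } vec4i;

/* Models "vector.load": copy LANES consecutive ints from memory. */
vec4i vector_load(const int *addr) {
    assert(addr != NULL);
    vec4i v;
    for (int i = 0; i < LANES; i++)
        v.lane[i] = addr[i];
    return v;
}

/* Models "horizontal.add": fold all lanes into one scalar. */
int horizontal_add(vec4i v) {
    return v.lane[0] + v.lane[1] + v.lane[2] + v.lane[3];
}
```

For the array {1, 2, 3, 4}, sum_loop(a, 4) and horizontal_add(vector_load(a)) both give 10, which is the rewrite I would like the compiler to find on its own.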

P.S. I wasn't able to find anything related to SHAVE_Instr in the LLVM
trunk. Are you guys not committing your work?





>
>
> *From:* Rail Shafigulin [mailto:rail at esenciatech.com]
> *Sent:* 16 May 2016 23:50
> *To:* Martin J. O'Riordan
> *Cc:* LLVM Developers
>
> *Subject:* Re: [llvm-dev] sum elements in the vector
>
>
>
>
>
>
>
> On Mon, May 16, 2016 at 3:11 AM, Martin J. O'Riordan via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> This would be really cool.  We have several instructions that perform
> horizontal vector operations, and have to use built-ins to select them as
> there is no easy way of expressing them in a TD file.  Some like SUM for a
> ‘v4i32’ are easy enough to express with a pattern fragment,
>
> Do you mind sharing how to do it with a pattern fragment? I'm not new to
> TD files but all the work I've done was very simple.
>
>
>
>
>
> SUM ‘v8i16’ takes TableGen a long time to compute, but SUM ‘v16i8’
> resulted in TableGen disappearing into itself for hours trying to reduce
> the patterns before I gave up and cancelled it.
>
>
>
> If there were ISD nodes for these, then it would be far simpler to express
> in TableGen, and also, the pattern fragments only match a very specific
> form of IR to the desired instruction.
>
>
>
> The horizontal operations are particularly useful for finalising a
> vectorised operation - for example I may want to compute the scalar MAX,
> MIN or SUM of a large number of items.  If the number of items is divisible
> by the vector lanes (e.g. 4, 8, or 16 in our case), then 4, 8 or 16 at a
> time can be computed using normal vector operation, and then the final
> scalar value can be computed using a single horizontal operation.
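
The finalisation step described above can be sketched in plain C. This is a scalar model under stated assumptions, not target code: LANES is fixed at 4, the accumulator array stands in for one vector register, and vectorised_sum is a made-up name.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4  /* assumed lane count; the message mentions 4, 8 or 16 */

int vectorised_sum(const int *data, size_t n) {
    assert(n % LANES == 0);  /* item count divisible by the lane count */
    int acc[LANES] = { 0 };  /* models one vector accumulator register */

    /* Main loop: one normal lane-wise vector add per LANES items. */
    for (size_t i = 0; i < n; i += LANES)
        for (size_t l = 0; l < LANES; l++)
            acc[l] += data[i + l];

    /* Finalisation: a single horizontal operation folds the lanes
     * of the accumulator into the scalar result. */
    int total = 0;
    for (size_t l = 0; l < LANES; l++)
        total += acc[l];
    return total;
}
```

For eight items 1 through 8 this performs two lane-wise adds followed by one horizontal sum and returns 36. The same shape applies to MAX or MIN by swapping the operator in both loops and choosing a suitable initial accumulator value.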
>
>
>
>             MartinO
>
>
>
> *From:* llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] *On Behalf Of *Chandler
> Carruth via llvm-dev
> *Sent:* 16 May 2016 2:16
> *To:* Shahid, Asghar-ahmad; Rail Shafigulin; llvm-dev; Hal Finkel
>
>
> *Subject:* Re: [llvm-dev] sum elements in the vector
>
>
>
> I'm starting to think we should directly implement horizontal operations
> on vector types.
>
>
>
> My suspicion is that coming up with a nice model for this would help us a
> lot with things like:
>
> - Idiom recognition of reduction patterns that use horizontal arithmetic
>
> - Ability to use horizontal operations in SLPVectorizer
>
> - Significantly easier cost modeling of vectorizing loops with reductions
> in LoopVectorize
>
> - Other things I've not thought of?
>
>
>  Curious what others think?
>
>
>
> -Chandler
>
>
>
> On Wed, May 11, 2016 at 10:07 PM Shahid, Asghar-ahmad via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> > why in order to add this particular instruction (sum elements in a
> vector) I need to add an intrinsic?
>
> Adding an intrinsic is not the only way; it is just one of the ways, and the
> user will NOT be required to invoke it specifically.
>
>
>
> Currently LLVM does not have any instruction that directly represents “sum of
> elements in a vector” and generates your particular instruction. However, you
> can do it without an intrinsic by pattern matching the LLVM IR representing
> “sum of elements in a vector” to your particular instruction in DAGCombiner.
>
>
>
> Regards,
>
> Shahid
>
>
>
>
>
> *From:* Rail Shafigulin [mailto:rail at esenciatech.com]
> *Sent:* Monday, May 09, 2016 11:59 PM
> *To:* Shahid, Asghar-ahmad; llvm-dev
> *Cc:* Das, Dibyendu
>
>
> *Subject:* Re: [llvm-dev] sum elements in the vector
>
>
>
> I'm a little confused. Here is why.
>
>
>
> I was able to add a vector add instruction to my target without using any
> intrinsics and without adding any new instructions to LLVM. So here is my
> question: how come I managed to add a new vector instruction without adding
> an intrinsic, yet in order to add this particular instruction (sum
> elements in a vector) I need to add an intrinsic?
>
>
>
> Another question that I have is whether the compiler will be able to target
> this new instruction (sum elements in a vector) if it is implemented as an
> intrinsic, or whether the user will have to specifically invoke the intrinsic.
>
>
>
> Pardon if questions seem dumb, I'm still learning things.
>
>
>
> Any help is appreciated.
>
>
>
> On Fri, May 6, 2016 at 1:51 PM, Rail Shafigulin <rail at esenciatech.com>
> wrote:
>
> Thanks for the reply. These steps will add an instruction as an intrinsic.
> Is it possible to add an actual new instruction so that a compiler could
> target it during an optimization? How hard is it to do? Is that a
> realistic objective?
>
>
>
> Rail
>
>
>
> On Mon, Apr 4, 2016 at 9:02 PM, Shahid, Asghar-ahmad <
> Asghar-ahmad.Shahid at amd.com> wrote:
>
> Hi Rail,
>
>
>
> We did this for generation of the X86 PSAD (sum of absolute differences)
> instruction through an LLVM intrinsic. Doing this requires the following:
>
> 1. Define an intrinsic, xyz(), for the required instruction and the
>    corresponding SDNode
> 2. Generate the “call xyz()” IR based on the matched pattern
> 3. Map the “call xyz()” IR to the corresponding SDNode in
>    SelectionDAGBuilder.cpp
> 4. Provide a default expansion of the xyz() intrinsic
> 5. Legalize the type and/or operation
> 6. Provide lowering of the intrinsic/SDNode to generate your target
>    instruction
>
>
>
> You can visit http://llvm.org/docs/ExtendingLLVM.html for details.
>
>
>
> Regards,
>
> Shahid
>
>
>
>
>
>
>
> *From:* llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] *On Behalf Of *Rail
> Shafigulin via llvm-dev
> *Sent:* Monday, April 04, 2016 11:00 PM
> *To:* Das, Dibyendu
> *Cc:* llvm-dev at lists.llvm.org
> *Subject:* Re: [llvm-dev] sum elements in the vector
>
>
>
> Thanks for the pointers. I looked at the hadd instructions. They seem to do
> something very similar to what I need. Unfortunately, as I said before, my LLVM
> experience is limited. My understanding is that when I create a new type of
> SDNode I need to specify a pattern for it, so that when LLVM is analyzing
> the code and is seeing a given pattern it would create this particular
> node. I'm really struggling to understand how it is done. So here are the
> problems that I'm having.
>
>
>
> 1. How do I identify the pattern that should be used?
>
> 2. How do I specify a given pattern?
>
>
>
> Do you (or someone else) mind helping me out?
>
>
>
> Any help is appreciated.
>
>
>
> On Mon, Apr 4, 2016 at 9:59 AM, Das, Dibyendu <Dibyendu.Das at amd.com>
> wrote:
>
> This is roughly along the lines of x86 hadd* instructions though the
> semantics of hadd* may not exactly match what you are looking for. This is
> probably more in line with x86/ARM SAD-like instructions but I don’t think
> llvm generates SAD without intrinsics.
>
>
>
> *From:* llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] *On Behalf Of *Rail
> Shafigulin via llvm-dev
> *Sent:* Monday, April 04, 2016 9:34 AM
> *To:* llvm-dev <llvm-dev at lists.llvm.org>
> *Subject:* [llvm-dev] sum elements in the vector
>
>
>
> My target has an instruction that adds up all elements in the vector and
> stores the result in a register. I'm trying to implement it in my compiler
> but I'm not sure even where to start.
>
>
>
> I did look at other targets, but they don't seem to have anything like it
> ( I could be wrong. My experience with LLVM is limited, so if I missed it,
> I'd appreciate if someone could point it out ).
>
>
>
> My understanding is that if SDNode for such an instruction doesn't exist I
> have to define one. Unfortunately, I don't know how to do it. I don't even
> know where to start looking. Would someone care to point me in the right
> direction?
>
>
>
> Any help is appreciated.
>
>
>
> --
>
> Rail Shafigulin
>
> Software Engineer
> Esencia Technologies
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>



-- 
Rail Shafigulin
Software Engineer
Esencia Technologies