[PATCH] [PATCH][CodeGen] Adding "llvm.sad" intrinsic and corresponding ISD::SAD node for "Sum Of Absolute Differences" operation
James Molloy
james at jamesmolloy.co.uk
Thu Apr 16 09:11:52 PDT 2015
Hi Asghar-ahmad,
Thanks for responding. I'll try and explain in more detail what I mean. I
agree that we can custom lower things and that we could implement your
intrinsic on our and other architectures. That is not in question. What is
in question is whether the definition of the intrinsic and behaviour as-is
would allow *efficient* implementation on multiple target architectures.
To reiterate the examples from earlier, there are seemingly two different
approaches for lowering a sum-of-absolute-differences loop. Assume 'a' and
'b' are the two input arrays, as some pointer to vector type.
1:

  int r = 0;
  for (i = ...) {
    r += sum( abs( *a++ - *b++ ) );
  }
  // r holds the sum-of-absolute-differences

2:

  vector int r = {0};
  for (i = ...) {
    r += abs( *a++ - *b++ );
  }
  // r holds partial sums.
  int sad = sum(r);
  // sad holds the sum-of-absolute-differences
The most efficient form of lowering for X86 may well be (1), where a
PSAD instruction can be used (although perhaps not for non-i8 types?). For
ARM, AArch64 and, according to an appnote I found [0], AltiVec (I couldn't
find anything about MIPS), (2) is going to be better.
So the goal as I see it is to define these intrinsics and IR idioms such
that both forms can be generated depending on the target (and/or datatype -
you don't have PSAD for floating point types, so if someone does a non-int
SAD loop the most efficient form for you would be (2)).
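To make the difference concrete, here is a rough C sketch of how (1) and (2)
tend to look with today's target intrinsics. It is illustrative only: the
function names, the unsigned-i8 element type, the 128-bit vectors and the
assumption that n is a multiple of 16 are all mine, not anything from the
patch (and the two functions would of course live in different translation
units).

  #include <emmintrin.h>   /* x86 SSE2 version */
  #include <arm_neon.h>    /* AArch64 NEON version */
  #include <stddef.h>
  #include <stdint.h>

  /* Form (1)-ish on x86/SSE2: _mm_sad_epu8 (PSADBW) does the subtract, abs
     and most of the horizontal sum in one instruction; only a small fold of
     two 64-bit halves is left after the loop. Note: unsigned i8 only. */
  uint64_t sad_sse2(const uint8_t *a, const uint8_t *b, size_t n) {
    __m128i acc = _mm_setzero_si128();
    for (size_t i = 0; i < n; i += 16) {
      __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
      __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
      acc = _mm_add_epi64(acc, _mm_sad_epu8(va, vb)); /* two u64 partial sums */
    }
    return (uint64_t)_mm_cvtsi128_si64(acc) +
           (uint64_t)_mm_cvtsi128_si64(_mm_srli_si128(acc, 8));
  }

  /* Form (2) on AArch64/NEON: vabdq_u8 (UABD) gives per-element |a - b|; the
     partial sums stay in a vector register and the single cross-lane sum
     (ADDV) happens once, outside the loop. */
  uint32_t sad_neon(const uint8_t *a, const uint8_t *b, size_t n) {
    uint32x4_t acc = vdupq_n_u32(0);
    for (size_t i = 0; i < n; i += 16) {
      uint8x16_t d = vabdq_u8(vld1q_u8(a + i), vld1q_u8(b + i));
      acc = vpadalq_u16(acc, vpaddlq_u8(d)); /* widen and accumulate */
    }
    return vaddvq_u32(acc); /* the only horizontal step */
  }

Even the x86 version keeps its (cheap) final fold outside the loop; the real
difference is that PSADBW hides most of the horizontal work inside one
instruction, while UABD has no horizontal component at all. Both are also
unsigned-i8 only, which is where the signedness question comes in.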
With regard to custom lowering, I think you have misinterpreted me. What I
was saying was that if the intrinsic is defined as "sum( abs( *a++ - *b++ )
)", non-X86 backends could custom lower it as something like "ABD + ADDV"
(absolute difference, sum-of-all-lanes). However, you'd end up with the
sum-of-all-lanes unnecessarily being inside the loop! By the time the
intrinsic is expanded, it may be difficult to determine that the sum can be
moved outside the loop.
Conversely, if we defined the intrinsic as "abs( *a++ - *b++ )", we could
still easily generate loop type (1) by adding a sum() around it. As it is
easier to match a pattern than to split a pattern apart and move it around
(ISel is made for pattern matching!), this is the implementation I am
suggesting.
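To be explicit about the split I am proposing, here is a scalar sketch.
Purely illustrative: absdiff_u8 is a made-up helper name, not a proposed
intrinsic name, and the types are just for the i8 case.

  #include <stddef.h>
  #include <stdint.h>

  /* The per-element building block: this is all the intrinsic would model. */
  static inline uint32_t absdiff_u8(uint8_t x, uint8_t y) {
    return x > y ? (uint32_t)(x - y) : (uint32_t)(y - x);
  }

  /* The surrounding sum is ordinary code/IR. X86 can pattern match
     "sum of absdiff" into PSAD; other targets lower absdiff on its own and
     leave the reduction wherever the vectorizer/LICM put it. */
  uint32_t sad_reference(const uint8_t *a, const uint8_t *b, size_t n) {
    uint32_t sum = 0;
    for (size_t i = 0; i != n; ++i)
      sum += absdiff_u8(a[i], b[i]);
    return sum;
  }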
Yes, you're right, this means the name "SAD" for the node may be a
misnomer. What I've asked for is the splitting apart of an opaque intrinsic
into a smaller opaque intrinsic and generic support IR, which is something
we do try and do where possible elsewhere in the compiler. I hope I've
explained why the node as you've described it may not be useful for any
non-X86 target.
Also, you haven't answered my question about signedness that I've mentioned
several times.
Cheers,
[0]
http://www.freescale.com/webapp/sps/download/license.jsp?colCode=AVEC_SAD
On Thu, 16 Apr 2015 at 12:12 Shahid, Asghar-ahmad <
Asghar-ahmad.Shahid at amd.com> wrote:
>
>
> > -----Original Message-----
> > From: James Molloy [mailto:james.molloy at arm.com]
> > Sent: Thursday, April 16, 2015 3:00 PM
> > To: Shahid, Asghar-ahmad; hfinkel at anl.gov; spatel at rotateright.com;
> > aschwaighofer at apple.com; ahmed.bougacha at gmail.com;
> > james.molloy at arm.com
> > Cc: llvm-commits at cs.uiuc.edu
> > Subject: Re: [PATCH] [PATCH][CodeGen] Adding "llvm.sad" intrinsic and
> > corresponding ISD::SAD node for "Sum Of Absolute Differences" operation
> >
> > Hi,
> >
> > Thanks for continuing to work on this!
> >
> > From a high level, I still stand by the comments I made on your
> > LoopVectorizer review that this is a bit too Intel-specific. ARM and
> > AArch64's SAD instructions do not behave as you model here. You model SAD
> > as a per-element subtract, abs, then a cross-lane (horizontal) summation.
> > The VABD instruction in ARM does not do the last step - that is assumed
> > to be done in a separate instruction.
> While modeling SAD we wanted to be more generic than being tilted towards
> any particular target. That is why we abstracted the whole semantics (sub,
> abs, sum) into this intrinsic. We have even modeled the result to be a
> scalar value.
>
> >
> > Now, we can model your intrinsic semantics in ARM using:
> >
> > val2 = ABD val0, val1
> > val3 = ADDV val2
> This can be custom lowered for ARM. At one point X86 also needed this kind
> of custom lowering for the v16i8 operand type, which I wrongly put into
> ExpandSAD() and which needs to be brought back to X86 lowering.
>
> >
> > However that isn't very efficient - the sum-across-lanes is expensive.
> > Ideally, in a loop, we'd do something like this:
> >
> > loop:
> >   reduction1 = ABD array0[i], array1[i]
> >   br loop
> > val = ADDV reduction1
> >
> > So we'd only do the horizontal step once, not on every iteration. That is
> > why I think having the intrinsic represent just the per-element
> > operations, not the horizontal part, would be the lowest common
> > denominator between our two targets. It is easier to pattern match two
> > nodes than to split one node apart and LICM part of it.
> >
> > Similarly, the above construction, should the loop vectorizer emit it, is
> > not very good for you. You would never be able to match your PSAD
> > instruction.
> We have already matched "psad" coming from both LV & SLP, so I think this
> generalization is ok.
>
> > So my suggestion is this:
> >
> > - The intrinsic only models the per-element operations, not the
> > horizontal part.
> This is modeling "absolute diff" rather than "sum of absolute diff".
>
> > - The X86 backend pattern matches the sad intrinsic plus a horizontal
> > reduction to its PSAD node.
> > - The Loop Vectorizer has code to emit *either* the per-element-only or
> > the per-element-plus-horizontal SAD as part of its reduction logic.
> > - The TargetTransformInfo hook is extended to return an `enum SADType`,
> > in much the same way that `popcnt` type is done currently.
> >
> > How does this sound?
> With the current modeling I don't see why custom lowering will not be a
> good solution for any target-related issues.
>
> >
> > Cheers,
> >
> > James
> >
> >
> > REPOSITORY
> > rL LLVM
> >
> > http://reviews.llvm.org/D9029