[PATCH] D10964: [Codegen] Add intrinsics 'hsum*' and corresponding SDNodes for horizontal sum operation.

Hal Finkel via llvm-commits llvm-commits at lists.llvm.org
Thu Oct 1 10:28:42 PDT 2015


----- Original Message -----
> From: "Asghar-ahmad Shahid" <Asghar-ahmad.Shahid at amd.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: llvm-commits at lists.llvm.org, reviews+D10964+public+8430e6553a631d12 at reviews.llvm.org, "renato golin"
> <renato.golin at linaro.org>, "james molloy" <james.molloy at arm.com>
> Sent: Thursday, October 1, 2015 12:16:56 PM
> Subject: RE: [PATCH] D10964: [Codegen] Add intrinsics 'hsum*' and corresponding SDNodes for horizontal sum operation.
> 
> Hi Hal,
> 
> > This pattern does not look hard to match in the backend, albeit a
> > bit tedious.
> > Did you try and run into problems?
> 
> No, I did not try that.

Also, as I recall, James was working on a patch (or helping to review one?) to help eliminate those extra zexts from this kind of vectorized code. James, can you comment on what happened there?

Thanks again,
Hal

> 
> Regards,
> Shahid
> 
> > -----Original Message-----
> > From: Hal Finkel [mailto:hfinkel at anl.gov]
> > Sent: Thursday, October 01, 2015 5:58 PM
> > To: Shahid, Asghar-ahmad
> > Cc: llvm-commits at lists.llvm.org;
> > reviews+D10964+public+8430e6553a631d12 at reviews.llvm.org; renato
> > golin;
> > james molloy
> > Subject: Re: [PATCH] D10964: [Codegen] Add intrinsics 'hsum*' and
> > corresponding SDNodes for horizontal sum operation.
> > 
> > ----- Original Message -----
> > > From: "Asghar-ahmad Shahid" <Asghar-ahmad.Shahid at amd.com>
> > > To: "Hal Finkel" <hfinkel at anl.gov>
> > > Cc: llvm-commits at lists.llvm.org,
> > reviews+D10964+public+8430e6553a631d12 at reviews.llvm.org, "renato
> > golin"
> > > <renato.golin at linaro.org>, "james molloy" <james.molloy at arm.com>
> > > Sent: Thursday, October 1, 2015 3:32:30 AM
> > > Subject: RE: [PATCH] D10964: [Codegen] Add intrinsics 'hsum*' and
> > corresponding SDNodes for horizontal sum operation.
> > >
> > > Hi Hal,
> > >
> > > Reply inlined.
> > >
> > > > -----Original Message-----
> > > > From: Hal Finkel [mailto:hfinkel at anl.gov]
> > > > Sent: Thursday, October 01, 2015 4:14 AM
> > > > To: Shahid, Asghar-ahmad
> > > > Cc: llvm-commits at lists.llvm.org;
> > > > reviews+D10964+public+8430e6553a631d12 at reviews.llvm.org; renato
> > > > golin;
> > > > james molloy
> > > > Subject: Re: [PATCH] D10964: [Codegen] Add intrinsics 'hsum*'
> > > > and
> > > > corresponding SDNodes for horizontal sum operation.
> > > >
> > > > ----- Original Message -----
> > > > > From: "Asghar-ahmad Shahid" <Asghar-ahmad.Shahid at amd.com>
> > > > > To: "Hal Finkel" <hfinkel at anl.gov>
> > > > > Cc: llvm-commits at lists.llvm.org,
> > > > reviews+D10964+public+8430e6553a631d12 at reviews.llvm.org,
> > > > "renato
> > > > golin"
> > > > > <renato.golin at linaro.org>, "james molloy"
> > > > > <james.molloy at arm.com>
> > > > > Sent: Wednesday, September 30, 2015 1:19:06 AM
> > > > > Subject: RE: [PATCH] D10964: [Codegen] Add intrinsics 'hsum*'
> > > > > and
> > > > corresponding SDNodes for horizontal sum operation.
> > > > >
> > > > > Hi Hal,
> > > > >
> > > > > Let me present the full context. For below example:
> > > > > ===============================================
> > > > > unsigned sad_fun(unsigned char *a1, unsigned char *a2) {
> > > > >   unsigned sum = 0;
> > > > >   for (int i = 0; i < 16; i++) {
> > > > >     sum += abs(a1[i] - a2[i]);
> > > > >   }
> > > > >   return sum;
> > > > > }
> > > > > ===============================================
> > > > >
> > > > > LLVM IR:
> > > > >
> > > > > ; ModuleID = 'sad_1.c'
> > > > > target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
> > > > > target triple = "x86_64-unknown-linux-gnu"
> > > > >
> > > > > ; Function Attrs: nounwind readonly uwtable
> > > > > define i32 @sad_fun(i8* nocapture readonly %a1, i8* nocapture readonly %a2) #0 {
> > > > > entry:
> > > > >   br label %vector.body
> > > > >
> > > > > vector.body:                                      ; preds = %entry
> > > > >   %0 = bitcast i8* %a1 to <4 x i8>*
> > > > >   %wide.load = load <4 x i8>, <4 x i8>* %0, align 1, !tbaa !1
> > > > >   %1 = zext <4 x i8> %wide.load to <4 x i32>
> > > > >   %2 = bitcast i8* %a2 to <4 x i8>*
> > > > >   %wide.load12 = load <4 x i8>, <4 x i8>* %2, align 1, !tbaa !1
> > > > >   %3 = zext <4 x i8> %wide.load12 to <4 x i32>
> > > > >   %4 = sub nsw <4 x i32> %1, %3
> > > > >   %5 = icmp sgt <4 x i32> %4, <i32 -1, i32 -1, i32 -1, i32 -1>
> > > > >   %6 = sub nsw <4 x i32> zeroinitializer, %4
> > > > >   %7 = select <4 x i1> %5, <4 x i32> %4, <4 x i32> %6
> > > > >   %8 = getelementptr inbounds i8, i8* %a1, i64 4
> > > > >   %9 = bitcast i8* %8 to <4 x i8>*
> > > > >   %wide.load.1 = load <4 x i8>, <4 x i8>* %9, align 1, !tbaa !1
> > > > >   %10 = zext <4 x i8> %wide.load.1 to <4 x i32>
> > > > >   %11 = getelementptr inbounds i8, i8* %a2, i64 4
> > > > >   %12 = bitcast i8* %11 to <4 x i8>*
> > > > >   %wide.load12.1 = load <4 x i8>, <4 x i8>* %12, align 1, !tbaa !1
> > > > >   %13 = zext <4 x i8> %wide.load12.1 to <4 x i32>
> > > > >   %14 = sub nsw <4 x i32> %10, %13
> > > > >   %15 = icmp sgt <4 x i32> %14, <i32 -1, i32 -1, i32 -1, i32 -1>
> > > > >   %16 = sub nsw <4 x i32> zeroinitializer, %14
> > > > >   %17 = select <4 x i1> %15, <4 x i32> %14, <4 x i32> %16
> > > > >   %18 = add nsw <4 x i32> %17, %7
> > > > >   %19 = getelementptr inbounds i8, i8* %a1, i64 8
> > > > >   %20 = bitcast i8* %19 to <4 x i8>*
> > > > >   %wide.load.2 = load <4 x i8>, <4 x i8>* %20, align 1, !tbaa !1
> > > > >   %21 = zext <4 x i8> %wide.load.2 to <4 x i32>
> > > > >   %22 = getelementptr inbounds i8, i8* %a2, i64 8
> > > > >   %23 = bitcast i8* %22 to <4 x i8>*
> > > > >   %wide.load12.2 = load <4 x i8>, <4 x i8>* %23, align 1, !tbaa !1
> > > > >   %24 = zext <4 x i8> %wide.load12.2 to <4 x i32>
> > > > >   %25 = sub nsw <4 x i32> %21, %24
> > > > >   %26 = icmp sgt <4 x i32> %25, <i32 -1, i32 -1, i32 -1, i32 -1>
> > > > >   %27 = sub nsw <4 x i32> zeroinitializer, %25
> > > > >   %28 = select <4 x i1> %26, <4 x i32> %25, <4 x i32> %27
> > > > >   %29 = add nsw <4 x i32> %28, %18
> > > > >   %30 = getelementptr inbounds i8, i8* %a1, i64 12
> > > > >   %31 = bitcast i8* %30 to <4 x i8>*
> > > > >   %wide.load.3 = load <4 x i8>, <4 x i8>* %31, align 1, !tbaa !1
> > > > >   %32 = zext <4 x i8> %wide.load.3 to <4 x i32>
> > > > >   %33 = getelementptr inbounds i8, i8* %a2, i64 12
> > > > >   %34 = bitcast i8* %33 to <4 x i8>*
> > > > >   %wide.load12.3 = load <4 x i8>, <4 x i8>* %34, align 1, !tbaa !1
> > > > >   %35 = zext <4 x i8> %wide.load12.3 to <4 x i32>
> > > > >   %36 = sub nsw <4 x i32> %32, %35
> > > > >   %37 = icmp sgt <4 x i32> %36, <i32 -1, i32 -1, i32 -1, i32 -1>
> > > > >   %38 = sub nsw <4 x i32> zeroinitializer, %36
> > > > >   %39 = select <4 x i1> %37, <4 x i32> %36, <4 x i32> %38
> > > > >   %40 = add nsw <4 x i32> %39, %29
> > > > >   %rdx.shuf = shufflevector <4 x i32> %40, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
> > > > >   %bin.rdx = add <4 x i32> %40, %rdx.shuf
> > > > >   %rdx.shuf13 = shufflevector <4 x i32> %bin.rdx, <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
> > > > >   %bin.rdx14 = add <4 x i32> %bin.rdx, %rdx.shuf13
> > > > >   %41 = extractelement <4 x i32> %bin.rdx14, i32 0
> > > > >   ret i32 %41
> > > > > }
> > > > >
> > > > > In the above LLVM IR, the vectorizer has vectorized the SAD
> > > > > pattern with vector type v4i8. For example, on X86 we want to
> > > > > generate the "v16i8" version of the *PSAD* instruction.
> > > > > However, to do this we need to catch the full chain of the
> > > > > vectorized absolute difference and the reduction sum. We found
> > > > > this difficult to pattern match in codegen and so thought to go
> > > > > the intrinsic way.
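> > > > >
> > > > > (For illustration only: roughly what we would like that whole
> > > > > 16-byte loop to become on X86, sketched here with the existing
> > > > > SSE2 C intrinsics rather than the proposed IR intrinsics; the
> > > > > function name and structure are just my example, not part of
> > > > > the patch.)
> > > > >
> > > > > #include <emmintrin.h>  // SSE2 intrinsics
> > > > >
> > > > > unsigned sad16_sse2(const unsigned char *a1, const unsigned char *a2) {
> > > > >   __m128i va  = _mm_loadu_si128((const __m128i *)a1);
> > > > >   __m128i vb  = _mm_loadu_si128((const __m128i *)a2);
> > > > >   __m128i sad = _mm_sad_epu8(va, vb);    // PSADBW: two 64-bit partial sums
> > > > >   __m128i hi  = _mm_srli_si128(sad, 8);  // move the upper partial sum down
> > > > >   return (unsigned)_mm_cvtsi128_si32(_mm_add_epi64(sad, hi));
> > > > > }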
> > > >
> > > > But you need the vectorizer to vectorize in terms of v16i8
> > > > regardless of how you match the sum. You say that the pattern is
> > > > difficult to match, but do you mean matching it with the right or
> > > > the wrong vectorization factor?
> > >
> > > I mean matching with the right vectorization factor for a given
> > > target.
> > 
> > Good.
> > 
> > This pattern does not look hard to match in the backend, albeit a
> > bit tedious.
> > Did you try and run into problems?
> > 
> > >
> > > >
> > > > > We also realized that with an intrinsic we can also use SLP to
> > > > > vectorize the unrolled pattern if required. We then simplified
> > > > > the SAD intrinsic operation into two intrinsics which can be
> > > > > used to generate the "SAD" operation as well as be used in
> > > > > other contexts. For example "hsum", as mentioned by Bruno, can
> > > > > be used in the context of a POPCNT operation.
> > > > >
> > > > > > No; vectorization is not special in that regard, and needs
> > > > > > to be
> > > > > > semantics preserving. We have 'fast math' flags for
> > > > > > floating-point operations, which help there, but there are
> > > > > > no
> > > > > > such flags for integer operations.
> > > > > Do we need an ordering definition for the integer ADD operation?
> > > > >
> > > > > > Do the underlying hardware operations on x86 (etc.) really
> > > > > > not
> > > > > > define an ordering for the horizontal adds?
> > > > > X86 *PSAD*, which is an integer operation and does the
> > > > > horizontal add on the intermediate results, does not define an
> > > > > ordering.
> > > >
> > > > Do you mean that the intermediate accumulator has a larger
> > > > number of
> > > > bits than each individual vector element?
> > >
> > > No, by intermediate result I meant the result of the "Absolute
> > > Difference" operation, whose width is the same as that of an
> > > individual vector element.
> > 
> > Alright, looking at the documentation for PSADBW, I believe I see the
> > source of our confusion. PSADBW is defined to, "Computes the absolute
> > value of the difference of 8 unsigned byte integers from the source
> > operand (first operand) and from the destination operand (second
> > operand). These 8 differences are then summed to produce an unsigned
> > word integer result that is stored in the destination operand."
> > 
> > So, this, fundamentally, is an *unsigned* i8 operation. If this is
> > what you're trying to model, we should remove from our discussion the
> > topic of nsw-related semantics here. nuw also does not apply. Wrapping
> > of all kinds is perfectly well defined. Given that wrapping is well
> > defined, and that unsigned integer addition is associative (when we
> > don't have wrapping restrictions), we also don't need to discuss the
> > order of the additions either (they're irrelevant; all orderings would
> > be permitted under 'as if' regardless of what we specified).
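> > 
> > (To make that reading concrete, a minimal scalar sketch, entirely my
> > own illustration in plain C++: everything is unsigned, the eight byte
> > differences sum to at most 8 * 255 = 2040, so the result always fits
> > in an unsigned word, and any order or association of the adds gives
> > the same value.)
> > 
> > #include <cstdint>
> > 
> > // Reference model for one PSADBW half: the SAD of 8 unsigned bytes.
> > uint16_t psadbw_ref(const uint8_t *a, const uint8_t *b) {
> >   uint16_t sum = 0;
> >   for (int i = 0; i < 8; ++i)
> >     sum += a[i] > b[i] ? a[i] - b[i] : b[i] - a[i];
> >   return sum;
> > }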
> > 
> >  -Hal
> > 
> > >
> > > >
> > > >  -Hal
> > > >
> > > > >
> > > > > Regards,
> > > > > Shahid
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Hal Finkel [mailto:hfinkel at anl.gov]
> > > > > > Sent: Wednesday, September 30, 2015 3:42 AM
> > > > > > To: Shahid, Asghar-ahmad
> > > > > > Cc: llvm-commits at lists.llvm.org;
> > > > > > reviews+D10964+public+8430e6553a631d12 at reviews.llvm.org;
> > renato
> > > > > > golin;
> > > > > > james molloy
> > > > > > Subject: Re: [PATCH] D10964: [Codegen] Add intrinsics
> > > > > > 'hsum*'
> > > > > > and
> > > > > > corresponding SDNodes for horizontal sum operation.
> > > > > >
> > > > > > ----- Original Message -----
> > > > > > > From: "Asghar-ahmad via llvm-commits Shahid"
> > > > > > > <llvm-commits at lists.llvm.org>
> > > > > > > To:
> > > > > > > reviews+D10964+public+8430e6553a631d12 at reviews.llvm.org,
> > > > > > > "renato
> > > > > > golin" <renato.golin at linaro.org>, "james molloy"
> > > > > > > <james.molloy at arm.com>
> > > > > > > Cc: llvm-commits at lists.llvm.org
> > > > > > > Sent: Sunday, September 27, 2015 11:42:46 PM
> > > > > > > Subject: RE: [PATCH] D10964: [Codegen] Add intrinsics
> > > > > > > 'hsum*'
> > > > > > > and
> > > > > > corresponding SDNodes for horizontal sum operation.
> > > > > > >
> > > > > > > Hi Hal,
> > > > > > >
> > > > > > > Response inlined.
> > > > > > >
> > > > > > > Regards,
> > > > > > > Shahid
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: hfinkel at anl.gov [mailto:hfinkel at anl.gov]
> > > > > > > > Sent: Saturday, September 26, 2015 2:58 AM
> > > > > > > > To: Shahid, Asghar-ahmad; renato.golin at linaro.org;
> > > > > > > > hfinkel at anl.gov; james.molloy at arm.com
> > > > > > > > Cc: llvm-dev at redking.me.uk; bruno.cardoso at gmail.com;
> > > > > > > > llvm-
> > > > > > > > commits at lists.llvm.org
> > > > > > > > Subject: Re: [PATCH] D10964: [Codegen] Add intrinsics
> > > > > > > > 'hsum*'
> > > > > > > > and
> > > > > > > > corresponding SDNodes for horizontal sum operation.
> > > > > > > >
> > > > > > > > hfinkel added a comment.
> > > > > > > >
> > > > > > > > I need to take a step back here; why are we doing this
> > > > > > > > again?
> > > > > > > >
> > > > > > > > I read again the RFC thread
> > > > > > > > (http://lists.llvm.org/pipermail/llvm-dev/2015-
> > > > > > > > May/085078.html), and it ended with the following (from
> > > > > > > > Renato):
> > > > > > > >
> > > > > > > > > > BTW, now my plan is to just add the two intrinsics
> > > > > > > > > > for 'absolute difference' and 'horizontal add'.
> > > > > > > > >
> > > > > > > > > That's ok, as long as they're impossible to represent
> > > > > > > > > in plain IR.
> > > > > > > >
> > > > > > > >
> > > > > > > > and I think that we were all in agreement on this
> > > > > > > > point.
> > > > > > > > But now
> > > > > > > > I'm not sure you've demonstrated the prerequisite.
> > > > > > > Prior to this RFC there was a discussion around a patch
> > > > > > > (http://reviews.llvm.org/D9029) regarding an llvm.sad
> > > > > > > intrinsic, which highlighted the need for llvm.hsum and
> > > > > > > friend intrinsics. This RFC was a result of that discussion.
> > > > > > >
> > > > > > > > The underlying operations here (and in
> > > > > > > > http://reviews.llvm.org/D10867) seem like they are
> > > > > > > > representable using IR (as demonstrated by the fact
> > > > > > > > that you
> > > > > > > > provide potentially-equivalent IR sequences in the
> > > > > > > > documentation), except for the ordering freedom here.
> > > > > > >
> > > > > > > Of course the logical expansion of the underlying operation
> > > > > > > is trivially representable, but pattern matching it in
> > > > > > > codegen is not trivial. The pattern matching is complex and
> > > > > > > may well be fragile in codegen due to the long chain of
> > > > > > > generic IR to be matched. Moreover, movement of the
> > > > > > > truncate/extend ops in the DAG increases the complexity of
> > > > > > > matching further.
> > > > > >
> > > > > > It might, but it is not clearly bad enough to require an
> > > > > > alternative solution.
> > > > > >
> > > > > > In my experience, the patterns that get difficult to match
> > > > > > at
> > > > > > the SDAG level are those that consist, in whole or in part,
> > > > > > of
> > > > > > operations that are illegal (either involving illegal types
> > > > > > or
> > > > > > illegal operations). This becomes difficult because you
> > > > > > either
> > > > > > need to match the post-legalized nodes (which might involve
> > > > > > undoing scalarization, etc.) or you need to match the nodes
> > > > > > pre-legalization (where you cannot depend on the
> > > > > > canonicalization that restricts the state space later).
> > > > > > Here,
> > > > > > however, we seem to be dealing with patterns of otherwise
> > > > > > legal
> > > > > > nodes, on otherwise legal vector types, and while the
> > > > > > matching
> > > > > > might not be trivial, it is not obviously difficult in
> > > > > > *ISelLowering.cpp either.
> > > > > >
> > > > > > The other issue that I've run into with SDAG pattern
> > > > > > matching is
> > > > > > the fact that it is basic-block local, and some of the
> > > > > > operations you want to match may have been CSE'd or
> > > > > > otherwise
> > > > > > hoisted into dominating blocks. We generally put code in
> > > > > > CGP to
> > > > > > deal with these kinds of things.
> > > > > >
> > > > > > One advantage of putting the code into SDAG pattern
> > > > > > matching is
> > > > > > that the patterns get used regardless of where the code
> > > > > > originates (vectorizers or manually-written vector code).
> > > > > > This
> > > > > > is important.
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > And this, I fear, is where we run into trouble. The
> > > > > > > > thing
> > > > > > > > that is not representable in the IR is that the order
> > > > > > > > of
> > > > > > > > operations in the horizontal sum is undefined, and at
> > > > > > > > the
> > > > > > > > same time, signed overflow is undefined. This cannot be
> > > > > > > > represented in the IR because nsw adds don't
> > > > > > > > reassociate,
> > > > > > > > and thus, there's no way to represent the sequence of
> > > > > > > > nsw
> > > > > > > > adds such that they can be reassociated while retaining
> > > > > > > > their nsw property. But the problem is that, because
> > > > > > > > this
> > > > > > > > freedom cannot be represented in the IR, we also can't
> > > > > > > > generate it from IR in a semantics-preserving way; and,
> > > > > > > > thus, it would not be legal to generate it in the
> > > > > > > > vectorizers.
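> > > > > > > >
> > > > > > > > (A tiny analogy, mine, in C++ rather than IR -- signed
> > > > > > > > overflow is undefined there just as it is for nsw adds --
> > > > > > > > showing that reassociation changes whether an intermediate
> > > > > > > > result overflows:)
> > > > > > > >
> > > > > > > > #include <climits>
> > > > > > > >
> > > > > > > > // With a = -1, b = INT_MAX, c = 1:
> > > > > > > > int left(int a, int b, int c)  { return (a + b) + c; }  // fine: INT_MAX - 1, then INT_MAX
> > > > > > > > int right(int a, int b, int c) { return a + (b + c); }  // b + c overflows: undefined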
> > > > > > >
> > > > > > > As proposed, we are not going to generate it in the
> > > > > > > vectorizer unconditionally, but only based on target
> > > > > > > checks; can't we also check for ordering relaxation there?
> > > > > > > Or fix the ordering constraint by some other means?
> > > > > >
> > > > > > No; vectorization is not special in that regard, and needs
> > > > > > to be
> > > > > > semantics preserving. We have 'fast math' flags for
> > > > > > floating-point operations, which help there, but there are
> > > > > > no
> > > > > > such flags for integer operations.
> > > > > >
> > > > > > Do the underlying hardware operations on x86 (etc.) really
> > > > > > not
> > > > > > define an ordering for the horizontal adds?
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > Thus, this change does not seem right, and approving
> > > > > > > > http://reviews.llvm.org/D10867 seems like a mistake as
> > > > > > > > well.
> > > > > > > > We could certainly fix the definition here to make it
> > > > > > > > exactly representable in the IR, but then what's the
> > > > > > > > point
> > > > > > > > of doing so?
> > > > > > > >
> > > > > > > > In the RFC, you mentioned cost modeling as a major
> > > > > > > > motivation for adding intrinsics, but that seems like
> > > > > > > > an
> > > > > > > > unrelated problem (at least in part).
> > > > > > >
> > > > > > > In the RFC, I had also proposed an llvm.sad intrinsic,
> > > > > > > which I thought would make cost modeling easier for those
> > > > > > > targets which support a SAD operation natively; hence the
> > > > > > > motivation. But Renato's explanation clarified the issues
> > > > > > > related to cost modeling. However, the complexity of
> > > > > > > pattern matching was always the bigger issue.
> > > > > >
> > > > > > Perhaps, but it is localized. And matching adds and
> > > > > > subtracts
> > > > > > plus shuffles does not seem like it should be unusually
> > > > > > painful.
> > > > > >
> > > > > > >
> > > > > > > > During vectorization, we can use special interfaces to
> > > > > > > > estimate the cost of complex patterns. In fact, we
> > > > > > > > already have an interface for reductions:
> > > > > > > > TTI.getReductionCost. There is a second relevant cost
> > > > > > > > model: that used by the unroller and inliner.
> > > > > > > > Vectorization happens after inlining, so that
> > > > > > > > interaction is not really relevant, but partial
> > > > > > > > unrolling happens after vectorization, and so the cost
> > > > > > > > model there might want to understand that a complex
> > > > > > > > sequence of shuffles, extracts and adds has a
> > > > > > > > disproportionately low cost. The same is true of the
> > > > > > > > inliner if the input IR uses vector types and initially
> > > > > > > > contains these operations, but even in that case, you'd
> > > > > > > > not want to canonicalize on the intrinsics too early in
> > > > > > > > case other optimizations remove the need for most of
> > > > > > > > the operations. Thus, in the end, you need
> > > > > > > > pattern-matching code near the end of the pipeline
> > > > > > > > anyway to account for input IR directly containing
> > > > > > > > relevant operations on vector types.
> > > > > > > >
> > > > > > > > In short, I don't understand why we're going in this
> > > > > > > > direction.
> > > > > > > > You
> > > > > > > > can match these operations in the backend, and you can
> > > > > > > > match
> > > > > > > > them in the cost-model code. If we can't do the latter,
> > > > > > > > then
> > > > > > > > we should fix that. And we already have special
> > > > > > > > interfaces
> > > > > > > > for vectorization costs for complex operations such as
> > > > > > > > this.
> > > > > > > >
> > > > > > > >
> > > > > > > > http://reviews.llvm.org/D10964
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > > _______________________________________________
> > > > > > > llvm-commits mailing list
> > > > > > > llvm-commits at lists.llvm.org
> > > > > > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
> > > > > > >
> > > > > >
> > > > > > --
> > > > > > Hal Finkel
> > > > > > Assistant Computational Scientist
> > > > > > Leadership Computing Facility
> > > > > > Argonne National Laboratory
> > > > >
> > > >
> > > > --
> > > > Hal Finkel
> > > > Assistant Computational Scientist
> > > > Leadership Computing Facility
> > > > Argonne National Laboratory
> > >
> > 
> > --
> > Hal Finkel
> > Assistant Computational Scientist
> > Leadership Computing Facility
> > Argonne National Laboratory
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

