[PATCH] Add support to recognize non SIMD kind of parallelism in SLPVectorizer

Wed Jun 18 09:12:32 PDT 2014

================
Comment at: lib/Analysis/TargetTransformInfo.cpp:575
@@ -572,1 +574,3 @@
+    if (Kind == SK_Alternate)
+      return 2;
     return 1;
----------------
Please leave a cost of 1 here.

================
Comment at: lib/CodeGen/BasicTargetTransformInfo.cpp:335
@@ -332,1 +334,3 @@
+  if (Kind == SK_Alternate)
+    return 2;
   return 1;
----------------
Karthik, thanks again for working on this.

Now that we generate shuffles we need to make sure costs are approximately right.

BasicTTI is supposed to return a conservative estimate if targets don't override it. We can't always return two here. The cost of the shuffle could be much higher.

We should return a conservative default. One is not right either.

A conservative cost would be something like VecEltNum * (cost(extractelement, Tp) + cost(insertelement, Tp)), that is, the cost of constructing the shuffled vector by extracting the individual elements and inserting them into the result vector.
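
As a rough illustration of that conservative default, here is a minimal standalone sketch. It assumes every lane is extracted once and inserted once. `ElementCosts` and `conservativeAlternateShuffleCost` are hypothetical names for this example, not existing TTI interfaces, and the per-element costs would really come from the target's cost model.

```cpp
#include <cassert>

// Hypothetical per-element costs (assumption for illustration only).
struct ElementCosts {
  unsigned Extract; // cost of one extractelement on the vector type
  unsigned Insert;  // cost of one insertelement on the vector type
};

// Conservative estimate: each of the NumElts lanes is extracted from one of
// the sources and then inserted into the result vector.
unsigned conservativeAlternateShuffleCost(unsigned NumElts, ElementCosts C) {
  assert(NumElts > 0 && "vector must have at least one element");
  return NumElts * (C.Extract + C.Insert);
}
```

With unit extract and insert costs, a 4-element vector would get a cost of 8 under this scheme, well above the flat 2 in the patch.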

Targets should then override this to provide more accurate estimates.

Costs should depend on the type. So for example:

X86TTI::getShuffleCost(SK_Alternate, <2 x double>) == 1
X86TTI::getShuffleCost(SK_Alternate, <4 x float>) == 2
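
One way a target override could look is a small per-type table with a conservative fallback. This is only a sketch: the table layout, the string-keyed types, and `lookupAlternateShuffleCost` are hypothetical, and the two costs are just the example values above, not measured numbers.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical table entry: a vector type spelled as text and its cost.
struct AltShuffleCostEntry {
  const char *Type;
  unsigned Cost;
};

// Example values taken from the comment above (assumption, not measured).
static const AltShuffleCostEntry AltShuffleCosts[] = {
    {"<2 x double>", 1},
    {"<4 x float>", 2},
};

// Returns the table cost for a known type; otherwise returns the
// caller-supplied conservative fallback.
unsigned lookupAlternateShuffleCost(const char *Type, unsigned Fallback) {
  assert(Type != nullptr && "type name must not be null");
  for (const AltShuffleCostEntry &E : AltShuffleCosts)
    if (std::strcmp(E.Type, Type) == 0)
      return E.Cost;
  return Fallback;
}
```

Types without an entry, such as the integer vectors, would keep the conservative estimate until someone checks the generated code and adds a number.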

We try to make vectorized code not slower than the scalar version. It is therefore important to not underestimate costs of vectorized code.

If we are doing integer addsubs, we need to make sure we return sensible costs for integer alternate shuffles as well.

(My earlier sample code used the wrong indices in the shuffle mask. The mask should have been 0, 5, 2, 7, of course.)

We will have to look at what code we generate for <8 x i16>, <16 x i8>, etc. For example:
  
  define void @test3(<8 x i16> *%a, <8 x i16> *%b, <8 x i16> *%c) {
  entry:
    %in1 = load <8 x i16>* %a
    %in2 = load <8 x i16>* %b
    %add = add <8 x i16> %in1, %in2
    %sub = sub <8 x i16> %in1, %in2
    %Shuff = shufflevector <8 x i16> %add,
                           <8 x i16> %sub,
                           <8 x i32> <i32 0, i32 9, i32 2, i32 11, i32 4, i32 13, i32 6, i32 15>
    store <8 x i16> %Shuff, <8 x i16>* %c
    ret void
  }
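
The mask in test3 follows the same pattern as the 0, 5, 2, 7 mask for four elements: even lanes come from the first operand, odd lanes from the second. A small sketch of generating such a mask for any element count follows. `alternateShuffleMask` is a hypothetical helper name for this example, not something in the patch.

```cpp
#include <cassert>
#include <vector>

// Builds an alternate shuffle mask for two NumElts-wide source vectors:
// even lanes select lane I of the first operand, odd lanes select lane I of
// the second operand (index NumElts + I in shufflevector numbering).
std::vector<int> alternateShuffleMask(unsigned NumElts) {
  assert(NumElts > 0 && "vector must have at least one element");
  std::vector<int> Mask;
  Mask.reserve(NumElts);
  for (unsigned I = 0; I < NumElts; ++I)
    Mask.push_back(static_cast<int>(I % 2 == 0 ? I : NumElts + I));
  return Mask;
}
```

For NumElts = 8 this yields 0, 9, 2, 11, 4, 13, 6, 15, matching the shufflevector above.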

http://reviews.llvm.org/D4015