[PATCH] Enable vectorization of GEP expressions in SLP

Michael Zolotukhin mzolotukhin at apple.com
Mon Jun 2 18:34:01 PDT 2014


Hi Arnold,

Thank you for the review! My comments are inline, and an updated patch is attached.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: slp-vectorize-gep-v2.patch
Type: application/octet-stream
Size: 5223 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140602/24c44309/attachment.obj>
-------------- next part --------------


On May 30, 2014, at 3:01 PM, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:

> +    case Instruction::GetElementPtr: {
> +      bool GEPVectorizable = true;
> +
> +      // We can't combine several GEPs into one vector if they have different
> +      // number of indexes.
> +      unsigned NumOps = VL0->getNumOperands();
> +      for (unsigned j = 0; j < VL.size(); ++j) {
> +        if (NumOps != cast<Instruction>(VL[j])->getNumOperands()) {
> +          GEPVectorizable = false;
> +          break;
> +        }
> +      }
> +
> +      // We can't combine several GEPs into one vector if they operate on
> +      // different types.
> +      if (GEPVectorizable) {
> +        Type *Ty0 = cast<Instruction>(VL0)->getOperand(0)->getType();
> +        for (unsigned j = 0; j < VL.size(); ++j) {
> +          Type *CurTy = cast<Instruction>(VL[j])->getOperand(0)->getType();
> +          if (Ty0 != CurTy) {
> +            GEPVectorizable = false;
> +            break;
> +          }
> +        }
> +      }
> +
> +     [SNIP]
> +      if (!GEPVectorizable) {
> +        DEBUG(dbgs() << "SLP: not-vectorizable GEP.\n");
> +        newTreeEntry(VL, false);
> +        return;
> +      }
> 
> 
> Can you replace the GEPVectorizable construct with early exits?
> 
> Simply copy the following code into the early exits:
> 
> +        DEBUG(dbgs() << "SLP: not-vectorizable GEP.\n");
> +        newTreeEntry(VL, false);
> +        return;
Done.
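
For illustration only, the early-exit shape requested above can be sketched outside of LLVM as follows. This is not the code from the attached patch: `GEPModel` and `canVectorizeBundle` are hypothetical stand-ins, and each `return false` marks the spot where the real code would emit the debug message, call newTreeEntry(VL, false), and return.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a GEP instruction. It keeps only the two
// properties the checks inspect; it is not LLVM's GetElementPtrInst.
struct GEPModel {
  unsigned NumOperands; // pointer operand plus all indices
  int BaseTypeId;       // stands in for getOperand(0)->getType()
};

// Returns true if the bundle passes both checks. Every failing check
// exits immediately instead of setting a flag that is tested later.
bool canVectorizeBundle(const std::vector<GEPModel> &VL) {
  assert(!VL.empty() && "bundle must contain at least one GEP");
  const GEPModel &VL0 = VL[0];

  // GEPs with a different number of indexes cannot be combined.
  for (const GEPModel &G : VL)
    if (G.NumOperands != VL0.NumOperands)
      return false; // early exit 1

  // GEPs that operate on different types cannot be combined.
  for (const GEPModel &G : VL)
    if (G.BaseTypeId != VL0.BaseTypeId)
      return false; // early exit 2

  return true;
}
```

Compared with the flag-based version, the second loop no longer needs to be wrapped in a check of the first loop's result.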

> 
> Is the cost of an arbitrary vector gep with variable indices just the cost of an add? What if you have non-zero intermediate indices? Should we add the cost of multiplies?
> 
> This excerpt from the langref:
> <result> = getelementptr <ptr vector> ptrval, <vector index type> idx
> 
> makes me think we don’t support multiple indices. But that is probably a typo. We should check what kind of code we generate for multi-index GEPs and then decide what to do.
I looked at the code generated for vectorized GEPs with complicated nested indexes and decided to disallow vectorization for this case: the scalar code looks much simpler and cleaner than the vectorized code. Thus, I limited the optimization to the case where there is only one index and it is constant. In other cases there is no easy way to produce efficient vector code.
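
A minimal sketch of that restriction, written as standalone C++ rather than against the LLVM API. The names `GEPIndex`, `IndexedGEP`, and `allHaveSingleConstantIndex` are hypothetical and do not come from the patch; the sketch only models "exactly one index, and that index is a constant".

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of one GEP index operand.
struct GEPIndex {
  bool IsConstant; // true for a compile-time constant index
  long Value;      // meaningful only when IsConstant is true
};

// Hypothetical model of a GEP: its indices, without the pointer operand.
struct IndexedGEP {
  std::vector<GEPIndex> Indices;
};

// True if the GEP has exactly one index and that index is constant.
bool hasSingleConstantIndex(const IndexedGEP &G) {
  return G.Indices.size() == 1 && G.Indices[0].IsConstant;
}

// True only if every GEP in the bundle satisfies the restriction.
bool allHaveSingleConstantIndex(const std::vector<IndexedGEP> &VL) {
  assert(!VL.empty() && "bundle must contain at least one GEP");
  for (const IndexedGEP &G : VL)
    if (!hasSingleConstantIndex(G))
      return false; // nested or variable index: leave the bundle scalar
  return true;
}
```

Under this model, the two GEPs from the original example (constant offsets 10 and 12) form an acceptable bundle, while a bundle containing a variable index or a multi-index GEP is rejected.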

FWIW, getelementptr supports vector indices, but all elements of the vector must have the same value (we compute the splat value of the vector to find out which field we’re addressing, and this value is only defined for vectors whose elements are all equal). However, it looks like the last vector index could contain elements with different values (we don’t crash in this case because we’re not trying to go any further).
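
To make "splat value" concrete, here is a small standalone sketch. `findSplatValue` is a hypothetical helper over plain integers, not the LLVM routine; it only shows why the value is undefined when the lanes differ.

```cpp
#include <cassert>
#include <vector>

// If every lane holds the same value, stores it in Out and returns true.
// Otherwise returns false and leaves Out untouched: no splat value exists.
bool findSplatValue(const std::vector<long> &Lanes, long &Out) {
  assert(!Lanes.empty() && "an index vector has at least one lane");
  for (long V : Lanes)
    if (V != Lanes[0])
      return false;
  Out = Lanes[0];
  return true;
}
```

In this model an index vector such as <1, 1, 1, 1> yields the splat value 1, while <0, 1, 0, 1> yields none, which corresponds to the case where the addressed field cannot be determined.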

Thanks,
Michael
> 
> 
> +    case Instruction::GetElementPtr: {
> +      TargetTransformInfo::OperandValueKind Op1VK =
> +          TargetTransformInfo::OK_AnyValue;
> +      TargetTransformInfo::OperandValueKind Op2VK =
> +          TargetTransformInfo::OK_UniformConstantValue;
> +
> +      int ScalarCost =
> +          VecTy->getNumElements() *
> +          TTI->getArithmeticInstrCost(Instruction::Add, ScalarTy, Op1VK, Op2VK);
> +      int VecCost =
> +          TTI->getArithmeticInstrCost(Instruction::Add, VecTy, Op1VK, Op2VK);
> +
> +      return VecCost - ScalarCost;
> +    }
> 
> Thanks!
> 
>> On May 30, 2014, at 11:35 AM, Michael Zolotukhin <mzolotukhin at apple.com> wrote:
>> 
>> Hi,
>> 
>> Currently the SLP vectorizer can’t handle GEP expressions, and hence it fails to vectorize the following example:
>> struct bounds {
>>  int *start;
>>  int *end;
>> 
>>  void init(bounds *one) {
>>    start = one->start + 10;
>>    end   = one->end   + 12;
>>  };
>> };
>> 
>> Here the GEPs could easily be vectorized just like ordinary add operations. The attached patch implements this.
>> 
>> With this change performance on SPEC/CFP2006/447.dealII is improved by 2%, while performance of other tests remains unchanged.
>> 
>> Is it ok for trunk?
>> 
>> <slp-vectorize-gep.patch>
>> 
>> Thanks,
>> Michael
> 


