[PATCH 2/3] ARM cost model: Address computation in vector mem ops not free

Arnold aschwaighofer at apple.com
Thu Feb 7 06:31:36 PST 2013


On Feb 7, 2013, at 6:22 AM, Renato Golin <renato.golin at linaro.org> wrote:

> On 7 February 2013 05:55, Nadav Rotem <nrotem at apple.com> wrote:
> I am not sure that it's worth modeling this because it only affects the latency and not the throughput of the machine.
> 
> It seems like penalizing the insertion into a D-subregister is getting out of hand. This is likely to occur when dealing with relative numbers (like instruction costs), so I'm not particularly excited about this patch, either. I'm also not unhappy, so if there is a tangible benefit, by all means...

I agree with you, it is unfortunate. However, I am trying to model an idiosyncrasy of the processor that has a large impact on performance. On Swift it is very expensive to load into an S register or a D sub-lane: two such instructions are not pipelined but serialized.
 
  // Penalize inserting into a D-subregister.
  if (ST->isSwift() &&
      Opcode == Instruction::InsertElement &&
      ValTy->isVectorTy() &&
      ValTy->getScalarSizeInBits() <= 32)
    return 3;

> 
> Some specific questions:
> 
> +      if (Stride > 0)
> +        return Cost;
> 
> Can the stride ever be zero?


Yes. Stride holds the return value of the isConsecutivePtr method, which is zero when the stride is unknown or the access is not consecutive:

  /// 0 - Stride is unknown or non consecutive.
  /// 1 - Address is consecutive.
  /// -1 - Address is consecutive, and decreasing.
  int isConsecutivePtr(Value *Ptr);
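
To spell out the three cases, here is a minimal standalone sketch. describeStride is a hypothetical helper written only for illustration; it is not part of LLVM or of the patch. It just maps the documented return values to their meaning, which shows why a check like "Stride > 0" excludes both the unknown case and the decreasing case.

```cpp
#include <string>

// Hypothetical helper, not part of LLVM: names the three values that
// isConsecutivePtr is documented to return.
std::string describeStride(int Stride) {
  if (Stride == 0)
    return "unknown or non-consecutive";
  if (Stride > 0)
    return "consecutive, increasing";
  return "consecutive, decreasing";
}
```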

> 
> -    return 2;
> +    return 3;
> 
> This seems arbitrary (I know it's not but), would be good to have them exposed as function calls, for instance, getAddressComputationCost() + whatever.
> 

This code comes from:

unsigned ARMTTI::getVectorInstrCost(unsigned Opcode, Type *ValTy,
                                    unsigned Index) const {
  // Penalize inserting into a D-subregister.
  if (ST->isSwift() &&
      Opcode == Instruction::InsertElement &&
      ValTy->isVectorTy() &&
      ValTy->getScalarSizeInBits() <= 32)
    return 3;

  return TargetTransformInfo::getVectorInstrCost(Opcode, ValTy, Index);
}

I don't think we need a function call for the value 3 here. It is a value just like any other that is returned by TTI.
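
For anyone who wants to play with the condition outside of LLVM, the check can be modeled roughly as below. This is only a sketch: Opcode, VecType, kDefaultCost and vectorInstrCost are stand-ins invented for this example, and the fallback value of 1 is an assumption, not necessarily what TargetTransformInfo::getVectorInstrCost returns.

```cpp
// Stand-ins for illustration only; these are not the LLVM classes.
enum class Opcode { InsertElement, ExtractElement };

struct VecType {
  bool IsVector;             // models ValTy->isVectorTy()
  unsigned ScalarSizeInBits; // models ValTy->getScalarSizeInBits()
};

// Assumed fallback cost; the real base-class value may differ.
constexpr unsigned kDefaultCost = 1;

unsigned vectorInstrCost(bool IsSwift, Opcode Op, const VecType &Ty) {
  // Penalize inserting into a D-subregister, mirroring the check above.
  if (IsSwift && Op == Opcode::InsertElement && Ty.IsVector &&
      Ty.ScalarSizeInBits <= 32)
    return 3;

  return kDefaultCost;
}
```

In this model only a Swift insert of an element of 32 bits or fewer gets the penalty; wider elements, other opcodes, and other subtargets all take the fallback path.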

> cheers,
> --renato