[LLVMdev] ARM vectorizer cost model

Nadav Rotem nrotem at apple.com
Wed Jan 9 09:10:41 PST 2013

Hi Renato, 

> I'm interested in knowing how you'll work up the ARM cost model and how easy it'd be to split the work.

Yes, I am starting to work on the ARM cost model, and I would appreciate any help: advice, performance measurements, patches, etc.

I tune the cost model by running the cost model analysis pass and comparing its output to the code that llc generates.

For example:
	"opt -cost-model -analyze dumper.ll -mtriple=thumbv7 -mcpu=cortex-a15" 

I also run the vectorizer with -debug-only=loop-vectorize because it dumps the costs of all of the instructions with different vectorization factors, and it also detects the different kinds of shuffles that we support. 
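A rough sketch of how per-factor costs like the ones in that debug output can be compared: divide the estimated loop cost at each vectorization factor by the factor, and keep the cheapest per-iteration result. This is a standalone toy, not the vectorizer's code; ToyVFCost and pickVectorizationFactor are hypothetical names, and the costs passed in would come from the dump.

```cpp
#include <vector>

// Hypothetical record: the estimated cost of one loop body at a given
// vectorization factor (VF = 1 means the scalar loop).
struct ToyVFCost {
  unsigned VF;
  unsigned LoopCost;
};

// Pick the VF with the lowest cost per original scalar iteration.
// Ties keep the earlier candidate; an empty list falls back to VF = 1.
unsigned pickVectorizationFactor(const std::vector<ToyVFCost> &Candidates) {
  unsigned BestVF = 1;
  double BestPerIteration = -1.0; // negative means "nothing chosen yet"
  for (const ToyVFCost &C : Candidates) {
    if (C.VF == 0)
      continue; // ignore malformed entries
    double PerIteration = static_cast<double>(C.LoopCost) / C.VF;
    if (BestPerIteration < 0.0 || PerIteration < BestPerIteration) {
      BestPerIteration = PerIteration;
      BestVF = C.VF;
    }
  }
  return BestVF;
}
```

With made-up costs of 10, 12, 16 and 40 for VF = 1, 2, 4 and 8, the per-iteration costs are 10, 6, 4 and 5, so the sketch picks VF = 4.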

> As far as I can see, LoopVectorizationCostModel is the class that does all the work, with assistance from the target transform info.

The LoopVectorizationCostModel only predicts which IR will be generated when vectorizing to a specific vector width. It uses TTI to get the cost of each IR instruction. Chandler recently refactored TTI (thanks!) and now TTI is an analysis group. The BasicTTI attempts to handle all of the target-independent logic. It uses the TargetLowering interface to check whether the types are legal and how many times large vectors need to be split. Different targets need to implement the cases that the BasicTTI does not catch. One example is the cost of zext <8 x i8> to <8 x i32>, which is custom lowered on some targets.
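The layering described above can be illustrated with a small standalone model: a generic layer charges one unit per legal-width piece of a vector, and a target layer overrides specific cases first. Everything here is an assumption made for illustration (ToyTarget, numLegalParts, toyCost, the string keys, and any cost values stored in the override table); it does not use the real TTI or TargetLowering interfaces.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for the legality information a target provides.
struct ToyTarget {
  unsigned MaxVectorBits; // widest legal vector, e.g. 128 for a NEON Q register
  // Target-specific overrides keyed by (opcode, type description).
  // Any cost stored here is an illustrative placeholder, not a measurement.
  std::map<std::pair<std::string, std::string>, unsigned> Overrides;
};

// Halve the vector until it fits a legal register and count the pieces.
unsigned numLegalParts(const ToyTarget &T, unsigned NumElts, unsigned EltBits) {
  unsigned Bits = NumElts * EltBits;
  unsigned Parts = 1;
  while (Bits > T.MaxVectorBits && Bits > EltBits) {
    Bits /= 2;
    Parts *= 2;
  }
  return Parts;
}

// Target layer first, generic fallback second: one cost unit per legal piece.
unsigned toyCost(const ToyTarget &T, const std::string &Opcode,
                 const std::string &Types, unsigned NumElts,
                 unsigned EltBits) {
  auto It = T.Overrides.find(std::make_pair(Opcode, Types));
  if (It != T.Overrides.end())
    return It->second;
  return numLegalParts(T, NumElts, EltBits);
}
```

On a 128-bit target, this model splits <8 x i32> into two pieces and <16 x i32> into four. A custom-lowered cast such as the zext above would be registered in Overrides instead of going through the generic path.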

> Do you think that updating ARMTTI would be the best course of action now, and inspect the differences in the CostModel later?

We should update TTI and inspect the cost model as we go.

> I also haven't seen anything related to context switches and pipeline decisions on the cost model, another issue that will be quite different between targets and sub-targets (especially in ARM world). But that can wait…

I am not aware of anything that we can do in regard to context switches. Do you mean the cost of moving data from GPR to NEON? It's a good point. We need to increase the cost of vector insert/extract. It should be easy to model, and we already have all of the hooks.
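One possible shape for such a hook, as a standalone sketch rather than the actual ARMTTI code: charge an extra penalty when an insert or extract moves an integer scalar, since integer scalars live in GPRs while the vector lives in a NEON register. ToyOp, ToySubtarget, toyVectorInstrCost and the penalty field are hypothetical names, and the penalty value would have to come from measurements.

```cpp
// Hypothetical opcode classification for this sketch only.
enum class ToyOp { InsertElement, ExtractElement, Other };

// Hypothetical per-subtarget tuning knob; a real value needs measurement.
struct ToySubtarget {
  unsigned CrossDomainPenalty; // extra cost of a GPR <-> NEON transfer
};

// Base cost of 1 for every operation, plus the transfer penalty when a lane
// is moved to or from an integer scalar (which sits in a GPR).
unsigned toyVectorInstrCost(const ToySubtarget &ST, ToyOp Op,
                            bool ScalarIsInteger) {
  const unsigned BaseCost = 1;
  bool MovesLane =
      Op == ToyOp::InsertElement || Op == ToyOp::ExtractElement;
  if (MovesLane && ScalarIsInteger)
    return BaseCost + ST.CrossDomainPenalty;
  return BaseCost;
}
```

Floating-point scalars are left at the base cost in this model on the assumption that they already sit in the VFP/NEON register file.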

We can use the Subtarget when we implement the hooks. Here is an example from ARMTTI:

  unsigned getNumberOfRegisters(bool Vector) const {
    if (Vector) {
      if (ST->hasNEON())
        return 16;
      return 0;
    }

    if (ST->isThumb1Only())
      return 8;
    return 16;
  }

  unsigned getMaximumUnrollFactor() const {
    // These are out of order CPUs:
    if (ST->isCortexA15() || ST->isSwift())
      return 2;
    return 1;
  }



> cheers,
> --renato

