[PATCH] Calculate vectorization factor using the narrowest type instead of widest type

Sat Apr 11 21:51:17 PDT 2015

> On Apr 11, 2015, at 6:41 PM, hfinkel at anl.gov wrote:
> 
> [+Arnold, Nadav, Chandler]

The loop vectorizer chooses a vectorization factor using a cost model, and we stop the search at the widest type to limit cross-register shuffles. The performance impact of cross-register shuffles is very difficult to model: it depends both on the quality of the shuffles that we generate and on the effect that using multiple registers has on register pressure.

Using the narrowest element type would increase in-register utilization but decrease the utilization of execution units due to unrolling, so I am not sure what we are gaining here (except for extra shuffles).
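
To make the trade-off concrete, here is a minimal sketch of the arithmetic, assuming a loop that mixes i8 and i32 values on a target with 128-bit vector registers. The helper names (vfForType, regsPerValue) are hypothetical and are not part of the LLVM API; they only illustrate how the choice of element type changes the vectorization factor and the number of registers one vector value occupies.

```cpp
// Hypothetical helpers for illustration only; these are not LLVM functions.

// Largest number of ElemBits-wide elements that fit in one vector register.
constexpr unsigned vfForType(unsigned RegBits, unsigned ElemBits) {
  return RegBits / ElemBits;
}

// Number of physical vector registers needed to hold one vector value with
// VF elements of ElemBits each (rounded up to a whole register).
constexpr unsigned regsPerValue(unsigned RegBits, unsigned ElemBits,
                                unsigned VF) {
  return (VF * ElemBits + RegBits - 1) / RegBits;
}

// Assumed example: 128-bit registers, loop uses i8 and i32 values.
constexpr unsigned RegBits = 128;
constexpr unsigned VFWidest = vfForType(RegBits, 32);   // VF from the widest type
constexpr unsigned VFNarrowest = vfForType(RegBits, 8); // VF from the narrowest type
```

With the widest type the VF is 4, so an i32 vector fills one register and an i8 vector uses only a quarter of one. With the narrowest type the VF is 16, so an i8 vector fills one register but each i32 vector is split across regsPerValue(128, 32, 16) = 4 registers, which is where the cross-register shuffles for extends/truncates come from.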

> 
> If I understand this correctly, this will cause us to potentially generate wider vectors than we have underlying vector registers, and I think that, generically, this makes sense. Now that our X86 shuffle handling is sane, the splitting of wide vectors, and shuffling that you get from vector extends/truncates is hopefully not too bad. Other opinions?
> 
> Did you see any performance changes on the test suite?
> 
> We might need to update the register-pressure heuristic (LoopVectorizationCostModel::calculateRegisterUsage()) to understand that very-wide vectors use multiple vector registers.

I agree. Additionally, I think that we should rewrite this code. I made the mistake of calculating register pressure by scanning the code forward (start to end) instead of backward, which made the code unnecessarily complicated; liveness should be calculated by scanning the basic block backward.
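
As a rough illustration of the backward scan, here is a minimal sketch over a toy instruction representation. The Inst struct and maxLiveValues are hypothetical and much simpler than what calculateRegisterUsage() works with; the sketch counts live values between instructions and does not weight them by how many registers each one needs.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical, simplified instruction: defines at most one value and reads
// zero or more values. Values are identified by small integers.
struct Inst {
  int Def;               // id of the value defined, or -1 if none
  std::vector<int> Uses; // ids of the values read
};

// Walk the block from the last instruction to the first, maintaining the set
// of live values, and return the largest live-set size seen.
// Live starts out as the set of values live at the end of the block.
std::size_t maxLiveValues(const std::vector<Inst> &Block, std::set<int> Live) {
  std::size_t MaxLive = Live.size();
  for (auto It = Block.rbegin(); It != Block.rend(); ++It) {
    // Above its definition, the defined value is not live.
    if (It->Def >= 0)
      Live.erase(It->Def);
    // Every operand must be live just above the instruction that reads it.
    Live.insert(It->Uses.begin(), It->Uses.end());
    MaxLive = std::max(MaxLive, Live.size());
  }
  return MaxLive;
}
```

For a block like "v0 = load; v1 = load; v2 = add v0, v1; store v2" with nothing live at the end, the scan sees {v2}, then {v0, v1}, then {v0}, then the empty set, so the maximum is 2. Scanning backward means a value's last use is discovered before its definition, so no separate pass is needed to find where live ranges end. To account for very wide vectors, each live value could be weighted by the number of registers it occupies instead of counting as one.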

> 
> 
> http://reviews.llvm.org/D8943