[PATCH] D8943: Calculate vectorization factor using the narrowest type instead of widest type

Mon Sep 21 17:01:10 PDT 2015

congh added a comment.

I am so sorry for being away from this patch for so long time!

In http://reviews.llvm.org/D8943#179239, @chandlerc wrote:

> Just updating this revision as it seems a bit stalled. I think there are a few things going on here...
>
> 1. I think it would be good to first at least mostly address the problem of identifying places where we can hoist truncs to narrow the width ad which we're doing operations within the vector. Without this, I think measuring the performance impact of this change will be hard -- we'll see wins that could be realized with a less register pressure intensive change.

In the updated patch I estimated the register usage of larger VFs to ensure that it isn't too large. This should help to reduce register pressure.

> 

> 

> 2. I think this needs some more high-level tests -- we should actually add a loop test case that should vectorize differently as a consequence.

I added a test case to this patch for larger VF. However, I found the cost estimation of many operations when VF is large is inaccurate at least for X86, and so that the maximum VF could not be applied due to the unreasonable large cost. I will fix those issues later and add more test cases.

> 

> 

> 3. The fp64_to_uint32-cost-model.ll change seems odd -- either the update to the test or the comments in the test are wrong... Don't know which.

As the change is protected by an option, many updates to test cases are not necessary now.

> 

> 

> 4. I think we would need numbers on non-x86 architectures in order to be confident that the register pressure increase wasn't problematic. This might mean using a temporary debug flag to enable this until we can hear back from other backend maintainers. I don't imagine any of the backends outside of ARM, x86, and PPC have enough autovectorization users to really care, so it shouldn't be too bad.

Following you suggestion, I have added an option in this patch. The register pressure problem is also alleviated as described above.

http://reviews.llvm.org/D8943