[PATCH] Calculate vectorization factor using the narrowest type instead of widest type
chandlerc at gmail.com
Wed May 27 02:14:40 PDT 2015
Just updating this revision as it seems a bit stalled. I think there are a few things going on here...
1. I think it would be good to first at least mostly address the problem of identifying places where we can hoist truncs to narrow the width at which we're doing operations within the vector. Without this, I think measuring the performance impact of this change will be hard -- we'll see wins that could be realized with a less register-pressure-intensive change.
2. I think this needs some more high-level tests -- we should actually add a loop test case that should vectorize differently as a consequence.
3. The fp64_to_uint32-cost-model.ll change seems odd -- either the update to the test or the comments in the test are wrong... Don't know which.
4. I think we would need numbers on non-x86 architectures in order to be confident that the register pressure increase wasn't problematic. This might mean using a temporary debug flag to enable this until we can hear back from other backend maintainers. I don't imagine any of the backends outside of ARM, x86, and PPC have enough autovectorization users to really care, so it shouldn't be too bad.
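To make points 1 and 4 a bit more concrete, here is a rough C sketch. None of it comes from the patch: the function names, the helper lanes_for, and the 128-bit register width are assumptions chosen purely for illustration. The first loop mixes 8-bit loads with 32-bit arithmetic, which is the kind of loop where picking the vectorization factor from the narrowest type rather than the widest changes the result. The last two functions show a case where the trunc can be hoisted above the add, so the work could be done at 8 bits regardless of how the factor is chosen.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: how many elements of a given width fit in one
 * vector register. Not an LLVM API, just the arithmetic behind the VF. */
unsigned lanes_for(unsigned register_bits, unsigned element_bits) {
    assert(element_bits != 0 && register_bits % element_bits == 0);
    return register_bits / element_bits;
}

/* Mixed-width loop: 8-bit loads, 32-bit add and store.
 * With assumed 128-bit registers, the widest type (32 bits) gives 4 lanes,
 * while the narrowest type (8 bits) gives 16 lanes. */
void widen_add(uint32_t *restrict out, const uint8_t *restrict in, size_t n) {
    for (size_t i = 0; i < n; ++i)
        out[i] = (uint32_t)in[i] + 1u;
}

/* Wide form: add at 32 bits, then truncate the result to 8 bits. */
uint8_t add_then_trunc(uint32_t x, uint32_t y) {
    return (uint8_t)(x + y);
}

/* Narrow form: truncate the operands first, then add. The low 8 bits of a
 * sum depend only on the low 8 bits of the operands, so the result matches
 * add_then_trunc for all inputs. */
uint8_t trunc_then_add(uint32_t x, uint32_t y) {
    return (uint8_t)((uint8_t)x + (uint8_t)y);
}
```

Under those assumptions, vectorizing widen_add at 16 lanes means each logical vector of 32-bit values occupies four 128-bit registers (16 x 32 = 512 bits), which is where the register pressure worry in point 4 comes from. The add_then_trunc / trunc_then_add pair is the sort of pattern point 1 is about: if the narrowing is done first, the win shows up without needing the larger factor.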