[PATCH] D21251: [TTI] The cost model should not assume illegal vector casts get completely scalarized
Michael Kuperstein via llvm-commits
llvm-commits at lists.llvm.org
Thu Jun 23 14:48:41 PDT 2016
mkuper added a comment.
In http://reviews.llvm.org/D21251#465930, @delena wrote:
> Could you, please, add a test for "sext <16 x i32 > to <16 x i64 >" for Skylake-avx512 and see what happens?
> The cost should be 2. I know that this case did not work correctly and in many cases prevented vectorization to 16.
The cost is now 3 (instead of the old 48). It's not 2 because of the getVectorSplitCost() fudge factor.
We'll probably need to tune this in the future; as I said before, this is just a first approximation. It's possible that 0 is the right value for the fudge factor most of the time.
If you don't mind, I'll add the test as a separate patch. We don't want only this specific test; we need tests for a whole set of sexts/zexts, like ARM has.
================
Comment at: test/Analysis/CostModel/ARM/cast.ll:267
@@ -266,3 +266,3 @@
%r117 = fptosi <4 x float> undef to <4 x i32>
- ; CHECK: Found an estimated cost of 64 for instruction: %r118 = fptoui <4 x float> undef to <4 x i64>
+ ; CHECK: Found an estimated cost of 65 for instruction: %r118 = fptoui <4 x float> undef to <4 x i64>
%r118 = fptoui <4 x float> undef to <4 x i64>
----------------
mkuper wrote:
> arsenm wrote:
> > LGTM, but it looks to me like this should be adding 0, so not increasing by 1?
> Generally, with the new formula, it makes sense to have costs like 2 * 32 + 1, so I didn't pay too much attention to those little changes (what concerned me more were the big drops e.g. 64 -> 11 on line 326). But you're right, I need to verify that this is really reasonable and not some unexpected artifact.
> Thanks!
So it mostly makes sense.
This used to be evaluated as fully scalarizing, with a per-element scalarization cost of 6 and a per-element cast cost of 10. So, (6 + 10) * 4 = 64.
Now we evaluate it as 1 + 2 * (cost(fptoui <2 x float> to <2 x i64>)). But the 2-wide cast is still considered fully scalarizing (even though the types are now legal), so we get 1 + 2 * (2 * (10 + 6)) = 65.
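For anyone who wants to check the arithmetic, here is a minimal sketch of the two formulas in Python. This is not the TTI code. The function names are made up for illustration, the per-element costs (6 and 10) and the split overhead of 1 are just the numbers quoted in this thread, and the half-width cost of 1 used for the AVX-512 sext case is an assumption inferred from the "should be 2" expectation above.

```python
# Illustrative only: hypothetical helpers, not LLVM's TTI implementation.

def scalarized_cast_cost(num_elts, per_elt_scalarization=6, scalar_cast=10):
    """Old model: each element is scalarized and cast on its own."""
    return num_elts * (per_elt_scalarization + scalar_cast)


def split_cast_cost(half_cost, split_overhead=1):
    """New model for an illegal type: split overhead plus two half-width casts."""
    return split_overhead + 2 * half_cost


# fptoui <4 x float> to <4 x i64> on ARM
old = scalarized_cast_cost(4)      # (6 + 10) * 4
half = scalarized_cast_cost(2)     # the 2-wide cast is still fully scalarized
new = split_cast_cost(half)        # 1 + 2 * (2 * (10 + 6))
print(old, new)                    # prints "64 65"

# sext <16 x i32> to <16 x i64>, assuming each 8-wide half costs 1
print(split_cast_cost(1))          # prints "3"
```

With a split overhead of 0 the second case would give 2, which is the value asked for earlier in the thread.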
http://reviews.llvm.org/D21251