[PATCH] D21251: [TTI] The cost model should not assume illegal vector casts get completely scalarized

Michael Kuperstein via llvm-commits llvm-commits at lists.llvm.org
Thu Jun 23 14:48:41 PDT 2016


mkuper added a comment.

In http://reviews.llvm.org/D21251#465930, @delena wrote:

> Could you, please, add a test for "sext <16 x i32> to <16 x i64>" for Skylake-avx512 and see what happens?
>  The cost should be 2. I know that this case did not work correctly and in many cases prevented vectorization to 16.


The cost is now 3 (instead of the old 48). It's not 2 because of the getVectorSplitCost() fudge factor.
We'll probably need to tune this in the future; as I said before, this is just a first approximation. It's possible that 0 is the right value for the split cost most of the time.

If you don't mind, I'll add the test as a separate patch: we don't want only this specific test; we need tests for a bunch of sexts/zexts, as ARM has.


================
Comment at: test/Analysis/CostModel/ARM/cast.ll:267
@@ -266,3 +266,3 @@
   %r117 = fptosi <4 x float> undef to <4 x i32>
-  ; CHECK:  Found an estimated cost of 64 for instruction:   %r118 = fptoui <4 x float> undef to <4 x i64>
+  ; CHECK:  Found an estimated cost of 65 for instruction:   %r118 = fptoui <4 x float> undef to <4 x i64>
   %r118 = fptoui <4 x float> undef to <4 x i64>
----------------
mkuper wrote:
> arsenm wrote:
> > LGTM, but it looks to me like this should be adding 0, so not increasing by 1?
> Generally, with the new formula, it makes sense to have costs like 2 * 32 + 1, so I didn't pay too much attention to those little changes (what concerned me more were the big drops, e.g. 64 -> 11 on line 326). But you're right, I need to verify that this is really reasonable and not some unexpected artifact.
> Thanks!
So it mostly makes sense.
This used to be evaluated as fully scalarizing, with a per-element scalarization cost of 6 and a cast cost of 10. So, (6 + 10) * 4 = 64.
Now we evaluate it as 1 + 2 * (cost(fptoui <2 x float> to <2 x i64>)). But the 2-wide cast is still considered fully scalarizing (even though the types are now legal), so we get 1 + 2 * (2 * (10 + 6)) = 65.


http://reviews.llvm.org/D21251