[PATCH] D21251: [TTI] The cost model should not assume illegal vector casts get completely scalarized
Michael Kuperstein via llvm-commits
llvm-commits at lists.llvm.org
Tue Jun 14 18:34:38 PDT 2016
mkuper added inline comments.
================
Comment at: test/Analysis/CostModel/AMDGPU/addrspacecast.ll:39
@@ -38,3 +38,3 @@
; CHECK: 'addrspacecast_local_to_flat_v32'
-; CHECK: estimated cost of 32 for {{.*}} addrspacecast <32 x i8 addrspace(3)*> %ptr to <32 x i8 addrspace(4)*>
+; CHECK: estimated cost of 47 for {{.*}} addrspacecast <32 x i8 addrspace(3)*> %ptr to <32 x i8 addrspace(4)*>
define <32 x i8 addrspace(4)*> @addrspacecast_local_to_flat_v32(<32 x i8 addrspace(3)*> %ptr) #0 {
----------------
arsenm wrote:
> Pretty much everything should be scalarized. The vector inserts and extracts are supposed to be free (and the cost is reported as 0 for those), so I think adding the one there is inconsistent; it should check the extract/insert cost.
Until now, we assumed scalarization, but I think that is actually the rare case in practice. If a platform cares about vectors, I'd expect it to support most vector operations at some vector width, so it usually won't scalarize. And if we assume partial splitting instead of scalarization, using the insert/extract costs would be wrong, regardless of how imprecise "1" is as the cost of a split (it definitely is imprecise, but it's what the generic getTypeLegalizationCost() uses).
We could trace the entire legalization chain, check whether the end result is a vector or a scalar, and then use either 1 or getScalarizationOverhead() accordingly, but I'm not a huge fan of that.
(Is full scalarization common on AMDGPU, or is this a corner case? If it's common, perhaps we should specialize this for the AMDGPU TTI.)
http://reviews.llvm.org/D21251