[PATCH] D21251: [TTI] The cost model should not assume illegal vector casts get completely scalarized

Matt Arsenault via llvm-commits llvm-commits at lists.llvm.org
Fri Jun 17 13:53:47 PDT 2016


arsenm added inline comments.

================
Comment at: test/Analysis/CostModel/AMDGPU/addrspacecast.ll:39
@@ -38,3 +38,3 @@
 ; CHECK: 'addrspacecast_local_to_flat_v32'
-; CHECK: estimated cost of 32 for {{.*}} addrspacecast <32 x i8 addrspace(3)*> %ptr to <32 x i8 addrspace(4)*>
+; CHECK: estimated cost of 47 for {{.*}} addrspacecast <32 x i8 addrspace(3)*> %ptr to <32 x i8 addrspace(4)*>
 define <32 x i8 addrspace(4)*> @addrspacecast_local_to_flat_v32(<32 x i8 addrspace(3)*> %ptr) #0 {
----------------
mkuper wrote:
> arsenm wrote:
> > Pretty much everything should be scalarized. The vector inserts and extracts are supposed to be free (the cost is reported as 0 for them), so adding the one here is inconsistent; this should use the extract/insert cost instead.
> Until now, we assumed scalarization, but I think this is actually the rare case in practice. If the platform cares about vectors, I'd expect it to support most vector operations at least at some vector width, so it usually won't scalarize. And if we assume partial splitting instead of scalarization, using the insert/extract costs will be the wrong thing, regardless of how imprecise "1" is for splitting (and it's definitely imprecise, but it's what the generic getTypeLegalizationCost() uses).
> 
> We could trace the entire legalization chain, see whether the end result is a vector or a scalar, and then use either 1 or getScalarizationOverhead() based on that, but I'm not a huge fan of that.
> (Is full scalarization common on AMDGPU, or is this a corner case? If it's common, perhaps we should specialize this for the AMDGPU TTI.)
There are no vector operations. Vectors are only used for loading and storing; every operation is scalar.
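
For illustration only, here is a rough standalone sketch of the trade-off described above: if legalization would merely split the vector into smaller legal vectors, charge roughly 1 per split step; if the target has no vector operations and the cast is fully scalarized, charge the per-element cast plus the insert/extract overhead. This is not LLVM code -- the helpers below are hypothetical stand-ins for what getTypeLegalizationCost() and getScalarizationOverhead() account for, and the numbers are placeholders, not the values the real cost model produces.

// Standalone sketch of the cost heuristic discussed in this thread.
// All types, helpers, and constants are hypothetical stand-ins for the
// real TTI/legalization machinery; costs are illustrative only.
#include <cstdio>

struct VecTy {
  unsigned NumElts; // number of vector elements
  unsigned EltBits; // element width in bits
};

// Hypothetical: widest legal vector register width for the target.
constexpr unsigned LegalVectorBits = 128;

// Roughly models "tracing the legalization chain": count how many times
// the vector must be halved before each piece fits in a legal register.
unsigned numSplitSteps(VecTy Ty) {
  unsigned Steps = 0;
  while (Ty.NumElts * Ty.EltBits > LegalVectorBits && Ty.NumElts > 1) {
    Ty.NumElts /= 2;
    ++Steps;
  }
  return Steps;
}

// Hypothetical per-element insert+extract overhead, i.e. the kind of cost
// a getScalarizationOverhead()-style helper would report.
unsigned scalarizationOverhead(VecTy Ty, unsigned InsertExtractCost) {
  return Ty.NumElts * InsertExtractCost;
}

unsigned castCost(VecTy Ty, bool HasVectorOps, unsigned InsertExtractCost) {
  if (HasVectorOps) {
    // Legalization ends in smaller legal vectors: assume splitting,
    // charging ~1 per split plus 1 for the final legal-width cast.
    return numSplitSteps(Ty) + 1;
  }
  // No vector operations (the AMDGPU situation described above): the cast
  // is fully scalarized, so pay one cast per element plus the
  // insert/extract overhead.
  return Ty.NumElts + scalarizationOverhead(Ty, InsertExtractCost);
}

int main() {
  VecTy V{32, 32};
  std::printf("split-based cost: %u\n", castCost(V, /*HasVectorOps=*/true, 1));
  std::printf("scalarized cost:  %u\n", castCost(V, /*HasVectorOps=*/false, 1));
  return 0;
}

The point of the sketch is only the branch in castCost(): the right answer depends on whether legalization bottoms out in a vector or a scalar, which is why a single fixed assumption (always scalarize, or always split) gives inconsistent numbers like the ones in the test above.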


http://reviews.llvm.org/D21251
