[llvm] 184b383 - Add v16f64 value type

Thu May 14 14:28:11 PDT 2020

Author: Stanislav Mekhanoshin
Date: 2020-05-14T14:28:00-07:00
New Revision: 184b38345746a8f2b8ff5608fdd115991fa2c0fe

URL: https://github.com/llvm/llvm-project/commit/184b38345746a8f2b8ff5608fdd115991fa2c0fe
DIFF: https://github.com/llvm/llvm-project/commit/184b38345746a8f2b8ff5608fdd115991fa2c0fe.diff

LOG: Add v16f64 value type

We need to use it to handle <16 x double> indirect indexes
in the AMDGPU BE.

The only visible change from adding it is in ARM cost model.
To me it looks reasonable. With doubling a vector size it
quadruples the cost up to the size 8 and then it did only
double it. Now it also quadruples, which seems a logical
progression to me.

Actual AMDGPU code is to follow, this is a common part, plus
load/store legalization in the AMDGPU BE not to break what
works now.

Differential Revision: https://reviews.llvm.org/D79952

Added: 
    

Modified: 
    llvm/include/llvm/CodeGen/ValueTypes.td
    llvm/include/llvm/IR/Intrinsics.td
    llvm/include/llvm/Support/MachineValueType.h
    llvm/lib/CodeGen/ValueTypes.cpp
    llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
    llvm/test/Analysis/CostModel/ARM/cast.ll
    llvm/utils/TableGen/CodeGenTarget.cpp

Removed: 
    


################################################################################
diff  --git a/llvm/include/llvm/CodeGen/ValueTypes.td b/llvm/include/llvm/CodeGen/ValueTypes.td
index 16df565bc8b8..2ec0ed7ce3bd 100644

--- a/llvm/include/llvm/CodeGen/ValueTypes.td
+++ b/llvm/include/llvm/CodeGen/ValueTypes.td
@@ -112,60 +112,61 @@ def v1f64    : ValueType<64, 84>;     //    1 x f64 vector value
 def v2f64    : ValueType<128, 85>;    //    2 x f64 vector value
 def v4f64    : ValueType<256, 86>;    //    4 x f64 vector value
 def v8f64    : ValueType<512, 87>;    //    8 x f64 vector value
-
-def nxv1i1  : ValueType<1,   88>;  // n x  1 x i1  vector value
-def nxv2i1  : ValueType<2,   89>;  // n x  2 x i1  vector value
-def nxv4i1  : ValueType<4,   90>;  // n x  4 x i1  vector value
-def nxv8i1  : ValueType<8,   91>;  // n x  8 x i1  vector value
-def nxv16i1 : ValueType<16,  92>;  // n x 16 x i1  vector value
-def nxv32i1 : ValueType<32,  93>;  // n x 32 x i1  vector value
-
-def nxv1i8  : ValueType<8,   94>;  // n x  1 x i8  vector value
-def nxv2i8  : ValueType<16,  95>;  // n x  2 x i8  vector value
-def nxv4i8  : ValueType<32,  96>;  // n x  4 x i8  vector value
-def nxv8i8  : ValueType<64,  97>;  // n x  8 x i8  vector value
-def nxv16i8 : ValueType<128, 98>;  // n x 16 x i8  vector value
-def nxv32i8 : ValueType<256, 99>;  // n x 32 x i8  vector value
-
-def nxv1i16 : ValueType<16,  100>; // n x  1 x i16 vector value
-def nxv2i16 : ValueType<32,  101>; // n x  2 x i16 vector value
-def nxv4i16 : ValueType<64,  102>; // n x  4 x i16 vector value
-def nxv8i16 : ValueType<128, 103>; // n x  8 x i16 vector value
-def nxv16i16: ValueType<256, 104>; // n x 16 x i16 vector value
-def nxv32i16: ValueType<512, 105>; // n x 32 x i16 vector value
-
-def nxv1i32 : ValueType<32,  106>; // n x  1 x i32 vector value
-def nxv2i32 : ValueType<64,  107>; // n x  2 x i32 vector value
-def nxv4i32 : ValueType<128, 108>; // n x  4 x i32 vector value
-def nxv8i32 : ValueType<256, 109>; // n x  8 x i32 vector value
-def nxv16i32: ValueType<512, 110>; // n x 16 x i32 vector value
-def nxv32i32: ValueType<1024,111>; // n x 32 x i32 vector value
-
-def nxv1i64 : ValueType<64,  112>; // n x  1 x i64 vector value
-def nxv2i64 : ValueType<128, 113>; // n x  2 x i64 vector value
-def nxv4i64 : ValueType<256, 114>; // n x  4 x i64 vector value
-def nxv8i64 : ValueType<512, 115>; // n x  8 x i64 vector value
-def nxv16i64: ValueType<1024,116>; // n x 16 x i64 vector value
-def nxv32i64: ValueType<2048,117>; // n x 32 x i64 vector value
-
-def nxv2f16  : ValueType<32 , 118>; // n x  2 x f16 vector value
-def nxv4f16  : ValueType<64 , 119>; // n x  4 x f16 vector value
-def nxv8f16  : ValueType<128, 120>; // n x  8 x f16 vector value
-def nxv1f32  : ValueType<32 , 121>; // n x  1 x f32 vector value
-def nxv2f32  : ValueType<64 , 122>; // n x  2 x f32 vector value
-def nxv4f32  : ValueType<128, 123>; // n x  4 x f32 vector value
-def nxv8f32  : ValueType<256, 124>; // n x  8 x f32 vector value
-def nxv16f32 : ValueType<512, 125>; // n x 16 x f32 vector value
-def nxv1f64  : ValueType<64,  126>; // n x  1 x f64 vector value
-def nxv2f64  : ValueType<128, 127>; // n x  2 x f64 vector value
-def nxv4f64  : ValueType<256, 128>; // n x  4 x f64 vector value
-def nxv8f64  : ValueType<512, 129>; // n x  8 x f64 vector value
-
-def x86mmx : ValueType<64 , 130>;   // X86 MMX value
-def FlagVT : ValueType<0  , 131>;   // Pre-RA sched glue
-def isVoid : ValueType<0  , 132>;   // Produces no value
-def untyped: ValueType<8  , 133>;   // Produces an untyped value
-def exnref: ValueType<0, 134>;      // WebAssembly's exnref type
+def v16f64   : ValueType<1024, 88>;   //   16 x f64 vector value
+
+def nxv1i1  : ValueType<1,   89>;  // n x  1 x i1  vector value
+def nxv2i1  : ValueType<2,   90>;  // n x  2 x i1  vector value
+def nxv4i1  : ValueType<4,   91>;  // n x  4 x i1  vector value
+def nxv8i1  : ValueType<8,   92>;  // n x  8 x i1  vector value
+def nxv16i1 : ValueType<16,  93>;  // n x 16 x i1  vector value
+def nxv32i1 : ValueType<32,  94>;  // n x 32 x i1  vector value
+
+def nxv1i8  : ValueType<8,    95>;  // n x  1 x i8  vector value
+def nxv2i8  : ValueType<16,   96>;  // n x  2 x i8  vector value
+def nxv4i8  : ValueType<32,   97>;  // n x  4 x i8  vector value
+def nxv8i8  : ValueType<64,   98>;  // n x  8 x i8  vector value
+def nxv16i8 : ValueType<128,  99>;  // n x 16 x i8  vector value
+def nxv32i8 : ValueType<256, 100>;  // n x 32 x i8  vector value
+
+def nxv1i16 : ValueType<16,  101>; // n x  1 x i16 vector value
+def nxv2i16 : ValueType<32,  102>; // n x  2 x i16 vector value
+def nxv4i16 : ValueType<64,  103>; // n x  4 x i16 vector value
+def nxv8i16 : ValueType<128, 104>; // n x  8 x i16 vector value
+def nxv16i16: ValueType<256, 105>; // n x 16 x i16 vector value
+def nxv32i16: ValueType<512, 106>; // n x 32 x i16 vector value
+
+def nxv1i32 : ValueType<32,  107>; // n x  1 x i32 vector value
+def nxv2i32 : ValueType<64,  108>; // n x  2 x i32 vector value
+def nxv4i32 : ValueType<128, 109>; // n x  4 x i32 vector value
+def nxv8i32 : ValueType<256, 110>; // n x  8 x i32 vector value
+def nxv16i32: ValueType<512, 111>; // n x 16 x i32 vector value
+def nxv32i32: ValueType<1024,112>; // n x 32 x i32 vector value
+
+def nxv1i64 : ValueType<64,  113>; // n x  1 x i64 vector value
+def nxv2i64 : ValueType<128, 114>; // n x  2 x i64 vector value
+def nxv4i64 : ValueType<256, 115>; // n x  4 x i64 vector value
+def nxv8i64 : ValueType<512, 116>; // n x  8 x i64 vector value
+def nxv16i64: ValueType<1024,117>; // n x 16 x i64 vector value
+def nxv32i64: ValueType<2048,118>; // n x 32 x i64 vector value
+
+def nxv2f16  : ValueType<32 , 119>; // n x  2 x f16 vector value
+def nxv4f16  : ValueType<64 , 120>; // n x  4 x f16 vector value
+def nxv8f16  : ValueType<128, 121>; // n x  8 x f16 vector value
+def nxv1f32  : ValueType<32 , 122>; // n x  1 x f32 vector value
+def nxv2f32  : ValueType<64 , 123>; // n x  2 x f32 vector value
+def nxv4f32  : ValueType<128, 124>; // n x  4 x f32 vector value
+def nxv8f32  : ValueType<256, 125>; // n x  8 x f32 vector value
+def nxv16f32 : ValueType<512, 126>; // n x 16 x f32 vector value
+def nxv1f64  : ValueType<64,  127>; // n x  1 x f64 vector value
+def nxv2f64  : ValueType<128, 128>; // n x  2 x f64 vector value
+def nxv4f64  : ValueType<256, 129>; // n x  4 x f64 vector value
+def nxv8f64  : ValueType<512, 130>; // n x  8 x f64 vector value
+
+def x86mmx : ValueType<64 , 131>;   // X86 MMX value
+def FlagVT : ValueType<0  , 132>;   // Pre-RA sched glue
+def isVoid : ValueType<0  , 133>;   // Produces no value
+def untyped: ValueType<8  , 134>;   // Produces an untyped value
+def exnref : ValueType<0  , 135>;   // WebAssembly's exnref type
 def token  : ValueType<0  , 248>;   // TokenTy
 def MetadataVT: ValueType<0, 249>;  // Metadata
 

diff  --git a/llvm/include/llvm/IR/Intrinsics.td b/llvm/include/llvm/IR/Intrinsics.td
index fc17ffe3118f..afb454e18d17 100644
--- a/llvm/include/llvm/IR/Intrinsics.td
+++ b/llvm/include/llvm/IR/Intrinsics.td
@@ -289,6 +289,7 @@ def llvm_v1f64_ty      : LLVMType<v1f64>;    //  1 x double
 def llvm_v2f64_ty      : LLVMType<v2f64>;    //  2 x double
 def llvm_v4f64_ty      : LLVMType<v4f64>;    //  4 x double
 def llvm_v8f64_ty      : LLVMType<v8f64>;    //  8 x double
+def llvm_v16f64_ty     : LLVMType<v16f64>;   // 16 x double
 
 def llvm_vararg_ty     : LLVMType<isVoid>;   // this means vararg here
 

diff  --git a/llvm/include/llvm/Support/MachineValueType.h b/llvm/include/llvm/Support/MachineValueType.h
index 26b45a602763..224353c5047f 100644
--- a/llvm/include/llvm/Support/MachineValueType.h
+++ b/llvm/include/llvm/Support/MachineValueType.h
@@ -140,63 +140,64 @@ namespace llvm {
       v2f64          =  85,   //    2 x f64
       v4f64          =  86,   //    4 x f64
       v8f64          =  87,   //    8 x f64
+      v16f64         =  88,   //   16 x f64
 
       FIRST_FP_FIXEDLEN_VECTOR_VALUETYPE = v2f16,
-      LAST_FP_FIXEDLEN_VECTOR_VALUETYPE = v8f64,
+      LAST_FP_FIXEDLEN_VECTOR_VALUETYPE = v16f64,
 
       FIRST_FIXEDLEN_VECTOR_VALUETYPE = v1i1,
-      LAST_FIXEDLEN_VECTOR_VALUETYPE = v8f64,
-
-      nxv1i1         =  88,   // n x  1 x i1
-      nxv2i1         =  89,   // n x  2 x i1
-      nxv4i1         =  90,   // n x  4 x i1
-      nxv8i1         =  91,   // n x  8 x i1
-      nxv16i1        =  92,   // n x 16 x i1
-      nxv32i1        =  93,   // n x 32 x i1
-
-      nxv1i8         =  94,   // n x  1 x i8
-      nxv2i8         =  95,   // n x  2 x i8
-      nxv4i8         =  96,   // n x  4 x i8
-      nxv8i8         =  97,   // n x  8 x i8
-      nxv16i8        =  98,   // n x 16 x i8
-      nxv32i8        =  99,   // n x 32 x i8
-
-      nxv1i16        =  100,  // n x  1 x i16
-      nxv2i16        =  101,  // n x  2 x i16
-      nxv4i16        =  102,  // n x  4 x i16
-      nxv8i16        =  103,  // n x  8 x i16
-      nxv16i16       =  104,  // n x 16 x i16
-      nxv32i16       =  105,  // n x 32 x i16
-
-      nxv1i32        =  106,  // n x  1 x i32
-      nxv2i32        =  107,  // n x  2 x i32
-      nxv4i32        =  108,  // n x  4 x i32
-      nxv8i32        =  109,  // n x  8 x i32
-      nxv16i32       =  110,  // n x 16 x i32
-      nxv32i32       =  111,  // n x 32 x i32
-
-      nxv1i64        =  112,  // n x  1 x i64
-      nxv2i64        =  113,  // n x  2 x i64
-      nxv4i64        =  114,  // n x  4 x i64
-      nxv8i64        =  115,  // n x  8 x i64
-      nxv16i64       =  116,  // n x 16 x i64
-      nxv32i64       =  117,  // n x 32 x i64
+      LAST_FIXEDLEN_VECTOR_VALUETYPE = v16f64,
+
+      nxv1i1         =  89,   // n x  1 x i1
+      nxv2i1         =  90,   // n x  2 x i1
+      nxv4i1         =  91,   // n x  4 x i1
+      nxv8i1         =  92,   // n x  8 x i1
+      nxv16i1        =  93,   // n x 16 x i1
+      nxv32i1        =  94,   // n x 32 x i1
+
+      nxv1i8         =  95,   // n x  1 x i8
+      nxv2i8         =  96,   // n x  2 x i8
+      nxv4i8         =  97,   // n x  4 x i8
+      nxv8i8         =  98,   // n x  8 x i8
+      nxv16i8        =  99,   // n x 16 x i8
+      nxv32i8        =  100,  // n x 32 x i8
+
+      nxv1i16        =  101,  // n x  1 x i16
+      nxv2i16        =  102,  // n x  2 x i16
+      nxv4i16        =  103,  // n x  4 x i16
+      nxv8i16        =  104,  // n x  8 x i16
+      nxv16i16       =  105,  // n x 16 x i16
+      nxv32i16       =  106,  // n x 32 x i16
+
+      nxv1i32        =  107,  // n x  1 x i32
+      nxv2i32        =  108,  // n x  2 x i32
+      nxv4i32        =  109,  // n x  4 x i32
+      nxv8i32        =  110,  // n x  8 x i32
+      nxv16i32       =  111,  // n x 16 x i32
+      nxv32i32       =  112,  // n x 32 x i32
+
+      nxv1i64        =  113,  // n x  1 x i64
+      nxv2i64        =  114,  // n x  2 x i64
+      nxv4i64        =  115,  // n x  4 x i64
+      nxv8i64        =  116,  // n x  8 x i64
+      nxv16i64       =  117,  // n x 16 x i64
+      nxv32i64       =  118,  // n x 32 x i64
 
       FIRST_INTEGER_SCALABLE_VECTOR_VALUETYPE = nxv1i1,
       LAST_INTEGER_SCALABLE_VECTOR_VALUETYPE = nxv32i64,
 
-      nxv2f16        =  118,  // n x  2 x f16
-      nxv4f16        =  119,  // n x  4 x f16
-      nxv8f16        =  120,  // n x  8 x f16
-      nxv1f32        =  121,  // n x  1 x f32
-      nxv2f32        =  122,  // n x  2 x f32
-      nxv4f32        =  123,  // n x  4 x f32
-      nxv8f32        =  124,  // n x  8 x f32
-      nxv16f32       =  125,  // n x 16 x f32
-      nxv1f64        =  126,  // n x  1 x f64
-      nxv2f64        =  127,  // n x  2 x f64
-      nxv4f64        =  128,  // n x  4 x f64
-      nxv8f64        =  129,  // n x  8 x f64
+      nxv2f16        =  119,  // n x  2 x f16
+      nxv4f16        =  120,  // n x  4 x f16
+      nxv8f16        =  121,  // n x  8 x f16
+      nxv1f32        =  122,  // n x  1 x f32
+      nxv2f32        =  123,  // n x  2 x f32
+      nxv4f32        =  124,  // n x  4 x f32
+      nxv8f32        =  125,  // n x  8 x f32
+      nxv16f32       =  126,  // n x 16 x f32
+      nxv1f64        =  127,  // n x  1 x f64
+      nxv2f64        =  128,  // n x  2 x f64
+      nxv4f64        =  129,  // n x  4 x f64
+      nxv8f64        =  130,  // n x  8 x f64
 
       FIRST_FP_SCALABLE_VECTOR_VALUETYPE = nxv2f16,
       LAST_FP_SCALABLE_VECTOR_VALUETYPE = nxv8f64,
@@ -207,20 +208,20 @@ namespace llvm {
       FIRST_VECTOR_VALUETYPE = v1i1,
       LAST_VECTOR_VALUETYPE  = nxv8f64,
 
-      x86mmx         =  130,   // This is an X86 MMX value
+      x86mmx         =  131,   // This is an X86 MMX value
 
-      Glue           =  131,   // This glues nodes together during pre-RA sched
+      Glue           =  132,   // This glues nodes together during pre-RA sched
 
-      isVoid         =  132,   // This has no value
+      isVoid         =  133,   // This has no value
 
-      Untyped        =  133,   // This value takes a register, but has
+      Untyped        =  134,   // This value takes a register, but has
                                // unspecified type.  The register class
                                // will be determined by the opcode.
 
-      exnref         =  134,   // WebAssembly's exnref type
+      exnref         =  135,   // WebAssembly's exnref type
 
       FIRST_VALUETYPE = 1,     // This is always the beginning of the list.
-      LAST_VALUETYPE =  135,   // This always remains at the end of the list.
+      LAST_VALUETYPE =  136,   // This always remains at the end of the list.
 
       // This is the current maximum for LAST_VALUETYPE.
       // MVT::MAX_ALLOWED_VALUETYPE is used for asserts and to size bit vectors
@@ -374,7 +375,7 @@ namespace llvm {
     bool is1024BitVector() const {
       return (SimpleTy == MVT::v1024i1 || SimpleTy == MVT::v128i8 ||
               SimpleTy == MVT::v64i16  || SimpleTy == MVT::v32i32 ||
-              SimpleTy == MVT::v16i64);
+              SimpleTy == MVT::v16i64  || SimpleTy == MVT::v16f64);
     }
 
     /// Return true if this is a 2048-bit vector type.
@@ -537,6 +538,7 @@ namespace llvm {
       case v2f64:
       case v4f64:
       case v8f64:
+      case v16f64:
       case nxv1f64:
       case nxv2f64:
       case nxv4f64:
@@ -589,6 +591,7 @@ namespace llvm {
       case v16i64:
       case v16f16:
       case v16f32:
+      case v16f64:
       case nxv16i1:
       case nxv16i8:
       case nxv16i16:
@@ -805,6 +808,7 @@ namespace llvm {
       case v64i16:
       case v32i32:
       case v16i64:
+      case v16f64:
       case v32f32: return TypeSize::Fixed(1024);
       case nxv32i32:
       case nxv16i64: return TypeSize::Scalable(1024);
@@ -1010,6 +1014,7 @@ namespace llvm {
         if (NumElements == 2)  return MVT::v2f64;
         if (NumElements == 4)  return MVT::v4f64;
         if (NumElements == 8)  return MVT::v8f64;
+        if (NumElements == 16) return MVT::v16f64;
         break;
       }
       return (MVT::SimpleValueType)(MVT::INVALID_SIMPLE_VALUE_TYPE);

diff  --git a/llvm/lib/CodeGen/ValueTypes.cpp b/llvm/lib/CodeGen/ValueTypes.cpp
index 55dfabfaf6fe..e24ad844a62c 100644
--- a/llvm/lib/CodeGen/ValueTypes.cpp
+++ b/llvm/lib/CodeGen/ValueTypes.cpp
@@ -254,6 +254,7 @@ Type *EVT::getTypeForEVT(LLVMContext &Context) const {
   case MVT::v2f64:   return VectorType::get(Type::getDoubleTy(Context), 2);
   case MVT::v4f64:   return VectorType::get(Type::getDoubleTy(Context), 4);
   case MVT::v8f64:   return VectorType::get(Type::getDoubleTy(Context), 8);
+  case MVT::v16f64:  return VectorType::get(Type::getDoubleTy(Context), 16);
   case MVT::nxv1i1:
     return VectorType::get(Type::getInt1Ty(Context), 1, /*Scalable=*/ true);
   case MVT::nxv2i1:

diff  --git a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
index 9b4ee748f7b9..cfc18d823139 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
@@ -119,6 +119,12 @@ AMDGPUTargetLowering::AMDGPUTargetLowering(const TargetMachine &TM,
   setOperationAction(ISD::LOAD, MVT::v8f64, Promote);
   AddPromotedToType(ISD::LOAD, MVT::v8f64, MVT::v16i32);
 
+  setOperationAction(ISD::LOAD, MVT::v16i64, Promote);
+  AddPromotedToType(ISD::LOAD, MVT::v16i64, MVT::v32i32);
+
+  setOperationAction(ISD::LOAD, MVT::v16f64, Promote);
+  AddPromotedToType(ISD::LOAD, MVT::v16f64, MVT::v32i32);
+
   // There are no 64-bit extloads. These should be done as a 32-bit extload and
   // an extension to 64-bit.
   for (MVT VT : MVT::integer_valuetypes()) {
@@ -177,11 +183,13 @@ AMDGPUTargetLowering::AMDGPUTargetLowering(const TargetMachine &TM,
   setLoadExtAction(ISD::EXTLOAD, MVT::v2f64, MVT::v2f32, Expand);
   setLoadExtAction(ISD::EXTLOAD, MVT::v4f64, MVT::v4f32, Expand);
   setLoadExtAction(ISD::EXTLOAD, MVT::v8f64, MVT::v8f32, Expand);
+  setLoadExtAction(ISD::EXTLOAD, MVT::v16f64, MVT::v16f32, Expand);
 
   setLoadExtAction(ISD::EXTLOAD, MVT::f64, MVT::f16, Expand);
   setLoadExtAction(ISD::EXTLOAD, MVT::v2f64, MVT::v2f16, Expand);
   setLoadExtAction(ISD::EXTLOAD, MVT::v4f64, MVT::v4f16, Expand);
   setLoadExtAction(ISD::EXTLOAD, MVT::v8f64, MVT::v8f16, Expand);
+  setLoadExtAction(ISD::EXTLOAD, MVT::v16f64, MVT::v16f16, Expand);
 
   setOperationAction(ISD::STORE, MVT::f32, Promote);
   AddPromotedToType(ISD::STORE, MVT::f32, MVT::i32);
@@ -231,6 +239,12 @@ AMDGPUTargetLowering::AMDGPUTargetLowering(const TargetMachine &TM,
   setOperationAction(ISD::STORE, MVT::v8f64, Promote);
   AddPromotedToType(ISD::STORE, MVT::v8f64, MVT::v16i32);
 
+  setOperationAction(ISD::STORE, MVT::v16i64, Promote);
+  AddPromotedToType(ISD::STORE, MVT::v16i64, MVT::v32i32);
+
+  setOperationAction(ISD::STORE, MVT::v16f64, Promote);
+  AddPromotedToType(ISD::STORE, MVT::v16f64, MVT::v32i32);
+
   setTruncStoreAction(MVT::i64, MVT::i1, Expand);
   setTruncStoreAction(MVT::i64, MVT::i8, Expand);
   setTruncStoreAction(MVT::i64, MVT::i16, Expand);
@@ -263,6 +277,8 @@ AMDGPUTargetLowering::AMDGPUTargetLowering(const TargetMachine &TM,
   setTruncStoreAction(MVT::v8f64, MVT::v8f32, Expand);
   setTruncStoreAction(MVT::v8f64, MVT::v8f16, Expand);
 
+  setTruncStoreAction(MVT::v16f64, MVT::v16f32, Expand);
+  setTruncStoreAction(MVT::v16f64, MVT::v16f16, Expand);
 
   setOperationAction(ISD::Constant, MVT::i32, Legal);
   setOperationAction(ISD::Constant, MVT::i64, Legal);

diff  --git a/llvm/test/Analysis/CostModel/ARM/cast.ll b/llvm/test/Analysis/CostModel/ARM/cast.ll
index dc8222ca699c..a7fd0a141a56 100644
--- a/llvm/test/Analysis/CostModel/ARM/cast.ll
+++ b/llvm/test/Analysis/CostModel/ARM/cast.ll
@@ -372,12 +372,12 @@ define i32 @casts() {
 ; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %r81 = fptrunc <2 x double> undef to <2 x float>
 ; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %r82 = fptrunc <4 x double> undef to <4 x float>
 ; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 40 for instruction: %r83 = fptrunc <8 x double> undef to <8 x float>
-; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 80 for instruction: %r84 = fptrunc <16 x double> undef to <16 x float>
+; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 160 for instruction: %r84 = fptrunc <16 x double> undef to <16 x float>
 ; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %r85 = fpext float undef to double
 ; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 20 for instruction: %r86 = fpext <2 x float> undef to <2 x double>
 ; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 82 for instruction: %r87 = fpext <4 x float> undef to <4 x double>
 ; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 328 for instruction: %r88 = fpext <8 x float> undef to <8 x double>
-; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 656 for instruction: %r89 = fpext <16 x float> undef to <16 x double>
+; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 1312 for instruction: %r89 = fpext <16 x float> undef to <16 x double>
 ; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 20 for instruction: %r90 = fptoui <2 x float> undef to <2 x i1>
 ; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 20 for instruction: %r91 = fptosi <2 x float> undef to <2 x i1>
 ; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 20 for instruction: %r92 = fptoui <2 x float> undef to <2 x i8>
@@ -448,16 +448,16 @@ define i32 @casts() {
 ; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %r157 = fptosi <16 x float> undef to <16 x i32>
 ; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 1312 for instruction: %r158 = fptoui <16 x float> undef to <16 x i64>
 ; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 1312 for instruction: %r159 = fptosi <16 x float> undef to <16 x i64>
-; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 661 for instruction: %r160 = fptoui <16 x double> undef to <16 x i1>
-; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 661 for instruction: %r161 = fptosi <16 x double> undef to <16 x i1>
-; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 661 for instruction: %r162 = fptoui <16 x double> undef to <16 x i8>
-; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 661 for instruction: %r163 = fptosi <16 x double> undef to <16 x i8>
-; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 660 for instruction: %r164 = fptoui <16 x double> undef to <16 x i16>
-; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 660 for instruction: %r165 = fptosi <16 x double> undef to <16 x i16>
-; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 656 for instruction: %r166 = fptoui <16 x double> undef to <16 x i32>
-; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 656 for instruction: %r167 = fptosi <16 x double> undef to <16 x i32>
-; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 640 for instruction: %r168 = fptoui <16 x double> undef to <16 x i64>
-; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 640 for instruction: %r169 = fptosi <16 x double> undef to <16 x i64>
+; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 1322 for instruction: %r160 = fptoui <16 x double> undef to <16 x i1>
+; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 1322 for instruction: %r161 = fptosi <16 x double> undef to <16 x i1>
+; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 1322 for instruction: %r162 = fptoui <16 x double> undef to <16 x i8>
+; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 1322 for instruction: %r163 = fptosi <16 x double> undef to <16 x i8>
+; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 1320 for instruction: %r164 = fptoui <16 x double> undef to <16 x i16>
+; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 1320 for instruction: %r165 = fptosi <16 x double> undef to <16 x i16>
+; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 1312 for instruction: %r166 = fptoui <16 x double> undef to <16 x i32>
+; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 1312 for instruction: %r167 = fptosi <16 x double> undef to <16 x i32>
+; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 1280 for instruction: %r168 = fptoui <16 x double> undef to <16 x i64>
+; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 1280 for instruction: %r169 = fptosi <16 x double> undef to <16 x i64>
 ; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %r170 = uitofp <2 x i1> undef to <2 x float>
 ; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %r171 = sitofp <2 x i1> undef to <2 x float>
 ; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %r172 = uitofp <2 x i8> undef to <2 x float>
@@ -528,16 +528,16 @@ define i32 @casts() {
 ; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %r237 = sitofp <16 x i32> undef to <16 x float>
 ; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 160 for instruction: %r238 = uitofp <16 x i64> undef to <16 x float>
 ; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 160 for instruction: %r239 = sitofp <16 x i64> undef to <16 x float>
-; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 1045 for instruction: %r240 = uitofp <16 x i1> undef to <16 x double>
-; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 1045 for instruction: %r241 = sitofp <16 x i1> undef to <16 x double>
-; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 1045 for instruction: %r242 = uitofp <16 x i8> undef to <16 x double>
-; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 1045 for instruction: %r243 = sitofp <16 x i8> undef to <16 x double>
-; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 1044 for instruction: %r244 = uitofp <16 x i16> undef to <16 x double>
-; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 1044 for instruction: %r245 = sitofp <16 x i16> undef to <16 x double>
-; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 1044 for instruction: %r246 = uitofp <16 x i16> undef to <16 x double>
-; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 1044 for instruction: %r247 = sitofp <16 x i16> undef to <16 x double>
-; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 1024 for instruction: %r248 = uitofp <16 x i64> undef to <16 x double>
-; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 1024 for instruction: %r249 = sitofp <16 x i64> undef to <16 x double>
+; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 2090 for instruction: %r240 = uitofp <16 x i1> undef to <16 x double>
+; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 2090 for instruction: %r241 = sitofp <16 x i1> undef to <16 x double>
+; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 2090 for instruction: %r242 = uitofp <16 x i8> undef to <16 x double>
+; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 2090 for instruction: %r243 = sitofp <16 x i8> undef to <16 x double>
+; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 2088 for instruction: %r244 = uitofp <16 x i16> undef to <16 x double>
+; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 2088 for instruction: %r245 = sitofp <16 x i16> undef to <16 x double>
+; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 2088 for instruction: %r246 = uitofp <16 x i16> undef to <16 x double>
+; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 2088 for instruction: %r247 = sitofp <16 x i16> undef to <16 x double>
+; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 2048 for instruction: %r248 = uitofp <16 x i64> undef to <16 x double>
+; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 2048 for instruction: %r249 = sitofp <16 x i64> undef to <16 x double>
 ; CHECK-MVE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
 ;
 ; CHECK-V8M-MAIN-LABEL: 'casts'

diff  --git a/llvm/utils/TableGen/CodeGenTarget.cpp b/llvm/utils/TableGen/CodeGenTarget.cpp
index 921d20e7af76..e0470e4266f8 100644
--- a/llvm/utils/TableGen/CodeGenTarget.cpp
+++ b/llvm/utils/TableGen/CodeGenTarget.cpp
@@ -150,6 +150,7 @@ StringRef llvm::getEnumName(MVT::SimpleValueType T) {
   case MVT::v2f64:    return "MVT::v2f64";
   case MVT::v4f64:    return "MVT::v4f64";
   case MVT::v8f64:    return "MVT::v8f64";
+  case MVT::v16f64:   return "MVT::v16f64";
   case MVT::nxv1i1:   return "MVT::nxv1i1";
   case MVT::nxv2i1:   return "MVT::nxv2i1";
   case MVT::nxv4i1:   return "MVT::nxv4i1";