[llvm] baecc9e - [CostModel][X86] getShuffleCost - add fallback (to half vector) for bfloat vector shuffle costs
Simon Pilgrim via llvm-commits
llvm-commits at lists.llvm.org
Thu Oct 5 03:14:47 PDT 2023
Author: Simon Pilgrim
Date: 2023-10-05T11:12:40+01:00
New Revision: baecc9e997dd912a6b3589a529394c77570acc19
URL: https://github.com/llvm/llvm-project/commit/baecc9e997dd912a6b3589a529394c77570acc19
DIFF: https://github.com/llvm/llvm-project/commit/baecc9e997dd912a6b3589a529394c77570acc19.diff
LOG: [CostModel][X86] getShuffleCost - add fallback (to half vector) for bfloat vector shuffle costs
Add initial half/bfloat broadcast shuffle test coverage (more to follow)
Fixes #68117, which was stuck in a loop between getting scalarized insert/extract costs for the shuffle and then trying to convert a bfloat insert into a shuffle again.
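For readers without the LLVM tree to hand, the type remap in this patch can be sketched standalone as below. This is only an illustration: ToyVT, ScalarKind and remapForShuffleCost are hypothetical stand-ins invented here, not the real llvm::MVT API, and the sketch models only the element-type swap, not the cost tables.

```cpp
#include <cassert>

// Hypothetical stand-in for the scalar element types of llvm::MVT.
enum class ScalarKind { F16, BF16, F32, I16 };

// Hypothetical stand-in for a legalized value type.
// NumElts == 0 models a scalar type.
struct ToyVT {
  ScalarKind Scalar;
  unsigned NumElts;

  bool isVector() const { return NumElts > 0; }

  // Same element count, different element type (vectors only).
  ToyVT changeVectorElementType(ScalarKind K) const {
    assert(isVector() && "expected a vector type");
    return {K, NumElts};
  }
};

// Mirrors the check added in the patch: a <X x bfloat> vector is
// costed as the <X x half> vector with the same element count.
// Scalars and all other vector types pass through unchanged.
ToyVT remapForShuffleCost(ToyVT VT) {
  if (VT.isVector() && VT.Scalar == ScalarKind::BF16)
    return VT.changeVectorElementType(ScalarKind::F16);
  return VT;
}
```

In this sketch, remapForShuffleCost({ScalarKind::BF16, 8}) yields an 8-element F16 type, so a later lookup keyed on the type would presumably hit the existing half entries instead of taking the scalarization path described above.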
Added:
Modified:
llvm/lib/Target/X86/X86TargetTransformInfo.cpp
llvm/test/Analysis/CostModel/X86/shuffle-broadcast-codesize.ll
llvm/test/Analysis/CostModel/X86/shuffle-broadcast-latency.ll
llvm/test/Analysis/CostModel/X86/shuffle-broadcast-sizelatency.ll
llvm/test/Analysis/CostModel/X86/shuffle-broadcast.ll
Removed:
################################################################################
diff --git a/llvm/lib/Target/X86/X86TargetTransformInfo.cpp b/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
index d838d1f96e310d8..5feb2d12e8b36fc 100644
--- a/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
+++ b/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
@@ -1481,6 +1481,10 @@ InstructionCost X86TTIImpl::getShuffleCost(TTI::ShuffleKind Kind,
if (Kind == TTI::SK_Broadcast)
LT.first = 1;
+ // Treat <X x bfloat> shuffles as <X x half>.
+ if (LT.second.isVector() && LT.second.getScalarType() == MVT::bf16)
+ LT.second = LT.second.changeVectorElementType(MVT::f16);
+
// Subvector extractions are free if they start at the beginning of a
// vector and cheap if the subvectors are aligned.
if (Kind == TTI::SK_ExtractSubvector && LT.second.isVector()) {
diff --git a/llvm/test/Analysis/CostModel/X86/shuffle-broadcast-codesize.ll b/llvm/test/Analysis/CostModel/X86/shuffle-broadcast-codesize.ll
index c2454855ae44353..a149ec45c863e3e 100644
--- a/llvm/test/Analysis/CostModel/X86/shuffle-broadcast-codesize.ll
+++ b/llvm/test/Analysis/CostModel/X86/shuffle-broadcast-codesize.ll
@@ -150,6 +150,96 @@ define void @test_vXi32(<2 x i32> %src64, <4 x i32> %src128, <8 x i32> %src256,
ret void
}
+define void @test_vXf16(<2 x half> %src32, <4 x half> %src64, <8 x half> %src128, <16 x half> %src256, <32 x half> %src512) {
+; SSE2-LABEL: 'test_vXf16'
+; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V32 = shufflevector <2 x half> %src32, <2 x half> undef, <2 x i32> zeroinitializer
+; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64 = shufflevector <4 x half> %src64, <4 x half> undef, <4 x i32> zeroinitializer
+; SSE2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V128 = shufflevector <8 x half> %src128, <8 x half> undef, <8 x i32> zeroinitializer
+; SSE2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V256 = shufflevector <16 x half> %src256, <16 x half> undef, <16 x i32> zeroinitializer
+; SSE2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V512 = shufflevector <32 x half> %src512, <32 x half> undef, <32 x i32> zeroinitializer
+; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
+;
+; SSSE3-LABEL: 'test_vXf16'
+; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V32 = shufflevector <2 x half> %src32, <2 x half> undef, <2 x i32> zeroinitializer
+; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64 = shufflevector <4 x half> %src64, <4 x half> undef, <4 x i32> zeroinitializer
+; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <8 x half> %src128, <8 x half> undef, <8 x i32> zeroinitializer
+; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256 = shufflevector <16 x half> %src256, <16 x half> undef, <16 x i32> zeroinitializer
+; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512 = shufflevector <32 x half> %src512, <32 x half> undef, <32 x i32> zeroinitializer
+; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
+;
+; SSE42-LABEL: 'test_vXf16'
+; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V32 = shufflevector <2 x half> %src32, <2 x half> undef, <2 x i32> zeroinitializer
+; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64 = shufflevector <4 x half> %src64, <4 x half> undef, <4 x i32> zeroinitializer
+; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <8 x half> %src128, <8 x half> undef, <8 x i32> zeroinitializer
+; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256 = shufflevector <16 x half> %src256, <16 x half> undef, <16 x i32> zeroinitializer
+; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512 = shufflevector <32 x half> %src512, <32 x half> undef, <32 x i32> zeroinitializer
+; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
+;
+; AVX2-LABEL: 'test_vXf16'
+; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V32 = shufflevector <2 x half> %src32, <2 x half> undef, <2 x i32> zeroinitializer
+; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64 = shufflevector <4 x half> %src64, <4 x half> undef, <4 x i32> zeroinitializer
+; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <8 x half> %src128, <8 x half> undef, <8 x i32> zeroinitializer
+; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256 = shufflevector <16 x half> %src256, <16 x half> undef, <16 x i32> zeroinitializer
+; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512 = shufflevector <32 x half> %src512, <32 x half> undef, <32 x i32> zeroinitializer
+; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
+;
+; AVX512-LABEL: 'test_vXf16'
+; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V32 = shufflevector <2 x half> %src32, <2 x half> undef, <2 x i32> zeroinitializer
+; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64 = shufflevector <4 x half> %src64, <4 x half> undef, <4 x i32> zeroinitializer
+; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <8 x half> %src128, <8 x half> undef, <8 x i32> zeroinitializer
+; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256 = shufflevector <16 x half> %src256, <16 x half> undef, <16 x i32> zeroinitializer
+; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512 = shufflevector <32 x half> %src512, <32 x half> undef, <32 x i32> zeroinitializer
+; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
+;
+ %V32 = shufflevector <2 x half> %src32, <2 x half> undef, <2 x i32> zeroinitializer
+ %V64 = shufflevector <4 x half> %src64, <4 x half> undef, <4 x i32> zeroinitializer
+ %V128 = shufflevector <8 x half> %src128, <8 x half> undef, <8 x i32> zeroinitializer
+ %V256 = shufflevector <16 x half> %src256, <16 x half> undef, <16 x i32> zeroinitializer
+ %V512 = shufflevector <32 x half> %src512, <32 x half> undef, <32 x i32> zeroinitializer
+ ret void
+}
+
+define void @test_vXbf16(<2 x bfloat> %src32, <4 x bfloat> %src64, <8 x bfloat> %src128, <16 x bfloat> %src256, <32 x bfloat> %src512) {
+; SSE-LABEL: 'test_vXbf16'
+; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V32 = shufflevector <2 x bfloat> %src32, <2 x bfloat> undef, <2 x i32> zeroinitializer
+; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64 = shufflevector <4 x bfloat> %src64, <4 x bfloat> undef, <4 x i32> zeroinitializer
+; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V128 = shufflevector <8 x bfloat> %src128, <8 x bfloat> undef, <8 x i32> zeroinitializer
+; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256 = shufflevector <16 x bfloat> %src256, <16 x bfloat> undef, <16 x i32> zeroinitializer
+; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512 = shufflevector <32 x bfloat> %src512, <32 x bfloat> undef, <32 x i32> zeroinitializer
+; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
+;
+; AVX1-LABEL: 'test_vXbf16'
+; AVX1-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V32 = shufflevector <2 x bfloat> %src32, <2 x bfloat> undef, <2 x i32> zeroinitializer
+; AVX1-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64 = shufflevector <4 x bfloat> %src64, <4 x bfloat> undef, <4 x i32> zeroinitializer
+; AVX1-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V128 = shufflevector <8 x bfloat> %src128, <8 x bfloat> undef, <8 x i32> zeroinitializer
+; AVX1-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256 = shufflevector <16 x bfloat> %src256, <16 x bfloat> undef, <16 x i32> zeroinitializer
+; AVX1-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512 = shufflevector <32 x bfloat> %src512, <32 x bfloat> undef, <32 x i32> zeroinitializer
+; AVX1-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
+;
+; AVX2-LABEL: 'test_vXbf16'
+; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V32 = shufflevector <2 x bfloat> %src32, <2 x bfloat> undef, <2 x i32> zeroinitializer
+; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64 = shufflevector <4 x bfloat> %src64, <4 x bfloat> undef, <4 x i32> zeroinitializer
+; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V128 = shufflevector <8 x bfloat> %src128, <8 x bfloat> undef, <8 x i32> zeroinitializer
+; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256 = shufflevector <16 x bfloat> %src256, <16 x bfloat> undef, <16 x i32> zeroinitializer
+; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512 = shufflevector <32 x bfloat> %src512, <32 x bfloat> undef, <32 x i32> zeroinitializer
+; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
+;
+; AVX512-LABEL: 'test_vXbf16'
+; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V32 = shufflevector <2 x bfloat> %src32, <2 x bfloat> undef, <2 x i32> zeroinitializer
+; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64 = shufflevector <4 x bfloat> %src64, <4 x bfloat> undef, <4 x i32> zeroinitializer
+; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V128 = shufflevector <8 x bfloat> %src128, <8 x bfloat> undef, <8 x i32> zeroinitializer
+; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256 = shufflevector <16 x bfloat> %src256, <16 x bfloat> undef, <16 x i32> zeroinitializer
+; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512 = shufflevector <32 x bfloat> %src512, <32 x bfloat> undef, <32 x i32> zeroinitializer
+; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
+;
+ %V32 = shufflevector <2 x bfloat> %src32, <2 x bfloat> undef, <2 x i32> zeroinitializer
+ %V64 = shufflevector <4 x bfloat> %src64, <4 x bfloat> undef, <4 x i32> zeroinitializer
+ %V128 = shufflevector <8 x bfloat> %src128, <8 x bfloat> undef, <8 x i32> zeroinitializer
+ %V256 = shufflevector <16 x bfloat> %src256, <16 x bfloat> undef, <16 x i32> zeroinitializer
+ %V512 = shufflevector <32 x bfloat> %src512, <32 x bfloat> undef, <32 x i32> zeroinitializer
+ ret void
+}
+
define void @test_vXi16(<2 x i16> %src32, <4 x i16> %src64, <8 x i16> %src128, <16 x i16> %src256, <32 x i16> %src512) {
; SSE2-LABEL: 'test_vXi16'
; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V32 = shufflevector <2 x i16> %src32, <2 x i16> undef, <2 x i32> zeroinitializer
diff --git a/llvm/test/Analysis/CostModel/X86/shuffle-broadcast-latency.ll b/llvm/test/Analysis/CostModel/X86/shuffle-broadcast-latency.ll
index 7a5e91bb80561ad..119bce9410d5e75 100644
--- a/llvm/test/Analysis/CostModel/X86/shuffle-broadcast-latency.ll
+++ b/llvm/test/Analysis/CostModel/X86/shuffle-broadcast-latency.ll
@@ -150,6 +150,96 @@ define void @test_vXi32(<2 x i32> %src64, <4 x i32> %src128, <8 x i32> %src256,
ret void
}
+define void @test_vXf16(<2 x half> %src32, <4 x half> %src64, <8 x half> %src128, <16 x half> %src256, <32 x half> %src512) {
+; SSE2-LABEL: 'test_vXf16'
+; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V32 = shufflevector <2 x half> %src32, <2 x half> undef, <2 x i32> zeroinitializer
+; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64 = shufflevector <4 x half> %src64, <4 x half> undef, <4 x i32> zeroinitializer
+; SSE2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V128 = shufflevector <8 x half> %src128, <8 x half> undef, <8 x i32> zeroinitializer
+; SSE2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V256 = shufflevector <16 x half> %src256, <16 x half> undef, <16 x i32> zeroinitializer
+; SSE2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V512 = shufflevector <32 x half> %src512, <32 x half> undef, <32 x i32> zeroinitializer
+; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
+;
+; SSSE3-LABEL: 'test_vXf16'
+; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V32 = shufflevector <2 x half> %src32, <2 x half> undef, <2 x i32> zeroinitializer
+; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64 = shufflevector <4 x half> %src64, <4 x half> undef, <4 x i32> zeroinitializer
+; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <8 x half> %src128, <8 x half> undef, <8 x i32> zeroinitializer
+; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256 = shufflevector <16 x half> %src256, <16 x half> undef, <16 x i32> zeroinitializer
+; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512 = shufflevector <32 x half> %src512, <32 x half> undef, <32 x i32> zeroinitializer
+; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
+;
+; SSE42-LABEL: 'test_vXf16'
+; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V32 = shufflevector <2 x half> %src32, <2 x half> undef, <2 x i32> zeroinitializer
+; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64 = shufflevector <4 x half> %src64, <4 x half> undef, <4 x i32> zeroinitializer
+; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <8 x half> %src128, <8 x half> undef, <8 x i32> zeroinitializer
+; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256 = shufflevector <16 x half> %src256, <16 x half> undef, <16 x i32> zeroinitializer
+; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512 = shufflevector <32 x half> %src512, <32 x half> undef, <32 x i32> zeroinitializer
+; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
+;
+; AVX2-LABEL: 'test_vXf16'
+; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V32 = shufflevector <2 x half> %src32, <2 x half> undef, <2 x i32> zeroinitializer
+; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64 = shufflevector <4 x half> %src64, <4 x half> undef, <4 x i32> zeroinitializer
+; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <8 x half> %src128, <8 x half> undef, <8 x i32> zeroinitializer
+; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256 = shufflevector <16 x half> %src256, <16 x half> undef, <16 x i32> zeroinitializer
+; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512 = shufflevector <32 x half> %src512, <32 x half> undef, <32 x i32> zeroinitializer
+; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
+;
+; AVX512-LABEL: 'test_vXf16'
+; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V32 = shufflevector <2 x half> %src32, <2 x half> undef, <2 x i32> zeroinitializer
+; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64 = shufflevector <4 x half> %src64, <4 x half> undef, <4 x i32> zeroinitializer
+; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <8 x half> %src128, <8 x half> undef, <8 x i32> zeroinitializer
+; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256 = shufflevector <16 x half> %src256, <16 x half> undef, <16 x i32> zeroinitializer
+; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512 = shufflevector <32 x half> %src512, <32 x half> undef, <32 x i32> zeroinitializer
+; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
+;
+ %V32 = shufflevector <2 x half> %src32, <2 x half> undef, <2 x i32> zeroinitializer
+ %V64 = shufflevector <4 x half> %src64, <4 x half> undef, <4 x i32> zeroinitializer
+ %V128 = shufflevector <8 x half> %src128, <8 x half> undef, <8 x i32> zeroinitializer
+ %V256 = shufflevector <16 x half> %src256, <16 x half> undef, <16 x i32> zeroinitializer
+ %V512 = shufflevector <32 x half> %src512, <32 x half> undef, <32 x i32> zeroinitializer
+ ret void
+}
+
+define void @test_vXbf16(<2 x bfloat> %src32, <4 x bfloat> %src64, <8 x bfloat> %src128, <16 x bfloat> %src256, <32 x bfloat> %src512) {
+; SSE-LABEL: 'test_vXbf16'
+; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V32 = shufflevector <2 x bfloat> %src32, <2 x bfloat> undef, <2 x i32> zeroinitializer
+; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64 = shufflevector <4 x bfloat> %src64, <4 x bfloat> undef, <4 x i32> zeroinitializer
+; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V128 = shufflevector <8 x bfloat> %src128, <8 x bfloat> undef, <8 x i32> zeroinitializer
+; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256 = shufflevector <16 x bfloat> %src256, <16 x bfloat> undef, <16 x i32> zeroinitializer
+; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512 = shufflevector <32 x bfloat> %src512, <32 x bfloat> undef, <32 x i32> zeroinitializer
+; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
+;
+; AVX1-LABEL: 'test_vXbf16'
+; AVX1-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V32 = shufflevector <2 x bfloat> %src32, <2 x bfloat> undef, <2 x i32> zeroinitializer
+; AVX1-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64 = shufflevector <4 x bfloat> %src64, <4 x bfloat> undef, <4 x i32> zeroinitializer
+; AVX1-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V128 = shufflevector <8 x bfloat> %src128, <8 x bfloat> undef, <8 x i32> zeroinitializer
+; AVX1-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256 = shufflevector <16 x bfloat> %src256, <16 x bfloat> undef, <16 x i32> zeroinitializer
+; AVX1-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512 = shufflevector <32 x bfloat> %src512, <32 x bfloat> undef, <32 x i32> zeroinitializer
+; AVX1-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
+;
+; AVX2-LABEL: 'test_vXbf16'
+; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V32 = shufflevector <2 x bfloat> %src32, <2 x bfloat> undef, <2 x i32> zeroinitializer
+; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64 = shufflevector <4 x bfloat> %src64, <4 x bfloat> undef, <4 x i32> zeroinitializer
+; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V128 = shufflevector <8 x bfloat> %src128, <8 x bfloat> undef, <8 x i32> zeroinitializer
+; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256 = shufflevector <16 x bfloat> %src256, <16 x bfloat> undef, <16 x i32> zeroinitializer
+; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512 = shufflevector <32 x bfloat> %src512, <32 x bfloat> undef, <32 x i32> zeroinitializer
+; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
+;
+; AVX512-LABEL: 'test_vXbf16'
+; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V32 = shufflevector <2 x bfloat> %src32, <2 x bfloat> undef, <2 x i32> zeroinitializer
+; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64 = shufflevector <4 x bfloat> %src64, <4 x bfloat> undef, <4 x i32> zeroinitializer
+; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V128 = shufflevector <8 x bfloat> %src128, <8 x bfloat> undef, <8 x i32> zeroinitializer
+; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256 = shufflevector <16 x bfloat> %src256, <16 x bfloat> undef, <16 x i32> zeroinitializer
+; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512 = shufflevector <32 x bfloat> %src512, <32 x bfloat> undef, <32 x i32> zeroinitializer
+; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
+;
+ %V32 = shufflevector <2 x bfloat> %src32, <2 x bfloat> undef, <2 x i32> zeroinitializer
+ %V64 = shufflevector <4 x bfloat> %src64, <4 x bfloat> undef, <4 x i32> zeroinitializer
+ %V128 = shufflevector <8 x bfloat> %src128, <8 x bfloat> undef, <8 x i32> zeroinitializer
+ %V256 = shufflevector <16 x bfloat> %src256, <16 x bfloat> undef, <16 x i32> zeroinitializer
+ %V512 = shufflevector <32 x bfloat> %src512, <32 x bfloat> undef, <32 x i32> zeroinitializer
+ ret void
+}
+
define void @test_vXi16(<2 x i16> %src32, <4 x i16> %src64, <8 x i16> %src128, <16 x i16> %src256, <32 x i16> %src512) {
; SSE2-LABEL: 'test_vXi16'
; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V32 = shufflevector <2 x i16> %src32, <2 x i16> undef, <2 x i32> zeroinitializer
diff --git a/llvm/test/Analysis/CostModel/X86/shuffle-broadcast-sizelatency.ll b/llvm/test/Analysis/CostModel/X86/shuffle-broadcast-sizelatency.ll
index 34754f8fba2f5ea..b182738aa518697 100644
--- a/llvm/test/Analysis/CostModel/X86/shuffle-broadcast-sizelatency.ll
+++ b/llvm/test/Analysis/CostModel/X86/shuffle-broadcast-sizelatency.ll
@@ -150,6 +150,96 @@ define void @test_vXi32(<2 x i32> %src64, <4 x i32> %src128, <8 x i32> %src256,
ret void
}
+define void @test_vXf16(<2 x half> %src32, <4 x half> %src64, <8 x half> %src128, <16 x half> %src256, <32 x half> %src512) {
+; SSE2-LABEL: 'test_vXf16'
+; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V32 = shufflevector <2 x half> %src32, <2 x half> undef, <2 x i32> zeroinitializer
+; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64 = shufflevector <4 x half> %src64, <4 x half> undef, <4 x i32> zeroinitializer
+; SSE2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V128 = shufflevector <8 x half> %src128, <8 x half> undef, <8 x i32> zeroinitializer
+; SSE2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V256 = shufflevector <16 x half> %src256, <16 x half> undef, <16 x i32> zeroinitializer
+; SSE2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V512 = shufflevector <32 x half> %src512, <32 x half> undef, <32 x i32> zeroinitializer
+; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
+;
+; SSSE3-LABEL: 'test_vXf16'
+; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V32 = shufflevector <2 x half> %src32, <2 x half> undef, <2 x i32> zeroinitializer
+; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64 = shufflevector <4 x half> %src64, <4 x half> undef, <4 x i32> zeroinitializer
+; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <8 x half> %src128, <8 x half> undef, <8 x i32> zeroinitializer
+; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256 = shufflevector <16 x half> %src256, <16 x half> undef, <16 x i32> zeroinitializer
+; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512 = shufflevector <32 x half> %src512, <32 x half> undef, <32 x i32> zeroinitializer
+; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
+;
+; SSE42-LABEL: 'test_vXf16'
+; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V32 = shufflevector <2 x half> %src32, <2 x half> undef, <2 x i32> zeroinitializer
+; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64 = shufflevector <4 x half> %src64, <4 x half> undef, <4 x i32> zeroinitializer
+; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <8 x half> %src128, <8 x half> undef, <8 x i32> zeroinitializer
+; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256 = shufflevector <16 x half> %src256, <16 x half> undef, <16 x i32> zeroinitializer
+; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512 = shufflevector <32 x half> %src512, <32 x half> undef, <32 x i32> zeroinitializer
+; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
+;
+; AVX2-LABEL: 'test_vXf16'
+; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V32 = shufflevector <2 x half> %src32, <2 x half> undef, <2 x i32> zeroinitializer
+; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64 = shufflevector <4 x half> %src64, <4 x half> undef, <4 x i32> zeroinitializer
+; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <8 x half> %src128, <8 x half> undef, <8 x i32> zeroinitializer
+; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256 = shufflevector <16 x half> %src256, <16 x half> undef, <16 x i32> zeroinitializer
+; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512 = shufflevector <32 x half> %src512, <32 x half> undef, <32 x i32> zeroinitializer
+; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
+;
+; AVX512-LABEL: 'test_vXf16'
+; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V32 = shufflevector <2 x half> %src32, <2 x half> undef, <2 x i32> zeroinitializer
+; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64 = shufflevector <4 x half> %src64, <4 x half> undef, <4 x i32> zeroinitializer
+; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <8 x half> %src128, <8 x half> undef, <8 x i32> zeroinitializer
+; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256 = shufflevector <16 x half> %src256, <16 x half> undef, <16 x i32> zeroinitializer
+; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512 = shufflevector <32 x half> %src512, <32 x half> undef, <32 x i32> zeroinitializer
+; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
+;
+ %V32 = shufflevector <2 x half> %src32, <2 x half> undef, <2 x i32> zeroinitializer
+ %V64 = shufflevector <4 x half> %src64, <4 x half> undef, <4 x i32> zeroinitializer
+ %V128 = shufflevector <8 x half> %src128, <8 x half> undef, <8 x i32> zeroinitializer
+ %V256 = shufflevector <16 x half> %src256, <16 x half> undef, <16 x i32> zeroinitializer
+ %V512 = shufflevector <32 x half> %src512, <32 x half> undef, <32 x i32> zeroinitializer
+ ret void
+}
+
+define void @test_vXbf16(<2 x bfloat> %src32, <4 x bfloat> %src64, <8 x bfloat> %src128, <16 x bfloat> %src256, <32 x bfloat> %src512) {
+; SSE-LABEL: 'test_vXbf16'
+; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V32 = shufflevector <2 x bfloat> %src32, <2 x bfloat> undef, <2 x i32> zeroinitializer
+; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64 = shufflevector <4 x bfloat> %src64, <4 x bfloat> undef, <4 x i32> zeroinitializer
+; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V128 = shufflevector <8 x bfloat> %src128, <8 x bfloat> undef, <8 x i32> zeroinitializer
+; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256 = shufflevector <16 x bfloat> %src256, <16 x bfloat> undef, <16 x i32> zeroinitializer
+; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512 = shufflevector <32 x bfloat> %src512, <32 x bfloat> undef, <32 x i32> zeroinitializer
+; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
+;
+; AVX1-LABEL: 'test_vXbf16'
+; AVX1-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V32 = shufflevector <2 x bfloat> %src32, <2 x bfloat> undef, <2 x i32> zeroinitializer
+; AVX1-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64 = shufflevector <4 x bfloat> %src64, <4 x bfloat> undef, <4 x i32> zeroinitializer
+; AVX1-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V128 = shufflevector <8 x bfloat> %src128, <8 x bfloat> undef, <8 x i32> zeroinitializer
+; AVX1-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256 = shufflevector <16 x bfloat> %src256, <16 x bfloat> undef, <16 x i32> zeroinitializer
+; AVX1-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512 = shufflevector <32 x bfloat> %src512, <32 x bfloat> undef, <32 x i32> zeroinitializer
+; AVX1-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
+;
+; AVX2-LABEL: 'test_vXbf16'
+; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V32 = shufflevector <2 x bfloat> %src32, <2 x bfloat> undef, <2 x i32> zeroinitializer
+; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64 = shufflevector <4 x bfloat> %src64, <4 x bfloat> undef, <4 x i32> zeroinitializer
+; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V128 = shufflevector <8 x bfloat> %src128, <8 x bfloat> undef, <8 x i32> zeroinitializer
+; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256 = shufflevector <16 x bfloat> %src256, <16 x bfloat> undef, <16 x i32> zeroinitializer
+; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512 = shufflevector <32 x bfloat> %src512, <32 x bfloat> undef, <32 x i32> zeroinitializer
+; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
+;
+; AVX512-LABEL: 'test_vXbf16'
+; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V32 = shufflevector <2 x bfloat> %src32, <2 x bfloat> undef, <2 x i32> zeroinitializer
+; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64 = shufflevector <4 x bfloat> %src64, <4 x bfloat> undef, <4 x i32> zeroinitializer
+; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V128 = shufflevector <8 x bfloat> %src128, <8 x bfloat> undef, <8 x i32> zeroinitializer
+; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256 = shufflevector <16 x bfloat> %src256, <16 x bfloat> undef, <16 x i32> zeroinitializer
+; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512 = shufflevector <32 x bfloat> %src512, <32 x bfloat> undef, <32 x i32> zeroinitializer
+; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
+;
+ %V32 = shufflevector <2 x bfloat> %src32, <2 x bfloat> undef, <2 x i32> zeroinitializer
+ %V64 = shufflevector <4 x bfloat> %src64, <4 x bfloat> undef, <4 x i32> zeroinitializer
+ %V128 = shufflevector <8 x bfloat> %src128, <8 x bfloat> undef, <8 x i32> zeroinitializer
+ %V256 = shufflevector <16 x bfloat> %src256, <16 x bfloat> undef, <16 x i32> zeroinitializer
+ %V512 = shufflevector <32 x bfloat> %src512, <32 x bfloat> undef, <32 x i32> zeroinitializer
+ ret void
+}
+
define void @test_vXi16(<2 x i16> %src32, <4 x i16> %src64, <8 x i16> %src128, <16 x i16> %src256, <32 x i16> %src512) {
; SSE2-LABEL: 'test_vXi16'
; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V32 = shufflevector <2 x i16> %src32, <2 x i16> undef, <2 x i32> zeroinitializer
diff --git a/llvm/test/Analysis/CostModel/X86/shuffle-broadcast.ll b/llvm/test/Analysis/CostModel/X86/shuffle-broadcast.ll
index 6de1c631ec643a4..7614313bdb37ea7 100644
--- a/llvm/test/Analysis/CostModel/X86/shuffle-broadcast.ll
+++ b/llvm/test/Analysis/CostModel/X86/shuffle-broadcast.ll
@@ -150,6 +150,96 @@ define void @test_vXi32(<2 x i32> %src64, <4 x i32> %src128, <8 x i32> %src256,
ret void
}
+define void @test_vXf16(<2 x half> %src32, <4 x half> %src64, <8 x half> %src128, <16 x half> %src256, <32 x half> %src512) {
+; SSE2-LABEL: 'test_vXf16'
+; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V32 = shufflevector <2 x half> %src32, <2 x half> undef, <2 x i32> zeroinitializer
+; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64 = shufflevector <4 x half> %src64, <4 x half> undef, <4 x i32> zeroinitializer
+; SSE2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V128 = shufflevector <8 x half> %src128, <8 x half> undef, <8 x i32> zeroinitializer
+; SSE2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V256 = shufflevector <16 x half> %src256, <16 x half> undef, <16 x i32> zeroinitializer
+; SSE2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V512 = shufflevector <32 x half> %src512, <32 x half> undef, <32 x i32> zeroinitializer
+; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
+;
+; SSSE3-LABEL: 'test_vXf16'
+; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V32 = shufflevector <2 x half> %src32, <2 x half> undef, <2 x i32> zeroinitializer
+; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64 = shufflevector <4 x half> %src64, <4 x half> undef, <4 x i32> zeroinitializer
+; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <8 x half> %src128, <8 x half> undef, <8 x i32> zeroinitializer
+; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256 = shufflevector <16 x half> %src256, <16 x half> undef, <16 x i32> zeroinitializer
+; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512 = shufflevector <32 x half> %src512, <32 x half> undef, <32 x i32> zeroinitializer
+; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
+;
+; SSE42-LABEL: 'test_vXf16'
+; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V32 = shufflevector <2 x half> %src32, <2 x half> undef, <2 x i32> zeroinitializer
+; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64 = shufflevector <4 x half> %src64, <4 x half> undef, <4 x i32> zeroinitializer
+; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <8 x half> %src128, <8 x half> undef, <8 x i32> zeroinitializer
+; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256 = shufflevector <16 x half> %src256, <16 x half> undef, <16 x i32> zeroinitializer
+; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512 = shufflevector <32 x half> %src512, <32 x half> undef, <32 x i32> zeroinitializer
+; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
+;
+; AVX2-LABEL: 'test_vXf16'
+; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V32 = shufflevector <2 x half> %src32, <2 x half> undef, <2 x i32> zeroinitializer
+; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64 = shufflevector <4 x half> %src64, <4 x half> undef, <4 x i32> zeroinitializer
+; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <8 x half> %src128, <8 x half> undef, <8 x i32> zeroinitializer
+; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256 = shufflevector <16 x half> %src256, <16 x half> undef, <16 x i32> zeroinitializer
+; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512 = shufflevector <32 x half> %src512, <32 x half> undef, <32 x i32> zeroinitializer
+; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
+;
+; AVX512-LABEL: 'test_vXf16'
+; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V32 = shufflevector <2 x half> %src32, <2 x half> undef, <2 x i32> zeroinitializer
+; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64 = shufflevector <4 x half> %src64, <4 x half> undef, <4 x i32> zeroinitializer
+; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <8 x half> %src128, <8 x half> undef, <8 x i32> zeroinitializer
+; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256 = shufflevector <16 x half> %src256, <16 x half> undef, <16 x i32> zeroinitializer
+; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512 = shufflevector <32 x half> %src512, <32 x half> undef, <32 x i32> zeroinitializer
+; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
+;
+ %V32 = shufflevector <2 x half> %src32, <2 x half> undef, <2 x i32> zeroinitializer
+ %V64 = shufflevector <4 x half> %src64, <4 x half> undef, <4 x i32> zeroinitializer
+ %V128 = shufflevector <8 x half> %src128, <8 x half> undef, <8 x i32> zeroinitializer
+ %V256 = shufflevector <16 x half> %src256, <16 x half> undef, <16 x i32> zeroinitializer
+ %V512 = shufflevector <32 x half> %src512, <32 x half> undef, <32 x i32> zeroinitializer
+ ret void
+}
+
+define void @test_vXbf16(<2 x bfloat> %src32, <4 x bfloat> %src64, <8 x bfloat> %src128, <16 x bfloat> %src256, <32 x bfloat> %src512) {
+; SSE-LABEL: 'test_vXbf16'
+; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V32 = shufflevector <2 x bfloat> %src32, <2 x bfloat> undef, <2 x i32> zeroinitializer
+; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64 = shufflevector <4 x bfloat> %src64, <4 x bfloat> undef, <4 x i32> zeroinitializer
+; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V128 = shufflevector <8 x bfloat> %src128, <8 x bfloat> undef, <8 x i32> zeroinitializer
+; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256 = shufflevector <16 x bfloat> %src256, <16 x bfloat> undef, <16 x i32> zeroinitializer
+; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512 = shufflevector <32 x bfloat> %src512, <32 x bfloat> undef, <32 x i32> zeroinitializer
+; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
+;
+; AVX1-LABEL: 'test_vXbf16'
+; AVX1-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V32 = shufflevector <2 x bfloat> %src32, <2 x bfloat> undef, <2 x i32> zeroinitializer
+; AVX1-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64 = shufflevector <4 x bfloat> %src64, <4 x bfloat> undef, <4 x i32> zeroinitializer
+; AVX1-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V128 = shufflevector <8 x bfloat> %src128, <8 x bfloat> undef, <8 x i32> zeroinitializer
+; AVX1-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256 = shufflevector <16 x bfloat> %src256, <16 x bfloat> undef, <16 x i32> zeroinitializer
+; AVX1-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512 = shufflevector <32 x bfloat> %src512, <32 x bfloat> undef, <32 x i32> zeroinitializer
+; AVX1-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
+;
+; AVX2-LABEL: 'test_vXbf16'
+; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V32 = shufflevector <2 x bfloat> %src32, <2 x bfloat> undef, <2 x i32> zeroinitializer
+; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64 = shufflevector <4 x bfloat> %src64, <4 x bfloat> undef, <4 x i32> zeroinitializer
+; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V128 = shufflevector <8 x bfloat> %src128, <8 x bfloat> undef, <8 x i32> zeroinitializer
+; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256 = shufflevector <16 x bfloat> %src256, <16 x bfloat> undef, <16 x i32> zeroinitializer
+; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512 = shufflevector <32 x bfloat> %src512, <32 x bfloat> undef, <32 x i32> zeroinitializer
+; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
+;
+; AVX512-LABEL: 'test_vXbf16'
+; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V32 = shufflevector <2 x bfloat> %src32, <2 x bfloat> undef, <2 x i32> zeroinitializer
+; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V64 = shufflevector <4 x bfloat> %src64, <4 x bfloat> undef, <4 x i32> zeroinitializer
+; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V128 = shufflevector <8 x bfloat> %src128, <8 x bfloat> undef, <8 x i32> zeroinitializer
+; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V256 = shufflevector <16 x bfloat> %src256, <16 x bfloat> undef, <16 x i32> zeroinitializer
+; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V512 = shufflevector <32 x bfloat> %src512, <32 x bfloat> undef, <32 x i32> zeroinitializer
+; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
+;
+ %V32 = shufflevector <2 x bfloat> %src32, <2 x bfloat> undef, <2 x i32> zeroinitializer
+ %V64 = shufflevector <4 x bfloat> %src64, <4 x bfloat> undef, <4 x i32> zeroinitializer
+ %V128 = shufflevector <8 x bfloat> %src128, <8 x bfloat> undef, <8 x i32> zeroinitializer
+ %V256 = shufflevector <16 x bfloat> %src256, <16 x bfloat> undef, <16 x i32> zeroinitializer
+ %V512 = shufflevector <32 x bfloat> %src512, <32 x bfloat> undef, <32 x i32> zeroinitializer
+ ret void
+}
+
define void @test_vXi16(<2 x i16> %src32, <4 x i16> %src64, <8 x i16> %src128, <16 x i16> %src256, <32 x i16> %src512) {
; SSE2-LABEL: 'test_vXi16'
; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V32 = shufflevector <2 x i16> %src32, <2 x i16> undef, <2 x i32> zeroinitializer