[llvm] 080cb83 - [X86][AVX] Narrow VPBROADCASTQ->VPBROADCASTD if we don't need the upper bits.
Simon Pilgrim via llvm-commits
llvm-commits at lists.llvm.org
Tue Mar 23 02:41:21 PDT 2021
Author: Simon Pilgrim
Date: 2021-03-23T09:41:02Z
New Revision: 080cb83e52c3059a62bbb87142cbcf3f68c14ba2
URL: https://github.com/llvm/llvm-project/commit/080cb83e52c3059a62bbb87142cbcf3f68c14ba2
DIFF: https://github.com/llvm/llvm-project/commit/080cb83e52c3059a62bbb87142cbcf3f68c14ba2.diff
LOG: [X86][AVX] Narrow VPBROADCASTQ->VPBROADCASTD if we don't need the upper bits.
Helps fix cases where we've splatted a smaller type to a wider vector element type but never use the upper bits.
Skip this on AVX512 targets, as the narrowing can affect broadcast folding.
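As a rough illustration of why the narrowing is sound, the sketch below models one 64-bit lane of the broadcast in plain C++. It is not code from the patch. The helper names (upperHalfUnused, laneOfQwordSplat, laneOfDwordSplat, narrowingIsSafe) are hypothetical, and the model assumes the little-endian lane layout of x86, where a dword splat bitcast back to qword lanes puts the same 32-bit value in both halves of each lane.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the patch's guard
// OriginalDemandedBits.countLeadingZeros() >= BitWidth / 2 with BitWidth == 64:
// true when none of the upper 32 bits of a lane are demanded.
constexpr bool upperHalfUnused(uint64_t demanded) {
  return (demanded >> 32) == 0;
}

// One 64-bit lane of a qword broadcast: the full source value.
constexpr uint64_t laneOfQwordSplat(uint64_t src) { return src; }

// One 64-bit lane after truncating the source to 32 bits, broadcasting it
// into twice as many dword elements, and bitcasting back to qword lanes.
constexpr uint64_t laneOfDwordSplat(uint64_t src) {
  const uint64_t lo = static_cast<uint32_t>(src);
  return (lo << 32) | lo;
}

// The two forms are interchangeable when they agree on every demanded bit.
constexpr bool narrowingIsSafe(uint64_t src, uint64_t demanded) {
  return (laneOfQwordSplat(src) & demanded) ==
         (laneOfDwordSplat(src) & demanded);
}
```

In this model, whenever upperHalfUnused(demanded) holds, narrowingIsSafe(src, demanded) holds for any src, because the two lanes differ only in their upper 32 bits.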
Added:
Modified:
llvm/lib/Target/X86/X86ISelLowering.cpp
llvm/test/CodeGen/X86/bitcast-int-to-vector-bool-sext.ll
llvm/test/CodeGen/X86/bitcast-int-to-vector-bool-zext.ll
Removed:
################################################################################
diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index 76b4aaa111902..0e22301f4ec6b 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -38959,6 +38959,19 @@ bool X86TargetLowering::SimplifyDemandedBitsForTargetNode(
     if (SimplifyDemandedBits(Src, OriginalDemandedBits, DemandedElts, Known,
                              TLO, Depth + 1))
       return true;
+    // If we don't need the upper bits, attempt to narrow the broadcast source.
+    // Don't attempt this on AVX512 as it might affect broadcast folding.
+    // TODO: Should we attempt this for i32/i16 splats? They tend to be slower.
+    if ((BitWidth == 64) && SrcVT.isScalarInteger() && !Subtarget.hasAVX512() &&
+        OriginalDemandedBits.countLeadingZeros() >= (BitWidth / 2)) {
+      MVT NewSrcVT = MVT::getIntegerVT(BitWidth / 2);
+      SDValue NewSrc =
+          TLO.DAG.getNode(ISD::TRUNCATE, SDLoc(Src), NewSrcVT, Src);
+      MVT NewVT = MVT::getVectorVT(NewSrcVT, VT.getVectorNumElements() * 2);
+      SDValue NewBcst =
+          TLO.DAG.getNode(X86ISD::VBROADCAST, SDLoc(Op), NewVT, NewSrc);
+      return TLO.CombineTo(Op, TLO.DAG.getBitcast(VT, NewBcst));
+    }
     break;
   }
   case X86ISD::PCMPGT:
diff --git a/llvm/test/CodeGen/X86/bitcast-int-to-vector-bool-sext.ll b/llvm/test/CodeGen/X86/bitcast-int-to-vector-bool-sext.ll
index 1ce9f0113ca17..471298492735e 100644
--- a/llvm/test/CodeGen/X86/bitcast-int-to-vector-bool-sext.ll
+++ b/llvm/test/CodeGen/X86/bitcast-int-to-vector-bool-sext.ll
@@ -220,9 +220,8 @@ define <4 x i64> @ext_i4_4i64(i4 %a0) {
;
; AVX2-LABEL: ext_i4_4i64:
; AVX2: # %bb.0:
-; AVX2-NEXT: # kill: def $edi killed $edi def $rdi
-; AVX2-NEXT: vmovq %rdi, %xmm0
-; AVX2-NEXT: vpbroadcastq %xmm0, %ymm0
+; AVX2-NEXT: vmovd %edi, %xmm0
+; AVX2-NEXT: vpbroadcastd %xmm0, %ymm0
; AVX2-NEXT: vmovdqa {{.*#+}} ymm1 = [1,2,4,8]
; AVX2-NEXT: vpand %ymm1, %ymm0, %ymm0
; AVX2-NEXT: vpcmpeqq %ymm1, %ymm0, %ymm0
diff --git a/llvm/test/CodeGen/X86/bitcast-int-to-vector-bool-zext.ll b/llvm/test/CodeGen/X86/bitcast-int-to-vector-bool-zext.ll
index bbc1c148a6ab7..d014798c78c4f 100644
--- a/llvm/test/CodeGen/X86/bitcast-int-to-vector-bool-zext.ll
+++ b/llvm/test/CodeGen/X86/bitcast-int-to-vector-bool-zext.ll
@@ -278,9 +278,8 @@ define <4 x i64> @ext_i4_4i64(i4 %a0) {
;
; AVX2-LABEL: ext_i4_4i64:
; AVX2: # %bb.0:
-; AVX2-NEXT: # kill: def $edi killed $edi def $rdi
-; AVX2-NEXT: vmovq %rdi, %xmm0
-; AVX2-NEXT: vpbroadcastq %xmm0, %ymm0
+; AVX2-NEXT: vmovd %edi, %xmm0
+; AVX2-NEXT: vpbroadcastd %xmm0, %ymm0
; AVX2-NEXT: vmovdqa {{.*#+}} ymm1 = [1,2,4,8]
; AVX2-NEXT: vpand %ymm1, %ymm0, %ymm0
; AVX2-NEXT: vpcmpeqq %ymm1, %ymm0, %ymm0
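The test changes above can be sanity-checked with a small scalar model. The sketch below is a hedged illustration, not part of the commit. It covers only the instructions visible in the AVX2 checks (the broadcast, vpand with [1,2,4,8], and vpcmpeqq against the same constant). The names Lanes, maskAndCompare, viaQwordBroadcast, viaDwordBroadcast and agreesForAllI4 are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Lanes = std::array<uint64_t, 4>;

// Models vpand with [1,2,4,8] followed by vpcmpeqq against the same constant:
// each 64-bit lane becomes all-ones if its mask bit is set, otherwise zero.
Lanes maskAndCompare(const Lanes &bcast) {
  const Lanes mask = {1, 2, 4, 8};
  Lanes out = {0, 0, 0, 0};
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = ((bcast[i] & mask[i]) == mask[i]) ? ~0ULL : 0ULL;
  return out;
}

// Old sequence: vmovq + vpbroadcastq, every lane holds the full 64-bit value.
Lanes viaQwordBroadcast(uint64_t rdi) {
  const Lanes bcast = {rdi, rdi, rdi, rdi};
  return maskAndCompare(bcast);
}

// New sequence: vmovd + vpbroadcastd, every qword lane holds edi twice.
Lanes viaDwordBroadcast(uint32_t edi) {
  const uint64_t lane = (static_cast<uint64_t>(edi) << 32) | edi;
  const Lanes bcast = {lane, lane, lane, lane};
  return maskAndCompare(bcast);
}

// Checks that both sequences agree for every 4-bit input.
bool agreesForAllI4() {
  for (uint32_t a = 0; a < 16; ++a)
    if (viaQwordBroadcast(a) != viaDwordBroadcast(a))
      return false;
  return true;
}
```

Because the mask only reads bits 0 to 3 of each lane, the contents of the upper 32 bits never reach the comparison, which is the property the new combine relies on.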