[llvm] [X86] SimplifyDemandedVectorEltsForTargetNode - reduce the size of VPERMV/VPERMV3 nodes if the upper elements are not demanded (PR #133923)

Simon Pilgrim via llvm-commits llvm-commits at lists.llvm.org
Wed Apr 2 06:28:27 PDT 2025


================
@@ -29,15 +29,21 @@ define <4 x double> @concat_vpermv3_ops_vpermv_v4f64(ptr %p0, <4 x i64> %m) {
 define <4 x double> @concat_vpermv3_ops_vpermv_swap_v4f64(ptr %p0, <4 x i64> %m) {
 ; X86-LABEL: concat_vpermv3_ops_vpermv_swap_v4f64:
 ; X86:       # %bb.0:
+; X86-NEXT:    # kill: def $ymm0 killed $ymm0 def $zmm0
 ; X86-NEXT:    movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT:    vmovapd 32(%eax), %ymm1
-; X86-NEXT:    vpermi2pd (%eax), %ymm1, %ymm0
+; X86-NEXT:    vmovupd (%eax), %zmm1
+; X86-NEXT:    vshuff64x2 {{.*#+}} zmm1 = zmm1[4,5,6,7,0,1,2,3]
+; X86-NEXT:    vpermpd %zmm1, %zmm0, %zmm0
----------------
RKSimon wrote:

This is the VPERMV(M,CONCAT(Y,X)) handling I mentioned in the summary: it allows the free concat, but the halves then need to be commuted. Would you prefer I remove it for now?
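
For readers following the diff above, here is a rough Python model of why the widened sequence selects the same lanes as the original vpermi2pd. It is only a sketch of the lane arithmetic, not LLVM code: the helper names (vpermv3, vpermv, swap_halves) and the sample values for X, Y and M are made up for illustration, and the upper four mask elements of the widened permute are filled with 0 as a placeholder since those results are not demanded.

```python
# Simplified model of lane selection only; helper names are hypothetical.

def vpermv3(src_a, mask, src_b):
    """Two-source permute: index i reads src_a[i] if i < n, else src_b[i - n]."""
    combined = src_a + src_b
    return [combined[i % len(combined)] for i in mask]

def vpermv(mask, src):
    """Single-source permute over one wide vector."""
    return [src[i % len(src)] for i in mask]

def swap_halves(vec):
    """Models vshuff64x2 with zmm[4,5,6,7,0,1,2,3]: exchange the 256-bit halves."""
    half = len(vec) // 2
    return vec[half:] + vec[:half]

# Memory at %p0: X occupies bytes 0..31, Y occupies bytes 32..63.
X = [10.0, 11.0, 12.0, 13.0]
Y = [20.0, 21.0, 22.0, 23.0]
M = [5, 0, 7, 2]  # sample index vector (assumption, not from the test)

# Before: load Y separately, then VPERMV3(Y, M, X) with X folded from memory.
narrow = vpermv3(Y, M, X)

# After: one contiguous 512-bit load gives CONCAT(X, Y); the halves are
# commuted to get CONCAT(Y, X); only the low four results are demanded.
wide_load = X + Y
wide_mask = M + [0, 0, 0, 0]  # upper indices are don't-care placeholders
wide = vpermv(wide_mask, swap_halves(wide_load))[:4]

assert narrow == wide
```

The single wide load is what makes the concat free; the extra vshuff64x2 is the cost of the operands sitting in memory in the opposite order to the one the permute indices expect.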

https://github.com/llvm/llvm-project/pull/133923

