[llvm] [X86] SimplifyDemandedVectorEltsForTargetNode - reduce the size of VPERMV/VPERMV3 nodes if the upper elements are not demanded (PR #133923)
Simon Pilgrim via llvm-commits
llvm-commits at lists.llvm.org
Wed Apr 2 06:28:27 PDT 2025
================
@@ -29,15 +29,21 @@ define <4 x double> @concat_vpermv3_ops_vpermv_v4f64(ptr %p0, <4 x i64> %m) {
define <4 x double> @concat_vpermv3_ops_vpermv_swap_v4f64(ptr %p0, <4 x i64> %m) {
; X86-LABEL: concat_vpermv3_ops_vpermv_swap_v4f64:
; X86: # %bb.0:
+; X86-NEXT: # kill: def $ymm0 killed $ymm0 def $zmm0
; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT: vmovapd 32(%eax), %ymm1
-; X86-NEXT: vpermi2pd (%eax), %ymm1, %ymm0
+; X86-NEXT: vmovupd (%eax), %zmm1
+; X86-NEXT: vshuff64x2 {{.*#+}} zmm1 = zmm1[4,5,6,7,0,1,2,3]
+; X86-NEXT: vpermpd %zmm1, %zmm0, %zmm0
----------------
RKSimon wrote:
This is the VPERMV(M,CONCAT(Y,X)) handling I mentioned in the summary. It allows the free concat but then needs to commute the halves. Would you prefer I remove it for now?
https://github.com/llvm/llvm-project/pull/133923
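
For readers following the hunk above, here is a rough Python model of why the two instruction sequences should agree on the demanded low four elements. It is only a sketch of the instruction semantics as read from the CHECK lines, not code from the patch: the helper names (`vpermi2pd_256`, `vpermpd_512`, `swap_halves`, `old_sequence`, `new_sequence`) are made up for illustration, and indices are assumed to be masked to their low three bits.

```python
from itertools import product

def vpermi2pd_256(idx, first, second):
    # Simplified 256-bit VPERMI2PD: bit 2 of each index picks the table
    # (0 = first source, 1 = second source), bits 1:0 pick the element.
    return [(second if i & 4 else first)[i & 3] for i in idx]

def vpermpd_512(idx, data):
    # Simplified 512-bit variable VPERMPD: bits 2:0 index one 8-element source.
    return [data[i & 7] for i in idx]

def swap_halves(v):
    # vshuff64x2 with the pattern [4,5,6,7,0,1,2,3]: commute the 256-bit halves.
    return v[4:] + v[:4]

def old_sequence(mem, idx4):
    # vmovapd 32(%eax), %ymm1 ; vpermi2pd (%eax), %ymm1, %ymm0
    return vpermi2pd_256(idx4, first=mem[4:], second=mem[:4])

def new_sequence(mem, idx4, upper_idx=(0, 0, 0, 0)):
    # vmovupd (%eax), %zmm1 ; vshuff64x2 (swap halves) ; vpermpd %zmm1, %zmm0, %zmm0
    # upper_idx stands in for the undefined upper index lanes, which are not demanded.
    full = vpermpd_512(list(idx4) + list(upper_idx), swap_halves(mem))
    return full[:4]

mem = [float(x) for x in range(8)]  # eight doubles at (%eax)
all_match = all(
    old_sequence(mem, idx) == new_sequence(mem, idx)
    for idx in product(range(8), repeat=4)
)
```

In this model the single 512-bit load is the "free concat", and the half swap is needed because the original VPERMI2PD used the upper 256 bits of memory as its first table.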