[PATCH] D52318: [x86] avoid 256-bit andnp that requires insert/extract with AVX1 (PR37449)
Andrea Di Biagio via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Sep 21 07:37:12 PDT 2018
andreadb added a comment.
Hi Sanjay,
You should add a test where the mask vector is not a constant.
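For example (just a sketch, with a made-up function name), something along these lines would exercise the fold when the mask is only known at runtime:

define <8 x i32> @andn_variable_mask(<8 x i32> %A, <8 x i32> %Mask) {
  ; ~A & Mask, with nothing known about Mask at compile time
  %not = xor <8 x i32> %A, <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>
  %and = and <8 x i32> %not, %Mask
  ret <8 x i32> %and
}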
I verified that on Jaguar, this change improves cases where:
- the mask is a constant
- users access the lo/hi part of the defined YMM.
In one case in particular, I saw quite a nice improvement in IPC.
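For reference, the improved cases look roughly like this (a sketch only; the name and the particular constants are made up): a constant mask feeding the andnot, with the two 128-bit halves of the result consumed separately:

define <4 x i32> @andn_const_mask_halves(<8 x i32> %A) {
  ; ~A & <constant mask>, then both halves of the YMM are used
  %not = xor <8 x i32> %A, <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>
  %and = and <8 x i32> %not, <i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255>
  %lo = shufflevector <8 x i32> %and, <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
  %hi = shufflevector <8 x i32> %and, <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
  %sum = add <4 x i32> %lo, %hi
  ret <4 x i32> %sum
}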
Unfortunately, I also found this regression:
define <8 x i32> @bar(<8 x i32> %A, <8 x i32> %B, <8 x i32> %Mask) {
  %1 = and <8 x i32> %A, %Mask
  ; (A & Mask) ^ Mask == ~A & Mask, so %1/%2 together form an andnot
  %2 = xor <8 x i32> %1, %Mask
  %3 = add <8 x i32> %2, %B
  ret <8 x i32> %3
}
Before this patch (-mcpu=btver2):
vandnps %ymm2, %ymm0, %ymm0             # ymm0 = ~A & Mask in a single 256-bit op
vextractf128 $1, %ymm1, %xmm3
vextractf128 $1, %ymm0, %xmm2
vpaddd %xmm1, %xmm0, %xmm0
vpaddd %xmm3, %xmm2, %xmm2
vinsertf128 $1, %xmm2, %ymm0, %ymm0
retq
After your patch:
vxorps %xmm3, %xmm3, %xmm3              # zero, only to build all-ones below
vextractf128 $1, %ymm1, %xmm4
vcmptrueps %ymm3, %ymm3, %ymm3          # materialize an all-ones YMM for the NOT
vxorps %ymm3, %ymm0, %ymm0              # ymm0 = ~A
vandps %xmm2, %xmm0, %xmm3              # low half of ~A & Mask
vextractf128 $1, %ymm0, %xmm0
vextractf128 $1, %ymm2, %xmm2
vpand %xmm2, %xmm0, %xmm0               # high half of ~A & Mask
vpaddd %xmm1, %xmm3, %xmm1
vpaddd %xmm4, %xmm0, %xmm0
vinsertf128 $1, %xmm0, %ymm1, %ymm0
retq
In other words, we now materialize an all-ones vector with vcmptrueps just to feed a 256-bit NOT, and the andnot is split into vandps/vpand halves: eleven instructions where we previously had six. Could you please have a look at it?
Thanks,
Andrea
https://reviews.llvm.org/D52318