[PATCH] D142602: [X86] Expand transform (icmp eq/ne (ABS A), C) -> (and/or (icmp eq/ne A, C), (icmp eq/ne A, -C))
Noah Goldstein via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sat Feb 11 22:37:58 PST 2023
goldstein.w.n added inline comments.
================
Comment at: llvm/test/CodeGen/X86/icmp-abs-C-vec.ll:104-107
+; AVX2-NEXT: vpbroadcastq {{.*#+}} ymm2 = [18446744073709551487,18446744073709551487,18446744073709551487,18446744073709551487]
+; AVX2-NEXT: vpcmpeqq %ymm2, %ymm0, %ymm2
; AVX2-NEXT: vpcmpeqq %ymm1, %ymm0, %ymm0
+; AVX2-NEXT: vpor %ymm2, %ymm0, %ymm0
----------------
pengfei wrote:
> I doubt if this is beneficial. The transform neither reduces instructions nor improves throughput, but it introduces extra memory load. WDYT?
It's not much more memory: only 8 more bytes for the broadcast. If the new constant were instead micro-fused into the vpcmp as a memory operand, it would be +32 bytes but save a true instruction.
Also note `vblendvpd` is 2 uops, not 1.
But I see the point. I think it would still generally make sense: in a loop the load can be hoisted, in which case vpcmpeq + vpor is better than vpsub + vblendvpd, though granted not by much.
We could make this transform happen only if -C already exists as a node in the DAG. Do you think that's preferable?
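For reference, a minimal scalar sketch of the equivalence the transform relies on, modelling one 64-bit lane in Python. The helper names (`abs_i64`, `eq_via_abs`, `eq_via_or`) are made up for illustration and are not from the patch, and the sketch assumes C is a positive constant such as 129; it says nothing about how the patch handles other values of C.

```python
MASK = (1 << 64) - 1  # model one 64-bit lane as an unsigned bit pattern


def abs_i64(a):
    """Wrapping two's-complement abs of a 64-bit lane."""
    a &= MASK
    return (-a) & MASK if a >> 63 else a


def eq_via_abs(a, c):
    """Original form: icmp eq (abs A), C."""
    return abs_i64(a) == (c & MASK)


def eq_via_or(a, c):
    """Transformed form: or (icmp eq A, C), (icmp eq A, -C)."""
    a &= MASK
    return a == (c & MASK) or a == ((-c) & MASK)


# Assumption: C = 129, so -C has the bit pattern broadcast in the AVX2 test above.
C = 129
NEG_C_BITS = (-C) & MASK  # 18446744073709551487

# Spot-check that both forms agree on a handful of lane values.
SAMPLES = [0, 1, 128, 129, 130, -128, -129, -130, (1 << 63) - 1, -(1 << 63)]
AGREE = all(eq_via_abs(a, C) == eq_via_or(a, C) for a in SAMPLES)
```

The ne case is the De Morgan dual: and (icmp ne A, C), (icmp ne A, -C).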
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D142602/new/
https://reviews.llvm.org/D142602