[PATCH] D142602: [X86] Expand transform (icmp eq/ne (ABS A), C) -> (and/or (icmp eq/ne A, C), (icmp eq/ne A, -C))

Noah Goldstein via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sat Feb 11 22:37:58 PST 2023


goldstein.w.n added inline comments.


================
Comment at: llvm/test/CodeGen/X86/icmp-abs-C-vec.ll:104-107
+; AVX2-NEXT:    vpbroadcastq {{.*#+}} ymm2 = [18446744073709551487,18446744073709551487,18446744073709551487,18446744073709551487]
+; AVX2-NEXT:    vpcmpeqq %ymm2, %ymm0, %ymm2
 ; AVX2-NEXT:    vpcmpeqq %ymm1, %ymm0, %ymm0
+; AVX2-NEXT:    vpor %ymm2, %ymm0, %ymm0
----------------
pengfei wrote:
> I doubt if this is beneficial. The transform neither reduces instructions nor improves throughput, but it introduces extra memory load. WDYT?
It's not much more memory: only 8 more bytes for the broadcast constant. If the new constant were instead micro-fused with the vpcmp as a memory operand, it would be +32 bytes but would save a true instruction.

Also note `vblendvpd` is 2 uops, not 1.

But I see the point. I think it would still generally make sense: in a loop the load can be hoisted, in which case vpcmpeq + vpor is better than vpsub + vblendvpd, though granted not by much.

We could make this transform happen only if -C already exists as a node in the DAG. Do you think that's preferable?
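
For reference, here is a minimal scalar sketch of the equivalence the fold relies on, checked exhaustively on 8-bit values. It is not code from the patch: the helper names (`wrapping_abs`, `original`, `expanded`, `count_mismatches`) are hypothetical, and it assumes C is non-negative and that abs wraps on the minimum value (abs(-128) == -128) rather than being poison. For a negative C other than the minimum value the two forms differ, since `A == C` can hold while `abs(A) == C` cannot.

```cpp
#include <cassert>
#include <cstdint>

// Wrapping 8-bit abs: negation is done in unsigned arithmetic so that
// abs(-128) wraps back to -128 instead of invoking undefined behavior.
static int8_t wrapping_abs(int8_t a) {
    uint8_t u = static_cast<uint8_t>(a);
    return static_cast<int8_t>(a < 0 ? static_cast<uint8_t>(0u - u) : u);
}

// Original form: icmp eq (abs A), C
static bool original(int8_t a, int8_t c) {
    return wrapping_abs(a) == c;
}

// Expanded form: or (icmp eq A, C), (icmp eq A, -C)
// Assumption of this sketch: C is non-negative.
static bool expanded(int8_t a, int8_t c) {
    assert(c >= 0 && "sketch assumes non-negative C");
    int8_t neg_c = static_cast<int8_t>(
        static_cast<uint8_t>(0u - static_cast<uint8_t>(c)));
    return a == c || a == neg_c;
}

// Count (A, C) pairs where the two forms disagree, over every 8-bit A
// and every non-negative 8-bit C.
static int count_mismatches() {
    int mismatches = 0;
    for (int c = 0; c <= 127; ++c) {
        for (int a = -128; a <= 127; ++a) {
            if (original(static_cast<int8_t>(a), static_cast<int8_t>(c)) !=
                expanded(static_cast<int8_t>(a), static_cast<int8_t>(c))) {
                ++mismatches;
            }
        }
    }
    return mismatches;
}
```

The `ne` variant follows by negating both sides, which turns the `or` of two `eq` compares into an `and` of two `ne` compares.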


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D142602/new/

https://reviews.llvm.org/D142602


