[PATCH] D69222: [X86] NFC: expand inline memcmp test coverage

Mon Oct 21 02:51:52 PDT 2019

davezarzycki added a comment.

And now that I've had a chance to rebase D69044 <https://reviews.llvm.org/D69044> ("up to four load pairs") on top of this updated test file, I can report that:

1. The 48- and 96-byte memcmps do not improve for AVX2 or AVX512.
2. The AVX1 code gen is relatively reasonable for 48 bytes: three XMM compares. It could have been one YMM compare and one zero-extended XMM compare.

I think I figured out why 48 and 96 bytes are awful. It seems that lowering a vector that is the result of a zero-extended scalar generates terrible code. Should `combineVectorSizedSetCCEquality` detect the zero extend and create an `ISD::INSERT_SUBVECTOR` node? Or should something more fundamental detect this scenario and create the `ISD::INSERT_SUBVECTOR`?
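To make the shape of the comparison concrete, here is a rough portable C++ model of the 48-byte case as "one YMM compare and one zero-extended XMM compare". It is only a sketch and not LLVM code: `Vec256`, `load32`, `load16ZeroExtended`, `xorLanes` and `equal48` are hypothetical names, and the 256-bit register is modeled as four 64-bit lanes rather than with real vector intrinsics.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a 256-bit (YMM-sized) value: four 64-bit lanes.
using Vec256 = std::array<std::uint64_t, 4>;

// Full-width load: 32 bytes fill all four lanes.
static Vec256 load32(const unsigned char *p) {
  Vec256 v;
  std::memcpy(v.data(), p, 32);
  return v;
}

// Half-width load: 16 bytes fill the low two lanes, and the upper two
// lanes stay zero. This models the zero-extended 128-bit value.
static Vec256 load16ZeroExtended(const unsigned char *p) {
  Vec256 v{}; // value-initialized, so every lane starts at zero
  std::memcpy(v.data(), p, 16);
  return v;
}

// Lane-wise XOR: a lane is nonzero exactly where the inputs differ.
static Vec256 xorLanes(const Vec256 &a, const Vec256 &b) {
  Vec256 r;
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = a[i] ^ b[i];
  return r;
}

// Equality-only 48-byte compare built from two "load pairs":
// bytes [0, 32) at full width and bytes [32, 48) zero-extended.
bool equal48(const void *lhs, const void *rhs) {
  const auto *a = static_cast<const unsigned char *>(lhs);
  const auto *b = static_cast<const unsigned char *>(rhs);

  Vec256 d0 = xorLanes(load32(a), load32(b));
  Vec256 d1 = xorLanes(load16ZeroExtended(a + 32),
                       load16ZeroExtended(b + 32));

  // OR all difference lanes together and test once against zero.
  std::uint64_t acc = 0;
  for (std::size_t i = 0; i < d0.size(); ++i)
    acc |= d0[i] | d1[i];
  return acc == 0;
}
```

The upper two lanes of the zero-extended operands are zero on both sides, so they can never contribute a difference. That is why, in this model at least, the tail only needs the low 128 bits placed into an otherwise zero vector.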

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69222/new/

https://reviews.llvm.org/D69222