[PATCH] D68457: [X86] Enable AVX512BW and AVX512VL for memcmp

Fri Oct 4 11:51:33 PDT 2019

davezarzycki added a comment.

That's a reasonable point. From my perspective:

1. Staying in the opmask registers as long as possible relieves general-purpose register pressure.
2. If and when `MaxLoadsPerMemcmp` becomes greater than 2, keeping the intermediate results in the opmask registers is the right thing to do.
3. I have a pending patch to increase `MaxLoadsPerMemcmp` to 4 if and only if a compare against zero is happening. Changing the value to 8 (again, only for a compare against zero) will, I think, require more work.
4. In the case of 512-bit ops, using AVX512BW instead of just AVX512F better matches people's mental model (that byte-wise memcmp() should generate byte-wise vector instructions).
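
To make points 2 and 3 concrete, here is a rough scalar model of an equality-only expansion. It is only a sketch, not the actual codegen or any LLVM code: `neq_mask64`, `equal_bytes`, and the `NumLoads` template parameter are hypothetical names, with `NumLoads` standing in for the load budget. Each 64-byte chunk yields one bit per differing byte (roughly what a byte-wise compare into an opmask register would give), the masks are merged, and the compare against zero happens once at the end.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model (not LLVM code): one bit per differing byte in a
// 64-byte chunk, analogous to a byte-wise compare writing a 64-bit opmask.
inline std::uint64_t neq_mask64(const unsigned char *a, const unsigned char *b) {
  std::uint64_t mask = 0;
  for (std::size_t i = 0; i < 64; ++i)
    mask |= static_cast<std::uint64_t>(a[i] != b[i]) << i;
  return mask;
}

// Equality-only compare of NumLoads * 64 bytes. NumLoads is an assumed
// stand-in for the load budget; the per-chunk masks are merged and tested
// against zero exactly once.
template <std::size_t NumLoads>
bool equal_bytes(const unsigned char *a, const unsigned char *b) {
  std::uint64_t any_diff = 0;
  for (std::size_t i = 0; i < NumLoads; ++i)
    any_diff |= neq_mask64(a + 64 * i, b + 64 * i);
  return any_diff == 0;
}

// Small self-check: a difference in the last byte is seen with four loads
// but lies outside the range covered by two.
inline void check_equal_bytes() {
  unsigned char a[256] = {0};
  unsigned char b[256] = {0};
  assert(equal_bytes<4>(a, b));
  b[255] = 1;
  assert(!equal_bytes<4>(a, b));
  assert(equal_bytes<2>(a, b));
}
```

In this model the only value that has to leave the "mask" domain is the final `any_diff == 0` result, which is the property I am arguing for with a larger load count.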

What do you think?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D68457/new/

https://reviews.llvm.org/D68457