[PATCH] D68457: [X86] Enable AVX512BW and AVX512VL for memcmp
David Zarzycki via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Oct 4 11:51:33 PDT 2019
davezarzycki added a comment.
That's a reasonable point. From my perspective:
1. Staying in the opmask registers as long as possible relieves general-purpose register pressure.
2. If and when `MaxLoadsPerMemcmp` becomes greater than 2, keeping the intermediate results in the opmask registers is the right thing to do.
3. I have a pending patch to increase `MaxLoadsPerMemcmp` to 4 if and only if the memcmp() result is only compared against zero. Raising the value to 8 (under the same condition) will require more work, I think.
4. In the case of 512-bit ops, using AVX512BW instead of just AVX512F better matches people's mental model (that byte-wise memcmp() should generate byte-wise vector instructions).
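To make points 2 and 4 concrete, here is a rough scalar model in portable C++, not the actual lowering. It assumes each 64-byte block compare yields a 64-bit "bytes differ" mask (the shape a byte-wise AVX512BW compare would leave in an opmask register), that the per-block masks are OR-ed together, and that the combined mask is tested against zero once at the end. The names `neq_mask64` and `blocks_equal` are hypothetical and exist only for this illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical model of one 64-byte byte-wise compare: bit i of the result
// is set when byte i differs. This stands in for the mask a byte-granular
// vector compare would write to an opmask register.
std::uint64_t neq_mask64(const unsigned char* a, const unsigned char* b) {
    std::uint64_t mask = 0;
    for (std::size_t i = 0; i < 64; ++i) {
        if (a[i] != b[i]) {
            mask |= (std::uint64_t{1} << i);
        }
    }
    return mask;
}

// Equality-only comparison of `blocks` consecutive 64-byte blocks.
// Each block contributes one mask; the masks are accumulated with OR and
// tested against zero once, so no per-block result has to be moved into a
// general-purpose register and branched on individually.
bool blocks_equal(const unsigned char* a, const unsigned char* b,
                  std::size_t blocks) {
    std::uint64_t combined = 0;
    for (std::size_t k = 0; k < blocks; ++k) {
        combined |= neq_mask64(a + 64 * k, b + 64 * k);
    }
    return combined == 0;
}
```

With two blocks there is only one OR to perform, so where the intermediate masks live matters little; with four or eight blocks the accumulation chain grows, which is where keeping everything in mask registers should pay off.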
What do you think?
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D68457/new/
https://reviews.llvm.org/D68457