[PATCH] D22225: [x86, SSE] optimize pcmp results better (PR28484)

Sanjay Patel via llvm-commits llvm-commits at lists.llvm.org
Mon Jul 11 10:21:26 PDT 2016


spatel created this revision.
spatel added reviewers: mkuper, DavidKreitzer, ab, RKSimon.
spatel added a subscriber: llvm-commits.
Herald added a subscriber: mcrosier.

We know that pcmp (SSE/AVX at least; I'm intentionally leaving 512-bit out of this patch because I don't know what happens there) produces all-ones/all-zeros bitmasks, so we can use that behavior to avoid unnecessary constant loading.

FWIW, I see no perf differences in test-suite with this change. I don't expect that a zext of a bitmask is a common pattern. This is a first step towards the better motivating example in PR28486:
https://llvm.org/bugs/show_bug.cgi?id=28486
...which is itself just an extract from a case where we seemingly get everything wrong:
https://godbolt.org/g/Ez2bDW

One could argue that load+and is actually a better solution for some CPUs (Intel big cores) because shifts don't have the same throughput potential as load+and on those cores. But I think that should be handled as a later, CPU-specific transformation if it ever comes up; removing the load is the more general x86 optimization. Note that the uneven usage of vpbroadcast in the test cases is filed as PR28505:
https://llvm.org/bugs/show_bug.cgi?id=28505



http://reviews.llvm.org/D22225

Files:
  lib/Target/X86/X86ISelLowering.cpp
  test/CodeGen/X86/avx512-ext.ll
  test/CodeGen/X86/avx512-vec-cmp.ll
  test/CodeGen/X86/shift-pcmp.ll
  test/CodeGen/X86/vector-pcmp.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D22225.63519.patch
Type: text/x-patch
Size: 9017 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160711/db9fe8e8/attachment.bin>

