[PATCH] D41714: [x86, MemCmpExpansion] allow 2 pairs of loads per block (PR33325)

Sanjay Patel via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Jan 3 13:31:15 PST 2018


spatel created this revision.
spatel added reviewers: courbet, RKSimon, zvi.
Herald added a subscriber: mcrosier.

This is the last step needed to fix PR33325:
https://bugs.llvm.org/show_bug.cgi?id=33325

We're trading branches and compares for loads and logic ops. This makes the code smaller and hopefully faster in most cases.
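As a rough illustration of that trade, here is a C++ sketch for a 16-byte equality-only compare. It is not code from the patch: the names load64, equal16_branchy and equal16_merged are hypothetical, and it assumes the merged form combines each pair of loads with XOR and then ORs the results before a single compare.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: unaligned 8-byte load. memcpy avoids aliasing
// problems and typically compiles to a single mov on x86.
static std::uint64_t load64(const unsigned char *p) {
  std::uint64_t v;
  std::memcpy(&v, p, sizeof v);
  return v;
}

// One pair of loads per block: compare and branch after each 8-byte chunk.
bool equal16_branchy(const unsigned char *a, const unsigned char *b) {
  if (load64(a) != load64(b))
    return false;
  return load64(a + 8) == load64(b + 8);
}

// Two pairs of loads per block: XOR each pair, OR the differences,
// and compare against zero once.
bool equal16_merged(const unsigned char *a, const unsigned char *b) {
  std::uint64_t diff0 = load64(a) ^ load64(b);
  std::uint64_t diff1 = load64(a + 8) ^ load64(b + 8);
  return (diff0 | diff1) == 0;
}
```

The merged form always performs all four loads, but it has only one compare and no early-exit branch.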

The 24-byte case shows an interesting construct: we load the trailing scalar elements into vector registers and generate the same pcmpeq+movmsk code that we expected for a pair of full vector elements (see the 32- and 64-byte tests).
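The 24-byte pattern can be approximated with SSE2 intrinsics. This is a hand-written sketch, not the compiler's output: equal24_sse2 is a hypothetical name, and the exact instruction selection in the tests may differ. It assumes an x86 target with SSE2.

```cpp
#include <emmintrin.h> // SSE2 intrinsics (x86 only)

// Equality-only compare of 24 bytes using two pairs of loads.
bool equal24_sse2(const unsigned char *a, const unsigned char *b) {
  // Bytes 0..15: full-width unaligned vector loads.
  __m128i a0 = _mm_loadu_si128(reinterpret_cast<const __m128i *>(a));
  __m128i b0 = _mm_loadu_si128(reinterpret_cast<const __m128i *>(b));
  // Bytes 16..23: 8-byte loads into the low half of a vector register;
  // the upper half is zeroed, so those lanes always compare equal.
  __m128i a1 = _mm_loadl_epi64(reinterpret_cast<const __m128i *>(a + 16));
  __m128i b1 = _mm_loadl_epi64(reinterpret_cast<const __m128i *>(b + 16));
  // Byte-wise equality masks (pcmpeqb), combined with AND.
  __m128i eq = _mm_and_si128(_mm_cmpeq_epi8(a0, b0), _mm_cmpeq_epi8(a1, b1));
  // pmovmskb gathers one bit per byte; all 16 bits set means all bytes match.
  return _mm_movemask_epi8(eq) == 0xFFFF;
}
```

Because the trailing 8 bytes are widened to a vector, the tail goes through the same compare-and-mask sequence as a full 16-byte element.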


https://reviews.llvm.org/D41714

Files:
  lib/CodeGen/ExpandMemCmp.cpp
  lib/Target/X86/X86ISelLowering.h
  test/CodeGen/X86/memcmp-optsize.ll
  test/CodeGen/X86/memcmp.ll
  test/Transforms/ExpandMemCmp/X86/memcmp.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D41714.128553.patch
Type: text/x-patch
Size: 47637 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20180103/28cbcf34/attachment.bin>


More information about the llvm-commits mailing list