[PATCH] D41714: [x86, MemCmpExpansion] allow 2 pairs of loads per block (PR33325)
Sanjay Patel via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Jan 3 13:31:15 PST 2018
spatel created this revision.
spatel added reviewers: courbet, RKSimon, zvi.
Herald added a subscriber: mcrosier.
This is the last step needed to fix PR33325:
https://bugs.llvm.org/show_bug.cgi?id=33325
We're trading branches and compares for loads and logic ops. This makes the code smaller and hopefully faster in most cases.
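As a rough illustration of that trade, here is a hand-written C sketch of a 16-byte equality comparison. It is not the IR that ExpandMemCmp emits, and the helper names `load64`, `eq16_branchy`, and `eq16_merged` are made up for this example. With one pair of loads per block, each 8-byte pair gets its own compare and branch; with two pairs per block, the pairs can be merged with xor/or and tested once:

```c
#include <stdint.h>
#include <string.h>

/* Unaligned 8-byte load; memcpy keeps this well-defined C. */
static uint64_t load64(const void *p) {
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* One pair of loads per block: compare and branch after each pair. */
static int eq16_branchy(const void *a, const void *b) {
    const unsigned char *x = a, *y = b;
    if (load64(x) != load64(y))
        return 0;
    if (load64(x + 8) != load64(y + 8))
        return 0;
    return 1;
}

/* Two pairs of loads per block: xor each pair, or the results, test once. */
static int eq16_merged(const void *a, const void *b) {
    const unsigned char *x = a, *y = b;
    uint64_t d0 = load64(x) ^ load64(y);
    uint64_t d1 = load64(x + 8) ^ load64(y + 8);
    return (d0 | d1) == 0;
}
```

Both helpers return the same answer for any input; the merged form simply replaces one conditional branch with two logic ops, which is the shape of the trade described above.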
The 24-byte case shows an interesting construct: we load the trailing scalar elements into vector registers and generate the same pcmpeq+movmsk code that we expected for a pair of full vector elements (see the 32- and 64-byte tests).
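The 24-byte construct can be approximated with SSE2 intrinsics as below. This is only a sketch: it assumes an x86 target with SSE2, the function name `eq24_sse2` is hypothetical, and the exact instruction sequence the patch produces is in the test files rather than reproduced here.

```c
#include <emmintrin.h> /* SSE2 intrinsics; x86 only */
#include <stdint.h>

/* Equality-only compare of 24 bytes: one full 16-byte pair plus one
   trailing 8-byte pair loaded into the low half of a vector register. */
static int eq24_sse2(const void *a, const void *b) {
    const unsigned char *x = a, *y = b;
    __m128i a0 = _mm_loadu_si128((const __m128i *)x);
    __m128i b0 = _mm_loadu_si128((const __m128i *)y);
    /* movq-style loads: low 8 bytes from memory, high 8 bytes zeroed. */
    __m128i a1 = _mm_loadl_epi64((const __m128i *)(x + 16));
    __m128i b1 = _mm_loadl_epi64((const __m128i *)(y + 16));
    /* pcmpeqb yields 0xFF per equal byte; the zeroed high halves of
       a1 and b1 always compare equal, so they never cause a mismatch. */
    __m128i eq = _mm_and_si128(_mm_cmpeq_epi8(a0, b0),
                               _mm_cmpeq_epi8(a1, b1));
    /* pmovmskb gathers the 16 byte sign bits; all set means all equal. */
    return _mm_movemask_epi8(eq) == 0xFFFF;
}
```

Because the trailing 8-byte loads zero the upper half of the register on both sides, the final mask test is identical to the one used for two full 16-byte pairs.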
https://reviews.llvm.org/D41714
Files:
lib/CodeGen/ExpandMemCmp.cpp
lib/Target/X86/X86ISelLowering.h
test/CodeGen/X86/memcmp-optsize.ll
test/CodeGen/X86/memcmp.ll
test/Transforms/ExpandMemCmp/X86/memcmp.ll
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D41714.128553.patch
Type: text/x-patch
Size: 47637 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20180103/28cbcf34/attachment.bin>