[PATCH] D41714: [x86, MemCmpExpansion] allow 2 pairs of loads per block (PR33325)
Sanjay Patel via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Jan 3 13:31:15 PST 2018
spatel created this revision.
spatel added reviewers: courbet, RKSimon, zvi.
Herald added a subscriber: mcrosier.
This is the last step needed to fix PR33325:
We're trading branches and compares for loads and logic ops. This makes the code smaller and hopefully faster in most cases.
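To illustrate the trade in source terms, here is a rough C sketch of a 16-byte equality check done both ways. It is not taken from the patch or its tests: the names load64, eq16_branchy, and eq16_merged are made up for this example, and the real expansion happens on IR inside the MemCmpExpansion pass, not in C.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: unaligned 8-byte load. Compilers typically
   lower this fixed-size memcpy to a single load instruction. */
static uint64_t load64(const void *p) {
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* One pair of loads per block: compare and branch after each 8-byte chunk. */
static int eq16_branchy(const void *a, const void *b) {
    const unsigned char *x = a, *y = b;
    if (load64(x) != load64(y))
        return 0;                       /* early exit: an extra compare + branch */
    return load64(x + 8) == load64(y + 8);
}

/* Two pairs of loads per block: xor each pair, or the differences
   together, and test the combined result once. */
static int eq16_merged(const void *a, const void *b) {
    const unsigned char *x = a, *y = b;
    uint64_t d0 = load64(x) ^ load64(y);         /* nonzero if first 8 bytes differ */
    uint64_t d1 = load64(x + 8) ^ load64(y + 8); /* nonzero if last 8 bytes differ */
    return (d0 | d1) == 0;                       /* single compare for the whole block */
}
```

The merged form always performs all four loads, so it gives up the early exit in exchange for fewer branches.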
The 24-byte case shows an interesting construct: we load the trailing scalar elements into vector registers and generate the same pcmpeq+movmsk code that we expected for a pair of full vector elements (see the 32- and 64-byte tests).
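For readers who want to picture the 24-byte construct, here is an approximate rendering with SSE2 intrinsics. This is a hand-written sketch, not the compiler's output: the function name eq24_sse2 is invented, and combining the two compare results with an AND before the movmsk is an assumption about how the pair is merged.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; x86 only */

/* Sketch: 24-byte equality using one full 16-byte vector pair plus the
   trailing 8 bytes loaded into the low half of a vector register. */
static int eq24_sse2(const void *a, const void *b) {
    const unsigned char *x = a, *y = b;

    /* Full 16-byte pair: unaligned vector loads, byte-wise compare. */
    __m128i lo = _mm_cmpeq_epi8(_mm_loadu_si128((const __m128i *)x),
                                _mm_loadu_si128((const __m128i *)y));

    /* Trailing 8 bytes: 64-bit loads that zero the upper half of the
       register, so the upper lanes always compare equal. */
    __m128i hi = _mm_cmpeq_epi8(_mm_loadl_epi64((const __m128i *)(x + 16)),
                                _mm_loadl_epi64((const __m128i *)(y + 16)));

    /* Merge both compare masks (assumed AND) and test all 16 lanes at once. */
    return _mm_movemask_epi8(_mm_and_si128(lo, hi)) == 0xFFFF;
}
```

Because the 64-bit loads zero the upper lanes on both sides, the trailing compare behaves like a full-width one and the same all-ones mask check applies.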