[PATCH] D12663: [X86][SSE] Vectorize CTTZ + CTTZ_ZERO_UNDEF
Simon Pilgrim via llvm-commits
llvm-commits at lists.llvm.org
Sat Sep 5 07:33:30 PDT 2015
RKSimon created this revision.
RKSimon added reviewers: chandlerc, qcolombet, delena, andreadb.
RKSimon added subscribers: llvm-commits, logan.
RKSimon set the repository for this revision to rL LLVM.
Herald added a subscriber: aemerson.
Now that we have fast vector CTPOP implementations, we can use them to speed up vector CTTZ using the pattern cttz(x) = ctpop((x & -x) - 1).
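For readers who want to check the CTPOP identity, here is a minimal scalar sketch in C++. It is not the code from the patch, which works on vector types inside X86ISelLowering.cpp. The helper names popcount32 and cttz_via_ctpop are hypothetical, and a 32-bit element width is assumed purely for illustration.

```cpp
#include <cstdint>

// Hypothetical helper: portable population count for a 32-bit value.
unsigned popcount32(uint32_t v) {
  unsigned n = 0;
  while (v != 0) {
    v &= v - 1;  // clear the lowest set bit
    ++n;
  }
  return n;
}

// Hypothetical helper: cttz(x) = ctpop((x & -x) - 1).
unsigned cttz_via_ctpop(uint32_t x) {
  // x & -x isolates the lowest set bit (0 when x == 0).
  uint32_t lowest = x & (0u - x);
  // Subtracting 1 sets every bit below it; counting those bits gives the
  // number of trailing zeros. For x == 0 this wraps to all ones, so the
  // result is 32, the element width.
  return popcount32(lowest - 1u);
}
```

Under these assumptions cttz_via_ctpop(8) gives 3 and cttz_via_ctpop(0) gives 32, so the zero input is well defined and the pattern can serve plain CTTZ as well as CTTZ_ZERO_UNDEF.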
Additionally, for AVX512CD, which provides lzcnt instructions, we can use the pattern cttz_undef(x) = (width - 1) - ctlz(x & -x).
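The CTLZ-based pattern can be sketched the same way. Again this is only a scalar illustration and not the lowering from the patch. The names ctlz32 and cttz_undef_via_ctlz are hypothetical, width is assumed to be 32, and a portable loop stands in for the hardware lzcnt instruction.

```cpp
#include <cstdint>

// Hypothetical helper: portable count of leading zeros; returns 32 for v == 0.
unsigned ctlz32(uint32_t v) {
  unsigned n = 0;
  for (uint32_t mask = 0x80000000u; mask != 0 && (v & mask) == 0; mask >>= 1)
    ++n;
  return n;
}

// Hypothetical helper: cttz_undef(x) = (width - 1) - ctlz(x & -x), width = 32.
// Only meaningful for x != 0: for x == 0, ctlz returns 32 and the subtraction
// wraps around, so the result is not the element width.
unsigned cttz_undef_via_ctlz(uint32_t x) {
  uint32_t lowest = x & (0u - x);  // isolate the lowest set bit
  return 31u - ctlz32(lowest);     // its bit index equals the trailing-zero count
}
```

Because the zero input does not yield the element width here, this form only fits the CTTZ_ZERO_UNDEF variant, which matches the cttz_undef naming in the pattern.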
Originally I intended to implement this generically in the VectorLegalizer, but hit the issue that the v2i64 implementations were then vectorized and showed a large perf regression. I could still do this and provide an 'empty' custom implementation on X86 to force scalarization, though I'm not sure whether that is good practice. It would have the benefit that we could remove the very similar implementation in the ARM target as well (Logan, any comments?).
Repository:
rL LLVM
http://reviews.llvm.org/D12663
Files:
lib/Target/X86/X86ISelLowering.cpp
test/CodeGen/X86/vector-tzcnt-128.ll
test/CodeGen/X86/vector-tzcnt-256.ll
test/CodeGen/X86/vector-tzcnt-512.ll
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D12663.34110.patch
Type: text/x-patch
Size: 199362 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150905/c26e9ed3/attachment-0001.bin>