[PATCH] D12663: [X86][SSE] Vectorize CTTZ + CTTZ_ZERO_UNDEF

Sat Sep 5 07:33:30 PDT 2015

RKSimon created this revision.
RKSimon added reviewers: chandlerc, qcolombet, delena, andreadb.
RKSimon added subscribers: llvm-commits, logan.
RKSimon set the repository for this revision to rL LLVM.
Herald added a subscriber: aemerson.

Now that we have fast vector CTPOP implementations, we can use them to speed up vector CTTZ using the pattern cttz(x) = ctpop((x & -x) - 1).
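
For anyone wanting to sanity-check the identity, here is a minimal scalar C++ sketch of one 32-bit lane. The helper names (popcount32, cttz32_via_ctpop) are illustrative only and do not come from the patch; the popcount loop simply stands in for the vector CTPOP lowering.

```cpp
#include <cstdint>

// Portable population count (Kernighan's loop). This is only a stand-in
// for whatever vector CTPOP lowering the target actually uses.
static unsigned popcount32(uint32_t v) {
  unsigned n = 0;
  while (v) {
    v &= v - 1; // clear the lowest set bit
    ++n;
  }
  return n;
}

// Scalar model of cttz(x) = ctpop((x & -x) - 1) for one 32-bit lane.
static unsigned cttz32_via_ctpop(uint32_t x) {
  uint32_t lowest = x & (0u - x); // isolate the lowest set bit (0 if x == 0)
  // lowest - 1 has ones exactly in the trailing-zero positions of x.
  // For x == 0 it wraps to all ones, so the result is the full width.
  return popcount32(lowest - 1u);
}
```

Note that x == 0 naturally yields the lane width (32 here), so the same sequence covers both CTTZ and CTTZ_ZERO_UNDEF.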

Additionally, for AVX512CD, which provides lzcnt instructions, we can use the pattern cttz_undef(x) = (width - 1) - ctlz(x & -x).
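
A similar scalar C++ sketch for the lzcnt-based pattern, again for a single 32-bit lane. The names ctlz32 and cttz32_zero_undef_via_ctlz are hypothetical, and the bit-scan loop is just a portable stand-in for the hardware instruction; it is assumed here to return the lane width for a zero input.

```cpp
#include <cstdint>

// Portable count-leading-zeros, returning 32 for a zero input
// (assumed to match the behaviour of the hardware lzcnt).
static unsigned ctlz32(uint32_t v) {
  unsigned n = 0;
  for (uint32_t bit = 0x80000000u; bit != 0 && (v & bit) == 0; bit >>= 1)
    ++n;
  return n;
}

// Scalar model of cttz_undef(x) = (width - 1) - ctlz(x & -x), width = 32.
// Only meaningful for x != 0: for x == 0 the subtraction wraps around,
// which is acceptable because the ZERO_UNDEF form leaves that case undefined.
static unsigned cttz32_zero_undef_via_ctlz(uint32_t x) {
  uint32_t lowest = x & (0u - x); // isolate the lowest set bit
  return 31u - ctlz32(lowest);    // bit index counted from the LSB
}
```

Because the zero case is not handled, this form only fits CTTZ_ZERO_UNDEF; plain CTTZ would need the zero input handled separately.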

Originally I intended to implement this generically in the VectorLegalizer, but hit the issue that the 2i64 implementations were then vectorized and showed a large perf regression. I could still do this and provide an 'empty' custom implementation on X86 to force scalarization, though I'm not sure whether that's good practice. It would have the benefit that we could also remove the very similar implementation in the ARM target (Logan, any comments?).

Repository:
  rL LLVM

http://reviews.llvm.org/D12663

Files:
  lib/Target/X86/X86ISelLowering.cpp
  test/CodeGen/X86/vector-tzcnt-128.ll
  test/CodeGen/X86/vector-tzcnt-256.ll
  test/CodeGen/X86/vector-tzcnt-512.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D12663.34110.patch
Type: text/x-patch
Size: 199362 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150905/c26e9ed3/attachment-0001.bin>