[PATCH] D55138: [X86] Don't use zero_extend_vector_inreg for mulhu lowering with sse 4.1

Fri Nov 30 11:39:44 PST 2018

craig.topper created this revision.
craig.topper added reviewers: spatel, RKSimon.

With SSE4.1 we use two zero_extend_vector_inreg nodes and a pshufd to expand the v16i8 input into two v8i16 vectors for the multiply. That's 3 shuffles to extend one operand. The other operand is usually constant, since this lowering is mostly used by the division-by-constant optimization. Pre-SSE4.1 we use a punpckhbw and a punpcklbw with a zero vector. That's two shuffles, an xor, and a copy due to tied register constraints, which still seems maybe better than the 3 shuffles. With AVX we avoid the copy, so that's obviously better.
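
For readers less familiar with this lowering, here is a rough scalar model of what the unpack-with-zero sequence computes. It is only a sketch, not code from the patch: the names V16i8, V8i16, unpackWithZero, mulhu and udivBy3 are made up for illustration, and the magic constant 171 with a final shift by 1 is just the standard 8-bit divide-by-3 example, assumed here to show why the second operand is usually a constant.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the v16i8 and v8i16 vector types.
using V16i8 = std::array<uint8_t, 16>;
using V8i16 = std::array<uint16_t, 8>;

// Scalar model of punpcklbw / punpckhbw against a zero vector: each byte of
// the selected half is interleaved with a zero byte, which zero-extends it
// to a 16-bit lane.
V8i16 unpackWithZero(const V16i8 &v, bool high) {
  V8i16 out{};
  const int base = high ? 8 : 0;
  for (int i = 0; i < 8; ++i)
    out[i] = v[base + i];
  return out;
}

// mulhu on v16i8: widen both operands to two v8i16 halves, multiply, and
// keep the high 8 bits of each 16-bit product.
V16i8 mulhu(const V16i8 &a, const V16i8 &b) {
  const V8i16 alo = unpackWithZero(a, false), ahi = unpackWithZero(a, true);
  const V8i16 blo = unpackWithZero(b, false), bhi = unpackWithZero(b, true);
  V16i8 out{};
  for (int i = 0; i < 8; ++i) {
    out[i] = static_cast<uint8_t>((uint32_t(alo[i]) * blo[i]) >> 8);
    out[8 + i] = static_cast<uint8_t>((uint32_t(ahi[i]) * bhi[i]) >> 8);
  }
  return out;
}

// Division by the constant 3 expressed through mulhu: x / 3 == (x * 171) >> 9
// for every 8-bit x, so the second mulhu operand is a constant splat.
V16i8 udivBy3(const V16i8 &x) {
  V16i8 magic{};
  magic.fill(171);
  V16i8 q = mulhu(x, magic);
  for (auto &lane : q)
    lane = static_cast<uint8_t>(lane >> 1);
  return q;
}

// Checks the model against plain integer division for all 256 byte values.
void selfCheck() {
  for (int base = 0; base < 256; base += 16) {
    V16i8 x{};
    for (int i = 0; i < 16; ++i)
      x[i] = static_cast<uint8_t>(base + i);
    const V16i8 q = udivBy3(x);
    for (int i = 0; i < 16; ++i)
      assert(q[i] == x[i] / 3);
  }
}
```

In this model only the non-constant operand needs the extension shuffles at run time; the widened form of the constant splat could be produced ahead of time, which is why the cost comparison above counts the shuffles for one operand.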

https://reviews.llvm.org/D55138

Files:
  lib/Target/X86/X86ISelLowering.cpp
  test/CodeGen/X86/vector-idiv-udiv-128.ll
  test/CodeGen/X86/vector-idiv-udiv-256.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D55138.176173.patch
Type: text/x-patch
Size: 17186 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20181130/68f3ec45/attachment.bin>