[PATCH] D44267: Remove SRAs from v16i8 multiply lowering on sse2 targets

Thu Mar 8 11:10:20 PST 2018

craig.topper created this revision.
craig.topper added reviewers: RKSimon, spatel.

Previously we unpacked the even bytes of each input into the high byte of 16-bit elements, then did a v8i16 arithmetic shift right by 8 bits to fill the upper bits of each element with sign bits. Then we did the v8i16 multiply and masked each result to zero its upper 8 bits. The same was done for the odd bytes. The results were then packed together with packuswb.

Since we are masking each multiply result element to 8 bits, and those 8 bits are determined only by the lower 8 bits of each of the inputs, we don't need to fill the upper bits with sign bits. So we can just unpack into the low byte of each element and treat the upper bits as garbage. This is also what gcc does.
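
The arithmetic claim can be checked with a small scalar model. This is only a sketch and not code from the patch: the helper names (mulLaneLowByte, signExtendedLane, garbageLane, lowByteIgnoresHighBits) are made up for illustration, and each 16-bit lane is modelled as a plain uint16_t rather than a real v8i16 vector.

```cpp
#include <cstdint>

// Multiply two 16-bit lanes and keep only the low byte of the product,
// modelling a v8i16 multiply followed by masking to 8 bits.
inline uint8_t mulLaneLowByte(uint16_t a, uint16_t b) {
  uint32_t product = static_cast<uint32_t>(a) * static_cast<uint32_t>(b);
  return static_cast<uint8_t>(product & 0xFF);
}

// Model of the old lowering: the byte is sign extended to fill the lane.
inline uint16_t signExtendedLane(uint8_t x) {
  return static_cast<uint16_t>(static_cast<int16_t>(static_cast<int8_t>(x)));
}

// Model of the new lowering: the byte sits in the low half of the lane and
// the high half holds arbitrary garbage (hypothetical values chosen below).
inline uint16_t garbageLane(uint8_t x, uint8_t garbage) {
  return static_cast<uint16_t>((static_cast<uint16_t>(garbage) << 8) | x);
}

// Exhaustively compare both models over every pair of input bytes, using a
// few garbage patterns for the high half of each lane.
inline bool lowByteIgnoresHighBits() {
  const uint8_t garbage[] = {0x00, 0x5A, 0xFF};
  for (int a = 0; a < 256; ++a) {
    for (int b = 0; b < 256; ++b) {
      uint8_t ua = static_cast<uint8_t>(a);
      uint8_t ub = static_cast<uint8_t>(b);
      uint8_t expected =
          mulLaneLowByte(signExtendedLane(ua), signExtendedLane(ub));
      for (uint8_t ga : garbage) {
        for (uint8_t gb : garbage) {
          if (mulLaneLowByte(garbageLane(ua, ga), garbageLane(ub, gb)) !=
              expected)
            return false;
        }
      }
    }
  }
  return true;
}
```

The check passes because the product is taken modulo 256: any contribution from the high byte of either lane is a multiple of 256 and vanishes once the result is masked.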

https://reviews.llvm.org/D44267

Files:
  lib/Target/X86/X86ISelLowering.cpp
  test/CodeGen/X86/pmul.ll
  test/CodeGen/X86/vector-idiv-sdiv-128.ll
  test/CodeGen/X86/vector-idiv-udiv-128.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D44267.137618.patch
Type: text/x-patch
Size: 22123 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20180308/6ad13f9f/attachment.bin>