[PATCH] [X86][SSE] Add v16i8/v32i8 multiplication support

Mon Apr 20 13:33:20 PDT 2015

I think this sequence is good for SSE2, but for SSE4, AVX2, and definitely AVX-512 we can find a better chain. See my inline comments.

REPOSITORY
  rL LLVM

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:15899
@@ +15898,3 @@
+  if (VT == MVT::v16i8 || VT == MVT::v32i8) {
+    MVT ExVT = (VT == MVT::v16i8 ? MVT::v8i16 : MVT::v16i16);
+    // Extract the lo parts, sign extend to i16 and multiply
----------------
There is a better way to sign-extend on SSE4 and AVX2, at least for the lower part: just VPMOVSXBW (PMOVSXBW on SSE4.1).
And for AVX-512 (SKX) we have a truncation from W to B (VPMOVWB).
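For reference, the lane semantics of the two instructions mentioned above can be sketched in scalar C++. This models only the 128-bit-result forms, and the function names `pmovsxbw` and `vpmovwb` are hypothetical helpers, not LLVM or intrinsic APIs:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Model of (V)PMOVSXBW xmm, xmm: the low 8 bytes of the source are
// sign-extended into 8 16-bit lanes; the upper 8 bytes are ignored.
std::array<int16_t, 8> pmovsxbw(const std::array<int8_t, 16>& src) {
  std::array<int16_t, 8> dst{};
  for (int i = 0; i < 8; ++i)
    dst[i] = static_cast<int16_t>(src[i]);  // int8 -> int16 preserves the sign
  return dst;
}

// Model of VPMOVWB xmm, ymm (AVX-512BW + VL): each of the 16 16-bit lanes
// is truncated to its low byte.
std::array<int8_t, 16> vpmovwb(const std::array<int16_t, 16>& src) {
  std::array<int8_t, 16> dst{};
  for (int i = 0; i < 16; ++i)
    dst[i] = static_cast<int8_t>(static_cast<uint8_t>(src[i]));  // keep low 8 bits
  return dst;
}
```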
So I suggest writing more generic code and then lowering it according to the target:
1) ALo = sign-extend the lower part of A from "B" to "W" (ISD::SIGN_EXTEND)
2) BLo = sign-extend the lower part of B from "B" to "W"
3) multiply ALo * BLo
4) shift the whole vector A right so that its high part moves into the low part (VPALIGNR)
5) do the same with the high parts to get AHi and BHi, and multiply AHi * BHi
6) use ISD::TRUNCATE to write the result back
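A scalar C++ sketch of the six steps for v16i8 can be used to check that the chain computes the wrapping i8 product. All helper names are hypothetical. The whole-vector shift is modeled as a byte shift right by 8 with zeros shifted in (only the low half is consumed afterwards, so the filler bytes don't matter), and the 16-bit multiply keeps the low 16 bits as PMULLW does. It does not model v32i8, where the AVX2 form of VPALIGNR operates within each 128-bit lane.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V16i8 = std::array<int8_t, 16>;
using V8i16 = std::array<int16_t, 8>;

// Steps 1-2: sign-extend the low 8 bytes to i16 (ISD::SIGN_EXTEND, e.g. PMOVSXBW).
static V8i16 sextLowHalf(const V16i8& v) {
  V8i16 r{};
  for (int i = 0; i < 8; ++i) r[i] = v[i];
  return r;
}

// Step 4: shift the whole vector right by `n` bytes, filling with zeros.
static V16i8 byteShiftRight(const V16i8& v, int n) {
  V16i8 r{};
  for (int i = 0; i + n < 16; ++i) r[i] = v[i + n];
  return r;
}

// Steps 3 and 5: lane-wise i16 multiply keeping the low 16 bits (like PMULLW).
static V8i16 mulLow16(const V8i16& a, const V8i16& b) {
  V8i16 r{};
  for (int i = 0; i < 8; ++i) {
    int32_t p = int32_t{a[i]} * int32_t{b[i]};  // cannot overflow int32
    r[i] = static_cast<int16_t>(static_cast<uint16_t>(p));
  }
  return r;
}

// Step 6: truncate an i16 lane to i8 (ISD::TRUNCATE).
static int8_t truncToI8(int16_t w) {
  return static_cast<int8_t>(static_cast<uint8_t>(w));
}

// Hypothetical scalar model of the proposed v16i8 multiply chain.
V16i8 mulV16i8(const V16i8& a, const V16i8& b) {
  V8i16 lo = mulLow16(sextLowHalf(a), sextLowHalf(b));
  V8i16 hi = mulLow16(sextLowHalf(byteShiftRight(a, 8)),
                      sextLowHalf(byteShiftRight(b, 8)));
  V16i8 r{};
  for (int i = 0; i < 8; ++i) {
    r[i] = truncToI8(lo[i]);      // low half of the result
    r[i + 8] = truncToI8(hi[i]);  // high half of the result
  }
  return r;
}
```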

You can then optimize the truncate/extend according to the target's capabilities.
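One freedom the final truncate gives here: the low 8 bits of a product depend only on the low 8 bits of its operands, so sign- and zero-extension give identical results after ISD::TRUNCATE, and a target could pick whichever extension is cheapest. The hypothetical check below confirms this exhaustively over all byte pairs:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical check: for every pair of bytes, the low 8 bits of the
// product are the same whether the bytes were sign- or zero-extended first.
bool extensionKindIsIrrelevant() {
  for (int x = 0; x < 256; ++x) {
    for (int y = 0; y < 256; ++y) {
      int sx = x >= 128 ? x - 256 : x;  // sign-extended value of byte x
      int sy = y >= 128 ? y - 256 : y;  // sign-extended value of byte y
      uint8_t viaSext = static_cast<uint8_t>(sx * sy);  // truncate to 8 bits
      uint8_t viaZext = static_cast<uint8_t>(x * y);    // zero-extended inputs
      if (viaSext != viaZext) return false;
    }
  }
  return true;
}
```

So PMOVZXBW, or an unpack with a zero vector, would serve as well as PMOVSXBW for this pattern.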

http://reviews.llvm.org/D9115
