[PATCH] [X86][SSE] Add v16i8/v32i8 multiplication support
Elena Demikhovsky
elena.demikhovsky at intel.com
Mon Apr 20 13:33:20 PDT 2015
I think this sequence is good for SSE2, but for SSE4, AVX2, and definitely AVX-512 we can find a better one. See my comments inline.
REPOSITORY
rL LLVM
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:15899
@@ +15898,3 @@
+ if (VT == MVT::v16i8 || VT == MVT::v32i8) {
+ MVT ExVT = (VT == MVT::v16i8 ? MVT::v8i16 : MVT::v16i16);
+ // Extract the lo parts, sign extend to i16 and multiply
----------------
There is a more efficient way to sign extend on SSE4 and AVX2, at least for the lower part: just use VPMOVSXBW.
And for AVX-512 (SKX) we have a truncate from W to B.
So I suggest writing more generic code and then lowering it according to the target:
1) ALo = sign extend lower part of A from "B" to "W" (ISD::SIGN_EXTEND)
2) BLo = sign extend lower part of B from "B" to "W"
3) multiply ALo * BLo
4) shift the whole vector A right to put the high part in place of the low part (VPALIGNR)
5) do the same (sign extend and multiply) with AHi, BHi
6) use ISD::TRUNCATE for writing result back
You can then optimize the truncate/extend according to the target capabilities.
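
To make the six steps concrete, here is a minimal scalar model of the v16i8 case in plain C++. It is only a sketch of the data flow, not the lowering code itself: the types V16i8 and V8i16 and all function names are hypothetical, and each helper stands in for the node or instruction named in its comment.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical lane-by-lane models of the v16i8 and v8i16 vector types.
using V16i8 = std::array<std::int8_t, 16>;
using V8i16 = std::array<std::int16_t, 8>;

// Steps 1-2: sign extend the low 8 bytes to words (the effect of VPMOVSXBW).
V8i16 signExtendLo(const V16i8 &V) {
  V8i16 R{};
  for (std::size_t I = 0; I < 8; ++I)
    R[I] = static_cast<std::int16_t>(V[I]);
  return R;
}

// Step 4: move the high 8 bytes into the low half, zero-filling the rest.
V16i8 shiftHiToLo(const V16i8 &V) {
  V16i8 R{};
  for (std::size_t I = 0; I < 8; ++I)
    R[I] = V[I + 8];
  return R;
}

// Step 3: lane-wise word multiply. The inputs are sign-extended bytes, so
// every product lies in [-16256, 16384] and fits in 16 bits.
V8i16 mulWords(const V8i16 &A, const V8i16 &B) {
  V8i16 R{};
  for (std::size_t I = 0; I < 8; ++I)
    R[I] = static_cast<std::int16_t>(A[I] * B[I]);
  return R;
}

// Step 6: truncate each word to its low byte and rebuild the byte vector.
V16i8 truncateAndConcat(const V8i16 &Lo, const V8i16 &Hi) {
  V16i8 R{};
  for (std::size_t I = 0; I < 8; ++I) {
    R[I] = static_cast<std::int8_t>(static_cast<std::uint8_t>(Lo[I]));
    R[I + 8] = static_cast<std::int8_t>(static_cast<std::uint8_t>(Hi[I]));
  }
  return R;
}

// Steps 1-6 combined: byte-wise multiply of two 16-byte vectors.
V16i8 mulV16i8(const V16i8 &A, const V16i8 &B) {
  V8i16 Lo = mulWords(signExtendLo(A), signExtendLo(B));
  // Step 5: repeat the extend and multiply on the shifted-down high halves.
  V8i16 Hi = mulWords(signExtendLo(shiftHiToLo(A)),
                      signExtendLo(shiftHiToLo(B)));
  return truncateAndConcat(Lo, Hi);
}
```

Only signExtendLo and truncateAndConcat would need target-specific lowering; the rest of the sequence stays the same on every target.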
http://reviews.llvm.org/D9115