[PATCH] D41484: [X86][SSE] Use PMADDWD for v4i32 multiplies with 17 or more leading zeros
Simon Pilgrim via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Dec 21 04:26:44 PST 2017
RKSimon created this revision.
RKSimon added reviewers: craig.topper, pcordes, zvi, spatel.
If the v4i32 elements have 17 or more leading zeros, then we can use PMADDWD for the integer multiply when PMULLD is unavailable or slow.
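All 17 bits need to be zero because PMADDWD performs a v8i16 signed-mul-extend + pairwise-add. The upper 16 bits must be zero so that the upper half of each pair multiplies to zero and the pairwise-add just adds a zero, and the 17th bit (bit 15, the sign bit of the lower half) must be zero so that the signed multiply doesn't incorrectly sign extend.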
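The reasoning above can be checked with a small scalar model of one 32-bit lane. This is only an illustrative sketch, not code from the patch: the helper names (as_signed16, pmaddwd_lane, mul32_lane, has_17_leading_zeros) are made up for this example, and it assumes both multiplicands satisfy the leading-zeros condition.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def as_signed16(x):
    """Interpret the low 16 bits of x as a two's-complement signed value."""
    x &= MASK16
    return x - 0x10000 if x & 0x8000 else x


def pmaddwd_lane(a, b):
    """Model one 32-bit result lane of PMADDWD (hypothetical helper).

    Each 32-bit input is split into two signed 16-bit halves; the halves
    are multiplied pairwise and the two products are added, wrapping to
    32 bits.
    """
    lo = as_signed16(a) * as_signed16(b)
    hi = as_signed16(a >> 16) * as_signed16(b >> 16)
    return (lo + hi) & MASK32


def mul32_lane(a, b):
    """Model one lane of a 32-bit multiply that keeps the low 32 bits."""
    return (a * b) & MASK32


def has_17_leading_zeros(x):
    """True if bits 15..31 of the 32-bit value are all zero."""
    return (x & MASK32) < (1 << 15)


# Both operands have 17 leading zeros: the two models agree.
a, b = 0x7FFF, 0x7FFF
assert has_17_leading_zeros(a) and has_17_leading_zeros(b)
assert pmaddwd_lane(a, b) == mul32_lane(a, b)

# Only 16 leading zeros: bit 15 is read as a sign bit and the results differ.
a, b = 0x8000, 2
assert not has_17_leading_zeros(a)
assert pmaddwd_lane(a, b) != mul32_lane(a, b)
```

In the second case the lower half 0x8000 is treated as -32768, so the modelled PMADDWD lane yields 0xFFFF0000 where the plain 32-bit multiply yields 0x00010000.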
If people want, I can try to incorporate this more into the ShrinkMode enum returned by canReduceVMulWidth?