[PATCH] D41484: [X86][SSE] Use PMADDWD for v4i32 multiplies with 17 or more leading zeros

Thu Dec 28 02:09:27 PST 2017

RKSimon added a comment.

In https://reviews.llvm.org/D41484#964571, @craig.topper wrote:

> LGTM
>
> I didn't realize when I made that avx comment that shrink vmul only applies to pre-sse4.1

Thanks. I'm wondering whether we should try to use MADD (PMADDWD) for SSE41+ targets as well. Realistically, v2Xi16 multiplies are always going to be faster than vXi32 multiplies (a latency saving of 1cy or more according to Agner's instruction tables). Similar to your avx512 vXi64 multiply patches, I guess.
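
For anyone reading along, here is a rough scalar sketch of why the "17 or more leading zeros" condition in the title makes the substitution safe. It is not code from the patch: pmaddwd_lane and has_17_leading_zeros are hypothetical helper names, and the model covers a single 32-bit lane only. With 17 leading zeros, the high i16 half of each operand is zero and the low half is non-negative as a signed i16, so the pairwise multiply-add collapses to the plain 32-bit product.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of ONE 32-bit lane of PMADDWD (not LLVM code):
// each operand is read as two signed 16-bit halves, the halves are
// multiplied pairwise, and the two products are added with wraparound.
inline uint32_t pmaddwd_lane(uint32_t a, uint32_t b) {
  int16_t a_lo = static_cast<int16_t>(static_cast<uint16_t>(a & 0xFFFFu));
  int16_t a_hi = static_cast<int16_t>(static_cast<uint16_t>(a >> 16));
  int16_t b_lo = static_cast<int16_t>(static_cast<uint16_t>(b & 0xFFFFu));
  int16_t b_hi = static_cast<int16_t>(static_cast<uint16_t>(b >> 16));
  // Sum in 64 bits to avoid signed overflow, then truncate to 32 bits.
  int64_t sum = static_cast<int64_t>(a_lo) * b_lo +
                static_cast<int64_t>(a_hi) * b_hi;
  return static_cast<uint32_t>(sum);
}

// Hypothetical predicate: true when the top 17 bits of x are zero,
// i.e. x fits in 15 bits.
inline bool has_17_leading_zeros(uint32_t x) { return x < (1u << 15); }

// When both operands pass the predicate, the lane result matches an
// ordinary 32-bit multiply.
inline bool madd_matches_mul(uint32_t a, uint32_t b) {
  return pmaddwd_lane(a, b) == a * b;
}
```

Note that 16 leading zeros is not enough: 0x8000 has bit 15 set, so its low half is read as -32768 and the lane result differs from the 32-bit product.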

Repository:
  rL LLVM

https://reviews.llvm.org/D41484