[PATCH] D41484: [X86][SSE] Use PMADDWD for v4i32 multiplies with 17 or more leading zeros

Simon Pilgrim via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Dec 28 02:09:27 PST 2017


RKSimon added a comment.

In https://reviews.llvm.org/D41484#964571, @craig.topper wrote:

> LGTM
>
> I didn't realize when I made that avx comment that shrink vmul only applies to pre-sse4.1


Thanks - I'm wondering whether we should try to use PMADDWD for SSE4.1+ targets as well. Realistically, v2Xi16 multiplies are always going to be faster than vXi32 ones (a latency saving of 1 cycle or more, according to Agner). Similar to your AVX512 vXi64 multiply patches, I guess.
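
For anyone following along, here is a rough scalar sketch of why the "17 or more leading zeros" condition in the title makes the substitution safe. It is not the code from the patch: pmaddwd_lane, has_17_leading_zeros and check_small_operands are hypothetical names, and the function only models a single 32-bit lane of PMADDWD (two signed 16-bit multiplies whose products are added).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if the top 17 bits of x are zero, i.e. x < 2^15.
bool has_17_leading_zeros(uint32_t x) { return (x >> 15) == 0; }

// Hypothetical scalar model of one 32-bit lane of PMADDWD: split each operand
// into two signed 16-bit halves, multiply matching halves, add the products.
int32_t pmaddwd_lane(uint32_t a, uint32_t b) {
    int16_t a_lo = static_cast<int16_t>(a & 0xFFFF);
    int16_t a_hi = static_cast<int16_t>(a >> 16);
    int16_t b_lo = static_cast<int16_t>(b & 0xFFFF);
    int16_t b_hi = static_cast<int16_t>(b >> 16);
    int32_t lo = static_cast<int32_t>(a_lo) * static_cast<int32_t>(b_lo);
    int32_t hi = static_cast<int32_t>(a_hi) * static_cast<int32_t>(b_hi);
    // Add in unsigned arithmetic so the sum wraps instead of overflowing.
    return static_cast<int32_t>(static_cast<uint32_t>(lo) +
                                static_cast<uint32_t>(hi));
}

// Spot-check a sample of operands that both have 17 leading zeros: the high
// halves are zero and the low halves are non-negative as i16, so the lane
// result equals the ordinary 32-bit product.
void check_small_operands() {
    for (uint32_t a = 0; a < (1u << 15); a += 257) {
        for (uint32_t b = 0; b < (1u << 15); b += 131) {
            assert(has_17_leading_zeros(a) && has_17_leading_zeros(b));
            assert(static_cast<uint32_t>(pmaddwd_lane(a, b)) == a * b);
        }
    }
}
```

With only 16 leading zeros the low half can have its sign bit set, so the model no longer matches a plain multiply: pmaddwd_lane(0x8000, 2) gives -65536 rather than 65536.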


Repository:
  rL LLVM

https://reviews.llvm.org/D41484




