[PATCH] D110588: [X86] decomposeMulByConstant - decompose legal vXi32 multiplies on SlowPMULLD targets
Simon Pilgrim via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Sep 27 14:08:41 PDT 2021
RKSimon created this revision.
RKSimon added reviewers: pengfei, spatel, craig.topper, lebedev.ri, andreadb.
Herald added a subscriber: hiraditya.
RKSimon requested review of this revision.
Herald added a project: LLVM.
X86's decomposeMulByConstant never permits mul decomposition to shift+add/sub if the vector multiply is legal.
Unfortunately this isn't great for SSE41+ targets, which have PMULLD for vXi32 multiplies but often execute it quite slowly. This initial patch proposes to allow decomposition if the target has the SlowPMULLD flag (i.e. Silvermont), but I'm wondering whether it might be worthwhile to always do this? PMULLD appears to always have worse perf (throughput or latency) than a PSLLD+PADDD/PSUBD.
I'm also wondering whether we should always decompose legal vXi64 multiplies - even the latest IceLake has really poor latencies for PMULLQ.
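For readers unfamiliar with the transform, here is a minimal sketch of the arithmetic identity behind a shift+add/sub decomposition. It is not the code from the patch: the `V4` type and the helper names are hypothetical, the lanes are modelled with plain unsigned integers rather than SSE registers, and only constants of the form 2^k + 1 and 2^k - 1 are covered.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Four 32-bit lanes standing in for a v4i32 vector (hypothetical helper type).
using V4 = std::array<std::uint32_t, 4>;

// Reference: multiply every lane by c, as a single vector multiply would.
// Unsigned arithmetic wraps modulo 2^32, matching the low 32 bits per lane.
V4 mulByConstant(const V4 &v, std::uint32_t c) {
  V4 r{};
  for (std::size_t i = 0; i < v.size(); ++i)
    r[i] = v[i] * c;
  return r;
}

// x * (2^k + 1) == (x << k) + x, i.e. a shift followed by an add.
V4 mulByPow2Plus1(const V4 &v, unsigned k) {
  assert(k < 32 && "shift amount must be smaller than the lane width");
  V4 r{};
  for (std::size_t i = 0; i < v.size(); ++i)
    r[i] = (v[i] << k) + v[i];
  return r;
}

// x * (2^k - 1) == (x << k) - x, i.e. a shift followed by a subtract.
V4 mulByPow2Minus1(const V4 &v, unsigned k) {
  assert(k < 32 && "shift amount must be smaller than the lane width");
  V4 r{};
  for (std::size_t i = 0; i < v.size(); ++i)
    r[i] = (v[i] << k) - v[i];
  return r;
}
```

With k = 4, the two helpers should agree lane for lane with `mulByConstant(v, 17)` and `mulByConstant(v, 15)` respectively, including for inputs that overflow 32 bits, since all three wrap the same way.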
Repository:
rG LLVM Github Monorepo
https://reviews.llvm.org/D110588
Files:
llvm/lib/Target/X86/X86ISelLowering.cpp
llvm/test/CodeGen/X86/vector-mul.ll
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D110588.375397.patch
Type: text/x-patch
Size: 8140 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20210927/0a7141b5/attachment.bin>