[PATCH] D110480: [X86] Alter throughput for vpshufb/vpperm on bdver2 model to match AMD documentation (PR51539)
Simon Pilgrim via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sat Sep 25 09:05:15 PDT 2021
RKSimon created this revision.
RKSimon added reviewers: lebedev.ri, GGanesh.
Herald added subscribers: pengfei, gbedwell, hiraditya.
Herald added a reviewer: andreadb.
RKSimon requested review of this revision.
Herald added a project: LLVM.
As reported on PR51539, the bdver2 model appears to report higher throughput costs than are likely for codegen involving vpshufb/vpperm.
e.g. ctpop: https://c.godbolt.org/z/4hcaMqPzd
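For context on why ctpop is affected: vector popcount is commonly lowered to a nibble lookup table driven by a byte shuffle, so the cost of vpshufb feeds directly into the cost of ctpop. The Python sketch below only models that idea on a list of bytes. The helper names (pshufb_lookup, popcount_bytes) are hypothetical and are not taken from the patch or from LLVM.

```python
# Popcount of every 4-bit value; this plays the role of the 16-byte shuffle table.
NIBBLE_POPCOUNT = [bin(i).count("1") for i in range(16)]


def pshufb_lookup(table, indices):
    """Model a byte shuffle used as a table lookup.

    Assumption: every index is already in 0..15, so the zeroing behaviour
    for indices with the top bit set is not modelled here.
    """
    return [table[i & 0x0F] for i in indices]


def popcount_bytes(data):
    """Per-byte popcount built from two table lookups and an add."""
    lo = [b & 0x0F for b in data]          # low nibble of each byte
    hi = [(b >> 4) & 0x0F for b in data]   # high nibble of each byte
    lo_counts = pshufb_lookup(NIBBLE_POPCOUNT, lo)
    hi_counts = pshufb_lookup(NIBBLE_POPCOUNT, hi)
    return [a + b for a, b in zip(lo_counts, hi_counts)]
```

Each call to popcount_bytes needs two lookups, so an overestimated shuffle cost is counted twice per vector.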
According to the AMD Family 15h Software Optimization Guide (SoG), these are fastpath instructions (tp = 1.0) that issue only on pipe1 (xbr). Agner and InstLatX64 also agree that both the latency and the throughput are better than the model currently reports.
AMD (https://www.amd.com/system/files/TechDocs/47414_15h_sw_opt_guide.pdf)
Agner (https://agner.org/optimize/instruction_tables.pdf)
InstLatX64 (http://users.atw.hu/instlatx64/AuthenticAMD/AuthenticAMD0610F01_K15_Piledriver_InstLatX64.txt)
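To illustrate how a pipe restriction turns into a throughput figure, here is a minimal Python sketch of the usual rule: the reciprocal throughput is the largest value of (cycles held on a resource) / (number of units of that resource). This reflects my understanding of how scheduler models are typically evaluated. The resource names and cycle counts are hypothetical and are not copied from X86ScheduleBdVer2.td.

```python
def reciprocal_throughput(resource_cycles, resource_units):
    """Return the reciprocal throughput implied by a resource usage table.

    resource_cycles: cycles the instruction holds each resource.
    resource_units:  how many identical units each resource has.
    Both dictionaries use hypothetical resource names.
    """
    worst = 0.0
    for name, cycles in resource_cycles.items():
        if cycles == 0:
            continue  # an unused resource cannot be the bottleneck
        worst = max(worst, cycles / resource_units[name])
    return worst


# Hypothetical machine: a single xbr pipe and a group of two FP pipes.
UNITS = {"pipe1_xbr": 1, "fp_pipes_01": 2}

one_cycle_on_xbr = reciprocal_throughput({"pipe1_xbr": 1}, UNITS)     # 1.0
two_cycles_on_xbr = reciprocal_throughput({"pipe1_xbr": 2}, UNITS)    # 2.0
one_cycle_on_either = reciprocal_throughput({"fp_pipes_01": 1}, UNITS)  # 0.5
```

Under this rule, an instruction that holds the single xbr pipe for one cycle gets tp = 1.0, which is the figure quoted from the SoG above. Any entry that holds the pipe for longer reports a proportionally higher cost.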
I think most other shuffles should probably be using xbr as well?
Repository:
rG LLVM Github Monorepo
https://reviews.llvm.org/D110480
Files:
llvm/lib/Target/X86/X86ScheduleBdVer2.td
llvm/test/tools/llvm-mca/X86/BdVer2/resources-avx1.s
llvm/test/tools/llvm-mca/X86/BdVer2/resources-ssse3.s
llvm/test/tools/llvm-mca/X86/BdVer2/resources-xop.s
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D110480.375041.patch
Type: text/x-patch
Size: 14436 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20210925/98806034/attachment.bin>