[llvm-dev] Ryzen (znver1) scheduler and instruction selection
Das, Dibyendu via llvm-dev
llvm-dev at lists.llvm.org
Tue Mar 14 07:56:24 PDT 2017
Yes, we are working on the scheduler and will upstream it by LLVM 5.0. In the meantime, you can get in touch with me and we can discuss.
From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Denis Steckelmacher via llvm-dev
Sent: Tuesday, March 14, 2017 8:13 PM
To: llvm-dev at lists.llvm.org
Subject: [llvm-dev] Ryzen (znver1) scheduler and instruction selection
I have just bought an AMD Ryzen 7 1700 CPU that I use to run scientific computing programs (built on Numpy). Getting lower performance than expected, I started to profile my entire system using "perf top" and to fix any library that does not properly recognize my CPU. For instance, OpenBLAS was using non-vectorized fall-back paths everywhere, dividing performance by 3 on my benchmark.
Having closely watched LLVM for a couple of years, I have seen that Zen support has recently been added. I have looked at the relevant commits (and the files as they now are in SVN), and I see that znver1 still uses the BtVer2 scheduler model. In my experiments, using the Haswell scheduler for znver1 leads to marginal gains, but I still wanted to develop a complete Zen scheduler model. However, we currently do not have enough information from AMD, and even reading the GCC patches for Zen did not allow me to produce a valid scheduler. Basically, my scheduler leads to performance consistently 5-10% below the Haswell scheduler (on C-Ray multithreaded and pgbench v9.4.3). I'm still quite impressed by how important a scheduler can be.
Does anyone know whether someone else has already worked on a Zen scheduler? If not, I'll continue my work and keep you informed.
Another small issue that I have found, and that may or may not be important, is how X86 instructions are selected. In lib/Target/X86/X86TargetTransformInfo.cpp, the costs of many instructions are given in tables. Having several tables makes it possible to assign different costs depending on the processor. However, a processor is first mapped to a technology (Zen supports AVX2), and the technology is then mapped to costs (AVX2 to costs optimized for Intel Haswell). My Ryzen CPU therefore gets Haswell costs. I have no idea whether there is a significant difference in costs between CPU implementations, but this architecture may prevent LLVM from getting the most out of non-Intel CPUs. Has anyone looked into this?
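To make the concern concrete, here is a minimal C++ sketch of the processor -> technology -> cost-table mapping described above. It is not LLVM's actual code: the types, function names (costByFeature, costWithCpuOverride), and every cost value are hypothetical and chosen only for illustration. The per-CPU override table is one conceivable way to break the tie, not something that exists in the tree as far as I know.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Hypothetical, simplified stand-ins; these are NOT LLVM's real types.
enum class Op { Mul, Div };
enum class Feature { SSE2, AVX2 };

struct CostEntry { Op op; int cost; };
struct Cpu { const char *name; Feature feature; };

// Tables keyed by technology level. All cost values are made up.
static const std::vector<CostEntry> SSE2Costs = {{Op::Mul, 4}, {Op::Div, 40}};
static const std::vector<CostEntry> AVX2Costs = {{Op::Mul, 2}, {Op::Div, 20}};

// Hypothetical per-CPU override table (an assumption, for illustration).
static const std::vector<CostEntry> Znver1Overrides = {{Op::Div, 14}};

// Linear search; returns -1 when the table has no entry for the op.
int lookup(const std::vector<CostEntry> &table, Op op) {
  for (const CostEntry &e : table)
    if (e.op == op)
      return e.cost;
  return -1;
}

// Scheme described in the text: CPU -> feature -> table.
// Any two CPUs with the same feature level get identical costs.
int costByFeature(const Cpu &cpu, Op op) {
  const std::vector<CostEntry> &table =
      cpu.feature == Feature::AVX2 ? AVX2Costs : SSE2Costs;
  int cost = lookup(table, op);
  assert(cost >= 0 && "every op must appear in the feature tables");
  return cost;
}

// Alternative sketch: consult a per-CPU table first, then fall back
// to the feature-keyed table.
int costWithCpuOverride(const Cpu &cpu, Op op) {
  if (std::strcmp(cpu.name, "znver1") == 0) {
    int cost = lookup(Znver1Overrides, op);
    if (cost >= 0)
      return cost;
  }
  return costByFeature(cpu, op);
}
```

With these made-up numbers, costByFeature returns the same division cost for a "haswell" and a "znver1" Cpu because both map to AVX2, while costWithCpuOverride lets znver1 diverge only for the operations listed in its override table.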
I want to stress that this email is more a list of questions than a complaint. I am well aware that most developers are probably using an Intel-based machine, which introduces a natural bias towards Intel as it is the platform on which tests and benchmarks are run. I would like to start a discussion on how to make LLVM, and compilers in general, more architecture-independent with regard to optimization.
(I am a PhD student and have no connection with AMD; I bought my Ryzen CPU with my own personal money)