[LLVMdev] AVX code gen
Ken Gahagan
ken.gahagan at gmail.com
Wed Dec 11 12:59:47 PST 2013
Hello -
I found this post on the LLVM blog: http://blog.llvm.org/2012/12/new-loop-vectorizer.html. It makes me think that clang/llvm can generate packed AVX instructions that use the full width of the YMM registers. I have an environment where icc generates these instructions (vmulps %ymm1, %ymm3, %ymm2, for example), but I cannot get clang/llvm to generate them using the 3.3 release, 3.4 rc1, or 3.4 rc2. I am new to clang/llvm, so I may not be invoking the tools correctly. But given that -fvectorize and -fslp-vectorize are on by default in 3.4, I would have expected clang/llvm to vectorize with AVX any code that icc can. The code is basic matrix multiplication written in a number of ways (with and without transposition and so on) as a performance measurement exercise.
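For illustration, here is a minimal sketch of the kind of matrix multiplication kernel described above. It is not the actual benchmark code: the function name `matmul_ikj`, the row-major layout, and the i-k-j loop order are all assumptions chosen for the example.

```c
#include <stddef.h>

/* Hypothetical kernel (not the original benchmark): C += A * B for
   n x n row-major float matrices. The i-k-j loop order keeps the
   innermost loop unit-stride over both b and c, and 'restrict' tells
   the compiler that the three arrays do not overlap. */
void matmul_ikj(size_t n, const float *restrict a,
                const float *restrict b, float *restrict c)
{
    for (size_t i = 0; i < n; ++i) {
        for (size_t k = 0; k < n; ++k) {
            const float aik = a[i * n + k];   /* invariant in the j loop */
            for (size_t j = 0; j < n; ++j) {
                c[i * n + j] += aik * b[k * n + j];
            }
        }
    }
}
```

One way to inspect the result is to compile with something like `clang -O3 -mavx -S matmul.c` (the file name is a placeholder) and search the assembly for `ymm`. Without `-mavx` or a `-march=` value that implies AVX, clang targets baseline x86-64, so AVX instructions would not be expected whatever the vectorizer does. Whether a given release actually emits packed YMM instructions for this loop is not something the sketch guarantees.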
The environments I’ve tried are:
Intel Ivy Bridge-EX (pre-release hardware) running Red Hat Linux 6.5
Generic desktop with Haswell processor running Fedora 18
If you have a moment to point me to the appropriate docs, I’m happy to go learn on my own. I’ve now spent the better part of 3 days searching for the invocation parameters that produce packed AVX instructions using the YMM registers, and I just can’t seem to get it right. I would also be grateful if you could simply send the correct invocation.
I’ve actually started digging through the code as well, but since I am starting from zero it could take me a while to find an answer that way. I just didn’t want you to think I’m not willing to try to find the answer on my own :-)
Thank you,
Ken