[PATCH] D24681: Optimize patterns of vectorized interleaved memory accesses for X86.

Fri Oct 7 10:20:54 PDT 2016

Farhana added inline comments.

================
Comment at: lib/Target/X86/X86InterleavedAccess.cpp:70
+///   %v2 = shuffle %intrshuffvec3, %intrshuffvec4, <0, 4, 2, 6>;
+///   %v3 = shuffle %intrshuffvec3, %intrshuffvec4, <1, 5, 3, 7>;
+///
----------------
delena wrote:
> AVX512 probably has another set of shuffles
Yes. The plan is to support AVX512 in a separate check-in, which will also handle the patterns that take advantage of its extended shuffle instructions and wider vector length. This change-set is meant to support only AVX1 and AVX2.
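
For readers unfamiliar with the shuffle masks quoted above, here is a minimal sketch of how a mask such as <0, 4, 2, 6> selects lanes from two inputs. It is not code from the patch: shuffle4 is a hypothetical helper, and it assumes both operands are 4-element vectors, which the quoted lines do not state explicitly.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper (not part of the patch): emulates the lane selection of
// an LLVM shufflevector on two 4-element inputs. Mask values 0..3 pick a lane
// from A; mask values 4..7 pick lane (value - 4) from B.
template <typename T>
std::array<T, 4> shuffle4(const std::array<T, 4> &A, const std::array<T, 4> &B,
                          const std::array<int, 4> &Mask) {
  std::array<T, 4> Result{};
  for (std::size_t I = 0; I < 4; ++I) {
    int Idx = Mask[I];
    Result[I] = Idx < 4 ? A[Idx] : B[Idx - 4];
  }
  return Result;
}
```

Under that assumption, the mask <0, 4, 2, 6> interleaves the even lanes of the two inputs and <1, 5, 3, 7> interleaves the odd lanes.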

================
Comment at: lib/Target/X86/X86InterleavedAccess.cpp:143
+
+  return lower(LI, Shuffles, Indices, Factor);
+}
----------------
delena wrote:
> Farhana wrote:
> > delena wrote:
> > > It is not a good name for function. I think that you don't need additional function call here at all.
> > You are right in the current context. My plan is to define a class to encapsulate all the information and allow data sharing where I will have two main functions one for load generation and the other one for shuffle-generation. In order to keep the follow-up change-set with minimal changes I decided to create a function here. I hope it's ok to do so.
> Each patch should look good regardless of future plans.  
I totally agree with you. I inlined the function.

================
Comment at: test/CodeGen/X86/x86-interleaved-access.ll:1
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=x86_64-pc-linux  -mattr=+avx < %s | FileCheck %s --check-prefix=AVX --check-prefix=AVX1
----------------
Farhana wrote:
> RKSimon wrote:
> > Farhana wrote:
> > > RKSimon wrote:
> > > > Is this actually true? The checks below don't look like what the script would generate.
> > > Hi Simon,
> > > I am not sure whether I understand your concern. Which checks are you talking about?
> > > Farhana
> > The 'AVX-NEXT'/'AVX1-NEXT'/'AVX2-NEXT' checks - the update script would generate quite a bit more than what is shown below.
> If I understand your comment correctly, you are saying the optimization will generate more instructions than it is checking for. Yes, it only checks for the must instructions, because the rest can be optimized away depending on the uses. 
Hi Simon,

I think I understand your question now (Dave helped me).

You are right: the update_llc_test_checks.py script generates quite a few more checks than I have here. The checks were not auto-generated by the script, so I removed the NOTE line.

But now I am wondering whether I should have used the script. I did not want to include all the checks because, in my opinion, they would be unnecessary in this case; checking the first few instructions should be enough to verify the behavior.

Let me know if you think it is good practice to always use the script.

https://reviews.llvm.org/D24681