[PATCH] D24681: Optimize patterns of vectorized interleaved memory accesses for X86.
Farhana Aleen via llvm-commits
llvm-commits at lists.llvm.org
Fri Sep 30 14:03:18 PDT 2016
Farhana added inline comments.
> DavidKreitzer wrote in X86InterleavedAccess.cpp:87
> Wouldn't it be simpler to create a bitcast to a pointer to an array of the smaller vector type? That would avoid the need for further bitcasts, saving 4 LLVM IR instructions. For example, in the 4 interleaved double case, the bitcast would look like this:
>
> %bc = bitcast <16 x double>* %ptr to [4 x <4 x double>]*
>
> And then your GEPs would look like this:
>
> %gep0 = getelementptr [4 x <4 x double>], [4 x <4 x double>]* %bc, i32 0, i32 0
> %gep1 = getelementptr [4 x <4 x double>], [4 x <4 x double>]* %bc, i32 0, i32 1
> %gep2 = getelementptr [4 x <4 x double>], [4 x <4 x double>]* %bc, i32 0, i32 2
> %gep3 = getelementptr [4 x <4 x double>], [4 x <4 x double>]* %bc, i32 0, i32 3
Good suggestion!
> DavidKreitzer wrote in X86InterleavedAccess.cpp:119
> You seem to be making unwarranted assumptions about the order of the Shuffles array. Your code is only correct if
>
> Indices[0] = 3
> Indices[1] = 2
> Indices[2] = 1
> Indices[3] = 0
>
> I see nothing in the interleaved access pass that guarantees this ordering.
>
> A better way to write this function would be to populate a new array of shuffles where NewShuffles[0] corresponds to an index of 0, NewShuffles[1] corresponds to an index of 1, etc. Then you can do the shuffle replacement like this:
>
> for (unsigned i = 0; i < Shuffles.size(); i++) {
> unsigned Index = Indices[i];
> Shuffles[i]->replaceAllUsesWith(NewShuffles[Index]);
> }
Well, vectorizer guarantees the order, but certainly I should not have relied on that since any other optimizations can mess up the order.
> DavidKreitzer wrote in X86InterleavedAccess.cpp:146
> Do you really need this test for unique indices? I see no reason not to handle multiple shuffles with the same index.
>
> I also don't see a need for all possible indices to be covered, which you are implicitly requiring by forcing all the indices to be unique and checking that Shuffles.size() == 4 in isSupported. If you remove this requirement, you will automatically handle more cases. For example, suppose we vectorize a loop that accesses only fields a, b, and d of an array of this type of structure:
>
> struct {
> double a, b, c, d;
> };
>
> Your shuffle sequence will still work well for this case. One of the output shuffles will just go unused.
>
> This suggestion has implications on the isSupported routine. You would have to replace the Shuffles.size() == 4 check with Factor == 4.
What about the case where we only have only one strided_load and generating 4 extracts would be more profitable than generating 8 vector instructions, right?
> RKSimon wrote in x86-interleaved-access.ll:1
> Is this actually true? The checks below don't look like what the script would generate.
Hi Simon,
I am not sure whether I understand your concern. Which checks are you talking about?
Farhana
https://reviews.llvm.org/D24681
More information about the llvm-commits
mailing list