[llvm-dev] Data structure improvement for the SLP vectorizer

Shahid, Asghar-ahmad via llvm-dev llvm-dev at lists.llvm.org
Tue Mar 14 21:34:46 PDT 2017


Thanks Keno for your interest. My response inlined below.

> -----Original Message-----
> From: Keno Fischer [mailto:keno at juliacomputing.com]
> Sent: Wednesday, March 15, 2017 6:31 AM
> To: llvm-dev at lists.llvm.org
> Cc: Shahid, Asghar-ahmad <Asghar-ahmad.Shahid at amd.com>; Michael
> Kuperstein <mkuper at google.com>; Matthias Braun <matze at braunis.de>
> Subject: Data structure improvement for the SLP vectorizer
> 
> There was some discussion of this on the llvm-commits list, but I wanted to
> raise the topic for discussion here. The background of the -commits
> discussion was that r296863 added the ability to sort memory access when
> the SLP vectorizer reached a load (the SLP vectorizer starts at a store or some
> other sink, and tries to go up the tree vectorizing as it goes along - if the input
> is in a different order than the output, you need a shufflevector somewhere
> in the middle).
> However, that commit had to be reverted because the SLP tree is not actually
> a tree and the same load bundle could be used by several different users,
> causing problems with representing the required shuffle on the load bundle
> node (Please add any history I'm missing here).
> 
> Now, I'm jumping into this because I want to teach the vectorizer about a
> similar operation, namely a 2->4 (or 4->8 for AVX512) broadcast. This
> operation also requires a load followed by a shufflevector, so it would be
> nice to use the same representation for both of these.
How do you plan to identify or mark a certain "load" bundle for broadcast,
As these "loads" presumably will not be unordered?

> 
> Now, the primary problem is that we don't really have anywhere to put this
> permutation information. The two solutions seem to be duplicating the
> nodes (and maybe deduplicating them later again when materializing the
> tree), or taking the hit and representing the graph more explicitly, such that
> we can annotate information on the actual edges of the graph.
> 
> The latter seems nicer to me, we'd essentially have a DAG of scalar bundles
> with vector shuffles in between them.

That’s right.

Right now the vectorizer can't really
> handle intermediate vector shuffles at all as far as I can tell (with a few hand
> coded exceptions for common patterns). From my perspective that seems a
> bit odd. AVX512 for example can represent most (all?) shufflevector
> operations in one instruction, so failing to vectorize a large tree because it
> would require one of them seems very problematic.

Do you mean widening the vector shuffles itself?

 This is also related to my
> desire for vector->larger vector broadcasting. Especially when targeting
> AVX512, there can be whole 4-wide subtrees that the SLP vectorizer isn't
> currently looking at, because it can't handle the 4->8 expansion. Whatever
> the new representation, I'd like the SLP vectorizer to be able to handle that.
> 
> I hope that summarizes the problem. My understanding of the situation is
> admittedly incomplete - I came to this because I need the vectorizer to
> correctly handle my narrow-width subtrees, but I'm hoping we can
> collectively come up with a design that would cover all the relevant use cases.

I did not get this part can you please elaborate?

Regards,
Shahid


More information about the llvm-dev mailing list