[llvm-dev] Data structure improvement for the SLP vectorizer

Shahid, Asghar-ahmad via llvm-dev llvm-dev at lists.llvm.org
Thu Mar 16 21:19:51 PDT 2017



> -----Original Message-----
> From: Keno Fischer [mailto:keno at juliacomputing.com]
> Sent: Thursday, March 16, 2017 6:11 PM
> To: Shahid, Asghar-ahmad <Asghar-ahmad.Shahid at amd.com>
> Cc: llvm-dev at lists.llvm.org; Michael Kuperstein <mkuper at google.com>;
> Matthias Braun <matze at braunis.de>
> Subject: Re: Data structure improvement for the SLP vectorizer
> 
> On Thu, Mar 16, 2017 at 1:11 AM, Shahid, Asghar-ahmad <Asghar-ahmad.Shahid at amd.com> wrote:
> >
> > Here, %load should be a 4-element load instead of 2, and you can still do
> > the required broadcast with the above shuffle. Why this is important is
> > that with our scheme of having a DAG with masks on edges to a single
> > tree node, generating a vector load of lesser length than the chosen vector
> > factor will be problematic.
> 
> Could you elaborate why you think this is? There doesn't seem to be a problem to
> me with having one 2-element bundle and then putting (0,1,0,1) on the edge to a
> 4-element bundle, but I may be missing something.

That is because the number of elements in a bundle is decided by the chosen vector factor (VF).
The VF is the target vector register length divided by the size of the data type used. If it is
4, say, then trees of different lengths are built, profitability is checked, and the profitable
tree gets vectorized with the chosen VF.
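
As a rough illustration of the VF calculation described above, here is a minimal sketch. The function name and the 128-bit / 32-bit figures are assumptions chosen for the example, not values taken from the SLP vectorizer itself.

```python
def vector_factor(register_bits: int, element_bits: int) -> int:
    """Number of elements of the given width that fit in one vector register."""
    if register_bits <= 0 or element_bits <= 0:
        raise ValueError("widths must be positive")
    return register_bits // element_bits


# Assumed example: a 128-bit vector register holding 32-bit elements.
vf = vector_factor(128, 32)
print(vf)
```

With these assumed widths the bundle size comes out as 4, which is the case discussed in this thread.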

Another way to look at it: if you need to use a loaded vector both at full length and partially, reloading the same values for a smaller number of elements is not efficient. Instead, it is fine to take just the required values from the fully loaded vector using a shuffle.
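
To make the point concrete, here is a small model of the idea, assuming a single-input shuffle that simply picks lanes by index. The lane names a0..a3 and the helper `shuffle` are hypothetical; only the (0,1,0,1) mask comes from the discussion above.

```python
def shuffle(vec, mask):
    """Pick lanes of vec by index, in the spirit of a single-input vector shuffle."""
    return [vec[i] for i in mask]


# One full-width load of 4 elements (VF = 4); lane names are placeholders.
loaded = ["a0", "a1", "a2", "a3"]

# Full-length use: identity mask on the edge.
full_use = shuffle(loaded, [0, 1, 2, 3])

# Partial use: broadcast the first two lanes with the (0,1,0,1) mask,
# reusing the same load instead of issuing a second 2-element load.
broadcast_use = shuffle(loaded, [0, 1, 0, 1])

print(full_use)
print(broadcast_use)
```

Both users are fed from the same 4-element load, and only the mask on the edge differs.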

Regards,
Shahid

> 
> Thanks,
> Keno

