[PATCH] D30680: new method TargetTransformInfo::supportsVectorElementLoadStore() for LoopVectorizer

Tue Apr 4 21:11:41 PDT 2017

anemet added a comment.

In https://reviews.llvm.org/D30680#713835, @jonpa wrote:

> In https://reviews.llvm.org/D30680#713268, @anemet wrote:
>
> > Sorry about the delay on this, but I was working on something related for ARM that may benefit from this as well.  What I need for ARM is something that can communicate to the SLPVectorizer that load-pair and store-pair (of two registers) are efficiently supported on the target.  I am wondering if we can combine the two things if your new hook would take the type and the vectorization width.
> >
> > What do you think?
>
>
> Is this also in the context of scalarizing a load / store?
>
> For SystemZ, a scalarized memory access will have to do VF memory operations, but there is no need to extract or insert any of the data elements, as there are vector element load/store instructions.

We have something like this on ARM too.  ld1 can load any element of a vector (e.g. ld1.s {v1}[1], [x1] loads lane 1 of vector reg v1) and st1 can store any element.  That said, ld1 is still a partial write of the vector register, so in terms of performance it's worse than a regular load, which is a full write.  I think that modeling its cost as a load + insert (for non-zero-lane) is fairly accurate.  Doesn't this match the situation on SystemZ?
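
To make the comparison concrete, here is a minimal sketch of the two ways a scalarized vector load could be costed.  Everything in it is hypothetical: the names (UnitCosts, scalarizedLoadCost, FreeLaneAccess) and the unit costs of 1 are assumptions for illustration only and are not part of TargetTransformInfo or of this patch.

```cpp
// Hypothetical per-target unit costs (illustrative values, not real TTI numbers).
struct UnitCosts {
  unsigned ScalarLoad = 1;    // one scalar or single-lane memory operation
  unsigned InsertElement = 1; // inserting a scalar into a vector lane
};

// Cost of scalarizing a vector load into VF per-element loads.
//
// FreeLaneAccess == true:  the target loads directly into any lane with no
//   extra penalty, so the cost is just VF memory operations (the SystemZ
//   situation as described in the quoted comment).
// FreeLaneAccess == false: a lane load is a partial register write, modeled
//   as load + insert for every non-zero lane (the ARM ld1 model suggested
//   above).
unsigned scalarizedLoadCost(unsigned VF, const UnitCosts &C,
                            bool FreeLaneAccess) {
  unsigned Cost = 0;
  for (unsigned Lane = 0; Lane < VF; ++Lane) {
    Cost += C.ScalarLoad;
    if (!FreeLaneAccess && Lane != 0)
      Cost += C.InsertElement;
  }
  return Cost;
}
```

With these assumed unit costs and VF = 4, the free-lane model gives 4 and the load + insert model gives 7, which is the gap the question about SystemZ is really asking about.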

https://reviews.llvm.org/D30680