[llvm-dev] [arm, aarch64] Alignment checking in interleaved access pass

Thu Oct 6 11:58:11 PDT 2016

All,

Gentle reminder in the hopes of getting some answers to the questions in
the original email.

Thank you,
Alina

On Mon, Sep 19, 2016 at 1:52 PM, Alina Sbirlea <alina.sbirlea at gmail.com>
wrote:

> Hi,
>
> As a follow up to Patch D23646 <https://reviews.llvm.org/D23646>, I'm
> trying to figure out if there should be an alignment check and what the
> correct approach is.
>
> Some background:
> For stores, the pass turns:
> %i.vec = shuffle <8 x i32> %v0, <8 x i32> %v1,
>                  <0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11>
> store <12 x i32> %i.vec, <12 x i32>* %ptr
> Into:
> %sub.v0 = shuffle <8 x i32> %v0, <8 x i32> %v1, <0, 1, 2, 3>
> %sub.v1 = shuffle <8 x i32> %v0, <8 x i32> %v1, <4, 5, 6, 7>
> %sub.v2 = shuffle <8 x i32> %v0, <8 x i32> %v1, <8, 9, 10, 11>
> call void llvm.aarch64.neon.st3(%sub.v0, %sub.v1, %sub.v2, %ptr)
>
> The purpose of the above patch is to enable more general patterns such as
> turning:
> %i.vec = shuffle <32 x i32> %v0, <32 x i32> %v1,
>                 <4, 32, 16, 5, 33, 17, 6, 34, 18, 7, 35, 19>
> store <12 x i32> %i.vec, <12 x i32>* %ptr
> Into:
> %sub.v0 = shuffle <32 x i32> %v0, <32 x i32> %v1, <4, 5, 6, 7>
> %sub.v1 = shuffle <32 x i32> %v0, <32 x i32> %v1, <32, 33, 34, 35>
> %sub.v2 = shuffle <32 x i32> %v0, <32 x i32> %v1, <16, 17, 18, 19>
> call void llvm.aarch64.neon.st3(%sub.v0, %sub.v1, %sub.v2, %ptr)
>
> The question I'm trying to get answered is whether there should have been
> an alignment check in the original pass and, similarly, whether there
> should be an expanded one for the more general pattern.
> In the example above, I was looking to check if the data at positions 4,
> 16, 32 is aligned, but I cannot get a clear picture on the impact on
> performance (hence the side question below).
> Also, some preliminary alignment checks I added break some ARM tests (but
> not their AArch64 counterparts). The cause is that
> allowsMisalignedMemoryAccesses reports the access as "not fast", which
> comes from its hasV7Ops check.
> I'd appreciate some guidance on how best to address and analyze this.
>
> Side question for Tim and other ARM folks: could I get a recommendation on
> reading material for performance tuning for the different ARM archs?
>
> Thank you,
> Alina
>
>
>
>
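
For anyone following the mask arithmetic in the quoted examples, here is a
minimal Python sketch of how a factor-3 interleave mask decomposes into
sub-vector start indices. This is not LLVM code, and the names
split_interleaved_mask and starts_aligned are hypothetical. The second helper
shows only one possible reading of the alignment check: it assumes "aligned"
means that each start index, scaled by the element size, is a multiple of a
chosen byte alignment. The pass may well need a different definition.

```python
def split_interleaved_mask(mask, factor):
    """Return the start index of each sub-vector, or None.

    A mask is accepted when element (lane * factor + field) equals
    starts[field] + lane, i.e. each field is a run of consecutive
    indices taken from the concatenated source vectors.
    """
    if factor < 2 or len(mask) == 0 or len(mask) % factor != 0:
        return None
    lanes = len(mask) // factor
    # The first `factor` mask elements give the start of each sub-vector.
    starts = list(mask[:factor])
    for lane in range(lanes):
        for field in range(factor):
            if mask[lane * factor + field] != starts[field] + lane:
                return None
    return starts


def starts_aligned(starts, elem_bytes, align_bytes):
    """Assumed check: every start offset (in bytes) is a multiple of
    align_bytes. This is an illustration, not the pass's actual rule."""
    return all((s * elem_bytes) % align_bytes == 0 for s in starts)
```

With the two masks quoted above and a factor of 3, the start indices come out
as [0, 4, 8] and [4, 32, 16]. For i32 elements and an assumed 16-byte
alignment, the byte offsets 16, 128 and 64 all pass the second check.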