[llvm-dev] [arm, aarch64] Alignment checking in interleaved access pass

Renato Golin via llvm-dev llvm-dev at lists.llvm.org
Fri Oct 14 07:38:31 PDT 2016


On 10 October 2016 at 22:16, Alina Sbirlea <alina.sbirlea at gmail.com> wrote:
> IMO, it makes sense to have Halide generate this instead:
> %114 = shufflevector <16 x i32> %112, <16 x i32> %113, <16 x i32> <i32 0, i32 8, i32 16, i32 24, i32 1, i32 9, i32 17, i32 25, i32 2, i32 10, i32 18, i32 26, i32 3, i32 11, i32 19, i32 27>
> store <16 x i32> %114, <16 x i32>* %sunkaddr262
> %115 = shufflevector <16 x i32> %112, <16 x i32> %113, <16 x i32> <i32 4, i32 12, i32 20, i32 28, i32 5, i32 13, i32 21, i32 29, i32 6, i32 14, i32 22, i32 30, i32 7, i32 15, i32 23, i32 31>
> store <16 x i32> %115, <16 x i32>* %scevgep241
> With the changes from the patch, this translates to the code above, and it
> is arch independent.

Right, this makes sense.

This should generate two VST4/ST4 instructions, which together will be contiguous,
but not individually.
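To make the lane arithmetic concrete, here is a small Python model (a sketch, not anything taken from the patch or from LLVM itself). The helper names interleave_mask(), st4() and shuffle() are hypothetical; st4() follows the element order of an AArch64 ST4 of four 4 x i32 registers (element 0 of each register, then element 1, and so on). It checks that each quoted mask is exactly one such store of the low or high four lanes of four 8-element vectors, and that only the two masks concatenated give the full factor-4 interleave. Whether the two stores are also contiguous in memory depends on %scevgep241 pointing 16 elements past %sunkaddr262, which is assumed here rather than shown in the IR.

```python
# Sketch: model the two shufflevector masks quoted above and check that each
# one is what a single ST4 of four 4 x i32 registers would write.
# interleave_mask(), st4() and shuffle() are hypothetical helpers.


def interleave_mask(factor, lanes, stride, start):
    """Indices that interleave `factor` sub-vectors beginning at 0, stride,
    2*stride, ..., taking `lanes` elements from each, starting at `start`."""
    return [k * stride + start + i for i in range(lanes) for k in range(factor)]


def st4(regs):
    """Element order written by ST4 {r0, r1, r2, r3}:
    r0[0], r1[0], r2[0], r3[0], r0[1], r1[1], ..."""
    assert len(regs) == 4 and len({len(r) for r in regs}) == 1
    return [r[j] for j in range(len(regs[0])) for r in regs]


def shuffle(src, mask):
    """shufflevector semantics on the concatenation of both operands."""
    return [src[m] for m in mask]


# Lane numbers of %112 (0..15) followed by %113 (16..31).
src = list(range(32))
# Treat the 32 lanes as four 8-element vectors A, B, C, D to be interleaved.
parts = [src[8 * k : 8 * k + 8] for k in range(4)]

mask_114 = interleave_mask(factor=4, lanes=4, stride=8, start=0)
mask_115 = interleave_mask(factor=4, lanes=4, stride=8, start=4)

# Each store is one ST4 of the low (or high) four lanes of A, B, C and D...
assert shuffle(src, mask_114) == st4([p[0:4] for p in parts])
assert shuffle(src, mask_115) == st4([p[4:8] for p in parts])
# ...and only together do they form the full factor-4 interleave of A..D.
assert mask_114 + mask_115 == interleave_mask(factor=4, lanes=8, stride=8, start=0)

print(mask_114)  # [0, 8, 16, 24, 1, 9, 17, 25, 2, 10, 18, 26, 3, 11, 19, 27]
print(mask_115)  # [4, 12, 20, 28, 5, 13, 21, 29, 6, 14, 22, 30, 7, 15, 23, 31]
```

The model only tracks lane indices, not addresses or alignment, so it says nothing about whether either store is aligned; it just shows why each instruction covers half of every source vector.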


> Yes, I did that with some of the code generated by Halide; it's what led to
> patch D23646 to extend the patterns. The new code being generated is the
> "expected" one.

I have added some comments on the review, but overall I think it
makes sense, and it's a much simpler patch than I expected to find
working all the way to the end. :)


> Also, benchmarking some of their apps showed that LLVM's pass (after the
> patch) does the job as well as the custom code generation they were using
> before. (Note that Halide's code generation was written before the
> interleaved access pass was added, so it made sense at the time.)

Nice!

cheers,
--renato

