[PATCH] D39415: [ARMISelLowering] Better handling of NEON load/store for sequential memory regions

Tue Oct 31 02:00:58 PDT 2017

rengolin added a comment.

In https://reviews.llvm.org/D39415#911554, @evgeny777 wrote:

> With patch:
>
>   real	1m40.086s
>
>
> W/o patch:
>
>   real	1m40.619s
>

@evgeny777, while this is encouraging, you're still only running the one case where you know this is good.

> Those measurements were done with this bash script:
> 
>   #!/bin/bash
>   LLC=/data/llvm/build_ninja_Release/bin/llc
>   for ((i=1;i<=10000;i++)); do 
>      $LLC -mtriple=arm-eabi -float-abi=soft -mattr=+neon mat_mul_4x4.ll 
>   done

This is very far from ideal, as kernel caching, bash and whatever running on your current system could be affecting your timing more than the difference itself.

> An interesting fact is that execution of patched llc is stably slightly less than
>  that of non-patched version (both were run 3 times in a row). Not sure what the reason 
>  is (may be less number of SD nodes after DAGCombine). My machine specs are:
>  Core-i5 2500K, 8GB RAM Ubuntu 16.04

Honestly, I don't think your results carry any statistical significance, the variability of the runs could be due to too many factors. I wouldn't worry about those fluctuations right now.

> With patch:
> 
>   MI scheduler: 2549066 usec
>   SD scheduler: 2647092 usec
> 
> 
> W/o patch:
> 
>   MI scheduler: 3039261 usec
>   SD scheduler: 2843175 usec
> 
> 
> We're using MI scheduler model added in https://reviews.llvm.org/D28152. With SD scheduler improvement is smaller, but still
>  noticable.

Again, this is the case that was tailored and reduced, so of course, the numbers will look very inflated.

What ARM machine have you ran these? This makes a huge difference. From your comments, I'm guessing this is ARMv8, which is by far *not* the common case for AArch32.

Until you can show results on real benchmarks on at least one core for both compile and run times, there isn't much else we can do.

Once that's available, I'd encourage other people to do the same on their cores, as well as have a deeper look into the code.

>From @javed.absar's comment, I'm not the only one finding this patch non-intuitive, even with the good amount of comments.

cheers,
--renato

Repository:
  rL LLVM

https://reviews.llvm.org/D39415