[PATCH] D39415: [ARMISelLowering] Better handling of NEON load/store for sequential memory regions

Tue Oct 31 01:45:22 PDT 2017

evgeny777 added a comment.

@rengolin

1. Compilation times

With patch:

  real	1m40.086s
  user	0m57.040s
  sys	0m27.444s

W/o patch:

  real	1m40.619s
  user	0m58.120s
  sys	0m25.944s

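These measurements were taken with the following bash script, which runs llc 10,000 times on the same input:

  #!/bin/bash
  # Compile the same IR file 10000 times so that small per-run differences
  # add up to a measurable total.
  LLC=/data/llvm/build_ninja_Release/bin/llc
  for ((i = 1; i <= 10000; i++)); do
    "$LLC" -mtriple=arm-eabi -float-abi=soft -mattr=+neon mat_mul_4x4.ll
  done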

Interestingly, the patched llc consistently runs slightly faster than the unpatched one (each version was run 3 times in a row). I'm not sure why; it may be because fewer SD nodes remain after DAGCombine. My machine specs are:
Core i5-2500K, 8 GB RAM, Ubuntu 16.04
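To put the compile-time totals in proportion, here is a small shell/awk sketch that computes the relative change from the unpatched to the patched build. The `pct_change` helper is hypothetical and exists only for this illustration. The inputs are the figures above converted to seconds (e.g. 1m40.619s = 100.619 s).

```shell
#!/bin/bash
# pct_change OLD NEW: print the change from OLD to NEW as a signed percentage.
pct_change() {
  awk -v old="$1" -v new="$2" 'BEGIN { printf "%+.1f%%\n", (new - old) / old * 100 }'
}

# Unpatched -> patched totals for 10000 llc runs, in seconds.
echo "real: $(pct_change 100.619 100.086)"
echo "user: $(pct_change 58.120 57.040)"
echo "sys:  $(pct_change 25.944 27.444)"

# Average wall-clock time per llc invocation with the patch, in milliseconds.
awk 'BEGIN { printf "per run: %.2f ms\n", 100.086 / 10000 * 1000 }'
```

This gives about -0.5% real, -1.9% user and +5.8% sys time, with each llc invocation taking roughly 10 ms in both builds. The user-time reduction is therefore small, and sys time moved in the opposite direction.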

2. Execution times of the matrix multiplication example (ARMv8, 32-bit) on an ARM Cortex-A57 at 2 GHz:

With patch:

  MI scheduler: 2549066 usec
  SD scheduler: 2647092 usec

W/o patch:

  MI scheduler: 3039261 usec
  SD scheduler: 2843175 usec

We're using the MI scheduler model added in https://reviews.llvm.org/D28152. With the SD scheduler the improvement is smaller, but still
noticeable.
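The size of the improvement can be made concrete with a small shell/awk sketch. The `compare` helper is hypothetical and used only for illustration. Its inputs are the microsecond timings above, and it reports how much less time the patched build takes and the corresponding speedup factor.

```shell
#!/bin/bash
# compare NAME BEFORE AFTER: print the relative time saved and the speedup
# factor (BEFORE / AFTER) for one scheduler; BEFORE/AFTER are in usec.
compare() {
  awk -v name="$1" -v before="$2" -v after="$3" 'BEGIN {
    printf "%s: %.1f%% less time, %.2fx speedup\n", name, (before - after) / before * 100, before / after
  }'
}

compare "MI scheduler" 3039261 2549066
compare "SD scheduler" 2843175 2647092
```

With the figures above, this reports about 16.1% less time (1.19x) for the MI scheduler and 6.9% less time (1.07x) for the SD scheduler. Without the patch the SD-scheduled build is the faster of the two, while with the patch the MI-scheduled build is faster.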

Repository:
  rL LLVM

https://reviews.llvm.org/D39415