[PATCH] D155355: [AArch64] Set maximum vscale VF with shouldMaximizeVectorBandwidth

Mon Jul 17 01:01:26 PDT 2023

Allen added a comment.

I don't have a server with SVE, so I can't measure performance on a large benchmark suite such as SPEC2017.
But when I run LAMMPS with the INTEL package (https://www.lammps.org/#gsc.tab=0) on an emulator, I find that
the VF for the hot function PairLJCutCoulLongIntel::eval (pair_lj_cut_coul_long_intel.cpp:337) is enlarged from 2 to 4.
The kernel loop body contains both float and double types, so choosing a wider VF gives
more parallelism, and performance improves by about 16% (https://github.com/lammps/lammps/blob/develop/src/INTEL/pair_lj_cut_coul_long_intel.cpp#L337).
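
To illustrate why a mixed float/double loop can go from VF 2 to VF 4, here is a minimal sketch. It is not the LAMMPS kernel: `scale_accumulate` and `lanes_per_vector` are hypothetical names, and the 128-bit figure assumes the minimum SVE vector length (the vscale = 1 case). The assumption is that the VF is derived from the widest element type (double) without the patch and from the narrowest (float) with it.

```cpp
#include <cstddef>

// Hypothetical helper: how many elements of a given width fit in one
// vector register of reg_bits bits.
constexpr unsigned lanes_per_vector(unsigned reg_bits, unsigned elem_bits) {
  return reg_bits / elem_bits;
}

// Hypothetical mixed-precision loop: the multiply is done in float,
// the accumulation in double, so both types appear in the loop body.
void scale_accumulate(const float *x, double *acc, std::size_t n,
                      float scale) {
  for (std::size_t i = 0; i < n; ++i) {
    float f = x[i] * scale;            // 32-bit arithmetic
    acc[i] += static_cast<double>(f);  // 64-bit arithmetic
  }
}

// With a 128-bit minimum vector length:
//   VF chosen from the widest type (double):   128 / 64 = 2
//   VF chosen from the narrowest type (float): 128 / 32 = 4
static_assert(lanes_per_vector(128, 64) == 2, "VF based on double");
static_assert(lanes_per_vector(128, 32) == 4, "VF based on float");
```

At VF 4 the float part of the loop fills a whole vector per iteration, while the double part needs two vectors, so whether this is profitable depends on the target's cost model.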

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155355/new/

https://reviews.llvm.org/D155355