[PATCH] D122148: [SLP] Peek into loads when hitting the RecursionMaxDepth

Mon Apr 4 07:44:23 PDT 2022

dmgreen added a comment.

I've tried SPEC on AArch64, which was one of the places that showed the need for something like this. On AArch64, together with D122145 <https://reviews.llvm.org/D122145>, this helps one of the routines in x264 enough to speed up the whole benchmark. The exact speedup is quite dependent on the ordering of shuffles chosen, and may need some more work to come out as good as it can. The introduction of select shuffles wasn't great for targets without them. We are talking about something like a 5% improvement.

I've been trying to run X86 too, but I don't have a great setup for it. On what I think is some sort of "Xeon 6148" machine, these were the performance scores I saw from running SPEC 2017 with -march=native (again with this and D122145 <https://reviews.llvm.org/D122145>):

  500.perlbench_r	0.829412281
  502.gcc_r	0.398122431
  505.mcf_r	-0.352758469
  520.omnetpp_r	0.256906516
  523.xalancbmk_r	-0.789195872
  525.x264_r	0.198126785
  531.deepsjeng_r	-0.024245574
  541.leela_r	0.003397322
  557.xz_r	-0.722254612

They can be quite noisy though, I'm afraid, even with those results averaged over three runs. I wouldn't be surprised if all those changes were down to machine noise or knock-on alignment changes. We have a better setup for AArch64 than we do for X86, but there is always some noise.

I tried SPEC 2006 on AArch64 too. According to the file hashes, only the perlbench binary changed there, and its performance was the same.
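
A minimal sketch of that kind of hash comparison, assuming the baseline and patched binaries sit in two directories under the same file names. The directory layout and function names are hypothetical, and this is not necessarily how the check above was done:

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_binaries(baseline_dir: Path, patched_dir: Path) -> list[str]:
    """List file names present in both directories whose contents differ."""
    changed = []
    for base in sorted(baseline_dir.iterdir()):
        if not base.is_file():
            continue
        other = patched_dir / base.name
        # Files missing from the patched build are skipped, not reported.
        if other.is_file() and file_sha256(base) != file_sha256(other):
            changed.append(base.name)
    return changed
```

Benchmarks whose binaries hash identically cannot have been affected by the patch, so any score difference for them is noise.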

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D122148/new/

https://reviews.llvm.org/D122148