[PATCH] D81397: [ARM] Better reductions

Tue Jun 9 07:05:57 PDT 2020

dmgreen marked an inline comment as done.
dmgreen added inline comments.

================
Comment at: llvm/lib/Target/ARM/ARMISelLowering.cpp:9322
+  // Use Mul(X, Rev(X)) until 4 items remain
+  while (NumActiveLanes > 4) {
+    unsigned RevOpcode = NumActiveLanes == 16 ? ARMISD::VREV16 : ARMISD::VREV32;
----------------
samparker wrote:
> So, why 4? Is this beat and/or register pressure related? If this is beat related, shouldn't the subtarget be controlling this?
The realistic options are going down to 2 lanes or to 4. 4 seemed best on the test I ran it on, especially for float. At 4 lanes you reach the point where each lane can be extracted independently, which is important for fp16 because it avoids needing any vmovx instructions.
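For anyone following along, here is a rough lane-level model of the loop quoted above, written as plain C++ rather than SelectionDAG code. It is a sketch under assumptions, not the patch itself: `vrev` and `reduceByRev` are hypothetical helpers, a vector is modelled as a `std::vector` of lanes (lane 0 lowest), and `op` stands for whatever associative, commutative operator is being reduced. It mirrors the quoted logic of using VREV16 while 16 lanes are active and VREV32 otherwise, then combining the 4 surviving lanes as scalars.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of VREV16/VREV32/VREV64: reverse the lanes inside each
// chunkBits-wide chunk of the vector. lanes[0] is the lowest lane.
template <typename T>
std::vector<T> vrev(const std::vector<T>& x, unsigned chunkBits) {
  const std::size_t group = chunkBits / (sizeof(T) * 8);  // lanes per chunk
  std::vector<T> r(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    const std::size_t base = (i / group) * group;
    r[i] = x[base + (group - 1 - i % group)];
  }
  return r;
}

// Reduce all lanes with an associative, commutative op (sketch only).
// Step 1: X = op(X, Rev(X)) until 4 active lanes remain.
// Step 2: the 4 active lanes sit `stride` lanes apart; combine them as scalars.
template <typename T, typename Op>
T reduceByRev(std::vector<T> x, Op op) {
  std::size_t numActive = x.size();
  while (numActive > 4) {
    // Same choice as the patch: VREV16 at 16 active lanes, VREV32 otherwise.
    const std::vector<T> rev = vrev(x, numActive == 16 ? 16u : 32u);
    for (std::size_t i = 0; i < x.size(); ++i) x[i] = op(x[i], rev[i]);
    numActive /= 2;
  }
  const std::size_t stride = x.size() / numActive;
  T acc = x[0];
  for (std::size_t i = 1; i < numActive; ++i) acc = op(acc, x[i * stride]);
  return acc;
}
```

In this model an 8 x 16-bit input ends with its 4 partial results in lanes 0, 2, 4 and 6, i.e. the low half of each 32-bit element, which is presumably why no vmovx is needed for fp16 at that point.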

For integer it's probably closer. Going down to 2 would be fewer instructions, but there wasn't much difference in performance: some sizes/operators were quicker, some were slower by a cycle or 2. Integer reductions are also likely to be much rarer than float ones. We could go down to 2 with a vrev64, but as you said, that would cross a beat boundary.
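To make the beat-boundary point concrete, a small sketch assuming the usual MVE model of a 128-bit vector executed as four 32-bit beats. `vrevCrossesBeat` is a hypothetical helper, not an LLVM API; it checks whether a VREV of a given chunk size moves any lane into a different beat.

```cpp
// Assumption: a 128-bit MVE vector is processed as four 32-bit beats.
constexpr unsigned kVectorBits = 128;
constexpr unsigned kBeatBits = 32;

// Hypothetical helper: true if VREV<chunkBits> on elemBits-wide lanes moves
// any lane's data into a different 32-bit beat.
constexpr bool vrevCrossesBeat(unsigned elemBits, unsigned chunkBits) {
  const unsigned lanes = kVectorBits / elemBits;
  const unsigned group = chunkBits / elemBits;          // lanes per reversed chunk
  const unsigned lanesPerBeat = kBeatBits / elemBits;   // lanes per beat
  for (unsigned i = 0; i < lanes; ++i) {
    const unsigned src = (i / group) * group + (group - 1 - i % group);
    if (src / lanesPerBeat != i / lanesPerBeat) return true;
  }
  return false;
}
```

In this model VREV16 and VREV32 only shuffle lanes within a beat, while any VREV64 (for example on 32-bit lanes, as going down to 2 would need) pulls data from the neighbouring beat.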

I'd prefer not to add a subtarget hook until we actually find that we need it.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81397/new/

https://reviews.llvm.org/D81397