[PATCH] D97961: [Cost]Canonicalize the cost for logical or/and reductions.

Fri Mar 5 00:45:00 PST 2021

david-arm added inline comments.

================
Comment at: llvm/include/llvm/CodeGen/BasicTTIImpl.h:1902
+      // Or reduction for i1 is represented as:
+      // %val = bitcast <ReduxWidth x i1> to iReduxWidth
+      // %res = cmp ne iReduxWidth %val, 0
----------------
ABataev wrote:
> david-arm wrote:
> > I'm not sure this is always true because some backends (e.g. AArch64) promote i1 to larger integers. The costs for AArch64 still look a bit odd to be honest. I tried them out manually and I observe about 8 instructions for AND reductions using <4 x i1> vectors since we have lots of bytewise moves of -1 into the vector lanes of a <4 x i32> vector.
> This is a known problem, see
> https://bugs.llvm.org/show_bug.cgi?id=41636
> https://bugs.llvm.org/show_bug.cgi?id=41635
> https://bugs.llvm.org/show_bug.cgi?id=41634 
> 
> Looks like the construct is not lowered properly on some targets
Sure, I totally agree the codegen for ARM and AArch64 is awful and I take your point. I was just wondering if this assumption was a problem:

  %val = bitcast <ReduxWidth x i1> to iReduxWidth

as I don't think it is true for targets that promote i1 to i32 or something like that. In the bug linked above (https://bugs.llvm.org/show_bug.cgi?id=41636) even the optimal code is still operating on vectors of i8 types.
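
To make the assumption concrete, here is a minimal standalone sketch of what the canonical form computes when each i1 lane really does occupy a single bit. It is not LLVM code: the names packLanes, reduceOr and reduceAnd are made up for illustration, the lane-to-bit order is an assumption, and the AND form (compare against all-ones) is my reading of the analogous case rather than something quoted from the patch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Models `bitcast <N x i1> to iN`: lane I becomes bit I of the result.
// The lane order is an assumption of this sketch; it does not change the
// result of either reduction below.
template <std::size_t N>
std::uint64_t packLanes(const std::array<bool, N> &Lanes) {
  static_assert(N >= 1 && N <= 64, "sketch only handles 1 to 64 lanes");
  std::uint64_t Bits = 0;
  for (std::size_t I = 0; I < N; ++I)
    if (Lanes[I])
      Bits |= std::uint64_t{1} << I;
  return Bits;
}

// OR reduction: `cmp ne iN %val, 0` (true if any lane is set).
template <std::size_t N>
bool reduceOr(const std::array<bool, N> &Lanes) {
  return packLanes(Lanes) != 0;
}

// AND reduction, assumed form: `cmp eq iN %val, -1` (true if all lanes are set).
template <std::size_t N>
bool reduceAnd(const std::array<bool, N> &Lanes) {
  static_assert(N >= 1 && N <= 64, "sketch only handles 1 to 64 lanes");
  const std::uint64_t AllOnes = ~std::uint64_t{0} >> (64 - N);
  return packLanes(Lanes) == AllOnes;
}
```

On a target that promotes each i1 lane to i8 or i32, the packing step is no longer a free reinterpretation of bits, which is exactly where the bitcast assumption stops matching the generated code.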

I guess targets that do promote i1->iX can come up with their own cost in the target-specific getArithmeticReductionCost, so maybe this isn't really a problem?
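
A toy model of that override pattern, just to illustrate the idea. Everything here is hypothetical: GenericCostModel, PromotingTargetCostModel and getI1ReductionCost are stand-ins rather than LLVM classes or hooks, the generic cost of 2 assumes one unit each for the bitcast and the compare, and the 8 simply reuses the rough instruction count I saw by hand for an AND reduction of <4 x i1>.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; these are not LLVM types.
enum class ReductionOp { Or, And };

struct GenericCostModel {
  virtual ~GenericCostModel() = default;

  // Assumed generic cost of an i1 reduction: one bitcast plus one compare,
  // each counted as a single unit.
  virtual unsigned getI1ReductionCost(ReductionOp Op, unsigned NumLanes) const {
    (void)Op;
    (void)NumLanes;
    return 2;
  }
};

// A target that promotes i1 lanes to wider integers overrides the generic
// answer for the cases it knows are more expensive.
struct PromotingTargetCostModel : GenericCostModel {
  unsigned getI1ReductionCost(ReductionOp Op, unsigned NumLanes) const override {
    // Illustrative number only, taken from a manual instruction count.
    if (Op == ReductionOp::And && NumLanes == 4)
      return 8;
    // Everything else falls back to the generic model.
    return GenericCostModel::getI1ReductionCost(Op, NumLanes);
  }
};
```

With this shape the generic bitcast-plus-compare cost stays the default, and only targets whose lowering differs need to say so.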

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97961/new/

https://reviews.llvm.org/D97961