[PATCH] D67841: [SLP] avoid reduction transform on patterns that the backend can load-combine

Mon Oct 7 07:13:57 PDT 2019

spatel added a comment.

In D67841#1696910 <https://reviews.llvm.org/D67841#1696910>, @mstorsjo wrote:

> This caused lots of failed asserts in building many different projects, see https://bugs.llvm.org/show_bug.cgi?id=43582, so I went ahead and reverted it for now.

Thanks. I looked at the test cases attached to the bug report, and this patch causes scary behavior:

  LV: Found an estimated cost of 4294967293 for VF 1 For instruction:   %or75 = or i32 %shl74, %shl71

The loop vectorizer assumes that costs are never negative (it converts the value returned by the cost model to an *unsigned* value), so a negative cost wraps around to a huge positive one.
That assumption matches the assert in the getArithmeticInstrCost() implementation that we tried to bypass:

  assert(Cost >= 0 && "TTI should not produce negative costs!");

But we want SLP to weigh the *relative* cost of scalar code (that will be reduced) vs. vector code.
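
For illustration, here is a minimal sketch of how a small negative cost turns into the number in the debug output above. It assumes the caller holds the cost in a 32-bit unsigned variable; asUnsignedCost() and checkObservedCost() are hypothetical helpers for this example, not LLVM functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not LLVM code): mimics a caller that stores a signed
// cost-model result in a 32-bit unsigned variable.
constexpr std::uint32_t asUnsignedCost(std::int32_t SignedCost) {
  // Signed-to-unsigned conversion is modulo 2^32, so negative values wrap.
  return static_cast<std::uint32_t>(SignedCost);
}

// Hypothetical check: a signed cost of -3 becomes 2^32 - 3.
inline void checkObservedCost() {
  assert(asUnsignedCost(-3) == 4294967293u);
}
```

Under that assumption, the reported cost of 4294967293 is consistent with the cost model having returned -3 for the 'or'.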

I think we should use the earlier revision of this patch that created a dedicated function for estimating a load combining pattern. I.e., we tried to squeeze this into the more general getArithmeticInstrCost() API, but it does not belong there. Existing callers rely on the documented behavior of that cost model API, and we violated the contract:

  /// This is an approximation of reciprocal throughput of a math/logic op.
  /// A higher cost indicates less expected throughput.
  /// From Agner Fog's guides, reciprocal throughput is "the average number of
  /// clock cycles per instruction when the instructions are not part of a
  /// limiting dependency chain."
  /// Therefore, costs should be scaled to account for multiple execution units
  /// on the target that can process this type of instruction. For example, if
  /// there are 5 scalar integer units and 2 vector integer units that can
  /// calculate an 'add' in a single cycle, this model should indicate that the
  /// cost of the vector add instruction is 2.5 times the cost of the scalar
  /// add instruction.
  /// \p Args is an optional argument which holds the instruction operands
  /// values so the TTI can analyze those values searching for special
  /// cases or optimizations based on those values.
  int getArithmeticInstrCost(
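
The scaling example in the quoted comment can be worked through as a small sketch. It assumes every unit can start one 'add' per cycle; reciprocalThroughput() and relativeVectorCost() are hypothetical names for this example, not part of TTI.

```cpp
#include <cassert>

// Hypothetical helper: average cycles per instruction when NumUnits identical
// units can each start one such instruction per cycle.
constexpr double reciprocalThroughput(unsigned NumUnits) {
  return 1.0 / NumUnits;
}

// Hypothetical helper: cost of the vector op relative to the scalar op.
// (1 / VectorUnits) / (1 / ScalarUnits) simplifies to ScalarUnits / VectorUnits.
constexpr double relativeVectorCost(unsigned ScalarUnits, unsigned VectorUnits) {
  return static_cast<double>(ScalarUnits) / VectorUnits;
}
```

With 5 scalar units and 2 vector units, relativeVectorCost(5, 2) gives 2.5, the ratio stated in the comment. Note that a throughput-based cost like this is never negative.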

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67841/new/

https://reviews.llvm.org/D67841