[PATCH] D103799: [CostModel] Express cost(urem) as cost(div+mul+sub) when set to Expand.

Tue Jul 6 09:26:49 PDT 2021

sdesmalen added a comment.

I had another look at the costs for X86.

These are a factor three bigger because the cost of a scalar REM is now calculated to be expanded to div+sub+mul rather than the default `1` for 'unknown', which is then multiplied in `X86TTIImpl::getArithmeticInstrCost` by an arbitrary `20 * LT.first * LT.second.getVectorNumElements()`. The comment of that code (https://github.com/llvm/llvm-project/blob/6f3f9535fcafcde11d3b3ef72fdc0f357813e9da/llvm/lib/Target/X86/X86TargetTransformInfo.cpp#L1044) says:

  // It is not a good idea to vectorize division. We have to scalarize it and
  // in the process we will often end up having to spilling regular
  // registers. The overhead of division is going to dominate most kernels
  // anyways so try hard to prevent vectorization of division - it is
  // generally a bad idea. Assume somewhat arbitrarily that we have to be able
  // to hide "20 cycles" for each lane.

The options are to:

1. Tweak this number '20' to keep the cost similar to what they are now (that would require special-casing the code in X86TTI for UREM/SREM, since the condition is currently bundled with SDIV/UDIV).
2. Leave it as-is, since the values are already largely arbitrary an arbitrary number and the higher number satisfies the goal of preventing vectorization of division.

Does anyone have a suggestion and/or preference?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103799/new/

https://reviews.llvm.org/D103799