[PATCH] D132322: [AArch64][SelectionDAG] Optimize multiplication by constant

Mon Aug 22 23:21:00 PDT 2022

dmgreen added a comment.

Hi - I had been looking at mul vs add+shift recently, but had not had the time to get very far and had not got to the important part yet: exactly where and when we should be doing the transform, especially considering all the various cpus.

AArch64 usually handles mul vs add/shift here: https://github.com/llvm/llvm-project/blob/6c6c4f6a9b3ef2d7db937cb78784245ea8a61418/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp#L14465
With a cost-model of sorts here: https://github.com/llvm/llvm-project/blob/6c6c4f6a9b3ef2d7db937cb78784245ea8a61418/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp#L14442

My understanding (but some of this may not be correct) is that:

- The existing cost model is conservative and could do with adjustment, but is there for a reason. It prevents the transform when there is an add/sub user (to create a madd) or if there is a zext/sext operand (to create a umull).
- A good cost model is hard to be precise about, but add costs 1, shift costs 1, and add+shift usually costs 2 but can be 1 for some cpus and constants. mul costs somewhere between 2 and 5, depending on the cpu, size (i32/i64) and the values in the registers. madd costs the same as mul (so the add is free). Newer cpus have a lower cost for mul, especially i64 mul.
- Replacing a mul with 2 instructions is probably good, a mul with 3 instructions is more iffy.
- I was using https://godbolt.org/z/cPTEMnP5x with different C to compare when we do the transform compared to gcc.
- The cost model is certainly worse where the operands lead into a load/store pointer operand, as the add can be folded into the addressing mode so it won't form a madd. Small shifts can sometimes be free again. I was probably planning to alter the existing profitability checks.
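
To make the reasoning in the list above concrete, here is a minimal sketch of that kind of profitability check, written in Python rather than as real SelectionDAG code. Everything in it is an assumption for illustration: the names (`Costs`, `expansion_cost`, `should_expand`) are hypothetical, the default costs are just picked from the ranges quoted above, and only constants of the form 2^n, (2^n + 1) * 2^m and (2^n - 1) * 2^m are considered. It is not what AArch64ISelLowering.cpp does.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Costs:
    """Illustrative per-instruction costs; real values vary by cpu."""
    add: int = 1
    shift: int = 1
    add_shifted: int = 2  # add with a shifted operand; can be 1 on some cpus
    mul: int = 3          # assumed; the discussion puts it between 2 and 5


def is_pow2(n: int) -> bool:
    return n > 0 and (n & (n - 1)) == 0


def expansion_cost(c: int, costs: Costs) -> Optional[int]:
    """Cost of x * c using shifts and adds, or None if c has no short form here."""
    if c <= 0:
        return None
    if c == 1:
        return 0  # nothing to do
    # Split c into odd * 2**tz, where tz is the number of trailing zero bits.
    tz = (c & -c).bit_length() - 1
    odd = c >> tz
    final_shift = costs.shift if tz else 0
    if odd == 1:
        return costs.shift                            # x << tz
    if is_pow2(odd - 1):
        return costs.add_shifted + final_shift        # x + (x << n), then << tz
    if is_pow2(odd + 1):
        return costs.shift + costs.add + final_shift  # (x << n) - x, then << tz
    return None


def should_expand(c: int, costs: Costs = Costs(),
                  has_add_sub_user: bool = False,
                  has_ext_operand: bool = False) -> bool:
    """Expand only if strictly cheaper than mul and no madd/umull would be lost."""
    if has_add_sub_user or has_ext_operand:
        return False  # keep the mul so it can become a madd or umull
    cost = expansion_cost(c, costs)
    return cost is not None and cost < costs.mul
```

With these assumed defaults, `should_expand(9)` is true (one add with a shifted operand, cost 2, against a mul at 3), while `should_expand(18)` is false because the extra shift makes it a tie. Raising the assumed mul cost to 5, or dropping `add_shifted` to 1, flips such borderline cases, which is the per-cpu sensitivity described above.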

So whilst this may be an improvement on some targets over what we have already, a lot of the changes here either look like obvious regressions (cntd/decd/etc), are not obviously regressions or improvements (madd vs add+lsl+add_lsl), or are no longer testing the point of the tests (machine-combiner-madd.ll).

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132322/new/

https://reviews.llvm.org/D132322