[PATCH] D60890: [AArch64] splat before (f)mul to allow mul-by-element isel

Thu Apr 18 16:45:58 PDT 2019

efriedma added a comment.

> The constant cases look better, but I'm not sure if this is a win if both operands are variables.

The key really isn't whether one of the operands is a constant; it's whether the operand is free or cheap to splat.  Constants are usually free to splat.  A splat is free to splat... although I guess that's unlikely to come up in practice given other optimizations.  A load is cheap, but not free, to splat (the addressing mode for ld1r is very limited, so you're likely adding an extra instruction for address computation).  A loop-invariant operand is likely cheap to splat, but I don't think there's any way to handle that in SelectionDAG at the moment.  And of course, a multiply with an operand that's free to splat is itself free to splat, recursively.  It's probably worth adding a testcase with more than one multiply.
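To make the recursive rule concrete, here is a rough sketch of how such a classification could look. Everything in it is hypothetical: `SplatCost`, `Kind`, `Node`, and `splatCost` are illustrative names, not LLVM APIs, and the three cost tiers are a simplification of the free/cheap distinction above. Letting a multiply inherit the cost of its cheaper operand for the "cheap" tier (not just the "free" tier) is an assumption that goes slightly beyond what the comment states.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical cost tiers for splatting a scalar value; not LLVM's cost model.
enum class SplatCost { Free = 0, Cheap = 1, Full = 2 };

// Hypothetical, heavily simplified node kinds; real SelectionDAG nodes are richer.
enum class Kind { Constant, Splat, Load, Mul, Other };

struct Node {
  Kind kind;
  const Node *lhs; // used only by Mul
  const Node *rhs; // used only by Mul
  Node(Kind k, const Node *l = nullptr, const Node *r = nullptr)
      : kind(k), lhs(l), rhs(r) {}
};

// Classify how expensive it is to get the value of `n` into every vector lane.
SplatCost splatCost(const Node &n) {
  switch (n.kind) {
  case Kind::Constant: // a splatted constant can be materialized directly
  case Kind::Splat:    // already a splat
    return SplatCost::Free;
  case Kind::Load:     // ld1r, possibly plus an address computation
    return SplatCost::Cheap;
  case Kind::Mul:
    assert(n.lhs && n.rhs && "Mul needs two operands");
    // Splat the cheaper operand and multiply by the other one by element,
    // so the multiply costs as much to splat as its cheaper operand.
    // (Assumption: this also holds for the Cheap tier.)
    return std::min(splatCost(*n.lhs), splatCost(*n.rhs));
  case Kind::Other:    // arbitrary variable: needs an explicit dup
    return SplatCost::Full;
  }
  return SplatCost::Full;
}
```

With this model, a chain such as (c * x) * y with constant c is classified as free, which is the "more than one multiply" case a testcase would exercise.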

For two arbitrary variables, it's basically neutral, like you've noted; splatting an operand has the same cost as splatting a result.
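As a scalar model of why the two-variable case is neutral, the sketch below compares the two orderings on a four-lane float vector. It is only an illustration under my reading of the transform (splat one operand, then multiply by an element of the register holding the other); `Lanes`, `splat`, `mulByElement`, and the two wrapper functions are hypothetical helpers, not intrinsics or LLVM code.

```cpp
#include <array>
#include <cstddef>

using Lanes = std::array<float, 4>; // stand-in for a 4 x float vector register

// Broadcast a scalar to every lane (models a dup).
Lanes splat(float s) {
  Lanes v;
  v.fill(s);
  return v;
}

// Multiply every lane of `v` by one selected lane of `m`
// (models a multiply-by-element instruction).
Lanes mulByElement(const Lanes &v, const Lanes &m, std::size_t lane) {
  Lanes out{};
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = v[i] * m[lane];
  return out;
}

// Ordering 1: scalar multiply, then splat the result (one mul + one splat).
Lanes multiplyThenSplat(float a, float b) { return splat(a * b); }

// Ordering 2: splat one operand, then multiply by element (one splat + one mul).
Lanes splatThenMultiplyByElement(float a, const Lanes &bReg, std::size_t lane) {
  return mulByElement(splat(a), bReg, lane);
}
```

Both orderings perform one splat and one multiply and produce the same lanes, so the second only wins when the splat of `a` is free or cheap.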

It's worth noting that at least on some chips, 128-bit multiplies have half the throughput of scalar and 64-bit vector multiplies.  But I don't think that directly affects this patch; you probably wouldn't want to add extra instructions just to make a multiply smaller.

https://reviews.llvm.org/D60890