[llvm] [CodeGen] Avoid sinking vector comparisons during CodeGenPrepare (PR #113158)

Fri Nov 8 01:36:08 PST 2024

david-arm wrote:

> Sorry for the delay I was going through things in a strange order. I had previously only checked the test on AArch64. Do we need to worry about the extra xtn/sshll this introduces? I was going to suggest we just remove them post-isel, but that might require getting the top bits correct which requires a lot of infrastructure that currently doesn't exist.

Do you mean for the `no_sink_simple` case? So the typical use case that we see here is when vectorising a loop with a condition and having predicated blocks that depend upon each element of the result. So when there are 4 or more uses to sink (e.g. `vector_loop_with_icmp`) this definitely gives some significant performance gains, which was the motivation for this patch. You could be right that this isn't worth it when there are only two uses, but I'm not sure how to account for that. If possible I'd prefer to keep the `hasMultiplePredicateRegisters` function as it is because I think it's more useful as a standalone hook rather than being tied to the specifics of sinking a comparison instruction. It looks like `sinkCmpExpression` was never originally intended for sinking vector instructions, and presumably this just started happening by accident for vectors without necessarily doing a cost-benefit analysis. The only clean way I can think of to work around this is by introducing a brand new TLI hook also called `shouldConsiderSinkingCmp` that checks if the number of uses of a fixed-width vector are > 2, e.g.

```
static bool sinkCmpExpression(const DataLayout &DL, CmpInst *Cmp,
                              const TargetLowering &TLI) {
  EVT ResVT = TLI.getValueType(DL, Cmp->getType());
  if (!TLI.shouldConsiderSinkingCmp(Cmp) || TLI.hasMultiplePredicateRegisters(ResVT))
    return false;

  // Avoid sinking soft-FP comparisons, since this can move them into a loop.
  if (TLI.useSoftFloat() && isa<FCmpInst>(Cmp))
    return false;
```

where shouldConsiderSinkingCmp returns true in the default case and for AArch64 does something like this:

```
bool shouldConsiderSinkingCmp(CmpInst *Cmp) {
  auto VecTy = dyn_cast<FixedVectorType>(Cmp->getType());
  return !VecTy || Cmp->getNumUsers() > 2;
}
```

What do you think? Or it could go one step further and look at the users to see if they are extractelement or not, i.e. return false if and only if it's a 2-element vector where each user is an extract?

https://github.com/llvm/llvm-project/pull/113158