[PATCH] D126234: [AArch64] Add support for FMA intrinsics to shouldSinkOperands.

Wed May 25 04:08:21 PDT 2022

dmgreen added a comment.

I think that non-legal types will be legalized to legal types where a splat can still be used for lane indexing.
And it seems like the first two operands of a fma are commutative?
And I don't the splat index can be out of range, if we consider illegal types to be legalized.

Which means we might be able to simplify it to this, maybe with a check for fullfp16 and the the scalar types are f16/f32/f64?

  +    case Intrinsic::fma:
       case Intrinsic::aarch64_neon_sqdmull:
       case Intrinsic::aarch64_neon_sqdmulh:
       case Intrinsic::aarch64_neon_sqrdmulh:
         // Sink splats for index lane variants
         if (isSplatShuffle(II->getOperand(0)))
           Ops.push_back(&II->getOperandUse(0));
         if (isSplatShuffle(II->getOperand(1)))
           Ops.push_back(&II->getOperandUse(1));
         return !Ops.empty();

Note that I have also heard of cases where the lane index fma instructions are slightly slower than a normal fma, as they get cracked into a dup and a fma micro-op. Which would make leaving them outside of the loop a better idea, and might make Machine Combiner a better place for this if it worked. But I don't know where that actually happens, and there doesn't seem to be an obvious place intree where the latencies are different except for one old core and this cyclone difference which doesn't make a lot of sense: https://godbolt.org/z/63eWsEqT1.   So I think sinking them unconditionally is probably fine if it saves the register pressure. We just might need to change that in the future if we find cases where it does matter.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D126234/new/

https://reviews.llvm.org/D126234