[PATCH] D106237: [ISel] Port AArch64 HADD and RHADD to ISel

Fri Jan 21 09:32:33 PST 2022

dmgreen added a comment.

In D106237#3228430 <https://reviews.llvm.org/D106237#3228430>, @craig.topper wrote:

> In D106237#3114240 <https://reviews.llvm.org/D106237#3114240>, @dmgreen wrote:
>
>> In D106237#3112542 <https://reviews.llvm.org/D106237#3112542>, @craig.topper wrote:
>>
>>> Instead of making this part of SimplifyDemandedBits, could you emit (and (sext (hadds X, Y)), 0x7fffffff) for the (lshr (add (sext(X), sext(Y)), 1) case and let the AND be optimized by itself? Or would the transform not be profitable if it doesn't get removed?
>>
>> Yeah that might work. Like https://alive2.llvm.org/ce/z/9pwKEi. I'll have to check if it looks worse anywhere - I think it should be fine in general, possibly minus the cost of materializing the constant.
>
> Did you check on this?

Yeah - AArch64 as a saddl instruction that can perform 'add(sext, sext)' as a single instruction. That with a shift will be better than a `hadd;extend;bic`, and it will look worse if the And is less efficient (possibly having to dup a constant).

With the trunc it will perform OK, I'm not sure what the likelyhood of sext with lshr will be without it. With the trunc it will be one of the most common cases, given how the ashr is converted to a lshr.

Providing that the depth is limited, is there an issue with doing it as a demand bits combine?

================
Comment at: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:957
+      KnownBits = std::min(NumSignedA, NumSignedB) - 1;
+    } else if (NumZeroA >= 2 && NumZeroB >= 2) {
+      IsSigned = false;
----------------
craig.topper wrote:
> ComputeNumSignBits can call computeKnownBits internally and will count leading 0s as sign bits. Does the zero check need to have priority here?
I was trying to keep SRA producing HADDS, and SRL preferring HADDU. I think it should be using the known bits to preferably make the lowest bitwidth hadd.

================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:1351
+    if (HasInt256) {
+      setOperationAction(ISD::AVGCEILU, MVT::v16i16, Legal);
+      setOperationAction(ISD::AVGCEILU, MVT::v32i8,  Legal);
----------------
RKSimon wrote:
> These can be custom lowered on AVX1 targets by splitting, but I'm OK with just a TODO for now.
Thanks. I thought that the natural splitting of vectors would capture that already.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106237/new/

https://reviews.llvm.org/D106237