[PATCH] D131047: [AArch64] Change aarch64_neon_pmull{,64} intrinsic ISel through a new SDNode.

Wed Aug 17 00:55:35 PDT 2022

mingmingl added a comment.

In D131047#3725210 <https://reviews.llvm.org/D131047#3725210>, @dmgreen wrote:

> In D131047#3725122 <https://reviews.llvm.org/D131047#3725122>, @mingmingl wrote:
>
>> In D131047#3723365 <https://reviews.llvm.org/D131047#3723365>, @dmgreen wrote:
>>
>>> Can you upload with full context?
>>
>> Ah, I should have made it more explicit that the context above was all my findings.
>
> I meant update the patch with -U9999999 as per https://llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface. It makes the reviews easier to read.

Got it. Used "git show HEAD -U999999 > mypatch.patch" this time.

>>> If this is relying on DUPLANE(SCALAR_TO_VEC) not simplifying into DUP, that might not always be true as more optimizations are added in the future.
>>
>> I didn't realize the change relies on the miss of `duplane(scalar_to_vector)->dup`. Thanks for the catch!
>>
>>> It may be better to canonicalize all PMULL64 to use v1i64 vectors, and use EXTRACT_SUBVECTOR from vectors. It may require adding a new PMULL64 node if the intrinsics require i64 inputs, but it should allow ld1r to be created as opposed to load+dup.
>>
>> Re "canonicalize all PMULL64 to use v1i64 and adding a PMULL64 node"  does this mean adding a aarch64-specific SelectionDAG node for PMULL64 (llvm doc <https://github.com/llvm/llvm-project/blob/556efdba8569f748fd805e30beeae8afa2d6a343/llvm/lib/Target/AArch64/AArch64InstrInfo.td#L5238>), like what SMULL/UMULL does? This sounds promising and I will make a change. Let me know if I misunderstand anything, thanks for all the guidance!
>
> Yeah, a AArch64ISD::PMULL64 node. It can hopefully make the intent that the inputs are i64's in vector registers easier to specify.

Done with two things to mention

1. While only one operand is a higher-half, the other operand is legalized with DUP.
  - If the other operand is a lower-half, DUPLANE64 is better. Use a FIXME assuming `aarch64_neon_pmull64(higher-half, lower-half)` is not common. This is feasible to fix (using Optional<uint64_t> to represent lane1, lane0 or not SIMD) but just add a little more code complexity.
2. It seems to me that `ld1r` requires the base address to be a GPR so address with offset becomes another add instruction; while with ldr (small) offsets could be folded into instruction itself. So use `ISD::SCALAR_TO_VECTOR` for `i64->v1i64` rather than `ISD::AArch64Dup`.

Let me know if I miss anything in 1) or 2). Thanks!

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D131047/new/

https://reviews.llvm.org/D131047