[PATCH] D131047: [AArch64] Add a tablegen pattern to transform duplane(scalar_to_vector(x),0) to dup(x), and vectorize scalar operands for aarch64.neon.pmull64 intrinsic

Tue Aug 16 00:18:30 PDT 2022

dmgreen added a comment.

In D131047#3725122 <https://reviews.llvm.org/D131047#3725122>, @mingmingl wrote:

> In D131047#3723365 <https://reviews.llvm.org/D131047#3723365>, @dmgreen wrote:
>
>> Can you upload with full context?
>
> Ah, I should have made it more explicit that the context above was all my findings.

I meant update the patch with -U9999999 as per https://llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface. It makes the reviews easier to read.

>> If this is relying on DUPLANE(SCALAR_TO_VEC) not simplifying into DUP, that might not always be true as more optimizations are added in the future.
>
> I didn't realize the change relies on the miss of `duplane(scalar_to_vector)->dup`. Thanks for the catch!
>
>> It may be better to canonicalize all PMULL64 to use v1i64 vectors, and use EXTRACT_SUBVECTOR from vectors. It may require adding a new PMULL64 node if the intrinsics require i64 inputs, but it should allow ld1r to be created as opposed to load+dup.
>
> Re "canonicalize all PMULL64 to use v1i64 and adding a PMULL64 node"  does this mean adding a aarch64-specific SelectionDAG node for PMULL64 (llvm doc <https://github.com/llvm/llvm-project/blob/556efdba8569f748fd805e30beeae8afa2d6a343/llvm/lib/Target/AArch64/AArch64InstrInfo.td#L5238>), like what SMULL/UMULL does? This sounds promising and I will make a change. Let me know if I misunderstand anything, thanks for all the guidance!

Yeah, a AArch64ISD::PMULL64 node. It can hopefully make the intent that the inputs are i64's in vector registers easier to specify.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D131047/new/

https://reviews.llvm.org/D131047