[PATCH] D99324: [AArch64][SVE] Simplify codegen of svdup_lane intrinsic

JunMa via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Mar 25 04:49:53 PDT 2021


junparser added a comment.

In D99324#2650130 <https://reviews.llvm.org/D99324#2650130>, @junparser wrote:

> In D99324#2650103 <https://reviews.llvm.org/D99324#2650103>, @paulwalker-arm wrote:
>
>> In D99324#2650100 <https://reviews.llvm.org/D99324#2650100>, @junparser wrote:
>>
>>> In D99324#2650064 <https://reviews.llvm.org/D99324#2650064>, @paulwalker-arm wrote:
>>>
>>>> I'm not saying all the pieces will come for free but this feels like an intrinsic optimisation problem rather than an instruction selection one.  What about extending SVEIntrinsicOpts.cpp to convert the pattern to a stock `splat_vector(extract_vector_elt(vec, idx))` and then letting the code generator decide how best to lower the LLVM way of doing things.  This'll mean we solve the problem once for ACLE and auto-vectorisation.
>>>
>>> Actually, it is an isel issue; the svdup_lane in the title is just where I found it.
>>> 1) There is no intrinsic that maps directly to the dup (indexed) instruction; vector_extract may lower to dup (indexed), but that alone is not enough. 2) The svdup_lane ACLE intrinsic is emitted as sve.dup.x + sve.tbl in LLVM IR, which is converted to AArch64tbl ( ... splat_vector(..., constant)) and then lowered to AArch64tbl ( ... DUP(..., imm)). That is the pattern this patch tries to match.
>>
>> Sure, I understand that.  But the problem of good code generation for duplicating a vector lane seems like a generic one, and thus we can solve that first.  Then we can canonicalise ACLE-related intrinsic patterns to stock LLVM IR and thus not require multiple solutions to the same problem.  In the future this will also have the benefit of allowing other stock LLVM transforms to kick in that would otherwise not understand the SVE-specific intrinsics.
>
> OK, I understand your point. splat_vector(extract_vector_elt(vec, idx)) looks OK to me. Why do you prefer to do it in SVEIntrinsicOpts.cpp? What about doing it in performDAGCombine on the AArch64tbl node?

The reason I prefer to handle this in performDAGCombine is that what we want to match is AArch64tbl ( ... splat_vector(..., constant)) rather than sve.tbl + sve.dup.x, since a shufflevector can also be converted to splat_vector.
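Here is a minimal IR sketch of the two forms being discussed, assuming a float element vector, a constant lane index of 1, and hypothetical function names (none of this is taken from the patch itself):

  ; Current emission for svdup_lane: sve.dup.x splats the lane index,
  ; then sve.tbl uses it to gather that lane from the data vector.
  define <vscale x 4 x float> @dup_lane_cur(<vscale x 4 x float> %data) {
    %idx = call <vscale x 4 x i32> @llvm.aarch64.sve.dup.x.nxv4i32(i32 1)
    %r = call <vscale x 4 x float> @llvm.aarch64.sve.tbl.nxv4f32(<vscale x 4 x float> %data, <vscale x 4 x i32> %idx)
    ret <vscale x 4 x float> %r
  }

  ; Stock-IR form: extract the lane, then splat it with a zero-mask
  ; shufflevector; SelectionDAG lowers this to splat_vector for scalable
  ; vectors, so a combine matching AArch64tbl ( ... splat_vector(..., constant))
  ; catches splats that start life as a shufflevector too.
  define <vscale x 4 x float> @dup_lane_canon(<vscale x 4 x float> %data) {
    %elt = extractelement <vscale x 4 x float> %data, i64 1
    %ins = insertelement <vscale x 4 x float> undef, float %elt, i64 0
    %r = shufflevector <vscale x 4 x float> %ins, <vscale x 4 x float> undef, <vscale x 4 x i32> zeroinitializer
    ret <vscale x 4 x float> %r
  }

  declare <vscale x 4 x i32> @llvm.aarch64.sve.dup.x.nxv4i32(i32)
  declare <vscale x 4 x float> @llvm.aarch64.sve.tbl.nxv4f32(<vscale x 4 x float>, <vscale x 4 x i32>)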


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99324/new/

https://reviews.llvm.org/D99324


