[PATCH] D110524: [AArch64ISelLowering] Avoid duplane in some cases when sve enabled

Sander de Smalen via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Sep 29 03:35:11 PDT 2021


sdesmalen requested changes to this revision.
sdesmalen added a comment.
This revision now requires changes to proceed.

Hi @guopeilin, sorry for the stern 'requested changes' on your patch, but I think the code isn't entirely correct just yet.



================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:9469
     // Example: dup v2f32 (extract v4f32 X, 2), 1 --> dup v4f32 X, 3
-    Lane += V.getConstantOperandVal(1);
-    V = V.getOperand(0);
+    auto ExtractedValType = V.getOperand(0).getValueType();
+    if (ExtractedValType.isFixedLengthVector() &&
----------------
nit: The name `ExtractedValType` isn't exactly accurate, because this doesn't represent the value type of the extracted value, but rather the (input) value that a subvector is extracted from.


================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:9471
+    if (ExtractedValType.isFixedLengthVector() &&
+        ExtractedValType.getSizeInBits().getValue() <= 128) {
+      Lane += V.getConstantOperandVal(1);
----------------
david-arm wrote:
> nit: Before committing could you change this to:
> 
>   ExtractedValType.getFixedSizeInBits()
> 
> It's a bit shorter and it also asserts that the TypeSize is fixed.
It should be fine to extract:

  v2f32 (extract v16f32 X, 2), 1 -> dup X, 3

The current code seems to check that sizeof(v16f32) <= 128, whereas it should use the index `ExtractedValType.getVectorElementType().getSizeInBits() * (Lane + V.getConstantOperandVal(1))` to determine whether the lane can be indexed..

Additionally, the 128 seems conservative as well, since SVE indexed DUP allows an index < 512bits.



================
Comment at: llvm/test/CodeGen/AArch64/sve-fixed-length-limit-duplane.ll:33
+entry:
+  %0 = load <16 x i32>, <16 x i32>* %arg1, align 256
+  %1 = load <16 x i32>, <16 x i32>* %arg2, align 256
----------------
It seems we can use SVE's indexed DUP instructions for this, see `sve_int_perm_dup_i`, which only has patterns for `AArch64dup` (which is a different ISD node than DUPLANE32).


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110524/new/

https://reviews.llvm.org/D110524



More information about the llvm-commits mailing list