[PATCH] D110524: [AArch64ISelLowering] Avoid duplane in some cases when sve enabled
Sander de Smalen via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Sep 29 03:35:11 PDT 2021
sdesmalen requested changes to this revision.
sdesmalen added a comment.
This revision now requires changes to proceed.
Hi @guopeilin, sorry for the stern 'requested changes' on your patch, but I think the code isn't entirely correct just yet.
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:9469
// Example: dup v2f32 (extract v4f32 X, 2), 1 --> dup v4f32 X, 3
- Lane += V.getConstantOperandVal(1);
- V = V.getOperand(0);
+ auto ExtractedValType = V.getOperand(0).getValueType();
+ if (ExtractedValType.isFixedLengthVector() &&
----------------
nit: The name `ExtractedValType` isn't exactly accurate, because this doesn't represent the value type of the extracted value, but rather the (input) value that a subvector is extracted from.
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:9471
+ if (ExtractedValType.isFixedLengthVector() &&
+ ExtractedValType.getSizeInBits().getValue() <= 128) {
+ Lane += V.getConstantOperandVal(1);
----------------
david-arm wrote:
> nit: Before committing could you change this to:
>
> ExtractedValType.getFixedSizeInBits()
>
> It's a bit shorter and it also asserts that the TypeSize is fixed.
It should be fine to extract:
v2f32 (extract v16f32 X, 2), 1 -> dup X, 3
The current code seems to check that sizeof(v16f32) <= 128, whereas it should use the index `ExtractedValType.getVectorElementType().getSizeInBits() * (Lane + V.getConstantOperandVal(1))` to determine whether the lane can be indexed..
Additionally, the 128 seems conservative as well, since SVE indexed DUP allows an index < 512bits.
================
Comment at: llvm/test/CodeGen/AArch64/sve-fixed-length-limit-duplane.ll:33
+entry:
+ %0 = load <16 x i32>, <16 x i32>* %arg1, align 256
+ %1 = load <16 x i32>, <16 x i32>* %arg2, align 256
----------------
It seems we can use SVE's indexed DUP instructions for this, see `sve_int_perm_dup_i`, which only has patterns for `AArch64dup` (which is a different ISD node than DUPLANE32).
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D110524/new/
https://reviews.llvm.org/D110524
More information about the llvm-commits
mailing list