[PATCH] D141469: [AArch64][SVE] Add more intrinsics in 'isZeroingInactiveLanes'.

dewen via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Jan 16 18:48:15 PST 2023


dewen added inline comments.


================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:301-304
+    case Intrinsic::aarch64_sve_trn1:
+    case Intrinsic::aarch64_sve_trn2:
+    case Intrinsic::aarch64_sve_uzp1:
+    case Intrinsic::aarch64_sve_uzp2:
----------------
paulwalker-arm wrote:
> Does `trn` and `uzp` zero inactive lanes? My reading of the pseudo code is that all lanes of the result come from one of the inputs.  For example when `trn1` is used with a `.d` element type each "lane" is actually a block of the 8 predicate bits that sit within it and they are copied as a block without modification.
> 
> Actually I think we've likely got an upstream bug here with how we're lowering the svbool builtins of these instructions to these intrinsics because the builtins fully define the entire predicate register, which when fixed should mean this function will not need to worry about them. 
> 
> Either way, if you agree I think it's best to remove their entries/tests from this patch.
I agree with you. at paulwalker-arm


================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:305-311
+    case Intrinsic::aarch64_sve_orr_z:
+    case Intrinsic::aarch64_sve_orn_z:
+    case Intrinsic::aarch64_sve_and_z:
+    case Intrinsic::aarch64_sve_bic_z:
+    case Intrinsic::aarch64_sve_eor_z:
+    case Intrinsic::aarch64_sve_nor_z:
+    case Intrinsic::aarch64_sve_nand_z:
----------------
paulwalker-arm wrote:
> I don't believe the logical instructions zero inactive lanes either. You can see this by looking at the tests you've added.  The instructions are always "byte" based and so if the input general predicate contains garbage then so will the result. In this instance this is not a problem because from the ACLE's point of view we only support these intrinsics for `<vscale x 16 x i1>`[1] types and when used with such types they'll be no reinterprets to introduce the unnecessary `and` and so I think these entries should also be removed.
> 
> [1] I know the code generation works for other predicate types, but that's just a fluke based on some stale isel patterns (i.e. they'll never be generated via the clang builtins) that should probably be removed.
I don't understand this part, but I think these logic instructions are going to put Inactive elements in the destination predicate register are set to zero. @paulwalker-arm

for example:
{F26194894}
{F26194900}
{F26194902}
{F26194903}
{F26194906}
{F26194911}


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141469/new/

https://reviews.llvm.org/D141469



More information about the llvm-commits mailing list