[PATCH] D141469: [AArch64][SVE] Add more intrinsics in 'isZeroingInactiveLanes'.

Mon Jan 16 17:55:29 PST 2023

paulwalker-arm added inline comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:251
 // Returns true if inactive lanes are known to be zeroed by construction.
 static bool isZeroingInactiveLanes(SDValue Op) {
   switch (Op.getOpcode()) {
----------------
Just to add context to my comments below. Here "inactive lanes" means those lanes that are not visible within the DAG.  By which I mean the MVTs for SVE's predicate vector types represent a subset of the underlying bits within a predicate register.  This function is asking whether the specified operation is known to zero the bits that are not part of this subset. (e.g. nxv2i1 represents every eighth bit of the predicate register 0,8,16... and this function returns true if the operation is guaranteed to zero bits 1-7,9-15...)

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:301-304
+    case Intrinsic::aarch64_sve_trn1:
+    case Intrinsic::aarch64_sve_trn2:
+    case Intrinsic::aarch64_sve_uzp1:
+    case Intrinsic::aarch64_sve_uzp2:
----------------
Does `trn` and `uzp` zero inactive lanes? My reading of the pseudo code is that all lanes of the result come from one of the inputs.  For example when `trn1` is used with a `.d` element type each "lane" is actually a block of the 8 predicate bits that sit within it and they are copied as a block without modification.

Actually I think we've likely got an upstream bug here with how we're lowering the svbool builtins of these instructions to these intrinsics because the builtins fully define the entire predicate register, which when fixed should mean this function will not need to worry about them. 

Either way, if you agree I think it's best to remove their entries/tests from this patch.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:305-311
+    case Intrinsic::aarch64_sve_orr_z:
+    case Intrinsic::aarch64_sve_orn_z:
+    case Intrinsic::aarch64_sve_and_z:
+    case Intrinsic::aarch64_sve_bic_z:
+    case Intrinsic::aarch64_sve_eor_z:
+    case Intrinsic::aarch64_sve_nor_z:
+    case Intrinsic::aarch64_sve_nand_z:
----------------
I don't believe the logical instructions zero inactive lanes either. You can see this by looking at the tests you've added.  The instructions are always "byte" based and so if the input general predicate contains garbage then so will the result. In this instance this is not a problem because from the ACLE's point of view we only support these intrinsics for `<vscale x 16 x i1>`[1] types and when used with such types they'll be no reinterprets to introduce the unnecessary `and` and so I think these entries should also be removed.

[1] I know the code generation works for other predicate types, but that's just a fluke based on some stale isel patterns (i.e. they'll never be generated via the clang builtins) that should probably be removed.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141469/new/

https://reviews.llvm.org/D141469