[PATCH] D113376: [AArch64][SVE] Lower shuffles to permute instructions: zip1/2, uzp1/2, trn1/2

Thu Dec 16 10:55:25 PST 2021

paulwalker-arm added inline comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:19586-19589
+  // In order to ensure the correctness of the shuffle lowering result,
+  // when the vector length and the target register size are inconsistent,
+  // we need to add some restrictions to prevent the new generated instructions
+  // like zip2/uzp1/uzp2/rev from having some wrong or undefined behavior.
----------------
I figured this comment could be better, so I started to write something and ended up with:
```Functions like isZIPMask return true when an ISD::VECTOR_SHUFFLE's mask represents the same logical operation as performed by a ZIP instruction.  In isolation these functions do not mean the ISD::VECTOR_SHUFFLE is exactly equivalent to an AArch64 instruction.  There's the extra component of ISD::VECTOR_SHUFFLE's value type to consider.  Prior to SVE these functions only operated on 64/128-bit vector types that have a direct mapping to a target register and so an exact mapping is implied.

However, when using SVE for fixed-length vectors, most legal vector types are actually sub-vectors of a larger SVE register.  When mapping ISD::VECTOR_SHUFFLE to an SVE instruction, care must be taken to consider how the mask's indices translate.  Specifically, when the mapping requires an exact meaning for a specific vector index (e.g. Index X is the last vector element in the register) then such mappings are often only safe when the exact SVE register size is known.  The main exception to this is when indices are logically relative to the first element of either ISD::VECTOR_SHUFFLE operand because these relative indices don't change when converting from fixed-length to scalable vector types (i.e. the start of a fixed-length vector is always the start of a scalable vector).
```

Which is more like a novel than a comment :) I've posted it anyway just in case there's something in there that's useful.
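
To make the index argument concrete, here is a minimal standalone C++ sketch. It is not LLVM code: `Lanes`, `kJunk`, `widen`, `narrow`, `zip1`, `zip2` and `zipMatchesFixed` are hypothetical names invented for illustration, and the model assumes a fixed-length vector simply occupies the low lanes of a wider register whose remaining lanes hold unknown data. Under those assumptions, a ZIP1-style operation only reads lanes relative to the start of each operand, so it gives the same answer for any register size, whereas a ZIP2-style operation reads the upper half of the register and only matches when the register is exactly as wide as the fixed-length vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: a register is just a list of lanes.
using Lanes = std::vector<int>;

// Marker for lanes that lie beyond the fixed-length vector (unknown contents).
constexpr int kJunk = -1;

// Place a fixed-length vector in the low lanes of a RegElts-lane register.
Lanes widen(const Lanes &Fixed, std::size_t RegElts) {
  assert(Fixed.size() <= RegElts && "register must hold the whole vector");
  Lanes Reg(RegElts, kJunk);
  for (std::size_t I = 0; I < Fixed.size(); ++I)
    Reg[I] = Fixed[I];
  return Reg;
}

// Extract the low N lanes, i.e. the fixed-length result.
Lanes narrow(const Lanes &Reg, std::size_t N) {
  return Lanes(Reg.begin(), Reg.begin() + N);
}

// Register-wide ZIP1: interleave the LOW halves of A and B.
Lanes zip1(const Lanes &A, const Lanes &B) {
  std::size_t R = A.size();
  Lanes Out(R);
  for (std::size_t I = 0; I < R / 2; ++I) {
    Out[2 * I] = A[I];
    Out[2 * I + 1] = B[I];
  }
  return Out;
}

// Register-wide ZIP2: interleave the HIGH halves of A and B.
Lanes zip2(const Lanes &A, const Lanes &B) {
  std::size_t R = A.size();
  Lanes Out(R);
  for (std::size_t I = 0; I < R / 2; ++I) {
    Out[2 * I] = A[R / 2 + I];
    Out[2 * I + 1] = B[R / 2 + I];
  }
  return Out;
}

// True when running the register-wide operation on widened inputs yields the
// same low N lanes as running it on registers that are exactly N lanes wide.
bool zipMatchesFixed(bool High, std::size_t N, std::size_t RegElts) {
  Lanes A(N), B(N);
  for (std::size_t I = 0; I < N; ++I) {
    A[I] = static_cast<int>(I);
    B[I] = 100 + static_cast<int>(I);
  }
  Lanes Expected = High ? zip2(A, B) : zip1(A, B);
  Lanes WideA = widen(A, RegElts), WideB = widen(B, RegElts);
  Lanes Wide = High ? zip2(WideA, WideB) : zip1(WideA, WideB);
  return Expected == narrow(Wide, N);
}
```

In this model, `zipMatchesFixed(false, 4, 8)` holds while `zipMatchesFixed(true, 4, 8)` does not, which mirrors the point above: mappings whose indices are relative to the first element survive the move to a larger register, and mappings that depend on where the register ends do not.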

================
Comment at: llvm/test/CodeGen/AArch64/sve-fixed-length-permute-zip-uzp-trn.ll:10-14
+; VBITS_EQ_256-NEXT:    ptrue p0.b
+; VBITS_EQ_256-NEXT:    ld1b { z0.b }, p0/z, [x0]
+; VBITS_EQ_256-NEXT:    ld1b { z1.b }, p0/z, [x1]
+; VBITS_EQ_256-NEXT:    st2 { v0.16b, v1.16b }, [x0]
+; VBITS_EQ_256-NEXT:    ret
----------------
It looks like `InterleavedAccessPass` is causing your code to be bypassed.  I couldn't immediately see a way to disable the pass other than using `-O0`, which might cause other issues, but I think if you use `volatile` loads and stores within these tests you'll get what you need.

================
Comment at: llvm/test/CodeGen/AArch64/sve-fixed-length-permute-zip-uzp-trn.ll:413
+
+attributes #0 = { "target-features"="+sve" }
+
----------------
Please place the attributes together at the end of the file; otherwise they're hard to find when trying to see what attributes exist for a specific function.

================
Comment at: llvm/test/CodeGen/AArch64/sve-fixed-length-permute-zip-uzp-trn.ll:627
+
+attributes #1 = { "target-features"="+sve" vscale_range(2,2) }
+
----------------
As above.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D113376/new/

https://reviews.llvm.org/D113376