[PATCH] D114174: [ARM][CodeGen] Add support for complex deinterleaving

Mon Oct 24 08:50:00 PDT 2022

dmgreen added a comment.

Thanks for the updates. This is looking good now.

There are some extra comments below. I was trying some examples that have awkward orders and cross basicblocks, but couldn't find ways to make it break.

================
Comment at: llvm/include/llvm/CodeGen/ComplexDeinterleavingPass.h:39-42
+  // The following 'operations' are used
+  // to represent internal states. Backends
+  // are not expected to try and support
+  // these in any capacity.
----------------
This can now be re-flowed.

================
Comment at: llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp:19
+//
+// Replacement:
+// This step performs the necessary input wrangling (chasing values through
----------------
Can this comment be updated now?

================
Comment at: llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp:559
+  UncommonImagOp = (CommonOperand == I0) ? I1 : I0;
+  ///  0:  r: cr + ar * br
+  ///      i: ci + ar * bi
----------------
These can be // comments. They are very useful, but maybe it is not necessary to repeat them.

================
Comment at: llvm/test/CodeGen/Thumb2/mve-complex-deinterleaving-mixed-cases.ll:4

 target triple = "thumbv8.1m.main-none-none-eabi"

----------------
Are there any tests with missing fast-math flags?

================
Comment at: llvm/test/CodeGen/Thumb2/mve-complex-deinterleaving-mixed-cases.ll:5
 target triple = "thumbv8.1m.main-none-none-eabi"

 ; Expected to transform
----------------
I managed to come up with this testcase that failed because of the vector size. It is probably larger than it needs to be. It could work or not, depending if the backend handles non-power-2 sizes. In either rate, it would be good to add this example. (Once integers are added too, some odd size integer width tests would be good too).
```
define arm_aapcs_vfpcc <12 x float> @abp90c12(<12 x float> %a, <12 x float> %b, <12 x float> %c) {
entry:
  %ar = shufflevector <12 x float> %a, <12 x float> poison, <6 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10>
  %ai = shufflevector <12 x float> %a, <12 x float> poison, <6 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11>
  %br = shufflevector <12 x float> %b, <12 x float> poison, <6 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10>
  %bi = shufflevector <12 x float> %b, <12 x float> poison, <6 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11>
  %cr = shufflevector <12 x float> %c, <12 x float> poison, <6 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10>
  %ci = shufflevector <12 x float> %c, <12 x float> poison, <6 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11>

  %i6 = fmul fast <6 x float> %br, %ar
  %i7 = fmul fast <6 x float> %bi, %ai
  %xr = fsub fast <6 x float> %i6, %i7
  %i9 = fmul fast <6 x float> %bi, %ar
  %i10 = fmul fast <6 x float> %br, %ai
  %xi = fadd fast <6 x float> %i9, %i10

  %zr = fsub fast <6 x float> %cr, %xi
  %zi = fadd fast <6 x float> %ci, %xr
  %interleaved.vec = shufflevector <6 x float> %zr, <6 x float> %zi, <12 x i32> <i32 0, i32 6, i32 1, i32 7, i32 2, i32 8, i32 3, i32 9, i32 4, i32 10, i32 5, i32 11>
  ret <12 x float> %interleaved.vec
}
```

================
Comment at: llvm/test/CodeGen/Thumb2/mve-complex-deinterleaving-uniform-cases.ll:79

 ; Expected to not transform
 define arm_aapcs_vfpcc <4 x float> @simple_add_270(<4 x float> %a, <4 x float> %b) {
----------------
Is this just not transforming because of commutativity on the add? Can we add a test for `fadd fast <2 x float> %strided.vec20, %strided.vec` too, to show it does transform.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114174/new/

https://reviews.llvm.org/D114174