[PATCH] D114174: [ARM][CodeGen] Add support for complex deinterleaving

Tue Nov 8 04:10:20 PST 2022

NickGuy added inline comments.

================
Comment at: llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp:173
+  SmallVector<NodePtr> CompositeNodes;
+  SmallPtrSet<Instruction *, 16> AllInstructions;
+
----------------
SjoerdMeijer wrote:
> I haven't looked carefully, but why do we also need to record all instructions here? I mean, the graph can be queried for existence of nodes/instructions, or is this bookkeeping duplication more efficient?
The duplication makes it simpler to check the internal uses of a node (See `ComplexDeinterleavingCompositeNode::hasAllInternalUses`). Without it, we'd need to iterate over the nodes twice, and check the instructions of each node (All of `Real`, `Imag`, and the contents of `InternalInstructions`), while this approach allows us to simply say "Is this instruction known within the graph?".

================
Comment at: llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp:181
+
+  NodePtr validateAndSubmitCompositeNode(NodePtr Node) {
+    CompositeNodes.push_back(Node);
----------------
SjoerdMeijer wrote:
> SjoerdMeijer wrote:
> > Nit: no validation going on here, only insertion?
> Nit: no validation going on here, only insertion?
Good catch, the validation that was here was moved to be more tied into the identification. Renamed

================
Comment at: llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp:367
+ComplexDeinterleavingGraph::NodePtr
+ComplexDeinterleavingGraph::identifyNodeWithImplicitAdd(
+    Instruction *Real, Instruction *Imag,
----------------
SjoerdMeijer wrote:
> Nit: maybe I overlooked something, but this doesn't seem to use any state from the class so could be just a helper function?
It populates the compositeNode structure through (formerly) `validateAndSubmitCompositeNode`, and also calls `identifyNode` further, which does use state from the class.

================
Comment at: llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp:467
+
+  NodePtr CommonNode = identifyNode(cast<Instruction>(PartialMatch.first),
+                                    cast<Instruction>(PartialMatch.second));
----------------
SjoerdMeijer wrote:
> Nit: I am wondering if these casts here and below are necessary.
You are correct, good catch.

================
Comment at: llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp:483
+      ComplexDeinterleavingOperation::CMulPartial, Real, Imag);
+  Node->Rotation = Rotation;
+  Node->addOperand(CommonNode);
----------------
SjoerdMeijer wrote:
> nit: maybe some or all of these node setting things can be done in a constructor.
Not sure I agree with that, there are cases - such as a `Shuffle` - where none of the other settings are relevant. The operation type, real instruction, and imaginary instructions are the only settings relevant to all cases.

================
Comment at: llvm/test/CodeGen/Thumb2/mve-complex-deinterleaving-mixed-cases.ll:5
 target triple = "thumbv8.1m.main-none-none-eabi"

 ; Expected to transform
----------------
dmgreen wrote:
> I managed to come up with this testcase that failed because of the vector size. It is probably larger than it needs to be. It could work or not, depending if the backend handles non-power-2 sizes. In either rate, it would be good to add this example. (Once integers are added too, some odd size integer width tests would be good too).
> ```
> define arm_aapcs_vfpcc <12 x float> @abp90c12(<12 x float> %a, <12 x float> %b, <12 x float> %c) {
> entry:
>   %ar = shufflevector <12 x float> %a, <12 x float> poison, <6 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10>
>   %ai = shufflevector <12 x float> %a, <12 x float> poison, <6 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11>
>   %br = shufflevector <12 x float> %b, <12 x float> poison, <6 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10>
>   %bi = shufflevector <12 x float> %b, <12 x float> poison, <6 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11>
>   %cr = shufflevector <12 x float> %c, <12 x float> poison, <6 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10>
>   %ci = shufflevector <12 x float> %c, <12 x float> poison, <6 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11>
> 
>   %i6 = fmul fast <6 x float> %br, %ar
>   %i7 = fmul fast <6 x float> %bi, %ai
>   %xr = fsub fast <6 x float> %i6, %i7
>   %i9 = fmul fast <6 x float> %bi, %ar
>   %i10 = fmul fast <6 x float> %br, %ai
>   %xi = fadd fast <6 x float> %i9, %i10
> 
>   %zr = fsub fast <6 x float> %cr, %xi
>   %zi = fadd fast <6 x float> %ci, %xr
>   %interleaved.vec = shufflevector <6 x float> %zr, <6 x float> %zi, <12 x i32> <i32 0, i32 6, i32 1, i32 7, i32 2, i32 8, i32 3, i32 9, i32 4, i32 10, i32 5, i32 11>
>   ret <12 x float> %interleaved.vec
> }
> ```
Fixed this case (by checking on the backend whether the vector width is a power of 2), and added the test.

================
Comment at: llvm/test/CodeGen/Thumb2/mve-complex-deinterleaving-uniform-cases.ll:79

 ; Expected to not transform
 define arm_aapcs_vfpcc <4 x float> @simple_add_270(<4 x float> %a, <4 x float> %b) {
----------------
dmgreen wrote:
> Is this just not transforming because of commutativity on the add? Can we add a test for `fadd fast <2 x float> %strided.vec20, %strided.vec` too, to show it does transform.
Yep, it's the commutativity. I've added a note to this test stating this, and added another test with the fadd operands reversed.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114174/new/

https://reviews.llvm.org/D114174