[PATCH] D114174: [ARM][CodeGen] Add support for complex deinterleaving

Tue Sep 6 03:24:28 PDT 2022

dmgreen added a comment.

Some of this seems to have returned to a previous version of the patch?

================
Comment at: llvm/lib/CodeGen/CMakeLists.txt:49
   CommandFlags.cpp
+        ComplexDeinterleavingPass.cpp
   CriticalAntiDepBreaker.cpp
----------------
dmgreen wrote:
> Formatting is off here
This formatting again?

================
Comment at: llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp:528
+
+  Value *CommonOperand =
+      getCommonValues({RealMulI->getOperand(0), RealMulI->getOperand(1),
----------------
dmgreen wrote:
> What if the operands are FMul(A, A) and FMul (B, C)?
> I think it needs to be more precise about which nodes are which. 
I still think this needs to be a more precise with regard to what is considered the CommonOperand and CommonOperand.

================
Comment at: llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp:431
+
+  Value *CommonOperand = getCommonValues(Values);
+  if (CommonOperand == nullptr)
----------------
Similar to identifyPartialMul, this probably needs to be more careful about what it is selecting as which operands.

================
Comment at: llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp:568-570
+      if (!NodeCreated)
+        identifyNode(cast<Instruction>(Pair.Imaginary),
+                     cast<Instruction>(Pair.Real), TL, _);
----------------
Is it valid to test the operands in the opposite order?

================
Comment at: llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp:623
+
+  identifyNode(AR, AI, TL, CommonOperandI);
+  identifyNode(BR, BI, TL, CommonOperandI);
----------------
Should these be checking the subnodes?
```
if (!identifyNode(..))
  return nullptr;
```

================
Comment at: llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp:646
+                                         const TargetLowering *TL,
+                                         Instruction *&CommonOperandI) {
+  auto IsZeroInitializer = [&](Value *V) -> bool { return match(V, m_Zero()); };
----------------
CommonOperandI only seems to be important between identifyPartialMul and identifyNodeWithImplicitAdd

================
Comment at: llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp:679-680
+
+    if (!isDeinterleavingMask(RealShuffle->getShuffleMask()) ||
+        !isDeinterleavingMask(ImagShuffle->getShuffleMask()))
+      IdentifySuccess = false;
----------------
These need to check that the first is the real deinterleave with offset=0, and the imag has offset=1. And maybe that they only take elements from the first operand and don't change size.

================
Comment at: llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp:723-731
+  if (getOperatingComponentOfValue(RootI->getOperand(0)) !=
+      OperatingComponent::Real) {
+    return false;
+  }
+
+  if (getOperatingComponentOfValue(RootI->getOperand(1)) !=
+      OperatingComponent::Imaginary) {
----------------
I feel that this should be correct by construction. What cases are not correct?

================
Comment at: llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp:790-792
+    // Discover the relevant inputs for a given node.
+    // This can't be moved any sooner, as the inputs could be the replacement
+    // value of a previous node.
----------------
I don't think this should be needed. The inputs should just be present from the ReplacementNode of the operands.

================
Comment at: llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp:810
+    // what it might be based on the vector width
+    // TODO Defer to the target for this info. Do we care enough?
+    auto *VTy = cast<FixedVectorType>(CN->Input0->getType());
----------------
If the statistic is being awkward then it is probably not worth keeping (or keeping simple - just counting number of transforms, not the number of individual intrinsics that might become in the backend).

================
Comment at: llvm/lib/Target/ARM/ARMISelLowering.cpp:21817
+    int Stride = Ty->getNumElements() / 2;
+    const int SplitMask[] = {0,  1,  2,  3,  4,  5,  6,  7,  8,  9,  10,
+                             11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21,
----------------
It is better not to use fixed sized arrays like this, just construct the array to be any needed size. If the values are continuous you can use iota.

================
Comment at: llvm/test/CodeGen/ARM/ComplexArithmetic/complex-arithmetic-f16-add.ll:2
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc < %s -o - | FileCheck %s
+; RUN: llc < %s -o /dev/null -stats -stats-json 2>&1 | FileCheck %s --check-prefix=STATS
----------------
dmgreen wrote:
> I would use run lines that that are similar to the llvm/test/CodeGen/Thumb2/mve-xyz.ll tests. Something like:
> ```
> ; RUN: llc -mtriple=thumbv8.1m.main-none-none-eabi -mattr=+mve.fp -verify-machineinstrs %s -o - | FileCheck %s
> ```
> It's best not to use a specific cpu, using the arch instead.
> 
> The tests can probably go in the same place too, under llvm/test/CodeGen/Thumb2/mve-xyz.ll for mve tests.
These tests seem to have moved back to the wrong place.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114174/new/

https://reviews.llvm.org/D114174