[PATCH] D114174: [ARM][CodeGen] Add support for complex addition and multiplication

Wed Jun 22 03:41:47 PDT 2022

dmgreen added a comment.

I haven't really looked to deeply into the meat of the pass yet - how it does the matching - but I had a chunk of comments for the rest.

What happened to the ideas of starting from a shuffle and working through a worklist of pairs of [Real, Imaginary] Values that we match to complex nodes? It would build up a graph of complex nodes to replace the original shuffle, providing that the leaves were all valid de-interleavings. At least on paper I feel we should just be able to perform the search without looking though a lot of uses (except for checking there are no extra uses, of course), and all the Unmatched instructions and what looks like matching of random unrelated snippets would be cleaned up.

================
Comment at: llvm/lib/CodeGen/CMakeLists.txt:49
   CommandFlags.cpp
+        ComplexDeinterleavingPass.cpp
   CriticalAntiDepBreaker.cpp
----------------
Formatting is off here

================
Comment at: llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp:1-2
+//===- ComplexDeinterleavingPass.cpp
+//------------------------------------------===//
+//
----------------
This looks like it got formatted incorrectly.

================
Comment at: llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp:99
+
+    for (const auto &item : ContainedInstructions) {
+      for (unsigned i = 0; i < item->getNumOperands(); i++) {
----------------
item -> Item

================
Comment at: llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp:100
+    for (const auto &item : ContainedInstructions) {
+      for (unsigned i = 0; i < item->getNumOperands(); i++) {
+        auto *V = item->getOperand(i);
----------------
i -> I (or a better name, if I is already used)

================
Comment at: llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp:149
+  llvm::TargetTransformInfo::TargetCostKind CostKind =
+      llvm::TargetTransformInfo::TCK_Latency;
+
----------------
Reciprocal Throughput is more common.

================
Comment at: llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp:511
+
+bool ComplexDeinterleaving::evaluateComplexDeinterleavingBasicBlock(
+    BasicBlock *B) {
----------------
evaluateComplexDeinterleavingBasicBlock -> evaluateBasicBlock

================
Comment at: llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp:991-992
+        CN->Accumulator);
+    if (!CN->ReplacementNode) {
+      LLVM_DEBUG(dbgs() << "Target failed to create Intrinsic call.\n");
+      return false;
----------------
It is better to structure things in a way where we decide whether to do something, then do it. Not get half way through doing it and decide we didn't want to in the end. In what ways would we expect createComplexDeinterleavingIR to return nullptr at the moment?

================
Comment at: llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp:1017
+                    << ".\n");
+  if (CostOfIntrinsics > CostOfNodes) {
+    LLVM_DEBUG(dbgs() << "Not replacing, cost was too high.\n");
----------------
Is this cost necessary at the moment, or will it always be profitable for the simple cases?

================
Comment at: llvm/lib/Target/ARM/ARMISelLowering.cpp:21738-21742
+  if (!VTy)
+    return false;
+
+  if (VTy->getNumElements() * VTy->getScalarSizeInBits() != 128)
+    return false;
----------------
Things like this can be a single if:
```
if (!VTy || VTy->getNumElements() * VTy->getScalarSizeInBits() != 128)
    return false;
```

I presume the == 128 can be removed if we teach it how to split the vectors up?

================
Comment at: llvm/lib/Target/ARM/ARMISelLowering.cpp:21773
+
+    if (!ConstMulRot)
+      return nullptr;
----------------
Can this be an assert instead.

================
Comment at: llvm/lib/Target/ARM/ARMTargetMachine.cpp:437
   // Match interleaved memory accesses to ldN/stN intrinsics.
-  if (TM->getOptLevel() != CodeGenOpt::None)
+  if (TM->getOptLevel() != CodeGenOpt::None) {
     addPass(createInterleavedAccessPass());
----------------
This doesn't need to add the new brackets.

================
Comment at: llvm/test/CodeGen/ARM/ComplexArithmetic/complex-arithmetic-f32-mul.ll:32

 define <4 x float> @complex_mul_v4f32(<4 x float> %a, <4 x float> %b) #0 {
 ; CHECK-LABEL: complex_mul_v4f32:
----------------
Add arm_aapcs_vfpcc to any tests that take or return vectors.
I think you can remove #0 too.

================
Comment at: llvm/test/CodeGen/ARM/ComplexArithmetic/complex-arithmetic-f64-mul.ll:2
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc < %s --mattr=+mve.fp -o - | FileCheck %s
+
----------------
It might be a little cleaner to add +fp64 for all these f64 tests. 

================
Comment at: llvm/test/CodeGen/ARM/ComplexArithmetic/complex-arithmetic-rotations-add.ll:1
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc < %s --mattr=+mve.fp,+neon -o - | FileCheck %s
----------------
MVE tests can go into llvm/test/CodeGen/Thumb2/mve-complex-xyz.ll. So long as they are all updated by the test script, that should be fine to keep them with the rest.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114174/new/

https://reviews.llvm.org/D114174