[PATCH] D63476: [ARM] DLS/LE low-overhead loop code generation

Fri Jun 21 08:38:15 PDT 2019

SjoerdMeijer added inline comments.

================
Comment at: lib/Target/ARM/ARMFinalizeLoops.cpp:25
+#define DEBUG_TYPE "arm-finalize-loops"
+#define ARM_FINALIZE_LOOPS_NAME "ARM loop finalization pass"
+
----------------
Nit: perhaps "ARM low-overhead loop..." would be more descriptive.

================
Comment at: lib/Target/ARM/ARMFinalizeLoops.cpp:175
+    // within the loop can only read and write to LR. So, there should be a
+    // mov to setup the count. WLS/DLS perform this move, so find the original
+    // and delete it - inserting WLS/DLS in its place.
----------------
This looks fine for now. WLS/DLS may turn out to be an expensive way of doing a MOV that we possibly don't even need, but I think that's an optimisation we can worry about later.

================
Comment at: lib/Target/ARM/ARMInstrThumb2.td:5194
+  4, IIC_Br, []>, Sched<[WriteBr]>;
+}
+
----------------
Nit: perhaps add a closing comment, `} // isNotDuplicable = 1`.

================
Comment at: test/Transforms/HardwareLoops/ARM/massive.mir:1
+# RUN: llc -mtriple=armv8.1m.main -run-pass=arm-finalize-loops %s -o - | FileCheck %s
+# CHECK: body:
----------------
Yes, it is massive! :-) But I think we can simplify this a lot by using the `@llvm.arm.space` intrinsic:

  // A space-consuming intrinsic primarily for testing ARMConstantIslands. The
  // first argument is the number of bytes this "instruction" takes up, the second
  // and return value are essentially chains, used to force ordering during ISel.
  def int_arm_space : Intrinsic<[llvm_i32_ty], [llvm_i32_ty, llvm_i32_ty], []>;

As the comment above says, the constant island tests use this intrinsic; I think you want to do something similar here.
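
For illustration only, a rough IR sketch of a loop padded with `@llvm.arm.space` might look like the following. The function name `big_loop`, the 4096-byte size, and the loop shape are assumptions made for this example and are not taken from the patch; the size would need to be chosen so that the loop body is as large as the test requires. A test written this way would presumably start from IR rather than from MIR with `-run-pass`.

```llvm
; Hypothetical sketch: names and sizes are illustrative, not from the patch.
declare i32 @llvm.arm.space(i32, i32)

define void @big_loop(i32 %n) {
entry:
  br label %loop

loop:
  ; Count down from %n to zero.
  %i = phi i32 [ %n, %entry ], [ %i.next, %loop ]
  ; Reserve 4096 bytes inside the loop body instead of spelling out
  ; thousands of real instructions. The second operand is only a chain.
  %pad = call i32 @llvm.arm.space(i32 4096, i32 undef)
  %i.next = add i32 %i, -1
  %cmp = icmp ne i32 %i.next, 0
  br i1 %cmp, label %loop, label %exit

exit:
  ret void
}
```

The single call stands in for the bulk of the body, so the test stays readable while the loop still occupies a large amount of code space.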

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63476/new/

https://reviews.llvm.org/D63476