[PATCH] D46477: [AARCH64] Change max stores for memcpy/memmov/memset and gang up loads and stores (for memcpy) for pairing.

Tue May 8 09:24:21 PDT 2018

sebpop added inline comments.

================
Comment at: llvm/include/llvm/CodeGen/TargetLowering.h:2519
+  ///
+  /// When memcpy is inline based on MaxStoresPerMemcpy, specify maximum number
+  /// of store instructions that you can want to keep tother. This helps in
----------------
s/inline/inlined/

================
Comment at: llvm/include/llvm/CodeGen/TargetLowering.h:2520
+  /// When memcpy is inline based on MaxStoresPerMemcpy, specify maximum number
+  /// of store instructions that you can want to keep tother. This helps in
+  /// pairing and vectorazation later on.
----------------
s/that you can want//
s/tother/together/

================
Comment at: llvm/include/llvm/CodeGen/TargetLowering.h:2521
+  /// of store instructions that you can want to keep tother. This helps in
+  /// pairing and vectorazation later on.
+  unsigned MaxGluedStoresPerMemcpy = 0;
----------------
s/vectorazation/vectorization/

================
Comment at: llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:92

+static cl::opt<bool> DisableMemCpyDAGOpt("disable-memcpy-dag-opt",
+       cl::Hidden, cl::init(false),
----------------
To avoid double negative, which is confusing, please rename this command line option as EnableMem... .

================
Comment at: llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:95
+       cl::desc("Gang up loads and stores generated by inlining of memcpy "
+                "such that such loads and stores can be paired."));
+
----------------
remove this second part of the sentence.

================
Comment at: llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:5363
+      }
+
     }
----------------
Remove empty line.

================
Comment at: llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:5397

+  unsigned GluedLdStLimit = (MaxLdStGlue == 0 ?
+                                TLI.getMaxGluedStoresPerMemcpy() : MaxLdStGlue);
----------------
remove parentheses

================
Comment at: llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:5402
+  if (NumLdStInMemcpy) {
+  // Only if we have load/stores because of memcpy. It maybe that mempy
+  // might be converted to memset if it's memcpy of constants. In that case,
----------------
The first part is not a sentence, please rewrite in a non-ambiguous way.
s/maybe/may be/
s/mempy/memcpy/

================
Comment at: llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:5406
+    if ((GluedLdStLimit <= 1) || DisableMemCpyDAGOpt) {
+    // If target does not care, just leave as it.
+      for (unsigned i = 0; i < NumLdStInMemcpy; ++i) {
----------------
clang-format

================
Comment at: llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:5412
+    }
+    else {
+      // Ld/St less than/equal limit set by target.
----------------
clang-format

================
Comment at: llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:5424
+        for (unsigned cnt = 0; cnt < NumberLdChain; ++cnt) {
+          unsigned IndexFrom = (NumLdStInMemcpy - GlueIter - GluedLdStLimit);
+          unsigned IndexTo   = (NumLdStInMemcpy - GlueIter);
----------------
remove parentheses

================
Comment at: llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:5425
+          unsigned IndexFrom = (NumLdStInMemcpy - GlueIter - GluedLdStLimit);
+          unsigned IndexTo   = (NumLdStInMemcpy - GlueIter);
+
----------------
remove parentheses

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:582

-  MaxStoresPerMemset = MaxStoresPerMemsetOptSize = 8;
-  MaxStoresPerMemcpy = MaxStoresPerMemcpyOptSize = 4;
-  MaxStoresPerMemmove = MaxStoresPerMemmoveOptSize = 4;
+  MaxStoresPerMemset = 32;
+  MaxStoresPerMemsetOptSize = 8;
----------------
Let's commit this change https://reviews.llvm.org/D45098 separately after this patch lands.

================
Comment at: llvm/test/CodeGen/AArch64/arm64-memset-to-bzero.ll:11
+; CHECK: stp [[R0:q[0-9]+]], [[R0:q[0-9]+]], [x0, #224]
+; CHECK: stp [[R0:q[0-9]+]], [[R0:q[0-9]+]], [x0]
 define void @fct1(i8* nocapture %ptr) {
----------------
You won't need to change this testcase if you remove the change to "MaxStoresPerMemset = 32;"

================
Comment at: llvm/test/CodeGen/AArch64/arm64-memset-to-bzero.ll:89
   %tmp = tail call i64 @llvm.objectsize.i64(i8* %ptr, i1 false)
-  %call = tail call i8* @__memset_chk(i8* %ptr, i32 1, i64 256, i64 %tmp)
+  %call = tail call i8* @__memset_chk(i8* %ptr, i32 1, i64 260, i64 %tmp)
   ret void
----------------
Do we need to change the testcase here? Why?

================
Comment at: llvm/test/CodeGen/AArch64/arm64-misaligned-memcpy-inline.ll:10
 entry:
-  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %out, i8* %in, i64 16, i1 false)
+  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %out, i8* %in, i64 24, i1 false)
   ret void
----------------
Do we need to change the testcase here? Why?

Repository:
  rL LLVM

https://reviews.llvm.org/D46477