[PATCH] [AArch64] Improve and enable the SeparateConstOffsetFromGEP for AArch64 backend.

Thu Oct 30 02:33:19 PDT 2014

> Regarding the correctness issue with struct field indices, I think the solutions you proposed make sense. Look forward to your fix in the next diff. 
Hi Jingyue,

I'm just a bit confused about the "fix" you mean. Did you mean the solution to extract constant in the structure index? The solution is already in the diff.
I'm only not sure about the "a = 1+2" problem, and I've replied to your comment. For other problems, I've fixed in the attached diff. 

Thanks,
-Hao

================
Comment at: lib/Target/AArch64/AArch64TargetMachine.cpp:208
@@ +207,3 @@
+    addPass(createEarlyCSEPass());
+    // Do loop invariant code motion in case any of the address calculations are
+    // actually loop invariant.
----------------
jingyue wrote:
> I'd explain why running GEP reassociation can expose more opportunities for LICM. I guess only the split part (splitting a GEP with multiple indices into a set of smaller GEPs) buys you such opportunities. Right? 
Yes. The whole GEP may not be invariant, but after splitting, we can move the invariant part and leave the variant part in the loop.

================
Comment at: lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:93
@@ +92,3 @@
+//     basic blocks, CodeGen will not be able to tell if there are common
+//     subexpressions between them.
+//
----------------
jingyue wrote:
> I don't understand reason 2. Can you explain more? Are you saying, given a[i][j1] and a[i][j2], you can reuse the common part &a[i]? 
Yes, GEPs like "getelementptr %ptr, i, j1" and "getelementptr %ptr, i, j2" can not be eliminated. But transformed into simpler forms, "i" and "j1" "j2" are separated, and we can find the common expressions about "i".

================
Comment at: lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:365
@@ -277,1 +364,3 @@
+  /// operations or several simpler GEPs).
+  bool SimplifyGEP;
 };
----------------
jingyue wrote:
> What would be a good name? You are not really *simplifying* but to some extend *complicating* GEPs :)
> 
> Does LowerGEP sound good to you? You are *lowering* a GEP into multiple uglygeps (btw, uglygep is a known concept in LLVM) or arithmetic operations which are closer to machine code. 
> 
> Make sure the comments are consistent after renaming. 
That make sence. 

================
Comment at: lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:857
@@ -765,1 +856,3 @@
 
+int64_t SeparateConstOffsetFromGEP::accumulateByteOffsetForSeqAndStruct(
+    GetElementPtrInst *GEP, bool &NeedsExtraction) {
----------------
jingyue wrote:
> Similarly, this function should be merged into accumulateByteOffset. 
Make sence. It can be controlled by the LowerGEP.

================
Comment at: lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:889
@@ +888,3 @@
+
+void SeparateConstOffsetFromGEP::transformToSimplerGEPs(
+    GetElementPtrInst *GEP, int64_t AccumulativeByteOffset) {
----------------
jingyue wrote:
> Can we merge the two transformToXXX functions? The logics are pretty similar. 
I think we'd better not do merge. They logic look similar, but they do quite different things. If we want to merge them, we need to add checks like "if (TransformToSimplerGEP)" in many places (From the first for loop logic to the function end). That would be a quite hard work for other people to understand the code.

================
Comment at: lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:1016
@@ +1015,3 @@
+          ConstantOffsetExtractor::Extract(GEP->getOperand(I), DL, GEP);
+      if (NewIdx != nullptr) {
+        // Watch out for the index like a = 1 + 2. If such index is analyzed,
----------------
jingyue wrote:
> OK, now I see what you're doing here. According to our discussion in D5863, this looks like a temporary work around for the CFGSimplification issue, and will eventually go away. I don't think it's a good idea to hack this pass to work around that issue. I'd rather run instsimplify before SeparateConstOffset in the AArch64 backend, and add a FIXME there saying once the CFGSimplification issue is fixed we needn't run instsimplify any more. 
I just don't think this is a temporary work. We cannot assume there is no situation like "a = 1+2". People also can create a test case of "a = 1 + 2" and feed to this pass. I mean people can call this pass without any other passes. So if we don't accumulate all none zero indices in this place, we'll get incorrect result.

We want to "// Remove the constant offset in each GEP index" and "// Splits this GEP index into a variadic part and a constant offset" in this for loop. So the later part about transformation won't consider about the none zero constant indices, it assumes there is all varaible indices or zero indices. See that I only check "if (!isa<ConstantInt>(GEP->getOperand(I)))". So if we leave an index of "constant 1" behand, it will be ignored and we'll generate incorrect result. That's why I want to make sure we have accumulated all the none zero constant indices.

http://reviews.llvm.org/D5864