[PATCH] [AArch64] Improve and enable the SeparateConstOffsetFromGEP for AArch64 backend.

Thu Nov 13 10:48:50 PST 2014

A general suggestion: I guess you upload your diff using Phab's web interface. If so, I suggest you upload patches with full context (http://llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface) from now on or use arc which uploads full context by default. It's a little difficult to review a diff with only 3-line contexts and Phab seems to have a little trouble computing the diff between two diffs with limited context (at least the line numbers don't match in many places).

================
Comment at: lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:720
@@ +719,3 @@
+      StructType *StTy = cast<StructType>(*GTI);
+      int64_t Field = cast<ConstantInt>(GEP->getOperand(I))->getSExtValue();
+      // Skip field 0 as the offset is always 0.
----------------
HaoLiu wrote:
> jingyue wrote:
> > Not critical, but I slightly prefer getZExtValue() because a field index is non-negative and getElementOffset takes unsigned. 
> Aha, previously I use getZExtValue(). You commented I should change to getSExtValue(). I thought it was not critical, so I modified it. 
> Now, It seems I should roll back...
> : )
No, I don't think I suggested you change it to getSExtValue(). When I said "getSExtValue() returns uint64_t", I expected you to write
```
uint64_t Field = ...->getZExtValue();
```
because I slightly prefer delaying truncating which loses precision. But, anyway, I'm fine with "unsigned". 

================
Comment at: lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:184
@@ -130,3 +183,3 @@
   /// Looks for a constant offset without extracting it. The meaning of the
   /// arguments and the return value are the same as Extract.
   static int64_t Find(Value *Idx, const DataLayout *DL, GetElementPtrInst *GEP);
----------------
HaoLiu wrote:
> jingyue wrote:
> > jingyue wrote:
> > > You may want to say Find looks for constant offsets in both array indices and field indices. 
> > > 
> > > Because the signature of Extract is changed, you need to remove the comment "The meaning of the arguments and the return value ..." and explicitly explain what its arguments and return value mean. 
> > I don't see the first part of this comment addressed. I'd like to emphasize "both array indices and field indices". 
> I think "both array indices and field indices" means all indices. We don't need to explain this common behavior. It is quite obvious. I think we need comments if we only find in one kind of "array indices" or "field indices", or do something special.
> Anyway, the new comments will be like the comments for Extract()
That makes sense. 

================
Comment at: lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:721
@@ +720,3 @@
+  for (unsigned I = 1, E = GEP->getNumOperands(); I != E; ++I, ++GTI) {
+    if (!isa<ConstantInt>(GEP->getOperand(I))) {
+      Value *Idx = GEP->getOperand(I);
----------------
HaoLiu wrote:
> jingyue wrote:
> > HaoLiu wrote:
> > > jingyue wrote:
> > > > Why checking whether it's a constant? Shouldn't you check whether it is an array index? My understanding is, the offsets of structure indices (which are guaranteed to be constant) and the constant offsets of array indices are already counted in accumulativeByteOffsets, and now you only need to rebuild the variadic part of the array indices. 
> > > Yes, you are right. I'll add comments to explain this.
> > I wasn't just proposing a comment change. I meant the source code should read something like
> > ```
> > if (isa<SequentialType>(GTI))
> > ```
> > You are building new GEPs/arithmetics for the remainder of the array indices, and checking whether the indexed type is a SequentialType matches your intention much better. 
> > 
> > I also believe checking for SequentialType instead of ConstantInt solves your previous question on handling "a = 1 + 2". I'll go back to that later. 
> If check SequentialType, we also need to check wheter it is ConstantInt. Then the code will be like:
>     if (isa<SequentialType>(GTI))
>       if (!isa<ConstantInt>(GEP->getOperand(I))
>          ...
>       else
>          AccumulativeByteOffset += Const
> This looks more complicate.
> 
> About the "a = 1 + 2", I think handling it when extracting constant in indices is better. Then we don't need to add code in both lowerToXXX functions. Also if we handle it in lowerToXXX functions, the logic for the original splitGEP (I.E. when lowerGEP is diabled) can still not handle this problem.
> 
I'm confused. How does AccumulativeByteOffset in your snippet come up here? I think AccumulativeByteOffset is computed before calling transformToSimplerGEPs. Why do you bother changing that again? 

In case I haven't made this clear, I never expect Find/Extract always extracts all constant offsets. For example, given a = 1 + 2, I'm totally fine if the Find function finds only 2, and in that case the remaining code should treat 1 just as a variable remainder. The GEP after extraction may look like "gep base, i1, 1, i2" which may not be the optimal form, but again I am fine with that because it's sound. If we want to get the optimal form, I am looking for either a more comprehensive way (not just handling tricky cases like adding two constants; your initial diff might serve as a good start but the code was too complicated) or reassociating each index before this pass so that all constant offsets in the index are merged (which I suppose is the approach we want to go with). 

With that, do we still need the nested "if (!isa<CostantInt>)" check? Do you have an example where simply checking isa<SequentialType> isn't enough? I think a concrete example may help to make things clearer. 

================
Comment at: lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:827
@@ +826,3 @@
+  // splitting probably won't be beneficial.
+  if (!LowerGEP) {
+    TargetTransformInfo &TTI = getAnalysis<TargetTransformInfo>();
----------------
HaoLiu wrote:
> jingyue wrote:
> > HaoLiu wrote:
> > > jingyue wrote:
> > > > Why do we need to check LowerGEP here? My understanding is, even when LowerGEP is set, +AccumulativeByteOffset will still become part of the lowered GEPs or arithmetics (see how transformToSimplerGEPs create a GEP with offset AccumulativeByteOffset, and how transformToArithmetics create an add with offset AccumulativeByteOffset). Shouldn't we still check whether this offset can be nicely folded into an addressing mode? 
> > > I think even a pointer plus a constant index can not be matched in an addressing mode, it is worth to lower a GEP with multiple indices. Because we can do optimizations on the variable indices. Also if the constant index is extracted from multiple indices, the result looks better than the original GEP.
> > > If I remove such check, some test cases will fail. So it is kept when disable LowerGEP.
> > That makes some sense. Out of curiosity, which test would fail? 
> Oh, interesting. I supposed some of your test cases in test/Transforms/SeparateConstOffsetFromGEP/NVPTX may fail.
> But I tried removing such check and ran "make check", there is no test failures. So it seems there is no such test cases...
That was my suspicion. This check on the addressing mode is a very weak check: I believe both NVPTX and AArch64 support reg+<a potentially very large offset>, so I would be surprised that having/not having this check would cause tests to fail. 

You explained pretty well why it's still worth to lower a GEP even if address mode matching fails. I'd like to see that in the source code comments. Thanks! 

================
Comment at: lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:889
@@ +888,3 @@
+
+void SeparateConstOffsetFromGEP::transformToSimplerGEPs(
+    GetElementPtrInst *GEP, int64_t AccumulativeByteOffset) {
----------------
HaoLiu wrote:
> jingyue wrote:
> > jingyue wrote:
> > > HaoLiu wrote:
> > > > jingyue wrote:
> > > > > Can we merge the two transformToXXX functions? The logics are pretty similar. 
> > > > I think we'd better not do merge. They logic look similar, but they do quite different things. If we want to merge them, we need to add checks like "if (TransformToSimplerGEP)" in many places (From the first for loop logic to the function end). That would be a quite hard work for other people to understand the code.
> > > The more I look at these two functions, the more strongly I believe we should merge them. Especially, after you get rid of ResultIndices, you probably only need to check LowerGEP once in the merged function. No? 
> > I don't see this comment addressed. 
> I still doubt about your suggestion. The diff 16027 gets ride of ResultIndices. They are still quite different and difficult to combine them.
> Lowering to arithmetic operations starts from creating ptrtoint, and then creates ADD/MUL/SHL, and ends with creating ADD and inttoptr. 
> Lowering to GEPs starts from creating bitcast, and then creates MUL/SHL/GEPs, and ends with creating GEP and bitcast. 
Anyway, I'll probably do that by myself after this diff is in.

http://reviews.llvm.org/D5864