[PATCH] [AArch64] Improve and enable the SeparateConstOffsetFromGEP for AArch64 backend.

Fri Nov 7 17:04:35 PST 2014

I see what's going on: you only extract constant offsets from array indices, and leave struct indices in those transformXXX functions. Thanks for clarification! I think that's worth some comment though, and I will point that out in my inline comments.

================
Comment at: lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:82
@@ -81,1 +81,3 @@
 //
+// Another improvement enabled by the LowerGEP flag is to transform the given
+// complex GEP into a simpler form of several simpler GEPs or arithmetic
----------------
The word "simpler" is overloaded. I'd explicitly say something like:

// to lower a GEP with multiple indices into multiple GEPs with a single index or arithmetic operations ...

Is this more accurate? 

================
Comment at: lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:102
@@ +101,3 @@
+// If LowerGEP is true, this pass will transform them into arithmetic operations
+// as follows (Another simpler form of simpler GEPs is similar):
+//  BB1:
----------------
The sentence "another simpler form of simpler GEPs is similar" is a little awkward. The word "simple" is very overloaded. What do you mean exactly by "simpler"? Consider using multiple sentences and/or an example to explain what this lowering to GEPs exactly mean. 

================
Comment at: lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:125
@@ +124,3 @@
+//
+// This can also benefit other passes such as LICM and CGP.
+// LICM (Loop Invariant Code Motion) can not optimize a complex GEP if it has
----------------
This => Lowering GEPs

================
Comment at: lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:126
@@ +125,3 @@
+// This can also benefit other passes such as LICM and CGP.
+// LICM (Loop Invariant Code Motion) can not optimize a complex GEP if it has
+// variant part. Such transformation may split it into invariant and variant
----------------
The word "optimize" is overloaded. Is hoist/sink a better word? 

================
Comment at: lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:131
@@ +130,3 @@
+// target's addressing modes. A complex GEP may not match and will not be sunk.
+// Such transformation can split it. Then part of the original GEP may be
+// matched and should be sunk, so we end up with better address calculations.
----------------
Such transformation can split it => Lowering GEPs can split such GEPs

================
Comment at: lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:184
@@ -130,3 +183,3 @@
   /// Looks for a constant offset without extracting it. The meaning of the
   /// arguments and the return value are the same as Extract.
   static int64_t Find(Value *Idx, const DataLayout *DL, GetElementPtrInst *GEP);
----------------
You may want to say Find looks for constant offsets in both array indices and field indices. 

Because the signature of Extract is changed, you need to remove the comment "The meaning of the arguments and the return value ..." and explicitly explain what its arguments and return value mean. 

================
Comment at: lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:313
@@ +312,3 @@
+  /// \p AccumulativeByteOffset The extracted constant offset.
+  void transformToSimplerGEPs(GetElementPtrInst *GEP,
+                              int64_t AccumulativeByteOffset);
----------------
I don't like the word "transform" and "simpler". To be more accurate, I'd use lower instead of transform. What do you exactly mean by "simpler"? Are you lowering a GEP with multiple indices to multiple GEPs with a single index? If that's the case, I'd say that explicitly in order to be more accurate. 

================
Comment at: lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:319
@@ +318,3 @@
+  /// \p AccumulativeByteOffset The extracted constant offset.
+  void transformToArithmetics(GetElementPtrInst *GEP,
+                              int64_t AccumulativeByteOffset);
----------------
ditto

================
Comment at: lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:323
@@ +322,3 @@
+  /// LowerGEP is true, it will find in indices of both sequential and structure
+  /// types, else it will only find in sequential indices. The output
+  /// NeedsExtraction indicates whether we can find a none zero constant offset.
----------------
will find => finds
will only find => only finds
else => ; otherwise, 

================
Comment at: lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:324
@@ -258,1 +323,3 @@
+  /// types, else it will only find in sequential indices. The output
+  /// NeedsExtraction indicates whether we can find a none zero constant offset.
   int64_t accumulateByteOffset(GetElementPtrInst *GEP, bool &NeedsExtraction);
----------------
none zero => non-zero

================
Comment at: lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:345
@@ +344,3 @@
+  const TargetMachine *TM;
+  /// Whether transforms a complex GEP into a simpler form (Arithmetic
+  /// operations or several simpler GEPs).
----------------
/// Whether to transform

================
Comment at: lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:713
@@ +712,3 @@
+  // InBounds when creating MUL/SHL.
+  bool InBounds = GEP->isInBounds();
+  IRBuilder<> Builder(GEP);
----------------
If you buy that transformXXX should run after setIsInBounds(false), you needn't check inbound in this function. 

================
Comment at: lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:717
@@ +716,3 @@
+
+  SmallVector<Value *, 4> ResultIndices;
+  gep_type_iterator GTI = gep_type_begin(*GEP);
----------------
These are essentially byte offsets instead of indices, because they are scaled by element sizes. Do we have a better name? ByteOffsetOfIndex? 

================
Comment at: lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:717
@@ +716,3 @@
+
+  SmallVector<Value *, 4> ResultIndices;
+  gep_type_iterator GTI = gep_type_begin(*GEP);
----------------
jingyue wrote:
> These are essentially byte offsets instead of indices, because they are scaled by element sizes. Do we have a better name? ByteOffsetOfIndex? 
Actually, is this vector even necessary? Can you build the resultant uglygep along the way? 

================
Comment at: lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:719
@@ +718,3 @@
+  gep_type_iterator GTI = gep_type_begin(*GEP);
+  // Create new indices for each vairable indices.
+  for (unsigned I = 1, E = GEP->getNumOperands(); I != E; ++I, ++GTI) {
----------------
vairable => variable

================
Comment at: lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:721
@@ +720,3 @@
+  for (unsigned I = 1, E = GEP->getNumOperands(); I != E; ++I, ++GTI) {
+    if (!isa<ConstantInt>(GEP->getOperand(I))) {
+      Value *Idx = GEP->getOperand(I);
----------------
Why checking whether it's a constant? Shouldn't you check whether it is an array index? My understanding is, the offsets of structure indices (which are guaranteed to be constant) and the constant offsets of array indices are already counted in accumulativeByteOffsets, and now you only need to rebuild the variadic part of the array indices. 

================
Comment at: lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:726
@@ +725,3 @@
+      if (ElementSize != 1) {
+        if (ElementSize.isPowerOf2())
+          Idx = Builder.CreateShl(
----------------
{ at the end

================
Comment at: lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:730
@@ +729,3 @@
+              false, InBounds);
+        else
+          Idx = Builder.CreateMul(Idx, ConstantInt::get(IntPtrTy, ElementSize),
----------------
} else {

================
Comment at: lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:744
@@ +743,3 @@
+
+  // Create simpler GEPs for each variable indices.
+  for (auto I = ResultIndices.begin(), E = ResultIndices.end(); I != E; ++I) {
----------------
The meaning of "simpler" here is unclear. 

================
Comment at: lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:779
@@ +778,3 @@
+  for (unsigned I = 1, E = GEP->getNumOperands(); I != E; ++I, ++GTI) {
+    if (!isa<ConstantInt>(GEP->getOperand(I))) {
+      Value *Idx = GEP->getOperand(I);
----------------
ditto. See Line 721

================
Comment at: lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:784
@@ +783,3 @@
+      if (ElementSize != 1) {
+        if (ElementSize.isPowerOf2())
+          Idx = Builder.CreateShl(
----------------
ditto. curly brackets. 

================
Comment at: lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:797
@@ +796,3 @@
+  // Create an ADD for the constant offset.
+  if (AccumulativeByteOffset != 0)
+    ResultPtr = Builder.CreateAdd(
----------------
{

================
Comment at: lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:801
@@ +800,3 @@
+        false, InBounds);
+
+  ResultPtr = Builder.CreateIntToPtr(ResultPtr, GEP->getType());
----------------
}

================
Comment at: lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:827
@@ +826,3 @@
+  // splitting probably won't be beneficial.
+  if (!LowerGEP) {
+    TargetTransformInfo &TTI = getAnalysis<TargetTransformInfo>();
----------------
Why do we need to check LowerGEP here? My understanding is, even when LowerGEP is set, +AccumulativeByteOffset will still become part of the lowered GEPs or arithmetics (see how transformToSimplerGEPs create a GEP with offset AccumulativeByteOffset, and how transformToArithmetics create an add with offset AccumulativeByteOffset). Shouldn't we still check whether this offset can be nicely folded into an addressing mode? 

================
Comment at: lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:836
@@ -654,2 +835,2 @@
 
   // Remove the constant offset in each GEP index. The resultant GEP computes
----------------
// ... in each sequential index

Emphasizing you only remove constant offsets from sequential indices is important, because the behavior here is slightly different from accumulateByteOffset which looks into both sequential and struct indices. 

You also need to explain why removing constant offsets in struct field index here is dangerous, and mention the transformXXX functions will take care of that. 

================
Comment at: lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:863
@@ +862,3 @@
+
+  // Transforms a complex GEP into a simpler form.
+  if (LowerGEP) {
----------------
Run these transformXXX functions before GEP->setIsInBounds(false) seems wrong to me, because transformXXX may treat off-bound GEPs as inbound and add nsw flags incorrectly. 

================
Comment at: lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:866
@@ +865,3 @@
+    // As currently BasicAA does not analyze ptrtoint/inttoptr, do not transform
+    // to such form if the target use alias analysis in codegen.
+    if (TM && TM->getSubtarget<TargetSubtargetInfo>().useAA())
----------------
"such form" is too obscure

================
Comment at: lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:889
@@ +888,3 @@
+
+void SeparateConstOffsetFromGEP::transformToSimplerGEPs(
+    GetElementPtrInst *GEP, int64_t AccumulativeByteOffset) {
----------------
HaoLiu wrote:
> jingyue wrote:
> > Can we merge the two transformToXXX functions? The logics are pretty similar. 
> I think we'd better not do merge. They logic look similar, but they do quite different things. If we want to merge them, we need to add checks like "if (TransformToSimplerGEP)" in many places (From the first for loop logic to the function end). That would be a quite hard work for other people to understand the code.
The more I look at these two functions, the more strongly I believe we should merge them. Especially, after you get rid of ResultIndices, you probably only need to check LowerGEP once in the merged function. No? 

================
Comment at: lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:1016
@@ +1015,3 @@
+          ConstantOffsetExtractor::Extract(GEP->getOperand(I), DL, GEP);
+      if (NewIdx != nullptr) {
+        // Watch out for the index like a = 1 + 2. If such index is analyzed,
----------------
HaoLiu wrote:
> jingyue wrote:
> > OK, now I see what you're doing here. According to our discussion in D5863, this looks like a temporary work around for the CFGSimplification issue, and will eventually go away. I don't think it's a good idea to hack this pass to work around that issue. I'd rather run instsimplify before SeparateConstOffset in the AArch64 backend, and add a FIXME there saying once the CFGSimplification issue is fixed we needn't run instsimplify any more. 
> I just don't think this is a temporary work. We cannot assume there is no situation like "a = 1+2". People also can create a test case of "a = 1 + 2" and feed to this pass. I mean people can call this pass without any other passes. So if we don't accumulate all none zero indices in this place, we'll get incorrect result.
> 
> We want to "// Remove the constant offset in each GEP index" and "// Splits this GEP index into a variadic part and a constant offset" in this for loop. So the later part about transformation won't consider about the none zero constant indices, it assumes there is all varaible indices or zero indices. See that I only check "if (!isa<ConstantInt>(GEP->getOperand(I)))". So if we leave an index of "constant 1" behand, it will be ignored and we'll generate incorrect result. That's why I want to make sure we have accumulated all the none zero constant indices.
I'm confused. Check my comment on the line 

```
if (!isa<ConstantInt>(GEP->getOperand(I)))
```
and if my comment there makes sense I'll come back here.

http://reviews.llvm.org/D5864