[PATCH] D30416: [BitfieldShrinking] Shrink Bitfields load/store when the bitfields are legal to access independently

Wei Mi via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Apr 25 10:29:07 PDT 2017


wmi added a comment.

Thanks for drafting the comments. Your version is clearly more descriptive, and I like the variable names (LargeVal and SmallVal), which are much better than the ones I used (OrigVal and MaskedVal). I will rewrite the comments based on your draft.



================
Comment at: lib/CodeGen/MemAccessShrinking.cpp:374-396
+/// Try to shrink the store when the input \p SI has one of the patterns:
+/// Pattern1: or(and(Val, Cst), MaskedVal).
+/// Pattern2: or/and/xor(load, Cst).
+/// For the first pattern, when the Cst and MaskedVal satisfies some
+/// requirements, the or+and pair has the property that only a portion of
+/// Val is modified and the rest of it are not changed. We want to shrink
+/// the store to be aligned to the modified range of the Val.
----------------
chandlerc wrote:
> After reading more of this routine, I think you should split it into two routines, one that tries to handle the first pattern, and one that only handles the second pattern.
> 
> You can factor the rewriting code that is currently shared by both patterns into utility functions that are called for both. But the logic of this routine is harder to follow because you always have this state to hold between doing two different kinds of transforms.
I tried that split before, but I had to move many temporaries into MemAccessShrinkingPass. However, some of those temporaries are used only by store shrinking and not by load shrinking, which looked a little odd. I agree the logic will be clearer if the routine is split in two. I will try again and see whether I can hoist the common temporaries into MemAccessShrinkingPass and pass the store-specific ones as parameters, or create a store shrinking class to hold the temporaries.


================
Comment at: lib/CodeGen/MemAccessShrinking.cpp:407
+
+  unsigned TBits = DL->getTypeSizeInBits(StoreTy);
+  if (TBits != DL->getTypeStoreSizeInBits(StoreTy))
----------------
chandlerc wrote:
> `TBits` doesn't really give me enough information as a variable name... Maybe `StoreBitSize`?
Ok, will change it.


================
Comment at: lib/CodeGen/MemAccessShrinking.cpp:417-430
+  // However the pattern matching below is more complex than that because
+  // of the commutation and some matching preference we expect.
+  // for case: or(and(or(and(LI, Cst_1), MaskedVal_1), Cst_2), MaskedVal_2)
+  // and if MaskedVal_2 happens to be another and operator, we hope we can
+  // match or(and(LI, Cst_1), MaskedVal_1) to Val instead of MaskedVal.
+  bool OrAndPattern =
+      match(Val, m_c_Or(m_And(m_Load(LI), m_ConstantInt(Cst)),
----------------
chandlerc wrote:
> Even after reading your comment I'm not sure I understand what is driving the complexity of this match.
> 
> Can you explain (maybe just here in the review) what patterns you're trying to handle that are motivating this?
> 
> I'm wondering whether there is any canonicalization that can be leveraged (or added if not already there) to reduce the complexity of the pattern here. Or if we really have to handle this complexity, what the best way to write it and/or comment it so that readers understand the result.
I'll borrow the template from your comment to explain: store(or(and(LargeVal, MaskConstant), SmallVal), address)
The case is:

store(or_1(and_1(or_2(and_2(load, -65281), Val1), -256), and_3(Val2, 7)))

The two operands of "or_1" are "and_1" and "and_3", but the matcher doesn't know whether the LargeVal lives under "and_1" or under "and_3". I want "or_2" to be matched as the LargeVal rather than as the MaskedVal. This is a common pattern after bitfield load/store coalescing.

But while explaining it to you, I realized that I can split the complex pattern matching into two simpler matches:
%%%
bool OrAndPattern = match(Val, m_c_Or(m_And(m_Value(LargeVal), m_ConstantInt(Cst)),
                                      m_Value(SmallVal)));
// If SmallVal is itself the nested or+and tree, it is really the LargeVal.
if (match(SmallVal, m_c_And(m_c_Or(m_And(m_Value(), m_Value()), m_Value()),
                            m_Value())))
  std::swap(SmallVal, LargeVal);
%%%
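To make the invariant behind the pattern concrete, here is a minimal sketch (plain C++, no LLVM types; the function name is mine, not from the patch) of what or(and(LargeVal, Cst), SmallVal) computes: assuming every set bit of SmallVal lies inside ~Cst, only the bits that Cst clears are rewritten and the rest of LargeVal survives unchanged.

```cpp
#include <cstdint>

// Hypothetical illustration of the store(or(and(LargeVal, Cst), SmallVal))
// pattern: the and clears a bit range of LargeVal, the or deposits
// SmallVal into that cleared range.
uint32_t mergeBitfield(uint32_t LargeVal, uint32_t Cst, uint32_t SmallVal) {
  return (LargeVal & Cst) | SmallVal;
}
```

With Cst = 0xFFFF00FF (the -65281 from the example above), only bits 8..15 are modified, which is exactly the range the shrunken store needs to cover.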


================
Comment at: lib/CodeGen/MemAccessShrinking.cpp:453-456
+  if (OrAndPattern)
+    analyzeOrAndPattern(*MaskedVal, *Cst, MR, TBits);
+  if (BopPattern)
+    analyzeBOpPattern(*Val, *Cst, MR, TBits);
----------------
chandlerc wrote:
> What happens when both are true? It looks like we just overwrite the 'MR' code?
> 
> I feel like both of these `analyze...()` methods should return the `ModRange` struct rather than having an output parameter.
Right, we just overwrite "MR", which loses the result for "OrAndPattern". I will change the second "if" to an "else if".
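A minimal sketch of the return-by-value shape suggested here (plain C++; the ModRange fields and the mask-based analysis are stand-ins for the real APInt-based code, not the patch itself): each analyze routine builds and returns its own ModRange, so the two patterns can no longer silently clobber each other's result.

```cpp
#include <cstdint>

// Hypothetical ModRange, mirroring the fields discussed in this review.
struct ModRange {
  unsigned Shift = 0;
  unsigned Width = 0;
  bool ShrinkWithMaskedVal = false;
};

// Stand-in analysis: derive the modified bit range from the bits the
// constant mask clears (assumes a single contiguous run of cleared bits).
ModRange analyzeOrAndPattern(uint64_t Cst, unsigned TBits) {
  ModRange MR;
  uint64_t TypeMask = (TBits == 64) ? ~0ULL : ((1ULL << TBits) - 1);
  uint64_t Cleared = ~Cst & TypeMask;
  if (Cleared == 0)
    return MR; // nothing is modified; caller sees ShrinkWithMaskedVal == false
  MR.Shift = __builtin_ctzll(Cleared);
  MR.Width = 64 - __builtin_clzll(Cleared) - MR.Shift;
  MR.ShrinkWithMaskedVal = true;
  return MR;
}
```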


================
Comment at: lib/CodeGen/MemAccessShrinking.cpp:467-470
+    // If MR.Width is not the length of legal type, we cannot
+    // store MaskedVal directly.
+    if (MR.Width != 8 && MR.Width != 16 && MR.Width != 32)
+      MR.ShrinkWithMaskedVal = false;
----------------
chandlerc wrote:
> Should this be testing against the `DataLayout` rather than hard coded `8`, `16`, and `32`? What if 64 bits is legal and that's the width of the MR?
That is better. Will fix it.
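A sketch of what that fix might look like (plain C++; the real code would query DataLayout/TargetLowering for legal integer widths, so the default width set here is purely an assumption for illustration):

```cpp
#include <initializer_list>

// Hypothetical replacement for the hard-coded (Width != 8 && != 16 && != 32)
// test: check the candidate width against a target-provided set of legal
// store widths instead of a fixed list.
bool isLegalStoreWidth(unsigned Width,
                       std::initializer_list<unsigned> LegalWidths =
                           {8, 16, 32, 64}) {
  for (unsigned W : LegalWidths)
    if (Width == W)
      return true;
  return false;
}
```

This way a target where 64-bit stores are legal naturally allows a 64-bit ModRange.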


================
Comment at: lib/CodeGen/MemAccessShrinking.cpp:604-607
+/// Check the range of Cst containing non-zero bit is within \p MR when
+/// \p AInB is true. If \p AInB is false, check MR is within the range
+/// of Cst.
+static bool CompareAPIntWithModRange(APInt &AI, ModRange &MR, bool AInB) {
----------------
chandlerc wrote:
> This comment and function name don't really add up for me...
> 
> There is no `Cst` parameter here. I assume you mean `AI`?
> 
> Also having a flag like `AInB` seems to make this much more confusing to read. Why not just have two routines for each case?
> 
> My guess at what this is actually trying to do is `areConstantBitsWithinModRange` and `areConstantBitsOutsideModRange`?
Sorry, "Cst" means "AI" here.
The function implements both areConstantBitsWithinModRange and areModRangeWithinConstantBits.
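A sketch of the split into two separately named predicates (plain C++; Bits stands in for the APInt constant and ModRange for the struct in the patch, so names and representation are assumptions):

```cpp
#include <cstdint>

// Hypothetical ModRange: a bit range [Shift, Shift + Width).
struct ModRange {
  unsigned Shift;
  unsigned Width;
};

// Mask with exactly the bits of the range set.
static uint64_t rangeMask(const ModRange &MR) {
  uint64_t M = (MR.Width == 64) ? ~0ULL : ((1ULL << MR.Width) - 1);
  return M << MR.Shift;
}

// Every set bit of the constant lies inside the modified range.
bool areConstantBitsWithinModRange(uint64_t Bits, const ModRange &MR) {
  return (Bits & ~rangeMask(MR)) == 0;
}

// Every bit of the modified range is set in the constant.
bool areModRangeWithinConstantBits(uint64_t Bits, const ModRange &MR) {
  return (rangeMask(MR) & ~Bits) == 0;
}
```

Two small predicates avoid the `AInB` flag entirely, which addresses the readability concern above.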


================
Comment at: lib/CodeGen/MemAccessShrinking.cpp:619-623
+/// Check if there is overlap between range \MR and \OtherMR.
+static bool nonoverlap(ModRange &MR, ModRange &OtherMR) {
+  return (MR.Shift > OtherMR.Shift + OtherMR.Width - 1) ||
+         (OtherMR.Shift > MR.Shift + MR.Width - 1);
+}
----------------
chandlerc wrote:
> Maybe a method and use the term 'disjoint'? `MR1.isDisjoint(MR2)` reads a bit better to me.
Ok. 
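The suggested method could look like this sketch (plain C++; ModRange fields as discussed in this review, everything else assumed): two half-open bit ranges [Shift, Shift + Width) are disjoint when each starts at or past the other's end.

```cpp
// Hypothetical ModRange with the isDisjoint method suggested above.
struct ModRange {
  unsigned Shift;
  unsigned Width;

  // True when the two bit ranges [Shift, Shift + Width) do not overlap;
  // equivalent to the (Shift > Other.Shift + Other.Width - 1) form in the
  // patch, but without the off-by-one-prone "- 1".
  bool isDisjoint(const ModRange &Other) const {
    return Shift >= Other.Shift + Other.Width ||
           Other.Shift >= Shift + Width;
  }
};
```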


================
Comment at: lib/CodeGen/MemAccessShrinking.cpp:626-628
+/// the MemoryLocation specified by MR. We relax it a little bit to allow
+/// clobber instruction in the same BB as IN, if that happens we need to
+/// set the NewInsertPt to be the new location to insert the shrinked load.
----------------
chandlerc wrote:
> This makes this a very confusing API -- now it isn't really just a predicate, it also computes the insertion point...
> 
> Also, why do you need to compute a particular insertion point within a basic block? Can't you just always insert into the beginning of the basic block and let the scheduler do any other adjustments?
Good point. If there is a clobbering instruction but it is in the same block as the "To" instruction, I can simply insert the shrunken load at the beginning of that block, and NewInsertPt is not needed.

But I still prefer an insertion point closer to the "To" instruction when there is no clobbering instruction, because the resulting IR looks better. That means the function has to return at least a flag indicating whether the load must be inserted at the beginning of the "To" instruction's block.

I.e., I can simplify "Instruction *&NewInsertPt" to a flag. Is that API acceptable?


Repository:
  rL LLVM

https://reviews.llvm.org/D30416




