[PATCH] D30416: [BitfieldShrinking] Shrink Bitfields load/store when the bitfields are legal to access independently

Chandler Carruth via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Apr 24 18:41:39 PDT 2017


chandlerc added a comment.

First off, really sorry to keep sending *partial* code reviews. =[ I again didn't quite have enough time to do a full review of the patch (it is a bit large), but I wanted to at least send out everything I have so that you aren't blocked waiting on me to produce some comments. =] I'll try again tomorrow to make more progress here, although it may make sense for me to wait for the next iteration, since one of the refactorings I'm suggesting will, I think, change the structure quite a bit.

In https://reviews.llvm.org/D30416#734516, @wmi wrote:

> Thanks for bearing with my poor English.


Please don't stress at all. =D I think reviewing comments, phrasing, etc., needs to happen in any code review. The whole point is to figure out how to write comments and such in a way that makes sense to others, and speaking for myself at least, no level of knowledge of the English language is enough on its own -- it really takes someone else reading the text to figure this out.



================
Comment at: lib/CodeGen/MemAccessShrinking.cpp:374-396
+/// Try to shrink the store when the input \p SI has one of the patterns:
+/// Pattern1: or(and(Val, Cst), MaskedVal).
+/// Pattern2: or/and/xor(load, Cst).
+/// For the first pattern, when the Cst and MaskedVal satisfies some
+/// requirements, the or+and pair has the property that only a portion of
+/// Val is modified and the rest of it are not changed. We want to shrink
+/// the store to be aligned to the modified range of the Val.
----------------
I have to say I find this entire comment block hard to understand. There are a lot of factors contributing to that.

- The formatting has paragraphs and lists run together.
- It doesn't really introduce things, and then expand on them in a way that helps the reader have the context necessary.
- Inconsistencies like switching between Val and 'load'.
- Referencing things like `ShrinkWithMaskedVal` where the name doesn't clearly convey the underlying semantics you're trying to reference (and there is no documentation).

Here is my attempt at suggesting improved comments, but I'm not sure I'm really capturing everything because I'm not sure I understand the original comment:

  Try to shrink a store by shrinking its inputs.

  The first pattern we try to handle is when a smaller value is "inserted"
  into some bits of a larger value. This typically looks something like:

    store(or(and(LargeVal, MaskConstant), zext(SmallVal)), address)

  Here, we try to use a narrow store of `SmallVal` rather than
  bit operations to combine them:

    store(LargeVal, address)
    store(SmallVal, address)

  Or if `LargeVal` was a load, we may not need to store it at all:

    store(SmallVal, address)

  We may also be able to incorporate shifts of `SmallVal` by using an offset
  of `address`.
  
  However, this may not be valid if, for example, the size of `SmallVal`
  isn't a valid type to store or the shift can't be modeled as a valid
  offset from the address.
  
  The second kind of pattern we try to shrink (which may also apply to
  code matching the first pattern when that transformation isn't valid)
  is to shrink the inputs to basic bit operations (or, and, xor) and
  then store the smaller width result. This is valid whenever we know
  that shrinking the inputs and operations doesn't change the result.
  For example, if the removed bits are known to be 0s, 1s, or undef,
  we may be able to avoid the bitwise computation on them. This
  is particularly useful when the type width is not a legal type,
  and when the inputs are loads and constants.

You probably want to rewrite this some (as I said, I'm kind of guessing about some of this) but hopefully it helps show a structure that (for myself at least) would be much easier to read and understand.


================
Comment at: lib/CodeGen/MemAccessShrinking.cpp:374-396
+/// Try to shrink the store when the input \p SI has one of the patterns:
+/// Pattern1: or(and(Val, Cst), MaskedVal).
+/// Pattern2: or/and/xor(load, Cst).
+/// For the first pattern, when the Cst and MaskedVal satisfies some
+/// requirements, the or+and pair has the property that only a portion of
+/// Val is modified and the rest of it are not changed. We want to shrink
+/// the store to be aligned to the modified range of the Val.
----------------
After reading more of this routine, I think you should split it into two routines, one that tries to handle the first pattern, and one that only handles the second pattern.

You can factor the rewriting code that is currently shared by both patterns into utility functions called from both. But the logic of this routine is harder to follow as written, because you always have to carry state between the two different kinds of transforms.


================
Comment at: lib/CodeGen/MemAccessShrinking.cpp:407
+
+  unsigned TBits = DL->getTypeSizeInBits(StoreTy);
+  if (TBits != DL->getTypeStoreSizeInBits(StoreTy))
----------------
`TBits` doesn't really give me enough information as a variable name... Maybe `StoreBitSize`?


================
Comment at: lib/CodeGen/MemAccessShrinking.cpp:417-430
+  // However the pattern matching below is more complex than that because
+  // of the commutation and some matching preference we expect.
+  // for case: or(and(or(and(LI, Cst_1), MaskedVal_1), Cst_2), MaskedVal_2)
+  // and if MaskedVal_2 happens to be another and operator, we hope we can
+  // match or(and(LI, Cst_1), MaskedVal_1) to Val instead of MaskedVal.
+  bool OrAndPattern =
+      match(Val, m_c_Or(m_And(m_Load(LI), m_ConstantInt(Cst)),
----------------
Even after reading your comment I'm not sure I understand what is driving the complexity of this match.

Can you explain (maybe just here in the review) what patterns you're trying to handle that are motivating this?

I'm wondering whether there is any canonicalization that can be leveraged (or added if not already there) to reduce the complexity of the pattern here. Or if we really have to handle this complexity, what the best way to write it and/or comment it so that readers understand the result.


================
Comment at: lib/CodeGen/MemAccessShrinking.cpp:453-456
+  if (OrAndPattern)
+    analyzeOrAndPattern(*MaskedVal, *Cst, MR, TBits);
+  if (BopPattern)
+    analyzeBOpPattern(*Val, *Cst, MR, TBits);
----------------
What happens when both are true? It looks like we just overwrite the `MR` state?

I feel like both of these `analyze...()` methods should return the `ModRange` struct rather than having an output parameter.


================
Comment at: lib/CodeGen/MemAccessShrinking.cpp:467-470
+    // If MR.Width is not the length of legal type, we cannot
+    // store MaskedVal directly.
+    if (MR.Width != 8 && MR.Width != 16 && MR.Width != 32)
+      MR.ShrinkWithMaskedVal = false;
----------------
Should this be testing against the `DataLayout` rather than hard coded `8`, `16`, and `32`? What if 64 bits is legal and that's the width of the MR?


================
Comment at: lib/CodeGen/MemAccessShrinking.cpp:604-607
+/// Check the range of Cst containing non-zero bit is within \p MR when
+/// \p AInB is true. If \p AInB is false, check MR is within the range
+/// of Cst.
+static bool CompareAPIntWithModRange(APInt &AI, ModRange &MR, bool AInB) {
----------------
This comment and function name don't really add up for me...

There is no `Cst` parameter here. I assume you mean `AI`?

Also having a flag like `AInB` seems to make this much more confusing to read. Why not just have two routines for each case?

My guess at what this is actually trying to do is `areConstantBitsWithinModRange` and `areConstantBitsOutsideModRange`?


================
Comment at: lib/CodeGen/MemAccessShrinking.cpp:619-623
+/// Check if there is overlap between range \MR and \OtherMR.
+static bool nonoverlap(ModRange &MR, ModRange &OtherMR) {
+  return (MR.Shift > OtherMR.Shift + OtherMR.Width - 1) ||
+         (OtherMR.Shift > MR.Shift + MR.Width - 1);
+}
----------------
Maybe a method and use the term 'disjoint'? `MR1.isDisjoint(MR2)` reads a bit better to me.


================
Comment at: lib/CodeGen/MemAccessShrinking.cpp:626-628
+/// the MemoryLocation specified by MR. We relax it a little bit to allow
+/// clobber instruction in the same BB as IN, if that happens we need to
+/// set the NewInsertPt to be the new location to insert the shrinked load.
----------------
This makes this a very confusing API -- now it isn't really just a predicate, it also computes the insertion point...

Also, why do you need to compute a particular insertion point within a basic block? Can't you just always insert into the beginning of the basic block and let the scheduler do any other adjustments?


Repository:
  rL LLVM

https://reviews.llvm.org/D30416




