[LLVMdev] Continuing PR5680: preserve order of use lists in bitcode

Wed Jul 23 18:14:07 PDT 2014

I had a long discussion with Nick about this where he changed several of my
core assumptions and feelings about the best way to proceed here. I'd
rather he provide his perspective rather me try to repeat it, so poking him
on this thread....

On Sun, Jul 20, 2014 at 3:21 PM, Duncan P. N. Exon Smith <
dexonsmith at apple.com> wrote:

>
> > On 2014 Jul 8, at 18:47, Duncan P. N. Exon Smith <dexonsmith at apple.com>
> wrote:
> >
> >> On 2014-Jul-08, at 16:48, Chandler Carruth <chandlerc at google.com>
> wrote:
> >>
> >>
> >>> On Tue, Jul 8, 2014 at 4:29 PM, Duncan P. N. Exon Smith <
> duncan at exonsmith.com> wrote:
> >>> I'm looking to tackle PR5680 [1].  The primary goal is to prevent
> >>> behaviour changes in passes that depend on the order of use lists when
> >>> serializing/deserializing the IR to a bitcode file (.bc) between
> passes.
> >>>
> >>> Here's a quick high-level summary:
> >>>
> >>>  - When writing bitcode, calculate what the use list order will be
> >>>    after reading the bitcode.  (Assumption: the order is predictable.)
> >>>
> >>>  - Write a use-list-order-diff in a new (optional) section in the
> >>>    bitcode.
> >>>
> >>>  - When reading bitcode, apply the use-list-order-diff to restore the
> >>>    original use list order.
> >>
> >> So, it may be totally obvious, but is there a reason not to embed the
> observed order of the use list in the bitcode more directly? This seems
> quite round-about, and I'm curious why its needed.
> >
> > I think it would be harder (for the author *and* the compiler) to
> > read in the "correct" order on-the-fly.  In particular, I'm not
> > sure how to do it without explicitly storing full use-lists and
> > creating forward-refs to fill them up.
> >
> > So this design (when it's turned on) trades increased writer
> > complexity for smaller file size and decreased reader complexity.
>
> I've looked more deeply into how values are stored in bitcode.
>
> Here are some notes about how values are stored in bitcode, how reading
> affects their use-list-orders, my plan on how to proceed (skip there for
> TL;DR), and unresolved questions about future direction.
>
> How values are stored in bitcode
> ================================
>
> Every `Value` that has a use-list is mapped to an `unsigned` value ID,
> assigned from 1 in the order that values will be written to bitcode.
> `void` instructions (such as terminators) are not assigned value IDs,
> since they cannot be referenced.
>
> When writing `User`s, operands are indicated by their value ID.
>
> Assigning value IDs
> -------------------
>
> Globals are assigned value IDs first, and then written out inside the
> module block.  `GlobalValue`s are given the first set of value IDs, then
> all the non-`GlobalValue` constants that they use (in particular,
> constants used by aliases and variable initializers).
>
> For each function block, values in the function are assigned value IDs.
> The constants used by the function (that are not in the global block)
> are given a contiguous set of value IDs at the beginning, followed by
> other values.  After the function is written, the value IDs are reset to
> the state from before -- that is, including only the globals.  As a
> result, the function-local value IDs are reused for different `Value`s.
>
> If a constant is used by any global, it's only written out once (in the
> module block), even if it also has users in functions.  If a constant is
> not used by a global, it's written out in each function that uses it.
>
> Optimizing the constants for size
> ---------------------------------
>
> Each block of value IDs has a contiguous set of (non-`GlobalValue`)
> constants.  These constants are created in an order that precludes
> forward-references within the constant pool.
>
> However, Before they're used, the constants are sorted by type to save
> space in the bitcode (write one type, then all the constants that use
> that type).  This inserts forward references between constants.
>
> Value IDs in the reader
> -----------------------
>
> As the reader traverses the bitcode, values are implicitly assigned
> value IDs as they're created (skipping `void` instructions).  When a
> `User` has a forward-reference to a higher value ID, it uses a
> placeholder operand.  When that `Value` is eventually created, it RAUWs
> the placeholder.
>
> Constants typically can't be RAUW-ed directly: since they can have
> constant `User`s, those `User`s must also be replaced to use the new
> operands.  In order to minimize this, resolving constant placeholders is
> deferred until all constants (in the block) have been read, when they're
> replaced in bulk.
>
> Note that if not for the size optimization above, the only forward
> references to constants would be from `GlobalValue`s.
>
> Straw man: writing out use-lists directly
> =========================================
>
> Writing out use-lists directly is impractical.  Not all `User`s have
> value IDs, and those that do have overlapping value IDs.
>
> Theoretically, when writing out `User`s we could annotate its operands
> by which slot this `User` should take in each operand's use-list.
> However, I think this would change the bitcode too fundamentally, and
> would require more complicated management in the reader.
>
> Predicting the use-list order in the reader
> ===========================================
>
> Use-lists are stored in a link list.  `User`s are pushed on in the order
> that they're added.  When one `Value` is replaced with another (via RAUW
> or equivalent), its `User`s are popped off of one use-list and pushed
> onto the other, reversing the use-list order.
>
> Since the order of value IDs corresponds to the order that `Value`s are
> written to bitcode, we can perfectly predict the order of use-lists
> after reading them back in.
>
> Note that if the order matches, we don't need to write anything out.
>
> Normal values
> -------------
>
> For a typical `Value` (not constant, no `void` users), its use-list
> order is easily predictable.  Its forward references will be pushed
> first, then the use-list will be reversed when the `Value` is RAUW-ed,
> and finally its normal (back) references will be pushed.
>
> E.g., assume id-3 has four users: id-1, id-2, id-4, and id-5.  Given
> that values are read in order of value ID, this is how the use-list for
> id-3 will change as each value is read:
>
>   - after id-1: [id-1]
>   - after id-2: [id-2, id-1]
>   - after id-3: [id-1, id-2]
>   - after id-4: [id-4, id-1, id-2]
>   - after id-5: [id-5, id-4, id-1, id-2]
>
> For such "normal" values, it's trivial to sort a use-list to match what
> the reader will see:
>
>     auto compareUsers = [&IDs, ValueID](User *LHS, User *RHS) {
>         unsigned LID = IDs.lookup(LHS), RID = IDS.lookup(RHS);
>         if (LID > ValueID)
>             return LID > RID;
>         if (RID > ValueID)
>             return false;
>         return LID < RID;
>     }
>
> Void users
> ----------
>
> Instructions that have `void` type (such as `store` and terminators)
> aren't given value IDs at all.  However, we need an ordering for all
> users.
>
> Fortunately, when processing each function, all instructions are given
> an instruction ID.  We could modify `compareUsers()` above to take a
> function `getUserID()` written something like this:
>
>     auto getUserID = [&InstIDs, &IDs](User *U) {
>         if (unsigned ID = InstIDs.lookup(U))
>             return IDs.size() + ID;
>         return IDs.lookup(U);
>     };
>     auto compareUsers = [&getUserID](User *LHS, User *RHS) { /* ... */ }
>
> Constant forward-refs
> ---------------------
>
> Constants are more complicated.
>
>  1. Since (non-`GlobalValue`) constants are uniqued, when a constant's
>     operand is replaced, it needs to be replaced as well.  This causes
>     an extra reversal of its use list.  If a constant's operand's
>     operand is replaced, this has a ripple effect up the chain.
>
>     If a single constant has multiple operands that are placeholders,
>     both placeholders are resolved at once, causing only a single
>     reversal.
>
>  2. Constants in the module block can be referenced from any function
>     block, so their value IDs may overlap.
>
>  3. Function-local constants can also have users in multiple functions.
>
> (1) can be solved in an obvious way: design away the replacement
> behaviour.  In particular, when we are saving use lists, we can skip the
> size optimization that reorders constants, or enable a limited version
> that prevents forward references.
>
> The obvious advantage of this solution is that it lowers the complexity of
> predicting use-list-order.  A more subtle benefit is that use-list order
> won't depend on changes to the reader-side optimizations for resolving
> constant references in bulk.
>
> (2) and (3) can be solved together by keeping `User`s of
> unfinished-constants ordered in a side table.  At the end of each block,
> if all the `User`s have been serialized, then we can write out the
> use-lists for the constant as well.
>
> We can modify the `getUserID()` function from above to incorporate the
> side table:
>
>     auto getUserID = [&ConstantUserIDs, &InstIDs, &IDs](User *U) {
>         if (unsigned ID = ConstantUserIDs.lookup(U))
>             return IDs.size() + ID;
>         if (unsigned ID = InstIDs.lookup(U))
>             return IDs.size() + ConstantUserIDs.size() + ID;
>         return IDs.lookup(V);
>     };
>
> Shuffling the use-list
> ======================
>
> Assume for simplicity that value IDs are global and use-lists are always
> in order of value ID after reading.  In particular, assume that writing
> to then reading from bitcode will sort the use-lists like this:
>
>     auto compareUsers = [&IDs](User *LHS, User *RHS) {
>         return IDs.lookup(LHS) < IDS.lookup(RHS);
>     }
>
> How can we shuffle use-lists back from value-id-order to the
> original-order (when they don't match)?
>
> We can sort use-lists from original-order to value-id-order along with a
> parallel array of integers:
>
>     sort(0..7) by [id-20, id-7, id-9, id-2, id-18, id-12, id-5, id-24]
>         => [3, 6, 1, 2, 5, 4, 0, 7]
>
> This results in an indirect array that maps from original-order to
> value-id-order.  There are two ways to continue:
>
>  1. Reverse the process to get an indirect array that maps from
>     value-id-order to original-order:
>
>         sort(0..7) by [3, 6, 1, 2, 5, 4, 0, 7]
>             => [6, 2, 3, 0, 5, 4, 1, 7]
>
>     The writer stores this.
>
>     On the reader side, index the use list by corresponding entries in
>     the indirect array:
>
>         index([id-2, id-5, id-7, id-9, id-12, id-18, id-20, id-24])
>             by [6, 2, 3, 0, 5, 4, 1, 7]
>             => [id-20, id-7, id-9, id-2, id-18, id-12, id-5, id-24])
>
>     Indexing would be linear time, but requires auxiliary storage since
>     use-lists are stored in linked lists and would need to be copied
>     into an array.
>
>  2. Alternatively, the writer could store the reverse-indirect array
>     directly.
>
>     The reader can then sort the use list by corresponding entries in
>     the reverse-indirect array:
>
>         sort([id-2, id-5, id-7, id-9, id-12, id-18, id-20, id-24])
>             by [3, 6, 1, 2, 5, 4, 0, 7]
>             => [id-20, id-7, id-9, id-2, id-18, id-12, id-5, id-24])
>
>     Compared to (1), this is less work at runtime overall, but more work
>     on the reader side.
>
> In either case, let's call this array of indexes a "shuffle".
>
> Storing the shuffle
> ===================
>
> There are a few ways to store shuffles.
>
>  1. Store the indirect array.
>
>  2. Store the indirect array, special-cased to use fewer bits when
>     storing smaller numbers.
>
>     E.g., [6, 2, 3, 0, 5, 4, 1, 7] only needs 3 bits per index, so it
>     could be stored in 24-bits.
>
>     I imagine the following calculation for bits-per-index, given an
>     array size `N`:
>
>         if (N <= 1u << 3)  // Fits in a single uint32_t.
>           return 3;
>         if (N <= 1u << 4)  // Fits in a single uint64_t.
>           return 4;
>         if (N <= 1u << 8)  // String of uint8_t.
>           return 8;
>         if (N <= 1u << 16) // String of uint16_t.
>           return 16;
>         return 32;         // String of uint32_t.
>
>  3. Straw man: store a Lehmer code, which uses a successively smaller
>     radix for each index.
>
>     E.g., the indirect array [6, 2, 3, 0, 5, 4, 1, 7] converts to the
>     Lehmer code of [6, 2, 2, 0, 2, 1, 0, 0].
>
> (1) is simplest.  I think (2) is strictly better.  At very little extra
> complexity, we have up to 8 arguments fitting in 32-bits, and 16
> arguments in 64-bits.
>
> The internet says (3) is how to talk about permutations, but I don't see
> practical advantages over (2) for use-list-orders -- I think Lehmer
> codes are more about *generating* permutations.  For small use lists,
> there's only a tiny difference in compression.  For large use lists, the
> quadratic algorithm to convert back to an indirect array dominates.
>
> Where in the bitcode to write the use-lists
> ===========================================
>
> The old plan (unfinished in tree) involves attaching an additional "use
> list" section at the end of the bitcode file, and iterate through the
> functions within that.
>
> This is pretty inefficient during writeout.  Every function needs to be
> re-incorporated to remap its value IDs, and its instructions need to
> remapped to solve the `void` problem.
>
> Worse, the value ID mapping that's built up naturally in the reader
> cannot be leveraged.
>
> It will be easier, more efficient, and less brittle to add
> function-local use-list-orders as a section in each function block.
>
> Plan
> ====
>
>  1. Disable the size optimization that causes constant forward
>     references when storing use-list-orders.  Re-enabling it partially
>     (or entirely) can be a later incremental step.
>
>  2. Save use-list-orders in each function block.  Constants'
>     use-list-orders are saved in the last function block that they're
>     used in (i.e., the first function block where all their users have
>     been serialized).  Global constants whose users are all global are
>     stored in the module block.
>
>  3. Save the use-list-orders as indirect arrays, bit-packing when
>     possible.  (Only save them when the order needs to be shuffled.)
>
>  4. Save the shuffle array that requires less overall processing.
>
> Unresolved questions
> ====================
>
>  1. How important is the size optimization?  How quickly should a
>     solution be found that re-enables it?  How hard is it to enable it
>     partially (in order to avoid creating forward references)?
>
>  2. What about assembly IR?
>
>  3. What about metadata?
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140723/38be3a77/attachment.html>