[PATCH] D78796: [Support] Refactor LEB128 encoding into an input iterator

Daniel Sanders via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu May 7 11:55:18 PDT 2020


dsanders added a comment.

I'm inclined to agree that the patch series as-is doesn't really warrant the iterators as the interface as no callers have been updated. However, I also don't see much that's iterator specific (ULEBifier would be roughly similar code leaving the iterator portion as trivial wrappers on the ULEBifier) and there are a few places (particularly in tablegen) that are emitting LEB's into containers where the loop to add bytes one by one is just noise and something like `std::copy(to_uleb(...), std::back_inserter(Table));` would be somewhat nice. The loop could easily be hidden in something like `append_uleb(Table, ...)` though I don't think there's a strong argument for (or against) iterators.

What I would suggest is separating out the byte-sequence generation into a ULEBifier as David suggested but still keeping the iterator object as a thin adapter (effectively implementing that while loop) to support inter-operation with STL functions like std::copy.



================
Comment at: llvm/include/llvm/Support/LEB128.h:26-27
+template <class ValueT> class LEB128InputIterator {
+  /// Denotes whether this object represents an end() iterator.
+  bool IsEnd;
+
----------------
We could fold this into Optional<uint8_t> CurrByte


================
Comment at: llvm/include/llvm/Support/LEB128.h:32-33
+
+  /// Whether this iterator is outputting a signed or unsigned LEB128 data.
+  bool IsSigned;
+
----------------
It may be worth mentioning that this is needed for types that don't carry signedness like APInt. unsigned/int/etc. wouldn't need it


================
Comment at: llvm/include/llvm/Support/LEB128.h:35-36
+
+  /// Whether there will be more output after the previously outputted byte.
   bool More;
+
----------------
This is misleading as the iterator can produce more bytes after this becomes false. I think it's also unnecessary as each time encodeNextByte() shifts Value right by 7 it's bringing in the correct padding bits at the top. We could just keep reading from Value for the padding bytes.


================
Comment at: llvm/include/llvm/Support/LEB128.h:38-42
+  /// The output will be sext-ed/zext-ed to this number of bytes if necessary.
+  unsigned PadTo;
+
+  /// The current number of outputted bytes.
+  unsigned Count;
----------------
Would it make sense to have a RemainingBytes that counts down to zero? Is there a reason to keep PadTo and Count separate?


================
Comment at: llvm/include/llvm/Support/LEB128.h:50
+    CurrByte = Value & 0x7f;
+    if (IsSigned) {
+      int64_t SValue = static_cast<int64_t>(Value);
----------------
Is testing this on each byte measurably slower? It seems unlikely but this is called a lot and I notice that the previous code didn't do it. If it does, it would be good if we can have the template instantiation pick one side or the other


================
Comment at: llvm/include/llvm/Support/LEB128.h:121-136
+    // Pad with 0s or 1s depending on whether we want to zext or sext.
+    uint8_t PadValue;
+    if (IsSigned && static_cast<int64_t>(Value) < 0)
+      PadValue = 0x7f;
+    else
+      PadValue = 0x00;
+
----------------
I think this belongs inside encodeNextByte(). This operator should essentially be something like:
  assert(CurrByte.hasValue() && "operator++() called on past-the-end LEB128InputIterator");
  CurrByte = encodeNextByte();
  return this;
where CurrByte is Optional<uint8_t>. As noted in another comment, I think special-casing the padding bytes isn't really needed.


================
Comment at: llvm/include/llvm/Support/LEB128.h:155-157
+    // Both are end-of-input iterators, so they compare the same.
+    if (IsEnd && Other.IsEnd)
+      return true;
----------------
I see no harm in allowing two past-the-end iterators from different sequences to be equal but I wonder if it's necessary/useful. With the ULEBifier object it could give you an end iterator and then this operator would be a plain equality comparison.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78796/new/

https://reviews.llvm.org/D78796





More information about the llvm-commits mailing list