[PATCH] D78796: [Support] Refactor LEB128 encoding into an input iterator

Nicolas Guillemot via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri May 8 11:48:20 PDT 2020


nlguillemot marked 6 inline comments as done.
nlguillemot added a comment.

Thanks for the comments. Added some replies.



================
Comment at: llvm/include/llvm/Support/LEB128.h:35-36
+
+  /// Whether there will be more output after the previously outputted byte.
   bool More;
+
----------------
dsanders wrote:
> This is misleading as the iterator can produce more bytes after this becomes false. I think it's also unnecessary as each time encodeNextByte() shifts Value right by 7 it's bringing in the correct padding bits at the top. We could just keep reading from Value for the padding bytes.
> his is misleading as the iterator can produce more bytes after this becomes false. 

For context, I named it that way because that's how it was named in the original implementation of the code. I agree it could be better.




================
Comment at: llvm/include/llvm/Support/LEB128.h:38-42
+  /// The output will be sext-ed/zext-ed to this number of bytes if necessary.
+  unsigned PadTo;
+
+  /// The current number of outputted bytes.
+  unsigned Count;
----------------
dsanders wrote:
> Would it make sense to have a RemainingBytes that counts down to zero? Is there a reason to keep PadTo and Count separate?
For context, the `Count` and `PadTo` variables are there to match the original implementation of the code. Maybe could use a refactoring.


================
Comment at: llvm/include/llvm/Support/LEB128.h:50
+    CurrByte = Value & 0x7f;
+    if (IsSigned) {
+      int64_t SValue = static_cast<int64_t>(Value);
----------------
mkitzan wrote:
> dsanders wrote:
> > Is testing this on each byte measurably slower? It seems unlikely but this is called a lot and I notice that the previous code didn't do it. If it does, it would be good if we can have the template instantiation pick one side or the other
> `IsSigned` could easily be changed to a template parameter. Could then turn this is into `if constexpr (IsSigned)` (if you can use C++17), or just SFINAE it. Better yet, use [[ https://en.cppreference.com/w/cpp/types/is_signed | `is_signed` type trait ]] on `ValueT`.
Last I heard, we have to support C++14, so no `if constexpr` fanciness allowed. :(

Using `is_signed` would work for types like `uint64_t` and `int64_t`, but not `APInt`, so we somehow need to figure out how to handle that case.

Seems like there's general agreement that it might as well be a template parameter though.


================
Comment at: llvm/include/llvm/Support/LEB128.h:80
+  /// Constructs the end-of-input iterator.
+  LEB128InputIterator() : IsEnd(true) {}
+
----------------
mkitzan wrote:
> This constructor leaves other data members uninitialized which may be fine for an end iterator, but is a code smell [[ http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Rc-complete | C.41 ]].
I don't think this breaks that core guideline, since the invariants are set and the object is totally usable, though there should probably be an assert for `!IsEnd` at the start of `operator++` to enforce the API requirements.

On the other hand I agree it might be good to initialize the other members anyways, just to be on the safe side, and to avoid having a copy constructor that reads uninitialized memory. The alternative is to hand-write the copy constructor to avoid copying the other fields if `IsEnd` is set, which sounds ugly and less maintainable.


================
Comment at: llvm/include/llvm/Support/LEB128.h:132
+      // Add a continuation bit to signal that there is more padding after this.
+      if (Count < PadTo - 1)
+        CurrByte |= 0x80;
----------------
mkitzan wrote:
> If `PadTo` is 0, may have unintentional underflow since it is `unsigned` (this may be you intention though, and if so it's worth a comment).
> 
> Seeing it a lot in the other code. Maybe underflow is part of the plan here?
If `PadTo` is 0 then it's impossible for the check above of `if (Count < PadTo)` to pass, so we would never hit this line of code. This code should be safe at least. If there are other cases we should double-check them as well.


================
Comment at: llvm/include/llvm/Support/LEB128.h:155-157
+    // Both are end-of-input iterators, so they compare the same.
+    if (IsEnd && Other.IsEnd)
+      return true;
----------------
dsanders wrote:
> I see no harm in allowing two past-the-end iterators from different sequences to be equal but I wonder if it's necessary/useful. With the ULEBifier object it could give you an end iterator and then this operator would be a plain equality comparison.
The same design issue also happens with existing std input iterators like `std::istream_iterator`, so I think it's acceptable even if slightly odd.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78796/new/

https://reviews.llvm.org/D78796





More information about the llvm-commits mailing list