[PATCH] D78796: [Support] Refactor LEB128 encoding into an input iterator

Tue Apr 28 12:23:36 PDT 2020

nlguillemot added a comment.

In D78796#2007033 <https://reviews.llvm.org/D78796#2007033>, @dblaikie wrote:

> I'm not sure this is worth a full iterator abstraction - would the uses be that much the worse for it if they had a non-iterator type they had to iterate manually & pull values from? A more simplified iterator abstraction, essentially:
>
>   ULEBifier U(Value, IsSigned, PadTo);
>   while (Optional<char> C = U.next())
>     OS.write(*C);
>
>
> Or something like that.

I understand that the std iterator design is not the simplest. It needs a bunch of boilerplate, and the STL iterator APIs have their own quirks that might increase the mental burden for users. With these cons in mind, let me try to justify why I think this design is the right direction.

The main advantage of using the iterator interface is to reuse code in `std::`/`llvm::`. For example, in `FixedLenDecoderEmitter.cpp`, there's the following code:

  // Encode and emit the value to filter against.
  uint8_t Buffer[16];
  unsigned Len = encodeULEB128(Filter.first, Buffer);
  Table.insert(Table.end(), Buffer, Buffer + Len);

With the iterator style interface, this turns into a "one-liner" (ok it's really three lines with formatting but hey...) :

  // Encode and emit the value to filter against.
  std::copy(
    LEB128InputIterator<unsigned>(Filter.first, /* IsSigned */ false, /* PadTo */ 0),
    LEBInputIterator<unsigned>(), std::back_inserter(Table));

At this point, we could simplify it further by make further syntactical improvements:

- Make a utility function like `makeULEB128InputIterator()` to deduce the `ValueT` template argument, to avoid explicitly passing `IsSigned`, and to have `PadTo = 0` as a default value.
- Make a utility function like `makeULEB128InputRange()` that uses `llvm::make_range()` to automatically package the end iterator with the begin iterator.

If we had these further improvements, we could then use `llvm::copy` from STLExtras to get the following code, which is finally truly a "one-liner":

  // Encode and emit the value to filter against.
  copy(makeULEB128InputRange(Filter.first), std::back_inserter(Table));

This shows that we can use this API to simplify existing code, and we can also reuse existing code in `std::` and `llvm::` to simplify it further. The standard iterator interface is what enables this.

The advantage of being able to reuse code in `std::`/`llvm::` is also demonstrated in this patch itself: The existing implementations were reduced to "one-liners" of `std::copy` that only differ by the output iterator.

----

Another point: If somebody wants to write a loop in the style that you showed with `ULEBifier`, that can also be done with this interface (though the example below could be improved by using prefix increment):

  LEB128InputIterator<unsigned> U(Value, IsSigned, PadTo);
  while (U != LEBInputIterator<unsigned>())
    OS.write(*U++);

If we can express the same code with both interface designs, then I think we might as well use the more general design that allows us to build on top of the existing `std::`/`llvm::` algorithms. I think it would be particularly unfortunate if we started with something like `ULEBifier`, then later we added an iterator style wrapper for it anyways, since that would create code duplication and redundant APIs.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78796/new/

https://reviews.llvm.org/D78796