[llvm-dev] RFC: Refactor SubclassData

Bruno Ricci via llvm-dev llvm-dev at lists.llvm.org
Mon Dec 30 05:23:01 PST 2019


Do you have some code we can look at (even if it is in a nasty unpolished state, just mark it WIP
and put it on Phab)? It is hard to evaluate an alternative without the code. That said, I think
that the table is a little bit one-sided. I have added some inline comments.

On 30/12/2019 11:53, Ehud Katz wrote:
> The solution in Clang is still very complicated and error prone. A lot of synchronization work (between different classes - at least in the same hierarchy) needs to be done manually.
> I'll summarize the capabilities of the 3 approaches in the following (kinda) table, using the following columns *[ Current LLVM | Clang | New RFC ]* :
>   * *[_|X|X] static assert *- that the declared accumulated bitfields do not exceed the underlying subclass data size (note that in the New implementation it is automatically added on declaration)
>       o *[_|_|X] automatic static assert *- whether the static assert needs to be added manually or is added automatically with the declaration of the new bitfield.

This is not actually true. There is only a single static_assert for the size of
the union in Stmt [1] and the same could be done for the union in DeclBase.
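
To illustrate what a single size check looks like, here is a minimal sketch. It is
not the actual Clang code: Node, BaseBitfields and DerivedBitfields are hypothetical
names, and the 8-byte limit is only an assumed budget for this example.

```cpp
// Hypothetical bit-field classes; each one describes the same storage.
struct BaseBitfields {
  unsigned Kind : 8;
};

struct DerivedBitfields {
  unsigned : 8;        // skip the bits that belong to BaseBitfields
  unsigned IsFoo : 1;
  unsigned Count : 20;
};

class Node {
public:
  explicit Node(unsigned Kind) { BaseBits.Kind = Kind; }
  unsigned getKind() const { return BaseBits.Kind; }

protected:
  // All bit-field classes of the hierarchy share this storage.
  union {
    BaseBitfields BaseBits;
    DerivedBitfields DerivedBits;
  };
};

// One check covers every member of the union: if any bit-field class
// grows past the (assumed) budget, the build fails here.
static_assert(sizeof(Node) <= 8, "changing bit-fields changed sizeof(Node)");
```

Adding a new bit-field class to the union does not require a new static_assert,
since the existing one already bounds the size of the whole union.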

>   * *[_|_|X] runtime assert* - that a new value set, fits into the bitfield (without truncation).

This is true, I agree here that this is useful.
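
For reference, such a check can be sketched as follows. This is only an illustration
of the idea and not code from the RFC: fitsInBits, Flags, CountWidth and setCount are
hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if Value is representable in Width bits.
constexpr bool fitsInBits(std::uint64_t Value, unsigned Width) {
  return Width >= 64 || (Value >> Width) == 0;
}

struct Flags {
  static constexpr unsigned CountWidth = 5;
  unsigned Count : CountWidth;

  // The setter asserts before storing, so a value that would be silently
  // truncated by the bit-field is caught in builds with assertions enabled.
  void setCount(unsigned NewCount) {
    assert(fitsInBits(NewCount, CountWidth) && "value truncated by bit-field");
    Count = NewCount;
  }
};
```

With a 5-bit field, setCount(31) is accepted while setCount(32) would trip the assert
instead of storing 0.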

>   * *[_|_|X] typed* - as opposed to using a representative type (like `int`) and then cast to the actual required type (like `bool` or `enum`). Typed (ordinary) bitfields cannot be implemented correctly in MSVC, as the types of all the bitfields must be of the same type. Using typed bitfields also saves us the need to synchronize the use of `unsigned/signed int` with the actual type needed.
>   * *[X|_|X] declare in actual class* - as opposed to one of the base classes.

True, but I am curious to see how you can avoid doing this in the base. The whole point
of doing this bit-field thing is to save space by reusing the padding after the base
subobject. Maybe with an aligned char array reinterpret_cast'ed by the derived classes
as mentioned before? The trick is that we don't want to repeat the members from the
base classes. Also we have to avoid UB.  

>   * *[_|_|X] declare (a bitfield) in a single line* - as opposed to the need to declare helpers of some kind, like `enum` (manually)
>   * *[_|_|X] clean bitfields* - without exposing a bit manipulation `enum`.

The enum is only used by the other bit-field classes. This is invisible to the class itself.
No helpers are needed: just write SomeClassBits.SomeBit to refer to SomeBit in the
bit-field of SomeClass.
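
A rough sketch of the pattern I am describing, loosely modelled on Stmt.h but with
hypothetical names (StmtLike, ExprLike, IsDependent) and far fewer fields. Note that
it assumes the compiler places the unnamed padding bit-field exactly over the base
bits; I am not claiming the standard guarantees that.

```cpp
class StmtLike {
public:
  unsigned getKind() const { return StmtLikeBits.Kind; }

protected:
  explicit StmtLike(unsigned Kind) { StmtLikeBits.Kind = Kind; }

  struct StmtLikeBitfields {
    unsigned Kind : 8;
  };
  // Only the other bit-field classes use this enum.
  enum { NumStmtLikeBits = 8 };

  struct ExprLikeBitfields {
    unsigned : NumStmtLikeBits;  // skip the bits owned by StmtLike
    unsigned IsDependent : 1;
  };
  // Must be kept in sync by hand when a field is added above.
  enum { NumExprLikeBits = NumStmtLikeBits + 1 };

  union {
    StmtLikeBitfields StmtLikeBits;
    ExprLikeBitfields ExprLikeBits;
  };
};

class ExprLike : public StmtLike {
public:
  ExprLike(unsigned Kind, bool Dependent) : StmtLike(Kind) {
    // The derived class never touches the enum, only the named bit.
    ExprLikeBits.IsDependent = Dependent;
  }
  bool isDependent() const { return ExprLikeBits.IsDependent; }
};
```

The derived class only writes ExprLikeBits.IsDependent; the offset bookkeeping stays
next to the bit-field classes in the base.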

>   * *[_|_|X] automatic inheritance of unused bits* - no need to get offset from super (manually).
>   * *[_|_|X] automatic calculation of unused bits* - changing a single bitfield doesn't require any other change, but the actual bitfield itself (as opposed to changing also the sum of the bit count used by the class, in an `enum` - for example).
>   * *[_|_|X] implicit reference to superclass* - as opposed to the need to use the base class' info explicitly.
>   * *[_|_|X] no need to know anything about any of the base classes*.

I agree that forgetting to update the offset in the enum can be a source of error (but
the enum is just a few lines below so it is hard to miss). Some things to keep in mind:

- Some bit-fields are aligned to a byte boundary for (benchmarked) faster access [2]
- Some bit-field classes have a hole which is used by derived classes [3]
- Some bit-fields do not have a fixed size, just something "large enough".
  (this is not ideal and it would be better to have a well-defined limit
   which could then be used to trigger an error instead of just overflowing,
   that's on my TODO list...) [4]

> I think the table speaks for itself.
> Craig, regarding the `getSubclassDataFromInstruction()`, it still does not tip the table above in favor of the current implementation.


[1] https://github.com/llvm/llvm-project/blob/master/clang/include/clang/AST/Stmt.h#L1088
[2] https://github.com/llvm/llvm-project/blob/master/clang/include/clang/AST/Stmt.h#L471
[3] https://github.com/llvm/llvm-project/blob/master/clang/include/clang/AST/Stmt.h#L849
[4] https://github.com/llvm/llvm-project/blob/master/clang/include/clang/AST/Stmt.h#L804

Bruno Ricci
