[PATCH] D85191: [AST] Get field size in chars rather than bits in RecordLayoutBuilder.

Bevin Hansson via Phabricator via cfe-commits <cfe-commits at lists.llvm.org>
Thu Aug 6 06:00:48 PDT 2020


ebevhan added a comment.

In D85191#2197550 <https://reviews.llvm.org/D85191#2197550>, @bjope wrote:

> In D85191#2196863 <https://reviews.llvm.org/D85191#2196863>, @rsmith wrote:
>
>> In D85191#2195923 <https://reviews.llvm.org/D85191#2195923>, @ebevhan wrote:
>>
>>> In D85191#2193645 <https://reviews.llvm.org/D85191#2193645>, @rsmith wrote:
>>>
>>>>> This is not ideal, since it makes the calculations char size dependent, and breaks for sizes that are not a multiple of the char size.
>>>>
>>>> How can we have a non-bitfield member whose size is not a multiple of the size of a char?
>>>
>>> Downstream, we have fixed-point types that are 24 bits large, but where the char size is 16. The type then takes up 2 chars, where 8 of the bits are padding. The only way in Clang to express that the width of the bit representation of a type should be smaller than the number of chars it takes up in memory -- and consequently, produce an `i24` in IR -- is to return a non-charsize multiple from getTypeInfo.
>>
>> This violates the C and C++ language rules, which require the size of every type to be a multiple of the size of char. A type with 24 value bits and 8 padding bits should report a type size of 32 bits, just like `bool` reports a size of `CHAR_BIT` bits despite having only 1 value bit, and x86_64 `long double` reports a type size of 128 bits despite having only 80 value bits.
>
> I don't see that it breaks the language rules. The sizeof result for the 24 bit type should be 2 in the target described by @ebevhan  (two 16-bit bytes). But I imagine that without this patch it is reported as 24/16=1, right?

Yes, that is what's happening. The `sizeof` should be reported as 2 (32 bits), but isn't, because `toCharUnitsFromBits` always rounds down.

> So isn't the problem that toCharUnitsFromBits is rounding down when given a bitsize that isn't a multiple of CHAR_BIT? Would it perhaps make sense to let it round up instead?

The issue with `toCharUnitsFromBits` is that it's an inherently dangerous API: there are cases where you want to round down and cases where you want to round up, and the function cannot know which one the caller intends.

It might be better if `toCharUnitsFromBits` took an extra parameter that explicitly specifies the rounding mode; if that parameter is left at its default (unspecified rounding) and the bit count passed is not a multiple of the char size, it would assert. That would make a lot of tests fail until all of the call sites are corrected, though.

In D85191#2196863 <https://reviews.llvm.org/D85191#2196863>, @rsmith wrote:

> In D85191#2195923 <https://reviews.llvm.org/D85191#2195923>, @ebevhan wrote:
>
>> In D85191#2193645 <https://reviews.llvm.org/D85191#2193645>, @rsmith wrote:
>>
>>>> This is not ideal, since it makes the calculations char size dependent, and breaks for sizes that are not a multiple of the char size.
>>>
>>> How can we have a non-bitfield member whose size is not a multiple of the size of a char?
>>
>> Downstream, we have fixed-point types that are 24 bits large, but where the char size is 16. The type then takes up 2 chars, where 8 of the bits are padding. The only way in Clang to express that the width of the bit representation of a type should be smaller than the number of chars it takes up in memory -- and consequently, produce an `i24` in IR -- is to return a non-charsize multiple from getTypeInfo.
>
> This violates the C and C++ language rules, which require the size of every type to be a multiple of the size of char. A type with 24 value bits and 8 padding bits should report a type size of 32 bits, just like `bool` reports a size of `CHAR_BIT` bits despite having only 1 value bit, and x86_64 `long double` reports a type size of 128 bits despite having only 80 value bits.

But this is the crux of the matter: if you aren't allowed to return sizes that are not a multiple of the char size from `getTypeInfo`, then there's no way to specify via `TargetInfo`/`getTypeInfo` that the number of value bits of a type is less than its size in chars. That is, that a type is padded. And as your examples show, C does not disallow that.

In D85191#2197663 <https://reviews.llvm.org/D85191#2197663>, @efriedma wrote:

>> If the intent is for getTypeInfo to always return sizes that are multiples of the char size, then the design should be inverted and getTypeInfo should simply be calling getTypeInfoInChars and multiply that result by the char size. But that isn't how it works.
>
> The reason it doesn't work this way is just that someone made the wrong choice a decade ago, and nobody has spent the time to rewrite it since.  Patch welcome.

This does sound like a good thing to do, but it would be problematic downstream since it would completely prohibit the design that we're trying to use.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D85191/new/

https://reviews.llvm.org/D85191
