[PATCH] D84306: [clang-format][NFC] Be more careful about the layout of FormatToken.

Bruno Ricci via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Jul 23 08:03:56 PDT 2020


riccibruno marked 3 inline comments as done.
riccibruno added inline comments.


================
Comment at: clang/lib/Format/ContinuationIndenter.cpp:652
+      (Current.isNot(TT_LineComment) ||
+       Previous.getBlockKind() == BK_BracedInit)) {
     State.Stack.back().Indent = State.Column + Spaces;
----------------
MyDeveloperDay wrote:
> I think this is better in that its now easier perhaps to see when the block kind gets checked:
> 
> I wonder if it would read even better as if we added `is(BraceBlockKind)`   `isNot(BraceBlockKind)` 
> 
> e.g.
> 
> `Previous.is(BK_BraceInit)`
Yes I think I agree with you; adding `is` and `isNot` would be an improvement. I will still leave `getBlockKind` if someone wants to switch on it.


================
Comment at: clang/lib/Format/FormatToken.h:151
+        BlockKind(BK_Unknown), Type(TT_Unknown), Decision(FD_Unformatted),
+        PackingKind(PPK_Inconclusive) {}
 
----------------
MyDeveloperDay wrote:
> I much prefer putting the initialization here, I think it makes it MUCH clearer
It is necessary anyway since a bit-field cannot have a default member initializer pre-C++20.


================
Comment at: clang/lib/Format/FormatToken.h:177
   /// Indicates that this is the first token of the file.
-  bool IsFirst = false;
+  unsigned IsFirst : 1;
 
----------------
MyDeveloperDay wrote:
> educate me, why
> 
> ```
> unsigned IsFirst : 1;
> ```
> 
> here and not
> 
> ```
> bool IsFirst : 1;
> ```
> 
> is that equivalent? (I'm literally not sure myself), I wrote a little test just to remind myself how this stuff works.
> 
> ```
> #include <iostream>
> 
> class Foo
> {
> public:
>     Foo()
>       : A(true)
>       , B(false)
>       , C(true)
>     {
>     }
>     bool A : 1;
>     bool B : 1;
>     bool C : 1;
> };
> 
> class Bar
> {
> public:
>     Bar()
>       : A(true)
>       , B(false)
>       , C(true)
>     {
>     }
>     unsigned A : 1;
>     unsigned B : 1;
>     unsigned C : 1;
> };
> 
> class Fuz
> {
> public:
>     Fuz()
>       : A(true)
>       , B(false)
>       , C(true)
>     {
>     }
>     bool A;
>     bool B;
>     bool C;
> };
> 
> class Baz
> {
> public:
>     Baz()
>       : A(true)
>       , B(false)
>       , C(true)
>     {
>     }
>     unsigned A;
>     unsigned B;
>     unsigned C;
> };
> 
> int
> main(int argc, char *argv[])
> {
>     std::cerr << "Foo " << sizeof(Foo) << "\n";
>     std::cerr << "Bar " <<sizeof(Bar) << "\n";
>     std::cerr << "Fuz " <<sizeof(Fuz) << "\n";
>     std::cerr << "Baz " <<sizeof(Baz) << "\n";
> 
>     return 0;
> }
> ```
> 
> When run gives the following:
> 
> ```
> Foo 1
> Bar 4
> Fuz 3
> Baz 12
> ```
> 
> So I guess my question is could there be more space savings if using bool IsFirst:1  (and the others), I'd also think that would help clarity a little (or did I misunderstand?)
> 
> 
It has to do with the ABI since as per [class.bit]p1:

`[...] Allocation of bit-fields within a class object is implementation-defined. Alignment of bit fields is implementation-defined. Bit-fields are packed into some addressable allocation unit. [Note: Bit-fields straddle allocation units on some machines and not on others. Bit-fields are assigned right-to-left on some machines, left-to-right on others. — end note ]`

Now the two relevant ABIs are the Itanium ABI (https://github.com/itanium-cxx-abi/cxx-abi/blob/master/abi-layout.html for the details) and the MS ABI (say on x86-64). Happily both are supported by clang so we can use `-fdump-record-layouts` and compare them.

Consider `S0` in https://godbolt.org/z/orYv5j (itanium on the left/MS on the right):

Both ABIs will not let a bit-field cross a boundary. Therefore `c0` and `c1` will use 10 bits. If the type had been `short` instead of `char` then only 9 bits would have been used. The size of `S0` would still have been 2 in both cases.

Consider now `S1`. The MS ABI, unlike the Itanium ABI, will not put bit-fields together if their types have a different size. Therefore `sizeof(S1)` is 2 under the Itanium ABI and `4` under the MS ABI.

Using an `unsigned` systematically avoids having a boundary every 8 bits, and avoids the issue with the MS ABI.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D84306/new/

https://reviews.llvm.org/D84306





More information about the cfe-commits mailing list