[PATCH] D49410: [PDB] Parse UDT symbols and pointers to members

Aleksandr Urakov via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Jul 25 01:55:35 PDT 2018


aleksandr.urakov added inline comments.


================
Comment at: lit/SymbolFile/PDB/Inputs/ClassLayoutTest.cpp:37
+  };
+  union {  // Test unnamed union. MSVC treats it as `int a; float b;`
+    int a;
----------------
aleksandr.urakov wrote:
> aleksandr.urakov wrote:
> > Hui wrote:
> > > aleksandr.urakov wrote:
> > > > There is a problem here. `MicrosoftRecordLayoutBuilder` asserts on every field or base offset, but in our case the fields `a` and `b` are treated as fields of `struct Complex`, not of the `union`, so a debug build of lldb crashes on this. I can't find enough info in the PDB to restore the unnamed union here. Do you have any ideas about it?
> > > 
> > > With a PDB produced by MSVC `cl`, you have full information to restore the unnamed UDT.
> > > 
> > > In my experience, a PDB produced by clang-cl (/Z7 or /Zi) is slightly different from one produced by cl.
> > > 
> > > Both contain information about forwarded unnamed UDT.
> > > However, the PDB produced by clang-cl more or less lacks the member information. See below.
> > > 
> > > The CodeView info is good. Maybe you need to look at LLC?
> > > 
> > > CodeView 
> > >  
> > > ```
> > > FieldList (0x1044) {
> > >     TypeLeafKind: LF_FIELDLIST (0x1203)
> > >     DataMember {
> > >       TypeLeafKind: LF_MEMBER (0x150D)
> > >       AccessSpecifier: Public (0x3)
> > >       Type: int (0x74)
> > >       FieldOffset: 0x0
> > >       Name: a
> > >     }
> > >     DataMember {
> > >       TypeLeafKind: LF_MEMBER (0x150D)
> > >       AccessSpecifier: Public (0x3)
> > >       Type: float (0x40)
> > >       FieldOffset: 0x0
> > >       Name: b
> > >     }
> > >   }
> > > 
> > > Union (0x1045) {
> > >     TypeLeafKind: LF_UNION (0x1506)
> > >     MemberCount: 2
> > >     Properties [ (0x608)
> > >       HasUniqueName (0x200)
> > >       Nested (0x8)
> > >       Sealed (0x400)
> > >     ]
> > >     FieldList: <field list> (0x1044)
> > >     SizeOf: 4
> > >     Name: Complex::<unnamed-tag>
> > >     LinkageName: .?AT<unnamed-type-$S2>@Complex@@
> > >   }
> > > 
> > > ```
> > > 
> > > llvm-pdbutil output for the PDB (clang-cl /Z7)
> > > 
> > > (The unnamed symbols are found, but their size is 0, so they will just be ignored; see PDBASTParser.cpp #259.
> > > The size should not be zero.)
> > > 
> > > 
> > > ```
> > >     struct Complex::<unnamed-tag> [sizeof = 0] {}
> > > 
> > >     union Complex::<unnamed-tag> [sizeof = 0] {}
> > > 
> > >  struct Complex [sizeof = 728] {
> > >       data +0x00 [sizeof=720] _List* array[90]
> > >       data +0x2d0 [sizeof=4] int x
> > >       data +0x2d4 [sizeof=4] int a
> > >       data +0x2d4 [sizeof=4] float b
> > >     }
> > > 
> > > ```
> > > 
> > > llvm-pdbutil output for the PDB (cl /Z7)
> > > 
> > > (You have full information to restore the unnamed UDTs.)
> > > ```
> > > 
> > > struct Complex [sizeof = 728] {
> > >       data +0x00 [sizeof=720] _List* array[90]
> > >       data +0x2d0 [sizeof=4] int x
> > >       data +0x2d4 [sizeof=4] int a
> > >       data +0x2d4 [sizeof=4] float b
> > >     }
> > > 
> > >     Total padding 3 bytes (25% of class size)
> > >     Immediate padding 3 bytes (25% of class size)
> > > 
> > >     struct Complex::<unnamed-tag> [sizeof = 4] {
> > >       data +0x00 [sizeof=4] int x
> > >     }
> > > 
> > >     union Complex::<unnamed-tag> [sizeof = 4] {
> > >       data +0x00 [sizeof=4] int a
> > >       data +0x00 [sizeof=4] float b
> > >     }
> > > ```
> > > 
> > > 
> > Thank you! But what does `LLC` mean?
> I have figured that out, sorry. I usually use disassembly tools for this purpose.
I have just dumped two PDBs: one was produced with `cl` and `link`, and the other with `clang-cl` and `lld-link`, both with the same options (`/Zi /GS- /c` for compilation, `/nodefaultlib /debug:full /entry:main` for linking). I compiled the following source:

```
struct S {
  struct {
    char a;
    short b;
  };
  short c;
  union {
    short d;
    int e;
  };
};

int main() {
  S ss[sizeof(S)];
  return 0;
}
```

and retrieved identical type info from both PDBs:

```
struct S::<unnamed-tag> [sizeof = 0] {}

union S::<unnamed-tag> [sizeof = 0] {}

struct S [sizeof = 12] {
  data +0x00 [sizeof=1] char a
  <padding> (1 bytes)
  data +0x02 [sizeof=2] short b
  data +0x04 [sizeof=2] short c
  <padding> (2 bytes)
  data +0x08 [sizeof=2] short d
  data +0x08 [sizeof=4] int e
}
Total padding 3 bytes (25% of class size)
Immediate padding 3 bytes (25% of class size)

struct S::<unnamed-tag> [sizeof = 4] {
  data +0x00 [sizeof=1] char a
  <padding> (1 bytes)
  data +0x02 [sizeof=2] short b
}
Total padding 1 bytes (25% of class size)
Immediate padding 1 bytes (25% of class size)

union S::<unnamed-tag> [sizeof = 4] {
  data +0x00 [sizeof=2] short d
  data +0x00 [sizeof=4] int e
}
```

So it seems that both `cl` and `clang-cl` emit enough info to restore the layout of unnamed unions and structs. But we also need to:

- Find the location of the unnamed fields of these types in the outer structure (`S` in our case);
- Somehow drop the fields `a`, `b`, `d` and `e` from the outer structure (because we will place the unnamed fields of the unnamed types there instead).

And I can't find enough info for that. What do you think, is it possible?
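
A possible heuristic for the two steps above, as a rough sketch only: since the flattened members keep their names and types, one could look for a single base offset at which every member of an unnamed type also appears in the outer structure, and then drop those members. Everything in the snippet is hypothetical: `Field`, `SameMember`, `FindNestedOffset` and `DropNestedFields` are illustrative names, not LLDB or LLVM API, and the model only mirrors what the dump above prints.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified model of one `data` line from the dump above.
struct Field {
  std::string name;
  std::string type;
  uint64_t offset;
  uint64_t size;
};

// True if `outer_field` is `nested_field` shifted by `base` bytes.
static bool SameMember(const Field &outer_field, const Field &nested_field,
                       uint64_t base) {
  return outer_field.name == nested_field.name &&
         outer_field.type == nested_field.type &&
         outer_field.offset == base + nested_field.offset;
}

// Looks for a base offset in `outer` at which all members of `nested` appear.
// Returns false if no such offset exists; `base` is left unchanged then.
bool FindNestedOffset(const std::vector<Field> &outer,
                      const std::vector<Field> &nested, uint64_t &base) {
  if (nested.empty())
    return false;
  const Field &first = nested.front();
  for (const Field &candidate : outer) {
    if (candidate.name != first.name || candidate.type != first.type ||
        candidate.offset < first.offset)
      continue;
    const uint64_t guess = candidate.offset - first.offset;
    const bool all_found =
        std::all_of(nested.begin(), nested.end(), [&](const Field &n) {
          return std::any_of(outer.begin(), outer.end(), [&](const Field &o) {
            return SameMember(o, n, guess);
          });
        });
    if (all_found) {
      base = guess;
      return true;
    }
  }
  return false;
}

// Returns `outer` without the members that were flattened in from `nested`.
std::vector<Field> DropNestedFields(const std::vector<Field> &outer,
                                    const std::vector<Field> &nested,
                                    uint64_t base) {
  std::vector<Field> kept;
  for (const Field &o : outer) {
    const bool flattened =
        std::any_of(nested.begin(), nested.end(),
                    [&](const Field &n) { return SameMember(o, n, base); });
    if (!flattened)
      kept.push_back(o);
  }
  return kept;
}
```

Applied to the dump of `S`, this would place the unnamed struct at offset 0 and the unnamed union at offset 8, leaving only `c` as a direct field. It does not settle whether the PDB always carries enough information, for example with bit fields or when one unnamed type is nested inside another.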


https://reviews.llvm.org/D49410




