[PATCH] D62701: [PDB] Copy inlinee lines records into the PDB

Fri May 31 14:01:22 PDT 2019

rnk marked an inline comment as done.
rnk added inline comments.

================
Comment at: lld/test/COFF/pdb-inlinees-extrafiles.s:6
+
+# The assembly was hand written to model the following C code. As of this
+# writing, clang does not emit extra files for inlinees, so it had to be hand
----------------
aganea wrote:
> aganea wrote:
> > rnk wrote:
> > > aganea wrote:
> > > > Why not:
> > > > ```
> > > > $ cl /Z7 /c /O2 t.c
> > > > $ obj2yaml t.obj >t.yaml
> > > > ```
> > > > Which changes the test to:
> > > > ```
> > > > # REQUIRES: x86
> > > > # RUN: yaml2obj %s -o=%t.obj
> > > > # RUN: lld-link -entry:main -nodefaultlib %t.obj -out:%t.exe -pdb:%t.pdb -debug
> > > > # RUN: llvm-pdbutil dump -il %t.pdb | FileCheck %s
> > > > ```
> > > > I think it should be made clear/easy to re-generate the tests, if we want to change it, or if someone wants to duplicate it.
> > > > 
> > > Basically it boils down to, which do we think is a more useful test format, YAML or assembly? My preference is for assembly, and I'd like to replace a lot of the .test YAML inputs with .s inputs. Maybe that's unique to me, but it mirrors the direction the ELF linker took, where they moved away from YAML object input tests to assembly tests.
> > > 
> > > I can't generate assembly from MSVC, so I started with clang assembly output, and modified it to exercise the corner case in question.
> > Would there be a way to better represent "structured data" in the assembly below? Can you show a mock of what your intern is doing?
> > 
> > If you choose to go the ELF way for coherence, that is justified. Was the move in ELF because of file size concerns?
> > 
> > 
> Oh I see, you want to represent both code and debug info in the same textual format, and YAML doesn't offer that (only through raw `SectionData`s). If the assembly below could offer structured representations (and named fields), in essence YAML readability, that would be great!
Exactly. YAML doesn't represent machine code or relocations in a human-readable way: the code is just hex SectionData, and the relocations are stored separately in an array, so neither is easy to modify.

Here's an excerpt of the project proposal I wrote:

---

Currently, type info is emitted as bytes using CodeViewRecordIO, and then dumped as textual comments in the assembler output. It looks like this:

```
	# Struct (0x1003) {
	#   TypeLeafKind: LF_STRUCTURE (0x1505)
	#   MemberCount: 2
	#   Properties [ (0x200)
	#     HasUniqueName (0x200)
	#   ]
	#   FieldList: <field list> (0x1002)
	#   DerivedFrom: 0x0
	#   VShape: 0x0
	#   SizeOf: 16
	#   Name: Foo
	#   LinkageName: .?AUFoo@@
	# }
	.byte	0x22, 0x00, 0x05, 0x15
	.byte	0x02, 0x00, 0x00, 0x02
	.byte	0x02, 0x10, 0x00, 0x00
	.byte	0x00, 0x00, 0x00, 0x00
	.byte	0x00, 0x00, 0x00, 0x00
	.byte	0x10, 0x00, 0x46, 0x6f
	.byte	0x6f, 0x00, 0x2e, 0x3f
	.byte	0x41, 0x55, 0x46, 0x6f
	.byte	0x6f, 0x40, 0x40, 0x00
```

However, this is needlessly hard to read. It would be much nicer if we emitted assembly that looked like this:

```
        .short 34       # RecordLen
        .short 0x1505   # LF_STRUCTURE
        .short 2        # MemberCount
        .short 0x200    # Properties
        .long 0x1002    # FieldList
        .long 0x0       # DerivedFrom
        .long 0x0       # VShape
        .short 16       # SizeOf
        .asciz "Foo"    # Name
        .asciz ".?AUFoo@@"      # LinkageName
```

---
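As a sanity check on the field layout, here is a minimal Python sketch that packs the same LF_STRUCTURE record field by field and compares it against the `.byte` dump in the excerpt. The function name `encode_lf_structure` is hypothetical and is not part of LLVM. The sketch assumes little-endian fields and treats SizeOf as a plain 16-bit value, which only holds for small sizes like the 16 used here.

```python
import struct

LF_STRUCTURE = 0x1505  # TypeLeafKind value shown in the dump


def encode_lf_structure(member_count, properties, field_list,
                        derived_from, vshape, size_of, name, linkage_name):
    """Hypothetical helper: pack one LF_STRUCTURE record as raw bytes."""
    # Three 16-bit fields, three 32-bit type indices, one 16-bit size.
    body = struct.pack("<HHHIIIH", LF_STRUCTURE, member_count, properties,
                       field_list, derived_from, vshape, size_of)
    # Both names are NUL-terminated, matching .asciz in assembly.
    body += name.encode("ascii") + b"\x00"
    body += linkage_name.encode("ascii") + b"\x00"
    # RecordLen counts every byte after the 2-byte length field itself.
    return struct.pack("<H", len(body)) + body


record = encode_lf_structure(2, 0x200, 0x1002, 0x0, 0x0, 16,
                             "Foo", ".?AUFoo@@")

# The .byte directives from the excerpt, copied verbatim.
expected = bytes([
    0x22, 0x00, 0x05, 0x15,
    0x02, 0x00, 0x00, 0x02,
    0x02, 0x10, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00,
    0x10, 0x00, 0x46, 0x6f,
    0x6f, 0x00, 0x2e, 0x3f,
    0x41, 0x55, 0x46, 0x6f,
    0x6f, 0x40, 0x40, 0x00,
])

assert record == expected
```

Each argument corresponds to one named field in the comment block of the dump, which is the same one-field-per-directive mapping the structured assembly is meant to make visible.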

Nilanjana is still getting started and doesn't have a Phabricator account yet, but this is kind of the direction that I want to go. I figured we'd start with type records, then try to make symbol records more structured, and then move on to .cv_def_range, which is pretty gross right now.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D62701/new/

https://reviews.llvm.org/D62701