[PATCH] D62701: [PDB] Copy inlinee lines records into the PDB

Mon Jun 3 12:00:05 PDT 2019

rnk marked an inline comment as done.
rnk added inline comments.

================
Comment at: lld/test/COFF/pdb-inlinees-extrafiles.s:6
+
+# The assembly was hand written to model the following C code. As of this
+# writing, clang does not emit extra files for inlinees, so it had to be hand
----------------
aganea wrote:
> aganea wrote:
> > rnk wrote:
> > > aganea wrote:
> > > > aganea wrote:
> > > > > rnk wrote:
> > > > > > aganea wrote:
> > > > > > > Why not:
> > > > > > > ```
> > > > > > > $ cl /Z7 /c /O2 t.c
> > > > > > > $ obj2yaml t.obj >t.yaml
> > > > > > > ```
> > > > > > > Which changes the test to:
> > > > > > > ```
> > > > > > > # REQUIRES: x86
> > > > > > > # RUN: yaml2obj %s -o=%t.obj
> > > > > > > # RUN: lld-link -entry:main -nodefaultlib %t.obj -out:%t.exe -pdb:%t.pdb -debug
> > > > > > > # RUN: llvm-pdbutil dump -il %t.pdb | FileCheck %s
> > > > > > > ```
> > > > > > > I think it should be made clear/easy to re-generate the tests, if we want to change it, or if someone wants to duplicate it.
> > > > > > > 
> > > > > > Basically it boils down to, which do we think is a more useful test format, YAML or assembly? My preference is for assembly, and I'd like to replace a lot of the .test YAML inputs with .s inputs. Maybe that's unique to me, but it mirrors the direction the ELF linker took, where they moved away from YAML object input tests to assembly tests.
> > > > > > 
> > > > > > I can't generate assembly from MSVC, so I started with clang assembly output, and modified it to exercise the corner case in question.
> > > > > Would there be a way to better represent "structured data" in the assembly below? Can you show a mock of what your intern is doing?
> > > > > 
> > > > > If you choose to go the ELF way for coherence, that is justified. Was the move in ELF because of file size concerns?
> > > > > 
> > > > > 
> > > > Oh I see, you want to represent both code and debug info in the same textual formal, and YAML doesn't offer that (only through raw `SectionData`s. If the assembly below could offer structured representations (and named fields), in essence YAML readability, that would be a great!
> > > Exactly, YAML doesn't do a very good job of representing machine code or relocations in a very human readable way. The code is just in hex SectionData, and the relocations are stored separately in an array, so it's not easily modifiable.
> > > 
> > > Here's an excerpt of the project proposal I wrote:
> > > 
> > > ---
> > > 
> > > Currently, type info is emitted using CodeViewRecordIO as bytes, and then dumped as textual comments in the assembler output. It currently looks like this:
> > > 
> > > ```
> > > 	# Struct (0x1003) {
> > > 	#   TypeLeafKind: LF_STRUCTURE (0x1505)
> > > 	#   MemberCount: 2
> > > 	#   Properties [ (0x200)
> > > 	#     HasUniqueName (0x200)
> > > 	#   ]
> > > 	#   FieldList: <field list> (0x1002)
> > > 	#   DerivedFrom: 0x0
> > > 	#   VShape: 0x0
> > > 	#   SizeOf: 16
> > > 	#   Name: Foo
> > > 	#   LinkageName: .?AUFoo@@
> > > 	# }
> > > 	.byte	0x22, 0x00, 0x05, 0x15
> > > 	.byte	0x02, 0x00, 0x00, 0x02
> > > 	.byte	0x02, 0x10, 0x00, 0x00
> > > 	.byte	0x00, 0x00, 0x00, 0x00
> > > 	.byte	0x00, 0x00, 0x00, 0x00
> > > 	.byte	0x10, 0x00, 0x46, 0x6f
> > > 	.byte	0x6f, 0x00, 0x2e, 0x3f
> > > 	.byte	0x41, 0x55, 0x46, 0x6f
> > > 	.byte	0x6f, 0x40, 0x40, 0x00
> > > ```
> > > 
> > > However, this is needlessly hard to read. It would be much nicer if we emitted assembly that looked like this:
> > > 
> > > ```
> > >         .short 34       # RecordLen
> > >         .short 0x1505   # LF_STRUCTURE
> > >         .long 0x200     # Properties
> > >         .long 0x1002    # FieldList
> > >         .long 0x0       # DerivedFrom
> > >         .long 0x0       # VShape
> > >         .short 16       # SizeOf
> > >         .asciz "Foo"    # Name
> > >         .asciz ".?AUFoo@@"      # LinkageName
> > > ```
> > > 
> > > ---
> > > 
> > > Nilanjana is still getting started and doesn't have a Phabricator account yet, but this is kind of the direction that I want to go. I figured we'd start with type records, then try to make symbol records more structured, and then move on to .cv_def_range, which is pretty gross right now.
> > What about a new `.codeview` directive which would include more structured data, and also solve .cv_def_range? You would get the best of both worlds, assembly code and readability for debug infos.
> > ```
> > .codeview "
> >       - Kind:            LF_STRUCTURE
> >         Class:
> >           MemberCount:     0
> >           Options:         [ None, ForwardReference, HasUniqueName ]
> >           FieldList:       0
> >           Name:            Foo
> >           UniqueName:      '.?AUFoo@@'
> >           DerivationList:  0
> >           VTableShape:     0
> >           Size:            0
> > "
> > ```
> Nothing prevents from writing partial or entire sections with this:
> ```
> .codeview "
> sections:        
>   - Name:            '.debug$S'
>     Characteristics: [ IMAGE_SCN_CNT_INITIALIZED_DATA, IMAGE_SCN_MEM_DISCARDABLE, IMAGE_SCN_MEM_READ ]
>     Alignment:       1
>     Subsections:     
>       - !Symbols
>         Records:         
>           - Kind:            S_OBJNAME
>             ObjNameSym:      
>               Signature:       0
>               ObjectName:      'F:\svn\test\t.obj'
>           - Kind:            S_COMPILE3
>             Compile3Sym:     
>               Flags:           [ SecurityChecks, HotPatch ]
>               Machine:         X64
>               FrontendMajor:   19
>               FrontendMinor:   16
>               FrontendBuild:   27031
>               FrontendQFE:     1
>               BackendMajor:    19
>               BackendMinor:    16
>               BackendBuild:    27031
>               BackendQFE:      1
>               Version:         'Microsoft (R) Optimizing Compiler'
> ```
That's an interesting idea. Up until now, we haven't allowed llc, clang, or MC to have dependencies on ObjectYAML because we believed it was too large. We'd have to check that belief, and maybe optimize the code size of this feature, which will be seldomly used. But, it's not like we've been policing the size of the clang binary that much, and I'm sure there are other ways we could optimize its size.

One concern I have is that I want to be able to test feeding corrupt records to LLD. I think we can allow mixing YAML symbol records with today's assembler declared codeview records. If we do that, we won't lose any testing capabilities.

I'll have to think about it.

One other thing is, how do we get labels and relocations into and out of YAML? Today a lot of those things are only supported as offsets into sections and section indices. That's sort of the strong suit of assembly, which is really all about laying out code and finalizing offsets.

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D62701/new/

https://reviews.llvm.org/D62701