[PATCH] D64428: Teach `llvm-pdbutil pretty -native` about `-injected-sources`

Zachary Turner via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Jul 16 13:32:24 PDT 2019


zturner added inline comments.


================
Comment at: llvm/lib/DebugInfo/PDB/Native/NativeEnumInjectedSources.cpp:50
+    if (!Name) {
+      consumeError(Name.takeError());
+      return "(invalid file name ref)";
----------------
thakis wrote:
> zturner wrote:
> > rnk wrote:
> > > Hm. I guess perhaps we should've allowed error propagation out of these APIs. I guess that's not fixable without an interface change.
> > The thinking at the time was that anything that might generate an error should happen during load, not on lazy evaluation of the properties.  That's why most of the classes have a `initialize` or `reload()` static method which returns an `Expected<T>`, so that all validation can be done up front.  I'm open to the possibility that that wasn't the best decision though, but it kept the code simple and handled most of the cases we would expect to see in practice.
> Thanks for chiming in! That makes sense to me.
> 
> I changed the code to verify that all referenced strings exist in `reload()` and removed the error handling here. I didn't (yet) do this for getCode() since that'd duplicate quite a bit of code. From your comment, it kind of sounds like the intended design is that InjectedSourceStream should grab _all_ data from disk and store it in memory, and the NativeEnumInjectedSources should refer to that. Is that accurate? If so, InjectedSourceStream should probably have a
> 
> ```
>   struct InjectedSource {
>     uint32_t CRC32;
>     StringRef Name;
>     StringRef ObjName;
>     StringRef VName;
>     PDB_SourceCompression Compression;
>     llvm::MemoryBuffer Contents;
>   };
> ```
> 
> and store a name->InjectedSource map, and check that the CRC32 matches the CRC32 of Contents at load time too. Does that sound right?
> 
> If so, is there an existing method to load the contents of a stream into a MemoryBuffer somewhere?
I don't think you need a MemoryBuffer, because a MemoryBuffer has to be contiguous, which would force a copy out of the mapped PDB.  

That's what the BinaryStreamRef stuff is for, it's essentially a wrapper over a PDB file, and logical offset+length in the PDB file.  You can read from it as if it were a memory buffer, but it handles the case where the buffer isn't contiguous in the backing file for you and only copies when necessary.

So your idea is basically correct, except I'd store a BinaryStreamRef instead of a MemoryBuffer.  I think BinaryStreamReader has a method called readBinaryStreamRef (or something similar), that takes a length and just returns you a binary stream ref that is safe to store (at least until you close the master PDBFile).  

DbiStream::initialize() is a good example, because it does stuff like this already.


================
Comment at: llvm/lib/DebugInfo/PDB/Native/NativeEnumInjectedSources.cpp:50
+    if (!Name) {
+      consumeError(Name.takeError());
+      return "(invalid file name ref)";
----------------
thakis wrote:
> rnk wrote:
> > thakis wrote:
> > > zturner wrote:
> > > > thakis wrote:
> > > > > zturner wrote:
> > > > > > rnk wrote:
> > > > > > > Hm. I guess perhaps we should've allowed error propagation out of these APIs. I guess that's not fixable without an interface change.
> > > > > > The thinking at the time was that anything that might generate an error should happen during load, not on lazy evaluation of the properties.  That's why most of the classes have a `initialize` or `reload()` static method which returns an `Expected<T>`, so that all validation can be done up front.  I'm open to the possibility that that wasn't the best decision though, but it kept the code simple and handled most of the cases we would expect to see in practice.
> > > > > Thanks for chiming in! That makes sense to me.
> > > > > 
> > > > > I changed the code to verify that all referenced strings exist in `reload()` and removed the error handling here. I didn't (yet) do this for getCode() since that'd duplicate quite a bit of code. From your comment, it kind of sounds like the intended design is that InjectedSourceStream should grab _all_ data from disk and store it in memory, and the NativeEnumInjectedSources should refer to that. Is that accurate? If so, InjectedSourceStream should probably have a
> > > > > 
> > > > > ```
> > > > >   struct InjectedSource {
> > > > >     uint32_t CRC32;
> > > > >     StringRef Name;
> > > > >     StringRef ObjName;
> > > > >     StringRef VName;
> > > > >     PDB_SourceCompression Compression;
> > > > >     llvm::MemoryBuffer Contents;
> > > > >   };
> > > > > ```
> > > > > 
> > > > > and store a name->InjectedSource map, and check that the CRC32 matches the CRC32 of Contents at load time too. Does that sound right?
> > > > > 
> > > > > If so, is there an existing method to load the contents of a stream into a MemoryBuffer somewhere?
> > > > I don't think you need a MemoryBuffer, because a MemoryBuffer has to be contiguous, which would force a copy out of the mapped PDB.  
> > > > 
> > > > That's what the BinaryStreamRef stuff is for, it's essentially a wrapper over a PDB file, and logical offset+length in the PDB file.  You can read from it as if it were a memory buffer, but it handles the case where the buffer isn't contiguous in the backing file for you and only copies when necessary.
> > > > 
> > > > So your idea is basically correct, except I'd store a BinaryStreamRef instead of a MemoryBuffer.  I think BinaryStreamReader has a method called readBinaryStreamRef (or something similar), that takes a length and just returns you a binary stream ref that is safe to store (at least until you close the master PDBFile).  
> > > > 
> > > > DbiStream::initialize() is a good example, because it does stuff like this already.
> > > Looking at this some more: For just natvis files, loading everything into memory seems fine, but the docs for IDiaInjectedSource kind of suggest that it's possible to store the source code of all cc files compiled to obj files in the pdb, and loading that eagerly would be quite a bit of data. OTOH, it's only loaded eagerly if PDBFile::getInjectedSourceStream() is called, and the only caller of it currently walks all entries anyhow.
> > > If so, is there an existing method to load the contents of a stream into a MemoryBuffer somewhere?
> > 
> > I don't think so, we moved data into and out of the stream APIs with ArrayRef<uint8_t>, typically stored in BumpPtrAllocators.
> Thanks, I'll land this for now as it makes llvm-pdbutil more useful, and then I can iterate from there.
> 
> I found out how to get source files into /src/headerblock, it's a C# thing (but it still doesn't set obj):
> 
> ```
> C:\src\llvm-mono>type Hello.cs
> using System;
> namespace HelloWorld {
> class Hello {
>   static void Main() { Console.WriteLine("Hello World!"); }
> }
> }
> 
> C:\src\llvm-mono>csc Hello.cs /debug
> Microsoft (R) Visual C# Compiler version 2.10.0.0 (b9fb1610)
> Copyright (C) Microsoft Corporation. All rights reserved.
> 
> 
> C:\src\llvm-mono>csc Hello.cs /debug /nologo
> 
> C:\src\llvm-mono>out\gn\bin\llvm-pdbutil.exe pretty -injected-sources Hello.pdb
> Summary for C:\src\llvm-mono\Hello.pdb
>   Size: 11776 bytes
>   Guid: {9F3B98FA-6D65-D34E-8CF9-86994C5C9569}
>   Age: 1
>   Attributes: HasPrivateSymbols
> ---INJECTED SOURCES---
>   C:\src\llvm-mono\Hello.cs (92 bytes): obj=<null>, vname=c:\src\llvm-mono\hello.cs, crc=3766985073, compression=Unknown (101)
> ```
> 
> (llvm-mono just means "monorepo"; in the c# context it could lead to confusion I suppose :P)
Yea it's fine to land this now and iterate on it.  FWIW, I think it's probably *more* work to get everything into a MemoryBuffer than it is to just save off a BinaryStreamRef, and then it'll just work no matter what or how much stuff you store in there.  But maybe I'm biased :)  

And yes, it's possible to store arbitrary file content into the injected sources.  I had the idea that we could do this for generated code (for example the output of `protoc`), this way you could debug into code that is generated as part of build step and won't ever be archived on source server.  In theory Injected Sources are the way to do this, but the question is really whether the debugger supports it.  It probably does for C#, but not so sure about C++.  If it does, it would be really cool though.


Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D64428/new/

https://reviews.llvm.org/D64428





More information about the llvm-commits mailing list