[PATCH] D21157: [pdb] Improve StreamInterface to support writing

Zachary Turner via llvm-commits llvm-commits at lists.llvm.org
Wed Jun 8 18:11:40 PDT 2016

zturner added a comment.

In http://reviews.llvm.org/D21157#453015, @ruiu wrote:

> If you like the readable and writable streams, I don't have a strong opinion to oppose it, since I don't really understand the need here.

So imagine we want to test that our PDB writing code works.  How would we do it?  We aren't generating any PDBs yet.  Even if we imagine that 6 months down the line LLD has full support for generating PDBs, how complete will it be?  What if there's one record type that we don't generate correctly, but we don't know because none of our tests compile the right code to exercise it?

The need is basically better testing.  Our support for dumping PDBs is becoming more and more complete, and only a few things remain.  The dumper works by reading the PDB, building a bunch of internal data structures, and printing them.  When it's time for LLD to generate PDBs, it will probably also use some internal data structures to represent the various streams and records.  Since the ones we're using for reading are already well tested and work pretty well, it would be nice to reuse as much of them as possible for writing.  Furthermore, by using the same code, we can start testing arbitrarily complex PDBs right now, so that when it comes time for LLD to start writing PDBs, all of the code will already be tested.

The YAML stuff, for example, is just a description of the PDB and the records.  So we read the YAML, use the fields to fill out the structures that describe the PDB, and then use the StreamWriter to write that into an actual PDB.  LLD will then do almost the same thing, except that instead of coming from YAML, the "description" will come from IR metadata.  Everything else will be the same: fill out the fields, write the file.
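As a rough sketch of that "fill out the fields, write the file, read it back" round trip: everything below is hypothetical and self-contained.  `StreamHeader`, `ByteStream`, `writeHeader`, `readHeader` and `roundTrips` are names invented for illustration and are not the classes in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for one of the internal structures that describe a
// PDB. The field names are invented for this sketch.
struct StreamHeader {
  uint32_t Version;
  uint32_t Signature;
  uint32_t Age;
};

// Hypothetical fixed-size in-memory stream supporting both reads and writes.
// This is NOT the StreamInterface from the patch.
class ByteStream {
public:
  explicit ByteStream(std::size_t Size) : Bytes(Size, 0) {}

  // Copies Len bytes from Src into the stream; fails if out of bounds.
  bool write(std::size_t Offset, const void *Src, std::size_t Len) {
    if (!inBounds(Offset, Len))
      return false;
    std::memcpy(Bytes.data() + Offset, Src, Len);
    return true;
  }

  // Copies Len bytes out of the stream into Dst; fails if out of bounds.
  bool read(std::size_t Offset, void *Dst, std::size_t Len) const {
    if (!inBounds(Offset, Len))
      return false;
    std::memcpy(Dst, Bytes.data() + Offset, Len);
    return true;
  }

private:
  bool inBounds(std::size_t Offset, std::size_t Len) const {
    return Offset <= Bytes.size() && Len <= Bytes.size() - Offset;
  }
  std::vector<uint8_t> Bytes;
};

// "Fill out the fields, write the file": serialize the description.
bool writeHeader(ByteStream &S, const StreamHeader &H) {
  return S.write(0, &H, sizeof(H));
}

// The reading side, which is the part that is already well tested.
bool readHeader(const ByteStream &S, StreamHeader &H) {
  return S.read(0, &H, sizeof(H));
}

// Write a description, read it back, and check that nothing changed.
bool roundTrips(const StreamHeader &In) {
  ByteStream S(sizeof(StreamHeader));
  StreamHeader Out{};
  return writeHeader(S, In) && readHeader(S, Out) &&
         Out.Version == In.Version && Out.Signature == In.Signature &&
         Out.Age == In.Age;
}
```

In a test, the `StreamHeader` would be filled in from a YAML description; in LLD, under the same assumptions, it would be filled in from IR metadata, and the write path would be shared.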

> But it is a bit concerning that the use cases you wrote are currently all hypothetical, and we are writing relatively complex code particularly for cache consistency. Do we need to update the cached entries? It might be a good property to guarantee, but we may be able to just say that "when you write something to a stream, all data objects you read before are invalid now -- so don't use it or read it again." It's hard for me to imagine a use case in which you read, write and read some data again and again from/to the same stream. Even if you need it in future, you can add it when it is needed, no?

I thought about doing it that way, but it seemed overly harsh to say that if you write even one byte to a stream, any outstanding references pointing into the stream are invalidated.  Mostly I kept the cached entries up to date because it made the tests much cleaner and shorter, since I didn't have to construct brand new objects every time just to write one integer.
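To make the trade-off concrete, here is a minimal sketch of the "update the cached entries on write" policy.  It assumes a stream that hands out pointers into cached copies of the ranges it has read.  `CachingStream` and its members are hypothetical and are not the implementation in the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stream that caches every range it has handed out and patches
// those cached copies on write, so outstanding pointers stay valid.
class CachingStream {
public:
  explicit CachingStream(std::vector<uint8_t> Data)
      : Backing(std::move(Data)) {}

  // Returns a pointer to a cached copy of [Offset, Offset + Len), or nullptr
  // if the range is empty or out of bounds. The pointer stays valid for the
  // lifetime of the stream because map nodes are never moved or resized.
  const uint8_t *read(std::size_t Offset, std::size_t Len) {
    if (Len == 0 || !inBounds(Offset, Len))
      return nullptr;
    auto Key = std::make_pair(Offset, Len);
    auto It = Cache.find(Key);
    if (It == Cache.end()) {
      const uint8_t *Begin = Backing.data() + Offset;
      It = Cache.emplace(Key, std::vector<uint8_t>(Begin, Begin + Len)).first;
    }
    return It->second.data();
  }

  // Writes Src at Offset, then refreshes the overlapping part of every
  // cached range instead of invalidating it.
  bool write(std::size_t Offset, const std::vector<uint8_t> &Src) {
    if (!inBounds(Offset, Src.size()))
      return false;
    std::copy(Src.begin(), Src.end(), Backing.data() + Offset);
    std::size_t End = Offset + Src.size();
    for (auto &Entry : Cache) {
      std::size_t Start = Entry.first.first;
      std::size_t Lo = std::max(Start, Offset);
      std::size_t Hi = std::min(Start + Entry.first.second, End);
      for (std::size_t I = Lo; I < Hi; ++I)
        Entry.second[I - Start] = Backing[I];
    }
    return true;
  }

private:
  bool inBounds(std::size_t Offset, std::size_t Len) const {
    return Offset <= Backing.size() && Len <= Backing.size() - Offset;
  }
  std::vector<uint8_t> Backing;
  std::map<std::pair<std::size_t, std::size_t>, std::vector<uint8_t>> Cache;
};
```

With the stricter "everything is invalid after a write" rule, the loop over `Cache` goes away, but each test would have to read the range again after every write before checking it.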

