<div dir="ltr"><br><br><div class="gmail_quote"><div dir="ltr">On Wed, Feb 22, 2017 at 11:29 AM David Blaikie <<a href="mailto:dblaikie@gmail.com">dblaikie@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class="gmail_msg">Haven't got to this but would like to take a look/review it before it goes in.<br class="gmail_msg"><br class="gmail_msg">*skimming over some of the description*<br class="gmail_msg"><br class="gmail_msg">Sounds like 'stream' might not be the right terminology - since they return pointers into data that (I think) remains valid for the life of the stream? (this also makes me wonder a bit about memory usage if the cross-block operation is used a lot (causing allocations) but the values are then discarded by the user - the memory can't be reused, it's effectively leaked)<br class="gmail_msg"><br class="gmail_msg"></div></blockquote><div>cross-block allocation is the exception though because PDB is weird, I don't expect that to be heavily used in other places.  That said, in the PDB's implementation of BinaryStream we take this into account and cache allocation requests at a given offset.  The data isn't just random, it's structured into records, and so we expect reads for the same data to always be at the same offsets (if you've got a list of records, you're going to read records 1, 2, and 3, not some random bytes in the middle).  Because of that, we can cache allocation requests for a particular offset and size, and if we see another request come in that was previously cached, we just return that.  It still handles the "weird" case anyway just for the sake of completeness, but I don't think it's actually needed.</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class="gmail_msg">Also - the wrappers ThinStreamReader/Writer - do they benefit substantially from being thin-stream aware, rather than abstractions around any stream of bytes? (since they transform the data anyway)<br class="gmail_msg"></div></blockquote><div>This is actually a surprisingly handy abstraction.  Since a BinaryStream (I've abandoned the term Thin btw, but we can discuss the name orthogonally if you like) most likely represents some kind of structured format with records, you often have quite a bit of data to read out.  And since BinaryStream can be any arbitrary implementation, you don't always have a single list of bytes representing everything.  I'll use the canonical PDB case where you've got all these blocks spread around.  I might know that I have 1000 records, but not where they come from.  Any given record might cross a boundary and return me a pointer allocated from the stream's internal BumpPtrAllocator.  But if I just want to read 1000 records, it's convenient to *treat* the entire thing as one contiguous sequence.  So I just make a BinaryStreamReader and iterate over it until it becomes empty, reading out the records.  If I had to give it a sequence of bytes, this would become very cumbersome.</div><div><br></div><div>That said, there is one implementation of BinaryStream that is just a sequence of bytes.  So this is general enough to support that.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class="gmail_msg">Oh, reinterpret casting... hrm. That kind of file reading/writing scheme usually makes me a bit uncomfortable due to portability concerns (having to align, byte swap, etc, structs to match the on-disk format can make those structures problematic to work with - if you have to byte swap anyway, you'd need to copy the data out of the underlying buffer anyway, right?)</div></blockquote><div>For byte swapping usually this is done by using llvm's endian aware types.  If you have a struct with a bunch of llvm::support::ulittle16_t's, then it doesn't matter what platform you're on, you can reinterpret_cast the pointer to that struct, and when you read the values they will be correct.</div><div><br></div><div>On the other hand, sometimes it's convenient to just say "give me an integer", and in that case the read function lets you specify what endianness you want (defaults to host) and it copies the value into an output parameter rather than returning a pointer.  In either case, the user of the API never has to deal with byte swapping.</div><div><br></div><div>I'll upload a patch shortly.</div></div></div>