[PATCH] D53379: GSYM symbolication format

Fri Oct 26 11:15:32 PDT 2018

lemo added inline comments.

================
Comment at: lib/DebugInfo/GSYM/README.md:59
+### Address Data Offsets Table
+The address data offsets table immediately follows the address table and consists of `Header.num_addrs` 32 bit file offsets: one for each address in the address table. The offsets in this table are the absolute file offset to the address data for each address in the address table. Keeping this data separate from the address table helps to reduce the number of pages that are touched when address lookups occur on a GSYM file.
+
----------------
clayborg wrote:
> lemo wrote:
> > Why absolute offsets as opposed to relative offsets into the data section? 
> > 1. at very least it makes it easier to manipulate the file format
> > 2. it may also enable short offsets?
> > 3. also consistent with strings offsets
> We need to be able to binary search this table for your address. If we use relative offsets, then we can't do that. The idea is to mmap this file and use the data as is with minimal setup.
Sorry if I wasn't clear; I was talking about the absolute file offsets:

> The offsets in this table are the absolute file offset to the address data for each address in the address table.

These are just file pointers to the address data, so relative vs. absolute has nothing to do with the binary search in the address table, right? Also, absolute 32-bit file offsets could be limiting simply because they can only address 4 GiB (with section-relative offsets this is less of a concern).
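
As a minimal sketch of the point above, assuming a hypothetical in-memory view of the two tables (`GsymView`, `data_base`, and `lookupDataOffset` are names made up for this example and are not part of the patch): the binary search only touches the address table, and a section-relative offset is turned into a file offset with one extra addition.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical view of the two tables; NOT the layout defined by the patch.
struct GsymView {
  std::vector<uint64_t> addrs;        // sorted start addresses
  std::vector<uint32_t> data_offsets; // one per address, relative to data_base
  uint64_t data_base;                 // file offset where the address data begins
};

// Finds the last entry whose start address is <= addr and stores the
// absolute file offset of its address data in `out`.
// Returns false if addr is below every entry in the table.
inline bool lookupDataOffset(const GsymView &v, uint64_t addr, uint64_t &out) {
  assert(v.addrs.size() == v.data_offsets.size());
  // The binary search reads the address table only.
  auto it = std::upper_bound(v.addrs.begin(), v.addrs.end(), addr);
  if (it == v.addrs.begin())
    return false;
  std::size_t index = static_cast<std::size_t>(it - v.addrs.begin()) - 1;
  // Relative -> absolute is a single 64-bit addition, so the 32-bit
  // entries can reach data located past the 4 GiB mark in the file.
  out = v.data_base + v.data_offsets[index];
  return true;
}
```

With `data_base` set to 0 this degenerates to the absolute-offset scheme, so the lookup path is otherwise identical.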

================
Comment at: lib/DebugInfo/GSYM/README.md:83
+
+### String Table
+The string table follows the file table in stand alone GSYM files and contains all strings for everything contained in the GSYM file. Any string data should be added to the string table and any references to strings inside GSYM information must be stored as 32 bit string table offsets into this string table.
----------------
clayborg wrote:
> lemo wrote:
> > Have you considered sorting the strings + prefix compression? It's an easy way to compress the strings and would avoid the need for special-casing things like the directory / filename split in the FileInfo
> I haven't really done much optimization on paths other than splitting them into directory and filename so file entries can share the strings. One thing we could do is allow strings to be specified in the string table with a length for the file table. That way we could have a long path: /a/b/c/d and refer to "/a", "/a/b", "/a/b/c" and "/a/b/c/d" using the same string. I am open to ideas here. I kept it simple to start with.
What I had in mind is simply to make provision for prefix compression: for example, every string could have an optional link to its prefix (which can itself be prefix compressed). What do you think?
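
A minimal sketch of that idea, assuming a hypothetical entry layout (`PrefixedString`, `kNoPrefix`, and `resolve` are illustrative names, not anything in the patch, and the sketch assumes the prefix links contain no cycles):

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Sentinel meaning "this entry has no prefix" (hypothetical).
constexpr uint32_t kNoPrefix = 0xFFFFFFFFu;

// Hypothetical entry: a suffix plus an optional link to a prefix entry.
struct PrefixedString {
  uint32_t prefix;    // index of the prefix entry, or kNoPrefix
  std::string suffix; // bytes appended to the resolved prefix
};

// Rebuilds the full string by following prefix links up to the root.
inline std::string resolve(const std::vector<PrefixedString> &table,
                           uint32_t index) {
  // Collect the chain entry -> ... -> root.
  std::vector<uint32_t> chain;
  for (uint32_t i = index; i != kNoPrefix; i = table[i].prefix) {
    assert(i < table.size());
    chain.push_back(i);
  }
  // Concatenate the suffixes root-first.
  std::string result;
  for (auto it = chain.rbegin(); it != chain.rend(); ++it)
    result += table[*it].suffix;
  return result;
}
```

With entries `/a`, `/b` (prefix 0), `/c` (prefix 1), and `/d` (prefix 2), the path `/a/b/c/d` and each of its parent directories are stored once and shared, which covers the directory / filename case without a dedicated split.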

================
Comment at: lib/DebugInfo/GSYM/README.md:84
+### String Table
+The string table follows the file table in stand alone GSYM files and contains all strings for everything contained in the GSYM file. Any string data should be added to the string table and any references to strings inside GSYM information must be stored as 32 bit string table offsets into this string table.
+
----------------
Can you please document the exact format used to store strings (even if it's just to note that it uses the same format as `.debug_str`, for example)?
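
For reference, a `.debug_str`-style table is a run of NUL-terminated strings addressed by byte offset. The sketch below shows what a reader of such a table looks like; whether GSYM uses this format is exactly the open question, and `getString` is a hypothetical helper. The table is assumed to end with a NUL byte.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Returns the NUL-terminated string that starts at `offset`, or an empty
// string if the offset is out of range.
inline std::string getString(const std::vector<char> &strtab, uint32_t offset) {
  // Precondition for this sketch: the table ends with a terminator.
  assert(!strtab.empty() && strtab.back() == '\0');
  if (offset >= strtab.size())
    return std::string();
  const char *start = strtab.data() + offset;
  // Search for the terminator without reading past the end of the table.
  const void *nul = std::memchr(start, '\0', strtab.size() - offset);
  if (!nul)
    return std::string();
  return std::string(start, static_cast<const char *>(nul));
}
```

Note that any offset is valid in this format, so an offset into the middle of a string yields its tail; that allows suffix sharing but not the prefix sharing discussed above.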

================
Comment at: lib/DebugInfo/GSYM/README.md:103
+```
+The address data starts with a 32 bit type, followed by a 32 bit length, followed by an array of bytes that encode each specify kind of data.
+The `AddressData.type` is an enumeration value:
----------------
clayborg wrote:
> lemo wrote:
> > nit: some types of data may have an implicit payload size so the `length` seems wasteful (I'd put as prefix in the type-specific payload instead)
> I think having the length defined is essential to the format. It allows you to skip any data you don't care about without knowing what it contains.
Then at least encode the length as LEB128?
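
A minimal sketch of what that could look like, assuming a hypothetical record layout of a fixed 32-bit type followed by a ULEB128 length and the payload (`encodeULEB128`, `decodeULEB128`, and `skipRecord` here are standalone illustrations, not the LLVM support-library functions):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Appends `value` to `out` as unsigned LEB128 (7 payload bits per byte).
inline void encodeULEB128(uint64_t value, std::vector<uint8_t> &out) {
  do {
    uint8_t byte = value & 0x7f;
    value >>= 7;
    if (value != 0)
      byte |= 0x80; // continuation bit: more bytes follow
    out.push_back(byte);
  } while (value != 0);
}

// Decodes a ULEB128 value at `pos` and advances `pos` past it.
// Returns false if the input is truncated or longer than 10 bytes.
inline bool decodeULEB128(const std::vector<uint8_t> &in, std::size_t &pos,
                          uint64_t &value) {
  value = 0;
  for (unsigned shift = 0; shift < 64; shift += 7) {
    if (pos >= in.size())
      return false;
    uint8_t byte = in[pos++];
    value |= static_cast<uint64_t>(byte & 0x7f) << shift;
    if ((byte & 0x80) == 0)
      return true;
  }
  return false;
}

// Skips one record (32-bit type, ULEB128 length, payload) without
// interpreting the payload. `pos` is advanced only on success.
inline bool skipRecord(const std::vector<uint8_t> &in, std::size_t &pos) {
  assert(pos <= in.size());
  if (in.size() - pos < 4)
    return false;
  std::size_t p = pos + 4; // step over the fixed-width type field
  uint64_t length = 0;
  if (!decodeULEB128(in, p, length) || length > in.size() - p)
    return false;
  pos = p + static_cast<std::size_t>(length);
  return true;
}
```

The skip-without-understanding property is preserved, while any payload shorter than 128 bytes pays one byte for its length instead of four.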

Repository:
  rL LLVM

https://reviews.llvm.org/D53379