[PATCH] Coverage: Documentation for the coverage mapping format

Tue Aug 5 15:40:00 PDT 2014

This is an awesome improvement! A few more review comments.

================
Comment at: docs/CoverageMappingFormat.rst:51
@@ +50,3 @@
+
+  ``llvm-cov show test -instr-profile=test.profdata test.c``
+
----------------
Tiny nit: please use `./test` here instead of just `test` so that it is clear that `test` is not a subcommand of `llvm-cov show`. (I personally got tripped up by this) Also move the `./test` after `-instr-profile=test.profdata` (unless the syntax of llvm-cov doesn't allow it? which would be weird)

================
Comment at: docs/CoverageMappingFormat.rst:93-95
@@ +92,5 @@
+profile instrumentation counters are associated with a specific function.
+Each function that requires code coverage has to create
+coverage mapping data that can map between the source code ranges and
+the profile instrumentation counters for that function.
+
----------------
The wording here suggests that the function itself creates the data. Consider starting the sentence instead as "For each function that requires code coverage, the fronted has to create...."

================
Comment at: docs/CoverageMappingFormat.rst:104
@@ +103,3 @@
+the region's kind.
+There are several kinds of mapping regions:
+
----------------
Here you discuss what the different regions are used for, but it isn't clear to the reader what the differences "boil down to". Why are these different kinds needed? Are they represented differently? Consider thinking about this from the perspective of what functionality would be impaired by removing each kind.

================
Comment at: docs/CoverageMappingFormat.rst:106-107
@@ +105,4 @@
+
+* Code regions associate portions of source code and `coverage mapping
+  counters`_.
+  For example:
----------------
Do the other kinds of regions have coverage mapping counters associated with them? If not, then please make the wording here indicate that, e.g. "Code regions are special in that they associate portions ..." (also give some insight into *why* they should be unlike the others in this respect).

================
Comment at: docs/CoverageMappingFormat.rst:196-197
@@ +195,4 @@
+
+Finally, a coverage mapping counter can also represent an execution count of
+of zero. The zero counter is used to provide coverage mapping for
+unreachable statements and expressions, like in the example below:
----------------
If the count is statically known to be zero, then why does it need to be represented at all?

I think it is because the tools that process the coverage information after the fact are "dumb" (don't have the full knowledge of the frontend) and so the frontend has to provide the information to them in this form. It's worth noting this I think.

================
Comment at: docs/CoverageMappingFormat.rst:278
@@ +277,3 @@
+If necessary, the encoded data is padded with zeroes so that the size
+of the data string is rounded up to the nearest multiple of 8.
+
----------------
Please say "8 bits" or "8 bytes" here for clarity.

================
Comment at: docs/CoverageMappingFormat.rst:293-294
@@ +292,4 @@
+
+* The length of the substring that contains the encoded translation unit
+  filenames is 20, thus the filenames are encoded in this string:
+
----------------
Please somehow note that this "20" comes from the second field in `@__llvm_coverage_mapping`

================
Comment at: docs/CoverageMappingFormat.rst:302-304
@@ +301,5 @@
+
+  * Its first byte has a value of 1. It stores the the number of filenames
+    contained in this string.
+  * Its second byte stores the length of the first filename in this string.
+  * The remaining 18 bytes are used to store the first filename.
----------------
At this point you should probably say "in LEB128 format, which is used throughout for storing integers", so that users don't get confused at the arbitrary 256 limit (speaking from personal experience here...)

================
Comment at: docs/CoverageMappingFormat.rst:307-308
@@ +306,4 @@
+
+* The length of the substring that contains the encoded coverage mapping data
+  for the first function record is 9, thus the coverage mapping for the first
+  function record is encoded in this string:
----------------
Again, just for clarity, please explicitly state where in `@__llvm_coverage_mapping` this information is coming from.

================
Comment at: docs/CoverageMappingFormat.rst:317-319
@@ +316,5 @@
+
+  * Its first byte stores the number of file ids used by this function. Its
+    value is 1 as there is only one file id used by the mapping data in this
+    function.
+  * Its second byte has a value of 0. This is an index into the filenames
----------------
Instead of saying "first byte", "second byte", etc. It is probably simpler to just put the data that is being discussed, e.g.

- `c"\01"`: stores the number of file ids ...
- `c"\00"`: an index into the filenames array ...
- etc.

================
Comment at: docs/CoverageMappingFormat.rst:325-326
@@ +324,4 @@
+    counter expressions.
+  * The fourth byte has a value of 1. It stores the number of mapping
+    regions that are stored in an array for file id #0.
+  * The fifth byte stores the coverage mapping counter for the
----------------
Why is the total number of mapping regions for the whole file (which could contain many functions) encoded in the per-function data?

================
Comment at: docs/CoverageMappingFormat.rst:346
@@ +345,3 @@
+
+The function's coverage mapping data is encoded as a stream of bytes,
+with a simple structure. The structure consists of the encoding
----------------
Please say either "The per-function coverage mapping data ..." or "Each function's coverage mapping data ...". Otherwise the reader could be confused and think that this is a description of the whole data string (the 40 bytes in the example above).

================
Comment at: docs/CoverageMappingFormat.rst:397-405
@@ +396,11 @@
+
+A `coverage mapping counter`_ is stored in a single `LEB value <LEB128_>`_.
+The value uses the following encoding:
+
+``[tag : 2, data : 30]``
+
+This value contains
+2 bit fields --- the `tag <counter-tag_>`_
+which is stored in the lowest 2 bits,
+and the `counter data`_ which is stored in the remaining bits.
+
----------------
This isn't clear to me. What exactly does a fixed-size bit field in a LEB128 mean? Is this after it is supposedly interpreted into a 32-bit int? What endianness? Please clarify.

================
Comment at: docs/CoverageMappingFormat.rst:493
@@ +492,3 @@
+
+The value of the counter's tag distinguishes between the pseudo-counters and
+counters --- if the tag is zero, than this header contains a
----------------
By pseudo-counter do you mean "expression counter"?

================
Comment at: docs/CoverageMappingFormat.rst:552-559
@@ +551,9 @@
+
+Filenames
+---------
+
+``[numFilenames : LEB128, filename0 : string, filename1 : string, ...]``
+
+The translation unit's filenames are stored in a stream of bytes and are encoded
+using the `types <cvmtypes_>`_ from the function's coverage mapping
+data encoding.
----------------
This seems a bit out of place here. Isn't the stuff above this all about the structure of the per-function entries, but this is about the filename data (the first 20 bytes of the example above?)

http://reviews.llvm.org/D4729