[LLVMdev] asan coverage

Wed Feb 19 09:53:39 PST 2014

Kostya Serebryany <kcc at google.com> writes:
> I've built chromium with " -fprofile-instr-generate -fsanitize=address"
> -- the performance looks good!
> The file format from r198638 is indeed rudimentary.
> Do you already know what the real output format will look like?

We have an idea of what it will look like, but we're still working out
some details.

> Just to summarize what I think is important: 
>   - minimal size on disk, minimal amount of files
>   - minimal i/o while writing to disk, no lockf or some such
>   - separate process produces separate file(s)
>   - fast merging of outputs from different runs of the same binary
>   - only the coverage output files and the binary (+DSOs) are required to
> "symbolize" the coverage data (map the data to file names & line numbers)

I think we agree on all of these goals. Specifically, we're going for:

- A binary format, one file per program (rather than per-source or
  per-object).
- Each run of the process generates data for that run only.
- Merging outputs will be done by a separate tool.
- The data file and the binary should be sufficient for symbolizing.

Additionally, we want fast lookup of data for a particular function, but
that's less important for coverage than it is for PGO.
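
To make the merging step concrete, here is a rough sketch in Python. It
is only an illustration: the real format is still being worked out, and
the in-memory layout (function name mapped to a list of counters) and
the name merge_profiles are assumptions made up for this example, not
anything from the actual tool.

```python
def merge_profiles(profiles):
    """Sum per-function counters across runs of the same binary.

    Each profile is a dict mapping a function name to a list of counter
    values (a hypothetical layout, chosen only for illustration).
    """
    merged = {}
    for profile in profiles:
        for func, counts in profile.items():
            if func not in merged:
                # First time we see this function: copy its counters.
                merged[func] = list(counts)
                continue
            if len(merged[func]) != len(counts):
                # Differing counter counts suggest the data came from
                # different builds, so refuse to merge.
                raise ValueError("counter count mismatch for " + func)
            merged[func] = [a + b for a, b in zip(merged[func], counts)]
    return merged


# Two runs of the same (hypothetical) binary, one data set per run.
run1 = {"main": [1, 0, 3], "helper": [5]}
run2 = {"main": [1, 2, 0], "parse": [7]}
merged = merge_profiles([run1, run2])
```

Keying the data by function is also what would make lookup of a
particular function cheap, which matters more for PGO than for coverage.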

> Ideally, symbolizing the cov data will use the debug info in the binary (i.e.
> llvm-symbolizer, addr2line or atos),
> this is what we've done in AsanCov, but I don't see a clean way to make it
> work with counters...

We may need some additional info. I haven't put a ton of thought into
this, but I'm hoping we can either (a) use debug info as is or add some
extra (valid) debug info to support this, or (b) add an extra
debug-info-like section to instrumented binaries with the information we
need.