[all-commits] [llvm/llvm-project] 7fdf27: [dfsan] Track origin at loads

Thu Apr 22 09:26:19 PDT 2021

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 7fdf270965584a3b63ffed85d3c1ef20b3510668
      https://github.com/llvm/llvm-project/commit/7fdf270965584a3b63ffed85d3c1ef20b3510668
  Author: Jianzhou Zhao <jianzhouzh at google.com>
  Date:   2021-04-22 (Thu, 22 Apr 2021)

  Changed paths:
    M compiler-rt/lib/dfsan/dfsan.cpp
    A compiler-rt/test/dfsan/origin_track_ld.c
    M llvm/lib/Transforms/Instrumentation/DataFlowSanitizer.cpp
    M llvm/test/Instrumentation/DataFlowSanitizer/basic.ll
    A llvm/test/Instrumentation/DataFlowSanitizer/origin_track_load.ll

  Log Message:
  -----------
  [dfsan] Track origin at loads

    The first version of origin tracking tracks only memory stores. This is
    sufficient for understanding correct flows, but it makes it hard to figure
    out where an undefined value is read. To find such a read, we still have
    to do a reverse binary search from the last store in the chain, adding
    prints and logs along the possible code paths, which is quite
    inefficient.

    Tracking memory load instructions can help this case. The main issues of
    tracking loads are performance and code size overheads.

    With tracking only stores, the code size overhead is 38%,
    memory overhead is 1x, and CPU overhead is 3x. In practice the number of
    loads is much larger than the number of stores, so both code size and CPU
    overhead increase. The first blocker is code size overhead: linking fails
    if we inline the load tracking. The workaround is to use external
    function calls to propagate metadata. This is also the workaround ASan
    uses. With load tracking, the CPU overhead is ~10x. This is a trade-off
    between debuggability and performance, and it will be used only when
    debugging cases where tracking only stores is not enough.
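
The difference between store-only tracking and tracking loads as well can be illustrated with a toy model. The sketch below is not DFSan's implementation: `ToyOriginTracker`, its methods, the addresses, and the site strings are all hypothetical names invented for illustration. It only shows why an origin chain that also records loads tells you where a labeled value was read, at the cost of one extra chain entry per tracked load.

```python
from dataclasses import dataclass, field


@dataclass
class ToyOriginTracker:
    """Toy model of an origin chain (hypothetical, not the DFSan runtime)."""
    track_loads: bool = False
    chain: list = field(default_factory=list)   # origin id -> (site, previous id)
    shadow: dict = field(default_factory=dict)  # address -> (label, origin id)

    def _chain(self, site, prev):
        # Append a new chain entry; ids start at 1, and 0 means "no origin".
        self.chain.append((site, prev))
        return len(self.chain)

    def taint(self, addr, label, site):
        # Attach a label to an address and start a new origin chain.
        self.shadow[addr] = (label, self._chain(site, 0))

    def load(self, addr, site):
        label, origin = self.shadow.get(addr, (0, 0))
        # Only labeled values extend the chain, and only if loads are tracked.
        if self.track_loads and label != 0:
            origin = self._chain(site, origin)
        return label, origin

    def store(self, addr, value, site):
        label, origin = value
        # Stores of labeled values always extend the chain in this model.
        if label != 0:
            origin = self._chain(site, origin)
        self.shadow[addr] = (label, origin)

    def trace(self, origin):
        # Walk the chain from the newest entry back to the initial taint.
        sites = []
        while origin != 0:
            site, origin = self.chain[origin - 1]
            sites.append(site)
        return sites


def run(track_loads):
    t = ToyOriginTracker(track_loads=track_loads)
    t.taint(0x10, label=1, site="taint a")
    v = t.load(0x10, site="load a")
    t.store(0x20, v, site="store b")
    _, origin = t.load(0x20, site="load b")
    return t.trace(origin)


if __name__ == "__main__":
    print(run(track_loads=False))
    print(run(track_loads=True))
```

In this model the store-only trace contains just the store and the initial taint, while the trace with loads enabled also names each read site. The longer chain is a small-scale picture of the extra work that load tracking adds.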

Reviewed By: gbalats

Differential Revision: https://reviews.llvm.org/D100967