[PATCH] D47599: [UBSan] DO NOT COMMIT: precise UBSan checks experiment

Thu May 31 11:50:16 PDT 2018

alekseyshl created this revision.
Herald added subscribers: kristof.beyls, mgorny.
Herald added a reviewer: javed.absar.

LLVM part of the two-part patch (LLVM + clang).

An experiment: implement precise UBSan checks on 32-bit ARM.

UBSan's current "one trap per function" approach has a few issues:

- the report doesn't tell us the exact offending instruction, since all conditional branches lead to the same trap instruction
- the trap instruction costs some code size, even though it's never executed
- *MAYBE* the branch predictor is polluted by lots of untaken branches

The idea is to inject a "conditional trap" instruction (which does not
exist in the current instruction set) at each potential failure point to
get precise reporting and avoid polluting the branch predictor.
To simulate it on the current instruction set, SVCxx 0xFFFFFF was chosen,
where "xx" is a predicate (overflow etc.), so the UBSan-instrumented code
changes from this:

  adds    r0, r0, r1
  bvc     L
  udf     #65006

L: bx      lr

to this:

  adds    r0, r0, r1
  svcvs   0x00ffffff
  bx      lr

Two UBSan-heavy projects were used for benchmarking, bzip2
(http://www.bzip.org/1.0.6/bzip2-1.0.6.tar.gz) and 464.h264ref from
the SPEC_CPU2006v1.2 package, comparing two instrumented versions of each:
one with the current UBSan implementation and another with the simulated
conditional trap.

Measurements show that code size increased for both projects:

- bzip2 - ~3%
- h264ref - ~5%

The size increase might be attributed to the more precise checks. For
example, a + b + c with an overflow check, which used to look like this:

  adds    r0, r0, r1
  addsvc  r0, r0, r2
  bvs     trap

now looks like this:

  adds    r0, r0, r1
  svcvs   0x00ffffff
  adds    r0, r0, r2
  svcvs   0x00ffffff

Likely, other optimisations are inhibited by these new checks too.

The performance is also worse on both little and big cores:

- bzip2 - ~108% (little core) and ~415% (big core) of baseline
- h264ref - ~117% (little core) and ~340% (big core) of baseline

Using other instructions instead of SVCxx (NOPxx, for example) still shows
a performance hit. The difference on the big core is not as dramatic as
with SVCxx, but it is still in the range of 2% - 35%, depending on the test
and the instruction.

The conclusion: using the existing ARM instructions to implement
precise UBSan checks achieves the stated goal, precise error reports,
but seems to be impractical in terms of code size and performance.

Repository:
  rL LLVM

https://reviews.llvm.org/D47599

Files:
  include/llvm/IR/Intrinsics.td
  lib/Target/ARM/ARM.h
  lib/Target/ARM/ARMCondTrapPass.cpp
  lib/Target/ARM/ARMInstrInfo.td
  lib/Target/ARM/ARMTargetMachine.cpp
  lib/Target/ARM/CMakeLists.txt

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D47599.149333.patch
Type: text/x-patch
Size: 5863 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20180531/885d9762/attachment.bin>