[cfe-dev] [RFC] UBSan check data size reductions

Filipe Cabecinhas via cfe-dev cfe-dev at lists.llvm.org
Tue Apr 26 09:49:01 PDT 2016


Hi all,

I’m proposing two ways to make ubsan-instrumented code (really data…) smaller.

Why?
Our platform has limited resources and game teams have to play within
those to create their games. During debugging, they might have some
more relaxed limits, but they still have a budget they need to stick
to. And they tend to use everything they can!
The size of code+data counts towards that budget, and ubsan is very
happy to make code+data a lot bigger when enabled. This can end up
posing some problems and making ubsan harder to use than we would
like.
This might not be as useful for the usual operating systems people
use, but it might help other people with space constraints enable
ubsan more easily.


Biggest space “wasters”:
A quick objdump + grep shows that not all check handlers are equal.
Number of calls per handler (for checks inserting >10k calls to their
handler):

clang (OS X):
793539 ___ubsan_handle_type_mismatch_abort
24204 ___ubsan_handle_load_invalid_value_abort
(4127 ___ubsan_handle_builtin_unreachable)
…

Game:
837785 __ubsan_handle_type_mismatch
20391 __ubsan_handle_load_invalid_value
16037 __ubsan_handle_out_of_bounds
11357 __ubsan_handle_add_overflow
…


Savings:
First, to ease accounting, I made clang emit all the ubsan check data
to a different section (and the type_mismatch data, specifically, to
another). The patch is attached as 0001-*.patch so anyone can check it
out; I don’t think it’s useful for it to be in clang.

A type_mismatch check’s static data consists of SourceLocation (char* +
2x u32) + TypeDescriptor ref (uptr) + Alignment (uptr) + unsigned char
== 8+2*4+8+8+1 == 33 bytes, which is upped to 48 (!!) with padding (on
x86-64 we’re aligning all structures to 16B). With that estimate of 48B
per type_mismatch, we end up with (for type_mismatch checks’ static
data):

Before:
clang: ~47.73 MiB
game:  ~59.24 MiB

After 0003-*.patch (percentages are relative to the size before):
clang: ~31.82 MiB (~67%)
game:  ~39.49 MiB (~67%)


(These numbers don’t match 793539*48 bytes because we end up eliding
some checks. The relative savings in the data section will be the
same, though)
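
To make the 33B vs. 48B estimate concrete, here is a minimal C++ sketch
of the layout described above. The struct and field names are
illustrative assumptions, not a copy of the ubsan runtime headers, and
alignas(16) stands in for the 16B alignment applied to the emitted data
on x86-64. The size checks only apply to targets with 8-byte pointers.

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative names only; not copied from the ubsan runtime.
struct SourceLocation {
  const char *Filename;  // char*
  std::uint32_t Line;    // u32
  std::uint32_t Column;  // u32
};

// alignas(16) models the 16B alignment of the emitted data on x86-64.
struct alignas(16) TypeMismatchData {
  SourceLocation Loc;
  const void *Type;          // TypeDescriptor ref (pointer-sized)
  std::uintptr_t Alignment;  // uptr
  unsigned char Kind;        // the trailing unsigned char
};

// Sum of the field sizes, ignoring all padding.
constexpr std::size_t kFieldBytes =
    sizeof(const char *) + 2 * sizeof(std::uint32_t) +
    sizeof(const void *) + sizeof(std::uintptr_t) + sizeof(unsigned char);

constexpr bool k64Bit = sizeof(void *) == 8;
static_assert(!k64Bit || kFieldBytes == 33, "8+2*4+8+8+1");
static_assert(!k64Bit || sizeof(TypeMismatchData) == 48,
              "33B of fields rounded up to a multiple of 16B");
```

Only 33 of the 48 bytes carry information, so any change that brings
the fields down to 32 bytes or fewer drops a whole 16B step.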


I also have a (parametrizable) patch for minimizing ubsan source
location data, which would apply to all checks. With string merging,
though, we end up not saving that much in the size of the
cstring/rodata sections. This patch allows the user to drop the first
N path components, or to keep only the last N path components, from
the source locations emitted by clang.

Before:
clang: ~1.28 MiB
game:  ~10.75 MiB

After 0002-*.patch (percentages are relative to the size before):
clang: ~1.19 MiB (~93%)
game:  ~8.46 MiB (~79%)
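
As an illustration of the two stripping modes, here is a small C++
sketch. It is not the code from 0002-*.patch: the helper name
stripPathComponents is hypothetical, the sign convention (positive N
drops leading components, negative N keeps trailing ones) follows the
option described in this message, and the handling of edge cases
(always keeping at least the file name, '/' as the only separator) is
my assumption.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper, for illustration only.
//   n > 0: drop the first n path components (never the file name itself).
//   n < 0: keep only the last -n path components.
//   n == 0: leave the path untouched.
constexpr std::string_view stripPathComponents(std::string_view path, int n) {
  constexpr char kSep = '/';
  if (n > 0) {
    for (int i = 0; i < n; ++i) {
      std::size_t first = path.find_first_not_of(kSep);
      if (first == std::string_view::npos) break;  // nothing but separators
      std::size_t next = path.find(kSep, first);
      if (next == std::string_view::npos) break;   // only the file name left
      path.remove_prefix(next + 1);
    }
    return path;
  }
  // Walk backwards over one separator per component to keep.
  std::size_t pos = path.size();
  for (int i = 0; i > n; --i) {
    if (pos == 0) return path;  // fewer components than requested
    std::size_t prev = path.rfind(kSep, pos - 1);
    if (prev == std::string_view::npos) return path;
    pos = prev;
  }
  return n < 0 ? path.substr(pos + 1) : path;
}

static_assert(stripPathComponents("/home/user/src/lib/foo.cpp", 2) ==
              std::string_view("src/lib/foo.cpp"));
static_assert(stripPathComponents("/home/user/src/lib/foo.cpp", -2) ==
              std::string_view("lib/foo.cpp"));
```

Shorter strings only help where they are not already shared, which fits
the modest savings above: string merging means each distinct file name
is stored once no matter how many checks refer to it.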



Summing up:
I’m proposing two patches; one saves a lot more than the other in the
cases I’ve seen:
  - Make static check data for type_mismatch 7 bytes smaller. This
check is *by far* the most emitted one, and its static data (per check
handler call) is 48B with padding. The minimised version is 32B with
padding. (0003-*.patch)
  - Add -fstrip-path-components-from-checks=N (do bike-shed…), which
tells ubsan to strip the first N path components when emitting
filename information (or to emit only the last N, if N is negative).
(0002-*.patch)
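
For the first item, one way the 7 bytes could be recovered is sketched
below in C++. This is an assumption for illustration and not a
description of 0003-*.patch: it replaces the uptr-sized Alignment with
a single byte holding log2 of the alignment, which works because
alignments are powers of two. The names are hypothetical, and the size
check only applies to targets with 8-byte pointers.

```cpp
#include <cstddef>
#include <cstdint>

// ASSUMPTION: the saving comes from storing log2(alignment) in one byte
// instead of the alignment itself in a uptr (8 bytes -> 1 byte).
struct alignas(16) SmallTypeMismatchData {
  const char *Filename;        // SourceLocation: char*
  std::uint32_t Line;          // SourceLocation: u32
  std::uint32_t Column;        // SourceLocation: u32
  const void *Type;            // TypeDescriptor ref (pointer-sized)
  unsigned char LogAlignment;  // 1 byte instead of 8
  unsigned char Kind;          // the trailing unsigned char
};

// Floor of log2(alignment); exact for power-of-two alignments.
constexpr unsigned char logAlignment(std::uintptr_t alignment) {
  unsigned char log = 0;
  while (alignment > 1) {
    alignment >>= 1;
    ++log;
  }
  return log;
}

constexpr bool k64Bit = sizeof(void *) == 8;
static_assert(!k64Bit || sizeof(SmallTypeMismatchData) == 32,
              "26B of fields rounded up to a multiple of 16B");
static_assert(logAlignment(16) == 4, "16 == 1 << 4");
```

Going from 33B to 26B of fields crosses the 32B boundary, so each
record shrinks from 48B to 32B, i.e. to 2/3 of its previous size.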

The data-minimizing patch might not be implementable while keeping
library compatibility, that is, being able to use objects compiled
with an older version of clang with a more recent ubsan library (we
might be able to come up with a heuristic, though). It does yield 1/3
savings in type_mismatch static data (not accounting for file names or
type descriptors).
The file-name-minimizing patch is simple enough, and wouldn’t require
any changes to the sanitizer libraries.

I know these patches are missing tests; this email is an RFC to find
out whether there is interest in having this upstream. Proper patches
(updated with comments and tests) will follow depending on the
response.

Thank you,

  Filipe
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-ubsan-cfi-Emit-static-check-data-to-ubsan_check_data.patch
Type: application/octet-stream
Size: 1675 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20160426/104226ae/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-ubsan-Add-fstrip-path-components-from-checks.patch
Type: application/octet-stream
Size: 4462 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20160426/104226ae/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0003-ubsan-Minimize-size-of-data-for-type_mismatch.patch
Type: application/octet-stream
Size: 1103 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20160426/104226ae/attachment-0002.obj>

