[PATCH] D86694: [scudo] Allow -fsanitize=scudo on Linux and Windows (WIP, don't land as is)
Russell Gallop via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Sep 11 09:15:24 PDT 2020
russell.gallop updated this revision to Diff 291235.
russell.gallop edited the summary of this revision.
russell.gallop added a comment.
Herald added subscribers: phosek, hiraditya.
Fix up scudo (sanitizer based) to work on Windows.
This makes use of the CRT alloc hooks from D71786 <https://reviews.llvm.org/D71786>.
To build with scudo on Windows use: -DLLVM_INTEGRATED_CRT_ALLOC=<llvm-project>/stage1/lib/clang/12.0.0/lib/windows/clang_rt.scudo-x86_64.lib -DLLVM_USE_CRT_RELEASE=MT
-DLLVM_USE_SANITIZER=Scudo is supported in this patch, but isn't required. @cryptoad, on Linux, does this do anything other than add libraries when clang is used to drive the linker?
Limitations:
- This is not using hardware CRC32.
- This just hooks in the C scudo library, not the cxx library.
I evaluated this on a 3-stage LLVM build on Windows 10 2004 (in a VS2019 16.7.3 environment) on a 6-core i7-8700K.
- stage1 builds the scudo sanitizer on Windows: requires -DLLVM_ENABLE_PROJECTS=clang;lld;compiler-rt
- stage2: built with -DCMAKE_C_COMPILER="<stage1>/bin/clang-cl.exe" -DCMAKE_CXX_COMPILER="<stage1>/bin/clang-cl.exe" -DCMAKE_LINKER="<stage1>/bin/lld-link.exe" -DLLVM_USE_CRT_RELEASE=MT -DCMAKE_BUILD_TYPE=Release
- stage2_scudo: as stage2 plus -DLLVM_INTEGRATED_CRT_ALLOC=<stage1>/lib/clang/12.0.0/lib/windows/clang_rt.scudo-x86_64.lib
Then evaluated linking clang with ThinLTO:
- stage3: -DCMAKE_C_COMPILER="<stage2>/bin/clang-cl.exe" -DCMAKE_CXX_COMPILER="<stage2>/bin/clang-cl.exe" -DCMAKE_LINKER="<stage2>/bin/lld-link.exe" -DLLVM_USE_CRT_RELEASE=MT -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_LTO=Thin
- stage3_scudo: -DCMAKE_C_COMPILER="<stage2_scudo>/bin/clang-cl.exe" -DCMAKE_CXX_COMPILER="<stage2_scudo>/bin/clang-cl.exe" -DCMAKE_LINKER="<stage2_scudo>/bin/lld-link.exe" -DLLVM_USE_CRT_RELEASE=MT -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_LTO=Thin
I set /threads:12 and removed /lldltocache.
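The stage configurations above differ only in which previous stage supplies the tools, whether the scudo library is integrated, and whether ThinLTO is enabled. As a rough sketch, that structure can be written down as follows. The helper name `stage_flags` is hypothetical and the `<stage1>`-style paths are the same placeholders used above, not real paths:

```python
# Sketch only: assembles the -D flags quoted in the stage descriptions above.
# The helper name and the placeholder paths are hypothetical.

def stage_flags(prev_stage, scudo_lib=None, thin_lto=False):
    """Return the CMake -D flags for a stage built with the previous stage's tools."""
    flags = [
        f'-DCMAKE_C_COMPILER="{prev_stage}/bin/clang-cl.exe"',
        f'-DCMAKE_CXX_COMPILER="{prev_stage}/bin/clang-cl.exe"',
        f'-DCMAKE_LINKER="{prev_stage}/bin/lld-link.exe"',
        "-DLLVM_USE_CRT_RELEASE=MT",
        "-DCMAKE_BUILD_TYPE=Release",
    ]
    if scudo_lib is not None:
        # Only stage2_scudo links the scudo runtime in as the CRT allocator.
        flags.append(f"-DLLVM_INTEGRATED_CRT_ALLOC={scudo_lib}")
    if thin_lto:
        flags.append("-DLLVM_ENABLE_LTO=Thin")
    return flags

SCUDO_LIB = "<stage1>/lib/clang/12.0.0/lib/windows/clang_rt.scudo-x86_64.lib"

stages = {
    "stage2": stage_flags("<stage1>"),
    "stage2_scudo": stage_flags("<stage1>", scudo_lib=SCUDO_LIB),
    "stage3": stage_flags("<stage2>", thin_lto=True),
    "stage3_scudo": stage_flags("<stage2_scudo>", thin_lto=True),
}

for name, flags in stages.items():
    print(name + ": " + " ".join(flags))
```

Note that stage3 and stage3_scudo are configured identically apart from the toolchain they point at, so the benchmark compares the two lld-link binaries rather than two differently configured links.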
Without SCUDO_OPTIONS set, scudo seems to be about 25% slower.
>"hyperfine.exe" -m 3 -w 1 "cd stage3\repro && f:\git\llvm-project\stage2\bin\lld-link @response.txt" "cd stage3_scudo\repro && f:\git\llvm-project\stage2_scudo\bin\lld-link @response.txt"
Benchmark #1: cd stage3\repro && f:\git\llvm-project\stage2\bin\lld-link @response.txt
Time (mean ± σ): 268.209 s ± 4.966 s [User: 18.1 ms, System: 6.6 ms]
Range (min … max): 263.223 s … 273.155 s 3 runs
Benchmark #2: cd stage3_scudo\repro && f:\git\llvm-project\stage2_scudo\bin\lld-link @response.txt
Time (mean ± σ): 334.312 s ± 4.002 s [User: 2.4 ms, System: 14.6 ms]
Range (min … max): 329.889 s … 337.683 s 3 runs
Summary
'cd stage3\repro && f:\git\llvm-project\stage2\bin\lld-link @response.txt' ran
1.25 ± 0.03 times faster than 'cd stage3_scudo\repro && f:\git\llvm-project\stage2_scudo\bin\lld-link @response.txt'
With scudo options set to disable quarantine and mismatch checking, it seems to be about 8% slower.
>set SCUDO_OPTIONS=allocator_release_to_os_interval_ms=-1:QuarantineSizeKb=0:ThreadLocalQuarantineSizeKb=0:DeleteSizeMismatch=0:DeallocationTypeMismatch=0
>"hyperfine.exe" -m 3 -w 1 "cd stage3\repro && f:\git\llvm-project\stage2\bin\lld-link @response.txt" "cd stage3_scudo\repro && f:\git\llvm-project\stage2_scudo\bin\lld-link @response.txt"
Benchmark #1: cd stage3\repro && f:\git\llvm-project\stage2\bin\lld-link @response.txt
Time (mean ± σ): 273.772 s ± 3.624 s [User: 1.3 ms, System: 8.6 ms]
Range (min … max): 269.806 s … 276.909 s 3 runs
Benchmark #2: cd stage3_scudo\repro && f:\git\llvm-project\stage2_scudo\bin\lld-link @response.txt
Time (mean ± σ): 296.593 s ± 2.362 s [User: 1.3 ms, System: 13.9 ms]
Range (min … max): 293.917 s … 298.391 s 3 runs
Summary
'cd stage3\repro && f:\git\llvm-project\stage2\bin\lld-link @response.txt' ran
1.08 ± 0.02 times faster than 'cd stage3_scudo\repro && f:\git\llvm-project\stage2_scudo\bin\lld-link @response.txt'
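For reference, the "about 25%" and "about 8%" figures follow directly from the mean times hyperfine reported. A minimal sketch of the arithmetic, using only the means from the output above (the function name is just for illustration):

```python
# Mean wall-clock times in seconds, copied from the hyperfine output above:
# (link with default allocator, link with scudo).
runs = {
    "default SCUDO_OPTIONS": (268.209, 334.312),
    "quarantine and mismatch checks disabled": (273.772, 296.593),
}

def slowdown_percent(baseline_s, scudo_s):
    """Relative slowdown of the scudo run versus the baseline, in percent."""
    return (scudo_s / baseline_s - 1.0) * 100.0

for label, (baseline_s, scudo_s) in runs.items():
    print(f"{label}: {slowdown_percent(baseline_s, scudo_s):.1f}% slower")
```

With only 3 runs per configuration these are rough figures, and the baseline itself moved by about 2% between the two invocations (268.2 s versus 273.8 s).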
It's worth noting that the run without scudo was not using all of the CPU, so it is still hitting some locking issues, although utilization was still over 90%. Both scudo runs were pegged at 100% CPU until near the end of the link. From this it appears that scudo's locking behaviour is better than the default allocator's and that it would win out on a wide enough processor. The loss in straight-line performance is potentially due to the CRC calculations (which are not using hardware CRC32).
@cryptoad, are those the best scudo flags to evaluate for performance?
I'm looking at porting scudo standalone to Windows.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D86694/new/
https://reviews.llvm.org/D86694
Files:
clang/lib/Driver/ToolChains/MSVC.cpp
compiler-rt/cmake/config-ix.cmake
compiler-rt/lib/sanitizer_common/sanitizer_win.cpp
compiler-rt/lib/scudo/CMakeLists.txt
compiler-rt/lib/scudo/scudo_allocator.cpp
compiler-rt/lib/scudo/scudo_crc32.cpp
compiler-rt/lib/scudo/scudo_new_delete.cpp
compiler-rt/lib/scudo/scudo_platform.h
compiler-rt/lib/scudo/scudo_tsd.h
compiler-rt/lib/scudo/scudo_tsd_shared.cpp
compiler-rt/lib/scudo/scudo_tsd_shared.inc
compiler-rt/test/sanitizer_common/CMakeLists.txt
compiler-rt/test/scudo/interface.cpp
compiler-rt/test/scudo/lit.cfg.py
compiler-rt/test/scudo/malloc.cpp
compiler-rt/test/scudo/memalign.c
compiler-rt/test/scudo/mismatch.cpp
compiler-rt/test/scudo/overflow.c
compiler-rt/test/scudo/preload.cpp
compiler-rt/test/scudo/rss.c
compiler-rt/test/scudo/secondary.c
compiler-rt/test/scudo/threads.c
compiler-rt/test/scudo/tsd_destruction.c
compiler-rt/test/scudo/valloc.c
llvm/CMakeLists.txt
llvm/cmake/modules/HandleLLVMOptions.cmake
llvm/lib/Support/CMakeLists.txt
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D86694.291235.patch
Type: text/x-patch
Size: 24301 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20200911/c9ef0ec9/attachment.bin>