[compiler-rt] [scudo] Enable "Delayed release to OS" feature for Android (PR #65942)

Mon Nov 13 07:23:52 PST 2023

hctim wrote:

So I had a look at measuring the results of a 5-second release-to-OS delay myself, using the following patch to `external/scudo`, which I believe should affect the Android config:

```

diff --git a/standalone/allocator_config.h b/standalone/allocator_config.h
index 44c1ac5..e37f917 100644
--- a/standalone/allocator_config.h
+++ b/standalone/allocator_config.h
@@ -186,8 +186,8 @@ struct AndroidConfig {
       static const u32 QuarantineSize = 32U;
       static const u32 DefaultMaxEntriesCount = 32U;
       static const uptr DefaultMaxEntrySize = 2UL << 20;
-      static const s32 MinReleaseToOsIntervalMs = 0;
-      static const s32 MaxReleaseToOsIntervalMs = 1000;
+      static const s32 MinReleaseToOsIntervalMs = 5000;
+      static const s32 MaxReleaseToOsIntervalMs = 5000;
     };
     template <typename Config> using CacheT = MapAllocatorCache<Config>;
   };
diff --git a/standalone/secondary.h b/standalone/secondary.h
index c89e6a9..60bb343 100644
--- a/standalone/secondary.h
+++ b/standalone/secondary.h
@@ -463,7 +463,8 @@ private:
   atomic_uptr MaxEntrySize = {};
   u64 OldestTime GUARDED_BY(Mutex) = 0;
   u32 IsFullEvents GUARDED_BY(Mutex) = 0;
-  atomic_s32 ReleaseToOsIntervalMs = {};
+  atomic_s32 ReleaseToOsIntervalMs =
+    {.ValDoNotUse = CacheConfig::MinReleaseToOsIntervalMs };
   u32 CallsToRetrieve GUARDED_BY(Mutex) = 0;
   u32 SuccessfulRetrieves GUARDED_BY(Mutex) = 0;
```
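
For readers unfamiliar with what the interval controls, here is a minimal sketch of the idea, not scudo's actual code: a freed block sits in the secondary cache and its pages are only handed back to the OS once it has been idle for at least the configured interval. The `CachedEntry` type and the `releaseOlderThan` helper are hypothetical names made up for this illustration, and treating a negative interval as "never release" is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of a cached free block; not scudo's real type.
struct CachedEntry {
  uint64_t FreedAtMs; // time at which the block entered the cache
  bool Released;      // true once its pages have been handed back to the OS
};

// Marks as released every entry that has been cached for at least IntervalMs
// and returns how many entries this call released.
// Assumption of this sketch: a negative interval means "never release".
inline int releaseOlderThan(std::vector<CachedEntry> &Cache, uint64_t NowMs,
                            int32_t IntervalMs) {
  if (IntervalMs < 0)
    return 0;
  int Count = 0;
  for (CachedEntry &E : Cache) {
    assert(NowMs >= E.FreedAtMs && "clock must not run backwards");
    if (E.Released)
      continue;
    if (NowMs - E.FreedAtMs >= static_cast<uint64_t>(IntervalMs)) {
      E.Released = true;
      ++Count;
    }
  }
  return Count;
}
```

With an interval of 0 every cached entry is eligible immediately; with 5000, as in the patch above, only entries that have been idle for five seconds or more are.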

Then I measured with Geekbench 5, using `LD_LIBRARY_PATH` to point to the newer libc.so. I saw no significant difference (except for maybe a speedup in the single-threaded run on the midcore):

```
                  small   medium  big
single off.log     0.23%   0.68%  -0.02%
single async.log   0.21%   0.41%  -0.08%
single sync.log    0.27%   0.34%   0.10%
multi  off.log    -0.03%  -0.12%  -0.02%
multi  async.log   0.07%  -0.24%  -0.12%
multi  sync.log   -0.16%   0.00%  -0.40%
```
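
As a rough illustration of how a percentage like the ones above could be derived, here is a minimal sketch. It assumes (this is not stated in the comment) that each cell is the relative change in mean score between a baseline build and the patched build, with positive values meaning the patched build scored higher. The `mean` and `percentChange` helpers are hypothetical.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Arithmetic mean of a non-empty list of per-run benchmark scores.
inline double mean(const std::vector<double> &Scores) {
  assert(!Scores.empty());
  return std::accumulate(Scores.begin(), Scores.end(), 0.0) /
         static_cast<double>(Scores.size());
}

// Relative change, in percent, of the patched mean over the baseline mean.
// Positive means the patched build scored higher (assumed convention).
inline double percentChange(const std::vector<double> &Baseline,
                            const std::vector<double> &Patched) {
  const double Base = mean(Baseline);
  assert(Base != 0.0);
  return (mean(Patched) - Base) / Base * 100.0;
}
```

Averaging over many runs before taking the ratio matters here, because differences of a few tenths of a percent are easily within run-to-run noise.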

What's even more interesting is that disabling the secondary cache entirely produced some pretty significant speedups, but only on the midcore. Note that this is across only two runs (rather than the 31 runs behind the larger data set above).

```
   struct Secondary {
     struct Cache {
-      static const u32 EntriesArraySize = 256U;
-      static const u32 QuarantineSize = 32U;
-      static const u32 DefaultMaxEntriesCount = 32U;
+      static const u32 EntriesArraySize = 0U;
+      static const u32 QuarantineSize = 0U;
+      static const u32 DefaultMaxEntriesCount = 0U;
       static const uptr DefaultMaxEntrySize = 2UL << 20;
```

```
                   small    medium   big
single  off.log    0.76%    2.40%    1.00%
single  async.log  0.26%    1.85%   -0.98%
single  sync.log   0.85%    1.76%    0.64%
multi   off.log   -0.54%    0.88%   -0.69%
multi   async.log  0.13%    1.42%    0.06%
multi   sync.log   0.71%    1.14%   -0.86%
```

https://github.com/llvm/llvm-project/pull/65942