[libcxx-commits] [PATCH] D87881: [libunwind] Optimize dl_iterate_phdr's findUnwindSectionsByPhdr

Thu Sep 17 21:31:41 PDT 2020

rprichard added a comment.

I uploaded the Android benchmark I used: https://gist.github.com/rprichard/1688de36f0132a40294adb0409189533.

Results on a Pixel 3 (blueline) device running Android Q, with `taskset 10`:

Before:

  ninja: Entering directory `out'
  [102/102] /x/android-ndk-r21d/toolchains/llvm/prebuilt/linux-x86_64/...so libbench_098.so libbench_099.so libbench_100.so libbench_final.so
  out/: 105 files pushed, 0 skipped. 44.9 MB/s (2362029 bytes in 0.050s)
  /x/multitime-android/multitime: 1 file pushed, 0 skipped. 226.5 MB/s (123912 bytes in 0.001s)
  ===> /data/local/tmp/multitime results
  1: /data/local/tmp/out/main
              Mean                 Std.Dev.    Min         Median      Max
  real        0.4897+/-0.04305     0.1671      0.1677      0.4727      0.7813      
  user        0.4815+/-0.04297     0.1668      0.1533      0.4617      0.7733      
  sys         0.0048+/-0.00074     0.0029      0.0000      0.0033      0.0133      

After:

  ninja: Entering directory `out'
  [102/102] /x/android-ndk-r21d/toolchains/llvm/prebuilt/linux-x86_64/...so libbench_098.so libbench_099.so libbench_100.so libbench_final.so
  out/: 105 files pushed, 0 skipped. 46.3 MB/s (2362029 bytes in 0.049s)
  /x/multitime-android/multitime: 1 file pushed, 0 skipped. 247.7 MB/s (123912 bytes in 0.000s)
  ===> /data/local/tmp/multitime results
  1: /data/local/tmp/out/main
              Mean                 Std.Dev.    Min         Median      Max
  real        0.3254+/-0.02297     0.0892      0.1630      0.3386      0.4572      
  user        0.3174+/-0.02284     0.0887      0.1567      0.3333      0.4500      
  sys         0.0051+/-0.00067     0.0026      0.0000      0.0067      0.0133      

The large variation between the min and max run-times comes from the `cbdata->targetAddr < pinfo->dlpi_addr` optimization in `findUnwindSectionsByPhdr`. On Bionic, when the dynamic loader loads a group of DSOs all at once, it randomizes the order in which the files are mapped into memory, so the relative order of the DSOs in the address space varies from one run to the next. In other respects, the order of the libraries is deterministic -- e.g. the order in which constructors are called, and the order in which `dl_iterate_phdr` iterates over the DSOs.
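To illustrate why the address-space layout matters, here is a minimal, portable sketch of that kind of early-reject check. It is not the libunwind code: `Module`, `Segment`, `Lookup`, `visitModule`, and `findModule` are hypothetical stand-ins for `dl_phdr_info`, its `PT_LOAD` headers, the callback data, the callback, and `dl_iterate_phdr` itself. The iteration order is fixed, but the number of segments scanned depends on where each module's base address lands relative to the target address.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one PT_LOAD program header.
struct Segment {
  std::uintptr_t offset;  // offset of the segment from the module base
  std::size_t size;       // size of the segment in bytes
};

// Hypothetical stand-in for the parts of dl_phdr_info used here.
struct Module {
  std::uintptr_t base;            // plays the role of dlpi_addr
  std::vector<Segment> segments;  // plays the role of dlpi_phdr[]
};

// Hypothetical stand-in for the callback data.
struct Lookup {
  std::uintptr_t targetAddr = 0;
  std::size_t segmentsScanned = 0;  // work counter, for illustration only
  const Module *found = nullptr;
};

// Returns true when the target address was found, which stops the iteration.
bool visitModule(const Module &m, Lookup &cb) {
  // Early reject: a module mapped above the target cannot contain it,
  // so none of its segments need to be examined.
  if (cb.targetAddr < m.base)
    return false;
  for (const Segment &s : m.segments) {
    ++cb.segmentsScanned;
    const std::uintptr_t begin = m.base + s.offset;
    if (cb.targetAddr >= begin && cb.targetAddr < begin + s.size) {
      cb.found = &m;
      return true;
    }
  }
  return false;
}

// Visits the modules in a fixed order, like dl_iterate_phdr does.
Lookup findModule(const std::vector<Module> &modules, std::uintptr_t addr) {
  Lookup cb;
  cb.targetAddr = addr;
  for (const Module &m : modules)
    if (visitModule(m, cb))
      break;
  return cb;
}
```

In this sketch, if the modules visited before the target happen to be mapped above it, they are all skipped by the early check; if they are mapped below it, every one of their segments is scanned first. That is the same kind of run-to-run spread seen in the min and max columns above.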

FWIW: An Android NDK user did complain about the performance of EH with many libraries: https://github.com/android/ndk/issues/1062.

I'm still interested in making the `FrameHeaderCache` actually work on Android, but for now an NDK app can't use the cache because the `dlpi_adds`/`dlpi_subs` fields aren't present in most Android versions.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D87881