[PATCH] D77632: [TLI] Per-function fveclib for math library used for vectorization

Mon Apr 13 03:12:21 PDT 2020

nikic added a comment.

I gave D77952 <https://reviews.llvm.org/D77952> a try (on top of this one), but didn't see a significant improvement from that change.

Looking at the callgrind output for compilation of a **small** file, I see 52M total instructions, 4 calls to TLII initialization, where addition of the vector functions takes up the majority of the time, at 0.7M. Most of the cost is in the sorting. 2 of the initialization calls are default-constructed TLII without target triple, which seems suspect to me (are we not adding TLI early enough, and something pulls it in via analysis dependency?)

So for small files, just registering the vector functions does make up a non-trivial fraction of time, and lazy initialization might make sense. This isn't the whole truth though: While the largest regressions are indeed on small files, there are also quite a few > 1% regressions on very large files.
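To make the lazy-initialization idea concrete, here is a minimal sketch of deferring the sort until the first lookup. It is a standalone model, not the TLII code: the class `LazyVecFuncTable`, its members, and the mapping names in the usage note are all hypothetical and chosen only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a scalar-to-vector function mapping table.
// This is NOT the TargetLibraryInfoImpl API, only a model of the idea.
class LazyVecFuncTable {
public:
  // Registration only appends; no sorting is done at this point.
  void add(std::string Scalar, std::string Vector) {
    Entries.emplace_back(std::move(Scalar), std::move(Vector));
    Sorted = false;
  }

  // Returns the vector name for Scalar, or an empty string if unknown.
  // The sort cost is paid on the first lookup after a registration.
  std::string lookup(const std::string &Scalar) {
    ensureSorted();
    auto It = std::lower_bound(
        Entries.begin(), Entries.end(), Scalar,
        [](const Entry &E, const std::string &Key) { return E.first < Key; });
    if (It != Entries.end() && It->first == Scalar)
      return It->second;
    return {};
  }

  bool isSorted() const { return Sorted; }

private:
  using Entry = std::pair<std::string, std::string>;

  void ensureSorted() {
    if (Sorted)
      return;
    std::sort(Entries.begin(), Entries.end());
    assert(std::is_sorted(Entries.begin(), Entries.end()));
    Sorted = true;
  }

  std::vector<Entry> Entries;
  bool Sorted = true; // An empty table is trivially sorted.
};
```

With this shape, a compilation that never queries the table (for example, one where no vectorization runs) pays only for the appends: after `T.add("sinf", "vsinf")` the table stays unsorted until the first `T.lookup(...)`.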

For a mid-size file with ~6000M instructions retired, the main difference I see is `TargetLibraryAnalysis::run()` going up from 82M to 126M, with the cost coming from the extra `getFnAttribute("veclib")` call in the TargetLibraryInfo constructor. Fetching attributes is surprisingly expensive, as it performs an iteration over all attributes internally. As this code is iterating over all attributes anyway in order to handle `no-builtin-*`, it might make sense to move the checks for `"veclib"` and `"no-builtins"` into that loop as well, which should make them essentially free.
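A rough sketch of what folding the checks into one loop could look like. It models function attributes as plain key/value string pairs instead of using the real LLVM attribute classes, so `AttrList`, `ScanResult`, and `scanAttributes` are hypothetical names, and the non-empty-key assertion is an assumption of this model only.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of a function's string attributes as (key, value) pairs.
using AttrList = std::vector<std::pair<std::string, std::string>>;

struct ScanResult {
  std::string VecLib;                     // Value of "veclib", if present.
  bool NoBuiltins = false;                // True if "no-builtins" was seen.
  std::vector<std::string> DisabledFuncs; // Names taken from "no-builtin-<name>".
};

// Collects everything in a single pass, instead of one pass for the
// "no-builtin-*" keys plus a separate lookup per named attribute.
ScanResult scanAttributes(const AttrList &Attrs) {
  static const std::string Prefix = "no-builtin-";
  ScanResult R;
  for (const auto &A : Attrs) {
    const std::string &Key = A.first;
    assert(!Key.empty() && "model assumes non-empty attribute keys");
    if (Key == "veclib")
      R.VecLib = A.second;
    else if (Key == "no-builtins")
      R.NoBuiltins = true;
    else if (Key.compare(0, Prefix.size(), Prefix) == 0)
      R.DisabledFuncs.push_back(Key.substr(Prefix.size()));
  }
  return R;
}
```

The point of the sketch is only the cost shape: each attribute is visited once, and the `"veclib"` and `"no-builtins"` checks become extra string comparisons inside a loop that has to run anyway.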

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77632/new/

https://reviews.llvm.org/D77632