[PATCH] D101553: [Loads] Ignore type test assume sequences inserted for devirtualization

Teresa Johnson via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Apr 29 20:54:49 PDT 2021


tejohnson added a comment.

In D101553#2726674 <https://reviews.llvm.org/D101553#2726674>, @nikic wrote:

> Not a big fan of this -- this is effectively whitelisting a specific pattern, and doesn't generalize.
>
> For this code, there are two limits of interest: The total number of instructions we look at, and the number of alias-analysis queries we perform. Currently we limit both through a small number of total instructions. However, the only expensive part, and what needs to be aggressively limited, is the number of alias analysis queries.

From what I can tell there are roughly three categories of instructions, in increasing order of expense (sketched below):

1. Insts that are not loads or stores and don't write memory - these can be skipped immediately
2. Loads and stores, which are compared against the given pointer by pointer value, not with AA (in getAvailableLoadStore)
3. (Subset of category 2) Instructions that may write memory and are checked using AA queries
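
To make the split concrete, here is a minimal sketch of how a backwards scan over a block might separate the three categories. The function name classifyScan and its structure are illustrative assumptions only, not the actual findAvailablePtrLoadStore code:

  // Illustrative sketch only; names and structure are hypothetical,
  // not the actual findAvailablePtrLoadStore implementation.
  #include "llvm/ADT/STLExtras.h"
  #include "llvm/Analysis/AliasAnalysis.h"
  #include "llvm/IR/BasicBlock.h"
  #include "llvm/IR/Instructions.h"
  using namespace llvm;

  static void classifyScan(BasicBlock &BB, Value *Ptr, BatchAAResults &AA) {
    for (Instruction &Inst : reverse(BB)) {
      // Category 1: not a load/store and cannot write memory -- skippable
      // with no per-instruction analysis at all.
      if (!isa<LoadInst>(&Inst) && !isa<StoreInst>(&Inst) &&
          !Inst.mayWriteToMemory())
        continue;

      // Category 2: a load or store whose pointer operand is compared
      // against Ptr by value (the getAvailableLoadStore-style check).
      if (auto *LI = dyn_cast<LoadInst>(&Inst)) {
        if (LI->getPointerOperand() == Ptr) { /* candidate hit */ }
      } else if (auto *SI = dyn_cast<StoreInst>(&Inst)) {
        if (SI->getPointerOperand() == Ptr) { /* candidate hit */ }
      }

      // Category 3: may write memory, so an AA query (e.g. getModRefInfo)
      // is needed to prove it doesn't clobber Ptr -- the expensive part
      // that really needs to be bounded.
      if (Inst.mayWriteToMemory()) { /* AA.getModRefInfo(&Inst, ...) */ }
    }
  }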

> I think what we can do here is to have a relatively liberal upper limit on the number of scanned instructions (say 32), while having a tight limit on the number of AA queries or AA query candidates (say 4). This should make the code more resilient against additional bitcast/assume instructions in between, while still limiting compile-time impact.
>
> Do you think something along those lines would address your motivation here?

It would help, but with -fwhole-program-vtables enabled for the optimization build we could still end up with corner cases where IR profile matching fails because the simplification optimizations that run beforehand behave differently. Although with a larger window the chances of that would presumably be reduced.

One thing I am wondering is whether the first category above (not a load/store and doesn't write memory) needs any limit at all, since those instructions can be skipped immediately (i.e. skipping all of them, rather than just dbg or pseudo insts as the code does now). I wasn't sure how expensive it would be to quickly skip through all of them in the block while scanning for memory accesses, so I did some measurements for a large binary build with ~20k input files. The change is essentially to convert:

  if (isa<DbgInfoIntrinsic>(Inst))
    continue;

to

  if (!isa<LoadInst>(&Inst) && !isa<StoreInst>(&Inst) && !Inst.mayWriteToMemory())
    continue;

With that change, across all the input files there are about 40% more iterations of the loop in findAvailablePtrLoadStore and about 47% more in FindAvailableLoadedValue. I kept the existing default scan limit of 6 for the load/store/may-write instructions, so the extra iterations were largely just skipping through without any analysis. I didn't compare the compile times of every individual file in detail, but the slowest-compiling files did not get slower, and neither did the overall build.

If removing the limit isn't acceptable, perhaps a larger limit such as 100 (see the sketch below)? Note this limit could either subsume the existing unlimited skipping of dbg and pseudo instructions, or those could continue to be skipped unconditionally as they are now. With the current unconditional skipping of all dbg and pseudo instructions, the maximum iteration count of this loop is ~1K, so if we include those instructions in the new limit it will have a potentially negative effect on optimization over the status quo.
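
As a rough illustration of that option, here is a hypothetical two-counter variant. The skipped-instruction budget of 100 and the name MaxSkipped are assumptions for the sketch, not part of the patch; the load/store/may-write limit stays at the existing default of 6:

  // Hypothetical two-counter scan budget; the names and the limit of 100
  // are illustrative assumptions, not the actual patch.
  #include "llvm/ADT/STLExtras.h"
  #include "llvm/IR/BasicBlock.h"
  #include "llvm/IR/Instructions.h"
  using namespace llvm;

  static Value *scanWithTwoLimits(BasicBlock &BB, Value *Ptr) {
    unsigned MaxInstsToScan = 6; // existing-style limit for loads/stores/writers
    unsigned MaxSkipped = 100;   // looser limit for instructions we only skip
    for (Instruction &Inst : reverse(BB)) {
      if (!isa<LoadInst>(&Inst) && !isa<StoreInst>(&Inst) &&
          !Inst.mayWriteToMemory()) {
        if (MaxSkipped-- == 0)
          return nullptr;        // too many skipped instructions, give up
        continue;                // does not consume MaxInstsToScan
      }
      if (MaxInstsToScan-- == 0)
        return nullptr;          // hit the load/store/writer budget, give up
      // ... pointer comparison against Ptr and AA queries as today ...
    }
    return nullptr;
  }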


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D101553/new/

https://reviews.llvm.org/D101553


