[llvm] [TailDuplicator] Add a limit on the number of indirect branch successors (PR #78582)

Wed Feb 14 03:16:30 PST 2024

DianQK wrote:

I just remembered that I found a performance issue here. With successive switch statements, tail duplication adds a number of PHI instructions, which seem to end up as stack operation instructions. I'm not sure whether this can be optimized in any other way.

---

> One is whether we're emitting too much code here, i.e. the codesize tradeoff is worthwhile. This might depend on what exactly the user wrote... we might want to be more aggressive if the user explicitly uses indirect gotos. This explains the codesize growth you're seeing, but not really the compile-time; it's a constant number of instructions per switch case.

As I mentioned above, the code size problem is most likely caused by instruction copying and the extra PHI instructions that are added.

> The other is whether the representation we're using is causing non-linear compile-time. The codesize growth should be linear. But if we're representing the edges explicitly, the size of the IR in memory might grow non-linearly, causing compile-time issues.

In fact, the code size in #79993 isn't terribly bad, but the compile-time increase is horrible.

> Even if we're tail-duplicating, we don't necessarily need to explicitly represent all the possible source->destination edges separately. It's just the most convenient thing to do given the way codegen works. But we could do something different: instead of making every indirect jump have an edge to every indirect jump destination, we could make all the jumps jump to a synthetic basic block, and then have edges from that basic block to all the destinations. Same semantics, without the compile-time. (Not suggesting you try to fix this here, just noting it's possible.)

I may indeed not be able to fix this here, but I will follow up to see whether I can actually fix it. (Although I probably won't be able to.)
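
To make the quoted synthetic-block idea concrete, here is a toy model of the edge counts. It is only a sketch and not LLVM code: the CFG is a plain dictionary, and the names `build_explicit_cfg`, `build_hub_cfg`, and the `"hub"` block are made up for illustration. It assumes every indirect jump may reach every destination.

```python
# Toy CFG model (hypothetical, not LLVM's data structures):
# a dict mapping each block name to its list of successor names.

def build_explicit_cfg(jumps, dests):
    """Every indirect jump gets its own edge to every destination."""
    return {j: list(dests) for j in jumps}

def build_hub_cfg(jumps, dests, hub="hub"):
    """Every indirect jump goes to one synthetic block, which fans out."""
    cfg = {j: [hub] for j in jumps}
    cfg[hub] = list(dests)
    return cfg

def count_edges(cfg):
    """Total number of successor edges stored in the CFG."""
    return sum(len(succs) for succs in cfg.values())

def resolved_targets(cfg, jump, hub="hub"):
    """Destinations reachable from a jump, looking through the hub block."""
    out = []
    for succ in cfg[jump]:
        out.extend(cfg[succ] if succ == hub else [succ])
    return out

if __name__ == "__main__":
    # After tail duplication each handler ends in its own indirect jump,
    # so the same blocks act as both jump sources and destinations.
    for n in (10, 100, 1000):
        blocks = [f"case{i}" for i in range(n)]
        explicit = count_edges(build_explicit_cfg(blocks, blocks))
        hub = count_edges(build_hub_cfg(blocks, blocks))
        print(f"n={n}: explicit edges={explicit}, hub edges={hub}")
```

With n jumps and n destinations, the explicit form stores n * n edges while the hub form stores 2 * n, and `resolved_targets` gives the same destination set for both. The model only counts stored edges; it says nothing about the PHI instructions or the duplicated code.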

> Some threshold probably makes sense, but we should make clear whether we're primarily trying to target the codesize, or the compile-time, so we have a good starting point the next time this is revisited.

I'll continue to work on this issue. Even if I submit a workaround, I'll try to explain the various issues I've found. :)

https://github.com/llvm/llvm-project/pull/78582