[llvm] [LoopIdiom] Perform loop versioning to use memcpy (PR #125043)

Fri Jan 31 00:53:00 PST 2025

https://github.com/nikic commented:

A few more high level concerns:

Phase ordering: LIR runs as part of the function simplification pipeline. However, this patch introduces a decanonicalizing, size-increasing transform. This is a problem if you consider the following scenario: You have a function x calling a function copy. LIR runs on copy first and introduces runtime checks. Then it gets inlined. However, if it were directly inlined, AA would have determined that the pointers can't alias, and no runtime checks are necessary.

At that point, we have to hope that the checks get eliminated again. I think it's very unlikely that the checks themselves can be optimized away -- the best that can happen is that the versioned loop gets converted to memcpy as well, we sink both memcpy calls and then find that the control flow can be eliminated. Whether that reliably happens in practice, I don't know.

However, even if this does actually work, there is still the problem that this is going to affect the inlining cost model by introducing a lot of code. If the function doesn't get inlined as a result, the outcome will be very detrimental.

Generally, any transforms that introduce runtime checks / loop versioning should be running in the post-inline module optimization pipeline only.
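
To make the phase-ordering scenario concrete, here is a minimal sketch of the kind of code in question. The function bodies and array sizes are assumptions made up for illustration; only the names x and copy come from the description above.

```c
#include <stddef.h>

/* Hypothetical callee: a plain byte-copy loop. Looking at this function in
   isolation, nothing says dst and src cannot overlap, so converting the loop
   to memcpy here would need a runtime check. */
static void copy(char *dst, const char *src, size_t n) {
  for (size_t i = 0; i < n; ++i)
    dst[i] = src[i];
}

/* Hypothetical caller: a and b are distinct local arrays. If copy is inlined
   here before the loop is transformed, alias analysis can see that the two
   buffers are separate objects, and no runtime check is needed. */
int x(void) {
  char a[16] = "hello";
  char b[16];
  copy(b, a, sizeof a);
  return b[0];
}
```

If the loop in copy is versioned before inlining, the check is carried into x even though it is statically redundant there.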

Profitability: Just based on the tests, the runtime checks introduced look quite expensive, on the order of 20 instructions. I'm concerned that this transform can easily be non-profitable for copy loops that only move a small amount of data.
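
For reference, a rough source-level sketch of what a versioned copy loop looks like. This is an illustration of loop versioning in general, not the code this patch emits; the name copy_versioned and the exact form of the overlap check are assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical versioned form of a byte-copy loop. */
void copy_versioned(char *dst, const char *src, size_t n) {
  uintptr_t d = (uintptr_t)dst, s = (uintptr_t)src;
  /* Runtime check: the two n-byte ranges are disjoint when the distance
     between their start addresses is at least n. */
  uintptr_t dist = d >= s ? d - s : s - d;
  if (dist >= n) {
    /* Fast version: no overlap, so memcpy is legal. */
    memcpy(dst, src, n);
  } else {
    /* Fallback version: the original loop with its original semantics. */
    for (size_t i = 0; i < n; ++i)
      dst[i] = src[i];
  }
}
```

The check is paid on every call regardless of n, which is why it matters for loops that only copy a few bytes.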

https://github.com/llvm/llvm-project/pull/125043