[clang] [KeyInstr] Enable -gkey-instructions by default if optimisations are enabled (PR #149509)

Mon Oct 6 03:48:18 PDT 2025

OCHyams wrote:

Hi Reid, thanks for the info!

> Hi Orlando, I wanted to share that we detected a significant (~0.4%) .text size regression when building an internal binary with sample PGO (AFDO) and ThinLTO. My initial recommendation was to disable the feature with -gno-key-instructions to debug further, but this didn't appear to fully mitigate the regression.

> I looked at the code in this PR, and it has special handling for IR inputs, and I suspect that the reason is that we're using LTO, which has two compilation actions: One with C(++) source inputs, one with IR inputs, nd I assume we need to pass -gno-key-instructions to both compilation steps

The IR input change in this patch just shifts a check from caller to callee; that bit should be NFC.  Unless `-gkey-instructions` was specified on the Clang command line, it won't be passed to `cc1` for LLVM IR input.

I've designed it so that you can build an input with key instructions, and another without, then LTO them together and it should "just work"; scopes keep their key-instructions-ness regardless of inlining decisions. I don't think that's necessarily relevant to this problem you've run into, but thought it worth pointing out.

> to control the backend behavior in llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp. We haven't yet confirmed that this mitigates the regression, but I'm ~66% confident that it will. If you have any theories about the regression, we're interested. :)

There's a hidden off switch `-mllvm -dwarf-use-key-instructions` that tells DwarfDebug to ignore key instructions metadata and use the old/non-key-instructions method to compute line tables. Maybe that will be useful for diagnosing the issue?

> I fully expect that key instructions will disturb the source locations in a way that increases source code drift, making old profiles from old binaries less accurate when rebuilding new binaries with key instructions enabled, so I expect a temporary regression. 

The intention is that key instructions should **not** change any line number associations, only is_stmt placement. i.e., if that's happening it is unexpected and a bug.

It may change the layout of the table - an instruction that previously implicitly got a source location by virtue of sitting in an instruction range between two line table entries may now explicitly get a location. In other words, an instruction address may shift to being referenced in a line table entry where it was not previously and vice versa. That is just an encoding change rather than semantic one though. If this causes changes in PGO results it may be exposing a bug or assumption in the PGO tooling? 

> However, I think we will have to disable the feature internally temporarily to study this interaction further to confirm that the regression goes away after reprofiling. Other sample PGO users may have a similar experience, so I wanted to share ours.

If you're able to share any additional info or reproducers I'm happy to try to help diagnose things.

https://github.com/llvm/llvm-project/pull/149509