[PATCH] D128863: Add switch to use "source_file_name" instead of Module ID for globally promoted local

Tue Jul 19 15:02:24 PDT 2022

void added a comment.

In D128863#3662642 <https://reviews.llvm.org/D128863#3662642>, @tejohnson wrote:

> In D128863#3661323 <https://reviews.llvm.org/D128863#3661323>, @void wrote:
>
>> In D128863#3660913 <https://reviews.llvm.org/D128863#3660913>, @tejohnson wrote:
>>
>>> In D128863#3660850 <https://reviews.llvm.org/D128863#3660850>, @void wrote:
>>>
>>>> In D128863#3660840 <https://reviews.llvm.org/D128863#3660840>, @tejohnson wrote:
>>>>
>>>>> In D128863#3660585 <https://reviews.llvm.org/D128863#3660585>, @void wrote:
>>>>>
>>>>>> In D128863#3659642 <https://reviews.llvm.org/D128863#3659642>, @tejohnson wrote:
>>>>>>
>>>>>>> Ok, I understand the issue. The hash changes due to code changes in a module A that is allowed to change, but this has the side effect of superficially changing the callsite in another module B that isn't allowed to change. Couple concerns below.
>>>>>>>
>>>>>>> Fundamentally, LTO is doing whole program and cross-module optimization. How do you ensure that a source change to module A doesn't actually affect the optimization in module B? E.g. the reason you are getting a call to a local foo from module A in module B in the first place is that module B decided to import the caller of foo. If foo or its caller changed in module A the importing decisions into module B could actually change its code generation. We could decide not to import foo's caller or foo itself might get imported. There are other whole program optimizations that could also affect the code generation in B depending on changes in other modules. How will you handle or prevent these cases?
>>>>>>
>>>>>> The short answer is "we won't be able to prevent this." It's one of the risks of using LTO with live-patching (a.k.a. ksplice, kpatch, klivepatch). The risk itself isn't that it will change other places where it was inlined (that's expected, of course), but that the patch may become much larger, making it more difficult to validate. If the patch remains of reasonable size, then the risk involved in using said patch will be minimal.
>>>>>
>>>>> If you can handle a change in module A causing a change in module B due to different cross-module inlining by making the patch larger (and I guess including both modules), could you handle this case, where there is a rename of a function in module A causing the callsite in module B to be renamed, similarly?
>>>>
>>>> That's not quite the situation.
>>>>
>>>> Let's say I have module `A` and I patch function `foo()` and get module `B`. A local function in `A`, named `bar()`, is promoted during its compilation to `bar.llvm.1234()`. After patching, `B` will also promote `bar()` to a global, but give it a different suffix, say `bar.llvm.abcd()`. A separate tool comes along to determine what's changed between the `A`'s ELF file and `B`'s ELF file. It sees that `bar.llvm.1234()` is now `bar.llvm.abcd()` and claims that's a change. (In the instance that came up, the changed function was called from a section we forbid changes in.)
>>>
>>> Sure, I believe I understand what's happening in your situation. But I think you and I are using the terms "module A" and "module B" differently. Here your module B is the new version of module A after patching. I was referring to 2 modules from totally different source files. What I was thinking of is a case where there are 2 source files A.cpp and B.cpp that become modules A and B:
>>>
>>> A.cpp
>>> static void bar() {...}
>>> void foo() { bar(); }
>>>
>>> B.cpp
>>> void caller() { foo(); }
>>>
>>> I was assuming you had a situation where with ThinLTO we had imported and inlined foo into module B, requiring bar to be promoted in module A, say to bar.llvm.1 (and leading to a call to the promoted bar.llvm.1 in module B). Then if something in A.cpp got patched, let's just say bar itself got patched, then you build a new module A object where bar is promoted with a different suffix, say to bar.llvm.2. And now you have changes to both modules A and B.
>>>
>>> But you could just as easily change the importing decisions into module B due to a change to bar. E.g. if bar was made smaller, we might decide to import and inline it as well into module B. It sounds like you would handle this situation by including the objects for both module A and B in the patch. I was just wondering if the former case could be handled similarly. I.e. include the new objects for both modules A and B in the patch since there were changes to both as a result of the change in A.cpp.
>>>
>>> If the change is in a section that isn't allowed to be patched, then that seems like it could cause an issue for the other example I gave above too (where we decided to go ahead and import/inline bar as well). Is that correct?
>>
>> Yes, that could happen, but it's a separate issue than what I'm trying to solve here (i.e. it's not a compiler issue). :-)
>
> To me this doesn't seem like a separate issue, just a different incarnation of the same problem - with LTO optimizations a patch to one source file can have effects on the modules for other source files. Actually let me back up a minute - you mentioned earlier that object files are being patched. What does this mean for the kernel in the context of ThinLTO? Unless you are doing distributed ThinLTO I guess you are patching at the binary level? In which case the module boundaries have been erased, so it is sort of irrelevant I guess whether the other affected code was originally in a different source file or not. Also, what is meant by a section that isn't allowed to be patched? Sorry for all the questions - I'm asking all this because I'm trying to understand in general how you prevent changes to code in one place from affecting code in these unpatchable sections (even without ThinLTO, if they came from the same source file, i.e. you could get different inlining). My concern is that this fix just addresses one small aspect of a potentially larger issue you could run into. Today the change in the unpatchable section is just a different suffix on the callee, but down the line it could be that we decided to inline that callsite completely due to the code changes.

I should have explained better. I sometimes forget that not everyone has the same context that I have when describing a problem. :-) The following goes into detail, some of which you know of course, but I wanted to be thorough.

Here's the situation we're in. We're able to patch the kernel without rebooting the system. It's called "live patching." It's done by generating a module with the patched functions, and modifying the beginning of the original functions to jump to the patched code instead. (Data objects can also be patched, but that's more complicated.) When the live patching module is loaded, the kernel is able to wait until no thread has the patched function in its call stack before suspending the system and applying the patch.

Now a change in a patched function could theoretically allow it to become inlinable when before it wasn't. That will propagate the patch through more functions than expected. The result is that each function it was inlined into will now be considered "patched" and the unpatched functions will be modified to jump to the patched version of the functions, etc. This is the situation I think you're describing, and it's expected, though not ideal. But there may be mitigations we could use to avoid it (like using the `noinline` attribute on the original patched function).

Some sections shouldn't be patched for whatever reason. Code in those sections could theoretically call one of the patched functions *not* residing in the unpatchable section, but that's okay, as those sections marked as "unpatchable" tend to be things like sections with initialization code (and so isn't useful because that code has already been executed and possible those sections reclaimed from memory).

What I'm trying to accomplish with this however isn't related to what I just described. Instead I'm trying to avoid a situation which is tricking the live patching tools into thinking that a change exists where it doesn't actually exist (i.e. the change is purely cosmetic). In particular, when a local object is promoted to a global, it uses the module ID hash. But that hash changes between the patched and unpatched versions, causing the object names to change. Normally this won't be an issue. However, the live patching tools look at the changed name and determines that it's part of the patch we're applying.

So let's say that in some section that we don't allow a change in, the original code looks something like this in the ELF file:

    section .text
  bar.llvm.1:
    ...

    section .init.text
  foo:
    ...
    call    bar.llvm.1
    ...

In the patched code, it's changed into:

    section .text
  bar.llvm.2:
    ...

    section .init.text
  foo:
    ...
    call    bar.llvm.2
    ...

Now, whether `bar()` was part of the actual patch or not isn't really an issue. It's the change of `bar()`'s name that causes the live patching tool to see `foo()` and say, "Oops! We don't support changing `foo()`, because it's in the `.init.text` section." When it really wasn't changed. (It's slightly confusing, because we could change `bar()` with the patch, but we're not changing the generated code for `foo()`, which is all the live patching tool cares about in this context.) To solve this issue, I want the compiler to use a suffix for a promoted object that won't change based on changes in a `Module`.

If it comes to pass that changing `bar()` causes it to be inlinable into `foo()`, then `foo()`'s generated code does change and we'll have to address that in some other fashion (probably with `noinline`). But that's changing the patch rather than what the compiler's doing.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128863/new/

https://reviews.llvm.org/D128863