[PATCH] D121970: [Verify] check that BlockAddresses don't refer to functions with certain linkages which wont be emitted

Eli Friedman via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Mar 18 11:18:12 PDT 2022


efriedma added a comment.

In D121970#3392892 <https://reviews.llvm.org/D121970#3392892>, @nickdesaulniers wrote:

> In D121970#3391502 <https://reviews.llvm.org/D121970#3391502>, @efriedma wrote:
>
>> I'm not sure I understand the issue you're trying to prevent here.
>
> Two concerns. The first is that I don't think it makes sense at all to have a blockaddress refer to an `available_externally` function be used outside of that function itself, ever.  Consider this IR from my added test case above:
>
>   define available_externally void @foo() {
>   entry:
>     br label %hello
>   hello:
>     ret void
>   }
>   
>   @glo = global i8* blockaddress(@foo, %hello)
>
> Running `llc` on the above will produce:
>
>   	.text
>   	.file	"z.ll"
>   	.type	glo, at object                     # @glo
>   	.data
>   	.globl	glo
>   	.p2align	3
>   glo:
>   	.quad	.Ltmp0
>   	.size	glo, 8
>   
>   	.section	".note.GNU-stack","", at progbits
>
> There is no `.Ltmp0`. Clang will not assemble that (GAS will!).

I think I'd consider that a bug in codegen, not a fundamental IR issue.  If llc decides not to emit a function, it should clean up any related blockaddresses, the same way opt would.

I mean, the way you're looking at it sort of works, but it seems like you end up adding special cases to a bunch of transformation passes.  You mention ipsccp, but there's also the inliner, and globalopt, and also probably other stuff I'm not thinking of.

>> You can have basic blocks that won't be emitted into the object file for a bunch of reasons; it might not have any predecessors, it might be inside an unused internal function, or something else.  In all those cases, we use exactly the same strategy: replace the blockaddress with "1".  I don't see any reason to special-case available_externally.
>
> That assumes that we've run optimizations and computed reachability. I suspect that linkage type and reachability are distinct concerns here.  I guess I'm concerned that if you take "bad" IR and just feed it into llc without running opt, we get some weird results with blockaddresses.  Sure, running opt may help replace bad blockaddresses, but I don't think there's guarantees that code is generally compiled with optimizations on.

I think llc should explicitly handle this as part of codegen.  I'm not suggesting we should depend on opt passes.

>> If you're dealing with weak symbols, there might be issues... but not because there's any fundamental semantic hole.  If a function definition gets discarded, the values of blockaddresses which refer to that definition don't matter.
>
> The values don't matter, until we emit dead references (see `.Ltmp0` example above) or fail to import the use in bitcode reader during LTO.

It's an object format issue, which I think we should be able to solve without making IR invariants more complicated.

>> But the way we currently emit blockaddresses referring to weak functions on ELF targets will lead to link errors, I think.
>
> Runtime linkage failures, right? I'm thinking we might want to test that case, since @dexonsmith originally asked about `weak` functions.

Static linker, not dynamic linker.  The linker will get unhappy if you use a non-weak symbol to refer into a comdat.  Once it links successfully, the code exists in the shared library no matter what the symbol resolver does, so there isn't an issue.

>> https://reviews.llvm.org/D120781#3383492 has an example of a blockaddress in a linkonce_odr global variable, but I'm pretty sure that has undefined behavior at the IR level.  A blockaddress in one translation unit is never equivalent to a blockaddress in another translation unit, so it breaks the rules for linkonce_odr.  It might make sense for clang to diagnose that.
>
> That also sounds like something good to codify. I would pursue checking that via IR verification, then knee-capping places that might produce such invalid IR, like clang.

We can state this explicitly in LangRef.  Not sure if it makes sense to turn this into a verifier check; the problem isn't just block addresses, it's any symbol with internal linkage.  I think we'd end up introducing weird checks into a bunch of transforms to enforce it.

-------------

Overall, I think it's possible to reach a consistent state with adding this invariant, but I think it would be simpler to reach a consistent state without it.  I think all the problems you've brought up can be resolved in other ways.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D121970/new/

https://reviews.llvm.org/D121970



More information about the llvm-commits mailing list