[llvm] 6276927 - [ThinLTO] Mark callers of local ifunc not eligible for import
Fangrui Song via llvm-commits
llvm-commits at lists.llvm.org
Tue Aug 29 13:21:01 PDT 2023
Author: Fangrui Song
Date: 2023-08-29T12:26:40-07:00
New Revision: 6276927bf3f6ce4a9ef0b9941b2c6450ae4cd1eb
URL: https://github.com/llvm/llvm-project/commit/6276927bf3f6ce4a9ef0b9941b2c6450ae4cd1eb
DIFF: https://github.com/llvm/llvm-project/commit/6276927bf3f6ce4a9ef0b9941b2c6450ae4cd1eb.diff
LOG: [ThinLTO] Mark callers of local ifunc not eligible for import
Fix https://github.com/llvm/llvm-project/issues/58740
The `target_clones` attribute results in ifunc on eligible targets
(Linux glibc/Android or FreeBSD). If the function has internal linkage,
we will get an internal linkage ifunc.
```
__attribute__((target_clones("popcnt", "default")))
static int foo(int n) { return __builtin_popcount(n); }
int use(int n) { return foo(n); }
@foo.ifunc = internal ifunc i32 (i32), ptr @foo.resolver
define internal nonnull ptr @foo.resolver() comdat {
; local linkage comdat is another issue that should be fixed
...
select i1 %.not, ptr @foo.default.1, ptr @foo.popcnt.0
...
}
define internal i32 @foo.default.1(i32 noundef %n)
```
ifuncs are not included in module summaries, so LTO doesn't know the
local linkage `foo.default.1` referenced by `foo.resolver`
should be promoted. If a caller of `foo` (e.g. `use`) is imported,
the local linkage `foo.resolver` will be cloned as a definition
(IRLinker::shouldLink), leading to linker errors.
```
ld.lld: error: undefined hidden symbol: foo.default.1.llvm.8017227050314953235
>>> referenced by bar.c
>>> lto.tmp:(foo.ifunc)
```
As a simple fix, just mark `use` as not eligible for import. Non-local
linkage ifuncs do not have the problem, because they are not imported,
and not cloned when a caller is imported.
---
https://reviews.llvm.org/D82745 contains a more involved fix, though the
original bug it intended to fix
(https://github.com/llvm/llvm-project/issues/45833) now works.
Note: importing ifunc is tricky.
If we import an ifunc, we need to make sure the resolver and the
implementation are in the translation unit, as required by
https://sourceware.org/glibc/wiki/GNU_IFUNC
> Requirement (a): Resolver must be defined in the same translation unit as the implementations.
This is infeasible if the implementation is changed to
available_externally.
In addition, the imported ifunc may be referenced by two translation
units. This doesn't work with PowerPC32 -msecure-plt
(https://maskray.me/blog/2021-01-18-gnu-indirect-function).
At the very least, every referencing translation unit needs one extra
IRELATIVE dynamic relocation.
At least for the local linkage ifunc case, it doesn't have much use
outside of `target_clones`, as a global pointer is usually a better
replacement.
I think ifuncs just have too many pitfalls to design more IR features
around it to optimize them.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D158961
Added:
llvm/test/ThinLTO/X86/ifunc-import.ll
Modified:
llvm/lib/Analysis/ModuleSummaryAnalysis.cpp
Removed:
################################################################################
diff --git a/llvm/lib/Analysis/ModuleSummaryAnalysis.cpp b/llvm/lib/Analysis/ModuleSummaryAnalysis.cpp
index 775bb95fdda7b0..a88622efa12db8 100644
--- a/llvm/lib/Analysis/ModuleSummaryAnalysis.cpp
+++ b/llvm/lib/Analysis/ModuleSummaryAnalysis.cpp
@@ -305,6 +305,7 @@ static void computeFunctionSummary(
bool HasInlineAsmMaybeReferencingInternal = false;
bool HasIndirBranchToBlockAddress = false;
+ bool HasIFuncCall = false;
bool HasUnknownCall = false;
bool MayThrow = false;
for (const BasicBlock &BB : F) {
@@ -417,6 +418,16 @@ static void computeFunctionSummary(
}
} else {
HasUnknownCall = true;
+ // If F is imported, a local linkage ifunc (e.g. target_clones on a
+ // static function) called by F will be cloned. Since summaries don't
+ // track ifunc, we do not know implementation functions referenced by
+ // the ifunc resolver need to be promoted in the exporter, and we will
+ // get linker errors due to cloned declarations for implementation
+ // functions. As a simple fix, just mark F as not eligible for import.
+ // Non-local ifunc is not cloned and does not have the issue.
+ if (auto *GI = dyn_cast_if_present<GlobalIFunc>(CalledValue))
+ if (GI->hasLocalLinkage())
+ HasIFuncCall = true;
// Skip inline assembly calls.
if (CI && CI->isInlineAsm())
continue;
@@ -599,7 +610,7 @@ static void computeFunctionSummary(
bool NonRenamableLocal = isNonRenamableLocal(F);
bool NotEligibleForImport = NonRenamableLocal ||
HasInlineAsmMaybeReferencingInternal ||
- HasIndirBranchToBlockAddress;
+ HasIndirBranchToBlockAddress || HasIFuncCall;
GlobalValueSummary::GVFlags Flags(
F.getLinkage(), F.getVisibility(), NotEligibleForImport,
/* Live = */ false, F.isDSOLocal(), F.canBeOmittedFromSymbolTable());
diff --git a/llvm/test/ThinLTO/X86/ifunc-import.ll b/llvm/test/ThinLTO/X86/ifunc-import.ll
new file mode 100644
index 00000000000000..8df43e2c4becfd
--- /dev/null
+++ b/llvm/test/ThinLTO/X86/ifunc-import.ll
@@ -0,0 +1,79 @@
+; RUN: rm -rf %t && split-file %s %t && cd %t
+; RUN: opt -module-summary -o a.bc a.ll
+; RUN: opt -module-summary -o b.bc b.ll
+; RUN: llvm-lto2 run a.bc b.bc -o t --save-temps \
+; RUN: -r a.bc,external.ifunc,pl -r a.bc,use,pl -r a.bc,use2,pl -r a.bc,__cpu_model,lx \
+; RUN: -r b.bc,main,plx -r b.bc,use,l -r b.bc,use2,l
+; RUN: llvm-dis < t.1.3.import.bc | FileCheck %s --check-prefix=A
+; RUN: llvm-dis < t.2.3.import.bc | FileCheck %s --check-prefix=B --implicit-check-not='@internal.resolver'
+
+; A: define internal nonnull ptr @internal.resolver()
+; A: define internal i32 @internal.default.1(i32 %n)
+
+;; The ifunc implementations of internal.ifunc are internal in A, so they cannot
+;; be referenced by B. Our implementation actually ensures that the ifunc resolver
+;; along with its implementations are not imported.
+; B: declare i32 @use(i32) local_unnamed_addr
+; B: define available_externally i32 @use2(i32 %n) local_unnamed_addr
+; B: declare i32 @external.ifunc(i32)
+
+;--- a.ll
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+$internal.resolver = comdat any
+
+ at __cpu_model = external dso_local local_unnamed_addr global { i32, i32, i32, [1 x i32] }
+
+ at internal.ifunc = internal ifunc i32 (i32), ptr @internal.resolver
+ at external.ifunc = ifunc i32 (i32), ptr @internal.resolver
+
+define dso_local i32 @use(i32 %n) local_unnamed_addr {
+entry:
+ %call = call i32 @internal.ifunc(i32 %n)
+ ret i32 %call
+}
+
+define dso_local i32 @use2(i32 %n) local_unnamed_addr {
+entry:
+ %call = call i32 @external.ifunc(i32 %n)
+ ret i32 %call
+}
+
+define internal nonnull ptr @internal.resolver() comdat {
+entry:
+ %0 = load i32, ptr getelementptr inbounds ({ i32, i32, i32, [1 x i32] }, ptr @__cpu_model, i64 0, i32 3, i64 0), align 4
+ %1 = and i32 %0, 4
+ %.not = icmp eq i32 %1, 0
+ %internal.popcnt.0.internal.default.1 = select i1 %.not, ptr @internal.default.1, ptr @internal.popcnt.0
+ ret ptr %internal.popcnt.0.internal.default.1
+}
+
+define internal i32 @internal.popcnt.0(i32 %n) {
+entry:
+ %0 = call i32 @llvm.ctpop.i32(i32 %n)
+ ret i32 %0
+}
+
+declare i32 @llvm.ctpop.i32(i32)
+
+define internal i32 @internal.default.1(i32 %n) {
+entry:
+ %0 = call i32 @llvm.ctpop.i32(i32 %n)
+ ret i32 %0
+}
+
+;--- b.ll
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+define dso_local i32 @main() local_unnamed_addr {
+entry:
+ %0 = call i32 @use(i32 0)
+ %1 = call i32 @use2(i32 0)
+ %2 = add i32 %0, %1
+ ret i32 %2
+}
+
+declare i32 @use(i32) local_unnamed_addr
+declare i32 @use2(i32) local_unnamed_addr
More information about the llvm-commits
mailing list