[PATCH] D108421: Mark openmp internal global dso_local
Fangrui Song via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Wed Sep 1 10:05:43 PDT 2021
MaskRay requested changes to this revision.
MaskRay added a comment.
This revision now requires changes to proceed.
In D108421#2977107 <https://reviews.llvm.org/D108421#2977107>, @kamleshbhalui wrote:
> In D108421#2958848 <https://reviews.llvm.org/D108421#2958848>, @MaskRay wrote:
>
>> If you read the comment in TargetMachine::shouldAssumeDSOLocal: this is the wrong direction. dso_local is assumed to be marked by the frontend.
>>
>> Direct accesses and GOT accesses are trade-offs. Direct accesses are not always preferred. The MachO special case is an unfortunate case for their xnu kernel, not a good example here.
>
> @MaskRay I would like to know more about these trade-offs details, any supporting documents will do.
> Considering below testcase, can you shed some light how code generated by llc-12 is better than llc-11 for given testcase?
> https://godbolt.org/z/x9xojWb58
This is a very minor issue. First, global variable access is rarely a performance bottleneck.
Second, if the symbol turns out to be non-preemptible (which implies that it is defined in the component), the R_X86_64_REX_GOTPCRELX GOT indirection can be optimized out.
The only minor issue is slightly longer code sequence.
> And FYI this testcase does not work when build as Linux Kernel Module. LKM loader throw error("Unknown rela relocation: 42")?
This is a kernel issue. Please mention the justification (is it related to OpenMP?) in the first place.
The kernel can be compiled in -fpie mode. In this mode, taking the address of a default-visibility undefined symbol uses R_X86_64_REX_GOTPCRELX.
So the kernel should support R_X86_64_REX_GOTPCRELX anyway.
---
If we think the optimization is meaningful:
Depending on the property of `.gomp_critical_user_.atomic_reduction.var` different DSOLocal strategies should be used.
If it is TU-local, make it local linkage. If it is linked image local, make it hidden visibility.
If it may be defined in a shared object and shared with other shared objects or the main executable, (not so sure because such symbol interposition does not work in other binary formats), use dso_preemptable as is.
I believe the current forced dso_local is definitely incorrect because it may break `-fpic -shared` links.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D108421/new/
https://reviews.llvm.org/D108421
More information about the cfe-commits
mailing list