[llvm-dev] [RFC] Stop giving a default CPU to the LTO plugin?

Fri Mar 16 03:38:14 PDT 2018

Thanks for the example, that is very useful in working out the overall
scope of the problem, which is now wider than I thought it was. I've
put some comments inline.

On 15 March 2018 at 19:12, Friedman, Eli <efriedma at codeaurora.org> wrote:
> On 3/15/2018 9:43 AM, Peter Smith via llvm-dev wrote:
>>
>> Hello everyone, this is most likely Arm specific, but could affect
>> other targets where there is a somewhat complex relationship between
>> the triple and mcpu option.
>>
>> At present when clang is used as a linker driver for the gold-plugin
>> and when using and an explicit -mcpu is not given to clang, then clang
>> will always generate a -Wl,-plugin-opt=mcpu=<default CPU> where the
>> default CPU is based on the triple.
>>
>> I think that this is causing more problems than it is worth and I'd
>> like clang to not pass -Wl,-plugin-opt=mcpu=<CPU> unless the
>> -mcpu=<CPU> is explicitly on the clang command line.
>>
>> I'd like to know if I'm missing something important, and if there
>> would be any support for a patch to not pass an implicit mcpu via
>> plugin? There may also be a way of handling an inappropriate
>> cpu/triple combination more gracefully.
>>
>> The full story of a problem with the Arm target is available at
>> https://bugs.llvm.org/show_bug.cgi?id=36542 . A short description of
>> the problem is:
>> - Source files are compiled with -mthumb
>> --target=arm-linux-androideabi -mcpu=cortex-a7,
>> this records a triple of thumbv7a-linux-androideabi in the bitcode file.
>> - The command line for the link step only passes --target=
>> arm-linux-androideabi.
>> - The default cpu passed to LTO for a triple arm-linux-androideabi is
>> the very old arm7tdmi.
>> - When the ARMAsmBackend is created from the combination of triple
>> thumbv7a-linux-androideabi and cpu arm7tdmi the resultant target
>> features only include Thumb1. Amongst other things this prevents MC
>> from widening branch instructions to the Thumb2 wide branch.
>> - Code generation prior to MC seems to prefer the attributes in the
>> bitcode file (triple thumbv7a-linux-androideabi -mcpu=generic), this
>> selects a narrow branch instruction for a tail call with the
>> expectation it will be widened by MC. End result is a relocation out
>> of range error.
>>
>> In summary the combination of -mcpu=arm7tdmi and a triple from the
>> bitcode file of thumbv7a-linux-androideabi does not make sense, but
>> when passed to LTO seems to:
>> - Escape all the existing warnings about nonsensical triples [*].
>> - On Arm targets at least; having a final link step without a cpu is
>> likely to be quite common so I don't think that this will be a corner
>> case.
>> - The Build Attributes of the output ELF file get the triple ARMv4t
>> from -mcpu=arm7tdmi which is lying to the linker as the file contains
>> ARMv7a instructions.
>
>
> Having ARMv7a instructions in an ARMv4t file shouldn't be a problem: a
> function should be allowed to override the CPU attributes to generate code
> for a newer target.  This is generally done using the "target" function
> attribute.  If this doesn't work correctly, we should fix it.  It looks like
> it's currently broken; testcase:
>
> void g();
> __attribute__((target("thumb,arch=cortex-a53")))
> void f() { g(); }
>

Hmmm, allowing that makes life much more complicated. For example I
can also write:
void g();
__attribute__((target("thumb,arch=cortex-m0")))
void f() { g(); }

void i();
__attribute__((target("arm,arch=cortex-a53")))
void h() { i(); }

With -mcpu=cortex-m0 and get ARM code within an object claiming to be
Thumb only with no errors or warnings, with no chance of a linker
detecting a mismatch either.

I think that part of this is the same problem that is observed in
PR36542 the ARMAsmBackend that is responsible for widening the tail
call to a Thumb2 branch is created with ARMv4T which doesn't support
Thumb1. There has been a recent change that threads through the
existing SubtargetInfo instead of recreating it from the triple alone.
It is worth mentioning that the object level BuildAttributes do not
include Thumbv7a which is misleading to a linker as it will be
expecting no ARMv7A in the object.

Has there already been a discussion about what per function
code-generation with BuildAttributes higher than the base object
should mean in the context of capabilities of the ARMAsmBackend and
BuildAttributes? My thoughts right now are that if ARMAsmBackend is to
operate at an object level, rather than a per-function level then it
has to use the capabilities of the highest architecture in the file.
This also means giving the object BuildAttributes of the highest
architecture in the file, and giving an error if they contradict, for
example mixing Thumb Cortex-M0 and ARM Cortex-A53. If the
ARMAsmBackend could be made to work on a per-function level then there
is a chance that we could only widen the tail call to g() in f(), but
not elsewhere. To honestly describe this in the BuildAttributes we
would need to use per Section or per Function attributes though.

My suggestion to move forward here is:
- Recreate the SubtargetInfo based on the merge of all the Targets and
CPU information that we have seen, or warn/error if they are
incompatible.
- Ouput the Tag_CPU_arch BuildAttributes based on the merge of all the
Targets and CPU information that we have seen.

It is probably worth moving any discussion of this particular part to
PR36542 since it is somewhat Arm specific. I'll add this comment to
there.

>>
>> My personal preference is that we shouldn't pass an implicit CPU
>> derived from the triple, at least for Arm targets. If that isn't
>> possible I think that we should make sure we detect inappropriate
>> cpu/triple combinations and either warn or error. Does anyone have any
>> strong opinions on which way to go? I'm happy to work on a patch to
>> improve this.
>>
>> I know that this can be worked around by explicitly passing a cpu or
>> more specific triple to the link line, however this is often not done
>> in existing builds.
>>
>> Thanks in advance for any suggestions
>
> It doesn't really make sense to override the target specified in the
> bitcode; I agree we shouldn't do that by default.  But I don't think
> changing that actually fixes the underlying bug.
>

It won't fix the underlying bug, but depending on we deal with
nonsensical combinations of triple and CPU we may end up with a
warning or an error for a common use case.

> -Eli
>
> --
> Employee of Qualcomm Innovation Center, Inc.
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux
> Foundation Collaborative Project
>