[llvm-dev] Encode target-abi into LLVM bitcode for LTO.

David Blaikie via llvm-dev llvm-dev at lists.llvm.org
Mon Jan 27 15:11:50 PST 2020


On Mon, Jan 27, 2020 at 3:04 PM Eric Christopher <echristo at gmail.com> wrote:

>
>
> On Mon, Jan 27, 2020 at 2:56 PM David Blaikie via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>>
>>
>> On Mon, Jan 20, 2020 at 8:05 AM Sam Elliott via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> To follow up on this issue:
>>>
>>> Our plan is still to encode `target-abi` into the module flags for
>>> RISC-V LLVM IR modules. As was pointed out earlier in this thread, the
>>> function lowering in Clang is slightly different for the ABIs which support
>>> hardware floating point. Therefore adding the `target-abi` metadata at the
>>> very least can help catch places where modules with incompatible ABIs are
>>> merged.
>>>
>>> With this in mind, our plan is the following:
>>>
>>> 1. We have two patches, both prepared by Zakk, one to add the
>>> `target-abi` to the module flags (https://reviews.llvm.org/D72755), and
>>> one to check that LLVM has been passed the correct `target-abi` to match
>>> the module (https://reviews.llvm.org/D72768).
>>>   These two are intended to ensure that we are compiling the module as
>>> expected, and to protect ourselves from errors that could arise otherwise,
>>> and are entirely within the RISC-V target.
>>>   We would like these to be backported to LLVM 10.0 once they are
>>> landed.
>>>
>>
>> OK - so this means the target-abi module flag wouldn't be used to
>> motivate the codegen, just to check that the command line flag is correct,
>> and the command line flag would be what's used to motivate the codegen
>> choices?
>>
>> Sounds OK - but I guess the existence of that command line flag (rather
>> than using the module metadata) is a workaround for not being able to use
>> the module metadata at the time when it's needed (that was something being
>> debated/discussed in the various code reviews, yes?). I'd still be curious
>> to have a well documented explanation for why that was complicated or not
>> possible - and specifically why the target-triple and data layout which are
>> also part of the Module are easier to access/use at the right times (is the
>> target triple/data layout only needed later than the target abi? or is it
>> that there's some workaround for accessing triple/data layout earlier &
>> that workaround isn't as easy to apply to the module metadata which is a
>> bit more module-centric, in some way?)
>>
>
> FWIW if we're going to put something in the module I would want us to use
> it.
>

I believe the "use", such as it is, is in diagnosing errors where users
compile with one target abi but specify a conflicting target abi at link
time. That seems valuable to me - would you rather not diagnose that? Or do
something different here?

- Dave


>
> -eric
>
>
>>
>>
>>> 2. We will try to make all the changes required for RISC-V LTO in time
>>> for the LLVM 11.0 branch date. This includes the target-independent changes
>>> to TargetMachine initialisation (the design of which is not yet agreed).
>>>   Enabling LTO will require and use the `target-abi` module flag.
>>>
>>
>> Ah, OK so this is the bit I was referencing above then. So your goal is
>> to get LTO to be motivated by the target-abi module flag, where as in (1)
>> it'll be motivated by a command line flag, and the module flag will just be
>> checked later that it's consistent with the command line flag?
>>
>> So then this step (2) is where I'd be interested to know the answers to
>> my above questions re: what's different about target triple/data layout and
>> the target abi. Are they all needed around the same time, but their
>> representation (target triple and data layout being first class properties
>> of the module, but maybe they're parsed separatrely in some codepath to get
>> them earlier than otherwise?) different? Or are their representations the
>> same, but target-abi is needed earlier/in a different place than
>> triple/data layout?
>>
>> Anyway, I'd like to see that (2) discussion in more detail - probably
>> here/in an llvm-dev thread, to make sure it's well worked through,
>> especially if it means adding significant new layering.
>>
>>
>>> One advantage I see of this is that other frontends (rustc, LDC, zig,
>>> and others) have a whole release cycle (10.0 -> 11.0) in order to add the
>>> right target-abi metadata to LLVM modules that they produce, in order to
>>> enable LTO. This also means we don’t need to rush target-independent
>>> changes to TargetMachine into LLVM 10.0.
>>>
>>
>> (1) seems fine/trivial-ishly OK to me & yeah, if it paves the way for (2)
>> that's fine by me (pending other people who have more context than I do
>> about any of this).
>>
>> Thanks for summarizing/looping back here!
>>
>> - Dave
>>
>>
>>>
>>> I welcome feedback on this plan.
>>>
>>> Sam
>>>
>>>
>>> > On 15 Jan 2020, at 9:37 am, Zakk <zakk0610 at gmail.com> wrote:
>>> >
>>> >
>>> >
>>> > David Blaikie <dblaikie at gmail.com> 於 2020年1月14日 週二 上午2:15寫道:
>>> >
>>> >
>>> > On Mon, Jan 13, 2020 at 6:12 AM Zakk <zakk0610 at gmail.com> wrote:
>>> >
>>> >
>>> > David Blaikie via llvm-dev <llvm-dev at lists.llvm.org> 於 2020年1月11日 週六
>>> 上午2:03寫道:
>>> > Ah, OK - thanks for walking me through that.
>>> >
>>> > Fair enough, I think I understand the issue/tradeoff now - and that
>>> the other module level metadata don't currently influence the target
>>> configuration at this level?
>>> >
>>> >
>>> > I'm not sure, I only know that the target-abi is decided and computed
>>> when instantiating the TargetMachine, and TragetMachine constructor doesn't
>>> have IR as argument.
>>> >
>>> > Could you look into the other module level metadata to
>>> understand/explain that, as it may help inform the design here?
>>> >
>>> > I grep "addModuleFlag" in clang and list all module level metadata:
>>> >
>>> > "NumRegisterParameters":  implement __attribute((regparm(3))) which is
>>> x86 specific feature and used in X86TargetLowering phase.
>>> > "cfguard": implement Control Flow Guard (Check mechanism) in IR level
>>> pass.
>>> > "StrictVTablePointers", "StrictVTablePointersRequirement": there is no
>>> code use this info in llvm source tree
>>> > "min_enum_size", "wchar_size":  indicate C type size, ARMAsmPrinter
>>> use it to emit corresponded asm attributes.
>>> > "Cross-DSO CFI", "CFI Canonical Jump Tables": implement control flow
>>> integrity (CFI) schemes in IR level pass.
>>> > "cf-protection-return" and "cf-protection-branch": implement control
>>> flow architecture protection in x86, used in X86ISelLowering and
>>> X86AsmPrinter phases.
>>> > "nvvm-reflect-ftz":  implement optimized code paths that flush
>>> subnormals to zero in NVPTX custom IR pass.
>>> > "EnableSplitLTOUnit", "ThinLTO": LTO related stuff, I didn't look that
>>> because I think they will not influence the target configuration.
>>> >
>>> >
>>> > I didn't dig in those "Dwarf Version", "CodeView", "CodeViewGHash",
>>> "Debug Info Version" and others Objective-C related info in module level
>>> metadata.
>>> >
>>> > But I found several metadata info are handled at
>>> AsmPrinter::EmitStartOfAsmFile(Module &M), but there is no any similar
>>> interface in MC layer. ex. MCAsmBackend or MCStreamer.
>>> >
>>> >
>>> > I guess this starts to come from the other direction for me, then:
>>> what purpose does the "target triple" in the module textual IR serve? I
>>> guess it's not used to create the backend, but at best used to validate
>>> that it's correct/the same as what the backend's been configured with? (or
>>> perhaps that is accessible before the Module proper is parsed?)
>>> >
>>> > I'm not sure too. For RISC-V as Sam mention before, it's not easy to
>>> add ABI to the triple... but Yes, the target triple is accessible before
>>> the Module proper is parsed in current codegen flow.
>>> > so in my PoC https://reviews.llvm.org/D72245#change-sHyISc6hOqcy, I
>>> need to changed the flow but I think static method is not a good idea.
>>> >
>>> > And this too. If the triple from the IR file is used in the places you
>>> need this new ABI parameter, it might inform how best to pass this other
>>> data through too.
>>> >
>>> > On Fri, Jan 10, 2020 at 9:09 AM Sam Elliott via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>> > I also work on the RISC-V backend, and have been doing a little work
>>> on the ELF psABI document for RISC-V.
>>> >
>>> > I agree that, conceptually, the psABI choice should be in the module
>>> metadata.
>>> >
>>> > Zakk, however, has discovered a phase ordering issue within LLVM that
>>> relates to this approach. The phase ordering problem is that the LTO
>>> backend is currently setup without interrogating the current module for any
>>> information which might affect its setup. He has two patches that propose
>>> two different ways forward, but he thinks neither is satisfactory. It would
>>> be useful to have some guidance as to which approach is preferred by LLVM.
>>> >
>>> > Of course, it would be great if any such adopted approach was
>>> compatible with Mips, given that backend seems to have a similar kind of
>>> problem (though they have also managed to solve it different ways).
>>> >
>>> > As an aside, adding the ABI to the triple is not particularly easy,
>>> as, for instance, there’s already half an ABI choice in the
>>> ‘riscv64-unknown-linux-gnu` “triple”. The gnu/musl ABI choice is orthogonal
>>> to the RISC-V calling convention choice around the use of software/hardware
>>> floating point ABIs, and saying you have to specify both in the triple
>>> seems like it will only confuse users.
>>> >
>>> > tldr: Module metadata seems like the right approach, feedback on which
>>> of the two of Zakk’s patches (from the original email) forms the better
>>> approach to get this information into the LTO backend setup would be
>>> helpful.
>>> >
>>> > Sam
>>> >
>>> > > On 10 Jan 2020, at 12:23 am, David Blaikie via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>> > >
>>> > > On Wed, Jan 8, 2020 at 5:57 PM Eric Christopher <echristo at gmail.com>
>>> wrote:
>>> > > Right. I think that's what we ended up doing rather than a more
>>> general attribute on the module itself.
>>> > >
>>> > > What sort of further generality did you have in mind?
>>> > >
>>> > > *shrugs* Probably ok? I'd probably prefer not to have to have target
>>> code to do the evaluation if possible,
>>> > >
>>> > > Target code to evaluate what, sorry? I'm not following (this whole
>>> conversation's started being a bit confusing to me - from what I can see,
>>> module metadata pretty well captures the feature request here (a module
>>> level target-specific attribute that can fail during IR linking if linking
>>> modules with inconsistent values for this attribute)
>>> > >
>>> > > but everything is weird and an edge case - mips abis more than some
>>> :)
>>> > >
>>> > > -eric
>>> > >
>>> > > On Wed, Jan 8, 2020 at 8:58 AM David Blaikie <dblaikie at gmail.com>
>>> wrote:
>>> > > Oh, I should say - the module flags metadata also has support for
>>> "error if you try to merge two modules with different values for this flag".
>>> > >
>>> > > On Wed, Jan 8, 2020 at 8:57 AM David Blaikie <dblaikie at gmail.com>
>>> wrote:
>>> > >
>>> > >
>>> > > On Tue, Jan 7, 2020 at 5:27 PM Eric Christopher <echristo at gmail.com>
>>> wrote:
>>> > >
>>> > >
>>> > > On Tue, Jan 7, 2020 at 3:18 PM Daniel Sanders via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>> > >
>>> > >
>>> > >> On Jan 7, 2020, at 13:57, David Blaikie <dblaikie at gmail.com> wrote:
>>> > >>
>>> > >>
>>> > >>
>>> > >> On Mon, Jan 6, 2020 at 6:05 PM Daniel Sanders <
>>> daniel_l_sanders at apple.com> wrote:
>>> > >>
>>> > >>
>>> > >>> On Jan 6, 2020, at 14:29, David Blaikie via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>> > >>>
>>> > >>>
>>> > >>>
>>> > >>> On Mon, Jan 6, 2020 at 5:58 AM Zakk <zakk0610 at gmail.com> wrote:
>>> > >>>
>>> > >>>
>>> > >>> David Blaikie <dblaikie at gmail.com> 於 2020年1月6日 週一 下午2:23寫道:
>>> > >>> If this is something that can vary per file in a compilation and
>>> resolve correctly when one object file is built with one ABI and another
>>> object file is built with a different ABI (that seems to be antithetical to
>>> the concept of "ABI" Though) - then it should be a subtarget feature.
>>> > >>>
>>> > >>> ABI is generally something that has to be agreed upon across
>>> object files - so it wouldn't make sense to link two object files with two
>>> different ABIs. What's going on here that makes that valid in this case?
>>> > >>>
>>> > >>>
>>> > >>> Are you talking about that "[mips] Pass ABI name via -target-abi
>>> instead of target-features"?
>>> > >>>
>>> > >>> I'm not talking about that patch in particular (I have no specific
>>> knowledge of mips or its implementation) - but speaking about the general
>>> design of LLVM's subtarget features.
>>> > >>>
>>> > >>> Might be interesting to know why that change was made & may help
>>> explain what's going on here.
>>> > >>
>>> > >> It's been a while so I don't remember the detail but IIRC one of
>>> the reasons was that mips had a feature bit per ABI and had a lot of
>>> duplicated code sanity checking that only one bit was enabled and deriving
>>> the ABI from the feature bits. The -target-abi option already existed and
>>> using that prevented the possibility of having more than one ABI selected.
>>> > >>
>>> > >> There was a lot of code (some of which didn't have access to target
>>> features) in the backend that tried to derive the ABI from the arch
>>> component of the triple (e.g. mips64 => n64 ABI) even though there were
>>> multiple possible ABI's for each arch (mips64 => o32, n32, or n64 ABI's)
>>> and there isn't a canonical choice for any given triple (it varies between
>>> linux distributions and toolchains in general). Settling on -target-abi
>>> allowed us to sort out the inconsistencies in the backends opinion of what
>>> the selected ABI was. It also allowed us to move the selection of the ABI
>>> into the frontend where disagreements between distributions/toolchains on
>>> what each triple means was easier to deal with.
>>> > >>
>>> > >> Is this something that can vary per function in a program? (that
>>> seems confusing to me - ABI is usually, sort of by definition, the thing
>>> that all parts of the program have to agree with (at least on either side
>>> of any function call - I suppose different functions could have different
>>> ABIs as long as the function declarations carried ABI information so
>>> callers could cooperate, etc)) It sounds to me like that's what Zakk is
>>> suggesting/grappling with.
>>> > >
>>> > > No, it was a per-binary thing for mips and was stored in the ELF
>>> header. Ignoring a couple quirks*, every object in the program had to agree
>>> on the ABI in order to link.
>>> > >
>>> > > I'm not particularly familiar with LTO but going by the description
>>> of the problem it seems to me that the overall issue is that for 1, 2, and
>>> 5, each module fails to completely describe the contents. They each have a
>>> label saying it's riscv64, elf, etc. but it doesn't mention lp64d anywhere.
>>> As a result you can't check that you aren't trying to mix incompatible
>>> modules and can only trust (and require) the command line option. It's
>>> worth mentioning that DataLayout tends to change for different ABI's so the
>>> ABI is kind-of there but there isn't anything that really guarantees that
>>> there's a 1:1 relationship.
>>> > >
>>> > > 3 and 4 fix the problem of the missing labels but the snag with 4 is
>>> that target features are overridable at the function level too and that
>>> doesn't really make sense for ABI's (it's fine for calling conventions but
>>> that's only part of the ABI and calling conventions are described elsewhere
>>> in the IR anyway). Without changing the IR, 3 looks like the only one that
>>> solves the overall problem but then you have potential for problems where
>>> the official triple for a platform doesn't match what needs to be in the
>>> triple metadata in the IR. For example, mips64-linux-gnu can be N32 or N64
>>> ABI (or more rarely O32) depending on the
>>> OS/distribution/toolchain/version. FWIW, back when I worked on it, we were
>>> generally moving towards the idea of canonical triples which contained the
>>> ABI and some lowering code on the user facing interfaces to disambiguate
>>> things like mips64-linux-gnu to mips64-linux-gnuabin32.
>>> > >
>>> > >
>>> > > To reply here a bit:
>>> > >
>>> > > I worry about target triple being used, but I think I do/did agree
>>> that it's probably the best we can move to in the near term. My concern is
>>> that we will have diverged from more "canonical" triples that are used in
>>> other places just for our compilation model. I'd love to be able to encode
>>> ABI into the module in some way and make it an error to link two modules
>>> that have incompatible ABIs and am definitely up to ideas on how to encode
>>> target specific module level data into the module. I'd like to avoid
>>> metadata if possible. Any thoughts here on how you'd like to see it encoded
>>> for the long term?
>>> > >
>>> > > I was going to say - module metadata has those semantics (but, yeah,
>>> do have the "this is load bearing metadata" problem).
>>> > >
>>> > > But let's see what else is already there - NumRegisterParameters (no
>>> idea what that is, but that sounds like an ABI feature/not something you
>>> could drop & maintain correctness), Dwarf Version (kinda), CodeView
>>> (kinda), PIC level (probably load bearing), PIE level (similar), Code Model
>>> (load bearing).
>>> > >
>>> > > https://llvm.org/docs/LangRef.html#module-flags-metadata - yeah, I
>>> think this is probably the right tool for this now, given what else is
>>> already here. If someone wants to make this solution not metadata (but I
>>> think this "global flags metadata" is specifically treated to have all the
>>> semantics we want here, so I'm not sure what else would be created that
>>> didn't look basically the same) then it can be done & all these things can
>>> be ported over to whatever that new thing is.
>>> > >
>>> > >
>>> > > *Just for completeness, the quirks I can remember off-hand were:
>>> > > - IEEE754 1985 and 2008 would successfully cross-link unless you
>>> used a flag indicating that it mattered. This was because we wanted to omit
>>> the 1985 standard from newer chips but there were many ecosystems using it
>>> due to historical reasons. In practice, very few programs care about the
>>> tiny details (does negation trap, etc.) so we essentially force-migrated
>>> whole ecosystems by relaxing the link requirements and changing the default.
>>> > > - Along the same lines, we also supported cross-linking specific
>>> variants of the O32 ABI. There was only supposed to be one O32 but an
>>> unfortunate mis-reading of the ABI spec coupled with a failure to catch it
>>> with conformance tests split it in two. Luckily, Matthew Fortune found a
>>> way to reunite them without breaking either one by adding a third that
>>> followed the original intent of the spec and was compatible with either one
>>> (but not both at once) and then migrating everyone to that.
>>> > >
>>> > >
>>> > > *sigh* I remember that. :)
>>> > >
>>> > > Thanks for chiming in Daniel!
>>> > >
>>> > > -eric
>>> > >
>>> > >> If it can vary per function, then the ABI information shouldn't be
>>> used outside the per-function context (ie: no global variables/other output
>>> could depend on the ABI because which function's ABI would it depend on?).
>>> > >>
>>> > >>
>>> > >>> I don't know WHY -target-abi is passing via different option, not
>>> via -mattr (subtarget feature)
>>> > >>> maybe usually subtarget feature is used to manages different
>>> specific ISA.
>>> > >>>
>>> > >>>
>>> > >>> On Sun, Jan 5, 2020 at 10:04 PM Zakk via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>> > >>> Hi all.
>>> > >>>
>>> > >>> There are two steps in LTO codegen so the problem is how to pass
>>> ABI info into LTO code generator.
>>> > >>>
>>> > >>> The easier way is pass -target-abi via option to LTO codegen, but
>>> there is linking issue when linking two bitcodes generated by different
>>> -mabi option. (see https://reviews.llvm.org/D71387#1792169)
>>> > >>>
>>> > >>> Usually the ABI info for a file is derived from target triple,
>>> mcpu or -mabi, but in RISC-V, target-abi is only derived from -mabi and
>>> -mattr option, so the one of solutions is encoding target-abi in IR via
>>> LLVM module flags metadata.
>>> > >>>
>>> > >>> But there is an another issue in assembler. In current LLVM
>>> design, there is no mechanism to extract info from IR before AsmBackend
>>> construction, so I use some little weird approach to init target-abi option
>>> before construct AsmBackend[1] or reassign target-abi option in
>>> getSubtargetImpl and do some hack in backend[2].
>>> > >>>
>>> > >>> 1. https://reviews.llvm.org/D72245#change-sHyISc6hOqcy (see
>>> llc.cpp)
>>> > >>> 2. https://reviews.llvm.org/D72246 (see RISCVAsmBackend.h)
>>> > >>>
>>> > >>> I think [1] and [2] are not good enough, the other ideals like
>>> > >>>
>>> > >>> 3. encode target abi info in triple name. ex.
>>> riscv64-unknown-elf-lp64d
>>> > >>> 4. encode target-abi into in target-feature (maybe it's not a good
>>> ideal because mips reverted this approach
>>> > >>> before.
>>> http://llvm.org/viewvc/llvm-project?view=revision&revision=227583)
>>> > >>>
>>> > >>> 5. users should pass target-abi themselves. (append
>>> -Wl,-plugin-opt=-target-abi=ipl32f when compiling with -mabi=ilp32f)
>>> > >>>
>>> > >>> Is it a good idea to encode target-abi into bitcode?
>>> > >>> If yes, is there another good approach to fix AsmBackend issue?
>>> > >>> I’d appreciate any help or suggestions.
>>> > >>>
>>> > >>> Thanks.
>>> > >>>
>>> > >>> --
>>> > >>> Best regards,
>>> > >>> Kuan-Hsu
>>> > >>>
>>> > >>>
>>> > >>>
>>> > >>> _______________________________________________
>>> > >>> LLVM Developers mailing list
>>> > >>> llvm-dev at lists.llvm.org
>>> > >>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>> > >>>
>>> > >>>
>>> > >>> --
>>> > >>> Best regards,
>>> > >>> Kuan-Hsu
>>> > >>>
>>> > >>>
>>> > >>> _______________________________________________
>>> > >>> LLVM Developers mailing list
>>> > >>> llvm-dev at lists.llvm.org
>>> > >>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>> > >
>>> > > _______________________________________________
>>> > > LLVM Developers mailing list
>>> > > llvm-dev at lists.llvm.org
>>> > > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>> > > _______________________________________________
>>> > > LLVM Developers mailing list
>>> > > llvm-dev at lists.llvm.org
>>> > > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>> >
>>> > --
>>> > Sam Elliott
>>> > Software Developer - LLVM
>>> > lowRISC CIC
>>> > --
>>> >
>>> > _______________________________________________
>>> > LLVM Developers mailing list
>>> > llvm-dev at lists.llvm.org
>>> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>> > _______________________________________________
>>> > LLVM Developers mailing list
>>> > llvm-dev at lists.llvm.org
>>> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>> >
>>> >
>>> > --
>>> > Best regards,
>>> > Kuan-Hsu
>>> >
>>> >
>>> >
>>> >
>>> > --
>>> > Best regards,
>>> > Kuan-Hsu
>>>
>>> --
>>> Sam Elliott
>>> Software Developer - LLVM
>>> lowRISC CIC
>>> --
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200127/82cd36b5/attachment.html>


More information about the llvm-dev mailing list