[cfe-dev] [RFC] Unified offloading option for CUDA/HIP/OpenMP

Artem Belevich via cfe-dev cfe-dev at lists.llvm.org
Wed Mar 10 11:15:17 PST 2021


On Wed, Mar 10, 2021 at 10:53 AM Doerfert, Johannes <jdoerfert at anl.gov>
wrote:

> On 3/10/21 12:37 PM, Artem Belevich wrote:
> > On Wed, Mar 10, 2021 at 7:38 AM Doerfert, Johannes <jdoerfert at anl.gov>
> > wrote:
> >
> >> On 3/8/21 6:59 PM, Artem Belevich wrote:
> >>> On Mon, Mar 8, 2021 at 11:23 AM Liu, Yaxun (Sam) <Yaxun.Liu at amd.com>
> >> wrote:
> >>>> [AMD Public Use]
> >>>>
> >>>>
> >>>>
> >>>> The amdgpu xnack and sramecc need to be part of GPU arch name the same
> >>>> way as for --offload-arch, e.g.
> >>>>
> >>>>
> >>>>
> >>>> --offload=amdgcn-gfx906:xnack+,amdgcn-gfx906:xnack-
> >>>>
> >>>>
> >>>>
> >>>> They behave like GPU arch.
> >>>>
> >>>>
> >>>>
> >>> It's just that it's rather unwieldy to use in practice. It's not a
> >>> showstopper, but perhaps now may be a convenient point to consider the
> >>> naming scheme for AMDGPU sub-compilations again.
> >>>
> >>> It should be easy enough to add useful or commonly used names/aliases.
> >>>
> >>> E.g. `--offload=nvidia-ampere` would be equivalent to
> >>> `--offload=sm_80,sm_86`.
> >>> Or `--offload=amd-navi33` ->
> >>> `--offload=gfx3011:+something:-something_else`
> >>> Even for CUDA and NVIDIA GPUs that've been around for a pretty long time,
> >>> I'm still getting questions from the users -- "I've got this
> >>> GTX/RTX-whatever video card and can't figure out how to compile for it.
> >>> What are those compute_XY and sm_YZ and which ones should I use?"
> >>> I can only imagine trying to explain to someone : "You need to use
> >>> gfx-XYZ<colon><dash>xnack<colon><plus>sram-ecc.... Oh, you must have
> >>> mistyped that, let's try it again."
> >>>
> >>> Perhaps we need to split offloading machinery further.
> >>>
> >>> The --offload=target still serves the double purpose of creating a
> >>> sub-compilation *and* specifying the target details, providing the
> >>> initial set of parameters for the given target. It also prevents creation
> >>> of multiple subcompilations for targets with minor differences, which may
> >>> be one of the reasons that led to AMDGPU's encoding various features in
> >>> the target name.
> >>>
> >>> What if we were to modify the scheme a bit in a way that allows better
> >>> handling of multiple variants of the same target.
> >>> E.g.:
> >>> --offload=gfx906@A,gfx906@B   -- creates two sub-compilations both
> >>> targeting gfx906. Optional @suffix makes it possible to match them
> >>> independently.
> >>> -Xoffload=@A --set-features=xnack+,sram-ecc-
> >>> -Xoffload=@B --set-features=xnack-,sram-ecc+
> >>>
> >>> Would something like this help with AMDGPU's feature handling?
> >> Don't we need that also for NVIDIA?
> >>
> >> -offload=nvptx64@A,nvptx64@B -Xoffload=@A -march=sm_30 -Xoffload=@B
> >> -march=sm_70
> >>
> > It could be phrased that way, but it's not, strictly speaking, necessary
> > for NVPTX.
> > Unlike AMDGPU it does not support targeting the same GPU in multiple
> > sub-compilations.
> > The question is -- what do we want '--offload' parameter to mean? It could
> > range from 'just an arbitrary text string only to be used as the key for
> > the -Xoffload matching' to 'specify all essential
> > subcompilation parameters' which is what  --offload-arch does now. We may
> > want to have both.
>
> I should have been more specific. CUDA might not want to include
> NVPTX code for different sm versions, but we are expecting
> this to be an OpenMP use case: compile for old and new GPUs and
> pick at runtime the newest the hardware supports.
>

CUDA does support targeting multiple *different* GPUs. What I meant is that
you cannot have two sub-compilations targeting the *same* sm_XX but with
different options.
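
To illustrate the constraint, here is a rough Python sketch. It is not clang
driver code: `parse_offload` is a hypothetical helper, and the `arch@suffix`
syntax is the one floated earlier in this thread, not an existing option. Each
sub-compilation is identified by a key. With the arch name as the only key, a
second sm_70 entry has nothing to distinguish it from the first.

```python
def parse_offload(value):
    """Split a hypothetical --offload= value into {key: arch}.

    Each comma-separated entry is `arch` or `arch@suffix`. The full entry
    text is used as the key that identifies the sub-compilation.
    """
    subcompilations = {}
    for entry in value.split(","):
        # Everything before the first '@' is the arch; the rest is a label.
        arch = entry.partition("@")[0]
        if entry in subcompilations:
            raise ValueError(f"duplicate sub-compilation key: {entry!r}")
        subcompilations[entry] = arch
    return subcompilations


# Different GPUs: the keys are distinct, so there is no conflict.
print(parse_offload("sm_60,sm_70"))
# Same GPU twice: expressible only because the suffix makes the keys distinct.
print(parse_offload("sm_70@A,sm_70@B"))
```

Passing `"sm_70,sm_70"` raises `ValueError` in this sketch, which is roughly
the situation I was describing.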


>
> > "Just a string" would be rather verbose to use in practice as the top-level
> > mechanism, unless we also implement some sort of shortcut/alias system to
> > allow specifying common configurations without having to type them all. One
> > would be able to create their own subcompilation, if necessary, but for
> > most use cases one of the standard ones would do.
> >
> > E.g. `--offload=sm_70` by itself should be sufficient to target an sm_70 GPU
> > with all the necessary options and would be what's used in the vast majority
> > of cases.
> > `-Xoffload=sm_70 -ffoo` would still work, if it needed to be tweaked.
> > If I wanted a custom sub-compilation, I'd do `-offload=my_sm77
> > -Xoffload=my_sm77 -mcpu=sm_77 -other -options`.
> > Because my_sm77 is not a known configuration, it would start with the
> > defaults and it would be up to the user to provide additional options to
> > customize it.
>
> I like the idea. What if we say `--offload=sm_70` will populate
> the defaults for NVIDIA sm70 and people can add custom options
> through `-Xoffload`, e.g., `-Xoffload=sm_70 -foobar`.


That's the idea.



> I don't mind the custom name pipelines but I'm unsure if we need them. If so,
> maybe we append them to an identifier that defines the target, e.g.,
> `--offload=sm_70@A` or `--offload=nvptx64@B`. `-Xoffload` should
> always match the `--offload` name regardless of whether it is a known default
> or a custom name.
>

Yup. As far as the `-Xoffload` matcher is concerned, the key should be just
a string.
We could limit --offload to the list of pre-defined standard targets. Users
could use one of them as the base for further customization. Considering
that it effectively allows them to change -cc1 options, they already have
the ability to do whatever they want, except changing the name of the offload
'key'. So, a set of the standard offload configurations plus `@xx` should
provide relatively unencumbered use in common cases and the ability to refer
to and modify a specific subcompilation's options.
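
A minimal Python sketch of that scheme, under stated assumptions:
`STANDARD_CONFIGS`, `resolve`, and the option strings are all made up for
illustration, and glob matching of the key is borrowed from the `gfx*`
pattern used earlier in the thread. None of this is existing driver behavior.

```python
from fnmatch import fnmatchcase

# Hypothetical table of standard configurations. The option strings are
# placeholders, not the options clang actually passes.
STANDARD_CONFIGS = {
    "sm_70": ["-mcpu=sm_70"],
    "gfx906": ["-mcpu=gfx906"],
}


def resolve(offload, xoffload):
    """Return {key: options} for each --offload entry.

    offload:  list of entries such as "sm_70" or "gfx906@A".
    xoffload: list of (pattern, options) pairs in command-line order.
              The pattern is matched against the whole key as a glob.
    """
    result = {}
    for key in offload:
        # Only the part before '@' selects the standard configuration.
        base = key.partition("@")[0]
        if base not in STANDARD_CONFIGS:
            raise ValueError(f"unknown standard offload target: {base!r}")
        options = list(STANDARD_CONFIGS[base])
        for pattern, extra in xoffload:
            if fnmatchcase(key, pattern):
                options.extend(extra)
        result[key] = options
    return result


configs = resolve(
    ["sm_70", "gfx906@A", "gfx906@B"],
    [("gfx*", ["-fcommon"]), ("*@A", ["-ffoo"])],
)
for key, options in configs.items():
    print(key, options)
```

The point is that the matcher never interprets the key. Only the lookup of
the defaults cares about what precedes `@`.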


> Does that make sense or am I too confused by now?
>

I think we're on the same page.

--Artem



>
> ~ Johannes
>
>
> >
> > --Artem
> >
> >
> >> WDYT?
> >>
> >> ~ Johannes
> >>
> >>
> >>> --Artem
> >>>
> >>>
> >>>> Sam
> >>>>
> >>>>
> >>>>
> >>>> *From:* Artem Belevich <tra at google.com>
> >>>> *Sent:* Monday, March 8, 2021 2:01 PM
> >>>> *To:* Liu, Yaxun (Sam) <Yaxun.Liu at amd.com>
> >>>> *Cc:* Doerfert, Johannes <jdoerfert at anl.gov>; Ben Boeckel <
> >>>> ben.boeckel at kitware.com>; Lieberman, Ron <Ron.Lieberman at amd.com>;
> >>>> a.bataev at hotmail.com; Chan, SiuChi <siuchi.chan at amd.com>; Searles,
> >> Mark <
> >>>> Mark.Searles at amd.com>; cfe-dev (cfe-dev at lists.llvm.org) <
> >>>> cfe-dev at lists.llvm.org>; jeffrey.sandoval at hpe.com; Jon Chesterfield <
> >>>> jonathanchesterfield at gmail.com>; Rodgers, Gregory <
> >> Gregory.Rodgers at amd.com
> >>>> *Subject:* Re: [cfe-dev] [RFC] Unified offloading option for
> >>>> CUDA/HIP/OpenMP
> >>>>
> >>>>
> >>>>
> >>>> [CAUTION: External Email]
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Sat, Mar 6, 2021 at 7:13 AM Liu, Yaxun (Sam) <Yaxun.Liu at amd.com>
> >> wrote:
> >>>> [AMD Public Use]
> >>>>
> >>>> We need to differentiate target triples since it may not always be
> >>>> possible to infer the target triple from the cpu name. So I guess it
> >>>> would be like:
> >>>>
> >>>> "--offload=amdgcn-gfx906,amdgcn-gfx1010"
> >>>> "--Xoffload=amdgcn-gfx* options common to all AMD GPUs"
> >>>> "--Xoffload=amdgcn-gfx906 -mcpu=gfx906
> --fsomething-specific-to-gfx906"
> >>>>
> >>>>
> >>>>
> >>>> SGTM.
> >>>>
> >>>> Do you expect the AMDGPU's features (+xnack, -ecc, etc) to be part of
> >> the
> >>>> offload target ? Or would they be specified via -Xoffload arguments?
> >>>>
> >>>>
> >>>>
> >>>> --Artem
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> Sam
> >>>>
> >>>> -----Original Message-----
> >>>> From: Doerfert, Johannes <jdoerfert at anl.gov>
> >>>> Sent: Friday, March 5, 2021 1:25 PM
> >>>> To: Artem Belevich <tra at google.com>; Liu, Yaxun (Sam) <
> >> Yaxun.Liu at amd.com>
> >>>> Cc: Ben Boeckel <ben.boeckel at kitware.com>; Lieberman, Ron <
> >>>> Ron.Lieberman at amd.com>; a.bataev at hotmail.com; Chan, SiuChi <
> >>>> siuchi.chan at amd.com>; Searles, Mark <Mark.Searles at amd.com>; cfe-dev (
> >>>> cfe-dev at lists.llvm.org) <cfe-dev at lists.llvm.org>;
> >> jeffrey.sandoval at hpe.com;
> >>>> Jon Chesterfield <jonathanchesterfield at gmail.com>; Rodgers, Gregory <
> >>>> Gregory.Rodgers at amd.com>
> >>>> Subject: Re: [cfe-dev] [RFC] Unified offloading option for
> >> CUDA/HIP/OpenMP
> >>>> [CAUTION: External Email]
> >>>>
> >>>> On 3/4/21 3:05 PM, Artem Belevich wrote:
> >>>>> On Thu, Mar 4, 2021 at 10:34 AM Liu, Yaxun (Sam) <Yaxun.Liu at amd.com>
> >>>> wrote:
> >>>>>> [AMD Public Use]
> >>>>>>
> >>>>>> There is another aspect we need to consider: how to modify the
> >>>>>> -target option by additional options?
> >>>>>>
> >>>>>> For the existing --offload-arch option, we could use -Xarch_ to add
> >>>>>> specific options for it.
> >>>>>>
> >>>>> `-Xarch_xxx` as implemented right now is a rather limited hack. IIRC
> >>>>> it only accepts options w/o arguments which limits its usability.
> >>>>>
> >>>>>
> >>>>>> Assuming we have an -offload="amdgcn -mcpu=gfx906" option, then we
> >>>>>> want to add some options specific to it by an additional option,
> what
> >>>>>> should we do?
> >>>>>>
> >>>>> I think we've been conflating telling the driver what to compile for
> >>>>> and customizing individual sub-compilations.
> >>>>>
> >>>>> We could explicitly separate the two tasks. E.g.:
> >>>>> `--[no-]offload=target1,target2,target3...`
> >>>>> `--Xoffload=target_pattern target_options...`
> >>>>>
> >>>>> This way your example would be handled with:
> >>>>> "--offload=gfx906,gfx1010"
> >>>>> "--Xoffload=gfx* options common to all AMD GPUs"
> >>>>> "--Xoffload=gfx906 -mcpu=gfx906 --fsomething-specific-to-gfx906"
> >>>>>
> >>>>> In the end `-Xarch_xxx` would become an alias for '-Xoffload=xxx'.
> >>>> +1
> >>>>
> >>>>
> >>>>> --Artem
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>> Thanks.
> >>>>>>
> >>>>>> Sam
> >>>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Doerfert, Johannes <jdoerfert at anl.gov>
> >>>>>> Sent: Thursday, February 11, 2021 12:59 PM
> >>>>>> To: Artem Belevich <tra at google.com>; Liu, Yaxun (Sam)
> >>>>>> <Yaxun.Liu at amd.com>
> >>>>>> Cc: Ben Boeckel <ben.boeckel at kitware.com>; Lieberman, Ron <
> >>>>>> Ron.Lieberman at amd.com>; a.bataev at hotmail.com; Chan, SiuChi <
> >>>>>> siuchi.chan at amd.com>; Searles, Mark <Mark.Searles at amd.com>;
> cfe-dev (
> >>>>>> cfe-dev at lists.llvm.org) <cfe-dev at lists.llvm.org>;
> >>>>>> jeffrey.sandoval at hpe.com; Jon Chesterfield
> >>>>>> <jonathanchesterfield at gmail.com>
> >>>>>> Subject: Re: [cfe-dev] [RFC] Unified offloading option for
> >>>>>> CUDA/HIP/OpenMP
> >>>>>>
> >>>>>> [CAUTION: External Email]
> >>>>>>
> >>>>>> I'm OK with either.
> >>>>>>
> >>>>>> On 2/11/21 11:42 AM, Artem Belevich wrote:
> >>>>>>> On Thu, Feb 11, 2021 at 8:30 AM Liu, Yaxun (Sam) <
> Yaxun.Liu at amd.com>
> >>>>>> wrote:
> >>>>>>>> [AMD Public Use]
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Sorry for the delay.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Both Johannes’ and Artem’s proposals should satisfy the needs of
> >>>> users:
> >>>>>>>>
> >>>>>>>> Option 1:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> `-offload=<offload-pattern> optA optB optC`.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Option 2:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> `-offload=<offload-pattern>,optA,optB,optC`.
> >>>>>>>>
> >>>>>>> I'm fine with #2. We're using something similar with our build tools
> >>>>>>> and it works reasonably well.
> >>>>>>> However, it does have one annoying corner case. There's no easy way
> >>>>>>> to pass an option which has a comma in it. E.g. if I want to pass
> >>>>>>> `-Wl,something,something`. Perhaps we could use a sed-like approach
> >>>>>>> and allow changing the separator. E.g. `s/a/b/` == `s@a@b@`.
> >>>>>>>
> >>>>>>> --Artem
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>> Compared to the old options, they are more concise and more
> >> readable.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> The main difference is the delimiter. To me option 2 is more
> >>>>>>>> attractive since it does not need quotations for most cases.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Can we reach an agreement on option 2?
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Thanks.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Sam
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> *From:* Artem Belevich <tra at google.com>
> >>>>>>>> *Sent:* Tuesday, December 15, 2020 2:13 PM
> >>>>>>>> *To:* Ben Boeckel <ben.boeckel at kitware.com>
> >>>>>>>> *Cc:* Doerfert, Johannes <jdoerfert at anl.gov>; Liu, Yaxun (Sam) <
> >>>>>>>> Yaxun.Liu at amd.com>; Lieberman, Ron <Ron.Lieberman at amd.com>;
> >>>>>>>> a.bataev at hotmail.com; Chan, SiuChi <siuchi.chan at amd.com>;
> Searles,
> >>>>>>>> Mark < Mark.Searles at amd.com>; cfe-dev (cfe-dev at lists.llvm.org) <
> >>>>>>>> cfe-dev at lists.llvm.org>
> >>>>>>>> *Subject:* Re: [cfe-dev] [RFC] Unified offloading option for
> >>>>>>>> CUDA/HIP/OpenMP
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> [CAUTION: External Email]
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Tue, Dec 15, 2020 at 10:23 AM Ben Boeckel
> >>>>>>>> <ben.boeckel at kitware.com>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>> On Mon, Dec 14, 2020 at 14:04:43 -0800, Artem Belevich via cfe-dev
> >>>>>> wrote:
> >>>>>>>>> It all may be an utter overkill, too. WDYT?
> >>>>>>>> Note that tools such as ccache and sccache generally need to be
> >>>>>>>> able to understand what's going on (I believe distcc and other
> >>>>>>>> distributed compilation tools also generally need to know too), so
> >>>>>>>> making it sensible enough for interpretation based on just the
> >>>>>>>> flags to be possible should be considered.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> I think this is somewhat orthogonal to how we specify per-target
> >>>>>>>> options.
> >>>>>>>> Such a tool almost never knows about all possible compiler options
> >>>>>>>> and has to pass through the unknown options as-is.  However, any
> >>>>>>>> form of 'nested' options specified on the command line will have a
> >>>>>>>> chance to confuse such tool. E.g. if I want to pass '-E' to some
> >>>>>>>> sub-tool for a particular offload-target, ccache, not being aware
> >>>>>>>> that it's not a top-level compilation option, may interpret it as an
> >>>>>>>> attempt to preprocess the TU.
> >>>>>>>> I wonder if it would make sense to just move all this per-target
> >>>>>>>> option complexity into an external response file. As far as
> >>>>>>>> existing tools are concerned, it would look like
> >>>>>>>> `--offload-options=target-opts.file` without affecting tool's
> >>>>>>>> general idea what this compilation is about to do, and the external
> >>>>>>>> file would allow us to be as flexible as we need to be to specify
> >>>>>>>> per-target options. It could be just a flat list of pairs
> >>>>>>>> `-Xarch_... optA`.  Or we could use YAML.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> That approach, however, has its own issues and would still need to
> >>>>>>>> be optional. If it's the only way to specify offload options, that
> >>>>>>>> will complicate other use cases as now they would have to deal with
> >>>>>>>> temporary files.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Maybe a slightly modified variant of jdoerfert@'s idea would work
> >>>>>>>> better:
> >>>>>>>>>>>        -offload="amd -march=gfx906 -fno-vectorize" -fopenmp
> >>>>>>>> Implement it in a way similar to -Wl,optA,optB,optC and extend it
> >>>>>>>> to match an offload scope glob/regex.
> >>>>>>>>
> >>>>>>>> E.g. `-offload=<offload-pattern>,optA,optB,optC`.
> >>>>>>>>
> >>>>>>>> As far as the external tools are concerned, it's just one option to
> >>>>>>>> pass through. At the same time it should be flexible enough to apply
> >>>>>>>> the options to a subset of offload targets in a human-manageable way.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>>
> >>>>>>>> --Artem Belevich
> >>>>>>>>
> >>>>
> >>>>
> >>>> --
> >>>>
> >>>> --Artem Belevich
> >>>>
> >>
>
>

-- 
--Artem Belevich