[cfe-dev] [RFC] Unified offloading option for CUDA/HIP/OpenMP

Wed Mar 10 07:38:15 PST 2021

On 3/8/21 6:59 PM, Artem Belevich wrote:
> On Mon, Mar 8, 2021 at 11:23 AM Liu, Yaxun (Sam) <Yaxun.Liu at amd.com> wrote:
>
>> [AMD Public Use]
>>
>> The amdgpu xnack and sramecc need to be part of the GPU arch name the same way
>> as for --offload-arch, e.g.
>>
>> --offload=amdgcn-gfx906:xnack+,amdgcn-gfx906:xnack-
>>
>> They behave like a GPU arch.
>>
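To make the point above concrete, here is a minimal Python sketch of how a driver might split such an entry into a triple, a processor, and an on/off feature map. The `<triple>-<arch>[:<feature>+|-]` grammar and the names `OffloadTarget`, `parse_offload_target` and `parse_offload_list` are assumptions for illustration, not clang's actual parser.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class OffloadTarget:
    triple: str  # everything before the last '-', e.g. "amdgcn"
    arch: str    # processor name, e.g. "gfx906"
    features: dict[str, bool] = field(default_factory=dict)  # "xnack" -> on/off


def parse_offload_target(spec: str) -> OffloadTarget:
    """Split '<triple>-<arch>[:<feature>+|-]...' into its parts."""
    head, *feature_specs = spec.split(":")
    triple, sep, arch = head.rpartition("-")
    if not sep or not triple or not arch:
        raise ValueError(f"expected '<triple>-<arch>', got {head!r}")
    features: dict[str, bool] = {}
    for feat in feature_specs:
        if len(feat) < 2 or feat[-1] not in "+-":
            raise ValueError(f"feature must end in '+' or '-': {feat!r}")
        features[feat[:-1]] = feat.endswith("+")
    return OffloadTarget(triple, arch, features)


def parse_offload_list(value: str) -> list[OffloadTarget]:
    """Parse the comma-separated value of a hypothetical --offload=... option."""
    return [parse_offload_target(spec) for spec in value.split(",") if spec]


for t in parse_offload_list("amdgcn-gfx906:xnack+,amdgcn-gfx906:xnack-"):
    print(t)
```

Both entries resolve to the same `gfx906` processor and differ only in the `xnack` setting, which is why, under this scheme, the feature suffix has to be part of the target name to get two distinct sub-compilations.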
> It's just that it's rather unwieldy to use in practice. It's not a
> showstopper, but perhaps now may be a convenient point to consider the
> naming scheme for AMDGPU sub-compilations again.
>
> It should be easy enough to add useful or commonly used names/aliases.
>
> E.g. `--offload=nvidia-ampere` would be equivalent to
> `--offload=sm_80,sm_86`.
> Or `--offload=amd-navi33` -> `--offload=gfx3011:+something:-something_else`
>
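Alias expansion of this kind could be a simple table lookup performed before sub-compilations are created. A minimal Python sketch: the table, its contents beyond the `nvidia-ampere` example above, and the function name `expand_offload_aliases` are hypothetical.

```python
from __future__ import annotations

# Hypothetical alias table. Only the "nvidia-ampere" entry is taken from the
# example in this thread; a real table would be maintained by the driver.
OFFLOAD_ALIASES: dict[str, list[str]] = {
    "nvidia-ampere": ["sm_80", "sm_86"],
}


def expand_offload_aliases(value: str) -> list[str]:
    """Expand aliases in an --offload=... value, keeping order and dropping duplicates."""
    targets: list[str] = []
    for name in value.split(","):
        if not name:
            continue
        # Names that are not aliases are passed through unchanged.
        for target in OFFLOAD_ALIASES.get(name, [name]):
            if target not in targets:
                targets.append(target)
    return targets


print(expand_offload_aliases("nvidia-ampere,sm_70"))
```

Dropping duplicates means that mixing an alias with an explicit arch (e.g. `sm_80,nvidia-ampere`) does not create two identical sub-compilations.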
> Even for CUDA and NVIDIA GPUs that've been around for a pretty long time,
> I'm still getting the questions from the users -- "I've got this
> GTX/RTX-whatever video card and can't figure out how to compile for it.
> What are those compute_XY and sm_YZ and which ones should I use?"
> I can only imagine trying to explain to someone: "You need to use
> gfx-XYZ<colon><dash>xnack<colon><plus>sram-ecc.... Oh, you must have
> mistyped that, let's try it again."
>
> Perhaps we need to split offloading machinery further.
>
> The --offload=target still serves the double purpose of creating a
> sub-compilation *and* specifying the target details, providing the initial
> set of parameters for the given target. It also prevents the creation of
> multiple sub-compilations for targets with minor differences, which may be
> one of the reasons that led to AMDGPU's encoding various features in the
> target name.
>
> What if we were to modify the scheme a bit in a way that allows better
> handling of multiple variants of the same target?
> E.g.:
> --offload=gfx906@A,gfx906@B   -- creates two sub-compilations both
> targeting gfx906. Optional @suffix makes it possible to match them
> independently.
> -Xoffload=@A --set-features=xnack+,sram-ecc-
> -Xoffload=@B --set-features=xnack-,sram-ecc+
>
> Would something like this help with AMDGPU's feature handling?
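The @suffix idea can be sketched as a two-step driver pass: first create one sub-compilation per `--offload` entry, then route each `-Xoffload=@TAG` group to the entries carrying that tag. This is a minimal Python sketch; the data model and the names `SubCompilation` and `plan_subcompilations` are hypothetical, and `--set-features` is passed through as an opaque string because it is only a proposed option.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SubCompilation:
    target: str                # e.g. "gfx906" or "nvptx64"
    tag: str | None            # "A" for "gfx906@A", None when no @suffix is given
    options: list[str] = field(default_factory=list)


def plan_subcompilations(offload: str,
                         xoffload: list[tuple[str, list[str]]]) -> list[SubCompilation]:
    """One sub-compilation per --offload entry; -Xoffload=@TAG options go to matching tags."""
    subs: list[SubCompilation] = []
    for entry in offload.split(","):
        target, sep, tag = entry.partition("@")
        subs.append(SubCompilation(target, tag if sep else None))
    for selector, opts in xoffload:
        if not selector.startswith("@"):
            raise ValueError(f"this sketch only handles @TAG selectors: {selector!r}")
        matched = [s for s in subs if s.tag == selector[1:]]
        if not matched:
            raise ValueError(f"no sub-compilation is tagged {selector!r}")
        for sub in matched:
            sub.options.extend(opts)
    return subs


for sub in plan_subcompilations(
        "gfx906@A,gfx906@B",
        [("@A", ["--set-features=xnack+,sram-ecc-"]),
         ("@B", ["--set-features=xnack-,sram-ecc+"])]):
    print(sub)
```

The same routing would cover an NVIDIA variant, e.g. `nvptx64@A,nvptx64@B` with `-march=sm_30` for `@A` and `-march=sm_70` for `@B`.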

Don't we need that also for NVIDIA?

-offload=nvptx64@A,nvptx64@B -Xoffload=@A -march=sm_30 -Xoffload=@B -march=sm_70

WDYT?

~ Johannes

> --Artem
>
>
>> Sam
>>
>> *From:* Artem Belevich <tra at google.com>
>> *Sent:* Monday, March 8, 2021 2:01 PM
>> *To:* Liu, Yaxun (Sam) <Yaxun.Liu at amd.com>
>> *Cc:* Doerfert, Johannes <jdoerfert at anl.gov>; Ben Boeckel <
>> ben.boeckel at kitware.com>; Lieberman, Ron <Ron.Lieberman at amd.com>;
>> a.bataev at hotmail.com; Chan, SiuChi <siuchi.chan at amd.com>; Searles, Mark <
>> Mark.Searles at amd.com>; cfe-dev (cfe-dev at lists.llvm.org) <
>> cfe-dev at lists.llvm.org>; jeffrey.sandoval at hpe.com; Jon Chesterfield <
>> jonathanchesterfield at gmail.com>; Rodgers, Gregory <Gregory.Rodgers at amd.com
>> *Subject:* Re: [cfe-dev] [RFC] Unified offloading option for
>> CUDA/HIP/OpenMP
>>
>> [CAUTION: External Email]
>>
>> On Sat, Mar 6, 2021 at 7:13 AM Liu, Yaxun (Sam) <Yaxun.Liu at amd.com> wrote:
>>
>> We need to use different target triples since it may not always be possible to
>> infer the target triple from the CPU name. So I guess it would look like:
>>
>> "--offload=amdgcn-gfx906,amdgcn-gfx1010"
>> "--Xoffload=amdgcn-gfx* options common to all AMD GPUs"
>> "--Xoffload=amdgcn-gfx906 -mcpu=gfx906 --fsomething-specific-to-gfx906"
>>
>> SGTM.
>>
>> Do you expect AMDGPU's features (+xnack, -ecc, etc.) to be part of the
>> offload target? Or would they be specified via -Xoffload arguments?
>>
>> --Artem
>>
>> Sam
>>
>> -----Original Message-----
>> From: Doerfert, Johannes <jdoerfert at anl.gov>
>> Sent: Friday, March 5, 2021 1:25 PM
>> To: Artem Belevich <tra at google.com>; Liu, Yaxun (Sam) <Yaxun.Liu at amd.com>
>> Cc: Ben Boeckel <ben.boeckel at kitware.com>; Lieberman, Ron <
>> Ron.Lieberman at amd.com>; a.bataev at hotmail.com; Chan, SiuChi <
>> siuchi.chan at amd.com>; Searles, Mark <Mark.Searles at amd.com>; cfe-dev (
>> cfe-dev at lists.llvm.org) <cfe-dev at lists.llvm.org>; jeffrey.sandoval at hpe.com;
>> Jon Chesterfield <jonathanchesterfield at gmail.com>; Rodgers, Gregory <
>> Gregory.Rodgers at amd.com>
>> Subject: Re: [cfe-dev] [RFC] Unified offloading option for CUDA/HIP/OpenMP
>>
>> On 3/4/21 3:05 PM, Artem Belevich wrote:
>>> On Thu, Mar 4, 2021 at 10:34 AM Liu, Yaxun (Sam) <Yaxun.Liu at amd.com> wrote:
>>>> There is another aspect we need to consider: how to modify the
>>>> -target option by additional options?
>>>>
>>>> For the existing --offload-arch option, we could use -Xarch_ to add
>>>> specific options for it.
>>>>
>>> `-Xarch_xxx` as implemented right now is a rather limited hack. IIRC
>>> it only accepts options without arguments, which limits its usability.
>>>
>>>
>>>> Assuming we have an -offload="amdgcn -mcpu=gfx906" option, then we
>>>> want to add some options specific to it by an additional option, what
>>>> should we do?
>>>>
>>> I think we've been conflating telling the driver what to compile for
>>> and customizing individual sub-compilations.
>>>
>>> We could explicitly separate the two tasks. E.g.:
>>> `--[no-]offload=target1,target2,target3...`
>>> `--Xoffload=target_pattern target_options...`
>>>
>>> This way your example would be handled with:
>>> "--offload=gfx906,gfx1010"
>>> "--Xoffload=gfx* options common to all AMD GPUs"
>>> "--Xoffload=gfx906 -mcpu=gfx906 --fsomething-specific-to-gfx906"
>>>
>>> In the end `-Xarch_xxx` would become an alias for '-Xoffload=xxx'.
>> +1
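Separating the two tasks could look roughly like this on the driver side: the target list decides which sub-compilations exist, and each `-Xoffload` pattern only contributes options to the ones it matches. A minimal Python sketch, assuming shell-style globs via `fnmatch` (the thread does not settle glob vs. regex) and options applied in command-line order; `options_for_target` is a hypothetical name and `-fno-vectorize` is just a placeholder option.

```python
from __future__ import annotations

import fnmatch


def options_for_target(target: str,
                       xoffload: list[tuple[str, list[str]]]) -> list[str]:
    """Collect options from every -Xoffload=<pattern> whose glob matches target, in order."""
    opts: list[str] = []
    for pattern, pattern_opts in xoffload:
        if fnmatch.fnmatchcase(target, pattern):
            opts.extend(pattern_opts)
    return opts


targets = "gfx906,gfx1010".split(",")         # --offload=gfx906,gfx1010
xoffload = [
    ("gfx*", ["-fno-vectorize"]),              # -Xoffload=gfx* -fno-vectorize
    ("gfx906", ["-mcpu=gfx906"]),              # -Xoffload=gfx906 -mcpu=gfx906
]
for t in targets:
    print(t, options_for_target(t, xoffload))
```

In this model, `-Xarch_gfx906 <opt>` would simply become one more `("gfx906", [<opt>])` entry, matching the suggestion that `-Xarch_xxx` turn into an alias.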
>>
>>
>>> --Artem
>>>
>>>> Thanks.
>>>>
>>>> Sam
>>>>
>>>> -----Original Message-----
>>>> From: Doerfert, Johannes <jdoerfert at anl.gov>
>>>> Sent: Thursday, February 11, 2021 12:59 PM
>>>> To: Artem Belevich <tra at google.com>; Liu, Yaxun (Sam)
>>>> <Yaxun.Liu at amd.com>
>>>> Cc: Ben Boeckel <ben.boeckel at kitware.com>; Lieberman, Ron <
>>>> Ron.Lieberman at amd.com>; a.bataev at hotmail.com; Chan, SiuChi <
>>>> siuchi.chan at amd.com>; Searles, Mark <Mark.Searles at amd.com>; cfe-dev (
>>>> cfe-dev at lists.llvm.org) <cfe-dev at lists.llvm.org>;
>>>> jeffrey.sandoval at hpe.com; Jon Chesterfield
>>>> <jonathanchesterfield at gmail.com>
>>>> Subject: Re: [cfe-dev] [RFC] Unified offloading option for
>>>> CUDA/HIP/OpenMP
>>>>
>>>> I'm OK with either.
>>>>
>>>> On 2/11/21 11:42 AM, Artem Belevich wrote:
>>>>> On Thu, Feb 11, 2021 at 8:30 AM Liu, Yaxun (Sam) <Yaxun.Liu at amd.com> wrote:
>>>>>> Sorry for the delay.
>>>>>>
>>>>>> Both Johannes’ and Artem’s proposals should satisfy the needs of users:
>>>>>>
>>>>>> Option 1:
>>>>>>
>>>>>> `-offload=<offload-pattern> optA optB optC`.
>>>>>>
>>>>>> Option 2:
>>>>>>
>>>>>> `-offload=<offload-pattern>,optA,optB,optC`.
>>>>>>
>>>>> I'm fine with #2. We're using something similar with our build tools
>>>>> and it works reasonably well.
>>>>> However, it does have one annoying corner case. There's no easy way
>>>>> to pass an option which has a comma in it. E.g. if I want to pass
>>>>> `-Wl,something,something`. Perhaps we could use a sed-like approach
>>>>> and allow changing the separator. E.g. `s/a/b/` == `s@a@b@`.
>>>>>
>>>>> --Artem
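A sed-like separator switch could be as small as "if the value starts with one of a few reserved characters, that character is the separator". A minimal Python sketch; the reserved set `:;|`, the rule itself, and the name `split_offload_value` are hypothetical and not an existing clang feature.

```python
from __future__ import annotations

# Hypothetical: a leading ':', ';' or '|' selects that character as the
# separator, in the spirit of sed's s/a/b/ == s@a@b@. Not an existing clang feature.
ALT_SEPARATORS = ":;|"


def split_offload_value(value: str) -> tuple[str, list[str]]:
    """Split '<pattern>,optA,optB' (or e.g. ':<pattern>:optA:optB') into (pattern, options)."""
    sep = ","
    if value and value[0] in ALT_SEPARATORS:
        sep, value = value[0], value[1:]
    pattern, *options = value.split(sep)
    if not pattern:
        raise ValueError("missing offload pattern")
    return pattern, [opt for opt in options if opt]


print(split_offload_value("gfx906,-O3,-fno-vectorize"))
print(split_offload_value(":gfx906:-Wl,something,something"))
```

`:` needs no shell quoting, while `;` and `|` do; either way the user only has to pick a separator that does not occur in the options being passed.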
>>>>>
>>>>>
>>>>>
>>>>>> Compared to the old options, they are more concise and more readable.
>>>>>>
>>>>>>
>>>>>>
>>>>>> The main difference is the delimiter. To me option 2 is more
>>>>>> attractive since it does not need quotations for most cases.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Can we reach an agreement on option 2?
>>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Sam
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> *From:* Artem Belevich <tra at google.com>
>>>>>> *Sent:* Tuesday, December 15, 2020 2:13 PM
>>>>>> *To:* Ben Boeckel <ben.boeckel at kitware.com>
>>>>>> *Cc:* Doerfert, Johannes <jdoerfert at anl.gov>; Liu, Yaxun (Sam) <
>>>>>> Yaxun.Liu at amd.com>; Lieberman, Ron <Ron.Lieberman at amd.com>;
>>>>>> a.bataev at hotmail.com; Chan, SiuChi <siuchi.chan at amd.com>; Searles,
>>>>>> Mark < Mark.Searles at amd.com>; cfe-dev (cfe-dev at lists.llvm.org) <
>>>>>> cfe-dev at lists.llvm.org>
>>>>>> *Subject:* Re: [cfe-dev] [RFC] Unified offloading option for
>>>>>> CUDA/HIP/OpenMP
>>>>>>
>>>>>> On Tue, Dec 15, 2020 at 10:23 AM Ben Boeckel <ben.boeckel at kitware.com> wrote:
>>>>>>
>>>>>> On Mon, Dec 14, 2020 at 14:04:43 -0800, Artem Belevich via cfe-dev wrote:
>>>>>>> It all may be an utter overkill, too. WDYT?
>>>>>> Note that tools such as ccache and sccache generally need to be
>>>>>> able to understand what's going on (I believe distcc and other
>>>>>> distributed compilation tools also generally need to know too), so
>>>>>> the design should make it possible to interpret a compilation from
>>>>>> its flags alone.
>>>>>>
>>>>>> I think this is somewhat orthogonal to how we specify per-target options.
>>>>>> Such a tool almost never knows about all possible compiler options
>>>>>> and has to pass through the unknown options as-is. However, any form
>>>>>> of 'nested' options specified on the command line may confuse
>>>>>> such a tool. E.g. if I want to pass '-E' to some sub-tool for a
>>>>>> particular offload target, ccache, not being aware that it's not a
>>>>>> top-level compilation option, may interpret it as an attempt to
>>>>>> preprocess the TU.
>>>>>>
>>>>>> I wonder if it would make sense to just move all this per-target
>>>>>> option complexity into an external response file. As far as
>>>>>> existing tools are concerned, it would look like
>>>>>> `--offload-options=target-opts.file` without affecting the tool's
>>>>>> general idea of what this compilation is about to do, and the external
>>>>>> file would allow us to be as flexible as we need to be to specify
>>>>>> per-target options. It could be just a flat list of pairs
>>>>>> `-Xarch_... optA`. Or we could use YAML.
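The "flat list of pairs" variant is simple enough to parse. A minimal Python sketch, assuming one `-Xarch_<target> <option>` pair per line and `#` comments; the file layout and the name `parse_offload_options_file` are hypothetical, and `--offload-options` is only the name floated above, not an existing clang flag.

```python
from __future__ import annotations


def parse_offload_options_file(text: str) -> list[tuple[str, str]]:
    """Parse lines of the form '-Xarch_<target> <option>' into (target, option) pairs."""
    pairs: list[tuple[str, str]] = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and '#' comments (an assumption of this sketch)
        parts = line.split(maxsplit=1)
        if len(parts) != 2 or not parts[0].startswith("-Xarch_"):
            raise ValueError(f"line {lineno}: expected '-Xarch_<target> <option>'")
        target = parts[0][len("-Xarch_"):]
        if not target:
            raise ValueError(f"line {lineno}: empty target name")
        pairs.append((target, parts[1]))  # the option text is kept verbatim
    return pairs


example = """\
# contents of a hypothetical target-opts.file
-Xarch_gfx906 -fno-vectorize
-Xarch_sm_70 -Wl,something,something
"""
print(parse_offload_options_file(example))
```

Because the option text is taken verbatim, options containing commas (such as `-Wl,...`) need no escaping, which sidesteps the separator problem of the comma-based syntax.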
>>>>>>
>>>>>> That approach, however, has its own issues and would still need to
>>>>>> be optional. If it's the only way to specify offload options, that
>>>>>> will complicate other use cases as now they would have to deal with
>>>>>> temporary files.
>>>>>>
>>>>>> Maybe a slightly modified variant of jdoerfert@'s idea would work better:
>>>>>>>>>       -offload="amd -march=gfx906 -fno-vectorize" -fopenmp
>>>>>> Implement it in a way similar to -Wl,optA,optB,optC and extend it
>>>>>> to match an offload scope glob/regex.
>>>>>>
>>>>>> E.g. `-offload=<offload-pattern>,optA,optB,optC`.
>>>>>>
>>>>>> As far as the external tools are concerned, it's just one option to
>>>>>> pass through. At the same time, it should be flexible enough to apply
>>>>>> the options to a subset of offload targets in a human-manageable way.
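To illustrate why the single-argument form is friendlier to tools like ccache: a wrapper that scans argv for top-level flags sees `-offload=gfx906,-E` as one opaque argument, whereas a space-separated nested form exposes a bare `-E`. `looks_preprocess_only` is a deliberately naive, hypothetical check, not how ccache actually inspects its arguments.

```python
from __future__ import annotations


def looks_preprocess_only(argv: list[str]) -> bool:
    """A naive top-level scan, similar in spirit to what a caching wrapper might do."""
    return "-E" in argv


# '-E' intended only for the gfx906 sub-compilation, comma form.
comma_form = ["clang", "-fopenmp", "-offload=gfx906,-E", "a.c"]
# The space-separated -Xoffload form proposed earlier in the thread.
nested_form = ["clang", "-fopenmp", "-Xoffload=gfx906", "-E", "a.c"]

print(looks_preprocess_only(comma_form), looks_preprocess_only(nested_form))
```

The comma form stays opaque (the check returns False), while the nested form exposes a bare `-E` that the naive check misreads as a request to preprocess the whole TU.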
>>>>>>
>>>>>> --
>>>>>>
>>>>>> --Artem Belevich
>>>>>>
>>
>> --
>>
>> --Artem Belevich
>>
>