[PATCH] D60620: [HIP] Support target id by --offload-arch

Wed May 27 12:29:48 PDT 2020

tra added inline comments.

================
Comment at: clang/lib/Driver/ToolChains/HIP.cpp:121-123
+  auto Pos = SubArchName.find_first_of("+-");
+  if (Pos != SubArchName.npos)
+    SubArchName = SubArchName.substr(0, Pos);
----------------
yaxunl wrote:
> tra wrote:
> > yaxunl wrote:
> > > tra wrote:
> > > > Parsing should probably be extracted into a separate function to avoid replicating it all over the place.
> > > > 
> > > > I'd also propose use a different syntax for the properties.
> > > > * use explicit character to separate individual elements. This way splitting the properties becomes independent of what those properties are. If you decide to make properties with values or change their meaning some other way, it would not affect how you compose them.
> > > > * use `name=value` or `name[+-]` for individual properties. This makes it easy to parse individual properties and normalize their names. This makes property map creation independent of the property values.
> > > > 
> > > > Right now `[+-]` serves as both a separator and as the value, which would present problems if you ever need more flexible parametrization of properties. What if a property must be a number or a string. Granted, you can always encode them as a set of bools, but that's rather unpractical. 
> > > > 
> > > > E.g. something like this would work a bit better: `gfx111:foo+:bar=33:buz=string`.
> > > > 
> > > I discussed this with our team.
> > > 
> > > The target id features are not raw GPU properties. They are distilled to become a few target features to decide what the compiler should do.
> > > 
> > > Each target feature has 3 values: on, off, and default, which are encoded as +feature, -feature, and not present.
> > > 
> > > For runtime, it is simple and clear how to choose device binaries based on the target features: it will try exact match, otherwise choose the default.
> > > 
> > > For compiler, it is also simple and clear what to do for each target feature value, since they corresponding to backend target features.
> > > 
> > > Basically we expect the target id feature to be like flags, not key/value pairs. In case we do need key/value pairs, they can still use + as delimiter.
> > > 
> > > Another reason we use +/- format is that it is more in line with the format of existing clang-offload-bundler id and target triple, which uses - as delimiter.
> > > 
> > > Since the target id is an extension of offload arch and users will put it into command line, we want to make it short, concise and aesthetically appealing, we would avoid using non-alpha-numeric characters in the target id features. Target triple components have similar requirements. Using : as delimiter seems unnecessary, longer, and more difficult to read.
> > > 
> > > Consider the following example
> > > 
> > > 
> > > ```
> > > clang -offload-id gfx908+xnack-sramecc a.hip
> > > 
> > > clang -offload-id gfx908:xnack+:sramecc- a.hip
> > > ```
> > > 
> > > We are more inclined to keep the original format. 
> > You're thinking in terms what's needed by AMDGPU *now*. The scheme you're proposing is sufficient for your use case and I'm fine with that. I'm suggesting that you should consider what happens once this change lands.
> > 
> > The functionality you're implementing is exposed to end-users via top-level clang driver argument. This is visible to users and will be relied on.
> > This will make it hard to change in the future without breaking someone. It's worth making sure we're not painting ourselves in the corner here.
> > 
> > Also, the functionality may be useful/applicable beyond the scope of amdgpu and the binary flags will not be sufficient for everyone. The scheme you're proposing would be somewhat restrictive if I need to pass an integer value or string. We could use something like `gfx123+foo=456-bar=789` but it would look rather odd, IMO. 
> > 
> > Granted, none of the above is a showstopper. I guess we could support multiple formats if it comes to that, but I'd rather not multiply things later because we didn't think of them earlier.
> > 
> > > Another reason we use +/- format is that it is more in line with the format of existing clang-offload-bundler id and target triple, which uses - as delimiter.
> > 
> > The point was that commingling field separator and the field value is not the cleanest approach, IMO. I'd be fine fine with some other character.
> > 
> > > Since the target id is an extension of offload arch and users will put it into command line, we want to make it short, concise and aesthetically appealing, we would avoid using non-alpha-numeric characters in the target id features. Target triple components have similar requirements. Using : as delimiter seems unnecessary, longer, and more difficult to read.
> > 
> > The current use of `gfxXXX` seems to fit the 'short, concise & aesthetically pleasing' part of your argument much better than the proposed scheme.
> > 
> > Is the end user allowed to specify an arbitrary set of the features? Or is the offload-id set restricted to a smaller number of combinations (i.e. tied to particular hardware variants). I vaguely recall that in the past the problem was that AMD needed to create multiple device compilations for one GPU architecture and that didn't fit in the model used by CUDA compilation. 
> > 
> > Would it make sense to keep user-visible GPU arch argument as is and map each known one internally into a set of `offload-id` parameters used to create driver device-side compilations? For CUDA it will be a pass-through, for HIP it will translate single user-specified arch into multiple offload-ids. This would leave AMDGPU free to choose the way internally-used  offload-id is structured and can change it if/when it's necessary without worrying about existing users. It also keeps user-visible parameters short. The translation from gpu-arch to offload-id should be simple enough to maintain.
> > 
> > 
> After discussion, we decided to adopt the format you proposed. The rationale is that we want target id to be treated as an extended `--offload-arch` option, which means it needs to be able to accept all existing and future CUDA arch names. Using `:` as delimiter should be tolerant enough whereas `+/-` is not.
> 
> Also I will try introducing -offload-target-id for this option.
> 
> The features that can be used in target id are restricted to a few predefined features for each GPU arch, because both compiler and runtime needs to know how to handle them.
> 
> I am not sure if I understand your last question. With the new format we should be able to use any CUDA arch names as target id, therefore we no longer need a map. Also we need to pass each target id as a whole option since we need to use it as an id for the device binary for each device compilation.
> we want target id to be treated as an extended --offload-arch option
> Also I will try introducing -offload-target-id for this option.

Do we need a new option? I think it may be a natural extension of the `--offload-arch` where all currently used options will still be parsed correctly as an arch without extra features. The tests in the last revision of this patch look reasonable: 

```
// ...
// RUN:   -x hip --offload-arch=gfx908 \
// RUN:   --offload-arch=gfx908:sramecc+:xnack+ 
```

Does this mean that HIP will create two compilation passes -- one for `gfx908` and one for `gfx908:sramecc+:xnack+` ?
Or does it mean that the first line is ignored if you get a more detailed offload arch?

One thing you'll need is a way to normalize the arch+features tuple so we can compare them. 

> The features that can be used in target id are restricted to a few predefined features for each GPU arch, because both compiler and runtime needs to know how to handle them.

What I mean -- are users free to speficy any combination of {feature[+-]} and would it be expected for all/most of them to make sense to the user?
Or does it only make sense for a few specific arch:featureA+:featureB- combinations?
If we only have a limited set of valid combinations, it would make sense to give users easy-to-use names.

I.e. if the only valid ids for gfx111  are `gfx111:foo+:bar-` and `gfx111:buz+`, we could call them `gfx111a` and `gfx111b` and expand it into the right set of features ourselves without relying on the users not to make a typo.

> I am not sure if I understand your last question. With the new format we should be able to use any CUDA arch names as target id, therefore we no longer need a map. Also we need to pass each target id as a whole option since we need to use it as an id for the device binary for each device compilation.

What I'm saying is that maybe we should not expose detailrd features to the end user directly (or by default). Allow them to use friendly GPU names and normalize them internally into an offload ID or a set of IDs.

E.g. right now we specify offload-arch and create one device compilation per specified offload arch.
This patch proposed to make offload-arch more nuanced, but otherwise keeps the machinery the same.
What I'm suggesting is this:

* Normalize each offload-arch argument into a list of build IDs. For CUDA it will just map each arch to a singleton list.
   For AMDGPU, it will expand friendly names into lists of offload-IDs they represent, and into singleton with a single normalized offload ID otherwise.
* do similar normalization for `--no-offload-arch`
* concatenate all enabled offload IDs.
* use the list of offload-ids to drive device compilation pass creation.

As far as the end users are concerned, they can keep using whatever --offload-arch flags they are using now.
If building with --offload-arch=gfx908 requires actually building two GPU objects, it will all be handled transparently by the driver. If they need something specific, it's doable with --offload-arch=gfx908:featureA+ which will build for that variant only.

Would this fit your use case? If not, what do I miss? Could you give me more examples of how do you see offload-id being used?

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60620/new/

https://reviews.llvm.org/D60620