[PATCH] D60620: [HIP] Support target id by --offload-arch

Tue May 26 15:50:41 PDT 2020

tra added inline comments.

================
Comment at: clang/lib/Driver/ToolChains/HIP.cpp:121-123
+  auto Pos = SubArchName.find_first_of("+-");
+  if (Pos != SubArchName.npos)
+    SubArchName = SubArchName.substr(0, Pos);
----------------
yaxunl wrote:
> tra wrote:
> > Parsing should probably be extracted into a separate function to avoid replicating it all over the place.
> > 
> > I'd also propose use a different syntax for the properties.
> > * use explicit character to separate individual elements. This way splitting the properties becomes independent of what those properties are. If you decide to make properties with values or change their meaning some other way, it would not affect how you compose them.
> > * use `name=value` or `name[+-]` for individual properties. This makes it easy to parse individual properties and normalize their names. This makes property map creation independent of the property values.
> > 
> > Right now `[+-]` serves as both a separator and as the value, which would present problems if you ever need more flexible parametrization of properties. What if a property must be a number or a string. Granted, you can always encode them as a set of bools, but that's rather unpractical. 
> > 
> > E.g. something like this would work a bit better: `gfx111:foo+:bar=33:buz=string`.
> > 
> I discussed this with our team.
> 
> The target id features are not raw GPU properties. They are distilled to become a few target features to decide what the compiler should do.
> 
> Each target feature has 3 values: on, off, and default, which are encoded as +feature, -feature, and not present.
> 
> For runtime, it is simple and clear how to choose device binaries based on the target features: it will try exact match, otherwise choose the default.
> 
> For compiler, it is also simple and clear what to do for each target feature value, since they corresponding to backend target features.
> 
> Basically we expect the target id feature to be like flags, not key/value pairs. In case we do need key/value pairs, they can still use + as delimiter.
> 
> Another reason we use +/- format is that it is more in line with the format of existing clang-offload-bundler id and target triple, which uses - as delimiter.
> 
> Since the target id is an extension of offload arch and users will put it into command line, we want to make it short, concise and aesthetically appealing, we would avoid using non-alpha-numeric characters in the target id features. Target triple components have similar requirements. Using : as delimiter seems unnecessary, longer, and more difficult to read.
> 
> Consider the following example
> 
> 
> ```
> clang -offload-id gfx908+xnack-sramecc a.hip
> 
> clang -offload-id gfx908:xnack+:sramecc- a.hip
> ```
> 
> We are more inclined to keep the original format. 
You're thinking in terms what's needed by AMDGPU *now*. The scheme you're proposing is sufficient for your use case and I'm fine with that. I'm suggesting that you should consider what happens once this change lands.

The functionality you're implementing is exposed to end-users via top-level clang driver argument. This is visible to users and will be relied on.
This will make it hard to change in the future without breaking someone. It's worth making sure we're not painting ourselves in the corner here.

Also, the functionality may be useful/applicable beyond the scope of amdgpu and the binary flags will not be sufficient for everyone. The scheme you're proposing would be somewhat restrictive if I need to pass an integer value or string. We could use something like `gfx123+foo=456-bar=789` but it would look rather odd, IMO. 

Granted, none of the above is a showstopper. I guess we could support multiple formats if it comes to that, but I'd rather not multiply things later because we didn't think of them earlier.

> Another reason we use +/- format is that it is more in line with the format of existing clang-offload-bundler id and target triple, which uses - as delimiter.

The point was that commingling field separator and the field value is not the cleanest approach, IMO. I'd be fine fine with some other character.

> Since the target id is an extension of offload arch and users will put it into command line, we want to make it short, concise and aesthetically appealing, we would avoid using non-alpha-numeric characters in the target id features. Target triple components have similar requirements. Using : as delimiter seems unnecessary, longer, and more difficult to read.

The current use of `gfxXXX` seems to fit the 'short, concise & aesthetically pleasing' part of your argument much better than the proposed scheme.

Is the end user allowed to specify an arbitrary set of the features? Or is the offload-id set restricted to a smaller number of combinations (i.e. tied to particular hardware variants). I vaguely recall that in the past the problem was that AMD needed to create multiple device compilations for one GPU architecture and that didn't fit in the model used by CUDA compilation. 

Would it make sense to keep user-visible GPU arch argument as is and map each known one internally into a set of `offload-id` parameters used to create driver device-side compilations? For CUDA it will be a pass-through, for HIP it will translate single user-specified arch into multiple offload-ids. This would leave AMDGPU free to choose the way internally-used  offload-id is structured and can change it if/when it's necessary without worrying about existing users. It also keeps user-visible parameters short. The translation from gpu-arch to offload-id should be simple enough to maintain.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60620/new/

https://reviews.llvm.org/D60620