[flang-commits] [flang] [OpenMP][Flang] Add "IsolatedFromAbove" trait to omp.target (PR #67164)

Fri Oct 13 09:47:47 PDT 2023

agozillon wrote:

> I think my understanding of the bounds is different from what is being suggested by @agozillon and @kiranchandramohan.
> 
> I'll try to summarise my view below, please correct my where I am wrong.
> 
> 1. `bounds-X` -> This is the bounds that is coming from the `box` values and persists through shape information in FIR/HLFIR etc.
> 2. `bounds-y` -> This is the bounds information that is optionally supplied by the user through the pragma. Eg. !$omp target map(a(5:10)).
> 
> `bounds-x` is something that is coming from the language itself and is used to generate code, such as indices for llvm-GEP instructions and what not.
> 
> `bounds-y` is supplied by the user, and from what I can tell the only code gen it affects is the offload_sizes, and the base_ptrs etc. that are passed on to the OMPRTL kernel calls.
> 
> The bounds information associated to `map_info` that @agozillon worked on is designed to capture `bounds-y`. And if I'm correct, although, some of the information for bounds-y may be generated using the bounds-x, they aren't directly correlated. Moreover, they are not interchangeable.

For very in-depth answers to how the bounds operation is intended to work or if you wish it changed in someway (depending on what it is it might take an RFC, but regardless it'd need discussing with the OpenACC developers we share it with), then I'm likely not the best person to ask, this should probably be directed at @razvanlupusoru or @clementval .

With that disclaimer out of the way, I'll try to answer the points with my current level of understanding (and it can hopefully be corrected if/where necessary), the bounds operation often acts as an intermediary to link relevant instructions to define the bounds of an operation (such as those held in descriptors), and in certain cases it will generate instructions to define the bounds if they do not exist. This makes accessing this information extremely easy and when generated properly for both map(tofrom: a) and map(tofrom: a(2:30)) allows us to treat both cases equally and gives us one simple location to access this information, and accessing this information through the bounds op (ideally) works across multiple types such as pointers, allocatables, fixed sized and assumed arrays in the same manner. 

> In my current implementation, I haven't added the bounds field to the `map_info` for `implicit` arguments, as obviously, there is no `bounds-y` associated with them which we might want to retain.

if you mean, this example in particular:

```
subroutine omp_target_implicit_bounds(n)
   integer :: n
   integer :: a(n, 1024)
   !$omp target
      a(11,22) = 33
   !$omp end target
end subroutine omp_target_implicit_bounds
```

> I do see that for something like `!$omp target map(a : alloc)`, which is a similar case to implicitly capturing `a`, the `boundsOp` is still generated for the `map_info`. @agozillon Can you please clarify why we might need the bounds information here? And if we do indeed need it then I can add the same for implicitly captured variables as well.

it is likely unintended behavior that there is no `bounds-y`, I would expect there to be two bounds for the `map_info` for `a` generated here, one per dimension, the syntaxes omp target(a) and omp target(a(1:10) and the case where omp target is captured implicitly (as in the above example) should all generate bounds information where possible (if you explicitly map the above example it is likely it'll generate the relevant bounds with the current implicit index's inside of the extent field). We should try to emit bounds operations consistently across all cases, some reasons given in the prior paragraph. But also in the case of a pointer/allocatable, and the case where we have an assumed-size/shape array (as in the prior example) we'd also be emitting the bounds information in every single capture case regardless of sectioning being performed as it'd be required to access the descriptor information which holds the relevant runtime values so we can generate the relevant dynamic sizes and offsets trivially, all the relevant information can be accessed and handled through the bounds and a user introspecting the IR will also be able to clearly tell which information belongs to which map variable. 

> Now to summarise, it seems incorrect to me to store `bounds-x` information into the bounds field for `map_info`. And by the time MLIR is lowered to LLVM dialect, the shape information is no longer directly associated with the `alloca`. So, the mapping between the `block_args` and `bounds-x` is lost.

`bounds-x` and `bounds-y` are interlinked, the bounds operation is a uniform and easy way to access bounds information across types (and it will generate required information when necessary AFAIK), in this case, if you generate a bounds operation from `a`, it will utilise information from `bounds-x` (at least downstream it does) to populate it where possible, which then ties this information concretely to a map_info, which can then easily be accessed when lowering, which helps generate relevant operations as can be seen in: https://github.com/llvm/llvm-project/pull/68689/files#diff-2cbb5651f4570d81d55ac4198deda0f6f7341b2503479752ef2295da3774c586R1681 and some other places where you search for the bounds operation. 

I don't believe the shape information is the relevant information in this case as far as the bounds are concerned, the shape simply uses the relevant `bounds-x` information that would end up in `bounds-y` to calculate the shape. Other relevant information would also be captured in this `bounds-y` such as upper and lower bounds information. Which means that it doesn't matter if we drop this  `bounds-x` information from being used inside the kernel as we lower, and no longer need it as a block argument, we'd still have access to this information through the bounds operation, which should make the bounds operation a persistent point of access for this information throughout the lowering process.

> 
> The following two seem feasible options to me:
> 
> * We introduce a new variable to the targetOp just for holding the mapping between the `block_args` to their respective `MLIR values`. We can justify it because of the `IsolatedFromAbove` trait which forces us to have everything as block_args.
> * We continue to use the map_operands field to hold this information. Technically it works, and we can add some clever code when lowering from `MLIR` to `LLVMIR` to dispose off these unnecessary block_args and their associated map_operands.
> 
> Overall design-wise I prefer the first option. Seems like a cleaner way of handling block_args to me.

Would this problem be solved through utilisation of the bounds in all cases? as we'd still maintain access to the relevant bounds information, we might drop the utilisation of some of the block arguments inside of the kernel as in the above example where we no longer require them on further lowering, but it wouldn't really matter as we'd always have access to the required information for further lowering via the bounds operations and we could discard the unrequired block arguments/map_info operations (would be nice to not have to have them as map_info in the first place, but that sounds difficult to avoid). But perhaps I am not fully understanding the problem you're trying to solve which is entirely possible, so I appologies if that's the case.

https://github.com/llvm/llvm-project/pull/67164