[Openmp-commits] [flang] [openmp] [mlir] [Flang][OpenMP] Initial mapping of Fortran pointers and allocatables for target devices (PR #71766)

Mon Nov 13 12:50:42 PST 2023

agozillon wrote:

This is just me trying to expand on some of the choices behind this PR (p.s. sorry for the overtly large message), there might be better ways of doing this and I would love to hear them if there are, and other suggestions and questions are also great. I will say that I have tried this PR in a few different ways:

- I've tried to map just the BaseAddr (posed some problems, more information further down). 
- I've also tried a specialised PFT lowering for pointers and allocatables where we map both the BaseAddr and the Descriptor (original symbol) and treat them as a structure containing a member where both are implicitly mapped when mapping allocatables/pointers, this was a little more generalized and it could be treated like a structure/derived type (unsure if it would remain this way as we've not got derived type member mapping yet), the main issue was the utilisation of varPtrPtr in map_info to create an explicit Parent <-> MemberOf link (perhaps another field could be added for this type of link? Or perhaps there's a way to tell what is a member of something without this information during the lowering?) which is then sorted into groupings in the OpenMPToLLVMIRTranslation for appropriate code generation. 
- And I've now tried this method, which maps just the original symbol (descriptor) and I'll try mention a few of the reasons behind some of the change decisions below (if I miss anything you have questions on just ask and I'm happy to elaborate)

Why the need for an specialised OpenMP map_info conversion inside of CodeGen for FIR/HLFIR?

The lowering of mapped values from OpenMP to LLVM-IR requires the underlying type to generate the appropriate instructions for offloading data to devices.

With the move to opaque pointers we now must maintain a seperate MLIR type inside of the omp.map_info which contains the underlying type beneath the opaque pointers. Currently, the descriptor types for allocatables and pointers (and other types possibly) will devolve to opaque pointers using the default omp.map_info lowering inside of the OpenMP dialect as they go through the more generalized FIR lowering process.

This patch seeks to fix that by creating a specialised type conversion for omp.map_info that uses a type lowering similar to the method that the specialised box/embox operations use. This allows the appropriate type lowering for descriptors.

This is unfortunately a required addition as we have no access to the specific FIR type lowering we require in the OpenMP dialect lowering as FIR/HLFIR is not part of the MLIR project like OpenMP currently is.

We also cannot get the type information reliably from another lowered operation in all cases, for example, in the case where we are passing an allocatable as an argument to a function and then mapping that function argument there appears to be no allocation where we can reliably retrieve the type from (in cases such as it being a global or a locally defined variable it is possible, but breaks from how we currently retrieve the LLVM type for other Fortran types). 

Why the need for is_fortran_allocatable?

This patch creates and toggles a newly added is_fortran_allocatable parameter that is part of the omp.map_info operation when a mapped symbol is an allocatable or pointer, this helps with lowering of these mapped types. 

This allows specialised lowering in OpenMPToLLVMIRTranslation for these types, we can assume certain things, such as the first 
element of the structure being the BaseAddr we wish to offload, where to get the element size of the held type, that we always
wish to map the descriptor structure with a To mapping etc. Some of the data can be gathered from the BoundsOp without having the is_fortran_allocatable toggle, such as the upper bound and lower bounds, but other things like the element type and placement of the data in the structure cannot be (the latter has nothing to do with the BoundsOp, but perhaps the former is a reasonable thing to add to it, to more consistently be able to use BoundsOp in these calculations, I am unsure however).

It also currently facilitates the specialised lowering of the type tied to map_info from box type -> descriptor llvm structure. I am 
unsure how we'd tell if it was a descriptor type needing specialised lowering without the is_fortran_allocatable toggle currently. 

As perhaps a tangential statement OpenMP isn't a language agnostic standard either, there is Fortran and C++ specifc rulings and whilst I most definitely believe we should keep as much of this up-to and contained in the relevant Frontend, I'm unsure it will always be possible. However, this is possibly not one of those cases and itmay be able to be made agnostic with some nudges in the right direction.

Why utilise the original symbol (descriptor and pointer to data) as opposed to the BaseAddr (pointer to data)?

Currently (and probably always, as I think the future specification iteration requests descriptor mapping to some extent, but I might be misunderstanding the change set) the default lowering of the kernel expects the descriptor as well as the underlying data to be mapped, the generated kernel directly accesses the descriptor for calculations such as indexing. 

The main symbol/variable being accessed inside of the kernel is also the original descriptor, and there's a portion of kernel lowering for device where we need the original descriptor SSA/global used inside of the kernel so that we can remap all uses of it to the newly generated kernel argument (or derived accessor instructions to this kernel argument) that corresponds
to it. Which works perfectly fine if we map the original symbol, but if we map the BaseAddr, which is a BoxAddrOf operation to the BoxAddr it becomes a little trickier as we can't traceback to the original symbol (global or local allocation) we need, as every BoxAddrOf operation and a lot of other box related operations (e.g. box/embox) currently generates a new allocation 
and the map_info that previously pointed to the BaseAddr linked to the original descriptor instruction (global, allocatable or function argument) no longer points to it, but points to one of the many allocations created during lowering, and then we can no longer rewrite it's kernel uses.

When testing with a mix of target + enter/exit, the alloca SSA each map_info was tracking was also completely different due to each BoxAddrOf generating it's own alloca, this could also prove to be a bit of an issue when mapping data across.  

There might be a way around this I can't think of, but it seems we need to map the descriptor rather than the BaseAddr (internal pointer of the descriptor) currently.

Why specialised lowering for these types?

Regardless of whether we store the BaseAddr (pointer in descriptor) or the Box (descriptor) in the map_info, we need a specialised lowering for it in the MLIR -> LLVM-IR phase triggered by is_fortran_allocatable. This is because we need to gather a lot of information that we assume exists if it's a Fortran type and we can't assume it about other lowered types. It'd be the same for both cases, if we map the BaseAddr, we need to be aware that we need to map the descriptor that contains it (alongside how to gather other relevant information for the lowering) and if we map the descriptor, we need to know how to access the internal pointer that points to the data (and then how to get the contained data's element type size).

The way that would make the lowering perhaps more frontend agnostic/generalised is likely to generate two map_info's per allocatable, one for the BaseAddr and one for the descriptor implicitly, as if a user had written:

map(tofrom: descriptor, descriptor->data) 

This way we could hopefully treat it as a derived type/structure member mapping from every phase after the PFT lowering. However, there may be some corner cases I am missing, as we do not have the full derived type/structure member mapping flow in place yet (the current allocatable mapping is a specialisation of it and I did previously have this more generalised method in place, but never actually tested it with derived types as that's one of my next steps, that or c_ptrs, which may also require a specialised lowering to deal with the opaque structure wrapping them). And in this generalised lowering case I'm not too sure how we'd tell if a type captured by map_info requires the specialised descriptor lowering to attain the descriptor type
without the is_fortran_allocatable flag. 

I also think this more generalised route (and perhaps derived types too) needs some way of indicating if something is a member of a parent structure, e.g. some kind of Parent <-> Child mapping argument for the map_info operation, the previous patch I had up tried to do so using the varPtrPtr argument, but that was obviously the incorrect thing to use. So, we'd need some other way of indicating this relationship between mapped variables or maybe there's a way to do this later in the lowering that I'm unaware of (there's a lot I'm unaware of, so if anyone does know a method that'd bypass adding another map_info operand, please do speak up, I'd be happy to try it!).

Why no BoundsOp usage currently? 

I addressed this in another comment, but to try minimize difficulty tracking it all, we could use the BoundsOp for most of the calculations in this patch, the main thing the BoundsOp doesn't provide currently that it might be able to incorporate (but perhaps it shouldn't) is the element type, which I access the descriptor directly for, you could use getStride in certain cases but I imagine getStride will not always be the exact element type size if a user specifies a non-default stride (and it's only useable when it's specified in bytes). So, for this iteration of the patch I just opted to directly use the descriptor to keep it consistent, as we can't consistently use the BoundsOp. The other descriptor access that isn't covered by BoundsOp and shouldn't be, is accessing the pointer itself inside of the descriptor.

I might have missed some other points here, I should have taken a more profuse set of notes while developing this patch, so I'm sorry about that, if something else comes to me, I'll add it as a subsequent comment.

https://github.com/llvm/llvm-project/pull/71766