[flang] [llvm] [Flang][OpenMP] Fix mapping of character type with LEN > 1 specified (PR #154172)

Mon Sep 1 18:26:40 PDT 2025

================
@@ -69,13 +84,58 @@ struct MapInfoOpConversion
       return mlir::failure();
 
     llvm::SmallVector<mlir::NamedAttribute> newAttrs;
-    mlir::omp::MapInfoOp newOp;
+    mlir::omp::MapBoundsOp mapBoundsOp;
     for (mlir::NamedAttribute attr : curOp->getAttrs()) {
       if (auto typeAttr = mlir::dyn_cast<mlir::TypeAttr>(attr.getValue())) {
         mlir::Type newAttr;
         if (fir::isTypeWithDescriptor(typeAttr.getValue())) {
           newAttr = lowerTy().convertBoxTypeAsStruct(
               mlir::cast<fir::BaseBoxType>(typeAttr.getValue()));
+        } else if (fir::isa_char_string(fir::unwrapSequenceType(
+                       fir::unwrapPassByRefType(typeAttr.getValue()))) &&
+                   !characterWithDynamicLen(
+                       fir::unwrapPassByRefType(typeAttr.getValue()))) {
+          // Characters with a LEN param are represented as char
+          // arrays/strings, the initial lowering doesn't generate
+          // bounds for these, however, we require them to map the
+          // data appropriately in the later lowering stages. This
+          // is to prevent the need for unecessary caveats
+          // specific to Flang. We also strip the array from the
+          // type so that all variations of strings are treated
+          // identically and there's no caveats or specialisations
+          // required in the later stages. As an example, Boxed
+          // char strings will emit a single char array no matter
+          // the number of dimensions caused by additional array
+          // dimensions which needs specialised for, as it differs
+          // from the non-box variation which will emit each array
+          // wrapping the character array, e.g. given a type of
+          // the same dimensions, if one is boxed, the types would
----------------
agozillon wrote:

So, given a type like the following:

 CHARACTER(LEN=16), dimension(:,:), allocatable :: char_arr

When we map it we'll emit something like the below for the base address, that has two bounds:

    %17 = omp.map.bounds lower_bound(%c0 : index) upper_bound(%16 : index) extent(%15#1 : index) stride(%15#2 : index) start_idx(%14#0 : index) {stride_in_bytes = true}
    ...
    %23 = omp.map.bounds lower_bound(%c0 : index) upper_bound(%22 : index) extent(%21#1 : index) stride(%18 : index) start_idx(%20#0 : index) {stride_in_bytes = true}
    %24 = fir.box_offset %2 base_addr : (!fir.ref<!fir.box<!fir.heap<!fir.array<?x?x!fir.char<1,16>>>>>) -> !fir.llvm_ptr<!fir.ref<!fir.array<?x?x!fir.char<1,16>>>>
    %25 = omp.map.info var_ptr(%2 : !fir.ref<!fir.box<!fir.heap<!fir.array<?x?x!fir.char<1,16>>>>>, !fir.char<1,16>) map_clauses(exit_release_or_enter_alloc) capture(ByRef) var_ptr_ptr(%24 : !fir.llvm_ptr<!fir.ref<!fir.array<?x?x!fir.char<1,16>>>>) bounds(%17, %23) -> !fir.llvm_ptr<!fir.ref<!fir.array<?x?x!fir.char<1,16>>>> {name = ""}
    ...

The full fir type is:
 
 !fir.array<?x?x!fir.char<1,16>

Which we currently peel off in lowering ourselves to:

  !fir.char<1,16>

Which converted to the LLVM dialect becomes:

   !llvm.array<16 x i8>

However, even if we didn't peel off the dynamic array extents, the lowering to LLVM dialect does that in any case, so we would end up with the same type (and problem) regardless.

So we devolve what is effectively a 3-D array to 1-D, and now we only have 2 bounds, the bounds we're missing in this case is the fir.char<1,16> which expands to the innermost array.

Another scenario occurs for:

 CHARACTER(LEN=16), dimension(10,10) :: char_arr

Where we end up with:

    %4 = omp.map.bounds lower_bound(%c0 : index) upper_bound(%c9 : index) extent(%c10 : index) stride(%c1 : index) start_idx(%c1 : index)
    %5 = omp.map.bounds lower_bound(%c0 : index) upper_bound(%c9 : index) extent(%c10 : index) stride(%c1 : index) start_idx(%c1 : index)
    %6 = omp.map.info var_ptr(%3 : !fir.ref<!fir.array<10x10x!fir.char<1,16>>>, !fir.array<10x10x!fir.char<1,16>>) map_clauses(exit_release_or_enter_alloc) capture(ByRef) bounds(%4, %5) -> !fir.ref<!fir.array<10x10x!fir.char<1,16>>> {name = "char_arr"}

Where the fir type is: 

!fir.array<10x10x!fir.char<1,16>>

And we end up with an LLVM dialect conversion of:

llvm.array<10 x array<10 x array<16 x i8>>

In this case, we actually keep the full type, but have generated two bounds for it, and neither are the innermost. But in this particular case as it's constant we could in theory calculate the size from the type, but we don't do that at the moment, future work. 

But now we have two different cases for something that we can treat identically or even the same as every other array (which is the goal of this PR), having to be special cased in the later lowering due to the FIR dialect conversion. 

The 1-D variation of this also follows a similar trend, from what I can tell, at least with the build I have just now, it's worth noting that downstream already has this patch applied for us.

So what we do in this PR is try to "canonicalize" the type in both of these cases and generate the additional bound necessary to calculate sizes and offsets where necessary. This lets us avoid special casing the lowering. In theory, we could handle this in OpenMPToLLVMIRTranslation, but special casing for the multiple variations of fortran character arrays doesn't seem like the way to go. 

A third alternative to this could be to teach the frontend to generate bounds for this type on initial lowering, but I don't like the idea of special treatment of the type for Fortran and ferrying it around everywhere, it might be confusing to some, I'd personal prefer to just generate it at the stage we're going to require it! And it also doesn't solve the inconsistency of the type lowering for these between FIR -> LLVM dialect. But I'm open to the idea of course (and any alternatives in general :-) ). 

Hopefully this and the extended comment helps explain things a little bit better, I've added some runtime tests for this that should fail without this PR, as well as expanded the comment as requested and tried to make it read a little more sanely! Also happy to talk about it in a call if that helps at all as well!


https://github.com/llvm/llvm-project/pull/154172