<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/72678>72678</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            mlir::DataLayout usage of unsigned is too small for applications with huge arrays
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            mlir
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          jeanPerier
      </td>
    </tr>
</table>

<pre>
    LLVM uses `llvm::TypeSize` which wraps a `uint64_t` in `llvm::DataLayout` ([here](https://github.com/llvm/llvm-project/blob/764c3afd43128f7ccddb070953c330b340ebe811/llvm/include/llvm/IR/DataLayout.h#L629)), but `mlir::DataLayout` uses `unsigned` ([here](https://github.com/llvm/llvm-project/blob/2310066faab21996a52513c5475552f0e28f0624/mlir/include/mlir/Interfaces/DataLayoutInterfaces.h#L37)).

This is problematic for the change that LLVM requested of flang: to stop generating GEP to compute type sizes in constants (https://github.com/llvm/llvm-project/issues/71507).

Fortran sometimes has huge arrays with static size, and the size of some `llvm.array` types does not fit in an `unsigned`, so moving from GEP to `mlir::DataLayout` causes bugs in such programs.

Here is an example of a bug that can occur in an MLIR transformation with big arrays (it is not what we are hitting with flang, but it is a simple MLIR illustration using only the LLVM dialect):

```
llvm.func @test_byval(%ptr : !llvm.ptr) {
 llvm.call @with_byval_arg(%ptr) : (!llvm.ptr) -> ()
 llvm.return
}

llvm.func @with_byval_arg(%ptr : !llvm.ptr { llvm.byval = !llvm.array<1000 x array<1000 x array <500 x i32>>> }) {
 llvm.return
}
``` 

`mlir-opt -inline` translates this to the following program, where only `389 387 264` bytes are copied from an `!llvm.array<1000 x array<1000 x array <500 x i32>>>` instead of `2 000 000 000` bytes:

```
llvm.func @test_byval(%arg0: !llvm.ptr) {
  %0 = llvm.mlir.constant(389387264 : i64) : i64
  %1 = llvm.mlir.constant(1 : i64) : i64
  %2 = llvm.alloca %1 x !llvm.array<1000 x array<1000 x array<500 x i32>>> {alignment = 4 : i64} : (i64) -> !llvm.ptr
  "llvm.intr.memcpy"(%2, %arg0, %0) <{isVolatile = false}> : (!llvm.ptr, !llvm.ptr, i64) -> ()
  llvm.return
}
```
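
The `389387264` constant above can be reproduced with plain integer arithmetic. The sketch below is not MLIR code, and the helper names `arraySizeInBits` and `bytesViaNarrowBits` are hypothetical. It assumes that the size is first computed in bits and narrowed to a 32-bit `unsigned` before being converted to bytes. The issue does not state this, but the result matches the observed value exactly: the byte count itself (`2 000 000 000`) would still fit in 32 bits, while the bit count (`16 000 000 000`) does not.

```cpp
#include <cstdint>

// Hypothetical helpers that only model the arithmetic; they are not part of MLIR.

// Exact size in bits of an array, kept in 64 bits so nothing is lost.
constexpr std::uint64_t arraySizeInBits(std::uint64_t numElements,
                                        std::uint64_t elementBits) {
  return numElements * elementBits;
}

// Assumed failure mode: the bit count is stored in a 32-bit unsigned
// (wrapping modulo 2^32) and only then rounded up to whole bytes.
constexpr std::uint64_t bytesViaNarrowBits(std::uint64_t sizeInBits) {
  return (static_cast<std::uint64_t>(static_cast<std::uint32_t>(sizeInBits)) + 7) / 8;
}

// !llvm.array<1000 x array<1000 x array<500 x i32>>>
constexpr std::uint64_t kNumElements = std::uint64_t{1000} * 1000 * 500;
constexpr std::uint64_t kBits = arraySizeInBits(kNumElements, 32);

static_assert(kBits == 16000000000ULL, "exact size in bits");
static_assert(kBits / 8 == 2000000000ULL, "expected size in bytes");
static_assert(bytesViaNarrowBits(kBits) == 389387264ULL,
              "matches the memcpy length produced by the inliner");
```

If this is the mechanism, any type larger than 512 MiB (2^32 bits) can be affected, not only types whose byte size exceeds 32 bits.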
</pre>