[flang-commits] [flang] [flang][OpenMP] Fix LASTPRIVATE for iteration variables (PR #69773)

Leandro Lupori via flang-commits flang-commits at lists.llvm.org
Mon Oct 23 10:45:35 PDT 2023


luporl wrote:

Thank you for the review @NimishMishra!

> If I understand correctly, the previous implementation was generating the `fir.if` depending on a equality check between `v` and `loopIV`. As such, the final update to the iteration variable was being missed.

Not exactly. Consider this code snippet:

```
integer :: ia
ia = 0
!$omp parallel do lastprivate(ia)
do ia = 1, 10
enddo
!$omp end parallel do
```

In the generated FIR, there will be 3 versions of the `ia` variable:

```
%ia = fir.alloca i32 {bindc_name = "ia", uniq_name = "_QFEia"}
omp.parallel {
    %loopIV = fir.alloca i32 {adapt.valuebyref, pinned}
    omp.wsloop for  (%iv) : i32 = (%lb) to (%ub) inclusive step (%step) {
    }
}
```

`%ia` - the original variable
`%loopIV` - this is the private copy of `%ia`
`%iv` - this is the iteration variable created and managed by `omp.wsloop`

So, the previous implementation was actually checking for equality between `iv` and `ub`. The problem is that `iv` goes from `lb` to `ub`, but LASTPRIVATE expects `ia` to be updated with `iv` + `step`, where `iv`'s value is that of the last loop iteration. `v` was added by this patch and is just the result of `iv` + `step`.
 
> I've just one point to raise (@kiranchandramohan can comment better). Through this approach, we have introduced a mandatory set of 1 `arith.addi`, 3 `arith.cmpi`, and 1 `arith.select` in _every_ iteration of the loop. Again, not very sure about this, but can this have a bearing on performance, since loops are extensively present across benchmarks we use for validation?

It is true that now we need to perform more operations to find out when we are in the last loop iteration. However, these are added just when LASTPRIVATE is used and is optimized to 2 instructions when `lb`, `ub` and `step` are
constant. In the example above, it becomes:
```
%v = add i32 %iv, %step
%cmp = icmp sgt i32 %v, 10
```

But when all of the loop parameters are variables, then I guess it is not always possible to optimize it.

> Not analysed this in depth, but could the following work?
> 
> ```
> %cmp = arith.cmpi eq, %u, %iv
> fir.if %cmp {
>     %temp = arith.addi %arg0, %step
>      fir.store %temp to %iv
> }
> ```
> 
> Basically, we let the current flow (generating `if` on `u = ub`) be, and add the extra operations within the `fir.if`? Would it cause some problems I have missed? I expect if the loop is well formed (say both `ub` and `step` are negative), then also the equality condition should work.

This works only when step is 1 or -1. Consider this loop:

```
ia = 0
!$omp do lastprivate(ia)
do ia = 1, 10, 4
enddo
!$omp end do
```

`iv` will have the following values: 1, 5 and 9. After exiting the loop, `ia` must be updated with 13. With this approach (and without this patch), `iv` will never be equal to `ub`, resulting in `ia` not being updated and remaining with value 0.

https://github.com/llvm/llvm-project/pull/69773


More information about the flang-commits mailing list