[flang-commits] [flang] [Flang] Apply nusw nuw flags on array_coor gep's (PR #184573)

Wed Mar 4 01:30:38 PST 2026

https://github.com/Stylie777 created https://github.com/llvm/llvm-project/pull/184573

Since #110060, `nsw` has been applied to the arithmetic operations generated when lowering array subscripts to LLVM IR. Until now, this covered only the arithmetic and not the related getelementptr instructions.

The original Discourse thread noted that `nsw` helped with vectorisation later in the pipeline. Changes to the BasicAA pipeline have led to vectorisation no longer being applied where the absence of wrapping cannot be guaranteed for array_coor instructions. Applying the `nusw nuw` flags to the GEPs enables vectorisation in the middle end. The supporting arithmetic instructions are also marked `nuw` to ensure instcombine does not remove these flags when transforming instructions.
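To make the meaning of `nuw` on the offset arithmetic concrete, here is a minimal standalone sketch. It is not the Flang lowering code. It models the i64 operations with unsigned 64-bit integers for a contiguous rank-2 array with lower bound 1 and unit step, and records whether any multiply or add wrapped. The names `OffsetResult`, `mulChecked`, `addChecked` and `linearOffset2D` are hypothetical.

```cpp
#include <cstdint>

// Offset in elements, plus whether every mul/add stayed free of unsigned wrap.
struct OffsetResult {
  std::uint64_t offset;
  bool noUnsignedWrap;
};

// a * b, clearing `ok` when the exact product does not fit in 64 bits.
constexpr std::uint64_t mulChecked(std::uint64_t a, std::uint64_t b, bool &ok) {
  if (a != 0 && b > UINT64_MAX / a)
    ok = false;
  return a * b;
}

// a + b, clearing `ok` when the exact sum does not fit in 64 bits.
constexpr std::uint64_t addChecked(std::uint64_t a, std::uint64_t b, bool &ok) {
  if (b > UINT64_MAX - a)
    ok = false;
  return a + b;
}

// Column-major offset of the 1-based subscript (i, j) in an n1 x n2 array.
// Assumes lower bound 1 and step 1 in both dimensions.
constexpr OffsetResult linearOffset2D(std::int64_t i, std::int64_t j,
                                      std::int64_t n1, std::int64_t n2) {
  const std::int64_t index[2] = {i, j};
  const std::int64_t extent[2] = {n1, n2};
  bool ok = true;
  std::uint64_t offset = 0;
  std::uint64_t prevExt = 1;
  for (int d = 0; d < 2; ++d) {
    // Signed subtraction of the lower bound, then viewed as unsigned.
    std::uint64_t idx = static_cast<std::uint64_t>(index[d] - 1);
    std::uint64_t diff = mulChecked(idx, 1, ok);       // multiply by step
    std::uint64_t sc = mulChecked(diff, prevExt, ok);  // scale by stride
    offset = addChecked(sc, offset, ok);
    prevExt = mulChecked(prevExt, static_cast<std::uint64_t>(extent[d]), ok);
  }
  return OffsetResult{offset, ok};
}
```

For an in-bounds subscript such as (2, 3) in a 4 x 5 array the offset is 9 and nothing wraps. A subscript below the lower bound in the second dimension, such as (1, 0), makes the scaled multiply wrap in unsigned arithmetic. That is the kind of situation in which a `nuw` flag would not hold.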

This is only applied in specific circumstances, where it is deemed safe to assume no unsigned wrapping. Applying the flags where the absence of wrapping cannot be guaranteed, such as with slices or dynamic sizes, may lead to miscompilation. In these cases, the existing behaviour from PR #110060 is kept.
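The eligibility check can be sketched outside the compiler as a plain predicate. This mirrors the `canUseNuw` condition in the patch below. The `ArrayCoorInfo` struct and its field names are hypothetical stand-ins for the properties the conversion pattern queries and are not part of the FIR API.

```cpp
// Hypothetical summary of one array_coor operation (illustrative names only).
struct ArrayCoorInfo {
  bool baseIsBoxed;     // base address comes from a descriptor (fir.box)
  bool isShifted;       // shift operands are present
  bool isSliced;        // slice operands are present
  bool hasSubcomponent; // a subcomponent path is present
  bool hasLenParams;    // length type parameters are present
  bool hasShape;        // an explicit shape is present
};

// True only when every condition listed in the patch holds, so that
// `nuw` on add/mul and `nusw nuw` on the GEP would be applied.
constexpr bool canUseNuw(const ArrayCoorInfo &c) {
  return !c.baseIsBoxed && !c.isShifted && !c.isSliced &&
         !c.hasSubcomponent && !c.hasLenParams && c.hasShape;
}
```

Under this sketch an unboxed array with an explicit shape and no shift, slice, subcomponent or length parameters qualifies. Setting any one of the other properties falls back to the `nsw`-only behaviour.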

This patch has been verified using the following with no regressions:
- llvm-test-suite
- Fujitsu test suite
- Various open-source HPC applications

Original Discourse thread: https://discourse.llvm.org/t/rfc-add-nsw-flags-to-arithmetic-integer-operations-using-the-option-fno-wrapv/77584

>From 90ce9edc20dbb76def741d6aa9bf317a22733cb7 Mon Sep 17 00:00:00 2001
From: Jack Styles <jack.styles at arm.com>
Date: Fri, 27 Feb 2026 12:27:07 +0000
Subject: [PATCH] [Flang] Apply nusw nuw flags on array_coor gep's

When generating the LLVM IR, since #110060, `nsw` is applied to
operations when lowering the subscripts. This was, up until now,
only applied to arithmetic, and not the related getelementptr's.

The original Discourse thread noted that `nsw` helped with vectorisation
later in the pipeline. Changes to the BasicAA pipeline have led to
vectorisation no longer being applied where the absence of wrapping
cannot be guaranteed for array_coor instructions. Applying the
`nusw nuw` flags to the GEPs enables vectorisation in the middle end.
The supporting arithmetic instructions are also marked `nuw` to ensure
instcombine does not remove these flags when transforming instructions.

This is only applied in specific circumstances, where it is deemed
safe to assume no unsigned wrapping. Applying the flags where the
absence of wrapping cannot be guaranteed, such as with slices or
dynamic sizes, may lead to miscompilation. In these cases, the existing
behaviour from PR #110060 is kept.

This patch has been verified using the following with no regressions:
- llvm-test-suite
- Fujitsu test suite
- Various open-source HPC applications

Original Discourse thread: https://discourse.llvm.org/t/rfc-add-nsw-flags-to-arithmetic-integer-operations-using-the-option-fno-wrapv/77584
---
 flang/lib/Optimizer/CodeGen/CodeGen.cpp       | 60 ++++++++++++-------
 flang/test/Fir/convert-to-llvm.fir            | 16 ++---
 flang/test/Fir/tbaa-codegen2.fir              |  2 +-
 flang/test/HLFIR/no-block-merging.fir         |  2 +-
 .../Integration/OpenMP/private-global.f90     |  8 +--
 flang/test/Integration/ivdep.f90              | 56 ++++++++---------
 flang/test/Integration/prefetch.f90           |  2 +-
 7 files changed, 83 insertions(+), 63 deletions(-)

diff --git a/flang/lib/Optimizer/CodeGen/CodeGen.cpp b/flang/lib/Optimizer/CodeGen/CodeGen.cpp
index 1c5cc3cc1b306..37590e6cb3198 100644
--- a/flang/lib/Optimizer/CodeGen/CodeGen.cpp
+++ b/flang/lib/Optimizer/CodeGen/CodeGen.cpp
@@ -2713,6 +2713,25 @@ struct XArrayCoorOpConversion
         baseIsBoxed ? getBoxTypePair(coor.getMemref().getType()) : TypePair{};
     mlir::LLVM::IntegerOverflowFlags nsw =
         mlir::LLVM::IntegerOverflowFlags::nsw;
+    mlir::LLVM::IntegerOverflowFlags nuw =
+        mlir::LLVM::IntegerOverflowFlags::nuw;
+    mlir::LLVM::IntegerOverflowFlags subFlags = nsw;
+    mlir::LLVM::IntegerOverflowFlags addMulFlags = nsw;
+    mlir::LLVM::GEPNoWrapFlags gepFlags = mlir::LLVM::GEPNoWrapFlags::none;
+
+    // In certain cases, where unsigned wrapping is known not to occur, we
+    // can apply the nuw flag to Add/Mul operations, and `nusw nuw` flags to
+    // getelementptr's. By doing so, this enables better optimization through
+    // slp-vectorizer later in the LLVM pipeline.
+    const bool canUseNuw = !baseIsBoxed && !isShifted && !isSliced &&
+                           coor.getSubcomponent().empty() &&
+                           coor.getLenParams().empty() &&
+                           !coor.getShape().empty();
+    if (canUseNuw) {
+      addMulFlags = addMulFlags | nuw;
+      gepFlags =
+          mlir::LLVM::GEPNoWrapFlags::nusw | mlir::LLVM::GEPNoWrapFlags::nuw;
+    }
 
     // For each dimension of the array, generate the offset calculation.
     for (unsigned i = 0; i < rank; ++i, ++indexOffset, ++shapeOffset,
@@ -2734,15 +2753,16 @@ struct XArrayCoorOpConversion
           step = integerCast(loc, rewriter, idxTy, operands[sliceOffset + 2]);
       }
       auto idx =
-          mlir::LLVM::SubOp::create(rewriter, loc, idxTy, index, lb, nsw);
-      mlir::Value diff =
-          mlir::LLVM::MulOp::create(rewriter, loc, idxTy, idx, step, nsw);
+          mlir::LLVM::SubOp::create(rewriter, loc, idxTy, index, lb, subFlags);
+      mlir::Value diff = mlir::LLVM::MulOp::create(rewriter, loc, idxTy, idx,
+                                                   step, addMulFlags);
       if (normalSlice) {
         mlir::Value sliceLb =
             integerCast(loc, rewriter, idxTy, operands[sliceOffset]);
-        auto adj =
-            mlir::LLVM::SubOp::create(rewriter, loc, idxTy, sliceLb, lb, nsw);
-        diff = mlir::LLVM::AddOp::create(rewriter, loc, idxTy, diff, adj, nsw);
+        auto adj = mlir::LLVM::SubOp::create(rewriter, loc, idxTy, sliceLb, lb,
+                                             subFlags);
+        diff = mlir::LLVM::AddOp::create(rewriter, loc, idxTy, diff, adj,
+                                         addMulFlags);
       }
       // Update the offset given the stride and the zero based index `diff`
       // that was just computed.
@@ -2750,21 +2770,21 @@ struct XArrayCoorOpConversion
         // Use stride in bytes from the descriptor.
         mlir::Value stride =
             getStrideFromBox(loc, baseBoxTyPair, operands[0], i, rewriter);
-        auto sc =
-            mlir::LLVM::MulOp::create(rewriter, loc, idxTy, diff, stride, nsw);
-        offset =
-            mlir::LLVM::AddOp::create(rewriter, loc, idxTy, sc, offset, nsw);
+        auto sc = mlir::LLVM::MulOp::create(rewriter, loc, idxTy, diff, stride,
+                                            addMulFlags);
+        offset = mlir::LLVM::AddOp::create(rewriter, loc, idxTy, sc, offset,
+                                           addMulFlags);
       } else {
         // Use stride computed at last iteration.
-        auto sc =
-            mlir::LLVM::MulOp::create(rewriter, loc, idxTy, diff, prevExt, nsw);
-        offset =
-            mlir::LLVM::AddOp::create(rewriter, loc, idxTy, sc, offset, nsw);
+        auto sc = mlir::LLVM::MulOp::create(rewriter, loc, idxTy, diff, prevExt,
+                                            addMulFlags);
+        offset = mlir::LLVM::AddOp::create(rewriter, loc, idxTy, sc, offset,
+                                           addMulFlags);
         // Compute next stride assuming contiguity of the base array
         // (in element number).
         auto nextExt = integerCast(loc, rewriter, idxTy, operands[shapeOffset]);
         prevExt = mlir::LLVM::MulOp::create(rewriter, loc, idxTy, prevExt,
-                                            nextExt, nsw);
+                                            nextExt, addMulFlags);
       }
     }
 
@@ -2777,7 +2797,7 @@ struct XArrayCoorOpConversion
           getBaseAddrFromBox(loc, baseBoxTyPair, operands[0], rewriter);
       llvm::SmallVector<mlir::LLVM::GEPArg> args{offset};
       auto addr = mlir::LLVM::GEPOp::create(rewriter, loc, llvmPtrTy, byteTy,
-                                            base, args);
+                                            base, args, gepFlags);
       if (coor.getSubcomponent().empty()) {
         rewriter.replaceOp(coor, addr);
         return mlir::success();
@@ -2802,8 +2822,8 @@ struct XArrayCoorOpConversion
           operands.slice(coor.getSubcomponentOperandIndex(),
                          coor.getSubcomponent().size()));
       args.append(indices.begin(), indices.end());
-      rewriter.replaceOpWithNewOp<mlir::LLVM::GEPOp>(coor, llvmPtrTy,
-                                                     elementType, addr, args);
+      rewriter.replaceOpWithNewOp<mlir::LLVM::GEPOp>(
+          coor, llvmPtrTy, elementType, addr, args, gepFlags);
       return mlir::success();
     }
 
@@ -2825,7 +2845,7 @@ struct XArrayCoorOpConversion
           auto length = integerCast(loc, rewriter, idxTy,
                                     operands[coor.getLenParamsOperandIndex()]);
           offset = mlir::LLVM::MulOp::create(rewriter, loc, idxTy, offset,
-                                             length, nsw);
+                                             length, addMulFlags);
         } else {
           TODO(loc, "compute size of derived type with type parameters");
         }
@@ -2841,7 +2861,7 @@ struct XArrayCoorOpConversion
       args.append(indices.begin(), indices.end());
     }
     rewriter.replaceOpWithNewOp<mlir::LLVM::GEPOp>(
-        coor, llvmPtrTy, gepObjectType, adaptor.getMemref(), args);
+        coor, llvmPtrTy, gepObjectType, adaptor.getMemref(), args, gepFlags);
     return mlir::success();
   }
 };
diff --git a/flang/test/Fir/convert-to-llvm.fir b/flang/test/Fir/convert-to-llvm.fir
index 4108b3b11e2b9..9c6bbd2f766b0 100644
--- a/flang/test/Fir/convert-to-llvm.fir
+++ b/flang/test/Fir/convert-to-llvm.fir
@@ -2224,10 +2224,10 @@ func.func @ext_array_coor0(%arg0: !fir.ref<!fir.array<?xi32>>) {
 // CHECK:         %[[C1:.*]] = llvm.mlir.constant(1 : i64) : i64
 // CHECK:         %[[C0_1:.*]] = llvm.mlir.constant(0 : i64) : i64
 // CHECK:         %[[IDX:.*]] = llvm.sub %[[C0]], %[[C1]] overflow<nsw> : i64
-// CHECK:         %[[DIFF0:.*]] = llvm.mul %[[IDX]], %[[C1]] overflow<nsw> : i64
-// CHECK:         %[[SC:.*]] = llvm.mul %[[DIFF0]], %[[C1]]  overflow<nsw> : i64
-// CHECK:         %[[OFFSET:.*]] = llvm.add %[[SC]], %[[C0_1]]  overflow<nsw> : i64
-// CHECK:         %{{.*}} = llvm.getelementptr %[[ARG0]][%[[OFFSET]]] : (!llvm.ptr, i64) -> !llvm.ptr, i32
+// CHECK:         %[[DIFF0:.*]] = llvm.mul %[[IDX]], %[[C1]] overflow<nsw, nuw> : i64
+// CHECK:         %[[SC:.*]] = llvm.mul %[[DIFF0]], %[[C1]]  overflow<nsw, nuw> : i64
+// CHECK:         %[[OFFSET:.*]] = llvm.add %[[SC]], %[[C0_1]]  overflow<nsw, nuw> : i64
+// CHECK:         %{{.*}} = llvm.getelementptr nusw|nuw %[[ARG0]][%[[OFFSET]]] : (!llvm.ptr, i64) -> !llvm.ptr, i32
 
 // Conversion with shift and slice.
 
@@ -2264,10 +2264,10 @@ func.func @ext_array_coor2(%arg0: !fir.ref<!fir.array<?x!fir.char<1,?>>>) {
 // CHECK:         %[[C1:.*]] = llvm.mlir.constant(1 : i64) : i64
 // CHECK:         %[[C0_1:.*]] = llvm.mlir.constant(0 : i64) : i64
 // CHECK:         %[[IDX:.*]] = llvm.sub %[[C0]], %[[C1]] overflow<nsw> : i64
-// CHECK:         %[[DIFF0:.*]] = llvm.mul %[[IDX]], %[[C1]] overflow<nsw> : i64
-// CHECK:         %[[SC:.*]] = llvm.mul %[[DIFF0]], %[[C1]]  overflow<nsw> : i64
-// CHECK:         %[[OFFSET:.*]] = llvm.add %[[SC]], %[[C0_1]] overflow<nsw> : i64
-// CHECK:         %{{.*}} = llvm.getelementptr %[[ARG0]][%[[OFFSET]]] : (!llvm.ptr, i64) -> !llvm.ptr, i8
+// CHECK:         %[[DIFF0:.*]] = llvm.mul %[[IDX]], %[[C1]] overflow<nsw, nuw> : i64
+// CHECK:         %[[SC:.*]] = llvm.mul %[[DIFF0]], %[[C1]]  overflow<nsw, nuw> : i64
+// CHECK:         %[[OFFSET:.*]] = llvm.add %[[SC]], %[[C0_1]] overflow<nsw, nuw> : i64
+// CHECK:         %{{.*}} = llvm.getelementptr nusw|nuw %[[ARG0]][%[[OFFSET]]] : (!llvm.ptr, i64) -> !llvm.ptr, i8
 
 // Conversion for a `fir.box`.
 
diff --git a/flang/test/Fir/tbaa-codegen2.fir b/flang/test/Fir/tbaa-codegen2.fir
index 071d3ec89394c..cab258c7a149b 100644
--- a/flang/test/Fir/tbaa-codegen2.fir
+++ b/flang/test/Fir/tbaa-codegen2.fir
@@ -98,7 +98,7 @@ module attributes {fir.defaultkind = "a1c4d8i4l4r4", fir.kindmap = "", llvm.targ
 // access to 'a':
 // CHECK:  %[[VAL43:.*]] = load i32, ptr %[[VAL42]], align 4, !tbaa ![[A_ACCESS_TAG:.*]]
 // [...]
-// CHECK:  %[[VAL50:.*]] = getelementptr i32, ptr %{{.*}}, i64 %{{.*}}
+// CHECK:  %[[VAL50:.*]] = getelementptr nusw nuw i32, ptr %{{.*}}, i64 %{{.*}}
 // store to the temporary:
 // CHECK:  store i32 %{{.*}}, ptr %[[VAL50]], align 4, !tbaa ![[TMP_DATA_ACCESS_TAG:.*]]
 // [...]
diff --git a/flang/test/HLFIR/no-block-merging.fir b/flang/test/HLFIR/no-block-merging.fir
index 75803bf010884..02deb0cc6c4e5 100644
--- a/flang/test/HLFIR/no-block-merging.fir
+++ b/flang/test/HLFIR/no-block-merging.fir
@@ -27,7 +27,7 @@ func.func @no_shape_merge(%cdt: i1, %from: !fir.ref<!fir.array<?xf64>>, %to : !f
 // Note: block merging happens in the output below, but after FIR codegen.
 
 // CHECK-LABEL:  define void @no_shape_merge(
-// CHECK:  %[[GEP:.*]] = getelementptr i8, ptr %{{.*}}
+// CHECK:  %[[GEP:.*]] = getelementptr nusw nuw i8, ptr %{{.*}}
 // CHECK:  %[[LOAD:.*]] = load double, ptr %[[GEP]]
 // CHECK:  store double %[[LOAD]], ptr %{{.*}}
 // CHECK:  ret void
diff --git a/flang/test/Integration/OpenMP/private-global.f90 b/flang/test/Integration/OpenMP/private-global.f90
index 978a8fa3c8205..a49ed6000e205 100644
--- a/flang/test/Integration/OpenMP/private-global.f90
+++ b/flang/test/Integration/OpenMP/private-global.f90
@@ -44,7 +44,7 @@ program bug
 ! check that we use the private copy of table for table/=50
 ! CHECK:       omp.par.region3:
 ! CHECK:         %[[VAL_44:.*]] = sub nsw i64 %{{.*}}, 1
-! CHECK:         %[[VAL_45:.*]] = mul nsw i64 %[[VAL_44]], 1
-! CHECK:         %[[VAL_46:.*]] = mul nsw i64 %[[VAL_45]], 1
-! CHECK:         %[[VAL_47:.*]] = add nsw i64 %[[VAL_46]], 0
-! CHECK:         %[[VAL_48:.*]] = getelementptr i32, ptr %[[PRIV_TABLE]], i64 %[[VAL_47]]
+! CHECK:         %[[VAL_45:.*]] = mul nuw nsw i64 %[[VAL_44]], 1
+! CHECK:         %[[VAL_46:.*]] = mul nuw nsw i64 %[[VAL_45]], 1
+! CHECK:         %[[VAL_47:.*]] = add nuw nsw i64 %[[VAL_46]], 0
+! CHECK:         %[[VAL_48:.*]] = getelementptr nusw nuw i32, ptr %[[PRIV_TABLE]], i64 %[[VAL_47]]
diff --git a/flang/test/Integration/ivdep.f90 b/flang/test/Integration/ivdep.f90
index 0be86ffbb0e88..00b0279e91d5e 100644
--- a/flang/test/Integration/ivdep.f90
+++ b/flang/test/Integration/ivdep.f90
@@ -11,10 +11,10 @@ subroutine ivdep_test1
      !CHECK: %[[VAL_8:.*]] = load i32, ptr {{.*}}, align 4, !llvm.access.group [[DISTRINCT]]
      !CHECK: %[[VAL_9:.*]] = sext i32 %[[VAL_8]] to i64
      !CHECK: %[[VAL_10:.*]] = sub nsw i64 %[[VAL_9]], 1
-     !CHECK: %[[VAL_11:.*]] = mul nsw i64 %[[VAL_10]], 1
-     !CHECK: %[[VAL_12:.*]] = mul nsw i64 %[[VAL_11]], 1
-     !CHECK: %[[VAL_13:.*]] = add nsw i64 %[[VAL_12]], 0
-     !CHECK: %[[VAL_14:.*]] = getelementptr i32, ptr {{.*}}, i64 %[[VAL_13]]
+     !CHECK: %[[VAL_11:.*]] = mul nuw nsw i64 %[[VAL_10]], 1
+     !CHECK: %[[VAL_12:.*]] = mul nuw nsw i64 %[[VAL_11]], 1
+     !CHECK: %[[VAL_13:.*]] = add nuw nsw i64 %[[VAL_12]], 0
+     !CHECK: %[[VAL_14:.*]] = getelementptr nusw nuw i32, ptr {{.*}}, i64 %[[VAL_13]]
      !CHECK: store i32 %[[VAL_8]], ptr %[[VAL_14]], align 4, !llvm.access.group [[DISTRINCT]]
      !CHECK: %[[VAL_15:.*]] = load i32, ptr {{.*}}, align 4, !llvm.access.group [[DISTRINCT]]
      !CHECK: %[[VAL_16:.*]] = add nsw i32 %[[VAL_15]], 1
@@ -36,23 +36,23 @@ subroutine ivdep_test2
      !CHECK: %[[VAL_10:.*]] = load i32, ptr {{.*}}, align 4, !llvm.access.group [[DISTRINCT1]]
      !CHECK: %[[VAL_11:.*]] = sext i32 %[[VAL_10]] to i64
      !CHECK: %[[VAL_12:.*]] = sub nsw i64 %[[VAL_11]], 1
-     !CHECK: %[[VAL_13:.*]] = mul nsw i64 %[[VAL_12]], 1
-     !CHECK: %[[VAL_14:.*]] = mul nsw i64 %[[VAL_13]], 1
-     !CHECK: %[[VAL_15:.*]] = add nsw i64 %[[VAL_14]], 0
-     !CHECK: %[[VAL_16:.*]] = getelementptr i32, ptr {{.*}}, i64 %[[VAL_15]]
+     !CHECK: %[[VAL_13:.*]] = mul nuw nsw i64 %[[VAL_12]], 1
+     !CHECK: %[[VAL_14:.*]] = mul nuw nsw i64 %[[VAL_13]], 1
+     !CHECK: %[[VAL_15:.*]] = add nuw nsw i64 %[[VAL_14]], 0
+     !CHECK: %[[VAL_16:.*]] = getelementptr nusw nuw i32, ptr {{.*}}, i64 %[[VAL_15]]
      !CHECK: %[[VAL_17:.*]] = load i32, ptr {{.*}}, align 4, !llvm.access.group [[DISTRINCT1]] 
      !CHECK: %[[VAL_18:.*]] = sub nsw i64 %[[VAL_11]], 1
-     !CHECK: %[[VAL_19:.*]] = mul nsw i64 %[[VAL_18]], 1
-     !CHECK: %[[VAL_20:.*]] = mul nsw i64 %[[VAL_19]], 1
-     !CHECK: %[[VAL_21:.*]] = add nsw i64 %[[VAL_20]], 0
-     !CHECK: %[[VAL_22:.*]] = getelementptr i32, ptr {{.*}}, i64 %[[VAL_21]]
+     !CHECK: %[[VAL_19:.*]] = mul nuw nsw i64 %[[VAL_18]], 1
+     !CHECK: %[[VAL_20:.*]] = mul nuw nsw i64 %[[VAL_19]], 1
+     !CHECK: %[[VAL_21:.*]] = add nuw nsw i64 %[[VAL_20]], 0
+     !CHECK: %[[VAL_22:.*]] = getelementptr nusw nuw i32, ptr {{.*}}, i64 %[[VAL_21]]
      !CHECK: %[[VAL_23:.*]] = load i32, ptr {{.*}}, align 4, !llvm.access.group [[DISTRINCT1]] 
      !CHECK: %[[VAL_24:.*]] = add i32 %[[VAL_17]], %[[VAL_23]]
      !CHECK: %[[VAL_25:.*]] = sub nsw i64 %[[VAL_11]], 1
-     !CHECK: %[[VAL_26:.*]] = mul nsw i64 %[[VAL_25]], 1
-     !CHECK: %[[VAL_27:.*]] = mul nsw i64 %[[VAL_26]], 1
-     !CHECK: %[[VAL_28:.*]] = add nsw i64 %[[VAL_27]], 0
-     !CHECK: %[[VAL_29:.*]] = getelementptr i32, ptr {{.*}}, i64 %[[VAL_28]]
+     !CHECK: %[[VAL_26:.*]] = mul nuw nsw i64 %[[VAL_25]], 1
+     !CHECK: %[[VAL_27:.*]] = mul nuw nsw i64 %[[VAL_26]], 1
+     !CHECK: %[[VAL_28:.*]] = add nuw nsw i64 %[[VAL_27]], 0
+     !CHECK: %[[VAL_29:.*]] = getelementptr nusw nuw i32, ptr {{.*}}, i64 %[[VAL_28]]
      !CHECK: store i32 %[[VAL_24]], ptr %[[VAL_29]], align 4, !llvm.access.group [[DISTRINCT1]]
      !CHECK: %[[VAL_30:.*]] = load i32, ptr {{.*}}, align 4, !llvm.access.group [[DISTRINCT1]] 
      !CHECK: %[[VAL_31:.*]] = add nsw i32 %[[VAL_30]], 1
@@ -74,23 +74,23 @@ subroutine ivdep_test3
      !CHECK: %[[VAL_10:.*]] = load i32, ptr {{.*}}, align 4, !llvm.access.group [[DISTRINCT2]]
      !CHECK: %[[VAL_11:.*]] = sext i32 %[[VAL_10]] to i64
      !CHECK: %[[VAL_12:.*]] = sub nsw i64 %[[VAL_11]], 1
-     !CHECK: %[[VAL_13:.*]] = mul nsw i64 %[[VAL_12]], 1
-     !CHECK: %[[VAL_14:.*]] = mul nsw i64 %[[VAL_13]], 1
-     !CHECK: %[[VAL_15:.*]] = add nsw i64 %[[VAL_14]], 0
-     !CHECK: %[[VAL_16:.*]] = getelementptr i32, ptr {{.*}}, i64 %[[VAL_15]]
+     !CHECK: %[[VAL_13:.*]] = mul nuw nsw i64 %[[VAL_12]], 1
+     !CHECK: %[[VAL_14:.*]] = mul nuw nsw i64 %[[VAL_13]], 1
+     !CHECK: %[[VAL_15:.*]] = add nuw nsw i64 %[[VAL_14]], 0
+     !CHECK: %[[VAL_16:.*]] = getelementptr nusw nuw i32, ptr {{.*}}, i64 %[[VAL_15]]
      !CHECK: %[[VAL_17:.*]] = load i32, ptr {{.*}}, align 4, !llvm.access.group [[DISTRINCT2]] 
      !CHECK: %[[VAL_18:.*]] = sub nsw i64 %[[VAL_11]], 1
-     !CHECK: %[[VAL_19:.*]] = mul nsw i64 %[[VAL_18]], 1
-     !CHECK: %[[VAL_20:.*]] = mul nsw i64 %[[VAL_19]], 1
-     !CHECK: %[[VAL_21:.*]] = add nsw i64 %[[VAL_20]], 0
-     !CHECK: %[[VAL_22:.*]] = getelementptr i32, ptr {{.*}}, i64 %[[VAL_21]]
+     !CHECK: %[[VAL_19:.*]] = mul nuw nsw i64 %[[VAL_18]], 1
+     !CHECK: %[[VAL_20:.*]] = mul nuw nsw i64 %[[VAL_19]], 1
+     !CHECK: %[[VAL_21:.*]] = add nuw nsw i64 %[[VAL_20]], 0
+     !CHECK: %[[VAL_22:.*]] = getelementptr nusw nuw i32, ptr {{.*}}, i64 %[[VAL_21]]
      !CHECK: %[[VAL_23:.*]] = load i32, ptr {{.*}}, align 4, !llvm.access.group [[DISTRINCT2]] 
      !CHECK: %[[VAL_24:.*]] = add i32 %[[VAL_17]], %[[VAL_23]]
      !CHECK: %[[VAL_25:.*]] = sub nsw i64 %[[VAL_11]], 1
-     !CHECK: %[[VAL_26:.*]] = mul nsw i64 %[[VAL_25]], 1
-     !CHECK: %[[VAL_27:.*]] = mul nsw i64 %[[VAL_26]], 1
-     !CHECK: %[[VAL_28:.*]] = add nsw i64 %[[VAL_27]], 0
-     !CHECK: %[[VAL_29:.*]] = getelementptr i32, ptr {{.*}}, i64 %[[VAL_28]]
+     !CHECK: %[[VAL_26:.*]] = mul nuw nsw i64 %[[VAL_25]], 1
+     !CHECK: %[[VAL_27:.*]] = mul nuw nsw i64 %[[VAL_26]], 1
+     !CHECK: %[[VAL_28:.*]] = add nuw nsw i64 %[[VAL_27]], 0
+     !CHECK: %[[VAL_29:.*]] = getelementptr nusw nuw i32, ptr {{.*}}, i64 %[[VAL_28]]
      !CHECK: store i32 %[[VAL_24]], ptr %[[VAL_29]], align 4, !llvm.access.group [[DISTRINCT2]]
      !CHECK: call void @_QFivdep_test3Pfoo(), !llvm.access.group [[DISTRINCT2]]
      !CHECK: %[[VAL_30:.*]] = load i32, ptr {{.*}}, align 4, !llvm.access.group [[DISTRINCT2]] 
diff --git a/flang/test/Integration/prefetch.f90 b/flang/test/Integration/prefetch.f90
index f3fb7a950e328..c015b6736972a 100644
--- a/flang/test/Integration/prefetch.f90
+++ b/flang/test/Integration/prefetch.f90
@@ -28,7 +28,7 @@ subroutine test_prefetch_01()
 
     ! LLVM: %[[LOAD_I:.*]] = load i32, ptr %[[VAR_I]], align 4
     ! LLVM: %{{.*}} = add nsw i32 %[[LOAD_I]], 64
-    ! LLVM: %[[GEP_A:.*]] = getelementptr i32, ptr %[[VAR_A]], i64 {{.*}}
+    ! LLVM: %[[GEP_A:.*]] = getelementptr nusw nuw i32, ptr %[[VAR_A]], i64 {{.*}}
 
     ! LLVM: call void @llvm.prefetch.p0(ptr %[[GEP_A]], i32 0, i32 3, i32 1)
     ! LLVM: call void @llvm.prefetch.p0(ptr %[[VAR_J]], i32 0, i32 3, i32 1)