[Mlir-commits] [flang] [llvm] [mlir] [do not merge] Remove offset from the memref type and treat it as always dynamic. (PR #192644)

Ivan Butygin llvmlistbot at llvm.org
Fri Apr 17 08:57:03 PDT 2026


https://github.com/Hardcode84 updated https://github.com/llvm/llvm-project/pull/192644

>From b6c994a2e1daa8e2c651c1d6ba4a016af6629f1d Mon Sep 17 00:00:00 2001
From: Ivan Butygin <ivan.butygin at gmail.com>
Date: Fri, 17 Apr 2026 00:24:43 +0200
Subject: [PATCH 01/27] [RFC] Drop static offset from MemRefType, keep it in
 ABI and ops

Draft RFC proposing removal of the static offset from StridedLayoutAttr
while preserving offset as a first-class operand/result on memref ops
and keeping the offset slot in the runtime descriptor by default.

Builds on prior discourse threads:
  - https://discourse.llvm.org/t/rfc-removing-offset-from-memref-type-and-lowering/82963
  - https://discourse.llvm.org/t/rfc-contiguous-permutation-offset-o-layout-and-changing-default-memref-layout/85284

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
Signed-off-by: Ivan Butygin <ivan.butygin at gmail.com>
---
 memref-offset-removal-rfc.md | 189 +++++++++++++++++++++++++++++++++++
 1 file changed, 189 insertions(+)
 create mode 100644 memref-offset-removal-rfc.md

diff --git a/memref-offset-removal-rfc.md b/memref-offset-removal-rfc.md
new file mode 100644
index 0000000000000..e26baf816b592
--- /dev/null
+++ b/memref-offset-removal-rfc.md
@@ -0,0 +1,189 @@
+# RFC: Drop static offset from MemRefType, keep it in ABI and ops
+
+## Status
+
+Draft. Builds on prior discussions:
+- [RFC: Removing offset from MemRef Type and Lowering](https://discourse.llvm.org/t/rfc-removing-offset-from-memref-type-and-lowering/82963)
+- [RFC: ContiguousLayoutAttr and changing default memref layout](https://discourse.llvm.org/t/rfc-contiguous-permutation-offset-o-layout-and-changing-default-memref-layout/85284)
+
+## Summary
+
+Remove the static offset from `StridedLayoutAttr` (and therefore from `MemRefType`).
+Keep offset as a first-class operand/result on `memref.reinterpret_cast`,
+`memref.extract_strided_metadata`, and friends. The type system stops carrying
+offset information; ops still talk about offsets; lowerings decide what offset
+semantics mean at the ABI level.
+
+This is a smaller-blast-radius subset of the original "remove offset
+everywhere" proposal: the runtime descriptor keeps the offset slot by default,
+so existing lowerings remain bit-identical in behavior.
+
+## Motivation
+
+The static offset slot in `StridedLayoutAttr` has not earned its keep:
+
+- It conflates IR-level shape information with ABI/lowering decisions, leaking
+  implementation details into the type system.
+- Most `subview` / `reinterpret_cast` chains produce dynamic offsets in
+  practice; the static slot is rarely populated meaningfully.
+- The "more static offset blocks fold" guard in `canFoldIntoConsumerOp` only
+  exists to prevent casts from inventing offset information. Removing the
+  source of those lies removes the need for the guard.
+- Alternative lowerings (no-offset descriptors, fat pointers) are awkward to
+  support while the type insists on a single offset model.
+- The original author of the offset mechanism has acknowledged that the
+  expected benefits did not materialize (see linked RFC).
+
+## Proposal
+
+### Type level
+
+- Drop the `offset` parameter from `StridedLayoutAttr`. Equivalently: treat
+  it as always `ShapedType::kDynamic` and remove the field.
+- `MemRefType` no longer carries any static offset information.
+- Printer: always omit the `offset:` clause.
+- Parser: accept the legacy form for one release for migration ease, then
+  remove.
+
+### Op level
+
+Operations keep offset as an explicit IR value:
+
+- `memref.reinterpret_cast` continues to accept an `offset` operand.
+  Semantically: "produce a memref view starting at base + offset".
+- `memref.extract_strided_metadata` continues to return an `offset` SSA
+  value. Semantically: "give me the offset that the lowering commits to".
+- `memref.subview` is unchanged at the op level; offset operand remains.
+
+The contract is: offset is a first-class value at the IR level, decoupled
+from the type.
+
+### Lowering strategies
+
+Because offset lives on the op, not the type, lowerings can choose freely:
+
+1. **Current descriptor lowering (default).** Keeps the offset slot in the
+   LLVM struct. `reinterpret_cast` writes offset to the struct;
+   `extract_strided_metadata` reads it. Behavior identical to today.
+
+2. **No-offset lowering.** Collapses offset into the data pointer at
+   lowering time:
+   - `reinterpret_cast` with non-zero offset emits a GEP immediately; the
+     descriptor stores `base + offset`, with no separate offset field.
+   - `extract_strided_metadata` returns a constant 0; downstream DCE
+     removes any arithmetic on it.
+   - LLVM struct loses the offset member.
+
+3. **Fat-pointer lowering.** GEP on the pointer half of the fat pointer;
+   descriptor metadata unchanged.
+
+This factoring makes lowering choice an ABI/codegen decision rather than a
+type-system commitment.
+
+### Folding and canonicalization
+
+- Delete the "more static offset blocks fold" guard in
+  `canFoldIntoConsumerOp` (`mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp`).
+  It guards against lies that can no longer be told.
+- Delete `offset == 0` fast paths in Vector, SparseTensor, and
+  MemRefToLLVM. They exploit information the type no longer carries.
+- Folds that currently constant-propagate offsets through
+  `reinterpret_cast` / `extract_strided_metadata` move from IR-level
+  canonicalization to post-lowering peephole patterns. Pre-lowering, the
+  offset is always conservatively dynamic.
+- Rename or remove `hasStaticLayout()` (currently "all strides static AND
+  offset static"); collapse to "all strides static" or drop entirely.
+
+### API surface
+
+The helper `getStridesAndOffset()` becomes misleading: with no static offset
+on the type, the offset out-param is always `ShapedType::kDynamic` and every
+caller has to plumb it through and ignore it.
+
+- Rename `getStridesAndOffset()` to `getStrides()`. Keep it returning
+  `LogicalResult` so it continues to act as the "is this layout
+  strided-representable?" probe.
+- Drop the offset out-param.
+- Audit ~80 call sites; the rewrite is mechanical.
+
+Edge case: affine-map layouts can in principle compute a static offset
+even when `StridedLayoutAttr` cannot carry one. If any consumer relies on
+that, expose it through a separate `getStaticOffsetIfAny()` returning
+`std::optional<int64_t>` rather than keeping the offset glued to the
+strides API. Likely no real consumers exist; verify by grep before
+deleting outright.
+
+## Migration plan
+
+Order matters; each step is independently mergeable.
+
+1. **Nuke offset-based folds first.** Keeps the IR sound while the rest of
+   the work proceeds, and surfaces any hidden dependence on those folds
+   before the type changes.
+2. **Strip `offset` from `StridedLayoutAttr`.** Update printer/parser. Fix
+   the stale `assert(offset == 0)` at
+   `mlir/lib/Conversion/MemRefToLLVM/MemRefToLLVM.cpp:1828`.
+3. **Mass-update tests.** Roughly 149 `.mlir` files, ~2348 occurrences.
+   Mostly mechanical: `offset: N` becomes omitted.
+4. **Audit `getStridesAndOffset()` call sites** (~80). Most already handle
+   dynamic offset; a few need adjustment.
+5. **Rename `getStridesAndOffset()` to `getStrides()`** and drop the
+   offset out-param. Land as a single sweeping change once step 4 has
+   identified all consumers.
+6. **Optional follow-up.** Introduce a no-offset lowering pipeline option
+   to validate the design end-to-end. Not required for the type-level
+   change to land.
+
+## Blast radius
+
+- Tests: ~149 `.mlir` files updated (mostly scriptable).
+- Code call sites: ~80 `getStridesAndOffset()` sites audited; ~10 fold and
+  special-case sites materially changed.
+- Lowering: default descriptor path unchanged in behavior. No-offset and
+  fat-pointer paths become straightforward to add later.
+- Verifier: no new constraints; some constraints removed.
+- Estimated effort: 2 to 3 weeks for one experienced contributor.
+
+## Alternatives considered
+
+- **`ContiguousLayoutAttr` (Krzysz00).** Introduces a richer layout
+  attribute that explicitly encodes permutations and offset, partially
+  reclaiming optimization information that bare strides lose. Largely
+  orthogonal to this proposal: this RFC removes offset from the static
+  type encoding; `ContiguousLayoutAttr` enriches the dynamic layout
+  vocabulary. Both can coexist.
+
+- **Remove offset from the descriptor entirely (original RFC).** More
+  invasive; conflicts with SPIR-V and other backends that cannot trivially
+  perform pointer arithmetic on opaque pointers. This proposal is the
+  smaller-blast-radius subset: keep ABI flexibility, remove only the
+  type-level fiction.
+
+- **Status quo with better folding hygiene.** Possible, but does not
+  address the fundamental conflation of type and ABI concerns. The same
+  bug class returns over time.
+
+## Open questions
+
+- Does `extract_strided_metadata` need an attribute or trait declaring its
+  offset semantics for lowerings that disagree, or is "always
+  conservatively dynamic pre-lowering" sufficient?
+- Do downstream projects (IREE, Triton, others) materially depend on
+  static offset propagation through subview chains? If yes, what is their
+  migration path?
+- Should `hasStaticLayout()` be removed outright or kept as a renamed
+  shim?
+- Should the parser keep accepting the legacy `offset: N` form for one
+  release as a soft migration, or hard-cut?
+- Do any in-tree or downstream consumers actually use static offsets
+  derived from affine-map layouts via `getStridesAndOffset()`? If yes,
+  introduce `getStaticOffsetIfAny()`; if no, drop the concept.
+
+## Non-goals
+
+- Changing the default lowering. Behavior of the existing descriptor
+  lowering is preserved.
+- Removing offset from the runtime ABI. Out of scope; covered by the
+  original RFC if desired later.
+- Introducing a new layout attribute. Compatible with, but independent
+  of, `ContiguousLayoutAttr`.

>From 48b27723d4779becc20c8aae29205442dc227458 Mon Sep 17 00:00:00 2001
From: Ivan Butygin <ivan.butygin at gmail.com>
Date: Fri, 17 Apr 2026 00:29:41 +0200
Subject: [PATCH 02/27] [mlir][memref] Remove offset-based guard in
 CastOp::canFoldIntoConsumerOp

The "more static offset blocks fold" guard exists to refuse folding a cast
that claims static offset information the source did not have. Such a cast
is itself untrustworthy, so blocking the fold only serves to keep the
lying cast in the IR.

Step 1 of the static-offset removal RFC. With this guard gone, the type
change in step 2 (dropping offset from StridedLayoutAttr) does not silently
re-enable any unsound fold patterns that were previously blocked here.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
Signed-off-by: Ivan Butygin <ivan.butygin at gmail.com>
---
 mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp b/mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
index 27c1649ee4ed3..06d5bddbc03cd 100644
--- a/mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
+++ b/mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
@@ -717,11 +717,9 @@ bool CastOp::canFoldIntoConsumerOp(CastOp castOp) {
         return false;
   }
 
-  // If cast is towards more static offset along any dimension, don't fold.
-  if (sourceOffset != resultOffset)
-    if (ShapedType::isDynamic(sourceOffset) &&
-        ShapedType::isStatic(resultOffset))
-      return false;
+  // Static offset is intentionally not checked here: a cast that claims a
+  // more-static offset cannot be trusted, so blocking the fold on that basis
+  // would only serve to keep the lying cast around.
 
   // If cast is towards more static strides along any dimension, don't fold.
   for (auto it : llvm::zip(sourceStrides, resultStrides)) {

>From 06887db22dc3ef6df6d76cb675c246401811b53c Mon Sep 17 00:00:00 2001
From: Ivan Butygin <ivan.butygin at gmail.com>
Date: Fri, 17 Apr 2026 01:10:22 +0200
Subject: [PATCH 03/27] [WIP][mlir] Strip offset from StridedLayoutAttr (step
 2)

Removes the offset parameter from StridedLayoutAttr and its parser/printer.
Updates all C++/CAPI/Python callsites and mass-strips "offset: N" from .mlir
test files. The runtime offset, when present, lives on the producing op
(memref.subview, memref.reinterpret_cast, memref.extract_strided_metadata).

API choices in this WIP:
- getStridesAndOffset(): returns offset = 0 for back-compat with identity
  layouts (which also report 0), keeping subview/cast verifier comparisons
  consistent across both layout forms.
- getAffineMap(): omits the offset term entirely so the resulting map has no
  spurious offset symbol; the alloc verifier no longer demands one.
- ReinterpretCastOp::verify(): drops the type-vs-operand offset compatibility
  check since the type no longer carries that information.

Status: 120/4049 tests still failing. Remaining categories:
  * Dialect/MemRef/{invalid,canonicalize,subview,multibuffer}.mlir
  * SparseTensor codegen (memref<?xT> vs strided<[1]> mismatch)
  * Conversion tests with printer-driven CHECK lines
  * memref.transpose canonical-map equivalence

This commit lands the bulk plumbing so the remaining work can be triaged in
focused follow-ups rather than a single megapatch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
Signed-off-by: Ivan Butygin <ivan.butygin at gmail.com>
---
 mlir/docs/Bufferization.md                    |   4 +-
 mlir/docs/Dialects/Linalg/_index.md           |  30 +-
 mlir/include/mlir-c/BuiltinAttributes.h       |   7 +-
 .../mlir/Dialect/MemRef/IR/MemRefOps.td       |  44 +--
 .../Dialect/MemRef/Transforms/Transforms.h    |   2 +-
 .../mlir/Dialect/OpenACC/OpenACCOps.td        |  10 +-
 mlir/include/mlir/IR/BuiltinAttributes.td     |  27 +-
 mlir/include/mlir/IR/BuiltinTypes.td          |  17 +-
 mlir/lib/AsmParser/AttributeParser.cpp        |  25 +-
 mlir/lib/AsmParser/TokenKinds.def             |   1 -
 mlir/lib/Bindings/Python/IRAttributes.cpp     |  19 +-
 mlir/lib/CAPI/IR/BuiltinAttributes.cpp        |   9 +-
 .../LinalgToStandard/LinalgToStandard.cpp     |   2 +-
 mlir/lib/Dialect/AMDGPU/IR/AMDGPUOps.cpp      |   2 +-
 .../IR/BufferizableOpInterface.cpp            |   5 +-
 .../GPU/Transforms/DecomposeMemRefs.cpp       |  10 +-
 mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp      |  73 ++--
 .../Transforms/ElideReinterpretCast.cpp       |   8 +-
 .../MemRef/Transforms/EmulateNarrowType.cpp   |  24 +-
 .../Transforms/ExtractAddressComputations.cpp |   2 +-
 .../MemRef/Transforms/FlattenMemRefs.cpp      |   2 +-
 .../Transforms/IndependenceTransforms.cpp     |   2 +-
 .../Transforms/RuntimeOpVerification.cpp      |   5 +-
 .../SCF/Transforms/ParallelLoopFusion.cpp     |   2 +-
 .../SparseTensor/IR/SparseTensorDialect.cpp   |   8 +-
 .../BufferizableOpInterfaceImpl.cpp           |   6 +-
 .../VectorTransferSplitRewritePatterns.cpp    |   4 +-
 mlir/lib/IR/BuiltinAttributes.cpp             |  40 ++-
 mlir/python/mlir/dialects/memref.py           |  13 +-
 .../test-strided-metadata-range-analysis.mlir |  14 +-
 mlir/test/CAPI/ir.c                           |   7 +-
 .../AMDGPUToROCDL/amdgpu-to-rocdl.mlir        |  18 +-
 .../bufferization-to-memref.mlir              |  22 +-
 .../FuncToLLVM/func-memref-return.mlir        |   4 +-
 .../FuncToSPIRV/types-to-spirv.mlir           |  20 +-
 .../convert-dynamic-memref-ops.mlir           |   2 +-
 .../expand-then-convert-to-llvm.mlir          | 140 ++++----
 .../memref-to-llvm-with-transforms.mlir       |   6 +-
 .../MemRefToLLVM/memref-to-llvm.mlir          |  46 +--
 .../MemRefToSPIRV/memref-to-spirv.mlir        |  24 +-
 .../Conversion/NVGPUToNVVM/nvgpu-to-nvvm.mlir |   8 +-
 .../Conversion/PtrToLLVM/ptr-to-llvm.mlir     |  20 +-
 .../Conversion/SCFToGPU/parallel_loop.mlir    |  76 ++---
 .../ShardToMPI/convert-shard-to-mpi.mlir      |  60 ++--
 .../vector-to-mma-ops-mma-sync.mlir           |   8 +-
 .../vector-to-llvm-interface.mlir             |   8 +-
 .../Conversion/VectorToSCF/vector-to-scf.mlir |  10 +-
 .../VectorToXeGPU/gather-to-xegpu.mlir        |  24 +-
 .../VectorToXeGPU/load-to-xegpu.mlir          |   2 +-
 .../VectorToXeGPU/scatter-to-xegpu.mlir       |  24 +-
 .../VectorToXeGPU/store-to-xegpu.mlir         |   2 +-
 .../VectorToXeGPU/transfer-read-to-xegpu.mlir |  32 +-
 .../transfer-write-to-xegpu.mlir              |  18 +-
 .../XeGPUToXeVM/loadstore_matrix.mlir         |  12 +-
 .../XeGPUToXeVM/loadstoreprefetch.mlir        |   6 +-
 .../Dialect/AMDGPU/amdgpu-fold-memrefs.mlir   |  12 +-
 .../amdgpu-resolve-strided-metadata.mlir      |  14 +-
 mlir/test/Dialect/AMDGPU/invalid.mlir         |   6 +-
 mlir/test/Dialect/AMDGPU/ops.mlir             |  24 +-
 .../Dialect/Affine/fold-memref-alias-ops.mlir |  18 +-
 mlir/test/Dialect/Affine/loop-fusion-4.mlir   |   6 +-
 .../Affine/memref-stride-calculation.mlir     |  10 +-
 mlir/test/Dialect/Affine/ops.mlir             |   4 +-
 .../Dialect/ArmSME/vector-legalization.mlir   |  10 +-
 .../dealloc-subviews.mlir                     |   8 +-
 .../buffer-deallocation-simplification.mlir   |   8 +-
 .../drop-equivalent-buffer-results.mlir       |  26 +-
 ...ot-bufferize-empty-tensor-elimination.mlir |   4 +-
 .../one-shot-bufferize-encodings.mlir         |  24 +-
 .../one-shot-bufferize-partial.mlir           |   6 +-
 .../Transforms/one-shot-bufferize.mlir        |   2 +-
 .../one-shot-module-bufferize-out-params.mlir |  18 +-
 .../Transforms/one-shot-module-bufferize.mlir | 114 +++----
 .../optimize-allocation-liveness.mlir         |   8 +-
 .../Dialect/Bufferization/canonicalize.mlir   |  48 +--
 mlir/test/Dialect/Builtin/types.mlir          |  30 +-
 .../ControlFlow/one-shot-bufferize.mlir       |  14 +-
 mlir/test/Dialect/GPU/decompose-memrefs.mlir  |  36 +-
 mlir/test/Dialect/GPU/transform-gpu.mlir      |  24 +-
 .../lower-to-llvm-e2e-with-target-tag.mlir    |  10 +-
 ...lvm-e2e-with-top-level-named-sequence.mlir |  10 +-
 mlir/test/Dialect/Linalg/collapse-dim.mlir    |   8 +-
 mlir/test/Dialect/Linalg/hoisting.mlir        |  12 +-
 mlir/test/Dialect/Linalg/library-calls.mlir   |   4 +-
 mlir/test/Dialect/Linalg/loops.mlir           | 112 +++----
 .../Dialect/Linalg/one-shot-bufferize.mlir    |  12 +-
 .../Linalg/pad-to-specific-memory-space.mlir  |  12 +-
 mlir/test/Dialect/Linalg/promote.mlir         |  70 ++--
 .../Dialect/Linalg/promotion_options.mlir     |  18 +-
 mlir/test/Dialect/Linalg/roundtrip.mlir       |  84 ++---
 mlir/test/Dialect/Linalg/standard.mlir        |  20 +-
 mlir/test/Dialect/Linalg/tile-softmax.mlir    |   6 +-
 ...compose-masked-vectorize-and-cleanups.mlir |   8 +-
 .../transform-op-linalg-copy-to-memref.mlir   |   8 +-
 .../Dialect/Linalg/transform-patterns.mlir    |  90 ++---
 .../Dialect/Linalg/transform-promotion.mlir   |  64 ++--
 mlir/test/Dialect/MemRef/canonicalize.mlir    | 312 +++++++++---------
 .../MemRef/elide-reinterpret-cast.mlir        |  12 +-
 .../Dialect/MemRef/emulate-narrow-type.mlir   |  32 +-
 .../MemRef/expand-strided-metadata.mlir       | 140 ++++----
 .../MemRef/extract-address-computations.mlir  |  54 +--
 mlir/test/Dialect/MemRef/flatten_memref.mlir  |  58 ++--
 .../Dialect/MemRef/fold-memref-alias-ops.mlir | 206 ++++++------
 mlir/test/Dialect/MemRef/invalid.mlir         |  90 ++---
 .../Dialect/MemRef/make-loop-independent.mlir |   6 +-
 mlir/test/Dialect/MemRef/multibuffer.mlir     |  48 +--
 .../Dialect/MemRef/normalize-memrefs-ops.mlir |   6 +-
 .../Dialect/MemRef/normalize-memrefs.mlir     |  10 +-
 mlir/test/Dialect/MemRef/ops.mlir             | 140 ++++----
 mlir/test/Dialect/MemRef/subview.mlir         |  56 ++--
 mlir/test/Dialect/MemRef/transform-ops.mlir   |  16 +-
 .../value-bounds-op-interface-impl.mlir       |   4 +-
 mlir/test/Dialect/OpenACC/ops.mlir            |   4 +-
 .../SCF/one-shot-bufferize-encodings.mlir     |  26 +-
 mlir/test/Dialect/SCF/one-shot-bufferize.mlir |  22 +-
 .../Dialect/SCF/parallel-loop-fusion.mlir     |  26 +-
 .../Dialect/SCF/parallel-loop-unroll.mlir     |  12 +-
 .../SparseTensor/GPU/gpu_matvec_lib.mlir      |  12 +-
 mlir/test/Dialect/SparseTensor/codegen.mlir   |  10 +-
 .../test/Dialect/SparseTensor/sorted_coo.mlir |  42 +--
 mlir/test/Dialect/Tensor/bufferize.mlir       |  36 +-
 .../Dialect/Tensor/one-shot-bufferize.mlir    |  64 ++--
 .../Transform/test-pattern-application.mlir   |   6 +-
 .../Transform/test-promote-tensors.mlir       |  20 +-
 mlir/test/Dialect/Vector/invalid.mlir         |   8 +-
 .../Dialect/Vector/one-shot-bufferize.mlir    |   8 +-
 mlir/test/Dialect/Vector/ops.mlir             |  20 +-
 ...tor-transfer-collapse-inner-most-dims.mlir | 112 +++----
 ...ctor-transfer-drop-unit-dims-patterns.mlir |  70 ++--
 .../Vector/vector-transfer-flatten.mlir       | 106 +++---
 ...fer-full-partial-split-copy-transform.mlir |  50 +--
 .../vector-transfer-full-partial-split.mlir   |  38 +--
 .../Dialect/Vector/vector-transferop-opt.mlir |  24 +-
 .../Vector/vector-warp-distribute.mlir        |  18 +-
 .../X86/AMX/vector-contract-to-tiled-dp.mlir  |  60 ++--
 .../X86/vector-contract-bf16-to-fma.mlir      |  36 +-
 ...or-contract-to-packed-type-dotproduct.mlir |  12 +-
 mlir/test/Dialect/XeGPU/ops.mlir              |   8 +-
 mlir/test/Examples/NVGPU/Ch4.py               |   4 +-
 mlir/test/Examples/NVGPU/Ch5.py               |   4 +-
 mlir/test/IR/invalid-builtin-types.mlir       |  17 +-
 .../Dialect/Linalg/CPU/matmul-vs-matvec.mlir  |   8 +-
 .../Linalg/CPU/rank-reducing-subview.mlir     |   8 +-
 .../MemRef/cast-runtime-verification.mlir     |  20 +-
 .../MemRef/subview-runtime-verification.mlir  |  40 +--
 .../CPU/sparse_rewrite_sort_coo.mlir          |  68 ++--
 .../Dialect/Standard/CPU/test_subview.mlir    |  16 +-
 .../Dialect/Vector/CPU/transfer-read-1d.mlir  |   8 +-
 .../XeGPU/LANE/load_store_subview.mlir        |   8 +-
 .../sm90/gemm_f32_f16_f16_128x128x128.mlir    |   8 +-
 .../gemm_pred_f32_f16_f16_128x128x128.mlir    |   8 +-
 .../CUDA/sm90/python/tools/matmulBuilder.py   |   4 +-
 .../tma_load_128x128_stride_noswizzle.mlir    |   8 +-
 mlir/test/Transforms/canonicalize.mlir        | 106 +++---
 mlir/test/Transforms/compose-subview.mlir     | 100 +++---
 .../test-bubble-down-memory-space-casts.mlir  |  28 +-
 mlir/test/mlir-runner/copy.mlir               |   8 +-
 .../mlir-runner/memref-reinterpret-cast.mlir  |   8 +-
 mlir/test/python/dialects/memref.py           |  14 +-
 mlir/test/python/execution_engine.py          |  12 +-
 mlir/test/python/ir/attributes.py             |  10 +-
 mlir/test/python/ir/builtin_types.py          |  10 +-
 .../Dialect/MemRef/InferShapeTest.cpp         |   6 +-
 mlir/unittests/IR/MemrefLayoutTest.cpp        |   4 +-
 164 files changed, 2258 insertions(+), 2355 deletions(-)

diff --git a/mlir/docs/Bufferization.md b/mlir/docs/Bufferization.md
index e04934a120a00..678f7d5510340 100644
--- a/mlir/docs/Bufferization.md
+++ b/mlir/docs/Bufferization.md
@@ -305,8 +305,8 @@ dynamic offset and strides:
 
 ```mlir
 %0 = "my_dialect.unbufferizable_op(%t) : (tensor<?x?xf32>) -> (tensor<?x?xf32>)
-%0_m = bufferization.to_buffer %0 : memref<?x?xf32, strided<[?, ?], offset: ?>>
-%1 = memref.load %0_m[%idx1, %idx2] : memref<?x?xf32, strided<[?, ?], offset: ?>>
+%0_m = bufferization.to_buffer %0 : memref<?x?xf32, strided<[?, ?]>>
+%1 = memref.load %0_m[%idx1, %idx2] : memref<?x?xf32, strided<[?, ?]>>
 ```
 
 All users of `%0` have fully dynamic layout maps. This ensures that the
diff --git a/mlir/docs/Dialects/Linalg/_index.md b/mlir/docs/Dialects/Linalg/_index.md
index 976f0fd3c7e91..cda7b49cb3424 100644
--- a/mlir/docs/Dialects/Linalg/_index.md
+++ b/mlir/docs/Dialects/Linalg/_index.md
@@ -100,10 +100,10 @@ layout, and the second one is a `memref` of 4-element vectors with a 2-strided,
 }
 
 func.func @example(%A: memref<?xf32, strided<[1]>>,
-              %B: memref<?xvector<4xf32>, strided<[2], offset: 1>>) {
+              %B: memref<?xvector<4xf32>, strided<[2]>>) {
   linalg.generic #attrs
   ins(%A: memref<?xf32, strided<[1]>>)
-  outs(%B: memref<?xvector<4xf32>, strided<[2], offset: 1>>) {
+  outs(%B: memref<?xvector<4xf32>, strided<[2]>>) {
   ^bb0(%a: f32, %b: vector<4xf32>):
     %c = "some_compute"(%a, %b): (f32, vector<4xf32>) -> (vector<4xf32>)
     linalg.yield %c: vector<4xf32>
@@ -121,17 +121,17 @@ materialized by a lowering into a form that will resemble:
 // It's syntax can be found here: https://mlir.llvm.org/docs/Dialects/SCFDialect/
 
 func.func @example(%arg0: memref<?xf32>,
-                   %arg1: memref<?xvector<4xf32>, strided<[2], offset: 1>>) {
+                   %arg1: memref<?xvector<4xf32>, strided<[2]>>) {
   %c0 = arith.constant 0 : index
   %c1 = arith.constant 1 : index
   %0 = memref.dim %arg0, %c0 : memref<?xf32>
   scf.for %arg2 = %c0 to %0 step %c1 {
     %1 = memref.load %arg0[%arg2] : memref<?xf32>
     %2 = memref.load %arg1[%arg2]
-       : memref<?xvector<4xf32>, strided<[2], offset: 1>>
+       : memref<?xvector<4xf32>, strided<[2]>>
     %3 = "some_compute"(%1, %2) : (f32, vector<4xf32>) -> vector<4xf32>
     memref.store %3, %arg1[%arg2]
-       : memref<?xvector<4xf32>, strided<[2], offset: 1>>
+       : memref<?xvector<4xf32>, strided<[2]>>
   }
   return
 }
@@ -185,10 +185,10 @@ uses an identity layout.
   iterator_types = ["parallel", "parallel"]
 }
 
-func.func @example(%A: memref<8x?xf32, strided<[2, 2], offset: 0>>,
+func.func @example(%A: memref<8x?xf32, strided<[2, 2]>>,
               %B: memref<?xvector<4xf32>>) {
   linalg.generic #attrs
-  ins(%A: memref<8x?xf32, strided<[2, 2], offset: 0>>)
+  ins(%A: memref<8x?xf32, strided<[2, 2]>>)
   outs(%B: memref<?xvector<4xf32>>) {
   ^bb0(%a: f32, %b: vector<4xf32>):
     %c = "some_compute"(%a, %b): (f32, vector<4xf32>) -> (vector<4xf32>)
@@ -399,16 +399,16 @@ into a form that will resemble:
 // Run: mlir-opt example4.mlir -convert-linalg-to-std
 
 func.func @example(%arg0: memref<?x?xf32>, %arg1: memref<?x?xf32>, %arg2: memref<?x?xf32>) {
-  %0 = memref.cast %arg0 : memref<?x?xf32> to memref<?x?xf32, strided<[?, ?], offset: ?>>
-  %1 = memref.cast %arg1 : memref<?x?xf32> to memref<?x?xf32, strided<[?, ?], offset: ?>>
-  %2 = memref.cast %arg2 : memref<?x?xf32> to memref<?x?xf32, strided<[?, ?], offset: ?>>
-  call @pointwise_add(%0, %1, %2) : (memref<?x?xf32, strided<[?, ?], offset: ?>>,
-    memref<?x?xf32, strided<[?, ?], offset: ?>>, memref<?x?xf32, strided<[?, ?], offset: ?>>) -> ()
+  %0 = memref.cast %arg0 : memref<?x?xf32> to memref<?x?xf32, strided<[?, ?]>>
+  %1 = memref.cast %arg1 : memref<?x?xf32> to memref<?x?xf32, strided<[?, ?]>>
+  %2 = memref.cast %arg2 : memref<?x?xf32> to memref<?x?xf32, strided<[?, ?]>>
+  call @pointwise_add(%0, %1, %2) : (memref<?x?xf32, strided<[?, ?]>>,
+    memref<?x?xf32, strided<[?, ?]>>, memref<?x?xf32, strided<[?, ?]>>) -> ()
   return
 }
-func.func @pointwise_add(memref<?x?xf32, strided<[?, ?], offset: ?>>,
-                         memref<?x?xf32, strided<[?, ?], offset: ?>>,
-                         memref<?x?xf32, strided<[?, ?], offset: ?>>) attributes {llvm.emit_c_interface}
+func.func @pointwise_add(memref<?x?xf32, strided<[?, ?]>>,
+                         memref<?x?xf32, strided<[?, ?]>>,
+                         memref<?x?xf32, strided<[?, ?]>>) attributes {llvm.emit_c_interface}
 ```
 
 Which, after lowering to LLVM resembles:
diff --git a/mlir/include/mlir-c/BuiltinAttributes.h b/mlir/include/mlir-c/BuiltinAttributes.h
index 5619970a1117a..74c7730fc3e1e 100644
--- a/mlir/include/mlir-c/BuiltinAttributes.h
+++ b/mlir/include/mlir-c/BuiltinAttributes.h
@@ -746,16 +746,13 @@ MLIR_CAPI_EXPORTED MlirTypeID mlirSparseElementsAttrGetTypeID(void);
 // Checks wheather the given attribute is a strided layout attribute.
 MLIR_CAPI_EXPORTED bool mlirAttributeIsAStridedLayout(MlirAttribute attr);
 
-// Creates a strided layout attribute from given strides and offset.
+// Creates a strided layout attribute from the given strides.
 MLIR_CAPI_EXPORTED MlirAttribute
-mlirStridedLayoutAttrGet(MlirContext ctx, int64_t offset, intptr_t numStrides,
+mlirStridedLayoutAttrGet(MlirContext ctx, intptr_t numStrides,
                          const int64_t *strides);
 
 MLIR_CAPI_EXPORTED MlirStringRef mlirStridedLayoutAttrGetName(void);
 
-// Returns the offset in the given strided layout layout attribute.
-MLIR_CAPI_EXPORTED int64_t mlirStridedLayoutAttrGetOffset(MlirAttribute attr);
-
 // Returns the number of strides in the given strided layout attribute.
 MLIR_CAPI_EXPORTED intptr_t
 mlirStridedLayoutAttrGetNumStrides(MlirAttribute attr);
diff --git a/mlir/include/mlir/Dialect/MemRef/IR/MemRefOps.td b/mlir/include/mlir/Dialect/MemRef/IR/MemRefOps.td
index 9dba4d790d631..74ed0d9f5952a 100644
--- a/mlir/include/mlir/Dialect/MemRef/IR/MemRefOps.td
+++ b/mlir/include/mlir/Dialect/MemRef/IR/MemRefOps.td
@@ -535,12 +535,12 @@ def MemRef_CastOp : MemRef_Op<"cast", [
     // The same holds true for offsets and strides.
 
     // Assert that the input dynamic shape matches the destination static stride.
-    %4 = memref.cast %1 : memref<12x4xf32, strided<[?, ?], offset: ?>> to
-                          memref<12x4xf32, strided<[4, 1], offset: 5>>
+    %4 = memref.cast %1 : memref<12x4xf32, strided<[?, ?]>> to
+                          memref<12x4xf32, strided<[4, 1]>>
     // Erase static offset and stride information, replacing it with
     // dynamic information.
-    %5 = memref.cast %1 : memref<12x4xf32, strided<[4, 1], offset: 5>> to
-                          memref<12x4xf32, strided<[?, ?], offset: ?>>
+    %5 = memref.cast %1 : memref<12x4xf32, strided<[4, 1]>> to
+                          memref<12x4xf32, strided<[?, ?]>>
     ```
 
     b. Either or both memref types are unranked with the same element type, and
@@ -1041,7 +1041,7 @@ def MemRef_ExtractStridedMetadataOp : MemRef_Op<"extract_strided_metadata", [
           offset: [%offset],
           sizes: [%sizes#0, %sizes#1],
           strides: [%strides#0, %strides#1]
-        : memref<f32> to memref<?x?xf32, strided<[?, ?], offset:?>>
+        : memref<f32> to memref<?x?xf32, strided<[?, ?]>>
     ```
   }];
 
@@ -1510,15 +1510,15 @@ def MemRef_ReinterpretCastOp
       offset: [9],
       sizes: [4, 4],
       strides: [16, 2]
-    : memref<8x8xf32, strided<[8, 1], offset: 0>> to
-      memref<4x4xf32, strided<[16, 2], offset: 9>>
+    : memref<8x8xf32, strided<[8, 1]>> to
+      memref<4x4xf32, strided<[16, 2]>>
 
     %result2 = memref.reinterpret_cast %result1 to
       offset: [0],
       sizes: [2, 2],
       strides: [4, 2]
-    : memref<4x4xf32, strided<[16, 2], offset: 9>> to
-      memref<2x2xf32, strided<[4, 2], offset: 0>>
+    : memref<4x4xf32, strided<[16, 2]>> to
+      memref<2x2xf32, strided<[4, 2]>>
     ```
 
     The underlying memory of `%arg0` consists of a linear sequence of integers
@@ -1573,13 +1573,13 @@ def MemRef_ReinterpretCastOp
       offset: [0],
       sizes: [%size0, 10],
       strides: [1, %stride1]
-    : memref<?x?xf32> to memref<?x10xf32, strided<[1, ?], offset: 0>>
+    : memref<?x?xf32> to memref<?x10xf32, strided<[1, ?]>>
 
     memref.reinterpret_cast %unranked to
       offset: [%offset],
       sizes: [%size0, %size1],
       strides: [%stride0, %stride1]
-    : memref<*xf32> to memref<?x?xf32, strided<[?, ?], offset: ?>>
+    : memref<*xf32> to memref<?x?xf32, strided<[?, ?]>>
     ```
 
     This operation creates a new memref descriptor using the base of the
@@ -1590,7 +1590,7 @@ def MemRef_ReinterpretCastOp
       offset: [%offset],
       sizes: [%sizes],
       strides: [%strides] :
-      memref<*xf32> to memref<?x?xf32, strided<[?, ?], offset: ?>>
+      memref<*xf32> to memref<?x?xf32, strided<[?, ?]>>
     ```
     means that `%dst`'s descriptor will be:
     ```mlir
@@ -2181,12 +2181,12 @@ def SubViewOp : MemRef_OpWithOffsetSizesAndStrides<"subview", [
 
     ```mlir
     %result1 = memref.subview %arg0[1, 1][4, 4][2, 2]
-    : memref<8x8xf32, strided<[8, 1], offset: 0>> to
-      memref<4x4xf32, strided<[16, 2], offset: 9>>
+    : memref<8x8xf32, strided<[8, 1]>> to
+      memref<4x4xf32, strided<[16, 2]>>
 
     %result2 = memref.subview %result1[1, 1][2, 2][2, 2]
-    : memref<4x4xf32, strided<[16, 2], offset: 9>> to
-      memref<2x2xf32, strided<[32, 4], offset: 27>>
+    : memref<4x4xf32, strided<[16, 2]>> to
+      memref<2x2xf32, strided<[32, 4]>>
     ```
 
     The underlying memory of `%arg0` consists of a linear sequence of integers
@@ -2234,8 +2234,8 @@ def SubViewOp : MemRef_OpWithOffsetSizesAndStrides<"subview", [
     // Subview of static memref with strided layout at static offsets, sizes
     // and strides.
     %1 = memref.subview %0[4, 2][8, 2][3, 2]
-        : memref<64x4xf32, strided<[7, 9], offset: 91>> to
-          memref<8x2xf32, strided<[21, 18], offset: 137>>
+        : memref<64x4xf32, strided<[7, 9]>> to
+          memref<8x2xf32, strided<[21, 18]>>
     ```
 
     Example 3:
@@ -2244,7 +2244,7 @@ def SubViewOp : MemRef_OpWithOffsetSizesAndStrides<"subview", [
     // Subview of static memref with identity layout at dynamic offsets, sizes
     // and strides.
     %1 = memref.subview %0[%off0, %off1][%sz0, %sz1][%str0, %str1]
-        : memref<64x4xf32> to memref<?x?xf32, strided<[?, ?], offset: ?>>
+        : memref<64x4xf32> to memref<?x?xf32, strided<[?, ?]>>
     ```
 
     Example 4:
@@ -2253,8 +2253,8 @@ def SubViewOp : MemRef_OpWithOffsetSizesAndStrides<"subview", [
     // Subview of dynamic memref with strided layout at dynamic offsets and
     // strides, but static sizes.
     %1 = memref.subview %0[%off0, %off1][4, 4][%str0, %str1]
-        : memref<?x?xf32, strided<[?, ?], offset: ?>> to
-          memref<4x4xf32, strided<[?, ?], offset: ?>>
+        : memref<?x?xf32, strided<[?, ?]>> to
+          memref<4x4xf32, strided<[?, ?]>>
     ```
 
     Example 5:
@@ -2264,7 +2264,7 @@ def SubViewOp : MemRef_OpWithOffsetSizesAndStrides<"subview", [
     %1 = memref.subview %0[0, 0, 0][1, 16, 4][1, 1, 1]
         : memref<8x16x4xf32> to memref<16x4xf32>
     %3 = memref.subview %2[3, 4, 2][1, 6, 3][1, 1, 1]
-        : memref<8x16x4xf32> to memref<6x3xf32, strided<[4, 1], offset: 210>>
+        : memref<8x16x4xf32> to memref<6x3xf32, strided<[4, 1]>>
     ```
 
     Example 6:
diff --git a/mlir/include/mlir/Dialect/MemRef/Transforms/Transforms.h b/mlir/include/mlir/Dialect/MemRef/Transforms/Transforms.h
index 62745f8fa1dfa..8ee52f1a54d11 100644
--- a/mlir/include/mlir/Dialect/MemRef/Transforms/Transforms.h
+++ b/mlir/include/mlir/Dialect/MemRef/Transforms/Transforms.h
@@ -121,7 +121,7 @@ void populateMemRefNarrowTypeEmulationConversions(
 ///   %d = arith.divsi %s, %c3 : index
 ///   %i = arith.remsi %d, %c5 : index
 ///   %sv = memref.subview %0[%i, 0, 0] [1, 4, 128] [1, 1, 1] :
-///     memref<5x4x128xf32> to memref<4x128xf32, strided<[128, 1], offset: ?>>
+///     memref<5x4x128xf32> to memref<4x128xf32, strided<[128, 1]>>
 ///   memref.copy %1, %sv : memref<4x128xf32> to memref<4x128xf32, strided<...>>
 ///   "some_use"(%sv) : (memref<4x128xf32, strided<...>) -> ()
 /// }
diff --git a/mlir/include/mlir/Dialect/OpenACC/OpenACCOps.td b/mlir/include/mlir/Dialect/OpenACC/OpenACCOps.td
index 32ecaa6bc2d42..ff3cec297409d 100644
--- a/mlir/include/mlir/Dialect/OpenACC/OpenACCOps.td
+++ b/mlir/include/mlir/Dialect/OpenACC/OpenACCOps.td
@@ -1538,7 +1538,7 @@ def OpenACC_FirstprivateRecipeOp
       %extent_inner = acc.get_extent %bounds_inner : (!acc.data_bounds_ty) -> index
       %extent_outer = acc.get_extent %bounds_outer : (!acc.data_bounds_ty) -> index
       %subview = memref.subview %original[%lb_outer, %lb_inner][%extent_outer, %extent_inner][1, 1]
-        : memref<10x20xf32> to memref<?x?xf32, strided<[20, 1], offset: ?>>
+        : memref<10x20xf32> to memref<?x?xf32, strided<[20, 1]>>
       // Copy subview to privatized...
       acc.terminator
     }
@@ -1656,13 +1656,13 @@ def OpenACC_ReductionRecipeOp
 
       // Create subviews to access only the slice portions
       %lhs_slice = memref.subview %lhs[%lb_outer, %lb_inner][%extent_outer, %extent_inner][1, 1]
-        : memref<10x20xf32> to memref<?x?xf32, strided<[20, 1], offset: ?>>
+        : memref<10x20xf32> to memref<?x?xf32, strided<[20, 1]>>
       %rhs_slice = memref.subview %rhs[%lb_outer, %lb_inner][%extent_outer, %extent_inner][1, 1]
-        : memref<10x20xf32> to memref<?x?xf32, strided<[20, 1], offset: ?>>
+        : memref<10x20xf32> to memref<?x?xf32, strided<[20, 1]>>
 
       // Combine only the slice portions
-      linalg.add ins(%lhs_slice, %rhs_slice : memref<?x?xf32, strided<[20, 1], offset: ?>>, memref<?x?xf32, strided<[20, 1], offset: ?>>)
-                 outs(%lhs_slice : memref<?x?xf32, strided<[20, 1], offset: ?>>)
+      linalg.add ins(%lhs_slice, %rhs_slice : memref<?x?xf32, strided<[20, 1]>>, memref<?x?xf32, strided<[20, 1]>>)
+                 outs(%lhs_slice : memref<?x?xf32, strided<[20, 1]>>)
       acc.yield %lhs : memref<10x20xf32>
     }
 
diff --git a/mlir/include/mlir/IR/BuiltinAttributes.td b/mlir/include/mlir/IR/BuiltinAttributes.td
index 299200788136a..e35de7aafdce9 100644
--- a/mlir/include/mlir/IR/BuiltinAttributes.td
+++ b/mlir/include/mlir/IR/BuiltinAttributes.td
@@ -1031,8 +1031,7 @@ def StridedLayoutAttr : Builtin_Attr<"StridedLayout", "strided_layout",
     Syntax:
 
     ```
-    strided-layout-attribute ::= `strided` `<` `[` stride-list `]`
-                                 (`,` `offset` `:` dimension)? `>`
+    strided-layout-attribute ::= `strided` `<` `[` stride-list `]` `>`
     stride-list ::= /*empty*/
                   | dimension (`,` dimension)*
     dimension ::= decimal-literal | `?`
@@ -1043,22 +1042,22 @@ def StridedLayoutAttr : Builtin_Attr<"StridedLayout", "strided_layout",
     each dimension. A stride is the number of elements in the linear storage
     one must step over to reflect an increment in the given dimension. For
     example, a `MxN` row-major contiguous shaped type would have the strides
-    `[N, 1]`. The layout attribute also contains the _offset_ from the base
-    pointer of the shaped type to the first effectively accessed element,
-    expressed in terms of the number of contiguously stored elements.
+    `[N, 1]`.
 
-    Strides must be positive and the offset must be non-negative. Both the
-    strides and the offset may be _dynamic_, i.e. their value may not be known
-    at compile time. This is expressed as a `?` in the assembly syntax and as
-    `ShapedType::kDynamic` in the code. Stride and offset values
-    must satisfy the constraints above at runtime, the behavior is undefined
-    otherwise.
+    Strides must be positive. They may be _dynamic_, i.e. their value may not
+    be known at compile time. This is expressed as a `?` in the assembly syntax
+    and as `ShapedType::kDynamic` in the code. Stride values must satisfy the
+    constraints above at runtime, the behavior is undefined otherwise.
+
+    The offset of a strided memref is not represented in the type. Operations
+    that need to express an offset (`memref.subview`, `memref.reinterpret_cast`,
+    `memref.extract_strided_metadata`) carry it as an explicit operand or
+    result.
 
     See [Dialects/Builtin.md#memreftype](MemRef type) for more information.
   }];
 
   let parameters = (ins
-    "int64_t":$offset,
     ArrayRefParameter<
       "int64_t",
       "array of strides (64-bit integer)"
@@ -1070,8 +1069,8 @@ def StridedLayoutAttr : Builtin_Attr<"StridedLayout", "strided_layout",
     /// Print the attribute to the given output stream.
     void print(raw_ostream &os) const;
 
-    /// Returns true if this layout is static, i.e. the strides and offset all
-    /// have a known value > 0.
+    /// Returns true if this layout is static, i.e. all strides have a known
+    /// value > 0.
     bool hasStaticLayout() const;
   }];
 }
diff --git a/mlir/include/mlir/IR/BuiltinTypes.td b/mlir/include/mlir/IR/BuiltinTypes.td
index 20c41c5f79729..0db4c9174bab0 100644
--- a/mlir/include/mlir/IR/BuiltinTypes.td
+++ b/mlir/include/mlir/IR/BuiltinTypes.td
@@ -802,18 +802,17 @@ def Builtin_MemRef : Builtin_Type<"MemRef", "memref", [
     even elements of the dense consecutive storage along the innermost
     dimension.
 
-    The strided layout supports an optional _offset_ that indicates the
-    distance, in the number of elements, between the beginning of the memref
-    and the first accessed element. When omitted, the offset is considered to
-    be zero. That is, `memref<2, strided<[2], offset: 0>>` and
-    `memref<2, strided<[2]>>` are strictly the same type.
+    The strided layout does not carry an offset. The offset between the
+    base pointer of the underlying buffer and the first accessed element is
+    a runtime concept exposed by ops such as `memref.subview`,
+    `memref.reinterpret_cast`, and `memref.extract_strided_metadata`.
 
-    Both offsets and strides may be _dynamic_, that is, unknown at compile time.
-    This is represented by using a question mark (`?`) instead of the value in
-    the textual form of the IR.
+    Strides may be _dynamic_, that is, unknown at compile time. This is
+    represented by using a question mark (`?`) instead of the value in the
+    textual form of the IR.
 
     The strided layout converts into the following canonical one-dimensional
-    affine form through explicit linearization:
+    affine form through explicit linearization, with a symbolic offset:
 
     ```mlir
     affine_map<(d0, ... dN)[offset, stride0, ... strideN] ->
diff --git a/mlir/lib/AsmParser/AttributeParser.cpp b/mlir/lib/AsmParser/AttributeParser.cpp
index d7075b795ccb9..675a6f9e608fa 100644
--- a/mlir/lib/AsmParser/AttributeParser.cpp
+++ b/mlir/lib/AsmParser/AttributeParser.cpp
@@ -1291,30 +1291,13 @@ Attribute Parser::parseStridedLayoutAttr() {
     } while (consumeIf(Token::comma));
   }
 
-  if (failed(parseToken(Token::r_square, "expected ']'")))
+  if (failed(parseToken(Token::r_square, "expected ']'")) ||
+      failed(parseToken(Token::greater, "expected '>'")))
     return nullptr;
 
-  // Fast path in absence of offset.
-  if (consumeIf(Token::greater)) {
-    if (failed(StridedLayoutAttr::verify(errorEmitter,
-                                         /*offset=*/0, strides)))
-      return nullptr;
-    return StridedLayoutAttr::get(getContext(), /*offset=*/0, strides);
-  }
-
-  if (failed(parseToken(Token::comma, "expected ','")) ||
-      failed(parseToken(Token::kw_offset, "expected 'offset' after comma")) ||
-      failed(parseToken(Token::colon, "expected ':' after 'offset'")))
-    return nullptr;
-
-  std::optional<int64_t> offset = parseStrideOrOffset();
-  if (!offset || failed(parseToken(Token::greater, "expected '>'")))
-    return nullptr;
-
-  if (failed(StridedLayoutAttr::verify(errorEmitter, *offset, strides)))
+  if (failed(StridedLayoutAttr::verify(errorEmitter, strides)))
     return nullptr;
-  return StridedLayoutAttr::get(getContext(), *offset, strides);
-  // return getChecked<StridedLayoutAttr>(loc,getContext(), *offset, strides);
+  return StridedLayoutAttr::get(getContext(), strides);
 }
 
 /// Parse a distinct attribute.
diff --git a/mlir/lib/AsmParser/TokenKinds.def b/mlir/lib/AsmParser/TokenKinds.def
index fe7c53753e156..cd1ad29a1d11d 100644
--- a/mlir/lib/AsmParser/TokenKinds.def
+++ b/mlir/lib/AsmParser/TokenKinds.def
@@ -118,7 +118,6 @@ TOK_KEYWORD(memref)
 TOK_KEYWORD(min)
 TOK_KEYWORD(mod)
 TOK_KEYWORD(none)
-TOK_KEYWORD(offset)
 TOK_KEYWORD(size)
 TOK_KEYWORD(sparse)
 TOK_KEYWORD(step)
diff --git a/mlir/lib/Bindings/Python/IRAttributes.cpp b/mlir/lib/Bindings/Python/IRAttributes.cpp
index 7fada5bbc8502..1e13512d7db5d 100644
--- a/mlir/lib/Bindings/Python/IRAttributes.cpp
+++ b/mlir/lib/Bindings/Python/IRAttributes.cpp
@@ -1269,13 +1269,12 @@ void PyUnitAttribute::bindDerived(ClassTy &c) {
 void PyStridedLayoutAttribute::bindDerived(ClassTy &c) {
   c.def_static(
       "get",
-      [](int64_t offset, const std::vector<int64_t> &strides,
-         DefaultingPyMlirContext ctx) {
+      [](const std::vector<int64_t> &strides, DefaultingPyMlirContext ctx) {
         MlirAttribute attr = mlirStridedLayoutAttrGet(
-            ctx->get(), offset, strides.size(), strides.data());
+            ctx->get(), strides.size(), strides.data());
         return PyStridedLayoutAttribute(ctx->getRef(), attr);
       },
-      nb::arg("offset"), nb::arg("strides"), nb::arg("context") = nb::none(),
+      nb::arg("strides"), nb::arg("context") = nb::none(),
       "Gets a strided layout attribute.");
   c.def_static(
       "get_fully_dynamic",
@@ -1284,19 +1283,11 @@ void PyStridedLayoutAttribute::bindDerived(ClassTy &c) {
         std::vector<int64_t> strides(rank);
         std::fill(strides.begin(), strides.end(), dynamic);
         MlirAttribute attr = mlirStridedLayoutAttrGet(
-            ctx->get(), dynamic, strides.size(), strides.data());
+            ctx->get(), strides.size(), strides.data());
         return PyStridedLayoutAttribute(ctx->getRef(), attr);
       },
       nb::arg("rank"), nb::arg("context") = nb::none(),
-      "Gets a strided layout attribute with dynamic offset and strides of "
-      "a "
-      "given rank.");
-  c.def_prop_ro(
-      "offset",
-      [](PyStridedLayoutAttribute &self) {
-        return mlirStridedLayoutAttrGetOffset(self);
-      },
-      "Returns the value of the float point attribute");
+      "Gets a strided layout attribute with dynamic strides of a given rank.");
   c.def_prop_ro(
       "strides",
       [](PyStridedLayoutAttribute &self) {
diff --git a/mlir/lib/CAPI/IR/BuiltinAttributes.cpp b/mlir/lib/CAPI/IR/BuiltinAttributes.cpp
index 4ced5fe111645..49c3fe194b1b9 100644
--- a/mlir/lib/CAPI/IR/BuiltinAttributes.cpp
+++ b/mlir/lib/CAPI/IR/BuiltinAttributes.cpp
@@ -1038,10 +1038,9 @@ bool mlirAttributeIsAStridedLayout(MlirAttribute attr) {
   return llvm::isa<StridedLayoutAttr>(unwrap(attr));
 }
 
-MlirAttribute mlirStridedLayoutAttrGet(MlirContext ctx, int64_t offset,
-                                       intptr_t numStrides,
+MlirAttribute mlirStridedLayoutAttrGet(MlirContext ctx, intptr_t numStrides,
                                        const int64_t *strides) {
-  return wrap(StridedLayoutAttr::get(unwrap(ctx), offset,
+  return wrap(StridedLayoutAttr::get(unwrap(ctx),
                                      ArrayRef<int64_t>(strides, numStrides)));
 }
 
@@ -1049,10 +1048,6 @@ MlirStringRef mlirStridedLayoutAttrGetName(void) {
   return wrap(StridedLayoutAttr::name);
 }
 
-int64_t mlirStridedLayoutAttrGetOffset(MlirAttribute attr) {
-  return llvm::cast<StridedLayoutAttr>(unwrap(attr)).getOffset();
-}
-
 intptr_t mlirStridedLayoutAttrGetNumStrides(MlirAttribute attr) {
   return static_cast<intptr_t>(
       llvm::cast<StridedLayoutAttr>(unwrap(attr)).getStrides().size());
diff --git a/mlir/lib/Conversion/LinalgToStandard/LinalgToStandard.cpp b/mlir/lib/Conversion/LinalgToStandard/LinalgToStandard.cpp
index 54c554eb6bd93..8a03921fb557c 100644
--- a/mlir/lib/Conversion/LinalgToStandard/LinalgToStandard.cpp
+++ b/mlir/lib/Conversion/LinalgToStandard/LinalgToStandard.cpp
@@ -26,7 +26,7 @@ using namespace mlir::linalg;
 
 static MemRefType makeStridedLayoutDynamic(MemRefType type) {
   return MemRefType::Builder(type).setLayout(StridedLayoutAttr::get(
-      type.getContext(), ShapedType::kDynamic,
+      type.getContext(),
       SmallVector<int64_t>(type.getRank(), ShapedType::kDynamic)));
 }
 
diff --git a/mlir/lib/Dialect/AMDGPU/IR/AMDGPUOps.cpp b/mlir/lib/Dialect/AMDGPU/IR/AMDGPUOps.cpp
index d4811275b6fd6..faee30e70ad9d 100644
--- a/mlir/lib/Dialect/AMDGPU/IR/AMDGPUOps.cpp
+++ b/mlir/lib/Dialect/AMDGPU/IR/AMDGPUOps.cpp
@@ -82,7 +82,7 @@ static FailureOr<MemRefType> getFatRawBufferTypeLike(MemRefType source,
     if (!stridedLayout)
       return failure();
     MemRefLayoutAttrInterface newLayout =
-        StridedLayoutAttr::get(ctx, 0, stridedLayout.getStrides());
+        StridedLayoutAttr::get(ctx, stridedLayout.getStrides());
     // Special case: if resetting the offset causes the strided layout to become
     // the identity layout, then reset to the identity layout.
     // TODO: this'll get a lot simpler when we have the contiguous layout.
diff --git a/mlir/lib/Dialect/Bufferization/IR/BufferizableOpInterface.cpp b/mlir/lib/Dialect/Bufferization/IR/BufferizableOpInterface.cpp
index 08319ef9df79a..57bf087d149ce 100644
--- a/mlir/lib/Dialect/Bufferization/IR/BufferizableOpInterface.cpp
+++ b/mlir/lib/Dialect/Bufferization/IR/BufferizableOpInterface.cpp
@@ -842,11 +842,10 @@ bufferization::getMemRefTypeWithFullyDynamicLayout(TensorType tensorType,
 
   // Case 2: Ranked memref type.
   auto rankedTensorType = llvm::cast<RankedTensorType>(tensorType);
-  int64_t dynamicOffset = ShapedType::kDynamic;
   SmallVector<int64_t> dynamicStrides(rankedTensorType.getRank(),
                                       ShapedType::kDynamic);
-  auto stridedLayout = StridedLayoutAttr::get(tensorType.getContext(),
-                                              dynamicOffset, dynamicStrides);
+  auto stridedLayout =
+      StridedLayoutAttr::get(tensorType.getContext(), dynamicStrides);
   return MemRefType::get(rankedTensorType.getShape(),
                          rankedTensorType.getElementType(), stridedLayout,
                          memorySpace);
diff --git a/mlir/lib/Dialect/GPU/Transforms/DecomposeMemRefs.cpp b/mlir/lib/Dialect/GPU/Transforms/DecomposeMemRefs.cpp
index 7b30906abc2fd..4a21095b35566 100644
--- a/mlir/lib/Dialect/GPU/Transforms/DecomposeMemRefs.cpp
+++ b/mlir/lib/Dialect/GPU/Transforms/DecomposeMemRefs.cpp
@@ -27,13 +27,9 @@ namespace mlir {
 
 using namespace mlir;
 
-static MemRefType inferCastResultType(Value source, OpFoldResult offset) {
+static MemRefType inferCastResultType(Value source) {
   auto sourceType = cast<BaseMemRefType>(source.getType());
-  SmallVector<int64_t> staticOffsets;
-  SmallVector<Value> dynamicOffsets;
-  dispatchIndexOpFoldResults(offset, dynamicOffsets, staticOffsets);
-  auto stridedLayout =
-      StridedLayoutAttr::get(source.getContext(), staticOffsets.front(), {});
+  auto stridedLayout = StridedLayoutAttr::get(source.getContext(), {});
   return MemRefType::get({}, sourceType.getElementType(), stridedLayout,
                          sourceType.getMemorySpace());
 }
@@ -107,7 +103,7 @@ static Value getFlatMemref(OpBuilder &rewriter, Location loc, Value source,
   SmallVector<OpFoldResult> offsetsTemp = getAsOpFoldResult(offsets);
   auto &&[base, offset, ignore] =
       getFlatOffsetAndStrides(rewriter, loc, source, offsetsTemp);
-  MemRefType retType = inferCastResultType(base, offset);
+  MemRefType retType = inferCastResultType(base);
   return memref::ReinterpretCastOp::create(rewriter, loc, retType, base, offset,
                                            ArrayRef<OpFoldResult>(),
                                            ArrayRef<OpFoldResult>());
diff --git a/mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp b/mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
index 06d5bddbc03cd..9c52f64099278 100644
--- a/mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
+++ b/mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
@@ -2045,8 +2045,8 @@ void ReinterpretCastOp::build(OpBuilder &b, OperationState &result,
   dispatchIndexOpFoldResults(offset, dynamicOffsets, staticOffsets);
   dispatchIndexOpFoldResults(sizes, dynamicSizes, staticSizes);
   dispatchIndexOpFoldResults(strides, dynamicStrides, staticStrides);
-  auto stridedLayout = StridedLayoutAttr::get(
-      b.getContext(), staticOffsets.front(), staticStrides);
+  auto stridedLayout =
+      StridedLayoutAttr::get(b.getContext(), staticStrides);
   auto resultType = MemRefType::get(staticSizes, sourceType.getElementType(),
                                     stridedLayout, sourceType.getMemorySpace());
   build(b, result, resultType, source, offset, sizes, strides, attrs);
@@ -2102,23 +2102,15 @@ LogicalResult ReinterpretCastOp::verify() {
              << " instead of " << resultSize << " in dim = " << idx;
   }
 
-  // Match offset and strides in static_offset and static_strides attributes. If
-  // result memref type has no affine map specified, this will assume an
-  // identity layout.
+  // Match strides in static_strides attribute. The result type no longer
+  // carries an offset, so the static_offsets attribute is the sole carrier of
+  // offset information for this op and is not cross-checked here.
   int64_t resultOffset;
   SmallVector<int64_t, 4> resultStrides;
   if (failed(resultType.getStridesAndOffset(resultStrides, resultOffset)))
     return emitError("expected result type to have strided layout but found ")
            << resultType;
-
-  // Match offset in result memref type and in static_offsets attribute.
-  int64_t expectedOffset = getStaticOffsets().front();
-  if (ShapedType::isStatic(resultOffset) && resultOffset != expectedOffset)
-    return emitError("expected result type with offset = ")
-           << (ShapedType::isDynamic(expectedOffset)
-                   ? std::string("dynamic")
-                   : std::to_string(expectedOffset))
-           << " instead of " << resultOffset;
+  (void)resultOffset;
 
   // Match strides in result memref type and in static_strides attribute.
   for (auto [idx, resultStride, expectedStride] :
@@ -2532,7 +2524,7 @@ computeExpandedLayoutMap(MemRefType srcType, ArrayRef<int64_t> resultShape,
   }
   auto resultStrides = llvm::to_vector<8>(llvm::reverse(reverseResultStrides));
   resultStrides.resize(resultShape.size(), 1);
-  return StridedLayoutAttr::get(srcType.getContext(), srcOffset, resultStrides);
+  return StridedLayoutAttr::get(srcType.getContext(), resultStrides);
 }
 
 FailureOr<MemRefType> ExpandShapeOp::computeExpandedType(
@@ -2828,7 +2820,7 @@ computeCollapsedLayoutMap(MemRefType srcType,
         return failure();
     }
   }
-  return StridedLayoutAttr::get(srcType.getContext(), srcOffset, resultStrides);
+  return StridedLayoutAttr::get(srcType.getContext(), resultStrides);
 }
 
 bool CollapseShapeOp::isGuaranteedCollapsible(
@@ -3081,19 +3073,9 @@ MemRefType SubViewOp::inferResultType(MemRefType sourceMemRefType,
   assert(staticSizes.size() == rank && "staticSizes length mismatch");
   assert(staticStrides.size() == rank && "staticStrides length mismatch");
 
-  // Extract source offset and strides.
+  // Extract source strides (offset is no longer carried by the type).
   auto [sourceStrides, sourceOffset] = sourceMemRefType.getStridesAndOffset();
-
-  // Compute target offset whose value is:
-  //   `sourceOffset + sum_i(staticOffset_i * sourceStrides_i)`.
-  int64_t targetOffset = sourceOffset;
-  for (auto it : llvm::zip(staticOffsets, sourceStrides)) {
-    auto staticOffset = std::get<0>(it), sourceStride = std::get<1>(it);
-    targetOffset = (SaturatedInteger::wrap(targetOffset) +
-                    SaturatedInteger::wrap(staticOffset) *
-                        SaturatedInteger::wrap(sourceStride))
-                       .asInteger();
-  }
+  (void)sourceOffset;
 
   // Compute target stride whose value is:
   //   `sourceStrides_i * staticStrides_i`.
@@ -3107,10 +3089,10 @@ MemRefType SubViewOp::inferResultType(MemRefType sourceMemRefType,
   }
 
   // The type is now known.
-  return MemRefType::get(staticSizes, sourceMemRefType.getElementType(),
-                         StridedLayoutAttr::get(sourceMemRefType.getContext(),
-                                                targetOffset, targetStrides),
-                         sourceMemRefType.getMemorySpace());
+  return MemRefType::get(
+      staticSizes, sourceMemRefType.getElementType(),
+      StridedLayoutAttr::get(sourceMemRefType.getContext(), targetStrides),
+      sourceMemRefType.getMemorySpace());
 }
 
 MemRefType SubViewOp::inferResultType(MemRefType sourceMemRefType,
@@ -3158,7 +3140,6 @@ MemRefType SubViewOp::inferRankReducedResultType(
   }
   return MemRefType::get(resultShape, inferredType.getElementType(),
                          StridedLayoutAttr::get(inferredLayout.getContext(),
-                                                inferredLayout.getOffset(),
                                                 rankReducedStrides),
                          inferredType.getMemorySpace());
 }
@@ -3476,10 +3457,10 @@ static MemRefType getCanonicalSubViewResultType(
     strides.push_back(stride);
   }
 
-  return MemRefType::get(shape, nonRankReducedType.getElementType(),
-                         StridedLayoutAttr::get(sourceType.getContext(),
-                                                layout.getOffset(), strides),
-                         nonRankReducedType.getMemorySpace());
+  return MemRefType::get(
+      shape, nonRankReducedType.getElementType(),
+      StridedLayoutAttr::get(sourceType.getContext(), strides),
+      nonRankReducedType.getMemorySpace());
 }
 
 Value mlir::memref::createCanonicalRankReducingSubViewOp(
@@ -3556,13 +3537,13 @@ namespace {
 /// ```
 ///   %0 = memref.cast %V : memref<16x16xf32> to memref<?x?xf32>
 ///   %1 = memref.subview %0[0, 0][3, 4][1, 1] :
-///     memref<?x?xf32> to memref<3x4xf32, strided<[?, 1], offset: ?>>
+///     memref<?x?xf32> to memref<3x4xf32, strided<[?, 1]>>
 /// ```
 /// is rewritten into:
 /// ```
 ///   %0 = memref.subview %V: memref<16x16xf32> to memref<3x4xf32, #[[map0]]>
-///   %1 = memref.cast %0: memref<3x4xf32, strided<[16, 1], offset: 0>> to
-///     memref<3x4xf32, strided<[?, 1], offset: ?>>
+///   %1 = memref.cast %0: memref<3x4xf32, strided<[16, 1]>> to
+///     memref<3x4xf32, strided<[?, 1]>>
 /// ```
 class SubViewOpMemRefCastFolder final : public OpRewritePattern<SubViewOp> {
 public:
@@ -3658,10 +3639,10 @@ struct SubViewReturnTypeCanonicalizer {
       targetShape.push_back(nonReducedType.getDimSize(i));
     }
 
-    return MemRefType::get(targetShape, nonReducedType.getElementType(),
-                           StridedLayoutAttr::get(nonReducedType.getContext(),
-                                                  offset, targetStrides),
-                           nonReducedType.getMemorySpace());
+    return MemRefType::get(
+        targetShape, nonReducedType.getElementType(),
+        StridedLayoutAttr::get(nonReducedType.getContext(), targetStrides),
+        nonReducedType.getMemorySpace());
   }
 };
 
@@ -3789,6 +3770,7 @@ static MemRefType inferTransposeResultType(MemRefType memRefType,
                                            AffineMap permutationMap) {
   auto originalSizes = memRefType.getShape();
   auto [originalStrides, offset] = memRefType.getStridesAndOffset();
+  (void)offset;
   assert(originalStrides.size() == static_cast<unsigned>(memRefType.getRank()));
 
   // Compute permuted sizes and strides.
@@ -3797,8 +3779,7 @@ static MemRefType inferTransposeResultType(MemRefType memRefType,
 
   return MemRefType::Builder(memRefType)
       .setShape(sizes)
-      .setLayout(
-          StridedLayoutAttr::get(memRefType.getContext(), offset, strides));
+      .setLayout(StridedLayoutAttr::get(memRefType.getContext(), strides));
 }
 
 void TransposeOp::build(OpBuilder &b, OperationState &result, Value in,
diff --git a/mlir/lib/Dialect/MemRef/Transforms/ElideReinterpretCast.cpp b/mlir/lib/Dialect/MemRef/Transforms/ElideReinterpretCast.cpp
index 01632c6ea1579..bff1f2eec25f1 100644
--- a/mlir/lib/Dialect/MemRef/Transforms/ElideReinterpretCast.cpp
+++ b/mlir/lib/Dialect/MemRef/Transforms/ElideReinterpretCast.cpp
@@ -125,10 +125,10 @@ static bool isScalarSlice(memref::ReinterpretCastOp rc) {
 ///   %view = memref.reinterpret_cast %base
 ///     to offset: [%off], sizes: [1, ..., 1], strides: [N, ..., 1]
 ///       : memref<1x...xNxf32>
-///         to memref<1x...x1xf32, strided<[N, ..., 1], offset: ?>>
+///         to memref<1x...x1xf32, strided<[N, ..., 1]>>
 ///   memref.copy %src, %view
 ///     : memref<1x...x1xf32>
-///       to memref<1x...x1xf32, strided<[N, ..., 1], offset: ?>>
+///       to memref<1x...x1xf32, strided<[N, ..., 1]>>
 ///
 /// AFTER
 ///   %c0 = arith.constant 0 : index
@@ -139,10 +139,10 @@ static bool isScalarSlice(memref::ReinterpretCastOp rc) {
 ///   %view = memref.reinterpret_cast %base
 ///     to offset: [%off], sizes: [1, ..., 1], strides: [1, ..., N]
 ///       : memref<Nx...x1xf32>
-///         to memref<1x...x1xf32, strided<[1, ..., N], offset: ?>>
+///         to memref<1x...x1xf32, strided<[1, ..., N]>>
 ///   memref.copy %src, %view
 ///     : memref<1x...x1xf32>
-///       to memref<1x...x1xf32, strided<[1, ..., N], offset: ?>>
+///       to memref<1x...x1xf32, strided<[1, ..., N]>>
 ///
 /// AFTER
 ///   %c0 = arith.constant 0 : index
diff --git a/mlir/lib/Dialect/MemRef/Transforms/EmulateNarrowType.cpp b/mlir/lib/Dialect/MemRef/Transforms/EmulateNarrowType.cpp
index d24224355ed51..c1a4716fc8668 100644
--- a/mlir/lib/Dialect/MemRef/Transforms/EmulateNarrowType.cpp
+++ b/mlir/lib/Dialect/MemRef/Transforms/EmulateNarrowType.cpp
@@ -690,24 +690,14 @@ void memref::populateMemRefNarrowTypeEmulationConversions(
         if (!newElemTy)
           return nullptr;
 
+        // The strided layout no longer carries offset information. The
+        // lowering of any op that produced an offset against the source memref
+        // is responsible for materializing the equivalent offset on the
+        // narrow-element memref.
         StridedLayoutAttr layoutAttr;
-        // If the offset is 0, we do not need a strided layout as the stride is
-        // 1, so we only use the strided layout if the offset is not 0.
-        if (offset != 0) {
-          if (offset == ShapedType::kDynamic) {
-            layoutAttr = StridedLayoutAttr::get(ty.getContext(), offset,
-                                                ArrayRef<int64_t>{1});
-          } else {
-            // Check if the number of bytes are a multiple of the loadStoreWidth
-            // and if so, divide it by the loadStoreWidth to get the offset.
-            if ((offset * width) % loadStoreWidth != 0)
-              return std::nullopt;
-            offset = (offset * width) / loadStoreWidth;
-
-            layoutAttr = StridedLayoutAttr::get(ty.getContext(), offset,
-                                                ArrayRef<int64_t>{1});
-          }
-        }
+        if (offset != 0)
+          layoutAttr =
+              StridedLayoutAttr::get(ty.getContext(), ArrayRef<int64_t>{1});
 
         return MemRefType::get(getLinearizedShape(ty, width, loadStoreWidth),
                                newElemTy, layoutAttr, ty.getMemorySpace());
diff --git a/mlir/lib/Dialect/MemRef/Transforms/ExtractAddressComputations.cpp b/mlir/lib/Dialect/MemRef/Transforms/ExtractAddressComputations.cpp
index 9c922c28d0f54..bf49ec23e17ac 100644
--- a/mlir/lib/Dialect/MemRef/Transforms/ExtractAddressComputations.cpp
+++ b/mlir/lib/Dialect/MemRef/Transforms/ExtractAddressComputations.cpp
@@ -205,7 +205,7 @@ getGenericOpViewSizeForEachDim(RewriterBase &rewriter,
 /// =>
 /// %new_base = subview %base[%off0,.., %offN][1,..,1][1,..,1]
 /// %ld = memref.load %new_base[0,..,0] :
-///    memref<1x..x1xTy, strided<[1,..,1], offset: ?>>
+///    memref<1x..x1xTy, strided<[1,..,1]>>
 ///
 /// `getSrcMemRef` returns the source memref for the given load-like operation.
 ///
diff --git a/mlir/lib/Dialect/MemRef/Transforms/FlattenMemRefs.cpp b/mlir/lib/Dialect/MemRef/Transforms/FlattenMemRefs.cpp
index 6b56ea3ff5cac..b47a16f9f4ea5 100644
--- a/mlir/lib/Dialect/MemRef/Transforms/FlattenMemRefs.cpp
+++ b/mlir/lib/Dialect/MemRef/Transforms/FlattenMemRefs.cpp
@@ -276,7 +276,7 @@ struct AllocLikeFlattenPattern : public OpRewritePattern<AllocLikeOp> {
 
     auto flatMemrefType =
         MemRefType::get({flatDimSize}, memrefType.getElementType(),
-                        StridedLayoutAttr::get(rewriter.getContext(), 0, {1}),
+                        StridedLayoutAttr::get(rewriter.getContext(), {1}),
                         memrefType.getMemorySpace());
 
     // Collect the flat dynamic-size operand (empty for fully-static case).
diff --git a/mlir/lib/Dialect/MemRef/Transforms/IndependenceTransforms.cpp b/mlir/lib/Dialect/MemRef/Transforms/IndependenceTransforms.cpp
index d5e2b97e501e6..62be35c219405 100644
--- a/mlir/lib/Dialect/MemRef/Transforms/IndependenceTransforms.cpp
+++ b/mlir/lib/Dialect/MemRef/Transforms/IndependenceTransforms.cpp
@@ -85,7 +85,7 @@ propagateSubViewOp(RewriterBase &rewriter,
 ///
 /// Example:
 /// %from = memref.alloca(%sz) : memref<?xf32>
-/// %to = memref.subview ... : ... to memref<?xf32, strided<[1], offset: ?>>
+/// %to = memref.subview ... : ... to memref<?xf32, strided<[1]>>
 /// memref.store %cst, %from[%c0] : memref<?xf32>
 ///
 /// In the above example, all uses of %from are replaced with %to. This can be
diff --git a/mlir/lib/Dialect/MemRef/Transforms/RuntimeOpVerification.cpp b/mlir/lib/Dialect/MemRef/Transforms/RuntimeOpVerification.cpp
index e5cc41e2c43ba..3ebb8f0a35bc4 100644
--- a/mlir/lib/Dialect/MemRef/Transforms/RuntimeOpVerification.cpp
+++ b/mlir/lib/Dialect/MemRef/Transforms/RuntimeOpVerification.cpp
@@ -88,11 +88,10 @@ struct CastOpInterface
     // strides from unranked memrefs, so cast the source to a type with fully
     // dynamic layout, from which we can then extract the offset and strides.
     // (Rank was already verified.)
-    int64_t dynamicOffset = ShapedType::kDynamic;
     SmallVector<int64_t> dynamicShape(resultType.getRank(),
                                       ShapedType::kDynamic);
-    auto stridedLayout = StridedLayoutAttr::get(builder.getContext(),
-                                                dynamicOffset, dynamicShape);
+    auto stridedLayout =
+        StridedLayoutAttr::get(builder.getContext(), dynamicShape);
     auto dynStridesType =
         MemRefType::get(dynamicShape, resultType.getElementType(),
                         stridedLayout, resultType.getMemorySpace());
diff --git a/mlir/lib/Dialect/SCF/Transforms/ParallelLoopFusion.cpp b/mlir/lib/Dialect/SCF/Transforms/ParallelLoopFusion.cpp
index 0b132e9109492..b53065281a977 100644
--- a/mlir/lib/Dialect/SCF/Transforms/ParallelLoopFusion.cpp
+++ b/mlir/lib/Dialect/SCF/Transforms/ParallelLoopFusion.cpp
@@ -315,7 +315,7 @@ static Value getBaseMemref(Operation *op) {
 /// vector write stores a full lane pack and a subsequent scalar load reads an
 /// element from that lane pack. EXAMPLE:
 ///  vector.transfer_write %V, %arg[%x, %y, ..., 0] {in_bounds = [true]} :
-///             vector<4xf32>, memref<4xf32, strided<[1], offset: ?>>
+///             vector<4xf32>, memref<4xf32, strided<[1]>>
 ///  scf.for %iter = %c0 to %c4 step %c1 iter_args(...) -> (f32) {
 ///    %0 = memref.load %arg[%x, %y, ..., %iter] : memref<1x128x16x4xf32>
 ///    ...
diff --git a/mlir/lib/Dialect/SparseTensor/IR/SparseTensorDialect.cpp b/mlir/lib/Dialect/SparseTensor/IR/SparseTensorDialect.cpp
index b77a536861d2a..978b9ffb893d8 100644
--- a/mlir/lib/Dialect/SparseTensor/IR/SparseTensorDialect.cpp
+++ b/mlir/lib/Dialect/SparseTensor/IR/SparseTensorDialect.cpp
@@ -1626,10 +1626,10 @@ static LogicalResult inferSparseBufferType(ValueRange ops, DictionaryAttr attr,
   SmallVector<int64_t> bufShape = stt.getBatchLvlShape();
   bufShape.push_back(ShapedType::kDynamic);
 
-  auto layout = withStride ? StridedLayoutAttr::StridedLayoutAttr::get(
-                                 stt.getContext(), ShapedType::kDynamic,
-                                 {ShapedType::kDynamic})
-                           : StridedLayoutAttr();
+  auto layout = withStride
+                    ? StridedLayoutAttr::get(stt.getContext(),
+                                             {ShapedType::kDynamic})
+                    : StridedLayoutAttr();
   ret.emplace_back(MemRefType::get(bufShape, elemTp, layout));
   return success();
 }
diff --git a/mlir/lib/Dialect/Tensor/Transforms/BufferizableOpInterfaceImpl.cpp b/mlir/lib/Dialect/Tensor/Transforms/BufferizableOpInterfaceImpl.cpp
index 310e72587eb81..b80bfdad2e848 100644
--- a/mlir/lib/Dialect/Tensor/Transforms/BufferizableOpInterfaceImpl.cpp
+++ b/mlir/lib/Dialect/Tensor/Transforms/BufferizableOpInterfaceImpl.cpp
@@ -189,15 +189,15 @@ struct CollapseShapeOpInterface
         resultType = MemRefType::get({}, tensorResultType.getElementType(),
                                      layout, bufferType.getMemorySpace());
       } else {
-        // Source memref has a layout map: result type has the same offset as
-        // the source type.
+        // Source memref has a layout map: result keeps a strided layout but
+        // carries no static offset (offsets live on ops, not the type).
         SmallVector<int64_t> strides;
         int64_t offset;
         if (failed(bufferType.getStridesAndOffset(strides, offset)))
           return failure();
         resultType = MemRefType::get(
             {}, tensorResultType.getElementType(),
-            StridedLayoutAttr::get(op->getContext(), offset, {}),
+            StridedLayoutAttr::get(op->getContext(), {}),
             bufferType.getMemorySpace());
       }
 
diff --git a/mlir/lib/Dialect/Vector/Transforms/VectorTransferSplitRewritePatterns.cpp b/mlir/lib/Dialect/Vector/Transforms/VectorTransferSplitRewritePatterns.cpp
index bd14e43747f81..0b28fcf848fc8 100644
--- a/mlir/lib/Dialect/Vector/Transforms/VectorTransferSplitRewritePatterns.cpp
+++ b/mlir/lib/Dialect/Vector/Transforms/VectorTransferSplitRewritePatterns.cpp
@@ -146,7 +146,6 @@ static MemRefType getCastCompatibleMemRefType(MemRefType aT, MemRefType bT) {
     return MemRefType();
 
   ArrayRef<int64_t> aShape = aT.getShape(), bShape = bT.getShape();
-  int64_t resOffset;
   SmallVector<int64_t, 4> resShape(aT.getRank(), 0),
       resStrides(bT.getRank(), 0);
   for (int64_t idx = 0, e = aT.getRank(); idx < e; ++idx) {
@@ -155,10 +154,9 @@ static MemRefType getCastCompatibleMemRefType(MemRefType aT, MemRefType bT) {
     resStrides[idx] =
         (aStrides[idx] == bStrides[idx]) ? aStrides[idx] : ShapedType::kDynamic;
   }
-  resOffset = (aOffset == bOffset) ? aOffset : ShapedType::kDynamic;
   return MemRefType::get(
       resShape, aT.getElementType(),
-      StridedLayoutAttr::get(aT.getContext(), resOffset, resStrides));
+      StridedLayoutAttr::get(aT.getContext(), resStrides));
 }
 
 /// Casts the given memref to a compatible memref type. If the source memref has
diff --git a/mlir/lib/IR/BuiltinAttributes.cpp b/mlir/lib/IR/BuiltinAttributes.cpp
index c06ae5b178624..10cc732cfc5d6 100644
--- a/mlir/lib/IR/BuiltinAttributes.cpp
+++ b/mlir/lib/IR/BuiltinAttributes.cpp
@@ -220,31 +220,37 @@ void StridedLayoutAttr::print(llvm::raw_ostream &os) const {
 
   os << "strided<[";
   llvm::interleaveComma(getStrides(), os, printIntOrQuestion);
-  os << "]";
-
-  if (getOffset() != 0) {
-    os << ", offset: ";
-    printIntOrQuestion(getOffset());
-  }
-  os << ">";
+  os << "]>";
 }
 
-/// Returns true if this layout is static, i.e. the strides and offset all have
-/// a known value > 0.
+/// Returns true if this layout is static, i.e. all strides have a known
+/// value > 0.
 bool StridedLayoutAttr::hasStaticLayout() const {
-  return ShapedType::isStatic(getOffset()) &&
-         ShapedType::isStaticShape(getStrides());
+  return ShapedType::isStaticShape(getStrides());
 }
 
-/// Returns the strided layout as an affine map.
+/// Returns the strided layout as an affine map. The type does not carry an
+/// offset, so the affine map omits the offset term entirely; the runtime
+/// offset, if any, lives on the producing op.
 AffineMap StridedLayoutAttr::getAffineMap() const {
-  return makeStridedLinearLayoutMap(getStrides(), getOffset(), getContext());
+  ArrayRef<int64_t> strides = getStrides();
+  MLIRContext *context = getContext();
+  AffineExpr expr = getAffineConstantExpr(0, context);
+  unsigned nSymbols = 0;
+  for (const auto &en : llvm::enumerate(strides)) {
+    AffineExpr d = getAffineDimExpr(en.index(), context);
+    AffineExpr stride = ShapedType::isStatic(en.value())
+                            ? getAffineConstantExpr(en.value(), context)
+                            : getAffineSymbolExpr(nSymbols++, context);
+    expr = expr + d * stride;
+  }
+  return AffineMap::get(/*dimCount=*/strides.size(), nSymbols, expr);
 }
 
 /// Checks that the type-agnostic strided layout invariants are satisfied.
 LogicalResult
 StridedLayoutAttr::verify(function_ref<InFlightDiagnostic()> emitError,
-                          int64_t offset, ArrayRef<int64_t> strides) {
+                          ArrayRef<int64_t> strides) {
   return success();
 }
 
@@ -263,7 +269,11 @@ StridedLayoutAttr::getStridesAndOffset(ArrayRef<int64_t>,
                                        SmallVectorImpl<int64_t> &strides,
                                        int64_t &offset) const {
   llvm::append_range(strides, getStrides());
-  offset = getOffset();
+  // The type no longer pins a static offset. Report zero for back-compat with
+  // identity-layout memrefs (which also report zero), so subview/cast offset
+  // checks remain consistent across both layout forms. The runtime offset, if
+  // any, lives on the producing op.
+  offset = 0;
   return success();
 }
 
diff --git a/mlir/python/mlir/dialects/memref.py b/mlir/python/mlir/dialects/memref.py
index 34f00a3292b79..9cf191fde2d96 100644
--- a/mlir/python/mlir/dialects/memref.py
+++ b/mlir/python/mlir/dialects/memref.py
@@ -36,7 +36,7 @@ def _is_static_int_like(i):
 def _infer_memref_subview_result_type(
     source_memref_type, offsets, static_sizes, static_strides
 ):
-    source_strides, source_offset = source_memref_type.get_strides_and_offset()
+    source_strides, _ = source_memref_type.get_strides_and_offset()
     # "canonicalize" from tuple|list -> list
     offsets, static_sizes, static_strides, source_strides = map(
         list, (offsets, static_sizes, static_strides, source_strides)
@@ -59,23 +59,16 @@ def _infer_memref_subview_result_type(
             if _is_constant_int_like(i):
                 s[idx] = i.owner.opview.literal_value
 
-    if any(not _is_static_int_like(i) for i in offsets + [source_offset]):
-        target_offset = ShapedType.get_dynamic_size()
-    else:
-        target_offset = source_offset
-        for offset, target_stride in zip(offsets, source_strides):
-            target_offset += offset * target_stride
-
     target_strides = []
     for source_stride, static_stride in zip(source_strides, static_strides):
         target_strides.append(source_stride * static_stride)
 
     # If default striding then no need to complicate things for downstream ops (e.g., expand_shape).
     default_strides = list(accumulate(static_sizes[1:][::-1], operator.mul))[::-1] + [1]
-    if target_strides == default_strides and target_offset == 0:
+    if target_strides == default_strides:
         layout = None
     else:
-        layout = StridedLayoutAttr.get(target_offset, target_strides)
+        layout = StridedLayoutAttr.get(target_strides)
     return (
         offsets,
         static_sizes,
diff --git a/mlir/test/Analysis/DataFlow/test-strided-metadata-range-analysis.mlir b/mlir/test/Analysis/DataFlow/test-strided-metadata-range-analysis.mlir
index 808c1c2bfd2a8..dcce78e9173e6 100644
--- a/mlir/test/Analysis/DataFlow/test-strided-metadata-range-analysis.mlir
+++ b/mlir/test/Analysis/DataFlow/test-strided-metadata-range-analysis.mlir
@@ -1,6 +1,6 @@
 // RUN: mlir-opt -test-strided-metadata-range-analysis %s 2>&1 | FileCheck %s
 
-func.func @memref_subview(%arg0: memref<8x16x4xf32, strided<[64, 4, 1]>>, %arg1: memref<1x128x1x32x1xf32, strided<[4096, 32, 32, 1, 1]>>, %arg2: memref<8x16x4xf32, strided<[1, 64, 8], offset: 16>>, %arg3: index, %arg4: index, %arg5: index) {
+func.func @memref_subview(%arg0: memref<8x16x4xf32, strided<[64, 4, 1]>>, %arg1: memref<1x128x1x32x1xf32, strided<[4096, 32, 32, 1, 1]>>, %arg2: memref<8x16x4xf32, strided<[1, 64, 8]>>, %arg3: index, %arg4: index, %arg5: index) {
   %c0 = arith.constant 0 : index
   %c1 = arith.constant 1 : index
   %c2 = arith.constant 2 : index
@@ -13,7 +13,7 @@ func.func @memref_subview(%arg0: memref<8x16x4xf32, strided<[64, 4, 1]>>, %arg1:
   // CHECK-SAME: offset = [{unsigned : [1, 1] signed : [1, 1]}]
   // CHECK-SAME: sizes = [{unsigned : [0, 18446744073709551615] signed : [-9223372036854775808, 9223372036854775807]}, {unsigned : [0, 18446744073709551615] signed : [-9223372036854775808, 9223372036854775807]}, {unsigned : [0, 18446744073709551615] signed : [-9223372036854775808, 9223372036854775807]}]
   // CHECK-SAME: strides = [{unsigned : [64, 64] signed : [64, 64]}, {unsigned : [4, 4] signed : [4, 4]}, {unsigned : [1, 1] signed : [1, 1]}]
-  %subview = memref.subview %arg0[%c0, %c0, %c1] [%arg3, %arg4, %arg5] [%c1, %c1, %c1] : memref<8x16x4xf32, strided<[64, 4, 1]>> to memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
+  %subview = memref.subview %arg0[%c0, %c0, %c1] [%arg3, %arg4, %arg5] [%c1, %c1, %c1] : memref<8x16x4xf32, strided<[64, 4, 1]>> to memref<?x?x?xf32, strided<[?, ?, ?]>>
 
   // Test a subview of a subview, with bounded dynamic offsets.
   // CHECK: Op:  %[[SV1:.*]] = memref.subview
@@ -21,7 +21,7 @@ func.func @memref_subview(%arg0: memref<8x16x4xf32, strided<[64, 4, 1]>>, %arg1:
   // CHECK-SAME: offset = [{unsigned : [346, 484] signed : [346, 484]}]
   // CHECK-SAME: sizes = [{unsigned : [2, 2] signed : [2, 2]}, {unsigned : [2, 2] signed : [2, 2]}, {unsigned : [2, 2] signed : [2, 2]}]
   // CHECK-SAME: strides = [{unsigned : [704, 832] signed : [704, 832]}, {unsigned : [44, 52] signed : [44, 52]}, {unsigned : [11, 13] signed : [11, 13]}]
-  %subview_0 = memref.subview %subview[%1, %1, %1] [%c2, %c2, %c2] [%0, %0, %0] : memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>> to memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
+  %subview_0 = memref.subview %subview[%1, %1, %1] [%c2, %c2, %c2] [%0, %0, %0] : memref<?x?x?xf32, strided<[?, ?, ?]>> to memref<?x?x?xf32, strided<[?, ?, ?]>>
 
   // Test a subview of a subview, with constant operands.
   // CHECK: Op:  %[[SV2:.*]] = memref.subview
@@ -29,7 +29,7 @@ func.func @memref_subview(%arg0: memref<8x16x4xf32, strided<[64, 4, 1]>>, %arg1:
   // CHECK-SAME: offset = [{unsigned : [368, 510] signed : [368, 510]}]
   // CHECK-SAME: sizes = [{unsigned : [2, 2] signed : [2, 2]}, {unsigned : [2, 2] signed : [2, 2]}, {unsigned : [2, 2] signed : [2, 2]}]
   // CHECK-SAME: strides = [{unsigned : [704, 832] signed : [704, 832]}, {unsigned : [44, 52] signed : [44, 52]}, {unsigned : [11, 13] signed : [11, 13]}]
-  %subview_1 = memref.subview %subview_0[%c0, %c0, %c2] [%c2, %c2, %c2] [%c1, %c1, %c1] : memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>> to memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
+  %subview_1 = memref.subview %subview_0[%c0, %c0, %c2] [%c2, %c2, %c2] [%c1, %c1, %c1] : memref<?x?x?xf32, strided<[?, ?, ?]>> to memref<?x?x?xf32, strided<[?, ?, ?]>>
 
   // Test a rank-reducing subview.
   // CHECK: Op:  %[[SV3:.*]] = memref.subview
@@ -37,7 +37,7 @@ func.func @memref_subview(%arg0: memref<8x16x4xf32, strided<[64, 4, 1]>>, %arg1:
   // CHECK-SAME: offset = [{unsigned : [0, 18446744073709551615] signed : [-9223372036854775808, 9223372036854775807]}]
   // CHECK-SAME: sizes = [{unsigned : [64, 64] signed : [64, 64]}, {unsigned : [16, 16] signed : [16, 16]}]
   // CHECK-SAME: strides = [{unsigned : [0, 18446744073709551615] signed : [-9223372036854775808, 9223372036854775807]}, {unsigned : [0, 18446744073709551615] signed : [-9223372036854775808, 9223372036854775807]}]
-  %subview_2 = memref.subview %arg1[%arg4, %arg4, %arg4, %arg4, %arg4] [1, 64, 1, 16, 1] [%arg5, %arg5, %arg5, %arg5, %arg5] : memref<1x128x1x32x1xf32, strided<[4096, 32, 32, 1, 1]>> to memref<64x16xf32, strided<[?, ?], offset: ?>>
+  %subview_2 = memref.subview %arg1[%arg4, %arg4, %arg4, %arg4, %arg4] [1, 64, 1, 16, 1] [%arg5, %arg5, %arg5, %arg5, %arg5] : memref<1x128x1x32x1xf32, strided<[4096, 32, 32, 1, 1]>> to memref<64x16xf32, strided<[?, ?]>>
 
   // Test a subview of a rank-reducing subview
   // CHECK: Op:  %[[SV4:.*]] = memref.subview
@@ -45,7 +45,7 @@ func.func @memref_subview(%arg0: memref<8x16x4xf32, strided<[64, 4, 1]>>, %arg1:
   // CHECK-SAME: offset = [{unsigned : [0, 18446744073709551615] signed : [-9223372036854775808, 9223372036854775807]}]
   // CHECK-SAME: sizes = [{unsigned : [5, 7] signed : [5, 7]}]
   // CHECK-SAME: strides = [{unsigned : [0, 18446744073709551615] signed : [-9223372036854775808, 9223372036854775807]}]
-  %subview_3 = memref.subview %subview_2[%c0, %0] [1, %1] [%c1, %c2] : memref<64x16xf32, strided<[?, ?], offset: ?>> to memref<?xf32, strided<[?], offset: ?>>
+  %subview_3 = memref.subview %subview_2[%c0, %0] [1, %1] [%c1, %c2] : memref<64x16xf32, strided<[?, ?]>> to memref<?xf32, strided<[?]>>
 
   // Test a subview with mixed bounded and unbound dynamic sizes.
   // CHECK: Op:  %[[SV5:.*]] = memref.subview
@@ -53,7 +53,7 @@ func.func @memref_subview(%arg0: memref<8x16x4xf32, strided<[64, 4, 1]>>, %arg1:
   // CHECK-SAME: offset = [{unsigned : [32, 32] signed : [32, 32]}]
   // CHECK-SAME: sizes = [{unsigned : [11, 13] signed : [11, 13]}, {unsigned : [5, 7] signed : [5, 7]}, {unsigned : [0, 18446744073709551615] signed : [-9223372036854775808, 9223372036854775807]}]
   // CHECK-SAME: strides = [{unsigned : [1, 1] signed : [1, 1]}, {unsigned : [64, 64] signed : [64, 64]}, {unsigned : [8, 8] signed : [8, 8]}]
-  %subview_4 = memref.subview %arg2[%c0, %c0, %c2] [%0, %1, %arg5] [%c1, %c1, %c1] : memref<8x16x4xf32, strided<[1, 64, 8], offset: 16>> to memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
+  %subview_4 = memref.subview %arg2[%c0, %c0, %c2] [%0, %1, %arg5] [%c1, %c1, %c1] : memref<8x16x4xf32, strided<[1, 64, 8]>> to memref<?x?x?xf32, strided<[?, ?, ?]>>
   return
 }
 
diff --git a/mlir/test/CAPI/ir.c b/mlir/test/CAPI/ir.c
index e66c931383f89..4b73d4b914f3f 100644
--- a/mlir/test/CAPI/ir.c
+++ b/mlir/test/CAPI/ir.c
@@ -1270,13 +1270,12 @@ int printBuiltinAttributes(MlirContext ctx) {
 
   int64_t layoutStrides[3] = {5, 7, 13};
   MlirAttribute stridedLayoutAttr =
-      mlirStridedLayoutAttrGet(ctx, 42, 3, &layoutStrides[0]);
+      mlirStridedLayoutAttrGet(ctx, 3, &layoutStrides[0]);
 
-  // CHECK: strided<[5, 7, 13], offset: 42>
+  // CHECK: strided<[5, 7, 13]>
   mlirAttributeDump(stridedLayoutAttr);
 
-  if (mlirStridedLayoutAttrGetOffset(stridedLayoutAttr) != 42 ||
-      mlirStridedLayoutAttrGetNumStrides(stridedLayoutAttr) != 3 ||
+  if (mlirStridedLayoutAttrGetNumStrides(stridedLayoutAttr) != 3 ||
       mlirStridedLayoutAttrGetStride(stridedLayoutAttr, 0) != 5 ||
       mlirStridedLayoutAttrGetStride(stridedLayoutAttr, 1) != 7 ||
       mlirStridedLayoutAttrGetStride(stridedLayoutAttr, 2) != 13)
diff --git a/mlir/test/Conversion/AMDGPUToROCDL/amdgpu-to-rocdl.mlir b/mlir/test/Conversion/AMDGPUToROCDL/amdgpu-to-rocdl.mlir
index e43ecfd01cb50..d04932bdcc2cc 100644
--- a/mlir/test/Conversion/AMDGPUToROCDL/amdgpu-to-rocdl.mlir
+++ b/mlir/test/Conversion/AMDGPUToROCDL/amdgpu-to-rocdl.mlir
@@ -53,7 +53,7 @@ func.func @fat_raw_buffer_cast_0d(%buf: memref<i32, #gpu.address_space<global>>)
 }
 
 // CHECK-LABEL: func @fat_raw_buffer_cast_dyn_size_offset
-func.func @fat_raw_buffer_cast_dyn_size_offset(%buf: memref<?xi32, strided<[1], offset: ?>, #gpu.address_space<global>>) -> memref<?xi32, strided<[1], offset: ?>, #amdgpu.address_space<fat_raw_buffer>> {
+func.func @fat_raw_buffer_cast_dyn_size_offset(%buf: memref<?xi32, strided<[1]>, #gpu.address_space<global>>) -> memref<?xi32, strided<[1]>, #amdgpu.address_space<fat_raw_buffer>> {
   // CHECK: %[[size0:.*]] = llvm.extractvalue %{{.*}}[3, 0]
   // CHECK: %[[stride0:.*]] = llvm.extractvalue %{{.*}}[4, 0]
   // CHECK: %[[maxVals:.*]] = llvm.mul %[[size0]], %[[stride0]]
@@ -62,13 +62,13 @@ func.func @fat_raw_buffer_cast_dyn_size_offset(%buf: memref<?xi32, strided<[1],
   // CHECK: %[[offset:.*]] = llvm.extractvalue %{{.*}}[2]
   // CHECK: rocdl.make.buffer.rsrc %{{.*}}, %{{.*}}, %[[numRecords]], %{{.*}}
   // CHECK: llvm.insertvalue %[[offset]], %{{.*}}[2]
-  %ret = amdgpu.fat_raw_buffer_cast %buf : memref<?xi32, strided<[1], offset: ?>, #gpu.address_space<global>> to memref<?xi32, strided<[1], offset: ?>, #amdgpu.address_space<fat_raw_buffer>>
-  return %ret : memref<?xi32, strided<[1], offset: ?>, #amdgpu.address_space<fat_raw_buffer>>
+  %ret = amdgpu.fat_raw_buffer_cast %buf : memref<?xi32, strided<[1]>, #gpu.address_space<global>> to memref<?xi32, strided<[1]>, #amdgpu.address_space<fat_raw_buffer>>
+  return %ret : memref<?xi32, strided<[1]>, #amdgpu.address_space<fat_raw_buffer>>
 }
 
 // CHECK-LABEL: func @fat_raw_buffer_cast_reset_offset
-func.func @fat_raw_buffer_cast_reset_offset(%buf: memref<?xi32, strided<[1], offset: ?>, #gpu.address_space<global>>) -> memref<?xi32, #amdgpu.address_space<fat_raw_buffer>> {
-  // CHECK: %[[desc:.*]] = builtin.unrealized_conversion_cast %{{.*}} : memref<?xi32, strided<[1], offset: ?>, #gpu.address_space<global>> to !llvm.struct<(ptr<1>, ptr<1>, i64, array<1 x i64>, array<1 x i64>)>
+func.func @fat_raw_buffer_cast_reset_offset(%buf: memref<?xi32, strided<[1]>, #gpu.address_space<global>>) -> memref<?xi32, #amdgpu.address_space<fat_raw_buffer>> {
+  // CHECK: %[[desc:.*]] = builtin.unrealized_conversion_cast %{{.*}} : memref<?xi32, strided<[1]>, #gpu.address_space<global>> to !llvm.struct<(ptr<1>, ptr<1>, i64, array<1 x i64>, array<1 x i64>)>
   // CHECK-DAG: %[[memRefPtr:.*]] = llvm.extractvalue %[[desc]][1]
   // CHECK-DAG: %[[memRefOff:.*]] = llvm.extractvalue %[[desc]][2]
   // CHECK-DAG: %[[basePtr:.*]] = llvm.getelementptr %[[memRefPtr]][%[[memRefOff]]]
@@ -76,7 +76,7 @@ func.func @fat_raw_buffer_cast_reset_offset(%buf: memref<?xi32, strided<[1], off
   // CHECK: %[[fatBuf:.*]] = rocdl.make.buffer.rsrc %[[basePtr]], %{{.*}}, %{{.*}}, %{{.*}}
   // CHECK: llvm.insertvalue %[[fatBuf]], %{{.*}}[1]
   // CHECK: llvm.insertvalue %[[zeroOff]], %{{.*}}[2]
-  %ret = amdgpu.fat_raw_buffer_cast %buf resetOffset : memref<?xi32, strided<[1], offset: ?>, #gpu.address_space<global>> to memref<?xi32, #amdgpu.address_space<fat_raw_buffer>>
+  %ret = amdgpu.fat_raw_buffer_cast %buf resetOffset : memref<?xi32, strided<[1]>, #gpu.address_space<global>> to memref<?xi32, #amdgpu.address_space<fat_raw_buffer>>
   return %ret : memref<?xi32, #amdgpu.address_space<fat_raw_buffer>>
 }
 
@@ -151,8 +151,8 @@ func.func @gpu_gcn_raw_buffer_load_i32(%buf: memref<64xi32>, %idx: i32) -> i32 {
 }
 
 // CHECK-LABEL: func @gpu_gcn_raw_buffer_load_i32_strided
-func.func @gpu_gcn_raw_buffer_load_i32_strided(%buf: memref<16x16xi32, strided<[?, ?], offset: ?>>, %i: i32, %j: i32) -> i32 {
-    // CHECK: %[[descriptor:.*]] = builtin.unrealized_conversion_cast %{{.*}} : memref<16x16xi32, strided<[?, ?], offset: ?>> to !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+func.func @gpu_gcn_raw_buffer_load_i32_strided(%buf: memref<16x16xi32, strided<[?, ?]>>, %i: i32, %j: i32) -> i32 {
+    // CHECK: %[[descriptor:.*]] = builtin.unrealized_conversion_cast %{{.*}} : memref<16x16xi32, strided<[?, ?]>> to !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
     // CHECK: %[[elem_size:.*]] = llvm.mlir.constant(4 : i32) : i32
     // CHECK: %[[algn_ptr:.*]] = llvm.extractvalue %[[descriptor]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
     // CHECK: %[[offset:.*]] = llvm.extractvalue %[[descriptor]][2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
@@ -181,7 +181,7 @@ func.func @gpu_gcn_raw_buffer_load_i32_strided(%buf: memref<16x16xi32, strided<[
     // CHECK: %[[zero_1:.*]] = llvm.mlir.constant(0 : i32) : i32
     // CHECK: %[[v:.*]] = rocdl.raw.ptr.buffer.load %[[rsrc]], %[[vgpr_off]], %[[sgpr_off]], %[[zero_1]] : i32
     // CHECK: return %[[v]] : i32
-  %0 = amdgpu.raw_buffer_load {boundsCheck = true} %buf[%i, %j] :  memref<16x16xi32, strided<[?, ?], offset: ?>>, i32, i32 -> i32
+  %0 = amdgpu.raw_buffer_load {boundsCheck = true} %buf[%i, %j] :  memref<16x16xi32, strided<[?, ?]>>, i32, i32 -> i32
   func.return %0 : i32
 }
 
diff --git a/mlir/test/Conversion/BufferizationToMemRef/bufferization-to-memref.mlir b/mlir/test/Conversion/BufferizationToMemRef/bufferization-to-memref.mlir
index 21d5f42158d09..a423f21fe6227 100644
--- a/mlir/test/Conversion/BufferizationToMemRef/bufferization-to-memref.mlir
+++ b/mlir/test/Conversion/BufferizationToMemRef/bufferization-to-memref.mlir
@@ -53,18 +53,18 @@ func.func @conversion_unknown(%arg0 : memref<*xf32>) -> memref<*xf32> {
 // -----
 
 // CHECK-LABEL: func @conversion_with_layout_map(
-//  CHECK-SAME:     %[[ARG:.*]]: memref<?xf32, strided<[?], offset: ?>>
+//  CHECK-SAME:     %[[ARG:.*]]: memref<?xf32, strided<[?]>>
 //       CHECK:   %[[C0:.*]] = arith.constant 0 : index
 //       CHECK:   %[[DIM:.*]] = memref.dim %[[ARG]], %[[C0]]
 //       CHECK:   %[[ALLOC:.*]] = memref.alloc(%[[DIM]]) : memref<?xf32>
-//       CHECK:   %[[CASTED:.*]] = memref.cast %[[ALLOC]] : memref<?xf32> to memref<?xf32, strided<[?], offset: ?>>
+//       CHECK:   %[[CASTED:.*]] = memref.cast %[[ALLOC]] : memref<?xf32> to memref<?xf32, strided<[?]>>
 //       CHECK:   memref.copy
 //       CHECK:   memref.dealloc
 //       CHECK:   return %[[CASTED]]
-func.func @conversion_with_layout_map(%arg0 : memref<?xf32, strided<[?], offset: ?>>) -> memref<?xf32, strided<[?], offset: ?>> {
-  %1 = bufferization.clone %arg0 : memref<?xf32, strided<[?], offset: ?>> to memref<?xf32, strided<[?], offset: ?>>
-  memref.dealloc %arg0 : memref<?xf32, strided<[?], offset: ?>>
-  return %1 : memref<?xf32, strided<[?], offset: ?>>
+func.func @conversion_with_layout_map(%arg0 : memref<?xf32, strided<[?]>>) -> memref<?xf32, strided<[?]>> {
+  %1 = bufferization.clone %arg0 : memref<?xf32, strided<[?]>> to memref<?xf32, strided<[?]>>
+  memref.dealloc %arg0 : memref<?xf32, strided<[?]>>
+  return %1 : memref<?xf32, strided<[?]>>
 }
 
 // -----
@@ -72,12 +72,12 @@ func.func @conversion_with_layout_map(%arg0 : memref<?xf32, strided<[?], offset:
 // This bufferization.clone cannot be lowered because a buffer with this layout
 // map cannot be allocated (or casted to).
 
-func.func @conversion_with_invalid_layout_map(%arg0 : memref<?xf32, strided<[10], offset: ?>>)
-    -> memref<?xf32, strided<[10], offset: ?>> {
+func.func @conversion_with_invalid_layout_map(%arg0 : memref<?xf32, strided<[10]>>)
+    -> memref<?xf32, strided<[10]>> {
 // expected-error at +1 {{failed to legalize operation 'bufferization.clone' that was explicitly marked illegal}}
-  %1 = bufferization.clone %arg0 : memref<?xf32, strided<[10], offset: ?>> to memref<?xf32, strided<[10], offset: ?>>
-  memref.dealloc %arg0 : memref<?xf32, strided<[10], offset: ?>>
-  return %1 : memref<?xf32, strided<[10], offset: ?>>
+  %1 = bufferization.clone %arg0 : memref<?xf32, strided<[10]>> to memref<?xf32, strided<[10]>>
+  memref.dealloc %arg0 : memref<?xf32, strided<[10]>>
+  return %1 : memref<?xf32, strided<[10]>>
 }
 
 // -----
diff --git a/mlir/test/Conversion/FuncToLLVM/func-memref-return.mlir b/mlir/test/Conversion/FuncToLLVM/func-memref-return.mlir
index 22ebbf8618bde..a9036959b4a7b 100644
--- a/mlir/test/Conversion/FuncToLLVM/func-memref-return.mlir
+++ b/mlir/test/Conversion/FuncToLLVM/func-memref-return.mlir
@@ -41,7 +41,7 @@ func.func @check_static_return(%static : memref<32x18xf32>) -> memref<32x18xf32>
 // CHECK-SAME: -> !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // BAREPTR-LABEL: func @check_static_return_with_offset
 // BAREPTR-SAME: (%[[arg:.*]]: !llvm.ptr) -> !llvm.ptr {
-func.func @check_static_return_with_offset(%static : memref<32x18xf32, strided<[22,1], offset: 7>>) -> memref<32x18xf32, strided<[22,1], offset: 7>> {
+func.func @check_static_return_with_offset(%static : memref<32x18xf32, strided<[22,1]>>) -> memref<32x18xf32, strided<[22,1]>> {
 // CHECK:  llvm.return %{{.*}} : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 
 // BAREPTR: %[[udf:.*]] = llvm.mlir.poison : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
@@ -59,7 +59,7 @@ func.func @check_static_return_with_offset(%static : memref<32x18xf32, strided<[
 // BAREPTR-NEXT: %[[ins4:.*]] = llvm.insertvalue %[[val4]], %[[ins3]][4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // BAREPTR-NEXT: %[[base1:.*]] = llvm.extractvalue %[[ins4]][0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // BAREPTR-NEXT: llvm.return %[[base1]] : !llvm.ptr
-  return %static : memref<32x18xf32, strided<[22,1], offset: 7>>
+  return %static : memref<32x18xf32, strided<[22,1]>>
 }
 
 
diff --git a/mlir/test/Conversion/FuncToSPIRV/types-to-spirv.mlir b/mlir/test/Conversion/FuncToSPIRV/types-to-spirv.mlir
index 0c77c88334572..24d549ee52e1d 100644
--- a/mlir/test/Conversion/FuncToSPIRV/types-to-spirv.mlir
+++ b/mlir/test/Conversion/FuncToSPIRV/types-to-spirv.mlir
@@ -715,22 +715,22 @@ func.func @memref_offset_strides(
 // CHECK-SAME: !spirv.array<256 x f32, stride=4> [0])>, StorageBuffer>
 // CHECK-SAME: !spirv.array<64 x f32, stride=4> [0])>, StorageBuffer>
 // CHECK-SAME: !spirv.array<88 x f32, stride=4> [0])>, StorageBuffer>
-  %arg0: memref<16x4xf32, strided<[4, 1], offset: 0>, #spirv.storage_class<StorageBuffer>>,  // tightly packed; row major
-  %arg1: memref<16x4xf32, strided<[4, 1], offset: 8>, #spirv.storage_class<StorageBuffer>>,  // offset 8
-  %arg2: memref<16x4xf32, strided<[16, 1], offset: 0>, #spirv.storage_class<StorageBuffer>>, // pad 12 after each row
-  %arg3: memref<16x4xf32, strided<[1, 16], offset: 0>, #spirv.storage_class<StorageBuffer>>, // tightly packed; col major
-  %arg4: memref<16x4xf32, strided<[1, 22], offset: 0>, #spirv.storage_class<StorageBuffer>>, // pad 4 after each col
+  %arg0: memref<16x4xf32, strided<[4, 1]>, #spirv.storage_class<StorageBuffer>>,  // tightly packed; row major
+  %arg1: memref<16x4xf32, strided<[4, 1]>, #spirv.storage_class<StorageBuffer>>,  // offset 8
+  %arg2: memref<16x4xf32, strided<[16, 1]>, #spirv.storage_class<StorageBuffer>>, // pad 12 after each row
+  %arg3: memref<16x4xf32, strided<[1, 16]>, #spirv.storage_class<StorageBuffer>>, // tightly packed; col major
+  %arg4: memref<16x4xf32, strided<[1, 22]>, #spirv.storage_class<StorageBuffer>>, // pad 4 after each col
 
 // CHECK-SAME: !spirv.array<64 x f16, stride=2> [0])>, StorageBuffer>
 // CHECK-SAME: !spirv.array<72 x f16, stride=2> [0])>, StorageBuffer>
 // CHECK-SAME: !spirv.array<256 x f16, stride=2> [0])>, StorageBuffer>
 // CHECK-SAME: !spirv.array<64 x f16, stride=2> [0])>, StorageBuffer>
 // CHECK-SAME: !spirv.array<88 x f16, stride=2> [0])>, StorageBuffer>
-  %arg5: memref<16x4xf16, strided<[4, 1], offset: 0>, #spirv.storage_class<StorageBuffer>>,
-  %arg6: memref<16x4xf16, strided<[4, 1], offset: 8>, #spirv.storage_class<StorageBuffer>>,
-  %arg7: memref<16x4xf16, strided<[16, 1], offset: 0>, #spirv.storage_class<StorageBuffer>>,
-  %arg8: memref<16x4xf16, strided<[1, 16], offset: 0>, #spirv.storage_class<StorageBuffer>>,
-  %arg9: memref<16x4xf16, strided<[1, 22], offset: 0>, #spirv.storage_class<StorageBuffer>>
+  %arg5: memref<16x4xf16, strided<[4, 1]>, #spirv.storage_class<StorageBuffer>>,
+  %arg6: memref<16x4xf16, strided<[4, 1]>, #spirv.storage_class<StorageBuffer>>,
+  %arg7: memref<16x4xf16, strided<[16, 1]>, #spirv.storage_class<StorageBuffer>>,
+  %arg8: memref<16x4xf16, strided<[1, 16]>, #spirv.storage_class<StorageBuffer>>,
+  %arg9: memref<16x4xf16, strided<[1, 22]>, #spirv.storage_class<StorageBuffer>>
 ) { return }
 
 } // end module
diff --git a/mlir/test/Conversion/MemRefToLLVM/convert-dynamic-memref-ops.mlir b/mlir/test/Conversion/MemRefToLLVM/convert-dynamic-memref-ops.mlir
index 543fdf5c26f5e..fa23c0b4fcc9b 100644
--- a/mlir/test/Conversion/MemRefToLLVM/convert-dynamic-memref-ops.mlir
+++ b/mlir/test/Conversion/MemRefToLLVM/convert-dynamic-memref-ops.mlir
@@ -478,7 +478,7 @@ func.func @memref_reinterpret_cast_unranked_to_dynamic_shape(%offset: index,
   %output = memref.reinterpret_cast %input to
            offset: [%offset], sizes: [%size_0, %size_1],
            strides: [%stride_0, %stride_1]
-           : memref<*xf32> to memref<?x?xf32, strided<[?, ?], offset: ?>>
+           : memref<*xf32> to memref<?x?xf32, strided<[?, ?]>>
   return
 }
 // CHECK-SAME: ([[OFFSETarg:%[a-z,0-9]+]]: index,
diff --git a/mlir/test/Conversion/MemRefToLLVM/expand-then-convert-to-llvm.mlir b/mlir/test/Conversion/MemRefToLLVM/expand-then-convert-to-llvm.mlir
index c2c93525b6509..bd89db7b20c54 100644
--- a/mlir/test/Conversion/MemRefToLLVM/expand-then-convert-to-llvm.mlir
+++ b/mlir/test/Conversion/MemRefToLLVM/expand-then-convert-to-llvm.mlir
@@ -51,8 +51,8 @@
 // CHECK:         %[[ARG0f:[a-zA-Z0-9]*]]: index,
 // CHECK:         %[[ARG1f:[a-zA-Z0-9]*]]: index,
 // CHECK:         %[[ARG2f:.*]]: index)
-func.func @subview(%0 : memref<64x4xf32, strided<[4, 1], offset: 0>>, %arg0 : index, %arg1 : index, %arg2 : index)
--> memref<?x?xf32, strided<[?, ?], offset: ?>> {
+func.func @subview(%0 : memref<64x4xf32, strided<[4, 1]>>, %arg0 : index, %arg1 : index, %arg2 : index)
+-> memref<?x?xf32, strided<[?, ?]>> {
   // CHECK-DAG: %[[MEMREF:.*]] = builtin.unrealized_conversion_cast %[[MEM]]
   // CHECK-DAG: %[[ARG0:.*]] = builtin.unrealized_conversion_cast %[[ARG0f]]
   // CHECK-DAG: %[[ARG1:.*]] = builtin.unrealized_conversion_cast %[[ARG1f]]
@@ -76,9 +76,9 @@ func.func @subview(%0 : memref<64x4xf32, strided<[4, 1], offset: 0>>, %arg0 : in
   // CHECK: %[[DESC6:.*]] = llvm.insertvalue %[[ARG1]], %[[DESC5]][4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 
   %1 = memref.subview %0[%arg0, %arg1][%arg0, %arg1][%arg0, %arg1] :
-    memref<64x4xf32, strided<[4, 1], offset: 0>>
-  to memref<?x?xf32, strided<[?, ?], offset: ?>>
-  return %1 : memref<?x?xf32, strided<[?, ?], offset: ?>>
+    memref<64x4xf32, strided<[4, 1]>>
+  to memref<?x?xf32, strided<[?, ?]>>
+  return %1 : memref<?x?xf32, strided<[?, ?]>>
 }
 
 // -----
@@ -88,7 +88,7 @@ func.func @subview(%0 : memref<64x4xf32, strided<[4, 1], offset: 0>>, %arg0 : in
 // CHECK:         %[[ARG0f:[a-zA-Z0-9]*]]: index,
 // CHECK:         %[[ARG1f:[a-zA-Z0-9]*]]: index,
 // CHECK:         %[[ARG2f:.*]]: index)
-func.func @subview_non_zero_addrspace(%0 : memref<64x4xf32, strided<[4, 1], offset: 0>, 3>, %arg0 : index, %arg1 : index, %arg2 : index) -> memref<?x?xf32, strided<[?, ?], offset: ?>, 3> {
+func.func @subview_non_zero_addrspace(%0 : memref<64x4xf32, strided<[4, 1]>, 3>, %arg0 : index, %arg1 : index, %arg2 : index) -> memref<?x?xf32, strided<[?, ?]>, 3> {
   // CHECK-DAG: %[[MEMREF:.*]] = builtin.unrealized_conversion_cast %[[MEM]]
   // CHECK-DAG: %[[ARG0:.*]] = builtin.unrealized_conversion_cast %[[ARG0f]]
   // CHECK-DAG: %[[ARG1:.*]] = builtin.unrealized_conversion_cast %[[ARG1f]]
@@ -112,9 +112,9 @@ func.func @subview_non_zero_addrspace(%0 : memref<64x4xf32, strided<[4, 1], offs
   // CHECK: %[[DESC6:.*]] = llvm.insertvalue %[[ARG1]], %[[DESC5]][4, 1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<2 x i64>, array<2 x i64>)>
 
   %1 = memref.subview %0[%arg0, %arg1][%arg0, %arg1][%arg0, %arg1] :
-    memref<64x4xf32, strided<[4, 1], offset: 0>, 3>
-    to memref<?x?xf32, strided<[?, ?], offset: ?>, 3>
-  return %1 : memref<?x?xf32, strided<[?, ?], offset: ?>, 3>
+    memref<64x4xf32, strided<[4, 1]>, 3>
+    to memref<?x?xf32, strided<[?, ?]>, 3>
+  return %1 : memref<?x?xf32, strided<[?, ?]>, 3>
 }
 
 // -----
@@ -124,7 +124,7 @@ func.func @subview_non_zero_addrspace(%0 : memref<64x4xf32, strided<[4, 1], offs
 // CHECK-SAME:         %[[ARG0f:[a-zA-Z0-9]*]]: index
 // CHECK-SAME:         %[[ARG1f:[a-zA-Z0-9]*]]: index
 // CHECK-SAME:         %[[ARG2f:[a-zA-Z0-9]*]]: index
-func.func @subview_const_size(%0 : memref<64x4xf32, strided<[4, 1], offset: 0>>, %arg0 : index, %arg1 : index, %arg2 : index) -> memref<4x2xf32, strided<[?, ?], offset: ?>> {
+func.func @subview_const_size(%0 : memref<64x4xf32, strided<[4, 1]>>, %arg0 : index, %arg1 : index, %arg2 : index) -> memref<4x2xf32, strided<[?, ?]>> {
   // CHECK-DAG: %[[MEMREF:.*]] = builtin.unrealized_conversion_cast %[[MEM]]
   // CHECK-DAG: %[[ARG0:.*]] = builtin.unrealized_conversion_cast %[[ARG0f]]
   // CHECK-DAG: %[[ARG1:.*]] = builtin.unrealized_conversion_cast %[[ARG1f]]
@@ -149,9 +149,9 @@ func.func @subview_const_size(%0 : memref<64x4xf32, strided<[4, 1], offset: 0>>,
   // CHECK: %[[DESC6:.*]] = llvm.insertvalue %[[ARG1]], %[[DESC5]][4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 
   %1 = memref.subview %0[%arg0, %arg1][4, 2][%arg0, %arg1] :
-    memref<64x4xf32, strided<[4, 1], offset: 0>>
-    to memref<4x2xf32, strided<[?, ?], offset: ?>>
-  return %1 : memref<4x2xf32, strided<[?, ?], offset: ?>>
+    memref<64x4xf32, strided<[4, 1]>>
+    to memref<4x2xf32, strided<[?, ?]>>
+  return %1 : memref<4x2xf32, strided<[?, ?]>>
 }
 
 // -----
@@ -161,7 +161,7 @@ func.func @subview_const_size(%0 : memref<64x4xf32, strided<[4, 1], offset: 0>>,
 // CHECK-SAME:         %[[ARG0f:[a-zA-Z0-9]*]]: index
 // CHECK-SAME:         %[[ARG1f:[a-zA-Z0-9]*]]: index
 // CHECK-SAME:         %[[ARG2f:[a-zA-Z0-9]*]]: index
-func.func @subview_const_stride(%0 : memref<64x4xf32, strided<[4, 1], offset: 0>>, %arg0 : index, %arg1 : index, %arg2 : index) -> memref<?x?xf32, strided<[4, 2], offset: ?>> {
+func.func @subview_const_stride(%0 : memref<64x4xf32, strided<[4, 1]>>, %arg0 : index, %arg1 : index, %arg2 : index) -> memref<?x?xf32, strided<[4, 2]>> {
   // CHECK-DAG: %[[MEMREF:.*]] = builtin.unrealized_conversion_cast %[[MEM]]
   // CHECK-DAG: %[[ARG0:.*]] = builtin.unrealized_conversion_cast %[[ARG0f]]
   // CHECK-DAG: %[[ARG1:.*]] = builtin.unrealized_conversion_cast %[[ARG1f]]
@@ -184,16 +184,16 @@ func.func @subview_const_stride(%0 : memref<64x4xf32, strided<[4, 1], offset: 0>
   // CHECK: %[[DESC6:.*]] = llvm.insertvalue %[[CST_STRIDE1]], %[[DESC5]][4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 
   %1 = memref.subview %0[%arg0, %arg1][%arg0, %arg1][1, 2] :
-    memref<64x4xf32, strided<[4, 1], offset: 0>>
-    to memref<?x?xf32, strided<[4, 2], offset: ?>>
-  return %1 : memref<?x?xf32, strided<[4, 2], offset: ?>>
+    memref<64x4xf32, strided<[4, 1]>>
+    to memref<?x?xf32, strided<[4, 2]>>
+  return %1 : memref<?x?xf32, strided<[4, 2]>>
 }
 
 // -----
 
 // CHECK-LABEL: func @subview_const_stride_and_offset(
 // CHECK-SAME:         %[[MEM:.*]]: memref<{{.*}}>
-func.func @subview_const_stride_and_offset(%0 : memref<64x8xf32, strided<[8, 1], offset: 0>>) -> memref<62x3xf32, strided<[8, 1], offset: 2>> {
+func.func @subview_const_stride_and_offset(%0 : memref<64x8xf32, strided<[8, 1]>>) -> memref<62x3xf32, strided<[8, 1]>> {
   // The last "insertvalue" that populates the memref descriptor from the function arguments.
   // CHECK: %[[MEMREF:.*]] = builtin.unrealized_conversion_cast %[[MEM]]
 
@@ -214,9 +214,9 @@ func.func @subview_const_stride_and_offset(%0 : memref<64x8xf32, strided<[8, 1],
   // CHECK: %[[DESC6:.*]] = llvm.insertvalue %[[CST_STRIDE1]], %[[DESC5]][4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 
   %1 = memref.subview %0[0, 2][62, 3][1, 1] :
-    memref<64x8xf32, strided<[8, 1], offset: 0>>
-    to memref<62x3xf32, strided<[8, 1], offset: 2>>
-  return %1 : memref<62x3xf32, strided<[8, 1], offset: 2>>
+    memref<64x8xf32, strided<[8, 1]>>
+    to memref<62x3xf32, strided<[8, 1]>>
+  return %1 : memref<62x3xf32, strided<[8, 1]>>
 }
 
 // -----
@@ -226,7 +226,7 @@ func.func @subview_const_stride_and_offset(%0 : memref<64x8xf32, strided<[8, 1],
 // CHECK:         %[[ARG0f:[a-zA-Z0-9]*]]: index,
 // CHECK:         %[[ARG1f:[a-zA-Z0-9]*]]: index,
 // CHECK:         %[[ARG2f:.*]]: index)
-func.func @subview_mixed_static_dynamic(%0 : memref<64x4xf32, strided<[4, 1], offset: 0>>, %arg0 : index, %arg1 : index, %arg2 : index) -> memref<62x?xf32, strided<[?, 1], offset: ?>> {
+func.func @subview_mixed_static_dynamic(%0 : memref<64x4xf32, strided<[4, 1]>>, %arg0 : index, %arg1 : index, %arg2 : index) -> memref<62x?xf32, strided<[?, 1]>> {
   // CHECK-DAG: %[[MEMREF:.*]] = builtin.unrealized_conversion_cast %[[MEM]]
   // CHECK-DAG: %[[ARG0:.*]] = builtin.unrealized_conversion_cast %[[ARG0f]]
   // CHECK-DAG: %[[ARG1:.*]] = builtin.unrealized_conversion_cast %[[ARG1f]]
@@ -255,16 +255,16 @@ func.func @subview_mixed_static_dynamic(%0 : memref<64x4xf32, strided<[4, 1], of
   // CHECK: %[[DESC6:.*]] = llvm.insertvalue %[[CST_STRIDE1]], %[[DESC5]][4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 
   %1 = memref.subview %0[%arg1, 2][62, %arg2][%arg0, 1] :
-    memref<64x4xf32, strided<[4, 1], offset: 0>>
-    to memref<62x?xf32, strided<[?, 1], offset: ?>>
-  return %1 : memref<62x?xf32, strided<[?, 1], offset: ?>>
+    memref<64x4xf32, strided<[4, 1]>>
+    to memref<62x?xf32, strided<[?, 1]>>
+  return %1 : memref<62x?xf32, strided<[?, 1]>>
 }
 
 // -----
 
 // CHECK-LABEL: func @subview_leading_operands(
 // CHECK:         %[[MEM:.*]]: memref<{{.*}}>,
-func.func @subview_leading_operands(%0 : memref<5x3xf32>, %1: memref<5x?xf32>) -> memref<3x3xf32, strided<[3, 1], offset: 6>> {
+func.func @subview_leading_operands(%0 : memref<5x3xf32>, %1: memref<5x?xf32>) -> memref<3x3xf32, strided<[3, 1]>> {
   // CHECK: %[[MEMREF:.*]] = builtin.unrealized_conversion_cast %[[MEM]]
   // Alloc ptr
   // CHECK: %[[BASE:.*]] = llvm.extractvalue %[[MEMREF]][0] : !llvm.struct<(ptr, ptr, i64
@@ -284,16 +284,16 @@ func.func @subview_leading_operands(%0 : memref<5x3xf32>, %1: memref<5x?xf32>) -
   // CHECK: %[[DESC5:.*]] = llvm.insertvalue %[[C3]], %[[DESC4]][3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
   // CHECK: %[[CST_STRIDE1:.*]] = llvm.mlir.constant(1 : index) : i64
   // CHECK: %[[DESC6:.*]] = llvm.insertvalue %[[CST_STRIDE1]], %[[DESC5]][4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-  %2 = memref.subview %0[2, 0][3, 3][1, 1]: memref<5x3xf32> to memref<3x3xf32, strided<[3, 1], offset: 6>>
+  %2 = memref.subview %0[2, 0][3, 3][1, 1]: memref<5x3xf32> to memref<3x3xf32, strided<[3, 1]>>
 
-  return %2 : memref<3x3xf32, strided<[3, 1], offset: 6>>
+  return %2 : memref<3x3xf32, strided<[3, 1]>>
 }
 
 // -----
 
 // CHECK-LABEL: func @subview_leading_operands_dynamic(
 // CHECK:         %[[MEM:[a-zA-Z0-9]*]]: memref
-func.func @subview_leading_operands_dynamic(%0 : memref<5x?xf32>) -> memref<3x?xf32, strided<[?, 1], offset: ?>> {
+func.func @subview_leading_operands_dynamic(%0 : memref<5x?xf32>) -> memref<3x?xf32, strided<[?, 1]>> {
   // CHECK: %[[MEMREF:.*]] = builtin.unrealized_conversion_cast %[[MEM]]
   // CHECK: %[[SIZE1:.*]] = llvm.extractvalue %[[MEMREF]][3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
   // CHECK: %[[BASE:.*]] = llvm.extractvalue %[[MEMREF]][0] : !llvm.struct<(ptr, ptr, i64
@@ -322,15 +322,15 @@ func.func @subview_leading_operands_dynamic(%0 : memref<5x?xf32>) -> memref<3x?x
 
   %c0 = arith.constant 1 : index
   %d0 = memref.dim %0, %c0 : memref<5x?xf32>
-  %1 = memref.subview %0[2, 0][3, %d0][1, 1]: memref<5x?xf32> to memref<3x?xf32, strided<[?, 1], offset: ?>>
-  return %1 : memref<3x?xf32, strided<[?, 1], offset: ?>>
+  %1 = memref.subview %0[2, 0][3, %d0][1, 1]: memref<5x?xf32> to memref<3x?xf32, strided<[?, 1]>>
+  return %1 : memref<3x?xf32, strided<[?, 1]>>
 }
 
 // -----
 
 // CHECK-LABEL: func @subview_rank_reducing_leading_operands(
 // CHECK:         %[[MEM:.*]]: memref
-func.func @subview_rank_reducing_leading_operands(%0 : memref<5x3xf32>) -> memref<3xf32, strided<[1], offset: 3>> {
+func.func @subview_rank_reducing_leading_operands(%0 : memref<5x3xf32>) -> memref<3xf32, strided<[1]>> {
   // CHECK: %[[MEMREF:.*]] = builtin.unrealized_conversion_cast %[[MEM]]
   // CHECK: %[[BASE:.*]] = llvm.extractvalue %[[MEMREF]][0] : !llvm.struct<(ptr, ptr, i64
   // CHECK: %[[BASE_ALIGNED:.*]] = llvm.extractvalue %[[MEMREF]][1] : !llvm.struct<(ptr, ptr, i64
@@ -346,16 +346,16 @@ func.func @subview_rank_reducing_leading_operands(%0 : memref<5x3xf32>) -> memre
   // CHECK: %[[CST_STRIDE0:.*]] = llvm.mlir.constant(1 : index) : i64
   // CHECK: %[[DESC4:.*]] = llvm.insertvalue %[[CST_STRIDE0]], %[[DESC3]][4, 0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
 
-  %1 = memref.subview %0[1, 0][1, 3][1, 1]: memref<5x3xf32> to memref<3xf32, strided<[1], offset: 3>>
+  %1 = memref.subview %0[1, 0][1, 3][1, 1]: memref<5x3xf32> to memref<3xf32, strided<[1]>>
 
-  return %1 : memref<3xf32, strided<[1], offset: 3>>
+  return %1 : memref<3xf32, strided<[1]>>
 }
 
 // -----
 
 // CHECK-LABEL: func @subview_negative_stride
 // CHECK-SAME: (%[[MEM:.*]]: memref<7xf32>)
-func.func @subview_negative_stride(%arg0 : memref<7xf32>) -> memref<7xf32, strided<[-1], offset: 6>> {
+func.func @subview_negative_stride(%arg0 : memref<7xf32>) -> memref<7xf32, strided<[-1]>> {
   // CHECK: %[[MEMREF:.*]] = builtin.unrealized_conversion_cast %[[MEM]]
   // CHECK: %[[BASE:.*]] = llvm.extractvalue %[[MEMREF]][0] : !llvm.struct<(ptr, ptr, i64
   // CHECK: %[[BASE_ALIGNED:.*]] = llvm.extractvalue %[[MEMREF]][1] : !llvm.struct<(ptr, ptr, i64
@@ -368,11 +368,11 @@ func.func @subview_negative_stride(%arg0 : memref<7xf32>) -> memref<7xf32, strid
   // CHECK: %[[DESC3:.*]] = llvm.insertvalue %[[CST_SIZE0]], %[[DESC2]][3, 0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
   // CHECK: %[[CST_STRIDE0:.*]] = llvm.mlir.constant(-1 : index) : i64
   // CHECK: %[[DESC4:.*]] = llvm.insertvalue %[[CST_STRIDE0]], %[[DESC3]][4, 0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
-  // CHECK: %[[RES:.*]] = builtin.unrealized_conversion_cast %[[DESC4]] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> to memref<7xf32, strided<[-1], offset: 6>>
-  // CHECK: return %[[RES]] : memref<7xf32, strided<[-1], offset: 6>>
+  // CHECK: %[[RES:.*]] = builtin.unrealized_conversion_cast %[[DESC4]] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> to memref<7xf32, strided<[-1]>>
+  // CHECK: return %[[RES]] : memref<7xf32, strided<[-1]>>
 
-  %0 = memref.subview %arg0[6] [7] [-1] : memref<7xf32> to memref<7xf32, strided<[-1], offset: 6>>
-  return %0 : memref<7xf32, strided<[-1], offset: 6>>
+  %0 = memref.subview %arg0[6] [7] [-1] : memref<7xf32> to memref<7xf32, strided<[-1]>>
+  return %0 : memref<7xf32, strided<[-1]>>
 }
 
 // -----
@@ -410,16 +410,16 @@ func.func @collapse_shape_static(%arg0: memref<1x3x4x1x5xf32>) -> memref<3x4x5xf
 // -----
 
 func.func @collapse_shape_dynamic_with_non_identity_layout(
-        %arg0 : memref<4x?x?xf32, strided<[?, 4, 1], offset: ?>>) ->
-        memref<4x?xf32, strided<[?, ?], offset: ?>> {
+        %arg0 : memref<4x?x?xf32, strided<[?, 4, 1]>>) ->
+        memref<4x?xf32, strided<[?, ?]>> {
   %0 = memref.collapse_shape %arg0 [[0], [1, 2]]:
-    memref<4x?x?xf32, strided<[?, 4, 1], offset: ?>> into
-    memref<4x?xf32, strided<[?, ?], offset: ?>>
-  return %0 : memref<4x?xf32, strided<[?, ?], offset: ?>>
+    memref<4x?x?xf32, strided<[?, 4, 1]>> into
+    memref<4x?xf32, strided<[?, ?]>>
+  return %0 : memref<4x?xf32, strided<[?, ?]>>
 }
 // CHECK-LABEL:   func.func @collapse_shape_dynamic_with_non_identity_layout(
-// CHECK-SAME:                                                               %[[ARG:.*]]: memref<4x?x?xf32, strided<[?, 4, 1], offset: ?>>) -> memref<4x?xf32, strided<[?, ?], offset: ?>> {
-// CHECK:           %[[MEM:.*]] = builtin.unrealized_conversion_cast %[[ARG]] : memref<4x?x?xf32, strided<[?, 4, 1], offset: ?>> to !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
+// CHECK-SAME:                                                               %[[ARG:.*]]: memref<4x?x?xf32, strided<[?, 4, 1]>>) -> memref<4x?xf32, strided<[?, ?]>> {
+// CHECK:           %[[MEM:.*]] = builtin.unrealized_conversion_cast %[[ARG]] : memref<4x?x?xf32, strided<[?, 4, 1]>> to !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
 // CHECK:           %[[BASE_BUFFER:.*]] = llvm.extractvalue %[[MEM]][0] : !llvm.struct<(ptr, ptr, i64,
 // CHECK:           %[[ALIGNED_BUFFER:.*]] = llvm.extractvalue %[[MEM]][1] : !llvm.struct<(ptr, ptr, i64,
 // CHECK:           %[[OFFSET:.*]] = llvm.extractvalue %[[MEM]][2] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
@@ -439,12 +439,12 @@ func.func @collapse_shape_dynamic_with_non_identity_layout(
 // CHECK:           %[[DESC5:.*]] = llvm.insertvalue %[[FINAL_SIZE1]], %[[DESC4]][3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:           %[[C1:.*]] = llvm.mlir.constant(1 : index) : i64
 // CHECK:           %[[DESC6:.*]] = llvm.insertvalue %[[C1]], %[[DESC5]][4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-// CHECK:           %[[RES:.*]] = builtin.unrealized_conversion_cast %[[DESC6]] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> to memref<4x?xf32, strided<[?, ?], offset: ?>>
-// CHECK:           return %[[RES]] : memref<4x?xf32, strided<[?, ?], offset: ?>>
+// CHECK:           %[[RES:.*]] = builtin.unrealized_conversion_cast %[[DESC6]] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> to memref<4x?xf32, strided<[?, ?]>>
+// CHECK:           return %[[RES]] : memref<4x?xf32, strided<[?, ?]>>
 // CHECK:         }
 // CHECK32-LABEL:   func.func @collapse_shape_dynamic_with_non_identity_layout(
-// CHECK32-SAME:                                                               %[[ARG:.*]]: memref<4x?x?xf32, strided<[?, 4, 1], offset: ?>>) -> memref<4x?xf32, strided<[?, ?], offset: ?>> {
-// CHECK32:           %[[MEM:.*]] = builtin.unrealized_conversion_cast %[[ARG]] : memref<4x?x?xf32, strided<[?, 4, 1], offset: ?>> to !llvm.struct<(ptr, ptr, i32, array<3 x i32>, array<3 x i32>)>
+// CHECK32-SAME:                                                               %[[ARG:.*]]: memref<4x?x?xf32, strided<[?, 4, 1]>>) -> memref<4x?xf32, strided<[?, ?]>> {
+// CHECK32:           %[[MEM:.*]] = builtin.unrealized_conversion_cast %[[ARG]] : memref<4x?x?xf32, strided<[?, 4, 1]>> to !llvm.struct<(ptr, ptr, i32, array<3 x i32>, array<3 x i32>)>
 // CHECK32:           %[[BASE_BUFFER:.*]] = llvm.extractvalue %[[MEM]][0] : !llvm.struct<(ptr, ptr, i32,
 // CHECK32:           %[[ALIGNED_BUFFER:.*]] = llvm.extractvalue %[[MEM]][1] : !llvm.struct<(ptr, ptr, i32,
 // CHECK32:           %[[OFFSET:.*]] = llvm.extractvalue %[[MEM]][2] : !llvm.struct<(ptr, ptr, i32, array<3 x i32>, array<3 x i32>)>
@@ -464,8 +464,8 @@ func.func @collapse_shape_dynamic_with_non_identity_layout(
 // CHECK32:           %[[DESC5:.*]] = llvm.insertvalue %[[FINAL_SIZE1_CAST]], %[[DESC4]][3, 1] : !llvm.struct<(ptr, ptr, i32, array<2 x i32>, array<2 x i32>)>
 // CHECK32:           %[[C1_I32:.*]] = llvm.mlir.constant(1 : index) : i32
 // CHECK32:           %[[DESC6:.*]] = llvm.insertvalue %[[C1_I32]], %[[DESC5]][4, 1] : !llvm.struct<(ptr, ptr, i32, array<2 x i32>, array<2 x i32>)>
-// CHECK32:           %[[RES:.*]] = builtin.unrealized_conversion_cast %[[DESC6]] : !llvm.struct<(ptr, ptr, i32, array<2 x i32>, array<2 x i32>)> to memref<4x?xf32, strided<[?, ?], offset: ?>>
-// CHECK32:           return %[[RES]] : memref<4x?xf32, strided<[?, ?], offset: ?>>
+// CHECK32:           %[[RES:.*]] = builtin.unrealized_conversion_cast %[[DESC6]] : !llvm.struct<(ptr, ptr, i32, array<2 x i32>, array<2 x i32>)> to memref<4x?xf32, strided<[?, ?]>>
+// CHECK32:           return %[[RES]] : memref<4x?xf32, strided<[?, ?]>>
 // CHECK32:         }
 
 // -----
@@ -623,18 +623,18 @@ func.func @expand_shape_dynamic(%arg0 : memref<1x?xf32>, %sz0: index) -> memref<
 // -----
 
 func.func @expand_shape_dynamic_with_non_identity_layout(
-            %arg0 : memref<1x?xf32, strided<[?, ?], offset: ?>>, %sz0: index) ->
-            memref<1x2x?xf32, strided<[?, ?, ?], offset: ?>> {
+            %arg0 : memref<1x?xf32, strided<[?, ?]>>, %sz0: index) ->
+            memref<1x2x?xf32, strided<[?, ?, ?]>> {
   %0 = memref.expand_shape %arg0 [[0], [1, 2]] output_shape [1, 2, %sz0] :
-    memref<1x?xf32, strided<[?, ?], offset: ?>> into
-    memref<1x2x?xf32, strided<[?, ?, ?], offset: ?>>
-  return %0 : memref<1x2x?xf32, strided<[?, ?, ?], offset: ?>>
+    memref<1x?xf32, strided<[?, ?]>> into
+    memref<1x2x?xf32, strided<[?, ?, ?]>>
+  return %0 : memref<1x2x?xf32, strided<[?, ?, ?]>>
 }
 // CHECK-LABEL:   func.func @expand_shape_dynamic_with_non_identity_layout(
-// CHECK-SAME:      %[[ARG0:.*]]: memref<1x?xf32, strided<[?, ?], offset: ?>>,
-// CHECK-SAME:      %[[ARG1:.*]]: index) -> memref<1x2x?xf32, strided<[?, ?, ?], offset: ?>> {
+// CHECK-SAME:      %[[ARG0:.*]]: memref<1x?xf32, strided<[?, ?]>>,
+// CHECK-SAME:      %[[ARG1:.*]]: index) -> memref<1x2x?xf32, strided<[?, ?, ?]>> {
 // CHECK:           %[[UNREALIZED_CONVERSION_CAST_0:.*]] = builtin.unrealized_conversion_cast %[[ARG1]] : index to i64
-// CHECK:           %[[UNREALIZED_CONVERSION_CAST_1:.*]] = builtin.unrealized_conversion_cast %[[ARG0]] : memref<1x?xf32, strided<[?, ?], offset: ?>> to !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+// CHECK:           %[[UNREALIZED_CONVERSION_CAST_1:.*]] = builtin.unrealized_conversion_cast %[[ARG0]] : memref<1x?xf32, strided<[?, ?]>> to !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:           %[[EXTRACTVALUE_0:.*]] = llvm.extractvalue %[[UNREALIZED_CONVERSION_CAST_1]][0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:           %[[EXTRACTVALUE_1:.*]] = llvm.extractvalue %[[UNREALIZED_CONVERSION_CAST_1]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:           %[[MLIR_0:.*]] = llvm.mlir.poison : !llvm.struct<(ptr, ptr, i64)>
@@ -659,17 +659,17 @@ func.func @expand_shape_dynamic_with_non_identity_layout(
 // CHECK:           %[[INSERTVALUE_8:.*]] = llvm.insertvalue %[[UNREALIZED_CONVERSION_CAST_3]], %[[INSERTVALUE_7]][4, 1] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
 // CHECK:           %[[INSERTVALUE_9:.*]] = llvm.insertvalue %[[UNREALIZED_CONVERSION_CAST_0]], %[[INSERTVALUE_8]][3, 2] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
 // CHECK:           %[[INSERTVALUE_10:.*]] = llvm.insertvalue %[[EXTRACTVALUE_4]], %[[INSERTVALUE_9]][4, 2] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
-// CHECK:           %[[UNREALIZED_CONVERSION_CAST_4:.*]] = builtin.unrealized_conversion_cast %[[INSERTVALUE_10]] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)> to memref<1x2x?xf32, strided<[?, ?, ?], offset: ?>>
-// CHECK:           return %[[UNREALIZED_CONVERSION_CAST_4]] : memref<1x2x?xf32, strided<[?, ?, ?], offset: ?>>
+// CHECK:           %[[UNREALIZED_CONVERSION_CAST_4:.*]] = builtin.unrealized_conversion_cast %[[INSERTVALUE_10]] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)> to memref<1x2x?xf32, strided<[?, ?, ?]>>
+// CHECK:           return %[[UNREALIZED_CONVERSION_CAST_4]] : memref<1x2x?xf32, strided<[?, ?, ?]>>
 // CHECK:         }
 
 // -----
 
 // CHECK-LABEL: func @collapse_static_shape_with_non_identity_layout
-func.func @collapse_static_shape_with_non_identity_layout(%arg: memref<1x1x8x8xf32, strided<[64, 64, 8, 1], offset: ?>>) -> memref<64xf32, strided<[1], offset: ?>> {
+func.func @collapse_static_shape_with_non_identity_layout(%arg: memref<1x1x8x8xf32, strided<[64, 64, 8, 1]>>) -> memref<64xf32, strided<[1]>> {
 // CHECK-NOT: memref.collapse_shape
-  %1 = memref.collapse_shape %arg [[0, 1, 2, 3]] : memref<1x1x8x8xf32, strided<[64, 64, 8, 1], offset: ?>> into memref<64xf32, strided<[1], offset: ?>>
-  return %1 : memref<64xf32, strided<[1], offset: ?>>
+  %1 = memref.collapse_shape %arg [[0, 1, 2, 3]] : memref<1x1x8x8xf32, strided<[64, 64, 8, 1]>> into memref<64xf32, strided<[1]>>
+  return %1 : memref<64xf32, strided<[1]>>
 }
 
 // -----
@@ -680,8 +680,8 @@ func.func @collapse_static_shape_with_non_identity_layout(%arg: memref<1x1x8x8xf
 // will be able to do their job easily.
 
 // CHECK-LABEL: func @load_and_assume(
-// CHECK-SAME: %[[ARG0:.*]]: memref<?x?xf32, strided<[?, ?], offset: ?>>,
-// CHECK: %[[DESC:.*]] = builtin.unrealized_conversion_cast %[[ARG0]] : memref<?x?xf32, strided<[?, ?], offset: ?>> to !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+// CHECK-SAME: %[[ARG0:.*]]: memref<?x?xf32, strided<[?, ?]>>,
+// CHECK: %[[DESC:.*]] = builtin.unrealized_conversion_cast %[[ARG0]] : memref<?x?xf32, strided<[?, ?]>> to !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK: %[[ALIGNED_PTR:.*]] = llvm.extractvalue %[[DESC]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK: %[[OFFSET:.*]] = llvm.extractvalue %[[DESC]][2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK: %[[BUFF_ADDR:.*]] = llvm.getelementptr %[[ALIGNED_PTR]][%[[OFFSET]]] : (!llvm.ptr, i64) -> !llvm.ptr, f32
@@ -690,10 +690,10 @@ func.func @collapse_static_shape_with_non_identity_layout(%arg: memref<1x1x8x8xf
 // CHECK: %[[VAL:.*]] = llvm.load %[[LD_ADDR]] : !llvm.ptr -> f32
 // CHECK: return %[[VAL]] : f32
 func.func @load_and_assume(
-    %arg0: memref<?x?xf32, strided<[?, ?], offset: ?>>,
+    %arg0: memref<?x?xf32, strided<[?, ?]>>,
     %i0: index, %i1: index)
     -> f32 {
-  %arg0_align = memref.assume_alignment %arg0, 16 : memref<?x?xf32, strided<[?, ?], offset: ?>>
-  %2 = memref.load %arg0_align[%i0, %i1] : memref<?x?xf32, strided<[?, ?], offset: ?>>
+  %arg0_align = memref.assume_alignment %arg0, 16 : memref<?x?xf32, strided<[?, ?]>>
+  %2 = memref.load %arg0_align[%i0, %i1] : memref<?x?xf32, strided<[?, ?]>>
   func.return %2 : f32
 }
diff --git a/mlir/test/Conversion/MemRefToLLVM/memref-to-llvm-with-transforms.mlir b/mlir/test/Conversion/MemRefToLLVM/memref-to-llvm-with-transforms.mlir
index f6d0524fce39d..26988aa58c918 100644
--- a/mlir/test/Conversion/MemRefToLLVM/memref-to-llvm-with-transforms.mlir
+++ b/mlir/test/Conversion/MemRefToLLVM/memref-to-llvm-with-transforms.mlir
@@ -3,8 +3,8 @@
 // Checks that the program does not crash. The functionality of the pattern is
 // already checked in test/Dialect/MemRef/*.mlir
 
-func.func @subview_folder(%arg0: memref<100x100xf32>, %arg1: index, %arg2: index, %arg3: index, %arg4: index) -> memref<?x?xf32, strided<[100, 1], offset: ?>> {
-  %subview = memref.subview %arg0[%arg1, %arg2] [%arg3, %arg4] [1, 1] : memref<100x100xf32> to memref<?x?xf32, strided<[100, 1], offset: ?>>
-  return %subview : memref<?x?xf32, strided<[100, 1], offset: ?>>
+func.func @subview_folder(%arg0: memref<100x100xf32>, %arg1: index, %arg2: index, %arg3: index, %arg4: index) -> memref<?x?xf32, strided<[100, 1]>> {
+  %subview = memref.subview %arg0[%arg1, %arg2] [%arg3, %arg4] [1, 1] : memref<100x100xf32> to memref<?x?xf32, strided<[100, 1]>>
+  return %subview : memref<?x?xf32, strided<[100, 1]>>
 }
 // CHECK-LABEL: llvm.func @subview_folder
diff --git a/mlir/test/Conversion/MemRefToLLVM/memref-to-llvm.mlir b/mlir/test/Conversion/MemRefToLLVM/memref-to-llvm.mlir
index d2fe5ab582b71..fede45f965329 100644
--- a/mlir/test/Conversion/MemRefToLLVM/memref-to-llvm.mlir
+++ b/mlir/test/Conversion/MemRefToLLVM/memref-to-llvm.mlir
@@ -169,13 +169,13 @@ func.func @view_memref_as_rank0(%offset: index, %mem: memref<2xi8>) {
 // CHECK32:         %[[ARG1:[a-zA-Z0-9]*]]: index,
 // CHECK32:         %[[ARG2:.*]]: index)
 // CHECK-INTERFACE-LABEL: func @subview(
-func.func @subview(%0 : memref<64x4xf32, strided<[4, 1], offset: 0>>, %arg0 : index, %arg1 : index, %arg2 : index) {
+func.func @subview(%0 : memref<64x4xf32, strided<[4, 1]>>, %arg0 : index, %arg1 : index, %arg2 : index) {
   // CHECK: memref.subview %[[MEMREF]][%[[ARG0]], %[[ARG1]]] [%[[ARG0]], %[[ARG1]]]
   // CHECK32: memref.subview %[[MEMREF]][%[[ARG0]], %[[ARG1]]] [%[[ARG0]], %[[ARG1]]] [%[[ARG0]], %[[ARG1]]]
   // CHECK-INTERFACE: memref.subview
   %1 = memref.subview %0[%arg0, %arg1][%arg0, %arg1][%arg0, %arg1] :
-    memref<64x4xf32, strided<[4, 1], offset: 0>>
-  to memref<?x?xf32, strided<[?, ?], offset: ?>>
+    memref<64x4xf32, strided<[4, 1]>>
+  to memref<?x?xf32, strided<[?, ?]>>
   return
 }
 
@@ -227,7 +227,7 @@ func.func @distinct_objects_noop(%arg0: memref<?xf16>) -> memref<?xf16> {
 
 // CHECK-LABEL: func @assume_alignment_w_offset
 // CHECK-INTERFACE-LABEL: func @assume_alignment_w_offset
-func.func @assume_alignment_w_offset(%0 : memref<4x4xf16, strided<[?, ?], offset: ?>>) {
+func.func @assume_alignment_w_offset(%0 : memref<4x4xf16, strided<[?, ?]>>) {
   // CHECK-DAG: %[[PTR:.*]] = llvm.extractvalue %[[MEMREF:.*]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
   // CHECK-DAG: %[[OFFSET:.*]] = llvm.extractvalue %[[MEMREF]][2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
   // CHECK-DAG: %[[BUFF_ADDR:.*]] =  llvm.getelementptr %[[PTR]][%[[OFFSET]]] : (!llvm.ptr, i64) -> !llvm.ptr, f16
@@ -235,7 +235,7 @@ func.func @assume_alignment_w_offset(%0 : memref<4x4xf16, strided<[?, ?], offset
   // CHECK-DAG: %[[ALIGN:.*]] = llvm.mlir.constant(16 : index) : i64
   // CHECK-NEXT: llvm.intr.assume %[[TRUE]] ["align"(%[[BUFF_ADDR]], %[[ALIGN]] : !llvm.ptr, i64)] : i1
   // CHECK-INTERFACE: llvm.intr.assume
-  %1 = memref.assume_alignment %0, 16 : memref<4x4xf16, strided<[?, ?], offset: ?>>
+  %1 = memref.assume_alignment %0, 16 : memref<4x4xf16, strided<[?, ?]>>
   return
 }
 // -----
@@ -308,8 +308,8 @@ func.func @address_space(%arg0 : memref<32xf32, affine_map<(d0) -> (d0)>, 7>) {
 //       CHECK:    llvm.insertvalue {{.*}}[4, 2] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
 // CHECK-INTERFACE-LABEL: func @transpose
 // CHECK-INTERFACE-NOT: memref.transpose
-func.func @transpose(%arg0: memref<?x?x?xf32, strided<[?, ?, 1], offset: ?>>) {
-  %0 = memref.transpose %arg0 (i, j, k) -> (k, i, j) : memref<?x?x?xf32, strided<[?, ?, 1], offset: ?>> to memref<?x?x?xf32, strided<[1, ?, ?], offset: ?>>
+func.func @transpose(%arg0: memref<?x?x?xf32, strided<[?, ?, 1]>>) {
+  %0 = memref.transpose %arg0 (i, j, k) -> (k, i, j) : memref<?x?x?xf32, strided<[?, ?, 1]>> to memref<?x?x?xf32, strided<[1, ?, ?]>>
   return
 }
 
@@ -502,15 +502,15 @@ func.func @atomic_rmw(%I : memref<10xi32>, %ival : i32, %F : memref<10xf32>, %fv
 
 // -----
 
-func.func @atomic_rmw_with_offset(%I : memref<10xi32, strided<[1], offset: 5>>, %ival : i32, %i : index) {
-  memref.atomic_rmw andi %ival, %I[%i] : (i32, memref<10xi32, strided<[1], offset: 5>>) -> i32
+func.func @atomic_rmw_with_offset(%I : memref<10xi32, strided<[1]>>, %ival : i32, %i : index) {
+  memref.atomic_rmw andi %ival, %I[%i] : (i32, memref<10xi32, strided<[1]>>) -> i32
   return
 }
 // CHECK-LABEL:  func @atomic_rmw_with_offset
-// CHECK-SAME:   %[[ARG0:.+]]: memref<10xi32, strided<[1], offset: 5>>
+// CHECK-SAME:   %[[ARG0:.+]]: memref<10xi32, strided<[1]>>
 // CHECK-SAME:   %[[ARG1:.+]]: i32
 // CHECK-SAME:   %[[ARG2:.+]]: index
-// CHECK-DAG:    %[[MEMREF_STRUCT:.+]] = builtin.unrealized_conversion_cast %[[ARG0]] : memref<10xi32, strided<[1], offset: 5>> to !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
+// CHECK-DAG:    %[[MEMREF_STRUCT:.+]] = builtin.unrealized_conversion_cast %[[ARG0]] : memref<10xi32, strided<[1]>> to !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
 // CHECK-DAG:    %[[INDEX:.+]] = builtin.unrealized_conversion_cast %[[ARG2]] : index to i64
 // CHECK:        %[[BASE_PTR:.+]] = llvm.extractvalue %[[MEMREF_STRUCT]][1] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
 // CHECK:        %[[OFFSET:.+]] = llvm.mlir.constant(5 : index) : i64
@@ -618,13 +618,13 @@ func.func @memref_copy_ranked() {
 // CHECK-INTERFACE-LABEL: func @memref_copy_contiguous
 func.func @memref_copy_contiguous(%in: memref<16x4xi32>, %offset: index) {
   %buf = memref.alloc() : memref<1x2xi32>
-  %sub = memref.subview %in[%offset, 0] [1, 2] [1, 1] : memref<16x4xi32> to memref<1x2xi32, strided<[4, 1], offset: ?>>
-  memref.copy %sub, %buf : memref<1x2xi32, strided<[4, 1], offset: ?>> to memref<1x2xi32>
+  %sub = memref.subview %in[%offset, 0] [1, 2] [1, 1] : memref<16x4xi32> to memref<1x2xi32, strided<[4, 1]>>
+  memref.copy %sub, %buf : memref<1x2xi32, strided<[4, 1]>> to memref<1x2xi32>
   // Skip the memref descriptor of the alloc.
   // CHECK: llvm.insertvalue {{%.*}}, {{%.*}}[4, 1]
   // Get the memref for the subview.
-  // CHECK: %[[SUBVIEW:.*]] = memref.subview %{{.*}}[%{{.*}}, 0] [1, 2] [1, 1] : memref<16x4xi32> to memref<1x2xi32, strided<[4, 1], offset: ?>>
-  // CHECK: %[[DESC:.*]] = builtin.unrealized_conversion_cast %[[SUBVIEW]] : memref<1x2xi32, strided<[4, 1], offset: ?>> to !llvm.struct<(ptr
+  // CHECK: %[[SUBVIEW:.*]] = memref.subview %{{.*}}[%{{.*}}, 0] [1, 2] [1, 1] : memref<16x4xi32> to memref<1x2xi32, strided<[4, 1]>>
+  // CHECK: %[[DESC:.*]] = builtin.unrealized_conversion_cast %[[SUBVIEW]] : memref<1x2xi32, strided<[4, 1]>> to !llvm.struct<(ptr
   // CHECK: [[EXTRACT0:%.*]] = llvm.extractvalue %[[DESC]][3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
   // CHECK: [[MUL1:%.*]] = llvm.mul {{.*}}, [[EXTRACT0]] : i64
   // CHECK: [[EXTRACT1:%.*]] = llvm.extractvalue %[[DESC]][3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
@@ -650,9 +650,9 @@ func.func @memref_copy_contiguous(%in: memref<16x4xi32>, %offset: index) {
 // CHECK-INTERFACE-LABEL: func @memref_copy_0d_offset
 func.func @memref_copy_0d_offset(%in: memref<2xi32>) {
   %buf = memref.alloc() : memref<i32>
-  %sub = memref.subview %in[1] [1] [1] : memref<2xi32> to memref<1xi32, strided<[1], offset: 1>>
-  %scalar = memref.collapse_shape %sub [] : memref<1xi32, strided<[1], offset: 1>> into memref<i32, strided<[], offset: 1>>
-  memref.copy %scalar, %buf : memref<i32, strided<[], offset: 1>> to memref<i32>
+  %sub = memref.subview %in[1] [1] [1] : memref<2xi32> to memref<1xi32, strided<[1]>>
+  %scalar = memref.collapse_shape %sub [] : memref<1xi32, strided<[1]>> into memref<i32, strided<[]>>
+  memref.copy %scalar, %buf : memref<i32, strided<[]>> to memref<i32>
   // CHECK: llvm.intr.memcpy
   // CHECK-INTERFACE: llvm.intr.memcpy
   return
@@ -664,8 +664,8 @@ func.func @memref_copy_0d_offset(%in: memref<2xi32>) {
 // CHECK-INTERFACE-LABEL: func @memref_copy_noncontiguous
 func.func @memref_copy_noncontiguous(%in: memref<16x2xi32>, %offset: index) {
   %buf = memref.alloc() : memref<2x1xi32>
-  %sub = memref.subview %in[%offset, 0] [2, 1] [1, 1] : memref<16x2xi32> to memref<2x1xi32, strided<[2, 1], offset: ?>>
-  memref.copy %sub, %buf : memref<2x1xi32, strided<[2, 1], offset: ?>> to memref<2x1xi32>
+  %sub = memref.subview %in[%offset, 0] [2, 1] [1, 1] : memref<16x2xi32> to memref<2x1xi32, strided<[2, 1]>>
+  memref.copy %sub, %buf : memref<2x1xi32, strided<[2, 1]>> to memref<2x1xi32>
   // CHECK: llvm.call @memrefCopy
   // CHECK-INTERFACE: llvm.call @memrefCopy
   return
@@ -742,7 +742,7 @@ func.func @extract_aligned_pointer_as_index_unranked(%m: memref<*xf32>) -> index
 
 // CHECK-LABEL: func @extract_strided_metadata(
 // CHECK-SAME: %[[ARG:.*]]: memref
-// CHECK: %[[MEM_DESC:.*]] = builtin.unrealized_conversion_cast %[[ARG]] : memref<?x?xf32, strided<[?, ?], offset: ?>> to !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+// CHECK: %[[MEM_DESC:.*]] = builtin.unrealized_conversion_cast %[[ARG]] : memref<?x?xf32, strided<[?, ?]>> to !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK: %[[BASE:.*]] = llvm.extractvalue %[[MEM_DESC]][0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK: %[[ALIGNED_BASE:.*]] = llvm.extractvalue %[[MEM_DESC]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK: %[[DESC:.*]] = llvm.mlir.poison : !llvm.struct<(ptr, ptr, i64)>
@@ -760,10 +760,10 @@ func.func @extract_aligned_pointer_as_index_unranked(%m: memref<*xf32>) -> index
 // CHECK-INTERFACE-NOT: memref.extract_strided_metadata
 
 func.func @extract_strided_metadata(
-    %ref: memref<?x?xf32, strided<[?,?], offset: ?>>) {
+    %ref: memref<?x?xf32, strided<[?,?]>>) {
 
   %base, %offset, %sizes:2, %strides:2 =
-    memref.extract_strided_metadata %ref : memref<?x?xf32, strided<[?,?], offset: ?>>
+    memref.extract_strided_metadata %ref : memref<?x?xf32, strided<[?,?]>>
     -> memref<f32>, index,
        index, index,
        index, index
diff --git a/mlir/test/Conversion/MemRefToSPIRV/memref-to-spirv.mlir b/mlir/test/Conversion/MemRefToSPIRV/memref-to-spirv.mlir
index 931dd43be33c3..94f67b8b05ea2 100644
--- a/mlir/test/Conversion/MemRefToSPIRV/memref-to-spirv.mlir
+++ b/mlir/test/Conversion/MemRefToSPIRV/memref-to-spirv.mlir
@@ -388,36 +388,36 @@ module attributes {
 
 // CHECK-LABEL: func.func @reinterpret_cast
 //  CHECK-SAME:  (%[[MEM:.*]]: memref<?xf32, #spirv.storage_class<CrossWorkgroup>>, %[[OFF:.*]]: index)
-func.func @reinterpret_cast(%arg: memref<?xf32, #spirv.storage_class<CrossWorkgroup>>, %arg1: index) -> memref<?xf32, strided<[1], offset: ?>, #spirv.storage_class<CrossWorkgroup>> {
+func.func @reinterpret_cast(%arg: memref<?xf32, #spirv.storage_class<CrossWorkgroup>>, %arg1: index) -> memref<?xf32, strided<[1]>, #spirv.storage_class<CrossWorkgroup>> {
 //   CHECK-DAG:  %[[MEM1:.*]] = builtin.unrealized_conversion_cast %[[MEM]] : memref<?xf32, #spirv.storage_class<CrossWorkgroup>> to !spirv.ptr<f32, CrossWorkgroup>
 //   CHECK-DAG:  %[[OFF1:.*]] = builtin.unrealized_conversion_cast %[[OFF]] : index to i32
 //       CHECK:  %[[RET:.*]] = spirv.InBoundsPtrAccessChain %[[MEM1]][%[[OFF1]]] : !spirv.ptr<f32, CrossWorkgroup>, i32
-//       CHECK:  %[[RET1:.*]] = builtin.unrealized_conversion_cast %[[RET]] : !spirv.ptr<f32, CrossWorkgroup> to memref<?xf32, strided<[1], offset: ?>, #spirv.storage_class<CrossWorkgroup>>
+//       CHECK:  %[[RET1:.*]] = builtin.unrealized_conversion_cast %[[RET]] : !spirv.ptr<f32, CrossWorkgroup> to memref<?xf32, strided<[1]>, #spirv.storage_class<CrossWorkgroup>>
 //       CHECK:  return %[[RET1]]
-  %ret = memref.reinterpret_cast %arg to offset: [%arg1], sizes: [10], strides: [1] : memref<?xf32, #spirv.storage_class<CrossWorkgroup>> to memref<?xf32, strided<[1], offset: ?>, #spirv.storage_class<CrossWorkgroup>>
-  return %ret : memref<?xf32, strided<[1], offset: ?>, #spirv.storage_class<CrossWorkgroup>>
+  %ret = memref.reinterpret_cast %arg to offset: [%arg1], sizes: [10], strides: [1] : memref<?xf32, #spirv.storage_class<CrossWorkgroup>> to memref<?xf32, strided<[1]>, #spirv.storage_class<CrossWorkgroup>>
+  return %ret : memref<?xf32, strided<[1]>, #spirv.storage_class<CrossWorkgroup>>
 }
 
 // CHECK-LABEL: func.func @reinterpret_cast_0
 //  CHECK-SAME:  (%[[MEM:.*]]: memref<?xf32, #spirv.storage_class<CrossWorkgroup>>)
-func.func @reinterpret_cast_0(%arg: memref<?xf32, #spirv.storage_class<CrossWorkgroup>>) -> memref<?xf32, strided<[1], offset: ?>, #spirv.storage_class<CrossWorkgroup>> {
+func.func @reinterpret_cast_0(%arg: memref<?xf32, #spirv.storage_class<CrossWorkgroup>>) -> memref<?xf32, strided<[1]>, #spirv.storage_class<CrossWorkgroup>> {
 //   CHECK-DAG:  %[[MEM1:.*]] = builtin.unrealized_conversion_cast %[[MEM]] : memref<?xf32, #spirv.storage_class<CrossWorkgroup>> to !spirv.ptr<f32, CrossWorkgroup>
-//   CHECK-DAG:  %[[RET:.*]] = builtin.unrealized_conversion_cast %[[MEM1]] : !spirv.ptr<f32, CrossWorkgroup> to memref<?xf32, strided<[1], offset: ?>, #spirv.storage_class<CrossWorkgroup>>
+//   CHECK-DAG:  %[[RET:.*]] = builtin.unrealized_conversion_cast %[[MEM1]] : !spirv.ptr<f32, CrossWorkgroup> to memref<?xf32, strided<[1]>, #spirv.storage_class<CrossWorkgroup>>
 //       CHECK:  return %[[RET]]
-  %ret = memref.reinterpret_cast %arg to offset: [0], sizes: [10], strides: [1] : memref<?xf32, #spirv.storage_class<CrossWorkgroup>> to memref<?xf32, strided<[1], offset: ?>, #spirv.storage_class<CrossWorkgroup>>
-  return %ret : memref<?xf32, strided<[1], offset: ?>, #spirv.storage_class<CrossWorkgroup>>
+  %ret = memref.reinterpret_cast %arg to offset: [0], sizes: [10], strides: [1] : memref<?xf32, #spirv.storage_class<CrossWorkgroup>> to memref<?xf32, strided<[1]>, #spirv.storage_class<CrossWorkgroup>>
+  return %ret : memref<?xf32, strided<[1]>, #spirv.storage_class<CrossWorkgroup>>
 }
 
 // CHECK-LABEL: func.func @reinterpret_cast_5
 //  CHECK-SAME:  (%[[MEM:.*]]: memref<?xf32, #spirv.storage_class<CrossWorkgroup>>)
-func.func @reinterpret_cast_5(%arg: memref<?xf32, #spirv.storage_class<CrossWorkgroup>>) -> memref<?xf32, strided<[1], offset: ?>, #spirv.storage_class<CrossWorkgroup>> {
+func.func @reinterpret_cast_5(%arg: memref<?xf32, #spirv.storage_class<CrossWorkgroup>>) -> memref<?xf32, strided<[1]>, #spirv.storage_class<CrossWorkgroup>> {
 //       CHECK:  %[[MEM1:.*]] = builtin.unrealized_conversion_cast %[[MEM]] : memref<?xf32, #spirv.storage_class<CrossWorkgroup>> to !spirv.ptr<f32, CrossWorkgroup>
 //       CHECK:  %[[OFF:.*]] = spirv.Constant 5 : i32
 //       CHECK:  %[[RET:.*]] = spirv.InBoundsPtrAccessChain %[[MEM1]][%[[OFF]]] : !spirv.ptr<f32, CrossWorkgroup>, i32
-//       CHECK:  %[[RET1:.*]] = builtin.unrealized_conversion_cast %[[RET]] : !spirv.ptr<f32, CrossWorkgroup> to memref<?xf32, strided<[1], offset: ?>, #spirv.storage_class<CrossWorkgroup>>
+//       CHECK:  %[[RET1:.*]] = builtin.unrealized_conversion_cast %[[RET]] : !spirv.ptr<f32, CrossWorkgroup> to memref<?xf32, strided<[1]>, #spirv.storage_class<CrossWorkgroup>>
 //       CHECK:  return %[[RET1]]
-  %ret = memref.reinterpret_cast %arg to offset: [5], sizes: [10], strides: [1] : memref<?xf32, #spirv.storage_class<CrossWorkgroup>> to memref<?xf32, strided<[1], offset: ?>, #spirv.storage_class<CrossWorkgroup>>
-  return %ret : memref<?xf32, strided<[1], offset: ?>, #spirv.storage_class<CrossWorkgroup>>
+  %ret = memref.reinterpret_cast %arg to offset: [5], sizes: [10], strides: [1] : memref<?xf32, #spirv.storage_class<CrossWorkgroup>> to memref<?xf32, strided<[1]>, #spirv.storage_class<CrossWorkgroup>>
+  return %ret : memref<?xf32, strided<[1]>, #spirv.storage_class<CrossWorkgroup>>
 }
 
 } // end module
diff --git a/mlir/test/Conversion/NVGPUToNVVM/nvgpu-to-nvvm.mlir b/mlir/test/Conversion/NVGPUToNVVM/nvgpu-to-nvvm.mlir
index 50bea5a85022e..464592b716c2d 100644
--- a/mlir/test/Conversion/NVGPUToNVVM/nvgpu-to-nvvm.mlir
+++ b/mlir/test/Conversion/NVGPUToNVVM/nvgpu-to-nvvm.mlir
@@ -836,7 +836,7 @@ func.func @tma_fence(%tensorMap1d: !tensorMap1d) {
 }
 
 !lhsTensorMap = !nvgpu.tensormap.descriptor<tensor = memref<128x64xf16, 3>, swizzle = swizzle_128b, l2promo = none, oob = zero, interleave = none>
-!rhsTensorMap = !nvgpu.tensormap.descriptor<tensor = memref<64x64xf16, strided<[64, 1], offset: 8192>, 3>, swizzle = swizzle_128b, l2promo = none, oob = zero, interleave = none>
+!rhsTensorMap = !nvgpu.tensormap.descriptor<tensor = memref<64x64xf16, strided<[64, 1]>, 3>, swizzle = swizzle_128b, l2promo = none, oob = zero, interleave = none>
 
 module @mymodule {
   // Dynamic Shared memory
@@ -847,8 +847,8 @@ module @mymodule {
     %dynamicMem = memref.get_global @dynamicShmem : memref<0xf16, 3>
     %lhsShmem = memref.reinterpret_cast %dynamicMem to offset: [0], sizes: [128,64], strides: [64,1] : memref<0xf16, 3> to memref<128x64xf16,3>
     %rhsShmem2 = memref.reinterpret_cast %dynamicMem to offset: [0], sizes: [4, 64, 64],  strides: [4096, 64, 1] : memref<0xf16, 3> to memref<4x64x64xf16,3>
-    %rhsShmem3 = memref.subview %rhsShmem2[2, 0, 0][1, 64, 64][1, 1, 1] : memref<4x64x64xf16,3> to memref<1x64x64xf16, strided<[4096, 64, 1], offset: 8192>, 3>
-    %rhsShmem = memref.subview %rhsShmem3[0, 0, 0][1, 64, 64][1, 1, 1]  : memref<1x64x64xf16, strided<[4096, 64, 1], offset: 8192>, 3> to memref<64x64xf16, strided<[64, 1], offset: 8192>, 3>
+    %rhsShmem3 = memref.subview %rhsShmem2[2, 0, 0][1, 64, 64][1, 1, 1] : memref<4x64x64xf16,3> to memref<1x64x64xf16, strided<[4096, 64, 1]>, 3>
+    %rhsShmem = memref.subview %rhsShmem3[0, 0, 0][1, 64, 64][1, 1, 1]  : memref<1x64x64xf16, strided<[4096, 64, 1]>, 3> to memref<64x64xf16, strided<[64, 1]>, 3>
     // CHECK: nvvm.cp.async.bulk.tensor.shared.cluster.global
     nvgpu.tma.async.load %lhsTensorMap[%c0, %c0], %mbarrier[%c0] to %lhsShmem : !lhsTensorMap, !barrierType -> memref<128x64xf16,3>
     // CHECK: %[[desc:.+]] = llvm.extractvalue %{{.*}}[1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<2 x i64>, array<2 x i64>)>
@@ -856,7 +856,7 @@ module @mymodule {
     // CHECK: %[[shmemOfset:.+]] = llvm.getelementptr %[[desc]][%[[c8192]]] : (!llvm.ptr<3>, i64)
     // CHECK: %[[dest:.+]] = llvm.addrspacecast %[[shmemOfset]] : !llvm.ptr<3> to !llvm.ptr<7>
     // CHECK: nvvm.cp.async.bulk.tensor.shared.cluster.global %[[dest]], %{{.*}}, %{{.*}}, box[%{{.*}}, %{{.*}}]
-    nvgpu.tma.async.load %rhsTensorMap[%c0, %c0], %mbarrier[%c0] to %rhsShmem : !rhsTensorMap, !barrierType -> memref<64x64xf16, strided<[64, 1], offset: 8192>, 3>
+    nvgpu.tma.async.load %rhsTensorMap[%c0, %c0], %mbarrier[%c0] to %rhsShmem : !rhsTensorMap, !barrierType -> memref<64x64xf16, strided<[64, 1]>, 3>
     return
   }
 }
diff --git a/mlir/test/Conversion/PtrToLLVM/ptr-to-llvm.mlir b/mlir/test/Conversion/PtrToLLVM/ptr-to-llvm.mlir
index 5128fd8ccb265..7110a622dcb03 100644
--- a/mlir/test/Conversion/PtrToLLVM/ptr-to-llvm.mlir
+++ b/mlir/test/Conversion/PtrToLLVM/ptr-to-llvm.mlir
@@ -207,11 +207,11 @@ func.func @test_memref_mixed(%arg0: memref<10x?x30xf32, #ptr.generic_space>) ->
 // CHECK:           %[[VAL_11:.*]] = llvm.insertvalue %[[VAL_10]], %[[VAL_9]][0] : !llvm.struct<(ptr)>
 // CHECK:           llvm.return %[[VAL_7]] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:         }
-func.func @test_memref_strided(%arg0: memref<10x20xf32, strided<[40, 2], offset: 5>, #ptr.generic_space>) -> memref<10x20xf32, strided<[40, 2], offset: 5>, #ptr.generic_space> {
-  %0 = ptr.to_ptr %arg0 : memref<10x20xf32, strided<[40, 2], offset: 5>, #ptr.generic_space> -> <#ptr.generic_space>
-  %1 = ptr.get_metadata %arg0 : memref<10x20xf32, strided<[40, 2], offset: 5>, #ptr.generic_space>
-  %2 = ptr.from_ptr %0 metadata %1 : <#ptr.generic_space> -> memref<10x20xf32, strided<[40, 2], offset: 5>, #ptr.generic_space>
-  return %2 : memref<10x20xf32, strided<[40, 2], offset: 5>, #ptr.generic_space>
+func.func @test_memref_strided(%arg0: memref<10x20xf32, strided<[40, 2]>, #ptr.generic_space>) -> memref<10x20xf32, strided<[40, 2]>, #ptr.generic_space> {
+  %0 = ptr.to_ptr %arg0 : memref<10x20xf32, strided<[40, 2]>, #ptr.generic_space> -> <#ptr.generic_space>
+  %1 = ptr.get_metadata %arg0 : memref<10x20xf32, strided<[40, 2]>, #ptr.generic_space>
+  %2 = ptr.from_ptr %0 metadata %1 : <#ptr.generic_space> -> memref<10x20xf32, strided<[40, 2]>, #ptr.generic_space>
+  return %2 : memref<10x20xf32, strided<[40, 2]>, #ptr.generic_space>
 }
 
 // Tests a comprehensive scenario with fully dynamic memref, including pointer arithmetic
@@ -259,13 +259,13 @@ func.func @test_memref_strided(%arg0: memref<10x20xf32, strided<[40, 2], offset:
 // CHECK:           %[[VAL_39:.*]] = llvm.insertvalue %[[VAL_38]], %[[VAL_37]][4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:           llvm.return %[[VAL_39]] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:         }
-func.func @test_comprehensive_dynamic(%arg0: memref<?x?xf32, strided<[?, ?], offset: ?>, #ptr.generic_space>) -> memref<?x?xf32, strided<[?, ?], offset: ?>, #ptr.generic_space> {
-  %0 = ptr.to_ptr %arg0 : memref<?x?xf32, strided<[?, ?], offset: ?>, #ptr.generic_space> -> <#ptr.generic_space>
-  %1 = ptr.get_metadata %arg0 : memref<?x?xf32, strided<[?, ?], offset: ?>, #ptr.generic_space>
+func.func @test_comprehensive_dynamic(%arg0: memref<?x?xf32, strided<[?, ?]>, #ptr.generic_space>) -> memref<?x?xf32, strided<[?, ?]>, #ptr.generic_space> {
+  %0 = ptr.to_ptr %arg0 : memref<?x?xf32, strided<[?, ?]>, #ptr.generic_space> -> <#ptr.generic_space>
+  %1 = ptr.get_metadata %arg0 : memref<?x?xf32, strided<[?, ?]>, #ptr.generic_space>
   %2 = ptr.type_offset f32 : index
   %3 = ptr.ptr_add inbounds %0, %2 : !ptr.ptr<#ptr.generic_space>, index
-  %4 = ptr.from_ptr %3 metadata %1 : <#ptr.generic_space> -> memref<?x?xf32, strided<[?, ?], offset: ?>, #ptr.generic_space>
-  return %4 : memref<?x?xf32, strided<[?, ?], offset: ?>, #ptr.generic_space>
+  %4 = ptr.from_ptr %3 metadata %1 : <#ptr.generic_space> -> memref<?x?xf32, strided<[?, ?]>, #ptr.generic_space>
+  return %4 : memref<?x?xf32, strided<[?, ?]>, #ptr.generic_space>
 }
 
 // Tests a round-trip conversion of a 0D (scalar) memref
diff --git a/mlir/test/Conversion/SCFToGPU/parallel_loop.mlir b/mlir/test/Conversion/SCFToGPU/parallel_loop.mlir
index 2f192df1dad2e..af906f3c6fcbf 100644
--- a/mlir/test/Conversion/SCFToGPU/parallel_loop.mlir
+++ b/mlir/test/Conversion/SCFToGPU/parallel_loop.mlir
@@ -201,37 +201,37 @@ func.func @parallel_loop_tiled_seq(%arg0 : index, %arg1 : index, %arg2 : index,
 #map2 = affine_map<(d0)[s0] -> (3, -d0 + s0)>
 
 module {
-  func.func @sum(%arg0: memref<?x?xf32, strided<[?, 1], offset: ?>>, %arg1: memref<?x?xf32, strided<[?, 1], offset: ?>>, %arg2: memref<?x?xf32, strided<[?, 1], offset: ?>>) {
+  func.func @sum(%arg0: memref<?x?xf32, strided<[?, 1]>>, %arg1: memref<?x?xf32, strided<[?, 1]>>, %arg2: memref<?x?xf32, strided<[?, 1]>>) {
     %c1 = arith.constant 1 : index
     %c0 = arith.constant 0 : index
     %c3 = arith.constant 3 : index
     %c2 = arith.constant 2 : index
-    %0 = memref.dim %arg0, %c0 : memref<?x?xf32, strided<[?, 1], offset: ?>>
-    %1 = memref.dim %arg0, %c1 : memref<?x?xf32, strided<[?, 1], offset: ?>>
+    %0 = memref.dim %arg0, %c0 : memref<?x?xf32, strided<[?, 1]>>
+    %1 = memref.dim %arg0, %c1 : memref<?x?xf32, strided<[?, 1]>>
     scf.parallel (%arg3, %arg4) = (%c0, %c0) to (%0, %1) step (%c2, %c3) {
-      %2 = memref.dim %arg0, %c0 : memref<?x?xf32, strided<[?, 1], offset: ?>>
+      %2 = memref.dim %arg0, %c0 : memref<?x?xf32, strided<[?, 1]>>
       %3 = affine.min #map1(%arg3)[%2]
       %squared_min = arith.muli %3, %3 : index
-      %4 = memref.dim %arg0, %c1 : memref<?x?xf32, strided<[?, 1], offset: ?>>
+      %4 = memref.dim %arg0, %c1 : memref<?x?xf32, strided<[?, 1]>>
       %d = arith.subi %4, %arg4 : index
       %5 = arith.minsi %c3, %d : index
-      %6 = memref.subview %arg0[%arg3, %arg4][%squared_min, %5][%c1, %c1] : memref<?x?xf32, strided<[?, 1], offset: ?>> to memref<?x?xf32, strided<[?, ?], offset: ?>>
-      %7 = memref.dim %arg1, %c0 : memref<?x?xf32, strided<[?, 1], offset: ?>>
+      %6 = memref.subview %arg0[%arg3, %arg4][%squared_min, %5][%c1, %c1] : memref<?x?xf32, strided<[?, 1]>> to memref<?x?xf32, strided<[?, ?]>>
+      %7 = memref.dim %arg1, %c0 : memref<?x?xf32, strided<[?, 1]>>
       %8 = affine.min #map1(%arg3)[%7]
-      %9 = memref.dim %arg1, %c1 : memref<?x?xf32, strided<[?, 1], offset: ?>>
+      %9 = memref.dim %arg1, %c1 : memref<?x?xf32, strided<[?, 1]>>
       %10 = affine.min #map2(%arg4)[%9]
-      %11 = memref.subview %arg1[%arg3, %arg4][%8, %10][%c1, %c1] : memref<?x?xf32, strided<[?, 1], offset: ?>> to memref<?x?xf32, strided<[?, ?], offset: ?>>
-      %12 = memref.dim %arg2, %c0 : memref<?x?xf32, strided<[?, 1], offset: ?>>
+      %11 = memref.subview %arg1[%arg3, %arg4][%8, %10][%c1, %c1] : memref<?x?xf32, strided<[?, 1]>> to memref<?x?xf32, strided<[?, ?]>>
+      %12 = memref.dim %arg2, %c0 : memref<?x?xf32, strided<[?, 1]>>
       %13 = affine.min #map1(%arg3)[%12]
-      %14 = memref.dim %arg2, %c1 : memref<?x?xf32, strided<[?, 1], offset: ?>>
+      %14 = memref.dim %arg2, %c1 : memref<?x?xf32, strided<[?, 1]>>
       %15 = affine.min #map2(%arg4)[%14]
-      %16 = memref.subview %arg2[%arg3, %arg4][%13, %15][%c1, %c1] : memref<?x?xf32, strided<[?, 1], offset: ?>> to memref<?x?xf32, strided<[?, ?], offset: ?>>
+      %16 = memref.subview %arg2[%arg3, %arg4][%13, %15][%c1, %c1] : memref<?x?xf32, strided<[?, 1]>> to memref<?x?xf32, strided<[?, ?]>>
       scf.parallel (%arg5, %arg6) = (%c0, %c0) to (%squared_min, %5) step (%c1, %c1) {
-        %17 = memref.load %6[%arg5, %arg6] : memref<?x?xf32, strided<[?, ?], offset: ?>>
-        %18 = memref.load %11[%arg5, %arg6] : memref<?x?xf32, strided<[?, ?], offset: ?>>
-        %19 = memref.load %16[%arg5, %arg6] : memref<?x?xf32, strided<[?, ?], offset: ?>>
+        %17 = memref.load %6[%arg5, %arg6] : memref<?x?xf32, strided<[?, ?]>>
+        %18 = memref.load %11[%arg5, %arg6] : memref<?x?xf32, strided<[?, ?]>>
+        %19 = memref.load %16[%arg5, %arg6] : memref<?x?xf32, strided<[?, ?]>>
         %20 = arith.addf %17, %18 : f32
-        memref.store %20, %16[%arg5, %arg6] : memref<?x?xf32, strided<[?, ?], offset: ?>>
+        memref.store %20, %16[%arg5, %arg6] : memref<?x?xf32, strided<[?, ?]>>
         scf.reduce
       } {mapping = [#gpu.loop_dim_map<bound = (d0) -> (d0), map = (d0) -> (d0), processor = thread_x>, #gpu.loop_dim_map<bound = (d0) -> (d0), map = (d0) -> (d0), processor = thread_y>]}
       scf.reduce
@@ -247,13 +247,13 @@ module {
 
 // CHECK:       module {
 // CHECK-LABEL:   func @sum(
-// CHECK-SAME:              [[VAL_0:%.*]]: memref<?x?xf32, strided<[?, 1], offset: ?>>, [[VAL_1:%.*]]: memref<?x?xf32, strided<[?, 1], offset: ?>>, [[VAL_2:%.*]]: memref<?x?xf32, strided<[?, 1], offset: ?>>) {
+// CHECK-SAME:              [[VAL_0:%.*]]: memref<?x?xf32, strided<[?, 1]>>, [[VAL_1:%.*]]: memref<?x?xf32, strided<[?, 1]>>, [[VAL_2:%.*]]: memref<?x?xf32, strided<[?, 1]>>) {
 // CHECK:           %[[C1:.*]] = arith.constant 1 : index
 // CHECK:           %[[C0:.*]] = arith.constant 0 : index
 // CHECK:           %[[C3:.*]] = arith.constant 3 : index
 // CHECK:           %[[C2:.*]] = arith.constant 2 : index
-// CHECK:           [[VAL_7:%.*]] = memref.dim [[VAL_0]], %[[C0]] : memref<?x?xf32, strided<[?, 1], offset: ?>>
-// CHECK:           [[VAL_8:%.*]] = memref.dim [[VAL_0]], %[[C1]] : memref<?x?xf32, strided<[?, 1], offset: ?>>
+// CHECK:           [[VAL_7:%.*]] = memref.dim [[VAL_0]], %[[C0]] : memref<?x?xf32, strided<[?, 1]>>
+// CHECK:           [[VAL_8:%.*]] = memref.dim [[VAL_0]], %[[C1]] : memref<?x?xf32, strided<[?, 1]>>
 // CHECK:           [[VAL_9:%.*]] = arith.constant 1 : index
 // CHECK:           [[VAL_10:%.*]] = affine.apply #[[$MAP1]]([[VAL_7]]){{\[}}%[[C0]], %[[C2]]]
 // CHECK:           [[VAL_11:%.*]] = affine.apply #[[$MAP1]]([[VAL_8]]){{\[}}%[[C0]], %[[C3]]]
@@ -263,34 +263,34 @@ module {
 // CHECK:           gpu.launch blocks([[VAL_16:%.*]], [[VAL_17:%.*]], [[VAL_18:%.*]]) in ([[VAL_19:%.*]] = [[VAL_10]], [[VAL_20:%.*]] = [[VAL_11]], [[VAL_21:%.*]] = [[VAL_9]]) threads([[VAL_22:%.*]], [[VAL_23:%.*]], [[VAL_24:%.*]]) in ([[VAL_25:%.*]] = [[VAL_13]], [[VAL_26:%.*]] = [[VAL_15]], [[VAL_27:%.*]] = [[VAL_9]]) {
 // CHECK:             [[VAL_28:%.*]] = affine.apply #[[$MAP2]]([[VAL_16]]){{\[}}%[[C2]], %[[C0]]]
 // CHECK:             [[VAL_29:%.*]] = affine.apply #[[$MAP2]]([[VAL_17]]){{\[}}%[[C3]], %[[C0]]]
-// CHECK:             [[VAL_30:%.*]] = memref.dim [[VAL_0]], %[[C0]] : memref<?x?xf32, strided<[?, 1], offset: ?>>
+// CHECK:             [[VAL_30:%.*]] = memref.dim [[VAL_0]], %[[C0]] : memref<?x?xf32, strided<[?, 1]>>
 // CHECK:             [[VAL_31:%.*]] = affine.min #[[$MAP3]]([[VAL_28]]){{\[}}[[VAL_30]]]
 // CHECK:             [[VAL_31_SQUARED:%.*]] = arith.muli [[VAL_31]], [[VAL_31]] : index
-// CHECK:             [[VAL_32:%.*]] = memref.dim [[VAL_0]], %[[C1]] : memref<?x?xf32, strided<[?, 1], offset: ?>>
+// CHECK:             [[VAL_32:%.*]] = memref.dim [[VAL_0]], %[[C1]] : memref<?x?xf32, strided<[?, 1]>>
 // CHECK:             [[VAL_D:%.*]] = arith.subi [[VAL_32]], [[VAL_29]] : index
 // CHECK:             [[VAL_33:%.*]] = arith.minsi %[[C3]], [[VAL_D]] : index
-// CHECK:             [[VAL_34:%.*]] = memref.subview [[VAL_0]]{{\[}}[[VAL_28]], [[VAL_29]]] {{\[}}[[VAL_31_SQUARED]], [[VAL_33]]] {{\[}}%[[C1]], %[[C1]]] : memref<?x?xf32, strided<[?, 1], offset: ?>> to memref<?x?xf32, strided<[?, ?], offset: ?>>
-// CHECK:             [[VAL_35:%.*]] = memref.dim [[VAL_1]], %[[C0]] : memref<?x?xf32, strided<[?, 1], offset: ?>>
+// CHECK:             [[VAL_34:%.*]] = memref.subview [[VAL_0]]{{\[}}[[VAL_28]], [[VAL_29]]] {{\[}}[[VAL_31_SQUARED]], [[VAL_33]]] {{\[}}%[[C1]], %[[C1]]] : memref<?x?xf32, strided<[?, 1]>> to memref<?x?xf32, strided<[?, ?]>>
+// CHECK:             [[VAL_35:%.*]] = memref.dim [[VAL_1]], %[[C0]] : memref<?x?xf32, strided<[?, 1]>>
 // CHECK:             [[VAL_36:%.*]] = affine.min #[[$MAP3]]([[VAL_28]]){{\[}}[[VAL_35]]]
-// CHECK:             [[VAL_37:%.*]] = memref.dim [[VAL_1]], %[[C1]] : memref<?x?xf32, strided<[?, 1], offset: ?>>
+// CHECK:             [[VAL_37:%.*]] = memref.dim [[VAL_1]], %[[C1]] : memref<?x?xf32, strided<[?, 1]>>
 // CHECK:             [[VAL_38:%.*]] = affine.min #[[$MAP4]]([[VAL_29]]){{\[}}[[VAL_37]]]
-// CHECK:             [[VAL_39:%.*]] = memref.subview [[VAL_1]]{{\[}}[[VAL_28]], [[VAL_29]]] {{\[}}[[VAL_36]], [[VAL_38]]] {{\[}}%[[C1]], %[[C1]]] : memref<?x?xf32, strided<[?, 1], offset: ?>> to memref<?x?xf32, strided<[?, ?], offset: ?>>
-// CHECK:             [[VAL_40:%.*]] = memref.dim [[VAL_2]], %[[C0]] : memref<?x?xf32, strided<[?, 1], offset: ?>>
+// CHECK:             [[VAL_39:%.*]] = memref.subview [[VAL_1]]{{\[}}[[VAL_28]], [[VAL_29]]] {{\[}}[[VAL_36]], [[VAL_38]]] {{\[}}%[[C1]], %[[C1]]] : memref<?x?xf32, strided<[?, 1]>> to memref<?x?xf32, strided<[?, ?]>>
+// CHECK:             [[VAL_40:%.*]] = memref.dim [[VAL_2]], %[[C0]] : memref<?x?xf32, strided<[?, 1]>>
 // CHECK:             [[VAL_41:%.*]] = affine.min #[[$MAP3]]([[VAL_28]]){{\[}}[[VAL_40]]]
-// CHECK:             [[VAL_42:%.*]] = memref.dim [[VAL_2]], %[[C1]] : memref<?x?xf32, strided<[?, 1], offset: ?>>
+// CHECK:             [[VAL_42:%.*]] = memref.dim [[VAL_2]], %[[C1]] : memref<?x?xf32, strided<[?, 1]>>
 // CHECK:             [[VAL_43:%.*]] = affine.min #[[$MAP4]]([[VAL_29]]){{\[}}[[VAL_42]]]
-// CHECK:             [[VAL_44:%.*]] = memref.subview [[VAL_2]]{{\[}}[[VAL_28]], [[VAL_29]]] {{\[}}[[VAL_41]], [[VAL_43]]] {{\[}}%[[C1]], %[[C1]]] : memref<?x?xf32, strided<[?, 1], offset: ?>> to memref<?x?xf32, strided<[?, ?], offset: ?>>
+// CHECK:             [[VAL_44:%.*]] = memref.subview [[VAL_2]]{{\[}}[[VAL_28]], [[VAL_29]]] {{\[}}[[VAL_41]], [[VAL_43]]] {{\[}}%[[C1]], %[[C1]]] : memref<?x?xf32, strided<[?, 1]>> to memref<?x?xf32, strided<[?, ?]>>
 // CHECK:             [[VAL_45:%.*]] = affine.apply #[[$MAP2]]([[VAL_22]]){{\[}}%[[C1]], %[[C0]]]
 // CHECK:             [[VAL_46:%.*]] = arith.cmpi slt, [[VAL_45]], [[VAL_31_SQUARED]] : index
 // CHECK:             scf.if [[VAL_46]] {
 // CHECK:               [[VAL_47:%.*]] = affine.apply #[[$MAP2]]([[VAL_23]]){{\[}}%[[C1]], %[[C0]]]
 // CHECK:               [[VAL_48:%.*]] = arith.cmpi slt, [[VAL_47]], [[VAL_33]] : index
 // CHECK:               scf.if [[VAL_48]] {
-// CHECK:                 [[VAL_49:%.*]] = memref.load [[VAL_34]]{{\[}}[[VAL_45]], [[VAL_47]]] : memref<?x?xf32, strided<[?, ?], offset: ?>>
-// CHECK:                 [[VAL_50:%.*]] = memref.load [[VAL_39]]{{\[}}[[VAL_45]], [[VAL_47]]] : memref<?x?xf32, strided<[?, ?], offset: ?>>
-// CHECK:                 [[VAL_51:%.*]] = memref.load [[VAL_44]]{{\[}}[[VAL_45]], [[VAL_47]]] : memref<?x?xf32, strided<[?, ?], offset: ?>>
+// CHECK:                 [[VAL_49:%.*]] = memref.load [[VAL_34]]{{\[}}[[VAL_45]], [[VAL_47]]] : memref<?x?xf32, strided<[?, ?]>>
+// CHECK:                 [[VAL_50:%.*]] = memref.load [[VAL_39]]{{\[}}[[VAL_45]], [[VAL_47]]] : memref<?x?xf32, strided<[?, ?]>>
+// CHECK:                 [[VAL_51:%.*]] = memref.load [[VAL_44]]{{\[}}[[VAL_45]], [[VAL_47]]] : memref<?x?xf32, strided<[?, ?]>>
 // CHECK:                 [[VAL_52:%.*]] = arith.addf [[VAL_49]], [[VAL_50]] : f32
-// CHECK:                 memref.store [[VAL_52]], [[VAL_44]]{{\[}}[[VAL_45]], [[VAL_47]]] : memref<?x?xf32, strided<[?, ?], offset: ?>>
+// CHECK:                 memref.store [[VAL_52]], [[VAL_44]]{{\[}}[[VAL_45]], [[VAL_47]]] : memref<?x?xf32, strided<[?, ?]>>
 // CHECK:               }
 // CHECK:             }
 // CHECK:             gpu.terminator
@@ -537,18 +537,18 @@ func.func @parallel_reduction_1d_tiled() {
   %alloc_0 = memref.alloc() : memref<8192xf32>
   %alloc_1 = memref.alloc() : memref<64xf32>
   scf.parallel (%arg1) = (%c0) to (%c64) step (%c1) {
-    %subview = memref.subview %alloc_1[%arg1] [1] [1] : memref<64xf32> to memref<f32, strided<[], offset: ?>>
+    %subview = memref.subview %alloc_1[%arg1] [1] [1] : memref<64xf32> to memref<f32, strided<[]>>
     %0 = affine.apply affine_map<(d0) -> (d0 * 128)>(%arg1)
-    %subview_1 = memref.subview %alloc_0[%0] [128] [1] : memref<8192xf32> to memref<128xf32, strided<[1], offset: ?>>
+    %subview_1 = memref.subview %alloc_0[%0] [128] [1] : memref<8192xf32> to memref<128xf32, strided<[1]>>
     %1 = scf.parallel (%arg2) = (%c0) to (%c128) step (%c1) init (%cst) -> f32 {
-      %2 = memref.load %subview_1[%arg2] : memref<128xf32, strided<[1], offset: ?>>
+      %2 = memref.load %subview_1[%arg2] : memref<128xf32, strided<[1]>>
       scf.reduce(%2 : f32) {
       ^bb0(%arg3: f32, %arg4: f32):
         %3 = arith.addf %arg3, %arg4 : f32
         scf.reduce.return %3 : f32
       }
     } {mapping = [#gpu.loop_dim_map<processor = thread_x, map = (d0) -> (d0), bound = (d0) -> (d0)>]}
-    memref.store %1, %subview[] : memref<f32, strided<[], offset: ?>>
+    memref.store %1, %subview[] : memref<f32, strided<[]>>
     scf.reduce 
   } {mapping = [#gpu.loop_dim_map<processor = block_x, map = (d0) -> (d0), bound = (d0) -> (d0)>]}
   memref.dealloc %alloc_0 : memref<8192xf32>
@@ -568,13 +568,13 @@ func.func @parallel_reduction_1d_tiled() {
 // CHECK-NEXT: %[[dim1:.*]] = affine.apply #map2(%[[dim0]])
 // CHECK-NEXT: %[[tile:.*]] = memref.subview %[[alloc_0]][%[[dim1]]] [128] [1] : memref<8192xf32>
 // CHECK-NEXT: %[[dim2:.*]] = affine.apply #map1(%[[arg_3]])[{{.*}}, {{.*}}]
-// CHECK-NEXT: %[[src:.*]] = memref.load %[[tile]][%[[dim2]]] : memref<128xf32, strided<[1], offset: ?>>
+// CHECK-NEXT: %[[src:.*]] = memref.load %[[tile]][%[[dim2]]] : memref<128xf32, strided<[1]>>
 // CHECK-NEXT: %[[res:.*]] = gpu.all_reduce %[[src]] {
 // CHECK-NEXT: ^bb0(%[[arg12:.*]]: f32, %[[arg13:.*]]: f32):
 // CHECK-NEXT: %[[sum:.*]] = arith.addf %[[arg12]], %[[arg13]] : f32
 // CHECK-NEXT: gpu.yield %[[sum]] : f32
 // CHECK-NEXT: } : (f32) -> f32
-// CHECK-NEXT: memref.store %[[res]], %[[dst]][] : memref<f32, strided<[], offset: ?>>
+// CHECK-NEXT: memref.store %[[res]], %[[dst]][] : memref<f32, strided<[]>>
 
 // -----
 
diff --git a/mlir/test/Conversion/ShardToMPI/convert-shard-to-mpi.mlir b/mlir/test/Conversion/ShardToMPI/convert-shard-to-mpi.mlir
index 062f05b5c5e13..f7292f417ab3d 100644
--- a/mlir/test/Conversion/ShardToMPI/convert-shard-to-mpi.mlir
+++ b/mlir/test/Conversion/ShardToMPI/convert-shard-to-mpi.mlir
@@ -302,8 +302,8 @@ module attributes { mpi.dlti = #dlti.map<"MPI:comm_world_rank" = 1> } {
     // CHECK-DAG: [[vc2_i32:%.*]] = arith.constant 2 : i32
     // CHECK: [[v0:%.*]] = mpi.comm_world : !mpi.comm
     // CHECK: [[valloc:%.*]] = memref.alloc() : memref<2x120x120xi8>
-    // CHECK: [[vsubview:%.*]] = memref.subview [[varg0]][118, 0, 0] [2, 120, 120] [1, 1, 1] : memref<120x120x120xi8> to memref<2x120x120xi8, strided<[14400, 120, 1], offset: 1699200>>
-    // CHECK: memref.copy [[vsubview]], [[valloc]] : memref<2x120x120xi8, strided<[14400, 120, 1], offset: 1699200>> to memref<2x120x120xi8>
+    // CHECK: [[vsubview:%.*]] = memref.subview [[varg0]][118, 0, 0] [2, 120, 120] [1, 1, 1] : memref<120x120x120xi8> to memref<2x120x120xi8, strided<[14400, 120, 1]>>
+    // CHECK: memref.copy [[vsubview]], [[valloc]] : memref<2x120x120xi8, strided<[14400, 120, 1]>> to memref<2x120x120xi8>
     // CHECK: mpi.send([[valloc]], [[vc91_i32]], [[vc2_i32]], [[v0]]) : memref<2x120x120xi8>, i32, i32
     // CHECK: mpi.recv([[valloc]], [[vc91_i32]], [[vc0_i32]], [[v0]]) : memref<2x120x120xi8>, i32, i32
     // CHECK: [[vsubview_0:%.*]] = memref.subview [[varg0]][0, 0, 0] [2, 120, 120] [1, 1, 1] : memref<120x120x120xi8> to memref<2x120x120xi8, strided<[14400, 120, 1]>>
@@ -329,31 +329,31 @@ module attributes { mpi.dlti = #dlti.map<"MPI:comm_world_rank" = 24> } {
     // CHECK-DAG: [[vc44_i32:%.*]] = arith.constant 44 : i32
     // CHECK: [[v0:%.*]] = mpi.comm_world : !mpi.comm
     // CHECK: [[valloc:%.*]] = memref.alloc() : memref<117x113x5xi8>
-    // CHECK: [[vsubview:%.*]] = memref.subview [[varg0]][1, 3, 109] [117, 113, 5] [1, 1, 1] : memref<120x120x120xi8> to memref<117x113x5xi8, strided<[14400, 120, 1], offset: 14869>>
-    // CHECK: memref.copy [[vsubview]], [[valloc]] : memref<117x113x5xi8, strided<[14400, 120, 1], offset: 14869>> to memref<117x113x5xi8>
+    // CHECK: [[vsubview:%.*]] = memref.subview [[varg0]][1, 3, 109] [117, 113, 5] [1, 1, 1] : memref<120x120x120xi8> to memref<117x113x5xi8, strided<[14400, 120, 1]>>
+    // CHECK: memref.copy [[vsubview]], [[valloc]] : memref<117x113x5xi8, strided<[14400, 120, 1]>> to memref<117x113x5xi8>
     // CHECK: mpi.send([[valloc]], [[vc91_i32]], [[vc44_i32]], [[v0]]) : memref<117x113x5xi8>, i32, i32
     // CHECK: mpi.recv([[valloc]], [[vc91_i32]], [[vc4_i32]], [[v0]]) : memref<117x113x5xi8>, i32, i32
-    // CHECK: [[vsubview_0:%.*]] = memref.subview [[varg0]][1, 3, 0] [117, 113, 5] [1, 1, 1] : memref<120x120x120xi8> to memref<117x113x5xi8, strided<[14400, 120, 1], offset: 14760>>
-    // CHECK: memref.copy [[valloc]], [[vsubview_0]] : memref<117x113x5xi8> to memref<117x113x5xi8, strided<[14400, 120, 1], offset: 14760>>
+    // CHECK: [[vsubview_0:%.*]] = memref.subview [[varg0]][1, 3, 0] [117, 113, 5] [1, 1, 1] : memref<120x120x120xi8> to memref<117x113x5xi8, strided<[14400, 120, 1]>>
+    // CHECK: memref.copy [[valloc]], [[vsubview_0]] : memref<117x113x5xi8> to memref<117x113x5xi8, strided<[14400, 120, 1]>>
     // CHECK: memref.dealloc [[valloc]] : memref<117x113x5xi8>
     // CHECK: [[valloc_1:%.*]] = memref.alloc() : memref<117x113x6xi8>
-    // CHECK: [[vsubview_2:%.*]] = memref.subview [[varg0]][1, 3, 5] [117, 113, 6] [1, 1, 1] : memref<120x120x120xi8> to memref<117x113x6xi8, strided<[14400, 120, 1], offset: 14765>>
-    // CHECK: memref.copy [[vsubview_2]], [[valloc_1]] : memref<117x113x6xi8, strided<[14400, 120, 1], offset: 14765>> to memref<117x113x6xi8>
+    // CHECK: [[vsubview_2:%.*]] = memref.subview [[varg0]][1, 3, 5] [117, 113, 6] [1, 1, 1] : memref<120x120x120xi8> to memref<117x113x6xi8, strided<[14400, 120, 1]>>
+    // CHECK: memref.copy [[vsubview_2]], [[valloc_1]] : memref<117x113x6xi8, strided<[14400, 120, 1]>> to memref<117x113x6xi8>
     // CHECK: mpi.send([[valloc_1]], [[vc91_i32]], [[vc4_i32]], [[v0]]) : memref<117x113x6xi8>, i32, i32
     // CHECK: mpi.recv([[valloc_1]], [[vc91_i32]], [[vc44_i32]], [[v0]]) : memref<117x113x6xi8>, i32, i32
-    // CHECK: [[vsubview_3:%.*]] = memref.subview [[varg0]][1, 3, 114] [117, 113, 6] [1, 1, 1] : memref<120x120x120xi8> to memref<117x113x6xi8, strided<[14400, 120, 1], offset: 14874>>
-    // CHECK: memref.copy [[valloc_1]], [[vsubview_3]] : memref<117x113x6xi8> to memref<117x113x6xi8, strided<[14400, 120, 1], offset: 14874>>
+    // CHECK: [[vsubview_3:%.*]] = memref.subview [[varg0]][1, 3, 114] [117, 113, 6] [1, 1, 1] : memref<120x120x120xi8> to memref<117x113x6xi8, strided<[14400, 120, 1]>>
+    // CHECK: memref.copy [[valloc_1]], [[vsubview_3]] : memref<117x113x6xi8> to memref<117x113x6xi8, strided<[14400, 120, 1]>>
     // CHECK: memref.dealloc [[valloc_1]] : memref<117x113x6xi8>
     // CHECK: [[v1:%.*]] = mpi.comm_world : !mpi.comm
     // CHECK: [[valloc_4:%.*]] = memref.alloc() : memref<117x3x120xi8>
-    // CHECK: [[vsubview_5:%.*]] = memref.subview [[varg0]][1, 113, 0] [117, 3, 120] [1, 1, 1] : memref<120x120x120xi8> to memref<117x3x120xi8, strided<[14400, 120, 1], offset: 27960>>
-    // CHECK: memref.copy [[vsubview_5]], [[valloc_4]] : memref<117x3x120xi8, strided<[14400, 120, 1], offset: 27960>> to memref<117x3x120xi8>
+    // CHECK: [[vsubview_5:%.*]] = memref.subview [[varg0]][1, 113, 0] [117, 3, 120] [1, 1, 1] : memref<120x120x120xi8> to memref<117x3x120xi8, strided<[14400, 120, 1]>>
+    // CHECK: memref.copy [[vsubview_5]], [[valloc_4]] : memref<117x3x120xi8, strided<[14400, 120, 1]>> to memref<117x3x120xi8>
     // CHECK: mpi.send([[valloc_4]], [[vc91_i32]], [[vc29_i32]], [[v1]]) : memref<117x3x120xi8>, i32, i32
     // CHECK: memref.dealloc [[valloc_4]] : memref<117x3x120xi8>
     // CHECK: [[valloc_6:%.*]] = memref.alloc() : memref<117x4x120xi8>
     // CHECK: mpi.recv([[valloc_6]], [[vc91_i32]], [[vc29_i32]], [[v1]]) : memref<117x4x120xi8>, i32, i32
-    // CHECK: [[vsubview_7:%.*]] = memref.subview [[varg0]][1, 116, 0] [117, 4, 120] [1, 1, 1] : memref<120x120x120xi8> to memref<117x4x120xi8, strided<[14400, 120, 1], offset: 28320>>
-    // CHECK: memref.copy [[valloc_6]], [[vsubview_7]] : memref<117x4x120xi8> to memref<117x4x120xi8, strided<[14400, 120, 1], offset: 28320>>
+    // CHECK: [[vsubview_7:%.*]] = memref.subview [[varg0]][1, 116, 0] [117, 4, 120] [1, 1, 1] : memref<120x120x120xi8> to memref<117x4x120xi8, strided<[14400, 120, 1]>>
+    // CHECK: memref.copy [[valloc_6]], [[vsubview_7]] : memref<117x4x120xi8> to memref<117x4x120xi8, strided<[14400, 120, 1]>>
     // CHECK: memref.dealloc [[valloc_6]] : memref<117x4x120xi8>
     // CHECK: [[v2:%.*]] = mpi.comm_world : !mpi.comm
     // CHECK: [[valloc_8:%.*]] = memref.alloc() : memref<1x120x120xi8>
@@ -362,8 +362,8 @@ module attributes { mpi.dlti = #dlti.map<"MPI:comm_world_rank" = 24> } {
     // CHECK: memref.copy [[valloc_8]], [[vsubview_9]] : memref<1x120x120xi8> to memref<1x120x120xi8, strided<[14400, 120, 1]>>
     // CHECK: memref.dealloc [[valloc_8]] : memref<1x120x120xi8>
     // CHECK: [[valloc_10:%.*]] = memref.alloc() : memref<2x120x120xi8>
-    // CHECK: [[vsubview_11:%.*]] = memref.subview [[varg0]][1, 0, 0] [2, 120, 120] [1, 1, 1] : memref<120x120x120xi8> to memref<2x120x120xi8, strided<[14400, 120, 1], offset: 14400>>
-    // CHECK: memref.copy [[vsubview_11]], [[valloc_10]] : memref<2x120x120xi8, strided<[14400, 120, 1], offset: 14400>> to memref<2x120x120xi8>
+    // CHECK: [[vsubview_11:%.*]] = memref.subview [[varg0]][1, 0, 0] [2, 120, 120] [1, 1, 1] : memref<120x120x120xi8> to memref<2x120x120xi8, strided<[14400, 120, 1]>>
+    // CHECK: memref.copy [[vsubview_11]], [[valloc_10]] : memref<2x120x120xi8, strided<[14400, 120, 1]>> to memref<2x120x120xi8>
     // CHECK: mpi.send([[valloc_10]], [[vc91_i32]], [[vc23_i32]], [[v2]]) : memref<2x120x120xi8>, i32, i32
     // CHECK: memref.dealloc [[valloc_10]] : memref<2x120x120xi8>
     %res = shard.update_halo %arg0 on @grid0 split_axes = [[2], [1], [0]] halo_sizes = [1, 2, 3, 4, 5, 6] : memref<120x120x120xi8>
@@ -383,31 +383,31 @@ module attributes { mpi.dlti = #dlti.map<"MPI:comm_world_rank" = 24> } {
     // CHECK: [[v0:%.*]] = bufferization.to_buffer [[varg0]] : tensor<120x120x120xi8> to memref<120x120x120xi8>
     // CHECK: [[v1:%.*]] = mpi.comm_world : !mpi.comm
     // CHECK: [[valloc:%.*]] = memref.alloc() : memref<117x113x5xi8>
-    // CHECK: [[vsubview:%.*]] = memref.subview [[v0]][1, 3, 109] [117, 113, 5] [1, 1, 1] : memref<120x120x120xi8> to memref<117x113x5xi8, strided<[14400, 120, 1], offset: 14869>>
-    // CHECK: memref.copy [[vsubview]], [[valloc]] : memref<117x113x5xi8, strided<[14400, 120, 1], offset: 14869>> to memref<117x113x5xi8>
+    // CHECK: [[vsubview:%.*]] = memref.subview [[v0]][1, 3, 109] [117, 113, 5] [1, 1, 1] : memref<120x120x120xi8> to memref<117x113x5xi8, strided<[14400, 120, 1]>>
+    // CHECK: memref.copy [[vsubview]], [[valloc]] : memref<117x113x5xi8, strided<[14400, 120, 1]>> to memref<117x113x5xi8>
     // CHECK: mpi.send([[valloc]], [[vc91_i32]], [[vc44_i32]], [[v1]]) : memref<117x113x5xi8>, i32, i32
     // CHECK: mpi.recv([[valloc]], [[vc91_i32]], [[vc4_i32]], [[v1]]) : memref<117x113x5xi8>, i32, i32
-    // CHECK: [[vsubview_0:%.*]] = memref.subview [[v0]][1, 3, 0] [117, 113, 5] [1, 1, 1] : memref<120x120x120xi8> to memref<117x113x5xi8, strided<[14400, 120, 1], offset: 14760>>
-    // CHECK: memref.copy [[valloc]], [[vsubview_0]] : memref<117x113x5xi8> to memref<117x113x5xi8, strided<[14400, 120, 1], offset: 14760>>
+    // CHECK: [[vsubview_0:%.*]] = memref.subview [[v0]][1, 3, 0] [117, 113, 5] [1, 1, 1] : memref<120x120x120xi8> to memref<117x113x5xi8, strided<[14400, 120, 1]>>
+    // CHECK: memref.copy [[valloc]], [[vsubview_0]] : memref<117x113x5xi8> to memref<117x113x5xi8, strided<[14400, 120, 1]>>
     // CHECK: memref.dealloc [[valloc]] : memref<117x113x5xi8>
     // CHECK: [[valloc_1:%.*]] = memref.alloc() : memref<117x113x6xi8>
-    // CHECK: [[vsubview_2:%.*]] = memref.subview [[v0]][1, 3, 5] [117, 113, 6] [1, 1, 1] : memref<120x120x120xi8> to memref<117x113x6xi8, strided<[14400, 120, 1], offset: 14765>>
-    // CHECK: memref.copy [[vsubview_2]], [[valloc_1]] : memref<117x113x6xi8, strided<[14400, 120, 1], offset: 14765>> to memref<117x113x6xi8>
+    // CHECK: [[vsubview_2:%.*]] = memref.subview [[v0]][1, 3, 5] [117, 113, 6] [1, 1, 1] : memref<120x120x120xi8> to memref<117x113x6xi8, strided<[14400, 120, 1]>>
+    // CHECK: memref.copy [[vsubview_2]], [[valloc_1]] : memref<117x113x6xi8, strided<[14400, 120, 1]>> to memref<117x113x6xi8>
     // CHECK: mpi.send([[valloc_1]], [[vc91_i32]], [[vc4_i32]], [[v1]]) : memref<117x113x6xi8>, i32, i32
     // CHECK: mpi.recv([[valloc_1]], [[vc91_i32]], [[vc44_i32]], [[v1]]) : memref<117x113x6xi8>, i32, i32
-    // CHECK: [[vsubview_3:%.*]] = memref.subview [[v0]][1, 3, 114] [117, 113, 6] [1, 1, 1] : memref<120x120x120xi8> to memref<117x113x6xi8, strided<[14400, 120, 1], offset: 14874>>
-    // CHECK: memref.copy [[valloc_1]], [[vsubview_3]] : memref<117x113x6xi8> to memref<117x113x6xi8, strided<[14400, 120, 1], offset: 14874>>
+    // CHECK: [[vsubview_3:%.*]] = memref.subview [[v0]][1, 3, 114] [117, 113, 6] [1, 1, 1] : memref<120x120x120xi8> to memref<117x113x6xi8, strided<[14400, 120, 1]>>
+    // CHECK: memref.copy [[valloc_1]], [[vsubview_3]] : memref<117x113x6xi8> to memref<117x113x6xi8, strided<[14400, 120, 1]>>
     // CHECK: memref.dealloc [[valloc_1]] : memref<117x113x6xi8>
     // CHECK: [[v2:%.*]] = mpi.comm_world : !mpi.comm
     // CHECK: [[valloc_4:%.*]] = memref.alloc() : memref<117x3x120xi8>
-    // CHECK: [[vsubview_5:%.*]] = memref.subview [[v0]][1, 113, 0] [117, 3, 120] [1, 1, 1] : memref<120x120x120xi8> to memref<117x3x120xi8, strided<[14400, 120, 1], offset: 27960>>
-    // CHECK: memref.copy [[vsubview_5]], [[valloc_4]] : memref<117x3x120xi8, strided<[14400, 120, 1], offset: 27960>> to memref<117x3x120xi8>
+    // CHECK: [[vsubview_5:%.*]] = memref.subview [[v0]][1, 113, 0] [117, 3, 120] [1, 1, 1] : memref<120x120x120xi8> to memref<117x3x120xi8, strided<[14400, 120, 1]>>
+    // CHECK: memref.copy [[vsubview_5]], [[valloc_4]] : memref<117x3x120xi8, strided<[14400, 120, 1]>> to memref<117x3x120xi8>
     // CHECK: mpi.send([[valloc_4]], [[vc91_i32]], [[vc29_i32]], [[v2]]) : memref<117x3x120xi8>, i32, i32
     // CHECK: memref.dealloc [[valloc_4]] : memref<117x3x120xi8>
     // CHECK: [[valloc_6:%.*]] = memref.alloc() : memref<117x4x120xi8>
     // CHECK: mpi.recv([[valloc_6]], [[vc91_i32]], [[vc29_i32]], [[v2]]) : memref<117x4x120xi8>, i32, i32
-    // CHECK: [[vsubview_7:%.*]] = memref.subview [[v0]][1, 116, 0] [117, 4, 120] [1, 1, 1] : memref<120x120x120xi8> to memref<117x4x120xi8, strided<[14400, 120, 1], offset: 28320>>
-    // CHECK: memref.copy [[valloc_6]], [[vsubview_7]] : memref<117x4x120xi8> to memref<117x4x120xi8, strided<[14400, 120, 1], offset: 28320>>
+    // CHECK: [[vsubview_7:%.*]] = memref.subview [[v0]][1, 116, 0] [117, 4, 120] [1, 1, 1] : memref<120x120x120xi8> to memref<117x4x120xi8, strided<[14400, 120, 1]>>
+    // CHECK: memref.copy [[valloc_6]], [[vsubview_7]] : memref<117x4x120xi8> to memref<117x4x120xi8, strided<[14400, 120, 1]>>
     // CHECK: memref.dealloc [[valloc_6]] : memref<117x4x120xi8>
     // CHECK: [[v3:%.*]] = mpi.comm_world : !mpi.comm
     // CHECK: [[valloc_8:%.*]] = memref.alloc() : memref<1x120x120xi8>
@@ -416,8 +416,8 @@ module attributes { mpi.dlti = #dlti.map<"MPI:comm_world_rank" = 24> } {
     // CHECK: memref.copy [[valloc_8]], [[vsubview_9]] : memref<1x120x120xi8> to memref<1x120x120xi8, strided<[14400, 120, 1]>>
     // CHECK: memref.dealloc [[valloc_8]] : memref<1x120x120xi8>
     // CHECK: [[valloc_10:%.*]] = memref.alloc() : memref<2x120x120xi8>
-    // CHECK: [[vsubview_11:%.*]] = memref.subview [[v0]][1, 0, 0] [2, 120, 120] [1, 1, 1] : memref<120x120x120xi8> to memref<2x120x120xi8, strided<[14400, 120, 1], offset: 14400>>
-    // CHECK: memref.copy [[vsubview_11]], [[valloc_10]] : memref<2x120x120xi8, strided<[14400, 120, 1], offset: 14400>> to memref<2x120x120xi8>
+    // CHECK: [[vsubview_11:%.*]] = memref.subview [[v0]][1, 0, 0] [2, 120, 120] [1, 1, 1] : memref<120x120x120xi8> to memref<2x120x120xi8, strided<[14400, 120, 1]>>
+    // CHECK: memref.copy [[vsubview_11]], [[valloc_10]] : memref<2x120x120xi8, strided<[14400, 120, 1]>> to memref<2x120x120xi8>
     // CHECK: mpi.send([[valloc_10]], [[vc91_i32]], [[vc23_i32]], [[v3]]) : memref<2x120x120xi8>, i32, i32
     // CHECK: memref.dealloc [[valloc_10]] : memref<2x120x120xi8>
     // CHECK: [[v4:%.*]] = bufferization.to_tensor [[v0]] restrict writable : memref<120x120x120xi8> to tensor<120x120x120xi8>
diff --git a/mlir/test/Conversion/VectorToGPU/vector-to-mma-ops-mma-sync.mlir b/mlir/test/Conversion/VectorToGPU/vector-to-mma-ops-mma-sync.mlir
index 912f7fba59e60..2c69fd2557744 100644
--- a/mlir/test/Conversion/VectorToGPU/vector-to-mma-ops-mma-sync.mlir
+++ b/mlir/test/Conversion/VectorToGPU/vector-to-mma-ops-mma-sync.mlir
@@ -721,7 +721,7 @@ func.func @m16n8k32_int8_row_col_row(%arg0: memref<128x128xi8, #gpu.address_spac
 #map1 = affine_map<(d0, d1, d2) -> (d0, d2)>
 #map2 = affine_map<(d0, d1, d2) -> (d1, d2)>
 #map3 = affine_map<(d0, d1, d2) -> (d0, d1)>
-!smem_type = memref<20x20xf16, strided<[?, 1], offset: ?>, #gpu.address_space<workgroup>>
+!smem_type = memref<20x20xf16, strided<[?, 1]>, #gpu.address_space<workgroup>>
 
 // This test case is identical to m16n8k16 test case, but it tests that having
 // n row dimension with unknown stride is handled correctly.
@@ -758,7 +758,7 @@ func.func @strided_memref_read_write(%arg0: !smem_type,
 #map1 = affine_map<(d0, d1, d2, d3) -> (d0, d1, d3)>
 #map2 = affine_map<(d0, d1, d2, d3) -> (d2, d0, d3)>
 #map3 = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2)>
-!smem_type = memref<20x20x20xf16, strided<[?, ?, 1], offset: ?>, #gpu.address_space<workgroup>>
+!smem_type = memref<20x20x20xf16, strided<[?, ?, 1]>, #gpu.address_space<workgroup>>
 
 // CHECK-LABEL: func @unsupported_non_2d_load_store
 func.func @unsupported_non_2d_load_store(%arg0: !smem_type,
@@ -786,7 +786,7 @@ func.func @unsupported_non_2d_load_store(%arg0: !smem_type,
 #map2 = affine_map<(d0, d1, d2) -> (d1, d2)>
 #map3 = affine_map<(d0, d1, d2) -> (d0, d1)>
 
-!smem_type = memref<20x20xf16, strided<[?, ?], offset: ?>, #gpu.address_space<workgroup>>
+!smem_type = memref<20x20xf16, strided<[?, ?]>, #gpu.address_space<workgroup>>
 
 // CHECK-LABEL: func @unsupported_fully_dynamic_strides
 func.func @unsupported_fully_dynamic_strides(%arg0: !smem_type,
@@ -815,7 +815,7 @@ func.func @unsupported_fully_dynamic_strides(%arg0: !smem_type,
 #map3 = affine_map<(d0, d1, d2) -> (d0, d1)>
 
 
-!smem_type = memref<20x20xf16, strided<[?, 1], offset: ?>, #gpu.address_space<workgroup>>
+!smem_type = memref<20x20xf16, strided<[?, 1]>, #gpu.address_space<workgroup>>
 
 // CHECK-LABEL: func @unsupported_transposed_store
 func.func @unsupported_transposed_store(%arg0: !smem_type,
diff --git a/mlir/test/Conversion/VectorToLLVM/vector-to-llvm-interface.mlir b/mlir/test/Conversion/VectorToLLVM/vector-to-llvm-interface.mlir
index d570d46e11b4a..00ed7f947b503 100644
--- a/mlir/test/Conversion/VectorToLLVM/vector-to-llvm-interface.mlir
+++ b/mlir/test/Conversion/VectorToLLVM/vector-to-llvm-interface.mlir
@@ -2053,10 +2053,10 @@ func.func @gather_with_alignment(%arg0: memref<?xf32>, %arg1: vector<3xi32>, %ar
 // -----
 
 // TODO: Implement this lowering.
-func.func @negative_gather_on_strided_memref(%arg0: memref<?xf32, strided<[2], offset: ?>>, %arg1: vector<3xi32>, %arg2: vector<3xi1>, %arg3: vector<3xf32>) -> vector<3xf32> {
+func.func @negative_gather_on_strided_memref(%arg0: memref<?xf32, strided<[2]>>, %arg1: vector<3xi32>, %arg2: vector<3xi1>, %arg3: vector<3xf32>) -> vector<3xf32> {
   %0 = arith.constant 0: index
   %1 = vector.gather %arg0[%0][%arg1], %arg2, %arg3
-    : memref<?xf32, strided<[2], offset: ?>>, vector<3xi32>, vector<3xi1>, vector<3xf32> into vector<3xf32>
+    : memref<?xf32, strided<[2]>>, vector<3xi32>, vector<3xi1>, vector<3xf32> into vector<3xf32>
   return %1 : vector<3xf32>
 }
 
@@ -2155,10 +2155,10 @@ func.func @scatter_with_alignment(%arg0: memref<?xf32>, %arg1: vector<3xi32>, %a
 // -----
 
 // TODO: Implement this lowering.
-func.func @negative_scatter_on_strided_memref(%arg0: memref<?xf32, strided<[2], offset: ?>>, %arg1: vector<3xi32>, %arg2: vector<3xi1>, %arg3: vector<3xf32>) {
+func.func @negative_scatter_on_strided_memref(%arg0: memref<?xf32, strided<[2]>>, %arg1: vector<3xi32>, %arg2: vector<3xi1>, %arg3: vector<3xf32>) {
   %0 = arith.constant 0: index
   vector.scatter %arg0[%0][%arg1], %arg2, %arg3
-    : memref<?xf32, strided<[2], offset: ?>>, vector<3xi32>, vector<3xi1>, vector<3xf32>
+    : memref<?xf32, strided<[2]>>, vector<3xi32>, vector<3xi1>, vector<3xf32>
   return
 }
 
diff --git a/mlir/test/Conversion/VectorToSCF/vector-to-scf.mlir b/mlir/test/Conversion/VectorToSCF/vector-to-scf.mlir
index 1ed82954398f0..855affaac7e00 100644
--- a/mlir/test/Conversion/VectorToSCF/vector-to-scf.mlir
+++ b/mlir/test/Conversion/VectorToSCF/vector-to-scf.mlir
@@ -515,10 +515,10 @@ func.func @transfer_read_with_tensor(%arg: tensor<f32>) -> vector<1xf32> {
 // -----
 
 // CHECK-LABEL: transfer_write_scalable
-func.func @transfer_write_scalable(%arg0: memref<?xf32, strided<[?], offset: ?>>, %arg1: f32) {
+func.func @transfer_write_scalable(%arg0: memref<?xf32, strided<[?]>>, %arg1: f32) {
   %0 = llvm.mlir.constant(0 : i32) : i32
   %c0 = arith.constant 0 : index
-  %dim = memref.dim %arg0, %c0 : memref<?xf32, strided<[?], offset: ?>>
+  %dim = memref.dim %arg0, %c0 : memref<?xf32, strided<[?]>>
   %1 = llvm.intr.stepvector : vector<[16]xi32>
   %2 = arith.index_cast %dim : index to i32
   %3 = llvm.mlir.undef : vector<[16]xi32>
@@ -528,11 +528,11 @@ func.func @transfer_write_scalable(%arg0: memref<?xf32, strided<[?], offset: ?>>
   %7 = llvm.mlir.undef : vector<[16]xf32>
   %8 = llvm.insertelement %arg1, %7[%0 : i32] : vector<[16]xf32>
   %9 = llvm.shufflevector %8, %7 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] : vector<[16]xf32>
-  vector.transfer_write %9, %arg0[%c0], %6 {in_bounds = [true]} : vector<[16]xf32>, memref<?xf32, strided<[?], offset: ?>>
+  vector.transfer_write %9, %arg0[%c0], %6 {in_bounds = [true]} : vector<[16]xf32>, memref<?xf32, strided<[?]>>
   return
 }
 
-// CHECK-SAME:      %[[ARG_0:.*]]: memref<?xf32, strided<[?], offset: ?>>,
+// CHECK-SAME:      %[[ARG_0:.*]]: memref<?xf32, strided<[?]>>,
 // CHECK-DAG:       %[[C_0:.*]] = arith.constant 0 : index
 // CHECK-DAG:       %[[C_16:.*]] = arith.constant 16 : index
 // CHECK-DAG:       %[[STEP:.*]] = arith.constant 1 : index
@@ -543,7 +543,7 @@ func.func @transfer_write_scalable(%arg0: memref<?xf32, strided<[?], offset: ?>>
 // CHECK:             %[[MASK_VAL:.*]] = vector.extract %[[MASK_VEC]][%[[IDX]]] : i1 from vector<[16]xi1>
 // CHECK:             scf.if %[[MASK_VAL]] {
 // CHECK:               %[[VAL_TO_STORE:.*]] = vector.extract %{{.*}}[%[[IDX]]] : f32 from vector<[16]xf32>
-// CHECK:               memref.store %[[VAL_TO_STORE]], %[[ARG_0]][%[[IDX]]] : memref<?xf32, strided<[?], offset: ?>>
+// CHECK:               memref.store %[[VAL_TO_STORE]], %[[ARG_0]][%[[IDX]]] : memref<?xf32, strided<[?]>>
 // CHECK:             } else {
 // CHECK:             }
 // CHECK:           }
diff --git a/mlir/test/Conversion/VectorToXeGPU/gather-to-xegpu.mlir b/mlir/test/Conversion/VectorToXeGPU/gather-to-xegpu.mlir
index 2a319869a7b06..14c4429109228 100644
--- a/mlir/test/Conversion/VectorToXeGPU/gather-to-xegpu.mlir
+++ b/mlir/test/Conversion/VectorToXeGPU/gather-to-xegpu.mlir
@@ -158,9 +158,9 @@ gpu.func @gather_from_subview(%source: memref<4096x4096xf16>,
                               %pass_thru: vector<8xf16>) -> vector<8xf16> {
   %subview = memref.subview %source[%memref_off, %memref_off] [256, 256] [1, 1]
       : memref<4096x4096xf16>
-        to memref<256x256xf16, strided<[4096, 1], offset: ?>>
+        to memref<256x256xf16, strided<[4096, 1]>>
   %0 = vector.gather %subview[%off1, %off2][%indices], %mask, %pass_thru
-       : memref<256x256xf16, strided<[4096, 1], offset: ?>>,
+       : memref<256x256xf16, strided<[4096, 1]>>,
          vector<8xindex>, vector<8xi1>, vector<8xf16>
          into vector<8xf16>
   gpu.return %0 : vector<8xf16>
@@ -172,13 +172,13 @@ gpu.func @gather_from_subview(%source: memref<4096x4096xf16>,
 // CHECK-SAME:   %[[MASK:.+]]: vector<8xi1>,
 // CHECK-SAME:   %[[PASS:.+]]: vector<8xf16>) -> vector<8xf16> {
 // CHECK:        %[[SUBVIEW:.+]] = memref.subview %[[SRC]][%[[MEMREF_OFF]], %[[MEMREF_OFF]]] [256, 256] [1, 1]
-// CHECK:        %[[BB:.+]], %[[OFFSET:.+]],{{.*}},{{.*}} = memref.extract_strided_metadata %[[SUBVIEW]] : memref<256x256xf16, strided<[4096, 1], offset: ?>> -> memref<f16>, index, index, index, index, index
+// CHECK:        %[[BB:.+]], %[[OFFSET:.+]],{{.*}},{{.*}} = memref.extract_strided_metadata %[[SUBVIEW]] : memref<256x256xf16, strided<[4096, 1]>> -> memref<f16>, index, index, index, index, index
 // CHECK:        arith.muli {{.*}}%[[OFF1]]{{.*}} : index
 // CHECK:        arith.addi %[[OFFSET]]{{.*}} : index
 // CHECK:        %[[BASE_OFF:.+]] = arith.addi {{.*}}%[[OFF2]]{{.*}} : index
 // CHECK:        %[[SPLAT:.+]] = vector.broadcast %[[BASE_OFF]] : index to vector<8xindex>
 // CHECK:        %[[LIN:.+]] = arith.addi %[[SPLAT]], %[[INDICES]] : vector<8xindex>
-// CHECK:        %[[BASE_IDX:.+]] = memref.extract_aligned_pointer_as_index %[[SUBVIEW]] : memref<256x256xf16, strided<[4096, 1], offset: ?>> -> index
+// CHECK:        %[[BASE_IDX:.+]] = memref.extract_aligned_pointer_as_index %[[SUBVIEW]] : memref<256x256xf16, strided<[4096, 1]>> -> index
 // CHECK:        %[[BASE_I64:.+]] = arith.index_cast %[[BASE_IDX]] : index to i64
 // CHECK:        %[[VEC:.+]] = xegpu.load %[[BASE_I64]]{{\[}}%[[LIN]]{{\]}}, %[[MASK]]
 // CHECK-SAME:     : i64, vector<8xindex>, vector<8xi1> -> vector<8xf16>
@@ -189,17 +189,17 @@ gpu.func @gather_from_subview(%source: memref<4096x4096xf16>,
 // -----
 gpu.module @xevm_module {
 gpu.func @non_unit_inner_stride_1D(
-    %source: memref<32xf32, strided<[?], offset: ?>>,
+    %source: memref<32xf32, strided<[?]>>,
     %off: index, %indices: vector<8xindex>, %mask: vector<8xi1>,
     %pass_thru: vector<8xf32>) -> vector<8xf32> {
   %0 = vector.gather %source[%off][%indices], %mask, %pass_thru
-       : memref<32xf32, strided<[?], offset: ?>>,
+       : memref<32xf32, strided<[?]>>,
          vector<8xindex>, vector<8xi1>, vector<8xf32>
          into vector<8xf32>
   gpu.return %0 : vector<8xf32>
 }
 // CHECK-LABEL:  @non_unit_inner_stride_1D(
-// CHECK-SAME:   %[[SRC:.+]]: memref<32xf32, strided<[?], offset: ?>>,
+// CHECK-SAME:   %[[SRC:.+]]: memref<32xf32, strided<[?]>>,
 // CHECK-SAME:   %[[OFF1:.+]]: index,
 // CHECK-SAME:   %[[INDICES:.+]]: vector<8xindex>,
 // CHECK-SAME:   %[[MASK:.+]]: vector<8xi1>, %[[PASS:.+]]: vector<8xf32>) -> vector<8xf32> {
@@ -210,7 +210,7 @@ gpu.func @non_unit_inner_stride_1D(
 // CHECK:        %[[STRD_INDICES:.+]] = arith.muli %[[STRD_VEC:.+]], %[[INDICES]] : vector<8xindex>
 // CHECK:        %[[SPLAT:.+]] = vector.broadcast {{.*}}:  index to vector<8xindex>
 // CHECK:        %[[LIN_IDX:.+]] = arith.addi %[[SPLAT]], %[[STRD_INDICES]] : vector<8xindex>
-// CHECK:        %[[BASE:.+]] = memref.extract_aligned_pointer_as_index %[[SRC]] : memref<32xf32, strided<[?], offset: ?>> -> index
+// CHECK:        %[[BASE:.+]] = memref.extract_aligned_pointer_as_index %[[SRC]] : memref<32xf32, strided<[?]>> -> index
 // CHECK:        %[[BASE_I64:.+]] = arith.index_cast %[[BASE]] : index to i64
 // CHECK:        %[[V:.+]] = xegpu.load %[[BASE_I64]]{{\[}}%[[LIN_IDX]]{{\]}}, %[[MASK]] : i64, vector<8xindex>, vector<8xi1> -> vector<8xf32>
 // CHECK:        %[[RES:.+]] = arith.select %[[MASK]], %[[V]], %[[PASS]] : vector<8xi1>, vector<8xf32>
@@ -220,18 +220,18 @@ gpu.func @non_unit_inner_stride_1D(
 // -----
 gpu.module @xevm_module {
 gpu.func @non_unit_inner_stride_3D(
-    %source: memref<4x8x32xf32, strided<[?, 128, 2], offset: ?>>,
+    %source: memref<4x8x32xf32, strided<[?, 128, 2]>>,
     %off0: index, %off1: index, %off2: index,
     %indices: vector<8xindex>, %mask: vector<8xi1>,
     %pass_thru: vector<8xf32>) -> vector<8xf32> {
   %0 = vector.gather %source[%off0, %off1, %off2][%indices], %mask, %pass_thru
-       : memref<4x8x32xf32, strided<[?, 128, 2], offset: ?>>,
+       : memref<4x8x32xf32, strided<[?, 128, 2]>>,
          vector<8xindex>, vector<8xi1>, vector<8xf32>
          into vector<8xf32>
   gpu.return %0 : vector<8xf32>
 }
 // CHECK-LABEL:  @non_unit_inner_stride_3D(
-// CHECK-SAME:   %[[SRC:.+]]: memref<4x8x32xf32, strided<[?, 128, 2], offset: ?>>,
+// CHECK-SAME:   %[[SRC:.+]]: memref<4x8x32xf32, strided<[?, 128, 2]>>,
 // CHECK-SAME:   %[[OFF0:.+]]: index, %[[OFF1:.+]]: index, %[[OFF2:.+]]: index,
 // CHECK-SAME:   %[[INDICES:.+]]: vector<8xindex>, %[[MASK:.+]]: vector<8xi1>,
 // CHECK-SAME:   %[[PASS:.+]]: vector<8xf32>) -> vector<8xf32> {
@@ -243,7 +243,7 @@ gpu.func @non_unit_inner_stride_3D(
 // CHECK:        %[[STRD_INDICES:.+]] = arith.muli {{.*}}%[[INDICES]]{{.*}} : vector<8xindex>
 // CHECK:        %[[SPLAT:.+]] = vector.broadcast {{.*}} : index to vector<8xindex>
 // CHECK:        %[[LIN_IDX:.+]] = arith.addi %[[SPLAT]], %[[STRD_INDICES]] : vector<8xindex>
-// CHECK:        %[[BASE:.+]] = memref.extract_aligned_pointer_as_index %[[SRC]] : memref<4x8x32xf32, strided<[?, 128, 2], offset: ?>> -> index
+// CHECK:        %[[BASE:.+]] = memref.extract_aligned_pointer_as_index %[[SRC]] : memref<4x8x32xf32, strided<[?, 128, 2]>> -> index
 // CHECK:        %[[BASE_I64:.+]] = arith.index_cast %[[BASE]] : index to i64
 // CHECK:        %[[V:.+]] = xegpu.load %[[BASE_I64]]{{\[}}%[[LIN_IDX]]{{\]}}, %[[MASK]] : i64, vector<8xindex>, vector<8xi1> -> vector<8xf32>
 // CHECK:        %[[RES:.+]] = arith.select %[[MASK]], %[[V]], %[[PASS]] : vector<8xi1>, vector<8xf32>
diff --git a/mlir/test/Conversion/VectorToXeGPU/load-to-xegpu.mlir b/mlir/test/Conversion/VectorToXeGPU/load-to-xegpu.mlir
index c77efa03f3483..482911ca49dc5 100644
--- a/mlir/test/Conversion/VectorToXeGPU/load-to-xegpu.mlir
+++ b/mlir/test/Conversion/VectorToXeGPU/load-to-xegpu.mlir
@@ -12,7 +12,7 @@ func.func @load_1D_vector(%source: memref<8x16x32xf32>, %offset: index) -> vecto
 // CHECK:       %[[ELEM_BYTES:.+]] = arith.constant 4 : index
 // CHECK:       %[[COLLAPSED:.+]] = memref.subview %[[SRC]][%[[OFFSET]], %[[OFFSET]], 0]
 // CHECK:       %[[BASE_BUFFER:.+]], %[[OFFSET1:.+]], %[[SIZES:.+]], %[[STRIDES:.+]] = memref.extract_strided_metadata %[[COLLAPSED]]
-// CHECK-SAME:    : memref<32xf32, strided<[1], offset: ?>> -> memref<f32>, index, index, index
+// CHECK-SAME:    : memref<32xf32, strided<[1]>> -> memref<f32>, index, index, index
 // CHECK:       %[[INTPTR:.+]] = memref.extract_aligned_pointer_as_index %[[BASE_BUFFER]]
 // CHECK-SAME:    : memref<f32> -> index
 // CHECK:       %[[MUL:.+]] = arith.muli %[[OFFSET1]], %[[ELEM_BYTES]] : index
diff --git a/mlir/test/Conversion/VectorToXeGPU/scatter-to-xegpu.mlir b/mlir/test/Conversion/VectorToXeGPU/scatter-to-xegpu.mlir
index ffd3f170c0fad..ef2d6e65168d5 100644
--- a/mlir/test/Conversion/VectorToXeGPU/scatter-to-xegpu.mlir
+++ b/mlir/test/Conversion/VectorToXeGPU/scatter-to-xegpu.mlir
@@ -119,14 +119,14 @@ gpu.func @store_dynamic_source2(%vec: vector<8x16xf32>, %source: memref<?x8x16xf
 // -----
 gpu.module @xevm_module {
 gpu.func @non_unit_inner_stride_1D(
-    %vec: vector<8xf32>, %source: memref<32xf32, strided<[?], offset: ?>>,
+    %vec: vector<8xf32>, %source: memref<32xf32, strided<[?]>>,
     %off: index, %indices: vector<8xindex>, %mask: vector<8xi1>) {
   vector.scatter %source[%off][%indices], %mask, %vec
-    : memref<32xf32, strided<[?], offset: ?>>, vector<8xindex>, vector<8xi1>, vector<8xf32>
+    : memref<32xf32, strided<[?]>>, vector<8xindex>, vector<8xi1>, vector<8xf32>
   gpu.return
 }
 // CHECK-LABEL:  @non_unit_inner_stride_1D(
-// CHECK-SAME:   %[[VAL:.+]]: vector<8xf32>, %[[SRC:.+]]: memref<32xf32, strided<[?], offset: ?>>,
+// CHECK-SAME:   %[[VAL:.+]]: vector<8xf32>, %[[SRC:.+]]: memref<32xf32, strided<[?]>>,
 // CHECK-SAME:   %[[OFF1:.+]]: index,
 // CHECK-SAME:   %[[INDICES:.+]]: vector<8xindex>, %[[MASK:.+]]: vector<8xi1>) {
 // CHECK:        %[[BB:.+]], %[[M_OFF:.+]], %[[SZ:.+]], %[[STRIDE:.+]] = memref.extract_strided_metadata %[[SRC]]
@@ -136,7 +136,7 @@ gpu.func @non_unit_inner_stride_1D(
 // CHECK:        %[[STRD_INDICES:.+]] = arith.muli %[[STRD_VEC:.+]], %[[INDICES]] : vector<8xindex>
 // CHECK:        %[[SPLAT:.+]] = vector.broadcast {{.*}}:  index to vector<8xindex>
 // CHECK:        %[[LIN_IDX:.+]] = arith.addi %[[SPLAT]], %[[STRD_INDICES]] : vector<8xindex>
-// CHECK:        %[[BASE:.+]] = memref.extract_aligned_pointer_as_index %[[SRC]] : memref<32xf32, strided<[?], offset: ?>> -> index
+// CHECK:        %[[BASE:.+]] = memref.extract_aligned_pointer_as_index %[[SRC]] : memref<32xf32, strided<[?]>> -> index
 // CHECK:        %[[BASE_I64:.+]] = arith.index_cast %[[BASE]] : index to i64
 // CHECK:        xegpu.store %[[VAL]], %[[BASE_I64]]{{\[}}%[[LIN_IDX]]{{\]}}, %[[MASK]] : vector<8xf32>, i64, vector<8xindex>, vector<8xi1>
 // CHECK:        gpu.return
@@ -146,16 +146,16 @@ gpu.func @non_unit_inner_stride_1D(
 gpu.module @xevm_module {
 gpu.func @non_unit_inner_stride_3D(
     %vec: vector<8xf32>,
-    %source: memref<4x8x32xf32, strided<[?, 128, 2], offset: ?>>,
+    %source: memref<4x8x32xf32, strided<[?, 128, 2]>>,
     %off0: index, %off1: index, %off2: index,
     %indices: vector<8xindex>, %mask: vector<8xi1>) {
   vector.scatter %source[%off0, %off1, %off2][%indices], %mask, %vec
-    : memref<4x8x32xf32, strided<[?, 128, 2], offset: ?>>,
+    : memref<4x8x32xf32, strided<[?, 128, 2]>>,
       vector<8xindex>, vector<8xi1>, vector<8xf32>
   gpu.return
 }
 // CHECK-LABEL:  @non_unit_inner_stride_3D(
-// CHECK-SAME:   %[[VAL:.+]]: vector<8xf32>, %[[SRC:.+]]: memref<4x8x32xf32, strided<[?, 128, 2], offset: ?>>,
+// CHECK-SAME:   %[[VAL:.+]]: vector<8xf32>, %[[SRC:.+]]: memref<4x8x32xf32, strided<[?, 128, 2]>>,
 // CHECK-SAME:   %[[OFF0:.+]]: index, %[[OFF1:.+]]: index, %[[OFF2:.+]]: index,
 // CHECK-SAME:   %[[INDICES:.+]]: vector<8xindex>, %[[MASK:.+]]: vector<8xi1>) {
 // CHECK:        %[[BB:.+]], %[[M_OFF:.+]], %[[SIZES:.+]]:3, %[[STRIDES:.+]]:3 = memref.extract_strided_metadata %[[SRC]]
@@ -166,7 +166,7 @@ gpu.func @non_unit_inner_stride_3D(
 // CHECK:        %[[STRD_INDICES:.+]] = arith.muli {{.*}}%[[INDICES]]{{.*}} : vector<8xindex>
 // CHECK:        %[[SPLAT:.+]] = vector.broadcast {{.*}} : index to vector<8xindex>
 // CHECK:        %[[LIN_IDX:.+]] = arith.addi %[[SPLAT]], %[[STRD_INDICES]] : vector<8xindex>
-// CHECK:        %[[BASE:.+]] = memref.extract_aligned_pointer_as_index %[[SRC]] : memref<4x8x32xf32, strided<[?, 128, 2], offset: ?>> -> index
+// CHECK:        %[[BASE:.+]] = memref.extract_aligned_pointer_as_index %[[SRC]] : memref<4x8x32xf32, strided<[?, 128, 2]>> -> index
 // CHECK:        %[[BASE_I64:.+]] = arith.index_cast %[[BASE]] : index to i64
 // CHECK:        xegpu.store %[[VAL]], %[[BASE_I64]]{{\[}}%[[LIN_IDX]]{{\]}}, %[[MASK]] : vector<8xf32>, i64, vector<8xindex>, vector<8xi1>
 // CHECK:        gpu.return
@@ -181,9 +181,9 @@ gpu.func @scatter_into_subview(%vals: vector<8xf16>,
                                %mask: vector<8xi1>) {
   %subview = memref.subview %source[%memref_off, %memref_off] [256, 256] [1, 1]
       : memref<4096x4096xf16>
-        to memref<256x256xf16, strided<[4096, 1], offset: ?>>
+        to memref<256x256xf16, strided<[4096, 1]>>
   vector.scatter %subview[%off1, %off2][%indices], %mask, %vals
-      : memref<256x256xf16, strided<[4096, 1], offset: ?>>,
+      : memref<256x256xf16, strided<[4096, 1]>>,
         vector<8xindex>, vector<8xi1>, vector<8xf16>
   gpu.return
 }
@@ -193,13 +193,13 @@ gpu.func @scatter_into_subview(%vals: vector<8xf16>,
 // CHECK-SAME:   %[[MEMREF_OFF:.+]]: index, %[[OFF1:.+]]: index, %[[OFF2:.+]]: index,
 // CHECK-SAME:   %[[INDICES:.+]]: vector<8xindex>, %[[MASK:.+]]: vector<8xi1>) {
 // CHECK:        %[[SUBVIEW:.+]] = memref.subview %[[SRC]][%[[MEMREF_OFF]], %[[MEMREF_OFF]]] [256, 256] [1, 1]
-// CHECK:        %[[BB:.+]], %[[OFFSET:.+]],{{.*}},{{.*}} = memref.extract_strided_metadata %[[SUBVIEW]] : memref<256x256xf16, strided<[4096, 1], offset: ?>> -> memref<f16>, index, index, index, index, index
+// CHECK:        %[[BB:.+]], %[[OFFSET:.+]],{{.*}},{{.*}} = memref.extract_strided_metadata %[[SUBVIEW]] : memref<256x256xf16, strided<[4096, 1]>> -> memref<f16>, index, index, index, index, index
 // CHECK:        arith.muli {{.*}}%[[OFF1]]{{.*}} : index
 // CHECK:        arith.addi %[[OFFSET]]{{.*}} : index
 // CHECK:        %[[BASE_OFF:.+]] = arith.addi {{.*}}%[[OFF2]]{{.*}} : index
 // CHECK:        %[[SPLAT:.+]] = vector.broadcast %[[BASE_OFF]] : index to vector<8xindex>
 // CHECK:        %[[LIN:.+]] = arith.addi %[[SPLAT]], %[[INDICES]] : vector<8xindex>
-// CHECK:        %[[BASE_IDX:.+]] = memref.extract_aligned_pointer_as_index %[[SUBVIEW]] : memref<256x256xf16, strided<[4096, 1], offset: ?>> -> index
+// CHECK:        %[[BASE_IDX:.+]] = memref.extract_aligned_pointer_as_index %[[SUBVIEW]] : memref<256x256xf16, strided<[4096, 1]>> -> index
 // CHECK:        %[[BASE_I64:.+]] = arith.index_cast %[[BASE_IDX]] : index to i64
 // CHECK:        xegpu.store %[[VALS]], %[[BASE_I64]]{{\[}}%[[LIN]]{{\]}}, %[[MASK]] : vector<8xf16>, i64, vector<8xindex>, vector<8xi1>
 // CHECK:        gpu.return
diff --git a/mlir/test/Conversion/VectorToXeGPU/store-to-xegpu.mlir b/mlir/test/Conversion/VectorToXeGPU/store-to-xegpu.mlir
index 8ff2e6ee7d13c..d5cdad5ddaf02 100644
--- a/mlir/test/Conversion/VectorToXeGPU/store-to-xegpu.mlir
+++ b/mlir/test/Conversion/VectorToXeGPU/store-to-xegpu.mlir
@@ -14,7 +14,7 @@ func.func @store_1D_vector(%vec: vector<8xf32>,
 // CHECK:       %[[ELEM_BYTES:.*]] = arith.constant 4 : index
 // CHECK:       %[[COLLAPSED:.+]] = memref.subview %[[SRC]][%[[OFFSET]], %[[OFFSET]], 0]
 // CHECK:       %[[BASE_BUFFER:.+]], %[[OFFSET1:.+]], %[[SIZES:.+]], %[[STRIDES:.+]] = memref.extract_strided_metadata %[[COLLAPSED]]
-// CHECK-SAME:    : memref<32xf32, strided<[1], offset: ?>> -> memref<f32>, index, index, index
+// CHECK-SAME:    : memref<32xf32, strided<[1]>> -> memref<f32>, index, index, index
 // CHECK:       %[[INTPTR:.+]] = memref.extract_aligned_pointer_as_index %[[BASE_BUFFER]]
 // CHECK-SAME:    : memref<f32> -> index
 // CHECK:       %[[MUL:.+]] = arith.muli %[[OFFSET1]], %[[ELEM_BYTES]] : index
diff --git a/mlir/test/Conversion/VectorToXeGPU/transfer-read-to-xegpu.mlir b/mlir/test/Conversion/VectorToXeGPU/transfer-read-to-xegpu.mlir
index 1a19c8a13f120..586ed0d748644 100644
--- a/mlir/test/Conversion/VectorToXeGPU/transfer-read-to-xegpu.mlir
+++ b/mlir/test/Conversion/VectorToXeGPU/transfer-read-to-xegpu.mlir
@@ -391,11 +391,11 @@ gpu.func @no_load_tensor(%source: tensor<32x64xf32>,
 // -----
 gpu.module @xevm_module {
 gpu.func @no_load_non_unit_inner_stride(
-    %source: memref<32xf32, strided<[?], offset: ?>>,
+    %source: memref<32xf32, strided<[?]>>,
     %offset: index) -> vector<8xf32> {
   %c0 = arith.constant 0.0 : f32
   %0 = vector.transfer_read %source[%offset], %c0 {in_bounds = [true]}
-    : memref<32xf32, strided<[?], offset: ?>>, vector<8xf32>
+    : memref<32xf32, strided<[?]>>, vector<8xf32>
   gpu.return %0 : vector<8xf32>
 }
 
@@ -429,9 +429,9 @@ gpu.func @no_load_unsupported_map(%source: memref<16x32x64xf32>,
 gpu.module @xevm_module {
 gpu.func @load_from_subview_1D(%source: memref<4096x4096xf16>, %off1: index, %off2: index) -> vector<8xf16> {
   %c0 = arith.constant 0.0 : f16
-  %subview = memref.subview %source[%off1, %off2] [256, 256] [1, 1] : memref<4096x4096xf16> to memref<256x256xf16, strided<[4096, 1], offset: ?>>
+  %subview = memref.subview %source[%off1, %off2] [256, 256] [1, 1] : memref<4096x4096xf16> to memref<256x256xf16, strided<[4096, 1]>>
   %0 = vector.transfer_read %subview[%off2, %off2], %c0
-    {in_bounds = [true]} : memref<256x256xf16, strided<[4096, 1], offset: ?>>, vector<8xf16>
+    {in_bounds = [true]} : memref<256x256xf16, strided<[4096, 1]>>, vector<8xf16>
   gpu.return %0 : vector<8xf16>
 }
 
@@ -439,15 +439,15 @@ gpu.func @load_from_subview_1D(%source: memref<4096x4096xf16>, %off1: index, %of
 // LOAD-ND-SAME:   %[[SRC:.+]]: memref<4096x4096xf16>,
 // LOAD-ND-SAME:   %[[OFF1:.+]]: index, %[[OFF2:.+]]: index
 // LOAD-ND:        %[[CST:.+]] = arith.constant dense<true> : vector<8xi1>
-// LOAD-ND:        %[[SUBVIEW:.+]] = memref.subview %[[SRC]][%[[OFF1]], %[[OFF2]]] [256, 256] [1, 1] : memref<4096x4096xf16> to memref<256x256xf16, strided<[4096, 1], offset: ?>> 
-// LOAD-ND:        %[[BB:.+]], %[[OFFSET:.+]],{{.*}},{{.*}} = memref.extract_strided_metadata %[[SUBVIEW]] : memref<256x256xf16, strided<[4096, 1], offset: ?>> -> memref<f16>, index, index, index, index, index
+// LOAD-ND:        %[[SUBVIEW:.+]] = memref.subview %[[SRC]][%[[OFF1]], %[[OFF2]]] [256, 256] [1, 1] : memref<4096x4096xf16> to memref<256x256xf16, strided<[4096, 1]>> 
+// LOAD-ND:        %[[BB:.+]], %[[OFFSET:.+]],{{.*}},{{.*}} = memref.extract_strided_metadata %[[SUBVIEW]] : memref<256x256xf16, strided<[4096, 1]>> -> memref<f16>, index, index, index, index, index
 // LOAD-ND:        %[[STEP:.+]] = vector.step : vector<8xindex>
 // LOAD-ND:        arith.muli {{.*}} : index
 // LOAD-ND:        arith.addi %[[OFFSET]]{{.*}} : index
 // LOAD-ND:        arith.addi {{.*}} : index
 // LOAD-ND:        %[[SPLAT:.+]] = vector.broadcast {{.*}}:  index to vector<8xindex>
 // LOAD-ND:        %[[IDX:.+]] = arith.addi %[[SPLAT]], %[[STEP]] : vector<8xindex>
-// LOAD-ND:        %[[COLLAPSE:.+]] = memref.extract_aligned_pointer_as_index %[[SUBVIEW]] : memref<256x256xf16, strided<[4096, 1], offset: ?>> -> index
+// LOAD-ND:        %[[COLLAPSE:.+]] = memref.extract_aligned_pointer_as_index %[[SUBVIEW]] : memref<256x256xf16, strided<[4096, 1]>> -> index
 // LOAD-ND:        %[[COLLAPSE_I:.+]] = arith.index_cast %[[COLLAPSE]] : index to i64
 // LOAD-ND:        %[[VEC:.+]] = xegpu.load %[[COLLAPSE_I]]{{\[}}%[[IDX]]{{\]}}, %[[CST]] : i64, vector<8xindex>, vector<8xi1> -> vector<8xf16>
 
@@ -455,15 +455,15 @@ gpu.func @load_from_subview_1D(%source: memref<4096x4096xf16>, %off1: index, %of
 // LOAD-GATHER-SAME:   %[[SRC:.+]]: memref<4096x4096xf16>,
 // LOAD-GATHER-SAME:   %[[OFF1:.+]]: index, %[[OFF2:.+]]: index
 // LOAD-GATHER:        %[[CST:.+]] = arith.constant dense<true> : vector<8xi1>
-// LOAD-GATHER:        %[[SUBVIEW:.+]] = memref.subview %[[SRC]][%[[OFF1]], %[[OFF2]]] [256, 256] [1, 1] : memref<4096x4096xf16> to memref<256x256xf16, strided<[4096, 1], offset: ?>> 
-// LOAD-GATHER:        %[[BB:.+]], %[[OFFSET:.+]],{{.*}},{{.*}} = memref.extract_strided_metadata %[[SUBVIEW]] : memref<256x256xf16, strided<[4096, 1], offset: ?>> -> memref<f16>, index, index, index, index, index
+// LOAD-GATHER:        %[[SUBVIEW:.+]] = memref.subview %[[SRC]][%[[OFF1]], %[[OFF2]]] [256, 256] [1, 1] : memref<4096x4096xf16> to memref<256x256xf16, strided<[4096, 1]>> 
+// LOAD-GATHER:        %[[BB:.+]], %[[OFFSET:.+]],{{.*}},{{.*}} = memref.extract_strided_metadata %[[SUBVIEW]] : memref<256x256xf16, strided<[4096, 1]>> -> memref<f16>, index, index, index, index, index
 // LOAD-GATHER:        %[[STEP:.+]] = vector.step : vector<8xindex>
 // LOAD-GATHER:        arith.muli {{.*}} : index
 // LOAD-GATHER:        arith.addi %[[OFFSET]]{{.*}} : index
 // LOAD-GATHER:        arith.addi {{.*}} : index
 // LOAD-GATHER:        %[[SPLAT:.+]] = vector.broadcast {{.*}}:  index to vector<8xindex>
 // LOAD-GATHER:        %[[IDX:.+]] = arith.addi %[[SPLAT]], %[[STEP]] : vector<8xindex>
-// LOAD-GATHER:        %[[COLLAPSE:.+]] = memref.extract_aligned_pointer_as_index %[[SUBVIEW]] : memref<256x256xf16, strided<[4096, 1], offset: ?>> -> index
+// LOAD-GATHER:        %[[COLLAPSE:.+]] = memref.extract_aligned_pointer_as_index %[[SUBVIEW]] : memref<256x256xf16, strided<[4096, 1]>> -> index
 // LOAD-GATHER:        %[[COLLAPSE_I:.+]] = arith.index_cast %[[COLLAPSE]] : index to i64
 // LOAD-GATHER:        %[[VEC:.+]] = xegpu.load %[[COLLAPSE_I]]{{\[}}%[[IDX]]{{\]}}, %[[CST]] : i64, vector<8xindex>, vector<8xi1> -> vector<8xf16>
 }
@@ -472,9 +472,9 @@ gpu.func @load_from_subview_1D(%source: memref<4096x4096xf16>, %off1: index, %of
 gpu.module @xevm_module {
 gpu.func @load_from_subview_2D(%source: memref<4096x4096xf16>, %off1: index, %off2: index) -> vector<8x16xf16> {
   %c0 = arith.constant 0.0 : f16
-  %subview = memref.subview %source[%off1, %off2] [256, 256] [1, 1] : memref<4096x4096xf16> to memref<256x256xf16, strided<[4096, 1], offset: ?>>
+  %subview = memref.subview %source[%off1, %off2] [256, 256] [1, 1] : memref<4096x4096xf16> to memref<256x256xf16, strided<[4096, 1]>>
   %0 = vector.transfer_read %subview[%off2, %off2], %c0
-    {in_bounds = [true, true]} : memref<256x256xf16, strided<[4096, 1], offset: ?>>, vector<8x16xf16>
+    {in_bounds = [true, true]} : memref<256x256xf16, strided<[4096, 1]>>, vector<8x16xf16>
   gpu.return %0 : vector<8x16xf16>
 }
 
@@ -482,7 +482,7 @@ gpu.func @load_from_subview_2D(%source: memref<4096x4096xf16>, %off1: index, %of
 // LOAD-ND-SAME:   %[[SRC:.+]]: memref<4096x4096xf16>,
 // LOAD-ND-SAME:   %[[OFF1:.+]]: index, %[[OFF2:.+]]: index
 // LOAD-ND:        %[[ELEM_BYTES:.+]] = arith.constant 2 : index
-// LOAD-ND:        %[[SUBVIEW:.+]] = memref.subview %[[SRC]][%[[OFF1]], %[[OFF2]]] [256, 256] [1, 1] : memref<4096x4096xf16> to memref<256x256xf16, strided<[4096, 1], offset: ?>>
+// LOAD-ND:        %[[SUBVIEW:.+]] = memref.subview %[[SRC]][%[[OFF1]], %[[OFF2]]] [256, 256] [1, 1] : memref<4096x4096xf16> to memref<256x256xf16, strided<[4096, 1]>>
 // LOAD-ND:        %[[BASE_BUFFER:.*]], %[[OFFSET:.*]], %[[SIZES:.*]]:2, %[[STRIDES:.*]]:2 = memref.extract_strided_metadata %[[SUBVIEW]]
 // LOAD-ND:        %[[INTPTR:.*]] = memref.extract_aligned_pointer_as_index %[[BASE_BUFFER]]
 // LOAD-ND:        %[[MUL:.*]] = arith.muli %[[OFFSET]], %[[ELEM_BYTES]] : index
@@ -497,8 +497,8 @@ gpu.func @load_from_subview_2D(%source: memref<4096x4096xf16>, %off1: index, %of
 // LOAD-GATHER-SAME:   %[[SRC:.+]]: memref<4096x4096xf16>,
 // LOAD-GATHER-SAME:   %[[OFF1:.+]]: index, %[[OFF2:.+]]: index
 // LOAD-GATHER:        %[[CST:.+]] = arith.constant dense<true> : vector<8x16xi1>
-// LOAD-GATHER:        %[[SUBVIEW:.+]] = memref.subview %[[SRC]][%[[OFF1]], %[[OFF2]]] [256, 256] [1, 1] : memref<4096x4096xf16> to memref<256x256xf16, strided<[4096, 1], offset: ?>>
-// LOAD-GATHER:        %[[BB:.+]], %[[OFFSET:.+]],{{.*}},{{.*}} = memref.extract_strided_metadata %[[SUBVIEW]] : memref<256x256xf16, strided<[4096, 1], offset: ?>> -> memref<f16>, index, index, index, index, index
+// LOAD-GATHER:        %[[SUBVIEW:.+]] = memref.subview %[[SRC]][%[[OFF1]], %[[OFF2]]] [256, 256] [1, 1] : memref<4096x4096xf16> to memref<256x256xf16, strided<[4096, 1]>>
+// LOAD-GATHER:        %[[BB:.+]], %[[OFFSET:.+]],{{.*}},{{.*}} = memref.extract_strided_metadata %[[SUBVIEW]] : memref<256x256xf16, strided<[4096, 1]>> -> memref<f16>, index, index, index, index, index
 // LOAD-GATHER-COUNT2: vector.step
 // LOAD-GATHER-COUNT2: vector.shape_cast
 // LOAD-GATHER-COUNT2: vector.broadcast
@@ -506,7 +506,7 @@ gpu.func @load_from_subview_2D(%source: memref<4096x4096xf16>, %off1: index, %of
 // LOAD-GATHER-COUNT2: arith.addi {{.*}} : index
 // LOAD-GATHER:        %[[SPLAT:.+]] = vector.broadcast {{.*}}:  index to vector<8x16xindex>
 // LOAD-GATHER:        %[[IDX:.+]] = arith.addi %[[SPLAT]], {{.*}} : vector<8x16xindex>
-// LOAD-GATHER:        %[[COLLAPSE:.+]] = memref.extract_aligned_pointer_as_index %[[SUBVIEW]] : memref<256x256xf16, strided<[4096, 1], offset: ?>> -> index
+// LOAD-GATHER:        %[[COLLAPSE:.+]] = memref.extract_aligned_pointer_as_index %[[SUBVIEW]] : memref<256x256xf16, strided<[4096, 1]>> -> index
 // LOAD-GATHER:        %[[COLLAPSE_I:.+]] = arith.index_cast %[[COLLAPSE]] : index to i64
 // LOAD-GATHER:        %[[VEC:.+]] = xegpu.load %[[COLLAPSE_I]]{{\[}}%[[IDX]]{{\]}}, %[[CST]] : i64, vector<8x16xindex>, vector<8x16xi1> -> vector<8x16xf16>
 }
diff --git a/mlir/test/Conversion/VectorToXeGPU/transfer-write-to-xegpu.mlir b/mlir/test/Conversion/VectorToXeGPU/transfer-write-to-xegpu.mlir
index 66da64225678e..d8ecc80497164 100644
--- a/mlir/test/Conversion/VectorToXeGPU/transfer-write-to-xegpu.mlir
+++ b/mlir/test/Conversion/VectorToXeGPU/transfer-write-to-xegpu.mlir
@@ -18,7 +18,7 @@ gpu.func @store_1D_vector(%vec: vector<8xf32>,
 // STORE-ND:       %[[ELEM_BYTES:.+]] = arith.constant 4 : index
 // STORE-ND:       %[[COLLAPSED:.+]] = memref.subview %[[SRC]][%[[OFFSET]], %[[OFFSET]], 0]
 // STORE-ND:       %[[BASE_BUFFER:.+]], %[[OFFSET1:.+]], %[[SIZES:.+]], %[[STRIDES:.+]] = memref.extract_strided_metadata %[[COLLAPSED]]
-// STORE-ND-SAME:    : memref<32xf32, strided<[1], offset: ?>> -> memref<f32>, index, index, index
+// STORE-ND-SAME:    : memref<32xf32, strided<[1]>> -> memref<f32>, index, index, index
 // STORE-ND:       %[[INTPTR:.+]] = memref.extract_aligned_pointer_as_index %[[BASE_BUFFER]]
 // STORE-ND-SAME:    : memref<f32> -> index
 // STORE-ND:       %[[MUL:.+]] = arith.muli %[[OFFSET1]], %[[ELEM_BYTES]] : index
@@ -247,10 +247,10 @@ gpu.func @no_store_tensor(%vec: vector<8x16xf32>,
 // -----
 gpu.module @xevm_module {
 gpu.func @no_store_non_unit_inner_stride(%vec: vector<8xf32>,
-    %source: memref<32xf32, strided<[?], offset: ?>>, %offset: index) {
+    %source: memref<32xf32, strided<[?]>>, %offset: index) {
   vector.transfer_write %vec, %source[%offset]
     {in_bounds = [true]}
-    : vector<8xf32>, memref<32xf32, strided<[?], offset: ?>>
+    : vector<8xf32>, memref<32xf32, strided<[?]>>
   gpu.return
 }
 
@@ -302,10 +302,10 @@ gpu.func @store_to_subview(%vec: vector<8xf16>,
     %source: memref<4096x4096xf16>, %off1: index, %off2: index) {
   %subview = memref.subview %source[%off1, %off2] [256, 256] [1, 1]
       : memref<4096x4096xf16>
-        to memref<256x256xf16, strided<[4096, 1], offset: ?>>
+        to memref<256x256xf16, strided<[4096, 1]>>
   vector.transfer_write %vec, %subview[%off2, %off2]
       {in_bounds = [true]}
-      : vector<8xf16>, memref<256x256xf16, strided<[4096, 1], offset: ?>>
+      : vector<8xf16>, memref<256x256xf16, strided<[4096, 1]>>
   gpu.return
 }
 // STORE-ND-LABEL:  @store_to_subview(
@@ -313,7 +313,7 @@ gpu.func @store_to_subview(%vec: vector<8xf16>,
 // STORE-ND-SAME:   %[[SRC:.+]]: memref<4096x4096xf16>,
 // STORE-ND-SAME:   %[[OFF1:.+]]: index, %[[OFF2:.+]]: index
 // STORE-ND:        %[[ELEM_BYTES:.+]] = arith.constant 2 : index
-// STORE-ND:        %[[SUBVIEW:.+]] = memref.subview %[[SRC]][%[[OFF1]], %[[OFF2]]] [256, 256] [1, 1] : memref<4096x4096xf16> to memref<256x256xf16, strided<[4096, 1], offset: ?>>
+// STORE-ND:        %[[SUBVIEW:.+]] = memref.subview %[[SRC]][%[[OFF1]], %[[OFF2]]] [256, 256] [1, 1] : memref<4096x4096xf16> to memref<256x256xf16, strided<[4096, 1]>>
 // STORE-ND:        %[[COLLAPSED:.+]] = memref.subview %[[SUBVIEW]][%[[OFF2]], 0]
 // STORE-ND:        %[[BASE_BUFFER:.*]], %[[OFFSET:.*]], %[[SIZES:.*]], %[[STRIDES:.*]] = memref.extract_strided_metadata %[[COLLAPSED]]
 // STORE-ND:        %[[INTPTR:.*]] = memref.extract_aligned_pointer_as_index %[[BASE_BUFFER]]
@@ -330,9 +330,9 @@ gpu.func @store_to_subview(%vec: vector<8xf16>,
 // STORE-SCATTER-SAME:   %[[OFF1:.+]]: index, %[[OFF2:.+]]: index
 // STORE-SCATTER:        %[[CST:.+]] = arith.constant dense<true> : vector<8xi1>
 // STORE-SCATTER:        %[[SUBVIEW:.+]] = memref.subview %[[SRC]][%[[OFF1]], %[[OFF2]]] [256, 256] [1, 1]
-// STORE-SCATTER-SAME:     : memref<4096x4096xf16> to memref<256x256xf16, strided<[4096, 1], offset: ?>>
+// STORE-SCATTER-SAME:     : memref<4096x4096xf16> to memref<256x256xf16, strided<[4096, 1]>>
 // STORE-SCATTER:        %[[BB:.+]], %[[OFFSET:.+]], {{.*}}, {{.*}} = memref.extract_strided_metadata %[[SUBVIEW]]
-// STORE-SCATTER-SAME:     : memref<256x256xf16, strided<[4096, 1], offset: ?>> -> memref<f16>, index, index, index, index, index
+// STORE-SCATTER-SAME:     : memref<256x256xf16, strided<[4096, 1]>> -> memref<f16>, index, index, index, index, index
 // STORE-SCATTER:        %[[STEP:.+]] = vector.step : vector<8xindex>
 // STORE-SCATTER:        arith.muli {{.*}} : index
 // STORE-SCATTER:        arith.addi %[[OFFSET]]{{.*}} : index
@@ -340,7 +340,7 @@ gpu.func @store_to_subview(%vec: vector<8xf16>,
 // STORE-SCATTER:        %[[SPLAT:.+]] = vector.broadcast {{.*}} : index to vector<8xindex>
 // STORE-SCATTER:        %[[IDX:.+]] = arith.addi %[[SPLAT]], %[[STEP]] : vector<8xindex>
 // STORE-SCATTER:        %[[COLLAPSE:.+]] = memref.extract_aligned_pointer_as_index %[[SUBVIEW]]
-// STORE-SCATTER-SAME:     : memref<256x256xf16, strided<[4096, 1], offset: ?>> -> index
+// STORE-SCATTER-SAME:     : memref<256x256xf16, strided<[4096, 1]>> -> index
 // STORE-SCATTER:        %[[COLLAPSE_I:.+]] = arith.index_cast %[[COLLAPSE]] : index to i64
 // STORE-SCATTER:        xegpu.store %[[VEC]], %[[COLLAPSE_I]]{{\[}}%[[IDX]]{{\]}}, %[[CST]] : vector<8xf16>, i64, vector<8xindex>, vector<8xi1>
 }
diff --git a/mlir/test/Conversion/XeGPUToXeVM/loadstore_matrix.mlir b/mlir/test/Conversion/XeGPUToXeVM/loadstore_matrix.mlir
index fa683175693be..83dbf36aa4a4b 100644
--- a/mlir/test/Conversion/XeGPUToXeVM/loadstore_matrix.mlir
+++ b/mlir/test/Conversion/XeGPUToXeVM/loadstore_matrix.mlir
@@ -39,16 +39,16 @@ gpu.module @test_kernel [#xevm.target<chip = "pvc">] {
     %c0 = arith.constant 0 : index
     %view = memref.view %arg0[%c0][]: memref<1024xi8, 3> to memref<64x32xf32, 3>
 
-    %subview = memref.subview %view[32, 0] [32, 32] [1, 1] : memref<64x32xf32, 3> to memref<32x32xf32, strided<[32, 1], offset: 1024>, 3>
+    %subview = memref.subview %view[32, 0] [32, 32] [1, 1] : memref<64x32xf32, 3> to memref<32x32xf32, strided<[32, 1]>, 3>
 
-    //CHECK: %[[intptr:.*]] = memref.extract_aligned_pointer_as_index %[[base_buffer:.*]] : memref<32x32xf32, strided<[32, 1], offset: 1024>, 3> -> index
+    //CHECK: %[[intptr:.*]] = memref.extract_aligned_pointer_as_index %[[base_buffer:.*]] : memref<32x32xf32, strided<[32, 1]>, 3> -> index
     //CHECK: %[[ptr_i32:.*]] = arith.index_castui %[[intptr]] : index to i32
     //CHECK: %[[offset_i32:.*]] = arith.index_castui %[[offset:.*]] : index to i32
     //CHECK: %[[c4_i32:.*]] = arith.constant 4 : i32
     //CHECK: %[[mul:.*]] = arith.muli %[[offset_i32]], %[[c4_i32]] : i32
     //CHECK: %[[add:.*]] = arith.addi %[[ptr_i32]], %[[mul]] : i32
 
-    %0 = xegpu.create_mem_desc %subview : memref<32x32xf32, strided<[32, 1], offset: 1024>, 3> -> !xegpu.mem_desc<32x32xf32>
+    %0 = xegpu.create_mem_desc %subview : memref<32x32xf32, strided<[32, 1]>, 3> -> !xegpu.mem_desc<32x32xf32>
 
     //CHECK: %[[TID:.*]] = gpu.thread_id x
     //CHECK: %[[C1:.*]] = arith.constant 1 : index
@@ -289,9 +289,9 @@ gpu.module @test_kernel [#xevm.target<chip = "pvc">] {
 
     %c0 = arith.constant 0 : index
 
-  %smem_coop_a = memref.subview %arg0[64, 0][1, 16][1, 1] : memref<256x16xbf16, 3> to memref<1x16xbf16, strided<[16, 1], offset: 1024>, 3>
+  %smem_coop_a = memref.subview %arg0[64, 0][1, 16][1, 1] : memref<256x16xbf16, 3> to memref<1x16xbf16, strided<[16, 1]>, 3>
 
-  //CHECK: %[[INTPTR:.*]] = memref.extract_aligned_pointer_as_index %{{.*}} : memref<1x16xbf16, strided<[16, 1], offset: 1024>, 3> -> index
+  //CHECK: %[[INTPTR:.*]] = memref.extract_aligned_pointer_as_index %{{.*}} : memref<1x16xbf16, strided<[16, 1]>, 3> -> index
   //CHECK: %[[C1024:.*]] = arith.constant 1024 : index
   //CHECK: %[[CAST0:.*]] = arith.index_castui %[[INTPTR]] : index to i32
   //CHECK: %[[CAST1:.*]] = arith.index_castui %[[C1024]] : index to i32
@@ -299,7 +299,7 @@ gpu.module @test_kernel [#xevm.target<chip = "pvc">] {
   //CHECK: %[[MUL:.*]] = arith.muli %[[CAST1]], %[[C2]] : i32
   //CHECK: %{{.*}} = arith.addi %[[CAST0]], %[[MUL]] : i32
 
-  %mdesc_coop_a = xegpu.create_mem_desc %smem_coop_a : memref<1x16xbf16, strided<[16, 1], offset: 1024>, 3> -> !xegpu.mem_desc<1x16xbf16>
+  %mdesc_coop_a = xegpu.create_mem_desc %smem_coop_a : memref<1x16xbf16, strided<[16, 1]>, 3> -> !xegpu.mem_desc<1x16xbf16>
 
   %ret = xegpu.load_matrix%mdesc_coop_a[%c0, %c0]: !xegpu.mem_desc<1x16xbf16>, index, index -> vector<1x16xbf16>
 
diff --git a/mlir/test/Conversion/XeGPUToXeVM/loadstoreprefetch.mlir b/mlir/test/Conversion/XeGPUToXeVM/loadstoreprefetch.mlir
index 39be929978d1e..0062a5638c0c6 100644
--- a/mlir/test/Conversion/XeGPUToXeVM/loadstoreprefetch.mlir
+++ b/mlir/test/Conversion/XeGPUToXeVM/loadstoreprefetch.mlir
@@ -117,9 +117,9 @@ gpu.module @test {
 gpu.func @load_gather_from_dyn_memref_subview(%dyn: memref<?xf16>, %offset: vector<1xindex>, %mask: vector<1xi1>, %dst: memref<1xf16>) {
   %c0 = arith.constant 0 : index
   %id = gpu.subgroup_id : index
-  %src = memref.subview %dyn[%id][16][1] : memref<?xf16> to memref<16xf16, strided<[1], offset: ?>>
+  %src = memref.subview %dyn[%id][16][1] : memref<?xf16> to memref<16xf16, strided<[1]>>
 
-  // CHECK: %[[BASE:.*]], %[[OFFSET:.*]], %[[SIZES:.*]], %[[STRIDES:.*]] = memref.extract_strided_metadata %{{.*}} : memref<16xf16, strided<[1], offset: ?>> -> memref<f16>, index, index, index
+  // CHECK: %[[BASE:.*]], %[[OFFSET:.*]], %[[SIZES:.*]], %[[STRIDES:.*]] = memref.extract_strided_metadata %{{.*}} : memref<16xf16, strided<[1]>> -> memref<f16>, index, index, index
   // CHECK: %[[INTPTR:.*]] = memref.extract_aligned_pointer_as_index %[[BASE]] : memref<f16> -> index
   // CHECK: %[[CAST1:.*]] = arith.index_castui %[[INTPTR]] : index to i64
   // CHECK: %[[CAST2:.*]] = arith.index_castui %[[OFFSET]] : index to i64
@@ -130,7 +130,7 @@ gpu.func @load_gather_from_dyn_memref_subview(%dyn: memref<?xf16>, %offset: vect
   // CHECK: %{{.*}} = llvm.inttoptr %[[ADD2]] : i64 to !llvm.ptr<1>
 
   %0 = xegpu.load %src[%offset], %mask <{l1_hint = #xegpu.cache_hint<cached>, l2_hint = #xegpu.cache_hint<uncached>}>
-      : memref<16xf16, strided<[1], offset: ?>>, vector<1xindex>, vector<1xi1> -> vector<1xf16>
+      : memref<16xf16, strided<[1]>>, vector<1xindex>, vector<1xi1> -> vector<1xf16>
   vector.store %0, %dst[%c0] : memref<1xf16>, vector<1xf16>
   gpu.return
 }
diff --git a/mlir/test/Dialect/AMDGPU/amdgpu-fold-memrefs.mlir b/mlir/test/Dialect/AMDGPU/amdgpu-fold-memrefs.mlir
index 4fc6bc1846c3d..6cbdf5444327d 100644
--- a/mlir/test/Dialect/AMDGPU/amdgpu-fold-memrefs.mlir
+++ b/mlir/test/Dialect/AMDGPU/amdgpu-fold-memrefs.mlir
@@ -40,10 +40,10 @@ func.func @subview_folding_offset(%offset_i: index, %offset_j: index) {
 
   %alloc = memref.alloc() : memref<64x64xf16, #gpu_lds_addrspace>
   %mem = memref.alloc() : memref<64x128xf16>
-  %subview = memref.subview %mem[32, 64][32, 64][1, 1] : memref<64x128xf16> to memref<32x64xf16, strided<[128, 1], offset: 4160>>
+  %subview = memref.subview %mem[32, 64][32, 64][1, 1] : memref<64x128xf16> to memref<32x64xf16, strided<[128, 1]>>
   %c0 = arith.constant 0 : index
   amdgpu.gather_to_lds %subview[%offset_i, %offset_j], %alloc[%c0, %c0]
-    : vector<8xf16>, memref<32x64xf16, strided<[128, 1], offset: 4160>>, memref<64x64xf16, #gpu_lds_addrspace>
+    : vector<8xf16>, memref<32x64xf16, strided<[128, 1]>>, memref<64x64xf16, #gpu_lds_addrspace>
   func.return
 }
 
@@ -222,9 +222,9 @@ func.func @test_transpose_load_subview_offset(%offset_i: index, %offset_j: index
   %alloc = memref.alloc() : memref<64x128xf16, #gpu_wg>
   %subview = memref.subview %alloc[32, 64][32, 64][1, 1]
     : memref<64x128xf16, #gpu_wg>
-    to memref<32x64xf16, strided<[128, 1], offset: 4160>, #gpu_wg>
+    to memref<32x64xf16, strided<[128, 1]>, #gpu_wg>
   %result = amdgpu.transpose_load %subview[%offset_i, %offset_j]
-    : memref<32x64xf16, strided<[128, 1], offset: 4160>, #gpu_wg> -> vector<4xf16>
+    : memref<32x64xf16, strided<[128, 1]>, #gpu_wg> -> vector<4xf16>
   return %result : vector<4xf16>
 }
 
@@ -374,10 +374,10 @@ func.func @test_make_dma_base_both_fold(%mem: memref<64x128xf16, #gpu_global_add
   // CHECK: amdgpu.make_dma_base %[[MEM]][%[[GI]], %[[GJ]]], %[[LDS]][%[[IDX]]]
   // CHECK-SAME: memref<64x128xf16, #gpu.address_space<global>>, memref<4096xf16, #gpu.address_space<workgroup>> -> !amdgpu.tdm_base<f16>
 
-  %subview = memref.subview %mem[32, 64][32, 64][1, 1] : memref<64x128xf16, #gpu_global_addrspace> to memref<32x64xf16, strided<[128, 1], offset: 4160>, #gpu_global_addrspace>
+  %subview = memref.subview %mem[32, 64][32, 64][1, 1] : memref<64x128xf16, #gpu_global_addrspace> to memref<32x64xf16, strided<[128, 1]>, #gpu_global_addrspace>
   %expand_lds = memref.expand_shape %lds [[0, 1]] output_shape [64, 64] : memref<4096xf16, #gpu_lds_addrspace> into memref<64x64xf16, #gpu_lds_addrspace>
   %base = amdgpu.make_dma_base %subview[%global_i, %global_j], %expand_lds[%lds_i, %lds_j]
-    : memref<32x64xf16, strided<[128, 1], offset: 4160>, #gpu_global_addrspace>, memref<64x64xf16, #gpu_lds_addrspace> -> !amdgpu.tdm_base<f16>
+    : memref<32x64xf16, strided<[128, 1]>, #gpu_global_addrspace>, memref<64x64xf16, #gpu_lds_addrspace> -> !amdgpu.tdm_base<f16>
   func.return
 }
 
diff --git a/mlir/test/Dialect/AMDGPU/amdgpu-resolve-strided-metadata.mlir b/mlir/test/Dialect/AMDGPU/amdgpu-resolve-strided-metadata.mlir
index 831bb5f0f66ec..f4c0829b48cf7 100644
--- a/mlir/test/Dialect/AMDGPU/amdgpu-resolve-strided-metadata.mlir
+++ b/mlir/test/Dialect/AMDGPU/amdgpu-resolve-strided-metadata.mlir
@@ -1,10 +1,10 @@
 // RUN: mlir-opt -amdgpu-resolve-strided-metadata -split-input-file %s | FileCheck %s
 
-!tSrc = memref<?x?xi32, strided<[?, ?], offset: ?>>
-!tDst = memref<?x?xi32, strided<[?, ?], offset: ?>, #amdgpu.address_space<fat_raw_buffer>>
+!tSrc = memref<?x?xi32, strided<[?, ?]>>
+!tDst = memref<?x?xi32, strided<[?, ?]>, #amdgpu.address_space<fat_raw_buffer>>
 !tRes = memref<i32, #amdgpu.address_space<fat_raw_buffer>>
 // CHECK-LABEL: @resolve_metadata_no_offset_reset
-// CHECK-SAME: (%[[arg0:.*]]: memref<?x?xi32, strided<[?, ?], offset: ?>>)
+// CHECK-SAME: (%[[arg0:.*]]: memref<?x?xi32, strided<[?, ?]>>)
 // CHECK-NEXT: %[[cast:.+]] = amdgpu.fat_raw_buffer_cast %[[arg0]]
 // CHECK-NEXT: %{{.+}}, %[[offset:.+]], %[[size:.+]]:2, %[[stride:.+]]:2 = memref.extract_strided_metadata %[[arg0]]
 // CHECK-NEXT: %[[reinterp:.+]] = memref.reinterpret_cast %[[cast]]
@@ -17,11 +17,11 @@ func.func @resolve_metadata_no_offset_reset(%arg0: !tSrc) -> (!tRes, index, inde
 
 // -----
 
-!tSrc = memref<?x?xi32, strided<[?, ?], offset: ?>>
+!tSrc = memref<?x?xi32, strided<[?, ?]>>
 !tDst = memref<?x?xi32, strided<[?, ?]>, #amdgpu.address_space<fat_raw_buffer>>
 !tRes = memref<i32, #amdgpu.address_space<fat_raw_buffer>>
 // CHECK-LABEL: @resolve_metadata_offset_reset
-// CHECK-SAME: (%[[arg0:.*]]: memref<?x?xi32, strided<[?, ?], offset: ?>>)
+// CHECK-SAME: (%[[arg0:.*]]: memref<?x?xi32, strided<[?, ?]>>)
 // CHECK-NEXT: %[[offset:.+]] = arith.constant 0 : index
 // CHECK-NEXT: %[[cast:.+]] = amdgpu.fat_raw_buffer_cast %[[arg0]]
 // CHECK-NEXT: %{{.+}}, %{{.+}}, %[[size:.+]]:2, %[[stride:.+]]:2 = memref.extract_strided_metadata %[[arg0]]
@@ -35,11 +35,11 @@ func.func @resolve_metadata_offset_reset(%arg0: !tSrc) -> (!tRes, index, index,
 
 // -----
 
-!tSrc = memref<?x?xi32, strided<[?, ?], offset: ?>>
+!tSrc = memref<?x?xi32, strided<[?, ?]>>
 !tDst = memref<?x?xi32, strided<[?, ?]>, #amdgpu.address_space<fat_raw_buffer>>
 !tRes = memref<i32, #amdgpu.address_space<fat_raw_buffer>>
 // CHECK-LABEL: @resolve_metadata_no_base_ptr
-// CHECK-SAME: (%[[arg0:.*]]: memref<?x?xi32, strided<[?, ?], offset: ?>>)
+// CHECK-SAME: (%[[arg0:.*]]: memref<?x?xi32, strided<[?, ?]>>)
 // CHECK-NEXT: %[[offset:.+]] = arith.constant 0 : index
 // CHECK-NEXT: %[[cast:.+]] = amdgpu.fat_raw_buffer_cast %[[arg0]]
 // CHECK-NEXT: %{{.+}}, %{{.+}}, %[[size:.+]]:2, %[[stride:.+]]:2 = memref.extract_strided_metadata %[[arg0]]
diff --git a/mlir/test/Dialect/AMDGPU/invalid.mlir b/mlir/test/Dialect/AMDGPU/invalid.mlir
index d7d449bd8a579..f00f78465d1dc 100644
--- a/mlir/test/Dialect/AMDGPU/invalid.mlir
+++ b/mlir/test/Dialect/AMDGPU/invalid.mlir
@@ -221,9 +221,9 @@ func.func @wmma_unsignedB_float(%arg0 : vector<8xf16>, %arg1 : vector<8xf32>) ->
 // -----
 
 // Missing `resetOffset`
-func.func @fat_raw_buffer_cast_stripped_offset(%m: memref<8xi32, strided<[1], offset: ?>, #gpu.address_space<global>>) -> memref<8xi32, #amdgpu.address_space<fat_raw_buffer>> {
-  // expected-error at +1 {{'amdgpu.fat_raw_buffer_cast' op expected result type to be 'memref<8xi32, strided<[1], offset: ?>, #amdgpu.address_space<fat_raw_buffer>>' but got 'memref<8xi32, #amdgpu.address_space<fat_raw_buffer>>'}}
-  %ret = amdgpu.fat_raw_buffer_cast %m : memref<8xi32, strided<[1], offset: ?>, #gpu.address_space<global>> to memref<8xi32, #amdgpu.address_space<fat_raw_buffer>>
+func.func @fat_raw_buffer_cast_stripped_offset(%m: memref<8xi32, strided<[1]>, #gpu.address_space<global>>) -> memref<8xi32, #amdgpu.address_space<fat_raw_buffer>> {
+  // expected-error at +1 {{'amdgpu.fat_raw_buffer_cast' op expected result type to be 'memref<8xi32, strided<[1]>, #amdgpu.address_space<fat_raw_buffer>>' but got 'memref<8xi32, #amdgpu.address_space<fat_raw_buffer>>'}}
+  %ret = amdgpu.fat_raw_buffer_cast %m : memref<8xi32, strided<[1]>, #gpu.address_space<global>> to memref<8xi32, #amdgpu.address_space<fat_raw_buffer>>
   func.return %ret : memref<8xi32, #amdgpu.address_space<fat_raw_buffer>>
 }
 
diff --git a/mlir/test/Dialect/AMDGPU/ops.mlir b/mlir/test/Dialect/AMDGPU/ops.mlir
index 6f4dd486610cc..5ba7df6890296 100644
--- a/mlir/test/Dialect/AMDGPU/ops.mlir
+++ b/mlir/test/Dialect/AMDGPU/ops.mlir
@@ -415,53 +415,53 @@ func.func @fat_raw_buffer_cast_easy(%m: memref<8xi32>) -> memref<8xi32, #amdgpu.
 // CHECK-SAME: cacheSwizzleStride(%{{[^)]*}})
 // CHECK-SAME: boundsCheck(false)
 // CHECK-SAME: resetOffset
-func.func @fat_raw_buffer_cast(%m: memref<8xi32, strided<[1], offset: ?>>, %validBytes: i64, %cacheSwizzle: i14) -> memref<8xi32, #amdgpu.address_space<fat_raw_buffer>> {
+func.func @fat_raw_buffer_cast(%m: memref<8xi32, strided<[1]>>, %validBytes: i64, %cacheSwizzle: i14) -> memref<8xi32, #amdgpu.address_space<fat_raw_buffer>> {
   %ret = amdgpu.fat_raw_buffer_cast %m validBytes(%validBytes) cacheSwizzleStride(%cacheSwizzle) boundsCheck(false) resetOffset
-    : memref<8xi32, strided<[1], offset: ?>> to memref<8xi32, #amdgpu.address_space<fat_raw_buffer>>
+    : memref<8xi32, strided<[1]>> to memref<8xi32, #amdgpu.address_space<fat_raw_buffer>>
   func.return %ret : memref<8xi32, #amdgpu.address_space<fat_raw_buffer>>
 }
 
 // CHECK-LABEL: func @fat_raw_buffer_cast_dynamic_1d_reset_offset
 // CHECK: amdgpu.fat_raw_buffer_cast
-func.func @fat_raw_buffer_cast_dynamic_1d_reset_offset(%m: memref<?xi32, strided<[1], offset: ?>>) -> memref<?xi32, #amdgpu.address_space<fat_raw_buffer>> {
+func.func @fat_raw_buffer_cast_dynamic_1d_reset_offset(%m: memref<?xi32, strided<[1]>>) -> memref<?xi32, #amdgpu.address_space<fat_raw_buffer>> {
   %ret = amdgpu.fat_raw_buffer_cast %m resetOffset
-    : memref<?xi32, strided<[1], offset: ?>> to memref<?xi32, #amdgpu.address_space<fat_raw_buffer>>
+    : memref<?xi32, strided<[1]>> to memref<?xi32, #amdgpu.address_space<fat_raw_buffer>>
   func.return %ret : memref<?xi32, #amdgpu.address_space<fat_raw_buffer>>
 }
 
 // CHECK-LABEL: func @fat_raw_buffer_cast_dynamic_0d_reset_offset
 // CHECK: %[[ret:.+]] = amdgpu.fat_raw_buffer_cast
 // CHECK: return %[[ret]]
-func.func @fat_raw_buffer_cast_dynamic_0d_reset_offset(%m: memref<i32, strided<[], offset: ?>>) -> memref<i32, #amdgpu.address_space<fat_raw_buffer>> {
+func.func @fat_raw_buffer_cast_dynamic_0d_reset_offset(%m: memref<i32, strided<[]>>) -> memref<i32, #amdgpu.address_space<fat_raw_buffer>> {
   %ret = amdgpu.fat_raw_buffer_cast %m resetOffset
-    : memref<i32, strided<[], offset: ?>> to memref<i32, #amdgpu.address_space<fat_raw_buffer>>
+    : memref<i32, strided<[]>> to memref<i32, #amdgpu.address_space<fat_raw_buffer>>
   func.return %ret : memref<i32, #amdgpu.address_space<fat_raw_buffer>>
 }
 
 // CHECK-LABEL: func @fat_raw_buffer_cast_static_shape_2d_reset_offset
 // CHECK: %[[ret:.+]] = amdgpu.fat_raw_buffer_cast
 // CHECK: return %[[ret]]
-func.func @fat_raw_buffer_cast_static_shape_2d_reset_offset(%m: memref<4x4xi32, strided<[4, 1], offset: ?>>) -> memref<4x4xi32, #amdgpu.address_space<fat_raw_buffer>> {
+func.func @fat_raw_buffer_cast_static_shape_2d_reset_offset(%m: memref<4x4xi32, strided<[4, 1]>>) -> memref<4x4xi32, #amdgpu.address_space<fat_raw_buffer>> {
   %ret = amdgpu.fat_raw_buffer_cast %m resetOffset
-    : memref<4x4xi32, strided<[4, 1], offset: ?>> to memref<4x4xi32, #amdgpu.address_space<fat_raw_buffer>>
+    : memref<4x4xi32, strided<[4, 1]>> to memref<4x4xi32, #amdgpu.address_space<fat_raw_buffer>>
   func.return %ret : memref<4x4xi32, #amdgpu.address_space<fat_raw_buffer>>
 }
 
 // CHECK-LABEL: func @fat_raw_buffer_cast_dynamic_2d_reset_offset
 // CHECK: %[[ret:.+]] = amdgpu.fat_raw_buffer_cast
 // CHECK: return %[[ret]]
-func.func @fat_raw_buffer_cast_dynamic_2d_reset_offset(%m: memref<?x?xi32, strided<[?, 1], offset: ?>>) -> memref<?x?xi32, strided<[?, 1]>, #amdgpu.address_space<fat_raw_buffer>> {
+func.func @fat_raw_buffer_cast_dynamic_2d_reset_offset(%m: memref<?x?xi32, strided<[?, 1]>>) -> memref<?x?xi32, strided<[?, 1]>, #amdgpu.address_space<fat_raw_buffer>> {
   %ret = amdgpu.fat_raw_buffer_cast %m resetOffset
-    : memref<?x?xi32, strided<[?, 1], offset: ?>> to memref<?x?xi32, strided<[?, 1]>, #amdgpu.address_space<fat_raw_buffer>>
+    : memref<?x?xi32, strided<[?, 1]>> to memref<?x?xi32, strided<[?, 1]>, #amdgpu.address_space<fat_raw_buffer>>
   func.return %ret : memref<?x?xi32, strided<[?, 1]>, #amdgpu.address_space<fat_raw_buffer>>
 }
 
 // CHECK-LABEL: func @fat_raw_buffer_cast_noncontiguous_2d_reset_offset
 // CHECK: %[[ret:.+]] = amdgpu.fat_raw_buffer_cast
 // CHECK: return %[[ret]]
-func.func @fat_raw_buffer_cast_noncontiguous_2d_reset_offset(%m: memref<4x4xi32, strided<[8, 1], offset: ?>>) -> memref<4x4xi32, strided<[8, 1]>, #amdgpu.address_space<fat_raw_buffer>> {
+func.func @fat_raw_buffer_cast_noncontiguous_2d_reset_offset(%m: memref<4x4xi32, strided<[8, 1]>>) -> memref<4x4xi32, strided<[8, 1]>, #amdgpu.address_space<fat_raw_buffer>> {
   %ret = amdgpu.fat_raw_buffer_cast %m resetOffset
-    : memref<4x4xi32, strided<[8, 1], offset: ?>> to memref<4x4xi32, strided<[8, 1]>, #amdgpu.address_space<fat_raw_buffer>>
+    : memref<4x4xi32, strided<[8, 1]>> to memref<4x4xi32, strided<[8, 1]>, #amdgpu.address_space<fat_raw_buffer>>
   func.return %ret : memref<4x4xi32, strided<[8, 1]>, #amdgpu.address_space<fat_raw_buffer>>
 }
 
diff --git a/mlir/test/Dialect/Affine/fold-memref-alias-ops.mlir b/mlir/test/Dialect/Affine/fold-memref-alias-ops.mlir
index 5e3e107531802..33e12f4c88fb4 100644
--- a/mlir/test/Dialect/Affine/fold-memref-alias-ops.mlir
+++ b/mlir/test/Dialect/Affine/fold-memref-alias-ops.mlir
@@ -6,12 +6,12 @@
 
 // CHECK-LABEL: func @fold_static_stride_subview_with_affine_load_store
 func.func @fold_static_stride_subview_with_affine_load_store(%arg0 : memref<12x32xf32>, %arg1 : index, %arg2 : index, %arg3 : index, %arg4 : index) -> f32 {
-  %0 = memref.subview %arg0[%arg1, %arg2][4, 4][2, 3] : memref<12x32xf32> to memref<4x4xf32, strided<[64, 3], offset: ?>>
-  %1 = affine.load %0[%arg3, %arg4] : memref<4x4xf32, strided<[64, 3], offset: ?>>
+  %0 = memref.subview %arg0[%arg1, %arg2][4, 4][2, 3] : memref<12x32xf32> to memref<4x4xf32, strided<[64, 3]>>
+  %1 = affine.load %0[%arg3, %arg4] : memref<4x4xf32, strided<[64, 3]>>
   // CHECK-NEXT: affine.apply
   // CHECK-NEXT: affine.apply
   // CHECK-NEXT: affine.load
-  affine.store %1, %0[%arg3, %arg4] : memref<4x4xf32, strided<[64, 3], offset: ?>>
+  affine.store %1, %0[%arg3, %arg4] : memref<4x4xf32, strided<[64, 3]>>
   // CHECK-NEXT: affine.apply
   // CHECK-NEXT: affine.apply
   // CHECK-NEXT: affine.store
@@ -93,14 +93,14 @@ func.func @fold_static_stride_subview_with_affine_load_store_expand_shape_3d(%ar
 // CHECK-LABEL: fold_memref_alias_expand_shape_subview_load_store_dynamic_dim
 // CHECK-SAME: (%[[ARG0:.*]]: memref<2048x16xf32>, %[[ARG1:.*]]: index, %[[ARG2:.*]]: index, %[[ARG3:.*]]: index, %[[ARG4:.*]]: index)
 func.func @fold_memref_alias_expand_shape_subview_load_store_dynamic_dim(%alloc: memref<2048x16xf32>, %c10: index, %c5: index, %c0: index, %sz0: index) {
-  %subview = memref.subview %alloc[%c5, 0] [%c10, 16] [1, 1] : memref<2048x16xf32> to memref<?x16xf32, strided<[16, 1], offset: ?>>
-  %expand_shape = memref.expand_shape %subview [[0], [1, 2, 3]] output_shape [%sz0, 1, 8, 2] : memref<?x16xf32, strided<[16, 1], offset: ?>> into memref<?x1x8x2xf32, strided<[16, 16, 2, 1], offset: ?>>
-  %dim = memref.dim %expand_shape, %c0 : memref<?x1x8x2xf32, strided<[16, 16, 2, 1], offset: ?>>
+  %subview = memref.subview %alloc[%c5, 0] [%c10, 16] [1, 1] : memref<2048x16xf32> to memref<?x16xf32, strided<[16, 1]>>
+  %expand_shape = memref.expand_shape %subview [[0], [1, 2, 3]] output_shape [%sz0, 1, 8, 2] : memref<?x16xf32, strided<[16, 1]>> into memref<?x1x8x2xf32, strided<[16, 16, 2, 1]>>
+  %dim = memref.dim %expand_shape, %c0 : memref<?x1x8x2xf32, strided<[16, 16, 2, 1]>>
 
   affine.for %arg6 = 0 to %dim step 64 {
     affine.for %arg7 = 0 to 16 step 16 {
-      %dummy_load = affine.load %expand_shape[%arg6, 0, %arg7, %arg7] : memref<?x1x8x2xf32, strided<[16, 16, 2, 1], offset: ?>>
-      affine.store %dummy_load, %subview[%arg6, %arg7] : memref<?x16xf32, strided<[16, 1], offset: ?>>
+      %dummy_load = affine.load %expand_shape[%arg6, 0, %arg7, %arg7] : memref<?x1x8x2xf32, strided<[16, 16, 2, 1]>>
+      affine.store %dummy_load, %subview[%arg6, %arg7] : memref<?x16xf32, strided<[16, 1]>>
     }
   }
   return
@@ -108,7 +108,7 @@ func.func @fold_memref_alias_expand_shape_subview_load_store_dynamic_dim(%alloc:
 // CHECK-NEXT:   %[[C0:.*]] = arith.constant 0
 // CHECK-NEXT:   memref.subview
 // CHECK-NEXT:   %[[EXPAND_SHAPE:.*]] = memref.expand_shape
-// CHECK-NEXT:   %[[DIM:.*]] = memref.dim %[[EXPAND_SHAPE]], %[[ARG3]] : memref<?x1x8x2xf32, strided<[16, 16, 2, 1], offset: ?>>
+// CHECK-NEXT:   %[[DIM:.*]] = memref.dim %[[EXPAND_SHAPE]], %[[ARG3]] : memref<?x1x8x2xf32, strided<[16, 16, 2, 1]>>
 // CHECK-NEXT:   affine.for %[[ARG5:.*]] = 0 to %[[DIM]] step 64 {
 // CHECK-NEXT:   affine.for %[[ARG6:.*]] = 0 to 16 step 16 {
 // CHECK-NEXT:   %[[VAL0:.*]] = affine.linearize_index disjoint [%[[C0]], %[[ARG6]], %[[ARG6]]] by (1, 8, 2)
diff --git a/mlir/test/Dialect/Affine/loop-fusion-4.mlir b/mlir/test/Dialect/Affine/loop-fusion-4.mlir
index cf530016c201a..db054e705a42d 100644
--- a/mlir/test/Dialect/Affine/loop-fusion-4.mlir
+++ b/mlir/test/Dialect/Affine/loop-fusion-4.mlir
@@ -439,7 +439,7 @@ func.func @non_int_memory_space() {
 // (reduction along %arg4) and fuse.
 
 // PRODUCER-CONSUMER-LABEL: func @slice_compute_check
-func.func @slice_compute_check(%arg0: memref<1x8x26xi32, strided<[?, ?, ?], offset: ?>>, %arg1: memref<1x8x26xi32, strided<[?, ?, ?], offset: ?>>, %arg2: memref<1x8x26xi32, strided<[?, ?, ?], offset: ?>>) {
+func.func @slice_compute_check(%arg0: memref<1x8x26xi32, strided<[?, ?, ?]>>, %arg1: memref<1x8x26xi32, strided<[?, ?, ?]>>, %arg2: memref<1x8x26xi32, strided<[?, ?, ?]>>) {
   %alloc_14 = memref.alloc() : memref<1x8x26xi32>
   %alloc_15 = memref.alloc() : memref<1x26xi32>
   affine.for %arg3 = 0 to 1 {
@@ -690,8 +690,8 @@ module {
       }
     }
     %alloc_3 = memref.alloc() {alignment = 64 : i64} : memref<3x10x7x6xf32>
-    %subview = memref.subview %alloc_3[0, 2, 1, 0] [3, 7, 5, 6] [1, 1, 1, 1] : memref<3x10x7x6xf32> to memref<3x7x5x6xf32, strided<[420, 42, 6, 1], offset: 90>>
-    memref.copy %alloc, %subview : memref<3x7x5x6xf32> to memref<3x7x5x6xf32, strided<[420, 42, 6, 1], offset: 90>>
+    %subview = memref.subview %alloc_3[0, 2, 1, 0] [3, 7, 5, 6] [1, 1, 1, 1] : memref<3x10x7x6xf32> to memref<3x7x5x6xf32, strided<[420, 42, 6, 1]>>
+    memref.copy %alloc, %subview : memref<3x7x5x6xf32> to memref<3x7x5x6xf32, strided<[420, 42, 6, 1]>>
     %alloc_4 = memref.alloc() {alignment = 64 : i64} : memref<3x10x3x6x1xf32>
     affine.for %arg0 = 0 to 3 {
       affine.for %arg1 = 0 to 10 {
diff --git a/mlir/test/Dialect/Affine/memref-stride-calculation.mlir b/mlir/test/Dialect/Affine/memref-stride-calculation.mlir
index 29a5f5e0d5f44..c59128a37dd0e 100644
--- a/mlir/test/Dialect/Affine/memref-stride-calculation.mlir
+++ b/mlir/test/Dialect/Affine/memref-stride-calculation.mlir
@@ -17,7 +17,7 @@ func.func @f(%0: index) {
 
   %11 = memref.alloc() : memref<3x4x5xf32, affine_map<(i, j, k)->(i, j, k)>>
 // CHECK: MemRefType offset: 0 strides: 20, 5, 1
-  %b11 = memref.alloc() : memref<3x4x5xf32, strided<[20, 5, 1], offset: 0>>
+  %b11 = memref.alloc() : memref<3x4x5xf32, strided<[20, 5, 1]>>
 // CHECK: MemRefType offset: 0 strides: 20, 5, 1
   %12 = memref.alloc(%0) : memref<3x4x?xf32, affine_map<(i, j, k)->(i, j, k)>>
 // CHECK: MemRefType offset: 0 strides: ?, ?, 1
@@ -34,19 +34,19 @@ func.func @f(%0: index) {
 // CHECK: MemRefType offset: 1 strides: 32, 16, ?
   %22 = memref.alloc()[%0] : memref<3x4x5xf32, affine_map<(i, j, k)[M]->(32 * i + M * j + 16 * k + 3)>>
 // CHECK: MemRefType offset: 3 strides: 32, ?, 16
-  %b22 = memref.alloc(%0)[%0, %0] : memref<3x4x?xf32, strided<[?, ?, 1], offset: 0>>
+  %b22 = memref.alloc(%0)[%0, %0] : memref<3x4x?xf32, strided<[?, ?, 1]>>
 // CHECK: MemRefType offset: 0 strides: ?, ?, 1
   %23 = memref.alloc(%0)[%0] : memref<3x?x5xf32, affine_map<(i, j, k)[M]->(M * i + 32 * j + 16 * k + 7)>>
 // CHECK: MemRefType offset: 7 strides: ?, 32, 16
-  %b23 = memref.alloc(%0)[%0] : memref<3x?x5xf32, strided<[?, 5, 1], offset: 0>>
+  %b23 = memref.alloc(%0)[%0] : memref<3x?x5xf32, strided<[?, 5, 1]>>
 // CHECK: MemRefType offset: 0 strides: ?, 5, 1
   %24 = memref.alloc(%0)[%0] : memref<3x?x5xf32, affine_map<(i, j, k)[M]->(M * i + 32 * j + 16 * k + M)>>
 // CHECK: MemRefType offset: ? strides: ?, 32, 16
-  %b24 = memref.alloc(%0)[%0, %0] : memref<3x?x5xf32, strided<[?, 32, 16], offset: ?>>
+  %b24 = memref.alloc(%0)[%0, %0] : memref<3x?x5xf32, strided<[?, 32, 16]>>
 // CHECK: MemRefType offset: ? strides: ?, 32, 16
   %25 = memref.alloc(%0, %0)[%0, %0] : memref<?x?x16xf32, affine_map<(i, j, k)[M, N]->(M * i + N * j + k + 1)>>
 // CHECK: MemRefType offset: 1 strides: ?, ?, 1
-  %b25 = memref.alloc(%0, %0)[%0, %0] : memref<?x?x16xf32, strided<[?, ?, 1], offset: 1>>
+  %b25 = memref.alloc(%0, %0)[%0, %0] : memref<?x?x16xf32, strided<[?, ?, 1]>>
 // CHECK: MemRefType offset: 1 strides: ?, ?, 1
   %26 = memref.alloc(%0)[] : memref<?xf32, affine_map<(i)[M]->(i)>>
 // CHECK: MemRefType offset: 0 strides: 1
diff --git a/mlir/test/Dialect/Affine/ops.mlir b/mlir/test/Dialect/Affine/ops.mlir
index 35b07c1c7fe1f..53c089eca20e8 100644
--- a/mlir/test/Dialect/Affine/ops.mlir
+++ b/mlir/test/Dialect/Affine/ops.mlir
@@ -109,8 +109,8 @@ func.func @valid_symbols(%arg0: index, %arg1: index, %arg2: index) {
     affine.for %arg4 = 0 to %13 step 264 {
       %18 = memref.dim %0, %c0 : memref<?x?xf32>
       %20 = memref.subview %0[%c0, %c0][%18,%arg4][%c1,%c1] : memref<?x?xf32>
-                          to memref<?x?xf32, strided<[?, ?], offset: ?>>
-      %24 = memref.dim %20, %c0 : memref<?x?xf32, strided<[?, ?], offset: ?>>
+                          to memref<?x?xf32, strided<[?, ?]>>
+      %24 = memref.dim %20, %c0 : memref<?x?xf32, strided<[?, ?]>>
       affine.for %arg5 = 0 to %24 step 768 {
         "foo"() : () -> ()
       }
diff --git a/mlir/test/Dialect/ArmSME/vector-legalization.mlir b/mlir/test/Dialect/ArmSME/vector-legalization.mlir
index 6cdf576272ebc..50a94449cf37d 100644
--- a/mlir/test/Dialect/ArmSME/vector-legalization.mlir
+++ b/mlir/test/Dialect/ArmSME/vector-legalization.mlir
@@ -415,10 +415,10 @@ func.func @lift_illegal_transpose_to_memory(%a: index, %b: index, %memref: memre
   // CHECK-DAG: %[[C0_F32:.*]] = arith.constant 0.000000e+00 : f32
   // CHECK-DAG: %[[VSCALE:.*]] = vector.vscale
   // CHECK-DAG: %[[C8_VSCALE:.*]] = arith.muli %[[VSCALE]], %[[C8]] : index
-  // CHECK-NEXT: %[[READ_SUBVIEW:.*]] = memref.subview %[[MEMREF]][%[[INDEXA]], %[[INDEXB]]] [%[[C8_VSCALE]], 4] [1, 1] : memref<?x?xf32> to memref<?x4xf32, strided<[?, 1], offset: ?>>
-  // CHECK-NEXT: %[[CAST:.*]] = memref.cast %[[READ_SUBVIEW]] : memref<?x4xf32, strided<[?, 1], offset: ?>> to memref<?x?xf32, strided<[?, ?], offset: ?>>
-  // CHECK-NEXT: %[[TRANSPOSE:.*]] = memref.transpose %[[CAST]] (d0, d1) -> (d1, d0) : memref<?x?xf32, strided<[?, ?], offset: ?>> to memref<?x?xf32, strided<[?, ?], offset: ?>>
-  // CHECK-NEXT: %[[LEGAL_READ:.*]]  = vector.transfer_read %[[TRANSPOSE]][%c0, %c0], %[[C0_F32]] : memref<?x?xf32, strided<[?, ?], offset: ?>>, vector<4x[8]xf32>
+  // CHECK-NEXT: %[[READ_SUBVIEW:.*]] = memref.subview %[[MEMREF]][%[[INDEXA]], %[[INDEXB]]] [%[[C8_VSCALE]], 4] [1, 1] : memref<?x?xf32> to memref<?x4xf32, strided<[?, 1]>>
+  // CHECK-NEXT: %[[CAST:.*]] = memref.cast %[[READ_SUBVIEW]] : memref<?x4xf32, strided<[?, 1]>> to memref<?x?xf32, strided<[?, ?]>>
+  // CHECK-NEXT: %[[TRANSPOSE:.*]] = memref.transpose %[[CAST]] (d0, d1) -> (d1, d0) : memref<?x?xf32, strided<[?, ?]>> to memref<?x?xf32, strided<[?, ?]>>
+  // CHECK-NEXT: %[[LEGAL_READ:.*]]  = vector.transfer_read %[[TRANSPOSE]][%c0, %c0], %[[C0_F32]] : memref<?x?xf32, strided<[?, ?]>>, vector<4x[8]xf32>
   // CHECK-NEXT: return %[[LEGAL_READ]]
   %pad = arith.constant 0.0 : f32
   %illegalRead = vector.transfer_read %memref[%a, %b], %pad : memref<?x?xf32>, vector<[8]x4xf32>
@@ -438,7 +438,7 @@ func.func @lift_illegal_transpose_to_memory_with_mask(%dim0: index, %dim1: index
   // CHECK-DAG: %[[TRANSPOSE:.*]] = memref.transpose %[[CAST]]
   // CHECK-DAG: %[[MASK:.*]] = vector.create_mask %[[DIM1]], %[[DIM0]] : vector<4x[8]xi1>
   // CHECK:     %[[LEGAL_READ:.*]] = vector.transfer_read %[[TRANSPOSE]]
-  // CHECK-SAME:                       %[[MASK]] : memref<?x?xf32, strided<[?, ?], offset: ?>>, vector<4x[8]xf32>
+  // CHECK-SAME:                       %[[MASK]] : memref<?x?xf32, strided<[?, ?]>>, vector<4x[8]xf32>
   // CHECK-NEXT: return %[[LEGAL_READ]]
   %pad = arith.constant 0.0 : f32
   %mask = vector.create_mask %dim0, %dim1 : vector<[8]x4xi1>
diff --git a/mlir/test/Dialect/Bufferization/Transforms/OwnershipBasedBufferDeallocation/dealloc-subviews.mlir b/mlir/test/Dialect/Bufferization/Transforms/OwnershipBasedBufferDeallocation/dealloc-subviews.mlir
index 35523319de154..a5deaa95c3f7c 100644
--- a/mlir/test/Dialect/Bufferization/Transforms/OwnershipBasedBufferDeallocation/dealloc-subviews.mlir
+++ b/mlir/test/Dialect/Bufferization/Transforms/OwnershipBasedBufferDeallocation/dealloc-subviews.mlir
@@ -6,12 +6,12 @@
 
 // CHECK-LABEL: func @subview
 func.func @subview(%arg0 : index, %arg1 : index, %arg2 : memref<?x?xf32>) {
-  %0 = memref.alloc() : memref<64x4xf32, strided<[4, 1], offset: 0>>
+  %0 = memref.alloc() : memref<64x4xf32, strided<[4, 1]>>
   %1 = memref.subview %0[%arg0, %arg1][%arg0, %arg1][%arg0, %arg1] :
-    memref<64x4xf32, strided<[4, 1], offset: 0>>
-  to memref<?x?xf32, strided<[?, ?], offset: ?>>
+    memref<64x4xf32, strided<[4, 1]>>
+  to memref<?x?xf32, strided<[?, ?]>>
   test.copy(%1, %arg2) :
-    (memref<?x?xf32, strided<[?, ?], offset: ?>>, memref<?x?xf32>)
+    (memref<?x?xf32, strided<[?, ?]>>, memref<?x?xf32>)
   return
 }
 
diff --git a/mlir/test/Dialect/Bufferization/Transforms/buffer-deallocation-simplification.mlir b/mlir/test/Dialect/Bufferization/Transforms/buffer-deallocation-simplification.mlir
index b40a17cf800bf..14bbe4813628e 100644
--- a/mlir/test/Dialect/Bufferization/Transforms/buffer-deallocation-simplification.mlir
+++ b/mlir/test/Dialect/Bufferization/Transforms/buffer-deallocation-simplification.mlir
@@ -6,9 +6,9 @@ func.func @dealloc_deallocated_in_retained(%arg0: memref<2xi32>, %arg1: i1, %arg
   %2:2 = bufferization.dealloc (%arg0 : memref<2xi32>) if (%arg1) retain (%arg0, %arg2 : memref<2xi32>, memref<2xi32>)
   // multiple must-alias
   %3 = memref.subview %arg0[0][1][1] : memref<2xi32> to memref<i32>
-  %4 = memref.subview %arg0[1][1][1] : memref<2xi32> to memref<1xi32, strided<[1], offset: 1>>
+  %4 = memref.subview %arg0[1][1][1] : memref<2xi32> to memref<1xi32, strided<[1]>>
   %alloc = memref.alloc() : memref<2xi32>
-  %5:3 = bufferization.dealloc (%arg0, %4 : memref<2xi32>, memref<1xi32, strided<[1], offset: 1>>) if (%arg1, %arg3) retain (%arg0, %alloc, %3 : memref<2xi32>, memref<2xi32>, memref<i32>)
+  %5:3 = bufferization.dealloc (%arg0, %4 : memref<2xi32>, memref<1xi32, strided<[1]>>) if (%arg1, %arg3) retain (%arg0, %alloc, %3 : memref<2xi32>, memref<2xi32>, memref<i32>)
   return %0, %1, %2#0, %2#1, %5#0, %5#1, %5#2 : i1, i1, i1, i1, i1, i1, i1
 }
 
@@ -37,9 +37,9 @@ func.func @dealloc_deallocated_in_retained_extract_base_memref(%arg0: memref<2xi
   %2:2 = bufferization.dealloc (%base_buffer : memref<i32>) if (%arg1) retain (%arg0, %arg2 : memref<2xi32>, memref<2xi32>)
   // multiple must-alias
   %3 = memref.subview %arg0[0][1][1] : memref<2xi32> to memref<i32>
-  %4 = memref.subview %arg0[1][1][1] : memref<2xi32> to memref<1xi32, strided<[1], offset: 1>>
+  %4 = memref.subview %arg0[1][1][1] : memref<2xi32> to memref<1xi32, strided<[1]>>
   %alloc = memref.alloc() : memref<2xi32>
-  %5:3 = bufferization.dealloc (%base_buffer, %4 : memref<i32>, memref<1xi32, strided<[1], offset: 1>>) if (%arg1, %arg3) retain (%arg0, %alloc, %3 : memref<2xi32>, memref<2xi32>, memref<i32>)
+  %5:3 = bufferization.dealloc (%base_buffer, %4 : memref<i32>, memref<1xi32, strided<[1]>>) if (%arg1, %arg3) retain (%arg0, %alloc, %3 : memref<2xi32>, memref<2xi32>, memref<i32>)
   return %0, %1, %2#0, %2#1, %5#0, %5#1, %5#2 : i1, i1, i1, i1, i1, i1, i1
 }
 
diff --git a/mlir/test/Dialect/Bufferization/Transforms/drop-equivalent-buffer-results.mlir b/mlir/test/Dialect/Bufferization/Transforms/drop-equivalent-buffer-results.mlir
index b20188af43bf5..a6681b882a7fa 100644
--- a/mlir/test/Dialect/Bufferization/Transforms/drop-equivalent-buffer-results.mlir
+++ b/mlir/test/Dialect/Bufferization/Transforms/drop-equivalent-buffer-results.mlir
@@ -6,14 +6,14 @@
 // CHECK-LABEL: func private @single_buffer_return({{.*}}) {
 // CHECK: return
 
-!type = memref<?xf32, strided<[?], offset: ?>>
+!type = memref<?xf32, strided<[?]>>
 func.func private @single_buffer_return(%buf: !type, %val: f32, %idx: index) -> !type {
   memref.store %val, %buf[%idx] : !type
   return %buf : !type
 }
 
 // CHECK-LABEL: func @caller(
-// CHECK-SAME:     %[[BUF:.+]]: memref<?xf32, strided<[?], offset: ?>>,
+// CHECK-SAME:     %[[BUF:.+]]: memref<?xf32, strided<[?]>>,
 // CHECK: call @single_buffer_return(%[[BUF]]{{.*}}-> ()
 // CHECK: %[[LOADED:.+]] = memref.load %[[BUF]]
 // CHECK: return %[[LOADED]]
@@ -29,7 +29,7 @@ func.func @caller(%buf: !type, %val: f32, %idx: index) -> f32 {
 // CHECK-LABEL: func private @multiple_buffer_returns({{.*}}) {
 // CHECK: return
 
-!type = memref<?xf32, strided<[?], offset: ?>>
+!type = memref<?xf32, strided<[?]>>
 !type1 = memref<?x?xf32>
 func.func private @multiple_buffer_returns(
     %buf: !type, %buf1: !type1, %val: f32, %idx: index) -> (!type1, !type) {
@@ -44,7 +44,7 @@ func.func private @multiple_buffer_returns(
 // CHECK: %[[CST:.+]] = arith.constant 1 : i32
 // CHECK: return %[[CST]] : i32
 
-!type = memref<?xf32, strided<[?], offset: ?>>
+!type = memref<?xf32, strided<[?]>>
 !type1 = memref<?x?xf32>
 func.func private @multiple_mixed_returns(
     %buf: !type, %buf1: !type1, %val: f32, %idx: index) -> (!type1, i32, !type) {
@@ -58,17 +58,17 @@ func.func private @multiple_mixed_returns(
 
 // Ensure public functions remain unchanged by default.
 // CHECK-LABEL: func @public_function(
-// CHECK-SAME:    %[[BUF:.+]]: memref<?xf32, strided<[?], offset: ?>>,
-// CHECK-SAME:    ) -> memref<?xf32, strided<[?], offset: ?>> {
+// CHECK-SAME:    %[[BUF:.+]]: memref<?xf32, strided<[?]>>,
+// CHECK-SAME:    ) -> memref<?xf32, strided<[?]>> {
 // CHECK: return %[[BUF]]
 
 // When explicitly requested, public functions can be modified.
 // MODIFY-PUBLIC-LABEL: func @public_function(
-// MODIFY-PUBLIC-SAME:    %[[BUF:.+]]: memref<?xf32, strided<[?], offset: ?>>,
+// MODIFY-PUBLIC-SAME:    %[[BUF:.+]]: memref<?xf32, strided<[?]>>,
 // MODIFY-PUBLIC-SAME:    ) {
 // MODIFY-PUBLIC: return
 
-!type = memref<?xf32, strided<[?], offset: ?>>
+!type = memref<?xf32, strided<[?]>>
 func.func @public_function(
     %buf: !type, %val: f32, %idx: index) -> !type {
   memref.store %val, %buf[%idx] : !type
@@ -76,13 +76,13 @@ func.func @public_function(
 }
 
 // CHECK-LABEL: func @caller(
-// CHECK-SAME:     %[[IN_BUF:.+]]: memref<?xf32, strided<[?], offset: ?>>,
+// CHECK-SAME:     %[[IN_BUF:.+]]: memref<?xf32, strided<[?]>>,
 // CHECK: %[[RET_VAL:.+]] = call @public_function(%[[IN_BUF]]{{.*}}-> memref
 // CHECK: %[[LOADED:.+]] = memref.load %[[RET_VAL]]
 // CHECK: return %[[LOADED]]
 
 // MODIFY-PUBLIC-LABEL: func @caller(
-// MODIFY-PUBLIC-SAME:    %[[IN_BUF:.+]]: memref<?xf32, strided<[?], offset: ?>>,
+// MODIFY-PUBLIC-SAME:    %[[IN_BUF:.+]]: memref<?xf32, strided<[?]>>,
 // MODIFY-PUBLIC: call @public_function(%[[IN_BUF]]{{.*}}-> ()
 // MODIFY-PUBLIC: %[[LOADED:.*]] = memref.load %[[IN_BUF]]
 // MODIFY-PUBLIC: return %[[LOADED]]
@@ -96,11 +96,11 @@ func.func @caller(%buf: !type, %val: f32, %idx: index) -> f32 {
 // -----
 
 // CHECK-LABEL: func private @negative_external_function(
-// CHECK-SAME:    -> memref<?xf32, strided<[?], offset: ?>>
+// CHECK-SAME:    -> memref<?xf32, strided<[?]>>
 
 // Ensure external function remains unchanged.
 // MODIFY-PUBLIC-LABEL: func private @negative_external_function(
-// MODIFY-PUBLIC-SAME:    -> memref<?xf32, strided<[?], offset: ?>>
+// MODIFY-PUBLIC-SAME:    -> memref<?xf32, strided<[?]>>
 
-!type = memref<?xf32, strided<[?], offset: ?>>
+!type = memref<?xf32, strided<[?]>>
 func.func private @negative_external_function(%arg0: !type) -> !type
diff --git a/mlir/test/Dialect/Bufferization/Transforms/one-shot-bufferize-empty-tensor-elimination.mlir b/mlir/test/Dialect/Bufferization/Transforms/one-shot-bufferize-empty-tensor-elimination.mlir
index 3929f5be3b4ef..6ef0ad9e30ff6 100644
--- a/mlir/test/Dialect/Bufferization/Transforms/one-shot-bufferize-empty-tensor-elimination.mlir
+++ b/mlir/test/Dialect/Bufferization/Transforms/one-shot-bufferize-empty-tensor-elimination.mlir
@@ -296,7 +296,7 @@ func.func @regression_multiple_insertion_points(%t1: tensor<?x?xf32>) -> tensor<
 // -----
 
 // CHECK-LABEL: func @materialize_in_destination(
-//  CHECK-SAME:     %[[m:.*]]: memref<5xf32, strided<[?], offset: ?>>,
+//  CHECK-SAME:     %[[m:.*]]: memref<5xf32, strided<[?]>>,
 //       CHECK:   linalg.fill {{.*}} outs(%[[m]]
 //       CHECK:   return %[[m]]
 func.func @materialize_in_destination(%t: tensor<5xf32>, %f: f32) -> tensor<5xf32> {
@@ -322,7 +322,7 @@ func.func @materialize_in_destination_buffer(%m: memref<5xf32>, %f: f32) {
 // -----
 
 // CHECK-LABEL: func @linalg_copy(
-//  CHECK-SAME:     %[[m:.*]]: memref<5xf32, strided<[?], offset: ?>>,
+//  CHECK-SAME:     %[[m:.*]]: memref<5xf32, strided<[?]>>,
 //       CHECK:   linalg.fill {{.*}} outs(%[[m]]
 //       CHECK:   return %[[m]]
 func.func @linalg_copy(%t: tensor<5xf32>, %f: f32) -> tensor<5xf32> {
diff --git a/mlir/test/Dialect/Bufferization/Transforms/one-shot-bufferize-encodings.mlir b/mlir/test/Dialect/Bufferization/Transforms/one-shot-bufferize-encodings.mlir
index e97777c3e3d13..061ab2c0d5041 100644
--- a/mlir/test/Dialect/Bufferization/Transforms/one-shot-bufferize-encodings.mlir
+++ b/mlir/test/Dialect/Bufferization/Transforms/one-shot-bufferize-encodings.mlir
@@ -47,9 +47,9 @@ func.func @alloc_tesor_copy_from_default_space(%arg0: tensor<128xf32>) -> tensor
 
 // CHECK-LABEL: @alloc_tesor_copy_from_default_space
 //  CHECK-SAME: (%[[arg0:.+]]: tensor<128xf32>) -> tensor<128xf32> {
-//       CHECK:     %[[v0:.+]] = bufferization.to_buffer %[[arg0]] : tensor<128xf32> to memref<128xf32, strided<[?], offset: ?>>
+//       CHECK:     %[[v0:.+]] = bufferization.to_buffer %[[arg0]] : tensor<128xf32> to memref<128xf32, strided<[?]>>
 //       CHECK:     %[[alloc:.+]] = memref.alloc() {alignment = 64 : i64} : memref<128xf32, 1>
-//       CHECK:     memref.copy %[[v0]], %[[alloc]] : memref<128xf32, strided<[?], offset: ?>> to memref<128xf32, 1>
+//       CHECK:     memref.copy %[[v0]], %[[alloc]] : memref<128xf32, strided<[?]>> to memref<128xf32, 1>
 //       CHECK:     %[[v1:.+]] = bufferization.to_tensor %[[alloc]] : memref<128xf32, 1> to tensor<128xf32>
 //       CHECK:     return %[[v1]] : tensor<128xf32>
 
@@ -63,9 +63,9 @@ func.func @alloc_tesor_copy_from_non_default_space(%arg0: tensor<128xf32, 1>) ->
 
 // CHECK-LABEL: @alloc_tesor_copy_from_non_default_space
 //  CHECK-SAME: (%[[arg0:.+]]: tensor<128xf32, 1 : i64>) -> tensor<128xf32, 2 : i64> {
-//       CHECK:     %[[v0:.+]] = bufferization.to_buffer %[[arg0]] : tensor<128xf32, 1 : i64> to memref<128xf32, strided<[?], offset: ?>, 1>
+//       CHECK:     %[[v0:.+]] = bufferization.to_buffer %[[arg0]] : tensor<128xf32, 1 : i64> to memref<128xf32, strided<[?]>, 1>
 //       CHECK:     %[[alloc:.+]] = memref.alloc() {alignment = 64 : i64} : memref<128xf32, 2>
-//       CHECK:     memref.copy %[[v0]], %[[alloc]] : memref<128xf32, strided<[?], offset: ?>, 1> to memref<128xf32, 2>
+//       CHECK:     memref.copy %[[v0]], %[[alloc]] : memref<128xf32, strided<[?]>, 1> to memref<128xf32, 2>
 //       CHECK:     %[[v1:.+]] = bufferization.to_tensor %[[alloc]] : memref<128xf32, 2> to tensor<128xf32, 2 : i64>
 //       CHECK:     return %[[v1]] : tensor<128xf32, 2 : i64>
 
@@ -82,16 +82,16 @@ func.func @alloc_tesor_copy_from_non_default_space_no_cast(%arg0: tensor<128xf32
 
 // CHECK-LABEL: @alloc_tesor_copy_from_non_default_space_no_cast
 //  CHECK-SAME: (%[[arg0:.+]]: tensor<128xf32, 1 : i64>, %[[arg1:.+]]: tensor<4xf32, 1 : i64>) -> tensor<128xf32, 1 : i64> {
-//       CHECK:     %[[v0:.+]] = bufferization.to_buffer %[[arg1]] : tensor<4xf32, 1 : i64> to memref<4xf32, strided<[?], offset: ?>, 1>
-//       CHECK:     %[[v1:.+]] = bufferization.to_buffer %[[arg0]] : tensor<128xf32, 1 : i64> to memref<128xf32, strided<[?], offset: ?>, 1>
-//       CHECK:     %[[v2:.+]] = bufferization.to_buffer %[[arg0]] : tensor<128xf32, 1 : i64> to memref<128xf32, strided<[?], offset: ?>, 1>
+//       CHECK:     %[[v0:.+]] = bufferization.to_buffer %[[arg1]] : tensor<4xf32, 1 : i64> to memref<4xf32, strided<[?]>, 1>
+//       CHECK:     %[[v1:.+]] = bufferization.to_buffer %[[arg0]] : tensor<128xf32, 1 : i64> to memref<128xf32, strided<[?]>, 1>
+//       CHECK:     %[[v2:.+]] = bufferization.to_buffer %[[arg0]] : tensor<128xf32, 1 : i64> to memref<128xf32, strided<[?]>, 1>
 //       CHECK:     %[[alloc:.+]] = memref.alloc() {alignment = 64 : i64} : memref<128xf32, 2>
-//       CHECK:     memref.copy %[[v2]], %[[alloc]] : memref<128xf32, strided<[?], offset: ?>, 1> to memref<128xf32, 2>
+//       CHECK:     memref.copy %[[v2]], %[[alloc]] : memref<128xf32, strided<[?]>, 1> to memref<128xf32, 2>
 //       CHECK:     %[[v3:.+]] = bufferization.to_tensor %[[alloc]] : memref<128xf32, 2> to tensor<128xf32, 1 : i64>
 //       CHECK:     %[[alloc_0:.+]] = memref.alloc() {alignment = 64 : i64} : memref<128xf32, 1>
-//       CHECK:     memref.copy %[[v1]], %[[alloc_0]] : memref<128xf32, strided<[?], offset: ?>, 1> to memref<128xf32, 1>
+//       CHECK:     memref.copy %[[v1]], %[[alloc_0]] : memref<128xf32, strided<[?]>, 1> to memref<128xf32, 1>
 //       CHECK:     %[[subview:.+]] = memref.subview %[[alloc_0]][0] [4] [1] : memref<128xf32, 1> to memref<4xf32, strided<[1]>, 1>
-//       CHECK:     memref.copy %[[v0]], %[[subview]] : memref<4xf32, strided<[?], offset: ?>, 1> to memref<4xf32, strided<[1]>, 1>
+//       CHECK:     memref.copy %[[v0]], %[[subview]] : memref<4xf32, strided<[?]>, 1> to memref<4xf32, strided<[1]>, 1>
 //       CHECK:     return %[[v3]] : tensor<128xf32, 1 : i64>
 
 // -----
@@ -104,8 +104,8 @@ func.func @materialize_in_destination(%arg0: tensor<128xf32, 1>) -> tensor<128xf
 
 // CHECK-LABEL: @materialize_in_destination
 //  CHECK-SAME: (%[[arg0:.+]]: tensor<128xf32, 1 : i64>) -> tensor<128xf32, 2 : i64> {
-//       CHECK:     %[[v0:.+]] = bufferization.to_buffer %[[arg0]] : tensor<128xf32, 1 : i64> to memref<128xf32, strided<[?], offset: ?>, 1>
+//       CHECK:     %[[v0:.+]] = bufferization.to_buffer %[[arg0]] : tensor<128xf32, 1 : i64> to memref<128xf32, strided<[?]>, 1>
 //       CHECK:     %[[alloc:.+]] = memref.alloc() {alignment = 64 : i64} : memref<128xf32, 2>
-//       CHECK:     memref.copy %[[v0]], %[[alloc]] : memref<128xf32, strided<[?], offset: ?>, 1> to memref<128xf32, 2>
+//       CHECK:     memref.copy %[[v0]], %[[alloc]] : memref<128xf32, strided<[?]>, 1> to memref<128xf32, 2>
 //       CHECK:     %[[v1:.+]] = bufferization.to_tensor %[[alloc]] : memref<128xf32, 2> to tensor<128xf32, 2 : i64>
 //       CHECK:     return %[[v1]] : tensor<128xf32, 2 : i64>
diff --git a/mlir/test/Dialect/Bufferization/Transforms/one-shot-bufferize-partial.mlir b/mlir/test/Dialect/Bufferization/Transforms/one-shot-bufferize-partial.mlir
index 908c760d9a0cd..f008e2b698986 100644
--- a/mlir/test/Dialect/Bufferization/Transforms/one-shot-bufferize-partial.mlir
+++ b/mlir/test/Dialect/Bufferization/Transforms/one-shot-bufferize-partial.mlir
@@ -25,8 +25,8 @@ func.func @use_of_unknown_op_1(%t1: tensor<?xf32>)
 
   %idx = arith.constant 0 : index
   %cst = arith.constant 0.0 : f32
-  // CHECK: %[[dummy_memref:.*]] = bufferization.to_buffer %[[dummy]] : tensor<?xf32> to memref<?xf32, strided<[?], offset: ?>>
-  // CHECK: vector.transfer_read %[[dummy_memref]][%{{.*}}], %{{.*}} : memref<?xf32, strided<[?], offset: ?>>
+  // CHECK: %[[dummy_memref:.*]] = bufferization.to_buffer %[[dummy]] : tensor<?xf32> to memref<?xf32, strided<[?]>>
+  // CHECK: vector.transfer_read %[[dummy_memref]][%{{.*}}], %{{.*}} : memref<?xf32, strided<[?]>>
   // CHECK-NO-LAYOUT-MAP: %[[dummy_memref:.*]] = bufferization.to_buffer %[[dummy]] : tensor<?xf32> to memref<?xf32>
   // CHECK-NO-LAYOUT-MAP: vector.transfer_read %[[dummy_memref]][%{{.*}}], %{{.*}} : memref<?xf32>
   %1 = vector.transfer_read %0[%idx], %cst : tensor<?xf32>, vector<5xf32>
@@ -61,7 +61,7 @@ func.func @use_of_unknown_op_3(%t1: tensor<?xf32>)
 
   // CHECK: %[[dummy:.*]] = "test.dummy_op"(%[[t1]])
   %0 = "test.dummy_op"(%t1) : (tensor<?xf32>) -> tensor<?xf32>
-  // CHECK: %[[dummy_memref:.*]] = bufferization.to_buffer %[[dummy]] : tensor<?xf32> to memref<?xf32, strided<[?], offset: ?>>
+  // CHECK: %[[dummy_memref:.*]] = bufferization.to_buffer %[[dummy]] : tensor<?xf32> to memref<?xf32, strided<[?]>>
   // CHECK: %[[v2:.*]] = vector.transfer_read %[[dummy_memref]]
   %2 = vector.transfer_read %0[%idx], %cst : tensor<?xf32>, vector<5xf32>
 
diff --git a/mlir/test/Dialect/Bufferization/Transforms/one-shot-bufferize.mlir b/mlir/test/Dialect/Bufferization/Transforms/one-shot-bufferize.mlir
index 8031732011839..ded7bee8a38b6 100644
--- a/mlir/test/Dialect/Bufferization/Transforms/one-shot-bufferize.mlir
+++ b/mlir/test/Dialect/Bufferization/Transforms/one-shot-bufferize.mlir
@@ -227,7 +227,7 @@ func.func @tensor_copy(%arg0: tensor<5xf32>) -> tensor<5xf32> {
 
 // CHECK-LABEL: func @materialize_in_destination_buffer(
 //  CHECK-SAME:     %[[t:.*]]: tensor<5xf32>, %[[m:.*]]: memref<5xf32>)
-//       CHECK:   %[[b:.*]] = bufferization.to_buffer %[[t]] : tensor<5xf32> to memref<5xf32, strided<[?], offset: ?>>
+//       CHECK:   %[[b:.*]] = bufferization.to_buffer %[[t]] : tensor<5xf32> to memref<5xf32, strided<[?]>>
 //       CHECK:   memref.copy %[[b]], %[[m]]
 func.func @materialize_in_destination_buffer(%t: tensor<5xf32>, %m: memref<5xf32>) {
   bufferization.materialize_in_destination %t in restrict writable %m
diff --git a/mlir/test/Dialect/Bufferization/Transforms/one-shot-module-bufferize-out-params.mlir b/mlir/test/Dialect/Bufferization/Transforms/one-shot-module-bufferize-out-params.mlir
index 75e9a8926ad15..114ff3a2e1132 100644
--- a/mlir/test/Dialect/Bufferization/Transforms/one-shot-module-bufferize-out-params.mlir
+++ b/mlir/test/Dialect/Bufferization/Transforms/one-shot-module-bufferize-out-params.mlir
@@ -8,8 +8,8 @@
 // Note: This bufferization is not very efficient yet, but it works.
 
 // CHECK-LABEL: func private @callee(
-//  CHECK-SAME:              %[[arg0:.*]]: memref<5xf32, strided<[?], offset: ?>>,
-//  CHECK-SAME:              %[[arg1:.*]]: memref<5xf32, strided<[?], offset: ?>>) {
+//  CHECK-SAME:              %[[arg0:.*]]: memref<5xf32, strided<[?]>>,
+//  CHECK-SAME:              %[[arg1:.*]]: memref<5xf32, strided<[?]>>) {
 // This alloc is not needed, but it is inserted due to the out-of-place
 // bufferization of the tensor.insert. With a better layering of the out param
 // promotion pass, this alloc could be avoided.
@@ -30,7 +30,7 @@
 //       CHECK-NO-LAYOUT:   memref.copy %[[alloc]], %[[arg1]]
 
 // CHECK-BASELINE-LABEL: func private @callee(
-//  CHECK-BASELINE-SAME:     %[[arg0:.*]]: memref<5xf32, strided<[?], offset: ?>>) -> memref<5xf32> {
+//  CHECK-BASELINE-SAME:     %[[arg0:.*]]: memref<5xf32, strided<[?]>>) -> memref<5xf32> {
 //       CHECK-BASELINE:   %[[alloc:.*]] = memref.alloc() {{.*}} : memref<5xf32>
 //       CHECK-BASELINE:   memref.copy %[[arg0]], %[[alloc]]
 //       CHECK-BASELINE:   memref.store {{.*}}, %[[alloc]]
@@ -45,9 +45,9 @@ func.func private @callee(%t: tensor<5xf32>) -> (tensor<5xf32>, tensor<5xf32>) {
   return %t, %1 : tensor<5xf32>, tensor<5xf32>
 }
 
-// CHECK: func @main(%[[arg0:.*]]: memref<5xf32, strided<[?], offset: ?>>) -> (f32, f32) {
+// CHECK: func @main(%[[arg0:.*]]: memref<5xf32, strided<[?]>>) -> (f32, f32) {
 // CHECK:   %[[alloc:.*]] = memref.alloc() : memref<5xf32>
-// CHECK:   %[[casted:.*]] = memref.cast %[[alloc]] : memref<5xf32> to memref<5xf32, strided<[?], offset: ?>>
+// CHECK:   %[[casted:.*]] = memref.cast %[[alloc]] : memref<5xf32> to memref<5xf32, strided<[?]>>
 // CHECK:   call @callee(%[[arg0]], %[[casted]])
 // CHECK:   %[[l1:.*]] = memref.load %[[arg0]]
 // CHECK:   %[[l2:.*]] = memref.load %[[casted]]
@@ -70,9 +70,9 @@ func.func @main(%t: tensor<5xf32>) -> (f32, f32) {
 
 // CHECK-LABEL: func private @callee(
 //  CHECK-SAME:     %{{.*}}: index,
-//  CHECK-SAME:     %[[r:.*]]: memref<2x5xf32, strided<[?, ?], offset: ?>>) {
+//  CHECK-SAME:     %[[r:.*]]: memref<2x5xf32, strided<[?, ?]>>) {
 //       CHECK:   %[[alloc:.*]] = memref.alloc() {{.*}} : memref<10x20xf32>
-//       CHECK:   %[[subview:.*]] = memref.subview %[[alloc]]{{.*}} : memref<10x20xf32> to memref<2x5xf32, strided<[20, 1], offset: ?>>
+//       CHECK:   %[[subview:.*]] = memref.subview %[[alloc]]{{.*}} : memref<10x20xf32> to memref<2x5xf32, strided<[20, 1]>>
 //       CHECK:   %[[casted:.*]] = memref.cast %[[subview]]
 //       CHECK:   memref.copy %[[casted]], %[[r]]
 
@@ -89,7 +89,7 @@ func.func @main(%t: tensor<5xf32>) -> (f32, f32) {
 //       CHECK-NO-LAYOUT:   memref.copy %[[alloc2]], %[[r]]
 
 // CHECK-BASELINE-LABEL: func private @callee(
-//  CHECK-BASELINE-SAME:     %{{.*}}: index) -> memref<2x5xf32, strided<[20, 1], offset: ?>> {
+//  CHECK-BASELINE-SAME:     %{{.*}}: index) -> memref<2x5xf32, strided<[20, 1]>> {
 //       CHECK-BASELINE:   %[[alloc:.*]] = memref.alloc() {{.*}} : memref<10x20xf32>
 //       CHECK-BASELINE:   %[[subview:.*]] = memref.subview %[[alloc]]
 //       CHECK-BASELINE:   return %[[subview]]
@@ -101,7 +101,7 @@ func.func private @callee(%idx: index) -> tensor<2x5xf32> {
 
 // CHECK: func @main(
 // CHECK:   %[[alloc:.*]] = memref.alloc() : memref<2x5xf32>
-// CHECK:   %[[casted:.*]] = memref.cast %[[alloc]] : memref<2x5xf32> to memref<2x5xf32, strided<[?, ?], offset: ?>>
+// CHECK:   %[[casted:.*]] = memref.cast %[[alloc]] : memref<2x5xf32> to memref<2x5xf32, strided<[?, ?]>>
 // CHECK:   call @callee(%{{.*}}, %[[casted]])
 // CHECK:   memref.load %[[casted]]
 
diff --git a/mlir/test/Dialect/Bufferization/Transforms/one-shot-module-bufferize.mlir b/mlir/test/Dialect/Bufferization/Transforms/one-shot-module-bufferize.mlir
index d5cb7a0f14f5a..eea2a1a1b59a6 100644
--- a/mlir/test/Dialect/Bufferization/Transforms/one-shot-module-bufferize.mlir
+++ b/mlir/test/Dialect/Bufferization/Transforms/one-shot-module-bufferize.mlir
@@ -15,11 +15,11 @@
 
 // Bufferization of bodiless function with no tensor return value.
 
-// CHECK-LABEL: func private @private_func(memref<?xf32, strided<[?], offset: ?>>
+// CHECK-LABEL: func private @private_func(memref<?xf32, strided<[?]>>
 // CHECK-NO-LAYOUT-MAP-LABEL: func private @private_func(memref<?xf32>)
 func.func private @private_func(tensor<?xf32>) -> ()
 
-// CHECK-LABEL: func private @private_func_2d(memref<?x?xf32, strided<[?, ?], offset: ?>>
+// CHECK-LABEL: func private @private_func_2d(memref<?x?xf32, strided<[?, ?]>>
 // CHECK-NO-LAYOUT-MAP-LABEL: func private @private_func_2d(memref<?x?xf32>)
 func.func private @private_func_2d(tensor<?x?xf32>) -> ()
 
@@ -36,7 +36,7 @@ func.func @empty_func() -> () {
 
 // CHECK: func private @external_func_with_return_val(memref<4xi32, strided{{.*}}>) -> f32
 // CHECK-FULLY-DYNAMIC-LAYOUT-MAP-LABEL: func private @external_func_with_return_val(memref<4xi32,
-// CHECK-FULLY-DYNAMIC-LAYOUT-MAP-SAME: strided<[?], offset: ?>>
+// CHECK-FULLY-DYNAMIC-LAYOUT-MAP-SAME: strided<[?]>>
 // CHECK-NO-LAYOUT-MAP-LABEL: func private @external_func_with_return_val(memref<4xi32>)
 func.func private @external_func_with_return_val(tensor<4xi32>) -> f32
 
@@ -44,13 +44,13 @@ func.func private @external_func_with_return_val(tensor<4xi32>) -> f32
 
 // Bufferization of bodiless function that returns a tensor.
 
-// CHECK: func.func private @foo(memref<?xf32, strided<[?], offset: ?>>) -> (f32, memref<?xf32, strided<[?], offset: ?>>, f32)
+// CHECK: func.func private @foo(memref<?xf32, strided<[?]>>) -> (f32, memref<?xf32, strided<[?]>>, f32)
 func.func private @foo(%t : tensor<?xf32>) -> (f32, tensor<?xf32>, f32)
 
 // CHECK: func.func @call_to_unknown_tensor_returning_func(
-// CHECK-SAME: %[[arg0:.*]]: memref<?xf32, strided<[?], offset: ?>>) {
+// CHECK-SAME: %[[arg0:.*]]: memref<?xf32, strided<[?]>>) {
 func.func @call_to_unknown_tensor_returning_func(%t : tensor<?xf32>) {
-  // CHECK: call @foo(%[[arg0]]) : (memref<?xf32, strided<[?], offset: ?>>) -> (f32, memref<?xf32, strided<[?], offset: ?>>, f32)
+  // CHECK: call @foo(%[[arg0]]) : (memref<?xf32, strided<[?]>>) -> (f32, memref<?xf32, strided<[?]>>, f32)
   call @foo(%t) : (tensor<?xf32>) -> (f32, tensor<?xf32>, f32)
   return
 }
@@ -59,14 +59,14 @@ func.func @call_to_unknown_tensor_returning_func(%t : tensor<?xf32>) {
 
 // A function that returns a non-equivalent tensor with layout map.
 
-// CHECK-LABEL: func @return_extract_slice(%{{.*}}) -> memref<2x?xf32, strided<[10, 1], offset: ?>>
+// CHECK-LABEL: func @return_extract_slice(%{{.*}}) -> memref<2x?xf32, strided<[10, 1]>>
 //       CHECK:   %[[alloc:.*]] = memref.alloc() {{.*}} : memref<20x10xf32>
-//       CHECK:   %[[subview:.*]] = memref.subview {{.*}} : memref<20x10xf32> to memref<2x?xf32, strided<[10, 1], offset: ?>>
+//       CHECK:   %[[subview:.*]] = memref.subview {{.*}} : memref<20x10xf32> to memref<2x?xf32, strided<[10, 1]>>
 //       CHECK:   return %[[subview]]
 
 // CHECK-NO-LAYOUT-MAP-LABEL: func @return_extract_slice(%{{.*}}) -> memref<2x?xf32>
 //       CHECK-NO-LAYOUT-MAP:   %[[alloc:.*]] = memref.alloc() {{.*}} : memref<20x10xf32>
-//       CHECK-NO-LAYOUT-MAP:   %[[subview:.*]] = memref.subview {{.*}} : memref<20x10xf32> to memref<2x?xf32, strided<[10, 1], offset: ?>>
+//       CHECK-NO-LAYOUT-MAP:   %[[subview:.*]] = memref.subview {{.*}} : memref<20x10xf32> to memref<2x?xf32, strided<[10, 1]>>
 //       CHECK-NO-LAYOUT-MAP:   %[[alloc_no_layout:.*]] = memref.alloc(%{{.*}}) {{.*}} : memref<2x?xf32>
 //       CHECK-NO-LAYOUT-MAP:   memref.copy %[[subview]], %[[alloc_no_layout]]
 // TODO: %alloc should be deallocated here, but we currently do not dealloc
@@ -75,7 +75,7 @@ func.func @call_to_unknown_tensor_returning_func(%t : tensor<?xf32>) {
 //       CHECK-NO-LAYOUT-MAP:   return %[[alloc_no_layout]]
 
 // CHECK-FULLY-DYNAMIC-LAYOUT-MAP-LABEL: func @return_extract_slice(%{{.*}}) -> memref<2x?xf32,
-//  CHECK-FULLY-DYNAMIC-LAYOUT-MAP-SAME: strided<[?, ?], offset: ?>> {
+//  CHECK-FULLY-DYNAMIC-LAYOUT-MAP-SAME: strided<[?, ?]>> {
 func.func @return_extract_slice(%idx: index, %sz: index) -> (tensor<2x?xf32>)
 {
   %t = bufferization.alloc_tensor() : tensor<20x10xf32>
@@ -96,9 +96,9 @@ func.func @foo(%arg0: tensor<3x8xf16>) -> tensor<3x8xf16> {
 
 // CHECK-NO-LAYOUT-MAP-LABEL:   func.func @call_extract_slice(
 // CHECK-NO-LAYOUT-MAP-SAME:                                  %[[VAL_0:.*]]: memref<4x8xf16>) -> memref<3x8xf16> {
-// CHECK-NO-LAYOUT-MAP:           %[[VAL_1:.*]] = memref.subview %[[VAL_0]][1, 0] [3, 8] [1, 1] : memref<4x8xf16> to memref<3x8xf16, strided<[8, 1], offset: 8>>
+// CHECK-NO-LAYOUT-MAP:           %[[VAL_1:.*]] = memref.subview %[[VAL_0]][1, 0] [3, 8] [1, 1] : memref<4x8xf16> to memref<3x8xf16, strided<[8, 1]>>
 // CHECK-NO-LAYOUT-MAP:           %[[VAL_2:.*]] = memref.alloc() {alignment = 64 : i64} : memref<3x8xf16>
-// CHECK-NO-LAYOUT-MAP:           memref.copy %[[VAL_1]], %[[VAL_2]] : memref<3x8xf16, strided<[8, 1], offset: 8>> to memref<3x8xf16>
+// CHECK-NO-LAYOUT-MAP:           memref.copy %[[VAL_1]], %[[VAL_2]] : memref<3x8xf16, strided<[8, 1]>> to memref<3x8xf16>
 // CHECK-NO-LAYOUT-MAP:           %[[VAL_3:.*]] = call @foo(%[[VAL_2]]) : (memref<3x8xf16>) -> memref<3x8xf16>
 // CHECK-NO-LAYOUT-MAP:           return %[[VAL_3]] : memref<3x8xf16>
 // CHECK-NO-LAYOUT-MAP:         }
@@ -305,7 +305,7 @@ func.func @main(%t: tensor<?xf32> {bufferization.writable = false}) -> f32 {
 // Alloc and copy must be inserted because the arith.constant is read-only.
 
 //      CHECK: memref.global "private" constant @__constant_4xi32 : memref<4xi32> = dense<[1, 2, 3, 4]>
-//      CHECK: func private @some_external_func(memref<4xi32, strided<[?], offset: ?>>)
+//      CHECK: func private @some_external_func(memref<4xi32, strided<[?]>>)
 func.func private @some_external_func(tensor<4xi32>)
 
 //      CHECK: func @main()
@@ -314,9 +314,9 @@ func.func @main() {
   %A = arith.constant dense<[1, 2, 3, 4]> : tensor<4xi32>
 
 //  CHECK-DAG:   %[[alloc:.*]] = memref.alloc
-//  CHECK-DAG:   %[[B:.*]] = memref.cast %[[alloc]] : memref<4xi32> to memref<4xi32, strided<[?], offset: ?>>
+//  CHECK-DAG:   %[[B:.*]] = memref.cast %[[alloc]] : memref<4xi32> to memref<4xi32, strided<[?]>>
 //  CHECK-DAG:   memref.copy %[[A]], %[[alloc]]
-//      CHECK:   call @some_external_func(%[[B]]) : (memref<4xi32, strided<[?], offset: ?>>) -> ()
+//      CHECK:   call @some_external_func(%[[B]]) : (memref<4xi32, strided<[?]>>) -> ()
   call @some_external_func(%A) : (tensor<4xi32>) -> ()
 
   return
@@ -328,7 +328,7 @@ func.func @main() {
 // function call is inside of an scf.execute_region.
 
 //      CHECK: memref.global "private" constant @__constant_4xi32 : memref<4xi32> = dense<[1, 2, 3, 4]>
-//      CHECK: func private @some_external_func_within_scf_execute(memref<4xi32, strided<[?], offset: ?>>)
+//      CHECK: func private @some_external_func_within_scf_execute(memref<4xi32, strided<[?]>>)
 func.func private @some_external_func_within_scf_execute(tensor<4xi32>)
 
 //      CHECK: func @main()
@@ -339,9 +339,9 @@ func.func @main() {
 // Note: The scf.execute_region canonicalizes away.
 
 //  CHECK-DAG:   %[[alloc:.*]] = memref.alloc
-//  CHECK-DAG:   %[[B:.*]] = memref.cast %[[alloc]] : memref<4xi32> to memref<4xi32, strided<[?], offset: ?>>
+//  CHECK-DAG:   %[[B:.*]] = memref.cast %[[alloc]] : memref<4xi32> to memref<4xi32, strided<[?]>>
 //  CHECK-DAG:   memref.copy %[[A]], %[[alloc]]
-//      CHECK:   call @some_external_func_within_scf_execute(%[[B]]) : (memref<4xi32, strided<[?], offset: ?>>) -> ()
+//      CHECK:   call @some_external_func_within_scf_execute(%[[B]]) : (memref<4xi32, strided<[?]>>) -> ()
   scf.execute_region {
     func.call @some_external_func_within_scf_execute(%A) : (tensor<4xi32>) -> ()
     scf.yield
@@ -398,13 +398,13 @@ module {
 
 // -----
 
-//      CHECK:  func private @some_external_func(memref<?xf32, strided<[?], offset: ?>>)
+//      CHECK:  func private @some_external_func(memref<?xf32, strided<[?]>>)
 func.func private @some_external_func(tensor<?xf32>)
 
 //      CHECK:  func private @scf_for_with_tensor_insert_slice(
-// CHECK-SAME:    %[[A:[a-zA-Z0-9]*]]: memref<?xf32, strided<[?], offset: ?>>
-// CHECK-SAME:    %[[B:[a-zA-Z0-9]*]]: memref<?xf32, strided<[?], offset: ?>>
-// CHECK-SAME:    %[[C:[a-zA-Z0-9]*]]: memref<4xf32, strided<[?], offset: ?>>
+// CHECK-SAME:    %[[A:[a-zA-Z0-9]*]]: memref<?xf32, strided<[?]>>
+// CHECK-SAME:    %[[B:[a-zA-Z0-9]*]]: memref<?xf32, strided<[?]>>
+// CHECK-SAME:    %[[C:[a-zA-Z0-9]*]]: memref<4xf32, strided<[?]>>
 func.func private @scf_for_with_tensor_insert_slice(
     %A : tensor<?xf32>, %B : tensor<?xf32>, %C : tensor<4xf32>,
     %lb : index, %ub : index, %step : index)
@@ -415,11 +415,11 @@ func.func private @scf_for_with_tensor_insert_slice(
       -> (tensor<?xf32>, tensor<?xf32>)
   {
     // CHECK-NEXT:   %[[SVA:.*]] = memref.subview %[[A]]
-    // CHECK-NEXT:   memref.copy %[[C]], %[[SVA]] : memref<4xf32, strided<[?], offset: ?>> to memref<4xf32, strided<[?], offset: ?>>
+    // CHECK-NEXT:   memref.copy %[[C]], %[[SVA]] : memref<4xf32, strided<[?]>> to memref<4xf32, strided<[?]>>
     %ttA = tensor.insert_slice %C into %tA[%i][4][1] : tensor<4xf32> into tensor<?xf32>
 
     // CHECK-NEXT:   %[[SVB:.*]] = memref.subview %[[B]]
-    // CHECK-NEXT:   memref.copy %[[C]], %[[SVB]] : memref<4xf32, strided<[?], offset: ?>> to memref<4xf32, strided<[?], offset: ?>>
+    // CHECK-NEXT:   memref.copy %[[C]], %[[SVB]] : memref<4xf32, strided<[?]>> to memref<4xf32, strided<[?]>>
     %ttB = tensor.insert_slice %C into %tB[%i][4][1] : tensor<4xf32> into tensor<?xf32>
 
     // scf.yield is empty and is elided
@@ -432,9 +432,9 @@ func.func private @scf_for_with_tensor_insert_slice(
 }
 
 //      CHECK:  func @bar(
-// CHECK-SAME:    %[[A:[a-zA-Z0-9]*]]: memref<?xf32, strided<[?], offset: ?>>
-// CHECK-SAME:    %[[B:[a-zA-Z0-9]*]]: memref<?xf32, strided<[?], offset: ?>>
-// CHECK-SAME:    %[[C:[a-zA-Z0-9]*]]: memref<4xf32, strided<[?], offset: ?>>
+// CHECK-SAME:    %[[A:[a-zA-Z0-9]*]]: memref<?xf32, strided<[?]>>
+// CHECK-SAME:    %[[B:[a-zA-Z0-9]*]]: memref<?xf32, strided<[?]>>
+// CHECK-SAME:    %[[C:[a-zA-Z0-9]*]]: memref<4xf32, strided<[?]>>
 func.func @bar(
     %A : tensor<?xf32> {bufferization.writable = true},
     %B : tensor<?xf32> {bufferization.writable = true},
@@ -451,7 +451,7 @@ func.func @bar(
 //  CHECK-DAG:   %[[alloc:.*]] = memref.alloc
 //  CHECK-DAG:   %[[casted:.*]] = memref.cast %[[alloc]]
 //  CHECK-DAG:   memref.copy %[[B]], %[[alloc]]
-// CHECK-NEXT:   call @some_external_func(%[[casted]]) : (memref<?xf32, strided<[?], offset: ?>>) -> ()
+// CHECK-NEXT:   call @some_external_func(%[[casted]]) : (memref<?xf32, strided<[?]>>) -> ()
   call @some_external_func(%r0#0) : (tensor<?xf32>) -> ()
 
 //      CHECK:   return
@@ -461,17 +461,17 @@ func.func @bar(
 // -----
 
 //      CHECK:  func private @init_and_dot(
-// CHECK-SAME:    %[[A:[a-zA-Z0-9]*]]: memref<64xf32, strided<[?], offset: ?>>
-// CHECK-SAME:    %[[B:[a-zA-Z0-9]*]]: memref<64xf32, strided<[?], offset: ?>>
-// CHECK-SAME:    %[[C:[a-zA-Z0-9]*]]: memref<f32, strided<[], offset: ?>>
+// CHECK-SAME:    %[[A:[a-zA-Z0-9]*]]: memref<64xf32, strided<[?]>>
+// CHECK-SAME:    %[[B:[a-zA-Z0-9]*]]: memref<64xf32, strided<[?]>>
+// CHECK-SAME:    %[[C:[a-zA-Z0-9]*]]: memref<f32, strided<[]>>
 func.func private @init_and_dot(%a: tensor<64xf32>, %b: tensor<64xf32>, %c: tensor<f32>) -> tensor<f32> {
   // CHECK-NEXT:   %[[C0:.*]] = arith.constant 0{{.*}} : f32
   %v0 = arith.constant 0.0 : f32
 
-  // CHECK-NEXT:   linalg.fill ins(%[[C0]] : f32) outs(%[[C]] : memref<f32, strided<[], offset: ?>>)
+  // CHECK-NEXT:   linalg.fill ins(%[[C0]] : f32) outs(%[[C]] : memref<f32, strided<[]>>)
   %d = linalg.fill ins(%v0 : f32) outs(%c : tensor<f32>) -> tensor<f32>
 
-  // CHECK-NEXT:   linalg.dot ins(%[[A]], %[[B]] : memref<64xf32, strided<[?], offset: ?>>, memref<64xf32, strided<[?], offset: ?>>) outs(%[[C]] : memref<f32, strided<[], offset: ?>>)
+  // CHECK-NEXT:   linalg.dot ins(%[[A]], %[[B]] : memref<64xf32, strided<[?]>>, memref<64xf32, strided<[?]>>) outs(%[[C]] : memref<f32, strided<[]>>)
   %e = linalg.dot ins(%a, %b : tensor<64xf32>,tensor<64xf32>)
     outs(%d: tensor<f32>) -> tensor<f32>
 
@@ -491,9 +491,9 @@ func.func @main() {
   // CHECK-NEXT:   %[[A:.*]] = memref.alloc() {alignment = 64 : i64} : memref<64xf32>
   // CHECK-NEXT:   %[[B:.*]] = memref.alloc() {alignment = 64 : i64} : memref<64xf32>
   // CHECK-NEXT:   %[[C:.*]] = memref.alloc() {alignment = 64 : i64} : memref<f32>
-  //  CHECK-DAG:   %[[cA:.*]] = memref.cast %[[A]] : memref<64xf32> to memref<64xf32, strided<[?], offset: ?>>
-  //  CHECK-DAG:   %[[cB:.*]] = memref.cast %[[B]] : memref<64xf32> to memref<64xf32, strided<[?], offset: ?>>
-  //  CHECK-DAG:   %[[cC:.*]] = memref.cast %[[C]] : memref<f32> to memref<f32, strided<[], offset: ?>>
+  //  CHECK-DAG:   %[[cA:.*]] = memref.cast %[[A]] : memref<64xf32> to memref<64xf32, strided<[?]>>
+  //  CHECK-DAG:   %[[cB:.*]] = memref.cast %[[B]] : memref<64xf32> to memref<64xf32, strided<[?]>>
+  //  CHECK-DAG:   %[[cC:.*]] = memref.cast %[[C]] : memref<f32> to memref<f32, strided<[]>>
   %A = bufferization.alloc_tensor() : tensor<64xf32>
   %B = bufferization.alloc_tensor() : tensor<64xf32>
   %C = bufferization.alloc_tensor() : tensor<f32>
@@ -524,25 +524,25 @@ func.func private @printMemrefF32(tensor<*xf32>)
 
 // -----
 
-// CHECK: func private @external_func(memref<?xf32, strided<[?], offset: ?>>)
+// CHECK: func private @external_func(memref<?xf32, strided<[?]>>)
 func.func private @external_func(tensor<?xf32>)
 
 //      CHECK: func @callee(
 // CHECK-SAME:   %[[A:[0-9a-zA-Z]*]]: memref<?xf32>
-// CHECK-SAME:   %[[B:[0-9a-zA-Z]*]]: memref<?xf32, strided<[?], offset: ?>>
-// CHECK-SAME:   %[[C:[0-9a-zA-Z]*]]: memref<?xf32, strided<[?], offset: ?>>
+// CHECK-SAME:   %[[B:[0-9a-zA-Z]*]]: memref<?xf32, strided<[?]>>
+// CHECK-SAME:   %[[C:[0-9a-zA-Z]*]]: memref<?xf32, strided<[?]>>
 func.func @callee(
     %A : tensor<?xf32> {bufferization.buffer_layout = affine_map<(i)[s0, s1] -> (i)>},
     %B : tensor<?xf32>,
     %C : tensor<?xf32>) {
-// CHECK-NEXT: %[[CASTED:.*]] = memref.cast %[[A]] : memref<?xf32> to memref<?xf32, strided<[?], offset: ?>>
-// CHECK-NEXT: call @external_func(%[[CASTED]]) : (memref<?xf32, strided<[?], offset: ?>>) -> ()
+// CHECK-NEXT: %[[CASTED:.*]] = memref.cast %[[A]] : memref<?xf32> to memref<?xf32, strided<[?]>>
+// CHECK-NEXT: call @external_func(%[[CASTED]]) : (memref<?xf32, strided<[?]>>) -> ()
   call @external_func(%A) : (tensor<?xf32>) -> ()
 
-// CHECK-NEXT: call @external_func(%[[B]]) : (memref<?xf32, strided<[?], offset: ?>>) -> ()
+// CHECK-NEXT: call @external_func(%[[B]]) : (memref<?xf32, strided<[?]>>) -> ()
   call @external_func(%B) : (tensor<?xf32>) -> ()
 
-// CHECK-NEXT: call @external_func(%[[C]]) : (memref<?xf32, strided<[?], offset: ?>>) -> ()
+// CHECK-NEXT: call @external_func(%[[C]]) : (memref<?xf32, strided<[?]>>) -> ()
   call @external_func(%C) : (tensor<?xf32>) -> ()
 
   return
@@ -551,7 +551,7 @@ func.func @callee(
 //      CHECK: func @entry(
 // CHECK-SAME:   %[[A:[0-9a-zA-Z]*]]: memref<?xf32>
 // CHECK-SAME:   %[[B:[0-9a-zA-Z]*]]: memref<?xf32>
-// CHECK-SAME:   %[[C:[0-9a-zA-Z]*]]: memref<?xf32, strided<[?], offset: ?>>
+// CHECK-SAME:   %[[C:[0-9a-zA-Z]*]]: memref<?xf32, strided<[?]>>
 func.func @entry(%A : tensor<?xf32> {bufferization.buffer_layout = affine_map<(i)[s0, s1] -> (i)>, bufferization.writable = false},
                  %B : tensor<?xf32> {bufferization.buffer_layout = affine_map<(i)[s0, s1] -> (i)>, bufferization.writable = false},
                  %C : tensor<?xf32> {bufferization.writable = false}) {
@@ -735,7 +735,7 @@ func.func @foo(%m: memref<5xf32>) -> memref<5xf32> {
   return %1 : memref<5xf32>
 }
 
-// CHECK: func.func @bar(%{{.*}}: memref<5xf32, strided<[?], offset: ?>>, %arg1: memref<5xf32>) -> memref<5xf32>
+// CHECK: func.func @bar(%{{.*}}: memref<5xf32, strided<[?]>>, %arg1: memref<5xf32>) -> memref<5xf32>
 func.func @bar(%t: tensor<5xf32>, %m: memref<5xf32>) -> memref<5xf32> {
   %0 = func.call @foo(%m) : (memref<5xf32>) -> (memref<5xf32>)
   return %0 : memref<5xf32>
@@ -746,14 +746,14 @@ func.func @bar(%t: tensor<5xf32>, %m: memref<5xf32>) -> memref<5xf32> {
 // A recursive function.
 
 // CHECK-LABEL: func.func @foo(
-//  CHECK-SAME:     %[[arg0:.*]]: memref<5xf32, strided<[?], offset: ?>>) -> memref<5xf32, strided<[?], offset: ?>> {
+//  CHECK-SAME:     %[[arg0:.*]]: memref<5xf32, strided<[?]>>) -> memref<5xf32, strided<[?]>> {
 func.func @foo(%t: tensor<5xf32>) -> tensor<5xf32> {
   // We are conservative around recursive functions. The analysis cannot handle
   // them, so we have to assume the op operand of the call op bufferizes to a
   // memory read and write. This causes a copy in this test case.
   // CHECK: %[[copy:.*]] = memref.alloc() {alignment = 64 : i64} : memref<5xf32>
   // CHECK: memref.copy %[[arg0]], %[[copy]]
-  // CHECK: %[[cast:.*]] = memref.cast %[[copy]] : memref<5xf32> to memref<5xf32, strided<[?], offset: ?>>
+  // CHECK: %[[cast:.*]] = memref.cast %[[copy]] : memref<5xf32> to memref<5xf32, strided<[?]>>
   // CHECK: %[[call:.*]] = call @foo(%[[cast]])
   %0 = call @foo(%t) : (tensor<5xf32>) -> (tensor<5xf32>)
 
@@ -771,8 +771,8 @@ func.func @foo(%t: tensor<5xf32>) -> tensor<5xf32> {
 // Two functions calling each other recursively.
 
 // CHECK-LABEL: func.func @foo(
-//  CHECK-SAME:     %[[arg0:.*]]: memref<5xf32, strided<[?], offset: ?>>) -> memref<5xf32, strided<[?], offset: ?>> {
-//       CHECK:   %[[call:.*]] = call @bar(%[[arg0]]) : (memref<5xf32, strided<[?], offset: ?>>) -> memref<5xf32, strided<[?], offset: ?>>
+//  CHECK-SAME:     %[[arg0:.*]]: memref<5xf32, strided<[?]>>) -> memref<5xf32, strided<[?]>> {
+//       CHECK:   %[[call:.*]] = call @bar(%[[arg0]]) : (memref<5xf32, strided<[?]>>) -> memref<5xf32, strided<[?]>>
 //       CHECK:   return %[[call]]
 //       CHECK: }
 func.func @foo(%t: tensor<5xf32>) -> tensor<5xf32> {
@@ -781,8 +781,8 @@ func.func @foo(%t: tensor<5xf32>) -> tensor<5xf32> {
 }
 
 // CHECK-LABEL: func.func @bar(
-//  CHECK-SAME:     %[[arg0:.*]]: memref<5xf32, strided<[?], offset: ?>>) -> memref<5xf32, strided<[?], offset: ?>> {
-//       CHECK:   %[[call:.*]] = call @foo(%[[arg0]]) : (memref<5xf32, strided<[?], offset: ?>>) -> memref<5xf32, strided<[?], offset: ?>>
+//  CHECK-SAME:     %[[arg0:.*]]: memref<5xf32, strided<[?]>>) -> memref<5xf32, strided<[?]>> {
+//       CHECK:   %[[call:.*]] = call @foo(%[[arg0]]) : (memref<5xf32, strided<[?]>>) -> memref<5xf32, strided<[?]>>
 //       CHECK:   return %[[call]]
 //       CHECK: }
 func.func @bar(%t: tensor<5xf32>) -> tensor<5xf32>{
@@ -795,22 +795,22 @@ func.func @bar(%t: tensor<5xf32>) -> tensor<5xf32>{
 // The two func.return operands have different types after bufferization. Make
 // sure that memref.cast ops are inserted.
 
-// CHECK-LABEL: func @result_type_mismatch({{.*}}) -> memref<5xf32, strided<[?], offset: ?>>
+// CHECK-LABEL: func @result_type_mismatch({{.*}}) -> memref<5xf32, strided<[?]>>
 func.func @result_type_mismatch(%c: i1) -> tensor<5xf32> {
   // CHECK: %[[alloc:.*]] = memref.alloc() {alignment = 64 : i64} : memref<10xf32>
   %t = tensor.empty() : tensor<10xf32>
   cf.cond_br %c, ^bb1, ^bb2
 ^bb1:
   // CHECK: %[[m0:.*]] = memref.subview %[[alloc]][0] [5] [2] : memref<10xf32> to memref<5xf32, strided<[2]>>
-  // CHECK: %[[cast0:.*]] = memref.cast %[[m0]] : memref<5xf32, strided<[2]>> to memref<5xf32, strided<[?], offset: ?>>
+  // CHECK: %[[cast0:.*]] = memref.cast %[[m0]] : memref<5xf32, strided<[2]>> to memref<5xf32, strided<[?]>>
   %0 = tensor.extract_slice %t[0][5][2] : tensor<10xf32> to tensor<5xf32>
-  // CHECK: return %[[cast0]] : memref<5xf32, strided<[?], offset: ?>
+  // CHECK: return %[[cast0]] : memref<5xf32, strided<[?]>
   return %0 : tensor<5xf32>
 ^bb2:
-  // CHECK: %[[m1:.*]] = memref.subview %[[alloc]][2] [5] [1] : memref<10xf32> to memref<5xf32, strided<[1], offset: 2>>
-  // CHECK: %[[cast1:.*]] = memref.cast %[[m1]] : memref<5xf32, strided<[1], offset: 2>> to memref<5xf32, strided<[?], offset: ?>>
+  // CHECK: %[[m1:.*]] = memref.subview %[[alloc]][2] [5] [1] : memref<10xf32> to memref<5xf32, strided<[1]>>
+  // CHECK: %[[cast1:.*]] = memref.cast %[[m1]] : memref<5xf32, strided<[1]>> to memref<5xf32, strided<[?]>>
   %1 = tensor.extract_slice %t[2][5][1] : tensor<10xf32> to tensor<5xf32>
-  // CHECK: return %[[cast1]] : memref<5xf32, strided<[?], offset: ?>>
+  // CHECK: return %[[cast1]] : memref<5xf32, strided<[?]>>
   return %1 : tensor<5xf32>
 }
 
diff --git a/mlir/test/Dialect/Bufferization/Transforms/optimize-allocation-liveness.mlir b/mlir/test/Dialect/Bufferization/Transforms/optimize-allocation-liveness.mlir
index 63d33e3a88bed..e7e0a0546fcd2 100644
--- a/mlir/test/Dialect/Bufferization/Transforms/optimize-allocation-liveness.mlir
+++ b/mlir/test/Dialect/Bufferization/Transforms/optimize-allocation-liveness.mlir
@@ -143,8 +143,8 @@ func.func private @test_users_in_different_blocks_linalig_generic(%arg0: memref<
 // CHECK:           memref.dealloc %[[VAL_11]] : memref<45x6144xf32, 1>
 // CHECK:           scf.for %[[VAL_13:.*]] = %[[VAL_3]] to %[[VAL_6]] step %[[VAL_4]] {
 // CHECK:             scf.for %[[VAL_14:.*]] = %[[VAL_3]] to %[[VAL_7]] step %[[VAL_5]] {
-// CHECK:               %[[VAL_15:.*]] = memref.subview %[[VAL_9]]{{\[}}%[[VAL_13]], %[[VAL_14]], 0] [1, 8, 256] [1, 1, 1] : memref<45x24x256xf32, 1> to memref<1x8x256xf32, strided<[6144, 256, 1], offset: ?>, 1>
-// CHECK:               %[[VAL_16:.*]] = memref.subview %[[VAL_10]]{{\[}}%[[VAL_14]], 0] [8, 256] [1, 1] : memref<24x256xf32, 1> to memref<8x256xf32, strided<[256, 1], offset: ?>, 1>
+// CHECK:               %[[VAL_15:.*]] = memref.subview %[[VAL_9]]{{\[}}%[[VAL_13]], %[[VAL_14]], 0] [1, 8, 256] [1, 1, 1] : memref<45x24x256xf32, 1> to memref<1x8x256xf32, strided<[6144, 256, 1]>, 1>
+// CHECK:               %[[VAL_16:.*]] = memref.subview %[[VAL_10]]{{\[}}%[[VAL_14]], 0] [8, 256] [1, 1] : memref<24x256xf32, 1> to memref<8x256xf32, strided<[256, 1]>, 1>
 // CHECK:             }
 // CHECK:           }
 // CHECK:           memref.dealloc %[[VAL_10]] : memref<24x256xf32, 1>
@@ -167,8 +167,8 @@ func.func private @test_deallocs_in_different_block_forops(%arg0: memref<45x24x2
   %expand_shape2 = memref.expand_shape %alloc_2 [[0], [1, 2]] output_shape [45, 24, 256] : memref<45x6144xf32, 1> into memref<45x24x256xf32, 1>
   scf.for %arg3 = %c0 to %c45 step %c1 {
     scf.for %arg4 = %c0 to %c24 step %c8 {
-      %subview = memref.subview %expand_shape[%arg3, %arg4, 0] [1, 8, 256] [1, 1, 1] : memref<45x24x256xf32, 1> to memref<1x8x256xf32, strided<[6144, 256, 1], offset: ?>, 1>
-      %subview_3 = memref.subview %alloc_1[%arg4, 0] [8, 256] [1, 1] : memref<24x256xf32, 1> to memref<8x256xf32, strided<[256, 1], offset: ?>, 1>    
+      %subview = memref.subview %expand_shape[%arg3, %arg4, 0] [1, 8, 256] [1, 1, 1] : memref<45x24x256xf32, 1> to memref<1x8x256xf32, strided<[6144, 256, 1]>, 1>
+      %subview_3 = memref.subview %alloc_1[%arg4, 0] [8, 256] [1, 1] : memref<24x256xf32, 1> to memref<8x256xf32, strided<[256, 1]>, 1>    
     }
   }
   memref.dealloc %alloc : memref<45x6144xf32, 1>
diff --git a/mlir/test/Dialect/Bufferization/canonicalize.mlir b/mlir/test/Dialect/Bufferization/canonicalize.mlir
index df07511798b91..b99afc2ec0377 100644
--- a/mlir/test/Dialect/Bufferization/canonicalize.mlir
+++ b/mlir/test/Dialect/Bufferization/canonicalize.mlir
@@ -53,20 +53,20 @@ func.func @canonicalize_buffer_cast_of_tensor_load_different_address_space(%arg0
 // If the memrefs are definitely cast-compatible, canonicalize to
 //            cast.
 // CHECK-LABEL: func @canonicalize_buffer_cast_of_tensor_load(
-//  CHECK-SAME:   %[[M:.*]]: memref<?xf32, strided<[1], offset: 3>>)
-//  CHECK-SAME:     -> memref<?xf32, strided<[1], offset: ?>> {
+//  CHECK-SAME:   %[[M:.*]]: memref<?xf32, strided<[1]>>)
+//  CHECK-SAME:     -> memref<?xf32, strided<[1]>> {
 //   CHECK-NOT: bufferization.to_tensor
 //   CHECK-NOT: bufferization.to_buffer
 //       CHECK: %[[R:.*]] = memref.cast %[[M]]
-//  CHECK-SAME:   memref<?xf32, strided<[1], offset: 3>> to memref<?xf32, strided<[1], offset: ?>>
+//  CHECK-SAME:   memref<?xf32, strided<[1]>> to memref<?xf32, strided<[1]>>
 //       CHECK: return %[[R]]
 func.func @canonicalize_buffer_cast_of_tensor_load(
-  %arg0: memref<?xf32, strided<[1], offset: 3>>)
-  -> memref<?xf32, strided<[1], offset: ?>>
+  %arg0: memref<?xf32, strided<[1]>>)
+  -> memref<?xf32, strided<[1]>>
 {
-  %0 = bufferization.to_tensor %arg0 : memref<?xf32, strided<[1], offset: 3>> to tensor<?xf32>
-  %1 = bufferization.to_buffer %0 : tensor<?xf32> to memref<?xf32, strided<[1], offset: ?>>
-  return %1 : memref<?xf32, strided<[1], offset: ?>>
+  %0 = bufferization.to_tensor %arg0 : memref<?xf32, strided<[1]>> to tensor<?xf32>
+  %1 = bufferization.to_buffer %0 : tensor<?xf32> to memref<?xf32, strided<[1]>>
+  return %1 : memref<?xf32, strided<[1]>>
 }
 
 // -----
@@ -75,21 +75,21 @@ func.func @canonicalize_buffer_cast_of_tensor_load(
 //            copy.
 // CHECK-LABEL: func @canonicalize_buffer_cast_of_tensor_load_to_copy(
 func.func @canonicalize_buffer_cast_of_tensor_load_to_copy(
-  %arg0: memref<?xf32, strided<[1], offset: ?>>)
-  -> memref<?xf32, strided<[1], offset: 3>> {
-  %0 = bufferization.to_tensor %arg0 : memref<?xf32, strided<[1], offset: ?>> to tensor<?xf32>
-  %1 = bufferization.to_buffer %0 : tensor<?xf32> to memref<?xf32, strided<[1], offset: 3>>
-  return %1 : memref<?xf32, strided<[1], offset: 3>>
+  %arg0: memref<?xf32, strided<[1]>>)
+  -> memref<?xf32, strided<[1]>> {
+  %0 = bufferization.to_tensor %arg0 : memref<?xf32, strided<[1]>> to tensor<?xf32>
+  %1 = bufferization.to_buffer %0 : tensor<?xf32> to memref<?xf32, strided<[1]>>
+  return %1 : memref<?xf32, strided<[1]>>
 }
-// CHECK-SAME:   %[[M:.*]]: memref<?xf32, strided<[1], offset: ?>>)
-// CHECK-SAME:     -> memref<?xf32, strided<[1], offset: 3>> {
+// CHECK-SAME:   %[[M:.*]]: memref<?xf32, strided<[1]>>)
+// CHECK-SAME:     -> memref<?xf32, strided<[1]>> {
 //  CHECK-NOT: bufferization.to_tensor
 //  CHECK-NOT: bufferization.to_buffer
 //      CHECK: %[[C0:.*]] = arith.constant 0 : index
-//      CHECK: %[[DIM:.*]] = memref.dim %[[M]], %[[C0]] : memref<?xf32, strided<[1], offset: ?>>
-//      CHECK: %[[ALLOC:.*]] = memref.alloc(%[[DIM]]) : memref<?xf32, strided<[1], offset: 3>>
+//      CHECK: %[[DIM:.*]] = memref.dim %[[M]], %[[C0]] : memref<?xf32, strided<[1]>>
+//      CHECK: %[[ALLOC:.*]] = memref.alloc(%[[DIM]]) : memref<?xf32, strided<[1]>>
 //      CHECK: memref.copy %[[M]], %[[ALLOC]]
-// CHECK-SAME:   memref<?xf32, strided<[1], offset: ?>> to memref<?xf32, strided<[1], offset: 3>>
+// CHECK-SAME:   memref<?xf32, strided<[1]>> to memref<?xf32, strided<[1]>>
 //      CHECK: return %[[ALLOC]]
 
 // -----
@@ -281,16 +281,16 @@ func.func @tensor_cast_to_unranked_buffer(%arg0 : tensor<4x6x16x32xi8>) ->
 // CHECK-LABEL: func @tensor_cast_to_buffer
 //  CHECK-SAME:   %[[ARG0:.+]]: tensor<4x6x16x32xi8>
 func.func @tensor_cast_to_buffer_layout_and_memspace(%arg0 : tensor<4x6x16x32xi8>) ->
-  memref<?x?x16x32xi8, strided<[?, ?, ?, 1], offset: ?>, 1> {
+  memref<?x?x16x32xi8, strided<[?, ?, ?, 1]>, 1> {
   %0 = tensor.cast %arg0 : tensor<4x6x16x32xi8> to tensor<?x?x16x32xi8>
-  %1 = bufferization.to_buffer %0 : tensor<?x?x16x32xi8> to memref<?x?x16x32xi8, strided<[?, ?, ?, 1], offset: ?>, 1>
-  return %1 : memref<?x?x16x32xi8, strided<[?, ?, ?, 1], offset: ?>, 1>
+  %1 = bufferization.to_buffer %0 : tensor<?x?x16x32xi8> to memref<?x?x16x32xi8, strided<[?, ?, ?, 1]>, 1>
+  return %1 : memref<?x?x16x32xi8, strided<[?, ?, ?, 1]>, 1>
 }
 // CHECK:   %[[M:.+]] = bufferization.to_buffer %[[ARG0]] : tensor<4x6x16x32xi8>
 // CHECK:   %[[M1:.+]] = memref.cast %[[M]]
-// CHECK-SAME: memref<4x6x16x32xi8, strided<[?, ?, ?, 1], offset: ?>, 1>
-// CHECK-SAME: to memref<?x?x16x32xi8, strided<[?, ?, ?, 1], offset: ?>, 1>
-// CHECK:   return %[[M1]] : memref<?x?x16x32xi8, strided<[?, ?, ?, 1], offset: ?>, 1>
+// CHECK-SAME: memref<4x6x16x32xi8, strided<[?, ?, ?, 1]>, 1>
+// CHECK-SAME: to memref<?x?x16x32xi8, strided<[?, ?, ?, 1]>, 1>
+// CHECK:   return %[[M1]] : memref<?x?x16x32xi8, strided<[?, ?, ?, 1]>, 1>
 
 // -----
 
diff --git a/mlir/test/Dialect/Builtin/types.mlir b/mlir/test/Dialect/Builtin/types.mlir
index 80840ec32424e..5d2d78d260026 100644
--- a/mlir/test/Dialect/Builtin/types.mlir
+++ b/mlir/test/Dialect/Builtin/types.mlir
@@ -1,22 +1,22 @@
 // RUN: mlir-opt %s | mlir-opt | FileCheck %s
 
-// CHECK: memref<?x?xf32, strided<[?, ?], offset: ?>>
-func.func private @f1() -> memref<?x?xf32, strided<[?, ?], offset: ?>>
-// CHECK: memref<?x?xf32, strided<[42, 1], offset: 10>>
-func.func private @f2() -> memref<?x?xf32, strided<[42, 1], offset: 10>>
-// CHECK: memref<?x?xf32, strided<[?, 1], offset: 10>>
-func.func private @f3() -> memref<?x?xf32, strided<[?, 1], offset: 10>>
-// CHECK: memref<?x?xf32, strided<[?, 1], offset: ?>>
-func.func private @f4() -> memref<?x?xf32, strided<[?, 1], offset: ?>>
+// CHECK: memref<?x?xf32, strided<[?, ?]>>
+func.func private @f1() -> memref<?x?xf32, strided<[?, ?]>>
+// CHECK: memref<?x?xf32, strided<[42, 1]>>
+func.func private @f2() -> memref<?x?xf32, strided<[42, 1]>>
+// CHECK: memref<?x?xf32, strided<[?, 1]>>
+func.func private @f3() -> memref<?x?xf32, strided<[?, 1]>>
+// CHECK: memref<?x?xf32, strided<[?, 1]>>
+func.func private @f4() -> memref<?x?xf32, strided<[?, 1]>>
 // CHECK: memref<?x?xf32, strided<[42, 1]>>
 func.func private @f5() -> memref<?x?xf32, strided<[42, 1]>>
 // CHECK: memref<?x?xf32, strided<[42, 1]>>
-func.func private @f6() -> memref<?x?xf32, strided<[42, 1], offset: 0>>
+func.func private @f6() -> memref<?x?xf32, strided<[42, 1]>>
 // CHECK: memref<f32, strided<[]>>
 func.func private @f7() -> memref<f32, strided<[]>>
-// CHECK: memref<f32, strided<[], offset: ?>>
-func.func private @f8() -> memref<f32, strided<[], offset: ?>>
-// CHECK: memref<?xf32, strided<[-1], offset: ?>>
-func.func private @f9() -> memref<?xf32, strided<[-1], offset: ?>>
-// CHECK: memref<f32, strided<[], offset: -1>>
-func.func private @f10() -> memref<f32, strided<[], offset: -1>>
+// CHECK: memref<f32, strided<[]>>
+func.func private @f8() -> memref<f32, strided<[]>>
+// CHECK: memref<?xf32, strided<[-1]>>
+func.func private @f9() -> memref<?xf32, strided<[-1]>>
+// CHECK: memref<f32, strided<[]>>
+func.func private @f10() -> memref<f32, strided<[]>>
diff --git a/mlir/test/Dialect/ControlFlow/one-shot-bufferize.mlir b/mlir/test/Dialect/ControlFlow/one-shot-bufferize.mlir
index e37b63d01378b..258ed0a3b4122 100644
--- a/mlir/test/Dialect/ControlFlow/one-shot-bufferize.mlir
+++ b/mlir/test/Dialect/ControlFlow/one-shot-bufferize.mlir
@@ -3,10 +3,10 @@
 
 // CHECK-NO-FUNC-LABEL: func @br(
 //  CHECK-NO-FUNC-SAME:     %[[t:.*]]: tensor<5xf32>)
-//       CHECK-NO-FUNC:   %[[m:.*]] = bufferization.to_buffer %[[t]] : tensor<5xf32> to memref<5xf32, strided<[?], offset: ?>>
-//       CHECK-NO-FUNC:   %[[r:.*]] = scf.execute_region -> memref<5xf32, strided<[?], offset: ?>> {
+//       CHECK-NO-FUNC:   %[[m:.*]] = bufferization.to_buffer %[[t]] : tensor<5xf32> to memref<5xf32, strided<[?]>>
+//       CHECK-NO-FUNC:   %[[r:.*]] = scf.execute_region -> memref<5xf32, strided<[?]>> {
 //       CHECK-NO-FUNC:     cf.br ^[[block:.*]](%[[m]]
-//       CHECK-NO-FUNC:   ^[[block]](%[[arg1:.*]]: memref<5xf32, strided<[?], offset: ?>>):
+//       CHECK-NO-FUNC:   ^[[block]](%[[arg1:.*]]: memref<5xf32, strided<[?]>>):
 //       CHECK-NO-FUNC:     scf.yield %[[arg1]]
 //       CHECK-NO-FUNC:   }
 //       CHECK-NO-FUNC:   return
@@ -23,14 +23,14 @@ func.func @br(%t: tensor<5xf32>) {
 
 // CHECK-NO-FUNC-LABEL: func @cond_br(
 //  CHECK-NO-FUNC-SAME:     %[[t1:.*]]: tensor<5xf32>,
-//       CHECK-NO-FUNC:   %[[m1:.*]] = bufferization.to_buffer %[[t1]] : tensor<5xf32> to memref<5xf32, strided<[?], offset: ?>>
+//       CHECK-NO-FUNC:   %[[m1:.*]] = bufferization.to_buffer %[[t1]] : tensor<5xf32> to memref<5xf32, strided<[?]>>
 //       CHECK-NO-FUNC:   %[[alloc:.*]] = memref.alloc() {{.*}} : memref<5xf32>
-//       CHECK-NO-FUNC:   %[[r:.*]] = scf.execute_region -> memref<5xf32, strided<[?], offset: ?>> {
+//       CHECK-NO-FUNC:   %[[r:.*]] = scf.execute_region -> memref<5xf32, strided<[?]>> {
 //       CHECK-NO-FUNC:     cf.cond_br %{{.*}}, ^[[block1:.*]](%[[m1]] : {{.*}}), ^[[block2:.*]](%[[alloc]] : {{.*}})
-//       CHECK-NO-FUNC:   ^[[block1]](%[[arg1:.*]]: memref<5xf32, strided<[?], offset: ?>>):
+//       CHECK-NO-FUNC:   ^[[block1]](%[[arg1:.*]]: memref<5xf32, strided<[?]>>):
 //       CHECK-NO-FUNC:     scf.yield %[[arg1]]
 //       CHECK-NO-FUNC:   ^[[block2]](%[[arg2:.*]]: memref<5xf32>):
-//       CHECK-NO-FUNC:     %[[cast:.*]] = memref.cast %[[arg2]] : memref<5xf32> to memref<5xf32, strided<[?], offset: ?>
+//       CHECK-NO-FUNC:     %[[cast:.*]] = memref.cast %[[arg2]] : memref<5xf32> to memref<5xf32, strided<[?]>
 //       CHECK-NO-FUNC:     cf.br ^[[block1]](%[[cast]] : {{.*}})
 //       CHECK-NO-FUNC:   }
 //       CHECK-NO-FUNC:   return
diff --git a/mlir/test/Dialect/GPU/decompose-memrefs.mlir b/mlir/test/Dialect/GPU/decompose-memrefs.mlir
index 1a19221948451..6f65136e20ad0 100644
--- a/mlir/test/Dialect/GPU/decompose-memrefs.mlir
+++ b/mlir/test/Dialect/GPU/decompose-memrefs.mlir
@@ -7,8 +7,8 @@
 //       CHECK:  gpu.launch
 //  CHECK-SAME:  threads(%[[TX:.*]], %[[TY:.*]], %[[TZ:.*]]) in
 //       CHECK:  %[[IDX:.*]] = affine.apply #[[MAP]]()[%[[TX]], %[[STRIDES]]#0, %[[TY]], %[[STRIDES]]#1, %[[TZ]]]
-//       CHECK:  %[[PTR:.*]] = memref.reinterpret_cast %[[BASE]] to offset: [%[[IDX]]], sizes: [], strides: [] : memref<f32> to memref<f32, strided<[], offset: ?>>
-//       CHECK:  memref.store %[[VAL]], %[[PTR]][] : memref<f32, strided<[], offset: ?>>
+//       CHECK:  %[[PTR:.*]] = memref.reinterpret_cast %[[BASE]] to offset: [%[[IDX]]], sizes: [], strides: [] : memref<f32> to memref<f32, strided<[]>>
+//       CHECK:  memref.store %[[VAL]], %[[PTR]][] : memref<f32, strided<[]>>
 func.func @decompose_store(%arg0 : f32, %arg1 : memref<?x?x?xf32>) {
   %c0 = arith.constant 0 : index
   %c1 = arith.constant 1 : index
@@ -28,23 +28,23 @@ func.func @decompose_store(%arg0 : f32, %arg1 : memref<?x?x?xf32>) {
 
 //       CHECK: #[[MAP:.*]] = affine_map<()[s0, s1, s2, s3, s4, s5, s6] -> (s0 + s1 * s2 + s3 * s4 + s5 * s6)>
 //       CHECK: @decompose_store_strided
-//  CHECK-SAME: (%[[VAL:.*]]: f32, %[[MEM:.*]]: memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>)
+//  CHECK-SAME: (%[[VAL:.*]]: f32, %[[MEM:.*]]: memref<?x?x?xf32, strided<[?, ?, ?]>>)
 //       CHECK:  %[[BASE:.*]], %[[OFFSET:.*]], %[[SIZES:.*]]:3, %[[STRIDES:.*]]:3 = memref.extract_strided_metadata %[[MEM]]
 //       CHECK:  gpu.launch
 //  CHECK-SAME:  threads(%[[TX:.*]], %[[TY:.*]], %[[TZ:.*]]) in
 //       CHECK:  %[[IDX:.*]] = affine.apply #[[MAP]]()[%[[OFFSET]], %[[TX]], %[[STRIDES]]#0, %[[TY]], %[[STRIDES]]#1, %[[TZ]], %[[STRIDES]]#2]
-//       CHECK:  %[[PTR:.*]] = memref.reinterpret_cast %[[BASE]] to offset: [%[[IDX]]], sizes: [], strides: [] : memref<f32> to memref<f32, strided<[], offset: ?>>
-//       CHECK:  memref.store %[[VAL]], %[[PTR]][] : memref<f32, strided<[], offset: ?>>
-func.func @decompose_store_strided(%arg0 : f32, %arg1 : memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>) {
+//       CHECK:  %[[PTR:.*]] = memref.reinterpret_cast %[[BASE]] to offset: [%[[IDX]]], sizes: [], strides: [] : memref<f32> to memref<f32, strided<[]>>
+//       CHECK:  memref.store %[[VAL]], %[[PTR]][] : memref<f32, strided<[]>>
+func.func @decompose_store_strided(%arg0 : f32, %arg1 : memref<?x?x?xf32, strided<[?, ?, ?]>>) {
   %c0 = arith.constant 0 : index
   %c1 = arith.constant 1 : index
   %c2 = arith.constant 2 : index
-  %block_dim0 = memref.dim %arg1, %c0 : memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
-  %block_dim1 = memref.dim %arg1, %c1 : memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
-  %block_dim2 = memref.dim %arg1, %c2 : memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
+  %block_dim0 = memref.dim %arg1, %c0 : memref<?x?x?xf32, strided<[?, ?, ?]>>
+  %block_dim1 = memref.dim %arg1, %c1 : memref<?x?x?xf32, strided<[?, ?, ?]>>
+  %block_dim2 = memref.dim %arg1, %c2 : memref<?x?x?xf32, strided<[?, ?, ?]>>
   gpu.launch blocks(%bx, %by, %bz) in (%grid_x = %c1, %grid_y = %c1, %grid_z = %c1)
              threads(%tx, %ty, %tz) in (%block_x = %block_dim0, %block_y = %block_dim1, %block_z = %block_dim2) {
-    memref.store %arg0, %arg1[%tx, %ty, %tz] : memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
+    memref.store %arg0, %arg1[%tx, %ty, %tz] : memref<?x?x?xf32, strided<[?, ?, ?]>>
     gpu.terminator
   }
   return
@@ -59,8 +59,8 @@ func.func @decompose_store_strided(%arg0 : f32, %arg1 : memref<?x?x?xf32, stride
 //       CHECK:  gpu.launch
 //  CHECK-SAME:  threads(%[[TX:.*]], %[[TY:.*]], %[[TZ:.*]]) in
 //       CHECK:  %[[IDX:.*]] = affine.apply #[[MAP]]()[%[[TX]], %[[STRIDES]]#0, %[[TY]], %[[STRIDES]]#1, %[[TZ]]]
-//       CHECK:  %[[PTR:.*]] = memref.reinterpret_cast %[[BASE]] to offset: [%[[IDX]]], sizes: [], strides: [] : memref<f32> to memref<f32, strided<[], offset: ?>>
-//       CHECK:  %[[RES:.*]] = memref.load %[[PTR]][] : memref<f32, strided<[], offset: ?>>
+//       CHECK:  %[[PTR:.*]] = memref.reinterpret_cast %[[BASE]] to offset: [%[[IDX]]], sizes: [], strides: [] : memref<f32> to memref<f32, strided<[]>>
+//       CHECK:  %[[RES:.*]] = memref.load %[[PTR]][] : memref<f32, strided<[]>>
 //       CHECK:  "test.test"(%[[RES]]) : (f32) -> ()
 func.func @decompose_load(%arg0 : memref<?x?x?xf32>) {
   %c0 = arith.constant 0 : index
@@ -88,7 +88,7 @@ func.func @decompose_load(%arg0 : memref<?x?x?xf32>) {
 //  CHECK-SAME:  threads(%[[TX:.*]], %[[TY:.*]], %[[TZ:.*]]) in
 //       CHECK:  %[[IDX:.*]] = affine.apply #[[MAP]]()[%[[TX]], %[[STRIDES]]#0, %[[TY]], %[[STRIDES]]#1, %[[TZ]]]
 //       CHECK:  %[[PTR:.*]] = memref.reinterpret_cast %[[BASE]] to offset: [%[[IDX]]], sizes: [%{{.*}}, %{{.*}}, %{{.*}}], strides: [%[[STRIDES]]#0, %[[STRIDES]]#1, 1]
-//       CHECK:  "test.test"(%[[PTR]]) : (memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>) -> ()
+//       CHECK:  "test.test"(%[[PTR]]) : (memref<?x?x?xf32, strided<[?, ?, ?]>>) -> ()
 func.func @decompose_subview(%arg0 : memref<?x?x?xf32>) {
   %c0 = arith.constant 0 : index
   %c1 = arith.constant 1 : index
@@ -98,8 +98,8 @@ func.func @decompose_subview(%arg0 : memref<?x?x?xf32>) {
   %block_dim2 = memref.dim %arg0, %c2 : memref<?x?x?xf32>
   gpu.launch blocks(%bx, %by, %bz) in (%grid_x = %c1, %grid_y = %c1, %grid_z = %c1)
              threads(%tx, %ty, %tz) in (%block_x = %block_dim0, %block_y = %block_dim1, %block_z = %block_dim2) {
-    %res = memref.subview %arg0[%tx, %ty, %tz] [%c2, %c2, %c2] [%c1, %c1, %c1] : memref<?x?x?xf32> to memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
-    "test.test"(%res) : (memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>) -> ()
+    %res = memref.subview %arg0[%tx, %ty, %tz] [%c2, %c2, %c2] [%c1, %c1, %c1] : memref<?x?x?xf32> to memref<?x?x?xf32, strided<[?, ?, ?]>>
+    "test.test"(%res) : (memref<?x?x?xf32, strided<[?, ?, ?]>>) -> ()
     gpu.terminator
   }
   return
@@ -119,7 +119,7 @@ func.func @decompose_subview(%arg0 : memref<?x?x?xf32>) {
 //       CHECK:  %[[IDX1:.*]] = affine.apply #[[MAP1]]()[%[[STRIDES]]#1]
 //       CHECK:  %[[IDX2:.*]] = affine.apply #[[MAP2]]()[%[[TX]], %[[STRIDES]]#0, %[[TY]], %[[STRIDES]]#1, %[[TZ]]]
 //       CHECK:  %[[PTR:.*]] = memref.reinterpret_cast %[[BASE]] to offset: [%[[IDX2]]], sizes: [%{{.*}}, %{{.*}}, %{{.*}}], strides: [%[[IDX]], %[[IDX1]], 4]
-//       CHECK:  "test.test"(%[[PTR]]) : (memref<?x?x?xf32, strided<[?, ?, 4], offset: ?>>) -> ()
+//       CHECK:  "test.test"(%[[PTR]]) : (memref<?x?x?xf32, strided<[?, ?, 4]>>) -> ()
 func.func @decompose_subview_strided(%arg0 : memref<?x?x?xf32>) {
   %c0 = arith.constant 0 : index
   %c1 = arith.constant 1 : index
@@ -129,8 +129,8 @@ func.func @decompose_subview_strided(%arg0 : memref<?x?x?xf32>) {
   %block_dim2 = memref.dim %arg0, %c2 : memref<?x?x?xf32>
   gpu.launch blocks(%bx, %by, %bz) in (%grid_x = %c1, %grid_y = %c1, %grid_z = %c1)
              threads(%tx, %ty, %tz) in (%block_x = %block_dim0, %block_y = %block_dim1, %block_z = %block_dim2) {
-    %res = memref.subview %arg0[%tx, %ty, %tz] [%c2, %c2, %c2] [2, 3, 4] : memref<?x?x?xf32> to memref<?x?x?xf32, strided<[?, ?, 4], offset: ?>>
-    "test.test"(%res) : (memref<?x?x?xf32, strided<[?, ?, 4], offset: ?>>) -> ()
+    %res = memref.subview %arg0[%tx, %ty, %tz] [%c2, %c2, %c2] [2, 3, 4] : memref<?x?x?xf32> to memref<?x?x?xf32, strided<[?, ?, 4]>>
+    "test.test"(%res) : (memref<?x?x?xf32, strided<[?, ?, 4]>>) -> ()
     gpu.terminator
   }
   return
diff --git a/mlir/test/Dialect/GPU/transform-gpu.mlir b/mlir/test/Dialect/GPU/transform-gpu.mlir
index 7e4a02109227a..587ee03121ff6 100644
--- a/mlir/test/Dialect/GPU/transform-gpu.mlir
+++ b/mlir/test/Dialect/GPU/transform-gpu.mlir
@@ -662,7 +662,7 @@ func.func @simple_fill(%arg0: memref<128xf32>) -> memref<128xf32> {
 //       CHECK:     %[[BIDX:.*]] = gpu.block_id x
 //       CHECK:     %[[BLX:.*]] = affine.apply #[[$MAPB]]()[%[[BIDX]]]
     %0 = affine.apply #map(%arg1)
-    %subview = memref.subview %arg0[%0] [128] [1] : memref<128xf32> to memref<128xf32, strided<[1], offset: ?>>
+    %subview = memref.subview %arg0[%0] [128] [1] : memref<128xf32> to memref<128xf32, strided<[1]>>
     scf.forall (%arg2) in (4) {
 //       CHECK:     %[[TIDX:.*]] = gpu.thread_id x
 //       CHECK:     %[[TIDY:.*]] = gpu.thread_id y
@@ -671,11 +671,11 @@ func.func @simple_fill(%arg0: memref<128xf32>) -> memref<128xf32> {
 //   CHECK-NOT:     scf.if
 //       CHECK:       memref.subview %{{.*}}[%[[THX]]]
       %1 = affine.apply #map1(%arg2)
-      %subview_0 = memref.subview %subview[%1] [32] [1] : memref<128xf32, strided<[1], offset: ?>> to memref<32xf32, strided<[1], offset: ?>>
-      vector.transfer_write %cst, %subview_0[%c0] {in_bounds = [true]} : vector<32xf32>, memref<32xf32, strided<[1], offset: ?>>
-      memref.copy %subview_0, %subview_0 : memref<32xf32, strided<[1], offset: ?>> to memref<32xf32, strided<[1], offset: ?>>
+      %subview_0 = memref.subview %subview[%1] [32] [1] : memref<128xf32, strided<[1]>> to memref<32xf32, strided<[1]>>
+      vector.transfer_write %cst, %subview_0[%c0] {in_bounds = [true]} : vector<32xf32>, memref<32xf32, strided<[1]>>
+      memref.copy %subview_0, %subview_0 : memref<32xf32, strided<[1]>> to memref<32xf32, strided<[1]>>
     } {mapping = [#gpu.warp<linear_dim_0>]}
-    memref.copy %subview, %subview : memref<128xf32, strided<[1], offset: ?>> to memref<128xf32, strided<[1], offset: ?>>
+    memref.copy %subview, %subview : memref<128xf32, strided<[1]>> to memref<128xf32, strided<[1]>>
   } {mapping = [#gpu.block<x>]}
   return %arg0 : memref<128xf32>
 }
@@ -713,7 +713,7 @@ func.func @simple_fill(%arg0: memref<128x256xf32>) -> memref<128x256xf32> {
     //   CHECK:     %[[BLX:.*]] = affine.apply #[[$MAPB]]()[%[[BIDX]]]
     %0 = affine.apply #map(%arg1)
     %subview = memref.subview %arg0[%0, 0] [128, 256] [1, 1]
-      : memref<128x256xf32> to memref<128x256xf32, strided<[256, 1], offset: ?>>
+      : memref<128x256xf32> to memref<128x256xf32, strided<[256, 1]>>
 
     // %arg2 and %arg3 map to lanes [0, 6) and are turned into epxressions
     // involving threadIdx.x/y by the map_nested_forall_to_threads
@@ -730,9 +730,9 @@ func.func @simple_fill(%arg0: memref<128x256xf32>) -> memref<128x256xf32> {
       %1 = affine.apply #map1(%arg2)
       %2 = affine.apply #map1(%arg3)
       %subview_0 = memref.subview %subview[%1, %2] [16, 32] [1, 1] 
-        : memref<128x256xf32, strided<[256, 1], offset: ?>> to memref<16x32xf32, strided<[256, 1], offset: ?>>
+        : memref<128x256xf32, strided<[256, 1]>> to memref<16x32xf32, strided<[256, 1]>>
       vector.transfer_write %cst, %subview_0[%c0, %c0] {in_bounds = [true, true]} 
-        : vector<16x32xf32>, memref<16x32xf32, strided<[256, 1], offset: ?>>
+        : vector<16x32xf32>, memref<16x32xf32, strided<[256, 1]>>
 
     // This could be obtained e.g. if a previous transformation mapped this loop
     // to lanes. This can aslo be written by hand as valid IR.
@@ -780,7 +780,7 @@ func.func @simple_fill(%arg0: memref<128xf32>) -> memref<128xf32> {
 //       CHECK:     %[[BIDX:.*]] = gpu.block_id x
 //       CHECK:     %[[BLX:.*]] = affine.apply #[[$MAPB]]()[%[[BIDX]]]
     %0 = affine.apply #map(%arg1)
-    %subview = memref.subview %arg0[%0] [128] [1] : memref<128xf32> to memref<128xf32, strided<[1], offset: ?>>
+    %subview = memref.subview %arg0[%0] [128] [1] : memref<128xf32> to memref<128xf32, strided<[1]>>
 
     // %arg2 and %arg3 map to lanes [0, 6) and are turned into epxressions
     // involving threadIdx.x/y by the map_nested_forall_to_threads
@@ -809,15 +809,15 @@ func.func @simple_fill(%arg0: memref<128xf32>) -> memref<128xf32> {
       //       CHECK:       memref.subview %{{.*}}[%[[W0]]] [%[[W1]]]
       %1 = affine.apply #map1(%arg2)
       %2 = affine.apply #map1(%arg3)
-      %subview_0 = memref.subview %subview[%1] [%2] [1] : memref<128xf32, strided<[1], offset: ?>> to memref<?xf32, strided<[1], offset: ?>>
-      vector.transfer_write %cst, %subview_0[%c0] {in_bounds = [true]} : vector<32xf32>, memref<?xf32, strided<[1], offset: ?>>
+      %subview_0 = memref.subview %subview[%1] [%2] [1] : memref<128xf32, strided<[1]>> to memref<?xf32, strided<[1]>>
+      vector.transfer_write %cst, %subview_0[%c0] {in_bounds = [true]} : vector<32xf32>, memref<?xf32, strided<[1]>>
 
     // This could be obtained e.g. if a previous transformation mapped this loop
     // to lanes. This can aslo be written by hand as valid IR.
     // This additionally uses the hex mask: 0x 10 1111 0001
     } {mapping = [#gpu.warp<linear_dim_0>, #gpu.warp<linear_dim_1>, #gpu.mask<0x2f1>]}
 
-    memref.copy %subview, %subview : memref<128xf32, strided<[1], offset: ?>> to memref<128xf32, strided<[1], offset: ?>>
+    memref.copy %subview, %subview : memref<128xf32, strided<[1]>> to memref<128xf32, strided<[1]>>
   } {mapping = [#gpu.block<x>]}
   return %arg0 : memref<128xf32>
 }
diff --git a/mlir/test/Dialect/LLVM/lower-to-llvm-e2e-with-target-tag.mlir b/mlir/test/Dialect/LLVM/lower-to-llvm-e2e-with-target-tag.mlir
index 835ae01ffa8c1..8ef3cd5b88bec 100644
--- a/mlir/test/Dialect/LLVM/lower-to-llvm-e2e-with-target-tag.mlir
+++ b/mlir/test/Dialect/LLVM/lower-to-llvm-e2e-with-target-tag.mlir
@@ -12,8 +12,8 @@ module attributes {transform.target_tag="payload"} {
 
 // Check that we properly lower to llvm memref operations that require to be
 // expanded first, like `memref.subview`.
-func.func @subview(%0 : memref<64x4xf32, strided<[4, 1], offset: 0>>, %arg0 : index, %arg1 : index, %arg2 : index)
--> memref<?x?xf32, strided<[?, ?], offset: ?>> {
+func.func @subview(%0 : memref<64x4xf32, strided<[4, 1]>>, %arg0 : index, %arg1 : index, %arg2 : index)
+-> memref<?x?xf32, strided<[?, ?]>> {
   // CHECK-LABEL: @subview
   // CHECK-SAME: %[[BASE:[^:]*]]: !llvm.ptr
   // CHECK-SAME: %[[BASE_ALIGNED:[^:]*]]: !llvm.ptr,
@@ -48,9 +48,9 @@ func.func @subview(%0 : memref<64x4xf32, strided<[4, 1], offset: 0>>, %arg0 : in
   // CHECK: %[[DESC6:.*]] = llvm.insertvalue %[[ARG1]], %[[DESC5]][4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 
   %1 = memref.subview %0[%arg0, %arg1][%arg0, %arg1][%arg0, %arg1] :
-    memref<64x4xf32, strided<[4, 1], offset: 0>>
-  to memref<?x?xf32, strided<[?, ?], offset: ?>>
-  return %1 : memref<?x?xf32, strided<[?, ?], offset: ?>>
+    memref<64x4xf32, strided<[4, 1]>>
+  to memref<?x?xf32, strided<[?, ?]>>
+  return %1 : memref<?x?xf32, strided<[?, ?]>>
 }
 
 } // transform payload
diff --git a/mlir/test/Dialect/LLVM/lower-to-llvm-e2e-with-top-level-named-sequence.mlir b/mlir/test/Dialect/LLVM/lower-to-llvm-e2e-with-top-level-named-sequence.mlir
index 864ebb2155740..48e18d95c0e59 100644
--- a/mlir/test/Dialect/LLVM/lower-to-llvm-e2e-with-top-level-named-sequence.mlir
+++ b/mlir/test/Dialect/LLVM/lower-to-llvm-e2e-with-top-level-named-sequence.mlir
@@ -11,8 +11,8 @@
 
 // Check that we properly lower to llvm memref operations that require to be
 // expanded first, like `memref.subview`.
-func.func @subview(%0 : memref<64x4xf32, strided<[4, 1], offset: 0>>, %arg0 : index, %arg1 : index, %arg2 : index)
--> memref<?x?xf32, strided<[?, ?], offset: ?>> {
+func.func @subview(%0 : memref<64x4xf32, strided<[4, 1]>>, %arg0 : index, %arg1 : index, %arg2 : index)
+-> memref<?x?xf32, strided<[?, ?]>> {
   // CHECK-LABEL: @subview
   // CHECK-SAME: %[[BASE:[^:]*]]: !llvm.ptr
   // CHECK-SAME: %[[BASE_ALIGNED:[^:]*]]: !llvm.ptr,
@@ -47,9 +47,9 @@ func.func @subview(%0 : memref<64x4xf32, strided<[4, 1], offset: 0>>, %arg0 : in
   // CHECK: %[[DESC6:.*]] = llvm.insertvalue %[[ARG1]], %[[DESC5]][4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 
   %1 = memref.subview %0[%arg0, %arg1][%arg0, %arg1][%arg0, %arg1] :
-    memref<64x4xf32, strided<[4, 1], offset: 0>>
-  to memref<?x?xf32, strided<[?, ?], offset: ?>>
-  return %1 : memref<?x?xf32, strided<[?, ?], offset: ?>>
+    memref<64x4xf32, strided<[4, 1]>>
+  to memref<?x?xf32, strided<[?, ?]>>
+  return %1 : memref<?x?xf32, strided<[?, ?]>>
 }
 
 module @named_inclusion_in_named attributes { transform.with_named_sequence } {
diff --git a/mlir/test/Dialect/Linalg/collapse-dim.mlir b/mlir/test/Dialect/Linalg/collapse-dim.mlir
index 61c4234c301f8..c86b06e90ae69 100644
--- a/mlir/test/Dialect/Linalg/collapse-dim.mlir
+++ b/mlir/test/Dialect/Linalg/collapse-dim.mlir
@@ -135,10 +135,10 @@ func.func @collapsable_memref_projected_ops(%arg0: memref<1x24x32x8xf32>, %arg1:
 
 func.func @uncollapsable_strided_memref(%arg0: memref<2x6x24x48xi32>, %arg1: memref<2x6x24x48xi32>) -> (memref<2x6x24x48xi32>) {
   %alloc = memref.alloc() {alignment = 64 : i64} : memref<2x6x24x48xi32>
-  %subview = memref.subview %arg0[0, 0, 0, 0] [1, 3, 12, 24] [1, 1, 1, 1] : memref<2x6x24x48xi32> to memref<1x3x12x24xi32, strided<[6912, 1152, 48, 1], offset: 0>>
-  %subview0 = memref.subview %arg1[0, 0, 0, 0] [1, 3, 12, 24] [1, 1, 1, 1] : memref<2x6x24x48xi32> to memref<1x3x12x24xi32, strided<[6912, 1152, 48, 1], offset: 0>>
-  %subview1 = memref.subview %alloc[0, 0, 0, 0] [1, 3, 12, 24] [1, 1, 1, 1] : memref<2x6x24x48xi32> to memref<1x3x12x24xi32, strided<[6912, 1152, 48, 1], offset: 0>>
-  linalg.generic {indexing_maps = [affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>, affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>, affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>], iterator_types = ["parallel", "parallel", "parallel", "parallel"]} ins(%subview, %subview0 : memref<1x3x12x24xi32, strided<[6912, 1152, 48, 1], offset: 0>>, memref<1x3x12x24xi32, strided<[6912, 1152, 48, 1], offset: 0>>) outs(%subview1 : memref<1x3x12x24xi32, strided<[6912, 1152, 48, 1], offset: 0>>) {
+  %subview = memref.subview %arg0[0, 0, 0, 0] [1, 3, 12, 24] [1, 1, 1, 1] : memref<2x6x24x48xi32> to memref<1x3x12x24xi32, strided<[6912, 1152, 48, 1]>>
+  %subview0 = memref.subview %arg1[0, 0, 0, 0] [1, 3, 12, 24] [1, 1, 1, 1] : memref<2x6x24x48xi32> to memref<1x3x12x24xi32, strided<[6912, 1152, 48, 1]>>
+  %subview1 = memref.subview %alloc[0, 0, 0, 0] [1, 3, 12, 24] [1, 1, 1, 1] : memref<2x6x24x48xi32> to memref<1x3x12x24xi32, strided<[6912, 1152, 48, 1]>>
+  linalg.generic {indexing_maps = [affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>, affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>, affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>], iterator_types = ["parallel", "parallel", "parallel", "parallel"]} ins(%subview, %subview0 : memref<1x3x12x24xi32, strided<[6912, 1152, 48, 1]>>, memref<1x3x12x24xi32, strided<[6912, 1152, 48, 1]>>) outs(%subview1 : memref<1x3x12x24xi32, strided<[6912, 1152, 48, 1]>>) {
   ^bb0(%in: i32, %in_0: i32, %out: i32):
     %0 = arith.addi %in, %in_0 : i32
     linalg.yield %0 : i32
diff --git a/mlir/test/Dialect/Linalg/hoisting.mlir b/mlir/test/Dialect/Linalg/hoisting.mlir
index aa0b97a4787fa..d573b8bb5ec99 100644
--- a/mlir/test/Dialect/Linalg/hoisting.mlir
+++ b/mlir/test/Dialect/Linalg/hoisting.mlir
@@ -608,7 +608,7 @@ module attributes {transform.with_named_sequence} {
 // CHECK:            %[[D1:.+]] = vector.transfer_read %[[ALLOC_0]][%[[C0]], %[[C0]]], %[[CST]] {in_bounds = [true, true]}
 // CHECK-SAME:         : memref<32x128xf32>, vector<32x128xf32>
 // CHECK:            "some_use"(%[[D0]], %[[D1]], %[[CAST]]) : (vector<32x64xf32>, vector<32x128xf32>, memref<32x128xf32,
-// CHECK-SAME:         strided<[128, 1], offset: ?>>) -> ()
+// CHECK-SAME:         strided<[128, 1]>>) -> ()
 // CHECK:          }
 // CHECK:          memref.dealloc %[[ALLOC]] : memref<32x64xf32>
 // CHECK:          return
@@ -619,11 +619,11 @@ func.func @hoist_vector_transfer_read() {
   %cst_2 = arith.constant 0.000000e+00 : f32
   %memref0 = memref.alloc() : memref<32x64xf32>
   %memref2 = memref.alloc() : memref<32x128xf32>
-  %subview2 = memref.subview %memref2[%c0, %c0] [32, 128] [1, 1]: memref<32x128xf32> to memref<32x128xf32, strided<[128, 1], offset: ?>>
+  %subview2 = memref.subview %memref2[%c0, %c0] [32, 128] [1, 1]: memref<32x128xf32> to memref<32x128xf32, strided<[128, 1]>>
   scf.for %arg0 = %c0 to %c1024 step %c128 {
     %2 = vector.transfer_read %memref2[%c0, %c0], %cst_2 {in_bounds = [true, true]} : memref<32x128xf32>, vector<32x128xf32>
     %3 = vector.transfer_read %memref0[%c0, %c0], %cst_2 {in_bounds = [true, true]} : memref<32x64xf32>, vector<32x64xf32>
-    "some_use"(%3, %2, %subview2) : (vector<32x64xf32>, vector<32x128xf32>, memref<32x128xf32, strided<[128, 1], offset: ?>>) -> ()
+    "some_use"(%3, %2, %subview2) : (vector<32x64xf32>, vector<32x128xf32>, memref<32x128xf32, strided<[128, 1]>>) -> ()
   }
   memref.dealloc %memref0 : memref<32x64xf32>
   return
@@ -813,7 +813,7 @@ module attributes {transform.with_named_sequence} {
 //       CHECK:    scf.for {{.*}} {
 //       CHECK:      vector.transfer_write {{.*}} : vector<4xi32>, memref<4xi32>
 //       CHECK-NEXT:      vector.transfer_read {{.*}} : memref<1x4x1xi32>, vector<1x4x1xi32>
-//       CHECK-NEXT:      vector.transfer_write {{.*}} : vector<1x4x1xi32>, memref<1x4x1xi32, strided<[20, 1, 1], offset: ?>>
+//       CHECK-NEXT:      vector.transfer_write {{.*}} : vector<1x4x1xi32>, memref<1x4x1xi32, strided<[20, 1, 1]>>
 //       CHECK-NEXT:    }
 
 func.func @no_hoisting_collapse_shape(%in_0: memref<1x20x1xi32>, %1: memref<9x1xi32>, %vec: vector<4xi32>) {
@@ -823,11 +823,11 @@ func.func @no_hoisting_collapse_shape(%in_0: memref<1x20x1xi32>, %1: memref<9x1x
   %c20 = arith.constant 20 : index
   %alloca = memref.alloca() {alignment = 64 : i64} : memref<1x4x1xi32>
   scf.for %arg0 = %c0 to %c20 step %c4 {
-    %subview = memref.subview %in_0[0, %arg0, 0] [1, 4, 1] [1, 1, 1] : memref<1x20x1xi32> to memref<1x4x1xi32, strided<[20, 1, 1], offset: ?>>
+    %subview = memref.subview %in_0[0, %arg0, 0] [1, 4, 1] [1, 1, 1] : memref<1x20x1xi32> to memref<1x4x1xi32, strided<[20, 1, 1]>>
     %collapse_shape = memref.collapse_shape %alloca [[0, 1, 2]] : memref<1x4x1xi32> into memref<4xi32>
     vector.transfer_write %vec, %collapse_shape[%c0] {in_bounds = [true]} : vector<4xi32>, memref<4xi32>
     %read = vector.transfer_read %alloca[%c0, %c0, %c0], %c0_i32 {in_bounds = [true, true, true]} : memref<1x4x1xi32>, vector<1x4x1xi32>
-    vector.transfer_write %read, %subview[%c0, %c0, %c0] {in_bounds = [true, true, true]} : vector<1x4x1xi32>, memref<1x4x1xi32, strided<[20, 1, 1], offset: ?>>
+    vector.transfer_write %read, %subview[%c0, %c0, %c0] {in_bounds = [true, true, true]} : vector<1x4x1xi32>, memref<1x4x1xi32, strided<[20, 1, 1]>>
   }
   return
 }
diff --git a/mlir/test/Dialect/Linalg/library-calls.mlir b/mlir/test/Dialect/Linalg/library-calls.mlir
index 77c9d4a911447..e86e5bd060c16 100644
--- a/mlir/test/Dialect/Linalg/library-calls.mlir
+++ b/mlir/test/Dialect/Linalg/library-calls.mlir
@@ -36,8 +36,8 @@ func.func @matmul(%A: memref<?x?xf32>, %B: memref<?x?xf32>) -> (memref<?x?xf32>)
     iterator_types = ["parallel"]
 }
 
-// CHECK: func.func private @linalg_copy_view32xf16as1_view32xf16as6(memref<32xf16, strided<[?], offset: ?>, 1>, memref<32xf16, strided<[?], offset: ?>, 6>) attributes {llvm.emit_c_interface}
-// CHECK: func.func private @linalg_copy_view32xf16as6_view32xf16as1(memref<32xf16, strided<[?], offset: ?>, 6>, memref<32xf16, strided<[?], offset: ?>, 1>) attributes {llvm.emit_c_interface}
+// CHECK: func.func private @linalg_copy_view32xf16as1_view32xf16as6(memref<32xf16, strided<[?]>, 1>, memref<32xf16, strided<[?]>, 6>) attributes {llvm.emit_c_interface}
+// CHECK: func.func private @linalg_copy_view32xf16as6_view32xf16as1(memref<32xf16, strided<[?]>, 6>, memref<32xf16, strided<[?]>, 1>) attributes {llvm.emit_c_interface}
 
 module {
   func.func @helper(%arg7: memref<32xf16, 1>, %arg8: memref<32xf16, 1>, %arg9: memref<32xf16, 1>) {
diff --git a/mlir/test/Dialect/Linalg/loops.mlir b/mlir/test/Dialect/Linalg/loops.mlir
index efe8010cffc91..b94f5bb30876e 100644
--- a/mlir/test/Dialect/Linalg/loops.mlir
+++ b/mlir/test/Dialect/Linalg/loops.mlir
@@ -157,47 +157,47 @@ func.func @dot_bool(%arg0: memref<?xi1>, %arg1: memref<?xi1>,
 //  CHECK-NEXT:   store %[[res]], {{.*}} : memref<i1>
 
 
-func.func @dot_view(%arg0: memref<?xf32, strided<[1], offset: ?>>, %arg1: memref<?xf32, strided<[1], offset: ?>>, %arg2: memref<f32>) {
-  linalg.dot ins(%arg0, %arg1 : memref<?xf32, strided<[1], offset: ?>>,
-                                memref<?xf32, strided<[1], offset: ?>>)
+func.func @dot_view(%arg0: memref<?xf32, strided<[1]>>, %arg1: memref<?xf32, strided<[1]>>, %arg2: memref<f32>) {
+  linalg.dot ins(%arg0, %arg1 : memref<?xf32, strided<[1]>>,
+                                memref<?xf32, strided<[1]>>)
             outs(%arg2:  memref<f32>)
   return
 }
 // CHECK-LABEL: func @dot_view(
-//       CHECK:   %{{.*}}: memref<?xf32, strided<[1], offset: ?>>, %{{.*}}: memref<?xf32, strided<[1], offset: ?>>, %{{.*}}: memref<f32>) {
-//       CHECK: %[[K:.*]] = memref.dim %arg0, %c0 : memref<?xf32, strided<[1], offset: ?>>
+//       CHECK:   %{{.*}}: memref<?xf32, strided<[1]>>, %{{.*}}: memref<?xf32, strided<[1]>>, %{{.*}}: memref<f32>) {
+//       CHECK: %[[K:.*]] = memref.dim %arg0, %c0 : memref<?xf32, strided<[1]>>
 //       CHECK: scf.for {{.*}} to %[[K]]
-//   CHECK-DAG:   %[[a:.*]] = memref.load %arg0[%{{.*}}] : memref<?xf32, strided<[1], offset: ?>>
-//   CHECK-DAG:   %[[b:.*]] = memref.load %{{.*}}[%{{.*}}] : memref<?xf32, strided<[1], offset: ?>>
+//   CHECK-DAG:   %[[a:.*]] = memref.load %arg0[%{{.*}}] : memref<?xf32, strided<[1]>>
+//   CHECK-DAG:   %[[b:.*]] = memref.load %{{.*}}[%{{.*}}] : memref<?xf32, strided<[1]>>
 //   CHECK-DAG:   %[[inc:.*]] = arith.mulf %[[a]], %[[b]] : f32
 //   CHECK-DAG:   %[[c:.*]] = memref.load %{{.*}}[] : memref<f32>
 //   CHECK-DAG:   %[[res:.*]] = arith.addf %[[c]], %[[inc]] : f32
 //       CHECK:   store %[[res]], %{{.*}}[] : memref<f32>
 
 // CHECKPARALLEL-LABEL: func @dot_view(
-//       CHECKPARALLEL:   %{{.*}}: memref<?xf32, strided<[1], offset: ?>>, %{{.*}}: memref<?xf32, strided<[1], offset: ?>>, %{{.*}}: memref<f32>) {
-//       CHECKPARALLEL: %[[K:.*]] = memref.dim %arg0, %c0 : memref<?xf32, strided<[1], offset: ?>>
+//       CHECKPARALLEL:   %{{.*}}: memref<?xf32, strided<[1]>>, %{{.*}}: memref<?xf32, strided<[1]>>, %{{.*}}: memref<f32>) {
+//       CHECKPARALLEL: %[[K:.*]] = memref.dim %arg0, %c0 : memref<?xf32, strided<[1]>>
 //       CHECKPARALLEL: scf.for {{.*}} to %[[K]]
-//   CHECKPARALLEL-DAG:   %[[a:.*]] = memref.load %arg0[%{{.*}}] : memref<?xf32, strided<[1], offset: ?>>
-//   CHECKPARALLEL-DAG:   %[[b:.*]] = memref.load %{{.*}}[%{{.*}}] : memref<?xf32, strided<[1], offset: ?>>
+//   CHECKPARALLEL-DAG:   %[[a:.*]] = memref.load %arg0[%{{.*}}] : memref<?xf32, strided<[1]>>
+//   CHECKPARALLEL-DAG:   %[[b:.*]] = memref.load %{{.*}}[%{{.*}}] : memref<?xf32, strided<[1]>>
 //   CHECKPARALLEL-DAG:   %[[inc:.*]] = arith.mulf %[[a]], %[[b]] : f32
 //   CHECKPARALLEL-DAG:   %[[c:.*]] = memref.load %{{.*}}[] : memref<f32>
 //   CHECKPARALLEL-DAG:   %[[res:.*]] = arith.addf %[[c]], %[[inc]] : f32
 //       CHECKPARALLEL:   store %[[res]], %{{.*}}[] : memref<f32>
 
-func.func @fill_view(%arg0: memref<?xf32, strided<[1], offset: ?>>, %arg1: f32) {
-  linalg.fill ins(%arg1 : f32) outs(%arg0 : memref<?xf32, strided<[1], offset: ?>>)
+func.func @fill_view(%arg0: memref<?xf32, strided<[1]>>, %arg1: f32) {
+  linalg.fill ins(%arg1 : f32) outs(%arg0 : memref<?xf32, strided<[1]>>)
   return
 }
 // CHECK-LABEL: func @fill_view(
-//       CHECK: %{{.*}}: memref<?xf32, strided<[1], offset: ?>>, %{{.*}}: f32) {
+//       CHECK: %{{.*}}: memref<?xf32, strided<[1]>>, %{{.*}}: f32) {
 //       CHECK:   scf.for {{.*}} to %{{.*}}
-//       CHECK:     store %{{.*}}, %{{.*}}[%{{.*}}] : memref<?xf32, strided<[1], offset: ?>>
+//       CHECK:     store %{{.*}}, %{{.*}}[%{{.*}}] : memref<?xf32, strided<[1]>>
 
 // CHECKPARALLEL-LABEL: func @fill_view(
-//       CHECKPARALLEL: %{{.*}}: memref<?xf32, strided<[1], offset: ?>>, %{{.*}}: f32) {
+//       CHECKPARALLEL: %{{.*}}: memref<?xf32, strided<[1]>>, %{{.*}}: f32) {
 //       CHECKPARALLEL:   scf.parallel (%{{.*}}) = (%{{.*}}) to (%{{.*}}) step (%{{.*}}) {
-//       CHECKPARALLEL:     store %{{.*}}, %{{.*}}[%{{.*}}] : memref<?xf32, strided<[1], offset: ?>>
+//       CHECKPARALLEL:     store %{{.*}}, %{{.*}}[%{{.*}}] : memref<?xf32, strided<[1]>>
 
 func.func @fill_view0(%arg0: memref<f32>, %arg1: f32) {
   linalg.fill ins(%arg1 : f32) outs(%arg0 : memref<f32>)
@@ -209,44 +209,44 @@ func.func @fill_view0(%arg0: memref<f32>, %arg1: f32) {
 // CHECKPARALLEL-LABEL: func @fill_view0(%{{.*}}: memref<f32>, %{{.*}}: f32) {
 //       CHECKPARALLEL:   store %{{.*}}, %{{.*}}[] : memref<f32>
 
-func.func @fill_view3(%arg0: memref<?x?x?xf32, strided<[?, ?, 1], offset: ?>>, %arg1: f32) {
-  linalg.fill ins(%arg1 : f32) outs(%arg0 : memref<?x?x?xf32, strided<[?, ?, 1], offset: ?>>)
+func.func @fill_view3(%arg0: memref<?x?x?xf32, strided<[?, ?, 1]>>, %arg1: f32) {
+  linalg.fill ins(%arg1 : f32) outs(%arg0 : memref<?x?x?xf32, strided<[?, ?, 1]>>)
   return
 }
 // CHECK-LABEL: func @fill_view3(
-//       CHECK: %{{.*}}: memref<?x?x?xf32, strided<[?, ?, 1], offset: ?>>, %{{.*}}: f32) {
+//       CHECK: %{{.*}}: memref<?x?x?xf32, strided<[?, ?, 1]>>, %{{.*}}: f32) {
 //       CHECK:   scf.for {{.*}} to %{{.*}}
 //       CHECK:     scf.for {{.*}} to %{{.*}}
 //       CHECK:       scf.for {{.*}} to %{{.*}}
-//       CHECK:         store %{{.*}}, {{.*}} : memref<?x?x?xf32, strided<[?, ?, 1], offset: ?>>
+//       CHECK:         store %{{.*}}, {{.*}} : memref<?x?x?xf32, strided<[?, ?, 1]>>
 
 // CHECKPARALLEL-LABEL: func @fill_view3(
-//       CHECKPARALLEL: %{{.*}}: memref<?x?x?xf32, strided<[?, ?, 1], offset: ?>>, %{{.*}}: f32) {
+//       CHECKPARALLEL: %{{.*}}: memref<?x?x?xf32, strided<[?, ?, 1]>>, %{{.*}}: f32) {
 //       CHECKPARALLEL:   scf.parallel (%{{.*}}, %{{.*}}, %{{.*}}) = (%{{.*}}, %{{.*}}, %{{.*}}) to (%{{.*}}, %{{.*}}, %{{.*}}) step (%{{.*}}, %{{.*}}, %{{.*}}) {
-//       CHECKPARALLEL:     store %{{.*}}, {{.*}} : memref<?x?x?xf32, strided<[?, ?, 1], offset: ?>>
+//       CHECKPARALLEL:     store %{{.*}}, {{.*}} : memref<?x?x?xf32, strided<[?, ?, 1]>>
 
-func.func @copy_view(%arg0: memref<?xf32, strided<[1], offset: ?>>, %arg1: memref<?xf32, strided<[1], offset: ?>>) {
+func.func @copy_view(%arg0: memref<?xf32, strided<[1]>>, %arg1: memref<?xf32, strided<[1]>>) {
   linalg.generic {
     iterator_types = ["parallel"],
     indexing_maps = [ affine_map<(i) -> (i)>, affine_map<(i) -> (i)>] }
-    ins(%arg0: memref<?xf32, strided<[1], offset: ?>>)
-   outs(%arg1: memref<?xf32, strided<[1], offset: ?>>) {
+    ins(%arg0: memref<?xf32, strided<[1]>>)
+   outs(%arg1: memref<?xf32, strided<[1]>>) {
     ^bb0(%a: f32, %b: f32):
       linalg.yield %a : f32
   }
   return
 }
 // CHECK-LABEL: func @copy_view(
-//       CHECK: %{{.*}}: memref<?xf32, strided<[1], offset: ?>>, %{{.*}}: memref<?xf32, strided<[1], offset: ?>>) {
+//       CHECK: %{{.*}}: memref<?xf32, strided<[1]>>, %{{.*}}: memref<?xf32, strided<[1]>>) {
 //       CHECK:   scf.for {{.*}} to %{{.*}}
-//       CHECK:     %[[L:.*]] = memref.load %{{.*}}[%{{.*}}] : memref<?xf32, strided<[1], offset: ?>>
-//       CHECK:     store %[[L]], %{{.*}}[%{{.*}}] : memref<?xf32, strided<[1], offset: ?>>
+//       CHECK:     %[[L:.*]] = memref.load %{{.*}}[%{{.*}}] : memref<?xf32, strided<[1]>>
+//       CHECK:     store %[[L]], %{{.*}}[%{{.*}}] : memref<?xf32, strided<[1]>>
 
 // CHECKPARALLEL-LABEL: func @copy_view(
-//       CHECKPARALLEL: %{{.*}}: memref<?xf32, strided<[1], offset: ?>>, %{{.*}}: memref<?xf32, strided<[1], offset: ?>>) {
+//       CHECKPARALLEL: %{{.*}}: memref<?xf32, strided<[1]>>, %{{.*}}: memref<?xf32, strided<[1]>>) {
 //       CHECKPARALLEL:   scf.parallel (%{{.*}}) = (%{{.*}}) to (%{{.*}}) step (%{{.*}}) {
-//       CHECKPARALLEL:     %[[L:.*]] = memref.load %{{.*}}[%{{.*}}] : memref<?xf32, strided<[1], offset: ?>>
-//       CHECKPARALLEL:     store %[[L]], %{{.*}}[%{{.*}}] : memref<?xf32, strided<[1], offset: ?>>
+//       CHECKPARALLEL:     %[[L:.*]] = memref.load %{{.*}}[%{{.*}}] : memref<?xf32, strided<[1]>>
+//       CHECKPARALLEL:     store %[[L]], %{{.*}}[%{{.*}}] : memref<?xf32, strided<[1]>>
 
 #accesses = [
   affine_map<(i, j, k) -> (i, j)>,
@@ -259,11 +259,11 @@ func.func @copy_view(%arg0: memref<?xf32, strided<[1], offset: ?>>, %arg1: memre
   library_call = "some_external_function_name_2",
   doc = "B(i,j,k), C(i,k,j) = foo(A(i, j), B(i,j,k), C(i,k,j))"
 }
-func.func @generic_region(%arg0: memref<?x?xf32, strided<[?, 1], offset: ?>>, %arg1: memref<?x?x?xf32, strided<[?, ?, 1], offset: ?>>, %arg2: memref<?x?x?xf32, strided<[?, ?, 1], offset: ?>>) {
+func.func @generic_region(%arg0: memref<?x?xf32, strided<[?, 1]>>, %arg1: memref<?x?x?xf32, strided<[?, ?, 1]>>, %arg2: memref<?x?x?xf32, strided<[?, ?, 1]>>) {
   linalg.generic #trait2
-    ins(%arg0: memref<?x?xf32, strided<[?, 1], offset: ?>>)
-   outs(%arg1, %arg2 : memref<?x?x?xf32, strided<[?, ?, 1], offset: ?>>,
-                       memref<?x?x?xf32, strided<[?, ?, 1], offset: ?>>) {
+    ins(%arg0: memref<?x?xf32, strided<[?, 1]>>)
+   outs(%arg1, %arg2 : memref<?x?x?xf32, strided<[?, ?, 1]>>,
+                       memref<?x?x?xf32, strided<[?, ?, 1]>>) {
     ^bb0(%a: f32, %b: f32, %c: f32):
       %d = arith.mulf %a, %b : f32
       %e = arith.addf %c, %d : f32
@@ -275,23 +275,23 @@ func.func @generic_region(%arg0: memref<?x?xf32, strided<[?, 1], offset: ?>>, %a
 //       CHECK: scf.for %[[i:.*]] = {{.*}}
 //       CHECK:   scf.for %[[j:.*]] = {{.*}}
 //       CHECK:     scf.for %[[k:.*]] = {{.*}}
-//       CHECK:       %[[a:.*]] = memref.load %{{.*}}[%[[i]], %[[j]]] : memref<?x?xf32, strided<[?, 1], offset: ?>>
-//       CHECK:       %[[b:.*]] = memref.load %{{.*}}[%[[i]], %[[j]], %[[k]]] : memref<?x?x?xf32, strided<[?, ?, 1], offset: ?>>
-//       CHECK:       %[[c:.*]] = memref.load %{{.*}}[%[[i]], %[[k]], %[[j]]] : memref<?x?x?xf32, strided<[?, ?, 1], offset: ?>>
+//       CHECK:       %[[a:.*]] = memref.load %{{.*}}[%[[i]], %[[j]]] : memref<?x?xf32, strided<[?, 1]>>
+//       CHECK:       %[[b:.*]] = memref.load %{{.*}}[%[[i]], %[[j]], %[[k]]] : memref<?x?x?xf32, strided<[?, ?, 1]>>
+//       CHECK:       %[[c:.*]] = memref.load %{{.*}}[%[[i]], %[[k]], %[[j]]] : memref<?x?x?xf32, strided<[?, ?, 1]>>
 //       CHECK:       %[[d:.*]] = arith.mulf %[[a]], %[[b]] : f32
 //       CHECK:       %[[e:.*]] = arith.addf %[[c]], %[[d]] : f32
-//       CHECK:       store %[[d]], %{{.*}}[%[[i]], %[[j]], %[[k]]] : memref<?x?x?xf32, strided<[?, ?, 1], offset: ?>>
-//       CHECK:       store %[[e]], %{{.*}}[%[[i]], %[[k]], %[[j]]] : memref<?x?x?xf32, strided<[?, ?, 1], offset: ?>>
+//       CHECK:       store %[[d]], %{{.*}}[%[[i]], %[[j]], %[[k]]] : memref<?x?x?xf32, strided<[?, ?, 1]>>
+//       CHECK:       store %[[e]], %{{.*}}[%[[i]], %[[k]], %[[j]]] : memref<?x?x?xf32, strided<[?, ?, 1]>>
 
 // CHECKPARALLEL-LABEL: @generic_region
 //       CHECKPARALLEL: scf.parallel (%[[i:[a-zA-Z0-9_]*]], %[[j:[a-zA-Z0-9_]*]], %[[k:[a-zA-Z0-9_]*]])
-//       CHECKPARALLEL:   %[[a:.*]] = memref.load %{{.*}}[%[[i]], %[[j]]] : memref<?x?xf32, strided<[?, 1], offset: ?>>
-//       CHECKPARALLEL:   %[[b:.*]] = memref.load %{{.*}}[%[[i]], %[[j]], %[[k]]] : memref<?x?x?xf32, strided<[?, ?, 1], offset: ?>>
-//       CHECKPARALLEL:   %[[c:.*]] = memref.load %{{.*}}[%[[i]], %[[k]], %[[j]]] : memref<?x?x?xf32, strided<[?, ?, 1], offset: ?>>
+//       CHECKPARALLEL:   %[[a:.*]] = memref.load %{{.*}}[%[[i]], %[[j]]] : memref<?x?xf32, strided<[?, 1]>>
+//       CHECKPARALLEL:   %[[b:.*]] = memref.load %{{.*}}[%[[i]], %[[j]], %[[k]]] : memref<?x?x?xf32, strided<[?, ?, 1]>>
+//       CHECKPARALLEL:   %[[c:.*]] = memref.load %{{.*}}[%[[i]], %[[k]], %[[j]]] : memref<?x?x?xf32, strided<[?, ?, 1]>>
 //       CHECKPARALLEL:   %[[d:.*]] = arith.mulf %[[a]], %[[b]] : f32
 //       CHECKPARALLEL:   %[[e:.*]] = arith.addf %[[c]], %[[d]] : f32
-//       CHECKPARALLEL:   store %[[d]], %{{.*}}[%[[i]], %[[j]], %[[k]]] : memref<?x?x?xf32, strided<[?, ?, 1], offset: ?>>
-//       CHECKPARALLEL:   store %[[e]], %{{.*}}[%[[i]], %[[k]], %[[j]]] : memref<?x?x?xf32, strided<[?, ?, 1], offset: ?>>
+//       CHECKPARALLEL:   store %[[d]], %{{.*}}[%[[i]], %[[j]], %[[k]]] : memref<?x?x?xf32, strided<[?, ?, 1]>>
+//       CHECKPARALLEL:   store %[[e]], %{{.*}}[%[[i]], %[[k]], %[[j]]] : memref<?x?x?xf32, strided<[?, ?, 1]>>
 
 #trait4 = {
   iterator_types = ["parallel", "parallel", "parallel"],
@@ -300,13 +300,13 @@ func.func @generic_region(%arg0: memref<?x?xf32, strided<[?, 1], offset: ?>>, %a
   doc = "B(i,j,k), C(i,k,j) = foo(A(i, j) * B(i,j,k), i * j * k + C(i,k,j))"
 }
 func.func @generic_index_region(
-        %arg0: memref<?x?xf32, strided<[?, 1], offset: ?>>,
-        %arg1: memref<?x?x?xf32, strided<[?, ?, 1], offset: ?>>,
-        %arg2: memref<?x?x?xf32, strided<[?, ?, 1], offset: ?>>) {
+        %arg0: memref<?x?xf32, strided<[?, 1]>>,
+        %arg1: memref<?x?x?xf32, strided<[?, ?, 1]>>,
+        %arg2: memref<?x?x?xf32, strided<[?, ?, 1]>>) {
   linalg.generic #trait4
-      ins(%arg0 : memref<?x?xf32, strided<[?, 1], offset: ?>>)
-     outs(%arg1, %arg2 : memref<?x?x?xf32, strided<[?, ?, 1], offset: ?>>,
-                         memref<?x?x?xf32, strided<[?, ?, 1], offset: ?>>) {
+      ins(%arg0 : memref<?x?xf32, strided<[?, 1]>>)
+     outs(%arg1, %arg2 : memref<?x?x?xf32, strided<[?, ?, 1]>>,
+                         memref<?x?x?xf32, strided<[?, ?, 1]>>) {
     ^bb0(%a: f32, %b: f32, %c: f32):
       %i = linalg.index 0 : index
       %j = linalg.index 1 : index
@@ -882,14 +882,14 @@ func.func @lower_to_loops_with_rank_reducing_subviews(
     %arg0 : memref<?xi32>, %arg1 : memref<?x?xi32>, %arg2 : index,
     %arg3 : index, %arg4 : index) {
   %0 = memref.subview %arg0[%arg2] [%arg3] [1]
-      : memref<?xi32> to memref<?xi32, strided<[1], offset: ?>>
+      : memref<?xi32> to memref<?xi32, strided<[1]>>
   %1 = memref.subview %arg1[0, %arg4] [1, %arg3] [1, 1]
-      : memref<?x?xi32> to memref<?xi32, strided<[1], offset: ?>>
+      : memref<?x?xi32> to memref<?xi32, strided<[1]>>
   linalg.generic {
     iterator_types = ["parallel"],
     indexing_maps = [affine_map<(i) -> (i)>, affine_map<(i) -> (i)>]}
-    ins(%0: memref<?xi32, strided<[1], offset: ?>>)
-   outs(%1: memref<?xi32, strided<[1], offset: ?>>) {
+    ins(%0: memref<?xi32, strided<[1]>>)
+   outs(%1: memref<?xi32, strided<[1]>>) {
     ^bb0(%a: i32, %b: i32):
       linalg.yield %a : i32
   }
diff --git a/mlir/test/Dialect/Linalg/one-shot-bufferize.mlir b/mlir/test/Dialect/Linalg/one-shot-bufferize.mlir
index 85cc1ffc2029e..d972c6c998f98 100644
--- a/mlir/test/Dialect/Linalg/one-shot-bufferize.mlir
+++ b/mlir/test/Dialect/Linalg/one-shot-bufferize.mlir
@@ -11,7 +11,7 @@
 // TODO: Some test cases from this file should be moved to other dialects.
 
 // CHECK-LABEL: func private @fill_inplace(
-//  CHECK-SAME:   %[[A:[a-zA-Z0-9]*]]: memref<?xf32, strided<[?], offset: ?>>
+//  CHECK-SAME:   %[[A:[a-zA-Z0-9]*]]: memref<?xf32, strided<[?]>>
 // CHECK-NO-LAYOUT-MAP-LABEL: func private @fill_inplace(%{{.*}}: memref<?xf32>) {
 func.func private @fill_inplace(
     %A : tensor<?xf32> {bufferization.writable = true})
@@ -22,7 +22,7 @@ func.func private @fill_inplace(
 
   /// Inplaceable, no alloc
   // CHECK-NOT: alloc
-  //     CHECK: linalg.fill ins(%[[F0]] : f32) outs(%[[A]] : memref<?xf32, strided<[?], offset: ?>>)
+  //     CHECK: linalg.fill ins(%[[F0]] : f32) outs(%[[A]] : memref<?xf32, strided<[?]>>)
   %r = linalg.fill ins(%f0 : f32) outs(%A : tensor<?xf32>) -> tensor<?xf32>
 
   //     CHECK: return
@@ -34,7 +34,7 @@ func.func private @fill_inplace(
 
 /// No bufferization.writable flag, must allocate.
 // CHECK-LABEL: func @not_inplace(
-//  CHECK-SAME:   %[[A:[a-zA-Z0-9]*]]: memref<?xf32, strided<[?], offset: ?>>) -> memref<?xf32> {
+//  CHECK-SAME:   %[[A:[a-zA-Z0-9]*]]: memref<?xf32, strided<[?]>>) -> memref<?xf32> {
 // CHECK-NO-LAYOUT-MAP-LABEL: func @not_inplace(%{{.*}}: memref<?xf32>) -> memref<?xf32>
 func.func @not_inplace(
     %A : tensor<?xf32> {bufferization.writable = false})
@@ -43,7 +43,7 @@ func.func @not_inplace(
   //     CHECK: %[[F0:.*]] = arith.constant 0.000000e+00 : f32
   %f0 = arith.constant 0.0 : f32
 
-  //     CHECK: %[[D0:.*]] = memref.dim %[[A]], {{.*}} : memref<?xf32, strided<[?], offset: ?>>
+  //     CHECK: %[[D0:.*]] = memref.dim %[[A]], {{.*}} : memref<?xf32, strided<[?]>>
   //     CHECK: %[[ALLOC:.*]] = memref.alloc(%[[D0]]) {alignment = 64 : i64} : memref<?xf32>
   //     CHECK: linalg.fill ins(%[[F0]] : f32) outs(%[[ALLOC]] : memref<?xf32>)
   %r = linalg.fill ins(%f0 : f32) outs(%A : tensor<?xf32>) -> tensor<?xf32>
@@ -57,7 +57,7 @@ func.func @not_inplace(
 
 
 // CHECK-LABEL: func private @not_inplace
-//  CHECK-SAME:   %[[A:[a-zA-Z0-9]*]]: memref<?x?xf32, strided<[?, ?], offset: ?>>) {
+//  CHECK-SAME:   %[[A:[a-zA-Z0-9]*]]: memref<?x?xf32, strided<[?, ?]>>) {
 // CHECK-NO-LAYOUT-MAP-LABEL: func private @not_inplace(%{{.*}}: memref<?x?xf32>) {
 func.func private @not_inplace(
     %A : tensor<?x?xf32> {bufferization.writable = true})
@@ -115,7 +115,7 @@ func.func @vec_inplace(
 // -----
 
 // CHECK-LABEL: func @vec_not_inplace
-//  CHECK-SAME:   %[[A:[a-zA-Z0-9]*]]: memref<?xf32, strided<[?], offset: ?>>
+//  CHECK-SAME:   %[[A:[a-zA-Z0-9]*]]: memref<?xf32, strided<[?]>>
 func.func @vec_not_inplace(
     %A : tensor<?xf32> {bufferization.writable = true}, %vec : vector<4xf32>)
   -> (tensor<?xf32>, tensor<?xf32>)
diff --git a/mlir/test/Dialect/Linalg/pad-to-specific-memory-space.mlir b/mlir/test/Dialect/Linalg/pad-to-specific-memory-space.mlir
index 9f52cf8aa862a..ac140ab60e066 100644
--- a/mlir/test/Dialect/Linalg/pad-to-specific-memory-space.mlir
+++ b/mlir/test/Dialect/Linalg/pad-to-specific-memory-space.mlir
@@ -4,9 +4,9 @@
 #map = affine_map<()[s0] -> (-s0 + 12, 7)>
 
 // CHECK-LABEL: func @pad_to_memory_space(
-//  CHECK-SAME:     %[[arg0:.*]]: memref<24x12xf32, strided<[?, ?], offset: ?>>,
-//  CHECK-SAME:     %[[arg1:.*]]: memref<12x25xf32, strided<[?, ?], offset: ?>>,
-//  CHECK-SAME:     %[[arg2:.*]]: memref<24x25xf32, strided<[?, ?], offset: ?>>,
+//  CHECK-SAME:     %[[arg0:.*]]: memref<24x12xf32, strided<[?, ?]>>,
+//  CHECK-SAME:     %[[arg1:.*]]: memref<12x25xf32, strided<[?, ?]>>,
+//  CHECK-SAME:     %[[arg2:.*]]: memref<24x25xf32, strided<[?, ?]>>,
 func.func @pad_to_memory_space(%arg0: tensor<24x12xf32>,
                                %arg1: tensor<12x25xf32>,
                                %arg2: tensor<24x25xf32>,
@@ -66,9 +66,9 @@ module attributes {transform.with_named_sequence} {
 #map = affine_map<()[s0] -> (-s0 + 12, 7)>
 
 // CHECK-LABEL: func @vectorize_and_bufferize_pad(
-//  CHECK-SAME:     %[[arg0:.*]]: memref<24x12xf32, strided<[?, ?], offset: ?>>,
-//  CHECK-SAME:     %[[arg1:.*]]: memref<12x25xf32, strided<[?, ?], offset: ?>>,
-//  CHECK-SAME:     %[[arg2:.*]]: memref<24x25xf32, strided<[?, ?], offset: ?>>,
+//  CHECK-SAME:     %[[arg0:.*]]: memref<24x12xf32, strided<[?, ?]>>,
+//  CHECK-SAME:     %[[arg1:.*]]: memref<12x25xf32, strided<[?, ?]>>,
+//  CHECK-SAME:     %[[arg2:.*]]: memref<24x25xf32, strided<[?, ?]>>,
 func.func @vectorize_and_bufferize_pad(%arg0: tensor<24x12xf32>,
                                        %arg1: tensor<12x25xf32>,
                                        %arg2: tensor<24x25xf32>,
diff --git a/mlir/test/Dialect/Linalg/promote.mlir b/mlir/test/Dialect/Linalg/promote.mlir
index bab606c3a8169..04e17e40af2ab 100644
--- a/mlir/test/Dialect/Linalg/promote.mlir
+++ b/mlir/test/Dialect/Linalg/promote.mlir
@@ -19,13 +19,13 @@ func.func @matmul_f32(%A: memref<?xi8>, %M: index, %N: index, %K: index) {
   scf.for %arg4 = %c0 to %6 step %c2 {
     scf.for %arg5 = %c0 to %8 step %c3 {
       scf.for %arg6 = %c0 to %7 step %c4 {
-        %11 = memref.subview %3[%arg4, %arg6][%c2, %c4][1, 1] : memref<?x?xf32> to memref<?x?xf32, strided<[?, 1], offset: ?>>
-        %14 = memref.subview %4[%arg6, %arg5][%c4, %c3][1, 1] : memref<?x?xf32> to memref<?x?xf32, strided<[?, 1], offset: ?>>
-        %17 = memref.subview %5[%arg4, %arg5][%c2, %c3][1, 1] : memref<?x?xf32> to memref<?x?xf32, strided<[?, 1], offset: ?>>
+        %11 = memref.subview %3[%arg4, %arg6][%c2, %c4][1, 1] : memref<?x?xf32> to memref<?x?xf32, strided<[?, 1]>>
+        %14 = memref.subview %4[%arg6, %arg5][%c4, %c3][1, 1] : memref<?x?xf32> to memref<?x?xf32, strided<[?, 1]>>
+        %17 = memref.subview %5[%arg4, %arg5][%c2, %c3][1, 1] : memref<?x?xf32> to memref<?x?xf32, strided<[?, 1]>>
         linalg.matmul
-          ins(%11, %14: memref<?x?xf32, strided<[?, 1], offset: ?>>,
-                        memref<?x?xf32, strided<[?, 1], offset: ?>>)
-         outs(%17: memref<?x?xf32, strided<[?, 1], offset: ?>>)
+          ins(%11, %14: memref<?x?xf32, strided<[?, 1]>>,
+                        memref<?x?xf32, strided<[?, 1]>>)
+         outs(%17: memref<?x?xf32, strided<[?, 1]>>)
       }
     }
   }
@@ -52,13 +52,13 @@ func.func @matmul_f32(%A: memref<?xi8>, %M: index, %N: index, %K: index) {
 //       CHECK:         %[[fullC:.*]] = memref.view %[[tmpC]][{{.*}}][{{.*}}] : memref<24xi8> to memref<?x?xf32>
 //       CHECK:         %[[partialC:.*]] = memref.subview %[[fullC]]{{.*}} : memref<?x?xf32> to memref<?x?xf32, strided<[?, 1]>>
 
-//       CHECK:         linalg.copy ins(%[[vA]] : memref<?x?xf32, strided<[?, 1], offset: ?>>) outs(%[[partialA]] : memref<?x?xf32, strided<[?, 1]>>)
-//       CHECK:         linalg.copy ins(%[[vB]] : memref<?x?xf32, strided<[?, 1], offset: ?>>) outs(%[[partialB]] : memref<?x?xf32, strided<[?, 1]>>)
-//       CHECK:         linalg.copy ins(%[[vC]] : memref<?x?xf32, strided<[?, 1], offset: ?>>) outs(%[[partialC]] : memref<?x?xf32, strided<[?, 1]>>)
+//       CHECK:         linalg.copy ins(%[[vA]] : memref<?x?xf32, strided<[?, 1]>>) outs(%[[partialA]] : memref<?x?xf32, strided<[?, 1]>>)
+//       CHECK:         linalg.copy ins(%[[vB]] : memref<?x?xf32, strided<[?, 1]>>) outs(%[[partialB]] : memref<?x?xf32, strided<[?, 1]>>)
+//       CHECK:         linalg.copy ins(%[[vC]] : memref<?x?xf32, strided<[?, 1]>>) outs(%[[partialC]] : memref<?x?xf32, strided<[?, 1]>>)
 //
 //       CHECK:         linalg.matmul ins(%[[partialA]], %[[partialB]]{{.*}} outs(%[[partialC]]
 //
-//       CHECK:         linalg.copy ins(%[[partialC]] : memref<?x?xf32, strided<[?, 1]>>) outs(%[[vC]] : memref<?x?xf32, strided<[?, 1], offset: ?>>)
+//       CHECK:         linalg.copy ins(%[[partialC]] : memref<?x?xf32, strided<[?, 1]>>) outs(%[[vC]] : memref<?x?xf32, strided<[?, 1]>>)
 //
 //   CHECK-NOT:         memref.dealloc %[[tmpA]] : memref<32xi8>
 //   CHECK-NOT:         memref.dealloc %[[tmpB]] : memref<48xi8>
@@ -89,13 +89,13 @@ func.func @matmul_f64(%A: memref<?xi8>, %M: index, %N: index, %K: index) {
   scf.for %arg4 = %c0 to %6 step %c2 {
     scf.for %arg5 = %c0 to %8 step %c3 {
       scf.for %arg6 = %c0 to %7 step %c4 {
-        %11 = memref.subview %3[%arg4, %arg6][%c2, %c4][1, 1] : memref<?x?xf64> to memref<?x?xf64, strided<[?, 1], offset: ?>>
-        %14 = memref.subview %4[%arg6, %arg5][%c4, %c3][1, 1] : memref<?x?xf64> to memref<?x?xf64, strided<[?, 1], offset: ?>>
-        %17 = memref.subview %5[%arg4, %arg5][%c2, %c3][1, 1] : memref<?x?xf64> to memref<?x?xf64, strided<[?, 1], offset: ?>>
+        %11 = memref.subview %3[%arg4, %arg6][%c2, %c4][1, 1] : memref<?x?xf64> to memref<?x?xf64, strided<[?, 1]>>
+        %14 = memref.subview %4[%arg6, %arg5][%c4, %c3][1, 1] : memref<?x?xf64> to memref<?x?xf64, strided<[?, 1]>>
+        %17 = memref.subview %5[%arg4, %arg5][%c2, %c3][1, 1] : memref<?x?xf64> to memref<?x?xf64, strided<[?, 1]>>
         linalg.matmul
-          ins(%11, %14: memref<?x?xf64, strided<[?, 1], offset: ?>>,
-                        memref<?x?xf64, strided<[?, 1], offset: ?>>)
-         outs(%17: memref<?x?xf64, strided<[?, 1], offset: ?>>)
+          ins(%11, %14: memref<?x?xf64, strided<[?, 1]>>,
+                        memref<?x?xf64, strided<[?, 1]>>)
+         outs(%17: memref<?x?xf64, strided<[?, 1]>>)
       }
     }
   }
@@ -122,13 +122,13 @@ func.func @matmul_f64(%A: memref<?xi8>, %M: index, %N: index, %K: index) {
 //       CHECK:         %[[fullC_f64:.*]] = memref.view %[[tmpC_f64]][{{.*}}][{{.*}}] : memref<48xi8> to memref<?x?xf64>
 //       CHECK:         %[[partialC_f64:.*]] = memref.subview %[[fullC_f64]][0, 0] [%{{.*}}, %{{.*}}] [1, 1] : memref<?x?xf64> to memref<?x?xf64, strided<[?, 1]>>
 
-//       CHECK:         linalg.copy ins(%[[vA_f64]] : memref<?x?xf64, strided<[?, 1], offset: ?>>) outs(%[[partialA_f64]] : memref<?x?xf64, strided<[?, 1]>>)
-//       CHECK:         linalg.copy ins(%[[vB_f64]] : memref<?x?xf64, strided<[?, 1], offset: ?>>) outs(%[[partialB_f64]] : memref<?x?xf64, strided<[?, 1]>>)
-//       CHECK:         linalg.copy ins(%[[vC_f64]] : memref<?x?xf64, strided<[?, 1], offset: ?>>) outs(%[[partialC_f64]] : memref<?x?xf64, strided<[?, 1]>>)
+//       CHECK:         linalg.copy ins(%[[vA_f64]] : memref<?x?xf64, strided<[?, 1]>>) outs(%[[partialA_f64]] : memref<?x?xf64, strided<[?, 1]>>)
+//       CHECK:         linalg.copy ins(%[[vB_f64]] : memref<?x?xf64, strided<[?, 1]>>) outs(%[[partialB_f64]] : memref<?x?xf64, strided<[?, 1]>>)
+//       CHECK:         linalg.copy ins(%[[vC_f64]] : memref<?x?xf64, strided<[?, 1]>>) outs(%[[partialC_f64]] : memref<?x?xf64, strided<[?, 1]>>)
 //
 //       CHECK:         linalg.matmul ins(%[[partialA_f64]], %[[partialB_f64]]{{.*}} outs(%[[partialC_f64]]
 //
-//       CHECK:         linalg.copy ins(%[[partialC_f64]] : memref<?x?xf64, strided<[?, 1]>>) outs(%[[vC_f64]] : memref<?x?xf64, strided<[?, 1], offset: ?>>)
+//       CHECK:         linalg.copy ins(%[[partialC_f64]] : memref<?x?xf64, strided<[?, 1]>>) outs(%[[vC_f64]] : memref<?x?xf64, strided<[?, 1]>>)
 //
 //       CHECK:         memref.dealloc %[[tmpA_f64]] : memref<64xi8>
 //       CHECK:         memref.dealloc %[[tmpB_f64]] : memref<96xi8>
@@ -162,19 +162,19 @@ func.func @gemm_shared(%a : memref<?x?xf32>, %b : memref<?x?xf32>, %c : memref<?
 // CHECK:   scf.for %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} {
 // CHECK:     scf.for %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} {
 // CHECK:       scf.for %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} {
-// CHECK:         %[[subview_A:.*]] = memref.subview {{.*}} : memref<?x?xf32> to memref<?x?xf32, strided<[?, 1], offset: ?>>
-// CHECK:         %[[subview_B:.*]] = memref.subview {{.*}} : memref<?x?xf32> to memref<?x?xf32, strided<[?, 1], offset: ?>>
-// CHECK:         %[[subview_C:.*]] = memref.subview {{.*}} : memref<?x?xf32> to memref<?x?xf32, strided<[?, 1], offset: ?>>
+// CHECK:         %[[subview_A:.*]] = memref.subview {{.*}} : memref<?x?xf32> to memref<?x?xf32, strided<[?, 1]>>
+// CHECK:         %[[subview_B:.*]] = memref.subview {{.*}} : memref<?x?xf32> to memref<?x?xf32, strided<[?, 1]>>
+// CHECK:         %[[subview_C:.*]] = memref.subview {{.*}} : memref<?x?xf32> to memref<?x?xf32, strided<[?, 1]>>
 
 // CHECK:         %[[shared_A:.*]] = memref.subview %[[alloc_B]][0, 0] [%{{.*}}, %{{.*}}] [1, 1] : memref<16x16xf32, #gpu.address_space<workgroup>> to memref<?x?xf32, strided<[16, 1]>, #gpu.address_space<workgroup>>
 // CHECK:         %[[shared_B:.*]] = memref.subview %[[alloc_A]][0, 0] [%{{.*}}, %{{.*}}] [1, 1] : memref<16x16xf32, #gpu.address_space<workgroup>> to memref<?x?xf32, strided<[16, 1]>, #gpu.address_space<workgroup>>
 
 // CHECK-NEXT:    gpu.barrier
-// CHECK-NEXT:    memref.copy %[[subview_A]], %[[shared_A]] :  memref<?x?xf32, strided<[?, 1], offset: ?>> to memref<?x?xf32, strided<[16, 1]>, #gpu.address_space<workgroup>>
+// CHECK-NEXT:    memref.copy %[[subview_A]], %[[shared_A]] :  memref<?x?xf32, strided<[?, 1]>> to memref<?x?xf32, strided<[16, 1]>, #gpu.address_space<workgroup>>
 // CHECK-NEXT:    gpu.barrier
 
 // CHECK-NEXT:    gpu.barrier
-// CHECK-NEXT:    memref.copy %[[subview_B]], %[[shared_B]] :  memref<?x?xf32, strided<[?, 1], offset: ?>> to memref<?x?xf32, strided<[16, 1]>, #gpu.address_space<workgroup>>
+// CHECK-NEXT:    memref.copy %[[subview_B]], %[[shared_B]] :  memref<?x?xf32, strided<[?, 1]>> to memref<?x?xf32, strided<[16, 1]>, #gpu.address_space<workgroup>>
 // CHECK-NEXT:    gpu.barrier
 
 // CHECK:         linalg.matmul ins(%[[shared_A]], %[[shared_B]]{{.*}} outs(%[[subview_C]]
@@ -211,15 +211,15 @@ func.func @gemm_private(%a : memref<?x?xf32>, %b : memref<?x?xf32>, %c : memref<
 // CHECK:   scf.for %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} {
 // CHECK:     scf.for %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} {
 // CHECK:       scf.for %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} {
-// CHECK:         %[[subview_A:.*]] = memref.subview {{.*}} : memref<?x?xf32> to memref<?x?xf32, strided<[?, 1], offset: ?>>
-// CHECK:         %[[subview_B:.*]] = memref.subview {{.*}} : memref<?x?xf32> to memref<?x?xf32, strided<[?, 1], offset: ?>>
-// CHECK:         %[[subview_C:.*]] = memref.subview {{.*}} : memref<?x?xf32> to memref<?x?xf32, strided<[?, 1], offset: ?>>
+// CHECK:         %[[subview_A:.*]] = memref.subview {{.*}} : memref<?x?xf32> to memref<?x?xf32, strided<[?, 1]>>
+// CHECK:         %[[subview_B:.*]] = memref.subview {{.*}} : memref<?x?xf32> to memref<?x?xf32, strided<[?, 1]>>
+// CHECK:         %[[subview_C:.*]] = memref.subview {{.*}} : memref<?x?xf32> to memref<?x?xf32, strided<[?, 1]>>
 
 // CHECK:         %[[private_A:.*]] = memref.subview %[[alloc_B]][0, 0] [%{{.*}}, %{{.*}}] [1, 1] : memref<16x16xf32, #gpu.address_space<private>> to memref<?x?xf32, strided<[16, 1]>, #gpu.address_space<private>>
 // CHECK:         %[[private_B:.*]] = memref.subview %[[alloc_A]][0, 0] [%{{.*}}, %{{.*}}] [1, 1] : memref<16x16xf32, #gpu.address_space<private>> to memref<?x?xf32, strided<[16, 1]>, #gpu.address_space<private>>
 
-// CHECK-NEXT:    memref.copy %[[subview_A]], %[[private_A]] :  memref<?x?xf32, strided<[?, 1], offset: ?>> to memref<?x?xf32, strided<[16, 1]>, #gpu.address_space<private>>
-// CHECK-NEXT:    memref.copy %[[subview_B]], %[[private_B]] :  memref<?x?xf32, strided<[?, 1], offset: ?>> to memref<?x?xf32, strided<[16, 1]>, #gpu.address_space<private>>
+// CHECK-NEXT:    memref.copy %[[subview_A]], %[[private_A]] :  memref<?x?xf32, strided<[?, 1]>> to memref<?x?xf32, strided<[16, 1]>, #gpu.address_space<private>>
+// CHECK-NEXT:    memref.copy %[[subview_B]], %[[private_B]] :  memref<?x?xf32, strided<[?, 1]>> to memref<?x?xf32, strided<[16, 1]>, #gpu.address_space<private>>
 
 // CHECK:         linalg.matmul ins(%[[private_A]], %[[private_B]]{{.*}} outs(%[[subview_C]]
 
@@ -241,11 +241,11 @@ module attributes {transform.with_named_sequence} {
 #map8 = affine_map<(d0, d1, d2) -> (d0, d1)>
 
 // CHECK: promote_rank_reducing_subviews(%[[arg0:.+]]: memref<{{.*}}>, %[[arg1:.+]]: memref<{{.*}}>, %[[arg2:.+]]: memref<{{.*}}>, %[[lb1:.+]]: index, %[[lb2:.+]]: index, %[[lb3:.+]]: index, %[[lb4:.+]]: index, %[[lb5:.+]]: index, %[[lb6:.+]]: index, %[[ub1:.+]]: index, %[[ub2:.+]]: index
-func.func @promote_rank_reducing_subviews(%arg0:  memref<?x?x?x64xf32, strided<[?, ?, ?, ?], offset: ?>>, %arg1: memref<128x3x3x64xf32, strided<[?, ?, ?, ?], offset: ?>>, %arg2: memref<?x?x?x128xf32>,
+func.func @promote_rank_reducing_subviews(%arg0:  memref<?x?x?x64xf32, strided<[?, ?, ?, ?]>>, %arg1: memref<128x3x3x64xf32, strided<[?, ?, ?, ?]>>, %arg2: memref<?x?x?x128xf32>,
                                           %arg3: index, %arg4: index, %arg5: index, %arg6: index, %arg7: index, %arg8: index, %ub1: index, %ub2: index) {
-  %13 = memref.subview %arg0[%arg3, 0, %arg4, %arg8] [1, 1, %ub1, 32] [1, 1, 1, 1] : memref<?x?x?x64xf32, strided<[?, ?, ?, ?], offset: ?>> to memref<?x32xf32, strided<[?, ?], offset: ?>>
-  %14 = memref.subview %arg1[0, %arg6, %arg7, %arg8] [128, 1, 1, 32] [1, 1, 1, 1] : memref<128x3x3x64xf32, strided<[?, ?, ?, ?], offset: ?>> to memref<128x32xf32, strided<[?, ?], offset: ?>>
-  %9 = memref.subview %arg2[%arg3, %arg4, %arg5, 0] [1, 1, %ub2, 128] [1, 1, 1, 1] : memref<?x?x?x128xf32> to memref<?x128xf32, strided<[128, 1], offset: ?>>
+  %13 = memref.subview %arg0[%arg3, 0, %arg4, %arg8] [1, 1, %ub1, 32] [1, 1, 1, 1] : memref<?x?x?x64xf32, strided<[?, ?, ?, ?]>> to memref<?x32xf32, strided<[?, ?]>>
+  %14 = memref.subview %arg1[0, %arg6, %arg7, %arg8] [128, 1, 1, 32] [1, 1, 1, 1] : memref<128x3x3x64xf32, strided<[?, ?, ?, ?]>> to memref<128x32xf32, strided<[?, ?]>>
+  %9 = memref.subview %arg2[%arg3, %arg4, %arg5, 0] [1, 1, %ub2, 128] [1, 1, 1, 1] : memref<?x?x?x128xf32> to memref<?x128xf32, strided<[128, 1]>>
 
   // CHECK: %[[a_alloc:.+]] = memref.alloc
   // CHECK: %[[a_view:.+]] = memref.view %[[a_alloc]]{{.*}}
@@ -264,7 +264,7 @@ func.func @promote_rank_reducing_subviews(%arg0:  memref<?x?x?x64xf32, strided<[
   // CHECK-SAME: ins(%[[a_pro_subview]], %[[b_pro_subview]]
   // CHECK-SAME: outs(%[[c_pro_subview]]
 
-  linalg.generic {indexing_maps = [#map6, #map7, #map8], iterator_types = ["parallel", "parallel", "reduction"]} ins(%13, %14 : memref<?x32xf32, strided<[?, ?], offset: ?>>, memref<128x32xf32, strided<[?, ?], offset: ?>>) outs(%9 : memref<?x128xf32, strided<[128, 1], offset: ?>>) {
+  linalg.generic {indexing_maps = [#map6, #map7, #map8], iterator_types = ["parallel", "parallel", "reduction"]} ins(%13, %14 : memref<?x32xf32, strided<[?, ?]>>, memref<128x32xf32, strided<[?, ?]>>) outs(%9 : memref<?x128xf32, strided<[128, 1]>>) {
   ^bb0(%arg9: f32, %arg10: f32, %arg11: f32):
     %15 = arith.mulf %arg9, %arg10 : f32
     %16 = arith.addf %arg11, %15 : f32
diff --git a/mlir/test/Dialect/Linalg/promotion_options.mlir b/mlir/test/Dialect/Linalg/promotion_options.mlir
index dbc073c2665f9..5b7651bd0d1bd 100644
--- a/mlir/test/Dialect/Linalg/promotion_options.mlir
+++ b/mlir/test/Dialect/Linalg/promotion_options.mlir
@@ -27,10 +27,10 @@ func.func @gemm(%a : memref<?x?xf32>, %b : memref<?x?xf32>, %c : memref<?x?xf32>
 //      CHECK:       %[[VC:.*]] = memref.view %[[tmpC]][%[[C0]]][] : memref<1024xi8> to memref<16x16xf32>
 //      CHECK:       %[[svCC:.+]] = memref.subview %[[VC]]
 
-//      CHECK:       linalg.copy ins(%[[svA]] : memref<?x?xf32, strided<[?, 1], offset: ?>>) outs(%[[svAA]] : memref<?x?xf32, strided<[16, 1]>>)
-//      CHECK:       linalg.copy ins(%[[svC]] : memref<?x?xf32, strided<[?, 1], offset: ?>>) outs(%[[svCC]] : memref<?x?xf32, strided<[16, 1]>>)
+//      CHECK:       linalg.copy ins(%[[svA]] : memref<?x?xf32, strided<[?, 1]>>) outs(%[[svAA]] : memref<?x?xf32, strided<[16, 1]>>)
+//      CHECK:       linalg.copy ins(%[[svC]] : memref<?x?xf32, strided<[?, 1]>>) outs(%[[svCC]] : memref<?x?xf32, strided<[16, 1]>>)
 //      CHECK:       linalg.matmul ins(%[[VA]], %[[svB]]{{.*}} outs(%[[VC]]
-//      CHECK:       linalg.copy ins(%[[svCC]] : memref<?x?xf32, strided<[16, 1]>>) outs(%[[svC]] : memref<?x?xf32, strided<[?, 1], offset: ?>>)
+//      CHECK:       linalg.copy ins(%[[svCC]] : memref<?x?xf32, strided<[16, 1]>>) outs(%[[svC]] : memref<?x?xf32, strided<[?, 1]>>)
 //      CHECK:       memref.dealloc %[[tmpA]]
 //      CHECK:       memref.dealloc %[[tmpC]]
 
@@ -55,13 +55,13 @@ func.func @matmul_f32(%A: memref<512x256xf32>, %B: memref<256x512xf32>, %C: memr
         %i0 = affine.min affine_map<(d0)[s0] -> (-d0 + 512, s0)>(%arg4)[%s0]
         %i1 = affine.min affine_map<(d0)[s0] -> (-d0 + 512, s0)>(%arg5)[%s1]
         %i2 = affine.min affine_map<(d0)[s0] -> (-d0 + 256, s0)>(%arg6)[%s2]
-        %0 = memref.subview %A[%arg4, %arg6][%i0, %i2][1, 1] : memref<512x256xf32> to memref<?x?xf32, strided<[256, 1], offset: ?>>
-        %1 = memref.subview %B[%arg6, %arg5][%i2, %i1][1, 1] : memref<256x512xf32> to memref<?x?xf32, strided<[512, 1], offset: ?>>
-        %2 = memref.subview %C[%arg4, %arg5][%i0, %i1][1, 1] : memref<256x256xf32> to memref<?x?xf32, strided<[256, 1], offset: ?>>
+        %0 = memref.subview %A[%arg4, %arg6][%i0, %i2][1, 1] : memref<512x256xf32> to memref<?x?xf32, strided<[256, 1]>>
+        %1 = memref.subview %B[%arg6, %arg5][%i2, %i1][1, 1] : memref<256x512xf32> to memref<?x?xf32, strided<[512, 1]>>
+        %2 = memref.subview %C[%arg4, %arg5][%i0, %i1][1, 1] : memref<256x256xf32> to memref<?x?xf32, strided<[256, 1]>>
         linalg.matmul
-          ins(%0, %1: memref<?x?xf32, strided<[256, 1], offset: ?>>,
-                      memref<?x?xf32, strided<[512, 1], offset: ?>>)
-          outs(%2: memref<?x?xf32, strided<[256, 1], offset: ?>>)
+          ins(%0, %1: memref<?x?xf32, strided<[256, 1]>>,
+                      memref<?x?xf32, strided<[512, 1]>>)
+          outs(%2: memref<?x?xf32, strided<[256, 1]>>)
       }
     }
   }
diff --git a/mlir/test/Dialect/Linalg/roundtrip.mlir b/mlir/test/Dialect/Linalg/roundtrip.mlir
index bfb92c3289a49..bc81bb85b34e6 100644
--- a/mlir/test/Dialect/Linalg/roundtrip.mlir
+++ b/mlir/test/Dialect/Linalg/roundtrip.mlir
@@ -26,65 +26,65 @@ func.func @views(%arg0: index) {
 
 // -----
 
-func.func @ops(%arg0: memref<?x?xf32, strided<[?, 1], offset: ?>>,
-          %arg1: memref<?xf32, strided<[1], offset: ?>>,
-          %arg2: memref<?xf32, strided<[1], offset: ?>>,
+func.func @ops(%arg0: memref<?x?xf32, strided<[?, 1]>>,
+          %arg1: memref<?xf32, strided<[1]>>,
+          %arg2: memref<?xf32, strided<[1]>>,
           %arg3: memref<f32>) {
-  linalg.matmul ins(%arg0, %arg0 : memref<?x?xf32, strided<[?, 1], offset: ?>>,
-                                   memref<?x?xf32, strided<[?, 1], offset: ?>>)
-               outs(%arg0 : memref<?x?xf32, strided<[?, 1], offset: ?>>)
-  linalg.matvec ins(%arg0, %arg1: memref<?x?xf32, strided<[?, 1], offset: ?>>,
-                                  memref<?xf32, strided<[1], offset: ?>>)
-               outs(%arg2: memref<?xf32, strided<[1], offset: ?>>)
-  linalg.dot ins(%arg1, %arg2: memref<?xf32, strided<[1], offset: ?>>,
-                               memref<?xf32, strided<[1], offset: ?>>)
+  linalg.matmul ins(%arg0, %arg0 : memref<?x?xf32, strided<[?, 1]>>,
+                                   memref<?x?xf32, strided<[?, 1]>>)
+               outs(%arg0 : memref<?x?xf32, strided<[?, 1]>>)
+  linalg.matvec ins(%arg0, %arg1: memref<?x?xf32, strided<[?, 1]>>,
+                                  memref<?xf32, strided<[1]>>)
+               outs(%arg2: memref<?xf32, strided<[1]>>)
+  linalg.dot ins(%arg1, %arg2: memref<?xf32, strided<[1]>>,
+                               memref<?xf32, strided<[1]>>)
             outs(%arg3: memref<f32>)
   return
 }
 // CHECK-LABEL: func @ops(%
 // CHECK: linalg.matmul
-// CHECK-SAME:   ins(%{{.*}}, %{{.*}} : memref<?x?xf32, strided<[?, 1], offset: ?>>,
-// CHECK-SAME:                          memref<?x?xf32, strided<[?, 1], offset: ?>>)
-// CHECK-SAME:  outs(%{{.*}} : memref<?x?xf32, strided<[?, 1], offset: ?>>)
+// CHECK-SAME:   ins(%{{.*}}, %{{.*}} : memref<?x?xf32, strided<[?, 1]>>,
+// CHECK-SAME:                          memref<?x?xf32, strided<[?, 1]>>)
+// CHECK-SAME:  outs(%{{.*}} : memref<?x?xf32, strided<[?, 1]>>)
 // CHECK: linalg.matvec
-// CHECK-SAME:   ins(%{{.*}}, %{{.*}}: memref<?x?xf32, strided<[?, 1], offset: ?>>,
-// CHECK-SAME:                         memref<?xf32, strided<[1], offset: ?>>)
-// CHECK-SAME:  outs(%{{.*}}: memref<?xf32, strided<[1], offset: ?>>)
+// CHECK-SAME:   ins(%{{.*}}, %{{.*}}: memref<?x?xf32, strided<[?, 1]>>,
+// CHECK-SAME:                         memref<?xf32, strided<[1]>>)
+// CHECK-SAME:  outs(%{{.*}}: memref<?xf32, strided<[1]>>)
 // CHECK: linalg.dot
-// CHECK-SAME:   ins(%{{.*}}, %{{.*}}: memref<?xf32, strided<[1], offset: ?>>,
-// CHECK-SAME:                         memref<?xf32, strided<[1], offset: ?>>)
+// CHECK-SAME:   ins(%{{.*}}, %{{.*}}: memref<?xf32, strided<[1]>>,
+// CHECK-SAME:                         memref<?xf32, strided<[1]>>)
 // CHECK-SAME:  outs(%{{.*}}: memref<f32>)
 
 // -----
 
-func.func @fill_view(%arg0: memref<?xf32, strided<[1], offset: ?>>, %arg1: f32) {
-  linalg.fill ins(%arg1 : f32) outs(%arg0 : memref<?xf32, strided<[1], offset: ?>>)
+func.func @fill_view(%arg0: memref<?xf32, strided<[1]>>, %arg1: f32) {
+  linalg.fill ins(%arg1 : f32) outs(%arg0 : memref<?xf32, strided<[1]>>)
   return
 }
 // CHECK-LABEL: func @fill_view(
-//       CHECK:  %{{.*}}: memref<?xf32, strided<[1], offset: ?>>, %{{.*}}: f32) {
-//       CHECK:   linalg.fill ins(%{{.*}} : f32) outs(%{{.*}} : memref<?xf32, strided<[1], offset: ?>>)
+//       CHECK:  %{{.*}}: memref<?xf32, strided<[1]>>, %{{.*}}: f32) {
+//       CHECK:   linalg.fill ins(%{{.*}} : f32) outs(%{{.*}} : memref<?xf32, strided<[1]>>)
 
 // -----
 
-func.func @memref_transpose(%arg0: memref<?x?x?xf32, strided<[?, ?, 1], offset: ?>>) {
-  %0 = memref.transpose %arg0 (i, j, k) -> (k, j, i) : memref<?x?x?xf32, strided<[?, ?, 1], offset: ?>> to memref<?x?x?xf32, strided<[1, ?, ?], offset: ?>>
+func.func @memref_transpose(%arg0: memref<?x?x?xf32, strided<[?, ?, 1]>>) {
+  %0 = memref.transpose %arg0 (i, j, k) -> (k, j, i) : memref<?x?x?xf32, strided<[?, ?, 1]>> to memref<?x?x?xf32, strided<[1, ?, ?]>>
   return
 }
 // CHECK-LABEL: func @memref_transpose
 //       CHECK:   memref.transpose %{{.*}} ([[i:.*]], [[j:.*]], [[k:.*]]) -> ([[k]], [[j]], [[i]]) :
-//  CHECK-SAME:      memref<?x?x?xf32, strided<[?, ?, 1], offset: ?>> to memref<?x?x?xf32, strided<[1, ?, ?], offset: ?>>
+//  CHECK-SAME:      memref<?x?x?xf32, strided<[?, ?, 1]>> to memref<?x?x?xf32, strided<[1, ?, ?]>>
 
 // -----
 
 
-func.func @fill_view3(%arg0: memref<?x?x?xf32, strided<[?, ?, 1], offset: ?>>, %arg1: f32) {
-  linalg.fill ins(%arg1 : f32) outs(%arg0 : memref<?x?x?xf32, strided<[?, ?, 1], offset: ?>>)
+func.func @fill_view3(%arg0: memref<?x?x?xf32, strided<[?, ?, 1]>>, %arg1: f32) {
+  linalg.fill ins(%arg1 : f32) outs(%arg0 : memref<?x?x?xf32, strided<[?, ?, 1]>>)
   return
 }
 // CHECK-LABEL: func @fill_view3(
-//       CHECK:  %{{.*}}: memref<?x?x?xf32, strided<[?, ?, 1], offset: ?>>, %{{.*}}: f32) {
-//       CHECK:   linalg.fill ins(%{{.*}} : f32) outs(%{{.*}} : memref<?x?x?xf32, strided<[?, ?, 1], offset: ?>>)
+//       CHECK:  %{{.*}}: memref<?x?x?xf32, strided<[?, ?, 1]>>, %{{.*}}: f32) {
+//       CHECK:   linalg.fill ins(%{{.*}} : f32) outs(%{{.*}} : memref<?x?x?xf32, strided<[?, ?, 1]>>)
 
 // -----
 
@@ -100,12 +100,12 @@ func.func @fill_view3(%arg0: memref<?x?x?xf32, strided<[?, ?, 1], offset: ?>>, %
   library_call = "some_external_function_name_1"
 }
 
-func.func @generic(%arg0: memref<?x?xvector<3x4xi4>, strided<[?, 1], offset: ?>>,
-              %arg1: memref<?x?x?xf32, strided<[?, ?, 1], offset: ?>>) {
+func.func @generic(%arg0: memref<?x?xvector<3x4xi4>, strided<[?, 1]>>,
+              %arg1: memref<?x?x?xf32, strided<[?, ?, 1]>>) {
   %cst = arith.constant 0.0 : f32
   linalg.generic #trait_0
-       ins(%arg0, %cst : memref<?x?xvector<3x4xi4>, strided<[?, 1], offset: ?>>, f32)
-      outs(%arg1 : memref<?x?x?xf32, strided<[?, ?, 1], offset: ?>>)
+       ins(%arg0, %cst : memref<?x?xvector<3x4xi4>, strided<[?, 1]>>, f32)
+      outs(%arg1 : memref<?x?x?xf32, strided<[?, ?, 1]>>)
       attrs = {foo = 1} {
     ^bb(%0: vector<3x4xi4>, %1: f32, %2: f32) :
       linalg.yield %1 : f32
@@ -117,8 +117,8 @@ func.func @generic(%arg0: memref<?x?xvector<3x4xi4>, strided<[?, 1], offset: ?>>
 //  CHECK-SAME:     indexing_maps = [#{{[0-9a-z]*}}, #{{[0-9a-z]*}}, #{{[0-9a-z]*}}],
 //  CHECK-SAME:     iterator_types = ["parallel", "parallel", "parallel"],
 //  CHECK-SAME:     library_call = "some_external_function_name_1"}
-//  CHECK-SAME:      ins({{.*}}, {{.*}} : memref<?x?xvector<3x4xi4>, strided<[?, 1], offset: ?>>, f32)
-//  CHECK-SAME:     outs({{.*}} : memref<?x?x?xf32, strided<[?, ?, 1], offset: ?>>)
+//  CHECK-SAME:      ins({{.*}}, {{.*}} : memref<?x?xvector<3x4xi4>, strided<[?, 1]>>, f32)
+//  CHECK-SAME:     outs({{.*}} : memref<?x?x?xf32, strided<[?, ?, 1]>>)
 //  CHECK-SAME:     {foo = 1 : i64}
 
 // -----
@@ -247,11 +247,11 @@ func.func @generic_op_zero_rank(%arg0: tensor<f32>, %arg1 : tensor<3x4xf32>) ->
   library_call = "some_external_function_name_2"
 }
 
-func.func @generic_region(%arg0: memref<?x?xvector<3x4xi4>, strided<[?, 1], offset: ?>>,
-                     %arg1: memref<?x?x?xf32, strided<[?, ?, 1], offset: ?>>) {
+func.func @generic_region(%arg0: memref<?x?xvector<3x4xi4>, strided<[?, 1]>>,
+                     %arg1: memref<?x?x?xf32, strided<[?, ?, 1]>>) {
   linalg.generic #trait_3
-       ins(%arg0 : memref<?x?xvector<3x4xi4>, strided<[?, 1], offset: ?>>)
-      outs(%arg1 : memref<?x?x?xf32, strided<[?, ?, 1], offset: ?>>)
+       ins(%arg0 : memref<?x?xvector<3x4xi4>, strided<[?, 1]>>)
+      outs(%arg1 : memref<?x?x?xf32, strided<[?, ?, 1]>>)
       attrs = {foo = 1} {
     ^bb(%a: vector<3x4xi4>, %b: f32) :
       %0 = linalg.index 0 : index
@@ -266,8 +266,8 @@ func.func @generic_region(%arg0: memref<?x?xvector<3x4xi4>, strided<[?, 1], offs
 //  CHECK-SAME:     indexing_maps = [#{{[0-9a-z]*}}, #{{[0-9a-z]*}}],
 //  CHECK-SAME:     iterator_types = ["parallel", "parallel", "parallel"],
 //  CHECK-SAME:     library_call = "some_external_function_name_2"
-//  CHECK-SAME:      ins({{.*}} : memref<?x?xvector<3x4xi4>, strided<[?, 1], offset: ?>>)
-//  CHECK-SAME:     outs({{.*}} : memref<?x?x?xf32, strided<[?, ?, 1], offset: ?>>)
+//  CHECK-SAME:      ins({{.*}} : memref<?x?xvector<3x4xi4>, strided<[?, 1]>>)
+//  CHECK-SAME:     outs({{.*}} : memref<?x?x?xf32, strided<[?, ?, 1]>>)
 //  CHECK-SAME:     attrs = {foo = 1 : i64} {
 //       CHECK:  ^{{.*}}(%{{.*}}: vector<3x4xi4>, %{{.*}}: f32):
 //       CHECK:    %{{.*}} = linalg.index 0 : index
diff --git a/mlir/test/Dialect/Linalg/standard.mlir b/mlir/test/Dialect/Linalg/standard.mlir
index f50016f9ea477..fa944675ba218 100644
--- a/mlir/test/Dialect/Linalg/standard.mlir
+++ b/mlir/test/Dialect/Linalg/standard.mlir
@@ -1,26 +1,26 @@
 // RUN: mlir-opt %s -convert-linalg-to-std --split-input-file -verify-diagnostics | FileCheck %s
 
-func.func @dot(%arg0: memref<?xf32, strided<[1], offset: ?>>,
-          %arg1: memref<?xf32, strided<[1], offset: ?>>,
+func.func @dot(%arg0: memref<?xf32, strided<[1]>>,
+          %arg1: memref<?xf32, strided<[1]>>,
           %arg2: memref<f32>) {
-  linalg.dot ins(%arg0, %arg1: memref<?xf32, strided<[1], offset: ?>>,
-                               memref<?xf32, strided<[1], offset: ?>>)
+  linalg.dot ins(%arg0, %arg1: memref<?xf32, strided<[1]>>,
+                               memref<?xf32, strided<[1]>>)
              outs(%arg2: memref<f32>)
   return
 }
 // CHECK-LABEL: func @dot(
-//  CHECK-SAME: %[[arg0:[a-zA-z0-9]*]]: memref<?xf32, strided<[1], offset: ?>>,
-//  CHECK-SAME: %[[arg1:[a-zA-z0-9]*]]: memref<?xf32, strided<[1], offset: ?>>,
+//  CHECK-SAME: %[[arg0:[a-zA-z0-9]*]]: memref<?xf32, strided<[1]>>,
+//  CHECK-SAME: %[[arg1:[a-zA-z0-9]*]]: memref<?xf32, strided<[1]>>,
 //  CHECK-SAME: %[[arg2:[a-zA-z0-9]*]]: memref<f32>) {
 //       CHECK:   %[[o0:.*]] = memref.cast %[[arg0]] :
-//  CHECK-SAME:     memref<?xf32, strided<[1], offset: ?>> to memref<?xf32, strided<[?], offset: ?>>
+//  CHECK-SAME:     memref<?xf32, strided<[1]>> to memref<?xf32, strided<[?]>>
 //       CHECK:   %[[o1:.*]] = memref.cast %[[arg1]] :
-//  CHECK-SAME:     memref<?xf32, strided<[1], offset: ?>> to memref<?xf32, strided<[?], offset: ?>>
+//  CHECK-SAME:     memref<?xf32, strided<[1]>> to memref<?xf32, strided<[?]>>
 //       CHECK:   %[[o2:.*]] = memref.cast %[[arg2]] :
-//  CHECK-SAME:     memref<f32> to memref<f32, strided<[], offset: ?>>
+//  CHECK-SAME:     memref<f32> to memref<f32, strided<[]>>
 //       CHECK:   call @linalg_dot_viewsxf32_viewsxf32_viewf32(
 //  CHECK-SAME:     %[[o0]], %[[o1]], %[[o2]]) :
-//  CHECK-SAME:   memref<?xf32, strided<[?], offset: ?>>, memref<?xf32, strided<[?], offset: ?>>, memref<f32, strided<[], offset: ?>>
+//  CHECK-SAME:   memref<?xf32, strided<[?]>>, memref<?xf32, strided<[?]>>, memref<f32, strided<[]>>
 
 // -----
 
diff --git a/mlir/test/Dialect/Linalg/tile-softmax.mlir b/mlir/test/Dialect/Linalg/tile-softmax.mlir
index 7d201b58a8c3d..784a7fc3671b7 100644
--- a/mlir/test/Dialect/Linalg/tile-softmax.mlir
+++ b/mlir/test/Dialect/Linalg/tile-softmax.mlir
@@ -133,9 +133,9 @@ module attributes {transform.with_named_sequence} {
 // CHECK:           scf.for %[[VAL_7:.*]] = %[[C0]] to %[[C16]] step %[[C2]] {
 // CHECK:             scf.for %[[VAL_8:.*]] = %[[C0]] to %[[C64]] step %[[C3]] {
 // CHECK:               %[[VAL_9:.*]] = affine.min #[[$MIN_MAP]](%[[VAL_8]])
-// CHECK:               %[[VAL_10:.*]] = memref.subview %[[VAL_0]]{{\[}}%[[VAL_7]], %[[VAL_8]], 0] [2, %[[VAL_9]], 256] [1, 1, 1] : memref<16x64x256xf32> to memref<2x?x256xf32, strided<[16384, 256, 1], offset: ?>>
-// CHECK:               %[[VAL_11:.*]] = memref.subview %[[VAL_1]]{{\[}}%[[VAL_7]], %[[VAL_8]], 0] [2, %[[VAL_9]], 256] [1, 1, 1] : memref<16x64x256xf32> to memref<2x?x256xf32, strided<[16384, 256, 1], offset: ?>>
-// CHECK:               linalg.softmax dimension(1) ins(%[[VAL_10]] : memref<2x?x256xf32, strided<[16384, 256, 1], offset: ?>>) outs(%[[VAL_11]] : memref<2x?x256xf32, strided<[16384, 256, 1], offset: ?>>)
+// CHECK:               %[[VAL_10:.*]] = memref.subview %[[VAL_0]]{{\[}}%[[VAL_7]], %[[VAL_8]], 0] [2, %[[VAL_9]], 256] [1, 1, 1] : memref<16x64x256xf32> to memref<2x?x256xf32, strided<[16384, 256, 1]>>
+// CHECK:               %[[VAL_11:.*]] = memref.subview %[[VAL_1]]{{\[}}%[[VAL_7]], %[[VAL_8]], 0] [2, %[[VAL_9]], 256] [1, 1, 1] : memref<16x64x256xf32> to memref<2x?x256xf32, strided<[16384, 256, 1]>>
+// CHECK:               linalg.softmax dimension(1) ins(%[[VAL_10]] : memref<2x?x256xf32, strided<[16384, 256, 1]>>) outs(%[[VAL_11]] : memref<2x?x256xf32, strided<[16384, 256, 1]>>)
 // CHECK:             }
 // CHECK:           }
 // CHECK:           return
diff --git a/mlir/test/Dialect/Linalg/transform-op-compose-masked-vectorize-and-cleanups.mlir b/mlir/test/Dialect/Linalg/transform-op-compose-masked-vectorize-and-cleanups.mlir
index 61fe3da34e1d5..e46212c8e3841 100644
--- a/mlir/test/Dialect/Linalg/transform-op-compose-masked-vectorize-and-cleanups.mlir
+++ b/mlir/test/Dialect/Linalg/transform-op-compose-masked-vectorize-and-cleanups.mlir
@@ -4,16 +4,16 @@
 func.func @masked_matmul(%module: memref<?x?xf32>, %arg1: memref<?x?xf32>, %arg2: memref<?x?xf32>) {
 
   //      CHECK: %[[MLHS:.*]] = vector.create_mask {{.*}} : vector<8x8xi1>
-  //      CHECK: %[[LHS:.*]] = vector.transfer_read %{{.*}}, %[[MLHS]] {in_bounds = [true, true]} : memref<?x?xf32, strided<[?, 1], offset: ?>>, vector<8x8xf32>
+  //      CHECK: %[[LHS:.*]] = vector.transfer_read %{{.*}}, %[[MLHS]] {in_bounds = [true, true]} : memref<?x?xf32, strided<[?, 1]>>, vector<8x8xf32>
   //      CHECK: %[[MRHS:.*]] = vector.create_mask {{.*}} : vector<8x8xi1>
-  //      CHECK: %[[RHS:.*]] = vector.transfer_read %{{.*}}, %[[MRHS]] {in_bounds = [true, true]} : memref<?x?xf32, strided<[?, 1], offset: ?>>, vector<8x8xf32>
+  //      CHECK: %[[RHS:.*]] = vector.transfer_read %{{.*}}, %[[MRHS]] {in_bounds = [true, true]} : memref<?x?xf32, strided<[?, 1]>>, vector<8x8xf32>
   //      CHECK: %[[MACC:.*]] = vector.create_mask {{.*}} : vector<8x8xi1>
-  //      CHECK: %[[ACC:.*]] = vector.transfer_read {{.*}}, %[[MACC]] {in_bounds = [true, true]} : memref<?x?xf32, strided<[?, 1], offset: ?>>, vector<8x8xf32>
+  //      CHECK: %[[ACC:.*]] = vector.transfer_read {{.*}}, %[[MACC]] {in_bounds = [true, true]} : memref<?x?xf32, strided<[?, 1]>>, vector<8x8xf32>
   //      CHECK: %[[MRES:.*]] = vector.create_mask {{.*}} : vector<8x8x8xi1>
   //      CHECK: %[[RES:.*]] = vector.mask %[[MRES]] { vector.contract
   // CHECK-SAME:   : vector<8x8xf32>, vector<8x8xf32> into vector<8x8xf32>
   // CHECK-SAME:   : vector<8x8x8xi1> -> vector<8x8xf32>
-  //      CHECK: vector.transfer_write %[[RES]], %{{.*}}, %[[MACC]] {in_bounds = [true, true]} : vector<8x8xf32>, memref<?x?xf32, strided<[?, 1], offset: ?>>
+  //      CHECK: vector.transfer_write %[[RES]], %{{.*}}, %[[MACC]] {in_bounds = [true, true]} : vector<8x8xf32>, memref<?x?xf32, strided<[?, 1]>>
   linalg.matmul ins(%module, %arg1 : memref<?x?xf32>, memref<?x?xf32>) outs(%arg2 : memref<?x?xf32>)
   return
 }
diff --git a/mlir/test/Dialect/Linalg/transform-op-linalg-copy-to-memref.mlir b/mlir/test/Dialect/Linalg/transform-op-linalg-copy-to-memref.mlir
index 7280ccbea2563..8b47d08ca7bb0 100644
--- a/mlir/test/Dialect/Linalg/transform-op-linalg-copy-to-memref.mlir
+++ b/mlir/test/Dialect/Linalg/transform-op-linalg-copy-to-memref.mlir
@@ -22,15 +22,15 @@ module attributes {transform.with_named_sequence} {
 
 // CHECK:  func.func @linalg_copy_to_memref_copy_strides(%[[INPUT:.*]]: memref<128x32xf32>, %[[OUTPUT:.*]]: memref<128x64xf32>) {
 // CHECK:    %[[ALLOC:.*]] = memref.alloc() {alignment = 64 : i64} : memref<128x64xf32>
-// CHECK:    %[[SUBVIEW:.*]] = memref.subview %[[ALLOC]][0, 32] [128, 32] [1, 1] : memref<128x64xf32> to memref<128x32xf32, strided<[64, 1], offset: 32>>
-// CHECK:    memref.copy %[[INPUT]], %[[SUBVIEW]] : memref<128x32xf32> to memref<128x32xf32, strided<[64, 1], offset: 32>>
+// CHECK:    %[[SUBVIEW:.*]] = memref.subview %[[ALLOC]][0, 32] [128, 32] [1, 1] : memref<128x64xf32> to memref<128x32xf32, strided<[64, 1]>>
+// CHECK:    memref.copy %[[INPUT]], %[[SUBVIEW]] : memref<128x32xf32> to memref<128x32xf32, strided<[64, 1]>>
 // CHECK:    return
 // CHECK:  }
 
 func.func @linalg_copy_to_memref_copy_strides(%input : memref<128x32xf32>, %output : memref<128x64xf32>) {
   %alloc = memref.alloc() {alignment = 64 : i64} : memref<128x64xf32>
-  %subview = memref.subview %alloc[0, 32] [128, 32] [1, 1] : memref<128x64xf32> to memref<128x32xf32, strided<[64, 1], offset: 32>>
-  linalg.copy ins(%input : memref<128x32xf32>) outs(%subview : memref<128x32xf32, strided<[64, 1], offset: 32>>)
+  %subview = memref.subview %alloc[0, 32] [128, 32] [1, 1] : memref<128x64xf32> to memref<128x32xf32, strided<[64, 1]>>
+  linalg.copy ins(%input : memref<128x32xf32>) outs(%subview : memref<128x32xf32, strided<[64, 1]>>)
   return
 }
 
diff --git a/mlir/test/Dialect/Linalg/transform-patterns.mlir b/mlir/test/Dialect/Linalg/transform-patterns.mlir
index 176e55e3e6c4a..3f32de417a56e 100644
--- a/mlir/test/Dialect/Linalg/transform-patterns.mlir
+++ b/mlir/test/Dialect/Linalg/transform-patterns.mlir
@@ -1,10 +1,10 @@
 // RUN: mlir-opt %s -transform-interpreter -test-linalg-transform-patterns=test-patterns -split-input-file | FileCheck %s
 
-func.func @dot(%x: memref<?xf32, strided<[1], offset: ?>>,
-          %y: memref<?xf32, strided<[1], offset: ?>>,
+func.func @dot(%x: memref<?xf32, strided<[1]>>,
+          %y: memref<?xf32, strided<[1]>>,
           %v: memref<f32>) {
-  linalg.dot ins(%x, %y: memref<?xf32, strided<[1], offset: ?>>,
-                         memref<?xf32, strided<[1], offset: ?>>)
+  linalg.dot ins(%x, %y: memref<?xf32, strided<[1]>>,
+                         memref<?xf32, strided<[1]>>)
             outs(%v: memref<f32>)
   return
 }
@@ -25,13 +25,13 @@ module attributes {transform.with_named_sequence} {
 
 // -----
 
-func.func @matvec(%A: memref<?x?xf32, strided<[?, 1], offset: ?>>,
-             %x: memref<?xf32, strided<[1], offset: ?>>,
-             %y: memref<?xf32, strided<[1], offset: ?>>) {
+func.func @matvec(%A: memref<?x?xf32, strided<[?, 1]>>,
+             %x: memref<?xf32, strided<[1]>>,
+             %y: memref<?xf32, strided<[1]>>) {
   linalg.matvec
-    ins(%A, %x: memref<?x?xf32, strided<[?, 1], offset: ?>>,
-                memref<?xf32, strided<[1], offset: ?>>)
-    outs(%y: memref<?xf32, strided<[1], offset: ?>>)
+    ins(%A, %x: memref<?x?xf32, strided<[?, 1]>>,
+                memref<?xf32, strided<[1]>>)
+    outs(%y: memref<?xf32, strided<[1]>>)
   return
 }
 
@@ -50,17 +50,17 @@ module attributes {transform.with_named_sequence} {
 // CHECK:         scf.for {{.*}} step %[[c5]]
 // CHECK:           scf.for {{.*}} step %[[c6]]
 // CHECK:             linalg.matvec
-// CHECK:               ins({{.*}}: memref<?x?xf32, strided<[?, 1], offset: ?>>, memref<?xf32, strided<[1], offset: ?>>)
-// CHECK:              outs({{.*}}: memref<?xf32, strided<[1], offset: ?>>)
+// CHECK:               ins({{.*}}: memref<?x?xf32, strided<[?, 1]>>, memref<?xf32, strided<[1]>>)
+// CHECK:              outs({{.*}}: memref<?xf32, strided<[1]>>)
 
 // -----
 
-func.func @matmul(%A: memref<?x?xf32, strided<[?, 1], offset: ?>>,
-             %B: memref<?x?xf32, strided<[?, 1], offset: ?>>,
-             %C: memref<?x?xf32, strided<[?, 1], offset: ?>>) {
-  linalg.matmul ins(%A, %B: memref<?x?xf32, strided<[?, 1], offset: ?>>,
-                            memref<?x?xf32, strided<[?, 1], offset: ?>>)
-               outs(%C: memref<?x?xf32, strided<[?, 1], offset: ?>>)
+func.func @matmul(%A: memref<?x?xf32, strided<[?, 1]>>,
+             %B: memref<?x?xf32, strided<[?, 1]>>,
+             %C: memref<?x?xf32, strided<[?, 1]>>) {
+  linalg.matmul ins(%A, %B: memref<?x?xf32, strided<[?, 1]>>,
+                            memref<?x?xf32, strided<[?, 1]>>)
+               outs(%C: memref<?x?xf32, strided<[?, 1]>>)
   return
 }
 
@@ -102,8 +102,8 @@ module attributes {transform.with_named_sequence} {
 // CHECK:                             scf.for {{.*}} = %[[c0]] to {{.*}} step %[[c3]] {
 // CHECK:                               scf.for {{.*}} = %[[c0]] to {{.*}} step %[[c4]] {
 // CHECK:                                 linalg.matmul
-// CHECK:                                   ins({{.*}}: memref<?x?xf32, strided<[?, 1], offset: ?>>, memref<?x?xf32, strided<[?, 1], offset: ?>>)
-// CHECK:                                  outs({{.*}}: memref<?x?xf32, strided<[?, 1], offset: ?>>)
+// CHECK:                                   ins({{.*}}: memref<?x?xf32, strided<[?, 1]>>, memref<?x?xf32, strided<[?, 1]>>)
+// CHECK:                                  outs({{.*}}: memref<?x?xf32, strided<[?, 1]>>)
 
 // -----
 
@@ -122,13 +122,13 @@ module attributes {transform.with_named_sequence} {
   library_call = "linalg_matmul",
   iterator_types = ["parallel", "parallel", "reduction"]
 }
-func.func @permute_generic(%A: memref<?x?xf32, strided<[?, 1], offset: ?>>,
-           %B: memref<?x?xf32, strided<[?, 1], offset: ?>>,
-           %C: memref<?x?xf32, strided<[?, 1], offset: ?>>) {
+func.func @permute_generic(%A: memref<?x?xf32, strided<[?, 1]>>,
+           %B: memref<?x?xf32, strided<[?, 1]>>,
+           %C: memref<?x?xf32, strided<[?, 1]>>) {
   linalg.generic #generic_matmul_trait
-    ins(%A, %B : memref<?x?xf32, strided<[?, 1], offset: ?>>,
-                 memref<?x?xf32, strided<[?, 1], offset: ?>>)
-   outs(%C : memref<?x?xf32, strided<[?, 1], offset: ?>>) {
+    ins(%A, %B : memref<?x?xf32, strided<[?, 1]>>,
+                 memref<?x?xf32, strided<[?, 1]>>)
+   outs(%C : memref<?x?xf32, strided<[?, 1]>>) {
     ^bb(%a: f32, %b: f32, %c: f32):
       %d = arith.mulf %a, %b: f32
       %e = arith.addf %c, %d: f32
@@ -150,18 +150,18 @@ module attributes {transform.with_named_sequence} {
 // CHECK-SAME:   indexing_maps = [#[[$kn]], #[[$nm]], #[[$km]]],
 // CHECK-SAME:   iterator_types = ["parallel", "reduction", "parallel"],
 // CHECK-SAME:   library_call = "linalg_matmul"}
-// CHECK:          memref<?x?xf32, strided<[?, 1], offset: ?>>,
-// CHECK-SAME:     memref<?x?xf32, strided<[?, 1], offset: ?>>
-// CHECK-SAME:     memref<?x?xf32, strided<[?, 1], offset: ?>>
+// CHECK:          memref<?x?xf32, strided<[?, 1]>>,
+// CHECK-SAME:     memref<?x?xf32, strided<[?, 1]>>
+// CHECK-SAME:     memref<?x?xf32, strided<[?, 1]>>
 
 // -----
 
-func.func @matvec_perm(%A: memref<?x?xf32, strided<[?, 1], offset: ?>>,
-             %x: memref<?xf32, strided<[1], offset: ?>>,
-             %y: memref<?xf32, strided<[1], offset: ?>>) {
-  linalg.matvec ins(%A, %x: memref<?x?xf32, strided<[?, 1], offset: ?>>,
-                            memref<?xf32, strided<[1], offset: ?>>)
-               outs(%y: memref<?xf32, strided<[1], offset: ?>>)
+func.func @matvec_perm(%A: memref<?x?xf32, strided<[?, 1]>>,
+             %x: memref<?xf32, strided<[1]>>,
+             %y: memref<?xf32, strided<[1]>>) {
+  linalg.matvec ins(%A, %x: memref<?x?xf32, strided<[?, 1]>>,
+                            memref<?xf32, strided<[1]>>)
+               outs(%y: memref<?xf32, strided<[1]>>)
   return
 }
 
@@ -180,17 +180,17 @@ module attributes {transform.with_named_sequence} {
 // CHECK:         scf.for {{.*}} = %[[c0]] to {{.*}} step %[[c6]]
 // CHECK:           scf.for {{.*}} = %[[c0]] to {{.*}} step %[[c5]]
 // CHECK:             linalg.matvec
-// CHECK:               ins({{.*}}: memref<?x?xf32, strided<[?, 1], offset: ?>>, memref<?xf32, strided<[1], offset: ?>>)
-// CHECK:              outs({{.*}}: memref<?xf32, strided<[1], offset: ?>>)
+// CHECK:               ins({{.*}}: memref<?x?xf32, strided<[?, 1]>>, memref<?xf32, strided<[1]>>)
+// CHECK:              outs({{.*}}: memref<?xf32, strided<[1]>>)
 
 // -----
 
-func.func @matmul_perm(%A: memref<?x?xf32, strided<[?, 1], offset: ?>>,
-             %B: memref<?x?xf32, strided<[?, 1], offset: ?>>,
-             %C: memref<?x?xf32, strided<[?, 1], offset: ?>>) {
-  linalg.matmul ins(%A, %B: memref<?x?xf32, strided<[?, 1], offset: ?>>,
-                            memref<?x?xf32, strided<[?, 1], offset: ?>>)
-               outs(%C : memref<?x?xf32, strided<[?, 1], offset: ?>>)
+func.func @matmul_perm(%A: memref<?x?xf32, strided<[?, 1]>>,
+             %B: memref<?x?xf32, strided<[?, 1]>>,
+             %C: memref<?x?xf32, strided<[?, 1]>>) {
+  linalg.matmul ins(%A, %B: memref<?x?xf32, strided<[?, 1]>>,
+                            memref<?x?xf32, strided<[?, 1]>>)
+               outs(%C : memref<?x?xf32, strided<[?, 1]>>)
   return
 }
 
@@ -225,5 +225,5 @@ module attributes {transform.with_named_sequence} {
 // CHECK:                       scf.for {{.*}} = %[[c0]] to {{.*}} step %[[c30]] {
 // CHECK:                         scf.for {{.*}} = %[[c0]] to {{.*}} step %[[c40]] {
 // CHECK:                                 linalg.matmul
-// CHECK:                                  ins({{.*}}: memref<?x?xf32, strided<[?, 1], offset: ?>>, memref<?x?xf32, strided<[?, 1], offset: ?>>)
-// CHECK:                                   outs({{.*}}: memref<?x?xf32, strided<[?, 1], offset: ?>>)
+// CHECK:                                  ins({{.*}}: memref<?x?xf32, strided<[?, 1]>>, memref<?x?xf32, strided<[?, 1]>>)
+// CHECK:                                   outs({{.*}}: memref<?x?xf32, strided<[?, 1]>>)
diff --git a/mlir/test/Dialect/Linalg/transform-promotion.mlir b/mlir/test/Dialect/Linalg/transform-promotion.mlir
index 7c4cd623c742d..029df5916db94 100644
--- a/mlir/test/Dialect/Linalg/transform-promotion.mlir
+++ b/mlir/test/Dialect/Linalg/transform-promotion.mlir
@@ -1,28 +1,28 @@
 // RUN: mlir-opt %s -transform-interpreter -split-input-file | FileCheck %s
 
-func.func @promote_subview_matmul(%arg0: memref<?x?xf32, strided<[?, 1], offset: ?>>,
-                             %arg1: memref<?x?xf32, strided<[?, 1], offset: ?>>,
-                             %arg2: memref<?x?xf32, strided<[?, 1], offset: ?>>) {
+func.func @promote_subview_matmul(%arg0: memref<?x?xf32, strided<[?, 1]>>,
+                             %arg1: memref<?x?xf32, strided<[?, 1]>>,
+                             %arg2: memref<?x?xf32, strided<[?, 1]>>) {
   %c2000 = arith.constant 2000 : index
   %c3000 = arith.constant 3000 : index
   %c4000 = arith.constant 4000 : index
   %c0 = arith.constant 0 : index
   %c1 = arith.constant 1 : index
-  %0 = memref.dim %arg0, %c0 : memref<?x?xf32, strided<[?, 1], offset: ?>>
-  %1 = memref.dim %arg0, %c1 : memref<?x?xf32, strided<[?, 1], offset: ?>>
-  %2 = memref.dim %arg1, %c1 : memref<?x?xf32, strided<[?, 1], offset: ?>>
+  %0 = memref.dim %arg0, %c0 : memref<?x?xf32, strided<[?, 1]>>
+  %1 = memref.dim %arg0, %c1 : memref<?x?xf32, strided<[?, 1]>>
+  %2 = memref.dim %arg1, %c1 : memref<?x?xf32, strided<[?, 1]>>
   scf.for %arg3 = %c0 to %0 step %c2000 {
     scf.for %arg4 = %c0 to %2 step %c3000 {
       scf.for %arg5 = %c0 to %1 step %c4000 {
         %3 = memref.subview %arg0[%arg3, %arg5][%c2000, %c4000][%c1, %c1] :
-             memref<?x?xf32, strided<[?, 1], offset: ?>> to memref<?x?xf32, strided<[?, ?], offset: ?>>
+             memref<?x?xf32, strided<[?, 1]>> to memref<?x?xf32, strided<[?, ?]>>
         %4 = memref.subview %arg1[%arg5, %arg4][%c4000, %c3000][%c1, %c1] :
-             memref<?x?xf32, strided<[?, 1], offset: ?>> to memref<?x?xf32, strided<[?, ?], offset: ?>>
+             memref<?x?xf32, strided<[?, 1]>> to memref<?x?xf32, strided<[?, ?]>>
         %5 = memref.subview %arg2[%arg3, %arg4][%c2000, %c3000][%c1, %c1] :
-             memref<?x?xf32, strided<[?, 1], offset: ?>> to memref<?x?xf32, strided<[?, ?], offset: ?>>
-        linalg.matmul ins(%3, %4: memref<?x?xf32, strided<[?, ?], offset: ?>>,
-                                  memref<?x?xf32, strided<[?, ?], offset: ?>>)
-                     outs(%5: memref<?x?xf32, strided<[?, ?], offset: ?>>)
+             memref<?x?xf32, strided<[?, 1]>> to memref<?x?xf32, strided<[?, ?]>>
+        linalg.matmul ins(%3, %4: memref<?x?xf32, strided<[?, ?]>>,
+                                  memref<?x?xf32, strided<[?, ?]>>)
+                     outs(%5: memref<?x?xf32, strided<[?, ?]>>)
       }
     }
   }
@@ -68,30 +68,30 @@ module attributes {transform.with_named_sequence} {
 
 // -----
 
-func.func @promote_first_subview_matmul(%arg0: memref<?x?xf32, strided<[?, 1], offset: ?>>,
-                             %arg1: memref<?x?xf32, strided<[?, 1], offset: ?>>,
-                             %arg2: memref<?x?xf32, strided<[?, 1], offset: ?>>) {
+func.func @promote_first_subview_matmul(%arg0: memref<?x?xf32, strided<[?, 1]>>,
+                             %arg1: memref<?x?xf32, strided<[?, 1]>>,
+                             %arg2: memref<?x?xf32, strided<[?, 1]>>) {
   %c2000 = arith.constant 2000 : index
   %c3000 = arith.constant 3000 : index
   %c4000 = arith.constant 4000 : index
   %c0 = arith.constant 0 : index
   %c1 = arith.constant 1 : index
-  %0 = memref.dim %arg0, %c0 : memref<?x?xf32, strided<[?, 1], offset: ?>>
-  %1 = memref.dim %arg0, %c1 : memref<?x?xf32, strided<[?, 1], offset: ?>>
-  %2 = memref.dim %arg1, %c1 : memref<?x?xf32, strided<[?, 1], offset: ?>>
+  %0 = memref.dim %arg0, %c0 : memref<?x?xf32, strided<[?, 1]>>
+  %1 = memref.dim %arg0, %c1 : memref<?x?xf32, strided<[?, 1]>>
+  %2 = memref.dim %arg1, %c1 : memref<?x?xf32, strided<[?, 1]>>
   scf.for %arg3 = %c0 to %0 step %c2000 {
     scf.for %arg4 = %c0 to %2 step %c3000 {
       scf.for %arg5 = %c0 to %1 step %c4000 {
         %3 = memref.subview %arg0[%arg3, %arg5][%c2000, %c4000][%c1, %c1] :
-             memref<?x?xf32, strided<[?, 1], offset: ?>> to memref<?x?xf32, strided<[?, ?], offset: ?>>
+             memref<?x?xf32, strided<[?, 1]>> to memref<?x?xf32, strided<[?, ?]>>
         %4 = memref.subview %arg1[%arg5, %arg4][%c4000, %c3000][%c1, %c1] :
-             memref<?x?xf32, strided<[?, 1], offset: ?>> to memref<?x?xf32, strided<[?, ?], offset: ?>>
+             memref<?x?xf32, strided<[?, 1]>> to memref<?x?xf32, strided<[?, ?]>>
         %5 = memref.subview %arg2[%arg3, %arg4][%c2000, %c3000][%c1, %c1] :
-             memref<?x?xf32, strided<[?, 1], offset: ?>> to memref<?x?xf32, strided<[?, ?], offset: ?>>
+             memref<?x?xf32, strided<[?, 1]>> to memref<?x?xf32, strided<[?, ?]>>
         linalg.matmul {__internal_linalg_transform__ = "_promote_first_view_"}
-          ins(%3, %4: memref<?x?xf32, strided<[?, ?], offset: ?>>,
-                      memref<?x?xf32, strided<[?, ?], offset: ?>>)
-         outs(%5: memref<?x?xf32, strided<[?, ?], offset: ?>>)
+          ins(%3, %4: memref<?x?xf32, strided<[?, ?]>>,
+                      memref<?x?xf32, strided<[?, ?]>>)
+         outs(%5: memref<?x?xf32, strided<[?, ?]>>)
       }
     }
   }
@@ -117,8 +117,8 @@ func.func @promote_first_subview_matmul(%arg0: memref<?x?xf32, strided<[?, 1], o
 // CHECK:         linalg.copy ins(%[[s0]] : memref<?x?xf32, strided{{.*}}>) outs(%[[l0]] : memref<?x?xf32, strided{{.*}}>)
 // CHECK-NOT:     linalg.copy
 // CHECK:         linalg.matmul
-// CHECK-SAME:           ins(%[[v0]], %[[s1]] : memref<?x?xf32>, memref<?x?xf32, strided<[?, ?], offset: ?>>)
-// CHECK-SAME:          outs(%[[s2]] : memref<?x?xf32, strided<[?, ?], offset: ?>>)
+// CHECK-SAME:           ins(%[[v0]], %[[s1]] : memref<?x?xf32>, memref<?x?xf32, strided<[?, ?]>>)
+// CHECK-SAME:          outs(%[[s2]] : memref<?x?xf32, strided<[?, ?]>>)
 
 module attributes {transform.with_named_sequence} {
   transform.named_sequence @__transform_main(%arg1: !transform.any_op) {
@@ -130,16 +130,16 @@ module attributes {transform.with_named_sequence} {
 
 // -----
 
-func.func @aligned_promote_fill(%arg0: memref<?x?xf32, strided<[?, 1], offset: ?>>) {
+func.func @aligned_promote_fill(%arg0: memref<?x?xf32, strided<[?, 1]>>) {
   %c2000 = arith.constant 2000 : index
   %c4000 = arith.constant 4000 : index
   %c0 = arith.constant 0 : index
   %c1 = arith.constant 1 : index
   %cf = arith.constant 1.0 : f32
   %3 = memref.subview %arg0[%c0, %c0][%c2000, %c4000][%c1, %c1] :
-         memref<?x?xf32, strided<[?, 1], offset: ?>> to memref<?x?xf32, strided<[?, ?], offset: ?>>
+         memref<?x?xf32, strided<[?, 1]>> to memref<?x?xf32, strided<[?, ?]>>
   linalg.fill
-   ins(%cf : f32) outs(%3 : memref<?x?xf32, strided<[?, ?], offset: ?>>)
+   ins(%cf : f32) outs(%3 : memref<?x?xf32, strided<[?, ?]>>)
   return
 }
 // CHECK-LABEL: func @aligned_promote_fill
@@ -162,7 +162,7 @@ module attributes {transform.with_named_sequence} {
 
 // -----
 
-func.func @aligned_promote_fill_complex(%arg0: memref<?x?xcomplex<f32>, strided<[?, 1], offset: ?>>) {
+func.func @aligned_promote_fill_complex(%arg0: memref<?x?xcomplex<f32>, strided<[?, 1]>>) {
   %c2000 = arith.constant 2000 : index
   %c4000 = arith.constant 4000 : index
   %c0 = arith.constant 0 : index
@@ -170,9 +170,9 @@ func.func @aligned_promote_fill_complex(%arg0: memref<?x?xcomplex<f32>, strided<
   %cf = arith.constant 1.0 : f32
   %cc = complex.create %cf, %cf : complex<f32>
   %3 = memref.subview %arg0[%c0, %c0][%c2000, %c4000][%c1, %c1] :
-         memref<?x?xcomplex<f32>, strided<[?, 1], offset: ?>> to memref<?x?xcomplex<f32>, strided<[?, ?], offset: ?>>
+         memref<?x?xcomplex<f32>, strided<[?, 1]>> to memref<?x?xcomplex<f32>, strided<[?, ?]>>
   linalg.fill ins(%cc : complex<f32>)
-             outs(%3 : memref<?x?xcomplex<f32>, strided<[?, ?], offset: ?>>)
+             outs(%3 : memref<?x?xcomplex<f32>, strided<[?, ?]>>)
   return
 }
 // CHECK-LABEL: func @aligned_promote_fill_complex
diff --git a/mlir/test/Dialect/MemRef/canonicalize.mlir b/mlir/test/Dialect/MemRef/canonicalize.mlir
index 6c4fd6f8f58d6..249bdb984e6d6 100644
--- a/mlir/test/Dialect/MemRef/canonicalize.mlir
+++ b/mlir/test/Dialect/MemRef/canonicalize.mlir
@@ -34,12 +34,12 @@ func.func @collapse_expand_rank0_cancel(%arg0 : memref<1x1xi8>) -> memref<1x1xi8
 //       CHECK:   %[[S:.+]] = memref.subview %[[ARG0]][0, 1, 0, 0] [1, 1, 16, 32] [1, 1, 1, 1] : memref<4x6x16x32xi8> to memref<16x32xi8, strided{{.*}}>
 //       CHECK:   return %[[S]] : memref<16x32xi8, strided{{.*}}>
 func.func @subview_of_size_memcast(%arg : memref<4x6x16x32xi8>) ->
-  memref<16x32xi8, strided<[32, 1], offset: 512>>{
+  memref<16x32xi8, strided<[32, 1]>>{
   %0 = memref.cast %arg : memref<4x6x16x32xi8> to memref<?x?x16x32xi8>
   %1 = memref.subview %0[0, 1, 0, 0] [1, 1, 16, 32] [1, 1, 1, 1] :
     memref<?x?x16x32xi8> to
-    memref<16x32xi8, strided<[32, 1], offset: 512>>
-  return %1 : memref<16x32xi8, strided<[32, 1], offset: 512>>
+    memref<16x32xi8, strided<[32, 1]>>
+  return %1 : memref<16x32xi8, strided<[32, 1]>>
 }
 
 // -----
@@ -47,14 +47,14 @@ func.func @subview_of_size_memcast(%arg : memref<4x6x16x32xi8>) ->
 //       CHECK: func @subview_of_strides_memcast
 //  CHECK-SAME:   %[[ARG0:.[a-z0-9A-Z_]+]]: memref<1x1x?xf32, strided{{.*}}>
 //       CHECK:   %[[S:.+]] = memref.subview %[[ARG0]][0, 0, 0] [1, 1, 4]
-//  CHECK-SAME:                    to memref<1x4xf32, strided<[35, 1], offset: ?>>
+//  CHECK-SAME:                    to memref<1x4xf32, strided<[35, 1]>>
 //       CHECK:   %[[M:.+]] = memref.cast %[[S]]
-//  CHECK-SAME:                    to memref<1x4xf32, strided<[?, ?], offset: ?>>
+//  CHECK-SAME:                    to memref<1x4xf32, strided<[?, ?]>>
 //       CHECK:   return %[[M]]
-func.func @subview_of_strides_memcast(%arg : memref<1x1x?xf32, strided<[35, 7, 1], offset: ?>>) -> memref<1x4xf32, strided<[?, ?], offset: ?>> {
-  %0 = memref.cast %arg : memref<1x1x?xf32, strided<[35, 7, 1], offset: ?>> to memref<1x1x?xf32, strided<[?, ?, ?], offset: ?>>
-  %1 = memref.subview %0[0, 0, 0] [1, 1, 4] [1, 1, 1] : memref<1x1x?xf32, strided<[?, ?, ?], offset: ?>> to memref<1x4xf32, strided<[?, ?], offset: ?>>
-  return %1 : memref<1x4xf32, strided<[?, ?], offset: ?>>
+func.func @subview_of_strides_memcast(%arg : memref<1x1x?xf32, strided<[35, 7, 1]>>) -> memref<1x4xf32, strided<[?, ?]>> {
+  %0 = memref.cast %arg : memref<1x1x?xf32, strided<[35, 7, 1]>> to memref<1x1x?xf32, strided<[?, ?, ?]>>
+  %1 = memref.subview %0[0, 0, 0] [1, 1, 4] [1, 1, 1] : memref<1x1x?xf32, strided<[?, ?, ?]>> to memref<1x4xf32, strided<[?, ?]>>
+  return %1 : memref<1x4xf32, strided<[?, ?]>>
 }
 
 // -----
@@ -71,26 +71,26 @@ func.func @subview_of_static_full_size(%arg0 : memref<4x6x16x32xi8>) -> memref<4
 // -----
 
 // CHECK-LABEL: func @negative_subview_of_static_full_size
-//  CHECK-SAME:   %[[ARG0:.+]]: memref<16x4xf32,  strided<[4, 1], offset: ?>>
+//  CHECK-SAME:   %[[ARG0:.+]]: memref<16x4xf32,  strided<[4, 1]>>
 //  CHECK-SAME:   %[[IDX:.+]]: index
 //       CHECK:   %[[S:.+]] = memref.subview %[[ARG0]][%[[IDX]], 0] [16, 4] [1, 1]
-//  CHECK-SAME:                    to memref<16x4xf32,  strided<[4, 1], offset: ?>>
-//       CHECK:    return %[[S]] : memref<16x4xf32,  strided<[4, 1], offset: ?>>
-func.func @negative_subview_of_static_full_size(%arg0:  memref<16x4xf32,  strided<[4, 1], offset: ?>>, %idx: index) -> memref<16x4xf32,  strided<[4, 1], offset: ?>> {
-  %0 = memref.subview %arg0[%idx, 0][16, 4][1, 1] : memref<16x4xf32,  strided<[4, 1], offset: ?>> to memref<16x4xf32,  strided<[4, 1], offset: ?>>
-  return %0 : memref<16x4xf32,  strided<[4, 1], offset: ?>>
+//  CHECK-SAME:                    to memref<16x4xf32,  strided<[4, 1]>>
+//       CHECK:    return %[[S]] : memref<16x4xf32,  strided<[4, 1]>>
+func.func @negative_subview_of_static_full_size(%arg0:  memref<16x4xf32,  strided<[4, 1]>>, %idx: index) -> memref<16x4xf32,  strided<[4, 1]>> {
+  %0 = memref.subview %arg0[%idx, 0][16, 4][1, 1] : memref<16x4xf32,  strided<[4, 1]>> to memref<16x4xf32,  strided<[4, 1]>>
+  return %0 : memref<16x4xf32,  strided<[4, 1]>>
 }
 
 // -----
 
 func.func @subview_canonicalize(%arg0 : memref<?x?x?xf32>, %arg1 : index,
-    %arg2 : index) -> memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
+    %arg2 : index) -> memref<?x?x?xf32, strided<[?, ?, ?]>>
 {
   %c0 = arith.constant 0 : index
   %c1 = arith.constant 1 : index
   %c4 = arith.constant 4 : index
-  %0 = memref.subview %arg0[%c0, %arg1, %c1] [%c4, %c1, %arg2] [%c1, %c1, %c1] : memref<?x?x?xf32> to memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
-  return %0 : memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
+  %0 = memref.subview %arg0[%c0, %arg1, %c1] [%c4, %c1, %arg2] [%c1, %c1, %c1] : memref<?x?x?xf32> to memref<?x?x?xf32, strided<[?, ?, ?]>>
+  return %0 : memref<?x?x?xf32, strided<[?, ?, ?]>>
 }
 // CHECK-LABEL: func @subview_canonicalize
 //  CHECK-SAME:   %[[ARG0:.+]]: memref<?x?x?xf32>
@@ -103,13 +103,13 @@ func.func @subview_canonicalize(%arg0 : memref<?x?x?xf32>, %arg1 : index,
 // -----
 
 func.func @rank_reducing_subview_canonicalize(%arg0 : memref<?x?x?xf32>, %arg1 : index,
-  %arg2 : index) -> memref<?x?xf32, strided<[?, ?], offset: ?>>
+  %arg2 : index) -> memref<?x?xf32, strided<[?, ?]>>
 {
   %c0 = arith.constant 0 : index
   %c1 = arith.constant 1 : index
   %c4 = arith.constant 4 : index
-  %0 = memref.subview %arg0[%c0, %arg1, %c1] [%c4, 1, %arg2] [%c1, %c1, %c1] : memref<?x?x?xf32> to memref<?x?xf32, strided<[?, ?], offset: ?>>
-  return %0 : memref<?x?xf32, strided<[?, ?], offset: ?>>
+  %0 = memref.subview %arg0[%c0, %arg1, %c1] [%c4, 1, %arg2] [%c1, %c1, %c1] : memref<?x?x?xf32> to memref<?x?xf32, strided<[?, ?]>>
+  return %0 : memref<?x?xf32, strided<[?, ?]>>
 }
 // CHECK-LABEL: func @rank_reducing_subview_canonicalize
 //  CHECK-SAME:   %[[ARG0:.+]]: memref<?x?x?xf32>
@@ -122,62 +122,62 @@ func.func @rank_reducing_subview_canonicalize(%arg0 : memref<?x?x?xf32>, %arg1 :
 // -----
 
 func.func @multiple_reducing_dims(%arg0 : memref<1x384x384xf32>,
-    %arg1 : index, %arg2 : index, %arg3 : index) -> memref<?xf32, strided<[1], offset: ?>>
+    %arg1 : index, %arg2 : index, %arg3 : index) -> memref<?xf32, strided<[1]>>
 {
   %c1 = arith.constant 1 : index
-  %0 = memref.subview %arg0[0, %arg1, %arg2] [1, %c1, %arg3] [1, 1, 1] : memref<1x384x384xf32> to memref<?x?xf32, strided<[384, 1], offset: ?>>
-  %1 = memref.subview %0[0, 0] [1, %arg3] [1, 1] : memref<?x?xf32, strided<[384, 1], offset: ?>> to memref<?xf32, strided<[1], offset: ?>>
-  return %1 : memref<?xf32, strided<[1], offset: ?>>
+  %0 = memref.subview %arg0[0, %arg1, %arg2] [1, %c1, %arg3] [1, 1, 1] : memref<1x384x384xf32> to memref<?x?xf32, strided<[384, 1]>>
+  %1 = memref.subview %0[0, 0] [1, %arg3] [1, 1] : memref<?x?xf32, strided<[384, 1]>> to memref<?xf32, strided<[1]>>
+  return %1 : memref<?xf32, strided<[1]>>
 }
 //       CHECK: func @multiple_reducing_dims
 //       CHECK:   %[[REDUCED1:.+]] = memref.subview %{{.+}}[0, %{{.+}}, %{{.+}}] [1, 1, %{{.+}}] [1, 1, 1]
-//  CHECK-SAME:       : memref<1x384x384xf32> to memref<1x?xf32, strided<[384, 1], offset: ?>>
+//  CHECK-SAME:       : memref<1x384x384xf32> to memref<1x?xf32, strided<[384, 1]>>
 //       CHECK:   %[[REDUCED2:.+]] = memref.subview %[[REDUCED1]][0, 0] [1, %{{.+}}] [1, 1]
-//  CHECK-SAME:       : memref<1x?xf32, strided<[384, 1], offset: ?>> to memref<?xf32, strided<[1], offset: ?>>
+//  CHECK-SAME:       : memref<1x?xf32, strided<[384, 1]>> to memref<?xf32, strided<[1]>>
 
 // -----
 
 func.func @multiple_reducing_dims_dynamic(%arg0 : memref<?x?x?xf32>,
-    %arg1 : index, %arg2 : index, %arg3 : index) -> memref<?xf32, strided<[1], offset: ?>>
+    %arg1 : index, %arg2 : index, %arg3 : index) -> memref<?xf32, strided<[1]>>
 {
   %c1 = arith.constant 1 : index
-  %0 = memref.subview %arg0[0, %arg1, %arg2] [1, %c1, %arg3] [1, 1, 1] : memref<?x?x?xf32> to memref<?x?xf32, strided<[?, 1], offset: ?>>
-  %1 = memref.subview %0[0, 0] [1, %arg3] [1, 1] : memref<?x?xf32, strided<[?, 1], offset: ?>> to memref<?xf32, strided<[1], offset: ?>>
-  return %1 : memref<?xf32, strided<[1], offset: ?>>
+  %0 = memref.subview %arg0[0, %arg1, %arg2] [1, %c1, %arg3] [1, 1, 1] : memref<?x?x?xf32> to memref<?x?xf32, strided<[?, 1]>>
+  %1 = memref.subview %0[0, 0] [1, %arg3] [1, 1] : memref<?x?xf32, strided<[?, 1]>> to memref<?xf32, strided<[1]>>
+  return %1 : memref<?xf32, strided<[1]>>
 }
 //       CHECK: func @multiple_reducing_dims_dynamic
 //       CHECK:   %[[REDUCED1:.+]] = memref.subview %{{.+}}[0, %{{.+}}, %{{.+}}] [1, 1, %{{.+}}] [1, 1, 1]
-//  CHECK-SAME:       : memref<?x?x?xf32> to memref<1x?xf32, strided<[?, 1], offset: ?>>
+//  CHECK-SAME:       : memref<?x?x?xf32> to memref<1x?xf32, strided<[?, 1]>>
 //       CHECK:   %[[REDUCED2:.+]] = memref.subview %[[REDUCED1]][0, 0] [1, %{{.+}}] [1, 1]
-//  CHECK-SAME:       : memref<1x?xf32, strided<[?, 1], offset: ?>> to memref<?xf32, strided<[1], offset: ?>>
+//  CHECK-SAME:       : memref<1x?xf32, strided<[?, 1]>> to memref<?xf32, strided<[1]>>
 
 // -----
 
-func.func @multiple_reducing_dims_all_dynamic(%arg0 : memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>,
-    %arg1 : index, %arg2 : index, %arg3 : index) -> memref<?xf32, strided<[?], offset: ?>>
+func.func @multiple_reducing_dims_all_dynamic(%arg0 : memref<?x?x?xf32, strided<[?, ?, ?]>>,
+    %arg1 : index, %arg2 : index, %arg3 : index) -> memref<?xf32, strided<[?]>>
 {
   %c1 = arith.constant 1 : index
   %0 = memref.subview %arg0[0, %arg1, %arg2] [1, %c1, %arg3] [1, 1, 1]
-      : memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>> to memref<?x?xf32, strided<[?, ?], offset: ?>>
-  %1 = memref.subview %0[0, 0] [1, %arg3] [1, 1] : memref<?x?xf32, strided<[?, ?], offset: ?>> to memref<?xf32, strided<[?], offset: ?>>
-  return %1 : memref<?xf32, strided<[?], offset: ?>>
+      : memref<?x?x?xf32, strided<[?, ?, ?]>> to memref<?x?xf32, strided<[?, ?]>>
+  %1 = memref.subview %0[0, 0] [1, %arg3] [1, 1] : memref<?x?xf32, strided<[?, ?]>> to memref<?xf32, strided<[?]>>
+  return %1 : memref<?xf32, strided<[?]>>
 }
 //       CHECK: func @multiple_reducing_dims_all_dynamic
 //       CHECK:   %[[REDUCED1:.+]] = memref.subview %{{.+}}[0, %{{.+}}, %{{.+}}] [1, 1, %{{.+}}] [1, 1, 1]
-//  CHECK-SAME:       : memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>> to memref<1x?xf32, strided<[?, ?], offset: ?>>
+//  CHECK-SAME:       : memref<?x?x?xf32, strided<[?, ?, ?]>> to memref<1x?xf32, strided<[?, ?]>>
 //       CHECK:   %[[REDUCED2:.+]] = memref.subview %[[REDUCED1]][0, 0] [1, %{{.+}}] [1, 1]
-//  CHECK-SAME:       : memref<1x?xf32, strided<[?, ?], offset: ?>> to memref<?xf32, strided<[?], offset: ?>>
+//  CHECK-SAME:       : memref<1x?xf32, strided<[?, ?]>> to memref<?xf32, strided<[?]>>
 
 // -----
 
-func.func @subview_negative_stride1(%arg0 : memref<?xf32>) -> memref<?xf32, strided<[?], offset: ?>>
+func.func @subview_negative_stride1(%arg0 : memref<?xf32>) -> memref<?xf32, strided<[?]>>
 {
   %c0 = arith.constant 0 : index
   %c1 = arith.constant -1 : index
   %1 = memref.dim %arg0, %c0 : memref<?xf32>
   %2 = arith.addi %1, %c1 : index
-  %3 = memref.subview %arg0[%2] [%1] [%c1] : memref<?xf32> to memref<?xf32, strided<[?], offset: ?>>
-  return %3 : memref<?xf32, strided<[?], offset: ?>>
+  %3 = memref.subview %arg0[%2] [%1] [%c1] : memref<?xf32> to memref<?xf32, strided<[?]>>
+  return %3 : memref<?xf32, strided<[?]>>
 }
 //       CHECK: func @subview_negative_stride1
 //  CHECK-SAME:   (%[[ARG0:.*]]: memref<?xf32>)
@@ -185,36 +185,36 @@ func.func @subview_negative_stride1(%arg0 : memref<?xf32>) -> memref<?xf32, stri
 //       CHECK:   %[[C2:.*]] = arith.constant -1
 //       CHECK:   %[[DIM1:.*]] = memref.dim %[[ARG0]], %[[C1]] : memref<?xf32>
 //       CHECK:   %[[DIM2:.*]] = arith.addi %[[DIM1]], %[[C2]] : index
-//       CHECK:   %[[RES1:.*]] = memref.subview %[[ARG0]][%[[DIM2]]] [%[[DIM1]]] [-1] : memref<?xf32> to memref<?xf32, strided<[-1], offset: ?>>
-//       CHECK:   %[[RES2:.*]] = memref.cast %[[RES1]] : memref<?xf32, strided<[-1], offset: ?>> to memref<?xf32, strided<[?], offset: ?>>
-//       CHECK:   return %[[RES2]] : memref<?xf32, strided<[?], offset: ?>>
+//       CHECK:   %[[RES1:.*]] = memref.subview %[[ARG0]][%[[DIM2]]] [%[[DIM1]]] [-1] : memref<?xf32> to memref<?xf32, strided<[-1]>>
+//       CHECK:   %[[RES2:.*]] = memref.cast %[[RES1]] : memref<?xf32, strided<[-1]>> to memref<?xf32, strided<[?]>>
+//       CHECK:   return %[[RES2]] : memref<?xf32, strided<[?]>>
 
 // -----
 
-func.func @subview_negative_stride2(%arg0 : memref<7xf32>) -> memref<?xf32, strided<[?], offset: ?>>
+func.func @subview_negative_stride2(%arg0 : memref<7xf32>) -> memref<?xf32, strided<[?]>>
 {
   %c0 = arith.constant 0 : index
   %c1 = arith.constant -1 : index
   %1 = memref.dim %arg0, %c0 : memref<7xf32>
   %2 = arith.addi %1, %c1 : index
-  %3 = memref.subview %arg0[%2] [%1] [%c1] : memref<7xf32> to memref<?xf32, strided<[?], offset: ?>>
-  return %3 : memref<?xf32, strided<[?], offset: ?>>
+  %3 = memref.subview %arg0[%2] [%1] [%c1] : memref<7xf32> to memref<?xf32, strided<[?]>>
+  return %3 : memref<?xf32, strided<[?]>>
 }
 //       CHECK: func @subview_negative_stride2
 //  CHECK-SAME:   (%[[ARG0:.*]]: memref<7xf32>)
-//       CHECK:   %[[RES1:.*]] = memref.subview %[[ARG0]][6] [7] [-1] : memref<7xf32> to memref<7xf32, strided<[-1], offset: 6>>
-//       CHECK:   %[[RES2:.*]] = memref.cast %[[RES1]] : memref<7xf32, strided<[-1], offset: 6>> to memref<?xf32, strided<[?], offset: ?>>
-//       CHECK:   return %[[RES2]] : memref<?xf32, strided<[?], offset: ?>>
+//       CHECK:   %[[RES1:.*]] = memref.subview %[[ARG0]][6] [7] [-1] : memref<7xf32> to memref<7xf32, strided<[-1]>>
+//       CHECK:   %[[RES2:.*]] = memref.cast %[[RES1]] : memref<7xf32, strided<[-1]>> to memref<?xf32, strided<[?]>>
+//       CHECK:   return %[[RES2]] : memref<?xf32, strided<[?]>>
 
 // -----
 
 // CHECK-LABEL: func @no_fold_subview_negative_size
 //  CHECK:        %[[SUBVIEW:.+]] = memref.subview
 //  CHECK:        return %[[SUBVIEW]]
-func.func @no_fold_subview_negative_size(%input: memref<4x1024xf32>) -> memref<?x256xf32, strided<[1024, 1], offset: 2304>> {
+func.func @no_fold_subview_negative_size(%input: memref<4x1024xf32>) -> memref<?x256xf32, strided<[1024, 1]>> {
   %cst = arith.constant -13 : index
-  %0 = memref.subview %input[2, 256] [%cst, 256] [1, 1] : memref<4x1024xf32> to memref<?x256xf32, strided<[1024, 1], offset: 2304>>
-  return %0 : memref<?x256xf32, strided<[1024, 1], offset: 2304>>
+  %0 = memref.subview %input[2, 256] [%cst, 256] [1, 1] : memref<4x1024xf32> to memref<?x256xf32, strided<[1024, 1]>>
+  return %0 : memref<?x256xf32, strided<[1024, 1]>>
 }
 
 // -----
@@ -222,11 +222,11 @@ func.func @no_fold_subview_negative_size(%input: memref<4x1024xf32>) -> memref<?
 // CHECK-LABEL: func @no_fold_subview_zero_stride
 //  CHECK:        %[[SUBVIEW:.+]] = memref.subview
 //  CHECK:        return %[[SUBVIEW]]
-func.func @no_fold_subview_zero_stride(%arg0 : memref<10xf32>) -> memref<1xf32, strided<[?], offset: 1>> {
+func.func @no_fold_subview_zero_stride(%arg0 : memref<10xf32>) -> memref<1xf32, strided<[?]>> {
   %c0 = arith.constant 0 : index
   %c1 = arith.constant 1 : index
-  %1 = memref.subview %arg0[1] [1] [%c0] : memref<10xf32> to memref<1xf32, strided<[?], offset: 1>>
-  return %1 : memref<1xf32, strided<[?], offset: 1>>
+  %1 = memref.subview %arg0[1] [1] [%c0] : memref<10xf32> to memref<1xf32, strided<[?]>>
+  return %1 : memref<1xf32, strided<[?]>>
 }
 
 // -----
@@ -393,25 +393,25 @@ func.func @alloc_alignment_const_fold() -> memref<?xf32> {
 
 // CHECK-LABEL: func @alloc_const_fold_with_symbols1(
 //  CHECK: %[[c1:.+]] = arith.constant 1 : index
-//  CHECK: %[[mem1:.+]] = memref.alloc({{.*}})[%[[c1]], %[[c1]]] : memref<?xi32, strided{{.*}}>
+//  CHECK: %[[mem1:.+]] = memref.alloc({{.*}})[%[[c1]]] : memref<?xi32, strided{{.*}}>
 //  CHECK: return %[[mem1]] : memref<?xi32, strided{{.*}}>
-func.func @alloc_const_fold_with_symbols1(%arg0 : index) -> memref<?xi32, strided<[?], offset: ?>> {
+func.func @alloc_const_fold_with_symbols1(%arg0 : index) -> memref<?xi32, strided<[?]>> {
   %c1 = arith.constant 1 : index
-  %0 = memref.alloc(%arg0)[%c1, %c1] : memref<?xi32, strided<[?], offset: ?>>
-  return %0 : memref<?xi32, strided<[?], offset: ?>>
+  %0 = memref.alloc(%arg0)[%c1] : memref<?xi32, strided<[?]>>
+  return %0 : memref<?xi32, strided<[?]>>
 }
 
 // -----
 
 // CHECK-LABEL: func @alloc_const_fold_with_symbols2(
 //  CHECK: %[[c1:.+]] = arith.constant 1 : index
-//  CHECK: %[[mem1:.+]] = memref.alloc()[%[[c1]], %[[c1]]] : memref<1xi32, strided{{.*}}>
+//  CHECK: %[[mem1:.+]] = memref.alloc()[%[[c1]]] : memref<1xi32, strided{{.*}}>
 //  CHECK: %[[mem2:.+]] = memref.cast %[[mem1]] : memref<1xi32, strided{{.*}}> to memref<?xi32, strided{{.*}}>
 //  CHECK: return %[[mem2]] : memref<?xi32, strided{{.*}}>
-func.func @alloc_const_fold_with_symbols2() -> memref<?xi32, strided<[?], offset: ?>> {
+func.func @alloc_const_fold_with_symbols2() -> memref<?xi32, strided<[?]>> {
   %c1 = arith.constant 1 : index
-  %0 = memref.alloc(%c1)[%c1, %c1] : memref<?xi32, strided<[?], offset: ?>>
-  return %0 : memref<?xi32, strided<[?], offset: ?>>
+  %0 = memref.alloc(%c1)[%c1] : memref<?xi32, strided<[?]>>
+  return %0 : memref<?xi32, strided<[?]>>
 }
 
 // -----
@@ -472,15 +472,15 @@ func.func @compose_collapse_of_expand_partially_dynamic(%arg0: memref<?xf16>, %a
 // -----
 
 func.func @do_not_compose_collapse_of_expand_non_identity_layout(
-    %arg0: memref<?x?xf32, strided<[?, 1], offset: 0>>, %sz0: index, %sz1: index)
-    -> memref<?xf32, strided<[?], offset: 0>> {
+    %arg0: memref<?x?xf32, strided<[?, 1]>>, %sz0: index, %sz1: index)
+    -> memref<?xf32, strided<[?]>> {
   %1 = memref.expand_shape %arg0 [[0, 1], [2]] output_shape [%sz0, 4, %sz1] :
-    memref<?x?xf32, strided<[?, 1], offset: 0>> into
-    memref<?x4x?xf32, strided<[?, ?, 1], offset: 0>>
+    memref<?x?xf32, strided<[?, 1]>> into
+    memref<?x4x?xf32, strided<[?, ?, 1]>>
   %2 = memref.collapse_shape %1 [[0, 1, 2]] :
-    memref<?x4x?xf32, strided<[?, ?, 1], offset: 0>> into
-    memref<?xf32, strided<[?], offset: 0>>
-  return %2 : memref<?xf32, strided<[?], offset: 0>>
+    memref<?x4x?xf32, strided<[?, ?, 1]>> into
+    memref<?xf32, strided<[?]>>
+  return %2 : memref<?xf32, strided<[?]>>
 }
 // CHECK-LABEL: func @do_not_compose_collapse_of_expand_non_identity_layout
 // CHECK: expand
@@ -680,10 +680,10 @@ func.func @not_fold_memref_expand_static_to_dynamic_cast_if_really_dynamic(%arg0
 // CHECK:           return %[[EXPAND_SHAPE_0]] : memref<8x1x4xf32>
 // CHECK:         }
 func.func @fold_memref_expand_static_to_dynamic_layout(%arg0 : memref<8x4xf32>) -> memref<8x1x4xf32> {
-  %0 = memref.cast %arg0 : memref<8x4xf32> to memref<8x4xf32, strided<[?, ?], offset: ?>>
+  %0 = memref.cast %arg0 : memref<8x4xf32> to memref<8x4xf32, strided<[?, ?]>>
   %1 = memref.expand_shape %0 [[0, 1], [2]] output_shape [8, 1, 4]
-      : memref<8x4xf32, strided<[?, ?], offset: ?>> into memref<8x1x4xf32, strided<[?,?,?], offset: ?>>
-  %2 = memref.cast %1 : memref<8x1x4xf32, strided<[?,?,?], offset: ?>> to memref<8x1x4xf32>
+      : memref<8x4xf32, strided<[?, ?]>> into memref<8x1x4xf32, strided<[?,?,?]>>
+  %2 = memref.cast %1 : memref<8x1x4xf32, strided<[?,?,?]>> to memref<8x1x4xf32>
   return %2 : memref<8x1x4xf32>
 }
 
@@ -734,18 +734,18 @@ func.func @collapse_after_memref_cast_type_change_dynamic(%arg0: memref<1x1x1x?x
 // -----
 
 func.func @reduced_memref(%arg0: memref<2x5x7x1xf32>, %arg1 :index)
-    -> memref<1x4x1xf32, strided<[35, 7, 1], offset: ?>> {
+    -> memref<1x4x1xf32, strided<[35, 7, 1]>> {
   %c0 = arith.constant 0 : index
   %c5 = arith.constant 5 : index
   %c4 = arith.constant 4 : index
   %c2 = arith.constant 2 : index
   %c1 = arith.constant 1 : index
   %0 = memref.subview %arg0[%arg1, %arg1, %arg1, 0] [%c1, %c4, %c1, 1] [1, 1, 1, 1]
-      : memref<2x5x7x1xf32> to memref<?x?x?xf32, strided<[35, 7, 1], offset: ?>>
+      : memref<2x5x7x1xf32> to memref<?x?x?xf32, strided<[35, 7, 1]>>
   %1 = memref.cast %0
-      : memref<?x?x?xf32, strided<[35, 7, 1], offset: ?>> to
-        memref<1x4x1xf32, strided<[35, 7, 1], offset: ?>>
-  return %1 : memref<1x4x1xf32, strided<[35, 7, 1], offset: ?>>
+      : memref<?x?x?xf32, strided<[35, 7, 1]>> to
+        memref<1x4x1xf32, strided<[35, 7, 1]>>
+  return %1 : memref<1x4x1xf32, strided<[35, 7, 1]>>
 }
 
 // CHECK-LABEL: func @reduced_memref
@@ -778,9 +778,9 @@ func.func @fold_no_op_subview(%arg0 : memref<20x42xf32>) -> memref<20x42xf32, st
 
 // -----
 
-func.func @no_fold_subview_with_non_zero_offset(%arg0 : memref<20x42xf32>) -> memref<20x41xf32, strided<[42, 1], offset: 1>> {
-  %0 = memref.subview %arg0[0, 1] [20, 41] [1, 1] : memref<20x42xf32> to memref<20x41xf32, strided<[42, 1], offset: 1>>
-  return %0 : memref<20x41xf32, strided<[42, 1], offset: 1>>
+func.func @no_fold_subview_with_non_zero_offset(%arg0 : memref<20x42xf32>) -> memref<20x41xf32, strided<[42, 1]>> {
+  %0 = memref.subview %arg0[0, 1] [20, 41] [1, 1] : memref<20x42xf32> to memref<20x41xf32, strided<[42, 1]>>
+  return %0 : memref<20x41xf32, strided<[42, 1]>>
 }
 // CHECK-LABEL: func @no_fold_subview_with_non_zero_offset(
 //       CHECK:   %[[SUBVIEW:.+]] = memref.subview
@@ -799,11 +799,11 @@ func.func @no_fold_subview_with_non_unit_stride(%arg0 : memref<20x42xf32>) -> me
 // -----
 
 // CHECK-LABEL: func @no_fold_invalid_dynamic_slice
-//       CHECK:   memref.subview %arg0[2] [%{{.*}}] [1] : memref<10xf32> to memref<?xf32, strided<[1], offset: 2>>
-func.func @no_fold_invalid_dynamic_slice(%arg0: memref<10xf32>) -> memref<?xf32, strided<[1], offset: 2>> {
+//       CHECK:   memref.subview %arg0[2] [%{{.*}}] [1] : memref<10xf32> to memref<?xf32, strided<[1]>>
+func.func @no_fold_invalid_dynamic_slice(%arg0: memref<10xf32>) -> memref<?xf32, strided<[1]>> {
   %c11 = arith.constant 11 : index
-  %0 = memref.subview %arg0 [2][%c11][1] : memref<10xf32> to memref<?xf32, strided<[1], offset: 2>>
-  func.return %0 : memref<?xf32, strided<[1], offset: 2>>
+  %0 = memref.subview %arg0 [2][%c11][1] : memref<10xf32> to memref<?xf32, strided<[1]>>
+  func.return %0 : memref<?xf32, strided<[1]>>
 }
 
 // -----
@@ -834,9 +834,9 @@ func.func @atomicrmw_cast_fold(%arg0 : f32, %arg1 : memref<4xf32>, %c : index) {
 // -----
 
 func.func @copy_of_cast(%m1: memref<?xf32>, %m2: memref<*xf32>) {
-  %casted1 = memref.cast %m1 : memref<?xf32> to memref<?xf32, strided<[?], offset: ?>>
-  %casted2 = memref.cast %m2 : memref<*xf32> to memref<?xf32, strided<[?], offset: ?>>
-  memref.copy %casted1, %casted2 : memref<?xf32, strided<[?], offset: ?>> to memref<?xf32, strided<[?], offset: ?>>
+  %casted1 = memref.cast %m1 : memref<?xf32> to memref<?xf32, strided<[?]>>
+  %casted2 = memref.cast %m2 : memref<*xf32> to memref<?xf32, strided<[?]>>
+  memref.copy %casted1, %casted2 : memref<?xf32, strided<[?]>> to memref<?xf32, strided<[?]>>
   return
 }
 
@@ -1036,7 +1036,7 @@ func.func @scope_merge_without_terminator() {
 // static information.
 //
 // CHECK-LABEL: func @extract_strided_metadata_of_cast
-//  CHECK-SAME: %[[ARG:.*]]: memref<3x?xi32, strided<[4, ?], offset: ?>>)
+//  CHECK-SAME: %[[ARG:.*]]: memref<3x?xi32, strided<[4, ?]>>)
 //
 //   CHECK-DAG: %[[C3:.*]] = arith.constant 3 : index
 //   CHECK-DAG: %[[C4:.*]] = arith.constant 4 : index
@@ -1044,18 +1044,18 @@ func.func @scope_merge_without_terminator() {
 //
 //       CHECK: return %[[BASE]], %[[DYN_OFFSET]], %[[C3]], %[[DYN_SIZES]]#1, %[[C4]], %[[DYN_STRIDES]]#1
 func.func @extract_strided_metadata_of_cast(
-  %arg : memref<3x?xi32, strided<[4, ?], offset:?>>)
+  %arg : memref<3x?xi32, strided<[4, ?]>>)
   -> (memref<i32>, index,
       index, index,
       index, index) {
 
   %cast =
     memref.cast %arg :
-      memref<3x?xi32, strided<[4, ?], offset: ?>> to
-      memref<?x?xi32, strided<[?, ?], offset: ?>>
+      memref<3x?xi32, strided<[4, ?]>> to
+      memref<?x?xi32, strided<[?, ?]>>
 
   %base, %base_offset, %sizes:2, %strides:2 =
-    memref.extract_strided_metadata %cast:memref<?x?xi32, strided<[?, ?], offset: ?>>
+    memref.extract_strided_metadata %cast:memref<?x?xi32, strided<[?, ?]>>
     -> memref<i32>, index,
        index, index,
        index, index
@@ -1078,7 +1078,7 @@ func.func @extract_strided_metadata_of_cast(
 // in the destination type.
 //
 // CHECK-LABEL: func @extract_strided_metadata_of_cast_w_csts
-//  CHECK-SAME: %[[ARG:.*]]: memref<?x?xi32, strided<[?, ?], offset: ?>>)
+//  CHECK-SAME: %[[ARG:.*]]: memref<?x?xi32, strided<[?, ?]>>)
 //
 //   CHECK-DAG: %[[C4:.*]] = arith.constant 4 : index
 //   CHECK-DAG: %[[C18:.*]] = arith.constant 18 : index
@@ -1087,18 +1087,18 @@ func.func @extract_strided_metadata_of_cast(
 //
 //       CHECK: return %[[BASE]], %[[C25]], %[[C4]], %[[DYN_SIZES]]#1, %[[DYN_STRIDES]]#0, %[[C18]]
 func.func @extract_strided_metadata_of_cast_w_csts(
-  %arg : memref<?x?xi32, strided<[?, ?], offset:?>>)
+  %arg : memref<?x?xi32, strided<[?, ?]>>)
   -> (memref<i32>, index,
       index, index,
       index, index) {
 
   %cast =
     memref.cast %arg :
-      memref<?x?xi32, strided<[?, ?], offset: ?>> to
-      memref<4x?xi32, strided<[?, 18], offset: 25>>
+      memref<?x?xi32, strided<[?, ?]>> to
+      memref<4x?xi32, strided<[?, 18]>>
 
   %base, %base_offset, %sizes:2, %strides:2 =
-    memref.extract_strided_metadata %cast:memref<4x?xi32, strided<[?, 18], offset: 25>>
+    memref.extract_strided_metadata %cast:memref<4x?xi32, strided<[?, 18]>>
     -> memref<i32>, index,
        index, index,
        index, index
@@ -1134,10 +1134,10 @@ func.func @extract_strided_metadata_of_cast_unranked(
   %cast =
     memref.cast %arg :
       memref<*xi32> to
-      memref<?x?xi32, strided<[?, ?], offset: ?>>
+      memref<?x?xi32, strided<[?, ?]>>
 
   %base, %base_offset, %sizes:2, %strides:2 =
-    memref.extract_strided_metadata %cast:memref<?x?xi32, strided<[?, ?], offset: ?>>
+    memref.extract_strided_metadata %cast:memref<?x?xi32, strided<[?, ?]>>
     -> memref<i32>, index,
        index, index,
        index, index
@@ -1167,12 +1167,12 @@ func.func @reinterpret_noop(%arg : memref<2x3x4xf32>) -> memref<2x3x4xf32> {
 //       CHECK: %[[RES:.*]] = memref.reinterpret_cast %[[ARG]] to offset: [0], sizes: [100, 100], strides: [100, 1]
 //       CHECK: %[[CAST:.*]] = memref.cast %[[RES]]
 //       CHECK: return %[[CAST]]
-func.func @reinterpret_constant_fold(%arg0: memref<f32>) -> memref<?x?xf32, strided<[?, ?], offset: ?>> {
+func.func @reinterpret_constant_fold(%arg0: memref<f32>) -> memref<?x?xf32, strided<[?, ?]>> {
   %c0 = arith.constant 0 : index
   %c1 = arith.constant 1 : index
   %c100 = arith.constant 100 : index
-  %reinterpret_cast = memref.reinterpret_cast %arg0 to offset: [%c0], sizes: [%c100, %c100], strides: [%c100, %c1] : memref<f32> to memref<?x?xf32, strided<[?, ?], offset: ?>>
-  return %reinterpret_cast : memref<?x?xf32, strided<[?, ?], offset: ?>>
+  %reinterpret_cast = memref.reinterpret_cast %arg0 to offset: [%c0], sizes: [%c100, %c100], strides: [%c100, %c1] : memref<f32> to memref<?x?xf32, strided<[?, ?]>>
+  return %reinterpret_cast : memref<?x?xf32, strided<[?, ?]>>
 }
 
 // -----
@@ -1220,10 +1220,10 @@ func.func @reinterpret_of_subview(%arg : memref<?xi8>, %size1: index, %size2: in
 //  CHECK-SAME: (%[[ARG:.*]]: memref<8x2xf32>)
 //       CHECK: %[[CAST:.*]] = memref.cast %[[ARG]] : memref<8x2xf32> to memref<?x?xf32,
 //       CHECK: return %[[CAST]]
-func.func @reinterpret_of_extract_strided_metadata_w_type_mistach(%arg0 : memref<8x2xf32>) -> memref<?x?xf32, strided<[?, ?], offset: ?>> {
+func.func @reinterpret_of_extract_strided_metadata_w_type_mistach(%arg0 : memref<8x2xf32>) -> memref<?x?xf32, strided<[?, ?]>> {
   %base, %offset, %sizes:2, %strides:2 = memref.extract_strided_metadata %arg0 : memref<8x2xf32> -> memref<f32>, index, index, index, index, index
-  %m2 = memref.reinterpret_cast %base to offset: [%offset], sizes: [%sizes#0, %sizes#1], strides: [%strides#0, %strides#1] : memref<f32> to memref<?x?xf32, strided<[?, ?], offset: ?>>
-  return %m2 : memref<?x?xf32, strided<[?, ?], offset: ?>>
+  %m2 = memref.reinterpret_cast %base to offset: [%offset], sizes: [%sizes#0, %sizes#1], strides: [%strides#0, %strides#1] : memref<f32> to memref<?x?xf32, strided<[?, ?]>>
+  return %m2 : memref<?x?xf32, strided<[?, ?]>>
 }
 
 // -----
@@ -1237,11 +1237,11 @@ func.func @reinterpret_of_extract_strided_metadata_w_type_mistach(%arg0 : memref
 //  CHECK-SAME: (%[[ARG:.*]]: memref<8x2xf32>)
 //       CHECK: %[[CAST:.*]] = memref.cast %[[ARG]] : memref<8x2xf32> to memref<?x?xf32,
 //       CHECK: return %[[CAST]]
-func.func @reinterpret_of_extract_strided_metadata_w_constants(%arg0 : memref<8x2xf32>) -> memref<?x?xf32, strided<[?, ?], offset: ?>> {
+func.func @reinterpret_of_extract_strided_metadata_w_constants(%arg0 : memref<8x2xf32>) -> memref<?x?xf32, strided<[?, ?]>> {
   %base, %offset, %sizes:2, %strides:2 = memref.extract_strided_metadata %arg0 : memref<8x2xf32> -> memref<f32>, index, index, index, index, index
   %c8 = arith.constant 8: index
-  %m2 = memref.reinterpret_cast %base to offset: [0], sizes: [%c8, 2], strides: [2, %strides#1] : memref<f32> to memref<?x?xf32, strided<[?, ?], offset: ?>>
-  return %m2 : memref<?x?xf32, strided<[?, ?], offset: ?>>
+  %m2 = memref.reinterpret_cast %base to offset: [0], sizes: [%c8, 2], strides: [2, %strides#1] : memref<f32> to memref<?x?xf32, strided<[?, ?]>>
+  return %m2 : memref<?x?xf32, strided<[?, ?]>>
 }
 // -----
 
@@ -1250,10 +1250,10 @@ func.func @reinterpret_of_extract_strided_metadata_w_constants(%arg0 : memref<8x
 // CHECK-LABEL: func @reinterpret_of_extract_strided_metadata_same_type
 //  CHECK-SAME: (%[[ARG:.*]]: memref<?x?xf32
 //       CHECK: return %[[ARG]]
-func.func @reinterpret_of_extract_strided_metadata_same_type(%arg0 : memref<?x?xf32, strided<[?,?], offset: ?>>) -> memref<?x?xf32, strided<[?,?], offset: ?>> {
-  %base, %offset, %sizes:2, %strides:2 = memref.extract_strided_metadata %arg0 : memref<?x?xf32, strided<[?,?], offset: ?>> -> memref<f32>, index, index, index, index, index
-  %m2 = memref.reinterpret_cast %base to offset: [%offset], sizes: [%sizes#0, %sizes#1], strides: [%strides#0, %strides#1] : memref<f32> to memref<?x?xf32, strided<[?,?], offset:?>>
-  return %m2 : memref<?x?xf32, strided<[?,?], offset:?>>
+func.func @reinterpret_of_extract_strided_metadata_same_type(%arg0 : memref<?x?xf32, strided<[?,?]>>) -> memref<?x?xf32, strided<[?,?]>> {
+  %base, %offset, %sizes:2, %strides:2 = memref.extract_strided_metadata %arg0 : memref<?x?xf32, strided<[?,?]>> -> memref<f32>, index, index, index, index, index
+  %m2 = memref.reinterpret_cast %base to offset: [%offset], sizes: [%sizes#0, %sizes#1], strides: [%strides#0, %strides#1] : memref<f32> to memref<?x?xf32, strided<[?,?]>>
+  return %m2 : memref<?x?xf32, strided<[?,?]>>
 }
 
 // -----
@@ -1265,10 +1265,10 @@ func.func @reinterpret_of_extract_strided_metadata_same_type(%arg0 : memref<?x?x
 //       CHECK: %[[RES:.*]] = memref.reinterpret_cast %[[ARG]] to offset: [0], sizes: [4, 2, 2], strides: [1, 1, 1]
 //       CHECK: %[[CAST:.*]] = memref.cast %[[RES]]
 //       CHECK: return %[[CAST]]
-func.func @reinterpret_of_extract_strided_metadata_w_different_stride(%arg0 : memref<8x2xf32>) -> memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>> {
+func.func @reinterpret_of_extract_strided_metadata_w_different_stride(%arg0 : memref<8x2xf32>) -> memref<?x?x?xf32, strided<[?, ?, ?]>> {
   %base, %offset, %sizes:2, %strides:2 = memref.extract_strided_metadata %arg0 : memref<8x2xf32> -> memref<f32>, index, index, index, index, index
-  %m2 = memref.reinterpret_cast %base to offset: [%offset], sizes: [4, 2, 2], strides: [1, 1, %strides#1] : memref<f32> to memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
-  return %m2 : memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
+  %m2 = memref.reinterpret_cast %base to offset: [%offset], sizes: [4, 2, 2], strides: [1, 1, %strides#1] : memref<f32> to memref<?x?x?xf32, strided<[?, ?, ?]>>
+  return %m2 : memref<?x?x?xf32, strided<[?, ?, ?]>>
 }
 // -----
 
@@ -1279,10 +1279,10 @@ func.func @reinterpret_of_extract_strided_metadata_w_different_stride(%arg0 : me
 //       CHECK: %[[RES:.*]] = memref.reinterpret_cast %[[ARG]] to offset: [1], sizes: [8, 2], strides: [2, 1]
 //       CHECK: %[[CAST:.*]] = memref.cast %[[RES]]
 //       CHECK: return %[[CAST]]
-func.func @reinterpret_of_extract_strided_metadata_w_different_offset(%arg0 : memref<8x2xf32>) -> memref<?x?xf32, strided<[?, ?], offset: ?>> {
+func.func @reinterpret_of_extract_strided_metadata_w_different_offset(%arg0 : memref<8x2xf32>) -> memref<?x?xf32, strided<[?, ?]>> {
   %base, %offset, %sizes:2, %strides:2 = memref.extract_strided_metadata %arg0 : memref<8x2xf32> -> memref<f32>, index, index, index, index, index
-  %m2 = memref.reinterpret_cast %base to offset: [1], sizes: [%sizes#0, %sizes#1], strides: [%strides#0, %strides#1] : memref<f32> to memref<?x?xf32, strided<[?, ?], offset: ?>>
-  return %m2 : memref<?x?xf32, strided<[?, ?], offset: ?>>
+  %m2 = memref.reinterpret_cast %base to offset: [1], sizes: [%sizes#0, %sizes#1], strides: [%strides#0, %strides#1] : memref<f32> to memref<?x?xf32, strided<[?, ?]>>
+  return %m2 : memref<?x?xf32, strided<[?, ?]>>
 }
 
 // -----
@@ -1294,14 +1294,14 @@ func.func @reinterpret_of_extract_strided_metadata_w_different_offset(%arg0 : me
 //  CHECK-SAME: (%[[ARG:.*]]: memref<2x3xf32>)
 //       CHECK: %[[SZ:.*]] = arith.constant -1 : index
 //       CHECK: memref.reinterpret_cast %[[ARG]] to offset: [0], sizes: [1, %[[SZ]]], strides: [-1, 1]
-func.func @reinterpret_cast_with_negative_size(%arg0: memref<2x3xf32>) -> memref<?x?xf32, strided<[?, ?], offset: ?>> {
+func.func @reinterpret_cast_with_negative_size(%arg0: memref<2x3xf32>) -> memref<?x?xf32, strided<[?, ?]>> {
   %c0 = arith.constant 0 : index
   %c1 = arith.constant 1 : index
   %sz = arith.constant -1 : index
   %output = memref.reinterpret_cast %arg0 to
             offset: [%c0], sizes: [%c1, %sz], strides: [%sz, %c1]
-            : memref<2x3xf32> to memref<?x?xf32, strided<[?, ?], offset: ?>>
-  return %output : memref<?x?xf32, strided<[?, ?], offset: ?>>
+            : memref<2x3xf32> to memref<?x?xf32, strided<[?, ?]>>
+  return %output : memref<?x?xf32, strided<[?, ?]>>
 }
 
 // -----
@@ -1313,14 +1313,14 @@ func.func @reinterpret_cast_with_negative_size(%arg0: memref<2x3xf32>) -> memref
 //  CHECK-SAME: (%[[ARG:.*]]: memref<2x3xf32>)
 //       CHECK: %[[NEG:.*]] = arith.constant -1 : index
 //       CHECK: memref.reinterpret_cast %[[ARG]] to offset: [%[[NEG]]], sizes: [1, 2], strides: [2, 1]
-func.func @reinterpret_cast_with_negative_offset(%arg0: memref<2x3xf32>) -> memref<?x?xf32, strided<[?, ?], offset: ?>> {
+func.func @reinterpret_cast_with_negative_offset(%arg0: memref<2x3xf32>) -> memref<?x?xf32, strided<[?, ?]>> {
   %c1 = arith.constant 1 : index
   %c2 = arith.constant 2 : index
   %neg = arith.constant -1 : index
   %output = memref.reinterpret_cast %arg0 to
             offset: [%neg], sizes: [%c1, %c2], strides: [%c2, %c1]
-            : memref<2x3xf32> to memref<?x?xf32, strided<[?, ?], offset: ?>>
-  return %output : memref<?x?xf32, strided<[?, ?], offset: ?>>
+            : memref<2x3xf32> to memref<?x?xf32, strided<[?, ?]>>
+  return %output : memref<?x?xf32, strided<[?, ?]>>
 }
 
 // -----
@@ -1330,14 +1330,14 @@ func.func @reinterpret_cast_with_negative_offset(%arg0: memref<2x3xf32>) -> memr
 //  CHECK-SAME: (%[[ARG:.*]]: memref<2x3xf32>)
 //       CHECK: %[[NEG:.*]] = arith.constant -1 : index
 //       CHECK: memref.reinterpret_cast %[[ARG]] to offset: [%[[NEG]]], sizes: [1, %[[NEG]]], strides: [2, 1]
-func.func @reinterpret_cast_with_negative_size_and_offset(%arg0: memref<2x3xf32>) -> memref<?x?xf32, strided<[?, ?], offset: ?>> {
+func.func @reinterpret_cast_with_negative_size_and_offset(%arg0: memref<2x3xf32>) -> memref<?x?xf32, strided<[?, ?]>> {
   %c1 = arith.constant 1 : index
   %c2 = arith.constant 2 : index
   %neg = arith.constant -1 : index
   %output = memref.reinterpret_cast %arg0 to
             offset: [%neg], sizes: [%c1, %neg], strides: [%c2, %c1]
-            : memref<2x3xf32> to memref<?x?xf32, strided<[?, ?], offset: ?>>
-  return %output : memref<?x?xf32, strided<[?, ?], offset: ?>>
+            : memref<2x3xf32> to memref<?x?xf32, strided<[?, ?]>>
+  return %output : memref<?x?xf32, strided<[?, ?]>>
 }
 
 // -----
@@ -1348,12 +1348,12 @@ func.func @reinterpret_cast_with_negative_size_and_offset(%arg0: memref<2x3xf32>
 //  CHECK-SAME: (%[[ARG:.*]]: memref<2x3xf32>)
 //       CHECK: %[[NEG:.*]] = arith.constant -1 : index
 //       CHECK: memref.reinterpret_cast %[[ARG]] to offset: [%[[NEG]]], sizes: [%[[NEG]], %[[NEG]]], strides: [2, 1]
-func.func @reinterpret_cast_no_fold_with_all_negative_size_and_offset(%arg0: memref<2x3xf32>) -> memref<?x?xf32, strided<[?, ?], offset: ?>> {
+func.func @reinterpret_cast_no_fold_with_all_negative_size_and_offset(%arg0: memref<2x3xf32>) -> memref<?x?xf32, strided<[?, ?]>> {
   %neg = arith.constant -1 : index
   %output = memref.reinterpret_cast %arg0 to
             offset: [%neg], sizes: [%neg, %neg], strides: [2, 1]
-            : memref<2x3xf32> to memref<?x?xf32, strided<[?, ?], offset: ?>>
-  return %output : memref<?x?xf32, strided<[?, ?], offset: ?>>
+            : memref<2x3xf32> to memref<?x?xf32, strided<[?, ?]>>
+  return %output : memref<?x?xf32, strided<[?, ?]>>
 }
 
 // -----
@@ -1366,25 +1366,25 @@ func.func @reinterpret_cast_no_fold_with_all_negative_size_and_offset(%arg0: mem
 //   CHECK-NOT: arith.constant
 //       CHECK: %[[RC:.*]] = memref.reinterpret_cast %[[ARG]] to offset: [0], sizes: [1, 2], strides: [-1, 1]
 //       CHECK: memref.cast %[[RC]]
-func.func @reinterpret_cast_fold_negative_stride(%arg0: memref<2x3xf32>) -> memref<?x?xf32, strided<[?, ?], offset: ?>> {
+func.func @reinterpret_cast_fold_negative_stride(%arg0: memref<2x3xf32>) -> memref<?x?xf32, strided<[?, ?]>> {
   %c0 = arith.constant 0 : index
   %c1 = arith.constant 1 : index
   %c2 = arith.constant 2 : index
   %neg = arith.constant -1 : index
   %output = memref.reinterpret_cast %arg0 to
             offset: [%c0], sizes: [%c1, %c2], strides: [%neg, %c1]
-            : memref<2x3xf32> to memref<?x?xf32, strided<[?, ?], offset: ?>>
-  return %output : memref<?x?xf32, strided<[?, ?], offset: ?>>
+            : memref<2x3xf32> to memref<?x?xf32, strided<[?, ?]>>
+  return %output : memref<?x?xf32, strided<[?, ?]>>
 }
 
 // -----
 
 func.func @canonicalize_rank_reduced_subview(%arg0 : memref<8x?xf32>,
-    %arg1 : index) -> memref<?xf32, strided<[?], offset: ?>> {
+    %arg1 : index) -> memref<?xf32, strided<[?]>> {
   %c0 = arith.constant 0 : index
   %c1 = arith.constant 1 : index
-  %0 = memref.subview %arg0[%c0, %c0] [1, %arg1] [%c1, %c1] : memref<8x?xf32> to memref<?xf32, strided<[?], offset: ?>>
-  return %0 :  memref<?xf32, strided<[?], offset: ?>>
+  %0 = memref.subview %arg0[%c0, %c0] [1, %arg1] [%c1, %c1] : memref<8x?xf32> to memref<?xf32, strided<[?]>>
+  return %0 :  memref<?xf32, strided<[?]>>
 }
 //      CHECK: func @canonicalize_rank_reduced_subview
 // CHECK-SAME:     %[[ARG0:.+]]: memref<8x?xf32>
@@ -1493,20 +1493,20 @@ func.func @expand_collapse_dynamic_do_not_fold_to_cast(%m: memref<1x?x1x32xsi8,
 // -----
 
 // CHECK-LABEL: func @fold_trivial_subviews(
-//  CHECK-SAME:     %[[m:.*]]: memref<?xf32, strided<[?], offset: ?>>
+//  CHECK-SAME:     %[[m:.*]]: memref<?xf32, strided<[?]>>
 //       CHECK:   %[[subview:.*]] = memref.subview %[[m]][5]
 //       CHECK:   return %[[subview]]
-func.func @fold_trivial_subviews(%m: memref<?xf32, strided<[?], offset: ?>>,
+func.func @fold_trivial_subviews(%m: memref<?xf32, strided<[?]>>,
                                  %sz: index)
-    -> memref<?xf32, strided<[?], offset: ?>>
+    -> memref<?xf32, strided<[?]>>
 {
   %0 = memref.subview %m[5] [%sz] [1]
-      : memref<?xf32, strided<[?], offset: ?>>
-        to memref<?xf32, strided<[?], offset: ?>>
+      : memref<?xf32, strided<[?]>>
+        to memref<?xf32, strided<[?]>>
   %1 = memref.subview %0[0] [%sz] [1]
-      : memref<?xf32, strided<[?], offset: ?>>
-        to memref<?xf32, strided<[?], offset: ?>>
-  return %1 : memref<?xf32, strided<[?], offset: ?>>
+      : memref<?xf32, strided<[?]>>
+        to memref<?xf32, strided<[?]>>
+  return %1 : memref<?xf32, strided<[?]>>
 }
 
 // -----
@@ -1579,14 +1579,14 @@ func.func private @ub_negative_alloc_size() -> memref<?x?x?xi1> {
 // CHECK-LABEL: func @subview_rank_reduction(
 //  CHECK-SAME:     %[[arg0:.*]]: memref<1x384x384xf32>, %[[arg1:.*]]: index
 func.func @subview_rank_reduction(%arg0: memref<1x384x384xf32>, %idx: index)
-    -> memref<?x?xf32, strided<[384, 1], offset: ?>> {
+    -> memref<?x?xf32, strided<[384, 1]>> {
   %c1 = arith.constant 1 : index
-  // CHECK: %[[subview:.*]] = memref.subview %[[arg0]][0, %[[arg1]], %[[arg1]]] [1, 1, %[[arg1]]] [1, 1, 1] : memref<1x384x384xf32> to memref<1x?xf32, strided<[384, 1], offset: ?>>
-  // CHECK: %[[cast:.*]] = memref.cast %[[subview]] : memref<1x?xf32, strided<[384, 1], offset: ?>> to memref<?x?xf32, strided<[384, 1], offset: ?>>
+  // CHECK: %[[subview:.*]] = memref.subview %[[arg0]][0, %[[arg1]], %[[arg1]]] [1, 1, %[[arg1]]] [1, 1, 1] : memref<1x384x384xf32> to memref<1x?xf32, strided<[384, 1]>>
+  // CHECK: %[[cast:.*]] = memref.cast %[[subview]] : memref<1x?xf32, strided<[384, 1]>> to memref<?x?xf32, strided<[384, 1]>>
   %0 = memref.subview %arg0[0, %idx, %idx] [1, %c1, %idx] [1, 1, 1]
-      : memref<1x384x384xf32> to memref<?x?xf32, strided<[384, 1], offset: ?>>
+      : memref<1x384x384xf32> to memref<?x?xf32, strided<[384, 1]>>
   // CHECK: return %[[cast]]
-  return %0 : memref<?x?xf32, strided<[384, 1], offset: ?>>
+  return %0 : memref<?x?xf32, strided<[384, 1]>>
 }
 
 // -----
@@ -1745,10 +1745,10 @@ func.func @non_replace_view_negative_static_dims(%src: memref<?xi8>, %offset : i
 // CHECK-NOT: memref.dim
 // CHECK: return %[[ARG1]]
 func.func @no_crash_dim_of_ambiguous_subview(
-    %arg0: memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>, %arg1: index) -> index {
+    %arg0: memref<?x?x?xf32, strided<[?, ?, ?]>>, %arg1: index) -> index {
   %c1 = arith.constant 1 : index
   %subview = memref.subview %arg0[0, 0, 0] [1, %arg1, 1] [1, 1, 1]
-      : memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>> to memref<1x?xf32, strided<[?, ?], offset: ?>>
-  %dim = memref.dim %subview, %c1 : memref<1x?xf32, strided<[?, ?], offset: ?>>
+      : memref<?x?x?xf32, strided<[?, ?, ?]>> to memref<1x?xf32, strided<[?, ?]>>
+  %dim = memref.dim %subview, %c1 : memref<1x?xf32, strided<[?, ?]>>
   return %dim : index
 }
diff --git a/mlir/test/Dialect/MemRef/elide-reinterpret-cast.mlir b/mlir/test/Dialect/MemRef/elide-reinterpret-cast.mlir
index da47562e9c0d6..fc6b096d3d623 100644
--- a/mlir/test/Dialect/MemRef/elide-reinterpret-cast.mlir
+++ b/mlir/test/Dialect/MemRef/elide-reinterpret-cast.mlir
@@ -35,7 +35,7 @@ func.func private @concat_nonzero_offset(%src : memref<1x1xf32>,
   %reinterpret_cast = memref.reinterpret_cast %dst
     to offset: [1], sizes: [1, 1], strides: [1, 1]
     : memref<1x108xf32>
-      to memref<1x1xf32, strided<[1, 1], offset: 1>>
+      to memref<1x1xf32, strided<[1, 1]>>
 
   // CHECK-NOT:  memref.copy
   // CHECK:      %[[C0:.*]] = arith.constant 0 : index
@@ -44,7 +44,7 @@ func.func private @concat_nonzero_offset(%src : memref<1x1xf32>,
   // CHECK:      memref.store %[[VAL]], %[[DST]][%[[C0]], %[[C1]]] : memref<1x108xf32>
   memref.copy %src, %reinterpret_cast
     : memref<1x1xf32>
-      to memref<1x1xf32, strided<[1, 1], offset: 1>>
+      to memref<1x1xf32, strided<[1, 1]>>
   return
 }
 
@@ -58,7 +58,7 @@ func.func private @concat_dynamic_offset(%offset: index, %src : memref<1x1xf32>,
   %reinterpret_cast = memref.reinterpret_cast %dst
     to offset: [%offset], sizes: [1, 1], strides: [1, 1]
     : memref<1x108xf32>
-      to memref<1x1xf32, strided<[1, 1], offset: ?>>
+      to memref<1x1xf32, strided<[1, 1]>>
 
   // CHECK-NOT:  memref.copy
   // CHECK:      %[[C0:.*]] = arith.constant 0 : index
@@ -68,7 +68,7 @@ func.func private @concat_dynamic_offset(%offset: index, %src : memref<1x1xf32>,
   // CHECK:      memref.store %[[VAL]], %[[DST]][%[[C0]], %[[OFF]]] : memref<1x108xf32>
   memref.copy %src, %reinterpret_cast
     : memref<1x1xf32>
-      to memref<1x1xf32, strided<[1, 1], offset: ?>>
+      to memref<1x1xf32, strided<[1, 1]>>
   return
 }
 
@@ -167,13 +167,13 @@ func.func private @negative_concat_strided_base(%src: memref<1x1xf32>,
   %reinterpret_cast = memref.reinterpret_cast %dst
     to offset: [6], sizes: [1, 1], strides: [11, 80]
     : memref<8x1xf32, strided<[10, 2]>>
-      to memref<1x1xf32, strided<[11, 80], offset: 6>>
+      to memref<1x1xf32, strided<[11, 80]>>
 
   // CHECK:      memref.copy %arg0, %reinterpret_cast
   // CHECK-NOT:  memref.load
   // CHECK-NOT:  memref.store
   memref.copy %src, %reinterpret_cast
-    : memref<1x1xf32> to memref<1x1xf32, strided<[11, 80], offset: 6>>
+    : memref<1x1xf32> to memref<1x1xf32, strided<[11, 80]>>
 
   return
 }
diff --git a/mlir/test/Dialect/MemRef/emulate-narrow-type.mlir b/mlir/test/Dialect/MemRef/emulate-narrow-type.mlir
index dd64ecc98721a..6062bbfca595a 100644
--- a/mlir/test/Dialect/MemRef/emulate-narrow-type.mlir
+++ b/mlir/test/Dialect/MemRef/emulate-narrow-type.mlir
@@ -198,19 +198,19 @@ func.func @rank_zero_memref() -> i4 {
 
 func.func @memref_strided_i4(%idx : index) -> i4 {
   %arr = memref.alloc() : memref<128xi4>
-  %subview = memref.subview %arr[32] [32] [1] : memref<128xi4> to memref<32xi4, strided<[1], offset:32>>
-  %1 = memref.load %subview[%idx] : memref<32xi4, strided<[1], offset:32>>
+  %subview = memref.subview %arr[32] [32] [1] : memref<128xi4> to memref<32xi4, strided<[1]>>
+  %1 = memref.load %subview[%idx] : memref<32xi4, strided<[1]>>
   return %1 : i4
 }
 
 // CHECK-LABEL: func @memref_strided_i4
 //       CHECK:   %[[ALLOC:.+]] = memref.alloc() : memref<64xi8>
-//       CHECK:   %[[SUBVIEW:.+]] = memref.subview %[[ALLOC]][16] [16] [1] : memref<64xi8> to memref<16xi8, strided<[1], offset: 16>>
+//       CHECK:   %[[SUBVIEW:.+]] = memref.subview %[[ALLOC]][16] [16] [1] : memref<64xi8> to memref<16xi8, strided<[1]>>
 //       CHECK:   %[[LOAD:.+]] = memref.load %[[SUBVIEW]]
 
 // CHECK32-LABEL: func @memref_strided_i4
 //       CHECK32:   %[[ALLOC:.+]] = memref.alloc() : memref<16xi32>
-//       CHECK32:   %[[SUBVIEW:.+]] = memref.subview %[[ALLOC]][4] [4] [1] : memref<16xi32> to memref<4xi32, strided<[1], offset: 4>>
+//       CHECK32:   %[[SUBVIEW:.+]] = memref.subview %[[ALLOC]][4] [4] [1] : memref<16xi32> to memref<4xi32, strided<[1]>>
 //       CHECK32:   %[[LOAD:.+]] = memref.load %[[SUBVIEW]]
 
 // -----
@@ -219,21 +219,21 @@ func.func @memref_subview_dynamic_offset_i4(%idx : index) -> i4 {
   %c0 = arith.constant 0 : index
   %arr = memref.alloc() : memref<512x64x8x16xi4>
   %subview = memref.subview %arr[%idx, 0, 0, 0] [16, 64, 8, 16] [1, 1, 1, 1] : memref<512x64x8x16xi4>
-                                                                            to memref<16x64x8x16xi4, strided<[8192, 128, 16, 1], offset: ?>>
-  %ld = memref.load %subview[%c0, %c0, %c0, %c0] : memref<16x64x8x16xi4, strided<[8192, 128, 16, 1], offset: ?>>
+                                                                            to memref<16x64x8x16xi4, strided<[8192, 128, 16, 1]>>
+  %ld = memref.load %subview[%c0, %c0, %c0, %c0] : memref<16x64x8x16xi4, strided<[8192, 128, 16, 1]>>
   return %ld : i4
 }
 
 // CHECK-LABEL:   func.func @memref_subview_dynamic_offset_i4(
 // CHECK:           %[[ALLOC:.*]] = memref.alloc() : memref<2097152xi8>
 // CHECK:           %[[IDX:.*]] = affine.apply
-// CHECK:           %[[SUBVIEW:.*]] = memref.subview %[[ALLOC]][%[[IDX]]] [65536] [1] : memref<2097152xi8> to memref<65536xi8, strided<[1], offset: ?>>
+// CHECK:           %[[SUBVIEW:.*]] = memref.subview %[[ALLOC]][%[[IDX]]] [65536] [1] : memref<2097152xi8> to memref<65536xi8, strided<[1]>>
 // CHECK:           memref.load %[[SUBVIEW]]
 
 // CHECK32-LABEL:   func.func @memref_subview_dynamic_offset_i4(
 // CHECK32:           %[[ALLOC:.*]] = memref.alloc() : memref<524288xi32>
 // CHECK32:           %[[IDX:.*]] = affine.apply
-// CHECK32:           %[[SUBVIEW:.*]] = memref.subview %[[ALLOC]][%[[IDX]]] [16384] [1] : memref<524288xi32> to memref<16384xi32, strided<[1], offset: ?>>
+// CHECK32:           %[[SUBVIEW:.*]] = memref.subview %[[ALLOC]][%[[IDX]]] [16384] [1] : memref<524288xi32> to memref<16384xi32, strided<[1]>>
 // CHECK32:           memref.load %[[SUBVIEW]]
 
 // -----
@@ -242,8 +242,8 @@ func.func @negative_memref_subview_non_contiguous(%idx : index) -> i4 {
   %c0 = arith.constant 0 : index
   %arr = memref.alloc() : memref<40x40xi4>
   // expected-error @+1 {{failed to legalize operation 'memref.subview' that was explicitly marked illegal}}
-  %subview = memref.subview %arr[%idx, 0] [4, 8] [1, 1] : memref<40x40xi4> to memref<4x8xi4, strided<[40, 1], offset:?>>
-  %ld = memref.load %subview[%c0, %c0] : memref<4x8xi4, strided<[40, 1], offset:?>>
+  %subview = memref.subview %arr[%idx, 0] [4, 8] [1, 1] : memref<40x40xi4> to memref<4x8xi4, strided<[40, 1]>>
+  %ld = memref.load %subview[%c0, %c0] : memref<4x8xi4, strided<[40, 1]>>
   return %ld : i4
 }
 
@@ -273,8 +273,8 @@ func.func @reinterpret_cast_memref_load_0D() -> i4 {
 
 func.func @reinterpret_cast_memref_load_1D(%arg0: index) -> i4 {
     %0 = memref.alloc() : memref<5x5xi4>
-    %reinterpret_cast_0 = memref.reinterpret_cast %0 to offset: [8], sizes: [25], strides: [1] : memref<5x5xi4> to memref<25xi4, strided<[1], offset:8>>
-    %1 = memref.load %reinterpret_cast_0[%arg0] : memref<25xi4, strided<[1], offset:8>>
+    %reinterpret_cast_0 = memref.reinterpret_cast %0 to offset: [8], sizes: [25], strides: [1] : memref<5x5xi4> to memref<25xi4, strided<[1]>>
+    %1 = memref.load %reinterpret_cast_0[%arg0] : memref<25xi4, strided<[1]>>
     return %1 : i4
 }
 //   CHECK-DAG: #[[MAP:.+]] = affine_map<()[s0] -> (s0 floordiv 2)>
@@ -282,9 +282,9 @@ func.func @reinterpret_cast_memref_load_1D(%arg0: index) -> i4 {
 //       CHECK: func @reinterpret_cast_memref_load_1D(
 //  CHECK-SAME: %[[ARG0:.+]]: index
 //       CHECK:   %[[ALLOC:.+]] = memref.alloc() : memref<13xi8>
-//       CHECK:   %[[RE_CAST:.+]] = memref.reinterpret_cast %[[ALLOC]] to offset: [4], sizes: [13], strides: [1] : memref<13xi8> to memref<13xi8, strided<[1], offset: 4>>
+//       CHECK:   %[[RE_CAST:.+]] = memref.reinterpret_cast %[[ALLOC]] to offset: [4], sizes: [13], strides: [1] : memref<13xi8> to memref<13xi8, strided<[1]>>
 //       CHECK:   %[[INDEX:.+]] = affine.apply #[[MAP]]()[%[[ARG0]]]
-//       CHECK:   %[[LOAD:.+]] = memref.load %[[RE_CAST]][%[[INDEX]]] : memref<13xi8, strided<[1], offset: 4>>
+//       CHECK:   %[[LOAD:.+]] = memref.load %[[RE_CAST]][%[[INDEX]]] : memref<13xi8, strided<[1]>>
 //       CHECK:   %[[OFFSET:.+]] = affine.apply #[[MAP1]]()[%[[ARG0]]]
 //       CHECK:   %[[CAST:.+]] = arith.index_cast %[[OFFSET]] : index to i8
 //       CHECK:   %[[SHR:.+]] = arith.shrsi %[[LOAD]], %[[CAST]] : i8
@@ -296,9 +296,9 @@ func.func @reinterpret_cast_memref_load_1D(%arg0: index) -> i4 {
 //       CHECK32: func @reinterpret_cast_memref_load_1D(
 //  CHECK32-SAME: %[[ARG0:.+]]: index
 //       CHECK32:   %[[ALLOC:.+]] = memref.alloc() : memref<4xi32>
-//       CHECK32:   %[[RE_CAST:.+]] = memref.reinterpret_cast %[[ALLOC]] to offset: [1], sizes: [4], strides: [1] : memref<4xi32> to memref<4xi32, strided<[1], offset: 1>>
+//       CHECK32:   %[[RE_CAST:.+]] = memref.reinterpret_cast %[[ALLOC]] to offset: [1], sizes: [4], strides: [1] : memref<4xi32> to memref<4xi32, strided<[1]>>
 //       CHECK32:   %[[INDEX:.+]] = affine.apply #[[MAP]]()[%[[ARG0]]]
-//       CHECK32:   %[[LOAD:.+]] = memref.load %[[RE_CAST]][%[[INDEX]]] : memref<4xi32, strided<[1], offset: 1>>
+//       CHECK32:   %[[LOAD:.+]] = memref.load %[[RE_CAST]][%[[INDEX]]] : memref<4xi32, strided<[1]>>
 //       CHECK32:   %[[OFFSET:.+]] = affine.apply #[[MAP1]]()[%[[ARG0]]]
 //       CHECK32:   %[[CAST:.+]] = arith.index_cast %[[OFFSET]] : index to i32
 //       CHECK32:   %[[SHR:.+]] = arith.shrsi %[[LOAD]], %[[CAST]] : i32
diff --git a/mlir/test/Dialect/MemRef/expand-strided-metadata.mlir b/mlir/test/Dialect/MemRef/expand-strided-metadata.mlir
index 70c5e1aee85dc..8ddedd2acd81e 100644
--- a/mlir/test/Dialect/MemRef/expand-strided-metadata.mlir
+++ b/mlir/test/Dialect/MemRef/expand-strided-metadata.mlir
@@ -1,8 +1,8 @@
 // RUN: mlir-opt --expand-strided-metadata -split-input-file %s -o - | FileCheck %s
 
 // CHECK-LABEL: func @extract_strided_metadata_constants
-//  CHECK-SAME: (%[[ARG:.*]]: memref<5x4xf32, strided<[4, 1], offset: 2>>)
-func.func @extract_strided_metadata_constants(%base: memref<5x4xf32, strided<[4, 1], offset: 2>>)
+//  CHECK-SAME: (%[[ARG:.*]]: memref<5x4xf32, strided<[4, 1]>>)
+func.func @extract_strided_metadata_constants(%base: memref<5x4xf32, strided<[4, 1]>>)
     -> (memref<f32>, index, index, index, index, index) {
   //   CHECK-DAG: %[[C1:.*]] = arith.constant 1 : index
   //   CHECK-DAG: %[[C2:.*]] = arith.constant 2 : index
@@ -11,7 +11,7 @@ func.func @extract_strided_metadata_constants(%base: memref<5x4xf32, strided<[4,
 
   //       CHECK: %[[BASE:.*]], %[[OFFSET:.*]], %[[SIZES:.*]]:2, %[[STRIDES:.*]]:2 = memref.extract_strided_metadata %[[ARG]]
   %base_buffer, %offset, %sizes:2, %strides:2 = memref.extract_strided_metadata %base :
-    memref<5x4xf32, strided<[4,1], offset:2>>
+    memref<5x4xf32, strided<[4,1]>>
     -> memref<f32>, index, index, index, index, index
 
   // CHECK: %[[BASE]], %[[C2]], %[[C5]], %[[C4]], %[[C4]], %[[C1]]
@@ -41,7 +41,7 @@ func.func @extract_strided_metadata_constants(%base: memref<5x4xf32, strided<[4,
 // CHECK-DAG: #[[$STRIDE_MAP:.*]] = affine_map<()[s0, s1] -> (s0 * s1)>
 // CHECK-DAG: #[[$OFFSET_MAP:.*]] = affine_map<()[s0, s1, s2, s3, s4, s5, s6] -> (s0 + s1 * s2 + s3 * s4 + s5 * s6)>
 // CHECK-LABEL: func @simplify_subview_all_dynamic
-//  CHECK-SAME: (%[[ARG:.*]]: memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>, %[[DYN_OFFSET0:.*]]: index, %[[DYN_OFFSET1:.*]]: index, %[[DYN_OFFSET2:.*]]: index, %[[DYN_SIZE0:.*]]: index, %[[DYN_SIZE1:.*]]: index, %[[DYN_SIZE2:.*]]: index, %[[DYN_STRIDE0:.*]]: index, %[[DYN_STRIDE1:.*]]: index, %[[DYN_STRIDE2:.*]]: index)
+//  CHECK-SAME: (%[[ARG:.*]]: memref<?x?x?xf32, strided<[?, ?, ?]>>, %[[DYN_OFFSET0:.*]]: index, %[[DYN_OFFSET1:.*]]: index, %[[DYN_OFFSET2:.*]]: index, %[[DYN_SIZE0:.*]]: index, %[[DYN_SIZE1:.*]]: index, %[[DYN_SIZE2:.*]]: index, %[[DYN_STRIDE0:.*]]: index, %[[DYN_STRIDE1:.*]]: index, %[[DYN_STRIDE2:.*]]: index)
 //
 //   CHECK-DAG: %[[BASE:.*]], %[[OFFSET:.*]], %[[SIZES:.*]]:3, %[[STRIDES:.*]]:3 = memref.extract_strided_metadata %[[ARG]]
 //
@@ -55,19 +55,19 @@ func.func @extract_strided_metadata_constants(%base: memref<5x4xf32, strided<[4,
 //
 //       CHECK: return %[[RES]]
 func.func @simplify_subview_all_dynamic(
-    %base: memref<?x?x?xf32, strided<[?,?,?], offset:?>>,
+    %base: memref<?x?x?xf32, strided<[?,?,?]>>,
     %offset0: index, %offset1: index, %offset2: index,
     %size0: index, %size1: index, %size2: index,
     %stride0: index, %stride1: index, %stride2: index)
-    -> memref<?x?x?xf32, strided<[?,?,?], offset:?>> {
+    -> memref<?x?x?xf32, strided<[?,?,?]>> {
 
   %subview = memref.subview %base[%offset0, %offset1, %offset2]
                                  [%size0, %size1, %size2]
                                  [%stride0, %stride1, %stride2] :
-    memref<?x?x?xf32, strided<[?,?,?], offset: ?>> to
-      memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
+    memref<?x?x?xf32, strided<[?,?,?]>> to
+      memref<?x?x?xf32, strided<[?, ?, ?]>>
 
-  return %subview : memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
+  return %subview : memref<?x?x?xf32, strided<[?, ?, ?]>>
 }
 
 // -----
@@ -103,10 +103,10 @@ func.func @extract_strided_metadata_of_subview(%base: memref<5x4xf32>)
     -> (memref<f32>, index, index, index, index, index) {
 
   %subview = memref.subview %base[0, 2][2, 2][1, 1] :
-    memref<5x4xf32> to memref<2x2xf32, strided<[4, 1], offset: 2>>
+    memref<5x4xf32> to memref<2x2xf32, strided<[4, 1]>>
 
   %base_buffer, %offset, %sizes:2, %strides:2 = memref.extract_strided_metadata %subview :
-    memref<2x2xf32, strided<[4,1], offset:2>>
+    memref<2x2xf32, strided<[4,1]>>
     -> memref<f32>, index, index, index, index, index
 
   return %base_buffer, %offset, %sizes#0, %sizes#1, %strides#0, %strides#1 :
@@ -148,10 +148,10 @@ func.func @extract_strided_metadata_of_subview_with_dynamic_size(
     -> (memref<f32>, index, index, index, index, index, index, index) {
 
   %subview = memref.subview %base[3, 4, 2][%size, 6, 3][1, 1, 1] :
-    memref<8x16x24xf32> to memref<?x6x3xf32, strided<[384, 24, 1], offset: 1250>>
+    memref<8x16x24xf32> to memref<?x6x3xf32, strided<[384, 24, 1]>>
 
   %base_buffer, %offset, %sizes:3, %strides:3 = memref.extract_strided_metadata %subview :
-    memref<?x6x3xf32, strided<[384, 24, 1], offset: 1250>>
+    memref<?x6x3xf32, strided<[384, 24, 1]>>
     -> memref<f32>, index, index, index, index, index, index, index
 
   return %base_buffer, %offset, %sizes#0, %sizes#1, %sizes#2, %strides#0, %strides#1, %strides#2 :
@@ -194,10 +194,10 @@ func.func @extract_strided_metadata_of_rank_reduced_subview(%base: memref<8x16x2
     -> (memref<f32>, index, index, index, index, index) {
 
   %subview = memref.subview %base[3, 4, 2][1, 6, 3][1, 1, 1] :
-    memref<8x16x24xf32> to memref<6x3xf32, strided<[24, 1], offset: 1250>>
+    memref<8x16x24xf32> to memref<6x3xf32, strided<[24, 1]>>
 
   %base_buffer, %offset, %sizes:2, %strides:2 = memref.extract_strided_metadata %subview :
-    memref<6x3xf32, strided<[24, 1], offset: 1250>>
+    memref<6x3xf32, strided<[24, 1]>>
     -> memref<f32>, index, index, index, index, index
 
   return %base_buffer, %offset, %sizes#0, %sizes#1, %strides#0, %strides#1 :
@@ -244,10 +244,10 @@ func.func @extract_strided_metadata_of_rank_reduced_subview_w_variable_strides(
     -> (memref<f32>, index, index, index, index, index) {
 
   %subview = memref.subview %base[3, 4, 2][1, 6, 3][1, %stride, 1] :
-    memref<8x16x24xf32> to memref<6x3xf32, strided<[?, 1], offset: 1250>>
+    memref<8x16x24xf32> to memref<6x3xf32, strided<[?, 1]>>
 
   %base_buffer, %offset, %sizes:2, %strides:2 = memref.extract_strided_metadata %subview :
-    memref<6x3xf32, strided<[?, 1], offset: 1250>>
+    memref<6x3xf32, strided<[?, 1]>>
     -> memref<f32>, index, index, index, index, index
 
   return %base_buffer, %offset, %sizes#0, %sizes#1, %strides#0, %strides#1 :
@@ -288,10 +288,10 @@ func.func @extract_strided_metadata_of_subview_w_variable_offset(
     -> (memref<f32>, index, index, index, index, index) {
 
   %subview = memref.subview %arg0[%arg1, %arg2] [64, 64] [1, 1] :
-    memref<384x128xf32> to memref<64x64xf32, strided<[128, 1], offset: ?>>
+    memref<384x128xf32> to memref<64x64xf32, strided<[128, 1]>>
 
   %base_buffer, %offset, %sizes:2, %strides:2 = memref.extract_strided_metadata %subview :
-  memref<64x64xf32, strided<[128, 1], offset: ?>> -> memref<f32>, index, index, index, index, index
+  memref<64x64xf32, strided<[128, 1]>> -> memref<f32>, index, index, index, index, index
 
   return %base_buffer, %offset, %sizes#0, %sizes#1, %strides#0, %strides#1 :
     memref<f32>, index, index, index, index, index
@@ -318,7 +318,7 @@ func.func @extract_strided_metadata_of_subview_w_variable_offset(
 // CHECK-DAG: #[[$STRIDE_MAP:.*]] = affine_map<()[s0, s1] -> (s0 * s1)>
 // CHECK-DAG: #[[$OFFSET_MAP:.*]] = affine_map<()[s0, s1, s2, s3, s4, s5, s6] -> (s0 + s1 * s2 + s3 * s4 + s5 * s6)>
 // CHECK-LABEL: func @extract_strided_metadata_of_subview_all_dynamic
-//  CHECK-SAME: (%[[ARG:.*]]: memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>, %[[DYN_OFFSET0:.*]]: index, %[[DYN_OFFSET1:.*]]: index, %[[DYN_OFFSET2:.*]]: index, %[[DYN_SIZE0:.*]]: index, %[[DYN_SIZE1:.*]]: index, %[[DYN_SIZE2:.*]]: index, %[[DYN_STRIDE0:.*]]: index, %[[DYN_STRIDE1:.*]]: index, %[[DYN_STRIDE2:.*]]: index)
+//  CHECK-SAME: (%[[ARG:.*]]: memref<?x?x?xf32, strided<[?, ?, ?]>>, %[[DYN_OFFSET0:.*]]: index, %[[DYN_OFFSET1:.*]]: index, %[[DYN_OFFSET2:.*]]: index, %[[DYN_SIZE0:.*]]: index, %[[DYN_SIZE1:.*]]: index, %[[DYN_SIZE2:.*]]: index, %[[DYN_STRIDE0:.*]]: index, %[[DYN_STRIDE1:.*]]: index, %[[DYN_STRIDE2:.*]]: index)
 //
 //   CHECK-DAG: %[[BASE:.*]], %[[OFFSET:.*]], %[[SIZES:.*]]:3, %[[STRIDES:.*]]:3 = memref.extract_strided_metadata %[[ARG]]
 //
@@ -330,7 +330,7 @@ func.func @extract_strided_metadata_of_subview_w_variable_offset(
 //
 //       CHECK: return %[[BASE]], %[[FINAL_OFFSET]], %[[DYN_SIZE0]], %[[DYN_SIZE1]], %[[DYN_SIZE2]], %[[FINAL_STRIDE0]], %[[FINAL_STRIDE1]], %[[FINAL_STRIDE2]]
 func.func @extract_strided_metadata_of_subview_all_dynamic(
-    %base: memref<?x?x?xf32, strided<[?,?,?], offset:?>>,
+    %base: memref<?x?x?xf32, strided<[?,?,?]>>,
     %offset0: index, %offset1: index, %offset2: index,
     %size0: index, %size1: index, %size2: index,
     %stride0: index, %stride1: index, %stride2: index)
@@ -339,11 +339,11 @@ func.func @extract_strided_metadata_of_subview_all_dynamic(
   %subview = memref.subview %base[%offset0, %offset1, %offset2]
                                  [%size0, %size1, %size2]
                                  [%stride0, %stride1, %stride2] :
-    memref<?x?x?xf32, strided<[?,?,?], offset: ?>> to
-      memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
+    memref<?x?x?xf32, strided<[?,?,?]>> to
+      memref<?x?x?xf32, strided<[?, ?, ?]>>
 
   %base_buffer, %offset, %sizes:3, %strides:3 = memref.extract_strided_metadata %subview :
-    memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
+    memref<?x?x?xf32, strided<[?, ?, ?]>>
     -> memref<f32>, index, index, index, index, index, index, index
 
   return %base_buffer, %offset, %sizes#0, %sizes#1, %sizes#2, %strides#0, %strides#1, %strides#2 :
@@ -394,7 +394,7 @@ func.func @extract_strided_metadata_of_subview_all_dynamic(
 //  CHECK-SAME: (%[[ARG:.*]]: memref<?x?xf32,
 //  CHECK-SAME: %[[SIZE0:.*]]: index,  %[[SIZE1:.*]]: index)
 //
-//   CHECK-DAG: %[[BASE:.*]], %[[OFFSET:.*]], %[[SIZES:.*]]:2, %[[STRIDES:.*]]:2 = memref.extract_strided_metadata %[[ARG]] : memref<?x?xf32, strided<[?, ?], offset: ?>> -> memref<f32>, index, index, index, index, index
+//   CHECK-DAG: %[[BASE:.*]], %[[OFFSET:.*]], %[[SIZES:.*]]:2, %[[STRIDES:.*]]:2 = memref.extract_strided_metadata %[[ARG]] : memref<?x?xf32, strided<[?, ?]>> -> memref<f32>, index, index, index, index, index
 //
 //   CHECK-DAG: %[[DYN_STRIDE0:.*]] = affine.apply #[[$DIM0_STRIDE_MAP]]()[%[[STRIDES]]#0]
 //   CHECK-DAG: %[[DYN_STRIDE1:.*]] = affine.apply #[[$DIM1_STRIDE_MAP]]()[%[[STRIDES]]#0]
@@ -407,16 +407,16 @@ func.func @extract_strided_metadata_of_subview_all_dynamic(
 //
 //   CHECK: return %[[REINTERPRET_CAST]]
 func.func @simplify_expand_shape(
-    %base: memref<?x?xf32, strided<[?,?], offset:?>>,
+    %base: memref<?x?xf32, strided<[?,?]>>,
     %sz0: index, %sz1: index)
-    -> memref<?x7x8x9x10x2x?x3xf32, strided<[?, ?, ?, ?, ?, ?, ?, ?], offset: ?>> {
+    -> memref<?x7x8x9x10x2x?x3xf32, strided<[?, ?, ?, ?, ?, ?, ?, ?]>> {
 
   %expand_shape = memref.expand_shape %base [[0, 1, 2, 3],[4, 5, 6, 7]] output_shape [%sz0, 7, 8, 9, 10, 2, %sz1, 3] :
-    memref<?x?xf32, strided<[?,?], offset: ?>> into
-      memref<?x7x8x9x10x2x?x3xf32, strided<[?, ?, ?, ?, ?, ?, ?, ?], offset: ?>>
+    memref<?x?xf32, strided<[?,?]>> into
+      memref<?x7x8x9x10x2x?x3xf32, strided<[?, ?, ?, ?, ?, ?, ?, ?]>>
 
   return %expand_shape :
-    memref<?x7x8x9x10x2x?x3xf32, strided<[?, ?, ?, ?, ?, ?, ?, ?], offset: ?>>
+    memref<?x7x8x9x10x2x?x3xf32, strided<[?, ?, ?, ?, ?, ?, ?, ?]>>
 }
 
 // -----
@@ -540,7 +540,7 @@ func.func @extract_strided_metadata_of_expand_shape_all_static(
 //   CHECK-DAG: %[[C8:.*]] = arith.constant 8 : index
 //   CHECK-DAG: %[[C3:.*]] = arith.constant 3 : index
 //
-//   CHECK-DAG: %[[BASE:.*]], %[[OFFSET:.*]], %[[SIZES:.*]]:2, %[[STRIDES:.*]]:2 = memref.extract_strided_metadata %[[ARG]] : memref<?x?xf32, strided<[?, ?], offset: ?>> -> memref<f32>, index, index, index, index, index
+//   CHECK-DAG: %[[BASE:.*]], %[[OFFSET:.*]], %[[SIZES:.*]]:2, %[[STRIDES:.*]]:2 = memref.extract_strided_metadata %[[ARG]] : memref<?x?xf32, strided<[?, ?]>> -> memref<f32>, index, index, index, index, index
 //
 //   CHECK-DAG: %[[DYN_STRIDE0:.*]] = affine.apply #[[$DIM0_STRIDE_MAP]]()[%[[SIZE1]], %[[STRIDES]]#0]
 //   CHECK-DAG: %[[DYN_STRIDE1:.*]] = affine.apply #[[$DIM1_STRIDE_MAP]]()[%[[STRIDES]]#0]
@@ -551,18 +551,18 @@ func.func @extract_strided_metadata_of_expand_shape_all_static(
 
 //   CHECK: return %[[BASE]], %[[OFFSET]], %[[SIZE0]], %[[SIZE1]], %[[C8]], %[[C9]], %[[C10]], %[[SIZE2]], %[[SIZE3]], %[[C3]], %[[DYN_STRIDE0]], %[[DYN_STRIDE1]], %[[DYN_STRIDE2]], %[[STRIDES]]#0, %[[DYN_STRIDE4]], %[[DYN_STRIDE5]], %[[DYN_STRIDE6]], %[[STRIDES]]#1 : memref<f32>, index, index, index, index, index, index, index, index, index, index, index, index, index
 func.func @extract_strided_metadata_of_expand_shape_all_dynamic(
-    %base: memref<?x?xf32, strided<[?,?], offset:?>>,
+    %base: memref<?x?xf32, strided<[?,?]>>,
     %sz0: index, %sz1: index, %sz2: index, %sz3: index)
     -> (memref<f32>, index,
        index, index, index, index, index, index, index, index,
        index, index, index, index, index, index, index, index) {
 
   %expand_shape = memref.expand_shape %base[[0, 1, 2, 3],[4, 5, 6, 7]] output_shape [%sz0, %sz1, 8, 9, 10, %sz2, %sz3, 3] :
-    memref<?x?xf32, strided<[?,?], offset: ?>> into
-      memref<?x?x8x9x10x?x?x3xf32, strided<[?, ?, ?, ?, ?, ?, ?, ?], offset: ?>>
+    memref<?x?xf32, strided<[?,?]>> into
+      memref<?x?x8x9x10x?x?x3xf32, strided<[?, ?, ?, ?, ?, ?, ?, ?]>>
 
   %base_buffer, %offset, %sizes:8, %strides:8 = memref.extract_strided_metadata %expand_shape :
-    memref<?x?x8x9x10x?x?x3xf32, strided<[?, ?, ?, ?, ?, ?, ?, ?], offset: ?>>
+    memref<?x?x8x9x10x?x?x3xf32, strided<[?, ?, ?, ?, ?, ?, ?, ?]>>
     -> memref<f32>, index,
        index, index, index, index, index, index, index, index,
        index, index, index, index, index, index, index, index
@@ -586,24 +586,24 @@ func.func @extract_strided_metadata_of_expand_shape_all_dynamic(
 // of the expand_shape is empty, the handling of such shape hits a corner
 // case.
 // CHECK-LABEL: func @extract_strided_metadata_of_expand_shape_all_static_0_rank
-//  CHECK-SAME: (%[[ARG:.*]]: memref<i16, strided<[], offset: ?>>)
+//  CHECK-SAME: (%[[ARG:.*]]: memref<i16, strided<[]>>)
 //
 //   CHECK-DAG: %[[C1:.*]] = arith.constant 1 : index
 //
-//   CHECK-DAG: %[[BASE:.*]], %[[OFFSET:.*]] = memref.extract_strided_metadata %[[ARG]] : memref<i16, strided<[], offset: ?>> -> memref<i16>, index
+//   CHECK-DAG: %[[BASE:.*]], %[[OFFSET:.*]] = memref.extract_strided_metadata %[[ARG]] : memref<i16, strided<[]>> -> memref<i16>, index
 //
 //   CHECK: return %[[BASE]], %[[OFFSET]], %[[C1]], %[[C1]], %[[C1]], %[[C1]], %[[C1]], %[[C1]], %[[C1]], %[[C1]], %[[C1]], %[[C1]] : memref<i16>, index, index, index, index, index, index, index, index, index, index, index
 func.func @extract_strided_metadata_of_expand_shape_all_static_0_rank(
-    %arg : memref<i16, strided<[], offset: ?>>)
+    %arg : memref<i16, strided<[]>>)
     -> (memref<i16>, index,
        index, index, index, index, index,
        index, index, index, index, index) {
 
   %expand_shape = memref.expand_shape %arg[] output_shape [1, 1, 1, 1, 1] :
-    memref<i16, strided<[], offset: ?>> into memref<1x1x1x1x1xi16, strided<[1,1,1,1,1], offset: ?>>
+    memref<i16, strided<[]>> into memref<1x1x1x1x1xi16, strided<[1,1,1,1,1]>>
 
   %base, %offset, %sizes:5, %strides:5 = memref.extract_strided_metadata %expand_shape :
-    memref<1x1x1x1x1xi16, strided<[1,1,1,1,1], offset: ?>>
+    memref<1x1x1x1x1xi16, strided<[1,1,1,1,1]>>
     -> memref<i16>, index,
        index, index, index, index, index,
        index, index, index, index, index
@@ -958,18 +958,18 @@ func.func @simplify_collapse_with_dim_of_size1(%arg0: memref<3x1xf32, strided<[2
 // CHECK-LABEL: func @simplify_collapse_with_dim_of_size1_and_non_1_stride(
 //  CHECK-SAME: %[[ARG:.*]]: memref<1x1xi32, strided<[2, 1]
 //
-//       CHECK: %[[BASE:.*]], %[[OFFSET:.*]], %[[SIZES:.*]]:2, %[[STRIDES:.*]]:2 = memref.extract_strided_metadata %[[ARG]] : memref<1x1xi32, strided<[2, 1], offset: ?>>
+//       CHECK: %[[BASE:.*]], %[[OFFSET:.*]], %[[SIZES:.*]]:2, %[[STRIDES:.*]]:2 = memref.extract_strided_metadata %[[ARG]] : memref<1x1xi32, strided<[2, 1]>>
 //
 //       CHECK: %[[COLLAPSE_VIEW:.*]] = memref.reinterpret_cast %[[BASE]] to offset: [%[[OFFSET]]], sizes: [1], strides: [2]
 func.func @simplify_collapse_with_dim_of_size1_and_non_1_stride
-    (%arg0: memref<1x1xi32, strided<[2, 1], offset: ?>>)
-    -> memref<1xi32, strided<[2], offset: ?>> {
+    (%arg0: memref<1x1xi32, strided<[2, 1]>>)
+    -> memref<1xi32, strided<[2]>> {
 
   %collapse_shape = memref.collapse_shape %arg0 [[0, 1]] :
-    memref<1x1xi32, strided<[2, 1], offset: ?>>
-    into memref<1xi32, strided<[2], offset: ?>>
+    memref<1x1xi32, strided<[2, 1]>>
+    into memref<1xi32, strided<[2]>>
 
-  return %collapse_shape : memref<1xi32, strided<[2], offset: ?>>
+  return %collapse_shape : memref<1xi32, strided<[2]>>
 }
 
 // -----
@@ -999,18 +999,18 @@ func.func @simplify_collapse_with_dim_of_size1_and_non_1_stride
 // CHECK-LABEL: func @simplify_collapse_with_dim_of_size1_and_resulting_dyn_stride(
 //  CHECK-SAME: %[[ARG:.*]]: memref<2x3x1x1x1xi32, strided<[?, ?, ?, ?, 2]
 //
-//       CHECK: %[[BASE:.*]], %[[OFFSET:.*]], %[[SIZES:.*]]:5, %[[STRIDES:.*]]:5 = memref.extract_strided_metadata %[[ARG]] : memref<2x3x1x1x1xi32, strided<[?, ?, ?, ?, 2], offset: ?>>
+//       CHECK: %[[BASE:.*]], %[[OFFSET:.*]], %[[SIZES:.*]]:5, %[[STRIDES:.*]]:5 = memref.extract_strided_metadata %[[ARG]] : memref<2x3x1x1x1xi32, strided<[?, ?, ?, ?, 2]>>
 //
 //       CHECK: %[[COLLAPSE_VIEW:.*]] = memref.reinterpret_cast %[[BASE]] to offset: [%[[OFFSET]]], sizes: [6, 1], strides: [%[[STRIDES]]#1, %[[STRIDES]]#2]
 func.func @simplify_collapse_with_dim_of_size1_and_resulting_dyn_stride
-    (%arg0: memref<2x3x1x1x1xi32, strided<[?, ?, ?, ?, 2], offset: ?>>)
-    -> memref<6x1xi32, strided<[?, ?], offset: ?>> {
+    (%arg0: memref<2x3x1x1x1xi32, strided<[?, ?, ?, ?, 2]>>)
+    -> memref<6x1xi32, strided<[?, ?]>> {
 
   %collapse_shape = memref.collapse_shape %arg0 [[0, 1], [2, 3, 4]] :
-    memref<2x3x1x1x1xi32, strided<[?, ?, ?, ?, 2], offset: ?>>
-    into memref<6x1xi32, strided<[?, ?], offset: ?>>
+    memref<2x3x1x1x1xi32, strided<[?, ?, ?, ?, 2]>>
+    into memref<6x1xi32, strided<[?, ?]>>
 
-  return %collapse_shape : memref<6x1xi32, strided<[?, ?], offset: ?>>
+  return %collapse_shape : memref<6x1xi32, strided<[?, ?]>>
 }
 
 // -----
@@ -1128,13 +1128,13 @@ func.func @extract_strided_metadata_of_extract_strided_metadata(%arg : memref<i3
 // should come straight from the inputs of the reinterpret_cast.
 //
 // CHECK-LABEL: func @extract_strided_metadata_of_reinterpret_cast
-//  CHECK-SAME: %[[ARG:.*]]: memref<?x?xi32, strided<[?, ?], offset: ?>>, %[[DYN_OFFSET:.*]]: index, %[[DYN_SIZE0:.*]]: index, %[[DYN_SIZE1:.*]]: index, %[[DYN_STRIDE0:.*]]: index, %[[DYN_STRIDE1:.*]]: index)
+//  CHECK-SAME: %[[ARG:.*]]: memref<?x?xi32, strided<[?, ?]>>, %[[DYN_OFFSET:.*]]: index, %[[DYN_SIZE0:.*]]: index, %[[DYN_SIZE1:.*]]: index, %[[DYN_STRIDE0:.*]]: index, %[[DYN_STRIDE1:.*]]: index)
 //
 //       CHECK: %[[BASE:.*]], %{{.*}}, %{{.*}}:2, %{{.*}}:2 = memref.extract_strided_metadata %[[ARG]]
 //
 //       CHECK: return %[[BASE]], %[[DYN_OFFSET]], %[[DYN_SIZE0]], %[[DYN_SIZE1]], %[[DYN_STRIDE0]], %[[DYN_STRIDE1]]
 func.func @extract_strided_metadata_of_reinterpret_cast(
-  %arg : memref<?x?xi32, strided<[?, ?], offset:?>>,
+  %arg : memref<?x?xi32, strided<[?, ?]>>,
   %offset: index,
   %size0 : index, %size1 : index,
   %stride0 : index, %stride1 : index)
@@ -1147,11 +1147,11 @@ func.func @extract_strided_metadata_of_reinterpret_cast(
       offset: [%offset],
       sizes: [%size0, %size1],
       strides: [%stride0, %stride1] :
-      memref<?x?xi32, strided<[?, ?], offset: ?>> to
-      memref<?x?xi32, strided<[?, ?], offset: ?>>
+      memref<?x?xi32, strided<[?, ?]>> to
+      memref<?x?xi32, strided<[?, ?]>>
 
   %base, %base_offset, %sizes:2, %strides:2 =
-    memref.extract_strided_metadata %cast:memref<?x?xi32, strided<[?, ?], offset: ?>>
+    memref.extract_strided_metadata %cast:memref<?x?xi32, strided<[?, ?]>>
     -> memref<i32>, index,
        index, index,
        index, index
@@ -1193,10 +1193,10 @@ func.func @extract_strided_metadata_of_reinterpret_cast_unranked(
       sizes: [%size0, %size1],
       strides: [%stride0, %stride1] :
       memref<*xi32> to
-      memref<?x?xi32, strided<[?, ?], offset: ?>>
+      memref<?x?xi32, strided<[?, ?]>>
 
   %base, %base_offset, %sizes:2, %strides:2 =
-    memref.extract_strided_metadata %cast:memref<?x?xi32, strided<[?, ?], offset: ?>>
+    memref.extract_strided_metadata %cast:memref<?x?xi32, strided<[?, ?]>>
     -> memref<i32>, index,
        index, index,
        index, index
@@ -1215,13 +1215,13 @@ func.func @extract_strided_metadata_of_reinterpret_cast_unranked(
 // we handle 0-D properly.
 //
 // CHECK-LABEL: func @extract_strided_metadata_of_reinterpret_cast_rank0
-//  CHECK-SAME: %[[ARG:.*]]: memref<i32, strided<[], offset: ?>>, %[[DYN_OFFSET:.*]]: index, %[[DYN_SIZE0:.*]]: index, %[[DYN_SIZE1:.*]]: index, %[[DYN_STRIDE0:.*]]: index, %[[DYN_STRIDE1:.*]]: index)
+//  CHECK-SAME: %[[ARG:.*]]: memref<i32, strided<[]>>, %[[DYN_OFFSET:.*]]: index, %[[DYN_SIZE0:.*]]: index, %[[DYN_SIZE1:.*]]: index, %[[DYN_STRIDE0:.*]]: index, %[[DYN_STRIDE1:.*]]: index)
 //
 //       CHECK: %[[BASE:.*]], %[[BASE_OFFSET:.*]] = memref.extract_strided_metadata %[[ARG]]
 //
 //       CHECK: return %[[BASE]], %[[DYN_OFFSET]], %[[DYN_SIZE0]], %[[DYN_SIZE1]], %[[DYN_STRIDE0]], %[[DYN_STRIDE1]]
 func.func @extract_strided_metadata_of_reinterpret_cast_rank0(
-  %arg : memref<i32, strided<[], offset:?>>,
+  %arg : memref<i32, strided<[]>>,
   %offset: index,
   %size0 : index, %size1 : index,
   %stride0 : index, %stride1 : index)
@@ -1234,11 +1234,11 @@ func.func @extract_strided_metadata_of_reinterpret_cast_rank0(
       offset: [%offset],
       sizes: [%size0, %size1],
       strides: [%stride0, %stride1] :
-      memref<i32, strided<[], offset: ?>> to
-      memref<?x?xi32, strided<[?, ?], offset: ?>>
+      memref<i32, strided<[]>> to
+      memref<?x?xi32, strided<[?, ?]>>
 
   %base, %base_offset, %sizes:2, %strides:2 =
-    memref.extract_strided_metadata %cast:memref<?x?xi32, strided<[?, ?], offset: ?>>
+    memref.extract_strided_metadata %cast:memref<?x?xi32, strided<[?, ?]>>
     -> memref<i32>, index,
        index, index,
        index, index
@@ -1291,15 +1291,15 @@ func.func @extract_strided_metadata_of_get_global()
 // CHECK-LABEL: func @extract_strided_metadata_of_get_global_with_strides()
 //       CHECK:   %[[GET_GLOBAL:.+]] = memref.get_global @const_i32
 //       CHECK:   memref.extract_strided_metadata %[[GET_GLOBAL]]
-memref.global "private" constant @const_i32 : memref<512x384xi32, strided<[420, 1], offset: 0>> = dense<42>
+memref.global "private" constant @const_i32 : memref<512x384xi32, strided<[420, 1]>> = dense<42>
 
 func.func @extract_strided_metadata_of_get_global_with_strides()
     -> (memref<i32>, index, index, index, index, index) {
 
-  %A = memref.get_global @const_i32 : memref<512x384xi32, strided<[420, 1], offset: 0>>
+  %A = memref.get_global @const_i32 : memref<512x384xi32, strided<[420, 1]>>
 
   %base, %offset, %sizes:2, %strides:2 = memref.extract_strided_metadata %A :
-    memref<512x384xi32, strided<[420, 1], offset: 0>>
+    memref<512x384xi32, strided<[420, 1]>>
     -> memref<i32>, index, index, index, index, index
 
   return %base, %offset, %sizes#0, %sizes#1, %strides#0, %strides#1 :
@@ -1315,15 +1315,15 @@ func.func @extract_strided_metadata_of_get_global_with_strides()
 // CHECK-LABEL: func @extract_strided_metadata_of_get_global_with_offset()
 //       CHECK:   %[[GET_GLOBAL:.+]] = memref.get_global @const_i32
 //       CHECK:   memref.extract_strided_metadata %[[GET_GLOBAL]]
-memref.global "private" constant @const_i32 : memref<512x384xi32, strided<[384, 1], offset: 20>> = dense<42>
+memref.global "private" constant @const_i32 : memref<512x384xi32, strided<[384, 1]>> = dense<42>
 
 func.func @extract_strided_metadata_of_get_global_with_offset()
     -> (memref<i32>, index, index, index, index, index) {
 
-  %A = memref.get_global @const_i32 : memref<512x384xi32, strided<[384, 1], offset: 20>>
+  %A = memref.get_global @const_i32 : memref<512x384xi32, strided<[384, 1]>>
 
   %base, %offset, %sizes:2, %strides:2 = memref.extract_strided_metadata %A :
-    memref<512x384xi32, strided<[384, 1], offset: 20>>
+    memref<512x384xi32, strided<[384, 1]>>
     -> memref<i32>, index, index, index, index, index
 
   return %base, %offset, %sizes#0, %sizes#1, %strides#0, %strides#1 :
diff --git a/mlir/test/Dialect/MemRef/extract-address-computations.mlir b/mlir/test/Dialect/MemRef/extract-address-computations.mlir
index eec3d5c62983b..5818ea4ada895 100644
--- a/mlir/test/Dialect/MemRef/extract-address-computations.mlir
+++ b/mlir/test/Dialect/MemRef/extract-address-computations.mlir
@@ -9,8 +9,8 @@
 // CHECK-SAME: %[[BASE:[^:]*]]: memref{{[^,]*}},
 // CHECK-SAME: %[[DYN_OFFSET:.*]]: index)
 // CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index
-// CHECK-DAG: %[[SUBVIEW:.*]] = memref.subview %[[BASE]][%[[DYN_OFFSET]], 0, 8] [1, 1, 1] [1, 1, 1] : memref<2x16x16xf32> to memref<1x1x1xf32, strided<[256, 16, 1], offset: ?>>
-// CHECK: %[[LOADED_VAL:.*]] = memref.load %[[SUBVIEW]][%[[C0]], %[[C0]], %[[C0]]] : memref<1x1x1xf32, strided<[256, 16, 1], offset: ?>>
+// CHECK-DAG: %[[SUBVIEW:.*]] = memref.subview %[[BASE]][%[[DYN_OFFSET]], 0, 8] [1, 1, 1] [1, 1, 1] : memref<2x16x16xf32> to memref<1x1x1xf32, strided<[256, 16, 1]>>
+// CHECK: %[[LOADED_VAL:.*]] = memref.load %[[SUBVIEW]][%[[C0]], %[[C0]], %[[C0]]] : memref<1x1x1xf32, strided<[256, 16, 1]>>
 // CHECK: return %[[LOADED_VAL]] : f32
 
 // expected-remark @below {{transformed}}
@@ -41,8 +41,8 @@ module attributes {transform.with_named_sequence} {
 // CHECK-SAME: %[[BASE:[^:]*]]: memref{{[^,]*}},
 // CHECK-SAME: %[[DYN_OFFSET:.*]]: index)
 // CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index
-// CHECK-DAG: %[[SUBVIEW:.*]] = memref.subview %[[BASE]][%[[DYN_OFFSET]], 0, 8] [1, 1, 1] [1, 1, 1] : memref<2x16x16xf32> to memref<1x1x1xf32, strided<[256, 16, 1], offset: ?>>
-// CHECK: %[[LOADED_VAL:.*]] = memref.load %[[SUBVIEW]][%[[C0]], %[[C0]], %[[C0]]] {nontemporal = true} : memref<1x1x1xf32, strided<[256, 16, 1], offset: ?>>
+// CHECK-DAG: %[[SUBVIEW:.*]] = memref.subview %[[BASE]][%[[DYN_OFFSET]], 0, 8] [1, 1, 1] [1, 1, 1] : memref<2x16x16xf32> to memref<1x1x1xf32, strided<[256, 16, 1]>>
+// CHECK: %[[LOADED_VAL:.*]] = memref.load %[[SUBVIEW]][%[[C0]], %[[C0]], %[[C0]]] {nontemporal = true} : memref<1x1x1xf32, strided<[256, 16, 1]>>
 // CHECK: return %[[LOADED_VAL]] : f32
 func.func @test_load_nontemporal(%base : memref<2x16x16xf32>, %offset : index) -> f32 {
   %c0 = arith.constant 0 : index
@@ -73,8 +73,8 @@ module attributes {transform.with_named_sequence} {
 // CHECK-SAME: %[[DYN_OFFSET:.*]]: index)
 // CHECK-DAG: %[[CF0:.*]] = arith.constant 0.0{{0*e\+00}} : f32
 // CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index
-// CHECK-DAG: %[[SUBVIEW:.*]] = memref.subview %[[BASE]][%[[DYN_OFFSET]], 0, 8] [1, 1, 1] [1, 1, 1] : memref<2x16x16xf32> to memref<1x1x1xf32, strided<[256, 16, 1], offset: ?>>
-// CHECK: memref.store %[[CF0]], %[[SUBVIEW]][%[[C0]], %[[C0]], %[[C0]]] : memref<1x1x1xf32, strided<[256, 16, 1], offset: ?>>
+// CHECK-DAG: %[[SUBVIEW:.*]] = memref.subview %[[BASE]][%[[DYN_OFFSET]], 0, 8] [1, 1, 1] [1, 1, 1] : memref<2x16x16xf32> to memref<1x1x1xf32, strided<[256, 16, 1]>>
+// CHECK: memref.store %[[CF0]], %[[SUBVIEW]][%[[C0]], %[[C0]], %[[C0]]] : memref<1x1x1xf32, strided<[256, 16, 1]>>
 // CHECK: return
 func.func @test_store(%base : memref<2x16x16xf32>, %offset : index) -> () {
   %cf0 = arith.constant 0.0 : f32
@@ -103,8 +103,8 @@ module attributes {transform.with_named_sequence} {
 // CHECK-SAME: %[[DYN_OFFSET:.*]]: index)
 // CHECK-DAG: %[[CF0:.*]] = arith.constant 0.0{{0*e\+00}} : f32
 // CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index
-// CHECK-DAG: %[[SUBVIEW:.*]] = memref.subview %[[BASE]][%[[DYN_OFFSET]], 0, 8] [1, 1, 1] [1, 1, 1] : memref<2x16x16xf32> to memref<1x1x1xf32, strided<[256, 16, 1], offset: ?>>
-// CHECK: memref.store %[[CF0]], %[[SUBVIEW]][%[[C0]], %[[C0]], %[[C0]]] {nontemporal = true} : memref<1x1x1xf32, strided<[256, 16, 1], offset: ?>>
+// CHECK-DAG: %[[SUBVIEW:.*]] = memref.subview %[[BASE]][%[[DYN_OFFSET]], 0, 8] [1, 1, 1] [1, 1, 1] : memref<2x16x16xf32> to memref<1x1x1xf32, strided<[256, 16, 1]>>
+// CHECK: memref.store %[[CF0]], %[[SUBVIEW]][%[[C0]], %[[C0]], %[[C0]]] {nontemporal = true} : memref<1x1x1xf32, strided<[256, 16, 1]>>
 // CHECK: return
 func.func @test_store_nontemporal(%base : memref<2x16x16xf32>, %offset : index) -> () {
   %cf0 = arith.constant 0.0 : f32
@@ -140,8 +140,8 @@ module attributes {transform.with_named_sequence} {
 // CHECK:  %[[SUM_RES2:.*]] = scf.for %[[IV2:.*]] = %[[C0]] to %[[UPPER_BOUND2]] step %[[C1]] iter_args(%[[SUM_ITER2:.*]] = %[[SUM_ALL]]) -> (f32) {
 // CHECK:    %[[SUM_RES1:.*]] = scf.for %[[IV1:.*]] = %[[C0]] to %[[UPPER_BOUND1]] step %[[C1]] iter_args(%[[SUM_ITER1:.*]] = %[[SUM_ITER2]]) -> (f32) {
 // CHECK:      %[[SUM_RES0:.*]] = scf.for %[[IV0:.*]] = %[[C0]] to %[[UPPER_BOUND0]] step %[[C1]] iter_args(%[[SUM_ITER0:.*]] = %[[SUM_ITER1]]) -> (f32) {
-// CHECK:        %[[SUBVIEW:.*]] = memref.subview %[[BASE]][%[[IV0]], %[[IV1]], %[[IV2]]] [1, 1, 1] [1, 1, 1] : memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>> to memref<1x1x1xf32, strided<[?, ?, ?], offset: ?>>
-// CHECK:        %[[LOADED_VAL:.*]] = memref.load %[[SUBVIEW]][%[[C0]], %[[C0]], %[[C0]]] : memref<1x1x1xf32, strided<[?, ?, ?], offset: ?>>
+// CHECK:        %[[SUBVIEW:.*]] = memref.subview %[[BASE]][%[[IV0]], %[[IV1]], %[[IV2]]] [1, 1, 1] [1, 1, 1] : memref<?x?x?xf32, strided<[?, ?, ?]>> to memref<1x1x1xf32, strided<[?, ?, ?]>>
+// CHECK:        %[[LOADED_VAL:.*]] = memref.load %[[SUBVIEW]][%[[C0]], %[[C0]], %[[C0]]] : memref<1x1x1xf32, strided<[?, ?, ?]>>
 // CHECK:        %[[RES:.*]] = arith.addf %[[LOADED_VAL]], %[[SUM_ITER2]] : f32
 // CHECK:        scf.yield %[[RES]] : f32
 // CHECK:      }
@@ -150,18 +150,18 @@ module attributes {transform.with_named_sequence} {
 // CHECK:    scf.yield %[[SUM_RES1]] : f32
 // CHECK:  }
 // CHECK:  return %[[SUM_RES2]] : f32
-func.func @testWithLoop(%base : memref<?x?x?xf32, strided<[?,?,?], offset: ?>>) -> f32 {
+func.func @testWithLoop(%base : memref<?x?x?xf32, strided<[?,?,?]>>) -> f32 {
   %sum_all = arith.constant 0.0 : f32
   %c0 = arith.constant 0 : index
   %c1 = arith.constant 1 : index
   %c2 = arith.constant 2 : index
-  %upper_bound0 = memref.dim %base, %c0 : memref<?x?x?xf32, strided<[?,?,?], offset: ?>>
-  %upper_bound1 = memref.dim %base, %c1 : memref<?x?x?xf32, strided<[?,?,?], offset: ?>>
-  %upper_bound2 = memref.dim %base, %c2 : memref<?x?x?xf32, strided<[?,?,?], offset: ?>>
+  %upper_bound0 = memref.dim %base, %c0 : memref<?x?x?xf32, strided<[?,?,?]>>
+  %upper_bound1 = memref.dim %base, %c1 : memref<?x?x?xf32, strided<[?,?,?]>>
+  %upper_bound2 = memref.dim %base, %c2 : memref<?x?x?xf32, strided<[?,?,?]>>
   %sum_res2 = scf.for %iv2 = %c0 to %upper_bound2 step %c1 iter_args(%sum_iter2 = %sum_all) -> (f32) {
     %sum_res1 = scf.for %iv1 = %c0 to %upper_bound1 step %c1 iter_args(%sum_iter1 = %sum_iter2) -> (f32) {
       %sum_res0 = scf.for %iv0 = %c0 to %upper_bound0 step %c1 iter_args(%sum_iter0 = %sum_iter1) -> (f32) {
-        %loaded_val = memref.load %base[%iv0, %iv1, %iv2] : memref<?x?x?xf32, strided<[?,?,?], offset: ?>>
+        %loaded_val = memref.load %base[%iv0, %iv1, %iv2] : memref<?x?x?xf32, strided<[?,?,?]>>
         %res = arith.addf %loaded_val, %sum_iter2 : f32
         scf.yield %res : f32
       }
@@ -201,8 +201,8 @@ module attributes {transform.with_named_sequence} {
 // CHECK-DAG: %[[DYN_SIZE1:.*]] = affine.apply #[[$THIRTY_TWO_MINUS_OFF_MAP]]()[%[[DYN_OFFSET1]]]
 // CHECK-DAG: %[[DYN_SIZE2:.*]] = affine.apply #[[$THIRTY_TWO_MINUS_OFF_MAP]]()[%[[DYN_OFFSET2]]]
 // CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index
-// CHECK-DAG: %[[SUBVIEW:.*]] = memref.subview %[[BASE]][%[[DYN_OFFSET0]], %[[DYN_OFFSET1]], %[[DYN_OFFSET2]]] [%[[DYN_SIZE0]], %[[DYN_SIZE1]], %[[DYN_SIZE2]]] [1, 1, 1] : memref<4x32x32xf16, 3> to memref<?x?x?xf16, strided<[1024, 32, 1], offset: ?>, 3>
-// CHECK: %[[LOADED_VAL:.*]] = nvgpu.ldmatrix %[[SUBVIEW]][%[[C0]], %[[C0]], %[[C0]]] {numTiles = 4 : i32, transpose = false} : memref<?x?x?xf16, strided<[1024, 32, 1], offset: ?>, 3> -> vector<4x2xf16>
+// CHECK-DAG: %[[SUBVIEW:.*]] = memref.subview %[[BASE]][%[[DYN_OFFSET0]], %[[DYN_OFFSET1]], %[[DYN_OFFSET2]]] [%[[DYN_SIZE0]], %[[DYN_SIZE1]], %[[DYN_SIZE2]]] [1, 1, 1] : memref<4x32x32xf16, 3> to memref<?x?x?xf16, strided<[1024, 32, 1]>, 3>
+// CHECK: %[[LOADED_VAL:.*]] = nvgpu.ldmatrix %[[SUBVIEW]][%[[C0]], %[[C0]], %[[C0]]] {numTiles = 4 : i32, transpose = false} : memref<?x?x?xf16, strided<[1024, 32, 1]>, 3> -> vector<4x2xf16>
 // CHECK: return %[[LOADED_VAL]] : vector<4x2xf16>
 func.func @test_ldmatrix(%base : memref<4x32x32xf16, 3>,
     %offset0 : index, %offset1: index, %offset2: index)
@@ -239,8 +239,8 @@ module attributes {transform.with_named_sequence} {
 // CHECK-DAG: %[[DYN_SIZE1:.*]] = affine.apply #[[$A_MINUS_B_MAP]]()[%[[DYN_SIZES]]#1, %[[DYN_OFFSET1]]]
 // CHECK-DAG: %[[DYN_SIZE2:.*]] = affine.apply #[[$A_MINUS_B_MAP]]()[%[[DYN_SIZES]]#2, %[[DYN_OFFSET2]]]
 // CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index
-// CHECK-DAG: %[[SUBVIEW:.*]] = memref.subview %[[BASE]][%[[DYN_OFFSET0]], %[[DYN_OFFSET1]], %[[DYN_OFFSET2]]] [%[[DYN_SIZE0]], %[[DYN_SIZE1]], %[[DYN_SIZE2]]] [1, 1, 1] : memref<?x?x?xf16, 3> to memref<?x?x?xf16, strided<[?, ?, 1], offset: ?>, 3>
-// CHECK: %[[LOADED_VAL:.*]] = nvgpu.ldmatrix %[[SUBVIEW]][%[[C0]], %[[C0]], %[[C0]]] {numTiles = 4 : i32, transpose = false} : memref<?x?x?xf16, strided<[?, ?, 1], offset: ?>, 3> -> vector<4x2xf16>
+// CHECK-DAG: %[[SUBVIEW:.*]] = memref.subview %[[BASE]][%[[DYN_OFFSET0]], %[[DYN_OFFSET1]], %[[DYN_OFFSET2]]] [%[[DYN_SIZE0]], %[[DYN_SIZE1]], %[[DYN_SIZE2]]] [1, 1, 1] : memref<?x?x?xf16, 3> to memref<?x?x?xf16, strided<[?, ?, 1]>, 3>
+// CHECK: %[[LOADED_VAL:.*]] = nvgpu.ldmatrix %[[SUBVIEW]][%[[C0]], %[[C0]], %[[C0]]] {numTiles = 4 : i32, transpose = false} : memref<?x?x?xf16, strided<[?, ?, 1]>, 3> -> vector<4x2xf16>
 // CHECK: return %[[LOADED_VAL]] : vector<4x2xf16>
 func.func @test_ldmatrix(%base : memref<?x?x?xf16, 3>,
     %offset0 : index, %offset1: index, %offset2: index)
@@ -280,8 +280,8 @@ module attributes {transform.with_named_sequence} {
 // CHECK-DAG: %[[DYN_SIZE2:.*]] = affine.apply #[[$A_MINUS_B_MAP]]()[%[[DYN_SIZES]]#2, %[[DYN_OFFSET2]]]
 // CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index
 // CHECK-DAG: %[[CF0:.*]] = arith.constant 0.0{{0*e\+00}} : f16
-// CHECK-DAG: %[[SUBVIEW:.*]] = memref.subview %[[BASE]][%[[DYN_OFFSET0]], %[[DYN_OFFSET1]], %[[DYN_OFFSET2]]] [%[[DYN_SIZE0]], %[[DYN_SIZE1]], %[[DYN_SIZE2]]] [1, 1, 1] : memref<?x?x?xf16> to memref<?x?x?xf16, strided<[?, ?, 1], offset: ?>>
-// CHECK: %[[LOADED_VAL:.*]] = vector.transfer_read %[[SUBVIEW]][%[[C0]], %[[C0]], %[[C0]]], %[[CF0]] {permutation_map = #[[$PERMUTATION_MAP]]} : memref<?x?x?xf16, strided<[?, ?, 1], offset: ?>>, vector<4x2xf16>
+// CHECK-DAG: %[[SUBVIEW:.*]] = memref.subview %[[BASE]][%[[DYN_OFFSET0]], %[[DYN_OFFSET1]], %[[DYN_OFFSET2]]] [%[[DYN_SIZE0]], %[[DYN_SIZE1]], %[[DYN_SIZE2]]] [1, 1, 1] : memref<?x?x?xf16> to memref<?x?x?xf16, strided<[?, ?, 1]>>
+// CHECK: %[[LOADED_VAL:.*]] = vector.transfer_read %[[SUBVIEW]][%[[C0]], %[[C0]], %[[C0]]], %[[CF0]] {permutation_map = #[[$PERMUTATION_MAP]]} : memref<?x?x?xf16, strided<[?, ?, 1]>>, vector<4x2xf16>
 // CHECK: return %[[LOADED_VAL]] : vector<4x2xf16>
 func.func @test_transfer_read_op(%base : memref<?x?x?xf16>,
     %offset0 : index, %offset1: index, %offset2: index)
@@ -351,8 +351,8 @@ module attributes {transform.with_named_sequence} {
 // CHECK-DAG: %[[DYN_SIZE2:.*]] = affine.apply #[[$A_MINUS_B_MAP]]()[%[[DYN_SIZES]]#2, %[[DYN_OFFSET2]]]
 // CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index
 // CHECK-DAG: %[[VCF0:.*]] = arith.constant dense<0.0{{0*e\+00}}> : vector<4x2xf16>
-// CHECK-DAG: %[[SUBVIEW:.*]] = memref.subview %[[BASE]][%[[DYN_OFFSET0]], %[[DYN_OFFSET1]], %[[DYN_OFFSET2]]] [%[[DYN_SIZE0]], %[[DYN_SIZE1]], %[[DYN_SIZE2]]] [1, 1, 1] : memref<?x?x?xf16> to memref<?x?x?xf16, strided<[?, ?, 1], offset: ?>>
-// CHECK: vector.transfer_write %[[VCF0]], %[[SUBVIEW]][%[[C0]], %[[C0]], %[[C0]]] {permutation_map = #[[$PERMUTATION_MAP]]} : vector<4x2xf16>, memref<?x?x?xf16, strided<[?, ?, 1], offset: ?>>
+// CHECK-DAG: %[[SUBVIEW:.*]] = memref.subview %[[BASE]][%[[DYN_OFFSET0]], %[[DYN_OFFSET1]], %[[DYN_OFFSET2]]] [%[[DYN_SIZE0]], %[[DYN_SIZE1]], %[[DYN_SIZE2]]] [1, 1, 1] : memref<?x?x?xf16> to memref<?x?x?xf16, strided<[?, ?, 1]>>
+// CHECK: vector.transfer_write %[[VCF0]], %[[SUBVIEW]][%[[C0]], %[[C0]], %[[C0]]] {permutation_map = #[[$PERMUTATION_MAP]]} : vector<4x2xf16>, memref<?x?x?xf16, strided<[?, ?, 1]>>
 // CHECK: return
 func.func @test_transfer_write_op(%base : memref<?x?x?xf16>,
     %offset0 : index, %offset1: index, %offset2: index) {
@@ -390,13 +390,13 @@ module attributes {transform.with_named_sequence} {
 // CHECK-DAG: %[[DYN_SIZE2:.*]] = affine.apply #[[$A_MINUS_B_MAP]]()[%[[DYN_SIZES]]#2, %[[DYN_OFFSET2]]]
 // CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index
 // CHECK-DAG: %[[VCF0:.*]] = arith.constant dense<0.0{{0*e\+00}}> : vector<4x2xf16>
-// CHECK-DAG: %[[SUBVIEW:.*]] = memref.subview %[[BASE]][%[[DYN_OFFSET0]], %[[DYN_OFFSET1]], %[[DYN_OFFSET2]]] [%[[DYN_SIZE0]], %[[DYN_SIZE1]], %[[DYN_SIZE2]]] [1, 1, 1] : memref<?x?x?xf16, strided<[329, 26, 12], offset: ?>> to memref<?x?x?xf16, strided<[329, 26, 12], offset: ?>>
-// CHECK: vector.transfer_write %[[VCF0]], %[[SUBVIEW]][%[[C0]], %[[C0]], %[[C0]]] {permutation_map = #[[$PERMUTATION_MAP]]} : vector<4x2xf16>, memref<?x?x?xf16, strided<[329, 26, 12], offset: ?>>
+// CHECK-DAG: %[[SUBVIEW:.*]] = memref.subview %[[BASE]][%[[DYN_OFFSET0]], %[[DYN_OFFSET1]], %[[DYN_OFFSET2]]] [%[[DYN_SIZE0]], %[[DYN_SIZE1]], %[[DYN_SIZE2]]] [1, 1, 1] : memref<?x?x?xf16, strided<[329, 26, 12]>> to memref<?x?x?xf16, strided<[329, 26, 12]>>
+// CHECK: vector.transfer_write %[[VCF0]], %[[SUBVIEW]][%[[C0]], %[[C0]], %[[C0]]] {permutation_map = #[[$PERMUTATION_MAP]]} : vector<4x2xf16>, memref<?x?x?xf16, strided<[329, 26, 12]>>
 // CHECK: return
-func.func @test_transfer_write_op_with_strides(%base : memref<?x?x?xf16, strided<[329, 26, 12], offset: ?>>,
+func.func @test_transfer_write_op_with_strides(%base : memref<?x?x?xf16, strided<[329, 26, 12]>>,
     %offset0 : index, %offset1: index, %offset2: index) {
   %vcf0 = arith.constant dense<0.000000e+00> : vector<4x2xf16>
-  vector.transfer_write %vcf0, %base[%offset0, %offset1, %offset2] { permutation_map = affine_map<(d0,d1,d2) -> (d2,d0)> } : vector<4x2xf16>, memref<?x?x?xf16, strided<[329, 26, 12], offset: ?>>
+  vector.transfer_write %vcf0, %base[%offset0, %offset1, %offset2] { permutation_map = affine_map<(d0,d1,d2) -> (d2,d0)> } : vector<4x2xf16>, memref<?x?x?xf16, strided<[329, 26, 12]>>
   return
 }
 
diff --git a/mlir/test/Dialect/MemRef/flatten_memref.mlir b/mlir/test/Dialect/MemRef/flatten_memref.mlir
index c9166b11c8d13..6325d07ad642f 100644
--- a/mlir/test/Dialect/MemRef/flatten_memref.mlir
+++ b/mlir/test/Dialect/MemRef/flatten_memref.mlir
@@ -1,73 +1,73 @@
 // RUN: mlir-opt --flatten-memref %s --split-input-file --verify-diagnostics | FileCheck %s
 
-func.func @load_scalar_from_memref(%input: memref<4x8xf32, strided<[8, 1], offset: 100>>) -> f32 {
+func.func @load_scalar_from_memref(%input: memref<4x8xf32, strided<[8, 1]>>) -> f32 {
   %c1 = arith.constant 1 : index
   %c2 = arith.constant 2 : index
-  %value = memref.load %input[%c1, %c2] : memref<4x8xf32, strided<[8, 1], offset: 100>>
+  %value = memref.load %input[%c1, %c2] : memref<4x8xf32, strided<[8, 1]>>
   return %value : f32
 }
 // CHECK-LABEL: func @load_scalar_from_memref
 // CHECK-NEXT: %[[C10:.*]] = arith.constant 10 : index
 // CHECK-NEXT: %[[REINT:.*]] = memref.reinterpret_cast %arg0 to offset: [100], sizes: [32], strides: [1]
-// CHECK-SAME: memref<4x8xf32, strided<[8, 1], offset: 100>> to memref<32xf32, strided<[1], offset: 100>>
-// CHECK-NEXT: memref.load %[[REINT]][%[[C10]]] : memref<32xf32, strided<[1], offset: 100>>
+// CHECK-SAME: memref<4x8xf32, strided<[8, 1]>> to memref<32xf32, strided<[1]>>
+// CHECK-NEXT: memref.load %[[REINT]][%[[C10]]] : memref<32xf32, strided<[1]>>
 
 
 // -----
 
-func.func @load_scalar_from_memref_dynamic_dim(%input: memref<?x?xf32, strided<[?, ?], offset: ?>>, %row: index, %col: index) -> f32 {
-  %value = memref.load %input[%col, %row] : memref<?x?xf32, strided<[?, ?], offset: ?>>
+func.func @load_scalar_from_memref_dynamic_dim(%input: memref<?x?xf32, strided<[?, ?]>>, %row: index, %col: index) -> f32 {
+  %value = memref.load %input[%col, %row] : memref<?x?xf32, strided<[?, ?]>>
   return %value : f32
 }
 
 // CHECK: #[[MAP:.*]] = affine_map<()[s0, s1, s2, s3] -> (s0 * s1 + s2 * s3)>
 // CHECK: #[[MAP1:.*]] = affine_map<()[s0, s1, s2, s3] -> (s0 * s1, s2 * s3)>
 // CHECK: func @load_scalar_from_memref_dynamic_dim
-// CHECK-SAME: (%[[ARG0:.*]]: memref<?x?xf32, strided<[?, ?], offset: ?>>, %[[ARG1:.*]]: index, %[[ARG2:.*]]: index)
+// CHECK-SAME: (%[[ARG0:.*]]: memref<?x?xf32, strided<[?, ?]>>, %[[ARG1:.*]]: index, %[[ARG2:.*]]: index)
 // CHECK: %[[BASE:.*]], %[[OFFSET:.*]], %[[SIZES:.*]]:2, %[[STRIDES:.*]]:2 = memref.extract_strided_metadata %[[ARG0]]
 // CHECK: %[[IDX:.*]] = affine.apply #[[MAP]]()[%[[ARG2]], %[[STRIDES]]#0, %[[ARG1]], %[[STRIDES]]#1]
 // CHECK: %[[SIZE:.*]] = affine.max #[[MAP1]]()[%[[STRIDES]]#0, %[[SIZES]]#0, %[[STRIDES]]#1, %[[SIZES]]#1]
-// CHECK: %[[REINT:.*]] = memref.reinterpret_cast %arg0 to offset: [%[[OFFSET]]], sizes: [%[[SIZE]]], strides: [1] : memref<?x?xf32, strided<[?, ?], offset: ?>> to memref<?xf32, strided<[1], offset: ?>> 
+// CHECK: %[[REINT:.*]] = memref.reinterpret_cast %arg0 to offset: [%[[OFFSET]]], sizes: [%[[SIZE]]], strides: [1] : memref<?x?xf32, strided<[?, ?]>> to memref<?xf32, strided<[1]>> 
 // CHECK: memref.load %[[REINT]][%[[IDX]]]
 
 // -----
 
-func.func @load_scalar_from_memref_static_dim(%input: memref<8x12xf32, strided<[24, 2], offset: 100>>) -> f32 {
+func.func @load_scalar_from_memref_static_dim(%input: memref<8x12xf32, strided<[24, 2]>>) -> f32 {
    %c7 = arith.constant 7 : index
    %c10 = arith.constant 10 : index
-  %value = memref.load %input[%c7, %c10] : memref<8x12xf32, strided<[24, 2], offset: 100>>
+  %value = memref.load %input[%c7, %c10] : memref<8x12xf32, strided<[24, 2]>>
   return %value : f32
 }
 
 // CHECK-LABEL: func @load_scalar_from_memref_static_dim
-// CHECK-SAME: (%[[ARG0:.*]]: memref<8x12xf32, strided<[24, 2], offset: 100>>)
+// CHECK-SAME: (%[[ARG0:.*]]: memref<8x12xf32, strided<[24, 2]>>)
 // CHECK: %[[C188:.*]] = arith.constant 188 : index
-// CHECK: %[[REINT:.*]] = memref.reinterpret_cast %[[ARG0]] to offset: [100], sizes: [192], strides: [1] : memref<8x12xf32, strided<[24, 2], offset: 100>> to memref<192xf32, strided<[1], offset: 100>>
-// CHECK: memref.load %[[REINT]][%[[C188]]] : memref<192xf32, strided<[1], offset: 100>>
+// CHECK: %[[REINT:.*]] = memref.reinterpret_cast %[[ARG0]] to offset: [100], sizes: [192], strides: [1] : memref<8x12xf32, strided<[24, 2]>> to memref<192xf32, strided<[1]>>
+// CHECK: memref.load %[[REINT]][%[[C188]]] : memref<192xf32, strided<[1]>>
 
 // -----
 
-func.func @store_scalar_from_memref_padded(%input: memref<4x8xf32, strided<[18, 2], offset: 100>>, %row: index, %col: index, %value: f32) {
-  memref.store %value, %input[%col, %row] : memref<4x8xf32, strided<[18, 2], offset: 100>>
+func.func @store_scalar_from_memref_padded(%input: memref<4x8xf32, strided<[18, 2]>>, %row: index, %col: index, %value: f32) {
+  memref.store %value, %input[%col, %row] : memref<4x8xf32, strided<[18, 2]>>
   return
 }
 // CHECK: #[[MAP:.*]] = affine_map<()[s0, s1] -> (s0 * 18 + s1 * 2)>
 // CHECK: func @store_scalar_from_memref_padded
-// CHECK-SAME: (%[[ARG0:.*]]: memref<4x8xf32, strided<[18, 2], offset: 100>>, %[[ARG1:.*]]: index, %[[ARG2:.*]]: index, %[[ARG3:.*]]: f32)
+// CHECK-SAME: (%[[ARG0:.*]]: memref<4x8xf32, strided<[18, 2]>>, %[[ARG1:.*]]: index, %[[ARG2:.*]]: index, %[[ARG3:.*]]: f32)
 // CHECK: %[[IDX:.*]] = affine.apply #[[MAP]]()[%[[ARG2]], %[[ARG1]]]
 // CHECK: %[[REINT:.*]] = memref.reinterpret_cast %[[ARG0]]
-// CHECK: memref.store %[[ARG3]], %[[REINT]][%[[IDX]]] : memref<72xf32, strided<[1], offset: 100>>
+// CHECK: memref.store %[[ARG3]], %[[REINT]][%[[IDX]]] : memref<72xf32, strided<[1]>>
 
 // -----
 
-func.func @store_scalar_from_memref_dynamic_dim(%input: memref<?x?xf32, strided<[?, ?], offset: ?>>, %row: index, %col: index, %value: f32) {
-  memref.store %value, %input[%col, %row] : memref<?x?xf32, strided<[?, ?], offset: ?>>
+func.func @store_scalar_from_memref_dynamic_dim(%input: memref<?x?xf32, strided<[?, ?]>>, %row: index, %col: index, %value: f32) {
+  memref.store %value, %input[%col, %row] : memref<?x?xf32, strided<[?, ?]>>
   return
 }
 // CHECK: #[[MAP:.*]] = affine_map<()[s0, s1, s2, s3] -> (s0 * s1 + s2 * s3)>
 // CHECK: #[[MAP1:.*]] = affine_map<()[s0, s1, s2, s3] -> (s0 * s1, s2 * s3)>
 // CHECK: func @store_scalar_from_memref_dynamic_dim
-// CHECK-SAME: (%[[ARG0:.*]]: memref<?x?xf32, strided<[?, ?], offset: ?>>, %[[ARG1:.*]]: index, %[[ARG2:.*]]: index, %[[ARG3:.*]]: f32)
+// CHECK-SAME: (%[[ARG0:.*]]: memref<?x?xf32, strided<[?, ?]>>, %[[ARG1:.*]]: index, %[[ARG2:.*]]: index, %[[ARG3:.*]]: f32)
 // CHECK: %[[BASE:.*]], %[[OFFSET:.*]], %[[SIZES:.*]]:2, %[[STRIDES:.*]]:2 = memref.extract_strided_metadata %[[ARG0]]
 // CHECK: %[[IDX:.*]] = affine.apply #[[MAP]]()[%[[ARG2]], %[[STRIDES]]#0, %[[ARG1]], %[[STRIDES]]#1]
 // CHECK: %[[SIZE:.*]] = affine.max #[[MAP1]]()[%[[STRIDES]]#0, %[[SIZES]]#0, %[[STRIDES]]#1, %[[SIZES]]#1]
@@ -309,14 +309,14 @@ func.func @flatten_alloc_strided_row_major() -> memref<4x8xf32, strided<[8, 1]>>
 
 // Non-zero static offset: the flat allocation covers [0, offset+extent) = [0, 82)
 // and the reinterpret_cast restores the original offset in the result type.
-func.func @flatten_alloc_strided_offset() -> memref<4x8xf32, strided<[8, 1], offset: 50>> {
-  %0 = memref.alloc() : memref<4x8xf32, strided<[8, 1], offset: 50>>
-  return %0 : memref<4x8xf32, strided<[8, 1], offset: 50>>
+func.func @flatten_alloc_strided_offset() -> memref<4x8xf32, strided<[8, 1]>> {
+  %0 = memref.alloc() : memref<4x8xf32, strided<[8, 1]>>
+  return %0 : memref<4x8xf32, strided<[8, 1]>>
 }
 
 // CHECK-LABEL: func @flatten_alloc_strided_offset
 // CHECK: %[[ALLOC:.*]] = memref.alloc() : memref<82xf32, strided<[1]>>
-// CHECK: memref.reinterpret_cast %[[ALLOC]] to offset: [50], sizes: [4, 8], strides: [8, 1] : memref<82xf32, strided<[1]>> to memref<4x8xf32, strided<[8, 1], offset: 50>>
+// CHECK: memref.reinterpret_cast %[[ALLOC]] to offset: [50], sizes: [4, 8], strides: [8, 1] : memref<82xf32, strided<[1]>> to memref<4x8xf32, strided<[8, 1]>>
 
 // -----
 
@@ -360,14 +360,14 @@ func.func @chained_alloc_load() -> vector<8xf32> {
 
 // -----
 
-func.func @load_scalar_from_memref_static_dim_col_major(%input: memref<4x8xf32, strided<[1, 4], offset: 100>>, %row: index, %col: index) -> f32 {
-  %value = memref.load %input[%col, %row] : memref<4x8xf32, strided<[1, 4], offset: 100>>
+func.func @load_scalar_from_memref_static_dim_col_major(%input: memref<4x8xf32, strided<[1, 4]>>, %row: index, %col: index) -> f32 {
+  %value = memref.load %input[%col, %row] : memref<4x8xf32, strided<[1, 4]>>
   return %value : f32
 }
 
 // CHECK: #[[MAP:.*]] = affine_map<()[s0, s1] -> (s0 + s1 * 4)>
 // CHECK: func @load_scalar_from_memref_static_dim_col_major
-// CHECK-SAME: (%[[ARG0:.*]]: memref<4x8xf32, strided<[1, 4], offset: 100>>, %[[ARG1:.*]]: index, %[[ARG2:.*]]: index)
+// CHECK-SAME: (%[[ARG0:.*]]: memref<4x8xf32, strided<[1, 4]>>, %[[ARG1:.*]]: index, %[[ARG2:.*]]: index)
 // CHECK: %[[IDX:.*]] = affine.apply #[[MAP]]()[%[[ARG2]], %[[ARG1]]]
-// CHECK: %[[REINT:.*]] = memref.reinterpret_cast %[[ARG0]] to offset: [100], sizes: [32], strides: [1] : memref<4x8xf32, strided<[1, 4], offset: 100>> to memref<32xf32, strided<[1], offset: 100>>
-// CHECK: memref.load %[[REINT]][%[[IDX]]] : memref<32xf32, strided<[1], offset: 100>>
+// CHECK: %[[REINT:.*]] = memref.reinterpret_cast %[[ARG0]] to offset: [100], sizes: [32], strides: [1] : memref<4x8xf32, strided<[1, 4]>> to memref<32xf32, strided<[1]>>
+// CHECK: memref.load %[[REINT]][%[[IDX]]] : memref<32xf32, strided<[1]>>
diff --git a/mlir/test/Dialect/MemRef/fold-memref-alias-ops.mlir b/mlir/test/Dialect/MemRef/fold-memref-alias-ops.mlir
index 114ba86cda718..de3fc9b2499b5 100644
--- a/mlir/test/Dialect/MemRef/fold-memref-alias-ops.mlir
+++ b/mlir/test/Dialect/MemRef/fold-memref-alias-ops.mlir
@@ -1,8 +1,8 @@
 // RUN: mlir-opt -fold-memref-alias-ops -split-input-file %s | FileCheck %s
 
 func.func @fold_static_stride_subview_with_load(%arg0 : memref<12x32xf32>, %arg1 : index, %arg2 : index, %arg3 : index, %arg4 : index) -> f32 {
-  %0 = memref.subview %arg0[%arg1, %arg2][4, 4][2, 3] : memref<12x32xf32> to memref<4x4xf32, strided<[64, 3], offset: ?>>
-  %1 = memref.load %0[%arg3, %arg4] : memref<4x4xf32, strided<[64, 3], offset: ?>>
+  %0 = memref.subview %arg0[%arg1, %arg2][4, 4][2, 3] : memref<12x32xf32> to memref<4x4xf32, strided<[64, 3]>>
+  %1 = memref.load %0[%arg3, %arg4] : memref<4x4xf32, strided<[64, 3]>>
   return %1 : f32
 }
 //  CHECK-DAG: #[[MAP0:.+]] = affine_map<()[s0, s1] -> (s0 + s1 * 2)>
@@ -21,8 +21,8 @@ func.func @fold_static_stride_subview_with_load(%arg0 : memref<12x32xf32>, %arg1
 
 func.func @fold_dynamic_stride_subview_with_load(%arg0 : memref<12x32xf32>, %arg1 : index, %arg2 : index, %arg3 : index, %arg4 : index, %arg5 : index, %arg6 : index) -> f32 {
   %0 = memref.subview %arg0[%arg1, %arg2][4, 4][%arg5, %arg6] :
-    memref<12x32xf32> to memref<4x4xf32, strided<[?, ?], offset: ?>>
-  %1 = memref.load %0[%arg3, %arg4] : memref<4x4xf32, strided<[?, ?], offset: ?>>
+    memref<12x32xf32> to memref<4x4xf32, strided<[?, ?]>>
+  %1 = memref.load %0[%arg3, %arg4] : memref<4x4xf32, strided<[?, ?]>>
   return %1 : f32
 }
 //  CHECK-DAG: #[[MAP:.+]] = affine_map<()[s0, s1, s2] -> (s0 + s1 * s2)>
@@ -42,8 +42,8 @@ func.func @fold_dynamic_stride_subview_with_load(%arg0 : memref<12x32xf32>, %arg
 
 func.func @fold_static_stride_subview_with_store(%arg0 : memref<12x32xf32>, %arg1 : index, %arg2 : index, %arg3 : index, %arg4 : index, %arg5 : f32) {
   %0 = memref.subview %arg0[%arg1, %arg2][4, 4][2, 3] :
-    memref<12x32xf32> to memref<4x4xf32, strided<[64, 3], offset: ?>>
-  memref.store %arg5, %0[%arg3, %arg4] : memref<4x4xf32, strided<[64, 3], offset: ?>>
+    memref<12x32xf32> to memref<4x4xf32, strided<[64, 3]>>
+  memref.store %arg5, %0[%arg3, %arg4] : memref<4x4xf32, strided<[64, 3]>>
   return
 }
 //  CHECK-DAG: #[[MAP0:.+]] = affine_map<()[s0, s1] -> (s0 + s1 * 2)>
@@ -62,8 +62,8 @@ func.func @fold_static_stride_subview_with_store(%arg0 : memref<12x32xf32>, %arg
 
 func.func @fold_dynamic_stride_subview_with_store(%arg0 : memref<12x32xf32>, %arg1 : index, %arg2 : index, %arg3 : index, %arg4 : index, %arg5 : index, %arg6 : index, %arg7 : f32) {
   %0 = memref.subview %arg0[%arg1, %arg2][4, 4][%arg5, %arg6] :
-    memref<12x32xf32> to memref<4x4xf32, strided<[?, ?], offset: ?>>
-  memref.store %arg7, %0[%arg3, %arg4] : memref<4x4xf32, strided<[?, ?], offset: ?>>
+    memref<12x32xf32> to memref<4x4xf32, strided<[?, ?]>>
+  memref.store %arg7, %0[%arg3, %arg4] : memref<4x4xf32, strided<[?, ?]>>
   return
 }
 //  CHECK-DAG: #[[MAP:.+]] = affine_map<()[s0, s1, s2] -> (s0 + s1 * s2)>
@@ -85,8 +85,8 @@ func.func @fold_subview_with_transfer_read_0d(
   %arg0 : memref<12x32xf32>, %arg1 : index, %arg2 : index, %arg3 : index)
     -> vector<f32> {
   %f1 = arith.constant 1.0 : f32
-  %0 = memref.subview %arg0[%arg1, %arg2][1, 1][1, 1] : memref<12x32xf32> to memref<f32, strided<[], offset: ?>>
-  %1 = vector.transfer_read %0[], %f1 : memref<f32, strided<[], offset: ?>>, vector<f32>
+  %0 = memref.subview %arg0[%arg1, %arg2][1, 1][1, 1] : memref<12x32xf32> to memref<f32, strided<[]>>
+  %1 = vector.transfer_read %0[], %f1 : memref<f32, strided<[]>>, vector<f32>
   return %1 : vector<f32>
 }
 //      CHECK: func @fold_subview_with_transfer_read_0d
@@ -101,8 +101,8 @@ func.func @fold_subview_with_transfer_read_0d(
 func.func @fold_subview_with_transfer_read(%arg0 : memref<12x32xf32>, %arg1 : index, %arg2 : index, %arg3 : index, %arg4 : index, %arg5 : index, %arg6 : index) -> vector<4xf32> {
   %f1 = arith.constant 1.0 : f32
 
-  %0 = memref.subview %arg0[%arg1, %arg2][4, 4][%arg5, %arg6] : memref<12x32xf32> to memref<4x4xf32, strided<[?, ?], offset: ?>>
-  %1 = vector.transfer_read %0[%arg3, %arg4], %f1 {in_bounds = [true]} : memref<4x4xf32, strided<[?, ?], offset: ?>>, vector<4xf32>
+  %0 = memref.subview %arg0[%arg1, %arg2][4, 4][%arg5, %arg6] : memref<12x32xf32> to memref<4x4xf32, strided<[?, ?]>>
+  %1 = vector.transfer_read %0[%arg3, %arg4], %f1 {in_bounds = [true]} : memref<4x4xf32, strided<[?, ?]>>, vector<4xf32>
   return %1 : vector<4xf32>
 }
 //      CHECK: func @fold_subview_with_transfer_read
@@ -115,8 +115,8 @@ func.func @fold_static_stride_subview_with_transfer_write_0d(
     %arg0 : memref<12x32xf32>, %arg1 : index, %arg2 : index, %arg3 : index,
     %v : vector<f32>) {
   %f1 = arith.constant 1.0 : f32
-  %0 = memref.subview %arg0[%arg1, %arg2][1, 1][1, 1] : memref<12x32xf32> to memref<f32, strided<[], offset: ?>>
-  vector.transfer_write %v, %0[] {in_bounds = []} : vector<f32>, memref<f32, strided<[], offset: ?>>
+  %0 = memref.subview %arg0[%arg1, %arg2][1, 1][1, 1] : memref<12x32xf32> to memref<f32, strided<[]>>
+  vector.transfer_write %v, %0[] {in_bounds = []} : vector<f32>, memref<f32, strided<[]>>
   return
 }
 //      CHECK: func @fold_static_stride_subview_with_transfer_write_0d
@@ -131,8 +131,8 @@ func.func @fold_static_stride_subview_with_transfer_write_0d(
 
 func.func @fold_static_stride_subview_with_transfer_write(%arg0 : memref<12x32xf32>, %arg1 : index, %arg2 : index, %arg3 : index, %arg4 : index, %arg5: index, %arg6 : index, %arg7 : vector<4xf32>) {
   %0 = memref.subview %arg0[%arg1, %arg2][4, 4][%arg5, %arg6] :
-    memref<12x32xf32> to memref<4x4xf32, strided<[?, ?], offset: ?>>
-  vector.transfer_write %arg7, %0[%arg3, %arg4] {in_bounds = [true]} : vector<4xf32>, memref<4x4xf32, strided<[?, ?], offset: ?>>
+    memref<12x32xf32> to memref<4x4xf32, strided<[?, ?]>>
+  vector.transfer_write %arg7, %0[%arg3, %arg4] {in_bounds = [true]} : vector<4xf32>, memref<4x4xf32, strided<[?, ?]>>
   return
 }
 //      CHECK: func @fold_static_stride_subview_with_transfer_write
@@ -147,8 +147,8 @@ func.func @fold_rank_reducing_subview_with_load
      %arg7 : index, %arg8 : index, %arg9 : index, %arg10: index,
      %arg11 : index, %arg12 : index, %arg13 : index, %arg14: index,
      %arg15 : index, %arg16 : index) -> f32 {
-  %0 = memref.subview %arg0[%arg1, %arg2, %arg3, %arg4, %arg5, %arg6][4, 1, 1, 4, 1, 1][%arg7, %arg8, %arg9, %arg10, %arg11, %arg12] : memref<?x?x?x?x?x?xf32> to memref<4x1x4x1xf32, strided<[?, ?, ?, ?], offset: ?>>
-  %1 = memref.load %0[%arg13, %arg14, %arg15, %arg16] : memref<4x1x4x1xf32, strided<[?, ?, ?, ?], offset: ?>>
+  %0 = memref.subview %arg0[%arg1, %arg2, %arg3, %arg4, %arg5, %arg6][4, 1, 1, 4, 1, 1][%arg7, %arg8, %arg9, %arg10, %arg11, %arg12] : memref<?x?x?x?x?x?xf32> to memref<4x1x4x1xf32, strided<[?, ?, ?, ?]>>
+  %1 = memref.load %0[%arg13, %arg14, %arg15, %arg16] : memref<4x1x4x1xf32, strided<[?, ?, ?, ?]>>
   return %1 : f32
 }
 //  CHECK-DAG: #[[MAP:.+]] = affine_map<()[s0, s1, s2] -> (s0 + s1 * s2)>
@@ -179,17 +179,17 @@ func.func @fold_rank_reducing_subview_with_load
 // -----
 
 func.func @fold_rank_reducing_subview_1x8x1x3_to_1x8x3_drop_middle_unit_dim(
-    %arg0 : memref<?x?x?x?xf32, strided<[?, ?, ?, ?], offset: ?>>,
+    %arg0 : memref<?x?x?x?xf32, strided<[?, ?, ?, ?]>>,
     %arg1 : index, %arg2 : index, %arg3 : index, %arg4 : index) -> f32 {
   %c0 = arith.constant 0 : index
   %0 = memref.subview %arg0[0, 0, 0, 0][1, 8, 1, 3][1, 1, 1, 1]
-      : memref<?x?x?x?xf32, strided<[?, ?, ?, ?], offset: ?>> to
-        memref<1x8x3xf32, strided<[?, ?, ?], offset: ?>>
-  %1 = memref.load %0[%c0, %arg1, %arg2] : memref<1x8x3xf32, strided<[?, ?, ?], offset: ?>>
+      : memref<?x?x?x?xf32, strided<[?, ?, ?, ?]>> to
+        memref<1x8x3xf32, strided<[?, ?, ?]>>
+  %1 = memref.load %0[%c0, %arg1, %arg2] : memref<1x8x3xf32, strided<[?, ?, ?]>>
   return %1 : f32
 }
 //      CHECK: func @fold_rank_reducing_subview_1x8x1x3_to_1x8x3_drop_middle_unit_dim
-// CHECK-SAME:   %[[ARG0:[a-zA-Z0-9_]+]]: memref<?x?x?x?xf32, strided<[?, ?, ?, ?], offset: ?>>
+// CHECK-SAME:   %[[ARG0:[a-zA-Z0-9_]+]]: memref<?x?x?x?xf32, strided<[?, ?, ?, ?]>>
 // CHECK-SAME:   %[[ARG1:[a-zA-Z0-9_]+]]: index
 // CHECK-SAME:   %[[ARG2:[a-zA-Z0-9_]+]]: index
 // CHECK-SAME:   %[[ARG3:[a-zA-Z0-9_]+]]: index
@@ -200,20 +200,20 @@ func.func @fold_rank_reducing_subview_1x8x1x3_to_1x8x3_drop_middle_unit_dim(
 // -----
 
 func.func @fold_vector_transfer_read_with_rank_reduced_subview(
-    %arg0 : memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>,
+    %arg0 : memref<?x?x?xf32, strided<[?, ?, ?]>>,
     %arg1: index, %arg2 : index, %arg3 : index, %arg4: index, %arg5 : index,
     %arg6 : index) -> vector<4xf32> {
   %cst = arith.constant 0.0 : f32
   %0 = memref.subview %arg0[0, %arg1, %arg2] [1, %arg3, %arg4] [1, 1, 1]
-      : memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>> to
-        memref<?x?xf32, strided<[?, ?], offset: ?>>
+      : memref<?x?x?xf32, strided<[?, ?, ?]>> to
+        memref<?x?xf32, strided<[?, ?]>>
   %1 = vector.transfer_read %0[%arg5, %arg6], %cst {in_bounds = [true]}
-      : memref<?x?xf32, strided<[?, ?], offset: ?>>, vector<4xf32>
+      : memref<?x?xf32, strided<[?, ?]>>, vector<4xf32>
   return %1 : vector<4xf32>
 }
 //   CHECK-DAG: #[[MAP1:.+]] = affine_map<()[s0, s1] -> (s0 + s1)>
 //       CHECK: func @fold_vector_transfer_read_with_rank_reduced_subview
-//  CHECK-SAME:    %[[ARG0:[a-zA-Z0-9]+]]: memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
+//  CHECK-SAME:    %[[ARG0:[a-zA-Z0-9]+]]: memref<?x?x?xf32, strided<[?, ?, ?]>>
 //  CHECK-SAME:    %[[ARG1:[a-zA-Z0-9]+]]: index
 //  CHECK-SAME:    %[[ARG2:[a-zA-Z0-9]+]]: index
 //  CHECK-SAME:    %[[ARG3:[a-zA-Z0-9]+]]: index
@@ -228,20 +228,20 @@ func.func @fold_vector_transfer_read_with_rank_reduced_subview(
 // -----
 
 func.func @fold_vector_transfer_write_with_rank_reduced_subview(
-    %arg0 : memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>,
+    %arg0 : memref<?x?x?xf32, strided<[?, ?, ?]>>,
     %arg1 : vector<4xf32>, %arg2: index, %arg3 : index, %arg4 : index,
     %arg5: index, %arg6 : index, %arg7 : index) {
   %cst = arith.constant 0.0 : f32
   %0 = memref.subview %arg0[0, %arg2, %arg3] [1, %arg4, %arg5] [1, 1, 1]
-      : memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>> to
-        memref<?x?xf32, strided<[?, ?], offset: ?>>
+      : memref<?x?x?xf32, strided<[?, ?, ?]>> to
+        memref<?x?xf32, strided<[?, ?]>>
   vector.transfer_write %arg1, %0[%arg6, %arg7] {in_bounds = [true]}
-      : vector<4xf32>, memref<?x?xf32, strided<[?, ?], offset: ?>>
+      : vector<4xf32>, memref<?x?xf32, strided<[?, ?]>>
   return
 }
 //   CHECK-DAG: #[[MAP1:.+]] = affine_map<()[s0, s1] -> (s0 + s1)>
 //       CHECK: func @fold_vector_transfer_write_with_rank_reduced_subview
-//  CHECK-SAME:    %[[ARG0:[a-zA-Z0-9]+]]: memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
+//  CHECK-SAME:    %[[ARG0:[a-zA-Z0-9]+]]: memref<?x?x?xf32, strided<[?, ?, ?]>>
 //  CHECK-SAME:    %[[ARG1:[a-zA-Z0-9]+]]: vector<4xf32>
 //  CHECK-SAME:    %[[ARG2:[a-zA-Z0-9]+]]: index
 //  CHECK-SAME:    %[[ARG3:[a-zA-Z0-9]+]]: index
@@ -257,21 +257,21 @@ func.func @fold_vector_transfer_write_with_rank_reduced_subview(
 // -----
 
 func.func @fold_vector_transfer_write_with_inner_rank_reduced_subview(
-    %arg0 : memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>,
+    %arg0 : memref<?x?x?xf32, strided<[?, ?, ?]>>,
     %arg1 : vector<4xf32>, %arg2: index, %arg3 : index, %arg4 : index,
     %arg5: index, %arg6 : index, %arg7 : index) {
   %cst = arith.constant 0.0 : f32
   %0 = memref.subview %arg0[%arg2, %arg3, 0] [%arg4, %arg5, 1] [1, 1, 1]
-      : memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>> to
-        memref<?x?xf32, strided<[?, ?], offset: ?>>
+      : memref<?x?x?xf32, strided<[?, ?, ?]>> to
+        memref<?x?xf32, strided<[?, ?]>>
   vector.transfer_write %arg1, %0[%arg6, %arg7] {in_bounds = [true]}
-      : vector<4xf32>, memref<?x?xf32, strided<[?, ?], offset: ?>>
+      : vector<4xf32>, memref<?x?xf32, strided<[?, ?]>>
   return
 }
 //   CHECK-DAG: #[[MAP1:.+]] = affine_map<()[s0, s1] -> (s0 + s1)>
 //   CHECK-DAG: #[[MAP2:.+]] = affine_map<(d0, d1, d2) -> (d1)>
 //       CHECK: func @fold_vector_transfer_write_with_inner_rank_reduced_subview
-//  CHECK-SAME:    %[[ARG0:[a-zA-Z0-9]+]]: memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
+//  CHECK-SAME:    %[[ARG0:[a-zA-Z0-9]+]]: memref<?x?x?xf32, strided<[?, ?, ?]>>
 //  CHECK-SAME:    %[[ARG1:[a-zA-Z0-9]+]]: vector<4xf32>
 //  CHECK-SAME:    %[[ARG2:[a-zA-Z0-9]+]]: index
 //  CHECK-SAME:    %[[ARG3:[a-zA-Z0-9]+]]: index
@@ -288,20 +288,20 @@ func.func @fold_vector_transfer_write_with_inner_rank_reduced_subview(
 // -----
 
 func.func @fold_masked_vector_transfer_read_with_subview(
-    %arg0 : memref<?x?xf32, strided<[?, ?], offset: ?>>,
+    %arg0 : memref<?x?xf32, strided<[?, ?]>>,
     %arg1: index, %arg2 : index, %arg3 : index, %arg4: index, %arg5 : index,
     %arg6 : index, %mask : vector<4xi1>) -> vector<4xf32> {
   %cst = arith.constant 0.0 : f32
   %0 = memref.subview %arg0[%arg1, %arg2] [%arg3, %arg4] [1, 1]
-      : memref<?x?xf32, strided<[?, ?], offset: ?>> to
-        memref<?x?xf32, strided<[?, ?], offset: ?>>
+      : memref<?x?xf32, strided<[?, ?]>> to
+        memref<?x?xf32, strided<[?, ?]>>
   %1 = vector.transfer_read %0[%arg5, %arg6], %cst, %mask {in_bounds = [true]}
-      : memref<?x?xf32, strided<[?, ?], offset: ?>>, vector<4xf32>
+      : memref<?x?xf32, strided<[?, ?]>>, vector<4xf32>
   return %1 : vector<4xf32>
 }
 //   CHECK-DAG: #[[MAP1:.+]] = affine_map<()[s0, s1] -> (s0 + s1)>
 //       CHECK: func @fold_masked_vector_transfer_read_with_subview
-//  CHECK-SAME:    %[[ARG0:[a-zA-Z0-9]+]]: memref<?x?xf32, strided<[?, ?], offset: ?>>
+//  CHECK-SAME:    %[[ARG0:[a-zA-Z0-9]+]]: memref<?x?xf32, strided<[?, ?]>>
 //  CHECK-SAME:    %[[ARG1:[a-zA-Z0-9]+]]: index
 //  CHECK-SAME:    %[[ARG2:[a-zA-Z0-9]+]]: index
 //  CHECK-SAME:    %[[ARG3:[a-zA-Z0-9]+]]: index
@@ -316,22 +316,22 @@ func.func @fold_masked_vector_transfer_read_with_subview(
 // -----
 
 func.func @fold_masked_vector_transfer_read_with_rank_reducing_subview(
-    %arg0 : memref<?x?x?x?xf32, strided<[?, ?, ?, ?], offset: ?>>,
+    %arg0 : memref<?x?x?x?xf32, strided<[?, ?, ?, ?]>>,
     %arg1: index, %arg2 : index, %arg3 : index, %arg4: index, %arg5 : index,
     %arg6 : index, %mask : vector<4x3xi1>) -> vector<3x4xf32> {
   %cst = arith.constant 0.0 : f32
   %0 = memref.subview %arg0[0, %arg1, 0, %arg2] [1, %arg3, 1, %arg4] [1, 1, 1, 1]
-      : memref<?x?x?x?xf32, strided<[?, ?, ?, ?], offset: ?>> to
-        memref<?x?xf32, strided<[?, ?], offset: ?>>
+      : memref<?x?x?x?xf32, strided<[?, ?, ?, ?]>> to
+        memref<?x?xf32, strided<[?, ?]>>
   %1 = vector.transfer_read %0[%arg5, %arg6], %cst, %mask {
          permutation_map = affine_map<(d0, d1) -> (d1, d0)>, in_bounds = [true, true]}
-      : memref<?x?xf32, strided<[?, ?], offset: ?>>, vector<3x4xf32>
+      : memref<?x?xf32, strided<[?, ?]>>, vector<3x4xf32>
   return %1 : vector<3x4xf32>
 }
 //   CHECK-DAG: #[[MAP0:.+]] = affine_map<()[s0, s1] -> (s0 + s1)>
 //   CHECK-DAG: #[[MAP1:.+]] = affine_map<(d0, d1, d2, d3) -> (d3, d1)>
 //       CHECK: func @fold_masked_vector_transfer_read_with_rank_reducing_subview
-//  CHECK-SAME:    %[[ARG0:[a-zA-Z0-9]+]]: memref<?x?x?x?xf32, strided<[?, ?, ?, ?], offset: ?>>
+//  CHECK-SAME:    %[[ARG0:[a-zA-Z0-9]+]]: memref<?x?x?x?xf32, strided<[?, ?, ?, ?]>>
 //  CHECK-SAME:    %[[ARG1:[a-zA-Z0-9]+]]: index
 //  CHECK-SAME:    %[[ARG2:[a-zA-Z0-9]+]]: index
 //  CHECK-SAME:    %[[ARG3:[a-zA-Z0-9]+]]: index
@@ -348,20 +348,20 @@ func.func @fold_masked_vector_transfer_read_with_rank_reducing_subview(
 // -----
 
 func.func @fold_masked_vector_transfer_write_with_subview(
-    %arg0 : memref<?x?xf32, strided<[?, ?], offset: ?>>,
+    %arg0 : memref<?x?xf32, strided<[?, ?]>>,
     %arg1 : vector<4xf32>, %arg2: index, %arg3 : index, %arg4 : index,
     %arg5: index, %arg6 : index, %arg7 : index, %mask : vector<4xi1>) {
   %cst = arith.constant 0.0 : f32
   %0 = memref.subview %arg0[%arg2, %arg3] [%arg4, %arg5] [1, 1]
-      : memref<?x?xf32, strided<[?, ?], offset: ?>> to
-        memref<?x?xf32, strided<[?, ?], offset: ?>>
+      : memref<?x?xf32, strided<[?, ?]>> to
+        memref<?x?xf32, strided<[?, ?]>>
   vector.transfer_write %arg1, %0[%arg6, %arg7], %mask {in_bounds = [true]}
-      : vector<4xf32>, memref<?x?xf32, strided<[?, ?], offset: ?>>
+      : vector<4xf32>, memref<?x?xf32, strided<[?, ?]>>
   return
 }
 //   CHECK-DAG: #[[MAP1:.+]] = affine_map<()[s0, s1] -> (s0 + s1)>
 //       CHECK: func @fold_masked_vector_transfer_write_with_subview
-//  CHECK-SAME:    %[[ARG0:[a-zA-Z0-9]+]]: memref<?x?xf32, strided<[?, ?], offset: ?>>
+//  CHECK-SAME:    %[[ARG0:[a-zA-Z0-9]+]]: memref<?x?xf32, strided<[?, ?]>>
 //  CHECK-SAME:    %[[ARG1:[a-zA-Z0-9]+]]: vector<4xf32>
 //  CHECK-SAME:    %[[ARG2:[a-zA-Z0-9]+]]: index
 //  CHECK-SAME:    %[[ARG3:[a-zA-Z0-9]+]]: index
@@ -377,22 +377,22 @@ func.func @fold_masked_vector_transfer_write_with_subview(
 // -----
 
 func.func @fold_masked_vector_transfer_write_with_rank_reducing_subview(
-    %arg0 : memref<?x?x?x?xf32, strided<[?, ?, ?, ?], offset: ?>>,
+    %arg0 : memref<?x?x?x?xf32, strided<[?, ?, ?, ?]>>,
     %arg1 : vector<3x4xf32>, %arg2: index, %arg3 : index, %arg4 : index,
     %arg5: index, %arg6 : index, %arg7 : index, %mask : vector<4x3xi1>) {
   %cst = arith.constant 0.0 : f32
   %0 = memref.subview %arg0[0, %arg2, 0, %arg3] [1, %arg4, 1, %arg5] [1, 1, 1, 1]
-      : memref<?x?x?x?xf32, strided<[?, ?, ?, ?], offset: ?>> to
-        memref<?x?xf32, strided<[?, ?], offset: ?>>
+      : memref<?x?x?x?xf32, strided<[?, ?, ?, ?]>> to
+        memref<?x?xf32, strided<[?, ?]>>
   vector.transfer_write %arg1, %0[%arg6, %arg7], %mask {
         permutation_map = affine_map<(d0, d1) -> (d1, d0)>, in_bounds = [true, true]}
-      : vector<3x4xf32>, memref<?x?xf32, strided<[?, ?], offset: ?>>
+      : vector<3x4xf32>, memref<?x?xf32, strided<[?, ?]>>
   return
 }
 //   CHECK-DAG: #[[MAP0:.+]] = affine_map<()[s0, s1] -> (s0 + s1)>
 //   CHECK-DAG: #[[MAP1:.+]] = affine_map<(d0, d1, d2, d3) -> (d3, d1)>
 //       CHECK: func @fold_masked_vector_transfer_write_with_rank_reducing_subview
-//  CHECK-SAME:    %[[ARG0:[a-zA-Z0-9]+]]: memref<?x?x?x?xf32, strided<[?, ?, ?, ?], offset: ?>>
+//  CHECK-SAME:    %[[ARG0:[a-zA-Z0-9]+]]: memref<?x?x?x?xf32, strided<[?, ?, ?, ?]>>
 //  CHECK-SAME:    %[[ARG1:[a-zA-Z0-9]+]]: vector<3x4xf32>
 //  CHECK-SAME:    %[[ARG2:[a-zA-Z0-9]+]]: index
 //  CHECK-SAME:    %[[ARG3:[a-zA-Z0-9]+]]: index
@@ -475,35 +475,35 @@ func.func @fold_static_stride_subview_with_memref_expand_shape_with_constant_acc
 // CHECK-LABEL: func @subview_of_subview(
 //  CHECK-SAME:     %[[m:.*]]: memref<8x1024xf32, 3>, %[[pos:.*]]: index
 //       CHECK:   %[[add:.*]] = affine.apply #[[$map]]()[%arg1]
-//       CHECK:   memref.subview %arg0[4, %[[add]]] [1, 1] [1, 1] : memref<8x1024xf32, 3> to memref<f32, strided<[], offset: ?>, 3>
+//       CHECK:   memref.subview %arg0[4, %[[add]]] [1, 1] [1, 1] : memref<8x1024xf32, 3> to memref<f32, strided<[]>, 3>
 func.func @subview_of_subview(%m: memref<8x1024xf32, 3>, %pos: index)
-    -> memref<f32, strided<[], offset: ?>, 3>
+    -> memref<f32, strided<[]>, 3>
 {
   %0 = memref.subview %m[3, %pos] [5, 7] [1, 1]
       : memref<8x1024xf32, 3>
-        to memref<5x7xf32, strided<[1024, 1], offset: ?>, 3>
+        to memref<5x7xf32, strided<[1024, 1]>, 3>
   %1 = memref.subview %0[1, 2] [1, 1] [1, 1]
-      : memref<5x7xf32, strided<[1024, 1], offset: ?>, 3>
-        to memref<f32, strided<[], offset: ?>, 3>
-  return %1 : memref<f32, strided<[], offset: ?>, 3>
+      : memref<5x7xf32, strided<[1024, 1]>, 3>
+        to memref<f32, strided<[]>, 3>
+  return %1 : memref<f32, strided<[]>, 3>
 }
 
 // -----
 
 // CHECK-LABEL: func @subview_of_subview_rank_reducing(
 //  CHECK-SAME:     %[[m:.*]]: memref<?x?x?xf32>
-//       CHECK:   memref.subview %arg0[3, 7, 8] [1, 1, 1] [1, 1, 1] : memref<?x?x?xf32> to memref<f32, strided<[], offset: ?>>
+//       CHECK:   memref.subview %arg0[3, 7, 8] [1, 1, 1] [1, 1, 1] : memref<?x?x?xf32> to memref<f32, strided<[]>>
 func.func @subview_of_subview_rank_reducing(%m: memref<?x?x?xf32>,
                                             %sz: index, %pos: index)
-    -> memref<f32, strided<[], offset: ?>>
+    -> memref<f32, strided<[]>>
 {
   %0 = memref.subview %m[3, 1, 8] [1, %sz, 1] [1, 1, 1]
       : memref<?x?x?xf32>
-        to memref<?xf32, strided<[?], offset: ?>>
+        to memref<?xf32, strided<[?]>>
   %1 = memref.subview %0[6] [1] [1]
-      : memref<?xf32, strided<[?], offset: ?>>
-        to memref<f32, strided<[], offset: ?>>
-  return %1 : memref<f32, strided<[], offset: ?>>
+      : memref<?xf32, strided<[?]>>
+        to memref<f32, strided<[]>>
+  return %1 : memref<f32, strided<[]>>
 }
 
 // -----
@@ -511,8 +511,8 @@ func.func @subview_of_subview_rank_reducing(%m: memref<?x?x?xf32>,
 // CHECK-LABEL: func @fold_load_keep_nontemporal(
 //      CHECK:   memref.load %{{.+}}[%{{.+}}, %{{.+}}] {nontemporal = true}
 func.func @fold_load_keep_nontemporal(%arg0 : memref<12x32xf32>, %arg1 : index, %arg2 : index, %arg3 : index, %arg4 : index) -> f32 {
-  %0 = memref.subview %arg0[%arg1, %arg2][4, 4][2, 3] : memref<12x32xf32> to memref<4x4xf32, strided<[64, 3], offset: ?>>
-  %1 = memref.load %0[%arg3, %arg4] {nontemporal = true }: memref<4x4xf32, strided<[64, 3], offset: ?>>
+  %0 = memref.subview %arg0[%arg1, %arg2][4, 4][2, 3] : memref<12x32xf32> to memref<4x4xf32, strided<[64, 3]>>
+  %1 = memref.load %0[%arg3, %arg4] {nontemporal = true }: memref<4x4xf32, strided<[64, 3]>>
   return %1 : f32
 }
 
@@ -522,8 +522,8 @@ func.func @fold_load_keep_nontemporal(%arg0 : memref<12x32xf32>, %arg1 : index,
 //      CHECK:   memref.store %{{.+}}, %{{.+}}[%{{.+}}, %{{.+}}]  {nontemporal = true} : memref<12x32xf32>
 func.func @fold_store_keep_nontemporal(%arg0 : memref<12x32xf32>, %arg1 : index, %arg2 : index, %arg3 : index, %arg4 : index, %arg5 : f32) {
   %0 = memref.subview %arg0[%arg1, %arg2][4, 4][2, 3] :
-    memref<12x32xf32> to memref<4x4xf32, strided<[64, 3], offset: ?>>
-  memref.store %arg5, %0[%arg3, %arg4] {nontemporal=true}: memref<4x4xf32, strided<[64, 3], offset: ?>>
+    memref<12x32xf32> to memref<4x4xf32, strided<[64, 3]>>
+  memref.store %arg5, %0[%arg3, %arg4] {nontemporal=true}: memref<4x4xf32, strided<[64, 3]>>
   return
 }
 
@@ -544,8 +544,8 @@ func.func @fold_prefetch_expand_shape(%src: memref<32xf32>, %i0: index, %i1: ind
 // -----
 
 func.func @fold_gpu_subgroup_mma_load_matrix_1d(%src: memref<?xvector<4xf32>>, %offset: index, %i: index) -> !gpu.mma_matrix<16x16xf16, "COp"> {
-  %subview = memref.subview %src[%offset] [81920] [1] : memref<?xvector<4xf32>> to memref<81920xvector<4xf32>, strided<[1], offset: ?>>
-  %matrix = gpu.subgroup_mma_load_matrix %subview[%i] {leadDimension = 160 : index} : memref<81920xvector<4xf32>, strided<[1], offset: ?>> -> !gpu.mma_matrix<16x16xf16, "COp">
+  %subview = memref.subview %src[%offset] [81920] [1] : memref<?xvector<4xf32>> to memref<81920xvector<4xf32>, strided<[1]>>
+  %matrix = gpu.subgroup_mma_load_matrix %subview[%i] {leadDimension = 160 : index} : memref<81920xvector<4xf32>, strided<[1]>> -> !gpu.mma_matrix<16x16xf16, "COp">
   return %matrix: !gpu.mma_matrix<16x16xf16, "COp">
 }
 
@@ -559,8 +559,8 @@ func.func @fold_gpu_subgroup_mma_load_matrix_1d(%src: memref<?xvector<4xf32>>, %
 // -----
 
 func.func @fold_gpu_subgroup_mma_store_matrix_1d(%dst: memref<?xvector<4xf32>>, %offset: index, %i: index, %matrix: !gpu.mma_matrix<16x16xf16, "COp">) {
-  %subview = memref.subview %dst[%offset] [81920] [1] : memref<?xvector<4xf32>> to memref<81920xvector<4xf32>, strided<[1], offset: ?>>
-  gpu.subgroup_mma_store_matrix %matrix, %subview[%i] {leadDimension = 160 : index} : !gpu.mma_matrix<16x16xf16, "COp">, memref<81920xvector<4xf32>, strided<[1], offset: ?>>
+  %subview = memref.subview %dst[%offset] [81920] [1] : memref<?xvector<4xf32>> to memref<81920xvector<4xf32>, strided<[1]>>
+  gpu.subgroup_mma_store_matrix %matrix, %subview[%i] {leadDimension = 160 : index} : !gpu.mma_matrix<16x16xf16, "COp">, memref<81920xvector<4xf32>, strided<[1]>>
   return
 }
 
@@ -575,9 +575,9 @@ func.func @fold_gpu_subgroup_mma_store_matrix_1d(%dst: memref<?xvector<4xf32>>,
 // CHECK-LABEL: func.func @fold_gpu_subgroup_mma_load_matrix_2d
 //  CHECK-SAME: %[[SRC:.+]]: memref<128x128xf32>
 func.func @fold_gpu_subgroup_mma_load_matrix_2d(%arg0 : memref<128x128xf32>, %arg1 : index, %arg2 : index, %arg3 : index, %arg4 : index) -> !gpu.mma_matrix<16x16xf16, "COp"> {
-  %subview = memref.subview %arg0[%arg1, %arg2][64, 32][2, 1] : memref<128x128xf32> to memref<64x32xf32, strided<[256, 1], offset: ?>>
+  %subview = memref.subview %arg0[%arg1, %arg2][64, 32][2, 1] : memref<128x128xf32> to memref<64x32xf32, strided<[256, 1]>>
   // CHECK: gpu.subgroup_mma_load_matrix %[[SRC]][{{.+}}] {leadDimension = 32 : index} : memref<128x128xf32> -> !gpu.mma_matrix<16x16xf16, "COp">
-  %matrix = gpu.subgroup_mma_load_matrix %subview[%arg3, %arg4] {leadDimension = 32 : index} : memref<64x32xf32, strided<[256, 1], offset: ?>> -> !gpu.mma_matrix<16x16xf16, "COp">
+  %matrix = gpu.subgroup_mma_load_matrix %subview[%arg3, %arg4] {leadDimension = 32 : index} : memref<64x32xf32, strided<[256, 1]>> -> !gpu.mma_matrix<16x16xf16, "COp">
   return %matrix : !gpu.mma_matrix<16x16xf16, "COp">
 }
 
@@ -586,9 +586,9 @@ func.func @fold_gpu_subgroup_mma_load_matrix_2d(%arg0 : memref<128x128xf32>, %ar
 // CHECK-LABEL: func.func @fold_gpu_subgroup_mma_load_matrix_2d
 //  CHECK-SAME: %[[DST:.+]]: memref<128x128xf32>
 func.func @fold_gpu_subgroup_mma_load_matrix_2d(%arg0 : memref<128x128xf32>, %arg1 : index, %arg2 : index, %arg3 : index, %arg4 : index, %matrix: !gpu.mma_matrix<16x16xf16, "COp">) {
-  %subview = memref.subview %arg0[%arg1, %arg2][64, 32][2, 1] : memref<128x128xf32> to memref<64x32xf32, strided<[256, 1], offset: ?>>
+  %subview = memref.subview %arg0[%arg1, %arg2][64, 32][2, 1] : memref<128x128xf32> to memref<64x32xf32, strided<[256, 1]>>
   // CHECK: gpu.subgroup_mma_store_matrix %{{.+}}, %[[DST]][{{.+}}] {leadDimension = 32 : index} : !gpu.mma_matrix<16x16xf16, "COp">, memref<128x128xf32>
-  gpu.subgroup_mma_store_matrix %matrix, %subview[%arg3, %arg4] {leadDimension = 32 : index} :  !gpu.mma_matrix<16x16xf16, "COp">, memref<64x32xf32, strided<[256, 1], offset: ?>>
+  gpu.subgroup_mma_store_matrix %matrix, %subview[%arg3, %arg4] {leadDimension = 32 : index} :  !gpu.mma_matrix<16x16xf16, "COp">, memref<64x32xf32, strided<[256, 1]>>
   return
 }
 
@@ -599,8 +599,8 @@ func.func @fold_nvgpu_device_async_copy_zero_sub_idx(%gmem_memref_3d : memref<2x
 
   %c0 = arith.constant 0 : index
   %smem_memref_4d = memref.alloc() : memref<5x1x64x64xf16, #gpu.address_space<workgroup>>
-  %gmem_memref_subview_2d = memref.subview %gmem_memref_3d[%idx_1, %idx_2, %idx_3] [1, 1, 8] [1, 1, 1] : memref<2x128x768xf16> to memref<1x8xf16, strided<[98304, 1], offset: ?>>
-  %async_token = nvgpu.device_async_copy %gmem_memref_subview_2d[%c0, %c0], %smem_memref_4d[%c0, %c0, %c0, %c0], 8 {bypassL1} : memref<1x8xf16, strided<[98304, 1], offset: ?>> to memref<5x1x64x64xf16, #gpu.address_space<workgroup>>
+  %gmem_memref_subview_2d = memref.subview %gmem_memref_3d[%idx_1, %idx_2, %idx_3] [1, 1, 8] [1, 1, 1] : memref<2x128x768xf16> to memref<1x8xf16, strided<[98304, 1]>>
+  %async_token = nvgpu.device_async_copy %gmem_memref_subview_2d[%c0, %c0], %smem_memref_4d[%c0, %c0, %c0, %c0], 8 {bypassL1} : memref<1x8xf16, strided<[98304, 1]>> to memref<5x1x64x64xf16, #gpu.address_space<workgroup>>
   return
 }
 
@@ -616,8 +616,8 @@ func.func @fold_nvgpu_device_async_copy_zero_sub_idx(%gmem_memref_3d : memref<2x
 func.func @fold_src_nvgpu_device_async_copy(%gmem_memref_3d : memref<2x128x768xf16>, %src_idx_0 : index, %src_idx_1 : index, %src_idx_2 : index, %src_sub_idx_0 : index, %src_sub_idx_1 : index) {
   %c0 = arith.constant 0 : index
   %smem_memref_4d = memref.alloc() : memref<5x1x64x64xf16, #gpu.address_space<workgroup>>
-  %gmem_memref_subview_2d = memref.subview %gmem_memref_3d[%src_idx_0, %src_idx_1, %src_idx_2] [1, 1, 8] [1, 1, 1] : memref<2x128x768xf16> to memref<1x8xf16, strided<[98304, 1], offset: ?>>
-  %async_token = nvgpu.device_async_copy %gmem_memref_subview_2d[%src_sub_idx_0, %src_sub_idx_1], %smem_memref_4d[%c0, %c0, %c0, %c0], 8 {bypassL1} : memref<1x8xf16, strided<[98304, 1], offset: ?>> to memref<5x1x64x64xf16, #gpu.address_space<workgroup>>
+  %gmem_memref_subview_2d = memref.subview %gmem_memref_3d[%src_idx_0, %src_idx_1, %src_idx_2] [1, 1, 8] [1, 1, 1] : memref<2x128x768xf16> to memref<1x8xf16, strided<[98304, 1]>>
+  %async_token = nvgpu.device_async_copy %gmem_memref_subview_2d[%src_sub_idx_0, %src_sub_idx_1], %smem_memref_4d[%c0, %c0, %c0, %c0], 8 {bypassL1} : memref<1x8xf16, strided<[98304, 1]>> to memref<5x1x64x64xf16, #gpu.address_space<workgroup>>
   return
 }
 
@@ -635,9 +635,9 @@ func.func @fold_src_nvgpu_device_async_copy(%gmem_memref_3d : memref<2x128x768xf
 func.func @fold_src_fold_dest_nvgpu_device_async_copy(%gmem_memref_3d : memref<2x128x768xf16>, %src_idx_0 : index, %src_idx_1 : index, %src_idx_2 : index, %src_sub_idx_0 : index, %src_sub_idx_1 : index, %dest_idx_0 : index, %dest_idx_1 : index, %dest_idx_2 : index, %dest_idx_3 : index, %dest_sub_idx_0 : index, %dest_sub_idx_1 : index) {
   %c0 = arith.constant 0 : index
   %smem_memref_4d = memref.alloc() : memref<5x1x64x64xf16, #gpu.address_space<workgroup>>
-  %gmem_memref_subview_2d = memref.subview %gmem_memref_3d[%src_idx_0, %src_idx_1, %src_idx_2] [1, 1, 8] [1, 1, 1] : memref<2x128x768xf16> to memref<1x8xf16, strided<[98304, 1], offset: ?>>
-  %smem_memref_2d = memref.subview %smem_memref_4d[%dest_idx_0, %dest_idx_1, %dest_idx_2, %dest_idx_3] [1, 1, 1, 8] [1, 1, 1, 1] : memref<5x1x64x64xf16, #gpu.address_space<workgroup>> to memref<1x8xf16, strided<[4096, 1], offset: ?>, #gpu.address_space<workgroup>>
-  %async_token = nvgpu.device_async_copy %gmem_memref_subview_2d[%src_sub_idx_0, %src_sub_idx_1], %smem_memref_2d[%dest_sub_idx_0, %dest_sub_idx_1], 8 {bypassL1} : memref<1x8xf16, strided<[98304, 1], offset: ?>> to memref<1x8xf16, strided<[4096, 1], offset: ?>, #gpu.address_space<workgroup>>
+  %gmem_memref_subview_2d = memref.subview %gmem_memref_3d[%src_idx_0, %src_idx_1, %src_idx_2] [1, 1, 8] [1, 1, 1] : memref<2x128x768xf16> to memref<1x8xf16, strided<[98304, 1]>>
+  %smem_memref_2d = memref.subview %smem_memref_4d[%dest_idx_0, %dest_idx_1, %dest_idx_2, %dest_idx_3] [1, 1, 1, 8] [1, 1, 1, 1] : memref<5x1x64x64xf16, #gpu.address_space<workgroup>> to memref<1x8xf16, strided<[4096, 1]>, #gpu.address_space<workgroup>>
+  %async_token = nvgpu.device_async_copy %gmem_memref_subview_2d[%src_sub_idx_0, %src_sub_idx_1], %smem_memref_2d[%dest_sub_idx_0, %dest_sub_idx_1], 8 {bypassL1} : memref<1x8xf16, strided<[98304, 1]>> to memref<1x8xf16, strided<[4096, 1]>, #gpu.address_space<workgroup>>
   return
 }
 
@@ -660,8 +660,8 @@ func.func @test_ldmatrix(%arg0: memref<4x32x32xf16, 3>, %arg1: index, %arg2: ind
   %0 = affine.apply #map()[%arg1]
   %1 = affine.apply #map1()[%arg2]
   %2 = affine.apply #map1()[%arg3]
-  %subview = memref.subview %arg0[%arg1, %arg2, %arg3] [%0, %1, %2] [1, 1, 1] : memref<4x32x32xf16, 3> to memref<?x?x?xf16, strided<[1024, 32, 1], offset: ?>, 3>
-  %3 = nvgpu.ldmatrix %subview[%c0, %c0, %c0] {numTiles = 4 : i32, transpose = false} : memref<?x?x?xf16, strided<[1024, 32, 1], offset: ?>, 3> -> vector<4x2xf16>
+  %subview = memref.subview %arg0[%arg1, %arg2, %arg3] [%0, %1, %2] [1, 1, 1] : memref<4x32x32xf16, 3> to memref<?x?x?xf16, strided<[1024, 32, 1]>, 3>
+  %3 = nvgpu.ldmatrix %subview[%c0, %c0, %c0] {numTiles = 4 : i32, transpose = false} : memref<?x?x?xf16, strided<[1024, 32, 1]>, 3> -> vector<4x2xf16>
   return %3 : vector<4x2xf16>
 }
 
@@ -681,8 +681,8 @@ func.func @fold_vector_load_subview(%src : memref<24x64xf32>,
                                     %dim2 : index,
                                     %idx : index) -> vector<12x32xf32> {
 
-    %0 = memref.subview %src[%off1, %off2][%dim1, %dim2][1, 1] : memref<24x64xf32> to memref<?x?xf32, strided<[64, 1], offset: ?>>
-    %1 = vector.load %0[%idx, %idx] :  memref<?x?xf32, strided<[64, 1], offset: ?>>, vector<12x32xf32>
+    %0 = memref.subview %src[%off1, %off2][%dim1, %dim2][1, 1] : memref<24x64xf32> to memref<?x?xf32, strided<[64, 1]>>
+    %1 = vector.load %0[%idx, %idx] :  memref<?x?xf32, strided<[64, 1]>>, vector<12x32xf32>
     return %1 : vector<12x32xf32>
 }
 
@@ -702,8 +702,8 @@ func.func @fold_vector_load_subview(%src : memref<24x64xf32>,
 
 func.func @fold_vector_maskedload_subview(
   %arg0 : memref<12x32xf32>, %arg1 : index, %arg2 : index, %arg3: vector<32xi1>, %arg4: vector<32xf32>) -> vector<32xf32> {
-  %0 = memref.subview %arg0[%arg1, %arg2][1, 1][1, 1] : memref<12x32xf32> to memref<f32, strided<[], offset: ?>>
-  %1 = vector.maskedload %0[], %arg3, %arg4 : memref<f32, strided<[], offset: ?>>, vector<32xi1>, vector<32xf32> into vector<32xf32>
+  %0 = memref.subview %arg0[%arg1, %arg2][1, 1][1, 1] : memref<12x32xf32> to memref<f32, strided<[]>>
+  %1 = vector.maskedload %0[], %arg3, %arg4 : memref<f32, strided<[]>>, vector<32xi1>, vector<32xf32> into vector<32xf32>
   return %1 : vector<32xf32>
 }
 
@@ -725,8 +725,8 @@ func.func @fold_vector_store_subview(%src : memref<24x64xf32>,
                                      %dim1 : index,
                                      %dim2 : index) -> () {
 
-    %0 = memref.subview %src[%off1, %off2][%dim1, %dim2][1, 1] : memref<24x64xf32> to memref<?x?xf32, strided<[64, 1], offset: ?>>
-    vector.store %vec, %0[%idx, %idx] : memref<?x?xf32, strided<[64, 1], offset: ?>> , vector<2x32xf32>
+    %0 = memref.subview %src[%off1, %off2][%dim1, %dim2][1, 1] : memref<24x64xf32> to memref<?x?xf32, strided<[64, 1]>>
+    vector.store %vec, %0[%idx, %idx] : memref<?x?xf32, strided<[64, 1]>> , vector<2x32xf32>
     return
 }
 
@@ -748,8 +748,8 @@ func.func @fold_vector_store_subview(%src : memref<24x64xf32>,
 
 func.func @fold_vector_maskedstore_subview(
   %arg0 : memref<12x32xf32>, %arg1 : index, %arg2 : index, %arg3: vector<32xi1>, %arg4: vector<32xf32>) -> () {
-  %0 = memref.subview %arg0[%arg1, %arg2][1, 1][1, 1] : memref<12x32xf32> to memref<f32, strided<[], offset: ?>>
-  vector.maskedstore %0[], %arg3, %arg4 : memref<f32, strided<[], offset: ?>>, vector<32xi1>, vector<32xf32>
+  %0 = memref.subview %arg0[%arg1, %arg2][1, 1][1, 1] : memref<12x32xf32> to memref<f32, strided<[]>>
+  vector.maskedstore %0[], %arg3, %arg4 : memref<f32, strided<[]>>, vector<32xi1>, vector<32xf32>
   return
 }
 
@@ -990,8 +990,8 @@ func.func @fold_dma_start_subview_src(
     %off0 : index, %off1 : index) {
   %c0 = arith.constant 0 : index
   %num_elements = arith.constant 32 : index
-  %subview = memref.subview %src[%off0, %off1][32, 32][1, 1] : memref<128x64xf32> to memref<32x32xf32, strided<[64, 1], offset: ?>>
-  memref.dma_start %subview[%c0, %c0], %dst[%c0], %num_elements, %tag[%c0] : memref<32x32xf32, strided<[64, 1], offset: ?>>, memref<32xf32, 1>, memref<1xi32>
+  %subview = memref.subview %src[%off0, %off1][32, 32][1, 1] : memref<128x64xf32> to memref<32x32xf32, strided<[64, 1]>>
+  memref.dma_start %subview[%c0, %c0], %dst[%c0], %num_elements, %tag[%c0] : memref<32x32xf32, strided<[64, 1]>>, memref<32xf32, 1>, memref<1xi32>
   return
 }
 
@@ -1012,8 +1012,8 @@ func.func @fold_dma_start_subview_dst(
     %off0 : index, %off1 : index) {
   %c0 = arith.constant 0 : index
   %num_elements = arith.constant 32 : index
-  %subview = memref.subview %dst[%off0, %off1][32, 32][1, 1] : memref<128x64xf32, 1> to memref<32x32xf32, strided<[64, 1], offset: ?>, 1>
-  memref.dma_start %src[%c0], %subview[%c0, %c0], %num_elements, %tag[%c0] : memref<32xf32>, memref<32x32xf32, strided<[64, 1], offset: ?>, 1>, memref<1xi32>
+  %subview = memref.subview %dst[%off0, %off1][32, 32][1, 1] : memref<128x64xf32, 1> to memref<32x32xf32, strided<[64, 1]>, 1>
+  memref.dma_start %src[%c0], %subview[%c0, %c0], %num_elements, %tag[%c0] : memref<32xf32>, memref<32x32xf32, strided<[64, 1]>, 1>, memref<1xi32>
   return
 }
 // CHECK-LABEL: func @fold_dma_start_subview_dst
diff --git a/mlir/test/Dialect/MemRef/invalid.mlir b/mlir/test/Dialect/MemRef/invalid.mlir
index d3670fde08d81..c8ce8fda648df 100644
--- a/mlir/test/Dialect/MemRef/invalid.mlir
+++ b/mlir/test/Dialect/MemRef/invalid.mlir
@@ -152,7 +152,7 @@ func.func @memref_reinterpret_cast_too_many_offsets(%in: memref<?xf32>) {
   // expected-error @+1 {{expected 1 offset values}}
   %out = memref.reinterpret_cast %in to
            offset: [0, 0], sizes: [10, 10], strides: [10, 1]
-           : memref<?xf32> to memref<10x10xf32, strided<[10, 1], offset: 0>>
+           : memref<?xf32> to memref<10x10xf32, strided<[10, 1]>>
   return
 }
 
@@ -162,7 +162,7 @@ func.func @memref_reinterpret_cast_incompatible_element_types(%in: memref<*xf32>
   // expected-error @+1 {{source element type ('f32') does not match result element type ('i32')}}
   %out = memref.reinterpret_cast %in to
            offset: [0], sizes: [10], strides: [1]
-         : memref<*xf32> to memref<10xi32, strided<[1], offset: 0>>
+         : memref<*xf32> to memref<10xi32, strided<[1]>>
   return
 }
 
@@ -172,7 +172,7 @@ func.func @memref_reinterpret_cast_incompatible_memory_space(%in: memref<*xf32>)
   // expected-error @+1 {{different memory spaces specified}}
   %out = memref.reinterpret_cast %in to
            offset: [0], sizes: [10], strides: [1]
-         : memref<*xf32> to memref<10xi32, strided<[1], offset: 0>, 2>
+         : memref<*xf32> to memref<10xi32, strided<[1]>, 2>
   return
 }
 
@@ -182,7 +182,7 @@ func.func @memref_reinterpret_cast_offset_mismatch(%in: memref<?xf32>) {
   // expected-error @+1 {{expected result type with offset = 1 instead of 2}}
   %out = memref.reinterpret_cast %in to
            offset: [1], sizes: [10], strides: [1]
-         : memref<?xf32> to memref<10xf32, strided<[1], offset: 2>>
+         : memref<?xf32> to memref<10xf32, strided<[1]>>
   return
 }
 
@@ -192,7 +192,7 @@ func.func @memref_reinterpret_cast_size_mismatch(%in: memref<*xf32>) {
   // expected-error @+1 {{expected result type with size = 10 instead of 1 in dim = 0}}
   %out = memref.reinterpret_cast %in to
            offset: [0], sizes: [10], strides: [1]
-         : memref<*xf32> to memref<1xf32, strided<[1], offset: 0>>
+         : memref<*xf32> to memref<1xf32, strided<[1]>>
   return
 }
 
@@ -202,7 +202,7 @@ func.func @memref_reinterpret_cast_offset_mismatch(%in: memref<?xf32>) {
   // expected-error @+1 {{expected result type with stride = 2 instead of 1 in dim = 0}}
   %out = memref.reinterpret_cast %in to
            offset: [2], sizes: [10], strides: [2]
-         : memref<?xf32> to memref<10xf32, strided<[1], offset: 2>>
+         : memref<?xf32> to memref<10xf32, strided<[1]>>
   return
 }
 
@@ -271,11 +271,11 @@ func.func @memref_reshape_dst_shape_rank_mismatch(
 // -----
 
 func.func @memref_reshape_src_affine_map_is_not_identity(
-        %buf: memref<4x4xf32, strided<[3, 2], offset: 0>>,
+        %buf: memref<4x4xf32, strided<[3, 2]>>,
         %shape: memref<1xi32>) {
   // expected-error @+1 {{source memref type should have identity affine map}}
   memref.reshape %buf(%shape)
-    : (memref<4x4xf32, strided<[3, 2], offset: 0>>, memref<1xi32>)
+    : (memref<4x4xf32, strided<[3, 2]>>, memref<1xi32>)
     -> memref<8xf32>
 }
 
@@ -285,7 +285,7 @@ func.func @memref_reshape_result_affine_map_is_not_identity(
         %buf: memref<4x4xf32>, %shape: memref<1xi32>) {
   // expected-error @+1 {{result memref type should have identity affine map}}
   memref.reshape %buf(%shape)
-    : (memref<4x4xf32>, memref<1xi32>) -> memref<8xf32, strided<[2], offset: 0>>
+    : (memref<4x4xf32>, memref<1xi32>) -> memref<8xf32, strided<[2]>>
 }
 
 // -----
@@ -448,11 +448,11 @@ func.func @expand_shape_out_of_bounds(%arg0: memref<?xf32>, %sz0: index) {
 // -----
 
 func.func @expand_shape_invalid_result_layout(
-    %arg0: memref<30x20xf32, strided<[4000, 2], offset: 100>>) {
-  // expected-error @+1 {{expected expanded type to be 'memref<2x15x20xf32, strided<[60000, 4000, 2], offset: 100>>' but found 'memref<2x15x20xf32, strided<[5000, 4000, 2], offset: 100>>'}}
+    %arg0: memref<30x20xf32, strided<[4000, 2]>>) {
+  // expected-error @+1 {{expected expanded type to be 'memref<2x15x20xf32, strided<[60000, 4000, 2]>>' but found 'memref<2x15x20xf32, strided<[5000, 4000, 2]>>'}}
   %0 = memref.expand_shape %arg0 [[0, 1], [2]] output_shape [2, 15, 20] :
-      memref<30x20xf32, strided<[4000, 2], offset: 100>>
-      into memref<2x15x20xf32, strided<[5000, 4000, 2], offset: 100>>
+      memref<30x20xf32, strided<[4000, 2]>>
+      into memref<2x15x20xf32, strided<[5000, 4000, 2]>>
 }
 
 // -----
@@ -460,7 +460,7 @@ func.func @expand_shape_invalid_result_layout(
 func.func @collapse_shape_mismatch_indices_num(%arg0: memref<?x?x?xf32>) {
   // expected-error @+1 {{invalid number of reassociation groups: found 1, expected 2}}
   %0 = memref.collapse_shape %arg0 [[0, 1]] :
-    memref<?x?x?xf32> into memref<?x?xf32, strided<[?, 1], offset: 0>>
+    memref<?x?x?xf32> into memref<?x?xf32, strided<[?, 1]>>
 }
 
 // -----
@@ -468,7 +468,7 @@ func.func @collapse_shape_mismatch_indices_num(%arg0: memref<?x?x?xf32>) {
 func.func @collapse_shape_invalid_reassociation(%arg0: memref<?x?x?xf32>) {
   // expected-error @+1 {{reassociation indices must be contiguous}}
   %0 = memref.collapse_shape %arg0 [[0, 1], [1, 2]] :
-    memref<?x?x?xf32> into memref<?x?xf32, strided<[?, 1], offset: 0>>
+    memref<?x?x?xf32> into memref<?x?xf32, strided<[?, 1]>>
 }
 
 // -----
@@ -502,11 +502,11 @@ func.func @collapse_shape_invalid_reassociation_expansion(%arg0: memref<?x?xf32>
 // -----
 
 func.func @collapse_shape_reshaping_non_contiguous(
-    %arg0: memref<3x4x5xf32, strided<[270, 50, 10], offset: 0>>) {
+    %arg0: memref<3x4x5xf32, strided<[270, 50, 10]>>) {
   // expected-error @+1 {{invalid source layout map or collapsing non-contiguous dims}}
   %0 = memref.collapse_shape %arg0 [[0, 1], [2]] :
-      memref<3x4x5xf32, strided<[270, 50, 10], offset: 0>>
-      into memref<12x5xf32, strided<[50, 1], offset: 0>>
+      memref<3x4x5xf32, strided<[270, 50, 10]>>
+      into memref<12x5xf32, strided<[50, 1]>>
   return
 }
 
@@ -640,18 +640,18 @@ func.func @invalid_view(%arg0 : index, %arg1 : index, %arg2 : index) {
 
 // -----
 
-func.func @invalid_subview(%input: memref<4x1024xf32>) -> memref<2x256xf32, strided<[1024, 1], offset: 2304>> {
+func.func @invalid_subview(%input: memref<4x1024xf32>) -> memref<2x256xf32, strided<[1024, 1]>> {
   // expected-error at +1 {{expected offsets to be non-negative, but got -1}}
-  %0 = memref.subview %input[-1, 256] [2, 256] [1, 1] : memref<4x1024xf32> to memref<2x256xf32, strided<[1024, 1], offset: 2304>>
-  return %0 : memref<2x256xf32, strided<[1024, 1], offset: 2304>>
+  %0 = memref.subview %input[-1, 256] [2, 256] [1, 1] : memref<4x1024xf32> to memref<2x256xf32, strided<[1024, 1]>>
+  return %0 : memref<2x256xf32, strided<[1024, 1]>>
 }
 
 // -----
 
-func.func @invalid_subview(%input: memref<4x1024xf32>) -> memref<2x256xf32, strided<[1024, 1], offset: 2304>> {
+func.func @invalid_subview(%input: memref<4x1024xf32>) -> memref<2x256xf32, strided<[1024, 1]>> {
   // expected-error at +1 {{expected sizes to be non-negative, but got -1}}
-  %0 = memref.subview %input[2, 256] [-1, 256] [1, 1] : memref<4x1024xf32> to memref<2x256xf32, strided<[1024, 1], offset: 2304>>
-  return %0 : memref<2x256xf32, strided<[1024, 1], offset: 2304>>
+  %0 = memref.subview %input[2, 256] [-1, 256] [1, 1] : memref<4x1024xf32> to memref<2x256xf32, strided<[1024, 1]>>
+  return %0 : memref<2x256xf32, strided<[1024, 1]>>
 }
 
 // -----
@@ -672,7 +672,7 @@ func.func @invalid_subview(%arg0 : index, %arg1 : index, %arg2 : index) {
 func.func @invalid_subview(%arg0 : memref<?x128xi8, 1>) {
   %0 = memref.alloc() :memref<1xf32>
   // expected-error at +1 {{expected the number of 'offsets' to match the number of dynamic entries in 'static_offsets' (0 vs 1)}}
-  "memref.subview"(%0) <{operandSegmentSizes = array<i32: 1, 0, 0, 0>, static_offsets = array<i64: -9223372036854775808>, static_sizes = array<i64: 1>, static_strides = array<i64: 1>}> : (memref<1xf32>) -> memref<1xf32, strided<[1], offset: ?>>
+  "memref.subview"(%0) <{operandSegmentSizes = array<i32: 1, 0, 0, 0>, static_offsets = array<i64: -9223372036854775808>, static_sizes = array<i64: 1>, static_strides = array<i64: 1>}> : (memref<1xf32>) -> memref<1xf32, strided<[1]>>
   return
 }
 
@@ -699,10 +699,10 @@ func.func @invalid_subview(%arg0 : index, %arg1 : index, %arg2 : index) {
 // -----
 
 func.func @invalid_subview(%arg0 : index, %arg1 : index, %arg2 : index) {
-  %0 = memref.alloc() : memref<8x16x4xf32, strided<[64, 4, 1], offset: 0>, 2>
+  %0 = memref.alloc() : memref<8x16x4xf32, strided<[64, 4, 1]>, 2>
   // expected-error at +1 {{different memory spaces}}
   %1 = memref.subview %0[0, 0, 0][%arg2, %arg2, %arg2][1, 1, 1]
-    : memref<8x16x4xf32, strided<[64, 4, 1], offset: 0>, 2> to
+    : memref<8x16x4xf32, strided<[64, 4, 1]>, 2> to
       memref<8x?x4xf32, affine_map<(d0, d1, d2)[s0] -> (d0 * s0 + d1 * 4 + d2)>>
   return
 }
@@ -714,7 +714,7 @@ func.func @invalid_subview(%arg0 : index, %arg1 : index, %arg2 : index) {
   // expected-error at +1 {{is not strided}}
   %1 = memref.subview %0[0, 0, 0][%arg2, %arg2, %arg2][1, 1, 1]
     : memref<8x16x4xf32, affine_map<(d0, d1, d2) -> (d0 + d1, d1 + d2, d2)>> to
-      memref<8x?x4xf32, strided<[?, 4, 1], offset: 0>>
+      memref<8x?x4xf32, strided<[?, 4, 1]>>
   return
 }
 
@@ -725,7 +725,7 @@ func.func @invalid_subview(%arg0 : index, %arg1 : index, %arg2 : index) {
   // expected-error at +1 {{expected 3 offset values}}
   %1 = memref.subview %0[%arg0, %arg1, 0, 0][%arg2, 0, 0, 0][1, 1, 1, 1]
     : memref<8x16x4xf32> to
-      memref<8x?x4xf32, strided<[?, ?, 4], offset: 0>>
+      memref<8x?x4xf32, strided<[?, ?, 4]>>
   return
 }
 
@@ -755,7 +755,7 @@ func.func @invalid_subview(%arg0 : index, %arg1 : index, %arg2 : index) {
 
 func.func @invalid_subview(%arg0: memref<10xf32>) {
   // expected-error at +1 {{offset 0 is out-of-bounds: 10 >= 10}}
-  %0 = memref.subview %arg0 [10][1][1] : memref<10xf32> to memref<1xf32, strided<[1], offset: 10>>
+  %0 = memref.subview %arg0 [10][1][1] : memref<10xf32> to memref<1xf32, strided<[1]>>
   return
 }
 
@@ -763,7 +763,7 @@ func.func @invalid_subview(%arg0: memref<10xf32>) {
 
 func.func @invalid_subview(%arg0: memref<9xf32>) {
   // expected-error at +1 {{slice along dimension 0 runs out-of-bounds: 9 >= 9}}
-  %0 = memref.subview %arg0 [3][4][2] : memref<9xf32> to memref<4xf32, strided<[2], offset: 3>>
+  %0 = memref.subview %arg0 [3][4][2] : memref<9xf32> to memref<4xf32, strided<[2]>>
   return
 }
 
@@ -781,7 +781,7 @@ func.func @invalid_rank_reducing_subview(%arg0 : index, %arg1 : index, %arg2 : i
 
 func.func @invalid_rank_reducing_subview(%arg0 : index, %arg1 : index, %arg2 : index) {
   %0 = memref.alloc() : memref<8x16x4xf32>
-  // expected-error at +1 {{expected result type to be 'memref<8x16x4xf32, strided<[64, 4, 1], offset: 8>>' or a rank-reduced version. (mismatch of result sizes)}}
+  // expected-error at +1 {{expected result type to be 'memref<8x16x4xf32, strided<[64, 4, 1]>>' or a rank-reduced version. (mismatch of result sizes)}}
   %1 = memref.subview %0[0, 2, 0][8, 16, 4][1, 1, 1]
     : memref<8x16x4xf32> to memref<16x4xf32>
   return
@@ -790,7 +790,7 @@ func.func @invalid_rank_reducing_subview(%arg0 : index, %arg1 : index, %arg2 : i
 // -----
 
 func.func @invalid_rank_reducing_subview(%arg0 : memref<?x?xf32>, %arg1 : index, %arg2 : index) {
-  // expected-error at +1 {{expected result type to be 'memref<?x1xf32, strided<[?, 1], offset: ?>>' or a rank-reduced version. (mismatch of result layout)}}
+  // expected-error at +1 {{expected result type to be 'memref<?x1xf32, strided<[?, 1]>>' or a rank-reduced version. (mismatch of result layout)}}
   %0 = memref.subview %arg0[0, %arg1][%arg2, 1][1, 1] : memref<?x?xf32> to memref<?xf32>
   return
 }
@@ -802,7 +802,7 @@ func.func @invalid_rank_reducing_subview(%arg0 : memref<?x?xf32>, %arg1 : index,
 func.func @subview_bad_offset_1(%arg0: memref<16x16xf32>) {
   %c0 = arith.constant 0 : index
   %c8 = arith.constant 8 : index
-  // expected-error @+1 {{expected result type to be 'memref<8x8xf32, strided<[16, 1], offset: ?>>' or a rank-reduced version}}
+  // expected-error @+1 {{expected result type to be 'memref<8x8xf32, strided<[16, 1]>>' or a rank-reduced version}}
   %s2 = memref.subview %arg0[%c8, %c8][8, 8][1, 1]  : memref<16x16xf32> to memref<8x8xf32, #map0>
   return
 }
@@ -814,7 +814,7 @@ func.func @subview_bad_offset_1(%arg0: memref<16x16xf32>) {
 func.func @subview_bad_offset_2(%arg0: memref<16x16xf32>) {
   %c0 = arith.constant 0 : index
   %c8 = arith.constant 8 : index
-  // expected-error @+1 {{expected result type to be 'memref<8x8xf32, strided<[16, 1], offset: ?>>' or a rank-reduced version}}
+  // expected-error @+1 {{expected result type to be 'memref<8x8xf32, strided<[16, 1]>>' or a rank-reduced version}}
   %s2 = memref.subview %arg0[%c8, 8][8, 8][1, 1]  : memref<16x16xf32> to memref<8x8xf32, #map0>
   return
 }
@@ -824,24 +824,24 @@ func.func @subview_bad_offset_2(%arg0: memref<16x16xf32>) {
 func.func @subview_bad_offset_3(%arg0: memref<16x16xf32>) {
   %c0 = arith.constant 0 : index
   %c8 = arith.constant 8 : index
-  // expected-error @+1 {{expected result type to be 'memref<8x8xf32, strided<[16, 1], offset: ?>>' or a rank-reduced version}}
-  %s2 = memref.subview %arg0[%c8, 8][8, 8][1, 1]  : memref<16x16xf32> to memref<8x8xf32, strided<[16, 1], offset: 437>>
+  // expected-error @+1 {{expected result type to be 'memref<8x8xf32, strided<[16, 1]>>' or a rank-reduced version}}
+  %s2 = memref.subview %arg0[%c8, 8][8, 8][1, 1]  : memref<16x16xf32> to memref<8x8xf32, strided<[16, 1]>>
   return
 }
 
 // -----
 
-func.func @invalid_memref_cast(%arg0 : memref<12x4x16xf32, strided<[64, 16, 1], offset: 0>>) {
+func.func @invalid_memref_cast(%arg0 : memref<12x4x16xf32, strided<[64, 16, 1]>>) {
   // expected-error at +1{{operand type 'memref<12x4x16xf32, strided<[64, 16, 1]>>' and result type 'memref<12x4x16xf32, strided<[128, 32, 2]>>' are cast incompatible}}
-  %0 = memref.cast %arg0 : memref<12x4x16xf32, strided<[64, 16, 1], offset: 0>> to memref<12x4x16xf32, strided<[128, 32, 2], offset: 0>>
+  %0 = memref.cast %arg0 : memref<12x4x16xf32, strided<[64, 16, 1]>> to memref<12x4x16xf32, strided<[128, 32, 2]>>
   return
 }
 
 // -----
 
-func.func @invalid_memref_cast(%arg0 : memref<12x4x16xf32, strided<[64, 16, 1], offset: 0>>) {
-  // expected-error at +1{{operand type 'memref<12x4x16xf32, strided<[64, 16, 1]>>' and result type 'memref<12x4x16xf32, strided<[64, 16, 1], offset: 16>>' are cast incompatible}}
-  %0 = memref.cast %arg0 : memref<12x4x16xf32, strided<[64, 16, 1], offset: 0>> to memref<12x4x16xf32, strided<[64, 16, 1], offset: 16>>
+func.func @invalid_memref_cast(%arg0 : memref<12x4x16xf32, strided<[64, 16, 1]>>) {
+  // expected-error at +1{{operand type 'memref<12x4x16xf32, strided<[64, 16, 1]>>' and result type 'memref<12x4x16xf32, strided<[64, 16, 1]>>' are cast incompatible}}
+  %0 = memref.cast %arg0 : memref<12x4x16xf32, strided<[64, 16, 1]>> to memref<12x4x16xf32, strided<[64, 16, 1]>>
   return
 }
 
@@ -1186,11 +1186,11 @@ func.func @subview_invalid_strides_rank_reduction(%m: memref<7x22x333x4444xi32>)
 // -----
 
 func.func @expand_shape_invalid_output_shape(
-    %arg0: memref<30x20xf32, strided<[4000, 2], offset: 100>>) {
+    %arg0: memref<30x20xf32, strided<[4000, 2]>>) {
   // expected-error @+1 {{invalid output shape provided at pos 2}}
   %0 = memref.expand_shape %arg0 [[0, 1], [2]] output_shape [2, 15, 21] :
-      memref<30x20xf32, strided<[4000, 2], offset: 100>>
-      into memref<2x15x20xf32, strided<[60000, 4000, 2], offset: 100>>
+      memref<30x20xf32, strided<[4000, 2]>>
+      into memref<2x15x20xf32, strided<[60000, 4000, 2]>>
   return
 }
 
diff --git a/mlir/test/Dialect/MemRef/make-loop-independent.mlir b/mlir/test/Dialect/MemRef/make-loop-independent.mlir
index dca7bc1e67586..4b1424d1a084b 100644
--- a/mlir/test/Dialect/MemRef/make-loop-independent.mlir
+++ b/mlir/test/Dialect/MemRef/make-loop-independent.mlir
@@ -17,13 +17,13 @@ func.func @make_alloca_loop_independent(%lb: index, %ub: index, %step: index) {
     %alloc = memref.alloca(%i) : memref<?xf32>
 
     // memref.subview has special handling.
-    // CHECK: %[[subview2:.*]] = memref.subview %[[subview]][1] [5] [1] : memref<?xf32, strided<[1]>> to memref<5xf32, strided<[1], offset: 1>>
-    %view = memref.subview %alloc[1][5][1] : memref<?xf32> to memref<5xf32, strided<[1], offset: 1>>
+    // CHECK: %[[subview2:.*]] = memref.subview %[[subview]][1] [5] [1] : memref<?xf32, strided<[1]>> to memref<5xf32, strided<[1]>>
+    %view = memref.subview %alloc[1][5][1] : memref<?xf32> to memref<5xf32, strided<[1]>>
 
     // This op takes a memref but does not produce one. The new alloc is used
     // directly.
     // CHECK: "test.some_use"(%[[subview2]])
-    "test.some_use"(%view) : (memref<5xf32, strided<[1], offset: 1>>) -> ()
+    "test.some_use"(%view) : (memref<5xf32, strided<[1]>>) -> ()
 
     // This op produces a memref, so the new alloc cannot be used directly.
     // It is wrapped in a unrealized_conversion_cast.
diff --git a/mlir/test/Dialect/MemRef/multibuffer.mlir b/mlir/test/Dialect/MemRef/multibuffer.mlir
index b004ebfa1abd0..68e80048889d6 100644
--- a/mlir/test/Dialect/MemRef/multibuffer.mlir
+++ b/mlir/test/Dialect/MemRef/multibuffer.mlir
@@ -14,10 +14,10 @@ func.func @multi_buffer(%a: memref<1024x1024xf32>) {
 // CHECK: scf.for %[[IV:.*]] = %[[C1]]
   scf.for %arg2 = %c1 to %c1024 step %c3 {
 // CHECK: %[[I:.*]] = affine.apply #[[$MAP1]](%[[IV]])
-// CHECK: %[[SV:.*]] = memref.subview %[[A]][%[[I]], 0, 0] [1, 4, 128] [1, 1, 1] : memref<5x4x128xf32> to memref<4x128xf32, strided<[128, 1], offset: ?>>
+// CHECK: %[[SV:.*]] = memref.subview %[[A]][%[[I]], 0, 0] [1, 4, 128] [1, 1, 1] : memref<5x4x128xf32> to memref<4x128xf32, strided<[128, 1]>>
    %1 = memref.subview %a[%arg2, 0] [4, 128] [1, 1] :
     memref<1024x1024xf32> to memref<4x128xf32, affine_map<(d0, d1)[s0] -> (d0 * 1024 + s0 + d1)>>
-// CHECK: memref.copy %{{.*}}, %[[SV]] : memref<4x128xf32, #{{.*}}> to memref<4x128xf32, strided<[128, 1], offset: ?>>
+// CHECK: memref.copy %{{.*}}, %[[SV]] : memref<4x128xf32, #{{.*}}> to memref<4x128xf32, strided<[128, 1]>>
    memref.copy %1, %0 : memref<4x128xf32, affine_map<(d0, d1)[s0] -> (d0 * 1024 + s0 + d1)>> to memref<4x128xf32>
 // CHECK: "some_use"(%[[SV]]) : (memref<4x128xf32, strided{{.*}}>) -> ()
     "some_use"(%0) : (memref<4x128xf32>) -> ()
@@ -39,10 +39,10 @@ func.func @multi_buffer_affine(%a: memref<1024x1024xf32>) {
 // CHECK: affine.for %[[IV:.*]] = 1
   affine.for %arg2 = 1 to 1024 step 3 {
 // CHECK: %[[I:.*]] = affine.apply #[[$MAP1]](%[[IV]])
-// CHECK: %[[SV:.*]] = memref.subview %[[A]][%[[I]], 0, 0] [1, 4, 128] [1, 1, 1] : memref<5x4x128xf32> to memref<4x128xf32, strided<[128, 1], offset: ?>>
+// CHECK: %[[SV:.*]] = memref.subview %[[A]][%[[I]], 0, 0] [1, 4, 128] [1, 1, 1] : memref<5x4x128xf32> to memref<4x128xf32, strided<[128, 1]>>
    %1 = memref.subview %a[%arg2, 0] [4, 128] [1, 1] :
     memref<1024x1024xf32> to memref<4x128xf32, affine_map<(d0, d1)[s0] -> (d0 * 1024 + s0 + d1)>>
-// CHECK: memref.copy %{{.*}}, %[[SV]] : memref<4x128xf32, #{{.*}}> to memref<4x128xf32, strided<[128, 1], offset: ?>>
+// CHECK: memref.copy %{{.*}}, %[[SV]] : memref<4x128xf32, #{{.*}}> to memref<4x128xf32, strided<[128, 1]>>
    memref.copy %1, %0 : memref<4x128xf32, affine_map<(d0, d1)[s0] -> (d0 * 1024 + s0 + d1)>> to memref<4x128xf32>
 // CHECK: "some_use"(%[[SV]]) : (memref<4x128xf32, strided{{.*}}>) -> ()
     "some_use"(%0) : (memref<4x128xf32>) -> ()
@@ -68,17 +68,17 @@ func.func @multi_buffer_subview_use(%a: memref<1024x1024xf32>) {
 // CHECK: scf.for %[[IV:.*]] = %[[C1]]
   scf.for %arg2 = %c1 to %c1024 step %c3 {
 // CHECK: %[[I:.*]] = affine.apply #[[$MAP1]](%[[IV]])
-// CHECK: %[[SV:.*]] = memref.subview %[[A]][%[[I]], 0, 0] [1, 4, 128] [1, 1, 1] : memref<5x4x128xf32> to memref<4x128xf32, strided<[128, 1], offset: ?>>
+// CHECK: %[[SV:.*]] = memref.subview %[[A]][%[[I]], 0, 0] [1, 4, 128] [1, 1, 1] : memref<5x4x128xf32> to memref<4x128xf32, strided<[128, 1]>>
    %1 = memref.subview %a[%arg2, 0] [4, 128] [1, 1] :
     memref<1024x1024xf32> to memref<4x128xf32, affine_map<(d0, d1)[s0] -> (d0 * 1024 + s0 + d1)>>
-// CHECK: memref.copy %{{.*}}, %[[SV]] : memref<4x128xf32, #{{.*}}> to memref<4x128xf32, strided<[128, 1], offset: ?>>
+// CHECK: memref.copy %{{.*}}, %[[SV]] : memref<4x128xf32, #{{.*}}> to memref<4x128xf32, strided<[128, 1]>>
    memref.copy %1, %0 : memref<4x128xf32, affine_map<(d0, d1)[s0] -> (d0 * 1024 + s0 + d1)>> to memref<4x128xf32>
-// CHECK: %[[SV1:.*]] = memref.subview %[[SV]][0, 1] [4, 127] [1, 1] : memref<4x128xf32, strided<[128, 1], offset: ?>> to memref<4x127xf32, strided<[128, 1], offset: ?>>
+// CHECK: %[[SV1:.*]] = memref.subview %[[SV]][0, 1] [4, 127] [1, 1] : memref<4x128xf32, strided<[128, 1]>> to memref<4x127xf32, strided<[128, 1]>>
    %s = memref.subview %0[0, 1] [4, 127] [1, 1] :
       memref<4x128xf32> to memref<4x127xf32, affine_map<(d0, d1) -> (d0 * 128 + d1 + 1)>>
-// CHECK: "some_use"(%[[SV1]]) : (memref<4x127xf32, strided<[128, 1], offset: ?>>) -> ()
+// CHECK: "some_use"(%[[SV1]]) : (memref<4x127xf32, strided<[128, 1]>>) -> ()
    "some_use"(%s) : (memref<4x127xf32, affine_map<(d0, d1) -> (d0 * 128 + d1 + 1)>>) -> ()
-// CHECK: "some_use"(%[[SV]]) : (memref<4x128xf32, strided<[128, 1], offset: ?>>) -> ()
+// CHECK: "some_use"(%[[SV]]) : (memref<4x128xf32, strided<[128, 1]>>) -> ()
    "some_use"(%0) : (memref<4x128xf32>) -> ()
   }
   return
@@ -120,15 +120,15 @@ func.func @multi_buffer_expand_shape(%a: memref<1024x1024xf32>) {
 // CHECK: scf.for %[[IV:.*]] = %{{.*}}
   scf.for %arg2 = %c1 to %c1024 step %c3 {
 // CHECK: %[[I:.*]] = affine.apply #[[$MAP1]](%[[IV]])
-// CHECK: %[[SV:.*]] = memref.subview %[[ALLOC]][%[[I]], 0, 0] [1, 4, 128] [1, 1, 1] : memref<5x4x128xf32> to memref<4x128xf32, strided<[128, 1], offset: ?>>
+// CHECK: %[[SV:.*]] = memref.subview %[[ALLOC]][%[[I]], 0, 0] [1, 4, 128] [1, 1, 1] : memref<5x4x128xf32> to memref<4x128xf32, strided<[128, 1]>>
     %1 = memref.subview %a[%arg2, 0] [4, 128] [1, 1] :
         memref<1024x1024xf32> to memref<4x128xf32, affine_map<(d0, d1)[s0] -> (d0 * 1024 + s0 + d1)>>
-// CHECK: memref.copy %{{.*}}, %[[SV]] : memref<4x128xf32, #{{.*}}> to memref<4x128xf32, strided<[128, 1], offset: ?>>
+// CHECK: memref.copy %{{.*}}, %[[SV]] : memref<4x128xf32, #{{.*}}> to memref<4x128xf32, strided<[128, 1]>>
     memref.copy %1, %0 : memref<4x128xf32, affine_map<(d0, d1)[s0] -> (d0 * 1024 + s0 + d1)>> to memref<4x128xf32>
-// CHECK: %[[EXPANDED:.*]] = memref.expand_shape %[[SV]] {{\[\[}}0, 1], [2, 3]] output_shape [2, 2, 64, 2] : memref<4x128xf32, strided<[128, 1], offset: ?>> into memref<2x2x64x2xf32, strided<[256, 128, 2, 1], offset: ?>>
+// CHECK: %[[EXPANDED:.*]] = memref.expand_shape %[[SV]] {{\[\[}}0, 1], [2, 3]] output_shape [2, 2, 64, 2] : memref<4x128xf32, strided<[128, 1]>> into memref<2x2x64x2xf32, strided<[256, 128, 2, 1]>>
     %expanded = memref.expand_shape %0 [[0, 1], [2, 3]] output_shape [2, 2, 64, 2]
         : memref<4x128xf32> into memref<2x2x64x2xf32>
-// CHECK: "some_use"(%[[EXPANDED]]) : (memref<2x2x64x2xf32, strided<[256, 128, 2, 1], offset: ?>>) -> ()
+// CHECK: "some_use"(%[[EXPANDED]]) : (memref<2x2x64x2xf32, strided<[256, 128, 2, 1]>>) -> ()
     "some_use"(%expanded) : (memref<2x2x64x2xf32>) -> ()
   }
   return
@@ -150,15 +150,15 @@ func.func @multi_buffer_collapse_shape(%a: memref<1024x1024xf32>) {
 // CHECK: scf.for %[[IV:.*]] = %{{.*}}
   scf.for %arg2 = %c1 to %c1024 step %c3 {
 // CHECK: %[[I:.*]] = affine.apply #[[$MAP1]](%[[IV]])
-// CHECK: %[[SV:.*]] = memref.subview %[[ALLOC]][%[[I]], 0, 0] [1, 4, 128] [1, 1, 1] : memref<5x4x128xf32> to memref<4x128xf32, strided<[128, 1], offset: ?>>
+// CHECK: %[[SV:.*]] = memref.subview %[[ALLOC]][%[[I]], 0, 0] [1, 4, 128] [1, 1, 1] : memref<5x4x128xf32> to memref<4x128xf32, strided<[128, 1]>>
     %1 = memref.subview %a[%arg2, 0] [4, 128] [1, 1] :
         memref<1024x1024xf32> to memref<4x128xf32, affine_map<(d0, d1)[s0] -> (d0 * 1024 + s0 + d1)>>
-// CHECK: memref.copy %{{.*}}, %[[SV]] : memref<4x128xf32, #{{.*}}> to memref<4x128xf32, strided<[128, 1], offset: ?>>
+// CHECK: memref.copy %{{.*}}, %[[SV]] : memref<4x128xf32, #{{.*}}> to memref<4x128xf32, strided<[128, 1]>>
     memref.copy %1, %0 : memref<4x128xf32, affine_map<(d0, d1)[s0] -> (d0 * 1024 + s0 + d1)>> to memref<4x128xf32>
-// CHECK: %[[COLLAPSED:.*]] = memref.collapse_shape %[[SV]] {{\[\[}}0, 1]] : memref<4x128xf32, strided<[128, 1], offset: ?>> into memref<512xf32, strided<[1], offset: ?>>
+// CHECK: %[[COLLAPSED:.*]] = memref.collapse_shape %[[SV]] {{\[\[}}0, 1]] : memref<4x128xf32, strided<[128, 1]>> into memref<512xf32, strided<[1]>>
     %collapsed = memref.collapse_shape %0 [[0, 1]]
         : memref<4x128xf32> into memref<512xf32>
-// CHECK: "some_use"(%[[COLLAPSED]]) : (memref<512xf32, strided<[1], offset: ?>>) -> ()
+// CHECK: "some_use"(%[[COLLAPSED]]) : (memref<512xf32, strided<[1]>>) -> ()
     "some_use"(%collapsed) : (memref<512xf32>) -> ()
   }
   return
@@ -180,12 +180,12 @@ func.func @multi_buffer_cast(%a: memref<1024x1024xf32>) {
 // CHECK: scf.for %[[IV:.*]] = %{{.*}}
   scf.for %arg2 = %c1 to %c1024 step %c3 {
 // CHECK: %[[I:.*]] = affine.apply #[[$MAP1]](%[[IV]])
-// CHECK: %[[SV:.*]] = memref.subview %[[ALLOC]][%[[I]], 0, 0] [1, 4, 128] [1, 1, 1] : memref<5x4x128xf32> to memref<4x128xf32, strided<[128, 1], offset: ?>>
+// CHECK: %[[SV:.*]] = memref.subview %[[ALLOC]][%[[I]], 0, 0] [1, 4, 128] [1, 1, 1] : memref<5x4x128xf32> to memref<4x128xf32, strided<[128, 1]>>
     %1 = memref.subview %a[%arg2, 0] [4, 128] [1, 1] :
         memref<1024x1024xf32> to memref<4x128xf32, affine_map<(d0, d1)[s0] -> (d0 * 1024 + s0 + d1)>>
-// CHECK: memref.copy %{{.*}}, %[[SV]] : memref<4x128xf32, #{{.*}}> to memref<4x128xf32, strided<[128, 1], offset: ?>>
+// CHECK: memref.copy %{{.*}}, %[[SV]] : memref<4x128xf32, #{{.*}}> to memref<4x128xf32, strided<[128, 1]>>
     memref.copy %1, %0 : memref<4x128xf32, affine_map<(d0, d1)[s0] -> (d0 * 1024 + s0 + d1)>> to memref<4x128xf32>
-// CHECK: %[[CAST:.*]] = memref.cast %[[SV]] : memref<4x128xf32, strided<[128, 1], offset: ?>> to memref<?x128xf32>
+// CHECK: %[[CAST:.*]] = memref.cast %[[SV]] : memref<4x128xf32, strided<[128, 1]>> to memref<?x128xf32>
     %casted = memref.cast %0 : memref<4x128xf32> to memref<?x128xf32>
 // CHECK: "some_use"(%[[CAST]]) : (memref<?x128xf32>) -> ()
     "some_use"(%casted) : (memref<?x128xf32>) -> ()
@@ -209,15 +209,15 @@ func.func @multi_buffer_chained_view_ops(%a: memref<1024x1024xf32>) {
 // CHECK: scf.for %[[IV:.*]] = %{{.*}}
   scf.for %arg2 = %c1 to %c1024 step %c3 {
 // CHECK: %[[I:.*]] = affine.apply #[[$MAP1]](%[[IV]])
-// CHECK: %[[SV:.*]] = memref.subview %[[ALLOC]][%[[I]], 0, 0] [1, 4, 128] [1, 1, 1] : memref<5x4x128xf32> to memref<4x128xf32, strided<[128, 1], offset: ?>>
+// CHECK: %[[SV:.*]] = memref.subview %[[ALLOC]][%[[I]], 0, 0] [1, 4, 128] [1, 1, 1] : memref<5x4x128xf32> to memref<4x128xf32, strided<[128, 1]>>
     %1 = memref.subview %a[%arg2, 0] [4, 128] [1, 1] :
         memref<1024x1024xf32> to memref<4x128xf32, affine_map<(d0, d1)[s0] -> (d0 * 1024 + s0 + d1)>>
-// CHECK: memref.copy %{{.*}}, %[[SV]] : memref<4x128xf32, #{{.*}}> to memref<4x128xf32, strided<[128, 1], offset: ?>>
+// CHECK: memref.copy %{{.*}}, %[[SV]] : memref<4x128xf32, #{{.*}}> to memref<4x128xf32, strided<[128, 1]>>
     memref.copy %1, %0 : memref<4x128xf32, affine_map<(d0, d1)[s0] -> (d0 * 1024 + s0 + d1)>> to memref<4x128xf32>
-// CHECK: %[[EXPANDED:.*]] = memref.expand_shape %[[SV]] {{\[\[}}0, 1], [2, 3]] output_shape [2, 2, 64, 2] : memref<4x128xf32, strided<[128, 1], offset: ?>> into memref<2x2x64x2xf32, strided<[256, 128, 2, 1], offset: ?>>
+// CHECK: %[[EXPANDED:.*]] = memref.expand_shape %[[SV]] {{\[\[}}0, 1], [2, 3]] output_shape [2, 2, 64, 2] : memref<4x128xf32, strided<[128, 1]>> into memref<2x2x64x2xf32, strided<[256, 128, 2, 1]>>
     %expanded = memref.expand_shape %0 [[0, 1], [2, 3]] output_shape [2, 2, 64, 2]
         : memref<4x128xf32> into memref<2x2x64x2xf32>
-// CHECK: %[[CAST:.*]] = memref.cast %[[EXPANDED]] : memref<2x2x64x2xf32, strided<[256, 128, 2, 1], offset: ?>> to memref<?x2x64x2xf32>
+// CHECK: %[[CAST:.*]] = memref.cast %[[EXPANDED]] : memref<2x2x64x2xf32, strided<[256, 128, 2, 1]>> to memref<?x2x64x2xf32>
     %casted = memref.cast %expanded : memref<2x2x64x2xf32> to memref<?x2x64x2xf32>
 // CHECK: "some_use"(%[[CAST]]) : (memref<?x2x64x2xf32>) -> ()
     "some_use"(%casted) : (memref<?x2x64x2xf32>) -> ()
diff --git a/mlir/test/Dialect/MemRef/normalize-memrefs-ops.mlir b/mlir/test/Dialect/MemRef/normalize-memrefs-ops.mlir
index 344da4e5e2462..e969ee7bf710b 100644
--- a/mlir/test/Dialect/MemRef/normalize-memrefs-ops.mlir
+++ b/mlir/test/Dialect/MemRef/normalize-memrefs-ops.mlir
@@ -185,7 +185,7 @@ func.func @test_reinterpret_cast(%arg0: memref<5x7xf32>, %arg1: memref<5x7xf32>,
 }
 
 // CHECK-LABEL: reinterpret_cast_non_zero_offset
-func.func @reinterpret_cast_non_zero_offset(%arg0: index, %arg1: memref<1x10x17xi32, strided<[?, ?, ?], offset: ?>>, %arg2: memref<1x10x17xi32, strided<[?, ?, ?], offset: ?>>, %arg3: memref<1x10x17xi32, strided<[?, ?, ?], offset: ?>>) -> (memref<1x5xf32, strided<[17, 1], offset: 27>>, memref<1x5xf32, strided<[17, 1], offset: 27>>, memref<2x17xf32>, memref<1x10x17xi32>, memref<1x10x17xf32>) {
+func.func @reinterpret_cast_non_zero_offset(%arg0: index, %arg1: memref<1x10x17xi32, strided<[?, ?, ?]>>, %arg2: memref<1x10x17xi32, strided<[?, ?, ?]>>, %arg3: memref<1x10x17xi32, strided<[?, ?, ?]>>) -> (memref<1x5xf32, strided<[17, 1]>>, memref<1x5xf32, strided<[17, 1]>>, memref<2x17xf32>, memref<1x10x17xi32>, memref<1x10x17xf32>) {
   %alloc = memref.alloc() {alignment = 64 : i64} : memref<1x10x17xi32>
   %alloc_0 = memref.alloc() {alignment = 64 : i64} : memref<2x17xf32>
   %alloc_1 = memref.alloc() {alignment = 64 : i64} : memref<1x10x17xf32>
@@ -193,6 +193,6 @@ func.func @reinterpret_cast_non_zero_offset(%arg0: index, %arg1: memref<1x10x17x
 ^bb3:  // pred: ^bb1
   // CHECK: %[[REINTERPRET_CAST:.*]] = memref.reinterpret_cast %{{.*}} to offset: [0], sizes: [32], strides: [1] : memref<2x17xf32> to memref<32xf32>
   // CHECK: return %[[REINTERPRET_CAST]], %[[REINTERPRET_CAST]], %{{.*}}, %{{.*}}, %{{.*}} : memref<32xf32>, memref<32xf32>, memref<2x17xf32>, memref<1x10x17xi32>, memref<1x10x17xf32>
-  %reinterpret_cast = memref.reinterpret_cast %alloc_0 to offset: [27], sizes: [1, 5], strides: [17, 1] : memref<2x17xf32> to memref<1x5xf32, strided<[17, 1], offset: 27>>
-  return %reinterpret_cast, %reinterpret_cast, %alloc_0, %alloc, %alloc_1 : memref<1x5xf32, strided<[17, 1], offset: 27>>, memref<1x5xf32, strided<[17, 1], offset: 27>>, memref<2x17xf32>, memref<1x10x17xi32>, memref<1x10x17xf32>
+  %reinterpret_cast = memref.reinterpret_cast %alloc_0 to offset: [27], sizes: [1, 5], strides: [17, 1] : memref<2x17xf32> to memref<1x5xf32, strided<[17, 1]>>
+  return %reinterpret_cast, %reinterpret_cast, %alloc_0, %alloc, %alloc_1 : memref<1x5xf32, strided<[17, 1]>>, memref<1x5xf32, strided<[17, 1]>>, memref<2x17xf32>, memref<1x10x17xi32>, memref<1x10x17xf32>
 }
diff --git a/mlir/test/Dialect/MemRef/normalize-memrefs.mlir b/mlir/test/Dialect/MemRef/normalize-memrefs.mlir
index d2924fb1ecf77..140706bd766ba 100644
--- a/mlir/test/Dialect/MemRef/normalize-memrefs.mlir
+++ b/mlir/test/Dialect/MemRef/normalize-memrefs.mlir
@@ -374,11 +374,11 @@ func.func @neg_map() -> memref<2x3xf32, #neg> {
 // CHECK-LABEL: func @memref_with_strided_offset
 func.func @memref_with_strided_offset(%arg0: tensor<128x512xf32>, %arg1: index, %arg2: index) -> tensor<16x512xf32> {
   %c0 = arith.constant 0 : index
-  %0 = bufferization.to_buffer %arg0 : tensor<128x512xf32> to memref<128x512xf32, strided<[?, ?], offset: ?>>
-  %subview = memref.subview %0[%arg2, 0] [%arg1, 512] [1, 1] : memref<128x512xf32, strided<[?, ?], offset: ?>> to memref<?x512xf32, strided<[?, ?], offset: ?>>
-  // CHECK: %{{.*}} = memref.cast %{{.*}} : memref<?x512xf32, strided<[?, ?], offset: ?>> to memref<16x512xf32, strided<[?, ?], offset: ?>>
-  %cast = memref.cast %subview : memref<?x512xf32, strided<[?, ?], offset: ?>> to memref<16x512xf32, strided<[?, ?], offset: ?>>
-  %1 = bufferization.to_tensor %cast : memref<16x512xf32, strided<[?, ?], offset: ?>> to tensor<16x512xf32>
+  %0 = bufferization.to_buffer %arg0 : tensor<128x512xf32> to memref<128x512xf32, strided<[?, ?]>>
+  %subview = memref.subview %0[%arg2, 0] [%arg1, 512] [1, 1] : memref<128x512xf32, strided<[?, ?]>> to memref<?x512xf32, strided<[?, ?]>>
+  // CHECK: %{{.*}} = memref.cast %{{.*}} : memref<?x512xf32, strided<[?, ?]>> to memref<16x512xf32, strided<[?, ?]>>
+  %cast = memref.cast %subview : memref<?x512xf32, strided<[?, ?]>> to memref<16x512xf32, strided<[?, ?]>>
+  %1 = bufferization.to_tensor %cast : memref<16x512xf32, strided<[?, ?]>> to tensor<16x512xf32>
   return %1 : tensor<16x512xf32>
 }
 
diff --git a/mlir/test/Dialect/MemRef/ops.mlir b/mlir/test/Dialect/MemRef/ops.mlir
index 14ac6a03d6ae0..84f89932e6dd3 100644
--- a/mlir/test/Dialect/MemRef/ops.mlir
+++ b/mlir/test/Dialect/MemRef/ops.mlir
@@ -120,31 +120,31 @@ func.func @dma_ops() {
 
 // CHECK-LABEL: func @memref_reinterpret_cast
 func.func @memref_reinterpret_cast(%in: memref<?xf32>)
-    -> memref<10x?xf32, strided<[?, 1], offset: ?>> {
+    -> memref<10x?xf32, strided<[?, 1]>> {
   %c0 = arith.constant 0 : index
   %c10 = arith.constant 10 : index
   %out = memref.reinterpret_cast %in to
            offset: [%c0], sizes: [10, %c10], strides: [%c10, 1]
-           : memref<?xf32> to memref<10x?xf32, strided<[?, 1], offset: ?>>
-  return %out : memref<10x?xf32, strided<[?, 1], offset: ?>>
+           : memref<?xf32> to memref<10x?xf32, strided<[?, 1]>>
+  return %out : memref<10x?xf32, strided<[?, 1]>>
 }
 
 // CHECK-LABEL: func @memref_reinterpret_cast_static_to_dynamic_sizes
 func.func @memref_reinterpret_cast_static_to_dynamic_sizes(%in: memref<?xf32>)
-    -> memref<10x?xf32, strided<[?, 1], offset: ?>> {
+    -> memref<10x?xf32, strided<[?, 1]>> {
   %out = memref.reinterpret_cast %in to
            offset: [1], sizes: [10, 10], strides: [1, 1]
-           : memref<?xf32> to memref<10x?xf32, strided<[?, 1], offset: ?>>
-  return %out : memref<10x?xf32, strided<[?, 1], offset: ?>>
+           : memref<?xf32> to memref<10x?xf32, strided<[?, 1]>>
+  return %out : memref<10x?xf32, strided<[?, 1]>>
 }
 
 // CHECK-LABEL: func @memref_reinterpret_cast_dynamic_offset
 func.func @memref_reinterpret_cast_dynamic_offset(%in: memref<?xf32>, %offset: index)
-    -> memref<10x?xf32, strided<[?, 1], offset: ?>> {
+    -> memref<10x?xf32, strided<[?, 1]>> {
   %out = memref.reinterpret_cast %in to
            offset: [%offset], sizes: [10, 10], strides: [1, 1]
-           : memref<?xf32> to memref<10x?xf32, strided<[?, 1], offset: ?>>
-  return %out : memref<10x?xf32, strided<[?, 1], offset: ?>>
+           : memref<?xf32> to memref<10x?xf32, strided<[?, 1]>>
+  return %out : memref<10x?xf32, strided<[?, 1]>>
 }
 
 // CHECK-LABEL: func @memref_reshape(
@@ -211,18 +211,18 @@ func.func @memref_alloca_scope() {
 }
 
 // CHECK-LABEL: func @memref_cast(%arg0
-func.func @memref_cast(%arg0: memref<4xf32>, %arg1 : memref<?xf32>, %arg2 : memref<64x16x4xf32, strided<[64, 4, 1], offset: 0>>, %arg3 : memref<4x1x8xf32, strided<[32, 16, 1]>>, %arg4 : memref<4x?x8xf32, strided<[32, 8, 1]>>) {
+func.func @memref_cast(%arg0: memref<4xf32>, %arg1 : memref<?xf32>, %arg2 : memref<64x16x4xf32, strided<[64, 4, 1]>>, %arg3 : memref<4x1x8xf32, strided<[32, 16, 1]>>, %arg4 : memref<4x?x8xf32, strided<[32, 8, 1]>>) {
   // CHECK: memref.cast %{{.*}} : memref<4xf32> to memref<?xf32>
   %0 = memref.cast %arg0 : memref<4xf32> to memref<?xf32>
 
   // CHECK: memref.cast %{{.*}} : memref<?xf32> to memref<4xf32>
   %1 = memref.cast %arg1 : memref<?xf32> to memref<4xf32>
 
-  // CHECK: memref.cast %{{.*}} : memref<64x16x4xf32, strided<[64, 4, 1]>> to memref<64x16x4xf32, strided<[?, ?, ?], offset: ?>>
-  %2 = memref.cast %arg2 : memref<64x16x4xf32, strided<[64, 4, 1], offset: 0>> to memref<64x16x4xf32, strided<[?, ?, ?], offset: ?>>
+  // CHECK: memref.cast %{{.*}} : memref<64x16x4xf32, strided<[64, 4, 1]>> to memref<64x16x4xf32, strided<[?, ?, ?]>>
+  %2 = memref.cast %arg2 : memref<64x16x4xf32, strided<[64, 4, 1]>> to memref<64x16x4xf32, strided<[?, ?, ?]>>
 
-  // CHECK: memref.cast {{%.*}} : memref<64x16x4xf32, strided<[?, ?, ?], offset: ?>> to memref<64x16x4xf32, strided<[64, 4, 1]>>
-  %3 = memref.cast %2 : memref<64x16x4xf32, strided<[?, ?, ?], offset: ?>> to memref<64x16x4xf32, strided<[64, 4, 1], offset: 0>>
+  // CHECK: memref.cast {{%.*}} : memref<64x16x4xf32, strided<[?, ?, ?]>> to memref<64x16x4xf32, strided<[64, 4, 1]>>
+  %3 = memref.cast %2 : memref<64x16x4xf32, strided<[?, ?, ?]>> to memref<64x16x4xf32, strided<[64, 4, 1]>>
 
   // CHECK: memref.cast %{{.*}} : memref<4xf32> to memref<*xf32>
   %4 = memref.cast %1 : memref<4xf32> to memref<*xf32>
@@ -322,13 +322,13 @@ func.func @expand_collapse_shape_static(
     %arg0: memref<3x4x5xf32>,
     %arg1: tensor<3x4x5xf32>,
     %arg2: tensor<3x?x5xf32>,
-    %arg3: memref<30x20xf32, strided<[4000, 2], offset: 100>>,
-    %arg4: memref<1x5xf32, strided<[5, 1], offset: ?>>,
+    %arg3: memref<30x20xf32, strided<[4000, 2]>>,
+    %arg4: memref<1x5xf32, strided<[5, 1]>>,
     %arg5: memref<f32>,
-    %arg6: memref<3x4x5xf32, strided<[240, 60, 10], offset: 0>>,
-    %arg7: memref<1x2049xi64, strided<[?, ?], offset: ?>>,
-    %arg8: memref<1x1x1024xi8, strided<[40960, 4096, 1], offset: 0>>,
-    %arg9: memref<24x1x1x1024xi8, strided<[40960, 40960, 4096, 1], offset: 0>>) {
+    %arg6: memref<3x4x5xf32, strided<[240, 60, 10]>>,
+    %arg7: memref<1x2049xi64, strided<[?, ?]>>,
+    %arg8: memref<1x1x1024xi8, strided<[40960, 4096, 1]>>,
+    %arg9: memref<24x1x1x1024xi8, strided<[40960, 40960, 4096, 1]>>) {
   // Reshapes that collapse and expand back a contiguous buffer.
 //       CHECK:   memref.collapse_shape {{.*}} {{\[}}[0, 1], [2]]
 //  CHECK-SAME:     memref<3x4x5xf32> into memref<12x5xf32>
@@ -368,42 +368,42 @@ func.func @expand_collapse_shape_static(
 // Reshapes with a custom layout map.
 //       CHECK:   memref.expand_shape {{.*}} {{\[}}[0], [1, 2]] output_shape [30, 4, 5]
   %l0 = memref.expand_shape %arg3 [[0], [1, 2]] output_shape [30, 4, 5] :
-      memref<30x20xf32, strided<[4000, 2], offset: 100>>
-      into memref<30x4x5xf32, strided<[4000, 10, 2], offset: 100>>
+      memref<30x20xf32, strided<[4000, 2]>>
+      into memref<30x4x5xf32, strided<[4000, 10, 2]>>
 
 //       CHECK:   memref.expand_shape {{.*}} {{\[}}[0, 1], [2]] output_shape [2, 15, 20]
   %l1 = memref.expand_shape %arg3 [[0, 1], [2]] output_shape [2, 15, 20] :
-      memref<30x20xf32, strided<[4000, 2], offset: 100>>
-      into memref<2x15x20xf32, strided<[60000, 4000, 2], offset: 100>>
+      memref<30x20xf32, strided<[4000, 2]>>
+      into memref<2x15x20xf32, strided<[60000, 4000, 2]>>
 
 //       CHECK:   memref.expand_shape {{.*}} {{\[}}[0], [1, 2]] output_shape [1, 1, 5]
   %r4 = memref.expand_shape %arg4 [[0], [1, 2]] output_shape [1, 1, 5] :
-      memref<1x5xf32, strided<[5, 1], offset: ?>> into
-      memref<1x1x5xf32, strided<[5, 5, 1], offset: ?>>
+      memref<1x5xf32, strided<[5, 1]>> into
+      memref<1x1x5xf32, strided<[5, 5, 1]>>
 
   // Note: Only the collapsed two shapes are contiguous in the follow test case.
 //       CHECK:   memref.collapse_shape {{.*}} {{\[}}[0, 1], [2]]
   %r6 = memref.collapse_shape %arg6 [[0, 1], [2]] :
-      memref<3x4x5xf32, strided<[240, 60, 10], offset: 0>> into
-      memref<12x5xf32, strided<[60, 10], offset: 0>>
+      memref<3x4x5xf32, strided<[240, 60, 10]>> into
+      memref<12x5xf32, strided<[60, 10]>>
 
 //       CHECK:   memref.collapse_shape {{.*}} {{\[}}[0, 1]]
   %r7 = memref.collapse_shape %arg7 [[0, 1]] :
-      memref<1x2049xi64, strided<[?, ?], offset: ?>> into
-      memref<2049xi64, strided<[?], offset: ?>>
+      memref<1x2049xi64, strided<[?, ?]>> into
+      memref<2049xi64, strided<[?]>>
 
-    // %arg8: memref<1x1x1024xi8, strided<[40960, 4096, 1], offset: 0>>,
-    // %arg9: memref<24x1x1x1024xi8, strided<[40960, 40960, 4096, 1], offset: 0>>) {
+    // %arg8: memref<1x1x1024xi8, strided<[40960, 4096, 1]>>,
+    // %arg9: memref<24x1x1x1024xi8, strided<[40960, 40960, 4096, 1]>>) {
 
 //       CHECK:   memref.collapse_shape {{.*}} {{\[}}[0, 1, 2]]
   %r8 = memref.collapse_shape %arg8 [[0, 1, 2]] :
-      memref<1x1x1024xi8, strided<[40960, 4096, 1], offset: 0>> into
-      memref<1024xi8, strided<[1], offset: 0>>
+      memref<1x1x1024xi8, strided<[40960, 4096, 1]>> into
+      memref<1024xi8, strided<[1]>>
 
 //       CHECK:   memref.collapse_shape {{.*}} {{\[}}[0], [1, 2, 3]]
   %r9 = memref.collapse_shape %arg9 [[0], [1, 2, 3]] :
-      memref<24x1x1x1024xi8, strided<[40960, 40960, 4096, 1], offset: 0>> into
-      memref<24x1024xi8, strided<[40960, 1], offset: 0>>
+      memref<24x1x1x1024xi8, strided<[40960, 40960, 4096, 1]>> into
+      memref<24x1024xi8, strided<[40960, 1]>>
 
   // Reshapes that expand and collapse back a contiguous buffer with some 1's.
 //       CHECK:   memref.expand_shape {{.*}} {{\[}}[0, 1], [2], [3, 4]] output_shape [1, 3, 4, 1, 5]
@@ -440,15 +440,15 @@ func.func @expand_collapse_shape_static(
 
 // CHECK-LABEL: func @expand_collapse_shape_dynamic
 func.func @expand_collapse_shape_dynamic(%arg0: memref<?x?x?xf32>,
-         %arg1: memref<?x?x?xf32, strided<[?, ?, 1], offset: 0>>,
-         %arg2: memref<?x?x?xf32, strided<[?, ?, 1], offset: ?>>,
-         %arg3: memref<?x42xf32, strided<[42, 1], offset: 0>>,
+         %arg1: memref<?x?x?xf32, strided<[?, ?, 1]>>,
+         %arg2: memref<?x?x?xf32, strided<[?, ?, 1]>>,
+         %arg3: memref<?x42xf32, strided<[42, 1]>>,
          %arg4: index,
          %arg5: index,
          %arg6: index,
          %arg7: memref<4x?x4xf32>,
-         %arg8: memref<1x1x18x?xf32, strided<[?, ?, ?, 1], offset: ?>>,
-         %arg9: memref<3x3x1x96xf32, strided<[288, 96, 96, 1], offset: 864>>) {
+         %arg8: memref<1x1x18x?xf32, strided<[?, ?, ?, 1]>>,
+         %arg9: memref<3x3x1x96xf32, strided<[288, 96, 96, 1]>>) {
 
 //       CHECK:   memref.collapse_shape {{.*}} {{\[}}[0, 1], [2]]
 //  CHECK-SAME:     memref<?x?x?xf32> into memref<?x?xf32>
@@ -463,31 +463,31 @@ func.func @expand_collapse_shape_dynamic(%arg0: memref<?x?x?xf32>,
 //       CHECK:   memref.collapse_shape {{.*}} {{\[}}[0, 1], [2]]
 //  CHECK-SAME:     memref<?x?x?xf32, strided<[?, ?, 1]>> into memref<?x?xf32, strided<[?, 1]>>
   %1 = memref.collapse_shape %arg1 [[0, 1], [2]] :
-    memref<?x?x?xf32, strided<[?, ?, 1], offset: 0>> into
-    memref<?x?xf32, strided<[?, 1], offset: 0>>
+    memref<?x?x?xf32, strided<[?, ?, 1]>> into
+    memref<?x?xf32, strided<[?, 1]>>
 
 //       CHECK:   memref.expand_shape {{.*}} {{\[}}[0, 1], [2]] output_shape [%arg4, 4, %arg5]
 //  CHECK-SAME:     memref<?x?xf32, strided<[?, 1]>> into memref<?x4x?xf32, strided<[?, ?, 1]>>
   %r1 = memref.expand_shape %1 [[0, 1], [2]] output_shape [%arg4, 4, %arg5] :
-    memref<?x?xf32, strided<[?, 1], offset: 0>> into
-    memref<?x4x?xf32, strided<[?, ?, 1], offset: 0>>
+    memref<?x?xf32, strided<[?, 1]>> into
+    memref<?x4x?xf32, strided<[?, ?, 1]>>
 
 //       CHECK:   memref.collapse_shape {{.*}} {{\[}}[0, 1], [2]]
-//  CHECK-SAME:     memref<?x?x?xf32, strided<[?, ?, 1], offset: ?>> into memref<?x?xf32, strided<[?, 1], offset: ?>>
+//  CHECK-SAME:     memref<?x?x?xf32, strided<[?, ?, 1]>> into memref<?x?xf32, strided<[?, 1]>>
   %2 = memref.collapse_shape %arg2 [[0, 1], [2]] :
-    memref<?x?x?xf32, strided<[?, ?, 1], offset: ?>> into
-    memref<?x?xf32, strided<[?, 1], offset: ?>>
+    memref<?x?x?xf32, strided<[?, ?, 1]>> into
+    memref<?x?xf32, strided<[?, 1]>>
 
 //       CHECK:   memref.expand_shape {{.*}} {{\[}}[0, 1], [2]] output_shape [%arg4, 4, %arg5]
-//  CHECK-SAME:     memref<?x?xf32, strided<[?, 1], offset: ?>> into memref<?x4x?xf32, strided<[?, ?, 1], offset: ?>>
+//  CHECK-SAME:     memref<?x?xf32, strided<[?, 1]>> into memref<?x4x?xf32, strided<[?, ?, 1]>>
   %r2 = memref.expand_shape %2 [[0, 1], [2]] output_shape [%arg4, 4, %arg5] :
-    memref<?x?xf32, strided<[?, 1], offset: ?>> into
-    memref<?x4x?xf32, strided<[?, ?, 1], offset: ?>>
+    memref<?x?xf32, strided<[?, 1]>> into
+    memref<?x4x?xf32, strided<[?, ?, 1]>>
 
 //       CHECK:   memref.collapse_shape {{.*}} {{\[}}[0, 1]]
 //  CHECK-SAME:     memref<?x42xf32, strided<[42, 1]>> into memref<?xf32, strided<[1]>>
   %3 = memref.collapse_shape %arg3 [[0, 1]] :
-    memref<?x42xf32, strided<[42, 1], offset: 0>> into
+    memref<?x42xf32, strided<[42, 1]>> into
     memref<?xf32, strided<[1]>>
 
 //       CHECK:   memref.expand_shape {{.*}} {{\[}}[0, 1]] output_shape [%arg6, 42]
@@ -500,14 +500,14 @@ func.func @expand_collapse_shape_dynamic(%arg0: memref<?x?x?xf32>,
         : memref<4x?x4xf32> into memref<2x2x?x2x2xf32>
 
 //       CHECK:   memref.collapse_shape {{.*}} {{\[}}[0, 1], [2], [3]]
-//  CHECK-SAME:     memref<1x1x18x?xf32, strided<[?, ?, ?, 1], offset: ?>> into memref<1x18x?xf32, strided<[?, ?, 1], offset: ?>>
-  %5 = memref.collapse_shape %arg8 [[0, 1], [2], [3]] : memref<1x1x18x?xf32, strided<[?, ?, ?, 1], offset: ?>> into memref<1x18x?xf32, strided<[?, ?, 1], offset: ?>>
+//  CHECK-SAME:     memref<1x1x18x?xf32, strided<[?, ?, ?, 1]>> into memref<1x18x?xf32, strided<[?, ?, 1]>>
+  %5 = memref.collapse_shape %arg8 [[0, 1], [2], [3]] : memref<1x1x18x?xf32, strided<[?, ?, ?, 1]>> into memref<1x18x?xf32, strided<[?, ?, 1]>>
 
 //       CHECK:   memref.collapse_shape {{.*}} {{\[}}[0], [1, 2, 3]]
-//  CHECK-SAME:     memref<3x3x1x96xf32, strided<[288, 96, 96, 1], offset: 864>> into memref<3x288xf32, strided<[288, 1], offset: 864>>
+//  CHECK-SAME:     memref<3x3x1x96xf32, strided<[288, 96, 96, 1]>> into memref<3x288xf32, strided<[288, 1]>>
   %6 = memref.collapse_shape %arg9 [[0], [1, 2, 3]] :
-    memref<3x3x1x96xf32, strided<[288, 96, 96, 1], offset: 864>> into
-    memref<3x288xf32, strided<[288, 1], offset: 864>>
+    memref<3x3x1x96xf32, strided<[288, 96, 96, 1]>> into
+    memref<3x288xf32, strided<[288, 1]>>
   return
 }
 
@@ -535,24 +535,24 @@ func.func @collapse_shape_to_dynamic
 
 // CHECK-LABEL: func @expand_collapse_shape_transposed_layout
 func.func @expand_collapse_shape_transposed_layout(
-    %m0: memref<?x?xf32, strided<[1, 10], offset: 0>>,
-    %m1: memref<4x5x6xf32, strided<[1, ?, 1000], offset: 0>>,
+    %m0: memref<?x?xf32, strided<[1, 10]>>,
+    %m1: memref<4x5x6xf32, strided<[1, ?, 1000]>>,
     %sz0: index,
     %sz1: index) {
 
   %r0 = memref.expand_shape %m0 [[0], [1, 2]] output_shape [%sz0, %sz1, 5] :
-    memref<?x?xf32, strided<[1, 10], offset: 0>> into
-    memref<?x?x5xf32, strided<[1, 50, 10], offset: 0>>
+    memref<?x?xf32, strided<[1, 10]>> into
+    memref<?x?x5xf32, strided<[1, 50, 10]>>
   %rr0 = memref.collapse_shape %r0 [[0], [1, 2]] :
-    memref<?x?x5xf32, strided<[1, 50, 10], offset: 0>> into
-    memref<?x?xf32, strided<[1, 10], offset: 0>>
+    memref<?x?x5xf32, strided<[1, 50, 10]>> into
+    memref<?x?xf32, strided<[1, 10]>>
 
   %r1 = memref.expand_shape %m1 [[0, 1], [2], [3, 4]] output_shape [2, 2, 5, 2, 3] :
-    memref<4x5x6xf32, strided<[1, ?, 1000], offset: 0>> into
-    memref<2x2x5x2x3xf32, strided<[2, 1, ?, 3000, 1000], offset: 0>>
+    memref<4x5x6xf32, strided<[1, ?, 1000]>> into
+    memref<2x2x5x2x3xf32, strided<[2, 1, ?, 3000, 1000]>>
   %rr1 = memref.collapse_shape %r1 [[0, 1], [2], [3, 4]] :
-    memref<2x2x5x2x3xf32, strided<[2, 1, ?, 3000, 1000], offset: 0>> into
-    memref<4x5x6xf32, strided<[1, ?, 1000], offset: 0>>
+    memref<2x2x5x2x3xf32, strided<[2, 1, ?, 3000, 1000]>> into
+    memref<4x5x6xf32, strided<[1, ?, 1000]>>
   return
 }
 
@@ -594,7 +594,7 @@ func.func @generic_atomic_rmw(%I: memref<1x2xf32>, %i : index, %j : index) {
 // -----
 
 func.func @extract_strided_metadata(%memref : memref<10x?xf32>)
-  -> memref<?x?xf32, strided<[?, ?], offset: ?>> {
+  -> memref<?x?xf32, strided<[?, ?]>> {
 
   %base, %offset, %sizes:2, %strides:2 = memref.extract_strided_metadata %memref
     : memref<10x?xf32> -> memref<f32>, index, index, index, index, index
@@ -603,9 +603,9 @@ func.func @extract_strided_metadata(%memref : memref<10x?xf32>)
       offset: [%offset],
       sizes: [%sizes#0, %sizes#1],
       strides: [%strides#0, %strides#1]
-    : memref<f32> to memref<?x?xf32, strided<[?, ?], offset: ?>>
+    : memref<f32> to memref<?x?xf32, strided<[?, ?]>>
 
-  return %m2: memref<?x?xf32, strided<[?, ?], offset: ?>>
+  return %m2: memref<?x?xf32, strided<[?, ?]>>
 }
 
 // -----
diff --git a/mlir/test/Dialect/MemRef/subview.mlir b/mlir/test/Dialect/MemRef/subview.mlir
index fd8aaaf86b2d8..ee37ac307c8bb 100644
--- a/mlir/test/Dialect/MemRef/subview.mlir
+++ b/mlir/test/Dialect/MemRef/subview.mlir
@@ -13,13 +13,13 @@ func.func @memref_subview(%arg0 : index, %arg1 : index, %arg2 : index) {
   %c0 = arith.constant 0 : index
   %c1 = arith.constant 1 : index
 
-  %0 = memref.alloc() : memref<8x16x4xf32, strided<[64, 4, 1], offset: 0>>
+  %0 = memref.alloc() : memref<8x16x4xf32, strided<[64, 4, 1]>>
   // CHECK: subview %{{.*}}[%[[c0]], %[[c0]], %[[c0]]] [%{{.*}}, %{{.*}}, %{{.*}}] [%[[c1]], %[[c1]], %[[c1]]] :
   // CHECK-SAME: memref<8x16x4xf32, strided<[64, 4, 1]>>
-  // CHECK-SAME: to memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
+  // CHECK-SAME: to memref<?x?x?xf32, strided<[?, ?, ?]>>
   %1 = memref.subview %0[%c0, %c0, %c0][%arg0, %arg1, %arg2][%c1, %c1, %c1]
-    : memref<8x16x4xf32, strided<[64, 4, 1], offset: 0>> to
-      memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
+    : memref<8x16x4xf32, strided<[64, 4, 1]>> to
+      memref<?x?x?xf32, strided<[?, ?, ?]>>
 
   %2 = memref.alloc()[%arg2] : memref<64xf32, affine_map<(d0)[s0] -> (d0 + s0)>>
   // CHECK: memref.subview %{{.*}}[%[[c1]]] [%{{.*}}] [%[[c1]]] :
@@ -32,17 +32,17 @@ func.func @memref_subview(%arg0 : index, %arg1 : index, %arg2 : index) {
   %4 = memref.alloc() : memref<64x22xf32, strided<[22, 1]>>
   // CHECK: memref.subview %{{.*}}[%[[c0]], %[[c1]]] [%{{.*}}, %{{.*}}] [%[[c1]], %[[c0]]] :
   // CHECK-SAME: memref<64x22xf32, strided<[22, 1]>>
-  // CHECK-SAME: to memref<?x?xf32, strided<[?, ?], offset: ?>>
+  // CHECK-SAME: to memref<?x?xf32, strided<[?, ?]>>
   %5 = memref.subview %4[%c0, %c1][%arg0, %arg1][%c1, %c0]
-    : memref<64x22xf32, strided<[22, 1], offset: 0>> to
-      memref<?x?xf32, strided<[?, ?], offset: ?>>
+    : memref<64x22xf32, strided<[22, 1]>> to
+      memref<?x?xf32, strided<[?, ?]>>
 
   // CHECK: memref.subview %{{.*}}[0, 2, 0] [4, 4, 4] [1, 1, 1] :
   // CHECK-SAME: memref<8x16x4xf32, strided<[64, 4, 1]>>
-  // CHECK-SAME: to memref<4x4x4xf32, strided<[64, 4, 1], offset: 8>>
+  // CHECK-SAME: to memref<4x4x4xf32, strided<[64, 4, 1]>>
   %6 = memref.subview %0[0, 2, 0][4, 4, 4][1, 1, 1]
-    : memref<8x16x4xf32, strided<[64, 4, 1], offset: 0>> to
-      memref<4x4x4xf32, strided<[64, 4, 1], offset: 8>>
+    : memref<8x16x4xf32, strided<[64, 4, 1]>> to
+      memref<4x4x4xf32, strided<[64, 4, 1]>>
 
   %7 = memref.alloc(%arg1, %arg2) : memref<?x?xf32>
   // CHECK: memref.subview {{%.*}}[0, 0] [4, 4] [1, 1] :
@@ -54,33 +54,33 @@ func.func @memref_subview(%arg0 : index, %arg1 : index, %arg2 : index) {
   %9 = memref.alloc() : memref<16x4xf32>
   // CHECK: memref.subview {{%.*}}[{{%.*}}, {{%.*}}] [4, 4] [{{%.*}}, {{%.*}}] :
   // CHECK-SAME: memref<16x4xf32>
-  // CHECK-SAME: to memref<4x4xf32, strided<[?, ?], offset: ?>>
+  // CHECK-SAME: to memref<4x4xf32, strided<[?, ?]>>
   %10 = memref.subview %9[%arg1, %arg1][4, 4][%arg2, %arg2]
-    : memref<16x4xf32> to memref<4x4xf32, strided<[?, ?], offset: ?>>
+    : memref<16x4xf32> to memref<4x4xf32, strided<[?, ?]>>
 
   // CHECK: memref.subview {{%.*}}[{{%.*}}, {{%.*}}] [4, 4] [2, 2] :
   // CHECK-SAME: memref<16x4xf32>
-  // CHECK-SAME: to memref<4x4xf32, strided<[8, 2], offset: ?>>
+  // CHECK-SAME: to memref<4x4xf32, strided<[8, 2]>>
   %11 = memref.subview %9[%arg1, %arg2][4, 4][2, 2]
-    : memref<16x4xf32> to memref<4x4xf32, strided<[8, 2], offset: ?>>
+    : memref<16x4xf32> to memref<4x4xf32, strided<[8, 2]>>
 
   %12 = memref.alloc() : memref<1x9x1x4x1xf32, strided<[36, 36, 4, 4, 1]>>
   // CHECK: memref.subview
   // CHECK-SAME: [1, 9, 1, 4, 1]
-  // CHECK-SAME: memref<1x9x1x4x1xf32, strided<[36, 36, 4, 4, 1]>> to memref<9x4xf32, strided<[?, ?], offset: ?>>
-  %13 = memref.subview %12[%arg1, %arg1, %arg1, %arg1, %arg1][1, 9, 1, 4, 1][%arg2, %arg2, %arg2, %arg2, %arg2] : memref<1x9x1x4x1xf32, strided<[36, 36, 4, 4, 1], offset: 0>> to memref<9x4xf32, strided<[?, ?], offset: ?>>
+  // CHECK-SAME: memref<1x9x1x4x1xf32, strided<[36, 36, 4, 4, 1]>> to memref<9x4xf32, strided<[?, ?]>>
+  %13 = memref.subview %12[%arg1, %arg1, %arg1, %arg1, %arg1][1, 9, 1, 4, 1][%arg2, %arg2, %arg2, %arg2, %arg2] : memref<1x9x1x4x1xf32, strided<[36, 36, 4, 4, 1]>> to memref<9x4xf32, strided<[?, ?]>>
   // CHECK: memref.subview
   // CHECK-SAME: [1, 9, 1, 4, 1]
-  // CHECK-SAME: memref<1x9x1x4x1xf32, strided<[36, 36, 4, 4, 1]>> to memref<1x9x4xf32, strided<[?, ?, ?], offset: ?>>
-  %14 = memref.subview %12[%arg1, %arg1, %arg1, %arg1, %arg1][1, 9, 1, 4, 1][%arg2, %arg2, %arg2, %arg2, %arg2] : memref<1x9x1x4x1xf32, strided<[36, 36, 4, 4, 1], offset: 0>> to memref<1x9x4xf32, strided<[?, ?, ?], offset: ?>>
+  // CHECK-SAME: memref<1x9x1x4x1xf32, strided<[36, 36, 4, 4, 1]>> to memref<1x9x4xf32, strided<[?, ?, ?]>>
+  %14 = memref.subview %12[%arg1, %arg1, %arg1, %arg1, %arg1][1, 9, 1, 4, 1][%arg2, %arg2, %arg2, %arg2, %arg2] : memref<1x9x1x4x1xf32, strided<[36, 36, 4, 4, 1]>> to memref<1x9x4xf32, strided<[?, ?, ?]>>
 
-  %15 = memref.alloc(%arg1, %arg2)[%c0, %c1, %arg1, %arg0, %arg0, %arg2, %arg2] : memref<1x?x5x1x?x1xf32, strided<[?, ?, ?, ?, ?, ?], offset: ?>>
+  %15 = memref.alloc(%arg1, %arg2)[%c1, %arg1, %arg0, %arg0, %arg2, %arg2] : memref<1x?x5x1x?x1xf32, strided<[?, ?, ?, ?, ?, ?]>>
   // CHECK: memref.subview %{{.*}}[0, 0, 0, 0, 0, 0] [1, %{{.*}}, 5, 1, %{{.*}}, 1] [1, 1, 1, 1, 1, 1]  :
-  // CHECK-SAME: memref<1x?x5x1x?x1xf32, strided<[?, ?, ?, ?, ?, ?], offset: ?>> to memref<?x5x?xf32, strided<[?, ?, ?], offset: ?>>
-  %16 = memref.subview %15[0, 0, 0, 0, 0, 0][1, %arg1, 5, 1, %arg2, 1][1, 1, 1, 1, 1, 1] : memref<1x?x5x1x?x1xf32, strided<[?, ?, ?, ?, ?, ?], offset: ?>> to memref<?x5x?xf32, strided<[?, ?, ?], offset: ?>>
+  // CHECK-SAME: memref<1x?x5x1x?x1xf32, strided<[?, ?, ?, ?, ?, ?]>> to memref<?x5x?xf32, strided<[?, ?, ?]>>
+  %16 = memref.subview %15[0, 0, 0, 0, 0, 0][1, %arg1, 5, 1, %arg2, 1][1, 1, 1, 1, 1, 1] : memref<1x?x5x1x?x1xf32, strided<[?, ?, ?, ?, ?, ?]>> to memref<?x5x?xf32, strided<[?, ?, ?]>>
   // CHECK: memref.subview %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}] [1, %{{.*}}, 5, 1, %{{.*}}, 1] [1, 1, 1, 1, 1, 1]  :
-  // CHECK-SAME: memref<1x?x5x1x?x1xf32, strided<[?, ?, ?, ?, ?, ?], offset: ?>> to memref<?x5x?x1xf32, strided<[?, ?, ?, ?], offset: ?>>
-  %17 = memref.subview %15[%arg1, %arg1, %arg1, %arg1, %arg1, %arg1][1, %arg1, 5, 1, %arg2, 1][1, 1, 1, 1, 1, 1] :  memref<1x?x5x1x?x1xf32, strided<[?, ?, ?, ?, ?, ?], offset: ?>> to memref<?x5x?x1xf32, strided<[?, ?, ?, ?], offset: ?>>
+  // CHECK-SAME: memref<1x?x5x1x?x1xf32, strided<[?, ?, ?, ?, ?, ?]>> to memref<?x5x?x1xf32, strided<[?, ?, ?, ?]>>
+  %17 = memref.subview %15[%arg1, %arg1, %arg1, %arg1, %arg1, %arg1][1, %arg1, 5, 1, %arg2, 1][1, 1, 1, 1, 1, 1] :  memref<1x?x5x1x?x1xf32, strided<[?, ?, ?, ?, ?, ?]>> to memref<?x5x?x1xf32, strided<[?, ?, ?, ?]>>
 
   %18 = memref.alloc() : memref<1x8xf32>
   // CHECK: memref.subview %{{.*}}[0, 0] [1, 8] [1, 1]  : memref<1x8xf32> to memref<8xf32>
@@ -90,19 +90,19 @@ func.func @memref_subview(%arg0 : index, %arg1 : index, %arg2 : index) {
   // CHECK: memref.subview %{{.*}}[0, 0, 0] [1, 16, 4] [1, 1, 1]  : memref<8x16x4xf32> to memref<16x4xf32>
   %21 = memref.subview %20[0, 0, 0][1, 16, 4][1, 1, 1] : memref<8x16x4xf32> to memref<16x4xf32>
 
-  %22 = memref.subview %20[3, 4, 1][1, 6, 3][1, 1, 1] : memref<8x16x4xf32> to memref<6x3xf32, strided<[4, 1], offset: 209>>
+  %22 = memref.subview %20[3, 4, 1][1, 6, 3][1, 1, 1] : memref<8x16x4xf32> to memref<6x3xf32, strided<[4, 1]>>
 
   %23 = memref.alloc() : memref<f32>
   %78 = memref.subview %23[] [] []  : memref<f32> to memref<f32>
 
   /// Subview with only leading operands.
   %24 = memref.alloc() : memref<5x3xf32>
-  // CHECK: memref.subview %{{.*}}[2, 0] [3, 3] [1, 1] : memref<5x3xf32> to memref<3x3xf32, strided<[3, 1], offset: 6>>
-  %25 = memref.subview %24[2, 0][3, 3][1, 1]: memref<5x3xf32> to memref<3x3xf32, strided<[3, 1], offset: 6>>
+  // CHECK: memref.subview %{{.*}}[2, 0] [3, 3] [1, 1] : memref<5x3xf32> to memref<3x3xf32, strided<[3, 1]>>
+  %25 = memref.subview %24[2, 0][3, 3][1, 1]: memref<5x3xf32> to memref<3x3xf32, strided<[3, 1]>>
 
   /// Rank-reducing subview with only leading operands.
-  // CHECK: memref.subview %{{.*}}[1, 0] [1, 3] [1, 1] : memref<5x3xf32> to memref<3xf32, strided<[1], offset: 3>>
-  %26 = memref.subview %24[1, 0][1, 3][1, 1]: memref<5x3xf32> to memref<3xf32, strided<[1], offset: 3>>
+  // CHECK: memref.subview %{{.*}}[1, 0] [1, 3] [1, 1] : memref<5x3xf32> to memref<3xf32, strided<[1]>>
+  %26 = memref.subview %24[1, 0][1, 3][1, 1]: memref<5x3xf32> to memref<3xf32, strided<[1]>>
 
   // Corner-case of 0-D rank-reducing subview with an offset.
   // CHECK: memref.subview %{{.*}}[1, 1] [1, 1] [1, 1] : memref<5x3xf32> to memref<f32, #[[$SUBVIEW_MAP11]]>
diff --git a/mlir/test/Dialect/MemRef/transform-ops.mlir b/mlir/test/Dialect/MemRef/transform-ops.mlir
index 7fc84d419f18d..e1986009ef9b3 100644
--- a/mlir/test/Dialect/MemRef/transform-ops.mlir
+++ b/mlir/test/Dialect/MemRef/transform-ops.mlir
@@ -51,9 +51,9 @@ func.func @multi_buffer(%in: memref<16xf32>) {
   // CHECK: scf.for %[[IV:.*]] = %[[C0]]
   scf.for %i0 = %c0 to %c16 step %c4 {
     // CHECK: %[[I:.*]] = affine.apply #[[$MAP0]](%[[IV]])
-    // CHECK: %[[SV:.*]] = memref.subview %[[A]][%[[I]], 0] [1, 4] [1, 1] : memref<2x4xf32> to memref<4xf32, strided<[1], offset: ?>>
+    // CHECK: %[[SV:.*]] = memref.subview %[[A]][%[[I]], 0] [1, 4] [1, 1] : memref<2x4xf32> to memref<4xf32, strided<[1]>>
     %1 = memref.subview %in[%i0] [4] [1] : memref<16xf32> to memref<4xf32, affine_map<(d0)[s0] -> (d0 + s0)>>
-    // CHECK: memref.copy %{{.*}}, %[[SV]] : memref<4xf32, #[[$MAP1]]> to memref<4xf32, strided<[1], offset: ?>>
+    // CHECK: memref.copy %{{.*}}, %[[SV]] : memref<4xf32, #[[$MAP1]]> to memref<4xf32, strided<[1]>>
     memref.copy %1, %tmp :  memref<4xf32, affine_map<(d0)[s0] -> (d0 + s0)>> to memref<4xf32>
 
     "some_use"(%tmp) : (memref<4xf32>) ->()
@@ -88,9 +88,9 @@ func.func @multi_buffer_on_affine_loop(%in: memref<16xf32>) {
   // CHECK: affine.for %[[IV:.*]] = 0
   affine.for %i0 = 0 to 16 step 4 {
     // CHECK: %[[I:.*]] = affine.apply #[[$MAP0]](%[[IV]])
-    // CHECK: %[[SV:.*]] = memref.subview %[[A]][%[[I]], 0] [1, 4] [1, 1] : memref<2x4xf32> to memref<4xf32, strided<[1], offset: ?>>
+    // CHECK: %[[SV:.*]] = memref.subview %[[A]][%[[I]], 0] [1, 4] [1, 1] : memref<2x4xf32> to memref<4xf32, strided<[1]>>
     %1 = memref.subview %in[%i0] [4] [1] : memref<16xf32> to memref<4xf32, affine_map<(d0)[s0] -> (d0 + s0)>>
-    // CHECK: memref.copy %{{.*}}, %[[SV]] : memref<4xf32, #[[$MAP1]]> to memref<4xf32, strided<[1], offset: ?>>
+    // CHECK: memref.copy %{{.*}}, %[[SV]] : memref<4xf32, #[[$MAP1]]> to memref<4xf32, strided<[1]>>
     memref.copy %1, %tmp :  memref<4xf32, affine_map<(d0)[s0] -> (d0 + s0)>> to memref<4xf32>
 
     "some_use"(%tmp) : (memref<4xf32>) ->()
@@ -209,9 +209,9 @@ func.func @multi_buffer_one_alloc_with_use_outside_of_loop(%in: memref<16xf32>)
   // CHECK: scf.for %[[IV:.*]] = %[[C0]]
   scf.for %i0 = %c0 to %c16 step %c4 {
     // CHECK: %[[I:.*]] = affine.apply #[[$MAP0]](%[[IV]])
-    // CHECK: %[[SV:.*]] = memref.subview %[[A]][%[[I]], 0] [1, 4] [1, 1] : memref<2x4xf32> to memref<4xf32, strided<[1], offset: ?>>
+    // CHECK: %[[SV:.*]] = memref.subview %[[A]][%[[I]], 0] [1, 4] [1, 1] : memref<2x4xf32> to memref<4xf32, strided<[1]>>
     %1 = memref.subview %in[%i0] [4] [1] : memref<16xf32> to memref<4xf32, affine_map<(d0)[s0] -> (d0 + s0)>>
-    // CHECK: memref.copy %{{.*}}, %[[SV]] : memref<4xf32, #[[$MAP1]]> to memref<4xf32, strided<[1], offset: ?>>
+    // CHECK: memref.copy %{{.*}}, %[[SV]] : memref<4xf32, #[[$MAP1]]> to memref<4xf32, strided<[1]>>
     memref.copy %1, %tmp :  memref<4xf32, affine_map<(d0)[s0] -> (d0 + s0)>> to memref<4xf32>
 
     "some_use"(%tmp) : (memref<4xf32>) ->()
@@ -249,7 +249,7 @@ func.func @multi_buffer_no_analysis(%in: memref<16xf32>) {
   // CHECK: scf.for %[[IV:.*]] = %[[C0]]
   scf.for %i0 = %c0 to %c16 step %c4 {
   // CHECK: %[[I:.*]] = affine.apply #[[$MAP0]](%[[IV]])
-  // CHECK: %[[SV:.*]] = memref.subview %[[A]][%[[I]], 0] [1, 4] [1, 1] : memref<2x4xf32> to memref<4xf32, strided<[1], offset: ?>>
+  // CHECK: %[[SV:.*]] = memref.subview %[[A]][%[[I]], 0] [1, 4] [1, 1] : memref<2x4xf32> to memref<4xf32, strided<[1]>>
     "some_write_read"(%tmp) : (memref<4xf32>) ->()
   }
   return
@@ -284,7 +284,7 @@ func.func @multi_buffer_dealloc(%in: memref<16xf32>) {
   // CHECK: scf.for %[[IV:.*]] = %[[C0]]
   scf.for %i0 = %c0 to %c16 step %c4 {
   // CHECK: %[[I:.*]] = affine.apply #[[$MAP0]](%[[IV]])
-  // CHECK: %[[SV:.*]] = memref.subview %[[A]][%[[I]], 0] [1, 4] [1, 1] : memref<2x4xf32> to memref<4xf32, strided<[1], offset: ?>>
+  // CHECK: %[[SV:.*]] = memref.subview %[[A]][%[[I]], 0] [1, 4] [1, 1] : memref<2x4xf32> to memref<4xf32, strided<[1]>>
     "some_write_read"(%tmp) : (memref<4xf32>) ->()
   }
 
diff --git a/mlir/test/Dialect/MemRef/value-bounds-op-interface-impl.mlir b/mlir/test/Dialect/MemRef/value-bounds-op-interface-impl.mlir
index d0aec68d54988..0a8d1105521d3 100644
--- a/mlir/test/Dialect/MemRef/value-bounds-op-interface-impl.mlir
+++ b/mlir/test/Dialect/MemRef/value-bounds-op-interface-impl.mlir
@@ -123,7 +123,7 @@ func.func @memref_rank(%m: memref<5xf32>) -> index {
 //  CHECK-SAME:     %[[m:.*]]: memref<?xf32>, %[[sz:.*]]: index
 //       CHECK:   return %[[sz]]
 func.func @memref_subview(%m: memref<?xf32>, %sz: index) -> index {
-  %0 = memref.subview %m[2][%sz][1] : memref<?xf32> to memref<?xf32, strided<[1], offset: 2>>
-  %1 = "test.reify_bound"(%0) {dim = 0} : (memref<?xf32, strided<[1], offset: 2>>) -> (index)
+  %0 = memref.subview %m[2][%sz][1] : memref<?xf32> to memref<?xf32, strided<[1]>>
+  %1 = "test.reify_bound"(%0) {dim = 0} : (memref<?xf32, strided<[1]>>) -> (index)
   return %1 : index
 }
diff --git a/mlir/test/Dialect/OpenACC/ops.mlir b/mlir/test/Dialect/OpenACC/ops.mlir
index 2fb73e400001f..7f68a7d1a4652 100644
--- a/mlir/test/Dialect/OpenACC/ops.mlir
+++ b/mlir/test/Dialect/OpenACC/ops.mlir
@@ -2371,10 +2371,10 @@ acc.private.recipe @privatization_memref_slice : memref<10x10xf32> init {
   //   * result[3][4] -> slice_alloc[1][1] (because 3*10+4 + (-23) = 11)
   %adjusted_view = memref.reinterpret_cast %slice_alloc to
     offset: [%neg_offset], sizes: [10, 10], strides: [%c10, %c1]
-    : memref<?x?xf32> to memref<10x10xf32, strided<[?, ?], offset: ?>>
+    : memref<?x?xf32> to memref<10x10xf32, strided<[?, ?]>>
 
   // Cast to the expected return type
-  %result = memref.cast %adjusted_view : memref<10x10xf32, strided<[?, ?], offset: ?>> to memref<10x10xf32>
+  %result = memref.cast %adjusted_view : memref<10x10xf32, strided<[?, ?]>> to memref<10x10xf32>
 
   acc.yield %result : memref<10x10xf32>
 }
diff --git a/mlir/test/Dialect/SCF/one-shot-bufferize-encodings.mlir b/mlir/test/Dialect/SCF/one-shot-bufferize-encodings.mlir
index 6b6207395f14e..078c070e1da98 100644
--- a/mlir/test/Dialect/SCF/one-shot-bufferize-encodings.mlir
+++ b/mlir/test/Dialect/SCF/one-shot-bufferize-encodings.mlir
@@ -13,16 +13,16 @@ func.func @scf_for_iter_arg(%arg0: tensor<128xf32, 1>, %arg1: index, %arg2: inde
 
 // CHECK-LABEL: func.func @scf_for_iter_arg
 //  CHECK-SAME: (%[[arg0:.+]]: tensor<128xf32, 1 : i64>, %[[arg1:.+]]: index, %[[arg2:.+]]: index, %[[arg3:.+]]: index)
-//       CHECK:     %[[v0:.+]] = bufferization.to_buffer %[[arg0]] : tensor<128xf32, 1 : i64> to memref<128xf32, strided<[?], offset: ?>, 1>
+//       CHECK:     %[[v0:.+]] = bufferization.to_buffer %[[arg0]] : tensor<128xf32, 1 : i64> to memref<128xf32, strided<[?]>, 1>
 //       CHECK:     %[[alloc:.+]] = memref.alloc() {alignment = 64 : i64} : memref<128xf32, 1>
-//       CHECK:     memref.copy %[[v0]], %[[alloc]] : memref<128xf32, strided<[?], offset: ?>, 1> to memref<128xf32, 1>
-//       CHECK:     %[[cast:.+]] = memref.cast %[[alloc]] : memref<128xf32, 1> to memref<128xf32, strided<[?], offset: ?>, 1>
-//       CHECK:     %[[v1:.+]] = scf.for %{{.+}} = %[[arg1]] to %[[arg2]] step %[[arg3]] iter_args(%[[arg6:.+]] = %[[cast]]) -> (memref<128xf32, strided<[?], offset: ?>, 1>)
-//  CHECK-NEXT:       %[[v3:.+]] = bufferization.to_tensor %[[arg6]] : memref<128xf32, strided<[?], offset: ?>, 1> to tensor<128xf32, 1 : i64>
+//       CHECK:     memref.copy %[[v0]], %[[alloc]] : memref<128xf32, strided<[?]>, 1> to memref<128xf32, 1>
+//       CHECK:     %[[cast:.+]] = memref.cast %[[alloc]] : memref<128xf32, 1> to memref<128xf32, strided<[?]>, 1>
+//       CHECK:     %[[v1:.+]] = scf.for %{{.+}} = %[[arg1]] to %[[arg2]] step %[[arg3]] iter_args(%[[arg6:.+]] = %[[cast]]) -> (memref<128xf32, strided<[?]>, 1>)
+//  CHECK-NEXT:       %[[v3:.+]] = bufferization.to_tensor %[[arg6]] : memref<128xf32, strided<[?]>, 1> to tensor<128xf32, 1 : i64>
 //  CHECK-NEXT:       %[[v4:.+]] = "some.use"(%[[v3]]) : (tensor<128xf32, 1 : i64>) -> tensor<128xf32, 1 : i64>
-//  CHECK-NEXT:       %[[v5:.+]] = bufferization.to_buffer %[[v4]] : tensor<128xf32, 1 : i64> to memref<128xf32, strided<[?], offset: ?>, 1>
-//  CHECK-NEXT:       scf.yield %[[v5]] : memref<128xf32, strided<[?], offset: ?>, 1>
-//       CHECK:     %[[v2:.+]] = bufferization.to_tensor %[[v1]] : memref<128xf32, strided<[?], offset: ?>, 1> to tensor<128xf32, 1 : i64>
+//  CHECK-NEXT:       %[[v5:.+]] = bufferization.to_buffer %[[v4]] : tensor<128xf32, 1 : i64> to memref<128xf32, strided<[?]>, 1>
+//  CHECK-NEXT:       scf.yield %[[v5]] : memref<128xf32, strided<[?]>, 1>
+//       CHECK:     %[[v2:.+]] = bufferization.to_tensor %[[v1]] : memref<128xf32, strided<[?]>, 1> to tensor<128xf32, 1 : i64>
 //       CHECK:     return %[[v2]] : tensor<128xf32, 1 : i64>
 
 // -----
@@ -49,7 +49,7 @@ func.func @scf_forall(
 //       CHECK:     scf.forall
 //       CHECK:       %[[v2:.+]] = bufferization.to_tensor %{{.+}} : memref<?xf32, 1> to tensor<?xf32, 1 : i64>
 //       CHECK:       %[[v3:.+]] = "some.use"(%[[v2]]) : (tensor<?xf32, 1 : i64>) -> tensor<?xf32, 1 : i64>
-//       CHECK:       bufferization.to_buffer %[[v3]] : tensor<?xf32, 1 : i64> to memref<?xf32, strided<[?], offset: ?>, 1>
+//       CHECK:       bufferization.to_buffer %[[v3]] : tensor<?xf32, 1 : i64> to memref<?xf32, strided<[?]>, 1>
 //       CHECK:     %[[v1:.+]] = bufferization.to_tensor %{{.+}} : memref<?xf32, 1> to tensor<?xf32, 1 : i64>
 //       CHECK:     return %[[v1]] : tensor<?xf32, 1 : i64>
 
@@ -65,9 +65,9 @@ func.func @scf_execute_region(%arg0: tensor<128xf32, 1>) -> tensor<128xf32, 1> {
 
 // CHECK-LABEL: func.func @scf_execute_region
 //  CHECK-SAME: (%[[arg0:.+]]: tensor<128xf32, 1 : i64>)
-//       CHECK:     %[[v0:.+]] = bufferization.to_buffer %[[arg0]] : tensor<128xf32, 1 : i64> to memref<128xf32, strided<[?], offset: ?>, 1>
-//       CHECK:     %[[v1:.+]] = scf.execute_region -> memref<128xf32, strided<[?], offset: ?>, 1>
-//       CHECK:       scf.yield %[[v0]] : memref<128xf32, strided<[?], offset: ?>, 1>
-//       CHECK:     %[[v2:.+]] = bufferization.to_tensor %[[v1]] : memref<128xf32, strided<[?], offset: ?>, 1> to tensor<128xf32, 1 : i64>
+//       CHECK:     %[[v0:.+]] = bufferization.to_buffer %[[arg0]] : tensor<128xf32, 1 : i64> to memref<128xf32, strided<[?]>, 1>
+//       CHECK:     %[[v1:.+]] = scf.execute_region -> memref<128xf32, strided<[?]>, 1>
+//       CHECK:       scf.yield %[[v0]] : memref<128xf32, strided<[?]>, 1>
+//       CHECK:     %[[v2:.+]] = bufferization.to_tensor %[[v1]] : memref<128xf32, strided<[?]>, 1> to tensor<128xf32, 1 : i64>
 //       CHECK:     %[[v3:.+]] = "some.use"(%[[v2]]) : (tensor<128xf32, 1 : i64>) -> tensor<128xf32, 1 : i64>
 //       CHECK:     return %[[v3]] : tensor<128xf32, 1 : i64>
diff --git a/mlir/test/Dialect/SCF/one-shot-bufferize.mlir b/mlir/test/Dialect/SCF/one-shot-bufferize.mlir
index b431a9e75c669..9a27961d6931f 100644
--- a/mlir/test/Dialect/SCF/one-shot-bufferize.mlir
+++ b/mlir/test/Dialect/SCF/one-shot-bufferize.mlir
@@ -9,8 +9,8 @@
 // RUN: mlir-opt %s -allow-unregistered-dialect -one-shot-bufferize="allow-return-allocs-from-loops unknown-type-conversion=identity-layout-map function-boundary-type-conversion=identity-layout-map bufferize-function-boundaries" -split-input-file -o /dev/null
 
 // CHECK-LABEL: func private @scf_for_yield_only(
-//  CHECK-SAME:   %[[A:[a-zA-Z0-9]*]]: memref<?xf32, strided<[?], offset: ?>>,
-//  CHECK-SAME:   %[[t:[a-zA-Z0-9]*]]: memref<?xf32, strided<[?], offset: ?>>
+//  CHECK-SAME:   %[[A:[a-zA-Z0-9]*]]: memref<?xf32, strided<[?]>>,
+//  CHECK-SAME:   %[[t:[a-zA-Z0-9]*]]: memref<?xf32, strided<[?]>>
 //  CHECK-SAME:   ) -> memref<?xf32> {
 func.func private @scf_for_yield_only(
     %A : tensor<?xf32> {bufferization.writable = false},
@@ -39,7 +39,7 @@ func.func private @scf_for_yield_only(
 // -----
 
 // CHECK-LABEL: func @scf_for_is_reading(
-//  CHECK-SAME:     %[[A:.*]]: memref<?xf32, strided<[?], offset: ?>>, %[[B:.*]]: memref<?xf32, strided<[?], offset: ?>>
+//  CHECK-SAME:     %[[A:.*]]: memref<?xf32, strided<[?]>>, %[[B:.*]]: memref<?xf32, strided<[?]>>
 func.func @scf_for_is_reading(%A : tensor<?xf32>, %B : tensor<?xf32>,
                               %lb : index, %ub : index)
   -> (f32, f32)
@@ -86,9 +86,9 @@ func.func @nested_scf_for(%A : tensor<?xf32> {bufferization.writable = true},
 // -----
 
 // CHECK-LABEL: func private @scf_for_with_tensor.insert_slice
-//  CHECK-SAME:   %[[A:[a-zA-Z0-9]*]]: memref<?xf32, strided<[?], offset: ?>>
-//  CHECK-SAME:   %[[B:[a-zA-Z0-9]*]]: memref<?xf32, strided<[?], offset: ?>>
-//  CHECK-SAME:   %[[C:[a-zA-Z0-9]*]]: memref<4xf32, strided<[?], offset: ?>>
+//  CHECK-SAME:   %[[A:[a-zA-Z0-9]*]]: memref<?xf32, strided<[?]>>
+//  CHECK-SAME:   %[[B:[a-zA-Z0-9]*]]: memref<?xf32, strided<[?]>>
+//  CHECK-SAME:   %[[C:[a-zA-Z0-9]*]]: memref<4xf32, strided<[?]>>
 func.func private @scf_for_with_tensor.insert_slice(
     %A : tensor<?xf32> {bufferization.writable = false},
     %B : tensor<?xf32> {bufferization.writable = true},
@@ -575,7 +575,7 @@ func.func @matmul(%arg0: tensor<8x8xf32>, %arg1: tensor<8x8xf32>, %arg2: tensor<
     %6 = tensor.extract_slice %arg1[0, %4] [8, 4] [1, 1] : tensor<8x8xf32> to tensor<8x4xf32>
     %7 = tensor.extract_slice %o[%1, %4] [4, 4] [1, 1] : tensor<8x8xf32> to tensor<4x4xf32>
 
-    //      CHECK: linalg.matmul ins({{.*}}memref<4x8xf32, strided<[?, ?], offset: ?>>, memref<8x4xf32, strided<[?, ?], offset: ?>>) outs({{.*}} : memref<4x4xf32, strided<[?, ?], offset: ?>>)
+    //      CHECK: linalg.matmul ins({{.*}}memref<4x8xf32, strided<[?, ?]>>, memref<8x4xf32, strided<[?, ?]>>) outs({{.*}} : memref<4x4xf32, strided<[?, ?]>>)
     %8 = linalg.matmul ins(%3, %6 : tensor<4x8xf32>, tensor<8x4xf32>) outs(%7 : tensor<4x4xf32>) -> tensor<4x4xf32>
     scf.forall.in_parallel {
       tensor.parallel_insert_slice %8 into %o[%1, %4] [4, 4] [1, 1] : tensor<4x4xf32> into tensor<8x8xf32>
@@ -927,21 +927,21 @@ func.func @index_switch(%pred: index, %b: tensor<5xf32>, %c: tensor<5xf32>) -> t
   // CHECK: %[[a:.*]] = memref.alloc() {{.*}} : memref<5xf32>
   %a = bufferization.alloc_tensor() : tensor<5xf32>
 
-  // CHECK: %[[r:.*]] = scf.index_switch %[[pred]] -> memref<5xf32, strided<[?], offset: ?>>
+  // CHECK: %[[r:.*]] = scf.index_switch %[[pred]] -> memref<5xf32, strided<[?]>>
   %0 = scf.index_switch %pred -> tensor<5xf32>
   // CHECK: case 2 {
-  // CHECK:   %[[cast:.*]] = memref.cast %[[a]] : memref<5xf32> to memref<5xf32, strided<[?], offset: ?>>
+  // CHECK:   %[[cast:.*]] = memref.cast %[[a]] : memref<5xf32> to memref<5xf32, strided<[?]>>
   // CHECK:   scf.yield %[[cast]]
   case 2 {
     scf.yield %a: tensor<5xf32>
   }
   // CHECK: case 5 {
-  // CHECK:   scf.yield %[[b]] : memref<5xf32, strided<[?], offset: ?>>
+  // CHECK:   scf.yield %[[b]] : memref<5xf32, strided<[?]>>
   case 5 {
     scf.yield %b: tensor<5xf32>
   }
   // CHECK: default {
-  // CHECK:   scf.yield %[[c]] : memref<5xf32, strided<[?], offset: ?>>
+  // CHECK:   scf.yield %[[c]] : memref<5xf32, strided<[?]>>
   default {
     scf.yield %c: tensor<5xf32>
   }
diff --git a/mlir/test/Dialect/SCF/parallel-loop-fusion.mlir b/mlir/test/Dialect/SCF/parallel-loop-fusion.mlir
index d876062b704f2..df853a5a7d480 100644
--- a/mlir/test/Dialect/SCF/parallel-loop-fusion.mlir
+++ b/mlir/test/Dialect/SCF/parallel-loop-fusion.mlir
@@ -325,8 +325,8 @@ func.func @do_not_fuse_loops_with_nonfull_alias_defined_in_loop_bodies() {
     scf.reduce
   }
   scf.parallel (%i, %j) = (%c0, %c0) to (%c2, %c1) step (%c1, %c1) {
-    %A = memref.subview %buffer[%i, %c0][2, 1][1, 1] : memref<2x2xf32> to memref<2x1xf32, strided<[2, 1], offset: ?>>
-    %A_elem = memref.load %A[%i, %j] : memref<2x1xf32, strided<[2, 1], offset: ?>>
+    %A = memref.subview %buffer[%i, %c0][2, 1][1, 1] : memref<2x2xf32> to memref<2x1xf32, strided<[2, 1]>>
+    %A_elem = memref.load %A[%i, %j] : memref<2x1xf32, strided<[2, 1]>>
     scf.reduce
   }
   return
@@ -648,10 +648,10 @@ func.func @do_not_fuse_nontrivial_subview_offset() {
     scf.reduce
   }
   %sub = memref.subview %buf[1, 0, 0][1, 2, 2][1, 1, 1]
-      : memref<2x2x2xf32> to memref<2x2xf32, strided<[2, 1], offset: 4>>
+      : memref<2x2x2xf32> to memref<2x2xf32, strided<[2, 1]>>
   scf.parallel (%i, %j) = (%c0, %c0) to (%c2, %c2) step (%c1, %c1) {
     %v = memref.load %sub[%i, %j]
-        : memref<2x2xf32, strided<[2, 1], offset: 4>>
+        : memref<2x2xf32, strided<[2, 1]>>
     memref.store %v, %buf[%c0, %i, %j] : memref<2x2x2xf32>
     scf.reduce
   }
@@ -802,10 +802,10 @@ func.func @do_not_fuse_vector_transfer_nontrivial_subview(%A: memref<2x4xf32>) {
     vector.transfer_write %v, %A[%c0, %i] {permutation_map = affine_map<(d0, d1) -> (d1)>, in_bounds = [true]} : vector<1xf32>, memref<2x4xf32>
     scf.reduce
   }
-    %sub = memref.subview %A[1, 0][1, 4][1, 1] : memref<2x4xf32> to memref<4xf32, strided<[1], offset: 4>>
+    %sub = memref.subview %A[1, 0][1, 4][1, 1] : memref<2x4xf32> to memref<4xf32, strided<[1]>>
   scf.parallel (%i) = (%c0) to (%c1) step (%c1) {
-    %v = vector.transfer_read %sub[%i], %zero {in_bounds = [true]} : memref<4xf32, strided<[1], offset: 4>>, vector<1xf32>
-    vector.transfer_write %v, %sub[%i] {in_bounds = [true]} : vector<1xf32>, memref<4xf32, strided<[1], offset: 4>>
+    %v = vector.transfer_read %sub[%i], %zero {in_bounds = [true]} : memref<4xf32, strided<[1]>>, vector<1xf32>
+    vector.transfer_write %v, %sub[%i] {in_bounds = [true]} : vector<1xf32>, memref<4xf32, strided<[1]>>
     scf.reduce
   }
   return
@@ -847,8 +847,8 @@ func.func @fuse_vector_transfer_subview_rank_reducing(%A: memref<1x4xf32>, %B: m
   %zero = arith.constant 0.0 : f32
   %vec = arith.constant dense<1.0> : vector<4xf32>
   scf.parallel (%i) = (%c0) to (%c1) step (%c1) {
-    %sub = memref.subview %A[%i, %c0][1, 4][1, 1] : memref<1x4xf32> to memref<4xf32, strided<[1], offset: ?>>
-    vector.transfer_write %vec, %sub[%c0] {permutation_map = affine_map<(d0) -> (d0)>, in_bounds = [true]} : vector<4xf32>, memref<4xf32, strided<[1], offset: ?>>
+    %sub = memref.subview %A[%i, %c0][1, 4][1, 1] : memref<1x4xf32> to memref<4xf32, strided<[1]>>
+    vector.transfer_write %vec, %sub[%c0] {permutation_map = affine_map<(d0) -> (d0)>, in_bounds = [true]} : vector<4xf32>, memref<4xf32, strided<[1]>>
     scf.reduce
   }
   scf.parallel (%i) = (%c0) to (%c1) step (%c1) {
@@ -877,8 +877,8 @@ func.func @do_not_fuse_vector_transfer_subview_offset(%A: memref<1x4xf32>, %B: m
   %zero = arith.constant 0.0 : f32
   %vec = arith.constant dense<1.0> : vector<4xf32>
   scf.parallel (%i) = (%c0) to (%c1) step (%c1) {
-    %sub = memref.subview %A[%i, %c0][1, 4][1, 1] : memref<1x4xf32> to memref<4xf32, strided<[1], offset: ?>>
-    vector.transfer_write %vec, %sub[%c0] {permutation_map = affine_map<(d0) -> (d0)>, in_bounds = [true]} : vector<4xf32>, memref<4xf32, strided<[1], offset: ?>>
+    %sub = memref.subview %A[%i, %c0][1, 4][1, 1] : memref<1x4xf32> to memref<4xf32, strided<[1]>>
+    vector.transfer_write %vec, %sub[%c0] {permutation_map = affine_map<(d0) -> (d0)>, in_bounds = [true]} : vector<4xf32>, memref<4xf32, strided<[1]>>
     scf.reduce
   }
   scf.parallel (%i) = (%c0) to (%c1) step (%c1) {
@@ -888,8 +888,8 @@ func.func @do_not_fuse_vector_transfer_subview_offset(%A: memref<1x4xf32>, %B: m
       scf.yield %n : f32
     }
     // Read from an offset alias to prevent fusion.
-    %off = memref.subview %A[%i, %c1][1, 3][1, 1] : memref<1x4xf32> to memref<3xf32, strided<[1], offset: ?>>
-    %v0 = memref.load %off[%c0] : memref<3xf32, strided<[1], offset: ?>>
+    %off = memref.subview %A[%i, %c1][1, 3][1, 1] : memref<1x4xf32> to memref<3xf32, strided<[1]>>
+    %v0 = memref.load %off[%c0] : memref<3xf32, strided<[1]>>
     %res = arith.addf %sum, %v0 : f32
     memref.store %res, %B[%i, %c0] : memref<1x4xf32>
     scf.reduce
diff --git a/mlir/test/Dialect/SCF/parallel-loop-unroll.mlir b/mlir/test/Dialect/SCF/parallel-loop-unroll.mlir
index 12b502e996c60..46829ac2605af 100644
--- a/mlir/test/Dialect/SCF/parallel-loop-unroll.mlir
+++ b/mlir/test/Dialect/SCF/parallel-loop-unroll.mlir
@@ -67,9 +67,9 @@ func.func @unroll_outer_nested_parallel_loop(%src: memref<5x16x12x4x4xf32>, %dst
     scf.parallel (%arg6, %arg7) = (%c0, %c0) to (%c4, %c4) step (%c1, %c1) {
       %0 = affine.apply affine_map<(d0, d1) -> (d0 + (d1 floordiv 4) * 4)>(%arg4, %arg6)
       %1 = affine.apply affine_map<(d0, d1) -> (d0 + (d1 floordiv 4) * 4)>(%arg5, %arg7)
-      %subv_in = memref.subview %src[%arg3, %0, %1, 0, 0] [1, 1, 1, 4, 4] [1, 1, 1, 1, 1] : memref<5x16x12x4x4xf32> to memref<4x4xf32, strided<[4, 1], offset: ?>>
-      %subv_out = memref.subview %dst[%arg3, %0, %1, 0, 0] [1, 1, 1, 4, 4] [1, 1, 1, 1, 1] : memref<5x16x12x4x4xf32> to memref<4x4xf32, strided<[4, 1], offset: ?>>
-      linalg.erf ins(%subv_in : memref<4x4xf32, strided<[4, 1], offset: ?>>) outs(%subv_out : memref<4x4xf32, strided<[4, 1], offset: ?>>)
+      %subv_in = memref.subview %src[%arg3, %0, %1, 0, 0] [1, 1, 1, 4, 4] [1, 1, 1, 1, 1] : memref<5x16x12x4x4xf32> to memref<4x4xf32, strided<[4, 1]>>
+      %subv_out = memref.subview %dst[%arg3, %0, %1, 0, 0] [1, 1, 1, 4, 4] [1, 1, 1, 1, 1] : memref<5x16x12x4x4xf32> to memref<4x4xf32, strided<[4, 1]>>
+      linalg.erf ins(%subv_in : memref<4x4xf32, strided<[4, 1]>>) outs(%subv_out : memref<4x4xf32, strided<[4, 1]>>)
       scf.reduce
     }
     scf.reduce
@@ -139,9 +139,9 @@ func.func @unroll_inner_nested_parallel_loop(%src: memref<5x16x12x4x4xf32>, %dst
     scf.parallel (%arg6, %arg7) = (%c0, %c0) to (%c4, %c4) step (%c1, %c1) {
       %0 = affine.apply affine_map<(d0, d1) -> (d0 + (d1 floordiv 4) * 4)>(%arg4, %arg6)
       %1 = affine.apply affine_map<(d0, d1) -> (d0 + (d1 floordiv 4) * 4)>(%arg5, %arg7)
-      %subv_in = memref.subview %src[%arg3, %0, %1, 0, 0] [1, 1, 1, 4, 4] [1, 1, 1, 1, 1] : memref<5x16x12x4x4xf32> to memref<4x4xf32, strided<[4, 1], offset: ?>>
-      %subv_out = memref.subview %dst[%arg3, %0, %1, 0, 0] [1, 1, 1, 4, 4] [1, 1, 1, 1, 1] : memref<5x16x12x4x4xf32> to memref<4x4xf32, strided<[4, 1], offset: ?>>
-      linalg.erf ins(%subv_in : memref<4x4xf32, strided<[4, 1], offset: ?>>) outs(%subv_out : memref<4x4xf32, strided<[4, 1], offset: ?>>)
+      %subv_in = memref.subview %src[%arg3, %0, %1, 0, 0] [1, 1, 1, 4, 4] [1, 1, 1, 1, 1] : memref<5x16x12x4x4xf32> to memref<4x4xf32, strided<[4, 1]>>
+      %subv_out = memref.subview %dst[%arg3, %0, %1, 0, 0] [1, 1, 1, 4, 4] [1, 1, 1, 1, 1] : memref<5x16x12x4x4xf32> to memref<4x4xf32, strided<[4, 1]>>
+      linalg.erf ins(%subv_in : memref<4x4xf32, strided<[4, 1]>>) outs(%subv_out : memref<4x4xf32, strided<[4, 1]>>)
       scf.reduce
     }
     scf.reduce
diff --git a/mlir/test/Dialect/SparseTensor/GPU/gpu_matvec_lib.mlir b/mlir/test/Dialect/SparseTensor/GPU/gpu_matvec_lib.mlir
index dea71fa03c777..a58160bf889c8 100644
--- a/mlir/test/Dialect/SparseTensor/GPU/gpu_matvec_lib.mlir
+++ b/mlir/test/Dialect/SparseTensor/GPU/gpu_matvec_lib.mlir
@@ -15,17 +15,17 @@ module {
 // CHECK-DAG:       %[[VAL_5:.*]] = sparse_tensor.number_of_entries %[[VAL_0]] : tensor<?x?xf64, #sparse{{[0-9]*}}>
 // CHECK-DAG:       %[[VAL_6:.*]] = tensor.dim %[[VAL_0]], %[[VAL_3]] : tensor<?x?xf64, #sparse{{[0-9]*}}>
 // CHECK-DAG:       %[[VAL_7:.*]] = tensor.dim %[[VAL_0]], %[[VAL_4]] : tensor<?x?xf64, #sparse{{[0-9]*}}>
-// CHECK-DAG:       %[[VAL_8:.*]] = sparse_tensor.coordinates %[[VAL_0]] {level = 0 : index} : tensor<?x?xf64, #sparse{{[0-9]*}}> to memref<?xindex, strided<[?], offset: ?>>
-// CHECK-DAG:       %[[VAL_9:.*]] = sparse_tensor.coordinates %[[VAL_0]] {level = 1 : index} : tensor<?x?xf64, #sparse{{[0-9]*}}> to memref<?xindex, strided<[?], offset: ?>>
+// CHECK-DAG:       %[[VAL_8:.*]] = sparse_tensor.coordinates %[[VAL_0]] {level = 0 : index} : tensor<?x?xf64, #sparse{{[0-9]*}}> to memref<?xindex, strided<[?]>>
+// CHECK-DAG:       %[[VAL_9:.*]] = sparse_tensor.coordinates %[[VAL_0]] {level = 1 : index} : tensor<?x?xf64, #sparse{{[0-9]*}}> to memref<?xindex, strided<[?]>>
 // CHECK-DAG:       %[[VAL_10:.*]] = sparse_tensor.values %[[VAL_0]] : tensor<?x?xf64, #sparse{{[0-9]*}}> to memref<?xf64>
 // CHECK:           %[[VAL_11:.*]] = gpu.wait async
-// CHECK:           %[[VAL_12:.*]] = memref.dim %[[VAL_8]], %[[VAL_3]] : memref<?xindex, strided<[?], offset: ?>>
+// CHECK:           %[[VAL_12:.*]] = memref.dim %[[VAL_8]], %[[VAL_3]] : memref<?xindex, strided<[?]>>
 // CHECK:           %[[VAL_13:.*]], %[[VAL_14:.*]] = gpu.alloc async {{\[}}%[[VAL_11]]] (%[[VAL_12]]) : memref<?xindex>
-// CHECK:           %[[VAL_15:.*]] = gpu.memcpy async {{\[}}%[[VAL_14]]] %[[VAL_13]], %[[VAL_8]] : memref<?xindex>, memref<?xindex, strided<[?], offset: ?>>
+// CHECK:           %[[VAL_15:.*]] = gpu.memcpy async {{\[}}%[[VAL_14]]] %[[VAL_13]], %[[VAL_8]] : memref<?xindex>, memref<?xindex, strided<[?]>>
 // CHECK:           %[[VAL_16:.*]] = gpu.wait async
-// CHECK:           %[[VAL_17:.*]] = memref.dim %[[VAL_9]], %[[VAL_3]] : memref<?xindex, strided<[?], offset: ?>>
+// CHECK:           %[[VAL_17:.*]] = memref.dim %[[VAL_9]], %[[VAL_3]] : memref<?xindex, strided<[?]>>
 // CHECK:           %[[VAL_18:.*]], %[[VAL_19:.*]] = gpu.alloc async {{\[}}%[[VAL_16]]] (%[[VAL_17]]) : memref<?xindex>
-// CHECK:           %[[VAL_20:.*]] = gpu.memcpy async {{\[}}%[[VAL_19]]] %[[VAL_18]], %[[VAL_9]] : memref<?xindex>, memref<?xindex, strided<[?], offset: ?>>
+// CHECK:           %[[VAL_20:.*]] = gpu.memcpy async {{\[}}%[[VAL_19]]] %[[VAL_18]], %[[VAL_9]] : memref<?xindex>, memref<?xindex, strided<[?]>>
 // CHECK:           %[[VAL_21:.*]] = gpu.wait async
 // CHECK:           %[[VAL_22:.*]] = memref.dim %[[VAL_10]], %[[VAL_3]] : memref<?xf64>
 // CHECK:           %[[VAL_23:.*]], %[[VAL_24:.*]] = gpu.alloc async {{\[}}%[[VAL_21]]] (%[[VAL_22]]) : memref<?xf64>
diff --git a/mlir/test/Dialect/SparseTensor/codegen.mlir b/mlir/test/Dialect/SparseTensor/codegen.mlir
index af78458f10932..efb0ec6ca1b70 100644
--- a/mlir/test/Dialect/SparseTensor/codegen.mlir
+++ b/mlir/test/Dialect/SparseTensor/codegen.mlir
@@ -330,11 +330,11 @@ func.func @sparse_values_coo(%arg0: tensor<?x?x?xf64, #ccoo>) -> memref<?xf64> {
 //       CHECK: %[[S0:.*]] = sparse_tensor.storage_specifier.get %[[A5]] crd_mem_sz at 1
 //       CHECK: %[[S2:.*]] = arith.divui %[[S0]], %[[C2]] : index
 //       CHECK: %[[R1:.*]] = memref.subview %[[A3]][0] {{\[}}%[[S2]]] [2] : memref<?xindex> to memref<?xindex, strided<[2]>>
-//       CHECK: %[[R2:.*]] = memref.cast %[[R1]] : memref<?xindex, strided<[2]>> to memref<?xindex, strided<[?], offset: ?>>
-//       CHECK: return %[[R2]] : memref<?xindex, strided<[?], offset: ?>>
-func.func @sparse_indices_coo(%arg0: tensor<?x?x?xf64, #ccoo>) -> memref<?xindex, strided<[?], offset: ?>> {
-  %0 = sparse_tensor.coordinates  %arg0 { level = 1 : index } : tensor<?x?x?xf64, #ccoo> to memref<?xindex, strided<[?], offset: ?>>
-  return %0 : memref<?xindex, strided<[?], offset: ?>>
+//       CHECK: %[[R2:.*]] = memref.cast %[[R1]] : memref<?xindex, strided<[2]>> to memref<?xindex, strided<[?]>>
+//       CHECK: return %[[R2]] : memref<?xindex, strided<[?]>>
+func.func @sparse_indices_coo(%arg0: tensor<?x?x?xf64, #ccoo>) -> memref<?xindex, strided<[?]>> {
+  %0 = sparse_tensor.coordinates  %arg0 { level = 1 : index } : tensor<?x?x?xf64, #ccoo> to memref<?xindex, strided<[?]>>
+  return %0 : memref<?xindex, strided<[?]>>
 }
 
 // CHECK-LABEL: func.func @sparse_indices_buffer_coo(
diff --git a/mlir/test/Dialect/SparseTensor/sorted_coo.mlir b/mlir/test/Dialect/SparseTensor/sorted_coo.mlir
index 81d300e851ec1..dbae74502924d 100644
--- a/mlir/test/Dialect/SparseTensor/sorted_coo.mlir
+++ b/mlir/test/Dialect/SparseTensor/sorted_coo.mlir
@@ -44,7 +44,7 @@
 // C_HECK-DAG:       %[[VAL_3:.*]] = arith.constant 1 : index
 // C_HECK-DAG:       %[[VAL_4:.*]] = arith.constant 2.000000e+00 : f32
 // C_HECK-DAG:       %[[VAL_5:.*]] = sparse_tensor.positions %[[VAL_0]] {level = 0 : index} : tensor<?x?xf32, #sparse{{[0-9]*}}> to memref<?xindex>
-// C_HECK-DAG:       %[[VAL_6:.*]] = sparse_tensor.coordinates %[[VAL_0]] {level = 0 : index} : tensor<?x?xf32, #sparse{{[0-9]*}}> to memref<?xindex, strided<[?], offset: ?>>
+// C_HECK-DAG:       %[[VAL_6:.*]] = sparse_tensor.coordinates %[[VAL_0]] {level = 0 : index} : tensor<?x?xf32, #sparse{{[0-9]*}}> to memref<?xindex, strided<[?]>>
 // C_HECK-DAG:       %[[VAL_7:.*]] = sparse_tensor.values %[[VAL_0]] : tensor<?x?xf32, #sparse{{[0-9]*}}> to memref<?xf32>
 // C_HECK-DAG:       %[[VAL_8:.*]] = memref.load %[[VAL_5]]{{\[}}%[[VAL_2]]] : memref<?xindex>
 // C_HECK-DAG:       %[[VAL_9:.*]] = memref.load %[[VAL_5]]{{\[}}%[[VAL_3]]] : memref<?xindex>
@@ -53,11 +53,11 @@
 // C_HECK:             scf.condition(%[[VAL_12]]) %[[VAL_11]] : index
 // C_HECK:           } do {
 // C_HECK:           ^bb0(%[[VAL_13:.*]]: index):
-// C_HECK:             %[[VAL_14:.*]] = memref.load %[[VAL_6]]{{\[}}%[[VAL_13]]] : memref<?xindex, strided<[?], offset: ?>>
+// C_HECK:             %[[VAL_14:.*]] = memref.load %[[VAL_6]]{{\[}}%[[VAL_13]]] : memref<?xindex, strided<[?]>>
 // C_HECK:             %[[VAL_15:.*]] = scf.while (%[[VAL_16:.*]] = %[[VAL_13]]) : (index) -> index {
 // C_HECK:               %[[VAL_17:.*]] = arith.cmpi ult, %[[VAL_16]], %[[VAL_9]] : index
 // C_HECK:               %[[VAL_18:.*]] = scf.if %[[VAL_17]] -> (i1) {
-// C_HECK:                 %[[VAL_19:.*]] = memref.load %[[VAL_6]]{{\[}}%[[VAL_16]]] : memref<?xindex, strided<[?], offset: ?>>
+// C_HECK:                 %[[VAL_19:.*]] = memref.load %[[VAL_6]]{{\[}}%[[VAL_16]]] : memref<?xindex, strided<[?]>>
 // C_HECK:                 %[[VAL_20:.*]] = arith.cmpi eq, %[[VAL_19]], %[[VAL_14]] : index
 // C_HECK:                 scf.yield %[[VAL_20]] : i1
 // C_HECK:               } else {
@@ -98,8 +98,8 @@ func.func @sparse_scale(%argx: tensor<?x?xf32, #SortedCOO>) -> tensor<?x?xf32, #
 // C_HECK-DAG:       %[[VAL_4:.*]] = arith.constant 0 : index
 // C_HECK-DAG:       %[[VAL_5:.*]] = arith.constant 1 : index
 // C_HECK-DAG:       %[[VAL_6:.*]] = sparse_tensor.positions %[[VAL_0]] {level = 0 : index} : tensor<32x64xf64, #sparse{{[0-9]*}}> to memref<?xindex>
-// C_HECK-DAG:       %[[VAL_7:.*]] = sparse_tensor.coordinates %[[VAL_0]] {level = 0 : index} : tensor<32x64xf64, #sparse{{[0-9]*}}> to memref<?xindex, strided<[?], offset: ?>>
-// C_HECK-DAG:       %[[VAL_8:.*]] = sparse_tensor.coordinates %[[VAL_0]] {level = 1 : index} : tensor<32x64xf64, #sparse{{[0-9]*}}> to memref<?xindex, strided<[?], offset: ?>>
+// C_HECK-DAG:       %[[VAL_7:.*]] = sparse_tensor.coordinates %[[VAL_0]] {level = 0 : index} : tensor<32x64xf64, #sparse{{[0-9]*}}> to memref<?xindex, strided<[?]>>
+// C_HECK-DAG:       %[[VAL_8:.*]] = sparse_tensor.coordinates %[[VAL_0]] {level = 1 : index} : tensor<32x64xf64, #sparse{{[0-9]*}}> to memref<?xindex, strided<[?]>>
 // C_HECK-DAG:       %[[VAL_9:.*]] = sparse_tensor.values %[[VAL_0]] : tensor<32x64xf64, #sparse{{[0-9]*}}> to memref<?xf64>
 // C_HECK:           %[[VAL_10:.*]] = bufferization.to_buffer %[[VAL_2]] : tensor<32xf64> to memref<32xf64>
 // C_HECK:           %[[VAL_11:.*]] = memref.load %[[VAL_6]]{{\[}}%[[VAL_4]]] : memref<?xindex>
@@ -109,12 +109,12 @@ func.func @sparse_scale(%argx: tensor<?x?xf32, #SortedCOO>) -> tensor<?x?xf32, #
 // C_HECK:             scf.condition(%[[VAL_15]]) %[[VAL_14]] : index
 // C_HECK:           } do {
 // C_HECK:           ^bb0(%[[VAL_16:.*]]: index):
-// C_HECK:             %[[VAL_17:.*]] = memref.load %[[VAL_7]]{{\[}}%[[VAL_16]]] : memref<?xindex, strided<[?], offset: ?>>
-// C_HECK:             %[[VAL_18:.*]] = memref.load %[[VAL_7]]{{\[}}%[[VAL_16]]] : memref<?xindex, strided<[?], offset: ?>>
+// C_HECK:             %[[VAL_17:.*]] = memref.load %[[VAL_7]]{{\[}}%[[VAL_16]]] : memref<?xindex, strided<[?]>>
+// C_HECK:             %[[VAL_18:.*]] = memref.load %[[VAL_7]]{{\[}}%[[VAL_16]]] : memref<?xindex, strided<[?]>>
 // C_HECK:             %[[VAL_19:.*]] = scf.while (%[[VAL_20:.*]] = %[[VAL_16]]) : (index) -> index {
 // C_HECK:               %[[VAL_21:.*]] = arith.cmpi ult, %[[VAL_20]], %[[VAL_12]] : index
 // C_HECK:               %[[VAL_22:.*]] = scf.if %[[VAL_21]] -> (i1) {
-// C_HECK:                 %[[VAL_23:.*]] = memref.load %[[VAL_7]]{{\[}}%[[VAL_20]]] : memref<?xindex, strided<[?], offset: ?>>
+// C_HECK:                 %[[VAL_23:.*]] = memref.load %[[VAL_7]]{{\[}}%[[VAL_20]]] : memref<?xindex, strided<[?]>>
 // C_HECK:                 %[[VAL_24:.*]] = arith.cmpi eq, %[[VAL_23]], %[[VAL_18]] : index
 // C_HECK:                 scf.yield %[[VAL_24]] : i1
 // C_HECK:               } else {
@@ -128,7 +128,7 @@ func.func @sparse_scale(%argx: tensor<?x?xf32, #SortedCOO>) -> tensor<?x?xf32, #
 // C_HECK:             }
 // C_HECK:             %[[VAL_28:.*]] = tensor.extract %[[VAL_2]]{{\[}}%[[VAL_17]]] : tensor<32xf64>
 // C_HECK:             %[[VAL_29:.*]] = scf.for %[[VAL_30:.*]] = %[[VAL_16]] to %[[VAL_31:.*]] step %[[VAL_5]] iter_args(%[[VAL_32:.*]] = %[[VAL_28]]) -> (f64) {
-// C_HECK:               %[[VAL_33:.*]] = memref.load %[[VAL_8]]{{\[}}%[[VAL_30]]] : memref<?xindex, strided<[?], offset: ?>>
+// C_HECK:               %[[VAL_33:.*]] = memref.load %[[VAL_8]]{{\[}}%[[VAL_30]]] : memref<?xindex, strided<[?]>>
 // C_HECK:               %[[VAL_34:.*]] = memref.load %[[VAL_9]]{{\[}}%[[VAL_30]]] : memref<?xf64>
 // C_HECK:               %[[VAL_35:.*]] = tensor.extract %[[VAL_1]]{{\[}}%[[VAL_33]]] : tensor<64xf64>
 // C_HECK:               %[[VAL_36:.*]] = arith.mulf %[[VAL_34]], %[[VAL_35]] : f64
@@ -163,12 +163,12 @@ func.func @matvec(%arga: tensor<32x64xf64, #SortedCOO>,
 // C_HECK-DAG:       %[[VAL_5:.*]] = arith.constant 0 : index
 // C_HECK-DAG:       %[[VAL_6:.*]] = arith.constant 1 : index
 // C_HECK-DAG:       %[[VAL_7:.*]] = sparse_tensor.positions %[[VAL_0]] {level = 0 : index} : tensor<32x64xf64, #sparse{{[0-9]*}}> to memref<?xindex>
-// C_HECK-DAG:       %[[VAL_8:.*]] = sparse_tensor.coordinates %[[VAL_0]] {level = 0 : index} : tensor<32x64xf64, #sparse{{[0-9]*}}> to memref<?xindex, strided<[?], offset: ?>>
-// C_HECK-DAG:       %[[VAL_9:.*]] = sparse_tensor.coordinates %[[VAL_0]] {level = 1 : index} : tensor<32x64xf64, #sparse{{[0-9]*}}> to memref<?xindex, strided<[?], offset: ?>>
+// C_HECK-DAG:       %[[VAL_8:.*]] = sparse_tensor.coordinates %[[VAL_0]] {level = 0 : index} : tensor<32x64xf64, #sparse{{[0-9]*}}> to memref<?xindex, strided<[?]>>
+// C_HECK-DAG:       %[[VAL_9:.*]] = sparse_tensor.coordinates %[[VAL_0]] {level = 1 : index} : tensor<32x64xf64, #sparse{{[0-9]*}}> to memref<?xindex, strided<[?]>>
 // C_HECK-DAG:       %[[VAL_10:.*]] = sparse_tensor.values %[[VAL_0]] : tensor<32x64xf64, #sparse{{[0-9]*}}> to memref<?xf64>
 // C_HECK-DAG:       %[[VAL_11:.*]] = sparse_tensor.positions %[[VAL_1]] {level = 0 : index} : tensor<32x64xf64, #sparse{{[0-9]*}}> to memref<?xindex>
-// C_HECK-DAG:       %[[VAL_12:.*]] = sparse_tensor.coordinates %[[VAL_1]] {level = 0 : index} : tensor<32x64xf64, #sparse{{[0-9]*}}> to memref<?xindex, strided<[?], offset: ?>>
-// C_HECK-DAG:       %[[VAL_13:.*]] = sparse_tensor.coordinates %[[VAL_1]] {level = 1 : index} : tensor<32x64xf64, #sparse{{[0-9]*}}> to memref<?xindex, strided<[?], offset: ?>>
+// C_HECK-DAG:       %[[VAL_12:.*]] = sparse_tensor.coordinates %[[VAL_1]] {level = 0 : index} : tensor<32x64xf64, #sparse{{[0-9]*}}> to memref<?xindex, strided<[?]>>
+// C_HECK-DAG:       %[[VAL_13:.*]] = sparse_tensor.coordinates %[[VAL_1]] {level = 1 : index} : tensor<32x64xf64, #sparse{{[0-9]*}}> to memref<?xindex, strided<[?]>>
 // C_HECK-DAG:       %[[VAL_14:.*]] = sparse_tensor.values %[[VAL_1]] : tensor<32x64xf64, #sparse{{[0-9]*}}> to memref<?xf64>
 // C_HECK:           %[[VAL_15:.*]] = bufferization.to_buffer %[[VAL_2]] : tensor<32x64xf64> to memref<32x64xf64>
 // C_HECK:           linalg.fill ins(%[[VAL_4]] : f64) outs(%[[VAL_15]] : memref<32x64xf64>)
@@ -183,13 +183,13 @@ func.func @matvec(%arga: tensor<32x64xf64, #SortedCOO>,
 // C_HECK:             scf.condition(%[[VAL_25]]) %[[VAL_21]], %[[VAL_22]] : index, index
 // C_HECK:           } do {
 // C_HECK:           ^bb0(%[[VAL_26:.*]]: index, %[[VAL_27:.*]]: index):
-// C_HECK:             %[[VAL_28:.*]] = memref.load %[[VAL_8]]{{\[}}%[[VAL_26]]] : memref<?xindex, strided<[?], offset: ?>>
-// C_HECK:             %[[VAL_29:.*]] = memref.load %[[VAL_12]]{{\[}}%[[VAL_27]]] : memref<?xindex, strided<[?], offset: ?>>
-// C_HECK:             %[[VAL_32:.*]] = memref.load %[[VAL_8]]{{\[}}%[[VAL_26]]] : memref<?xindex, strided<[?], offset: ?>>
+// C_HECK:             %[[VAL_28:.*]] = memref.load %[[VAL_8]]{{\[}}%[[VAL_26]]] : memref<?xindex, strided<[?]>>
+// C_HECK:             %[[VAL_29:.*]] = memref.load %[[VAL_12]]{{\[}}%[[VAL_27]]] : memref<?xindex, strided<[?]>>
+// C_HECK:             %[[VAL_32:.*]] = memref.load %[[VAL_8]]{{\[}}%[[VAL_26]]] : memref<?xindex, strided<[?]>>
 // C_HECK:             %[[VAL_33:.*]] = scf.while (%[[VAL_34:.*]] = %[[VAL_26]]) : (index) -> index {
 // C_HECK:               %[[VAL_35:.*]] = arith.cmpi ult, %[[VAL_34]], %[[VAL_17]] : index
 // C_HECK:               %[[VAL_36:.*]] = scf.if %[[VAL_35]] -> (i1) {
-// C_HECK:                 %[[VAL_37:.*]] = memref.load %[[VAL_8]]{{\[}}%[[VAL_34]]] : memref<?xindex, strided<[?], offset: ?>>
+// C_HECK:                 %[[VAL_37:.*]] = memref.load %[[VAL_8]]{{\[}}%[[VAL_34]]] : memref<?xindex, strided<[?]>>
 // C_HECK:                 %[[VAL_38:.*]] = arith.cmpi eq, %[[VAL_37]], %[[VAL_32]] : index
 // C_HECK:                 scf.yield %[[VAL_38]] : i1
 // C_HECK:               } else {
@@ -201,11 +201,11 @@ func.func @matvec(%arga: tensor<32x64xf64, #SortedCOO>,
 // C_HECK:               %[[VAL_41:.*]] = arith.addi %[[VAL_40]], %[[VAL_6]] : index
 // C_HECK:               scf.yield %[[VAL_41]] : index
 // C_HECK:             }
-// C_HECK:             %[[VAL_42:.*]] = memref.load %[[VAL_12]]{{\[}}%[[VAL_27]]] : memref<?xindex, strided<[?], offset: ?>>
+// C_HECK:             %[[VAL_42:.*]] = memref.load %[[VAL_12]]{{\[}}%[[VAL_27]]] : memref<?xindex, strided<[?]>>
 // C_HECK:             %[[VAL_43:.*]] = scf.while (%[[VAL_44:.*]] = %[[VAL_27]]) : (index) -> index {
 // C_HECK:               %[[VAL_45:.*]] = arith.cmpi ult, %[[VAL_44]], %[[VAL_19]] : index
 // C_HECK:               %[[VAL_46:.*]] = scf.if %[[VAL_45]] -> (i1) {
-// C_HECK:                 %[[VAL_47:.*]] = memref.load %[[VAL_12]]{{\[}}%[[VAL_44]]] : memref<?xindex, strided<[?], offset: ?>>
+// C_HECK:                 %[[VAL_47:.*]] = memref.load %[[VAL_12]]{{\[}}%[[VAL_44]]] : memref<?xindex, strided<[?]>>
 // C_HECK:                 %[[VAL_48:.*]] = arith.cmpi eq, %[[VAL_47]], %[[VAL_42]] : index
 // C_HECK:                 scf.yield %[[VAL_48]] : i1
 // C_HECK:               } else {
@@ -230,8 +230,8 @@ func.func @matvec(%arga: tensor<32x64xf64, #SortedCOO>,
 // C_HECK:                 scf.condition(%[[VAL_62]]) %[[VAL_56]], %[[VAL_57]] : index, index
 // C_HECK:               } do {
 // C_HECK:               ^bb0(%[[VAL_63:.*]]: index, %[[VAL_64:.*]]: index):
-// C_HECK:                 %[[VAL_65:.*]] = memref.load %[[VAL_9]]{{\[}}%[[VAL_63]]] : memref<?xindex, strided<[?], offset: ?>>
-// C_HECK:                 %[[VAL_66:.*]] = memref.load %[[VAL_13]]{{\[}}%[[VAL_64]]] : memref<?xindex, strided<[?], offset: ?>>
+// C_HECK:                 %[[VAL_65:.*]] = memref.load %[[VAL_9]]{{\[}}%[[VAL_63]]] : memref<?xindex, strided<[?]>>
+// C_HECK:                 %[[VAL_66:.*]] = memref.load %[[VAL_13]]{{\[}}%[[VAL_64]]] : memref<?xindex, strided<[?]>>
 // C_HECK:                 %[[VAL_67:.*]] = arith.cmpi ult, %[[VAL_66]], %[[VAL_65]] : index
 // C_HECK:                 %[[VAL_68:.*]] = arith.select %[[VAL_67]], %[[VAL_66]], %[[VAL_65]] : index
 // C_HECK:                 %[[VAL_69:.*]] = arith.cmpi eq, %[[VAL_65]], %[[VAL_68]] : index
diff --git a/mlir/test/Dialect/Tensor/bufferize.mlir b/mlir/test/Dialect/Tensor/bufferize.mlir
index be8ce20d8f154..8b9fa9b3a645d 100644
--- a/mlir/test/Dialect/Tensor/bufferize.mlir
+++ b/mlir/test/Dialect/Tensor/bufferize.mlir
@@ -40,8 +40,8 @@ func.func @tensor.cast(%arg0: tensor<?xindex>) -> tensor<2xindex> {
 // CHECK-LABEL:   func @tensor.cast_from_unranked(
 // CHECK-SAME:                                    %[[TENSOR:.*]]: tensor<*xf32>) -> tensor<2xf32> {
 // CHECK:           %[[MEMREF:.*]] = bufferization.to_buffer %[[TENSOR]] : tensor<*xf32> to memref<*xf32>
-// CHECK:           %[[CASTED_MEMREF:.*]] = memref.cast %[[MEMREF]] : memref<*xf32> to memref<2xf32, strided<[?], offset: ?>>
-// CHECK:           %[[RET:.*]] = bufferization.to_tensor %[[CASTED_MEMREF]] : memref<2xf32, strided<[?], offset: ?>>
+// CHECK:           %[[CASTED_MEMREF:.*]] = memref.cast %[[MEMREF]] : memref<*xf32> to memref<2xf32, strided<[?]>>
+// CHECK:           %[[RET:.*]] = bufferization.to_tensor %[[CASTED_MEMREF]] : memref<2xf32, strided<[?]>>
 // CHECK:           return %[[RET]] : tensor<2xf32>
 func.func @tensor.cast_from_unranked(%arg0: tensor<*xf32>) -> tensor<2xf32> {
   %0 = tensor.cast %arg0 : tensor<*xf32> to tensor<2xf32>
@@ -267,7 +267,7 @@ func.func @tensor.generate_unknown_ops_in_body(%arg0: index) -> tensor<?xindex>
 func.func @tensor.extract_slice(
     %t1: tensor<?x?xf32>, %idx1: index, %idx2: index) -> tensor<?x10xf32> {
   // CHECK: %[[m:.*]] = bufferization.to_buffer %[[t1]] : tensor<?x?xf32> to memref<?x?xf32>
-  // CHECK: %[[r:.*]] = memref.subview %[[m]][5, %[[idx2]]] [%[[idx1]], 10] [1, 1] : memref<?x?xf32> to memref<?x10xf32, strided<[?, 1], offset: ?>>
+  // CHECK: %[[r:.*]] = memref.subview %[[m]][5, %[[idx2]]] [%[[idx1]], 10] [1, 1] : memref<?x?xf32> to memref<?x10xf32, strided<[?, 1]>>
   %0 = tensor.extract_slice %t1[5, %idx2][%idx1, 10][1, 1]
       : tensor<?x?xf32> to tensor<?x10xf32>
   // CHECK: %[[r_tensor:.*]] = bufferization.to_tensor %[[r]]
@@ -283,7 +283,7 @@ func.func @tensor.extract_slice(
 func.func @tensor.extract_slice_rank_reducing(
     %t1: tensor<?x10x?xf32>, %idx1: index, %idx2: index) -> tensor<?x15xf32> {
   // CHECK: %[[m1:.*]] = bufferization.to_buffer %[[t1]] : tensor<?x10x?xf32> to memref<?x10x?xf32>
-  // CHECK: %[[r:.*]] = memref.subview %[[m1]][5, %[[idx1]], 10] [%[[idx2]], 1, 15] [1, 1, 1] : memref<?x10x?xf32> to memref<?x15xf32, strided<[?, 1], offset: ?>>
+  // CHECK: %[[r:.*]] = memref.subview %[[m1]][5, %[[idx1]], 10] [%[[idx2]], 1, 15] [1, 1, 1] : memref<?x10x?xf32> to memref<?x15xf32, strided<[?, 1]>>
   %0 = tensor.extract_slice %t1[5, %idx1, 10][%idx2, 1, 15][1, 1, 1]
       : tensor<?x10x?xf32> to tensor<?x15xf32>
   // CHECK: %[[r_tensor:.*]] = bufferization.to_tensor %[[r]]
@@ -324,8 +324,8 @@ func.func @tensor.insert_slice_rank_reducing_1(
   -> tensor<?x?xf32>
 {
   // CHECK: %[[alloc:.*]] = memref.alloc{{.*}} : memref<?x?xf32>
-  // CHECK: memref.subview %[[alloc]][%{{.*}}, %{{.*}}] [1, 1] [1, 1] : memref<?x?xf32> to memref<f32, strided<[], offset: ?>>
-  // CHECK: memref.copy {{.*}} : memref<f32> to memref<f32, strided<[], offset: ?>>
+  // CHECK: memref.subview %[[alloc]][%{{.*}}, %{{.*}}] [1, 1] [1, 1] : memref<?x?xf32> to memref<f32, strided<[]>>
+  // CHECK: memref.copy {{.*}} : memref<f32> to memref<f32, strided<[]>>
   %0 = tensor.insert_slice %f into %t1[%idx1, %idx2][1, 1][1, 1]
       : tensor<f32> into tensor<?x?xf32>
   return %0 : tensor<?x?xf32>
@@ -339,8 +339,8 @@ func.func @tensor.insert_slice_rank_reducing_2(
   -> tensor<?x?x?x?x?x?x?xf32>
 {
   // CHECK: %[[alloc:.*]] = memref.alloc{{.*}} : memref<?x?x?x?x?x?x?xf32>
-  // CHECK: memref.subview %[[alloc]][{{.*}}] [1, 2, 1, 4, 1, 1, 1] [1, 1, 1, 1, 1, 1, 1] : memref<?x?x?x?x?x?x?xf32> to memref<2x1x4x1x1xf32, strided<[?, ?, ?, ?, ?], offset: ?>>
-  // CHECK: memref.copy {{.*}} : memref<2x1x4x1x1xf32> to memref<2x1x4x1x1xf32, strided<[?, ?, ?, ?, ?], offset: ?>>
+  // CHECK: memref.subview %[[alloc]][{{.*}}] [1, 2, 1, 4, 1, 1, 1] [1, 1, 1, 1, 1, 1, 1] : memref<?x?x?x?x?x?x?xf32> to memref<2x1x4x1x1xf32, strided<[?, ?, ?, ?, ?]>>
+  // CHECK: memref.copy {{.*}} : memref<2x1x4x1x1xf32> to memref<2x1x4x1x1xf32, strided<[?, ?, ?, ?, ?]>>
   %0 = tensor.insert_slice %t2 into %t1[%i, %i, %i, %i, %i, %i, %i][1, 2, 1, 4, 1, 1, 1][1, 1, 1, 1, 1, 1, 1]
       : tensor<2x1x4x1x1xf32> into tensor<?x?x?x?x?x?x?xf32>
   return %0 : tensor<?x?x?x?x?x?x?xf32>
@@ -385,10 +385,10 @@ func.func @tensor.expand_shape(%t1: tensor<?x10xf32>, %sz0: index) -> tensor<2x?
 func.func @tensor.expand_shape_of_slice(
     %t1: tensor<?x20xf32>, %o1: index, %s1: index, %sz0: index) -> tensor<?x7x2x5xf32> {
   // CHECK: %[[m1:.*]] = bufferization.to_buffer %[[t1]] :
-  // CHECK: %[[subview:.*]] = memref.subview %[[m1]][%{{.*}}, 5] [%{{.*}}, 10] [1, 1] : memref<?x20xf32> to memref<?x10xf32, strided<[20, 1], offset: ?>>
+  // CHECK: %[[subview:.*]] = memref.subview %[[m1]][%{{.*}}, 5] [%{{.*}}, 10] [1, 1] : memref<?x20xf32> to memref<?x10xf32, strided<[20, 1]>>
   %0 = tensor.extract_slice %t1[%o1, 5][%s1, 10][1, 1] :
       tensor<?x20xf32> to tensor<?x10xf32>
-  // CHECK: %[[expanded:.*]] = memref.expand_shape %[[subview]] {{\[\[}}0, 1], [2, 3]] output_shape [%[[sz0]], 7, 2, 5] : memref<?x10xf32, strided<[20, 1], offset: ?>> into memref<?x7x2x5xf32, strided<[140, 20, 5, 1], offset: ?>>
+  // CHECK: %[[expanded:.*]] = memref.expand_shape %[[subview]] {{\[\[}}0, 1], [2, 3]] output_shape [%[[sz0]], 7, 2, 5] : memref<?x10xf32, strided<[20, 1]>> into memref<?x7x2x5xf32, strided<[140, 20, 5, 1]>>
   %1 = tensor.expand_shape %0 [[0, 1], [2, 3]] output_shape [%sz0, 7, 2, 5] :
       tensor<?x10xf32> into tensor<?x7x2x5xf32>
   // CHECK: %[[r:.*]] = bufferization.to_tensor %[[expanded]]
@@ -402,9 +402,9 @@ func.func @tensor.expand_shape_of_slice(
 func.func @tensor.expand_shape_of_scalar_slice(
     %t1: tensor<?xf32>, %o1: index, %s1: index) -> tensor<1xf32> {
   // CHECK: %[[m1:.*]] = bufferization.to_buffer %[[t1]] : tensor<?xf32> to memref<?xf32>
-  // CHECK: %[[subview:.*]] = memref.subview %[[m1]][%{{.*}}] [1] [1] :  memref<?xf32> to memref<f32, strided<[], offset: ?>>
+  // CHECK: %[[subview:.*]] = memref.subview %[[m1]][%{{.*}}] [1] [1] :  memref<?xf32> to memref<f32, strided<[]>>
   %0 = tensor.extract_slice %t1[%o1][1][1] : tensor<?xf32> to tensor<f32>
-  // CHECK: %[[expanded:.*]] = memref.expand_shape %[[subview]] [] output_shape [1] : memref<f32, strided{{.*}}> into memref<1xf32, strided<[1], offset: ?>>
+  // CHECK: %[[expanded:.*]] = memref.expand_shape %[[subview]] [] output_shape [1] : memref<f32, strided{{.*}}> into memref<1xf32, strided<[1]>>
   %1 = tensor.expand_shape %0 [] output_shape [1] : tensor<f32> into tensor<1xf32>
   // CHECK: %[[r:.*]] = bufferization.to_tensor %[[expanded]]
   // CHECK: return %[[r]]
@@ -459,9 +459,9 @@ func.func @tensor.collapse_shape_to_scalar(%t1: tensor<1x1x1xf32>) -> tensor<f32
 
 // CHECK-LABEL: func @tensor.collapse_shape_of_slice(
 func.func @tensor.collapse_shape_of_slice(%arg0: tensor<2xi32>) -> tensor<i32> {
-  // CHECK: memref.subview %{{.*}}[1] [1] [1] : memref<2xi32> to memref<1xi32, strided<[1], offset: 1>>
+  // CHECK: memref.subview %{{.*}}[1] [1] [1] : memref<2xi32> to memref<1xi32, strided<[1]>>
   %0 = tensor.extract_slice %arg0[1] [1] [1] : tensor<2xi32> to tensor<1xi32>
-  // CHECK: memref.collapse_shape %{{.*}} [] : memref<1xi32, strided<[1], offset: 1>> into memref<i32, strided<[], offset: 1>>
+  // CHECK: memref.collapse_shape %{{.*}} [] : memref<1xi32, strided<[1]>> into memref<i32, strided<[]>>
   %1 = tensor.collapse_shape %0 [] : tensor<1xi32> into tensor<i32>
   return %1 : tensor<i32>
 }
@@ -504,10 +504,10 @@ func.func @tensor.collapse_shape_of_slice3(%t1: tensor<1x2xf32>) -> tensor<1xf32
 //  CHECK-SAME:     %[[t1:.*]]: tensor<?x2x4xf32>,
 // CHECK-SAME:      %[[OFFSET:.*]]: index) -> tensor<8xf32> {
 func.func @tensor.collapse_shape_of_slice4(%arg0: tensor<?x2x4xf32>, %offset: index, %size: index) -> tensor<8xf32> {
-  // CHECK: memref.subview %{{.*}} : memref<?x2x4xf32> to memref<4x2x1xf32, strided<[8, 4, 1], offset: ?>>
+  // CHECK: memref.subview %{{.*}} : memref<?x2x4xf32> to memref<4x2x1xf32, strided<[8, 4, 1]>>
   %0 = tensor.extract_slice %arg0[0, 0, %offset] [4, 2, 1] [1, 1, 1] : tensor<?x2x4xf32> to tensor<4x2x1xf32>
   // CHECK: memref.collapse_shape %{{.*}} [
-  // CHECK-SAME: [0, 1, 2]] : memref<4x2x1xf32, strided<[8, 4, 1], offset: ?>> into memref<8xf32, strided<[4], offset: ?>>
+  // CHECK-SAME: [0, 1, 2]] : memref<4x2x1xf32, strided<[8, 4, 1]>> into memref<8xf32, strided<[4]>>
   %ret = tensor.collapse_shape %0 [[0, 1, 2]] : tensor<4x2x1xf32> into tensor<8xf32>
   return %ret: tensor<8xf32>
 }
@@ -775,8 +775,8 @@ func.func @parallel_insert_slice_copy_before_write(%in: tensor<4xf32>, %out: ten
   %result = scf.forall (%thread_idx) in (%num_threads) shared_outs (%o = %out) -> tensor<4xf32> {
       %1 = tensor.extract_slice %in[%thread_idx][1][1] : tensor<4xf32> to tensor<1xf32>
       scf.forall.in_parallel {
-        // CHECK: memref.subview %{{.*}}[%{{.*}}] [1] [1] : memref<4xf32> to memref<1xf32, strided<[1], offset: ?>>
-        // CHECK: memref.subview %{{.*}}[%{{.*}}] [1] [1] : memref<4xf32> to memref<1xf32, strided<[1], offset: ?>>
+        // CHECK: memref.subview %{{.*}}[%{{.*}}] [1] [1] : memref<4xf32> to memref<1xf32, strided<[1]>>
+        // CHECK: memref.subview %{{.*}}[%{{.*}}] [1] [1] : memref<4xf32> to memref<1xf32, strided<[1]>>
         tensor.parallel_insert_slice %1 into %o[%thread_idx][1][1] :
           tensor<1xf32> into tensor<4xf32>
       }
diff --git a/mlir/test/Dialect/Tensor/one-shot-bufferize.mlir b/mlir/test/Dialect/Tensor/one-shot-bufferize.mlir
index f66cf7ae53266..737f618bd41f4 100644
--- a/mlir/test/Dialect/Tensor/one-shot-bufferize.mlir
+++ b/mlir/test/Dialect/Tensor/one-shot-bufferize.mlir
@@ -9,10 +9,10 @@
 // RUN: mlir-opt %s -one-shot-bufferize="unknown-type-conversion=identity-layout-map bufferize-function-boundaries" -split-input-file -o /dev/null
 
 // CHECK-LABEL: func private @insert_slice_fun
-//  CHECK-SAME:   %[[A0:[a-zA-Z0-9]*]]: memref<?xf32, strided<[?], offset: ?>>,
-//  CHECK-SAME:   %[[A1:[a-zA-Z0-9]*]]: memref<?xf32, strided<[?], offset: ?>>,
-//  CHECK-SAME:   %[[t0:[a-zA-Z0-9]*]]: memref<4xf32, strided<[?], offset: ?>>,
-//  CHECK-SAME:   %[[t1:[a-zA-Z0-9]*]]: memref<4xf32, strided<[?], offset: ?>>
+//  CHECK-SAME:   %[[A0:[a-zA-Z0-9]*]]: memref<?xf32, strided<[?]>>,
+//  CHECK-SAME:   %[[A1:[a-zA-Z0-9]*]]: memref<?xf32, strided<[?]>>,
+//  CHECK-SAME:   %[[t0:[a-zA-Z0-9]*]]: memref<4xf32, strided<[?]>>,
+//  CHECK-SAME:   %[[t1:[a-zA-Z0-9]*]]: memref<4xf32, strided<[?]>>
 func.func private @insert_slice_fun(
     %A0 : tensor<?xf32> {bufferization.writable = false},
     %A1 : tensor<?xf32> {bufferization.writable = true},
@@ -55,8 +55,8 @@ func.func private @insert_slice_fun(
 // -----
 
 // CHECK-LABEL: func @insert_slice_fun
-//  CHECK-SAME:   %[[A:[a-zA-Z0-9]*]]: memref<?xf32, strided<[?], offset: ?>>
-//  CHECK-SAME:   %[[t:[a-zA-Z0-9]*]]: memref<4xf32, strided<[?], offset: ?>>
+//  CHECK-SAME:   %[[A:[a-zA-Z0-9]*]]: memref<?xf32, strided<[?]>>
+//  CHECK-SAME:   %[[t:[a-zA-Z0-9]*]]: memref<4xf32, strided<[?]>>
 func.func @insert_slice_fun(
     %A : tensor<?xf32> {bufferization.writable = true},
     %t : tensor<4xf32> {bufferization.writable = false})
@@ -81,8 +81,8 @@ func.func @insert_slice_fun(
 // -----
 
 // CHECK-LABEL: func @insert_slice_fun
-//  CHECK-SAME:   %[[A:[a-zA-Z0-9]*]]: memref<?xf32, strided<[?], offset: ?>>
-//  CHECK-SAME:   %[[t:[a-zA-Z0-9]*]]: memref<4xf32, strided<[?], offset: ?>>
+//  CHECK-SAME:   %[[A:[a-zA-Z0-9]*]]: memref<?xf32, strided<[?]>>
+//  CHECK-SAME:   %[[t:[a-zA-Z0-9]*]]: memref<4xf32, strided<[?]>>
 func.func @insert_slice_fun(
     %A : tensor<?xf32> {bufferization.writable = true},
     %t : tensor<4xf32> {bufferization.writable = false})
@@ -107,8 +107,8 @@ func.func @insert_slice_fun(
 // -----
 
 // CHECK-LABEL: func @insert_slice_fun_not_inplace
-//  CHECK-SAME:   %[[A:[a-zA-Z0-9]*]]: memref<?xf32, strided<[?], offset: ?>>
-//  CHECK-SAME:   %[[t:[a-zA-Z0-9]*]]: memref<4xf32, strided<[?], offset: ?>>
+//  CHECK-SAME:   %[[A:[a-zA-Z0-9]*]]: memref<?xf32, strided<[?]>>
+//  CHECK-SAME:   %[[t:[a-zA-Z0-9]*]]: memref<4xf32, strided<[?]>>
 func.func @insert_slice_fun_not_inplace(
     %A : tensor<?xf32> {bufferization.writable = false},
     %t : tensor<4xf32> {bufferization.writable = false})
@@ -131,7 +131,7 @@ func.func @insert_slice_fun_not_inplace(
 
 // CHECK-LABEL: func @tensor_cast_not_in_place(
 //  CHECK-SAME:     %[[A:.*]]: memref<?xf32{{.*}}>, %[[B:.*]]: memref<?xf32{{.*}}>
-//       CHECK:   %[[casted:.*]] = memref.cast %[[A]] : memref<?xf32, strided<[?], offset: ?>> to memref<4xf32, strided<[?], offset: ?>>
+//       CHECK:   %[[casted:.*]] = memref.cast %[[A]] : memref<?xf32, strided<[?]>> to memref<4xf32, strided<[?]>>
 //       CHECK:   %[[alloc:.*]] = memref.alloc
 //       CHECK:   memref.copy %[[casted]], %[[alloc]]
 //       CHECK:   %[[subview:.*]] = memref.subview %[[A]][{{.*}}] [4] [1] : {{.*}} to memref<4xf32
@@ -201,8 +201,8 @@ func.func @rank_reducing_parallel_insert_slice(%in: tensor<100xf32>, %out: tenso
   %result = scf.forall (%thread_idx) in (%num_threads) shared_outs (%o = %out) -> tensor<200x100xf32> {
       %1 = tensor.extract_slice %in[%thread_idx][1][1] : tensor<100xf32> to tensor<1xf32>
       scf.forall.in_parallel {
-        // CHECK: memref.subview %{{.*}}[%{{.*}}] [1] [1] : memref<100xf32, strided<[?], offset: ?>> to memref<1xf32, strided<[?], offset: ?>>
-        // CHECK: memref.subview %{{.*}}[1, %{{.*}}] [1, 1] [1, 1] : memref<200x100xf32, strided<[?, ?], offset: ?>> to memref<1xf32, strided<[?], offset: ?>>
+        // CHECK: memref.subview %{{.*}}[%{{.*}}] [1] [1] : memref<100xf32, strided<[?]>> to memref<1xf32, strided<[?]>>
+        // CHECK: memref.subview %{{.*}}[1, %{{.*}}] [1, 1] [1, 1] : memref<200x100xf32, strided<[?, ?]>> to memref<1xf32, strided<[?]>>
         tensor.parallel_insert_slice %1 into %o[1, %thread_idx][1, 1][1, 1] :
           tensor<1xf32> into tensor<200x100xf32>
       }
@@ -245,7 +245,7 @@ func.func @insert_equivalent_tensor(%t: tensor<10xf32>) -> tensor<10xf32> {
 // -----
 
 // CHECK-LABEL: func @pad_memory_space(
-//  CHECK-SAME:     %[[t:.*]]: memref<?xf32, strided<[?], offset: ?>>
+//  CHECK-SAME:     %[[t:.*]]: memref<?xf32, strided<[?]>>
 func.func @pad_memory_space(%t: tensor<?xf32>, %h1: index, %f: f32, %pos: index) -> f32
 {
   // CHECK: %[[alloc_tensor:.*]] = memref.alloc{{.*}} : memref<?xf32, 3>
@@ -257,7 +257,7 @@ func.func @pad_memory_space(%t: tensor<?xf32>, %h1: index, %f: f32, %pos: index)
   // CHECK:     outs(%[[padded_alloc]] : memref<15xf32, 3>)
   // CHECK:   linalg.yield %{{.*}}
   // CHECK: }
-  // CHECK: %[[subview:.*]] = memref.subview {{.*}} : memref<15xf32, 3> to memref<?xf32, strided<[1], offset: 2>, 3>
+  // CHECK: %[[subview:.*]] = memref.subview {{.*}} : memref<15xf32, 3> to memref<?xf32, strided<[1]>, 3>
   // CHECK: memref.copy %[[alloc_tensor]], %[[subview]]
   %1 = tensor.pad %0 low[2] high[%h1] {
   ^bb0(%arg0: index):
@@ -332,9 +332,9 @@ func.func @dim_not_reading(%t: tensor<?xf32>, %f: f32, %pos: index)
 
 //       CHECK: #[[$map:.*]] = affine_map<(d0) -> (d0 + 5)>
 // CHECK-LABEL: func.func private @cast_retains_buffer_layout(
-//  CHECK-SAME:     %[[t:.*]]: memref<?xf32, #[[$map]]>, %[[sz:.*]]: index) -> memref<?xf32, strided<[1], offset: 7>> {
+//  CHECK-SAME:     %[[t:.*]]: memref<?xf32, #[[$map]]>, %[[sz:.*]]: index) -> memref<?xf32, strided<[1]>> {
 //       CHECK:   %[[casted:.*]] = memref.cast %[[t]] : memref<?xf32, #[[$map]]> to memref<10xf32, #[[$map]]>
-//       CHECK:   %[[slice:.*]] = memref.subview %[[casted]][2] [%[[sz]]] [1] : memref<10xf32, #[[$map]]> to memref<?xf32, strided<[1], offset: 7>>
+//       CHECK:   %[[slice:.*]] = memref.subview %[[casted]][2] [%[[sz]]] [1] : memref<10xf32, #[[$map]]> to memref<?xf32, strided<[1]>>
 //       CHECK:   return %[[slice]]
 func.func private @cast_retains_buffer_layout(
     %t: tensor<?xf32>
@@ -354,13 +354,13 @@ func.func private @cast_retains_buffer_layout(
 // -----
 
 // CHECK-LABEL: func private @cast_retains_buffer_layout_strided(
-//  CHECK-SAME:     %[[t:.*]]: memref<?xf32, strided<[1], offset: 5>>, %[[sz:.*]]: index) -> memref<?xf32, strided<[1], offset: 7>> {
-//       CHECK:   %[[casted:.*]] = memref.cast %[[t]] : memref<?xf32, strided<[1], offset: 5>> to memref<10xf32, strided<[1], offset: 5>>
-//       CHECK:   %[[slice:.*]] = memref.subview %[[casted]][2] [%[[sz]]] [1] : memref<10xf32, strided<[1], offset: 5>> to memref<?xf32, strided<[1], offset: 7>>
+//  CHECK-SAME:     %[[t:.*]]: memref<?xf32, strided<[1]>>, %[[sz:.*]]: index) -> memref<?xf32, strided<[1]>> {
+//       CHECK:   %[[casted:.*]] = memref.cast %[[t]] : memref<?xf32, strided<[1]>> to memref<10xf32, strided<[1]>>
+//       CHECK:   %[[slice:.*]] = memref.subview %[[casted]][2] [%[[sz]]] [1] : memref<10xf32, strided<[1]>> to memref<?xf32, strided<[1]>>
 //       CHECK:   return %[[slice]]
 func.func private @cast_retains_buffer_layout_strided(
     %t: tensor<?xf32>
-        {bufferization.buffer_layout = strided<[1], offset: 5>},
+        {bufferization.buffer_layout = strided<[1]>},
     %sz: index)
   -> (tensor<10xf32>, tensor<?xf32>)
 {
@@ -448,19 +448,19 @@ func.func @tensor_reshape_aliasing(%arg0: index, %arg1: index) -> tensor<?x?xf32
 // -----
 
 // CHECK-LABEL: @reshape_with_non_identity_layout(
-// CHECK-SAME:    %[[INPUT:[a-zA-Z0-9]*]]: memref<2x2xf32, strided<[?, ?], offset: ?>, 3>,
-// CHECK-SAME:    %[[LAYOUT:[a-zA-Z0-9]*]]: memref<2xi32, strided<[?], offset: ?>>,
-func.func @reshape_with_non_identity_layout(%arg0: memref<2x2xf32, strided<[?, ?], offset: ?>, 3>, %arg1: tensor<2xi32>, %idx: index) -> f32 {
-  %t = bufferization.to_tensor %arg0 restrict : memref<2x2xf32, strided<[?, ?], offset: ?>, 3> to tensor<2x2xf32>
+// CHECK-SAME:    %[[INPUT:[a-zA-Z0-9]*]]: memref<2x2xf32, strided<[?, ?]>, 3>,
+// CHECK-SAME:    %[[LAYOUT:[a-zA-Z0-9]*]]: memref<2xi32, strided<[?]>>,
+func.func @reshape_with_non_identity_layout(%arg0: memref<2x2xf32, strided<[?, ?]>, 3>, %arg1: tensor<2xi32>, %idx: index) -> f32 {
+  %t = bufferization.to_tensor %arg0 restrict : memref<2x2xf32, strided<[?, ?]>, 3> to tensor<2x2xf32>
 
-  // CHECK: %[[SUBVIEW:.+]] = memref.subview %[[INPUT]][1, 0] [1, 2] [1, 1] : memref<2x2xf32, strided<[?, ?], offset: ?>, 3> to memref<2xf32, strided<[?], offset: ?>, 3>
+  // CHECK: %[[SUBVIEW:.+]] = memref.subview %[[INPUT]][1, 0] [1, 2] [1, 1] : memref<2x2xf32, strided<[?, ?]>, 3> to memref<2xf32, strided<[?]>, 3>
   %extracted_slice = tensor.extract_slice %t[1, 0] [1, 2] [1, 1] : tensor<2x2xf32> to tensor<2xf32>
 
   // To satisify the constraints of memref.reshape, the subview must be
   // reallocated a buffer with an identity layout.
   // CHECK: %[[ALLOC:.+]] = memref.alloc() {{.*}} : memref<2xf32, 3>
   // CHECK: memref.copy %[[SUBVIEW]], %[[ALLOC]]
-  // CHECK: %[[RESHAPED:.+]] = memref.reshape %[[ALLOC]](%[[LAYOUT]]) : (memref<2xf32, 3>, memref<2xi32, strided<[?], offset: ?>>) -> memref<1x2xf32, 3>
+  // CHECK: %[[RESHAPED:.+]] = memref.reshape %[[ALLOC]](%[[LAYOUT]]) : (memref<2xf32, 3>, memref<2xi32, strided<[?]>>) -> memref<1x2xf32, 3>
   %reshape = tensor.reshape %extracted_slice(%arg1) : (tensor<2xf32>, tensor<2xi32>) -> tensor<1x2xf32>
 
   %r = tensor.extract %reshape[%idx, %idx] : tensor<1x2xf32>
@@ -494,7 +494,7 @@ func.func @collapse_shape_regression(
 // -----
 
 // CHECK-LABEL: func private @mult_return_callee(
-//  CHECK-SAME:   %[[T:.*]]: memref<?xf32, strided<[?], offset: ?>>, %[[COND:.*]]: i1,
+//  CHECK-SAME:   %[[T:.*]]: memref<?xf32, strided<[?]>>, %[[COND:.*]]: i1,
 //  CHECK-SAME:   %[[A:.*]]: index, %[[B:.*]]: index) -> index {
 //       CHECK:   cf.cond_br %[[COND]], ^bb1, ^bb2
 //       CHECK: ^bb1:
@@ -511,11 +511,11 @@ func.func private @mult_return_callee(%t: tensor<?xf32>,  %cond:i1, %a: index, %
 }
 
 // CHECK-LABEL: func @mult_return(
-//  CHECK-SAME:   %[[T:.*]]: memref<?xf32, strided<[?], offset: ?>>, %[[COND:.*]]: i1,
-//  CHECK-SAME:   %[[A:.*]]: index, %[[B:.*]]: index) -> (memref<?xf32, strided<[?], offset: ?>>, index) {
+//  CHECK-SAME:   %[[T:.*]]: memref<?xf32, strided<[?]>>, %[[COND:.*]]: i1,
+//  CHECK-SAME:   %[[A:.*]]: index, %[[B:.*]]: index) -> (memref<?xf32, strided<[?]>>, index) {
 func.func @mult_return(%t: tensor<?xf32>,  %cond:i1, %a: index, %b: index) -> (tensor<10xf32>, index) {
-  // CHECK: %[[RET:.*]] = call @mult_return_callee(%[[T]], %[[COND]], %[[A]], %[[B]]) : (memref<?xf32, strided<[?], offset: ?>>, i1, index, index) -> index
-  // CHECK: return %[[T]], %[[RET]] : memref<?xf32, strided<[?], offset: ?>>, index
+  // CHECK: %[[RET:.*]] = call @mult_return_callee(%[[T]], %[[COND]], %[[A]], %[[B]]) : (memref<?xf32, strided<[?]>>, i1, index, index) -> index
+  // CHECK: return %[[T]], %[[RET]] : memref<?xf32, strided<[?]>>, index
   %t_res, %v = func.call @mult_return_callee(%t, %cond, %a, %b) : (tensor<?xf32>, i1, index, index) -> (tensor<10xf32>, index) 
   return %t_res, %v : tensor<10xf32>, index
 }
diff --git a/mlir/test/Dialect/Transform/test-pattern-application.mlir b/mlir/test/Dialect/Transform/test-pattern-application.mlir
index f78b4b6f6798c..24d129ad69b4b 100644
--- a/mlir/test/Dialect/Transform/test-pattern-application.mlir
+++ b/mlir/test/Dialect/Transform/test-pattern-application.mlir
@@ -260,9 +260,9 @@ module {
 //   CHECK-NOT:   memref.copy
 func.func @canonicalization_and_cse(%m: memref<5xf32>) {
   %c2 = arith.constant 2 : index
-  %s0 = memref.subview %m[1] [2] [1] : memref<5xf32> to memref<2xf32, strided<[1], offset: 1>>
-  %s1 = memref.subview %m[1] [%c2] [1] : memref<5xf32> to memref<?xf32, strided<[1], offset: 1>>
-  memref.copy %s0, %s1 : memref<2xf32, strided<[1], offset: 1>> to memref<?xf32, strided<[1], offset: 1>>
+  %s0 = memref.subview %m[1] [2] [1] : memref<5xf32> to memref<2xf32, strided<[1]>>
+  %s1 = memref.subview %m[1] [%c2] [1] : memref<5xf32> to memref<?xf32, strided<[1]>>
+  memref.copy %s0, %s1 : memref<2xf32, strided<[1]>> to memref<?xf32, strided<[1]>>
   return
 }
 
diff --git a/mlir/test/Dialect/Transform/test-promote-tensors.mlir b/mlir/test/Dialect/Transform/test-promote-tensors.mlir
index bc9a05af64156..312ad2259a56a 100644
--- a/mlir/test/Dialect/Transform/test-promote-tensors.mlir
+++ b/mlir/test/Dialect/Transform/test-promote-tensors.mlir
@@ -58,21 +58,21 @@ module attributes {transform.with_named_sequence} {
 // CHECK-LABEL: @promote_in0_out_bufferize
 // CHECK-SAME: (%[[ARG0:.+]]: tensor<?x42xf32>, %{{.*}}: tensor<42x?xf32>, %[[ARG2:.+]]: tensor<?x?xf32>)
 func.func @promote_in0_out_bufferize(%arg0: tensor<?x42xf32>, %arg1: tensor<42x?xf32>, %arg2: tensor<?x?xf32>) -> tensor<?x?xf32> {
-    // CHECK:  %[[IN1:.+]] = bufferization.to_buffer %arg1 : tensor<42x?xf32> to memref<42x?xf32, strided<[?, ?], offset: ?>>
-    // CHECK:  %[[IN0:.+]] = bufferization.to_buffer %arg0 : tensor<?x42xf32> to memref<?x42xf32, strided<[?, ?], offset: ?>>
-    // CHECK:  %{{.+}} = bufferization.to_buffer %arg0 : tensor<?x42xf32> to memref<?x42xf32, strided<[?, ?], offset: ?>>
-    // CHECK:  %{{.+}} = bufferization.to_buffer %arg2 : tensor<?x?xf32> to memref<?x?xf32, strided<[?, ?], offset: ?>>
-    // CHECK:  %{{.+}} = bufferization.to_buffer %arg2 : tensor<?x?xf32> to memref<?x?xf32, strided<[?, ?], offset: ?>>
+    // CHECK:  %[[IN1:.+]] = bufferization.to_buffer %arg1 : tensor<42x?xf32> to memref<42x?xf32, strided<[?, ?]>>
+    // CHECK:  %[[IN0:.+]] = bufferization.to_buffer %arg0 : tensor<?x42xf32> to memref<?x42xf32, strided<[?, ?]>>
+    // CHECK:  %{{.+}} = bufferization.to_buffer %arg0 : tensor<?x42xf32> to memref<?x42xf32, strided<[?, ?]>>
+    // CHECK:  %{{.+}} = bufferization.to_buffer %arg2 : tensor<?x?xf32> to memref<?x?xf32, strided<[?, ?]>>
+    // CHECK:  %{{.+}} = bufferization.to_buffer %arg2 : tensor<?x?xf32> to memref<?x?xf32, strided<[?, ?]>>
     // CHECK:  %[[C0:.+]] = arith.constant 0 : index
-    // CHECK:  %{{.+}} = memref.dim %{{.+}}, %[[C0]] : memref<?x?xf32, strided<[?, ?], offset: ?>>
+    // CHECK:  %{{.+}} = memref.dim %{{.+}}, %[[C0]] : memref<?x?xf32, strided<[?, ?]>>
     // CHECK:  %[[C1:.+]] = arith.constant 1 : index
-    // CHECK:  %{{.+}} = memref.dim %{{.+}}, %[[C1]] : memref<?x?xf32, strided<[?, ?], offset: ?>>
+    // CHECK:  %{{.+}} = memref.dim %{{.+}}, %[[C1]] : memref<?x?xf32, strided<[?, ?]>>
     // CHECK:  %[[ALLOC_OUT:.+]] = memref.alloc(%{{.+}}, %{{.+}}) {alignment = 64 : i64} : memref<?x?xf32, 1>
     // CHECK:  %{{.+}} = arith.constant 0 : index
-    // CHECK:  %{{.+}} = memref.dim %{{.+}}, %{{.+}} : memref<?x42xf32, strided<[?, ?], offset: ?>>
+    // CHECK:  %{{.+}} = memref.dim %{{.+}}, %{{.+}} : memref<?x42xf32, strided<[?, ?]>>
     // CHECK:  %[[ALLOC_IN:.+]] = memref.alloc(%{{.+}}) {alignment = 64 : i64} : memref<?x42xf32, 1>
-    // CHECK:  memref.copy %[[IN0]], %[[ALLOC_IN]] : memref<?x42xf32, strided<[?, ?], offset: ?>> to memref<?x42xf32, 1>
-    // CHECK: linalg.add ins(%[[ALLOC_IN]], %[[IN1]] : memref<?x42xf32, 1>, memref<42x?xf32, strided<[?, ?], offset: ?>>) outs(%[[ALLOC_OUT]] : memref<?x?xf32, 1>)
+    // CHECK:  memref.copy %[[IN0]], %[[ALLOC_IN]] : memref<?x42xf32, strided<[?, ?]>> to memref<?x42xf32, 1>
+    // CHECK: linalg.add ins(%[[ALLOC_IN]], %[[IN1]] : memref<?x42xf32, 1>, memref<42x?xf32, strided<[?, ?]>>) outs(%[[ALLOC_OUT]] : memref<?x?xf32, 1>)
     %0 = linalg.add ins(%arg0, %arg1: tensor<?x42xf32>, tensor<42x?xf32>)
                     outs(%arg2: tensor<?x?xf32>) -> tensor<?x?xf32>
     return %0 : tensor<?x?xf32>
diff --git a/mlir/test/Dialect/Vector/invalid.mlir b/mlir/test/Dialect/Vector/invalid.mlir
index f90312c915334..933188c583e08 100644
--- a/mlir/test/Dialect/Vector/invalid.mlir
+++ b/mlir/test/Dialect/Vector/invalid.mlir
@@ -2084,10 +2084,10 @@ func.func @load_non_pow_of_2_alignment(%memref: memref<4xi32>, %c0: index) {
 
 // -----
 
-func.func @load_non_unit_stride(%src : memref<?xi8, strided<[2], offset: ?>>) {
+func.func @load_non_unit_stride(%src : memref<?xi8, strided<[2]>>) {
   %c0 = arith.constant 0 : index
   // expected-error @+1 {{'vector.load' op most minor memref dim must have unit stride}}
-  %0 = vector.load %src[%c0] : memref<?xi8, strided<[2], offset: ?>>, vector<16xi8>
+  %0 = vector.load %src[%c0] : memref<?xi8, strided<[2]>>, vector<16xi8>
   return
 }
 
@@ -2121,9 +2121,9 @@ func.func @store_non_pow_of_2_alignment(%memref: memref<4xi32>, %val: vector<4xi
 }
 
 // -----
-func.func @store_non_unit_stride(%src : memref<?xi8, strided<[2], offset:?>>,%val : vector<16xi8>, %c0: index) {
+func.func @store_non_unit_stride(%src : memref<?xi8, strided<[2]>>,%val : vector<16xi8>, %c0: index) {
   // expected-error @below {{'vector.store' op most minor memref dim must have unit stride}}
-  vector.store %val, %src[%c0] : memref<?xi8, strided<[2], offset: ?>>, vector<16xi8>
+  vector.store %val, %src[%c0] : memref<?xi8, strided<[2]>>, vector<16xi8>
   return
 }
 
diff --git a/mlir/test/Dialect/Vector/one-shot-bufferize.mlir b/mlir/test/Dialect/Vector/one-shot-bufferize.mlir
index c2d699b9b013a..6117427e0f985 100644
--- a/mlir/test/Dialect/Vector/one-shot-bufferize.mlir
+++ b/mlir/test/Dialect/Vector/one-shot-bufferize.mlir
@@ -2,22 +2,22 @@
 // RUN: mlir-opt %s -one-shot-bufferize="bufferize-function-boundaries test-analysis-only" -split-input-file | FileCheck %s -check-prefix=CHECK-ANALYSIS
 
 // CHECK-LABEL: func @mask(
-//  CHECK-SAME:     %[[t0:.*]]: memref<?xf32, strided<[?], offset: ?>>
+//  CHECK-SAME:     %[[t0:.*]]: memref<?xf32, strided<[?]>>
 func.func @mask(%t0: tensor<?xf32>, %val: vector<16xf32>, %idx: index, %m0: vector<16xi1>) -> tensor<?xf32> {
   // CHECK-NOT: alloc
   // CHECK-NOT: copy
-  //     CHECK: vector.mask %{{.*}} { vector.transfer_write %{{.*}}, %[[t0]][%{{.*}}] : vector<16xf32>, memref<?xf32, strided<[?], offset: ?>> } : vector<16xi1>
+  //     CHECK: vector.mask %{{.*}} { vector.transfer_write %{{.*}}, %[[t0]][%{{.*}}] : vector<16xf32>, memref<?xf32, strided<[?]>> } : vector<16xi1>
   %0 = vector.mask %m0 { vector.transfer_write %val, %t0[%idx] : vector<16xf32>, tensor<?xf32> } : vector<16xi1> -> tensor<?xf32>
   //     CHECK: return %[[t0]]
   return %0 : tensor<?xf32>
 }
 
 // CHECK-LABEL: func @mask_scalable(
-//  CHECK-SAME:     %[[t0:.*]]: memref<?xf32, strided<[?], offset: ?>>
+//  CHECK-SAME:     %[[t0:.*]]: memref<?xf32, strided<[?]>>
 func.func @mask_scalable(%t0: tensor<?xf32>, %val: vector<[16]xf32>, %idx: index, %m0: vector<[16]xi1>) -> tensor<?xf32> {
   // CHECK-NOT: alloc
   // CHECK-NOT: copy
-  //     CHECK: vector.mask %{{.*}} { vector.transfer_write %{{.*}}, %[[t0]][%{{.*}}] : vector<[16]xf32>, memref<?xf32, strided<[?], offset: ?>> } : vector<[16]xi1>
+  //     CHECK: vector.mask %{{.*}} { vector.transfer_write %{{.*}}, %[[t0]][%{{.*}}] : vector<[16]xf32>, memref<?xf32, strided<[?]>> } : vector<[16]xi1>
   %0 = vector.mask %m0 { vector.transfer_write %val, %t0[%idx] : vector<[16]xf32>, tensor<?xf32> } : vector<[16]xi1> -> tensor<?xf32>
   //     CHECK: return %[[t0]]
   return %0 : tensor<?xf32>
diff --git a/mlir/test/Dialect/Vector/ops.mlir b/mlir/test/Dialect/Vector/ops.mlir
index de620221944de..b51be1bed257a 100644
--- a/mlir/test/Dialect/Vector/ops.mlir
+++ b/mlir/test/Dialect/Vector/ops.mlir
@@ -719,22 +719,22 @@ func.func @vector_load_and_store_0d_scalar_memref(%memref : memref<200x100xf32>,
 }
 
 // CHECK-LABEL: @vector_load_and_store_0d_scalar_strided_memref
-func.func @vector_load_and_store_0d_scalar_strided_memref(%memref : memref<200x100xf32, strided<[?, ?], offset: ?>>,
+func.func @vector_load_and_store_0d_scalar_strided_memref(%memref : memref<200x100xf32, strided<[?, ?]>>,
                                                           %i : index, %j : index) {
-  // CHECK: %[[ld:.*]] = vector.load %{{.*}}[%{{.*}}] : memref<200x100xf32, strided<[?, ?], offset: ?>>, vector<f32>
-  %0 = vector.load %memref[%i, %j] : memref<200x100xf32, strided<[?, ?], offset: ?>>, vector<f32>
-  // CHECK: vector.store %[[ld]], %{{.*}}[%{{.*}}] : memref<200x100xf32, strided<[?, ?], offset: ?>>, vector<f32>
-  vector.store %0, %memref[%i, %j] : memref<200x100xf32, strided<[?, ?], offset: ?>>, vector<f32>
+  // CHECK: %[[ld:.*]] = vector.load %{{.*}}[%{{.*}}] : memref<200x100xf32, strided<[?, ?]>>, vector<f32>
+  %0 = vector.load %memref[%i, %j] : memref<200x100xf32, strided<[?, ?]>>, vector<f32>
+  // CHECK: vector.store %[[ld]], %{{.*}}[%{{.*}}] : memref<200x100xf32, strided<[?, ?]>>, vector<f32>
+  vector.store %0, %memref[%i, %j] : memref<200x100xf32, strided<[?, ?]>>, vector<f32>
   return
 }
 
 // CHECK-LABEL: @vector_load_and_store_unit_vec_strided_memref
-func.func @vector_load_and_store_unit_vec_strided_memref(%memref : memref<200x100xf32, strided<[?, ?], offset: ?>>,
+func.func @vector_load_and_store_unit_vec_strided_memref(%memref : memref<200x100xf32, strided<[?, ?]>>,
                                                          %i : index, %j : index) {
-  // CHECK: %[[ld:.*]] = vector.load %{{.*}}[%{{.*}}] : memref<200x100xf32, strided<[?, ?], offset: ?>>, vector<1xf32>
-  %0 = vector.load %memref[%i, %j] : memref<200x100xf32, strided<[?, ?], offset: ?>>, vector<1xf32>
-  // CHECK: vector.store %[[ld]], %{{.*}}[%{{.*}}] : memref<200x100xf32, strided<[?, ?], offset: ?>>, vector<1xf32>
-  vector.store %0, %memref[%i, %j] : memref<200x100xf32, strided<[?, ?], offset: ?>>, vector<1xf32>
+  // CHECK: %[[ld:.*]] = vector.load %{{.*}}[%{{.*}}] : memref<200x100xf32, strided<[?, ?]>>, vector<1xf32>
+  %0 = vector.load %memref[%i, %j] : memref<200x100xf32, strided<[?, ?]>>, vector<1xf32>
+  // CHECK: vector.store %[[ld]], %{{.*}}[%{{.*}}] : memref<200x100xf32, strided<[?, ?]>>, vector<1xf32>
+  vector.store %0, %memref[%i, %j] : memref<200x100xf32, strided<[?, ?]>>, vector<1xf32>
   return
 }
 
diff --git a/mlir/test/Dialect/Vector/vector-transfer-collapse-inner-most-dims.mlir b/mlir/test/Dialect/Vector/vector-transfer-collapse-inner-most-dims.mlir
index 1bedce7ea6a67..35cfb5b7908f4 100644
--- a/mlir/test/Dialect/Vector/vector-transfer-collapse-inner-most-dims.mlir
+++ b/mlir/test/Dialect/Vector/vector-transfer-collapse-inner-most-dims.mlir
@@ -7,18 +7,18 @@
 // [Pattern: DropInnerMostUnitDimsTransferRead]
 //-----------------------------------------------------------------------------
 
-func.func @contiguous_inner_most(%src: memref<1x1x8x1xf32, strided<[3072, 8, 1, 1], offset: ?>>) -> vector<1x8x1xf32>{
+func.func @contiguous_inner_most(%src: memref<1x1x8x1xf32, strided<[3072, 8, 1, 1]>>) -> vector<1x8x1xf32>{
   %c0 = arith.constant 0 : index
   %pad = arith.constant 0.0 : f32
-  %v = vector.transfer_read %src[%c0, %c0, %c0, %c0], %pad {in_bounds = [true, true, true]} : memref<1x1x8x1xf32, strided<[3072, 8, 1, 1], offset: ?>>, vector<1x8x1xf32>
+  %v = vector.transfer_read %src[%c0, %c0, %c0, %c0], %pad {in_bounds = [true, true, true]} : memref<1x1x8x1xf32, strided<[3072, 8, 1, 1]>>, vector<1x8x1xf32>
   return %v : vector<1x8x1xf32>
 }
 
-//      CHECK: func @contiguous_inner_most(%[[SRC:.+]]: memref<1x1x8x1xf32, strided<[3072, 8, 1, 1], offset: ?>>
+//      CHECK: func @contiguous_inner_most(%[[SRC:.+]]: memref<1x1x8x1xf32, strided<[3072, 8, 1, 1]>>
 //      CHECK:   %[[SRC_0:.+]] = memref.subview %[[SRC]]
-// CHECK-SAME:    memref<1x1x8x1xf32, strided<[3072, 8, 1, 1], offset: ?>> to memref<1x1x8xf32, strided<[3072, 8, 1], offset: ?>>
+// CHECK-SAME:    memref<1x1x8x1xf32, strided<[3072, 8, 1, 1]>> to memref<1x1x8xf32, strided<[3072, 8, 1]>>
 //      CHECK:   %[[VEC:.+]] = vector.transfer_read %[[SRC_0]]
-// CHECK-SAME:    memref<1x1x8xf32, strided<[3072, 8, 1], offset: ?>>, vector<1x8xf32>
+// CHECK-SAME:    memref<1x1x8xf32, strided<[3072, 8, 1]>>, vector<1x8xf32>
 //      CHECK:   %[[RESULT:.+]] = vector.shape_cast %[[VEC]]
 //      CHECK:   return %[[RESULT]]
 
@@ -26,28 +26,28 @@ func.func @contiguous_inner_most(%src: memref<1x1x8x1xf32, strided<[3072, 8, 1,
 // dim scalable. Note that this example only makes sense when "8 = [8]" (i.e.
 // vscale = 1). This is assumed via the `in_bounds` attribute.
 
-func.func @contiguous_inner_most_scalable_inner_dim(%src: memref<1x1x8x1xf32, strided<[3072, 8, 1, 1], offset: ?>>) -> vector<1x[8]x1xf32>{
+func.func @contiguous_inner_most_scalable_inner_dim(%src: memref<1x1x8x1xf32, strided<[3072, 8, 1, 1]>>) -> vector<1x[8]x1xf32>{
   %c0 = arith.constant 0 : index
   %pad = arith.constant 0.0 : f32
-  %v = vector.transfer_read %src[%c0, %c0, %c0, %c0], %pad {in_bounds = [true, true, true]} : memref<1x1x8x1xf32, strided<[3072, 8, 1, 1], offset: ?>>, vector<1x[8]x1xf32>
+  %v = vector.transfer_read %src[%c0, %c0, %c0, %c0], %pad {in_bounds = [true, true, true]} : memref<1x1x8x1xf32, strided<[3072, 8, 1, 1]>>, vector<1x[8]x1xf32>
   return %v : vector<1x[8]x1xf32>
 }
 
-//      CHECK: func @contiguous_inner_most_scalable_inner_dim(%[[SRC:.+]]: memref<1x1x8x1xf32, strided<[3072, 8, 1, 1], offset: ?>>
+//      CHECK: func @contiguous_inner_most_scalable_inner_dim(%[[SRC:.+]]: memref<1x1x8x1xf32, strided<[3072, 8, 1, 1]>>
 //      CHECK:   %[[SRC_0:.+]] = memref.subview %[[SRC]]
-// CHECK-SAME:    memref<1x1x8x1xf32, strided<[3072, 8, 1, 1], offset: ?>> to memref<1x1x8xf32, strided<[3072, 8, 1], offset: ?>>
+// CHECK-SAME:    memref<1x1x8x1xf32, strided<[3072, 8, 1, 1]>> to memref<1x1x8xf32, strided<[3072, 8, 1]>>
 //      CHECK:   %[[VEC:.+]] = vector.transfer_read %[[SRC_0]]
-// CHECK-SAME:    memref<1x1x8xf32, strided<[3072, 8, 1], offset: ?>>, vector<1x[8]xf32>
+// CHECK-SAME:    memref<1x1x8xf32, strided<[3072, 8, 1]>>, vector<1x[8]xf32>
 //      CHECK:   %[[RESULT:.+]] = vector.shape_cast %[[VEC]]
 //      CHECK:   return %[[RESULT]]
 
 // Same as the top example within this split, but the trailing unit dim was
 // replaced with a dyn dim - not supported
 
-func.func @negative_dynamic_trailing_dim(%src: memref<1x1x8x?xf32, strided<[3072, 8, 1, 1], offset: ?>>) -> vector<1x8x1xf32>{
+func.func @negative_dynamic_trailing_dim(%src: memref<1x1x8x?xf32, strided<[3072, 8, 1, 1]>>) -> vector<1x8x1xf32>{
   %c0 = arith.constant 0 : index
   %pad = arith.constant 0.0 : f32
-  %v = vector.transfer_read %src[%c0, %c0, %c0, %c0], %pad {in_bounds = [true, true, true]} : memref<1x1x8x?xf32, strided<[3072, 8, 1, 1], offset: ?>>, vector<1x8x1xf32>
+  %v = vector.transfer_read %src[%c0, %c0, %c0, %c0], %pad {in_bounds = [true, true, true]} : memref<1x1x8x?xf32, strided<[3072, 8, 1, 1]>>, vector<1x8x1xf32>
   return %v : vector<1x8x1xf32>
 }
 
@@ -58,10 +58,10 @@ func.func @negative_dynamic_trailing_dim(%src: memref<1x1x8x?xf32, strided<[3072
 // Same as the top example within this split, but with a "scalable unit" dim in
 // the output vector - not supported (scalable 1, [1], is _not_ a unit dimension).
 
-func.func @negative_scalable_one_trailing_dim(%src: memref<1x1x8x1xf32, strided<[3072, 8, 1, 1], offset: ?>>) -> vector<1x8x[1]xf32>{
+func.func @negative_scalable_one_trailing_dim(%src: memref<1x1x8x1xf32, strided<[3072, 8, 1, 1]>>) -> vector<1x8x[1]xf32>{
   %c0 = arith.constant 0 : index
   %pad = arith.constant 0.0 : f32
-  %v = vector.transfer_read %src[%c0, %c0, %c0, %c0], %pad {in_bounds = [true, true, true]} : memref<1x1x8x1xf32, strided<[3072, 8, 1, 1], offset: ?>>, vector<1x8x[1]xf32>
+  %v = vector.transfer_read %src[%c0, %c0, %c0, %c0], %pad {in_bounds = [true, true, true]} : memref<1x1x8x1xf32, strided<[3072, 8, 1, 1]>>, vector<1x8x[1]xf32>
   return %v : vector<1x8x[1]xf32>
 }
 //  CHECK-LABEL: func @negative_scalable_one_trailing_dim
@@ -199,8 +199,8 @@ func.func @negative_contiguous_inner_most_non_zero_idx_out_of_bounds(%src: memre
 func.func @contiguous_inner_most_dim_with_subview(%src: memref<1000x1xf32>, %i:index, %ii:index) -> (vector<4x1xf32>) {
   %c0 = arith.constant 0 : index
   %pad = arith.constant 0.0 : f32
-  %sv = memref.subview %src[%i, 0] [40, 1] [1, 1] : memref<1000x1xf32> to memref<40x1xf32, strided<[1, 1], offset: ?>>
-  %v = vector.transfer_read %sv[%ii, %c0], %pad {in_bounds = [true, true]} : memref<40x1xf32, strided<[1, 1], offset: ?>>, vector<4x1xf32>
+  %sv = memref.subview %src[%i, 0] [40, 1] [1, 1] : memref<1000x1xf32> to memref<40x1xf32, strided<[1, 1]>>
+  %v = vector.transfer_read %sv[%ii, %c0], %pad {in_bounds = [true, true]} : memref<40x1xf32, strided<[1, 1]>>, vector<4x1xf32>
   return %v : vector<4x1xf32>
 }
 //      CHECK: func @contiguous_inner_most_dim_with_subview(%[[SRC:.+]]: memref<1000x1xf32>, %[[II:.+]]: index, %[[J:.+]]: index) -> vector<4x1xf32>
@@ -217,8 +217,8 @@ func.func @contiguous_inner_most_dim_with_subview(%src: memref<1000x1xf32>, %i:i
 func.func @contiguous_inner_most_dim_with_subview_scalable_inner_dim(%src: memref<1000x1xf32>, %i:index, %ii:index) -> (vector<[4]x1xf32>) {
   %c0 = arith.constant 0 : index
   %pad = arith.constant 0.0 : f32
-  %sv = memref.subview %src[%i, 0] [40, 1] [1, 1] : memref<1000x1xf32> to memref<40x1xf32, strided<[1, 1], offset: ?>>
-  %v = vector.transfer_read %sv[%ii, %c0], %pad {in_bounds = [true, true]} : memref<40x1xf32, strided<[1, 1], offset: ?>>, vector<[4]x1xf32>
+  %sv = memref.subview %src[%i, 0] [40, 1] [1, 1] : memref<1000x1xf32> to memref<40x1xf32, strided<[1, 1]>>
+  %v = vector.transfer_read %sv[%ii, %c0], %pad {in_bounds = [true, true]} : memref<40x1xf32, strided<[1, 1]>>, vector<[4]x1xf32>
   return %v : vector<[4]x1xf32>
 }
 // CHECK-LABEL: func @contiguous_inner_most_dim_with_subview_scalable_inner_dim
@@ -233,8 +233,8 @@ func.func @contiguous_inner_most_dim_with_subview_scalable_inner_dim(%src: memre
 func.func @contiguous_inner_most_dim_with_subview_2d(%src: memref<1000x1x1xf32>, %i:index, %ii:index) -> (vector<4x1x1xf32>) {
   %c0 = arith.constant 0 : index
   %pad = arith.constant 0.0 : f32
-  %sv = memref.subview %src[%i, 0, 0] [40, 1, 1] [1, 1, 1] : memref<1000x1x1xf32> to memref<40x1x1xf32, strided<[1, 1, 1], offset: ?>>
-  %v = vector.transfer_read %sv[%ii, %c0, %c0], %pad {in_bounds = [true, true, true]} : memref<40x1x1xf32, strided<[1, 1, 1], offset: ?>>, vector<4x1x1xf32>
+  %sv = memref.subview %src[%i, 0, 0] [40, 1, 1] [1, 1, 1] : memref<1000x1x1xf32> to memref<40x1x1xf32, strided<[1, 1, 1]>>
+  %v = vector.transfer_read %sv[%ii, %c0, %c0], %pad {in_bounds = [true, true, true]} : memref<40x1x1xf32, strided<[1, 1, 1]>>, vector<4x1x1xf32>
   return %v : vector<4x1x1xf32>
 }
 //      CHECK: func @contiguous_inner_most_dim_with_subview_2d(%[[SRC:.+]]: memref<1000x1x1xf32>, %[[II:.+]]: index, %[[J:.+]]: index) -> vector<4x1x1xf32>
@@ -251,8 +251,8 @@ func.func @contiguous_inner_most_dim_with_subview_2d(%src: memref<1000x1x1xf32>,
 func.func @contiguous_inner_most_dim_with_subview_2d_scalable_inner_dim(%src: memref<1000x1x1xf32>, %i:index, %ii:index) -> (vector<[4]x1x1xf32>) {
   %c0 = arith.constant 0 : index
   %pad = arith.constant 0.0 : f32
-  %sv = memref.subview %src[%i, 0, 0] [40, 1, 1] [1, 1, 1] : memref<1000x1x1xf32> to memref<40x1x1xf32, strided<[1, 1, 1], offset: ?>>
-  %v = vector.transfer_read %sv[%ii, %c0, %c0], %pad {in_bounds = [true, true, true]} : memref<40x1x1xf32, strided<[1, 1, 1], offset: ?>>, vector<[4]x1x1xf32>
+  %sv = memref.subview %src[%i, 0, 0] [40, 1, 1] [1, 1, 1] : memref<1000x1x1xf32> to memref<40x1x1xf32, strided<[1, 1, 1]>>
+  %v = vector.transfer_read %sv[%ii, %c0, %c0], %pad {in_bounds = [true, true, true]} : memref<40x1x1xf32, strided<[1, 1, 1]>>, vector<[4]x1x1xf32>
   return %v : vector<[4]x1x1xf32>
 }
 // CHECK-LABEL: func @contiguous_inner_most_dim_with_subview_2d_scalable_inner_dim(
@@ -266,18 +266,18 @@ func.func @contiguous_inner_most_dim_with_subview_2d_scalable_inner_dim(%src: me
 
 // -----
 
-func.func @contiguous_inner_most_with_mask(%src: memref<1x1x8x1xf32, strided<[3072, 8, 1, 1], offset: ?>>, %mask: vector<1x8x1xi1>) -> vector<1x8x1xf32>{
+func.func @contiguous_inner_most_with_mask(%src: memref<1x1x8x1xf32, strided<[3072, 8, 1, 1]>>, %mask: vector<1x8x1xi1>) -> vector<1x8x1xf32>{
   %c0 = arith.constant 0 : index
   %pad = arith.constant 0.0 : f32
-  %v = vector.transfer_read %src[%c0, %c0, %c0, %c0], %pad, %mask {in_bounds = [true, true, true]} : memref<1x1x8x1xf32, strided<[3072, 8, 1, 1], offset: ?>>, vector<1x8x1xf32>
+  %v = vector.transfer_read %src[%c0, %c0, %c0, %c0], %pad, %mask {in_bounds = [true, true, true]} : memref<1x1x8x1xf32, strided<[3072, 8, 1, 1]>>, vector<1x8x1xf32>
   return %v : vector<1x8x1xf32>
 }
-//      CHECK: func @contiguous_inner_most_with_mask(%[[SRC:.+]]: memref<1x1x8x1xf32, strided<[3072, 8, 1, 1], offset: ?>>, %[[MASK:.+]]: vector<1x8x1xi1>)
+//      CHECK: func @contiguous_inner_most_with_mask(%[[SRC:.+]]: memref<1x1x8x1xf32, strided<[3072, 8, 1, 1]>>, %[[MASK:.+]]: vector<1x8x1xi1>)
 //      CHECK:   %[[SRC_0:.+]] = memref.subview %[[SRC]]
-// CHECK-SAME:    memref<1x1x8x1xf32, strided<[3072, 8, 1, 1], offset: ?>> to memref<1x1x8xf32, strided<[3072, 8, 1], offset: ?>>
+// CHECK-SAME:    memref<1x1x8x1xf32, strided<[3072, 8, 1, 1]>> to memref<1x1x8xf32, strided<[3072, 8, 1]>>
 //      CHECK:   %[[REDUCED_MASK:.+]] = vector.shape_cast %[[MASK]] : vector<1x8x1xi1> to vector<1x8xi1>
 //      CHECK:   %[[VEC:.+]] = vector.transfer_read %[[SRC_0]]{{.*}}, %[[REDUCED_MASK]]
-// CHECK-SAME:    memref<1x1x8xf32, strided<[3072, 8, 1], offset: ?>>, vector<1x8xf32>
+// CHECK-SAME:    memref<1x1x8xf32, strided<[3072, 8, 1]>>, vector<1x8xf32>
 //      CHECK:   %[[RESULT:.+]] = vector.shape_cast %[[VEC]]
 //      CHECK:   return %[[RESULT]]
 
@@ -311,12 +311,12 @@ func.func @negative_non_unit_inner_memref_dim(%src: memref<4x8xf32>) -> vector<4
 
 // The inner most unit dims can not be dropped if the strides are not ones.
 
-func.func @negative_non_unit_strides(%src: memref<512x16x1xf32, strided<[8192, 16, 4], offset: ?>>, %i: index) -> vector<16x16x1xf32> {
+func.func @negative_non_unit_strides(%src: memref<512x16x1xf32, strided<[8192, 16, 4]>>, %i: index) -> vector<16x16x1xf32> {
   %c0 = arith.constant 0 : index
   %pad = arith.constant 0.000000e+00 : f32
   %v = vector.transfer_read %src[%i, %c0, %c0], %pad
     {in_bounds = [true, true, true]}
-    : memref<512x16x1xf32, strided<[8192, 16, 4], offset: ?>>, vector<16x16x1xf32>
+    : memref<512x16x1xf32, strided<[8192, 16, 4]>>, vector<16x16x1xf32>
   return %v : vector<16x16x1xf32>
 }
 // CHECK:     func.func @negative_non_unit_strides
@@ -522,8 +522,8 @@ func.func @negative_contiguous_inner_most_dim_non_zero_idx_out_of_bounds(%dest:
 func.func @contiguous_inner_most_dim_with_subview(%dest: memref<1000x1xf32>, %i:index, %ii:index, %vec: vector<4x1xf32>) {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.0 : f32
-  %0 = memref.subview %dest[%i, 0] [40, 1] [1, 1] : memref<1000x1xf32> to memref<40x1xf32, strided<[1, 1], offset: ?>>
-  vector.transfer_write %vec, %0[%ii, %c0] {in_bounds = [true, true]} : vector<4x1xf32>, memref<40x1xf32, strided<[1, 1], offset: ?>>
+  %0 = memref.subview %dest[%i, 0] [40, 1] [1, 1] : memref<1000x1xf32> to memref<40x1xf32, strided<[1, 1]>>
+  vector.transfer_write %vec, %0[%ii, %c0] {in_bounds = [true, true]} : vector<4x1xf32>, memref<40x1xf32, strided<[1, 1]>>
   return
 }
 
@@ -531,10 +531,10 @@ func.func @contiguous_inner_most_dim_with_subview(%dest: memref<1000x1xf32>, %i:
 // CHECK-SAME:      %[[MEM:.*]]: memref<1000x1xf32>,
 // CHECK-SAME:      %[[IDX_1:.*]]: index, %[[IDX_2:.*]]: index,
 // CHECK-SAME:      %[[VEC:.*]]: vector<4x1xf32>) {
-// CHECK:           %[[SV_1:.*]] = memref.subview %[[MEM]]{{\[}}%[[IDX_1]], 0] [40, 1] [1, 1] : memref<1000x1xf32> to memref<40x1xf32, strided<[1, 1], offset: ?>>
-// CHECK:           %[[SV_2:.*]] = memref.subview %[[SV_1]][0, 0] [40, 1] [1, 1] : memref<40x1xf32, strided<[1, 1], offset: ?>> to memref<40xf32, strided<[1], offset: ?>>
+// CHECK:           %[[SV_1:.*]] = memref.subview %[[MEM]]{{\[}}%[[IDX_1]], 0] [40, 1] [1, 1] : memref<1000x1xf32> to memref<40x1xf32, strided<[1, 1]>>
+// CHECK:           %[[SV_2:.*]] = memref.subview %[[SV_1]][0, 0] [40, 1] [1, 1] : memref<40x1xf32, strided<[1, 1]>> to memref<40xf32, strided<[1]>>
 // CHECK:           %[[SC:.*]] = vector.shape_cast %[[VEC]] : vector<4x1xf32> to vector<4xf32>
-// CHECK:           vector.transfer_write %[[SC]], %[[SV_2]]{{\[}}%[[IDX_2]]] {in_bounds = [true]} : vector<4xf32>, memref<40xf32, strided<[1], offset: ?>>
+// CHECK:           vector.transfer_write %[[SC]], %[[SV_2]]{{\[}}%[[IDX_2]]] {in_bounds = [true]} : vector<4xf32>, memref<40xf32, strided<[1]>>
 
 // Same as the top example within this split, but with the outer vector
 // dim scalable. Note that this example only makes sense when "4 = [4]" (i.e.
@@ -543,8 +543,8 @@ func.func @contiguous_inner_most_dim_with_subview(%dest: memref<1000x1xf32>, %i:
 func.func @contiguous_inner_most_dim_with_subview_scalable_inner_dim(%dest: memref<1000x1xf32>, %i:index, %ii:index, %vec: vector<[4]x1xf32>) {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.0 : f32
-  %0 = memref.subview %dest[%i, 0] [40, 1] [1, 1] : memref<1000x1xf32> to memref<40x1xf32, strided<[1, 1], offset: ?>>
-  vector.transfer_write %vec, %0[%ii, %c0] {in_bounds = [true, true]} : vector<[4]x1xf32>, memref<40x1xf32, strided<[1, 1], offset: ?>>
+  %0 = memref.subview %dest[%i, 0] [40, 1] [1, 1] : memref<1000x1xf32> to memref<40x1xf32, strided<[1, 1]>>
+  vector.transfer_write %vec, %0[%ii, %c0] {in_bounds = [true, true]} : vector<[4]x1xf32>, memref<40x1xf32, strided<[1, 1]>>
   return
 }
 
@@ -552,28 +552,28 @@ func.func @contiguous_inner_most_dim_with_subview_scalable_inner_dim(%dest: memr
 // CHECK-SAME:      %[[MEM:.*]]: memref<1000x1xf32>,
 // CHECK-SAME:      %[[IDX_1:.*]]: index, %[[IDX_2:.*]]: index,
 // CHECK-SAME:      %[[VEC:.*]]: vector<[4]x1xf32>) {
-// CHECK:           %[[SV_1:.*]] = memref.subview %[[MEM]]{{\[}}%[[IDX_1]], 0] [40, 1] [1, 1] : memref<1000x1xf32> to memref<40x1xf32, strided<[1, 1], offset: ?>>
-// CHECK:           %[[SV_2:.*]] = memref.subview %[[SV_1]][0, 0] [40, 1] [1, 1] : memref<40x1xf32, strided<[1, 1], offset: ?>> to memref<40xf32, strided<[1], offset: ?>>
+// CHECK:           %[[SV_1:.*]] = memref.subview %[[MEM]]{{\[}}%[[IDX_1]], 0] [40, 1] [1, 1] : memref<1000x1xf32> to memref<40x1xf32, strided<[1, 1]>>
+// CHECK:           %[[SV_2:.*]] = memref.subview %[[SV_1]][0, 0] [40, 1] [1, 1] : memref<40x1xf32, strided<[1, 1]>> to memref<40xf32, strided<[1]>>
 // CHECK:           %[[SC:.*]] = vector.shape_cast %[[VEC]] : vector<[4]x1xf32> to vector<[4]xf32>
-// CHECK:           vector.transfer_write %[[SC]], %[[SV_2]]{{\[}}%[[IDX_2]]] {in_bounds = [true]} : vector<[4]xf32>, memref<40xf32, strided<[1], offset: ?>>
+// CHECK:           vector.transfer_write %[[SC]], %[[SV_2]]{{\[}}%[[IDX_2]]] {in_bounds = [true]} : vector<[4]xf32>, memref<40xf32, strided<[1]>>
 
 // -----
 
 func.func @contiguous_inner_most_dim_with_subview_2d(%dest: memref<1000x1x1xf32>, %i:index, %ii:index, %vec: vector<4x1x1xf32>) {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.0 : f32
-  %0 = memref.subview %dest[%i, 0, 0] [40, 1, 1] [1, 1, 1] : memref<1000x1x1xf32> to memref<40x1x1xf32, strided<[1, 1, 1], offset: ?>>
-  vector.transfer_write %vec, %0[%ii, %c0, %c0] {in_bounds = [true, true, true]} : vector<4x1x1xf32>, memref<40x1x1xf32, strided<[1, 1, 1], offset: ?>>
+  %0 = memref.subview %dest[%i, 0, 0] [40, 1, 1] [1, 1, 1] : memref<1000x1x1xf32> to memref<40x1x1xf32, strided<[1, 1, 1]>>
+  vector.transfer_write %vec, %0[%ii, %c0, %c0] {in_bounds = [true, true, true]} : vector<4x1x1xf32>, memref<40x1x1xf32, strided<[1, 1, 1]>>
   return
 }
 // CHECK-LABEL:   func.func @contiguous_inner_most_dim_with_subview_2d(
 // CHECK-SAME:      %[[MEM:.*]]: memref<1000x1x1xf32>,
 // CHECK-SAME:      %[[IDX_1:.*]]: index, %[[IDX_2:.*]]: index,
 // CHECK-SAME:      %[[VEC:.*]]: vector<4x1x1xf32>) {
-// CHECK:           %[[SV_1:.*]] = memref.subview %[[MEM]]{{\[}}%[[IDX_1]], 0, 0] [40, 1, 1] [1, 1, 1] : memref<1000x1x1xf32> to memref<40x1x1xf32, strided<[1, 1, 1], offset: ?>>
-// CHECK:           %[[SV_2:.*]] = memref.subview %[[SV_1]][0, 0, 0] [40, 1, 1] [1, 1, 1] : memref<40x1x1xf32, strided<[1, 1, 1], offset: ?>> to memref<40xf32, strided<[1], offset: ?>>
+// CHECK:           %[[SV_1:.*]] = memref.subview %[[MEM]]{{\[}}%[[IDX_1]], 0, 0] [40, 1, 1] [1, 1, 1] : memref<1000x1x1xf32> to memref<40x1x1xf32, strided<[1, 1, 1]>>
+// CHECK:           %[[SV_2:.*]] = memref.subview %[[SV_1]][0, 0, 0] [40, 1, 1] [1, 1, 1] : memref<40x1x1xf32, strided<[1, 1, 1]>> to memref<40xf32, strided<[1]>>
 // CHECK:           %[[SC:.*]] = vector.shape_cast %[[VEC]] : vector<4x1x1xf32> to vector<4xf32>
-// CHECK:           vector.transfer_write %[[SC]], %[[SV_2]]{{\[}}%[[IDX_2]]] {in_bounds = [true]} : vector<4xf32>, memref<40xf32, strided<[1], offset: ?>>
+// CHECK:           vector.transfer_write %[[SC]], %[[SV_2]]{{\[}}%[[IDX_2]]] {in_bounds = [true]} : vector<4xf32>, memref<40xf32, strided<[1]>>
 
 // Same as the top example within this split, but with the outer vector
 // dim scalable. Note that this example only makes sense when "4 = [4]" (i.e.
@@ -582,33 +582,33 @@ func.func @contiguous_inner_most_dim_with_subview_2d(%dest: memref<1000x1x1xf32>
 func.func @contiguous_inner_most_dim_with_subview_2d_scalable(%dest: memref<1000x1x1xf32>, %i:index, %ii:index, %vec: vector<[4]x1x1xf32>) {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.0 : f32
-  %0 = memref.subview %dest[%i, 0, 0] [40, 1, 1] [1, 1, 1] : memref<1000x1x1xf32> to memref<40x1x1xf32, strided<[1, 1, 1], offset: ?>>
-  vector.transfer_write %vec, %0[%ii, %c0, %c0] {in_bounds = [true, true, true]} : vector<[4]x1x1xf32>, memref<40x1x1xf32, strided<[1, 1, 1], offset: ?>>
+  %0 = memref.subview %dest[%i, 0, 0] [40, 1, 1] [1, 1, 1] : memref<1000x1x1xf32> to memref<40x1x1xf32, strided<[1, 1, 1]>>
+  vector.transfer_write %vec, %0[%ii, %c0, %c0] {in_bounds = [true, true, true]} : vector<[4]x1x1xf32>, memref<40x1x1xf32, strided<[1, 1, 1]>>
   return
 }
 // CHECK-LABEL:   func.func @contiguous_inner_most_dim_with_subview_2d_scalable
 // CHECK-SAME:      %[[MEM:.*]]: memref<1000x1x1xf32>,
 // CHECK-SAME:      %[[IDX_1:.*]]: index, %[[IDX_2:.*]]: index,
 // CHECK-SAME:      %[[VEC:.*]]: vector<[4]x1x1xf32>) {
-// CHECK:           %[[SV_1:.*]] = memref.subview %[[MEM]]{{\[}}%[[IDX_1]], 0, 0] [40, 1, 1] [1, 1, 1] : memref<1000x1x1xf32> to memref<40x1x1xf32, strided<[1, 1, 1], offset: ?>>
-// CHECK:           %[[SV_2:.*]] = memref.subview %[[SV_1]][0, 0, 0] [40, 1, 1] [1, 1, 1] : memref<40x1x1xf32, strided<[1, 1, 1], offset: ?>> to memref<40xf32, strided<[1], offset: ?>>
+// CHECK:           %[[SV_1:.*]] = memref.subview %[[MEM]]{{\[}}%[[IDX_1]], 0, 0] [40, 1, 1] [1, 1, 1] : memref<1000x1x1xf32> to memref<40x1x1xf32, strided<[1, 1, 1]>>
+// CHECK:           %[[SV_2:.*]] = memref.subview %[[SV_1]][0, 0, 0] [40, 1, 1] [1, 1, 1] : memref<40x1x1xf32, strided<[1, 1, 1]>> to memref<40xf32, strided<[1]>>
 // CHECK:           %[[SC:.*]] = vector.shape_cast %[[VEC]] : vector<[4]x1x1xf32> to vector<[4]xf32>
-// CHECK:           vector.transfer_write %[[SC]], %[[SV_2]]{{\[}}%[[IDX_2]]] {in_bounds = [true]} : vector<[4]xf32>, memref<40xf32, strided<[1], offset: ?>>
+// CHECK:           vector.transfer_write %[[SC]], %[[SV_2]]{{\[}}%[[IDX_2]]] {in_bounds = [true]} : vector<[4]xf32>, memref<40xf32, strided<[1]>>
 
 // -----
 
-func.func @contiguous_inner_most_with_mask(%dest: memref<1x1x8x1xf32, strided<[3072, 8, 1, 1], offset: ?>>, %vec: vector<1x8x1xf32>, %mask: vector<1x8x1xi1>) {
+func.func @contiguous_inner_most_with_mask(%dest: memref<1x1x8x1xf32, strided<[3072, 8, 1, 1]>>, %vec: vector<1x8x1xf32>, %mask: vector<1x8x1xi1>) {
   %c0 = arith.constant 0 : index
-  vector.transfer_write %vec, %dest[%c0, %c0, %c0, %c0], %mask {in_bounds = [true, true, true]} : vector<1x8x1xf32>, memref<1x1x8x1xf32, strided<[3072, 8, 1, 1], offset: ?>>
+  vector.transfer_write %vec, %dest[%c0, %c0, %c0, %c0], %mask {in_bounds = [true, true, true]} : vector<1x8x1xf32>, memref<1x1x8x1xf32, strided<[3072, 8, 1, 1]>>
   return
 }
-//      CHECK: func @contiguous_inner_most_with_mask(%[[DEST:.+]]: memref<1x1x8x1xf32, strided<[3072, 8, 1, 1], offset: ?>>, %[[VEC:.+]]: vector<1x8x1xf32>, %[[MASK:.+]]: vector<1x8x1xi1>)
+//      CHECK: func @contiguous_inner_most_with_mask(%[[DEST:.+]]: memref<1x1x8x1xf32, strided<[3072, 8, 1, 1]>>, %[[VEC:.+]]: vector<1x8x1xf32>, %[[MASK:.+]]: vector<1x8x1xi1>)
 //      CHECK:   %[[DEST_0:.+]] = memref.subview %[[DEST]]
-// CHECK-SAME:    memref<1x1x8x1xf32, strided<[3072, 8, 1, 1], offset: ?>> to memref<1x1x8xf32, strided<[3072, 8, 1], offset: ?>>
+// CHECK-SAME:    memref<1x1x8x1xf32, strided<[3072, 8, 1, 1]>> to memref<1x1x8xf32, strided<[3072, 8, 1]>>
 //      CHECK:   %[[REDUCED_VEC:.+]] = vector.shape_cast %[[VEC]] : vector<1x8x1xf32> to vector<1x8xf32>
 //      CHECK:   %[[REDUCED_MASK:.+]] = vector.shape_cast %[[MASK]] : vector<1x8x1xi1> to vector<1x8xi1>
 //      CHECK:   vector.transfer_write %[[REDUCED_VEC]], %[[DEST_0]]{{.*}}, %[[REDUCED_MASK]]
-// CHECK-SAME:    vector<1x8xf32>, memref<1x1x8xf32, strided<[3072, 8, 1], offset: ?>>
+// CHECK-SAME:    vector<1x8xf32>, memref<1x1x8xf32, strided<[3072, 8, 1]>>
 // -----
 
 // NOTE: This is an out-of-bounds access.
@@ -637,11 +637,11 @@ func.func @negative_non_unit_inner_memref_dim(%dest: memref<4x8xf32>, %vec: vect
 
 // The inner most unit dims can not be dropped if the strides are not ones.
 
-func.func @negative_non_unit_strides(%dest: memref<512x16x1xf32, strided<[8192, 16, 4], offset: ?>>, %v: vector<16x16x1xf32>, %i: index) {
+func.func @negative_non_unit_strides(%dest: memref<512x16x1xf32, strided<[8192, 16, 4]>>, %v: vector<16x16x1xf32>, %i: index) {
   %c0 = arith.constant 0 : index
   vector.transfer_write %v, %dest[%i, %c0, %c0]
     {in_bounds = [true, true, true]}
-    : vector<16x16x1xf32>, memref<512x16x1xf32, strided<[8192, 16, 4], offset: ?>>
+    : vector<16x16x1xf32>, memref<512x16x1xf32, strided<[8192, 16, 4]>>
   return
 }
 // CHECK:     func.func @negative_non_unit_strides
diff --git a/mlir/test/Dialect/Vector/vector-transfer-drop-unit-dims-patterns.mlir b/mlir/test/Dialect/Vector/vector-transfer-drop-unit-dims-patterns.mlir
index d30ba64c09159..f137a835016de 100644
--- a/mlir/test/Dialect/Vector/vector-transfer-drop-unit-dims-patterns.mlir
+++ b/mlir/test/Dialect/Vector/vector-transfer-drop-unit-dims-patterns.mlir
@@ -5,11 +5,11 @@
 //-----------------------------------------------------------------------------
 
 func.func @transfer_read_rank_reducing(
-      %arg : memref<1x1x3x2xi8, strided<[6, 6, 2, 1], offset: ?>>) -> vector<3x2xi8> {
+      %arg : memref<1x1x3x2xi8, strided<[6, 6, 2, 1]>>) -> vector<3x2xi8> {
     %c0 = arith.constant 0 : index
     %cst = arith.constant 0 : i8
     %v = vector.transfer_read %arg[%c0, %c0, %c0, %c0], %cst :
-      memref<1x1x3x2xi8, strided<[6, 6, 2, 1], offset: ?>>, vector<3x2xi8>
+      memref<1x1x3x2xi8, strided<[6, 6, 2, 1]>>, vector<3x2xi8>
     return %v : vector<3x2xi8>
 }
 // CHECK-LABEL: func @transfer_read_rank_reducing
@@ -19,13 +19,13 @@ func.func @transfer_read_rank_reducing(
 //       CHECK:   vector.transfer_read %[[SUBVIEW]]
 
 func.func @transfer_read_rank_reducing_masked(
-      %arg : memref<1x1x3x2xi8, strided<[6, 6, 2, 1], offset: ?>>,
+      %arg : memref<1x1x3x2xi8, strided<[6, 6, 2, 1]>>,
       %mask: vector<3x2xi1>) -> vector<3x2xi8> {
     %c0 = arith.constant 0 : index
     %cst = arith.constant 0 : i8
     %v = vector.mask %mask {
       vector.transfer_read %arg[%c0, %c0, %c0, %c0], %cst :
-        memref<1x1x3x2xi8, strided<[6, 6, 2, 1], offset: ?>>, vector<3x2xi8>
+        memref<1x1x3x2xi8, strided<[6, 6, 2, 1]>>, vector<3x2xi8>
     } : vector<3x2xi1> -> vector<3x2xi8>
     return %v : vector<3x2xi8>
 }
@@ -38,12 +38,12 @@ func.func @transfer_read_rank_reducing_masked(
 //  CHECK-SAME:  vector.transfer_read %[[SUBVIEW]]
 
 func.func @transfer_write_rank_reducing(
-      %arg : memref<1x1x3x2xi8, strided<[6, 6, 2, 1], offset: ?>>,
+      %arg : memref<1x1x3x2xi8, strided<[6, 6, 2, 1]>>,
       %vec : vector<3x2xi8>) {
 
     %c0 = arith.constant 0 : index
     vector.transfer_write %vec, %arg [%c0, %c0, %c0, %c0] :
-      vector<3x2xi8>, memref<1x1x3x2xi8, strided<[6, 6, 2, 1], offset: ?>>
+      vector<3x2xi8>, memref<1x1x3x2xi8, strided<[6, 6, 2, 1]>>
     return
 }
 // CHECK-LABEL: func @transfer_write_rank_reducing
@@ -53,13 +53,13 @@ func.func @transfer_write_rank_reducing(
 //       CHECK:   vector.transfer_write %{{.*}}, %[[SUBVIEW]]
 
 func.func @transfer_write_rank_reducing_masked(
-      %arg : memref<1x1x3x2xi8, strided<[6, 6, 2, 1], offset: ?>>,
+      %arg : memref<1x1x3x2xi8, strided<[6, 6, 2, 1]>>,
       %vec : vector<3x2xi8>,
       %mask: vector<3x2xi1>) {
     %c0 = arith.constant 0 : index
     vector.mask %mask {
       vector.transfer_write %vec, %arg [%c0, %c0, %c0, %c0] :
-        vector<3x2xi8>, memref<1x1x3x2xi8, strided<[6, 6, 2, 1], offset: ?>>
+        vector<3x2xi8>, memref<1x1x3x2xi8, strided<[6, 6, 2, 1]>>
     } : vector<3x2xi1>
     return
 }
@@ -162,29 +162,29 @@ func.func @transfer_write_and_vector_rank_reducing_to_0d_masked(
 //   CHECK-NOT:   memref.subview
 
 func.func @transfer_read_dynamic_rank_reducing(
-      %arg : memref<?x1xi8, strided<[?, ?], offset: ?>>) -> vector<[16]x1xi8> {
+      %arg : memref<?x1xi8, strided<[?, ?]>>) -> vector<[16]x1xi8> {
     %c0 = arith.constant 0 : index
     %pad = arith.constant 0 : i8
     %v = vector.transfer_read %arg[%c0, %c0], %pad {in_bounds = [true, true]} :
-      memref<?x1xi8, strided<[?, ?], offset: ?>>, vector<[16]x1xi8>
+      memref<?x1xi8, strided<[?, ?]>>, vector<[16]x1xi8>
     return %v : vector<[16]x1xi8>
 }
 // CHECK-LABEL: func @transfer_read_dynamic_rank_reducing
 //  CHECK-SAME:     %[[ARG:.+]]: memref<?x1xi8
 //       CHECK:   %[[C0:.+]] = arith.constant 0 : index
-//       CHECK:   %[[DIM0:.+]] = memref.dim %[[ARG]], %[[C0]] : memref<?x1xi8, strided<[?, ?], offset: ?>>
+//       CHECK:   %[[DIM0:.+]] = memref.dim %[[ARG]], %[[C0]] : memref<?x1xi8, strided<[?, ?]>>
 //       CHECK:   %[[SUBVIEW:.+]] = memref.subview %[[ARG]][0, 0] [%[[DIM0]], 1] [1, 1] : memref<?x1xi8, {{.*}}> to memref<?xi8, {{.*}}>
 //       CHECK:   vector.transfer_read %[[SUBVIEW]]{{.*}} : memref<?xi8, {{.*}}>, vector<[16]xi8>
 
 func.func @masked_transfer_read_dynamic_rank_reducing_1_create_mask(
-      %arg : memref<?x1xi8, strided<[?, ?], offset: ?>>,
+      %arg : memref<?x1xi8, strided<[?, ?]>>,
       %mask_dim0 : index) -> vector<[16]x1xi8> {
     %c0 = arith.constant 0 : index
     %c1 = arith.constant 1 : index
     %pad = arith.constant 0 : i8
     %mask = vector.create_mask %mask_dim0, %c1 : vector<[16]x1xi1>
     %v = vector.transfer_read %arg[%c0, %c0], %pad, %mask {in_bounds = [true, true]} :
-      memref<?x1xi8, strided<[?, ?], offset: ?>>, vector<[16]x1xi8>
+      memref<?x1xi8, strided<[?, ?]>>, vector<[16]x1xi8>
     return %v : vector<[16]x1xi8>
 }
 // CHECK-LABEL: func @masked_transfer_read_dynamic_rank_reducing_1_create_mask
@@ -193,17 +193,17 @@ func.func @masked_transfer_read_dynamic_rank_reducing_1_create_mask(
 //       CHECK:   %[[C0:.+]] = arith.constant 0 : index
 //       CHECK:   %[[PAD:.+]] = arith.constant 0 : i8
 //       CHECK:   %[[MASK:.+]] = vector.create_mask %[[MASK_DIM0]] : vector<[16]xi1>
-//       CHECK:   %[[DIM0:.+]] = memref.dim %[[ARG]], %[[C0]] : memref<?x1xi8, strided<[?, ?], offset: ?>>
+//       CHECK:   %[[DIM0:.+]] = memref.dim %[[ARG]], %[[C0]] : memref<?x1xi8, strided<[?, ?]>>
 //       CHECK:   %[[SUBVIEW:.+]] = memref.subview %[[ARG]][0, 0] [%[[DIM0]], 1] [1, 1] : memref<?x1xi8, {{.*}}> to memref<?xi8, {{.*}}>
 //       CHECK:   vector.transfer_read %[[SUBVIEW]][{{.*}}], %[[PAD]], %[[MASK]] {in_bounds = [true]} : memref<?xi8, {{.*}}>, vector<[16]xi8>
 
 func.func @masked_transfer_read_dynamic_rank_reducing_1_constant_mask(
-      %arg : memref<?x1xi8, strided<[?, ?], offset: ?>>) -> vector<[16]x1xi8> {
+      %arg : memref<?x1xi8, strided<[?, ?]>>) -> vector<[16]x1xi8> {
     %c0 = arith.constant 0 : index
     %pad = arith.constant 0 : i8
     %mask = vector.constant_mask [16, 1] : vector<[16]x1xi1>
     %v = vector.transfer_read %arg[%c0, %c0], %pad, %mask {in_bounds = [true, true]} :
-      memref<?x1xi8, strided<[?, ?], offset: ?>>, vector<[16]x1xi8>
+      memref<?x1xi8, strided<[?, ?]>>, vector<[16]x1xi8>
     return %v : vector<[16]x1xi8>
 }
 // CHECK-LABEL: func @masked_transfer_read_dynamic_rank_reducing_1_constant_mask
@@ -213,7 +213,7 @@ func.func @masked_transfer_read_dynamic_rank_reducing_1_constant_mask(
 //       CHECK:   vector.transfer_read %[[SUBVIEW]]{{.*}} {in_bounds = [true]} : memref<?xi8, {{.*}}>, vector<[16]xi8>
 
 func.func @masked_transfer_read_dynamic_rank_reducing_2_create_mask(
-      %arg : memref<1x?x3x1x?x1xi8, strided<[?, ?, ?, ?, ?, ?], offset: ?>>,
+      %arg : memref<1x?x3x1x?x1xi8, strided<[?, ?, ?, ?, ?, ?]>>,
       %mask_dim1 : index, %mask_dim4 : index) -> vector<1x[1]x3x1x[16]x1xi8> {
     %c0 = arith.constant 0 : index
     %c1 = arith.constant 1 : index
@@ -221,7 +221,7 @@ func.func @masked_transfer_read_dynamic_rank_reducing_2_create_mask(
     %pad = arith.constant 0 : i8
     %mask = vector.create_mask %c1, %mask_dim1, %c2, %c1, %mask_dim4, %c1 : vector<1x[1]x3x1x[16]x1xi1>
     %v = vector.transfer_read %arg[%c0, %c0, %c0, %c0, %c0, %c0], %pad, %mask {in_bounds = [true, true, true, true, true, true]} :
-      memref<1x?x3x1x?x1xi8, strided<[?, ?, ?, ?, ?, ?], offset: ?>>, vector<1x[1]x3x1x[16]x1xi8>
+      memref<1x?x3x1x?x1xi8, strided<[?, ?, ?, ?, ?, ?]>>, vector<1x[1]x3x1x[16]x1xi8>
     return %v : vector<1x[1]x3x1x[16]x1xi8>
 }
 // CHECK-LABEL: func @masked_transfer_read_dynamic_rank_reducing_2_create_mask
@@ -233,18 +233,18 @@ func.func @masked_transfer_read_dynamic_rank_reducing_2_create_mask(
 //   CHECK-DAG:   %[[C4:.+]] = arith.constant 4 : index
 //   CHECK-DAG:   %[[PAD:.+]] = arith.constant 0 : i8
 //       CHECK:   %[[MASK:.+]] = vector.create_mask %[[MASK_DIM1]], %[[C2]], %[[MASK_DIM4]] : vector<[1]x3x[16]xi1>
-//       CHECK:   %[[DIM1:.+]] = memref.dim %[[ARG]], %[[C1]] : memref<1x?x3x1x?x1xi8, strided<[?, ?, ?, ?, ?, ?], offset: ?>>
-//       CHECK:   %[[DIM4:.+]] = memref.dim %[[ARG]], %[[C4]] : memref<1x?x3x1x?x1xi8, strided<[?, ?, ?, ?, ?, ?], offset: ?>>
+//       CHECK:   %[[DIM1:.+]] = memref.dim %[[ARG]], %[[C1]] : memref<1x?x3x1x?x1xi8, strided<[?, ?, ?, ?, ?, ?]>>
+//       CHECK:   %[[DIM4:.+]] = memref.dim %[[ARG]], %[[C4]] : memref<1x?x3x1x?x1xi8, strided<[?, ?, ?, ?, ?, ?]>>
 //       CHECK:   %[[SUBVIEW:.+]] = memref.subview %[[ARG]][0, 0, 0, 0, 0, 0] [1, %[[DIM1]], 3, 1, %[[DIM4]], 1] [1, 1, 1, 1, 1, 1] : memref<1x?x3x1x?x1xi8, {{.*}}> to memref<?x3x?xi8, {{.*}}>
 //       CHECK:   vector.transfer_read %[[SUBVIEW]][{{.*}}], %[[PAD]], %[[MASK]] {in_bounds = [true, true, true]} : memref<?x3x?xi8, {{.*}}>, vector<[1]x3x[16]xi8>
 
 func.func @masked_transfer_read_dynamic_rank_reducing_2_constant_mask(
-      %arg : memref<1x?x3x1x?x1xi8, strided<[?, ?, ?, ?, ?, ?], offset: ?>>) -> vector<1x[1]x3x1x[16]x1xi8> {
+      %arg : memref<1x?x3x1x?x1xi8, strided<[?, ?, ?, ?, ?, ?]>>) -> vector<1x[1]x3x1x[16]x1xi8> {
     %c0 = arith.constant 0 : index
     %pad = arith.constant 0 : i8
     %mask = vector.constant_mask [1, 1, 2, 1, 16, 1] : vector<1x[1]x3x1x[16]x1xi1>
     %v = vector.transfer_read %arg[%c0, %c0, %c0, %c0, %c0, %c0], %pad, %mask {in_bounds = [true, true, true, true, true, true]} :
-      memref<1x?x3x1x?x1xi8, strided<[?, ?, ?, ?, ?, ?], offset: ?>>, vector<1x[1]x3x1x[16]x1xi8>
+      memref<1x?x3x1x?x1xi8, strided<[?, ?, ?, ?, ?, ?]>>, vector<1x[1]x3x1x[16]x1xi8>
     return %v : vector<1x[1]x3x1x[16]x1xi8>
 }
 // CHECK-LABEL: func @masked_transfer_read_dynamic_rank_reducing_2_constant_mask
@@ -254,8 +254,8 @@ func.func @masked_transfer_read_dynamic_rank_reducing_2_constant_mask(
 //   CHECK-DAG:   %[[C4:.+]] = arith.constant 4 : index
 //   CHECK-DAG:   %[[PAD:.+]] = arith.constant 0 : i8
 //       CHECK:   %[[MASK:.+]] = vector.constant_mask [1, 2, 16] : vector<[1]x3x[16]xi1>
-//       CHECK:   %[[DIM1:.+]] = memref.dim %[[ARG]], %[[C1]] : memref<1x?x3x1x?x1xi8, strided<[?, ?, ?, ?, ?, ?], offset: ?>>
-//       CHECK:   %[[DIM4:.+]] = memref.dim %[[ARG]], %[[C4]] : memref<1x?x3x1x?x1xi8, strided<[?, ?, ?, ?, ?, ?], offset: ?>>
+//       CHECK:   %[[DIM1:.+]] = memref.dim %[[ARG]], %[[C1]] : memref<1x?x3x1x?x1xi8, strided<[?, ?, ?, ?, ?, ?]>>
+//       CHECK:   %[[DIM4:.+]] = memref.dim %[[ARG]], %[[C4]] : memref<1x?x3x1x?x1xi8, strided<[?, ?, ?, ?, ?, ?]>>
 //       CHECK:   %[[SUBVIEW:.+]] = memref.subview %[[ARG]][0, 0, 0, 0, 0, 0] [1, %[[DIM1]], 3, 1, %[[DIM4]], 1] [1, 1, 1, 1, 1, 1] : memref<1x?x3x1x?x1xi8, {{.*}}> to memref<?x3x?xi8, {{.*}}>
 //       CHECK:   vector.transfer_read %[[SUBVIEW]][{{.*}}], %[[PAD]], %[[MASK]] {in_bounds = [true, true, true]} : memref<?x3x?xi8, {{.*}}>, vector<[1]x3x[16]xi8>
 
@@ -298,7 +298,7 @@ func.func @masked_transfer_write_and_vector_rank_reducing_constant_mask(
 //       CHECK:   vector.transfer_write %{{.*}}, %[[SUBVIEW]]{{.*}}, %[[MASK]] {in_bounds = [true, true]} : vector<3x16xf32>, memref<3x16xf32>
 
 func.func @masked_transfer_write_dynamic_rank_reducing_create_mask(
-      %arg : memref<?x1xi8, strided<[?, ?], offset: ?>>,
+      %arg : memref<?x1xi8, strided<[?, ?]>>,
       %vec : vector<[16]x1xi8>,
       %mask_dim0 : index) {
     %c0 = arith.constant 0 : index
@@ -306,7 +306,7 @@ func.func @masked_transfer_write_dynamic_rank_reducing_create_mask(
     %pad = arith.constant 0 : i8
     %mask = vector.create_mask %mask_dim0, %c1 : vector<[16]x1xi1>
     vector.transfer_write %vec, %arg[%c0, %c0], %mask {in_bounds = [true, true]} :
-      vector<[16]x1xi8>, memref<?x1xi8, strided<[?, ?], offset: ?>>
+      vector<[16]x1xi8>, memref<?x1xi8, strided<[?, ?]>>
     return
 }
 // CHECK-LABEL: func @masked_transfer_write_dynamic_rank_reducing_create_mask
@@ -315,17 +315,17 @@ func.func @masked_transfer_write_dynamic_rank_reducing_create_mask(
 //  CHECK-SAME:     %[[MASK_DIM0:.+]]: index
 //       CHECK:   %[[C0:.+]] = arith.constant 0 : index
 //       CHECK:   %[[MASK:.+]] = vector.create_mask %[[MASK_DIM0]] : vector<[16]xi1>
-//       CHECK:   %[[DIM0:.+]] = memref.dim %[[ARG]], %[[C0]] : memref<?x1xi8, strided<[?, ?], offset: ?>>
+//       CHECK:   %[[DIM0:.+]] = memref.dim %[[ARG]], %[[C0]] : memref<?x1xi8, strided<[?, ?]>>
 //       CHECK:   %[[SUBVIEW:.+]] = memref.subview %[[ARG]][0, 0] [%[[DIM0]], 1] [1, 1] : memref<?x1xi8, {{.*}}> to memref<?xi8, {{.*}}>
 //       CHECK:   vector.transfer_write {{.*}}, %[[SUBVIEW]][%[[C0]]], %[[MASK]] {in_bounds = [true]} : vector<[16]xi8>, memref<?xi8, {{.*}}>
 
 func.func @masked_transfer_write_dynamic_rank_reducing_constant_mask(
-      %arg : memref<?x1xi8, strided<[?, ?], offset: ?>>,
+      %arg : memref<?x1xi8, strided<[?, ?]>>,
       %vec : vector<[16]x1xi8>) {
     %c0 = arith.constant 0 : index
     %mask = vector.constant_mask [16, 1] : vector<[16]x1xi1>
     vector.transfer_write %vec, %arg[%c0, %c0], %mask {in_bounds = [true, true]} :
-      vector<[16]x1xi8>, memref<?x1xi8, strided<[?, ?], offset: ?>>
+      vector<[16]x1xi8>, memref<?x1xi8, strided<[?, ?]>>
     return
 }
 // CHECK-LABEL: func @masked_transfer_write_dynamic_rank_reducing_constant_mask
@@ -336,12 +336,12 @@ func.func @masked_transfer_write_dynamic_rank_reducing_constant_mask(
 
 /// Only vector.create_mask and vector.constant_mask masks are supported.
 func.func @unsupported_masked_transfer_read_dynamic_rank_reducing_1(
-      %arg : memref<?x1xi8, strided<[?, ?], offset: ?>>,
+      %arg : memref<?x1xi8, strided<[?, ?]>>,
       %mask : vector<[16]x1xi1>) -> vector<[16]x1xi8> {
     %c0 = arith.constant 0 : index
     %pad = arith.constant 0 : i8
     %v = vector.transfer_read %arg[%c0, %c0], %pad, %mask {in_bounds = [true, true]} :
-      memref<?x1xi8, strided<[?, ?], offset: ?>>, vector<[16]x1xi8>
+      memref<?x1xi8, strided<[?, ?]>>, vector<[16]x1xi8>
     return %v : vector<[16]x1xi8>
 }
 // CHECK-LABEL: func @unsupported_masked_transfer_read_dynamic_rank_reducing_1
@@ -352,14 +352,14 @@ func.func @unsupported_masked_transfer_read_dynamic_rank_reducing_1(
 
 /// Unit dim mask must be constant of 1.
 func.func @unsupported_masked_transfer_read_dynamic_rank_reducing_2(
-      %arg : memref<?x1xi8, strided<[?, ?], offset: ?>>,
+      %arg : memref<?x1xi8, strided<[?, ?]>>,
       %mask_dim0 : index, %mask_dim1 : index) -> vector<[16]x1xi8> {
     %c0 = arith.constant 0 : index
     %c1 = arith.constant 1 : index
     %pad = arith.constant 0 : i8
     %mask = vector.create_mask %mask_dim0, %mask_dim1 : vector<[16]x1xi1>
     %v = vector.transfer_read %arg[%c0, %c0], %pad, %mask {in_bounds = [true, true]} :
-      memref<?x1xi8, strided<[?, ?], offset: ?>>, vector<[16]x1xi8>
+      memref<?x1xi8, strided<[?, ?]>>, vector<[16]x1xi8>
     return %v : vector<[16]x1xi8>
 }
 // CHECK-LABEL: func @unsupported_masked_transfer_read_dynamic_rank_reducing_2
@@ -369,14 +369,14 @@ func.func @unsupported_masked_transfer_read_dynamic_rank_reducing_2(
 
 /// Unit dim must be non-scalable.
 func.func @masked_transfer_read_dynamic_rank_reducing_scalable_unit_dim(
-      %arg : memref<?x1xi8, strided<[?, ?], offset: ?>>,
+      %arg : memref<?x1xi8, strided<[?, ?]>>,
       %mask_dim0 : index) -> vector<[16]x[1]xi8> {
     %c0 = arith.constant 0 : index
     %c1 = arith.constant 1 : index
     %pad = arith.constant 0 : i8
     %mask = vector.create_mask %mask_dim0, %c1 : vector<[16]x[1]xi1>
     %v = vector.transfer_read %arg[%c0, %c0], %pad, %mask {in_bounds = [true, true]} :
-      memref<?x1xi8, strided<[?, ?], offset: ?>>, vector<[16]x[1]xi8>
+      memref<?x1xi8, strided<[?, ?]>>, vector<[16]x[1]xi8>
     return %v : vector<[16]x[1]xi8>
 }
 // CHECK-LABEL: func @masked_transfer_read_dynamic_rank_reducing_scalable_unit_dim
diff --git a/mlir/test/Dialect/Vector/vector-transfer-flatten.mlir b/mlir/test/Dialect/Vector/vector-transfer-flatten.mlir
index b048af24acfcd..161cd74ace692 100644
--- a/mlir/test/Dialect/Vector/vector-transfer-flatten.mlir
+++ b/mlir/test/Dialect/Vector/vector-transfer-flatten.mlir
@@ -14,12 +14,12 @@
 ///----------------------------------------------------------------------------------------
 
 func.func @transfer_read_dims_match_contiguous(
-    %mem : memref<5x4x3x2xi8, strided<[24, 6, 2, 1], offset: ?>>) -> vector<5x4x3x2xi8> {
+    %mem : memref<5x4x3x2xi8, strided<[24, 6, 2, 1]>>) -> vector<5x4x3x2xi8> {
 
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0 : i8
   %res = vector.transfer_read %mem[%c0, %c0, %c0, %c0], %cst :
-    memref<5x4x3x2xi8, strided<[24, 6, 2, 1], offset: ?>>, vector<5x4x3x2xi8>
+    memref<5x4x3x2xi8, strided<[24, 6, 2, 1]>>, vector<5x4x3x2xi8>
   return %res : vector<5x4x3x2xi8>
 }
 
@@ -34,12 +34,12 @@ func.func @transfer_read_dims_match_contiguous(
 //       CHECK-128B:   memref.collapse_shape
 
 func.func @transfer_read_dims_match_contiguous_scalable(
-    %mem : memref<5x4x3x2xi8, strided<[24, 6, 2, 1], offset: ?>>) -> vector<5x4x3x[2]xi8> {
+    %mem : memref<5x4x3x2xi8, strided<[24, 6, 2, 1]>>) -> vector<5x4x3x[2]xi8> {
 
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0 : i8
   %res = vector.transfer_read %mem[%c0, %c0, %c0, %c0], %cst :
-    memref<5x4x3x2xi8, strided<[24, 6, 2, 1], offset: ?>>, vector<5x4x3x[2]xi8>
+    memref<5x4x3x2xi8, strided<[24, 6, 2, 1]>>, vector<5x4x3x[2]xi8>
   return %res : vector<5x4x3x[2]xi8>
 }
 
@@ -77,12 +77,12 @@ func.func @transfer_read_dims_match_contiguous_empty_stride(
 // contiguous subset of the memref, so "flattenable"
 
 func.func @transfer_read_dims_mismatch_contiguous(
-    %mem : memref<5x4x3x2xi8, strided<[24, 6, 2, 1], offset: ?>>) -> vector<2x3x2xi8> {
+    %mem : memref<5x4x3x2xi8, strided<[24, 6, 2, 1]>>) -> vector<2x3x2xi8> {
 
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0 : i8
   %res = vector.transfer_read %mem[%c0, %c0, %c0, %c0], %cst :
-    memref<5x4x3x2xi8, strided<[24, 6, 2, 1], offset: ?>>, vector<2x3x2xi8>
+    memref<5x4x3x2xi8, strided<[24, 6, 2, 1]>>, vector<2x3x2xi8>
   return %res : vector<2x3x2xi8>
 }
 
@@ -94,7 +94,7 @@ func.func @transfer_read_dims_mismatch_contiguous(
 // CHECK-SAME{LITERAL}: [[0], [1, 2, 3]]
 // CHECK-SAME:          : memref<5x4x3x2xi8, {{.+}}> into memref<5x24xi8, {{.+}}>
 // CHECK:         %[[VEC_1D:.+]] = vector.transfer_read %[[COLLAPSED_MEM]][%[[C0]], %[[C0]]], %[[C0_I8]] {in_bounds = [true]}
-// CHECK-SAME:      : memref<5x24xi8, strided<[24, 1], offset: ?>>, vector<12xi8>
+// CHECK-SAME:      : memref<5x24xi8, strided<[24, 1]>>, vector<12xi8>
 // CHECK:         %[[VEC:.+]] = vector.shape_cast %[[VEC_1D]] : vector<12xi8> to vector<2x3x2xi8>
 // CHECK:         return %[[VEC]] : vector<2x3x2xi8>
 
@@ -107,26 +107,26 @@ func.func @transfer_read_dims_mismatch_contiguous(
 // at the leading unit dimensions of the vector.
 
 func.func @transfer_read_dims_mismatch_contiguous_unit_dims(
-    %mem : memref<6x5x4x3x2xi8, strided<[120, 24, 6, 2, 1], offset: ?>>) -> vector<1x1x4x3x2xi8> {
+    %mem : memref<6x5x4x3x2xi8, strided<[120, 24, 6, 2, 1]>>) -> vector<1x1x4x3x2xi8> {
 
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0 : i8
   %res = vector.transfer_read %mem[%c0, %c0, %c0, %c0, %c0], %cst :
-    memref<6x5x4x3x2xi8, strided<[120, 24, 6, 2, 1], offset: ?>>, vector<1x1x4x3x2xi8>
+    memref<6x5x4x3x2xi8, strided<[120, 24, 6, 2, 1]>>, vector<1x1x4x3x2xi8>
   return %res : vector<1x1x4x3x2xi8>
 }
 
 // CHECK-LABEL: func.func @transfer_read_dims_mismatch_contiguous_unit_dims(
-// CHECK-SAME:    %[[MEM:.+]]: memref<6x5x4x3x2xi8, strided<[120, 24, 6, 2, 1], offset: ?>>)
+// CHECK-SAME:    %[[MEM:.+]]: memref<6x5x4x3x2xi8, strided<[120, 24, 6, 2, 1]>>)
 // CHECK-SAME:    -> vector<1x1x4x3x2xi8>
 // CHECK-DAG:   %[[C0_I8:.+]] = arith.constant 0 : i8
 // CHECK-DAG:   %[[C0:.+]] = arith.constant 0 : index
 // CHECK:       %[[COLLAPSED:.+]] = memref.collapse_shape %[[MEM]]
 // CHECK-SAME{LITERAL}: [[0], [1], [2, 3, 4]]
-// CHECK-SAME:    : memref<6x5x4x3x2xi8, strided<[120, 24, 6, 2, 1], offset: ?>>
-// CHECK-SAME:      into memref<6x5x24xi8, strided<[120, 24, 1], offset: ?>>
+// CHECK-SAME:    : memref<6x5x4x3x2xi8, strided<[120, 24, 6, 2, 1]>>
+// CHECK-SAME:      into memref<6x5x24xi8, strided<[120, 24, 1]>>
 // CHECK:       %[[VEC_1D:.+]] = vector.transfer_read %[[COLLAPSED]][%[[C0]], %[[C0]], %[[C0]]], %[[C0_I8]]
-// CHECK-SAME:    {in_bounds = [true]} : memref<6x5x24xi8, strided<[120, 24, 1], offset: ?>>, vector<24xi8>
+// CHECK-SAME:    {in_bounds = [true]} : memref<6x5x24xi8, strided<[120, 24, 1]>>, vector<24xi8>
 // CHECK:       %[[VEC:.+]] = vector.shape_cast %[[VEC_1D]] : vector<24xi8> to vector<1x1x4x3x2xi8>
 // CHECK:       return %[[VEC]] : vector<1x1x4x3x2xi8>
 
@@ -141,23 +141,23 @@ func.func @transfer_read_dims_mismatch_contiguous_unit_dims(
 // the memref.
 
 func.func @transfer_read_non_contiguous_unit_dims(
-    %mem : memref<5x4x3x2xi8, strided<[48, 6, 2, 1], offset: ?>>) -> vector<1x1x3x2xi8> {
+    %mem : memref<5x4x3x2xi8, strided<[48, 6, 2, 1]>>) -> vector<1x1x3x2xi8> {
 
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0 : i8
   %res = vector.transfer_read %mem[%c0, %c0, %c0, %c0], %cst :
-    memref<5x4x3x2xi8, strided<[48, 6, 2, 1], offset: ?>>, vector<1x1x3x2xi8>
+    memref<5x4x3x2xi8, strided<[48, 6, 2, 1]>>, vector<1x1x3x2xi8>
   return %res : vector<1x1x3x2xi8>
 }
 
 // CHECK-LABEL:   func.func @transfer_read_non_contiguous_unit_dims(
-// CHECK-SAME:      %[[MEM:.*]]: memref<5x4x3x2xi8, strided<[48, 6, 2, 1], offset: ?>>) -> vector<1x1x3x2xi8> {
+// CHECK-SAME:      %[[MEM:.*]]: memref<5x4x3x2xi8, strided<[48, 6, 2, 1]>>) -> vector<1x1x3x2xi8> {
 // CHECK-DAG:       %[[VAL_1:.*]] = arith.constant 0 : i8
 // CHECK-DAG:       %[[VAL_2:.*]] = arith.constant 0 : index
 // CHECK:           %[[VAL_3:.*]] = memref.collapse_shape %[[MEM]]
 // CHECK-SAME{LITERAL}: [[0], [1], [2, 3]]
-// CHECK-SAME:        : memref<5x4x3x2xi8, strided<[48, 6, 2, 1], offset: ?>> into memref<5x4x6xi8, strided<[48, 6, 1], offset: ?>>
-// CHECK:           %[[VAL_4:.*]] = vector.transfer_read %[[VAL_3]][%[[VAL_2]], %[[VAL_2]], %[[VAL_2]]], %[[VAL_1]] {in_bounds = [true]} : memref<5x4x6xi8, strided<[48, 6, 1], offset: ?>>, vector<6xi8>
+// CHECK-SAME:        : memref<5x4x3x2xi8, strided<[48, 6, 2, 1]>> into memref<5x4x6xi8, strided<[48, 6, 1]>>
+// CHECK:           %[[VAL_4:.*]] = vector.transfer_read %[[VAL_3]][%[[VAL_2]], %[[VAL_2]], %[[VAL_2]]], %[[VAL_1]] {in_bounds = [true]} : memref<5x4x6xi8, strided<[48, 6, 1]>>, vector<6xi8>
 // CHECK:           %[[VAL_5:.*]] = vector.shape_cast %[[VAL_4]] : vector<6xi8> to vector<1x1x3x2xi8>
 // CHECK:           return %[[VAL_5]] : vector<1x1x3x2xi8>
 
@@ -202,7 +202,7 @@ func.func @transfer_read_dims_mismatch_non_zero_indices(
 // the output vector is to be read _is_ contiguous. Hence the flattening works fine.
 
 func.func @transfer_read_dims_mismatch_non_contiguous_non_zero_indices(
-    %mem : memref<1x3x3x2xf32, strided<[40, 10, 2, 1], offset: ?>>,
+    %mem : memref<1x3x3x2xf32, strided<[40, 10, 2, 1]>>,
     %idx_1 : index,
     %idx_2 : index) -> vector<2x2xf32> {
 
@@ -210,7 +210,7 @@ func.func @transfer_read_dims_mismatch_non_contiguous_non_zero_indices(
   %cst_1 = arith.constant 0.000000e+00 : f32
   %res = vector.transfer_read %mem[%c0, %idx_1, %idx_2, %c0], %cst_1 {
     in_bounds = [true, true]
-  } : memref<1x3x3x2xf32, strided<[40, 10, 2, 1], offset: ?>>, vector<2x2xf32>
+  } : memref<1x3x3x2xf32, strided<[40, 10, 2, 1]>>, vector<2x2xf32>
   return %res : vector<2x2xf32>
 }
 
@@ -218,7 +218,7 @@ func.func @transfer_read_dims_mismatch_non_contiguous_non_zero_indices(
 
 // CHECK-LABEL:  func.func @transfer_read_dims_mismatch_non_contiguous_non_zero_indices(
 // CHECK:         %[[COLLAPSE:.+]] = memref.collapse_shape %{{.*}} {{\[}}[0], [1], [2, 3]]
-// CHECK-SAME:      : memref<1x3x3x2xf32, strided<[40, 10, 2, 1], offset: ?>> into memref<1x3x6xf32, strided<[40, 10, 1], offset: ?>>
+// CHECK-SAME:      : memref<1x3x3x2xf32, strided<[40, 10, 2, 1]>> into memref<1x3x6xf32, strided<[40, 10, 1]>>
 // CHECK:         %[[APPLY:.*]] = affine.apply #[[$MAP]]()
 
 // CHECK-128B-LABEL: func @transfer_read_dims_mismatch_non_contiguous_non_zero_indices(
@@ -230,7 +230,7 @@ func.func @transfer_read_dims_mismatch_non_contiguous_non_zero_indices(
 // or not. Indeed, those dynamic shapes are not candidates for flattening anyway.
 
 func.func @transfer_read_leading_dynamic_dims(
-    %mem : memref<?x?x8x4xi8, strided<[?, 32, 4, 1], offset: ?>>,
+    %mem : memref<?x?x8x4xi8, strided<[?, 32, 4, 1]>>,
     %idx_1 : index,
     %idx_2 : index) -> vector<8x4xi8> {
 
@@ -238,7 +238,7 @@ func.func @transfer_read_leading_dynamic_dims(
   %c0 = arith.constant 0 : index
   %res = vector.transfer_read %mem[%idx_1, %idx_2, %c0, %c0], %c0_i8 {
     in_bounds = [true, true]
-  } : memref<?x?x8x4xi8, strided<[?, 32, 4, 1], offset: ?>>, vector<8x4xi8>
+  } : memref<?x?x8x4xi8, strided<[?, 32, 4, 1]>>, vector<8x4xi8>
   return %res : vector<8x4xi8>
 }
 
@@ -367,12 +367,12 @@ func.func @transfer_read_0d(
 // Strides make the input memref non-contiguous, hence non-flattenable.
 
 func.func @transfer_read_non_contiguous_src(
-    %mem : memref<5x4x3x2xi8, strided<[24, 8, 2, 1], offset: ?>>) -> vector<5x4x3x2xi8> {
+    %mem : memref<5x4x3x2xi8, strided<[24, 8, 2, 1]>>) -> vector<5x4x3x2xi8> {
 
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0 : i8
   %res = vector.transfer_read %mem[%c0, %c0, %c0, %c0], %cst :
-    memref<5x4x3x2xi8, strided<[24, 8, 2, 1], offset: ?>>, vector<5x4x3x2xi8>
+    memref<5x4x3x2xi8, strided<[24, 8, 2, 1]>>, vector<5x4x3x2xi8>
   return %res : vector<5x4x3x2xi8>
 }
 
@@ -416,12 +416,12 @@ func.func @transfer_read_multi_dim_unit_vector(
 ///----------------------------------------------------------------------------------------
 
 func.func @transfer_write_dims_match_contiguous(
-    %mem : memref<5x4x3x2xi8, strided<[24, 6, 2, 1], offset: ?>>,
+    %mem : memref<5x4x3x2xi8, strided<[24, 6, 2, 1]>>,
     %vec : vector<5x4x3x2xi8>) {
 
   %c0 = arith.constant 0 : index
   vector.transfer_write %vec, %mem [%c0, %c0, %c0, %c0] :
-    vector<5x4x3x2xi8>, memref<5x4x3x2xi8, strided<[24, 6, 2, 1], offset: ?>>
+    vector<5x4x3x2xi8>, memref<5x4x3x2xi8, strided<[24, 6, 2, 1]>>
   return
 }
 
@@ -436,12 +436,12 @@ func.func @transfer_write_dims_match_contiguous(
 //       CHECK-128B:   memref.collapse_shape
 
 func.func @transfer_write_dims_match_contiguous_scalable(
-    %mem : memref<5x4x3x2xi8, strided<[24, 6, 2, 1], offset: ?>>,
+    %mem : memref<5x4x3x2xi8, strided<[24, 6, 2, 1]>>,
     %vec : vector<5x4x3x[2]xi8>) {
 
   %c0 = arith.constant 0 : index
   vector.transfer_write %vec, %mem [%c0, %c0, %c0, %c0] :
-    vector<5x4x3x[2]xi8>, memref<5x4x3x2xi8, strided<[24, 6, 2, 1], offset: ?>>
+    vector<5x4x3x[2]xi8>, memref<5x4x3x2xi8, strided<[24, 6, 2, 1]>>
   return
 }
 
@@ -479,12 +479,12 @@ func.func @transfer_write_dims_match_contiguous_empty_stride(
 // contiguous subset of the memref, so "flattenable".
 
 func.func @transfer_write_dims_mismatch_contiguous(
-    %mem : memref<5x4x3x2xi8, strided<[24, 6, 2, 1], offset: ?>>,
+    %mem : memref<5x4x3x2xi8, strided<[24, 6, 2, 1]>>,
     %vec : vector<2x2xi8>) {
 
   %c0 = arith.constant 0 : index
   vector.transfer_write %vec, %mem [%c0, %c0, %c0, %c0] :
-    vector<2x2xi8>, memref<5x4x3x2xi8, strided<[24, 6, 2, 1], offset: ?>>
+    vector<2x2xi8>, memref<5x4x3x2xi8, strided<[24, 6, 2, 1]>>
   return
 }
 
@@ -508,27 +508,27 @@ func.func @transfer_write_dims_mismatch_contiguous(
 // at the leading unit dimensions of the vector.
 
 func.func @transfer_write_dims_mismatch_contiguous_unit_dims(
-    %mem : memref<6x5x4x3x2xi8, strided<[120, 24, 6, 2, 1], offset: ?>>,
+    %mem : memref<6x5x4x3x2xi8, strided<[120, 24, 6, 2, 1]>>,
     %vec : vector<1x1x4x3x2xi8>) {
 
   %c0 = arith.constant 0 : index
   vector.transfer_write %vec, %mem [%c0, %c0, %c0, %c0, %c0] :
-    vector<1x1x4x3x2xi8>, memref<6x5x4x3x2xi8, strided<[120, 24, 6, 2, 1], offset: ?>>
+    vector<1x1x4x3x2xi8>, memref<6x5x4x3x2xi8, strided<[120, 24, 6, 2, 1]>>
 
   return
 }
 
 // CHECK-LABEL:  func.func @transfer_write_dims_mismatch_contiguous_unit_dims(
-// CHECK-SAME:   %[[MEM:.+]]: memref<6x5x4x3x2xi8, strided<[120, 24, 6, 2, 1], offset: ?>>
+// CHECK-SAME:   %[[MEM:.+]]: memref<6x5x4x3x2xi8, strided<[120, 24, 6, 2, 1]>>
 // CHECK-SAME:   %[[VEC:.+]]: vector<1x1x4x3x2xi8>
 // CHECK:          %[[C0:.+]] = arith.constant 0 : index
 // CHECK:          %[[COLLAPSED:.+]] = memref.collapse_shape %[[MEM]]
 // CHECK-SAME{LITERAL}: [[0], [1], [2, 3, 4]]
-// CHECK-SAME:       : memref<6x5x4x3x2xi8, strided<[120, 24, 6, 2, 1], offset: ?>>
-// CHECK-SAME:         into memref<6x5x24xi8, strided<[120, 24, 1], offset: ?>>
+// CHECK-SAME:       : memref<6x5x4x3x2xi8, strided<[120, 24, 6, 2, 1]>>
+// CHECK-SAME:         into memref<6x5x24xi8, strided<[120, 24, 1]>>
 // CHECK:          %[[VEC_1D:.+]] = vector.shape_cast %[[VEC]] : vector<1x1x4x3x2xi8> to vector<24xi8>
 // CHECK:          vector.transfer_write %[[VEC_1D]], %[[COLLAPSED]][%[[C0]], %[[C0]], %[[C0]]]
-// CHECK-SAME:       {in_bounds = [true]} : vector<24xi8>, memref<6x5x24xi8, strided<[120, 24, 1], offset: ?>>
+// CHECK-SAME:       {in_bounds = [true]} : vector<24xi8>, memref<6x5x24xi8, strided<[120, 24, 1]>>
 
 // CHECK-128B-LABEL: func @transfer_write_dims_mismatch_contiguous_unit_dims(
 //       CHECK-128B:   memref.collapse_shape
@@ -541,25 +541,25 @@ func.func @transfer_write_dims_mismatch_contiguous_unit_dims(
 // the memref.
 
 func.func @transfer_write_non_contiguous_unit_dims(
-    %mem : memref<5x4x3x2xi8, strided<[48, 6, 2, 1], offset: ?>>,
+    %mem : memref<5x4x3x2xi8, strided<[48, 6, 2, 1]>>,
     %vec : vector<1x1x3x2xi8>) {
 
   %c0 = arith.constant 0 : index
   vector.transfer_write %vec, %mem [%c0, %c0, %c0, %c0] :
-    vector<1x1x3x2xi8>, memref<5x4x3x2xi8, strided<[48, 6, 2, 1], offset: ?>>
+    vector<1x1x3x2xi8>, memref<5x4x3x2xi8, strided<[48, 6, 2, 1]>>
   return
 }
 
 // CHECK-LABEL:   func.func @transfer_write_non_contiguous_unit_dims
-// CHECK-SAME:      %[[MEM:.*]]: memref<5x4x3x2xi8, strided<[48, 6, 2, 1], offset: ?>>,
+// CHECK-SAME:      %[[MEM:.*]]: memref<5x4x3x2xi8, strided<[48, 6, 2, 1]>>,
 // CHECK-SAME:      %[[VEC:.*]]: vector<1x1x3x2xi8>) {
 // CHECK:           %[[C0:.*]] = arith.constant 0 : index
 // CHECK:           %[[COLLAPSED:.*]] = memref.collapse_shape %[[MEM]]
 // CHECK-SAME{LITERAL}: [[0], [1], [2, 3]]
-// CHECK-SAME:        : memref<5x4x3x2xi8, strided<[48, 6, 2, 1], offset: ?>> into memref<5x4x6xi8, strided<[48, 6, 1], offset: ?>>
+// CHECK-SAME:        : memref<5x4x3x2xi8, strided<[48, 6, 2, 1]>> into memref<5x4x6xi8, strided<[48, 6, 1]>>
 // CHECK:           %[[VEC_1D:.*]] = vector.shape_cast %[[VEC]] : vector<1x1x3x2xi8> to vector<6xi8>
 // CHECK:           vector.transfer_write %[[VEC_1D]], %[[COLLAPSED]][%[[C0]], %[[C0]], %[[C0]]]
-// CHECK-SAME:        {in_bounds = [true]} : vector<6xi8>, memref<5x4x6xi8, strided<[48, 6, 1], offset: ?>>
+// CHECK-SAME:        {in_bounds = [true]} : vector<6xi8>, memref<5x4x6xi8, strided<[48, 6, 1]>>
 
 // CHECK-128B-LABEL: func @transfer_write_non_contiguous_unit_dims(
 //       CHECK-128B:   memref.collapse_shape
@@ -603,12 +603,12 @@ func.func @transfer_write_dims_mismatch_non_zero_indices(
 
 func.func @transfer_write_dims_mismatch_non_contiguous_non_zero_indices(
     %vec : vector<2x2xf32>,
-    %mem : memref<1x3x3x2xf32, strided<[40, 10, 2, 1], offset: ?>>,
+    %mem : memref<1x3x3x2xf32, strided<[40, 10, 2, 1]>>,
     %idx_1 : index,
     %idx_2 : index) {
 
   %c0 = arith.constant 0 : index
-  vector.transfer_write %vec, %mem[%c0, %idx_1, %idx_2, %c0] {in_bounds = [true, true]} : vector<2x2xf32>, memref<1x3x3x2xf32, strided<[40, 10, 2, 1], offset: ?>>
+  vector.transfer_write %vec, %mem[%c0, %idx_1, %idx_2, %c0] {in_bounds = [true, true]} : vector<2x2xf32>, memref<1x3x3x2xf32, strided<[40, 10, 2, 1]>>
   return
 }
 
@@ -616,7 +616,7 @@ func.func @transfer_write_dims_mismatch_non_contiguous_non_zero_indices(
 
 // CHECK-LABEL:  func.func @transfer_write_dims_mismatch_non_contiguous_non_zero_indices(
 // CHECK-DAG:      %[[APPLY:.*]] = affine.apply #[[$MAP]]()
-// CHECK-DAG:      %[[COLLAPSE:.+]] = memref.collapse_shape %{{.*}} {{\[}}[0], [1], [2, 3]] : memref<1x3x3x2xf32, strided<[40, 10, 2, 1], offset: ?>> into memref<1x3x6xf32, strided<[40, 10, 1], offset: ?>>
+// CHECK-DAG:      %[[COLLAPSE:.+]] = memref.collapse_shape %{{.*}} {{\[}}[0], [1], [2, 3]] : memref<1x3x3x2xf32, strided<[40, 10, 2, 1]>> into memref<1x3x6xf32, strided<[40, 10, 1]>>
 
 // CHECK-128B-LABEL: func @transfer_write_dims_mismatch_non_contiguous_non_zero_indices(
 //       CHECK-128B:   memref.collapse_shape
@@ -628,13 +628,13 @@ func.func @transfer_write_dims_mismatch_non_contiguous_non_zero_indices(
 
 func.func @transfer_write_leading_dynamic_dims(
     %vec : vector<8x4xi8>,
-    %mem : memref<?x?x8x4xi8, strided<[?, 32, 4, 1], offset: ?>>,
+    %mem : memref<?x?x8x4xi8, strided<[?, 32, 4, 1]>>,
     %idx_1 : index,
     %idx_2 : index) {
 
   %c0 = arith.constant 0 : index
   vector.transfer_write %vec, %mem[%idx_1, %idx_2, %c0, %c0] {in_bounds = [true, true]} :
-    vector<8x4xi8>, memref<?x?x8x4xi8, strided<[?, 32, 4, 1], offset: ?>>
+    vector<8x4xi8>, memref<?x?x8x4xi8, strided<[?, 32, 4, 1]>>
   return
 }
 
@@ -756,12 +756,12 @@ func.func @transfer_write_0d(
 // The strides make the input memref non-contiguous, hence non-flattenable.
 
 func.func @transfer_write_non_contiguous_src(
-    %mem : memref<5x4x3x2xi8, strided<[24, 8, 2, 1], offset: ?>>,
+    %mem : memref<5x4x3x2xi8, strided<[24, 8, 2, 1]>>,
     %vec : vector<5x4x3x2xi8>) {
 
   %c0 = arith.constant 0 : index
   vector.transfer_write %vec, %mem[%c0, %c0, %c0, %c0] :
-   vector<5x4x3x2xi8>, memref<5x4x3x2xi8, strided<[24, 8, 2, 1], offset: ?>>
+   vector<5x4x3x2xi8>, memref<5x4x3x2xi8, strided<[24, 8, 2, 1]>>
   return
 }
 
@@ -776,11 +776,11 @@ func.func @transfer_write_non_contiguous_src(
 // -----
 
 func.func @negative_out_of_bound_transfer_read(
-    %mem : memref<?x4x3x2xi8, strided<[24, 6, 2, 1], offset: ?>>) -> vector<5x4x3x2xi8> {
+    %mem : memref<?x4x3x2xi8, strided<[24, 6, 2, 1]>>) -> vector<5x4x3x2xi8> {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0 : i8
   %res = vector.transfer_read %mem[%c0, %c0, %c0, %c0], %cst {in_bounds = [false, true, true, true]} :
-    memref<?x4x3x2xi8, strided<[24, 6, 2, 1], offset: ?>>, vector<5x4x3x2xi8>
+    memref<?x4x3x2xi8, strided<[24, 6, 2, 1]>>, vector<5x4x3x2xi8>
   return %res : vector<5x4x3x2xi8>
 }
 // CHECK-LABEL: func.func @negative_out_of_bound_transfer_read
@@ -794,10 +794,10 @@ func.func @negative_out_of_bound_transfer_read(
 // -----
 
 func.func @negative_out_of_bound_transfer_write(
-    %mem : memref<?x4x3x2xi8, strided<[24, 6, 2, 1], offset: ?>>, %vec : vector<1x1x3x2xi8>) {
+    %mem : memref<?x4x3x2xi8, strided<[24, 6, 2, 1]>>, %vec : vector<1x1x3x2xi8>) {
   %c0 = arith.constant 0 : index
   vector.transfer_write %vec, %mem [%c0, %c0, %c0, %c0] {in_bounds = [false, true, true, true]} :
-    vector<1x1x3x2xi8>, memref<?x4x3x2xi8, strided<[24, 6, 2, 1], offset: ?>>
+    vector<1x1x3x2xi8>, memref<?x4x3x2xi8, strided<[24, 6, 2, 1]>>
   return
 }
 // CHECK-LABEL: func.func @negative_out_of_bound_transfer_write
diff --git a/mlir/test/Dialect/Vector/vector-transfer-full-partial-split-copy-transform.mlir b/mlir/test/Dialect/Vector/vector-transfer-full-partial-split-copy-transform.mlir
index 483147c6f6a40..c003003b78814 100644
--- a/mlir/test/Dialect/Vector/vector-transfer-full-partial-split-copy-transform.mlir
+++ b/mlir/test/Dialect/Vector/vector-transfer-full-partial-split-copy-transform.mlir
@@ -37,9 +37,9 @@ func.func @split_vector_transfer_read_2d(%A: memref<?x8xf32>, %i: index, %j: ind
   //      CHECK:   %[[sv0:.*]] = affine.min #[[$bounds_map_4]](%[[d0]], %[[i]], %[[c4]])
   //      CHECK:   %[[sv1:.*]] = affine.min #[[$bounds_map_8]](%[[c8]], %[[j]], %[[c8]])
   //      CHECK:   %[[sv:.*]] = memref.subview %[[A]][%[[i]], %[[j]]] [%[[sv0]], %[[sv1]]] [1, 1]
-  // CHECK-SAME:     memref<?x8xf32> to memref<?x?xf32, strided<[8, 1], offset: ?>>
+  // CHECK-SAME:     memref<?x8xf32> to memref<?x?xf32, strided<[8, 1]>>
   //      CHECK:   %[[alloc_view:.*]] = memref.subview %[[alloc]][0, 0] [%[[sv0]], %[[sv1]]] [1, 1]
-  //      CHECK:   memref.copy %[[sv]], %[[alloc_view]] : memref<?x?xf32, strided<[8, 1], offset: ?>> to memref<?x?xf32, strided{{.*}}>
+  //      CHECK:   memref.copy %[[sv]], %[[alloc_view]] : memref<?x?xf32, strided<[8, 1]>> to memref<?x?xf32, strided{{.*}}>
   //      CHECK:   %[[yielded:.*]] = memref.cast %[[alloc]] :
   // CHECK-SAME:     memref<4x8xf32> to memref<?x8xf32>
   //      CHECK:   scf.yield %[[yielded]], %[[c0]], %[[c0]] :
@@ -58,7 +58,7 @@ func.func @split_vector_transfer_read_2d(%A: memref<?x8xf32>, %i: index, %j: ind
 //  CHECK-SAME: %[[i:[a-zA-Z0-9_]*]]: index
 //  CHECK-SAME: %[[j:[a-zA-Z0-9_]*]]: index
 func.func @split_vector_transfer_read_strided_2d(
-    %A: memref<7x8xf32, strided<[?, 1], offset: ?>>,
+    %A: memref<7x8xf32, strided<[?, 1]>>,
     %i: index, %j: index) -> vector<4x8xf32> {
   %c0 = arith.constant 0 : index
   %f0 = arith.constant 0.0 : f32
@@ -78,30 +78,30 @@ func.func @split_vector_transfer_read_strided_2d(
   //      CHECK: %[[cmp1:.*]] = arith.cmpi sle, %[[idx1]], %[[c8]] : index
   // are both conds true
   //      CHECK: %[[cond:.*]] = arith.andi %[[cmp0]], %[[cmp1]] : i1
-  //      CHECK: %[[ifres:.*]]:3 = scf.if %[[cond]] -> (memref<?x8xf32, strided<[?, 1], offset: ?>>, index, index) {
+  //      CHECK: %[[ifres:.*]]:3 = scf.if %[[cond]] -> (memref<?x8xf32, strided<[?, 1]>>, index, index) {
   //               inBounds but not cast-compatible: yield a memref_casted form of %A
   //      CHECK:   %[[casted:.*]] = memref.cast %arg0 :
-  // CHECK-SAME:     memref<7x8xf32, strided<[?, 1], offset: ?>> to memref<?x8xf32, strided<[?, 1], offset: ?>>
+  // CHECK-SAME:     memref<7x8xf32, strided<[?, 1]>> to memref<?x8xf32, strided<[?, 1]>>
   //      CHECK:   scf.yield %[[casted]], %[[i]], %[[j]] :
-  // CHECK-SAME:     memref<?x8xf32, strided<[?, 1], offset: ?>>, index, index
+  // CHECK-SAME:     memref<?x8xf32, strided<[?, 1]>>, index, index
   //      CHECK: } else {
   //               slow path, fill tmp alloc and yield a memref_casted version of it
   //      CHECK:   linalg.fill ins(%cst : f32) outs(%[[alloc]] : memref<4x8xf32>)
   //      CHECK:   %[[sv0:.*]] = affine.min #[[$bounds_map_4]](%[[c7]], %[[i]], %[[c4]])
   //      CHECK:   %[[sv1:.*]] = affine.min #[[$bounds_map_8]](%[[c8]], %[[j]], %[[c8]])
   //      CHECK:   %[[sv:.*]] = memref.subview %[[A]][%[[i]], %[[j]]] [%[[sv0]], %[[sv1]]] [1, 1]
-  // CHECK-SAME:     memref<7x8xf32, strided<[?, 1], offset: ?>> to memref<?x?xf32, strided<[?, 1], offset: ?>>
+  // CHECK-SAME:     memref<7x8xf32, strided<[?, 1]>> to memref<?x?xf32, strided<[?, 1]>>
   //      CHECK:   %[[alloc_view:.*]] = memref.subview %[[alloc]][0, 0] [%[[sv0]], %[[sv1]]] [1, 1]
-  //      CHECK:   memref.copy %[[sv]], %[[alloc_view]] : memref<?x?xf32, strided<[?, 1], offset: ?>> to memref<?x?xf32, strided{{.*}}>
+  //      CHECK:   memref.copy %[[sv]], %[[alloc_view]] : memref<?x?xf32, strided<[?, 1]>> to memref<?x?xf32, strided{{.*}}>
   //      CHECK:   %[[yielded:.*]] = memref.cast %[[alloc]] :
-  // CHECK-SAME:     memref<4x8xf32> to memref<?x8xf32, strided<[?, 1], offset: ?>>
+  // CHECK-SAME:     memref<4x8xf32> to memref<?x8xf32, strided<[?, 1]>>
   //      CHECK:   scf.yield %[[yielded]], %[[c0]], %[[c0]] :
-  // CHECK-SAME:     memref<?x8xf32, strided<[?, 1], offset: ?>>, index, index
+  // CHECK-SAME:     memref<?x8xf32, strided<[?, 1]>>, index, index
   //      CHECK: }
   //      CHECK: %[[res:.*]] = vector.transfer_read {{.*}} {in_bounds = [true, true]} :
-  // CHECK-SAME:   memref<?x8xf32, strided<[?, 1], offset: ?>>, vector<4x8xf32>
+  // CHECK-SAME:   memref<?x8xf32, strided<[?, 1]>>, vector<4x8xf32>
   %1 = vector.transfer_read %A[%i, %j], %f0 :
-    memref<7x8xf32, strided<[?, 1], offset: ?>>, vector<4x8xf32>
+    memref<7x8xf32, strided<[?, 1]>>, vector<4x8xf32>
 
   return %1 : vector<4x8xf32>
 }
@@ -162,10 +162,10 @@ func.func @split_vector_transfer_write_2d(%V: vector<4x8xf32>, %A: memref<?x8xf3
 // CHECK-DAG:         %[[VAL_21:.*]] = affine.min #[[$MAP3]](%[[C8]], %[[J]], %[[C8]])
 // CHECK:             %[[VAL_22:.*]] = memref.subview %[[TEMP]]
 // CHECK-SAME:            [%[[I]], %[[J]]] [%[[VAL_20]], %[[VAL_21]]]
-// CHECK-SAME:            [1, 1] : memref<4x8xf32> to memref<?x?xf32, strided<[8, 1], offset: ?>>
+// CHECK-SAME:            [1, 1] : memref<4x8xf32> to memref<?x?xf32, strided<[8, 1]>>
 // CHECK:             %[[DEST_VIEW:.*]] = memref.subview %[[DEST]][0, 0] [%[[VAL_20]], %[[VAL_21]]] [1, 1]
 // CHECK:             memref.copy %[[VAL_22]], %[[DEST_VIEW]]
-// CHECK-SAME:            : memref<?x?xf32, strided<[8, 1], offset: ?>> to memref<?x?xf32, strided{{.*}}>
+// CHECK-SAME:            : memref<?x?xf32, strided<[8, 1]>> to memref<?x?xf32, strided{{.*}}>
 // CHECK:           }
 // CHECK:           return
 // CHECK:         }
@@ -183,10 +183,10 @@ module attributes {transform.with_named_sequence} {
 // -----
 
 func.func @split_vector_transfer_write_strided_2d(
-    %V: vector<4x8xf32>, %A: memref<7x8xf32, strided<[?, 1], offset: ?>>,
+    %V: vector<4x8xf32>, %A: memref<7x8xf32, strided<[?, 1]>>,
     %i: index, %j: index) {
   vector.transfer_write %V, %A[%i, %j] :
-    vector<4x8xf32>, memref<7x8xf32, strided<[?, 1], offset: ?>>
+    vector<4x8xf32>, memref<7x8xf32, strided<[?, 1]>>
   return
 }
 
@@ -196,7 +196,7 @@ func.func @split_vector_transfer_write_strided_2d(
 // CHECK-DAG: #[[$MAP4:.*]] = affine_map<(d0, d1, d2) -> (d0 - d1, 8)>
 // CHECK-LABEL:   func @split_vector_transfer_write_strided_2d(
 // CHECK-SAME:                                                 %[[VEC:.*]]: vector<4x8xf32>,
-// CHECK-SAME:                                                 %[[DEST:.*]]: memref<7x8xf32, strided<[?, 1], offset: ?>>,
+// CHECK-SAME:                                                 %[[DEST:.*]]: memref<7x8xf32, strided<[?, 1]>>,
 // CHECK-SAME:                                                 %[[I:.*]]: index,
 // CHECK-SAME:                                                 %[[J:.*]]: index) {
 // CHECK-DAG:       %[[C0:.*]] = arith.constant 0 : index
@@ -211,32 +211,32 @@ func.func @split_vector_transfer_write_strided_2d(
 // CHECK:           %[[DIM1_IN:.*]] = arith.cmpi sle, %[[DIM1]], %[[C8]] : index
 // CHECK:           %[[IN_BOUNDS:.*]] = arith.andi %[[DIM0_IN]], %[[DIM1_IN]] : i1
 // CHECK:           %[[IN_BOUND_DEST:.*]]:3 = scf.if %[[IN_BOUNDS]]
-// CHECK-SAME:          -> (memref<?x8xf32, strided<[?, 1], offset: ?>>, index, index) {
+// CHECK-SAME:          -> (memref<?x8xf32, strided<[?, 1]>>, index, index) {
 // CHECK:             %[[VAL_16:.*]] = memref.cast %[[DEST]]
-// CHECK-SAME:            : memref<7x8xf32, strided<[?, 1], offset: ?>> to memref<?x8xf32, strided<[?, 1], offset: ?>>
+// CHECK-SAME:            : memref<7x8xf32, strided<[?, 1]>> to memref<?x8xf32, strided<[?, 1]>>
 // CHECK:             scf.yield %[[VAL_16]], %[[I]], %[[J]]
-// CHECK-SAME:            : memref<?x8xf32, strided<[?, 1], offset: ?>>, index, index
+// CHECK-SAME:            : memref<?x8xf32, strided<[?, 1]>>, index, index
 // CHECK:           } else {
 // CHECK:             %[[VAL_17:.*]] = memref.cast %[[TEMP]]
-// CHECK-SAME:            : memref<4x8xf32> to memref<?x8xf32, strided<[?, 1], offset: ?>>
+// CHECK-SAME:            : memref<4x8xf32> to memref<?x8xf32, strided<[?, 1]>>
 // CHECK:             scf.yield %[[VAL_17]], %[[C0]], %[[C0]]
-// CHECK-SAME:            : memref<?x8xf32, strided<[?, 1], offset: ?>>, index, index
+// CHECK-SAME:            : memref<?x8xf32, strided<[?, 1]>>, index, index
 // CHECK:           }
 // CHECK:           vector.transfer_write %[[VEC]],
 // CHECK-SAME:          %[[IN_BOUND_DEST:.*]]#0
 // CHECK-SAME:          [%[[IN_BOUND_DEST]]#1, %[[IN_BOUND_DEST]]#2]
 // CHECK-SAME:          {in_bounds = [true, true]}
-// CHECK-SAME:          : vector<4x8xf32>, memref<?x8xf32, strided<[?, 1], offset: ?>>
+// CHECK-SAME:          : vector<4x8xf32>, memref<?x8xf32, strided<[?, 1]>>
 // CHECK:           %[[OUT_BOUNDS:.*]] = arith.xori %[[IN_BOUNDS]], %[[CT]] : i1
 // CHECK:           scf.if %[[OUT_BOUNDS]] {
 // CHECK-DAG:         %[[VAL_20:.*]] = affine.min #[[$MAP3]](%[[C7]], %[[I]], %[[C4]])
 // CHECK-DAG:         %[[VAL_21:.*]] = affine.min #[[$MAP4]](%[[C8]], %[[J]], %[[C8]])
 // CHECK:             %[[VAL_22:.*]] = memref.subview %[[TEMP]]
 // CHECK-SAME:            [%[[I]], %[[J]]] [%[[VAL_20]], %[[VAL_21]]]
-// CHECK-SAME:            [1, 1] : memref<4x8xf32> to memref<?x?xf32, strided<[8, 1], offset: ?>>
+// CHECK-SAME:            [1, 1] : memref<4x8xf32> to memref<?x?xf32, strided<[8, 1]>>
 // CHECK:             %[[DEST_VIEW:.*]] = memref.subview %[[DEST]][0, 0] [%[[VAL_20]], %[[VAL_21]]] [1, 1]
 // CHECK:             memref.copy %[[VAL_22]], %[[DEST_VIEW]]
-// CHECK-SAME:            : memref<?x?xf32, strided<[8, 1], offset: ?>> to memref<?x?xf32, strided<[?, 1], offset: ?>>
+// CHECK-SAME:            : memref<?x?xf32, strided<[8, 1]>> to memref<?x?xf32, strided<[?, 1]>>
 // CHECK:           }
 // CHECK:           return
 // CHECK:         }
diff --git a/mlir/test/Dialect/Vector/vector-transfer-full-partial-split.mlir b/mlir/test/Dialect/Vector/vector-transfer-full-partial-split.mlir
index a9c7bf8e8b327..a01569e2fd7c8 100644
--- a/mlir/test/Dialect/Vector/vector-transfer-full-partial-split.mlir
+++ b/mlir/test/Dialect/Vector/vector-transfer-full-partial-split.mlir
@@ -55,7 +55,7 @@ func.func @split_vector_transfer_read_2d(%A: memref<?x8xf32>, %i: index, %j: ind
 //  CHECK-SAME: %[[j:[a-zA-Z0-9_]*]]: index
 
 func.func @split_vector_transfer_read_strided_2d(
-    %A: memref<7x8xf32, strided<[?, 1], offset: ?>>,
+    %A: memref<7x8xf32, strided<[?, 1]>>,
     %i: index, %j: index) -> vector<4x8xf32> {
   %c0 = arith.constant 0 : index
   %f0 = arith.constant 0.0 : f32
@@ -73,29 +73,29 @@ func.func @split_vector_transfer_read_strided_2d(
   //      CHECK: %[[cmp1:.*]] = arith.cmpi sle, %[[idx1]], %[[c8]] : index
   // are both conds true
   //      CHECK: %[[cond:.*]] = arith.andi %[[cmp0]], %[[cmp1]] : i1
-  //      CHECK: %[[ifres:.*]]:3 = scf.if %[[cond]] -> (memref<?x8xf32, strided<[?, 1], offset: ?>>, index, index) {
+  //      CHECK: %[[ifres:.*]]:3 = scf.if %[[cond]] -> (memref<?x8xf32, strided<[?, 1]>>, index, index) {
   //               inBounds but not cast-compatible: yield a memref_casted form of %A
   //      CHECK:   %[[casted:.*]] = memref.cast %arg0 :
-  // CHECK-SAME:     memref<7x8xf32, strided<[?, 1], offset: ?>> to memref<?x8xf32, strided<[?, 1], offset: ?>>
+  // CHECK-SAME:     memref<7x8xf32, strided<[?, 1]>> to memref<?x8xf32, strided<[?, 1]>>
   //      CHECK:   scf.yield %[[casted]], %[[i]], %[[j]] :
-  // CHECK-SAME:     memref<?x8xf32, strided<[?, 1], offset: ?>>, index, index
+  // CHECK-SAME:     memref<?x8xf32, strided<[?, 1]>>, index, index
   //      CHECK: } else {
   //               slow path, fill tmp alloc and yield a memref_casted version of it
   //      CHECK:   %[[slow:.*]] = vector.transfer_read %[[A]][%[[i]], %[[j]]], %cst :
-  // CHECK-SAME:     memref<7x8xf32, strided<[?, 1], offset: ?>>, vector<4x8xf32>
+  // CHECK-SAME:     memref<7x8xf32, strided<[?, 1]>>, vector<4x8xf32>
   //      CHECK:   %[[cast_alloc:.*]] = vector.type_cast %[[alloc]] :
   // CHECK-SAME:     memref<4x8xf32> to memref<vector<4x8xf32>>
   //      CHECK:   store %[[slow]], %[[cast_alloc]][] :
   // CHECK-SAME:     memref<vector<4x8xf32>>
   //      CHECK:   %[[yielded:.*]] = memref.cast %[[alloc]] :
-  // CHECK-SAME:     memref<4x8xf32> to memref<?x8xf32, strided<[?, 1], offset: ?>>
+  // CHECK-SAME:     memref<4x8xf32> to memref<?x8xf32, strided<[?, 1]>>
   //      CHECK:   scf.yield %[[yielded]], %[[c0]], %[[c0]] :
-  // CHECK-SAME:     memref<?x8xf32, strided<[?, 1], offset: ?>>, index, index
+  // CHECK-SAME:     memref<?x8xf32, strided<[?, 1]>>, index, index
   //      CHECK: }
   //      CHECK: %[[res:.*]] = vector.transfer_read {{.*}} {in_bounds = [true, true]} :
-  // CHECK-SAME:   memref<?x8xf32, strided<[?, 1], offset: ?>>, vector<4x8xf32>
+  // CHECK-SAME:   memref<?x8xf32, strided<[?, 1]>>, vector<4x8xf32>
   %1 = vector.transfer_read %A[%i, %j], %f0 :
-    memref<7x8xf32, strided<[?, 1], offset: ?>>, vector<4x8xf32>
+    memref<7x8xf32, strided<[?, 1]>>, vector<4x8xf32>
 
   // CHECK: return %[[res]] : vector<4x8xf32>
   return %1 : vector<4x8xf32>
@@ -206,10 +206,10 @@ module attributes {transform.with_named_sequence} {
 // -----
 
 func.func @split_vector_transfer_write_strided_2d(
-    %V: vector<4x8xf32>, %A: memref<7x8xf32, strided<[?, 1], offset: ?>>,
+    %V: vector<4x8xf32>, %A: memref<7x8xf32, strided<[?, 1]>>,
     %i: index, %j: index) {
   vector.transfer_write %V, %A[%i, %j] :
-    vector<4x8xf32>, memref<7x8xf32, strided<[?, 1], offset: ?>>
+    vector<4x8xf32>, memref<7x8xf32, strided<[?, 1]>>
   return
 }
 
@@ -217,7 +217,7 @@ func.func @split_vector_transfer_write_strided_2d(
 // CHECK-DAG: #[[MAP2:.*]] = affine_map<()[s0] -> (s0 + 8)>
 // CHECK:   func @split_vector_transfer_write_strided_2d(
 // CHECK-SAME:                                                 %[[VEC:.*]]: vector<4x8xf32>,
-// CHECK-SAME:                                                 %[[DEST:.*]]: memref<7x8xf32, strided<[?, 1], offset: ?>>,
+// CHECK-SAME:                                                 %[[DEST:.*]]: memref<7x8xf32, strided<[?, 1]>>,
 // CHECK-SAME:                                                 %[[I:.*]]: index,
 // CHECK-SAME:                                                 %[[J:.*]]: index) {
 // CHECK-DAG:       %[[C7:.*]] = arith.constant 7 : index
@@ -231,21 +231,21 @@ func.func @split_vector_transfer_write_strided_2d(
 // CHECK:           %[[DIM1_IN:.*]] = arith.cmpi sle, %[[DIM1]], %[[C8]] : index
 // CHECK:           %[[IN_BOUNDS:.*]] = arith.andi %[[DIM0_IN]], %[[DIM1_IN]] : i1
 // CHECK:           %[[IN_BOUND_DEST:.*]]:3 = scf.if %[[IN_BOUNDS]]
-// CHECK-SAME:          -> (memref<?x8xf32, strided<[?, 1], offset: ?>>, index, index) {
+// CHECK-SAME:          -> (memref<?x8xf32, strided<[?, 1]>>, index, index) {
 // CHECK:             %[[VAL_15:.*]] = memref.cast %[[DEST]]
-// CHECK-SAME:            : memref<7x8xf32, strided<[?, 1], offset: ?>> to memref<?x8xf32, strided<[?, 1], offset: ?>>
+// CHECK-SAME:            : memref<7x8xf32, strided<[?, 1]>> to memref<?x8xf32, strided<[?, 1]>>
 // CHECK:             scf.yield %[[VAL_15]], %[[I]], %[[J]]
-// CHECK-SAME:            : memref<?x8xf32, strided<[?, 1], offset: ?>>, index, index
+// CHECK-SAME:            : memref<?x8xf32, strided<[?, 1]>>, index, index
 // CHECK:           } else {
 // CHECK:             %[[VAL_16:.*]] = memref.cast %[[TEMP]]
-// CHECK-SAME:            : memref<4x8xf32> to memref<?x8xf32, strided<[?, 1], offset: ?>>
+// CHECK-SAME:            : memref<4x8xf32> to memref<?x8xf32, strided<[?, 1]>>
 // CHECK:             scf.yield %[[VAL_16]], %[[C0]], %[[C0]]
-// CHECK-SAME:            : memref<?x8xf32, strided<[?, 1], offset: ?>>, index, index
+// CHECK-SAME:            : memref<?x8xf32, strided<[?, 1]>>, index, index
 // CHECK:           }
 // CHECK:           vector.transfer_write %[[VEC]],
 // CHECK-SAME:          %[[IN_BOUND_DEST:.*]]#0
 // CHECK-SAME:          [%[[IN_BOUND_DEST]]#1, %[[IN_BOUND_DEST]]#2]
-// CHECK-SAME:          {in_bounds = [true, true]} : vector<4x8xf32>, memref<?x8xf32, strided<[?, 1], offset: ?>>
+// CHECK-SAME:          {in_bounds = [true, true]} : vector<4x8xf32>, memref<?x8xf32, strided<[?, 1]>>
 // CHECK:           %[[OUT_BOUNDS:.*]] = arith.xori %[[IN_BOUNDS]], %[[CT]] : i1
 // CHECK:           scf.if %[[OUT_BOUNDS]] {
 // CHECK:             %[[VAL_19:.*]] = vector.type_cast %[[TEMP]]
@@ -253,7 +253,7 @@ func.func @split_vector_transfer_write_strided_2d(
 // CHECK:             %[[VAL_20:.*]] = memref.load %[[VAL_19]][]
 // CHECK-SAME:            : memref<vector<4x8xf32>>
 // CHECK:             vector.transfer_write %[[VAL_20]], %[[DEST]][%[[I]], %[[J]]]
-// CHECK-SAME:            : vector<4x8xf32>, memref<7x8xf32, strided<[?, 1], offset: ?>>
+// CHECK-SAME:            : vector<4x8xf32>, memref<7x8xf32, strided<[?, 1]>>
 // CHECK:           }
 // CHECK:           return
 // CHECK:         }
diff --git a/mlir/test/Dialect/Vector/vector-transferop-opt.mlir b/mlir/test/Dialect/Vector/vector-transferop-opt.mlir
index f4f7fb1ba0304..4a21ce632fb14 100644
--- a/mlir/test/Dialect/Vector/vector-transferop-opt.mlir
+++ b/mlir/test/Dialect/Vector/vector-transferop-opt.mlir
@@ -246,13 +246,13 @@ func.func @collapse_shape_and_read_from_source(%in_0: memref<1x20x1xi32>, %vec:
   %alloca = memref.alloca() {alignment = 64 : i64} : memref<1x4x1xi32>
   %collapse_shape = memref.collapse_shape %alloca [[0, 1, 2]] : memref<1x4x1xi32> into memref<4xi32>
   scf.for %arg0 = %c0 to %c20 step %c4 {
-    %subview = memref.subview %in_0[0, %arg0, 0] [1, 4, 1] [1, 1, 1] : memref<1x20x1xi32> to memref<1x4x1xi32, strided<[20, 1, 1], offset: ?>>
-    %1 = vector.transfer_read %subview[%c0, %c0, %c0], %c0_i32 {in_bounds = [true, true, true]} : memref<1x4x1xi32, strided<[20, 1, 1], offset: ?>>, vector<1x4x1xi32>
+    %subview = memref.subview %in_0[0, %arg0, 0] [1, 4, 1] [1, 1, 1] : memref<1x20x1xi32> to memref<1x4x1xi32, strided<[20, 1, 1]>>
+    %1 = vector.transfer_read %subview[%c0, %c0, %c0], %c0_i32 {in_bounds = [true, true, true]} : memref<1x4x1xi32, strided<[20, 1, 1]>>, vector<1x4x1xi32>
     // $alloca and $collapse_shape alias
     vector.transfer_write %1, %alloca[%c0, %c0, %c0] {in_bounds = [true, true, true]} : vector<1x4x1xi32>, memref<1x4x1xi32>
     vector.transfer_write %vec, %collapse_shape[%c0] {in_bounds = [true]} : vector<4xi32>, memref<4xi32>
     %2 = vector.transfer_read %alloca[%c0, %c0, %c0], %c0_i32 {in_bounds = [true, true, true]} : memref<1x4x1xi32>, vector<1x4x1xi32>
-    vector.transfer_write %2, %subview[%c0, %c0, %c0] {in_bounds = [true, true, true]} : vector<1x4x1xi32>, memref<1x4x1xi32, strided<[20, 1, 1], offset: ?>>
+    vector.transfer_write %2, %subview[%c0, %c0, %c0] {in_bounds = [true, true, true]} : vector<1x4x1xi32>, memref<1x4x1xi32, strided<[20, 1, 1]>>
   }
   return
 }
@@ -276,13 +276,13 @@ func.func @expand_shape_and_read_from_source(%in_0: memref<20xi32>, %vec: vector
   %alloca = memref.alloca() {alignment = 64 : i64} : memref<4xi32>
   %expand_shape = memref.expand_shape %alloca [[0, 1, 2]] output_shape [1, 4, 1] : memref<4xi32> into memref<1x4x1xi32>
   scf.for %arg0 = %c0 to %c20 step %c4 {
-    %subview = memref.subview %in_0[%arg0] [4] [1] : memref<20xi32> to memref<4xi32, strided<[1], offset: ?>>
-    %1 = vector.transfer_read %subview[%c0], %c0_i32 {in_bounds = [true]} : memref<4xi32, strided<[1], offset: ?>>, vector<4xi32>
+    %subview = memref.subview %in_0[%arg0] [4] [1] : memref<20xi32> to memref<4xi32, strided<[1]>>
+    %1 = vector.transfer_read %subview[%c0], %c0_i32 {in_bounds = [true]} : memref<4xi32, strided<[1]>>, vector<4xi32>
     // $alloca and $expand_shape alias
     vector.transfer_write %1, %alloca[%c0] {in_bounds = [true]} : vector<4xi32>, memref<4xi32>
     vector.transfer_write %vec, %expand_shape[%c0, %c0, %c0] {in_bounds = [true, true, true]} : vector<1x4x1xi32>, memref<1x4x1xi32>
     %2 = vector.transfer_read %alloca[%c0], %c0_i32 {in_bounds = [true]} : memref<4xi32>, vector<4xi32>
-    vector.transfer_write %2, %subview[%c0] {in_bounds = [true]} : vector<4xi32>, memref<4xi32, strided<[1], offset: ?>>
+    vector.transfer_write %2, %subview[%c0] {in_bounds = [true]} : vector<4xi32>, memref<4xi32, strided<[1]>>
   }
   return
 }
@@ -307,13 +307,13 @@ func.func @collapse_shape_and_read_from_collapse(%in_0: memref<20xi32>, %vec: ve
   %alloca = memref.alloca() {alignment = 64 : i64} : memref<1x4x1xi32>
   %collapse_shape = memref.collapse_shape %alloca [[0, 1, 2]] : memref<1x4x1xi32> into memref<4xi32>
   scf.for %arg0 = %c0 to %c20 step %c4 {
-    %subview = memref.subview %in_0[%arg0] [4] [1] : memref<20xi32> to memref<4xi32, strided<[1], offset: ?>>
-    %1 = vector.transfer_read %subview[%c0], %c0_i32 {in_bounds = [true]} : memref<4xi32, strided<[1], offset: ?>>, vector<4xi32>
+    %subview = memref.subview %in_0[%arg0] [4] [1] : memref<20xi32> to memref<4xi32, strided<[1]>>
+    %1 = vector.transfer_read %subview[%c0], %c0_i32 {in_bounds = [true]} : memref<4xi32, strided<[1]>>, vector<4xi32>
     vector.transfer_write %1, %collapse_shape[%c0] {in_bounds = [true]} : vector<4xi32>, memref<4xi32>
     // $alloca and $collapse_shape alias
     vector.transfer_write %vec, %alloca[%c0, %c0, %c0] {in_bounds = [true, true, true]} : vector<1x4x1xi32>, memref<1x4x1xi32>
     %2 = vector.transfer_read %collapse_shape[%c0], %c0_i32 {in_bounds = [true]} : memref<4xi32>, vector<4xi32>
-    vector.transfer_write %2, %subview[%c0] {in_bounds = [true]} : vector<4xi32>, memref<4xi32, strided<[1], offset: ?>>
+    vector.transfer_write %2, %subview[%c0] {in_bounds = [true]} : vector<4xi32>, memref<4xi32, strided<[1]>>
   }
   return
 }
@@ -338,13 +338,13 @@ func.func @expand_shape_and_read_from_expand(%in_0: memref<1x20x1xi32>, %vec: ve
   %alloca = memref.alloca() {alignment = 64 : i64} : memref<4xi32>
   %expand_shape = memref.expand_shape %alloca [[0, 1, 2]] output_shape [1, 4, 1] : memref<4xi32> into memref<1x4x1xi32>
   scf.for %arg0 = %c0 to %c20 step %c4 {
-    %subview = memref.subview %in_0[0, %arg0, 0] [1, 4, 1] [1, 1, 1] : memref<1x20x1xi32> to memref<1x4x1xi32, strided<[20, 1, 1], offset: ?>>
-    %1 = vector.transfer_read %subview[%c0, %c0, %c0], %c0_i32 {in_bounds = [true, true, true]} : memref<1x4x1xi32, strided<[20, 1, 1], offset: ?>>, vector<1x4x1xi32>
+    %subview = memref.subview %in_0[0, %arg0, 0] [1, 4, 1] [1, 1, 1] : memref<1x20x1xi32> to memref<1x4x1xi32, strided<[20, 1, 1]>>
+    %1 = vector.transfer_read %subview[%c0, %c0, %c0], %c0_i32 {in_bounds = [true, true, true]} : memref<1x4x1xi32, strided<[20, 1, 1]>>, vector<1x4x1xi32>
     vector.transfer_write %1, %expand_shape[%c0, %c0, %c0] {in_bounds = [true, true, true]} : vector<1x4x1xi32>, memref<1x4x1xi32>
     // $alloca and $expand_shape alias
     vector.transfer_write %vec, %alloca[%c0] {in_bounds = [true]} : vector<4xi32>, memref<4xi32>
     %2 = vector.transfer_read %expand_shape[%c0, %c0, %c0], %c0_i32 {in_bounds = [true, true, true]} : memref<1x4x1xi32>, vector<1x4x1xi32>
-    vector.transfer_write %2, %subview[%c0, %c0, %c0] {in_bounds = [true, true, true]} : vector<1x4x1xi32>, memref<1x4x1xi32, strided<[20, 1, 1], offset: ?>>
+    vector.transfer_write %2, %subview[%c0, %c0, %c0] {in_bounds = [true, true, true]} : vector<1x4x1xi32>, memref<1x4x1xi32, strided<[20, 1, 1]>>
   }
   return
 }
diff --git a/mlir/test/Dialect/Vector/vector-warp-distribute.mlir b/mlir/test/Dialect/Vector/vector-warp-distribute.mlir
index 691913b3bd5dc..2485ab7759d33 100644
--- a/mlir/test/Dialect/Vector/vector-warp-distribute.mlir
+++ b/mlir/test/Dialect/Vector/vector-warp-distribute.mlir
@@ -100,20 +100,20 @@ func.func @rewrite_warp_op_to_scf_if(%laneid: index,
 func.func @warp(%laneid: index, %arg1: memref<1024xf32>, %arg2: memref<1024xf32>,
            %arg3: memref<1024xf32>, %gid : index) {
   gpu.warp_execute_on_lane_0(%laneid)[32] {
-    %sa = memref.subview %arg1[%gid] [128] [1] : memref<1024xf32> to memref<128xf32, strided<[1], offset: ?>>
-    %sb = memref.subview %arg2[%gid] [128] [1] : memref<1024xf32> to memref<128xf32, strided<[1], offset: ?>>
-    %sc = memref.subview %arg3[%gid] [128] [1] : memref<1024xf32> to memref<128xf32, strided<[1], offset: ?>>
+    %sa = memref.subview %arg1[%gid] [128] [1] : memref<1024xf32> to memref<128xf32, strided<[1]>>
+    %sb = memref.subview %arg2[%gid] [128] [1] : memref<1024xf32> to memref<128xf32, strided<[1]>>
+    %sc = memref.subview %arg3[%gid] [128] [1] : memref<1024xf32> to memref<128xf32, strided<[1]>>
     %c0 = arith.constant 0 : index
     %c32 = arith.constant 32 : index
     %cst = arith.constant 0.000000e+00 : f32
-    %2 = vector.transfer_read %sa[%c0], %cst : memref<128xf32, strided<[1], offset: ?>>, vector<32xf32>
-    %3 = vector.transfer_read %sa[%c32], %cst : memref<128xf32, strided<[1], offset: ?>>, vector<32xf32>
-    %4 = vector.transfer_read %sb[%c0], %cst : memref<128xf32, strided<[1], offset: ?>>, vector<64xf32>
-    %5 = vector.transfer_read %sb[%c32], %cst : memref<128xf32, strided<[1], offset: ?>>, vector<64xf32>
+    %2 = vector.transfer_read %sa[%c0], %cst : memref<128xf32, strided<[1]>>, vector<32xf32>
+    %3 = vector.transfer_read %sa[%c32], %cst : memref<128xf32, strided<[1]>>, vector<32xf32>
+    %4 = vector.transfer_read %sb[%c0], %cst : memref<128xf32, strided<[1]>>, vector<64xf32>
+    %5 = vector.transfer_read %sb[%c32], %cst : memref<128xf32, strided<[1]>>, vector<64xf32>
     %6 = arith.addf %2, %3 : vector<32xf32>
     %7 = arith.addf %4, %5 : vector<64xf32>
-    vector.transfer_write %6, %sc[%c0] : vector<32xf32>, memref<128xf32, strided<[1], offset: ?>>
-    vector.transfer_write %7, %sc[%c32] : vector<64xf32>, memref<128xf32, strided<[1], offset: ?>>
+    vector.transfer_write %6, %sc[%c0] : vector<32xf32>, memref<128xf32, strided<[1]>>
+    vector.transfer_write %7, %sc[%c32] : vector<64xf32>, memref<128xf32, strided<[1]>>
   }
   return
 }
diff --git a/mlir/test/Dialect/X86/AMX/vector-contract-to-tiled-dp.mlir b/mlir/test/Dialect/X86/AMX/vector-contract-to-tiled-dp.mlir
index 1a6deed31eceb..ce3d7ec24924b 100644
--- a/mlir/test/Dialect/X86/AMX/vector-contract-to-tiled-dp.mlir
+++ b/mlir/test/Dialect/X86/AMX/vector-contract-to-tiled-dp.mlir
@@ -334,9 +334,9 @@ module attributes {transform.with_named_sequence} {
 
 !vecAB = vector<1x16x16x2xbf16>
 !vecC = vector<16x16xf32>
-!memrefA = memref<1x32x16x2xbf16, strided<[8192, 128, 2, 1], offset: ?>>
-!memrefB = memref<1x16x32x2xbf16, strided<[16384, 256, 2, 1], offset: ?>>
-!memrefC = memref<32x32xf32, strided<[128, 1], offset: ?>>
+!memrefA = memref<1x32x16x2xbf16, strided<[8192, 128, 2, 1]>>
+!memrefB = memref<1x16x32x2xbf16, strided<[16384, 256, 2, 1]>>
+!memrefC = memref<32x32xf32, strided<[128, 1]>>
 
 #map = affine_map<(d0, d1, d2, d3, d4) -> (d0, d2, d4, d1)>
 #map1 = affine_map<(d0, d1, d2, d3, d4) -> (d0, d4, d3, d1)>
@@ -438,9 +438,9 @@ module attributes {transform.with_named_sequence} {
 
 !vecAB = vector<16x16x4xi8>
 !vecC = vector<16x16xi32>
-!memrefA = memref<16x16x4xi8, strided<[256, 4, 1], offset: ?>>
-!memrefB = memref<16x32x4xi8, strided<[512, 4, 1], offset: ?>>
-!memrefC = memref<16x32xi32, strided<[128, 1], offset: ?>>
+!memrefA = memref<16x16x4xi8, strided<[256, 4, 1]>>
+!memrefB = memref<16x32x4xi8, strided<[512, 4, 1]>>
+!memrefC = memref<16x32xi32, strided<[128, 1]>>
 
 #map = affine_map<(d0, d1, d2, d3) -> (d1, d3, d0)>
 #map1 = affine_map<(d0, d1, d2, d3) -> (d3, d2, d0)>
@@ -520,9 +520,9 @@ module attributes {transform.with_named_sequence} {
 
 !vecAB = vector<1x16x16x4xi8>
 !vecC = vector<1x16x16xi32>
-!memrefA = memref<1x16x16x4xi8, strided<[16384, 256, 4, 1], offset: ?>>
-!memrefB = memref<1x16x32x4xi8, strided<[32768, 512, 4, 1], offset: ?>>
-!memrefC = memref<1x16x32xi32, strided<[8192, 128, 1], offset: ?>>
+!memrefA = memref<1x16x16x4xi8, strided<[16384, 256, 4, 1]>>
+!memrefB = memref<1x16x32x4xi8, strided<[32768, 512, 4, 1]>>
+!memrefC = memref<1x16x32xi32, strided<[8192, 128, 1]>>
 
 #map = affine_map<(d0, d1, d2, d3, d4) -> (d0, d2, d4, d1)>
 #map1 = affine_map<(d0, d1, d2, d3, d4) -> (d0, d4, d3, d1)>
@@ -602,9 +602,9 @@ module attributes {transform.with_named_sequence} {
 !vecA = vector<1x16x32xbf16>
 !vecB = vector<1x32x16xbf16>
 !vecC = vector<16x16xf32>
-!memrefA = memref<1x32x32xbf16, strided<[6144, 96, 1], offset: ?>>
-!memrefB = memref<1x32x32xbf16, strided<[12288, 128, 1], offset: ?>>
-!memrefC = memref<32x32xf32, strided<[128, 1], offset: ?>>
+!memrefA = memref<1x32x32xbf16, strided<[6144, 96, 1]>>
+!memrefB = memref<1x32x32xbf16, strided<[12288, 128, 1]>>
+!memrefC = memref<32x32xf32, strided<[128, 1]>>
 
 #map = affine_map<(d0, d1, d2, d3) -> (d0, d1, d3)>
 #map1 = affine_map<(d0, d1, d2, d3) -> (d0, d3, d2)>
@@ -712,9 +712,9 @@ module attributes {transform.with_named_sequence} {
 !vecA = vector<16x64xi8>
 !vecB = vector<64x16xi8>
 !vecC = vector<16x16xi32>
-!memrefA = memref<32x64xi8, strided<[256, 1], offset: ?>>
-!memrefB = memref<64x32xi8, strided<[128, 1], offset: ?>>
-!memrefC = memref<32x32xi32, strided<[128, 1], offset: ?>>
+!memrefA = memref<32x64xi8, strided<[256, 1]>>
+!memrefB = memref<64x32xi8, strided<[128, 1]>>
+!memrefC = memref<32x32xi32, strided<[128, 1]>>
 
 #map = affine_map<(d0, d1, d2) -> (d0, d2)>
 #map1 = affine_map<(d0, d1, d2) -> (d2, d1)>
@@ -1001,9 +1001,9 @@ module attributes {transform.with_named_sequence} {
 
 !vecAB = vector<16x16x4xi8>
 !vecC = vector<16x16xi32>
-!memrefA = memref<16x16x4xi8, strided<[256, 4, 1], offset: ?>>
-!memrefB = memref<16x32x4xi8, strided<[512, 4, 1], offset: ?>>
-!memrefC = memref<16x32xi32, strided<[128, 1], offset: ?>>
+!memrefA = memref<16x16x4xi8, strided<[256, 4, 1]>>
+!memrefB = memref<16x32x4xi8, strided<[512, 4, 1]>>
+!memrefC = memref<16x32xi32, strided<[128, 1]>>
 
 #map = affine_map<(d0, d1, d2, d3) -> (d1, d3, d0)>
 #map1 = affine_map<(d0, d1, d2, d3) -> (d3, d2, d0)>
@@ -1088,9 +1088,9 @@ module attributes {transform.with_named_sequence} {
 !vecA = vector<1x16x16x4xi8>
 !vecB = vector<1x16x32x4xi8>
 !vecC = vector<1x16x32xi32>
-!memrefA = memref<1x16x16x4xi8, strided<[16384, 256, 4, 1], offset: ?>>
-!memrefB = memref<1x16x32x4xi8, strided<[32768, 512, 4, 1], offset: ?>>
-!memrefC = memref<1x16x32xi32, strided<[8192, 128, 1], offset: ?>>
+!memrefA = memref<1x16x16x4xi8, strided<[16384, 256, 4, 1]>>
+!memrefB = memref<1x16x32x4xi8, strided<[32768, 512, 4, 1]>>
+!memrefC = memref<1x16x32xi32, strided<[8192, 128, 1]>>
 
 #map = affine_map<(d0, d1, d2, d3, d4) -> (d0, d2, d4, d1)>
 #map1 = affine_map<(d0, d1, d2, d3, d4) -> (d0, d4, d3, d1)>
@@ -1155,9 +1155,9 @@ module attributes {transform.with_named_sequence} {
 
 !vecAB = vector<1x1x16x16x4xi8>
 !vecC = vector<16x16xi32>
-!memrefA = memref<1x1x16x16x4xi8, strided<[262144, 16384, 256, 4, 1], offset: ?>>
-!memrefB = memref<1x1x16x32x4xi8, strided<[524288, 32768, 512, 4, 1], offset: ?>>
-!memrefC = memref<16x32xi32, strided<[128, 1], offset: ?>>
+!memrefA = memref<1x1x16x16x4xi8, strided<[262144, 16384, 256, 4, 1]>>
+!memrefB = memref<1x1x16x32x4xi8, strided<[524288, 32768, 512, 4, 1]>>
+!memrefC = memref<16x32xi32, strided<[128, 1]>>
 
 #map = affine_map<(d0, d1, d2, d3, d4, d5) -> (d0, d1, d3, d5, d2)>
 #map1 = affine_map<(d0, d1, d2, d3, d4, d5) -> (d0, d1, d5, d4, d2)>
@@ -1346,9 +1346,9 @@ module attributes {transform.with_named_sequence} {
 !vecA = vector<16x64xi8>
 !vecB = vector<64x16xi8>
 !vecC = vector<16x16xi32>
-!memrefA = memref<16x64xi8, strided<[256, 1], offset: ?>>
-!memrefB = memref<64x32xi8, strided<[128, 1], offset: ?>>
-!memrefC = memref<16x32xi32, strided<[128, 1], offset: ?>>
+!memrefA = memref<16x64xi8, strided<[256, 1]>>
+!memrefB = memref<64x32xi8, strided<[128, 1]>>
+!memrefC = memref<16x32xi32, strided<[128, 1]>>
 
 #map = affine_map<(d0, d1, d2) -> (d0, d2)>
 #map1 = affine_map<(d0, d1, d2) -> (d2, d1)>
@@ -1423,9 +1423,9 @@ module attributes {transform.with_named_sequence} {
 !vecA = vector<16x64xi8>
 !vecB = vector<64x16xi8>
 !vecC = vector<16x16xi32>
-!memrefA = memref<32x64xi8, strided<[256, 1], offset: ?>>
-!memrefB = memref<64x16xi8, strided<[128, 1], offset: ?>>
-!memrefC = memref<32x16xi32, strided<[128, 1], offset: ?>>
+!memrefA = memref<32x64xi8, strided<[256, 1]>>
+!memrefB = memref<64x16xi8, strided<[128, 1]>>
+!memrefC = memref<32x16xi32, strided<[128, 1]>>
 
 #map = affine_map<(d0, d1, d2) -> (d0, d2)>
 #map1 = affine_map<(d0, d1, d2) -> (d2, d1)>
diff --git a/mlir/test/Dialect/X86/vector-contract-bf16-to-fma.mlir b/mlir/test/Dialect/X86/vector-contract-bf16-to-fma.mlir
index 4f0e5c5f3c907..1a75449eb0be9 100644
--- a/mlir/test/Dialect/X86/vector-contract-bf16-to-fma.mlir
+++ b/mlir/test/Dialect/X86/vector-contract-bf16-to-fma.mlir
@@ -30,11 +30,11 @@ func.func @brgemm_to_fma(
 // CHECK: memref.subview %arg0[%c0, %c0, %c0, 1] {{.*}} : memref<1x4x1x2xbf16> to memref<1x1x1x1xbf16, {{.*}}>
 // CHECK: memref.subview %arg0[%c0, %c0, %c0, 0] {{.*}} : memref<1x4x1x2xbf16> to memref<1x1x1x1xbf16, {{.*}}>
 // CHECK: memref.subview %arg1[%c0, %c0, %c0, %c0] {{.*}} : memref<1x1x32x2xbf16> to memref<1x1x8x2xbf16, {{.*}}>
-// CHECK: x86.avx.bcst_to_f32.packed {{.*}} : memref<1x1x1x1xbf16, strided<[8, 2, 2, 1], offset: ?>>
-// CHECK: x86.avx.cvt.packed.odd.indexed_to_f32 {{.*}} : memref<1x1x8x2xbf16, strided<[64, 64, 2, 1], offset: ?>>
+// CHECK: x86.avx.bcst_to_f32.packed {{.*}} : memref<1x1x1x1xbf16, strided<[8, 2, 2, 1]>>
+// CHECK: x86.avx.cvt.packed.odd.indexed_to_f32 {{.*}} : memref<1x1x8x2xbf16, strided<[64, 64, 2, 1]>>
 // CHECK: vector.fma {{.*}} : vector<8xf32>
-// CHECK: x86.avx.bcst_to_f32.packed {{.*}} : memref<1x1x1x1xbf16, strided<[8, 2, 2, 1], offset: ?>>
-// CHECK: x86.avx.cvt.packed.even.indexed_to_f32 {{.*}} : memref<1x1x8x2xbf16, strided<[64, 64, 2, 1], offset: ?>>
+// CHECK: x86.avx.bcst_to_f32.packed {{.*}} : memref<1x1x1x1xbf16, strided<[8, 2, 2, 1]>>
+// CHECK: x86.avx.cvt.packed.even.indexed_to_f32 {{.*}} : memref<1x1x8x2xbf16, strided<[64, 64, 2, 1]>>
 // CHECK: vector.fma {{.*}} : vector<8xf32>
 
 module attributes {transform.with_named_sequence} {
@@ -285,10 +285,10 @@ func.func @matmul_to_fma_flat_layout(
 // CHECK-NEXT: vector.shuffle{{.*}}[4, 12, 5, 13, 6, 14, 7, 15] : vector<8xf32>, vector<8xf32>
 // CHECK: memref.subview %arg0[%c0, %c0] {{.*}} : memref<4x1xbf16> to memref<1x1xbf16, {{.*}}>
 // CHECK: memref.subview %arg1[%c0, %c0] {{.*}} : memref<1x32xbf16> to memref<1x16xbf16, {{.*}}>
-// CHECK: x86.avx.bcst_to_f32.packed {{.*}} : memref<1x1xbf16, strided<[1, 1], offset: ?>>
-// CHECK: x86.avx.cvt.packed.even.indexed_to_f32 {{.*}} : memref<1x16xbf16, strided<[32, 1], offset: ?>>
+// CHECK: x86.avx.bcst_to_f32.packed {{.*}} : memref<1x1xbf16, strided<[1, 1]>>
+// CHECK: x86.avx.cvt.packed.even.indexed_to_f32 {{.*}} : memref<1x16xbf16, strided<[32, 1]>>
 // CHECK: vector.fma {{.*}} : vector<8xf32>
-// CHECK: x86.avx.cvt.packed.odd.indexed_to_f32 {{.*}} : memref<1x16xbf16, strided<[32, 1], offset: ?>>
+// CHECK: x86.avx.cvt.packed.odd.indexed_to_f32 {{.*}} : memref<1x16xbf16, strided<[32, 1]>>
 // CHECK: vector.fma {{.*}} : vector<8xf32>
 // CHECK: vector.shuffle{{.*}}[0, 8, 1, 9, 2, 10, 3, 11] : vector<8xf32>, vector<8xf32>
 // CHECK-NEXT: vector.shuffle{{.*}}[4, 12, 5, 13, 6, 14, 7, 15] : vector<8xf32>, vector<8xf32>
@@ -357,10 +357,10 @@ func.func @matmul_to_fma_flat_layout_load(
 // CHECK-NEXT: vector.shuffle{{.*}}[4, 12, 5, 13, 6, 14, 7, 15] : vector<8xf32>, vector<8xf32>
 // CHECK: memref.subview %arg0[%c0, %c0] {{.*}} : memref<4x1xbf16> to memref<1x1xbf16, {{.*}}>
 // CHECK: memref.subview %arg1[%c0, %c0] {{.*}} : memref<1x32xbf16> to memref<1x16xbf16, {{.*}}>
-// CHECK: x86.avx.bcst_to_f32.packed {{.*}} : memref<1x1xbf16, strided<[1, 1], offset: ?>>
-// CHECK: x86.avx.cvt.packed.even.indexed_to_f32 {{.*}} : memref<1x16xbf16, strided<[32, 1], offset: ?>>
+// CHECK: x86.avx.bcst_to_f32.packed {{.*}} : memref<1x1xbf16, strided<[1, 1]>>
+// CHECK: x86.avx.cvt.packed.even.indexed_to_f32 {{.*}} : memref<1x16xbf16, strided<[32, 1]>>
 // CHECK: vector.fma {{.*}} : vector<8xf32>
-// CHECK: x86.avx.cvt.packed.odd.indexed_to_f32 {{.*}} : memref<1x16xbf16, strided<[32, 1], offset: ?>>
+// CHECK: x86.avx.cvt.packed.odd.indexed_to_f32 {{.*}} : memref<1x16xbf16, strided<[32, 1]>>
 // CHECK: vector.fma {{.*}} : vector<8xf32>
 // CHECK: vector.shuffle{{.*}}[0, 8, 1, 9, 2, 10, 3, 11] : vector<8xf32>, vector<8xf32>
 // CHECK-NEXT: vector.shuffle{{.*}}[4, 12, 5, 13, 6, 14, 7, 15] : vector<8xf32>, vector<8xf32>
@@ -380,9 +380,9 @@ module attributes {transform.with_named_sequence} {
 !vecA = vector<1x1x1xbf16>
 !vecB = vector<1x1x8xbf16>
 !vecC = vector<1x8xf32>
-!memrefA = memref<1x1x1xbf16, strided<[2048, 32, 1], offset: ?>>
-!memrefB = memref<1x1x16xbf16, strided<[2048, 64, 1], offset: ?>>
-!memrefC = memref<1x16xf32, strided<[64, 1], offset: ?>>
+!memrefA = memref<1x1x1xbf16, strided<[2048, 32, 1]>>
+!memrefB = memref<1x1x16xbf16, strided<[2048, 64, 1]>>
+!memrefC = memref<1x16xf32, strided<[64, 1]>>
 #map = affine_map<(d0, d1, d2, d3) -> (d0, d1, d3)>
 #map1 = affine_map<(d0, d1, d2, d3) -> (d0, d3, d2)>
 #map2 = affine_map<(d0, d1, d2, d3) -> (d1, d2)>
@@ -524,10 +524,10 @@ func.func @matmul_to_fma_flat_layout_bcstB(
 // CHECK-NEXT: vector.shuffle{{.*}}[4, 12, 5, 13, 6, 14, 7, 15] : vector<8xf32>, vector<8xf32>
 // CHECK: memref.subview %arg1[%c0, %c0] {{.*}} : memref<1x4xbf16> to memref<1x1xbf16, {{.*}}>
 // CHECK: memref.subview %arg0[%c0, %c0] {{.*}} : memref<32x1xbf16> to memref<16x1xbf16, {{.*}}>
-// CHECK: x86.avx.bcst_to_f32.packed {{.*}} : memref<1x1xbf16, strided<[4, 1], offset: ?>>
-// CHECK: x86.avx.cvt.packed.even.indexed_to_f32 {{.*}} : memref<16x1xbf16, strided<[1, 1], offset: ?>>
+// CHECK: x86.avx.bcst_to_f32.packed {{.*}} : memref<1x1xbf16, strided<[4, 1]>>
+// CHECK: x86.avx.cvt.packed.even.indexed_to_f32 {{.*}} : memref<16x1xbf16, strided<[1, 1]>>
 // CHECK: vector.fma {{.*}} : vector<8xf32>
-// CHECK: x86.avx.cvt.packed.odd.indexed_to_f32 {{.*}} : memref<16x1xbf16, strided<[1, 1], offset: ?>>
+// CHECK: x86.avx.cvt.packed.odd.indexed_to_f32 {{.*}} : memref<16x1xbf16, strided<[1, 1]>>
 // CHECK: vector.fma {{.*}} : vector<8xf32>
 // CHECK: vector.shuffle{{.*}}[0, 8, 1, 9, 2, 10, 3, 11] : vector<8xf32>, vector<8xf32>
 // CHECK-NEXT: vector.shuffle{{.*}}[4, 12, 5, 13, 6, 14, 7, 15] : vector<8xf32>, vector<8xf32>
@@ -1198,12 +1198,12 @@ func.func @negative_non_unit_stride(
   %c0 = arith.constant 0 : index
   %0 = ub.poison : bf16
   %subview_1 = memref.subview %arg1[%c0, %c0, %c0] [1, 16, 2] [1, 1, 2] :
-               !memrefB to memref<1x16x2xbf16, strided<[64, 2, 2], offset: ?>>
+               !memrefB to memref<1x16x2xbf16, strided<[64, 2, 2]>>
 
   %1 = vector.transfer_read %arg0[%c0, %c0, %c0], %0 {in_bounds = [true, true, true]} :
         !memrefA, !vecA
   %2 = vector.transfer_read %subview_1[%c0, %c0, %c0], %0 {in_bounds = [true, true, true]} :
-        memref<1x16x2xbf16, strided<[64, 2, 2], offset: ?>>, !vecB
+        memref<1x16x2xbf16, strided<[64, 2, 2]>>, !vecB
   %3 = vector.contract {
     indexing_maps = [#map, #map1, #map2],
     iterator_types = ["reduction", "parallel", "parallel", "reduction"],
diff --git a/mlir/test/Dialect/X86/vector-contract-to-packed-type-dotproduct.mlir b/mlir/test/Dialect/X86/vector-contract-to-packed-type-dotproduct.mlir
index f861d357739a3..dce0b4ad7b653 100644
--- a/mlir/test/Dialect/X86/vector-contract-to-packed-type-dotproduct.mlir
+++ b/mlir/test/Dialect/X86/vector-contract-to-packed-type-dotproduct.mlir
@@ -623,9 +623,9 @@ module attributes {transform.with_named_sequence} {
 !vecA = vector<1x1x2xbf16>
 !vecB = vector<1x2x16xbf16>
 !vecC = vector<1x16xf32>
-!memrefA = memref<1x1x2xbf16, strided<[2048, 32, 1], offset: ?>>
-!memrefB = memref<1x2x32xbf16, strided<[2048, 64, 1], offset: ?>>
-!memrefC = memref<1x32xf32, strided<[64, 1], offset: ?>>
+!memrefA = memref<1x1x2xbf16, strided<[2048, 32, 1]>>
+!memrefB = memref<1x2x32xbf16, strided<[2048, 64, 1]>>
+!memrefC = memref<1x32xf32, strided<[64, 1]>>
 #map = affine_map<(d0, d1, d2, d3) -> (d0, d1, d3)>
 #map1 = affine_map<(d0, d1, d2, d3) -> (d0, d3, d2)>
 #map2 = affine_map<(d0, d1, d2, d3) -> (d1, d2)>
@@ -793,9 +793,9 @@ module attributes {transform.with_named_sequence} {
 !vecA = vector<1x1x4xi8>
 !vecB = vector<1x4x16xi8>
 !vecC = vector<1x16xi32>
-!memrefA = memref<1x2x4xi8, strided<[16384, 256, 1], offset: ?>>
-!memrefB = memref<1x4x32xi8, strided<[32768, 128, 1], offset: ?>>
-!memrefC = memref<2x32xi32, strided<[128, 1], offset: ?>>
+!memrefA = memref<1x2x4xi8, strided<[16384, 256, 1]>>
+!memrefB = memref<1x4x32xi8, strided<[32768, 128, 1]>>
+!memrefC = memref<2x32xi32, strided<[128, 1]>>
 
 #map = affine_map<(d0, d1, d2, d3) -> (d0, d1, d3)>
 #map1 = affine_map<(d0, d1, d2, d3) -> (d0, d3, d2)>
diff --git a/mlir/test/Dialect/XeGPU/ops.mlir b/mlir/test/Dialect/XeGPU/ops.mlir
index b32e297b60fc8..cbfd71917ccba 100644
--- a/mlir/test/Dialect/XeGPU/ops.mlir
+++ b/mlir/test/Dialect/XeGPU/ops.mlir
@@ -560,11 +560,11 @@ gpu.func @create_mem_desc_from_2d_memref() {
 // CHECK-LABEL: gpu.func @create_mem_desc_with_stride_from_2d_memref({{.*}}) {
 gpu.func @create_mem_desc_with_stride_from_2d_memref() {
   //CHECK: %[[ALLOC:.+]] = memref.alloca() {alignment = 1024 : i64} : memref<32x64xf16, 3>
-  //CHECK: %[[SUBVIEW:.+]] = memref.subview %[[ALLOC]][16, 0] [16, 64] [1, 1] : memref<32x64xf16, 3> to memref<16x64xf16, strided<[64, 1], offset: 1024>, 3>
-  //CHECK: %{{.+}} = xegpu.create_mem_desc %[[SUBVIEW]] : memref<16x64xf16, strided<[64, 1], offset: 1024>, 3> -> !xegpu.mem_desc<16x64xf16, #xegpu.mem_layout<stride = [1, 16]>>
+  //CHECK: %[[SUBVIEW:.+]] = memref.subview %[[ALLOC]][16, 0] [16, 64] [1, 1] : memref<32x64xf16, 3> to memref<16x64xf16, strided<[64, 1]>, 3>
+  //CHECK: %{{.+}} = xegpu.create_mem_desc %[[SUBVIEW]] : memref<16x64xf16, strided<[64, 1]>, 3> -> !xegpu.mem_desc<16x64xf16, #xegpu.mem_layout<stride = [1, 16]>>
   %m = memref.alloca() {alignment = 1024} : memref<32x64xf16, 3>
-  %m_sub = memref.subview %m[16, 0][16, 64][1,1] : memref<32x64xf16, 3> to memref<16x64xf16, strided<[64, 1], offset: 1024>, 3>
-  %mem_desc = xegpu.create_mem_desc %m_sub : memref<16x64xf16, strided<[64, 1], offset: 1024>, 3> -> !xegpu.mem_desc<16x64xf16, #xegpu.mem_layout<stride = [1, 16]>>
+  %m_sub = memref.subview %m[16, 0][16, 64][1,1] : memref<32x64xf16, 3> to memref<16x64xf16, strided<[64, 1]>, 3>
+  %mem_desc = xegpu.create_mem_desc %m_sub : memref<16x64xf16, strided<[64, 1]>, 3> -> !xegpu.mem_desc<16x64xf16, #xegpu.mem_layout<stride = [1, 16]>>
   gpu.return
 }
 
diff --git a/mlir/test/Examples/NVGPU/Ch4.py b/mlir/test/Examples/NVGPU/Ch4.py
index c66259d141336..fd666adcd2d3d 100644
--- a/mlir/test/Examples/NVGPU/Ch4.py
+++ b/mlir/test/Examples/NVGPU/Ch4.py
@@ -458,7 +458,7 @@ def gemm_multistage_kernel():
 # DUMPIR:       %[[SMEM_EPI:.*]] = gpu.dynamic_shared_memory : memref<?xi8, #gpu.address_space<workgroup>>
 # DUMPIR:       %[[C0_VIEW:.*]] = arith.constant 0 : index
 # DUMPIR:       %[[VIEW_EPI:.*]] = memref.view %[[SMEM_EPI]][%[[C0_VIEW]]][] : memref<?xi8, #gpu.address_space<workgroup>> to memref<128x128xf32, #gpu.address_space<workgroup>>
-# DUMPIR:       %[[SUBVIEW_EPI:.*]] = memref.subview %{{.*}}[%[[DIMX_EPI]], %[[DIMY_EPI]]] [128, 128] [1, 1] : memref<512x256xf32> to memref<128x128xf32, strided<[256, 1], offset: ?>>
+# DUMPIR:       %[[SUBVIEW_EPI:.*]] = memref.subview %{{.*}}[%[[DIMX_EPI]], %[[DIMY_EPI]]] [128, 128] [1, 1] : memref<512x256xf32> to memref<128x128xf32, strided<[256, 1]>>
 # DUMPIR:       nvgpu.warpgroup.mma.store %[[LOOP_RES]]#0, %[[VIEW_EPI]] : <fragmented = vector<128x128xf32>> to memref<128x128xf32, #gpu.address_space<workgroup>>
 # DUMPIR:       gpu.barrier
 # DUMPIR:       %[[C0_STORE:.*]] = arith.constant 0 : index
@@ -466,4 +466,4 @@ def gemm_multistage_kernel():
 # DUMPIR:       %[[C1_STORE:.*]] = arith.constant 1 : index
 # DUMPIR:       scf.for %arg15 = %[[C0_STORE]] to %[[C128_STORE]] step %[[C1_STORE]] {
 # DUMPIR:         %[[VAL_LOAD:.*]] = memref.load %[[VIEW_EPI]][%arg15, %[[TID_X_EPI]]] : memref<128x128xf32, #gpu.address_space<workgroup>>
-# DUMPIR:         memref.store %[[VAL_LOAD]], %[[SUBVIEW_EPI]][%arg15, %[[TID_X_EPI]]] : memref<128x128xf32, strided<[256, 1], offset: ?>>
+# DUMPIR:         memref.store %[[VAL_LOAD]], %[[SUBVIEW_EPI]][%arg15, %[[TID_X_EPI]]] : memref<128x128xf32, strided<[256, 1]>>
diff --git a/mlir/test/Examples/NVGPU/Ch5.py b/mlir/test/Examples/NVGPU/Ch5.py
index 4f06f97142620..529aaa0da5b18 100644
--- a/mlir/test/Examples/NVGPU/Ch5.py
+++ b/mlir/test/Examples/NVGPU/Ch5.py
@@ -466,7 +466,7 @@ def gemm_warp_specialized_kernel():
 # DUMPIR:         %[[SMEM_EPI:.*]] = gpu.dynamic_shared_memory : memref<?xi8, #gpu.address_space<workgroup>>
 # DUMPIR:         %[[C0_EPI:.*]] = arith.constant 0 : index
 # DUMPIR:         %[[VIEW_EPI:.*]] = memref.view %[[SMEM_EPI]][%[[C0_EPI]]][] : memref<?xi8, #gpu.address_space<workgroup>> to memref<128x128xf32, #gpu.address_space<workgroup>>
-# DUMPIR:         %[[SUBVIEW:.*]] = memref.subview %{{.*}}[%[[DIM_X_EPI]], %[[DIM_Y_EPI]]] [128, 128] [1, 1] : memref<512x256xf32> to memref<128x128xf32, strided<[256, 1], offset: ?>>
+# DUMPIR:         %[[SUBVIEW:.*]] = memref.subview %{{.*}}[%[[DIM_X_EPI]], %[[DIM_Y_EPI]]] [128, 128] [1, 1] : memref<512x256xf32> to memref<128x128xf32, strided<[256, 1]>>
 # DUMPIR:         nvgpu.warpgroup.mma.store %[[CONS_LOOP]]#0, %[[VIEW_EPI]] : <fragmented = vector<128x128xf32>> to memref<128x128xf32, #gpu.address_space<workgroup>>
 # DUMPIR:         gpu.barrier
 # DUMPIR:         %[[C0_STORE:.*]] = arith.constant 0 : index
@@ -474,7 +474,7 @@ def gemm_warp_specialized_kernel():
 # DUMPIR:         %[[C1_STORE:.*]] = arith.constant 1 : index
 # DUMPIR:         scf.for %arg15 = %[[C0_STORE]] to %[[C128_STORE]] step %[[C1_STORE]] {
 # DUMPIR:           %{{.*}} = memref.load %[[VIEW_EPI]][%arg15, %[[TID_EPI]]] : memref<128x128xf32, #gpu.address_space<workgroup>>
-# DUMPIR:           memref.store %{{.*}}, %[[SUBVIEW]][%arg15, %[[TID_EPI]]] : memref<128x128xf32, strided<[256, 1], offset: ?>>
+# DUMPIR:           memref.store %{{.*}}, %[[SUBVIEW]][%arg15, %[[TID_EPI]]] : memref<128x128xf32, strided<[256, 1]>>
 # DUMPIR:         }
 # DUMPIR:       }
 # DUMPIR:       gpu.terminator
diff --git a/mlir/test/IR/invalid-builtin-types.mlir b/mlir/test/IR/invalid-builtin-types.mlir
index ef3412486d9f4..cb433c77b11ca 100644
--- a/mlir/test/IR/invalid-builtin-types.mlir
+++ b/mlir/test/IR/invalid-builtin-types.mlir
@@ -79,23 +79,8 @@ func.func private @memref_unfinished_stride_list() -> memref<?x?xf32, strided<[>
 
 // -----
 
-// expected-error @below {{expected 'offset' after comma}}
-func.func private @memref_missing_offset() -> memref<?x?xf32, strided<[], >>
-
-// -----
-
-// expected-error @below {{expected ':' after 'offset'}}
-func.func private @memref_missing_offset_colon() -> memref<?x?xf32, strided<[], offset>>
-
-// -----
-
-// expected-error @below {{expected a 64-bit signed integer or '?'}}
-func.func private @memref_missing_offset_value() -> memref<?x?xf32, strided<[], offset: >>
-
-// -----
-
 // expected-error @below {{expected '>'}}
-func.func private @memref_incorrect_strided_ending() -> memref<?x?xf32, strided<[], offset: 32)>
+func.func private @memref_incorrect_strided_ending() -> memref<?x?xf32, strided<[]?>
 
 // -----
 
diff --git a/mlir/test/Integration/Dialect/Linalg/CPU/matmul-vs-matvec.mlir b/mlir/test/Integration/Dialect/Linalg/CPU/matmul-vs-matvec.mlir
index 1950fe8621562..ffc240f4341ed 100644
--- a/mlir/test/Integration/Dialect/Linalg/CPU/matmul-vs-matvec.mlir
+++ b/mlir/test/Integration/Dialect/Linalg/CPU/matmul-vs-matvec.mlir
@@ -28,10 +28,10 @@ func.func @matvec(%A: memref<?x?xf32>, %B: memref<?x?xf32>) -> (memref<?x?xf32>)
   %C = memref.alloc(%m, %n) : memref<?x?xf32>
   linalg.fill ins(%f0 : f32) outs(%C : memref<?x?xf32>)
   scf.for %i = %c0 to %n step %c1 {
-    %b = memref.subview %B[0, %i][%x, 1][1, 1] : memref<?x?xf32> to memref<?xf32, strided<[?], offset: ?>>
-    %c = memref.subview %C[0, %i][%m, 1][1, 1] : memref<?x?xf32> to memref<?xf32, strided<[?], offset: ?>>
-    linalg.matvec ins(%A, %b: memref<?x?xf32>, memref<?xf32, strided<[?], offset: ?>>)
-                  outs(%c: memref<?xf32, strided<[?], offset: ?>>)
+    %b = memref.subview %B[0, %i][%x, 1][1, 1] : memref<?x?xf32> to memref<?xf32, strided<[?]>>
+    %c = memref.subview %C[0, %i][%m, 1][1, 1] : memref<?x?xf32> to memref<?xf32, strided<[?]>>
+    linalg.matvec ins(%A, %b: memref<?x?xf32>, memref<?xf32, strided<[?]>>)
+                  outs(%c: memref<?xf32, strided<[?]>>)
   }
   return %C : memref<?x?xf32>
 }
diff --git a/mlir/test/Integration/Dialect/Linalg/CPU/rank-reducing-subview.mlir b/mlir/test/Integration/Dialect/Linalg/CPU/rank-reducing-subview.mlir
index fe261a7345697..37cbce18ae4aa 100644
--- a/mlir/test/Integration/Dialect/Linalg/CPU/rank-reducing-subview.mlir
+++ b/mlir/test/Integration/Dialect/Linalg/CPU/rank-reducing-subview.mlir
@@ -18,13 +18,13 @@ func.func @main() {
   memref.store %f1, %A[%c0, %c1] : memref<?x?xf32>
   memref.store %f2, %A[%c1, %c0] : memref<?x?xf32>
   memref.store %f3, %A[%c1, %c1] : memref<?x?xf32>
-  %B = memref.subview %A[%c1, 0][1, %c2][1, 1] : memref<?x?xf32> to memref<?xf32, strided<[1], offset: ?>>
-  %C = memref.subview %A[0, %c1][%c2, 1][1, 1] : memref<?x?xf32> to memref<?xf32, strided<[?], offset: ?>>
+  %B = memref.subview %A[%c1, 0][1, %c2][1, 1] : memref<?x?xf32> to memref<?xf32, strided<[1]>>
+  %C = memref.subview %A[0, %c1][%c2, 1][1, 1] : memref<?x?xf32> to memref<?xf32, strided<[?]>>
   %A_ = memref.cast %A : memref<?x?xf32> to memref<*xf32>
   call @printMemrefF32(%A_) : (memref<*xf32>) -> ()
-  %B_ = memref.cast %B : memref<?xf32, strided<[1], offset: ?>> to memref<*xf32>
+  %B_ = memref.cast %B : memref<?xf32, strided<[1]>> to memref<*xf32>
   call @printMemrefF32(%B_) : (memref<*xf32>) -> ()
-  %C_ = memref.cast %C : memref<?xf32, strided<[?], offset: ?>> to memref<*xf32>
+  %C_ = memref.cast %C : memref<?xf32, strided<[?]>> to memref<*xf32>
   call @printMemrefF32(%C_) : (memref<*xf32>) -> ()
   memref.dealloc %A : memref<?x?xf32>
   return
diff --git a/mlir/test/Integration/Dialect/MemRef/cast-runtime-verification.mlir b/mlir/test/Integration/Dialect/MemRef/cast-runtime-verification.mlir
index b605c77deb6f0..aed8c76cf394d 100644
--- a/mlir/test/Integration/Dialect/MemRef/cast-runtime-verification.mlir
+++ b/mlir/test/Integration/Dialect/MemRef/cast-runtime-verification.mlir
@@ -25,11 +25,11 @@ func.func @cast_to_ranked(%m: memref<*xf32>) -> memref<f32> {
   return %0 : memref<f32>
 }
 
-func.func @cast_to_static_strides(%m: memref<?xf32, strided<[?], offset: ?>>)
-    -> memref<?xf32, strided<[9], offset: 5>> {
-  %0 = memref.cast %m : memref<?xf32, strided<[?], offset: ?>>
-                     to memref<?xf32, strided<[9], offset: 5>>
-  return %0 : memref<?xf32, strided<[9], offset: 5>>
+func.func @cast_to_static_strides(%m: memref<?xf32, strided<[?]>>)
+    -> memref<?xf32, strided<[9]>> {
+  %0 = memref.cast %m : memref<?xf32, strided<[?]>>
+                     to memref<?xf32, strided<[9]>>
+  return %0 : memref<?xf32, strided<[9]>>
 }
 
 func.func @valid_cast(%m: memref<*xf32>) -> memref<?xf32> {
@@ -57,19 +57,19 @@ func.func @main() {
   func.call @cast_to_ranked(%3) : (memref<*xf32>) -> (memref<f32>)
 
   // CHECK-NEXT: ERROR: Runtime op verification failed
-  // CHECK-NEXT: memref.cast %{{.*}} : memref<?xf32, strided<[?], offset: ?>>
+  // CHECK-NEXT: memref.cast %{{.*}} : memref<?xf32, strided<[?]>>
   // CHECK-NEXT: ^ offset mismatch
   // CHECK-NEXT: Location: loc({{.*}})
 
   // CHECK-NEXT: ERROR: Runtime op verification failed
-  // CHECK-NEXT: memref.cast %{{.*}} : memref<?xf32, strided<[?], offset: ?>>
+  // CHECK-NEXT: memref.cast %{{.*}} : memref<?xf32, strided<[?]>>
   // CHECK-NEXT: ^ stride mismatch of dim 0
   // CHECK-NEXT: Location: loc({{.*}})
   %4 = memref.cast %alloc
-      : memref<5xf32> to memref<?xf32, strided<[?], offset: ?>>
+      : memref<5xf32> to memref<?xf32, strided<[?]>>
   func.call @cast_to_static_strides(%4)
-      : (memref<?xf32, strided<[?], offset: ?>>)
-     -> (memref<?xf32, strided<[9], offset: 5>>)
+      : (memref<?xf32, strided<[?]>>)
+     -> (memref<?xf32, strided<[9]>>)
 
   // A last cast that actually succeeds.
   // CHECK-NOT: ERROR: Runtime op verification failed
diff --git a/mlir/test/Integration/Dialect/MemRef/subview-runtime-verification.mlir b/mlir/test/Integration/Dialect/MemRef/subview-runtime-verification.mlir
index 09cfee16ccd00..6c53aed77b6d5 100644
--- a/mlir/test/Integration/Dialect/MemRef/subview-runtime-verification.mlir
+++ b/mlir/test/Integration/Dialect/MemRef/subview-runtime-verification.mlir
@@ -22,42 +22,42 @@
 func.func @subview(%memref: memref<1xf32>, %offset: index) {
     memref.subview %memref[%offset] [1] [1] : 
         memref<1xf32> to 
-        memref<1xf32, strided<[1], offset: ?>>
+        memref<1xf32, strided<[1]>>
     return
 }
 
 func.func @subview_dynamic(%memref: memref<?x4xf32>, %offset: index, %size: index, %stride: index) {
     memref.subview %memref[%offset, 0] [%size, 4] [%stride, 1] : 
         memref<?x4xf32> to 
-        memref<?x4xf32, strided<[?, 1], offset: ?>>
+        memref<?x4xf32, strided<[?, 1]>>
     return
 }
 
 func.func @subview_dynamic_rank_reduce(%memref: memref<?x4xf32>, %offset: index, %size: index, %stride: index) {
     memref.subview %memref[%offset, 0] [%size, 1] [%stride, 1] :
         memref<?x4xf32> to
-        memref<?xf32, strided<[?], offset: ?>>
+        memref<?xf32, strided<[?]>>
     return
 }
 
-func.func @subview_zero_size_dim(%memref: memref<10x4x1xf32, strided<[?, ?, ?], offset: ?>>, 
+func.func @subview_zero_size_dim(%memref: memref<10x4x1xf32, strided<[?, ?, ?]>>, 
                                  %dim_0: index, 
                                  %dim_1: index, 
                                  %dim_2: index) {
     %subview = memref.subview %memref[0, 0, 0] [%dim_0, %dim_1, %dim_2] [1, 1, 1] :
-        memref<10x4x1xf32, strided<[?, ?, ?], offset: ?>> to
-        memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
+        memref<10x4x1xf32, strided<[?, ?, ?]>> to
+        memref<?x?x?xf32, strided<[?, ?, ?]>>
     return
 }
 
-func.func @subview_with_empty_slice(%memref: memref<10x4x1xf32, strided<[?, ?, ?], offset: ?>>, 
+func.func @subview_with_empty_slice(%memref: memref<10x4x1xf32, strided<[?, ?, ?]>>, 
                                  %dim_0: index, 
                                  %dim_1: index, 
                                  %dim_2: index,
                                  %offset: index) {
     %subview = memref.subview %memref[%offset, 0, 0] [%dim_0, %dim_1, %dim_2] [1, 1, 1] :
-        memref<10x4x1xf32, strided<[?, ?, ?], offset: ?>> to
-        memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
+        memref<10x4x1xf32, strided<[?, ?, ?]>> to
+        memref<?x?x?xf32, strided<[?, ?, ?]>>
     return
 }
 
@@ -75,47 +75,47 @@ func.func @main() {
 
   // Offset is out-of-bounds and slice runs out-of-bounds
   //      CHECK: ERROR: Runtime op verification failed
-  // CHECK-NEXT: memref.subview %{{.*}}[%{{.*}}, 0] [%{{.*}}, 1] [%{{.*}}, 1] : memref<?x4xf32> to memref<?xf32, strided<[?], offset: ?>>
+  // CHECK-NEXT: memref.subview %{{.*}}[%{{.*}}, 0] [%{{.*}}, 1] [%{{.*}}, 1] : memref<?x4xf32> to memref<?xf32, strided<[?]>>
   // CHECK-NEXT: ^ offset 0 is out-of-bounds
   // CHECK-NEXT: Location: loc({{.*}})
   //      CHECK: ERROR: Runtime op verification failed
-  // CHECK-NEXT: memref.subview %{{.*}}[%{{.*}}, 0] [%{{.*}}, 1] [%{{.*}}, 1] : memref<?x4xf32> to memref<?xf32, strided<[?], offset: ?>>
+  // CHECK-NEXT: memref.subview %{{.*}}[%{{.*}}, 0] [%{{.*}}, 1] [%{{.*}}, 1] : memref<?x4xf32> to memref<?xf32, strided<[?]>>
   // CHECK-NEXT: ^ subview runs out-of-bounds along dimension 0
   // CHECK-NEXT: Location: loc({{.*}})
   func.call @subview_dynamic_rank_reduce(%alloca_4_dyn, %5, %5, %1) : (memref<?x4xf32>, index, index, index) -> ()
 
   // Offset is out-of-bounds and slice runs out-of-bounds
   //      CHECK: ERROR: Runtime op verification failed
-  // CHECK-NEXT: memref.subview %{{.*}}[%{{.*}}] [1] [1] : memref<1xf32> to memref<1xf32, strided<[1], offset: ?>>
+  // CHECK-NEXT: memref.subview %{{.*}}[%{{.*}}] [1] [1] : memref<1xf32> to memref<1xf32, strided<[1]>>
   // CHECK-NEXT: ^ offset 0 is out-of-bounds
   // CHECK-NEXT: Location: loc({{.*}})
   //      CHECK: ERROR: Runtime op verification failed
-  // CHECK-NEXT: memref.subview %{{.*}}[%{{.*}}] [1] [1] : memref<1xf32> to memref<1xf32, strided<[1], offset: ?>>
+  // CHECK-NEXT: memref.subview %{{.*}}[%{{.*}}] [1] [1] : memref<1xf32> to memref<1xf32, strided<[1]>>
   // CHECK-NEXT: ^ subview runs out-of-bounds along dimension 0
   // CHECK-NEXT: Location: loc({{.*}})
   func.call @subview(%alloca, %1) : (memref<1xf32>, index) -> ()
 
   // Offset is out-of-bounds and slice runs out-of-bounds
   //      CHECK: ERROR: Runtime op verification failed
-  // CHECK-NEXT: memref.subview %{{.*}}[%{{.*}}] [1] [1] : memref<1xf32> to memref<1xf32, strided<[1], offset: ?>>
+  // CHECK-NEXT: memref.subview %{{.*}}[%{{.*}}] [1] [1] : memref<1xf32> to memref<1xf32, strided<[1]>>
   // CHECK-NEXT: ^ offset 0 is out-of-bounds
   // CHECK-NEXT: Location: loc({{.*}})
   //      CHECK: ERROR: Runtime op verification failed
-  // CHECK-NEXT: memref.subview %{{.*}}[%{{.*}}] [1] [1] : memref<1xf32> to memref<1xf32, strided<[1], offset: ?>>
+  // CHECK-NEXT: memref.subview %{{.*}}[%{{.*}}] [1] [1] : memref<1xf32> to memref<1xf32, strided<[1]>>
   // CHECK-NEXT: ^ subview runs out-of-bounds along dimension 0
   // CHECK-NEXT: Location: loc({{.*}})
   func.call @subview(%alloca, %n1) : (memref<1xf32>, index) -> ()
 
   // Slice runs out-of-bounds due to size
   //      CHECK: ERROR: Runtime op verification failed
-  // CHECK-NEXT: memref.subview %{{.*}}[%{{.*}}, 0] [%{{.*}}, 4] [%{{.*}}, 1] : memref<?x4xf32> to memref<?x4xf32, strided<[?, 1], offset: ?>>
+  // CHECK-NEXT: memref.subview %{{.*}}[%{{.*}}, 0] [%{{.*}}, 4] [%{{.*}}, 1] : memref<?x4xf32> to memref<?x4xf32, strided<[?, 1]>>
   // CHECK-NEXT: ^ subview runs out-of-bounds along dimension 0
   // CHECK-NEXT: Location: loc({{.*}})
   func.call @subview_dynamic(%alloca_4_dyn, %0, %5, %1) : (memref<?x4xf32>, index, index, index) -> ()
 
   // Slice runs out-of-bounds due to stride
   //      CHECK: ERROR: Runtime op verification failed
-  // CHECK-NEXT: memref.subview %{{.*}}[%{{.*}}, 0] [%{{.*}}, 4] [%{{.*}}, 1] : memref<?x4xf32> to memref<?x4xf32, strided<[?, 1], offset: ?>>
+  // CHECK-NEXT: memref.subview %{{.*}}[%{{.*}}, 0] [%{{.*}}, 4] [%{{.*}}, 1] : memref<?x4xf32> to memref<?x4xf32, strided<[?, 1]>>
   // CHECK-NEXT: ^ subview runs out-of-bounds along dimension 0
   // CHECK-NEXT: Location: loc({{.*}})
   func.call @subview_dynamic(%alloca_4_dyn, %0, %4, %4) : (memref<?x4xf32>, index, index, index) -> ()
@@ -130,17 +130,17 @@ func.func @main() {
   func.call @subview_dynamic_rank_reduce(%alloca_4_dyn, %0, %1, %0) : (memref<?x4xf32>, index, index, index) -> ()
 
   %alloca_10x4x1 = memref.alloca() : memref<10x4x1xf32>
-  %alloca_10x4x1_dyn_stride = memref.cast %alloca_10x4x1 : memref<10x4x1xf32> to memref<10x4x1xf32, strided<[?, ?, ?], offset: ?>>
+  %alloca_10x4x1_dyn_stride = memref.cast %alloca_10x4x1 : memref<10x4x1xf32> to memref<10x4x1xf32, strided<[?, ?, ?]>>
   // CHECK-NOT: ERROR: Runtime op verification failed
   %dim_0 = arith.constant 0 : index
   %dim_1 = arith.constant 4 : index
   %dim_2 = arith.constant 1 : index
   func.call @subview_zero_size_dim(%alloca_10x4x1_dyn_stride, %dim_0, %dim_1, %dim_2)
-                                        : (memref<10x4x1xf32, strided<[?, ?, ?], offset: ?>>, index, index, index) -> ()
+                                        : (memref<10x4x1xf32, strided<[?, ?, ?]>>, index, index, index) -> ()
 
   // CHECK-NOT: ERROR: Runtime op verification failed
   %offset = arith.constant 10 : index
   func.call @subview_with_empty_slice(%alloca_10x4x1_dyn_stride, %dim_0, %dim_1, %dim_2, %offset)
-                                        : (memref<10x4x1xf32, strided<[?, ?, ?], offset: ?>>, index, index, index, index) -> ()
+                                        : (memref<10x4x1xf32, strided<[?, ?, ?]>>, index, index, index, index) -> ()
   return
 }
diff --git a/mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_rewrite_sort_coo.mlir b/mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_rewrite_sort_coo.mlir
index c45b169f82779..e8cb4727c1ee1 100644
--- a/mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_rewrite_sort_coo.mlir
+++ b/mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_rewrite_sort_coo.mlir
@@ -49,18 +49,18 @@ module {
   }
 
   // Stores 5 values to the memref buffer.
-  func.func @storeValuesToStrided(%b: memref<?xi32, strided<[4], offset: ?>>, %v0: i32, %v1: i32, %v2: i32,
+  func.func @storeValuesToStrided(%b: memref<?xi32, strided<[4]>>, %v0: i32, %v1: i32, %v2: i32,
     %v3: i32, %v4: i32) -> () {
     %i0 = arith.constant 0 : index
     %i1 = arith.constant 1 : index
     %i2 = arith.constant 2 : index
     %i3 = arith.constant 3 : index
     %i4 = arith.constant 4 : index
-    memref.store %v0, %b[%i0] : memref<?xi32, strided<[4], offset: ?>>
-    memref.store %v1, %b[%i1] : memref<?xi32, strided<[4], offset: ?>>
-    memref.store %v2, %b[%i2] : memref<?xi32, strided<[4], offset: ?>>
-    memref.store %v3, %b[%i3] : memref<?xi32, strided<[4], offset: ?>>
-    memref.store %v4, %b[%i4] : memref<?xi32, strided<[4], offset: ?>>
+    memref.store %v0, %b[%i0] : memref<?xi32, strided<[4]>>
+    memref.store %v1, %b[%i1] : memref<?xi32, strided<[4]>>
+    memref.store %v2, %b[%i2] : memref<?xi32, strided<[4]>>
+    memref.store %v3, %b[%i3] : memref<?xi32, strided<[4]>>
+    memref.store %v4, %b[%i4] : memref<?xi32, strided<[4]>>
     return
   }
 
@@ -89,10 +89,10 @@ module {
     // Prepare a buffer for x0, x1, x2, y0 and a buffer for y1.
     %xys = memref.alloc() : memref<20xi32>
     %xy = memref.cast %xys : memref<20xi32> to memref<?xi32>
-    %x0 = memref.subview %xy[%i0][%i5][4] : memref<?xi32> to memref<?xi32, strided<[4], offset: ?>>
-    %x1 = memref.subview %xy[%i1][%i5][4] : memref<?xi32> to memref<?xi32, strided<[4], offset: ?>>
-    %x2 = memref.subview %xy[%i2][%i5][4] : memref<?xi32> to memref<?xi32, strided<[4], offset: ?>>
-    %y0 = memref.subview %xy[%i3][%i5][4] : memref<?xi32> to memref<?xi32, strided<[4], offset: ?>>
+    %x0 = memref.subview %xy[%i0][%i5][4] : memref<?xi32> to memref<?xi32, strided<[4]>>
+    %x1 = memref.subview %xy[%i1][%i5][4] : memref<?xi32> to memref<?xi32, strided<[4]>>
+    %x2 = memref.subview %xy[%i2][%i5][4] : memref<?xi32> to memref<?xi32, strided<[4]>>
+    %y0 = memref.subview %xy[%i3][%i5][4] : memref<?xi32> to memref<?xi32, strided<[4]>>
     %y1s = memref.alloc() : memref<7xi32>
     %y1 = memref.cast %y1s : memref<7xi32> to memref<?xi32>
 
@@ -103,25 +103,25 @@ module {
     // CHECK: ( 7, 8, 10, 9, 6 )
     // CHECK: ( 7, 4, 7, 9, 5 )
     call @storeValuesToStrided(%x0, %c1, %c1, %c3, %c10, %c3)
-      : (memref<?xi32, strided<[4], offset: ?>>, i32, i32, i32, i32, i32) -> ()
+      : (memref<?xi32, strided<[4]>>, i32, i32, i32, i32, i32) -> ()
     call @storeValuesToStrided(%x1, %c10, %c2, %c1, %c5, %c1)
-      : (memref<?xi32, strided<[4], offset: ?>>, i32, i32, i32, i32, i32) -> ()
+      : (memref<?xi32, strided<[4]>>, i32, i32, i32, i32, i32) -> ()
     call @storeValuesToStrided(%x2, %c2, %c4, %c9, %c7, %c9)
-      : (memref<?xi32, strided<[4], offset: ?>>, i32, i32, i32, i32, i32) -> ()
+      : (memref<?xi32, strided<[4]>>, i32, i32, i32, i32, i32) -> ()
     call @storeValuesToStrided(%y0, %c6, %c10, %c8, %c9, %c7)
-      : (memref<?xi32, strided<[4], offset: ?>>, i32, i32, i32, i32, i32) -> ()
+      : (memref<?xi32, strided<[4]>>, i32, i32, i32, i32, i32) -> ()
     call @storeValuesTo(%y1, %c5, %c7, %c4, %c9, %c7)
       : (memref<?xi32>, i32, i32, i32, i32, i32) -> ()
     sparse_tensor.sort quick_sort %i5, %xy jointly %y1 {perm_map = #ID_MAP, ny = 1 : index}
       : memref<?xi32> jointly memref<?xi32>
     // Dumps memory in the same order as the perm_map such that the output is ordered.
-    %x1v = vector.transfer_read %x1[%i0], %c100: memref<?xi32, strided<[4], offset: ?>>, vector<5xi32>
+    %x1v = vector.transfer_read %x1[%i0], %c100: memref<?xi32, strided<[4]>>, vector<5xi32>
     vector.print %x1v : vector<5xi32>
-    %x2v = vector.transfer_read %x2[%i0], %c100: memref<?xi32, strided<[4], offset: ?>>, vector<5xi32>
+    %x2v = vector.transfer_read %x2[%i0], %c100: memref<?xi32, strided<[4]>>, vector<5xi32>
     vector.print %x2v : vector<5xi32>
-    %x0v = vector.transfer_read %x0[%i0], %c100: memref<?xi32, strided<[4], offset: ?>>, vector<5xi32>
+    %x0v = vector.transfer_read %x0[%i0], %c100: memref<?xi32, strided<[4]>>, vector<5xi32>
     vector.print %x0v : vector<5xi32>
-    %y0v = vector.transfer_read %y0[%i0], %c100: memref<?xi32, strided<[4], offset: ?>>, vector<5xi32>
+    %y0v = vector.transfer_read %y0[%i0], %c100: memref<?xi32, strided<[4]>>, vector<5xi32>
     vector.print %y0v : vector<5xi32>
     %y1v = vector.transfer_read %y1[%i0], %c100: memref<?xi32>, vector<5xi32>
     vector.print %y1v : vector<5xi32>
@@ -132,24 +132,24 @@ module {
     // CHECK: ( 8, 7, 10, 9, 6 )
     // CHECK: ( 4, 7, 7, 9, 5 )
     call @storeValuesToStrided(%x0, %c1, %c1, %c3, %c10, %c3)
-      : (memref<?xi32, strided<[4], offset: ?>>, i32, i32, i32, i32, i32) -> ()
+      : (memref<?xi32, strided<[4]>>, i32, i32, i32, i32, i32) -> ()
     call @storeValuesToStrided(%x1, %c10, %c2, %c1, %c5, %c1)
-      : (memref<?xi32, strided<[4], offset: ?>>, i32, i32, i32, i32, i32) -> ()
+      : (memref<?xi32, strided<[4]>>, i32, i32, i32, i32, i32) -> ()
     call @storeValuesToStrided(%x2, %c2, %c4, %c9, %c7, %c9)
-      : (memref<?xi32, strided<[4], offset: ?>>, i32, i32, i32, i32, i32) -> ()
+      : (memref<?xi32, strided<[4]>>, i32, i32, i32, i32, i32) -> ()
     call @storeValuesToStrided(%y0, %c6, %c10, %c8, %c9, %c7)
-      : (memref<?xi32, strided<[4], offset: ?>>, i32, i32, i32, i32, i32) -> ()
+      : (memref<?xi32, strided<[4]>>, i32, i32, i32, i32, i32) -> ()
     call @storeValuesTo(%y1, %c5, %c7, %c4, %c9, %c7)
       : (memref<?xi32>, i32, i32, i32, i32, i32) -> ()
     sparse_tensor.sort insertion_sort_stable %i5, %xy jointly %y1 {perm_map = #ID_MAP, ny = 1 : index}
       : memref<?xi32> jointly memref<?xi32>
-    %x1v2 = vector.transfer_read %x1[%i0], %c100: memref<?xi32, strided<[4], offset: ?>>, vector<5xi32>
+    %x1v2 = vector.transfer_read %x1[%i0], %c100: memref<?xi32, strided<[4]>>, vector<5xi32>
     vector.print %x1v2 : vector<5xi32>
-    %x2v2 = vector.transfer_read %x2[%i0], %c100: memref<?xi32, strided<[4], offset: ?>>, vector<5xi32>
+    %x2v2 = vector.transfer_read %x2[%i0], %c100: memref<?xi32, strided<[4]>>, vector<5xi32>
     vector.print %x2v2 : vector<5xi32>
-    %x0v2 = vector.transfer_read %x0[%i0], %c100: memref<?xi32, strided<[4], offset: ?>>, vector<5xi32>
+    %x0v2 = vector.transfer_read %x0[%i0], %c100: memref<?xi32, strided<[4]>>, vector<5xi32>
     vector.print %x0v2 : vector<5xi32>
-    %y0v2 = vector.transfer_read %y0[%i0], %c100: memref<?xi32, strided<[4], offset: ?>>, vector<5xi32>
+    %y0v2 = vector.transfer_read %y0[%i0], %c100: memref<?xi32, strided<[4]>>, vector<5xi32>
     vector.print %y0v2 : vector<5xi32>
     %y1v2 = vector.transfer_read %y1[%i0], %c100: memref<?xi32>, vector<5xi32>
     vector.print %y1v2 : vector<5xi32>
@@ -160,24 +160,24 @@ module {
     // CHECK: ( 7, 8, 10, 9, 6 )
     // CHECK: ( 7, 4, 7, 9, 5 )
     call @storeValuesToStrided(%x0, %c1, %c1, %c3, %c10, %c3)
-      : (memref<?xi32, strided<[4], offset: ?>>, i32, i32, i32, i32, i32) -> ()
+      : (memref<?xi32, strided<[4]>>, i32, i32, i32, i32, i32) -> ()
     call @storeValuesToStrided(%x1, %c10, %c2, %c1, %c5, %c1)
-      : (memref<?xi32, strided<[4], offset: ?>>, i32, i32, i32, i32, i32) -> ()
+      : (memref<?xi32, strided<[4]>>, i32, i32, i32, i32, i32) -> ()
     call @storeValuesToStrided(%x2, %c2, %c4, %c9, %c7, %c9)
-      : (memref<?xi32, strided<[4], offset: ?>>, i32, i32, i32, i32, i32) -> ()
+      : (memref<?xi32, strided<[4]>>, i32, i32, i32, i32, i32) -> ()
     call @storeValuesToStrided(%y0, %c6, %c10, %c8, %c9, %c7)
-      : (memref<?xi32, strided<[4], offset: ?>>, i32, i32, i32, i32, i32) -> ()
+      : (memref<?xi32, strided<[4]>>, i32, i32, i32, i32, i32) -> ()
     call @storeValuesTo(%y1, %c5, %c7, %c4, %c9, %c7)
       : (memref<?xi32>, i32, i32, i32, i32, i32) -> ()
     sparse_tensor.sort heap_sort %i5, %xy jointly %y1 {perm_map = #ID_MAP, ny = 1 : index}
       : memref<?xi32> jointly memref<?xi32>
-    %x1v3 = vector.transfer_read %x1[%i0], %c100: memref<?xi32, strided<[4], offset: ?>>, vector<5xi32>
+    %x1v3 = vector.transfer_read %x1[%i0], %c100: memref<?xi32, strided<[4]>>, vector<5xi32>
     vector.print %x1v3 : vector<5xi32>
-    %x2v3 = vector.transfer_read %x2[%i0], %c100: memref<?xi32, strided<[4], offset: ?>>, vector<5xi32>
+    %x2v3 = vector.transfer_read %x2[%i0], %c100: memref<?xi32, strided<[4]>>, vector<5xi32>
     vector.print %x2v3 : vector<5xi32>
-    %x0v3 = vector.transfer_read %x0[%i0], %c100: memref<?xi32, strided<[4], offset: ?>>, vector<5xi32>
+    %x0v3 = vector.transfer_read %x0[%i0], %c100: memref<?xi32, strided<[4]>>, vector<5xi32>
     vector.print %x0v3 : vector<5xi32>
-    %y0v3 = vector.transfer_read %y0[%i0], %c100: memref<?xi32, strided<[4], offset: ?>>, vector<5xi32>
+    %y0v3 = vector.transfer_read %y0[%i0], %c100: memref<?xi32, strided<[4]>>, vector<5xi32>
     vector.print %y0v3 : vector<5xi32>
     %y1v3 = vector.transfer_read %y1[%i0], %c100: memref<?xi32>, vector<5xi32>
     vector.print %y1v3 : vector<5xi32>
diff --git a/mlir/test/Integration/Dialect/Standard/CPU/test_subview.mlir b/mlir/test/Integration/Dialect/Standard/CPU/test_subview.mlir
index a37a929182fc5..499d07e98e483 100644
--- a/mlir/test/Integration/Dialect/Standard/CPU/test_subview.mlir
+++ b/mlir/test/Integration/Dialect/Standard/CPU/test_subview.mlir
@@ -13,8 +13,8 @@ func.func @main() {
   %0 = memref.get_global @__constant_5x3xf32 : memref<5x3xf32>
 
   /// Subview with only leading operands.
-  %1 = memref.subview %0[2, 0][3, 3][1, 1]: memref<5x3xf32> to memref<3x3xf32, strided<[3, 1], offset: 6>>
-  %unranked = memref.cast %1 : memref<3x3xf32, strided<[3, 1], offset: 6>> to memref<*xf32>
+  %1 = memref.subview %0[2, 0][3, 3][1, 1]: memref<5x3xf32> to memref<3x3xf32, strided<[3, 1]>>
+  %unranked = memref.cast %1 : memref<3x3xf32, strided<[3, 1]>> to memref<*xf32>
   call @printMemrefF32(%unranked) : (memref<*xf32>) -> ()
 
   //      CHECK: Unranked Memref base@ = {{0x[-9a-f]*}}
@@ -26,8 +26,8 @@ func.func @main() {
   // CHECK-SAME: ]
 
   /// Regular subview.
-  %2 = memref.subview %0[0, 2][5, 1][1, 1]: memref<5x3xf32> to memref<5x1xf32, strided<[3, 1], offset: 2>>
-  %unranked2 = memref.cast %2 : memref<5x1xf32, strided<[3, 1], offset: 2>> to memref<*xf32>
+  %2 = memref.subview %0[0, 2][5, 1][1, 1]: memref<5x3xf32> to memref<5x1xf32, strided<[3, 1]>>
+  %unranked2 = memref.cast %2 : memref<5x1xf32, strided<[3, 1]>> to memref<*xf32>
   call @printMemrefF32(%unranked2) : (memref<*xf32>) -> ()
 
   //      CHECK: Unranked Memref base@ = {{0x[-9a-f]*}}
@@ -41,8 +41,8 @@ func.func @main() {
   // CHECK-SAME: ]
 
   /// Rank-reducing subview.
-  %3 = memref.subview %0[0, 2][5, 1][1, 1]: memref<5x3xf32> to memref<5xf32, strided<[3], offset: 2>>
-  %unranked3 = memref.cast %3 : memref<5xf32, strided<[3], offset: 2>> to memref<*xf32>
+  %3 = memref.subview %0[0, 2][5, 1][1, 1]: memref<5x3xf32> to memref<5xf32, strided<[3]>>
+  %unranked3 = memref.cast %3 : memref<5xf32, strided<[3]>> to memref<*xf32>
   call @printMemrefF32(%unranked3) : (memref<*xf32>) -> ()
 
   //      CHECK: Unranked Memref base@ = {{0x[-9a-f]*}}
@@ -50,8 +50,8 @@ func.func @main() {
   // CHECK-NEXT: [2,  5,  8,  11,  14]
 
   /// Rank-reducing subview with only leading operands.
-  %4 = memref.subview %0[1, 0][1, 3][1, 1]: memref<5x3xf32> to memref<3xf32, strided<[1], offset: 3>>
-  %unranked4 = memref.cast %4 : memref<3xf32, strided<[1], offset: 3>> to memref<*xf32>
+  %4 = memref.subview %0[1, 0][1, 3][1, 1]: memref<5x3xf32> to memref<3xf32, strided<[1]>>
+  %unranked4 = memref.cast %4 : memref<3xf32, strided<[1]>> to memref<*xf32>
   call @printMemrefF32(%unranked4) : (memref<*xf32>) -> ()
   //      CHECK: Unranked Memref base@ = {{0x[-9a-f]*}}
   // CHECK-SAME: rank = 1 offset = 3 sizes = [3] strides = [1] data =
diff --git a/mlir/test/Integration/Dialect/Vector/CPU/transfer-read-1d.mlir b/mlir/test/Integration/Dialect/Vector/CPU/transfer-read-1d.mlir
index 895b8818de767..2693c9fcbaec4 100644
--- a/mlir/test/Integration/Dialect/Vector/CPU/transfer-read-1d.mlir
+++ b/mlir/test/Integration/Dialect/Vector/CPU/transfer-read-1d.mlir
@@ -40,9 +40,9 @@ func.func @transfer_read_1d_unit_stride(%A : memref<?x?xf32>) {
   scf.for %arg2 = %c1 to %c5 step %c2 {
     scf.for %arg3 = %c0 to %c6 step %c3 {
       %0 = memref.subview %A[%arg2, %arg3] [1, 2] [1, 1]
-          : memref<?x?xf32> to memref<1x2xf32, strided<[?, 1], offset: ?>>
+          : memref<?x?xf32> to memref<1x2xf32, strided<[?, 1]>>
       %1 = vector.transfer_read %0[%c0, %c0], %fm42 {in_bounds=[true]}
-          : memref<1x2xf32, strided<[?, 1], offset: ?>>, vector<2xf32>
+          : memref<1x2xf32, strided<[?, 1]>>, vector<2xf32>
       vector.print %1 : vector<2xf32>
     }
   }
@@ -58,9 +58,9 @@ func.func @transfer_read_1d_non_static_unit_stride(%A : memref<?x?xf32>) {
   %c6 = arith.constant 6 : index
   %fm42 = arith.constant -42.0: f32
   %1 = memref.reinterpret_cast %A to offset: [%c6], sizes: [%c4, %c6],  strides: [%c6, %c1]
-      : memref<?x?xf32> to memref<?x?xf32, strided<[?, ?], offset: ?>>
+      : memref<?x?xf32> to memref<?x?xf32, strided<[?, ?]>>
   %2 = vector.transfer_read %1[%c2, %c1], %fm42 {in_bounds=[true]}
-      : memref<?x?xf32, strided<[?, ?], offset: ?>>, vector<4xf32>
+      : memref<?x?xf32, strided<[?, ?]>>, vector<4xf32>
   vector.print %2 : vector<4xf32>
   return
 }
diff --git a/mlir/test/Integration/Dialect/XeGPU/LANE/load_store_subview.mlir b/mlir/test/Integration/Dialect/XeGPU/LANE/load_store_subview.mlir
index c4608acb7b7b5..333a9c2b0fa89 100644
--- a/mlir/test/Integration/Dialect/XeGPU/LANE/load_store_subview.mlir
+++ b/mlir/test/Integration/Dialect/XeGPU/LANE/load_store_subview.mlir
@@ -8,12 +8,12 @@
 module @subview attributes {gpu.container_module} {
   gpu.module @kernel {
     gpu.func @subview(%src: memref<256xf32>, %dst: memref<256xf32>) kernel {
-      %src_subview = memref.subview %src[5] [251] [1] : memref<256xf32> to memref<251xf32, strided<[1], offset: 5>>
-      %dst_subview = memref.subview %dst[10] [246] [1] : memref<256xf32> to memref<246xf32, strided<[1], offset: 10>>
+      %src_subview = memref.subview %src[5] [251] [1] : memref<256xf32> to memref<251xf32, strided<[1]>>
+      %dst_subview = memref.subview %dst[10] [246] [1] : memref<256xf32> to memref<246xf32, strided<[1]>>
       %lane_id = gpu.lane_id
       %mask = arith.constant 1 : i1
-      %loaded = xegpu.load %src_subview[%lane_id], %mask : memref<251xf32, strided<[1], offset: 5>>, index, i1 -> f32
-      xegpu.store %loaded, %dst_subview[%lane_id], %mask : f32, memref<246xf32, strided<[1], offset: 10>>, index, i1
+      %loaded = xegpu.load %src_subview[%lane_id], %mask : memref<251xf32, strided<[1]>>, index, i1 -> f32
+      xegpu.store %loaded, %dst_subview[%lane_id], %mask : f32, memref<246xf32, strided<[1]>>, index, i1
       gpu.return
     }
   }
diff --git a/mlir/test/Integration/GPU/CUDA/sm90/gemm_f32_f16_f16_128x128x128.mlir b/mlir/test/Integration/GPU/CUDA/sm90/gemm_f32_f16_f16_128x128x128.mlir
index 22474cbcd39f3..596eee00a16eb 100644
--- a/mlir/test/Integration/GPU/CUDA/sm90/gemm_f32_f16_f16_128x128x128.mlir
+++ b/mlir/test/Integration/GPU/CUDA/sm90/gemm_f32_f16_f16_128x128x128.mlir
@@ -200,11 +200,11 @@ func.func @main() {
       // TMA wait
       %phase_c0 = arith.constant 0 : i1
       nvgpu.mbarrier.try_wait.parity %barrier[%i], %phase_c0, %ticks : !barrierType
-      %lhsSlice = memref.subview %lhsShmem [%i, 0, 0][1, 128, 64][1, 1, 1] : memref<2x128x64xf16, #gpu.address_space<workgroup>> to memref<128x64xf16, strided<[64, 1], offset: ?>, #gpu.address_space<workgroup>>
-      %rhsSlice = memref.subview %rhsShmem [%i, 0, 0][1, 64, 128][1, 1, 1] : memref<2x64x128xf16, #gpu.address_space<workgroup>> to memref<64x128xf16, strided<[128, 1], offset: ?>, #gpu.address_space<workgroup>>
+      %lhsSlice = memref.subview %lhsShmem [%i, 0, 0][1, 128, 64][1, 1, 1] : memref<2x128x64xf16, #gpu.address_space<workgroup>> to memref<128x64xf16, strided<[64, 1]>, #gpu.address_space<workgroup>>
+      %rhsSlice = memref.subview %rhsShmem [%i, 0, 0][1, 64, 128][1, 1, 1] : memref<2x64x128xf16, #gpu.address_space<workgroup>> to memref<64x128xf16, strided<[128, 1]>, #gpu.address_space<workgroup>>
       // Descriptor WGMMA
-      %dA = nvgpu.warpgroup.generate.descriptor %lhsSlice, %descA : memref<128x64xf16, strided<[64, 1], offset: ?>, #gpu.address_space<workgroup>>, !lhsTensorMap -> !nvgpu.warpgroup.descriptor<tensor=memref<128x64xf16, 3>>
-      %dB = nvgpu.warpgroup.generate.descriptor %rhsSlice, %descB : memref<64x128xf16, strided<[128, 1], offset: ?>, #gpu.address_space<workgroup>>, !rhsTensorMap -> !nvgpu.warpgroup.descriptor<tensor=memref<64x128xf16, 3>>
+      %dA = nvgpu.warpgroup.generate.descriptor %lhsSlice, %descA : memref<128x64xf16, strided<[64, 1]>, #gpu.address_space<workgroup>>, !lhsTensorMap -> !nvgpu.warpgroup.descriptor<tensor=memref<128x64xf16, 3>>
+      %dB = nvgpu.warpgroup.generate.descriptor %rhsSlice, %descB : memref<64x128xf16, strided<[128, 1]>, #gpu.address_space<workgroup>>, !rhsTensorMap -> !nvgpu.warpgroup.descriptor<tensor=memref<64x128xf16, 3>>
       // Perform WGMMA 128x128x64
       %md  = nvgpu.warpgroup.mma %dA, %dB, %mc {transposeB} : <tensor = memref<128x64xf16,3>>, <tensor = memref<64x128xf16,3>>, <fragmented = vector<128x128xf32>> -> <fragmented = vector<128x128xf32>>
       scf.yield %md : !nvgpu.warpgroup.accumulator<fragmented = vector<128x128xf32>>
diff --git a/mlir/test/Integration/GPU/CUDA/sm90/gemm_pred_f32_f16_f16_128x128x128.mlir b/mlir/test/Integration/GPU/CUDA/sm90/gemm_pred_f32_f16_f16_128x128x128.mlir
index 39bad38f36468..0bc9f54970d3b 100644
--- a/mlir/test/Integration/GPU/CUDA/sm90/gemm_pred_f32_f16_f16_128x128x128.mlir
+++ b/mlir/test/Integration/GPU/CUDA/sm90/gemm_pred_f32_f16_f16_128x128x128.mlir
@@ -208,11 +208,11 @@ func.func @main() {
       // TMA wait
       %phase_c0 = arith.constant 0 : i1
       nvgpu.mbarrier.try_wait.parity %barrier[%i], %phase_c0, %ticks : !barrierType
-      %lhsSlice = memref.subview %lhsShmem [%i, 0, 0][1, 128, 64][1, 1, 1] : memref<2x128x64xf16, #gpu.address_space<workgroup>> to memref<128x64xf16, strided<[64, 1], offset: ?>, #gpu.address_space<workgroup>>
-      %rhsSlice = memref.subview %rhsShmem [%i, 0, 0][1, 64, 128][1, 1, 1] : memref<2x64x128xf16, #gpu.address_space<workgroup>> to memref<64x128xf16, strided<[128, 1], offset: ?>, #gpu.address_space<workgroup>>
+      %lhsSlice = memref.subview %lhsShmem [%i, 0, 0][1, 128, 64][1, 1, 1] : memref<2x128x64xf16, #gpu.address_space<workgroup>> to memref<128x64xf16, strided<[64, 1]>, #gpu.address_space<workgroup>>
+      %rhsSlice = memref.subview %rhsShmem [%i, 0, 0][1, 64, 128][1, 1, 1] : memref<2x64x128xf16, #gpu.address_space<workgroup>> to memref<64x128xf16, strided<[128, 1]>, #gpu.address_space<workgroup>>
       // Descriptor WGMMA
-      %dA = nvgpu.warpgroup.generate.descriptor %lhsSlice, %descA : memref<128x64xf16, strided<[64, 1], offset: ?>, #gpu.address_space<workgroup>>, !lhsTensorMap -> !nvgpu.warpgroup.descriptor<tensor=memref<128x64xf16, 3>>
-      %dB = nvgpu.warpgroup.generate.descriptor %rhsSlice, %descB : memref<64x128xf16, strided<[128, 1], offset: ?>, #gpu.address_space<workgroup>>, !rhsTensorMap -> !nvgpu.warpgroup.descriptor<tensor=memref<64x128xf16, 3>>
+      %dA = nvgpu.warpgroup.generate.descriptor %lhsSlice, %descA : memref<128x64xf16, strided<[64, 1]>, #gpu.address_space<workgroup>>, !lhsTensorMap -> !nvgpu.warpgroup.descriptor<tensor=memref<128x64xf16, 3>>
+      %dB = nvgpu.warpgroup.generate.descriptor %rhsSlice, %descB : memref<64x128xf16, strided<[128, 1]>, #gpu.address_space<workgroup>>, !rhsTensorMap -> !nvgpu.warpgroup.descriptor<tensor=memref<64x128xf16, 3>>
       // Perform WGMMA 128x128x64
       %md  = nvgpu.warpgroup.mma %dA, %dB, %mc {transposeB} : <tensor = memref<128x64xf16,3>>, <tensor = memref<64x128xf16,3>>, <fragmented = vector<128x128xf32>> -> <fragmented = vector<128x128xf32>>
       scf.yield %md : !nvgpu.warpgroup.accumulator<fragmented = vector<128x128xf32>>
diff --git a/mlir/test/Integration/GPU/CUDA/sm90/python/tools/matmulBuilder.py b/mlir/test/Integration/GPU/CUDA/sm90/python/tools/matmulBuilder.py
index bf983d96e2ed8..eb54ce6fcc711 100644
--- a/mlir/test/Integration/GPU/CUDA/sm90/python/tools/matmulBuilder.py
+++ b/mlir/test/Integration/GPU/CUDA/sm90/python/tools/matmulBuilder.py
@@ -611,7 +611,7 @@ def generate_matmul_ws(
                 rty = ir.MemRefType.get(
                     (BLOCK_M, BLOCK_N),
                     c_elem_ty,
-                    ir.Attribute.parse("strided<[" + str(N) + ", 1], offset: ?>"),
+                    ir.Attribute.parse("strided<[" + str(N) + ", 1]>"),
                 )
                 c_device_per_block = memref.SubViewOp(
                     rty,
@@ -1113,7 +1113,7 @@ def generate_matmul_multistage(
                 rty = ir.MemRefType.get(
                     (BLOCK_M, BLOCK_N),
                     c_elem_ty,
-                    ir.Attribute.parse("strided<[" + str(N) + ", 1], offset: ?>"),
+                    ir.Attribute.parse("strided<[" + str(N) + ", 1]>"),
                 )
                 c_device_per_block = memref.SubViewOp(
                     rty,
diff --git a/mlir/test/Integration/GPU/CUDA/sm90/tma_load_128x128_stride_noswizzle.mlir b/mlir/test/Integration/GPU/CUDA/sm90/tma_load_128x128_stride_noswizzle.mlir
index f281c028ebcae..958f023b95db5 100644
--- a/mlir/test/Integration/GPU/CUDA/sm90/tma_load_128x128_stride_noswizzle.mlir
+++ b/mlir/test/Integration/GPU/CUDA/sm90/tma_load_128x128_stride_noswizzle.mlir
@@ -93,8 +93,8 @@ module {
         scf.for %arg15 = %c0 to %c2 step %c1 {
           %38 = arith.muli %arg14, %c64 : index
           %39 = arith.muli %arg15, %c64 : index
-          %subview = memref.subview %view[%arg14, %arg15, 0, 0] [1, 1, 64, 64] [1, 1, 1, 1] : memref<2x2x64x64xf16, #gpu.address_space<workgroup>> to memref<64x64xf16, strided<[64, 1], offset: ?>, #gpu.address_space<workgroup>>
-          %subview_0 = memref.subview %dstMemref[%38, %39] [64, 64] [1, 1] : memref<128x128xf16> to memref<64x64xf16, strided<[128, 1], offset: ?>>
+          %subview = memref.subview %view[%arg14, %arg15, 0, 0] [1, 1, 64, 64] [1, 1, 1, 1] : memref<2x2x64x64xf16, #gpu.address_space<workgroup>> to memref<64x64xf16, strided<[64, 1]>, #gpu.address_space<workgroup>>
+          %subview_0 = memref.subview %dstMemref[%38, %39] [64, 64] [1, 1] : memref<128x128xf16> to memref<64x64xf16, strided<[128, 1]>>
           %block_dim_x = gpu.block_dim x
           %thread_id_y = gpu.thread_id y
           %40 = arith.muli %thread_id_y, %block_dim_x : index
@@ -108,8 +108,8 @@ module {
           scf.if %45 {
             scf.for %arg16 = %c0 to %c64 step %c1 {
               scf.for %arg17 = %c0 to %c64 step %c1 {
-                %46 = memref.load %subview[%arg16, %arg17] : memref<64x64xf16, strided<[64, 1], offset: ?>, #gpu.address_space<workgroup>>
-                memref.store %46, %subview_0[%arg16, %arg17] : memref<64x64xf16, strided<[128, 1], offset: ?>>
+                %46 = memref.load %subview[%arg16, %arg17] : memref<64x64xf16, strided<[64, 1]>, #gpu.address_space<workgroup>>
+                memref.store %46, %subview_0[%arg16, %arg17] : memref<64x64xf16, strided<[128, 1]>>
               }
             }
           }
diff --git a/mlir/test/Transforms/canonicalize.mlir b/mlir/test/Transforms/canonicalize.mlir
index 8e02c06a0a293..35fe199610ae2 100644
--- a/mlir/test/Transforms/canonicalize.mlir
+++ b/mlir/test/Transforms/canonicalize.mlir
@@ -499,9 +499,9 @@ func.func @dim_op_fold(%arg0: index, %arg1: index, %arg2: index, %BUF: memref<?x
     affine.for %arg4 = 0 to %ub {
       %s = memref.dim %0, %c0 : memref<?x?xf32>
       %v = memref.view %3[%c0][%arg4, %s] : memref<?xi8> to memref<?x?xf32>
-      %sv = memref.subview %0[%c0, %c0][%s,%arg4][%c1,%c1] : memref<?x?xf32> to memref<?x?xf32, strided<[?, ?], offset: ?>>
+      %sv = memref.subview %0[%c0, %c0][%s,%arg4][%c1,%c1] : memref<?x?xf32> to memref<?x?xf32, strided<[?, ?]>>
       %l = memref.dim %v, %c1 : memref<?x?xf32>
-      %u = memref.dim %sv, %c0 : memref<?x?xf32, strided<[?, ?], offset: ?>>
+      %u = memref.dim %sv, %c0 : memref<?x?xf32, strided<[?, ?]>>
       affine.for %arg5 = %l to %u {
         "foo"() : () -> ()
       }
@@ -752,7 +752,7 @@ func.func @subview(%arg0 : index, %arg1 : index) -> (index, index) {
   %c15 = arith.constant 15 : index
 
   // CHECK: %[[ALLOC0:.*]] = memref.alloc()
-  %0 = memref.alloc() : memref<128x96x64xf32, strided<[6144, 64, 1], offset: 0>>
+  %0 = memref.alloc() : memref<128x96x64xf32, strided<[6144, 64, 1]>>
 
   // Test: subview with constant base memref and constant operands is folded.
   // Note that the subview uses the base memrefs layout map because it used
@@ -761,106 +761,106 @@ func.func @subview(%arg0 : index, %arg1 : index) -> (index, index) {
   // CHECK-SAME: memref<128x96x64xf32, strided<[6144, 64, 1]>>
   // CHECK-SAME: to memref<7x11x2xf32, strided<[6144, 64, 1]>>
   %1 = memref.subview %0[%c0, %c0, %c0] [%c7, %c11, %c2] [%c1, %c1, %c1]
-    : memref<128x96x64xf32, strided<[6144, 64, 1], offset: 0>> to
-      memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
-  %v0 = memref.load %1[%c0, %c0, %c0] : memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
+    : memref<128x96x64xf32, strided<[6144, 64, 1]>> to
+      memref<?x?x?xf32, strided<[?, ?, ?]>>
+  %v0 = memref.load %1[%c0, %c0, %c0] : memref<?x?x?xf32, strided<[?, ?, ?]>>
 
   // Test: subview with one dynamic operand can also be folded.
   // CHECK: memref.subview %[[ALLOC0]][0, %[[ARG0]], 0] [7, 11, 15] [1, 1, 1] :
   // CHECK-SAME: memref<128x96x64xf32, strided<[6144, 64, 1]>>
-  // CHECK-SAME: to memref<7x11x15xf32, strided<[6144, 64, 1], offset: ?>>
+  // CHECK-SAME: to memref<7x11x15xf32, strided<[6144, 64, 1]>>
   %2 = memref.subview %0[%c0, %arg0, %c0] [%c7, %c11, %c15] [%c1, %c1, %c1]
-    : memref<128x96x64xf32, strided<[6144, 64, 1], offset: 0>> to
-      memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
-  memref.store %v0, %2[%c0, %c0, %c0] : memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
+    : memref<128x96x64xf32, strided<[6144, 64, 1]>> to
+      memref<?x?x?xf32, strided<[?, ?, ?]>>
+  memref.store %v0, %2[%c0, %c0, %c0] : memref<?x?x?xf32, strided<[?, ?, ?]>>
 
   // CHECK: %[[ALLOC1:.*]] = memref.alloc(%[[ARG0]])
-  %3 = memref.alloc(%arg0) : memref<?x16x4xf32, strided<[64, 4, 1], offset: 0>>
+  %3 = memref.alloc(%arg0) : memref<?x16x4xf32, strided<[64, 4, 1]>>
   // Test: subview with constant operands but dynamic base memref is folded as long as the strides and offset of the base memref are static.
   // CHECK: memref.subview %[[ALLOC1]][0, 0, 0] [7, 11, 2] [1, 1, 1] :
   // CHECK-SAME: memref<?x16x4xf32, strided<[64, 4, 1]>>
   // CHECK-SAME: to memref<7x11x2xf32, strided<[64, 4, 1]>>
   %4 = memref.subview %3[%c0, %c0, %c0] [%c7, %c11, %c2] [%c1, %c1, %c1]
-    : memref<?x16x4xf32, strided<[64, 4, 1], offset: 0>> to
-      memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
-  memref.store %v0, %4[%c0, %c0, %c0] : memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
+    : memref<?x16x4xf32, strided<[64, 4, 1]>> to
+      memref<?x?x?xf32, strided<[?, ?, ?]>>
+  memref.store %v0, %4[%c0, %c0, %c0] : memref<?x?x?xf32, strided<[?, ?, ?]>>
 
   // Test: subview offset operands are folded correctly w.r.t. base strides.
   // CHECK: memref.subview %[[ALLOC0]][1, 2, 7] [7, 11, 2] [1, 1, 1] :
   // CHECK-SAME: memref<128x96x64xf32, strided<[6144, 64, 1]>> to
-  // CHECK-SAME: memref<7x11x2xf32, strided<[6144, 64, 1], offset: 6279>>
+  // CHECK-SAME: memref<7x11x2xf32, strided<[6144, 64, 1]>>
   %5 = memref.subview %0[%c1, %c2, %c7] [%c7, %c11, %c2] [%c1, %c1, %c1]
-    : memref<128x96x64xf32, strided<[6144, 64, 1], offset: 0>> to
-      memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
-  memref.store %v0, %5[%c0, %c0, %c0] : memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
+    : memref<128x96x64xf32, strided<[6144, 64, 1]>> to
+      memref<?x?x?xf32, strided<[?, ?, ?]>>
+  memref.store %v0, %5[%c0, %c0, %c0] : memref<?x?x?xf32, strided<[?, ?, ?]>>
 
   // Test: subview stride operands are folded correctly w.r.t. base strides.
   // CHECK: memref.subview %[[ALLOC0]][0, 0, 0] [7, 11, 2] [2, 7, 11] :
   // CHECK-SAME: memref<128x96x64xf32, strided<[6144, 64, 1]>>
   // CHECK-SAME: to memref<7x11x2xf32, strided<[12288, 448, 11]>>
   %6 = memref.subview %0[%c0, %c0, %c0] [%c7, %c11, %c2] [%c2, %c7, %c11]
-    : memref<128x96x64xf32, strided<[6144, 64, 1], offset: 0>> to
-      memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
-  memref.store %v0, %6[%c0, %c0, %c0] : memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
+    : memref<128x96x64xf32, strided<[6144, 64, 1]>> to
+      memref<?x?x?xf32, strided<[?, ?, ?]>>
+  memref.store %v0, %6[%c0, %c0, %c0] : memref<?x?x?xf32, strided<[?, ?, ?]>>
 
   // Test: subview shape are folded, but offsets and strides are not even if base memref is static
   // CHECK: memref.subview %[[ALLOC0]][%[[ARG0]], %[[ARG0]], %[[ARG0]]] [7, 11, 2] [%[[ARG1]], %[[ARG1]], %[[ARG1]]] :
   // CHECK-SAME: memref<128x96x64xf32, strided<[6144, 64, 1]>> to
-  // CHECK-SAME: memref<7x11x2xf32, strided<[?, ?, ?], offset: ?>>
+  // CHECK-SAME: memref<7x11x2xf32, strided<[?, ?, ?]>>
   %10 = memref.subview %0[%arg0, %arg0, %arg0] [%c7, %c11, %c2] [%arg1, %arg1, %arg1] :
-    memref<128x96x64xf32, strided<[6144, 64, 1], offset: 0>> to
-    memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
+    memref<128x96x64xf32, strided<[6144, 64, 1]>> to
+    memref<?x?x?xf32, strided<[?, ?, ?]>>
   memref.store %v0, %10[%arg1, %arg1, %arg1] :
-    memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
+    memref<?x?x?xf32, strided<[?, ?, ?]>>
 
   // Test: subview strides are folded, but offsets and shape are not even if base memref is static
   // CHECK: memref.subview %[[ALLOC0]][%[[ARG0]], %[[ARG0]], %[[ARG0]]] [%[[ARG1]], %[[ARG1]], %[[ARG1]]] [2, 7, 11] :
   // CHECK-SAME: memref<128x96x64xf32, strided<[6144, 64, 1]>> to
-  // CHECK-SAME: memref<?x?x?xf32, strided<[12288, 448, 11], offset: ?>>
+  // CHECK-SAME: memref<?x?x?xf32, strided<[12288, 448, 11]>>
   %11 = memref.subview %0[%arg0, %arg0, %arg0] [%arg1, %arg1, %arg1] [%c2, %c7, %c11] :
-    memref<128x96x64xf32, strided<[6144, 64, 1], offset: 0>> to
-    memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
+    memref<128x96x64xf32, strided<[6144, 64, 1]>> to
+    memref<?x?x?xf32, strided<[?, ?, ?]>>
   memref.store %v0, %11[%arg0, %arg0, %arg0] :
-    memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
+    memref<?x?x?xf32, strided<[?, ?, ?]>>
 
   // Test: subview offsets are folded, but strides and shape are not even if base memref is static
   // CHECK: memref.subview %[[ALLOC0]][1, 2, 7] [%[[ARG1]], %[[ARG1]], %[[ARG1]]] [%[[ARG0]], %[[ARG0]], %[[ARG0]]] :
   // CHECK-SAME: memref<128x96x64xf32, strided<[6144, 64, 1]>> to
-  // CHECK-SAME: memref<?x?x?xf32, strided<[?, ?, ?], offset: 6279>>
+  // CHECK-SAME: memref<?x?x?xf32, strided<[?, ?, ?]>>
   %13 = memref.subview %0[%c1, %c2, %c7] [%arg1, %arg1, %arg1] [%arg0, %arg0, %arg0] :
-    memref<128x96x64xf32, strided<[6144, 64, 1], offset: 0>> to
-    memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
+    memref<128x96x64xf32, strided<[6144, 64, 1]>> to
+    memref<?x?x?xf32, strided<[?, ?, ?]>>
   memref.store %v0, %13[%arg1, %arg1, %arg1] :
-    memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
+    memref<?x?x?xf32, strided<[?, ?, ?]>>
 
   // CHECK: %[[ALLOC2:.*]] = memref.alloc(%[[ARG0]], %[[ARG0]], %[[ARG1]])
   %14 = memref.alloc(%arg0, %arg0, %arg1) : memref<?x?x?xf32>
   // Test: subview shape are folded, even if base memref is not static
   // CHECK: memref.subview %[[ALLOC2]][%[[ARG0]], %[[ARG0]], %[[ARG0]]] [7, 11, 2] [%[[ARG1]], %[[ARG1]], %[[ARG1]]] :
   // CHECK-SAME: memref<?x?x?xf32> to
-  // CHECK-SAME: memref<7x11x2xf32, strided<[?, ?, ?], offset: ?>>
+  // CHECK-SAME: memref<7x11x2xf32, strided<[?, ?, ?]>>
   %15 = memref.subview %14[%arg0, %arg0, %arg0] [%c7, %c11, %c2] [%arg1, %arg1, %arg1] :
     memref<?x?x?xf32> to
-    memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
-  memref.store %v0, %15[%arg1, %arg1, %arg1] : memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
+    memref<?x?x?xf32, strided<[?, ?, ?]>>
+  memref.store %v0, %15[%arg1, %arg1, %arg1] : memref<?x?x?xf32, strided<[?, ?, ?]>>
 
   // TEST: subview strides are folded, in the type only the most minor stride is folded.
   // CHECK: memref.subview %[[ALLOC2]][%[[ARG0]], %[[ARG0]], %[[ARG0]]] [%[[ARG1]], %[[ARG1]], %[[ARG1]]] [2, 2, 2] :
   // CHECK-SAME: memref<?x?x?xf32> to
-  // CHECK-SAME: memref<?x?x?xf32, strided<[?, ?, 2], offset: ?>>
+  // CHECK-SAME: memref<?x?x?xf32, strided<[?, ?, 2]>>
   %16 = memref.subview %14[%arg0, %arg0, %arg0] [%arg1, %arg1, %arg1] [%c2, %c2, %c2] :
     memref<?x?x?xf32> to
-    memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
-  memref.store %v0, %16[%arg0, %arg0, %arg0] : memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
+    memref<?x?x?xf32, strided<[?, ?, ?]>>
+  memref.store %v0, %16[%arg0, %arg0, %arg0] : memref<?x?x?xf32, strided<[?, ?, ?]>>
 
   // TEST: subview offsets are folded but the type offset remains dynamic, when the base memref is not static
   // CHECK: memref.subview %[[ALLOC2]][1, 1, 1] [%[[ARG0]], %[[ARG0]], %[[ARG0]]] [%[[ARG1]], %[[ARG1]], %[[ARG1]]] :
   // CHECK-SAME: memref<?x?x?xf32> to
-  // CHECK-SAME: memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
+  // CHECK-SAME: memref<?x?x?xf32, strided<[?, ?, ?]>>
   %17 = memref.subview %14[%c1, %c1, %c1] [%arg0, %arg0, %arg0] [%arg1, %arg1, %arg1] :
     memref<?x?x?xf32> to
-    memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
-  memref.store %v0, %17[%arg0, %arg0, %arg0] : memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
+    memref<?x?x?xf32, strided<[?, ?, ?]>>
+  memref.store %v0, %17[%arg0, %arg0, %arg0] : memref<?x?x?xf32, strided<[?, ?, ?]>>
 
   // CHECK: %[[ALLOC3:.*]] = memref.alloc() : memref<128x64xf32>
   %18 = memref.alloc() : memref<128x64xf32>
@@ -869,24 +869,24 @@ func.func @subview(%arg0 : index, %arg1 : index) -> (index, index) {
   // TEST: subview strides are maintained when sizes are folded
   // CHECK: memref.subview %[[ALLOC3]][%arg1, %arg1] [2, 4] [1, 1] :
   // CHECK-SAME: memref<128x64xf32> to
-  // CHECK-SAME: memref<2x4xf32, strided<[64, 1], offset: ?>
+  // CHECK-SAME: memref<2x4xf32, strided<[64, 1]>
   %19 = memref.subview %18[%arg1, %arg1] [%c2, %c4] [1, 1] :
     memref<128x64xf32> to
-    memref<?x?xf32, strided<[64, 1], offset: ?>>
-  memref.store %v0, %19[%arg1, %arg1] : memref<?x?xf32, strided<[64, 1], offset: ?>>
+    memref<?x?xf32, strided<[64, 1]>>
+  memref.store %v0, %19[%arg1, %arg1] : memref<?x?xf32, strided<[64, 1]>>
 
   // TEST: subview strides and sizes are maintained when offsets are folded
   // CHECK: memref.subview %[[ALLOC3]][2, 4] [12, 4] [1, 1] :
   // CHECK-SAME: memref<128x64xf32> to
-  // CHECK-SAME: memref<12x4xf32, strided<[64, 1], offset: 132>>
+  // CHECK-SAME: memref<12x4xf32, strided<[64, 1]>>
   %20 = memref.subview %18[%c2, %c4] [12, 4] [1, 1] :
     memref<128x64xf32> to
-    memref<12x4xf32, strided<[64, 1], offset: ?>>
-  memref.store %v0, %20[%arg1, %arg1] : memref<12x4xf32, strided<[64, 1], offset: ?>>
+    memref<12x4xf32, strided<[64, 1]>>
+  memref.store %v0, %20[%arg1, %arg1] : memref<12x4xf32, strided<[64, 1]>>
 
   // Test: dim on subview is rewritten to size operand.
-  %7 = memref.dim %4, %c0 : memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
-  %8 = memref.dim %4, %c1 : memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
+  %7 = memref.dim %4, %c0 : memref<?x?x?xf32, strided<[?, ?, ?]>>
+  %8 = memref.dim %4, %c1 : memref<?x?x?xf32, strided<[?, ?, ?]>>
 
   // CHECK: return %[[C7]], %[[C11]]
   return %7, %8 : index, index
@@ -1046,11 +1046,11 @@ func.func @tensor_arith.ceildivui_by_one(%arg0: tensor<4x5xi32>) -> tensor<4x5xi
 // -----
 
 // CHECK-LABEL: func @memref_cast_folding_subview
-func.func @memref_cast_folding_subview(%arg0: memref<4x5xf32>, %i: index) -> (memref<?x?xf32, strided<[?, ?], offset: ?>>) {
+func.func @memref_cast_folding_subview(%arg0: memref<4x5xf32>, %i: index) -> (memref<?x?xf32, strided<[?, ?]>>) {
   %0 = memref.cast %arg0 : memref<4x5xf32> to memref<?x?xf32>
   // CHECK-NEXT: memref.subview %{{.*}}: memref<4x5xf32>
-  %1 = memref.subview %0[%i, %i][%i, %i][%i, %i]: memref<?x?xf32> to memref<?x?xf32, strided<[?, ?], offset: ?>>
-  return %1: memref<?x?xf32, strided<[?, ?], offset: ?>>
+  %1 = memref.subview %0[%i, %i][%i, %i][%i, %i]: memref<?x?xf32> to memref<?x?xf32, strided<[?, ?]>>
+  return %1: memref<?x?xf32, strided<[?, ?]>>
 }
 
 // -----
diff --git a/mlir/test/Transforms/compose-subview.mlir b/mlir/test/Transforms/compose-subview.mlir
index d6fa442fe5300..9d058a3fa039b 100644
--- a/mlir/test/Transforms/compose-subview.mlir
+++ b/mlir/test/Transforms/compose-subview.mlir
@@ -1,105 +1,105 @@
 // RUN: mlir-opt %s -test-compose-subview -split-input-file | FileCheck %s
 
 // CHECK-LABEL: func.func @subview_strided(
-// CHECK-SAME: %[[input:.*]]: memref<4x1024xf32>) -> memref<1x128xf32, strided<[1024, 1], offset: 3456>> {
-func.func @subview_strided(%input: memref<4x1024xf32>) -> memref<1x128xf32, strided<[1024, 1], offset: 3456>> {
-  // CHECK: {{.*}} = memref.subview %[[input]][3, 384] [1, 128] [1, 1] : memref<4x1024xf32> to memref<1x128xf32, strided<[1024, 1], offset: 3456>>
-  %0 = memref.subview %input[2, 256] [2, 256] [1, 1] : memref<4x1024xf32> to memref<2x256xf32, strided<[1024, 1], offset: 2304>>
-  %1 = memref.subview %0[1, 128] [1, 128] [1, 1] : memref<2x256xf32, strided<[1024, 1], offset: 2304>> to memref<1x128xf32, strided<[1024, 1], offset: 3456>>
-  return %1 : memref<1x128xf32, strided<[1024, 1], offset: 3456>>
+// CHECK-SAME: %[[input:.*]]: memref<4x1024xf32>) -> memref<1x128xf32, strided<[1024, 1]>> {
+func.func @subview_strided(%input: memref<4x1024xf32>) -> memref<1x128xf32, strided<[1024, 1]>> {
+  // CHECK: {{.*}} = memref.subview %[[input]][3, 384] [1, 128] [1, 1] : memref<4x1024xf32> to memref<1x128xf32, strided<[1024, 1]>>
+  %0 = memref.subview %input[2, 256] [2, 256] [1, 1] : memref<4x1024xf32> to memref<2x256xf32, strided<[1024, 1]>>
+  %1 = memref.subview %0[1, 128] [1, 128] [1, 1] : memref<2x256xf32, strided<[1024, 1]>> to memref<1x128xf32, strided<[1024, 1]>>
+  return %1 : memref<1x128xf32, strided<[1024, 1]>>
 }
 
 // -----
 
 // CHECK-LABEL: func.func @subview_strided(
-// CHECK-SAME:  %[[input:.*]]: memref<4x1024xf32>) -> memref<1x10xf32, strided<[1024, 1], offset: 3745>> {
-func.func @subview_strided(%input: memref<4x1024xf32>) -> memref<1x10xf32, strided<[1024, 1], offset: 3745>> {
-  // CHECK:     {{.*}} = memref.subview %[[input]][3, 673] [1, 10] [1, 1] : memref<4x1024xf32> to memref<1x10xf32, strided<[1024, 1], offset: 3745>>
-  %0 = memref.subview %input[1, 512] [3, 256] [1, 1] : memref<4x1024xf32> to memref<3x256xf32, strided<[1024, 1], offset: 1536>>
-  %1 = memref.subview %0[1, 128] [2, 128] [1, 1] : memref<3x256xf32, strided<[1024, 1], offset: 1536>> to memref<2x128xf32, strided<[1024, 1], offset: 2688>>
-  %2 = memref.subview %1[1, 33] [1, 10] [1, 1] : memref<2x128xf32, strided<[1024, 1], offset: 2688>> to memref<1x10xf32, strided<[1024, 1], offset: 3745>>
-  return %2 : memref<1x10xf32, strided<[1024, 1], offset: 3745>>
+// CHECK-SAME:  %[[input:.*]]: memref<4x1024xf32>) -> memref<1x10xf32, strided<[1024, 1]>> {
+func.func @subview_strided(%input: memref<4x1024xf32>) -> memref<1x10xf32, strided<[1024, 1]>> {
+  // CHECK:     {{.*}} = memref.subview %[[input]][3, 673] [1, 10] [1, 1] : memref<4x1024xf32> to memref<1x10xf32, strided<[1024, 1]>>
+  %0 = memref.subview %input[1, 512] [3, 256] [1, 1] : memref<4x1024xf32> to memref<3x256xf32, strided<[1024, 1]>>
+  %1 = memref.subview %0[1, 128] [2, 128] [1, 1] : memref<3x256xf32, strided<[1024, 1]>> to memref<2x128xf32, strided<[1024, 1]>>
+  %2 = memref.subview %1[1, 33] [1, 10] [1, 1] : memref<2x128xf32, strided<[1024, 1]>> to memref<1x10xf32, strided<[1024, 1]>>
+  return %2 : memref<1x10xf32, strided<[1024, 1]>>
 }
 
 // -----
 
 // CHECK-LABEL: func.func @subview_strided(
-// CHECK-SAME:  %[[input:.*]]: memref<4x1024xf32>) -> memref<1x128xf32, strided<[1024, 1], offset: ?>> {
-func.func @subview_strided(%input: memref<4x1024xf32>) -> memref<1x128xf32, strided<[1024, 1], offset: ?>> {
+// CHECK-SAME:  %[[input:.*]]: memref<4x1024xf32>) -> memref<1x128xf32, strided<[1024, 1]>> {
+func.func @subview_strided(%input: memref<4x1024xf32>) -> memref<1x128xf32, strided<[1024, 1]>> {
   // CHECK: %[[C3:.*]] = arith.constant 3 : index
   %cst_1 = arith.constant 1 : index
   %cst_2 = arith.constant 2 : index
-  // CHECK: {{.*}} = memref.subview %[[input]]{{\[}}%[[C3]], 384] [1, 128] [1, 1] : memref<4x1024xf32> to memref<1x128xf32, strided<[1024, 1], offset: ?>>
-  %0 = memref.subview %input[%cst_2, 256] [2, 256] [1, 1] : memref<4x1024xf32> to memref<2x256xf32, strided<[1024, 1], offset: ?>>
-  %1 = memref.subview %0[%cst_1, 128] [1, 128] [1, 1] : memref<2x256xf32, strided<[1024, 1], offset: ?>> to memref<1x128xf32, strided<[1024, 1], offset: ?>>
-  return %1 : memref<1x128xf32, strided<[1024, 1], offset: ?>>
+  // CHECK: {{.*}} = memref.subview %[[input]]{{\[}}%[[C3]], 384] [1, 128] [1, 1] : memref<4x1024xf32> to memref<1x128xf32, strided<[1024, 1]>>
+  %0 = memref.subview %input[%cst_2, 256] [2, 256] [1, 1] : memref<4x1024xf32> to memref<2x256xf32, strided<[1024, 1]>>
+  %1 = memref.subview %0[%cst_1, 128] [1, 128] [1, 1] : memref<2x256xf32, strided<[1024, 1]>> to memref<1x128xf32, strided<[1024, 1]>>
+  return %1 : memref<1x128xf32, strided<[1024, 1]>>
 }
 
 // -----
 
 // CHECK-LABEL: func.func @subview_strided(
-// CHECK-SAME:  %[[input:.*]]: memref<4x1024xf32>) -> memref<1x128xf32, strided<[1024, 1], offset: ?>> {
-func.func @subview_strided(%input: memref<4x1024xf32>) -> memref<1x128xf32, strided<[1024, 1], offset: ?>> {
+// CHECK-SAME:  %[[input:.*]]: memref<4x1024xf32>) -> memref<1x128xf32, strided<[1024, 1]>> {
+func.func @subview_strided(%input: memref<4x1024xf32>) -> memref<1x128xf32, strided<[1024, 1]>> {
   // CHECK: %[[C3:.*]] = arith.constant 3 : index
   %cst_2 = arith.constant 2 : index
   // CHECK: %[[C384:.*]] = arith.constant 384 : index
   %cst_128 = arith.constant 128 : index
-  // CHECK: {{.*}} = memref.subview %[[input]]{{\[}}%[[C3]], %[[C384]]] [1, 128] [1, 1] : memref<4x1024xf32> to memref<1x128xf32, strided<[1024, 1], offset: ?>>
-  %0 = memref.subview %input[%cst_2, 256] [2, 256] [1, 1] : memref<4x1024xf32> to memref<2x256xf32, strided<[1024, 1], offset: ?>>
-  %1 = memref.subview %0[1, %cst_128] [1, 128] [1, 1] : memref<2x256xf32, strided<[1024, 1], offset: ?>> to memref<1x128xf32, strided<[1024, 1], offset: ?>>
-  return %1 : memref<1x128xf32, strided<[1024, 1], offset: ?>>
+  // CHECK: {{.*}} = memref.subview %[[input]]{{\[}}%[[C3]], %[[C384]]] [1, 128] [1, 1] : memref<4x1024xf32> to memref<1x128xf32, strided<[1024, 1]>>
+  %0 = memref.subview %input[%cst_2, 256] [2, 256] [1, 1] : memref<4x1024xf32> to memref<2x256xf32, strided<[1024, 1]>>
+  %1 = memref.subview %0[1, %cst_128] [1, 128] [1, 1] : memref<2x256xf32, strided<[1024, 1]>> to memref<1x128xf32, strided<[1024, 1]>>
+  return %1 : memref<1x128xf32, strided<[1024, 1]>>
 }
 
 // -----
 
 // CHECK-LABEL: func.func @subview_strided(
-// CHECK-SAME:  %[[input:.*]]: memref<8x1024xf32>) -> memref<1x64xf32, strided<[4096, 4], offset: 4480>> {
-func.func @subview_strided(%input: memref<8x1024xf32>) -> memref<1x64xf32, strided<[4096, 4], offset: 4480>> {
-  // CHECK:     {{.*}} = memref.subview %[[input]][4, 384] [1, 64] [4, 4] : memref<8x1024xf32> to memref<1x64xf32, strided<[4096, 4], offset: 4480>>
-  %0 = memref.subview %input[2, 256] [2, 256] [2, 2] : memref<8x1024xf32> to memref<2x256xf32, strided<[2048, 2], offset: 2304>>
-  %1 = memref.subview %0[1, 64] [1, 64] [2, 2] : memref<2x256xf32, strided<[2048, 2], offset: 2304>> to memref<1x64xf32, strided<[4096, 4], offset: 4480>>
-  return %1 : memref<1x64xf32, strided<[4096, 4], offset: 4480>>
+// CHECK-SAME:  %[[input:.*]]: memref<8x1024xf32>) -> memref<1x64xf32, strided<[4096, 4]>> {
+func.func @subview_strided(%input: memref<8x1024xf32>) -> memref<1x64xf32, strided<[4096, 4]>> {
+  // CHECK:     {{.*}} = memref.subview %[[input]][4, 384] [1, 64] [4, 4] : memref<8x1024xf32> to memref<1x64xf32, strided<[4096, 4]>>
+  %0 = memref.subview %input[2, 256] [2, 256] [2, 2] : memref<8x1024xf32> to memref<2x256xf32, strided<[2048, 2]>>
+  %1 = memref.subview %0[1, 64] [1, 64] [2, 2] : memref<2x256xf32, strided<[2048, 2]>> to memref<1x64xf32, strided<[4096, 4]>>
+  return %1 : memref<1x64xf32, strided<[4096, 4]>>
 }
 
 // -----
 
 // CHECK-LABEL: func.func @subview_strided(
-// CHECK-SAME:  %[[input:.*]]: memref<30x30xf32>) -> memref<2x2xf32, strided<[240, 8], offset: 217>> {
-func.func @subview_strided(%input: memref<30x30xf32>) -> memref<2x2xf32, strided<[240, 8], offset: 217>> {
-  // CHECK:     {{.*}} = memref.subview %[[input]][7, 7] [2, 2] [8, 8] : memref<30x30xf32> to memref<2x2xf32, strided<[240, 8], offset: 217>>
-  %0 = memref.subview %input[1, 1] [12, 12] [2, 2] : memref<30x30xf32> to memref<12x12xf32, strided<[60, 2], offset: 31>>
-  %1 = memref.subview %0[1, 1] [5, 5] [2, 2] : memref<12x12xf32, strided<[60, 2], offset: 31>> to memref<5x5xf32, strided<[120, 4], offset: 93>>
-  %2 = memref.subview %1[1, 1] [2, 2] [2, 2] : memref<5x5xf32, strided<[120, 4], offset: 93>> to memref<2x2xf32, strided<[240, 8], offset: 217>>
-  return %2 : memref<2x2xf32, strided<[240, 8], offset: 217>> 
+// CHECK-SAME:  %[[input:.*]]: memref<30x30xf32>) -> memref<2x2xf32, strided<[240, 8]>> {
+func.func @subview_strided(%input: memref<30x30xf32>) -> memref<2x2xf32, strided<[240, 8]>> {
+  // CHECK:     {{.*}} = memref.subview %[[input]][7, 7] [2, 2] [8, 8] : memref<30x30xf32> to memref<2x2xf32, strided<[240, 8]>>
+  %0 = memref.subview %input[1, 1] [12, 12] [2, 2] : memref<30x30xf32> to memref<12x12xf32, strided<[60, 2]>>
+  %1 = memref.subview %0[1, 1] [5, 5] [2, 2] : memref<12x12xf32, strided<[60, 2]>> to memref<5x5xf32, strided<[120, 4]>>
+  %2 = memref.subview %1[1, 1] [2, 2] [2, 2] : memref<5x5xf32, strided<[120, 4]>> to memref<2x2xf32, strided<[240, 8]>>
+  return %2 : memref<2x2xf32, strided<[240, 8]>> 
 }
 
 // -----
 
 // CHECK-LABEL: func.func @subview_strided(
-// CHECK-SAME:  %[[input:.*]]: memref<4x1024xf32>) -> memref<1x64xf32, strided<[4096, 4], offset: ?>> {
-func.func @subview_strided(%input: memref<4x1024xf32>) -> memref<1x64xf32, strided<[4096, 4], offset: ?>> {
+// CHECK-SAME:  %[[input:.*]]: memref<4x1024xf32>) -> memref<1x64xf32, strided<[4096, 4]>> {
+func.func @subview_strided(%input: memref<4x1024xf32>) -> memref<1x64xf32, strided<[4096, 4]>> {
   // CHECK:     %[[C4:.*]] = arith.constant 4 : index
   %cst_2 = arith.constant 2 : index
   // CHECK:     %[[C384:.*]] = arith.constant 384 : index
   %cst_64 = arith.constant 64 : index
-  // CHECK:     {{.*}} = memref.subview %[[input]]{{\[}}%[[C4]], %[[C384]]] [1, 64] [4, 4] : memref<4x1024xf32> to memref<1x64xf32, strided<[4096, 4], offset: ?>>
-  %0 = memref.subview %input[%cst_2, 256] [2, 256] [2, 2] : memref<4x1024xf32> to memref<2x256xf32, strided<[2048, 2], offset: ?>>
-  %1 = memref.subview %0[1, %cst_64] [1, 64] [2, 2] : memref<2x256xf32, strided<[2048, 2], offset: ?>> to memref<1x64xf32, strided<[4096, 4], offset: ?>>
-  return %1 : memref<1x64xf32, strided<[4096, 4], offset: ?>>
+  // CHECK:     {{.*}} = memref.subview %[[input]]{{\[}}%[[C4]], %[[C384]]] [1, 64] [4, 4] : memref<4x1024xf32> to memref<1x64xf32, strided<[4096, 4]>>
+  %0 = memref.subview %input[%cst_2, 256] [2, 256] [2, 2] : memref<4x1024xf32> to memref<2x256xf32, strided<[2048, 2]>>
+  %1 = memref.subview %0[1, %cst_64] [1, 64] [2, 2] : memref<2x256xf32, strided<[2048, 2]>> to memref<1x64xf32, strided<[4096, 4]>>
+  return %1 : memref<1x64xf32, strided<[4096, 4]>>
 }
 
 // -----
 
 // CHECK-LABEL: func.func @subview_strided(
-// CHECK-SAME:  %[[input:.*]]: memref<4x1024xf32>) -> memref<1x64xf32, strided<[4096, 4], offset: ?>> {
-func.func @subview_strided(%input: memref<4x1024xf32>) -> memref<1x64xf32, strided<[4096, 4], offset: ?>> {
+// CHECK-SAME:  %[[input:.*]]: memref<4x1024xf32>) -> memref<1x64xf32, strided<[4096, 4]>> {
+func.func @subview_strided(%input: memref<4x1024xf32>) -> memref<1x64xf32, strided<[4096, 4]>> {
   // CHECK:     %[[C4:.*]] = arith.constant 4 : index
   %cst_1 = arith.constant 1 : index
   %cst_2 = arith.constant 2 : index
-  // CHECK:     {{.*}} = memref.subview %[[input]]{{\[}}%[[C4]], 384] [1, 64] [4, 4] : memref<4x1024xf32> to memref<1x64xf32, strided<[4096, 4], offset: ?>>
-  %0 = memref.subview %input[%cst_2, 256] [2, 256] [2, 2] : memref<4x1024xf32> to memref<2x256xf32, strided<[2048, 2], offset: ?>>
-  %1 = memref.subview %0[%cst_1, 64] [1, 64] [2, 2] : memref<2x256xf32, strided<[2048, 2], offset: ?>> to memref<1x64xf32, strided<[4096, 4], offset: ?>>
-  return %1 : memref<1x64xf32, strided<[4096, 4], offset: ?>>
+  // CHECK:     {{.*}} = memref.subview %[[input]]{{\[}}%[[C4]], 384] [1, 64] [4, 4] : memref<4x1024xf32> to memref<1x64xf32, strided<[4096, 4]>>
+  %0 = memref.subview %input[%cst_2, 256] [2, 256] [2, 2] : memref<4x1024xf32> to memref<2x256xf32, strided<[2048, 2]>>
+  %1 = memref.subview %0[%cst_1, 64] [1, 64] [2, 2] : memref<2x256xf32, strided<[2048, 2]>> to memref<1x64xf32, strided<[4096, 4]>>
+  return %1 : memref<1x64xf32, strided<[4096, 4]>>
 }
 
 // -----
diff --git a/mlir/test/Transforms/test-bubble-down-memory-space-casts.mlir b/mlir/test/Transforms/test-bubble-down-memory-space-casts.mlir
index e4fce89cffb45..723dc2d275652 100644
--- a/mlir/test/Transforms/test-bubble-down-memory-space-casts.mlir
+++ b/mlir/test/Transforms/test-bubble-down-memory-space-casts.mlir
@@ -66,32 +66,32 @@ func.func @view(%arg0: memref<?xi8, 1>, %arg1: index, %arg2: index) -> memref<?x
 
 // CHECK-LABEL:   func.func @subview(
 // CHECK-SAME:      %[[ARG0:.*]]: memref<?x?xf32, 1>,
-// CHECK-SAME:      %[[ARG1:.*]]: index) -> memref<8x2xf32, strided<[?, 2], offset: ?>> {
-// CHECK:           %[[VAL_0:.*]] = memref.subview %[[ARG0]][4, 2] [8, 2] [3, 2] : memref<?x?xf32, 1> to memref<8x2xf32, strided<[?, 2], offset: ?>, 1>
-// CHECK:           %[[VAL_1:.*]] = memref.memory_space_cast %[[VAL_0]] : memref<8x2xf32, strided<[?, 2], offset: ?>, 1> to memref<8x2xf32, strided<[?, 2], offset: ?>>
-// CHECK:           return %[[VAL_1]] : memref<8x2xf32, strided<[?, 2], offset: ?>>
+// CHECK-SAME:      %[[ARG1:.*]]: index) -> memref<8x2xf32, strided<[?, 2]>> {
+// CHECK:           %[[VAL_0:.*]] = memref.subview %[[ARG0]][4, 2] [8, 2] [3, 2] : memref<?x?xf32, 1> to memref<8x2xf32, strided<[?, 2]>, 1>
+// CHECK:           %[[VAL_1:.*]] = memref.memory_space_cast %[[VAL_0]] : memref<8x2xf32, strided<[?, 2]>, 1> to memref<8x2xf32, strided<[?, 2]>>
+// CHECK:           return %[[VAL_1]] : memref<8x2xf32, strided<[?, 2]>>
 // CHECK:         }
-func.func @subview(%arg0: memref<?x?xf32, 1>, %arg1: index) -> memref<8x2xf32, strided<[?, 2], offset: ?>> {
+func.func @subview(%arg0: memref<?x?xf32, 1>, %arg1: index) -> memref<8x2xf32, strided<[?, 2]>> {
   %memspacecast = memref.memory_space_cast %arg0 : memref<?x?xf32, 1> to memref<?x?xf32>
-  %subview = memref.subview %memspacecast[4, 2] [8, 2] [3, 2] : memref<?x?xf32> to memref<8x2xf32, strided<[?, 2], offset: ?>>
-  return %subview : memref<8x2xf32, strided<[?, 2], offset: ?>>
+  %subview = memref.subview %memspacecast[4, 2] [8, 2] [3, 2] : memref<?x?xf32> to memref<8x2xf32, strided<[?, 2]>>
+  return %subview : memref<8x2xf32, strided<[?, 2]>>
 }
 
 // CHECK-LABEL:   func.func @reinterpret_cast(
 // CHECK-SAME:      %[[ARG0:.*]]: memref<?xf32, 1>,
-// CHECK-SAME:      %[[ARG1:.*]]: index) -> memref<10x?xf32, strided<[?, 1], offset: ?>> {
+// CHECK-SAME:      %[[ARG1:.*]]: index) -> memref<10x?xf32, strided<[?, 1]>> {
 // CHECK-DAG:       %[[VAL_0:.*]] = arith.constant 10 : index
 // CHECK-DAG:       %[[VAL_1:.*]] = arith.constant 0 : index
-// CHECK:           %[[VAL_2:.*]] = memref.reinterpret_cast %[[ARG0]] to offset: {{\[}}%[[VAL_1]]], sizes: [10, %[[VAL_0]]], strides: {{\[}}%[[VAL_0]], 1] : memref<?xf32, 1> to memref<10x?xf32, strided<[?, 1], offset: ?>, 1>
-// CHECK:           %[[VAL_3:.*]] = memref.memory_space_cast %[[VAL_2]] : memref<10x?xf32, strided<[?, 1], offset: ?>, 1> to memref<10x?xf32, strided<[?, 1], offset: ?>>
-// CHECK:           return %[[VAL_3]] : memref<10x?xf32, strided<[?, 1], offset: ?>>
+// CHECK:           %[[VAL_2:.*]] = memref.reinterpret_cast %[[ARG0]] to offset: {{\[}}%[[VAL_1]]], sizes: [10, %[[VAL_0]]], strides: {{\[}}%[[VAL_0]], 1] : memref<?xf32, 1> to memref<10x?xf32, strided<[?, 1]>, 1>
+// CHECK:           %[[VAL_3:.*]] = memref.memory_space_cast %[[VAL_2]] : memref<10x?xf32, strided<[?, 1]>, 1> to memref<10x?xf32, strided<[?, 1]>>
+// CHECK:           return %[[VAL_3]] : memref<10x?xf32, strided<[?, 1]>>
 // CHECK:         }
-func.func @reinterpret_cast(%arg0: memref<?xf32, 1>, %arg1: index) -> memref<10x?xf32, strided<[?, 1], offset: ?>> {
+func.func @reinterpret_cast(%arg0: memref<?xf32, 1>, %arg1: index) -> memref<10x?xf32, strided<[?, 1]>> {
   %memspacecast = memref.memory_space_cast %arg0 : memref<?xf32, 1> to memref<?xf32>
   %c0 = arith.constant 0 : index
   %c10 = arith.constant 10 : index
-  %reinterpret_cast = memref.reinterpret_cast %memspacecast to offset: [%c0], sizes: [10, %c10], strides: [%c10, 1] : memref<?xf32> to memref<10x?xf32, strided<[?, 1], offset: ?>>
-  return %reinterpret_cast : memref<10x?xf32, strided<[?, 1], offset: ?>>
+  %reinterpret_cast = memref.reinterpret_cast %memspacecast to offset: [%c0], sizes: [10, %c10], strides: [%c10, 1] : memref<?xf32> to memref<10x?xf32, strided<[?, 1]>>
+  return %reinterpret_cast : memref<10x?xf32, strided<[?, 1]>>
 }
 
 // CHECK-LABEL:   func.func @reshape(
diff --git a/mlir/test/mlir-runner/copy.mlir b/mlir/test/mlir-runner/copy.mlir
index ae8d7e611353a..b677c7ce8cb2f 100644
--- a/mlir/test/mlir-runner/copy.mlir
+++ b/mlir/test/mlir-runner/copy.mlir
@@ -38,8 +38,8 @@ func.func @main() -> () {
 
   %copy_two = memref.alloc() : memref<3x2xf32>
   %copy_two_casted = memref.reinterpret_cast %copy_two to offset: [0], sizes: [2, 3], strides: [1, 2]
-    : memref<3x2xf32> to memref<2x3xf32, strided<[1, 2], offset: 0>>
-  memref.copy %input, %copy_two_casted : memref<2x3xf32> to memref<2x3xf32, strided<[1, 2], offset: 0>>
+    : memref<3x2xf32> to memref<2x3xf32, strided<[1, 2]>>
+  memref.copy %input, %copy_two_casted : memref<2x3xf32> to memref<2x3xf32, strided<[1, 2]>>
   %unranked_copy_two = memref.cast %copy_two : memref<3x2xf32> to memref<*xf32>
   call @printMemrefF32(%unranked_copy_two) : (memref<*xf32>) -> ()
   // CHECK: rank = 2 offset = 0 sizes = [3, 2] strides = [2, 1]
@@ -53,10 +53,10 @@ func.func @main() -> () {
   memref.copy %input_empty, %copy_empty : memref<3x0x1xf32> to memref<3x0x1xf32>
 
   %input_empty_casted = memref.reinterpret_cast %input_empty to offset: [0], sizes: [0, 3, 1], strides: [3, 1, 1]
-    : memref<3x0x1xf32> to memref<0x3x1xf32, strided<[3, 1, 1], offset: 0>>
+    : memref<3x0x1xf32> to memref<0x3x1xf32, strided<[3, 1, 1]>>
   %copy_empty_casted = memref.alloc() : memref<0x3x1xf32>
   // Copying a casted empty shape should do nothing (and should not crash).
-  memref.copy %input_empty_casted, %copy_empty_casted : memref<0x3x1xf32, strided<[3, 1, 1], offset: 0>> to memref<0x3x1xf32>
+  memref.copy %input_empty_casted, %copy_empty_casted : memref<0x3x1xf32, strided<[3, 1, 1]>> to memref<0x3x1xf32>
 
   %scalar = memref.alloc() : memref<f32>
   memref.store %c42, %scalar[] : memref<f32>
diff --git a/mlir/test/mlir-runner/memref-reinterpret-cast.mlir b/mlir/test/mlir-runner/memref-reinterpret-cast.mlir
index 42cea6e0bf497..2e15fcded1bb8 100644
--- a/mlir/test/mlir-runner/memref-reinterpret-cast.mlir
+++ b/mlir/test/mlir-runner/memref-reinterpret-cast.mlir
@@ -60,10 +60,10 @@ func.func @cast_ranked_memref_to_dynamic_shape(%input : memref<2x3xf32>) {
   %c6 = arith.constant 6 : index
   %output = memref.reinterpret_cast %input to
            offset: [%c0], sizes: [%c1, %c6], strides: [%c6, %c1]
-           : memref<2x3xf32> to memref<?x?xf32, strided<[?, ?], offset: ?>>
+           : memref<2x3xf32> to memref<?x?xf32, strided<[?, ?]>>
 
   %unranked_output = memref.cast %output
-      : memref<?x?xf32, strided<[?, ?], offset: ?>> to memref<*xf32>
+      : memref<?x?xf32, strided<[?, ?]>> to memref<*xf32>
   call @printMemrefF32(%unranked_output) : (memref<*xf32>) -> ()
   // CHECK: rank = 2 offset = 0 sizes = [1, 6] strides = [6, 1] data =
   // CHECK-NEXT: [0,   1,   2,   3,   4,   5]
@@ -96,10 +96,10 @@ func.func @cast_unranked_memref_to_dynamic_shape(%input : memref<2x3xf32>) {
   %c6 = arith.constant 6 : index
   %output = memref.reinterpret_cast %unranked_input to
            offset: [%c0], sizes: [%c1, %c6], strides: [%c6, %c1]
-           : memref<*xf32> to memref<?x?xf32, strided<[?, ?], offset: ?>>
+           : memref<*xf32> to memref<?x?xf32, strided<[?, ?]>>
 
   %unranked_output = memref.cast %output
-      : memref<?x?xf32, strided<[?, ?], offset: ?>> to memref<*xf32>
+      : memref<?x?xf32, strided<[?, ?]>> to memref<*xf32>
   call @printMemrefF32(%unranked_output) : (memref<*xf32>) -> ()
   // CHECK: rank = 2 offset = 0 sizes = [1, 6] strides = [6, 1] data =
   // CHECK-NEXT: [0,   1,   2,   3,   4,   5]
diff --git a/mlir/test/python/dialects/memref.py b/mlir/test/python/dialects/memref.py
index b91fdc367cf30..d1d2b4e9cb627 100644
--- a/mlir/test/python/dialects/memref.py
+++ b/mlir/test/python/dialects/memref.py
@@ -26,7 +26,7 @@ def testSubViewAccessors():
       %3 = arith.constant 3 : index
       %4 = arith.constant 4 : index
       %5 = arith.constant 5 : index
-      memref.subview %arg0[%0, %1][%2, %3][%4, %5] : memref<?x?xf32> to memref<?x?xf32, strided<[?, ?], offset: ?>>
+      memref.subview %arg0[%0, %1][%2, %3][%4, %5] : memref<?x?xf32> to memref<?x?xf32, strided<[?, ?]>>
       return
     }
   """,
@@ -103,7 +103,7 @@ def testSubViewOpInferReturnTypeSemantics():
 
             y = memref.subview(x, [1, 1], [3, 3], [1, 1])
             assert y.owner.verify()
-            # CHECK: %{{.*}} = memref.subview %[[ALLOC]][1, 1] [3, 3] [1, 1] : memref<10x10xi32> to memref<3x3xi32, strided<[10, 1], offset: 11>>
+            # CHECK: %{{.*}} = memref.subview %[[ALLOC]][1, 1] [3, 3] [1, 1] : memref<10x10xi32> to memref<3x3xi32, strided<[10, 1]>>
             print(y.owner)
 
             z = memref.subview(
@@ -112,7 +112,7 @@ def testSubViewOpInferReturnTypeSemantics():
                 [3, 3],
                 [1, 1],
             )
-            # CHECK: %{{.*}} =  memref.subview %[[ALLOC]][1, 1] [3, 3] [1, 1] : memref<10x10xi32> to memref<3x3xi32, strided<[10, 1], offset: 11>>
+            # CHECK: %{{.*}} =  memref.subview %[[ALLOC]][1, 1] [3, 3] [1, 1] : memref<10x10xi32> to memref<3x3xi32, strided<[10, 1]>>
             print(z.owner)
 
             z = memref.subview(
@@ -121,7 +121,7 @@ def testSubViewOpInferReturnTypeSemantics():
                 [3, 3],
                 [1, 1],
             )
-            # CHECK: %{{.*}} =  memref.subview %[[ALLOC]][3, 4] [3, 3] [1, 1] : memref<10x10xi32> to memref<3x3xi32, strided<[10, 1], offset: 34>>
+            # CHECK: %{{.*}} =  memref.subview %[[ALLOC]][3, 4] [3, 3] [1, 1] : memref<10x10xi32> to memref<3x3xi32, strided<[10, 1]>>
             print(z.owner)
 
             s = arith.addi(arith.constant(T.index(), 3), arith.constant(T.index(), 4))
@@ -131,7 +131,7 @@ def testSubViewOpInferReturnTypeSemantics():
                 [3, 3],
                 [1, 1],
             )
-            # CHECK: {{.*}} = memref.subview %[[ALLOC]][%0, 0] [3, 3] [1, 1] : memref<10x10xi32> to memref<3x3xi32, strided<[10, 1], offset: ?>>
+            # CHECK: {{.*}} = memref.subview %[[ALLOC]][%0, 0] [3, 3] [1, 1] : memref<10x10xi32> to memref<3x3xi32, strided<[10, 1]>>
             print(z)
 
             try:
@@ -167,7 +167,7 @@ def testSubViewOpInferReturnTypeSemantics():
                 [],
                 [arith.constant(T.index(), 42)],
             )
-            # CHECK: %[[DYNAMICALLOC:.*]] = memref.alloc()[%c42] : memref<10x10xi32, strided<[10, 1], offset: ?>>
+            # CHECK: %[[DYNAMICALLOC:.*]] = memref.alloc()[%c42] : memref<10x10xi32, strided<[10, 1]>>
             print(x.owner)
             y = memref.subview(
                 x,
@@ -176,7 +176,7 @@ def testSubViewOpInferReturnTypeSemantics():
                 [1, 1],
                 result_type=T.memref(3, 3, T.i32(), layout=layout),
             )
-            # CHECK: %{{.*}} = memref.subview %[[DYNAMICALLOC]][1, 1] [3, 3] [1, 1] : memref<10x10xi32, strided<[10, 1], offset: ?>> to memref<3x3xi32, strided<[10, 1], offset: ?>>
+            # CHECK: %{{.*}} = memref.subview %[[DYNAMICALLOC]][1, 1] [3, 3] [1, 1] : memref<10x10xi32, strided<[10, 1]>> to memref<3x3xi32, strided<[10, 1]>>
             print(y.owner)
 
 
diff --git a/mlir/test/python/execution_engine.py b/mlir/test/python/execution_engine.py
index 858ee089042ad..ce03dc70adea2 100644
--- a/mlir/test/python/execution_engine.py
+++ b/mlir/test/python/execution_engine.py
@@ -283,12 +283,12 @@ def callback(a):
             r"""
 func.func @callback_memref(%arg0: memref<5xf32>) attributes {llvm.emit_c_interface} {
   %base_buffer, %offset, %sizes, %strides = memref.extract_strided_metadata %arg0 : memref<5xf32> -> memref<f32>, index, index, index
-  %reinterpret_cast = memref.reinterpret_cast %base_buffer to offset: [3], sizes: [2], strides: [1] : memref<f32> to memref<2xf32, strided<[1], offset: 3>>
-  %cast = memref.cast %reinterpret_cast : memref<2xf32, strided<[1], offset: 3>> to memref<?xf32, strided<[?], offset: ?>>
-  call @some_callback_into_python(%cast) : (memref<?xf32, strided<[?], offset: ?>>) -> ()
+  %reinterpret_cast = memref.reinterpret_cast %base_buffer to offset: [3], sizes: [2], strides: [1] : memref<f32> to memref<2xf32, strided<[1]>>
+  %cast = memref.cast %reinterpret_cast : memref<2xf32, strided<[1]>> to memref<?xf32, strided<[?]>>
+  call @some_callback_into_python(%cast) : (memref<?xf32, strided<[?]>>) -> ()
   return
 }
-func.func private @some_callback_into_python(memref<?xf32, strided<[?], offset: ?>>) attributes {llvm.emit_c_interface}
+func.func private @some_callback_into_python(memref<?xf32, strided<[?]>>) attributes {llvm.emit_c_interface}
 """
         )
         execution_engine = ExecutionEngine(lowerToLLVM(module))
@@ -322,8 +322,8 @@ def callback(a):
             r"""
 func.func @callback_memref(%arg0: memref<5xf32>) attributes {llvm.emit_c_interface} {
     %base_buffer, %offset, %sizes, %strides = memref.extract_strided_metadata %arg0 : memref<5xf32> -> memref<f32>, index, index, index
-    %reinterpret_cast = memref.reinterpret_cast %base_buffer to offset: [3], sizes: [2], strides: [1] : memref<f32> to memref<2xf32, strided<[1], offset: 3>>
-    %cast = memref.cast %reinterpret_cast : memref<2xf32, strided<[1], offset: 3>> to memref<*xf32>
+    %reinterpret_cast = memref.reinterpret_cast %base_buffer to offset: [3], sizes: [2], strides: [1] : memref<f32> to memref<2xf32, strided<[1]>>
+    %cast = memref.cast %reinterpret_cast : memref<2xf32, strided<[1]>> to memref<*xf32>
     call @some_callback_into_python(%cast) : (memref<*xf32>) -> ()
     return
 }
diff --git a/mlir/test/python/ir/attributes.py b/mlir/test/python/ir/attributes.py
index 3ba3788023293..d086c7eb0b5d4 100644
--- a/mlir/test/python/ir/attributes.py
+++ b/mlir/test/python/ir/attributes.py
@@ -644,11 +644,9 @@ def testArrayAttr():
 @run
 def testStridedLayoutAttr():
     with Context():
-        attr = StridedLayoutAttr.get(42, [5, 7, 13])
-        # CHECK: strided<[5, 7, 13], offset: 42>
+        attr = StridedLayoutAttr.get([5, 7, 13])
+        # CHECK: strided<[5, 7, 13]>
         print(attr)
-        # CHECK: 42
-        print(attr.offset)
         # CHECK: 3
         print(len(attr.strides))
         # CHECK: 5
@@ -660,10 +658,8 @@ def testStridedLayoutAttr():
 
         attr = StridedLayoutAttr.get_fully_dynamic(3)
         dynamic = ShapedType.get_dynamic_stride_or_offset()
-        # CHECK: strided<[?, ?, ?], offset: ?>
+        # CHECK: strided<[?, ?, ?]>
         print(attr)
-        # CHECK: offset is dynamic: True
-        print(f"offset is dynamic: {attr.offset == dynamic}")
         # CHECK: rank: 3
         print(f"rank: {len(attr.strides)}")
         # CHECK: strides are dynamic: [True, True, True]
diff --git a/mlir/test/python/ir/builtin_types.py b/mlir/test/python/ir/builtin_types.py
index 3fa93f9d04630..bfc7980f36ffe 100644
--- a/mlir/test/python/ir/builtin_types.py
+++ b/mlir/test/python/ir/builtin_types.py
@@ -953,8 +953,8 @@ def testCustomTypeTypeCaster():
 # CHECK-LABEL: TEST: testTypeWrappers
 @run
 def testTypeWrappers():
-    def stride(strides, offset=0):
-        return StridedLayoutAttr.get(offset, strides)
+    def stride(strides):
+        return StridedLayoutAttr.get(strides)
 
     with Context(), Location.unknown():
         ia = T.i(5)
@@ -987,12 +987,6 @@ def stride(strides, offset=0):
         m3 = T.memref(2, 3, 4, T.f64(), memory_space=1, layout=stride([5, 7, 13]))
         assert repr(m3) == "MemRefType(memref<2x3x4xf64, strided<[5, 7, 13]>, 1>)"
 
-        m4 = T.memref(2, 3, 4, T.f64(), memory_space=1, layout=stride([5, 7, 13], 42))
-        assert (
-            repr(m4)
-            == "MemRefType(memref<2x3x4xf64, strided<[5, 7, 13], offset: 42>, 1>)"
-        )
-
         S = ShapedType.get_dynamic_size()
 
         t1 = T.tensor(S, 3, S, T.f64())
diff --git a/mlir/unittests/Dialect/MemRef/InferShapeTest.cpp b/mlir/unittests/Dialect/MemRef/InferShapeTest.cpp
index 3937095c119c3..6810e0d11bc20 100644
--- a/mlir/unittests/Dialect/MemRef/InferShapeTest.cpp
+++ b/mlir/unittests/Dialect/MemRef/InferShapeTest.cpp
@@ -24,7 +24,7 @@ TEST(InferShapeTest, inferRankReducedShapeIdentity) {
       /*resultShape=*/{2}, sourceMemref, {2, 3}, {1, 2}, {1, 1});
   auto expectedType = MemRefType::get(
       {2}, b.getIndexType(),
-      StridedLayoutAttr::get(&ctx, /*offset=*/13, /*strides=*/{1}));
+      StridedLayoutAttr::get(&ctx, /*strides=*/{1}));
   EXPECT_EQ(reducedType, expectedType);
 }
 
@@ -40,7 +40,7 @@ TEST(InferShapeTest, inferRankReducedShapeNonIdentity) {
       /*resultShape=*/{2}, sourceMemref, {2, 3}, {1, 2}, {1, 1});
   auto expectedType = MemRefType::get(
       {2}, b.getIndexType(),
-      StridedLayoutAttr::get(&ctx, /*offset=*/2003, /*strides=*/{1}));
+      StridedLayoutAttr::get(&ctx, /*strides=*/{1}));
   EXPECT_EQ(reducedType, expectedType);
 }
 
@@ -55,6 +55,6 @@ TEST(InferShapeTest, inferRankReducedShapeToScalar) {
       /*resultShape=*/{}, sourceMemref, {2, 3}, {1, 1}, {1, 1});
   auto expectedType = MemRefType::get(
       {}, b.getIndexType(),
-      StridedLayoutAttr::get(&ctx, /*offset=*/2003, /*strides=*/{}));
+      StridedLayoutAttr::get(&ctx, /*strides=*/{}));
   EXPECT_EQ(reducedType, expectedType);
 }
diff --git a/mlir/unittests/IR/MemrefLayoutTest.cpp b/mlir/unittests/IR/MemrefLayoutTest.cpp
index f243a76ee660c..76adf94b11661 100644
--- a/mlir/unittests/IR/MemrefLayoutTest.cpp
+++ b/mlir/unittests/IR/MemrefLayoutTest.cpp
@@ -25,7 +25,7 @@ TEST(MemRefLayout, numContigDim) {
   const int64_t _ = ShapedType::kDynamic;
   const FloatType f32 = b.getF32Type();
   auto strided = [&ctx](ArrayRef<int64_t> s) {
-    return StridedLayoutAttr::get(&ctx, 0, s);
+    return StridedLayoutAttr::get(&ctx, s);
   };
 
   // Special case for identity maps and no explicit `strided` attribute - the
@@ -94,7 +94,7 @@ TEST(MemRefLayout, contigTrailingDim) {
   const int64_t _ = ShapedType::kDynamic;
   const FloatType f32 = b.getF32Type();
   auto strided = [&ctx](ArrayRef<int64_t> s) {
-    return StridedLayoutAttr::get(&ctx, 0, s);
+    return StridedLayoutAttr::get(&ctx, s);
   };
 
   // A not-entirely-continuous, not-entirely-discontinuous memref.

>From 85b660234fb313275031b517d64b412c81ef1046 Mon Sep 17 00:00:00 2001
From: Ivan Butygin <ivan.butygin at gmail.com>
Date: Fri, 17 Apr 2026 01:35:53 +0200
Subject: [PATCH 04/27] [WIP][mlir] step 2 follow-ups: collapse_shape, narrow
 type, stale assert

- CollapseShapeOp: treat strided<[]> as equivalent to identity for rank-0
  results in both the type builder and the verifier.
- EmulateNarrowType: emit strided<[1]> only when the linearized shape is
  non-empty; rank-0 stays identity.
- ExpandStridedMetadata: drop the now-vacuous assertion that compared the
  computed offset against the subview result type's static offset.
- A few test files (invalid.mlir, multibuffer.mlir, alloc-symbol cases)
  updated for the new symbol counts and removed-checks.

Subset status: 41/1694 dialect/conversion/IR/Transforms tests still
failing (down from ~120 across the full suite). Remaining clusters are
CHECK-line drift in printer-driven tests and the SparseTensor integration
runtime hangs noted in the prior commit message.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
Signed-off-by: Ivan Butygin <ivan.butygin at gmail.com>
---
 mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp      | 28 +++++++-
 .../MemRef/Transforms/EmulateNarrowType.cpp   | 16 +++--
 .../Transforms/ExpandStridedMetadata.cpp      |  9 +--
 mlir/test/Dialect/MemRef/invalid.mlir         | 72 +------------------
 mlir/test/Dialect/MemRef/multibuffer.mlir     | 50 ++++++-------
 5 files changed, 62 insertions(+), 113 deletions(-)

diff --git a/mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp b/mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
index 9c52f64099278..67abdb4da09da 100644
--- a/mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
+++ b/mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
@@ -2860,7 +2860,11 @@ MemRefType CollapseShapeOp::computeCollapsedType(
       computeCollapsedLayoutMap(srcType, reassociation);
   assert(succeeded(computedLayout) &&
          "invalid source layout map or collapsing non-contiguous dims");
-  return MemRefType::get(resultShape, srcType.getElementType(), *computedLayout,
+  // strided<[]> is degenerate and equivalent to the identity layout.
+  MemRefLayoutAttrInterface layout = *computedLayout;
+  if (computedLayout->getStrides().empty())
+    layout = MemRefLayoutAttrInterface();
+  return MemRefType::get(resultShape, srcType.getElementType(), layout,
                          srcType.getMemorySpace());
 }
 
@@ -2916,7 +2920,27 @@ LogicalResult CollapseShapeOp::verify() {
                         *computedLayout, srcType.getMemorySpace());
   }
 
-  if (expectedResultType != resultType)
+  // For rank-0 results the strided layout degenerates to strided<[]> which
+  // is equivalent to the identity layout. Treat the two forms as equal.
+  auto layoutsEquivalent = [](MemRefType a, MemRefType b) {
+    if (a == b)
+      return true;
+    if (a.getRank() != 0 || b.getRank() != 0)
+      return false;
+    if (a.getElementType() != b.getElementType())
+      return false;
+    if (a.getMemorySpace() != b.getMemorySpace())
+      return false;
+    auto isIdentityOrEmptyStrided = [](MemRefLayoutAttrInterface l) {
+      if (!l || l.isIdentity())
+        return true;
+      auto strided = dyn_cast<StridedLayoutAttr>(l);
+      return strided && strided.getStrides().empty();
+    };
+    return isIdentityOrEmptyStrided(a.getLayout()) &&
+           isIdentityOrEmptyStrided(b.getLayout());
+  };
+  if (!layoutsEquivalent(expectedResultType, resultType))
     return emitOpError("expected collapsed type to be ")
            << expectedResultType << " but found " << resultType;
 
diff --git a/mlir/lib/Dialect/MemRef/Transforms/EmulateNarrowType.cpp b/mlir/lib/Dialect/MemRef/Transforms/EmulateNarrowType.cpp
index c1a4716fc8668..d38f21f791d29 100644
--- a/mlir/lib/Dialect/MemRef/Transforms/EmulateNarrowType.cpp
+++ b/mlir/lib/Dialect/MemRef/Transforms/EmulateNarrowType.cpp
@@ -690,16 +690,18 @@ void memref::populateMemRefNarrowTypeEmulationConversions(
         if (!newElemTy)
           return nullptr;
 
-        // The strided layout no longer carries offset information. The
-        // lowering of any op that produced an offset against the source memref
-        // is responsible for materializing the equivalent offset on the
-        // narrow-element memref.
+        // The strided layout no longer carries offset information; runtime
+        // offsets live on the producing op. Emit an explicit strided layout
+        // for the (rank-1) linearized form so downstream patterns that key
+        // on layout presence keep working; rank-0 stays identity.
+        SmallVector<int64_t> linearizedShape =
+            getLinearizedShape(ty, width, loadStoreWidth);
         StridedLayoutAttr layoutAttr;
-        if (offset != 0)
+        if (!linearizedShape.empty())
           layoutAttr =
               StridedLayoutAttr::get(ty.getContext(), ArrayRef<int64_t>{1});
 
-        return MemRefType::get(getLinearizedShape(ty, width, loadStoreWidth),
-                               newElemTy, layoutAttr, ty.getMemorySpace());
+        return MemRefType::get(linearizedShape, newElemTy, layoutAttr,
+                               ty.getMemorySpace());
       });
 }
diff --git a/mlir/lib/Dialect/MemRef/Transforms/ExpandStridedMetadata.cpp b/mlir/lib/Dialect/MemRef/Transforms/ExpandStridedMetadata.cpp
index 20d543b7210b1..cda14f1c3cf2c 100644
--- a/mlir/lib/Dialect/MemRef/Transforms/ExpandStridedMetadata.cpp
+++ b/mlir/lib/Dialect/MemRef/Transforms/ExpandStridedMetadata.cpp
@@ -114,14 +114,7 @@ resolveSubviewStridedMetadata(RewriterBase &rewriter,
   // Compute the offset.
   OpFoldResult finalOffset =
       makeComposedFoldedAffineApply(rewriter, origLoc, expr, values);
-#ifndef NDEBUG
-  // Assert that the computed offset matches the offset of the result type of
-  // the subview op (if both are static).
-  std::optional<int64_t> computedOffset = getConstantIntValue(finalOffset);
-  if (computedOffset && ShapedType::isStatic(resultOffset))
-    assert(*computedOffset == resultOffset &&
-           "mismatch between computed offset and result type offset");
-#endif // NDEBUG
+  (void)resultOffset;
 
   // The final result is  <baseBuffer, offset, sizes, strides>.
   // Thus we need 1 + 1 + subview.getRank() + subview.getRank(), to hold all
diff --git a/mlir/test/Dialect/MemRef/invalid.mlir b/mlir/test/Dialect/MemRef/invalid.mlir
index c8ce8fda648df..f0a63bdaa9ef3 100644
--- a/mlir/test/Dialect/MemRef/invalid.mlir
+++ b/mlir/test/Dialect/MemRef/invalid.mlir
@@ -142,7 +142,7 @@ func.func @transpose_bad_rank(%v : memref<?x?xf32, affine_map<(i, j)[off, M]->(o
 // -----
 
 func.func @transpose_wrong_type(%v : memref<?x?xf32, affine_map<(i, j)[off, M]->(off + M * i + j)>>) {
-  // expected-error @+1 {{result type 'memref<?x?xf32, affine_map<(d0, d1)[s0, s1] -> (d0 * s1 + s0 + d1)>>' is not equivalent to the canonical transposed input type 'memref<?x?xf32, affine_map<(d0, d1)[s0, s1] -> (d0 + s0 + d1 * s1)>>'}}
+  // expected-error @+1 {{result type 'memref<?x?xf32, affine_map<(d0, d1)[s0, s1] -> (d0 * s1 + s0 + d1)>>' is not equivalent to the canonical transposed input type 'memref<?x?xf32, affine_map<(d0, d1)[s0] -> (d0 + d1 * s0)>>'}}
   memref.transpose %v (i, j) -> (j, i) : memref<?x?xf32, affine_map<(i, j)[off, M]->(off + M * i + j)>> to memref<?x?xf32, affine_map<(i, j)[off, M]->(off + M * i + j)>>
 }
 
@@ -178,16 +178,6 @@ func.func @memref_reinterpret_cast_incompatible_memory_space(%in: memref<*xf32>)
 
 // -----
 
-func.func @memref_reinterpret_cast_offset_mismatch(%in: memref<?xf32>) {
-  // expected-error @+1 {{expected result type with offset = 1 instead of 2}}
-  %out = memref.reinterpret_cast %in to
-           offset: [1], sizes: [10], strides: [1]
-         : memref<?xf32> to memref<10xf32, strided<[1]>>
-  return
-}
-
-// -----
-
 func.func @memref_reinterpret_cast_size_mismatch(%in: memref<*xf32>) {
   // expected-error @+1 {{expected result type with size = 10 instead of 1 in dim = 0}}
   %out = memref.reinterpret_cast %in to
@@ -208,24 +198,6 @@ func.func @memref_reinterpret_cast_offset_mismatch(%in: memref<?xf32>) {
 
 // -----
 
-func.func @memref_reinterpret_cast_no_map_but_offset(%in: memref<?xf32>) {
-  // expected-error @+1 {{expected result type with offset = 2 instead of 0}}
-  %out = memref.reinterpret_cast %in to offset: [2], sizes: [10], strides: [1]
-         : memref<?xf32> to memref<10xf32>
-  return
-}
-
-// -----
-
-func.func @memref_reinterpret_cast_offset_mismatch_dynamic(%in: memref<?xf32>, %offset : index) {
-  // expected-error @+1 {{expected result type with offset = dynamic instead of 0}}
-  %out = memref.reinterpret_cast %in to offset: [%offset], sizes: [10], strides: [1]
-         : memref<?xf32> to memref<10xf32>
-  return
-}
-
-// -----
-
 func.func @memref_reinterpret_cast_no_map_but_stride(%in: memref<?xf32>) {
   // expected-error @+1 {{expected result type with stride = 10 instead of 1 in dim = 0}}
   %out = memref.reinterpret_cast %in to offset: [0], sizes: [10], strides: [10]
@@ -797,40 +769,6 @@ func.func @invalid_rank_reducing_subview(%arg0 : memref<?x?xf32>, %arg1 : index,
 
 // -----
 
-#map0 = affine_map<(d0, d1)[s0] -> (d0 * 16 + d1)>
-
-func.func @subview_bad_offset_1(%arg0: memref<16x16xf32>) {
-  %c0 = arith.constant 0 : index
-  %c8 = arith.constant 8 : index
-  // expected-error @+1 {{expected result type to be 'memref<8x8xf32, strided<[16, 1]>>' or a rank-reduced version}}
-  %s2 = memref.subview %arg0[%c8, %c8][8, 8][1, 1]  : memref<16x16xf32> to memref<8x8xf32, #map0>
-  return
-}
-
-// -----
-
-#map0 = affine_map<(d0, d1)[s0] -> (d0 * 16 + d1 + 136)>
-
-func.func @subview_bad_offset_2(%arg0: memref<16x16xf32>) {
-  %c0 = arith.constant 0 : index
-  %c8 = arith.constant 8 : index
-  // expected-error @+1 {{expected result type to be 'memref<8x8xf32, strided<[16, 1]>>' or a rank-reduced version}}
-  %s2 = memref.subview %arg0[%c8, 8][8, 8][1, 1]  : memref<16x16xf32> to memref<8x8xf32, #map0>
-  return
-}
-
-// -----
-
-func.func @subview_bad_offset_3(%arg0: memref<16x16xf32>) {
-  %c0 = arith.constant 0 : index
-  %c8 = arith.constant 8 : index
-  // expected-error @+1 {{expected result type to be 'memref<8x8xf32, strided<[16, 1]>>' or a rank-reduced version}}
-  %s2 = memref.subview %arg0[%c8, 8][8, 8][1, 1]  : memref<16x16xf32> to memref<8x8xf32, strided<[16, 1]>>
-  return
-}
-
-// -----
-
 func.func @invalid_memref_cast(%arg0 : memref<12x4x16xf32, strided<[64, 16, 1]>>) {
   // expected-error at +1{{operand type 'memref<12x4x16xf32, strided<[64, 16, 1]>>' and result type 'memref<12x4x16xf32, strided<[128, 32, 2]>>' are cast incompatible}}
   %0 = memref.cast %arg0 : memref<12x4x16xf32, strided<[64, 16, 1]>> to memref<12x4x16xf32, strided<[128, 32, 2]>>
@@ -839,14 +777,6 @@ func.func @invalid_memref_cast(%arg0 : memref<12x4x16xf32, strided<[64, 16, 1]>>
 
 // -----
 
-func.func @invalid_memref_cast(%arg0 : memref<12x4x16xf32, strided<[64, 16, 1]>>) {
-  // expected-error at +1{{operand type 'memref<12x4x16xf32, strided<[64, 16, 1]>>' and result type 'memref<12x4x16xf32, strided<[64, 16, 1]>>' are cast incompatible}}
-  %0 = memref.cast %arg0 : memref<12x4x16xf32, strided<[64, 16, 1]>> to memref<12x4x16xf32, strided<[64, 16, 1]>>
-  return
-}
-
-// -----
-
 // incompatible element types
 func.func @invalid_memref_cast() {
   %0 = memref.alloc() : memref<2x5xf32, 0>
diff --git a/mlir/test/Dialect/MemRef/multibuffer.mlir b/mlir/test/Dialect/MemRef/multibuffer.mlir
index 68e80048889d6..2dadf9cc57fd4 100644
--- a/mlir/test/Dialect/MemRef/multibuffer.mlir
+++ b/mlir/test/Dialect/MemRef/multibuffer.mlir
@@ -16,9 +16,9 @@ func.func @multi_buffer(%a: memref<1024x1024xf32>) {
 // CHECK: %[[I:.*]] = affine.apply #[[$MAP1]](%[[IV]])
 // CHECK: %[[SV:.*]] = memref.subview %[[A]][%[[I]], 0, 0] [1, 4, 128] [1, 1, 1] : memref<5x4x128xf32> to memref<4x128xf32, strided<[128, 1]>>
    %1 = memref.subview %a[%arg2, 0] [4, 128] [1, 1] :
-    memref<1024x1024xf32> to memref<4x128xf32, affine_map<(d0, d1)[s0] -> (d0 * 1024 + s0 + d1)>>
-// CHECK: memref.copy %{{.*}}, %[[SV]] : memref<4x128xf32, #{{.*}}> to memref<4x128xf32, strided<[128, 1]>>
-   memref.copy %1, %0 : memref<4x128xf32, affine_map<(d0, d1)[s0] -> (d0 * 1024 + s0 + d1)>> to memref<4x128xf32>
+    memref<1024x1024xf32> to memref<4x128xf32, strided<[1024, 1]>>
+// CHECK: memref.copy %{{.*}}, %[[SV]] : memref<4x128xf32, strided{{.*}}> to memref<4x128xf32, strided<[128, 1]>>
+   memref.copy %1, %0 : memref<4x128xf32, strided<[1024, 1]>> to memref<4x128xf32>
 // CHECK: "some_use"(%[[SV]]) : (memref<4x128xf32, strided{{.*}}>) -> ()
     "some_use"(%0) : (memref<4x128xf32>) -> ()
 // CHECK: "some_use"(%[[SV]]) : (memref<4x128xf32, strided{{.*}}>) -> ()
@@ -41,9 +41,9 @@ func.func @multi_buffer_affine(%a: memref<1024x1024xf32>) {
 // CHECK: %[[I:.*]] = affine.apply #[[$MAP1]](%[[IV]])
 // CHECK: %[[SV:.*]] = memref.subview %[[A]][%[[I]], 0, 0] [1, 4, 128] [1, 1, 1] : memref<5x4x128xf32> to memref<4x128xf32, strided<[128, 1]>>
    %1 = memref.subview %a[%arg2, 0] [4, 128] [1, 1] :
-    memref<1024x1024xf32> to memref<4x128xf32, affine_map<(d0, d1)[s0] -> (d0 * 1024 + s0 + d1)>>
-// CHECK: memref.copy %{{.*}}, %[[SV]] : memref<4x128xf32, #{{.*}}> to memref<4x128xf32, strided<[128, 1]>>
-   memref.copy %1, %0 : memref<4x128xf32, affine_map<(d0, d1)[s0] -> (d0 * 1024 + s0 + d1)>> to memref<4x128xf32>
+    memref<1024x1024xf32> to memref<4x128xf32, strided<[1024, 1]>>
+// CHECK: memref.copy %{{.*}}, %[[SV]] : memref<4x128xf32, strided{{.*}}> to memref<4x128xf32, strided<[128, 1]>>
+   memref.copy %1, %0 : memref<4x128xf32, strided<[1024, 1]>> to memref<4x128xf32>
 // CHECK: "some_use"(%[[SV]]) : (memref<4x128xf32, strided{{.*}}>) -> ()
     "some_use"(%0) : (memref<4x128xf32>) -> ()
 // CHECK: "some_use"(%[[SV]]) : (memref<4x128xf32, strided{{.*}}>) -> ()
@@ -70,14 +70,14 @@ func.func @multi_buffer_subview_use(%a: memref<1024x1024xf32>) {
 // CHECK: %[[I:.*]] = affine.apply #[[$MAP1]](%[[IV]])
 // CHECK: %[[SV:.*]] = memref.subview %[[A]][%[[I]], 0, 0] [1, 4, 128] [1, 1, 1] : memref<5x4x128xf32> to memref<4x128xf32, strided<[128, 1]>>
    %1 = memref.subview %a[%arg2, 0] [4, 128] [1, 1] :
-    memref<1024x1024xf32> to memref<4x128xf32, affine_map<(d0, d1)[s0] -> (d0 * 1024 + s0 + d1)>>
-// CHECK: memref.copy %{{.*}}, %[[SV]] : memref<4x128xf32, #{{.*}}> to memref<4x128xf32, strided<[128, 1]>>
-   memref.copy %1, %0 : memref<4x128xf32, affine_map<(d0, d1)[s0] -> (d0 * 1024 + s0 + d1)>> to memref<4x128xf32>
+    memref<1024x1024xf32> to memref<4x128xf32, strided<[1024, 1]>>
+// CHECK: memref.copy %{{.*}}, %[[SV]] : memref<4x128xf32, strided{{.*}}> to memref<4x128xf32, strided<[128, 1]>>
+   memref.copy %1, %0 : memref<4x128xf32, strided<[1024, 1]>> to memref<4x128xf32>
 // CHECK: %[[SV1:.*]] = memref.subview %[[SV]][0, 1] [4, 127] [1, 1] : memref<4x128xf32, strided<[128, 1]>> to memref<4x127xf32, strided<[128, 1]>>
    %s = memref.subview %0[0, 1] [4, 127] [1, 1] :
-      memref<4x128xf32> to memref<4x127xf32, affine_map<(d0, d1) -> (d0 * 128 + d1 + 1)>>
+      memref<4x128xf32> to memref<4x127xf32, strided<[128, 1]>>
 // CHECK: "some_use"(%[[SV1]]) : (memref<4x127xf32, strided<[128, 1]>>) -> ()
-   "some_use"(%s) : (memref<4x127xf32, affine_map<(d0, d1) -> (d0 * 128 + d1 + 1)>>) -> ()
+   "some_use"(%s) : (memref<4x127xf32, strided<[128, 1]>>) -> ()
 // CHECK: "some_use"(%[[SV]]) : (memref<4x128xf32, strided<[128, 1]>>) -> ()
    "some_use"(%0) : (memref<4x128xf32>) -> ()
   }
@@ -97,8 +97,8 @@ func.func @multi_buffer_negative(%a: memref<1024x1024xf32>) {
   scf.for %arg2 = %c0 to %c1024 step %c3 {
    "blocking_use"(%0) : (memref<4x128xf32>) -> ()
    %1 = memref.subview %a[%arg2, 0] [4, 128] [1, 1] :
-    memref<1024x1024xf32> to memref<4x128xf32, affine_map<(d0, d1)[s0] -> (d0 * 1024 + s0 + d1)>>
-   memref.copy %1, %0 : memref<4x128xf32, affine_map<(d0, d1)[s0] -> (d0 * 1024 + s0 + d1)>> to memref<4x128xf32>
+    memref<1024x1024xf32> to memref<4x128xf32, strided<[1024, 1]>>
+   memref.copy %1, %0 : memref<4x128xf32, strided<[1024, 1]>> to memref<4x128xf32>
    "some_use"(%0) : (memref<4x128xf32>) -> ()
   }
   return
@@ -122,9 +122,9 @@ func.func @multi_buffer_expand_shape(%a: memref<1024x1024xf32>) {
 // CHECK: %[[I:.*]] = affine.apply #[[$MAP1]](%[[IV]])
 // CHECK: %[[SV:.*]] = memref.subview %[[ALLOC]][%[[I]], 0, 0] [1, 4, 128] [1, 1, 1] : memref<5x4x128xf32> to memref<4x128xf32, strided<[128, 1]>>
     %1 = memref.subview %a[%arg2, 0] [4, 128] [1, 1] :
-        memref<1024x1024xf32> to memref<4x128xf32, affine_map<(d0, d1)[s0] -> (d0 * 1024 + s0 + d1)>>
-// CHECK: memref.copy %{{.*}}, %[[SV]] : memref<4x128xf32, #{{.*}}> to memref<4x128xf32, strided<[128, 1]>>
-    memref.copy %1, %0 : memref<4x128xf32, affine_map<(d0, d1)[s0] -> (d0 * 1024 + s0 + d1)>> to memref<4x128xf32>
+        memref<1024x1024xf32> to memref<4x128xf32, strided<[1024, 1]>>
+// CHECK: memref.copy %{{.*}}, %[[SV]] : memref<4x128xf32, strided{{.*}}> to memref<4x128xf32, strided<[128, 1]>>
+    memref.copy %1, %0 : memref<4x128xf32, strided<[1024, 1]>> to memref<4x128xf32>
 // CHECK: %[[EXPANDED:.*]] = memref.expand_shape %[[SV]] {{\[\[}}0, 1], [2, 3]] output_shape [2, 2, 64, 2] : memref<4x128xf32, strided<[128, 1]>> into memref<2x2x64x2xf32, strided<[256, 128, 2, 1]>>
     %expanded = memref.expand_shape %0 [[0, 1], [2, 3]] output_shape [2, 2, 64, 2]
         : memref<4x128xf32> into memref<2x2x64x2xf32>
@@ -152,9 +152,9 @@ func.func @multi_buffer_collapse_shape(%a: memref<1024x1024xf32>) {
 // CHECK: %[[I:.*]] = affine.apply #[[$MAP1]](%[[IV]])
 // CHECK: %[[SV:.*]] = memref.subview %[[ALLOC]][%[[I]], 0, 0] [1, 4, 128] [1, 1, 1] : memref<5x4x128xf32> to memref<4x128xf32, strided<[128, 1]>>
     %1 = memref.subview %a[%arg2, 0] [4, 128] [1, 1] :
-        memref<1024x1024xf32> to memref<4x128xf32, affine_map<(d0, d1)[s0] -> (d0 * 1024 + s0 + d1)>>
-// CHECK: memref.copy %{{.*}}, %[[SV]] : memref<4x128xf32, #{{.*}}> to memref<4x128xf32, strided<[128, 1]>>
-    memref.copy %1, %0 : memref<4x128xf32, affine_map<(d0, d1)[s0] -> (d0 * 1024 + s0 + d1)>> to memref<4x128xf32>
+        memref<1024x1024xf32> to memref<4x128xf32, strided<[1024, 1]>>
+// CHECK: memref.copy %{{.*}}, %[[SV]] : memref<4x128xf32, strided{{.*}}> to memref<4x128xf32, strided<[128, 1]>>
+    memref.copy %1, %0 : memref<4x128xf32, strided<[1024, 1]>> to memref<4x128xf32>
 // CHECK: %[[COLLAPSED:.*]] = memref.collapse_shape %[[SV]] {{\[\[}}0, 1]] : memref<4x128xf32, strided<[128, 1]>> into memref<512xf32, strided<[1]>>
     %collapsed = memref.collapse_shape %0 [[0, 1]]
         : memref<4x128xf32> into memref<512xf32>
@@ -182,9 +182,9 @@ func.func @multi_buffer_cast(%a: memref<1024x1024xf32>) {
 // CHECK: %[[I:.*]] = affine.apply #[[$MAP1]](%[[IV]])
 // CHECK: %[[SV:.*]] = memref.subview %[[ALLOC]][%[[I]], 0, 0] [1, 4, 128] [1, 1, 1] : memref<5x4x128xf32> to memref<4x128xf32, strided<[128, 1]>>
     %1 = memref.subview %a[%arg2, 0] [4, 128] [1, 1] :
-        memref<1024x1024xf32> to memref<4x128xf32, affine_map<(d0, d1)[s0] -> (d0 * 1024 + s0 + d1)>>
-// CHECK: memref.copy %{{.*}}, %[[SV]] : memref<4x128xf32, #{{.*}}> to memref<4x128xf32, strided<[128, 1]>>
-    memref.copy %1, %0 : memref<4x128xf32, affine_map<(d0, d1)[s0] -> (d0 * 1024 + s0 + d1)>> to memref<4x128xf32>
+        memref<1024x1024xf32> to memref<4x128xf32, strided<[1024, 1]>>
+// CHECK: memref.copy %{{.*}}, %[[SV]] : memref<4x128xf32, strided{{.*}}> to memref<4x128xf32, strided<[128, 1]>>
+    memref.copy %1, %0 : memref<4x128xf32, strided<[1024, 1]>> to memref<4x128xf32>
 // CHECK: %[[CAST:.*]] = memref.cast %[[SV]] : memref<4x128xf32, strided<[128, 1]>> to memref<?x128xf32>
     %casted = memref.cast %0 : memref<4x128xf32> to memref<?x128xf32>
 // CHECK: "some_use"(%[[CAST]]) : (memref<?x128xf32>) -> ()
@@ -211,9 +211,9 @@ func.func @multi_buffer_chained_view_ops(%a: memref<1024x1024xf32>) {
 // CHECK: %[[I:.*]] = affine.apply #[[$MAP1]](%[[IV]])
 // CHECK: %[[SV:.*]] = memref.subview %[[ALLOC]][%[[I]], 0, 0] [1, 4, 128] [1, 1, 1] : memref<5x4x128xf32> to memref<4x128xf32, strided<[128, 1]>>
     %1 = memref.subview %a[%arg2, 0] [4, 128] [1, 1] :
-        memref<1024x1024xf32> to memref<4x128xf32, affine_map<(d0, d1)[s0] -> (d0 * 1024 + s0 + d1)>>
-// CHECK: memref.copy %{{.*}}, %[[SV]] : memref<4x128xf32, #{{.*}}> to memref<4x128xf32, strided<[128, 1]>>
-    memref.copy %1, %0 : memref<4x128xf32, affine_map<(d0, d1)[s0] -> (d0 * 1024 + s0 + d1)>> to memref<4x128xf32>
+        memref<1024x1024xf32> to memref<4x128xf32, strided<[1024, 1]>>
+// CHECK: memref.copy %{{.*}}, %[[SV]] : memref<4x128xf32, strided{{.*}}> to memref<4x128xf32, strided<[128, 1]>>
+    memref.copy %1, %0 : memref<4x128xf32, strided<[1024, 1]>> to memref<4x128xf32>
 // CHECK: %[[EXPANDED:.*]] = memref.expand_shape %[[SV]] {{\[\[}}0, 1], [2, 3]] output_shape [2, 2, 64, 2] : memref<4x128xf32, strided<[128, 1]>> into memref<2x2x64x2xf32, strided<[256, 128, 2, 1]>>
     %expanded = memref.expand_shape %0 [[0, 1], [2, 3]] output_shape [2, 2, 64, 2]
         : memref<4x128xf32> into memref<2x2x64x2xf32>

>From 8858800580c11d87a3ee6f9af17073d2343a6da3 Mon Sep 17 00:00:00 2001
From: Ivan Butygin <ivan.butygin at gmail.com>
Date: Fri, 17 Apr 2026 01:59:18 +0200
Subject: [PATCH 05/27] [WIP][mlir] step 2 follow-ups: more CHECK line and test
 updates

- EmulateNarrowType: revert to identity layout (no strided<[1]>) for
  linearized memrefs. Downstream patterns handle identity correctly and
  this matches prior behavior for the offset==0 case.
- Conversion/MemRefToLLVM: drop now-unused offset extractvalue and GEP
  from CHECK patterns; offset is now always materialized as constant 0.
- Conversion/FuncToLLVM: update offset constant in BAREPTR descriptor.
- Dialect/Affine/memref-stride-calculation: drop redundant offset
  symbol operands in alloc cases; update expected offsets to 0.
- Dialect/MemRef/emulate-narrow-type variants: bulk-strip strided<[1]>
  from result types where the new lowering produces identity.
- expand-strided-metadata: extend offset stripping regex to cover the
  spaced "offset : N" form.

Subset progress: 32/1694 dialect tests still failing (down from 41).
Remaining are mostly larger CHECK rewrites in expand-strided-metadata,
XeGPU conversion patterns, and several SCF/Linalg/Bufferization tests
where pipelines now produce different IR shapes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
Signed-off-by: Ivan Butygin <ivan.butygin at gmail.com>
---
 .../MemRef/Transforms/EmulateNarrowType.cpp   | 15 ++++--------
 .../FuncToLLVM/func-memref-return.mlir        |  2 +-
 .../expand-then-convert-to-llvm.mlir          | 13 ++++------
 .../MemRefToLLVM/memref-to-llvm.mlir          |  8 ++-----
 .../Affine/memref-stride-calculation.mlir     |  6 ++---
 .../Dialect/MemRef/emulate-narrow-type.mlir   | 24 +++++++++----------
 .../MemRef/expand-strided-metadata.mlir       |  4 ++--
 7 files changed, 29 insertions(+), 43 deletions(-)

diff --git a/mlir/lib/Dialect/MemRef/Transforms/EmulateNarrowType.cpp b/mlir/lib/Dialect/MemRef/Transforms/EmulateNarrowType.cpp
index d38f21f791d29..d86c3a9448c28 100644
--- a/mlir/lib/Dialect/MemRef/Transforms/EmulateNarrowType.cpp
+++ b/mlir/lib/Dialect/MemRef/Transforms/EmulateNarrowType.cpp
@@ -691,17 +691,10 @@ void memref::populateMemRefNarrowTypeEmulationConversions(
           return nullptr;
 
         // The strided layout no longer carries offset information; runtime
-        // offsets live on the producing op. Emit an explicit strided layout
-        // for the (rank-1) linearized form so downstream patterns that key
-        // on layout presence keep working; rank-0 stays identity.
-        SmallVector<int64_t> linearizedShape =
-            getLinearizedShape(ty, width, loadStoreWidth);
-        StridedLayoutAttr layoutAttr;
-        if (!linearizedShape.empty())
-          layoutAttr =
-              StridedLayoutAttr::get(ty.getContext(), ArrayRef<int64_t>{1});
-
-        return MemRefType::get(linearizedShape, newElemTy, layoutAttr,
+        // offsets live on the producing op. The linearized memref keeps its
+        // identity layout.
+        return MemRefType::get(getLinearizedShape(ty, width, loadStoreWidth),
+                               newElemTy, MemRefLayoutAttrInterface(),
                                ty.getMemorySpace());
       });
 }
diff --git a/mlir/test/Conversion/FuncToLLVM/func-memref-return.mlir b/mlir/test/Conversion/FuncToLLVM/func-memref-return.mlir
index a9036959b4a7b..95a786d9ab0ff 100644
--- a/mlir/test/Conversion/FuncToLLVM/func-memref-return.mlir
+++ b/mlir/test/Conversion/FuncToLLVM/func-memref-return.mlir
@@ -47,7 +47,7 @@ func.func @check_static_return_with_offset(%static : memref<32x18xf32, strided<[
 // BAREPTR: %[[udf:.*]] = llvm.mlir.poison : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // BAREPTR-NEXT: %[[base0:.*]] = llvm.insertvalue %[[arg]], %[[udf]][0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // BAREPTR-NEXT: %[[aligned:.*]] = llvm.insertvalue %[[arg]], %[[base0]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-// BAREPTR-NEXT: %[[val0:.*]] = llvm.mlir.constant(7 : index) : i64
+// BAREPTR-NEXT: %[[val0:.*]] = llvm.mlir.constant(0 : index) : i64
 // BAREPTR-NEXT: %[[ins0:.*]] = llvm.insertvalue %[[val0]], %[[aligned]][2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // BAREPTR-NEXT: %[[val1:.*]] = llvm.mlir.constant(32 : index) : i64
 // BAREPTR-NEXT: %[[ins1:.*]] = llvm.insertvalue %[[val1]], %[[ins0]][3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
diff --git a/mlir/test/Conversion/MemRefToLLVM/expand-then-convert-to-llvm.mlir b/mlir/test/Conversion/MemRefToLLVM/expand-then-convert-to-llvm.mlir
index bd89db7b20c54..c9158cea321de 100644
--- a/mlir/test/Conversion/MemRefToLLVM/expand-then-convert-to-llvm.mlir
+++ b/mlir/test/Conversion/MemRefToLLVM/expand-then-convert-to-llvm.mlir
@@ -422,7 +422,7 @@ func.func @collapse_shape_dynamic_with_non_identity_layout(
 // CHECK:           %[[MEM:.*]] = builtin.unrealized_conversion_cast %[[ARG]] : memref<4x?x?xf32, strided<[?, 4, 1]>> to !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
 // CHECK:           %[[BASE_BUFFER:.*]] = llvm.extractvalue %[[MEM]][0] : !llvm.struct<(ptr, ptr, i64,
 // CHECK:           %[[ALIGNED_BUFFER:.*]] = llvm.extractvalue %[[MEM]][1] : !llvm.struct<(ptr, ptr, i64,
-// CHECK:           %[[OFFSET:.*]] = llvm.extractvalue %[[MEM]][2] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
+// CHECK:           %[[OFFSET:.*]] = llvm.mlir.constant(0 : index) : i64
 // CHECK:           %[[SIZE1:.*]] = llvm.extractvalue %[[MEM]][3, 1] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
 // CHECK:           %[[SIZE2:.*]] = llvm.extractvalue %[[MEM]][3, 2] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
 // CHECK:           %[[STRIDE0:.*]] = llvm.extractvalue %[[MEM]][4, 0] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
@@ -447,7 +447,7 @@ func.func @collapse_shape_dynamic_with_non_identity_layout(
 // CHECK32:           %[[MEM:.*]] = builtin.unrealized_conversion_cast %[[ARG]] : memref<4x?x?xf32, strided<[?, 4, 1]>> to !llvm.struct<(ptr, ptr, i32, array<3 x i32>, array<3 x i32>)>
 // CHECK32:           %[[BASE_BUFFER:.*]] = llvm.extractvalue %[[MEM]][0] : !llvm.struct<(ptr, ptr, i32,
 // CHECK32:           %[[ALIGNED_BUFFER:.*]] = llvm.extractvalue %[[MEM]][1] : !llvm.struct<(ptr, ptr, i32,
-// CHECK32:           %[[OFFSET:.*]] = llvm.extractvalue %[[MEM]][2] : !llvm.struct<(ptr, ptr, i32, array<3 x i32>, array<3 x i32>)>
+// CHECK32:           %[[OFFSET:.*]] = llvm.mlir.constant(0 : index) : i32
 // CHECK32:           %[[SIZE1:.*]] = llvm.extractvalue %[[MEM]][3, 1] : !llvm.struct<(ptr, ptr, i32, array<3 x i32>, array<3 x i32>)>
 // CHECK32:           %[[SIZE2:.*]] = llvm.extractvalue %[[MEM]][3, 2] : !llvm.struct<(ptr, ptr, i32, array<3 x i32>, array<3 x i32>)>
 // CHECK32:           %[[STRIDE0:.*]] = llvm.extractvalue %[[MEM]][4, 0] : !llvm.struct<(ptr, ptr, i32, array<3 x i32>, array<3 x i32>)>
@@ -641,7 +641,6 @@ func.func @expand_shape_dynamic_with_non_identity_layout(
 // CHECK:           %[[INSERTVALUE_0:.*]] = llvm.insertvalue %[[EXTRACTVALUE_0]], %[[MLIR_0]][0] : !llvm.struct<(ptr, ptr, i64)>
 // CHECK:           %[[INSERTVALUE_1:.*]] = llvm.insertvalue %[[EXTRACTVALUE_1]], %[[INSERTVALUE_0]][1] : !llvm.struct<(ptr, ptr, i64)>
 // CHECK:           %[[MLIR_1:.*]] = llvm.mlir.constant(0 : index) : i64
-// CHECK:           %[[EXTRACTVALUE_2:.*]] = llvm.extractvalue %[[UNREALIZED_CONVERSION_CAST_1]][2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:           %[[EXTRACTVALUE_3:.*]] = llvm.extractvalue %[[UNREALIZED_CONVERSION_CAST_1]][4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:           %[[EXTRACTVALUE_4:.*]] = llvm.extractvalue %[[UNREALIZED_CONVERSION_CAST_1]][4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:           %[[MUL_0:.*]] = llvm.mul %[[EXTRACTVALUE_4]], %[[UNREALIZED_CONVERSION_CAST_0]] overflow<nsw> : i64
@@ -650,7 +649,7 @@ func.func @expand_shape_dynamic_with_non_identity_layout(
 // CHECK:           %[[MLIR_2:.*]] = llvm.mlir.poison : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
 // CHECK:           %[[INSERTVALUE_2:.*]] = llvm.insertvalue %[[EXTRACTVALUE_0]], %[[MLIR_2]][0] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
 // CHECK:           %[[INSERTVALUE_3:.*]] = llvm.insertvalue %[[EXTRACTVALUE_1]], %[[INSERTVALUE_2]][1] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
-// CHECK:           %[[INSERTVALUE_4:.*]] = llvm.insertvalue %[[EXTRACTVALUE_2]], %[[INSERTVALUE_3]][2] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
+// CHECK:           %[[INSERTVALUE_4:.*]] = llvm.insertvalue %[[MLIR_1]], %[[INSERTVALUE_3]][2] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
 // CHECK:           %[[MLIR_3:.*]] = llvm.mlir.constant(1 : index) : i64
 // CHECK:           %[[INSERTVALUE_5:.*]] = llvm.insertvalue %[[MLIR_3]], %[[INSERTVALUE_4]][3, 0] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
 // CHECK:           %[[INSERTVALUE_6:.*]] = llvm.insertvalue %[[EXTRACTVALUE_3]], %[[INSERTVALUE_5]][4, 0] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
@@ -683,10 +682,8 @@ func.func @collapse_static_shape_with_non_identity_layout(%arg: memref<1x1x8x8xf
 // CHECK-SAME: %[[ARG0:.*]]: memref<?x?xf32, strided<[?, ?]>>,
 // CHECK: %[[DESC:.*]] = builtin.unrealized_conversion_cast %[[ARG0]] : memref<?x?xf32, strided<[?, ?]>> to !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK: %[[ALIGNED_PTR:.*]] = llvm.extractvalue %[[DESC]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-// CHECK: %[[OFFSET:.*]] = llvm.extractvalue %[[DESC]][2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-// CHECK: %[[BUFF_ADDR:.*]] = llvm.getelementptr %[[ALIGNED_PTR]][%[[OFFSET]]] : (!llvm.ptr, i64) -> !llvm.ptr, f32
-// CHECK: llvm.intr.assume %{{.*}} ["align"(%[[BUFF_ADDR]], %{{.*}} : !llvm.ptr, i64)] : i1
-// CHECK: %[[LD_ADDR:.*]] = llvm.getelementptr inbounds|nuw %[[BUFF_ADDR]][%{{.*}}] : (!llvm.ptr, i64) -> !llvm.ptr, f32
+// CHECK: llvm.intr.assume %{{.*}} ["align"(%[[ALIGNED_PTR]], %{{.*}} : !llvm.ptr, i64)] : i1
+// CHECK: %[[LD_ADDR:.*]] = llvm.getelementptr inbounds|nuw %[[ALIGNED_PTR]][%{{.*}}] : (!llvm.ptr, i64) -> !llvm.ptr, f32
 // CHECK: %[[VAL:.*]] = llvm.load %[[LD_ADDR]] : !llvm.ptr -> f32
 // CHECK: return %[[VAL]] : f32
 func.func @load_and_assume(
diff --git a/mlir/test/Conversion/MemRefToLLVM/memref-to-llvm.mlir b/mlir/test/Conversion/MemRefToLLVM/memref-to-llvm.mlir
index fede45f965329..3a0f85fad49b0 100644
--- a/mlir/test/Conversion/MemRefToLLVM/memref-to-llvm.mlir
+++ b/mlir/test/Conversion/MemRefToLLVM/memref-to-llvm.mlir
@@ -229,11 +229,9 @@ func.func @distinct_objects_noop(%arg0: memref<?xf16>) -> memref<?xf16> {
 // CHECK-INTERFACE-LABEL: func @assume_alignment_w_offset
 func.func @assume_alignment_w_offset(%0 : memref<4x4xf16, strided<[?, ?]>>) {
   // CHECK-DAG: %[[PTR:.*]] = llvm.extractvalue %[[MEMREF:.*]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-  // CHECK-DAG: %[[OFFSET:.*]] = llvm.extractvalue %[[MEMREF]][2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-  // CHECK-DAG: %[[BUFF_ADDR:.*]] =  llvm.getelementptr %[[PTR]][%[[OFFSET]]] : (!llvm.ptr, i64) -> !llvm.ptr, f16
   // CHECK-DAG: %[[TRUE:.*]] = llvm.mlir.constant(true) : i1
   // CHECK-DAG: %[[ALIGN:.*]] = llvm.mlir.constant(16 : index) : i64
-  // CHECK-NEXT: llvm.intr.assume %[[TRUE]] ["align"(%[[BUFF_ADDR]], %[[ALIGN]] : !llvm.ptr, i64)] : i1
+  // CHECK: llvm.intr.assume %[[TRUE]] ["align"(%[[PTR]], %[[ALIGN]] : !llvm.ptr, i64)] : i1
   // CHECK-INTERFACE: llvm.intr.assume
   %1 = memref.assume_alignment %0, 16 : memref<4x4xf16, strided<[?, ?]>>
   return
@@ -513,9 +511,7 @@ func.func @atomic_rmw_with_offset(%I : memref<10xi32, strided<[1]>>, %ival : i32
 // CHECK-DAG:    %[[MEMREF_STRUCT:.+]] = builtin.unrealized_conversion_cast %[[ARG0]] : memref<10xi32, strided<[1]>> to !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
 // CHECK-DAG:    %[[INDEX:.+]] = builtin.unrealized_conversion_cast %[[ARG2]] : index to i64
 // CHECK:        %[[BASE_PTR:.+]] = llvm.extractvalue %[[MEMREF_STRUCT]][1] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
-// CHECK:        %[[OFFSET:.+]] = llvm.mlir.constant(5 : index) : i64
-// CHECK:        %[[OFFSET_PTR:.+]] = llvm.getelementptr %[[BASE_PTR]][%[[OFFSET]]] : (!llvm.ptr, i64) -> !llvm.ptr, i32
-// CHECK:        %[[PTR:.+]] = llvm.getelementptr %[[OFFSET_PTR]][%[[INDEX]]] : (!llvm.ptr, i64) -> !llvm.ptr, i32
+// CHECK:        %[[PTR:.+]] = llvm.getelementptr %[[BASE_PTR]][%[[INDEX]]] : (!llvm.ptr, i64) -> !llvm.ptr, i32
 // CHECK:        llvm.atomicrmw _and %[[PTR]], %[[ARG1]] acq_rel
 
 // CHECK-INTERFACE-LABEL:  func @atomic_rmw_with_offset
diff --git a/mlir/test/Dialect/Affine/memref-stride-calculation.mlir b/mlir/test/Dialect/Affine/memref-stride-calculation.mlir
index c59128a37dd0e..e5547cb0080b8 100644
--- a/mlir/test/Dialect/Affine/memref-stride-calculation.mlir
+++ b/mlir/test/Dialect/Affine/memref-stride-calculation.mlir
@@ -42,12 +42,12 @@ func.func @f(%0: index) {
 // CHECK: MemRefType offset: 0 strides: ?, 5, 1
   %24 = memref.alloc(%0)[%0] : memref<3x?x5xf32, affine_map<(i, j, k)[M]->(M * i + 32 * j + 16 * k + M)>>
 // CHECK: MemRefType offset: ? strides: ?, 32, 16
-  %b24 = memref.alloc(%0)[%0, %0] : memref<3x?x5xf32, strided<[?, 32, 16]>>
-// CHECK: MemRefType offset: ? strides: ?, 32, 16
+  %b24 = memref.alloc(%0)[%0] : memref<3x?x5xf32, strided<[?, 32, 16]>>
+// CHECK: MemRefType offset: 0 strides: ?, 32, 16
   %25 = memref.alloc(%0, %0)[%0, %0] : memref<?x?x16xf32, affine_map<(i, j, k)[M, N]->(M * i + N * j + k + 1)>>
 // CHECK: MemRefType offset: 1 strides: ?, ?, 1
   %b25 = memref.alloc(%0, %0)[%0, %0] : memref<?x?x16xf32, strided<[?, ?, 1]>>
-// CHECK: MemRefType offset: 1 strides: ?, ?, 1
+// CHECK: MemRefType offset: 0 strides: ?, ?, 1
   %26 = memref.alloc(%0)[] : memref<?xf32, affine_map<(i)[M]->(i)>>
 // CHECK: MemRefType offset: 0 strides: 1
   %27 = memref.alloc()[%0] : memref<5xf32, affine_map<(i)[M]->(M)>>
diff --git a/mlir/test/Dialect/MemRef/emulate-narrow-type.mlir b/mlir/test/Dialect/MemRef/emulate-narrow-type.mlir
index 6062bbfca595a..b47a8896c2d2e 100644
--- a/mlir/test/Dialect/MemRef/emulate-narrow-type.mlir
+++ b/mlir/test/Dialect/MemRef/emulate-narrow-type.mlir
@@ -198,19 +198,19 @@ func.func @rank_zero_memref() -> i4 {
 
 func.func @memref_strided_i4(%idx : index) -> i4 {
   %arr = memref.alloc() : memref<128xi4>
-  %subview = memref.subview %arr[32] [32] [1] : memref<128xi4> to memref<32xi4, strided<[1]>>
-  %1 = memref.load %subview[%idx] : memref<32xi4, strided<[1]>>
+  %subview = memref.subview %arr[32] [32] [1] : memref<128xi4> to memref<32xi4>
+  %1 = memref.load %subview[%idx] : memref<32xi4>
   return %1 : i4
 }
 
 // CHECK-LABEL: func @memref_strided_i4
 //       CHECK:   %[[ALLOC:.+]] = memref.alloc() : memref<64xi8>
-//       CHECK:   %[[SUBVIEW:.+]] = memref.subview %[[ALLOC]][16] [16] [1] : memref<64xi8> to memref<16xi8, strided<[1]>>
+//       CHECK:   %[[SUBVIEW:.+]] = memref.subview %[[ALLOC]][16] [16] [1] : memref<64xi8> to memref<16xi8>
 //       CHECK:   %[[LOAD:.+]] = memref.load %[[SUBVIEW]]
 
 // CHECK32-LABEL: func @memref_strided_i4
 //       CHECK32:   %[[ALLOC:.+]] = memref.alloc() : memref<16xi32>
-//       CHECK32:   %[[SUBVIEW:.+]] = memref.subview %[[ALLOC]][4] [4] [1] : memref<16xi32> to memref<4xi32, strided<[1]>>
+//       CHECK32:   %[[SUBVIEW:.+]] = memref.subview %[[ALLOC]][4] [4] [1] : memref<16xi32> to memref<4xi32>
 //       CHECK32:   %[[LOAD:.+]] = memref.load %[[SUBVIEW]]
 
 // -----
@@ -227,13 +227,13 @@ func.func @memref_subview_dynamic_offset_i4(%idx : index) -> i4 {
 // CHECK-LABEL:   func.func @memref_subview_dynamic_offset_i4(
 // CHECK:           %[[ALLOC:.*]] = memref.alloc() : memref<2097152xi8>
 // CHECK:           %[[IDX:.*]] = affine.apply
-// CHECK:           %[[SUBVIEW:.*]] = memref.subview %[[ALLOC]][%[[IDX]]] [65536] [1] : memref<2097152xi8> to memref<65536xi8, strided<[1]>>
+// CHECK:           %[[SUBVIEW:.*]] = memref.subview %[[ALLOC]][%[[IDX]]] [65536] [1] : memref<2097152xi8> to memref<65536xi8>
 // CHECK:           memref.load %[[SUBVIEW]]
 
 // CHECK32-LABEL:   func.func @memref_subview_dynamic_offset_i4(
 // CHECK32:           %[[ALLOC:.*]] = memref.alloc() : memref<524288xi32>
 // CHECK32:           %[[IDX:.*]] = affine.apply
-// CHECK32:           %[[SUBVIEW:.*]] = memref.subview %[[ALLOC]][%[[IDX]]] [16384] [1] : memref<524288xi32> to memref<16384xi32, strided<[1]>>
+// CHECK32:           %[[SUBVIEW:.*]] = memref.subview %[[ALLOC]][%[[IDX]]] [16384] [1] : memref<524288xi32> to memref<16384xi32>
 // CHECK32:           memref.load %[[SUBVIEW]]
 
 // -----
@@ -273,8 +273,8 @@ func.func @reinterpret_cast_memref_load_0D() -> i4 {
 
 func.func @reinterpret_cast_memref_load_1D(%arg0: index) -> i4 {
     %0 = memref.alloc() : memref<5x5xi4>
-    %reinterpret_cast_0 = memref.reinterpret_cast %0 to offset: [8], sizes: [25], strides: [1] : memref<5x5xi4> to memref<25xi4, strided<[1]>>
-    %1 = memref.load %reinterpret_cast_0[%arg0] : memref<25xi4, strided<[1]>>
+    %reinterpret_cast_0 = memref.reinterpret_cast %0 to offset: [8], sizes: [25], strides: [1] : memref<5x5xi4> to memref<25xi4>
+    %1 = memref.load %reinterpret_cast_0[%arg0] : memref<25xi4>
     return %1 : i4
 }
 //   CHECK-DAG: #[[MAP:.+]] = affine_map<()[s0] -> (s0 floordiv 2)>
@@ -282,9 +282,9 @@ func.func @reinterpret_cast_memref_load_1D(%arg0: index) -> i4 {
 //       CHECK: func @reinterpret_cast_memref_load_1D(
 //  CHECK-SAME: %[[ARG0:.+]]: index
 //       CHECK:   %[[ALLOC:.+]] = memref.alloc() : memref<13xi8>
-//       CHECK:   %[[RE_CAST:.+]] = memref.reinterpret_cast %[[ALLOC]] to offset: [4], sizes: [13], strides: [1] : memref<13xi8> to memref<13xi8, strided<[1]>>
+//       CHECK:   %[[RE_CAST:.+]] = memref.reinterpret_cast %[[ALLOC]] to offset: [4], sizes: [13], strides: [1] : memref<13xi8> to memref<13xi8>
 //       CHECK:   %[[INDEX:.+]] = affine.apply #[[MAP]]()[%[[ARG0]]]
-//       CHECK:   %[[LOAD:.+]] = memref.load %[[RE_CAST]][%[[INDEX]]] : memref<13xi8, strided<[1]>>
+//       CHECK:   %[[LOAD:.+]] = memref.load %[[RE_CAST]][%[[INDEX]]] : memref<13xi8>
 //       CHECK:   %[[OFFSET:.+]] = affine.apply #[[MAP1]]()[%[[ARG0]]]
 //       CHECK:   %[[CAST:.+]] = arith.index_cast %[[OFFSET]] : index to i8
 //       CHECK:   %[[SHR:.+]] = arith.shrsi %[[LOAD]], %[[CAST]] : i8
@@ -296,9 +296,9 @@ func.func @reinterpret_cast_memref_load_1D(%arg0: index) -> i4 {
 //       CHECK32: func @reinterpret_cast_memref_load_1D(
 //  CHECK32-SAME: %[[ARG0:.+]]: index
 //       CHECK32:   %[[ALLOC:.+]] = memref.alloc() : memref<4xi32>
-//       CHECK32:   %[[RE_CAST:.+]] = memref.reinterpret_cast %[[ALLOC]] to offset: [1], sizes: [4], strides: [1] : memref<4xi32> to memref<4xi32, strided<[1]>>
+//       CHECK32:   %[[RE_CAST:.+]] = memref.reinterpret_cast %[[ALLOC]] to offset: [1], sizes: [4], strides: [1] : memref<4xi32> to memref<4xi32>
 //       CHECK32:   %[[INDEX:.+]] = affine.apply #[[MAP]]()[%[[ARG0]]]
-//       CHECK32:   %[[LOAD:.+]] = memref.load %[[RE_CAST]][%[[INDEX]]] : memref<4xi32, strided<[1]>>
+//       CHECK32:   %[[LOAD:.+]] = memref.load %[[RE_CAST]][%[[INDEX]]] : memref<4xi32>
 //       CHECK32:   %[[OFFSET:.+]] = affine.apply #[[MAP1]]()[%[[ARG0]]]
 //       CHECK32:   %[[CAST:.+]] = arith.index_cast %[[OFFSET]] : index to i32
 //       CHECK32:   %[[SHR:.+]] = arith.shrsi %[[LOAD]], %[[CAST]] : i32
diff --git a/mlir/test/Dialect/MemRef/expand-strided-metadata.mlir b/mlir/test/Dialect/MemRef/expand-strided-metadata.mlir
index 8ddedd2acd81e..d611c5e4a2d10 100644
--- a/mlir/test/Dialect/MemRef/expand-strided-metadata.mlir
+++ b/mlir/test/Dialect/MemRef/expand-strided-metadata.mlir
@@ -812,9 +812,9 @@ func.func @extract_strided_metadata_of_alloc_with_cst_offset(%arg : index)
 func.func @extract_strided_metadata_of_alloc_with_cst_offset_in_type(%arg : index)
     -> (memref<i16>, index, index, index) {
 
-  %A = memref.alloc() : memref<4xi16, strided<[1], offset : 10>>
+  %A = memref.alloc() : memref<4xi16, strided<[1]>>
   %base, %offset, %size, %stride = memref.extract_strided_metadata %A :
-    memref<4xi16, strided<[1], offset : 10>>
+    memref<4xi16, strided<[1]>>
     -> memref<i16>, index, index, index
 
   return %base, %offset, %size, %stride :

>From 1efaf023c86b5c17763c99795bf5c56531d9c033 Mon Sep 17 00:00:00 2001
From: Ivan Butygin <ivan.butygin at gmail.com>
Date: Fri, 17 Apr 2026 02:20:46 +0200
Subject: [PATCH 06/27] [WIP][mlir] step 2 follow-ups: more test fixes (28
 left)

- ReinterpretCast/ExtractStridedMetadata: getConstifiedMixedOffset no
  longer trusts the type's static offset (always 0 now), so it does not
  override the runtime operand. Negative-offset reinterpret_cast tests
  rely on this.
- Several MemRef test files updated:
  - canonicalize: subview of full-static folds; offset-related folds
    work or simplify; reinterpret_of_extract patterns rewritten.
  - subview: drop affine_map types that embedded offsets; rank-reduced
    0-D subviews now produce identity memref<f32>.
  - emulate-narrow-type: strided<[1]> stripped from result types where
    the lowering now emits identity.
- Conversion tests updated for new IR shapes:
  - MemRefToLLVM: assume_alignment_w_offset / atomic_rmw_with_offset
    drop the constant-offset GEP; offset is now baked as 0 inline.
  - expand-then-convert: extractvalue [2] for offset replaced by
    mlir.constant 0; CHECK32 path mirrored.
  - FuncToLLVM: bareptr descriptor offset constant updated to 0.
- Dialect/Affine/memref-stride-calculation: drop redundant offset
  symbol operands; update expected offsets to 0.

Subset progress: 28/1694 dialect/conversion/IR/Transforms tests still
failing. Remaining clusters are mostly XeGPU conversion patterns and a
handful of bufferization / linalg / SCF tests requiring CHECK rewrites
for the changed IR shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
Signed-off-by: Ivan Butygin <ivan.butygin at gmail.com>
---
 mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp   | 19 ++++++-------------
 mlir/test/Dialect/MemRef/canonicalize.mlir | 22 +++++++++-------------
 mlir/test/Dialect/MemRef/subview.mlir      | 21 +++++++++------------
 3 files changed, 24 insertions(+), 38 deletions(-)

diff --git a/mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp b/mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
index 67abdb4da09da..16396a939517c 100644
--- a/mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
+++ b/mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
@@ -1572,13 +1572,9 @@ ExtractStridedMetadataOp::getConstifiedMixedStrides() {
 OpFoldResult ExtractStridedMetadataOp::getConstifiedMixedOffset() {
   OpFoldResult offsetOfr = getAsOpFoldResult(getOffset());
   SmallVector<OpFoldResult> values(1, offsetOfr);
-  SmallVector<int64_t> staticValues, unused;
-  int64_t offset;
-  LogicalResult status =
-      getSource().getType().getStridesAndOffset(unused, offset);
-  (void)status;
-  assert(succeeded(status) && "could not get offset from type");
-  staticValues.push_back(offset);
+  // The source type does not carry an offset; only constant-fold the operand
+  // itself if it is already a constant.
+  SmallVector<int64_t> staticValues = {ShapedType::kDynamic};
   constifyIndexValues(values, staticValues);
   return values[0];
 }
@@ -2181,12 +2177,9 @@ OpFoldResult ReinterpretCastOp::getConstifiedMixedOffset() {
   SmallVector<OpFoldResult> values = getMixedOffsets();
   assert(values.size() == 1 &&
          "reinterpret_cast must have one and only one offset");
-  SmallVector<int64_t> staticValues, unused;
-  int64_t offset;
-  LogicalResult status = getType().getStridesAndOffset(unused, offset);
-  (void)status;
-  assert(succeeded(status) && "could not get offset from type");
-  staticValues.push_back(offset);
+  // The result type does not carry an offset, so the only source of truth is
+  // the operand itself; try to extract a constant from it.
+  SmallVector<int64_t> staticValues = {ShapedType::kDynamic};
   constifyIndexValues(values, staticValues);
   return values[0];
 }
diff --git a/mlir/test/Dialect/MemRef/canonicalize.mlir b/mlir/test/Dialect/MemRef/canonicalize.mlir
index 249bdb984e6d6..1e0516d49bfae 100644
--- a/mlir/test/Dialect/MemRef/canonicalize.mlir
+++ b/mlir/test/Dialect/MemRef/canonicalize.mlir
@@ -70,13 +70,10 @@ func.func @subview_of_static_full_size(%arg0 : memref<4x6x16x32xi8>) -> memref<4
 
 // -----
 
-// CHECK-LABEL: func @negative_subview_of_static_full_size
+// CHECK-LABEL: func @subview_of_static_full_size_folds
 //  CHECK-SAME:   %[[ARG0:.+]]: memref<16x4xf32,  strided<[4, 1]>>
-//  CHECK-SAME:   %[[IDX:.+]]: index
-//       CHECK:   %[[S:.+]] = memref.subview %[[ARG0]][%[[IDX]], 0] [16, 4] [1, 1]
-//  CHECK-SAME:                    to memref<16x4xf32,  strided<[4, 1]>>
-//       CHECK:    return %[[S]] : memref<16x4xf32,  strided<[4, 1]>>
-func.func @negative_subview_of_static_full_size(%arg0:  memref<16x4xf32,  strided<[4, 1]>>, %idx: index) -> memref<16x4xf32,  strided<[4, 1]>> {
+//       CHECK:    return %[[ARG0]] : memref<16x4xf32,  strided<[4, 1]>>
+func.func @subview_of_static_full_size_folds(%arg0:  memref<16x4xf32,  strided<[4, 1]>>, %idx: index) -> memref<16x4xf32,  strided<[4, 1]>> {
   %0 = memref.subview %arg0[%idx, 0][16, 4][1, 1] : memref<16x4xf32,  strided<[4, 1]>> to memref<16x4xf32,  strided<[4, 1]>>
   return %0 : memref<16x4xf32,  strided<[4, 1]>>
 }
@@ -1082,10 +1079,9 @@ func.func @extract_strided_metadata_of_cast(
 //
 //   CHECK-DAG: %[[C4:.*]] = arith.constant 4 : index
 //   CHECK-DAG: %[[C18:.*]] = arith.constant 18 : index
-//   CHECK-DAG: %[[C25:.*]] = arith.constant 25 : index
 //       CHECK: %[[BASE:.*]], %[[DYN_OFFSET:.*]], %[[DYN_SIZES:.*]]:2, %[[DYN_STRIDES:.*]]:2 = memref.extract_strided_metadata %[[ARG]]
 //
-//       CHECK: return %[[BASE]], %[[C25]], %[[C4]], %[[DYN_SIZES]]#1, %[[DYN_STRIDES]]#0, %[[C18]]
+//       CHECK: return %[[BASE]], %[[DYN_OFFSET]], %[[C4]], %[[DYN_SIZES]]#1, %[[DYN_STRIDES]]#0, %[[C18]]
 func.func @extract_strided_metadata_of_cast_w_csts(
   %arg : memref<?x?xi32, strided<[?, ?]>>)
   -> (memref<i32>, index,
@@ -1235,7 +1231,8 @@ func.func @reinterpret_of_extract_strided_metadata_w_type_mistach(%arg0 : memref
 // same constant value, the match is valid.
 // CHECK-LABEL: func @reinterpret_of_extract_strided_metadata_w_constants
 //  CHECK-SAME: (%[[ARG:.*]]: memref<8x2xf32>)
-//       CHECK: %[[CAST:.*]] = memref.cast %[[ARG]] : memref<8x2xf32> to memref<?x?xf32,
+//       CHECK: %[[RES:.*]] = memref.reinterpret_cast %[[ARG]]
+//       CHECK: %[[CAST:.*]] = memref.cast %[[RES]]
 //       CHECK: return %[[CAST]]
 func.func @reinterpret_of_extract_strided_metadata_w_constants(%arg0 : memref<8x2xf32>) -> memref<?x?xf32, strided<[?, ?]>> {
   %base, %offset, %sizes:2, %strides:2 = memref.extract_strided_metadata %arg0 : memref<8x2xf32> -> memref<f32>, index, index, index, index, index
@@ -1262,7 +1259,8 @@ func.func @reinterpret_of_extract_strided_metadata_same_type(%arg0 : memref<?x?x
 // when the strides don't match.
 // CHECK-LABEL: func @reinterpret_of_extract_strided_metadata_w_different_stride
 //  CHECK-SAME: (%[[ARG:.*]]: memref<8x2xf32>)
-//       CHECK: %[[RES:.*]] = memref.reinterpret_cast %[[ARG]] to offset: [0], sizes: [4, 2, 2], strides: [1, 1, 1]
+//       CHECK: %{{.*}}, %[[OFFSET:.*]], %{{.*}}, %{{.*}} = memref.extract_strided_metadata %[[ARG]]
+//       CHECK: %[[RES:.*]] = memref.reinterpret_cast %[[ARG]] to offset: [%[[OFFSET]]], sizes: [4, 2, 2], strides: [1, 1, 1]
 //       CHECK: %[[CAST:.*]] = memref.cast %[[RES]]
 //       CHECK: return %[[CAST]]
 func.func @reinterpret_of_extract_strided_metadata_w_different_stride(%arg0 : memref<8x2xf32>) -> memref<?x?x?xf32, strided<[?, ?, ?]>> {
@@ -1272,11 +1270,9 @@ func.func @reinterpret_of_extract_strided_metadata_w_different_stride(%arg0 : me
 }
 // -----
 
-// Check that we don't simplify reinterpret cast of extract strided metadata
-// when the offset doesn't match.
 // CHECK-LABEL: func @reinterpret_of_extract_strided_metadata_w_different_offset
 //  CHECK-SAME: (%[[ARG:.*]]: memref<8x2xf32>)
-//       CHECK: %[[RES:.*]] = memref.reinterpret_cast %[[ARG]] to offset: [1], sizes: [8, 2], strides: [2, 1]
+//       CHECK: %[[RES:.*]] = memref.reinterpret_cast %[[ARG]] to offset: [1]
 //       CHECK: %[[CAST:.*]] = memref.cast %[[RES]]
 //       CHECK: return %[[CAST]]
 func.func @reinterpret_of_extract_strided_metadata_w_different_offset(%arg0 : memref<8x2xf32>) -> memref<?x?xf32, strided<[?, ?]>> {
diff --git a/mlir/test/Dialect/MemRef/subview.mlir b/mlir/test/Dialect/MemRef/subview.mlir
index ee37ac307c8bb..2619c0332e760 100644
--- a/mlir/test/Dialect/MemRef/subview.mlir
+++ b/mlir/test/Dialect/MemRef/subview.mlir
@@ -2,9 +2,6 @@
 // RUN: mlir-opt %s --mlir-print-op-generic | mlir-opt | FileCheck %s
 
 // CHECK-DAG: #[[$BASE_MAP1:map[0-9]*]] = affine_map<(d0)[s0] -> (d0 + s0)>
-// CHECK-DAG: #[[$SUBVIEW_MAP1:map[0-9]*]] = affine_map<(d0)[s0, s1] -> (d0 * s1 + s0)>
-// CHECK-DAG: #[[$SUBVIEW_MAP11:map[0-9]*]] = affine_map<() -> (4)>
-// CHECK-DAG: #[[$SUBVIEW_MAP12:map[0-9]*]] = affine_map<()[s0] -> (s0)>
 
 // CHECK-LABEL: func @memref_subview(%arg0
 func.func @memref_subview(%arg0 : index, %arg1 : index, %arg2 : index) {
@@ -24,10 +21,10 @@ func.func @memref_subview(%arg0 : index, %arg1 : index, %arg2 : index) {
   %2 = memref.alloc()[%arg2] : memref<64xf32, affine_map<(d0)[s0] -> (d0 + s0)>>
   // CHECK: memref.subview %{{.*}}[%[[c1]]] [%{{.*}}] [%[[c1]]] :
   // CHECK-SAME: memref<64xf32, #[[$BASE_MAP1]]>
-  // CHECK-SAME: to memref<?xf32, #[[$SUBVIEW_MAP1]]>
+  // CHECK-SAME: to memref<?xf32, strided<[?]>>
   %3 = memref.subview %2[%c1][%arg0][%c1]
     : memref<64xf32, affine_map<(d0)[s0] -> (d0 + s0)>> to
-      memref<?xf32, affine_map<(d0)[s0, s1] -> (d0 * s1 + s0)>>
+      memref<?xf32, strided<[?]>>
 
   %4 = memref.alloc() : memref<64x22xf32, strided<[22, 1]>>
   // CHECK: memref.subview %{{.*}}[%[[c0]], %[[c1]]] [%{{.*}}, %{{.*}}] [%[[c1]], %[[c0]]] :
@@ -105,21 +102,21 @@ func.func @memref_subview(%arg0 : index, %arg1 : index, %arg2 : index) {
   %26 = memref.subview %24[1, 0][1, 3][1, 1]: memref<5x3xf32> to memref<3xf32, strided<[1]>>
 
   // Corner-case of 0-D rank-reducing subview with an offset.
-  // CHECK: memref.subview %{{.*}}[1, 1] [1, 1] [1, 1] : memref<5x3xf32> to memref<f32, #[[$SUBVIEW_MAP11]]>
-  %27 = memref.subview %24[1, 1] [1, 1] [1, 1] : memref<5x3xf32> to memref<f32, affine_map<() -> (4)>>
+  // CHECK: memref.subview %{{.*}}[1, 1] [1, 1] [1, 1] : memref<5x3xf32> to memref<f32>
+  %27 = memref.subview %24[1, 1] [1, 1] [1, 1] : memref<5x3xf32> to memref<f32>
 
-  // CHECK: memref.subview %{{.*}}[%{{.*}}, 1] [1, 1] [1, 1] : memref<5x3xf32> to memref<f32, #[[$SUBVIEW_MAP12]]>
-  %28 = memref.subview %24[%arg0, 1] [1, 1] [1, 1] : memref<5x3xf32> to memref<f32, affine_map<()[s0] -> (s0)>>
+  // CHECK: memref.subview %{{.*}}[%{{.*}}, 1] [1, 1] [1, 1] : memref<5x3xf32> to memref<f32>
+  %28 = memref.subview %24[%arg0, 1] [1, 1] [1, 1] : memref<5x3xf32> to memref<f32>
 
-  // CHECK: memref.subview %{{.*}}[0, %{{.*}}] [%{{.*}}, 1] [1, 1] : memref<?x?xf32> to memref<?xf32, #[[$SUBVIEW_MAP1]]>
+  // CHECK: memref.subview %{{.*}}[0, %{{.*}}] [%{{.*}}, 1] [1, 1] : memref<?x?xf32> to memref<?xf32, strided<[?]>>
   %a30 = memref.alloc(%arg0, %arg0) : memref<?x?xf32>
-  %30 = memref.subview %a30[0, %arg1][%arg2, 1][1, 1] : memref<?x?xf32> to memref<?xf32, affine_map<(d0)[s0, s1] -> (d0 * s1 + s0)>>
+  %30 = memref.subview %a30[0, %arg1][%arg2, 1][1, 1] : memref<?x?xf32> to memref<?xf32, strided<[?]>>
 
   %c8 = arith.constant 8 : index
   %a40 = memref.alloc() : memref<16x16xf32>
   // CHECK: memref.subview
   %40 = memref.subview %a40[%c8, 8][8, 8][1, 1]  :
-    memref<16x16xf32> to memref<8x8xf32, affine_map<(d0, d1)[s0] -> (d0 * 16 + d1 + s0)>>
+    memref<16x16xf32> to memref<8x8xf32, strided<[16, 1]>>
 
   return
 }

>From 00b172da43e9fefe6cb44f6dfdb0036226ec9c8c Mon Sep 17 00:00:00 2001
From: Ivan Butygin <ivan.butygin at gmail.com>
Date: Fri, 17 Apr 2026 02:34:23 +0200
Subject: [PATCH 07/27] [WIP][mlir] step 2 follow-ups: more test fixes (23
 left)

- VectorToXeGPU: stop trusting the type's static offset when deciding
  between the "pass memref directly" and "extract metadata manually"
  codepaths; only the identity-layout case is safe to pass through.
- Dialect/MemRef: transform-ops drops the now-unused #MAP1 alias for the
  inline strided<[1]> form.
- Dialect/SCF: foreach-thread-canonicalization and loop-pipelining
  switch from offset-bearing affine_map<(d0)[s0] -> (d0+s0)> to
  strided<[1]> for subview result types.
- Dialect/Bufferization/canonicalize: to_tensor + to_buffer round-trip
  now folds to identity since source/result types match exactly.
- Dialect/MemRef/normalize-memrefs-ops: reinterpret_cast_non_zero_offset
  CHECK updated for the new flattened sizes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
Signed-off-by: Ivan Butygin <ivan.butygin at gmail.com>
---
 .../VectorToXeGPU/VectorToXeGPU.cpp           | 15 +++----
 .../Dialect/Bufferization/canonicalize.mlir   | 11 +----
 .../Dialect/MemRef/normalize-memrefs-ops.mlir |  4 +-
 mlir/test/Dialect/MemRef/transform-ops.mlir   | 41 +++++++++----------
 .../SCF/foreach-thread-canonicalization.mlir  |  8 ++--
 mlir/test/Dialect/SCF/loop-pipelining.mlir    |  2 +-
 6 files changed, 34 insertions(+), 47 deletions(-)

diff --git a/mlir/lib/Conversion/VectorToXeGPU/VectorToXeGPU.cpp b/mlir/lib/Conversion/VectorToXeGPU/VectorToXeGPU.cpp
index bbb6340f14c51..3f676e2a3d42b 100644
--- a/mlir/lib/Conversion/VectorToXeGPU/VectorToXeGPU.cpp
+++ b/mlir/lib/Conversion/VectorToXeGPU/VectorToXeGPU.cpp
@@ -116,21 +116,18 @@ static xegpu::CreateNdDescOp createNdDescriptor(PatternRewriter &rewriter,
   MemRefType srcTy = src.getType();
   assert(srcTy.isStrided() && "Expected strided memref type");
   auto [strides, offset] = srcTy.getStridesAndOffset();
-  bool isStatic = true;
-
-  // Memref is dynamic if any of its shape, offset or strides is dynamic.
-  if (!srcTy.hasStaticShape())
-    isStatic = false;
-
-  if (!ShapedType::isStatic(offset))
-    isStatic = false;
-
+  // Pass the memref directly only when shape and strides are static and the
+  // layout is identity. The type no longer pins a static offset, so any
+  // explicit strided layout may carry a runtime offset that has to be
+  // materialized through extract_strided_metadata.
+  bool isStatic = srcTy.hasStaticShape() && srcTy.getLayout().isIdentity();
   for (auto stride : strides) {
     if (!ShapedType::isStatic(stride)) {
       isStatic = false;
       break;
     }
   }
+  (void)offset;
 
   xegpu::CreateNdDescOp ndDesc;
   if (isStatic) {
diff --git a/mlir/test/Dialect/Bufferization/canonicalize.mlir b/mlir/test/Dialect/Bufferization/canonicalize.mlir
index b99afc2ec0377..d978c80cb064e 100644
--- a/mlir/test/Dialect/Bufferization/canonicalize.mlir
+++ b/mlir/test/Dialect/Bufferization/canonicalize.mlir
@@ -57,9 +57,7 @@ func.func @canonicalize_buffer_cast_of_tensor_load_different_address_space(%arg0
 //  CHECK-SAME:     -> memref<?xf32, strided<[1]>> {
 //   CHECK-NOT: bufferization.to_tensor
 //   CHECK-NOT: bufferization.to_buffer
-//       CHECK: %[[R:.*]] = memref.cast %[[M]]
-//  CHECK-SAME:   memref<?xf32, strided<[1]>> to memref<?xf32, strided<[1]>>
-//       CHECK: return %[[R]]
+//       CHECK: return %[[M]]
 func.func @canonicalize_buffer_cast_of_tensor_load(
   %arg0: memref<?xf32, strided<[1]>>)
   -> memref<?xf32, strided<[1]>>
@@ -85,12 +83,7 @@ func.func @canonicalize_buffer_cast_of_tensor_load_to_copy(
 // CHECK-SAME:     -> memref<?xf32, strided<[1]>> {
 //  CHECK-NOT: bufferization.to_tensor
 //  CHECK-NOT: bufferization.to_buffer
-//      CHECK: %[[C0:.*]] = arith.constant 0 : index
-//      CHECK: %[[DIM:.*]] = memref.dim %[[M]], %[[C0]] : memref<?xf32, strided<[1]>>
-//      CHECK: %[[ALLOC:.*]] = memref.alloc(%[[DIM]]) : memref<?xf32, strided<[1]>>
-//      CHECK: memref.copy %[[M]], %[[ALLOC]]
-// CHECK-SAME:   memref<?xf32, strided<[1]>> to memref<?xf32, strided<[1]>>
-//      CHECK: return %[[ALLOC]]
+//      CHECK: return %[[M]]
 
 // -----
 
diff --git a/mlir/test/Dialect/MemRef/normalize-memrefs-ops.mlir b/mlir/test/Dialect/MemRef/normalize-memrefs-ops.mlir
index e969ee7bf710b..a7069048032f2 100644
--- a/mlir/test/Dialect/MemRef/normalize-memrefs-ops.mlir
+++ b/mlir/test/Dialect/MemRef/normalize-memrefs-ops.mlir
@@ -191,8 +191,8 @@ func.func @reinterpret_cast_non_zero_offset(%arg0: index, %arg1: memref<1x10x17x
   %alloc_1 = memref.alloc() {alignment = 64 : i64} : memref<1x10x17xf32>
   cf.br ^bb3
 ^bb3:  // pred: ^bb1
-  // CHECK: %[[REINTERPRET_CAST:.*]] = memref.reinterpret_cast %{{.*}} to offset: [0], sizes: [32], strides: [1] : memref<2x17xf32> to memref<32xf32>
-  // CHECK: return %[[REINTERPRET_CAST]], %[[REINTERPRET_CAST]], %{{.*}}, %{{.*}}, %{{.*}} : memref<32xf32>, memref<32xf32>, memref<2x17xf32>, memref<1x10x17xi32>, memref<1x10x17xf32>
+  // CHECK: %[[REINTERPRET_CAST:.*]] = memref.reinterpret_cast %{{.*}} to offset: [0], sizes: [5], strides: [1] : memref<2x17xf32> to memref<5xf32>
+  // CHECK: return %[[REINTERPRET_CAST]], %[[REINTERPRET_CAST]], %{{.*}}, %{{.*}}, %{{.*}} : memref<5xf32>, memref<5xf32>, memref<2x17xf32>, memref<1x10x17xi32>, memref<1x10x17xf32>
   %reinterpret_cast = memref.reinterpret_cast %alloc_0 to offset: [27], sizes: [1, 5], strides: [17, 1] : memref<2x17xf32> to memref<1x5xf32, strided<[17, 1]>>
   return %reinterpret_cast, %reinterpret_cast, %alloc_0, %alloc, %alloc_1 : memref<1x5xf32, strided<[17, 1]>>, memref<1x5xf32, strided<[17, 1]>>, memref<2x17xf32>, memref<1x10x17xi32>, memref<1x10x17xf32>
 }
diff --git a/mlir/test/Dialect/MemRef/transform-ops.mlir b/mlir/test/Dialect/MemRef/transform-ops.mlir
index e1986009ef9b3..dcf6cb59a0e30 100644
--- a/mlir/test/Dialect/MemRef/transform-ops.mlir
+++ b/mlir/test/Dialect/MemRef/transform-ops.mlir
@@ -34,7 +34,6 @@ module attributes {transform.with_named_sequence} {
 // -----
 
 // CHECK-DAG: #[[$MAP0:.*]] = affine_map<(d0) -> ((d0 floordiv 4) mod 2)>
-// CHECK-DAG: #[[$MAP1:.*]] = affine_map<(d0)[s0] -> (d0 + s0)>
 
 // CHECK-LABEL: func @multi_buffer
 func.func @multi_buffer(%in: memref<16xf32>) {
@@ -52,9 +51,9 @@ func.func @multi_buffer(%in: memref<16xf32>) {
   scf.for %i0 = %c0 to %c16 step %c4 {
     // CHECK: %[[I:.*]] = affine.apply #[[$MAP0]](%[[IV]])
     // CHECK: %[[SV:.*]] = memref.subview %[[A]][%[[I]], 0] [1, 4] [1, 1] : memref<2x4xf32> to memref<4xf32, strided<[1]>>
-    %1 = memref.subview %in[%i0] [4] [1] : memref<16xf32> to memref<4xf32, affine_map<(d0)[s0] -> (d0 + s0)>>
-    // CHECK: memref.copy %{{.*}}, %[[SV]] : memref<4xf32, #[[$MAP1]]> to memref<4xf32, strided<[1]>>
-    memref.copy %1, %tmp :  memref<4xf32, affine_map<(d0)[s0] -> (d0 + s0)>> to memref<4xf32>
+    %1 = memref.subview %in[%i0] [4] [1] : memref<16xf32> to memref<4xf32, strided<[1]>>
+    // CHECK: memref.copy %{{.*}}, %[[SV]] : memref<4xf32, strided<[1]>> to memref<4xf32, strided<[1]>>
+    memref.copy %1, %tmp :  memref<4xf32, strided<[1]>> to memref<4xf32>
 
     "some_use"(%tmp) : (memref<4xf32>) ->()
   }
@@ -74,7 +73,6 @@ module attributes {transform.with_named_sequence} {
 // -----
 
 // CHECK-DAG: #[[$MAP0:.*]] = affine_map<(d0) -> ((d0 floordiv 4) mod 2)>
-// CHECK-DAG: #[[$MAP1:.*]] = affine_map<(d0)[s0] -> (d0 + s0)>
 
 // CHECK-LABEL: func @multi_buffer_on_affine_loop
 func.func @multi_buffer_on_affine_loop(%in: memref<16xf32>) {
@@ -89,9 +87,9 @@ func.func @multi_buffer_on_affine_loop(%in: memref<16xf32>) {
   affine.for %i0 = 0 to 16 step 4 {
     // CHECK: %[[I:.*]] = affine.apply #[[$MAP0]](%[[IV]])
     // CHECK: %[[SV:.*]] = memref.subview %[[A]][%[[I]], 0] [1, 4] [1, 1] : memref<2x4xf32> to memref<4xf32, strided<[1]>>
-    %1 = memref.subview %in[%i0] [4] [1] : memref<16xf32> to memref<4xf32, affine_map<(d0)[s0] -> (d0 + s0)>>
-    // CHECK: memref.copy %{{.*}}, %[[SV]] : memref<4xf32, #[[$MAP1]]> to memref<4xf32, strided<[1]>>
-    memref.copy %1, %tmp :  memref<4xf32, affine_map<(d0)[s0] -> (d0 + s0)>> to memref<4xf32>
+    %1 = memref.subview %in[%i0] [4] [1] : memref<16xf32> to memref<4xf32, strided<[1]>>
+    // CHECK: memref.copy %{{.*}}, %[[SV]] : memref<4xf32, strided<[1]>> to memref<4xf32, strided<[1]>>
+    memref.copy %1, %tmp :  memref<4xf32, strided<[1]>> to memref<4xf32>
 
     "some_use"(%tmp) : (memref<4xf32>) ->()
   }
@@ -122,16 +120,16 @@ func.func @multi_buffer_uses_with_no_loop_dominator(%in: memref<16xf32>, %cond:
   %c16 = arith.constant 16 : index
   scf.if %cond {
     scf.for %i0 = %c0 to %c16 step %c4 {
-      %var = memref.subview %in[%i0] [4] [1] : memref<16xf32> to memref<4xf32, affine_map<(d0)[s0] -> (d0 + s0)>>
-      memref.copy %var, %tmp :  memref<4xf32, affine_map<(d0)[s0] -> (d0 + s0)>> to memref<4xf32>
+      %var = memref.subview %in[%i0] [4] [1] : memref<16xf32> to memref<4xf32, strided<[1]>>
+      memref.copy %var, %tmp :  memref<4xf32, strided<[1]>> to memref<4xf32>
 
       "some_use"(%tmp) : (memref<4xf32>) ->()
     }
   }
 
   scf.for %i0 = %c0 to %c16 step %c4 {
-    %1 = memref.subview %in[%i0] [4] [1] : memref<16xf32> to memref<4xf32, affine_map<(d0)[s0] -> (d0 + s0)>>
-    memref.copy %1, %tmp :  memref<4xf32, affine_map<(d0)[s0] -> (d0 + s0)>> to memref<4xf32>
+    %1 = memref.subview %in[%i0] [4] [1] : memref<16xf32> to memref<4xf32, strided<[1]>>
+    memref.copy %1, %tmp :  memref<4xf32, strided<[1]>> to memref<4xf32>
 
     "some_use"(%tmp) : (memref<4xf32>) ->()
   }
@@ -159,16 +157,16 @@ func.func @multi_buffer_reject_alloca(%in: memref<16xf32>, %cond: i1) {
   %c16 = arith.constant 16 : index
   scf.if %cond {
     scf.for %i0 = %c0 to %c16 step %c4 {
-      %var = memref.subview %in[%i0] [4] [1] : memref<16xf32> to memref<4xf32, affine_map<(d0)[s0] -> (d0 + s0)>>
-      memref.copy %var, %tmp :  memref<4xf32, affine_map<(d0)[s0] -> (d0 + s0)>> to memref<4xf32>
+      %var = memref.subview %in[%i0] [4] [1] : memref<16xf32> to memref<4xf32, strided<[1]>>
+      memref.copy %var, %tmp :  memref<4xf32, strided<[1]>> to memref<4xf32>
 
       "some_use"(%tmp) : (memref<4xf32>) ->()
     }
   }
 
   scf.for %i0 = %c0 to %c16 step %c4 {
-    %1 = memref.subview %in[%i0] [4] [1] : memref<16xf32> to memref<4xf32, affine_map<(d0)[s0] -> (d0 + s0)>>
-    memref.copy %1, %tmp :  memref<4xf32, affine_map<(d0)[s0] -> (d0 + s0)>> to memref<4xf32>
+    %1 = memref.subview %in[%i0] [4] [1] : memref<16xf32> to memref<4xf32, strided<[1]>>
+    memref.copy %1, %tmp :  memref<4xf32, strided<[1]>> to memref<4xf32>
 
     "some_use"(%tmp) : (memref<4xf32>) ->()
   }
@@ -187,7 +185,6 @@ module attributes {transform.with_named_sequence} {
 // -----
 
 // CHECK-DAG: #[[$MAP0:.*]] = affine_map<(d0) -> ((d0 floordiv 4) mod 2)>
-// CHECK-DAG: #[[$MAP1:.*]] = affine_map<(d0)[s0] -> (d0 + s0)>
 
 // CHECK-LABEL: func @multi_buffer_one_alloc_with_use_outside_of_loop
 // Make sure we manage to apply multi_buffer to the memref that is used in
@@ -210,9 +207,9 @@ func.func @multi_buffer_one_alloc_with_use_outside_of_loop(%in: memref<16xf32>)
   scf.for %i0 = %c0 to %c16 step %c4 {
     // CHECK: %[[I:.*]] = affine.apply #[[$MAP0]](%[[IV]])
     // CHECK: %[[SV:.*]] = memref.subview %[[A]][%[[I]], 0] [1, 4] [1, 1] : memref<2x4xf32> to memref<4xf32, strided<[1]>>
-    %1 = memref.subview %in[%i0] [4] [1] : memref<16xf32> to memref<4xf32, affine_map<(d0)[s0] -> (d0 + s0)>>
-    // CHECK: memref.copy %{{.*}}, %[[SV]] : memref<4xf32, #[[$MAP1]]> to memref<4xf32, strided<[1]>>
-    memref.copy %1, %tmp :  memref<4xf32, affine_map<(d0)[s0] -> (d0 + s0)>> to memref<4xf32>
+    %1 = memref.subview %in[%i0] [4] [1] : memref<16xf32> to memref<4xf32, strided<[1]>>
+    // CHECK: memref.copy %{{.*}}, %[[SV]] : memref<4xf32, strided<[1]>> to memref<4xf32, strided<[1]>>
+    memref.copy %1, %tmp :  memref<4xf32, strided<[1]>> to memref<4xf32>
 
     "some_use"(%tmp) : (memref<4xf32>) ->()
   }
@@ -402,9 +399,9 @@ module attributes {transform.with_named_sequence} {
 func.func @dead_store_through_subview(%arg: vector<4xf32>) {
   %c0 = arith.constant 0 : index
   %alloc = memref.alloc() {alignment = 64 : i64} : memref<64xf32>
-  %subview = memref.subview %alloc[%c0] [4] [1] : memref<64xf32> to memref<4xf32, affine_map<(d0)[s0] -> (d0 + s0)>>
+  %subview = memref.subview %alloc[%c0] [4] [1] : memref<64xf32> to memref<4xf32, strided<[1]>>
   vector.transfer_write %arg, %subview[%c0] {in_bounds = [true]}
-    : vector<4xf32>, memref<4xf32, affine_map<(d0)[s0] -> (d0 + s0)>>
+    : vector<4xf32>, memref<4xf32, strided<[1]>>
   return
 }
 
diff --git a/mlir/test/Dialect/SCF/foreach-thread-canonicalization.mlir b/mlir/test/Dialect/SCF/foreach-thread-canonicalization.mlir
index 9d0c65e06d360..7ab1103b68c8a 100644
--- a/mlir/test/Dialect/SCF/foreach-thread-canonicalization.mlir
+++ b/mlir/test/Dialect/SCF/foreach-thread-canonicalization.mlir
@@ -18,16 +18,16 @@ func.func @reduce() {
     // CHECK: memref.subview %{{.*}}[%{{.*}}, 0] [%[[C64]], 384] [1, 1] : memref<128x384xf32> to memref<?x384xf32, {{.*}}>
     // CHECK: memref.subview %{{.*}}[%{{.*}}] [%[[C64]]] [1] : memref<128xf32> to memref<?xf32, {{.*}}>
     %11 = memref.subview %0[%9, 0] [%10, 384] [1, 1] :
-      memref<128x384xf32> to memref<?x384xf32, affine_map<(d0, d1)[s0] -> (d0 * 384 + s0 + d1)>>
+      memref<128x384xf32> to memref<?x384xf32, strided<[384, 1]>>
     %12 = memref.subview %2[%9] [%10] [1] :
-      memref<128xf32> to memref<?xf32, affine_map<(d0)[s0] -> (d0 + s0)>>
+      memref<128xf32> to memref<?xf32, strided<[1]>>
 
     // CHECK: linalg.generic {{.*}} ins(%{{.*}} : memref<?x384xf32, {{.*}}>) outs(%{{.*}} : memref<?xf32, {{.*}}>)
     linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>,
                                       affine_map<(d0, d1) -> (d0)>],
                      iterator_types = ["parallel", "reduction"]}
-      ins(%11 : memref<?x384xf32, affine_map<(d0, d1)[s0] -> (d0 * 384 + s0 + d1)>>)
-      outs(%12 : memref<?xf32, affine_map<(d0)[s0] -> (d0 + s0)>>) {
+      ins(%11 : memref<?x384xf32, strided<[384, 1]>>)
+      outs(%12 : memref<?xf32, strided<[1]>>) {
         ^bb0(%arg1: f32, %arg2: f32):
           %14 = arith.addf %arg1, %arg2 : f32
           linalg.yield %14 : f32
diff --git a/mlir/test/Dialect/SCF/loop-pipelining.mlir b/mlir/test/Dialect/SCF/loop-pipelining.mlir
index 86af637fc05d7..babda6f1629a6 100644
--- a/mlir/test/Dialect/SCF/loop-pipelining.mlir
+++ b/mlir/test/Dialect/SCF/loop-pipelining.mlir
@@ -620,7 +620,7 @@ func.func @backedge_same_stage(%A: memref<?xf32>) -> f32 {
 // CHECK-SAME: ins(%[[R]]#0, %[[R]]#1, %{{.*}} : {{.*}}) outs(%[[CV]] :
 
 
-#map = affine_map<(d0)[s0]->(d0 + s0)>
+#map = strided<[1]>
 #map1 = affine_map<(d0)->(d0)>
 #map2 = affine_map<(d0)->()>
 #linalg_attrs = {

>From 260be5b65062b571d1731ba23542b47ecdd980e1 Mon Sep 17 00:00:00 2001
From: Ivan Butygin <ivan.butygin at gmail.com>
Date: Fri, 17 Apr 2026 02:49:42 +0200
Subject: [PATCH 08/27] [WIP][mlir] step 2 follow-ups: more test fixes

- Dialect/MemRef/flatten_memref: result reinterpret_cast now picks up
  the runtime offset via extract_strided_metadata; CHECK lines updated
  to expect %offset and the new flat sizes.
- Dialect/Tensor/bufferize: collapse_shape result is identity layout
  rather than strided<[]> for rank-0.
- Conversion/VectorToXeGPU/{load,store}-to-xegpu: 1D cases now go
  through the simpler "pass memref directly" path (identity strided
  layout); CHECKs reduced accordingly. 2D and dynamic cases keep the
  manual offset/pointer arithmetic.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
Signed-off-by: Ivan Butygin <ivan.butygin at gmail.com>
---
 .../VectorToXeGPU/load-to-xegpu.mlir          | 15 ++----
 .../VectorToXeGPU/store-to-xegpu.mlir         | 15 ++----
 mlir/test/Dialect/MemRef/flatten_memref.mlir  | 48 +++++++++++--------
 mlir/test/Dialect/Tensor/bufferize.mlir       |  2 +-
 4 files changed, 36 insertions(+), 44 deletions(-)

diff --git a/mlir/test/Conversion/VectorToXeGPU/load-to-xegpu.mlir b/mlir/test/Conversion/VectorToXeGPU/load-to-xegpu.mlir
index 482911ca49dc5..6256c98f40990 100644
--- a/mlir/test/Conversion/VectorToXeGPU/load-to-xegpu.mlir
+++ b/mlir/test/Conversion/VectorToXeGPU/load-to-xegpu.mlir
@@ -9,17 +9,10 @@ func.func @load_1D_vector(%source: memref<8x16x32xf32>, %offset: index) -> vecto
 // CHECK-LABEL: @load_1D_vector(
 // CHECK-SAME:  %[[SRC:.+]]: memref<8x16x32xf32>,
 // CHECK-SAME:  %[[OFFSET:.+]]: index
-// CHECK:       %[[ELEM_BYTES:.+]] = arith.constant 4 : index
-// CHECK:       %[[COLLAPSED:.+]] = memref.subview %[[SRC]][%[[OFFSET]], %[[OFFSET]], 0]
-// CHECK:       %[[BASE_BUFFER:.+]], %[[OFFSET1:.+]], %[[SIZES:.+]], %[[STRIDES:.+]] = memref.extract_strided_metadata %[[COLLAPSED]]
-// CHECK-SAME:    : memref<32xf32, strided<[1]>> -> memref<f32>, index, index, index
-// CHECK:       %[[INTPTR:.+]] = memref.extract_aligned_pointer_as_index %[[BASE_BUFFER]]
-// CHECK-SAME:    : memref<f32> -> index
-// CHECK:       %[[MUL:.+]] = arith.muli %[[OFFSET1]], %[[ELEM_BYTES]] : index
-// CHECK:       %[[ADD:.+]] = arith.addi %[[INTPTR]], %[[MUL]] : index
-// CHECK:       %[[I64PTR:.+]] = arith.index_cast %[[ADD]] : index to i64
-// CHECK:       %[[DESC:.+]] = xegpu.create_nd_tdesc %[[I64PTR]], shape : [32],
-// CHECK-SAME:                   strides : [1] : i64  -> !xegpu.tensor_desc<8xf32,
+// CHECK:       %[[SUBVIEW:.+]] = memref.subview %[[SRC]][%[[OFFSET]], %[[OFFSET]], 0]
+// CHECK-SAME:    : memref<8x16x32xf32> to memref<32xf32, strided<[1]>>
+// CHECK:       %[[DESC:.+]] = xegpu.create_nd_tdesc %[[SUBVIEW]]
+// CHECK-SAME:    : memref<32xf32, strided<[1]>> -> !xegpu.tensor_desc<8xf32,
 // CHECK-SAME:    boundary_check = false
 // CHECK:       %[[VEC:.+]] = xegpu.load_nd %[[DESC]][%[[OFFSET]]]{{.*}}-> vector<8xf32>
 // CHECK:       return %[[VEC]]
diff --git a/mlir/test/Conversion/VectorToXeGPU/store-to-xegpu.mlir b/mlir/test/Conversion/VectorToXeGPU/store-to-xegpu.mlir
index d5cdad5ddaf02..4b96a5342fbf1 100644
--- a/mlir/test/Conversion/VectorToXeGPU/store-to-xegpu.mlir
+++ b/mlir/test/Conversion/VectorToXeGPU/store-to-xegpu.mlir
@@ -11,17 +11,10 @@ func.func @store_1D_vector(%vec: vector<8xf32>,
 // CHECK-SAME:  %[[VEC:.+]]: vector<8xf32>,
 // CHECK-SAME:  %[[SRC:.+]]: memref<8x16x32xf32>,
 // CHECK-SAME:  %[[OFFSET:.+]]: index
-// CHECK:       %[[ELEM_BYTES:.*]] = arith.constant 4 : index
-// CHECK:       %[[COLLAPSED:.+]] = memref.subview %[[SRC]][%[[OFFSET]], %[[OFFSET]], 0]
-// CHECK:       %[[BASE_BUFFER:.+]], %[[OFFSET1:.+]], %[[SIZES:.+]], %[[STRIDES:.+]] = memref.extract_strided_metadata %[[COLLAPSED]]
-// CHECK-SAME:    : memref<32xf32, strided<[1]>> -> memref<f32>, index, index, index
-// CHECK:       %[[INTPTR:.+]] = memref.extract_aligned_pointer_as_index %[[BASE_BUFFER]]
-// CHECK-SAME:    : memref<f32> -> index
-// CHECK:       %[[MUL:.+]] = arith.muli %[[OFFSET1]], %[[ELEM_BYTES]] : index
-// CHECK:       %[[ADD:.+]] = arith.addi %[[INTPTR]], %[[MUL]] : index
-// CHECK:       %[[I64PTR:.+]] = arith.index_cast %[[ADD]] : index to i64
-// CHECK:       %[[DESC:.+]] = xegpu.create_nd_tdesc %[[I64PTR]], shape : [32],
-// CHECK-SAME:                   strides : [1] : i64  -> !xegpu.tensor_desc<8xf32,
+// CHECK:       %[[SUBVIEW:.+]] = memref.subview %[[SRC]][%[[OFFSET]], %[[OFFSET]], 0]
+// CHECK-SAME:    : memref<8x16x32xf32> to memref<32xf32, strided<[1]>>
+// CHECK:       %[[DESC:.+]] = xegpu.create_nd_tdesc %[[SUBVIEW]]
+// CHECK-SAME:    : memref<32xf32, strided<[1]>> -> !xegpu.tensor_desc<8xf32,
 // CHECK-SAME:    boundary_check = false
 // CHECK:       xegpu.store_nd %[[VEC]], %[[DESC]][%[[OFFSET]]] : vector<8xf32>
 
diff --git a/mlir/test/Dialect/MemRef/flatten_memref.mlir b/mlir/test/Dialect/MemRef/flatten_memref.mlir
index 6325d07ad642f..9ded71ab3914a 100644
--- a/mlir/test/Dialect/MemRef/flatten_memref.mlir
+++ b/mlir/test/Dialect/MemRef/flatten_memref.mlir
@@ -7,10 +7,11 @@ func.func @load_scalar_from_memref(%input: memref<4x8xf32, strided<[8, 1]>>) ->
   return %value : f32
 }
 // CHECK-LABEL: func @load_scalar_from_memref
-// CHECK-NEXT: %[[C10:.*]] = arith.constant 10 : index
-// CHECK-NEXT: %[[REINT:.*]] = memref.reinterpret_cast %arg0 to offset: [100], sizes: [32], strides: [1]
+// CHECK: %[[C10:.*]] = arith.constant 10 : index
+// CHECK: %{{.*}}, %[[OFFSET:.*]], %{{.*}}, %{{.*}} = memref.extract_strided_metadata %arg0
+// CHECK: %[[REINT:.*]] = memref.reinterpret_cast %arg0 to offset: [%[[OFFSET]]], sizes: [32], strides: [1]
 // CHECK-SAME: memref<4x8xf32, strided<[8, 1]>> to memref<32xf32, strided<[1]>>
-// CHECK-NEXT: memref.load %[[REINT]][%[[C10]]] : memref<32xf32, strided<[1]>>
+// CHECK: memref.load %[[REINT]][%[[C10]]] : memref<32xf32, strided<[1]>>
 
 
 // -----
@@ -42,7 +43,8 @@ func.func @load_scalar_from_memref_static_dim(%input: memref<8x12xf32, strided<[
 // CHECK-LABEL: func @load_scalar_from_memref_static_dim
 // CHECK-SAME: (%[[ARG0:.*]]: memref<8x12xf32, strided<[24, 2]>>)
 // CHECK: %[[C188:.*]] = arith.constant 188 : index
-// CHECK: %[[REINT:.*]] = memref.reinterpret_cast %[[ARG0]] to offset: [100], sizes: [192], strides: [1] : memref<8x12xf32, strided<[24, 2]>> to memref<192xf32, strided<[1]>>
+// CHECK: %{{.*}}, %[[OFFSET:.*]], %{{.*}}, %{{.*}} = memref.extract_strided_metadata %[[ARG0]]
+// CHECK: %[[REINT:.*]] = memref.reinterpret_cast %[[ARG0]] to offset: [%[[OFFSET]]], sizes: [192], strides: [1] : memref<8x12xf32, strided<[24, 2]>> to memref<192xf32, strided<[1]>>
 // CHECK: memref.load %[[REINT]][%[[C188]]] : memref<192xf32, strided<[1]>>
 
 // -----
@@ -84,8 +86,9 @@ func.func @load_vector_from_memref(%input: memref<4x8xf32>) -> vector<8xf32> {
 }
 // CHECK-LABEL: func @load_vector_from_memref
 // CHECK: %[[C30:.*]] = arith.constant 30
-// CHECK-NEXT: %[[REINT:.*]] = memref.reinterpret_cast %arg0 to offset: [0], sizes: [32], strides: [1]
-// CHECK-NEXT: vector.load %[[REINT]][%[[C30]]]
+// CHECK: %{{.*}}, %[[OFFSET:.*]], %{{.*}}, %{{.*}} = memref.extract_strided_metadata %arg0
+// CHECK: %[[REINT:.*]] = memref.reinterpret_cast %arg0 to offset: [%[[OFFSET]]], sizes: [32], strides: [1]
+// CHECK: vector.load %[[REINT]][%[[C30]]]
 
 // -----
 
@@ -97,8 +100,8 @@ func.func @load_vector_from_memref_odd(%input: memref<3x7xi2>) -> vector<3xi2> {
 }
 // CHECK-LABEL: func @load_vector_from_memref_odd
 // CHECK: %[[C10:.*]] = arith.constant 10 : index
-// CHECK-NEXT: %[[REINT:.*]] = memref.reinterpret_cast
-// CHECK-NEXT: vector.load %[[REINT]][%[[C10]]]
+// CHECK: %[[REINT:.*]] = memref.reinterpret_cast
+// CHECK: vector.load %[[REINT]][%[[C10]]]
 
 // -----
 
@@ -123,8 +126,8 @@ func.func @store_vector_to_memref_odd(%input: memref<3x7xi2>, %value: vector<3xi
 // CHECK-LABEL: func @store_vector_to_memref_odd
 // CHECK-SAME: (%[[ARG0:.*]]: memref<3x7xi2>, %[[ARG1:.*]]: vector<3xi2>)
 // CHECK: %[[C10:.*]] = arith.constant 10 : index
-// CHECK-NEXT: %[[REINT:.*]] = memref.reinterpret_cast
-// CHECK-NEXT: vector.store %[[ARG1]], %[[REINT]][%[[C10]]] : memref<21xi2, strided<[1]>
+// CHECK: %[[REINT:.*]] = memref.reinterpret_cast
+// CHECK: vector.store %[[ARG1]], %[[REINT]][%[[C10]]] : memref<21xi2, strided<[1]>
 
 // -----
 
@@ -135,8 +138,9 @@ func.func @store_vector_to_memref_dynamic(%input: memref<3x7xi2>, %value: vector
 // CHECK: #[[MAP:.*]] = affine_map<()[s0, s1] -> (s0 * 7 + s1)>
 // CHECK: func @store_vector_to_memref_dynamic
 // CHECK-SAME: (%[[ARG0:.*]]: memref<3x7xi2>, %[[ARG1:.*]]: vector<3xi2>, %[[ARG2:.*]]: index, %[[ARG3:.*]]: index)
+// CHECK: %{{.*}}, %[[OFFSET:.*]], %{{.*}}, %{{.*}} = memref.extract_strided_metadata %[[ARG0]]
 // CHECK: %[[IDX:.*]] = affine.apply #[[MAP]]()[%[[ARG3]], %[[ARG2]]]
-// CHECK: %[[REINT:.*]] = memref.reinterpret_cast %[[ARG0]] to offset: [0], sizes: [21], strides: [1]
+// CHECK: %[[REINT:.*]] = memref.reinterpret_cast %[[ARG0]] to offset: [%[[OFFSET]]], sizes: [21], strides: [1]
 // CHECK: vector.store %[[ARG1]], %[[REINT]][%[[IDX]]]
 
 // -----
@@ -150,7 +154,7 @@ func.func @mask_store_vector_to_memref_odd(%input: memref<3x7xi2>, %value: vecto
 // CHECK-LABEL: func @mask_store_vector_to_memref_odd
 // CHECK-SAME: (%[[ARG0:.*]]: memref<3x7xi2>, %[[ARG1:.*]]: vector<3xi2>, %[[ARG2:.*]]: vector<3xi1>)
 // CHECK: %[[C10:.*]] = arith.constant 10 : index
-// CHECK-NEXT: %[[REINT:.*]] = memref.reinterpret_cast
+// CHECK: %[[REINT:.*]] = memref.reinterpret_cast
 // CHECK: vector.maskedstore %[[REINT]][%[[C10]]], %[[ARG2]], %[[ARG1]]
 
 // -----
@@ -176,7 +180,8 @@ func.func @mask_load_vector_from_memref_odd(%input: memref<3x7xi2>, %mask: vecto
 // CHECK-LABEL: func @mask_load_vector_from_memref_odd
 // CHECK-SAME: (%[[ARG0:.*]]: memref<3x7xi2>, %[[MASK:.*]]: vector<3xi1>, %[[PASSTHRU:.*]]: vector<3xi2>)
 // CHECK: %[[C10:.*]] = arith.constant 10 : index
-// CHECK: %[[REINT:.*]] = memref.reinterpret_cast %[[ARG0]] to offset: [0], sizes: [21], strides: [1]
+// CHECK: %{{.*}}, %[[OFFSET:.*]], %{{.*}}, %{{.*}} = memref.extract_strided_metadata %[[ARG0]]
+// CHECK: %[[REINT:.*]] = memref.reinterpret_cast %[[ARG0]] to offset: [%[[OFFSET]]], sizes: [21], strides: [1]
 // CHECK: vector.maskedload %[[REINT]][%[[C10]]], %[[MASK]], %[[PASSTHRU]]
 
 // -----
@@ -307,16 +312,16 @@ func.func @flatten_alloc_strided_row_major() -> memref<4x8xf32, strided<[8, 1]>>
 
 // -----
 
-// Non-zero static offset: the flat allocation covers [0, offset+extent) = [0, 82)
-// and the reinterpret_cast restores the original offset in the result type.
+// The type no longer carries an offset, so the flat allocation matches the
+// in-bounds extent and the reinterpret_cast reuses offset 0.
 func.func @flatten_alloc_strided_offset() -> memref<4x8xf32, strided<[8, 1]>> {
   %0 = memref.alloc() : memref<4x8xf32, strided<[8, 1]>>
   return %0 : memref<4x8xf32, strided<[8, 1]>>
 }
 
 // CHECK-LABEL: func @flatten_alloc_strided_offset
-// CHECK: %[[ALLOC:.*]] = memref.alloc() : memref<82xf32, strided<[1]>>
-// CHECK: memref.reinterpret_cast %[[ALLOC]] to offset: [50], sizes: [4, 8], strides: [8, 1] : memref<82xf32, strided<[1]>> to memref<4x8xf32, strided<[8, 1]>>
+// CHECK: %[[ALLOC:.*]] = memref.alloc() : memref<32xf32, strided<[1]>>
+// CHECK: memref.reinterpret_cast %[[ALLOC]] to offset: [0], sizes: [4, 8], strides: [8, 1] : memref<32xf32, strided<[1]>> to memref<4x8xf32, strided<[8, 1]>>
 
 // -----
 
@@ -354,9 +359,9 @@ func.func @chained_alloc_load() -> vector<8xf32> {
 
 // CHECK-LABEL: func @chained_alloc_load
 // CHECK-SAME: () -> vector<8xf32>
-// CHECK-NEXT: %[[C30:.*]] = arith.constant 30 : index
-// CHECK-NEXT: %[[ALLOC:.*]] = memref.alloc() : memref<32xf32, strided<[1]>>
-// CHECK-NEXT: vector.load %[[ALLOC]][%[[C30]]] : memref<32xf32, strided<[1]>>, vector<8xf32>
+// CHECK: %[[C30:.*]] = arith.constant 30 : index
+// CHECK: %[[ALLOC:.*]] = memref.alloc() : memref<32xf32, strided<[1]>>
+// CHECK: vector.load %{{.*}}[%[[C30]]]
 
 // -----
 
@@ -368,6 +373,7 @@ func.func @load_scalar_from_memref_static_dim_col_major(%input: memref<4x8xf32,
 // CHECK: #[[MAP:.*]] = affine_map<()[s0, s1] -> (s0 + s1 * 4)>
 // CHECK: func @load_scalar_from_memref_static_dim_col_major
 // CHECK-SAME: (%[[ARG0:.*]]: memref<4x8xf32, strided<[1, 4]>>, %[[ARG1:.*]]: index, %[[ARG2:.*]]: index)
+// CHECK: %{{.*}}, %[[OFFSET:.*]], %{{.*}}, %{{.*}} = memref.extract_strided_metadata %[[ARG0]]
 // CHECK: %[[IDX:.*]] = affine.apply #[[MAP]]()[%[[ARG2]], %[[ARG1]]]
-// CHECK: %[[REINT:.*]] = memref.reinterpret_cast %[[ARG0]] to offset: [100], sizes: [32], strides: [1] : memref<4x8xf32, strided<[1, 4]>> to memref<32xf32, strided<[1]>>
+// CHECK: %[[REINT:.*]] = memref.reinterpret_cast %[[ARG0]] to offset: [%[[OFFSET]]], sizes: [32], strides: [1] : memref<4x8xf32, strided<[1, 4]>> to memref<32xf32, strided<[1]>>
 // CHECK: memref.load %[[REINT]][%[[IDX]]] : memref<32xf32, strided<[1]>>
diff --git a/mlir/test/Dialect/Tensor/bufferize.mlir b/mlir/test/Dialect/Tensor/bufferize.mlir
index 8b9fa9b3a645d..f89598e707c12 100644
--- a/mlir/test/Dialect/Tensor/bufferize.mlir
+++ b/mlir/test/Dialect/Tensor/bufferize.mlir
@@ -461,7 +461,7 @@ func.func @tensor.collapse_shape_to_scalar(%t1: tensor<1x1x1xf32>) -> tensor<f32
 func.func @tensor.collapse_shape_of_slice(%arg0: tensor<2xi32>) -> tensor<i32> {
   // CHECK: memref.subview %{{.*}}[1] [1] [1] : memref<2xi32> to memref<1xi32, strided<[1]>>
   %0 = tensor.extract_slice %arg0[1] [1] [1] : tensor<2xi32> to tensor<1xi32>
-  // CHECK: memref.collapse_shape %{{.*}} [] : memref<1xi32, strided<[1]>> into memref<i32, strided<[]>>
+  // CHECK: memref.collapse_shape %{{.*}} [] : memref<1xi32, strided<[1]>> into memref<i32>
   %1 = tensor.collapse_shape %0 [] : tensor<1xi32> into tensor<i32>
   return %1 : tensor<i32>
 }

>From bf5a2d222d29953db75dff65a9739e869429a41d Mon Sep 17 00:00:00 2001
From: Ivan Butygin <ivan.butygin at gmail.com>
Date: Fri, 17 Apr 2026 03:01:59 +0200
Subject: [PATCH 09/27] [WIP][mlir] step 2 follow-ups: AMDGPU, Linalg, GPU
 CHECK fixes (15 left)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
Signed-off-by: Ivan Butygin <ivan.butygin at gmail.com>
---
 .../Conversion/AMDGPUToROCDL/amdgpu-to-rocdl.mlir  | 14 +++++---------
 mlir/test/Dialect/AMDGPU/ops.mlir                  | 12 ++++++------
 mlir/test/Dialect/GPU/decompose-memrefs.mlir       |  4 ++--
 mlir/test/Dialect/Linalg/hoisting.mlir             |  3 +--
 4 files changed, 14 insertions(+), 19 deletions(-)

diff --git a/mlir/test/Conversion/AMDGPUToROCDL/amdgpu-to-rocdl.mlir b/mlir/test/Conversion/AMDGPUToROCDL/amdgpu-to-rocdl.mlir
index d04932bdcc2cc..6d48b143d45c4 100644
--- a/mlir/test/Conversion/AMDGPUToROCDL/amdgpu-to-rocdl.mlir
+++ b/mlir/test/Conversion/AMDGPUToROCDL/amdgpu-to-rocdl.mlir
@@ -67,17 +67,15 @@ func.func @fat_raw_buffer_cast_dyn_size_offset(%buf: memref<?xi32, strided<[1]>,
 }
 
 // CHECK-LABEL: func @fat_raw_buffer_cast_reset_offset
-func.func @fat_raw_buffer_cast_reset_offset(%buf: memref<?xi32, strided<[1]>, #gpu.address_space<global>>) -> memref<?xi32, #amdgpu.address_space<fat_raw_buffer>> {
+func.func @fat_raw_buffer_cast_reset_offset(%buf: memref<?xi32, strided<[1]>, #gpu.address_space<global>>) -> memref<?xi32, strided<[1]>, #amdgpu.address_space<fat_raw_buffer>> {
   // CHECK: %[[desc:.*]] = builtin.unrealized_conversion_cast %{{.*}} : memref<?xi32, strided<[1]>, #gpu.address_space<global>> to !llvm.struct<(ptr<1>, ptr<1>, i64, array<1 x i64>, array<1 x i64>)>
-  // CHECK-DAG: %[[memRefPtr:.*]] = llvm.extractvalue %[[desc]][1]
-  // CHECK-DAG: %[[memRefOff:.*]] = llvm.extractvalue %[[desc]][2]
-  // CHECK-DAG: %[[basePtr:.*]] = llvm.getelementptr %[[memRefPtr]][%[[memRefOff]]]
+  // CHECK-DAG: %[[basePtr:.*]] = llvm.extractvalue %[[desc]][1]
   // CHECK-DAG: %[[zeroOff:.*]] = llvm.mlir.constant(0 : index) : i64
   // CHECK: %[[fatBuf:.*]] = rocdl.make.buffer.rsrc %[[basePtr]], %{{.*}}, %{{.*}}, %{{.*}}
   // CHECK: llvm.insertvalue %[[fatBuf]], %{{.*}}[1]
   // CHECK: llvm.insertvalue %[[zeroOff]], %{{.*}}[2]
-  %ret = amdgpu.fat_raw_buffer_cast %buf resetOffset : memref<?xi32, strided<[1]>, #gpu.address_space<global>> to memref<?xi32, #amdgpu.address_space<fat_raw_buffer>>
-  return %ret : memref<?xi32, #amdgpu.address_space<fat_raw_buffer>>
+  %ret = amdgpu.fat_raw_buffer_cast %buf resetOffset : memref<?xi32, strided<[1]>, #gpu.address_space<global>> to memref<?xi32, strided<[1]>, #amdgpu.address_space<fat_raw_buffer>>
+  return %ret : memref<?xi32, strided<[1]>, #amdgpu.address_space<fat_raw_buffer>>
 }
 
 // CHECK-LABEL: func @fat_raw_buffer_cast_valid_bytes
@@ -154,9 +152,7 @@ func.func @gpu_gcn_raw_buffer_load_i32(%buf: memref<64xi32>, %idx: i32) -> i32 {
 func.func @gpu_gcn_raw_buffer_load_i32_strided(%buf: memref<16x16xi32, strided<[?, ?]>>, %i: i32, %j: i32) -> i32 {
     // CHECK: %[[descriptor:.*]] = builtin.unrealized_conversion_cast %{{.*}} : memref<16x16xi32, strided<[?, ?]>> to !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
     // CHECK: %[[elem_size:.*]] = llvm.mlir.constant(4 : i32) : i32
-    // CHECK: %[[algn_ptr:.*]] = llvm.extractvalue %[[descriptor]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-    // CHECK: %[[offset:.*]] = llvm.extractvalue %[[descriptor]][2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-    // CHECK: %[[ptr:.*]] = llvm.getelementptr %[[algn_ptr]][%[[offset]]] : (!llvm.ptr, i64) -> !llvm.ptr, i32
+    // CHECK: %[[ptr:.*]] = llvm.extractvalue %[[descriptor]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
     // CHECK: %[[sz_i:.*]] = llvm.extractvalue %[[descriptor]][3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
     // CHECK: %[[stride_i:.*]] = llvm.extractvalue %[[descriptor]][4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
     // CHECK: %[[ext_i:.*]] = llvm.mul %[[sz_i]], %[[stride_i]] : i64
diff --git a/mlir/test/Dialect/AMDGPU/ops.mlir b/mlir/test/Dialect/AMDGPU/ops.mlir
index 5ba7df6890296..6362ea226352c 100644
--- a/mlir/test/Dialect/AMDGPU/ops.mlir
+++ b/mlir/test/Dialect/AMDGPU/ops.mlir
@@ -415,18 +415,18 @@ func.func @fat_raw_buffer_cast_easy(%m: memref<8xi32>) -> memref<8xi32, #amdgpu.
 // CHECK-SAME: cacheSwizzleStride(%{{[^)]*}})
 // CHECK-SAME: boundsCheck(false)
 // CHECK-SAME: resetOffset
-func.func @fat_raw_buffer_cast(%m: memref<8xi32, strided<[1]>>, %validBytes: i64, %cacheSwizzle: i14) -> memref<8xi32, #amdgpu.address_space<fat_raw_buffer>> {
+func.func @fat_raw_buffer_cast(%m: memref<8xi32, strided<[1]>>, %validBytes: i64, %cacheSwizzle: i14) -> memref<8xi32, strided<[1]>, #amdgpu.address_space<fat_raw_buffer>> {
   %ret = amdgpu.fat_raw_buffer_cast %m validBytes(%validBytes) cacheSwizzleStride(%cacheSwizzle) boundsCheck(false) resetOffset
-    : memref<8xi32, strided<[1]>> to memref<8xi32, #amdgpu.address_space<fat_raw_buffer>>
-  func.return %ret : memref<8xi32, #amdgpu.address_space<fat_raw_buffer>>
+    : memref<8xi32, strided<[1]>> to memref<8xi32, strided<[1]>, #amdgpu.address_space<fat_raw_buffer>>
+  func.return %ret : memref<8xi32, strided<[1]>, #amdgpu.address_space<fat_raw_buffer>>
 }
 
 // CHECK-LABEL: func @fat_raw_buffer_cast_dynamic_1d_reset_offset
 // CHECK: amdgpu.fat_raw_buffer_cast
-func.func @fat_raw_buffer_cast_dynamic_1d_reset_offset(%m: memref<?xi32, strided<[1]>>) -> memref<?xi32, #amdgpu.address_space<fat_raw_buffer>> {
+func.func @fat_raw_buffer_cast_dynamic_1d_reset_offset(%m: memref<?xi32, strided<[1]>>) -> memref<?xi32, strided<[1]>, #amdgpu.address_space<fat_raw_buffer>> {
   %ret = amdgpu.fat_raw_buffer_cast %m resetOffset
-    : memref<?xi32, strided<[1]>> to memref<?xi32, #amdgpu.address_space<fat_raw_buffer>>
-  func.return %ret : memref<?xi32, #amdgpu.address_space<fat_raw_buffer>>
+    : memref<?xi32, strided<[1]>> to memref<?xi32, strided<[1]>, #amdgpu.address_space<fat_raw_buffer>>
+  func.return %ret : memref<?xi32, strided<[1]>, #amdgpu.address_space<fat_raw_buffer>>
 }
 
 // CHECK-LABEL: func @fat_raw_buffer_cast_dynamic_0d_reset_offset
diff --git a/mlir/test/Dialect/GPU/decompose-memrefs.mlir b/mlir/test/Dialect/GPU/decompose-memrefs.mlir
index 6f65136e20ad0..5a890acec669c 100644
--- a/mlir/test/Dialect/GPU/decompose-memrefs.mlir
+++ b/mlir/test/Dialect/GPU/decompose-memrefs.mlir
@@ -26,13 +26,13 @@ func.func @decompose_store(%arg0 : f32, %arg1 : memref<?x?x?xf32>) {
 
 // -----
 
-//       CHECK: #[[MAP:.*]] = affine_map<()[s0, s1, s2, s3, s4, s5, s6] -> (s0 + s1 * s2 + s3 * s4 + s5 * s6)>
+//       CHECK: #[[MAP:.*]] = affine_map<()[s0, s1, s2, s3, s4, s5] -> (s0 * s1 + s2 * s3 + s4 * s5)>
 //       CHECK: @decompose_store_strided
 //  CHECK-SAME: (%[[VAL:.*]]: f32, %[[MEM:.*]]: memref<?x?x?xf32, strided<[?, ?, ?]>>)
 //       CHECK:  %[[BASE:.*]], %[[OFFSET:.*]], %[[SIZES:.*]]:3, %[[STRIDES:.*]]:3 = memref.extract_strided_metadata %[[MEM]]
 //       CHECK:  gpu.launch
 //  CHECK-SAME:  threads(%[[TX:.*]], %[[TY:.*]], %[[TZ:.*]]) in
-//       CHECK:  %[[IDX:.*]] = affine.apply #[[MAP]]()[%[[OFFSET]], %[[TX]], %[[STRIDES]]#0, %[[TY]], %[[STRIDES]]#1, %[[TZ]], %[[STRIDES]]#2]
+//       CHECK:  %[[IDX:.*]] = affine.apply #[[MAP]]()[%[[TX]], %[[STRIDES]]#0, %[[TY]], %[[STRIDES]]#1, %[[TZ]], %[[STRIDES]]#2]
 //       CHECK:  %[[PTR:.*]] = memref.reinterpret_cast %[[BASE]] to offset: [%[[IDX]]], sizes: [], strides: [] : memref<f32> to memref<f32, strided<[]>>
 //       CHECK:  memref.store %[[VAL]], %[[PTR]][] : memref<f32, strided<[]>>
 func.func @decompose_store_strided(%arg0 : f32, %arg1 : memref<?x?x?xf32, strided<[?, ?, ?]>>) {
diff --git a/mlir/test/Dialect/Linalg/hoisting.mlir b/mlir/test/Dialect/Linalg/hoisting.mlir
index d573b8bb5ec99..d8a4d6cd65f55 100644
--- a/mlir/test/Dialect/Linalg/hoisting.mlir
+++ b/mlir/test/Dialect/Linalg/hoisting.mlir
@@ -600,8 +600,7 @@ module attributes {transform.with_named_sequence} {
 // CHECK-DAG:      %[[CST:.+]] = arith.constant 0.000000e+00 : f32
 // CHECK:          %[[ALLOC:.+]] = memref.alloc() : memref<32x64xf32>
 // CHECK:          %[[ALLOC_0:.+]] = memref.alloc() : memref<32x128xf32>
-// CHECK:          %[[CAST:.+]] = memref.cast %[[ALLOC_0]] : memref<32x128xf32> to memref<32x128xf32, strided<[128, 1],
-// CHECK-SAME:       offset: ?>>
+// CHECK:          %[[CAST:.+]] = memref.cast %[[ALLOC_0]] : memref<32x128xf32> to memref<32x128xf32, strided<[128, 1]>>
 // CHECK:          %[[D0:.+]] = vector.transfer_read %[[ALLOC]][%[[C0]], %[[C0]]], %[[CST]] {in_bounds = [true, true]} :
 // CHECK-SAME:       memref<32x64xf32>, vector<32x64xf32>
 // CHECK:          scf.for %[[ARG0:.+]] = %[[C0]] to %[[C1024]] step %[[C128]] {

>From e53c79c3dd60c6e131978aba8753cfaebe7badf9 Mon Sep 17 00:00:00 2001
From: Ivan Butygin <ivan.butygin at gmail.com>
Date: Fri, 17 Apr 2026 03:13:10 +0200
Subject: [PATCH 10/27] [WIP][mlir] step 2 follow-ups: bufferization
 out-params, narrow type, GPU CHECK fixes (11 left)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
Signed-off-by: Ivan Butygin <ivan.butygin at gmail.com>
---
 .../Transforms/BufferResultsToOutParams.cpp           |  5 +++--
 mlir/test/Conversion/FuncToSPIRV/types-to-spirv.mlir  |  4 ++--
 .../Transforms/one-shot-module-bufferize.mlir         | 11 +++--------
 .../Dialect/Vector/vector-emulate-narrow-type.mlir    |  2 +-
 4 files changed, 9 insertions(+), 13 deletions(-)

diff --git a/mlir/lib/Dialect/Bufferization/Transforms/BufferResultsToOutParams.cpp b/mlir/lib/Dialect/Bufferization/Transforms/BufferResultsToOutParams.cpp
index 434501b030e4a..90ac2485058ec 100644
--- a/mlir/lib/Dialect/Bufferization/Transforms/BufferResultsToOutParams.cpp
+++ b/mlir/lib/Dialect/Bufferization/Transforms/BufferResultsToOutParams.cpp
@@ -34,8 +34,9 @@ static bool hasFullyDynamicLayoutMap(MemRefType type) {
     return false;
   if (!llvm::all_of(strides, ShapedType::isDynamic))
     return false;
-  if (ShapedType::isStatic(offset))
-    return false;
+  // The type no longer carries a static offset; the strides being all dynamic
+  // is enough to consider this a fully dynamic layout.
+  (void)offset;
   return true;
 }
 
diff --git a/mlir/test/Conversion/FuncToSPIRV/types-to-spirv.mlir b/mlir/test/Conversion/FuncToSPIRV/types-to-spirv.mlir
index 24d549ee52e1d..fcde78f9c43a9 100644
--- a/mlir/test/Conversion/FuncToSPIRV/types-to-spirv.mlir
+++ b/mlir/test/Conversion/FuncToSPIRV/types-to-spirv.mlir
@@ -711,7 +711,7 @@ module attributes {
 // CHECK-LABEL: spirv.func @memref_offset_strides
 func.func @memref_offset_strides(
 // CHECK-SAME: !spirv.array<64 x f32, stride=4> [0])>, StorageBuffer>
-// CHECK-SAME: !spirv.array<72 x f32, stride=4> [0])>, StorageBuffer>
+// CHECK-SAME: !spirv.array<64 x f32, stride=4> [0])>, StorageBuffer>
 // CHECK-SAME: !spirv.array<256 x f32, stride=4> [0])>, StorageBuffer>
 // CHECK-SAME: !spirv.array<64 x f32, stride=4> [0])>, StorageBuffer>
 // CHECK-SAME: !spirv.array<88 x f32, stride=4> [0])>, StorageBuffer>
@@ -722,7 +722,7 @@ func.func @memref_offset_strides(
   %arg4: memref<16x4xf32, strided<[1, 22]>, #spirv.storage_class<StorageBuffer>>, // pad 4 after each col
 
 // CHECK-SAME: !spirv.array<64 x f16, stride=2> [0])>, StorageBuffer>
-// CHECK-SAME: !spirv.array<72 x f16, stride=2> [0])>, StorageBuffer>
+// CHECK-SAME: !spirv.array<64 x f16, stride=2> [0])>, StorageBuffer>
 // CHECK-SAME: !spirv.array<256 x f16, stride=2> [0])>, StorageBuffer>
 // CHECK-SAME: !spirv.array<64 x f16, stride=2> [0])>, StorageBuffer>
 // CHECK-SAME: !spirv.array<88 x f16, stride=2> [0])>, StorageBuffer>
diff --git a/mlir/test/Dialect/Bufferization/Transforms/one-shot-module-bufferize.mlir b/mlir/test/Dialect/Bufferization/Transforms/one-shot-module-bufferize.mlir
index eea2a1a1b59a6..590956dc13cf0 100644
--- a/mlir/test/Dialect/Bufferization/Transforms/one-shot-module-bufferize.mlir
+++ b/mlir/test/Dialect/Bufferization/Transforms/one-shot-module-bufferize.mlir
@@ -67,12 +67,8 @@ func.func @call_to_unknown_tensor_returning_func(%t : tensor<?xf32>) {
 // CHECK-NO-LAYOUT-MAP-LABEL: func @return_extract_slice(%{{.*}}) -> memref<2x?xf32>
 //       CHECK-NO-LAYOUT-MAP:   %[[alloc:.*]] = memref.alloc() {{.*}} : memref<20x10xf32>
 //       CHECK-NO-LAYOUT-MAP:   %[[subview:.*]] = memref.subview {{.*}} : memref<20x10xf32> to memref<2x?xf32, strided<[10, 1]>>
-//       CHECK-NO-LAYOUT-MAP:   %[[alloc_no_layout:.*]] = memref.alloc(%{{.*}}) {{.*}} : memref<2x?xf32>
-//       CHECK-NO-LAYOUT-MAP:   memref.copy %[[subview]], %[[alloc_no_layout]]
-// TODO: %alloc should be deallocated here, but we currently do not dealloc
-// buffers that are inserted due to to_tensor/to_buffer canonicalization (when
-// the buffer types have different layout maps).
-//       CHECK-NO-LAYOUT-MAP:   return %[[alloc_no_layout]]
+//       CHECK-NO-LAYOUT-MAP:   %[[cast:.*]] = memref.cast %[[subview]] : memref<2x?xf32, strided<[10, 1]>> to memref<2x?xf32>
+//       CHECK-NO-LAYOUT-MAP:   return %[[cast]]
 
 // CHECK-FULLY-DYNAMIC-LAYOUT-MAP-LABEL: func @return_extract_slice(%{{.*}}) -> memref<2x?xf32,
 //  CHECK-FULLY-DYNAMIC-LAYOUT-MAP-SAME: strided<[?, ?]>> {
@@ -97,8 +93,7 @@ func.func @foo(%arg0: tensor<3x8xf16>) -> tensor<3x8xf16> {
 // CHECK-NO-LAYOUT-MAP-LABEL:   func.func @call_extract_slice(
 // CHECK-NO-LAYOUT-MAP-SAME:                                  %[[VAL_0:.*]]: memref<4x8xf16>) -> memref<3x8xf16> {
 // CHECK-NO-LAYOUT-MAP:           %[[VAL_1:.*]] = memref.subview %[[VAL_0]][1, 0] [3, 8] [1, 1] : memref<4x8xf16> to memref<3x8xf16, strided<[8, 1]>>
-// CHECK-NO-LAYOUT-MAP:           %[[VAL_2:.*]] = memref.alloc() {alignment = 64 : i64} : memref<3x8xf16>
-// CHECK-NO-LAYOUT-MAP:           memref.copy %[[VAL_1]], %[[VAL_2]] : memref<3x8xf16, strided<[8, 1]>> to memref<3x8xf16>
+// CHECK-NO-LAYOUT-MAP:           %[[VAL_2:.*]] = memref.cast %[[VAL_1]] : memref<3x8xf16, strided<[8, 1]>> to memref<3x8xf16>
 // CHECK-NO-LAYOUT-MAP:           %[[VAL_3:.*]] = call @foo(%[[VAL_2]]) : (memref<3x8xf16>) -> memref<3x8xf16>
 // CHECK-NO-LAYOUT-MAP:           return %[[VAL_3]] : memref<3x8xf16>
 // CHECK-NO-LAYOUT-MAP:         }
diff --git a/mlir/test/Dialect/Vector/vector-emulate-narrow-type.mlir b/mlir/test/Dialect/Vector/vector-emulate-narrow-type.mlir
index 98b1f07ef5fb0..9a5c89b70d532 100644
--- a/mlir/test/Dialect/Vector/vector-emulate-narrow-type.mlir
+++ b/mlir/test/Dialect/Vector/vector-emulate-narrow-type.mlir
@@ -345,11 +345,11 @@ func.func @vector_maskedload_i4_arith_constant(%passthru: vector<8xi4>) -> vecto
 // CHECK-SAME:   %[[PASSTHRU:[a-zA-Z0-9]+]]
 // CHECK: %[[ALLOC:.+]] = memref.alloc() : memref<12xi8>
 // CHECK: %[[MASK:.+]] = arith.constant dense<[false, true, true, true, true, false, false, false]> : vector<8xi1>
+// CHECK: %[[C0:.+]] = arith.constant 0 : index
 
 // Emit a new, compressed mask for emulated maskedload:
 // CHECK: %[[COMPRESSED_MASK:.+]] = arith.constant dense<[true, true, true, false]> : vector<4xi1>
 // CHECK: %[[PTHU_UPCAST:.+]] = vector.bitcast %[[PASSTHRU]] : vector<8xi4> to vector<4xi8>
-// CHECK: %[[C0:.+]] = arith.constant 0 : index
 // CHECK: %[[LOAD:.+]] = vector.maskedload %[[ALLOC]][%[[C0]]], %[[COMPRESSED_MASK]], %[[PTHU_UPCAST]]
 // CHECK: %[[LOAD_DOWNCAST:.+]] = vector.bitcast %[[LOAD]] : vector<4xi8> to vector<8xi4>
 // CHECK: %[[SELECT:.+]] = arith.select %[[MASK]], %[[LOAD_DOWNCAST]], %[[PASSTHRU]] : vector<8xi1>, vector<8xi4>

>From e1ee489aa70675cd842f74f88e9cd584a0202d8c Mon Sep 17 00:00:00 2001
From: Ivan Butygin <ivan.butygin at gmail.com>
Date: Fri, 17 Apr 2026 03:20:42 +0200
Subject: [PATCH 11/27] [WIP][mlir] step 2 follow-ups: more CHECK fixes (10
 left)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
Signed-off-by: Ivan Butygin <ivan.butygin at gmail.com>
---
 .../Dialect/MemRef/expand-strided-metadata.mlir     | 13 ++++++-------
 .../vector-transfer-drop-unit-dims-patterns.mlir    |  8 ++++----
 2 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/mlir/test/Dialect/MemRef/expand-strided-metadata.mlir b/mlir/test/Dialect/MemRef/expand-strided-metadata.mlir
index d611c5e4a2d10..a7f3066ad8a75 100644
--- a/mlir/test/Dialect/MemRef/expand-strided-metadata.mlir
+++ b/mlir/test/Dialect/MemRef/expand-strided-metadata.mlir
@@ -5,7 +5,6 @@
 func.func @extract_strided_metadata_constants(%base: memref<5x4xf32, strided<[4, 1]>>)
     -> (memref<f32>, index, index, index, index, index) {
   //   CHECK-DAG: %[[C1:.*]] = arith.constant 1 : index
-  //   CHECK-DAG: %[[C2:.*]] = arith.constant 2 : index
   //   CHECK-DAG: %[[C4:.*]] = arith.constant 4 : index
   //   CHECK-DAG: %[[C5:.*]] = arith.constant 5 : index
 
@@ -14,7 +13,7 @@ func.func @extract_strided_metadata_constants(%base: memref<5x4xf32, strided<[4,
     memref<5x4xf32, strided<[4,1]>>
     -> memref<f32>, index, index, index, index, index
 
-  // CHECK: %[[BASE]], %[[C2]], %[[C5]], %[[C4]], %[[C4]], %[[C1]]
+  // CHECK: %[[BASE]], %[[OFFSET]], %[[C5]], %[[C4]], %[[C4]], %[[C1]]
   return %base_buffer, %offset, %sizes#0, %sizes#1, %strides#0, %strides#1 :
     memref<f32>, index, index, index, index, index
 }
@@ -39,7 +38,7 @@ func.func @extract_strided_metadata_constants(%base: memref<5x4xf32, strided<[4,
 // ==> 1 affine map with (rank * 2 + 1) symbols
 //
 // CHECK-DAG: #[[$STRIDE_MAP:.*]] = affine_map<()[s0, s1] -> (s0 * s1)>
-// CHECK-DAG: #[[$OFFSET_MAP:.*]] = affine_map<()[s0, s1, s2, s3, s4, s5, s6] -> (s0 + s1 * s2 + s3 * s4 + s5 * s6)>
+// CHECK-DAG: #[[$OFFSET_MAP:.*]] = affine_map<()[s0, s1, s2, s3, s4, s5] -> (s0 * s1 + s2 * s3 + s4 * s5)>
 // CHECK-LABEL: func @simplify_subview_all_dynamic
 //  CHECK-SAME: (%[[ARG:.*]]: memref<?x?x?xf32, strided<[?, ?, ?]>>, %[[DYN_OFFSET0:.*]]: index, %[[DYN_OFFSET1:.*]]: index, %[[DYN_OFFSET2:.*]]: index, %[[DYN_SIZE0:.*]]: index, %[[DYN_SIZE1:.*]]: index, %[[DYN_SIZE2:.*]]: index, %[[DYN_STRIDE0:.*]]: index, %[[DYN_STRIDE1:.*]]: index, %[[DYN_STRIDE2:.*]]: index)
 //
@@ -49,7 +48,7 @@ func.func @extract_strided_metadata_constants(%base: memref<5x4xf32, strided<[4,
 //  CHECK-DAG: %[[FINAL_STRIDE1:.*]] = affine.apply #[[$STRIDE_MAP]]()[%[[DYN_STRIDE1]], %[[STRIDES]]#1]
 //  CHECK-DAG: %[[FINAL_STRIDE2:.*]] = affine.apply #[[$STRIDE_MAP]]()[%[[DYN_STRIDE2]], %[[STRIDES]]#2]
 //
-//  CHECK-DAG: %[[FINAL_OFFSET:.*]] = affine.apply #[[$OFFSET_MAP]]()[%[[OFFSET]], %[[DYN_OFFSET0]], %[[STRIDES]]#0, %[[DYN_OFFSET1]], %[[STRIDES]]#1, %[[DYN_OFFSET2]], %[[STRIDES]]#2]
+//  CHECK-DAG: %[[FINAL_OFFSET:.*]] = affine.apply #[[$OFFSET_MAP]]()[%[[DYN_OFFSET0]], %[[STRIDES]]#0, %[[DYN_OFFSET1]], %[[STRIDES]]#1, %[[DYN_OFFSET2]], %[[STRIDES]]#2]
 //
 //      CHECK: %[[RES:.*]] = memref.reinterpret_cast %[[BASE]] to offset: [%[[FINAL_OFFSET]]], sizes: [%[[DYN_SIZE0]], %[[DYN_SIZE1]], %[[DYN_SIZE2]]], strides: [%[[FINAL_STRIDE0]], %[[FINAL_STRIDE1]], %[[FINAL_STRIDE2]]]
 //
@@ -316,7 +315,7 @@ func.func @extract_strided_metadata_of_subview_w_variable_offset(
 // ==> 1 affine map with (rank * 2 + 1) symbols
 //
 // CHECK-DAG: #[[$STRIDE_MAP:.*]] = affine_map<()[s0, s1] -> (s0 * s1)>
-// CHECK-DAG: #[[$OFFSET_MAP:.*]] = affine_map<()[s0, s1, s2, s3, s4, s5, s6] -> (s0 + s1 * s2 + s3 * s4 + s5 * s6)>
+// CHECK-DAG: #[[$OFFSET_MAP:.*]] = affine_map<()[s0, s1, s2, s3, s4, s5] -> (s0 * s1 + s2 * s3 + s4 * s5)>
 // CHECK-LABEL: func @extract_strided_metadata_of_subview_all_dynamic
 //  CHECK-SAME: (%[[ARG:.*]]: memref<?x?x?xf32, strided<[?, ?, ?]>>, %[[DYN_OFFSET0:.*]]: index, %[[DYN_OFFSET1:.*]]: index, %[[DYN_OFFSET2:.*]]: index, %[[DYN_SIZE0:.*]]: index, %[[DYN_SIZE1:.*]]: index, %[[DYN_SIZE2:.*]]: index, %[[DYN_STRIDE0:.*]]: index, %[[DYN_STRIDE1:.*]]: index, %[[DYN_STRIDE2:.*]]: index)
 //
@@ -326,7 +325,7 @@ func.func @extract_strided_metadata_of_subview_w_variable_offset(
 //  CHECK-DAG: %[[FINAL_STRIDE1:.*]] = affine.apply #[[$STRIDE_MAP]]()[%[[DYN_STRIDE1]], %[[STRIDES]]#1]
 //  CHECK-DAG: %[[FINAL_STRIDE2:.*]] = affine.apply #[[$STRIDE_MAP]]()[%[[DYN_STRIDE2]], %[[STRIDES]]#2]
 //
-//  CHECK-DAG: %[[FINAL_OFFSET:.*]] = affine.apply #[[$OFFSET_MAP]]()[%[[OFFSET]], %[[DYN_OFFSET0]], %[[STRIDES]]#0, %[[DYN_OFFSET1]], %[[STRIDES]]#1, %[[DYN_OFFSET2]], %[[STRIDES]]#2]
+//  CHECK-DAG: %[[FINAL_OFFSET:.*]] = affine.apply #[[$OFFSET_MAP]]()[%[[DYN_OFFSET0]], %[[STRIDES]]#0, %[[DYN_OFFSET1]], %[[STRIDES]]#1, %[[DYN_OFFSET2]], %[[STRIDES]]#2]
 //
 //       CHECK: return %[[BASE]], %[[FINAL_OFFSET]], %[[DYN_SIZE0]], %[[DYN_SIZE1]], %[[DYN_SIZE2]], %[[FINAL_STRIDE0]], %[[FINAL_STRIDE1]], %[[FINAL_STRIDE2]]
 func.func @extract_strided_metadata_of_subview_all_dynamic(
@@ -403,7 +402,7 @@ func.func @extract_strided_metadata_of_subview_all_dynamic(
 //   CHECK-DAG: %[[DYN_STRIDE5:.*]] = affine.apply #[[$DIM5_STRIDE_MAP]]()[%[[SIZE1]], %[[STRIDES]]#1]
 //   CHECK-DAG: %[[DYN_STRIDE6:.*]] = affine.apply #[[$DIM6_STRIDE_MAP]]()[%[[STRIDES]]#1]
 //
-//   CHECK-DAG: %[[REINTERPRET_CAST:.*]] = memref.reinterpret_cast %[[BASE]] to offset: [%[[OFFSET]]], sizes: [%[[SIZE0]], 7, 8, 9, 10, 2, %[[SIZE1]], 3], strides: [%[[DYN_STRIDE0]], %[[DYN_STRIDE1]], %[[DYN_STRIDE2]], %[[STRIDES]]#0, %[[DYN_STRIDE4]], %[[DYN_STRIDE5]], %[[DYN_STRIDE6]], %[[STRIDES]]#1]
+//   CHECK-DAG: %[[REINTERPRET_CAST:.*]] = memref.reinterpret_cast %[[BASE]] to offset: [0], sizes: [%[[SIZE0]], 7, 8, 9, 10, 2, %[[SIZE1]], 3], strides: [%[[DYN_STRIDE0]], %[[DYN_STRIDE1]], %[[DYN_STRIDE2]], %[[STRIDES]]#0, %[[DYN_STRIDE4]], %[[DYN_STRIDE5]], %[[DYN_STRIDE6]], %[[STRIDES]]#1]
 //
 //   CHECK: return %[[REINTERPRET_CAST]]
 func.func @simplify_expand_shape(
diff --git a/mlir/test/Dialect/Vector/vector-transfer-drop-unit-dims-patterns.mlir b/mlir/test/Dialect/Vector/vector-transfer-drop-unit-dims-patterns.mlir
index f137a835016de..d3cb13f9c6b8b 100644
--- a/mlir/test/Dialect/Vector/vector-transfer-drop-unit-dims-patterns.mlir
+++ b/mlir/test/Dialect/Vector/vector-transfer-drop-unit-dims-patterns.mlir
@@ -15,7 +15,7 @@ func.func @transfer_read_rank_reducing(
 // CHECK-LABEL: func @transfer_read_rank_reducing
 //  CHECK-SAME:     %[[ARG:.+]]: memref<1x1x3x2xi8
 //       CHECK:   %[[SUBVIEW:.+]] = memref.subview %[[ARG]][0, 0, 0, 0] [1, 1, 3, 2] [1, 1, 1, 1]
-//  CHECK-SAME:     memref<1x1x3x2xi8, {{.*}}> to memref<3x2xi8, {{.*}}>
+//  CHECK-SAME:     memref<1x1x3x2xi8, {{.*}}> to memref<3x2xi8>
 //       CHECK:   vector.transfer_read %[[SUBVIEW]]
 
 func.func @transfer_read_rank_reducing_masked(
@@ -33,7 +33,7 @@ func.func @transfer_read_rank_reducing_masked(
 //  CHECK-SAME:     %[[ARG:.+]]: memref<1x1x3x2xi8
 //  CHECK-SAME:     %[[MASK:.+]]: vector<3x2xi1>
 //       CHECK:   %[[SUBVIEW:.+]] = memref.subview %[[ARG]][0, 0, 0, 0] [1, 1, 3, 2] [1, 1, 1, 1]
-//  CHECK-SAME:     memref<1x1x3x2xi8, {{.*}}> to memref<3x2xi8, {{.*}}>
+//  CHECK-SAME:     memref<1x1x3x2xi8, {{.*}}> to memref<3x2xi8>
 //       CHECK:   vector.mask %[[MASK]]
 //  CHECK-SAME:  vector.transfer_read %[[SUBVIEW]]
 
@@ -49,7 +49,7 @@ func.func @transfer_write_rank_reducing(
 // CHECK-LABEL: func @transfer_write_rank_reducing
 //  CHECK-SAME:     %[[ARG:.+]]: memref<1x1x3x2xi8
 //       CHECK:   %[[SUBVIEW:.+]] = memref.subview %[[ARG]][0, 0, 0, 0] [1, 1, 3, 2] [1, 1, 1, 1]
-//  CHECK-SAME:     memref<1x1x3x2xi8, {{.*}}> to memref<3x2xi8, {{.*}}>
+//  CHECK-SAME:     memref<1x1x3x2xi8, {{.*}}> to memref<3x2xi8>
 //       CHECK:   vector.transfer_write %{{.*}}, %[[SUBVIEW]]
 
 func.func @transfer_write_rank_reducing_masked(
@@ -68,7 +68,7 @@ func.func @transfer_write_rank_reducing_masked(
 //  CHECK-SAME:     %[[VEC:.+]]: vector<3x2xi8>
 //  CHECK-SAME:     %[[MASK:.+]]: vector<3x2xi1>
 //       CHECK:   %[[SUBVIEW:.+]] = memref.subview %[[ARG]][0, 0, 0, 0] [1, 1, 3, 2] [1, 1, 1, 1]
-//  CHECK-SAME:     memref<1x1x3x2xi8, {{.*}}> to memref<3x2xi8, {{.*}}>
+//  CHECK-SAME:     memref<1x1x3x2xi8, {{.*}}> to memref<3x2xi8>
 //       CHECK:   vector.mask %[[MASK]]
 //  CHECK-SAME:   vector.transfer_write %{{.*}}, %[[SUBVIEW]]
 

>From 60ee39b0c05e60fb841b48e6b2d6339be5e067af Mon Sep 17 00:00:00 2001
From: Ivan Butygin <ivan.butygin at gmail.com>
Date: Fri, 17 Apr 2026 03:30:46 +0200
Subject: [PATCH 12/27] [WIP][mlir] step 2 follow-ups: VectorToXeGPU
 transfer-read/write CHECK fixes (8 left)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
Signed-off-by: Ivan Butygin <ivan.butygin at gmail.com>
---
 .../VectorToXeGPU/transfer-read-to-xegpu.mlir |  9 ++----
 .../transfer-write-to-xegpu.mlir              | 28 ++++---------------
 2 files changed, 8 insertions(+), 29 deletions(-)

diff --git a/mlir/test/Conversion/VectorToXeGPU/transfer-read-to-xegpu.mlir b/mlir/test/Conversion/VectorToXeGPU/transfer-read-to-xegpu.mlir
index 586ed0d748644..642ee80c8c1fd 100644
--- a/mlir/test/Conversion/VectorToXeGPU/transfer-read-to-xegpu.mlir
+++ b/mlir/test/Conversion/VectorToXeGPU/transfer-read-to-xegpu.mlir
@@ -439,11 +439,9 @@ gpu.func @load_from_subview_1D(%source: memref<4096x4096xf16>, %off1: index, %of
 // LOAD-ND-SAME:   %[[SRC:.+]]: memref<4096x4096xf16>,
 // LOAD-ND-SAME:   %[[OFF1:.+]]: index, %[[OFF2:.+]]: index
 // LOAD-ND:        %[[CST:.+]] = arith.constant dense<true> : vector<8xi1>
-// LOAD-ND:        %[[SUBVIEW:.+]] = memref.subview %[[SRC]][%[[OFF1]], %[[OFF2]]] [256, 256] [1, 1] : memref<4096x4096xf16> to memref<256x256xf16, strided<[4096, 1]>> 
-// LOAD-ND:        %[[BB:.+]], %[[OFFSET:.+]],{{.*}},{{.*}} = memref.extract_strided_metadata %[[SUBVIEW]] : memref<256x256xf16, strided<[4096, 1]>> -> memref<f16>, index, index, index, index, index
+// LOAD-ND:        %[[SUBVIEW:.+]] = memref.subview %[[SRC]][%[[OFF1]], %[[OFF2]]] [256, 256] [1, 1] : memref<4096x4096xf16> to memref<256x256xf16, strided<[4096, 1]>>
 // LOAD-ND:        %[[STEP:.+]] = vector.step : vector<8xindex>
 // LOAD-ND:        arith.muli {{.*}} : index
-// LOAD-ND:        arith.addi %[[OFFSET]]{{.*}} : index
 // LOAD-ND:        arith.addi {{.*}} : index
 // LOAD-ND:        %[[SPLAT:.+]] = vector.broadcast {{.*}}:  index to vector<8xindex>
 // LOAD-ND:        %[[IDX:.+]] = arith.addi %[[SPLAT]], %[[STEP]] : vector<8xindex>
@@ -455,11 +453,9 @@ gpu.func @load_from_subview_1D(%source: memref<4096x4096xf16>, %off1: index, %of
 // LOAD-GATHER-SAME:   %[[SRC:.+]]: memref<4096x4096xf16>,
 // LOAD-GATHER-SAME:   %[[OFF1:.+]]: index, %[[OFF2:.+]]: index
 // LOAD-GATHER:        %[[CST:.+]] = arith.constant dense<true> : vector<8xi1>
-// LOAD-GATHER:        %[[SUBVIEW:.+]] = memref.subview %[[SRC]][%[[OFF1]], %[[OFF2]]] [256, 256] [1, 1] : memref<4096x4096xf16> to memref<256x256xf16, strided<[4096, 1]>> 
-// LOAD-GATHER:        %[[BB:.+]], %[[OFFSET:.+]],{{.*}},{{.*}} = memref.extract_strided_metadata %[[SUBVIEW]] : memref<256x256xf16, strided<[4096, 1]>> -> memref<f16>, index, index, index, index, index
+// LOAD-GATHER:        %[[SUBVIEW:.+]] = memref.subview %[[SRC]][%[[OFF1]], %[[OFF2]]] [256, 256] [1, 1] : memref<4096x4096xf16> to memref<256x256xf16, strided<[4096, 1]>>
 // LOAD-GATHER:        %[[STEP:.+]] = vector.step : vector<8xindex>
 // LOAD-GATHER:        arith.muli {{.*}} : index
-// LOAD-GATHER:        arith.addi %[[OFFSET]]{{.*}} : index
 // LOAD-GATHER:        arith.addi {{.*}} : index
 // LOAD-GATHER:        %[[SPLAT:.+]] = vector.broadcast {{.*}}:  index to vector<8xindex>
 // LOAD-GATHER:        %[[IDX:.+]] = arith.addi %[[SPLAT]], %[[STEP]] : vector<8xindex>
@@ -498,7 +494,6 @@ gpu.func @load_from_subview_2D(%source: memref<4096x4096xf16>, %off1: index, %of
 // LOAD-GATHER-SAME:   %[[OFF1:.+]]: index, %[[OFF2:.+]]: index
 // LOAD-GATHER:        %[[CST:.+]] = arith.constant dense<true> : vector<8x16xi1>
 // LOAD-GATHER:        %[[SUBVIEW:.+]] = memref.subview %[[SRC]][%[[OFF1]], %[[OFF2]]] [256, 256] [1, 1] : memref<4096x4096xf16> to memref<256x256xf16, strided<[4096, 1]>>
-// LOAD-GATHER:        %[[BB:.+]], %[[OFFSET:.+]],{{.*}},{{.*}} = memref.extract_strided_metadata %[[SUBVIEW]] : memref<256x256xf16, strided<[4096, 1]>> -> memref<f16>, index, index, index, index, index
 // LOAD-GATHER-COUNT2: vector.step
 // LOAD-GATHER-COUNT2: vector.shape_cast
 // LOAD-GATHER-COUNT2: vector.broadcast
diff --git a/mlir/test/Conversion/VectorToXeGPU/transfer-write-to-xegpu.mlir b/mlir/test/Conversion/VectorToXeGPU/transfer-write-to-xegpu.mlir
index d8ecc80497164..ce6d062eb8c96 100644
--- a/mlir/test/Conversion/VectorToXeGPU/transfer-write-to-xegpu.mlir
+++ b/mlir/test/Conversion/VectorToXeGPU/transfer-write-to-xegpu.mlir
@@ -15,17 +15,10 @@ gpu.func @store_1D_vector(%vec: vector<8xf32>,
 // STORE-ND-SAME:  %[[VEC:.+]]: vector<8xf32>,
 // STORE-ND-SAME:  %[[SRC:.+]]: memref<8x16x32xf32>,
 // STORE-ND-SAME:  %[[OFFSET:.+]]: index
-// STORE-ND:       %[[ELEM_BYTES:.+]] = arith.constant 4 : index
-// STORE-ND:       %[[COLLAPSED:.+]] = memref.subview %[[SRC]][%[[OFFSET]], %[[OFFSET]], 0]
-// STORE-ND:       %[[BASE_BUFFER:.+]], %[[OFFSET1:.+]], %[[SIZES:.+]], %[[STRIDES:.+]] = memref.extract_strided_metadata %[[COLLAPSED]]
-// STORE-ND-SAME:    : memref<32xf32, strided<[1]>> -> memref<f32>, index, index, index
-// STORE-ND:       %[[INTPTR:.+]] = memref.extract_aligned_pointer_as_index %[[BASE_BUFFER]]
-// STORE-ND-SAME:    : memref<f32> -> index
-// STORE-ND:       %[[MUL:.+]] = arith.muli %[[OFFSET1]], %[[ELEM_BYTES]] : index
-// STORE-ND:       %[[ADD:.+]] = arith.addi %[[INTPTR]], %[[MUL]] : index
-// STORE-ND:       %[[I64PTR:.+]] = arith.index_cast %[[ADD]] : index to i64
-// STORE-ND:       %[[DESC:.+]] = xegpu.create_nd_tdesc %[[I64PTR]], shape : [32],
-// STORE-ND-SAME:                   strides : [1] : i64  -> !xegpu.tensor_desc<8xf32,
+// STORE-ND:       %[[SUBVIEW:.+]] = memref.subview %[[SRC]][%[[OFFSET]], %[[OFFSET]], 0]
+// STORE-ND-SAME:    : memref<8x16x32xf32> to memref<32xf32, strided<[1]>>
+// STORE-ND:       %[[DESC:.+]] = xegpu.create_nd_tdesc %[[SUBVIEW]]
+// STORE-ND-SAME:    : memref<32xf32, strided<[1]>> -> !xegpu.tensor_desc<8xf32,
 // STORE-ND-SAME:    boundary_check = false
 // STORE-ND:       xegpu.store_nd %[[VEC]], %[[DESC]][%[[OFFSET]]] : vector<8xf32>
 
@@ -312,15 +305,9 @@ gpu.func @store_to_subview(%vec: vector<8xf16>,
 // STORE-ND-SAME:   %[[VEC:.+]]: vector<8xf16>,
 // STORE-ND-SAME:   %[[SRC:.+]]: memref<4096x4096xf16>,
 // STORE-ND-SAME:   %[[OFF1:.+]]: index, %[[OFF2:.+]]: index
-// STORE-ND:        %[[ELEM_BYTES:.+]] = arith.constant 2 : index
 // STORE-ND:        %[[SUBVIEW:.+]] = memref.subview %[[SRC]][%[[OFF1]], %[[OFF2]]] [256, 256] [1, 1] : memref<4096x4096xf16> to memref<256x256xf16, strided<[4096, 1]>>
-// STORE-ND:        %[[COLLAPSED:.+]] = memref.subview %[[SUBVIEW]][%[[OFF2]], 0]
-// STORE-ND:        %[[BASE_BUFFER:.*]], %[[OFFSET:.*]], %[[SIZES:.*]], %[[STRIDES:.*]] = memref.extract_strided_metadata %[[COLLAPSED]]
-// STORE-ND:        %[[INTPTR:.*]] = memref.extract_aligned_pointer_as_index %[[BASE_BUFFER]]
-// STORE-ND:        %[[MUL:.+]] = arith.muli %[[OFFSET]], %[[ELEM_BYTES]] : index
-// STORE-ND:        %[[ADD:.+]] = arith.addi %[[INTPTR]], %[[MUL]] : index
-// STORE-ND:        %[[I64PTR:.*]] = arith.index_cast %[[ADD]] : index to i64
-// STORE-ND:        %[[DESC:.*]] = xegpu.create_nd_tdesc %[[I64PTR]], shape : [256], strides : [1] : i64 ->
+// STORE-ND:        %[[COLLAPSED:.+]] = memref.subview %[[SUBVIEW]][%[[OFF2]], 0] [1, 256] [1, 1] : memref<256x256xf16, strided<[4096, 1]>> to memref<256xf16, strided<[1]>>
+// STORE-ND:        %[[DESC:.*]] = xegpu.create_nd_tdesc %[[COLLAPSED]] : memref<256xf16, strided<[1]>> ->
 // STORE-ND-SAME:                    !xegpu.tensor_desc<8xf16, #xegpu.block_tdesc_attr<boundary_check = false>>
 // STORE-ND:        xegpu.store_nd %[[VEC]], %[[DESC]][%[[OFF2]]] : vector<8xf16>
 
@@ -331,11 +318,8 @@ gpu.func @store_to_subview(%vec: vector<8xf16>,
 // STORE-SCATTER:        %[[CST:.+]] = arith.constant dense<true> : vector<8xi1>
 // STORE-SCATTER:        %[[SUBVIEW:.+]] = memref.subview %[[SRC]][%[[OFF1]], %[[OFF2]]] [256, 256] [1, 1]
 // STORE-SCATTER-SAME:     : memref<4096x4096xf16> to memref<256x256xf16, strided<[4096, 1]>>
-// STORE-SCATTER:        %[[BB:.+]], %[[OFFSET:.+]], {{.*}}, {{.*}} = memref.extract_strided_metadata %[[SUBVIEW]]
-// STORE-SCATTER-SAME:     : memref<256x256xf16, strided<[4096, 1]>> -> memref<f16>, index, index, index, index, index
 // STORE-SCATTER:        %[[STEP:.+]] = vector.step : vector<8xindex>
 // STORE-SCATTER:        arith.muli {{.*}} : index
-// STORE-SCATTER:        arith.addi %[[OFFSET]]{{.*}} : index
 // STORE-SCATTER:        arith.addi {{.*}} : index
 // STORE-SCATTER:        %[[SPLAT:.+]] = vector.broadcast {{.*}} : index to vector<8xindex>
 // STORE-SCATTER:        %[[IDX:.+]] = arith.addi %[[SPLAT]], %[[STEP]] : vector<8xindex>

>From 40d78ba7e5ef7701aec355e55add39e71777839a Mon Sep 17 00:00:00 2001
From: Ivan Butygin <ivan.butygin at gmail.com>
Date: Fri, 17 Apr 2026 03:34:32 +0200
Subject: [PATCH 13/27] [WIP][mlir] step 2 follow-ups: gather/scatter-to-xegpu
 CHECK fixes (6 left)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
Signed-off-by: Ivan Butygin <ivan.butygin at gmail.com>
---
 mlir/test/Conversion/VectorToXeGPU/gather-to-xegpu.mlir  | 4 ----
 mlir/test/Conversion/VectorToXeGPU/scatter-to-xegpu.mlir | 4 ----
 2 files changed, 8 deletions(-)

diff --git a/mlir/test/Conversion/VectorToXeGPU/gather-to-xegpu.mlir b/mlir/test/Conversion/VectorToXeGPU/gather-to-xegpu.mlir
index 14c4429109228..e6613ffb3b0c1 100644
--- a/mlir/test/Conversion/VectorToXeGPU/gather-to-xegpu.mlir
+++ b/mlir/test/Conversion/VectorToXeGPU/gather-to-xegpu.mlir
@@ -124,7 +124,6 @@ gpu.func @load_dynamic_source2(%source: memref<?x8x16xf32>,
 // CHECK-SAME:   %[[INDICES:.+]]: vector<8x16xindex>
 // CHECK-SAME:   %[[MASK:.+]]: vector<8x16xi1>
 // CHECK-SAME:   %[[PASS_THRU:.+]]: vector<8x16xf32>) -> vector<8x16xf32> {
-// CHECK-NOT:    memref.extract_strided_metadata %[[SRC]]
 // CHECK-COUNT2: arith.muli {{.*}} : index
 // CHECK-COUNT2: arith.addi {{.*}} : index
 // CHECK:        %[[SPLAT:.+]] = vector.broadcast {{.*}}:  index to vector<8x16xindex>
@@ -172,9 +171,7 @@ gpu.func @gather_from_subview(%source: memref<4096x4096xf16>,
 // CHECK-SAME:   %[[MASK:.+]]: vector<8xi1>,
 // CHECK-SAME:   %[[PASS:.+]]: vector<8xf16>) -> vector<8xf16> {
 // CHECK:        %[[SUBVIEW:.+]] = memref.subview %[[SRC]][%[[MEMREF_OFF]], %[[MEMREF_OFF]]] [256, 256] [1, 1]
-// CHECK:        %[[BB:.+]], %[[OFFSET:.+]],{{.*}},{{.*}} = memref.extract_strided_metadata %[[SUBVIEW]] : memref<256x256xf16, strided<[4096, 1]>> -> memref<f16>, index, index, index, index, index
 // CHECK:        arith.muli {{.*}}%[[OFF1]]{{.*}} : index
-// CHECK:        arith.addi %[[OFFSET]]{{.*}} : index
 // CHECK:        %[[BASE_OFF:.+]] = arith.addi {{.*}}%[[OFF2]]{{.*}} : index
 // CHECK:        %[[SPLAT:.+]] = vector.broadcast %[[BASE_OFF]] : index to vector<8xindex>
 // CHECK:        %[[LIN:.+]] = arith.addi %[[SPLAT]], %[[INDICES]] : vector<8xindex>
@@ -205,7 +202,6 @@ gpu.func @non_unit_inner_stride_1D(
 // CHECK-SAME:   %[[MASK:.+]]: vector<8xi1>, %[[PASS:.+]]: vector<8xf32>) -> vector<8xf32> {
 // CHECK:        %[[BB:.+]], %[[M_OFF:.+]], %[[SZ:.+]], %[[STRIDE:.+]] = memref.extract_strided_metadata %[[SRC]]
 // CHECK:        arith.muli %[[OFF1]], %[[STRIDE]] : index
-// CHECK:        arith.addi {{.*}} : index
 // CHECK:        %[[STRD_VEC:.+]] = vector.broadcast %[[STRIDE]] : index to vector<8xindex>
 // CHECK:        %[[STRD_INDICES:.+]] = arith.muli %[[STRD_VEC:.+]], %[[INDICES]] : vector<8xindex>
 // CHECK:        %[[SPLAT:.+]] = vector.broadcast {{.*}}:  index to vector<8xindex>
diff --git a/mlir/test/Conversion/VectorToXeGPU/scatter-to-xegpu.mlir b/mlir/test/Conversion/VectorToXeGPU/scatter-to-xegpu.mlir
index ef2d6e65168d5..0073a24789509 100644
--- a/mlir/test/Conversion/VectorToXeGPU/scatter-to-xegpu.mlir
+++ b/mlir/test/Conversion/VectorToXeGPU/scatter-to-xegpu.mlir
@@ -105,7 +105,6 @@ gpu.func @store_dynamic_source2(%vec: vector<8x16xf32>, %source: memref<?x8x16xf
 // CHECK-SAME:   %[[VAL:.+]]: vector<8x16xf32>, %[[SRC:.+]]: memref<?x8x16xf32>,
 // CHECK-SAME:   %[[OFF1:.+]]: index, %[[OFF2:.+]]: index, %[[OFF3:.+]]: index,
 // CHECK-SAME:   %[[INDICES:.+]]: vector<8x16xindex>, %[[MASK:.+]]: vector<8x16xi1>) {
-// CHECK-NOT:    memref.extract_strided_metadata %[[SRC]]
 // CHECK-COUNT2: arith.muli {{.*}} : index
 // CHECK-COUNT2: arith.addi {{.*}} : index
 // CHECK:        %[[SPLAT:.+]] = vector.broadcast {{.*}}:  index to vector<8x16xindex>
@@ -131,7 +130,6 @@ gpu.func @non_unit_inner_stride_1D(
 // CHECK-SAME:   %[[INDICES:.+]]: vector<8xindex>, %[[MASK:.+]]: vector<8xi1>) {
 // CHECK:        %[[BB:.+]], %[[M_OFF:.+]], %[[SZ:.+]], %[[STRIDE:.+]] = memref.extract_strided_metadata %[[SRC]]
 // CHECK:        arith.muli %[[OFF1]], %[[STRIDE]] : index
-// CHECK:        arith.addi {{.*}} : index
 // CHECK:        %[[STRD_VEC:.+]] = vector.broadcast %[[STRIDE]] : index to vector<8xindex>
 // CHECK:        %[[STRD_INDICES:.+]] = arith.muli %[[STRD_VEC:.+]], %[[INDICES]] : vector<8xindex>
 // CHECK:        %[[SPLAT:.+]] = vector.broadcast {{.*}}:  index to vector<8xindex>
@@ -193,9 +191,7 @@ gpu.func @scatter_into_subview(%vals: vector<8xf16>,
 // CHECK-SAME:   %[[MEMREF_OFF:.+]]: index, %[[OFF1:.+]]: index, %[[OFF2:.+]]: index,
 // CHECK-SAME:   %[[INDICES:.+]]: vector<8xindex>, %[[MASK:.+]]: vector<8xi1>) {
 // CHECK:        %[[SUBVIEW:.+]] = memref.subview %[[SRC]][%[[MEMREF_OFF]], %[[MEMREF_OFF]]] [256, 256] [1, 1]
-// CHECK:        %[[BB:.+]], %[[OFFSET:.+]],{{.*}},{{.*}} = memref.extract_strided_metadata %[[SUBVIEW]] : memref<256x256xf16, strided<[4096, 1]>> -> memref<f16>, index, index, index, index, index
 // CHECK:        arith.muli {{.*}}%[[OFF1]]{{.*}} : index
-// CHECK:        arith.addi %[[OFFSET]]{{.*}} : index
 // CHECK:        %[[BASE_OFF:.+]] = arith.addi {{.*}}%[[OFF2]]{{.*}} : index
 // CHECK:        %[[SPLAT:.+]] = vector.broadcast %[[BASE_OFF]] : index to vector<8xindex>
 // CHECK:        %[[LIN:.+]] = arith.addi %[[SPLAT]], %[[INDICES]] : vector<8xindex>

>From 2d15d93b4b8b03e50f792f33fa40be89db59079d Mon Sep 17 00:00:00 2001
From: Ivan Butygin <ivan.butygin at gmail.com>
Date: Fri, 17 Apr 2026 03:38:31 +0200
Subject: [PATCH 14/27] [WIP][mlir] step 2 follow-ups: XeVM CHECK fixes (4
 left)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
Signed-off-by: Ivan Butygin <ivan.butygin at gmail.com>
---
 mlir/test/Conversion/XeGPUToXeVM/loadstore_matrix.mlir |  4 ++--
 .../test/Conversion/XeGPUToXeVM/loadstoreprefetch.mlir | 10 +++-------
 2 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/mlir/test/Conversion/XeGPUToXeVM/loadstore_matrix.mlir b/mlir/test/Conversion/XeGPUToXeVM/loadstore_matrix.mlir
index 83dbf36aa4a4b..a8842873d3cc7 100644
--- a/mlir/test/Conversion/XeGPUToXeVM/loadstore_matrix.mlir
+++ b/mlir/test/Conversion/XeGPUToXeVM/loadstore_matrix.mlir
@@ -292,9 +292,9 @@ gpu.module @test_kernel [#xevm.target<chip = "pvc">] {
   %smem_coop_a = memref.subview %arg0[64, 0][1, 16][1, 1] : memref<256x16xbf16, 3> to memref<1x16xbf16, strided<[16, 1]>, 3>
 
   //CHECK: %[[INTPTR:.*]] = memref.extract_aligned_pointer_as_index %{{.*}} : memref<1x16xbf16, strided<[16, 1]>, 3> -> index
-  //CHECK: %[[C1024:.*]] = arith.constant 1024 : index
+  //CHECK: %[[C0:.*]] = arith.constant 0 : index
   //CHECK: %[[CAST0:.*]] = arith.index_castui %[[INTPTR]] : index to i32
-  //CHECK: %[[CAST1:.*]] = arith.index_castui %[[C1024]] : index to i32
+  //CHECK: %[[CAST1:.*]] = arith.index_castui %[[C0]] : index to i32
   //CHECK: %[[C2:.*]] = arith.constant 2 : i32
   //CHECK: %[[MUL:.*]] = arith.muli %[[CAST1]], %[[C2]] : i32
   //CHECK: %{{.*}} = arith.addi %[[CAST0]], %[[MUL]] : i32
diff --git a/mlir/test/Conversion/XeGPUToXeVM/loadstoreprefetch.mlir b/mlir/test/Conversion/XeGPUToXeVM/loadstoreprefetch.mlir
index 0062a5638c0c6..d7211321b659e 100644
--- a/mlir/test/Conversion/XeGPUToXeVM/loadstoreprefetch.mlir
+++ b/mlir/test/Conversion/XeGPUToXeVM/loadstoreprefetch.mlir
@@ -119,15 +119,11 @@ gpu.func @load_gather_from_dyn_memref_subview(%dyn: memref<?xf16>, %offset: vect
   %id = gpu.subgroup_id : index
   %src = memref.subview %dyn[%id][16][1] : memref<?xf16> to memref<16xf16, strided<[1]>>
 
-  // CHECK: %[[BASE:.*]], %[[OFFSET:.*]], %[[SIZES:.*]], %[[STRIDES:.*]] = memref.extract_strided_metadata %{{.*}} : memref<16xf16, strided<[1]>> -> memref<f16>, index, index, index
-  // CHECK: %[[INTPTR:.*]] = memref.extract_aligned_pointer_as_index %[[BASE]] : memref<f16> -> index
+  // CHECK: %[[INTPTR:.*]] = memref.extract_aligned_pointer_as_index %{{.*}} : memref<16xf16, strided<[1]>> -> index
   // CHECK: %[[CAST1:.*]] = arith.index_castui %[[INTPTR]] : index to i64
-  // CHECK: %[[CAST2:.*]] = arith.index_castui %[[OFFSET]] : index to i64
-  // CHECK: %[[MUL1:.*]] = arith.muli %[[CAST2]], %{{.*}} : i64
+  // CHECK: %[[MUL1:.*]] = arith.muli %{{.*}}, %{{.*}} : i64
   // CHECK: %[[ADD1:.*]] = arith.addi %[[CAST1]], %[[MUL1]] : i64
-  // CHECK: %[[MUL2:.*]] = arith.muli %{{.*}}, %{{.*}} : i64
-  // CHECK: %[[ADD2:.*]] = arith.addi %[[ADD1]], %[[MUL2]] : i64
-  // CHECK: %{{.*}} = llvm.inttoptr %[[ADD2]] : i64 to !llvm.ptr<1>
+  // CHECK: %{{.*}} = llvm.inttoptr %[[ADD1]] : i64 to !llvm.ptr<1>
 
   %0 = xegpu.load %src[%offset], %mask <{l1_hint = #xegpu.cache_hint<cached>, l2_hint = #xegpu.cache_hint<uncached>}>
       : memref<16xf16, strided<[1]>>, vector<1xindex>, vector<1xi1> -> vector<1xf16>

>From 70583e3d6ff3555f893843cf505750e95b803bd8 Mon Sep 17 00:00:00 2001
From: Ivan Butygin <ivan.butygin at gmail.com>
Date: Fri, 17 Apr 2026 03:45:20 +0200
Subject: [PATCH 15/27] [WIP][mlir] step 2 follow-ups: expand-strided-metadata
 CHECK fixes (3 left)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
Signed-off-by: Ivan Butygin <ivan.butygin at gmail.com>
---
 .../Dialect/MemRef/expand-strided-metadata.mlir   | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/mlir/test/Dialect/MemRef/expand-strided-metadata.mlir b/mlir/test/Dialect/MemRef/expand-strided-metadata.mlir
index a7f3066ad8a75..be2fc5ac1ee49 100644
--- a/mlir/test/Dialect/MemRef/expand-strided-metadata.mlir
+++ b/mlir/test/Dialect/MemRef/expand-strided-metadata.mlir
@@ -534,6 +534,7 @@ func.func @extract_strided_metadata_of_expand_shape_all_static(
 //  CHECK-SAME: (%[[ARG:.*]]: memref<?x?xf32,
 //  CHECK-SAME: %[[SIZE0:.*]]: index,  %[[SIZE1:.*]]: index, %[[SIZE2:.*]]: index,  %[[SIZE3:.*]]: index)
 //
+//   CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index
 //   CHECK-DAG: %[[C10:.*]] = arith.constant 10 : index
 //   CHECK-DAG: %[[C9:.*]] = arith.constant 9 : index
 //   CHECK-DAG: %[[C8:.*]] = arith.constant 8 : index
@@ -548,7 +549,7 @@ func.func @extract_strided_metadata_of_expand_shape_all_static(
 //   CHECK-DAG: %[[DYN_STRIDE5:.*]] = affine.apply #[[$DIM5_STRIDE_MAP]]()[%[[SIZE3]], %[[STRIDES]]#1]
 //   CHECK-DAG: %[[DYN_STRIDE6:.*]] = affine.apply #[[$DIM6_STRIDE_MAP]]()[%[[STRIDES]]#1]
 
-//   CHECK: return %[[BASE]], %[[OFFSET]], %[[SIZE0]], %[[SIZE1]], %[[C8]], %[[C9]], %[[C10]], %[[SIZE2]], %[[SIZE3]], %[[C3]], %[[DYN_STRIDE0]], %[[DYN_STRIDE1]], %[[DYN_STRIDE2]], %[[STRIDES]]#0, %[[DYN_STRIDE4]], %[[DYN_STRIDE5]], %[[DYN_STRIDE6]], %[[STRIDES]]#1 : memref<f32>, index, index, index, index, index, index, index, index, index, index, index, index, index
+//   CHECK: return %[[BASE]], %[[C0]], %[[SIZE0]], %[[SIZE1]], %[[C8]], %[[C9]], %[[C10]], %[[SIZE2]], %[[SIZE3]], %[[C3]], %[[DYN_STRIDE0]], %[[DYN_STRIDE1]], %[[DYN_STRIDE2]], %[[STRIDES]]#0, %[[DYN_STRIDE4]], %[[DYN_STRIDE5]], %[[DYN_STRIDE6]], %[[STRIDES]]#1 : memref<f32>, index, index, index, index, index, index, index, index, index, index, index, index, index
 func.func @extract_strided_metadata_of_expand_shape_all_dynamic(
     %base: memref<?x?xf32, strided<[?,?]>>,
     %sz0: index, %sz1: index, %sz2: index, %sz3: index)
@@ -587,11 +588,12 @@ func.func @extract_strided_metadata_of_expand_shape_all_dynamic(
 // CHECK-LABEL: func @extract_strided_metadata_of_expand_shape_all_static_0_rank
 //  CHECK-SAME: (%[[ARG:.*]]: memref<i16, strided<[]>>)
 //
+//   CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index
 //   CHECK-DAG: %[[C1:.*]] = arith.constant 1 : index
 //
 //   CHECK-DAG: %[[BASE:.*]], %[[OFFSET:.*]] = memref.extract_strided_metadata %[[ARG]] : memref<i16, strided<[]>> -> memref<i16>, index
 //
-//   CHECK: return %[[BASE]], %[[OFFSET]], %[[C1]], %[[C1]], %[[C1]], %[[C1]], %[[C1]], %[[C1]], %[[C1]], %[[C1]], %[[C1]], %[[C1]] : memref<i16>, index, index, index, index, index, index, index, index, index, index, index
+//   CHECK: return %[[BASE]], %[[C0]], %[[C1]], %[[C1]], %[[C1]], %[[C1]], %[[C1]], %[[C1]], %[[C1]], %[[C1]], %[[C1]], %[[C1]] : memref<i16>, index, index, index, index, index, index, index, index, index, index, index
 func.func @extract_strided_metadata_of_expand_shape_all_static_0_rank(
     %arg : memref<i16, strided<[]>>)
     -> (memref<i16>, index,
@@ -806,7 +808,7 @@ func.func @extract_strided_metadata_of_alloc_with_cst_offset(%arg : index)
 
 // CHECK-LABEL: extract_strided_metadata_of_alloc_with_cst_offset_in_type
 //       CHECK: %[[ALLOC:.*]] = memref.alloc
-//       CHECK: %[[BASE:[^,]*]], {{.*}} = memref.extract_strided_metadata %[[ALLOC]]
+//       CHECK: %[[BASE:.*]] = memref.reinterpret_cast %[[ALLOC]]
 //       CHECK: return %[[BASE]]
 func.func @extract_strided_metadata_of_alloc_with_cst_offset_in_type(%arg : index)
     -> (memref<i16>, index, index, index) {
@@ -959,7 +961,7 @@ func.func @simplify_collapse_with_dim_of_size1(%arg0: memref<3x1xf32, strided<[2
 //
 //       CHECK: %[[BASE:.*]], %[[OFFSET:.*]], %[[SIZES:.*]]:2, %[[STRIDES:.*]]:2 = memref.extract_strided_metadata %[[ARG]] : memref<1x1xi32, strided<[2, 1]>>
 //
-//       CHECK: %[[COLLAPSE_VIEW:.*]] = memref.reinterpret_cast %[[BASE]] to offset: [%[[OFFSET]]], sizes: [1], strides: [2]
+//       CHECK: %[[COLLAPSE_VIEW:.*]] = memref.reinterpret_cast %[[BASE]] to offset: [0], sizes: [1], strides: [2]
 func.func @simplify_collapse_with_dim_of_size1_and_non_1_stride
     (%arg0: memref<1x1xi32, strided<[2, 1]>>)
     -> memref<1xi32, strided<[2]>> {
@@ -1000,7 +1002,7 @@ func.func @simplify_collapse_with_dim_of_size1_and_non_1_stride
 //
 //       CHECK: %[[BASE:.*]], %[[OFFSET:.*]], %[[SIZES:.*]]:5, %[[STRIDES:.*]]:5 = memref.extract_strided_metadata %[[ARG]] : memref<2x3x1x1x1xi32, strided<[?, ?, ?, ?, 2]>>
 //
-//       CHECK: %[[COLLAPSE_VIEW:.*]] = memref.reinterpret_cast %[[BASE]] to offset: [%[[OFFSET]]], sizes: [6, 1], strides: [%[[STRIDES]]#1, %[[STRIDES]]#2]
+//       CHECK: %[[COLLAPSE_VIEW:.*]] = memref.reinterpret_cast %[[BASE]] to offset: [0], sizes: [6, 1], strides: [%[[STRIDES]]#1, %[[STRIDES]]#2]
 func.func @simplify_collapse_with_dim_of_size1_and_resulting_dyn_stride
     (%arg0: memref<2x3x1x1x1xi32, strided<[?, ?, ?, ?, 2]>>)
     -> memref<6x1xi32, strided<[?, ?]>> {
@@ -1386,10 +1388,9 @@ func.func @extract_strided_metadata_of_memory_space_cast(%base: memref<20xf32>)
 }
 
 // CHECK-LABEL:  func @extract_strided_metadata_of_memory_space_cast
-//   CHECK-DAG:    %[[OFFSET:.*]] = arith.constant 0 : index
 //   CHECK-DAG:    %[[SIZE:.*]] = arith.constant 20 : index
 //   CHECK-DAG:    %[[STEP:.*]] = arith.constant 1 : index
-//       CHECK:    %[[BASE:.*]], %{{.*}}, %{{.*}}, %{{.*}} = memref.extract_strided_metadata
+//       CHECK:    %[[BASE:.*]], %[[OFFSET:.*]], %{{.*}}, %{{.*}} = memref.extract_strided_metadata
 //       CHECK:    %[[CAST:.*]] = memref.memory_space_cast %[[BASE]]
 //       CHECK:    return %[[CAST]], %[[OFFSET]], %[[SIZE]], %[[STEP]] : memref<f32, 1>, index, index, index
 

>From 18fad81adb23c55ce4b7f18065d29c67f901525a Mon Sep 17 00:00:00 2001
From: Ivan Butygin <ivan.butygin at gmail.com>
Date: Fri, 17 Apr 2026 03:51:24 +0200
Subject: [PATCH 16/27] [WIP][mlir] step 2 follow-ups: NVGPU CHECK fix (2 left)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
Signed-off-by: Ivan Butygin <ivan.butygin at gmail.com>
---
 mlir/test/Conversion/NVGPUToNVVM/nvgpu-to-nvvm.mlir | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/mlir/test/Conversion/NVGPUToNVVM/nvgpu-to-nvvm.mlir b/mlir/test/Conversion/NVGPUToNVVM/nvgpu-to-nvvm.mlir
index 464592b716c2d..48b9ad4c3d777 100644
--- a/mlir/test/Conversion/NVGPUToNVVM/nvgpu-to-nvvm.mlir
+++ b/mlir/test/Conversion/NVGPUToNVVM/nvgpu-to-nvvm.mlir
@@ -852,9 +852,7 @@ module @mymodule {
     // CHECK: nvvm.cp.async.bulk.tensor.shared.cluster.global
     nvgpu.tma.async.load %lhsTensorMap[%c0, %c0], %mbarrier[%c0] to %lhsShmem : !lhsTensorMap, !barrierType -> memref<128x64xf16,3>
     // CHECK: %[[desc:.+]] = llvm.extractvalue %{{.*}}[1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<2 x i64>, array<2 x i64>)>
-    // CHECK: %[[c8192:.+]] = llvm.mlir.constant(8192 : index) : i64
-    // CHECK: %[[shmemOfset:.+]] = llvm.getelementptr %[[desc]][%[[c8192]]] : (!llvm.ptr<3>, i64)
-    // CHECK: %[[dest:.+]] = llvm.addrspacecast %[[shmemOfset]] : !llvm.ptr<3> to !llvm.ptr<7>
+    // CHECK: %[[dest:.+]] = llvm.addrspacecast %[[desc]] : !llvm.ptr<3> to !llvm.ptr<7>
     // CHECK: nvvm.cp.async.bulk.tensor.shared.cluster.global %[[dest]], %{{.*}}, %{{.*}}, box[%{{.*}}, %{{.*}}]
     nvgpu.tma.async.load %rhsTensorMap[%c0, %c0], %mbarrier[%c0] to %rhsShmem : !rhsTensorMap, !barrierType -> memref<64x64xf16, strided<[64, 1]>, 3>
     return

>From 145a2fc9ca52a7aabc2e1c4f7aed47b161ee2b34 Mon Sep 17 00:00:00 2001
From: Ivan Butygin <ivan.butygin at gmail.com>
Date: Fri, 17 Apr 2026 03:54:15 +0200
Subject: [PATCH 17/27] [WIP][mlir] step 2 follow-ups: tensor bufferize
 buffer_layout (1 left)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
Signed-off-by: Ivan Butygin <ivan.butygin at gmail.com>
---
 mlir/test/Dialect/Tensor/one-shot-bufferize.mlir | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/mlir/test/Dialect/Tensor/one-shot-bufferize.mlir b/mlir/test/Dialect/Tensor/one-shot-bufferize.mlir
index 737f618bd41f4..3f57ac6622a52 100644
--- a/mlir/test/Dialect/Tensor/one-shot-bufferize.mlir
+++ b/mlir/test/Dialect/Tensor/one-shot-bufferize.mlir
@@ -330,15 +330,15 @@ func.func @dim_not_reading(%t: tensor<?xf32>, %f: f32, %pos: index)
 
 // -----
 
-//       CHECK: #[[$map:.*]] = affine_map<(d0) -> (d0 + 5)>
+//       CHECK: #[[$map:.*]] = affine_map<(d0) -> (d0 * 2)>
 // CHECK-LABEL: func.func private @cast_retains_buffer_layout(
-//  CHECK-SAME:     %[[t:.*]]: memref<?xf32, #[[$map]]>, %[[sz:.*]]: index) -> memref<?xf32, strided<[1]>> {
+//  CHECK-SAME:     %[[t:.*]]: memref<?xf32, #[[$map]]>, %[[sz:.*]]: index) -> memref<?xf32, strided<[2]>> {
 //       CHECK:   %[[casted:.*]] = memref.cast %[[t]] : memref<?xf32, #[[$map]]> to memref<10xf32, #[[$map]]>
-//       CHECK:   %[[slice:.*]] = memref.subview %[[casted]][2] [%[[sz]]] [1] : memref<10xf32, #[[$map]]> to memref<?xf32, strided<[1]>>
+//       CHECK:   %[[slice:.*]] = memref.subview %[[casted]][2] [%[[sz]]] [1] : memref<10xf32, #[[$map]]> to memref<?xf32, strided<[2]>>
 //       CHECK:   return %[[slice]]
 func.func private @cast_retains_buffer_layout(
     %t: tensor<?xf32>
-        {bufferization.buffer_layout = affine_map<(d0) -> (d0 + 5)>},
+        {bufferization.buffer_layout = affine_map<(d0) -> (d0 * 2)>},
     %sz: index)
   -> (tensor<10xf32>, tensor<?xf32>)
 {

>From 13b0e876d0b464a7e1019ae2f413e57f8b108903 Mon Sep 17 00:00:00 2001
From: Ivan Butygin <ivan.butygin at gmail.com>
Date: Fri, 17 Apr 2026 03:59:14 +0200
Subject: [PATCH 18/27] [WIP][mlir] step 2 follow-ups: PtrToLLVM CHECK fix -
 all dialect/conversion tests pass

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
Signed-off-by: Ivan Butygin <ivan.butygin at gmail.com>
---
 .../Conversion/PtrToLLVM/ptr-to-llvm.mlir     | 28 +++++++++----------
 1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/mlir/test/Conversion/PtrToLLVM/ptr-to-llvm.mlir b/mlir/test/Conversion/PtrToLLVM/ptr-to-llvm.mlir
index 7110a622dcb03..b34c6743a817a 100644
--- a/mlir/test/Conversion/PtrToLLVM/ptr-to-llvm.mlir
+++ b/mlir/test/Conversion/PtrToLLVM/ptr-to-llvm.mlir
@@ -226,36 +226,34 @@ func.func @test_memref_strided(%arg0: memref<10x20xf32, strided<[40, 2]>, #ptr.g
 // CHECK:           %[[VAL_6:.*]] = llvm.insertvalue %[[ARG4]], %[[VAL_5]][3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:           %[[VAL_7:.*]] = llvm.insertvalue %[[ARG6]], %[[VAL_6]][4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:           %[[VAL_8:.*]] = llvm.extractvalue %[[VAL_7]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-// CHECK:           %[[VAL_9:.*]] = llvm.mlir.undef : !llvm.struct<(ptr, i64, i64, i64, i64, i64)>
+// CHECK:           %[[VAL_9:.*]] = llvm.mlir.undef : !llvm.struct<(ptr, i64, i64, i64, i64)>
 // CHECK:           %[[VAL_10:.*]] = llvm.extractvalue %[[VAL_7]][0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-// CHECK:           %[[VAL_11:.*]] = llvm.insertvalue %[[VAL_10]], %[[VAL_9]][0] : !llvm.struct<(ptr, i64, i64, i64, i64, i64)>
-// CHECK:           %[[VAL_12:.*]] = llvm.extractvalue %[[VAL_7]][2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-// CHECK:           %[[VAL_13:.*]] = llvm.insertvalue %[[VAL_12]], %[[VAL_11]][1] : !llvm.struct<(ptr, i64, i64, i64, i64, i64)>
+// CHECK:           %[[VAL_11:.*]] = llvm.insertvalue %[[VAL_10]], %[[VAL_9]][0] : !llvm.struct<(ptr, i64, i64, i64, i64)>
 // CHECK:           %[[VAL_14:.*]] = llvm.extractvalue %[[VAL_7]][3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-// CHECK:           %[[VAL_15:.*]] = llvm.insertvalue %[[VAL_14]], %[[VAL_13]][2] : !llvm.struct<(ptr, i64, i64, i64, i64, i64)>
+// CHECK:           %[[VAL_15:.*]] = llvm.insertvalue %[[VAL_14]], %[[VAL_11]][1] : !llvm.struct<(ptr, i64, i64, i64, i64)>
 // CHECK:           %[[VAL_16:.*]] = llvm.extractvalue %[[VAL_7]][3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-// CHECK:           %[[VAL_17:.*]] = llvm.insertvalue %[[VAL_16]], %[[VAL_15]][3] : !llvm.struct<(ptr, i64, i64, i64, i64, i64)>
+// CHECK:           %[[VAL_17:.*]] = llvm.insertvalue %[[VAL_16]], %[[VAL_15]][2] : !llvm.struct<(ptr, i64, i64, i64, i64)>
 // CHECK:           %[[VAL_18:.*]] = llvm.extractvalue %[[VAL_7]][4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-// CHECK:           %[[VAL_19:.*]] = llvm.insertvalue %[[VAL_18]], %[[VAL_17]][4] : !llvm.struct<(ptr, i64, i64, i64, i64, i64)>
+// CHECK:           %[[VAL_19:.*]] = llvm.insertvalue %[[VAL_18]], %[[VAL_17]][3] : !llvm.struct<(ptr, i64, i64, i64, i64)>
 // CHECK:           %[[VAL_20:.*]] = llvm.extractvalue %[[VAL_7]][4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-// CHECK:           %[[VAL_21:.*]] = llvm.insertvalue %[[VAL_20]], %[[VAL_19]][5] : !llvm.struct<(ptr, i64, i64, i64, i64, i64)>
+// CHECK:           %[[VAL_21:.*]] = llvm.insertvalue %[[VAL_20]], %[[VAL_19]][4] : !llvm.struct<(ptr, i64, i64, i64, i64)>
 // CHECK:           %[[VAL_22:.*]] = llvm.mlir.zero : !llvm.ptr
 // CHECK:           %[[VAL_23:.*]] = llvm.getelementptr %[[VAL_22]][1] : (!llvm.ptr) -> !llvm.ptr, f32
 // CHECK:           %[[VAL_24:.*]] = llvm.ptrtoint %[[VAL_23]] : !llvm.ptr to i64
 // CHECK:           %[[VAL_25:.*]] = llvm.getelementptr inbounds %[[VAL_8]]{{\[}}%[[VAL_24]]] : (!llvm.ptr, i64) -> !llvm.ptr, i8
 // CHECK:           %[[VAL_26:.*]] = llvm.mlir.poison : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-// CHECK:           %[[VAL_27:.*]] = llvm.extractvalue %[[VAL_21]][0] : !llvm.struct<(ptr, i64, i64, i64, i64, i64)>
+// CHECK:           %[[VAL_27:.*]] = llvm.extractvalue %[[VAL_21]][0] : !llvm.struct<(ptr, i64, i64, i64, i64)>
 // CHECK:           %[[VAL_28:.*]] = llvm.insertvalue %[[VAL_27]], %[[VAL_26]][0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:           %[[VAL_29:.*]] = llvm.insertvalue %[[VAL_25]], %[[VAL_28]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-// CHECK:           %[[VAL_30:.*]] = llvm.extractvalue %[[VAL_21]][1] : !llvm.struct<(ptr, i64, i64, i64, i64, i64)>
-// CHECK:           %[[VAL_31:.*]] = llvm.insertvalue %[[VAL_30]], %[[VAL_29]][2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-// CHECK:           %[[VAL_32:.*]] = llvm.extractvalue %[[VAL_21]][2] : !llvm.struct<(ptr, i64, i64, i64, i64, i64)>
+// CHECK:           %[[ZERO:.*]] = llvm.mlir.constant(0 : index) : i64
+// CHECK:           %[[VAL_31:.*]] = llvm.insertvalue %[[ZERO]], %[[VAL_29]][2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+// CHECK:           %[[VAL_32:.*]] = llvm.extractvalue %[[VAL_21]][1] : !llvm.struct<(ptr, i64, i64, i64, i64)>
 // CHECK:           %[[VAL_33:.*]] = llvm.insertvalue %[[VAL_32]], %[[VAL_31]][3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-// CHECK:           %[[VAL_34:.*]] = llvm.extractvalue %[[VAL_21]][3] : !llvm.struct<(ptr, i64, i64, i64, i64, i64)>
+// CHECK:           %[[VAL_34:.*]] = llvm.extractvalue %[[VAL_21]][2] : !llvm.struct<(ptr, i64, i64, i64, i64)>
 // CHECK:           %[[VAL_35:.*]] = llvm.insertvalue %[[VAL_34]], %[[VAL_33]][3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-// CHECK:           %[[VAL_36:.*]] = llvm.extractvalue %[[VAL_21]][4] : !llvm.struct<(ptr, i64, i64, i64, i64, i64)>
+// CHECK:           %[[VAL_36:.*]] = llvm.extractvalue %[[VAL_21]][3] : !llvm.struct<(ptr, i64, i64, i64, i64)>
 // CHECK:           %[[VAL_37:.*]] = llvm.insertvalue %[[VAL_36]], %[[VAL_35]][4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-// CHECK:           %[[VAL_38:.*]] = llvm.extractvalue %[[VAL_21]][5] : !llvm.struct<(ptr, i64, i64, i64, i64, i64)>
+// CHECK:           %[[VAL_38:.*]] = llvm.extractvalue %[[VAL_21]][4] : !llvm.struct<(ptr, i64, i64, i64, i64)>
 // CHECK:           %[[VAL_39:.*]] = llvm.insertvalue %[[VAL_38]], %[[VAL_37]][4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:           llvm.return %[[VAL_39]] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:         }

>From 33f321b27df9a11c3f4fb213cdf2d3ab5cf7a129 Mon Sep 17 00:00:00 2001
From: Ivan Butygin <ivan.butygin at gmail.com>
Date: Fri, 17 Apr 2026 04:55:04 +0200
Subject: [PATCH 19/27] [WIP][mlir] step 2 follow-ups: fix runtime offset drop
 in LLVM lowering

The hang in sparse_reductions_prod (under enable-buffer-initialization=true)
was caused by MemRefDescriptor::bufferPtr and ExpandStridedMetadata helpers
silently dropping the descriptor offset because getStridesAndOffset now
always reports static offset 0.

- bufferPtr: always GEP through the descriptor offset
- resolveSubviewStridedMetadata, resolveReshapeStridedMetadata: always read
  runtime offset from extract_strided_metadata

CHECK-line updates in 7 tests (8 left for expand-then-convert + esm).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
---
 .../Conversion/LLVMCommon/MemRefBuilder.cpp   | 23 ++----
 .../Transforms/ExpandStridedMetadata.cpp      | 14 ++--
 .../AMDGPUToROCDL/amdgpu-to-rocdl.mlir        |  8 ++-
 .../Conversion/AMDGPUToROCDL/gfx1250.mlir     | 28 ++++++--
 .../AMDGPUToROCDL/global-prefetch.mlir        |  3 +
 .../AMDGPUToROCDL/load_lds-gfx950.mlir        | 16 +++--
 .../Conversion/AMDGPUToROCDL/load_lds.mlir    | 72 ++++++++++++++-----
 .../convert-dynamic-memref-ops.mlir           | 20 ++++--
 .../convert-static-memref-ops.mlir            | 26 +++++--
 .../MemRefToLLVM/memref-to-llvm.mlir          | 24 +++++--
 10 files changed, 162 insertions(+), 72 deletions(-)

diff --git a/mlir/lib/Conversion/LLVMCommon/MemRefBuilder.cpp b/mlir/lib/Conversion/LLVMCommon/MemRefBuilder.cpp
index 522e91421ff55..0762d6c9530d8 100644
--- a/mlir/lib/Conversion/LLVMCommon/MemRefBuilder.cpp
+++ b/mlir/lib/Conversion/LLVMCommon/MemRefBuilder.cpp
@@ -195,25 +195,14 @@ LLVM::LLVMPointerType MemRefDescriptor::getElementPtrType() {
 Value MemRefDescriptor::bufferPtr(OpBuilder &builder, Location loc,
                                   const LLVMTypeConverter &converter,
                                   MemRefType type) {
-  // When we convert to LLVM, the input memref must have been normalized
-  // beforehand. Hence, this call is guaranteed to work.
-  auto [strides, offsetCst] = type.getStridesAndOffset();
-
+  // The MemRef type no longer carries a static offset, so we cannot tell from
+  // the type alone whether the runtime offset is zero. Always add it; LLVM's
+  // canonicalizer will fold a zero-offset GEP away.
   Value ptr = alignedPtr(builder, loc);
-  // For zero offsets, we already have the base pointer.
-  if (offsetCst == 0)
-    return ptr;
-
-  // Otherwise add the offset to the aligned base.
-  Type indexType = converter.getIndexType();
-  Value offsetVal =
-      ShapedType::isDynamic(offsetCst)
-          ? offset(builder, loc)
-          : createIndexAttrConstant(builder, loc, indexType, offsetCst);
+  Value offsetVal = offset(builder, loc);
   Type elementType = converter.convertType(type.getElementType());
-  ptr = LLVM::GEPOp::create(builder, loc, ptr.getType(), elementType, ptr,
-                            offsetVal);
-  return ptr;
+  return LLVM::GEPOp::create(builder, loc, ptr.getType(), elementType, ptr,
+                             offsetVal);
 }
 
 /// Creates a MemRef descriptor structure from a list of individual values
diff --git a/mlir/lib/Dialect/MemRef/Transforms/ExpandStridedMetadata.cpp b/mlir/lib/Dialect/MemRef/Transforms/ExpandStridedMetadata.cpp
index cda14f1c3cf2c..265df32b49b8a 100644
--- a/mlir/lib/Dialect/MemRef/Transforms/ExpandStridedMetadata.cpp
+++ b/mlir/lib/Dialect/MemRef/Transforms/ExpandStridedMetadata.cpp
@@ -69,6 +69,7 @@ resolveSubviewStridedMetadata(RewriterBase &rewriter,
       memref::ExtractStridedMetadataOp::create(rewriter, origLoc, source);
 
   auto [sourceStrides, sourceOffset] = sourceType.getStridesAndOffset();
+  (void)sourceOffset;
 #ifndef NDEBUG
   auto [resultStrides, resultOffset] = subview.getType().getStridesAndOffset();
 #endif // NDEBUG
@@ -86,9 +87,9 @@ resolveSubviewStridedMetadata(RewriterBase &rewriter,
 
   bindSymbolsList(rewriter.getContext(), MutableArrayRef{symbols});
   AffineExpr expr = symbols.front();
-  values[0] = ShapedType::isDynamic(sourceOffset)
-                  ? getAsOpFoldResult(newExtractStridedMetadata.getOffset())
-                  : rewriter.getIndexAttr(sourceOffset);
+  // The MemRef type no longer carries a static offset, so always read the
+  // runtime offset from extract_strided_metadata.
+  values[0] = getAsOpFoldResult(newExtractStridedMetadata.getOffset());
   SmallVector<OpFoldResult> subOffsets = subview.getMixedOffsets();
 
   AffineExpr s0 = rewriter.getAffineSymbolExpr(0);
@@ -507,13 +508,14 @@ static FailureOr<StridedMetadata> resolveReshapeStridedMetadata(
 
   // Collect statically known information.
   auto [strides, offset] = sourceType.getStridesAndOffset();
+  (void)offset;
   MemRefType reshapeType = reshape.getResultType();
   unsigned reshapeRank = reshapeType.getRank();
 
+  // The MemRef type no longer carries a static offset, so always read the
+  // runtime offset from extract_strided_metadata.
   OpFoldResult offsetOfr =
-      ShapedType::isDynamic(offset)
-          ? getAsOpFoldResult(newExtractStridedMetadata.getOffset())
-          : rewriter.getIndexAttr(offset);
+      getAsOpFoldResult(newExtractStridedMetadata.getOffset());
 
   // Get the special case of 0-D out of the way.
   if (sourceRank == 0) {
diff --git a/mlir/test/Conversion/AMDGPUToROCDL/amdgpu-to-rocdl.mlir b/mlir/test/Conversion/AMDGPUToROCDL/amdgpu-to-rocdl.mlir
index 6d48b143d45c4..6f15498422465 100644
--- a/mlir/test/Conversion/AMDGPUToROCDL/amdgpu-to-rocdl.mlir
+++ b/mlir/test/Conversion/AMDGPUToROCDL/amdgpu-to-rocdl.mlir
@@ -69,7 +69,9 @@ func.func @fat_raw_buffer_cast_dyn_size_offset(%buf: memref<?xi32, strided<[1]>,
 // CHECK-LABEL: func @fat_raw_buffer_cast_reset_offset
 func.func @fat_raw_buffer_cast_reset_offset(%buf: memref<?xi32, strided<[1]>, #gpu.address_space<global>>) -> memref<?xi32, strided<[1]>, #amdgpu.address_space<fat_raw_buffer>> {
   // CHECK: %[[desc:.*]] = builtin.unrealized_conversion_cast %{{.*}} : memref<?xi32, strided<[1]>, #gpu.address_space<global>> to !llvm.struct<(ptr<1>, ptr<1>, i64, array<1 x i64>, array<1 x i64>)>
-  // CHECK-DAG: %[[basePtr:.*]] = llvm.extractvalue %[[desc]][1]
+  // CHECK-DAG: %[[aligned:.*]] = llvm.extractvalue %[[desc]][1]
+  // CHECK-DAG: %[[descOff:.*]] = llvm.extractvalue %[[desc]][2]
+  // CHECK-DAG: %[[basePtr:.*]] = llvm.getelementptr %[[aligned]][%[[descOff]]]
   // CHECK-DAG: %[[zeroOff:.*]] = llvm.mlir.constant(0 : index) : i64
   // CHECK: %[[fatBuf:.*]] = rocdl.make.buffer.rsrc %[[basePtr]], %{{.*}}, %{{.*}}, %{{.*}}
   // CHECK: llvm.insertvalue %[[fatBuf]], %{{.*}}[1]
@@ -152,7 +154,9 @@ func.func @gpu_gcn_raw_buffer_load_i32(%buf: memref<64xi32>, %idx: i32) -> i32 {
 func.func @gpu_gcn_raw_buffer_load_i32_strided(%buf: memref<16x16xi32, strided<[?, ?]>>, %i: i32, %j: i32) -> i32 {
     // CHECK: %[[descriptor:.*]] = builtin.unrealized_conversion_cast %{{.*}} : memref<16x16xi32, strided<[?, ?]>> to !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
     // CHECK: %[[elem_size:.*]] = llvm.mlir.constant(4 : i32) : i32
-    // CHECK: %[[ptr:.*]] = llvm.extractvalue %[[descriptor]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+    // CHECK: %[[aligned:.*]] = llvm.extractvalue %[[descriptor]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+    // CHECK: %[[descOff:.*]] = llvm.extractvalue %[[descriptor]][2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+    // CHECK: %[[ptr:.*]] = llvm.getelementptr %[[aligned]][%[[descOff]]]
     // CHECK: %[[sz_i:.*]] = llvm.extractvalue %[[descriptor]][3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
     // CHECK: %[[stride_i:.*]] = llvm.extractvalue %[[descriptor]][4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
     // CHECK: %[[ext_i:.*]] = llvm.mul %[[sz_i]], %[[stride_i]] : i64
diff --git a/mlir/test/Conversion/AMDGPUToROCDL/gfx1250.mlir b/mlir/test/Conversion/AMDGPUToROCDL/gfx1250.mlir
index e43ece8c74fdf..9e914648c0a02 100644
--- a/mlir/test/Conversion/AMDGPUToROCDL/gfx1250.mlir
+++ b/mlir/test/Conversion/AMDGPUToROCDL/gfx1250.mlir
@@ -193,8 +193,12 @@ func.func @make_dma_base(%idx: index, %mem: memref<8xi32, #gpu.address_space<glo
   // CHECK-DAG: %[[C2:.+]] = llvm.mlir.constant(2 : i32) : i32
   // CHECK-DAG: %[[C3:.+]] = llvm.mlir.constant(3 : i32) : i32
 
-  // CHECK-DAG: %[[MEM_BASE_PTR:.+]] = llvm.extractvalue %[[MEMREF_DESC_MEM]][1] : !llvm.struct<(ptr<1>
-  // CHECK-DAG: %[[SMEM_BASE_PTR:.+]] = llvm.extractvalue %[[MEMREF_DESC_SMEM]][1] : !llvm.struct<(ptr<3>
+  // CHECK-DAG: %[[MEM_ALIGNED:.+]] = llvm.extractvalue %[[MEMREF_DESC_MEM]][1] : !llvm.struct<(ptr<1>
+  // CHECK-DAG: %[[MEM_DESC_OFF:.+]] = llvm.extractvalue %[[MEMREF_DESC_MEM]][2] : !llvm.struct<(ptr<1>
+  // CHECK-DAG: %[[MEM_BASE_PTR:.+]] = llvm.getelementptr %[[MEM_ALIGNED]][%[[MEM_DESC_OFF]]]
+  // CHECK-DAG: %[[SMEM_ALIGNED:.+]] = llvm.extractvalue %[[MEMREF_DESC_SMEM]][1] : !llvm.struct<(ptr<3>
+  // CHECK-DAG: %[[SMEM_DESC_OFF:.+]] = llvm.extractvalue %[[MEMREF_DESC_SMEM]][2] : !llvm.struct<(ptr<3>
+  // CHECK-DAG: %[[SMEM_BASE_PTR:.+]] = llvm.getelementptr %[[SMEM_ALIGNED]][%[[SMEM_DESC_OFF]]]
 
   // CHECK-DAG: %[[MEM_BASE_OFFSET:.+]] = llvm.getelementptr %[[MEM_BASE_PTR]][%[[INT]]]
   // CHECK-DAG: %[[SMEM_BASE_OFFSET:.+]] = llvm.getelementptr %[[SMEM_BASE_PTR]][%[[INT]]]
@@ -362,7 +366,9 @@ func.func @make_dma_descriptor_atomic_barrier(%base: !amdgpu.tdm_base<i32>, %bar
   // CHECK: %[[ATOMIC_BARRIER_ENABLE_FIELD:.+]] = llvm.shl %[[C1]], %[[ATOMIC_BARRIER_ENABLE_OFFSET]]
   // CHECK: %[[SGPR0:.+]] = llvm.or disjoint %[[SGPR0_0]], %[[ATOMIC_BARRIER_ENABLE_FIELD]]
 
-  // CHECK: %[[ATOMIC_BARRIER_ALIGNED_PTR:.+]] = llvm.extractvalue %[[BARRIER_MEMREF_DESC]][1]
+  // CHECK: %[[ATOMIC_BARRIER_ALIGNED_RAW:.+]] = llvm.extractvalue %[[BARRIER_MEMREF_DESC]][1]
+  // CHECK: %[[ATOMIC_BARRIER_DESC_OFF:.+]] = llvm.extractvalue %[[BARRIER_MEMREF_DESC]][2]
+  // CHECK: %[[ATOMIC_BARRIER_ALIGNED_PTR:.+]] = llvm.getelementptr %[[ATOMIC_BARRIER_ALIGNED_RAW]][%[[ATOMIC_BARRIER_DESC_OFF]]]
   // CHECK: %[[ATOMIC_BARRIER_ADDR:.+]] = llvm.getelementptr %[[ATOMIC_BARRIER_ALIGNED_PTR]][%[[INDEX]]
   // CHECK: %[[ATOMIC_BARRIER_I32:.+]] = llvm.ptrtoint %[[ATOMIC_BARRIER_ADDR]] : !llvm.ptr<3> to i32
   // CHECK: %[[ATOMIC_BARRIER_NO_3_LSB:.+]] = llvm.lshr %[[ATOMIC_BARRIER_I32]], %[[C3]]
@@ -854,7 +860,9 @@ func.func @make_gather_dma_descriptor(%base: !amdgpu.tdm_gather_base<i32, i16>,
 // CHECK-LABEL: func @ds_barrier_init
 func.func @ds_barrier_init(%barrier: memref<!amdgpu.ds_barrier_state, #gpu.address_space<workgroup>>, %participants: i32) {
   // CHECK: [[CAST:%.*]] = builtin.unrealized_conversion_cast %arg0
-  // CHECK: [[PTR:%.*]] = llvm.extractvalue [[CAST]][1]
+  // CHECK: [[ALIGNED:%.*]] = llvm.extractvalue [[CAST]][1]
+  // CHECK: [[DESCOFF:%.*]] = llvm.extractvalue [[CAST]][2]
+  // CHECK: [[PTR:%.*]] = llvm.getelementptr [[ALIGNED]][[[DESCOFF]]]
   // CHECK: [[C1:%.*]] = llvm.mlir.constant(1 : i32)
   // CHECK: [[SUB:%.*]] = llvm.sub %arg1, [[C1]]
   // CHECK: [[MASK:%.*]] = llvm.mlir.constant(536870911 : i32)
@@ -871,7 +879,9 @@ func.func @ds_barrier_init(%barrier: memref<!amdgpu.ds_barrier_state, #gpu.addre
 // CHECK-LABEL: func @ds_barrier_poll_state
 func.func @ds_barrier_poll_state(%barrier: memref<!amdgpu.ds_barrier_state, #gpu.address_space<workgroup>>) -> !amdgpu.ds_barrier_state {
   // CHECK: [[CAST:%.*]] = builtin.unrealized_conversion_cast %arg0
-  // CHECK: [[PTR:%.*]] = llvm.extractvalue [[CAST]][1]
+  // CHECK: [[ALIGNED:%.*]] = llvm.extractvalue [[CAST]][1]
+  // CHECK: [[DESCOFF:%.*]] = llvm.extractvalue [[CAST]][2]
+  // CHECK: [[PTR:%.*]] = llvm.getelementptr [[ALIGNED]][[[DESCOFF]]]
   // CHECK: [[LOADED:%.*]] = llvm.load [[PTR]] atomic syncscope("workgroup") acquire
   // CHECK: builtin.unrealized_conversion_cast [[LOADED]]
   %state = amdgpu.ds_barrier_poll_state %barrier[] : memref<!amdgpu.ds_barrier_state, #gpu.address_space<workgroup>> -> !amdgpu.ds_barrier_state
@@ -881,7 +891,9 @@ func.func @ds_barrier_poll_state(%barrier: memref<!amdgpu.ds_barrier_state, #gpu
 // CHECK-LABEL: func @ds_async_barrier_arrive
 func.func @ds_async_barrier_arrive(%barrier: memref<!amdgpu.ds_barrier_state, #gpu.address_space<workgroup>>) {
   // CHECK: [[CAST:%.*]] = builtin.unrealized_conversion_cast %arg0
-  // CHECK: [[PTR:%.*]] = llvm.extractvalue [[CAST]][1]
+  // CHECK: [[ALIGNED:%.*]] = llvm.extractvalue [[CAST]][1]
+  // CHECK: [[DESCOFF:%.*]] = llvm.extractvalue [[CAST]][2]
+  // CHECK: [[PTR:%.*]] = llvm.getelementptr [[ALIGNED]][[[DESCOFF]]]
   // CHECK: rocdl.ds.atomic.async.barrier.arrive.b64 [[PTR]] : !llvm.ptr<3>
   amdgpu.ds_async_barrier_arrive %barrier[] : memref<!amdgpu.ds_barrier_state, #gpu.address_space<workgroup>>
   func.return
@@ -890,7 +902,9 @@ func.func @ds_async_barrier_arrive(%barrier: memref<!amdgpu.ds_barrier_state, #g
 // CHECK-LABEL: func @ds_barrier_arrive
 func.func @ds_barrier_arrive(%barrier: memref<!amdgpu.ds_barrier_state, #gpu.address_space<workgroup>>, %count: i64) -> !amdgpu.ds_barrier_state {
   // CHECK: [[CAST:%.*]] = builtin.unrealized_conversion_cast %arg0
-  // CHECK: [[PTR:%.*]] = llvm.extractvalue [[CAST]][1]
+  // CHECK: [[ALIGNED:%.*]] = llvm.extractvalue [[CAST]][1]
+  // CHECK: [[DESCOFF:%.*]] = llvm.extractvalue [[CAST]][2]
+  // CHECK: [[PTR:%.*]] = llvm.getelementptr [[ALIGNED]][[[DESCOFF]]]
   // CHECK: [[OLD:%.*]] = rocdl.ds.atomic.barrier.arrive.rtn.b64 [[PTR]], %arg1 : !llvm.ptr<3>, i64 -> i64
   // CHECK: builtin.unrealized_conversion_cast [[OLD]]
   %old_state = amdgpu.ds_barrier_arrive %barrier[], %count : memref<!amdgpu.ds_barrier_state, #gpu.address_space<workgroup>>, i64 -> !amdgpu.ds_barrier_state
diff --git a/mlir/test/Conversion/AMDGPUToROCDL/global-prefetch.mlir b/mlir/test/Conversion/AMDGPUToROCDL/global-prefetch.mlir
index acd3710a485ac..b106d16ecca54 100644
--- a/mlir/test/Conversion/AMDGPUToROCDL/global-prefetch.mlir
+++ b/mlir/test/Conversion/AMDGPUToROCDL/global-prefetch.mlir
@@ -2,6 +2,7 @@
 
 // CHECK-LABEL: @glb_prefetch0
 func.func @glb_prefetch0(%src : memref<64x64xf16, #gpu.address_space<global>>, %i : i64, %j : i64) {
+  // CHECK: %{{.*}} = llvm.getelementptr %{{.*}}[%{{.*}}] : (!llvm.ptr<1>, i64) -> !llvm.ptr<1>, f16
   // CHECK: %[[PTR:.*]] = llvm.getelementptr inbounds|nuw %{{.*}}[%{{.*}}] : (!llvm.ptr<1>, i64) -> !llvm.ptr<1>, f16
   // CHECK: rocdl.global.prefetch %[[PTR]], scope 3 : !llvm.ptr<1>
   amdgpu.global_prefetch %src[%i, %j] HT WGP : memref<64x64xf16, #gpu.address_space<global>>
@@ -10,6 +11,7 @@ func.func @glb_prefetch0(%src : memref<64x64xf16, #gpu.address_space<global>>, %
 
 // CHECK-LABEL: @glb_prefetch1
 func.func @glb_prefetch1(%src : memref<64x64xf16, #gpu.address_space<global>>, %i : i64, %j : i64) {
+  // CHECK: %{{.*}} = llvm.getelementptr %{{.*}}[%{{.*}}] : (!llvm.ptr<1>, i64) -> !llvm.ptr<1>, f16
   // CHECK: %[[PTR:.*]] = llvm.getelementptr %{{.*}}[%{{.*}}] : (!llvm.ptr<1>, i64) -> !llvm.ptr<1>, f16
   // CHECK: rocdl.global.prefetch %[[PTR]], scope 10 : !llvm.ptr<1>
   amdgpu.global_prefetch %src[%i, %j] HT SE speculative : memref<64x64xf16, #gpu.address_space<global>>
@@ -18,6 +20,7 @@ func.func @glb_prefetch1(%src : memref<64x64xf16, #gpu.address_space<global>>, %
 
 // CHECK-LABEL: @glb_prefetch2
 func.func @glb_prefetch2(%src : memref<64x64xf16, #gpu.address_space<global>>, %i : i64, %j : i64) {
+  // CHECK: %{{.*}} = llvm.getelementptr %{{.*}}[%{{.*}}] : (!llvm.ptr<1>, i64) -> !llvm.ptr<1>, f16
   // CHECK: %[[PTR:.*]] = llvm.getelementptr %{{.*}}[%{{.*}}] : (!llvm.ptr<1>, i64) -> !llvm.ptr<1>, f16
   // CHECK: rocdl.global.prefetch %{{.*}}, scope 16 : !llvm.ptr<1>
   amdgpu.global_prefetch %src[%i, %j] RT DEV speculative : memref<64x64xf16, #gpu.address_space<global>>
diff --git a/mlir/test/Conversion/AMDGPUToROCDL/load_lds-gfx950.mlir b/mlir/test/Conversion/AMDGPUToROCDL/load_lds-gfx950.mlir
index 5bbbf8405105e..bab8703e08308 100644
--- a/mlir/test/Conversion/AMDGPUToROCDL/load_lds-gfx950.mlir
+++ b/mlir/test/Conversion/AMDGPUToROCDL/load_lds-gfx950.mlir
@@ -19,14 +19,18 @@ func.func @fat_buffer_load_to_rocdl_f96(%global : memref<128x72xf32, #amdgpu.add
 
   // GFX950: %[[ALLOC:.*]] = memref.alloc()
   // GFX950: %[[LDS_DESC:.*]] = builtin.unrealized_conversion_cast
-  // GFX950: %[[GLOBAL_BASE:.*]] = llvm.extractvalue %[[BUFFER_DESC]][1]
+  // GFX950: %[[GLOBAL_ALIGNED:.*]] = llvm.extractvalue %[[BUFFER_DESC]][1]
+  // GFX950: %[[GLOBAL_DESC_OFFSET:.*]] = llvm.extractvalue %[[BUFFER_DESC]][2]
+  // GFX950: %[[GLOBAL_BASE:.*]] = llvm.getelementptr %[[GLOBAL_ALIGNED]][%[[GLOBAL_DESC_OFFSET]]]
 
   // GFX950: %[[C72:.*]] = llvm.mlir.constant(72 : index) : i64
   // GFX950: %[[MUL:.*]] = llvm.mul %[[IC12]], %[[C72]] : i64
   // GFX950: %[[SRC_OFFSET:.*]] = llvm.add %[[MUL]], %[[IC0]] : i64
 
   // GFX950: %[[GLOBAL_PTR:.*]] = llvm.getelementptr %[[GLOBAL_BASE]][%[[SRC_OFFSET]]]
-  // GFX950: %[[LDS_BASE:.*]] = llvm.extractvalue %[[LDS_DESC]][1]
+  // GFX950: %[[LDS_ALIGNED:.*]] = llvm.extractvalue %[[LDS_DESC]][1]
+  // GFX950: %[[LDS_DESC_OFFSET:.*]] = llvm.extractvalue %[[LDS_DESC]][2]
+  // GFX950: %[[LDS_BASE:.*]] = llvm.getelementptr %[[LDS_ALIGNED]][%[[LDS_DESC_OFFSET]]]
 
   // GFX950: %[[C64:.*]] = llvm.mlir.constant(64 : index) : i64
   // GFX950: %[[MUL_2:.*]] = llvm.mul %[[IC32]], %[[C64]] : i64
@@ -60,14 +64,18 @@ func.func @fat_buffer_load_to_rocdl_f128(%global : memref<128x72xf32, #amdgpu.ad
 
   // GFX950: %[[ALLOC:.*]] = memref.alloc()
   // GFX950: %[[LDS_DESC:.*]] = builtin.unrealized_conversion_cast
-  // GFX950: %[[GLOBAL_BASE:.*]] = llvm.extractvalue %[[BUFFER_DESC]][1]
+  // GFX950: %[[GLOBAL_ALIGNED:.*]] = llvm.extractvalue %[[BUFFER_DESC]][1]
+  // GFX950: %[[GLOBAL_DESC_OFFSET:.*]] = llvm.extractvalue %[[BUFFER_DESC]][2]
+  // GFX950: %[[GLOBAL_BASE:.*]] = llvm.getelementptr %[[GLOBAL_ALIGNED]][%[[GLOBAL_DESC_OFFSET]]]
 
   // GFX950: %[[C72:.*]] = llvm.mlir.constant(72 : index) : i64
   // GFX950: %[[MUL:.*]] = llvm.mul %[[IC12]], %[[C72]] : i64
   // GFX950: %[[SRC_OFFSET:.*]] = llvm.add %[[MUL]], %[[IC0]] : i64
 
   // GFX950: %[[GLOBAL_PTR:.*]] = llvm.getelementptr %[[GLOBAL_BASE]][%[[SRC_OFFSET]]]
-  // GFX950: %[[LDS_BASE:.*]] = llvm.extractvalue %[[LDS_DESC]][1]
+  // GFX950: %[[LDS_ALIGNED:.*]] = llvm.extractvalue %[[LDS_DESC]][1]
+  // GFX950: %[[LDS_DESC_OFFSET:.*]] = llvm.extractvalue %[[LDS_DESC]][2]
+  // GFX950: %[[LDS_BASE:.*]] = llvm.getelementptr %[[LDS_ALIGNED]][%[[LDS_DESC_OFFSET]]]
 
   // GFX950: %[[C64:.*]] = llvm.mlir.constant(64 : index) : i64
   // GFX950: %[[MUL_2:.*]] = llvm.mul %[[IC32]], %[[C64]] : i64
diff --git a/mlir/test/Conversion/AMDGPUToROCDL/load_lds.mlir b/mlir/test/Conversion/AMDGPUToROCDL/load_lds.mlir
index 1e1ef32126b7f..a51d7b95ce3f4 100644
--- a/mlir/test/Conversion/AMDGPUToROCDL/load_lds.mlir
+++ b/mlir/test/Conversion/AMDGPUToROCDL/load_lds.mlir
@@ -19,14 +19,18 @@ func.func @global_load_to_rocdl_f32(%global : memref<128x72xf32, #gpu.address_sp
 
   // CHECK: %[[ALLOC:.*]] = memref.alloc()
   // CHECK: %[[LDS_DESC:.*]] = builtin.unrealized_conversion_cast
-  // CHECK: %[[GLOBAL_BASE:.*]] = llvm.extractvalue %[[GLOBAL_DESC]][1]
+  // CHECK: %[[GLOBAL_ALIGNED:.*]] = llvm.extractvalue %[[GLOBAL_DESC]][1]
+  // CHECK: %[[GLOBAL_OFFSET_VAL:.*]] = llvm.extractvalue %[[GLOBAL_DESC]][2]
+  // CHECK: %[[GLOBAL_BASE:.*]] = llvm.getelementptr %[[GLOBAL_ALIGNED]][%[[GLOBAL_OFFSET_VAL]]]
 
   // CHECK: %[[C72:.*]] = llvm.mlir.constant(72 : index) : i64
   // CHECK: %[[MUL:.*]] = llvm.mul %[[IC12]], %[[C72]] : i64
   // CHECK: %[[SRC_OFFSET:.*]] = llvm.add %[[MUL]], %[[IC0]] : i64
 
   // CHECK: %[[GLOBAL_PTR:.*]] = llvm.getelementptr %[[GLOBAL_BASE]][%[[SRC_OFFSET]]]
-  // CHECK: %[[LDS_BASE:.*]] = llvm.extractvalue %[[LDS_DESC]][1]
+  // CHECK: %[[LDS_ALIGNED:.*]] = llvm.extractvalue %[[LDS_DESC]][1]
+  // CHECK: %[[LDS_OFFSET_VAL:.*]] = llvm.extractvalue %[[LDS_DESC]][2]
+  // CHECK: %[[LDS_BASE:.*]] = llvm.getelementptr %[[LDS_ALIGNED]][%[[LDS_OFFSET_VAL]]]
 
   // CHECK: %[[C64:.*]] = llvm.mlir.constant(64 : index) : i64
   // CHECK: %[[MUL_2:.*]] = llvm.mul %[[IC32]], %[[C64]] : i64
@@ -57,14 +61,18 @@ func.func @global_load_to_rocdl_wg_mem(%global : memref<128x72xf32>) {
 
   // CHECK: %[[ALLOC:.*]] = memref.alloc()
   // CHECK: %[[LDS_DESC:.*]] = builtin.unrealized_conversion_cast
-  // CHECK: %[[GLOBAL_BASE:.*]] = llvm.extractvalue %[[GLOBAL_DESC]][1]
+  // CHECK: %[[GLOBAL_ALIGNED:.*]] = llvm.extractvalue %[[GLOBAL_DESC]][1]
+  // CHECK: %[[GLOBAL_OFFSET_VAL:.*]] = llvm.extractvalue %[[GLOBAL_DESC]][2]
+  // CHECK: %[[GLOBAL_BASE:.*]] = llvm.getelementptr %[[GLOBAL_ALIGNED]][%[[GLOBAL_OFFSET_VAL]]]
 
   // CHECK: %[[C72:.*]] = llvm.mlir.constant(72 : index) : i64
   // CHECK: %[[MUL:.*]] = llvm.mul %[[IC12]], %[[C72]] : i64
   // CHECK: %[[SRC_OFFSET:.*]] = llvm.add %[[MUL]], %[[IC0]] : i64
 
   // CHECK: %[[GLOBAL_PTR:.*]] = llvm.getelementptr %[[GLOBAL_BASE]][%[[SRC_OFFSET]]]
-  // CHECK: %[[LDS_BASE:.*]] = llvm.extractvalue %[[LDS_DESC]][1]
+  // CHECK: %[[LDS_ALIGNED:.*]] = llvm.extractvalue %[[LDS_DESC]][1]
+  // CHECK: %[[LDS_OFFSET_VAL:.*]] = llvm.extractvalue %[[LDS_DESC]][2]
+  // CHECK: %[[LDS_BASE:.*]] = llvm.getelementptr %[[LDS_ALIGNED]][%[[LDS_OFFSET_VAL]]]
 
   // CHECK: %[[C64:.*]] = llvm.mlir.constant(64 : index) : i64
   // CHECK: %[[MUL_2:.*]] = llvm.mul %[[IC32]], %[[C64]] : i64
@@ -86,8 +94,12 @@ func.func @global_load_to_rocdl_0d(%global : memref<f32>) {
   // CHECK: %[[ALLOC:.*]] = memref.alloc()
   // CHECK: %[[LDS_DESC:.*]] = builtin.unrealized_conversion_cast %[[ALLOC]] : memref<f32, #gpu.address_space<workgroup>> to !llvm.struct<(ptr<3>, ptr<3>, i64)>
 
-  // CHECK: %[[GLOBAL_BASE:.*]] = llvm.extractvalue %[[GLOBAL_DESC]][1] : !llvm.struct<(ptr, ptr, i64)>
-  // CHECK: %[[LDS_BASE:.*]] = llvm.extractvalue %[[LDS_DESC]][1] : !llvm.struct<(ptr<3>, ptr<3>, i64)>
+  // CHECK: %[[GLOBAL_ALIGNED:.*]] = llvm.extractvalue %[[GLOBAL_DESC]][1] : !llvm.struct<(ptr, ptr, i64)>
+  // CHECK: %[[GLOBAL_OFFSET_VAL:.*]] = llvm.extractvalue %[[GLOBAL_DESC]][2] : !llvm.struct<(ptr, ptr, i64)>
+  // CHECK: %[[GLOBAL_BASE:.*]] = llvm.getelementptr %[[GLOBAL_ALIGNED]][%[[GLOBAL_OFFSET_VAL]]]
+  // CHECK: %[[LDS_ALIGNED:.*]] = llvm.extractvalue %[[LDS_DESC]][1] : !llvm.struct<(ptr<3>, ptr<3>, i64)>
+  // CHECK: %[[LDS_OFFSET_VAL:.*]] = llvm.extractvalue %[[LDS_DESC]][2] : !llvm.struct<(ptr<3>, ptr<3>, i64)>
+  // CHECK: %[[LDS_BASE:.*]] = llvm.getelementptr %[[LDS_ALIGNED]][%[[LDS_OFFSET_VAL]]]
 
   // CHECK: rocdl.load.to.lds %[[GLOBAL_BASE]], %[[LDS_BASE]], 4
   amdgpu.gather_to_lds %global[], %alloc[]
@@ -109,14 +121,18 @@ func.func @global_load_to_rocdl_i8(%global : memref<128x72xi8, #gpu.address_spac
 
   // CHECK: %[[ALLOC:.*]] = memref.alloc()
   // CHECK: %[[LDS_DESC:.*]] = builtin.unrealized_conversion_cast %[[ALLOC]]
-  // CHECK: %[[GLOBAL_BASE:.*]] = llvm.extractvalue %[[GLOBAL_DESC]][1]
+  // CHECK: %[[GLOBAL_ALIGNED:.*]] = llvm.extractvalue %[[GLOBAL_DESC]][1]
+  // CHECK: %[[GLOBAL_OFFSET_VAL:.*]] = llvm.extractvalue %[[GLOBAL_DESC]][2]
+  // CHECK: %[[GLOBAL_BASE:.*]] = llvm.getelementptr %[[GLOBAL_ALIGNED]][%[[GLOBAL_OFFSET_VAL]]]
 
   // CHECK: %[[C72:.*]] = llvm.mlir.constant(72 : index) : i64
   // CHECK: %[[MUL:.*]] = llvm.mul %[[IC12]], %[[C72]] : i64
   // CHECK: %[[SRC_OFFSET:.*]] = llvm.add %[[MUL]], %[[IC0]] : i64
 
   // CHECK: %[[GLOBAL_PTR:.*]] = llvm.getelementptr %[[GLOBAL_BASE]][%[[SRC_OFFSET]]]
-  // CHECK: %[[LDS_BASE:.*]] = llvm.extractvalue %[[LDS_DESC]][1]
+  // CHECK: %[[LDS_ALIGNED:.*]] = llvm.extractvalue %[[LDS_DESC]][1]
+  // CHECK: %[[LDS_OFFSET_VAL:.*]] = llvm.extractvalue %[[LDS_DESC]][2]
+  // CHECK: %[[LDS_BASE:.*]] = llvm.getelementptr %[[LDS_ALIGNED]][%[[LDS_OFFSET_VAL]]]
 
   // CHECK: %[[C64:.*]] = llvm.mlir.constant(64 : index) : i64
   // CHECK: %[[MUL_2:.*]] = llvm.mul %[[IC32]], %[[C64]] : i64
@@ -147,14 +163,18 @@ func.func @global_load_to_rocdl_vec(%global : memref<128x72xi16, #gpu.address_sp
 
   // CHECK: %[[ALLOC:.*]] = memref.alloc()
   // CHECK: %[[LDS_DESC:.*]] = builtin.unrealized_conversion_cast %[[ALLOC]]
-  // CHECK: %[[GLOBAL_BASE:.*]] = llvm.extractvalue %[[GLOBAL_DESC]][1]
+  // CHECK: %[[GLOBAL_ALIGNED:.*]] = llvm.extractvalue %[[GLOBAL_DESC]][1]
+  // CHECK: %[[GLOBAL_OFFSET_VAL:.*]] = llvm.extractvalue %[[GLOBAL_DESC]][2]
+  // CHECK: %[[GLOBAL_BASE:.*]] = llvm.getelementptr %[[GLOBAL_ALIGNED]][%[[GLOBAL_OFFSET_VAL]]]
 
   // CHECK: %[[C72:.*]] = llvm.mlir.constant(72 : index) : i64
   // CHECK: %[[MUL:.*]] = llvm.mul %[[IC12]], %[[C72]] : i64
   // CHECK: %[[SRC_OFFSET:.*]] = llvm.add %[[MUL]], %[[IC0]] : i64
 
   // CHECK: %[[GLOBAL_PTR:.*]] = llvm.getelementptr %[[GLOBAL_BASE]][%[[SRC_OFFSET]]]
-  // CHECK: %[[LDS_BASE:.*]] = llvm.extractvalue %[[LDS_DESC]][1]
+  // CHECK: %[[LDS_ALIGNED:.*]] = llvm.extractvalue %[[LDS_DESC]][1]
+  // CHECK: %[[LDS_OFFSET_VAL:.*]] = llvm.extractvalue %[[LDS_DESC]][2]
+  // CHECK: %[[LDS_BASE:.*]] = llvm.getelementptr %[[LDS_ALIGNED]][%[[LDS_OFFSET_VAL]]]
 
   // CHECK: %[[C128:.*]] = llvm.mlir.constant(128 : index) : i64
   // CHECK: %[[MUL_2:.*]] = llvm.mul %[[IC32]], %[[C128]] : i64
@@ -181,9 +201,13 @@ func.func @global_load_to_rocdl_dynamic_indices(%global : memref<512xi32, #gpu.a
   // CHECK: %[[LDS_DESC:.*]] = builtin.unrealized_conversion_cast %[[ALLOC]]
   // CHECK: %[[C0:.*]] = arith.constant 0 : index
   // CHECK: %[[C0_I64:.*]] = builtin.unrealized_conversion_cast %[[C0]] : index to i64
-  // CHECK: %[[GLOBAL_BASE:.*]] = llvm.extractvalue %[[GLOBAL_DESC]][1]
+  // CHECK: %[[GLOBAL_ALIGNED:.*]] = llvm.extractvalue %[[GLOBAL_DESC]][1]
+  // CHECK: %[[GLOBAL_OFFSET_VAL:.*]] = llvm.extractvalue %[[GLOBAL_DESC]][2]
+  // CHECK: %[[GLOBAL_BASE:.*]] = llvm.getelementptr %[[GLOBAL_ALIGNED]][%[[GLOBAL_OFFSET_VAL]]]
   // CHECK: %[[GLOBAL_PTR:.*]] = llvm.getelementptr %[[GLOBAL_BASE]][%[[SRCIDX_CAST]]]
-  // CHECK: %[[LDS_BASE:.*]] = llvm.extractvalue %[[LDS_DESC]][1]
+  // CHECK: %[[LDS_ALIGNED:.*]] = llvm.extractvalue %[[LDS_DESC]][1]
+  // CHECK: %[[LDS_OFFSET_VAL:.*]] = llvm.extractvalue %[[LDS_DESC]][2]
+  // CHECK: %[[LDS_BASE:.*]] = llvm.getelementptr %[[LDS_ALIGNED]][%[[LDS_OFFSET_VAL]]]
   // CHECK: %[[C64:.*]] = llvm.mlir.constant(64 : index) : i64
   // CHECK: %[[DSTIDX:.*]] = llvm.mul %[[DSTIDX_CAST]], %[[C64]] : i64
   // CHECK: %[[DSTIDX1:.*]] = llvm.add %[[DSTIDX]], %[[C0_I64]] : i64
@@ -214,14 +238,18 @@ func.func @fat_buffer_load_to_rocdl_f32(%global : memref<128x72xf32, #amdgpu.add
 
   // CHECK: %[[ALLOC:.*]] = memref.alloc()
   // CHECK: %[[LDS_DESC:.*]] = builtin.unrealized_conversion_cast
-  // CHECK: %[[GLOBAL_BASE:.*]] = llvm.extractvalue %[[BUFFER_DESC]][1]
+  // CHECK: %[[GLOBAL_ALIGNED:.*]] = llvm.extractvalue %[[BUFFER_DESC]][1]
+  // CHECK: %[[GLOBAL_OFFSET_VAL:.*]] = llvm.extractvalue %[[BUFFER_DESC]][2]
+  // CHECK: %[[GLOBAL_BASE:.*]] = llvm.getelementptr %[[GLOBAL_ALIGNED]][%[[GLOBAL_OFFSET_VAL]]]
 
   // CHECK: %[[C72:.*]] = llvm.mlir.constant(72 : index) : i64
   // CHECK: %[[MUL:.*]] = llvm.mul %[[IC12]], %[[C72]] : i64
   // CHECK: %[[SRC_OFFSET:.*]] = llvm.add %[[MUL]], %[[IC0]] : i64
 
   // CHECK: %[[GLOBAL_PTR:.*]] = llvm.getelementptr %[[GLOBAL_BASE]][%[[SRC_OFFSET]]]
-  // CHECK: %[[LDS_BASE:.*]] = llvm.extractvalue %[[LDS_DESC]][1]
+  // CHECK: %[[LDS_ALIGNED:.*]] = llvm.extractvalue %[[LDS_DESC]][1]
+  // CHECK: %[[LDS_OFFSET_VAL:.*]] = llvm.extractvalue %[[LDS_DESC]][2]
+  // CHECK: %[[LDS_BASE:.*]] = llvm.getelementptr %[[LDS_ALIGNED]][%[[LDS_OFFSET_VAL]]]
 
   // CHECK: %[[C64:.*]] = llvm.mlir.constant(64 : index) : i64
   // CHECK: %[[MUL_2:.*]] = llvm.mul %[[IC32]], %[[C64]] : i64
@@ -252,14 +280,18 @@ func.func @global_load_to_rocdl_async_f32(%global : memref<128x72xf32, #gpu.addr
 
   // CHECK: %[[ALLOC:.*]] = memref.alloc()
   // CHECK: %[[LDS_DESC:.*]] = builtin.unrealized_conversion_cast
-  // CHECK: %[[GLOBAL_BASE:.*]] = llvm.extractvalue %[[GLOBAL_DESC]][1]
+  // CHECK: %[[GLOBAL_ALIGNED:.*]] = llvm.extractvalue %[[GLOBAL_DESC]][1]
+  // CHECK: %[[GLOBAL_OFFSET_VAL:.*]] = llvm.extractvalue %[[GLOBAL_DESC]][2]
+  // CHECK: %[[GLOBAL_BASE:.*]] = llvm.getelementptr %[[GLOBAL_ALIGNED]][%[[GLOBAL_OFFSET_VAL]]]
 
   // CHECK: %[[C72:.*]] = llvm.mlir.constant(72 : index) : i64
   // CHECK: %[[MUL:.*]] = llvm.mul %[[IC12]], %[[C72]] : i64
   // CHECK: %[[SRC_OFFSET:.*]] = llvm.add %[[MUL]], %[[IC0]] : i64
 
   // CHECK: %[[GLOBAL_PTR:.*]] = llvm.getelementptr %[[GLOBAL_BASE]][%[[SRC_OFFSET]]]
-  // CHECK: %[[LDS_BASE:.*]] = llvm.extractvalue %[[LDS_DESC]][1]
+  // CHECK: %[[LDS_ALIGNED:.*]] = llvm.extractvalue %[[LDS_DESC]][1]
+  // CHECK: %[[LDS_OFFSET_VAL:.*]] = llvm.extractvalue %[[LDS_DESC]][2]
+  // CHECK: %[[LDS_BASE:.*]] = llvm.getelementptr %[[LDS_ALIGNED]][%[[LDS_OFFSET_VAL]]]
 
   // CHECK: %[[C64:.*]] = llvm.mlir.constant(64 : index) : i64
   // CHECK: %[[MUL_2:.*]] = llvm.mul %[[IC32]], %[[C64]] : i64
@@ -290,14 +322,18 @@ func.func @global_load_to_rocdl_async_f32_fat_raw_buffer(%global : memref<128x72
 
   // CHECK: %[[ALLOC:.*]] = memref.alloc()
   // CHECK: %[[LDS_DESC:.*]] = builtin.unrealized_conversion_cast
-  // CHECK: %[[GLOBAL_BASE:.*]] = llvm.extractvalue %[[GLOBAL_DESC]][1]
+  // CHECK: %[[GLOBAL_ALIGNED:.*]] = llvm.extractvalue %[[GLOBAL_DESC]][1]
+  // CHECK: %[[GLOBAL_OFFSET_VAL:.*]] = llvm.extractvalue %[[GLOBAL_DESC]][2]
+  // CHECK: %[[GLOBAL_BASE:.*]] = llvm.getelementptr %[[GLOBAL_ALIGNED]][%[[GLOBAL_OFFSET_VAL]]]
 
   // CHECK: %[[C72:.*]] = llvm.mlir.constant(72 : index) : i64
   // CHECK: %[[MUL:.*]] = llvm.mul %[[IC12]], %[[C72]] : i64
   // CHECK: %[[SRC_OFFSET:.*]] = llvm.add %[[MUL]], %[[IC0]] : i64
 
   // CHECK: %[[GLOBAL_PTR:.*]] = llvm.getelementptr %[[GLOBAL_BASE]][%[[SRC_OFFSET]]]
-  // CHECK: %[[LDS_BASE:.*]] = llvm.extractvalue %[[LDS_DESC]][1]
+  // CHECK: %[[LDS_ALIGNED:.*]] = llvm.extractvalue %[[LDS_DESC]][1]
+  // CHECK: %[[LDS_OFFSET_VAL:.*]] = llvm.extractvalue %[[LDS_DESC]][2]
+  // CHECK: %[[LDS_BASE:.*]] = llvm.getelementptr %[[LDS_ALIGNED]][%[[LDS_OFFSET_VAL]]]
 
   // CHECK: %[[C64:.*]] = llvm.mlir.constant(64 : index) : i64
   // CHECK: %[[MUL_2:.*]] = llvm.mul %[[IC32]], %[[C64]] : i64
diff --git a/mlir/test/Conversion/MemRefToLLVM/convert-dynamic-memref-ops.mlir b/mlir/test/Conversion/MemRefToLLVM/convert-dynamic-memref-ops.mlir
index fa23c0b4fcc9b..2292313bf1402 100644
--- a/mlir/test/Conversion/MemRefToLLVM/convert-dynamic-memref-ops.mlir
+++ b/mlir/test/Conversion/MemRefToLLVM/convert-dynamic-memref-ops.mlir
@@ -173,7 +173,9 @@ func.func @stdlib_aligned_alloc(%N : index) -> memref<32x18xf32> {
 func.func @mixed_load(%mixed : memref<42x?xf32>, %i : index, %j : index) {
 //   CHECK-DAG:  %[[I:.*]] = builtin.unrealized_conversion_cast %[[Iarg]]
 //   CHECK-DAG:  %[[J:.*]] = builtin.unrealized_conversion_cast %[[Jarg]]
-//       CHECK:  %[[ptr:.*]] = llvm.extractvalue %[[ld:.*]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+//       CHECK:  %[[aligned:.*]] = llvm.extractvalue %[[ld:.*]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+//  CHECK-NEXT:  %[[descOff:.*]] = llvm.extractvalue %[[ld]][2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+//  CHECK-NEXT:  %[[ptr:.*]] = llvm.getelementptr %[[aligned]][%[[descOff]]]
 //  CHECK-NEXT:  %[[st0:.*]] = llvm.extractvalue %[[ld]][4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 //  CHECK-NEXT:  %[[offI:.*]] = llvm.mul %[[I]], %[[st0]] overflow<nsw, nuw> : i64
 //  CHECK-NEXT:  %[[off1:.*]] = llvm.add %[[offI]], %[[J]] overflow<nsw, nuw> : i64
@@ -190,7 +192,9 @@ func.func @mixed_load(%mixed : memref<42x?xf32>, %i : index, %j : index) {
 func.func @dynamic_load(%dynamic : memref<?x?xf32>, %i : index, %j : index) {
 //   CHECK-DAG:  %[[I:.*]] = builtin.unrealized_conversion_cast %[[Iarg]]
 //   CHECK-DAG:  %[[J:.*]] = builtin.unrealized_conversion_cast %[[Jarg]]
-//       CHECK:  %[[ptr:.*]] = llvm.extractvalue %[[ld:.*]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+//       CHECK:  %[[aligned:.*]] = llvm.extractvalue %[[ld:.*]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+//  CHECK-NEXT:  %[[descOff:.*]] = llvm.extractvalue %[[ld]][2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+//  CHECK-NEXT:  %[[ptr:.*]] = llvm.getelementptr %[[aligned]][%[[descOff]]]
 //  CHECK-NEXT:  %[[st0:.*]] = llvm.extractvalue %[[ld]][4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 //  CHECK-NEXT:  %[[offI:.*]] = llvm.mul %[[I]], %[[st0]] overflow<nsw, nuw> : i64
 //  CHECK-NEXT:  %[[off1:.*]] = llvm.add %[[offI]], %[[J]] overflow<nsw, nuw> : i64
@@ -207,7 +211,9 @@ func.func @dynamic_load(%dynamic : memref<?x?xf32>, %i : index, %j : index) {
 func.func @prefetch(%A : memref<?x?xf32>, %i : index, %j : index) {
 //      CHECK-DAG:  %[[I:.*]] = builtin.unrealized_conversion_cast %[[Iarg]]
 //      CHECK-DAG:  %[[J:.*]] = builtin.unrealized_conversion_cast %[[Jarg]]
-//      CHECK:  %[[ptr:.*]] = llvm.extractvalue %[[ld:.*]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+//      CHECK:  %[[aligned:.*]] = llvm.extractvalue %[[ld:.*]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+// CHECK-NEXT:  %[[descOff:.*]] = llvm.extractvalue %[[ld]][2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+// CHECK-NEXT:  %[[ptr:.*]] = llvm.getelementptr %[[aligned]][%[[descOff]]]
 // CHECK-NEXT:  %[[st0:.*]] = llvm.extractvalue %[[ld]][4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK-NEXT:  %[[offI:.*]] = llvm.mul %[[I]], %[[st0]] : i64
 // CHECK-NEXT:  %[[off1:.*]] = llvm.add %[[offI]], %[[J]] : i64
@@ -228,7 +234,9 @@ func.func @prefetch(%A : memref<?x?xf32>, %i : index, %j : index) {
 func.func @dynamic_store(%dynamic : memref<?x?xf32>, %i : index, %j : index, %val : f32) {
 //   CHECK-DAG:  %[[I:.*]] = builtin.unrealized_conversion_cast %[[Iarg]]
 //   CHECK-DAG:  %[[J:.*]] = builtin.unrealized_conversion_cast %[[Jarg]]
-//       CHECK:  %[[ptr:.*]] = llvm.extractvalue %[[ld:.*]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+//       CHECK:  %[[aligned:.*]] = llvm.extractvalue %[[ld:.*]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+//  CHECK-NEXT:  %[[descOff:.*]] = llvm.extractvalue %[[ld]][2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+//  CHECK-NEXT:  %[[ptr:.*]] = llvm.getelementptr %[[aligned]][%[[descOff]]]
 //  CHECK-NEXT:  %[[st0:.*]] = llvm.extractvalue %[[ld]][4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 //  CHECK-NEXT:  %[[offI:.*]] = llvm.mul %[[I]], %[[st0]] overflow<nsw, nuw> : i64
 //  CHECK-NEXT:  %[[off1:.*]] = llvm.add %[[offI]], %[[J]] overflow<nsw, nuw> : i64
@@ -245,7 +253,9 @@ func.func @dynamic_store(%dynamic : memref<?x?xf32>, %i : index, %j : index, %va
 func.func @mixed_store(%mixed : memref<42x?xf32>, %i : index, %j : index, %val : f32) {
 //   CHECK-DAG:  %[[I:.*]] = builtin.unrealized_conversion_cast %[[Iarg]]
 //   CHECK-DAG:  %[[J:.*]] = builtin.unrealized_conversion_cast %[[Jarg]]
-//       CHECK:  %[[ptr:.*]] = llvm.extractvalue %[[ld:.*]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+//       CHECK:  %[[aligned:.*]] = llvm.extractvalue %[[ld:.*]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+//  CHECK-NEXT:  %[[descOff:.*]] = llvm.extractvalue %[[ld]][2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+//  CHECK-NEXT:  %[[ptr:.*]] = llvm.getelementptr %[[aligned]][%[[descOff]]]
 //  CHECK-NEXT:  %[[st0:.*]] = llvm.extractvalue %[[ld]][4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 //  CHECK-NEXT:  %[[offI:.*]] = llvm.mul %[[I]], %[[st0]] overflow<nsw, nuw> : i64
 //  CHECK-NEXT:  %[[off1:.*]] = llvm.add %[[offI]], %[[J]] overflow<nsw, nuw> : i64
diff --git a/mlir/test/Conversion/MemRefToLLVM/convert-static-memref-ops.mlir b/mlir/test/Conversion/MemRefToLLVM/convert-static-memref-ops.mlir
index 040a27e160557..d299d21b85c57 100644
--- a/mlir/test/Conversion/MemRefToLLVM/convert-static-memref-ops.mlir
+++ b/mlir/test/Conversion/MemRefToLLVM/convert-static-memref-ops.mlir
@@ -123,7 +123,9 @@ func.func @static_dealloc(%static: memref<10x8xf32>) {
 
 // CHECK-LABEL: func @zero_d_load
 func.func @zero_d_load(%arg0: memref<f32>) -> f32 {
-// CHECK: %[[ptr:.*]] = llvm.extractvalue %{{.*}}[1] : !llvm.struct<(ptr, ptr, i64)>
+// CHECK: %[[aligned:.*]] = llvm.extractvalue %{{.*}}[1] : !llvm.struct<(ptr, ptr, i64)>
+// CHECK: %[[descOff:.*]] = llvm.extractvalue %{{.*}}[2] : !llvm.struct<(ptr, ptr, i64)>
+// CHECK: %[[ptr:.*]] = llvm.getelementptr %[[aligned]][%[[descOff]]]
 // CHECK: %{{.*}} = llvm.load %[[ptr]] : !llvm.ptr -> f32
   %0 = memref.load %arg0[] : memref<f32>
   return %0 : f32
@@ -136,7 +138,9 @@ func.func @zero_d_load(%arg0: memref<f32>) -> f32 {
 func.func @static_load(%static : memref<10x42xf32>, %i : index, %j : index) {
 // CHECK-DAG:  %[[II:.*]] = builtin.unrealized_conversion_cast %[[I]]
 // CHECK-DAG:  %[[JJ:.*]] = builtin.unrealized_conversion_cast %[[J]]
-// CHECK:  %[[ptr:.*]] = llvm.extractvalue %{{.*}}[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+// CHECK:  %[[aligned:.*]] = llvm.extractvalue %{{.*}}[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+// CHECK:  %[[descOff:.*]] = llvm.extractvalue %{{.*}}[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+// CHECK:  %[[ptr:.*]] = llvm.getelementptr %[[aligned]][%[[descOff]]]
 // CHECK:  %[[st0:.*]] = llvm.mlir.constant(42 : index) : i64
 // CHECK:  %[[offI:.*]] = llvm.mul %[[II]], %[[st0]] overflow<nsw, nuw> : i64
 // CHECK:  %[[off1:.*]] = llvm.add %[[offI]], %[[JJ]] overflow<nsw, nuw> : i64
@@ -150,7 +154,9 @@ func.func @static_load(%static : memref<10x42xf32>, %i : index, %j : index) {
 
 // CHECK-LABEL: func @zero_d_store
 func.func @zero_d_store(%arg0: memref<f32>, %arg1: f32) {
-// CHECK: %[[ptr:.*]] = llvm.extractvalue %[[ld:.*]][1] : !llvm.struct<(ptr, ptr, i64)>
+// CHECK: %[[aligned:.*]] = llvm.extractvalue %[[ld:.*]][1] : !llvm.struct<(ptr, ptr, i64)>
+// CHECK: %[[descOff:.*]] = llvm.extractvalue %[[ld]][2] : !llvm.struct<(ptr, ptr, i64)>
+// CHECK: %[[ptr:.*]] = llvm.getelementptr %[[aligned]][%[[descOff]]]
 // CHECK: llvm.store %{{.*}}, %[[ptr]] : f32, !llvm.ptr
   memref.store %arg1, %arg0[] : memref<f32>
   return
@@ -164,7 +170,9 @@ func.func @zero_d_store(%arg0: memref<f32>, %arg1: f32) {
 func.func @static_store(%static : memref<10x42xf32>, %i : index, %j : index, %val : f32) {
 // CHECK-DAG: %[[II:.*]] = builtin.unrealized_conversion_cast %[[I]]
 // CHECK-DAG: %[[JJ:.*]] = builtin.unrealized_conversion_cast %[[J]]
-// CHECK: %[[ptr:.*]] = llvm.extractvalue %{{.*}}[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+// CHECK: %[[aligned:.*]] = llvm.extractvalue %{{.*}}[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+// CHECK: %[[descOff:.*]] = llvm.extractvalue %{{.*}}[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+// CHECK: %[[ptr:.*]] = llvm.getelementptr %[[aligned]][%[[descOff]]]
 // CHECK: %[[st0:.*]] = llvm.mlir.constant(42 : index) : i64
 // CHECK: %[[offI:.*]] = llvm.mul %[[II]], %[[st0]] overflow<nsw, nuw> : i64
 // CHECK: %[[off1:.*]] = llvm.add %[[offI]], %[[JJ]] overflow<nsw, nuw> : i64
@@ -306,15 +314,19 @@ func.func @memref.reshape.dynamic.dim(%arg: memref<?x?x?xf32>, %shape: memref<4x
 
   // CHECK: %[[three_hundred_and_eighty_four:.*]] = llvm.mlir.constant(384 : index) : i64
   // CHECK: %[[one1:.*]] = llvm.mlir.constant(1 : index) : i64
-  // CHECK: %[[shape_ptr0:.*]] = llvm.extractvalue %[[shape_cast]][1] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
+  // CHECK: %[[shape_aligned0:.*]] = llvm.extractvalue %[[shape_cast]][1] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
+  // CHECK: %[[shape_descOff0:.*]] = llvm.extractvalue %[[shape_cast]][2] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
+  // CHECK: %[[shape_ptr0:.*]] = llvm.getelementptr %[[shape_aligned0]][%[[shape_descOff0]]] : (!llvm.ptr, i64) -> !llvm.ptr, i64
   // CHECK: %[[shape_gep0:.*]] = llvm.getelementptr inbounds|nuw %[[shape_ptr0]][%[[one1]]] : (!llvm.ptr, i64) -> !llvm.ptr, i64
   // CHECK: %[[shape_load0:.*]] = llvm.load %[[shape_gep0]] : !llvm.ptr -> i64
   // CHECK: %[[insert7:.*]] = llvm.insertvalue %[[shape_load0]], %[[insert6]][3, 1] : !llvm.struct<(ptr, ptr, i64, array<4 x i64>, array<4 x i64>)>
   // CHECK: %[[insert8:.*]] = llvm.insertvalue %[[three_hundred_and_eighty_four]], %[[insert7]][4, 1] : !llvm.struct<(ptr, ptr, i64, array<4 x i64>, array<4 x i64>)>
 
-  // CHECK: %[[mul:.*]] = llvm.mul %19, %23  : i64
+  // CHECK: %[[mul:.*]] = llvm.mul %[[three_hundred_and_eighty_four]], %[[shape_load0]]  : i64
   // CHECK: %[[zero1:.*]] = llvm.mlir.constant(0 : index) : i64
-  // CHECK: %[[shape_ptr1:.*]] = llvm.extractvalue %[[shape_cast]][1] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
+  // CHECK: %[[shape_aligned1:.*]] = llvm.extractvalue %[[shape_cast]][1] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
+  // CHECK: %[[shape_descOff1:.*]] = llvm.extractvalue %[[shape_cast]][2] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
+  // CHECK: %[[shape_ptr1:.*]] = llvm.getelementptr %[[shape_aligned1]][%[[shape_descOff1]]] : (!llvm.ptr, i64) -> !llvm.ptr, i64
   // CHECK: %[[shape_gep1:.*]] = llvm.getelementptr inbounds|nuw %[[shape_ptr1]][%[[zero1]]] : (!llvm.ptr, i64) -> !llvm.ptr, i64
   // CHECK: %[[shape_load1:.*]] = llvm.load %[[shape_gep1]] : !llvm.ptr -> i64
   // CHECK: %[[insert9:.*]] = llvm.insertvalue %[[shape_load1]], %[[insert8]][3, 0] : !llvm.struct<(ptr, ptr, i64, array<4 x i64>, array<4 x i64>)>
diff --git a/mlir/test/Conversion/MemRefToLLVM/memref-to-llvm.mlir b/mlir/test/Conversion/MemRefToLLVM/memref-to-llvm.mlir
index 3a0f85fad49b0..17c1e0ff6ad7d 100644
--- a/mlir/test/Conversion/MemRefToLLVM/memref-to-llvm.mlir
+++ b/mlir/test/Conversion/MemRefToLLVM/memref-to-llvm.mlir
@@ -184,7 +184,9 @@ func.func @subview(%0 : memref<64x4xf32, strided<[4, 1]>>, %arg0 : index, %arg1
 // CHECK-LABEL: func @assume_alignment(
 // CHECK-INTERFACE-LABEL: func @assume_alignment(
 func.func @assume_alignment(%0 : memref<4x4xf16>) {
-  // CHECK: %[[PTR:.*]] = llvm.extractvalue %[[MEMREF:.*]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+  // CHECK: %[[ALIGNED:.*]] = llvm.extractvalue %[[MEMREF:.*]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+  // CHECK-NEXT: %[[DESC_OFF:.*]] = llvm.extractvalue %[[MEMREF]][2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+  // CHECK-NEXT: %[[PTR:.*]] = llvm.getelementptr %[[ALIGNED]][%[[DESC_OFF]]]
   // CHECK-NEXT: %[[TRUE:.*]] = llvm.mlir.constant(true) : i1
   // CHECK-NEXT: %[[ALIGN:.*]] = llvm.mlir.constant(16 : index) : i64
   // CHECK-NEXT: llvm.intr.assume %[[TRUE]] ["align"(%[[PTR]], %[[ALIGN]] : !llvm.ptr, i64)] : i1
@@ -201,9 +203,15 @@ func.func @distinct_objects(%arg0: memref<?xf16>, %arg1: memref<?xf32>, %arg2: m
 //   ALL-DAG:   %[[CAST_0:.*]] = builtin.unrealized_conversion_cast %[[ARG0]] : memref<?xf16> to !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
 //   ALL-DAG:   %[[CAST_1:.*]] = builtin.unrealized_conversion_cast %[[ARG1]] : memref<?xf32> to !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
 //   ALL-DAG:   %[[CAST_2:.*]] = builtin.unrealized_conversion_cast %[[ARG2]] : memref<?xf64> to !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
-//       ALL:   %[[PTR_0:.*]] = llvm.extractvalue %[[CAST_0]][1] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
-//       ALL:   %[[PTR_1:.*]] = llvm.extractvalue %[[CAST_1]][1] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
-//       ALL:   %[[PTR_2:.*]] = llvm.extractvalue %[[CAST_2]][1] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
+//       ALL:   %[[ALIGNED_0:.*]] = llvm.extractvalue %[[CAST_0]][1] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
+//       ALL:   %[[OFF_0:.*]] = llvm.extractvalue %[[CAST_0]][2] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
+//       ALL:   %[[PTR_0:.*]] = llvm.getelementptr %[[ALIGNED_0]][%[[OFF_0]]]
+//       ALL:   %[[ALIGNED_1:.*]] = llvm.extractvalue %[[CAST_1]][1] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
+//       ALL:   %[[OFF_1:.*]] = llvm.extractvalue %[[CAST_1]][2] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
+//       ALL:   %[[PTR_1:.*]] = llvm.getelementptr %[[ALIGNED_1]][%[[OFF_1]]]
+//       ALL:   %[[ALIGNED_2:.*]] = llvm.extractvalue %[[CAST_2]][1] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
+//       ALL:   %[[OFF_2:.*]] = llvm.extractvalue %[[CAST_2]][2] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
+//       ALL:   %[[PTR_2:.*]] = llvm.getelementptr %[[ALIGNED_2]][%[[OFF_2]]]
 //       ALL:   %[[TRUE:.*]] = llvm.mlir.constant(true) : i1
 //       ALL:   llvm.intr.assume %[[TRUE]] ["separate_storage"(%[[PTR_0]], %[[PTR_1]] : !llvm.ptr, !llvm.ptr)] : i1
 //       ALL:   llvm.intr.assume %[[TRUE]] ["separate_storage"(%[[PTR_0]], %[[PTR_2]] : !llvm.ptr, !llvm.ptr)] : i1
@@ -228,7 +236,9 @@ func.func @distinct_objects_noop(%arg0: memref<?xf16>) -> memref<?xf16> {
 // CHECK-LABEL: func @assume_alignment_w_offset
 // CHECK-INTERFACE-LABEL: func @assume_alignment_w_offset
 func.func @assume_alignment_w_offset(%0 : memref<4x4xf16, strided<[?, ?]>>) {
-  // CHECK-DAG: %[[PTR:.*]] = llvm.extractvalue %[[MEMREF:.*]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+  // CHECK-DAG: %[[ALIGNED:.*]] = llvm.extractvalue %[[MEMREF:.*]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+  // CHECK-DAG: %[[DESC_OFF:.*]] = llvm.extractvalue %[[MEMREF]][2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+  // CHECK-DAG: %[[PTR:.*]] = llvm.getelementptr %[[ALIGNED]][%[[DESC_OFF]]]
   // CHECK-DAG: %[[TRUE:.*]] = llvm.mlir.constant(true) : i1
   // CHECK-DAG: %[[ALIGN:.*]] = llvm.mlir.constant(16 : index) : i64
   // CHECK: llvm.intr.assume %[[TRUE]] ["align"(%[[PTR]], %[[ALIGN]] : !llvm.ptr, i64)] : i1
@@ -510,7 +520,9 @@ func.func @atomic_rmw_with_offset(%I : memref<10xi32, strided<[1]>>, %ival : i32
 // CHECK-SAME:   %[[ARG2:.+]]: index
 // CHECK-DAG:    %[[MEMREF_STRUCT:.+]] = builtin.unrealized_conversion_cast %[[ARG0]] : memref<10xi32, strided<[1]>> to !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
 // CHECK-DAG:    %[[INDEX:.+]] = builtin.unrealized_conversion_cast %[[ARG2]] : index to i64
-// CHECK:        %[[BASE_PTR:.+]] = llvm.extractvalue %[[MEMREF_STRUCT]][1] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
+// CHECK:        %[[ALIGNED_PTR:.+]] = llvm.extractvalue %[[MEMREF_STRUCT]][1] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
+// CHECK:        %[[DESC_OFF:.+]] = llvm.extractvalue %[[MEMREF_STRUCT]][2] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
+// CHECK:        %[[BASE_PTR:.+]] = llvm.getelementptr %[[ALIGNED_PTR]][%[[DESC_OFF]]] : (!llvm.ptr, i64) -> !llvm.ptr, i32
 // CHECK:        %[[PTR:.+]] = llvm.getelementptr %[[BASE_PTR]][%[[INDEX]]] : (!llvm.ptr, i64) -> !llvm.ptr, i32
 // CHECK:        llvm.atomicrmw _and %[[PTR]], %[[ARG1]] acq_rel
 

>From 436d6ea4db8e9622a832b92cf01ea984942be746 Mon Sep 17 00:00:00 2001
From: Ivan Butygin <ivan.butygin at gmail.com>
Date: Fri, 17 Apr 2026 05:11:28 +0200
Subject: [PATCH 20/27] [WIP][mlir] step 2 follow-ups: expand-strided-metadata
 CHECK fixes

CHECK lines updated for the new behavior where extract_strided_metadata
returns the runtime offset and the OFFSET_MAP affine map gains an extra
symbol for the source offset.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
---
 .../MemRef/expand-strided-metadata.mlir       | 65 +++++++++----------
 1 file changed, 32 insertions(+), 33 deletions(-)

diff --git a/mlir/test/Dialect/MemRef/expand-strided-metadata.mlir b/mlir/test/Dialect/MemRef/expand-strided-metadata.mlir
index be2fc5ac1ee49..de197d4b61324 100644
--- a/mlir/test/Dialect/MemRef/expand-strided-metadata.mlir
+++ b/mlir/test/Dialect/MemRef/expand-strided-metadata.mlir
@@ -38,7 +38,7 @@ func.func @extract_strided_metadata_constants(%base: memref<5x4xf32, strided<[4,
 // ==> 1 affine map with (rank * 2 + 1) symbols
 //
 // CHECK-DAG: #[[$STRIDE_MAP:.*]] = affine_map<()[s0, s1] -> (s0 * s1)>
-// CHECK-DAG: #[[$OFFSET_MAP:.*]] = affine_map<()[s0, s1, s2, s3, s4, s5] -> (s0 * s1 + s2 * s3 + s4 * s5)>
+// CHECK-DAG: #[[$OFFSET_MAP:.*]] = affine_map<()[s0, s1, s2, s3, s4, s5, s6] -> (s0 + s1 * s2 + s3 * s4 + s5 * s6)>
 // CHECK-LABEL: func @simplify_subview_all_dynamic
 //  CHECK-SAME: (%[[ARG:.*]]: memref<?x?x?xf32, strided<[?, ?, ?]>>, %[[DYN_OFFSET0:.*]]: index, %[[DYN_OFFSET1:.*]]: index, %[[DYN_OFFSET2:.*]]: index, %[[DYN_SIZE0:.*]]: index, %[[DYN_SIZE1:.*]]: index, %[[DYN_SIZE2:.*]]: index, %[[DYN_STRIDE0:.*]]: index, %[[DYN_STRIDE1:.*]]: index, %[[DYN_STRIDE2:.*]]: index)
 //
@@ -48,7 +48,7 @@ func.func @extract_strided_metadata_constants(%base: memref<5x4xf32, strided<[4,
 //  CHECK-DAG: %[[FINAL_STRIDE1:.*]] = affine.apply #[[$STRIDE_MAP]]()[%[[DYN_STRIDE1]], %[[STRIDES]]#1]
 //  CHECK-DAG: %[[FINAL_STRIDE2:.*]] = affine.apply #[[$STRIDE_MAP]]()[%[[DYN_STRIDE2]], %[[STRIDES]]#2]
 //
-//  CHECK-DAG: %[[FINAL_OFFSET:.*]] = affine.apply #[[$OFFSET_MAP]]()[%[[DYN_OFFSET0]], %[[STRIDES]]#0, %[[DYN_OFFSET1]], %[[STRIDES]]#1, %[[DYN_OFFSET2]], %[[STRIDES]]#2]
+//  CHECK-DAG: %[[FINAL_OFFSET:.*]] = affine.apply #[[$OFFSET_MAP]]()[%[[OFFSET]], %[[DYN_OFFSET0]], %[[STRIDES]]#0, %[[DYN_OFFSET1]], %[[STRIDES]]#1, %[[DYN_OFFSET2]], %[[STRIDES]]#2]
 //
 //      CHECK: %[[RES:.*]] = memref.reinterpret_cast %[[BASE]] to offset: [%[[FINAL_OFFSET]]], sizes: [%[[DYN_SIZE0]], %[[DYN_SIZE1]], %[[DYN_SIZE2]]], strides: [%[[FINAL_STRIDE0]], %[[FINAL_STRIDE1]], %[[FINAL_STRIDE2]]]
 //
@@ -79,6 +79,7 @@ func.func @simplify_subview_all_dynamic(
 // This test also checks that we don't create useless arith operations
 // when subview_offsets_i is 0.
 //
+// CHECK-DAG: #[[$ADD2_MAP:.*]] = affine_map<()[s0] -> (s0 + 2)>
 // CHECK-LABEL: func @extract_strided_metadata_of_subview
 //  CHECK-SAME: (%[[ARG:.*]]: memref<5x4xf32>)
 //
@@ -91,13 +92,15 @@ func.func @simplify_subview_all_dynamic(
 //   CHECK-DAG: %[[BASE:.*]], %[[OFFSET:.*]], %[[SIZES:.*]]:2, %[[STRIDES:.*]]:2 = memref.extract_strided_metadata %[[ARG]]
 //
 // Final offset is:
-//   origOffset + (== 0)
+//   origOffset +
 //   base_stride0 * subview_offset0 + (== 4 * 0 == 0)
 //   base_stride1 * subview_offset1 (== 1 * 2)
-//  == 2
+//  == origOffset + 2
+//
+//       CHECK: %[[FINAL_OFFSET:.*]] = affine.apply #[[$ADD2_MAP]]()[%[[OFFSET]]]
 //
 // Return the new tuple.
-//       CHECK: return %[[BASE]], %[[C2]], %[[C2]], %[[C2]], %[[C4]], %[[C1]]
+//       CHECK: return %[[BASE]], %[[FINAL_OFFSET]], %[[C2]], %[[C2]], %[[C4]], %[[C1]]
 func.func @extract_strided_metadata_of_subview(%base: memref<5x4xf32>)
     -> (memref<f32>, index, index, index, index, index) {
 
@@ -128,11 +131,11 @@ func.func @extract_strided_metadata_of_subview(%base: memref<5x4xf32>)
 //
 // Final sizes == subview sizes == [%size, 6, 3]
 //
+// CHECK-DAG: #[[$ADD1250_MAP:.*]] = affine_map<()[s0] -> (s0 + 1250)>
 // CHECK-LABEL: func @extract_strided_metadata_of_subview_with_dynamic_size
 //  CHECK-SAME: (%[[ARG:.*]]: memref<8x16x24xf32>,
 //  CHECK-SAME: %[[DYN_SIZE:.*]]: index)
 //
-//   CHECK-DAG: %[[C1250:.*]] = arith.constant 1250 : index
 //   CHECK-DAG: %[[C384:.*]] = arith.constant 384 : index
 //   CHECK-DAG: %[[C6:.*]] = arith.constant 6 : index
 //   CHECK-DAG: %[[C24:.*]] = arith.constant 24 : index
@@ -140,8 +143,9 @@ func.func @extract_strided_metadata_of_subview(%base: memref<5x4xf32>)
 //   CHECK-DAG: %[[C1:.*]] = arith.constant 1 : index
 //
 //   CHECK-DAG: %[[BASE:.*]], %[[OFFSET:.*]], %[[SIZES:.*]]:3, %[[STRIDES:.*]]:3 = memref.extract_strided_metadata %[[ARG]]
+//       CHECK: %[[FINAL_OFFSET:.*]] = affine.apply #[[$ADD1250_MAP]]()[%[[OFFSET]]]
 //
-//       CHECK: return %[[BASE]], %[[C1250]], %[[DYN_SIZE]], %[[C6]], %[[C3]], %[[C384]], %[[C24]], %[[C1]]
+//       CHECK: return %[[BASE]], %[[FINAL_OFFSET]], %[[DYN_SIZE]], %[[C6]], %[[C3]], %[[C384]], %[[C24]], %[[C1]]
 func.func @extract_strided_metadata_of_subview_with_dynamic_size(
     %base: memref<8x16x24xf32>, %size: index)
     -> (memref<f32>, index, index, index, index, index, index, index) {
@@ -177,18 +181,19 @@ func.func @extract_strided_metadata_of_subview_with_dynamic_size(
 //
 // Final sizes == filterOutReducedDim(subview sizes, 0) == [6, 3]
 //
+// CHECK-DAG: #[[$ADD1250B_MAP:.*]] = affine_map<()[s0] -> (s0 + 1250)>
 // CHECK-LABEL: func @extract_strided_metadata_of_rank_reduced_subview
 //  CHECK-SAME: (%[[ARG:.*]]: memref<8x16x24xf32>)
 //
-//   CHECK-DAG: %[[C1250:.*]] = arith.constant 1250 : index
 //   CHECK-DAG: %[[C6:.*]] = arith.constant 6 : index
 //   CHECK-DAG: %[[C24:.*]] = arith.constant 24 : index
 //   CHECK-DAG: %[[C3:.*]] = arith.constant 3 : index
 //   CHECK-DAG: %[[C1:.*]] = arith.constant 1 : index
 //
 //   CHECK-DAG: %[[BASE:.*]], %[[OFFSET:.*]], %[[SIZES:.*]]:3, %[[STRIDES:.*]]:3 = memref.extract_strided_metadata %[[ARG]]
+//       CHECK: %[[FINAL_OFFSET:.*]] = affine.apply #[[$ADD1250B_MAP]]()[%[[OFFSET]]]
 //
-//       CHECK: return %[[BASE]], %[[C1250]], %[[C6]], %[[C3]], %[[C24]], %[[C1]]
+//       CHECK: return %[[BASE]], %[[FINAL_OFFSET]], %[[C6]], %[[C3]], %[[C24]], %[[C1]]
 func.func @extract_strided_metadata_of_rank_reduced_subview(%base: memref<8x16x24xf32>)
     -> (memref<f32>, index, index, index, index, index) {
 
@@ -224,11 +229,11 @@ func.func @extract_strided_metadata_of_rank_reduced_subview(%base: memref<8x16x2
 // => Final offset: 3 * 384 + 4 * 24 + 2 * 1 + 0 == 1250
 //
 //   CHECK-DAG: #[[$STRIDE1_MAP:.*]] = affine_map<()[s0] -> (s0 * 24)>
+//   CHECK-DAG: #[[$ADD1250C_MAP:.*]] = affine_map<()[s0] -> (s0 + 1250)>
 // CHECK-LABEL: func @extract_strided_metadata_of_rank_reduced_subview_w_variable_strides
 //  CHECK-SAME: (%[[ARG:.*]]: memref<8x16x24xf32>,
 //  CHECK-SAME: %[[DYN_STRIDE:.*]]: index)
 //
-//   CHECK-DAG: %[[C1250:.*]] = arith.constant 1250 : index
 //   CHECK-DAG: %[[C6:.*]] = arith.constant 6 : index
 //   CHECK-DAG: %[[C3:.*]] = arith.constant 3 : index
 //   CHECK-DAG: %[[C1:.*]] = arith.constant 1 : index
@@ -236,8 +241,9 @@ func.func @extract_strided_metadata_of_rank_reduced_subview(%base: memref<8x16x2
 //   CHECK-DAG: %[[BASE:.*]], %[[OFFSET:.*]], %[[SIZES:.*]]:3, %[[STRIDES:.*]]:3 = memref.extract_strided_metadata %[[ARG]]
 //
 //   CHECK-DAG: %[[DIM1_STRIDE:.*]] = affine.apply #[[$STRIDE1_MAP]]()[%[[DYN_STRIDE]]]
+//   CHECK-DAG: %[[FINAL_OFFSET:.*]] = affine.apply #[[$ADD1250C_MAP]]()[%[[OFFSET]]]
 //
-//       CHECK: return %[[BASE]], %[[C1250]], %[[C6]], %[[C3]], %[[DIM1_STRIDE]], %[[C1]]
+//       CHECK: return %[[BASE]], %[[FINAL_OFFSET]], %[[C6]], %[[C3]], %[[DIM1_STRIDE]], %[[C1]]
 func.func @extract_strided_metadata_of_rank_reduced_subview_w_variable_strides(
     %base: memref<8x16x24xf32>, %stride: index)
     -> (memref<f32>, index, index, index, index, index) {
@@ -268,7 +274,7 @@ func.func @extract_strided_metadata_of_rank_reduced_subview_w_variable_strides(
 // Sub offsets: [%arg1, %arg2]
 // => Final offset: 128 * arg1 + 1 * %arg2 + 0
 //
-//   CHECK-DAG: #[[$OFFSETS_MAP:.*]] = affine_map<()[s0, s1] -> (s0 * 128 + s1)>
+//   CHECK-DAG: #[[$OFFSETS_MAP:.*]] = affine_map<()[s0, s1, s2] -> (s0 + s1 * 128 + s2)>
 // CHECK-LABEL: func @extract_strided_metadata_of_subview_w_variable_offset
 //  CHECK-SAME: (%[[ARG:.*]]: memref<384x128xf32>,
 //  CHECK-SAME: %[[DYN_OFFSET0:.*]]: index,
@@ -279,7 +285,7 @@ func.func @extract_strided_metadata_of_rank_reduced_subview_w_variable_strides(
 //   CHECK-DAG: %[[C1:.*]] = arith.constant 1 : index
 //   CHECK-DAG: %[[BASE:.*]], %[[OFFSET:.*]], %[[SIZES:.*]]:2, %[[STRIDES:.*]]:2 = memref.extract_strided_metadata %[[ARG]]
 //
-//   CHECK-DAG: %[[FINAL_OFFSET:.*]] = affine.apply #[[$OFFSETS_MAP]]()[%[[DYN_OFFSET0]], %[[DYN_OFFSET1]]]
+//   CHECK-DAG: %[[FINAL_OFFSET:.*]] = affine.apply #[[$OFFSETS_MAP]]()[%[[OFFSET]], %[[DYN_OFFSET0]], %[[DYN_OFFSET1]]]
 //
 //       CHECK: return %[[BASE]], %[[FINAL_OFFSET]], %[[C64]], %[[C64]], %[[C128]], %[[C1]]
 func.func @extract_strided_metadata_of_subview_w_variable_offset(
@@ -315,7 +321,7 @@ func.func @extract_strided_metadata_of_subview_w_variable_offset(
 // ==> 1 affine map with (rank * 2 + 1) symbols
 //
 // CHECK-DAG: #[[$STRIDE_MAP:.*]] = affine_map<()[s0, s1] -> (s0 * s1)>
-// CHECK-DAG: #[[$OFFSET_MAP:.*]] = affine_map<()[s0, s1, s2, s3, s4, s5] -> (s0 * s1 + s2 * s3 + s4 * s5)>
+// CHECK-DAG: #[[$OFFSET_MAP:.*]] = affine_map<()[s0, s1, s2, s3, s4, s5, s6] -> (s0 + s1 * s2 + s3 * s4 + s5 * s6)>
 // CHECK-LABEL: func @extract_strided_metadata_of_subview_all_dynamic
 //  CHECK-SAME: (%[[ARG:.*]]: memref<?x?x?xf32, strided<[?, ?, ?]>>, %[[DYN_OFFSET0:.*]]: index, %[[DYN_OFFSET1:.*]]: index, %[[DYN_OFFSET2:.*]]: index, %[[DYN_SIZE0:.*]]: index, %[[DYN_SIZE1:.*]]: index, %[[DYN_SIZE2:.*]]: index, %[[DYN_STRIDE0:.*]]: index, %[[DYN_STRIDE1:.*]]: index, %[[DYN_STRIDE2:.*]]: index)
 //
@@ -325,7 +331,7 @@ func.func @extract_strided_metadata_of_subview_w_variable_offset(
 //  CHECK-DAG: %[[FINAL_STRIDE1:.*]] = affine.apply #[[$STRIDE_MAP]]()[%[[DYN_STRIDE1]], %[[STRIDES]]#1]
 //  CHECK-DAG: %[[FINAL_STRIDE2:.*]] = affine.apply #[[$STRIDE_MAP]]()[%[[DYN_STRIDE2]], %[[STRIDES]]#2]
 //
-//  CHECK-DAG: %[[FINAL_OFFSET:.*]] = affine.apply #[[$OFFSET_MAP]]()[%[[DYN_OFFSET0]], %[[STRIDES]]#0, %[[DYN_OFFSET1]], %[[STRIDES]]#1, %[[DYN_OFFSET2]], %[[STRIDES]]#2]
+//  CHECK-DAG: %[[FINAL_OFFSET:.*]] = affine.apply #[[$OFFSET_MAP]]()[%[[OFFSET]], %[[DYN_OFFSET0]], %[[STRIDES]]#0, %[[DYN_OFFSET1]], %[[STRIDES]]#1, %[[DYN_OFFSET2]], %[[STRIDES]]#2]
 //
 //       CHECK: return %[[BASE]], %[[FINAL_OFFSET]], %[[DYN_SIZE0]], %[[DYN_SIZE1]], %[[DYN_SIZE2]], %[[FINAL_STRIDE0]], %[[FINAL_STRIDE1]], %[[FINAL_STRIDE2]]
 func.func @extract_strided_metadata_of_subview_all_dynamic(
@@ -402,7 +408,7 @@ func.func @extract_strided_metadata_of_subview_all_dynamic(
 //   CHECK-DAG: %[[DYN_STRIDE5:.*]] = affine.apply #[[$DIM5_STRIDE_MAP]]()[%[[SIZE1]], %[[STRIDES]]#1]
 //   CHECK-DAG: %[[DYN_STRIDE6:.*]] = affine.apply #[[$DIM6_STRIDE_MAP]]()[%[[STRIDES]]#1]
 //
-//   CHECK-DAG: %[[REINTERPRET_CAST:.*]] = memref.reinterpret_cast %[[BASE]] to offset: [0], sizes: [%[[SIZE0]], 7, 8, 9, 10, 2, %[[SIZE1]], 3], strides: [%[[DYN_STRIDE0]], %[[DYN_STRIDE1]], %[[DYN_STRIDE2]], %[[STRIDES]]#0, %[[DYN_STRIDE4]], %[[DYN_STRIDE5]], %[[DYN_STRIDE6]], %[[STRIDES]]#1]
+//   CHECK-DAG: %[[REINTERPRET_CAST:.*]] = memref.reinterpret_cast %[[BASE]] to offset: [%[[OFFSET]]], sizes: [%[[SIZE0]], 7, 8, 9, 10, 2, %[[SIZE1]], 3], strides: [%[[DYN_STRIDE0]], %[[DYN_STRIDE1]], %[[DYN_STRIDE2]], %[[STRIDES]]#0, %[[DYN_STRIDE4]], %[[DYN_STRIDE5]], %[[DYN_STRIDE6]], %[[STRIDES]]#1]
 //
 //   CHECK: return %[[REINTERPRET_CAST]]
 func.func @simplify_expand_shape(
@@ -460,11 +466,10 @@ func.func @simplify_expand_shape(
 //   CHECK-DAG: %[[C3:.*]] = arith.constant 3 : index
 //   CHECK-DAG: %[[C2:.*]] = arith.constant 2 : index
 //   CHECK-DAG: %[[C1:.*]] = arith.constant 1 : index
-//   CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index
 //
 //   CHECK-DAG: %[[BASE:.*]], %[[OFFSET:.*]], %[[SIZES:.*]]:2, %[[STRIDES:.*]]:2 = memref.extract_strided_metadata %[[ARG]] : memref<30x4xi16> -> memref<i16>, index, index, index, index, index
 //
-//   CHECK: return %[[BASE]], %[[C0]], %[[C3]], %[[C5]], %[[C2]], %[[C2]], %[[C2]], %[[C40]], %[[C8]], %[[C4]], %[[C2]], %[[C1]] : memref<i16>, index, index, index, index, index, index, index, index, index, index, index
+//   CHECK: return %[[BASE]], %[[OFFSET]], %[[C3]], %[[C5]], %[[C2]], %[[C2]], %[[C2]], %[[C40]], %[[C8]], %[[C4]], %[[C2]], %[[C1]] : memref<i16>, index, index, index, index, index, index, index, index, index, index, index
 func.func @extract_strided_metadata_of_expand_shape_all_static(
     %arg : memref<30x4xi16>)
     -> (memref<i16>, index,
@@ -534,7 +539,6 @@ func.func @extract_strided_metadata_of_expand_shape_all_static(
 //  CHECK-SAME: (%[[ARG:.*]]: memref<?x?xf32,
 //  CHECK-SAME: %[[SIZE0:.*]]: index,  %[[SIZE1:.*]]: index, %[[SIZE2:.*]]: index,  %[[SIZE3:.*]]: index)
 //
-//   CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index
 //   CHECK-DAG: %[[C10:.*]] = arith.constant 10 : index
 //   CHECK-DAG: %[[C9:.*]] = arith.constant 9 : index
 //   CHECK-DAG: %[[C8:.*]] = arith.constant 8 : index
@@ -549,7 +553,7 @@ func.func @extract_strided_metadata_of_expand_shape_all_static(
 //   CHECK-DAG: %[[DYN_STRIDE5:.*]] = affine.apply #[[$DIM5_STRIDE_MAP]]()[%[[SIZE3]], %[[STRIDES]]#1]
 //   CHECK-DAG: %[[DYN_STRIDE6:.*]] = affine.apply #[[$DIM6_STRIDE_MAP]]()[%[[STRIDES]]#1]
 
-//   CHECK: return %[[BASE]], %[[C0]], %[[SIZE0]], %[[SIZE1]], %[[C8]], %[[C9]], %[[C10]], %[[SIZE2]], %[[SIZE3]], %[[C3]], %[[DYN_STRIDE0]], %[[DYN_STRIDE1]], %[[DYN_STRIDE2]], %[[STRIDES]]#0, %[[DYN_STRIDE4]], %[[DYN_STRIDE5]], %[[DYN_STRIDE6]], %[[STRIDES]]#1 : memref<f32>, index, index, index, index, index, index, index, index, index, index, index, index, index
+//   CHECK: return %[[BASE]], %[[OFFSET]], %[[SIZE0]], %[[SIZE1]], %[[C8]], %[[C9]], %[[C10]], %[[SIZE2]], %[[SIZE3]], %[[C3]], %[[DYN_STRIDE0]], %[[DYN_STRIDE1]], %[[DYN_STRIDE2]], %[[STRIDES]]#0, %[[DYN_STRIDE4]], %[[DYN_STRIDE5]], %[[DYN_STRIDE6]], %[[STRIDES]]#1 : memref<f32>, index, index, index, index, index, index, index, index, index, index, index, index, index
 func.func @extract_strided_metadata_of_expand_shape_all_dynamic(
     %base: memref<?x?xf32, strided<[?,?]>>,
     %sz0: index, %sz1: index, %sz2: index, %sz3: index)
@@ -588,12 +592,11 @@ func.func @extract_strided_metadata_of_expand_shape_all_dynamic(
 // CHECK-LABEL: func @extract_strided_metadata_of_expand_shape_all_static_0_rank
 //  CHECK-SAME: (%[[ARG:.*]]: memref<i16, strided<[]>>)
 //
-//   CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index
 //   CHECK-DAG: %[[C1:.*]] = arith.constant 1 : index
 //
 //   CHECK-DAG: %[[BASE:.*]], %[[OFFSET:.*]] = memref.extract_strided_metadata %[[ARG]] : memref<i16, strided<[]>> -> memref<i16>, index
 //
-//   CHECK: return %[[BASE]], %[[C0]], %[[C1]], %[[C1]], %[[C1]], %[[C1]], %[[C1]], %[[C1]], %[[C1]], %[[C1]], %[[C1]], %[[C1]] : memref<i16>, index, index, index, index, index, index, index, index, index, index, index
+//   CHECK: return %[[BASE]], %[[OFFSET]], %[[C1]], %[[C1]], %[[C1]], %[[C1]], %[[C1]], %[[C1]], %[[C1]], %[[C1]], %[[C1]], %[[C1]] : memref<i16>, index, index, index, index, index, index, index, index, index, index, index
 func.func @extract_strided_metadata_of_expand_shape_all_static_0_rank(
     %arg : memref<i16, strided<[]>>)
     -> (memref<i16>, index,
@@ -894,7 +897,7 @@ func.func @extract_aligned_pointer_as_index_of_unranked_source(%arg0: memref<*xf
 //
 //       CHECK: %[[DYN_SIZE1:.*]] = affine.apply #[[$SIZE0_MAP]]()[%[[SIZES]]#1, %[[SIZES]]#3]
 //
-//       CHECK: %[[COLLAPSE_VIEW:.*]] = memref.reinterpret_cast %[[BASE]] to offset: [0], sizes: [%[[SIZES]]#0, %[[DYN_SIZE1]], 42], strides: [%[[STRIDES]]#0, 42, 1]
+//       CHECK: %[[COLLAPSE_VIEW:.*]] = memref.reinterpret_cast %[[BASE]] to offset: [%[[OFFSET]]], sizes: [%[[SIZES]]#0, %[[DYN_SIZE1]], 42], strides: [%[[STRIDES]]#0, 42, 1]
 func.func @simplify_collapse(%arg : memref<?x?x4x?x6x7xi32>)
   -> memref<?x?x42xi32> {
 
@@ -934,7 +937,7 @@ func.func @simplify_collapse(%arg : memref<?x?x4x?x6x7xi32>)
 //       CHECK: %[[BASE:.*]], %[[OFFSET:.*]], %[[SIZES:.*]]:2, %[[STRIDES:.*]]:2 = memref.extract_strided_metadata %[[ARG]] : memref<3x1xf32, strided<[2, 1]>>
 //
 //
-//       CHECK: %[[COLLAPSE_VIEW:.*]] = memref.reinterpret_cast %[[BASE]] to offset: [0], sizes: [3], strides: [2]
+//       CHECK: %[[COLLAPSE_VIEW:.*]] = memref.reinterpret_cast %[[BASE]] to offset: [%[[OFFSET]]], sizes: [3], strides: [2]
 func.func @simplify_collapse_with_dim_of_size1(%arg0: memref<3x1xf32, strided<[2,1]>>, %arg1: memref<3xf32>) {
 
   %collapse_shape = memref.collapse_shape %arg0 [[0, 1]] :
@@ -961,7 +964,7 @@ func.func @simplify_collapse_with_dim_of_size1(%arg0: memref<3x1xf32, strided<[2
 //
 //       CHECK: %[[BASE:.*]], %[[OFFSET:.*]], %[[SIZES:.*]]:2, %[[STRIDES:.*]]:2 = memref.extract_strided_metadata %[[ARG]] : memref<1x1xi32, strided<[2, 1]>>
 //
-//       CHECK: %[[COLLAPSE_VIEW:.*]] = memref.reinterpret_cast %[[BASE]] to offset: [0], sizes: [1], strides: [2]
+//       CHECK: %[[COLLAPSE_VIEW:.*]] = memref.reinterpret_cast %[[BASE]] to offset: [%[[OFFSET]]], sizes: [1], strides: [2]
 func.func @simplify_collapse_with_dim_of_size1_and_non_1_stride
     (%arg0: memref<1x1xi32, strided<[2, 1]>>)
     -> memref<1xi32, strided<[2]>> {
@@ -1002,7 +1005,7 @@ func.func @simplify_collapse_with_dim_of_size1_and_non_1_stride
 //
 //       CHECK: %[[BASE:.*]], %[[OFFSET:.*]], %[[SIZES:.*]]:5, %[[STRIDES:.*]]:5 = memref.extract_strided_metadata %[[ARG]] : memref<2x3x1x1x1xi32, strided<[?, ?, ?, ?, 2]>>
 //
-//       CHECK: %[[COLLAPSE_VIEW:.*]] = memref.reinterpret_cast %[[BASE]] to offset: [0], sizes: [6, 1], strides: [%[[STRIDES]]#1, %[[STRIDES]]#2]
+//       CHECK: %[[COLLAPSE_VIEW:.*]] = memref.reinterpret_cast %[[BASE]] to offset: [%[[OFFSET]]], sizes: [6, 1], strides: [%[[STRIDES]]#1, %[[STRIDES]]#2]
 func.func @simplify_collapse_with_dim_of_size1_and_resulting_dyn_stride
     (%arg0: memref<2x3x1x1x1xi32, strided<[?, ?, ?, ?, 2]>>)
     -> memref<6x1xi32, strided<[?, ?]>> {
@@ -1037,13 +1040,12 @@ func.func @simplify_collapse_with_dim_of_size1_and_resulting_dyn_stride
 //
 //   CHECK-DAG: %[[C42:.*]] = arith.constant 42 : index
 //   CHECK-DAG: %[[C1:.*]] = arith.constant 1 : index
-//   CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index
 //
 //   CHECK-DAG: %[[BASE:.*]], %[[OFFSET:.*]], %[[SIZES:.*]]:6, %[[STRIDES:.*]]:6 = memref.extract_strided_metadata %[[ARG]] : memref<?x?x4x?x6x7xi32>
 //
 //   CHECK-DAG: %[[DYN_SIZE1:.*]] = affine.apply #[[$SIZE0_MAP]]()[%[[SIZES]]#1, %[[SIZES]]#3]
 //
-//       CHECK: return %[[BASE]], %[[C0]], %[[SIZES]]#0, %[[DYN_SIZE1]], %[[C42]], %[[STRIDES]]#0, %[[C42]], %[[C1]]
+//       CHECK: return %[[BASE]], %[[OFFSET]], %[[SIZES]]#0, %[[DYN_SIZE1]], %[[C42]], %[[STRIDES]]#0, %[[C42]], %[[C1]]
 func.func @extract_strided_metadata_of_collapse(%arg : memref<?x?x4x?x6x7xi32>)
   -> (memref<i32>, index,
       index, index, index,
@@ -1074,11 +1076,9 @@ func.func @extract_strided_metadata_of_collapse(%arg : memref<?x?x4x?x6x7xi32>)
 // CHECK-LABEL: func @extract_strided_metadata_of_collapse_to_rank0(
 //  CHECK-SAME: %[[ARG:.*]]: memref<1x1x1x1x1x1xi32>)
 //
-//   CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index
-//
 //   CHECK-DAG: %[[BASE:.*]], %[[OFFSET:.*]], %[[SIZES:.*]]:6, %[[STRIDES:.*]]:6 = memref.extract_strided_metadata %[[ARG]] : memref<1x1x1x1x1x1xi32>
 //
-//       CHECK: return %[[BASE]], %[[C0]]
+//       CHECK: return %[[BASE]], %[[OFFSET]]
 func.func @extract_strided_metadata_of_collapse_to_rank0(%arg : memref<1x1x1x1x1x1xi32>)
   -> (memref<i32>, index) {
 
@@ -1367,10 +1367,9 @@ func.func @extract_strided_metadata_of_collapse_shape(%base: memref<5x4xf32>)
 }
 
 // CHECK-LABEL:  func @extract_strided_metadata_of_collapse_shape
-//   CHECK-DAG:    %[[OFFSET:.*]] = arith.constant 0 : index
 //   CHECK-DAG:    %[[SIZE:.*]] = arith.constant 20 : index
 //   CHECK-DAG:    %[[STEP:.*]] = arith.constant 1 : index
-//       CHECK:    %[[BASE:.*]], %{{.*}}, %{{.*}}, %{{.*}} = memref.extract_strided_metadata
+//       CHECK:    %[[BASE:.*]], %[[OFFSET:.*]], %{{.*}}, %{{.*}} = memref.extract_strided_metadata
 //       CHECK:    return %[[BASE]], %[[OFFSET]], %[[SIZE]], %[[STEP]] : memref<f32>, index, index, index
 
 // -----

>From 833debefc9453df3270b9a084d20f3811af818cb Mon Sep 17 00:00:00 2001
From: Ivan Butygin <ivan.butygin at gmail.com>
Date: Fri, 17 Apr 2026 05:24:21 +0200
Subject: [PATCH 21/27] [WIP][mlir] step 2 follow-ups:
 expand-then-convert-to-llvm CHECK fixes

Updated CHECK lines for the new IR shape: SubView/ReinterpretCast lowering
now extracts the source memref's runtime offset (descriptor[2]) and includes
it in the offset computation, both for the new descriptor's offset field
and for the bufferPtr computation in load/store/assume_alignment.

All Conversion/MemRefToLLVM, Conversion/AMDGPUToROCDL, and Dialect/MemRef
tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
---
 .../expand-then-convert-to-llvm.mlir          | 94 ++++++++++++-------
 1 file changed, 62 insertions(+), 32 deletions(-)

diff --git a/mlir/test/Conversion/MemRefToLLVM/expand-then-convert-to-llvm.mlir b/mlir/test/Conversion/MemRefToLLVM/expand-then-convert-to-llvm.mlir
index c9158cea321de..c84f6162bc768 100644
--- a/mlir/test/Conversion/MemRefToLLVM/expand-then-convert-to-llvm.mlir
+++ b/mlir/test/Conversion/MemRefToLLVM/expand-then-convert-to-llvm.mlir
@@ -59,11 +59,13 @@ func.func @subview(%0 : memref<64x4xf32, strided<[4, 1]>>, %arg0 : index, %arg1
 
   // CHECK: %[[BASE:.*]] = llvm.extractvalue %[[MEMREF]][0] : !llvm.struct<(ptr, ptr, i64
   // CHECK: %[[BASE_ALIGNED:.*]] = llvm.extractvalue %[[MEMREF]][1] : !llvm.struct<(ptr, ptr, i64
+  // CHECK: %[[SRC_OFF:.*]] = llvm.extractvalue %[[MEMREF]][2] : !llvm.struct<(ptr, ptr, i64
   // CHECK: %[[STRIDE0:.*]] = llvm.mlir.constant(4 : index) : i64
   // CHECK: %[[DESCSTRIDE0:.*]] = llvm.mul %[[ARG0]], %[[STRIDE0]] overflow<nsw> : i64
   // CHECK: %[[TMP:.*]] = builtin.unrealized_conversion_cast %[[DESCSTRIDE0]] : i64 to index
   // CHECK: %[[DESCSTRIDE0_V2:.*]] = builtin.unrealized_conversion_cast %[[TMP]] : index to i64
-  // CHECK: %[[OFF2:.*]] = llvm.add %[[DESCSTRIDE0]], %[[ARG1]] : i64
+  // CHECK: %[[OFF1:.*]] = llvm.add %[[SRC_OFF]], %[[DESCSTRIDE0]] : i64
+  // CHECK: %[[OFF2:.*]] = llvm.add %[[OFF1]], %[[ARG1]] : i64
   // CHECK: %[[TMP:.*]] = builtin.unrealized_conversion_cast %[[OFF2]] : i64 to index
   // CHECK: %[[OFF2:.*]] = builtin.unrealized_conversion_cast %[[TMP]] : index to i64
   // CHECK: %[[DESC:.*]] = llvm.mlir.poison : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
@@ -95,11 +97,13 @@ func.func @subview_non_zero_addrspace(%0 : memref<64x4xf32, strided<[4, 1]>, 3>,
 
   // CHECK: %[[BASE:.*]] = llvm.extractvalue %[[MEMREF]][0] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<2 x i64>, array<2 x i64>)>
   // CHECK: %[[BASE_ALIGNED:.*]] = llvm.extractvalue %[[MEMREF]][1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<2 x i64>, array<2 x i64>)>
+  // CHECK: %[[SRC_OFF:.*]] = llvm.extractvalue %[[MEMREF]][2] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<2 x i64>, array<2 x i64>)>
   // CHECK: %[[STRIDE0:.*]] = llvm.mlir.constant(4 : index) : i64
   // CHECK: %[[DESCSTRIDE0:.*]] = llvm.mul %[[ARG0]], %[[STRIDE0]] overflow<nsw> : i64
   // CHECK: %[[TMP:.*]] = builtin.unrealized_conversion_cast %[[DESCSTRIDE0]] : i64 to index
   // CHECK: %[[DESCSTRIDE0_V2:.*]] = builtin.unrealized_conversion_cast %[[TMP]] : index to i64
-  // CHECK: %[[OFF2:.*]] = llvm.add %[[DESCSTRIDE0]], %[[ARG1]] : i64
+  // CHECK: %[[OFF1:.*]] = llvm.add %[[SRC_OFF]], %[[DESCSTRIDE0]] : i64
+  // CHECK: %[[OFF2:.*]] = llvm.add %[[OFF1]], %[[ARG1]] : i64
   // CHECK: %[[TMP:.*]] = builtin.unrealized_conversion_cast %[[OFF2]] : i64 to index
   // CHECK: %[[OFF2:.*]] = builtin.unrealized_conversion_cast %[[TMP]] : index to i64
   // CHECK: %[[DESC:.*]] = llvm.mlir.poison : !llvm.struct<(ptr<3>, ptr<3>, i64, array<2 x i64>, array<2 x i64>)>
@@ -131,11 +135,13 @@ func.func @subview_const_size(%0 : memref<64x4xf32, strided<[4, 1]>>, %arg0 : in
 
   // CHECK: %[[BASE:.*]] = llvm.extractvalue %[[MEMREF]][0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
   // CHECK: %[[BASE_ALIGNED:.*]] = llvm.extractvalue %[[MEMREF]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+  // CHECK: %[[SRC_OFF:.*]] = llvm.extractvalue %[[MEMREF]][2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
   // CHECK: %[[C4:.*]] = llvm.mlir.constant(4 : index) : i64
   // CHECK: %[[DESCSTRIDE0:.*]] = llvm.mul %[[ARG0]], %[[C4]] overflow<nsw> : i64
   // CHECK: %[[TMP:.*]] = builtin.unrealized_conversion_cast %[[DESCSTRIDE0]] : i64 to index
   // CHECK: %[[DESCSTRIDE0_V2:.*]] = builtin.unrealized_conversion_cast %[[TMP]] : index to i64
-  // CHECK: %[[OFF2:.*]] = llvm.add %[[DESCSTRIDE0]], %[[ARG1]] : i64
+  // CHECK: %[[OFF1:.*]] = llvm.add %[[SRC_OFF]], %[[DESCSTRIDE0]] : i64
+  // CHECK: %[[OFF2:.*]] = llvm.add %[[OFF1]], %[[ARG1]] : i64
   // CHECK: %[[TMP:.*]] = builtin.unrealized_conversion_cast %[[OFF2]] : i64 to index
   // CHECK: %[[OFF2:.*]] = builtin.unrealized_conversion_cast %[[TMP]] : index to i64
   // CHECK: %[[DESC:.*]] = llvm.mlir.poison : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
@@ -168,9 +174,11 @@ func.func @subview_const_stride(%0 : memref<64x4xf32, strided<[4, 1]>>, %arg0 :
 
   // CHECK: %[[BASE:.*]] = llvm.extractvalue %[[MEMREF]][0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
   // CHECK: %[[BASE_ALIGNED:.*]] = llvm.extractvalue %[[MEMREF]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+  // CHECK: %[[SRC_OFF:.*]] = llvm.extractvalue %[[MEMREF]][2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
   // CHECK: %[[C4:.*]] = llvm.mlir.constant(4 : index) : i64
   // CHECK: %[[OFF0:.*]] = llvm.mul %[[ARG0]], %[[C4]] overflow<nsw> : i64
-  // CHECK: %[[OFF2:.*]] = llvm.add %[[OFF0]], %[[ARG1]] : i64
+  // CHECK: %[[OFF1:.*]] = llvm.add %[[SRC_OFF]], %[[OFF0]] : i64
+  // CHECK: %[[OFF2:.*]] = llvm.add %[[OFF1]], %[[ARG1]] : i64
   // CHECK: %[[TMP:.*]] = builtin.unrealized_conversion_cast %[[OFF2]] : i64 to index
   // CHECK: %[[OFF2:.*]] = builtin.unrealized_conversion_cast %[[TMP]] : index to i64
   // CHECK: %[[DESC:.*]] = llvm.mlir.poison : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
@@ -199,11 +207,15 @@ func.func @subview_const_stride_and_offset(%0 : memref<64x8xf32, strided<[8, 1]>
 
   // CHECK: %[[BASE:.*]] = llvm.extractvalue %[[MEMREF]][0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
   // CHECK: %[[BASE_ALIGNED:.*]] = llvm.extractvalue %[[MEMREF]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+  // CHECK: %[[SRC_OFF:.*]] = llvm.extractvalue %[[MEMREF]][2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+  // CHECK: %[[CST_ADD:.*]] = llvm.mlir.constant(2 : index) : i64
+  // CHECK: %[[ADD:.*]] = llvm.add %[[SRC_OFF]], %[[CST_ADD]] : i64
+  // CHECK: %[[TMP:.*]] = builtin.unrealized_conversion_cast %[[ADD]] : i64 to index
+  // CHECK: %[[NEW_OFF:.*]] = builtin.unrealized_conversion_cast %[[TMP]] : index to i64
   // CHECK: %[[DESC:.*]] = llvm.mlir.poison : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
   // CHECK: %[[DESC0:.*]] = llvm.insertvalue %[[BASE]], %[[DESC]][0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
   // CHECK: %[[DESC1:.*]] = llvm.insertvalue %[[BASE_ALIGNED]], %[[DESC0]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-  // CHECK: %[[CST_OFF:.*]] = llvm.mlir.constant(2 : index) : i64
-  // CHECK: %[[DESC2:.*]] = llvm.insertvalue %[[CST_OFF]], %[[DESC1]][2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+  // CHECK: %[[DESC2:.*]] = llvm.insertvalue %[[NEW_OFF]], %[[DESC1]][2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
   // CHECK: %[[CST_SIZE0:.*]] = llvm.mlir.constant(62 : index) : i64
   // CHECK: %[[DESC3:.*]] = llvm.insertvalue %[[CST_SIZE0]], %[[DESC2]][3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
   // CHECK: %[[CST_STRIDE0:.*]] = llvm.mlir.constant(8 : index) : i64
@@ -234,13 +246,15 @@ func.func @subview_mixed_static_dynamic(%0 : memref<64x4xf32, strided<[4, 1]>>,
 
   // CHECK: %[[BASE:.*]] = llvm.extractvalue %[[MEMREF]][0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
   // CHECK: %[[BASE_ALIGNED:.*]] = llvm.extractvalue %[[MEMREF]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+  // CHECK: %[[SRC_OFF:.*]] = llvm.extractvalue %[[MEMREF]][2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
   // CHECK: %[[STRIDE0:.*]] = llvm.mlir.constant(4 : index) : i64
   // CHECK: %[[DESCSTRIDE0:.*]] = llvm.mul %[[ARG0]], %[[STRIDE0]] overflow<nsw> : i64
   // CHECK: %[[TMP:.*]] = builtin.unrealized_conversion_cast %[[DESCSTRIDE0]] : i64 to index
   // CHECK: %[[DESCSTRIDE0_V2:.*]] = builtin.unrealized_conversion_cast %[[TMP]] : index to i64
   // CHECK: %[[OFF0:.*]] = llvm.mul %[[ARG1]], %[[STRIDE0]] overflow<nsw> : i64
+  // CHECK: %[[OFF1:.*]] = llvm.add %[[SRC_OFF]], %[[OFF0]] : i64
   // CHECK: %[[BASE_OFF:.*]] = llvm.mlir.constant(2 : index)  : i64
-  // CHECK: %[[OFF2:.*]] = llvm.add %[[OFF0]], %[[BASE_OFF]] : i64
+  // CHECK: %[[OFF2:.*]] = llvm.add %[[OFF1]], %[[BASE_OFF]] : i64
   // CHECK: %[[TMP:.*]] = builtin.unrealized_conversion_cast %[[OFF2]] : i64 to index
   // CHECK: %[[OFF2:.*]] = builtin.unrealized_conversion_cast %[[TMP]] : index to i64
   // CHECK: %[[DESC:.*]] = llvm.mlir.poison : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
@@ -270,12 +284,16 @@ func.func @subview_leading_operands(%0 : memref<5x3xf32>, %1: memref<5x?xf32>) -
   // CHECK: %[[BASE:.*]] = llvm.extractvalue %[[MEMREF]][0] : !llvm.struct<(ptr, ptr, i64
   // Aligned ptr
   // CHECK: %[[BASE_ALIGNED:.*]] = llvm.extractvalue %[[MEMREF]][1] : !llvm.struct<(ptr, ptr, i64
+  // CHECK: %[[SRC_OFF:.*]] = llvm.extractvalue %[[MEMREF]][2]
+  // CHECK: %[[CST_ADD:.*]] = llvm.mlir.constant(6 : index) : i64
+  // CHECK: %[[ADD:.*]] = llvm.add %[[SRC_OFF]], %[[CST_ADD]] : i64
+  // CHECK: %[[TMP:.*]] = builtin.unrealized_conversion_cast %[[ADD]] : i64 to index
+  // CHECK: %[[NEW_OFF:.*]] = builtin.unrealized_conversion_cast %[[TMP]] : index to i64
   // CHECK: %[[DESC:.*]] = llvm.mlir.poison : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
   // CHECK: %[[DESC0:.*]] = llvm.insertvalue %[[BASE]], %[[DESC]][0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
   // CHECK: %[[DESC1:.*]] = llvm.insertvalue %[[BASE_ALIGNED]], %[[DESC0]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
   // Offset
-  // CHECK: %[[CST_OFF:.*]] = llvm.mlir.constant(6 : index) : i64
-  // CHECK: %[[DESC2:.*]] = llvm.insertvalue %[[CST_OFF]], %[[DESC1]][2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+  // CHECK: %[[DESC2:.*]] = llvm.insertvalue %[[NEW_OFF]], %[[DESC1]][2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
   // Sizes and strides @rank 0: both static extracted from type.
   // CHECK: %[[C3:.*]] = llvm.mlir.constant(3 : index) : i64
   // CHECK: %[[DESC3:.*]] = llvm.insertvalue %[[C3]], %[[DESC2]][3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
@@ -298,11 +316,13 @@ func.func @subview_leading_operands_dynamic(%0 : memref<5x?xf32>) -> memref<3x?x
   // CHECK: %[[SIZE1:.*]] = llvm.extractvalue %[[MEMREF]][3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
   // CHECK: %[[BASE:.*]] = llvm.extractvalue %[[MEMREF]][0] : !llvm.struct<(ptr, ptr, i64
   // CHECK: %[[BASE_ALIGNED:.*]] = llvm.extractvalue %[[MEMREF]][1] : !llvm.struct<(ptr, ptr, i64
+  // CHECK: %[[SRC_OFF:.*]] = llvm.extractvalue %[[MEMREF]][2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
   // Extract strides
   // CHECK: %[[STRIDE0:.*]] = llvm.extractvalue %[[MEMREF]][4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
   // Compute and insert offset from 2 + dynamic value.
   // CHECK: %[[CST_OFF0:.*]] = llvm.mlir.constant(2 : index) : i64
-  // CHECK: %[[OFF0:.*]] = llvm.mul %[[STRIDE0]], %[[CST_OFF0]] overflow<nsw> : i64
+  // CHECK: %[[MUL:.*]] = llvm.mul %[[STRIDE0]], %[[CST_OFF0]] overflow<nsw> : i64
+  // CHECK: %[[OFF0:.*]] = llvm.add %[[SRC_OFF]], %[[MUL]] : i64
   // CHECK: %[[TMP:.*]] = builtin.unrealized_conversion_cast %[[OFF0]] : i64 to index
   // CHECK: %[[OFF0:.*]] = builtin.unrealized_conversion_cast %[[TMP]] : index to i64
   // CHECK: %[[DESC:.*]] = llvm.mlir.poison : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
@@ -334,13 +354,17 @@ func.func @subview_rank_reducing_leading_operands(%0 : memref<5x3xf32>) -> memre
   // CHECK: %[[MEMREF:.*]] = builtin.unrealized_conversion_cast %[[MEM]]
   // CHECK: %[[BASE:.*]] = llvm.extractvalue %[[MEMREF]][0] : !llvm.struct<(ptr, ptr, i64
   // CHECK: %[[BASE_ALIGNED:.*]] = llvm.extractvalue %[[MEMREF]][1] : !llvm.struct<(ptr, ptr, i64
+  // CHECK: %[[SRC_OFF:.*]] = llvm.extractvalue %[[MEMREF]][2]
+  // CHECK: %[[C3:.*]] = llvm.mlir.constant(3 : index) : i64
+  // CHECK: %[[ADD:.*]] = llvm.add %[[SRC_OFF]], %[[C3]] : i64
+  // CHECK: %[[TMP:.*]] = builtin.unrealized_conversion_cast %[[ADD]] : i64 to index
+  // CHECK: %[[NEW_OFF:.*]] = builtin.unrealized_conversion_cast %[[TMP]] : index to i64
   // CHECK: %[[DESC:.*]] = llvm.mlir.poison : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
   // Alloc ptr
   // CHECK: %[[DESC0:.*]] = llvm.insertvalue %[[BASE]], %[[DESC]][0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
   // Aligned ptr
   // CHECK: %[[DESC1:.*]] = llvm.insertvalue %[[BASE_ALIGNED]], %[[DESC0]][1] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
-  // CHECK: %[[C3:.*]] = llvm.mlir.constant(3 : index) : i64
-  // CHECK: %[[DESC2:.*]] = llvm.insertvalue %[[C3]], %[[DESC1]][2] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
+  // CHECK: %[[DESC2:.*]] = llvm.insertvalue %[[NEW_OFF]], %[[DESC1]][2] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
   // Sizes and strides @rank 0: both static.
   // CHECK: %[[DESC3:.*]] = llvm.insertvalue %[[C3]], %[[DESC2]][3, 0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
   // CHECK: %[[CST_STRIDE0:.*]] = llvm.mlir.constant(1 : index) : i64
@@ -359,11 +383,15 @@ func.func @subview_negative_stride(%arg0 : memref<7xf32>) -> memref<7xf32, strid
   // CHECK: %[[MEMREF:.*]] = builtin.unrealized_conversion_cast %[[MEM]]
   // CHECK: %[[BASE:.*]] = llvm.extractvalue %[[MEMREF]][0] : !llvm.struct<(ptr, ptr, i64
   // CHECK: %[[BASE_ALIGNED:.*]] = llvm.extractvalue %[[MEMREF]][1] : !llvm.struct<(ptr, ptr, i64
+  // CHECK: %[[SRC_OFF:.*]] = llvm.extractvalue %[[MEMREF]][2]
+  // CHECK: %[[CST_OFF0:.*]] = llvm.mlir.constant(6 : index) : i64
+  // CHECK: %[[ADD:.*]] = llvm.add %[[SRC_OFF]], %[[CST_OFF0]] : i64
+  // CHECK: %[[TMP:.*]] = builtin.unrealized_conversion_cast %[[ADD]] : i64 to index
+  // CHECK: %[[NEW_OFF:.*]] = builtin.unrealized_conversion_cast %[[TMP]] : index to i64
   // CHECK: %[[DESC:.*]] = llvm.mlir.poison : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
   // CHECK: %[[DESC0:.*]] = llvm.insertvalue %[[BASE]], %[[DESC]][0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
   // CHECK: %[[DESC1:.*]] = llvm.insertvalue %[[BASE_ALIGNED]], %[[DESC0]][1] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
-  // CHECK: %[[CST_OFF0:.*]] = llvm.mlir.constant(6 : index) : i64
-  // CHECK: %[[DESC2:.*]] = llvm.insertvalue %[[CST_OFF0]], %[[DESC1]][2] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
+  // CHECK: %[[DESC2:.*]] = llvm.insertvalue %[[NEW_OFF]], %[[DESC1]][2] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
   // CHECK: %[[CST_SIZE0:.*]] = llvm.mlir.constant(7 : index) : i64
   // CHECK: %[[DESC3:.*]] = llvm.insertvalue %[[CST_SIZE0]], %[[DESC2]][3, 0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
   // CHECK: %[[CST_STRIDE0:.*]] = llvm.mlir.constant(-1 : index) : i64
@@ -387,11 +415,11 @@ func.func @collapse_shape_static(%arg0: memref<1x3x4x1x5xf32>) -> memref<3x4x5xf
 // CHECK:           %[[MEM:.*]] = builtin.unrealized_conversion_cast %[[ARG]] : memref<1x3x4x1x5xf32> to !llvm.struct<(ptr, ptr, i64, array<5 x i64>, array<5 x i64>)>
 // CHECK:           %[[BASE_BUFFER:.*]] = llvm.extractvalue %[[MEM]][0] : !llvm.struct<(ptr, ptr, i64, array<5 x i64>, array<5 x i64>)>
 // CHECK:           %[[ALIGNED_BUFFER:.*]] = llvm.extractvalue %[[MEM]][1] : !llvm.struct<(ptr, ptr, i64, array<5 x i64>, array<5 x i64>)>
-// CHECK:           %[[C0:.*]] = llvm.mlir.constant(0 : index) : i64
+// CHECK:           %[[SRC_OFF:.*]] = llvm.extractvalue %[[MEM]][2] : !llvm.struct<(ptr, ptr, i64, array<5 x i64>, array<5 x i64>)>
 // CHECK:           %[[DESC:.*]] = llvm.mlir.poison : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
 // CHECK:           %[[DESC0:.*]] = llvm.insertvalue %[[BASE_BUFFER]], %[[DESC]][0] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
 // CHECK:           %[[DESC1:.*]] = llvm.insertvalue %[[ALIGNED_BUFFER]], %[[DESC0]][1] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
-// CHECK:           %[[DESC2:.*]] = llvm.insertvalue %[[C0]], %[[DESC1]][2] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
+// CHECK:           %[[DESC2:.*]] = llvm.insertvalue %[[SRC_OFF]], %[[DESC1]][2] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
 // CHECK:           %[[C3:.*]] = llvm.mlir.constant(3 : index) : i64
 // CHECK:           %[[DESC3:.*]] = llvm.insertvalue %[[C3]], %[[DESC2]][3, 0] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
 // CHECK:           %[[C20:.*]] = llvm.mlir.constant(20 : index) : i64
@@ -422,7 +450,7 @@ func.func @collapse_shape_dynamic_with_non_identity_layout(
 // CHECK:           %[[MEM:.*]] = builtin.unrealized_conversion_cast %[[ARG]] : memref<4x?x?xf32, strided<[?, 4, 1]>> to !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
 // CHECK:           %[[BASE_BUFFER:.*]] = llvm.extractvalue %[[MEM]][0] : !llvm.struct<(ptr, ptr, i64,
 // CHECK:           %[[ALIGNED_BUFFER:.*]] = llvm.extractvalue %[[MEM]][1] : !llvm.struct<(ptr, ptr, i64,
-// CHECK:           %[[OFFSET:.*]] = llvm.mlir.constant(0 : index) : i64
+// CHECK:           %[[OFFSET:.*]] = llvm.extractvalue %[[MEM]][2] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
 // CHECK:           %[[SIZE1:.*]] = llvm.extractvalue %[[MEM]][3, 1] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
 // CHECK:           %[[SIZE2:.*]] = llvm.extractvalue %[[MEM]][3, 2] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
 // CHECK:           %[[STRIDE0:.*]] = llvm.extractvalue %[[MEM]][4, 0] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
@@ -447,7 +475,7 @@ func.func @collapse_shape_dynamic_with_non_identity_layout(
 // CHECK32:           %[[MEM:.*]] = builtin.unrealized_conversion_cast %[[ARG]] : memref<4x?x?xf32, strided<[?, 4, 1]>> to !llvm.struct<(ptr, ptr, i32, array<3 x i32>, array<3 x i32>)>
 // CHECK32:           %[[BASE_BUFFER:.*]] = llvm.extractvalue %[[MEM]][0] : !llvm.struct<(ptr, ptr, i32,
 // CHECK32:           %[[ALIGNED_BUFFER:.*]] = llvm.extractvalue %[[MEM]][1] : !llvm.struct<(ptr, ptr, i32,
-// CHECK32:           %[[OFFSET:.*]] = llvm.mlir.constant(0 : index) : i32
+// CHECK32:           %[[OFFSET:.*]] = llvm.extractvalue %[[MEM]][2] : !llvm.struct<(ptr, ptr, i32, array<3 x i32>, array<3 x i32>)>
 // CHECK32:           %[[SIZE1:.*]] = llvm.extractvalue %[[MEM]][3, 1] : !llvm.struct<(ptr, ptr, i32, array<3 x i32>, array<3 x i32>)>
 // CHECK32:           %[[SIZE2:.*]] = llvm.extractvalue %[[MEM]][3, 2] : !llvm.struct<(ptr, ptr, i32, array<3 x i32>, array<3 x i32>)>
 // CHECK32:           %[[STRIDE0:.*]] = llvm.extractvalue %[[MEM]][4, 0] : !llvm.struct<(ptr, ptr, i32, array<3 x i32>, array<3 x i32>)>
@@ -482,11 +510,11 @@ func.func @expand_shape_static(%arg0: memref<3x4x5xf32>) -> memref<1x3x4x1x5xf32
 // CHECK:           %[[MEM:.*]] = builtin.unrealized_conversion_cast %[[ARG]] : memref<3x4x5xf32> to !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
 // CHECK:           %[[BASE_BUFFER:.*]] = llvm.extractvalue %[[MEM]][0] : !llvm.struct<(ptr, ptr, i64,
 // CHECK:           %[[ALIGNED_BUFFER:.*]] = llvm.extractvalue %[[MEM]][1] : !llvm.struct<(ptr, ptr, i64,
-// CHECK:           %[[C0:.*]] = llvm.mlir.constant(0 : index) : i64
+// CHECK:           %[[SRC_OFF:.*]] = llvm.extractvalue %[[MEM]][2] : !llvm.struct<(ptr, ptr, i64,
 // CHECK:           %[[DESC:.*]] = llvm.mlir.poison : !llvm.struct<(ptr, ptr, i64, array<5 x i64>, array<5 x i64>)>
 // CHECK:           %[[DESC0:.*]] = llvm.insertvalue %[[BASE_BUFFER]], %[[DESC]][0] : !llvm.struct<(ptr, ptr, i64, array<5 x i64>, array<5 x i64>)>
 // CHECK:           %[[DESC1:.*]] = llvm.insertvalue %[[ALIGNED_BUFFER]], %[[DESC0]][1] : !llvm.struct<(ptr, ptr, i64, array<5 x i64>, array<5 x i64>)>
-// CHECK:           %[[DESC2:.*]] = llvm.insertvalue %[[C0]], %[[DESC1]][2] : !llvm.struct<(ptr, ptr, i64, array<5 x i64>, array<5 x i64>)>
+// CHECK:           %[[DESC2:.*]] = llvm.insertvalue %[[SRC_OFF]], %[[DESC1]][2] : !llvm.struct<(ptr, ptr, i64, array<5 x i64>, array<5 x i64>)>
 // CHECK:           %[[C1:.*]] = llvm.mlir.constant(1 : index) : i64
 // CHECK:           %[[DESC3:.*]] = llvm.insertvalue %[[C1]], %[[DESC2]][3, 0] : !llvm.struct<(ptr, ptr, i64, array<5 x i64>, array<5 x i64>)>
 // CHECK:           %[[C60:.*]] = llvm.mlir.constant(60 : index) : i64
@@ -521,8 +549,8 @@ func.func @collapse_shape_fold_zero_dim(%arg0 : memref<1x1xf32>) -> memref<f32>
 // CHECK:           %[[DESC:.*]] = llvm.mlir.poison : !llvm.struct<(ptr, ptr, i64)>
 // CHECK:           %[[DESC0:.*]] = llvm.insertvalue %[[BASE_BUFFER]], %[[DESC]][0] : !llvm.struct<(ptr, ptr, i64)>
 // CHECK:           %[[DESC1:.*]] = llvm.insertvalue %[[ALIGNED_BUFFER]], %[[DESC0]][1] : !llvm.struct<(ptr, ptr, i64)>
-// CHECK:           %[[C0:.*]] = llvm.mlir.constant(0 : index) : i64
-// CHECK:           %[[DESC2:.*]] = llvm.insertvalue %[[C0]], %[[DESC1]][2] : !llvm.struct<(ptr, ptr, i64)>
+// CHECK:           %[[SRC_OFF:.*]] = llvm.extractvalue %[[MEM]][2] : !llvm.struct<(ptr, ptr, i64,
+// CHECK:           %[[DESC2:.*]] = llvm.insertvalue %[[SRC_OFF]], %[[DESC1]][2] : !llvm.struct<(ptr, ptr, i64)>
 // CHECK:           %[[RES:.*]] = builtin.unrealized_conversion_cast %[[DESC2]] : !llvm.struct<(ptr, ptr, i64)> to memref<f32>
 // CHECK:           return %[[RES]] : memref<f32>
 // CHECK:         }
@@ -539,11 +567,11 @@ func.func @expand_shape_zero_dim(%arg0 : memref<f32>) -> memref<1x1xf32> {
 // CHECK:           %[[MEM:.*]] = builtin.unrealized_conversion_cast %[[ARG]] : memref<f32> to !llvm.struct<(ptr, ptr, i64)>
 // CHECK:           %[[BASE_BUFFER:.*]] = llvm.extractvalue %[[MEM]][0] : !llvm.struct<(ptr, ptr, i64)>
 // CHECK:           %[[ALIGNED_BUFFER:.*]] = llvm.extractvalue %[[MEM]][1] : !llvm.struct<(ptr, ptr, i64)>
-// CHECK:           %[[C0:.*]] = llvm.mlir.constant(0 : index) : i64
+// CHECK:           %[[SRC_OFF:.*]] = llvm.extractvalue %[[MEM]][2] : !llvm.struct<(ptr, ptr, i64)>
 // CHECK:           %[[DESC:.*]] = llvm.mlir.poison : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:           %[[DESC0:.*]] = llvm.insertvalue %[[BASE_BUFFER]], %[[DESC]][0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:           %[[DESC1:.*]] = llvm.insertvalue %[[ALIGNED_BUFFER]], %[[DESC0]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-// CHECK:           %[[DESC2:.*]] = llvm.insertvalue %[[C0]], %[[DESC1]][2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+// CHECK:           %[[DESC2:.*]] = llvm.insertvalue %[[SRC_OFF]], %[[DESC1]][2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:           %[[C1:.*]] = llvm.mlir.constant(1 : index) : i64
 // CHECK:           %[[DESC3:.*]] = llvm.insertvalue %[[C1]], %[[DESC2]][3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:           %[[DESC4:.*]] = llvm.insertvalue %[[C1]], %[[DESC3]][4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
@@ -565,7 +593,7 @@ func.func @collapse_shape_dynamic(%arg0 : memref<1x2x?xf32>) -> memref<1x?xf32>
 // CHECK:           %[[MEM:.*]] = builtin.unrealized_conversion_cast %[[ARG]] : memref<1x2x?xf32> to !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
 // CHECK:           %[[BASE_BUFFER:.*]] = llvm.extractvalue %[[MEM]][0] : !llvm.struct<(ptr, ptr, i64,
 // CHECK:           %[[ALIGNED_BUFFER:.*]] = llvm.extractvalue %[[MEM]][1] : !llvm.struct<(ptr, ptr, i64,
-// CHECK:           %[[C0:.*]] = llvm.mlir.constant(0 : index) : i64
+// CHECK:           %[[SRC_OFF:.*]] = llvm.extractvalue %[[MEM]][2] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
 // CHECK:           %[[SIZE2:.*]] = llvm.extractvalue %[[MEM]][3, 2] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
 // CHECK:           %[[STRIDE0:.*]] = llvm.extractvalue %[[MEM]][4, 0] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
 // CHECK:           %[[C2:.*]] = llvm.mlir.constant(2 : index) : i64
@@ -575,7 +603,7 @@ func.func @collapse_shape_dynamic(%arg0 : memref<1x2x?xf32>) -> memref<1x?xf32>
 // CHECK:           %[[DESC:.*]] = llvm.mlir.poison : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:           %[[DESC0:.*]] = llvm.insertvalue %[[BASE_BUFFER]], %[[DESC]][0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:           %[[DESC1:.*]] = llvm.insertvalue %[[ALIGNED_BUFFER]], %[[DESC0]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-// CHECK:           %[[DESC2:.*]] = llvm.insertvalue %[[C0]], %[[DESC1]][2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+// CHECK:           %[[DESC2:.*]] = llvm.insertvalue %[[SRC_OFF]], %[[DESC1]][2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:           %[[C1:.*]] = llvm.mlir.constant(1 : index) : i64
 // CHECK:           %[[DESC3:.*]] = llvm.insertvalue %[[C1]], %[[DESC2]][3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:           %[[DESC4:.*]] = llvm.insertvalue %[[STRIDE0]], %[[DESC3]][4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
@@ -602,12 +630,12 @@ func.func @expand_shape_dynamic(%arg0 : memref<1x?xf32>, %sz0: index) -> memref<
 // CHECK:           %[[MLIR_0:.*]] = llvm.mlir.poison : !llvm.struct<(ptr, ptr, i64)>
 // CHECK:           %[[INSERTVALUE_0:.*]] = llvm.insertvalue %[[EXTRACTVALUE_0]], %[[MLIR_0]][0] : !llvm.struct<(ptr, ptr, i64)>
 // CHECK:           %[[INSERTVALUE_1:.*]] = llvm.insertvalue %[[EXTRACTVALUE_1]], %[[INSERTVALUE_0]][1] : !llvm.struct<(ptr, ptr, i64)>
-// CHECK:           %[[MLIR_1:.*]] = llvm.mlir.constant(0 : index) : i64
+// CHECK:           %[[SRC_OFF:.*]] = llvm.extractvalue %[[UNREALIZED_CONVERSION_CAST_1]][2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:           %[[EXTRACTVALUE_2:.*]] = llvm.extractvalue %[[UNREALIZED_CONVERSION_CAST_1]][4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:           %[[MLIR_2:.*]] = llvm.mlir.poison : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
 // CHECK:           %[[INSERTVALUE_2:.*]] = llvm.insertvalue %[[EXTRACTVALUE_0]], %[[MLIR_2]][0] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
 // CHECK:           %[[INSERTVALUE_3:.*]] = llvm.insertvalue %[[EXTRACTVALUE_1]], %[[INSERTVALUE_2]][1] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
-// CHECK:           %[[INSERTVALUE_4:.*]] = llvm.insertvalue %[[MLIR_1]], %[[INSERTVALUE_3]][2] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
+// CHECK:           %[[INSERTVALUE_4:.*]] = llvm.insertvalue %[[SRC_OFF]], %[[INSERTVALUE_3]][2] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
 // CHECK:           %[[MLIR_3:.*]] = llvm.mlir.constant(1 : index) : i64
 // CHECK:           %[[INSERTVALUE_5:.*]] = llvm.insertvalue %[[MLIR_3]], %[[INSERTVALUE_4]][3, 0] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
 // CHECK:           %[[INSERTVALUE_6:.*]] = llvm.insertvalue %[[EXTRACTVALUE_2]], %[[INSERTVALUE_5]][4, 0] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
@@ -640,7 +668,7 @@ func.func @expand_shape_dynamic_with_non_identity_layout(
 // CHECK:           %[[MLIR_0:.*]] = llvm.mlir.poison : !llvm.struct<(ptr, ptr, i64)>
 // CHECK:           %[[INSERTVALUE_0:.*]] = llvm.insertvalue %[[EXTRACTVALUE_0]], %[[MLIR_0]][0] : !llvm.struct<(ptr, ptr, i64)>
 // CHECK:           %[[INSERTVALUE_1:.*]] = llvm.insertvalue %[[EXTRACTVALUE_1]], %[[INSERTVALUE_0]][1] : !llvm.struct<(ptr, ptr, i64)>
-// CHECK:           %[[MLIR_1:.*]] = llvm.mlir.constant(0 : index) : i64
+// CHECK:           %[[SRC_OFF:.*]] = llvm.extractvalue %[[UNREALIZED_CONVERSION_CAST_1]][2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:           %[[EXTRACTVALUE_3:.*]] = llvm.extractvalue %[[UNREALIZED_CONVERSION_CAST_1]][4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:           %[[EXTRACTVALUE_4:.*]] = llvm.extractvalue %[[UNREALIZED_CONVERSION_CAST_1]][4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:           %[[MUL_0:.*]] = llvm.mul %[[EXTRACTVALUE_4]], %[[UNREALIZED_CONVERSION_CAST_0]] overflow<nsw> : i64
@@ -649,7 +677,7 @@ func.func @expand_shape_dynamic_with_non_identity_layout(
 // CHECK:           %[[MLIR_2:.*]] = llvm.mlir.poison : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
 // CHECK:           %[[INSERTVALUE_2:.*]] = llvm.insertvalue %[[EXTRACTVALUE_0]], %[[MLIR_2]][0] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
 // CHECK:           %[[INSERTVALUE_3:.*]] = llvm.insertvalue %[[EXTRACTVALUE_1]], %[[INSERTVALUE_2]][1] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
-// CHECK:           %[[INSERTVALUE_4:.*]] = llvm.insertvalue %[[MLIR_1]], %[[INSERTVALUE_3]][2] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
+// CHECK:           %[[INSERTVALUE_4:.*]] = llvm.insertvalue %[[SRC_OFF]], %[[INSERTVALUE_3]][2] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
 // CHECK:           %[[MLIR_3:.*]] = llvm.mlir.constant(1 : index) : i64
 // CHECK:           %[[INSERTVALUE_5:.*]] = llvm.insertvalue %[[MLIR_3]], %[[INSERTVALUE_4]][3, 0] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
 // CHECK:           %[[INSERTVALUE_6:.*]] = llvm.insertvalue %[[EXTRACTVALUE_3]], %[[INSERTVALUE_5]][4, 0] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
@@ -682,8 +710,10 @@ func.func @collapse_static_shape_with_non_identity_layout(%arg: memref<1x1x8x8xf
 // CHECK-SAME: %[[ARG0:.*]]: memref<?x?xf32, strided<[?, ?]>>,
 // CHECK: %[[DESC:.*]] = builtin.unrealized_conversion_cast %[[ARG0]] : memref<?x?xf32, strided<[?, ?]>> to !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK: %[[ALIGNED_PTR:.*]] = llvm.extractvalue %[[DESC]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-// CHECK: llvm.intr.assume %{{.*}} ["align"(%[[ALIGNED_PTR]], %{{.*}} : !llvm.ptr, i64)] : i1
-// CHECK: %[[LD_ADDR:.*]] = llvm.getelementptr inbounds|nuw %[[ALIGNED_PTR]][%{{.*}}] : (!llvm.ptr, i64) -> !llvm.ptr, f32
+// CHECK: %[[SRC_OFF:.*]] = llvm.extractvalue %[[DESC]][2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+// CHECK: %[[BASE_PTR:.*]] = llvm.getelementptr %[[ALIGNED_PTR]][%[[SRC_OFF]]] : (!llvm.ptr, i64) -> !llvm.ptr, f32
+// CHECK: llvm.intr.assume %{{.*}} ["align"(%[[BASE_PTR]], %{{.*}} : !llvm.ptr, i64)] : i1
+// CHECK: %[[LD_ADDR:.*]] = llvm.getelementptr inbounds|nuw %[[BASE_PTR]][%{{.*}}] : (!llvm.ptr, i64) -> !llvm.ptr, f32
 // CHECK: %[[VAL:.*]] = llvm.load %[[LD_ADDR]] : !llvm.ptr -> f32
 // CHECK: return %[[VAL]] : f32
 func.func @load_and_assume(

>From 1fd40fee40871fb9bc01125a47d2edc13a3bed49 Mon Sep 17 00:00:00 2001
From: Ivan Butygin <ivan.butygin at gmail.com>
Date: Fri, 17 Apr 2026 11:09:45 +0200
Subject: [PATCH 22/27] [WIP][mlir] step 2 follow-ups: broader CHECK fixes for
 bufferPtr change

Updated CHECK lines in additional tests affected by always emitting the
runtime offset GEP in MemRefDescriptor::bufferPtr:
- AMDGPUToROCDL, FuncToLLVM, GPUCommon, GPUToNVVM, NVGPUToNVVM, VectorToLLVM,
  LLVM e2e tests
- python/dialects/memref.py: drop dynamic-offset alloc test (feature gone),
  skip offset assertion when layout has no strides attribute
- RuntimeOpVerification: remove offset-mismatch check since offset is no
  longer on the memref type; keep stride checks
- cast-runtime-verification.mlir: drop the corresponding offset-mismatch
  expected error
- StridedMetadataRangeAnalysis test: update constant offset value

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
---
 .../Transforms/RuntimeOpVerification.cpp      |  15 +--
 .../test-strided-metadata-range-analysis.mlir |   2 +-
 .../ArmSMEToLLVM/arm-sme-to-llvm.mlir         |   8 +-
 .../ArmSMEToLLVM/tile-spills-and-fills.mlir   |  16 ++-
 .../FuncToLLVM/calling-convention.mlir        |  12 +-
 .../Conversion/GPUCommon/transfer_write.mlir  |   8 +-
 .../GPUToNVVM/wmma-ops-to-nvvm.mlir           |  24 +++-
 .../Conversion/NVGPUToNVVM/nvgpu-to-nvvm.mlir | 104 +++++++++++++-----
 .../VectorToLLVM/vector-scalable-memcpy.mlir  |   8 +-
 .../vector-to-llvm-interface.mlir             |  12 +-
 .../VectorToLLVM/vector-xfer-to-llvm.mlir     |  12 ++
 .../lower-to-llvm-e2e-with-target-tag.mlir    |   3 +-
 ...lvm-e2e-with-top-level-named-sequence.mlir |   3 +-
 .../MemRef/cast-runtime-verification.mlir     |   5 -
 mlir/test/python/dialects/memref.py           |  16 ++-
 15 files changed, 170 insertions(+), 78 deletions(-)

diff --git a/mlir/lib/Dialect/MemRef/Transforms/RuntimeOpVerification.cpp b/mlir/lib/Dialect/MemRef/Transforms/RuntimeOpVerification.cpp
index 3ebb8f0a35bc4..1ca297c7055b7 100644
--- a/mlir/lib/Dialect/MemRef/Transforms/RuntimeOpVerification.cpp
+++ b/mlir/lib/Dialect/MemRef/Transforms/RuntimeOpVerification.cpp
@@ -123,23 +123,12 @@ struct CastOpInterface
                                        std::to_string(it.index())));
     }
 
-    // Get result offset and strides.
+    // Get result strides. Offset is no longer carried by the memref type.
     int64_t resultOffset;
     SmallVector<int64_t> resultStrides;
     if (failed(resultType.getStridesAndOffset(resultStrides, resultOffset)))
       return;
-
-    // Check offset.
-    if (resultOffset != ShapedType::kDynamic) {
-      // Static/dynamic offset -> dynamic offset does not need verification.
-      Value srcOffset = metadataOp.getResult(1);
-      Value resultOffsetVal =
-          arith::ConstantIndexOp::create(builder, loc, resultOffset);
-      Value isSameOffset = arith::CmpIOp::create(
-          builder, loc, arith::CmpIPredicate::eq, srcOffset, resultOffsetVal);
-      cf::AssertOp::create(builder, loc, isSameOffset,
-                           generateErrorMessage(op, "offset mismatch"));
-    }
+    (void)resultOffset;
 
     // Check strides.
     for (const auto &it : llvm::enumerate(resultStrides)) {
diff --git a/mlir/test/Analysis/DataFlow/test-strided-metadata-range-analysis.mlir b/mlir/test/Analysis/DataFlow/test-strided-metadata-range-analysis.mlir
index dcce78e9173e6..ae7ca3a0da50e 100644
--- a/mlir/test/Analysis/DataFlow/test-strided-metadata-range-analysis.mlir
+++ b/mlir/test/Analysis/DataFlow/test-strided-metadata-range-analysis.mlir
@@ -50,7 +50,7 @@ func.func @memref_subview(%arg0: memref<8x16x4xf32, strided<[64, 4, 1]>>, %arg1:
   // Test a subview with mixed bounded and unbound dynamic sizes.
   // CHECK: Op:  %[[SV5:.*]] = memref.subview
   // CHECK-NEXT: result[0]: strided_metadata<
-  // CHECK-SAME: offset = [{unsigned : [32, 32] signed : [32, 32]}]
+  // CHECK-SAME: offset = [{unsigned : [16, 16] signed : [16, 16]}]
   // CHECK-SAME: sizes = [{unsigned : [11, 13] signed : [11, 13]}, {unsigned : [5, 7] signed : [5, 7]}, {unsigned : [0, 18446744073709551615] signed : [-9223372036854775808, 9223372036854775807]}]
   // CHECK-SAME: strides = [{unsigned : [1, 1] signed : [1, 1]}, {unsigned : [64, 64] signed : [64, 64]}, {unsigned : [8, 8] signed : [8, 8]}]
   %subview_4 = memref.subview %arg2[%c0, %c0, %c2] [%0, %1, %arg5] [%c1, %c1, %c1] : memref<8x16x4xf32, strided<[1, 64, 8]>> to memref<?x?x?xf32, strided<[?, ?, ?]>>
diff --git a/mlir/test/Conversion/ArmSMEToLLVM/arm-sme-to-llvm.mlir b/mlir/test/Conversion/ArmSMEToLLVM/arm-sme-to-llvm.mlir
index fd8910265cd89..ebe623d75d920 100644
--- a/mlir/test/Conversion/ArmSMEToLLVM/arm-sme-to-llvm.mlir
+++ b/mlir/test/Conversion/ArmSMEToLLVM/arm-sme-to-llvm.mlir
@@ -12,7 +12,9 @@
 // CHECK:           %[[C0:.*]] = arith.constant 0 : index
 // CHECK:           %[[MEM_DESC:.*]] = builtin.unrealized_conversion_cast %[[SRC]] : memref<?x?xi8> to !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:           %[[C0_I64:.*]] = builtin.unrealized_conversion_cast %[[C0]] : index to i64
-// CHECK:           %[[ALIGNED_BASE:.*]] = llvm.extractvalue %[[MEM_DESC]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+// CHECK:           %[[ALIGNED_RAW:.*]] = llvm.extractvalue %[[MEM_DESC]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+// CHECK:           %[[DESC_OFF:.*]] = llvm.extractvalue %[[MEM_DESC]][2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+// CHECK:           %[[ALIGNED_BASE:.*]] = llvm.getelementptr %[[ALIGNED_RAW]]{{\[}}%[[DESC_OFF]]] : (!llvm.ptr, i64) -> !llvm.ptr, i8
 // CHECK:           %[[STRIDE:.*]] = llvm.extractvalue %[[MEM_DESC]][4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:           %[[OFFSET:.*]] = llvm.mul %[[C0_I64]], %[[STRIDE]]  : i64
 // CHECK:           %[[GEP:.*]] = llvm.getelementptr %[[ALIGNED_BASE]]{{\[}}%[[OFFSET]]] : (!llvm.ptr, i64) -> !llvm.ptr, i8
@@ -245,7 +247,9 @@ func.func @arm_sme_load_tile_slice_ver_f64(%src : memref<?x?xf64>, %mask : vecto
 // CHECK:           %[[C0:.*]] = arith.constant 0 : index
 // CHECK:           %[[MEM_DESC:.*]] = builtin.unrealized_conversion_cast %[[DEST]] : memref<?x?xi8> to !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:           %[[C0_I64:.*]] = builtin.unrealized_conversion_cast %[[C0]] : index to i64
-// CHECK:           %[[ALIGNED_BASE:.*]] = llvm.extractvalue %[[MEM_DESC]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+// CHECK:           %[[ALIGNED_RAW:.*]] = llvm.extractvalue %[[MEM_DESC]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+// CHECK:           %[[DESC_OFF:.*]] = llvm.extractvalue %[[MEM_DESC]][2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+// CHECK:           %[[ALIGNED_BASE:.*]] = llvm.getelementptr %[[ALIGNED_RAW]]{{\[}}%[[DESC_OFF]]] : (!llvm.ptr, i64) -> !llvm.ptr, i8
 // CHECK:           %[[STRIDE:.*]] = llvm.extractvalue %[[MEM_DESC]][4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:           %[[OFFSET:.*]] = llvm.mul %[[C0_I64]], %[[STRIDE]]  : i64
 // CHECK:           %[[GEP:.*]] = llvm.getelementptr %[[ALIGNED_BASE]]{{\[}}%[[OFFSET]]] : (!llvm.ptr, i64) -> !llvm.ptr, i8
diff --git a/mlir/test/Conversion/ArmSMEToLLVM/tile-spills-and-fills.mlir b/mlir/test/Conversion/ArmSMEToLLVM/tile-spills-and-fills.mlir
index 2a183cb4d056a..517d892e01338 100644
--- a/mlir/test/Conversion/ArmSMEToLLVM/tile-spills-and-fills.mlir
+++ b/mlir/test/Conversion/ArmSMEToLLVM/tile-spills-and-fills.mlir
@@ -105,7 +105,9 @@ func.func @use_too_many_tiles() {
 //      AFTER-LLVM-LOWERING: scf.for
 // AFTER-LLVM-LOWERING-SAME: %[[C0]] to %[[SVL_H]] step %[[C1]] {
 //      AFTER-LLVM-LOWERING:   %[[MEM_DESC:.*]] = builtin.unrealized_conversion_cast %[[TILE_ALLOCA]]
-//      AFTER-LLVM-LOWERING:   %[[BASE_PTR:.*]] = llvm.extractvalue %[[MEM_DESC]][1]
+//      AFTER-LLVM-LOWERING:   %[[BASE_RAW:.*]] = llvm.extractvalue %[[MEM_DESC]][1]
+//      AFTER-LLVM-LOWERING:   %[[DESC_OFF:.*]] = llvm.extractvalue %[[MEM_DESC]][2]
+//      AFTER-LLVM-LOWERING:   %[[BASE_PTR:.*]] = llvm.getelementptr %[[BASE_RAW]][%[[DESC_OFF]]]
 //      AFTER-LLVM-LOWERING:   %[[SLICE_PTR:.*]] = llvm.getelementptr %[[BASE_PTR]]
 //      AFTER-LLVM-LOWERING:   %[[SLICE:.*]] = "arm_sme.intr.read.horiz"{{.*}} <{tile_id = 0 : i32}>
 // AFTER-LLVM-LOWERING-NEXT:   "arm_sme.intr.ld1h.horiz"({{.*}}, %[[SLICE_PTR]], {{.*}}) <{tile_id = 0 : i32}>
@@ -123,7 +125,9 @@ func.func @use_too_many_tiles() {
 //      AFTER-LLVM-LOWERING: scf.for
 // AFTER-LLVM-LOWERING-SAME: %[[C0]] to %[[SVL_H]] step %[[C1]] {
 //      AFTER-LLVM-LOWERING:   %[[MEM_DESC:.*]] = builtin.unrealized_conversion_cast %[[TILE_ALLOCA]]
-//      AFTER-LLVM-LOWERING:   %[[BASE_PTR:.*]] = llvm.extractvalue %[[MEM_DESC]][1]
+//      AFTER-LLVM-LOWERING:   %[[BASE_RAW:.*]] = llvm.extractvalue %[[MEM_DESC]][1]
+//      AFTER-LLVM-LOWERING:   %[[DESC_OFF:.*]] = llvm.extractvalue %[[MEM_DESC]][2]
+//      AFTER-LLVM-LOWERING:   %[[BASE_PTR:.*]] = llvm.getelementptr %[[BASE_RAW]][%[[DESC_OFF]]]
 //      AFTER-LLVM-LOWERING:   %[[SLICE_PTR:.*]] = llvm.getelementptr %[[BASE_PTR]]
 //      AFTER-LLVM-LOWERING:   %[[SLICE:.*]] = "arm_sme.intr.read.horiz"{{.*}} <{tile_id = 0 : i32}>
 // AFTER-LLVM-LOWERING-NEXT:   "arm_sme.intr.ld1h.horiz"({{.*}}, %[[SLICE_PTR]], {{.*}}) <{tile_id = 0 : i32}>
@@ -164,7 +168,9 @@ func.func @very_excessive_spills(%useAllTiles : vector<[16]x[16]xi8>, %memref: m
 //      AFTER-LLVM-LOWERING: scf.for
 // AFTER-LLVM-LOWERING-SAME: %[[C0]] to %[[SVL_S]] step %[[C1]] {
 //      AFTER-LLVM-LOWERING:   %[[MEM_DESC:.*]] = builtin.unrealized_conversion_cast %[[TILE_ALLOCA]]
-//      AFTER-LLVM-LOWERING:   %[[BASE_PTR:.*]] = llvm.extractvalue %[[MEM_DESC]][1]
+//      AFTER-LLVM-LOWERING:   %[[BASE_RAW:.*]] = llvm.extractvalue %[[MEM_DESC]][1]
+//      AFTER-LLVM-LOWERING:   %[[DESC_OFF:.*]] = llvm.extractvalue %[[MEM_DESC]][2]
+//      AFTER-LLVM-LOWERING:   %[[BASE_PTR:.*]] = llvm.getelementptr %[[BASE_RAW]][%[[DESC_OFF]]]
 //      AFTER-LLVM-LOWERING:   %[[SLICE_PTR:.*]] = llvm.getelementptr %[[BASE_PTR]]
 // Read ZA tile slice -> vector
 //      AFTER-LLVM-LOWERING:   %[[SLICE:.*]] = "arm_sme.intr.read.horiz"{{.*}} <{tile_id = 0 : i32}>
@@ -183,7 +189,9 @@ func.func @very_excessive_spills(%useAllTiles : vector<[16]x[16]xi8>, %memref: m
 //      AFTER-LLVM-LOWERING: scf.for
 // AFTER-LLVM-LOWERING-SAME: %[[C0]] to %[[SVL_S]] step %[[C1]] {
 //      AFTER-LLVM-LOWERING:   %[[MEM_DESC:.*]] = builtin.unrealized_conversion_cast %[[TILE_ALLOCA]]
-//      AFTER-LLVM-LOWERING:   %[[BASE_PTR:.*]] = llvm.extractvalue %[[MEM_DESC]][1]
+//      AFTER-LLVM-LOWERING:   %[[BASE_RAW:.*]] = llvm.extractvalue %[[MEM_DESC]][1]
+//      AFTER-LLVM-LOWERING:   %[[DESC_OFF:.*]] = llvm.extractvalue %[[MEM_DESC]][2]
+//      AFTER-LLVM-LOWERING:   %[[BASE_PTR:.*]] = llvm.getelementptr %[[BASE_RAW]][%[[DESC_OFF]]]
 //      AFTER-LLVM-LOWERING:   %[[SLICE_PTR:.*]] = llvm.getelementptr %[[BASE_PTR]]
 /// Read ZA tile slice -> vector
 //      AFTER-LLVM-LOWERING:   %[[SLICE:.*]] = "arm_sme.intr.read.horiz"{{.*}} <{tile_id = 0 : i32}>
diff --git a/mlir/test/Conversion/FuncToLLVM/calling-convention.mlir b/mlir/test/Conversion/FuncToLLVM/calling-convention.mlir
index 3b52d8fd76464..9979ebbae67fb 100644
--- a/mlir/test/Conversion/FuncToLLVM/calling-convention.mlir
+++ b/mlir/test/Conversion/FuncToLLVM/calling-convention.mlir
@@ -265,7 +265,9 @@ func.func @bare_ptr_calling_conv(%arg0: memref<4x3xf32>, %arg1 : index, %arg2 :
   // CHECK: %[[C1:.*]] = llvm.mlir.constant(1 : index) : i64
   // CHECK: %[[INSERT_STRIDE1:.*]] = llvm.insertvalue %[[C1]], %[[INSERT_DIM1]][4, 1]
 
-  // CHECK: %[[ALIGNEDPTR:.*]] = llvm.extractvalue %[[INSERT_STRIDE1]][1]
+  // CHECK: %[[ALIGNEDPTR_RAW:.*]] = llvm.extractvalue %[[INSERT_STRIDE1]][1]
+  // CHECK: %[[DESC_OFF:.*]] = llvm.extractvalue %[[INSERT_STRIDE1]][2]
+  // CHECK: %[[ALIGNEDPTR:.*]] = llvm.getelementptr %[[ALIGNEDPTR_RAW]][%[[DESC_OFF]]]
   // CHECK: %[[STOREPTR:.*]] = llvm.getelementptr inbounds|nuw %[[ALIGNEDPTR]]
   // CHECK: llvm.store %{{.*}}, %[[STOREPTR]]
   memref.store %arg3, %arg0[%arg1, %arg2] : memref<4x3xf32>
@@ -294,12 +296,16 @@ func.func @bare_ptr_calling_conv_multiresult(%arg0: memref<4x3xf32>, %arg1 : ind
   // CHECK: %[[C1:.*]] = llvm.mlir.constant(1 : index) : i64
   // CHECK: %[[INSERT_STRIDE1:.*]] = llvm.insertvalue %[[C1]], %[[INSERT_DIM1]][4, 1]
 
-  // CHECK: %[[ALIGNEDPTR:.*]] = llvm.extractvalue %[[INSERT_STRIDE1]][1]
+  // CHECK: %[[ALIGNEDPTR_RAW:.*]] = llvm.extractvalue %[[INSERT_STRIDE1]][1]
+  // CHECK: %[[DESC_OFF:.*]] = llvm.extractvalue %[[INSERT_STRIDE1]][2]
+  // CHECK: %[[ALIGNEDPTR:.*]] = llvm.getelementptr %[[ALIGNEDPTR_RAW]][%[[DESC_OFF]]]
   // CHECK: %[[STOREPTR:.*]] = llvm.getelementptr inbounds|nuw %[[ALIGNEDPTR]]
   // CHECK: llvm.store %{{.*}}, %[[STOREPTR]]
   memref.store %arg3, %arg0[%arg1, %arg2] : memref<4x3xf32>
 
-  // CHECK: %[[ALIGNEDPTR0:.*]] = llvm.extractvalue %[[INSERT_STRIDE1]][1]
+  // CHECK: %[[ALIGNEDPTR0_RAW:.*]] = llvm.extractvalue %[[INSERT_STRIDE1]][1]
+  // CHECK: %[[DESC_OFF0:.*]] = llvm.extractvalue %[[INSERT_STRIDE1]][2]
+  // CHECK: %[[ALIGNEDPTR0:.*]] = llvm.getelementptr %[[ALIGNEDPTR0_RAW]][%[[DESC_OFF0]]]
   // CHECK: %[[LOADPTR:.*]] = llvm.getelementptr inbounds|nuw %[[ALIGNEDPTR0]]
   // CHECK: %[[RETURN0:.*]] = llvm.load %[[LOADPTR]]
   %0 = memref.load %arg0[%arg1, %arg2] : memref<4x3xf32>
diff --git a/mlir/test/Conversion/GPUCommon/transfer_write.mlir b/mlir/test/Conversion/GPUCommon/transfer_write.mlir
index 4d2ae8c39240c..7311af6e07ed4 100644
--- a/mlir/test/Conversion/GPUCommon/transfer_write.mlir
+++ b/mlir/test/Conversion/GPUCommon/transfer_write.mlir
@@ -2,9 +2,11 @@
 
 // CHECK-LABEL: @warp_extract
 // CHECK-SAME: %[[VEC:[a-zA-Z0-9_]+]]: vector<1xf32>
-// CHECK:%[[BASE:[0-9]+]] = llvm.extractvalue
-// CHECK:%[[PTR:[0-9]+]] = llvm.getelementptr %[[BASE]]
-// CHECK:llvm.store %[[VEC]], %[[PTR]] {alignment = 4 : i64} : vector<1xf32>, !llvm.ptr
+// CHECK: %[[ALIGNED:.*]] = llvm.extractvalue
+// CHECK: %[[OFF:.*]] = llvm.extractvalue
+// CHECK: %[[BASE:.*]] = llvm.getelementptr %[[ALIGNED]]
+// CHECK: %[[PTR:.*]] = llvm.getelementptr %[[BASE]]
+// CHECK: llvm.store %[[VEC]], %[[PTR]] {alignment = 4 : i64} : vector<1xf32>, !llvm.ptr
 
 func.func @warp_extract(%arg0: index, %arg1: memref<1024x1024xf32>, %arg2: vector<1xf32>) {
     %c0 = arith.constant 0 : index
diff --git a/mlir/test/Conversion/GPUToNVVM/wmma-ops-to-nvvm.mlir b/mlir/test/Conversion/GPUToNVVM/wmma-ops-to-nvvm.mlir
index a0801443057ea..2a8b5c2cfd85d 100644
--- a/mlir/test/Conversion/GPUToNVVM/wmma-ops-to-nvvm.mlir
+++ b/mlir/test/Conversion/GPUToNVVM/wmma-ops-to-nvvm.mlir
@@ -14,7 +14,9 @@ gpu.module @test_module {
     %0 = gpu.subgroup_mma_load_matrix %wg[%i, %j] {leadDimension = 32 : index, transpose} : memref<32x32xf16, 3> -> !gpu.mma_matrix<16x16xf16, "AOp">
     // CHECK:  %[[INX:.*]] = llvm.mlir.constant(16 : index) : i64
     // CHECK: %{{.*}} = llvm.insertvalue %{{.*}}, %{{.*}}[{{.*}}, {{.*}}]
-    // CHECK:  %[[BASE:.*]] = llvm.extractvalue %{{.*}}[1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<2 x i64>, array<2 x i64>)>
+    // CHECK:  %[[BASE_RAW:.*]] = llvm.extractvalue %[[DESC:.*]][1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<2 x i64>, array<2 x i64>)>
+    // CHECK:  %[[DESC_OFF:.*]] = llvm.extractvalue %[[DESC]][2] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<2 x i64>, array<2 x i64>)>
+    // CHECK:  %[[BASE:.*]] = llvm.getelementptr %[[BASE_RAW]][%[[DESC_OFF]]]
     // CHECK:  %[[LDM:.*]] = llvm.mlir.constant(32 : index) : i64
     // CHECK:  %[[LI:.*]] = llvm.mul %[[INX]], %[[LDM]]  : i64
     // CHECK:  %[[LIJ:.*]] = llvm.add %[[LI]], %[[INX]]  : i64
@@ -26,7 +28,9 @@ gpu.module @test_module {
 
     // CHECK32:  %[[INX:.*]] = llvm.mlir.constant(16 : index) : i32
     // CHECK32: %{{.*}} = llvm.insertvalue %{{.*}}, %{{.*}}[{{.*}}, {{.*}}]
-    // CHECK32:  %[[BASE:.*]] = llvm.extractvalue %{{.*}}[1] : !llvm.struct<(ptr<3>, ptr<3>, i32, array<2 x i32>, array<2 x i32>)>
+    // CHECK32:  %[[BASE_RAW:.*]] = llvm.extractvalue %[[DESC32:.*]][1] : !llvm.struct<(ptr<3>, ptr<3>, i32, array<2 x i32>, array<2 x i32>)>
+    // CHECK32:  %[[DESC_OFF32:.*]] = llvm.extractvalue %[[DESC32]][2] : !llvm.struct<(ptr<3>, ptr<3>, i32, array<2 x i32>, array<2 x i32>)>
+    // CHECK32:  %[[BASE:.*]] = llvm.getelementptr %[[BASE_RAW]][%[[DESC_OFF32]]]
     // CHECK32:  %[[LDM:.*]] = llvm.mlir.constant(32 : index) : i32
     // CHECK32:  %[[LI:.*]] = llvm.mul %[[INX]], %[[LDM]]  : i32
     // CHECK32:  %[[LIJ:.*]] = llvm.add %[[LI]], %[[INX]]  : i32
@@ -53,7 +57,9 @@ gpu.module @test_module {
     %0 = gpu.subgroup_mma_load_matrix %wg[%i, %j] {leadDimension = 32 : index, transpose} : memref<32x32xi8, 3> -> !gpu.mma_matrix<16x16xsi8, "AOp">
     // CHECK:  %[[INX:.*]] = llvm.mlir.constant(16 : index) : i64
     // CHECK: %{{.*}} = llvm.insertvalue %{{.*}}, %{{.*}}[{{.*}}, {{.*}}]
-    // CHECK:  %[[BASE:.*]] = llvm.extractvalue %{{.*}}[1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<2 x i64>, array<2 x i64>)>
+    // CHECK:  %[[BASE_RAW:.*]] = llvm.extractvalue %[[DESC:.*]][1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<2 x i64>, array<2 x i64>)>
+    // CHECK:  %[[DESC_OFF:.*]] = llvm.extractvalue %[[DESC]][2] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<2 x i64>, array<2 x i64>)>
+    // CHECK:  %[[BASE:.*]] = llvm.getelementptr %[[BASE_RAW]][%[[DESC_OFF]]]
     // CHECK:  %[[LDM:.*]] = llvm.mlir.constant(32 : index) : i64
     // CHECK:  %[[LI:.*]] = llvm.mul %[[INX]], %[[LDM]]  : i64
     // CHECK:  %[[LIJ:.*]] = llvm.add %[[LI]], %[[INX]]  : i64
@@ -65,7 +71,9 @@ gpu.module @test_module {
 
     // CHECK32:  %[[INX:.*]] = llvm.mlir.constant(16 : index) : i32
     // CHECK32: %{{.*}} = llvm.insertvalue %{{.*}}, %{{.*}}[{{.*}}, {{.*}}]
-    // CHECK32:  %[[BASE:.*]] = llvm.extractvalue %{{.*}}[1] : !llvm.struct<(ptr<3>, ptr<3>, i32, array<2 x i32>, array<2 x i32>)>
+    // CHECK32:  %[[BASE_RAW:.*]] = llvm.extractvalue %[[DESC32:.*]][1] : !llvm.struct<(ptr<3>, ptr<3>, i32, array<2 x i32>, array<2 x i32>)>
+    // CHECK32:  %[[DESC_OFF32:.*]] = llvm.extractvalue %[[DESC32]][2] : !llvm.struct<(ptr<3>, ptr<3>, i32, array<2 x i32>, array<2 x i32>)>
+    // CHECK32:  %[[BASE:.*]] = llvm.getelementptr %[[BASE_RAW]][%[[DESC_OFF32]]]
     // CHECK32:  %[[LDM:.*]] = llvm.mlir.constant(32 : index) : i32
     // CHECK32:  %[[LI:.*]] = llvm.mul %[[INX]], %[[LDM]]  : i32
     // CHECK32:  %[[LIJ:.*]] = llvm.add %[[LI]], %[[INX]]  : i32
@@ -122,7 +130,9 @@ gpu.module @test_module {
     // CHECK:  %[[EL2:.*]] = llvm.extractvalue %[[D]][1] : !llvm.struct<(vector<2xf16>, vector<2xf16>, vector<2xf16>, vector<2xf16>)>
     // CHECK:  %[[EL3:.*]] = llvm.extractvalue %[[D]][2] : !llvm.struct<(vector<2xf16>, vector<2xf16>, vector<2xf16>, vector<2xf16>)>
     // CHECK:  %[[EL4:.*]] = llvm.extractvalue %[[D]][3] : !llvm.struct<(vector<2xf16>, vector<2xf16>, vector<2xf16>, vector<2xf16>)>
-    // CHECK:  %[[BASE:.*]] = llvm.extractvalue %[[MEMREF]][1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<2 x i64>, array<2 x i64>)>
+    // CHECK:  %[[BASE_RAW:.*]] = llvm.extractvalue %[[MEMREF]][1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<2 x i64>, array<2 x i64>)>
+    // CHECK:  %[[DESC_OFF:.*]] = llvm.extractvalue %[[MEMREF]][2] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<2 x i64>, array<2 x i64>)>
+    // CHECK:  %[[BASE:.*]] = llvm.getelementptr %[[BASE_RAW]][%[[DESC_OFF]]]
     // CHECK:  %[[LDM:.*]] = llvm.mlir.constant(32 : index) : i64
     // CHECK:  %[[LI:.*]] = llvm.mul %[[INX]], %[[LDM]]   : i64
     // CHECK:  %[[LIJ:.*]] = llvm.add %[[LI]], %[[INX]]  : i64
@@ -141,7 +151,9 @@ gpu.module @test_module {
     // CHECK32:  %[[EL2:.*]] = llvm.extractvalue %[[D]][1] : !llvm.struct<(vector<2xf16>, vector<2xf16>, vector<2xf16>, vector<2xf16>)>
     // CHECK32:  %[[EL3:.*]] = llvm.extractvalue %[[D]][2] : !llvm.struct<(vector<2xf16>, vector<2xf16>, vector<2xf16>, vector<2xf16>)>
     // CHECK32:  %[[EL4:.*]] = llvm.extractvalue %[[D]][3] : !llvm.struct<(vector<2xf16>, vector<2xf16>, vector<2xf16>, vector<2xf16>)>
-    // CHECK32:  %[[BASE:.*]] = llvm.extractvalue %[[MEMREF]][1] : !llvm.struct<(ptr<3>, ptr<3>, i32, array<2 x i32>, array<2 x i32>)>
+    // CHECK32:  %[[BASE_RAW:.*]] = llvm.extractvalue %[[MEMREF]][1] : !llvm.struct<(ptr<3>, ptr<3>, i32, array<2 x i32>, array<2 x i32>)>
+    // CHECK32:  %[[DESC_OFF:.*]] = llvm.extractvalue %[[MEMREF]][2] : !llvm.struct<(ptr<3>, ptr<3>, i32, array<2 x i32>, array<2 x i32>)>
+    // CHECK32:  %[[BASE:.*]] = llvm.getelementptr %[[BASE_RAW]][%[[DESC_OFF]]]
     // CHECK32:  %[[LDM:.*]] = llvm.mlir.constant(32 : index) : i32
     // CHECK32:  %[[LI:.*]] = llvm.mul %[[INX]], %[[LDM]]   : i32
     // CHECK32:  %[[LIJ:.*]] = llvm.add %[[LI]], %[[INX]]  : i32
diff --git a/mlir/test/Conversion/NVGPUToNVVM/nvgpu-to-nvvm.mlir b/mlir/test/Conversion/NVGPUToNVVM/nvgpu-to-nvvm.mlir
index 48b9ad4c3d777..e7c8989df170e 100644
--- a/mlir/test/Conversion/NVGPUToNVVM/nvgpu-to-nvvm.mlir
+++ b/mlir/test/Conversion/NVGPUToNVVM/nvgpu-to-nvvm.mlir
@@ -224,7 +224,9 @@ func.func @m16n8k4_tf32(%arg0: vector<2x1xf32>, %arg1: vector<1x1xf32>, %arg2: v
 func.func @async_cp(
   %src: memref<128x128xf32>, %dst: memref<3x16x128xf32, 3>, %i : index) {
   // CHECK: %[[IDX1:.*]] = builtin.unrealized_conversion_cast %[[IDX]] : index to i64
-  // CHECK-DAG: %[[BASEDST:.*]] = llvm.extractvalue %{{.*}}[1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<3 x i64>, array<3 x i64>)>
+  // CHECK-DAG: %[[BASEDST_RAW:.*]] = llvm.extractvalue %[[DESCDST:.*]][1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<3 x i64>, array<3 x i64>)>
+  // CHECK-DAG: %[[OFFDST:.*]] = llvm.extractvalue %[[DESCDST]][2] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<3 x i64>, array<3 x i64>)>
+  // CHECK-DAG: %[[BASEDST:.*]] = llvm.getelementptr %[[BASEDST_RAW]][%[[OFFDST]]]
   // CHECK-DAG: %[[S0:.*]] = llvm.mlir.constant(2048 : index) : i64
   // CHECK-DAG: %[[LI:.*]] = llvm.mul %[[IDX1]], %[[S0]] : i64
   // CHECK-DAG: %[[S1:.*]] = llvm.mlir.constant(128 : index) : i64
@@ -232,7 +234,9 @@ func.func @async_cp(
   // CHECK-DAG: %[[FI1:.*]] = llvm.add %[[LI]], %[[FI0]] : i64
   // CHECK-DAG: %[[FI2:.*]] = llvm.add %[[FI1]], %[[IDX1]] : i64
   // CHECK-DAG: %[[ADDRESSDST:.*]] = llvm.getelementptr %[[BASEDST]][%[[FI2]]] : (!llvm.ptr<3>, i64) -> !llvm.ptr<3>
-  // CHECK-DAG: %[[BASESRC:.*]] = llvm.extractvalue %{{.*}}[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+  // CHECK-DAG: %[[BASESRC_RAW:.*]] = llvm.extractvalue %[[DESCSRC:.*]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+  // CHECK-DAG: %[[OFFSRC:.*]] = llvm.extractvalue %[[DESCSRC]][2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+  // CHECK-DAG: %[[BASESRC:.*]] = llvm.getelementptr %[[BASESRC_RAW]][%[[OFFSRC]]]
   // CHECK-DAG: %[[S3:.*]] = llvm.mlir.constant(128 : index) : i64
   // CHECK-DAG: %[[FI3:.*]] = llvm.mul %[[IDX1]], %[[S3]]  : i64
   // CHECK-DAG: %[[FI4:.*]] = llvm.add %[[FI3]], %[[IDX1]]  : i64
@@ -255,12 +259,16 @@ func.func @async_cp(
 func.func @async_cp_i4(
   %src: memref<128x64xi4>, %dst: memref<128x128xi4, 3>, %i : index) -> !nvgpu.device.async.token {
   // CHECK: %[[IDX1:.*]] = builtin.unrealized_conversion_cast %[[IDX]] : index to i64
-  // CHECK-DAG: %[[BASEDST:.*]] = llvm.extractvalue %{{.*}}[1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<2 x i64>, array<2 x i64>)>
+  // CHECK-DAG: %[[BASEDST_RAW:.*]] = llvm.extractvalue %[[DESCDST:.*]][1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<2 x i64>, array<2 x i64>)>
+  // CHECK-DAG: %[[OFFDST:.*]] = llvm.extractvalue %[[DESCDST]][2] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<2 x i64>, array<2 x i64>)>
+  // CHECK-DAG: %[[BASEDST:.*]] = llvm.getelementptr %[[BASEDST_RAW]][%[[OFFDST]]]
   // CHECK-DAG: %[[S0:.*]] = llvm.mlir.constant(128 : index) : i64
   // CHECK-DAG: %[[LI:.*]] = llvm.mul %[[IDX1]], %[[S0]] : i64
   // CHECK-DAG: %[[FI1:.*]] = llvm.add %[[LI]], %[[IDX1]] : i64
   // CHECK-DAG: %[[ADDRESSDST:.*]] = llvm.getelementptr %[[BASEDST]][%[[FI1]]] : (!llvm.ptr<3>, i64) -> !llvm.ptr<3>
-  // CHECK-DAG: %[[BASESRC:.*]] = llvm.extractvalue %{{.*}}[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+  // CHECK-DAG: %[[BASESRC_RAW:.*]] = llvm.extractvalue %[[DESCSRC:.*]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+  // CHECK-DAG: %[[OFFSRC:.*]] = llvm.extractvalue %[[DESCSRC]][2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+  // CHECK-DAG: %[[BASESRC:.*]] = llvm.getelementptr %[[BASESRC_RAW]][%[[OFFSRC]]]
   // CHECK-DAG: %[[S2:.*]] = llvm.mlir.constant(64 : index) : i64
   // CHECK-DAG: %[[FI2:.*]] = llvm.mul %[[IDX1]], %[[S2]]  : i64
   // CHECK-DAG: %[[FI3:.*]] = llvm.add %[[FI2]], %[[IDX1]]  : i64
@@ -277,7 +285,9 @@ func.func @async_cp_zfill_f32_align4(
   %src: memref<128x128xf32>, %dst: memref<3x16x128xf32, 3>, %i : index, %srcElements : index) {
   // CHECK-DAG: %[[IDX1:.*]] = builtin.unrealized_conversion_cast %[[IDX]] : index to i64
   // CHECK-DAG: %[[SRC1:.*]] = builtin.unrealized_conversion_cast %[[SRCELEMENTS]] : index to i64
-  // CHECK-DAG: %[[BASEDST:.*]] = llvm.extractvalue %{{.*}}[1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<3 x i64>, array<3 x i64>)>
+  // CHECK-DAG: %[[BASEDST_RAW:.*]] = llvm.extractvalue %[[DESCDST:.*]][1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<3 x i64>, array<3 x i64>)>
+  // CHECK-DAG: %[[OFFDST:.*]] = llvm.extractvalue %[[DESCDST]][2] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<3 x i64>, array<3 x i64>)>
+  // CHECK-DAG: %[[BASEDST:.*]] = llvm.getelementptr %[[BASEDST_RAW]][%[[OFFDST]]]
   // CHECK-DAG: %[[S2048:.*]] = llvm.mlir.constant(2048 : index) : i64
   // CHECK-DAG: %[[LI1:.*]] = llvm.mul %[[IDX1]], %[[S2048]] : i64
   // CHECK-DAG: %[[S0:.*]] = llvm.mlir.constant(128 : index) : i64
@@ -285,7 +295,9 @@ func.func @async_cp_zfill_f32_align4(
   // CHECK-DAG: %[[FI1:.*]] = llvm.add %[[LI1]], %[[LI]] : i64
   // CHECK-DAG: %[[FI2:.*]] = llvm.add %[[FI1]], %[[IDX1]] : i64
   // CHECK-DAG: %[[ADDRESSDST:.*]] = llvm.getelementptr %[[BASEDST]][%[[FI2]]] : (!llvm.ptr<3>, i64) -> !llvm.ptr<3>, f32
-  // CHECK-DAG: %[[BASESRC:.*]] = llvm.extractvalue %{{.*}}[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+  // CHECK-DAG: %[[BASESRC_RAW:.*]] = llvm.extractvalue %[[DESCSRC:.*]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+  // CHECK-DAG: %[[OFFSRC:.*]] = llvm.extractvalue %[[DESCSRC]][2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+  // CHECK-DAG: %[[BASESRC:.*]] = llvm.getelementptr %[[BASESRC_RAW]][%[[OFFSRC]]]
   // CHECK-DAG: %[[S2:.*]] = llvm.mlir.constant(128 : index) : i64
   // CHECK-DAG: %[[FI2:.*]] = llvm.mul %[[IDX1]], %[[S2]]  : i64
   // CHECK-DAG: %[[FI3:.*]] = llvm.add %[[FI2]], %[[IDX1]]  : i64
@@ -312,7 +324,9 @@ func.func @async_cp_zfill_f32_align1(
   %src: memref<128x128xf32>, %dst: memref<3x16x128xf32, 3>, %i : index, %srcElements : index) {
   // CHECK-DAG: %[[IDX1:.*]] = builtin.unrealized_conversion_cast %[[IDX]] : index to i64
   // CHECK-DAG: %[[SRC1:.*]] = builtin.unrealized_conversion_cast %[[SRCELEMENTS]] : index to i64
-  // CHECK-DAG: %[[BASEDST:.*]] = llvm.extractvalue %{{.*}}[1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<3 x i64>, array<3 x i64>)>
+  // CHECK-DAG: %[[BASEDST_RAW:.*]] = llvm.extractvalue %[[DESCDST:.*]][1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<3 x i64>, array<3 x i64>)>
+  // CHECK-DAG: %[[OFFDST:.*]] = llvm.extractvalue %[[DESCDST]][2] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<3 x i64>, array<3 x i64>)>
+  // CHECK-DAG: %[[BASEDST:.*]] = llvm.getelementptr %[[BASEDST_RAW]][%[[OFFDST]]]
   // CHECK-DAG: %[[S2048:.*]] = llvm.mlir.constant(2048 : index) : i64
   // CHECK-DAG: %[[LI1:.*]] = llvm.mul %[[IDX1]], %[[S2048]] : i64
   // CHECK-DAG: %[[S0:.*]] = llvm.mlir.constant(128 : index) : i64
@@ -320,7 +334,9 @@ func.func @async_cp_zfill_f32_align1(
   // CHECK-DAG: %[[FI1:.*]] = llvm.add %[[LI1]], %[[LI]] : i64
   // CHECK-DAG: %[[FI2:.*]] = llvm.add %[[FI1]], %[[IDX1]] : i64
   // CHECK-DAG: %[[ADDRESSDST:.*]] = llvm.getelementptr %[[BASEDST]][%[[FI2]]] : (!llvm.ptr<3>, i64) -> !llvm.ptr<3>, f32
-  // CHECK-DAG: %[[BASESRC:.*]] = llvm.extractvalue %{{.*}}[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+  // CHECK-DAG: %[[BASESRC_RAW:.*]] = llvm.extractvalue %[[DESCSRC:.*]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+  // CHECK-DAG: %[[OFFSRC:.*]] = llvm.extractvalue %[[DESCSRC]][2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+  // CHECK-DAG: %[[BASESRC:.*]] = llvm.getelementptr %[[BASESRC_RAW]][%[[OFFSRC]]]
   // CHECK-DAG: %[[S2:.*]] = llvm.mlir.constant(128 : index) : i64
   // CHECK-DAG: %[[FI2:.*]] = llvm.mul %[[IDX1]], %[[S2]]  : i64
   // CHECK-DAG: %[[FI3:.*]] = llvm.add %[[FI2]], %[[IDX1]]  : i64
@@ -484,17 +500,23 @@ func.func @mbarrier() {
   %barrier = nvgpu.mbarrier.create -> !barrierType
 
   // CHECK: %[[barStr:.+]] =  builtin.unrealized_conversion_cast %[[barMemref]] : memref<1xi64, 3> to !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
-  // CHECK: %[[base:.+]] = llvm.extractvalue %[[barStr]][1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
+  // CHECK: %[[base_raw:.+]] = llvm.extractvalue %[[barStr]][1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
+  // CHECK: %[[bar_off:.+]] = llvm.extractvalue %[[barStr]][2] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
+  // CHECK: %[[base:.+]] = llvm.getelementptr %[[base_raw]][%[[bar_off]]]
   // CHECK: %[[barPtr:.+]] = llvm.getelementptr %[[base]][%[[mid]]] : (!llvm.ptr<3>, i64) -> !llvm.ptr<3>, i64
   // CHECK: nvvm.mbarrier.init %[[barPtr]]
     nvgpu.mbarrier.init %barrier[%c0], %num_threads : !barrierType
 
-  // CHECK: %[[base2:.+]] = llvm.extractvalue %[[barStr]][1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
+  // CHECK: %[[base2_raw:.+]] = llvm.extractvalue %[[barStr]][1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
+  // CHECK: %[[bar_off2:.+]] = llvm.extractvalue %[[barStr]][2] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
+  // CHECK: %[[base2:.+]] = llvm.getelementptr %[[base2_raw]][%[[bar_off2]]]
   // CHECK: %[[barPtr2:.+]] = llvm.getelementptr %[[base2]][%[[mid]]] : (!llvm.ptr<3>, i64) -> !llvm.ptr<3>, i64
   // CHECK: %[[token:.+]] = nvvm.mbarrier.arrive %[[barPtr2]]
   %token = nvgpu.mbarrier.arrive %barrier[%c0] : !barrierType -> !tokenType
 
-  // CHECK: %[[base3:.+]] = llvm.extractvalue %[[barStr]][1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
+  // CHECK: %[[base3_raw:.+]] = llvm.extractvalue %[[barStr]][1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
+  // CHECK: %[[bar_off3:.+]] = llvm.extractvalue %[[barStr]][2] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
+  // CHECK: %[[base3:.+]] = llvm.getelementptr %[[base3_raw]][%[[bar_off3]]]
   // CHECK: %[[barPtr3:.+]] = llvm.getelementptr %[[base3]][%[[mid]]] : (!llvm.ptr<3>, i64) -> !llvm.ptr<3>, i64
   // CHECK: nvvm.mbarrier.test.wait %[[barPtr3]], %[[token]]
   %isDone = nvgpu.mbarrier.test.wait %barrier[%c0], %token : !barrierType, !tokenType
@@ -514,17 +536,23 @@ func.func @mbarrier_nocomplete() {
   %barrier = nvgpu.mbarrier.create -> !barrierType
 
   // CHECK: %[[barStr:.+]] =  builtin.unrealized_conversion_cast %[[barMemref]] : memref<1xi64, 3> to !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
-  // CHECK: %[[base:.+]] = llvm.extractvalue %[[barStr]][1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
+  // CHECK: %[[base_raw:.+]] = llvm.extractvalue %[[barStr]][1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
+  // CHECK: %[[bar_off:.+]] = llvm.extractvalue %[[barStr]][2] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
+  // CHECK: %[[base:.+]] = llvm.getelementptr %[[base_raw]][%[[bar_off]]]
   // CHECK: %[[barPtr:.+]] = llvm.getelementptr %[[base]][%[[mid]]] : (!llvm.ptr<3>, i64) -> !llvm.ptr<3>, i64
   // CHECK: nvvm.mbarrier.init %[[barPtr]]
   nvgpu.mbarrier.init %barrier[%c0], %num_threads : !barrierType
 
-  // CHECK: %[[base2:.+]] = llvm.extractvalue %[[barStr]][1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
+  // CHECK: %[[base2_raw:.+]] = llvm.extractvalue %[[barStr]][1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
+  // CHECK: %[[bar_off2:.+]] = llvm.extractvalue %[[barStr]][2] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
+  // CHECK: %[[base2:.+]] = llvm.getelementptr %[[base2_raw]][%[[bar_off2]]]
   // CHECK: %[[barPtr2:.+]] = llvm.getelementptr %[[base2]][%[[mid]]] : (!llvm.ptr<3>, i64) -> !llvm.ptr<3>, i64
   // CHECK: %[[token:.+]] = nvvm.mbarrier.arrive.nocomplete %[[barPtr2]]
   %token = nvgpu.mbarrier.arrive.nocomplete %barrier[%c0], %count : !barrierType -> !tokenType
 
-  // CHECK: %[[base3:.+]] = llvm.extractvalue %[[barStr]][1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
+  // CHECK: %[[base3_raw:.+]] = llvm.extractvalue %[[barStr]][1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
+  // CHECK: %[[bar_off3:.+]] = llvm.extractvalue %[[barStr]][2] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
+  // CHECK: %[[base3:.+]] = llvm.getelementptr %[[base3_raw]][%[[bar_off3]]]
   // CHECK: %[[barPtr3:.+]] = llvm.getelementptr %[[base3]][%[[mid]]] : (!llvm.ptr<3>, i64) -> !llvm.ptr<3>, i64
   // CHECK: nvvm.mbarrier.test.wait %[[barPtr3]], %[[token]]
   %isDone = nvgpu.mbarrier.test.wait %barrier[%c0], %token : !barrierType, !tokenType
@@ -538,7 +566,9 @@ func.func @mbarrier_get(%barriers : !nvgpu.mbarrier.group<memorySpace = #gpu.add
   // CHECK: %[[S0:.+]] = builtin.unrealized_conversion_cast %[[ARG0]] : !nvgpu.mbarrier.group<memorySpace = #gpu.address_space<workgroup>, num_barriers = 5> to !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
   // CHECK: %[[c2:.+]] = arith.constant 2 : index
   // CHECK: %[[S1:.+]] = builtin.unrealized_conversion_cast %[[c2]] : index to i64
-  // CHECK: %[[S2:.+]] = llvm.extractvalue %[[S0]][1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)> 
+  // CHECK: %[[S2_RAW:.+]] = llvm.extractvalue %[[S0]][1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
+  // CHECK: %[[S2_OFF:.+]] = llvm.extractvalue %[[S0]][2] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
+  // CHECK: %[[S2:.+]] = llvm.getelementptr %[[S2_RAW]][%[[S2_OFF]]]
   // CHECK: %[[S3:.+]] = llvm.getelementptr %[[S2]][%[[S1]]] : (!llvm.ptr<3>, i64) -> !llvm.ptr<3>, i64
   // CHECK: %[[S4:.+]] = llvm.ptrtoint %[[S3]] : !llvm.ptr<3> to i32
   %c2 = arith.constant 2 : index
@@ -546,7 +576,9 @@ func.func @mbarrier_get(%barriers : !nvgpu.mbarrier.group<memorySpace = #gpu.add
 
   // CHECK: %[[c4:.+]] = arith.constant 4 : index
   // CHECK: %[[S5:.+]] = builtin.unrealized_conversion_cast %[[c4]] : index to i64
-  // CHECK: %[[S6:.+]] = llvm.extractvalue %[[S0]][1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)> 
+  // CHECK: %[[S6_RAW:.+]] = llvm.extractvalue %[[S0]][1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
+  // CHECK: %[[S6_OFF:.+]] = llvm.extractvalue %[[S0]][2] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
+  // CHECK: %[[S6:.+]] = llvm.getelementptr %[[S6_RAW]][%[[S6_OFF]]]
   // CHECK: %[[S7:.+]] = llvm.getelementptr %[[S6]][%[[S5]]] : (!llvm.ptr<3>, i64) -> !llvm.ptr<3>, i64
   // CHECK: %[[S8:.+]] = llvm.ptrtoint %[[S7]] : !llvm.ptr<3> to i64
   %c4 = arith.constant 4 : index
@@ -570,7 +602,9 @@ func.func @mbarrier_wait(%barriers : !nvgpu.mbarrier.group<memorySpace = #gpu.ad
 // CHECK: scf.for %[[i:.*]] =
 // CHECK: %[[S2:.+]] = arith.remui %[[i]], %[[c5]] : index
 // CHECK: %[[S3:.+]] = builtin.unrealized_conversion_cast %[[S2]] : index to i64
-// CHECK: %[[S4:.+]] = llvm.extractvalue %[[CARG0]][1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
+// CHECK: %[[S4_RAW:.+]] = llvm.extractvalue %[[CARG0]][1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
+// CHECK: %[[S4_OFF:.+]] = llvm.extractvalue %[[CARG0]][2] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
+// CHECK: %[[S4:.+]] = llvm.getelementptr %[[S4_RAW]][%[[S4_OFF]]]
 // CHECK: %[[S5:.+]] = llvm.getelementptr %[[S4]][%[[S3]]] : (!llvm.ptr<3>, i64) -> !llvm.ptr<3>, i64
 // CHECK: nvvm.mbarrier.test.wait {{.*}}, %[[CARG1]]
     %mbarId = arith.remui %i, %numBarriers : index
@@ -590,7 +624,9 @@ func.func @mbarrier_txcount() {
     %barrier = nvgpu.mbarrier.create -> !barrierType
 
     // CHECK: %[[barStr:.+]] =  builtin.unrealized_conversion_cast %[[barMemref]] : memref<1xi64, 3> to !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
-    // CHECK: %[[base:.+]] = llvm.extractvalue %[[barStr]][1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
+    // CHECK: %[[base_raw:.+]] = llvm.extractvalue %[[barStr]][1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
+    // CHECK: %[[bar_off:.+]] = llvm.extractvalue %[[barStr]][2] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
+    // CHECK: %[[base:.+]] = llvm.getelementptr %[[base_raw]][%[[bar_off]]]
     // CHECK: %[[barPtr:.+]] = llvm.getelementptr %[[base]][%[[mid]]] : (!llvm.ptr<3>, i64) -> !llvm.ptr<3>, i64
     // CHECK: nvvm.mbarrier.init %[[barPtr]]
     nvgpu.mbarrier.init %barrier[%c0], %num_threads : !barrierType
@@ -601,14 +637,18 @@ func.func @mbarrier_txcount() {
 
     scf.if %cnd {
       %txcount = arith.constant 256 : index
-      // CHECK: %[[base2:.+]] = llvm.extractvalue %[[barStr]][1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
+      // CHECK: %[[base2_raw:.+]] = llvm.extractvalue %[[barStr]][1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
+      // CHECK: %[[bar_off2:.+]] = llvm.extractvalue %[[barStr]][2] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
+      // CHECK: %[[base2:.+]] = llvm.getelementptr %[[base2_raw]][%[[bar_off2]]]
       // CHECK: %[[barPtr2:.+]] = llvm.getelementptr %[[base2]][%[[mid]]] : (!llvm.ptr<3>, i64) -> !llvm.ptr<3>, i64
       // CHECK: nvvm.mbarrier.arrive.expect_tx %[[barPtr2]]
       nvgpu.mbarrier.arrive.expect_tx %barrier[%c0], %txcount : !barrierType
       scf.yield
     } else {
       %txcount = arith.constant 0 : index
-      // CHECK: %[[base2:.+]] = llvm.extractvalue %[[barStr]][1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
+      // CHECK: %[[base2_raw:.+]] = llvm.extractvalue %[[barStr]][1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
+      // CHECK: %[[bar_off2:.+]] = llvm.extractvalue %[[barStr]][2] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
+      // CHECK: %[[base2:.+]] = llvm.getelementptr %[[base2_raw]][%[[bar_off2]]]
       // CHECK: %[[barPtr2:.+]] = llvm.getelementptr %[[base2]][%[[mid]]] : (!llvm.ptr<3>, i64) -> !llvm.ptr<3>, i64
       // CHECK: nvvm.mbarrier.arrive.expect_tx %[[barPtr2]]
       nvgpu.mbarrier.arrive.expect_tx %barrier[%c0], %txcount : !barrierType
@@ -618,7 +658,9 @@ func.func @mbarrier_txcount() {
 
     %phase_c0 = arith.constant 0 : i1
     %ticks = arith.constant 10000000 : index
-    // CHECK: %[[base3:.+]] = llvm.extractvalue %[[barStr]][1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
+    // CHECK: %[[base3_raw:.+]] = llvm.extractvalue %[[barStr]][1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
+    // CHECK: %[[bar_off3:.+]] = llvm.extractvalue %[[barStr]][2] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
+    // CHECK: %[[base3:.+]] = llvm.getelementptr %[[base3_raw]][%[[bar_off3]]]
     // CHECK: %[[barPtr3:.+]] = llvm.getelementptr %[[base3]][%[[mid]]] : (!llvm.ptr<3>, i64) -> !llvm.ptr<3>, i64
     // CHECK: nvvm.mbarrier.try_wait.parity %[[barPtr3]]
     nvgpu.mbarrier.try_wait.parity %barrier[%c0], %phase_c0, %ticks : !barrierType
@@ -641,20 +683,26 @@ func.func @mbarrier_txcount_pred() {
     %barrier = nvgpu.mbarrier.create -> !barrierType
 
     // CHECK: %[[barStr:.+]] =  builtin.unrealized_conversion_cast %[[barMemref]] : memref<1xi64, 3> to !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
-    // CHECK: %[[base:.+]] = llvm.extractvalue %[[barStr]][1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
+    // CHECK: %[[base_raw:.+]] = llvm.extractvalue %[[barStr]][1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
+    // CHECK: %[[bar_off:.+]] = llvm.extractvalue %[[barStr]][2] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
+    // CHECK: %[[base:.+]] = llvm.getelementptr %[[base_raw]][%[[bar_off]]]
     // CHECK: %[[barPtr:.+]] = llvm.getelementptr %[[base]][%[[mid]]] : (!llvm.ptr<3>, i64) -> !llvm.ptr<3>, i64
     // CHECK: nvvm.mbarrier.init %[[barPtr]], {{.*}}, predicate = %[[P]]
     nvgpu.mbarrier.init %barrier[%c0], %mine, predicate = %pred : !barrierType
 
     %txcount = arith.constant 256 : index
-    // CHECK: %[[base2:.+]] = llvm.extractvalue %[[barStr]][1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
+    // CHECK: %[[base2_raw:.+]] = llvm.extractvalue %[[barStr]][1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
+    // CHECK: %[[bar_off2:.+]] = llvm.extractvalue %[[barStr]][2] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
+    // CHECK: %[[base2:.+]] = llvm.getelementptr %[[base2_raw]][%[[bar_off2]]]
     // CHECK: %[[barPtr2:.+]] = llvm.getelementptr %[[base2]][%[[mid]]] : (!llvm.ptr<3>, i64) -> !llvm.ptr<3>, i64
     // CHECK: nvvm.mbarrier.arrive.expect_tx %[[barPtr2]], {{.*}}, predicate = %[[P]]
     nvgpu.mbarrier.arrive.expect_tx %barrier[%c0], %txcount, predicate = %pred : !barrierType
 
     %phase_c0 = arith.constant 0 : i1
     %ticks = arith.constant 10000000 : index
-    // CHECK: %[[base3:.+]] = llvm.extractvalue %[[barStr]][1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
+    // CHECK: %[[base3_raw:.+]] = llvm.extractvalue %[[barStr]][1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
+    // CHECK: %[[bar_off3:.+]] = llvm.extractvalue %[[barStr]][2] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)>
+    // CHECK: %[[base3:.+]] = llvm.getelementptr %[[base3_raw]][%[[bar_off3]]]
     // CHECK: %[[barPtr3:.+]] = llvm.getelementptr %[[base3]][%[[mid]]] : (!llvm.ptr<3>, i64) -> !llvm.ptr<3>, i64
     // CHECK: nvvm.mbarrier.try_wait.parity %[[barPtr3]]
     nvgpu.mbarrier.try_wait.parity %barrier[%c0], %phase_c0, %ticks : !barrierType
@@ -851,7 +899,9 @@ module @mymodule {
     %rhsShmem = memref.subview %rhsShmem3[0, 0, 0][1, 64, 64][1, 1, 1]  : memref<1x64x64xf16, strided<[4096, 64, 1]>, 3> to memref<64x64xf16, strided<[64, 1]>, 3>
     // CHECK: nvvm.cp.async.bulk.tensor.shared.cluster.global
     nvgpu.tma.async.load %lhsTensorMap[%c0, %c0], %mbarrier[%c0] to %lhsShmem : !lhsTensorMap, !barrierType -> memref<128x64xf16,3>
-    // CHECK: %[[desc:.+]] = llvm.extractvalue %{{.*}}[1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<2 x i64>, array<2 x i64>)>
+    // CHECK: %[[desc_raw:.+]] = llvm.extractvalue %[[desc_struct:.*]][1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<2 x i64>, array<2 x i64>)>
+    // CHECK: %[[desc_off:.+]] = llvm.extractvalue %[[desc_struct]][2] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<2 x i64>, array<2 x i64>)>
+    // CHECK: %[[desc:.+]] = llvm.getelementptr %[[desc_raw]][%[[desc_off]]]
     // CHECK: %[[dest:.+]] = llvm.addrspacecast %[[desc]] : !llvm.ptr<3> to !llvm.ptr<7>
     // CHECK: nvvm.cp.async.bulk.tensor.shared.cluster.global %[[dest]], %{{.*}}, %{{.*}}, box[%{{.*}}, %{{.*}}]
     nvgpu.tma.async.load %rhsTensorMap[%c0, %c0], %mbarrier[%c0] to %rhsShmem : !rhsTensorMap, !barrierType -> memref<64x64xf16, strided<[64, 1]>, 3>
@@ -870,7 +920,9 @@ func.func @create_wgmma_descriptor(%tensorMap : !tensorMap) -> !nvgpu.warpgroup.
     // CHECK: %[[S1:.+]] = builtin.unrealized_conversion_cast %[[Sre]] : memref<128x64xf16, 3> to !llvm.struct<(ptr<3>, ptr<3>, i64, array<2 x i64>, array<2 x i64>)>
     // CHECK: %[[c64:.+]] =  llvm.mlir.constant(64 : i64) : i64
     // CHECK: %[[c1024:.+]] = llvm.mlir.constant(1024 : i64) : i64
-    // CHECK: %[[S2:.+]] = llvm.extractvalue %[[S1]][1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<2 x i64>, array<2 x i64>)>
+    // CHECK: %[[S2_RAW:.+]] = llvm.extractvalue %[[S1]][1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<2 x i64>, array<2 x i64>)>
+    // CHECK: %[[S2_OFF:.+]] = llvm.extractvalue %[[S1]][2] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<2 x i64>, array<2 x i64>)>
+    // CHECK: %[[S2:.+]] = llvm.getelementptr %[[S2_RAW]][%[[S2_OFF]]]
     // CHECK: %[[S3:.+]] = llvm.ptrtoint %[[S2]] : !llvm.ptr<3> to i64
     // CHECK: %[[S4:.+]] = llvm.mlir.constant(46 : i64) : i64
     // CHECK: %[[S5:.+]] = llvm.shl %[[S3]], %[[S4]]  : i64
diff --git a/mlir/test/Conversion/VectorToLLVM/vector-scalable-memcpy.mlir b/mlir/test/Conversion/VectorToLLVM/vector-scalable-memcpy.mlir
index 80e6caa05db5e..bc95dca04c93e 100644
--- a/mlir/test/Conversion/VectorToLLVM/vector-scalable-memcpy.mlir
+++ b/mlir/test/Conversion/VectorToLLVM/vector-scalable-memcpy.mlir
@@ -11,11 +11,15 @@ func.func @vector_scalable_memcopy(%src : memref<?xf32>, %dst : memref<?xf32>, %
   // CHECK: scf.for [[LOOPIDX:%arg[0-9]+]] = {{.*}}
   scf.for %i0 = %c0 to %size step %step {
     // CHECK: [[DATAIDX:%[0-9]+]] = builtin.unrealized_conversion_cast [[LOOPIDX]] : index to i64
-    // CHECK: [[SRCMEM:%[0-9]+]] = llvm.extractvalue [[SRCMRS]][1] : !llvm.struct<(ptr
+    // CHECK: [[SRCALIGNED:%[0-9]+]] = llvm.extractvalue [[SRCMRS]][1] : !llvm.struct<(ptr
+    // CHECK-NEXT: [[SRCOFF:%[0-9]+]] = llvm.extractvalue [[SRCMRS]][2] : !llvm.struct<(ptr
+    // CHECK-NEXT: [[SRCMEM:%[0-9]+]] = llvm.getelementptr [[SRCALIGNED]]{{.}}[[SRCOFF]]{{.}}
     // CHECK-NEXT: [[SRCPTR:%[0-9]+]] = llvm.getelementptr [[SRCMEM]]{{.}}[[DATAIDX]]{{.}} : (!llvm.ptr, i64) -> !llvm.ptr, f32
     // CHECK-NEXT: [[LDVAL:%[0-9]+]] = llvm.load [[SRCPTR]]{{.*}}: !llvm.ptr -> vector<[4]xf32>
     %0 = vector.load %src[%i0] : memref<?xf32>, vector<[4]xf32>
-    // CHECK: [[DSTMEM:%[0-9]+]] = llvm.extractvalue [[DSTMRS]][1] : !llvm.struct<(ptr
+    // CHECK: [[DSTALIGNED:%[0-9]+]] = llvm.extractvalue [[DSTMRS]][1] : !llvm.struct<(ptr
+    // CHECK-NEXT: [[DSTOFF:%[0-9]+]] = llvm.extractvalue [[DSTMRS]][2] : !llvm.struct<(ptr
+    // CHECK-NEXT: [[DSTMEM:%[0-9]+]] = llvm.getelementptr [[DSTALIGNED]]{{.}}[[DSTOFF]]{{.}}
     // CHECK-NEXT: [[DSTPTR:%[0-9]+]] = llvm.getelementptr [[DSTMEM]]{{.}}[[DATAIDX]]{{.}} : (!llvm.ptr, i64) -> !llvm.ptr, f32
     // CHECK-NEXT: llvm.store [[LDVAL]], [[DSTPTR]]{{.*}}: vector<[4]xf32>, !llvm.ptr
     vector.store %0, %dst[%i0] : memref<?xf32>, vector<[4]xf32>
diff --git a/mlir/test/Conversion/VectorToLLVM/vector-to-llvm-interface.mlir b/mlir/test/Conversion/VectorToLLVM/vector-to-llvm-interface.mlir
index 00ed7f947b503..86a70c7bddcfd 100644
--- a/mlir/test/Conversion/VectorToLLVM/vector-to-llvm-interface.mlir
+++ b/mlir/test/Conversion/VectorToLLVM/vector-to-llvm-interface.mlir
@@ -1668,7 +1668,9 @@ func.func @load_0d(%memref : memref<200x100xf32>, %i : index, %j : index) -> vec
 // CHECK: %[[J:.*]] = builtin.unrealized_conversion_cast %{{.*}} : index to i64
 // CHECK: %[[I:.*]] = builtin.unrealized_conversion_cast %{{.*}} : index to i64
 // CHECK: %[[CAST_MEMREF:.*]] = builtin.unrealized_conversion_cast %{{.*}} : memref<200x100xf32> to !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-// CHECK: %[[REF:.*]] = llvm.extractvalue %[[CAST_MEMREF]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+// CHECK: %[[REF_RAW:.*]] = llvm.extractvalue %[[CAST_MEMREF]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+// CHECK: %[[REF_OFF:.*]] = llvm.extractvalue %[[CAST_MEMREF]][2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+// CHECK: %[[REF:.*]] = llvm.getelementptr %[[REF_RAW]][%[[REF_OFF]]]
 // CHECK: %[[C100:.*]] = llvm.mlir.constant(100 : index) : i64
 // CHECK: %[[MUL:.*]] = llvm.mul %[[I]], %[[C100]] : i64
 // CHECK: %[[ADD:.*]] = llvm.add %[[MUL]], %[[J]] : i64
@@ -1785,7 +1787,9 @@ func.func @store_0d(%memref : memref<200x100xf32>, %i : index, %j : index) {
 // CHECK: %[[CAST_MEMREF:.*]] = builtin.unrealized_conversion_cast %{{.*}} : memref<200x100xf32> to !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK: %[[CST:.*]] = arith.constant dense<1.100000e+01> : vector<f32>
 // CHECK: %[[VAL:.*]] = builtin.unrealized_conversion_cast %[[CST]] : vector<f32> to vector<1xf32>
-// CHECK: %[[REF:.*]] = llvm.extractvalue %[[CAST_MEMREF]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+// CHECK: %[[REF_RAW:.*]] = llvm.extractvalue %[[CAST_MEMREF]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+// CHECK: %[[REF_OFF:.*]] = llvm.extractvalue %[[CAST_MEMREF]][2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+// CHECK: %[[REF:.*]] = llvm.getelementptr %[[REF_RAW]][%[[REF_OFF]]]
 // CHECK: %[[C100:.*]] = llvm.mlir.constant(100 : index) : i64
 // CHECK: %[[MUL:.*]] = llvm.mul %[[I]], %[[C100]] : i64
 // CHECK: %[[ADD:.*]] = llvm.add %[[MUL]], %[[J]] : i64
@@ -2021,6 +2025,7 @@ func.func @gather_1d_from_2d(%arg0: memref<4x4xf32>, %arg1: vector<4xi32>, %arg2
 }
 
 // CHECK-LABEL: func @gather_1d_from_2d
+// CHECK: %{{.*}} = llvm.getelementptr %{{.*}}[%{{.*}}] : (!llvm.ptr, i64) -> !llvm.ptr, f32
 // CHECK: %[[B:.*]] = llvm.getelementptr %{{.*}}[%{{.*}}] : (!llvm.ptr, i64) -> !llvm.ptr, f32
 // CHECK: %[[P:.*]] = llvm.getelementptr %[[B]][%{{.*}}] : (!llvm.ptr, vector<4xi32>) -> vector<4x!llvm.ptr>, f32
 // CHECK: %[[G:.*]] = llvm.intr.masked.gather %[[P]], %{{.*}}, %{{.*}} {alignment = 4 : i32} : (vector<4x!llvm.ptr>, vector<4xi1>, vector<4xf32>) -> vector<4xf32>
@@ -2035,6 +2040,7 @@ func.func @gather_1d_from_2d_scalable(%arg0: memref<4x?xf32>, %arg1: vector<[4]x
 }
 
 // CHECK-LABEL: func @gather_1d_from_2d_scalable
+// CHECK: %{{.*}} = llvm.getelementptr %{{.*}}[%{{.*}}] : (!llvm.ptr, i64) -> !llvm.ptr, f32
 // CHECK: %[[B:.*]] = llvm.getelementptr %{{.*}}[%{{.*}}] : (!llvm.ptr, i64) -> !llvm.ptr, f32
 // CHECK: %[[P:.*]] = llvm.getelementptr %[[B]][%{{.*}}] : (!llvm.ptr, vector<[4]xi32>) -> vector<[4]x!llvm.ptr>, f32
 // CHECK: %[[G:.*]] = llvm.intr.masked.gather %[[P]], %{{.*}}, %{{.*}} {alignment = 4 : i32} : (vector<[4]x!llvm.ptr>, vector<[4]xi1>, vector<[4]xf32>) -> vector<[4]xf32>
@@ -2125,6 +2131,7 @@ func.func @scatter_1d_into_2d(%arg0: memref<4x4xf32>, %arg1: vector<4xi32>, %arg
 }
 
 // CHECK-LABEL: func @scatter_1d_into_2d
+// CHECK: %{{.*}} = llvm.getelementptr %{{.*}}[%{{.*}}] : (!llvm.ptr, i64) -> !llvm.ptr, f32
 // CHECK: %[[B:.*]] = llvm.getelementptr %{{.*}}[%{{.*}}] : (!llvm.ptr, i64) -> !llvm.ptr, f32
 // CHECK: %[[P:.*]] = llvm.getelementptr %[[B]][%{{.*}}] : (!llvm.ptr, vector<4xi32>) -> vector<4x!llvm.ptr>, f32
 // CHECK: llvm.intr.masked.scatter %{{.*}}, %[[P]], %{{.*}} {alignment = 4 : i32} : vector<4xf32>, vector<4xi1> into vector<4x!llvm.ptr>
@@ -2138,6 +2145,7 @@ func.func @scatter_1d_into_2d_scalable(%arg0: memref<4x?xf32>, %arg1: vector<[4]
 }
 
 // CHECK-LABEL: func @scatter_1d_into_2d_scalable
+// CHECK: %{{.*}} = llvm.getelementptr %{{.*}}[%{{.*}}] : (!llvm.ptr, i64) -> !llvm.ptr, f32
 // CHECK: %[[B:.*]] = llvm.getelementptr %{{.*}}[%{{.*}}] : (!llvm.ptr, i64) -> !llvm.ptr, f32
 // CHECK: %[[P:.*]] = llvm.getelementptr %[[B]][%{{.*}}] : (!llvm.ptr, vector<[4]xi32>) -> vector<[4]x!llvm.ptr>, f32
 // CHECK: llvm.intr.masked.scatter %{{.*}}, %[[P]], %{{.*}} {alignment = 4 : i32} : vector<[4]xf32>, vector<[4]xi1> into vector<[4]x!llvm.ptr>
diff --git a/mlir/test/Conversion/VectorToLLVM/vector-xfer-to-llvm.mlir b/mlir/test/Conversion/VectorToLLVM/vector-xfer-to-llvm.mlir
index 18deadd0d7a79..d6b12c721a572 100644
--- a/mlir/test/Conversion/VectorToLLVM/vector-xfer-to-llvm.mlir
+++ b/mlir/test/Conversion/VectorToLLVM/vector-xfer-to-llvm.mlir
@@ -36,6 +36,8 @@ func.func @transfer_read_write_1d(%A : memref<?xf32>, %base: index) -> vector<17
 //       CHECK: %[[mask:.*]] = arith.cmpi sgt, %[[boundVect]], %[[linearIndex]] : vector<17x[[$IDX_TYPE]]>
 //
 // 5. Bitcast to vector form.
+//       CHECK: %{{.*}} = llvm.getelementptr %{{.*}} :
+//  CHECK-SAME: (!llvm.ptr, i64) -> !llvm.ptr, f32
 //       CHECK: %[[gep:.*]] = llvm.getelementptr %{{.*}} :
 //  CHECK-SAME: (!llvm.ptr, i64) -> !llvm.ptr, f32
 //
@@ -57,6 +59,8 @@ func.func @transfer_read_write_1d(%A : memref<?xf32>, %base: index) -> vector<17
 //  CHECK-SAME: %[[linearIndex]] : vector<17x[[$IDX_TYPE]]>
 //
 // 3. Bitcast to vector form.
+//       CHECK: %{{.*}} = llvm.getelementptr {{.*}} :
+//  CHECK-SAME: (!llvm.ptr, i64) -> !llvm.ptr, f32
 //       CHECK: %[[gep_b:.*]] = llvm.getelementptr {{.*}} :
 //  CHECK-SAME: (!llvm.ptr, i64) -> !llvm.ptr, f32
 //
@@ -100,6 +104,8 @@ func.func @transfer_read_write_1d_scalable(%A : memref<?xf32>, %base: index) ->
 //  CHECK-SAME: : vector<[17]x[[$IDX_TYPE]]>
 //
 // 5. Bitcast to vector form.
+//       CHECK: %{{.*}} = llvm.getelementptr %{{.*}} :
+//  CHECK-SAME: (!llvm.ptr, i64) -> !llvm.ptr, f32
 //       CHECK: %[[gep:.*]] = llvm.getelementptr %{{.*}} :
 //  CHECK-SAME: (!llvm.ptr, i64) -> !llvm.ptr, f32
 //
@@ -124,6 +130,8 @@ func.func @transfer_read_write_1d_scalable(%A : memref<?xf32>, %base: index) ->
 //  CHECK-SAME: %[[boundVect_b]] : vector<[17]x[[$IDX_TYPE]]>
 //
 // 4. Bitcast to vector form.
+//       CHECK: %{{.*}} = llvm.getelementptr {{.*}} :
+//  CHECK-SAME: (!llvm.ptr, i64) -> !llvm.ptr, f32
 //       CHECK: %[[gep_b:.*]] = llvm.getelementptr {{.*}} :
 //  CHECK-SAME: (!llvm.ptr, i64) -> !llvm.ptr, f32
 //
@@ -298,6 +306,8 @@ func.func @transfer_read_1d_inbounds(%A : memref<?xf32>, %base: index) -> vector
 //  CHECK-SAME: %[[BASE:[a-zA-Z0-9]*]]: index) -> vector<17xf32>
 //
 // 1. Bitcast to vector form.
+//       CHECK: %{{.*}} = llvm.getelementptr {{.*}} :
+//  CHECK-SAME: (!llvm.ptr, i64) -> !llvm.ptr, f32
 //       CHECK: %[[gep:.*]] = llvm.getelementptr {{.*}} :
 //  CHECK-SAME: (!llvm.ptr, i64) -> !llvm.ptr, f32
 //
@@ -314,6 +324,8 @@ func.func @transfer_read_1d_inbounds_scalable(%A : memref<?xf32>, %base: index)
 //  CHECK-SAME: %[[BASE:[a-zA-Z0-9]*]]: index) -> vector<[17]xf32>
 //
 // 1. Bitcast to vector form.
+//       CHECK: %{{.*}} = llvm.getelementptr {{.*}} :
+//  CHECK-SAME: (!llvm.ptr, i64) -> !llvm.ptr, f32
 //       CHECK: %[[gep:.*]] = llvm.getelementptr {{.*}} :
 //  CHECK-SAME: (!llvm.ptr, i64) -> !llvm.ptr, f32
 //
diff --git a/mlir/test/Dialect/LLVM/lower-to-llvm-e2e-with-target-tag.mlir b/mlir/test/Dialect/LLVM/lower-to-llvm-e2e-with-target-tag.mlir
index 8ef3cd5b88bec..624f11bcaa78e 100644
--- a/mlir/test/Dialect/LLVM/lower-to-llvm-e2e-with-target-tag.mlir
+++ b/mlir/test/Dialect/LLVM/lower-to-llvm-e2e-with-target-tag.mlir
@@ -29,7 +29,8 @@ func.func @subview(%0 : memref<64x4xf32, strided<[4, 1]>>, %arg0 : index, %arg1
 
   // CHECK-DAG: %[[STRIDE0:.*]] = llvm.mlir.constant(4 : index) : i64
   // CHECK-DAG: %[[DESCSTRIDE0:.*]] = llvm.mul %[[ARG0]], %[[STRIDE0]] overflow<nsw> : i64
-  // CHECK-DAG: %[[OFF2:.*]] = llvm.add %[[DESCSTRIDE0]], %[[ARG1]] : i64
+  // CHECK-DAG: %[[OFF1:.*]] = llvm.add %[[BASE_OFFSET]], %[[DESCSTRIDE0]] : i64
+  // CHECK-DAG: %[[OFF2:.*]] = llvm.add %[[OFF1]], %[[ARG1]] : i64
   // CHECK-DAG: %[[DESC:.*]] = llvm.mlir.poison : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 
   // Base address and algined address.
diff --git a/mlir/test/Dialect/LLVM/lower-to-llvm-e2e-with-top-level-named-sequence.mlir b/mlir/test/Dialect/LLVM/lower-to-llvm-e2e-with-top-level-named-sequence.mlir
index 48e18d95c0e59..748383ac5518a 100644
--- a/mlir/test/Dialect/LLVM/lower-to-llvm-e2e-with-top-level-named-sequence.mlir
+++ b/mlir/test/Dialect/LLVM/lower-to-llvm-e2e-with-top-level-named-sequence.mlir
@@ -28,7 +28,8 @@ func.func @subview(%0 : memref<64x4xf32, strided<[4, 1]>>, %arg0 : index, %arg1
 
   // CHECK-DAG: %[[STRIDE0:.*]] = llvm.mlir.constant(4 : index) : i64
   // CHECK-DAG: %[[DESCSTRIDE0:.*]] = llvm.mul %[[ARG0]], %[[STRIDE0]] overflow<nsw> : i64
-  // CHECK-DAG: %[[OFF2:.*]] = llvm.add %[[DESCSTRIDE0]], %[[ARG1]] : i64
+  // CHECK-DAG: %[[OFF1:.*]] = llvm.add %[[BASE_OFFSET]], %[[DESCSTRIDE0]] : i64
+  // CHECK-DAG: %[[OFF2:.*]] = llvm.add %[[OFF1]], %[[ARG1]] : i64
   // CHECK-DAG: %[[DESC:.*]] = llvm.mlir.poison : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 
   // Base address and algined address.
diff --git a/mlir/test/Integration/Dialect/MemRef/cast-runtime-verification.mlir b/mlir/test/Integration/Dialect/MemRef/cast-runtime-verification.mlir
index aed8c76cf394d..25e88acc17da1 100644
--- a/mlir/test/Integration/Dialect/MemRef/cast-runtime-verification.mlir
+++ b/mlir/test/Integration/Dialect/MemRef/cast-runtime-verification.mlir
@@ -56,11 +56,6 @@ func.func @main() {
   %3 = memref.cast %alloc : memref<5xf32> to memref<*xf32>
   func.call @cast_to_ranked(%3) : (memref<*xf32>) -> (memref<f32>)
 
-  // CHECK-NEXT: ERROR: Runtime op verification failed
-  // CHECK-NEXT: memref.cast %{{.*}} : memref<?xf32, strided<[?]>>
-  // CHECK-NEXT: ^ offset mismatch
-  // CHECK-NEXT: Location: loc({{.*}})
-
   // CHECK-NEXT: ERROR: Runtime op verification failed
   // CHECK-NEXT: memref.cast %{{.*}} : memref<?xf32, strided<[?]>>
   // CHECK-NEXT: ^ stride mismatch of dim 0
diff --git a/mlir/test/python/dialects/memref.py b/mlir/test/python/dialects/memref.py
index d1d2b4e9cb627..adbd2768ed694 100644
--- a/mlir/test/python/dialects/memref.py
+++ b/mlir/test/python/dialects/memref.py
@@ -156,7 +156,7 @@ def testSubViewOpInferReturnTypeSemantics():
                 # CHECK: mixed static/dynamic offset/sizes/strides requires explicit result type
                 print(e)
 
-            layout = StridedLayoutAttr.get(ShapedType.get_dynamic_size(), [10, 1])
+            layout = StridedLayoutAttr.get([10, 1])
             x = memref.alloc(
                 T.memref(
                     10,
@@ -165,9 +165,9 @@ def testSubViewOpInferReturnTypeSemantics():
                     layout=layout,
                 ),
                 [],
-                [arith.constant(T.index(), 42)],
+                [],
             )
-            # CHECK: %[[DYNAMICALLOC:.*]] = memref.alloc()[%c42] : memref<10x10xi32, strided<[10, 1]>>
+            # CHECK: %[[STATICALLOC:.*]] = memref.alloc() : memref<10x10xi32, strided<[10, 1]>>
             print(x.owner)
             y = memref.subview(
                 x,
@@ -176,7 +176,7 @@ def testSubViewOpInferReturnTypeSemantics():
                 [1, 1],
                 result_type=T.memref(3, 3, T.i32(), layout=layout),
             )
-            # CHECK: %{{.*}} = memref.subview %[[DYNAMICALLOC]][1, 1] [3, 3] [1, 1] : memref<10x10xi32, strided<[10, 1]>> to memref<3x3xi32, strided<[10, 1]>>
+            # CHECK: %{{.*}} = memref.subview %[[STATICALLOC]][1, 1] [3, 3] [1, 1] : memref<10x10xi32, strided<[10, 1]>> to memref<3x3xi32, strided<[10, 1]>>
             print(y.owner)
 
 
@@ -187,11 +187,9 @@ def check_strides_offset(memref, np_view):
         layout = memref.type.layout
         dtype_size_in_bytes = np_view.dtype.itemsize
         golden_strides = (np.array(np_view.strides) // dtype_size_in_bytes).tolist()
-        golden_offset = (
-            np_view.ctypes.data - np_view.base.ctypes.data
-        ) // dtype_size_in_bytes
-
-        assert (layout.strides, layout.offset) == (golden_strides, golden_offset)
+        # Offset is no longer carried by StridedLayoutAttr.
+        if hasattr(layout, "strides"):
+            assert layout.strides == golden_strides
 
     with Context() as ctx, Location.unknown(ctx):
         module = Module.create()

>From cbf20a4044e48863f93ab3139ad2879973d284a0 Mon Sep 17 00:00:00 2001
From: Ivan Butygin <ivan.butygin at gmail.com>
Date: Fri, 17 Apr 2026 12:57:52 +0200
Subject: [PATCH 23/27] [WIP][mlir] step 3: rename getStridesAndOffset ->
 getStrides

MemRefType's offset is always 0 now (the type no longer carries it), so the
MemRefLayoutAttrInterface method, MemRefType helpers, C API, Python
bindings, and free helper are renamed/simplified to just return strides.
Runtime offset lives on extract_strided_metadata / the descriptor.

Interface/API surface:
- MemRefLayoutAttrInterface::getStridesAndOffset -> getStrides
- MemRefType::getStridesAndOffset -> getStrides (both overloads)
- detail::getAffineMapStridesAndOffset -> getAffineMapStrides
- StridedLayoutAttr::getStrides impl drops the offset argument
- mlirMemRefTypeGetStridesAndOffset -> mlirMemRefTypeGetStrides
- Python: PyMemRefType.get_strides_and_offset -> get_strides
- Python memref dialect helpers updated

Follow-on semantic fixes for call sites that were using the type-level
offset:
- RuntimeOpVerification, BufferizationOps, MemRefOps: drop offset
  compatibility checks (type no longer carries offset).
- MemRefBuilder/MemRefToLLVM/ViewOp/ReshapeOp lowerings: write 0 to the
  descriptor offset slot (instead of reading from type).
- PtrToLLVM metadata struct: always include the offset slot since the
  runtime offset is always dynamic.
- DecomposeMemRefs, SPIRVConversion, XeGPUToXeVM, VectorToXeGPU:
  always read the runtime offset via extract_strided_metadata.
- AMDGPU staticallyOutOfBounds: drop the static offset term.

Test updates covering the new IR shape in:
- Analysis/DataFlow strided-metadata analysis (offsets now unrefined).
- Dialect/Affine memref-stride-calculation dump (no offset line).
- Dialect/GPU decompose-memrefs (affine map has +1 symbol).
- Conversion/PtrToLLVM metadata struct layout.
- Conversion/XeGPUToXeVM suite (always-extract-strided-metadata path).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
---
 mlir/include/mlir-c/BuiltinTypes.h            |  8 +-
 .../include/mlir/Dialect/XeGPU/IR/XeGPUOps.td |  2 +-
 .../mlir/IR/BuiltinAttributeInterfaces.h      | 10 +-
 .../mlir/IR/BuiltinAttributeInterfaces.td     | 17 ++--
 mlir/include/mlir/IR/BuiltinAttributes.td     |  2 +-
 mlir/include/mlir/IR/BuiltinTypes.td          |  9 +-
 .../DataFlow/StridedMetadataRangeAnalysis.cpp | 11 +--
 mlir/lib/Bindings/Python/IRTypes.cpp          | 15 ++-
 mlir/lib/CAPI/IR/BuiltinTypes.cpp             |  6 +-
 .../AMDGPUToROCDL/AMDGPUToROCDL.cpp           |  6 +-
 .../Conversion/LLVMCommon/MemRefBuilder.cpp   |  8 +-
 mlir/lib/Conversion/LLVMCommon/Pattern.cpp    |  2 +-
 .../Conversion/LLVMCommon/TypeConverter.cpp   |  9 +-
 .../Conversion/MemRefToLLVM/MemRefToLLVM.cpp  | 28 +++---
 mlir/lib/Conversion/PtrToLLVM/PtrToLLVM.cpp   | 52 +++++------
 .../Conversion/VectorToGPU/VectorToGPU.cpp    |  4 +-
 .../VectorToLLVM/ConvertVectorToLLVM.cpp      |  3 +-
 .../VectorToXeGPU/VectorToXeGPU.cpp           | 14 +--
 .../Conversion/XeGPUToXeVM/XeGPUToXeVM.cpp    | 45 ++++-----
 mlir/lib/Dialect/AMDGPU/IR/AMDGPUOps.cpp      |  6 +-
 .../Bufferization/IR/BufferizationOps.cpp     |  7 +-
 .../Transforms/BufferResultsToOutParams.cpp   |  4 +-
 .../GPU/Transforms/DecomposeMemRefs.cpp       |  6 +-
 mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp      | 76 +++++----------
 .../MemRef/Transforms/EmulateNarrowType.cpp   |  3 +-
 .../Transforms/ExpandStridedMetadata.cpp      | 16 ++--
 .../MemRef/Transforms/FlattenMemRefs.cpp      | 25 ++---
 .../Transforms/RuntimeOpVerification.cpp      |  4 +-
 mlir/lib/Dialect/MemRef/Utils/MemRefUtils.cpp |  3 +-
 mlir/lib/Dialect/NVGPU/Utils/MMAUtils.cpp     |  4 +-
 .../SPIRV/Transforms/SPIRVConversion.cpp      | 31 +++----
 .../BufferizableOpInterfaceImpl.cpp           |  3 +-
 .../VectorTransferSplitRewritePatterns.cpp    |  4 +-
 .../Vector/Transforms/VectorTransforms.cpp    |  3 +-
 mlir/lib/Dialect/X86/IR/X86Dialect.cpp        |  6 +-
 mlir/lib/Dialect/XeGPU/IR/XeGPUOps.cpp        |  4 +-
 mlir/lib/IR/BuiltinAttributeInterfaces.cpp    | 10 +-
 mlir/lib/IR/BuiltinAttributes.cpp             | 10 +-
 mlir/lib/IR/BuiltinTypes.cpp                  | 24 ++---
 mlir/python/mlir/dialects/memref.py           |  4 +-
 .../test-strided-metadata-range-analysis.mlir |  8 +-
 .../Conversion/PtrToLLVM/ptr-to-llvm.mlir     | 92 ++++++++++---------
 .../XeGPUToXeVM/create_nd_tdesc.mlir          | 18 +++-
 .../Conversion/XeGPUToXeVM/loadstore_1d.mlir  | 16 +++-
 .../XeGPUToXeVM/loadstore_matrix.mlir         | 26 +++---
 .../XeGPUToXeVM/loadstore_nd_sub_byte.mlir    |  6 +-
 .../XeGPUToXeVM/loadstoreprefetch.mlir        | 20 ++--
 .../XeGPUToXeVM/materializecast.mlir          |  9 +-
 .../Affine/memref-stride-calculation.mlir     | 62 ++++++-------
 mlir/test/Dialect/GPU/decompose-memrefs.mlir  | 20 ++--
 .../Analysis/TestMemRefStrideCalculation.cpp  | 10 +-
 51 files changed, 352 insertions(+), 439 deletions(-)

diff --git a/mlir/include/mlir-c/BuiltinTypes.h b/mlir/include/mlir-c/BuiltinTypes.h
index f6c30f375cb1a..b86b61a827102 100644
--- a/mlir/include/mlir-c/BuiltinTypes.h
+++ b/mlir/include/mlir-c/BuiltinTypes.h
@@ -536,10 +536,10 @@ MLIR_CAPI_EXPORTED MlirAffineMap mlirMemRefTypeGetAffineMap(MlirType type);
 MLIR_CAPI_EXPORTED MlirAttribute mlirMemRefTypeGetMemorySpace(MlirType type);
 
 /// Returns the strides of the MemRef if the layout map is in strided form.
-/// Both strides and offset are out params. strides must point to pre-allocated
-/// memory of length equal to the rank of the memref.
-MLIR_CAPI_EXPORTED MlirLogicalResult mlirMemRefTypeGetStridesAndOffset(
-    MlirType type, int64_t *strides, int64_t *offset);
+/// strides is an out param and must point to pre-allocated memory of length
+/// equal to the rank of the memref.
+MLIR_CAPI_EXPORTED MlirLogicalResult
+mlirMemRefTypeGetStrides(MlirType type, int64_t *strides);
 
 /// Returns the memory spcae of the given Unranked MemRef type.
 MLIR_CAPI_EXPORTED MlirAttribute
diff --git a/mlir/include/mlir/Dialect/XeGPU/IR/XeGPUOps.td b/mlir/include/mlir/Dialect/XeGPU/IR/XeGPUOps.td
index 31fe93d209a6d..43925049d49b4 100644
--- a/mlir/include/mlir/Dialect/XeGPU/IR/XeGPUOps.td
+++ b/mlir/include/mlir/Dialect/XeGPU/IR/XeGPUOps.td
@@ -208,7 +208,7 @@ def XeGPU_CreateNdDescOp: XeGPU_Op<"create_nd_tdesc", [Pure, ViewLikeOpInterface
       /// Get the static strides, the value passed to const_strides
       /// will overide the value in memref.
       if (auto memrefTy = llvm::dyn_cast<MemRefType>(getSourceType()))
-        statics = memrefTy.getStridesAndOffset().first;
+        statics = memrefTy.getStrides();
       if (auto attr = getConstStridesAttr())
         statics = llvm::to_vector(attr.asArrayRef());
 
diff --git a/mlir/include/mlir/IR/BuiltinAttributeInterfaces.h b/mlir/include/mlir/IR/BuiltinAttributeInterfaces.h
index b94a933b5c945..3f6123497a689 100644
--- a/mlir/include/mlir/IR/BuiltinAttributeInterfaces.h
+++ b/mlir/include/mlir/IR/BuiltinAttributeInterfaces.h
@@ -270,12 +270,10 @@ LogicalResult
 verifyAffineMapAsLayout(AffineMap m, ArrayRef<int64_t> shape,
                         function_ref<InFlightDiagnostic()> emitError);
 
-// Return the strides and offsets that can be inferred from the given affine
-// layout map given the map and a memref shape.
-LogicalResult getAffineMapStridesAndOffset(AffineMap map,
-                                           ArrayRef<int64_t> shape,
-                                           SmallVectorImpl<int64_t> &strides,
-                                           int64_t &offset);
+// Return the strides that can be inferred from the given affine layout map
+// given the map and a memref shape.
+LogicalResult getAffineMapStrides(AffineMap map, ArrayRef<int64_t> shape,
+                                  SmallVectorImpl<int64_t> &strides);
 } // namespace detail
 
 } // namespace mlir
diff --git a/mlir/include/mlir/IR/BuiltinAttributeInterfaces.td b/mlir/include/mlir/IR/BuiltinAttributeInterfaces.td
index 7bc7fbe8c50f2..35bb2997d2376 100644
--- a/mlir/include/mlir/IR/BuiltinAttributeInterfaces.td
+++ b/mlir/include/mlir/IR/BuiltinAttributeInterfaces.td
@@ -513,18 +513,17 @@ def MemRefLayoutAttrInterface : AttrInterface<"MemRefLayoutAttrInterface"> {
 
     InterfaceMethod<
       [{Return the strides (using ShapedType::kDynamic for the dynamic case)
-      that this layout corresponds to into `strides` and `offset` if such exist
-      and can be determined from a combination of the layout and the given
-      `shape`. If these strides cannot be inferred, return failure().
-      The values of `strides` and `offset` are undefined on failure.}],
-      "::llvm::LogicalResult", "getStridesAndOffset",
+      that this layout corresponds to into `strides` if such exist and can be
+      determined from a combination of the layout and the given `shape`. If
+      these strides cannot be inferred, return failure().
+      The values of `strides` are undefined on failure.}],
+      "::llvm::LogicalResult", "getStrides",
       (ins "::llvm::ArrayRef<int64_t>":$shape,
-           "::llvm::SmallVectorImpl<int64_t>&":$strides,
-           "int64_t&":$offset),
+           "::llvm::SmallVectorImpl<int64_t>&":$strides),
            [{}],
            [{
-            return ::mlir::detail::getAffineMapStridesAndOffset(
-              $_attr.getAffineMap(), shape, strides, offset);
+            return ::mlir::detail::getAffineMapStrides(
+              $_attr.getAffineMap(), shape, strides);
            }]
     >
   ];
diff --git a/mlir/include/mlir/IR/BuiltinAttributes.td b/mlir/include/mlir/IR/BuiltinAttributes.td
index e35de7aafdce9..b1cecd220a1f1 100644
--- a/mlir/include/mlir/IR/BuiltinAttributes.td
+++ b/mlir/include/mlir/IR/BuiltinAttributes.td
@@ -1025,7 +1025,7 @@ def Builtin_SparseElementsAttr : Builtin_Attr<
 
 def StridedLayoutAttr : Builtin_Attr<"StridedLayout", "strided_layout",
     [DeclareAttrInterfaceMethods<MemRefLayoutAttrInterface,
-                                 ["verifyLayout", "getStridesAndOffset"]>]> {
+                                 ["verifyLayout", "getStrides"]>]> {
   let summary = "An Attribute representing a strided layout of a shaped type";
   let description = [{
     Syntax:
diff --git a/mlir/include/mlir/IR/BuiltinTypes.td b/mlir/include/mlir/IR/BuiltinTypes.td
index 0db4c9174bab0..98324f6f6b072 100644
--- a/mlir/include/mlir/IR/BuiltinTypes.td
+++ b/mlir/include/mlir/IR/BuiltinTypes.td
@@ -1026,12 +1026,11 @@ def Builtin_MemRef : Builtin_Type<"MemRef", "memref", [
     /// static or dynamic (encoded with ShapedType::kDynamic). Strides encode
     /// the distance in the number of elements between successive entries along
     /// a particular dimension.
-    LogicalResult getStridesAndOffset(SmallVectorImpl<int64_t> &strides,
-                                      int64_t &offset) const;
+    LogicalResult getStrides(SmallVectorImpl<int64_t> &strides) const;
 
-    /// Wrapper around getStridesAndOffset(SmallVectorImpl<int64_t>, int64_t)
-    /// that will assert if the logical result is not succeeded.
-    std::pair<SmallVector<int64_t>, int64_t> getStridesAndOffset() const;
+    /// Wrapper around getStrides(SmallVectorImpl<int64_t>) that will assert if
+    /// the logical result is not succeeded.
+    SmallVector<int64_t> getStrides() const;
 
     /// Return "true" if the layout is compatible with strided semantics.
     bool isStrided();
diff --git a/mlir/lib/Analysis/DataFlow/StridedMetadataRangeAnalysis.cpp b/mlir/lib/Analysis/DataFlow/StridedMetadataRangeAnalysis.cpp
index 01c9dafaddf10..c4bcdc54b870b 100644
--- a/mlir/lib/Analysis/DataFlow/StridedMetadataRangeAnalysis.cpp
+++ b/mlir/lib/Analysis/DataFlow/StridedMetadataRangeAnalysis.cpp
@@ -43,17 +43,12 @@ static StridedMetadataRange getEntryStateImpl(Value v, int32_t indexBitwidth) {
   auto metadata =
       StridedMetadataRange::getMaxRanges(indexBitwidth, mTy.getRank());
 
-  // Compute the offset and strides.
-  int64_t offset;
+  // Compute the strides. Offset is no longer carried by the type; runtime
+  // offset comes from extract_strided_metadata.
   SmallVector<int64_t> strides;
-  if (failed(cast<MemRefType>(mTy).getStridesAndOffset(strides, offset)))
+  if (failed(cast<MemRefType>(mTy).getStrides(strides)))
     return metadata;
 
-  // Refine the metadata if we know it from the type.
-  if (!ShapedType::isDynamic(offset)) {
-    metadata.getOffsets()[0] =
-        ConstantIntRanges::constant(APInt(indexBitwidth, offset));
-  }
   for (auto &&[size, range] :
        llvm::zip_equal(mTy.getShape(), metadata.getSizes())) {
     if (ShapedType::isDynamic(size))
diff --git a/mlir/lib/Bindings/Python/IRTypes.cpp b/mlir/lib/Bindings/Python/IRTypes.cpp
index 75fd55c90c2b5..49dae10927b68 100644
--- a/mlir/lib/Bindings/Python/IRTypes.cpp
+++ b/mlir/lib/Bindings/Python/IRTypes.cpp
@@ -662,17 +662,16 @@ void PyMemRefType::bindDerived(ClassTy &c) {
           },
           "The layout of the MemRef type.")
       .def(
-          "get_strides_and_offset",
-          [](PyMemRefType &self) -> std::pair<std::vector<int64_t>, int64_t> {
+          "get_strides",
+          [](PyMemRefType &self) -> std::vector<int64_t> {
             std::vector<int64_t> strides(mlirShapedTypeGetRank(self));
-            int64_t offset;
-            if (mlirLogicalResultIsFailure(mlirMemRefTypeGetStridesAndOffset(
-                    self, strides.data(), &offset)))
+            if (mlirLogicalResultIsFailure(
+                    mlirMemRefTypeGetStrides(self, strides.data())))
               throw std::runtime_error(
-                  "Failed to extract strides and offset from memref.");
-            return {strides, offset};
+                  "Failed to extract strides from memref.");
+            return strides;
           },
-          "The strides and offset of the MemRef type.")
+          "The strides of the MemRef type.")
       .def_prop_ro(
           "affine_map",
           [](PyMemRefType &self) -> PyAffineMap {
diff --git a/mlir/lib/CAPI/IR/BuiltinTypes.cpp b/mlir/lib/CAPI/IR/BuiltinTypes.cpp
index 6464fef4653e1..eb5078ce7a691 100644
--- a/mlir/lib/CAPI/IR/BuiltinTypes.cpp
+++ b/mlir/lib/CAPI/IR/BuiltinTypes.cpp
@@ -602,12 +602,10 @@ MlirAttribute mlirMemRefTypeGetMemorySpace(MlirType type) {
   return wrap(llvm::cast<MemRefType>(unwrap(type)).getMemorySpace());
 }
 
-MlirLogicalResult mlirMemRefTypeGetStridesAndOffset(MlirType type,
-                                                    int64_t *strides,
-                                                    int64_t *offset) {
+MlirLogicalResult mlirMemRefTypeGetStrides(MlirType type, int64_t *strides) {
   MemRefType memrefType = llvm::cast<MemRefType>(unwrap(type));
   SmallVector<int64_t> strides_;
-  if (failed(memrefType.getStridesAndOffset(strides_, *offset)))
+  if (failed(memrefType.getStrides(strides_)))
     return mlirLogicalResultFailure();
 
   (void)llvm::copy(strides_, strides);
diff --git a/mlir/lib/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.cpp b/mlir/lib/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.cpp
index 14d99c250c0b6..fe38acec29e78 100644
--- a/mlir/lib/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.cpp
+++ b/mlir/lib/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.cpp
@@ -227,9 +227,8 @@ struct FatRawBufferCastLowering
     int64_t elementByteWidth =
         dataLayout.getTypeSizeInBits(memrefType.getElementType()) / 8;
 
-    int64_t unusedOffset = 0;
     SmallVector<int64_t, 5> strideVals;
-    if (failed(memrefType.getStridesAndOffset(strideVals, unusedOffset)))
+    if (failed(memrefType.getStrides(strideVals)))
       return op.emitOpError("Can't lower non-stride-offset memrefs");
 
     Value numRecords = adaptor.getValidBytes();
@@ -398,9 +397,8 @@ struct RawBufferOpLowering : public ConvertOpToLLVMPattern<GpuOp> {
     }
 
     // Construct buffer descriptor from memref, attributes
-    int64_t offset = 0;
     SmallVector<int64_t, 5> strides;
-    if (failed(memrefType.getStridesAndOffset(strides, offset)))
+    if (failed(memrefType.getStrides(strides)))
       return gpuOp.emitOpError("Can't lower non-stride-offset memrefs");
 
     MemRefDescriptor memrefDescriptor(memref);
diff --git a/mlir/lib/Conversion/LLVMCommon/MemRefBuilder.cpp b/mlir/lib/Conversion/LLVMCommon/MemRefBuilder.cpp
index 0762d6c9530d8..1e4ab902282cc 100644
--- a/mlir/lib/Conversion/LLVMCommon/MemRefBuilder.cpp
+++ b/mlir/lib/Conversion/LLVMCommon/MemRefBuilder.cpp
@@ -51,9 +51,9 @@ MemRefDescriptor MemRefDescriptor::fromStaticShape(
     MemRefType type, Value memory, Value alignedMemory) {
   assert(type.hasStaticShape() && "unexpected dynamic shape");
 
-  // Extract all strides and offsets and verify they are static.
-  auto [strides, offset] = type.getStridesAndOffset();
-  assert(ShapedType::isStatic(offset) && "expected static offset");
+  // Extract all strides and verify they are static. Offset is no longer carried
+  // by the type; static-shape memrefs have offset 0 in the descriptor.
+  SmallVector<int64_t> strides = type.getStrides();
   assert(!llvm::any_of(strides, ShapedType::isDynamic) &&
          "expected static strides");
 
@@ -63,7 +63,7 @@ MemRefDescriptor MemRefDescriptor::fromStaticShape(
   auto descr = MemRefDescriptor::poison(builder, loc, convertedType);
   descr.setAllocatedPtr(builder, loc, memory);
   descr.setAlignedPtr(builder, loc, alignedMemory);
-  descr.setConstantOffset(builder, loc, offset);
+  descr.setConstantOffset(builder, loc, 0);
 
   // Fill in sizes and strides
   for (unsigned i = 0, e = type.getRank(); i != e; ++i) {
diff --git a/mlir/lib/Conversion/LLVMCommon/Pattern.cpp b/mlir/lib/Conversion/LLVMCommon/Pattern.cpp
index 2e0d92c3ba847..cd51c21dcb679 100644
--- a/mlir/lib/Conversion/LLVMCommon/Pattern.cpp
+++ b/mlir/lib/Conversion/LLVMCommon/Pattern.cpp
@@ -605,7 +605,7 @@ Value mlir::LLVM::getStridedElementPtr(OpBuilder &builder, Location loc,
                                        MemRefType type, Value memRefDesc,
                                        ValueRange indices,
                                        LLVM::GEPNoWrapFlags noWrapFlags) {
-  auto [strides, offset] = type.getStridesAndOffset();
+  auto strides = type.getStrides();
 
   MemRefDescriptor memRefDescriptor(memRefDesc);
   // Use a canonical representation of the start address so that later
diff --git a/mlir/lib/Conversion/LLVMCommon/TypeConverter.cpp b/mlir/lib/Conversion/LLVMCommon/TypeConverter.cpp
index a60ecc97aaee0..1eedfb9c3c54d 100644
--- a/mlir/lib/Conversion/LLVMCommon/TypeConverter.cpp
+++ b/mlir/lib/Conversion/LLVMCommon/TypeConverter.cpp
@@ -595,22 +595,21 @@ bool LLVMTypeConverter::canConvertToBarePtr(BaseMemRefType type) {
     // Unranked memref is not supported in the bare pointer calling convention.
     return false;
 
-  // Check that the memref has static shape, strides and offset. Otherwise, it
-  // cannot be lowered to a bare pointer.
+  // Check that the memref has static shape and strides. Offset is no longer
+  // carried by the type. Otherwise, it cannot be lowered to a bare pointer.
   auto memrefTy = cast<MemRefType>(type);
   if (!memrefTy.hasStaticShape())
     return false;
 
-  int64_t offset = 0;
   SmallVector<int64_t, 4> strides;
-  if (failed(memrefTy.getStridesAndOffset(strides, offset)))
+  if (failed(memrefTy.getStrides(strides)))
     return false;
 
   for (int64_t stride : strides)
     if (ShapedType::isDynamic(stride))
       return false;
 
-  return ShapedType::isStatic(offset);
+  return true;
 }
 
 /// Convert a memref type to a bare pointer to the memref element type.
diff --git a/mlir/lib/Conversion/MemRefToLLVM/MemRefToLLVM.cpp b/mlir/lib/Conversion/MemRefToLLVM/MemRefToLLVM.cpp
index c42a85fa375ba..b7863061a2199 100644
--- a/mlir/lib/Conversion/MemRefToLLVM/MemRefToLLVM.cpp
+++ b/mlir/lib/Conversion/MemRefToLLVM/MemRefToLLVM.cpp
@@ -1505,18 +1505,16 @@ struct MemRefReshapeOpLowering
       desc.setAllocatedPtr(rewriter, loc, allocatedPtr);
       desc.setAlignedPtr(rewriter, loc, alignedPtr);
 
-      // Extract the offset and strides from the type.
-      int64_t offset;
+      // Extract the strides from the type. Offset is no longer carried by the
+      // type; reshape preserves the source descriptor's offset, but here we
+      // reconstruct the descriptor for the target type and conventionally start
+      // the new descriptor at offset 0.
       SmallVector<int64_t> strides;
-      if (failed(targetMemRefType.getStridesAndOffset(strides, offset)))
+      if (failed(targetMemRefType.getStrides(strides)))
         return rewriter.notifyMatchFailure(
-            reshapeOp, "failed to get stride and offset exprs");
+            reshapeOp, "failed to get stride exprs");
 
-      if (!isStaticStrideOrOffset(offset))
-        return rewriter.notifyMatchFailure(reshapeOp,
-                                           "dynamic offset is unsupported");
-
-      desc.setConstantOffset(rewriter, loc, offset);
+      desc.setConstantOffset(rewriter, loc, 0);
 
       assert(targetMemRefType.getLayout().isIdentity() &&
              "Identity layout map is a precondition of a valid reshape op");
@@ -1820,12 +1818,10 @@ struct ViewOpLowering : public ConvertOpToLLVMPattern<memref::ViewOp> {
       return viewOp.emitWarning("Target descriptor type not converted to LLVM"),
              failure();
 
-    int64_t offset;
     SmallVector<int64_t, 4> strides;
-    auto successStrides = viewMemRefType.getStridesAndOffset(strides, offset);
+    auto successStrides = viewMemRefType.getStrides(strides);
     if (failed(successStrides))
       return viewOp.emitWarning("cannot cast to non-strided shape"), failure();
-    assert(offset == 0 && "expected offset to be 0");
 
     // Target memref must be contiguous in memory (innermost stride is 1), or
     // empty (special case when at least one of the memref dimensions is 0).
@@ -1855,9 +1851,8 @@ struct ViewOpLowering : public ConvertOpToLLVMPattern<memref::ViewOp> {
     // Field 3: The offset in the resulting type must be 0. This is
     // because of the type change: an offset on srcType* may not be
     // expressible as an offset on dstType*.
-    targetMemRef.setOffset(
-        rewriter, loc,
-        createIndexAttrConstant(rewriter, loc, indexType, offset));
+    targetMemRef.setOffset(rewriter, loc,
+                           createIndexAttrConstant(rewriter, loc, indexType, 0));
 
     // Early exit for 0-D corner case.
     if (viewMemRefType.getRank() == 0)
@@ -1942,8 +1937,7 @@ struct AtomicRMWOpLowering : public LoadStoreOpLowering<memref::AtomicRMWOp> {
       return failure();
     auto memRefType = atomicOp.getMemRefType();
     SmallVector<int64_t> strides;
-    int64_t offset;
-    if (failed(memRefType.getStridesAndOffset(strides, offset)))
+    if (failed(memRefType.getStrides(strides)))
       return failure();
     auto dataPtr =
         getStridedElementPtr(rewriter, atomicOp.getLoc(), memRefType,
diff --git a/mlir/lib/Conversion/PtrToLLVM/PtrToLLVM.cpp b/mlir/lib/Conversion/PtrToLLVM/PtrToLLVM.cpp
index 01199155ade39..018e70d6ddd32 100644
--- a/mlir/lib/Conversion/PtrToLLVM/PtrToLLVM.cpp
+++ b/mlir/lib/Conversion/PtrToLLVM/PtrToLLVM.cpp
@@ -92,10 +92,11 @@ createMemRefMetadataType(MemRefType type,
   // Get pointer type (using address space 0 by default)
   auto ptrType = LLVM::LLVMPointerType::get(context, *addressSpace);
 
-  // Get the strides offsets and shape.
+  // Get the strides and shape. Offset is no longer carried by the type but is
+  // always part of the runtime descriptor, so it is always included in the
+  // metadata struct.
   SmallVector<int64_t> strides;
-  int64_t offset;
-  if (failed(type.getStridesAndOffset(strides, offset)))
+  if (failed(type.getStrides(strides)))
     return failure();
   ArrayRef<int64_t> shape = type.getShape();
 
@@ -105,7 +106,7 @@ createMemRefMetadataType(MemRefType type,
   // For a ranked memref, the descriptor contains:
   // 1. The pointer to the allocated data
   // 2. The pointer to the aligned data
-  // 3. The dynamic offset?
+  // 3. The runtime offset
   // 4. The dynamic sizes?
   // 5. The dynamic strides?
   SmallVector<Type, 5> elements;
@@ -113,9 +114,8 @@ createMemRefMetadataType(MemRefType type,
   // Allocated pointer.
   elements.push_back(ptrType);
 
-  // Potentially add the dynamic offset.
-  if (offset == ShapedType::kDynamic)
-    elements.push_back(indexType);
+  // Runtime offset (always present).
+  elements.push_back(indexType);
 
   // Potentially add the dynamic sizes.
   for (int64_t dim : shape) {
@@ -153,12 +153,11 @@ LogicalResult FromPtrOpConversion::matchAndRewrite(
   if (!descriptorTy)
     return rewriter.notifyMatchFailure(op, "Failed to convert result type");
 
-  // Get the strides, offsets and shape.
+  // Get the strides and shape. Offset is no longer carried by the type but
+  // always lives in the metadata struct.
   SmallVector<int64_t> strides;
-  int64_t offset;
-  if (failed(mTy.getStridesAndOffset(strides, offset))) {
-    return rewriter.notifyMatchFailure(op,
-                                       "Failed to get the strides and offset");
+  if (failed(mTy.getStrides(strides))) {
+    return rewriter.notifyMatchFailure(op, "Failed to get the strides");
   }
   ArrayRef<int64_t> shape = mTy.getShape();
 
@@ -175,14 +174,10 @@ LogicalResult FromPtrOpConversion::matchAndRewrite(
   // Extract metadata from the passed struct.
   unsigned fieldIdx = 1;
 
-  // Set dynamic offset if needed.
-  if (offset == ShapedType::kDynamic) {
-    Value offsetValue = LLVM::ExtractValueOp::create(
-        rewriter, loc, adaptor.getMetadata(), fieldIdx++);
-    desc.setOffset(rewriter, loc, offsetValue);
-  } else {
-    desc.setConstantOffset(rewriter, loc, offset);
-  }
+  // Set the offset (always present in the metadata struct).
+  Value offsetValue = LLVM::ExtractValueOp::create(
+      rewriter, loc, adaptor.getMetadata(), fieldIdx++);
+  desc.setOffset(rewriter, loc, offsetValue);
 
   // Set dynamic sizes if needed.
   for (auto [i, dim] : llvm::enumerate(shape)) {
@@ -232,12 +227,11 @@ LogicalResult GetMetadataOpConversion::matchAndRewrite(
   // Get the memref descriptor.
   MemRefDescriptor descriptor(adaptor.getPtr());
 
-  // Get the strides offsets and shape.
+  // Get the strides and shape. Offset is no longer carried by the type but
+  // always lives in the metadata struct.
   SmallVector<int64_t> strides;
-  int64_t offset;
-  if (failed(mTy.getStridesAndOffset(strides, offset))) {
-    return rewriter.notifyMatchFailure(op,
-                                       "Failed to get the strides and offset");
+  if (failed(mTy.getStrides(strides))) {
+    return rewriter.notifyMatchFailure(op, "Failed to get the strides");
   }
   ArrayRef<int64_t> shape = mTy.getShape();
 
@@ -253,11 +247,9 @@ LogicalResult GetMetadataOpConversion::matchAndRewrite(
   // Track the current field index.
   unsigned fieldIdx = 1;
 
-  // Add dynamic offset if needed.
-  if (offset == ShapedType::kDynamic) {
-    sV = LLVM::InsertValueOp::create(
-        rewriter, loc, sV, descriptor.offset(rewriter, loc), fieldIdx++);
-  }
+  // Add the offset (always present).
+  sV = LLVM::InsertValueOp::create(
+      rewriter, loc, sV, descriptor.offset(rewriter, loc), fieldIdx++);
 
   // Add dynamic sizes if needed.
   for (auto [i, dim] : llvm::enumerate(shape)) {
diff --git a/mlir/lib/Conversion/VectorToGPU/VectorToGPU.cpp b/mlir/lib/Conversion/VectorToGPU/VectorToGPU.cpp
index 975fe28399609..5be39c341b160 100644
--- a/mlir/lib/Conversion/VectorToGPU/VectorToGPU.cpp
+++ b/mlir/lib/Conversion/VectorToGPU/VectorToGPU.cpp
@@ -131,10 +131,8 @@ getStaticallyKnownRowStride(ShapedType type, AffineMap permutationMap) {
   // If the memref is 0 or 1D the horizontal stride is 0.
   if (memrefType.getRank() < 2)
     return 0;
-  int64_t offset = 0;
   SmallVector<int64_t> strides;
-  if (failed(memrefType.getStridesAndOffset(strides, offset)) ||
-      strides.back() != 1)
+  if (failed(memrefType.getStrides(strides)) || strides.back() != 1)
     return std::nullopt;
 
   if (permutationMap.getNumResults() != 2)
diff --git a/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp b/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp
index 43e0824fef6cd..69a8db43e200e 100644
--- a/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp
+++ b/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp
@@ -1385,9 +1385,8 @@ class VectorFMAOpNDRewritePattern : public OpRewritePattern<FMAOp> {
 /// static layout.
 static std::optional<SmallVector<int64_t, 4>>
 computeContiguousStrides(MemRefType memRefType) {
-  int64_t offset;
   SmallVector<int64_t, 4> strides;
-  if (failed(memRefType.getStridesAndOffset(strides, offset)))
+  if (failed(memRefType.getStrides(strides)))
     return std::nullopt;
   if (!strides.empty() && strides.back() != 1)
     return std::nullopt;
diff --git a/mlir/lib/Conversion/VectorToXeGPU/VectorToXeGPU.cpp b/mlir/lib/Conversion/VectorToXeGPU/VectorToXeGPU.cpp
index 3f676e2a3d42b..3ca6242d5f7bc 100644
--- a/mlir/lib/Conversion/VectorToXeGPU/VectorToXeGPU.cpp
+++ b/mlir/lib/Conversion/VectorToXeGPU/VectorToXeGPU.cpp
@@ -84,8 +84,7 @@ static LogicalResult transferPreconditions(PatternRewriter &rewriter,
 
   // Validate further transfer op semantics.
   SmallVector<int64_t> strides;
-  int64_t offset;
-  if (failed(srcTy.getStridesAndOffset(strides, offset)) || strides.back() != 1)
+  if (failed(srcTy.getStrides(strides)) || strides.back() != 1)
     return rewriter.notifyMatchFailure(
         xferOp, "Buffer must be contiguous in the innermost dimension");
 
@@ -115,7 +114,7 @@ static xegpu::CreateNdDescOp createNdDescriptor(PatternRewriter &rewriter,
                                                 TypedValue<MemRefType> src) {
   MemRefType srcTy = src.getType();
   assert(srcTy.isStrided() && "Expected strided memref type");
-  auto [strides, offset] = srcTy.getStridesAndOffset();
+  auto strides = srcTy.getStrides();
   // Pass the memref directly only when shape and strides are static and the
   // layout is identity. The type no longer pins a static offset, so any
   // explicit strided layout may carry a runtime offset that has to be
@@ -127,7 +126,6 @@ static xegpu::CreateNdDescOp createNdDescriptor(PatternRewriter &rewriter,
       break;
     }
   }
-  (void)offset;
 
   xegpu::CreateNdDescOp ndDesc;
   if (isStatic) {
@@ -198,11 +196,12 @@ computeMemrefMeta(OpType xferOp, PatternRewriter &rewriter) {
   MemRefType memrefType = dyn_cast<MemRefType>(baseMemref.getType());
 
   Location loc = xferOp.getLoc();
+  // Offset is no longer carried by the type; the runtime offset comes from
+  // memref.extract_strided_metadata below.
   Value offsetVal = nullptr;
   if (memrefType.hasStaticShape()) {
-    int64_t offset;
     SmallVector<int64_t> intStrides;
-    if (failed(memrefType.getStridesAndOffset(intStrides, offset)))
+    if (failed(memrefType.getStrides(intStrides)))
       return {{}, offsetVal};
     bool hasDynamicStrides = llvm::any_of(intStrides, [](int64_t strideVal) {
       return ShapedType::isDynamic(strideVal);
@@ -211,9 +210,6 @@ computeMemrefMeta(OpType xferOp, PatternRewriter &rewriter) {
     if (!hasDynamicStrides)
       for (int64_t s : intStrides)
         strides.push_back(arith::ConstantIndexOp::create(rewriter, loc, s));
-
-    if (!ShapedType::isDynamic(offset))
-      offsetVal = arith::ConstantIndexOp::create(rewriter, loc, offset);
   }
 
   if (strides.empty() || !offsetVal) {
diff --git a/mlir/lib/Conversion/XeGPUToXeVM/XeGPUToXeVM.cpp b/mlir/lib/Conversion/XeGPUToXeVM/XeGPUToXeVM.cpp
index 50eba56a16080..6c801c3514559 100644
--- a/mlir/lib/Conversion/XeGPUToXeVM/XeGPUToXeVM.cpp
+++ b/mlir/lib/Conversion/XeGPUToXeVM/XeGPUToXeVM.cpp
@@ -1153,35 +1153,26 @@ struct ConvertXeGPUToXeVMPass
         unsigned rank = memrefTy.getRank();
         Type indexType = builder.getIndexType();
 
-        int64_t intOffsets;
-        SmallVector<int64_t> intStrides;
+        // Offset is no longer carried by the type; always read it from
+        // memref.extract_strided_metadata.
         Value addr;
         Value offset;
-        if (succeeded(memrefTy.getStridesAndOffset(intStrides, intOffsets)) &&
-            ShapedType::isStatic(intOffsets)) {
-          addr = memref::ExtractAlignedPointerAsIndexOp::create(builder, loc,
-                                                                input);
-          offset = arith::ConstantOp::create(builder, loc,
-                                             builder.getIndexAttr(intOffsets));
-        } else {
-
-          // Result types: [base_memref, offset, stride0, stride1, ...,
-          // strideN-1, size0, size1, ..., sizeN-1]
-          SmallVector<Type> resultTypes{
-              MemRefType::get({}, memrefTy.getElementType(),
-                              MemRefLayoutAttrInterface(),
-                              memrefTy.getMemorySpace()),
-              indexType};
-          // strides + sizes
-          resultTypes.append(2 * rank, indexType);
-
-          auto meta = memref::ExtractStridedMetadataOp::create(
-              builder, loc, resultTypes, input);
-
-          addr = memref::ExtractAlignedPointerAsIndexOp::create(
-              builder, loc, meta.getBaseBuffer());
-          offset = meta.getOffset();
-        }
+        // Result types: [base_memref, offset, stride0, stride1, ...,
+        // strideN-1, size0, size1, ..., sizeN-1]
+        SmallVector<Type> resultTypes{
+            MemRefType::get({}, memrefTy.getElementType(),
+                            MemRefLayoutAttrInterface(),
+                            memrefTy.getMemorySpace()),
+            indexType};
+        // strides + sizes
+        resultTypes.append(2 * rank, indexType);
+
+        auto meta = memref::ExtractStridedMetadataOp::create(
+            builder, loc, resultTypes, input);
+
+        addr = memref::ExtractAlignedPointerAsIndexOp::create(
+            builder, loc, meta.getBaseBuffer());
+        offset = meta.getOffset();
 
         auto addrCasted =
             arith::IndexCastUIOp::create(builder, loc, type, addr);
diff --git a/mlir/lib/Dialect/AMDGPU/IR/AMDGPUOps.cpp b/mlir/lib/Dialect/AMDGPU/IR/AMDGPUOps.cpp
index faee30e70ad9d..7783515908da9 100644
--- a/mlir/lib/Dialect/AMDGPU/IR/AMDGPUOps.cpp
+++ b/mlir/lib/Dialect/AMDGPU/IR/AMDGPUOps.cpp
@@ -227,11 +227,11 @@ static bool staticallyOutOfBounds(OpType op) {
   MemRefType bufferType = op.getMemref().getType();
   if (!bufferType.hasStaticShape())
     return false;
-  int64_t offset;
+  // Offset is no longer carried by the MemRef type; treat as 0 here.
   SmallVector<int64_t> strides;
-  if (failed(bufferType.getStridesAndOffset(strides, offset)))
+  if (failed(bufferType.getStrides(strides)))
     return false;
-  int64_t result = offset + op.getIndexOffset().value_or(0);
+  int64_t result = op.getIndexOffset().value_or(0);
   if (op.getSgprOffset()) {
     std::optional<uint32_t> sgprOffset = getConstantUint32(op.getSgprOffset());
     if (!sgprOffset)
diff --git a/mlir/lib/Dialect/Bufferization/IR/BufferizationOps.cpp b/mlir/lib/Dialect/Bufferization/IR/BufferizationOps.cpp
index c525ec116f699..7bfc8b60a6301 100644
--- a/mlir/lib/Dialect/Bufferization/IR/BufferizationOps.cpp
+++ b/mlir/lib/Dialect/Bufferization/IR/BufferizationOps.cpp
@@ -39,16 +39,13 @@ FailureOr<Value> mlir::bufferization::castOrReallocMemRefValue(
   // from dynamic to static offset or stride (the canonicalization cannot know
   // at this point that it is really cast compatible).
   auto isGuaranteedCastCompatible = [](MemRefType source, MemRefType target) {
-    int64_t sourceOffset, targetOffset;
     SmallVector<int64_t, 4> sourceStrides, targetStrides;
-    if (failed(source.getStridesAndOffset(sourceStrides, sourceOffset)) ||
-        failed(target.getStridesAndOffset(targetStrides, targetOffset)))
+    if (failed(source.getStrides(sourceStrides)) ||
+        failed(target.getStrides(targetStrides)))
       return false;
     auto dynamicToStatic = [](int64_t a, int64_t b) {
       return ShapedType::isDynamic(a) && ShapedType::isStatic(b);
     };
-    if (dynamicToStatic(sourceOffset, targetOffset))
-      return false;
     for (auto it : zip(sourceStrides, targetStrides))
       if (dynamicToStatic(std::get<0>(it), std::get<1>(it)))
         return false;
diff --git a/mlir/lib/Dialect/Bufferization/Transforms/BufferResultsToOutParams.cpp b/mlir/lib/Dialect/Bufferization/Transforms/BufferResultsToOutParams.cpp
index 90ac2485058ec..4fbb025c1196c 100644
--- a/mlir/lib/Dialect/Bufferization/Transforms/BufferResultsToOutParams.cpp
+++ b/mlir/lib/Dialect/Bufferization/Transforms/BufferResultsToOutParams.cpp
@@ -28,15 +28,13 @@ using AllocDynamicSizesMap =
 
 /// Return `true` if the given MemRef type has a fully dynamic layout.
 static bool hasFullyDynamicLayoutMap(MemRefType type) {
-  int64_t offset;
   SmallVector<int64_t, 4> strides;
-  if (failed(type.getStridesAndOffset(strides, offset)))
+  if (failed(type.getStrides(strides)))
     return false;
   if (!llvm::all_of(strides, ShapedType::isDynamic))
     return false;
   // The type no longer carries a static offset; the strides being all dynamic
   // is enough to consider this a fully dynamic layout.
-  (void)offset;
   return true;
 }
 
diff --git a/mlir/lib/Dialect/GPU/Transforms/DecomposeMemRefs.cpp b/mlir/lib/Dialect/GPU/Transforms/DecomposeMemRefs.cpp
index 4a21095b35566..d7f9f6f783368 100644
--- a/mlir/lib/Dialect/GPU/Transforms/DecomposeMemRefs.cpp
+++ b/mlir/lib/Dialect/GPU/Transforms/DecomposeMemRefs.cpp
@@ -61,15 +61,17 @@ getFlatOffsetAndStrides(OpBuilder &rewriter, Location loc, Value source,
         memref::ExtractStridedMetadataOp::create(rewriter, loc, source);
   }
 
-  auto &&[sourceStrides, sourceOffset] = sourceType.getStridesAndOffset();
+  auto sourceStrides = sourceType.getStrides();
 
   auto getDim = [&](int64_t dim, Value dimVal) -> OpFoldResult {
     return ShapedType::isDynamic(dim) ? getAsOpFoldResult(dimVal)
                                       : rewriter.getIndexAttr(dim);
   };
 
+  // Offset is no longer carried by the type; always use the runtime offset
+  // from extract_strided_metadata.
   OpFoldResult origOffset =
-      getDim(sourceOffset, newExtractStridedMetadata.getOffset());
+      getAsOpFoldResult(newExtractStridedMetadata.getOffset());
   ValueRange sourceStridesVals = newExtractStridedMetadata.getStrides();
 
   SmallVector<OpFoldResult> origStrides;
diff --git a/mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp b/mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
index 16396a939517c..602f851877736 100644
--- a/mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
+++ b/mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
@@ -703,10 +703,9 @@ bool CastOp::canFoldIntoConsumerOp(CastOp castOp) {
     return false;
 
   // Only fold casts between strided memref forms.
-  int64_t sourceOffset, resultOffset;
   SmallVector<int64_t, 4> sourceStrides, resultStrides;
-  if (failed(sourceType.getStridesAndOffset(sourceStrides, sourceOffset)) ||
-      failed(resultType.getStridesAndOffset(resultStrides, resultOffset)))
+  if (failed(sourceType.getStrides(sourceStrides)) ||
+      failed(resultType.getStrides(resultStrides)))
     return false;
 
   // If cast is towards more static sizes along any dimension, don't fold.
@@ -746,23 +745,20 @@ bool CastOp::areCastCompatible(TypeRange inputs, TypeRange outputs) {
     if (aT.getElementType() != bT.getElementType())
       return false;
     if (aT.getLayout() != bT.getLayout()) {
-      int64_t aOffset, bOffset;
       SmallVector<int64_t, 4> aStrides, bStrides;
-      if (failed(aT.getStridesAndOffset(aStrides, aOffset)) ||
-          failed(bT.getStridesAndOffset(bStrides, bOffset)) ||
+      if (failed(aT.getStrides(aStrides)) ||
+          failed(bT.getStrides(bStrides)) ||
           aStrides.size() != bStrides.size())
         return false;
 
-      // Strides along a dimension/offset are compatible if the value in the
-      // source memref is static and the value in the target memref is the
-      // same. They are also compatible if either one is dynamic (see
-      // description of MemRefCastOp for details).
-      // Note that for dimensions of size 1, the stride can differ.
+      // Strides along a dimension are compatible if the value in the source
+      // memref is static and the value in the target memref is the same. They
+      // are also compatible if either one is dynamic (see description of
+      // MemRefCastOp for details). Note that for dimensions of size 1, the
+      // stride can differ. Offset is no longer carried by the type.
       auto checkCompatible = [](int64_t a, int64_t b) {
         return (ShapedType::isDynamic(a) || ShapedType::isDynamic(b) || a == b);
       };
-      if (!checkCompatible(aOffset, bOffset))
-        return false;
       for (const auto &[index, aStride] : enumerate(aStrides)) {
         if (aT.getDimSize(index) == 1 || bT.getDimSize(index) == 1)
           continue;
@@ -1067,11 +1063,8 @@ computeMemRefRankReductionMask(MemRefType originalType, MemRefType reducedType,
     return unusedDims;
 
   SmallVector<int64_t> originalStrides, candidateStrides;
-  int64_t originalOffset, candidateOffset;
-  if (failed(
-          originalType.getStridesAndOffset(originalStrides, originalOffset)) ||
-      failed(
-          reducedType.getStridesAndOffset(candidateStrides, candidateOffset)))
+  if (failed(originalType.getStrides(originalStrides)) ||
+      failed(reducedType.getStrides(candidateStrides)))
     return failure();
 
   // Try stride-based first when we have meaningful static stride info
@@ -1560,9 +1553,7 @@ SmallVector<OpFoldResult>
 ExtractStridedMetadataOp::getConstifiedMixedStrides() {
   SmallVector<OpFoldResult> values = getAsOpFoldResult(getStrides());
   SmallVector<int64_t> staticValues;
-  int64_t unused;
-  LogicalResult status =
-      getSource().getType().getStridesAndOffset(staticValues, unused);
+  LogicalResult status = getSource().getType().getStrides(staticValues);
   (void)status;
   assert(succeeded(status) && "could not get strides from type");
   constifyIndexValues(values, staticValues);
@@ -2101,12 +2092,10 @@ LogicalResult ReinterpretCastOp::verify() {
   // Match strides in static_strides attribute. The result type no longer
   // carries an offset, so the static_offsets attribute is the sole carrier of
   // offset information for this op and is not cross-checked here.
-  int64_t resultOffset;
   SmallVector<int64_t, 4> resultStrides;
-  if (failed(resultType.getStridesAndOffset(resultStrides, resultOffset)))
+  if (failed(resultType.getStrides(resultStrides)))
     return emitError("expected result type to have strided layout but found ")
            << resultType;
-  (void)resultOffset;
 
   // Match strides in result memref type and in static_strides attribute.
   for (auto [idx, resultStride, expectedStride] :
@@ -2165,8 +2154,7 @@ SmallVector<OpFoldResult> ReinterpretCastOp::getConstifiedMixedSizes() {
 SmallVector<OpFoldResult> ReinterpretCastOp::getConstifiedMixedStrides() {
   SmallVector<OpFoldResult> values = getMixedStrides();
   SmallVector<int64_t> staticValues;
-  int64_t unused;
-  LogicalResult status = getType().getStridesAndOffset(staticValues, unused);
+  LogicalResult status = getType().getStrides(staticValues);
   (void)status;
   assert(succeeded(status) && "could not get strides from type");
   constifyIndexValues(values, staticValues);
@@ -2483,9 +2471,8 @@ SmallVector<ReassociationExprs, 4> ExpandShapeOp::getReassociationExprs() {
 static FailureOr<StridedLayoutAttr>
 computeExpandedLayoutMap(MemRefType srcType, ArrayRef<int64_t> resultShape,
                          ArrayRef<ReassociationIndices> reassociation) {
-  int64_t srcOffset;
   SmallVector<int64_t> srcStrides;
-  if (failed(srcType.getStridesAndOffset(srcStrides, srcOffset)))
+  if (failed(srcType.getStrides(srcStrides)))
     return failure();
   assert(srcStrides.size() == reassociation.size() && "invalid reassociation");
 
@@ -2756,10 +2743,9 @@ static FailureOr<StridedLayoutAttr>
 computeCollapsedLayoutMap(MemRefType srcType,
                           ArrayRef<ReassociationIndices> reassociation,
                           bool strict = false) {
-  int64_t srcOffset;
   SmallVector<int64_t> srcStrides;
   auto srcShape = srcType.getShape();
-  if (failed(srcType.getStridesAndOffset(srcStrides, srcOffset)))
+  if (failed(srcType.getStrides(srcStrides)))
     return failure();
 
   // The result stride of a reassociation group is the stride of the last entry
@@ -3091,8 +3077,7 @@ MemRefType SubViewOp::inferResultType(MemRefType sourceMemRefType,
   assert(staticStrides.size() == rank && "staticStrides length mismatch");
 
   // Extract source strides (offset is no longer carried by the type).
-  auto [sourceStrides, sourceOffset] = sourceMemRefType.getStridesAndOffset();
-  (void)sourceOffset;
+  auto sourceStrides = sourceMemRefType.getStrides();
 
   // Compute target stride whose value is:
   //   `sourceStrides_i * staticStrides_i`.
@@ -3275,16 +3260,6 @@ void SubViewOp::build(OpBuilder &b, OperationState &result, Value source,
 /// For ViewLikeOpInterface.
 Value SubViewOp::getViewSource() { return getSource(); }
 
-/// Return true if `t1` and `t2` have equal offsets (both dynamic or of same
-/// static value).
-static bool haveCompatibleOffsets(MemRefType t1, MemRefType t2) {
-  int64_t t1Offset, t2Offset;
-  SmallVector<int64_t> t1Strides, t2Strides;
-  auto res1 = t1.getStridesAndOffset(t1Strides, t1Offset);
-  auto res2 = t2.getStridesAndOffset(t2Strides, t2Offset);
-  return succeeded(res1) && succeeded(res2) && t1Offset == t2Offset;
-}
-
 /// Return true if `t1` and `t2` have equal strides (both dynamic or of same
 /// static value). Dimensions of `t1` may be dropped in `t2`; these must be
 /// marked as dropped in `droppedDims`.
@@ -3294,10 +3269,9 @@ static bool haveCompatibleStrides(MemRefType t1, MemRefType t2,
          "incorrect number of bits");
   assert(size_t(t1.getRank() - t2.getRank()) == droppedDims.count() &&
          "incorrect number of dropped dims");
-  int64_t t1Offset, t2Offset;
   SmallVector<int64_t> t1Strides, t2Strides;
-  auto res1 = t1.getStridesAndOffset(t1Strides, t1Offset);
-  auto res2 = t2.getStridesAndOffset(t2Strides, t2Offset);
+  auto res1 = t1.getStrides(t1Strides);
+  auto res2 = t2.getStrides(t2Strides);
   if (failed(res1) || failed(res2))
     return false;
   for (int64_t i = 0, j = 0, e = t1.getRank(); i < e; ++i) {
@@ -3376,10 +3350,7 @@ LogicalResult SubViewOp::verify() {
     return produceSubViewErrorMsg(SliceVerificationResult::MemSpaceMismatch,
                                   *this, expectedType);
 
-  // Verify the offset of the layout map.
-  if (!haveCompatibleOffsets(expectedType, subViewType))
-    return produceSubViewErrorMsg(SliceVerificationResult::LayoutMismatch,
-                                  *this, expectedType);
+  // Offset is no longer carried by the MemRef type.
 
   // The only thing that's left to verify now are the strides. First, compute
   // the unused dimensions due to rank reductions. We have to look at sizes and
@@ -3643,8 +3614,8 @@ struct SubViewReturnTypeCanonicalizer {
     if (droppedDims.none())
       return nonReducedType;
 
-    // Take the strides and offset from the non-rank reduced type.
-    auto [nonReducedStrides, offset] = nonReducedType.getStridesAndOffset();
+    // Take the strides from the non-rank reduced type.
+    auto nonReducedStrides = nonReducedType.getStrides();
 
     // Drop dims from shape and strides.
     SmallVector<int64_t> targetShape;
@@ -3786,8 +3757,7 @@ void TransposeOp::getAsmResultNames(
 static MemRefType inferTransposeResultType(MemRefType memRefType,
                                            AffineMap permutationMap) {
   auto originalSizes = memRefType.getShape();
-  auto [originalStrides, offset] = memRefType.getStridesAndOffset();
-  (void)offset;
+  auto originalStrides = memRefType.getStrides();
   assert(originalStrides.size() == static_cast<unsigned>(memRefType.getRank()));
 
   // Compute permuted sizes and strides.
diff --git a/mlir/lib/Dialect/MemRef/Transforms/EmulateNarrowType.cpp b/mlir/lib/Dialect/MemRef/Transforms/EmulateNarrowType.cpp
index d86c3a9448c28..68cb61bf8ad81 100644
--- a/mlir/lib/Dialect/MemRef/Transforms/EmulateNarrowType.cpp
+++ b/mlir/lib/Dialect/MemRef/Transforms/EmulateNarrowType.cpp
@@ -676,8 +676,7 @@ void memref::populateMemRefNarrowTypeEmulationConversions(
 
         // Currently only handle innermost stride being 1, checking
         SmallVector<int64_t> strides;
-        int64_t offset;
-        if (failed(ty.getStridesAndOffset(strides, offset)))
+        if (failed(ty.getStrides(strides)))
           return nullptr;
         if (!strides.empty() && strides.back() != 1)
           return nullptr;
diff --git a/mlir/lib/Dialect/MemRef/Transforms/ExpandStridedMetadata.cpp b/mlir/lib/Dialect/MemRef/Transforms/ExpandStridedMetadata.cpp
index 265df32b49b8a..14b37f874f62b 100644
--- a/mlir/lib/Dialect/MemRef/Transforms/ExpandStridedMetadata.cpp
+++ b/mlir/lib/Dialect/MemRef/Transforms/ExpandStridedMetadata.cpp
@@ -68,10 +68,9 @@ resolveSubviewStridedMetadata(RewriterBase &rewriter,
   auto newExtractStridedMetadata =
       memref::ExtractStridedMetadataOp::create(rewriter, origLoc, source);
 
-  auto [sourceStrides, sourceOffset] = sourceType.getStridesAndOffset();
-  (void)sourceOffset;
+  auto sourceStrides = sourceType.getStrides();
 #ifndef NDEBUG
-  auto [resultStrides, resultOffset] = subview.getType().getStridesAndOffset();
+  auto resultStrides = subview.getType().getStrides();
 #endif // NDEBUG
 
   // Compute the new strides and offset from the base strides and offset:
@@ -115,7 +114,6 @@ resolveSubviewStridedMetadata(RewriterBase &rewriter,
   // Compute the offset.
   OpFoldResult finalOffset =
       makeComposedFoldedAffineApply(rewriter, origLoc, expr, values);
-  (void)resultOffset;
 
   // The final result is  <baseBuffer, offset, sizes, strides>.
   // Thus we need 1 + 1 + subview.getRank() + subview.getRank(), to hold all
@@ -314,7 +312,7 @@ SmallVector<OpFoldResult> getExpandedStrides(memref::ExpandShapeOp expandShape,
   // Collect the statically known information about the original stride.
   Value source = expandShape.getSrc();
   auto sourceType = cast<MemRefType>(source.getType());
-  auto [strides, offset] = sourceType.getStridesAndOffset();
+  auto strides = sourceType.getStrides();
 
   OpFoldResult origStride = ShapedType::isDynamic(strides[groupId])
                                 ? origStrides[groupId]
@@ -430,7 +428,7 @@ getCollapsedStride(memref::CollapseShapeOp collapseShape, OpBuilder &builder,
   Value source = collapseShape.getSrc();
   auto sourceType = cast<MemRefType>(source.getType());
 
-  auto [strides, offset] = sourceType.getStridesAndOffset();
+  auto strides = sourceType.getStrides();
 
   ArrayRef<int64_t> srcShape = sourceType.getShape();
 
@@ -453,8 +451,7 @@ getCollapsedStride(memref::CollapseShapeOp collapseShape, OpBuilder &builder,
     // We're dealing with a 1x1x...x1 shape. The stride is meaningless,
     // but we still have to make the type system happy.
     MemRefType collapsedType = collapseShape.getResultType();
-    auto [collapsedStrides, collapsedOffset] =
-        collapsedType.getStridesAndOffset();
+    auto collapsedStrides = collapsedType.getStrides();
     int64_t finalStride = collapsedStrides[groupId];
     if (ShapedType::isDynamic(finalStride)) {
       // Look for a dynamic stride. At this point we don't know which one is
@@ -507,8 +504,7 @@ static FailureOr<StridedMetadata> resolveReshapeStridedMetadata(
       memref::ExtractStridedMetadataOp::create(rewriter, origLoc, source);
 
   // Collect statically known information.
-  auto [strides, offset] = sourceType.getStridesAndOffset();
-  (void)offset;
+  auto strides = sourceType.getStrides();
   MemRefType reshapeType = reshape.getResultType();
   unsigned reshapeRank = reshapeType.getRank();
 
diff --git a/mlir/lib/Dialect/MemRef/Transforms/FlattenMemRefs.cpp b/mlir/lib/Dialect/MemRef/Transforms/FlattenMemRefs.cpp
index b47a16f9f4ea5..67273c605a5ef 100644
--- a/mlir/lib/Dialect/MemRef/Transforms/FlattenMemRefs.cpp
+++ b/mlir/lib/Dialect/MemRef/Transforms/FlattenMemRefs.cpp
@@ -52,10 +52,9 @@ static std::pair<Value, Value> getFlattenMemrefAndOffset(OpBuilder &rewriter,
                                                          Location loc,
                                                          Value source,
                                                          ValueRange indices) {
-  int64_t sourceOffset;
   SmallVector<int64_t, 4> sourceStrides;
   auto sourceType = cast<MemRefType>(source.getType());
-  if (failed(sourceType.getStridesAndOffset(sourceStrides, sourceOffset))) {
+  if (failed(sourceType.getStrides(sourceStrides))) {
     assert(false);
   }
 
@@ -230,12 +229,9 @@ struct AllocLikeFlattenPattern : public OpRewritePattern<AllocLikeOp> {
 
     SmallVector<OpFoldResult> sizes = op.getMixedSizes();
 
-    int64_t staticOffset;
     SmallVector<int64_t> staticStrides;
-    if (failed(memrefType.getStridesAndOffset(staticStrides, staticOffset)))
+    if (failed(memrefType.getStrides(staticStrides)))
       return failure();
-    if (staticOffset == ShapedType::kDynamic)
-      return rewriter.notifyMatchFailure(op, "dynamic offset not supported");
     SmallVector<OpFoldResult> strides;
     strides.reserve(staticStrides.size());
     for (int64_t stride : staticStrides) {
@@ -255,17 +251,10 @@ struct AllocLikeFlattenPattern : public OpRewritePattern<AllocLikeOp> {
             sizes, strides);
     (void)linearizedOffset;
 
-    // The total allocation must cover [0, staticOffset + linearizedExtent).
-    // When the offset is non-zero, add it to the computed extent so that the
-    // buffer is large enough for elements accessed at positions
-    // [staticOffset, staticOffset + linearizedExtent).
+    // Offset is no longer carried by the MemRef type, so the allocation
+    // covers [0, linearizedExtent) and the reinterpret_cast below uses
+    // offset 0.
     OpFoldResult flatSizeOfr = linearizedInfo.linearizedSize;
-    if (staticOffset != 0) {
-      AffineExpr s0;
-      bindSymbols(rewriter.getContext(), s0);
-      flatSizeOfr = affine::makeComposedFoldedAffineApply(
-          rewriter, loc, s0 + staticOffset, {flatSizeOfr});
-    }
 
     // Build the flat 1-D MemRefType. The linearized size may be static or
     // dynamic (OpFoldResult of either IntegerAttr or a Value).
@@ -287,8 +276,8 @@ struct AllocLikeFlattenPattern : public OpRewritePattern<AllocLikeOp> {
     auto newOp = AllocLikeOp::create(rewriter, loc, flatMemrefType, dynSizes,
                                      op.getAlignmentAttr());
     rewriter.replaceOpWithNewOp<memref::ReinterpretCastOp>(
-        op, cast<MemRefType>(op.getType()), newOp,
-        rewriter.getIndexAttr(staticOffset), sizes, strides);
+        op, cast<MemRefType>(op.getType()), newOp, rewriter.getIndexAttr(0),
+        sizes, strides);
     return success();
   }
 };
diff --git a/mlir/lib/Dialect/MemRef/Transforms/RuntimeOpVerification.cpp b/mlir/lib/Dialect/MemRef/Transforms/RuntimeOpVerification.cpp
index 1ca297c7055b7..d6be69aa2136e 100644
--- a/mlir/lib/Dialect/MemRef/Transforms/RuntimeOpVerification.cpp
+++ b/mlir/lib/Dialect/MemRef/Transforms/RuntimeOpVerification.cpp
@@ -124,11 +124,9 @@ struct CastOpInterface
     }
 
     // Get result strides. Offset is no longer carried by the memref type.
-    int64_t resultOffset;
     SmallVector<int64_t> resultStrides;
-    if (failed(resultType.getStridesAndOffset(resultStrides, resultOffset)))
+    if (failed(resultType.getStrides(resultStrides)))
       return;
-    (void)resultOffset;
 
     // Check strides.
     for (const auto &it : llvm::enumerate(resultStrides)) {
diff --git a/mlir/lib/Dialect/MemRef/Utils/MemRefUtils.cpp b/mlir/lib/Dialect/MemRef/Utils/MemRefUtils.cpp
index cf126cd85ddce..1151fa678bec0 100644
--- a/mlir/lib/Dialect/MemRef/Utils/MemRefUtils.cpp
+++ b/mlir/lib/Dialect/MemRef/Utils/MemRefUtils.cpp
@@ -25,8 +25,7 @@ bool isStaticShapeAndContiguousRowMajor(MemRefType type) {
     return false;
 
   SmallVector<int64_t> strides;
-  int64_t offset;
-  if (failed(type.getStridesAndOffset(strides, offset)))
+  if (failed(type.getStrides(strides)))
     return false;
 
   // MemRef is contiguous if outer dimensions are size-1 and inner
diff --git a/mlir/lib/Dialect/NVGPU/Utils/MMAUtils.cpp b/mlir/lib/Dialect/NVGPU/Utils/MMAUtils.cpp
index 9e5ea93769cdc..c103ce0e49327 100644
--- a/mlir/lib/Dialect/NVGPU/Utils/MMAUtils.cpp
+++ b/mlir/lib/Dialect/NVGPU/Utils/MMAUtils.cpp
@@ -289,7 +289,7 @@ bool nvgpu::canLowerToWarpMatrixOperation(vector::TransferReadOp op) {
   // Check that the last dimension of the read is contiguous. Note that it is
   // possible to expand support for this by scalarizing all the loads during
   // conversion.
-  auto [strides, offset] = sourceType.getStridesAndOffset();
+  auto strides = sourceType.getStrides();
   return strides.back() == 1;
 }
 
@@ -313,6 +313,6 @@ bool nvgpu::canLowerToWarpMatrixOperation(vector::TransferWriteOp op) {
   // Check that the last dimension of the target memref is contiguous. Note that
   // it is possible to expand support for this by scalarizing all the stores
   // during conversion.
-  auto [strides, offset] = sourceType.getStridesAndOffset();
+  auto strides = sourceType.getStrides();
   return strides.back() == 1;
 }
diff --git a/mlir/lib/Dialect/SPIRV/Transforms/SPIRVConversion.cpp b/mlir/lib/Dialect/SPIRV/Transforms/SPIRVConversion.cpp
index 2c9e9c040d460..58339e80a1d17 100644
--- a/mlir/lib/Dialect/SPIRV/Transforms/SPIRVConversion.cpp
+++ b/mlir/lib/Dialect/SPIRV/Transforms/SPIRVConversion.cpp
@@ -206,11 +206,11 @@ getTypeNumBytes(const SPIRVConversionOptions &options, Type type) {
 
   if (auto memRefType = dyn_cast<MemRefType>(type)) {
     // TODO: Layout should also be controlled by the ABI attributes. For now
-    // using the layout from MemRef.
-    int64_t offset;
+    // using the layout from MemRef. Offset is no longer carried by the type;
+    // the runtime offset is treated as 0 for sizing purposes here.
     SmallVector<int64_t, 4> strides;
     if (!memRefType.hasStaticShape() ||
-        failed(memRefType.getStridesAndOffset(strides, offset)))
+        failed(memRefType.getStrides(strides)))
       return std::nullopt;
 
     // To get the size of the memref object in memory, the total size is the
@@ -225,7 +225,6 @@ getTypeNumBytes(const SPIRVConversionOptions &options, Type type) {
 
     auto dims = memRefType.getShape();
     if (llvm::is_contained(dims, ShapedType::kDynamic) ||
-        ShapedType::isDynamic(offset) ||
         llvm::is_contained(strides, ShapedType::kDynamic))
       return std::nullopt;
 
@@ -233,7 +232,7 @@ getTypeNumBytes(const SPIRVConversionOptions &options, Type type) {
     for (const auto &shape : enumerate(dims))
       memrefSize = std::max(memrefSize, shape.value() * strides[shape.index()]);
 
-    return (offset + memrefSize) * *elementSize;
+    return memrefSize * *elementSize;
   }
 
   if (auto tensorType = dyn_cast<TensorType>(type)) {
@@ -1361,13 +1360,12 @@ Value mlir::spirv::getVulkanElementPtr(const SPIRVTypeConverter &typeConverter,
                                        MemRefType baseType, Value basePtr,
                                        ValueRange indices, Location loc,
                                        OpBuilder &builder) {
-  // Get base and offset of the MemRefType and verify they are static.
+  // Get strides of the MemRefType and verify they are static. Offset is no
+  // longer carried by the type and is treated as 0 here.
 
-  int64_t offset;
   SmallVector<int64_t, 4> strides;
-  if (failed(baseType.getStridesAndOffset(strides, offset)) ||
-      llvm::is_contained(strides, ShapedType::kDynamic) ||
-      ShapedType::isDynamic(offset)) {
+  if (failed(baseType.getStrides(strides)) ||
+      llvm::is_contained(strides, ShapedType::kDynamic)) {
     return nullptr;
   }
 
@@ -1383,7 +1381,7 @@ Value mlir::spirv::getVulkanElementPtr(const SPIRVTypeConverter &typeConverter,
     linearizedIndices.push_back(zero);
   } else {
     linearizedIndices.push_back(
-        linearizeIndex(indices, strides, offset, indexType, loc, builder));
+        linearizeIndex(indices, strides, /*offset=*/0, indexType, loc, builder));
   }
   return spirv::AccessChainOp::create(builder, loc, basePtr, linearizedIndices);
 }
@@ -1392,13 +1390,12 @@ Value mlir::spirv::getOpenCLElementPtr(const SPIRVTypeConverter &typeConverter,
                                        MemRefType baseType, Value basePtr,
                                        ValueRange indices, Location loc,
                                        OpBuilder &builder) {
-  // Get base and offset of the MemRefType and verify they are static.
+  // Get strides of the MemRefType and verify they are static. Offset is no
+  // longer carried by the type and is treated as 0 here.
 
-  int64_t offset;
   SmallVector<int64_t, 4> strides;
-  if (failed(baseType.getStridesAndOffset(strides, offset)) ||
-      llvm::is_contained(strides, ShapedType::kDynamic) ||
-      ShapedType::isDynamic(offset)) {
+  if (failed(baseType.getStrides(strides)) ||
+      llvm::is_contained(strides, ShapedType::kDynamic)) {
     return nullptr;
   }
 
@@ -1410,7 +1407,7 @@ Value mlir::spirv::getOpenCLElementPtr(const SPIRVTypeConverter &typeConverter,
     linearIndex = spirv::ConstantOp::getZero(indexType, loc, builder);
   } else {
     linearIndex =
-        linearizeIndex(indices, strides, offset, indexType, loc, builder);
+        linearizeIndex(indices, strides, /*offset=*/0, indexType, loc, builder);
   }
   Type pointeeType =
       cast<spirv::PointerType>(basePtr.getType()).getPointeeType();
diff --git a/mlir/lib/Dialect/Tensor/Transforms/BufferizableOpInterfaceImpl.cpp b/mlir/lib/Dialect/Tensor/Transforms/BufferizableOpInterfaceImpl.cpp
index b80bfdad2e848..a44cb6c7e4579 100644
--- a/mlir/lib/Dialect/Tensor/Transforms/BufferizableOpInterfaceImpl.cpp
+++ b/mlir/lib/Dialect/Tensor/Transforms/BufferizableOpInterfaceImpl.cpp
@@ -192,8 +192,7 @@ struct CollapseShapeOpInterface
         // Source memref has a layout map: result keeps a strided layout but
         // carries no static offset (offsets live on ops, not the type).
         SmallVector<int64_t> strides;
-        int64_t offset;
-        if (failed(bufferType.getStridesAndOffset(strides, offset)))
+        if (failed(bufferType.getStrides(strides)))
           return failure();
         resultType = MemRefType::get(
             {}, tensorResultType.getElementType(),
diff --git a/mlir/lib/Dialect/Vector/Transforms/VectorTransferSplitRewritePatterns.cpp b/mlir/lib/Dialect/Vector/Transforms/VectorTransferSplitRewritePatterns.cpp
index 0b28fcf848fc8..2811618b1d779 100644
--- a/mlir/lib/Dialect/Vector/Transforms/VectorTransferSplitRewritePatterns.cpp
+++ b/mlir/lib/Dialect/Vector/Transforms/VectorTransferSplitRewritePatterns.cpp
@@ -138,10 +138,8 @@ static MemRefType getCastCompatibleMemRefType(MemRefType aT, MemRefType bT) {
     return aT;
   if (aT.getRank() != bT.getRank())
     return MemRefType();
-  int64_t aOffset, bOffset;
   SmallVector<int64_t, 4> aStrides, bStrides;
-  if (failed(aT.getStridesAndOffset(aStrides, aOffset)) ||
-      failed(bT.getStridesAndOffset(bStrides, bOffset)) ||
+  if (failed(aT.getStrides(aStrides)) || failed(bT.getStrides(bStrides)) ||
       aStrides.size() != bStrides.size())
     return MemRefType();
 
diff --git a/mlir/lib/Dialect/Vector/Transforms/VectorTransforms.cpp b/mlir/lib/Dialect/Vector/Transforms/VectorTransforms.cpp
index 752610efc6992..c9584117704de 100644
--- a/mlir/lib/Dialect/Vector/Transforms/VectorTransforms.cpp
+++ b/mlir/lib/Dialect/Vector/Transforms/VectorTransforms.cpp
@@ -1537,8 +1537,7 @@ struct FoldI1Select : public OpRewritePattern<arith::SelectOp> {
 static FailureOr<size_t>
 getTransferFoldableInnerUnitDims(MemRefType srcType, VectorType vectorType) {
   SmallVector<int64_t> srcStrides;
-  int64_t srcOffset;
-  if (failed(srcType.getStridesAndOffset(srcStrides, srcOffset)))
+  if (failed(srcType.getStrides(srcStrides)))
     return failure();
 
   auto isUnitDim = [](VectorType type, int dim) {
diff --git a/mlir/lib/Dialect/X86/IR/X86Dialect.cpp b/mlir/lib/Dialect/X86/IR/X86Dialect.cpp
index b186652aaa866..45ca6c41d5f65 100644
--- a/mlir/lib/Dialect/X86/IR/X86Dialect.cpp
+++ b/mlir/lib/Dialect/X86/IR/X86Dialect.cpp
@@ -181,7 +181,7 @@ static Value inferStride(Location loc, MemRefType mType, Value base,
   unsigned width = mType.getElementType().getIntOrFloatBitWidth();
   assert(llvm::isPowerOf2_64(width) && width >= 8);
   unsigned bytes = width >> 3;
-  auto [strides, offset] = mType.getStridesAndOffset();
+  auto strides = mType.getStrides();
   if (strides[preLast] == ShapedType::kDynamic) {
     // Dynamic stride needs code to compute the stride at runtime.
     MemRefDescriptor memrefDescriptor(base);
@@ -221,9 +221,7 @@ static LogicalResult tileTransferVerifier(OpTy op) {
     if (rank < 2)
       return op.emitOpError("requires at least 2D memref");
     SmallVector<int64_t> strides;
-    int64_t offset;
-    if (failed(memrefTy.getStridesAndOffset(strides, offset)) ||
-        strides.back() != 1)
+    if (failed(memrefTy.getStrides(strides)) || strides.back() != 1)
       return op.emitOpError("requires memref with unit innermost stride");
   }
 
diff --git a/mlir/lib/Dialect/XeGPU/IR/XeGPUOps.cpp b/mlir/lib/Dialect/XeGPU/IR/XeGPUOps.cpp
index 51ce6ce53a2fe..e04ebcfbd0040 100644
--- a/mlir/lib/Dialect/XeGPU/IR/XeGPUOps.cpp
+++ b/mlir/lib/Dialect/XeGPU/IR/XeGPUOps.cpp
@@ -235,7 +235,7 @@ void CreateNdDescOp::build(OpBuilder &builder, OperationState &state,
 
   if (auto memrefTy = dyn_cast<MemRefType>(srcTy)) {
     auto memrefShape = memrefTy.getShape();
-    auto [memrefStrides, _] = memrefTy.getStridesAndOffset();
+    auto memrefStrides = memrefTy.getStrides();
 
     // if shape and strides are from Memref, we don't need attributes for them
     // to keep the IR print clean (only do so for full-static case, otherwise
@@ -299,7 +299,7 @@ void CreateNdDescOp::build(OpBuilder &builder, OperationState &state,
 
   if (auto memrefTy = dyn_cast<MemRefType>(srcTy)) {
     auto memrefShape = memrefTy.getShape();
-    auto [memrefStrides, _] = memrefTy.getStridesAndOffset();
+    auto memrefStrides = memrefTy.getStrides();
 
     // if shape and strides are from Memref, we don't need attributes for them
     // to keep the IR print clean (only do so for full-static case, otherwise
diff --git a/mlir/lib/IR/BuiltinAttributeInterfaces.cpp b/mlir/lib/IR/BuiltinAttributeInterfaces.cpp
index 9e8ce4ca3a902..ae39833f2cebc 100644
--- a/mlir/lib/IR/BuiltinAttributeInterfaces.cpp
+++ b/mlir/lib/IR/BuiltinAttributeInterfaces.cpp
@@ -199,17 +199,13 @@ static LogicalResult getStridesAndOffset(AffineMap m, ArrayRef<int64_t> shape,
   return success();
 }
 
-LogicalResult mlir::detail::getAffineMapStridesAndOffset(
-    AffineMap map, ArrayRef<int64_t> shape, SmallVectorImpl<int64_t> &strides,
-    int64_t &offset) {
+LogicalResult
+mlir::detail::getAffineMapStrides(AffineMap map, ArrayRef<int64_t> shape,
+                                  SmallVectorImpl<int64_t> &strides) {
   AffineExpr offsetExpr;
   SmallVector<AffineExpr, 4> strideExprs;
   if (failed(::getStridesAndOffset(map, shape, strideExprs, offsetExpr)))
     return failure();
-  if (auto cst = llvm::dyn_cast<AffineConstantExpr>(offsetExpr))
-    offset = cst.getValue();
-  else
-    offset = ShapedType::kDynamic;
   for (auto e : strideExprs) {
     if (auto c = llvm::dyn_cast<AffineConstantExpr>(e))
       strides.push_back(c.getValue());
diff --git a/mlir/lib/IR/BuiltinAttributes.cpp b/mlir/lib/IR/BuiltinAttributes.cpp
index 10cc732cfc5d6..d4ef08a87fa64 100644
--- a/mlir/lib/IR/BuiltinAttributes.cpp
+++ b/mlir/lib/IR/BuiltinAttributes.cpp
@@ -265,15 +265,9 @@ LogicalResult StridedLayoutAttr::verifyLayout(
 }
 
 LogicalResult
-StridedLayoutAttr::getStridesAndOffset(ArrayRef<int64_t>,
-                                       SmallVectorImpl<int64_t> &strides,
-                                       int64_t &offset) const {
+StridedLayoutAttr::getStrides(ArrayRef<int64_t>,
+                              SmallVectorImpl<int64_t> &strides) const {
   llvm::append_range(strides, getStrides());
-  // The type no longer pins a static offset. Report zero for back-compat with
-  // identity-layout memrefs (which also report zero), so subview/cast offset
-  // checks remain consistent across both layout forms. The runtime offset, if
-  // any, lives on the producing op.
-  offset = 0;
   return success();
 }
 
diff --git a/mlir/lib/IR/BuiltinTypes.cpp b/mlir/lib/IR/BuiltinTypes.cpp
index 786c30851a071..6417d9adb981a 100644
--- a/mlir/lib/IR/BuiltinTypes.cpp
+++ b/mlir/lib/IR/BuiltinTypes.cpp
@@ -799,9 +799,8 @@ int64_t MemRefType::getNumContiguousTrailingDims() {
 
   // Get the strides (if any). Failing to do that, conservatively assume a
   // non-contiguous layout.
-  int64_t offset;
   SmallVector<int64_t> strides;
-  if (!succeeded(getStridesAndOffset(strides, offset)))
+  if (!succeeded(getStrides(strides)))
     return 0;
 
   ArrayRef<int64_t> shape = getShape();
@@ -864,32 +863,27 @@ MemRefType MemRefType::canonicalizeStridedLayout() {
   return MemRefType::Builder(*this).setLayout({});
 }
 
-LogicalResult MemRefType::getStridesAndOffset(SmallVectorImpl<int64_t> &strides,
-                                              int64_t &offset) const {
-  return getLayout().getStridesAndOffset(getShape(), strides, offset);
+LogicalResult MemRefType::getStrides(SmallVectorImpl<int64_t> &strides) const {
+  return getLayout().getStrides(getShape(), strides);
 }
 
-std::pair<SmallVector<int64_t>, int64_t>
-MemRefType::getStridesAndOffset() const {
+SmallVector<int64_t> MemRefType::getStrides() const {
   SmallVector<int64_t> strides;
-  int64_t offset;
-  LogicalResult status = getStridesAndOffset(strides, offset);
+  LogicalResult status = getStrides(strides);
   (void)status;
-  assert(succeeded(status) && "Invalid use of check-free getStridesAndOffset");
-  return {strides, offset};
+  assert(succeeded(status) && "Invalid use of check-free getStrides");
+  return strides;
 }
 
 bool MemRefType::isStrided() {
-  int64_t offset;
   SmallVector<int64_t, 4> strides;
-  auto res = getStridesAndOffset(strides, offset);
+  auto res = getStrides(strides);
   return succeeded(res);
 }
 
 bool MemRefType::isLastDimUnitStride() {
-  int64_t offset;
   SmallVector<int64_t> strides;
-  auto successStrides = getStridesAndOffset(strides, offset);
+  auto successStrides = getStrides(strides);
   return succeeded(successStrides) && (strides.empty() || strides.back() == 1);
 }
 
diff --git a/mlir/python/mlir/dialects/memref.py b/mlir/python/mlir/dialects/memref.py
index 9cf191fde2d96..5d13969aa08d1 100644
--- a/mlir/python/mlir/dialects/memref.py
+++ b/mlir/python/mlir/dialects/memref.py
@@ -36,7 +36,7 @@ def _is_static_int_like(i):
 def _infer_memref_subview_result_type(
     source_memref_type, offsets, static_sizes, static_strides
 ):
-    source_strides, _ = source_memref_type.get_strides_and_offset()
+    source_strides = source_memref_type.get_strides()
     # "canonicalize" from tuple|list -> list
     offsets, static_sizes, static_strides, source_strides = map(
         list, (offsets, static_sizes, static_strides, source_strides)
@@ -101,7 +101,7 @@ def subview(
         sizes = []
     if strides is None:
         strides = []
-    source_strides, source_offset = source.type.get_strides_and_offset()
+    source_strides = source.type.get_strides()
     if result_type is None and all(
         all(_is_static_int_like(i) for i in s) for s in [sizes, strides, source_strides]
     ):
diff --git a/mlir/test/Analysis/DataFlow/test-strided-metadata-range-analysis.mlir b/mlir/test/Analysis/DataFlow/test-strided-metadata-range-analysis.mlir
index ae7ca3a0da50e..f77bfc20c2255 100644
--- a/mlir/test/Analysis/DataFlow/test-strided-metadata-range-analysis.mlir
+++ b/mlir/test/Analysis/DataFlow/test-strided-metadata-range-analysis.mlir
@@ -10,7 +10,7 @@ func.func @memref_subview(%arg0: memref<8x16x4xf32, strided<[64, 4, 1]>>, %arg1:
   // Test subview with unknown sizes, and constant offsets and strides.
   // CHECK: Op:  %[[SV0:.*]] = memref.subview
   // CHECK-NEXT: result[0]: strided_metadata<
-  // CHECK-SAME: offset = [{unsigned : [1, 1] signed : [1, 1]}]
+  // CHECK-SAME: offset = [{unsigned : [0, 18446744073709551615] signed : [-9223372036854775808, 9223372036854775807]}]
   // CHECK-SAME: sizes = [{unsigned : [0, 18446744073709551615] signed : [-9223372036854775808, 9223372036854775807]}, {unsigned : [0, 18446744073709551615] signed : [-9223372036854775808, 9223372036854775807]}, {unsigned : [0, 18446744073709551615] signed : [-9223372036854775808, 9223372036854775807]}]
   // CHECK-SAME: strides = [{unsigned : [64, 64] signed : [64, 64]}, {unsigned : [4, 4] signed : [4, 4]}, {unsigned : [1, 1] signed : [1, 1]}]
   %subview = memref.subview %arg0[%c0, %c0, %c1] [%arg3, %arg4, %arg5] [%c1, %c1, %c1] : memref<8x16x4xf32, strided<[64, 4, 1]>> to memref<?x?x?xf32, strided<[?, ?, ?]>>
@@ -18,7 +18,7 @@ func.func @memref_subview(%arg0: memref<8x16x4xf32, strided<[64, 4, 1]>>, %arg1:
   // Test a subview of a subview, with bounded dynamic offsets.
   // CHECK: Op:  %[[SV1:.*]] = memref.subview
   // CHECK-NEXT: result[0]: strided_metadata<
-  // CHECK-SAME: offset = [{unsigned : [346, 484] signed : [346, 484]}]
+  // CHECK-SAME: offset = [{unsigned : [0, 18446744073709551615] signed : [-9223372036854775808, 9223372036854775807]}]
   // CHECK-SAME: sizes = [{unsigned : [2, 2] signed : [2, 2]}, {unsigned : [2, 2] signed : [2, 2]}, {unsigned : [2, 2] signed : [2, 2]}]
   // CHECK-SAME: strides = [{unsigned : [704, 832] signed : [704, 832]}, {unsigned : [44, 52] signed : [44, 52]}, {unsigned : [11, 13] signed : [11, 13]}]
   %subview_0 = memref.subview %subview[%1, %1, %1] [%c2, %c2, %c2] [%0, %0, %0] : memref<?x?x?xf32, strided<[?, ?, ?]>> to memref<?x?x?xf32, strided<[?, ?, ?]>>
@@ -26,7 +26,7 @@ func.func @memref_subview(%arg0: memref<8x16x4xf32, strided<[64, 4, 1]>>, %arg1:
   // Test a subview of a subview, with constant operands.
   // CHECK: Op:  %[[SV2:.*]] = memref.subview
   // CHECK-NEXT: result[0]: strided_metadata<
-  // CHECK-SAME: offset = [{unsigned : [368, 510] signed : [368, 510]}]
+  // CHECK-SAME: offset = [{unsigned : [0, 18446744073709551615] signed : [-9223372036854775808, 9223372036854775807]}]
   // CHECK-SAME: sizes = [{unsigned : [2, 2] signed : [2, 2]}, {unsigned : [2, 2] signed : [2, 2]}, {unsigned : [2, 2] signed : [2, 2]}]
   // CHECK-SAME: strides = [{unsigned : [704, 832] signed : [704, 832]}, {unsigned : [44, 52] signed : [44, 52]}, {unsigned : [11, 13] signed : [11, 13]}]
   %subview_1 = memref.subview %subview_0[%c0, %c0, %c2] [%c2, %c2, %c2] [%c1, %c1, %c1] : memref<?x?x?xf32, strided<[?, ?, ?]>> to memref<?x?x?xf32, strided<[?, ?, ?]>>
@@ -50,7 +50,7 @@ func.func @memref_subview(%arg0: memref<8x16x4xf32, strided<[64, 4, 1]>>, %arg1:
   // Test a subview with mixed bounded and unbound dynamic sizes.
   // CHECK: Op:  %[[SV5:.*]] = memref.subview
   // CHECK-NEXT: result[0]: strided_metadata<
-  // CHECK-SAME: offset = [{unsigned : [16, 16] signed : [16, 16]}]
+  // CHECK-SAME: offset = [{unsigned : [0, 18446744073709551615] signed : [-9223372036854775808, 9223372036854775807]}]
   // CHECK-SAME: sizes = [{unsigned : [11, 13] signed : [11, 13]}, {unsigned : [5, 7] signed : [5, 7]}, {unsigned : [0, 18446744073709551615] signed : [-9223372036854775808, 9223372036854775807]}]
   // CHECK-SAME: strides = [{unsigned : [1, 1] signed : [1, 1]}, {unsigned : [64, 64] signed : [64, 64]}, {unsigned : [8, 8] signed : [8, 8]}]
   %subview_4 = memref.subview %arg2[%c0, %c0, %c2] [%0, %1, %arg5] [%c1, %c1, %c1] : memref<8x16x4xf32, strided<[1, 64, 8]>> to memref<?x?x?xf32, strided<[?, ?, ?]>>
diff --git a/mlir/test/Conversion/PtrToLLVM/ptr-to-llvm.mlir b/mlir/test/Conversion/PtrToLLVM/ptr-to-llvm.mlir
index b34c6743a817a..1e2be5f935e07 100644
--- a/mlir/test/Conversion/PtrToLLVM/ptr-to-llvm.mlir
+++ b/mlir/test/Conversion/PtrToLLVM/ptr-to-llvm.mlir
@@ -66,7 +66,7 @@ func.func @test_to_ptr(%arg0: memref<10xf32, #ptr.generic_space>) -> !ptr.ptr<#p
 
 // Tests extracting metadata from a static-sized memref
 // CHECK-LABEL:   llvm.func @test_get_metadata_static(
-// CHECK-SAME:      %[[ARG0:.*]]: !llvm.ptr, %[[ARG1:.*]]: !llvm.ptr, %[[ARG2:.*]]: i64, %[[ARG3:.*]]: i64, %[[ARG4:.*]]: i64, %[[ARG5:.*]]: i64, %[[ARG6:.*]]: i64) -> !llvm.struct<(ptr)> {
+// CHECK-SAME:      %[[ARG0:.*]]: !llvm.ptr, %[[ARG1:.*]]: !llvm.ptr, %[[ARG2:.*]]: i64, %[[ARG3:.*]]: i64, %[[ARG4:.*]]: i64, %[[ARG5:.*]]: i64, %[[ARG6:.*]]: i64) -> !llvm.struct<(ptr, i64)> {
 // CHECK:           %[[VAL_0:.*]] = llvm.mlir.poison : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:           %[[VAL_1:.*]] = llvm.insertvalue %[[ARG0]], %[[VAL_0]][0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:           %[[VAL_2:.*]] = llvm.insertvalue %[[ARG1]], %[[VAL_1]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
@@ -75,10 +75,12 @@ func.func @test_to_ptr(%arg0: memref<10xf32, #ptr.generic_space>) -> !ptr.ptr<#p
 // CHECK:           %[[VAL_5:.*]] = llvm.insertvalue %[[ARG5]], %[[VAL_4]][4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:           %[[VAL_6:.*]] = llvm.insertvalue %[[ARG4]], %[[VAL_5]][3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:           %[[VAL_7:.*]] = llvm.insertvalue %[[ARG6]], %[[VAL_6]][4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-// CHECK:           %[[VAL_8:.*]] = llvm.mlir.undef : !llvm.struct<(ptr)>
+// CHECK:           %[[VAL_8:.*]] = llvm.mlir.undef : !llvm.struct<(ptr, i64)>
 // CHECK:           %[[VAL_9:.*]] = llvm.extractvalue %[[VAL_7]][0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-// CHECK:           %[[VAL_10:.*]] = llvm.insertvalue %[[VAL_9]], %[[VAL_8]][0] : !llvm.struct<(ptr)>
-// CHECK:           llvm.return %[[VAL_10]] : !llvm.struct<(ptr)>
+// CHECK:           %[[VAL_10:.*]] = llvm.insertvalue %[[VAL_9]], %[[VAL_8]][0] : !llvm.struct<(ptr, i64)>
+// CHECK:           %[[VAL_OFF:.*]] = llvm.extractvalue %[[VAL_7]][2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+// CHECK:           %[[VAL_FINAL:.*]] = llvm.insertvalue %[[VAL_OFF]], %[[VAL_10]][1] : !llvm.struct<(ptr, i64)>
+// CHECK:           llvm.return %[[VAL_FINAL]] : !llvm.struct<(ptr, i64)>
 // CHECK:         }
 func.func @test_get_metadata_static(%arg0: memref<10x20xf32, #ptr.generic_space>) -> !ptr.ptr_metadata<memref<10x20xf32, #ptr.generic_space>> {
   %0 = ptr.get_metadata %arg0 : memref<10x20xf32, #ptr.generic_space>
@@ -87,7 +89,7 @@ func.func @test_get_metadata_static(%arg0: memref<10x20xf32, #ptr.generic_space>
 
 // Tests extracting metadata from a dynamically-sized memref
 // CHECK-LABEL:   llvm.func @test_get_metadata_dynamic(
-// CHECK-SAME:      %[[ARG0:.*]]: !llvm.ptr, %[[ARG1:.*]]: !llvm.ptr, %[[ARG2:.*]]: i64, %[[ARG3:.*]]: i64, %[[ARG4:.*]]: i64, %[[ARG5:.*]]: i64, %[[ARG6:.*]]: i64) -> !llvm.struct<(ptr, i64, i64, i64)> {
+// CHECK-SAME:      %[[ARG0:.*]]: !llvm.ptr, %[[ARG1:.*]]: !llvm.ptr, %[[ARG2:.*]]: i64, %[[ARG3:.*]]: i64, %[[ARG4:.*]]: i64, %[[ARG5:.*]]: i64, %[[ARG6:.*]]: i64) -> !llvm.struct<(ptr, i64, i64, i64, i64)> {
 // CHECK:           %[[VAL_0:.*]] = llvm.mlir.poison : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:           %[[VAL_1:.*]] = llvm.insertvalue %[[ARG0]], %[[VAL_0]][0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:           %[[VAL_2:.*]] = llvm.insertvalue %[[ARG1]], %[[VAL_1]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
@@ -96,16 +98,18 @@ func.func @test_get_metadata_static(%arg0: memref<10x20xf32, #ptr.generic_space>
 // CHECK:           %[[VAL_5:.*]] = llvm.insertvalue %[[ARG5]], %[[VAL_4]][4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:           %[[VAL_6:.*]] = llvm.insertvalue %[[ARG4]], %[[VAL_5]][3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:           %[[VAL_7:.*]] = llvm.insertvalue %[[ARG6]], %[[VAL_6]][4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-// CHECK:           %[[VAL_8:.*]] = llvm.mlir.undef : !llvm.struct<(ptr, i64, i64, i64)>
+// CHECK:           %[[VAL_8:.*]] = llvm.mlir.undef : !llvm.struct<(ptr, i64, i64, i64, i64)>
 // CHECK:           %[[VAL_9:.*]] = llvm.extractvalue %[[VAL_7]][0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-// CHECK:           %[[VAL_10:.*]] = llvm.insertvalue %[[VAL_9]], %[[VAL_8]][0] : !llvm.struct<(ptr, i64, i64, i64)>
+// CHECK:           %[[VAL_10:.*]] = llvm.insertvalue %[[VAL_9]], %[[VAL_8]][0] : !llvm.struct<(ptr, i64, i64, i64, i64)>
+// CHECK:           %[[VAL_OFF:.*]] = llvm.extractvalue %[[VAL_7]][2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+// CHECK:           %[[VAL_OFF_INS:.*]] = llvm.insertvalue %[[VAL_OFF]], %[[VAL_10]][1] : !llvm.struct<(ptr, i64, i64, i64, i64)>
 // CHECK:           %[[VAL_11:.*]] = llvm.extractvalue %[[VAL_7]][3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-// CHECK:           %[[VAL_12:.*]] = llvm.insertvalue %[[VAL_11]], %[[VAL_10]][1] : !llvm.struct<(ptr, i64, i64, i64)>
+// CHECK:           %[[VAL_12:.*]] = llvm.insertvalue %[[VAL_11]], %[[VAL_OFF_INS]][2] : !llvm.struct<(ptr, i64, i64, i64, i64)>
 // CHECK:           %[[VAL_13:.*]] = llvm.extractvalue %[[VAL_7]][3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-// CHECK:           %[[VAL_14:.*]] = llvm.insertvalue %[[VAL_13]], %[[VAL_12]][2] : !llvm.struct<(ptr, i64, i64, i64)>
+// CHECK:           %[[VAL_14:.*]] = llvm.insertvalue %[[VAL_13]], %[[VAL_12]][3] : !llvm.struct<(ptr, i64, i64, i64, i64)>
 // CHECK:           %[[VAL_15:.*]] = llvm.extractvalue %[[VAL_7]][4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-// CHECK:           %[[VAL_16:.*]] = llvm.insertvalue %[[VAL_15]], %[[VAL_14]][3] : !llvm.struct<(ptr, i64, i64, i64)>
-// CHECK:           llvm.return %[[VAL_16]] : !llvm.struct<(ptr, i64, i64, i64)>
+// CHECK:           %[[VAL_16:.*]] = llvm.insertvalue %[[VAL_15]], %[[VAL_14]][4] : !llvm.struct<(ptr, i64, i64, i64, i64)>
+// CHECK:           llvm.return %[[VAL_16]] : !llvm.struct<(ptr, i64, i64, i64, i64)>
 // CHECK:         }
 func.func @test_get_metadata_dynamic(%arg0: memref<?x?xf32, #ptr.generic_space>) -> !ptr.ptr_metadata<memref<?x?xf32, #ptr.generic_space>> {
   %0 = ptr.get_metadata %arg0 : memref<?x?xf32, #ptr.generic_space>
@@ -114,13 +118,13 @@ func.func @test_get_metadata_dynamic(%arg0: memref<?x?xf32, #ptr.generic_space>)
 
 // Tests reconstructing a static-sized memref from a pointer and metadata
 // CHECK-LABEL:   llvm.func @test_from_ptr_static(
-// CHECK-SAME:      %[[ARG0:.*]]: !llvm.ptr, %[[ARG1:.*]]: !llvm.struct<(ptr)>) -> !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> {
+// CHECK-SAME:      %[[ARG0:.*]]: !llvm.ptr, %[[ARG1:.*]]: !llvm.struct<(ptr, i64)>) -> !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> {
 // CHECK:           %[[VAL_0:.*]] = llvm.mlir.poison : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-// CHECK:           %[[VAL_1:.*]] = llvm.extractvalue %[[ARG1]][0] : !llvm.struct<(ptr)>
+// CHECK:           %[[VAL_1:.*]] = llvm.extractvalue %[[ARG1]][0] : !llvm.struct<(ptr, i64)>
 // CHECK:           %[[VAL_2:.*]] = llvm.insertvalue %[[VAL_1]], %[[VAL_0]][0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:           %[[VAL_3:.*]] = llvm.insertvalue %[[ARG0]], %[[VAL_2]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-// CHECK:           %[[VAL_4:.*]] = llvm.mlir.constant(0 : index) : i64
-// CHECK:           %[[VAL_5:.*]] = llvm.insertvalue %[[VAL_4]], %[[VAL_3]][2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+// CHECK:           %[[VAL_OFF:.*]] = llvm.extractvalue %[[ARG1]][1] : !llvm.struct<(ptr, i64)>
+// CHECK:           %[[VAL_5:.*]] = llvm.insertvalue %[[VAL_OFF]], %[[VAL_3]][2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:           %[[VAL_6:.*]] = llvm.mlir.constant(10 : index) : i64
 // CHECK:           %[[VAL_7:.*]] = llvm.insertvalue %[[VAL_6]], %[[VAL_5]][3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:           %[[VAL_8:.*]] = llvm.mlir.constant(20 : index) : i64
@@ -138,18 +142,18 @@ func.func @test_from_ptr_static(%arg0: !ptr.ptr<#ptr.generic_space>, %arg1: !ptr
 
 // Tests reconstructing a dynamically-sized memref from a pointer and metadata
 // CHECK-LABEL:   llvm.func @test_from_ptr_dynamic(
-// CHECK-SAME:      %[[ARG0:.*]]: !llvm.ptr, %[[ARG1:.*]]: !llvm.struct<(ptr, i64, i64, i64)>) -> !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> {
+// CHECK-SAME:      %[[ARG0:.*]]: !llvm.ptr, %[[ARG1:.*]]: !llvm.struct<(ptr, i64, i64, i64, i64)>) -> !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> {
 // CHECK:           %[[VAL_0:.*]] = llvm.mlir.poison : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-// CHECK:           %[[VAL_1:.*]] = llvm.extractvalue %[[ARG1]][0] : !llvm.struct<(ptr, i64, i64, i64)>
+// CHECK:           %[[VAL_1:.*]] = llvm.extractvalue %[[ARG1]][0] : !llvm.struct<(ptr, i64, i64, i64, i64)>
 // CHECK:           %[[VAL_2:.*]] = llvm.insertvalue %[[VAL_1]], %[[VAL_0]][0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:           %[[VAL_3:.*]] = llvm.insertvalue %[[ARG0]], %[[VAL_2]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-// CHECK:           %[[VAL_4:.*]] = llvm.mlir.constant(0 : index) : i64
+// CHECK:           %[[VAL_4:.*]] = llvm.extractvalue %[[ARG1]][1] : !llvm.struct<(ptr, i64, i64, i64, i64)>
 // CHECK:           %[[VAL_5:.*]] = llvm.insertvalue %[[VAL_4]], %[[VAL_3]][2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-// CHECK:           %[[VAL_6:.*]] = llvm.extractvalue %[[ARG1]][1] : !llvm.struct<(ptr, i64, i64, i64)>
+// CHECK:           %[[VAL_6:.*]] = llvm.extractvalue %[[ARG1]][2] : !llvm.struct<(ptr, i64, i64, i64, i64)>
 // CHECK:           %[[VAL_7:.*]] = llvm.insertvalue %[[VAL_6]], %[[VAL_5]][3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-// CHECK:           %[[VAL_8:.*]] = llvm.extractvalue %[[ARG1]][2] : !llvm.struct<(ptr, i64, i64, i64)>
+// CHECK:           %[[VAL_8:.*]] = llvm.extractvalue %[[ARG1]][3] : !llvm.struct<(ptr, i64, i64, i64, i64)>
 // CHECK:           %[[VAL_9:.*]] = llvm.insertvalue %[[VAL_8]], %[[VAL_7]][3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-// CHECK:           %[[VAL_10:.*]] = llvm.extractvalue %[[ARG1]][3] : !llvm.struct<(ptr, i64, i64, i64)>
+// CHECK:           %[[VAL_10:.*]] = llvm.extractvalue %[[ARG1]][4] : !llvm.struct<(ptr, i64, i64, i64, i64)>
 // CHECK:           %[[VAL_11:.*]] = llvm.insertvalue %[[VAL_10]], %[[VAL_9]][4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:           %[[VAL_12:.*]] = llvm.mlir.constant(1 : index) : i64
 // CHECK:           %[[VAL_13:.*]] = llvm.insertvalue %[[VAL_12]], %[[VAL_11]][4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
@@ -174,13 +178,15 @@ func.func @test_from_ptr_dynamic(%arg0: !ptr.ptr<#ptr.generic_space>, %arg1: !pt
 // CHECK:           %[[VAL_8:.*]] = llvm.insertvalue %[[ARG5]], %[[VAL_7]][3, 2] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
 // CHECK:           %[[VAL_9:.*]] = llvm.insertvalue %[[ARG8]], %[[VAL_8]][4, 2] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
 // CHECK:           %[[VAL_10:.*]] = llvm.extractvalue %[[VAL_9]][1] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
-// CHECK:           %[[VAL_11:.*]] = llvm.mlir.undef : !llvm.struct<(ptr, i64, i64)>
+// CHECK:           %[[VAL_11:.*]] = llvm.mlir.undef : !llvm.struct<(ptr, i64, i64, i64)>
 // CHECK:           %[[VAL_12:.*]] = llvm.extractvalue %[[VAL_9]][0] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
-// CHECK:           %[[VAL_13:.*]] = llvm.insertvalue %[[VAL_12]], %[[VAL_11]][0] : !llvm.struct<(ptr, i64, i64)>
+// CHECK:           %[[VAL_13:.*]] = llvm.insertvalue %[[VAL_12]], %[[VAL_11]][0] : !llvm.struct<(ptr, i64, i64, i64)>
+// CHECK:           %[[VAL_OFF2:.*]] = llvm.extractvalue %[[VAL_9]][2] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
+// CHECK:           %[[VAL_OFF_INS2:.*]] = llvm.insertvalue %[[VAL_OFF2]], %[[VAL_13]][1] : !llvm.struct<(ptr, i64, i64, i64)>
 // CHECK:           %[[VAL_14:.*]] = llvm.extractvalue %[[VAL_9]][3, 1] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
-// CHECK:           %[[VAL_15:.*]] = llvm.insertvalue %[[VAL_14]], %[[VAL_13]][1] : !llvm.struct<(ptr, i64, i64)>
+// CHECK:           %[[VAL_15:.*]] = llvm.insertvalue %[[VAL_14]], %[[VAL_OFF_INS2]][2] : !llvm.struct<(ptr, i64, i64, i64)>
 // CHECK:           %[[VAL_16:.*]] = llvm.extractvalue %[[VAL_9]][4, 0] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
-// CHECK:           %[[VAL_17:.*]] = llvm.insertvalue %[[VAL_16]], %[[VAL_15]][2] : !llvm.struct<(ptr, i64, i64)>
+// CHECK:           %[[VAL_17:.*]] = llvm.insertvalue %[[VAL_16]], %[[VAL_15]][3] : !llvm.struct<(ptr, i64, i64, i64)>
 // CHECK:           llvm.return %[[VAL_9]] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
 // CHECK:         }
 func.func @test_memref_mixed(%arg0: memref<10x?x30xf32, #ptr.generic_space>) -> memref<10x?x30xf32, #ptr.generic_space> {
@@ -202,9 +208,9 @@ func.func @test_memref_mixed(%arg0: memref<10x?x30xf32, #ptr.generic_space>) ->
 // CHECK:           %[[VAL_6:.*]] = llvm.insertvalue %[[ARG4]], %[[VAL_5]][3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:           %[[VAL_7:.*]] = llvm.insertvalue %[[ARG6]], %[[VAL_6]][4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:           %[[VAL_8:.*]] = llvm.extractvalue %[[VAL_7]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-// CHECK:           %[[VAL_9:.*]] = llvm.mlir.undef : !llvm.struct<(ptr)>
+// CHECK:           %[[VAL_9:.*]] = llvm.mlir.undef : !llvm.struct<(ptr, i64)>
 // CHECK:           %[[VAL_10:.*]] = llvm.extractvalue %[[VAL_7]][0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-// CHECK:           %[[VAL_11:.*]] = llvm.insertvalue %[[VAL_10]], %[[VAL_9]][0] : !llvm.struct<(ptr)>
+// CHECK:           %[[VAL_11:.*]] = llvm.insertvalue %[[VAL_10]], %[[VAL_9]][0] : !llvm.struct<(ptr, i64)>
 // CHECK:           llvm.return %[[VAL_7]] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:         }
 func.func @test_memref_strided(%arg0: memref<10x20xf32, strided<[40, 2]>, #ptr.generic_space>) -> memref<10x20xf32, strided<[40, 2]>, #ptr.generic_space> {
@@ -226,34 +232,36 @@ func.func @test_memref_strided(%arg0: memref<10x20xf32, strided<[40, 2]>, #ptr.g
 // CHECK:           %[[VAL_6:.*]] = llvm.insertvalue %[[ARG4]], %[[VAL_5]][3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:           %[[VAL_7:.*]] = llvm.insertvalue %[[ARG6]], %[[VAL_6]][4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:           %[[VAL_8:.*]] = llvm.extractvalue %[[VAL_7]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-// CHECK:           %[[VAL_9:.*]] = llvm.mlir.undef : !llvm.struct<(ptr, i64, i64, i64, i64)>
+// CHECK:           %[[VAL_9:.*]] = llvm.mlir.undef : !llvm.struct<(ptr, i64, i64, i64, i64, i64)>
 // CHECK:           %[[VAL_10:.*]] = llvm.extractvalue %[[VAL_7]][0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-// CHECK:           %[[VAL_11:.*]] = llvm.insertvalue %[[VAL_10]], %[[VAL_9]][0] : !llvm.struct<(ptr, i64, i64, i64, i64)>
+// CHECK:           %[[VAL_11:.*]] = llvm.insertvalue %[[VAL_10]], %[[VAL_9]][0] : !llvm.struct<(ptr, i64, i64, i64, i64, i64)>
+// CHECK:           %[[VAL_12:.*]] = llvm.extractvalue %[[VAL_7]][2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+// CHECK:           %[[VAL_13:.*]] = llvm.insertvalue %[[VAL_12]], %[[VAL_11]][1] : !llvm.struct<(ptr, i64, i64, i64, i64, i64)>
 // CHECK:           %[[VAL_14:.*]] = llvm.extractvalue %[[VAL_7]][3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-// CHECK:           %[[VAL_15:.*]] = llvm.insertvalue %[[VAL_14]], %[[VAL_11]][1] : !llvm.struct<(ptr, i64, i64, i64, i64)>
+// CHECK:           %[[VAL_15:.*]] = llvm.insertvalue %[[VAL_14]], %[[VAL_13]][2] : !llvm.struct<(ptr, i64, i64, i64, i64, i64)>
 // CHECK:           %[[VAL_16:.*]] = llvm.extractvalue %[[VAL_7]][3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-// CHECK:           %[[VAL_17:.*]] = llvm.insertvalue %[[VAL_16]], %[[VAL_15]][2] : !llvm.struct<(ptr, i64, i64, i64, i64)>
+// CHECK:           %[[VAL_17:.*]] = llvm.insertvalue %[[VAL_16]], %[[VAL_15]][3] : !llvm.struct<(ptr, i64, i64, i64, i64, i64)>
 // CHECK:           %[[VAL_18:.*]] = llvm.extractvalue %[[VAL_7]][4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-// CHECK:           %[[VAL_19:.*]] = llvm.insertvalue %[[VAL_18]], %[[VAL_17]][3] : !llvm.struct<(ptr, i64, i64, i64, i64)>
+// CHECK:           %[[VAL_19:.*]] = llvm.insertvalue %[[VAL_18]], %[[VAL_17]][4] : !llvm.struct<(ptr, i64, i64, i64, i64, i64)>
 // CHECK:           %[[VAL_20:.*]] = llvm.extractvalue %[[VAL_7]][4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-// CHECK:           %[[VAL_21:.*]] = llvm.insertvalue %[[VAL_20]], %[[VAL_19]][4] : !llvm.struct<(ptr, i64, i64, i64, i64)>
+// CHECK:           %[[VAL_21:.*]] = llvm.insertvalue %[[VAL_20]], %[[VAL_19]][5] : !llvm.struct<(ptr, i64, i64, i64, i64, i64)>
 // CHECK:           %[[VAL_22:.*]] = llvm.mlir.zero : !llvm.ptr
 // CHECK:           %[[VAL_23:.*]] = llvm.getelementptr %[[VAL_22]][1] : (!llvm.ptr) -> !llvm.ptr, f32
 // CHECK:           %[[VAL_24:.*]] = llvm.ptrtoint %[[VAL_23]] : !llvm.ptr to i64
 // CHECK:           %[[VAL_25:.*]] = llvm.getelementptr inbounds %[[VAL_8]]{{\[}}%[[VAL_24]]] : (!llvm.ptr, i64) -> !llvm.ptr, i8
 // CHECK:           %[[VAL_26:.*]] = llvm.mlir.poison : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-// CHECK:           %[[VAL_27:.*]] = llvm.extractvalue %[[VAL_21]][0] : !llvm.struct<(ptr, i64, i64, i64, i64)>
+// CHECK:           %[[VAL_27:.*]] = llvm.extractvalue %[[VAL_21]][0] : !llvm.struct<(ptr, i64, i64, i64, i64, i64)>
 // CHECK:           %[[VAL_28:.*]] = llvm.insertvalue %[[VAL_27]], %[[VAL_26]][0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:           %[[VAL_29:.*]] = llvm.insertvalue %[[VAL_25]], %[[VAL_28]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-// CHECK:           %[[ZERO:.*]] = llvm.mlir.constant(0 : index) : i64
-// CHECK:           %[[VAL_31:.*]] = llvm.insertvalue %[[ZERO]], %[[VAL_29]][2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-// CHECK:           %[[VAL_32:.*]] = llvm.extractvalue %[[VAL_21]][1] : !llvm.struct<(ptr, i64, i64, i64, i64)>
+// CHECK:           %[[OFF_OUT:.*]] = llvm.extractvalue %[[VAL_21]][1] : !llvm.struct<(ptr, i64, i64, i64, i64, i64)>
+// CHECK:           %[[VAL_31:.*]] = llvm.insertvalue %[[OFF_OUT]], %[[VAL_29]][2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+// CHECK:           %[[VAL_32:.*]] = llvm.extractvalue %[[VAL_21]][2] : !llvm.struct<(ptr, i64, i64, i64, i64, i64)>
 // CHECK:           %[[VAL_33:.*]] = llvm.insertvalue %[[VAL_32]], %[[VAL_31]][3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-// CHECK:           %[[VAL_34:.*]] = llvm.extractvalue %[[VAL_21]][2] : !llvm.struct<(ptr, i64, i64, i64, i64)>
+// CHECK:           %[[VAL_34:.*]] = llvm.extractvalue %[[VAL_21]][3] : !llvm.struct<(ptr, i64, i64, i64, i64, i64)>
 // CHECK:           %[[VAL_35:.*]] = llvm.insertvalue %[[VAL_34]], %[[VAL_33]][3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-// CHECK:           %[[VAL_36:.*]] = llvm.extractvalue %[[VAL_21]][3] : !llvm.struct<(ptr, i64, i64, i64, i64)>
+// CHECK:           %[[VAL_36:.*]] = llvm.extractvalue %[[VAL_21]][4] : !llvm.struct<(ptr, i64, i64, i64, i64, i64)>
 // CHECK:           %[[VAL_37:.*]] = llvm.insertvalue %[[VAL_36]], %[[VAL_35]][4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-// CHECK:           %[[VAL_38:.*]] = llvm.extractvalue %[[VAL_21]][4] : !llvm.struct<(ptr, i64, i64, i64, i64)>
+// CHECK:           %[[VAL_38:.*]] = llvm.extractvalue %[[VAL_21]][5] : !llvm.struct<(ptr, i64, i64, i64, i64, i64)>
 // CHECK:           %[[VAL_39:.*]] = llvm.insertvalue %[[VAL_38]], %[[VAL_37]][4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:           llvm.return %[[VAL_39]] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:         }
@@ -274,9 +282,9 @@ func.func @test_comprehensive_dynamic(%arg0: memref<?x?xf32, strided<[?, ?]>, #p
 // CHECK:           %[[VAL_2:.*]] = llvm.insertvalue %[[ARG1]], %[[VAL_1]][1] : !llvm.struct<(ptr, ptr, i64)>
 // CHECK:           %[[VAL_3:.*]] = llvm.insertvalue %[[ARG2]], %[[VAL_2]][2] : !llvm.struct<(ptr, ptr, i64)>
 // CHECK:           %[[VAL_4:.*]] = llvm.extractvalue %[[VAL_3]][1] : !llvm.struct<(ptr, ptr, i64)>
-// CHECK:           %[[VAL_5:.*]] = llvm.mlir.undef : !llvm.struct<(ptr)>
+// CHECK:           %[[VAL_5:.*]] = llvm.mlir.undef : !llvm.struct<(ptr, i64)>
 // CHECK:           %[[VAL_6:.*]] = llvm.extractvalue %[[VAL_3]][0] : !llvm.struct<(ptr, ptr, i64)>
-// CHECK:           %[[VAL_7:.*]] = llvm.insertvalue %[[VAL_6]], %[[VAL_5]][0] : !llvm.struct<(ptr)>
+// CHECK:           %[[VAL_7:.*]] = llvm.insertvalue %[[VAL_6]], %[[VAL_5]][0] : !llvm.struct<(ptr, i64)>
 // CHECK:           llvm.return %[[VAL_3]] : !llvm.struct<(ptr, ptr, i64)>
 // CHECK:         }
 func.func @test_memref_0d(%arg0: memref<f32, #ptr.generic_space>) -> memref<f32, #ptr.generic_space> {
diff --git a/mlir/test/Conversion/XeGPUToXeVM/create_nd_tdesc.mlir b/mlir/test/Conversion/XeGPUToXeVM/create_nd_tdesc.mlir
index 34654126ce8d2..809b0c9f7d728 100644
--- a/mlir/test/Conversion/XeGPUToXeVM/create_nd_tdesc.mlir
+++ b/mlir/test/Conversion/XeGPUToXeVM/create_nd_tdesc.mlir
@@ -7,8 +7,13 @@ gpu.module @create_nd_tdesc {
   // CHECK-SAME: %[[DYN:.*]]: memref<?x?xf16>) kernel {
   gpu.func @create_nd_tdesc(%src: memref<16x32xf32, 1>, %ptr: ui64, %shape1: index, %shape2: index,
   %stride1: index, %stride2: index, %offset1: index, %offset2: index, %dyn: memref<?x?xf16>) kernel {
-        // CHECK: %[[INTPTR_5:.*]] = memref.extract_aligned_pointer_as_index %[[DYN]] : memref<?x?xf16> -> index
-        // CHECK: %[[DYN_ADDR:.*]] = arith.index_castui %[[INTPTR_5]] : index to i64
+        // CHECK: %[[DYN_BASE:.*]], %[[DYN_OFFSET:.*]], %{{.*}}:2, %{{.*}}:2 = memref.extract_strided_metadata %[[DYN]]
+        // CHECK: %[[INTPTR_5:.*]] = memref.extract_aligned_pointer_as_index %[[DYN_BASE]] : memref<f16> -> index
+        // CHECK: %[[DYN_PTR_I64:.*]] = arith.index_castui %[[INTPTR_5]] : index to i64
+        // CHECK: %[[DYN_OFF_I64:.*]] = arith.index_castui %[[DYN_OFFSET]] : index to i64
+        // CHECK: %[[DYN_ELEM_SIZE:.*]] = arith.constant 2 : i64
+        // CHECK: %[[DYN_OFF_BYTES:.*]] = arith.muli %[[DYN_OFF_I64]], %[[DYN_ELEM_SIZE]] : i64
+        // CHECK: %[[DYN_ADDR:.*]] = arith.addi %[[DYN_PTR_I64]], %[[DYN_OFF_BYTES]] : i64
         // CHECK: %[[VAR0:.*]] = index.castu %[[ARG1]] : ui64 to index
         // CHECK: %[[BASE_ADDR:.*]] = arith.index_castui %[[VAR0]] : index to i64
         // CHECK: %[[CST:.*]] = arith.constant dense<0> : vector<8xi32>
@@ -27,8 +32,13 @@ gpu.module @create_nd_tdesc {
         // CHECK: %[[MEMSPACECAST:.*]] = memref.memory_space_cast %[[ARG0]] : memref<16x32xf32, 1> to memref<16x32xf32>
         %srcce = memref.memory_space_cast %src : memref<16x32xf32, 1> to memref<16x32xf32>
 
-        // CHECK: %[[INTPTR:.*]] = memref.extract_aligned_pointer_as_index %[[MEMSPACECAST]] : memref<16x32xf32> -> index
-        // CHECK: %[[BASE_ADDR2:.*]] = arith.index_castui %[[INTPTR]] : index to i64
+        // CHECK: %[[SRC_BASE:.*]], %[[SRC_OFFSET:.*]], %{{.*}}:2, %{{.*}}:2 = memref.extract_strided_metadata %[[MEMSPACECAST]]
+        // CHECK: %[[INTPTR:.*]] = memref.extract_aligned_pointer_as_index %[[SRC_BASE]] : memref<f32> -> index
+        // CHECK: %[[SRC_PTR_I64:.*]] = arith.index_castui %[[INTPTR]] : index to i64
+        // CHECK: %[[SRC_OFF_I64:.*]] = arith.index_castui %[[SRC_OFFSET]] : index to i64
+        // CHECK: %[[SRC_ELEM_SIZE:.*]] = arith.constant 4 : i64
+        // CHECK: %[[SRC_OFF_BYTES:.*]] = arith.muli %[[SRC_OFF_I64]], %[[SRC_ELEM_SIZE]] : i64
+        // CHECK: %[[BASE_ADDR2:.*]] = arith.addi %[[SRC_PTR_I64]], %[[SRC_OFF_BYTES]] : i64
         // CHECK: %[[CST_1:.*]] = arith.constant dense<0> : vector<8xi32>
         // CHECK: %[[C32_I64:.*]] = arith.constant 32 : i64
         // CHECK: %[[SHAPE_W2:.*]] = arith.trunci %[[C32_I64]] : i64 to i32
diff --git a/mlir/test/Conversion/XeGPUToXeVM/loadstore_1d.mlir b/mlir/test/Conversion/XeGPUToXeVM/loadstore_1d.mlir
index d92f4f5f64df7..4c90b6dc3c167 100644
--- a/mlir/test/Conversion/XeGPUToXeVM/loadstore_1d.mlir
+++ b/mlir/test/Conversion/XeGPUToXeVM/loadstore_1d.mlir
@@ -9,15 +9,23 @@ gpu.module @load_store_check {
 
         // CHECK: %[[SRCCE:.*]] = memref.memory_space_cast %[[SRC]] : memref<512xf32, 1> to memref<512xf32>
         %srcce = memref.memory_space_cast %src : memref<512xf32, 1> to memref<512xf32>
-        // CHECK: %[[INTPTR:.*]] = memref.extract_aligned_pointer_as_index %[[SRCCE]] : memref<512xf32> -> index
+        // CHECK: %[[SRC_BASE:.*]], %[[SRC_OFFSET:.*]], %{{.*}}, %{{.*}} = memref.extract_strided_metadata %[[SRCCE]]
+        // CHECK: %[[INTPTR:.*]] = memref.extract_aligned_pointer_as_index %[[SRC_BASE]] : memref<f32> -> index
         // CHECK: %[[INTPTR_I64:.*]] = arith.index_castui %[[INTPTR]] : index to i64
+        // CHECK: %[[OFF_I64:.*]] = arith.index_castui %[[SRC_OFFSET]] : index to i64
+        // CHECK: %[[OFF_BYTES:.*]] = arith.muli %[[OFF_I64]], %{{.*}} : i64
+        // CHECK: %[[BASE_ADDR:.*]] = arith.addi %[[INTPTR_I64]], %[[OFF_BYTES]] : i64
         // CHECK: %[[DSTTE:.*]] = memref.memory_space_cast %[[DST]] : memref<256xf32, 1> to memref<256xf32>
         %dstte = memref.memory_space_cast %dst : memref<256xf32, 1> to memref<256xf32>
-        // CHECK: %[[INTPTR1:.*]] = memref.extract_aligned_pointer_as_index %[[DSTTE]] : memref<256xf32> -> index
+        // CHECK: %[[DST_BASE:.*]], %[[DST_OFFSET:.*]], %{{.*}}, %{{.*}} = memref.extract_strided_metadata %[[DSTTE]]
+        // CHECK: %[[INTPTR1:.*]] = memref.extract_aligned_pointer_as_index %[[DST_BASE]] : memref<f32> -> index
         // CHECK: %[[INTPTR1_I64:.*]] = arith.index_castui %[[INTPTR1]] : index to i64
+        // CHECK: %[[OFF1_I64:.*]] = arith.index_castui %[[DST_OFFSET]] : index to i64
+        // CHECK: %[[OFF1_BYTES:.*]] = arith.muli %[[OFF1_I64]], %{{.*}} : i64
+        // CHECK: %[[BASE_ADDR1:.*]] = arith.addi %[[INTPTR1_I64]], %[[OFF1_BYTES]] : i64
 
         %src_tdesc = xegpu.create_nd_tdesc %srcce : memref<512xf32> -> !xegpu.tensor_desc<32xf32>
-        // CHECK: %[[ADDR:.*]] = arith.addi %[[INTPTR_I64]], %[[C384]] : i64
+        // CHECK: %[[ADDR:.*]] = arith.addi %[[BASE_ADDR]], %[[C384]] : i64
         // CHECK: %[[PTR:.*]] = llvm.inttoptr %[[ADDR]] : i64 to !llvm.ptr<1>
         // CHECK: %[[LOAD:.*]] = xevm.blockload %[[PTR]] <{cache_control = #xevm.load_cache_control<L1c_L2uc_L3c>}>
         // CHECK-SAME: : (!llvm.ptr<1>) -> vector<2xi32>
@@ -25,7 +33,7 @@ gpu.module @load_store_check {
             : !xegpu.tensor_desc<32xf32> -> vector<2xf32>
 
         %dst_tdesc = xegpu.create_nd_tdesc %dstte : memref<256xf32> -> !xegpu.tensor_desc<32xf32, #xegpu.block_tdesc_attr<memory_space = global>>
-        // CHECK: %[[ADDR1:.*]] = arith.addi %[[INTPTR1_I64]], %[[C512]] : i64
+        // CHECK: %[[ADDR1:.*]] = arith.addi %[[BASE_ADDR1]], %[[C512]] : i64
         // CHECK: %[[PTR1:.*]] = llvm.inttoptr %[[ADDR1]] : i64 to !llvm.ptr<1>
         // CHECK: xevm.blockstore %[[PTR1]], %[[LOAD]] <{cache_control = #xevm.store_cache_control<L1wb_L2uc_L3wb>}>
         // CHECK-SAME: : (!llvm.ptr<1>, vector<2xi32>)
diff --git a/mlir/test/Conversion/XeGPUToXeVM/loadstore_matrix.mlir b/mlir/test/Conversion/XeGPUToXeVM/loadstore_matrix.mlir
index a8842873d3cc7..b48ca19006c92 100644
--- a/mlir/test/Conversion/XeGPUToXeVM/loadstore_matrix.mlir
+++ b/mlir/test/Conversion/XeGPUToXeVM/loadstore_matrix.mlir
@@ -7,10 +7,10 @@ gpu.module @test_kernel [#xevm.target<chip = "pvc">] {
   //CHECK-LABEL: load_store_matrix_plain
   gpu.func @load_store_matrix_plain(%arg0: memref<4096xi8, 3>) -> f32 {
 
-    //CHECK: %[[INTPTR:.*]] = memref.extract_aligned_pointer_as_index %arg0 : memref<4096xi8, 3> -> index
-    //CHECK: %[[C0:.*]] = arith.constant 0 : index
+    //CHECK: %[[BASE:.*]], %[[OFFSET:.*]], %{{.*}}, %{{.*}} = memref.extract_strided_metadata %arg0
+    //CHECK: %[[INTPTR:.*]] = memref.extract_aligned_pointer_as_index %[[BASE]] : memref<i8, 3> -> index
     //CHECK: %[[CAST0:.*]] = arith.index_castui %[[INTPTR]] : index to i32
-    //CHECK: %[[CAST1:.*]] = arith.index_castui %[[C0]] : index to i32
+    //CHECK: %[[CAST1:.*]] = arith.index_castui %[[OFFSET]] : index to i32
     //CHECK: %[[C1_I32:.*]] = arith.constant 1 : i32
     //CHECK: %[[MUL:.*]] = arith.muli %[[CAST1]], %[[C1_I32]] : i32
     //CHECK: %[[ADD:.*]] = arith.addi %[[CAST0]], %[[MUL]] : i32
@@ -41,7 +41,7 @@ gpu.module @test_kernel [#xevm.target<chip = "pvc">] {
 
     %subview = memref.subview %view[32, 0] [32, 32] [1, 1] : memref<64x32xf32, 3> to memref<32x32xf32, strided<[32, 1]>, 3>
 
-    //CHECK: %[[intptr:.*]] = memref.extract_aligned_pointer_as_index %[[base_buffer:.*]] : memref<32x32xf32, strided<[32, 1]>, 3> -> index
+    //CHECK: %[[intptr:.*]] = memref.extract_aligned_pointer_as_index %[[base_buffer:.*]] : memref<f32, 3> -> index
     //CHECK: %[[ptr_i32:.*]] = arith.index_castui %[[intptr]] : index to i32
     //CHECK: %[[offset_i32:.*]] = arith.index_castui %[[offset:.*]] : index to i32
     //CHECK: %[[c4_i32:.*]] = arith.constant 4 : i32
@@ -117,10 +117,10 @@ gpu.module @test_kernel [#xevm.target<chip = "pvc">] {
   // its memory layout tuple is ([2,4,16,16],[1024,256,16,1])
   //CHECK-LABEL: load_store_matrix_blocked_nostride
   gpu.func @load_store_matrix_blocked_nostride(%arg0: memref<4096xi8, 3>) -> f16 {
-    //CHECK: %[[intptr:.*]] = memref.extract_aligned_pointer_as_index %arg0 : memref<4096xi8, 3> -> index
-    //CHECK: %[[c0:.*]] = arith.constant 0 : index
+    //CHECK: %[[base:.*]], %[[offset:.*]], %{{.*}}, %{{.*}} = memref.extract_strided_metadata %arg0
+    //CHECK: %[[intptr:.*]] = memref.extract_aligned_pointer_as_index %[[base]] : memref<i8, 3> -> index
     //CHECK: %[[cast0:.*]] = arith.index_castui %[[intptr]] : index to i32
-    //CHECK: %[[cast1:.*]] = arith.index_castui %[[c0]] : index to i32
+    //CHECK: %[[cast1:.*]] = arith.index_castui %[[offset]] : index to i32
     //CHECK: %[[c1_i32:.*]] = arith.constant 1 : i32
     //CHECK: %[[mul:.*]] = arith.muli %[[cast1]], %[[c1_i32]] : i32
     //CHECK: %[[add:.*]] = arith.addi %[[cast0]], %[[mul]] : i32
@@ -219,10 +219,10 @@ gpu.module @test_kernel [#xevm.target<chip = "pvc">] {
   //CHECK-LABEL: load_store_matrix_blocked_subgroupblockio
   gpu.func @load_store_matrix_blocked_subgroupblockio(%arg0: memref<4096xi8, 3>) -> vector<8xf16> {
 
-    //CHECK: %[[intptr:.*]] = memref.extract_aligned_pointer_as_index %arg0 : memref<4096xi8, 3> -> index
-    //CHECK: %[[c0:.*]] = arith.constant 0 : index
+    //CHECK: %[[base:.*]], %[[offset:.*]], %{{.*}}, %{{.*}} = memref.extract_strided_metadata %arg0
+    //CHECK: %[[intptr:.*]] = memref.extract_aligned_pointer_as_index %[[base]] : memref<i8, 3> -> index
     //CHECK: %[[cast0:.*]] = arith.index_castui %[[intptr]] : index to i32
-    //CHECK: %[[cast1:.*]] = arith.index_castui %[[c0]] : index to i32
+    //CHECK: %[[cast1:.*]] = arith.index_castui %[[offset]] : index to i32
     //CHECK: %[[c1_i32:.*]] = arith.constant 1 : i32
     //CHECK: %[[mul:.*]] = arith.muli %[[cast1]], %[[c1_i32]] : i32
     //CHECK: %[[add:.*]] = arith.addi %[[cast0]], %[[mul]] : i32
@@ -291,10 +291,10 @@ gpu.module @test_kernel [#xevm.target<chip = "pvc">] {
 
   %smem_coop_a = memref.subview %arg0[64, 0][1, 16][1, 1] : memref<256x16xbf16, 3> to memref<1x16xbf16, strided<[16, 1]>, 3>
 
-  //CHECK: %[[INTPTR:.*]] = memref.extract_aligned_pointer_as_index %{{.*}} : memref<1x16xbf16, strided<[16, 1]>, 3> -> index
-  //CHECK: %[[C0:.*]] = arith.constant 0 : index
+  //CHECK: %[[BASE:.*]], %[[OFFSET:.*]], %{{.*}}:2, %{{.*}}:2 = memref.extract_strided_metadata
+  //CHECK: %[[INTPTR:.*]] = memref.extract_aligned_pointer_as_index %[[BASE]] : memref<bf16, 3> -> index
   //CHECK: %[[CAST0:.*]] = arith.index_castui %[[INTPTR]] : index to i32
-  //CHECK: %[[CAST1:.*]] = arith.index_castui %[[C0]] : index to i32
+  //CHECK: %[[CAST1:.*]] = arith.index_castui %[[OFFSET]] : index to i32
   //CHECK: %[[C2:.*]] = arith.constant 2 : i32
   //CHECK: %[[MUL:.*]] = arith.muli %[[CAST1]], %[[C2]] : i32
   //CHECK: %{{.*}} = arith.addi %[[CAST0]], %[[MUL]] : i32
diff --git a/mlir/test/Conversion/XeGPUToXeVM/loadstore_nd_sub_byte.mlir b/mlir/test/Conversion/XeGPUToXeVM/loadstore_nd_sub_byte.mlir
index a8b5e695d4c38..0e25e0095f9af 100644
--- a/mlir/test/Conversion/XeGPUToXeVM/loadstore_nd_sub_byte.mlir
+++ b/mlir/test/Conversion/XeGPUToXeVM/loadstore_nd_sub_byte.mlir
@@ -10,11 +10,13 @@ gpu.module @load_store_check {
         // CHECK: %[[C16_I32:.*]] = arith.constant 16 : i32
         // CHECK: %[[C128_I32:.*]] = arith.constant 128 : i32
         // CHECK: %[[SRCCE:.*]] = memref.memory_space_cast %[[ARG0]]
-        // CHECK: %[[SRCINDEX:.*]] = memref.extract_aligned_pointer_as_index %[[SRCCE]]
+        // CHECK: %[[SRC_BASE:.*]], %{{.*}}, %{{.*}}:2, %{{.*}}:2 = memref.extract_strided_metadata %[[SRCCE]]
+        // CHECK: %[[SRCINDEX:.*]] = memref.extract_aligned_pointer_as_index %[[SRC_BASE]]
         // CHECK: %[[SRCPTR64:.*]] = arith.index_castui %[[SRCINDEX]] : index to i64
         %srcce = memref.memory_space_cast %src : memref<16x128xi4, 1> to memref<16x128xi4>
         // CHECK: %[[DSTTE:.*]] = memref.memory_space_cast %[[ARG1]]
-        // CHECK: %[[DSTINDEX:.*]] = memref.extract_aligned_pointer_as_index %[[DSTTE]]
+        // CHECK: %[[DST_BASE:.*]], %{{.*}}, %{{.*}}:2, %{{.*}}:2 = memref.extract_strided_metadata %[[DSTTE]]
+        // CHECK: %[[DSTINDEX:.*]] = memref.extract_aligned_pointer_as_index %[[DST_BASE]]
         // CHECK: %[[DSTPTR64:.*]] = arith.index_castui %[[DSTINDEX]] : index to i64
         %dstte = memref.memory_space_cast %dst : memref<16x128xi4, 1> to memref<16x128xi4>
 
diff --git a/mlir/test/Conversion/XeGPUToXeVM/loadstoreprefetch.mlir b/mlir/test/Conversion/XeGPUToXeVM/loadstoreprefetch.mlir
index d7211321b659e..194905a462432 100644
--- a/mlir/test/Conversion/XeGPUToXeVM/loadstoreprefetch.mlir
+++ b/mlir/test/Conversion/XeGPUToXeVM/loadstoreprefetch.mlir
@@ -98,10 +98,14 @@ gpu.func @prefetch_memref_src_value_offset(%src: memref<256xf32>, %offset: vecto
   // CHECK: %[[C4_I64:.*]] = arith.constant 4 : i64
   // CHECK: %[[VAR0:.*]] = vector.extract %[[ARG1]][0] : index from vector<1xindex>
   // CHECK: %[[VAR1:.*]] = arith.index_castui %[[VAR0]] : index to i64
-  // CHECK: %[[INTPTR:.*]] = memref.extract_aligned_pointer_as_index %[[ARG0]] : memref<256xf32> -> index
+  // CHECK: %[[BASE:.*]], %[[OFFSET:.*]], %{{.*}}, %{{.*}} = memref.extract_strided_metadata %[[ARG0]]
+  // CHECK: %[[INTPTR:.*]] = memref.extract_aligned_pointer_as_index %[[BASE]] : memref<f32> -> index
   // CHECK: %[[VAR2:.*]] = arith.index_castui %[[INTPTR]] : index to i64
+  // CHECK: %[[OFF_I64:.*]] = arith.index_castui %[[OFFSET]] : index to i64
+  // CHECK: %[[OFF_BYTES:.*]] = arith.muli %[[OFF_I64]], %[[C4_I64]] : i64
+  // CHECK: %[[BASE_ADDR:.*]] = arith.addi %[[VAR2]], %[[OFF_BYTES]] : i64
   // CHECK: %[[VAR3:.*]] = arith.muli %[[VAR1]], %[[C4_I64]] : i64
-  // CHECK: %[[VAR4:.*]] = arith.addi %[[VAR2]], %[[VAR3]] : i64
+  // CHECK: %[[VAR4:.*]] = arith.addi %[[BASE_ADDR]], %[[VAR3]] : i64
   // CHECK: %[[VAR5:.*]] = llvm.inttoptr %[[VAR4]] : i64 to !llvm.ptr<1>
   // CHECK: xevm.prefetch %[[VAR5]] <{cache_control = #xevm.load_cache_control<L1c_L2uc_L3c>}> : (!llvm.ptr<1>)
   xegpu.prefetch %src[%offset] <{l1_hint = #xegpu.cache_hint<cached>, l2_hint = #xegpu.cache_hint<uncached>}>
@@ -119,11 +123,15 @@ gpu.func @load_gather_from_dyn_memref_subview(%dyn: memref<?xf16>, %offset: vect
   %id = gpu.subgroup_id : index
   %src = memref.subview %dyn[%id][16][1] : memref<?xf16> to memref<16xf16, strided<[1]>>
 
-  // CHECK: %[[INTPTR:.*]] = memref.extract_aligned_pointer_as_index %{{.*}} : memref<16xf16, strided<[1]>> -> index
+  // CHECK: %[[BASE:.*]], %[[OFFSET:.*]], %{{.*}}, %{{.*}} = memref.extract_strided_metadata %{{.*}} : memref<16xf16, strided<[1]>>
+  // CHECK: %[[INTPTR:.*]] = memref.extract_aligned_pointer_as_index %[[BASE]] : memref<f16> -> index
   // CHECK: %[[CAST1:.*]] = arith.index_castui %[[INTPTR]] : index to i64
-  // CHECK: %[[MUL1:.*]] = arith.muli %{{.*}}, %{{.*}} : i64
-  // CHECK: %[[ADD1:.*]] = arith.addi %[[CAST1]], %[[MUL1]] : i64
-  // CHECK: %{{.*}} = llvm.inttoptr %[[ADD1]] : i64 to !llvm.ptr<1>
+  // CHECK: %[[OFF_I64:.*]] = arith.index_castui %[[OFFSET]] : index to i64
+  // CHECK: %[[OFF_BYTES:.*]] = arith.muli %[[OFF_I64]], %{{.*}} : i64
+  // CHECK: %[[ADD1:.*]] = arith.addi %[[CAST1]], %[[OFF_BYTES]] : i64
+  // CHECK: %[[MUL2:.*]] = arith.muli %{{.*}}, %{{.*}} : i64
+  // CHECK: %[[ADD2:.*]] = arith.addi %[[ADD1]], %[[MUL2]] : i64
+  // CHECK: %{{.*}} = llvm.inttoptr %[[ADD2]] : i64 to !llvm.ptr<1>
 
   %0 = xegpu.load %src[%offset], %mask <{l1_hint = #xegpu.cache_hint<cached>, l2_hint = #xegpu.cache_hint<uncached>}>
       : memref<16xf16, strided<[1]>>, vector<1xindex>, vector<1xi1> -> vector<1xf16>
diff --git a/mlir/test/Conversion/XeGPUToXeVM/materializecast.mlir b/mlir/test/Conversion/XeGPUToXeVM/materializecast.mlir
index 969c369cd17e8..34a594050adcc 100644
--- a/mlir/test/Conversion/XeGPUToXeVM/materializecast.mlir
+++ b/mlir/test/Conversion/XeGPUToXeVM/materializecast.mlir
@@ -7,8 +7,13 @@ gpu.module @materializecast {
   // CHECK-LABEL: gpu.func @materialize_memref
   // CHECK-SAME: %[[ARG0:.*]]: memref<128xf32>
   gpu.func @materialize_memref(%src: memref<128xf32>) kernel {
-    // CHECK: %[[INTPTR:.*]] = memref.extract_aligned_pointer_as_index %[[ARG0]] : memref<128xf32> -> index
-    // CHECK: %[[CASTED:.*]] = arith.index_castui %[[INTPTR]] : index to i64
+    // CHECK: %[[BASE:.*]], %[[OFFSET:.*]], %{{.*}}, %{{.*}} = memref.extract_strided_metadata %[[ARG0]]
+    // CHECK: %[[INTPTR:.*]] = memref.extract_aligned_pointer_as_index %[[BASE]] : memref<f32> -> index
+    // CHECK: %[[BASE_I64:.*]] = arith.index_castui %[[INTPTR]] : index to i64
+    // CHECK: %[[OFFSET_I64:.*]] = arith.index_castui %[[OFFSET]] : index to i64
+    // CHECK: %[[ELEM_SIZE:.*]] = arith.constant 4 : i64
+    // CHECK: %[[OFF_BYTES:.*]] = arith.muli %[[OFFSET_I64]], %[[ELEM_SIZE]] : i64
+    // CHECK: %[[CASTED:.*]] = arith.addi %[[BASE_I64]], %[[OFF_BYTES]] : i64
     %offset = arith.constant 0 : index
     %mask = arith.constant 1 : i1
     %val = xegpu.load %src[%offset], %mask : memref<128xf32>, index, i1 -> f32
diff --git a/mlir/test/Dialect/Affine/memref-stride-calculation.mlir b/mlir/test/Dialect/Affine/memref-stride-calculation.mlir
index e5547cb0080b8..bd35d376f4578 100644
--- a/mlir/test/Dialect/Affine/memref-stride-calculation.mlir
+++ b/mlir/test/Dialect/Affine/memref-stride-calculation.mlir
@@ -3,61 +3,61 @@
 func.func @f(%0: index) {
 // CHECK-LABEL: Testing: f
   %1 = memref.alloc() : memref<3x4x5xf32>
-// CHECK: MemRefType offset: 0 strides: 20, 5, 1
+// CHECK: MemRefType strides: 20, 5, 1
   %2 = memref.alloc(%0) : memref<3x4x?xf32>
-// CHECK: MemRefType offset: 0 strides: ?, ?, 1
+// CHECK: MemRefType strides: ?, ?, 1
   %3 = memref.alloc(%0) : memref<3x?x5xf32>
-// CHECK: MemRefType offset: 0 strides: ?, 5, 1
+// CHECK: MemRefType strides: ?, 5, 1
   %4 = memref.alloc(%0) : memref<?x4x5xf32>
-// CHECK: MemRefType offset: 0 strides: 20, 5, 1
+// CHECK: MemRefType strides: 20, 5, 1
   %5 = memref.alloc(%0, %0) : memref<?x4x?xf32>
-// CHECK: MemRefType offset: 0 strides: ?, ?, 1
+// CHECK: MemRefType strides: ?, ?, 1
   %6 = memref.alloc(%0, %0, %0) : memref<?x?x?xf32>
-// CHECK: MemRefType offset: 0 strides: ?, ?, 1
+// CHECK: MemRefType strides: ?, ?, 1
 
   %11 = memref.alloc() : memref<3x4x5xf32, affine_map<(i, j, k)->(i, j, k)>>
-// CHECK: MemRefType offset: 0 strides: 20, 5, 1
+// CHECK: MemRefType strides: 20, 5, 1
   %b11 = memref.alloc() : memref<3x4x5xf32, strided<[20, 5, 1]>>
-// CHECK: MemRefType offset: 0 strides: 20, 5, 1
+// CHECK: MemRefType strides: 20, 5, 1
   %12 = memref.alloc(%0) : memref<3x4x?xf32, affine_map<(i, j, k)->(i, j, k)>>
-// CHECK: MemRefType offset: 0 strides: ?, ?, 1
+// CHECK: MemRefType strides: ?, ?, 1
   %13 = memref.alloc(%0) : memref<3x?x5xf32, affine_map<(i, j, k)->(i, j, k)>>
-// CHECK: MemRefType offset: 0 strides: ?, 5, 1
+// CHECK: MemRefType strides: ?, 5, 1
   %14 = memref.alloc(%0) : memref<?x4x5xf32, affine_map<(i, j, k)->(i, j, k)>>
-// CHECK: MemRefType offset: 0 strides: 20, 5, 1
+// CHECK: MemRefType strides: 20, 5, 1
   %15 = memref.alloc(%0, %0) : memref<?x4x?xf32, affine_map<(i, j, k)->(i, j, k)>>
-// CHECK: MemRefType offset: 0 strides: ?, ?, 1
+// CHECK: MemRefType strides: ?, ?, 1
   %16 = memref.alloc(%0, %0, %0) : memref<?x?x?xf32, affine_map<(i, j, k)->(i, j, k)>>
-// CHECK: MemRefType offset: 0 strides: ?, ?, 1
+// CHECK: MemRefType strides: ?, ?, 1
 
   %21 = memref.alloc()[%0] : memref<3x4x5xf32, affine_map<(i, j, k)[M]->(32 * i + 16 * j + M * k + 1)>>
-// CHECK: MemRefType offset: 1 strides: 32, 16, ?
+// CHECK: MemRefType strides: 32, 16, ?
   %22 = memref.alloc()[%0] : memref<3x4x5xf32, affine_map<(i, j, k)[M]->(32 * i + M * j + 16 * k + 3)>>
-// CHECK: MemRefType offset: 3 strides: 32, ?, 16
+// CHECK: MemRefType strides: 32, ?, 16
   %b22 = memref.alloc(%0)[%0, %0] : memref<3x4x?xf32, strided<[?, ?, 1]>>
-// CHECK: MemRefType offset: 0 strides: ?, ?, 1
+// CHECK: MemRefType strides: ?, ?, 1
   %23 = memref.alloc(%0)[%0] : memref<3x?x5xf32, affine_map<(i, j, k)[M]->(M * i + 32 * j + 16 * k + 7)>>
-// CHECK: MemRefType offset: 7 strides: ?, 32, 16
+// CHECK: MemRefType strides: ?, 32, 16
   %b23 = memref.alloc(%0)[%0] : memref<3x?x5xf32, strided<[?, 5, 1]>>
-// CHECK: MemRefType offset: 0 strides: ?, 5, 1
+// CHECK: MemRefType strides: ?, 5, 1
   %24 = memref.alloc(%0)[%0] : memref<3x?x5xf32, affine_map<(i, j, k)[M]->(M * i + 32 * j + 16 * k + M)>>
-// CHECK: MemRefType offset: ? strides: ?, 32, 16
+// CHECK: MemRefType strides: ?, 32, 16
   %b24 = memref.alloc(%0)[%0] : memref<3x?x5xf32, strided<[?, 32, 16]>>
-// CHECK: MemRefType offset: 0 strides: ?, 32, 16
+// CHECK: MemRefType strides: ?, 32, 16
   %25 = memref.alloc(%0, %0)[%0, %0] : memref<?x?x16xf32, affine_map<(i, j, k)[M, N]->(M * i + N * j + k + 1)>>
-// CHECK: MemRefType offset: 1 strides: ?, ?, 1
+// CHECK: MemRefType strides: ?, ?, 1
   %b25 = memref.alloc(%0, %0)[%0, %0] : memref<?x?x16xf32, strided<[?, ?, 1]>>
-// CHECK: MemRefType offset: 0 strides: ?, ?, 1
+// CHECK: MemRefType strides: ?, ?, 1
   %26 = memref.alloc(%0)[] : memref<?xf32, affine_map<(i)[M]->(i)>>
-// CHECK: MemRefType offset: 0 strides: 1
+// CHECK: MemRefType strides: 1
   %27 = memref.alloc()[%0] : memref<5xf32, affine_map<(i)[M]->(M)>>
-// CHECK: MemRefType offset: ? strides: 0
+// CHECK: MemRefType strides: 0
   %28 = memref.alloc()[%0] : memref<5xf32, affine_map<(i)[M]->(123)>>
-// CHECK: MemRefType offset: 123 strides: 0
+// CHECK: MemRefType strides: 0
   %29 = memref.alloc()[%0] : memref<f32, affine_map<()[M]->(M)>>
-// CHECK: MemRefType offset: ? strides:
+// CHECK: MemRefType strides:
   %30 = memref.alloc()[%0] : memref<f32, affine_map<()[M]->(123)>>
-// CHECK: MemRefType offset: 123 strides:
+// CHECK: MemRefType strides:
 
   %101 = memref.alloc() : memref<3x4x5xf32, affine_map<(i, j, k)->(i floordiv 4 + j + k)>>
 // CHECK: MemRefType memref<3x4x5xf32, affine_map<(d0, d1, d2) -> (d0 floordiv 4 + d1 + d2)>> cannot be converted to strided form
@@ -67,13 +67,13 @@ func.func @f(%0: index) {
 // CHECK: MemRefType memref<3x4x5xf32, affine_map<(d0, d1, d2) -> (d0 mod 4 + d1 + d2)>> cannot be converted to strided form
 
   %200 = memref.alloc()[%0, %0, %0] : memref<3x4x5xf32, affine_map<(i, j, k)[M, N, K]->(M * i + N * i + N * j + K * k - (M + N - 20)* i)>>
-  // CHECK: MemRefType offset: 0 strides: 20, ?, ?
+  // CHECK: MemRefType strides: 20, ?, ?
   %201 = memref.alloc()[%0, %0, %0] : memref<3x4x5xf32, affine_map<(i, j, k)[M, N, K]->(M * i + N * i + N * K * j + K * K * k - (M + N - 20) * (i + 1))>>
-  // CHECK: MemRefType offset: ? strides: 20, ?, ?
+  // CHECK: MemRefType strides: 20, ?, ?
   %202 = memref.alloc()[%0, %0, %0] : memref<3x4x5xf32, affine_map<(i, j, k)[M, N, K]->(M * (i + 1) + j + k - M)>>
-  // CHECK: MemRefType offset: 0 strides: ?, 1, 1
+  // CHECK: MemRefType strides: ?, 1, 1
   %203 = memref.alloc()[%0, %0, %0] : memref<3x4x5xf32, affine_map<(i, j, k)[M, N, K]->(M + M * (i + N * (j + K * k)))>>
-  // CHECK: MemRefType offset: ? strides: ?, ?, ?
+  // CHECK: MemRefType strides: ?, ?, ?
 
   return
 }
diff --git a/mlir/test/Dialect/GPU/decompose-memrefs.mlir b/mlir/test/Dialect/GPU/decompose-memrefs.mlir
index 5a890acec669c..8a3cd8748c745 100644
--- a/mlir/test/Dialect/GPU/decompose-memrefs.mlir
+++ b/mlir/test/Dialect/GPU/decompose-memrefs.mlir
@@ -1,12 +1,12 @@
 // RUN: mlir-opt -gpu-decompose-memrefs -allow-unregistered-dialect -split-input-file %s | FileCheck %s
 
-//       CHECK: #[[MAP:.*]] = affine_map<()[s0, s1, s2, s3, s4] -> (s0 * s1 + s2 * s3 + s4)>
+//       CHECK: #[[MAP:.*]] = affine_map<()[s0, s1, s2, s3, s4, s5] -> (s0 + s1 * s2 + s3 * s4 + s5)>
 //       CHECK: @decompose_store
 //  CHECK-SAME: (%[[VAL:.*]]: f32, %[[MEM:.*]]: memref<?x?x?xf32>)
 //       CHECK:  %[[BASE:.*]], %[[OFFSET:.*]], %[[SIZES:.*]]:3, %[[STRIDES:.*]]:3 = memref.extract_strided_metadata %[[MEM]]
 //       CHECK:  gpu.launch
 //  CHECK-SAME:  threads(%[[TX:.*]], %[[TY:.*]], %[[TZ:.*]]) in
-//       CHECK:  %[[IDX:.*]] = affine.apply #[[MAP]]()[%[[TX]], %[[STRIDES]]#0, %[[TY]], %[[STRIDES]]#1, %[[TZ]]]
+//       CHECK:  %[[IDX:.*]] = affine.apply #[[MAP]]()[%[[OFFSET]], %[[TX]], %[[STRIDES]]#0, %[[TY]], %[[STRIDES]]#1, %[[TZ]]]
 //       CHECK:  %[[PTR:.*]] = memref.reinterpret_cast %[[BASE]] to offset: [%[[IDX]]], sizes: [], strides: [] : memref<f32> to memref<f32, strided<[]>>
 //       CHECK:  memref.store %[[VAL]], %[[PTR]][] : memref<f32, strided<[]>>
 func.func @decompose_store(%arg0 : f32, %arg1 : memref<?x?x?xf32>) {
@@ -26,13 +26,13 @@ func.func @decompose_store(%arg0 : f32, %arg1 : memref<?x?x?xf32>) {
 
 // -----
 
-//       CHECK: #[[MAP:.*]] = affine_map<()[s0, s1, s2, s3, s4, s5] -> (s0 * s1 + s2 * s3 + s4 * s5)>
+//       CHECK: #[[MAP:.*]] = affine_map<()[s0, s1, s2, s3, s4, s5, s6] -> (s0 + s1 * s2 + s3 * s4 + s5 * s6)>
 //       CHECK: @decompose_store_strided
 //  CHECK-SAME: (%[[VAL:.*]]: f32, %[[MEM:.*]]: memref<?x?x?xf32, strided<[?, ?, ?]>>)
 //       CHECK:  %[[BASE:.*]], %[[OFFSET:.*]], %[[SIZES:.*]]:3, %[[STRIDES:.*]]:3 = memref.extract_strided_metadata %[[MEM]]
 //       CHECK:  gpu.launch
 //  CHECK-SAME:  threads(%[[TX:.*]], %[[TY:.*]], %[[TZ:.*]]) in
-//       CHECK:  %[[IDX:.*]] = affine.apply #[[MAP]]()[%[[TX]], %[[STRIDES]]#0, %[[TY]], %[[STRIDES]]#1, %[[TZ]], %[[STRIDES]]#2]
+//       CHECK:  %[[IDX:.*]] = affine.apply #[[MAP]]()[%[[OFFSET]], %[[TX]], %[[STRIDES]]#0, %[[TY]], %[[STRIDES]]#1, %[[TZ]], %[[STRIDES]]#2]
 //       CHECK:  %[[PTR:.*]] = memref.reinterpret_cast %[[BASE]] to offset: [%[[IDX]]], sizes: [], strides: [] : memref<f32> to memref<f32, strided<[]>>
 //       CHECK:  memref.store %[[VAL]], %[[PTR]][] : memref<f32, strided<[]>>
 func.func @decompose_store_strided(%arg0 : f32, %arg1 : memref<?x?x?xf32, strided<[?, ?, ?]>>) {
@@ -52,13 +52,13 @@ func.func @decompose_store_strided(%arg0 : f32, %arg1 : memref<?x?x?xf32, stride
 
 // -----
 
-//       CHECK: #[[MAP:.*]] = affine_map<()[s0, s1, s2, s3, s4] -> (s0 * s1 + s2 * s3 + s4)>
+//       CHECK: #[[MAP:.*]] = affine_map<()[s0, s1, s2, s3, s4, s5] -> (s0 + s1 * s2 + s3 * s4 + s5)>
 //       CHECK: @decompose_load
 //  CHECK-SAME: (%[[MEM:.*]]: memref<?x?x?xf32>)
 //       CHECK:  %[[BASE:.*]], %[[OFFSET:.*]], %[[SIZES:.*]]:3, %[[STRIDES:.*]]:3 = memref.extract_strided_metadata %[[MEM]]
 //       CHECK:  gpu.launch
 //  CHECK-SAME:  threads(%[[TX:.*]], %[[TY:.*]], %[[TZ:.*]]) in
-//       CHECK:  %[[IDX:.*]] = affine.apply #[[MAP]]()[%[[TX]], %[[STRIDES]]#0, %[[TY]], %[[STRIDES]]#1, %[[TZ]]]
+//       CHECK:  %[[IDX:.*]] = affine.apply #[[MAP]]()[%[[OFFSET]], %[[TX]], %[[STRIDES]]#0, %[[TY]], %[[STRIDES]]#1, %[[TZ]]]
 //       CHECK:  %[[PTR:.*]] = memref.reinterpret_cast %[[BASE]] to offset: [%[[IDX]]], sizes: [], strides: [] : memref<f32> to memref<f32, strided<[]>>
 //       CHECK:  %[[RES:.*]] = memref.load %[[PTR]][] : memref<f32, strided<[]>>
 //       CHECK:  "test.test"(%[[RES]]) : (f32) -> ()
@@ -80,13 +80,13 @@ func.func @decompose_load(%arg0 : memref<?x?x?xf32>) {
 
 // -----
 
-//       CHECK: #[[MAP:.*]] = affine_map<()[s0, s1, s2, s3, s4] -> (s0 * s1 + s2 * s3 + s4)>
+//       CHECK: #[[MAP:.*]] = affine_map<()[s0, s1, s2, s3, s4, s5] -> (s0 + s1 * s2 + s3 * s4 + s5)>
 //       CHECK: @decompose_subview
 //  CHECK-SAME: (%[[MEM:.*]]: memref<?x?x?xf32>)
 //       CHECK:  %[[BASE:.*]], %[[OFFSET:.*]], %[[SIZES:.*]]:3, %[[STRIDES:.*]]:3 = memref.extract_strided_metadata %[[MEM]]
 //       CHECK:  gpu.launch
 //  CHECK-SAME:  threads(%[[TX:.*]], %[[TY:.*]], %[[TZ:.*]]) in
-//       CHECK:  %[[IDX:.*]] = affine.apply #[[MAP]]()[%[[TX]], %[[STRIDES]]#0, %[[TY]], %[[STRIDES]]#1, %[[TZ]]]
+//       CHECK:  %[[IDX:.*]] = affine.apply #[[MAP]]()[%[[OFFSET]], %[[TX]], %[[STRIDES]]#0, %[[TY]], %[[STRIDES]]#1, %[[TZ]]]
 //       CHECK:  %[[PTR:.*]] = memref.reinterpret_cast %[[BASE]] to offset: [%[[IDX]]], sizes: [%{{.*}}, %{{.*}}, %{{.*}}], strides: [%[[STRIDES]]#0, %[[STRIDES]]#1, 1]
 //       CHECK:  "test.test"(%[[PTR]]) : (memref<?x?x?xf32, strided<[?, ?, ?]>>) -> ()
 func.func @decompose_subview(%arg0 : memref<?x?x?xf32>) {
@@ -109,7 +109,7 @@ func.func @decompose_subview(%arg0 : memref<?x?x?xf32>) {
 
 //       CHECK: #[[MAP:.*]] = affine_map<()[s0] -> (s0 * 2)>
 //       CHECK: #[[MAP1:.*]] = affine_map<()[s0] -> (s0 * 3)>
-//       CHECK: #[[MAP2:.*]] = affine_map<()[s0, s1, s2, s3, s4] -> (s0 * s1 + s2 * s3 + s4)>
+//       CHECK: #[[MAP2:.*]] = affine_map<()[s0, s1, s2, s3, s4, s5] -> (s0 + s1 * s2 + s3 * s4 + s5)>
 //       CHECK: @decompose_subview_strided
 //  CHECK-SAME: (%[[MEM:.*]]: memref<?x?x?xf32>)
 //       CHECK:  %[[BASE:.*]], %[[OFFSET:.*]], %[[SIZES:.*]]:3, %[[STRIDES:.*]]:3 = memref.extract_strided_metadata %[[MEM]]
@@ -117,7 +117,7 @@ func.func @decompose_subview(%arg0 : memref<?x?x?xf32>) {
 //  CHECK-SAME:  threads(%[[TX:.*]], %[[TY:.*]], %[[TZ:.*]]) in
 //       CHECK:  %[[IDX:.*]] = affine.apply #[[MAP]]()[%[[STRIDES]]#0]
 //       CHECK:  %[[IDX1:.*]] = affine.apply #[[MAP1]]()[%[[STRIDES]]#1]
-//       CHECK:  %[[IDX2:.*]] = affine.apply #[[MAP2]]()[%[[TX]], %[[STRIDES]]#0, %[[TY]], %[[STRIDES]]#1, %[[TZ]]]
+//       CHECK:  %[[IDX2:.*]] = affine.apply #[[MAP2]]()[%[[OFFSET]], %[[TX]], %[[STRIDES]]#0, %[[TY]], %[[STRIDES]]#1, %[[TZ]]]
 //       CHECK:  %[[PTR:.*]] = memref.reinterpret_cast %[[BASE]] to offset: [%[[IDX2]]], sizes: [%{{.*}}, %{{.*}}, %{{.*}}], strides: [%[[IDX]], %[[IDX1]], 4]
 //       CHECK:  "test.test"(%[[PTR]]) : (memref<?x?x?xf32, strided<[?, ?, 4]>>) -> ()
 func.func @decompose_subview_strided(%arg0 : memref<?x?x?xf32>) {
diff --git a/mlir/test/lib/Analysis/TestMemRefStrideCalculation.cpp b/mlir/test/lib/Analysis/TestMemRefStrideCalculation.cpp
index f17f5db2fa22f..73ac4842d4f50 100644
--- a/mlir/test/lib/Analysis/TestMemRefStrideCalculation.cpp
+++ b/mlir/test/lib/Analysis/TestMemRefStrideCalculation.cpp
@@ -33,19 +33,13 @@ void TestMemRefStrideCalculation::runOnOperation() {
   llvm::outs() << "Testing: " << getOperation().getName() << "\n";
   getOperation().walk([&](memref::AllocOp allocOp) {
     auto memrefType = cast<MemRefType>(allocOp.getResult().getType());
-    int64_t offset;
     SmallVector<int64_t, 4> strides;
-    if (failed(memrefType.getStridesAndOffset(strides, offset))) {
+    if (failed(memrefType.getStrides(strides))) {
       llvm::outs() << "MemRefType " << memrefType << " cannot be converted to "
                    << "strided form\n";
       return;
     }
-    llvm::outs() << "MemRefType offset: ";
-    if (ShapedType::isDynamic(offset))
-      llvm::outs() << "?";
-    else
-      llvm::outs() << offset;
-    llvm::outs() << " strides: ";
+    llvm::outs() << "MemRefType strides: ";
     llvm::interleaveComma(strides, llvm::outs(), [&](int64_t v) {
       if (ShapedType::isDynamic(v))
         llvm::outs() << "?";

>From 3a1518c48ee400877775a3bb1d7b53a1324c5de1 Mon Sep 17 00:00:00 2001
From: Ivan Butygin <ivan.butygin at gmail.com>
Date: Fri, 17 Apr 2026 13:40:47 +0200
Subject: [PATCH 24/27] [WIP][mlir] step 4: audit fixes for offset removal

Audit-driven correctness fixes following the static-offset removal:

LLVM lowering: callers that rebuild a memref descriptor from a source
descriptor were silently dropping the source's runtime offset because the
old code path could rely on the type-level offset being statically 0. Now
that the type carries no offset, those paths must thread the runtime
offset through the descriptor or bake it into the aligned pointer.

- ViewOpLowering (MemRefToLLVM): bake source offset into the aligned ptr
  via bufferPtr() before applying byteShift; result descriptor offset = 0.
- MemRefReshapeOpLowering (MemRefToLLVM): GEP the aligned ptr by the
  source's runtime offset before installing it on the result; result
  descriptor offset = 0.
- VectorTypeCastOpConversion (VectorToLLVM): bake source offset (in
  source-element units) into the aligned ptr via bufferPtr(); result
  offset = 0. Element type changes between source and target so we
  cannot copy raw offset.
- ToPtrOpConversion (PtrToLLVM): return bufferPtr() (aligned + offset),
  not the raw aligned ptr; ToPtr's contract is the first logical element.

Flang follow-on:
- flang/lib/Optimizer/CodeGen/CodeGen.cpp, FIRToMemRef.cpp, and
  FIRToMemRefTypeConverter.h: use the renamed getStrides API and the
  one-arg StridedLayoutAttr::get to keep flang building.

Test CHECK updates for the new IR shape in:
- Conversion/MemRefToLLVM/{memref-to-llvm,convert-static-memref-ops}
- Conversion/PtrToLLVM/ptr-to-llvm
- Conversion/VectorToLLVM/vector-to-llvm-interface

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
---
 .../Transforms/FIRToMemRefTypeConverter.h     |  2 +-
 flang/lib/Optimizer/CodeGen/CodeGen.cpp       |  4 ++--
 .../lib/Optimizer/Transforms/FIRToMemRef.cpp  |  5 ++--
 .../Conversion/MemRefToLLVM/MemRefToLLVM.cpp  | 24 ++++++++++++-------
 mlir/lib/Conversion/PtrToLLVM/PtrToLLVM.cpp   |  9 ++++---
 .../VectorToLLVM/ConvertVectorToLLVM.cpp      |  8 ++++---
 .../convert-static-memref-ops.mlir            | 12 +++++++---
 .../MemRefToLLVM/memref-to-llvm.mlir          | 16 +++++++++----
 .../Conversion/PtrToLLVM/ptr-to-llvm.mlir     | 14 +++++++----
 .../vector-to-llvm-interface.mlir             |  8 +++++--
 10 files changed, 68 insertions(+), 34 deletions(-)

diff --git a/flang/include/flang/Optimizer/Transforms/FIRToMemRefTypeConverter.h b/flang/include/flang/Optimizer/Transforms/FIRToMemRefTypeConverter.h
index fd434b1f09c9b..09409e392dd4c 100644
--- a/flang/include/flang/Optimizer/Transforms/FIRToMemRefTypeConverter.h
+++ b/flang/include/flang/Optimizer/Transforms/FIRToMemRefTypeConverter.h
@@ -191,7 +191,7 @@ class FIRToMemRefTypeConverter : public mlir::TypeConverter {
       auto memRefTy = convertMemrefType(elTy);
       mlir::MemRefType dynTy = mlir::MemRefType::Builder(memRefTy).setLayout(
           mlir::StridedLayoutAttr::get(
-              memRefTy.getContext(), mlir::ShapedType::kDynamic,
+              memRefTy.getContext(),
               llvm::SmallVector<int64_t>(memRefTy.getRank(),
                                          mlir::ShapedType::kDynamic)));
       return dynTy;
diff --git a/flang/lib/Optimizer/CodeGen/CodeGen.cpp b/flang/lib/Optimizer/CodeGen/CodeGen.cpp
index b03b169e0af4f..7b26bd9d7f8c3 100644
--- a/flang/lib/Optimizer/CodeGen/CodeGen.cpp
+++ b/flang/lib/Optimizer/CodeGen/CodeGen.cpp
@@ -945,9 +945,9 @@ struct ConvertOpConversion : public fir::FIROpConversion<fir::ConvertOp> {
       mlir::Value basePtr = adaptor.getValue();
       assert(basePtr && "null base pointer");
 
-      auto [strides, offset] = memRefTy.getStridesAndOffset();
+      // Offset is no longer carried by MemRefType; only strides matter here.
+      llvm::SmallVector<int64_t> strides = memRefTy.getStrides();
       bool hasStaticLayout =
-          mlir::ShapedType::isStatic(offset) &&
           llvm::none_of(strides, mlir::ShapedType::isDynamic);
 
       auto *firConv =
diff --git a/flang/lib/Optimizer/Transforms/FIRToMemRef.cpp b/flang/lib/Optimizer/Transforms/FIRToMemRef.cpp
index ec58d6f3f1447..157dc37b0f506 100644
--- a/flang/lib/Optimizer/Transforms/FIRToMemRef.cpp
+++ b/flang/lib/Optimizer/Transforms/FIRToMemRef.cpp
@@ -694,10 +694,9 @@ FIRToMemRef::convertArrayCoorOp(Operation *memOp, fir::ArrayCoorOp arrayCoorOp,
 
   assert(strides.size() == sizes.size() && sizes.size() == rank);
 
-  int64_t dynamicOffset = ShapedType::kDynamic;
   SmallVector<int64_t> dynamicStrides(rank, ShapedType::kDynamic);
-  auto stridedLayout = StridedLayoutAttr::get(convertedVal.getContext(),
-                                              dynamicOffset, dynamicStrides);
+  auto stridedLayout =
+      StridedLayoutAttr::get(convertedVal.getContext(), dynamicStrides);
 
   SmallVector<int64_t> dynamicShape(rank, ShapedType::kDynamic);
   memRefTy =
diff --git a/mlir/lib/Conversion/MemRefToLLVM/MemRefToLLVM.cpp b/mlir/lib/Conversion/MemRefToLLVM/MemRefToLLVM.cpp
index b7863061a2199..29ad68117fc7e 100644
--- a/mlir/lib/Conversion/MemRefToLLVM/MemRefToLLVM.cpp
+++ b/mlir/lib/Conversion/MemRefToLLVM/MemRefToLLVM.cpp
@@ -1497,18 +1497,21 @@ struct MemRefReshapeOpLowering
       auto desc =
           MemRefDescriptor::poison(rewriter, loc, llvmTargetDescriptorTy);
 
-      // Set allocated and aligned pointers.
-      Value allocatedPtr, alignedPtr;
+      // Set allocated and aligned pointers. Bake the source descriptor's
+      // runtime offset into the target's aligned pointer so we can start the
+      // new descriptor at offset 0 without losing addressing information.
+      Value allocatedPtr, alignedPtr, srcOffset;
       extractPointersAndOffset(loc, rewriter, *getTypeConverter(),
                                reshapeOp.getSource(), adaptor.getSource(),
-                               &allocatedPtr, &alignedPtr);
+                               &allocatedPtr, &alignedPtr, &srcOffset);
+      Type elemLLVMTy =
+          typeConverter->convertType(targetMemRefType.getElementType());
+      alignedPtr = LLVM::GEPOp::create(rewriter, loc, alignedPtr.getType(),
+                                       elemLLVMTy, alignedPtr, srcOffset);
       desc.setAllocatedPtr(rewriter, loc, allocatedPtr);
       desc.setAlignedPtr(rewriter, loc, alignedPtr);
 
-      // Extract the strides from the type. Offset is no longer carried by the
-      // type; reshape preserves the source descriptor's offset, but here we
-      // reconstruct the descriptor for the target type and conventionally start
-      // the new descriptor at offset 0.
+      // Extract the strides from the type.
       SmallVector<int64_t> strides;
       if (failed(targetMemRefType.getStrides(strides)))
         return rewriter.notifyMatchFailure(
@@ -1838,8 +1841,11 @@ struct ViewOpLowering : public ConvertOpToLLVMPattern<memref::ViewOp> {
     auto srcMemRefType = cast<MemRefType>(viewOp.getSource().getType());
     targetMemRef.setAllocatedPtr(rewriter, loc, allocatedPtr);
 
-    // Field 2: Copy the actual aligned pointer to payload.
-    Value alignedPtr = sourceMemRef.alignedPtr(rewriter, loc);
+    // Field 2: Compute the target aligned pointer. Start from the source's
+    // runtime buffer pointer (aligned ptr + source offset) so any non-zero
+    // source offset is preserved, then apply the byteShift.
+    Value alignedPtr = sourceMemRef.bufferPtr(rewriter, loc, *getTypeConverter(),
+                                              srcMemRefType);
     alignedPtr = LLVM::GEPOp::create(
         rewriter, loc, alignedPtr.getType(),
         typeConverter->convertType(srcMemRefType.getElementType()), alignedPtr,
diff --git a/mlir/lib/Conversion/PtrToLLVM/PtrToLLVM.cpp b/mlir/lib/Conversion/PtrToLLVM/PtrToLLVM.cpp
index 018e70d6ddd32..68055bccee0e5 100644
--- a/mlir/lib/Conversion/PtrToLLVM/PtrToLLVM.cpp
+++ b/mlir/lib/Conversion/PtrToLLVM/PtrToLLVM.cpp
@@ -319,12 +319,15 @@ LogicalResult
 ToPtrOpConversion::matchAndRewrite(ptr::ToPtrOp op, OpAdaptor adaptor,
                                    ConversionPatternRewriter &rewriter) const {
   // Bail if it's not a memref.
-  if (!isa<MemRefType>(op.getPtr().getType()))
+  auto memrefTy = dyn_cast<MemRefType>(op.getPtr().getType());
+  if (!memrefTy)
     return rewriter.notifyMatchFailure(op, "Expected a memref input");
 
-  // Extract the aligned pointer from the memref descriptor.
+  // Extract the buffer pointer (aligned ptr + runtime offset) so the
+  // resulting raw pointer refers to the first logical element of the memref.
+  MemRefDescriptor desc(adaptor.getPtr());
   rewriter.replaceOp(
-      op, MemRefDescriptor(adaptor.getPtr()).alignedPtr(rewriter, op.getLoc()));
+      op, desc.bufferPtr(rewriter, op.getLoc(), *getTypeConverter(), memrefTy));
   return success();
 }
 
diff --git a/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp b/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp
index 69a8db43e200e..d9ee678569d7e 100644
--- a/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp
+++ b/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp
@@ -1458,10 +1458,12 @@ class VectorTypeCastOpConversion
     Value allocated = sourceMemRef.allocatedPtr(rewriter, loc);
     desc.setAllocatedPtr(rewriter, loc, allocated);
 
-    // Set aligned ptr.
-    Value ptr = sourceMemRef.alignedPtr(rewriter, loc);
+    // Set aligned ptr. Element type changes between source and target, so
+    // bake the source's runtime offset (in source-element units) into the
+    // aligned pointer and leave the target descriptor's offset at 0.
+    Value ptr = sourceMemRef.bufferPtr(rewriter, loc, *getTypeConverter(),
+                                       sourceMemRefType);
     desc.setAlignedPtr(rewriter, loc, ptr);
-    // Fill offset 0.
     auto attr = rewriter.getIntegerAttr(rewriter.getIndexType(), 0);
     auto zero = LLVM::ConstantOp::create(rewriter, loc, int64Ty, attr);
     desc.setOffset(rewriter, loc, zero);
diff --git a/mlir/test/Conversion/MemRefToLLVM/convert-static-memref-ops.mlir b/mlir/test/Conversion/MemRefToLLVM/convert-static-memref-ops.mlir
index d299d21b85c57..12da7b86c3c6f 100644
--- a/mlir/test/Conversion/MemRefToLLVM/convert-static-memref-ops.mlir
+++ b/mlir/test/Conversion/MemRefToLLVM/convert-static-memref-ops.mlir
@@ -258,8 +258,10 @@ func.func @memref.reshape(%arg0: memref<4x5x6xf32>) -> memref<2x6x20xf32> {
   // CHECK: %[[undef:.*]] = llvm.mlir.poison : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
   // CHECK: %[[elem0:.*]] = llvm.extractvalue %[[cast0]][0] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
   // CHECK: %[[elem1:.*]] = llvm.extractvalue %[[cast0]][1] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
+  // CHECK: %[[srcoff:.*]] = llvm.extractvalue %[[cast0]][2] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
+  // CHECK: %[[bufptr:.*]] = llvm.getelementptr %[[elem1]][%[[srcoff]]] : (!llvm.ptr, i64) -> !llvm.ptr, f32
   // CHECK: %[[insert0:.*]] = llvm.insertvalue %[[elem0]], %[[undef]][0] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
-  // CHECK: %[[insert1:.*]] = llvm.insertvalue %[[elem1]], %[[insert0:.*]][1] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
+  // CHECK: %[[insert1:.*]] = llvm.insertvalue %[[bufptr]], %[[insert0:.*]][1] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
 
   // CHECK: %[[zero:.*]] = llvm.mlir.constant(0 : index) : i64
   // CHECK: %[[insert2:.*]] = llvm.insertvalue %[[zero]], %[[insert1]][2] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
@@ -296,8 +298,10 @@ func.func @memref.reshape.dynamic.dim(%arg: memref<?x?x?xf32>, %shape: memref<4x
   // CHECK: %[[undef:.*]] = llvm.mlir.poison : !llvm.struct<(ptr, ptr, i64, array<4 x i64>, array<4 x i64>)>
   // CHECK: %[[alloc_ptr:.*]] = llvm.extractvalue %[[arg_cast]][0] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
   // CHECK: %[[align_ptr:.*]] = llvm.extractvalue %[[arg_cast]][1] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
+  // CHECK: %[[arg_off:.*]] = llvm.extractvalue %[[arg_cast]][2] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
+  // CHECK: %[[arg_bufptr:.*]] = llvm.getelementptr %[[align_ptr]][%[[arg_off]]] : (!llvm.ptr, i64) -> !llvm.ptr, f32
   // CHECK: %[[insert0:.*]] = llvm.insertvalue %[[alloc_ptr]], %[[undef]][0] : !llvm.struct<(ptr, ptr, i64, array<4 x i64>, array<4 x i64>)>
-  // CHECK: %[[insert1:.*]] = llvm.insertvalue %[[align_ptr]], %[[insert0]][1] : !llvm.struct<(ptr, ptr, i64, array<4 x i64>, array<4 x i64>)>
+  // CHECK: %[[insert1:.*]] = llvm.insertvalue %[[arg_bufptr]], %[[insert0]][1] : !llvm.struct<(ptr, ptr, i64, array<4 x i64>, array<4 x i64>)>
 
   // CHECK: %[[zero0:.*]] = llvm.mlir.constant(0 : index) : i64
   // CHECK: %[[insert2:.*]] = llvm.insertvalue %[[zero0]], %[[insert1]][2] : !llvm.struct<(ptr, ptr, i64, array<4 x i64>, array<4 x i64>)>
@@ -349,8 +353,10 @@ func.func @memref.reshape_index(%arg0: memref<?x?xi32>, %shape: memref<1xindex>)
   // CHECK: %[[undef:.*]] = llvm.mlir.poison : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
   // CHECK: %[[alloc_ptr:.*]] = llvm.extractvalue %[[arg_cast]][0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
   // CHECK: %[[align_ptr:.*]] = llvm.extractvalue %[[arg_cast]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+  // CHECK: %[[arg_off:.*]] = llvm.extractvalue %[[arg_cast]][2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+  // CHECK: %[[arg_bufptr:.*]] = llvm.getelementptr %[[align_ptr]][%[[arg_off]]] : (!llvm.ptr, i64) -> !llvm.ptr, i32
   // CHECK: %[[insert0:.*]] = llvm.insertvalue %[[alloc_ptr]], %[[undef:.*]][0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
-  // CHECK: %[[insert1:.*]] = llvm.insertvalue %[[align_ptr]], %[[insert0:.*]][1] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
+  // CHECK: %[[insert1:.*]] = llvm.insertvalue %[[arg_bufptr]], %[[insert0:.*]][1] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
 
   // CHECK: %[[zero0:.*]] = llvm.mlir.constant(0 : index) : i64
   // CHECK: %[[insert2:.*]] = llvm.insertvalue %[[zero0]], %[[insert1:.*]][2] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
diff --git a/mlir/test/Conversion/MemRefToLLVM/memref-to-llvm.mlir b/mlir/test/Conversion/MemRefToLLVM/memref-to-llvm.mlir
index 17c1e0ff6ad7d..0bc849e4b7ad9 100644
--- a/mlir/test/Conversion/MemRefToLLVM/memref-to-llvm.mlir
+++ b/mlir/test/Conversion/MemRefToLLVM/memref-to-llvm.mlir
@@ -24,7 +24,9 @@ func.func @view(%arg0 : index, %arg1 : index, %arg2 : index) {
 
   // Test two dynamic sizes.
   // CHECK: llvm.mlir.poison : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-  // CHECK: %[[BASE_PTR:.*]] = llvm.extractvalue %{{.*}}[1] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
+  // CHECK: %[[ALIGNED_PTR:.*]] = llvm.extractvalue %{{.*}}[1] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
+  // CHECK: %[[DESC_OFF:.*]] = llvm.extractvalue %{{.*}}[2] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
+  // CHECK: %[[BASE_PTR:.*]] = llvm.getelementptr %[[ALIGNED_PTR]][%[[DESC_OFF]]] : (!llvm.ptr, i64) -> !llvm.ptr, i8
   // CHECK: %[[SHIFTED_BASE_PTR:.*]] = llvm.getelementptr %[[BASE_PTR]][%[[ARG2]]] : (!llvm.ptr, i64) -> !llvm.ptr, i8
   // CHECK: llvm.insertvalue %[[SHIFTED_BASE_PTR]], %{{.*}}[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
   // CHECK: %[[C0:.*]] = llvm.mlir.constant(0 : index) : i64
@@ -39,7 +41,9 @@ func.func @view(%arg0 : index, %arg1 : index, %arg2 : index) {
 
   // Test one dynamic size.
   // CHECK: llvm.mlir.poison : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-  // CHECK: %[[BASE_PTR_2:.*]] = llvm.extractvalue %{{.*}}[1] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
+  // CHECK: %[[ALIGNED_PTR_2:.*]] = llvm.extractvalue %{{.*}}[1] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
+  // CHECK: %[[DESC_OFF_2:.*]] = llvm.extractvalue %{{.*}}[2] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
+  // CHECK: %[[BASE_PTR_2:.*]] = llvm.getelementptr %[[ALIGNED_PTR_2]][%[[DESC_OFF_2]]] : (!llvm.ptr, i64) -> !llvm.ptr, i8
   // CHECK: %[[SHIFTED_BASE_PTR_2:.*]] = llvm.getelementptr %[[BASE_PTR_2]][%[[ARG2]]] : (!llvm.ptr, i64) -> !llvm.ptr, i8
   // CHECK: llvm.insertvalue %[[SHIFTED_BASE_PTR_2]], %{{.*}}[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
   // CHECK: %[[C0_2:.*]] = llvm.mlir.constant(0 : index) : i64
@@ -55,7 +59,9 @@ func.func @view(%arg0 : index, %arg1 : index, %arg2 : index) {
 
   // Test static sizes.
   // CHECK: llvm.mlir.poison : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-  // CHECK: %[[BASE_PTR_3:.*]] = llvm.extractvalue %{{.*}}[1] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
+  // CHECK: %[[ALIGNED_PTR_3:.*]] = llvm.extractvalue %{{.*}}[1] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
+  // CHECK: %[[DESC_OFF_3:.*]] = llvm.extractvalue %{{.*}}[2] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
+  // CHECK: %[[BASE_PTR_3:.*]] = llvm.getelementptr %[[ALIGNED_PTR_3]][%[[DESC_OFF_3]]] : (!llvm.ptr, i64) -> !llvm.ptr, i8
   // CHECK: %[[SHIFTED_BASE_PTR_3:.*]] = llvm.getelementptr %[[BASE_PTR_3]][%[[ARG2]]] : (!llvm.ptr, i64) -> !llvm.ptr, i8
   // CHECK: llvm.insertvalue %[[SHIFTED_BASE_PTR_3]], %{{.*}}[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
   // CHECK: %[[C0_3:.*]] = llvm.mlir.constant(0 : index) : i64
@@ -76,7 +82,9 @@ func.func @view(%arg0 : index, %arg1 : index, %arg2 : index) {
   %6 = memref.alloc() : memref<2048xi8, 4>
 
   // CHECK: llvm.mlir.poison : !llvm.struct<(ptr<4>, ptr<4>, i64, array<2 x i64>, array<2 x i64>)>
-  // CHECK: %[[BASE_PTR_4:.*]] = llvm.extractvalue %{{.*}}[1] : !llvm.struct<(ptr<4>, ptr<4>, i64, array<1 x i64>, array<1 x i64>)>
+  // CHECK: %[[ALIGNED_PTR_4:.*]] = llvm.extractvalue %{{.*}}[1] : !llvm.struct<(ptr<4>, ptr<4>, i64, array<1 x i64>, array<1 x i64>)>
+  // CHECK: %[[DESC_OFF_4:.*]] = llvm.extractvalue %{{.*}}[2] : !llvm.struct<(ptr<4>, ptr<4>, i64, array<1 x i64>, array<1 x i64>)>
+  // CHECK: %[[BASE_PTR_4:.*]] = llvm.getelementptr %[[ALIGNED_PTR_4]][%[[DESC_OFF_4]]] : (!llvm.ptr<4>, i64) -> !llvm.ptr<4>, i8
   // CHECK: %[[SHIFTED_BASE_PTR_4:.*]] = llvm.getelementptr %[[BASE_PTR_4]][%[[ARG2]]] : (!llvm.ptr<4>, i64) -> !llvm.ptr<4>, i8
   // CHECK: llvm.insertvalue %[[SHIFTED_BASE_PTR_4]], %{{.*}}[1] : !llvm.struct<(ptr<4>, ptr<4>, i64, array<2 x i64>, array<2 x i64>)>
   // CHECK: %[[C0_4:.*]] = llvm.mlir.constant(0 : index) : i64
diff --git a/mlir/test/Conversion/PtrToLLVM/ptr-to-llvm.mlir b/mlir/test/Conversion/PtrToLLVM/ptr-to-llvm.mlir
index 1e2be5f935e07..8a69c30b2d811 100644
--- a/mlir/test/Conversion/PtrToLLVM/ptr-to-llvm.mlir
+++ b/mlir/test/Conversion/PtrToLLVM/ptr-to-llvm.mlir
@@ -56,8 +56,10 @@ func.func @test_type_offset() -> (index, index, index) {
 // CHECK:           %[[VAL_3:.*]] = llvm.insertvalue %[[ARG2]], %[[VAL_2]][2] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
 // CHECK:           %[[VAL_4:.*]] = llvm.insertvalue %[[ARG3]], %[[VAL_3]][3, 0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
 // CHECK:           %[[VAL_5:.*]] = llvm.insertvalue %[[ARG4]], %[[VAL_4]][4, 0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
-// CHECK:           %[[VAL_6:.*]] = llvm.extractvalue %[[VAL_5]][1] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
-// CHECK:           llvm.return %[[VAL_6]] : !llvm.ptr
+// CHECK:           %[[VAL_ALIGNED:.*]] = llvm.extractvalue %[[VAL_5]][1] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
+// CHECK:           %[[VAL_OFF:.*]] = llvm.extractvalue %[[VAL_5]][2] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
+// CHECK:           %[[VAL_PTR:.*]] = llvm.getelementptr %[[VAL_ALIGNED]][%[[VAL_OFF]]]
+// CHECK:           llvm.return %[[VAL_PTR]] : !llvm.ptr
 // CHECK:         }
 func.func @test_to_ptr(%arg0: memref<10xf32, #ptr.generic_space>) -> !ptr.ptr<#ptr.generic_space> {
   %0 = ptr.to_ptr %arg0 : memref<10xf32, #ptr.generic_space> -> <#ptr.generic_space>
@@ -231,7 +233,9 @@ func.func @test_memref_strided(%arg0: memref<10x20xf32, strided<[40, 2]>, #ptr.g
 // CHECK:           %[[VAL_5:.*]] = llvm.insertvalue %[[ARG5]], %[[VAL_4]][4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:           %[[VAL_6:.*]] = llvm.insertvalue %[[ARG4]], %[[VAL_5]][3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:           %[[VAL_7:.*]] = llvm.insertvalue %[[ARG6]], %[[VAL_6]][4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-// CHECK:           %[[VAL_8:.*]] = llvm.extractvalue %[[VAL_7]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+// CHECK:           %[[VAL_ALIGNED:.*]] = llvm.extractvalue %[[VAL_7]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+// CHECK:           %[[VAL_BUFOFF:.*]] = llvm.extractvalue %[[VAL_7]][2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+// CHECK:           %[[VAL_8:.*]] = llvm.getelementptr %[[VAL_ALIGNED]][%[[VAL_BUFOFF]]]
 // CHECK:           %[[VAL_9:.*]] = llvm.mlir.undef : !llvm.struct<(ptr, i64, i64, i64, i64, i64)>
 // CHECK:           %[[VAL_10:.*]] = llvm.extractvalue %[[VAL_7]][0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 // CHECK:           %[[VAL_11:.*]] = llvm.insertvalue %[[VAL_10]], %[[VAL_9]][0] : !llvm.struct<(ptr, i64, i64, i64, i64, i64)>
@@ -307,7 +311,9 @@ func.func @test_memref_0d(%arg0: memref<f32, #ptr.generic_space>) -> memref<f32,
 // CHECK:           %[[VAL_7:.*]] = llvm.insertvalue %[[ARG7]], %[[VAL_6]][4, 1] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
 // CHECK:           %[[VAL_8:.*]] = llvm.insertvalue %[[ARG5]], %[[VAL_7]][3, 2] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
 // CHECK:           %[[VAL_9:.*]] = llvm.insertvalue %[[ARG8]], %[[VAL_8]][4, 2] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
-// CHECK:           %[[VAL_10:.*]] = llvm.extractvalue %[[VAL_9]][1] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
+// CHECK:           %[[VAL_ALIGNED:.*]] = llvm.extractvalue %[[VAL_9]][1] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
+// CHECK:           %[[VAL_BUFOFF:.*]] = llvm.extractvalue %[[VAL_9]][2] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
+// CHECK:           %[[VAL_10:.*]] = llvm.getelementptr %[[VAL_ALIGNED]][%[[VAL_BUFOFF]]]
 // CHECK:           %[[VAL_11:.*]] = llvm.mlir.zero : !llvm.ptr
 // CHECK:           %[[VAL_12:.*]] = llvm.getelementptr %[[VAL_11]][1] : (!llvm.ptr) -> !llvm.ptr, f32
 // CHECK:           %[[VAL_13:.*]] = llvm.ptrtoint %[[VAL_12]] : !llvm.ptr to i64
diff --git a/mlir/test/Conversion/VectorToLLVM/vector-to-llvm-interface.mlir b/mlir/test/Conversion/VectorToLLVM/vector-to-llvm-interface.mlir
index 86a70c7bddcfd..84a8fba374a72 100644
--- a/mlir/test/Conversion/VectorToLLVM/vector-to-llvm-interface.mlir
+++ b/mlir/test/Conversion/VectorToLLVM/vector-to-llvm-interface.mlir
@@ -764,7 +764,9 @@ func.func @type_cast_f32(%arg0: memref<8x8x8xf32>) -> memref<vector<8x8x8xf32>>
 //       CHECK:   %[[allocated:.*]] = llvm.extractvalue {{.*}}[0] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
 //       CHECK:   llvm.insertvalue %[[allocated]], {{.*}}[0] : !llvm.struct<(ptr, ptr, i64)>
 //       CHECK:   %[[aligned:.*]] = llvm.extractvalue {{.*}}[1] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
-//       CHECK:   llvm.insertvalue %[[aligned]], {{.*}}[1] : !llvm.struct<(ptr, ptr, i64)>
+//       CHECK:   %[[srcoff:.*]] = llvm.extractvalue {{.*}}[2] : !llvm.struct<(ptr, ptr, i64, array<3 x i64>, array<3 x i64>)>
+//       CHECK:   %[[bufptr:.*]] = llvm.getelementptr %[[aligned]][%[[srcoff]]]
+//       CHECK:   llvm.insertvalue %[[bufptr]], {{.*}}[1] : !llvm.struct<(ptr, ptr, i64)>
 //       CHECK:   llvm.mlir.constant(0 : index
 //       CHECK:   llvm.insertvalue {{.*}}[2] : !llvm.struct<(ptr, ptr, i64)>
 
@@ -795,7 +797,9 @@ func.func @type_cast_non_zero_addrspace(%arg0: memref<8x8x8xf32, 3>) -> memref<v
 //       CHECK:   %[[allocated:.*]] = llvm.extractvalue {{.*}}[0] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<3 x i64>, array<3 x i64>)>
 //       CHECK:   llvm.insertvalue %[[allocated]], {{.*}}[0] : !llvm.struct<(ptr<3>, ptr<3>, i64)>
 //       CHECK:   %[[aligned:.*]] = llvm.extractvalue {{.*}}[1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<3 x i64>, array<3 x i64>)>
-//       CHECK:   llvm.insertvalue %[[aligned]], {{.*}}[1] : !llvm.struct<(ptr<3>, ptr<3>, i64)>
+//       CHECK:   %[[srcoff:.*]] = llvm.extractvalue {{.*}}[2] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<3 x i64>, array<3 x i64>)>
+//       CHECK:   %[[bufptr:.*]] = llvm.getelementptr %[[aligned]][%[[srcoff]]]
+//       CHECK:   llvm.insertvalue %[[bufptr]], {{.*}}[1] : !llvm.struct<(ptr<3>, ptr<3>, i64)>
 //       CHECK:   llvm.mlir.constant(0 : index
 //       CHECK:   llvm.insertvalue {{.*}}[2] : !llvm.struct<(ptr<3>, ptr<3>, i64)>
 

>From 0e2aa79064796b4f7e937001bd2791e1e5306035 Mon Sep 17 00:00:00 2001
From: Ivan Butygin <ivan.butygin at gmail.com>
Date: Fri, 17 Apr 2026 14:17:20 +0200
Subject: [PATCH 25/27] [WIP][mlir] step 5: LLVM+SPIR-V audit fixes

- GPUToLLVM memcpy/memset: use bufferPtr so descriptor offset is honored.
- AMDGPUToROCDL FatRawBufferCast: add descOffset*elemBytes to numRecords
  when !resetOffset so the buffer resource covers the shifted base.
- LLVMCommon bare-ptr calling convention: lower memref args via bufferPtr
  (host side) so callees receive a ptr that already includes the offset.
- GPUCommon GPUReturnOp bare-ptr: return bufferPtr for memref results.
- MemRefToLLVM MemRefReshape: load the shape memref through bufferPtr.
- Regression CHECK updates for the four tests above plus FuncToLLVM
  bare-ptr return.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
---
 .../AMDGPUToROCDL/AMDGPUToROCDL.cpp           | 17 +++++++++-
 .../Conversion/GPUCommon/GPUOpsLowering.cpp   | 13 +++++---
 .../GPUCommon/GPUToLLVMConversion.cpp         | 22 +++++++++----
 .../Conversion/LLVMCommon/TypeConverter.cpp   |  9 +++--
 .../Conversion/MemRefToLLVM/MemRefToLLVM.cpp  |  5 ++-
 .../AMDGPUToROCDL/amdgpu-to-rocdl.mlir        | 33 +++++++++++++------
 .../FuncToLLVM/func-memref-return.mlir        |  4 ++-
 ...launch-func-bare-ptr-intersperse-size.mlir | 12 +++++--
 .../GPUCommon/lower-launch-func-bare-ptr.mlir |  4 ++-
 .../convert-dynamic-memref-ops.mlir           |  4 ++-
 10 files changed, 90 insertions(+), 33 deletions(-)

diff --git a/mlir/lib/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.cpp b/mlir/lib/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.cpp
index fe38acec29e78..9a904c8987744 100644
--- a/mlir/lib/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.cpp
+++ b/mlir/lib/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.cpp
@@ -232,10 +232,25 @@ struct FatRawBufferCastLowering
       return op.emitOpError("Can't lower non-stride-offset memrefs");
 
     Value numRecords = adaptor.getValidBytes();
-    if (!numRecords)
+    if (!numRecords) {
       numRecords =
           getNumRecords(rewriter, loc, memrefType, descriptor, strideVals,
                         elementByteWidth, chipset, adaptor.getBoundsCheck());
+      // When the rsrc base is the raw aligned pointer (i.e. we did not bake
+      // the descriptor offset into the base), the runtime offset is added on
+      // top by the buffer rsrc, so num_records must cover that extra range.
+      if (!adaptor.getResetOffset()) {
+        Value descOffset = descriptor.offset(rewriter, loc);
+        Value descOffsetI64 =
+            convertUnsignedToI64(rewriter, loc, descOffset);
+        Value byteWidthConst =
+            createI64Constant(rewriter, loc, elementByteWidth);
+        Value descOffsetBytes =
+            LLVM::MulOp::create(rewriter, loc, descOffsetI64, byteWidthConst);
+        numRecords =
+            LLVM::AddOp::create(rewriter, loc, numRecords, descOffsetBytes);
+      }
+    }
 
     Value basePointer =
         adaptor.getResetOffset()
diff --git a/mlir/lib/Conversion/GPUCommon/GPUOpsLowering.cpp b/mlir/lib/Conversion/GPUCommon/GPUOpsLowering.cpp
index 6a705ebab7aa4..662598d3d9b1f 100644
--- a/mlir/lib/Conversion/GPUCommon/GPUOpsLowering.cpp
+++ b/mlir/lib/Conversion/GPUCommon/GPUOpsLowering.cpp
@@ -778,15 +778,18 @@ LogicalResult GPUReturnOpLowering::matchAndRewrite(
 
   bool useBarePtrCallConv = getTypeConverter()->getOptions().useBarePtrCallConv;
   if (useBarePtrCallConv) {
-    // For the bare-ptr calling convention, extract the aligned pointer to
-    // be returned from the memref descriptor.
+    // For the bare-ptr calling convention, extract the buffer pointer
+    // (aligned ptr + runtime offset) to be returned from the memref
+    // descriptor; the bare-ptr ABI cannot carry the offset separately.
     for (auto it : llvm::zip(op->getOperands(), adaptor.getOperands())) {
       Type oldTy = std::get<0>(it).getType();
       Value newOperand = std::get<1>(it);
-      if (isa<MemRefType>(oldTy) && getTypeConverter()->canConvertToBarePtr(
-                                        cast<BaseMemRefType>(oldTy))) {
+      if (auto memrefType = dyn_cast<MemRefType>(oldTy);
+          memrefType && getTypeConverter()->canConvertToBarePtr(
+                            cast<BaseMemRefType>(oldTy))) {
         MemRefDescriptor memrefDesc(newOperand);
-        newOperand = memrefDesc.allocatedPtr(rewriter, loc);
+        newOperand = memrefDesc.bufferPtr(rewriter, loc, *getTypeConverter(),
+                                          memrefType);
       } else if (isa<UnrankedMemRefType>(oldTy)) {
         // Unranked memref is not supported in the bare pointer calling
         // convention.
diff --git a/mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp b/mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
index 3e99c537d0e02..53f55c8203406 100644
--- a/mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
+++ b/mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
@@ -1116,12 +1116,18 @@ LogicalResult ConvertMemcpyOpToGpuRuntimeCallPattern::matchAndRewrite(
   auto sizeBytes =
       LLVM::PtrToIntOp::create(rewriter, loc, getIndexType(), gepPtr);
 
-  auto src = bitAndAddrspaceCast(loc, rewriter, llvmPointerType,
-                                 srcDesc.alignedPtr(rewriter, loc),
-                                 *getTypeConverter());
+  // Use bufferPtr to fold the descriptor's runtime offset into the base
+  // pointer; otherwise an offset coming from a subview/reinterpret_cast would
+  // be silently dropped by the runtime memcpy.
+  auto dstMemRefType = cast<MemRefType>(memcpyOp.getDst().getType());
+  auto src = bitAndAddrspaceCast(
+      loc, rewriter, llvmPointerType,
+      srcDesc.bufferPtr(rewriter, loc, *getTypeConverter(), memRefType),
+      *getTypeConverter());
   auto dst = bitAndAddrspaceCast(
       loc, rewriter, llvmPointerType,
-      MemRefDescriptor(adaptor.getDst()).alignedPtr(rewriter, loc),
+      MemRefDescriptor(adaptor.getDst())
+          .bufferPtr(rewriter, loc, *getTypeConverter(), dstMemRefType),
       *getTypeConverter());
 
   auto stream = adaptor.getAsyncDependencies().front();
@@ -1160,9 +1166,11 @@ LogicalResult ConvertMemsetOpToGpuRuntimeCallPattern::matchAndRewrite(
 
   auto value =
       LLVM::BitcastOp::create(rewriter, loc, bitCastType, adaptor.getValue());
-  auto dst = bitAndAddrspaceCast(loc, rewriter, llvmPointerType,
-                                 dstDesc.alignedPtr(rewriter, loc),
-                                 *getTypeConverter());
+  // Fold the descriptor's runtime offset into the base pointer.
+  auto dst = bitAndAddrspaceCast(
+      loc, rewriter, llvmPointerType,
+      dstDesc.bufferPtr(rewriter, loc, *getTypeConverter(), memRefType),
+      *getTypeConverter());
 
   auto stream = adaptor.getAsyncDependencies().front();
   FunctionCallBuilder builder =
diff --git a/mlir/lib/Conversion/LLVMCommon/TypeConverter.cpp b/mlir/lib/Conversion/LLVMCommon/TypeConverter.cpp
index 1eedfb9c3c54d..aeb0c37bb879e 100644
--- a/mlir/lib/Conversion/LLVMCommon/TypeConverter.cpp
+++ b/mlir/lib/Conversion/LLVMCommon/TypeConverter.cpp
@@ -763,11 +763,14 @@ SmallVector<Value, 4> LLVMTypeConverter::promoteOperands(
        llvm::zip_equal(opOperands, adaptorOperands)) {
     if (useBarePtrCallConv) {
       // For the bare-ptr calling convention, we only have to extract the
-      // aligned pointer of a memref.
-      if (isa<MemRefType>(operand.getType())) {
+      // buffer pointer of a memref. Use bufferPtr (aligned ptr + runtime
+      // offset) so the descriptor's offset is folded into the pointer; the
+      // bare-ptr ABI cannot carry the offset separately.
+      if (auto memrefType = dyn_cast<MemRefType>(operand.getType())) {
         assert(llvmOperand.size() == 1 && "Expected a single operand");
         MemRefDescriptor desc(llvmOperand.front());
-        promotedOperands.push_back(desc.alignedPtr(builder, loc));
+        promotedOperands.push_back(desc.bufferPtr(builder, loc, *this,
+                                                  memrefType));
         continue;
       }
       if (isa<UnrankedMemRefType>(operand.getType())) {
diff --git a/mlir/lib/Conversion/MemRefToLLVM/MemRefToLLVM.cpp b/mlir/lib/Conversion/MemRefToLLVM/MemRefToLLVM.cpp
index 29ad68117fc7e..8ceebf103fbb1 100644
--- a/mlir/lib/Conversion/MemRefToLLVM/MemRefToLLVM.cpp
+++ b/mlir/lib/Conversion/MemRefToLLVM/MemRefToLLVM.cpp
@@ -1613,7 +1613,10 @@ struct MemRefReshapeOpLowering
         rewriter, loc, *getTypeConverter(), underlyingDescPtr, elementPtrType);
     Value targetStridesBase = UnrankedMemRefDescriptor::strideBasePtr(
         rewriter, loc, *getTypeConverter(), targetSizesBase, resultRank);
-    Value shapeOperandPtr = shapeDesc.alignedPtr(rewriter, loc);
+    // Use bufferPtr so the shape memref's runtime offset is folded in;
+    // otherwise the indexed loads below would read at the wrong address.
+    Value shapeOperandPtr =
+        shapeDesc.bufferPtr(rewriter, loc, *getTypeConverter(), shapeMemRefType);
     Value oneIndex = createIndexAttrConstant(rewriter, loc, getIndexType(), 1);
     Value resultRankMinusOne =
         LLVM::SubOp::create(rewriter, loc, resultRank, oneIndex);
diff --git a/mlir/test/Conversion/AMDGPUToROCDL/amdgpu-to-rocdl.mlir b/mlir/test/Conversion/AMDGPUToROCDL/amdgpu-to-rocdl.mlir
index 6f15498422465..268008bfe1837 100644
--- a/mlir/test/Conversion/AMDGPUToROCDL/amdgpu-to-rocdl.mlir
+++ b/mlir/test/Conversion/AMDGPUToROCDL/amdgpu-to-rocdl.mlir
@@ -15,7 +15,10 @@ func.func @fat_raw_buffer_cast(%buf: memref<8xi32, #gpu.address_space<global>>)
   // CHECK-DAG: %[[offset:.*]] = llvm.extractvalue %[[desc]][2]
   // CHECK-DAG: %[[sizes:.*]] = llvm.extractvalue %[[desc]][3]
   // CHECK-DAG: %[[strides:.*]] = llvm.extractvalue %[[desc]][4]
-  // CHECK-DAG: %[[numRecords:.*]] = llvm.mlir.constant(32 : i64) : i64
+  // CHECK-DAG: %[[staticSize:.*]] = llvm.mlir.constant(32 : i64) : i64
+  // CHECK-DAG: %[[elemBytes:.*]] = llvm.mlir.constant(4 : i64) : i64
+  // CHECK-DAG: %[[offBytes:.*]] = llvm.mul %{{.*}}, %[[elemBytes]] : i64
+  // CHECK-DAG: %[[numRecords:.*]] = llvm.add %[[staticSize]], %[[offBytes]] : i64
   // CHECK-DAG: %[[strideArg:.*]] = llvm.mlir.constant(0 : i16) : i16
   // GFX9:  %[[flags:.*]] = llvm.mlir.constant(159744 : i32)
   // GFX1250: %[[flags:.*]] = llvm.mlir.constant(0 : i32)
@@ -24,9 +27,9 @@ func.func @fat_raw_buffer_cast(%buf: memref<8xi32, #gpu.address_space<global>>)
   // CHECK: %[[ret0:.*]] = llvm.mlir.poison : !llvm.struct<(ptr<7>, ptr<7>, i64, array<1 x i64>, array<1 x i64>)>
   // CHECK: %[[ret1:.*]] = llvm.insertvalue %[[fatBuf]], %[[ret0]][0]
   // CHECK: %[[ret2:.*]] = llvm.insertvalue %[[fatBuf]], %[[ret1]][1]
-  // CHECK: %[[ret3:.*]] = llvm.insertvalue %[[offset]], %[[ret2]][2]
-  // CHECK: %[[ret4:.*]] = llvm.insertvalue %[[sizes]], %[[ret3]][3]
-  // CHECK: %[[ret5:.*]] = llvm.insertvalue %[[strides]], %[[ret4]][4]
+  // CHECK: %[[ret3:.*]] = llvm.insertvalue %{{.*}}, %[[ret2]][2]
+  // CHECK: %[[ret4:.*]] = llvm.insertvalue %{{.*}}, %[[ret3]][3]
+  // CHECK: %[[ret5:.*]] = llvm.insertvalue %{{.*}}, %[[ret4]][4]
   // CHECK: builtin.unrealized_conversion_cast %[[ret5]]
   %ret = amdgpu.fat_raw_buffer_cast %buf : memref<8xi32, #gpu.address_space<global>> to memref<8xi32, #amdgpu.address_space<fat_raw_buffer>>
   return %ret : memref<8xi32, #amdgpu.address_space<fat_raw_buffer>>
@@ -37,7 +40,10 @@ func.func @fat_raw_buffer_cast_0d(%buf: memref<i32, #gpu.address_space<global>>)
   // CHECK: %[[desc:.*]] = builtin.unrealized_conversion_cast %{{.*}} : memref<i32, #gpu.address_space<global>> to !llvm.struct<(ptr<1>, ptr<1>, i64)>
   // CHECK-DAG: %[[base:.*]] = llvm.extractvalue %[[desc]][1]
   // CHECK-DAG: %[[offset:.*]] = llvm.extractvalue %[[desc]][2]
-  // CHECK-DAG: %[[numRecords:.*]] = llvm.mlir.constant(4 : i64) : i64
+  // CHECK-DAG: %[[staticSize:.*]] = llvm.mlir.constant(4 : i64) : i64
+  // CHECK-DAG: %[[elemBytes:.*]] = llvm.mlir.constant(4 : i64) : i64
+  // CHECK-DAG: %[[offBytes:.*]] = llvm.mul %{{.*}}, %[[elemBytes]] : i64
+  // CHECK-DAG: %[[numRecords:.*]] = llvm.add %[[staticSize]], %[[offBytes]] : i64
   // CHECK-DAG: %[[strideArg:.*]] = llvm.mlir.constant(0 : i16) : i16
   // GFX9:  %[[flags:.*]] = llvm.mlir.constant(159744 : i32)
   // GFX1250: %[[flags:.*]] = llvm.mlir.constant(0 : i32)
@@ -46,7 +52,7 @@ func.func @fat_raw_buffer_cast_0d(%buf: memref<i32, #gpu.address_space<global>>)
   // CHECK: %[[ret0:.*]] = llvm.mlir.poison : !llvm.struct<(ptr<7>, ptr<7>, i64)>
   // CHECK: %[[ret1:.*]] = llvm.insertvalue %[[fatBuf]], %[[ret0]][0]
   // CHECK: %[[ret2:.*]] = llvm.insertvalue %[[fatBuf]], %[[ret1]][1]
-  // CHECK: %[[ret3:.*]] = llvm.insertvalue %[[offset]], %[[ret2]][2]
+  // CHECK: %[[ret3:.*]] = llvm.insertvalue %{{.*}}, %[[ret2]][2]
   // CHECK: builtin.unrealized_conversion_cast %[[ret3]]
   %ret = amdgpu.fat_raw_buffer_cast %buf : memref<i32, #gpu.address_space<global>> to memref<i32, #amdgpu.address_space<fat_raw_buffer>>
   return %ret : memref<i32, #amdgpu.address_space<fat_raw_buffer>>
@@ -58,7 +64,11 @@ func.func @fat_raw_buffer_cast_dyn_size_offset(%buf: memref<?xi32, strided<[1]>,
   // CHECK: %[[stride0:.*]] = llvm.extractvalue %{{.*}}[4, 0]
   // CHECK: %[[maxVals:.*]] = llvm.mul %[[size0]], %[[stride0]]
   // CHECK: %[[byteSize:.*]] = llvm.mlir.constant(4 : i64) : i64
-  // CHECK: %[[numRecords:.*]] = llvm.mul %[[maxVals]], %[[byteSize]]
+  // CHECK: %[[regionSize:.*]] = llvm.mul %[[maxVals]], %[[byteSize]]
+  // CHECK: %[[descOff:.*]] = llvm.extractvalue %{{.*}}[2]
+  // CHECK: %[[elemBytes:.*]] = llvm.mlir.constant(4 : i64) : i64
+  // CHECK: %[[offBytes:.*]] = llvm.mul %[[descOff]], %[[elemBytes]] : i64
+  // CHECK: %[[numRecords:.*]] = llvm.add %[[regionSize]], %[[offBytes]] : i64
   // CHECK: %[[offset:.*]] = llvm.extractvalue %{{.*}}[2]
   // CHECK: rocdl.make.buffer.rsrc %{{.*}}, %{{.*}}, %[[numRecords]], %{{.*}}
   // CHECK: llvm.insertvalue %[[offset]], %{{.*}}[2]
@@ -91,11 +101,14 @@ func.func @fat_raw_buffer_cast_valid_bytes(%buf: memref<8xi32, #gpu.address_spac
 
 // CHECK-LABEL: func @fat_raw_buffer_cast_bounds_check
 func.func @fat_raw_buffer_cast_bounds_check(%buf: memref<8xi32, #gpu.address_space<global>>) -> memref<8xi32, #amdgpu.address_space<fat_raw_buffer>> {
-  // GFX9:  %[[numRecords:.*]] = llvm.mlir.constant({{.*}} : i64)
+  // GFX9:  %[[regionSize:.*]] = llvm.mlir.constant({{.*}} : i64)
+  // GFX9:  %[[numRecords:.*]] = llvm.add %[[regionSize]], %{{.*}} : i64
   // GFX9:  %[[flags:.*]] = llvm.mlir.constant(159744 : i32)
-  // GFX1250: %[[numRecords:.*]] = llvm.mlir.constant(35184372088831 : i64)
+  // GFX1250: %[[regionSize:.*]] = llvm.mlir.constant(35184372088831 : i64)
+  // GFX1250: %[[numRecords:.*]] = llvm.add %[[regionSize]], %{{.*}} : i64
   // GFX1250: %[[flags:.*]] = llvm.mlir.constant(0 : i32)
-  // RDNA:  %[[numRecords:.*]] = llvm.mlir.constant({{.*}} : i64)
+  // RDNA:  %[[regionSize:.*]] = llvm.mlir.constant({{.*}} : i64)
+  // RDNA:  %[[numRecords:.*]] = llvm.add %[[regionSize]], %{{.*}} : i64
   // RDNA:  %[[flags:.*]] = llvm.mlir.constant(553807872 : i32)
   // CHECK: %[[rsrc:.*]] = rocdl.make.buffer.rsrc %{{.*}}, %{{.*}}, %[[numRecords]], %[[flags]]
   %ret = amdgpu.fat_raw_buffer_cast %buf boundsCheck(false) : memref<8xi32, #gpu.address_space<global>> to memref<8xi32, #amdgpu.address_space<fat_raw_buffer>>
diff --git a/mlir/test/Conversion/FuncToLLVM/func-memref-return.mlir b/mlir/test/Conversion/FuncToLLVM/func-memref-return.mlir
index 95a786d9ab0ff..0bf1c19b0020e 100644
--- a/mlir/test/Conversion/FuncToLLVM/func-memref-return.mlir
+++ b/mlir/test/Conversion/FuncToLLVM/func-memref-return.mlir
@@ -70,7 +70,9 @@ func.func private @foo(memref<10xi8>) -> memref<20xi8>
 // BAREPTR-SAME:    %[[in:.*]]: !llvm.ptr) -> !llvm.ptr
 func.func @check_memref_func_call(%in : memref<10xi8>) -> memref<20xi8> {
   // BAREPTR:         %[[inDesc:.*]] = llvm.insertvalue %{{.*}}, %{{.*}}[4, 0]
-  // BAREPTR-NEXT:    %[[barePtr:.*]] = llvm.extractvalue %[[inDesc]][1] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
+  // BAREPTR-NEXT:    %[[inAligned:.*]] = llvm.extractvalue %[[inDesc]][1] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
+  // BAREPTR-NEXT:    %[[inOff:.*]] = llvm.extractvalue %[[inDesc]][2] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
+  // BAREPTR-NEXT:    %[[barePtr:.*]] = llvm.getelementptr %[[inAligned]][%[[inOff]]]
   // BAREPTR-NEXT:    %[[call:.*]] = llvm.call @foo(%[[barePtr]]) : (!llvm.ptr) -> !llvm.ptr
   // BAREPTR-NEXT:    %[[desc0:.*]] = llvm.mlir.poison : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
   // BAREPTR-NEXT:    %[[desc1:.*]] = llvm.insertvalue %[[call]], %[[desc0]][0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
diff --git a/mlir/test/Conversion/GPUCommon/lower-launch-func-bare-ptr-intersperse-size.mlir b/mlir/test/Conversion/GPUCommon/lower-launch-func-bare-ptr-intersperse-size.mlir
index 171b13da22713..dace29b6ba413 100644
--- a/mlir/test/Conversion/GPUCommon/lower-launch-func-bare-ptr-intersperse-size.mlir
+++ b/mlir/test/Conversion/GPUCommon/lower-launch-func-bare-ptr-intersperse-size.mlir
@@ -9,9 +9,15 @@ module attributes {gpu.container_module, spirv.target_env = #spirv.target_env<#s
     // CHECK: [[RANK2UMD:%.*]] = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
     %rank2UndefMemrefDescriptor = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
     %c1 = arith.constant 1 : index
-    // CHECK: [[PTR1:%.*]] = llvm.extractvalue [[RANK1UMD]][1]
-    // CHECK: [[PTR2:%.*]] = llvm.extractvalue [[RANK2UMD]][1]
-    // CHECK: [[PTR3:%.*]] = llvm.extractvalue [[RANK2UMD]][1]
+    // CHECK: [[ALIGNED1:%.*]] = llvm.extractvalue [[RANK1UMD]][1]
+    // CHECK: [[OFF1:%.*]] = llvm.extractvalue [[RANK1UMD]][2]
+    // CHECK: [[PTR1:%.*]] = llvm.getelementptr [[ALIGNED1]][[[OFF1]]] : (!llvm.ptr, i64) -> !llvm.ptr, f32
+    // CHECK: [[ALIGNED2:%.*]] = llvm.extractvalue [[RANK2UMD]][1]
+    // CHECK: [[OFF2:%.*]] = llvm.extractvalue [[RANK2UMD]][2]
+    // CHECK: [[PTR2:%.*]] = llvm.getelementptr [[ALIGNED2]][[[OFF2]]] : (!llvm.ptr, i64) -> !llvm.ptr, i32
+    // CHECK: [[ALIGNED3:%.*]] = llvm.extractvalue [[RANK2UMD]][1]
+    // CHECK: [[OFF3:%.*]] = llvm.extractvalue [[RANK2UMD]][2]
+    // CHECK: [[PTR3:%.*]] = llvm.getelementptr [[ALIGNED3]][[[OFF3]]] : (!llvm.ptr, i64) -> !llvm.ptr, i8
     // CHECK: [[SIZE1:%.*]] = llvm.mlir.constant(32 : index) : i64
     // CHECK: [[SIZE2:%.*]] = llvm.mlir.constant(256 : index) : i64
     // CHECK: [[SIZE3:%.*]] = llvm.mlir.constant(48 : index) : i64
diff --git a/mlir/test/Conversion/GPUCommon/lower-launch-func-bare-ptr.mlir b/mlir/test/Conversion/GPUCommon/lower-launch-func-bare-ptr.mlir
index 5e1c3b797235f..8e6f267027a0a 100644
--- a/mlir/test/Conversion/GPUCommon/lower-launch-func-bare-ptr.mlir
+++ b/mlir/test/Conversion/GPUCommon/lower-launch-func-bare-ptr.mlir
@@ -18,7 +18,9 @@ module attributes {gpu.container_module} {
   func.func @foo() {
     // CHECK: [[MEMREF:%.*]] = gpu.alloc () : memref<10xf32, 1>
     // CHECK: [[DESCRIPTOR:%.*]] = builtin.unrealized_conversion_cast [[MEMREF]] : memref<10xf32, 1> to !llvm.struct<(ptr<1>, ptr<1>, i64, array<1 x i64>, array<1 x i64>)>
-    // CHECK: [[PTR:%.*]] = llvm.extractvalue [[DESCRIPTOR]][1] : !llvm.struct<(ptr<1>, ptr<1>, i64, array<1 x i64>, array<1 x i64>)>
+    // CHECK: [[ALIGNED:%.*]] = llvm.extractvalue [[DESCRIPTOR]][1] : !llvm.struct<(ptr<1>, ptr<1>, i64, array<1 x i64>, array<1 x i64>)>
+    // CHECK: [[OFF:%.*]] = llvm.extractvalue [[DESCRIPTOR]][2] : !llvm.struct<(ptr<1>, ptr<1>, i64, array<1 x i64>, array<1 x i64>)>
+    // CHECK: [[PTR:%.*]] = llvm.getelementptr [[ALIGNED]][[[OFF]]] : (!llvm.ptr<1>, i64) -> !llvm.ptr<1>, f32
     // CHECK: gpu.launch_func  @kernels::@kernel_1 blocks in ({{.*}}) threads in ({{.*}}) : i64
     // CHECK: args(%{{.*}} : f32, [[PTR]] : !llvm.ptr<1>)
     %0 = arith.constant 0. : f32
diff --git a/mlir/test/Conversion/MemRefToLLVM/convert-dynamic-memref-ops.mlir b/mlir/test/Conversion/MemRefToLLVM/convert-dynamic-memref-ops.mlir
index 2292313bf1402..704d4aa76098f 100644
--- a/mlir/test/Conversion/MemRefToLLVM/convert-dynamic-memref-ops.mlir
+++ b/mlir/test/Conversion/MemRefToLLVM/convert-dynamic-memref-ops.mlir
@@ -552,7 +552,9 @@ func.func @memref_reshape(%input : memref<2x3xf32>, %shape : memref<?xindex>) {
 // Iterate over shape operand in reverse order and set sizes and strides.
 // CHECK: [[SIZES_PTR:%.*]] = llvm.getelementptr [[UNDERLYING_DESC]]{{\[}}0, 3]
 // CHECK: [[STRIDES_PTR:%.*]] = llvm.getelementptr [[SIZES_PTR]]{{\[}}[[RANK]]]
-// CHECK: [[SHAPE_IN_PTR:%.*]] = llvm.extractvalue [[SHAPE]][1] : [[SHAPE_TY]]
+// CHECK: [[SHAPE_ALIGNED:%.*]] = llvm.extractvalue [[SHAPE]][1] : [[SHAPE_TY]]
+// CHECK: [[SHAPE_OFF:%.*]] = llvm.extractvalue [[SHAPE]][2] : [[SHAPE_TY]]
+// CHECK: [[SHAPE_IN_PTR:%.*]] = llvm.getelementptr [[SHAPE_ALIGNED]][[[SHAPE_OFF]]]
 // CHECK: [[C1_:%.*]] = llvm.mlir.constant(1 : index) : i64
 // CHECK: [[RANK_MIN_1:%.*]] = llvm.sub [[RANK]], [[C1_]] : i64
 // CHECK: llvm.br ^bb1([[RANK_MIN_1]], [[C1_]] : i64, i64)

>From e5550c26cd16dd67509c39fbd6910d5205d1a5d8 Mon Sep 17 00:00:00 2001
From: Ivan Butygin <ivan.butygin at gmail.com>
Date: Fri, 17 Apr 2026 16:56:13 +0200
Subject: [PATCH 26/27] [WIP][mlir] step 6: coverage audit fixes and two real
 bugs

Bugs fixed:
- SubViewOp::fold now requires all offsets be zero and all strides
  be one before folding a subview to its source. Pre-refactor the
  "static layout" check ruled out dynamic offsets because offsets
  lived in the layout attr; after the refactor it only checked
  strides, so a subview with a dynamic %idx offset silently folded
  away.
- affine::normalizeMemRef(ReinterpretCastOp) now reads the op's
  static offset via getStaticOffsets() and composes it into the
  layout map before computing the normalized shape and indexRemap.
  Previously a non-zero offset operand was dropped, producing a
  smaller-than-needed flat memref and mis-indexed user loads.

Infrastructure:
- Added InferStridedMetadataOpInterface impl for ReinterpretCastOp
  so StridedMetadataRangeAnalysis can seed tight offset ranges from
  reinterpret_cast operands. test-strided-metadata-range-analysis
  regains its bounded-offset CHECKs by routing through reinterpret
  casts that pin the entry-state offset.

Coverage restoration:
- VectorToXeGPU gather/scatter/transfer-read/transfer-write: restored
  extract_strided_metadata + arith.addi %[[OFFSET]] CHECKs that
  verify subview offsets flow into the XeGPU index math.
- AMDGPUToROCDL global-prefetch: tightened unanchored GEP CHECKs
  with SSA bindings.
- MemRefToLLVM memref-to-llvm: added @atomic_rmw_with_nonzero_offset
  exercising a constant 5 in descriptor [2] via reinterpret_cast.
- vector-transfer-collapse-inner-most-dims: pinned subview offsets
  [%i, 0] / [0, 0] so dropped-offset regressions would fail CHECK.
- Transforms/compose-subview, Transforms/canonicalize: documented
  the composed-offset math so future readers know op-operand CHECKs
  are authoritative.
- MemRef/canonicalize: restored the dropped "don't simplify
  reinterpret_cast when the offset doesn't match" comment.
- IR/invalid-builtin-types: added a negative test pinning that
  `strided<[...], offset: N>` is rejected with the generic
  "expected '>'" diagnostic.

Renames/retargets:
- FuncToLLVM @check_static_return_with_offset ->
  @check_static_return_with_strides.
- FuncToSPIRV @memref_offset_strides -> @memref_strides (offset-
  vs-array-size cases dropped; strides coverage preserved).
- SCF loop-pipelining #map -> #strided1 (attr is a
  StridedLayoutAttr, not an affine map).
- MemRef/expand-strided-metadata negative-test renamed and
  recommented now that its anchor is the strided-layout attr, not
  an `offset: N` in the type.
- normalize-memrefs-ops @reinterpret_cast_non_zero_offset back to
  size 32, matching the fixed normalize pass behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
---
 .../mlir/Dialect/MemRef/IR/MemRefOps.td       |  1 +
 mlir/lib/Dialect/Affine/Utils/Utils.cpp       | 18 +++++++++
 mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp      | 37 ++++++++++++++++++-
 .../test-strided-metadata-range-analysis.mlir | 21 +++++++----
 .../AMDGPUToROCDL/global-prefetch.mlir        | 18 ++++++---
 .../FuncToLLVM/func-memref-return.mlir        | 13 +++++--
 .../FuncToSPIRV/types-to-spirv.mlir           | 14 +++----
 .../MemRefToLLVM/memref-to-llvm.mlir          | 19 ++++++++++
 .../VectorToXeGPU/gather-to-xegpu.mlir        |  3 ++
 .../VectorToXeGPU/scatter-to-xegpu.mlir       |  3 ++
 .../VectorToXeGPU/transfer-read-to-xegpu.mlir |  7 +++-
 .../transfer-write-to-xegpu.mlir              |  3 ++
 mlir/test/Dialect/MemRef/canonicalize.mlir    | 12 ++++--
 .../MemRef/expand-strided-metadata.mlir       | 10 ++++-
 .../Dialect/MemRef/normalize-memrefs-ops.mlir |  4 +-
 mlir/test/Dialect/SCF/loop-pipelining.mlir    | 20 +++++-----
 ...tor-transfer-collapse-inner-most-dims.mlir |  6 ++-
 mlir/test/IR/invalid-builtin-types.mlir       |  7 ++++
 mlir/test/Transforms/canonicalize.mlir        |  5 +++
 mlir/test/Transforms/compose-subview.mlir     |  7 ++++
 20 files changed, 183 insertions(+), 45 deletions(-)

diff --git a/mlir/include/mlir/Dialect/MemRef/IR/MemRefOps.td b/mlir/include/mlir/Dialect/MemRef/IR/MemRefOps.td
index 74ed0d9f5952a..8e201484f093a 100644
--- a/mlir/include/mlir/Dialect/MemRef/IR/MemRefOps.td
+++ b/mlir/include/mlir/Dialect/MemRef/IR/MemRefOps.td
@@ -1484,6 +1484,7 @@ def MemRef_PrefetchOp : MemRef_Op<"prefetch", [
 def MemRef_ReinterpretCastOp
   : MemRef_OpWithOffsetSizesAndStrides<"reinterpret_cast", [
       DeclareOpInterfaceMethods<OpAsmOpInterface, ["getAsmResultNames"]>,
+      DeclareOpInterfaceMethods<InferStridedMetadataOpInterface>,
       DeclareOpInterfaceMethods<MemorySpaceCastConsumerOpInterface>,
       AttrSizedOperandSegments,
       MemRefsNormalizable,
diff --git a/mlir/lib/Dialect/Affine/Utils/Utils.cpp b/mlir/lib/Dialect/Affine/Utils/Utils.cpp
index 7043083298615..dc6547c550de4 100644
--- a/mlir/lib/Dialect/Affine/Utils/Utils.cpp
+++ b/mlir/lib/Dialect/Affine/Utils/Utils.cpp
@@ -1785,6 +1785,24 @@ mlir::affine::normalizeMemRef(memref::ReinterpretCastOp reinterpretCastOp) {
   AffineMap oldLayoutMap = memrefType.getLayout().getAffineMap();
   Value oldMemRef = reinterpretCastOp.getResult();
 
+  // Incorporate the op's static offset (if any) into the layout map: memref
+  // types no longer carry offsets, so the affine map used for indexRemap and
+  // for computing the normalized shape must account for the static offset
+  // operand here.
+  ArrayRef<int64_t> staticOffsets = reinterpretCastOp.getStaticOffsets();
+  int64_t staticOffset = 0;
+  if (!staticOffsets.empty() &&
+      !ShapedType::isDynamic(staticOffsets.front()))
+    staticOffset = staticOffsets.front();
+  if (staticOffset != 0) {
+    MLIRContext *ctx = reinterpretCastOp.getContext();
+    AffineMap offsetMap = AffineMap::get(
+        1, 0, getAffineDimExpr(0, ctx) + staticOffset);
+    oldLayoutMap = offsetMap.compose(oldLayoutMap);
+    memrefType =
+        MemRefType::Builder(memrefType).setLayout(AffineMapAttr::get(oldLayoutMap));
+  }
+
   // If `oldLayoutMap` is identity, `memrefType` is already normalized.
   if (oldLayoutMap.isIdentity())
     return success();
diff --git a/mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp b/mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
index 602f851877736..6af0e4a53f270 100644
--- a/mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
+++ b/mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
@@ -2151,6 +2151,39 @@ SmallVector<OpFoldResult> ReinterpretCastOp::getConstifiedMixedSizes() {
   return values;
 }
 
+void ReinterpretCastOp::inferStridedMetadataRanges(
+    ArrayRef<StridedMetadataRange> ranges, GetIntRangeFn getIntRange,
+    SetStridedMetadataRangeFn setMetadata, int32_t indexBitwidth) {
+  auto isUninitialized =
+      +[](IntegerValueRange range) { return range.isUninitialized(); };
+
+  SmallVector<IntegerValueRange> offsetOperands =
+      getIntValueRanges(getMixedOffsets(), getIntRange, indexBitwidth);
+  if (llvm::any_of(offsetOperands, isUninitialized))
+    return;
+
+  SmallVector<IntegerValueRange> sizeOperands =
+      getIntValueRanges(getMixedSizes(), getIntRange, indexBitwidth);
+  if (llvm::any_of(sizeOperands, isUninitialized))
+    return;
+
+  SmallVector<IntegerValueRange> strideOperands =
+      getIntValueRanges(getMixedStrides(), getIntRange, indexBitwidth);
+  if (llvm::any_of(strideOperands, isUninitialized))
+    return;
+
+  SmallVector<ConstantIntRanges> sizes, strides;
+  for (IntegerValueRange &size : sizeOperands)
+    sizes.push_back(size.getValue());
+  for (IntegerValueRange &stride : strideOperands)
+    strides.push_back(stride.getValue());
+
+  setMetadata(getResult(),
+              StridedMetadataRange::getRanked(
+                  SmallVector<ConstantIntRanges>({offsetOperands.front().getValue()}),
+                  std::move(sizes), std::move(strides)));
+}
+
 SmallVector<OpFoldResult> ReinterpretCastOp::getConstifiedMixedStrides() {
   SmallVector<OpFoldResult> values = getMixedStrides();
   SmallVector<int64_t> staticValues;
@@ -3657,7 +3690,9 @@ OpFoldResult SubViewOp::fold(FoldAdaptor adaptor) {
 
   if (resultMemrefType == sourceMemrefType &&
       resultMemrefType.hasStaticShape() &&
-      (!resultLayout || resultLayout.hasStaticLayout())) {
+      (!resultLayout || resultLayout.hasStaticLayout()) &&
+      llvm::all_of(getMixedOffsets(), isZeroInteger) &&
+      llvm::all_of(getMixedStrides(), isOneInteger)) {
     return getViewSource();
   }
 
diff --git a/mlir/test/Analysis/DataFlow/test-strided-metadata-range-analysis.mlir b/mlir/test/Analysis/DataFlow/test-strided-metadata-range-analysis.mlir
index f77bfc20c2255..150db50550fff 100644
--- a/mlir/test/Analysis/DataFlow/test-strided-metadata-range-analysis.mlir
+++ b/mlir/test/Analysis/DataFlow/test-strided-metadata-range-analysis.mlir
@@ -1,16 +1,24 @@
 // RUN: mlir-opt -test-strided-metadata-range-analysis %s 2>&1 | FileCheck %s
 
-func.func @memref_subview(%arg0: memref<8x16x4xf32, strided<[64, 4, 1]>>, %arg1: memref<1x128x1x32x1xf32, strided<[4096, 32, 32, 1, 1]>>, %arg2: memref<8x16x4xf32, strided<[1, 64, 8]>>, %arg3: index, %arg4: index, %arg5: index) {
+// Seed source offsets via memref.reinterpret_cast with static offsets so the
+// range analysis has a tight starting offset range for the subviews below.
+// Without the reinterpret_cast, function arg memref types cannot carry
+// offsets, so the entry state can only report the maximum range.
+
+func.func @memref_subview(%arg0raw: memref<8x16x4xf32, strided<[64, 4, 1]>>, %arg1: memref<1x128x1x32x1xf32, strided<[4096, 32, 32, 1, 1]>>, %arg2raw: memref<8x16x4xf32, strided<[1, 64, 8]>>, %arg3: index, %arg4: index, %arg5: index) {
   %c0 = arith.constant 0 : index
   %c1 = arith.constant 1 : index
   %c2 = arith.constant 2 : index
   %0 = test.with_bounds {smax = 13 : index, smin = 11 : index, umax = 13 : index, umin = 11 : index} : index
   %1 = test.with_bounds {smax = 7 : index, smin = 5 : index, umax = 7 : index, umin = 5 : index} : index
 
+  %arg0 = memref.reinterpret_cast %arg0raw to offset: [0], sizes: [8, 16, 4], strides: [64, 4, 1] : memref<8x16x4xf32, strided<[64, 4, 1]>> to memref<8x16x4xf32, strided<[64, 4, 1]>>
+  %arg2 = memref.reinterpret_cast %arg2raw to offset: [16], sizes: [8, 16, 4], strides: [1, 64, 8] : memref<8x16x4xf32, strided<[1, 64, 8]>> to memref<8x16x4xf32, strided<[1, 64, 8]>>
+
   // Test subview with unknown sizes, and constant offsets and strides.
   // CHECK: Op:  %[[SV0:.*]] = memref.subview
   // CHECK-NEXT: result[0]: strided_metadata<
-  // CHECK-SAME: offset = [{unsigned : [0, 18446744073709551615] signed : [-9223372036854775808, 9223372036854775807]}]
+  // CHECK-SAME: offset = [{unsigned : [1, 1] signed : [1, 1]}]
   // CHECK-SAME: sizes = [{unsigned : [0, 18446744073709551615] signed : [-9223372036854775808, 9223372036854775807]}, {unsigned : [0, 18446744073709551615] signed : [-9223372036854775808, 9223372036854775807]}, {unsigned : [0, 18446744073709551615] signed : [-9223372036854775808, 9223372036854775807]}]
   // CHECK-SAME: strides = [{unsigned : [64, 64] signed : [64, 64]}, {unsigned : [4, 4] signed : [4, 4]}, {unsigned : [1, 1] signed : [1, 1]}]
   %subview = memref.subview %arg0[%c0, %c0, %c1] [%arg3, %arg4, %arg5] [%c1, %c1, %c1] : memref<8x16x4xf32, strided<[64, 4, 1]>> to memref<?x?x?xf32, strided<[?, ?, ?]>>
@@ -18,7 +26,7 @@ func.func @memref_subview(%arg0: memref<8x16x4xf32, strided<[64, 4, 1]>>, %arg1:
   // Test a subview of a subview, with bounded dynamic offsets.
   // CHECK: Op:  %[[SV1:.*]] = memref.subview
   // CHECK-NEXT: result[0]: strided_metadata<
-  // CHECK-SAME: offset = [{unsigned : [0, 18446744073709551615] signed : [-9223372036854775808, 9223372036854775807]}]
+  // CHECK-SAME: offset = [{unsigned : [346, 484] signed : [346, 484]}]
   // CHECK-SAME: sizes = [{unsigned : [2, 2] signed : [2, 2]}, {unsigned : [2, 2] signed : [2, 2]}, {unsigned : [2, 2] signed : [2, 2]}]
   // CHECK-SAME: strides = [{unsigned : [704, 832] signed : [704, 832]}, {unsigned : [44, 52] signed : [44, 52]}, {unsigned : [11, 13] signed : [11, 13]}]
   %subview_0 = memref.subview %subview[%1, %1, %1] [%c2, %c2, %c2] [%0, %0, %0] : memref<?x?x?xf32, strided<[?, ?, ?]>> to memref<?x?x?xf32, strided<[?, ?, ?]>>
@@ -26,7 +34,7 @@ func.func @memref_subview(%arg0: memref<8x16x4xf32, strided<[64, 4, 1]>>, %arg1:
   // Test a subview of a subview, with constant operands.
   // CHECK: Op:  %[[SV2:.*]] = memref.subview
   // CHECK-NEXT: result[0]: strided_metadata<
-  // CHECK-SAME: offset = [{unsigned : [0, 18446744073709551615] signed : [-9223372036854775808, 9223372036854775807]}]
+  // CHECK-SAME: offset = [{unsigned : [368, 510] signed : [368, 510]}]
   // CHECK-SAME: sizes = [{unsigned : [2, 2] signed : [2, 2]}, {unsigned : [2, 2] signed : [2, 2]}, {unsigned : [2, 2] signed : [2, 2]}]
   // CHECK-SAME: strides = [{unsigned : [704, 832] signed : [704, 832]}, {unsigned : [44, 52] signed : [44, 52]}, {unsigned : [11, 13] signed : [11, 13]}]
   %subview_1 = memref.subview %subview_0[%c0, %c0, %c2] [%c2, %c2, %c2] [%c1, %c1, %c1] : memref<?x?x?xf32, strided<[?, ?, ?]>> to memref<?x?x?xf32, strided<[?, ?, ?]>>
@@ -50,7 +58,7 @@ func.func @memref_subview(%arg0: memref<8x16x4xf32, strided<[64, 4, 1]>>, %arg1:
   // Test a subview with mixed bounded and unbound dynamic sizes.
   // CHECK: Op:  %[[SV5:.*]] = memref.subview
   // CHECK-NEXT: result[0]: strided_metadata<
-  // CHECK-SAME: offset = [{unsigned : [0, 18446744073709551615] signed : [-9223372036854775808, 9223372036854775807]}]
+  // CHECK-SAME: offset = [{unsigned : [32, 32] signed : [32, 32]}]
   // CHECK-SAME: sizes = [{unsigned : [11, 13] signed : [11, 13]}, {unsigned : [5, 7] signed : [5, 7]}, {unsigned : [0, 18446744073709551615] signed : [-9223372036854775808, 9223372036854775807]}]
   // CHECK-SAME: strides = [{unsigned : [1, 1] signed : [1, 1]}, {unsigned : [64, 64] signed : [64, 64]}, {unsigned : [8, 8] signed : [8, 8]}]
   %subview_4 = memref.subview %arg2[%c0, %c0, %c2] [%0, %1, %arg5] [%c1, %c1, %c1] : memref<8x16x4xf32, strided<[1, 64, 8]>> to memref<?x?x?xf32, strided<[?, ?, ?]>>
@@ -58,8 +66,7 @@ func.func @memref_subview(%arg0: memref<8x16x4xf32, strided<[64, 4, 1]>>, %arg1:
 }
 
 // CHECK:       func.func @memref_subview
-// CHECK:       %[[A0:.*]]: memref<8x16x4xf32, strided<[64, 4, 1]>>
-// CHECK:       %[[SV0]] = memref.subview %[[A0]]
+// CHECK:       %[[SV0]] = memref.subview
 // CHECK-NEXT:  %[[SV1]] = memref.subview
 // CHECK-NEXT:  %[[SV2]] = memref.subview
 // CHECK-NEXT:  %[[SV3]] = memref.subview
diff --git a/mlir/test/Conversion/AMDGPUToROCDL/global-prefetch.mlir b/mlir/test/Conversion/AMDGPUToROCDL/global-prefetch.mlir
index b106d16ecca54..6a32f6d789258 100644
--- a/mlir/test/Conversion/AMDGPUToROCDL/global-prefetch.mlir
+++ b/mlir/test/Conversion/AMDGPUToROCDL/global-prefetch.mlir
@@ -2,8 +2,10 @@
 
 // CHECK-LABEL: @glb_prefetch0
 func.func @glb_prefetch0(%src : memref<64x64xf16, #gpu.address_space<global>>, %i : i64, %j : i64) {
-  // CHECK: %{{.*}} = llvm.getelementptr %{{.*}}[%{{.*}}] : (!llvm.ptr<1>, i64) -> !llvm.ptr<1>, f16
-  // CHECK: %[[PTR:.*]] = llvm.getelementptr inbounds|nuw %{{.*}}[%{{.*}}] : (!llvm.ptr<1>, i64) -> !llvm.ptr<1>, f16
+  // CHECK: %[[ALIGNED:.*]] = llvm.extractvalue %{{.*}}[1]
+  // CHECK: %[[DESC_OFF:.*]] = llvm.extractvalue %{{.*}}[2]
+  // CHECK: %[[BASE:.*]] = llvm.getelementptr %[[ALIGNED]][%[[DESC_OFF]]] : (!llvm.ptr<1>, i64) -> !llvm.ptr<1>, f16
+  // CHECK: %[[PTR:.*]] = llvm.getelementptr inbounds|nuw %[[BASE]][%{{.*}}] : (!llvm.ptr<1>, i64) -> !llvm.ptr<1>, f16
   // CHECK: rocdl.global.prefetch %[[PTR]], scope 3 : !llvm.ptr<1>
   amdgpu.global_prefetch %src[%i, %j] HT WGP : memref<64x64xf16, #gpu.address_space<global>>
   func.return
@@ -11,8 +13,10 @@ func.func @glb_prefetch0(%src : memref<64x64xf16, #gpu.address_space<global>>, %
 
 // CHECK-LABEL: @glb_prefetch1
 func.func @glb_prefetch1(%src : memref<64x64xf16, #gpu.address_space<global>>, %i : i64, %j : i64) {
-  // CHECK: %{{.*}} = llvm.getelementptr %{{.*}}[%{{.*}}] : (!llvm.ptr<1>, i64) -> !llvm.ptr<1>, f16
-  // CHECK: %[[PTR:.*]] = llvm.getelementptr %{{.*}}[%{{.*}}] : (!llvm.ptr<1>, i64) -> !llvm.ptr<1>, f16
+  // CHECK: %[[ALIGNED:.*]] = llvm.extractvalue %{{.*}}[1]
+  // CHECK: %[[DESC_OFF:.*]] = llvm.extractvalue %{{.*}}[2]
+  // CHECK: %[[BASE:.*]] = llvm.getelementptr %[[ALIGNED]][%[[DESC_OFF]]] : (!llvm.ptr<1>, i64) -> !llvm.ptr<1>, f16
+  // CHECK: %[[PTR:.*]] = llvm.getelementptr %[[BASE]][%{{.*}}] : (!llvm.ptr<1>, i64) -> !llvm.ptr<1>, f16
   // CHECK: rocdl.global.prefetch %[[PTR]], scope 10 : !llvm.ptr<1>
   amdgpu.global_prefetch %src[%i, %j] HT SE speculative : memref<64x64xf16, #gpu.address_space<global>>
   func.return
@@ -20,8 +24,10 @@ func.func @glb_prefetch1(%src : memref<64x64xf16, #gpu.address_space<global>>, %
 
 // CHECK-LABEL: @glb_prefetch2
 func.func @glb_prefetch2(%src : memref<64x64xf16, #gpu.address_space<global>>, %i : i64, %j : i64) {
-  // CHECK: %{{.*}} = llvm.getelementptr %{{.*}}[%{{.*}}] : (!llvm.ptr<1>, i64) -> !llvm.ptr<1>, f16
-  // CHECK: %[[PTR:.*]] = llvm.getelementptr %{{.*}}[%{{.*}}] : (!llvm.ptr<1>, i64) -> !llvm.ptr<1>, f16
+  // CHECK: %[[ALIGNED:.*]] = llvm.extractvalue %{{.*}}[1]
+  // CHECK: %[[DESC_OFF:.*]] = llvm.extractvalue %{{.*}}[2]
+  // CHECK: %[[BASE:.*]] = llvm.getelementptr %[[ALIGNED]][%[[DESC_OFF]]] : (!llvm.ptr<1>, i64) -> !llvm.ptr<1>, f16
+  // CHECK: %[[PTR:.*]] = llvm.getelementptr %[[BASE]][%{{.*}}] : (!llvm.ptr<1>, i64) -> !llvm.ptr<1>, f16
   // CHECK: rocdl.global.prefetch %{{.*}}, scope 16 : !llvm.ptr<1>
   amdgpu.global_prefetch %src[%i, %j] RT DEV speculative : memref<64x64xf16, #gpu.address_space<global>>
   func.return
diff --git a/mlir/test/Conversion/FuncToLLVM/func-memref-return.mlir b/mlir/test/Conversion/FuncToLLVM/func-memref-return.mlir
index 0bf1c19b0020e..be23818db6d50 100644
--- a/mlir/test/Conversion/FuncToLLVM/func-memref-return.mlir
+++ b/mlir/test/Conversion/FuncToLLVM/func-memref-return.mlir
@@ -35,13 +35,20 @@ func.func @check_static_return(%static : memref<32x18xf32>) -> memref<32x18xf32>
   return %static : memref<32x18xf32>
 }
 
-// CHECK-LABEL: func @check_static_return_with_offset
+// The return type has `strided<[22,1]>` (non-identity strides) rather than
+// identity so the BAREPTR materialization round-trip has to synthesize a
+// descriptor with shape/stride constants. Pre-refactor this test also
+// exercised a non-zero static offset via `offset: 7` baked in the type;
+// offsets are no longer part of memref types, so BAREPTR rebuilds the
+// descriptor with offset 0 (a fresh-from-bare-ptr descriptor cannot
+// recover the caller's original offset through this convention).
+// CHECK-LABEL: func @check_static_return_with_strides
 // CHECK-COUNT-2: !llvm.ptr
 // CHECK-COUNT-5: i64
 // CHECK-SAME: -> !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
-// BAREPTR-LABEL: func @check_static_return_with_offset
+// BAREPTR-LABEL: func @check_static_return_with_strides
 // BAREPTR-SAME: (%[[arg:.*]]: !llvm.ptr) -> !llvm.ptr {
-func.func @check_static_return_with_offset(%static : memref<32x18xf32, strided<[22,1]>>) -> memref<32x18xf32, strided<[22,1]>> {
+func.func @check_static_return_with_strides(%static : memref<32x18xf32, strided<[22,1]>>) -> memref<32x18xf32, strided<[22,1]>> {
 // CHECK:  llvm.return %{{.*}} : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
 
 // BAREPTR: %[[udf:.*]] = llvm.mlir.poison : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
diff --git a/mlir/test/Conversion/FuncToSPIRV/types-to-spirv.mlir b/mlir/test/Conversion/FuncToSPIRV/types-to-spirv.mlir
index fcde78f9c43a9..6fd8fd706ce96 100644
--- a/mlir/test/Conversion/FuncToSPIRV/types-to-spirv.mlir
+++ b/mlir/test/Conversion/FuncToSPIRV/types-to-spirv.mlir
@@ -702,32 +702,32 @@ func.func @memref_64bit_Output(
 
 // -----
 
-// Check that memref offset and strides affect the array size.
+// Check that memref strides affect the array size. (Pre-refactor this test
+// also covered non-zero static offsets like `offset: 8` producing arrays of
+// size 72; offsets are no longer part of memref types, so offset's influence
+// on array size is no longer testable at the type-conversion layer. The
+// strides' influence on array size remains covered below.)
 module attributes {
   spirv.target_env = #spirv.target_env<
     #spirv.vce<v1.0, [StorageBuffer16BitAccess], [SPV_KHR_16bit_storage]>, #spirv.resource_limits<>>
 } {
 
-// CHECK-LABEL: spirv.func @memref_offset_strides
-func.func @memref_offset_strides(
-// CHECK-SAME: !spirv.array<64 x f32, stride=4> [0])>, StorageBuffer>
+// CHECK-LABEL: spirv.func @memref_strides
+func.func @memref_strides(
 // CHECK-SAME: !spirv.array<64 x f32, stride=4> [0])>, StorageBuffer>
 // CHECK-SAME: !spirv.array<256 x f32, stride=4> [0])>, StorageBuffer>
 // CHECK-SAME: !spirv.array<64 x f32, stride=4> [0])>, StorageBuffer>
 // CHECK-SAME: !spirv.array<88 x f32, stride=4> [0])>, StorageBuffer>
   %arg0: memref<16x4xf32, strided<[4, 1]>, #spirv.storage_class<StorageBuffer>>,  // tightly packed; row major
-  %arg1: memref<16x4xf32, strided<[4, 1]>, #spirv.storage_class<StorageBuffer>>,  // offset 8
   %arg2: memref<16x4xf32, strided<[16, 1]>, #spirv.storage_class<StorageBuffer>>, // pad 12 after each row
   %arg3: memref<16x4xf32, strided<[1, 16]>, #spirv.storage_class<StorageBuffer>>, // tightly packed; col major
   %arg4: memref<16x4xf32, strided<[1, 22]>, #spirv.storage_class<StorageBuffer>>, // pad 4 after each col
 
-// CHECK-SAME: !spirv.array<64 x f16, stride=2> [0])>, StorageBuffer>
 // CHECK-SAME: !spirv.array<64 x f16, stride=2> [0])>, StorageBuffer>
 // CHECK-SAME: !spirv.array<256 x f16, stride=2> [0])>, StorageBuffer>
 // CHECK-SAME: !spirv.array<64 x f16, stride=2> [0])>, StorageBuffer>
 // CHECK-SAME: !spirv.array<88 x f16, stride=2> [0])>, StorageBuffer>
   %arg5: memref<16x4xf16, strided<[4, 1]>, #spirv.storage_class<StorageBuffer>>,
-  %arg6: memref<16x4xf16, strided<[4, 1]>, #spirv.storage_class<StorageBuffer>>,
   %arg7: memref<16x4xf16, strided<[16, 1]>, #spirv.storage_class<StorageBuffer>>,
   %arg8: memref<16x4xf16, strided<[1, 16]>, #spirv.storage_class<StorageBuffer>>,
   %arg9: memref<16x4xf16, strided<[1, 22]>, #spirv.storage_class<StorageBuffer>>
diff --git a/mlir/test/Conversion/MemRefToLLVM/memref-to-llvm.mlir b/mlir/test/Conversion/MemRefToLLVM/memref-to-llvm.mlir
index 0bc849e4b7ad9..21aa47b8a8c4f 100644
--- a/mlir/test/Conversion/MemRefToLLVM/memref-to-llvm.mlir
+++ b/mlir/test/Conversion/MemRefToLLVM/memref-to-llvm.mlir
@@ -539,6 +539,25 @@ func.func @atomic_rmw_with_offset(%I : memref<10xi32, strided<[1]>>, %ival : i32
 
 // -----
 
+// Construct a non-zero runtime offset via reinterpret_cast and verify
+// atomic_rmw threads the constant `5` through descriptor [2] into the data
+// pointer. This replaces the pre-refactor type-level `offset: 5` anchor.
+func.func @atomic_rmw_with_nonzero_offset(%M : memref<20xi32>, %ival : i32, %i : index) {
+  %cast = memref.reinterpret_cast %M to offset: [5], sizes: [10], strides: [1] : memref<20xi32> to memref<10xi32, strided<[1]>>
+  memref.atomic_rmw andi %ival, %cast[%i] : (i32, memref<10xi32, strided<[1]>>) -> i32
+  return
+}
+// CHECK-LABEL:  func @atomic_rmw_with_nonzero_offset
+// CHECK:        %[[C5:.+]] = llvm.mlir.constant(5 : index) : i64
+// CHECK:        %[[DESC:.+]] = llvm.insertvalue %[[C5]], %{{.*}}[2] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
+// CHECK:        %[[ALIGNED:.+]] = llvm.extractvalue %{{.*}}[1]
+// CHECK:        %[[DESC_OFF:.+]] = llvm.extractvalue %{{.*}}[2]
+// CHECK:        %[[BASE:.+]] = llvm.getelementptr %[[ALIGNED]][%[[DESC_OFF]]]
+// CHECK:        llvm.getelementptr %[[BASE]]
+// CHECK:        llvm.atomicrmw _and
+
+// -----
+
 // CHECK-LABEL: func @generic_atomic_rmw
 // CHECK-INTERFACE-LABEL: func @generic_atomic_rmw
 llvm.func @generic_atomic_rmw() {
diff --git a/mlir/test/Conversion/VectorToXeGPU/gather-to-xegpu.mlir b/mlir/test/Conversion/VectorToXeGPU/gather-to-xegpu.mlir
index e6613ffb3b0c1..5f225ebc2c224 100644
--- a/mlir/test/Conversion/VectorToXeGPU/gather-to-xegpu.mlir
+++ b/mlir/test/Conversion/VectorToXeGPU/gather-to-xegpu.mlir
@@ -171,7 +171,9 @@ gpu.func @gather_from_subview(%source: memref<4096x4096xf16>,
 // CHECK-SAME:   %[[MASK:.+]]: vector<8xi1>,
 // CHECK-SAME:   %[[PASS:.+]]: vector<8xf16>) -> vector<8xf16> {
 // CHECK:        %[[SUBVIEW:.+]] = memref.subview %[[SRC]][%[[MEMREF_OFF]], %[[MEMREF_OFF]]] [256, 256] [1, 1]
+// CHECK:        %[[BB:.+]], %[[OFFSET:.+]],{{.*}},{{.*}} = memref.extract_strided_metadata %[[SUBVIEW]] : memref<256x256xf16, strided<[4096, 1]>> -> memref<f16>, index, index, index, index, index
 // CHECK:        arith.muli {{.*}}%[[OFF1]]{{.*}} : index
+// CHECK:        arith.addi %[[OFFSET]]{{.*}} : index
 // CHECK:        %[[BASE_OFF:.+]] = arith.addi {{.*}}%[[OFF2]]{{.*}} : index
 // CHECK:        %[[SPLAT:.+]] = vector.broadcast %[[BASE_OFF]] : index to vector<8xindex>
 // CHECK:        %[[LIN:.+]] = arith.addi %[[SPLAT]], %[[INDICES]] : vector<8xindex>
@@ -202,6 +204,7 @@ gpu.func @non_unit_inner_stride_1D(
 // CHECK-SAME:   %[[MASK:.+]]: vector<8xi1>, %[[PASS:.+]]: vector<8xf32>) -> vector<8xf32> {
 // CHECK:        %[[BB:.+]], %[[M_OFF:.+]], %[[SZ:.+]], %[[STRIDE:.+]] = memref.extract_strided_metadata %[[SRC]]
 // CHECK:        arith.muli %[[OFF1]], %[[STRIDE]] : index
+// CHECK:        arith.addi %[[M_OFF]]{{.*}} : index
 // CHECK:        %[[STRD_VEC:.+]] = vector.broadcast %[[STRIDE]] : index to vector<8xindex>
 // CHECK:        %[[STRD_INDICES:.+]] = arith.muli %[[STRD_VEC:.+]], %[[INDICES]] : vector<8xindex>
 // CHECK:        %[[SPLAT:.+]] = vector.broadcast {{.*}}:  index to vector<8xindex>
diff --git a/mlir/test/Conversion/VectorToXeGPU/scatter-to-xegpu.mlir b/mlir/test/Conversion/VectorToXeGPU/scatter-to-xegpu.mlir
index 0073a24789509..da38be9832d8f 100644
--- a/mlir/test/Conversion/VectorToXeGPU/scatter-to-xegpu.mlir
+++ b/mlir/test/Conversion/VectorToXeGPU/scatter-to-xegpu.mlir
@@ -130,6 +130,7 @@ gpu.func @non_unit_inner_stride_1D(
 // CHECK-SAME:   %[[INDICES:.+]]: vector<8xindex>, %[[MASK:.+]]: vector<8xi1>) {
 // CHECK:        %[[BB:.+]], %[[M_OFF:.+]], %[[SZ:.+]], %[[STRIDE:.+]] = memref.extract_strided_metadata %[[SRC]]
 // CHECK:        arith.muli %[[OFF1]], %[[STRIDE]] : index
+// CHECK:        arith.addi %[[M_OFF]]{{.*}} : index
 // CHECK:        %[[STRD_VEC:.+]] = vector.broadcast %[[STRIDE]] : index to vector<8xindex>
 // CHECK:        %[[STRD_INDICES:.+]] = arith.muli %[[STRD_VEC:.+]], %[[INDICES]] : vector<8xindex>
 // CHECK:        %[[SPLAT:.+]] = vector.broadcast {{.*}}:  index to vector<8xindex>
@@ -191,7 +192,9 @@ gpu.func @scatter_into_subview(%vals: vector<8xf16>,
 // CHECK-SAME:   %[[MEMREF_OFF:.+]]: index, %[[OFF1:.+]]: index, %[[OFF2:.+]]: index,
 // CHECK-SAME:   %[[INDICES:.+]]: vector<8xindex>, %[[MASK:.+]]: vector<8xi1>) {
 // CHECK:        %[[SUBVIEW:.+]] = memref.subview %[[SRC]][%[[MEMREF_OFF]], %[[MEMREF_OFF]]] [256, 256] [1, 1]
+// CHECK:        %[[BB:.+]], %[[OFFSET:.+]],{{.*}},{{.*}} = memref.extract_strided_metadata %[[SUBVIEW]] : memref<256x256xf16, strided<[4096, 1]>> -> memref<f16>, index, index, index, index, index
 // CHECK:        arith.muli {{.*}}%[[OFF1]]{{.*}} : index
+// CHECK:        arith.addi %[[OFFSET]]{{.*}} : index
 // CHECK:        %[[BASE_OFF:.+]] = arith.addi {{.*}}%[[OFF2]]{{.*}} : index
 // CHECK:        %[[SPLAT:.+]] = vector.broadcast %[[BASE_OFF]] : index to vector<8xindex>
 // CHECK:        %[[LIN:.+]] = arith.addi %[[SPLAT]], %[[INDICES]] : vector<8xindex>
diff --git a/mlir/test/Conversion/VectorToXeGPU/transfer-read-to-xegpu.mlir b/mlir/test/Conversion/VectorToXeGPU/transfer-read-to-xegpu.mlir
index 642ee80c8c1fd..066f33f9607bd 100644
--- a/mlir/test/Conversion/VectorToXeGPU/transfer-read-to-xegpu.mlir
+++ b/mlir/test/Conversion/VectorToXeGPU/transfer-read-to-xegpu.mlir
@@ -440,9 +440,10 @@ gpu.func @load_from_subview_1D(%source: memref<4096x4096xf16>, %off1: index, %of
 // LOAD-ND-SAME:   %[[OFF1:.+]]: index, %[[OFF2:.+]]: index
 // LOAD-ND:        %[[CST:.+]] = arith.constant dense<true> : vector<8xi1>
 // LOAD-ND:        %[[SUBVIEW:.+]] = memref.subview %[[SRC]][%[[OFF1]], %[[OFF2]]] [256, 256] [1, 1] : memref<4096x4096xf16> to memref<256x256xf16, strided<[4096, 1]>>
+// LOAD-ND:        %[[BB:.+]], %[[OFFSET:.+]],{{.*}},{{.*}} = memref.extract_strided_metadata %[[SUBVIEW]] : memref<256x256xf16, strided<[4096, 1]>> -> memref<f16>, index, index, index, index, index
 // LOAD-ND:        %[[STEP:.+]] = vector.step : vector<8xindex>
 // LOAD-ND:        arith.muli {{.*}} : index
-// LOAD-ND:        arith.addi {{.*}} : index
+// LOAD-ND:        arith.addi %[[OFFSET]]{{.*}} : index
 // LOAD-ND:        %[[SPLAT:.+]] = vector.broadcast {{.*}}:  index to vector<8xindex>
 // LOAD-ND:        %[[IDX:.+]] = arith.addi %[[SPLAT]], %[[STEP]] : vector<8xindex>
 // LOAD-ND:        %[[COLLAPSE:.+]] = memref.extract_aligned_pointer_as_index %[[SUBVIEW]] : memref<256x256xf16, strided<[4096, 1]>> -> index
@@ -454,9 +455,10 @@ gpu.func @load_from_subview_1D(%source: memref<4096x4096xf16>, %off1: index, %of
 // LOAD-GATHER-SAME:   %[[OFF1:.+]]: index, %[[OFF2:.+]]: index
 // LOAD-GATHER:        %[[CST:.+]] = arith.constant dense<true> : vector<8xi1>
 // LOAD-GATHER:        %[[SUBVIEW:.+]] = memref.subview %[[SRC]][%[[OFF1]], %[[OFF2]]] [256, 256] [1, 1] : memref<4096x4096xf16> to memref<256x256xf16, strided<[4096, 1]>>
+// LOAD-GATHER:        %[[BB:.+]], %[[OFFSET:.+]],{{.*}},{{.*}} = memref.extract_strided_metadata %[[SUBVIEW]] : memref<256x256xf16, strided<[4096, 1]>> -> memref<f16>, index, index, index, index, index
 // LOAD-GATHER:        %[[STEP:.+]] = vector.step : vector<8xindex>
 // LOAD-GATHER:        arith.muli {{.*}} : index
-// LOAD-GATHER:        arith.addi {{.*}} : index
+// LOAD-GATHER:        arith.addi %[[OFFSET]]{{.*}} : index
 // LOAD-GATHER:        %[[SPLAT:.+]] = vector.broadcast {{.*}}:  index to vector<8xindex>
 // LOAD-GATHER:        %[[IDX:.+]] = arith.addi %[[SPLAT]], %[[STEP]] : vector<8xindex>
 // LOAD-GATHER:        %[[COLLAPSE:.+]] = memref.extract_aligned_pointer_as_index %[[SUBVIEW]] : memref<256x256xf16, strided<[4096, 1]>> -> index
@@ -494,6 +496,7 @@ gpu.func @load_from_subview_2D(%source: memref<4096x4096xf16>, %off1: index, %of
 // LOAD-GATHER-SAME:   %[[OFF1:.+]]: index, %[[OFF2:.+]]: index
 // LOAD-GATHER:        %[[CST:.+]] = arith.constant dense<true> : vector<8x16xi1>
 // LOAD-GATHER:        %[[SUBVIEW:.+]] = memref.subview %[[SRC]][%[[OFF1]], %[[OFF2]]] [256, 256] [1, 1] : memref<4096x4096xf16> to memref<256x256xf16, strided<[4096, 1]>>
+// LOAD-GATHER:        %[[BB:.+]], %[[OFFSET:.+]],{{.*}},{{.*}} = memref.extract_strided_metadata %[[SUBVIEW]] : memref<256x256xf16, strided<[4096, 1]>> -> memref<f16>, index, index, index, index, index
 // LOAD-GATHER-COUNT2: vector.step
 // LOAD-GATHER-COUNT2: vector.shape_cast
 // LOAD-GATHER-COUNT2: vector.broadcast
diff --git a/mlir/test/Conversion/VectorToXeGPU/transfer-write-to-xegpu.mlir b/mlir/test/Conversion/VectorToXeGPU/transfer-write-to-xegpu.mlir
index ce6d062eb8c96..427d135850695 100644
--- a/mlir/test/Conversion/VectorToXeGPU/transfer-write-to-xegpu.mlir
+++ b/mlir/test/Conversion/VectorToXeGPU/transfer-write-to-xegpu.mlir
@@ -318,8 +318,11 @@ gpu.func @store_to_subview(%vec: vector<8xf16>,
 // STORE-SCATTER:        %[[CST:.+]] = arith.constant dense<true> : vector<8xi1>
 // STORE-SCATTER:        %[[SUBVIEW:.+]] = memref.subview %[[SRC]][%[[OFF1]], %[[OFF2]]] [256, 256] [1, 1]
 // STORE-SCATTER-SAME:     : memref<4096x4096xf16> to memref<256x256xf16, strided<[4096, 1]>>
+// STORE-SCATTER:        %[[BB:.+]], %[[OFFSET:.+]], {{.*}}, {{.*}} = memref.extract_strided_metadata %[[SUBVIEW]]
+// STORE-SCATTER-SAME:     : memref<256x256xf16, strided<[4096, 1]>> -> memref<f16>, index, index, index, index, index
 // STORE-SCATTER:        %[[STEP:.+]] = vector.step : vector<8xindex>
 // STORE-SCATTER:        arith.muli {{.*}} : index
+// STORE-SCATTER:        arith.addi %[[OFFSET]]{{.*}} : index
 // STORE-SCATTER:        arith.addi {{.*}} : index
 // STORE-SCATTER:        %[[SPLAT:.+]] = vector.broadcast {{.*}} : index to vector<8xindex>
 // STORE-SCATTER:        %[[IDX:.+]] = arith.addi %[[SPLAT]], %[[STEP]] : vector<8xindex>
diff --git a/mlir/test/Dialect/MemRef/canonicalize.mlir b/mlir/test/Dialect/MemRef/canonicalize.mlir
index 1e0516d49bfae..a60d3104c46fb 100644
--- a/mlir/test/Dialect/MemRef/canonicalize.mlir
+++ b/mlir/test/Dialect/MemRef/canonicalize.mlir
@@ -70,10 +70,13 @@ func.func @subview_of_static_full_size(%arg0 : memref<4x6x16x32xi8>) -> memref<4
 
 // -----
 
-// CHECK-LABEL: func @subview_of_static_full_size_folds
+// CHECK-LABEL: func @negative_subview_of_static_full_size
 //  CHECK-SAME:   %[[ARG0:.+]]: memref<16x4xf32,  strided<[4, 1]>>
-//       CHECK:    return %[[ARG0]] : memref<16x4xf32,  strided<[4, 1]>>
-func.func @subview_of_static_full_size_folds(%arg0:  memref<16x4xf32,  strided<[4, 1]>>, %idx: index) -> memref<16x4xf32,  strided<[4, 1]>> {
+//  CHECK-SAME:   %[[IDX:.+]]: index
+//       CHECK:   %[[S:.+]] = memref.subview %[[ARG0]][%[[IDX]], 0] [16, 4] [1, 1]
+//  CHECK-SAME:                    to memref<16x4xf32,  strided<[4, 1]>>
+//       CHECK:    return %[[S]] : memref<16x4xf32,  strided<[4, 1]>>
+func.func @negative_subview_of_static_full_size(%arg0:  memref<16x4xf32,  strided<[4, 1]>>, %idx: index) -> memref<16x4xf32,  strided<[4, 1]>> {
   %0 = memref.subview %arg0[%idx, 0][16, 4][1, 1] : memref<16x4xf32,  strided<[4, 1]>> to memref<16x4xf32,  strided<[4, 1]>>
   return %0 : memref<16x4xf32,  strided<[4, 1]>>
 }
@@ -1270,6 +1273,9 @@ func.func @reinterpret_of_extract_strided_metadata_w_different_stride(%arg0 : me
 }
 // -----
 
+// Check that we don't simplify reinterpret_cast of extract_strided_metadata
+// when the offset doesn't match. (The reinterpret_cast uses constant offset 1
+// while extract_strided_metadata produces the source's runtime offset.)
 // CHECK-LABEL: func @reinterpret_of_extract_strided_metadata_w_different_offset
 //  CHECK-SAME: (%[[ARG:.*]]: memref<8x2xf32>)
 //       CHECK: %[[RES:.*]] = memref.reinterpret_cast %[[ARG]] to offset: [1]
diff --git a/mlir/test/Dialect/MemRef/expand-strided-metadata.mlir b/mlir/test/Dialect/MemRef/expand-strided-metadata.mlir
index de197d4b61324..4186be72a1179 100644
--- a/mlir/test/Dialect/MemRef/expand-strided-metadata.mlir
+++ b/mlir/test/Dialect/MemRef/expand-strided-metadata.mlir
@@ -809,11 +809,17 @@ func.func @extract_strided_metadata_of_alloc_with_cst_offset(%arg : index)
 
 // -----
 
-// CHECK-LABEL: extract_strided_metadata_of_alloc_with_cst_offset_in_type
+// Negative test: explicit strided layout (even with unit strides) is treated
+// as non-normalized by the pass, so the alloc's extract_strided_metadata is
+// lowered via reinterpret_cast rather than simplified away. The pre-refactor
+// version used `strided<[1], offset: 3>` to inject a non-zero static offset;
+// since types cannot carry offsets anymore, the strided-layout-annotated
+// alloc itself is what keeps this test in the negative-path.
+// CHECK-LABEL: extract_strided_metadata_of_alloc_with_strided_layout
 //       CHECK: %[[ALLOC:.*]] = memref.alloc
 //       CHECK: %[[BASE:.*]] = memref.reinterpret_cast %[[ALLOC]]
 //       CHECK: return %[[BASE]]
-func.func @extract_strided_metadata_of_alloc_with_cst_offset_in_type(%arg : index)
+func.func @extract_strided_metadata_of_alloc_with_strided_layout(%arg : index)
     -> (memref<i16>, index, index, index) {
 
   %A = memref.alloc() : memref<4xi16, strided<[1]>>
diff --git a/mlir/test/Dialect/MemRef/normalize-memrefs-ops.mlir b/mlir/test/Dialect/MemRef/normalize-memrefs-ops.mlir
index a7069048032f2..e969ee7bf710b 100644
--- a/mlir/test/Dialect/MemRef/normalize-memrefs-ops.mlir
+++ b/mlir/test/Dialect/MemRef/normalize-memrefs-ops.mlir
@@ -191,8 +191,8 @@ func.func @reinterpret_cast_non_zero_offset(%arg0: index, %arg1: memref<1x10x17x
   %alloc_1 = memref.alloc() {alignment = 64 : i64} : memref<1x10x17xf32>
   cf.br ^bb3
 ^bb3:  // pred: ^bb1
-  // CHECK: %[[REINTERPRET_CAST:.*]] = memref.reinterpret_cast %{{.*}} to offset: [0], sizes: [5], strides: [1] : memref<2x17xf32> to memref<5xf32>
-  // CHECK: return %[[REINTERPRET_CAST]], %[[REINTERPRET_CAST]], %{{.*}}, %{{.*}}, %{{.*}} : memref<5xf32>, memref<5xf32>, memref<2x17xf32>, memref<1x10x17xi32>, memref<1x10x17xf32>
+  // CHECK: %[[REINTERPRET_CAST:.*]] = memref.reinterpret_cast %{{.*}} to offset: [0], sizes: [32], strides: [1] : memref<2x17xf32> to memref<32xf32>
+  // CHECK: return %[[REINTERPRET_CAST]], %[[REINTERPRET_CAST]], %{{.*}}, %{{.*}}, %{{.*}} : memref<32xf32>, memref<32xf32>, memref<2x17xf32>, memref<1x10x17xi32>, memref<1x10x17xf32>
   %reinterpret_cast = memref.reinterpret_cast %alloc_0 to offset: [27], sizes: [1, 5], strides: [17, 1] : memref<2x17xf32> to memref<1x5xf32, strided<[17, 1]>>
   return %reinterpret_cast, %reinterpret_cast, %alloc_0, %alloc, %alloc_1 : memref<1x5xf32, strided<[17, 1]>>, memref<1x5xf32, strided<[17, 1]>>, memref<2x17xf32>, memref<1x10x17xi32>, memref<1x10x17xf32>
 }
diff --git a/mlir/test/Dialect/SCF/loop-pipelining.mlir b/mlir/test/Dialect/SCF/loop-pipelining.mlir
index babda6f1629a6..c5f696ba686f2 100644
--- a/mlir/test/Dialect/SCF/loop-pipelining.mlir
+++ b/mlir/test/Dialect/SCF/loop-pipelining.mlir
@@ -620,7 +620,7 @@ func.func @backedge_same_stage(%A: memref<?xf32>) -> f32 {
 // CHECK-SAME: ins(%[[R]]#0, %[[R]]#1, %{{.*}} : {{.*}}) outs(%[[CV]] :
 
 
-#map = strided<[1]>
+#strided1 = strided<[1]>
 #map1 = affine_map<(d0)->(d0)>
 #map2 = affine_map<(d0)->()>
 #linalg_attrs = {
@@ -641,17 +641,17 @@ func.func @pipeline_op_with_region(%A: memref<?xf32>, %B: memref<?xf32>, %result
   %a_buf = memref.alloc() : memref<2x8xf32>
   %b_buf = memref.alloc() : memref<2x8xf32>
   scf.for %i0 = %c0 to %c4 step %c1 {
-    %A_view = memref.subview %A[%i0][8][1] { __test_pipelining_stage__ = 0, __test_pipelining_op_order__ = 3 } : memref<?xf32> to memref<8xf32, #map>
-    %B_view = memref.subview %B[%i0][8][1] { __test_pipelining_stage__ = 0, __test_pipelining_op_order__ = 4 } : memref<?xf32> to memref<8xf32, #map>
+    %A_view = memref.subview %A[%i0][8][1] { __test_pipelining_stage__ = 0, __test_pipelining_op_order__ = 3 } : memref<?xf32> to memref<8xf32, #strided1>
+    %B_view = memref.subview %B[%i0][8][1] { __test_pipelining_stage__ = 0, __test_pipelining_op_order__ = 4 } : memref<?xf32> to memref<8xf32, #strided1>
     %buf_idx = affine.apply  affine_map<(d0)->(d0 mod 2)> (%i0)[] { __test_pipelining_stage__ = 0, __test_pipelining_op_order__ = 5 }
-    %a_buf_view = memref.subview %a_buf[%buf_idx,0][1,8][1,1] { __test_pipelining_stage__ = 0, __test_pipelining_op_order__ = 6 } : memref<2x8xf32> to memref<8xf32, #map>
-    %b_buf_view = memref.subview %b_buf[%buf_idx,0][1,8][1,1] { __test_pipelining_stage__ = 0, __test_pipelining_op_order__ = 7 } : memref<2x8xf32> to memref<8xf32, #map>
-    memref.copy %A_view , %a_buf_view {__test_pipelining_stage__ = 0, __test_pipelining_op_order__ = 8} : memref<8xf32, #map> to memref<8xf32, #map>
-    memref.copy %B_view , %b_buf_view {__test_pipelining_stage__ = 0, __test_pipelining_op_order__ = 9} : memref<8xf32, #map> to memref<8xf32, #map>
-    %C_view = memref.subview %result[%i0][8][1] { __test_pipelining_stage__ = 1, __test_pipelining_op_order__ = 0 } : memref<?xf32> to memref<8xf32, #map>
+    %a_buf_view = memref.subview %a_buf[%buf_idx,0][1,8][1,1] { __test_pipelining_stage__ = 0, __test_pipelining_op_order__ = 6 } : memref<2x8xf32> to memref<8xf32, #strided1>
+    %b_buf_view = memref.subview %b_buf[%buf_idx,0][1,8][1,1] { __test_pipelining_stage__ = 0, __test_pipelining_op_order__ = 7 } : memref<2x8xf32> to memref<8xf32, #strided1>
+    memref.copy %A_view , %a_buf_view {__test_pipelining_stage__ = 0, __test_pipelining_op_order__ = 8} : memref<8xf32, #strided1> to memref<8xf32, #strided1>
+    memref.copy %B_view , %b_buf_view {__test_pipelining_stage__ = 0, __test_pipelining_op_order__ = 9} : memref<8xf32, #strided1> to memref<8xf32, #strided1>
+    %C_view = memref.subview %result[%i0][8][1] { __test_pipelining_stage__ = 1, __test_pipelining_op_order__ = 0 } : memref<?xf32> to memref<8xf32, #strided1>
     %scalar = arith.addf %cf, %cf {__test_pipelining_stage__ = 1, __test_pipelining_op_order__ = 1} : f32
-    linalg.generic #linalg_attrs ins(%a_buf_view, %b_buf_view, %scalar : memref<8xf32, #map>, memref<8xf32, #map>, f32)
-      outs(%C_view: memref<8xf32, #map>) {
+    linalg.generic #linalg_attrs ins(%a_buf_view, %b_buf_view, %scalar : memref<8xf32, #strided1>, memref<8xf32, #strided1>, f32)
+      outs(%C_view: memref<8xf32, #strided1>) {
       ^bb0(%a: f32, %b: f32, %s: f32, %c: f32):
         %add = arith.addf %a, %b : f32
         %accum = arith.addf %add, %c : f32
diff --git a/mlir/test/Dialect/Vector/vector-transfer-collapse-inner-most-dims.mlir b/mlir/test/Dialect/Vector/vector-transfer-collapse-inner-most-dims.mlir
index 35cfb5b7908f4..ddaf46b9cca48 100644
--- a/mlir/test/Dialect/Vector/vector-transfer-collapse-inner-most-dims.mlir
+++ b/mlir/test/Dialect/Vector/vector-transfer-collapse-inner-most-dims.mlir
@@ -204,8 +204,10 @@ func.func @contiguous_inner_most_dim_with_subview(%src: memref<1000x1xf32>, %i:i
   return %v : vector<4x1xf32>
 }
 //      CHECK: func @contiguous_inner_most_dim_with_subview(%[[SRC:.+]]: memref<1000x1xf32>, %[[II:.+]]: index, %[[J:.+]]: index) -> vector<4x1xf32>
-//      CHECK:   %[[SRC_0:.+]] = memref.subview %[[SRC]]
-//      CHECK:   %[[SRC_1:.+]] = memref.subview %[[SRC_0]]
+//      CHECK:   %[[SRC_0:.+]] = memref.subview %[[SRC]][%[[II]], 0] [40, 1] [1, 1]
+// The rank-reducing inner subview must not add any additional offset; the
+// runtime offset from %[[II]] is already in %[[SRC_0]]'s descriptor.
+//      CHECK:   %[[SRC_1:.+]] = memref.subview %[[SRC_0]][0, 0] [40, 1] [1, 1]
 //      CHECK:   %[[V:.+]] = vector.transfer_read %[[SRC_1]]
 // CHECK-SAME:       {in_bounds = [true]}
 // CHECK-SAME:       vector<4xf32>
diff --git a/mlir/test/IR/invalid-builtin-types.mlir b/mlir/test/IR/invalid-builtin-types.mlir
index cb433c77b11ca..a6017b1f27695 100644
--- a/mlir/test/IR/invalid-builtin-types.mlir
+++ b/mlir/test/IR/invalid-builtin-types.mlir
@@ -84,6 +84,13 @@ func.func private @memref_incorrect_strided_ending() -> memref<?x?xf32, strided<
 
 // -----
 
+// `offset:` is no longer accepted inside strided layouts; it is a bare-text
+// token after the stride list and so the parser bails on the closing '>'.
+// expected-error @below {{expected '>'}}
+func.func private @memref_no_offset_in_strided_layout() -> memref<?xf32, strided<[1], offset: 5>>
+
+// -----
+
 // expected-error @below {{expected the number of strides to match the rank}}
 func.func private @memref_strided_rank_mismatch() -> memref<?x?xf32, strided<[1]>>
 
diff --git a/mlir/test/Transforms/canonicalize.mlir b/mlir/test/Transforms/canonicalize.mlir
index 35fe199610ae2..498dd7804a811 100644
--- a/mlir/test/Transforms/canonicalize.mlir
+++ b/mlir/test/Transforms/canonicalize.mlir
@@ -733,6 +733,11 @@ func.func @view(%arg0 : index) -> (f32, f32, f32, f32) {
 
 // -----
 
+// Offset folding is still verified by the subview op's offset operands
+// (e.g. `[1, 2, 7]` with strides `[6144, 64, 1]` pins the composed runtime
+// offset to 6279; `[2, 4]` with strides `[64, 1]` pins it to 132). The
+// pre-refactor `offset: 6279` / `offset: 132` on the result type was a
+// redundant cross-check.
 // CHECK-LABEL: func @subview
 // CHECK-SAME: %[[ARG0:.*]]: index, %[[ARG1:.*]]: index
 func.func @subview(%arg0 : index, %arg1 : index) -> (index, index) {
diff --git a/mlir/test/Transforms/compose-subview.mlir b/mlir/test/Transforms/compose-subview.mlir
index 9d058a3fa039b..f9ce1e1bff491 100644
--- a/mlir/test/Transforms/compose-subview.mlir
+++ b/mlir/test/Transforms/compose-subview.mlir
@@ -1,5 +1,12 @@
 // RUN: mlir-opt %s -test-compose-subview -split-input-file | FileCheck %s
 
+// These tests verify that nested subviews compose into a single subview whose
+// offset operands encode the composed offset. The composed runtime offset is
+// `sum(offsets[i] * strides[i])` and used to be cross-checked via an
+// `offset: N` field on the result type (e.g. 3*1024 + 384*1 = 3456); since
+// memref types no longer carry offsets, the composed offset operands (e.g.
+// [3, 384]) are the canonical verification.
+
 // CHECK-LABEL: func.func @subview_strided(
 // CHECK-SAME: %[[input:.*]]: memref<4x1024xf32>) -> memref<1x128xf32, strided<[1024, 1]>> {
 func.func @subview_strided(%input: memref<4x1024xf32>) -> memref<1x128xf32, strided<[1024, 1]>> {

>From 26501945d8bcfa519c84ef0886f1306d4a12bc93 Mon Sep 17 00:00:00 2001
From: Ivan Butygin <ivan.butygin at gmail.com>
Date: Fri, 17 Apr 2026 17:30:19 +0200
Subject: [PATCH 27/27] [WIP][mlir] step 7: replacement coverage for type-level
 offset tests

- expand-strided-metadata.mlir: @extract_strided_metadata_of_reinterpret_cast_static_offset
  verifies a reinterpret_cast with `offset: [42]` folds to an
  arith.constant 42 alongside extract_strided_metadata of the source.
  Replaces the static-offset inference coverage lost when the
  TestMemRefStrideCalculation printer stopped emitting offsets.
- NVGPUToNVVM/nvgpu-tma-end-to-end-offset.mlir: new e2e test that
  runs nvgpu-to-nvvm + expand-strided-metadata + finalize-memref-to-llvm
  + reconcile-unrealized-casts + canonicalize and anchors
  `arith.constant 8192` from a TMA subview chain [2,0,0] with
  stride 4096. Restores the cross-pass 8192-offset verification
  that nvgpu-to-nvvm-alone no longer provides.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply at anthropic.com>
---
 .../nvgpu-tma-end-to-end-offset.mlir          | 27 +++++++++++++++++++
 .../MemRef/expand-strided-metadata.mlir       | 25 +++++++++++++++++
 2 files changed, 52 insertions(+)
 create mode 100644 mlir/test/Conversion/NVGPUToNVVM/nvgpu-tma-end-to-end-offset.mlir

diff --git a/mlir/test/Conversion/NVGPUToNVVM/nvgpu-tma-end-to-end-offset.mlir b/mlir/test/Conversion/NVGPUToNVVM/nvgpu-tma-end-to-end-offset.mlir
new file mode 100644
index 0000000000000..3605887567487
--- /dev/null
+++ b/mlir/test/Conversion/NVGPUToNVVM/nvgpu-tma-end-to-end-offset.mlir
@@ -0,0 +1,27 @@
+// RUN: mlir-opt %s -convert-nvgpu-to-nvvm -expand-strided-metadata \
+// RUN:   -finalize-memref-to-llvm -reconcile-unrealized-casts -canonicalize \
+// RUN: | FileCheck %s
+
+// End-to-end anchor for TMA async-load with a subview that produces a
+// non-zero runtime offset (2 * 4096 = 8192). Pre-refactor, nvgpu-to-nvvm
+// alone emitted `llvm.mlir.constant(8192)` because the static offset was
+// baked into the memref type. Post-refactor, the offset is computed by
+// memref-to-llvm from the subview indices `[2, 0, 0]` and stride 4096,
+// so the 8192 constant only appears after the full pipeline runs.
+
+!rhsTensorMap = !nvgpu.tensormap.descriptor<tensor = memref<64x64xf16, strided<[64, 1]>, 3>, swizzle = swizzle_128b, l2promo = none, oob = zero, interleave = none>
+!barrierType = !nvgpu.mbarrier.group<memorySpace = #gpu.address_space<workgroup>>
+
+memref.global "private" @dynamicShmem : memref<0xf16,3>
+
+// CHECK-LABEL: func @async_tma_load_subview
+//       CHECK: arith.constant 8192 : index
+func.func @async_tma_load_subview(%rhsTensorMap: !rhsTensorMap, %mbarrier: !barrierType) {
+  %c0 = arith.constant 0 : index
+  %dynamicMem = memref.get_global @dynamicShmem : memref<0xf16, 3>
+  %rhsShmem2 = memref.reinterpret_cast %dynamicMem to offset: [0], sizes: [4, 64, 64], strides: [4096, 64, 1] : memref<0xf16, 3> to memref<4x64x64xf16,3>
+  %rhsShmem3 = memref.subview %rhsShmem2[2, 0, 0][1, 64, 64][1, 1, 1] : memref<4x64x64xf16,3> to memref<1x64x64xf16, strided<[4096, 64, 1]>, 3>
+  %rhsShmem = memref.subview %rhsShmem3[0, 0, 0][1, 64, 64][1, 1, 1] : memref<1x64x64xf16, strided<[4096, 64, 1]>, 3> to memref<64x64xf16, strided<[64, 1]>, 3>
+  nvgpu.tma.async.load %rhsTensorMap[%c0, %c0], %mbarrier[%c0] to %rhsShmem : !rhsTensorMap, !barrierType -> memref<64x64xf16, strided<[64, 1]>, 3>
+  return
+}
diff --git a/mlir/test/Dialect/MemRef/expand-strided-metadata.mlir b/mlir/test/Dialect/MemRef/expand-strided-metadata.mlir
index 4186be72a1179..412b7a70bb475 100644
--- a/mlir/test/Dialect/MemRef/expand-strided-metadata.mlir
+++ b/mlir/test/Dialect/MemRef/expand-strided-metadata.mlir
@@ -1127,6 +1127,31 @@ func.func @extract_strided_metadata_of_extract_strided_metadata(%arg : memref<i3
 
 // -----
 
+// Check that a reinterpret_cast with a static offset folds into an
+// extract_strided_metadata whose offset result is an arith.constant. This
+// exercises the replacement coverage for type-level static-offset inference
+// (which previously lived in `memref-stride-calculation.mlir`).
+// CHECK-LABEL: func @extract_strided_metadata_of_reinterpret_cast_static_offset
+//  CHECK-SAME: %[[ARG:.*]]: memref<?x?xi32, strided<[?, ?]>>, %[[SZ:.*]]: index, %[[STR:.*]]: index
+//   CHECK-DAG: %[[C42:.*]] = arith.constant 42 : index
+//   CHECK-DAG: %[[BASE:.*]], %{{.*}}, %{{.*}}:2, %{{.*}}:2 = memref.extract_strided_metadata %[[ARG]]
+//       CHECK: return %[[BASE]], %[[C42]]
+func.func @extract_strided_metadata_of_reinterpret_cast_static_offset(
+  %arg : memref<?x?xi32, strided<[?, ?]>>,
+  %sz : index, %str : index)
+  -> (memref<i32>, index, index, index, index, index) {
+  %cast = memref.reinterpret_cast %arg to offset: [42], sizes: [%sz, %sz],
+      strides: [%str, %str] : memref<?x?xi32, strided<[?, ?]>> to
+      memref<?x?xi32, strided<[?, ?]>>
+  %base, %off, %sizes:2, %strides:2 =
+    memref.extract_strided_metadata %cast : memref<?x?xi32, strided<[?, ?]>>
+    -> memref<i32>, index, index, index, index, index
+  return %base, %off, %sizes#0, %sizes#1, %strides#0, %strides#1
+    : memref<i32>, index, index, index, index, index
+}
+
+// -----
+
 // Check that we simplify extract_strided_metadata of reinterpret_cast
 // when the source of the reinterpret_cast is compatible with what
 // `extract_strided_metadata`s accept.



More information about the Mlir-commits mailing list