[llvm-branch-commits] [flang] [mlir] [Flang][mlir][OpenMP] Translate omp.declare_simd to LLVM IR (PR #187767)
via llvm-branch-commits
llvm-branch-commits at lists.llvm.org
Sat Mar 21 11:21:04 PDT 2026
https://github.com/chichunchen updated https://github.com/llvm/llvm-project/pull/187767
>From 29547276cada5b708555e913d0eb31852134c483 Mon Sep 17 00:00:00 2001
From: "Chi Chun, Chen" <chichun.chen at hpe.com>
Date: Thu, 12 Mar 2026 17:21:09 -0500
Subject: [PATCH 1/2] [mlir][OpenMP] Translate omp.declare_simd to LLVM IR
Translate omp.declare_simd operations to LLVM IR by computing Vector
Function ABI (VFABI) mangled names and attaching them as function
attributes. This reuses the function-parameter mangling and codegen
logic in OpenMPIRBuilder that was extracted from Clang [2][3].
For each omp.declare_simd, lowering computes:
- ParamAttrs: one entry per function argument, classifying it as
Vector / Uniform / Linear (+ step or var-stride) / Aligned.
- Branch kind: Undefined / Inbranch / Notinbranch.
- VLEN: either from simdlen() or derived from ISA-specific rules.
x86 (SSE/AVX/AVX2/AVX-512):
Emits mangled names following the x86 Vector ABI [1]:
_ZGV<ISA><Mask><VLEN><ParamAttrs>_<name>
where ISA is b (SSE), c (AVX), d (AVX2), e (AVX-512). Without an
explicit simdlen, VLEN is computed from the Characteristic Data Type
(CDT) and the vector register width for each ISA.
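The x86 VLEN rule above can be sketched as follows. This is an illustrative
sketch, not the OpenMPIRBuilder implementation: the register widths per ISA
class follow the x86 Vector ABI [1], and the function name is hypothetical.

```python
# Sketch of the x86 rule: VLEN = sizeof(vector_register) / sizeof(CDT).
# ISA classes and register widths per the x86 Vector ABI:
#   b = SSE (XMM, 128-bit), c = AVX (YMM), d = AVX2 (YMM), e = AVX-512 (ZMM).
ISA_REGISTER_BITS = {"b": 128, "c": 256, "d": 256, "e": 512}

def x86_vlen(isa: str, cdt_bits: int) -> int:
    """Derive VLEN from the ISA vector register width and the CDT size in bits."""
    return ISA_REGISTER_BITS[isa] // cdt_bits
```

For example, a 64-bit CDT (f64) yields 2 lanes under SSE and 8 under AVX-512.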
AArch64 (AdvSIMD / SVE):
Emits mangled names following the AAVFABI [4]:
_ZGV<ISA><Mask><VLEN><ParamAttrs>_<name>
where ISA is n (AdvSIMD/Neon) or s (SVE). Without an explicit simdlen,
AdvSIMD VLEN is derived from the Narrowest Data Size (NDS) of the
function signature. The implementation computes Maps To Vector (MTV),
Pass By Value (PBV), Lane Size (LS), NDS, and Widest Data Size (WDS)
per the AAVFABI spec. WDS and architectural limits (128-2048 bit,
128-bit aligned) are used to validate explicit simdlen values, with
diagnostics emitted for invalid cases.
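The explicit-simdlen validation described above can be sketched as below.
This is a hedged illustration of the checks: the actual implementation
(`validateAArch64SimdLen` in this patch) emits MLIR diagnostics instead of
silently returning a boolean.

```python
# Sketch of the AArch64 simdlen checks, per the AAVFABI limits:
#   - simdlen(1) produces no vector signature,
#   - Advanced SIMD ('n'): simdlen must be a power of 2 (section 3.3.1),
#   - SVE fixed-length ('s'): simdlen * WDS must be <= 2048 bits and a
#     multiple of 128 bits (section 3.4.1).
def valid_simdlen(user_vlen: int, wds: int, isa: str) -> bool:
    if user_vlen == 1:
        return False
    if isa == "n":
        return user_vlen == 0 or (user_vlen & (user_vlen - 1)) == 0
    if isa == "s" and user_vlen != 0:
        bits = user_vlen * wds
        return bits <= 2048 and bits % 128 == 0
    return True
```

For example, with a 64-bit WDS, simdlen(6) is rejected for Advanced SIMD
(not a power of 2) and simdlen(33) is rejected for SVE (2112 bits > 2048).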
Also adds:
- Linear modifier (val/ref/uval) support in parameter attribute
lowering, mapping omp::LinearModifier to the appropriate
DeclareSimdKindTy.
- An arg_types attribute on DeclareSimdOp to recover language-level
pointee type information for opaque !llvm.ptr parameters, enabling
correct LS/NDS/WDS computation.
[1] https://sourceware.org/glibc/wiki/libmvec?action=AttachFile&do=view&target=VectorABI.txt
[2] https://github.com/llvm/llvm-project/commit/c7a82b41a706728ce7c212b5bc40c74d1cce53c7
[3] https://github.com/llvm/llvm-project/commit/a0a2264ef757f8383c6b283b7ad80b33d5d52f13
[4] https://github.com/ARM-software/abi-aa/tree/main/vfabia64
Assisted-by: Copilot (Claude model)
---
flang/lib/Lower/OpenMP/OpenMP.cpp | 32 +-
flang/test/Lower/OpenMP/declare-simd.f90 | 18 +-
mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td | 8 +
mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp | 3 +-
.../OpenMP/OpenMPToLLVMIRTranslation.cpp | 391 +++++++++++
mlir/test/Dialect/OpenMP/ops.mlir | 16 +
.../openmp-declare-simd-aarch64-01.mlir | 314 +++++++++
.../openmp-declare-simd-aarch64-02.mlir | 39 ++
.../openmp-declare-simd-aarch64-nds.mlir | 86 +++
.../openmp-declare-simd-aarch64-sve.mlir | 37 ++
.../openmp-declare-simd-aarch64-warnings.mlir | 74 +++
.../openmp-declare-simd-aarch64-wds.mlir | 108 ++++
.../LLVMIR/openmp-declare-simd-x86.mlir | 610 ++++++++++++++++++
13 files changed, 1727 insertions(+), 9 deletions(-)
create mode 100644 mlir/test/Target/LLVMIR/openmp-declare-simd-aarch64-01.mlir
create mode 100644 mlir/test/Target/LLVMIR/openmp-declare-simd-aarch64-02.mlir
create mode 100644 mlir/test/Target/LLVMIR/openmp-declare-simd-aarch64-nds.mlir
create mode 100644 mlir/test/Target/LLVMIR/openmp-declare-simd-aarch64-sve.mlir
create mode 100644 mlir/test/Target/LLVMIR/openmp-declare-simd-aarch64-warnings.mlir
create mode 100644 mlir/test/Target/LLVMIR/openmp-declare-simd-aarch64-wds.mlir
create mode 100644 mlir/test/Target/LLVMIR/openmp-declare-simd-x86.mlir
diff --git a/flang/lib/Lower/OpenMP/OpenMP.cpp b/flang/lib/Lower/OpenMP/OpenMP.cpp
index 152541e83372b..d0a6725a6ab8e 100644
--- a/flang/lib/Lower/OpenMP/OpenMP.cpp
+++ b/flang/lib/Lower/OpenMP/OpenMP.cpp
@@ -4043,7 +4043,37 @@ genOMP(lower::AbstractConverter &converter, lower::SymMap &symTable,
cp.processSimdlen(clauseOps);
cp.processUniform(clauseOps);
- mlir::omp::DeclareSimdOp::create(converter.getFirOpBuilder(), loc, clauseOps);
+ auto declareSimdOp = mlir::omp::DeclareSimdOp::create(
+ converter.getFirOpBuilder(), loc, clauseOps);
+
+ // Record the scalar element types of all function arguments so that
+ // the OpenMPToLLVMIRTranslation can recover pointee-type information
+ // lost in opaque pointers for correct LS / NDS / WDS computation.
+ // We strip FIR wrappers (box, heap, ref, array) to get the plain scalar
+ // type (e.g. i32, f64) that survives FIR-to-LLVM type conversion unchanged.
+ if (auto *owningProc = eval.getOwningProcedure();
+ owningProc && !owningProc->isMainProgram()) {
+ const auto &subpSym = owningProc->getSubprogramSymbol();
+ if (auto *details =
+ subpSym.GetUltimate()
+ .detailsIf<Fortran::semantics::SubprogramDetails>()) {
+ llvm::SmallVector<mlir::Attribute> argTypeAttrs;
+ for (const auto *arg : details->dummyArgs()) {
+ if (arg) {
+ mlir::Type ty = converter.genType(*arg);
+ // Unwrap FIR container types to get the scalar element type.
+ ty = fir::getFortranElementType(ty);
+ argTypeAttrs.push_back(mlir::TypeAttr::get(ty));
+ } else {
+ argTypeAttrs.push_back(mlir::TypeAttr::get(
+ mlir::NoneType::get(&converter.getMLIRContext())));
+ }
+ }
+ if (!argTypeAttrs.empty())
+ declareSimdOp.setArgTypesAttr(
+ mlir::ArrayAttr::get(&converter.getMLIRContext(), argTypeAttrs));
+ }
+ }
}
static void genOpenMPDeclareMapperImpl(
diff --git a/flang/test/Lower/OpenMP/declare-simd.f90 b/flang/test/Lower/OpenMP/declare-simd.f90
index b2c4592ad4e8a..7621fe1a0cd76 100644
--- a/flang/test/Lower/OpenMP/declare-simd.f90
+++ b/flang/test/Lower/OpenMP/declare-simd.f90
@@ -38,7 +38,8 @@ end subroutine declare_simd_aligned
! CHECK: %[[X_A:.*]]:2 = hlfir.declare %{{.*}} dummy_scope %[[SCOPE_A]] arg 1 {{.*pointer.*}} : (!fir.ref<!fir.box<!fir.ptr<!fir.array<?xf64>>>>, !fir.dscope) -> (!fir.ref<!fir.box<!fir.ptr<!fir.array<?xf64>>>>, !fir.ref<!fir.box<!fir.ptr<!fir.array<?xf64>>>>)
! CHECK: %[[Y_A:.*]]:2 = hlfir.declare %{{.*}} dummy_scope %[[SCOPE_A]] arg 2 {{.*pointer.*}} : (!fir.ref<!fir.box<!fir.ptr<!fir.array<?xf64>>>>, !fir.dscope) -> (!fir.ref<!fir.box<!fir.ptr<!fir.array<?xf64>>>>, !fir.ref<!fir.box<!fir.ptr<!fir.array<?xf64>>>>)
! CHECK: omp.declare_simd aligned(%[[X_A]]#0 : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xf64>>>> -> 64 : i64,
-! CHECK-SAME: %[[Y_A]]#0 : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xf64>>>> -> 64 : i64){{$}}
+! CHECK-SAME: %[[Y_A]]#0 : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xf64>>>> -> 64 : i64)
+! CHECK-SAME: {arg_types = [f64, f64, i32, i32]}{{$}}
! CHECK: return
subroutine declare_simd_linear(x, y, n, i)
@@ -57,7 +58,8 @@ end subroutine declare_simd_linear
! CHECK: %[[SCOPE:.*]] = fir.dummy_scope : !fir.dscope
! CHECK: %[[I:.*]]:2 = hlfir.declare %{{.*}} dummy_scope %[[SCOPE]] arg 4 {{.*}} : (!fir.ref<i32>, !fir.dscope) -> (!fir.ref<i32>, !fir.ref<i32>)
! CHECK: %[[C1:.*]] = arith.constant 1 : i32
-! CHECK: omp.declare_simd linear(ref(%[[I]]#0 : !fir.ref<i32> = %[[C1]] : i32)) {linear_var_types = [i32]}{{$}}
+! CHECK: omp.declare_simd linear(ref(%[[I]]#0 : !fir.ref<i32> = %[[C1]] : i32))
+! CHECK-SAME: {arg_types = [f64, f64, i32, i32], linear_var_types = [i32]}{{$}}
! CHECK: return
subroutine declare_simd_simdlen(x, y, n, i)
@@ -70,7 +72,8 @@ end subroutine declare_simd_simdlen
! CHECK-LABEL: func.func @_QPdeclare_simd_simdlen(
! CHECK: %[[SCOPE_S:.*]] = fir.dummy_scope : !fir.dscope
-! CHECK: omp.declare_simd{{.*}}simdlen(8){{$}}
+! CHECK: omp.declare_simd{{.*}}simdlen(8)
+! CHECK-SAME: {arg_types = [f32, f32, i32, i32]}{{$}}
! CHECK-NEXT: return
subroutine declare_simd_uniform(x, y, n, i)
@@ -100,6 +103,7 @@ end subroutine declare_simd_uniform
! CHECK: %[[XDECL:.*]]:2 = hlfir.declare %{{.*}} dummy_scope %[[SCOPE]] arg 1
! CHECK: %[[YDECL:.*]]:2 = hlfir.declare %{{.*}} dummy_scope %[[SCOPE]] arg 2
! CHECK: omp.declare_simd uniform(%[[XDECL]]#0 : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xf64>>>>, %[[YDECL]]#0 : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xf64>>>>)
+! CHECK-SAME: {arg_types = [f64, f64, i32, i32]}
! CHECK: return
subroutine declare_simd_inbranch()
@@ -161,7 +165,7 @@ end subroutine declare_simd_combined
! CHECK-SAME: simdlen(8)
! CHECK-SAME: uniform(%[[X_DECL]]#0 : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xf64>>>>,
! CHECK-SAME: %[[Y_DECL]]#0 : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xf64>>>>)
-! CHECK-SAME: {linear_var_types = [i32]}{{$}}
+! CHECK-SAME: {arg_types = [f64, f64, i32, i32], linear_var_types = [i32]}{{$}}
! CHECK: return
subroutine declare_simd_linear_val(a, b)
@@ -180,7 +184,7 @@ end subroutine declare_simd_linear_val
! CHECK: %[[C2:.*]] = arith.constant 2 : i32
! CHECK: %[[C1:.*]] = arith.constant 1 : i32
! CHECK: omp.declare_simd linear(val(%[[A]]#0 : !fir.ref<i32> = %[[C2]] : i32), val(%[[B]]#0 : !fir.ref<i32> = %[[C1]] : i32))
-! CHECK-SAME: {linear_var_types = [i32, i32]}{{$}}
+! CHECK-SAME: {arg_types = [i32, i32], linear_var_types = [i32, i32]}{{$}}
! CHECK: return
subroutine declare_simd_linear_ref(x)
@@ -197,7 +201,7 @@ end subroutine declare_simd_linear_ref
! CHECK: %[[X:.*]]:2 = hlfir.declare %{{.*}} dummy_scope %[[SCOPE]] arg 1 {{.*}} : (!fir.ref<!fir.box<!fir.heap<i32>>>, !fir.dscope) -> (!fir.ref<!fir.box<!fir.heap<i32>>>, !fir.ref<!fir.box<!fir.heap<i32>>>)
! CHECK: %[[C4:.*]] = arith.constant 4 : i32
! CHECK: omp.declare_simd linear(ref(%[[X]]#0 : !fir.ref<!fir.box<!fir.heap<i32>>> = %[[C4]] : i32))
-! CHECK-SAME: {linear_var_types = [!fir.box<!fir.heap<i32>>]}{{$}}
+! CHECK-SAME: {arg_types = [i32], linear_var_types = [!fir.box<!fir.heap<i32>>]}{{$}}
! CHECK: return
subroutine declare_simd_linear_uval(y)
@@ -214,5 +218,5 @@ end subroutine declare_simd_linear_uval
! CHECK: %[[Y:.*]]:2 = hlfir.declare %{{.*}} dummy_scope %[[SCOPE]] arg 1 {{.*}} : (!fir.ref<i32>, !fir.dscope) -> (!fir.ref<i32>, !fir.ref<i32>)
! CHECK: %[[C1:.*]] = arith.constant 1 : i32
! CHECK: omp.declare_simd linear(uval(%[[Y]]#0 : !fir.ref<i32> = %[[C1]] : i32))
-! CHECK-SAME: {linear_var_types = [i32]}{{$}}
+! CHECK-SAME: {arg_types = [i32], linear_var_types = [i32]}{{$}}
! CHECK: return
diff --git a/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td b/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
index 88c8ab4f6f949..dc5255aaab47e 100644
--- a/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
+++ b/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
@@ -2285,6 +2285,12 @@ def DeclareSimdOp
a function body. It attaches clauses of declare simd to the enclosing
function.
+ The optional `arg_types` attribute records the original language-level
+ types of the enclosing function's arguments. This is used during
+ translation to recover pointee-type information lost in opaque
+ `!llvm.ptr`, enabling correct lane-size (LS) computation for the
+ AArch64 AAVFABI.
+
Example:
```mlir
func.func @add(%a: memref<16xi32>) {
@@ -2294,6 +2300,8 @@ def DeclareSimdOp
```
}] # clausesDescription;
+ let arguments = !con(clausesArgs, (ins OptionalAttr<ArrayAttr>:$arg_types));
+
let builders = [OpBuilder<(
ins CArg<"const DeclareSimdOperands &">:$clauses)>];
diff --git a/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp b/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
index 04418ee39be54..b07220dffcecd 100644
--- a/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
+++ b/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
@@ -4722,7 +4722,8 @@ void DeclareSimdOp::build(OpBuilder &odsBuilder, OperationState &odsState,
clauses.linearVars, clauses.linearStepVars,
clauses.linearVarTypes, clauses.linearModifiers,
clauses.notinbranch, clauses.simdlen,
- clauses.uniformVars);
+ clauses.uniformVars,
+ /*arg_types=*/nullptr);
}
//===----------------------------------------------------------------------===//
diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
index 37b1a37c2e1a5..fb0f6636f8e8f 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
@@ -7465,6 +7465,394 @@ convertTargetFreeMemOp(Operation &opInst, llvm::IRBuilderBase &builder,
return success();
}
+static void populateLinearParam(
+ llvm::DenseMap<Value, unsigned> &argIndexMap, omp::DeclareSimdOp ds,
+ llvm::SmallVectorImpl<llvm::OpenMPIRBuilder::DeclareSimdAttrTy> &attrs) {
+ OperandRange linearVars = ds.getLinearVars();
+ OperandRange linearStepVars = ds.getLinearStepVars();
+ std::optional<ArrayAttr> linearModifiers = ds.getLinearModifiers();
+
+ const llvm::APSInt defaultStep(llvm::APInt(/*numBits=*/32, /*val=*/1),
+ /*isUnsigned=*/true);
+
+ auto resolveStepArgIndex = [&](Value stepValue) -> std::optional<unsigned> {
+ // Var-stride can be expressed as `llvm.load %argN`.
+ if (auto load = stepValue.getDefiningOp<LLVM::LoadOp>())
+ stepValue = load.getAddr();
+
+ if (auto it = argIndexMap.find(stepValue); it != argIndexMap.end())
+ return it->second;
+
+ return std::nullopt;
+ };
+
+ for (size_t i = 0; i < linearVars.size(); ++i) {
+ llvm::OpenMPIRBuilder::DeclareSimdAttrTy ¶mAttr =
+ attrs[argIndexMap[linearVars[i]]];
+ omp::LinearModifierAttr linearModAttr;
+
+ if (linearModifiers && (*linearModifiers)[i])
+ linearModAttr = dyn_cast<omp::LinearModifierAttr>((*linearModifiers)[i]);
+
+ if (linearModAttr) {
+ switch (linearModAttr.getValue()) {
+ case omp::LinearModifier::ref:
+ paramAttr.Kind = llvm::OpenMPIRBuilder::DeclareSimdKindTy::LinearRef;
+ break;
+ case omp::LinearModifier::uval:
+ paramAttr.Kind = llvm::OpenMPIRBuilder::DeclareSimdKindTy::LinearUVal;
+ break;
+ case omp::LinearModifier::val:
+ // Match clang: val on a non-reference (non-pointer SSA type) is
+ // semantically identical to plain linear. Only pointer-typed
+ // vars (which may originate from C++ references) get LinearVal.
+ if (isa<LLVM::LLVMPointerType>(linearVars[i].getType()))
+ paramAttr.Kind = llvm::OpenMPIRBuilder::DeclareSimdKindTy::LinearVal;
+ else
+ paramAttr.Kind = llvm::OpenMPIRBuilder::DeclareSimdKindTy::Linear;
+ break;
+ }
+ } else {
+ paramAttr.Kind = llvm::OpenMPIRBuilder::DeclareSimdKindTy::Linear;
+ }
+
+ paramAttr.HasVarStride = false;
+ paramAttr.StrideOrArg = defaultStep;
+
+ if (i >= linearStepVars.size())
+ continue;
+
+ Value stepValue = linearStepVars[i];
+
+ if (std::optional<unsigned> stepArgIdx = resolveStepArgIndex(stepValue)) {
+ paramAttr.HasVarStride = true;
+ paramAttr.StrideOrArg =
+ llvm::APSInt(llvm::APInt(/*numBits=*/32, *stepArgIdx),
+ /*isUnsigned=*/true);
+ continue;
+ }
+
+ if (auto cst = stepValue.getDefiningOp<LLVM::ConstantOp>()) {
+ IntegerAttr intAttr = cast<IntegerAttr>(cst.getValue());
+ paramAttr.HasVarStride = false;
+ paramAttr.StrideOrArg =
+ llvm::APSInt(intAttr.getValue(), /*isUnsigned=*/false);
+ continue;
+ }
+
+ llvm_unreachable("unhandled linear step form");
+ }
+}
+
+static llvm::OpenMPIRBuilder::DeclareSimdBranch
+getDeclareSimdBranch(omp::DeclareSimdOp &op) {
+ if (op.getInbranch())
+ return llvm::OpenMPIRBuilder::DeclareSimdBranch::Inbranch;
+ if (op.getNotinbranch())
+ return llvm::OpenMPIRBuilder::DeclareSimdBranch::Notinbranch;
+ return llvm::OpenMPIRBuilder::DeclareSimdBranch::Undefined;
+}
+
+static unsigned
+evaluateCDTSize(const llvm::Function *fn,
+ ArrayRef<llvm::OpenMPIRBuilder::DeclareSimdAttrTy> paramAttrs) {
+ // Every vector variant of a SIMD-enabled function has a vector length (VLEN).
+ // If OpenMP clause "simdlen" is used, the VLEN is the value of the argument
+ // of that clause. The VLEN value must be a power of 2.
+ // Otherwise, the notion of the function's "characteristic data type" (CDT)
+ // is used to compute the vector length.
+ // CDT is defined in the following order:
+ // a) For non-void function, the CDT is the return type.
+ // b) If the function has any non-uniform, non-linear parameters, then the
+ // CDT is the type of the first such parameter.
+ // c) If the CDT determined by a) or b) above is struct, union, or class
+ // type which is pass-by-value (except for the type that maps to the
+ // built-in complex data type), the characteristic data type is int.
+ // d) If none of the above three cases is applicable, the CDT is int.
+ // The VLEN is then determined based on the CDT and the size of vector
+ // register of that ISA for which current vector version is generated. The
+ // VLEN is computed using the formula below:
+ // VLEN = sizeof(vector_register) / sizeof(CDT),
+ // where the vector register size is specified in section 3.2.1 "Registers
+ // and the Stack Frame" of the original AMD64 ABI document.
+ const llvm::DataLayout &dl = fn->getParent()->getDataLayout();
+ llvm::Type *cdtTy = nullptr;
+
+ // For a non-void function, the return type is the characteristic data type.
+ llvm::Type *retTy = fn->getReturnType();
+ if (retTy && !retTy->isVoidTy())
+ cdtTy = retTy;
+
+ // Otherwise, use the first parameter that is still vectorized. Parameters
+ // without an explicit declare simd attribute are treated as vector
+ // parameters because Vector is the default kind.
+ if (!cdtTy) {
+ unsigned numParams = fn->getFunctionType()->getNumParams();
+ for (unsigned i = 0; i < numParams; ++i) {
+ bool isVectorParam = i >= paramAttrs.size() ||
+ paramAttrs[i].Kind ==
+ llvm::OpenMPIRBuilder::DeclareSimdKindTy::Vector;
+ if (!isVectorParam)
+ continue;
+ cdtTy = fn->getFunctionType()->getParamType(i);
+ break;
+ }
+ }
+
+ // Aggregates and the lack of a suitable scalar/vector parameter both fall
+ // back to `int`, matching the current lowering rule used here.
+ llvm::Type *intTy = llvm::Type::getInt32Ty(fn->getContext());
+ if (!cdtTy || cdtTy->isStructTy() || cdtTy->isArrayTy())
+ cdtTy = intTy;
+
+ return dl.getTypeSizeInBits(cdtTy);
+}
+
+// Check the values provided via `simdlen` by the user.
+static bool validateAArch64SimdLen(Operation &op, llvm::Function *fn,
+ unsigned userVLEN, unsigned wds, char isa) {
+ // 1. A `simdlen(1)` doesn't produce vector signatures.
+ if (userVLEN == 1) {
+ op.emitWarning("simdlen(1) has no effect on AArch64 declare simd");
+ return false;
+ }
+
+ // 2. Section 3.3.1, item 1: user input must be a power of 2 for Advanced
+ // SIMD.
+ if (isa == 'n' && userVLEN && !llvm::isPowerOf2_32(userVLEN)) {
+ op.emitWarning("AArch64 Advanced SIMD declare simd requires simdlen to be "
+ "a power of 2");
+ return false;
+ }
+
+ // 3. Section 3.4.1: SVE fixed length must obey the architectural limits.
+ if (isa == 's' && userVLEN != 0 &&
+ ((userVLEN * wds > 2048) || (userVLEN * wds % 128 != 0))) {
+ op.emitWarning() << "AArch64 SVE fixed-length declare simd simdlen must "
+ "fit architectural lane limits for element width "
+ << wds;
+ return false;
+ }
+
+ return true;
+}
+
+/// Maps To Vector (MTV), as defined in 4.1.1 of the AAVFABI (2021Q1).
+static bool getAArch64MTV(Type ty,
+ llvm::OpenMPIRBuilder::DeclareSimdKindTy kind) {
+ if (!ty)
+ return false;
+
+ if (kind == llvm::OpenMPIRBuilder::DeclareSimdKindTy::Uniform)
+ return false;
+
+ if (kind == llvm::OpenMPIRBuilder::DeclareSimdKindTy::LinearUVal ||
+ kind == llvm::OpenMPIRBuilder::DeclareSimdKindTy::LinearRef)
+ return false;
+
+ if (kind == llvm::OpenMPIRBuilder::DeclareSimdKindTy::Linear ||
+ kind == llvm::OpenMPIRBuilder::DeclareSimdKindTy::LinearVal)
+ return false;
+
+ return true;
+}
+
+/// Pass By Value (PBV), as defined in 3.1.2 of the AAVFABI.
+static bool getAArch64PBV(Type ty, const DataLayout &dl) {
+ unsigned size = dl.getTypeSizeInBits(ty);
+
+ // Only scalars and pointer-like types within 16 bytes set PBV to true.
+ if (size != 8 && size != 16 && size != 32 && size != 64 && size != 128)
+ return false;
+
+ if (isa<FloatType, IntegerType, IndexType>(ty))
+ return true;
+
+ if (isa<LLVM::LLVMPointerType, omp::PointerLikeType>(ty))
+ return true;
+
+ // TODO: Add support for complex types (section 3.1.2, item 2).
+ return false;
+}
+
+// Computes the lane size (LS) of a return type or of an input parameter,
+// as defined by `LS(P)` in 3.2.1 of the AAVFABI.
+//
+// argElemType provides the original language-level type for an opaque
+// `!llvm.ptr` parameter, enabling correct LS computation.
+static unsigned getAArch64LS(Type ty,
+ llvm::OpenMPIRBuilder::DeclareSimdKindTy kind,
+ const DataLayout &dl,
+ Type argElemType = nullptr) {
+ if (!getAArch64MTV(ty, kind)) {
+ if (auto ptrLikeTy = dyn_cast<omp::PointerLikeType>(ty)) {
+ Type elemTy = ptrLikeTy.getElementType();
+ if (elemTy && getAArch64PBV(elemTy, dl))
+ return dl.getTypeSizeInBits(elemTy);
+ }
+ // For opaque !llvm.ptr, use the original type from arg_types
+ // if available, since the pointee type is lost at LLVM IR level.
+ if (isa<LLVM::LLVMPointerType>(ty) && argElemType &&
+ getAArch64PBV(argElemType, dl))
+ return dl.getTypeSizeInBits(argElemType);
+ }
+
+ if (getAArch64PBV(ty, dl))
+ return dl.getTypeSizeInBits(ty);
+
+ return dl.getTypeSizeInBits(LLVM::LLVMPointerType::get(ty.getContext(),
+ /*addressSpace=*/0));
+}
+
+// Get Narrowest Data Size (NDS) and Widest Data Size (WDS) from the
+// signature of the scalar function, as defined in 3.2.2 of the AAVFABI.
+//
+// argElemTypes maps function argument index to the original language-level
+// type from `arg_types`, if available. This is used to recover pointee-type
+// information lost in opaque `!llvm.ptr`.
+static std::tuple<unsigned, unsigned, bool>
+getNDSWDS(FunctionOpInterface funcOp,
+ ArrayRef<llvm::OpenMPIRBuilder::DeclareSimdAttrTy> paramAttrs,
+ const DataLayout &dl, ArrayRef<Type> argElemTypes = {}) {
+ bool outputBecomesInput = false;
+
+ llvm::SmallVector<unsigned, 8> sizes;
+ if (funcOp.getNumResults() != 0) {
+ Type retTy = funcOp.getResultTypes().front();
+ sizes.push_back(getAArch64LS(
+ retTy, llvm::OpenMPIRBuilder::DeclareSimdKindTy::Vector, dl));
+ if (!getAArch64PBV(retTy, dl) &&
+ getAArch64MTV(retTy,
+ llvm::OpenMPIRBuilder::DeclareSimdKindTy::Vector)) {
+ outputBecomesInput = true;
+ }
+ }
+
+ for (auto [index, argTy] : llvm::enumerate(funcOp.getArgumentTypes())) {
+ Type elemTy = (index < argElemTypes.size()) ? argElemTypes[index] : nullptr;
+ sizes.push_back(getAArch64LS(argTy, paramAttrs[index].Kind, dl, elemTy));
+ }
+
+ assert(!sizes.empty() && "Unable to determine NDS and WDS.");
+ // The LS of a function parameter / return value can only be a power
+ // of 2, starting from 8 bits, up to 128.
+ assert(llvm::all_of(sizes,
+ [](unsigned size) {
+ return size == 8 || size == 16 || size == 32 ||
+ size == 64 || size == 128;
+ }) &&
+ "Invalid size");
+
+ return std::make_tuple(*llvm::min_element(sizes), *llvm::max_element(sizes),
+ outputBecomesInput);
+}
+
+static LogicalResult
+convertDeclareSimdOp(Operation &opInst, llvm::IRBuilderBase &builder,
+ LLVM::ModuleTranslation &moduleTranslation) {
+ llvm::OpenMPIRBuilder *ompBuilder = moduleTranslation.getOpenMPBuilder();
+ auto declareSimdOp = cast<omp::DeclareSimdOp>(opInst);
+
+ auto funcOp = opInst.getParentOfType<FunctionOpInterface>();
+ assert(funcOp && "declare_simd must be defined inside a function");
+ llvm::Function *fn = moduleTranslation.lookupFunction(funcOp.getName());
+ assert(fn && "Failed to find corresponding LLVM function for function op");
+
+ llvm::SmallVector<llvm::OpenMPIRBuilder::DeclareSimdAttrTy, 8> paramAttrs(
+ funcOp.getNumArguments());
+
+ llvm::DenseMap<mlir::Value, unsigned> argIndexMap;
+ for (auto [idx, arg] : llvm::enumerate(funcOp.getArguments()))
+ argIndexMap.try_emplace(arg, idx);
+
+ // Populate params for uniform clause.
+ for (Value u : declareSimdOp.getUniformVars()) {
+ paramAttrs[argIndexMap[u]].Kind =
+ llvm::OpenMPIRBuilder::DeclareSimdKindTy::Uniform;
+ }
+
+ // Populate params for aligned clause.
+ OperandRange operands = declareSimdOp.getAlignedVars();
+ std::optional<ArrayAttr> alignmentValues = declareSimdOp.getAlignments();
+ for (size_t i = 0; i < operands.size(); ++i) {
+ auto intAttr = cast<IntegerAttr>((*alignmentValues)[i]);
+ paramAttrs[argIndexMap[operands[i]]].Alignment =
+ llvm::APSInt(intAttr.getValue(), /*isUnsigned=*/true);
+ }
+
+ populateLinearParam(argIndexMap, declareSimdOp, paramAttrs);
+
+ llvm::APSInt vLenVal(llvm::APInt(/*numBits=*/64, /*val=*/0),
+ /*isUnsigned=*/false);
+ if (std::optional<int64_t> simdlen = declareSimdOp.getSimdlen()) {
+ vLenVal = llvm::APSInt(llvm::APInt(/*numBits=*/64, *simdlen),
+ /*isUnsigned=*/false);
+ }
+
+ const llvm::Triple &targetTriple = fn->getParent()->getTargetTriple();
+ llvm::OpenMPIRBuilder::DeclareSimdBranch branch =
+ getDeclareSimdBranch(declareSimdOp);
+ if (targetTriple.isX86()) {
+ unsigned numElts = evaluateCDTSize(fn, paramAttrs);
+ assert(numElts && "Non-zero simdlen/cdtsize expected");
+ ompBuilder->emitX86DeclareSimdFunction(fn, numElts, vLenVal, paramAttrs,
+ branch);
+ } else if (targetTriple.getArch() == llvm::Triple::aarch64) {
+ DataLayout dl(opInst.getParentOfType<ModuleOp>());
+
+ // Build a per-argument element type array from arg_types.
+ // This recovers pointee-type information for opaque !llvm.ptr params.
+ llvm::SmallVector<Type> argElemTypes(funcOp.getNumArguments());
+ if (std::optional<ArrayAttr> argTypeAttrs = declareSimdOp.getArgTypes()) {
+ for (auto [i, attr] : llvm::enumerate(*argTypeAttrs)) {
+ if (auto tyAttr = dyn_cast_if_present<TypeAttr>(attr))
+ argElemTypes[i] = tyAttr.getValue();
+ }
+ }
+
+ auto [nds, wds, outputBecomesInput] =
+ getNDSWDS(funcOp, paramAttrs, dl, argElemTypes);
+ unsigned vLen = vLenVal.getZExtValue();
+
+ auto hasTargetFeature = [&](llvm::StringRef feature) {
+ llvm::Attribute attr = fn->getFnAttribute("target-features");
+ if (!attr.isStringAttribute())
+ return false;
+
+ llvm::SmallVector<llvm::StringRef, 16> targetFeatures;
+ attr.getValueAsString().split(targetFeatures, ',', /*MaxSplit=*/-1,
+ /*KeepEmpty=*/false);
+
+ bool isEnabled = false;
+ for (llvm::StringRef targetFeature : targetFeatures) {
+ if (targetFeature.consume_front("+")) {
+ if (targetFeature == feature)
+ isEnabled = true;
+ } else if (targetFeature.consume_front("-")) {
+ if (targetFeature == feature)
+ isEnabled = false;
+ }
+ }
+ return isEnabled;
+ };
+
+ if (hasTargetFeature("sve")) {
+ if (validateAArch64SimdLen(opInst, fn, vLen, wds, 's')) {
+ ompBuilder->emitAArch64DeclareSimdFunction(
+ fn, vLen, paramAttrs, branch, 's', nds, outputBecomesInput);
+ }
+ } else if (hasTargetFeature("neon")) {
+ if (validateAArch64SimdLen(opInst, fn, vLen, wds, 'n')) {
+ ompBuilder->emitAArch64DeclareSimdFunction(
+ fn, vLen, paramAttrs, branch, 'n', nds, outputBecomesInput);
+ }
+ }
+ }
+
+ return success();
+}
+
/// Given an OpenMP MLIR operation, create the corresponding LLVM IR (including
/// OpenMP runtime calls).
LogicalResult OpenMPDialectLLVMIRTranslationInterface::convertOperation(
@@ -7660,6 +8048,9 @@ LogicalResult OpenMPDialectLLVMIRTranslationInterface::convertOperation(
.Case([&](omp::TargetFreeMemOp) {
return convertTargetFreeMemOp(*op, builder, moduleTranslation);
})
+ .Case([&](omp::DeclareSimdOp op) {
+ return convertDeclareSimdOp(*op, builder, moduleTranslation);
+ })
.Default([&](Operation *inst) {
return inst->emitError()
<< "not yet implemented: " << inst->getName();
diff --git a/mlir/test/Dialect/OpenMP/ops.mlir b/mlir/test/Dialect/OpenMP/ops.mlir
index d924d479eba90..9778dcc8d7ea0 100644
--- a/mlir/test/Dialect/OpenMP/ops.mlir
+++ b/mlir/test/Dialect/OpenMP/ops.mlir
@@ -3603,6 +3603,22 @@ func.func @omp_declare_simd_all_clauses(%a: f64, %b: f64,
return
}
+// CHECK-LABEL: func.func @omp_declare_simd_arg_types
+func.func @omp_declare_simd_arg_types(%a: f64, %b: i32) -> () {
+ // CHECK: omp.declare_simd {arg_types = [f64, i32]}
+ omp.declare_simd {arg_types = [f64, i32]}
+ return
+}
+
+// CHECK-LABEL: func.func @omp_declare_simd_arg_types_with_linear
+func.func @omp_declare_simd_arg_types_with_linear(%a: f64, %b: !llvm.ptr, %step: i64) -> () {
+ // CHECK: omp.declare_simd
+ // CHECK-SAME: linear(ref(%{{.*}} : !llvm.ptr = %{{.*}} : i64))
+ // CHECK-SAME: {arg_types = [f64, i32], linear_var_types = [i32]}
+ omp.declare_simd linear(ref(%b : !llvm.ptr = %step : i64)) {arg_types = [f64, i32], linear_var_types = [i32]}
+ return
+}
+
// CHECK-LABEL: func.func @task_affinity_single
func.func @task_affinity_single(%ptr: !llvm.ptr) {
// CHECK: %[[LEN:.*]] = llvm.mlir.constant(400 : i64) : i64
diff --git a/mlir/test/Target/LLVMIR/openmp-declare-simd-aarch64-01.mlir b/mlir/test/Target/LLVMIR/openmp-declare-simd-aarch64-01.mlir
new file mode 100644
index 0000000000000..eeb28e8fb1033
--- /dev/null
+++ b/mlir/test/Target/LLVMIR/openmp-declare-simd-aarch64-01.mlir
@@ -0,0 +1,314 @@
+// RUN: mlir-translate --mlir-to-llvmir %s | FileCheck %s
+
+module attributes {
+ llvm.target_triple = "aarch64-unknown-linux-gnu",
+ llvm.data_layout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
+} {
+ llvm.func @foo(%x: f32) -> f64 attributes {
+ target_features = #llvm.target_features<["+neon"]>
+ } {
+ omp.declare_simd
+ omp.declare_simd simdlen(2)
+ omp.declare_simd simdlen(6)
+ omp.declare_simd simdlen(8)
+ %0 = llvm.fpext %x : f32 to f64
+ llvm.return %0 : f64
+ }
+
+ // Make sure that the following two functions by default get generated
+ // with 4 and 2 lanes, as described in the vector ABI.
+
+ llvm.func @bar(%x: f64) -> f32 attributes {
+ target_features = #llvm.target_features<["+neon"]>
+ } {
+ omp.declare_simd notinbranch
+ %0 = llvm.fptrunc %x : f64 to f32
+ llvm.return %0 : f32
+ }
+
+ llvm.func @baz(%x: f32) -> f64 attributes {
+ target_features = #llvm.target_features<["+neon"]>
+ } {
+ omp.declare_simd notinbranch
+ %0 = llvm.fpext %x : f32 to f64
+ llvm.return %0 : f64
+ }
+
+ llvm.func @foo_int(%x: i32) -> i64 attributes {
+ target_features = #llvm.target_features<["+neon"]>
+ } {
+ omp.declare_simd
+ omp.declare_simd simdlen(2)
+ omp.declare_simd simdlen(6)
+ omp.declare_simd simdlen(8)
+ %0 = llvm.sext %x : i32 to i64
+ llvm.return %0 : i64
+ }
+
+ llvm.func @simple_8bit(%x: i8) -> i8 attributes {
+ target_features = #llvm.target_features<["+neon"]>
+ } {
+ omp.declare_simd
+ llvm.return %x : i8
+ }
+
+ llvm.func @simple_16bit(%x: i16) -> i16 attributes {
+ target_features = #llvm.target_features<["+neon"]>
+ } {
+ omp.declare_simd
+ llvm.return %x : i16
+ }
+
+ llvm.func @simple_32bit(%x: i32) -> i32 attributes {
+ target_features = #llvm.target_features<["+neon"]>
+ } {
+ omp.declare_simd
+ llvm.return %x : i32
+ }
+
+ llvm.func @simple_64bit(%x: i64) -> i64 attributes {
+ target_features = #llvm.target_features<["+neon"]>
+ } {
+ omp.declare_simd
+ llvm.return %x : i64
+ }
+
+ llvm.func @a01(%x: i32) -> i8 attributes {
+ target_features = #llvm.target_features<["+neon"]>
+ } {
+ omp.declare_simd
+ omp.declare_simd simdlen(32)
+ %0 = llvm.trunc %x : i32 to i8
+ llvm.return %0 : i8
+ }
+
+ llvm.func @a02(%x: i16) -> i64 attributes {
+ target_features = #llvm.target_features<["+neon"]>
+ } {
+ omp.declare_simd
+ omp.declare_simd simdlen(2)
+ %0 = llvm.sext %x : i16 to i64
+ llvm.return %0 : i64
+ }
+
+ // ************
+ // * pointers *
+ // ************
+
+ llvm.func @b01(%x: !llvm.ptr) -> i32 attributes {
+ target_features = #llvm.target_features<["+neon"]>
+ } {
+ omp.declare_simd
+ %0 = llvm.mlir.constant(0 : i32) : i32
+ llvm.return %0 : i32
+ }
+
+ llvm.func @b02(%x: !llvm.ptr) -> i8 attributes {
+ target_features = #llvm.target_features<["+neon"]>
+ } {
+ omp.declare_simd
+ %0 = llvm.mlir.constant(0 : i8) : i8
+ llvm.return %0 : i8
+ }
+
+ llvm.func @b03(%x: !llvm.ptr) -> !llvm.ptr attributes {
+ target_features = #llvm.target_features<["+neon"]>
+ } {
+ omp.declare_simd
+ llvm.return %x : !llvm.ptr
+ }
+
+ // ***********
+ // * masking *
+ // ***********
+
+ llvm.func @c01(%x: !llvm.ptr, %y: i16) -> i32 attributes {
+ target_features = #llvm.target_features<["+neon"]>
+ } {
+ omp.declare_simd inbranch
+ %0 = llvm.sext %y : i16 to i32
+ llvm.return %0 : i32
+ }
+
+ llvm.func @c02(%x: !llvm.ptr, %y: i8) -> f64 attributes {
+ target_features = #llvm.target_features<["+neon"]>
+ } {
+ omp.declare_simd inbranch uniform(%x : !llvm.ptr)
+ %0 = llvm.sitofp %y : i8 to f64
+ llvm.return %0 : f64
+ }
+
+ // ************************************
+ // * Linear with a constant parameter *
+ // ************************************
+
+ llvm.func @constlinear(%i: i32) -> f64 attributes {
+ target_features = #llvm.target_features<["+neon"]>
+ } {
+ %c1 = llvm.mlir.constant(1 : i32) : i32
+ omp.declare_simd notinbranch linear(%i : i32 = %c1 : i32)
+ %zero = llvm.mlir.constant(0.0 : f64) : f64
+ llvm.return %zero : f64
+ }
+
+ // *************************
+ // * sincos-like signature *
+ // *************************
+
+ llvm.func @sincos(%in: f64, %sin: !llvm.ptr, %cos: !llvm.ptr) attributes {
+ target_features = #llvm.target_features<["+neon"]>
+ } {
+ %c8 = llvm.mlir.constant(8 : i64) : i64
+ omp.declare_simd linear(%sin : !llvm.ptr = %c8 : i64,
+ %cos : !llvm.ptr = %c8 : i64)
+ llvm.return
+ }
+
+ llvm.func @SinCos(%in: f64, %sin: !llvm.ptr, %cos: !llvm.ptr) attributes {
+ target_features = #llvm.target_features<["+neon"]>
+ } {
+ %c8 = llvm.mlir.constant(8 : i64) : i64
+ %c16 = llvm.mlir.constant(16 : i64) : i64
+ omp.declare_simd linear(%sin : !llvm.ptr = %c8 : i64,
+ %cos : !llvm.ptr = %c16 : i64)
+ llvm.return
+ }
+
+ // ******************************************
+ // * linear(val), linear(ref), linear(uval) *
+ // ******************************************
+
+ // Listing 2 adapted: linear(val) on a sincos-like signature.
+ // val modifier on !llvm.ptr -> LinearVal -> mangled as 'L'.
+ // NDS=64 (f64 and pointers are 64-bit), no simdlen -> VLEN from NDS.
+ llvm.func @sincos_val(%in: f64, %sin: !llvm.ptr, %cos: !llvm.ptr) attributes {
+ target_features = #llvm.target_features<["+neon"]>
+ } {
+ %c8 = llvm.mlir.constant(8 : i64) : i64
+ omp.declare_simd linear(val(%sin : !llvm.ptr = %c8 : i64),
+ val(%cos : !llvm.ptr = %c8 : i64))
+ llvm.return
+ }
+
+ // Listing 3 adapted: linear(ref) on a pointer parameter.
+ // ref modifier -> LinearRef -> mangled as 'R'.
+ // MTV is false for LinearRef, NDS comes from PBV of args.
+ llvm.func @sincos_ref(%in: f64, %sin: !llvm.ptr, %cos: !llvm.ptr) attributes {
+ target_features = #llvm.target_features<["+neon"]>
+ } {
+ %c8 = llvm.mlir.constant(8 : i64) : i64
+ omp.declare_simd linear(ref(%sin : !llvm.ptr = %c8 : i64),
+ ref(%cos : !llvm.ptr = %c8 : i64))
+ llvm.return
+ }
+
+ // uval modifier -> LinearUVal -> mangled as 'U'.
+ // MTV is false for LinearUVal.
+ llvm.func @sincos_uval(%in: f64, %sin: !llvm.ptr, %cos: !llvm.ptr) attributes {
+ target_features = #llvm.target_features<["+neon"]>
+ } {
+ %c8 = llvm.mlir.constant(8 : i64) : i64
+ omp.declare_simd linear(uval(%sin : !llvm.ptr = %c8 : i64),
+ uval(%cos : !llvm.ptr = %c8 : i64))
+ llvm.return
+ }
+
+ // Selection of tests based on the examples provided in chapter 5 of
+ // the Vector Function ABI specification for AArch64, at
+ // https://developer.arm.com/products/software-development-tools/hpc/arm-compiler-for-hpc/vector-function-abi.
+
+ // Listing 6, p. 19
+ llvm.func @foo4(%x: !llvm.ptr, %y: f32) -> i32 attributes {
+ target_features = #llvm.target_features<["+neon"]>
+ } {
+ %c4 = llvm.mlir.constant(4 : i64) : i64
+ %c0 = llvm.mlir.constant(0 : i32) : i32
+ omp.declare_simd linear(%x : !llvm.ptr = %c4 : i64)
+ aligned(%x : !llvm.ptr -> 16 : i64)
+ simdlen(4)
+ llvm.return %c0 : i32
+ }
+
+ llvm.func @DoRGB(%x: !llvm.struct<(i8, i8, i8)>) -> !llvm.struct<(i8, i8, i8)> attributes {
+ target_features = #llvm.target_features<["+neon"]>
+ } {
+ omp.declare_simd notinbranch
+ llvm.return %x : !llvm.struct<(i8, i8, i8)>
+ }
+
+ // ********************************
+ // * arg_types for NDS/WDS fix *
+ // ********************************
+
+ // For LinearRef/LinearUVal, MTV=false. Without arg_types,
+ // opaque !llvm.ptr gives LS=sizeof(ptr)=64. With arg_types
+ // = [f64, i32, i32], the ptr params' LS becomes sizeof(i32)=32,
+ // so NDS=min(64,32)=32 -> VLEN={2,4} instead of just {2}.
+ llvm.func @sincos_ref_lvt(%in: f64, %sin: !llvm.ptr, %cos: !llvm.ptr) attributes {
+ target_features = #llvm.target_features<["+neon"]>
+ } {
+ %c4 = llvm.mlir.constant(4 : i64) : i64
+ omp.declare_simd linear(ref(%sin : !llvm.ptr = %c4 : i64),
+ ref(%cos : !llvm.ptr = %c4 : i64)) {arg_types = [f64, i32, i32]}
+ llvm.return
+ }
+
+ // Same but without arg_types: LS=64 (ptr) -> NDS=64 -> VLEN={2} only
+ llvm.func @sincos_ref_no_lvt(%in: f64, %sin: !llvm.ptr, %cos: !llvm.ptr) attributes {
+ target_features = #llvm.target_features<["+neon"]>
+ } {
+ %c4 = llvm.mlir.constant(4 : i64) : i64
+ omp.declare_simd linear(ref(%sin : !llvm.ptr = %c4 : i64),
+ ref(%cos : !llvm.ptr = %c4 : i64))
+ llvm.return
+ }
+}
+
+// CHECK: attributes {{#[0-9]+}} = {
+// CHECK-SAME: "_ZGVnM2v_foo"
+// CHECK-SAME: "_ZGVnM4v_foo"
+// CHECK-SAME: "_ZGVnM8v_foo"
+// CHECK-SAME: "_ZGVnN2v_foo"
+// CHECK-SAME: "_ZGVnN4v_foo"
+// CHECK-SAME: "_ZGVnN8v_foo"
+// CHECK-SAME: "target-features"="+neon"
+// CHECK-SAME: }
+// CHECK-NOT: _ZGVnN6v_foo
+
+// CHECK-DAG: attributes {{#[0-9]+}} = { "_ZGVnN2v_bar" "_ZGVnN4v_bar" "target-features"="+neon" }
+// CHECK-DAG: attributes {{#[0-9]+}} = { "_ZGVnN2v_baz" "_ZGVnN4v_baz" "target-features"="+neon" }
+
+// CHECK-DAG: attributes {{#[0-9]+}} = { "_ZGVnM2v_foo_int" "_ZGVnM4v_foo_int" "_ZGVnM8v_foo_int" "_ZGVnN2v_foo_int" "_ZGVnN4v_foo_int" "_ZGVnN8v_foo_int" "target-features"="+neon" }
+// CHECK-NOT: _ZGVnN6v_foo_int
+
+// CHECK-DAG: attributes {{#[0-9]+}} = { "_ZGVnM16v_simple_8bit" "_ZGVnM8v_simple_8bit" "_ZGVnN16v_simple_8bit" "_ZGVnN8v_simple_8bit" "target-features"="+neon" }
+// CHECK-DAG: attributes {{#[0-9]+}} = { "_ZGVnM4v_simple_16bit" "_ZGVnM8v_simple_16bit" "_ZGVnN4v_simple_16bit" "_ZGVnN8v_simple_16bit" "target-features"="+neon" }
+// CHECK-DAG: attributes {{#[0-9]+}} = { "_ZGVnM2v_simple_32bit" "_ZGVnM4v_simple_32bit" "_ZGVnN2v_simple_32bit" "_ZGVnN4v_simple_32bit" "target-features"="+neon" }
+// CHECK-DAG: attributes {{#[0-9]+}} = { "_ZGVnM2v_simple_64bit" "_ZGVnN2v_simple_64bit" "target-features"="+neon" }
+
+// CHECK-DAG: attributes {{#[0-9]+}} = { "_ZGVnM16v_a01" "_ZGVnM32v_a01" "_ZGVnM8v_a01" "_ZGVnN16v_a01" "_ZGVnN32v_a01" "_ZGVnN8v_a01" "target-features"="+neon" }
+// CHECK-DAG: attributes {{#[0-9]+}} = { "_ZGVnM2v_a02" "_ZGVnM4v_a02" "_ZGVnM8v_a02" "_ZGVnN2v_a02" "_ZGVnN4v_a02" "_ZGVnN8v_a02" "target-features"="+neon" }
+
+// CHECK-DAG: attributes {{#[0-9]+}} = { "_ZGVnM2v_b01" "_ZGVnM4v_b01" "_ZGVnN2v_b01" "_ZGVnN4v_b01" "target-features"="+neon" }
+// CHECK-DAG: attributes {{#[0-9]+}} = { "_ZGVnM16v_b02" "_ZGVnM8v_b02" "_ZGVnN16v_b02" "_ZGVnN8v_b02" "target-features"="+neon" }
+// CHECK-DAG: attributes {{#[0-9]+}} = { "_ZGVnM2v_b03" "_ZGVnN2v_b03" "target-features"="+neon" }
+
+// CHECK-DAG: attributes {{#[0-9]+}} = { "_ZGVnM4vv_c01" "_ZGVnM8vv_c01" "target-features"="+neon" }
+// CHECK-DAG: attributes {{#[0-9]+}} = { "_ZGVnM16uv_c02" "_ZGVnM8uv_c02" "target-features"="+neon" }
+
+// CHECK-DAG: attributes {{#[0-9]+}} = { "_ZGVnN2l_constlinear" "_ZGVnN4l_constlinear" "target-features"="+neon" }
+
+// CHECK-DAG: attributes {{#[0-9]+}} = { "_ZGVnM2vl8l8_sincos" "_ZGVnN2vl8l8_sincos" "target-features"="+neon" }
+// CHECK-DAG: attributes {{#[0-9]+}} = { "_ZGVnM2vl8l16_SinCos" "_ZGVnN2vl8l16_SinCos" "target-features"="+neon" }
+// CHECK-DAG: attributes {{#[0-9]+}} = { "_ZGVnM2vL8L8_sincos_val" "_ZGVnN2vL8L8_sincos_val" "target-features"="+neon" }
+// CHECK-DAG: attributes {{#[0-9]+}} = { "_ZGVnM2vR8R8_sincos_ref" "_ZGVnN2vR8R8_sincos_ref" "target-features"="+neon" }
+// CHECK-DAG: attributes {{#[0-9]+}} = { "_ZGVnM2vU8U8_sincos_uval" "_ZGVnN2vU8U8_sincos_uval" "target-features"="+neon" }
+// CHECK-DAG: attributes {{#[0-9]+}} = { "_ZGVnM4l4a16v_foo4" "_ZGVnN4l4a16v_foo4" "target-features"="+neon" }
+
+// CHECK-DAG: attributes {{#[0-9]+}} = { "_ZGVnN2vv_DoRGB" "target-features"="+neon" }
+
+// arg_types gives LS=32 (from i32) -> NDS=32 -> VLEN={2,4}
+// CHECK-DAG: attributes {{#[0-9]+}} = { "_ZGVnM2vR4R4_sincos_ref_lvt" "_ZGVnM4vR4R4_sincos_ref_lvt" "_ZGVnN2vR4R4_sincos_ref_lvt" "_ZGVnN4vR4R4_sincos_ref_lvt" "target-features"="+neon" }
+
+// Without arg_types: LS=64 (ptr) -> NDS=64 -> VLEN={2} only
+// CHECK-DAG: attributes {{#[0-9]+}} = { "_ZGVnM2vR4R4_sincos_ref_no_lvt" "_ZGVnN2vR4R4_sincos_ref_no_lvt" "target-features"="+neon" }
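All of the expected attributes in this file follow the AAVFABI shape `_ZGV<ISA><Mask><VLEN><ParamAttrs>_<name>`. As a rough illustration of how the AdvSIMD name set falls out of the NDS (a hedged Python sketch, not the OpenMPIRBuilder implementation; `param_attrs` and `nds_bytes` are hypothetical stand-ins for the computed parameter classification and Narrowest Data Size):

```python
def advsimd_vlens(nds_bytes):
    """Lane counts that fill a 64- or 128-bit Advanced SIMD register."""
    lanes = {bits // (nds_bytes * 8) for bits in (64, 128)}
    return sorted(v for v in lanes if v >= 2)

def advsimd_names(func, param_attrs, nds_bytes):
    """Masked ('M') and unmasked ('N') variants for ISA letter 'n'."""
    return sorted(f"_ZGVn{mask}{vlen}{param_attrs}_{func}"
                  for mask in "MN"
                  for vlen in advsimd_vlens(nds_bytes))
```

For example, an `i64 -> i64` signature gives NDS=8, so only VLEN=2 survives, matching the two `simple_64bit` manglings checked above.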
diff --git a/mlir/test/Target/LLVMIR/openmp-declare-simd-aarch64-02.mlir b/mlir/test/Target/LLVMIR/openmp-declare-simd-aarch64-02.mlir
new file mode 100644
index 0000000000000..93cc02abe2dcc
--- /dev/null
+++ b/mlir/test/Target/LLVMIR/openmp-declare-simd-aarch64-02.mlir
@@ -0,0 +1,39 @@
+// RUN: mlir-translate --mlir-to-llvmir %s | FileCheck %s
+
+module attributes {
+ llvm.target_triple = "aarch64-unknown-linux-gnu",
+ llvm.data_layout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
+} {
+ llvm.func @"_Z1fd"(%x: f64) -> f64 attributes {
+ target_features = #llvm.target_features<["+neon"]>
+ } {
+ omp.declare_simd
+ llvm.return %x : f64
+ }
+
+ llvm.func @"_Z1ff"(%x: f32) -> f32 attributes {
+ target_features = #llvm.target_features<["+neon"]>
+ } {
+ omp.declare_simd
+ llvm.return %x : f32
+ }
+
+ llvm.func @"_Z1gd"(%x: f64) -> f64 attributes {
+ target_features = #llvm.target_features<["+sve"]>
+ } {
+ omp.declare_simd
+ llvm.return %x : f64
+ }
+
+ llvm.func @"_Z1gf"(%x: f32) -> f32 attributes {
+ target_features = #llvm.target_features<["+sve"]>
+ } {
+ omp.declare_simd
+ llvm.return %x : f32
+ }
+}
+
+// CHECK-DAG: { "_ZGVnM2v__Z1fd" "_ZGVnN2v__Z1fd" "target-features"="+neon" }
+// CHECK-DAG: { "_ZGVnM2v__Z1ff" "_ZGVnM4v__Z1ff" "_ZGVnN2v__Z1ff" "_ZGVnN4v__Z1ff" "target-features"="+neon" }
+// CHECK-DAG: { "_ZGVsMxv__Z1gd" "target-features"="+sve" }
+// CHECK-DAG: { "_ZGVsMxv__Z1gf" "target-features"="+sve" }
diff --git a/mlir/test/Target/LLVMIR/openmp-declare-simd-aarch64-nds.mlir b/mlir/test/Target/LLVMIR/openmp-declare-simd-aarch64-nds.mlir
new file mode 100644
index 0000000000000..be1a9dc95bafc
--- /dev/null
+++ b/mlir/test/Target/LLVMIR/openmp-declare-simd-aarch64-nds.mlir
@@ -0,0 +1,86 @@
+// RUN: mlir-translate --mlir-to-llvmir %s 2>&1 | FileCheck %s
+//
+// Tests for Narrowest Data Size (NDS) on AArch64 AdvSIMD.
+//
+// NDS determines <vlen> for AdvSIMD when no simdlen is specified:
+// NDS=1 -> VLEN=16,8; NDS=2 -> VLEN=8,4;
+// NDS=4 -> VLEN=4,2; NDS>=8 -> VLEN=2.
+
+module attributes {
+ llvm.target_triple = "aarch64-unknown-linux-gnu",
+ llvm.data_layout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
+} {
+
+ // NDS = sizeof(char) = 1, ret=i8, arg=i16 -> min(1,2) = 1
+ llvm.func @NDS_is_sizeof_char(%in: i16) -> i8 attributes {
+ target_features = #llvm.target_features<["+neon"]>
+ } {
+ omp.declare_simd notinbranch
+ %0 = llvm.trunc %in : i16 to i8
+ llvm.return %0 : i8
+ }
+
+ // NDS = sizeof(short) = 2, ret=i32, arg=i16 -> min(4,2) = 2
+ llvm.func @NDS_is_sizeof_short(%in: i16) -> i32 attributes {
+ target_features = #llvm.target_features<["+neon"]>
+ } {
+ omp.declare_simd notinbranch
+ %0 = llvm.sext %in : i16 to i32
+ llvm.return %0 : i32
+ }
+
+ // NDS = sizeof(float) = 4, linear ptr to float -> pointee size = 4
+ // Without linear, ptr NDS would be 8 (pointer size). With linear,
+ // NDS uses pointee size via arg_types.
+ llvm.func @NDS_is_sizeof_float_with_linear(%in: f64, %sin: !llvm.ptr) attributes {
+ target_features = #llvm.target_features<["+neon"]>
+ } {
+ %c4 = llvm.mlir.constant(4 : i64) : i64
+ omp.declare_simd notinbranch
+ linear(%sin : !llvm.ptr = %c4 : i64) {arg_types = [f64, f32]}
+ llvm.return
+ }
+
+ // NDS = sizeof(float) = 4, ret=f64, arg=f32 -> min(8,4) = 4
+ llvm.func @NDS_is_size_of_float(%in: f32) -> f64 attributes {
+ target_features = #llvm.target_features<["+neon"]>
+ } {
+ omp.declare_simd notinbranch
+ %0 = llvm.fpext %in : f32 to f64
+ llvm.return %0 : f64
+ }
+
+ // NDS = sizeof(double) = 8, linear ptr to double -> pointee size = 8
+ llvm.func @NDS_is_sizeof_double(%in: f64, %sin: !llvm.ptr) attributes {
+ target_features = #llvm.target_features<["+neon"]>
+ } {
+ %c8 = llvm.mlir.constant(8 : i64) : i64
+ omp.declare_simd notinbranch
+ linear(%sin : !llvm.ptr = %c8 : i64) {arg_types = [f64, f64]}
+ llvm.return
+ }
+}
+
+// NDS=1 -> VLEN=16,8
+// CHECK-DAG: _ZGVnN16v_NDS_is_sizeof_char
+// CHECK-DAG: _ZGVnN8v_NDS_is_sizeof_char
+// CHECK-NOT: _ZGV{{.*}}_NDS_is_sizeof_char
+
+// NDS=2 -> VLEN=8,4
+// CHECK-DAG: _ZGVnN8v_NDS_is_sizeof_short
+// CHECK-DAG: _ZGVnN4v_NDS_is_sizeof_short
+// CHECK-NOT: _ZGV{{.*}}_NDS_is_sizeof_short
+
+// NDS=4 (linear float pointee) -> VLEN=4,2
+// CHECK-DAG: _ZGVnN4vl4_NDS_is_sizeof_float_with_linear
+// CHECK-DAG: _ZGVnN2vl4_NDS_is_sizeof_float_with_linear
+// CHECK-NOT: _ZGV{{.*}}_NDS_is_sizeof_float_with_linear
+
+// NDS=4 -> VLEN=4,2
+// CHECK-DAG: _ZGVnN4v_NDS_is_size_of_float
+// CHECK-DAG: _ZGVnN2v_NDS_is_size_of_float
+// CHECK-NOT: _ZGV{{.*}}_NDS_is_size_of_float
+
+// NDS=8 -> VLEN=2
+// CHECK-DAG: _ZGVnN2vl8_NDS_is_sizeof_double
+// CHECK-NOT: _ZGV{{.*}}_NDS_is_sizeof_double
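The NDS rule exercised by this file (minimum lane size over the return value and parameters, with linear pointer parameters contributing their pointee size when `arg_types` supplies it) can be sketched in a few lines. A hedged illustration with sizes in bytes; the inputs are hypothetical stand-ins for the lowering's per-parameter data:

```python
def lane_size(ty_bytes, is_pointer=False, linear_pointee_bytes=None):
    """Lane size (bytes) of one value under the AAVFABI NDS rules.

    An opaque pointer contributes the pointer size (8 on AArch64)
    unless it is a linear parameter whose pointee type is known via
    arg_types, in which case the pointee size is used instead.
    """
    if is_pointer:
        return linear_pointee_bytes if linear_pointee_bytes else 8
    return ty_bytes

def narrowest_data_size(sizes):
    """NDS: minimum lane size over return value and all parameters."""
    return min(sizes)
```

E.g. for @NDS_is_sizeof_float_with_linear above, the f64 argument contributes 8 and the linear pointer (pointee f32 via arg_types) contributes 4, giving NDS=4 and hence VLEN={4,2}.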
diff --git a/mlir/test/Target/LLVMIR/openmp-declare-simd-aarch64-sve.mlir b/mlir/test/Target/LLVMIR/openmp-declare-simd-aarch64-sve.mlir
new file mode 100644
index 0000000000000..6e6028ecedd2d
--- /dev/null
+++ b/mlir/test/Target/LLVMIR/openmp-declare-simd-aarch64-sve.mlir
@@ -0,0 +1,37 @@
+// RUN: mlir-translate --mlir-to-llvmir %s 2>&1 | FileCheck %s
+
+module attributes {
+ llvm.target_triple = "aarch64-unknown-linux-gnu",
+ llvm.data_layout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
+} {
+ llvm.func @foo(%x: f32) -> f64 attributes {
+ target_features = #llvm.target_features<["+sve"]>
+ } {
+ omp.declare_simd
+ omp.declare_simd notinbranch
+ omp.declare_simd simdlen(2)
+ omp.declare_simd simdlen(4)
+ omp.declare_simd simdlen(5) // not a multiple of 128 bits
+ omp.declare_simd simdlen(6)
+ omp.declare_simd simdlen(8)
+ omp.declare_simd simdlen(32)
+ omp.declare_simd simdlen(34) // requires more than 2048 bits
+ %0 = llvm.fpext %x : f32 to f64
+ llvm.return %0 : f64
+ }
+
+ llvm.func @a01_fun(%x: i32) -> i8 attributes {
+ target_features = #llvm.target_features<["+sve"]>
+ } {
+ omp.declare_simd notinbranch
+ %0 = llvm.mlir.constant(0 : i8) : i8
+ llvm.return %0 : i8
+ }
+}
+
+// CHECK-DAG: attributes {{#[0-9]+}} = { "_ZGVsM2v_foo" "_ZGVsM32v_foo" "_ZGVsM4v_foo" "_ZGVsM6v_foo" "_ZGVsM8v_foo" "_ZGVsMxv_foo" "target-features"="+sve" }
+// CHECK-NOT: _ZGVsN
+// CHECK-NOT: _ZGVsM5v_foo
+// CHECK-NOT: _ZGVsM34v_foo
+
+// CHECK-DAG: attributes {{#[0-9]+}} = { "_ZGVsMxv_a01_fun" "target-features"="+sve" }
diff --git a/mlir/test/Target/LLVMIR/openmp-declare-simd-aarch64-warnings.mlir b/mlir/test/Target/LLVMIR/openmp-declare-simd-aarch64-warnings.mlir
new file mode 100644
index 0000000000000..fc9df1fc3559e
--- /dev/null
+++ b/mlir/test/Target/LLVMIR/openmp-declare-simd-aarch64-warnings.mlir
@@ -0,0 +1,74 @@
+// RUN: mlir-translate --mlir-to-llvmir %s 2>&1 | FileCheck %s
+
+module attributes {
+ llvm.target_triple = "aarch64-unknown-linux-gnu",
+ llvm.data_layout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
+} {
+
+ // CHECK: warning: AArch64 Advanced SIMD declare simd requires simdlen to be a power of 2
+ llvm.func @advsimd_non_power2(%x: f32) -> f64 attributes {
+ target_features = #llvm.target_features<["+neon"]>
+ } {
+ omp.declare_simd simdlen(6)
+ %0 = llvm.fpext %x : f32 to f64
+ llvm.return %0 : f64
+ }
+
+ // CHECK: warning: simdlen(1) has no effect on AArch64 declare simd
+ llvm.func @advsimd_simdlen1(%x: f64) -> f32 attributes {
+ target_features = #llvm.target_features<["+neon"]>
+ } {
+ omp.declare_simd simdlen(1)
+ %0 = llvm.fptrunc %x : f64 to f32
+ llvm.return %0 : f32
+ }
+
+ // CHECK: warning: simdlen(1) has no effect on AArch64 declare simd
+ llvm.func @sve_simdlen1(%x: f32) -> f64 attributes {
+ target_features = #llvm.target_features<["+sve"]>
+ } {
+ omp.declare_simd simdlen(1)
+ %0 = llvm.fpext %x : f32 to f64
+ llvm.return %0 : f64
+ }
+
+ // WDS=64 (f64 ret), 5*64=320, which is not a multiple of 128.
+ // CHECK: warning: AArch64 SVE fixed-length declare simd simdlen must fit architectural lane limits for element width 64
+ llvm.func @sve_not_multiple_128(%x: f32) -> f64 attributes {
+ target_features = #llvm.target_features<["+sve"]>
+ } {
+ omp.declare_simd simdlen(5)
+ %0 = llvm.fpext %x : f32 to f64
+ llvm.return %0 : f64
+ }
+
+ // WDS=64 (f64 ret), 34*64=2176 > 2048.
+ // CHECK: warning: AArch64 SVE fixed-length declare simd simdlen must fit architectural lane limits for element width 64
+ llvm.func @sve_exceeds_2048(%x: f32) -> f64 attributes {
+ target_features = #llvm.target_features<["+sve"]>
+ } {
+ omp.declare_simd simdlen(34)
+ %0 = llvm.fpext %x : f32 to f64
+ llvm.return %0 : f64
+ }
+
+ // valid AdvSIMD simdlen (power of 2, > 1).
+ // CHECK-NOT: warning:{{.*}}advsimd_valid
+ llvm.func @advsimd_valid(%x: f32) -> f64 attributes {
+ target_features = #llvm.target_features<["+neon"]>
+ } {
+ omp.declare_simd simdlen(4)
+ %0 = llvm.fpext %x : f32 to f64
+ llvm.return %0 : f64
+ }
+
+ // valid SVE simdlen (2*64=128, multiple of 128 and <= 2048).
+ // CHECK-NOT: warning:{{.*}}sve_valid
+ llvm.func @sve_valid(%x: f32) -> f64 attributes {
+ target_features = #llvm.target_features<["+sve"]>
+ } {
+ omp.declare_simd simdlen(2)
+ %0 = llvm.fpext %x : f32 to f64
+ llvm.return %0 : f64
+ }
+}
diff --git a/mlir/test/Target/LLVMIR/openmp-declare-simd-aarch64-wds.mlir b/mlir/test/Target/LLVMIR/openmp-declare-simd-aarch64-wds.mlir
new file mode 100644
index 0000000000000..305d549c0af5b
--- /dev/null
+++ b/mlir/test/Target/LLVMIR/openmp-declare-simd-aarch64-wds.mlir
@@ -0,0 +1,108 @@
+// RUN: mlir-translate --mlir-to-llvmir %s 2>&1 | FileCheck %s
+//
+// Tests for Widest Data Size (WDS) on AArch64 SVE.
+//
+// WDS is used to check accepted values <N> of simdlen(<N>) when targeting
+// fixed-length SVE vector function names. For X = WDS * <N> * 8,
+// 128 <= X <= 2048 bits, and X must be a multiple of 128 bits.
+
+module attributes {
+ llvm.target_triple = "aarch64-unknown-linux-gnu",
+ llvm.data_layout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
+} {
+
+ // WDS = sizeof(char) = 1, simdlen(8) and simdlen(272) are invalid.
+ llvm.func @WDS_is_sizeof_char(%in: i8) -> i8 attributes {
+ target_features = #llvm.target_features<["+sve"]>
+ } {
+ omp.declare_simd simdlen(8)
+ omp.declare_simd simdlen(16)
+ omp.declare_simd simdlen(256)
+ omp.declare_simd simdlen(272)
+ llvm.return %in : i8
+ }
+
+ // WDS = sizeof(short) = 2, simdlen(4) and simdlen(136) are invalid.
+ llvm.func @WDS_is_sizeof_short(%in: i16) -> i8 attributes {
+ target_features = #llvm.target_features<["+sve"]>
+ } {
+ omp.declare_simd simdlen(4)
+ omp.declare_simd simdlen(8)
+ omp.declare_simd simdlen(128)
+ omp.declare_simd simdlen(136)
+ %0 = llvm.trunc %in : i16 to i8
+ llvm.return %0 : i8
+ }
+
+ // WDS = sizeof(float) = 4 because of the linear clause on float pointer.
+ // simdlen(2) and simdlen(68) are invalid.
+ llvm.func @WDS_is_sizeof_float_pointee(%in: f32, %sin: !llvm.ptr) attributes {
+ target_features = #llvm.target_features<["+sve"]>
+ } {
+ %c4 = llvm.mlir.constant(4 : i64) : i64
+ omp.declare_simd notinbranch simdlen(2)
+ linear(%sin : !llvm.ptr = %c4 : i64) {arg_types = [f32, f32]}
+ omp.declare_simd notinbranch simdlen(4)
+ linear(%sin : !llvm.ptr = %c4 : i64) {arg_types = [f32, f32]}
+ omp.declare_simd notinbranch simdlen(64)
+ linear(%sin : !llvm.ptr = %c4 : i64) {arg_types = [f32, f32]}
+ omp.declare_simd notinbranch simdlen(68)
+ linear(%sin : !llvm.ptr = %c4 : i64) {arg_types = [f32, f32]}
+ llvm.return
+ }
+
+ // WDS = sizeof(double) = 8 because of the linear clause on double pointer.
+ // simdlen(34) is invalid.
+ llvm.func @WDS_is_sizeof_double_pointee(%in: f32, %sin: !llvm.ptr) attributes {
+ target_features = #llvm.target_features<["+sve"]>
+ } {
+ %c8 = llvm.mlir.constant(8 : i64) : i64
+ omp.declare_simd notinbranch simdlen(2)
+ linear(%sin : !llvm.ptr = %c8 : i64) {arg_types = [f32, f64]}
+ omp.declare_simd notinbranch simdlen(4)
+ linear(%sin : !llvm.ptr = %c8 : i64) {arg_types = [f32, f64]}
+ omp.declare_simd notinbranch simdlen(32)
+ linear(%sin : !llvm.ptr = %c8 : i64) {arg_types = [f32, f64]}
+ omp.declare_simd notinbranch simdlen(34)
+ linear(%sin : !llvm.ptr = %c8 : i64) {arg_types = [f32, f64]}
+ llvm.return
+ }
+
+ // WDS = sizeof(double) = 8, simdlen(34) is invalid.
+ llvm.func @WDS_is_sizeof_double(%in: f64) -> f64 attributes {
+ target_features = #llvm.target_features<["+sve"]>
+ } {
+ omp.declare_simd simdlen(2)
+ omp.declare_simd simdlen(4)
+ omp.declare_simd simdlen(32)
+ omp.declare_simd simdlen(34)
+ llvm.return %in : f64
+ }
+}
+
+// WDS=1: simdlen(8) -> X=64 < 128: invalid; simdlen(272) -> X=2176 > 2048: invalid
+// CHECK-DAG: _ZGVsM16v_WDS_is_sizeof_char
+// CHECK-DAG: _ZGVsM256v_WDS_is_sizeof_char
+// CHECK-NOT: _ZGV{{.*}}_WDS_is_sizeof_char
+
+// WDS=2: simdlen(4) -> X=64 < 128: invalid; simdlen(136) -> X=2176 > 2048: invalid
+// CHECK-DAG: _ZGVsM8v_WDS_is_sizeof_short
+// CHECK-DAG: _ZGVsM128v_WDS_is_sizeof_short
+// CHECK-NOT: _ZGV{{.*}}_WDS_is_sizeof_short
+
+// WDS=4: simdlen(2) -> X=64 < 128: invalid; simdlen(68) -> X=2176 > 2048: invalid
+// CHECK-DAG: _ZGVsM4vl4_WDS_is_sizeof_float_pointee
+// CHECK-DAG: _ZGVsM64vl4_WDS_is_sizeof_float_pointee
+// CHECK-NOT: _ZGV{{.*}}_WDS_is_sizeof_float_pointee
+
+// WDS=8: simdlen(34) -> X=2176 > 2048: invalid
+// CHECK-DAG: _ZGVsM2vl8_WDS_is_sizeof_double_pointee
+// CHECK-DAG: _ZGVsM4vl8_WDS_is_sizeof_double_pointee
+// CHECK-DAG: _ZGVsM32vl8_WDS_is_sizeof_double_pointee
+// CHECK-NOT: _ZGV{{.*}}_WDS_is_sizeof_double_pointee
+
+// WDS=8: simdlen(34) -> X=2176 > 2048: invalid
+// CHECK-DAG: _ZGVsM2v_WDS_is_sizeof_double
+// CHECK-DAG: _ZGVsM4v_WDS_is_sizeof_double
+// CHECK-DAG: _ZGVsM32v_WDS_is_sizeof_double
+// CHECK-NOT: _ZGV{{.*}}_WDS_is_sizeof_double
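The WDS acceptance rule stated at the top of this file (X = WDS * N * 8 must lie in [128, 2048] bits and be a multiple of 128) reduces to a one-line predicate. This mirrors the rule as the comment states it, not the exact diagnostic logic in the lowering:

```python
def sve_simdlen_is_valid(wds_bytes, simdlen):
    """Fixed-length SVE: total width must be 128..2048 bits, 128-aligned."""
    bits = wds_bytes * simdlen * 8
    return 128 <= bits <= 2048 and bits % 128 == 0
```

This reproduces the accepted/rejected simdlen values in the functions above, e.g. WDS=1 rejects 8 (64 bits) and 272 (2176 bits) but accepts 16 and 256.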
diff --git a/mlir/test/Target/LLVMIR/openmp-declare-simd-x86.mlir b/mlir/test/Target/LLVMIR/openmp-declare-simd-x86.mlir
new file mode 100644
index 0000000000000..400c849a9e3ca
--- /dev/null
+++ b/mlir/test/Target/LLVMIR/openmp-declare-simd-x86.mlir
@@ -0,0 +1,610 @@
+// RUN: mlir-translate --mlir-to-llvmir %s | FileCheck %s
+//
+// For x86 mangling: _ZGV <ISA> <Mask> <VLEN> <ParamAttrs> _ <FunctionName>
+// ISA: b=SSE, c=AVX, d=AVX2, e=AVX-512
+// Mask: M=inbranch, N=notinbranch, both if unspecified
+// ParamAttrs: v=vector, u=uniform, l=linear, L=linear(val),
+// U=linear(uval), R=linear(ref), sN=var-stride(argN),
+// aN=aligned(N)
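Given the legend above, the default x86 VLEN (no explicit simdlen) divides the ISA's vector register width by the Characteristic Data Type width. A hedged sketch, with register widths as implied by the CHECK lines in this test rather than taken from the implementation:

```python
# Register width in bits per x86 VFABI ISA letter, as implied by the
# expected manglings below (b=SSE, c=AVX, d=AVX2, e=AVX-512).
X86_REG_BITS = {"b": 128, "c": 256, "d": 256, "e": 512}

def x86_default_vlen(isa, cdt_bits):
    """VLEN = register width / CDT width (no-simdlen case, sketch)."""
    return X86_REG_BITS[isa] // cdt_bits
```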
+
+module attributes {
+ llvm.target_triple = "x86_64-unknown-linux-gnu",
+ llvm.data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+} {
+
+ llvm.func @add_1(%d: !llvm.ptr) {
+ %c32 = llvm.mlir.constant(32 : i64) : i64
+
+ omp.declare_simd linear(%d : !llvm.ptr = %c32 : i64)
+ omp.declare_simd simdlen(32) inbranch
+ omp.declare_simd notinbranch
+
+ llvm.return
+ }
+
+ llvm.func @h_int(%hp: !llvm.ptr, %hp2: !llvm.ptr, %hq: !llvm.ptr,
+ %lin: !llvm.ptr) {
+ omp.declare_simd aligned(%hp : !llvm.ptr -> 16 : i64,
+ %hp2 : !llvm.ptr -> 16 : i64)
+ llvm.return
+ }
+
+ llvm.func @h_float(%hp: !llvm.ptr, %hp2: !llvm.ptr, %hq: !llvm.ptr,
+ %lin: !llvm.ptr) {
+ omp.declare_simd aligned(%hp : !llvm.ptr -> 16 : i64,
+ %hp2 : !llvm.ptr -> 16 : i64)
+ llvm.return
+ }
+
+ llvm.func @VV_add(%this: !llvm.ptr, %a: !llvm.ptr, %b: i32) -> i32 {
+ %a_val = llvm.load %a : !llvm.ptr -> i32
+ omp.declare_simd uniform(%this : !llvm.ptr, %a : !llvm.ptr)
+ linear(val(%b : i32 = %a_val : i32))
+ %r = llvm.mlir.constant(0 : i32) : i32
+ llvm.return %r : i32
+ }
+
+ llvm.func @VV_taddpf(%this: !llvm.ptr, %a: !llvm.ptr, %b: !llvm.ptr) -> f32 {
+ %c40 = llvm.mlir.constant(40 : i64) : i64
+ %c4 = llvm.mlir.constant(4 : i64) : i64
+ %c32 = llvm.mlir.constant(32 : i64) : i64
+ omp.declare_simd aligned(%a : !llvm.ptr -> 16 : i64,
+ %b : !llvm.ptr -> 4 : i64)
+ linear(%this : !llvm.ptr = %c40 : i64,
+ %a : !llvm.ptr = %c4 : i64,
+ ref(%b : !llvm.ptr = %c32 : i64))
+ %zero = llvm.mlir.constant(0.0 : f32) : f32
+ llvm.return %zero : f32
+ }
+
+ llvm.func @VV_tadd(%this: !llvm.ptr, %b: !llvm.ptr, %c: !llvm.ptr) -> i32 {
+ %c8 = llvm.mlir.constant(8 : i64) : i64
+ // #pragma omp declare simd linear(uval(c) : 8)
+ omp.declare_simd linear(uval(%c : !llvm.ptr = %c8 : i64))
+ // #pragma omp declare simd aligned(b : 8)
+ omp.declare_simd aligned(%b : !llvm.ptr -> 8 : i64)
+ %zero = llvm.mlir.constant(0 : i32) : i32
+ llvm.return %zero : i32
+ }
+
+ // aligned(a:32), aligned(b:16)
+ // linear(ref(b):16) -> ref, step=16, ptr rescale: 16*sizeof(float*) = 16*8 -> stride=128
+ llvm.func @TVV_taddpf(%this: !llvm.ptr, %a: !llvm.ptr, %b: !llvm.ptr) -> f32 {
+ %c128 = llvm.mlir.constant(128 : i64) : i64
+ omp.declare_simd aligned(%a : !llvm.ptr -> 32 : i64,
+ %b : !llvm.ptr -> 16 : i64)
+ linear(ref(%b : !llvm.ptr = %c128 : i64))
+ %zero = llvm.mlir.constant(0.0 : f32) : f32
+ llvm.return %zero : f32
+ }
+
+ llvm.func @TVV_tadd(%this: !llvm.ptr, %b: !llvm.ptr) -> i32 {
+ omp.declare_simd simdlen(16)
+ omp.declare_simd uniform(%this : !llvm.ptr, %b : !llvm.ptr)
+ %zero = llvm.mlir.constant(0 : i32) : i32
+ llvm.return %zero : i32
+ }
+
+ llvm.func @foo_tmpl(%b: !llvm.ptr, %c: !llvm.ptr) {
+ %c64 = llvm.mlir.constant(64 : i64) : i64
+ omp.declare_simd simdlen(64)
+ aligned(%b : !llvm.ptr -> 128 : i64)
+ linear(uval(%c : !llvm.ptr = %c64 : i64))
+ llvm.return
+ }
+
+ llvm.func @A_infunc(%this: !llvm.ptr, %a: i32) -> i32 {
+ %c8 = llvm.mlir.constant(8 : i32) : i32
+ omp.declare_simd linear(%a : i32 = %c8 : i32)
+ llvm.return %a : i32
+ }
+
+ // linear(a:4) -> a is ptr -> Linear; step=4, ptr rescale: 4*sizeof(float) = 4*4 -> stride=16
+ llvm.func @A_outfunc(%this: !llvm.ptr, %a: !llvm.ptr) -> f32 {
+ %c16 = llvm.mlir.constant(16 : i64) : i64
+ omp.declare_simd linear(%a : !llvm.ptr = %c16 : i64)
+ %zero = llvm.mlir.constant(0.0 : f32) : f32
+ llvm.return %zero : f32
+ }
+
+ llvm.func @bar(%v: !llvm.ptr, %a: !llvm.ptr) -> i32 {
+ omp.declare_simd
+ omp.declare_simd notinbranch aligned(%a : !llvm.ptr -> 32 : i64)
+ %zero = llvm.mlir.constant(0 : i32) : i32
+ llvm.return %zero : i32
+ }
+
+ llvm.func @baz(%v: !llvm.ptr, %a: !llvm.ptr) -> f32 {
+ omp.declare_simd
+ omp.declare_simd notinbranch aligned(%a : !llvm.ptr -> 16 : i64)
+ %zero = llvm.mlir.constant(0.0 : f32) : f32
+ llvm.return %zero : f32
+ }
+
+ llvm.func @bay(%v: !llvm.ptr, %a: !llvm.ptr) -> f64 {
+ omp.declare_simd
+ omp.declare_simd notinbranch aligned(%a : !llvm.ptr -> 16 : i64)
+ %zero = llvm.mlir.constant(0.0 : f64) : f64
+ llvm.return %zero : f64
+ }
+
+ llvm.func @bax(%v: !llvm.ptr, %a: !llvm.ptr, %b: !llvm.ptr) {
+ %b_val = llvm.load %b : !llvm.ptr -> i32
+ omp.declare_simd
+ omp.declare_simd inbranch
+ uniform(%v : !llvm.ptr, %b : !llvm.ptr)
+ linear(%a : !llvm.ptr = %b_val : i32)
+ llvm.return
+ }
+
+ llvm.func @foo_scalar(%q: !llvm.ptr, %x: f32, %k: i32) -> f32 {
+ %c1 = llvm.mlir.constant(1 : i32) : i32
+ omp.declare_simd uniform(%q : !llvm.ptr)
+ aligned(%q : !llvm.ptr -> 16 : i64)
+ linear(%k : i32 = %c1 : i32)
+ %zero = llvm.mlir.constant(0.0 : f32) : f32
+ llvm.return %zero : f32
+ }
+
+ llvm.func @foo_double(%x: f64) -> f64 {
+ omp.declare_simd notinbranch
+ llvm.return %x : f64
+ }
+
+ llvm.func @constlinear(%i: i32) -> f64 {
+ %c1 = llvm.mlir.constant(1 : i32) : i32
+ omp.declare_simd notinbranch linear(%i : i32 = %c1 : i32)
+ %zero = llvm.mlir.constant(0.0 : f64) : f64
+ llvm.return %zero : f64
+ }
+
+ llvm.func @One(%a: !llvm.ptr, %b: !llvm.ptr, %c: i32,
+ %d: !llvm.ptr, %e: !llvm.ptr, %f: i32) -> f64 {
+ %c2 = llvm.mlir.constant(2 : i64) : i64
+ %c16 = llvm.mlir.constant(16 : i64) : i64
+ %c8 = llvm.mlir.constant(8 : i32) : i32
+ %c1 = llvm.mlir.constant(1 : i64) : i64
+ %c4 = llvm.mlir.constant(4 : i64) : i64
+ %c1i = llvm.mlir.constant(1 : i32) : i32
+ omp.declare_simd simdlen(4)
+ linear(%a : !llvm.ptr = %c2 : i64,
+ %b : !llvm.ptr = %c16 : i64,
+ %c : i32 = %c8 : i32,
+ %d : !llvm.ptr = %c1 : i64,
+ %e : !llvm.ptr = %c4 : i64,
+ %f : i32 = %c1i : i32)
+ %zero = llvm.mlir.constant(0.0 : f64) : f64
+ llvm.return %zero : f64
+ }
+
+ llvm.func @Two(%a: !llvm.ptr, %b: !llvm.ptr, %c: i32,
+ %d: !llvm.ptr, %e: !llvm.ptr, %f: i32) -> f64 {
+ %c2 = llvm.mlir.constant(2 : i64) : i64
+ %c16 = llvm.mlir.constant(16 : i64) : i64
+ %c8 = llvm.mlir.constant(8 : i32) : i32
+ %c1 = llvm.mlir.constant(1 : i64) : i64
+ %c4 = llvm.mlir.constant(4 : i64) : i64
+ %c1i = llvm.mlir.constant(1 : i32) : i32
+ omp.declare_simd simdlen(4)
+ linear(val(%a : !llvm.ptr = %c2 : i64),
+ val(%b : !llvm.ptr = %c16 : i64),
+ val(%c : i32 = %c8 : i32),
+ val(%d : !llvm.ptr = %c1 : i64),
+ val(%e : !llvm.ptr = %c4 : i64),
+ val(%f : i32 = %c1i : i32))
+ %zero = llvm.mlir.constant(0.0 : f64) : f64
+ llvm.return %zero : f64
+ }
+
+ llvm.func @Three(%a: !llvm.ptr, %b: !llvm.ptr) -> f64 {
+ %c2 = llvm.mlir.constant(2 : i64) : i64
+ %c1 = llvm.mlir.constant(1 : i64) : i64
+ omp.declare_simd simdlen(4)
+ linear(uval(%a : !llvm.ptr = %c2 : i64),
+ uval(%b : !llvm.ptr = %c1 : i64))
+ %zero = llvm.mlir.constant(0.0 : f64) : f64
+ llvm.return %zero : f64
+ }
+
+ llvm.func @Four(%a: !llvm.ptr, %b: !llvm.ptr) -> f64 {
+ %c8 = llvm.mlir.constant(8 : i64) : i64
+ %c4 = llvm.mlir.constant(4 : i64) : i64
+ omp.declare_simd simdlen(4)
+ linear(ref(%a : !llvm.ptr = %c8 : i64),
+ ref(%b : !llvm.ptr = %c4 : i64))
+ %zero = llvm.mlir.constant(0.0 : f64) : f64
+ llvm.return %zero : f64
+ }
+
+ // ParamAttrs:
+ // a: uniform -> u
+ // b: linear(b:2) -> ptr, no modifier -> Linear, step=2 -> l2
+ // c: linear(c:a) -> ptr, no modifier -> Linear, var_stride=arg0 -> ls0
+ // d: linear(val(d):4) -> LinearVal, step=4 -> L4
+ // e: linear(val(e):a) -> LinearVal, var_stride=arg0 -> Ls0
+ // f: linear(uval(f):8) -> LinearUVal, step=8 -> U8
+ // g: linear(uval(g):a) -> LinearUVal, var_stride=arg0 -> Us0
+ // h: linear(ref(h):32) -> LinearRef, pre-rescaled stride=32 -> R32
+ // i: linear(ref(i):a) -> LinearRef, var_stride=arg0 -> Rs0
+ llvm.func @Five(%a: !llvm.ptr, %b: !llvm.ptr, %c: !llvm.ptr, %d: !llvm.ptr,
+ %e: !llvm.ptr, %f: !llvm.ptr, %g: !llvm.ptr,
+ %h: !llvm.ptr, %i: !llvm.ptr) -> f64 {
+ %c2 = llvm.mlir.constant(2 : i64) : i64
+ %c4 = llvm.mlir.constant(4 : i64) : i64
+ %c8 = llvm.mlir.constant(8 : i64) : i64
+ %c32 = llvm.mlir.constant(32 : i64) : i64
+ %a_val = llvm.load %a : !llvm.ptr -> i32
+ omp.declare_simd simdlen(4)
+ uniform(%a : !llvm.ptr)
+ linear(%b : !llvm.ptr = %c2 : i64,
+ %c : !llvm.ptr = %a_val : i32,
+ val(%d : !llvm.ptr = %c4 : i64),
+ val(%e : !llvm.ptr = %a_val : i32),
+ uval(%f : !llvm.ptr = %c8 : i64),
+ uval(%g : !llvm.ptr = %a_val : i32),
+ ref(%h : !llvm.ptr = %c32 : i64),
+ ref(%i : !llvm.ptr = %a_val : i32))
+ %zero = llvm.mlir.constant(0.0 : f64) : f64
+ llvm.return %zero : f64
+ }
+
+ // a: i32, linear(-2) -> l, step=-2 -> ln2
+ // b: ptr, linear(-32) -> l, step=-32 -> ln32
+ // c: ptr, uval(-4) -> U, step=-4 -> Un4
+ // d: ptr, ref(-128) -> R, step=-128 -> Rn128
+ // e: i8, linear(-1) -> l, step=-1 -> ln1
+ // f: ptr, linear(-1) -> l, step=-1 -> ln1
+ // g: i16, linear(0) -> l, step=0 -> l0
+ llvm.func @Six(%a: i32, %b: !llvm.ptr, %c: !llvm.ptr, %d: !llvm.ptr,
+ %e: i8, %f: !llvm.ptr, %g: i16) -> f64 {
+ %cn2 = llvm.mlir.constant(-2 : i32) : i32
+ %cn32 = llvm.mlir.constant(-32 : i64) : i64
+ %cn4 = llvm.mlir.constant(-4 : i64) : i64
+ %cn128 = llvm.mlir.constant(-128 : i64) : i64
+ %cn1i = llvm.mlir.constant(-1 : i32) : i32
+ %cn1 = llvm.mlir.constant(-1 : i64) : i64
+ %c0 = llvm.mlir.constant(0 : i32) : i32
+ omp.declare_simd simdlen(4)
+ linear(%a : i32 = %cn2 : i32,
+ %b : !llvm.ptr = %cn32 : i64,
+ uval(%c : !llvm.ptr = %cn4 : i64),
+ ref(%d : !llvm.ptr = %cn128 : i64),
+ %e : i8 = %cn1i : i32,
+ %f : !llvm.ptr = %cn1 : i64,
+ %g : i16 = %c0 : i32)
+ %zero = llvm.mlir.constant(0.0 : f64) : f64
+ llvm.return %zero : f64
+ }
+}
+
+// --- add_1: three declare_simd ops ---
+//
+// linear(d:32): CDT=int(32), VLEN=reg/32
+// b: VLEN=128/32=4, c: VLEN=256/32=8, d: VLEN=256/32=8, e: VLEN=512/32=16
+// Ptr linear -> Linear -> l32
+// inbranch simdlen(32): masked (M) variants only
+// notinbranch: unmasked (N) variants only; CDT=int(32), VLEN varies by ISA
+//
+// CHECK-DAG: "_ZGVbM4l32_add_1"
+// CHECK-DAG: "_ZGVbN4l32_add_1"
+// CHECK-DAG: "_ZGVcM8l32_add_1"
+// CHECK-DAG: "_ZGVcN8l32_add_1"
+// CHECK-DAG: "_ZGVdM8l32_add_1"
+// CHECK-DAG: "_ZGVdN8l32_add_1"
+// CHECK-DAG: "_ZGVeM16l32_add_1"
+// CHECK-DAG: "_ZGVeN16l32_add_1"
+
+// CHECK-DAG: "_ZGVbM32v_add_1"
+// CHECK-DAG: "_ZGVcM32v_add_1"
+// CHECK-DAG: "_ZGVdM32v_add_1"
+// CHECK-DAG: "_ZGVeM32v_add_1"
+
+// CHECK-DAG: "_ZGVbN2v_add_1"
+// CHECK-DAG: "_ZGVcN4v_add_1"
+// CHECK-DAG: "_ZGVdN4v_add_1"
+// CHECK-DAG: "_ZGVeN8v_add_1"
+
+// --- h_int ---
+// aligned(hp:16, hp2:16), CDT=ptr(64-bit) -> VLEN: b=2,c=4,d=4,e=8
+// CHECK-DAG: "_ZGVbM2va16va16vv_h_int"
+// CHECK-DAG: "_ZGVbN2va16va16vv_h_int"
+// CHECK-DAG: "_ZGVcM4va16va16vv_h_int"
+// CHECK-DAG: "_ZGVcN4va16va16vv_h_int"
+// CHECK-DAG: "_ZGVdM4va16va16vv_h_int"
+// CHECK-DAG: "_ZGVdN4va16va16vv_h_int"
+// CHECK-DAG: "_ZGVeM8va16va16vv_h_int"
+// CHECK-DAG: "_ZGVeN8va16va16vv_h_int"
+
+// --- h_float ---
+// aligned(hp:16, hp2:16), CDT=ptr(64-bit) -> VLEN: b=2,c=4,d=4,e=8
+// CHECK-DAG: "_ZGVbM2va16va16vv_h_float"
+// CHECK-DAG: "_ZGVbN2va16va16vv_h_float"
+// CHECK-DAG: "_ZGVcM4va16va16vv_h_float"
+// CHECK-DAG: "_ZGVcN4va16va16vv_h_float"
+// CHECK-DAG: "_ZGVdM4va16va16vv_h_float"
+// CHECK-DAG: "_ZGVdN4va16va16vv_h_float"
+// CHECK-DAG: "_ZGVeM8va16va16vv_h_float"
+// CHECK-DAG: "_ZGVeN8va16va16vv_h_float"
+
+// --- VV_add: uniform(this,a), linear(val(b):var_stride=a=arg1) ---
+// val on i32 (non-pointer) -> Linear (l)
+// CDT = return i32 (32-bit) -> VLEN: b=4, c=8, d=8, e=16
+// CHECK-DAG: "_ZGVbM4uuls1_VV_add"
+// CHECK-DAG: "_ZGVbN4uuls1_VV_add"
+// CHECK-DAG: "_ZGVcM8uuls1_VV_add"
+// CHECK-DAG: "_ZGVcN8uuls1_VV_add"
+// CHECK-DAG: "_ZGVdM8uuls1_VV_add"
+// CHECK-DAG: "_ZGVdN8uuls1_VV_add"
+// CHECK-DAG: "_ZGVeM16uuls1_VV_add"
+// CHECK-DAG: "_ZGVeN16uuls1_VV_add"
+
+// --- VV_taddpf ---
+// linear(this) -> this is ptr -> Linear; step=1, ptr rescale: 1*sizeof(VV)=40 -> stride=40
+// linear(a) -> a is ptr -> Linear; step=1, ptr rescale: 1*sizeof(float)=4 -> stride=4
+// linear(ref(b):4) -> LinearRef; step=4, ptr rescale: 4*sizeof(float*) = 4*8 = 32 -> stride=32
+// aligned(a) -> default=16; aligned(b:4) -> 4
+// CDT = return f32 (32-bit) -> VLEN: b=4, c=8, d=8, e=16
+// CHECK-DAG: "_ZGVbM4l40l4a16R32a4_VV_taddpf"
+// CHECK-DAG: "_ZGVbN4l40l4a16R32a4_VV_taddpf"
+// CHECK-DAG: "_ZGVcM8l40l4a16R32a4_VV_taddpf"
+// CHECK-DAG: "_ZGVcN8l40l4a16R32a4_VV_taddpf"
+// CHECK-DAG: "_ZGVdM8l40l4a16R32a4_VV_taddpf"
+// CHECK-DAG: "_ZGVdN8l40l4a16R32a4_VV_taddpf"
+// CHECK-DAG: "_ZGVeM16l40l4a16R32a4_VV_taddpf"
+// CHECK-DAG: "_ZGVeN16l40l4a16R32a4_VV_taddpf"
+
+// --- VV_tadd: ---
+// linear(uval(c):8) -> v v U8
+// aligned(b:8) -> v va8 v
+// CDT = return i32 -> VLEN: b=4, c=8, d=8, e=16
+// CHECK-DAG: "_ZGVbM4vvU8_VV_tadd"
+// CHECK-DAG: "_ZGVbN4vvU8_VV_tadd"
+// CHECK-DAG: "_ZGVcM8vvU8_VV_tadd"
+// CHECK-DAG: "_ZGVcN8vvU8_VV_tadd"
+// CHECK-DAG: "_ZGVdM8vvU8_VV_tadd"
+// CHECK-DAG: "_ZGVdN8vvU8_VV_tadd"
+// CHECK-DAG: "_ZGVeM16vvU8_VV_tadd"
+// CHECK-DAG: "_ZGVeN16vvU8_VV_tadd"
+
+// CHECK-DAG: "_ZGVbM4vva8v_VV_tadd"
+// CHECK-DAG: "_ZGVbN4vva8v_VV_tadd"
+// CHECK-DAG: "_ZGVcM8vva8v_VV_tadd"
+// CHECK-DAG: "_ZGVcN8vva8v_VV_tadd"
+// CHECK-DAG: "_ZGVdM8vva8v_VV_tadd"
+// CHECK-DAG: "_ZGVdN8vva8v_VV_tadd"
+// CHECK-DAG: "_ZGVeM16vva8v_VV_tadd"
+// CHECK-DAG: "_ZGVeN16vva8v_VV_tadd"
+
+// --- TVV_taddpf ---
+// aligned(a:32), aligned(b:16), ref(b:128)
+// CDT = return f32 -> VLEN: b=4, c=8, d=8, e=16
+// CHECK-DAG: "_ZGVbM4vva32R128a16_TVV_taddpf"
+// CHECK-DAG: "_ZGVbN4vva32R128a16_TVV_taddpf"
+// CHECK-DAG: "_ZGVcM8vva32R128a16_TVV_taddpf"
+// CHECK-DAG: "_ZGVcN8vva32R128a16_TVV_taddpf"
+// CHECK-DAG: "_ZGVdM8vva32R128a16_TVV_taddpf"
+// CHECK-DAG: "_ZGVdN8vva32R128a16_TVV_taddpf"
+// CHECK-DAG: "_ZGVeM16vva32R128a16_TVV_taddpf"
+// CHECK-DAG: "_ZGVeN16vva32R128a16_TVV_taddpf"
+
+// --- TVV_tadd: two declare_simd ---
+// uniform(this, b) -> uu; CDT = return i32 -> VLEN: b=4, c=8, d=8, e=16
+// simdlen(16) -> VLEN=16 for all ISAs; all params vector -> vv
+// CHECK-DAG: "_ZGVbM4uu_TVV_tadd"
+// CHECK-DAG: "_ZGVbN4uu_TVV_tadd"
+// CHECK-DAG: "_ZGVcM8uu_TVV_tadd"
+// CHECK-DAG: "_ZGVcN8uu_TVV_tadd"
+// CHECK-DAG: "_ZGVdM8uu_TVV_tadd"
+// CHECK-DAG: "_ZGVdN8uu_TVV_tadd"
+// CHECK-DAG: "_ZGVeM16uu_TVV_tadd"
+// CHECK-DAG: "_ZGVeN16uu_TVV_tadd"
+
+// CHECK-DAG: "_ZGVbM16vv_TVV_tadd"
+// CHECK-DAG: "_ZGVbN16vv_TVV_tadd"
+// CHECK-DAG: "_ZGVcM16vv_TVV_tadd"
+// CHECK-DAG: "_ZGVcN16vv_TVV_tadd"
+// CHECK-DAG: "_ZGVdM16vv_TVV_tadd"
+// CHECK-DAG: "_ZGVdN16vv_TVV_tadd"
+// CHECK-DAG: "_ZGVeM16vv_TVV_tadd"
+// CHECK-DAG: "_ZGVeN16vv_TVV_tadd"
+
+// --- foo_tmpl: ---
+// simdlen(64), aligned(b:128), uval(c:64)
+// CHECK-DAG: "_ZGVbM64va128U64_foo_tmpl"
+// CHECK-DAG: "_ZGVbN64va128U64_foo_tmpl"
+// CHECK-DAG: "_ZGVcM64va128U64_foo_tmpl"
+// CHECK-DAG: "_ZGVcN64va128U64_foo_tmpl"
+// CHECK-DAG: "_ZGVdM64va128U64_foo_tmpl"
+// CHECK-DAG: "_ZGVdN64va128U64_foo_tmpl"
+// CHECK-DAG: "_ZGVeM64va128U64_foo_tmpl"
+// CHECK-DAG: "_ZGVeN64va128U64_foo_tmpl"
+
+// --- A_infunc: ---
+// linear(a:8), a is i32 -> Linear
+// CDT = return i32 -> VLEN: b=4, c=8, d=8, e=16
+// CHECK-DAG: "_ZGVbM4vl8_A_infunc"
+// CHECK-DAG: "_ZGVbN4vl8_A_infunc"
+// CHECK-DAG: "_ZGVcM8vl8_A_infunc"
+// CHECK-DAG: "_ZGVcN8vl8_A_infunc"
+// CHECK-DAG: "_ZGVdM8vl8_A_infunc"
+// CHECK-DAG: "_ZGVdN8vl8_A_infunc"
+// CHECK-DAG: "_ZGVeM16vl8_A_infunc"
+// CHECK-DAG: "_ZGVeN16vl8_A_infunc"
+
+// --- A_outfunc: ---
+// linear(a:16), a is ptr -> Linear
+// CDT = return f32 -> VLEN: b=4, c=8, d=8, e=16
+// CHECK-DAG: "_ZGVbM4vl16_A_outfunc"
+// CHECK-DAG: "_ZGVbN4vl16_A_outfunc"
+// CHECK-DAG: "_ZGVcM8vl16_A_outfunc"
+// CHECK-DAG: "_ZGVcN8vl16_A_outfunc"
+// CHECK-DAG: "_ZGVdM8vl16_A_outfunc"
+// CHECK-DAG: "_ZGVdN8vl16_A_outfunc"
+// CHECK-DAG: "_ZGVeM16vl16_A_outfunc"
+// CHECK-DAG: "_ZGVeN16vl16_A_outfunc"
+
+// --- bar: two declare_simd ---
+// all vector -> vv, CDT=return i32, VLEN: b=4,c=8,d=8,e=16
+// CHECK-DAG: "_ZGVbM4vv_bar"
+// CHECK-DAG: "_ZGVbN4vv_bar"
+// CHECK-DAG: "_ZGVcM8vv_bar"
+// CHECK-DAG: "_ZGVcN8vv_bar"
+// CHECK-DAG: "_ZGVdM8vv_bar"
+// CHECK-DAG: "_ZGVdN8vv_bar"
+// CHECK-DAG: "_ZGVeM16vv_bar"
+// CHECK-DAG: "_ZGVeN16vv_bar"
+
+// notinbranch, aligned(a:32)
+// CHECK-DAG: "_ZGVbN4vva32_bar"
+// CHECK-DAG: "_ZGVcN8vva32_bar"
+// CHECK-DAG: "_ZGVdN8vva32_bar"
+// CHECK-DAG: "_ZGVeN16vva32_bar"
+
+// --- baz: ---
+// CDT=return f32 -> VLEN: b=4,c=8,d=8,e=16
+// CHECK-DAG: "_ZGVbM4vv_baz"
+// CHECK-DAG: "_ZGVbN4vv_baz"
+// CHECK-DAG: "_ZGVcM8vv_baz"
+// CHECK-DAG: "_ZGVcN8vv_baz"
+// CHECK-DAG: "_ZGVdM8vv_baz"
+// CHECK-DAG: "_ZGVdN8vv_baz"
+// CHECK-DAG: "_ZGVeM16vv_baz"
+// CHECK-DAG: "_ZGVeN16vv_baz"
+// CHECK-DAG: "_ZGVbN4vva16_baz"
+// CHECK-DAG: "_ZGVcN8vva16_baz"
+// CHECK-DAG: "_ZGVdN8vva16_baz"
+// CHECK-DAG: "_ZGVeN16vva16_baz"
+
+// --- bay: ---
+// CDT=f64(64-bit) -> VLEN: b=2,c=4,d=4,e=8
+// CHECK-DAG: "_ZGVbM2vv_bay"
+// CHECK-DAG: "_ZGVbN2vv_bay"
+// CHECK-DAG: "_ZGVcM4vv_bay"
+// CHECK-DAG: "_ZGVcN4vv_bay"
+// CHECK-DAG: "_ZGVdM4vv_bay"
+// CHECK-DAG: "_ZGVdN4vv_bay"
+// CHECK-DAG: "_ZGVeM8vv_bay"
+// CHECK-DAG: "_ZGVeN8vv_bay"
+// CHECK-DAG: "_ZGVbN2vva16_bay"
+// CHECK-DAG: "_ZGVcN4vva16_bay"
+// CHECK-DAG: "_ZGVdN4vva16_bay"
+// CHECK-DAG: "_ZGVeN8vva16_bay"
+
+// --- bax: two declare_simd ---
+// first: all vector -> vvv, CDT=ptr(64-bit) -> VLEN: b=2,c=4,d=4,e=8
+// second (uls2u below): no vector params -> CDT=int(32) -> VLEN: b=4,c=8,d=8,e=16
+// CHECK-DAG: "_ZGVbM2vvv_bax"
+// CHECK-DAG: "_ZGVbN2vvv_bax"
+// CHECK-DAG: "_ZGVcM4vvv_bax"
+// CHECK-DAG: "_ZGVcN4vvv_bax"
+// CHECK-DAG: "_ZGVdM4vvv_bax"
+// CHECK-DAG: "_ZGVdN4vvv_bax"
+// CHECK-DAG: "_ZGVeM8vvv_bax"
+// CHECK-DAG: "_ZGVeN8vvv_bax"
+
+// inbranch, uniform(v,b), linear(a:b) -> a is ptr -> Linear var-stride
+// ParamAttrs: u ls2 u -> uls2u
+// CHECK-DAG: "_ZGVbM4uls2u_bax"
+// CHECK-DAG: "_ZGVcM8uls2u_bax"
+// CHECK-DAG: "_ZGVdM8uls2u_bax"
+// CHECK-DAG: "_ZGVeM16uls2u_bax"
+
+// --- foo_scalar: ---
+// ParamAttrs: [0]=uniform+aligned(16), [1]=vector, [2]=linear(k:1)
+// k is i32 -> Linear (non-ptr), step=1
+// uniform(q)+aligned(q:16) + linear(k:1)
+// CDT = return f32 -> VLEN: b=4,c=8,d=8,e=16
+// CHECK-DAG: "_ZGVbM4ua16vl_foo_scalar"
+// CHECK-DAG: "_ZGVbN4ua16vl_foo_scalar"
+// CHECK-DAG: "_ZGVcM8ua16vl_foo_scalar"
+// CHECK-DAG: "_ZGVcN8ua16vl_foo_scalar"
+// CHECK-DAG: "_ZGVdM8ua16vl_foo_scalar"
+// CHECK-DAG: "_ZGVdN8ua16vl_foo_scalar"
+// CHECK-DAG: "_ZGVeM16ua16vl_foo_scalar"
+// CHECK-DAG: "_ZGVeN16ua16vl_foo_scalar"
+
+// --- foo_double: ---
+// CDT = f64 (64-bit) -> VLEN: b=2,c=4,d=4,e=8
+// CHECK-DAG: "_ZGVbN2v_foo_double"
+// CHECK-DAG: "_ZGVcN4v_foo_double"
+// CHECK-DAG: "_ZGVdN4v_foo_double"
+// CHECK-DAG: "_ZGVeN8v_foo_double"
+
+// --- constlinear: notinbranch ---
+// linear(i:1), i is i32 -> Linear
+// CDT = f64 (64-bit) -> VLEN: b=2,c=4,d=4,e=8
+// CHECK-DAG: "_ZGVbN2l_constlinear"
+// CHECK-DAG: "_ZGVcN4l_constlinear"
+// CHECK-DAG: "_ZGVdN4l_constlinear"
+// CHECK-DAG: "_ZGVeN8l_constlinear"
+
+// --- One: ---
+// linear() without modifier, simdlen(4)
+// ptr->Linear(l), i32->Linear(l)
+// CHECK-DAG: "_ZGVbM4l2l16l8ll4l_One"
+// CHECK-DAG: "_ZGVbN4l2l16l8ll4l_One"
+// CHECK-DAG: "_ZGVcM4l2l16l8ll4l_One"
+// CHECK-DAG: "_ZGVcN4l2l16l8ll4l_One"
+// CHECK-DAG: "_ZGVdM4l2l16l8ll4l_One"
+// CHECK-DAG: "_ZGVdN4l2l16l8ll4l_One"
+// CHECK-DAG: "_ZGVeM4l2l16l8ll4l_One"
+// CHECK-DAG: "_ZGVeN4l2l16l8ll4l_One"
+
+// --- Two: ---
+// linear(val), simdlen(4)
+// val on !llvm.ptr -> L, val on i32 -> l
+// CHECK-DAG: "_ZGVbM4L2L16l8LL4l_Two"
+// CHECK-DAG: "_ZGVbN4L2L16l8LL4l_Two"
+// CHECK-DAG: "_ZGVcM4L2L16l8LL4l_Two"
+// CHECK-DAG: "_ZGVcN4L2L16l8LL4l_Two"
+// CHECK-DAG: "_ZGVdM4L2L16l8LL4l_Two"
+// CHECK-DAG: "_ZGVdN4L2L16l8LL4l_Two"
+// CHECK-DAG: "_ZGVeM4L2L16l8LL4l_Two"
+// CHECK-DAG: "_ZGVeN4L2L16l8LL4l_Two"
+
+// --- Three: ---
+// uval: U2 U, simdlen(4)
+// CHECK-DAG: "_ZGVbM4U2U_Three"
+// CHECK-DAG: "_ZGVbN4U2U_Three"
+// CHECK-DAG: "_ZGVcM4U2U_Three"
+// CHECK-DAG: "_ZGVcN4U2U_Three"
+// CHECK-DAG: "_ZGVdM4U2U_Three"
+// CHECK-DAG: "_ZGVdN4U2U_Three"
+// CHECK-DAG: "_ZGVeM4U2U_Three"
+// CHECK-DAG: "_ZGVeN4U2U_Three"
+
+// --- Four: ---
+// ref, simdlen(4)
+// CHECK-DAG: "_ZGVbM4R8R4_Four"
+// CHECK-DAG: "_ZGVbN4R8R4_Four"
+// CHECK-DAG: "_ZGVcM4R8R4_Four"
+// CHECK-DAG: "_ZGVcN4R8R4_Four"
+// CHECK-DAG: "_ZGVdM4R8R4_Four"
+// CHECK-DAG: "_ZGVdN4R8R4_Four"
+// CHECK-DAG: "_ZGVeM4R8R4_Four"
+// CHECK-DAG: "_ZGVeN4R8R4_Four"
+
+// --- Five: ---
+// all modifiers + var stride, simdlen(4)
+// u l2 ls0 L4 Ls0 U8 Us0 R32 Rs0
+// CHECK-DAG: "_ZGVbM4ul2ls0L4Ls0U8Us0R32Rs0_Five"
+// CHECK-DAG: "_ZGVbN4ul2ls0L4Ls0U8Us0R32Rs0_Five"
+// CHECK-DAG: "_ZGVcM4ul2ls0L4Ls0U8Us0R32Rs0_Five"
+// CHECK-DAG: "_ZGVcN4ul2ls0L4Ls0U8Us0R32Rs0_Five"
+// CHECK-DAG: "_ZGVdM4ul2ls0L4Ls0U8Us0R32Rs0_Five"
+// CHECK-DAG: "_ZGVdN4ul2ls0L4Ls0U8Us0R32Rs0_Five"
+// CHECK-DAG: "_ZGVeM4ul2ls0L4Ls0U8Us0R32Rs0_Five"
+// CHECK-DAG: "_ZGVeN4ul2ls0L4Ls0U8Us0R32Rs0_Five"
+
+// --- Six: ---
+// negative strides, simdlen(4)
+// ln2 ln32 Un4 Rn128 ln1 ln1 l0
+// CHECK-DAG: "_ZGVbM4ln2ln32Un4Rn128ln1ln1l0_Six"
+// CHECK-DAG: "_ZGVbN4ln2ln32Un4Rn128ln1ln1l0_Six"
+// CHECK-DAG: "_ZGVcM4ln2ln32Un4Rn128ln1ln1l0_Six"
+// CHECK-DAG: "_ZGVcN4ln2ln32Un4Rn128ln1ln1l0_Six"
+// CHECK-DAG: "_ZGVdM4ln2ln32Un4Rn128ln1ln1l0_Six"
+// CHECK-DAG: "_ZGVdN4ln2ln32Un4Rn128ln1ln1l0_Six"
+// CHECK-DAG: "_ZGVeM4ln2ln32Un4Rn128ln1ln1l0_Six"
+// CHECK-DAG: "_ZGVeN4ln2ln32Un4Rn128ln1ln1l0_Six"
From 0adf4ef6db4a13586df82b4db0645fc8d0b76efc Mon Sep 17 00:00:00 2001
From: "Chi Chun, Chen" <chichun.chen at hpe.com>
Date: Fri, 20 Mar 2026 13:39:19 -0500
Subject: [PATCH 2/2] Fix declare simd linear stride rescaling and arg_types
verifier
1. Rescale constant linear steps from source-level element counts to byte
strides in Flang's processLinear(). For reference-like parameters
(pointers or non-VALUE dummy arguments) with Linear or LinearRef ABI
kind, the step must be multiplied by the element size in bytes. This
matches Clang's rescaling in CGOpenMPRuntime.cpp. Val and UVal kinds
are not rescaled as they describe value changes, not pointer strides.
Var-strides are also not rescaled as the value is an argument index.
2. Add a verifier check in DeclareSimdOp to ensure 'arg_types' length
matches the number of function arguments, preventing out-of-bounds
access during MLIR-to-LLVM IR translation.
Also restructure processLinear() to compute stepOperand per-variable
instead of appending the same operand for all objects in the clause,
enabling per-variable rescaling.
Assisted with Copilot.
---
flang/lib/Lower/OpenMP/ClauseProcessor.cpp | 55 +++++++++++++++++--
flang/lib/Lower/OpenMP/OpenMP.cpp | 4 +-
flang/test/Lower/OpenMP/declare-simd.f90 | 14 ++---
mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td | 2 +-
mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp | 7 +++
.../OpenMP/OpenMPToLLVMIRTranslation.cpp | 34 ++++++------
mlir/test/Dialect/OpenMP/invalid.mlir | 8 +++
mlir/test/Dialect/OpenMP/ops.mlir | 4 +-
8 files changed, 93 insertions(+), 35 deletions(-)
diff --git a/flang/lib/Lower/OpenMP/ClauseProcessor.cpp b/flang/lib/Lower/OpenMP/ClauseProcessor.cpp
index c4486b8719b70..93abb10e6ba24 100644
--- a/flang/lib/Lower/OpenMP/ClauseProcessor.cpp
+++ b/flang/lib/Lower/OpenMP/ClauseProcessor.cpp
@@ -1520,6 +1520,9 @@ bool ClauseProcessor::processLinear(mlir::omp::LinearClauseOps &result,
}
}
+ fir::FirOpBuilder &firOpBuilder = converter.getFirOpBuilder();
+ mlir::Location currentLocation = converter.getCurrentLocation();
+
for (const omp::Object &object : objects) {
semantics::Symbol *sym = object.sym();
const mlir::Value variable = converter.getSymbolAddress(*sym);
@@ -1527,20 +1530,17 @@ bool ClauseProcessor::processLinear(mlir::omp::LinearClauseOps &result,
mlir::Type ty = converter.genType(*sym);
typeAttrs.push_back(mlir::TypeAttr::get(ty));
+ mlir::Value stepOperand;
if (auto &mod =
std::get<std::optional<omp::clause::Linear::StepComplexModifier>>(
clause.t)) {
- mlir::Value operand =
+ stepOperand =
fir::getBase(converter.genExprValue(toEvExpr(*mod), stmtCtx));
- result.linearStepVars.append(objects.size(), operand);
} else {
// If nothing is present, add the default step of 1.
- fir::FirOpBuilder &firOpBuilder = converter.getFirOpBuilder();
- mlir::Location currentLocation = converter.getCurrentLocation();
mlir::Type integerTy = ty.isInteger() ? ty : firOpBuilder.getI32Type();
- mlir::Value operand =
+ stepOperand =
firOpBuilder.createIntegerConstant(currentLocation, integerTy, 1);
- result.linearStepVars.append(objects.size(), operand);
}
// Determine the linear modifier:
@@ -1569,6 +1569,49 @@ bool ClauseProcessor::processLinear(mlir::omp::LinearClauseOps &result,
linearMod = isDeclareSimd ? getDeclareSimdDefaultMod(*sym)
: mlir::omp::LinearModifier::val;
+ // For declare simd, rescale constant linear steps from source-level
+ // element counts to byte strides for parameters that will use Linear
+ // or LinearRef ABI kind. This matches Clang's rescaling behavior in
+ // CGOpenMPRuntime.cpp. Val and UVal kinds are not rescaled because
+ // they describe value changes, not pointer strides.
+ if (isDeclareSimd) {
+ bool isRefLike = false;
+ const auto &ultimate = sym->GetUltimate();
+ if (semantics::IsPointer(ultimate))
+ isRefLike = true;
+ else if (const auto *obj =
+ ultimate
+ .detailsIf<Fortran::semantics::ObjectEntityDetails>())
+ if (obj->isDummy() && !semantics::IsValue(ultimate))
+ isRefLike = true;
+
+ // Rescale for ref modifier or no modifier on a reference-like param.
+ // val and uval are not rescaled.
+ bool needsRescale =
+ isRefLike &&
+ (!linearMod || *linearMod == mlir::omp::LinearModifier::ref);
+ if (needsRescale) {
+ mlir::Type elemTy = fir::getFortranElementType(ty);
+ if (elemTy.isIntOrFloat()) {
+ unsigned elemSizeBytes = elemTy.getIntOrFloatBitWidth() / 8;
+ if (elemSizeBytes > 1) {
+ // Only rescale constant strides.
+ if (auto cstOp =
+ stepOperand.getDefiningOp<mlir::arith::ConstantOp>()) {
+ if (auto intAttr =
+ mlir::dyn_cast<mlir::IntegerAttr>(cstOp.getValue())) {
+ int64_t rescaled = intAttr.getInt() * elemSizeBytes;
+ stepOperand = firOpBuilder.createIntegerConstant(
+ currentLocation, stepOperand.getType(), rescaled);
+ }
+ }
+ }
+ }
+ }
+ }
+
+ result.linearStepVars.push_back(stepOperand);
+
if (linearMod)
linearModAttrs.push_back(mlir::omp::LinearModifierAttr::get(
&converter.getMLIRContext(), *linearMod));
diff --git a/flang/lib/Lower/OpenMP/OpenMP.cpp b/flang/lib/Lower/OpenMP/OpenMP.cpp
index d0a6725a6ab8e..571fb36f00fb0 100644
--- a/flang/lib/Lower/OpenMP/OpenMP.cpp
+++ b/flang/lib/Lower/OpenMP/OpenMP.cpp
@@ -4047,8 +4047,8 @@ genOMP(lower::AbstractConverter &converter, lower::SymMap &symTable,
converter.getFirOpBuilder(), loc, clauseOps);
// Record the scalar element types of all function arguments so that
- // the OpenMPToLLVMIRTranslation can recover pointee-type information
- // lost in opaque pointers for correct LS / NDS / WDS computation.
+ // OpenMPToLLVMIRTranslation can recover pointee-type information lost
+ // in opaque pointers for correct LS / NDS / WDS computation.
// We strip FIR wrappers (box, heap, ref, array) to get the plain scalar
// type (e.g. i32, f64) that survives FIR-to-LLVM type conversion unchanged.
if (auto *owningProc = eval.getOwningProcedure();
diff --git a/flang/test/Lower/OpenMP/declare-simd.f90 b/flang/test/Lower/OpenMP/declare-simd.f90
index 7621fe1a0cd76..24b92c9d05eda 100644
--- a/flang/test/Lower/OpenMP/declare-simd.f90
+++ b/flang/test/Lower/OpenMP/declare-simd.f90
@@ -57,8 +57,8 @@ end subroutine declare_simd_linear
! CHECK-LABEL: func.func @_QPdeclare_simd_linear(
! CHECK: %[[SCOPE:.*]] = fir.dummy_scope : !fir.dscope
! CHECK: %[[I:.*]]:2 = hlfir.declare %{{.*}} dummy_scope %[[SCOPE]] arg 4 {{.*}} : (!fir.ref<i32>, !fir.dscope) -> (!fir.ref<i32>, !fir.ref<i32>)
-! CHECK: %[[C1:.*]] = arith.constant 1 : i32
-! CHECK: omp.declare_simd linear(ref(%[[I]]#0 : !fir.ref<i32> = %[[C1]] : i32))
+! CHECK: %[[C4:.*]] = arith.constant 4 : i32
+! CHECK: omp.declare_simd linear(ref(%[[I]]#0 : !fir.ref<i32> = %[[C4]] : i32))
! CHECK-SAME: {arg_types = [f64, f64, i32, i32], linear_var_types = [i32]}{{$}}
! CHECK: return
@@ -103,7 +103,7 @@ end subroutine declare_simd_uniform
! CHECK: %[[XDECL:.*]]:2 = hlfir.declare %{{.*}} dummy_scope %[[SCOPE]] arg 1
! CHECK: %[[YDECL:.*]]:2 = hlfir.declare %{{.*}} dummy_scope %[[SCOPE]] arg 2
! CHECK: omp.declare_simd uniform(%[[XDECL]]#0 : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xf64>>>>, %[[YDECL]]#0 : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xf64>>>>)
-! CHECK-SAME: {arg_types = [f64, f64, i32, i32]}
+! CHECK-SAME: {arg_types = [f64, f64, i32, i32]}{{$}}
! CHECK: return
subroutine declare_simd_inbranch()
@@ -155,13 +155,13 @@ end subroutine declare_simd_combined
! CHECK: %[[N_DECL:.*]]:2 = hlfir.declare %{{.*}} dummy_scope %[[SCOPE]] arg 3 {{.*}} : (!fir.ref<i32>, !fir.dscope) -> (!fir.ref<i32>, !fir.ref<i32>)
! CHECK: %[[X_DECL:.*]]:2 = hlfir.declare %{{.*}} dummy_scope %[[SCOPE]] arg 1 {{.*}} : (!fir.ref<!fir.box<!fir.ptr<!fir.array<?xf64>>>>, !fir.dscope) -> (!fir.ref<!fir.box<!fir.ptr<!fir.array<?xf64>>>>, !fir.ref<!fir.box<!fir.ptr<!fir.array<?xf64>>>>)
! CHECK: %[[Y_DECL:.*]]:2 = hlfir.declare %{{.*}} dummy_scope %[[SCOPE]] arg 2 {{.*}} : (!fir.ref<!fir.box<!fir.ptr<!fir.array<?xf64>>>>, !fir.dscope) -> (!fir.ref<!fir.box<!fir.ptr<!fir.array<?xf64>>>>, !fir.ref<!fir.box<!fir.ptr<!fir.array<?xf64>>>>)
-! CHECK: %[[C1:.*]] = arith.constant 1 : i32
+! CHECK: %[[C4:.*]] = arith.constant 4 : i32
! CHECK: omp.declare_simd
! CHECK-SAME: aligned(%[[X_DECL]]#0 : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xf64>>>> -> 64 : i64,
! CHECK-SAME: %[[Y_DECL]]#0 : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xf64>>>> -> 64 : i64)
! CHECK-SAME: inbranch
-! CHECK-SAME: linear(ref(%[[I_DECL]]#0 : !fir.ref<i32> = %[[C1]] : i32))
+! CHECK-SAME: linear(ref(%[[I_DECL]]#0 : !fir.ref<i32> = %[[C4]] : i32))
! CHECK-SAME: simdlen(8)
! CHECK-SAME: uniform(%[[X_DECL]]#0 : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xf64>>>>,
! CHECK-SAME: %[[Y_DECL]]#0 : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xf64>>>>)
@@ -199,8 +199,8 @@ end subroutine declare_simd_linear_ref
! CHECK-LABEL: func.func @_QPdeclare_simd_linear_ref(
! CHECK: %[[SCOPE:.*]] = fir.dummy_scope : !fir.dscope
! CHECK: %[[X:.*]]:2 = hlfir.declare %{{.*}} dummy_scope %[[SCOPE]] arg 1 {{.*}} : (!fir.ref<!fir.box<!fir.heap<i32>>>, !fir.dscope) -> (!fir.ref<!fir.box<!fir.heap<i32>>>, !fir.ref<!fir.box<!fir.heap<i32>>>)
-! CHECK: %[[C4:.*]] = arith.constant 4 : i32
-! CHECK: omp.declare_simd linear(ref(%[[X]]#0 : !fir.ref<!fir.box<!fir.heap<i32>>> = %[[C4]] : i32))
+! CHECK: %[[C16:.*]] = arith.constant 16 : i32
+! CHECK: omp.declare_simd linear(ref(%[[X]]#0 : !fir.ref<!fir.box<!fir.heap<i32>>> = %[[C16]] : i32))
! CHECK-SAME: {arg_types = [i32], linear_var_types = [!fir.box<!fir.heap<i32>>]}{{$}}
! CHECK: return
diff --git a/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td b/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
index dc5255aaab47e..b33438d492410 100644
--- a/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
+++ b/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
@@ -2286,7 +2286,7 @@ def DeclareSimdOp
function.
The optional `arg_types` attribute records the original language-level
- types of the enclosing function's arguments. This is used during
+ types of the enclosing function's arguments. This is used during
translation to recover pointee-type information lost in opaque
`!llvm.ptr`, enabling correct lane-size (LS) computation for the
AArch64 AAVFABI.
diff --git a/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp b/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
index b07220dffcecd..ca79b12b68269 100644
--- a/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
+++ b/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
@@ -4707,6 +4707,13 @@ LogicalResult DeclareSimdOp::verify() {
if (getInbranch() && getNotinbranch())
return emitOpError("cannot have both 'inbranch' and 'notinbranch'");
+ if (auto argTypes = getArgTypes()) {
+ if (argTypes->size() != func.getNumArguments())
+ return emitOpError() << "'arg_types' length (" << argTypes->size()
+ << ") must match the number of function arguments ("
+ << func.getNumArguments() << ")";
+ }
+
if (failed(verifyLinearModifiers(*this, getLinearModifiers(), getLinearVars(),
/*isDeclareSimd=*/true)))
return failure();
diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
index fb0f6636f8e8f..0e933432c675e 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
@@ -7503,9 +7503,8 @@ static void populateLinearParam(
paramAttr.Kind = llvm::OpenMPIRBuilder::DeclareSimdKindTy::LinearUVal;
break;
case omp::LinearModifier::val:
- // Match clang: val on a non-reference (non-pointer SSA type) is
- // semantically identical to plain linear. Only pointer-typed
- // vars (which may originate from C++ references) get LinearVal.
+ // val on a non-reference (non-pointer SSA type) is semantically
+ // identical to plain linear. Only pointer-typed vars get LinearVal.
if (isa<LLVM::LLVMPointerType>(linearVars[i].getType()))
paramAttr.Kind = llvm::OpenMPIRBuilder::DeclareSimdKindTy::LinearVal;
else
@@ -7679,30 +7678,31 @@ static bool getAArch64PBV(Type ty, const DataLayout &dl) {
// Computes the lane size (LS) of a return type or of an input parameter,
// as defined by `LS(P)` in 3.2.1 of the AAVFABI.
//
-// argElemType provides the original language-level type for an opaque
-// `!llvm.ptr` parameter, enabling correct LS computation.
-static unsigned getAArch64LS(Type ty,
+// `paramTy` is the SSA type of the parameter (e.g. !llvm.ptr, i32, f64).
+// `paramElemTy` optionally provides the original language-level pointee type
+// from `arg_types` for opaque `!llvm.ptr` parameters whose pointee type has
+// been erased.
+static unsigned getAArch64LS(Type paramTy,
llvm::OpenMPIRBuilder::DeclareSimdKindTy kind,
- const DataLayout &dl,
- Type argElemType = nullptr) {
- if (!getAArch64MTV(ty, kind)) {
- if (auto ptrLikeTy = dyn_cast<omp::PointerLikeType>(ty)) {
+ const DataLayout &dl, Type paramElemTy = nullptr) {
+ if (!getAArch64MTV(paramTy, kind)) {
+ if (auto ptrLikeTy = dyn_cast<omp::PointerLikeType>(paramTy)) {
Type elemTy = ptrLikeTy.getElementType();
if (elemTy && getAArch64PBV(elemTy, dl))
return dl.getTypeSizeInBits(elemTy);
}
// For opaque !llvm.ptr, use the original type from arg_types
// if available, since the pointee type is lost at LLVM IR level.
- if (isa<LLVM::LLVMPointerType>(ty) && argElemType &&
- getAArch64PBV(argElemType, dl))
- return dl.getTypeSizeInBits(argElemType);
+ if (isa<LLVM::LLVMPointerType>(paramTy) && paramElemTy &&
+ getAArch64PBV(paramElemTy, dl))
+ return dl.getTypeSizeInBits(paramElemTy);
}
- if (getAArch64PBV(ty, dl))
- return dl.getTypeSizeInBits(ty);
+ if (getAArch64PBV(paramTy, dl))
+ return dl.getTypeSizeInBits(paramTy);
- return dl.getTypeSizeInBits(LLVM::LLVMPointerType::get(ty.getContext(),
- /*addressSpace=*/0));
+ return dl.getTypeSizeInBits(
+ LLVM::LLVMPointerType::get(paramTy.getContext(), /*addressSpace=*/0));
}
// Get Narrowest Data Size (NDS) and Widest Data Size (WDS) from the
diff --git a/mlir/test/Dialect/OpenMP/invalid.mlir b/mlir/test/Dialect/OpenMP/invalid.mlir
index 4879ea754bf75..4bdd12f23db8c 100644
--- a/mlir/test/Dialect/OpenMP/invalid.mlir
+++ b/mlir/test/Dialect/OpenMP/invalid.mlir
@@ -3245,6 +3245,14 @@ func.func @omp_declare_simd_linear_modifiers_mismatch(%iv : i32, %step : i32) {
// -----
+func.func @omp_declare_simd_arg_types_mismatch(%a : i32) {
+ // expected-error @below {{'omp.declare_simd' op 'arg_types' length (2) must match the number of function arguments (1)}}
+ omp.declare_simd {arg_types = [i32, i32]}
+ return
+}
+
+// -----
+
func.func @iterator_bad_result_type(%lb : index, %ub : index, %st : index) {
// expected-error at +1 {{result #0 must be OpenMP iterator-produced list handle, but got 'index'}}
%0 = omp.iterator(%i: index) = (%lb to %ub step %st) {
diff --git a/mlir/test/Dialect/OpenMP/ops.mlir b/mlir/test/Dialect/OpenMP/ops.mlir
index 9778dcc8d7ea0..2d8c5171ba56c 100644
--- a/mlir/test/Dialect/OpenMP/ops.mlir
+++ b/mlir/test/Dialect/OpenMP/ops.mlir
@@ -3614,8 +3614,8 @@ func.func @omp_declare_simd_arg_types(%a: f64, %b: i32) -> () {
func.func @omp_declare_simd_arg_types_with_linear(%a: f64, %b: !llvm.ptr, %step: i64) -> () {
// CHECK: omp.declare_simd
// CHECK-SAME: linear(ref(%{{.*}} : !llvm.ptr = %{{.*}} : i64))
- // CHECK-SAME: {arg_types = [f64, i32], linear_var_types = [i32]}
- omp.declare_simd linear(ref(%b : !llvm.ptr = %step : i64)) {arg_types = [f64, i32], linear_var_types = [i32]}
+ // CHECK-SAME: {arg_types = [f64, i32, i64], linear_var_types = [i32]}
+ omp.declare_simd linear(ref(%b : !llvm.ptr = %step : i64)) {arg_types = [f64, i32, i64], linear_var_types = [i32]}
return
}