[clang] [llvm] [HLSL][Matrix] Support row-major `transpose` and `mul` by inserting matrix memory layout transformations (PR #186898)
Deric C. via llvm-commits
llvm-commits at lists.llvm.org
Mon Mar 16 15:03:26 PDT 2026
https://github.com/Icohedron created https://github.com/llvm/llvm-project/pull/186898
Fixes #184906
The SPIRV and DXIL backends assume matrices are provided in column-major order when lowering matrix transpose and matrix multiplication intrinsics.
To support row-major order matrices from Clang/HLSL, we therefore need to convert row-major order matrices into column-major order matrices before applying matrix transpose and multiplication. A conversion from column-major order back to row-major order is also required for correctness after a matrix transpose or matrix multiply.
This PR adds helper functions to the MatrixBuilder to convert a NxM row-/column-major order matrix into a NxM column-/row-major order matrix by applying a matrix transpose.
The transformations take advantage of the fact that a row-major order matrix of NxM dimensions `rNxM` interpreted in column-major order is equivalent to its transpose in column-major order.
Example: Let `r3x2 = [ 0, 1, 2, 3, 4, 5 ]`. The 3x2 matrix is visualized as
```
0 1
2 3
4 5
```
When `[ 0, 1, 2, 3, 4, 5 ]` is interpreted as a 2x3 column-major order matrix, it is visualized as:
```
0 2 4
1 3 5
```
which is equal to the transpose of `r3x2` but in column-major order.
These matrix memory layout transformations are inserted before and after the matrix multiply and transpose intrinsics when lowering HLSL mul and transpose.
While this method of supporting row-major order matrices is not performant, it is correct and will suffice for now until benchmarks are created and performance becomes a primary concern.
Assisted-by: GitHub Copilot (powered by Claude Opus 4.6)
>From 03162e8a21e0fa21a2a518c7fbda0c4d2b1b91a6 Mon Sep 17 00:00:00 2001
From: Deric Cheung <cheung.deric at gmail.com>
Date: Mon, 16 Mar 2026 14:16:31 -0700
Subject: [PATCH] Insert matrix memory layout transposes before and after
matrix intrinsics
The SPIRV and DXIL backends assume matrices are provided in column-major
order when lowering matrix transpose and matrix multiplication
intrinsics.
To support row-major order matrices from Clang/HLSL, we therefore need
to convert row-major order matrices into column-major order matrices
before applying matrix transpose and multiplication. A conversion
from column-major order back to row-major order is also required for
correctness after a matrix transpose or matrix multiply.
This commit adds helper functions to the MatrixBuilder to convert a NxM
row-/column-major order matrix into a NxM column-/row-major order matrix
by applying a matrix transpose.
These matrix memory layout transposes are inserted before and after HLSL
transpose and mul when lowering.
Assisted-by: GitHub Copilot (powered by Claude Opus 4.6)
---
clang/lib/CodeGen/CGHLSLBuiltins.cpp | 46 +++++++++++--
clang/test/CodeGenHLSL/builtins/mul.hlsl | 42 ++++++++----
.../test/CodeGenHLSL/builtins/transpose.hlsl | 64 ++++++++++++-------
llvm/include/llvm/IR/MatrixBuilder.h | 16 +++++
4 files changed, 127 insertions(+), 41 deletions(-)
diff --git a/clang/lib/CodeGen/CGHLSLBuiltins.cpp b/clang/lib/CodeGen/CGHLSLBuiltins.cpp
index 80c590437309d..a891864e6d964 100644
--- a/clang/lib/CodeGen/CGHLSLBuiltins.cpp
+++ b/clang/lib/CodeGen/CGHLSLBuiltins.cpp
@@ -1122,32 +1122,64 @@ Value *CodeGenFunction::EmitHLSLBuiltinExpr(unsigned BuiltinID,
bool IsMat0 = QTy0->isConstantMatrixType();
bool IsMat1 = QTy1->isConstantMatrixType();
+ // The matrix multiply intrinsic only operates on column-major order
+ // matrices. Therefore matrix memory layout transforms must be inserted
+ // before and after matrix multiply intrinsics.
+ bool IsRowMajor = getLangOpts().getDefaultMatrixMemoryLayout() ==
+ LangOptions::MatrixMemoryLayout::MatrixRowMajor;
+
llvm::MatrixBuilder MB(Builder);
if (IsVec0 && IsMat1) {
unsigned N = QTy0->castAs<VectorType>()->getNumElements();
auto *MatTy = QTy1->castAs<ConstantMatrixType>();
- unsigned M = MatTy->getNumColumns();
- return MB.CreateMatrixMultiply(Op0, Op1, 1, N, M, "hlsl.mul");
+ unsigned Rows = MatTy->getNumRows();
+ unsigned Cols = MatTy->getNumColumns();
+ if (IsRowMajor)
+ Op1 = MB.CreateRowMajorToColumnMajorTransform(Op1, Rows, Cols);
+ return MB.CreateMatrixMultiply(Op0, Op1, 1, N, Cols, "hlsl.mul");
}
if (IsMat0 && IsVec1) {
auto *MatTy = QTy0->castAs<ConstantMatrixType>();
unsigned Rows = MatTy->getNumRows();
unsigned Cols = MatTy->getNumColumns();
+ if (IsRowMajor)
+ Op0 = MB.CreateRowMajorToColumnMajorTransform(Op0, Rows, Cols);
return MB.CreateMatrixMultiply(Op0, Op1, Rows, Cols, 1, "hlsl.mul");
}
assert(IsMat0 && IsMat1);
auto *MatTy0 = QTy0->castAs<ConstantMatrixType>();
auto *MatTy1 = QTy1->castAs<ConstantMatrixType>();
- return MB.CreateMatrixMultiply(Op0, Op1, MatTy0->getNumRows(),
- MatTy0->getNumColumns(),
- MatTy1->getNumColumns(), "hlsl.mul");
+ unsigned Rows0 = MatTy0->getNumRows();
+ unsigned Rows1 = MatTy1->getNumRows();
+ unsigned Cols0 = MatTy0->getNumColumns();
+ unsigned Cols1 = MatTy1->getNumColumns();
+ if (IsRowMajor) {
+ Op0 = MB.CreateRowMajorToColumnMajorTransform(Op0, Rows0, Cols0);
+ Op1 = MB.CreateRowMajorToColumnMajorTransform(Op1, Rows1, Cols1);
+ }
+ Value *Result =
+ MB.CreateMatrixMultiply(Op0, Op1, Rows0, Cols0, Cols1, "hlsl.mul");
+ if (IsRowMajor)
+ Result = MB.CreateColumnMajorToRowMajorTransform(Result, Rows0, Cols1);
+ return Result;
}
case Builtin::BI__builtin_hlsl_transpose: {
Value *Op0 = EmitScalarExpr(E->getArg(0));
auto *MatTy = E->getArg(0)->getType()->castAs<ConstantMatrixType>();
+ unsigned Rows = MatTy->getNumRows();
+ unsigned Cols = MatTy->getNumColumns();
llvm::MatrixBuilder MB(Builder);
- return MB.CreateMatrixTranspose(Op0, MatTy->getNumRows(),
- MatTy->getNumColumns());
+ // The matrix transpose intrinsic only operates on column-major order
+ // matrices. Therefore matrix memory layout transforms must be inserted
+ // before and after matrix transpose intrinsics.
+ bool IsRowMajor = getLangOpts().getDefaultMatrixMemoryLayout() ==
+ LangOptions::MatrixMemoryLayout::MatrixRowMajor;
+ if (IsRowMajor)
+ Op0 = MB.CreateRowMajorToColumnMajorTransform(Op0, Rows, Cols);
+ Value *Result = MB.CreateMatrixTranspose(Op0, Rows, Cols);
+ if (IsRowMajor)
+ Result = MB.CreateColumnMajorToRowMajorTransform(Result, Cols, Rows);
+ return Result;
}
case Builtin::BI__builtin_hlsl_elementwise_rcp: {
Value *Op0 = EmitScalarExpr(E->getArg(0));
diff --git a/clang/test/CodeGenHLSL/builtins/mul.hlsl b/clang/test/CodeGenHLSL/builtins/mul.hlsl
index 17749e527af65..f9151225d3846 100644
--- a/clang/test/CodeGenHLSL/builtins/mul.hlsl
+++ b/clang/test/CodeGenHLSL/builtins/mul.hlsl
@@ -1,5 +1,7 @@
-// RUN: %clang_cc1 -finclude-default-header -O1 -triple dxil-pc-shadermodel6.3-library -fnative-half-type -emit-llvm -o - %s | FileCheck %s --check-prefixes=CHECK,DXIL
-// RUN: %clang_cc1 -finclude-default-header -O1 -triple spirv-unknown-vulkan1.3-library -fnative-half-type -emit-llvm -o - %s | FileCheck %s --check-prefixes=CHECK,SPIRV
+// RUN: %clang_cc1 -finclude-default-header -O1 -triple dxil-pc-shadermodel6.3-library -fnative-half-type -emit-llvm -fmatrix-memory-layout=column-major -o - %s | FileCheck %s --check-prefixes=CHECK,COLMAJOR,DXIL
+// RUN: %clang_cc1 -finclude-default-header -O1 -triple spirv-unknown-vulkan1.3-library -fnative-half-type -emit-llvm -fmatrix-memory-layout=column-major -o - %s | FileCheck %s --check-prefixes=CHECK,COLMAJOR,SPIRV
+// RUN: %clang_cc1 -finclude-default-header -O1 -triple dxil-pc-shadermodel6.3-library -fnative-half-type -emit-llvm -fmatrix-memory-layout=row-major -o - %s | FileCheck %s --check-prefixes=CHECK,ROWMAJOR,DXIL
+// RUN: %clang_cc1 -finclude-default-header -O1 -triple spirv-unknown-vulkan1.3-library -fnative-half-type -emit-llvm -fmatrix-memory-layout=row-major -o - %s | FileCheck %s --check-prefixes=CHECK,ROWMAJOR,SPIRV
// -- Case 1: scalar * scalar -> scalar --
@@ -74,7 +76,8 @@ export double test_vec_vec_muld(double3 a, double3 b) { return mul(a, b); }
// -- Case 6: vector * matrix -> vector --
// CHECK-LABEL: test_vec_mat_mul
-// CHECK: %hlsl.mul = {{.*}} call {{.*}} <3 x float> @llvm.matrix.multiply.v3f32.v2f32.v6f32(<2 x float> %v, <6 x float> %m, i32 1, i32 2, i32 3)
+// ROWMAJOR: {{.*}} = {{.*}} call {{.*}} <6 x float> @llvm.matrix.transpose.v6f32(<6 x float> %{{.*}}, i32 3, i32 2)
+// CHECK: %hlsl.mul = {{.*}} call {{.*}} <3 x float> @llvm.matrix.multiply.v3f32.v2f32.v6f32(<2 x float> %v, <6 x float> %{{.*}}, i32 1, i32 2, i32 3)
// CHECK: ret <3 x float> %hlsl.mul
export float3 test_vec_mat_mul(float2 v, float2x3 m) { return mul(v, m); }
@@ -90,22 +93,31 @@ export float2x3 test_mat_scalar_mul(float2x3 a, float b) { return mul(a, b); }
// -- Case 8: matrix * vector -> vector --
// CHECK-LABEL: test_mat_vec_mul
-// CHECK: %hlsl.mul = {{.*}} call {{.*}} <2 x float> @llvm.matrix.multiply.v2f32.v6f32.v3f32(<6 x float> %m, <3 x float> %v, i32 2, i32 3, i32 1)
+// ROWMAJOR: {{.*}} = {{.*}} call {{.*}} <6 x float> @llvm.matrix.transpose.v6f32(<6 x float> %{{.*}}, i32 3, i32 2)
+// CHECK: %hlsl.mul = {{.*}} call {{.*}} <2 x float> @llvm.matrix.multiply.v2f32.v6f32.v3f32(<6 x float> %{{.*}}, <3 x float> %v, i32 2, i32 3, i32 1)
// CHECK: ret <2 x float> %hlsl.mul
export float2 test_mat_vec_mul(float2x3 m, float3 v) { return mul(m, v); }
// -- Case 9: matrix * matrix -> matrix --
// CHECK-LABEL: test_mat_mat_mul
-// CHECK: %hlsl.mul = {{.*}} call {{.*}} <8 x float> @llvm.matrix.multiply.v8f32.v6f32.v12f32(<6 x float> %a, <12 x float> %b, i32 2, i32 3, i32 4)
-// CHECK: ret <8 x float> %hlsl.mul
+// ROWMAJOR: {{.*}} = {{.*}} call {{.*}} <6 x float> @llvm.matrix.transpose.v6f32(<6 x float> %{{.*}}, i32 3, i32 2)
+// ROWMAJOR: {{.*}} = {{.*}} call {{.*}} <12 x float> @llvm.matrix.transpose.v12f32(<12 x float> %{{.*}}, i32 4, i32 3)
+// CHECK: %hlsl.mul = {{.*}} call {{.*}} <8 x float> @llvm.matrix.multiply.v8f32.v6f32.v12f32(<6 x float> %{{.*}}, <12 x float> %{{.*}}, i32 2, i32 3, i32 4)
+// COLMAJOR: ret <8 x float> %hlsl.mul
+// ROWMAJOR: %[[TRANSPOSE_RES:.*]] = {{.*}} call {{.*}} <8 x float> @llvm.matrix.transpose.v8f32(<8 x float> %hlsl.mul, i32 2, i32 4)
+// ROWMAJOR: ret <8 x float> %[[TRANSPOSE_RES]]
export float2x4 test_mat_mat_mul(float2x3 a, float3x4 b) { return mul(a, b); }
// -- Integer matrix multiply --
// CHECK-LABEL: test_mat_mat_muli
-// CHECK: %hlsl.mul = {{.*}} call <8 x i32> @llvm.matrix.multiply.v8i32.v6i32.v12i32(<6 x i32> %a, <12 x i32> %b, i32 2, i32 3, i32 4)
-// CHECK: ret <8 x i32> %hlsl.mul
+// ROWMAJOR: {{.*}} = {{.*}} call <6 x i32> @llvm.matrix.transpose.v6i32(<6 x i32> %{{.*}}, i32 3, i32 2)
+// ROWMAJOR: {{.*}} = {{.*}} call <12 x i32> @llvm.matrix.transpose.v12i32(<12 x i32> %{{.*}}, i32 4, i32 3)
+// CHECK: %hlsl.mul = {{.*}} call <8 x i32> @llvm.matrix.multiply.v8i32.v6i32.v12i32(<6 x i32> %{{.*}}, <12 x i32> %{{.*}}, i32 2, i32 3, i32 4)
+// COLMAJOR: ret <8 x i32> %hlsl.mul
+// ROWMAJOR: %[[TRANSPOSE_RES:.*]] = {{.*}} call <8 x i32> @llvm.matrix.transpose.v8i32(<8 x i32> %hlsl.mul, i32 2, i32 4)
+// ROWMAJOR: ret <8 x i32> %[[TRANSPOSE_RES]]
export int2x4 test_mat_mat_muli(int2x3 a, int3x4 b) { return mul(a, b); }
// -- Half-type overloads (native half) --
@@ -150,16 +162,22 @@ export half test_vec_vec_mulh(half3 a, half3 b) { return mul(a, b); }
export half2x3 test_mat_scalar_mulh(half2x3 a, half b) { return mul(a, b); }
// CHECK-LABEL: test_vec_mat_mulh
-// CHECK: %hlsl.mul = {{.*}}call {{.*}} <3 x half> @llvm.matrix.multiply.v3f16.v2f16.v6f16(<2 x half> %v, <6 x half> %m, i32 1, i32 2, i32 3)
+// ROWMAJOR: {{.*}} = {{.*}} call {{.*}} <6 x half> @llvm.matrix.transpose.v6f16(<6 x half> %{{.*}}, i32 3, i32 2)
+// CHECK: %hlsl.mul = {{.*}}call {{.*}} <3 x half> @llvm.matrix.multiply.v3f16.v2f16.v6f16(<2 x half> %v, <6 x half> %{{.*}}, i32 1, i32 2, i32 3)
// CHECK: ret <3 x half> %hlsl.mul
export half3 test_vec_mat_mulh(half2 v, half2x3 m) { return mul(v, m); }
// CHECK-LABEL: test_mat_vec_mulh
-// CHECK: %hlsl.mul = {{.*}}call {{.*}} <2 x half> @llvm.matrix.multiply.v2f16.v6f16.v3f16(<6 x half> %m, <3 x half> %v, i32 2, i32 3, i32 1)
+// ROWMAJOR: {{.*}} = {{.*}} call {{.*}} <6 x half> @llvm.matrix.transpose.v6f16(<6 x half> %{{.*}}, i32 3, i32 2)
+// CHECK: %hlsl.mul = {{.*}}call {{.*}} <2 x half> @llvm.matrix.multiply.v2f16.v6f16.v3f16(<6 x half> %{{.*}}, <3 x half> %v, i32 2, i32 3, i32 1)
// CHECK: ret <2 x half> %hlsl.mul
export half2 test_mat_vec_mulh(half2x3 m, half3 v) { return mul(m, v); }
// CHECK-LABEL: test_mat_mat_mulh
-// CHECK: %hlsl.mul = {{.*}}call {{.*}} <8 x half> @llvm.matrix.multiply.v8f16.v6f16.v12f16(<6 x half> %a, <12 x half> %b, i32 2, i32 3, i32 4)
-// CHECK: ret <8 x half> %hlsl.mul
+// ROWMAJOR: {{.*}} = {{.*}} call {{.*}} <6 x half> @llvm.matrix.transpose.v6f16(<6 x half> %{{.*}}, i32 3, i32 2)
+// ROWMAJOR: {{.*}} = {{.*}} call {{.*}} <12 x half> @llvm.matrix.transpose.v12f16(<12 x half> %{{.*}}, i32 4, i32 3)
+// CHECK: %hlsl.mul = {{.*}}call {{.*}} <8 x half> @llvm.matrix.multiply.v8f16.v6f16.v12f16(<6 x half> %{{.*}}, <12 x half> %{{.*}}, i32 2, i32 3, i32 4)
+// COLMAJOR: ret <8 x half> %hlsl.mul
+// ROWMAJOR: %[[TRANSPOSE_RES:.*]] = {{.*}} call {{.*}} <8 x half> @llvm.matrix.transpose.v8f16(<8 x half> %hlsl.mul, i32 2, i32 4)
+// ROWMAJOR: ret <8 x half> %[[TRANSPOSE_RES]]
export half2x4 test_mat_mat_mulh(half2x3 a, half3x4 b) { return mul(a, b); }
diff --git a/clang/test/CodeGenHLSL/builtins/transpose.hlsl b/clang/test/CodeGenHLSL/builtins/transpose.hlsl
index 9018e3913269d..b0bb99c70fda2 100644
--- a/clang/test/CodeGenHLSL/builtins/transpose.hlsl
+++ b/clang/test/CodeGenHLSL/builtins/transpose.hlsl
@@ -1,42 +1,62 @@
-// RUN: %clang_cc1 -finclude-default-header -triple dxil-pc-shadermodel6.3-library -emit-llvm -disable-llvm-passes -o - %s | FileCheck %s
-// RUN: %clang_cc1 -finclude-default-header -triple spirv-unknown-vulkan1.3-library -emit-llvm -disable-llvm-passes -o - %s | FileCheck %s
+// RUN: %clang_cc1 -finclude-default-header -triple dxil-pc-shadermodel6.3-library -emit-llvm -disable-llvm-passes -fmatrix-memory-layout=column-major -o - %s | FileCheck %s --check-prefixes=CHECK,COLMAJOR
+// RUN: %clang_cc1 -finclude-default-header -triple spirv-unknown-vulkan1.3-library -emit-llvm -disable-llvm-passes -fmatrix-memory-layout=column-major -o - %s | FileCheck %s --check-prefixes=CHECK,COLMAJOR
+// RUN: %clang_cc1 -finclude-default-header -triple dxil-pc-shadermodel6.3-library -emit-llvm -disable-llvm-passes -fmatrix-memory-layout=row-major -o - %s | FileCheck %s --check-prefixes=CHECK,ROWMAJOR
+// RUN: %clang_cc1 -finclude-default-header -triple spirv-unknown-vulkan1.3-library -emit-llvm -disable-llvm-passes -fmatrix-memory-layout=row-major -o - %s | FileCheck %s --check-prefixes=CHECK,ROWMAJOR
// CHECK-LABEL: define {{.*}}test_transpose_bool2x3
-// CHECK: [[A_ADDR:%.*]] = alloca [3 x <2 x i32>], align 4
-// CHECK: [[A_EXT:%.*]] = zext <6 x i1> %{{.*}} to <6 x i32>
-// CHECK: store <6 x i32> [[A_EXT]], ptr [[A_ADDR]], align 4
-// CHECK: [[A:%.*]] = load <6 x i32>, ptr [[A_ADDR]], align 4
-// CHECK: [[TRANS:%.*]] = call <6 x i32> @llvm.matrix.transpose.v6i32(<6 x i32> [[A]], i32 2, i32 3)
+// COLMAJOR: [[A_ADDR:%.*]] = alloca [3 x <2 x i32>], align 4
+// ROWMAJOR: [[A_ADDR:%.*]] = alloca [2 x <3 x i32>], align 4
+// CHECK: [[A_EXT:%.*]] = zext <6 x i1> %{{.*}} to <6 x i32>
+// CHECK: store <6 x i32> [[A_EXT]], ptr [[A_ADDR]], align 4
+// CHECK: [[A:%.*]] = load <6 x i32>, ptr [[A_ADDR]], align 4
+// COLMAJOR: [[TRANS:%.*]] = call <6 x i32> @llvm.matrix.transpose.v6i32(<6 x i32> [[A]], i32 2, i32 3)
+// ROWMAJOR: [[LAYOUT:%.*]] = call <6 x i32> @llvm.matrix.transpose.v6i32(<6 x i32> [[A]], i32 3, i32 2)
+// ROWMAJOR: [[TRANS:%.*]] = call <6 x i32> @llvm.matrix.transpose.v6i32(<6 x i32> [[LAYOUT]], i32 2, i32 3)
+// ROWMAJOR: {{.*}} = call <6 x i32> @llvm.matrix.transpose.v6i32(<6 x i32> [[TRANS]], i32 3, i32 2)
bool3x2 test_transpose_bool2x3(bool2x3 a) {
return transpose(a);
}
// CHECK-LABEL: define {{.*}}test_transpose_int4x3
-// CHECK: [[A_ADDR:%.*]] = alloca [3 x <4 x i32>], align 4
-// CHECK: store <12 x i32> %{{.*}}, ptr [[A_ADDR]], align 4
-// CHECK: [[A:%.*]] = load <12 x i32>, ptr [[A_ADDR]], align 4
-// CHECK: [[TRANS:%.*]] = call <12 x i32> @llvm.matrix.transpose.v12i32(<12 x i32> [[A]], i32 4, i32 3)
-// CHECK: ret <12 x i32> [[TRANS]]
+// COLMAJOR: [[A_ADDR:%.*]] = alloca [3 x <4 x i32>], align 4
+// ROWMAJOR: [[A_ADDR:%.*]] = alloca [4 x <3 x i32>], align 4
+// CHECK: store <12 x i32> %{{.*}}, ptr [[A_ADDR]], align 4
+// CHECK: [[A:%.*]] = load <12 x i32>, ptr [[A_ADDR]], align 4
+// COLMAJOR: [[TRANS:%.*]] = call <12 x i32> @llvm.matrix.transpose.v12i32(<12 x i32> [[A]], i32 4, i32 3)
+// COLMAJOR: ret <12 x i32> [[TRANS]]
+// ROWMAJOR: [[LAYOUT:%.*]] = call <12 x i32> @llvm.matrix.transpose.v12i32(<12 x i32> [[A]], i32 3, i32 4)
+// ROWMAJOR: [[TRANS:%.*]] = call <12 x i32> @llvm.matrix.transpose.v12i32(<12 x i32> [[LAYOUT]], i32 4, i32 3)
+// ROWMAJOR: [[RESULT:%.*]] = call <12 x i32> @llvm.matrix.transpose.v12i32(<12 x i32> [[TRANS]], i32 3, i32 4)
+// ROWMAJOR: ret <12 x i32> [[RESULT]]
int3x4 test_transpose_int4x3(int4x3 a) {
return transpose(a);
}
// CHECK-LABEL: define {{.*}}test_transpose_float4x4
-// CHECK: [[A_ADDR:%.*]] = alloca [4 x <4 x float>], align 4
-// CHECK: store <16 x float> %{{.*}}, ptr [[A_ADDR]], align 4
-// CHECK: [[A:%.*]] = load <16 x float>, ptr [[A_ADDR]], align 4
-// CHECK: [[TRANS:%.*]] = call {{.*}}<16 x float> @llvm.matrix.transpose.v16f32(<16 x float> [[A]], i32 4, i32 4)
-// CHECK: ret <16 x float> [[TRANS]]
+// CHECK: [[A_ADDR:%.*]] = alloca [4 x <4 x float>], align 4
+// CHECK: store <16 x float> %{{.*}}, ptr [[A_ADDR]], align 4
+// CHECK: [[A:%.*]] = load <16 x float>, ptr [[A_ADDR]], align 4
+// COLMAJOR: [[TRANS:%.*]] = call {{.*}}<16 x float> @llvm.matrix.transpose.v16f32(<16 x float> [[A]], i32 4, i32 4)
+// COLMAJOR: ret <16 x float> [[TRANS]]
+// ROWMAJOR: [[LAYOUT:%.*]] = call {{.*}} <16 x float> @llvm.matrix.transpose.v16f32(<16 x float> [[A]], i32 4, i32 4)
+// ROWMAJOR: [[TRANS:%.*]] = call {{.*}} <16 x float> @llvm.matrix.transpose.v16f32(<16 x float> [[LAYOUT]], i32 4, i32 4)
+// ROWMAJOR: [[RESULT:%.*]] = call {{.*}} <16 x float> @llvm.matrix.transpose.v16f32(<16 x float> [[TRANS]], i32 4, i32 4)
+// ROWMAJOR: ret <16 x float> [[RESULT]]
float4x4 test_transpose_float4x4(float4x4 a) {
return transpose(a);
}
// CHECK-LABEL: define {{.*}}test_transpose_double1x4
-// CHECK: [[A_ADDR:%.*]] = alloca [4 x <1 x double>], align 8
-// CHECK: store <4 x double> %{{.*}}, ptr [[A_ADDR]], align 8
-// CHECK: [[A:%.*]] = load <4 x double>, ptr [[A_ADDR]], align 8
-// CHECK: [[TRANS:%.*]] = call {{.*}}<4 x double> @llvm.matrix.transpose.v4f64(<4 x double> [[A]], i32 1, i32 4)
-// CHECK: ret <4 x double> [[TRANS]]
+// COLMAJOR: [[A_ADDR:%.*]] = alloca [4 x <1 x double>], align 8
+// ROWMAJOR: [[A_ADDR:%.*]] = alloca [1 x <4 x double>], align 8
+// CHECK: store <4 x double> %{{.*}}, ptr [[A_ADDR]], align 8
+// CHECK: [[A:%.*]] = load <4 x double>, ptr [[A_ADDR]], align 8
+// COLMAJOR: [[TRANS:%.*]] = call {{.*}}<4 x double> @llvm.matrix.transpose.v4f64(<4 x double> [[A]], i32 1, i32 4)
+// COLMAJOR: ret <4 x double> [[TRANS]]
+// ROWMAJOR: [[LAYOUT:%.*]] = call {{.*}} <4 x double> @llvm.matrix.transpose.v4f64(<4 x double> [[A]], i32 4, i32 1)
+// ROWMAJOR: [[TRANS:%.*]] = call {{.*}} <4 x double> @llvm.matrix.transpose.v4f64(<4 x double> [[LAYOUT]], i32 1, i32 4)
+// ROWMAJOR: [[RESULT:%.*]] = call {{.*}} <4 x double> @llvm.matrix.transpose.v4f64(<4 x double> [[TRANS]], i32 4, i32 1)
+// ROWMAJOR: ret <4 x double> [[RESULT]]
double4x1 test_transpose_double1x4(double1x4 a) {
return transpose(a);
}
diff --git a/llvm/include/llvm/IR/MatrixBuilder.h b/llvm/include/llvm/IR/MatrixBuilder.h
index 5c891d644bfd2..b161e79a1cb6b 100644
--- a/llvm/include/llvm/IR/MatrixBuilder.h
+++ b/llvm/include/llvm/IR/MatrixBuilder.h
@@ -141,6 +141,22 @@ class MatrixBuilder {
return B.CreateCall(TheFn->getFunctionType(), TheFn, Ops, Name);
}
+ /// Create a column-major matrix from a row-major matrix with the given
+ /// logical dimensions by transposing it.
+ CallInst *CreateRowMajorToColumnMajorTransform(Value *Matrix, unsigned Rows,
+ unsigned Columns,
+ const Twine &Name = "") {
+ return CreateMatrixTranspose(Matrix, Columns, Rows, Name);
+ }
+
+ /// Create a row-major matrix from a column-major matrix with the given
+ /// logical dimensions by transposing it.
+ CallInst *CreateColumnMajorToRowMajorTransform(Value *Matrix, unsigned Rows,
+ unsigned Columns,
+ const Twine &Name = "") {
+ return CreateMatrixTranspose(Matrix, Rows, Columns, Name);
+ }
+
/// Insert a single element \p NewVal into \p Matrix at indices (\p RowIdx, \p
/// ColumnIdx).
Value *CreateMatrixInsert(Value *Matrix, Value *NewVal, Value *RowIdx,
More information about the llvm-commits
mailing list