[clang] [llvm] [HLSL][DirectX] Implement HLSL `mul` function and DXIL lowering of `llvm.matrix.multiply` (PR #184882)

Thu Mar 5 22:00:57 PST 2026

================
@@ -1054,6 +1055,68 @@ Value *CodeGenFunction::EmitHLSLBuiltinExpr(unsigned BuiltinID,
     Value *Mul = Builder.CreateNUWMul(M, A);
     return Builder.CreateNUWAdd(Mul, B);
   }
+  case Builtin::BI__builtin_hlsl_mul: {
+    Value *Op0 = EmitScalarExpr(E->getArg(0));
+    Value *Op1 = EmitScalarExpr(E->getArg(1));
+    QualType QTy0 = E->getArg(0)->getType();
+    QualType QTy1 = E->getArg(1)->getType();
+
+    bool IsVec0 = QTy0->isVectorType();
+    bool IsVec1 = QTy1->isVectorType();
+    bool IsMat0 = QTy0->isConstantMatrixType();
+    bool IsMat1 = QTy1->isConstantMatrixType();
+
+    if (IsVec0 && IsVec1) {
+      // Case 5: vector * vector -> scalar (dot product)
----------------
farzonl wrote:

It is possible there are performance improvements with higher levels of optimization, because Clang builtins get ignored by most passes.

That said, codegen will be significantly worse with no optimization because the helper functions will not have been inlined.

https://github.com/llvm/llvm-project/pull/184882